Get Started with Python: Create Your First Program

Having extracted useful information from the database, we'll need to process it further to generate better insights.

SQL isn't designed for this kind of manipulation. Its main strength is to query data from tables within databases.

Python is a general-purpose, high-level programming language that can handle this kind of data manipulation. It is easy to learn, even for absolute beginners.

python for data science

In this article, you'll learn how to set up Anaconda (a Python distribution) and write your first Python program.

Why Python For Data Science?

Python's massive selection of libraries covers data manipulation tasks such as statistical analysis, regression tests, and time-series analysis. These are very difficult to achieve using SQL alone.

Pandas is a Python library that helps with further manipulation of the tabular data fetched using SQL.

Matplotlib and Seaborn, among others, are used for generating charts that help reveal patterns in data.

Scikit-learn, TensorFlow and PyTorch are libraries for building machine learning models.

Libraries like Beautiful Soup and Scrapy can be used to pull data from the web.
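To make the idea of "fetch with SQL, then process with Python" concrete, here is a minimal sketch you can try once Python is installed. It uses only Python's built-in sqlite3 and statistics modules rather than the libraries listed above, and the sales table and its values are made up purely for illustration.

```python
import sqlite3
import statistics

# Hypothetical in-memory database with a made-up "sales" table
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE sales (region TEXT, amount REAL)")
connection.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("east", 100.0)],
)

# SQL fetches the rows ...
rows = connection.execute("SELECT amount FROM sales").fetchall()
connection.close()

# ... and Python processes them further
amounts = [row[0] for row in rows]
average_amount = statistics.mean(amounts)
spread = statistics.stdev(amounts)

print(average_amount)
print(spread)
```

In a real project, a library such as Pandas would usually take over the processing step, but the division of labour stays the same.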

Setting up the Anaconda Environment

Anaconda is a free, open-source Python distribution that comes equipped with various libraries and packages. It is available for Windows, Linux and macOS, and it is very popular for Data Science.

Follow these steps to install Anaconda on your machine:

Download the Anaconda Installer

Visit the official Anaconda page.

Scroll down to the Downloads section. It should look like this.

download anaconda

Download the latest version (Python 3.7 as of the time of writing) for your operating system. On Windows, you can check your system information from the Control Panel.

The file is fairly large, so it might take some time to download.

Install Anaconda

After the installer has downloaded successfully, navigate to your download folder and double-click the executable file (.exe extension) to open it.

Click 'Next' to agree to installation.

anaconda installation

Agree to license

agree to license

When you get to the advanced installation options, uncheck "Add Anaconda to my PATH environment variable".

path variable

After a successful installation, type "Anaconda Powershell Prompt" in the Windows search bar.

anaconda powershell

Click on the app to open and wait for the shell to get ready.

powershell

Type 'python' in the shell and press the 'Enter' key.

python

If you get the message shown above, Python has been successfully installed on your computer through the Anaconda distribution.

The ">>>" symbol indicates that the Python interpreter is open. You can run code instantly from the interpreter.
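As a quick check, you could type a couple of lines like the following at the ">>>" prompt to see which Python version the interpreter is running. This is only a sketch; the variable names are chosen for illustration, and the version printed will depend on your installation.

```python
import sys

# sys.version is a string describing the running interpreter
python_version = sys.version
print(python_version)

# sys.version_info exposes the same information as numbers
major_version = sys.version_info.major
print(major_version)
```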

Type exit() in the interpreter and press the 'Enter' key to return to the Anaconda Powershell Prompt.

exit python

Run Your First Python Program

We'll use Jupyter Notebook for running our Python programs. Jupyter Notebook is an open-source, browser-based tool, preinstalled with Anaconda, for writing and running code interactively.

With your Anaconda Powershell Prompt open, type "jupyter notebook", press 'Enter' and wait for the browser to open.

jupyter in powershell

jupyter interface

Navigate to the directory where you want to save your Python file, or stay in the root directory.

Click on the 'New' dropdown at the top and select 'Python 3'. This should open a new tab in your browser.

python in jupyter

Enter the code as shown below and press "Ctrl" + "Enter" to run the code.

python hello world

print() is a Python function that prints whatever is put in its parentheses to the screen. As seen above, "hello world" has been printed to the screen.

print("mydatacourse") outputs mydatacourse to the screen, without the quotes.
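For reference, the two print calls described above can be typed into a notebook cell like this (a minimal sketch of the same examples):

```python
# print() writes its argument to the screen, without the quotes
print("hello world")
print("mydatacourse")
```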

Notice that we put "hello world" and "mydatacourse" in quotes. These are strings.

A string is a series of characters surrounded by quotes. The quotes can be single or double: 'python' and "python" are both strings. You'll see more of this in the next article.
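A short sketch can confirm that the choice of quote style does not change the string itself. The variable names here are only for illustration.

```python
# The same text written with single and with double quotes
single_quoted = 'python'
double_quoted = "python"

# Both are strings, and they compare as equal
print(type(single_quoted))
print(single_quoted == double_quoted)  # True
```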

print(2) outputs 2 to the screen. Notice that we didn't surround 2 with quotes. 2 is of a different data type (integer). The integer data type is used for storing whole numbers. 3, 50, and 0 are integers.

2 is different from "2". One is of the integer data type and the other is of the string data type. The difference between the two is that we can perform arithmetic directly on integers, while a string is treated as text even when it contains digits.
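The difference between 2 and "2" is easiest to see by running a few lines. The sketch below uses illustrative variable names; int() is Python's built-in function for converting a string of digits into an integer.

```python
number = 2    # an integer
text = "2"    # a string that happens to contain a digit

print(type(number))  # <class 'int'>
print(type(text))    # <class 'str'>

# Adding integers does arithmetic
print(number + 3)    # 5

# Adding strings joins the text together instead
print(text + "3")    # 23

# Converting the string to an integer makes arithmetic possible
converted = int(text)
print(converted + 3)  # 5
```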

Variables

Variables are labels that we can assign values to. They can only reference one value at a time.

We can assign a value to a variable by using the "=" symbol.

message = "hello world" assigns "hello world" to the message variable

When we print a variable, we are actually printing the value assigned to it. The previous code can be written as shown below.

python variables

Please note that variables are not surrounded with quotes.
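As a minimal sketch, assigning the message to a variable and then printing that variable looks like this:

```python
# Assign the string to the variable named message
message = "hello world"

# Printing the variable prints the value assigned to it
print(message)  # hello world
```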

But why do we need variables when we can print values directly?

Variables make it easy for us to use a value multiple times in a program. This might not matter so much now but imagine we have multiple lines of text that we'll need in various sections of our program. It'll be repetitive to type the full text everywhere it is needed.

Let's say we want to change the message content. Without variables, we'd have to change the message everywhere it appears. With variables, we only need to change it once, at the point where we assigned the message.

Variables make our code easy to maintain.
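To illustrate the point about reuse, here is a small sketch in which one variable is used in three places. The extra text and the use of the built-in len() function are illustrative additions; changing the first line is enough to update every line that uses message.

```python
# Change the text here and every use below picks up the new value
message = "hello world"

print(message)
print("Once more: " + message)
print("Message length:", len(message))
```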

We can also reassign a value to a variable. When we do this, we are assigning another value to the variable. The variable now references the new value.

reassign variable

In the code above, we reassigned 'welcome' to the message variable. This variable now references "welcome". As seen, the print statement outputs welcome.
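The reassignment described above can be sketched in a notebook cell as follows:

```python
message = "hello world"
print(message)  # hello world

# Reassigning makes the variable reference the new value
message = "welcome"
print(message)  # welcome
```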

Summary

Python is a general-purpose, high-level programming language that can handle data manipulation in Data Science.

Python is a popular choice for Data Science because of its vast selection of libraries.

Anaconda is a popular, free, open-source Python distribution equipped with various libraries and packages for Data Science.

Python's print function is used to output values to the screen.

Variables are labels used to reference values.
