Like R, Python can be used with a variety of Integrated Development Environments (IDEs), and the right choice mainly depends on what you are using Python for. Because we will be using Python for data science, we will be using the DataSpell IDE from JetBrains. Released by the company in 2021, this IDE was designed specifically for exploratory data analysis: it combines the interactivity of Jupyter notebooks (a very popular web-based application for programming in Python, and the closest thing Python has to R Markdown) with coding assistance similar to PyCharm (an IDE mainly used for professional web development) and RStudio.
The big advantages of using DataSpell over the conventional Anaconda Distribution platform (You can learn more about Anaconda here) can be summarized as:
DataSpell is offered for free to university students, so as part of today's agenda, we will be downloading it, setting it up, and taking a quick tutorial on it before getting down to the Python basics. The following section outlines the steps for this activity.
When creating a new file, you can create either a Python script or a Jupyter notebook. For the purposes of this workshop, we will work within a Jupyter notebook, so it is worth becoming familiar with the shortcuts that will make programming in Python more bearable.
The following is a shortcut guide meant to give you some insight on the key strokes you can use to navigate across a notebook.
Blank cells have been created below so that you can try some of the keystrokes out.
Like R, Python has data types and building blocks that are needed for more intensive computing tasks. This section will cover a few of those basic types and commands.
Like R, Python allows us to assign a range of values to objects, which can then be used to perform a wide set of operations. In this section, we will go over basic object creation and the basic arithmetic operations that can be done.
While in R you can create an object using either the <- or the = operator, in Python you can only use the latter. You can, however, continue to use # to make notes within your lines of code. For example:
import sys
# Creating basic integer objects
A = 5
B = 10
C = 30
Objects created will be stored for viewing in the 'Jupyter Variables' window that can be accessed by clicking on the bottom right tab that also says 'Jupyter Variables'.
Like in R, you can call objects either by using the print() function or by simply typing out the object name.
# Using print function
print(A)
# Simply calling the object
A
We are able to use several arithmetic operators within Python. Using the objects created above, we will execute some of the most commonly used ones.
# Addition
A+B+C
45
# Subtraction
C-B
20
# Multiplication
A*B*C
1500
# Exponentiation
A**A
3125
# Division
C/A
6.0
Dividing with / will always produce a floating-point value (i.e., a number with decimals), even when the result is a whole number. If you want an integer result (i.e., a number without decimals) from two integers, you can use the floor division operator //, which divides and then rounds down.
C//A
6
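As a brief sketch of how floor division behaves in less obvious cases (the variable names here are only for illustration), note that it rounds down rather than toward zero, and that it returns a float if either number is a float:

```python
# Floor division rounds down (toward negative infinity), not toward zero
positive_result = 7 // 2     # 3
negative_result = -7 // 2    # -4, because -3.5 is rounded down

# If either number is a float, the result is a float as well
float_result = 7.0 // 2      # 3.0

# The modulo operator % returns the remainder left over by floor division
remainder = 7 % 2            # 1

print(positive_result, negative_result, float_result, remainder)
```

This matters mostly when negative numbers are involved, where rounding down and dropping the decimals give different answers.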
If you want to convert a floating-point variable or a string into an integer, you can use the int() function. You can use the type() function to determine the type (or class, in Python's lingo) of your object. Alternatively, you can see the class of the object in the Jupyter Variables window.
D = 6.88889
# Using `type()` to show it is a float
print(type(D))
int(D)
<class 'float'>
6
We use the print() function so that both outputs are displayed above. If you do not use print() when you want to see multiple outputs, the cell will only display the result of the last line it executed.
Conversely, you can use the str() function to convert a number into a string (i.e., text or characters).
str(D)
# Like R, you can chain functions inside each other to save space
type(str(D))
str
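The conversions above can be sketched together as follows. This is only an illustration with made-up variable names, but it shows two details worth knowing: int() drops the decimals instead of rounding, and a string that contains decimals has to pass through float() before it can become an integer.

```python
D = 6.88889

truncated = int(D)       # int() drops the decimals: 6
rounded = round(D)       # round() goes to the nearest whole number: 7
as_text = str(D)         # the number stored as text: '6.88889'
from_text = int('42')    # a string of digits converts cleanly: 42

print(truncated, rounded, as_text, from_text)

# int('6.88889') would raise a ValueError, so convert to a float first
via_float = int(float(as_text))
print(via_float)
```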
Operations with strings in Python work slightly differently than in R. In R, you cannot use any arithmetic operators with string objects, whereas in Python you can use the multiplication operator. For example:
E = 'Hello'
E*7
'HelloHelloHelloHelloHelloHelloHello'
As you can see, Python interprets the operation of multiplication as an order to repeat the text stored in the object 'E', seven times.
You can also concatenate string objects together with the addition operator.
# Not accounting for space
print('Python'+'Workshop')
# Accounting for space
print('Python '+'Workshop')
PythonWorkshop
Python Workshop
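One thing to keep in mind is that the addition operator only joins strings with other strings. The sketch below (with a hypothetical 'count' variable) shows two common ways of combining text with a number: converting the number with str(), or using an f-string.

```python
count = 7

# 'Lesson ' + count would raise a TypeError, so convert the number first
label = 'Lesson ' + str(count)
print(label)

# An f-string places the value directly inside the text
label_f = f'Lesson {count}'
print(label_f)
```

Both lines print the same text, so which one you use is mostly a matter of readability.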
Python also supports Boolean operations, which check the truth or falsity of Boolean variables (variables that take on the value of either True or False). The and operator checks both objects and returns True only if both of them are True. The or operator, on the other hand, returns True if at least one of the two objects is True. For example:
variable1 = True
variable2 = False
# Checking to see if both booleans are true
print(variable1 and variable2)
# Checking to see if at least one boolean is true
print(variable1 or variable2)
variable3 = True
# We should get an output of 'True' here
print(variable1 and variable3)
False
True
True
Booleans are also the result of comparison operators, such as > (greater than), < (less than), >= (greater than or equal to), <= (less than or equal to), and == (equal to).
print(5>2)
print(2>5)
print(5==2)
True
False
False
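Comparisons and Boolean operators are often used together. The following is a small sketch using a hypothetical 'age' variable; it also shows the != (not equal to) and not operators, which are not covered above.

```python
age = 25  # hypothetical value used only for this illustration

# != checks that two values are different
is_different = (5 != 2)

# Comparisons can be chained to check that a value falls within a range
in_range = 18 <= age < 65

# Comparisons can be combined with and, or, and not
combined = (age > 18) and not (age > 30)

print(is_different, in_range, combined)
```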
Like in R, it is possible to store multiple values within a single object. These vectors come in different forms, depending on what you want to use them for. In short, these types of vectors are:
We will first discuss tuples and lists, as they are very similar in nature, with one key distinction. Tuples and lists are similar in the following way:
The only distinction between these two vector types is that tuples cannot be changed, whereas lists can. In the context of Python, a tuple is an immutable vector, which means that none of its observations can be modified after it is created.
Why is this important and when should you use tuples over lists or vice versa?
It mostly comes down to memory and time efficiency. Because lists are mutable, Python sets aside some additional memory for each list so that it can be changed at any time. As a result, a list takes up a bit more memory than a tuple holding the same values. How does this affect the time component? Because of this extra overhead, indexing a list can take slightly longer than indexing a tuple. Realistically, the gap in memory and time can be small or large depending on the kind of data you are working with. The following code is meant to demonstrate these principles.
### The first set of code is meant to compare how much memory a tuple and a list use
## Generating data for a tuple | Tuples are created by using parentheses `()` when creating an object or by using the `tuple()` function when creating an object
# importing the random, sys, and time packages to help us generate n values, get system information, and time our code
import random
import sys
import time
random.seed(4)
tuple1 = tuple(random.sample(range(1,5000),4000))
## Generating data for a list | Lists are created by using the brackets `[]` when creating an object or by using the `list()` function alike
random.seed(4)
list1 = list(random.sample(range(1,5000),4000))
## Getting memory sizes of the tuple and list
print(sys.getsizeof(tuple1), 'bytes of memory for the tuple object')
print(sys.getsizeof(list1), 'bytes of memory for the list object')
## Getting system time data for indexing the tuple and list
start_time = time.time()
# The body of a loop must be indented so that Python knows it belongs to the loop
for item in tuple1:
    aa = tuple1[3999]
end_time = time.time()
print("Lookup time for tuple: ", end_time - start_time)
start_time = time.time()
for item in list1:
    aa = list1[3999]
end_time = time.time()
print("Lookup time for list: ", end_time - start_time)
32040 bytes of memory for the tuple object
32056 bytes of memory for the list object
Lookup time for tuple: 0.0005550384521484375
Lookup time for list: 0.0005030632019042969
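With the code above, we can see that while there are differences, they are minimal in this example: the list uses only 16 more bytes than the tuple, and the two lookup times are nearly identical (the list was even marginally faster in this run).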
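Timing a single pass with time.time() can be noisy, since the result depends on whatever else the computer is doing at that moment. As an alternative sketch, Python's built-in timeit module repeats a statement many times and reports the total time taken. The object names and the number of repetitions below are choices made for this illustration, and the exact timings will differ from machine to machine.

```python
import random
import timeit

random.seed(4)
values = random.sample(range(1, 5000), 4000)
tuple_data = tuple(values)
list_data = list(values)

# Each call repeats the lookup 100,000 times and returns the total seconds taken
tuple_seconds = timeit.timeit(lambda: tuple_data[3999], number=100_000)
list_seconds = timeit.timeit(lambda: list_data[3999], number=100_000)

print('Total lookup time for tuple:', tuple_seconds)
print('Total lookup time for list: ', list_seconds)
```

Running this cell a few times is a good way to see how much the numbers move from run to run.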
The following code snippet shows more simply how to create both of these vectors.
# Creating a tuple
tuple2 = (1,2,3.3,'A','B','C',True)
print(type(tuple2))
# Creating a list
list2 = [1,2,3.3,'A','B','C', True]
print(type(list2))
<class 'tuple'> <class 'list'>
Like in R, we can index specific observations from either a tuple or a list. We do this with the [] operator. Unlike R, which begins counting at 1, Python starts counting at 0, so it is important to keep this in mind when indexing an object.
# Indexing 4th observation from tuple
print(tuple2[3])
# Indexing 7th observation from list
print(list2[6])
A
True
We can take it a step further and subset specific observations from either of these objects via 'slicing'. This is done with the syntax 'list[x:y]', which tells Python to grab the observations starting at position x and going up to (but not including) position y. For example:
# Slicing list to get the values that range from 1 all the way up to 3.3 only
print(list2[0:3])
# Slicing list to get all the values that start at A and go to the end
print(list2[3:])
[1, 2, 3.3]
['A', 'B', 'C', True]
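Slicing has a few more variations that are handy to know. The sketch below uses a copy of the list from above under a hypothetical name, 'list_demo', and shows negative indices (which count from the end) and a step value (which skips observations).

```python
list_demo = [1, 2, 3.3, 'A', 'B', 'C', True]

# Negative indices count backwards from the end of the list
last_item = list_demo[-1]
last_three = list_demo[-3:]

# Leaving out the start means "from the beginning"
first_three = list_demo[:3]

# A third value sets the step: here, every second observation
every_other = list_demo[::2]

print(last_item)
print(last_three)
print(first_three)
print(every_other)
```

The same syntax works on tuples, in which case each slice is returned as a new tuple.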
As mentioned above, the main difference between tuples and lists is that only a list can be changed. We can change the content of a list by indexing the position we want and assigning it a new value. This is called 'mutating'.
# Showing original list
print(list2)
# Mutating list by changing the 4th observation (A) to a new letter (J)
list2[3] = 'J'
list2
[1, 2, 3.3, 'A', 'B', 'C', True]
[1, 2, 3.3, 'J', 'B', 'C', True]
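For comparison, here is a sketch of what happens when the same kind of change is attempted on a tuple. The names 'tuple_demo' and 'tuple_changed' are made up for this example. Python raises a TypeError, which we catch here so that the cell keeps running.

```python
tuple_demo = (1, 2, 3.3, 'A')

try:
    # Tuples are immutable, so assigning to a position is not allowed
    tuple_demo[3] = 'J'
except TypeError as error:
    message = str(error)
    print('Could not change the tuple:', message)

# To get a "changed" tuple, build a new one from pieces of the old one
tuple_changed = tuple_demo[:3] + ('J',)
print(tuple_changed)
```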
Now that we have covered tuples and lists, we can talk about dictionaries. Dictionaries are similar to lists, but instead of following a numeric index, each element is assigned a key, and that key is what you use to look up the element. Dictionaries can be created either with the dict() function or by creating an object and using braces '{}'. The following are examples showing both methods.
# Creating a dictionary using `dict()` function
dictionary1 = dict(One='1', Two=2, Three='3', Four=4, Five='5', Apple=True)
print(dictionary1)
# Creating a dictionary using object creation and braces
dictionary2 = {'One':'1', 'Two':2, 'Three':'3', 'Four':4, 'Five':'5', 'Apple':True}
print(dictionary2)
{'One': '1', 'Two': 2, 'Three': '3', 'Four': 4, 'Five': '5', 'Apple': True}
{'One': '1', 'Two': 2, 'Three': '3', 'Four': 4, 'Five': '5', 'Apple': True}
Like indexing tuples and lists, you can index a dictionary using the same '[]' operator and typing the exact name of the key.
# Indexing particular keys from dictionary1
print(dictionary1['Four'])
dictionary1['Apple']
4
True
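Like lists, dictionaries are mutable. The following sketch uses a small hypothetical dictionary named 'scores' to show how to add a key, overwrite a value, and safely look up a key that may not exist.

```python
scores = {'One': '1', 'Two': 2}

# Assigning to a new key adds it to the dictionary
scores['Three'] = 3

# Assigning to an existing key overwrites its value
scores['Two'] = 22

# scores['Ten'] would raise a KeyError; .get() returns a default instead
missing = scores.get('Ten')
fallback = scores.get('Ten', 0)

print(scores)
print(list(scores.keys()))
print('Two' in scores)
print(missing, fallback)
```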
We have been using a variety of functions throughout this lesson to get Python to perform a set of operations. Functions are essentially instructions that the computer takes through arguments in order to produce a desired result. Similar to R, we can use integrated functions or we can create our own set of functions.
The following chunk of code demonstrates some functions that can assist your programming in Python. In addition, the link here showcases a list of the built-in Python functions available to its users.
# The `len()` function tells you the length of an object
# Using len() to find the length of 'list1' created earlier
print(len(list1))
# 'round()' function rounds a desired number to the place that you'd like
print(round(3.14159265359, 2))
# 'max()' function returns the largest value of an object
print(max(list1))
# 'min()' function returns the smallest value of an object
print(min(list1))
# 'sum()' function calculates the sum of all values found within an object
list4 = [1,2,3,4,5,6]
print(sum(list4))
# 'list()' function converts an object into a list
tuple3 = (1,2,3,4,5,6,7,8,9,10)
print(type(tuple3))
list3 = list(tuple3)
print(type(list3))
# `tuple()` function does the reverse: it converts an object into a tuple
tuple4 = tuple(list3)
print(type(tuple4))
4000
3.14
4999
2
21
<class 'tuple'>
<class 'list'>
<class 'tuple'>
Creating functions can be very straightforward, and they can be combined with multiple operators and arguments depending on your desired outcome. For a basic example, we will create a function that calculates the mean of the list of numbers it is given.
# Defining a function that takes the mean of a list
# The lines inside the function must be indented
def function1(X):
    entire_sum = sum(X)
    entire_length = len(X)
    mean = entire_sum/entire_length
    print(mean)
function1(list1)
2499.2215
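The function above prints its result, which is fine for viewing it but does not let you store it in an object. As an alternative sketch, the hypothetical function 'mean_of' below uses return instead, so the result can be assigned and reused. It also raises an error for an empty input rather than failing on a division by zero.

```python
def mean_of(values):
    """Return the arithmetic mean of a non-empty collection of numbers."""
    if len(values) == 0:
        raise ValueError('mean_of() needs at least one value')
    return sum(values) / len(values)

# The returned value can be stored and used in later calculations
average = mean_of([1, 2, 3, 4, 5, 6])
doubled = average * 2
print(average, doubled)
```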
While we created a function to calculate the mean above, there are already packages containing functions that do this and many other operations. These packages are called libraries in Python, and an installed library is loaded into your session with the import statement. You can also assign a library a shorthand name when importing it, which saves you from spelling out its full name every time you use it. In the following examples, we use the import statement followed by "as x", so that Python stores the library under the shorthand name x.
# Importing packages that will be used for the next lesson
import pandas as pd
import numpy as np
# Importing package that can calculate statistical operations such as finding the mean of an object
import statistics as st
From this point, you can use the imported library to conduct operations of interest. In this case, we will use the mean() function from the 'statistics' package to find the mean of 'list1'.
st.mean(list1)
2499.2215
You must use the library name or its shorthand every time you call one of its functions, or else Python will not know which function you are referring to.
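If you only need one or two functions from a library, Python also lets you import them by name, in which case no prefix is needed. The sketch below shows both styles side by side using a small made-up list.

```python
import statistics as st
from statistics import mean, median

numbers = [1, 2, 3, 4, 5, 6]

# Using the shorthand prefix
with_prefix = st.mean(numbers)

# Using functions imported by name, so no prefix is required
without_prefix = mean(numbers)
middle = median(numbers)

print(with_prefix, without_prefix, middle)
```

Importing by name saves typing, but the prefix style makes it easier to see which library a function comes from, which helps once several libraries are in use.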