Getting Started with Pandas for Data Science
While I am a huge fan of R, I love teaching some Python flavours for data science, and my favourite package is statsmodels.
Hello world!
Calculations with variables
Create a variable factor, equal to 1.10. Use savings and factor to calculate the amount of money you end up with after 7 years. Store the result in a new variable, result. Print out the value of result.
\[ \text{result} = \text{savings} \times \text{factor}^{\text{years}} \]
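A minimal sketch of this calculation follows. The text does not give a starting value for savings, so the 100 used here is only an assumed placeholder. Note that Python writes exponentiation as ** rather than ^.

```python
# Assumed starting amount; the text does not specify a value for savings.
savings = 100

# Growth factor from the exercise.
factor = 1.10

# Number of years, as stated in the exercise.
years = 7

# Compound growth: savings * factor^years (exponentiation is ** in Python).
result = savings * factor ** years

# Prints a value of roughly 194.87 for the assumed savings of 100.
print(result)
```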
Other Operations
Equality
- To check if two Python values, or variables, are equal you can use \(==\).
- To check for inequality, you need \(!=\). As a refresher, have a look at the
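The two comparison operators can be tried out with a few throwaway values. The values and variable names below are made up for illustration and are not from the original exercise.

```python
# == checks whether two values are equal.
same_number = 2 == (1 + 1)

# != checks whether two values differ; string comparison is case-sensitive.
different_text = "python" != "Python"

# True compares equal to the integer 1.
bool_equals_int = True == 1

print(same_number, different_text, bool_equals_int)
```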
And Or Not
A boolean is either 1 or 0, True or False. With boolean operators such as and, or and not, you can combine these booleans to perform more advanced queries on your data.
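Here is a small sketch of the three operators. The variables my_kitchen and your_kitchen and their values are assumptions chosen only to make the conditions easy to follow.

```python
# Hypothetical kitchen areas used only for illustration.
my_kitchen = 18.0
your_kitchen = 14.0

# and: True only when both conditions hold (18.0 is not below 18).
in_range = my_kitchen > 10 and my_kitchen < 18

# or: True when at least one condition holds.
either_extreme = my_kitchen < 14 or my_kitchen > 17

# not: flips the result of the expression in parentheses.
not_both_big = not (my_kitchen > 15 and your_kitchen > 15)

print(in_range, either_extreme, not_both_big)
```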
If statements
Examine the if statement that prints out “Looking around in the kitchen.” if room equals “kit”. Write another if statement that prints out “big place!” if area is greater than 15.
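One way the two if statements could look is sketched below. The values assigned to room and area are assumptions, picked so that both messages are printed.

```python
# Assumed values; the original exercise data is not shown in the text.
room = "kit"
area = 18.0

# First if statement: runs only when room equals "kit".
if room == "kit":
    print("Looking around in the kitchen.")

# Second if statement: runs only when area is greater than 15.
if area > 15:
    print("big place!")
```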
If-else statements
- Add an else statement to the second control structure so that “pretty small.” is printed out if area > 15 evaluates to False.
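A possible version with the else branch is sketched here. The value of area is an assumption, chosen below 15 so that the else branch runs, and size_message is a helper name introduced for this example.

```python
# Assumed value, chosen so that area > 15 evaluates to False.
area = 14.0

# The else branch runs whenever the if condition is False.
if area > 15:
    size_message = "big place!"
else:
    size_message = "pretty small."

print(size_message)
```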
Data Structures and Sequences
Python has simple yet powerful data structures and, just like in R, mastering these data structures is a critical part of becoming a proficient Python programmer.
Tuple
A tuple is a fixed-length, immutable sequence of Python objects. The easiest way to create one is with a comma-separated sequence of values.
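The sketch below creates a tuple this way and shows what immutable means in practice. The names tup, nested and first are illustrative.

```python
# A comma-separated sequence of values creates a tuple.
tup = 4, 5, 6

# Parentheses are needed when nesting tuples inside a tuple.
nested = (4, 5, 6), (7, 8)

# Elements are read by position, just like in a list.
first = tup[0]

# Tuples are immutable: assigning to an element raises a TypeError.
try:
    tup[0] = 10
except TypeError as err:
    print("Tuples cannot be modified:", err)

print(tup, nested, first)
```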
List
A list can contain any Python type. Although it’s not really common, a list can also contain a mix of Python types including strings, floats, booleans, etc.
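As a quick illustration, the hypothetical list below mixes several types, and the list comprehension reports the type of each element.

```python
# A hypothetical list mixing a string, a float, a boolean and None.
mixed = ["hallway", 11.25, True, None]

# Collect the type name of each element.
types = [type(item).__name__ for item in mixed]

print(types)
```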
Subsetting lists
Unlike R and some other languages, Python uses zero-based indexing, meaning it starts counting elements from 0.
Subsetting Python lists is a piece of cake. Take the code sample below, which creates a list x and then selects “b” from it. Remember that this is the second element, so it has index 1. You can also use negative indexing.
x = ["a", "b", "c", "d"]
x[1]   # returns 'b'
x[-3]  # also returns 'b': same result!
- Print out the second element from areas
- Print out the last element from areas
- Print out the area of the living room
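The three steps could look like the sketch below. The contents of areas are not shown in the text, so the list here is an assumed example that alternates room names with their areas.

```python
# Assumed example list: room names alternate with their areas.
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0,
         "bedroom", 10.75, "bathroom", 9.50]

# Second element: index 1, because counting starts at 0.
print(areas[1])

# Last element: negative indexing counts from the end.
print(areas[-1])

# Area of the living room: the value right after "living room".
print(areas[5])
```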
Merge lists
- Create the lists first and second
- Paste together first and second to form full
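A minimal sketch of merging two lists with the + operator follows. The values stored in first and second are assumptions.

```python
# Two hypothetical lists of areas.
first = [11.25, 18.0, 20.0]
second = [10.75, 9.50]

# The + operator pastes the lists together into a new list.
full = first + second

print(full)
```

Because + builds a new list, first and second themselves are left unchanged.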
List Methods
Most list methods will change the list they’re called on. Examples are:
- append(), which adds an element to the list it is called on,
- remove(), which removes the first element of a list that matches the input, and
- reverse(), which reverses the order of the elements in the list it is called on.
- Create a list
- Reverse its elements
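The sketch below runs the three methods in turn on a hypothetical list called rooms. Each call changes rooms in place, so nothing is assigned back.

```python
# A hypothetical list of room names.
rooms = ["hallway", "kitchen", "bedroom"]

# append() adds an element to the end of the list.
rooms.append("bathroom")

# remove() deletes the first element that matches its input.
rooms.remove("kitchen")

# reverse() reverses the order of the elements in place.
rooms.reverse()

print(rooms)
```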
More methods
- index(), to get the index of the first element of a list that matches its input, and
- count(), to get the number of times an element appears in a list.
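Both methods can be tried on a hypothetical list that contains a repeated value. Unlike the methods above, they return a result and leave the list untouched.

```python
# A hypothetical list in which 20.0 appears twice.
areas = [11.25, 18.0, 20.0, 10.75, 9.50, 20.0]

# index() returns the position of the first matching element.
position = areas.index(20.0)

# count() returns how many times the value appears.
occurrences = areas.count(20.0)

print(position, occurrences)
```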
Dictionaries
Dictionaries store key:value pairs, for instance:
my_dict = {"key1": "value1", "key2": "value2"}
The values can themselves be dictionaries.
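To see what a dictionary of dictionaries looks like, here is a small sketch. The name countries and the inner keys capital and language are made up for this example.

```python
# Each value is itself a dictionary describing one country.
countries = {
    "france": {"capital": "paris", "language": "french"},
    "spain": {"capital": "madrid", "language": "spanish"},
}

# Chain square brackets to reach a value in the inner dictionary.
french_capital = countries["france"]["capital"]

print(french_capital)
```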
Accessing a dictionary
If the keys of a dictionary are chosen wisely, accessing the values in a dictionary is easy and intuitive. For example, to get the capital for France from europe you can use:
europe['france']
Here, ‘france’ is the key, and the value ‘paris’ is returned.
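A small sketch of dictionary access follows. Only the france entry comes from the text; the other entries of europe are assumed for illustration.

```python
# Assumed contents; only the 'france' entry appears in the text.
europe = {"spain": "madrid", "france": "paris", "germany": "berlin"}

# Square brackets look up the value stored under a key.
capital_of_france = europe["france"]

# get() returns None instead of raising a KeyError for a missing key.
missing = europe.get("norway")

print(capital_of_france, missing)
print(list(europe.keys()))
```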
Manipulating dictionaries
If you know how to access a dictionary, you can also assign a new value to it. To add a new key-value pair to europe you can use something like this:
europe['iceland'] = 'reykjavik'
- Add italy to europe
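Adding italy works the same way as adding iceland, as sketched below. The starting contents of europe are assumed.

```python
# Assumed starting contents of europe.
europe = {"spain": "madrid", "france": "paris"}

# Assigning to a new key adds a key-value pair.
europe["iceland"] = "reykjavik"
europe["italy"] = "rome"

# The in operator checks whether a key is present.
print("italy" in europe)
print(europe)
```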
Lists to dictionaries to DataFrames
Pandas is an open source library, providing high-performance, easy-to-use data structures and data analysis tools for Python. Sounds promising!
The DataFrame is one of Pandas’ most important data structures. It’s basically a way to store tabular data where you can label the rows and the columns. One way to build a DataFrame is from a dictionary.
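The route from lists to a dictionary to a DataFrame can be sketched as follows. The column names, row labels and the cars_per_cap numbers are illustrative assumptions, not real statistics.

```python
import pandas as pd

# Plain Python lists, one per future column (illustrative values).
names = ["United States", "Australia", "Japan"]
drives_right = [True, False, False]
cars_per_cap = [809, 731, 588]

# Dictionary: keys become column labels, lists become column values.
my_dict = {
    "country": names,
    "drives_right": drives_right,
    "cars_per_cap": cars_per_cap,
}

# Build the DataFrame from the dictionary.
cars = pd.DataFrame(my_dict)

# Replace the default 0, 1, 2 row labels with custom ones.
cars.index = ["US", "AUS", "JPN"]

print(cars)
```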
DataFrames and analysis
Indexing
- The simplest, but not the most powerful, way to select columns is to use square brackets.
cars['sepal_length']
cars[['sepal_length']]
The single bracket version gives a Pandas Series, the double bracket version gives a Pandas DataFrame.
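The difference is easy to check with type(). The small cars DataFrame below is made up for this sketch, and its column names are assumptions, so substitute whichever column your own data has.

```python
import pandas as pd

# A small hypothetical DataFrame; column names are assumptions.
cars = pd.DataFrame(
    {"country": ["United States", "Australia", "Japan"],
     "cars_per_cap": [809, 731, 588]},
    index=["US", "AUS", "JPN"],
)

# Single brackets return a Series.
as_series = cars["country"]

# Double brackets return a DataFrame with one column.
as_frame = cars[["country"]]

print(type(as_series))
print(type(as_frame))
```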
Adding on
Square brackets can do more than just selecting columns. You can also use them to get rows, or observations, from a DataFrame. The following call selects the first five rows from the cars DataFrame:
cars[0:5]
The result is another DataFrame containing only the rows you specified.
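Row slicing can be tried on a hypothetical cars DataFrame with seven rows, as below. The row labels and values are illustrative only.

```python
import pandas as pd

# A hypothetical seven-row DataFrame (illustrative values).
cars = pd.DataFrame(
    {"cars_per_cap": [809, 731, 588, 18, 200, 70, 45]},
    index=["US", "AUS", "JPN", "IN", "RU", "MOR", "EG"],
)

# The slice 0:5 selects rows at positions 0 to 4; the end is excluded.
first_five = cars[0:5]

print(first_five)
```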
loc and iloc
With loc and iloc you can do practically any data selection operation on DataFrames you can think of.
- loc is label-based, which means that you have to specify rows and columns based on their row and column labels.
- iloc is integer-position based, so you have to specify rows and columns by their integer position, counting from 0.
The following selections give the same results:
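As a sketch of such equivalent selections, the hypothetical cars DataFrame below is queried once by label with loc and once by position with iloc. The data, column names and row labels are assumptions made for this example.

```python
import pandas as pd

# A small hypothetical DataFrame with custom row labels.
cars = pd.DataFrame(
    {"country": ["United States", "Australia", "Japan"],
     "drives_right": [True, False, False]},
    index=["US", "AUS", "JPN"],
)

# loc: select rows and columns by their labels.
by_label = cars.loc[["AUS", "JPN"], ["country"]]

# iloc: select the same rows and columns by integer position.
by_position = cars.iloc[[1, 2], [0]]

# equals() checks that both selections hold the same data.
print(by_label.equals(by_position))

# A single row works the same way and comes back as a Series.
row_by_label = cars.loc["JPN"]
row_by_position = cars.iloc[2]
print(row_by_label.equals(row_by_position))
```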