Getting Started with Pandas for Data Science

Note

While I am a huge fan of R, I love teaching some Python flavours for data science, and my favourite package is statsmodels.

Hello world!
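The customary first program, as a minimal sketch. The variable name greeting is only for illustration.

```python
# Store the text in a variable, then print it.
greeting = "Hello world!"
print(greeting)
```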

Calculations with variables

Create a variable factor, equal to 1.10. Use savings and factor to calculate the amount of money you end up with after 7 years. Store the result in a new variable, result. Print out the value of result.

\[ \text{result} = \text{savings} \times \text{factor}^{\text{years}} \]
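One possible solution to this exercise is sketched here. The text does not give a starting value for savings, so the 100 used here is an assumption; factor and the 7 years come from the exercise itself.

```python
# Assumed starting amount; the exercise text does not state it.
savings = 100
factor = 1.10   # growth factor from the exercise
years = 7       # number of years from the exercise

# Python uses ** for exponentiation (^ is the bitwise XOR operator).
result = savings * factor ** years
print(result)
```

Note that the caret in the formula is mathematical notation only; writing factor ^ years in Python would not compute a power.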

Other Operations

Equality

  • To check if two Python values, or variables, are equal you can use ==.
  • To check for inequality, you need !=. As a refresher, have a look at the
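The comparisons below sketch both operators. The values and the variable names a, b and same are illustrative and not from the original exercise.

```python
# Comparing values directly
print(2 == 1 + 1)             # True
print("python" != "Python")   # True: string comparison is case-sensitive

# Comparing variables
a = 3
b = 3.0
same = a == b                 # an int and a float with the same value are equal
print(same)                   # True
```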

And Or Not

A boolean is either 1 or 0, True or False. With boolean operators such as and, or and not, you can combine these booleans to perform more advanced queries on your data.
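A short sketch of the three operators follows. The variables my_kitchen and your_kitchen and their values are assumptions made up for illustration.

```python
# Hypothetical values for illustration
my_kitchen = 18.0
your_kitchen = 14.0

# and: both conditions must hold
both = my_kitchen > 10 and my_kitchen < 18
print(both)       # False, because 18.0 is not below 18

# or: at least one condition must hold
either = my_kitchen < 14 or my_kitchen > 17
print(either)     # True

# not: flips a boolean
negated = not your_kitchen > 15
print(negated)    # True

# True and False behave like 1 and 0 in arithmetic
print(True + True)   # 2
```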

If statements

Examine the if statement that prints out “Looking around in the kitchen.” if room equals “kit”. Write another if statement that prints out “big place!” if area is greater than 15.
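A sketch of the two if statements described above. The values assigned to room and area are assumptions, since the exercise text does not show them.

```python
# Assumed values; the exercise does not say what room and area hold.
room = "kit"
area = 20.0

# First control structure: runs only when room equals "kit"
if room == "kit":
    print("Looking around in the kitchen.")

# Second control structure: runs only when area is greater than 15
if area > 15:
    print("big place!")
```

With these assumed values both messages are printed. An if statement without an else prints nothing when its condition is False.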

If-else statements

  • Add an else statement to the second control structure so that “pretty small.” is printed out if area > 15 evaluates to False.
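A sketch of the extended control structure. The value of area is an assumption, and the variable size is introduced here only so the outcome can be inspected.

```python
# Assumed value; chosen so that area > 15 evaluates to False.
area = 14.0

if area > 15:
    size = "big place!"
else:
    # Reached only when the condition above is False
    size = "pretty small."

print(size)   # pretty small.
```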

Data Structures and Sequences

Python has simple yet powerful data structures and, just as in R, mastering them is a critical part of becoming a proficient Python programmer.

Tuple

A tuple is a fixed-length, immutable sequence of Python objects. The easiest way to create one is with a comma-separated sequence of values.
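A minimal sketch of creating a tuple and of what immutability means in practice. The names tup and nested and the values are illustrative.

```python
# A comma-separated sequence of values creates a tuple; parentheses are optional.
tup = 4, 5, 6
print(tup)            # (4, 5, 6)

# Parentheses are needed to build a tuple of tuples.
nested = (4, 5, 6), (7, 8)
print(len(nested))    # 2

# Tuples are immutable: assigning to an element raises a TypeError.
try:
    tup[0] = 10
except TypeError as err:
    print("cannot modify a tuple:", err)
```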

List

A list can contain any Python type. Although it’s not really common, a list can also contain a mix of Python types including strings, floats, booleans, etc.
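As a sketch, a list mixing several types might look like this. The list name areas and its contents are assumptions for illustration.

```python
# Hypothetical list mixing strings, floats and a boolean
areas = ["hallway", 11.25, "kitchen", 18.0, True]

print(type(areas))                      # <class 'list'>

# Record the type name of each element
kinds = [type(value).__name__ for value in areas]
print(kinds)                            # ['str', 'float', 'str', 'float', 'bool']
```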

Subsetting lists

Unlike R and some other languages, Python uses zero-based indexing: it starts counting elements from 0.

Subsetting Python lists is a piece of cake. Take the code sample below, which creates a list x and then selects “b” from it. Remember that this is the second element, so it has index 1. You can also use negative indexing.

x = ["a", "b", "c", "d"]
x[1]    # "b"
x[-3]   # "b", same result!

Merge lists

Create lists first and second

Paste together first and second: full
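A sketch of these two steps, with made-up values for first and second. The + operator concatenates two lists into a new one.

```python
# Hypothetical contents for the two lists
first = [11.25, 18.0, 20.0]
second = [10.75, 9.50]

# Paste the lists together; first and second themselves are unchanged
full = first + second
print(full)   # [11.25, 18.0, 20.0, 10.75, 9.5]
```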

List Methods

Most list methods will change the list they’re called on. Examples are:

  • append(), which adds an element to the list it is called on,
  • remove(), which removes the first element of a list that matches the input, and
  • reverse(), which reverses the order of the elements in the list it is called on.

Create a list

Reverse the elements
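The three methods can be sketched on a small list. The list areas and its values are assumptions; each call changes the list in place and returns None.

```python
# Create a list (hypothetical values)
areas = [11.25, 18.0, 20.0]

areas.append(10.75)   # now [11.25, 18.0, 20.0, 10.75]
areas.remove(18.0)    # now [11.25, 20.0, 10.75]
areas.reverse()       # now [10.75, 20.0, 11.25]

print(areas)          # [10.75, 20.0, 11.25]
```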

More methods

  • index() - to get the index of the first element of a list that matches its input and
  • count() - to get the number of times an element appears in a list.
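Unlike the methods above, index() and count() leave the list unchanged and return a value. A sketch with illustrative values:

```python
# Hypothetical list in which 18.0 appears twice
areas = [11.25, 18.0, 20.0, 10.75, 18.0]

position = areas.index(20.0)     # index of the first match
occurrences = areas.count(18.0)  # number of times 18.0 appears

print(position)      # 2
print(occurrences)   # 2
```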

Dictionaries

Dictionaries can contain key:value pairs where the values are again dictionaries.

my_dict = {"key1": "value1", "key2": "value2"}

for instance
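A sketch of a flat dictionary and a nested one. The name europe appears later in the text, but its contents are not shown, so the countries chosen here are an assumption; europe_details and its inner keys are hypothetical.

```python
# Flat dictionary: country -> capital (contents assumed for illustration)
europe = {
    "spain": "madrid",
    "france": "paris",
    "germany": "berlin",
    "norway": "oslo",
}

# Nested dictionary: each value is itself a dictionary
europe_details = {
    "france": {"capital": "paris", "language": "french"},
    "germany": {"capital": "berlin", "language": "german"},
}

# Chain square brackets to reach into the inner dictionary
print(europe_details["france"]["capital"])   # paris
```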

Accessing a dictionary

If the keys of a dictionary are chosen wisely, accessing the values in a dictionary is easy and intuitive. For example, to get the capital for France from europe you can use:

europe['france']

Here, ‘france’ is the key, and the value ‘paris’ is returned.

Manipulating dictionaries

If you know how to access a dictionary, you can also assign a new value to it. To add a new key-value pair to europe you can use something like this:

europe['iceland'] = 'reykjavik'

  • Add italy to europe
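A sketch of the exercise. The starting contents of europe are an assumption, since the original definition is not shown.

```python
# Assumed starting dictionary
europe = {"spain": "madrid", "france": "paris", "germany": "berlin"}

# Add new key-value pairs by assigning to a key that does not exist yet
europe["iceland"] = "reykjavik"
europe["italy"] = "rome"

# The in operator checks whether a key is present
print("italy" in europe)   # True
print(europe["italy"])     # rome
```

Assigning to a key that already exists overwrites its value rather than adding a second entry.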

Lists to dictionary to DataFrames

Pandas is an open source library, providing high-performance, easy-to-use data structures and data analysis tools for Python. Sounds promising!

The DataFrame is one of Pandas’ most important data structures. It’s basically a way to store tabular data where you can label the rows and the columns. One way to build a DataFrame is from a dictionary.
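A minimal sketch of going from lists to a dictionary to a DataFrame, assuming pandas is installed. The list names, column names and row labels are all made up for illustration.

```python
import pandas as pd

# Two plain lists (hypothetical data)
names = ["spain", "france", "germany"]
capitals = ["madrid", "paris", "berlin"]

# Dictionary: keys become column labels, lists become column values
my_dict = {"country": names, "capital": capitals}

# Build the DataFrame from the dictionary
df = pd.DataFrame(my_dict)

# Replace the default integer row labels with custom ones
df.index = ["ES", "FR", "DE"]
print(df)
```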

DataFrames and analysis

Indexing

  • The simplest, but not the most powerful, way is to use square brackets.

cars['sepal_length']
cars[['sepal_length']]

The single bracket version gives a Pandas Series, the double bracket version gives a Pandas DataFrame.
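The difference can be checked on a small stand-in DataFrame, assuming pandas is installed. The contents of cars are not shown in the text, so the values and the second column below are assumptions.

```python
import pandas as pd

# Stand-in for the cars DataFrame; the values are made up
cars = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7],
    "sepal_width": [3.5, 3.0, 3.2],
})

as_series = cars["sepal_length"]     # single brackets
as_frame = cars[["sepal_length"]]    # double brackets: a list of column names

print(type(as_series).__name__)      # Series
print(type(as_frame).__name__)       # DataFrame
```

Because the double-bracket form takes a list, it can also select several columns at once, for example cars[["sepal_length", "sepal_width"]].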

Adding on

Square brackets can do more than just selecting columns. You can also use them to get rows, or observations, from a DataFrame. The following call selects the first five rows from the cars DataFrame:

cars[0:5]

The result is another DataFrame containing only the rows you specified.
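A sketch of row slicing on a stand-in DataFrame, assuming pandas is installed. The values in cars are made up.

```python
import pandas as pd

# Stand-in for the cars DataFrame with seven rows; values are made up
cars = pd.DataFrame({"sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6]})

# A slice inside square brackets selects rows by position; the end is excluded
first_five = cars[0:5]

print(type(first_five).__name__)   # DataFrame
print(first_five.shape)            # (5, 1)
```

Only slices select rows this way. A single value such as cars[0] is treated as a column label and raises a KeyError when no column has that name.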

loc and iloc

With loc and iloc you can do practically any data selection operation on DataFrames you can think of.

  • loc is label-based, which means that you have to specify rows and columns based on their row and column labels.
  • iloc is integer index based, so you have to specify rows and columns by their integer index.

The following calls give the same results:
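A sketch of equivalent loc and iloc selections, assuming pandas is installed. The stand-in cars DataFrame, its values and the row labels a, b and c are assumptions for illustration.

```python
import pandas as pd

# Stand-in DataFrame with explicit row labels; values are made up
cars = pd.DataFrame(
    {"sepal_length": [5.1, 4.9, 4.7], "sepal_width": [3.5, 3.0, 3.2]},
    index=["a", "b", "c"],
)

# Label-based: rows "a" and "b", column "sepal_length"
by_label = cars.loc[["a", "b"], ["sepal_length"]]

# Position-based: rows 0 and 1, column 0
by_position = cars.iloc[[0, 1], [0]]

print(by_label.equals(by_position))   # True

# Slices differ: loc includes the end label, iloc excludes the end position
print(cars.loc["a":"b"].equals(cars.iloc[0:2]))   # True
```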