# Python Pandas…Eats, shoots and leaves…?

Ok so I thought it was about time we got into the nitty gritty of some “financial analysis”, rather than have me spout info regarding available beginners courses like I’m some sort of salesman. “I want to do some analysis”, I hear you scream…

Well ok then…let’s start. With the Python Pandas module.

So financial analysis usually starts by asking questions. I mean, our analysis has to be focused on something right? Data/financial analysis is just a tool, rather than an end in itself.

So to begin with, we’re going to try to answer 3 very simple questions regarding various stocks we may have in mind:

1) What was the change in the price of the stock over time, along with the volume traded of that stock?

2) What was the average daily return of the stock?

3) What is the correlation between this stock and multiple other stocks we may have in mind?

To carry out this analysis we will be using a python module called Pandas. Pandas is an absolutely awesome Python module; so powerful and easy to use. The name is derived from the term “panel data”, an econometrics term for multidimensional structured data sets.

Pandas is based around an object called a “DataFrame”, which is kind of similar in theory to a data table found in Excel. I believe the wide ranging use of Excel in the financial world is what motivated Wes McKinney (the creator of Pandas) to create Pandas in the first place.

The first thing we need to do is install the pandas module (if you don’t already have it), along with a few other packages.

If you have Python already installed, along with “pip” then it’s as easy as going to the command line and typing:

You can do the same for matplotlib, numpy and seaborn (you may already have these preinstalled with your python distribution anyway)

Note: you may need to upgrade your pip package – if so go to here to find out how.

Once Pandas and the other modules are installed, we need to begin by importing those modules:

import pandas as pd import numpy as np import matplotlib.pyplot as plt import seaborn as sns sns.set_style('whitegrid') from pandas.io.data import DataReader |

FYI the “sns.set_style(‘whitegrid’)” just sets us up to use a nice pre set plot scheme, provided by the seaborn library. Hopefully it will just make our charts look s a little nicer.

The “from pandas.io.data import DataReader” imports a neat little Pandas capability that allows us to download stock data directly from Yahoo Finance. Can’t ask for better than that, eh?

Then, let us choose a few stocks to analyse. If we take the tech sector we could choose Apple, Google, Microsoft and Amazon.

We can download each of those stocks’ data into a DataFrame from Yahoo as follows:

start_date = '2010/01/01' AAPL = DataReader('AAPL', "yahoo",start_date) GOOG = DataReader('AAPL', "yahoo",start_date) MSFT = DataReader('MSFT', "yahoo",start_date) AMZN = DataReader('AMZN', "yahoo",start_date) |

If we then print out one of those DataFrames using the built in Pandas function “.head()” (which prints the top 5 results of the DataFrame) we get the following:

AAPL.head() |

Once we have our data in our DataFrames, creating a plot is as simple as using the built in “plot” method and just typing:

AAPL['Adj Close'].plot() |

Which produces the following chart

We can even very simply chart out the volume traded by using the following command:

AAPL['Volume'].plot() |

Which gives us

Calculating the average daily return of the stock is very simple, and involves 2 steps. Firstly, we need to convert the price series into a daily percent returns series. We can do this using the “pct_change()” function and then the “mean()” function:

AAPL['Daily Return'] = AAPL['Adj Close'].pct_change() AAPL['Daily Return'].mean() |

This gives us the output

`0.0012557649005606581` |

Showing that the average daily return over the period 01/01/2010 – 23/03/2016 was 0.12558%. Fantastic!

Ok now let’s look at some correlation calculations. Firstly we need to get a DataFrame set up that holds only the ‘Adjusted Close” prices for each of our 4 tech stocks. We can do this with the following command:

tech_list = ['APPL', 'MSFT', 'GOOG', 'AMZN'] closing_df = DataReader(tech_list,'yahoo',start_date)['Adj Close'] |

Then run the “pct_change()” function on that DataFrame and store it in a new DataFrame called “rets” as follows:

rets = closing_df.pct_change() |

Once we have these daily returns, we can then use the Seaborn “jointplot” function to get a visual representation of the correlation between two of the stocks, along with the “pearson r” value as follows – for example using Google and Microsoft.

sns.jointplot('GOOG', 'MSFT',rets, kind='scatter') |

So I hope you can see from this brief introduction to the power of Pandas (along with numpy, matplotlib and seaborn) that it is more than a worthwhile addition to our module library.

This is just a hint of what lies beneath…stay tuned and hopefully I can bring you some more soon.

Thank you so much. Really nice explanation!

Just one thing:

The module pandas.io.data has been moved. So for those that does not know how to solve it, you have to change pandas.io.data for pandas_datareader.

Thanks very much for your comment, that is indeed the case – I shall try to update the post when I get some free time!