# What are ‘skew’ lookin’ at? – calculating skew and kurtosis with Python…


So you have a series of returns you wish to analyse. Mean and variance are easy to calculate, but how easy does Python make it to calculate skew and kurtosis?

When we look at a series of investment returns, we tend to concentrate on the first 2 ‘moments’ of the distribution; that is the mean and the variance of the returns. The mean gives us a representation of the average expected return, and the variance gives us a measure of the dispersion of returns around the mean. But those two measures don’t give us the full picture. There are ‘higher order’ moments to be aware of…

Skewness is a measure of the asymmetry of a distribution. A symmetrical dataset will have a skewness equal to 0, so a normal distribution has a skewness of 0. Skewness essentially measures the relative size of the two tails: a positive value means the right tail is longer or heavier than the left, and a negative value means the opposite.
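To make that definition concrete, here is a minimal sketch that computes skewness by hand as the third central moment divided by the standard deviation cubed. The helper name `sample_skewness` and the two toy arrays are made up for illustration, and this is the simple (biased) moment estimator, which is not the same bias-corrected formula that Pandas uses later on.

```python
import numpy as np

def sample_skewness(x):
    """Biased moment estimator: third central moment / (variance ** 1.5)."""
    x = np.asarray(x, dtype=float)
    deviations = x - x.mean()
    m2 = np.mean(deviations ** 2)  # second central moment (variance with ddof=0)
    m3 = np.mean(deviations ** 3)  # third central moment
    return m3 / m2 ** 1.5

# hypothetical toy data: one symmetric set, one with a long right tail
symmetric = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
right_tailed = np.array([1.0, 1.0, 1.0, 2.0, 10.0])

skew_symmetric = sample_skewness(symmetric)
skew_right = sample_skewness(right_tailed)

print('Symmetric data skew =', skew_symmetric)
print('Right-tailed data skew =', skew_right)
```

The symmetric array has deviations that cancel exactly when cubed, so its skewness is zero, while the single large value in the second array pulls the skewness positive.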

Kurtosis is a measure of the combined sizes of the two tails. It measures the amount of probability in the tails. The value is often compared to the kurtosis of the normal distribution, which is equal to 3. If the kurtosis is greater than 3, then the dataset has heavier tails than a normal distribution (more in the tails). If the kurtosis is less than 3, then the dataset has lighter tails than a normal distribution (less in the tails). Careful here. Kurtosis is sometimes reported as “excess kurtosis.” Excess kurtosis is determined by subtracting 3 from the kurtosis. This makes the normal distribution kurtosis equal 0.
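The difference between kurtosis and excess kurtosis is easy to trip over, so here is a small sketch using `scipy.stats.kurtosis`, whose `fisher` argument switches between the two conventions. The simulated samples, the seed, and the choice of a Student's t distribution with 5 degrees of freedom as the "heavy-tailed" example are all assumptions made purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)  # arbitrary seed, for repeatability only
normal_sample = rng.normal(size=100_000)
heavy_tailed_sample = rng.standard_t(df=5, size=100_000)

# fisher=False: "plain" kurtosis, where a normal distribution scores about 3
plain = stats.kurtosis(normal_sample, fisher=False)
# fisher=True (the default): excess kurtosis, where a normal scores about 0
excess = stats.kurtosis(normal_sample, fisher=True)
# the heavy-tailed sample should show clearly positive excess kurtosis
heavy_excess = stats.kurtosis(heavy_tailed_sample, fisher=True)

print('Normal sample, plain kurtosis  =', plain)
print('Normal sample, excess kurtosis =', excess)
print('t(5) sample, excess kurtosis   =', heavy_excess)
```

The two figures for the normal sample differ by exactly 3, which is all "excess" means; always check which convention a library reports before comparing a number against 3 or 0.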

So let’s move on to using Python to analyse the skew and kurtosis of a returns series.

To get our return series, we will use Pandas to download the historical stock prices for, let’s say Google, and turn that price series into a series of daily percentage returns.

Firstly, we need to import Pandas, along with the ‘data’ module from the ‘pandas_datareader’ package. We then use that module’s ‘DataReader’ function to download the Google price history from Yahoo Finance from a start date of 01/01/2000.

We then use the ‘.head()’ method to show the top 5 lines of the DataFrame we have created to hold the Google price data.

```python
import pandas as pd
from pandas_datareader import data

GOOG = data.DataReader('GOOG', "yahoo", start='01/01/2000')
GOOG.head()
```

We can then add a column to the Pandas DataFrame that holds the percentage daily returns, and call the ‘.head()’ method again to show us the result as follows:

```python
GOOG['Percentage Returns'] = GOOG['Adj Close'].pct_change()
GOOG.head()
```
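If it is not obvious what ‘.pct_change()’ is doing, the sketch below runs it on a tiny made-up price series (the numbers are hypothetical, not Google prices) and compares it with the manual calculation of today’s price divided by yesterday’s price, minus one.

```python
import pandas as pd

# hypothetical prices, chosen so the returns are easy to check by eye
prices = pd.Series([100.0, 110.0, 99.0, 99.0])

returns = prices.pct_change()
# the same thing written out by hand: P_t / P_(t-1) - 1
manual = prices / prices.shift(1) - 1

print(returns)
print(manual)
```

Note that the first value is NaN because there is no previous price to compare against, which is why the missing value needs to be dropped or ignored before fitting anything to the returns.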

We can now see the DataFrame contains the ‘Percentage Returns’ column:

Now we can very easily plot the histogram of returns to show the shape of the distribution:

```python
GOOG['Percentage Returns'].plot(kind='hist', bins=100)
```

Then we can quickly find out the first two moments (mean and variance) of the distribution using the following commands:

```python
# print() calls so the snippet also runs under Python 3
print('Mean =:', GOOG['Percentage Returns'].mean())
print('Variance =:', GOOG['Percentage Returns'].var())
```

Mean =: 0.00112814536595

Variance =: 0.000409815254785

So at first glance, the distribution looks kind of ‘normal’, albeit a bit more peaked and with fatter tails than would be expected with a truly normal distribution. Let’s see if we can overlay a plot of an actual normal distribution, with the same mean and variance as our Google returns, to compare more easily.

```python
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats

# convert the pandas Series to a numpy array (dropping the leading NaN) and sort
h = np.sort(np.asarray(GOOG['Percentage Returns'].dropna()))

# use the scipy stats module to evaluate a normal pdf with the same mean and standard deviation
fit = stats.norm.pdf(h, np.mean(h), np.std(h))

# plot both series on the same axes; density=True replaces the removed normed=True argument
plt.plot(h, fit, '-', linewidth=2)
plt.hist(h, density=True, bins=100)
plt.show()
```

Ok so our suspicions seem to have been confirmed; there is excess kurtosis for sure, but it’s difficult to tell if there is any skewness.

To get confirmation we can just run:

```python
# print() calls so the snippet also runs under Python 3
print('Skew =', GOOG['Percentage Returns'].skew())
print('Kurtosis =', GOOG['Percentage Returns'].kurt())
```

Skew = 0.978932703599

Kurtosis = 10.9417780005

(N.B. As pointed out by James Kilfiger in the comments, the Pandas kurtosis function “returns unbiased kurtosis over requested axis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1”. So this is already “Excess Kurtosis”, and there is no need to subtract 3 from it.)
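A quick way to convince yourself of this is to run ‘.kurt()’ on simulated normal data, where the answer should come out near 0 rather than near 3. The sketch below does that and also compares the result with `scipy.stats.kurtosis` called with `fisher=True, bias=False`, which I am assuming is the matching bias-corrected excess kurtosis; the sample size and seed are arbitrary.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(seed=0)  # arbitrary seed, for repeatability only
sample = pd.Series(rng.normal(size=50_000))

# pandas reports excess kurtosis, so a normal sample should land near 0
pandas_kurt = sample.kurt()
# scipy's bias-corrected excess kurtosis, for comparison
scipy_excess_unbiased = stats.kurtosis(sample.to_numpy(), fisher=True, bias=False)
# add 3 back if you want the "plain" kurtosis that a normal scores 3 on
plain_kurtosis = pandas_kurt + 3

print('pandas .kurt()        =', pandas_kurt)
print('scipy (unbiased)      =', scipy_excess_unbiased)
print('plain kurtosis (+3)   =', plain_kurtosis)
```

So a value like the 10.94 above should be read against a benchmark of 0, not 3.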

And there we have it, confirmation that our Google returns distribution has significant excess kurtosis (about 10.9) and is positively skewed (about 0.98).

So that was pretty darn easy…I’m starting to like this Python language more and more!

Useful article, thanks.

According to docs, the kurt() function returns “Fisher’s definition of kurtosis (kurtosis of normal == 0.0).” In other words, it returns the excess kurtosis, so your “kurt() – 3” above is not needed or correct. Instead you can just use the value 10.9.

Do I understand this correctly?

Hi James…Really good spot there! You’re 100% correct…that’ll teach me to read the docs more carefully in future!