What are ‘skew’ lookin’ at? – calculating skew and kurtosis with Python…

by s666
python skew kurtosis

So you have a series of returns you wish to analyse. Mean and variance are easy to calculate, but how easy does Python make it to calculate skew and kurtosis?

When we look at a series of investment returns, we tend to concentrate on the first 2 ‘moments’ of the distribution; that is the mean and the variance of the returns. The mean gives us a representation of the average expected return, and the variance gives us a measure of the dispersion of returns around the mean. But those two measures don’t give us the full picture. There are ‘higher order’ moments to be aware of…

Skewness is a measure of the asymmetry of a distribution. A symmetrical dataset will have a skewness equal to 0, so a normal distribution has a skewness of 0. Skewness essentially measures the relative size of the two tails: a positive value means the right tail is longer or heavier, and a negative value means the left tail is.
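
As a minimal sketch of that definition (not part of the original walk-through), skewness can be computed as the third central moment divided by the cube of the standard deviation. The sample values and the helper name `moment_skew` are made up for illustration, and `scipy.stats.skew` is used only as a cross-check.

```python
import numpy as np
from scipy import stats

def moment_skew(x):
    """Population (biased) skewness: third central moment / std**3."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    return np.mean(dev**3) / np.mean(dev**2) ** 1.5

symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]     # mirror image around 0
right_tailed = [1.0, 1.0, 1.0, 1.0, 10.0]   # one large positive value

print(moment_skew(symmetric))       # 0.0
print(moment_skew(right_tailed))    # 1.5 (up to rounding): long right tail
print(stats.skew(right_tailed))     # SciPy's default (biased) estimate agrees
```

The single large value on the right is enough to push the skewness well above zero, which is the "relative size of the two tails" idea in numbers.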

Kurtosis is a measure of the combined weight of the two tails. It is often described as the amount of probability in the tails, although, as Peter Westfall points out in the comments, it is more accurately a measure of how extreme the tail values are. The value is often compared to the kurtosis of the normal distribution, which is equal to 3. If the kurtosis is greater than 3, the dataset has heavier tails than a normal distribution; if it is less than 3, the dataset has lighter tails. Careful here: kurtosis is sometimes reported as “excess kurtosis”, which is obtained by subtracting 3 from the kurtosis, so that the normal distribution has an excess kurtosis of 0.
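
To make the "3 versus 0" convention concrete, here is a small sketch under stated assumptions: the sample values and the helper `moment_kurtosis` are invented for illustration, while `scipy.stats.kurtosis` and its `fisher` flag are the real SciPy API for switching between the two conventions.

```python
import numpy as np
from scipy import stats

def moment_kurtosis(x):
    """Population (biased) Pearson kurtosis: fourth central moment / variance**2."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    return np.mean(dev**4) / np.mean(dev**2) ** 2

sample = [-2.0, -1.0, 0.0, 1.0, 2.0]   # made-up values

pearson = moment_kurtosis(sample)      # scale on which a normal distribution gives 3
excess = pearson - 3.0                 # scale on which a normal distribution gives 0

# SciPy exposes both conventions through the `fisher` flag
scipy_pearson = stats.kurtosis(sample, fisher=False)
scipy_excess = stats.kurtosis(sample, fisher=True)

# Theoretical excess kurtosis of two reference distributions
normal_excess = float(stats.norm.stats(moments="k"))
laplace_excess = float(stats.laplace.stats(moments="k"))

print(pearson, excess)
print(scipy_pearson, scipy_excess)
print(normal_excess, laplace_excess)
```

The evenly spread sample comes out below 3 (lighter tails than a normal), while the Laplace distribution, with its heavier tails, has a positive excess kurtosis.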

So let’s move on to using Python to analyse the skew and kurtosis of a returns series.

To get our return series, we will use Pandas to download the historical stock prices for, let’s say Google, and turn that price series into a series of daily percentage returns.

Firstly, we need to import Pandas, along with the ‘data’ module from the ‘pandas_datareader’ package. We then use its ‘DataReader’ function to download Google price history from Yahoo Finance from a start date of 01/01/2000.

We then use the ‘.head()’ method to show the top 5 lines of the DataFrame we have created to hold the Google price data.

import pandas as pd
from pandas_datareader import data
GOOG = data.DataReader('GOOG', "yahoo", start='01/01/2000')
GOOG.head()
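
If the download is not available to you (for example, no network access), a synthetic stand-in lets you follow the remaining steps anyway. This is only a sketch: the name `demo`, the seed, the date range and the Student's t parameters are all assumptions chosen for illustration, and the numbers it produces will not match the Google figures quoted in this post.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)                       # fixed seed so the run is repeatable
dates = pd.bdate_range("2015-01-01", periods=1000)    # business days only

# Heavy-tailed daily returns (Student's t, 4 degrees of freedom), scaled down
fake_returns = 0.01 * rng.standard_t(df=4, size=len(dates))

# Compound the returns into a price path starting at 100
demo = pd.DataFrame(
    {"Adj Close": 100.0 * np.cumprod(1.0 + fake_returns)},
    index=dates,
)
print(demo.head())
```

Because the frame has an ‘Adj Close’ column, the later snippets should work on it if you substitute `demo` for `GOOG`.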

We can then add a column to the Pandas DataFrame that holds the percentage daily returns, and call the ‘.head()’ method again to show us the result as follows:

GOOG['Percentage Returns'] = GOOG['Adj Close'].pct_change()

GOOG.head()
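
For anyone unsure what ‘.pct_change()’ does, here is a tiny sketch on made-up prices (the values are purely illustrative). Each return is the current price divided by the previous one, minus 1, and the first row is NaN because it has no previous price.

```python
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0])   # made-up prices
returns = prices.pct_change()               # p_t / p_(t-1) - 1

# The same calculation written out by hand
manual = prices / prices.shift(1) - 1

print(returns.tolist())   # first entry is nan, then roughly 0.10 and -0.10
print(manual.tolist())
```

That leading NaN is why the later code calls ‘.dropna()’ before handing the returns to NumPy.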

We can now see the DataFrame contains the ‘Percentage Returns’ column.

Now we can very easily plot the histogram of returns to show the shape of the distribution:

GOOG['Percentage Returns'].plot(kind='hist',bins=100)

Then we can quickly find out the first two moments (mean and variance) of the distribution using the following commands:

print('Mean =:', GOOG['Percentage Returns'].mean())
print('Variance =:', GOOG['Percentage Returns'].var())

Mean =: 0.00112814536595
Variance =: 0.000409815254785
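
One detail worth knowing when comparing numbers across libraries: the Pandas ‘.var()’ and ‘.std()’ methods divide by N-1 by default (ddof=1), whereas NumPy's `np.var` and `np.std` divide by N (ddof=0). A short sketch on made-up values shows the difference; on a long returns series the gap is negligible.

```python
import numpy as np
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0, 4.0])      # made-up values

sample_var = x.var()                      # pandas default ddof=1 -> divides by N-1
population_var = np.var(x.to_numpy())     # NumPy default ddof=0 -> divides by N
matched = np.var(x.to_numpy(), ddof=1)    # ask NumPy for the pandas convention

print(sample_var)       # 5/3, about 1.667
print(population_var)   # 1.25
print(matched)          # same as sample_var
```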

So at first glance, the distribution looks kind of ‘normal’, albeit a bit more peaked and with fatter tails than would be expected with a truly normal distribution. Let’s see if we can overlay a plot of an actual normal distribution, with the same mean and variance as our Google returns, to compare more easily.

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats

#convert pandas Series to a numpy array and sort
h = np.asarray(GOOG['Percentage Returns'].dropna())
h = np.sort(h)

#use the scipy stats module to fit a normal distribution with same mean and standard deviation
fit = stats.norm.pdf(h, np.mean(h), np.std(h))

#plot both series on the histogram
plt.plot(h,fit,'-',linewidth = 2)
#density=True replaces the 'normed' argument, which newer Matplotlib releases no longer accept
plt.hist(h,density=True,bins = 100)
plt.show()

Ok so our suspicions seem to have been confirmed; there is excess kurtosis for sure, but it’s difficult to tell if there is any skewness.
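
The "fat tails" impression from the plot can also be checked numerically by counting how often returns land more than three standard deviations from the mean and comparing that with what a normal distribution would give. The sketch below uses simulated heavy-tailed returns as a stand-in (an assumption, so it runs without any download); swap in the real returns array to try it on the Google data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
r = 0.01 * rng.standard_t(df=5, size=100_000)   # simulated stand-in returns

z = (r - r.mean()) / r.std()            # standardise to mean 0, std 1
observed = np.mean(np.abs(z) > 3)       # share of observations beyond 3 std devs
expected = 2 * stats.norm.sf(3)         # same share under a normal, about 0.0027

print(observed, expected)
```

A heavy-tailed series shows a noticeably larger observed share than the normal benchmark.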

To get confirmation we can just run:

print('Skew =', GOOG['Percentage Returns'].skew())
print('Kurtosis =', GOOG['Percentage Returns'].kurt())

Skew = 0.978932703599
Kurtosis = 10.9417780005

(N.B. As pointed out by James Kilfiger in the comments, the Pandas kurtosis function “returns unbiased kurtosis over requested axis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1”. So this is already “Excess Kurtosis”.)
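
A quick way to convince yourself of this is to compare the Pandas results with SciPy, where the convention is explicit. This is a sketch on made-up returns (the values are illustrative only); `fisher=True` selects excess kurtosis and `bias=False` requests the bias-corrected estimator, which is the combination that should line up with Pandas.

```python
import pandas as pd
from scipy import stats

x = pd.Series([0.01, -0.02, 0.015, 0.03, -0.05, 0.005, 0.12])   # made-up returns
values = x.to_numpy()

pandas_excess = x.kurt()                                        # Fisher, bias-corrected
scipy_excess = stats.kurtosis(values, fisher=True, bias=False)  # same convention
pearson_equivalent = pandas_excess + 3                          # "normal == 3" scale

pandas_skew = x.skew()                                          # bias-corrected skewness
scipy_skew = stats.skew(values, bias=False)

print(pandas_excess, scipy_excess, pearson_equivalent)
print(pandas_skew, scipy_skew)
```

If you ever need the "normal equals 3" figure, add 3 to the Pandas result rather than subtracting it.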

And there we have it, confirmation that our Google returns distribution has significant excess kurtosis and is slightly positively skewed.

So that was pretty darn easy…I’m starting to like this Python language more and more!

4 comments

James Kilfiger February 4, 2017 - 9:43 pm

Useful article, thanks.

According to docs, the kurt() function returns “Fisher’s definition of kurtosis (kurtosis of normal == 0.0).” In other words, it returns the excess kurtosis, so your “kurt() – 3” above is not needed or correct. Instead you can just use the value 10.9.

Do I understand this correctly?

s666 February 5, 2017 - 9:54 am

Hi James…Really good spot there! You’re 100% correct…that’ll teach me to read the docs more carefully in future!

Peter Westfall April 7, 2019 - 6:25 pm

Higher kurtosis implies greater extremity of tails rather than greater probability in the tails. The probability in the tails might decrease, but with large enough corresponding tail extension, the kurtosis will increase without bound. A simple example is the standardized Bernoulli distribution (a two-point distribution scaled so that the mean is zero and the variance is 1.0). The smaller of the two probabilities is the tail probability. As this tail probability tends toward zero, two things happen: (i) the tail, which is the possible data value corresponding to the smaller probability, tends toward either +infinity or -infinity, (ii) kurtosis tends toward infinity.

s666 April 7, 2019 - 6:56 pm

Hi Peter – Very interesting point, and an example you can’t argue with!! I am very aware that statistical properties and values often carry with them many subtle, but important characteristics and resulting effects on proper interpretation. I have to admit I have always interpreted Kurtosis as being the combined probability held in the two tails…but as I said, you have clearly given a simple example which proves this to be incorrect.

Also if you are Mr Peter H. Westfall of the following publication (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4321753/) then who am I to argue!! 😉

I appreciate you taking the time to comment and pass on the correct understanding of Kurtosis!

I will update the body of the blog post and link to your research paper if you don’t mind.
