Python Backtesting Mean Reversion – Part 3

Welcome back everyone, finally I have found a little time to get around to finishing off this short series on Python Backtesting Mean Reversion strategy on ETF pairs.

In the last post we got as far as creating the spread series between the two ETF price series in question (by first running a linear regression to find the hedge ratio) and ran an Augmented Dickey Fuller test, along with calculating the half-life of that spread series to see whether it was a decent candidate for a tradable strategy pair.

Now we have to write the part of the script that will calculate the “normalised” Z-Score of the spread series, and set up a “bollinger-band” style entry and exit system whereby short trades are entered into if the normalised Z-Score rises above 2, and exits when it falls below 0, and vice-versa for long trades (i.e. the Z-Score has to fall below -2 to enter and exit when it rises above 0).

Now we are actually going to calculate the Z-Score by using a rolling window for the mean and standard deviation, set as the half-life previously calculated in the previous blog post. This saves us from either committing a look-forward bias by using the mean across the whole period, or from choosing an arbitrary look-back window that would need to be optimised and could lead to data-mining bias.

We calculate and plot the normalised Z-Score as follows:

meanSpread = df1.spread.rolling(window=halflife).mean()
stdSpread = df1.spread.rolling(window=halflife).std()
 
df1['zScore'] = (df1.spread-meanSpread)/stdSpread
 
df1['zScore'].plot()

z_score

Ok so this next part can be a little fiddly; what we need to end up with is a column in our dataframe that signifies whether we should currently be long, short or flat in terms of position. We can accomplish this by following a couple of steps.

Firstly we will set up a column called “num units long” which will signify when we need to be long by filling those rows with a 1, and fill the remaining rows with a 0 to signify no long position.

We will then do exactly the same for short positions by setting up a columns called “num units short” and fill those rows where we should be short with a -1 and those where there is no short position with a 0.

This is achieved as follows (we also set our absolute entry and exit Z-Scores as 2 and 0 respectively):

entryZscore = 2
exitZscore = 0
 
#set up num units long             
df1['long entry'] = ((df1.zScore < - entryZscore) & ( df1.zScore.shift(1) > - entryZscore))
df1['long exit'] = ((df1.zScore > - exitZscore) & (df1.zScore.shift(1) < - exitZscore)) 
df1['num units long'] = np.nan 
df1.loc[df1['long entry'],'num units long'] = 1 
df1.loc[df1['long exit'],'num units long'] = 0 
df1['num units long'][0] = 0 
df1['num units long'] = df1['num units long'].fillna(method='pad') 
 
#set up num units short 
df1['short entry'] = ((df1.zScore >  entryZscore) & ( df1.zScore.shift(1) < entryZscore))
df1['short exit'] = ((df1.zScore < exitZscore) & (df1.zScore.shift(1) > exitZscore))
df1.loc[df1['short entry'],'num units short'] = -1
df1.loc[df1['short exit'],'num units short'] = 0
df1['num units short'][0] = 0
df1['num units short'] = df1['num units short'].fillna(method='pad')

Now we can just create another column, which sums the “num units long” and “num units short” to get us our “numUnits” – the overall position that our portfolio should be in at that time; either long (1), short (-1) or flat (0).

We will also generate a column containing the percentage change of the spread series, and then generate a portfolio return column by multiplying the percentage change of the spread series by the current holding of the portfolio (long, short or flat).

The daily portfolio returns are then cumulatively added to generate an equity curve, held in “cum rets”.

df1['numUnits'] = df1['num units long'] + df1['num units short']
df1['spread pct ch'] = (df1['spread'] - df1['spread'].shift(1)) / ((df1['x'] * abs(df1['hr'])) + df1['y'])
df1['port rets'] = df1['spread pct ch'] * df1['numUnits'].shift(1)
 
df1['cum rets'] = df1['port rets'].cumsum()
df1['cum rets'] = df1['cum rets'] + 1

We can now plot the portfolio equity curve as follows:

plt.plot(df1['cum rets'])
plt.xlabel(i[1])
plt.ylabel(i[0])
plt.show()

equitycurve

Now all we have left to do is calculate the Sharpe Ratio and the Compound Annual Growth Rate (CAGR):

sharpe = ((df1['port rets'].mean() / df1['port rets'].std()) * sqrt(252))  
 
start_val = 1
end_val = df1['cum rets'].iat[-1]
 
start_date = df1.iloc[0].name
end_date = df1.iloc[-1].name
days = (end_date - start_date).days
 
CAGR = round(((float(end_val) / float(start_val)) ** (252.0/days)) - 1,4)
 
 
print "CAGR = {}%".format(CAGR*100)
print "Sharpe Ratio = {}".format(round(sharpe,2))
CAGR = 0.33%
Sharpe Ratio = 0.09

Wow ok so that result doesn’t look too great at all in terms of returns – hardly better than flat and when transaction fees and trading costs are taken into account that’s going to be a negative overall return.

Now I guess all that is left is to test the strategy over different ETF pairs and over different time frames.

We’ll get onto that in the next and final blog post regarding this particular mean reversion strategy.

One final caveat to all this, is that I have tried my best to write this Python backtest in an accurate and logical way – if anyone can spot any errors in the code, whether from a scripting perspective or indeed from a fundamental strategy error perspective please do let me know in the comments section. Remember I am still learning Python so my word is far, far from gospel by any means.

I’ll try not to leave it as long until the next post this time – I need a kick up the ass to be a little more active with my updates!!!

It's only fair to share...Share on FacebookShare on Google+Tweet about this on TwitterShare on LinkedInEmail this to someonePin on PinterestShare on Reddit
Written by s666