Python Backtesting Mean Reversion – Part 4

I thought I’d finish off this short series on Python backtesting mean reversion by providing a full, executable script that uses SQL queries to extract our ticker symbols from the SQLite database we created in an earlier post (which can be found here).

In this example I have decided to run a series of backtests on ticker symbols from the database with a “Niche” of “Gold Miners”. In theory, selecting tickers based upon a non-arbitrary, meaningful grouping criterion should help us focus on pairs of symbols whose price series are more likely to show statistically significant cointegration.

To test other groups of symbols, all we have to do is update the SQL query with whichever criteria we are interested in. As an example, we could change the query to select symbols with an “Asset Class” of “Fixed Income”. The possible combinations are plentiful, and we could even combine multiple criteria at once to hone our focus even further, as sketched below.
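For instance, here are a couple of alternative query strings (a rough sketch only; the “Asset Class” and “Niche” column names are assumed to match the etftable schema created in the earlier post):

#select every ticker with an "Asset Class" of "Fixed Income"
sql = 'SELECT Ticker FROM etftable WHERE "Asset Class" = \'Fixed Income\';'

#or combine multiple criteria at once to narrow the universe further
sql = 'SELECT Ticker FROM etftable WHERE "Asset Class" = \'Equity\' AND Niche = \'Gold Miners\';'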

So, after following the instructions to create our database, you should be able to simply copy and paste the entire code below into a Python script and it will run through all the backtests in one go:

 

#import needed modules
from datetime import datetime
from pandas_datareader import data
import pandas as pd
import numpy as np
from numpy import log, polyfit, sqrt, std, subtract
import statsmodels.tsa.stattools as ts
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns
import pprint
import sqlite3 as db
 
 
#set the database file path we wish to connect to
#this will obviously be unique to wherever you created
#the SQLite database on your local system
database = r'C:\Users\S666\sqlite_databases\etfs.db'
 
#this is the SQL statement containing the information
#regarding which tickers we want to pull from the database
#As an example I have chosen to pull down all tickers which
#have their "Niche" listed as being "Gold Miners"
sql = 'SELECT Ticker FROM etftable WHERE Niche = "Gold Miners";'
 
#create a connection to the database specified above
cnx = db.connect(database)
cur = cnx.cursor()
 
#execute the SQL statement and place the results into a 
#variable called "tickers"
tickers = pd.read_sql(sql, con=cnx)
 
 
#create an empty list
symbList = []
 
#iterate over the DataFrame and append each item into the empty list
for i in xrange(len(tickers)):
    symbList.append(tickers.ix[i][0])
 
 
def get_symb_pairs(symbList):
    '''symbList is a list of ETF symbols
       This function takes in a list of symbols and 
       returns a list of unique pairs of symbols'''
 
    symbPairs = []
 
    i = 0
 
    #iterate through the list and create all possible combinations of
    #ticker pairs - append the pairs to the "symbPairs" list
    while i < len(symbList)-1:
        j = i + 1
        while j < len(symbList):
            symbPairs.append([symbList[i],symbList[j]])
            j += 1
        i += 1

    #iterate through the newly created list of pairs and remove any pairs
    #made up of two identical tickers
    for i in symbPairs:
        if i[0] == i[1]:
            symbPairs.remove(i)
    #create a new empty list to store only unique pairs 
    symbPairs2 = [] 
    #iterate through the original list and append only unique pairs to the 
    #new list 
    for i in symbPairs: 
        if i not in symbPairs2: 
            symbPairs2.append(i) 
 
    return symbPairs2 
 
 
 
symbPairs = get_symb_pairs(symbList) 
 
 
def backtest(symbList): 
    start_date = '2012/01/01' 
    end_date = datetime.now() 
    #download data from Yahoo Finance 
    y=data.DataReader(symbList[0], "yahoo", start=start_date, end=end_date) 
    x=data.DataReader(symbList[1], "yahoo", start=start_date, end=end_date) 
 
    #rename column to make it easier to work with later 
    y.rename(columns={'Adj Close':'price'}, inplace=True) 
    x.rename(columns={'Adj Close':'price'}, inplace=True) 
 
    #make sure DataFrames are the same length 
    min_date = max(df.dropna().index[0] for df in [y, x]) 
    max_date = min(df.dropna().index[-1] for df in [y, x]) 
    y = y[(y.index >= min_date) & (y.index <= max_date)]
    x = x[(x.index >= min_date) & (x.index <= max_date)]
 
    ############################################################
 
    plt.plot(y.price,label=symbList[0])
    plt.plot(x.price,label=symbList[1])
    plt.ylabel('Price')
    plt.xlabel('Time')
    plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
    plt.show()
 
    #############################################################
 
    sns.jointplot(y.price, x.price ,color='b')
    plt.show()
 
    #############################################################
 
    #run Ordinary Least Squares regression to find hedge ratio
    #and then create spread series
    df1 = pd.DataFrame({'y':y['price'],'x':x['price']})
    est = sm.OLS(df1.y,df1.x)
    est = est.fit()
    df1['hr'] = -est.params[0]
    df1['spread'] = df1.y + (df1.x * df1.hr)
 
    ##############################################################
 
    plt.plot(df1.spread)
    plt.show()
 
    ##############################################################
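    #run an Augmented Dickey-Fuller test on the spread to check whether it is
    #stationary, i.e. whether the pair looks like a cointegration candidate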
 
    cadf = ts.adfuller(df1.spread)
    print 'Augmented Dickey Fuller test statistic =',cadf[0]
    print 'Augmented Dickey Fuller p-value =',cadf[1]
    print 'Augmented Dickey Fuller 1%, 5% and 10% test statistics =',cadf[4]
 
    ##############################################################
 
    def hurst(ts):
        """Returns the Hurst Exponent of the time series vector ts"""
        # Create the range of lag values
        lags = range(2, 100)

        # Calculate the array of the variances of the lagged differences
        tau = [sqrt(std(subtract(ts[lag:], ts[:-lag]))) for lag in lags]

        # Use a linear fit to estimate the Hurst Exponent
        poly = polyfit(log(lags), log(tau), 1)

        # Return the Hurst exponent from the polyfit output
        return poly[0]*2.0
 
 
    ##############################################################
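    #a Hurst exponent below 0.5 points towards mean reversion, around 0.5
    #towards a random walk, and above 0.5 towards a trending series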
 
    print "Hurst Exponent =",round(hurst(df1.spread),2)
 
    ##############################################################
 
    #Run OLS regression on spread series and lagged version of itself
    spread_lag = df1.spread.shift(1)
    spread_lag.ix[0] = spread_lag.ix[1]
    spread_ret = df1.spread - spread_lag
    spread_ret.ix[0] = spread_ret.ix[1]
    spread_lag2 = sm.add_constant(spread_lag)
 
    model = sm.OLS(spread_ret,spread_lag2)
    res = model.fit()
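    #treating the spread as an Ornstein-Uhlenbeck process, the half-life of
    #mean reversion is -ln(2) divided by the slope coefficient estimated above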
 
 
    halflife = int(round(-np.log(2) / res.params[1],0))
 
    if halflife <= 0:
        halflife = 1
 
    print  'Halflife = ',halflife
 
    ##############################################################
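    #use the half-life as the lookback window for the rolling mean and standard
    #deviation of the spread, then standardise the spread into a z-score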
 
 
    meanSpread = df1.spread.rolling(window=halflife).mean()
    stdSpread = df1.spread.rolling(window=halflife).std()
 
    df1['zScore'] = (df1.spread-meanSpread)/stdSpread
 
    df1['zScore'].plot()
    plt.show()
 
    ##############################################################
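    #trade rules: enter when the z-score moves beyond +/- entryZscore and
    #exit when it crosses back through exitZscore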
 
    entryZscore = 2
    exitZscore = 0
 
    #set up num units long             
    df1['long entry'] = ((df1.zScore < - entryZscore) & ( df1.zScore.shift(1) > - entryZscore))
    df1['long exit'] = ((df1.zScore > - exitZscore) & (df1.zScore.shift(1) < - exitZscore))
    df1['num units long'] = np.nan
    df1.loc[df1['long entry'],'num units long'] = 1
    df1.loc[df1['long exit'],'num units long'] = 0
    df1['num units long'][0] = 0
    df1['num units long'] = df1['num units long'].fillna(method='pad')

    #set up num units short
    df1['short entry'] = ((df1.zScore >  entryZscore) & ( df1.zScore.shift(1) < entryZscore))
    df1['short exit'] = ((df1.zScore < exitZscore) & (df1.zScore.shift(1) > exitZscore))
    df1['num units short'] = np.nan
    df1.loc[df1['short entry'],'num units short'] = -1
    df1.loc[df1['short exit'],'num units short'] = 0
    df1['num units short'][0] = 0
    df1['num units short'] = df1['num units short'].fillna(method='pad')
 
    df1['numUnits'] = df1['num units long'] + df1['num units short']
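    #percentage change of the spread, normalised by the gross notional of the
    #two legs (one unit of y plus the absolute hedge ratio in units of x)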
    df1['spread pct ch'] = (df1['spread'] - df1['spread'].shift(1)) / ((df1['x'] * abs(df1['hr'])) + df1['y'])
    df1['port rets'] = df1['spread pct ch'] * df1['numUnits'].shift(1)
 
    df1['cum rets'] = df1['port rets'].cumsum()
    df1['cum rets'] = df1['cum rets'] + 1
 
    ##############################################################
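    #annualise the daily Sharpe ratio using the square root of 252 trading days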
 
    try:
        sharpe = ((df1['port rets'].mean() / df1['port rets'].std()) * sqrt(252)) 
    except ZeroDivisionError:
        sharpe = 0.0
 
 
 
    plt.plot(df1['cum rets'])
    plt.xlabel(symbList[1])
    plt.ylabel(symbList[0])
    plt.show()
 
    ##############################################################
 
 
 
    start_val = 1
    end_val = df1['cum rets'].iat[-1]
 
    start_date = df1.iloc[0].name
    end_date = df1.iloc[-1].name
    days = (end_date - start_date).days
 
    #days is a count of calendar days, so annualise the growth rate using 365
    CAGR = round(((float(end_val) / float(start_val)) ** (365.0/days)) - 1,4)
 
 
    print "CAGR = {}%".format(CAGR*100)
    print "Sharpe Ratio = {}".format(round(sharpe,2))
    print 100*"----"
 
for i in symbPairs:
    backtest(i)

So hopefully this makes it as easy as possible for anyone reading this to just copy and paste and run this whole thing locally.

I must admit, the backtest results are far from amazing, and many of them end in negative territory. However, there are a couple of pairs (and this is just from the “Niche” = “Gold Miners” run) that generate quite a smooth equity curve with an upward trajectory, returning about 4-5% CAGR with Sharpe ratios of 1.9 – 2.5. For an example, check out the [GDX, RING] and [GDX, PSAU] ticker pairs.
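To re-run a single pair of interest without looping over the whole list, the backtest function can simply be called directly, for example:

#backtest an individual pair directly
backtest(['GDX', 'RING'])
backtest(['GDX', 'PSAU'])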

Of course, we haven’t made any allowance for transaction costs, so those small gains could well end up in negative territory too, depending on how many trades need to be carried out.
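As a very rough sketch of how one might account for that (the 0.1% round-trip cost below is purely an assumed, illustrative figure), the portfolio returns could be penalised whenever the position changes, by adding something like this inside backtest() just after the ‘port rets’ column is created:

    #assumed round-trip transaction cost as a fraction of notional (illustrative only)
    cost_per_trade = 0.001

    #flag the days on which the position changes, i.e. a trade takes place
    trades = df1['numUnits'].diff().abs() > 0

    #deduct the assumed cost from the portfolio return on those days
    df1['port rets'] = df1['port rets'] - (trades * cost_per_trade)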

I will leave it here for this series; I’ve covered pretty much everything I wanted to cover and have ended up with a fully working, executable script to carry out basic, initial tests on ETF pairs and identify possible candidates for a mean-reversion strategy.

Onto something new for next time…until then!

Written by s666