Python Backtesting - ETF Mean Reversion - creating the ticker pairs

Python Backtesting – ETF Mean Reversion – creating the ticker pairs

Categories Trading Strategy Backtest

Right, welcome back and sorry for the slight delay between posts…I’ve been way more busy than I had hoped. Onto our python backtesting!

So this, I guess, could be considered the first proper post regarding the ETF mean reversion backtest script we’re trying to come up with. In the last post we went over creating our SQLite database and populating it with the ETF data scraped from www.etf.com. So we should now have over 1000 ETF tickers at our disposal to pull down and use in conjunction with the Pandas DataReader to pull daily pricing data from the web and use in our backtest.

I wanted to do it this way so that we didn’t just have a database full of unidentifiable ETF tickers, but rather we now have a whole raft of supporting information to go along with each one; underlying asset class, geographic region and “focus” to name but three. These categories will allow us to pull down the ETF tickers and create pairs that are more likley to be co-integrating due to underlying fundamental factors, rather than ploughing through untold permutations of random tickers in the hope that we stumble across something worthwhile. For example, 2 ETFs that track silver as an underlying asset are more likely to be co-integrated than one ETF that tracks silver and another that tracks utilities companies in the Asia pacific region right? Seems to make sense at least…

Let’s start some code!

import sqlite3 as db
import pandas as pd
 
#set the database file path we wish to connect to
#this will obviously be unique to wherever you created
#the SQLite database on your local system
database = 'C:\Users\Stuart\sqlite_databases\etfs.db'
 
#this is the SQL statement containing the information
#regarding which tickers we want to pull from the database
#As an example I have chosen to pull down all tickers which
#have their "Focus" listed as being "Silver"
sql = 'SELECT Ticker FROM etftable WHERE Focus = "Silver";'
 
#create a connection to the database specified above
cnx = db.connect(database)
cur = cnx.cursor()
 
#execute the SQL statement and place the results into a 
#variable called "tickers"
tickers = pd.read_sql(sql, con=cnx)

If we now print out tickers we get a DataFrame as follows:

   Ticker
0     SLV
1    SIVR
2     AGQ
3    USLV
4     ZSL
5    SLVO
6    DSLV
7     DBS
8     USV
9     SLV
10   SIVR
11    AGQ
12   USLV
13    ZSL
14   SLVO
15   DSLV
16    DBS
17    USV

I’m going to then quickly iterate over the data in this DataFrame and append each item into a list so we can use them more easily later.

#create an empty list
symbList = []
 
#iterate over the DataFrame and append each item into the empty list
for i in xrange(len(tickers)):
    symbList.append(tickers.ix[i][0])

Now time to create a quick function that will take our list “symbList” as an input, and return a list of unique ticker pairs. We obviously want to work with pairs of tickers as that’s what we will be feeding into our main function for the backtest, which we will create later.

Now this may not be the neatest way to go about doing this, but it’s the best I’ve got at the moment and it does the job just fine so far. If however, some more advanced python enthusiasts stumble across this post and have some constructive criticism – well I’m always happy to learn a thing or two.

 
def get_symb_pairs(symbList):
    '''symbList is a list of ETF symbols
       This function takes in a list of symbols and 
       returns a list of unique pairs of symbols'''
 
    symbPairs = []
 
    i = 0
 
    #iterate through the list and create all possible combinations of
    #ticker pairs - append the pairs to the "symbPairs" list
    while i < len(symbList)-1:
        j = i + 1
        while j < len(symbList):
            symbPairs.append([symbList[i],symbList[j]])
            j += 1
        i += 1
 
    #iterate through the newly created list of pairs and remove any pairs
    #made up of two identical tickers
    for i in symbPairs:
        if i[0] == i[1]:
            symbPairs.remove(i)
 
    #create a new empty list to store only unique pairs        
    symbPairs2 = []
 
    #iterate through the original list and append only unique pairs to the 
    #new list 
    for i in symbPairs:
        if i not in symbPairs2:
            symbPairs2.append(i)
 
    return symbPairs2

Once that’s defined, we can add:

symbPairs = get_symb_pairs(symbList)

And if we print symbPairs we will get:

[[u'SLV', u'SIVR'],
 [u'SLV', u'AGQ'],
 [u'SLV', u'USLV'],
 [u'SLV', u'ZSL'],
 [u'SLV', u'SLVO'],
 [u'SLV', u'DSLV'],
 [u'SLV', u'DBS'],
 [u'SLV', u'USV'],
 [u'SIVR', u'AGQ'],
 [u'SIVR', u'USLV'],
 [u'SIVR', u'ZSL'],
 [u'SIVR', u'SLVO'],
 [u'SIVR', u'DSLV'],
 [u'SIVR', u'DBS'],
 [u'SIVR', u'USV'],
 [u'SIVR', u'SLV'],
 [u'AGQ', u'USLV'],
 [u'AGQ', u'ZSL'],
 [u'AGQ', u'SLVO'],
 [u'AGQ', u'DSLV'],
 [u'AGQ', u'DBS'],
 [u'AGQ', u'USV'],
 [u'AGQ', u'SLV'],
 [u'AGQ', u'SIVR'],
 [u'USLV', u'ZSL'],
 [u'USLV', u'SLVO'],
 [u'USLV', u'DSLV'],
 [u'USLV', u'DBS'],
 [u'USLV', u'USV'],
 [u'USLV', u'SLV'],
 [u'USLV', u'SIVR'],
 [u'USLV', u'AGQ'],
 [u'ZSL', u'SLVO'],
 [u'ZSL', u'DSLV'],
 [u'ZSL', u'DBS'],
 [u'ZSL', u'USV'],
 [u'ZSL', u'SLV'],
 [u'ZSL', u'SIVR'],
 [u'ZSL', u'AGQ'],
 [u'ZSL', u'USLV'],
 [u'SLVO', u'DSLV'],
 [u'SLVO', u'DBS'],
 [u'SLVO', u'USV'],
 [u'SLVO', u'SLV'],
 [u'SLVO', u'SIVR'],
 [u'SLVO', u'AGQ'],
 [u'SLVO', u'USLV'],
 [u'SLVO', u'ZSL'],
 [u'DSLV', u'DBS'],
 [u'DSLV', u'USV'],
 [u'DSLV', u'SLV'],
 [u'DSLV', u'SIVR'],
 [u'DSLV', u'AGQ'],
 [u'DSLV', u'USLV'],
 [u'DSLV', u'ZSL'],
 [u'DSLV', u'SLVO'],
 [u'DBS', u'USV'],
 [u'DBS', u'SLV'],
 [u'DBS', u'SIVR'],
 [u'DBS', u'AGQ'],
 [u'DBS', u'USLV'],
 [u'DBS', u'ZSL'],
 [u'DBS', u'SLVO'],
 [u'DBS', u'DSLV'],
 [u'USV', u'SLV'],
 [u'USV', u'SIVR'],
 [u'USV', u'AGQ'],
 [u'USV', u'USLV'],
 [u'USV', u'ZSL'],
 [u'USV', u'SLVO'],
 [u'USV', u'DSLV'],
 [u'USV', u'DBS']]

Fantastic! We now have a full list of unique ticker pairs made up of ETFs that have “Silver” listed as their “Focus”.

This will be very useful when it comes to feeding inputs into our main backtesting function which we will create over the next few blog posts.

As always, any questions or comments please leave them below. I’m always interested in what others have to say 😀

Until next time…

It's only fair to share...Share on FacebookShare on Google+Tweet about this on TwitterShare on LinkedInEmail this to someonePin on PinterestShare on Reddit

Leave a Reply