Python Mean Reversion Backtest for ETFs…
I have been looking into using Python to create a backtesting script to test mean reversion strategies based on cointegrated ETF pairs. I have broken down the process in my head into several stages, each of which will form the basis of my next few blog posts. I’m not sure how many I will have to split the whole “shebang” across, but here’s the basic outline.
1) Create an SQLite3 database to hold ETF data, so that tickers can easily be pulled into the main backtest program using a user-defined SQL query. I want the database to hold the fund ticker, obviously, but also a whole raft of other descriptive information: geographic region, underlying securities and industry/asset focus, to name a few. This will allow me to run the main backtest on only those tickers that meet my requirements for that particular backtest. Cointegrated pairs are usually cointegrated for an economically valid reason; being able to call down tickers based upon criteria such as the underlying ETF asset class and industry will hopefully allow me to identify ETF pairs which are more likely to display a cointegrated relationship.
2) Write the script that actually scrapes the relevant data from an ETF information website and stores it in my previously created database. For this, I will be using the Pandas module, as it has a built-in ability to read and parse HTML, converting the HTML tables on a page into Pandas DataFrames which we can then easily upload to the database.
3) Then comes the “meat and potatoes”…time to write the actual backtest program itself. I’ll be looking to create a “valid” backtest based upon the concept of trading the spread between cointegrated ETF pairs, as mentioned previously. This will entail a few distinct working parts. Firstly, we will need to download the relevant ETF ticker data from the database we created earlier, then create a list of ticker pairs out of that single list of tickers, which we can feed into the backtest function to work its magic.
4) We then need to run a regression on the price data between the pair to calculate the “hedge ratio”, which we will use to generate the correct spread between the prices. This spread will then be subjected to a number of tests to discover whether it displays statistically significant mean reverting properties. This will involve us running cointegrated augmented Dickey-Fuller (CADF) tests and calculating the Hurst exponent and “half-life” of the series, among other things.
5) Once we have identified whether the spread series is mean reverting, we can then run a backtest based upon a “Bollinger band” style system where entries and exits are determined by the divergence and convergence of a “normalised spread”, calculated as the deviation of the spread from its mean, divided by the standard deviation of the spread itself.
6) We then store the returns of the particular pairs strategy, and plot the resulting equity curve. We will also take the opportunity to calculate some key performance indicators and display them under each chart. This information will all be displayed, before the script loops around and tests the next pair of ETF tickers held in our list of ticker pairs.
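To make steps 4 and 5 a little more concrete, here is a rough sketch of the calculations involved, run on made-up prices rather than real ETF data. The function names (`hedge_ratio`, `half_life`, `zscore`, `positions`), the entry and exit thresholds, and the synthetic series are all assumptions chosen for illustration, not the final code from this series. It leaves out the Dickey-Fuller test and the Hurst exponent, and it normalises the spread with full-sample statistics for simplicity, which looks ahead in time; a real backtest would more likely use a rolling window.

```python
import numpy as np


def hedge_ratio(y, x):
    """OLS slope of y regressed on x (with an intercept)."""
    slope, _intercept = np.polyfit(x, y, 1)
    return slope


def half_life(spread):
    """Half-life of mean reversion.

    Regress the change in the spread on its lagged level,
    delta_t = theta * spread_{t-1} + c, then use -ln(2) / theta.
    """
    lagged = spread[:-1]
    delta = np.diff(spread)
    theta, _intercept = np.polyfit(lagged, delta, 1)
    return -np.log(2) / theta


def zscore(spread):
    """Normalised spread: deviation from the mean over the standard deviation.

    Uses full-sample statistics (an illustrative simplification).
    """
    return (spread - spread.mean()) / spread.std()


def positions(z, entry=2.0, exit_=0.5):
    """Bollinger-band style positions in the spread: -1 short, +1 long, 0 flat.

    The entry and exit thresholds are hypothetical defaults.
    """
    pos = np.zeros(len(z))
    current = 0
    for i, value in enumerate(z):
        if current == 0:
            if value > entry:
                current = -1      # spread unusually high: short it
            elif value < -entry:
                current = 1       # spread unusually low: go long
        elif abs(value) < exit_:
            current = 0           # spread has converged: close out
        pos[i] = current
    return pos


# Synthetic pair: x is a random walk, y = 1.5 * x plus mean-reverting noise.
rng = np.random.default_rng(0)
n = 1000
x = 50 + np.cumsum(rng.normal(0, 0.5, n))
noise = np.zeros(n)
for t in range(1, n):
    noise[t] = 0.9 * noise[t - 1] + rng.normal(0, 0.3)
y = 1.5 * x + noise

beta = hedge_ratio(y, x)
spread = y - beta * x
hl = half_life(spread)
z = zscore(spread)
pos = positions(z)

print("hedge ratio:", beta)
print("half-life (bars):", hl)
print("bars in a position:", int(np.count_nonzero(pos)))
```

With real data, `x` and `y` would be the two ETF price series pulled from the database, and the positions would be combined with the spread's returns to build the equity curve described in step 6.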
Now I can promise we will get some sort of working code at the end of all this…but I definitely can’t promise it will be pretty! As I have previously stated, I’m on a learning journey here, so sometimes I struggle to write pretty code that works faultlessly. It often takes me more than a few goes at the same thing before I actually get something that does what I want it to do. So bear with me, please…!
I’ll try to get the next blog post out within the next few days, covering parts one and two of the process outlined above. See you soon!