Staying on the topic of optimisation from the last post, which covered portfolio holdings and efficient frontiers, I thought I would quickly revisit the moving average crossover strategy we built a few posts ago; the previous article can be found here.
Optimisation of Moving Average Crossover Trading Strategy In Python
In that post we built a quick backtest that had the number of days used for the short moving average and the long moving average hard coded in at 42 and 252 days respectively. This is fine for a preliminary run to test our code and make sure it is running correctly, but what are the chances that those two particular moving average periods generate the highest returns, or highest Sharpe ratio out of all the possible (sensible) variations of moving average periods?
Well, the only way to answer this question is to run multiple backtests, varying the moving average periods each time, and record the results. Numpy arrays are great for this purpose; we can initialise a multi-dimensional array with all values set to zero, and then, as we iterate through our collection of backtests, change the moving average periods on each run and store the results in the relevant cell of the array for later analysis.
Let’s refactor the code into a couple of functions that we can then call later:
import numpy as np
from pandas_datareader import data

def ma_strat(ticker, short_ma, long_ma):
    # read in data from Yahoo Finance for the relevant ticker
    sp500 = data.DataReader(ticker, 'yahoo', start='01/01/2000')
    sp500['short_ma'] = np.round(sp500['Close'].rolling(window=short_ma).mean(), 2)
    sp500['long_ma'] = np.round(sp500['Close'].rolling(window=long_ma).mean(), 2)
    # create column with moving average spread differential
    sp500['short_ma-long_ma'] = sp500['short_ma'] - sp500['long_ma']
    # set desired number of points as threshold for spread difference and
    # create column containing strategy 'Stance':
    # long (1) above +X, short (-1) below -X, flat (0) in between
    X = 50
    sp500['Stance'] = np.where(sp500['short_ma-long_ma'] > X, 1, 0)
    sp500['Stance'] = np.where(sp500['short_ma-long_ma'] < -X, -1, sp500['Stance'])
    # create columns containing daily market log returns and strategy daily log returns
    sp500['Market Returns'] = np.log(sp500['Close'] / sp500['Close'].shift(1))
    sp500['Strategy'] = sp500['Market Returns'] * sp500['Stance'].shift(1)
    # set strategy starting equity to 1 (i.e. 100%) and generate equity curve
    sp500['Strategy Equity'] = sp500['Strategy'].cumsum() + 1
    sharpe = annualised_sharpe(sp500['Strategy'])
    # return the final cumulative log return and the Sharpe ratio
    return (sp500['Strategy'].cumsum().iloc[-1], sharpe)

# function to calculate Sharpe Ratio - Risk free rate element excluded for simplicity
def annualised_sharpe(returns, N=252):
    return np.sqrt(N) * (returns.mean() / returns.std())
So we can now use the “numpy.linspace()” function to create an array of values that we can assign to represent the different values of short moving average window and long moving average window that we wish to run the tests over, as follows:
short_ma = np.linspace(10,60,25,dtype=int)
As the moving average window needs to be an integer, we include the specification “dtype=int”, which casts the values to integers. What “linspace” does is create an array of values starting at the first number passed to the function and ending at the second, with the third value setting how many values the array contains – so the example above creates an array of 25 integer values, roughly evenly spaced, starting at 10 and ending at 60 (the evenly spaced values are truncated to integers, so the gaps between them are not all identical).
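As a quick check of what those values look like, the snippet below prints the array along with the gaps between neighbouring values. Because “dtype=int” truncates the evenly spaced floats, the gaps should come out as mostly 2 with the occasional 3.

```python
import numpy as np

# same call as in the post: 25 values from 10 to 60, cast to integers
short_ma = np.linspace(10, 60, 25, dtype=int)

# differences between neighbouring window lengths
steps = np.diff(short_ma)

print(short_ma)
print(len(short_ma), short_ma[0], short_ma[-1])
print(np.unique(steps))
```

If exactly equal spacing matters, “np.arange(10, 61, 2)” is one alternative, at the cost of choosing the step rather than the number of values.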
We do this for the long moving average window also:
long_ma = np.linspace(220,270,25,dtype=int)
Here we create an array of 25 integer values, again roughly evenly spaced, starting at 220 and ending at 270.
Now we have to initialise two numpy arrays that will hold the results of our various backtest iterations: one for the ending P&L and one for the Sharpe ratio. Both are 2-dimensional, with one row per short moving average window and one column per long moving average window.
results_pnl = np.zeros((len(short_ma), len(long_ma)))
results_sharpe = np.zeros((len(short_ma), len(long_ma)))
Now let’s get to the meat of the code that allows us to actually go through all of the various combinations of short and long moving average windows held in our two moving average arrays:
for i, shortma in enumerate(short_ma):
    for j, longma in enumerate(long_ma):
        pnl, sharpe = ma_strat('^GSPC', shortma, longma)
        results_pnl[i, j] = pnl
        results_sharpe[i, j] = sharpe
As an FYI, this code takes a little while to run, so if you’re following along, give it 5-10 minutes to complete – the code isn’t massively efficient (each call to “ma_strat” downloads the price data again) and the number of combinations of moving averages is quite large (25 x 25 = 625 backtests), so just be patient! 😉
OK, so once the code has finished running, we have two numpy arrays, “results_pnl” and “results_sharpe”, which hold the ending P&L and the Sharpe ratio respectively for each combination of short and long moving average windows.
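A good part of that run time is likely to come from “ma_strat” fetching the same price history on every call. Below is a minimal sketch of one way round this, assuming the closing prices have already been loaded once into a pandas Series. The helper names “strat_from_close” and “run_grid” are hypothetical, the symmetric +X/-X band is an assumption, and a synthetic random walk stands in for the real S&P 500 data so that the snippet runs offline.

```python
import numpy as np
import pandas as pd


def annualised_sharpe(returns, N=252):
    # same Sharpe calculation as in the post, risk-free rate excluded
    return np.sqrt(N) * (returns.mean() / returns.std())


def strat_from_close(close, short_ma, long_ma, X=50):
    # hypothetical variant of ma_strat that takes prices instead of a ticker
    spread = (close.rolling(window=short_ma).mean().round(2)
              - close.rolling(window=long_ma).mean().round(2))
    # long above +X, short below -X, flat otherwise (assumed symmetric band)
    stance = pd.Series(np.where(spread > X, 1, np.where(spread < -X, -1, 0)),
                       index=close.index)
    market = np.log(close / close.shift(1))
    strategy = market * stance.shift(1)
    return strategy.cumsum().iloc[-1], annualised_sharpe(strategy)


def run_grid(close, short_windows, long_windows):
    # results are indexed [short, long], matching the arrays in the post
    pnl = np.zeros((len(short_windows), len(long_windows)))
    sharpe = np.zeros_like(pnl)
    for i, s in enumerate(short_windows):
        for j, l in enumerate(long_windows):
            pnl[i, j], sharpe[i, j] = strat_from_close(close, int(s), int(l))
    return pnl, sharpe


# synthetic stand-in for the downloaded 'Close' column (NOT real market data)
rng = np.random.default_rng(0)
dates = pd.bdate_range('2000-01-03', periods=2000)
close = pd.Series(1000 * np.exp(np.cumsum(rng.normal(0, 0.01, len(dates)))),
                  index=dates)

# a smaller 5 x 5 grid keeps the demonstration quick
short_ma = np.linspace(10, 60, 5, dtype=int)
long_ma = np.linspace(220, 270, 5, dtype=int)
results_pnl, results_sharpe = run_grid(close, short_ma, long_ma)
print(results_pnl.shape)
```

With the real data, “close” would be the 'Close' column from a single “DataReader” call made before the loops start.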
Arrays are difficult to gain insight from in their raw format – so let’s visualise the results with a color plot, which should allow us to see where the combinations of moving average windows gave the best results – here are the results of the P&L array:
import matplotlib.pyplot as plt

# results_pnl is indexed [short, long]; pcolor expects rows to run along the y-axis, so transpose
plt.pcolor(short_ma, long_ma, results_pnl.T)
plt.colorbar()
plt.show()
So we can see from this that there seems to be a sweet spot around the top left of the plot, signifying that a short moving average of around 22 days and a long moving average of around 262 days may produce the best results, whereas the blue area in the bottom right suggests that a combination of 40 days and 232 days may produce the worst results.
When we plot our Sharpe Ratio array, in this instance we see that the results are almost identical, meaning the highest Sharpe ratio can also be found by using a combination of 22 and 262 days for our moving averages. The Sharpe Ratio color plot is shown below:
# results_sharpe is indexed [short, long]; transpose so rows run along the y-axis
plt.pcolor(short_ma, long_ma, results_sharpe.T)
plt.colorbar()
plt.show()
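Reading the best cell off a colour plot by eye is only approximate, so it can help to pull the winning combination straight out of the array. Here is a minimal sketch using “np.argmax” and “np.unravel_index”; the small arrays below are made-up illustrative data, not output from the backtest.

```python
import numpy as np

# illustrative window lengths (not the grid used in the post)
short_ma = np.array([10, 20, 30])
long_ma = np.array([220, 240, 260, 270])

# made-up Sharpe ratios, indexed [short, long] like the arrays in the post
results_sharpe = np.array([
    [0.10, 0.15, 0.20, 0.18],
    [0.12, 0.25, 0.40, 0.22],
    [0.05, 0.11, 0.19, 0.16],
])

# argmax works on the flattened array; unravel_index maps it back to (row, col)
i, j = np.unravel_index(np.argmax(results_sharpe), results_sharpe.shape)
best_short, best_long = short_ma[i], long_ma[j]
print(best_short, best_long, results_sharpe[i, j])
```

If some runs produced NaN (for example, a combination that never traded), “np.nanargmax” can be used in place of “np.argmax” to ignore those cells.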
So hopefully that gives you an idea how to go about optimising strategy backtest input parameters – in this case we only had two parameters to optimise and still the code took a little bit of time to run, so you can see that as the number of variables that need to be optimised increases, the time needed to run the iterations increases exponentially. In other words, optimising more than a few parameters through “brute force” methods such as this can take a LONG time!
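To put rough numbers on that: with 25 candidate values per parameter, the grid has 25 to the power of k cells for k parameters. The sketch below assumes, purely for illustration, about half a second per backtest (roughly what 625 runs in five minutes works out to); the real figure will depend on your machine and connection.

```python
values_per_param = 25
seconds_per_run = 0.5  # assumed for illustration only


def grid_runs(n_params, n_values=values_per_param):
    # a full grid needs one backtest per combination of parameter values
    return n_values ** n_params


for k in range(1, 6):
    runs = grid_runs(k)
    hours = runs * seconds_per_run / 3600
    print(f"{k} parameters: {runs:>10,} runs, about {hours:,.1f} hours")
```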
Anyway, I’ll leave it there for this post. Until next time!