Staying on the same topic of optimisation that we visited in the last post concerning portfolio holdings and efficient frontiers/portfolio theory, I thought I would quickly revisit the moving average crossover strategy we built a few posts ago; the previous article can be found here.
Optimisation of Moving Average Crossover Trading Strategy In Python
In that post we built a quick backtest that had the number of days used for the short moving average and the long moving average hard coded in at 42 and 252 days respectively. This is fine for a preliminary run to test our code and make sure it is running correctly, but what are the chances that those two particular moving average periods generate the highest returns, or highest Sharpe ratio out of all the possible (sensible) variations of moving average periods?
Well, the only way to answer this question is to run multiple backtests, using different moving average periods each time, and record the results. Numpy arrays are great for this purpose; we can initialise a multi-dimensional array with all values set to zero, and then as we iterate through our collection of backtests, we can alter the moving average periods on each run and store the results in the relevant numpy array cell for later analysis.
Let’s refactor the code into a couple of functions that we can then call later:
import numpy as np

def ma_strat(sp500, short_ma, long_ma):
    # sp500 is a DataFrame of price data with a 'Close' column, downloaded once beforehand
    # calculate the short and long moving averages
    sp500['short_ma'] = np.round(sp500['Close'].rolling(window=short_ma).mean(), 2)
    sp500['long_ma'] = np.round(sp500['Close'].rolling(window=long_ma).mean(), 2)
    # create column with moving average spread differential
    sp500['short_ma-long_ma'] = sp500['short_ma'] - sp500['long_ma']
    # set desired number of points as threshold for spread difference and create column containing strategy 'Stance'
    X = 50
    sp500['Stance'] = np.where(sp500['short_ma-long_ma'] > X, 1, 0)
    sp500['Stance'] = np.where(sp500['short_ma-long_ma'] < -X, -1, sp500['Stance'])
    # create columns containing daily market log returns and strategy daily log returns
    sp500['Market Returns'] = np.log(sp500['Close'] / sp500['Close'].shift(1))
    sp500['Strategy'] = sp500['Market Returns'] * sp500['Stance'].shift(1)
    # set strategy starting equity to 1 (i.e. 100%) and generate equity curve
    sp500['Strategy Equity'] = sp500['Strategy'].cumsum() + 1
    sharpe = annualised_sharpe(sp500['Strategy'])
    # use .iloc[-1] to take the last value by position (the index is made of dates)
    return (sp500['Strategy'].cumsum().iloc[-1], sharpe)

# function to calculate Sharpe Ratio - Risk free rate element excluded for simplicity
def annualised_sharpe(returns, N=252):
    return np.sqrt(N) * (returns.mean() / returns.std())
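A brief aside on the equity curve, sketched under the assumption that the strategy returns are daily log returns as in the function above. Log returns add over time, so their cumulative sum is the total log return, and exponentiating that sum gives the compounded growth of 1 unit of starting capital. Adding 1 to the cumulative sum is only a close approximation when returns are small. The price series below is made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, for illustration only
close = pd.Series([100.0, 102.0, 101.0, 105.0, 110.0])

# Daily log returns (the first value is NaN because there is no prior price)
log_ret = np.log(close / close.shift(1))

# Cumulative log return over the whole period
total_log_ret = log_ret.cumsum().iloc[-1]

# Exponentiating recovers the compounded growth of 1 unit of capital
equity_exact = np.exp(total_log_ret)

# The "cumsum + 1" form approximates the same quantity
equity_approx = total_log_ret + 1

print("exact growth factor:  ", equity_exact)
print("price ratio end/start:", close.iloc[-1] / close.iloc[0])
print("cumsum + 1:           ", equity_approx)
```

The exact growth factor matches the ratio of the last price to the first, while the "plus one" figure comes out slightly lower.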
So we can now use the “numpy.linspace()” function to create an array of values that we can assign to represent the different values of short moving average window and long moving average window that we wish to run the tests over, as follows:
short_ma = np.linspace(10,60,25,dtype=int)
As the moving average window needs to be an integer, we include the specification “dtype=int”, which casts the values to integers. What “linspace” does is create an array of values starting at the first number passed to the function and ending at the second, with the number of values set by the third argument. So the example above creates an array of 25 integer values starting at 10 and ending at 60. Because 50/24 is not a whole number, the integer cast means the values are only approximately equally spaced (the gaps are a mix of 2 and 3).
We do this for the long moving average window also:
long_ma = np.linspace(220,270,25,dtype=int)
Here we create an array of 25 integer values, again roughly equally spaced, starting at 220 and ending at 270.
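To see what these two calls actually produce, here is a minimal sketch that prints the arrays and the gaps between neighbouring values. The "np.arange" alternative at the end is my own suggestion rather than part of the original code; it is one way to get a constant step if that matters to you.

```python
import numpy as np

# The two window grids used in this post
short_ma = np.linspace(10, 60, 25, dtype=int)
long_ma = np.linspace(220, 270, 25, dtype=int)

print(short_ma)
print(long_ma)

# Gaps between neighbouring values: not all identical, because the
# fractional linspace values are cast down to integers
print(np.diff(short_ma))

# Alternative (an assumption, not the original approach): a fixed step of 2
# gives exactly equal spacing, at the cost of choosing the count indirectly
short_ma_fixed_step = np.arange(10, 61, 2)
print(short_ma_fixed_step)
```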
Now we have to initialise 2 numpy arrays that will hold the results of our various backtest iterations; one array to hold the ending p&l and one array to hold the Sharpe ratio. We set them to be 2 dimensional with sizes/lengths equal to the length of each of our short and long moving average array values that we wish to iterate over.
results_pnl = np.zeros((len(short_ma),len(long_ma)))
results_sharpe = np.zeros((len(short_ma),len(long_ma)))
Now let’s get to the meat of the code that allows us to actually go through all of the various combinations of short and long moving average windows held in our two moving average arrays:
First we read in the S&P500 data from Yahoo Finance.
ticker = '^GSPC'
sp500 = data.DataReader(ticker, 'yahoo', start='01/01/2000')
Then we loop over every combination of short and long window, run the backtest and store the results.
for i, shortma in enumerate(short_ma):
    for j, longma in enumerate(long_ma):
        pnl, sharpe = ma_strat(sp500,shortma,longma)
        results_pnl[i,j] = pnl
        results_sharpe[i,j] = sharpe
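If you want to try the whole loop without downloading anything, the sketch below runs the same kind of grid search end to end on a synthetic price series. Everything about the data is an assumption for illustration: the random-walk prices, the seed, the starting level of 1500 and the coarser 5x5 grid. The strategy function is a compact variant of the one in this post that works on a copy, so the shared price data is not modified between runs.

```python
import numpy as np
import pandas as pd

def annualised_sharpe(returns, N=252):
    # Risk free rate excluded for simplicity, as in the post
    return np.sqrt(N) * (returns.mean() / returns.std())

def ma_strat(prices, short_ma, long_ma, threshold=50):
    # Compact variant of the post's function; leaves `prices` untouched
    close = prices['Close']
    spread = close.rolling(window=short_ma).mean() - close.rolling(window=long_ma).mean()
    stance = np.where(spread > threshold, 1, 0)
    stance = np.where(spread < -threshold, -1, stance)
    stance = pd.Series(stance, index=prices.index)
    market = np.log(close / close.shift(1))
    strategy = market * stance.shift(1)
    return strategy.cumsum().iloc[-1], annualised_sharpe(strategy)

# Synthetic random-walk prices (made up, NOT real S&P 500 data)
rng = np.random.default_rng(0)
dates = pd.bdate_range('2000-01-03', periods=2000)
close = 1500 * np.exp(np.cumsum(rng.normal(0.0003, 0.01, len(dates))))
prices = pd.DataFrame({'Close': close}, index=dates)

# Coarser grid than in the post, to keep the example quick
short_ma = np.linspace(10, 60, 5, dtype=int)
long_ma = np.linspace(220, 270, 5, dtype=int)
results_pnl = np.zeros((len(short_ma), len(long_ma)))
results_sharpe = np.zeros((len(short_ma), len(long_ma)))

# The data is created once, outside the loops, and reused for every run
for i, shortma in enumerate(short_ma):
    for j, longma in enumerate(long_ma):
        pnl, sharpe = ma_strat(prices, shortma, longma)
        results_pnl[i, j] = pnl
        results_sharpe[i, j] = sharpe

print(pd.DataFrame(results_pnl, index=short_ma, columns=long_ma).round(3))
```

Wrapping the results in a DataFrame labelled with the window lengths is just a convenient way to read the numbers; rows are short windows and columns are long windows.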
OK, so once the code has finished running, we now have two numpy arrays “results_pnl” and “results_sharpe” that hold the ending P&L data and the Sharpe ratio for each run through of the backtest (with respective combinations of long and short moving averages) respectively.
Arrays are difficult to gain insight from in their raw format – so let’s visualise the results with a color plot, which should allow us to see where the combinations of moving average windows gave the best results – here are the results of the P&L array:
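# results_pnl is indexed [short, long]; pcolor expects rows to run along the y-axis, so transpose
plt.pcolor(short_ma, long_ma, results_pnl.T)
plt.xlabel('short_ma')
plt.ylabel('long_ma')
plt.colorbar()
plt.show()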
So we can see from this that there seems to be a sweet spot around the top left of the plot, signifying the use of a short moving average of around 22 days and a long moving average of around 262 days may produce the best results, whereas we can see from the blue area in the bottom right that using a combination of 40 days and 232 days may produce the worst results.
When we plot our Sharpe Ratio array, in this instance we see that the results are almost identical, meaning the highest Sharpe ratio can also be found by using a combination of 22 and 262 days for our moving averages. The Sharpe Ratio color plot is shown below:
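# results_sharpe is indexed [short, long]; pcolor expects rows to run along the y-axis, so transpose
plt.pcolor(short_ma, long_ma, results_sharpe.T)
plt.xlabel('short_ma')
plt.ylabel('long_ma')
plt.colorbar()
plt.show()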
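Reading the best combination off a colour plot is a little imprecise, so here is a minimal sketch of how to pull the actual window lengths out of a results array. The array below is filled with random stand-in numbers purely so the snippet runs on its own; in practice you would use the "results_sharpe" (or "results_pnl") array filled by the loop above.

```python
import numpy as np

short_ma = np.linspace(10, 60, 25, dtype=int)
long_ma = np.linspace(220, 270, 25, dtype=int)

# Stand-in results (random numbers, for illustration only)
rng = np.random.default_rng(42)
results_sharpe = rng.normal(size=(len(short_ma), len(long_ma)))

# Position of the largest value, ignoring any NaNs, as (row, column)
i, j = np.unravel_index(np.nanargmax(results_sharpe), results_sharpe.shape)

# Rows correspond to short windows and columns to long windows
best_short, best_long = short_ma[i], long_ma[j]
print("best short window:", best_short)
print("best long window: ", best_long)
print("best value:       ", results_sharpe[i, j])
```

Swapping "np.nanargmax" for "np.nanargmin" gives the worst combination instead.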
So hopefully that gives you an idea how to go about optimising strategy backtest input parameters – in this case we only had two parameters to optimise and still the code took a little bit of time to run, so you can see that as the number of variables that need to be optimised increases, the time needed to run the iterations increases exponentially. In other words, optimising more than a few parameters through “brute force” methods such as this can take a LONG time!
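To put rough numbers on that, the sketch below counts how many backtests a full grid needs as parameters are added, assuming 25 candidate values per parameter as in this post. The three-parameter grid at the end is hypothetical: it treats the spread threshold X as a third input, with made-up candidate values.

```python
import itertools

values_per_param = 25

# Number of backtests for a full grid with 1 to 5 parameters
runs_needed = {k: values_per_param ** k for k in range(1, 6)}
for n_params, n_runs in runs_needed.items():
    print(n_params, "parameter(s):", n_runs, "backtests")

# itertools.product builds every combination without nesting more loops.
# Hypothetical coarse grid: short window, long window, spread threshold X
grid = list(itertools.product(
    range(10, 61, 10),     # short windows: 10, 20, ..., 60
    range(220, 271, 10),   # long windows: 220, 230, ..., 270
    [25, 50, 75],          # thresholds (made-up candidates)
))
print(len(grid), "combinations, first one:", grid[0])
```

Two parameters at 25 values each is 625 runs; a third parameter at the same resolution takes that to 15,625.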
Anyway, I’ll leave it there for this post. Until next time!
Hello, I would like to replicate this from data on my computer. I am confused how to do this given the ‘ticker’ portion of the function. Thanks!
Hi there, thanks for your comment – I can help you but the answer will depend on what format the data is currently held on your computer? Is it in csv format? And if so, what are the columns of data etc? If you could paste a sample of your data I’ll see what I can do.
Hi, first of all thanks for this article and your whole website. This information is a crucial part of the research process. The code should be changed slightly, which would improve the performance a lot! There is no need to call “data.DataReader(ticker, 'yahoo', start='01/01/2000')” on every iteration. Just call it once and pass the data to the method.
Hi James, thanks for your comment…that’s very true and a very simple, logical way to speed up the code – I should have thought of that myself! Appreciate your input.
I want to automate my strategy using Python. Can you help me write a script?
Hi Prashant, I can try to help you – what is your strategy logic and what have you come up with so far? I shall send you an email – perhaps it is easier to communicate off site for this one.
I wonder if you can illustrate how to optimize either returns or Sharpe ratio or both (e.g. to generate returns > x, maximize Sharpe) using scipy’s minimize(*-1) function. Mainly, for the mean-reversion strategies that you posted earlier, I want the independent variables to be the entry zscores and regress_periods (which I’ll define below). It would be of great help. Thank you.
In context of https://www.pythonforfinance.net/2016/05/09/python-backtesting-mean-reversion-part-2/,
est = sm.OLS(df1.y[-regress_period:],df1.x[-regress_period:])
Hi Joe – you could perhaps just store your desired combinations of inputs in a list of tuples for example, and then just iterate through the list of tuples (inputs), run the mean reversion backtest function and store the resulting PnL in a multi dimensional array – and then plot a heatmap as shown in various posts on this site.
I appreciate that is rather a high level explanation, but to carry it out would take a substantial amount of time.
If it is something people are interested in, I am happy to resurrect the mean reversion posts and run optimisations on them.
If anyone wants this – then do please speak up 😀
I’m curious about a few lines of code you have here:
#set strategy starting equity to 1 (i.e. 100%) and generate equity curve
sp500['Strategy Equity'] = sp500['Strategy'].cumsum() + 1
sharpe = annualised_sharpe(sp500['Strategy'])
return (sp500['Strategy'].cumsum()[-1], sharpe)
The sharpe(…) call in the middle makes sense, but you construct the ‘Strategy Equity’ column but fail to use it, and I’m not sure what the plus one accomplishes. You’ve got log returns, so the cumsum gives the aggregate log return for the strategy, which you pretty obviously recognize in the return tuple.
Hi, I’ve re-created this strategy but I get only 0% return. Do you know what could be wrong?
I have some issue with the plotting part.
My pcolor plot shows straight lines on the x-axis instead of the small squares which you have on yours. Do you know what could have caused this issue?
Thanks in advance
Thank you for this post and all others in the site. They are great help to us who are learning. I wrote the code and it is working perfectly. James Bowler’s fix has also sped up the processing.
Nice code, but i have a question.
Why do you put the download script inside the function? I would prefer to get the data only once, at the beginning of the script. Is my idea incorrect?
Hi there, no your idea is absolutely correct… I was obviously being a bit slow the day I posted this as yes indeed it is completely unnecessary to download the data again and again!! I should probably fix the code and change it.
Thanks for the post. How can we actually return the actual values of 22 and 262 in our console? Right now the code only returns the color bars , but what if we want to write a line of code to return the 22 and 262 in numeric terms so we can see the actual values that produce the best sharpe and pnl , instead of seeing it through a chart where we cannot see the actual best value? Thanks in advance!
Speeds up code by orders of magnitude if you move the get data part out of the for loop. Thanks JAMES BOWLER!
sp500 = data.DataReader('^GSPC', 'yahoo',start='01/01/2000')
#read in data from Yahoo Finance for the relevant ticker
sp500['short_ma'] = np.round(sp500['Close'].rolling(window=short_ma).mean(),2)
sp500['long_ma'] = np.round(sp500['Close'].rolling(window=long_ma).mean(),2)
Hi there, thanks for the interesting content. I am trying to reproduce your code but get the following error:
TypeError                                 Traceback (most recent call last)
      1 for i, shortma in enumerate(short_ma):
      2     for j, longma in enumerate(long_ma):
----> 3         pnl, sharpe = ma_strat('^GSPC',shortma,longma)
      4         results_pnl[i,j] = pnl
      5         results_sharpe[i,j] = sharpe
/var/folders/7r/4y266hgj1_b4ddy3c5fj8rcr0000gn/T/ipykernel_66746/247670384.py in ma_strat(sp500, short_ma, long_ma)
      1 def ma_strat(sp500,short_ma,long_ma):
      2     #read in data from Yahoo Finance for the relevant ticker
----> 3     sp500['short_ma'] = np.round(sp500['Close'].rolling(window=short_ma).mean(),2)
      4     sp500['long_ma'] = np.round(sp500['Close'].rolling(window=long_ma).mean(),2)
TypeError: string indices must be integers
I have tried several things but I’m unable to figure out what I’m doing wrong. I am typecasting to int like you mentioned for:
short_ma = np.linspace(10,60,25,dtype=int)
long_ma = np.linspace(220,270,25,dtype=int).
To generate the closed data I’m using the following code:
ticker = ['^GSPC']
sp500 = data.DataReader(ticker, 'yahoo',start='01/01/2017')
Hope you are able to help me out with this example.
Thank you in advance
Hi Boet, I have to apologise – the error is my fault. I changed the structure of the code so as to pull the data from Yahoo once and then pass that to the main function, rather than pulling the data each time the function was run. It seems I forgot to update the function call in the nested for loop to pass in the actual sp500 data frame of data – currently it is still passing in the ticker symbol ‘^GSPC’.
Just change the following line:
for i, shortma in enumerate(short_ma):
    for j, longma in enumerate(long_ma):
        # CHANGE THE LINE BELOW TO PASS IN THE sp500 (not the ticker symbol) as the first argument
        pnl, sharpe = ma_strat(sp500,shortma,longma)
        results_pnl[i,j] = pnl
        results_sharpe[i,j] = sharpe
I will update the code in the post to reflect this.