Hi all, for this post I will be building a simple moving average crossover trading strategy backtest in Python, using the S&P500 as the market to test on.
A simple moving average crossover strategy is possibly one of, if not the, simplest examples of a rules-based trading strategy using technical indicators, so I thought it would be a good example for those learning Python; the idea is to keep it as simple as possible and build up from there.
So, as always when using Python for financial data related shenanigans, it’s time to import our required modules:
import pandas as pd
import numpy as np
from pandas_datareader import data
We will first use the pandas-datareader functionality to download the price data from the first trading day in 2000, until today, for the S&P500 from Yahoo Finance as follows:
sp500 = data.DataReader('^GSPC', 'yahoo',start='1/1/2000')
Ok, let’s do a quick check to see what format the data has been pulled down in.
Good stuff, so let’s create a quick plot of the closing prices to see how the S&P has performed over the period.
The trend strategy we want to implement is based on the crossover of two simple moving averages; the 2 months (42 trading days) and 1 year (252 trading days) moving averages.
Our first step is to create the moving average values and simultaneously append them to new columns in our existing sp500 DataFrame.
sp500['42d'] = np.round(sp500['Close'].rolling(window=42).mean(),2)
sp500['252d'] = np.round(sp500['Close'].rolling(window=252).mean(),2)
The above code both creates the series and automatically adds them to our DataFrame. We can see this as follows (I use the ‘.tail’ call here as the moving averages don’t actually hold values until day 42 and day 252, so they will just show up as ‘NaN’ in a ‘.head’ call):
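To see why the early rows are empty, here is a small sketch on synthetic prices (made-up numbers, not the S&P500 data). The names `close`, `ma_42` and `ma_252` are just illustrative stand-ins for the DataFrame columns above.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices rising by 1 point per day (hypothetical data)
close = pd.Series(np.linspace(1000.0, 1299.0, 300))

# Same rolling-mean calls as in the post, applied to the synthetic series
ma_42 = close.rolling(window=42).mean().round(2)
ma_252 = close.rolling(window=252).mean().round(2)

# A rolling window of N days has no value for the first N - 1 rows
n_nan_42 = int(ma_42.isna().sum())
n_nan_252 = int(ma_252.isna().sum())
print(n_nan_42, n_nan_252)
```

So on the real data a ‘.head’ call would only show ‘NaN’ in both moving average columns, which is why ‘.tail’ is the more useful check.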
And here we see that indeed the moving average columns have been correctly added.
Now let’s go ahead and plot the closing prices and moving averages together on the same chart.
Our basic data set is pretty much complete now; all that’s really left to do is devise a rule to generate our trading signals.
We will have 3 basic states/rules:
1) Buy Signal (go long) – the 42d moving average is for the first time X points above the 252d trend.
2) Park in Cash – no position.
3) Sell Signal (go short) – the 42d moving average is for the first time X points below the 252d trend.
The first step in creating these signals is to add a new column to the DataFrame which is just the difference between the two moving averages:
sp500['42-252'] = sp500['42d'] - sp500['252d']
The next step is to formalise the signals by adding a further column which we will call Stance. We also set our signal threshold ‘X’ to 50 (this is somewhat arbitrary and can be optimised at some point)
X = 50
sp500['Stance'] = np.where(sp500['42-252'] > X, 1, 0)
sp500['Stance'] = np.where(sp500['42-252'] < -X, -1, sp500['Stance'])
sp500['Stance'].value_counts()
(n.b. there was an error in the logic of the above lines of code when this article was first posted, so you will very possibly get significantly different results from mine even if using the same inputs and time period of data. The error was that I had omitted the minus sign in front of the “X” in the second line of code in the above code box; it was kindly pointed out by Theodore in the comments section on 07/03/2019.)
The last line of code above produces:
-1    2077
 1    1865
 0     251
Name: Stance, dtype: int64
Showing that during the time period we have chosen to backtest, on 2077 trading dates the 42d moving average lies more than 50 points below the 252d moving average, and on 1865 the 42d moving average lies more than 50 points above the 252d moving average.
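As a quick sanity check of the signal logic, here is a minimal sketch on a handful of made-up difference values (the `diff` series is hypothetical and stands in for the ‘42-252’ column). Note that a ‘NaN’ difference fails both comparisons and so ends up as a 0 stance.

```python
import numpy as np
import pandas as pd

X = 50
# Hypothetical values of the '42-252' difference column
diff = pd.Series([np.nan, 10.0, 60.0, 75.0, -20.0, -55.0, -80.0, 49.0])

# Long when the difference is above X, otherwise flat
stance = np.where(diff > X, 1, 0)
# Short when the difference is below minus X, otherwise keep the value above
stance = np.where(diff < -X, -1, stance)

# Wrap the NumPy result back into a Series so we can count the states
stance = pd.Series(stance, index=diff.index, name="Stance")
counts = stance.value_counts()
print(stance.tolist())
```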
A quick plot shows a visual representation of this ‘Stance’. I have set the ‘ylim’ (which is the y axis limits) to just above 1 and just below -1 so we can actually see the horizontal parts of the line.
Everything is now in place to test our investment strategy based upon the signals we have generated. In this instance we assume for simplicity that the S&P500 index can be bought or sold directly and that there are no transaction costs. In reality we would need to gain exposure to the index through ETFs, index funds or futures on the index…and of course there would be transaction costs to pay! Hopefully this omission won’t have too much of an effect as we don’t plan to be in and out of trades “too often”.
So in this model, our investor is either long the market, short the market or flat – this allows us to work with market returns and simply multiply the day’s market return by -1 if he is short, 1 if he is long and 0 if he is flat the previous day.
So we add yet another column to the DataFrame to hold the daily log returns of the index and then multiply that column by the ‘Stance’ column to get strategy returns:
sp500['Market Returns'] = np.log(sp500['Close'] / sp500['Close'].shift(1))
sp500['Strategy'] = sp500['Market Returns'] * sp500['Stance'].shift(1)
Note how we have shifted the sp500['Stance'] series down by one day so that we are using the ‘Stance’ at the close of the previous day to calculate the return on the next day.
Now we can plot the returns of the S&P500 versus the returns on the moving average crossover strategy on the same chart for comparison:
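A minimal sketch of the calculation behind such a chart, using a handful of made-up prices and stances rather than the real S&P500 data (the `df` frame and `cumulative` name are assumptions for illustration). Because log returns add up over time, a cumulative sum followed by `np.exp` gives the growth of one unit of capital.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices and stances (not real market data)
df = pd.DataFrame({
    "Close": [100.0, 102.0, 101.0, 103.0, 99.0],
    "Stance": [1, 1, -1, -1, 0],
})

# Daily log returns of the market
df["Market Returns"] = np.log(df["Close"] / df["Close"].shift(1))
# Yesterday's stance applied to today's return
df["Strategy"] = df["Market Returns"] * df["Stance"].shift(1)

# Sum the log returns, then exponentiate to get cumulative growth
cumulative = df[["Market Returns", "Strategy"]].cumsum().apply(np.exp)
print(cumulative)
```

Adding `.plot()` to `cumulative` (with matplotlib installed) would draw both curves on one chart.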
So we can see that although the strategy seems to perform rather well during market downturns, it doesn’t do so well during market rallies or when it is just trending upwards.
Over the test period it barely outperforms a simple buy and hold strategy, which is hardly enough to call it a “successful” strategy.
But there we have it: a simple moving average crossover strategy backtested in Python from start to finish in just a few lines of code!!
Hi, I am having trouble with these lines. By any chance would you be able to assist?
sp500['42d'] = np.round(sp500['Close'].rolling(window=42).mean(),2)
sp500['252d'] = np.round(sp500['Close'].rolling(window=252).mean(),2)
Sure thing… What is it that you’re having problems with exactly? If you could provide a little bit more information, I’ll try to help…
Are you getting an error message? If you could post it here, I’ll take a look.
Thank you very much for responding to my initial comment, I really appreciate it and I was able to solve the issue. (100% my fault) These tutorials are great. THANK YOU VERY MUCH AGAIN!!
I have another question/thought about this back-test. If we were using shorter moving averages, would it be possible to create the following parameters:
(1) If the short moving average crosses above the long moving average, go long for x days.
(2) If the short moving average crosses below the long moving average, go short for x days.
(3a) If there is an additional crossover during the holding period, ignore it.
(3b) If there are no crossovers, hold cash.
Hi Sal, thanks for the kind words…happy to know my online ramblings are of help to at least one or two people!
Your questions are good ones, and ones that I am sure many people would have when looking into an MA crossover trading strategy. I have had a play around and I believe I have come up with something that will get you what you want. It’s not the fastest of code, and it sure ain’t the prettiest either, but the final outcome follows the logic of what you have asked for…so here it is:
Couple of things to be aware of:
1) The "threshold" of the distance that the MA series need to diverge by to count as a "cross over" has been set at 50. This can be changed and optimised according to your own preferences. For example, if you wanted the MA lines to JUST cross to count as a "cross over" you could set the threshold (variable X) to 1.
2) I have set the "days" variable to 50 - this is the holding period, and of course you can change this at will also.
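For readers who want to experiment with the holding-period idea, here is a minimal sketch of one reading of rules (1) to (3b). It is not the code from the reply above: `holding_period_stance` is a hypothetical helper that works on a plain array of stances (1, 0, -1) and treats a move from 0 to a non-zero stance as a "cross over".

```python
import numpy as np

def holding_period_stance(stance, days):
    """Hypothetical helper: hold each new signal for `days` bars,
    ignoring any signals that arrive during the holding period."""
    stance = np.asarray(stance)
    held = np.zeros(len(stance), dtype=int)
    i = 1
    while i < len(stance):
        # A new signal: flat yesterday, long or short today
        entered = stance[i - 1] == 0 and stance[i] != 0
        if entered:
            # Hold the position for `days` bars (or until the data ends)
            end = min(i + days, len(stance))
            held[i:end] = stance[i]
            # Jump past the holding period so signals inside it are ignored
            i = end
        else:
            i += 1
    return held

signals = [0, 0, 1, 1, 1, 0, -1, -1, 0, 0, 1]
held_3 = holding_period_stance(signals, days=3)
held_5 = holding_period_stance(signals, days=5)
print(held_3.tolist())
print(held_5.tolist())
```

With `days=5` the short signal arrives while the long position is still being held, so it is skipped. Slicing with `min(...)` also avoids running off the end of the data when a signal fires near the last row.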
Hope that helps and if you have any further questions, please do ask.
Thank you for the response. I am having some trouble understanding this piece of code. The code is working but I would like to better understand it. I am primarily confused by the iloc, and the k and i. I really don’t understand what those are or where they are pulling information from. Any clarity would be greatly appreciated!!
#iterate through the DataFrame and update the "Stance2" column to hold the relevant stance
for i in range(X,len(sp500)):
    #logical test to check for 1) a cross over short over long MA 2) That we are currently in cash
    if (sp500['Stance'].iloc[i] > sp500['Stance'].iloc[i-1]) and (sp500['Stance'].iloc[i-1] == 0) and (sp500['Stance2'].iloc[i-1] == 0):
        #populate the DataFrame forward in time for the amount of days in our holding period
        for k in range(days):
            sp500['Stance2'].iloc[i+k] = 1
            sp500['Stance2'].iloc[i+k+1] = 0
    #logical test to check for 1) a cross over short under long MA 2) That we are currently in cash
    if (sp500['Stance'].iloc[i] < sp500['Stance'].iloc[i-1]) and (sp500['Stance'].iloc[i-1] == 0) and (sp500['Stance2'].iloc[i-1] == 0):
        #populate the DataFrame forward in time for the amount of days in our holding period
        for k in range(days):
            sp500['Stance2'].iloc[i+k] = -1
            sp500['Stance2'].iloc[i+k+1] = 0
Hi there, no problem at all…glad to hear the code works as intended, at least.
In terms of your other questions regarding the “iloc” and the k and i, I think they may be best tackled in a separate blog post centered around that section of code specifically; it would be a little tough to explain it all properly in these comment boxes.
I’ll try my best to find some time this weekend and put something together for you that will hopefully make it a little clearer as to what the code is actually doing.
Hi Sal – please find the latest blog post which hopefully answers your questions at https://www.pythonforfinance.net/2016/12/18/moving-average-crossover-trading-strategy-backtest-python-v-2-0/
May I ask – are you and “algo” the same person? I see posts by both yourself and “algo” about the same topic.
Hi there, I am having a problem with the import of data from Yahoo using pandas. Could you please help?
File “C:\Python27\lib\site-packages\requests\adapters.py”, line 504, in send
raise ConnectionError(e, request=request)
oo.com’, port=80): Max retries exceeded with url: /table.csv?a=0&ignore=.csv&s=%
5EGSPC&b=1&e=10&d=6&g=d&f=2017&c=2000 (Caused by NewConnectionError(‘: Failed to establish a new connect
ion: [Errno 11004] getaddrinfo failed’,))
Hi, thanks for the comment and apologies for the delay in replying; I have been travelling these past 2 weeks. Unfortunately, I believe the Yahoo Finance API has been discontinued and so no longer works with the Pandas DataReader. You could use a provider like Quandl instead; the syntax is slightly different and the data comes down in a slightly different format, but with a few tweaks you can use it no problem. You will need to install the “quandl” module with “pip install” and then sign up at http://www.quandl.com. After that you can search for the contract you need and click the “Python” option under “Export Data” in the top right of the page.
Have a go at that and if you need any extra guidance or clarification, do let me know!
When I ran this code line: sp500['Strategy'] = sp500['Market Returns'] * sp500['Stance'].shift(1), I got this error: AttributeError: 'numpy.ndarray' object has no attribute 'shift'
Please, what do you think I am doing wrong?
That’s very strange, sp500[‘Stance’] should be a “pandas.core.series.Series” not a “numpy.ndarray”.
Please try to run the code
and let me know what the output is.
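One plausible way to diagnose this kind of error (a sketch on a tiny made-up DataFrame, not necessarily the exact check suggested above) is to compare the type of the raw `np.where` result with the type of the DataFrame column. The error tends to appear when the `np.where` output is kept in a standalone variable instead of being assigned to a column.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical frame standing in for sp500
df = pd.DataFrame({"42-252": [10.0, 60.0, -70.0]})
X = 50

# np.where returns a plain NumPy array, which has no .shift method
raw = np.where(df["42-252"] > X, 1, 0)
# Assigning it to a column stores it as a pandas Series
df["Stance"] = raw

raw_type = type(raw).__name__
col_type = type(df["Stance"]).__name__
print(raw_type, col_type)

# .shift works on the column, not on the raw array
shifted = df["Stance"].shift(1)
```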
I eventually sorted this out.
Btw, please do you have a code to graphically represent Lake Ratio & Gain to Pain ratio of such a strategy as above?
The Gain to Pain ratio is an easy one to do – I’ve had a quick play around and have some code that calculates and creates a very simple bar chart of the Gain to Pain data. The Lake Ratio is however a much more complicated process…I would have to have a think and spend some time trying to get something put together.
As a start, here is the code for the Gain to Pain…
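A minimal sketch of the ratio itself, assuming the commonly quoted definition (sum of all periodic returns divided by the absolute value of the sum of the losing periods). The `gain_to_pain` function and the sample `monthly` returns are hypothetical, and the bar chart mentioned above is not reproduced here.

```python
import pandas as pd

def gain_to_pain(returns):
    """Hypothetical helper: total return divided by the absolute sum of losses."""
    returns = pd.Series(returns).dropna()
    losses = returns[returns < 0].sum()
    # With no losing periods the ratio is undefined, so return NaN
    # rather than dividing by zero
    if losses == 0:
        return float("nan")
    return returns.sum() / abs(losses)

# Made-up monthly returns for illustration
monthly = pd.Series([0.02, -0.01, 0.03, -0.02, 0.01])
ratio = gain_to_pain(monthly)
no_losses = gain_to_pain([0.01, 0.02])
print(ratio)
```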
Thank you. That was really helpful
Hello, one more problem please: I am trying to plot a simple chart that distinctly shows the bearish and bullish periods using the Exponential Moving Average, create a new regime, etc. I would appreciate any help, as I need more education on this.
Hi Famson, you can just use the Exponential Weighted Average method included in the Pandas library…
Take a look at: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.ewm.html
That explains its use.
So for example we could use the code:
to get the exponential weighted average of the sp500 Adjusted Close using a “Centre of Mass” of 0.5.
If you wanted to plot it, just add “.plot()” at the end of the line above.
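As a small illustration of the call described above, here is a sketch on made-up prices; `adj_close` is a hypothetical stand-in for the sp500 Adjusted Close column. With a centre of mass of 0.5 the smoothing factor is 1 / (1 + 0.5) = 2/3.

```python
import pandas as pd

# Made-up prices standing in for the Adjusted Close column
adj_close = pd.Series([100.0, 102.0, 101.0, 105.0], name="Adj Close")

# Exponentially weighted mean with a centre of mass of 0.5
ewma = adj_close.ewm(com=0.5).mean()
print(ewma)
```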
Hope that helps.
Yes it does. But what I was actually looking at is MA that use different colour and trendline for downward movement and upward movement.
In addition, I am also looking at pairs trade between these 2 indices using specific indicator.
Thank you very much for this series of tutorials! I mean all your WORK! Excellent work! Keep it coming please!!!!!
Brilliant work indeed! Thank you very much. Would be nice if you could clarify my below doubt.
I have a csv file. 6 columns in the below format.
Date Stock 1 Price Stock 2 Price Stock 3 Price Stock 4 Price Market Index Price
But the thing here is that I have the price data stored in a csv file on my desktop, and I would like to use that instead of pulling from Yahoo. Also, Stock 1 is the indicator. That is, the whole crossover signal is obtained just from the second column, the Stock 1 price list. Based on this signal, stocks 2, 3 and 4 are purchased, equally weighted. Could you kindly advise me on the code that I need to use as a replacement?
Also, I don’t want to bring in the short position.
Just long position and hold on to it – when short moving avg crosses above long moving average
Sell position entirely – when short moving avg crosses below long moving average; then buy back once again after 5 trading days.
I have been struggling a lot with the code as I’m a newbie in python. It would be really kind of you, if you assist me with the code. Thank you once again for the fantastic work of yours. Keep going.
Apologies for the delay in replying – with regard to the request above – to read in a csv file you can use pandas “read_csv”:
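A minimal sketch of that call, using an in-memory string as a stand-in for the file on disk so that it runs on its own. The column names and values are made up; with a real file you would pass its path in place of `io.StringIO(csv_text)`.

```python
import io
import pandas as pd

# Stand-in for a csv file on disk; column names and prices are hypothetical
csv_text = """Date,Stock 1,Stock 2,Market Index
2017-01-03,10.0,20.0,2257.8
2017-01-04,10.5,19.8,2270.8
"""

# Use the Date column as the index and parse it into proper dates
prices = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)
print(prices)
```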
With regard to the other criteria specified, may I ask what you have come up with so far? If you post it, perhaps I can take a look through and suggest areas to modify.
I will reply to you via email too to see if I can help.
Hi, there is a small change in my problem. Below is the code I’m using. I am just stuck on the threshold part. That is, rebalance the portfolio only if it deviates beyond the threshold, say 5%. I would like to put this condition before initiating the rebalance so that it doesn’t rebalance every month for even a small deviation. Could you please guide me? Thank you very much.
# fetch some data and also if out of these stocks, the recent listed date range is considered
data = bt.get('VTI, BND',start='2007,01,11',end='2017,01,11')
def __init__(self, weights):
    self.target_weights = weights
def __call__(self, target):
    target.temp['weights'] = dict(zip(target.temp['selected'], self.target_weights))
def my_comm(q, p):
    return abs(q) * 0.5
# create the strategy & if you need it to run weekly the rebalancing use Weekly instead of Monthly
s = bt.Strategy('Portfolio1', [bt.algos.RunMonthly(),
# create a backtest and run it
test = bt.Backtest(s, data, initial_capital=10000, commissions=my_comm)
res = bt.run(test)
# first let’s see an equity curve
# ok and how does the return distribution look like
# and just to make sure everything went along as planned, let’s plot the security weights over time
I just want to know the reason: when you sum up the strategy returns,
why don’t you apply np.exp to the log returns?
Hey, I am a bit confused about this part:
sp500['Stance'] = np.where(sp500['42-252'] < X, -1, sp500['Stance'])
I think we should have taken the absolute value and changed sign to greater.
For example, if we have 100 and 80, that will be 20 which will be < 50 which was the limit. However do we want it like this? I thought we wanted only cases where say 50 -110 = -60 which is a sell.
Hi Theodore – you are indeed correct!! Thanks very much for pointing this out…it’s quite an egregious error on my part, as it’s an important part of the logic!!!
The line of code should read:
sp500['Stance'] = np.where(sp500['42-252'] < -X, -1, sp500['Stance'])
I had omitted the minus sign in front of the "X" - we are indeed looking for the value of the 42 period MA minus the 252 period MA to be lower than MINUS 50!!
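To make the difference concrete, here is a small sketch using the two cases from Theodore’s comment (100 minus 80, and 50 minus 110). The variable names `buggy` and `fixed` are just labels for the two versions of the second line.

```python
import numpy as np
import pandas as pd

X = 50
# Differences from the comment above: 100 - 80 = 20 and 50 - 110 = -60
diff = pd.Series([20.0, -60.0])

# First line of the signal logic: long above X, otherwise flat
base = np.where(diff > X, 1, 0)

# Original (incorrect) second line: anything below +X becomes a short
buggy = np.where(diff < X, -1, base)
# Corrected second line: only values below -X become a short
fixed = np.where(diff < -X, -1, base)

print(buggy.tolist(), fixed.tolist())
```

With the minus sign missing, a difference of +20 is wrongly flagged as a sell; with the fix it stays flat and only the -60 case goes short.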
Again - thanks for bringing that to my attention - I shall change the code accordingly.
Thank you very much for providing us access to these tutorials. I am a retiree who learns Python by studying the resources that he finds on the Internet. Trying to understand these scripts, I have the following questions.
a) What criteria should we follow to set our signal threshold ‘X’? You use X = 50 for the SP_500. I have tested with “IBE.MC” and with quotes from two Investment Funds and, if the threshold is not 0 or very close to 0, practically all the resulting “stances” are zero and the whole post process of the scripts is a disaster.
b) .- The calculation of Volatility / Max Drawdown, always gives me the error “ZeroDivisionError: float division by zero”
I will appreciate any suggestions to set these concepts.
Hi there, apologies for the late reply. I will email you directly and help you with this, that will probably be easier than commenting back and forth. Check your inbox shortly 🙂