Moving Average Crossover Trading Strategy Backtest in Python – V 2.0

Welcome back…this post is going to deal with a couple of questions I received in the comments section of a previous post, one relating to a moving average crossover trading strategy – the article can be found here.

The questions I received at first asked whether the code for the backtest could be adapted to follow the following rules:

(1) If the short moving average crosses above the long moving average go long for x days.
(2) if the short moving average crosses below the long moving average short for x days.
(3a) If there is an additional crossover during holding period ignore it
(3b) If there are not crossovers hold cash

Here is the original code:

#import relevant modules
import pandas as pd
import numpy as np
from pandas_datareader import data
from math import sqrt
import matplotlib.pyplot as plt
 
 
#download data into DataFrame and create moving averages columns
sp500 = data.DataReader('^GSPC', 'yahoo',start='1/1/2000')
sp500['42d'] = np.round(sp500['Close'].rolling(window=42).mean(),2)
sp500['252d'] = np.round(sp500['Close'].rolling(window=252).mean(),2)
 
#create column with moving average spread differential
sp500['42-252'] = sp500['42d'] - sp500['252d']
 
#set desired number of points as threshold for spread difference and create column containing strategy 'Stance'
X = 50
sp500['Stance'] = np.where(sp500['42-252'] > X, 1, 0)
sp500['Stance'] = np.where(sp500['42-252'] < X, -1, sp500['Stance'])
sp500['Stance'].value_counts()
 
#create columns containing daily market log returns and strategy daily log returns
sp500['Market Returns'] = np.log(sp500['Close'] / sp500['Close'].shift(1))
sp500['Strategy'] = sp500['Market Returns'] * sp500['Stance'].shift(1)
 
#set strategy starting equity to 1 (i.e. 100%) and generate equity curve
sp500['Strategy Equity'] = sp500['Strategy'].cumsum() + 1
 
#show chart of equity curve
sp500['Strategy Equity'].plot()

And here is the code which incorporates the above rules:

#import relevant modules
import pandas as pd
import numpy as np
from pandas_datareader import data
from math import sqrt
import matplotlib.pyplot as plt
%matplotlib inline
 
 
#download data into DataFrame and create moving averages columns
sp500 = data.DataReader('^GSPC', 'yahoo',start='1/1/2014')
sp500['42d'] = np.round(sp500['Close'].rolling(window=42).mean(),2)
sp500['252d'] = np.round(sp500['Close'].rolling(window=252).mean(),2)
 
#create column with moving average spread differential
sp500['42-252'] = sp500['42d'] - sp500['252d']
 
#set desired number of points as threshold for spread difference and create column containing strategy 'Stance'
X = 50
sp500['Stance'] = np.where(sp500['42-252'] > X, 1, 0)
sp500['Stance'] = np.where(sp500['42-252'] < -X, -1, sp500['Stance']) 
sp500['Stance'].value_counts() 
#create columns containing daily market log returns and strategy daily log returns 
sp500['Market Returns'] = np.log(sp500['Close'] / sp500['Close'].shift(1)) 
sp500['Strategy'] = sp500['Market Returns'] * sp500['Stance'].shift(1) 
#set up a new column to hold our stance relevant for the prespecified holding period 
sp500['Stance2'] = 0 
#set out predetermined holding period, after which time we will go back to holding cash and wait 
#for the next moving average cross over - also we will ignore any extra crossovers during this holding period days = 50 
#iterate through the DataFrame and update the "Stance2" column to hold the revelant stance 
for i in range(X,len(sp500)): 
    #logical test to check for 1) a cross over short over long MA 2) That we are currently in cash 
    if (sp500['Stance'].iloc[i] > sp500['Stance'].iloc[i-1]) and (sp500['Stance'].iloc[i-1] == 0) and (sp500['Stance2'].iloc[i-1] == 0):
        #populate the DataFrame forward in time for the amount of days in our holding period
        for k in range(days):
            try:
                sp500['Stance2'].iloc[i+k] = 1
                sp500['Stance2'].iloc[i+k+1] = 0
            except:
                pass
    #logical test to check for 1) a cross over short under long MA 2) That we are currently in cash
    if (sp500['Stance'].iloc[i] < sp500['Stance'].iloc[i-1]) and (sp500['Stance'].iloc[i-1] == 0) and (sp500['Stance2'].iloc[i-1] == 0):
        #populate the DataFrame forward in time for the amount of days in our holding period
        for k in range(days):
            try:
                sp500['Stance2'].iloc[i+k] = -1
                sp500['Stance2'].iloc[i+k+1] = 0
            except:
                pass
 
 
#Calculate daily market returns and strategy daily returns
sp500['Market Returns'] = np.log(sp500['Close'] / sp500['Close'].shift(1))
sp500['Strategy'] = sp500['Market Returns'] * sp500['Stance2'].shift(1)
 
#plot strategy returns vs market returns
sp500[['Market Returns','Strategy']].cumsum().plot(grid=True,figsize=(8,5))
plt.show()
 
#set strategy starting equity to 1 (i.e. 100%) and generate equity curve
sp500['Strategy Equity'] = sp500['Strategy'].cumsum() + 1
 
#show chart of equity curve
sp500['Strategy Equity'].plot(grid=True,figsize=(8,5))
plt.show()

After posting this code I received the following follow-up questions:

“Thank you for the response. I am having some trouble understanding this piece of code. The code is working but I would like to better understand it. I am primarily confused with the iloc, and k and I. I really don’t understand what those are or where they are pulling information from. any clarity would be greatly appreciated!!”

So…!! Here’s my attempt at trying to clarify my train of thought while writing the code, and try to explain exactly what is going on line by line.

Let’s start with the “iloc”; here is the info from the official Pandas documentation:

'''.iloc is primarily integer position based (from 0 to length-1 of the axis), 
but may also be used with a boolean array. 
.iloc will raise IndexError if a requested indexer is out-of-bounds, 
except slice indexers which allow out-of-bounds indexing. 
(this conforms with python/numpy slice semantics). 
Allowed inputs are:
 
An integer e.g. 5
 
A list or array of integers [4, 3, 0]
 
A slice object with ints 1:7
 
A boolean array
 
A callable function with one argument (the calling Series, DataFrame or Panel) 
and that returns valid output for indexing (one of the above)'''

So as stated above, “iloc” is just an easy way to reference a row position in your DataFrame – simple as that. If we create the following DataFrame and show it, we get:

df = pd.DataFrame({'x':['A','B','C','D','E']})
 
df

We can now just use “iloc” to select a row from the DataFrame using an integer as the value passed, like so:

df.iloc[0]

x A

or:

df.iloc[1]

x B

and so on. So in the moving average cross over strategy code above, the “.iloc” use is literally just allowing me to select values in the DataFrame at specific row indexes, based upon the integer value I pass to the “.iloc” function.

sp500['Stance'].iloc[1]

allows me to reference the value of the “Stance” column at index 1.

Ok so that should hopefully explain the “.iloc” function.

Let’s move on to the use of “i” in the above code and what that is doing, and where it comes from. This is also a pretty simple concept once you understand what is happening. In the code we set up a column called “Stance2” which is set to hold our new position stance, based upon the augmented rules mentioned at the start of this post – about holding a position for x amount of days etc.

When first set up, the “Stance2” column has just been populated with zeros all the way down – we now need to iterate through all the values and apply the logic in order to specify whether the value should actually be 1 (long position), 0 (no position) or -1 (short position).

The clue to the use of “i” and what it’s doing is in the use of the word “iterate” – we need to run through each value in the column individually and check and update as necessary.

So – how do we access each individual value in the column in turn? We set up an iterator, and use that.

An iterator is basically a value that starts at a value and increments up or down with each pass through the code. They are more of then than not used with “for” and “while” loops. Let’s look at a short example:

for i in [0,1,2,3,4]:
    print i

Will get us:

0
1
2
3
4

We can see that each run through the code iterates through the values in the list, moving on to the next one each time and prints out its value, until it reaches the end of the list and ends the “for” loop.

So in effect the value of “i” changes each time the “for” loop runs.

In Python, if we want to use an iterator that increments through a list of 0 to 1000, for example – rather than write out a list with all the values from 0 to 1000, we can just use the “range” function. This can be used as follows:

range(0,5)

which produces:

[0, 1, 2, 3, 4]

Wow – so the range function actually creates a list with all the values starting at 0 up to (but not including) the values passed to the function in brackets. We could now just easily create a list of values form 0 to 1000 by using:

range(0,1001)

So the following line in the strategy code:

#iterate through the DataFrame and update the "Stance2" column to hold the relevant stance 
for i in range(X,len(sp500)):

Is using the “range” function to create a list of values starting at “X” and going up to (but not including) the value of however long the sp500 DataFrame is, by using the “len” function. So if X is 50, for example, and the length of the sp500 DataFrame is 500, the list of values that we will iterate through will run from 50 to 499 – i.e. from the 50th day in our DataFrame to the last.

When we combine the “.iloc” and the “for i in range” and actually pass the value of “i” to the “iloc” as “.iloc[i]” – we are able to use the iterator “i”, which is incrementing up by one each time the for loop runs, to access each DataFrame row index, one after the other.

Right, so we should now be clear about the main “for i in range (X, len(sp500)):” loop – we are literally just iterating though each row in the DataFrame one by one, starting at row X and ending at the end of the DataFrame (actually technically we are finishing at the second last day as the range function creates a list that goes up to, but doesn’t include the second value passed – but for our purposes finishing on the second last day has no effect on the output)

So now let’s try to tackle all these “if” and “for k” loops and tests within the main outer “for” loop explained above.

The “if” tests are just running a couple of logical tests on each row of the DataFrame that we are iterating through – the first logical tests deal with crossovers when the short moving average crosses ABOVE the long moving average and uses the values in the “Stance” column firstly to check whether the values moves from 0 to 1 from day “T-1” to day “T” AND the value in the “Stance2” column at day “T-1” is also 0 – if so, this signifies that the short moving average has crossed above the long moving average and we should enter a position.

If this isn’t clear why – take a little time to think through the logic. We need the “Stance” column to move from 0 to 1 to signify a short over long MA crossover, but we also need the “Stance2” column at day “T-1” to be 0, as if it is, no position is currently held and one should therefore be entered into. If “Stance2” at day “T-1” wasn’t zero, then that signifies we are already in a position and nothing should be done, even though the first logical test shows a moving average crossover.

I appreciate this could take a little time to work through and get your head around, but all I can say is that the logical tests as set up in the code do indeed follow the logic presented in the initial questions.

The second set of “if” tests are just the mirror image of the first – and deal with crossovers when the short moving average crosses BELOW the long moving average.

Now finally, hopefully the above is all clear (as mud) and all we have left to deal with is the use of “k” in the “for k in range(days)” loop.

Firstly, “days” is just a variable that holds the number of days we wish to hold our positions for before automatically exiting those positions.

“days” was set to 50 in the code, but could be set to whatever value one desires for their particular strategy.

So with that in mind, the “for k in range(days)” loop runs if and only if the previous logical tests are satisfied, and our strategy is telling us to open a position (whether long or short). If an entry signal is presented, we then iterate forward in time by “k” days (starting at day “i” – so we use “i+k” as our starting day) and set our “Stance2” to either 1 or -1 (depending on whether the signal is a long or short position signal). This represents the time we wish to hold our position before automatically exiting. So if an entry signal is produced on day 100 (at which point i will equal 100), the stance at day “i + k” will be set to either -1 or 1; k will start at 0 and run to 49, incrementing by one with each run through the “for k in” loop.

So stance at day “i + k”, will set day 100 + 0 to 1 or -1, then day 100 + 1 then 100 + 2 etc, up until the position is exited at day 100 + 49, and the stance on day 100 + 50 is set to zero to show a flat position.

A couple of possibly confusing additions to the logic above and in the following lines:

for k in range(days):
            try:
                sp500['Stance2'].iloc[i+k] = 1
                sp500['Stance2'].iloc[i+k+1] = 0
            except:
                pass

are the use of the second line “sp500[‘Stance2’].iloc[i+k+1] = 0” – I have included this to make sure the “Stance2” column is reset to 0 as we have exited our position and are now flat again – this flat stance is set for the day after we exit our position – in effect day (i+k+1).

The second possibly confusing addition is the use of the “try and except”. We use this so that if we reach the end of the DataFrame while still iterating forward in time by X days, instead of throwing up an “index out of bounds” error and exiting the code, the code just passes and ends with no error.

This can happen for example if say our “days” window is set to 50 and with only 30 days left of data, an entry signal is produced – if we try to iterate forward 50 days in time when only 30 days of data exists – without the try/except our code will spit an error and quit.

Hopefully this post explains and clarifies the vast majority of the augmented code – if not, then please do leave your questions and comments below and I’ll try to tackle them all…

Until next time!

It's only fair to share...Share on FacebookShare on Google+Tweet about this on TwitterShare on LinkedInEmail this to someonePin on PinterestShare on Reddit
Written by s666