Welcome back…this post is going to deal with a couple of questions I received in the comments section of a previous post, one relating to a moving average crossover trading strategy – the article can be found here.
The questions I received at first asked whether the code for the backtest could be adapted to follow the following rules:
(1) If the short moving average crosses above the long moving average go long for x days.
(2) if the short moving average crosses below the long moving average short for x days.
(3a) If there is an additional crossover during holding period ignore it
(3b) If there are not crossovers hold cash
Here is the original code:
#import relevant modules import pandas as pd import numpy as np from pandas_datareader import data from math import sqrt import matplotlib.pyplot as plt #download data into DataFrame and create moving averages columns sp500 = data.DataReader('^GSPC', 'yahoo',start='1/1/2000') sp500['42d'] = np.round(sp500['Close'].rolling(window=42).mean(),2) sp500['252d'] = np.round(sp500['Close'].rolling(window=252).mean(),2) #create column with moving average spread differential sp500['42-252'] = sp500['42d'] - sp500['252d'] #set desired number of points as threshold for spread difference and create column containing strategy 'Stance' X = 50 sp500['Stance'] = np.where(sp500['42-252'] > X, 1, 0) sp500['Stance'] = np.where(sp500['42-252'] < X, -1, sp500['Stance']) sp500['Stance'].value_counts() #create columns containing daily market log returns and strategy daily log returns sp500['Market Returns'] = np.log(sp500['Close'] / sp500['Close'].shift(1)) sp500['Strategy'] = sp500['Market Returns'] * sp500['Stance'].shift(1) #set strategy starting equity to 1 (i.e. 100%) and generate equity curve sp500['Strategy Equity'] = sp500['Strategy'].cumsum() + 1 #show chart of equity curve sp500['Strategy Equity'].plot()
And here is the code which incorporates the above rules:
#import relevant modules import pandas as pd import numpy as np from pandas_datareader import data from math import sqrt import matplotlib.pyplot as plt %matplotlib inline #download data into DataFrame and create moving averages columns sp500 = data.DataReader('^GSPC', 'yahoo',start='1/1/2014') sp500['42d'] = np.round(sp500['Close'].rolling(window=42).mean(),2) sp500['252d'] = np.round(sp500['Close'].rolling(window=252).mean(),2) #create column with moving average spread differential sp500['42-252'] = sp500['42d'] - sp500['252d'] #set desired number of points as threshold for spread difference and create column containing strategy 'Stance' X = 50 sp500['Stance'] = np.where(sp500['42-252'] > X, 1, 0) sp500['Stance'] = np.where(sp500['42-252'] < -X, -1, sp500['Stance']) sp500['Stance'].value_counts() #create columns containing daily market log returns and strategy daily log returns sp500['Market Returns'] = np.log(sp500['Close'] / sp500['Close'].shift(1)) sp500['Strategy'] = sp500['Market Returns'] * sp500['Stance'].shift(1) #set up a new column to hold our stance relevant for the prespecified holding period sp500['Stance2'] = 0 #set out predetermined holding period, after which time we will go back to holding cash and wait #for the next moving average cross over - also we will ignore any extra crossovers during this holding period days = 50 #iterate through the DataFrame and update the "Stance2" column to hold the revelant stance for i in range(X,len(sp500)): #logical test to check for 1) a cross over short over long MA 2) That we are currently in cash if (sp500['Stance'].iloc[i] > sp500['Stance'].iloc[i-1]) and (sp500['Stance'].iloc[i-1] == 0) and (sp500['Stance2'].iloc[i-1] == 0): #populate the DataFrame forward in time for the amount of days in our holding period for k in range(days): try: sp500['Stance2'].iloc[i+k] = 1 sp500['Stance2'].iloc[i+k+1] = 0 except: pass #logical test to check for 1) a cross over short under long MA 2) That we are currently in cash if (sp500['Stance'].iloc[i] < sp500['Stance'].iloc[i-1]) and (sp500['Stance'].iloc[i-1] == 0) and (sp500['Stance2'].iloc[i-1] == 0): #populate the DataFrame forward in time for the amount of days in our holding period for k in range(days): try: sp500['Stance2'].iloc[i+k] = -1 sp500['Stance2'].iloc[i+k+1] = 0 except: pass #Calculate daily market returns and strategy daily returns sp500['Market Returns'] = np.log(sp500['Close'] / sp500['Close'].shift(1)) sp500['Strategy'] = sp500['Market Returns'] * sp500['Stance2'].shift(1) #plot strategy returns vs market returns sp500[['Market Returns','Strategy']].cumsum().plot(grid=True,figsize=(8,5)) plt.show() #set strategy starting equity to 1 (i.e. 100%) and generate equity curve sp500['Strategy Equity'] = sp500['Strategy'].cumsum() + 1 #show chart of equity curve sp500['Strategy Equity'].plot(grid=True,figsize=(8,5)) plt.show()
After posting this code I received the following follow-up questions:
“Thank you for the response. I am having some trouble understanding this piece of code. The code is working but I would like to better understand it. I am primarily confused with the iloc, and k and I. I really don’t understand what those are or where they are pulling information from. any clarity would be greatly appreciated!!”
So…!! Here’s my attempt at trying to clarify my train of thought while writing the code, and try to explain exactly what is going on line by line.
Let’s start with the “iloc”; here is the info from the official Pandas documentation:
'''.iloc is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array. .iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers which allow out-of-bounds indexing. (this conforms with python/numpy slice semantics). Allowed inputs are: An integer e.g. 5 A list or array of integers [4, 3, 0] A slice object with ints 1:7 A boolean array A callable function with one argument (the calling Series, DataFrame or Panel) and that returns valid output for indexing (one of the above)'''
So as stated above, “iloc” is just an easy way to reference a row position in your DataFrame – simple as that. If we create the following DataFrame and show it, we get:
df = pd.DataFrame({'x':['A','B','C','D','E']}) df

We can now just use “iloc” to select a row from the DataFrame using an integer as the value passed, like so:
df.iloc[0]
x A
or:
df.iloc[1]
x B
and so on. So in the moving average cross over strategy code above, the “.iloc” use is literally just allowing me to select values in the DataFrame at specific row indexes, based upon the integer value I pass to the “.iloc” function.
sp500['Stance'].iloc[1]
allows me to reference the value of the “Stance” column at index 1.
Ok so that should hopefully explain the “.iloc” function.
Let’s move on to the use of “i” in the above code and what that is doing, and where it comes from. This is also a pretty simple concept once you understand what is happening. In the code we set up a column called “Stance2” which is set to hold our new position stance, based upon the augmented rules mentioned at the start of this post – about holding a position for x amount of days etc.
When first set up, the “Stance2” column has just been populated with zeros all the way down – we now need to iterate through all the values and apply the logic in order to specify whether the value should actually be 1 (long position), 0 (no position) or -1 (short position).
The clue to the use of “i” and what it’s doing is in the use of the word “iterate” – we need to run through each value in the column individually and check and update as necessary.
So – how do we access each individual value in the column in turn? We set up an iterator, and use that.
An iterator is basically a value that starts at a value and increments up or down with each pass through the code. They are more of then than not used with “for” and “while” loops. Let’s look at a short example:
for i in [0,1,2,3,4]: print i
Will get us:
0 1 2 3 4
We can see that each run through the code iterates through the values in the list, moving on to the next one each time and prints out its value, until it reaches the end of the list and ends the “for” loop.
So in effect the value of “i” changes each time the “for” loop runs.
In Python, if we want to use an iterator that increments through a list of 0 to 1000, for example – rather than write out a list with all the values from 0 to 1000, we can just use the “range” function. This can be used as follows:
range(0,5)
which produces:
[0, 1, 2, 3, 4]
Wow – so the range function actually creates a list with all the values starting at 0 up to (but not including) the values passed to the function in brackets. We could now just easily create a list of values form 0 to 1000 by using:
range(0,1001)
So the following line in the strategy code:
#iterate through the DataFrame and update the "Stance2" column to hold the relevant stance for i in range(X,len(sp500)):
Is using the “range” function to create a list of values starting at “X” and going up to (but not including) the value of however long the sp500 DataFrame is, by using the “len” function. So if X is 50, for example, and the length of the sp500 DataFrame is 500, the list of values that we will iterate through will run from 50 to 499 – i.e. from the 50th day in our DataFrame to the last.
When we combine the “.iloc” and the “for i in range” and actually pass the value of “i” to the “iloc” as “.iloc[i]” – we are able to use the iterator “i”, which is incrementing up by one each time the for loop runs, to access each DataFrame row index, one after the other.
Right, so we should now be clear about the main “for i in range (X, len(sp500)):” loop – we are literally just iterating though each row in the DataFrame one by one, starting at row X and ending at the end of the DataFrame (actually technically we are finishing at the second last day as the range function creates a list that goes up to, but doesn’t include the second value passed – but for our purposes finishing on the second last day has no effect on the output)
So now let’s try to tackle all these “if” and “for k” loops and tests within the main outer “for” loop explained above.
The “if” tests are just running a couple of logical tests on each row of the DataFrame that we are iterating through – the first logical tests deal with crossovers when the short moving average crosses ABOVE the long moving average and uses the values in the “Stance” column firstly to check whether the values moves from 0 to 1 from day “T-1” to day “T” AND the value in the “Stance2” column at day “T-1” is also 0 – if so, this signifies that the short moving average has crossed above the long moving average and we should enter a position.
If this isn’t clear why – take a little time to think through the logic. We need the “Stance” column to move from 0 to 1 to signify a short over long MA crossover, but we also need the “Stance2” column at day “T-1” to be 0, as if it is, no position is currently held and one should therefore be entered into. If “Stance2” at day “T-1” wasn’t zero, then that signifies we are already in a position and nothing should be done, even though the first logical test shows a moving average crossover.
I appreciate this could take a little time to work through and get your head around, but all I can say is that the logical tests as set up in the code do indeed follow the logic presented in the initial questions.
The second set of “if” tests are just the mirror image of the first – and deal with crossovers when the short moving average crosses BELOW the long moving average.
Now finally, hopefully the above is all clear (as mud) and all we have left to deal with is the use of “k” in the “for k in range(days)” loop.
Firstly, “days” is just a variable that holds the number of days we wish to hold our positions for before automatically exiting those positions.
“days” was set to 50 in the code, but could be set to whatever value one desires for their particular strategy.
So with that in mind, the “for k in range(days)” loop runs if and only if the previous logical tests are satisfied, and our strategy is telling us to open a position (whether long or short). If an entry signal is presented, we then iterate forward in time by “k” days (starting at day “i” – so we use “i+k” as our starting day) and set our “Stance2” to either 1 or -1 (depending on whether the signal is a long or short position signal). This represents the time we wish to hold our position before automatically exiting. So if an entry signal is produced on day 100 (at which point i will equal 100), the stance at day “i + k” will be set to either -1 or 1; k will start at 0 and run to 49, incrementing by one with each run through the “for k in” loop.
So stance at day “i + k”, will set day 100 + 0 to 1 or -1, then day 100 + 1 then 100 + 2 etc, up until the position is exited at day 100 + 49, and the stance on day 100 + 50 is set to zero to show a flat position.
A couple of possibly confusing additions to the logic above and in the following lines:
for k in range(days): try: sp500['Stance2'].iloc[i+k] = 1 sp500['Stance2'].iloc[i+k+1] = 0 except: pass
are the use of the second line “sp500[‘Stance2’].iloc[i+k+1] = 0” – I have included this to make sure the “Stance2” column is reset to 0 as we have exited our position and are now flat again – this flat stance is set for the day after we exit our position – in effect day (i+k+1).
The second possibly confusing addition is the use of the “try and except”. We use this so that if we reach the end of the DataFrame while still iterating forward in time by X days, instead of throwing up an “index out of bounds” error and exiting the code, the code just passes and ends with no error.
This can happen for example if say our “days” window is set to 50 and with only 30 days left of data, an entry signal is produced – if we try to iterate forward 50 days in time when only 30 days of data exists – without the try/except our code will spit an error and quit.
Hopefully this post explains and clarifies the vast majority of the augmented code – if not, then please do leave your questions and comments below and I’ll try to tackle them all…
Until next time!
5 comments
Good info. But I have to note that previous website design was much much better than this one. That gray text is hard to read, zoom in/out functions doesnt work on this theme. The whole website works slower than with previous theme. Maybe because several plugins installed such as Squirrly SEO, W3 Total Cache and Jetpack. Beleive me, having such informative content, you don’t need all that seo garbage at all.
Hi there…I totally agree, I’ve given up with the previous theme…it was far too complex behind the scenes and I could never get anything to look the way I wanted it to. So I’ve decided to use this new theme now; I would be interested to hear what you think if it!
Thanks for the feedback – it’s always much appreciated and definitely get’s listened to!
Now it looks better and works faster. thank you
Thanks for the awesome information sir!! I really appreciated. I have a question regarding the crossover. If we want to go buy when the short MA crosses above the long MA, why couldn’t we type the following code?
for row in range(len(sp500)):
if (sp500[’42-252′][row] >= sp500[’42-252′][row]) and (sp500[’42-252′][row-1] X, 1, 0).
My guess of the meaning of this line is: for each trading date, if the short MA is X points greater than the long MA, put 1 into the corresponding spot of the “Stance” column. Is it correct? But if that’s true, how is it related to the “crossing above” thing? “sp500[’42-252′] > X” is just comparing short MA and long MA “within the same day” right? We can only know whether it does crosses above by comparing “different” consecutive days right? Sorry for typing so many words. Hope to hear from you. Thank you in advance!
Please ignore the previous comment I posted moments ago. There were several typos.
If we want to go buy when the short MA crosses above the long MA, why couldn’t we type the following code?
for row in range(len(sp500)):
if (sp500[’42-252′][row] >= sp500[’42-252′][row]) and (sp500[’42-252′][row-1] X, 1, 0)
My guess of the meaning of this line is: for each trading date, if the short MA is X points greater than the long MA, put 1 into the corresponding spot of the “Stance” column. Is it correct? But if that’s true, how is it related to the “crossing above” thing? “sp500[’42-252′] > X” is just comparing short MA and long MA “within the same day” right? We can only know whether it does crosses above by comparing “different” consecutive days right? Hope to hear from you. Thank you in advance!