After completing the series on creating an inter-day mean reversion strategy, I thought it may be an idea to visit another mean reversion strategy, but one that works on an intra-day scale. That is, we will be looking for the mean reversion to take place within one trading day.
Stock prices tend to follow geometric random walks, as we are often reminded by countless financial scholars; but this is true only if we test their price series for mean reversion strictly at regular intervals, such as using their daily closing price. Our job is to find special conditions where mean reversion occurs with regularity. As the following strategy will show, there may indeed be seasonal mean reversion occurring at the intra-day time frame for stocks.
The strategy rules are as follows:
1) Select all stocks near the market open whose returns from their previous day’s lows to today’s opens are lower than one standard deviation. The standard deviation is computed using the daily close-to-close returns of the last 90 days. These are stocks that “gapped down”.
2) Narrow down this list of stocks by requiring that their open prices be higher than the 20-day moving average of the closing prices.
3) Liquidate the positions at the market close.
So we will first begin with our necessary module imports as follows:
import pandas as pd import numpy as np from pandas_datareader import data from math import sqrt
I will be running this backtest using the NYSE stock universe which contains 3159 stock – you can download the ticker list by clicking on the download button below.
Once you have that file stored somewhere, we can feed it in using pandas, and set up our stock ticker list as follows:
#make sure the NYSE.txt file is in the same folder as your python script file stocks = pd.read_csv('NYSE.txt',delimiter="\t") #set up our empty list to hold the stock tickers stocks_list = [] #iterate through the pandas dataframe of tickers and append them to our empty list for symbol in stocks['Symbol']: stocks_list.append(symbol)
As a quick check to see if they have been fed in correctly:
len(stocks_list)
should produce
3159
and
stocks_list[:5]
Should produce:
['A', 'AA', 'AAC', 'AAN', 'AAP']
Ok great, so now we have our list of stocks that we wish to use as our “investment universe” – we can begin to write the code for the actual backtest.
The logic of our approach is as follows…we will iterate through the list of stock tickers, each time we will download the relevant price data into a DataFrame and then add a couple of columns to help us create signals as to when our two criteria are met (gap down of larger than 1 90 day rolling standard deviation and an opening price above the 20 day moving average).
We will then use these signals to create our return series for that stock, and then store that information by appending each stocks return series to a list. Finally we will concatenate all those return series into a master DataFrame and calculate our overall daily return.
#create empty list to hold our return series DataFrame for each stock frames = [] for stock in stocks_list: try: #download stock data and place in DataFrame df = data.DataReader(stock, 'yahoo',start='1/1/2000') #create column to hold our 90 day rolling standard deviation df['Stdev'] = df['Close'].rolling(window=90).std() #create a column to hold our 20 day moving average df['Moving Average'] = df['Close'].rolling(window=20).mean() #create a column which holds a TRUE value if the gap down from previous day's low to next #day's open is larger than the 90 day rolling standard deviation df['Criteria1'] = (df['Open'] - df['Low'].shift(1)) < -df['Stdev'] #create a column which holds a TRUE value if the opening price of the stock is above the 20 day moving average df['Criteria2'] = df['Open'] > df['Moving Average'] #create a column that holds a TRUE value if both above criteria are also TRUE df['BUY'] = df['Criteria1'] & df['Criteria2'] #calculate daily % return series for stock df['Pct Change'] = (df['Close'] - df['Open']) / df['Open'] #create a strategy return series by using the daily stock returns where the trade criteria above are met df['Rets'] = df['Pct Change'][df['BUY'] == True] #append the strategy return series to our list frames.append(df['Rets']) except: pass
Now this stock list has over 3000 stocks in it, so expect this code to take a bit of time to run…I believe mine took about 15-20 minutes to run when I tried it, so try to be a bit patient.
Once the code has run and we have our list filled with all the individual strategy return series for each stock, we have to concatenate them all into a master DataFrame and then calculate the overall daily strategy return. This can be done as follows:
#concatenate the individual DataFrames held in our list- and do it along the column axis masterFrame = pd.concat(frames,axis=1) #create a column to hold the sum of all the individual daily strategy returns masterFrame['Total'] = masterFrame.sum(axis=1) #create a column that hold the count of the number of stocks that were traded each day #we minus one from it so that we dont count the "Total" column we added as a trade. masterFrame['Count'] = masterFrame.count(axis=1) - 1 #create a column that divides the "total" strategy return each day by the number of stocks traded that day to get equally weighted return. masterFrame['Return'] = masterFrame['Total'] / masterFrame['Count']
So now we have a return series that holds the strategy returns based on trading the qualifying stocks each day, in equal weight. If 2 stocks qualified, we would weight each stock at 50% in our portfolio for example.
So all that’s left to do now, is to plot the equity curve and calculate a rough Sharpe Ratio and annual return.
masterFrame['Return'].dropna().cumsum().plot()

The Sharpe Ratio (excluding the risk free element for simplicity) can be calculated as follows:
(masterFrame['Return'].mean() *252) / (masterFrame['Return'].std() * (sqrt(252)))
which gets us:
2.176240875776992
and the annual return can be calculated as:
(masterFrame['Return'].dropna().cumsum()[-1]+1)**(365.0/days) - 1
Which gets us:
0.088146958591373892
So a Sharpe Ratio of over 2 and an annual return of around 8.8% – that’s not too shabby!!
Of course, we have to remember that we are not taking into account any transaction costs so those returns could be quite heavily effected in a real world setting. Also, this strategy logic assumes we can buy the stocks that have gapped down exactly at their opening price, and assumes we always achieve the closing (settlement) price on selling at the end of the day, which of course wouldn’t be the case.
I’ll leave it up to you guys and girls to delve more deeply into the strategy returns – you can use my previous blog post where I analysed the returns of our moving average crossover strategy as inspiration. That post can be found here
24 comments
Hi,
Thanks for the post. We are working on a high performance data analytics framework in python and would like to use your codes as examples. Are we allowed to use the material? Is there a license for this material?
Thanks,
Ehsan
Hi Ehsan – thanks for the kind words. I write this blog just for my own amusement, so no license is needed to re-use the code, please feel free to do so. All I would ask is that, if possible, you reference my blog as the source so that I may possibly attract more traffic. That’s up to you though 😉
I would be very interested to see the outcome of/hear more about your project, it sounds very interesting!
Of course, I’ll add a reference to this post. Here is the link to the example in the project: https://github.com/IntelLabs/hpat/blob/master/examples/intraday_mean.py
HPAT will compile this code (with minimal changes) automatically to run efficiently on clusters.
Looks great! Thanks for the mention too…much appreciated!
Hi S666 I was using your codes to test
I noticed something because this is taking Open to Close change, the line below should add a shift(1)?
df[‘Criteria2’] = df[‘Open’] > df[‘Moving Average’].shift(1)
Because if you dont you will be taking in today close price (But we are buying at Open and cannot possibly know today close prices)
*I am pulling data from my database but you data source may have accounted for this already if so pls ignore me thanks
Hi Jerrickng – good spot, I believe you are correct. If we are buying at the open price based upon the opening price being higher than the moving average, and we are using closing prices to calculate the moving average, we are in effect suffering from look forward bias as in real time we would not know the close price to use in the moving average calculation.
I shall change the code as soon as I get a moment.
nice blog!! …The best that I found about Python being used in Finance!!!
The only model which closely approximates financial markets is Geometric Brownian movement(GBM).Distance travelled under GBM is proportional to square root of time interval. Positive & negative shocks cancel each other over time in A diversified portfolio of stocks. On A net basis one can rarely beat the markets. According to option formula for A given stock S, if one month option costs 1 dollar then 4 month option on the same stock costs only 2 dollars because square root of 4 is two. Indirect way of stating this is that for A given time period chances that this stock would travel distance of 1d is 4 times compared to travelling distance of 2d.Option formulas may not be perfect 100%, but are damn good because trillions of dollars of derivatives are traded every day based on option formulas & market makers do not go bankrupt—whether they make market in puts or calls & stay out of speculation.
My question is whether following strategy is possibly sound in trading using computerized trading by A fund manager–
Computer puts in following order on stock “ S”.On the same ticket take profit & stop loss orders are always on the same side of current market price that day & not on opposite sides of current stock price.
1) Below the current price “P” put an order to buy that stock at “ P minus 1d” with take profit at “P minus1/2 d” & a stop loss at “P minus 2d”.This order is entered every day based on current price that day until executed whether at profit or with a loss–& same process is repeated on diversified portfolio of stocks all by computer with no human intervention. Similar orders are placed on the upside to sell short every day based on current prices that day using the same principals by the computer.No directional bet is ever made.
2)Stock prices go through noise every day on intraday basis. Chances that buy order would get filled at distance of “P minus 1D” is 4 times compared to hitting stop loss at “ P minus 2D” within same period of time on the same ticket order. With intraday noise, reversion to the mean, take profit order would get hit more times than stop loss on the same ticket order.
3) Under GBM, out of 4 episodes, 3 times there would be profit earned of “1/2d” each & one time there would be loss of “ 1d”with net profit of “½ d” on these 4 executions over & over again both on the downside as well as on the upside. Unfilled orders are cancelled every day when stock exchange closes. New orders are entered every morning based on CURRENT PRICE of the stock that day. Distance d is adjusted depending upon historical volatility of the stock so that decent number of orders are getting executed—if too many orders are getting executed then value of “d” is increased to slow down executions.With decent number of executions laws of averages would apply. Risk is controlled by controlling how many stock orders are placed both on the upside & downside. No directional bet any time—all orders are non-directional ,automatic & computer generated based on current volatility.Risk is also controlled by trading smaller amount of fund assets relative to total assets.
With low transactional costs ,fund manager would make money.
I would greatly appreciate your input into this strategy
Hi S666!
Great blog! I’m learning a lot!
I have a question about relative returns, log returns, and adding returns. In another blog post you mention that relative returns aren’t able to be summed like log returns can. (https://www.learndatasci.com/tutorials/python-finance-part-2-intro-quantitative-trading-strategies/)
But here, it looks like we are using relative returns:
#calculate daily % return series for stock
df[‘Pct Change’] = (df[‘Close’] – df[‘Open’]) / df[‘Open’]
Then later we sum them up and even cumsum them:
#create a column to hold the sum of all the individual daily strategy returns
masterFrame[‘Total’] = masterFrame.sum(axis=1)
…
masterFrame[‘Return’].dropna().cumsum().plot()
Should be we using log returns here?
Thanks!
It seems the link to the txt file is not working:
Forbidden
You don’t have permission to access /wp-content/uploads/delightful-downloads/2017/02/NYSE.txt on this server.
Is there a new link?
Regards.
Thanks for bringing that to my attention – I will look into it now and update once fixed!! Hopefully shouldn’t take too long!
Ok that should work now – when you click the button it will open the text file in your browser – you can just right click and select “save as” and then it will save as a text file onto your local machine. Hope you can access it now…if not, just let me know and I will send you the text file myself.
Thank you so much S666 for answering so fast. I’ll like to try your code, it looks great. Regards.
Super duper! Got it, thank you so much S666. I’ll try the code right now. Regards.
No problem :D….let me know if you come across any problems and I will try to help
Hi S666, I have a little problem, when I run this section:
#concatenate the individual DataFrames held in our list- and do it along the column axis
masterFrame = pd.concat(frames,axis=1)
#create a column to hold the sum of all the individual daily strategy returns
masterFrame[‘Total’] = masterFrame.sum(axis=1)
#create a column that hold the count of the number of stocks that were traded each day
#we minus one from it so that we dont count the “Total” column we added as a trade.
masterFrame[‘Count’] = masterFrame.count(axis=1) – 1
#create a column that divides the “total” strategy return each day by the number of stocks traded that day to get equally weighted return.
masterFrame[‘Return’] = masterFrame[‘Total’] / masterFrame[‘Count’]
I’m getting this error:
ValueError Traceback (most recent call last)
in ()
—-> 1 masterFrame = pd.concat(frames,axis=1)
2
3 #create a column to hold the sum of all the individual daily strategy returns
4 masterFrame[‘Total’] = masterFrame.sum(axis=1)
5
/usr/local/lib/python3.6/dist-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
210 keys=keys, levels=levels, names=names,
211 verify_integrity=verify_integrity,
–> 212 copy=copy)
213 return op.get_result()
214
/usr/local/lib/python3.6/dist-packages/pandas/core/reshape/concat.py in init(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy)
243
244 if len(objs) == 0:
–> 245 raise ValueError(‘No objects to concatenate’)
246
247 if keys is None:
ValueError: No objects to concatenate
Any idea, what I’m doing wrong? I’m running on Google Colab Notebook 3.
Thank you for your help.
I am pretty sure I can guess what is going on – the message at the end “ValueError: No objects to concatenate” is the important one…it’s saying exactly that – that you actually have no DataFrame objects in your “frames” list to concatenate together.
That will be due to the fact that the Yahoo Finance API has changed since this post was made and it no longer works as before – if you remove the “try/except” wrapper from around the first block of code you will then get the error message that actually is causing the problem – the Yahoo Finance API is not returning the stock data for any of the tickers.
It can be adapted to make it work again – I don’t know what level of ability/knowledge you have just at the moment but if I point you towards this package:
https://github.com/AndrewRPorter/yahoo-historical
That is a working package that has been adapted to the new Yahoo API – do you feel comfortable adapting the code, installing the package and using it?
Hi S666, thank you for your guidance. Let me try with the package you said and I’ll let you know.
Thank you for sharing with all of us your expertise. I’m very interesting in using Python for stock trading.
Regards.
Hello S666, I found a solution for the data retrieval, this is the fix:
from pandas_datareader import data as pdr
import fix_yahoo_finance as yf
yf.pdr_override() # <== that’s all it takes 🙂
download dataframe
data = pdr.get_data_yahoo(“SPY”, start=”2017-01-01″, end=”2017-04-30″)
the code is from:
https://pypi.org/project/fix-yahoo-finance/
Now the df has the OHLC values and the STDEV and MovingAverage
Date Open High Low Close Adj Close Volume Stdev Moving Average
2019-03-13 76.349998 76.529999 76.139999 76.300003 76.300003 4801400 2.302081 74.772501
2019-03-14 76.599998 76.739998 76.070000 76.639999 76.639999 5120600 2.331112 74.942001
But I can’t still concatenate the dataframes, look the error:
ValueError: No objects to concatenate
I think we are almost there but I think there is a little bug but I can’t find it.
Regards.
Hi there – i have noticed there is a bug in the code – WordPress has changed the formatting of some of the symbols – namely “<“,”>” and the ampersand sign
They have been changed (incorrectly) to “lt;”, “gt;” and “amp;” – (all with ampersands at the start too) so make sure you change them back!
Let me know if that doesn’t make sense…
jajaja, you were right S666.
It worked!!!!
I just had to define the days variable because it’s not defined anywhere.
Thank you for you help. Now I’ll try with more stocks and I’ll keep you informed.
Regards.
can i know for this column (masterFrame[‘Return’].dropna().cumsum()[-1]+1)**(365.0/days) – 1, what value should i put for ‘days’?
Hi S666, thanks for the blog !
I am having an error i cannot figure out if you can help.
The error is on masterFrame = pd.concat(frames,axis=1).
It says:
ValueError: cannot reindex from a duplicate axis
My df looks fine and the beginning of my frame as follows (note:i started my backtest in 2010 and on Russell1000 stocks instead to speed up time to run):
[Date
2014-03-28 NaN
2014-03-31 NaN
2014-04-01 NaN
2014-04-02 NaN
2014-04-03 NaN
..
2020-02-06 NaN
2020-02-07 NaN
2020-02-10 NaN
2020-02-11 NaN
2020-02-12 NaN
Name: Rets, Length: 1475, dtype: float64, Date
2010-01-04 NaN
2010-01-05 NaN
2010-01-06 NaN
2010-01-07 NaN
2010-01-08 NaN
..
2020-02-06 NaN
2020-02-07 NaN
2020-02-10 NaN
2020-02-11 NaN
2020-02-12 NaN:
Thanks
Hi S666,
I am having an error i cannot figure out if you can help.
The error is on masterFrame = pd.concat(frames,axis=1).
It says:
ValueError: cannot reindex from a duplicate axis
My df looks fine and the beginning of my frame as follows (note:i started my backtest in 2010 and on Russell1000 stocks instead to speed up time to run):
[Date
2014-03-28 NaN
2014-03-31 NaN
2014-04-01 NaN
2014-04-02 NaN
2014-04-03 NaN
..
2020-02-06 NaN
2020-02-07 NaN
2020-02-10 NaN
2020-02-11 NaN
2020-02-12 NaN
Name: Rets, Length: 1475, dtype: float64, Date
2010-01-04 NaN
2010-01-05 NaN
2010-01-06 NaN
2010-01-07 NaN
2010-01-08 NaN
..
2020-02-06 NaN
2020-02-07 NaN
2020-02-10 NaN
2020-02-11 NaN
2020-02-12 NaN:
Thanks