Home Basic Data Analysis Monte Carlo Simulation in Python – Simulating a Random Walk

Monte Carlo Simulation in Python – Simulating a Random Walk

by s666

Monte Carlo Simulation in Python – Simulating a Random Walk

Ok so it’s about that time again – I’ve been thinking what my next post should be about and I have decided to have a quick look at Monte Carlo simulations.

Wikipedia states “Monte Carlo methods (or Monte Carlo experiments) are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. Their essential idea is using randomness to solve problems that might be deterministic in principle. They are often used in physical and mathematical problems and are most useful when it is difficult or impossible to use other approaches. Monte Carlo methods are mainly used in three distinct problem classes:[1] optimization, numerical integration, and generating draws from a probability distribution.”

We will be using a Monte Carlo simulation to look at the potential evolution of asset prices over time, assuming they are subject to daily returns that follow a normal distribution (n.b. as we know, asset price returns usually follow a distribution that is more leptokurtic (fat tailed) than a normal distribution, but a normal distribution is often assumed for these kind of purposes). This type of price evolution is also known as a “random walk”.

If we want to buy a particular stock, for example, we may like to try to look into the future and attempt to predict what kind of returns we can expect with what kind of probability, or we may be interested in investigating what potential extreme outcomes we may experience and how exposed we are to the risk of ruin or, on the flip side, superior returns.

To set up our simulation, we need to estimate the expected level of return (mu) and volatility (vol) of the stock in question. This data can be estimated from historic prices, with the simplest methods just assuming past mean return and volatility levels will continue into the future. One could also adjust historic data to account for investor views or market regime changes etc, however to keep things simple and concentrate on the code we will just set simple return and volatility levels based on past price data.

OK so let’s start to write some code and generate the initial data we need as inputs to our Monte Carlo simulation. For illustrative purposes let’s look at the stock of Apple Inc….not very original, but there you go!

We begin with the obligatory importation of the necessary modules to assist us, and go from there:

#import necessary packages
import numpy as np
import math
import matplotlib.pyplot as plt
from scipy.stats import norm
from pandas_datareader import data

#download Apple price data into DataFrame
apple = data.DataReader('AAPL', 'yahoo',start='1/1/2000')

#calculate the compound annual growth rate (CAGR) which 
#will give us our mean return input (mu) 
days = (apple.index[-1] - apple.index[0]).days
cagr = ((((apple['Adj Close'][-1]) / apple['Adj Close'][1])) ** (365.0/days)) - 1
print ('CAGR =',str(round(cagr,4)*100)+"%")
mu = cagr

#create a series of percentage returns and calculate 
#the annual volatility of returns
apple['Returns'] = apple['Adj Close'].pct_change()
vol = apple['Returns'].std()*sqrt(252)
print ("Annual Volatility =",str(round(vol,4)*100)+"%")

This gives us:

CAGR = 23.09%
Annual Volatility = 42.59%

Now we know our mean return input (mu) is 23.09% and our volatility input (vol) is 42.59% – the code to actually run the Monte Carlo simulation is as follows:

#Define Variables
S = apple['Adj Close'][-1] #starting stock price (i.e. last available real stock price)
T = 252 #Number of trading days
mu = 0.2309 #Return
vol = 0.4259 #Volatility

#create list of daily returns using random normal distribution
daily_returns=np.random.normal((mu/T),vol/math.sqrt(T),T)+1

#set starting price and create price series generated by above random daily returns
price_list = [S]

for x in daily_returns:
    price_list.append(price_list[-1]*x)

#Generate Plots - price series and histogram of daily returns
plt.plot(price_list)
plt.show()
plt.hist(daily_returns-1, 100) #Note that we run the line plot and histogram separately, not simultaneously.
plt.show()

This code returns the following plots:

prices
dist

The above code basically ran a single simulation of potential price series evolution over a trading year (252 days), based upon a draw of random daily returns that follow a normal distribution. This is represented by the single line series shown in the first chart. The second chart plots a histogram of those random daily returns over the year period.

So great – we have managed to successfully simulate a year’s worth of future daily price data. This is fantastic and all, but really it doesn’t afford us much insight into risk and return characteristics of the stock as we only have one randomly generated path. The likelyhood of the actual price evolving exactly as described in the above charts is pretty much zero.

So what’s the point of this simulation you may ask? Well, the real insight is gained from running thousands, tens of thousands or even hundreds of thousands of simulations, with each run producing a different series of potential price evolution based upon the same stock characteristics (mu and vol).

We can very simply adjust the above code to run multiple simulations. This code is presented below. In the below code you will notice a couple of things – firstly I have removed the histogram plot (we’ll come back to this in a slightly different way later), and also the code now plots multiple price series on one chart to show info for each separate run of the simulation.

import numpy as np
import math
import matplotlib.pyplot as plt
from scipy.stats import norm

#Define Variables
S = apple['Adj Close'][-1] #starting stock price (i.e. last available real stock price)
T = 252 #Number of trading days
mu = 0.2309 #Return
vol = 0.4259 #Volatility

#choose number of runs to simulate - I have chosen 1000
for i in range(1000):
    #create list of daily returns using random normal distribution
    daily_returns=np.random.normal(mu/T,vol/math.sqrt(T),T)+1
    
    #set starting price and create price series generated by above random daily returns
    price_list = [S]
    
    for x in daily_returns:
        price_list.append(price_list[-1]*x)

    #plot data from each individual run which we will plot at the end
    plt.plot(price_list)

#show the plot of multiple price series created above
plt.show()

This give us the following plot of the 1000 different simulated price series

multprice

Amazing! Now we can see the potential outcomes generated from 1000 different simulations, all based on the same basic inputs, allowing for the randomness of the daily return series.

The spread of final prices is quite large, ranging from around $45 to $500! That’s a pretty wide spread!

In it’s current format, with the chart being so full of data it can be a little difficult to actually see clearly what is going on – so this is where we come back to the histogram that we removed before, albeit this time it will show us the distribution of ending simulation values, rather than the distribution of daily returns for an individual simulation. I have also simulated 10,000 runs this time to give us more data.

Again, the code is easily adapted to include this histogram.

import numpy as np
import math
import matplotlib.pyplot as plt
from scipy.stats import norm

#set up empty list to hold our ending values for each simulated price series
result = []

#Define Variables
S = apple['Adj Close'][-1] #starting stock price (i.e. last available real stock price)
T = 252 #Number of trading days
mu = 0.2309 #Return
vol = 0.4259 #Volatility

#choose number of runs to simulate - I have chosen 10,000
for i in range(10000):
    #create list of daily returns using random normal distribution
    daily_returns=np.random.normal(mu/T,vol/math.sqrt(T),T)+1
    
    #set starting price and create price series generated by above random daily returns
    price_list = [S]
    
    for x in daily_returns:
        price_list.append(price_list[-1]*x)

    #plot data from each individual run which we will plot at the end
    plt.plot(price_list)
    
    #append the ending value of each simulated run to the empty list we created at the beginning
    result.append(price_list[-1])

#show the plot of multiple price series created above
plt.show()

#create histogram of ending stock values for our mutliple simulations
plt.hist(result,bins=50)
plt.show()

This gives us the following charts:

multprice
dist

We can now quickly calculate the mean of the distribution to get our “expected value”:

#use numpy mean function to calculate the mean of the result
print(round(np.mean(result),2))

Which gives me 141.15. Of course you will get a slightly different result due to the fact that these are simulations of random daily return draws. The more paths or “runs” you include in each simulation, the more the mean value will tend towards the mean return we used as our “mu” input. This is as a result of the law of large numbers.

We can also take a look at a couple of “quantiles” of the potential price distribution, to get an idea of the likelyhood of very high or very low returns.

We can just use the numpy “percentile” function as follows to calculate the 5% and 95% quantiles:

print("5% quantile =",np.percentile(result,5))
print("95% quantile =",np.percentile(result,95))
5% quantile = 63.5246208951
95% quantile = 258.436742087

Lovely! So we now know that there is a 5% chance that our stock price will end up below around $63.52 and a 5% chance it will finish above $258.44.

So now we can begin to ask ourselves questions along the lines of “am I willing to risk a 5% chance of ending up with a stock worth less than $63.52, in order to chase an expected return of around 23%, giving us an expected stock price of around $141.15?”

I think the last thing to do is to quickly plot the two quantiles we have just calculated on the histogram to give us a visual representation and…well, why the hell not eh? 😉

plt.hist(result,bins=100)
plt.axvline(np.percentile(result,5), color='r', linestyle='dashed', linewidth=2)
plt.axvline(np.percentile(result,95), color='r', linestyle='dashed', linewidth=2)
plt.show()
dist

Ok, I think i’ll leave it there for now – if anyone has any comments or questions, please do leave them below.

Thanks!

You may also like

16 comments

Viktor December 26, 2016 - 12:24 am

Thanks for the interesting article! I found your blog very useful.

I have just one thing to note regarding this line: daily_returns=np.random.normal(mu/T,vol/math.sqrt(T),T)+1
Here you use arithmetic returns (and not log returns) so if you use mu/T for calculating daily returns from annual one you will end up applying compund returns. It will overestimate the annual return (mu). Instead I would use something like this: daily_returns=np.random.normal((1+mu)**(1/T),vol/sqrt(T),T)

Cheers!

Viktor

Reply
s666 December 26, 2016 - 8:31 pm

Hi Viktor, thanks for the comment. You are indeed correct…I was concentrating too much on the coding element of the blog post and missed that simple error’ I appreciate you pointing it out!

Also, very happy to hear you’re finding the blog useful.

Reply
Stephen October 15, 2017 - 10:37 pm

Hi. I’m REALLY enjoying this great blog so far. Thanks! This post is really useful, but I think there’s a problem – the calculation of a price list using:

for x in daily_returns:
price_list.append(price_list[-1]*x)

Blows up to ridiculous numbers, on the order of 10^(77). I ran this script on the day of posting, and those are the results I was getting. I don’t see how I could have done anything wrong, but I apologize if I did. Does anyone have any thoughts on this? I can’t seem to find a solution yet.

Cheers,
Stephen.

Reply
s666 October 16, 2017 - 8:46 am

Hi Stephen, glad to hear you’re finding the content interesting…thanks for your comment. You are correct that I made a mistake in the second block of code (although the following blocks that actually runs the Monte Carlo simulations are correct) – instead of the code posted, it should have been:

#create list of daily returns using random normal distribution
#IT IS THIS LINE WHICH HAS CHANGED
    daily_returns=np.random.normal(mu/T,vol/math.sqrt(T),T)+1 

#set starting price and create price series generated by above random daily returns
price_list = [S]
 
for x in daily_returns:
    price_list.append(price_list[-1]*x)

Hope that helps – I will fix the code when I get a moment.

Cheers!

Reply
Stephen October 16, 2017 - 1:49 pm

Hi,

Thanks for your reply. I did manage to get sensible results when using:

daily_returns = np.random.normal((1 + mu) ** (1/T), volatility/math.sqrt(T), T)

i.e. by not including the +1 term at the end. I’m having trouble finding a source to explain the mathematics of modelling daily returns though, so removing it might not be a legitimate solution. It does appear to work though!

Thanks again, I’m looking forward to studying the rest of your posts.

Cheers,
Stephen.

Reply
Denis March 7, 2018 - 1:32 pm

datareader doesn’t work properly with yahoo finance

Reply
s666 March 7, 2018 - 3:25 pm

Hi Denis – I have just tried to run the code above and it works fine without any error in downloading Apple stock price data from Yahoo. May I ask what error message you are getting? There was an issue with the Pandas Data_reader whereby prices couldn’t be downloaded anymore but that was fixed in a later update.

Perhaps you could try updating your pandas-datareader package to the latest version with pip.

Try running:

pip install data-reader –upgrade

in a CMD terminal.

Reply
leo vince October 24, 2018 - 6:44 pm

Denis, you probably figured it out by now, historical prices area avail at yahoo in csv.
#download Apple price data into DataFrame
apple = pd.read_csv(‘AAPL.csv’, usecols=[‘Date’,’Adj Close’],index_col=0, parse_dates=True)

Reply
ATST March 8, 2018 - 7:00 pm

This blog is really helpful, and I’m learning a lot about Python as I go through the lessons. Could you help me with a (probably silly) question? You use the following normal distribution for daily returns: ‘daily_returns=np.random.normal(mu/T,vol/math.sqrt(T),T)+1’. Why add the constant ‘1’ to the distribution? I believe that adding a constant changes the mean and not the variance, but I tried plotting it without the constant to no avail (nothing appears). I tried other constants and would get weird or no plots, too. I’m sure there’s a simply explanation about adding the constant ‘1’, which I’m not aware of.

Reply
s666 March 16, 2018 - 4:05 am

Thanks for your comment – the answer of why I am adding 1 to the distribution is as follows:

I am generating a set of random “normally distributed” returns based on the mean and variance of the particular price series we happen to be looking at (in the above example it is Apple stock). The next step is to apply those randomly generated returns to the “starting price “of Apple (which in our case is the last entry of “real” data downloaded from Yahoo), and then generated, step by step a set of “generated” price series.

The issue comes as the normal distribution will spit out numbers such as “0.0345” or “-0.0126” to select two completely random examples.

You have to add 1 to each of those numbers so that when you step through each randomly generated return and multiply it by the previous stock price – you get the new price of the stock, rather than just the incermental increase or decrease.

For example, if 100 increases by 2% for example, you don’t just multiply 100 by 0.02 – you multiply it by 1.02 (i.e. you add 1 to the return).

This is why 1 is added to each of the returns in the generated return series.

Hope that answers the question.

Reply
ATST March 16, 2018 - 8:19 pm

That was helpful. Thanks. I forgot that the normal distribution was of returns.

Reply
Denis May 17, 2018 - 3:34 pm

Traceback (most recent call last):
File “C:\Users\Denis\AppData\Local\Programs\Python\Python36-32\1.py”, line 9, in
apple = data.DataReader(‘AAPL’, ‘yahoo’,start=’1/1/2000′)
File “C:\Users\Denis\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas_datareader\data.py”, line 291, in DataReader
raise ImmediateDeprecationError(DEP_ERROR_MSG.format(‘Yahoo Daily’))
pandas_datareader.exceptions.ImmediateDeprecationError:
Yahoo Daily has been immediately deprecated due to large breaks in the API without the
introduction of a stable replacement. Pull Requests to re-enable these data
connectors are welcome.

See https://github.com/pydata/pandas-datareader/issues

Thank you for your previous reply and sorry for my delay

Reply
BERTRAND OBI May 26, 2018 - 6:13 am

Thank you very much for the detailled presentation.

I wish to find out how i can run the montecarlo simulation and calculate the returns for a portfolio of stocks

Thank you

Reply
Phillip April 8, 2019 - 4:04 am

thanks for the tutorial! I’m trying to run 100,000 iterations and it takes a huge amount of time. I’d like to cut that time down a little… How can this code be optimized to use more cores or threads? I’ve got 8 cores and 16 threads and id like to use them all. Do you know how to cut the time down for these monte carlo sims?

Reply
s666 April 8, 2019 - 9:50 pm

Hi Phillip – I have been playing around with the code and have created a multithreaded process however the pay off in speed is actually negative, the way I have structured it. This is due to the fact that the biggest time sink in the whole script is the call to “plt.plot(price_list)” in each iteration of the simulation – with the ending call of “plt.show()” actually displaying the chart with all the 100,000 odd lines on.

If you were to remove these calls to plot each and every MC iteration result – the code would speed up massively. You can still keep the call the plot the histogram as that doesn’t take up too much time.

The reason why the multi-threaded code ended up taking longer I believe is due to the fact that it takes longer to fire up the threads and use them, than the time saving you get from doing that. The actual generation of the daily returns is pretty darn fast for each iteration so multi-threading it was counter-productive.

If you could multi-thread the creation of the plot then that would be the way to go but I’m really not sure that’s even possible!!

If you want to test it out for yourself I have pasted the mutli-threaded code below:

import numpy as np
import time
import multiprocessing as mp
import math
import matplotlib.pyplot as plt



def sample():
    # Just to make it slow!
    # Pretend that this is a complicated distribution!
    # time.sleep(0.1)

    # Re-seed the random number generator
    np.random.seed()

    daily_returns = np.random.normal(0.25/252,0.41/math.sqrt(252),252)+1

    price_list = [100]

    for x in daily_returns:
        price_list.append(price_list[-1]*x)

    # plt.plot(price_list)

    return price_list

def _parallel_mc(iter):
    pool = mp.Pool(mp.cpu_count())

    future_res = [pool.apply_async(sample) for _ in range(iter)]
    res = [f.get() for f in future_res]

    return res



def parallel_monte_carlo(iter):
    samples = _parallel_mc(iter)
    return samples


if __name__ == '__main__':
    # freeze_support()
    start_time = time.time()
    p = parallel_monte_carlo(500)
    print("MP--- %s seconds ---" % (time.time() - start_time))

If you uncomment out the line “time.sleep(0.1) and compare that to the old code also with a “time.sleep(0.1)” added to each iteration, you will see that the multi-threading starts to pay positive dividends in terms of time savings and it is now worthwhile taking the time to fire up new threads.

Reply
陳建元 May 25, 2019 - 9:31 am

Wondering what’s the best way to deterimate the data during we use? (ex: You use 01/01/2010 to the latest)
Great post by the way!

Reply

Leave a Reply

%d bloggers like this: