**Monte Carlo Simulation in Python – Simulating a Random Walk**

Ok so it’s about that time again – I’ve been thinking what my next post should be about and I have decided to have a quick look at Monte Carlo simulations.

Wikipedia states “Monte Carlo methods (or Monte Carlo experiments) are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. Their essential idea is using randomness to solve problems that might be deterministic in principle. They are often used in physical and mathematical problems and are most useful when it is difficult or impossible to use other approaches. Monte Carlo methods are mainly used in three distinct problem classes:[1] optimization, numerical integration, and generating draws from a probability distribution.”

We will be using a Monte Carlo simulation to look at the potential evolution of asset prices over time, assuming they are subject to daily returns that follow a normal distribution (n.b. as we know, asset price returns usually follow a distribution that is more leptokurtic (fat tailed) than a normal distribution, but a normal distribution is often assumed for these kind of purposes). This type of price evolution is also known as a “random walk”.

If we want to buy a particular stock, for example, we may like to try to look into the future and attempt to predict what kind of returns we can expect with what kind of probability, or we may be interested in investigating what potential extreme outcomes we may experience and how exposed we are to the risk of ruin or, on the flip side, superior returns.

To set up our simulation, we need to estimate the expected level of return (mu) and volatility (vol) of the stock in question. This data can be estimated from historic prices, with the simplest methods just assuming past mean return and volatility levels will continue into the future. One could also adjust historic data to account for investor views or market regime changes etc, however to keep things simple and concentrate on the code we will just set simple return and volatility levels based on past price data.

OK so let’s start to write some code and generate the initial data we need as inputs to our Monte Carlo simulation. For illustrative purposes let’s look at the stock of Apple Inc….not very original, but there you go!

We begin with the obligatory importation of the necessary modules to assist us, and go from there:

#import necessary packages import numpy as np import math import matplotlib.pyplot as plt from scipy.stats import norm from pandas_datareader import data #download Apple price data into DataFrame apple = data.DataReader('AAPL', 'yahoo',start='1/1/2000') #calculate the compound annual growth rate (CAGR) which #will give us our mean return input (mu) days = (apple.index[-1] - apple.index[0]).days cagr = ((((apple['Adj Close'][-1]) / apple['Adj Close'][1])) ** (365.0/days)) - 1 print ('CAGR =',str(round(cagr,4)*100)+"%") mu = cagr #create a series of percentage returns and calculate #the annual volatility of returns apple['Returns'] = apple['Adj Close'].pct_change() vol = apple['Returns'].std()*sqrt(252) print ("Annual Volatility =",str(round(vol,4)*100)+"%")

This gives us:

CAGR = 23.09% Annual Volatility = 42.59%

Now we know our mean return input (mu) is 23.09% and our volatility input (vol) is 42.59% – the code to actually run the Monte Carlo simulation is as follows:

#Define Variables S = apple['Adj Close'][-1] #starting stock price (i.e. last available real stock price) T = 252 #Number of trading days mu = 0.2309 #Return vol = 0.4259 #Volatility #create list of daily returns using random normal distribution daily_returns=np.random.normal((mu/T),vol/math.sqrt(T),T)+1 #set starting price and create price series generated by above random daily returns price_list = [S] for x in daily_returns: price_list.append(price_list[-1]*x) #Generate Plots - price series and histogram of daily returns plt.plot(price_list) plt.show() plt.hist(daily_returns-1, 100) #Note that we run the line plot and histogram separately, not simultaneously. plt.show()

This code returns the following plots:

The above code basically ran a single simulation of potential price series evolution over a trading year (252 days), based upon a draw of random daily returns that follow a normal distribution. This is represented by the single line series shown in the first chart. The second chart plots a histogram of those random daily returns over the year period.

So great – we have managed to successfully simulate a year’s worth of future daily price data. This is fantastic and all, but really it doesn’t afford us much insight into risk and return characteristics of the stock as we only have one randomly generated path. The likelyhood of the actual price evolving exactly as described in the above charts is pretty much zero.

So what’s the point of this simulation you may ask? Well, the real insight is gained from running thousands, tens of thousands or even hundreds of thousands of simulations, with each run producing a different series of potential price evolution based upon the same stock characteristics (mu and vol).

We can very simply adjust the above code to run multiple simulations. This code is presented below. In the below code you will notice a couple of things – firstly I have removed the histogram plot (we’ll come back to this in a slightly different way later), and also the code now plots multiple price series on one chart to show info for each separate run of the simulation.

import numpy as np import math import matplotlib.pyplot as plt from scipy.stats import norm #Define Variables S = apple['Adj Close'][-1] #starting stock price (i.e. last available real stock price) T = 252 #Number of trading days mu = 0.2309 #Return vol = 0.4259 #Volatility #choose number of runs to simulate - I have chosen 1000 for i in range(1000): #create list of daily returns using random normal distribution daily_returns=np.random.normal(mu/T,vol/math.sqrt(T),T)+1 #set starting price and create price series generated by above random daily returns price_list = [S] for x in daily_returns: price_list.append(price_list[-1]*x) #plot data from each individual run which we will plot at the end plt.plot(price_list) #show the plot of multiple price series created above plt.show()

This give us the following plot of the 1000 different simulated price series

Amazing! Now we can see the potential outcomes generated from 1000 different simulations, all based on the same basic inputs, allowing for the randomness of the daily return series.

The spread of final prices is quite large, ranging from around $45 to $500! That’s a pretty wide spread!

In it’s current format, with the chart being so full of data it can be a little difficult to actually see clearly what is going on – so this is where we come back to the histogram that we removed before, albeit this time it will show us the distribution of ending simulation values, rather than the distribution of daily returns for an individual simulation. I have also simulated 10,000 runs this time to give us more data.

Again, the code is easily adapted to include this histogram.

import numpy as np import math import matplotlib.pyplot as plt from scipy.stats import norm #set up empty list to hold our ending values for each simulated price series result = [] #Define Variables S = apple['Adj Close'][-1] #starting stock price (i.e. last available real stock price) T = 252 #Number of trading days mu = 0.2309 #Return vol = 0.4259 #Volatility #choose number of runs to simulate - I have chosen 10,000 for i in range(10000): #create list of daily returns using random normal distribution daily_returns=np.random.normal(mu/T,vol/math.sqrt(T),T)+1 #set starting price and create price series generated by above random daily returns price_list = [S] for x in daily_returns: price_list.append(price_list[-1]*x) #plot data from each individual run which we will plot at the end plt.plot(price_list) #append the ending value of each simulated run to the empty list we created at the beginning result.append(price_list[-1]) #show the plot of multiple price series created above plt.show() #create histogram of ending stock values for our mutliple simulations plt.hist(result,bins=50) plt.show()

This gives us the following charts:

We can now quickly calculate the mean of the distribution to get our “expected value”:

#use numpy mean function to calculate the mean of the result print(round(np.mean(result),2))

Which gives me 141.15. Of course you will get a slightly different result due to the fact that these are simulations of random daily return draws. The more paths or “runs” you include in each simulation, the more the mean value will tend towards the mean return we used as our “mu” input. This is as a result of the law of large numbers.

We can also take a look at a couple of “quantiles” of the potential price distribution, to get an idea of the likelyhood of very high or very low returns.

We can just use the numpy “percentile” function as follows to calculate the 5% and 95% quantiles:

print("5% quantile =",np.percentile(result,5)) print("95% quantile =",np.percentile(result,95))

5% quantile = 63.5246208951 95% quantile = 258.436742087

Lovely! So we now know that there is a 5% chance that our stock price will end up below around $63.52 and a 5% chance it will finish above $258.44.

So now we can begin to ask ourselves questions along the lines of “am I willing to risk a 5% chance of ending up with a stock worth less than $63.52, in order to chase an expected return of around 23%, giving us an expected stock price of around $141.15?”

I think the last thing to do is to quickly plot the two quantiles we have just calculated on the histogram to give us a visual representation and…well, why the hell not eh? 😉

plt.hist(result,bins=100) plt.axvline(np.percentile(result,5), color='r', linestyle='dashed', linewidth=2) plt.axvline(np.percentile(result,95), color='r', linestyle='dashed', linewidth=2) plt.show()

Ok, I think i’ll leave it there for now – if anyone has any comments or questions, please do leave them below.

Thanks!

## 16 comments

Thanks for the interesting article! I found your blog very useful.

I have just one thing to note regarding this line: daily_returns=np.random.normal(mu/T,vol/math.sqrt(T),T)+1

Here you use arithmetic returns (and not log returns) so if you use mu/T for calculating daily returns from annual one you will end up applying compund returns. It will overestimate the annual return (mu). Instead I would use something like this: daily_returns=np.random.normal((1+mu)**(1/T),vol/sqrt(T),T)

Cheers!

Viktor

Hi Viktor, thanks for the comment. You are indeed correct…I was concentrating too much on the coding element of the blog post and missed that simple error’ I appreciate you pointing it out!

Also, very happy to hear you’re finding the blog useful.

Hi. I’m REALLY enjoying this great blog so far. Thanks! This post is really useful, but I think there’s a problem – the calculation of a price list using:

for x in daily_returns:

price_list.append(price_list[-1]*x)

Blows up to ridiculous numbers, on the order of 10^(77). I ran this script on the day of posting, and those are the results I was getting. I don’t see how I could have done anything wrong, but I apologize if I did. Does anyone have any thoughts on this? I can’t seem to find a solution yet.

Cheers,

Stephen.

Hi Stephen, glad to hear you’re finding the content interesting…thanks for your comment. You are correct that I made a mistake in the second block of code (although the following blocks that actually runs the Monte Carlo simulations are correct) – instead of the code posted, it should have been:

Hope that helps – I will fix the code when I get a moment.

Cheers!

Hi,

Thanks for your reply. I did manage to get sensible results when using:

daily_returns = np.random.normal((1 + mu) ** (1/T), volatility/math.sqrt(T), T)

i.e. by not including the +1 term at the end. I’m having trouble finding a source to explain the mathematics of modelling daily returns though, so removing it might not be a legitimate solution. It does appear to work though!

Thanks again, I’m looking forward to studying the rest of your posts.

Cheers,

Stephen.

datareader doesn’t work properly with yahoo finance

Hi Denis – I have just tried to run the code above and it works fine without any error in downloading Apple stock price data from Yahoo. May I ask what error message you are getting? There was an issue with the Pandas Data_reader whereby prices couldn’t be downloaded anymore but that was fixed in a later update.

Perhaps you could try updating your pandas-datareader package to the latest version with pip.

Try running:

pip install data-reader –upgrade

in a CMD terminal.

Denis, you probably figured it out by now, historical prices area avail at yahoo in csv.

#download Apple price data into DataFrame

apple = pd.read_csv(‘AAPL.csv’, usecols=[‘Date’,’Adj Close’],index_col=0, parse_dates=True)

This blog is really helpful, and I’m learning a lot about Python as I go through the lessons. Could you help me with a (probably silly) question? You use the following normal distribution for daily returns: ‘daily_returns=np.random.normal(mu/T,vol/math.sqrt(T),T)+1’. Why add the constant ‘1’ to the distribution? I believe that adding a constant changes the mean and not the variance, but I tried plotting it without the constant to no avail (nothing appears). I tried other constants and would get weird or no plots, too. I’m sure there’s a simply explanation about adding the constant ‘1’, which I’m not aware of.

Thanks for your comment – the answer of why I am adding 1 to the distribution is as follows:

I am generating a set of random “normally distributed” returns based on the mean and variance of the particular price series we happen to be looking at (in the above example it is Apple stock). The next step is to apply those randomly generated returns to the “starting price “of Apple (which in our case is the last entry of “real” data downloaded from Yahoo), and then generated, step by step a set of “generated” price series.

The issue comes as the normal distribution will spit out numbers such as “0.0345” or “-0.0126” to select two completely random examples.

You have to add 1 to each of those numbers so that when you step through each randomly generated return and multiply it by the previous stock price – you get the new price of the stock, rather than just the incermental increase or decrease.

For example, if 100 increases by 2% for example, you don’t just multiply 100 by 0.02 – you multiply it by 1.02 (i.e. you add 1 to the return).

This is why 1 is added to each of the returns in the generated return series.

Hope that answers the question.

That was helpful. Thanks. I forgot that the normal distribution was of returns.

Traceback (most recent call last):

File “C:\Users\Denis\AppData\Local\Programs\Python\Python36-32\1.py”, line 9, in

apple = data.DataReader(‘AAPL’, ‘yahoo’,start=’1/1/2000′)

File “C:\Users\Denis\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas_datareader\data.py”, line 291, in DataReader

raise ImmediateDeprecationError(DEP_ERROR_MSG.format(‘Yahoo Daily’))

pandas_datareader.exceptions.ImmediateDeprecationError:

Yahoo Daily has been immediately deprecated due to large breaks in the API without the

introduction of a stable replacement. Pull Requests to re-enable these data

connectors are welcome.

See https://github.com/pydata/pandas-datareader/issues

Thank you for your previous reply and sorry for my delay

Thank you very much for the detailled presentation.

I wish to find out how i can run the montecarlo simulation and calculate the returns for a portfolio of stocks

Thank you

thanks for the tutorial! I’m trying to run 100,000 iterations and it takes a huge amount of time. I’d like to cut that time down a little… How can this code be optimized to use more cores or threads? I’ve got 8 cores and 16 threads and id like to use them all. Do you know how to cut the time down for these monte carlo sims?

Hi Phillip – I have been playing around with the code and have created a multithreaded process however the pay off in speed is actually negative, the way I have structured it. This is due to the fact that the biggest time sink in the whole script is the call to “plt.plot(price_list)” in each iteration of the simulation – with the ending call of “plt.show()” actually displaying the chart with all the 100,000 odd lines on.

If you were to remove these calls to plot each and every MC iteration result – the code would speed up massively. You can still keep the call the plot the histogram as that doesn’t take up too much time.

The reason why the multi-threaded code ended up taking longer I believe is due to the fact that it takes longer to fire up the threads and use them, than the time saving you get from doing that. The actual generation of the daily returns is pretty darn fast for each iteration so multi-threading it was counter-productive.

If you could multi-thread the creation of the plot then that would be the way to go but I’m really not sure that’s even possible!!

If you want to test it out for yourself I have pasted the mutli-threaded code below:

If you uncomment out the line “time.sleep(0.1) and compare that to the old code also with a “time.sleep(0.1)” added to each iteration, you will see that the multi-threading starts to pay positive dividends in terms of time savings and it is now worthwhile taking the time to fire up new threads.

Wondering what’s the best way to deterimate the data during we use? (ex: You use 01/01/2010 to the latest)

Great post by the way!