In this post I’ll be looking at **investment portfolio optimisation with python**, the fundamental concept of diversification and the creation of an efficient frontier that can be used by investors to choose specific mixes of assets based on investment goals; that is, the trade off between their desired level of portfolio return vs their desired level of portfolio risk.

Investopedia defines “Portfolio Theory” as:

“Modern Portfolio Theory (MPT), a hypothesis put forth by Harry Markowitz in his paper “Portfolio Selection,” (published in 1952 by the Journal of Finance) is an investment theory based on the idea that risk-averse investors can construct portfolios to optimize or maximize expected return based on a given level of market risk, emphasizing that risk is an inherent part of higher reward. It is one of the most important and influential economic theories dealing with finance and investment.

Also called “portfolio theory” or “portfolio management theory,” MPT suggests that it is possible to construct an “efficient frontier” of optimal portfolios, offering the maximum possible expected return for a given level of risk. It suggests that it is not enough to look at the expected risk and return of one particular stock. By investing in more than one stock, an investor can reap the benefits of diversification, particularly a reduction in the riskiness of the portfolio. MPT quantifies the benefits of diversification, also known as not putting all of your eggs in one basket.”

It’s very easy to run a few lines of Python to download data for a single stock, calculate the mean daily return and daily standard deviation of returns, and then just annualise them to get mean expected annual return and volatility of that single stock.

Let’s choose Apple as an example stock.

We can do this as follows:

import pandas_datareader.data as web import numpy as np import pandas as pd stock = ['AAPL'] data = web.DataReader(stock,data_source="yahoo",start='01/01/2010')['Adj Close'] returns = data.pct_change() mean_return = returns.mean() return_stdev = returns.std() annualised_return = round(mean_return * 252,2) annualised_stdev = round(return_stdev * np.sqrt(252),2) print ('The annualised mean return of stock {} is {}, ' 'and the annualised volatility is {}').format(stock[0],annualised_return,annualised_stdev) |

import pandas_datareader.data as web import numpy as np import pandas as pd stock = ['AAPL'] data = web.DataReader(stock,data_source="yahoo",start='01/01/2010')['Adj Close'] returns = data.pct_change() mean_return = returns.mean() return_stdev = returns.std() annualised_return = round(mean_return * 252,2) annualised_stdev = round(return_stdev * np.sqrt(252),2) print ('The annualised mean return of stock {} is {}, ' 'and the annualised volatility is {}').format(stock[0],annualised_return,annualised_stdev)

which gets us the output:

`"The annualised mean return of stock AAPL is 0.24, and the annualised volatility is 0.26"` |

"The annualised mean return of stock AAPL is 0.24, and the annualised volatility is 0.26"

This simple calculation can obviously be carried out for any stock you have in mind, but not many investor portfolios are made up of just one stock – they are made up of 10s of stocks, 100s of stocks sometimes. The rule regarding diversification is that it’s marginal benefit reduces as the number of stocks increases…in other words, the diversification benefit you get when moving from 1 stock to 2 stocks, is larger than when moving from 2 stocks to 3 stocks and so on (this is obviously just a general rule and the actual diversification benefits of any given stock depends on its correlation with the existing portfolio).

So what we are interested in is not the expected return and volatility (standard deviation) of a collection of individual stocks, but rather we want that information for the portfolio of stocks as a whole. This will capture the benefits of diversification of less than perfect correlation between the stocks in the portfolio.

So let’s now assume that we hold a portfolio of 4 tech stocks, Apple, Microsoft, Amazon and Yahoo…how do we start to calculate the expected return and volatility of that portfolio?

Well we need as our first input, the weights of the stocks in the portfolio – that is, how much of each stock do we hold as a percentage of the entire portfolio holdings.

Let’s say our portfolio is made up of 50% Apple stock, 20% Microsoft stock, 20% Amazon stock and 10% Yahoo. We can calculate the portfolio expected return and volatility as follows:

#list of stocks in portfolio stocks = ['AAPL','MSFT','AMZN','YHOO'] #download daily price data for each of the stocks in the portfolio data = web.DataReader(stocks,data_source='yahoo',start='01/01/2010')['Adj Close'] #convert daily stock prices into daily returns returns = data.pct_change() #calculate mean daily return and covariance of daily returns mean_daily_returns = returns.mean() cov_matrix = returns.cov() #set array holding portfolio weights of each stock weights = np.asarray([0.5,0.2,0.2,0.1]) #calculate annualised portfolio return portfolio_return = round(np.sum(mean_daily_returns * weights) * 252,2) #calculate annualised portfolio volatility portfolio_std_dev = round(np.sqrt(np.dot(weights.T,np.dot(cov_matrix, weights))) * np.sqrt(252),2) print('Portfolio expected annualised return is {} and volatility is {}').format(portfolio_return,portfolio_std_dev) |

#list of stocks in portfolio stocks = ['AAPL','MSFT','AMZN','YHOO'] #download daily price data for each of the stocks in the portfolio data = web.DataReader(stocks,data_source='yahoo',start='01/01/2010')['Adj Close'] #convert daily stock prices into daily returns returns = data.pct_change() #calculate mean daily return and covariance of daily returns mean_daily_returns = returns.mean() cov_matrix = returns.cov() #set array holding portfolio weights of each stock weights = np.asarray([0.5,0.2,0.2,0.1]) #calculate annualised portfolio return portfolio_return = round(np.sum(mean_daily_returns * weights) * 252,2) #calculate annualised portfolio volatility portfolio_std_dev = round(np.sqrt(np.dot(weights.T,np.dot(cov_matrix, weights))) * np.sqrt(252),2) print('Portfolio expected annualised return is {} and volatility is {}').format(portfolio_return,portfolio_std_dev)

Which gets us:

`"Portfolio expected annualised return is 0.23 and volatility is 0.21"` |

"Portfolio expected annualised return is 0.23 and volatility is 0.21"

Great so we’re getting there, we now know the expected return and volatility of our current portfolio. But what if we aren’t happy with the level of volatility of our current portfolio and would like to reduce it? What if we are willing to take on more risk in search of a higher expected return? How can we rearrange the weight of each stock in our portfolio to achieve these goals?

Well, we could start by manually changing the weights in the array in our previous code, run the program again and see what the expected return and volatility of that particular set of weights comes out at. But that’s a VERY manual way to proceed….there are technically an infinite number of sets of portfolio weights to test, so obviously that’s not practical.

Luckily we can use Monte Carlo simulation to run 1000s of runs of different randomly generated weights for the individual stocks (obviously making sure the weights sum to 100%) and then calculate the expected return, expected volatility and Sharpe Ratio for each of the randomly generated portfolios.

It’s then very simple (and helpful) to plot these combinations of expected returns and volatilities on a scatter plot – we can even colour the data points based on the Sharpe Ratio of that particular portfolio. So let’s do this:

import numpy as np import pandas as pd import pandas_datareader.data as web import matplotlib.pyplot as plt #list of stocks in portfolio stocks = ['AAPL','MSFT','AMZN','YHOO'] #download daily price data for each of the stocks in the portfolio data = web.DataReader(stocks,data_source='yahoo',start='01/01/2010')['Adj Close'] #convert daily stock prices into daily returns returns = data.pct_change() #calculate mean daily return and covariance of daily returns mean_daily_returns = returns.mean() cov_matrix = returns.cov() #set number of runs of random portfolio weights num_portfolios = 25000 #set up array to hold results results = np.zeros((3,num_portfolios)) for i in xrange(num_portfolios): #select random weights for portfolio holdings weights = np.random.random(4) #rebalance weights to sum to 1 weights /= np.sum(weights) #calculate portfolio return and volatility portfolio_return = np.sum(mean_daily_returns * weights) * 252 portfolio_std_dev = np.sqrt(np.dot(weights.T,np.dot(cov_matrix, weights))) * np.sqrt(252) #store results in results array results[0,i] = portfolio_return results[1,i] = portfolio_std_dev #store Sharpe Ratio (return / volatility) - risk free rate element excluded for simplicity results[2,i] = results[0,i] / results[1,i] #convert results array to Pandas DataFrame results_frame = pd.DataFrame(results.T,columns=['ret','stdev','sharpe']) #create scatter plot coloured by Sharpe Ratio plt.scatter(results_frame.stdev,results_frame.ret,c=results_frame.sharpe,cmap='RdYlBu') plt.colorbar() |

import numpy as np import pandas as pd import pandas_datareader.data as web import matplotlib.pyplot as plt #list of stocks in portfolio stocks = ['AAPL','MSFT','AMZN','YHOO'] #download daily price data for each of the stocks in the portfolio data = web.DataReader(stocks,data_source='yahoo',start='01/01/2010')['Adj Close'] #convert daily stock prices into daily returns returns = data.pct_change() #calculate mean daily return and covariance of daily returns mean_daily_returns = returns.mean() cov_matrix = returns.cov() #set number of runs of random portfolio weights num_portfolios = 25000 #set up array to hold results results = np.zeros((3,num_portfolios)) for i in xrange(num_portfolios): #select random weights for portfolio holdings weights = np.random.random(4) #rebalance weights to sum to 1 weights /= np.sum(weights) #calculate portfolio return and volatility portfolio_return = np.sum(mean_daily_returns * weights) * 252 portfolio_std_dev = np.sqrt(np.dot(weights.T,np.dot(cov_matrix, weights))) * np.sqrt(252) #store results in results array results[0,i] = portfolio_return results[1,i] = portfolio_std_dev #store Sharpe Ratio (return / volatility) - risk free rate element excluded for simplicity results[2,i] = results[0,i] / results[1,i] #convert results array to Pandas DataFrame results_frame = pd.DataFrame(results.T,columns=['ret','stdev','sharpe']) #create scatter plot coloured by Sharpe Ratio plt.scatter(results_frame.stdev,results_frame.ret,c=results_frame.sharpe,cmap='RdYlBu') plt.colorbar()

From the figure above it’s obvious that changing the weight of each stock in the portfolio can have a dramatic effect on the expected return and level of risk (standard deviation/volatility) the investor is exposed to.

We can see, for example, that if the investor is targeting a return of 18%, that can be achieved by holding a portfolio with a volatility of just under 20%, but some portfolios share that same expected return with a volatility as high as just over 28%. This makes it very clear that we must be very thoughtful when choosing how much weight each of our stocks in our portfolio should carry.

Two portfolios that we may like to highlight as being “special” are 1) the portfolio with the highest Sharpe Ratio (i.e. the highest risk adjusted returns) and 2) The “minimum variance portfolio” which is the portfolio with the lowest volatility.

We can locate these 2 portfolios by making a few changes to our code to store all the random weight arrays used for each run of the Monte Carlo simulation along side the expected return, standard deviation and Sharpe Ratio as we go along, then locate the point in the resulting DataFrame where the Sharpe Ratio is highest for portfolio “1” and where the standard deviation is lowest for portfolio “2”. Then – because we stored them as we went along, we can extract the array of weights that corresponds to each of those two portoflios and bang! there we have all the info we need to construct either of the two portfolios!

The full code is as follows (with the changes/additions highlighted):

import numpy as np import pandas as pd import pandas_datareader.data as web import matplotlib.pyplot as plt #list of stocks in portfolio stocks = ['AAPL','MSFT','AMZN','YHOO'] #download daily price data for each of the stocks in the portfolio data = web.DataReader(stocks,data_source='yahoo',start='01/01/2010')['Adj Close'] #convert daily stock prices into daily returns returns = data.pct_change() #calculate mean daily return and covariance of daily returns mean_daily_returns = returns.mean() cov_matrix = returns.cov() #set number of runs of random portfolio weights num_portfolios = 25000 #set up array to hold results #We have increased the size of the array to hold the weight values for each stock results = np.zeros((4+len(stocks)-1,num_portfolios)) for i in xrange(num_portfolios): #select random weights for portfolio holdings weights = np.array(np.random.random(4)) #rebalance weights to sum to 1 weights /= np.sum(weights) #calculate portfolio return and volatility portfolio_return = np.sum(mean_daily_returns * weights) * 252 portfolio_std_dev = np.sqrt(np.dot(weights.T,np.dot(cov_matrix, weights))) * np.sqrt(252) #store results in results array results[0,i] = portfolio_return results[1,i] = portfolio_std_dev #store Sharpe Ratio (return / volatility) - risk free rate element excluded for simplicity results[2,i] = results[0,i] / results[1,i] #iterate through the weight vector and add data to results array for j in range(len(weights)): results[j+3,i] = weights[j] #convert results array to Pandas DataFrame results_frame = pd.DataFrame(results.T,columns=['ret','stdev','sharpe',stocks[0],stocks[1],stocks[2],stocks[3]]) #locate position of portfolio with highest Sharpe Ratio max_sharpe_port = results_frame.iloc[results_frame['sharpe'].idxmax()] #locate positon of portfolio with minimum standard deviation min_vol_port = results_frame.iloc[results_frame['stdev'].idxmin()] #create scatter plot coloured by Sharpe Ratio plt.scatter(results_frame.stdev,results_frame.ret,c=results_frame.sharpe,cmap='RdYlBu') plt.xlabel('Volatility') plt.ylabel('Returns') plt.colorbar() #plot red star to highlight position of portfolio with highest Sharpe Ratio plt.scatter(max_sharpe_port[1],max_sharpe_port[0],marker=(5,1,0),color='r',s=1000) #plot green star to highlight position of minimum variance portfolio plt.scatter(min_vol_port[1],min_vol_port[0],marker=(5,1,0),color='g',s=1000) |

import numpy as np import pandas as pd import pandas_datareader.data as web import matplotlib.pyplot as plt #list of stocks in portfolio stocks = ['AAPL','MSFT','AMZN','YHOO'] #download daily price data for each of the stocks in the portfolio data = web.DataReader(stocks,data_source='yahoo',start='01/01/2010')['Adj Close'] #convert daily stock prices into daily returns returns = data.pct_change() #calculate mean daily return and covariance of daily returns mean_daily_returns = returns.mean() cov_matrix = returns.cov() #set number of runs of random portfolio weights num_portfolios = 25000 #set up array to hold results #We have increased the size of the array to hold the weight values for each stock results = np.zeros((4+len(stocks)-1,num_portfolios)) for i in xrange(num_portfolios): #select random weights for portfolio holdings weights = np.array(np.random.random(4)) #rebalance weights to sum to 1 weights /= np.sum(weights) #calculate portfolio return and volatility portfolio_return = np.sum(mean_daily_returns * weights) * 252 portfolio_std_dev = np.sqrt(np.dot(weights.T,np.dot(cov_matrix, weights))) * np.sqrt(252) #store results in results array results[0,i] = portfolio_return results[1,i] = portfolio_std_dev #store Sharpe Ratio (return / volatility) - risk free rate element excluded for simplicity results[2,i] = results[0,i] / results[1,i] #iterate through the weight vector and add data to results array for j in range(len(weights)): results[j+3,i] = weights[j] #convert results array to Pandas DataFrame results_frame = pd.DataFrame(results.T,columns=['ret','stdev','sharpe',stocks[0],stocks[1],stocks[2],stocks[3]]) #locate position of portfolio with highest Sharpe Ratio max_sharpe_port = results_frame.iloc[results_frame['sharpe'].idxmax()] #locate positon of portfolio with minimum standard deviation min_vol_port = results_frame.iloc[results_frame['stdev'].idxmin()] #create scatter plot coloured by Sharpe Ratio plt.scatter(results_frame.stdev,results_frame.ret,c=results_frame.sharpe,cmap='RdYlBu') plt.xlabel('Volatility') plt.ylabel('Returns') plt.colorbar() #plot red star to highlight position of portfolio with highest Sharpe Ratio plt.scatter(max_sharpe_port[1],max_sharpe_port[0],marker=(5,1,0),color='r',s=1000) #plot green star to highlight position of minimum variance portfolio plt.scatter(min_vol_port[1],min_vol_port[0],marker=(5,1,0),color='g',s=1000)

And now, all that’s left to do is pull out the weights of the stocks needed to create the two portfolios highlighted above, and that can be done as follows:

print(max_sharpe_port) |

print(max_sharpe_port)

ret 0.255758

stdev 0.220652

sharpe 1.159100

AAPL 0.472286

MSFT 0.387202

AMZN 0.090554

YHOO 0.049959

Name: 7146, dtype: float64

and

print(min_vol_port) |

print(min_vol_port)

ret 0.196974

stdev 0.194356

sharpe 1.013470

AAPL 0.299661

MSFT 0.087340

AMZN 0.433787

YHOO 0.179211

Name: 5792, dtype: float64

Ok, I think I’ll leave it there for now. As always, any comments or questions, please do leave them in the comments section below.

Until next time…