Home Basic Data Analysis Investment Portfolio Optimisation with Python

# Investment Portfolio Optimisation with Python

In this post I’ll be looking at investment portfolio optimisation with python, the fundamental concept of diversification and the creation of an efficient frontier that can be used by investors to choose specific mixes of assets based on investment goals; that is, the trade off between their desired level of portfolio return vs their desired level of portfolio risk.

Investopedia defines “Portfolio Theory” as:

“Modern Portfolio Theory (MPT), a hypothesis put forth by Harry Markowitz in his paper “Portfolio Selection,” (published in 1952 by the Journal of Finance) is an investment theory based on the idea that risk-averse investors can construct portfolios to optimize or maximize expected return based on a given level of market risk, emphasizing that risk is an inherent part of higher reward. It is one of the most important and influential economic theories dealing with finance and investment.

Also called “portfolio theory” or “portfolio management theory,” MPT suggests that it is possible to construct an “efficient frontier” of optimal portfolios, offering the maximum possible expected return for a given level of risk. It suggests that it is not enough to look at the expected risk and return of one particular stock. By investing in more than one stock, an investor can reap the benefits of diversification, particularly a reduction in the riskiness of the portfolio. MPT quantifies the benefits of diversification, also known as not putting all of your eggs in one basket.”

It’s very easy to run a few lines of Python to download data for a single stock, calculate the mean daily return and daily standard deviation of returns, and then just annualise them to get mean expected annual return and volatility of that single stock.

Let’s choose Apple as an example stock.

We can do this as follows:

```import pandas_datareader.data as web
import numpy as np
import pandas as pd

stock = ['AAPL']

data.sort_index(inplace=True)

returns = data.pct_change()

mean_return = returns.mean()
return_stdev = returns.std()

annualised_return = round(mean_return * 252,2)
annualised_stdev = round(return_stdev * np.sqrt(252),2)

print ('The annualised mean return of stock {} is {}, '
'and the annualised volatility is {}').format(stock,annualised_return,annualised_stdev)

```

which gets us the output:

```"The annualised mean return of stock AAPL is 0.24, and the annualised volatility is 0.26"
```

This simple calculation can obviously be carried out for any stock you have in mind, but not many investor portfolios are made up of just one stock – they are made up of 10s of stocks, 100s of stocks sometimes. The rule regarding diversification is that it’s marginal benefit reduces as the number of stocks increases…in other words, the diversification benefit you get when moving from 1 stock to 2 stocks, is larger than when moving from 2 stocks to 3 stocks and so on (this is obviously just a general rule and the actual diversification benefits of any given stock depends on its correlation with the existing portfolio).

So what we are interested in is not the expected return and volatility (standard deviation) of a collection of individual stocks, but rather we want that information for the portfolio of stocks as a whole. This will capture the benefits of diversification of less than perfect correlation between the stocks in the portfolio.

So let’s now assume that we hold a portfolio of 4 tech stocks, Apple, Microsoft, Amazon and Yahoo…how do we start to calculate the expected return and volatility of that portfolio?

Well we need as our first input, the weights of the stocks in the portfolio – that is, how much of each stock do we hold as a percentage of the entire portfolio holdings.

Let’s say our portfolio is made up of 50% Apple stock, 20% Microsoft stock, 20% Amazon stock and 10% Yahoo. We can calculate the portfolio expected return and volatility as follows:

```#list of stocks in portfolio
#NOTE THAT THESE MUST BE ENTERED IN ALPHABETICAL ORDER FOR THE RESULTS TO BE CORRECT!!!
stocks = ['AAPL','AMZN','MSFT','YHOO']

#download daily price data for each of the stocks in the portfolio

data.sort_index(inplace=True)

#convert daily stock prices into daily returns
returns = data.pct_change()

#calculate mean daily return and covariance of daily returns
mean_daily_returns = returns.mean()
cov_matrix = returns.cov()

#set array holding portfolio weights of each stock
weights = np.asarray([0.5,0.2,0.2,0.1])

#calculate annualised portfolio return
portfolio_return = round(np.sum(mean_daily_returns * weights) * 252,2)
#calculate annualised portfolio volatility
portfolio_std_dev = round(np.sqrt(np.dot(weights.T,np.dot(cov_matrix, weights))) * np.sqrt(252),2)

print('Portfolio expected annualised return is {} and volatility is {}').format(portfolio_return,portfolio_std_dev)

```

Which gets us:

```"Portfolio expected annualised return is 0.23 and volatility is 0.21"
```

Great so we’re getting there, we now know the expected return and volatility of our current portfolio. But what if we aren’t happy with the level of volatility of our current portfolio and would like to reduce it? What if we are willing to take on more risk in search of a higher expected return? How can we rearrange the weight of each stock in our portfolio to achieve these goals?

Well, we could start by manually changing the weights in the array in our previous code, run the program again and see what the expected return and volatility of that particular set of weights comes out at. But that’s a VERY manual way to proceed….there are technically an infinite number of sets of portfolio weights to test, so obviously that’s not practical.

Luckily we can use Monte Carlo simulation to run 1000s of runs of different randomly generated weights for the individual stocks (obviously making sure the weights sum to 100%) and then calculate the expected return, expected volatility and Sharpe Ratio for each of the randomly generated portfolios.

It’s then very simple (and helpful) to plot these combinations of expected returns and volatilities on a scatter plot – we can even colour the data points based on the Sharpe Ratio of that particular portfolio. So let’s do this:

```import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

#list of stocks in portfolio
stocks = ['AAPL','AMZN','MSFT','YHOO']

#download daily price data for each of the stocks in the portfolio

data.sort_index(inplace=True)

#convert daily stock prices into daily returns
returns = data.pct_change()

#calculate mean daily return and covariance of daily returns
mean_daily_returns = returns.mean()
cov_matrix = returns.cov()

#set number of runs of random portfolio weights
num_portfolios = 25000

#set up array to hold results
results = np.zeros((3,num_portfolios))

for i in xrange(num_portfolios):
#select random weights for portfolio holdings
weights = np.random.random(4)
#rebalance weights to sum to 1
weights /= np.sum(weights)

#calculate portfolio return and volatility
portfolio_return = np.sum(mean_daily_returns * weights) * 252
portfolio_std_dev = np.sqrt(np.dot(weights.T,np.dot(cov_matrix, weights))) * np.sqrt(252)

#store results in results array
results[0,i] = portfolio_return
results[1,i] = portfolio_std_dev
#store Sharpe Ratio (return / volatility) - risk free rate element excluded for simplicity
results[2,i] = results[0,i] / results[1,i]

#convert results array to Pandas DataFrame
results_frame = pd.DataFrame(results.T,columns=['ret','stdev','sharpe'])

#create scatter plot coloured by Sharpe Ratio
plt.scatter(results_frame.stdev,results_frame.ret,c=results_frame.sharpe,cmap='RdYlBu')
plt.colorbar()
```

From the figure above it’s obvious that changing the weight of each stock in the portfolio can have a dramatic effect on the expected return and level of risk (standard deviation/volatility) the investor is exposed to.

We can see, for example, that if the investor is targeting a return of 18%, that can be achieved by holding a portfolio with a volatility of just under 20%, but some portfolios share that same expected return with a volatility as high as just over 28%. This makes it very clear that we must be very thoughtful when choosing how much weight each of our stocks in our portfolio should carry.

Two portfolios that we may like to highlight as being “special” are 1) the portfolio with the highest Sharpe Ratio (i.e. the highest risk adjusted returns) and 2) The “minimum variance portfolio” which is the portfolio with the lowest volatility.

We can locate these 2 portfolios by making a few changes to our code to store all the random weight arrays used for each run of the Monte Carlo simulation along side the expected return, standard deviation and Sharpe Ratio as we go along, then locate the point in the resulting DataFrame where the Sharpe Ratio is highest for portfolio “1” and where the standard deviation is lowest for portfolio “2”. Then – because we stored them as we went along, we can extract the array of weights that corresponds to each of those two portoflios and bang! there we have all the info we need to construct either of the two portfolios!

The full code is as follows (with the changes/additions highlighted):

```import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

#list of stocks in portfolio
stocks = ['AAPL','AMZN','MSFT','YHOO']

#download daily price data for each of the stocks in the portfolio

data.sort_index(inplace=True)

#convert daily stock prices into daily returns
returns = data.pct_change()

#calculate mean daily return and covariance of daily returns
mean_daily_returns = returns.mean()
cov_matrix = returns.cov()

#set number of runs of random portfolio weights
num_portfolios = 25000

#set up array to hold results
#We have increased the size of the array to hold the weight values for each stock
results = np.zeros((4+len(stocks)-1,num_portfolios))

for i in xrange(num_portfolios):
#select random weights for portfolio holdings
weights = np.array(np.random.random(4))
#rebalance weights to sum to 1
weights /= np.sum(weights)

#calculate portfolio return and volatility
portfolio_return = np.sum(mean_daily_returns * weights) * 252
portfolio_std_dev = np.sqrt(np.dot(weights.T,np.dot(cov_matrix, weights))) * np.sqrt(252)

#store results in results array
results[0,i] = portfolio_return
results[1,i] = portfolio_std_dev
#store Sharpe Ratio (return / volatility) - risk free rate element excluded for simplicity
results[2,i] = results[0,i] / results[1,i]
#iterate through the weight vector and add data to results array
for j in range(len(weights)):
results[j+3,i] = weights[j]

#convert results array to Pandas DataFrame
results_frame = pd.DataFrame(results.T,columns=['ret','stdev','sharpe',stocks,stocks,stocks,stocks])

#locate position of portfolio with highest Sharpe Ratio
max_sharpe_port = results_frame.iloc[results_frame['sharpe'].idxmax()]
#locate positon of portfolio with minimum standard deviation
min_vol_port = results_frame.iloc[results_frame['stdev'].idxmin()]

#create scatter plot coloured by Sharpe Ratio
plt.scatter(results_frame.stdev,results_frame.ret,c=results_frame.sharpe,cmap='RdYlBu')
plt.xlabel('Volatility')
plt.ylabel('Returns')
plt.colorbar()
#plot red star to highlight position of portfolio with highest Sharpe Ratio
plt.scatter(max_sharpe_port,max_sharpe_port,marker=(5,1,0),color='r',s=1000)
#plot green star to highlight position of minimum variance portfolio
plt.scatter(min_vol_port,min_vol_port,marker=(5,1,0),color='g',s=1000)
```

And now, all that’s left to do is pull out the weights of the stocks needed to create the two portfolios highlighted above, and that can be done as follows:

```print(max_sharpe_port)
```
```ret 0.255758
stdev 0.220652
sharpe 1.159100
AAPL 0.472286
MSFT 0.387202
AMZN 0.090554
YHOO 0.049959
Name: 7146, dtype: float64```

(PLEASE NOTE THESE RESULTS ARE INCORRECT AS THEY WERE RUN PREVIOUS TO CORRECTLY SETTING MY INITIAL STOCK LIST IN ALPHABETICAL ORDER – I SHALL TRY TO FIX WHEN I GET A MOMENT)

and

```print(min_vol_port)
```
```ret 0.196974
stdev 0.194356
sharpe 1.013470
AAPL 0.299661
MSFT 0.087340
AMZN 0.433787
YHOO 0.179211
Name: 5792, dtype: float64```

Ok, I think I’ll leave it there for now. As always, any comments or questions, please do leave them in the comments section below.

Until next time…

#### You may also like

January 21, 2017 - 1:49 pm

Interesting article just as the previous ones. I like the simple solutions that lead to insightful results.

I have a question regarding your 18% return example. Why would anyone opt for 18% return if the minimum volatility portfolio can produce >19% return? Even the best 18% return portfolio has a volatility that can also produce a 21% return. Or it was just an example for the variance of volatilities with the same return?

Here again you calculate arithmetic return so may this line can be inaccurate: portfolio_return = round(np.sum(mean_daily_returns * weights) * 252,2) . Probably it is not a big problem since below 1% the difference of log and arithmetic return is negligible but if you have higher values it can distort results.

You may want to take a look at the portfolioopt (https://github.com/czielinski/portfolioopt) module, that is basically built on top of cvxopt (a convex optimizer module for python). It can calculate quickly Markowitz portfolios, minimum variance portfolios and tangency portfolios.

Keep up the good work!

January 28, 2017 - 6:16 pm

Hi Viktor, thanks for your comments as always. Again, you are technically correct regarding the log vs atrithmetic returns, I should stop being lazy and start to use log returns going forward.

The example of the 18% return was just indeed an example to highlight the scale of different volatilities possible for the same return…of course no rational investor would choose a return of 18% as as you noticed, a higher return is available on the efficient frontier for the same level of volatility…it was just an example, albeit looking back perhaps not the best example.

Thanks for the recommendation regarding the module you mentioned, I’ll be sure to check it out…I may even do quick blog post about it at some point ?

July 27, 2017 - 11:48 pm

Hello,

Awesome article. Simple question for you. If I want to take into account ‘short’ positions, would I just follow the same logic and syntax of this code and instead put for example:

weights = np.asarray([ -0.5, 0.2, 0.2, 0.1]) if I want Apple to be a short position of 50% instead of a long position of 50%?

July 28, 2017 - 9:28 pm

Hi Sam, glad you like the article. I would say that the method to include short positions isn’t quite as simple as just reversing the sign on the short, however nor is it particularly complex.

If we assume that you are trading cash equities, if you short sell you will actually receive a flow of funds into your portfolio which you can then use to purchase other shares.

Say you had a portfolio of 100k, and you short sell 25k worth of Apple stock, you will actually have 125k cash in your portfolio and a short position in Apple.

So you need to decide whether you want your model to assume you are fully invested at all times – meaning you sit with zero cash in your portfolio.

Taking the example above, this means if you were short 50% Apple, you would need to invest those additional funds of 50% across the other stocks in your portfolio, for e.g.

weights = np.asarray([ -0.5, 1.2, 0.2, 0.1])

You can see now that the sum of the weights still adds up to one and you have invested the additional cash from the Apple sale into the second stock in the list, whatever that may be.

If you wish to sit in cash, then you can just assume whatever % you have in cash has zero mean return and zero volatility.

I would simply add a column to the DataFrame that held a constant – say 1 – and then assign a particular percentage of cash to that stream – it will result in calculating the relevant percentage for a cash amount that doesn’t change.

July 28, 2017 - 11:30 pm

Thank you so much for the response. Ok makes sense. So going off of your latter example, lets assume I want to not put that additional funds that I receive from the short to work and any additional funds that I get from shorting I let sit in cash. When you say to add a column to the data frame, how would the mechanics of this actually look like. In the apple case where I want 50% to be short APPLE, 20% long MSFT, 20% long AMZN, and 10% long YHOO, would it then look like this?

# list of stocks in portfolio including excess cash from short positions
stocks = [‘AAPL’, ‘MSFT’, ‘AMZN’, ‘YHOO’]

# download daily price data for each of the stocks in the portfolio

# set array holding portfolio weights of each stock
weights = np.asarray([ -0.5, 0.2, 0.2, 0.1, 1.0])

Am I one the right track? I’m worried that since I added a column to the data frame to represent cash (at 1.0), the algorithm gets confused since my initial data frame only had 4 columns (stocks = [‘AAPL’, ‘MSFT’, ‘AMZN’, ‘YHOO’]), instead of the newly added data frame including net cash with 5 columns (weights = np.asarray([ -0.5, 0.2, 0.2, 0.1, 1.0]). Thanks in advance for the help! Really appreciate it………..

Best,

Sam

July 30, 2017 - 11:19 pm

Hi Sam…That’s not exactly how i meant it should be done.

You should indeed add an extra number to your weights vector as mentioned, and that number will represent the % of your portfolio that you are holding un-invested, in cash.

So using your example of being short 50% of Apple stock you would have a weight vector of:

(weights = np.asarray([ -0.5, 0.2, 0.2, 0.1, 1.0])

The 1.0 at the end is calculated as being the percentage of your portfolio that you hold in cash – in this instance it is actually 100% as you have financed the 0.2, 0.2 and 0.1 holdings by selling 0.5 of Apple – leaving you flat and holding 100% of your portfolio value in cash.

If you had only shorted 10% Apple stock for example, your weights vector would be:

(weights = np.asarray([ -0.1, 0.2, 0.2, 0.1, 0.6])

as the portfolio needs to sum to 1.0.

But you also need to add the cash “returns” stream to your initial DataFrame of investments – you need 5 columns, the last one representing cash.

I think the easiest way to do it would be just to write:

data[‘CASH’] = 1

after the line:

This will get you a proxy for your cash returns (which of course will be zero, with zero volatility).

After that the script should run as normal, with the relevant percentage being assigned to a stream of cash returns (i.e. zero returns with zero volatility).

The only problem I can forsee (without having run the new code myself) is that you may get a “division by zero” error somewhere along the way.

If you do…let me know and I can have a rethink.

July 31, 2017 - 7:18 pm

Hello again!

Yes this seemed to work fine from what I can see. Just want to clarify one thing though. Basically, I assumed a net short of 50% Apple, while keeping 20% long MSFT, 20% long AMZN, and 10% long YHOO. Thus my data frame along with relevant code looks like this:

# download daily price data for each of the stocks in the portfolio

# add line of code so specify % of portfolio from short sale to remain in cash; in this case 100% of it

data[‘CASH’] = 1

# list of stocks in portfolio
stocks = [‘AAPL’, ‘MSFT’, ‘AMZN’, ‘YHOO’]

weights = np.asarray([-0.5, 0.2, 0.2, 0.1,1.0])

The code seems to run just fine and I do get output with no division errors, which is great. However, I want to make sure I am interpreting this correctly. I assumed I was net short 50% apple for this example, and thus wanted to optimize my returns based on either sharpe ratio or volatility dating back to 2010. Now, the output I get is this:

**********Portfolio to Maximize Sharpe Ratio**********
ret 0.269247
stdev 0.221132
sharpe 1.217584
AAPL 0.455846
MSFT 0.413705
AMZN 0.006025
YHOO 0.124424

_____________________________________________________________________________________________________________
**********Portfolio to Minimize Volatility**********
ret 0.199826
stdev 0.190343
sharpe 1.049820
AAPL 0.302666
MSFT 0.100345
AMZN 0.435584

I guess my simple question is this. Is the python optimization telling me that despite having a net short position of 50% apple due to my own discretion and preference to construct this portfolio, if I wanted to in fact optimize the portfolio fully based on both Sharpe Ratio, or Minimum Volatility, then I should rebalance this portfolio to actually be NET LONG apple (at 45.5% for Sharpe Ratio maximization) and NET LONG apple (at 30.26% to Maximize Volatility) instead of my desired -50% NET SHORT apple I initially specified in my portfolio? In other words, to optimize based on both criteria, it is telling me to actually be NET LONG apple instead of NET SHORT apple, at least going back to my inputted date of 1/1/2010?

As always, thank you so much for the assistance and guidance. Your work is truly inspirational and helpful in my learnings. Look forward to hearing your thoughts soon!

Best,

Sam Khorsand

July 31, 2017 - 11:40 pm

Hi Sam,

A couple of points to make first…

When you write the line:

# add line of code so specify % of portfolio from short sale to remain in cash; in this case 100% of it
data[‘CASH’] = 1

The series of 1s isn’t there to represent the 100% cash held, it is there to represent the return stream of actually holding the cash. So it is analogous to the “price” of cash, just as the price of Apple or the price of Yahoo is shown in the DataFrame. I just chose 1 as an arbitrary figure – you could have chosen anything that remained constant to represent the fact that cash is risk free (zero volatility) but also generates zero returns – hence the “price” of cash doesn’t change. Does that make sense?

It’s the 1.0 in the “weights” vector which represents how much cash you are actually holding in % terms.

Secondly, when I stated that the above code should run ok, I assumed you were just calculating the returns and volatility of one pre-defined portfolio. To run an optimisation of a portfolio which allows a cash holding will require some more modifications to the code.

Unfortunately i can’t get the code to run at all anymore due to the discontinuation of the Yahoo Finance API, but to point you in the right direction you will need to do the following:

#download daily price data for each of the stocks in the portfolio

stocks.append(‘CASH’)

Then change the line:

results = np.zeros((4+len(stocks)-1,num_portfolios))

to

results = np.zeros((5+len(stocks)-1,num_portfolios))

Then change the line:

weights = np.array(np.random.random(4))

to

weights = np.array(np.random.random(5))

Then change the line:

#convert results array to Pandas DataFrame
results_frame = pd.DataFrame(results.T,columns=[‘ret’,’stdev’,’sharpe’,stocks,stocks,stocks,stocks])

to

#convert results array to Pandas DataFrame
results_frame = pd.DataFrame(results.T,columns=[‘ret’,’stdev’,’sharpe’,stocks,stocks,stocks,stocks,stocks])

and HOPEFULLY that should run ok…as i said I am doing this without actually running the code so no guarantees! 😉

To answer your last point – remember it doesn’t matter what you have specified previously in your weights vector as being short Apple or what have you – when running the optimisation simulation, your initial weights vector is overwritten each time it runs with random weights for each asset – that is the whole point of the exercise – to allow the optimisation procedure to find the “best” portfolio for you….regardless of what your initial portfolio is specified as being.

Hope all that makes sense. Any questions – fire away! 😉

August 1, 2017 - 12:18 am

Stuart,

Thank you very much for the prompt response. I made the necessary changes you describe above and getting a very odd error message specifically:

ValueError: Wrong number of items passed 9, placement implies 8
ValueError: Shape of passed values is (9, 25000), indices imply (8, 25000)

The Full ERROR is:
******************************************************************************************************************************************************************

Traceback (most recent call last):
File “C:\Users\Sam\Anaconda64Best\lib\site-packages\pandas\core\internals.py”, line 4294, in create_block_manager_from_blocks
placement=slice(0, len(axes)))]
File “C:\Users\Sam\Anaconda64Best\lib\site-packages\pandas\core\internals.py”, line 2719, in make_block
return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
File “C:\Users\Sam\Anaconda64Best\lib\site-packages\pandas\core\internals.py”, line 115, in __init__
len(self.mgr_locs)))
ValueError: Wrong number of items passed 9, placement implies 8

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “C:/Users/Sam/PycharmProjects/Test/.ipynb_checkpoints/Long Short Portfolio Opt Cash Sit.py”, line 178, in
results_frame = pd.DataFrame(results.T, columns=[‘ret’, ‘stdev’, ‘sharpe’, stocks, stocks, stocks, stocks, stocks])
File “C:\Users\Sam\Anaconda64Best\lib\site-packages\pandas\core\frame.py”, line 306, in __init__
copy=copy)
File “C:\Users\Sam\Anaconda64Best\lib\site-packages\pandas\core\frame.py”, line 483, in _init_ndarray
return create_block_manager_from_blocks([values], [columns, index])
File “C:\Users\Sam\Anaconda64Best\lib\site-packages\pandas\core\internals.py”, line 4303, in create_block_manager_from_blocks
construction_error(tot_items, blocks.shape[1:], axes, e)
File “C:\Users\Sam\Anaconda64Best\lib\site-packages\pandas\core\internals.py”, line 4280, in construction_error
passed, implied))
ValueError: Shape of passed values is (9, 25000), indices imply (8, 25000)

******************************************************************************************************************************************************************
I also tried to change:

for j in range(len(weights)):
results[j + 3, i] = weights[j]

to:

for j in range(len(weights)):
results[j + 4, i] = weights[j] , thinking that might do it. (from 3 to 4) But it did not.

Any idea what could be going on with the 9 instead of 8 error?

My full code is shown below, if it may help. Thank you so much for your time and concern in helping me. I really appreciate it.

******************************************************************************************************************************************************************
******************************************************************************************************************************************************************

FULL CODE:

******************************************************************************************************************************************************************
******************************************************************************************************************************************************************

from pandas_datareader import data as web
import numpy as np
import pandas as pd

stock = [‘AAPL’]

returns = data.pct_change()

mean_return = returns.mean()
return_stdev = returns.std()

annualised_return = round(mean_return * 252, 2)
annualised_stdev = round(return_stdev * np.sqrt(252), 2)

print(‘AAPL’)

print(‘Annualised Mean Return’)
print(annualised_return)

print(‘Annualised Volatility’)
print(annualised_stdev)

print(‘____________________________________________________________________________________________________________’)

# now assume that we hold a portfolio of 4 tech stocks, Apple, Microsoft, Amazon and Yahoo…how do we start to calculate the expected return and volatility of that portfolio?

# first input needed will be the weights of the stocks in the portfolio
# Assume portfolio is made up of:

# 50% AAPL, 20% MSFT, 20% AMZN, and 10% YHOO. Calculate expected return and volatility as follows:

# list of stocks in portfolio
stocks = [‘AAPL’, ‘MSFT’, ‘AMZN’, ‘YHOO’]

# download daily price data for each of the stocks in the portfolio

# *****************************************************************************************************************************************************************************************************
# Since we are not reinvesting short sale proceeds into portfolio and letting it sit in cash, we also need to add the cash “returns” stream to our initial DataFrame of investments – we need 5 columns,
# the last one representing cash.
# ******************************************************************************************************************************************************************************************************

# there to represent the return stream of actually holding the cash. So it is analogous to the “price” of cash, just as the price of Apple or the price of Yahoo is shown in the DataFrame.
# I chose 1 as an arbitrary figure – you can choose anything that remains constant to represent the fact that cash is risk free (zero volatility) but also generates zero returns – hence the
# “price” of cash does not change. It’s the 1.0 in the ‘weights’ vector which represents how much cash you are actually holding in % terms.
data[‘CASH’] = 1

# convert daily stock prices into daily returns
returns = data.pct_change()

# calculate mean daily return and covariance of daily returns
mean_daily_returns = returns.mean()
cov_matrix = returns.cov()

#***********************************************************************************************************************************************
## NOW , ASSUME WE WANT TO BE SHORT 50% APPLE, INSTEAD OF LONG 50%
# WHEN YOU SHORT 50% APPLE, YOU CAN REINVEST THOSE FUNDS THAT YOU OBTAIN FROM BROKER FOR TAKING ON THE SHORT ACROSS OTHER STOCKS IN THE PORTFOLIO
# HOWEVER, IN THIS EXAMPLE CODE BELOW WE WANT TO REMAIN NET SHORT, YET KEEP PROCEEDS FROM SHORT SALE IN CASH AND NOT REINVEST IT
# NOTICE SUMS STILL ADD TO 1 FOR THE DATA FRAME
#***********************************************************************************************************************************************

# set array holding portfolio weights of each stock
# The 1.0 at the end is calculated as being the percentage of your portfolio that you hold in cash – in this instance it is actually 100% as you have financed the 0.2, 0.2 and 0.1
# holdings by selling 0.5 of Apple – leaving you flat and holding 100% of your portfolio value in cash.
# f you had only shorted 10% Apple stock for example, your weights vector would be:
# (weights = np.asarray([ -0.1, 0.2, 0.2, 0.1, 0.6])
# as the portfolio needs to sum to 1.0

weights = np.asarray([-0.5, 0.2, 0.2, 0.1,1.0])

# calculate annualised portfolio return
portfolio_return = round(np.sum(mean_daily_returns * weights) * 252, 2)
# calculate annualised portfolio volatility
portfolio_std_dev = round(np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights))) * np.sqrt(252), 2)

print(‘Portfolio expected annualised return’)
print(portfolio_return)

print(‘Portfolio expected volatility’)
print(portfolio_std_dev)

print(‘____________________________________________________________________________________________________________’)

# But what if we aren’t happy with the level of volatility of our current portfolio and would like to reduce it?
# What if we are willing to take on more risk in search of a higher expected return?

# we can use Monte Carlo simulation to run 1000s of runs of different randomly generated weights for the individual stocks
#(obviously making sure the weights sum to 100%) and then calculate the expected return, expected volatility and Sharpe Ratio for each of the randomly generated portfolios.

# it’s then very simple (and helpful) to plot these combinations of expected returns and volatilities on a scatter plot – we can even colour the data points based on the
# Sharpe Ratio of that particular portfolio.

# Two portfolios that we may like to highlight as being “special” are
# 1) the portfolio with the highest Sharpe Ratio (i.e. the highest risk adjusted returns)
# and
# 2) The “minimum variance portfolio” which is the portfolio with the lowest volatility.

# We can locate these 2 portfolios by making a few changes to our code to store all the random weight arrays used for each run of the Monte Carlo simulation along side the expected return,
# standard deviation and Sharpe Ratio as we go along, then locate the point in the resulting DataFrame where the Sharpe Ratio is highest for portfolio “1” and where the standard deviation is lowest for
# portfolio “2”. Then – because we stored them as we went along, we can extract the array of weights that corresponds to each of those two portoflios and like that, we have all the info we need to construct
# either of the two portfolios.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# list of stocks in portfolio
stocks = [‘AAPL’, ‘MSFT’, ‘AMZN’, ‘YHOO’]

stocks.append(‘CASH’)

# download daily price data for each of the stocks in the portfolio

# convert daily stock prices into daily returns
returns = data.pct_change()

# calculate mean daily return and covariance of daily returns
mean_daily_returns = returns.mean()
cov_matrix = returns.cov()

# set number of runs of random portfolio weights
num_portfolios = 25000

# set up array to hold results
# We have increased the size of the array to hold the weight values for each stock
results = np.zeros((5 + len(stocks) – 1, num_portfolios))

for i in range(num_portfolios):
# select random weights for portfolio holdings
weights = np.array(np.random.random(5))
# rebalance weights to sum to 1
weights /= np.sum(weights)

# calculate portfolio return and volatility
portfolio_return = np.sum(mean_daily_returns * weights) * 252
portfolio_std_dev = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights))) * np.sqrt(252)

# store results in results array
results[0, i] = portfolio_return
results[1, i] = portfolio_std_dev
# store Sharpe Ratio (return / volatility) – risk free rate element excluded for simplicity
results[2, i] = results[0, i] / results[1, i]
# iterate through the weight vector and add data to results array
for j in range(len(weights)):
results[j + 3, i] = weights[j]

# convert results array to Pandas DataFrame
results_frame = pd.DataFrame(results.T, columns=[‘ret’, ‘stdev’, ‘sharpe’, stocks, stocks, stocks, stocks, stocks])

# locate position of portfolio with highest Sharpe Ratio
max_sharpe_port = results_frame.iloc[results_frame[‘sharpe’].idxmax()]

# locate positon of portfolio with minimum standard deviation
min_vol_port = results_frame.iloc[results_frame[‘stdev’].idxmin()]

# create scatter plot coloured by Sharpe Ratio
plt.scatter(results_frame.stdev, results_frame.ret, c=results_frame.sharpe, cmap=’RdYlBu’)
plt.xlabel(‘Volatility’)
plt.ylabel(‘Returns’)
plt.colorbar()

# plot red star to highlight position of portfolio with highest Sharpe Ratio
plt.scatter(max_sharpe_port, max_sharpe_port, marker=(5, 1, 0), color=’r’, s=1000)

# plot green star to highlight position of minimum variance portfolio
plt.scatter(min_vol_port, min_vol_port, marker=(5, 1, 0), color=’g’, s=1000)

# And now, all that’s left to do is pull out the weights of the stocks needed to create the two portfolios highlighted above, and that can be done as follows:

# print portfolio to maximize sharpe ratio
print(‘**********Portfolio to Maximize Sharpe Ratio**********’)
print(max_sharpe_port)

print(‘_____________________________________________________________________________________________________________’)

# print portfolio to minimize portfolio variance
print(‘**********Portfolio to Minimize Volatility**********’)
print(min_vol_port)

plt.show(plt.scatter)
plt.show(plt.colorbar())

******************************************************************************************************************************************************************
******************************************************************************************************************************************************************

August 1, 2017 - 1:27 am

Hi Sam,
Yeah ok just change line

```results = np.zeros((5+len(stocks)-1,num_portfolios))
```

Back to:

```results = np.zeros((4+len(stocks)-1,num_portfolios))
```

Also I’ve had a play around with some code and if you want to run an optimisation which allows short selling, and a portfolio value that always sums to 1, then you can use the following code:

```from random import uniform as rand

def randConstrained(n, m, low, high):
tot = m
if not low <= 0 <= high:
raise ValueError("Cannot guarantee a solution when the input does not allow for 0s")
for _ in range(n-1):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

#list of stocks in portfolio
stocks = ['AAPL','MSFT','AMZN','YHOO']

#download daily price data for each of the stocks in the portfolio

data['CASH'] = 1.0

stocks.append('CASH')

#convert daily stock prices into daily returns
returns = data.pct_change()

#calculate mean daily return and covariance of daily returns
mean_daily_returns = returns.mean()
cov_matrix = returns.cov()

#set number of runs of random portfolio weights
num_portfolios = 25000

#set up array to hold results
#We have increased the size of the array to hold the weight values for each stock
results = np.zeros((4+len(stocks)-1,num_portfolios))

for i in range(num_portfolios):
#select random weights for portfolio holdings
weights = np.asarray(randConstrained(5,1,-1,1))

#calculate portfolio return and volatility
portfolio_return = np.sum(mean_daily_returns * weights) * 252
portfolio_std_dev = np.sqrt(np.dot(weights.T,np.dot(cov_matrix, weights))) * np.sqrt(252)

#store results in results array
results[0,i] = portfolio_return
results[1,i] = portfolio_std_dev
#store Sharpe Ratio (return / volatility) - risk free rate element excluded for simplicity
results[2,i] = results[0,i] / results[1,i]
#iterate through the weight vector and add data to results array
for j in range(len(weights)):
results[j+3,i] = weights[j]

#convert results array to Pandas DataFrame
results_frame = pd.DataFrame(results.T,columns=['ret','stdev','sharpe',stocks,stocks,stocks,stocks,stocks])

#locate position of portfolio with highest Sharpe Ratio
max_sharpe_port = results_frame.iloc[results_frame['sharpe'].idxmax()]
#locate positon of portfolio with minimum standard deviation
min_vol_port = results_frame.iloc[results_frame['stdev'].idxmin()]

#create scatter plot coloured by Sharpe Ratio
plt.scatter(results_frame.stdev,results_frame.ret,c=results_frame.sharpe,cmap='RdYlBu')
plt.xlabel('Volatility')
plt.ylabel('Returns')
plt.colorbar()
#plot red star to highlight position of portfolio with highest Sharpe Ratio
plt.scatter(max_sharpe_port,max_sharpe_port,marker=(5,1,0),color='r',s=1000)
#plot green star to highlight position of minimum variance portfolio
plt.scatter(min_vol_port,min_vol_port,marker=(5,1,0),color='g',s=1000)
```

My only concern with these two suggestions is that in theory the minimum volatility portfolio should really just be 100% invested in cash, although the optimisations I have run get close to 100%, they don’t quite get there. i guess that is just a result of not being able to brute force your way through every single possible combination of weights – and with an infinite number theoretically possible, it’s no wonder it never actually hits a perfect 100%.
Again – any questions, just ask.

August 1, 2017 - 6:14 am

Awesome makes sense! Thanks for providing some of your own code and take on this long/short construction.

Random question, so I assume your code here (given shorting is an option now) will optimize and provide the option to short either 1, or 2, or 3, or even 4 of the specific stocks you state here? In other words, if the optimization reveals that all 4 stocks apple, msft, yhoo, and amzn were to be shorted given the time frame you specify to optimize based on sharpe ratio or minimum volatility, that could technically be a viable option with this code?

Thus, it is not restricted to 3 of the 4 names always being long, and 1 of the 4 names being short? In other words, as many shorts or longs are an option if that be the case if the optimization says so?

Best,

Sam

August 1, 2017 - 8:05 am

Yeah the last bit of code provided actually allows any of the stocks to be shorted, along with the possibility to short the cash element too – that is borrow more cash to invest in the stocks – the only constraint is that the entire portfolio must sum to 1.0

August 25, 2017 - 7:11 am

@s666

Thanks for the code help. Going off of the algorithm you sent me on running the long short algo results (shown below after message), I am having a somewhat hard time interpreting results.

You mentioned that the only constraint is that the entire portfolio must some to 1.0 or 100% on the long short, which makes sense. But, you also mentioned that the ‘code provided allows any of the stocks to be shorted, along with the possibility to short the cash element too; that is borrow more cash to invest in the stocks. My question is pretty simple (so I think):

If the outcome of my hypothetical optimization for a portfolio of 27 securities based on your 1 restriction cited above (all of the portfolio allocation must sum to 100%) along with the notion that there is a possibility to short the cash element too (that is borrow more cash to invest in the stocks), is for example:

SPY 0.375613
JPM -0.067214
BAC 0.158994
JNJ -0.657451
PFE 1.203582
AAPL -1.022245
FB -0.260270
GOOGL 0.891955
XOM -0.802305
CVX 0.060475
AMZN -0.514650
CMCSA 1.471165
MCD -0.810129
MMM -0.743058
HON 1.983260
PG -1.506028
WMT -0.241323
AMGN -0.504087
BIIB 2.857846
GLD -1.274357
SHY -0.499822
IEF 0.742873
AGG -0.759372
BWX -0.841793
CWI 0.823486
VTI 0.572939
EEM -0.713970
CASH 1.075886

Then, I am having trouble understanding this from a hypothetical Notion portfolio value terms assuming I am a PM and want to put \$100K to work today based on what I see in how to allocate my long short portfolio with the above results.

In other words, lets assume I ran the optimization shown above and the results I get in the above figures is what I want to put to work in my portfolio today in real time as far as how I want to allocate a hypothetical \$100,000 from the moment I see the results the algo gave me above. Given the optimization estimates that spit out are extreme in nature (i.e. some stocks are at +100% long, while others are -80% short), I want to figure out exactly how I can allocate this hypothetical \$100,000 based on the above percentage allocations in real time?

Now, given there is such a stretch in extreme long to shorts (i.e a range of +150% long to -80% short), how do I determine based on the above results how to actually translate these results to a portfolio of \$100,000 today assuming I wanted to believe in these result allocations? Or would I need to adjust the long/short code to adhere to my objectives stated? I guess all I am trying to do is interpret the %s above (whether long or short) and translate them to my \$100K at work and know what % of each of these 27 names I am shorting and longing based on the results the algo spit out.

As always, really appreciate your guidance and assistance. Love your work and hope to hear back soon.

Best,

Sam K.

July 31, 2017 - 12:25 am
July 31, 2017 - 12:30 am

Are you referencing this to point out the difference in the way i have calculated the Sharpe Ratio vs how this article does?

I have excluded the risk free return element for simplicity – as mentioned in the code comments. With risk free rates as low as they are currently, the difference in the Sharpe ratio calculation will be minimal.

August 1, 2017 - 8:29 pm

Awesome makes sense. Thanks a lot. Curious- would you recommend any particular resources such as specific PDFs, articles, or even books that further expand on this topic of portfolio optimization using Python specifically. I am very infatuated by the subject, however cannot find any resourceful books, sites, or articles online that focus specifically on how to expand its horizons specifically with the use of Python.

Please let me know if you can recommend anything. And again, thank you so much for the guidance and help. Truly appreciated!

Best,

Sam

August 12, 2017 - 10:00 pm

This is a basic question but i am starting in the financial world. Why when you’re calculating the portfolio return multiply by 252 ?

August 13, 2017 - 7:51 am

Hi Mario – that’s a perfectly reasonable questions, and well noticed. Very simply, we multiply the average daily return by 252 to get the annual return as there are 252 “trading days” in a year; we don’t include weekends as the markets are shut.

As a second thing to notice, when we are converting daily returns to annual returns we multiply by 252 as stated, as returns scale linearly with time, but to scale the volatility (standard deviation) from daily to yearly, we multiply by the square root of 252. This is because standard deviation does not scale linearly with time, variance does, and standard deviation if the square root of variance.

For a more indepth explanation of this last point, please visit: http://www.macroption.com/why-is-volatility-proportional-to-square-root-of-time/

August 13, 2017 - 5:46 pm

Thanks for the amazing reply. Imagine I want to calculate the returns from a specific date to a specific one let’s give an example, from start=’01/01/2010′ to end=’01/03/2010′ if I deleted the 252 value it would give me the correct return of the portfolio?

August 14, 2017 - 4:35 pm

Hi Mario, deleting the 252 completely won’t get you what you want…you need to change the number to reflect the number of business days (trading days) between the two dates you are generating your returns for. So in your case, you have selected a 2 month period – there are 21 trading days per month on average so if you multiply by 42 you will get your answer. Obviously February has less trading days than other months so you may like to account for that.

August 15, 2017 - 7:04 pm

nice Tutorial !! Thanks !!!

August 15, 2017 - 8:29 pm

It seems to me that the weights of the Maximum Sharpe Portfolio are wrong.
The low weight of AMZN seems to be wrong given the high mean return of AMZN.
The order of the weights should be: AAPL, AMZN, MSFT, YHOO instead of AAPL, MSFT, AMZN, YHOO, correct?
Many thanks!

August 16, 2017 - 5:48 am

Hi Jan, thanks for your comment. I can see why on first thoughts that may seem like a logical conclusion, that the highest Sharpe ratio portfolio would consist of the highest return stocks, however the return of the portfolio is just half the equation…we also have the portfolio volatility in the denominator of the Sharpe ratio formula.

The volatility of the portfolio depends very heavily on the variances of, and covariances between the stocks in question…the line:

portfolio_std_dev = np.sqrt(np.dot(weights.T,np.dot(cov_matrix, weights))) * np.sqrt(252)

calculates the portfolio standard deviation (volatility)…so in short, it is a little more complex than just choosing stocks with high returns.

Any question, fire away.

August 16, 2017 - 7:10 am

Many thanks for your quick reply. Of course, the minimum variance portfolio and the highest Sharpe ratio portfolio are functions of a) mean returns and b) the variance-co-variance matrix.
However, it seems there is an error in your code. The ORDER of the weights seems to be wrong.
PROVE:
1.) The average portfolio return r(p) is simply: r(p) = sum(w(i) *r(i)), where w(i) are the weights of the individual holdings and r(i) are the mean returns of the individual holdings.
2.) If I multiply the weights of the maximum Sharpe ratio portfolio as stated on your webpage with the mean returns the code calculates and sum them up, I do not get the portfolio return you state. Ergo: There is an error, q.e.d.
REASON FOR THE ERROR:
The order of the weights of the four holdings is wrong for both portfolios (maximum Sharpe portfolio AND minimum vol. portfolio). The order should be AAPL, AMZN, MSFT, YHOO not AAPL, MSFT, AMZN, YHOO.
I would appreciate if you would apply the formula r(p) = sum(w(i) *r(i)) to check if I am right (or wrong) that the ORDER of the weights of BOTH portfolios is wrong. If you agree that there is an error, please let me know where I can find the error in the code. There is no better way than learning from mistakes.
Thanks for the great code! I really appreciate your webpage. It is very helpful to learn Python …

August 16, 2017 - 10:23 am

Hi Jan – fantastic spot!!! I was too quick to answer you before. I have had a play around with the code and I have located the error…The list of stocks has to be in alphabetical order as this line of code:

#convert daily stock prices into daily returns
returns = data.pct_change()

#calculate mean daily return and covariance of daily returns
mean_daily_returns = returns.mean()

create a DataFrame of mean daily returns that is sorted by alphabetical order, so in order for the labels to be correct in the “results” array, the stocks must be listed in alphabetical order to begin with.

Again, really well noticed, I appreciate your eye for detail!

I will fix this when I get a free moment.

August 26, 2017 - 4:37 pm

@s666

Thanks for the code help. Going off of the algorithm you sent me on running the long short algo results (shown below after message), I am having a somewhat hard time interpreting results.
You mentioned that the only constraint is that the entire portfolio must some to 1.0 or 100% on the long short, which makes sense. But, you also mentioned that the ‘code provided allows any of the stocks to be shorted, along with the possibility to short the cash element too; that is borrow more cash to invest in the stocks. My question is pretty simple (so I think):

If the outcome of my hypothetical optimization for a portfolio of 27 securities based on your 1 restriction cited above (all of the portfolio allocation must sum to 100%) along with the notion that there is a possibility to short the cash element too (that is borrow more cash to invest in the stocks), is for example:

SPY 0.375613
JPM -0.067214
BAC 0.158994
JNJ -0.657451
PFE 1.203582
AAPL -1.022245
FB -0.260270
GOOGL 0.891955
XOM -0.802305
CVX 0.060475
AMZN -0.514650
CMCSA 1.471165
MCD -0.810129
MMM -0.743058
HON 1.983260
PG -1.506028
WMT -0.241323
AMGN -0.504087
BIIB 2.857846
GLD -1.274357
SHY -0.499822
IEF 0.742873
AGG -0.759372
BWX -0.841793
CWI 0.823486
VTI 0.572939
EEM -0.713970
CASH 1.075886

Then, I am having trouble understanding this from a hypothetical Notion portfolio value terms assuming I am a PM and want to put \$100K to work today based on what I see in how to allocate my long short portfolio with the above results.

In other words, lets assume I ran the optimization shown above and the results I get in the above figures is what I want to put to work in my portfolio today in real time as far as how I want to allocate a hypothetical \$100,000 from the moment I see the results the algo gave me above. Given the optimization estimates that spit out are extreme in nature (i.e. some stocks are at +100% long, while others are -80% short), I want to figure out exactly how I can allocate this hypothetical \$100,000 based on the above percentage allocations in real time?
Now, given there is such a stretch in extreme long to shorts (i.e a range of +150% long to -80% short), how do I determine based on the above results how to actually translate these results to a portfolio of \$100,000 today assuming I wanted to believe in these result allocations? Or would I need to adjust the long/short code to adhere to my objectives stated? I guess all I am trying to do is interpret the %s above (whether long or short) and translate them to my \$100K at work and know what % of each of these 27 names I am shorting and longing based on the results the algo spit out.
As always, really appreciate your guidance and assistance. Love your work and hope to hear back soon.

Best,
Sam K.

August 26, 2017 - 8:49 pm

@s666 going off the message I just sent you. I guess maybe to alter this, is there a way to add a constraint to the code so that lets say no more than +30% or -30% of the portfolio allocation can be in any one stock. This way maybe we can avoid the over leveraging of +100% or -100%? Yet, still keep the existing contraint that all the stocks must sum to 100%? The code you had created (to refresh your memory), is pasted below. Thanks for the help!

from random import uniform as rand

def randConstrained(n, m, low, high):
tot = m
if not low <= 0 <= high:
raise ValueError("Cannot guarantee a solution when the input does not allow for 0s")
for _ in range(n-1):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

#list of stocks in portfolio
stocks = ['AAPL','MSFT','AMZN','YHOO']

#download daily price data for each of the stocks in the portfolio

data['CASH'] = 1.0

stocks.append('CASH')

#convert daily stock prices into daily returns
returns = data.pct_change()

#calculate mean daily return and covariance of daily returns
mean_daily_returns = returns.mean()
cov_matrix = returns.cov()

#set number of runs of random portfolio weights
num_portfolios = 25000

#set up array to hold results
#We have increased the size of the array to hold the weight values for each stock
results = np.zeros((4+len(stocks)-1,num_portfolios))

for i in range(num_portfolios):
#select random weights for portfolio holdings
weights = np.asarray(randConstrained(5,1,-1,1))

#calculate portfolio return and volatility
portfolio_return = np.sum(mean_daily_returns * weights) * 252
portfolio_std_dev = np.sqrt(np.dot(weights.T,np.dot(cov_matrix, weights))) * np.sqrt(252)

#store results in results array
results[0,i] = portfolio_return
results[1,i] = portfolio_std_dev
#store Sharpe Ratio (return / volatility) – risk free rate element excluded for simplicity
results[2,i] = results[0,i] / results[1,i]
#iterate through the weight vector and add data to results array
for j in range(len(weights)):
results[j+3,i] = weights[j]

#convert results array to Pandas DataFrame
results_frame = pd.DataFrame(results.T,columns=['ret','stdev','sharpe',stocks,stocks,stocks,stocks,stocks])

#locate position of portfolio with highest Sharpe Ratio
max_sharpe_port = results_frame.iloc[results_frame['sharpe'].idxmax()]
#locate positon of portfolio with minimum standard deviation
min_vol_port = results_frame.iloc[results_frame['stdev'].idxmin()]

#create scatter plot coloured by Sharpe Ratio
plt.scatter(results_frame.stdev,results_frame.ret,c=results_frame.sharpe,cmap='RdYlBu')
plt.xlabel('Volatility')
plt.ylabel('Returns')
plt.colorbar()
#plot red star to highlight position of portfolio with highest Sharpe Ratio
plt.scatter(max_sharpe_port,max_sharpe_port,marker=(5,1,0),color='r',s=1000)
#plot green star to highlight position of minimum variance portfolio
plt.scatter(min_vol_port,min_vol_port,marker=(5,1,0),color='g',s=1000)

August 28, 2017 - 9:31 am

Hi Sam,

Sorry for the delay in replying, I’ve had a busy week! On the face of it, what you’re asking for may seem simple, but it is in fact mathematically very difficult (if not impossible) to get a truly random draw of numbers in a range including negative numbers, that sums to 1.

I have checked all over maths.stackexchange.com and stackoverflow.com and have yet to find a solution that satisfies all our requirements at the same time.

I will continue to have a play around but I think the we may have to think of another way to do this rather than brute forcing our way through a set of random portfolio weights. There is a convex optimisation package available for Python (www.cvxopt.org) which may be of use here, but I’ve never been able to install it properly. That would allow us to find the true optimal portfolio and MAY allow negative weights and also maximum/minimum size constraints.

Sorry that the problem isn’t as simple as it first sounds. I’ll let you know if and when I find anything useful.

Also please check out the comments section – VERY IMPORTANT POINT – is that there is a part of the code which rearranges our list of stocks into alphabetical order without being explicit about it – so please make sure that you enter your list of stocks to the code in alphabetical order so that the correct weight is applied to the correct stock during the iterative process of fulling the numpy array.

August 28, 2017 - 5:29 pm

Hello there again @S666. Thank you for the response as always. Ok I figured that may be the case and will dig into it on my end as well and keep you posted. I will also take a look at your suggested optimization package.

I have a qick question on what you meant at the end where you say very important point. Which part of the code are you specifically referring to when you say it “rearranges our list of stocks into alphabetical order without being explicit about it – so please make sure that you enter your list of stocks to the code in alphabetical order so that the correct weight is applied to the correct stock during the iterative process of fulling the numpy array.” I will be sure to enter the list of stocks in alphabetical order going forward to comply to this, however just curious on which line of code does it? And I assume you mean on my end, when i create:

stocks = [ stock1 , stock 2, etc. ], that I need to enter the stocks in alphabetical order in this frame?

August 28, 2017 - 7:04 pm

Yes this is correct, you need to enter your “stocks = [ stock1 , stock 2, etc. ]” in alphabetical order.

The line that places the stocks in alphabetical order without notifying you is:

```#convert daily stock prices into daily returns
returns = data.pct_change()
```
August 28, 2017 - 6:45 pm

@s666

One more follow up please to the message I just sent. How about if we alter or add a line of code to that we take out the restriction that the portfolio must add up to 100%? In other words it can be net – or net +, but by doing so we add one constaint:

1. That the portfolio cannot be net long or net short more than lets say +20% or -20% in any individual stock?

Wouldn’t that be possible now since we are taking out the 100% requirement for the sum? It makes feasible sense kind of too since many funds are net leveraged – or + on their book……….And not necessarily having the book sum = 100% as a restriction…….

August 28, 2017 - 6:59 pm

Hi Sam,

Yes when you take away the “sum to 1” constraint then the problem becomes extremely simple – you can just create the weights with the following code:

you can just use:

```weights = []

for i in range(20):
x = random.uniform(-200,200) / 1000
weights.append(x)
```

#select random weights for portfolio holdings
weights = np.array(np.random.random(4))
#rebalance weights to sum to 1
weights /= np.sum(weights)

The 20 in “range(20)” should be changed to reflect the actual number of stocks you have in your portfolio, and the 200 and -200 in “x = random.uniform(-200,200) / 1000” can be changed if you want to change the restrictions from +/- 20% to something else. e.g. 300 and -300 would obviously get you 30% and -30% restriction.

Hope that helps.

August 28, 2017 - 7:39 pm

Hmm ok. So Am I changing the

for i in range(num_portfolios):
# select random weights for portfolio holdings
weights = np.asarray(randConstrained(17, 1, -1, 1))

to:

weights = []

for i in range(20):
x = random.uniform(-200,200) / 1000
weights.append(x)

A little confused on what to take out of my existing code and how to add these few lines in. I have pasted the code you helped me with below to hopefully help you in helping me replace it. Thanks as always you are awesome!

******************************************************************
Current code:

******************************************************************

from random import uniform as rand

def randConstrained(n, m, low, high):
tot = m
if not low <= 0 <= high:
raise ValueError("Cannot guarantee a solution when the input does not allow for 0s")
for _ in range(n – 1):
answer.append(low + rand(0, tot) * (high – low))

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

stocks = ['EEM','GDX','IBB','IEF','IEI','SHY','TLT', 'XLE','XLF','XLI','XLK','XLP','XLRE','XLU','XLV','XLY',]

# download daily price data for each of the stocks in the portfolio

data['CASH'] = 1.0

stocks.append('CASH')

# convert daily stock prices into daily returns
returns = data.pct_change()

# calculate mean daily return and covariance of daily returns
mean_daily_returns = returns.mean()
cov_matrix = returns.cov()

# set number of runs of random portfolio weights
num_portfolios = 25000

# set up array to hold results
# We have increased the size of the array to hold the weight values for each stock (keep this at 4 constant irregardless of stocks in basket)
results = np.zeros((4 + len(stocks) – 1, num_portfolios))

for i in range(num_portfolios):
# select random weights for portfolio holdings
weights = np.asarray(randConstrained(17, 1, -1, 1))

# calculate portfolio return and volatility
portfolio_return = np.sum(mean_daily_returns * weights) * 252
portfolio_std_dev = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights))) * np.sqrt(252)

# store results in results array
results[0, i] = portfolio_return
results[1, i] = portfolio_std_dev
# store Sharpe Ratio (return / volatility) – risk free rate element excluded for simplicity
results[2, i] = results[0, i] / results[1, i]
# iterate through the weight vector and add data to results array
for j in range(len(weights)):
results[j + 3, i] = weights[j]

# convert results array to Pandas DataFrame
results_frame = pd.DataFrame(results.T,
columns=['ret', 'stdev', 'sharpe', stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks])

# locate position of portfolio with highest Sharpe Ratio
max_sharpe_port = results_frame.iloc[results_frame['sharpe'].idxmax()]
# locate positon of portfolio with minimum standard deviation
min_vol_port = results_frame.iloc[results_frame['stdev'].idxmin()]

# create scatter plot coloured by Sharpe Ratio
plt.scatter(results_frame.stdev, results_frame.ret, c=results_frame.sharpe, cmap='RdYlBu')
plt.xlabel('Volatility')
plt.ylabel('Returns')
plt.colorbar()
# plot red star to highlight position of portfolio with highest Sharpe Ratio
plt.scatter(max_sharpe_port, max_sharpe_port, marker=(5, 1, 0), color='r', s=1000)
# plot green star to highlight position of minimum variance portfolio
plt.scatter(min_vol_port, min_vol_port, marker=(5, 1, 0), color='g', s=1000)

# print portfolio to maximize sharpe ratio
print('**********Portfolio to Maximize Sharpe Ratio**********')
print(max_sharpe_port)

print('_____________________________________________________________________________________________________________')

# print portfolio to minimize portfolio variance
print('**********Portfolio to Minimize Volatility**********')
print(min_vol_port)

plt.show(plt.scatter)
plt.show(plt.colorbar())

August 28, 2017 - 7:47 pm

No worries, happy to try to help…Ok change the line:

```# select random weights for portfolio holdings
weights = np.asarray(randConstrained(17, 1, -1, 1))
```

to the code given above, i.e.:

weights = []

for i in range(17):
x = random.uniform(-200,200) / 1000
weights.append(x)

That will give you 17 random weights constrained to be within +/- 20% range.

August 28, 2017 - 8:41 pm

Hmm. something is odd. When I do that, I get an error:

File “C:/Users/Sam/PycharmProjects/Test/.ipynb_checkpoints/SVCAPITAL PORTFOLIO OPTIMIZATION AND ALLOCATION MODEL/c.py”, line 76, in
x = random.uniform(-200, 200) / 1000
NameError: name ‘random’ is not defined

I have pasted my entire code (including thew new change below) to better help you see maybe what is going on? Also, I have 16 stocks here not 17, so that should be for i in range(16) instead of for i in range(17) I believe?

********************

CODE

*********************

# LONG SHORT PORTFOLIO OPTIMIZATION
# Code provided allows any of the stocks to be shorted, along with the possibility to short the cash element too – that is borrow
# more cash to invest in the stocks – the only constraint is that the entire portfolio must sum to 1.0

from random import uniform as rand

def randConstrained(n, m, low, high):
tot = m
if not low <= 0 <= high:
raise ValueError("Cannot guarantee a solution when the input does not allow for 0s")
for _ in range(n – 1):
answer.append(low + rand(0, tot) * (high – low))

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

stocks = ['EEM','GDX','IBB','IEF','IEI','SHY','TLT', 'XLE','XLF','XLI','XLK','XLP','XLRE','XLU','XLV','XLY',]

# download daily price data for each of the stocks in the portfolio

data['CASH'] = 1.0

stocks.append('CASH')

# convert daily stock prices into daily returns
returns = data.pct_change()

# calculate mean daily return and covariance of daily returns
mean_daily_returns = returns.mean()
cov_matrix = returns.cov()

# set number of runs of random portfolio weights
num_portfolios = 25000

# set up array to hold results
# We have increased the size of the array to hold the weight values for each stock (keep this at 4 constant irregardless of stocks in basket)
results = np.zeros((4 + len(stocks) – 1, num_portfolios))

for i in range(num_portfolios):
# select random weights for portfolio holdings
weights = []
for i in range(17):
x = random.uniform(-200, 200) / 1000
weights.append(x)

# calculate portfolio return and volatility
portfolio_return = np.sum(mean_daily_returns * weights) * 252
portfolio_std_dev = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights))) * np.sqrt(252)

# store results in results array
results[0, i] = portfolio_return
results[1, i] = portfolio_std_dev
# store Sharpe Ratio (return / volatility) – risk free rate element excluded for simplicity
results[2, i] = results[0, i] / results[1, i]
# iterate through the weight vector and add data to results array
for j in range(len(weights)):
results[j + 3, i] = weights[j]

# convert results array to Pandas DataFrame
results_frame = pd.DataFrame(results.T,
columns=['ret', 'stdev', 'sharpe', stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks])

# locate position of portfolio with highest Sharpe Ratio
max_sharpe_port = results_frame.iloc[results_frame['sharpe'].idxmax()]
# locate positon of portfolio with minimum standard deviation
min_vol_port = results_frame.iloc[results_frame['stdev'].idxmin()]

# create scatter plot coloured by Sharpe Ratio
plt.scatter(results_frame.stdev, results_frame.ret, c=results_frame.sharpe, cmap='RdYlBu')
plt.xlabel('Volatility')
plt.ylabel('Returns')
plt.colorbar()
# plot red star to highlight position of portfolio with highest Sharpe Ratio
plt.scatter(max_sharpe_port, max_sharpe_port, marker=(5, 1, 0), color='r', s=1000)
# plot green star to highlight position of minimum variance portfolio
plt.scatter(min_vol_port, min_vol_port, marker=(5, 1, 0), color='g', s=1000)

# print portfolio to maximize sharpe ratio
print('**********Portfolio to Maximize Sharpe Ratio**********')
print(max_sharpe_port)

print('_____________________________________________________________________________________________________________')

# print portfolio to minimize portfolio variance
print('**********Portfolio to Minimize Volatility**********')
print(min_vol_port)

plt.show(plt.scatter)
plt.show(plt.colorbar())

August 28, 2017 - 8:44 pm

Should be an easy fix, just add “import random” at the top of your script. And yup – change the number to 16 if you have 16 stocks.

August 28, 2017 - 8:51 pm

Made the change and imported random.

However, now getting error:

portfolio_std_dev = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights))) * np.sqrt(252)
AttributeError: ‘list’ object has no attribute ‘T’

Maybe this is related somehow to the console telling me that in the code (weights.append(x)) , it is saying ‘x can not be defined’. Could it be related to that maybe? Full code pasted below:

***********************
CODE
***********************

# LONG SHORT PORTFOLIO OPTIMIZATION
# Code provided allows any of the stocks to be shorted, along with the possibility to short the cash element too – that is borrow
# more cash to invest in the stocks – the only constraint is that the entire portfolio must sum to 1.0

from random import uniform as rand

def randConstrained(n, m, low, high):
tot = m
if not low <= 0 <= high:
raise ValueError("Cannot guarantee a solution when the input does not allow for 0s")
for _ in range(n – 1):
answer.append(low + rand(0, tot) * (high – low))

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import random

# list of ETFs in portfolio
# XLRE – REITs
# XLF – Financial Select Sector SPDR ETF
# XLV – Health Care Select Sector SPDR RTF
# XLK – Technology sector SPDR ETF
# XLE – Energy Select SPDR Fund
# XLY- Consumer Discretionary Select Sector SPDR Fund
# XLI – Industrial Select Sector SPDR Fund
# XLP – Consumer Staples Select Sector SPDR Fund
# XLU – Utilities Select Sector SPDR ETF
# IBB – iShares Nasdaq Biotechnology ETF (GOES BACK TO 2001)
# GDX – VanEck Vectors Gold Miners ETF (GOES BACK TO 2006)
# EEM – Emerging Markets ETF ( GOES BACK t0 2003)
# SHY – iShares 1-3 Year Treasury Bond ETF ( GOES BACK TO 2003)
# IEI – iShares 3-7 Year Treasury Bond ETF ( GOES BACK TO FEB 2007)
# IEF – iShares 7-10 Year Treasury Bond ETF (GOES BACK TO 2002)
# TLT – iShares 20+ Year Treasury Bond ETF (GOES BACK TO 2002)

stocks = ['EEM','GDX','IBB','IEF','IEI','SHY','TLT', 'XLE','XLF','XLI','XLK','XLP','XLRE','XLU','XLV','XLY',]

# download daily price data for each of the stocks in the portfolio

data['CASH'] = 1.0

stocks.append('CASH')

# convert daily stock prices into daily returns
returns = data.pct_change()

# calculate mean daily return and covariance of daily returns
mean_daily_returns = returns.mean()
cov_matrix = returns.cov()

# set number of runs of random portfolio weights
num_portfolios = 25000

# set up array to hold results
# We have increased the size of the array to hold the weight values for each stock (keep this at 4 constant irregardless of stocks in basket)
results = np.zeros((4 + len(stocks) – 1, num_portfolios))

for i in range(num_portfolios):
# select random weights for portfolio holdings
weights = []
for i in range(16):
x = random.uniform(-200, 200) / 1000
weights.append(x)

# calculate portfolio return and volatility
portfolio_return = np.sum(mean_daily_returns * weights) * 252
portfolio_std_dev = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights))) * np.sqrt(252)

# store results in results array
results[0, i] = portfolio_return
results[1, i] = portfolio_std_dev
# store Sharpe Ratio (return / volatility) – risk free rate element excluded for simplicity
results[2, i] = results[0, i] / results[1, i]
# iterate through the weight vector and add data to results array
for j in range(len(weights)):
results[j + 3, i] = weights[j]

# convert results array to Pandas DataFrame
results_frame = pd.DataFrame(results.T,
columns=['ret', 'stdev', 'sharpe', stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks])

# locate position of portfolio with highest Sharpe Ratio
max_sharpe_port = results_frame.iloc[results_frame['sharpe'].idxmax()]
# locate positon of portfolio with minimum standard deviation
min_vol_port = results_frame.iloc[results_frame['stdev'].idxmin()]

# create scatter plot coloured by Sharpe Ratio
plt.scatter(results_frame.stdev, results_frame.ret, c=results_frame.sharpe, cmap='RdYlBu')
plt.xlabel('Volatility')
plt.ylabel('Returns')
plt.colorbar()
# plot red star to highlight position of portfolio with highest Sharpe Ratio
plt.scatter(max_sharpe_port, max_sharpe_port, marker=(5, 1, 0), color='r', s=1000)
# plot green star to highlight position of minimum variance portfolio
plt.scatter(min_vol_port, min_vol_port, marker=(5, 1, 0), color='g', s=1000)

# print portfolio to maximize sharpe ratio
print('**********Portfolio to Maximize Sharpe Ratio**********')
print(max_sharpe_port)

print('_____________________________________________________________________________________________________________')

# print portfolio to minimize portfolio variance
print('**********Portfolio to Minimize Volatility**********')
print(min_vol_port)

plt.show(plt.scatter)
plt.show(plt.colorbar())

August 28, 2017 - 9:36 pm

Sorry the “.T” is the numpy “transpose” method and it doesn’t work on lists, it only works on numpy arrays – so we just need to recast the weights vector as a numpy array as follows:

weights = np.asarray(weights)

August 28, 2017 - 10:59 pm

Hmmm odd still getting an error:

portfolio_std_dev = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights))) * np.sqrt(252)
ValueError: shapes (17,17) and (1,) not aligned: 17 (dim 1) != 1 (dim 0)

I have 16 stocks total. Full code I have below that spits this error out:

********************
CODE
******************

# LONG SHORT PORTFOLIO OPTIMIZATION
# Code provided allows any of the stocks to be shorted, along with the possibility to short the cash element too – that is borrow
# more cash to invest in the stocks – the only constraint is that the entire portfolio must sum to 1.0

from random import uniform as rand

def randConstrained(n, m, low, high):
tot = m
if not low <= 0 <= high:
raise ValueError("Cannot guarantee a solution when the input does not allow for 0s")
for _ in range(n – 1):
answer.append(low + rand(0, tot) * (high – low))

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import random

# list of ETFs in portfolio
# XLRE – REITs
# XLF – Financial Select Sector SPDR ETF
# XLV – Health Care Select Sector SPDR RTF
# XLK – Technology sector SPDR ETF
# XLE – Energy Select SPDR Fund
# XLY- Consumer Discretionary Select Sector SPDR Fund
# XLI – Industrial Select Sector SPDR Fund
# XLP – Consumer Staples Select Sector SPDR Fund
# XLU – Utilities Select Sector SPDR ETF
# IBB – iShares Nasdaq Biotechnology ETF (GOES BACK TO 2001)
# GDX – VanEck Vectors Gold Miners ETF (GOES BACK TO 2006)
# EEM – Emerging Markets ETF ( GOES BACK t0 2003)
# SHY – iShares 1-3 Year Treasury Bond ETF ( GOES BACK TO 2003)
# IEI – iShares 3-7 Year Treasury Bond ETF ( GOES BACK TO FEB 2007)
# IEF – iShares 7-10 Year Treasury Bond ETF (GOES BACK TO 2002)
# TLT – iShares 20+ Year Treasury Bond ETF (GOES BACK TO 2002)

stocks = ['EEM','GDX','IBB','IEF','IEI','SHY','TLT', 'XLE','XLF','XLI','XLK','XLP','XLRE','XLU','XLV','XLY',]

# download daily price data for each of the stocks in the portfolio

data['CASH'] = 1.0

stocks.append('CASH')

# convert daily stock prices into daily returns
returns = data.pct_change()

# calculate mean daily return and covariance of daily returns
mean_daily_returns = returns.mean()
cov_matrix = returns.cov()

# set number of runs of random portfolio weights
num_portfolios = 25000

# set up array to hold results
# We have increased the size of the array to hold the weight values for each stock (keep this at 4 constant irregardless of stocks in basket)
results = np.zeros((4 + len(stocks) – 1, num_portfolios))

for i in range(num_portfolios):
# select random weights for portfolio holdings
weights = []
for i in range(16):
x = random.uniform(-200, 200) / 1000
weights.append(x)

weights = np.asarray(weights)

# calculate portfolio return and volatility
portfolio_return = np.sum(mean_daily_returns * weights) * 252
portfolio_std_dev = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights))) * np.sqrt(252)

# store results in results array
results[0, i] = portfolio_return
results[1, i] = portfolio_std_dev
# store Sharpe Ratio (return / volatility) – risk free rate element excluded for simplicity
results[2, i] = results[0, i] / results[1, i]
# iterate through the weight vector and add data to results array
for j in range(len(weights)):
results[j + 3, i] = weights[j]

# convert results array to Pandas DataFrame
results_frame = pd.DataFrame(results.T,
columns=['ret', 'stdev', 'sharpe', stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks])

# locate position of portfolio with highest Sharpe Ratio
max_sharpe_port = results_frame.iloc[results_frame['sharpe'].idxmax()]
# locate positon of portfolio with minimum standard deviation
min_vol_port = results_frame.iloc[results_frame['stdev'].idxmin()]

# create scatter plot coloured by Sharpe Ratio
plt.scatter(results_frame.stdev, results_frame.ret, c=results_frame.sharpe, cmap='RdYlBu')
plt.xlabel('Volatility')
plt.ylabel('Returns')
plt.colorbar()
# plot red star to highlight position of portfolio with highest Sharpe Ratio
plt.scatter(max_sharpe_port, max_sharpe_port, marker=(5, 1, 0), color='r', s=1000)
# plot green star to highlight position of minimum variance portfolio
plt.scatter(min_vol_port, min_vol_port, marker=(5, 1, 0), color='g', s=1000)

# print portfolio to maximize sharpe ratio
print('**********Portfolio to Maximize Sharpe Ratio**********')
print(max_sharpe_port)

print('_____________________________________________________________________________________________________________')

# print portfolio to minimize portfolio variance
print('**********Portfolio to Minimize Volatility**********')
print(min_vol_port)

plt.show(plt.scatter)
plt.show(plt.colorbar())

August 29, 2017 - 4:34 pm

@s666 I think the issue might have to due with ‘Warnings’ I am getting due to the for loop being nested in another for loop. Specifically, it is saying

‘Redeclared i above without any usage’ in the statement ‘for i in range (16), and Name ‘x’ can not be defined in the statement ‘weights.append(x)

Thus, when I run the code we end up with a compiler error of:

portfolio_std_dev = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights))) * np.sqrt(252)
ValueError: shapes (17,17) and (1,) not aligned: 17 (dim 1) != 1 (dim 0)

Full Code Pasted Below:

********CODE********************
***********************************

from random import uniform as rand

def randConstrained(n, m, low, high):
tot = m
if not low <= 0 <= high:
raise ValueError("Cannot guarantee a solution when the input does not allow for 0s")
for _ in range(n – 1):
answer.append(low + rand(0, tot) * (high – low))

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import random

stocks = ['EEM','GDX','IBB','IEF','IEI','SHY','TLT', 'XLE','XLF','XLI','XLK','XLP','XLRE','XLU','XLV','XLY',]

# download daily price data for each of the stocks in the portfolio

data['CASH'] = 1.0

stocks.append('CASH')

# convert daily stock prices into daily returns
returns = data.pct_change()

# calculate mean daily return and covariance of daily returns
mean_daily_returns = returns.mean()
cov_matrix = returns.cov()

# set number of runs of random portfolio weights
num_portfolios = 25000

# set up array to hold results
# We have increased the size of the array to hold the weight values for each stock (keep this at 4 constant irregardless of stocks in basket)
results = np.zeros((4 + len(stocks) – 1, num_portfolios))

for i in range(num_portfolios):
# select random weights for portfolio holdings
weights = []
for i in range(16):
x = random.uniform(-200, 200) / 1000
weights.append(x)

weights = np.asarray(weights)

# calculate portfolio return and volatility
portfolio_return = np.sum(mean_daily_returns * weights) * 252
portfolio_std_dev = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights))) * np.sqrt(252)

# store results in results array
results[0, i] = portfolio_return
results[1, i] = portfolio_std_dev
# store Sharpe Ratio (return / volatility) – risk free rate element excluded for simplicity
results[2, i] = results[0, i] / results[1, i]
# iterate through the weight vector and add data to results array
for j in range(len(weights)):
results[j + 3, i] = weights[j]

# convert results array to Pandas DataFrame
results_frame = pd.DataFrame(results.T,
columns=['ret', 'stdev', 'sharpe', stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks])

# locate position of portfolio with highest Sharpe Ratio
max_sharpe_port = results_frame.iloc[results_frame['sharpe'].idxmax()]
# locate positon of portfolio with minimum standard deviation
min_vol_port = results_frame.iloc[results_frame['stdev'].idxmin()]

# create scatter plot coloured by Sharpe Ratio
plt.scatter(results_frame.stdev, results_frame.ret, c=results_frame.sharpe, cmap='RdYlBu')
plt.xlabel('Volatility')
plt.ylabel('Returns')
plt.colorbar()
# plot red star to highlight position of portfolio with highest Sharpe Ratio
plt.scatter(max_sharpe_port, max_sharpe_port, marker=(5, 1, 0), color='r', s=1000)
# plot green star to highlight position of minimum variance portfolio
plt.scatter(min_vol_port, min_vol_port, marker=(5, 1, 0), color='g', s=1000)

# print portfolio to maximize sharpe ratio
print('**********Portfolio to Maximize Sharpe Ratio**********')
print(max_sharpe_port)

print('_____________________________________________________________________________________________________________')

# print portfolio to minimize portfolio variance
print('**********Portfolio to Minimize Volatility**********')
print(min_vol_port)

plt.show(plt.scatter)
plt.show(plt.colorbar())

August 29, 2017 - 7:28 pm

Hi Sam, sorry I have just had a face-palm moment…I’ve been totally over complicating this!! All you need to do it set your weights using the numpy random.uniform distiribution setting your low and high to -0.3 and 0.3 and you “sizee” to the number of stocks you have in your list.

So try this:

weights = np.array(np.random.uniform(-0.3,0.3,4))

weights = []
for i in range(16):
x = random.uniform(-200, 200) / 1000
weights.append(x)
weights = np.asarray(weights)

August 29, 2017 - 7:30 pm

Here is a complete working example:

```import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import random
%matplotlib inline

#list of stocks in portfolio
stocks = ['AAPL','AMZN','MSFT','YHOO']

#download daily price data for each of the stocks in the portfolio

#convert daily stock prices into daily returns
returns = data.pct_change()

#calculate mean daily return and covariance of daily returns
mean_daily_returns = returns.mean()
cov_matrix = returns.cov()

#set number of runs of random portfolio weights
num_portfolios = 2500

#set up array to hold results
#We have increased the size of the array to hold the weight values for each stock
results = np.zeros((4+len(stocks)-1,num_portfolios))

for i in range(num_portfolios):
#select random weights for portfolio holdings
weights = np.array(np.random.uniform(-0.3,0.3,4))

#calculate portfolio return and volatility
portfolio_return = np.sum(mean_daily_returns * weights) * 252
portfolio_std_dev = np.sqrt(np.dot(weights.T,np.dot(cov_matrix, weights))) * np.sqrt(252)

#store results in results array
results[0,i] = portfolio_return
results[1,i] = portfolio_std_dev
#store Sharpe Ratio (return / volatility) - risk free rate element excluded for simplicity
results[2,i] = results[0,i] / results[1,i]
#iterate through the weight vector and add data to results array
for j in range(len(weights)):
results[j+3,i] = weights[j]

#convert results array to Pandas DataFrame
results_frame = pd.DataFrame(results.T,columns=['ret','stdev','sharpe',stocks,stocks,stocks,stocks])

#locate position of portfolio with highest Sharpe Ratio
max_sharpe_port = results_frame.iloc[results_frame['sharpe'].idxmax()]
#locate positon of portfolio with minimum standard deviation
min_vol_port = results_frame.iloc[results_frame['stdev'].idxmin()]

#create scatter plot coloured by Sharpe Ratio
plt.scatter(results_frame.stdev,results_frame.ret,c=results_frame.sharpe,cmap='RdYlBu')
plt.xlabel('Volatility')
plt.ylabel('Returns')
plt.colorbar()
#plot red star to highlight position of portfolio with highest Sharpe Ratio
plt.scatter(max_sharpe_port,max_sharpe_port,marker=(5,1,0),color='r',s=1000)
#plot green star to highlight position of minimum variance portfolio
plt.scatter(min_vol_port,min_vol_port,marker=(5,1,0),color='g',s=1000)
```
August 29, 2017 - 7:41 pm

@s66 Awesome, yes that worked. Thank you for the help as always! =D

Best,

Sam

August 30, 2017 - 12:16 am

@s666 Had one random follow up question for you. I am working off that code you recently submitted and it works great. However, I am trying to run it on a basket of 100 stocks instead of 4. Now, the code works just fine and spits out the optimization in the console of all 100 stocks. However, given the data set is so large, it cuts off at the letter ‘C’ and then forwards to the letter ‘N’ in the names of stocks. Thus, not displaying all of the results for me to see in the console. I assume its because the data set is so big. Is there any simple command or line of code I should add that will show all of my results in the console, intstead of the current where it cuts some out? Essentially, I want to see the entire frame. I have posted below what the output of the code is where it is cutting off, you will see from letter C and then forwards to letter N without showing me the middle chunk of the optimization!

Let me know if you have a chance please. Thanks! I am on Python 3.

**********Portfolio to Maximize Sharpe Ratio**********
ret 1.191248
stdev 0.305746
sharpe 3.896200
AAL -0.055412
AAPL 0.047818
AKAM 0.026778
ALXN -0.061917
AMAT 0.074762
AMGN -0.195493
AMZN -0.180999
ATVI 0.027458
AVGO 0.191906
BIDU 0.019338
BIIB -0.024391
BMRN -0.152147
CA 0.010816
CELG -0.007887
CERN -0.149978
CHKP 0.017568
CHTR 0.119874
CMCSA -0.051849
COST 0.029636
CSCO 0.031001
CSX 0.180601
CTAS 0.142230
CTRP -0.149408

……………………………. (THIS IS WHERE ITS CUTTING OFF AND LEAVING OUT MANY NAMES)

NVDA 0.197222
ORLY -0.067248
PAYX -0.180576
PCAR -0.065061
PCLN 0.061630
PYPL -0.000687
QCOM 0.071748
QVCA 0.004147
REGN 0.061266
ROST -0.167468
SBUX 0.052665
SHPG -0.174563
SIRI 0.146059
STX -0.001512
SWKS 0.061795
SYMC 0.049847
TMUS 0.160700
TSCO 0.033063
TSLA -0.014295
TXN 0.075495
ULTA 0.016199
VIAB -0.091411
VOD 0.194879
VRSK 0.180818
VRTX 0.004934
WBA 0.018271
WDC -0.071544
WYNN -0.028889
XLNX -0.125301
XRAY -0.021831
Name: 1318, Length: 109, dtype: float64

August 30, 2017 - 12:25 pm

Do you actually NEED it to display in your console or do you just need to access the data? if it’s the latter I suggest using the “to_csv” fucntionality of DataFrames to create a csv file output with the data.

Its as simple as running:

max_sharpe_port.to_csv(“Max Sharpe Portfolio.csv”)

This will spit out a csv file called “Max Sharpe Portfolio.csv” to your working directory which gives you access to the full DataFrame data.

August 30, 2017 - 4:09 pm

Hi @s666,

Yes having it saved to csv is just fine. I tried the to_csv function in the past many times and it worked just fine. However, for this specific code, it seems to not be doing anything for some odd reason. Specifically, at the end of my code, I am adding:

max_sharpe_port.to_csv(‘Max Sharpe Portfolio.csv’) (had to change it to single qutations as opposed to double ones that you showed because it was giving syntax error and was not even able to run with double in python 3)

When I run the code with that line at the end, the code seems to run fine and spits out output in the console. However when I check my directory, there is no csv file that is created with that data frame! Thus, making it seem like that line of code is just being ignored. Could it be that its having a hard time saving the output of max_sharpe_port to a csv because that output of that file is two columns ( the ticker and then the % of weight allocation) and thus we need to adjust to it?

Please let me know. I have attached full code below. As you will see the only line added at the end is

max_sharpe_port.to_csv(‘Max Sharpe Portfolio.csv’)

I checked my directory through my folder as well and the CSV simply is never created! Thanks as always for your help.

*******************

CODE

****************

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import random
import csv

# list of stocks in portfolio
stocks = [‘AXP’,’AAPL’,’BAC’,’CAT’,’CSCO’,’CVX’,’DD’, ‘DIS’,’GE’,’GS’,’HD’,’IBM’,’INTC’,’JNJ’,’KO’,’JPM’,’MCD’,’MMM’,’MRK’,’MSFT’,’NKE’,’PFE’,’PG’,’TRV’,’UNH’,’UTX’,’V’,’VZ’,’WMT’,’XOM’]

# download daily price data for each of the stocks in the portfolio

# convert daily stock prices into daily returns
returns = data.pct_change()

# calculate mean daily return and covariance of daily returns
mean_daily_returns = returns.mean()
cov_matrix = returns.cov()

# set number of runs of random portfolio weights
num_portfolios = 2500

# set up array to hold results
# We have increased the size of the array to hold the weight values for each stock
results = np.zeros((4 + len(stocks) – 1, num_portfolios))

for i in range(num_portfolios):
# select random weights for portfolio holdings
weights = np.array(np.random.uniform(-0.2, 0.2, 30)) #number of stocks is 3rd variable, first two are min and max requirements

# calculate portfolio return and volatility
portfolio_return = np.sum(mean_daily_returns * weights) * 252
portfolio_std_dev = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights))) * np.sqrt(252)

# store results in results array
results[0, i] = portfolio_return
results[1, i] = portfolio_std_dev
# store Sharpe Ratio (return / volatility) – risk free rate element excluded for simplicity
results[2, i] = results[0, i] / results[1, i]
# iterate through the weight vector and add data to results array
for j in range(len(weights)):
results[j + 3, i] = weights[j]

# convert results array to Pandas DataFrame
results_frame = pd.DataFrame(results.T, columns=[‘ret’, ‘stdev’, ‘sharpe’, stocks, stocks, stocks, stocks,stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks,stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks, stocks])
# SINCE WE START AT 0, MAKE SURE NUMBER OF STOCKS IN INDEX HERE IS ALWAYS 1 LESS THAN OUR TOTAL STOCKS BEING OPTIMIZED
# locate position of portfolio with highest Sharpe Ratio
max_sharpe_port = results_frame.iloc[results_frame[‘sharpe’].idxmax()]
# locate positon of portfolio with minimum standard deviation
min_vol_port = results_frame.iloc[results_frame[‘stdev’].idxmin()]

# create scatter plot coloured by Sharpe Ratio
plt.scatter(results_frame.stdev, results_frame.ret, c=results_frame.sharpe, cmap=’RdYlBu’)
plt.xlabel(‘Volatility’)
plt.ylabel(‘Returns’)
plt.colorbar()
# plot red star to highlight position of portfolio with highest Sharpe Ratio
plt.scatter(max_sharpe_port, max_sharpe_port, marker=(5, 1, 0), color=’r’, s=1000)
# plot green star to highlight position of minimum variance portfolio
plt.scatter(min_vol_port, min_vol_port, marker=(5, 1, 0), color=’g’, s=1000)

# print portfolio to maximize sharpe ratio
print(‘**********Portfolio to Maximize Sharpe Ratio**********’)
print(max_sharpe_port)

print(‘_____________________________________________________________________________________________________________’)

# print portfolio to minimize portfolio variance
print(‘**********Portfolio to Minimize Volatility**********’)
print(min_vol_port)

plt.show(plt.scatter)
plt.show(plt.colorbar())

max_sharpe_port.to_csv(‘Max Sharpe Portfolio.csv’)

August 31, 2017 - 8:21 am

It’s very strange that it didn’t work with double quotes, that should be just fine.

Perhaps to solve your problem you could set the full file path in the name of the csv file, so something like:

max_sharpe_port.to_csv(‘C:/XXX/XXX/XXX/Max Sharpe Portfolio.csv’) – obviously with the “Xs” representing an actual file path.

It sounds like your “working directory” is somewhere other than your python script in this case.

Check your current working directory by running:

import os
cwd = os.getcwd()

That will be where the first csv file was being saved.

September 1, 2017 - 11:28 pm

@s666 ok something is definitely fishy and going on here!

I have pasted in the entire path file as you suggested and the code runs fine and spits out data into the console, and with no errors, no CSV is being created!

Specifically, my line of code is:

max_sharpe_port.to_csv(‘C:/Users/Sam/PycharmProjects/Test/.ipynb_checkpoints/PORTFOLIO OPTIMIZATION/Max Sharpe Portfolio.csv’)

Code runs fine and data comes into console. But when I check that directory, no spreadsheet was made! Any idea what could be going on here?

Best,

Sam

September 7, 2017 - 6:38 pm

Hi Sam, I think it is because you are saving it in what is by default a “hidden folder” – you have the “.ipynb_checkpoints” in your path – this folder wont be visible unless you set your windows explorer to show hidden folders and files.

Try “‘C:/Users/Sam/PycharmProjects/Test/PORTFOLIO OPTIMIZATION/Max Sharpe Portfolio.csv’ instead – that should work.

September 4, 2017 - 10:47 am

Hi, I am quiet a beginner in finances and i didn’t understand why when you’re calculating the anualised returns you calculate the mean daily returns and multiply it by the days and not just calculate the roi with the dates of the beginning and the end of that year

September 7, 2017 - 6:21 pm

Hi James, there is no particularly important reason why I chose to do it this way – I could indeed have calculated an average annual growth rate and annual volatility by using the start and end dates and annualising it in the correct fashion.

The results would be similar with either method.

September 7, 2017 - 1:53 am

Hi,

Just wondering in the dataFrame columns, is there way to put some for loop so that when you add a stock in portfolio you dont need to manually change the code at dataFrame Columns as stocks[x]…

September 7, 2017 - 6:28 pm

Hi Jay, there sure is – you can just use list comprehension (which is a great thing to learn when learning Python, it’s so useful – I recommend Googling it):

Just use:

results_frame = pd.DataFrame(results.T,columns = [‘ret’,’stdev’,’sharpe’] + [stock for stock in stocks])

Remember that the initial stock list should be in alphabetical order for the results to be correct!!

December 1, 2017 - 2:59 pm

Hi, I’ve noticed you use used np.sum. I, however, uses np.dot. Although both gives the same answer, wouldn’t np.dot be more efficient? I’m just asking as I’m still learning.

December 1, 2017 - 3:02 pm

sI’ve noticed you use np.sum. Although both gives the same answer, wouldn’t np.dot be more efficient in this case?

January 3, 2018 - 11:35 am

Hi there, congratulations for the post !
Do you know how can we change the code to instead of retrieving the allocation for the locations with highest sharpe or minimum variance, set a volatility constraint like [‘Volatility’ = 0.15] for example and find the optimal return and allocation ?
Of course this must be inside the range of optimal allocations…

January 10, 2018 - 12:49 am

Dear S666, thank you very much for the online available class, it is very helpful!
At the “portfolio_return = np.sum(mean_daily_returns * weights) * 252” sentence, are you annualyzing the portfolio return?

January 23, 2018 - 8:03 pm

Dear S666,
I apologize in advance if this is a foolish question. It seems that the code “returns = data.pct_change()” calculates the negative of the returns, because the dataframe has the data orderef with the most recent prices at the top. Shouldn’t the return be the percentage change from day-to-day, with the most recent day’s price as the last row of the dataframe? And the oldest day as the first row?

Below is the code I copied from this post (and the output), which shows a negative annualized return, even though Apple’s stock price has increased over the last year.

###CODE###
import numpy as np
import pandas as pd

stock = [‘AAPL’]

returns = data.pct_change()

mean_return = returns.mean()
return_std = returns.std()

annualized_return = round(mean_return * 252, 2)
annualized_std = round(return_std * np.sqrt(252), 2)

print(‘The annualised mean return of stock {} is {}, and the annualised volatility is {}’.format(stock, annualized_return, annualized_std))

###OUTPUT###
The annualised mean return of stock AAPL is AAPL -0.2
dtype: float64, and the annualised volatility is AAPL 0.25
dtype: float64
AAPL
Date
2017-12-29 NaN
2017-12-28 0.010932
2017-12-27 -0.002806
2017-12-26 -0.000176
2017-12-22 0.026030
2017-12-21 0.000000
2017-12-20 -0.003771
2017-12-19 0.001090
2017-12-18 0.010771
2017-12-15 -0.013887
AAPL
Date
2017-12-29 169.229996
2017-12-28 171.080002
2017-12-27 170.600006
2017-12-26 170.570007
2017-12-22 175.009995
2017-12-21 175.009995
2017-12-20 174.350006
2017-12-19 174.539993
2017-12-18 176.419998
2017-12-15 173.970001

January 28, 2018 - 8:53 am

Hi Donald – thanks for your post. I have just tested the code and it turns out that you are indeed correct – the Pandas Datareader currently pulls down data in reverse chronological order. This must have changed as when I wrote the article the data was pulled down in chronological order. I have tested the data against the start and end dates used in the original post (from 21st Jan 2017) and I now have to reverse the data to get the same results that i achieved before without reversing – so this data order default must have changed!

You can fix this easily by using:

data.sort_index(inplace=True)

Well spotted though, I wasn’t aware of this change!

April 28, 2018 - 3:40 pm

Can you please modify and provide the code of using log returns instead of arithmetic return?

May 12, 2018 - 8:29 pm

Great article. I am working on similar project. I have 32 funds out of which I want to make unique fund combinations of 5 and then proceed with the process as you described in this article.
Do You know if there is a convenient solution in NumPy to make random sample from list?

October 31, 2018 - 12:08 pm

Apologies for the late response – if you are still looking for a good option, take a look at “numpy.random.choice” – https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.choice.html I’m sure you could fit that to your use case. Perhaps generate random integers within a range (the length of your list) and pull out the corresponding stock at that list index.

June 16, 2018 - 7:12 pm

First….thank you for your effort in this.

Second….the web.datareader appears to be unavailable now. Is there any update to the code to make this work again ? OR maybe I’m just doing something wrong ?

Any help would be greatly appreciated.

July 1, 2018 - 9:12 am

Hi there – yes there are definitely issues with the Pandas data-reader at the moment since it moved to version 0.6.0 which can be read about here:

They have depricated Google Finance and Yahoo amongst other – but they have also added some new ones. For example, you can download from “morningstar” now too – so just change “yahoo” to “morningstar” and change “Adj Close” to “Close” and that should work.

October 28, 2018 - 7:49 am

Hi~ I just have a question about the “marker=(5,1,0)”. What do these three numbers mean respectively? Thanks~

October 31, 2018 - 11:51 am

Hi Tony, the numbers in the “marker=(5,1,0)” are as follows: the first number (5) is the number of points you want on your marker shape – the second number refers to the marker shape/style – and the last number refers to the rotation would you would like to apply to the markers.

For example “marker=(3,1,0)” gives you a non-rotated solid 3 pointed triangle, whereas “marker=(4,2,0)” gives you what looks like a “+” sign and “marker=(4,2,45)” gives you a “+” rotated at 45 degrees to look like an “x”.

Have a play around with the numbers and you will see the shapes change.

You can see the full description here – https://matplotlib.org/api/markers_api.html

Scroll about halfway down the page and you will see the explanation for “(numsides, style, angle)”.

October 28, 2018 - 5:19 pm

@S666
Hey there first of all a fantastic discussion is going on here,
now i have some question you might be able to answer:

1. is the sorting Problem of the input and output solved by now?
“`(PLEASE NOTE THESE RESULTS ARE INCORRECT AS THEY WERE RUN PREVIOUS TO CORRECTLY SETTING MY INITIAL STOCK LIST IN ALPHABETICAL ORDER – I SHALL TRY TO FIX WHEN I GET A MOMENT)“`

October 31, 2018 - 9:27 am

Hi Sam, the output shown above is incorrect for the particular stocks I have used, but as long as you enter your initial stock list in alphabetical order, your results will be correct. Hope that answers your question.

January 10, 2019 - 7:48 am

A tip: you can avoid the alphabetical order requirement by doing this:

stocks = sorted([‘AAPL’,’AMZN’,’MSFT’,’YHOO’])

February 23, 2019 - 6:11 am

results[j+3,i] = weights[j]
IndexError: index 3 is out of bounds for axis 0 with size 3

Not sure what is causing this IndexError

July 5, 2019 - 11:43 am

Having exactly same issue. Not sure how to resolve.. Is there anyone who can help?

July 7, 2019 - 6:51 am

Hi JUYPter and Kamil, with regards to the error you are receiving, are you sure you have correctly set up thr results matrix? It should be :

`results = np.zeros((4+len(stocks)-1,num_portfolios))`
July 9, 2019 - 2:28 pm

Thank you so much for looking into it and leaving your answer, it all makes sense now.

July 10, 2019 - 2:28 am

Not a problem, you’re very welcome.

July 10, 2019 - 6:16 pm

Hi – thank you for this lovely code. However I’m sticky on applying a constraint on the weights of each stock output. I want to apply a cap which says if if the allocated weights from the Monte Carlo simulation exceed 20% redistribute the excess weights evenly across the board. I don’t want the weights of result from the simulation to be greater than 20%

Can someone help me?

July 10, 2019 - 7:04 pm

Hi Zack, do you mean you dont want any single individual stock weight to be more than 20% or that you want the total weight to sum to 20%? I actually have just recently published an updated version of this post which also shows the use of a optimiser function which allows you to set constraints directly.

Take a look at https://pythonforfinance.net/2019/07/02/investment-portfolio-optimisation-with-python-revisited/ and see if that helps at all. If you need any help adapting the code, just comment on that article and I will pick it up there. Thanks!

July 10, 2019 - 8:14 pm

Hi Zack, just multiply the weights you get by 0.2 and that should get you what you want.

Or just change the constraint so that

July 11, 2019 - 11:11 pm

Hi – sorry bit confused by this. to provide more context i have 10 US stocks and the output from the model puts a lot of weight for 2 stocks and very little to the others. so as a constraint i want to say if the weight goes above 20% distrubute the excess evenly across to the other stocks. now i have a while loop code that can do this but i want to incorperate this as a constraint into the monte carlo simulation code.

the below is the output weights i have

40
30
5
3
5
5
4
6
1
1

July 11, 2019 - 11:15 pm

Hi Zack, is there a particular reason for using a Monte Carlo approach or would you be open to using an optimisation function with relevant constraints supplied as arguments? This really is much more easily achieved with an optimiser!

July 11, 2019 - 11:18 pm

If you are adamant you would like to use the MC approach then you are best served by setting the constraints at the point of random weight vector creation rather than trying to “redistribute” allocations after the fact. If you send me a quick message using the contact form on the contact page I am happy to help you via email – it might prove easier that way.

August 15, 2019 - 9:50 am

I’m not sure about the way you calculate annual return. I found this formula Annualized Return=(1+ Return)1/N-1. Can you please explain?

September 5, 2019 - 10:04 pm

Hi sorry for the delay in replying!!

I admit I have been a bit lazy and inconsistent in some of my posts regarding the calculation of annualised returns, when calculating them based on daily mean returns etc. In the code example I have posted I have just multiplied the simple daily arithmetic mean return.

Your fomula above has unfortunately lost any formatting that was applied – but I think its safe to assume the original formula you are referring to is [(1 + Return)^(1 / N)] – 1

If that is indeed the formula you wrote, it would caclulate the geometric annualised return if we assume that the “Return” in the formula represents the total returns experienced over the entire period of “N” years. So if you recorded a 50% return over say 3.5 years, that would equate to an annualised Compound Annual Growth Rate of:

(1.5)^(1/3.5) – 1 = 12.28%

I was starting with a mean daily return and extrapolating to a “annualised” figure. Strictly if I wanted to just mutliply the daily mean as I have done, I should have used daily log returns as the input, not arethmetic returns as I have.