Hierarchical Risk Parity Portfolio Optimization

Author: Mikel Mercader Pérez
Supervisor: Dr. Josep Masdemont
May 2021
Contents

Abstract
Acknowledgements
1 Portfolio optimization
1.1 Motivation
1.2 Markowitz model: definitions
1.3 Efficient frontier
1.4 Mean-variance optimization
1.5 Risk parity
1.6 Most diversified portfolio
2 Hierarchical Risk Parity
3 Robo-advisory
3.1 Scalable capital
3.1.1 Investment Universe
3.1.2 Risk Assessment
3.1.3 Optimisation
4 Conclusions
A Code
Bibliography
Abstract

Striking the optimal balance between risk and return is at the core of financial investment theory. This paper analyzes whether the Hierarchical Risk Parity (HRP) algorithm, a novel portfolio optimization method, could outperform conventional models in certain dimensions. In a series of three tests run with custom-written code, the performance of HRP is compared to that of classical methods like Minimum Variance and Risk Parity in terms of expected return, volatility, Sharpe ratio, maximum drawdown and diversification ratio. In many of the cases analyzed, HRP showed superior results in expected return and in minimizing drawdown, but often at the expense of higher volatility and higher transaction costs. The paper finishes with a review of the alternative methods currently used by the emerging financial robo-advisory industry, and a qualitative discussion of the future opportunities for applying HRP in this context.

Keywords: Finance, HRP, Markowitz, robo-advisory, portfolio optimization. 91-08
Acknowledgements
I would like to thank Gerard Alba for all the ideas, information and tools he has provided me with during this work. This project would not have been possible without his help.
I would also like to express my most sincere gratitude to my good friend Aina, whose support has accompanied me throughout this journey.
Introduction
Investing in corporate securities may be profitable, but one should not underestimate the risk of investing in a single security. Risk arises from the possibility of variation in the return. Holding more than one security at a time enables investors to spread their risks: even if some of the securities incur losses, the investor hopes that the better performance of the rest might protect them from an extreme loss. This is the underlying rationale behind the concept of a financial portfolio, that is, the composite set of ownership rights to the financial assets in which the investor wishes to invest.
Although this idea was formalized in 1952 by Harry Markowitz, the concept was not unknown earlier in history. In the 16th century, in the play The Merchant of Venice, Shakespeare shows some intuition about the idea of diversifying one's investments. As spoken by the merchant Antonio in Act I, Scene I:

    My ventures are not in one bottom trusted,
    Nor to one place; nor is my whole estate
    Upon the fortune of this present year:
    Therefore my merchandise makes me not sad.

It seems that Shakespeare understood not only the concept of diversification, but also, to some extent, the effect of covariance.
A more recent example is the series of Wiesenberger's annual Investment Companies reports, beginning in 1941. These reports analyzed firms that held large numbers of securities and provided some diversification for their customers. Those firms, in turn, were modeled after the investment trusts of Scotland and England from the 19th century.
In 1938, John Burr Williams wrote a book called The Theory of Investment Value that captured the most advanced thinking of the time: the dividend discount model, which predicts the price of a company's stock based on the theory that its present-day price is equivalent to the expected value of the sum of all of its future dividend payments discounted back to their present value. In those times, financial information about companies was difficult to obtain, and the loose ways of the market, even after the tightened regulations that followed the Great Depression, created the impression that investing was a form of gambling for the wealthy. Despite that, professional investment managers like Benjamin Graham made huge progress by first obtaining accurate information and then analyzing it correctly before making any investments.
What was lacking prior to 1952 was a formalized theory that covered the effects
of diversification when the assets are correlated, distinguished efficient and inefficient
portfolios, and analyzed risk-return tradeoffs.
Harry Markowitz, at the time a graduate student in operations research, devoted his doctoral thesis to the study of the investment market. Upon reading John Burr Williams's book, he was struck by the fact that no consideration was given to the risk of a particular investment. This inspired him to write Portfolio Selection, an article published in 1952 in the Journal of Finance. However, the work languished for a decade, mostly because only a minor part of the article contained textual explanations, while the vast majority was dominated by graphs and mathematical proofs, which made it difficult to understand for a non-academic audience.
In this work, we will first review the fundamentals of the Markowitz model, providing the basic definitions and the tool set needed to better understand his work. We will then aim to go beyond the conventional methods of portfolio optimization by exploiting the power of data analysis. For that purpose, we will study a new optimization method, Hierarchical Risk Parity, and we will also review some of the methods used by the emerging robo-advisory services industry. To that end, our work is structured as follows:
• Chapter 1 provides the basic definitions for the Markowitz model and goes through the standard methods of portfolio optimization.
• Chapter 2 introduces the Hierarchical Risk Parity algorithm and evaluates its performance against the classical methods in a series of tests.
• Chapter 3 explains the new subject of robo-advisory, focusing on the case study of Scalable Capital.
Chapter 1
Portfolio optimization
1.1 Motivation
Why should one construct portfolios instead of looking for individual assets to invest in? A simple example shows the main reason why constructing portfolios is useful. In this example we will use two properties of assets that we will define properly later: the expected return, which is essentially the expected value that the asset will have at the end of a time period compared to its current one, and the risk or volatility, which has to do with the variance of this return. Suppose now that we have two different assets: asset A with an expected return of 4% and a volatility of 10%, and asset B with an expected return of 6% and a volatility of 14%.
Now we consider all the possible portfolios that can be made by combining assets A and B. It is clear that the properties of the whole portfolio will resemble those of an individual asset more as we increase the weight of that asset, but how exactly can they be computed? Let's say our portfolio consists of 50% of asset A and 50% of asset B; one might then guess that its expected return is (4 + 6)/2 = 5% and its volatility is (10 + 14)/2 = 12%.
[Figure 1: expected return versus volatility for the portfolios obtained by combining assets A and B; the naive 50/50 portfolio P is marked between A (4% return) and B (6% return).]
Even though this seems like the most intuitive approach, we have not taken into account an important concept: the correlation between the two assets. The expected return is actually correct and can be computed with a simple weighted mean of the assets' returns, but it turns out that when we properly compute the risk of our portfolio it will generally be lower than the one we expected from the mean of the assets' volatilities. This is directly related to the correlation between them: if the assets are largely uncorrelated we will be able to reduce significantly the volatility of their combinations, while the more correlated they are, the more reality resembles our previous naive approach. Figure 1 shows the actual behaviour of all portfolios that result from combining A and B in different proportions, assuming a correlation of 0.4. Every point corresponds to a certain composition: 100% A and 0% B, 90% A and 10% B, and so on. The horizontal axis displays volatility while the vertical axis displays return.
We can observe that the curve lies to the left of the straight line that goes from A to B. If the correlation were lower we would see a more pronounced curvature, while if it were closer to 1 the curve would approach the straight line. The intuition behind this is that if one asset behaves in the opposite way to the other and we invest in both, when one of them presents losses the other is likely to present gains and cover those losses, therefore reducing the total risk of the investment.
We have seen this basic example for two assets, but how would this graphic look for three or more? We will see that for an arbitrary number of assets, instead of a curve we have a region that represents all the possible portfolios. Out of this region, we will only be interested in its left edge, which we will call the efficient frontier. This is because for a given return we will only be interested in the portfolio that offers the lowest risk. This way, we obtain a curve similar to the one from our simple example. This phenomenon will be explained in more detail once we have formally presented Markowitz's model.
1.2 Markowitz model: definitions
Definition 1.2.1. We define the return from time period t to time period t + 1 as

Rt,t+1 = Pt+1 / Pt,

where Pt denotes the value of the asset at time t.
The return represents the change in the value of an asset and tells us how much it grows or decreases. An asset whose value grows from 10 to 12 has a return of 1.2, or 120%. We could also say that the return is 0.2, or 20%, but for the purpose of making computations easier it is sometimes more practical to define returns in the previous manner.
With this definition, if an asset has a return R1,2 from time period 1 to time period 2 and a return R2,3 from time period 2 to time period 3, we can calculate the total return from time period 1 to time period 3 simply as R1,3 = R1,2 · R2,3.
Up until now, for a portfolio consisting of N assets we have the following magnitudes:
• A vector r = (r1, r2, ..., rN) of the average returns of each asset. Notice that until now we had used R̄ to refer to these average returns, but for simplicity we will just call them ri from now on.
• A vector w = (w1, w2, ..., wN) of weights, where wi is the fraction of the total capital invested in asset i.
Just as for every individual asset, we can compute an expected return and a volatility for the whole portfolio.
The return of the portfolio is simply a weighted average of the individual returns; however, computing its volatility requires some extra steps. This is because we cannot look only at the variance or volatility of each asset individually: we also have to look at the correlations between them. To do this we first need to store all the information regarding the variances and correlations in a covariance matrix,
Σ = (σij)N×N, with σij = σi σj ρij,
where ρij is the correlation between the returns of asset i and asset j. It can be computed as

ρij = Σk (ri,k − ri)(rj,k − rj) / sqrt( Σk (ri,k − ri)² · Σk (rj,k − rj)² ),

where ri and rj are the average returns for assets i and j and ri,k is the return of asset i at time period k.
Given the weights w and the covariance matrix Σ, the volatility of our portfolio as a whole is σp = sqrt(wT Σw), which, because it takes into account the correlations between the assets, is different from the average of their individual volatilities.
To illustrate how all of this is used in practice, let's compute these values for the initial example we used as motivation. Recall our case:
Asset A: ra = 4%, σa = 10%
Asset B: rb = 6%, σb = 14%
Correlation: ρab = 0.4
How would a portfolio consisting of 50% A and 50% B look? Its variance is

σp² = wa²σa² + wb²σb² + 2 wa wb ρab σa σb = 0.0025 + 0.0049 + 0.0028 = 0.0102,

so σp = 0.101 = 10.1%.
This is lower than the average of the assets' volatilities, almost as low as A's. In fact, it is possible to build a portfolio from these assets with a volatility lower than either of the two: with weights 0.8 of A and 0.2 of B, we obtain a volatility σp = 9.47%. In figure 1.1 we can see the volatilities for portfolios with weights w = (1, 0), w = (0.9, 0.1), ..., w = (0.1, 0.9), w = (0, 1). This example illustrates the usefulness of portfolio creation when it comes to reducing volatility.
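These numbers are easy to reproduce numerically. The following minimal sketch (the variable and function names are ours) computes the volatility of the two-asset portfolio for any split of the capital, using the data of the example:

import numpy as np

sigma_a, sigma_b, rho = 0.10, 0.14, 0.4

def portfolio_vol(w_a):
    #sigma_p^2 = w_a^2 s_a^2 + w_b^2 s_b^2 + 2 w_a w_b rho s_a s_b
    w_b = 1.0 - w_a
    var = (w_a * sigma_a)**2 + (w_b * sigma_b)**2 \
          + 2 * w_a * w_b * rho * sigma_a * sigma_b
    return var**0.5

print(portfolio_vol(0.5))  #~0.101, the 10.1% computed above
print(portfolio_vol(0.8))  #~0.0947, lower than both individual volatilities

Sweeping the weight of A from 1 to 0 in steps of 0.1 reproduces the curve shown in figure 1.1.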
Definition 1.2.7. The Sharpe ratio is a magnitude used to evaluate the performance of a portfolio. It measures the return the portfolio provides per unit of risk and is defined as

S = (Rp − Rf) / σp,

where Rf is the risk-free return, meaning the return we can obtain from an investment that has virtually no risk, for example a U.S. Treasury bond. Subtracting it allows us to focus on the amount of return that does come with the assumption of some risk.
Definition 1.2.8. The maximum drawdown is a risk measure that represents the greatest loss that a financial product undergoes in a certain period of time. It does not need the Markowitz model to be defined; it can be computed using only the temporal price series of the product.
The drawdown at any point in time is defined as the difference between the current value and the previous maximum (divided by that maximum, to express it in relative terms). If we search for the largest of these values across the period, we obtain the maximum drawdown, a measure of the largest downfall in value that has taken place.
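In symbols, writing Pt for the value of the product at time t, the drawdown at time t is DDt = (Pt − max{Ps : s ≤ t}) / max{Ps : s ≤ t}, and the maximum drawdown is MDD = min over t of DDt, a non-positive number whose magnitude is the largest relative fall from a previous peak.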
An important advantage this kind of risk measure has over volatility is that it only
measures downside risk, focusing only on possible losses, while volatility is affected by
both unexpected gains and unexpected losses.
1.3 Efficient frontier

In figure 1.4 the original assets are represented by the red dots. However, of the infinite number of possible portfolios represented by this region, which ones are of interest to us? It is not always easy to compare a pair of portfolios: take the ones in our initial example, portfolio A with a 4% return and 10% volatility and portfolio B with a 6% return and 14% volatility. Different investors could choose either one, and there are arguments to be made for both, depending on whether the priority is maximizing return or avoiding risk.
return and one with 10% volatility while the other had 15% volatility, everyone would
agree that the first one is objectively superior since it offers the same return for less
volatility. A way to see this is to think of risk as something that has a cost, meaning
that more risk reduces the value of an asset. The same applies to portfolios: in the
region we have generated there are several portfolios with the same expected return
and different volatilities, of which only the one with the lowest volatility is of interest.
This means that in the bigger picture we will only be looking at the left edge of the
region of possible portfolios. The curve that this edge defines is called the efficient
frontier.
Computing this efficient frontier is an optimization problem: for a given target return we have to minimize the risk, or volatility, of the portfolio. The formulation of the problem is

Minimize: σp² = wT Σw
Subject to: wT r = rtarget, Σi wi = 1, wi ≥ 0 ∀i.

We can solve this problem for target returns ranging from the lowest to the highest of the individual asset returns in order to plot the efficient frontier. Observe that this is a quadratic programming problem and therefore it can be solved efficiently. This way we can find the optimal portfolio for every return level according to Markowitz's model.
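As an illustration, this quadratic program can be solved numerically with scipy's general-purpose SLSQP solver (a dedicated quadratic programming solver would also work); here r and Sigma are assumed to be a Numpy vector of average returns and a covariance matrix:

import numpy as np
from scipy.optimize import minimize

def min_variance_for_return(r, Sigma, target):
    #Minimize w' Sigma w subject to w'r = target, sum(w) = 1, w >= 0
    n = len(r)
    cons = [{'type': 'eq', 'fun': lambda w: w @ r - target},
            {'type': 'eq', 'fun': lambda w: w.sum() - 1.0}]
    res = minimize(lambda w: w @ Sigma @ w, np.full(n, 1.0/n),
                   constraints=cons, bounds=[(0.0, 1.0)]*n, method='SLSQP')
    return res.x

Sweeping target from the lowest to the highest asset return and plotting the resulting (volatility, return) pairs traces the efficient frontier.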
Once we have found this efficient frontier, however, it is not immediate to choose which particular portfolio is the best one. That is why many classical methods have been created that work inside Markowitz's model and try to find a single optimal portfolio. Here we will explain and discuss some of them.

1.4 Mean-variance optimization

One classical approach is to maximize a utility function that rewards expected return and penalizes risk through an aversion parameter λ:
Maximize: wT r − (λ/2) wT Σw = rp − (λ/2) σp²
Subject to: Σi wi = 1, wi ≥ 0 ∀i.
The function that we are trying to maximize depends on both return and risk: return increases it while risk decreases it. The greater λ is, the more relevant the risk term becomes and the more emphasis the method places on avoiding risk.
1.5 Risk parity

The risk parity approach seeks a portfolio in which every asset contributes the same amount of risk. The portfolio volatility

σp = σp(w) = sqrt(wT Σw)

is a homogeneous function of degree one, and therefore we can apply Euler's theorem for homogeneous functions, obtaining

σ(w) = Σi σi(w), where σi(w) = wi · ∂σ(w)/∂wi = wi (Σw)i / sqrt(wT Σw)

is the risk contribution of asset i.
If we want each asset's contribution to be equal, we must impose σi(w) = σj(w) ∀i, j, or equivalently σi(w) = σ(w)/N ∀i, where N is the total number of assets. This problem can be solved by looking at the fixed-point problem

wi = σ(w)² / ((Σw)i N),

or by solving the optimization problem given by

min over w of Σi ( wi − σ(w)² / ((Σw)i N) )².
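A minimal numerical sketch of this last optimization problem, again using scipy and adding the usual budget constraint (Sigma is an assumed covariance matrix; the solver and starting point are our own choices):

import numpy as np
from scipy.optimize import minimize

def risk_parity_weights(Sigma):
    n = len(Sigma)
    def objective(w):
        #sum_i ( w_i - sigma(w)^2 / ((Sigma w)_i * n) )^2
        sigma2 = w @ Sigma @ w
        return np.sum((w - sigma2 / (Sigma @ w * n))**2)
    cons = [{'type': 'eq', 'fun': lambda w: w.sum() - 1.0}]
    res = minimize(objective, np.full(n, 1.0/n), constraints=cons,
                   bounds=[(1e-6, 1.0)]*n, method='SLSQP')
    return res.x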
1.6 Most diversified portfolio

The most diversified portfolio method replaces the vector of expected returns in the previous problem by the vector of volatilities σ = (σ1, ..., σN):

Maximize: wT σ − (λ/2) wT Σw.

Maximizing this function is equivalent to maximizing the diversification ratio, defined as the weighted sum of the individual volatilities divided by the portfolio volatility, DR(w) = wT σ / sqrt(wT Σw). Conceptually, it can be interpreted as an attempt to maximize the reduction in risk that the portfolio achieves when taking into account the effect of the correlations, in comparison to the naive sum of weighted individual volatilities.
In the next part of this work we will introduce a new method, Hierarchical Risk Parity (HRP), and we will evaluate its performance in comparison to the previously mentioned classical methods.
Chapter 2

Hierarchical Risk Parity

2.1 Tree clustering

The first stage of HRP groups similar assets into clusters based on their correlations. Following the standard HRP construction, each pair of assets i, j is assigned the distance di,j = sqrt((1 − ρij)/2). Perfectly correlated assets will have the minimum distance of 0, while the most uncorrelated, or negatively correlated, assets will have a distance close to 1.
This way we can obtain a distance matrix from the correlation matrix by simply applying this definition to each of its elements. However, this distance only looks at assets in pairs; it would be much more useful to define a distance that takes into account the role that each of the two assets plays in the context of the whole universe of assets. For example, two assets A and B might not be particularly correlated with each other, but the relationship they have with the rest of the assets may be similar, in the sense that if we take a random asset C it is likely that A and C are correlated in a similar way to B and C. In this case we would want to consider A and B more similar than we initially thought. Using the initial distance d we can define a new distance d̄ that applies this concept in the following way:
d̄i,j = sqrt( Σn (dn,i − dn,j)² ).
This means that the new distance d̄i,j between assets i and j is the Euclidean distance between columns i and j of the original distance matrix d.
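As a sketch, both distance matrices can be obtained from a correlation matrix corr (an assumed Pandas DataFrame) in a couple of lines:

import numpy as np
from scipy.spatial.distance import pdist, squareform

d = np.sqrt(0.5 * (1.0 - corr))     #pairwise distance d between assets
d_bar = squareform(pdist(d.T))      #Euclidean distance between columns of d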
Once this distance is defined, the next step is to recursively form clusters. This way we will end up with a tree in which every cluster is made up of sub-clusters. While all our assets are still separate, we look for the minimum distance in the distance matrix and combine its two assets into a cluster. For example, suppose our distance matrix is the following:
a b c d e
a 0 17 21 31 23
b 17 0 30 34 21
c 21 30 0 28 39
d 31 34 28 0 43
e 23 21 39 43 0
Here the two closest nodes are a and b with a distance of 17. Therefore we would
combine a and b into a cluster (a, b). We update the distance matrix by removing the
two previous nodes and adding one that represents the recently formed cluster. In
this step we need to define a method to compute the distance between the new cluster
and the other nodes (or, in the future, other clusters). One way to do this is with the
nearest point algorithm.
d̄A,B = min { d̄i,j : i ∈ A, j ∈ B },
where A and B are clusters (an asset can be seen as a cluster with only one element). Although this is the most commonly used method, one could define a different one and examine whether the results of the algorithm change significantly. After this first step our distance matrix transforms into the following:
(a, b) c d e
(a, b) 0 21 31 21
c 21 0 28 39
d 31 28 0 43
e 21 39 43 0
The new distances have been calculated with the method described above; for example, d̄((a,b), c) = min(d̄c,a, d̄c,b) = min(21, 30) = 21. In our next step the minimum distance we can find is 21, which is shared by the new cluster, c and e. Therefore we merge these three into a new cluster without forgetting the hierarchy, meaning that we keep in mind that a and b were combined together before any of the others. Our distance matrix ends up looking like this:
((a, b), c, e) d
((a, b), c, e) 0 28
d 28 0
[Dendrogram of the example: leaves ordered a, b, c, e, d.]
A way to store the whole clustering process is with a linkage matrix. This matrix has size (N − 1) × 4, where N is the number of original elements, and every row represents a step of our clustering algorithm. If row m is (ym,1, ym,2, ym,3, ym,4), this means that in step m we fused the clusters ym,1 and ym,2, which had a distance of ym,3. The last number, ym,4, is the number of original, individual components that the resulting cluster contains. We label our initial n elements with the integers 0 to n − 1, and then every new cluster is labeled with the next integer. For example, suppose our clustering method ended up with the following result:
[Figure 2.1: dendrogram corresponding to the linkage matrix below.]
0 1 2 3
0 3.0 5.0 0.618707 2.0
1 6.0 8.0 0.788512 2.0
2 9.0 11.0 0.837068 3.0
3 4.0 10.0 0.845906 3.0
4 7.0 12.0 0.869022 4.0
5 13.0 14.0 0.878178 7.0
6 0.0 1.0 0.897254 2.0
7 15.0 16.0 0.898109 9.0
8 2.0 17.0 0.924699 10.0
As we can observe in the dendrogram, this means that in the first step we combined assets 3 and 5 into a new cluster labeled 10. In the second step we combined assets 6 and 8 into a cluster labeled 11, and in the next step we combined this cluster 11 with asset 9 to create a cluster of 3 assets.
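The same bookkeeping can be reproduced with scipy's linkage function. The following self-contained sketch applies it to the 5×5 distance matrix of the worked example above ('single' is the nearest point rule described earlier):

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

D = np.array([[ 0., 17., 21., 31., 23.],
              [17.,  0., 30., 34., 21.],
              [21., 30.,  0., 28., 39.],
              [31., 34., 28.,  0., 43.],
              [23., 21., 39., 43.,  0.]])

#squareform turns the square matrix into the condensed form linkage expects
Z = linkage(squareform(D), method='single')
print(Z)  #the first row merges items 0 (a) and 1 (b) at distance 17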
2.2 Quasi-diagonalization
In this step we reorganize the assets in the correlation matrix so that the largest correlations lie around the diagonal. This way, assets end up close to those similar to them and far apart from very different ones, and we are able to visualize the clusters in the correlation matrix. To follow the order given by our clustering method we just have to look at every row of the linkage matrix, starting from the last one, and recursively replace each cluster (ym,1, ym,2) by its components.
The corresponding figures show how a correlation matrix can look before and after undergoing this quasi-diagonalization. The colors represent higher and lower correlations, and we can observe that the greater correlations lie near the diagonal, forming clusters.
2.3 Recursive bisection

The last stage distributes capital by recursively splitting the ordered list of assets in two. Within a cluster, the weights are taken proportional to the inverse of the variances of its constituents,

w = diag[V]⁻¹ / trace(diag[V]⁻¹),

where V is the covariance matrix of the constituents of the cluster. This way we can compute the variance of the two halves of a split as
σ1 = w1T V1 w1,    σ2 = w2T V2 w2.
Now we compute two factors that we will use to rescale the weights of the two clusters, α1 and α2:

α1 = 1 − σ1/(σ1 + σ2),    α2 = 1 − α1.
We then update w1 ← w1 α1 and w2 ← w2 α2. Observe that the total sum of the weights of the final result is equal to 1. Suppose we have 2^n assets; then our initial vector w is a vector of 2^n ones, with a sum of 2^n. In the first step we multiply half of the vector by α1 and the other half by α2 = 1 − α1. If we make pairs with an updated weight from the first half and another one from the second, each pair has a sum of 1 and there are 2^n/2 pairs, meaning that the total sum has been halved in this step. Recursively, in every step we take a vector of identical weights and multiply half of it by αi and the other half by 1 − αi, halving their sum. If we consider that we visit the entirety of our assets in every step, the whole process takes log2 2^n = n steps and the final sum is 2^n · (1/2)^n = 1. For example:
(1, 1, 1, 1) → (0.7, 0.7, 0.3, 0.3) → (0.7 · 0.6, 0.7 · 0.4, 0.3 · 0.8, 0.3 · 0.2)
2.4 Code
We prepared a snippet of code that allows us to perform HRP on a predefined set of stocks and then assess its performance in comparison to the classic methods.
These are the imports that we will need. Most of the work is done using Numpy functions and taking advantage of the properties of Pandas' data structures. We only need Scipy to build dendrograms, and the visualizations are made with the Matplotlib and Seaborn libraries.
import pandas as pd
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from matplotlib import pyplot as plt
import seaborn as sns
1. In our first tests, we obtained the data from Yahoo Finance. On this website, one can select a given stock and obtain historical data within some parameters such as time period and frequency, and download a csv file that has 7 columns: Date, Open, High, Low, Close, Adj Close and Volume.
In order to perform our simulations, we needed to download one file for each of the stocks we wanted to analyze. Then, we had to perform some computations to obtain the returns of each stock (returns are computed as explained in the previous chapter). Finally, we needed to obtain a single data structure that encapsulated the information for all the stocks, as the HRP algorithm works in a matrix-like manner. This is what the next code does: it reads the files, computes the returns and creates a single data structure for all the stocks. Note that we need to give the names of the stocks to the code beforehand, and that depends on the test we are running.
#Read the file of the first stock to create the initial returns and prices dataframes
df = pd.read_csv(stocks_compl[0]+'.csv', index_col=0)
returns = df["Close"].pct_change().to_frame()
prices = df["Close"].to_frame()
#Rename the column from "Close" to the stock name that it came from
returns = returns.rename(columns={"Close": stocks_compl[0]})
prices = prices.rename(columns={"Close": stocks_compl[0]})
#Do exactly the same for all the stocks in the list, and join all
#the dataframes together
for stock in stocks:
    df = pd.read_csv(stock+'.csv', index_col=0)
    df_pct = df["Close"].pct_change()
    df_price = df["Close"]
    returns = returns.join(df_pct).rename(columns={"Close": stock})
    prices = prices.join(df_price).rename(columns={"Close": stock})
2. Later, we wanted to test our code with data extracted from JPM's paper, which we will discuss later. The data was given to us in a different format: a single file contained all the stocks. Such a file had a column for the date and one column with the prices of each of the stocks. This kind of format was easier to deal with, but we still had to code some lines for it, as we still needed to compute the returns.
Additionally, the data extracted from the paper was not clean. Some of the stocks could not be found because they were either not publicly disclosed or no longer existed. Of the stocks that we could obtain, many values for certain dates were missing. At this point we had two options: either impute the missing data or just skip the rows. We decided to simply skip the rows, because imputing the missing data would require a dedicated analysis of how the imputation affects the result of the HRP algorithm. We wanted to focus this work on the algorithm specifically, so we did not want to introduce new variables with the imputation.
#-----------------------------------------------------------------------
#TEST 5 JPMORGAN PAG 16 VALUES
stocks = ["JMABDJSE INDEX","JPFCVA01 INDEX",
"JPFCVA05 INDEX","JPMZVEL1 INDEX", "JPMZVP4G INDEX"]
stocks_compl = ["ERJPMFGU INDEX"] + stocks
variable_read = "vals_16.csv"
#-----------------------------------------------------------------------
Once we reach this point, we have a single Pandas data structure: it is indexed by the date and has a column containing the returns of each of the stocks we are dealing with. In the code, this structure has the variable name returns. We can now start the HRP algorithm.
The first step is the tree clustering. We need to compute a distance matrix from the returns. Note that this is where having all the information in the same data structure matters: we deal with it as if it were a matrix. We make use of the advantages of computing with Numpy and then use Scipy to compute the dendrogram, as this is much easier than doing it manually. Notice that we use the libraries Matplotlib and Seaborn for visualization purposes.
Our next goal is the quasi-diagonalization. The linkage function from the previous block of code provided us with a table describing the order in which the clusters were formed. Using this table, we rearrange the indexes of the matrix in the order defined by the clustering, thus producing a quasi-diagonalization.
def get_quasi_diag(link):
    link = link.astype(int)
    # get the first and the second item of the last tuple
    sort_ix = pd.Series([link[-1,0], link[-1,1]])
    # the total num of items is the fourth item of the last list
    num_items = link[-1, 3]
    # keep expanding cluster labels into their two members until only
    # original items (labels < num_items) remain
    while sort_ix.max() >= num_items:
        sort_ix.index = range(0, sort_ix.shape[0]*2, 2)   # make space
        df0 = sort_ix[sort_ix >= num_items]               # find cluster labels
        i = df0.index
        j = df0.values - num_items
        sort_ix[i] = link[j, 0]                           # first member
        df0 = pd.Series(link[j, 1], index=i+1)            # second member
        sort_ix = pd.concat([sort_ix, df0])
        sort_ix = sort_ix.sort_index()
        sort_ix.index = range(sort_ix.shape[0])
    return sort_ix.tolist()
To make this effective, we have to remember that our initial data structure is indexed by names, specifically the names of the stocks; it is key that we have provided a list containing all the stocks, referenced in the code as stocks_compl. We then use Numpy's indexing to obtain our quasi-diagonalized matrix. Using Seaborn again, we produce another visualization, so that we can compare the structure before and after.
sort_ix = get_quasi_diag(link)
stocks_compl = np.array(stocks_compl)
df_vis = returns[stocks_compl[sort_ix]]
corr2 = df_vis.corr()
ax = sns.heatmap(corr2, cmap="coolwarm")
The last step is the recursive bisection. We start with a vector of weights initially all set to 1. Then, for each cluster that we bisect, we multiply those weights by a factor derived from the variance of each half.
Since the variance of a cluster is used several times in this function, we built a helper to compute it. This code essentially evaluates the following formula, adapted to the format of our data structure:
w = diag[V]⁻¹ / trace(diag[V]⁻¹)
def get_cluster_var(cov, c_items):
    cov_ = cov.iloc[c_items, c_items]  # matrix slice
    # calculate the inverse-variance portfolio weights
    ivp = 1./np.diag(cov_)
    ivp /= ivp.sum()
    w_ = ivp.reshape(-1,1)
    c_var = np.dot(np.dot(w_.T, cov_), w_)[0,0]
    return c_var
And then we proceed with the recursive bisection.
def get_rec_bipart(cov, sort_ix):
    # compute the HRP allocation over the quasi-diagonal order sort_ix
    # initialize all weights to 1
    w = pd.Series(1.0, index=sort_ix)
    # start with all items in a single cluster
    c_items = [sort_ix]
    while len(c_items) > 0:
        # bisect every cluster with more than one element
        c_items = [i[j:k] for i in c_items for j, k in
                   ((0, len(i)//2), (len(i)//2, len(i))) if len(i) > 1]
        for i in range(0, len(c_items), 2):
            c_items0 = c_items[i]       # first half
            c_items1 = c_items[i+1]     # second half
            c_var0 = get_cluster_var(cov, c_items0)
            c_var1 = get_cluster_var(cov, c_items1)
            alpha = 1 - c_var0/(c_var0+c_var1)
            w[c_items0] *= alpha
            w[c_items1] *= 1-alpha
    return w
Since our goal is to analyze the HRP algorithm, we need something to compare against, namely the results of the other classic methods. We computed the minimum variance, risk parity and uniform weights using the formulas provided in the previous chapter.
def compute_MV_weights(covariances):
    inv_covar = np.linalg.inv(covariances)
    u = np.ones(len(covariances))
    #Closed-form minimum-variance weights: Sigma^-1 u / (u' Sigma^-1 u)
    x = np.dot(inv_covar, u) / np.dot(u, np.dot(inv_covar, u))
    return pd.Series(x, index = stocks_compl, name="MV")

def compute_RP_weights(covariances):
    #Weights inversely proportional to each asset's variance
    weights = (1 / np.diag(covariances))
    x = weights / sum(weights)
    return pd.Series(x, index = stocks_compl, name="RP")

def compute_unif_weights(covariances):
    x = [1 / len(covariances) for i in range(len(covariances))]
    return pd.Series(x, index = stocks_compl, name="unif")
We proceed to show the final results for a given set of stocks. We compute all the
weights and join them in the same data structure so we can print the result and have
better readability.
cov = returns.cov()
#The HRP weights come from the recursive bisection; mapping them back to
#the ticker names is assumed to be done as follows
weights_HRP = get_rec_bipart(cov, sort_ix)
weights_HRP.index = stocks_compl[sort_ix]
weights_HRP.name = "HRP"
weights_MV = compute_MV_weights(cov)
weights_RP = compute_RP_weights(cov)
weights_unif = compute_unif_weights(cov)
results = weights_HRP.to_frame()
results = results.join(weights_MV.to_frame())
results = results.join(weights_RP.to_frame())
results = results.join(weights_unif.to_frame())
Finally, we compute some parameters that will allow us to better compare and analyze the results of our algorithms.
We start by computing the expected return. The function provided in this code computes, for a given set of weights, the contribution of each stock to the portfolio's expected return. The rest of the code just deals with the Pandas bookkeeping needed to put everything together in a single data structure so it can be easily read.
def compute_ER(weights):
    mean = returns.mean(0)
    #Per-asset contribution to the expected return; the total is obtained
    #later by summing the column
    return weights * mean
er_hrp = compute_ER(weights_HRP)
er_hrp.name = "HRP"
er_mv = compute_ER(weights_MV)
er_mv.name = "MV"
er_rp = compute_ER(weights_RP)
er_rp.name = "RP"
er_unif = compute_ER(weights_unif)
er_unif.name = "unif"
ers = er_hrp.to_frame()
ers = ers.join(er_mv.to_frame())
ers = ers.join(er_rp.to_frame())
ers = ers.join(er_unif.to_frame())
ers = ers.sum()
ers.name = "Expected Return"
ers = ers.to_frame()
Next, we compute the portfolio volatility. Again, only the code encapsulated in the function is really necessary to compute the portfolio volatility; the rest depends on the structure of Pandas.
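The volatility function is not shown in the text; a sketch of what it could look like (the name compute_vol and the re-alignment step are ours) is the following:

def compute_vol(weights):
    #sigma_p = sqrt(w' Sigma w); re-align the weights to the covariance
    #matrix in case they come in quasi-diagonal order
    cov_m = returns.cov()
    w = weights.reindex(cov_m.index).values
    return np.sqrt(w @ cov_m.values @ w)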
Our next parameter is the Sharpe ratio. In order to compute it, we need a function that gives us the risk-free return. This is the rate one can obtain with virtually no risk, for example with treasury bonds. This value depends on the country the investor is in; for now we have left it at 0.
def risk_free():
    return 0
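With those pieces, a Sharpe ratio helper in the same style could be written as follows (the name compute_sharpe is ours, and it reuses the compute_vol sketch above):

def compute_sharpe(weights):
    #(expected portfolio return - risk-free rate) / portfolio volatility
    return (compute_ER(weights).sum() - risk_free()) / compute_vol(weights)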
Another important parameter is the maximum drawdown. There are several ways of computing this with Python, including built-in functions from some libraries. However, we decided to code it by hand, as it was not very difficult to do so.
def compute_mdd(weights):
    df = weights * prices
    df = df.sum(1)
    roll_max = df.cummax()
    daily_drawdown = df/roll_max - 1.0
    #The maximum drawdown is the most negative of the daily drawdowns
    return daily_drawdown.min()
data = [compute_mdd(weights_HRP)]
data.append(compute_mdd(weights_MV))
data.append(compute_mdd(weights_RP))
data.append(compute_mdd(weights_unif))
dd = pd.DataFrame(data = data, index=["HRP", "MV", "RP", "unif"],
columns = ["Max DD"])
The last parameter we compute is the diversification ratio. Notice that, in order to compute this one, we need to use the portfolio volatility function.
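A sketch of such a diversification ratio helper (the name compute_DR is ours, again reusing the volatility sketch above):

def compute_DR(weights):
    #Weighted average of the individual volatilities divided by the
    #portfolio volatility; larger values mean more diversification benefit
    cov_m = returns.cov()
    asset_vol = pd.Series(np.sqrt(np.diag(cov_m)), index=cov_m.index)
    return (weights * asset_vol).sum() / compute_vol(weights)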
The last lines of our code just put all the parameters together so we can print a
nice table containing all the results.
final_results = ers.join(volatility)
final_results = final_results.join(sharpe_R)
final_results = final_results.join(dd)
final_results = final_results.join(dr)
2.5 Tests with real world values

For our first test we chose an arbitrary investment universe of eleven stocks (listed in table 2.6), the last of them being OCO.V (Oroco Resources). We obtained this data from the website Yahoo Finance, which allows us to select, among other things, the period of time that we want to analyze. In image 2.3 we can see the options that we are given and the structure that the data has.
To perform our tests, we looked at the closing price of every day from February 2020 to February 2021, for a total of 254 prices (since asset markets only open on working days). As mentioned in the previous section, the first step is to load the data for all the assets and compute all the returns. We obtain an 11×253 table that we can use to compute the average return of every asset and the covariance and correlation matrices. Initially, with our assets in a random order, this is a representation of our correlation matrix:
Now we run the first part of HRP, the clustering. Based on the correlations between the assets, the algorithm organizes them in a clustering tree.
Unsurprisingly, Gamestop, whose price had shown some artificial irregularities in its recent behaviour, is isolated in its own cluster. Inside the other one, the four largest companies, MSFT, GOOG, AAPL and AMZN, form their own cluster, as do the four Spanish assets IAG, ACS, TRE and REP. OCO and REE seem to be fairly uncorrelated with these clusters, although not as much as GME. Once we use this tree to reorganize our assets and update the correlation matrix, we can see that it now looks more like a diagonal matrix, and we can clearly see the clusters of correlated assets.
Now we are ready for the last step: weight allocation. As we explained, we initialize all our weights at a value of 1 and then start from the top of the hierarchical tree. Every bifurcation defines two clusters that compete for weight. In the first step we compare the "cluster" formed by GME alone against the one formed by the rest of the stocks. In the second step the first cluster is formed by the four global multinationals (AMZN, AAPL, MSFT and GOOG) and it competes against the second cluster containing OCO.V and the Spanish companies. Inside every cluster, subclusters are formed as well: for example, out of the large global companies, HRP considers that AMZN can be split off first and compared with the remaining three, and unsurprisingly it also separates OCO.V, a Canadian mineral explorer working in Mexico, from the group of Spanish companies early in the process. After the allocation is complete we compare the results we obtain from HRP with the results of some of the classical methods: minimum variance (MV), risk parity (RP) and uniform weights (Unif).
HRP MV RP Unif
GME 0.5130 0.4375 0.3603 9.0909
AMZN 23.3498 41.7889 16.4011 9.0909
AAPL 8.2981 -1.9115 10.0986 9.0909
MSFT 5.0862 -28.8795 11.4706 9.0909
GOOG 6.2536 24.5276 14.1034 9.0909
OCO.V 4.4302 0.4018 2.8555 9.0909
REE.MC 35.0759 53.4189 24.7810 9.0909
IAG.MC 3.5279 -1.9741 2.4925 9.0909
ACS.MC 0.55035 5.0107 5.1570 9.0909
TRE.MC 3.9406 7.1005 6.0778 9.0909
REP.MC 4.0212 0.0792 6.2022 9.0909
Table 2.6: portfolio weights (in %) assigned by each method.
The main goal of this test is not to perform a quantitative analysis, since the investment universe we have chosen is arbitrary and does not resemble a real-world problem. However, it is a good illustration of how the clustering process of HRP works and of how the clusters that are formed make sense intuitively. It is interesting, in any case, that HRP manages to offer a good expected return without too much cost in volatility compared to the other methods. The uniform weighting offers a similar result in these areas, but presents the largest maximum drawdown, while HRP keeps it minimal. In the following tests we will aim to determine whether this performance also shows up in more general contexts.
The clustering process applied to this matrix leads to a clustering tree much more complex than the one in the previous example.
It is also enlightening to look at the ordered correlation matrix to see the reasoning behind the clusters. In this case, we can clearly observe a strongly correlated cluster formed by over half of the items in the list.
Once the weight allocations are complete, we again compare the expected performance of this method and the others.
Table 2.7
While it is true that HRP again offers the highest expected return, apart from that of the uniform method, this time the volatility of its portfolio is too high in comparison to the others. This is understandable when compared to minimum variance, since the whole point of that method is to minimize the volatility, but it is surprising when compared to risk parity, which is a similar method. When it comes to the maximum drawdown, HRP is again the best performing method. Overall we can observe how using the tools of Markowitz's model in the various methods allows us to greatly reduce the risk of a great loss in comparison to the behaviour of the investment universe as a whole, represented by the uniform weight allocation.
We found out that many of their indices were not in the public domain, so our simulations will necessarily differ from the results presented in their paper.
The first test begins at page 12 of the paper, with a list of 16 stocks involved. However, from this list, 6 indices were restricted, meaning that we could find them but could not access their values, and 4 others did not exist, meaning that we could not find their values or their historical data. So our test was cut short at 7 stocks. Additionally, some stocks had missing data, and since HRP needs a matrix-like structure, we had to remove all rows where there was some missing data.
We can observe that in this case most of the assets are weakly correlated, meaning that we do not have clearly defined clusters like in the previous examples.
Table 2.9
When we analyze the performance of our methods in this investment universe, there are no significant differences between the three algorithms, except for HRP having a larger volatility once again.
The second test begins at page 16, with a list of 31 stocks involved. From that list, 16 stocks were restricted and 9 did not exist. We could only access 6 indices, meaning that this was the test we performed with the least data.
Similarly to the previous test, the assets are weakly correlated and the correlation matrix is already quasi-diagonal to begin with, without many changes or significant clusters to be made.
Table 2.11
In this case, HRP is the only method that offers a reasonable expected return; however, it again comes at the expense of a higher volatility. It also does a good job of reducing the maximum drawdown, as does RP.
Finally, the last experiment begins at page 19 with a list of 16 stocks. This time,
we could find all the data: MSCI ACWI Gross Total Return USD Index (M2WD),
MSCI World Gross Total Return USD Index (M2WO), MSCI Emerging Markets
Gross Total Return USD Index (M2EF), MSCI USA Gross Total Return USD Index
(M2US), MSCI Europe Gross Return EUR Index (M8EU), MSCI Japan Gross Return
JPY Index (M8JP), MSCI AC Asia Pacific Net Total Return USD Index (M1AP),
Bloomberg Barclays Global Aggregate Government Total Return Index Hedged USD
(LGAGTRUH), Bloomberg Barclays US Govt Total Return Value Unhedged USD
(LUAGTRUU), Bloomberg Barclays EuroAgg Treasury Total Return Index Value Un-
hedged EUR (LEATTREU), Bloomberg Barclays Global Aggregate Corporate Total
Return Index Hedged USD (LGCPTRUH), Bloomberg Barclays US Corporate To-
tal Return Value Unhedged USD (LUACTRUU), Bloomberg Barclays Pan European
Aggregate Corporate TR Index Hedged EUR (LP05TREH), Bloomberg JPMorgan
Asia Dollar Index (ADXY), Dollar Index (DXY), Bloomberg Commodity Index Total
Return (BCOMTR).
In this case we have a completely different data set, with a correlation matrix that
presents many large values allowing HRP to make well defined clusters.
The clustering process obtains two large clusters and an individual asset. The
differences between them become clear when we look at the quasi-diagonalized corre-
lation matrix.
Despite again having good results when it comes to the maximum drawdown, HRP again presents a higher volatility than its alternatives.
It is important to keep in mind, however, that these magnitudes are calculated as a prediction of the performance that the portfolios will have in the future. To know the real results each portfolio would have offered, we would need data from subsequent years to simulate their evolution, which is the basis for our next case study.
2.6 Tests with time evolution

The idea is, for example, to use the data from January 2017 to January 2018 to compute an optimal set of weights and then observe how this portfolio performs during February. After that, we can use the data from February 2017 to February 2018 to calculate new optimal weights, re-calibrate the portfolio to these, and evaluate its performance in March. This way we obtain a particular variation in price every month, which is the real return of the portfolio, not just the expected one. With this return we can track the evolution of the total portfolio value throughout time.
It is also important to take into account the transaction cost incurred when moving capital from one asset to another. The presence of this cost favours methods that are stable and do not need to make large changes to their weight allocations when the asset data is updated. This is a new concept that did not arise when making static simulations and will now play an important role when evaluating a method's performance.
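A schematic version of this monthly re-calibration loop with a proportional transaction cost could look as follows; get_weights stands for any of the allocation methods (HRP, MV, RP, uniform), and the monthly resampling, the 12-month window and all the names are our own assumptions rather than the exact code used for the tests:

def backtest(prices, get_weights, window=12, cost=0.002):
    #prices: DataFrame of asset prices with a DatetimeIndex (assumed)
    monthly = prices.resample('M').last()
    rets = monthly.pct_change().dropna()
    value, w_prev, history = 1.0, None, []
    for t in range(window, len(rets)):
        w = get_weights(rets.iloc[t - window:t])        #calibrate on the past year
        turnover = 0.0 if w_prev is None else (w - w_prev).abs().sum()
        r = (w * rets.iloc[t]).sum() - cost * turnover  #realized return minus costs
        value *= 1.0 + r
        history.append(value)
        w_prev = w
    return pd.Series(history, index=rets.index[window:])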
For these simulations we used JP Morgan's ETFs, for which we have data from January 2015 to March 2021.
We can observe that HRP and RP give the best profit, with both having a very similar evolution. MV does not perform as well as the risk parity methods, but it clearly outperforms the naive uniform weights method. In particular, we see once again how all the methods manage to soften the sudden downfalls that the market suffers as a whole, which happens in this example during 2018.
In our previous tests, HRP had usually given us higher expected returns and volatility than the other methods. We can compute the expected return that every method provides monthly and then compare it to the real returns they achieve. The following graph illustrates the expected returns of all methods step by step.
It can be observed in the image that HRP and RP present very similar, correlated expected returns. However, HRP's have a higher variance, which is consistent with the higher volatility it produced in most cases. It is natural to ask how this expected return compares to the real return that HRP's portfolio obtains every month. If we compare both, we obtain the following.
It turns out that a real portfolio following the HRP method would have presented a much smaller variance than expected theoretically. We found that this feature is particular to HRP, since other methods show more similarity between their real and expected returns. For example, for minimum variance the comparison is illustrated in figure 2.22.
The reason for this difference could be that HRP introduces a preliminary step, the tree clustering, which does not stem directly from an optimization performed in the Markowitz model, and therefore does not give maximum priority to the magnitudes we are measuring, in contrast to methods like MV that minimize the volatility directly. However, as we have seen, that does not necessarily mean a poor performance.
There is a last factor to add to the simulation: the transaction cost that one has to assume when moving capital from one asset to another. This factor rewards methods that are not excessively sensitive to changes in the data, since sensitive methods may need to move large amounts of funds every time the portfolio is re-calibrated. For this test we have assumed a transaction cost of 0.2%, which must be computed and subtracted from the return at every step. When adding this factor, the evolution previously shown in figure 2.19 becomes the following:
We observe that, despite still being the second best performing method, HRP is the one that has been handicapped the most by the transaction cost, which is a sign that the allocations it gives vary greatly over time. It is interesting that HRP presents this issue while RP does not, even though both methods have shown a very similar performance when transaction costs are not taken into account. To gain better insight into this matter we can plot the weights that a particular method chooses at every point in time, to see whether the changes it makes are extreme or smooth. For example, the RP method gives us the following weights.
It seems that RP greatly values a couple of assets, of which it makes most of the portfolio, sometimes choosing one over the other or balancing both with a similar weight, while keeping the other weights at low values. It looks like a reasonable evolution and does not present extreme changes, which accounts for the low costs RP suffers. However, HRP's graph looks much more chaotic.
While HRP also chooses one or two favourite assets of which it makes most of the portfolio, these vary greatly at every point in time. Having to move almost the entirety of the capital in every step, it is logical that this method is the most affected by the transaction costs. To solve this issue, it would be possible to implement an in-between step before updating the weights: comparing the expected performance of the new weights against the previous ones and deciding whether the improvement is worth the transaction cost. If it is not, the update can be postponed until the next time period.
However, there are two important remarks to be made about this case. First, as in the previous case, the real returns that HRP generates are much more stable and consistent than what is predicted using the Markowitz model.
Secondly, when we add transaction costs to the equation, HRP does not underperform as it did with the first data set. In fact, it keeps up with RP, which again does not suffer heavy losses from these costs and maintains an evolution similar to the one without them.
This leads us to believe that, even though in this case HRP did not outperform methods like MV, its issue with re-calibrating portfolios too drastically and incurring too many costs is situational and does not appear in every scenario.
Chapter 3
Robo-advisory
Robo-advisors are platforms that give financial analysis and advice to their users in a fast and personalized way. In today's world we have to work with large quantities of information shared at high speed, and the number of customers grows rapidly. Therefore it is clear that a platform that can automatically process data and give advice to thousands of customers at the same time can have many advantages over traditional financial advisory. For example, it could detect when certain parameters of a portfolio are outside the expected range and quickly inform the user, or even make automatic changes to solve the issue. In this chapter we will give examples of successful robo-advisors and explain their methods, to see how their approach differs from the classical methods we have shown so far and whether HRP could be used to improve on some of their results. A typical robo-advisory service involves, among other stages:
• Profiling of the investor, classifying their risk tolerance, financial status and investment goals.
• Monitoring and continuous adjusting of the portfolio to maintain its target features.
3.1 Scalable capital

3.1.1 Investment Universe

Scalable Capital invests through exchange-traded funds (ETFs) and selects the ETFs in its investment universe by looking at several qualities that they may or may not present.

First, how expensive they are in terms of commissions, fees, management costs, etc. This is summarised by the Total Expense Ratio (TER), calculated as the total annual cost of the fund relative to the assets it holds; the lower it is, the better for the investor. The liquidity of the ETF should also be taken into account: it is preferable to invest in ETFs with large trading volumes, which have lower transaction costs. There is also the accuracy of the ETF when it comes to tracking the underlying index it is based on. There are strategies that invest only in a subset of the assets of the index to reduce costs, but Scalable Capital prefers ETFs with high tracking accuracy.

When it comes to diversification, it is clearly ideal to diversify as much as possible; however, investing in very broad indexes can increase the TER, since these often include elements with low liquidity. Therefore it is best to find a balance between diversifying as much as possible and keeping expenses low. Also, tax regulations and currencies vary between countries, and both have to be taken into account when selecting the ETF universe. It is also preferable to invest in ETFs that are as fractional as possible, with cheaper units, so that it is possible to choose more accurately the amount we wish to invest or to make more precise adjustments. Lastly, when it comes to evaluating the risk of an ETF it might be necessary to look not only at its market risk but also at other types of risk. For example, some ETFs do not buy the underlying asset but replicate its price by buying derivative agreements with a third party; in this case, the possibility that this third party is not able to fulfill its part adds risk to the equation.
3.1.2 Risk Assessment

To quantify the risk an investor is exposed to, Scalable Capital focuses on downside risk measures such as the following.
• Value at risk (VaR): this is the loss that is only exceeded with less than a given probability. For example, if the VaR at 5% has a value of 10%, this means that only in 5% of cases would we observe a loss of 10% or more. For a Gaussian distribution, this is equivalent to finding the point that leaves a tail of 5% probability to its left.
• Expected shortfall (ES): we define the ES as the expected loss we would suffer given that we fall outside the confidence interval used for the VaR. In the previous example, this would be the expected loss conditional on the loss exceeding 10%, that is, the expected loss within the realm of the 5% worst possibilities. (A small numerical illustration of both measures follows this list.)
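As an illustration of these two measures (not Scalable Capital's actual methodology), their Gaussian versions at the 5% level can be computed as follows:

import numpy as np
from scipy.stats import norm

def gaussian_var_es(returns, level=0.05):
    #Parametric VaR and expected shortfall under a normal assumption
    mu, sigma = np.mean(returns), np.std(returns)
    z = norm.ppf(level)                       #about -1.645 at the 5% level
    var = -(mu + sigma * z)                   #loss exceeded with probability `level`
    es = -(mu - sigma * norm.pdf(z) / level)  #expected loss beyond the VaR
    return var, es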
The value of a linear product depends linearly on the price of its underlying asset, while the value of a non-linear one, like an option, can depend on many other factors. This is important to determine whether a risk measure is subadditive. Subadditivity means that the value of the measure evaluated on the sum of two positions is at most the sum of the measure evaluated on each position individually. For example, VaR is subadditive for linear derivatives, and this stems from the subadditivity of the standard deviation of normal distributions: if we have two normal distributions X and Y with standard deviations σX and σY and correlation ρ, the standard deviation of the sum is

σX+Y = sqrt(σX² + σY² + 2ρ σX σY) ≤ σX + σY.
Since VaR can be calculated as a multiple of the volatility, it inherits this property, which is very useful: the opposite, superadditivity, leads to an overestimation of the risk of the portfolio. The investment universe of Scalable Capital does not contain non-linear ETFs, and therefore it is safe to use VaR as a risk measure. In this linear case it is also true that the ES is proportional to the VaR, meaning that no extra information can be obtained from it.
Another important aspect is the ability of a risk measure to allow backtesting and empirical model validation. Backtesting means evaluating the performance of a given strategy on historical data different from the data previously used to construct the strategy. For a risk measure to be used properly in backtesting it needs to have a property named elicitability. For a variable to be called elicitable there must exist a scoring function whose expected value under a distribution attains its minimum at the variable's value. For example, we know that the mean of a distribution Y minimizes the function f(x) = E[(Y − x)²]. In this regard, VaR is elicitable while ES lacks this property, making it practically impossible to validate its effectiveness by backtesting.
Next is computational robustness. Since the MDD and the ES have to do with the lower extreme of the distribution, it is natural that they are more susceptible to alterations in the data. To tackle this problem, one can try to look at a higher number of observations, not just a few at the end of the tail, but then the actual shape of the tail distribution is less defined and becomes similar to the original distribution. Another strategy is to look only at the most extreme observations and then use a model to extrapolate the rest of the distribution. The drawback is, of course, that we are making an arbitrary assumption about the distribution of the tail. These robustness issues speak in favour of the use of VaR. When it comes to computational complexity, however, VaR is not necessarily a convex function of the weights and therefore it can be difficult to compute. In this regard ES is preferable, so it is important to consider when it is worth taking on the extra computational complexity in order to work with VaR.
3.1.3 Optimisation
After carefully assessing an investor's maximum risk level, Scalable Capital tries to find the portfolio that offers the maximum benefit without surpassing this risk level. As mentioned before, they focus on downside risk measures instead of looking at volatility like most methods based on the Markowitz model do. Therefore one of the constraints of the problem is not surpassing the VaR the investor can accept. There are also constraints on individual assets and asset classes that limit the maximum amount of capital that can be invested in them, to ensure diversification. Scalable Capital also introduces a penalization for excessive trading into their optimization problem, to avoid unnecessary transaction costs; a toy sketch of such a constrained problem is given after the list below. In the formulation of their problem:
• lbn and ubn are the lower and upper bounds for individual assets.
• lbg and ubg are the lower and upper bounds for asset classes or groups.
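The following toy sketch, which is not Scalable Capital's actual model (their exact formulation is not reproduced here), shows how such a VaR-constrained maximization could be set up with scipy for a linear portfolio, using the Gaussian VaR and simple per-asset bounds:

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def var_constrained_portfolio(mu, Sigma, var_limit, lb=0.0, ub=0.3, level=0.05):
    #Maximize expected return subject to a Gaussian VaR limit and bounds
    n = len(mu)
    z = -norm.ppf(level)  #about 1.645 at the 5% level

    def var_slack(w):
        sigma_p = np.sqrt(w @ Sigma @ w)
        return var_limit - (z * sigma_p - w @ mu)  #must stay non-negative

    cons = [{'type': 'eq', 'fun': lambda w: w.sum() - 1.0},
            {'type': 'ineq', 'fun': var_slack}]
    res = minimize(lambda w: -(w @ mu), np.full(n, 1.0/n),
                   constraints=cons, bounds=[(lb, ub)]*n, method='SLSQP')
    return res.x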
Chapter 4
Conclusions
In conclusion, we have found that HRP presents certain characteristics that classical methods based on Markowitz's model do not have, and these can lead to advantages in several situations. A particular advantage is its human-like behaviour of comparing elements to group them into clusters, which can make its results more easily understandable for investors. It is also a method that still has room for improvement: looking into its issue with transaction costs and searching for a solution could improve its performance. The tree-clustering step that HRP uses can also be applied in other methods. For example, we have seen that Scalable Capital improves on the classical concept of variance and correlation as a risk measure, but their optimization problem still involves the concept of separating assets into different groups or clusters. That is why we believe that looking for a way to implement HRP's clustering concept in problems like this, as well as evaluating its performance in more diverse situations, can definitely be profitable.
Appendix A
Code
import pandas as pd
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from matplotlib import pyplot as plt
import seaborn as sns
import warnings
from matplotlib import cm
import matplotlib.animation as animation
from matplotlib import rc
rc('animation', html='html5')
warnings.filterwarnings("ignore")
'''
#-----------------------------------------------------------------------
#TEST 1 REDUCED DATASET
stocks = ["AMZN_2018", "MSFT_2018", "GOOG_2018"]
stocks_compl = ["AAPL_2018"] + stocks
#-----------------------------------------------------------------------
'''
#-----------------------------------------------------------------------
#TEST 2 AMPIFIED DATASET
stocks = ["AMZN_2018","GME_2018","IAG.MC_2018","ACS.MC_2018", "OCO.V_2018","TRE.MC_2018",
"MSFT_2018","REE.MC_2018", "GOOG_2018","REP.MC_2018"]
stocks_compl = ["AAPL_2018"] + stocks
#-----------------------------------------------------------------------
'''
#-----------------------------------------------------------------------
#TEST 3 DIVERSIFIED DATASET
stocks = ["BNDX_2018","EMB_2018","IEFA_2018","IEMG_2018", "ITOT_2018",
"IWN_2018", "IWS_2018","JPST_2018",
"MUB_2018","SCHV_2018","TFI_2018", "VBR_2018","VEA_2018", "VOE_2018",
"VTI_2018", "VTIP_2018","VTV_2018", "VWO_2018"]
stocks_compl = ["AGG_2018"] + stocks
#-----------------------------------------------------------------------
'''
# Read the base asset first (each ticker has its own CSV with a "Close" column);
# this read step is reconstructed following the pattern of the loop below
df = pd.read_csv(stocks_compl[0] + '.csv', index_col=0)
returns = df[["Close"]].pct_change()
prices = df[["Close"]]
# Rename the column from "Close" to the stock name that it came from
returns = returns.rename(columns={"Close": stocks_compl[0]})
prices = prices.rename(columns={"Close": stocks_compl[0]})
# Do exactly the same for all the stocks in the list, and join all
# the dataframes together
for stock in stocks:
df = pd.read_csv(stock+'.csv', index_col=0)
df_pct = df["Close"].pct_change()
df_price = df["Close"]
returns = returns.join(df_pct).rename(columns={"Close": stock})
prices = prices.join(df_price).rename(columns={"Close": stock})
#The pct_change leaves one NaN at the start of the dataframe, we must take that out.
returns = returns.dropna()
returns.head(), prices.head()
'''
#-----------------------------------------------------------------------
#TEST 4 JPMORGAN PAG 12 VALUES
stocks = ["ERJPMOGU INDEX","ERJPVLGU INDEX","JPFCCP02 INDEX"
,"JPFCVA01 INDEX", "JPVOFXB2 INDEX"]
stocks_compl = ["JMABDJSE INDEX"] + stocks
variable_read = "vals_12.csv"
#-----------------------------------------------------------------------
'''
'''
#-----------------------------------------------------------------------
#TEST 5 JPMORGAN PAG 16 VALUES
stocks = ["JMABDJSE INDEX","JPFCVA01 INDEX","JPFCVA05 INDEX",
"JPMZVEL1 INDEX", "JPMZVP4G INDEX"]
stocks_compl = ["ERJPMFGU INDEX"] + stocks
variable_read = "vals_16.csv"
#-----------------------------------------------------------------------
'''
'''
#-----------------------------------------------------------------------
#TEST 5 JPMORGAN PAG 19 VALUES
stocks = ["M2WO Index", "M2EF Index","M2US Index","M8EU Index",
"M8JP Index","M1AP Index", "LGAGTRUH Index",
"LUAGTRUU Index", "LEATTREU Index", "LGCPTRUH Index",
"LUACTRUU Index", "LP05TREH Index",
"ADXY Index", "DXY Index", "BCOMTR Index"]
stocks_compl = ["M2WD Index"] + stocks
variable_read = "vals_19.csv"
#-----------------------------------------------------------------------
'''
'''
prices = pd.read_csv(variable_read, decimal=',', index_col=0)
prices = prices.dropna()
prices = prices[stocks_compl]
returns = prices.pct_change().dropna()
returns = returns[stocks_compl]
prices.head(), returns.head()
'''
def get_quasi_diag(link):
    link = link.astype(int)
    # start from the two clusters merged in the last linkage step
    sort_ix = pd.Series([link[-1, 0], link[-1, 1]])
    # the total number of original items is the fourth entry of the last row
    num_items = link[-1, 3]
    while sort_ix.max() >= num_items:
        sort_ix.index = range(0, sort_ix.shape[0] * 2, 2)   # make space
        df0 = sort_ix[sort_ix >= num_items]                  # find clusters
        i = df0.index
        j = df0.values - num_items
        sort_ix[i] = link[j, 0]                              # replace by first child
        df0 = pd.Series(link[j, 1], index=i + 1)             # second child
        sort_ix = sort_ix.append(df0)
        sort_ix = sort_ix.sort_index()
        sort_ix.index = range(sort_ix.shape[0])
    return sort_ix.tolist()
# correlation-distance matrix and single-linkage clustering
# (mirroring the rolling backtest further below)
corr = returns.corr()
d_corr = np.sqrt(0.5 * (1 - corr))
link = linkage(d_corr, 'single')
sort_ix = get_quasi_diag(link)
stocks_compl = np.array(stocks_compl)
df_vis = returns[stocks_compl[sort_ix]]
corr2 = df_vis.corr()
ax = sns.heatmap(corr2, cmap="coolwarm")
def get_cluster_var(cov, c_items):
    # variance of a cluster under inverse-variance weighting within the cluster
    cov_ = cov.iloc[c_items, c_items]
    w_ = 1 / np.diag(cov_)
    w_ /= w_.sum()
    return np.dot(np.dot(w_, cov_), w_)

def get_rec_bipart(cov, sort_ix):
    w = pd.Series(1.0, index=sort_ix)
    c_items = [sort_ix]                 # initialise: all items in one cluster
    while len(c_items) > 0:
        # bisection, e.g.:
        # [[3, 6, 0, 9, 2, 4, 13], [5, 12, 8, 10, 7, 1, 11]]
        # [[3, 6, 0], [9, 2, 4, 13], [5, 12, 8], [10, 7, 1, 11]]
        # [[3], [6, 0], [9, 2], [4, 13], [5], [12, 8], [10, 7], [1, 11]]
        # [[6], [0], [9], [2], [4], [13], [12], [8], [10], [7], [1], [11]]
        c_items = [i[int(j):int(k)] for i in c_items for j, k in
                   ((0, len(i) / 2), (len(i) / 2, len(i))) if len(i) > 1]
        for i in range(0, len(c_items), 2):      # parse clusters in pairs
            c_items0 = c_items[i]                # left cluster
            c_items1 = c_items[i + 1]            # right cluster
            c_var0 = get_cluster_var(cov, c_items0)
            c_var1 = get_cluster_var(cov, c_items1)
            alpha = 1 - c_var0 / (c_var0 + c_var1)
            w[c_items0] *= alpha                 # split weight by inverse variance
            w[c_items1] *= 1 - alpha
    return w
def compute_MV_weights(covariances):
    # closed-form minimum-variance weights: w = C^{-1} 1 / (1' C^{-1} 1)
    inv_covar = np.linalg.inv(covariances)
    u = np.ones(len(covariances))
    x = np.dot(inv_covar, u) / np.dot(u, np.dot(inv_covar, u))
    return pd.Series(x, index=stocks_compl, name="MV")
def compute_RP_weights(covariances):
weights = (1 / np.diag(covariances))
x = weights / sum(weights)
return pd.Series(x, index = stocks_compl, name="RP")
def compute_unif_weights(covariances):
x = [1 / len(covariances) for i in range(len(covariances))]
return pd.Series(x, index = stocks_compl, name="unif")
cov = returns.cov()
# HRP weights from the recursive bisection, with integer positions mapped back
# to ticker names (same as in the rolling backtest further below)
weights_HRP = get_rec_bipart(cov, sort_ix)
weights_HRP.index = [returns.columns[i] for i in weights_HRP.index]
weights_HRP.name = "HRP"
weights_MV = compute_MV_weights(cov)
weights_RP = compute_RP_weights(cov)
weights_unif = compute_unif_weights(cov)
results = weights_HRP.to_frame()
results = results.join(weights_MV.to_frame())
results = results.join(weights_RP.to_frame())
results = results.join(weights_unif.to_frame())
results
def compute_ER(weights):
    mean = returns.mean(0)
    rt = weights * mean        # align by ticker (HRP weights are in a different order)
    return (1 + rt)**252 - 1
er_hrp = compute_ER(weights_HRP)
er_hrp.name = "HRP"
er_mv = compute_ER(weights_MV)
er_mv.name = "MV"
er_rp = compute_ER(weights_RP)
er_rp.name = "RP"
er_unif = compute_ER(weights_unif)
er_unif.name = "unif"
ers = er_hrp.to_frame()
ers = ers.join(er_mv.to_frame())
ers = ers.join(er_rp.to_frame())
ers = ers.join(er_unif.to_frame())
ers = ers.sum()
ers.name = "Expected Return"
ers = ers.to_frame()
ers
def risk_free():
return 0
def compute_mdd(weights):
df = weights * prices
df = df.sum(1)
roll_max = df.cummax()
daily_drawdown = df/roll_max - 1.0
#max_dd = daily_drawdown.cummin()
#return max_dd.min()
# Plot the results
#daily_drawdown.plot(figsize=(20, 16))
#max_dd.plot(figsize=(20, 16))
#plt.show()
return daily_drawdown.min()
data = [compute_mdd(weights_HRP)]
data.append(compute_mdd(weights_MV))
data.append(compute_mdd(weights_RP))
data.append(compute_mdd(weights_unif))
dd = pd.DataFrame(data = data, index=["HRP", "MV", "RP", "unif"],
columns = ["Max DD"])
dd
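# The definitions of `volatility`, `sharpe_R` and `dr` are not reproduced above;
# the following is a minimal, assumed reconstruction consistent with the rest of
# the code (annualised volatility, Sharpe ratio using risk_free(), and the
# diversification ratio).
def compute_vol(weights):
    w = weights.reindex(cov.index).values
    return np.sqrt(w @ (cov.values * 252) @ w)

def compute_DR(weights):
    w = weights.reindex(cov.index).values
    return (w @ np.sqrt(np.diag(cov.values) * 252)) / compute_vol(weights)

methods = {"HRP": weights_HRP, "MV": weights_MV, "RP": weights_RP, "unif": weights_unif}
volatility = pd.DataFrame({"Volatility": {k: compute_vol(w) for k, w in methods.items()}})
sharpe_R = pd.DataFrame({"Sharpe": {k: (ers.loc[k, "Expected Return"] - risk_free())
                                    / compute_vol(w) for k, w in methods.items()}})
dr = pd.DataFrame({"DR": {k: compute_DR(w) for k, w in methods.items()}})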
final_results = ers.join(volatility)
final_results = final_results.join(sharpe_R)
final_results = final_results.join(dd)
final_results = final_results.join(dr)
final_results
#-----------------------------------------------------------------------
#TEST 4 JPMORGAN PAG 12 VALUES
stocks_compl = ["JMABDJSE INDEX", "ERJPMOGU INDEX","ERJPVLGU INDEX",
"JPFCCP02 INDEX","JPFCVA01 INDEX", "JPVOFXB2 INDEX"]
variable_read = "vals_12.csv"
#-----------------------------------------------------------------------
'''
#-----------------------------------------------------------------------
#TEST 5 JPMORGAN PAG 16 VALUES
stocks_compl = ["ERJPMFGU INDEX", "JMABDJSE INDEX","JPFCVA01 INDEX",
"JPFCVA05 INDEX","JPMZVEL1 INDEX", "JPMZVP4G INDEX"]
variable_read = "vals_16.csv"
#-----------------------------------------------------------------------
'''
'''
#-----------------------------------------------------------------------
#TEST 5 JPMORGAN PAG 19 VALUES
stocks_compl = ["M2WD Index", "M2WO Index", "M2EF Index","M2US Index",
"M8EU Index", "M8JP Index","M1AP Index", "LGAGTRUH Index",
"LUAGTRUU Index", "LEATTREU Index", "LGCPTRUH Index",
"LUACTRUU Index", "LP05TREH Index",
"ADXY Index", "DXY Index", "BCOMTR Index"]
#stocks_compl = ["LGCPTRUH Index", "DXY Index", "M1AP Index",
"M2WO Index", "BCOMTR Index"]
variable_read = "vals_19.csv"
#-----------------------------------------------------------------------
'''
returns.index = pd.to_datetime(returns.index)
prices.index = pd.to_datetime(prices.index)
returns.tail()
pp_hrp = []
pp_mv = []
pp_rp = []
pp_unif = []
hrp = []
hrp_weights = pd.DataFrame(columns=stocks_compl)
mv = []
mv_weights = pd.DataFrame(columns=stocks_compl)
rp = []
rp_weights = pd.DataFrame(columns=stocks_compl)
unif = []
unif_weights = pd.DataFrame(columns=stocks_compl)
hrp_er = []
hrp_pv = []
mv_er = []
mv_pv = []
rp_er = []
rp_pv = []
unif_er = []
unif_pv = []
filenames = []
year = 2015
month = 1
k = 0
while(True):
start = str(year)+'-'+str(month)
if month == 12:
month = 1
year = year+1
else:
month+=1
if year==2021:
break
end = str(year)+'-'+str(month)
mask = (returns.index > start) & (returns.index < end)
train = returns.loc[mask]
# HRP
corr = train.corr()
d_corr = np.sqrt(0.5*(1-corr))
link = linkage(d_corr, 'single')
sort_ix = get_quasi_diag(link)
cov = train.cov()
fig, ax = plt.subplots()
sstocks = np.array(stocks_compl)
df_vis = train[sstocks[sort_ix]]
corr2 = df_vis.corr()
im = ax.imshow(corr2, cmap='bwr')
dicts={}
weights_HRP = get_rec_bipart(cov, sort_ix)
new_index = [returns.columns[i] for i in weights_HRP.index]
weights_HRP.index = new_index
weights_HRP.name = "HRP"
for i in range (len(stocks_compl)):
dicts[stocks_compl[i]] = weights_HRP[i]
hrp_weights = hrp_weights.append(dicts, ignore_index=True)
weights_MV = compute_MV_weights(cov)
for i in range (len(stocks_compl)):
dicts[stocks_compl[i]] = weights_MV[i]
mv_weights = mv_weights.append(dicts, ignore_index=True)
weights_RP = compute_RP_weights(cov)
for i in range (len(stocks_compl)):
dicts[stocks_compl[i]] = weights_RP[i]
rp_weights = rp_weights.append(dicts, ignore_index=True)
weights_unif = compute_unif_weights(cov)
for i in range (len(stocks_compl)):
dicts[stocks_compl[i]] = weights_unif[i]
unif_weights = unif_weights.append(dicts, ignore_index=True)
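    # `ss` is defined in a part of the code not reproduced here; it appears to
    # hold each asset's gross return over the training month, so a consistent
    # reconstruction (an assumption) is:
    ss = (train + 1).prod()
    a = ss * weights_HRP
    hrp.append(a.sum())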
a = ss * weights_MV
mv.append(a.sum())
a = ss * weights_RP
rp.append(a.sum())
a = ss * weights_unif
unif.append(a.sum())
    # store transaction costs
if k > 0:
n = hrp_weights.iloc[k] - hrp_weights.iloc[k-1]
n = n.abs()
n = n * ss
pp_hrp.append(n.sum()*0.001)
n = mv_weights.iloc[k] - mv_weights.iloc[k-1]
n = n.abs()
n = n * ss
pp_mv.append(n.sum()*0.001)
n = rp_weights.iloc[k] - rp_weights.iloc[k-1]
n = n.abs()
n = n * ss
pp_rp.append(n.sum()*0.001)
n = unif_weights.iloc[k] - unif_weights.iloc[k-1]
n = n.abs()
n = n * ss
pp_unif.append(n.sum()*0.001)
k += 1
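# `cp_*` are not defined in the code reproduced above; they appear to be the
# cumulative portfolio values implied by the monthly gross returns, i.e.
# (an assumption):
cp_hrp = np.cumprod(hrp)
cp_mv = np.cumprod(mv)
cp_rp = np.cumprod(rp)
cp_unif = np.cumprod(unif)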
# Plot comparing the month-by-month returns of the different methods
plt.plot(cp_hrp, color="red")
plt.plot(cp_mv, color="blue")
plt.plot(cp_rp, color="green")
plt.plot(cp_unif, color="orange")
plt.xlabel("Time steps", fontsize=18)
plt.ylabel("Portfolio Value", fontsize=18)
plt.show()
cost_hrp = np.zeros(len(hrp))
cost_mv = np.zeros(len(mv))
cost_rp = np.zeros(len(rp))
cost_unif = np.zeros(len(unif))
for i in range(len(hrp)):
if i==0:
cost_hrp[i] = hrp[i]
cost_mv[i] = mv[i]
cost_rp[i] = rp[i]
cost_unif[i] = unif[i]
else:
cost_hrp[i] = (hrp[i] - pp_hrp[i-1]) * np.product(cost_hrp[i-1])
cost_mv[i] = (mv[i] - pp_mv[i-1]) * np.product(cost_mv[i-1])
cost_rp[i] = (rp[i] - pp_rp[i-1]) * np.product(cost_rp[i-1])
cost_unif[i] = (unif[i] - pp_unif[i-1]) * np.product(cost_unif[i-1])
# Plot comparing the month-by-month expected returns of the different methods
fig = plt.figure(figsize=(25, 10))
plt.plot(hrp_er, color="red")
plt.plot(mv_er, color="blue")
plt.plot(rp_er, color="green")
plt.plot(unif_er, color="orange")
plt.xlabel("Time steps", fontsize=18)
plt.ylabel("Monthly expected return", fontsize=18)
plt.show()
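# Figure 1: expected vs. realised cumulative return for HRP; the figure setup
# and the expected-return curve are missing above and, by symmetry with
# Figure 2 below, are presumably:
fig = plt.figure(figsize=(25, 10))
plt.plot(np.cumprod(hrp_er), color="red")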
plt.plot(cp_hrp, color="blue")
plt.show()
# Figure 2: expected return vs. realised return, both cumulative, for Minimum Variance
fig = plt.figure(figsize=(25, 10))
plt.plot(np.cumprod(mv_er), color="red")
plt.plot(cp_mv, color="blue")
plt.show()
import imageio
# build gif
with imageio.get_writer('mygif.gif', mode='I') as writer:
for filename in filenames:
image = imageio.imread(filename)
writer.append_data(image)
Bibliography
[1] Harald Lohre, Carsten Rother, and Kilian Axel Schäfer. Hierarchical Risk Parity: Accounting for Tail Dependencies in Multi-Asset Multi-Factor Allocations. 2020.
[2] Emmanuel Jurczenko. Machine Learning for Asset Management: New Developments and Financial Applications. 2020.
[3] JP Morgan. Hierarchical Risk Parity: Enhancing Returns at Target Volatility. 2016.
[4] JP Morgan. Systematic Strategies Across Asset Classes: Risk Factor Approach to Investing and Portfolio Management. 2013.
[5] Marcos López de Prado. Advances in Financial Machine Learning. 2018.
[6] Marcos López de Prado. Building Diversified Portfolios that Outperform Out-of-Sample. 2015.