
Machine Learning Asset Allocation

Prof. Marcos López de Prado


Advances in Financial Machine Learning
ORIE 5256



Key Points
• Convex optimization solutions tend to be unstable, to the point of entirely
offsetting the benefits of optimization.
– For example, in the context of financial applications, it is known that portfolios optimized in-sample
often underperform the naïve (equal weights) allocation out-of-sample.
• This instability can be traced back to two sources:
– noise in the input variables
– signal structure that magnifies the estimation errors in the input variables.
• There is abundant literature discussing noise-induced instability.
• In contrast, signal-induced instability is often ignored or misunderstood.
• We introduce a new optimization method that is robust to signal-induced
instability.
• For additional details, see the full paper at: https://ssrn.com/abstract=3469961



SECTION I
Problem Statement



The Problem
• Consider an investment universe with 𝑁 assets, where the expected value of
returns is represented by an array 𝜇, and the covariance of returns is represented
by the matrix 𝑉
• We would like to minimize the variance of a portfolio with allocations 𝜔, measured
as 𝜔′𝑉𝜔, subject to achieving a target 𝜔′𝑎 = ā, where 𝑎 characterizes the optimal
solution
• The problem can be stated as

$$\min_{\omega}\ \frac{1}{2}\,\omega' V \omega \quad \text{s.t.: } \omega' a = \bar{a}$$



The Solution (1/2)
• This problem can be expressed in lagrangian form as

$$L(\omega, \lambda) = \frac{1}{2}\,\omega' V \omega - \lambda\,(\omega' a - \bar{a})$$

with first order conditions

$$\frac{\partial L(\omega, \lambda)}{\partial \omega} = V\omega - \lambda a, \qquad \frac{\partial L(\omega, \lambda)}{\partial \lambda} = \omega' a - \bar{a}$$

• Setting the first order (necessary) conditions to zero, we obtain $V\omega - \lambda a = 0 \Rightarrow \omega = \lambda V^{-1}a$,
and $\omega' a = a'\omega = \bar{a} \Rightarrow \lambda\, a' V^{-1} a = \bar{a} \Rightarrow \lambda = \frac{\bar{a}}{a' V^{-1} a}$, thus

$$\omega = \bar{a}\,\frac{V^{-1} a}{a' V^{-1} a}$$
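
As a quick numerical illustration (mine, not part of the slides), the closed-form solution can be evaluated directly with NumPy; the covariance matrix below is a toy input, and setting 𝑎 to a vector of ones yields the minimum variance portfolio (setting 𝑎 = 𝜇 yields the maximum Sharpe ratio portfolio, up to scaling).

```python
import numpy as np

def min_var_weights(V, a, a_bar=1.0):
    """Closed-form solution of min 0.5 * w'Vw subject to w'a = a_bar."""
    inv_Va = np.linalg.solve(V, a)        # V^{-1} a, without forming the inverse explicitly
    return a_bar * inv_Va / (a @ inv_Va)  # w = a_bar * V^{-1}a / (a'V^{-1}a)

# Toy example: two assets with 10% correlation
V = np.array([[0.04, 0.006],
              [0.006, 0.09]])
a = np.ones(2)                            # a = 1 characterizes the minimum variance portfolio
w = min_var_weights(V, a)
print(w, w.sum())                         # the weights satisfy w'a = 1
```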



The Solution (2/2)
• The second order (sufficient) condition confirms that this solution is the minimum
of the lagrangian,

$$\begin{vmatrix} \frac{\partial^{2} L(\omega,\lambda)}{\partial \omega^{2}} & \frac{\partial^{2} L(\omega,\lambda)}{\partial \omega\,\partial \lambda} \\ \frac{\partial^{2} L(\omega,\lambda)}{\partial \lambda\,\partial \omega} & \frac{\partial^{2} L(\omega,\lambda)}{\partial \lambda^{2}} \end{vmatrix} = \begin{vmatrix} V' & -a \\ a' & 0 \end{vmatrix} = a'a \ge 0$$

• The issue is that this solution is mathematically correct but impractical, among other
reasons due to numerical instabilities



Numerical Instability
• The common approach to estimating 𝜔∗ is to compute

$$\hat{\omega}^{*} = \bar{a}\,\frac{\hat{V}^{-1}\hat{a}}{\hat{a}'\hat{V}^{-1}\hat{a}}$$

where $\hat{V}$ is the estimated 𝑉, and $\hat{a}$ is the estimated 𝑎.

• In general, replacing each variable with its estimate will lead to unstable solutions,
that is, solutions where a small change in the inputs will cause extreme changes in $\hat{\omega}^{*}$
• This is problematic, because in many practical applications there are material costs
associated with the re-allocation from one solution to another
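
A minimal sketch of this sensitivity (my own illustration, not from the slides): with two highly correlated assets, a perturbation of the covariance estimate on the order of 1% of its entries moves the estimated weights materially.

```python
import numpy as np

def min_var_weights(V, a, a_bar=1.0):
    inv_Va = np.linalg.solve(V, a)
    return a_bar * inv_Va / (a @ inv_Va)

rho, sigma = 0.999, 0.1                      # two highly correlated assets
V = sigma**2 * np.array([[1.0, rho],
                         [rho, 1.0]])
a = np.ones(2)
w0 = min_var_weights(V, a)                   # exact solution: (0.5, 0.5)

rng = np.random.default_rng(0)
eps = 1e-4 * rng.standard_normal((2, 2))     # tiny perturbation of the covariance estimate
V_hat = V + (eps + eps.T) / 2                # keep the perturbed matrix symmetric
w1 = min_var_weights(V_hat, a)
print(w0, w1)                                # the weights shift materially despite the tiny change
```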



SECTION II
Noise-induced Instability



The Marcenko-Pastur Distribution (1/2)
• Consider a matrix of independent and identically distributed random observations
𝑋, of size 𝑇𝑥𝑁, where the underlying process generating the observations has zero
mean and variance 𝜎 2
• The matrix $C = T^{-1}X'X$ has eigenvalues 𝜆 that asymptotically converge (as 𝑁 → +∞
and 𝑇 → +∞ with 1 < 𝑇/𝑁 < +∞) to the Marcenko-Pastur probability density
function (PDF),

$$f(\lambda) = \begin{cases} \dfrac{T}{N}\,\dfrac{\sqrt{(\lambda_{+} - \lambda)(\lambda - \lambda_{-})}}{2\pi\lambda\sigma^{2}} & \text{if } \lambda \in [\lambda_{-}, \lambda_{+}] \\ 0 & \text{if } \lambda \notin [\lambda_{-}, \lambda_{+}] \end{cases}$$



The Marcenko-Pastur Distribution (2/2)
… where the maximum expected eigenvalue is $\lambda_{+} = \sigma^{2}\bigl(1 + \sqrt{N/T}\bigr)^{2}$, and the minimum
expected eigenvalue is $\lambda_{-} = \sigma^{2}\bigl(1 - \sqrt{N/T}\bigr)^{2}$. When $\sigma^{2} = 1$, then 𝐶 is the correlation
matrix associated with 𝑋.

Eigenvalues 𝜆 ∈ [𝜆−, 𝜆+] are consistent with random behavior due
to the finite sample size. Specifically, we associate eigenvalues
𝜆 ∈ [0, 𝜆+] with estimation error.

Problem: In empirical covariance matrices, most of the eigenvalues
fall under the Marcenko-Pastur distribution, and are insignificant.
The implication is that neither $C^{-1}$ nor $V^{-1}$ can be computed
robustly. Solutions are only optimal in-sample, not out-of-sample.
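
A small helper (a sketch of mine based on the formulas above; the names mp_pdf, var and q are my own) that returns the Marcenko-Pastur density and its support for q = T/N:

```python
import numpy as np

def mp_pdf(var, q, pts=1000):
    """Marcenko-Pastur pdf for variance `var` (sigma^2) and ratio q = T/N > 1."""
    e_min = var * (1 - (1.0 / q) ** 0.5) ** 2      # lambda_minus
    e_max = var * (1 + (1.0 / q) ** 0.5) ** 2      # lambda_plus
    e_val = np.linspace(e_min, e_max, pts)
    pdf = q * np.sqrt((e_max - e_val) * (e_val - e_min)) / (2 * np.pi * var * e_val)
    return e_val, pdf

# Example: T = 1000 observations of N = 500 i.i.d. series with unit variance
e_val, pdf = mp_pdf(var=1.0, q=1000 / 500)
print(e_val[0], e_val[-1])                         # the support [lambda_minus, lambda_plus]
```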



Fitting the Marcenko-Pastur PDF
• Laloux et al. [2005] argue that, since part of the variance is caused by random
eigenvectors, we can adjust 𝜎 2 accordingly in the above equations
– For instance, if we assume that the eigenvector associated with the highest eigenvalue is not
random, then we should replace $\sigma^{2}$ with $\sigma^{2}\bigl(1 - \frac{\lambda_{+}}{N}\bigr)$ in the above equations

• In fact, we can fit the function $f(\lambda)$ to the empirical distribution of the eigenvalues
to derive the implied $\sigma^{2}$

That will give us the variance that is explained by the random
eigenvectors present in the correlation matrix, and it will determine
the cut-off level 𝜆+, adjusted for the presence of non-random
eigenvectors.

Key point: Because we know what eigenvalues are associated with
noise, we can shrink only those, without diluting the signal!
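
One possible way to carry out that fit (an illustrative sketch, not the authors' reference implementation; the helper names are mine) is to minimize the squared distance between the Marcenko-Pastur density and a kernel density estimate of the observed eigenvalues:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.neighbors import KernelDensity

def mp_pdf_on_grid(var, q, grid):
    """Marcenko-Pastur pdf evaluated on `grid` for variance `var` and q = T/N."""
    e_min = var * (1 - (1.0 / q) ** 0.5) ** 2
    e_max = var * (1 + (1.0 / q) ** 0.5) ** 2
    inside = (grid > e_min) & (grid < e_max)
    pdf = np.zeros_like(grid)
    pdf[inside] = q * np.sqrt((e_max - grid[inside]) * (grid[inside] - e_min)) \
        / (2 * np.pi * var * grid[inside])
    return pdf

def fit_max_eval(eigenvalues, q, bandwidth=0.01):
    """Fit sigma^2 to the empirical eigenvalue distribution; return (lambda_plus, sigma^2)."""
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(eigenvalues.reshape(-1, 1))
    grid = np.linspace(eigenvalues.min(), eigenvalues.max(), 1000)
    empirical = np.exp(kde.score_samples(grid.reshape(-1, 1)))     # empirical density
    err = lambda var: np.sum((mp_pdf_on_grid(var, q, grid) - empirical) ** 2)
    var = minimize_scalar(err, bounds=(1e-5, 1.0), method="bounded").x
    return var * (1 + (1.0 / q) ** 0.5) ** 2, var                  # adjusted lambda_plus, sigma^2

# Usage (hypothetical): eigenvalues of an empirical correlation matrix of T x N returns
# e_max, var = fit_max_eval(np.linalg.eigvalsh(corr), q=T / N)
```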



SECTION III
De-Noising and De-Toning



The Constant Residual Eigenvalue Method
• Let $\{\lambda_{n}\}_{n=1,\dots,N}$ be the set of all eigenvalues, ordered descending, and 𝑖 be the
position of the eigenvalue such that $\lambda_{i} > \lambda_{+}$ and $\lambda_{i+1} \le \lambda_{+}$
• Then we set $\lambda_{j} = \frac{1}{N-i}\sum_{k=i+1}^{N}\lambda_{k}$, for $j = i+1, \dots, N$, hence preserving the trace of the
correlation matrix

Given the eigenvector decomposition $VW = W\Lambda$, we form the
de-noised correlation matrix $C_{1}$ as

$$\tilde{C}_{1} = W \tilde{\Lambda} W'$$
$$C_{1} = \operatorname{diag}\!\bigl[\tilde{C}_{1}\bigr]^{-1/2}\, \tilde{C}_{1}\, \operatorname{diag}\!\bigl[\tilde{C}_{1}\bigr]^{-1/2}$$

where $\tilde{\Lambda}$ is the diagonal matrix holding the corrected eigenvalues.

The reason for the second transformation is to re-scale the matrix
$\tilde{C}_{1}$, so that the main diagonal of $C_{1}$ is an array of 1s.
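
A compact sketch of the constant residual eigenvalue method (my own paraphrase; `e_max` is the cut-off 𝜆+ obtained, for instance, from the fit above):

```python
import numpy as np

def denoise_corr(corr, e_max):
    """Constant residual eigenvalue method: average out the eigenvalues below e_max."""
    e_val, e_vec = np.linalg.eigh(corr)                # eigh returns eigenvalues in ascending order
    e_val, e_vec = e_val[::-1].copy(), e_vec[:, ::-1]  # sort descending
    n_signal = int(np.sum(e_val > e_max))              # eigenvalues above the noise cut-off
    if n_signal < e_val.size:
        e_val[n_signal:] = e_val[n_signal:].mean()     # flat residual spectrum preserves the trace
    corr1 = e_vec @ np.diag(e_val) @ e_vec.T
    d = 1.0 / np.sqrt(np.diag(corr1))
    return corr1 * np.outer(d, d)                      # re-scale so the main diagonal is an array of 1s
```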



The Targeted Shrinkage Method
• The numerical method described earlier is preferable to shrinkage, because it
removes the noise while preserving the signal
• Alternatively, we could target the application of the shrinkage strictly to the
random eigenvectors. Consider the correlation matrix 𝐶1

$$C_{1} = W_{L} \Lambda_{L} W_{L}' + \alpha\, W_{R} \Lambda_{R} W_{R}' + (1 - \alpha)\operatorname{diag}\!\bigl[W_{R} \Lambda_{R} W_{R}'\bigr]$$

where $W_{R}$ and $\Lambda_{R}$ are the eigenvectors and eigenvalues
associated with $\{n \mid \lambda_{n} \le \lambda_{+}\}$, $W_{L}$ and $\Lambda_{L}$ are the eigenvectors and
eigenvalues associated with $\{n \mid \lambda_{n} > \lambda_{+}\}$, and $\alpha$ regulates the
amount of shrinkage among the eigenvectors and eigenvalues
associated with noise ($\alpha \to 0$ for total shrinkage).
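
And a matching sketch (again my own illustration) of targeted shrinkage, where only the noise-related block is shrunk towards its diagonal:

```python
import numpy as np

def denoise_corr_targeted(corr, e_max, alpha=0.0):
    """Shrink only the eigenvectors/eigenvalues associated with noise (alpha -> 0: total shrinkage)."""
    e_val, e_vec = np.linalg.eigh(corr)
    e_val, e_vec = e_val[::-1], e_vec[:, ::-1]                                   # sort descending
    signal = e_val > e_max
    corr_l = e_vec[:, signal] @ np.diag(e_val[signal]) @ e_vec[:, signal].T      # signal block
    corr_r = e_vec[:, ~signal] @ np.diag(e_val[~signal]) @ e_vec[:, ~signal].T   # noise block
    return corr_l + alpha * corr_r + (1 - alpha) * np.diag(np.diag(corr_r))
```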



De-Toning (1/3)
• Financial correlation matrices usually incorporate a market component
• The market component is characterized by the first eigenvector, with loadings
$W_{n,1} \approx N^{-1/2}$, $n = 1, \dots, N$
• Accordingly, a market component affects every item of the covariance matrix


• By removing the market component, we allow a greater portion of the correlation
to be explained by components that affect specific subsets of the securities
• Intuition: De-toning is similar to removing a loud tone that prevents us from
hearing other sounds



De-Toning (2/3)
• We can remove the market component from the de-noised correlation matrix, 𝐶1 ,
to form the de-toned correlation matrix,

$$\tilde{C}_{2} = C_{1} - W_{M} \Lambda_{M} W_{M}' = W_{D} \Lambda_{D} W_{D}'$$
$$C_{2} = \operatorname{diag}\!\bigl[\tilde{C}_{2}\bigr]^{-1/2}\, \tilde{C}_{2}\, \operatorname{diag}\!\bigl[\tilde{C}_{2}\bigr]^{-1/2}$$

where 𝑊𝑀 and Λ𝑀 are the eigenvectors and eigenvalues associated with market
components (usually only one, but possibly more), and 𝑊𝐷 and Λ𝐷 are the
eigenvectors and eigenvalues associated with non-market components.
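
A sketch of de-toning along the same lines (mine; `n_market` assumes the market component corresponds to the largest eigenvalue(s) of the de-noised matrix):

```python
import numpy as np

def detone_corr(corr1, n_market=1):
    """Remove the market component(s) from a de-noised correlation matrix and re-scale the diagonal."""
    e_val, e_vec = np.linalg.eigh(corr1)
    e_val, e_vec = e_val[::-1], e_vec[:, ::-1]      # sort descending
    w_m = e_vec[:, :n_market]                       # market eigenvector(s)
    l_m = np.diag(e_val[:n_market])
    corr2 = corr1 - w_m @ l_m @ w_m.T               # C~2 = C1 - W_M Lambda_M W_M'
    d = 1.0 / np.sqrt(np.diag(corr2))
    return corr2 * np.outer(d, d)                   # unit diagonal; the matrix remains singular
```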



De-Toning (3/3)
• The de-toned correlation matrix is singular, as a result of eliminating (at least) one
eigenvector
– This is not a problem for clustering applications, as most approaches do not require the invertibility
of the correlation matrix
• Still, a de-toned correlation matrix 𝐶2 cannot be used directly for mean-variance
portfolio optimization
• Instead, we can optimize a portfolio on the selected (non-zero) principal
components, and map the optimal allocations 𝑓 ∗ back to the original basis
• The optimal allocations in the original basis are
$$\omega^{*} = W_{+} f^{*}$$

where $W_{+}$ contains only the eigenvectors that survived the de-toning process (i.e.,
with a non-null eigenvalue), and $f^{*}$ is the vector of optimal allocations to those same
components.
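
A sketch of that mapping (my own, assuming a full-investment minimum variance objective on the retained components; the slides do not prescribe a specific objective):

```python
import numpy as np

def pc_min_var_weights(corr2, vols, tol=1e-10):
    """Minimum variance weights computed on the non-null principal components of a singular matrix."""
    cov2 = corr2 * np.outer(vols, vols)             # de-toned covariance (still singular)
    e_val, e_vec = np.linalg.eigh(cov2)
    keep = e_val > tol                              # surviving components (non-null eigenvalues)
    w_plus, l_plus = e_vec[:, keep], e_val[keep]
    a_tilde = w_plus.T @ np.ones(len(vols))         # full-investment constraint in the component basis
    f_star = (a_tilde / l_plus) / (a_tilde @ (a_tilde / l_plus))   # optimal component allocations
    return w_plus @ f_star                          # map back to the original basis: w* = W+ f*
```
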
Experimental Results (1/2)
• We generate a vector of means and a covariance matrix out of 10 blocks of size 50
each, where off-diagonal elements within each block have a correlation of 0.5
– This covariance matrix is a stylized representation of a “true” (non-empirical) de-toned correlation
matrix of the S&P 500, where each block is associated with an economic sector
– Without loss of generality, the variances are drawn from a uniform distribution bounded between
5% and 20%, and the vector of means is drawn from a Normal distribution with mean and standard
deviation equal to the standard deviation from the covariance matrix
– This is consistent with the notion that in an efficient market all securities have the same expected
Sharpe ratio
• We use this means vector and covariance matrix to draw 1,000 random matrices 𝑋
of size 𝑇𝑥𝑁 = 1000𝑥500, compute the associated empirical covariance matrices
and vectors of means, and evaluate the (empirical) optimal portfolios
• We compute the root-mean-square error (RMSE) between the “empirical” and
“true” optimal portfolios
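
A sketch of how such inputs could be simulated (mine; the function and parameter names are hypothetical, and while the slides draw the variances uniformly between 5% and 20%, this sketch applies those bounds to the volatilities for simplicity):

```python
import numpy as np

def form_true_inputs(n_blocks=10, block_size=50, rho=0.5,
                     sigma_lo=0.05, sigma_hi=0.20, seed=0):
    """Block-structured 'true' correlation, random vols, and means with equal expected Sharpe ratios."""
    rng = np.random.default_rng(seed)
    n = n_blocks * block_size
    corr = np.eye(n)
    for b in range(n_blocks):
        i = b * block_size
        block = np.full((block_size, block_size), rho)
        np.fill_diagonal(block, 1.0)
        corr[i:i + block_size, i:i + block_size] = block   # intra-block correlation of 0.5
    vols = rng.uniform(sigma_lo, sigma_hi, n)
    cov = corr * np.outer(vols, vols)
    mu = rng.normal(vols, vols)    # each mean drawn with mean and std equal to the asset's volatility
    return mu, cov

mu, cov = form_true_inputs()
X = np.random.default_rng(1).multivariate_normal(mu, cov, size=1000)   # one simulated T x N panel
```
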
Experimental Results (2/2)
Minimum Variance Portfolio (RMSE)
                Not De-Noised   De-Noised
Not Shrunk      4.95E-03        1.99E-03
Shrunk          3.45E-03        1.70E-03

Maximum Sharpe Ratio Portfolio (RMSE)
                Not De-Noised   De-Noised
Not Shrunk      9.48E-01        5.27E-02
Shrunk          2.77E-01        5.17E-02

De-noising is much more effective than shrinkage: the de-noised minimum variance
portfolio incurs only 40.15% of the RMSE incurred by the minimum variance portfolio
without de-noising. That is a 59.85% reduction in RMSE from de-noising alone,
compared to a 30.22% reduction using Ledoit-Wolf shrinkage. Shrinkage adds little
benefit beyond what de-noising contributes. The reduction in RMSE from combining
de-noising with shrinkage is 65.63%, which is not much better than the result from
using de-noising only.

The de-noised maximum Sharpe ratio portfolio incurs only 5.56% of the RMSE incurred
by the maximum Sharpe ratio portfolio without de-noising. That is a 94.44% reduction
in RMSE from de-noising alone, compared to a 70.77% reduction using Ledoit-Wolf
shrinkage. While shrinkage is somewhat helpful in the absence of de-noising, it adds
no benefit in combination with de-noising. This is because shrinkage dilutes the noise
at the expense of diluting some of the signal as well.



SECTION IV
Signal-induced Instability



The Condition Number (1/2)
• Certain covariance structures can make the mean-variance optimization solution
unstable
• Consider a correlation matrix between two securities,
$$C = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$$

where 𝜌 is the correlation between their returns.
• Matrix 𝐶 can be diagonalized as $CW = W\Lambda$ as follows, where

$$W = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix}, \qquad \Lambda = \begin{pmatrix} 1+\rho & 0 \\ 0 & 1-\rho \end{pmatrix}$$



The Condition Number (2/2)
• The trace of 𝐶 is $\operatorname{tr}[C] = \Lambda_{1,1} + \Lambda_{2,2} = 2$, so 𝜌 sets how big one eigenvalue gets at
the expense of the other
• The determinant of 𝐶 is $|C| = \Lambda_{1,1}\Lambda_{2,2} = (1+\rho)(1-\rho) = 1-\rho^{2}$
– The determinant reaches its maximum at $\Lambda_{1,1} = \Lambda_{2,2} = 1$, which corresponds to the uncorrelated
case, 𝜌 = 0
– The determinant reaches its minimum at $\Lambda_{1,1} = 0$ or $\Lambda_{2,2} = 0$, which corresponds to the perfectly
correlated case, $|\rho| = 1$
• The inverse of 𝐶 is

$$C^{-1} = W \Lambda^{-1} W' = \frac{1}{|C|}\begin{pmatrix} 1 & -\rho \\ -\rho & 1 \end{pmatrix}$$

• The implication is that, the more 𝜌 deviates from zero, the bigger one eigenvalue
becomes relative to the other, causing $|C|$ to approach zero, which makes the
values of $C^{-1}$ explode. This happens regardless of the 𝑵/𝑻 ratio
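
A quick numerical check of this effect (my own example):

```python
import numpy as np

for rho in (0.0, 0.5, 0.95, 0.999):
    C = np.array([[1.0, rho], [rho, 1.0]])
    e_val = np.linalg.eigvalsh(C)
    cond = e_val.max() / e_val.min()          # condition number = (1 + rho) / (1 - rho)
    print(f"rho={rho:5.3f}  det={np.linalg.det(C):8.6f}  "
          f"condition={cond:10.1f}  max|C^-1|={np.abs(np.linalg.inv(C)).max():10.2f}")
```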



Markowitz’s Curse
• Matrix 𝐶 is just a standardized version of 𝑉, and the conclusions we drew on 𝐶 −1
apply to the 𝑉 −1 used to compute 𝜔∗
• When securities within a portfolio are highly correlated (−1 < 𝜌 ≪ 0 or 0 ≪ 𝜌 <
1), 𝐶 has a high condition number, and the values of 𝑉 −1 explode
• This is problematic in the context of portfolio optimization, because 𝜔∗ depends
on 𝑉 −1 , and unless 𝜌 ≈ 0, we must expect an unstable solution to the convex
optimization program
• In other words, Markowitz’s solution is guaranteed to be numerically stable only if
𝜌 ≈ 0, which is precisely the case when we don’t need it!
• The reason we needed Markowitz was to handle the 𝜌 ≉ 0 case, but the more we
need Markowitz, the more numerically unstable is its estimation of 𝜔∗



Signal-induced Instability in Finance
• When a subset of securities exhibits greater correlation among themselves than to
the rest of the investment universe, that subset forms a cluster within the
correlation matrix
• Clusters appear naturally, as a consequence of hierarchical relationships
• When 𝐾 securities form a cluster, they are more heavily exposed to a common
eigenvector, which implies that the associated eigenvalue explains a greater
amount of variance

But because the trace of the correlation matrix is exactly 𝑁, that means
that an eigenvalue can only increase at the expense of the other 𝑁 − 𝐾
eigenvalues, resulting in a condition number greater than 1.

Accordingly, the greater the intra-cluster correlation is, the higher the
condition number becomes.



SECTION V
The NCO Algorithm



NCO’s Structure (1/2)
• The Nested Clustered Optimization (NCO) algorithm is composed of five steps (a code sketch follows the list):
1. Correlation Clustering: Find the optimal number of clusters
• One possibility is to apply the ONC algorithm; however, NCO is agnostic as to which particular
algorithm is used to determine the number of clusters
• For large matrices, where 𝑇/𝑁 is relatively low, it is advisable to de-noise the correlation
matrix prior to clustering, following the method described in Section III
2. Intra-Cluster Weights: Compute optimal intra-cluster allocations, one optimal allocation per cluster,
using the de-noised covariance matrix. This operation can be parallelized
• If the cluster correlation matrices are nearly singular, you may de-tone them prior to
optimization
3. System Reduction: Use the intra-cluster weights to reduce the system (one row/column per cluster)
4. Inter-Cluster Weights: Compute optimal inter-cluster allocations, using the reduced covariance
matrix
5. Dot Product: The final allocation per security results from multiplying intra-cluster weights with the
inter-cluster weights
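
A condensed sketch of these five steps (my own simplification: scikit-learn's KMeans on a correlation-based distance stands in for the clustering step, which the slides leave open, and the closed-form minimum variance solution is used both within and across clusters):

```python
import numpy as np
from sklearn.cluster import KMeans

def min_var_weights(cov):
    ones = np.ones(cov.shape[0])
    w = np.linalg.solve(cov, ones)
    return w / (ones @ w)

def nco_min_var(cov, n_clusters):
    """Nested Clustered Optimization sketch for the minimum variance portfolio."""
    vols = np.sqrt(np.diag(cov))
    corr = cov / np.outer(vols, vols)
    dist = np.sqrt(np.clip(0.5 * (1.0 - corr), 0.0, None))      # correlation-based distance
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(dist)  # step 1
    n = cov.shape[0]
    intra = np.zeros((n, n_clusters))
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        intra[idx, c] = min_var_weights(cov[np.ix_(idx, idx)])  # step 2: intra-cluster weights
    cov_reduced = intra.T @ cov @ intra                         # step 3: one row/column per cluster
    inter = min_var_weights(cov_reduced)                        # step 4: inter-cluster weights
    return intra @ inter                                        # step 5: dot product
```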



NCO’s Structure (2/2)
[Flowchart: Input → De-noise → Partition variables into mutually disjoint clusters →
Opt. cluster 1, …, c, …, C → Apply intra-cluster allocations to collapse the system
(one column/row per cluster) → Optimize the collapsed system →
Final alloc = dot_product(intra-alloc, inter-alloc)]

Why does NCO beat Markowitz?

By construction, the reduced covariance matrix is close to a diagonal
matrix, and the optimization problem is close to the ideal Markowitz
case.

In other words, the clustering and intra-cluster optimization steps
have allowed us to transform a “Markowitz-cursed” problem (|𝜌| ≫ 0)
into a well-behaved problem (𝜌 ≈ 0).


Experimental Results (1/2)
• We generate a vector of means and a covariance matrix out of 10 blocks of size 5
each, where off-diagonal elements within each block have a correlation of 0.5
– This covariance matrix is a stylized representation of a “true” (non-empirical) de-toned correlation
matrix of the S&P 500, where each block is associated with an economic sector
– Without loss of generality, the variances are drawn from a uniform distribution bounded between
5% and 20%, and the vector of means is drawn from a Normal distribution with mean and standard
deviation equal to the standard deviation from the covariance matrix
– This is consistent with the notion that in an efficient market all securities have the same expected
Sharpe ratio
• We use this means vector and covariance matrix to draw 1,000 random matrices 𝑋
of size 𝑇𝑥𝑁 = 1000𝑥50, compute the associated empirical covariance matrices
and vectors of means, and evaluate the (empirical) optimal portfolios
• We compute the root-mean-square error (RMSE) between the “empirical” and
“true” optimal portfolios
Experimental Results (2/2)
Minimum Variance Portfolio (RMSE)
                Markowitz   NCO
Raw             7.95E-03    4.21E-03
Shrunk          8.89E-03    6.74E-03

Maximum Sharpe Ratio Portfolio (RMSE)
                Markowitz   NCO
Raw             7.02E-02    3.17E-02
Shrunk          6.54E-02    5.72E-02

NCO computes the minimum variance portfolio with 52.98% of Markowitz's RMSE, i.e.
a 47.02% reduction in RMSE. Ledoit-Wolf shrinkage is detrimental. Combining
shrinkage and NCO yields a 15.30% reduction in RMSE, which is better than shrinkage
but worse than NCO alone. The implication is that NCO delivers substantially lower
RMSE than Markowitz's solution, even for a small portfolio of only 50 securities, and
that shrinkage adds no value.

NCO computes the maximum Sharpe ratio portfolio with 45.17% of Markowitz's RMSE,
i.e. a 54.83% reduction in RMSE. The combination of shrinkage and NCO yields an
18.52% reduction in the RMSE of the maximum Sharpe ratio portfolio, which is better
than shrinkage but worse than NCO alone. Once again, NCO delivers substantially
lower RMSE than Markowitz's solution, and shrinkage adds no value. It is easy to test
that NCO's advantage widens for larger portfolios.



SECTION VI
Robustness Analysis via Monte Carlo



Monte Carlo Optimization Selection (MCOS)
• The tragic irony of finance: People seek asset-risk diversification while engaging in
model-risk concentration (a single point of failure, e.g. Black-Litterman)
• There is no reason to believe that one particular optimization method is the most
robust under all conditions
• The interactions between noise and signal-induced instabilities make it hard to
determine a priori what is the most robust optimization approach for a particular
problem
• Thus, rather than relying always on one particular approach, researchers should
evaluate what optimization methods are best suited to a particular setting
– We introduce a Monte Carlo approach that derives the estimation error produced by various
optimization methods on a particular set of input variables
– This information can be used to produce an ensemble (weighted) portfolio of the most robust
solutions (diversification across asset risk and model risk)



MCOS’ Structure (1/2)
1. Compute input set: Get the input variables that characterize the problem, (𝜇, 𝑉)
2. Random draw: Draw 𝑇 observations from the data-generating process
characterized by (𝜇, 𝑉), and derive from those observations the pair ($\hat{\mu}$, $\hat{V}$)
3. Optimal allocations: Compute optimal allocations for ($\hat{\mu}$, $\hat{V}$), applying 𝑀
alternative methods, resulting in 𝑀 alternative allocations
4. Storage: Record the 𝑀 allocations associated with this run of the MCOS method
5. Loop: Repeat steps 1-4 a user-defined number of times
6. Benchmark: Compute the true allocation derived from 𝜇, 𝑉
7. Estimation error: Compute the estimation error associated with each of the 𝑀
alternative methods
8. Report: Report the method that yields the most robust allocation for the particular
set of inputs 𝜇, 𝑉
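
A sketch of the loop (my own illustration; `methods` maps method names to allocation functions of the estimated inputs, and the RMSE against the true allocation is used as the estimation error):

```python
import numpy as np

def mcos(mu, cov, methods, n_obs=1000, n_sims=100, seed=0):
    """Monte Carlo estimation error of several allocation methods for a given (mu, V)."""
    rng = np.random.default_rng(seed)
    truth = {name: f(mu, cov) for name, f in methods.items()}        # step 6: true allocations
    errors = {name: [] for name in methods}
    for _ in range(n_sims):                                          # step 5: loop
        X = rng.multivariate_normal(mu, cov, size=n_obs)             # step 2: random draw
        mu_hat, cov_hat = X.mean(axis=0), np.cov(X, rowvar=False)
        for name, f in methods.items():                              # step 3: optimal allocations
            w_hat = f(mu_hat, cov_hat)
            errors[name].append(np.sqrt(np.mean((w_hat - truth[name]) ** 2)))  # step 7: error
    return {name: float(np.mean(e)) for name, e in errors.items()}   # step 8: report per method

# Usage (hypothetical), reusing min_var_weights and nco_min_var from the NCO sketch:
# results = mcos(mu, cov, {"markowitz": lambda m, V: min_var_weights(V),
#                          "nco": lambda m, V: nco_min_var(V, n_clusters=10)})
```
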
MCOS’ Structure (2/2)
[Flowchart: Input; Sim_Num = 0 → Draw → De-noise → Optimize using Opt. method 1, …, m, …, M →
Allocation 1, …, m, …, M → Record all allocations for run Sim_Num → Sim_Num += 1 and loop while
Sim_Num < Max_Sim → compute the true allocation → compute the estimation error for each
method → report the best performing method for that particular set of inputs]


For Additional Details
The first wave of quantitative innovation in
finance was led by Markowitz optimization.
Machine Learning is the second wave and it will
touch every aspect of finance. López de Prado’s
Advances in Financial Machine Learning is
essential for readers who want to be ahead of
the technology rather than being replaced by it.
— Prof. Campbell Harvey, Duke University.
Former President of the American Finance
Association.

Financial problems require very distinct
machine learning solutions. Dr. López de
Prado’s book is the first one to characterize
what makes standard machine learning tools
fail when applied to the field of finance, and the
first one to provide practical solutions to unique
challenges faced by asset managers. Everyone
who wants to understand the future of finance
should read this book.
— Prof. Frank Fabozzi, EDHEC Business School.
Editor of The Journal of Portfolio Management.



Disclaimer
• The views expressed in this document are the author's and do not necessarily
reflect those of the organizations he is affiliated with.
• No investment decision or particular course of action is recommended by this
presentation.
• All Rights Reserved. © 2018-2024 by Marcos Lopez de Prado

www.QuantResearch.org

