
Machine Learning Asset Allocation

Prof. Marcos López de Prado


Advances in Financial Machine Learning
ORIE 5256



Key Points
• Convex optimization solutions tend to be unstable, to the point of entirely
offsetting the benefits of optimization.
– For example, in the context of financial applications, it is known that portfolios optimized in-sample
often underperform the naïve (equal weights) allocation out-of-sample.
• This instability can be traced back to two sources:
– noise in the input variables
– signal structure that magnifies the estimation errors in the input variables.
• There is abundant literature discussing noise-induced instability.
• In contrast, signal-induced instability is often ignored or misunderstood.
• We introduce a new optimization method that is robust to signal-induced
instability.
• For additional details, see the full paper at: https://ssrn.com/abstract=3469961



SECTION I
Problem Statement



The Problem
• Consider an investment universe with 𝑁 assets, where the expected value of
returns is represented by an array 𝜇, and the covariance of returns is represented
by the matrix 𝑉
• We would like to minimize the variance of a portfolio with allocations 𝜔, measured
as 𝜔′𝑉𝜔, subject to achieving a target 𝜔′𝑎 = ā, where 𝑎 characterizes the optimal
solution
• The problem can be stated as

$$\min_{\omega}\ \frac{1}{2}\,\omega' V \omega \quad \text{s.t.: } \omega' a = \bar{a}$$



The Solution (1/2)
• This problem can be expressed in lagrangian form as

$$L(\omega, \lambda) = \frac{1}{2}\,\omega' V \omega - \lambda\,(\omega' a - \bar{a})$$

with first order conditions

$$\frac{\partial L(\omega, \lambda)}{\partial \omega} = V\omega - \lambda a, \qquad \frac{\partial L(\omega, \lambda)}{\partial \lambda} = \omega' a - \bar{a}$$

• Setting the first order (necessary) conditions to zero, we obtain $V\omega - \lambda a = 0 \Rightarrow \omega = \lambda V^{-1}a$,
and $\omega' a = a'\omega = \bar{a} \Rightarrow \lambda\, a' V^{-1} a = \bar{a} \Rightarrow \lambda = \frac{\bar{a}}{a' V^{-1} a}$, thus

$$\omega = \bar{a}\,\frac{V^{-1} a}{a' V^{-1} a}$$
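
As a quick numerical illustration (mine, not part of the slides), the closed-form solution can be evaluated directly with NumPy; the covariance matrix below is a toy input, and setting 𝑎 to a vector of ones yields the minimum variance portfolio (setting 𝑎 = 𝜇 yields the maximum Sharpe ratio portfolio, up to scaling).

```python
import numpy as np

def min_var_weights(V, a, a_bar=1.0):
    """Closed-form solution of min 0.5 * w'Vw subject to w'a = a_bar."""
    inv_Va = np.linalg.solve(V, a)        # V^{-1} a, without forming the inverse explicitly
    return a_bar * inv_Va / (a @ inv_Va)  # w = a_bar * V^{-1}a / (a'V^{-1}a)

# Toy example: two assets with 10% correlation
V = np.array([[0.04, 0.006],
              [0.006, 0.09]])
a = np.ones(2)                            # a = 1 characterizes the minimum variance portfolio
w = min_var_weights(V, a)
print(w, w.sum())                         # the weights satisfy w'a = 1
```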



The Solution (2/2)
• The second order (sufficient) condition confirms that this solution is the minimum
of the lagrangian,

$$\begin{vmatrix} \frac{\partial^{2} L(\omega,\lambda)}{\partial \omega^{2}} & \frac{\partial^{2} L(\omega,\lambda)}{\partial \omega\,\partial \lambda} \\ \frac{\partial^{2} L(\omega,\lambda)}{\partial \lambda\,\partial \omega} & \frac{\partial^{2} L(\omega,\lambda)}{\partial \lambda^{2}} \end{vmatrix} = \begin{vmatrix} V' & -a \\ a' & 0 \end{vmatrix} = a'a \ge 0$$

• The issue is that this solution is mathematically correct but impractical, among other
reasons due to numerical instabilities



Numerical Instability
• The common approach to estimating 𝜔∗ is to compute

$$\hat{\omega}^{*} = \bar{a}\,\frac{\hat{V}^{-1}\hat{a}}{\hat{a}'\hat{V}^{-1}\hat{a}}$$

where $\hat{V}$ is the estimated 𝑉, and $\hat{a}$ is the estimated 𝑎.

• In general, replacing each variable with its estimate will lead to unstable solutions,
that is, solutions where a small change in the inputs will cause extreme changes in $\hat{\omega}^{*}$
• This is problematic, because in many practical applications there are material costs
associated with the re-allocation from one solution to another
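
A minimal sketch of this sensitivity (my own illustration, not from the slides): with two highly correlated assets, a perturbation of the covariance estimate on the order of 1% of its entries moves the estimated weights materially.

```python
import numpy as np

def min_var_weights(V, a, a_bar=1.0):
    inv_Va = np.linalg.solve(V, a)
    return a_bar * inv_Va / (a @ inv_Va)

rho, sigma = 0.999, 0.1                      # two highly correlated assets
V = sigma**2 * np.array([[1.0, rho],
                         [rho, 1.0]])
a = np.ones(2)
w0 = min_var_weights(V, a)                   # exact solution: (0.5, 0.5)

rng = np.random.default_rng(0)
eps = 1e-4 * rng.standard_normal((2, 2))     # tiny perturbation of the covariance estimate
V_hat = V + (eps + eps.T) / 2                # keep the perturbed matrix symmetric
w1 = min_var_weights(V_hat, a)
print(w0, w1)                                # the weights shift materially despite the tiny change
```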



SECTION II
Noise-induced Instability



The Marcenko-Pastur Distribution (1/2)
• Consider a matrix of independent and identically distributed random observations
𝑋, of size 𝑇𝑥𝑁, where the underlying process generating the observations has zero
mean and variance 𝜎 2
• The matrix $C = T^{-1}X'X$ has eigenvalues 𝜆 that asymptotically converge (as 𝑁 → +∞
and 𝑇 → +∞ with 1 < 𝑇/𝑁 < +∞) to the Marcenko-Pastur probability density
function (PDF),

$$f(\lambda) = \begin{cases} \dfrac{T}{N}\,\dfrac{\sqrt{(\lambda_{+} - \lambda)(\lambda - \lambda_{-})}}{2\pi\lambda\sigma^{2}} & \text{if } \lambda \in [\lambda_{-}, \lambda_{+}] \\ 0 & \text{if } \lambda \notin [\lambda_{-}, \lambda_{+}] \end{cases}$$



The Marcenko-Pastur Distribution (2/2)
… where the maximum expected eigenvalue is $\lambda_{+} = \sigma^{2}\bigl(1 + \sqrt{N/T}\bigr)^{2}$, and the minimum
expected eigenvalue is $\lambda_{-} = \sigma^{2}\bigl(1 - \sqrt{N/T}\bigr)^{2}$. When $\sigma^{2} = 1$, then 𝐶 is the correlation
matrix associated with 𝑋.

Eigenvalues 𝜆 ∈ [𝜆−, 𝜆+] are consistent with random behavior due
to the finite sample size. Specifically, we associate eigenvalues
𝜆 ∈ [0, 𝜆+] with estimation error.

Problem: In empirical covariance matrices, most of the eigenvalues
fall under the Marcenko-Pastur distribution, and are insignificant.
The implication is that neither $C^{-1}$ nor $V^{-1}$ can be computed
robustly. Solutions are only optimal in-sample, not out-of-sample.
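
A small helper (a sketch of mine based on the formulas above; the names mp_pdf, var and q are my own) that returns the Marcenko-Pastur density and its support for q = T/N:

```python
import numpy as np

def mp_pdf(var, q, pts=1000):
    """Marcenko-Pastur pdf for variance `var` (sigma^2) and ratio q = T/N > 1."""
    e_min = var * (1 - (1.0 / q) ** 0.5) ** 2      # lambda_minus
    e_max = var * (1 + (1.0 / q) ** 0.5) ** 2      # lambda_plus
    e_val = np.linspace(e_min, e_max, pts)
    pdf = q * np.sqrt((e_max - e_val) * (e_val - e_min)) / (2 * np.pi * var * e_val)
    return e_val, pdf

# Example: T = 1000 observations of N = 500 i.i.d. series with unit variance
e_val, pdf = mp_pdf(var=1.0, q=1000 / 500)
print(e_val[0], e_val[-1])                         # the support [lambda_minus, lambda_plus]
```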



Fitting the Marcenko-Pastur PDF
• Laloux et al. [2005] argue that, since part of the variance is caused by random
eigenvectors, we can adjust 𝜎 2 accordingly in the above equations
– For instance, if we assume that the eigenvector associated with the highest eigenvalue is not
random, then we should replace $\sigma^{2}$ with $\sigma^{2}\bigl(1 - \frac{\lambda_{+}}{N}\bigr)$ in the above equations

• In fact, we can fit the function $f(\lambda)$ to the empirical distribution of the eigenvalues
to derive the implied $\sigma^{2}$

That will give us the variance that is explained by the random
eigenvectors present in the correlation matrix, and it will determine
the cut-off level 𝜆+, adjusted for the presence of non-random
eigenvectors.

Key point: Because we know what eigenvalues are associated with
noise, we can shrink only those, without diluting the signal!
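
One possible way to carry out that fit (an illustrative sketch, not the authors' reference implementation; the helper names are mine) is to minimize the squared distance between the Marcenko-Pastur density and a kernel density estimate of the observed eigenvalues:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.neighbors import KernelDensity

def mp_pdf_on_grid(var, q, grid):
    """Marcenko-Pastur pdf evaluated on `grid` for variance `var` and q = T/N."""
    e_min = var * (1 - (1.0 / q) ** 0.5) ** 2
    e_max = var * (1 + (1.0 / q) ** 0.5) ** 2
    inside = (grid > e_min) & (grid < e_max)
    pdf = np.zeros_like(grid)
    pdf[inside] = q * np.sqrt((e_max - grid[inside]) * (grid[inside] - e_min)) \
        / (2 * np.pi * var * grid[inside])
    return pdf

def fit_max_eval(eigenvalues, q, bandwidth=0.01):
    """Fit sigma^2 to the empirical eigenvalue distribution; return (lambda_plus, sigma^2)."""
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(eigenvalues.reshape(-1, 1))
    grid = np.linspace(eigenvalues.min(), eigenvalues.max(), 1000)
    empirical = np.exp(kde.score_samples(grid.reshape(-1, 1)))     # empirical density
    err = lambda var: np.sum((mp_pdf_on_grid(var, q, grid) - empirical) ** 2)
    var = minimize_scalar(err, bounds=(1e-5, 1.0), method="bounded").x
    return var * (1 + (1.0 / q) ** 0.5) ** 2, var                  # adjusted lambda_plus, sigma^2

# Usage (hypothetical): eigenvalues of an empirical correlation matrix of T x N returns
# e_max, var = fit_max_eval(np.linalg.eigvalsh(corr), q=T / N)
```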



SECTION III
De-Noising and De-Toning



The Constant Residual Eigenvalue Method
• Let $\{\lambda_{n}\}_{n=1,\dots,N}$ be the set of all eigenvalues, ordered descending, and 𝑖 be the
position of the eigenvalue such that $\lambda_{i} > \lambda_{+}$ and $\lambda_{i+1} \le \lambda_{+}$
• Then we set $\lambda_{j} = \frac{1}{N-i}\sum_{k=i+1}^{N}\lambda_{k}$, for $j = i+1, \dots, N$, hence preserving the trace of the
correlation matrix

Given the eigenvector decomposition $VW = W\Lambda$, we form the
de-noised correlation matrix $C_{1}$ as

$$\tilde{C}_{1} = W \tilde{\Lambda} W'$$
$$C_{1} = \operatorname{diag}\!\bigl[\tilde{C}_{1}\bigr]^{-1/2}\, \tilde{C}_{1}\, \operatorname{diag}\!\bigl[\tilde{C}_{1}\bigr]^{-1/2}$$

where $\tilde{\Lambda}$ is the diagonal matrix holding the corrected eigenvalues.

The reason for the second transformation is to re-scale the matrix
$\tilde{C}_{1}$, so that the main diagonal of $C_{1}$ is an array of 1s.
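
A compact sketch of the constant residual eigenvalue method (my own paraphrase; `e_max` is the cut-off 𝜆+ obtained, for instance, from the fit above):

```python
import numpy as np

def denoise_corr(corr, e_max):
    """Constant residual eigenvalue method: average out the eigenvalues below e_max."""
    e_val, e_vec = np.linalg.eigh(corr)                # eigh returns eigenvalues in ascending order
    e_val, e_vec = e_val[::-1].copy(), e_vec[:, ::-1]  # sort descending
    n_signal = int(np.sum(e_val > e_max))              # eigenvalues above the noise cut-off
    if n_signal < e_val.size:
        e_val[n_signal:] = e_val[n_signal:].mean()     # flat residual spectrum preserves the trace
    corr1 = e_vec @ np.diag(e_val) @ e_vec.T
    d = 1.0 / np.sqrt(np.diag(corr1))
    return corr1 * np.outer(d, d)                      # re-scale so the main diagonal is an array of 1s
```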



The Targeted Shrinkage Method
• The numerical method described earlier is preferable to shrinkage, because it
removes the noise while preserving the signal
• Alternatively, we could target the application of the shrinkage strictly to the
random eigenvectors. Consider the correlation matrix 𝐶1

$$C_{1} = W_{L} \Lambda_{L} W_{L}' + \alpha\, W_{R} \Lambda_{R} W_{R}' + (1 - \alpha)\operatorname{diag}\!\bigl[W_{R} \Lambda_{R} W_{R}'\bigr]$$

where $W_{R}$ and $\Lambda_{R}$ are the eigenvectors and eigenvalues
associated with $\{n \mid \lambda_{n} \le \lambda_{+}\}$, $W_{L}$ and $\Lambda_{L}$ are the eigenvectors and
eigenvalues associated with $\{n \mid \lambda_{n} > \lambda_{+}\}$, and $\alpha$ regulates the
amount of shrinkage among the eigenvectors and eigenvalues
associated with noise ($\alpha \to 0$ for total shrinkage).
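
And a matching sketch (again my own illustration) of targeted shrinkage, where only the noise-related block is shrunk towards its diagonal:

```python
import numpy as np

def denoise_corr_targeted(corr, e_max, alpha=0.0):
    """Shrink only the eigenvectors/eigenvalues associated with noise (alpha -> 0: total shrinkage)."""
    e_val, e_vec = np.linalg.eigh(corr)
    e_val, e_vec = e_val[::-1], e_vec[:, ::-1]                                   # sort descending
    signal = e_val > e_max
    corr_l = e_vec[:, signal] @ np.diag(e_val[signal]) @ e_vec[:, signal].T      # signal block
    corr_r = e_vec[:, ~signal] @ np.diag(e_val[~signal]) @ e_vec[:, ~signal].T   # noise block
    return corr_l + alpha * corr_r + (1 - alpha) * np.diag(np.diag(corr_r))
```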



De-Toning (1/3)
• Financial correlation matrices usually incorporate a market component
• The market component is characterized by the first eigenvector, with loadings
$W_{n,1} \approx N^{-1/2}$, $n = 1, \dots, N$
• Accordingly, a market component affects every item of the covariance matrix


• By removing the market component, we allow a greater portion of the correlation
to be explained by components that affect specific subsets of the securities
• Intuition: De-toning is similar to removing a loud tone that prevents us from
hearing other sounds



De-Toning (2/3)
• We can remove the market component from the de-noised correlation matrix, 𝐶1 ,
to form the de-toned correlation matrix,

$$\tilde{C}_{2} = C_{1} - W_{M} \Lambda_{M} W_{M}' = W_{D} \Lambda_{D} W_{D}'$$
$$C_{2} = \operatorname{diag}\!\bigl[\tilde{C}_{2}\bigr]^{-1/2}\, \tilde{C}_{2}\, \operatorname{diag}\!\bigl[\tilde{C}_{2}\bigr]^{-1/2}$$

where 𝑊𝑀 and Λ𝑀 are the eigenvectors and eigenvalues associated with market
components (usually only one, but possibly more), and 𝑊𝐷 and Λ𝐷 are the
eigenvectors and eigenvalues associated with non-market components.
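
A sketch of de-toning along the same lines (mine; `n_market` assumes the market component corresponds to the largest eigenvalue(s) of the de-noised matrix):

```python
import numpy as np

def detone_corr(corr1, n_market=1):
    """Remove the market component(s) from a de-noised correlation matrix and re-scale the diagonal."""
    e_val, e_vec = np.linalg.eigh(corr1)
    e_val, e_vec = e_val[::-1], e_vec[:, ::-1]      # sort descending
    w_m = e_vec[:, :n_market]                       # market eigenvector(s)
    l_m = np.diag(e_val[:n_market])
    corr2 = corr1 - w_m @ l_m @ w_m.T               # C~2 = C1 - W_M Lambda_M W_M'
    d = 1.0 / np.sqrt(np.diag(corr2))
    return corr2 * np.outer(d, d)                   # unit diagonal; the matrix remains singular
```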



De-Toning (3/3)
• The de-toned correlation matrix is singular, as a result of eliminating (at least) one
eigenvector
– This is not a problem for clustering applications, as most approaches do not require the invertibility
of the correlation matrix
• Still, a de-toned correlation matrix 𝐶2 cannot be used directly for mean-variance
portfolio optimization
• Instead, we can optimize a portfolio on the selected (non-zero) principal
components, and map the optimal allocations 𝑓 ∗ back to the original basis
• The optimal allocations in the original basis are
$$\omega^{*} = W_{+} f^{*}$$

where $W_{+}$ contains only the eigenvectors that survived the de-toning process (i.e.,
with a non-null eigenvalue), and $f^{*}$ is the vector of optimal allocations to those same
components.
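
A sketch of that mapping (my own, assuming a full-investment minimum variance objective on the retained components; the slides do not prescribe a specific objective):

```python
import numpy as np

def pc_min_var_weights(corr2, vols, tol=1e-10):
    """Minimum variance weights computed on the non-null principal components of a singular matrix."""
    cov2 = corr2 * np.outer(vols, vols)             # de-toned covariance (still singular)
    e_val, e_vec = np.linalg.eigh(cov2)
    keep = e_val > tol                              # surviving components (non-null eigenvalues)
    w_plus, l_plus = e_vec[:, keep], e_val[keep]
    a_tilde = w_plus.T @ np.ones(len(vols))         # full-investment constraint in the component basis
    f_star = (a_tilde / l_plus) / (a_tilde @ (a_tilde / l_plus))   # optimal component allocations
    return w_plus @ f_star                          # map back to the original basis: w* = W+ f*
```
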
Experimental Results (1/2)
• We generate a vector of means and a covariance matrix out of 10 blocks of size 50
each, where off-diagonal elements within each block have a correlation of 0.5
– This covariance matrix is a stylized representation of a “true” (non-empirical) de-toned correlation
matrix of the S&P 500, where each block is associated with an economic sector
– Without loss of generality, the variances are drawn from a uniform distribution bounded between
5% and 20%, and the vector of means is drawn from a Normal distribution with mean and standard
deviation equal to the standard deviation from the covariance matrix
– This is consistent with the notion that in an efficient market all securities have the same expected
Sharpe ratio
• We use this means vector and covariance matrix to draw 1,000 random matrices 𝑋
of size 𝑇𝑥𝑁 = 1000𝑥500, compute the associated empirical covariance matrices
and vectors of means, and evaluate the (empirical) optimal portfolios
• We compute the root-mean-square error (RMSE) between the “empirical” and
“true” optimal portfolios
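
A sketch of how such inputs could be simulated (mine; the function and parameter names are hypothetical, and while the slides draw the variances uniformly between 5% and 20%, this sketch applies those bounds to the volatilities for simplicity):

```python
import numpy as np

def form_true_inputs(n_blocks=10, block_size=50, rho=0.5,
                     sigma_lo=0.05, sigma_hi=0.20, seed=0):
    """Block-structured 'true' correlation, random vols, and means with equal expected Sharpe ratios."""
    rng = np.random.default_rng(seed)
    n = n_blocks * block_size
    corr = np.eye(n)
    for b in range(n_blocks):
        i = b * block_size
        block = np.full((block_size, block_size), rho)
        np.fill_diagonal(block, 1.0)
        corr[i:i + block_size, i:i + block_size] = block   # intra-block correlation of 0.5
    vols = rng.uniform(sigma_lo, sigma_hi, n)
    cov = corr * np.outer(vols, vols)
    mu = rng.normal(vols, vols)    # each mean drawn with mean and std equal to the asset's volatility
    return mu, cov

mu, cov = form_true_inputs()
X = np.random.default_rng(1).multivariate_normal(mu, cov, size=1000)   # one simulated T x N panel
```
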
Experimental Results (2/2)
Minimum Variance Portfolio (RMSE)
                Not De-Noised   De-Noised
Not Shrunk      4.95E-03        1.99E-03
Shrunk          3.45E-03        1.70E-03

Maximum Sharpe Ratio Portfolio (RMSE)
                Not De-Noised   De-Noised
Not Shrunk      9.48E-01        5.27E-02
Shrunk          2.77E-01        5.17E-02

De-noising is much more effective than shrinkage: the de-noised minimum variance
portfolio incurs only 40.15% of the RMSE incurred by the minimum variance portfolio
without de-noising. That is a 59.85% reduction in RMSE from de-noising alone,
compared to a 30.22% reduction using Ledoit-Wolf shrinkage. Shrinkage adds little
benefit beyond what de-noising contributes. The reduction in RMSE from combining
de-noising with shrinkage is 65.63%, which is not much better than the result from
using de-noising only.

The de-noised maximum Sharpe ratio portfolio incurs only 5.56% of the RMSE incurred
by the maximum Sharpe ratio portfolio without de-noising. That is a 94.44% reduction
in RMSE from de-noising alone, compared to a 70.77% reduction using Ledoit-Wolf
shrinkage. While shrinkage is somewhat helpful in the absence of de-noising, it adds
no benefit in combination with de-noising. This is because shrinkage dilutes the noise
at the expense of diluting some of the signal as well.



SECTION IV
Signal-induced Instability



The Condition Number (1/2)
• Certain covariance structures can make the mean-variance optimization solution
unstable
• Consider a correlation matrix between two securities,
$$C = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$$

where 𝜌 is the correlation between their returns.
• Matrix 𝐶 can be diagonalized as $CW = W\Lambda$ as follows, where

$$W = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix}, \qquad \Lambda = \begin{pmatrix} 1+\rho & 0 \\ 0 & 1-\rho \end{pmatrix}$$



The Condition Number (2/2)
• The trace of 𝐶 is $\operatorname{tr}[C] = \Lambda_{1,1} + \Lambda_{2,2} = 2$, so 𝜌 sets how big one eigenvalue gets at
the expense of the other
• The determinant of 𝐶 is $|C| = \Lambda_{1,1}\Lambda_{2,2} = (1+\rho)(1-\rho) = 1-\rho^{2}$
– The determinant reaches its maximum at $\Lambda_{1,1} = \Lambda_{2,2} = 1$, which corresponds to the uncorrelated
case, 𝜌 = 0
– The determinant reaches its minimum at $\Lambda_{1,1} = 0$ or $\Lambda_{2,2} = 0$, which corresponds to the perfectly
correlated case, $|\rho| = 1$
• The inverse of 𝐶 is

$$C^{-1} = W \Lambda^{-1} W' = \frac{1}{|C|}\begin{pmatrix} 1 & -\rho \\ -\rho & 1 \end{pmatrix}$$

• The implication is that, the more 𝜌 deviates from zero, the bigger one eigenvalue
becomes relative to the other, causing $|C|$ to approach zero, which makes the
values of $C^{-1}$ explode. This happens regardless of the 𝑵/𝑻 ratio
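
A quick numerical check of this effect (my own example):

```python
import numpy as np

for rho in (0.0, 0.5, 0.95, 0.999):
    C = np.array([[1.0, rho], [rho, 1.0]])
    e_val = np.linalg.eigvalsh(C)
    cond = e_val.max() / e_val.min()          # condition number = (1 + rho) / (1 - rho)
    print(f"rho={rho:5.3f}  det={np.linalg.det(C):8.6f}  "
          f"condition={cond:10.1f}  max|C^-1|={np.abs(np.linalg.inv(C)).max():10.2f}")
```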



Markowitz’s Curse
• Matrix 𝐶 is just a standardized version of 𝑉, and the conclusions we drew on 𝐶 −1
apply to the 𝑉 −1 used to compute 𝜔∗
• When securities within a portfolio are highly correlated (−1 < 𝜌 ≪ 0 or 0 ≪ 𝜌 <
1), 𝐶 has a high condition number, and the values of 𝑉 −1 explode
• This is problematic in the context of portfolio optimization, because 𝜔∗ depends
on 𝑉 −1 , and unless 𝜌 ≈ 0, we must expect an unstable solution to the convex
optimization program
• In other words, Markowitz’s solution is guaranteed to be numerically stable only if
𝜌 ≈ 0, which is precisely the case when we don’t need it!
• The reason we needed Markowitz was to handle the 𝜌 ≉ 0 case, but the more we
need Markowitz, the more numerically unstable is its estimation of 𝜔∗



Signal-induced Instability in Finance
• When a subset of securities exhibits greater correlation among themselves than to
the rest of the investment universe, that subset forms a cluster within the
correlation matrix
• Clusters appear naturally, as a consequence of hierarchical relationships
• When 𝐾 securities form a cluster, they are more heavily exposed to a common
eigenvector, which implies that the associated eigenvalue explains a greater
amount of variance

But because the trace of the correlation matrix is exactly 𝑁, that means
that an eigenvalue can only increase at the expense of the other 𝑁 − 𝐾
eigenvalues, resulting in a condition number greater than 1.

Accordingly, the greater the intra-cluster correlation is, the higher the
condition number becomes.



SECTION V
The NCO Algorithm



NCO’s Structure (1/2)
• The Nested Clustered Optimization (NCO) algorithm is composed of five steps (a code sketch follows the list):
1. Correlation Clustering: Find the optimal number of clusters
• One possibility is to apply the ONC algorithm; however, NCO is agnostic as to which particular
algorithm is used to determine the number of clusters
• For large matrices, where 𝑇/𝑁 is relatively low, it is advisable to de-noise the correlation
matrix prior to clustering, following the method described in Section III
2. Intra-Cluster Weights: Compute optimal intra-cluster allocations, one optimal allocation per cluster,
using the de-noised covariance matrix. This operation can be parallelized
• If the cluster correlation matrices are nearly singular, you may de-tone them prior to
optimization
3. System Reduction: Use the intra-cluster weights to reduce the system (one row/column per cluster)
4. Inter-Cluster Weights: Compute optimal inter-cluster allocations, using the reduced covariance
matrix
5. Dot Product: The final allocation per security results from multiplying intra-cluster weights with the
inter-cluster weights
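
A condensed sketch of these five steps (my own simplification: scikit-learn's KMeans on a correlation-based distance stands in for the clustering step, which the slides leave open, and the closed-form minimum variance solution is used both within and across clusters):

```python
import numpy as np
from sklearn.cluster import KMeans

def min_var_weights(cov):
    ones = np.ones(cov.shape[0])
    w = np.linalg.solve(cov, ones)
    return w / (ones @ w)

def nco_min_var(cov, n_clusters):
    """Nested Clustered Optimization sketch for the minimum variance portfolio."""
    vols = np.sqrt(np.diag(cov))
    corr = cov / np.outer(vols, vols)
    dist = np.sqrt(np.clip(0.5 * (1.0 - corr), 0.0, None))      # correlation-based distance
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(dist)  # step 1
    n = cov.shape[0]
    intra = np.zeros((n, n_clusters))
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        intra[idx, c] = min_var_weights(cov[np.ix_(idx, idx)])  # step 2: intra-cluster weights
    cov_reduced = intra.T @ cov @ intra                         # step 3: one row/column per cluster
    inter = min_var_weights(cov_reduced)                        # step 4: inter-cluster weights
    return intra @ inter                                        # step 5: dot product
```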



NCO’s Structure (2/2)
[Flowchart: Input → De-noise → Partition variables into mutually disjoint clusters →
Opt. cluster 1, …, c, …, C → Apply intra-cluster allocations to collapse the system
(one column/row per cluster) → Optimize the collapsed system →
Final alloc = dot_product(intra-alloc, inter-alloc)]

Why does NCO beat Markowitz?

By construction, the reduced covariance matrix is close to a diagonal
matrix, and the optimization problem is close to the ideal Markowitz
case.

In other words, the clustering and intra-cluster optimization steps
have allowed us to transform a “Markowitz-cursed” problem (|𝜌| ≫ 0)
into a well-behaved problem (𝜌 ≈ 0).


Experimental Results (1/2)
• We generate a vector of means and a covariance matrix out of 10 blocks of size 5
each, where off-diagonal elements within each block have a correlation of 0.5
– This covariance matrix is a stylized representation of a “true” (non-empirical) de-toned correlation
matrix of the S&P 500, where each block is associated with an economic sector
– Without loss of generality, the variances are drawn from a uniform distribution bounded between
5% and 20%, and the vector of means is drawn from a Normal distribution with mean and standard
deviation equal to the standard deviation from the covariance matrix
– This is consistent with the notion that in an efficient market all securities have the same expected
Sharpe ratio
• We use this means vector and covariance matrix to draw 1,000 random matrices 𝑋
of size 𝑇𝑥𝑁 = 1000𝑥50, compute the associated empirical covariance matrices
and vectors of means, and evaluate the (empirical) optimal portfolios
• We compute the root-mean-square error (RMSE) between the “empirical” and
“true” optimal portfolios
Experimental Results (2/2)
Minimum Variance Portfolio (RMSE)
                Markowitz   NCO
Raw             7.95E-03    4.21E-03
Shrunk          8.89E-03    6.74E-03

Maximum Sharpe Ratio Portfolio (RMSE)
                Markowitz   NCO
Raw             7.02E-02    3.17E-02
Shrunk          6.54E-02    5.72E-02

NCO computes the minimum variance portfolio with 52.98% of Markowitz's RMSE, i.e.
a 47.02% reduction in RMSE. Ledoit-Wolf shrinkage is detrimental. Combining
shrinkage and NCO yields a 15.30% reduction in RMSE, which is better than shrinkage
but worse than NCO alone. The implication is that NCO delivers substantially lower
RMSE than Markowitz's solution, even for a small portfolio of only 50 securities, and
that shrinkage adds no value.

NCO computes the maximum Sharpe ratio portfolio with 45.17% of Markowitz's RMSE,
i.e. a 54.83% reduction in RMSE. The combination of shrinkage and NCO yields an
18.52% reduction in the RMSE of the maximum Sharpe ratio portfolio, which is better
than shrinkage but worse than NCO alone. Once again, NCO delivers substantially
lower RMSE than Markowitz's solution, and shrinkage adds no value. It is easy to test
that NCO's advantage widens for larger portfolios.



SECTION VI
Robustness Analysis via Monte Carlo



Monte Carlo Optimization Selection (MCOS)
• The tragic irony of finance: People seek asset-risk diversification while engaging in
model-risk concentration (a single point of failure, e.g. Black-Litterman)
• There is no reason to believe that one particular optimization method is the most
robust under all conditions
• The interactions between noise and signal-induced instabilities make it hard to
determine a priori what is the most robust optimization approach for a particular
problem
• Thus, rather than relying always on one particular approach, researchers should
evaluate what optimization methods are best suited to a particular setting
– We introduce a Monte Carlo approach that derives the estimation error produced by various
optimization methods on a particular set of input variables
– This information can be used to produce an ensemble (weighted) portfolio of the most robust
solutions (diversification across asset risk and model risk)



MCOS’ Structure (1/2)
1. Compute input set: Get the input variables that characterize the problem, (𝜇, 𝑉)
2. Random draw: Draw 𝑇 observations from the data-generating process
characterized by (𝜇, 𝑉), and derive from those observations the pair ($\hat{\mu}$, $\hat{V}$)
3. Optimal allocations: Compute optimal allocations for ($\hat{\mu}$, $\hat{V}$), applying 𝑀
alternative methods, resulting in 𝑀 alternative allocations
4. Storage: Record the 𝑀 allocations associated with this run of the MCOS method
5. Loop: Repeat steps 1-4 a user-defined number of times
6. Benchmark: Compute the true allocation derived from 𝜇, 𝑉
7. Estimation error: Compute the estimation error associated with each of the 𝑀
alternative methods
8. Report: Report the method that yields the most robust allocation for the particular
set of inputs 𝜇, 𝑉
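
A sketch of the loop (my own illustration; `methods` maps method names to allocation functions of the estimated inputs, and the RMSE against the true allocation is used as the estimation error):

```python
import numpy as np

def mcos(mu, cov, methods, n_obs=1000, n_sims=100, seed=0):
    """Monte Carlo estimation error of several allocation methods for a given (mu, V)."""
    rng = np.random.default_rng(seed)
    truth = {name: f(mu, cov) for name, f in methods.items()}        # step 6: true allocations
    errors = {name: [] for name in methods}
    for _ in range(n_sims):                                          # step 5: loop
        X = rng.multivariate_normal(mu, cov, size=n_obs)             # step 2: random draw
        mu_hat, cov_hat = X.mean(axis=0), np.cov(X, rowvar=False)
        for name, f in methods.items():                              # step 3: optimal allocations
            w_hat = f(mu_hat, cov_hat)
            errors[name].append(np.sqrt(np.mean((w_hat - truth[name]) ** 2)))  # step 7: error
    return {name: float(np.mean(e)) for name, e in errors.items()}   # step 8: report per method

# Usage (hypothetical), reusing min_var_weights and nco_min_var from the NCO sketch:
# results = mcos(mu, cov, {"markowitz": lambda m, V: min_var_weights(V),
#                          "nco": lambda m, V: nco_min_var(V, n_clusters=10)})
```
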
MCOS’ Structure (2/2)
[Flowchart: Input; Sim_Num = 0 → Draw → De-noise → Optimize using Opt. method 1, …, m, …, M →
Allocation 1, …, m, …, M → Record all allocations for run Sim_Num → Sim_Num += 1 and loop while
Sim_Num < Max_Sim → compute the true allocation → compute the estimation error for each
method → report the best performing method for that particular set of inputs]


For Additional Details
The first wave of quantitative innovation in
finance was led by Markowitz optimization.
Machine Learning is the second wave and it will
touch every aspect of finance. López de Prado’s
Advances in Financial Machine Learning is
essential for readers who want to be ahead of
the technology rather than being replaced by it.
— Prof. Campbell Harvey, Duke University.
Former President of the American Finance
Association.

Financial problems require very distinct
machine learning solutions. Dr. López de
Prado’s book is the first one to characterize
what makes standard machine learning tools
fail when applied to the field of finance, and the
first one to provide practical solutions to unique
challenges faced by asset managers. Everyone
who wants to understand the future of finance
should read this book.
— Prof. Frank Fabozzi, EDHEC Business School.
Editor of The Journal of Portfolio Management.



Disclaimer
• The views expressed in this document are the author's and do not necessarily
reflect those of the organizations he is affiliated with.
• No investment decision or particular course of action is recommended by this
presentation.
• All Rights Reserved. © 2018-2024 by Marcos Lopez de Prado

www.QuantResearch.org

