Enhanced Portfolio Optimization


Financial Analysts Journal

ISSN: 0015-198X (Print) 1938-3312 (Online) Journal homepage: https://www.tandfonline.com/loi/ufaj20

Enhanced Portfolio Optimization

Lasse Heje Pedersen, Abhilash Babu & Ari Levine

To cite this article: Lasse Heje Pedersen, Abhilash Babu & Ari Levine (2021): Enhanced Portfolio
Optimization, Financial Analysts Journal, DOI: 10.1080/0015198X.2020.1854543

To link to this article: https://doi.org/10.1080/0015198X.2020.1854543

© 2021 The Author(s). Published with license by Taylor & Francis Group, LLC

Published online: 19 Feb 2021.


Full Terms & Conditions of access and use can be found at https://www.tandfonline.com/action/journalInformation?journalCode=ufaj20
Financial Analysts Journal | A Publication of CFA Institute Research
https://doi.org/10.1080/0015198X.2020.1854543

Enhanced Portfolio Optimization
Lasse Heje Pedersen, Abhilash Babu, CFA, and Ari Levine
Lasse Heje Pedersen is a principal at AQR Capital Management and a professor at Copenhagen Business School, Frederiksberg, Denmark.
Abhilash Babu, CFA, is a vice president at AQR Capital Management, Greenwich, Connecticut. Ari Levine is a principal at AQR Capital
Management, Greenwich, Connecticut.

Portfolio optimization should provide large benefits for investors, but standard mean–variance optimization (MVO) works so poorly in practice that optimization is often abandoned. Many of the approaches developed to address this issue are surrounded by mystique regarding how, why, and whether they really work. So, we sought to simplify, unify, and demystify optimization. We identified the portfolios that cause problems in standard MVO, and we present here a simple “enhanced portfolio optimization” method. Applying this method to industry momentum and time-series momentum across equities and global asset classes, we found significant alpha beyond the market, the 1/N portfolio, and standard asset pricing factors.

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way.

Disclosure: The authors report no conflicts of interest. AQR Capital Management is a global investment management firm that may or may not apply similar investment techniques or methods of analysis as described here. The views expressed here are those of the authors and not necessarily those of AQR. Lasse Heje Pedersen gratefully acknowledges support from Center for Financial Frictions (Grant No. DNRF102). We thank Michele Aghassi, Stephen Brown (the editor), Ben Davis, Victor DeMiguel, Antti Ilmanen, Roni Israelov, Bryan Kelly, Lorenzo Garlappi, Ernst Schaumburg, and Raman Uppal for helpful comments and Matthew Silverman and Jusvin Dhillon for excellent research assistance.

PL Credits: 2.0

Investors seek to construct portfolios that optimally trade off risk and expected return. A standard tool to achieve this goal is mean–variance optimization (Markowitz 1952), but mean–variance optimization (MVO) often produces large and unintuitive bets that perform poorly in practice (Michaud 1989). Indeed, finding optimization methods that beat the simple 1/N portfolio that allocates capital (or risk) equally across securities has proven surprisingly difficult (DeMiguel, Garlappi, and Uppal 2009). Perhaps as a result, many investors skip optimization altogether. Similarly, standard academic factors that bet on such characteristics as value (high book-to-market ratio minus low book-to-market ratio, or HML), size (small minus big, or SMB), and momentum (up minus down, or UMD) are constructed without the use of optimization or, in fact, the use of any volatility or correlation information (e.g., the factor models of Fama and French 1993, 2015). Theoretically, optimization should be a big help, but the practical failure of standard MVO raises several questions: Why does standard optimization perform so poorly? Is there a better way to use the information contained in estimated risks, correlations, and expected returns? If so, how much does this method improve performance?

In the study reported here, we sought to demystify optimization by addressing these questions. In short, we show (1) where the problem with standard optimization arises, (2) how to fix it in a simple way, (3) how the fix explains and unifies a number of enhanced optimization methods in the literature, and (4) that the fix works surprisingly well.

Specifically, we show the following:

1. It is well-known that the problems with standard MVO arise because of noise in the estimation of risk and expected return,1 but our contribution is to identify the “problem portfolios” that cause trouble for MVO.

2. Our fix is an “enhanced portfolio optimization” (EPO) method designed to downweight these problem portfolios. We provide a simple closed-form solution that makes EPO as simple to implement as standard MVO.

Volume 77 Number 2 © 2021 The Author(s). Published with license by Taylor & Francis Group, LLC.

3. The method unifies a broad range of existing methods to enhance portfolio optimization; it shows what these methods have in common and how they can be implemented in a simple way.

4. The EPO method improves industry momentum and time-series momentum performance in an economically and statistically significant way relative to standard benchmarks. For example, the EPO time-series momentum portfolio in global equities, bonds, currencies, and commodities shows a large improvement in Sharpe ratio and statistically significant alpha relative to equal-notional-weighted and equal-volatility-weighted time-series momentum portfolios. Similarly, in equities, we found large performance improvements relative to standard factors when we applied the EPO method to optimize industry momentum. These findings mean that the EPO method can be a powerful tool both for investment practice and for constructing strong academic factors.

To understand the poor performance of standard MVO, consider how optimization works in practice. An investor first identifies the securities that she likes and dislikes or, said differently, estimates the securities’ expected returns. Then, she estimates the securities’ risks (volatilities and correlations). All these estimates naturally have measurement errors, which can lead MVO to take large unintuitive bets that work poorly in practice.

What are the problem portfolios that plague standard MVO? We show how to find them in a simple way. To do this, we transform the standard optimization problem into the space of principal components; that is, we work with long–short portfolios that are uncorrelated with each other and are ranked by their importance, namely, their variance. Working with principal components greatly simplifies the diagnosis of the problems with standard MVO because principal components are by definition uncorrelated, which means, in turn, that the risk that MVO takes in each principal component is simply proportional to its Sharpe ratio. The least important principal components are exactly the portfolios that cause trouble for standard MVO. Indeed, these portfolios have the lowest estimated risk, and as a result, their risks tend to be slightly underestimated, as shown in Panel A of Figure 1. (In Figure 1, the least important principal components are those with the highest numbers. Figure 1 is explained in detail in the subsection “Identifying Problem Portfolios” in the section “EPO in Practice.”) Furthermore, although expected returns decrease with the principal-component number, the expected returns of the least important principal components are nevertheless too high relative to their realized returns, as can be seen in Panel B of Figure 1. As a result, from the perspective of standard MVO, these problem portfolios have large estimated Sharpe ratios, as shown in Panel C of Figure 1. These bets perform poorly in practice, as can be seen from their low realized Sharpe ratios in Panel C. MVO takes large risks in these portfolios, as shown in Panel D of Figure 1.

Having identified the problem portfolios, we show how to address the problem. In the simplest form, the solution is to reduce the estimated Sharpe ratios of the least important principal components, that is, to make their ex ante Sharpe ratios more consistent with the realized Sharpe ratios seen in Panel C of Figure 1. Reducing estimated Sharpe ratios of the least important principal components can be achieved by increasing their estimated volatilities. Furthermore, we show that increasing the ex ante volatilities of the problem portfolios is exactly the same as shrinking correlations of the original assets toward zero! Thus, correlation shrinkage directly reduces the estimated Sharpe ratios of the problem portfolios.

This method is what we call the “simple EPO.” The simple EPO first shrinks all correlations toward zero and then computes the standard MVO portfolio. The two key insights are (1) correlation shrinkage can fix errors in both risk and expected return and (2) this can be achieved by choosing the shrinkage parameter to maximize the portfolio’s Sharpe ratio (out of sample). This approach contrasts with the existing literature cited below that chooses correlation shrinkage to maximize the fit of the correlation (or variance–covariance) matrix. Tuning to maximize Sharpe ratios yields a much larger shrinkage parameter, which empirically provides a large improvement in performance and is motivated by the theory that we develop. Indeed, recall that shrinking correlations of the original assets corresponds to increasing the ex ante volatilities of the problem portfolios, which further corresponds to shrinking their Sharpe ratios, so this shrinkage addresses errors in both the risk model and expected returns.

This insight, that tuning correlation shrinkage to maximize risk-adjusted returns has more power than correlation shrinkage to reduce errors in risk alone, has deep theoretical foundations based on Bayesian estimation and robust optimization. Indeed, we solve a new form of robust optimization and show that

Second Quarter 2021

uncertainty about expected returns leads endogenously to shrinkage of correlations, even when correlations are known without error. Furthermore, we show that the solution to this robust optimization equals the solution to the seminal model of Black and Litterman (1992) and methods used in machine learning. In addition to unifying these approaches, a key contribution of this article is to explain why these methods work: they shrink correlations, which fixes the problem portfolios.

Figure 1. Understanding Problem Portfolios, 1985–2018

[Figure not reproduced. Each of the four panels is plotted against the principal component number.
Panel A. Average ex Ante Volatility and Realized Volatility by Principal Component (y-axis: volatility; series: ex ante volatility, realized volatility).
Panel B. Average ex Ante Expected Return and Realized Return by Principal Component (y-axis: return; series: ex ante expected return, realized return).
Panel C. Average Annualized ex Ante Sharpe Ratio and Annualized Realized Sharpe Ratio of TSMOM by Principal Component (y-axis: Sharpe ratio; series: ex ante SR, realized SR).
Panel D. Realized Risk Allocation of EPO and Standard MVO by Principal Component (y-axis: risk allocation, %; series: standard MVO volatility, out-of-sample EPO volatility).]

Notes: This figure shows that the least important principal components (those with high numbers) are the “problem portfolios” because the ex ante risk model underestimates their realized risk (Panel A), the ex ante expected return overestimates their realized returns (Panel B), and the ex ante Sharpe ratios are higher than those of the more intuitive factors (low-numbered principal components), whereas the reverse is true for realized Sharpe ratios (Panel C). Therefore, standard MVO invests too heavily in problem portfolios but EPO does not (Panel D). The sample consisted of monthly data for 55 global equities, bonds, commodities, and currencies.

To see how the simple EPO works in practice, consider a shrinkage parameter w ∈ [0,1]. First, we replace the off-diagonal correlation Ω_ij between any pair of assets i and j with (1 − w)Ω_ij. Then, we use this modified variance–covariance matrix to perform MVO. That is it!

Note how easy it is to do. When the EPO parameter is w = 0, there is no shrinkage, so our method yields the standard MVO. When w = 1, then all correlations


are set to zero and the solution is essentially the same as not optimizing (similar to the use of standard Fama–French factors and even more similar to the signal-weighted portfolios considered in Asness, Moskowitz, and Pedersen 2013). With any shrinkage w ∈ (0,1), we get somewhere in between standard MVO and no optimizing but in a way that works surprisingly well.2

How much shrinkage is needed? The simple answer is that this matter is an empirical question. We empirically choose out-of-sample w as follows: Each time period, we estimate what choice of w would have produced the highest Sharpe ratio in the time period up until today; then, we use this estimate in the next time period. In several applications, w = 75% worked well. Our theory provides some intuition for this finding. First, shrinking correlations means increasing the risk of unimportant principal components. To “fix” the correlation matrix (i.e., to fix errors in the risk model alone), we typically need to shrink the correlation matrix only about 5%–10%. So, why do we need a much larger shrinkage, around 75%? As explained previously, we show theoretically that errors in the estimates of expected returns also make correlation shrinkage useful, and these errors may be much larger than the errors in the correlation matrix itself. We found strong optimization improvements when we used a surprisingly large amount of shrinkage (surprisingly large from the perspective of what is needed to fix the correlation matrix from a pure risk perspective).

We also develop here a general form of EPO that allows the investor to control how close the solution stays to an “anchor portfolio.” For example, an investor benchmarked to a certain stock index may desire to control how much his optimized portfolio deviates from this benchmark; that is, he is using the benchmark as an anchor. Or an investor may have a heuristic way to construct a portfolio, for example, splitting her money equally among good stocks (1/N), and may wish for that optimized portfolio to stay close to the anchor.

Empirically, we applied our EPO method to optimize momentum portfolios using several realistic datasets. We show that EPO produces significant performance gains relative to standard benchmarks in the literature. When applied to a universe of global equity indexes, bonds, currencies, and commodities, the EPO time-series momentum portfolio substantially outperformed several benchmarks that are known to be difficult to beat. Indeed, EPO outperformed 1/N portfolios, equal-notional-weighted time-series momentum factors, equal-volatility-weighted time-series momentum factors, standard MVO, and MVO methods with enhanced risk models.

Furthermore, in the context of equity industry portfolios, the EPO industry momentum portfolio significantly outperformed the market portfolio, 1/N portfolios, standard MVO, MVO with an enhanced risk model, and standard industry momentum. The out-of-sample EPO industry momentum portfolio had significant alpha relative to the Fama–French five-factor model augmented with a standard industry momentum factor.

Related Literature
Our study is related to several approaches in the literature; indeed, one of our theoretical contributions is to unify and demystify these seemingly different frameworks.3 First, some papers have focused on using shrinkage to improve the variance–covariance estimate (Ledoit and Wolf 2003; Elton, Gruber, and Spitzer 2006), factor models (Fan, Fan, and Lv 2008), or random matrix theory (e.g., Ledoit and Wolf 2004, 2012, 2017; El Karoui 2008; Bun, Bouchaud, and Potters 2017). We found that the EPO solution significantly outperforms such approaches because EPO uses a much larger shrinkage to account for noise in estimates of both risk and expected returns (as discussed).

Second, Black and Litterman (1992) pioneered the focus on noise in expected returns. Despite the fame of their paper, it remains mysterious to many readers, who find it difficult to apply and difficult to understand where the result comes from, including what is being assumed and what the parameters mean. Although the EPO solution is seemingly different from Black and Litterman, we show that it is, in fact, equivalent to Black and Litterman. But EPO is simpler to apply and more transparent in how and why it works. Indeed, the EPO solution is given as a new expression, which shows how correlation shrinkage can help address uncertainty in expected returns.4 Furthermore, we demystify the whole approach by proposing an easy and transparent method (the simple EPO) and by illustrating how it fixes the problem portfolios.

Third, we link our approach to the literature on robust optimization (see the survey by Fabozzi, Huang, and Zhou 2010 and references in it) by showing how to solve a problem with a general “ellipsoidal uncertainty” set on the mean and by showing, perhaps surprisingly, the exact equivalence between


this form of robust optimization and the Bayesian estimator. Garlappi, Uppal, and Wang (2007) discussed robustness based on ambiguity aversion and uncovered a connection between their approach and shrinkage estimators. Raponi, Uppal, and Zaffaroni (2020) found strong results for robust portfolio optimization.

Fourth, Britten-Jones (1999) showed that standard MVO can be seen as the regression coefficient when a constant is being regressed on realized returns. Machine learning has many ways to regularize regressions, and Ao, Li, and Zheng (2019) found that a so-called LASSO regression significantly improves performance. These papers assumed that assets have constant expected returns, whereas we allow signals to vary over time. Furthermore, we show that the EPO can be viewed as a “ridge regression,” another form of regularization used in machine learning. To generate the most general form of EPO, we must consider the regression of expected returns on the variance–covariance matrix, which is related to the elastic net regression of Kozak, Nagel, and Santosh (2020).

Fifth, our empirical results extend and enhance standard factor models, in particular, industry momentum (Moskowitz and Grinblatt 1999) and time-series momentum (Moskowitz, Ooi, and Pedersen 2012). See Baltas (2015), Yang, Qian, and Belton (2019), and Baltas and Kosowski (2020) for other enhancements of time-series momentum based on risk-parity methods.

Finally, Clarke, de Silva, and Thorley (2006), studying the performance of the minimum-variance portfolio, showed the power of risk modeling when using principal components and Bayesian shrinkage even in the absence of return predictors.

Identifying the Problem with Standard Optimization
We first lay out the standard framework for portfolio choice. Then, we show how to identify problem portfolios. Appendix A contains a summary of our notation.

Standard Mean–Variance Optimization. Consider an investor’s problem of choosing a portfolio of n risky assets and a risk-free security. The risk-free return is r^f, and the risky assets have excess returns given by r = (r_1, ..., r_n)′. The investor receives a signal, s, about the assets (such as their past momentum) and, using this signal, computes the vector of the risky assets’ conditional expected excess returns, α = E(r|s). For now, assume that the investor ignores potential noise in the signal. Furthermore, rather than considering an abstract signal, assume for simplicity that the signal is already scaled to be the conditional expected excess return, that is, α = s. Similarly, the investor computes a risk model, that is, the conditional variance–covariance matrix of excess returns, Σ = var(r|s).

The investor starts with a wealth of W_0 and chooses a portfolio x = (x_1, ..., x_n)′. Specifically, x_i is the fraction of capital invested in security i; expressed differently, the investor buys x_i W_0 dollars worth of security i. Given this portfolio choice, the investor’s future wealth is

$$W = W_0 \left( 1 + r^f + x'r \right).$$

The investor seeks to maximize mean–variance utility over final wealth with absolute risk aversion $\bar{\gamma} = \gamma / W_0$:

$$E(W \mid s) - \frac{\bar{\gamma}}{2} \operatorname{var}(W \mid s) = W_0 \left( 1 + r^f + x's - \frac{\gamma}{2} x'\Sigma x \right). \tag{1}$$

Hence, to pick the investor’s optimal portfolio x, the investor optimizes as follows:

$$\max_x \left( x's - \frac{\gamma}{2} x'\Sigma x \right). \tag{2}$$

Based on the first-order condition, 0 = s − γΣx, we get the standard mean–variance-optimal portfolio:

$$x^{MVO} = \frac{1}{\gamma} \Sigma^{-1} s. \tag{3}$$

This portfolio has the highest possible Sharpe ratio among all portfolios if the expected excess return and risk are measured correctly, but the MVO portfolio is sensitive to measurement errors.

Problem Portfolios. We first show here how the problem portfolios for standard MVO can be identified by using principal components of the correlation matrix. To understand, note that the variance–covariance matrix, Σ = σΩσ, can be decomposed into the correlation matrix, Ω, and the diagonal matrix of asset volatilities,

$$\sigma = \operatorname{diag}\left( \sqrt{\Sigma_{11}}, \ldots, \sqrt{\Sigma_{nn}} \right). \tag{4}$$


Focusing on the correlation matrix essentially means that we first scale all the original assets to have equal volatility (but we could also use the variance–covariance matrix itself).

For background on principal components, note that the first principal component maximizes the function h′Ωh subject to h′h = 1. In other words, it maximizes the variance h′Ωh of any portfolio h (in the space of assets that have been scaled to unit volatility, given that we are working with the correlation matrix instead of the variance–covariance matrix). Hence, the first principal component is the most risky portfolio (for a given sum of squared weights). The second principal component maximizes the same function h′Ωh subject to being independent of the first, and so on. The last principal components are exactly those portfolios that potentially give trouble to the standard mean–variance optimization. These portfolios have, by definition, the smallest possible variance among all portfolios (relative to their sum of squared portfolio weights) but not necessarily a small magnitude of estimated expected returns. In other words, for these portfolios, the noise can easily swamp the signal, and what is worse, standard MVO tends to take large leveraged bets on these noise-driven portfolios. These points are illustrated in Figure 1 as discussed in the introduction and explained in detail in the section “EPO in Practice.”

To identify the principal components, we consider the eigendecomposition of the correlation matrix,

$$\Omega = P D P^{-1}, \tag{5}$$

where P is a matrix whose columns are the principal components (also called eigenvectors) and D is a diagonal matrix of the variances of each principal component (also called eigenvalues).

Each principal component is scaled such that the sum of squared weights is 1, that is, PP′ = I, so that P⁻¹ = P′. The principal-component (PC) portfolios have realized excess returns P′σ⁻¹r, with expected excess returns of s^p = P′σ⁻¹s and variance given by D. Because D is diagonal, these PC portfolios are uncorrelated (by construction). The portfolio optimization problem can be written as

$$x's - \frac{\gamma}{2} x'\Sigma x = (P'\sigma x)' s^p - \frac{\gamma}{2} (P'\sigma x)' D (P'\sigma x) = z's^p - \frac{\gamma}{2} z'Dz, \tag{6}$$

where z = P′σx is the overall portfolio weight measured in terms of the principal-component portfolios. Thus, the optimal portfolio weight, z, for the principal components is

$$z^{MVO} = \frac{1}{\gamma} D^{-1} s^p. \tag{7}$$

Given that all principal components are uncorrelated (that is, D⁻¹ is also a diagonal matrix calculated by simply replacing each diagonal element in D with its reciprocal), this solution means that the risk taken in principal-component portfolio i is proportional to its Sharpe ratio:

$$\underbrace{z_i^{MVO}}_{\text{notional position in portfolio } i} = \underbrace{\frac{1}{\gamma}\, \underbrace{\frac{s_i^p}{\sqrt{D_i}}}_{\text{Sharpe ratio of portfolio } i}}_{\text{desired volatility for portfolio } i} \; \underbrace{\frac{1}{\sqrt{D_i}}}_{\text{leverage needed to achieve a volatility of 1 for portfolio } i}. \tag{8}$$

The least important principal components are those with the lowest volatilities, √D_i. Any error in the estimation of the risk model is likely to lead to an underestimation of the volatilities of these portfolios (because they have been chosen as the lowest-risk portfolios). Furthermore, any noise in the estimation of expected return s_i^p will probably be large relative to the risk. Hence, as seen in Equation 8, estimation noise has two problematic effects for the least important principal components: (1) The optimizer may have a large desired volatility for such a problem portfolio because of a large (absolute value of the) Sharpe ratio (because of noise in the estimate of expected return, which is large relative to the low risk). (2) The low estimated risk, √D_i, leads the optimizer to apply high leverage to these portfolios to achieve a given level of risk. Furthermore, these two problems exacerbate each other.

Addressing the Problem: Enhanced Portfolio Optimization
Now that the problem with MVO has been identified, the solution is straightforward: Increase the estimated risk of the problem portfolios, which can be achieved by shrinking the estimated correlations of the assets, leading to the simple EPO as shown in the next subsection. The simple EPO underlies most of our empirical analysis, so readers who want to


immediately apply these insights can go directly to the empirical section, “EPO in Practice,” after reading the next subsection, “Shrinking Correlations: The Simple EPO.” Readers who are interested in why this simple EPO approach works well, how to anchor the EPO portfolio to a benchmark, and how various optimization techniques are connected should continue with all the following subsections.

Shrinking Correlations: The Simple EPO. As discussed previously, principal components (PCs) can be viewed as portfolios that are ordered by their degree of troublesomeness for portfolio optimization. In essence, the problem is that the estimated variances are likely to be too low for the safest portfolios (and too high for the riskiest ones). An easy fix is to shrink the estimated variances toward their average.5 The average variance of these PC portfolios is 1 (because they are the principal components of the correlation matrix, which has 1s along the diagonal). Hence, we can use the modified risks of the PCs:

$$\tilde{D} = (1 - \theta) D + \theta I, \tag{9}$$

where θ ∈ [0, 1] is the degree of shrinkage, I is the identity matrix, and the tilde (~) over the D means that it has been adjusted to account for estimation error. The corresponding correlation matrix for the original assets is

$$\tilde{\Omega} = P \tilde{D} P' = P \left[ (1 - \theta) D + \theta I \right] P' = (1 - \theta)\Omega + \theta I. \tag{10}$$

Hence, one can see that the adjusted correlation matrix is simply the original matrix Ω shrunk toward the identity matrix. In other words, we have shown the following:

Observation: Adjusting the volatilities of PC portfolios corresponds to adjusting the correlations of the original assets. Specifically, increasing the volatility of the problem portfolios while lowering the volatility of the important PC portfolios is the same as multiplying all the correlations of the original assets by 1 − θ.

The variance–covariance matrix with the shrunk correlations is Σ̃ = σΩ̃σ, which we can use as an input in portfolio optimization. The result is a simple enhanced portfolio optimization:

$$EPO^s = \frac{1}{\gamma} \tilde{\Sigma}^{-1} s. \tag{11}$$

This shrinkage is not only helpful in addressing misspecification of the variances; it also addresses misspecification in expected returns because it implicitly shrinks the Sharpe ratios of the PC portfolios, as we discuss further in the next subsections. That is, the simple EPO uses enough correlation information to improve diversification relative to an unoptimized portfolio but not so much correlation information that it runs into the problems of standard MVO.

Anchoring Expected Returns: A Bayesian Approach. We next address the fact that the investor’s signal, s, is observed with noise. This section considers a Bayesian approach following Black and Litterman (1992) but with a different way of expressing the solution (and different notation). We first describe the assumptions and then provide some intuition. The investor observes a vector of signals s = µ + ε, which is the true (unobserved) expected return vector, µ, plus the noise term, ε, that captures measurement errors about expected returns. The noise is normally distributed with a mean of zero and a covariance of Λ.

The investor must try to estimate true expected return µ on the basis of noisy signal s. Although standard MVO estimates the true expected return simply as the signal s that contains measurement errors, we consider a Bayesian investor who updates his “prior beliefs” about µ to make a better estimate of true expected returns by using the observed signal, that is, E(µ|s). The investor’s prior beliefs about the assets’ true expected return vector, µ, are given by

$$\mu = \gamma \Sigma a + \eta, \tag{12}$$

where γΣa is constant and η represents random fluctuations in investment opportunities. Specifically, η is normally distributed with mean zero and a covariance of τΣ for some constant τ.6 The first term in Equation 12, γΣa, is the unconditional average return, which is written (without loss of generality) as a product of risk aversion γ (defined in the subsection “Standard Mean–Variance Optimization” in the section “Identifying the Problem with Standard Optimization”); the variance–covariance matrix of returns, Σ (also defined in the subsection “Standard Mean–Variance Optimization”); and an anchor portfolio, a. Writing the average return in this way means that the anchor is the investor’s “typical portfolio.”

Intuitively, this model means that the investor is aware that the signal is estimated with error and has a framework for the nature of this error.
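Returning briefly to the simple EPO of Equations 9 through 11, the following Python sketch (an illustration, not the authors' code) computes the portfolio in two ways: by shrinking the asset correlations directly, and by adjusting the principal-component variances and mapping back with x = σ⁻¹Pz. The function names, the three-asset inputs, and the value θ = 0.75 are assumptions chosen for illustration; θ here plays the same role as the shrinkage parameter w in the introduction, since both multiply the off-diagonal correlations by one minus the parameter.

```python
import numpy as np

def simple_epo(signal, cov, theta, gamma=1.0):
    """Simple EPO (Equations 10 and 11): shrink correlations, then run standard MVO."""
    vol = np.sqrt(np.diag(cov))                      # asset volatilities (Equation 4)
    corr = cov / np.outer(vol, vol)                  # correlation matrix Omega
    corr_shrunk = (1.0 - theta) * corr + theta * np.eye(len(vol))
    cov_shrunk = corr_shrunk * np.outer(vol, vol)    # same as diag(vol) @ corr_shrunk @ diag(vol)
    return np.linalg.solve(cov_shrunk, signal) / gamma

def simple_epo_via_pcs(signal, cov, theta, gamma=1.0):
    """The same portfolio computed in principal-component space (Equations 5 to 9)."""
    vol = np.sqrt(np.diag(cov))
    corr = cov / np.outer(vol, vol)
    D, P = np.linalg.eigh(corr)                      # eigenvalues D, orthonormal eigenvectors in columns of P
    s_p = P.T @ (signal / vol)                       # expected returns of the PC portfolios
    D_shrunk = (1.0 - theta) * D + theta             # Equation 9 applied to the diagonal of D
    z = s_p / (gamma * D_shrunk)                     # Equation 7 with the adjusted variances
    return (P @ z) / vol                             # invert z = P' sigma x

# Hypothetical inputs, made up for illustration only.
vol = np.array([0.10, 0.15, 0.20])
corr = np.array([[1.0, 0.6, 0.3],
                 [0.6, 1.0, 0.5],
                 [0.3, 0.5, 1.0]])
cov = corr * np.outer(vol, vol)
signal = np.array([0.02, 0.03, 0.01])
gamma, theta = 3.0, 0.75

x_epo = simple_epo(signal, cov, theta, gamma)
x_pcs = simple_epo_via_pcs(signal, cov, theta, gamma)
x_mvo = simple_epo(signal, cov, 0.0, gamma)          # theta = 0: no shrinkage, standard MVO
x_diag = simple_epo(signal, cov, 1.0, gamma)         # theta = 1: all correlations set to zero
```

Under these assumptions `x_epo` and `x_pcs` should agree up to floating-point error, which is the Observation above in numerical form. At θ = 1 each weight reduces to s_i / (γ σ_i²), so no correlation information is used.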


This framework involves some standard parameters (the risk, Σ; the signal about expected returns, s; and risk aversion, γ), and other more mysterious parameters Λ, τ, and anchor portfolio a. The mysterious parameters can be explained as follows: The anchor portfolio is basically the investor’s typical portfolio or strategic asset allocation, τ indicates the variation in the investor’s optimal portfolio, and Λ is the amount of measurement error. We need not worry too much about these parameters, however, because we show in the upcoming subsection “Putting Optimization to Work” how the simple EPO makes all these mysterious parameters disappear!

We also consider an anchored EPO, which makes all the mysterious parameters disappear except the anchor, because having an anchor can be useful in practice, for example, to control how much an optimized portfolio deviates from a benchmark. Indeed, we can think of the anchor as the investor’s benchmark, strategic asset allocation, or typical investment strategy. To understand the anchor, note that when expected returns are at their average value (i.e., η = 0), the optimal portfolio is the anchor (i.e., x = (1/γ)Σ⁻¹µ = a).7

Anchoring Expected Returns: Robust Optimization. An alternative approach to address noise in expected returns is to use robust optimization. Robust optimization aims to improve upon standard MVO by explicitly modeling uncertainty around expected returns as a part of the optimization problem. Specifically, we want to choose the portfolio that gives the highest utility even if the expected return is the worst possible, within some uncertainty region:

$$\max_x \min_\mu \left( (x - a)'\mu - \frac{\gamma}{2} x'\Sigma x \right) \quad \text{subject to} \quad \mu \in \left\{ \mu \mid (\mu - s)'\Lambda^{-1}(\mu - s) \le c^2 \right\}. \tag{15}$$

This specification means that we seek to be robust to measurement error in the signal s about expected returns. In other words, the true expected return, µ, can deviate from s, and we wish to ensure good performance even for the worst possible µ. The parameters Λ and c control how much the true expected return can deviate from the signal, that is, the amount of measurement error in the signal s. But these mysterious parameters disappear in the simple
To solve the model, we first compute the inves-
EPO as explained in the next subsection. Finally, we
tor’s view on expected returns based on her signal
interpret the anchor portfolio, a, as a benchmark
and prior—namely, E(µ|s). Given that the investor
portfolio that we wish to outperform (or are afraid of
maximizes her mean–variance utility (as defined in
underperforming)—for example, the market portfolio.8
the section “Identifying the Problem with Standard
The solution is given in the following proposition.
Optimization”), the solution to the enhanced portfo-
lio optimization problem is then (1 / γ )Σ −1E(µ|s). The Proposition 2. The solution to the robust portfo-
following proposition summarizes the result, and all lio optimization problem is
proofs can be found in Appendix B.
1
Proposition 1. In this Bayesian model, given x = ( τΣ + Λ )−1 ( τs + γΛa), (16)
γ
the observed signal, the investor’s expected
where t depends on c and the set of solutions
return is
for c ∈ (0,∞) equals the set of solutions for
τ ∈ (0,∞).
E(µ|s) = Σ ( τΣ + Λ )−1 ( τs + γΛa), (13)

and the solution to the enhanced portfolio This result shows how robust optimization can
optimization problem is be done via shrinkage of the mean and variance–
covariance matrices. Surprisingly, the optimal portfo-
1 lio (Equation 16) is exactly the same as the solution
x = ( τΣ + Λ )−1 ( τs + γΛa). (14)
γ in the previous subsection! This result provides a
new link between robust optimization and Bayesian
Interestingly, the optimal portfolio, Equation 14,
optimization. What is the intuition behind this link?
looks like the solution to an MVO when both the
Both methods capture the ideas that signal s con-
mean and variance have been modified, even though
tains imperfect information about the conditional
here, we have only assumed that the mean contains
expected returns, that the amount of noise in the
errors. That is, errors in expected returns alone lead
signal is related to L, and that there exists an anchor
to the shrinkage of correlations, even when correla-
portfolio, a, that one might not want to deviate too
tions are assumed to be known without error.
much from.
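The Bayesian result in Proposition 1 can be checked numerically. The sketch below assumes that the signal is s = µ + ε with measurement noise ε ~ N(0, Λ) independent of η; this signal equation is not restated in this section, so treat it as an assumption. All inputs are hypothetical. The code compares Equation 13 with the textbook Gaussian posterior mean and confirms that Equation 14 (which has the same form as Equation 16) equals (1/γ)Σ⁻¹E(µ|s).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# Hypothetical inputs for illustration only.
gamma, tau = 4.0, 0.05
B = rng.normal(size=(n, n))
Sigma = B @ B.T / n + 0.05 * np.eye(n)        # positive-definite covariance
Lam = np.diag(rng.uniform(0.001, 0.01, n))    # measurement-error covariance
a = np.full(n, 1.0 / n)                       # anchor portfolio
s = rng.normal(0.02, 0.05, n)                 # observed signal

M = np.linalg.inv(tau * Sigma + Lam)

# Equation 13: posterior expected return.
post_eq13 = Sigma @ M @ (tau * s + gamma * Lam @ a)

# Textbook conjugate-normal update with prior mean gamma*Sigma*a and prior
# covariance tau*Sigma (assumes s = mu + eps, eps ~ N(0, Lam)).
prior_mean = gamma * Sigma @ a
post_textbook = prior_mean + tau * Sigma @ M @ (s - prior_mean)

# Equation 14: optimal portfolio, computed directly and via the posterior mean.
x_eq14 = M @ (tau * s + gamma * Lam @ a) / gamma
x_from_posterior = np.linalg.solve(Sigma, post_eq13) / gamma

print(np.allclose(post_eq13, post_textbook))
print(np.allclose(x_eq14, x_from_posterior))
```

The two posterior expressions agree because Λ(τΣ + Λ)⁻¹Σ and Σ(τΣ + Λ)⁻¹Λ are the same matrix, namely (τΛ⁻¹ + Σ⁻¹)⁻¹.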


Putting Optimization to Work: Simple EPO and Anchored EPO. We have discussed that estimation errors occur in both the variance–covariance matrix and in expected returns. Hence, we first fix the problem with the variance–covariance matrix by using simple shrinkage as shown in the subsection “Shrinking Correlations: The Simple EPO” (or using the random matrix theory discussed in Appendix A), giving rise to enhanced risk estimate $\tilde{\Sigma}$, and second, we enhance expected returns as described in the two sections on anchoring returns, leading to the general EPO solution:

$$\mathrm{EPO} = \frac{1}{\gamma}(\tau\tilde{\Sigma} + \Lambda)^{-1}(\tau s + \gamma\Lambda a). \tag{17}$$

The general EPO solution in Equation 17 depends on several parameters, some of which are straightforward to estimate, but others are tricky. So, we will provide some guidance. Let us start with the easier ones: The variance–covariance matrix, $\tilde{\Sigma}$, can be estimated in the standard way based on the sample counterpart, possibly enhanced with shrinkage as discussed in “Shrinking Correlations: The Simple EPO.” The signal about expected returns, s, is the investor’s favorite predictor of returns. (To be clear, predicting returns is never easy, but an investor would probably not be interested in portfolio optimization if he did not have some predictors to optimize.) The trickier parameters are the anchor, a, the risk aversion, γ, the magnitude of shocks to expected returns, τ, and the uncertainty matrix, Λ.

Starting with the uncertainty matrix, a natural assumption is that the noise in the measurement of expected returns is independent across assets; that is, Λ = λV, where λ is a constant and V is the diagonal matrix of variances or, equivalently, the matrix of squared volatilities, V = σ². The independence across assets arises, for example, from the common practice of estimating signals about returns in a way that is unrelated to the estimation of risk.⁹ Under this assumption, the EPO solution can be written as

$$\mathrm{EPO}(w) = \Sigma_w^{-1}\left([1 - w]\frac{1}{\gamma}s + wVa\right), \tag{18}$$

where $\Sigma_w$ is a shrunk variance–covariance matrix corresponding to a shrunk correlation matrix, $\tilde{\Omega}$:

$$\Sigma_w = (1 - w)\tilde{\Sigma} + wV = \sigma\left[(1 - w)\tilde{\Omega} + wI\right]\sigma, \tag{19}$$

and we denote w = λ/(τ + λ) ∈ [0, 1] as the “EPO shrinkage parameter.” A benefit of Equation 18 is that two of the tricky parameters (λ and τ) disappear, so we need to keep track of only their relative magnitude via w.

The EPO shrinkage parameter. The shrinkage parameter, w, plays a key role in our empirical implementation. We see from Equations 18 and 19 that the EPO shrinkage parameter controls the shrinkage of both (1) expected returns toward the anchor and (2) the correlations toward zero. For example, a shrinkage of w = 0 gives the standard MVO solution; a shrinkage of w = 100% yields the anchor portfolio. For the empirical implementation, we chose the shrinkage parameter in a pragmatic way (namely, as the value that yields the best risk-adjusted returns), and we show how to choose w both in-sample and out-of-sample.

Because w becomes an empirical choice variable, we write the EPO solution as a function of w, that is, EPO(w). On the one hand, intuitively, the optimal shrinkage parameter is larger when measurement errors are larger because of, for example, poor data quality, illiquidity, or using a weak return predictor (i.e., w is increasing in λ). On the other hand, the shrinkage is smaller when true expected returns fluctuate more (i.e., w is decreasing in τ).

Simple EPO. A particularly simple expression arises if we choose the anchor portfolio as $a = (1/\gamma)V^{-1}s$.¹⁰ In this case, we recover the simple EPO already discussed in the subsection “Shrinking Correlations: The Simple EPO”:

$$\mathrm{EPO}^s(w) = \frac{1}{\gamma}\Sigma_w^{-1}s, \tag{20}$$

where $\Sigma_w$ is the shrunk variance–covariance matrix from Equation 19.

Remarkably, Equation 20 is the same as the solution to a standard MVO except that the correlations (or variance–covariance matrix) have been shrunk. So, surprisingly, errors in the estimation of mean and variance make it helpful to shrink the correlations; that is, these correlations are shrunk beyond what is justified by the errors in the variance alone because errors in the estimates of expected returns also make correlation shrinkage useful. Furthermore, the simple EPO solution given in Equation 20 is linear in the risk tolerance, so performance statistics such as the Sharpe ratio do not depend on risk aversion γ.


Therefore, this expression is straightforward to implement, for example, by setting γ = 1 or any other number that corresponds to a desirable level of risk.

Anchored EPO. Some investors prefer their portfolios to be tied to an anchor, so it is useful to consider a practical implementation of an anchored EPO. For example, an investor might have a signal, s, about the assets’ expected returns based on their momentum and an anchor, a, based on the 1/N portfolio or based on a benchmark portfolio. In this case, we know all the inputs in Equation 18 except w, which we choose empirically, and risk aversion γ. So, the last question is how to choose the risk aversion. The risk aversion γ can be chosen based on the investor’s preferences (typically, a number between 1 and 10).¹¹ But using Equation 18 with a γ based on risk aversion requires that the signal be measured in the right “units.” Specifically, the signal must not only predict returns; it should also be scaled so that, for instance, $s^i = 2\%$ means that asset i has an expected return of 2%.

Suppose, instead, that our signal is proportional to expected returns but we do not really know the scale. For example, an asset’s past momentum predicts that it will outperform in the future, but we do not know by how much. Or, as another example, suppose the signal is a relative ranking of securities based on their valuations. In these cases, the risk aversion γ can be chosen based on the insight that the investor apparently likes the risk level inherent in the anchor portfolio. Note that the EPO solution (Equation 18) is essentially a mixture of the anchor portfolio and the portfolio $\Sigma_w^{-1}(1/\gamma)s$, so we can pick γ to equalize the variance of these portfolios:

$$\gamma = \frac{\sqrt{s'\Sigma_w^{-1}\tilde{\Sigma}\Sigma_w^{-1}s}}{\sqrt{a'\tilde{\Sigma}a}}, \tag{21}$$

which yields¹²

$$\mathrm{EPO}^a(w) = \Sigma_w^{-1}\left([1 - w]\frac{\sqrt{a'\tilde{\Sigma}a}}{\sqrt{s'\Sigma_w^{-1}\tilde{\Sigma}\Sigma_w^{-1}s}}\,s + wVa\right). \tag{22}$$

Equation 22 is our anchored EPO solution for anchor a based on shrinkage parameter w, where risk aversion is chosen endogenously.

A Unified Approach to Optimization. In summary, we have derived a general enhanced portfolio optimization method (Equation 17) and two straightforward ways to implement this method: the simple EPO (Equation 20) and the anchored EPO (Equation 22), which are used in our empirical implementations. The reader has already seen that the EPO method is related to several other approaches to portfolio optimization and, as described in the following proposition, the method has, in fact, even broader links to the literature.

Proposition 3. The EPO solution (Equation 17) is equal to

1. standard MVO when the estimate of variance has no noise, so $\tilde{\Sigma} = \Sigma$, and the signal of expected returns has no noise, so Λ = 0;

2. the anchor when τ = 0 as in reverse MVO;

3. the Bayesian estimator from “Anchoring Expected Returns: A Bayesian Approach,” which is equivalent to Black–Litterman (1992) when the anchor portfolio is the market portfolio, the signal is their “view portfolios,” and we assume that the variance–covariance matrix is estimated without error;

4. the solution to robust optimization with ellipsoidal uncertainty set as defined in “Anchoring Expected Returns: Robust Optimization”; and

5. a generalized ridge regression (a form of regularization used in machine learning) of expected returns on the variance–covariance matrix.¹³

Proposition 3 shows how the EPO method helps unify seemingly unrelated strands of literature. Regarding Parts 1 and 2, EPO obviously contains as special cases the standard MVO and the anchor, which is trivial in itself, but by nesting these approaches, we get an enhanced version of things we already know. Furthermore, when using expected returns that imply the anchor is the optimal portfolio (Part 2), we get what is called “reverse MVO” in the context of an optimization that includes a set of constraints. The reason is that the optimal portfolio is taken as given and optimization is performed with the “implied expected returns,” $E(\mu|s) = \gamma\tilde{\Sigma}a$, which is the expected return that makes the anchor portfolio optimal in the absence of constraints.

Regarding Part 3, we see that the Bayesian estimator from “Anchoring Expected Returns: A Bayesian Approach” is connected to the Black–Litterman (1992) formula, which is not surprising given the similar Bayesian structure. Despite this connection, we note that our empirical implementation


is very different from previous applications of Black–Litterman: Black and Litterman always took the anchor portfolio to be the market portfolio. They considered certain “view portfolios” rather than maintaining our direct focus on a signal about expected returns. They considered a relatively small set of assets. And they ignored noise in the variance–covariance matrix. In contrast, we focus on different anchors, including where the anchor essentially disappears in the simple EPO, we use the shrinkage parameter as the key tuning variable, we consider noise in estimates of both risk and expected return, and we consider a number of datasets with many more assets.

Regarding Parts 4 and 5 of Proposition 3, note the interesting (and far from obvious) aspect that the Bayesian estimator corresponds to both robust optimization (as derived in “Anchoring Expected Returns: Robust Optimization”) and regularization methods used in other strands of statistics and machine learning (not discussed previously here; the proof in Appendix B, however, describes ridge regressions and other regularizations).

EPO in Practice: Empirical Results

In this section, we describe the data and methodology for our empirical study and discuss results for application of the method to global asset classes and equity portfolios.

Data and Methodology. For our empirical implementation, we constructed optimized industry momentum and time-series momentum portfolios for 11 samples that differed in terms of their test assets and methodology, as summarized in Table 1. The data used, the number of test assets, the methods used, the start date of the data, and the start date of our backtests are provided in Table 1. The first three samples (Global 1, Global 2, and Global 3) consist of equity indexes, bond futures, commodities, and currencies (foreign exchange, or FX); the Equity 1 through Equity 8 samples consist of equity portfolios, as we describe in detail next. The samples consist of various datasets and methodologies in order to examine the robustness of the EPO method.

Test assets and data. Our data for Global 1–Global 3 in Table 1 consist of 55 liquid futures and forward contracts described in Moskowitz et al. (2012). Specifically, in addition to every equity, commodity, and bond futures contract used in Moskowitz et al., we used the nine currency pairs in Moskowitz et al. that involve the US dollar. We excluded non-USD cross-currency pairs to ensure that the variance–covariance matrix would be of full rank.¹⁴ For each instrument, we constructed a return series by computing the daily excess return of the most liquid contract at each point in time and then compounded daily returns to a cumulative return index from which we could compute returns at any horizon. The data start in 1970 and extend through 2018. Following Moskowitz et al. (2012), we started the backtest in 1985, at which time we had data for a broad set of instruments. Furthermore, having the earlier data allowed us to choose an initial out-of-sample EPO shrinkage parameter without shortening the time series relative to the Moskowitz et al. study.

The samples for Equity 1 through Equity 7 in Table 1 are the 49 value-weighted US equity industry portfolios from Kenneth French’s website.¹⁵ As noted, for Equity 8, we split each industry portfolio into two components, for a total of 2 × 49 = 98 test assets. Specifically, using the CRSP data on the underlying stocks, we computed a “high-momentum” and “low-momentum” portfolio within each of the 49 industry portfolios. Each low-momentum portfolio return is a value-weighted average of the half of the stocks in that industry with the lowest past 12-month returns, and the construction is similar for the high-momentum portfolio. To calculate excess returns of all the equity portfolios, we subtracted the one-month US T-bill rate, also sourced from French’s website. The equity portfolio data begin in 1927 and end in 2018. To ensure enough data from which to select an initial out-of-sample EPO shrinkage parameter using only past information, we evaluated EPO performance for a sample period beginning 15 years after data were available (as we did for shrunk Global 1–Global 3), so all equity backtests ran from 1942 to 2018.

Benchmark factors. We used monthly returns from French’s website to evaluate the returns of optimized equity industry momentum portfolios relative to the Fama–French (2015) five-factor model. We also evaluated optimized time-series momentum portfolios relative to the time-series momentum benchmarks described in Appendix A.

Optimization methods. Table 1 shows the optimization methods we considered. To demonstrate the robustness of our results, we considered various optimization methods, various signals about expected returns, and various ways to estimate risk. Specifically, we used the simple EPO method from Equation 20 in the sample of global assets and in

Table 1. Samples and Summary Statistics: Backtests Ending 2018

| Portfolio | Dataset | Number of Assets | Optimization Method | Risk Model | Return Signal | Start of Data (1 January each year) | Start of Backtest (1 January each year) |
|---|---|---|---|---|---|---|---|
| Global 1 | Global equities, bonds, FX, and commodities | 55 | EPO^s | Exponentially weighted daily volatilities (60-day center of mass) and 3-day overlapping correlations (150-day center of mass) | TSMOM | 1970 | 1985 |
| Global 2 | Global equities, bonds, FX, and commodities | 55 | EPO^s | Risk model from Global 1, where correlations are shrunk 5% | TSMOM | 1970 | 1985 |
| Global 3 | Global equities, bonds, FX, and commodities | 55 | EPO^s | Risk model from Global 1, enhanced via random matrix theory | TSMOM | 1970 | 1985 |
| Equity 1 | 49 industry portfolios | 49 | EPO^s | 60 months (equal weighted), 5% shrunk | XSMOM | 1927 | 1942 |
| Equity 2 | 49 industry portfolios | 49 | EPO^s | 40 days (equal weighted), 5% shrunk | XSMOM | 1927 | 1942 |
| Equity 3 | 49 industry portfolios | 49 | EPO^s | 120 days (equal weighted), 5% shrunk | XSMOM | 1927 | 1942 |
| Equity 4 | 49 industry portfolios | 49 | EPO^s | 120 days (equal weighted), 5% shrunk | XSMOM × σ | 1927 | 1942 |
| Equity 5 | 49 industry portfolios | 49 | EPO^s | 120 days (equal weighted), 5% shrunk | XSMOM × σ² | 1927 | 1942 |
| Equity 6 | 49 industry portfolios | 49 | EPO^a with anchor = 1/N | 60 months (equal weighted), 5% shrunk | XSMOM | 1927 | 1942 |
| Equity 7 | 49 industry portfolios | 49 | EPO^a with anchor = 1/σ | 60 months (equal weighted), 5% shrunk | XSMOM | 1927 | 1942 |
| Equity 8 | Each industry split in 2 portfolios based on past 12-month return | 98 | EPO^s | 60 months (equal weighted), 5% shrunk | XSMOM | 1927 | 1942 |

Notes: TSMOM is time-series momentum; XSMOM is cross-sectional momentum. All backtests began 15 years after the earliest initial data were available, so we always had at least 15 years of data to select an out-of-sample EPO shrinkage parameter.


Equity 1 through Equity 5 and Equity 8; we used the anchored EPO method from Equation 22 in Equity 6 and Equity 7. In Equity 6, the anchor portfolio is the 1/N portfolio that gives equal notional weight to each industry portfolio. In Equity 7, the anchor portfolio is the 1/σ portfolio that assigns equal ex ante volatility to each industry portfolio. Specifically, this portfolio has a notional weight in industry i given by $(\sigma_t^i)^{-1}/\sum_j(\sigma_t^j)^{-1}$, where $\sigma_t^i$ is the estimated volatility of industry i at time t.

Risk models. Table 1 further shows how we estimated risk, again considering various methods to demonstrate robustness. For Global 1, we used a method similar to that of commercial risk models. The volatility of each instrument was estimated by using exponentially weighted daily returns with a 60-day center of mass. The correlations, $\tilde{\Omega}^{\text{Global 1}}$, were estimated by using exponentially weighted 3-day overlapping returns with a 150-day center of mass.¹⁶ We used three-day returns, $r_{3d,t}^i = \sum_{k=0}^{2} r_{t-k}^i$, to mitigate the effects of asynchronous trading among global assets, which affects correlations but not volatilities. For Global 2, we used the same risk model as Global 1, except that all nondiagonal correlations were shrunk 5% toward zero: $\tilde{\Omega}^{\text{Global 2}} = 0.95\,\tilde{\Omega}^{\text{Global 1}} + 0.05I$. For Global 3, we started with the risk model of Global 1 and then enhanced the model by using random matrix theory, $\tilde{\Omega}^{\text{Global 3}} = \mathrm{RIE}(\tilde{\Omega}^{\text{Global 1}})$, where RIE stands for rotationally invariant estimator (see Bun, Bouchaud, and Potters 2016) as described in Appendix A, with n set to the number of securities available at each given point in time and T = 300, which is twice the center of mass of 150 days. We then combined each of these correlation matrices with the diagonal matrix, σ, of volatility estimates to arrive at variance–covariance matrix $\tilde{\Sigma} = \sigma\tilde{\Omega}\sigma$.

For the Equity 1 through Equity 8 samples, we started with the standard equal-weighted estimates of variances and covariances for 60 months, for 40 days, and for 120 days of data.¹⁷ We then shrank all off-diagonal correlations (or, equivalently, covariances) 5% toward zero.

Signals about expected returns. Finally, we needed a signal about expected returns in each sample. To have a simple signal that we knew correlates with future returns, we decided on past 12-month returns,¹⁸ which we used as our signal throughout this analysis. Note, however, that our EPO method is general and can be used to optimize any predictor of future returns (or combination of predictors), not just those predictors based on past returns.

For Global 1–Global 3, we used TSMOM signals, meaning that the signal of expected return for each instrument was related to its past 12-month excess return. Specifically, the signal about the expected return of instrument i in month t was

$$s_t^i = 0.1 \times \sigma_t^i \times \mathrm{sign}(r_{t-12,t}^i). \tag{23}$$

Equation 23 means that each instrument had a positive expected excess return when the sign of the past 12-month excess return was positive (otherwise, expected excess return was negative) and that the monthly Sharpe ratio for each asset was constant and equal to 0.1. The assumption of a constant Sharpe ratio is consistent with the implicit assumption of Moskowitz et al. (2012) because they used a constant volatility target for each asset. The scaling of 0.1 is consistent with the average realized Sharpe ratios reported by Moskowitz et al. and, more recently, by Babu, Levine, Ooi, Pedersen, and Stamelos (2020),¹⁹ but this choice is inconsequential for the Sharpe ratio of the final EPO portfolio. To ensure that the fully shrunk EPO portfolio, $\mathrm{EPO}^s(w = 100\%)$, exactly matched the TSMOM strategy of Moskowitz et al., we used a risk aversion coefficient of $\gamma_t = n_t/40\%$, where $n_t$ is the number of instruments at time t.²⁰

For Equity 1 through 3 and Equity 6 through 8, we considered a simple version of cross-sectional momentum (XSMOM), meaning that the signal of each instrument depended on its past 12-month relative outperformance (i.e., its return minus the average return across all instruments):

$$s_t^i = \mathrm{XSMOM}_t^i := c_t\left(r_{t-12,t}^i - \frac{1}{n}\sum_{j=1,\ldots,n} r_{t-12,t}^j\right), \tag{24}$$

where the scaling factor, $c_t$, was chosen such that the positive and negative signals would each sum to 1.0 in absolute value; that is,

$$\sum_i s_t^i 1_{\{s_t^i > 0\}} = \sum_i |s_t^i| 1_{\{s_t^i < 0\}} = 1. \tag{25}$$

For Equity 4, each industry’s signal of expected returns is its past 12-month outperformance multiplied by its volatility,

$$s_t^i = \sigma_t^i \times \mathrm{XSMOM}_t^i. \tag{26}$$
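The signal definitions in Equations 23 through 26 can be sketched in a few lines. This is only an illustration with made-up numbers: the function names are hypothetical, and Equation 25 is read here as normalizing the positive signals to sum to 1, so that the negative signals of the demeaned cross-section sum to −1.

```python
import numpy as np


def tsmom_signal(vol, past_12m_excess_ret, sharpe=0.1):
    """Equation 23: s = 0.1 * sigma * sign(past 12-month excess return)."""
    return sharpe * vol * np.sign(past_12m_excess_ret)


def xsmom_signal(past_12m_ret):
    """Equations 24-25: demeaned past return, scaled so that the positive
    signals sum to 1 (the negative signals then sum to -1).
    Undefined if all assets have the same past return."""
    demeaned = past_12m_ret - past_12m_ret.mean()
    c = 1.0 / demeaned[demeaned > 0].sum()
    return c * demeaned


# Hypothetical cross-section of five assets at one date.
vol = np.array([0.04, 0.05, 0.03, 0.06, 0.05])          # monthly volatilities
ret_12m = np.array([0.12, -0.05, 0.08, 0.20, -0.10])    # past 12-month returns

s_ts = tsmom_signal(vol, ret_12m)
s_xs = xsmom_signal(ret_12m)
s_equity4 = vol * s_xs                                   # Equation 26
```

Because the cross-sectional signal is demeaned, fixing the sum of the positive signals automatically fixes the sum of the negative ones, so a single scaling factor per date is enough.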


This choice of multiplying by volatility is similar in spirit to the scaling of TSMOM as defined in Equation 23. To see again that this choice is natural, consider the implications for the fully shrunk EPO portfolio. This portfolio has a notional weight of each industry i given by

$$\mathrm{EPO}^s(w = 100\%)^i = \frac{1}{\gamma}\frac{s_t^i}{(\sigma_t^i)^2} = \frac{\mathrm{XSMOM}_t^i}{\gamma\sigma_t^i}, \tag{27}$$

which is proportional to the Sharpe ratio of its outperformance, an intuitive scaling. Furthermore, the fully shrunk EPO’s risk weight in industry i is $\sigma_t^i a_t^i = \mathrm{XSMOM}_t^i/\gamma$, implying that when the absolute values of these risk weights are summed over all instruments, this total risk weight is constant over time (because of the definition of $c_t$). So, Equation 27 is also an expression of an intuitive scaling if we believe that the investment opportunity set is not varying much over time.

Finally, for Equity 5, we let $s_t^i = (\sigma_t^i)^2 \times \mathrm{XSMOM}_t^i$, which implies that the fully shrunk EPO portfolio, $\mathrm{EPO}^s(w = 100\%)^i = \mathrm{XSMOM}_t^i/\gamma$, is proportional to each industry’s outperformance.

Global Asset Classes: Beating Time-Series Momentum. In this subsection, we consider the performance of EPO versus benchmarks, how to identify problem portfolios, the alphas of EPO, and leverage and turnover.

Performance of EPO vs. benchmark portfolios. Turning to our empirical results, we first consider the performance of optimized TSMOM portfolios relative to key benchmarks for global assets, such as long-only portfolios and standard TSMOM factor portfolios, as shown in Table 2.

Table 2. Gross Sharpe Ratios of Optimized TSMOM Portfolios: Global 1–Global 3, 1985–2018

| Portfolio | Global 1 | Global 2 (Shrunk) | Global 3 (RMT) |
|---|---|---|---|
| Long only: 1/N | 0.44 | 0.44 | 0.44 |
| Long only: 1/σ | 0.76 | 0.76 | 0.76 |
| TSMOM: equal notional weight | 0.74 | 0.74 | 0.74 |
| TSMOM: equal volatility weight | 1.09 | 1.09 | 1.09 |
| EPO^s: out-of-sample | 1.24 | 1.24 | 1.23 |
| EPO^s(w): shrinkage parameter w | | | |
| 0% (naive MVO) | 0.87 | 1.08 | 1.02 |
| 10% | 1.15 | 1.18 | 1.19 |
| 25% | 1.24 | 1.26 | 1.26 |
| 50% | 1.31 | 1.31 | 1.32 |
| 75% | 1.32 | 1.31 | 1.32 |
| 90% | 1.26 | 1.26 | 1.26 |
| 99% | 1.13 | 1.13 | 1.13 |
| 100% (anchor) | 1.09 | 1.09 | 1.09 |

Notes: Global 1–Global 3 samples are described in Table 1. The long-only 1/N portfolio invests with equal notional exposure across all assets; the 1/σ portfolio invests with equal volatility weight in each asset; the first TSMOM strategy invests with equal notional exposure in each asset; the second TSMOM strategy invests with equal volatility weight in each asset. The optimized portfolios consist of the simple out-of-sample EPO and a range of in-sample EPO portfolios that differ on the basis of EPO shrinkage parameter w. The out-of-sample EPO uses only past data to choose w. For Global 1, the correlation matrix was estimated in the standard way; Global 2 had a 5% shrunk correlation matrix; Global 3 used a cleaned correlation matrix based on random matrix theory (RMT).

The first portfolio that we consider is the 1/N portfolio that invests an equal notional exposure across all assets. This portfolio delivered a Sharpe ratio of 0.44, arising from the equity risk premium and similar risk premiums in other asset classes. The 1/σ portfolio targeted an equal amount of standalone volatility in each asset; for example, the notional exposure in asset i was $(\sigma_t^i)^{-1}/\sum_j(\sigma_t^j)^{-1}$. This portfolio delivered a higher Sharpe ratio of 0.76. The Sharpe ratios are even higher for the standard time-series momentum factors. The risk-weighted TSMOM factor already had a high Sharpe ratio of 1.09 because it does several things that an optimizer hopes to achieve: It takes into account expected returns by trading on TSMOM, and it takes into account volatility differences across assets and over time by scaling positions accordingly. Said differently, although the 1/N portfolio is normally difficult to beat, we also consider benchmarks such as TSMOM that already beat 1/N hands down, so these benchmarks set a high bar.

Nevertheless, the out-of-sample EPO significantly outperformed TSMOM (by 14%), delivering, as Table 2 shows, a Sharpe ratio of 1.24 in Global 1 and Global 2


and 1.23 in Global 3. Recall that these samples differ in their estimation of the risk model. Global 2 shrinks correlations initially by 5%, and Global 3 uses random matrix theory (RMT). This performance of the EPO TSMOM portfolio is remarkably strong.

In our tests, the EPO portfolio relied on a single parameter, namely, the EPO shrinkage parameter, w. The out-of-sample EPO chose this parameter in an expanding fashion, using only data available before each month to decide on the parameter to use next month. Also informative is the performance of EPO when a constant w was used. The unshrunk EPO with w = 0 corresponds to standard MVO, and Table 2 shows that MVO performs worse than equal-volatility-weighted TSMOM. In other words, standard MVO does not work here. The fully shrunk EPO with w = 1 means that we invested in the anchor portfolio, which is the TSMOM factor by construction. With shrinkage factors in between zero and 100%, we can see that performance improves. It peaks at an even higher level than the out-of-sample EPO (which we will now call “OOS EPO”), but of course, picking the in-sample highest w is not implementable in real time.

Figure 2 shows the evolution of the OOS EPO shrinkage parameter over time for the Global 2 sample. Note that at least 15 years of data are required to select an initial OOS EPO shrinkage parameter. Over time, the shrinkage parameter used by the OOS EPO method approaches the optimal in-sample value, but initially OOS EPO used a lower shrinkage, and some time had to pass for the out-of-sample process to settle on the optimal shrinkage parameter. This aspect explains why the performance of OOS EPO is a bit below the in-sample maximum Sharpe ratio in Table 2.

Figure 2. EPO Shrinkage Parameter over Time: Global 2, 1985–2018

[Figure: out-of-sample EPO shrinkage parameter, w (vertical axis, 0 to 1.0), plotted over time (horizontal axis, from January 1985).]

Notes: We empirically chose w out-of-sample as follows: For each time period, we estimated what choice of w (within a finite grid of possible values) would have produced the highest EPO portfolio Sharpe ratio in the time period up until that date. Then, we used this estimate in the next time period.

Figure 3 shows how the realized Sharpe ratios of optimized portfolios also vary with the choice of EPO shrinkage parameter. The EPO performance is strong for a wide range of shrinkage parameters, reflecting the robustness of the process. Furthermore, the enhancements of the correlation matrix in Global 2 and Global 3 improve the performance relative to Global 1 in the case of w = 0 (the left side of the graph), which corresponds to standard MVO, but have almost no effect on the peak of the curve. In other words, improving the correlation matrix is important for standard MVO but has little effect when we subsequently shrink the correlation by a large factor.

Identifying problem portfolios. We have shown that standard MVO performs poorly whereas EPO performs strongly, so an interesting question is, what is the source of this difference? For simplicity, we illustrate problem portfolios for the sample in Global 1.

Following the ideas in “Shrinking Correlations: The Simple EPO,” we uncovered the problem portfolios as follows. Each month t, we first estimated the volatilities and correlation matrix of global assets $\Omega_t$ as described in “Data and Methodology.” We then computed the eigendecomposition of the correlation matrix,

$$\Omega_t = P_t D_t P_t^{-1}, \tag{28}$$

where $P_t = (P_t^1, \ldots, P_t^{n_t})$ is the matrix in which each column is a PC portfolio.

We then studied the expected returns, ex ante volatilities, and ex ante Sharpe ratios of these PC portfolios (which were rebalanced monthly). We compared these ex ante data with the realized data. To compute these statistics, we considered the assets rescaled to have unit volatility, $\sigma_t^{-1}r_{t+1}$, with an ex ante variance–covariance matrix equal to the correlation matrix (recall that $\sigma_t$ is the diagonal matrix of volatilities). Similarly, PC portfolio i has a return $(P_t^i)'\sigma_t^{-1}r_{t+1}$.

Therefore, based on this time series, we can compute the realized average excess return, volatility,


Figure 3. Performance of Optimized TSMOM (Global) Portfolios, 1985–2018

[Line chart. Vertical axis: Sharpe ratio (0.8 to 1.4). Horizontal axis: EPO shrinkage parameter, w (0% to 100%). Series: Standard, Shrunk, RMT, OOS EPO.]

Notes: Global 1–Global 3 data were used. The three correlation matrices are the standard sample
correlation matrix (“Standard” in the figure), a correlation matrix with 5% correlation shrink-
age applied (“Shrunk”), and a cleaned correlation matrix based on RMT. A shrinkage of 0% is
standard mean–variance optimization, 100% shrinkage is the anchor portfolio, and in between
is EPO.
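
The two endpoints described in the notes can be illustrated with a minimal sketch. It assumes the shrunk matrix from the Summary of Notation, Σw = (1 − w)Σ + wV with V the diagonal matrix of variances, and a solution of the form x = (1/γ)Σw⁻¹s. The function name `simple_epo` and the three-asset inputs are hypothetical and serve only as an illustration, not as the authors' implementation.

```python
import numpy as np

def simple_epo(signal, cov, w, gamma=1.0):
    """Illustrative simple-EPO weights x = (1/gamma) * inv(Sigma_w) @ s,
    with Sigma_w = (1 - w) * Sigma + w * V and V = diag(Sigma)."""
    V = np.diag(np.diag(cov))                  # diagonal matrix of variances
    cov_w = (1.0 - w) * cov + w * V            # correlations shrunk toward zero
    return np.linalg.solve(cov_w, signal) / gamma

# Hypothetical inputs (not taken from the article)
vols = np.array([0.10, 0.20, 0.15])
corr = np.array([[1.0, 0.8, 0.3],
                 [0.8, 1.0, 0.4],
                 [0.3, 0.4, 1.0]])
cov = np.outer(vols, vols) * corr
signal = np.array([0.02, -0.01, 0.03])

x_mvo = simple_epo(signal, cov, w=0.0)     # w = 0%: standard mean-variance solution
x_epo = simple_epo(signal, cov, w=0.5)     # in between: EPO
x_anchor = simple_epo(signal, cov, w=1.0)  # w = 100%: correlations ignored entirely
```

At w = 100% the weights reduce to each signal divided by the asset's own variance, which is why that end of the curve in this sketch ignores correlations altogether.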

and Sharpe ratio. The ex ante expected return is $(P_t^i)' \sigma_t^{-1} s_t$, where signal $s_t$ about the expected return is given in Equation 23. The ex ante volatility of PC portfolio i is given by the square root of its corresponding eigenvalue, $\sqrt{D_t^i}$, and the ex ante Sharpe is the ratio of expected return and ex ante volatility.21

Figure 1 plots the results. Looking back at it, first consider Panel A, showing the volatilities of the principal-component portfolios. By construction, PC#1 had the highest ex ante volatility and PC#55 had the lowest ex ante volatility. Looking at the realized volatilities of these portfolios, we see that the realized volatilities are also decreasing in the PC number, with levels that roughly match their average ex ante counterparts, indicating that the risk model works reasonably well. However, we do see systematic errors: The least important PCs (those with the highest numbers) have higher realized volatilities than their average ex ante volatility. The reason is that these portfolios have been chosen as those with the lowest ex ante volatility, so errors in the risk model may lead to underestimation of the ex ante risk of these portfolios. This low level of estimated risk leads the optimizer to apply excess leverage to these noise portfolios to achieve a given level of risk.

Now consider PC returns, plotted in Panel B of Figure 1. Naturally, realized returns are noisy; expected returns are smoother simply because realized performance always has an element of chance. Nevertheless, we see that both expected and realized returns tend to be lower for the less important PCs. Furthermore, we see that realized returns approach zero faster than the expected returns do. Said differently, the expected returns appear to be too high for the least important PCs, which adds to the problem identified in Panel A. Any noise in the expected returns of the actual assets leads to nonzero expected returns of the unimportant PCs, and because the optimizer can always choose the sign of the portfolio to make a nonzero expected return into a positive expected return, the optimizer wants to take a large position in these noise PCs.

Panel C of Figure 1 illustrates how the problems with risk and expected return interact by looking at the corresponding Sharpe ratios. We see a dramatic difference between ex ante and realized Sharpe ratios: Realized Sharpe ratios decrease with the PC number, whereas ex ante Sharpe ratios increase. Realized Sharpe ratios decrease because the important low-numbered PCs are more likely to be driven by true economic factors whereas the high-numbered PCs are unintuitive long–short factors. Said differently, the low-numbered PCs have larger signal-to-noise ratios than the high-numbered PCs. The ex ante Sharpe ratios are high for the unimportant PCs because their risk is underestimated, and their expected return is overestimated, especially relative to their level of risk.

To see the implications of this discrepancy, note in Panel D of Figure 1 the relative importance of each PC for the MVO and EPO portfolios. Specifically, we plotted the realized risk for each PC of the standard


MVO portfolio and the OOS EPO portfolio (where both portfolios were scaled to realize 10% volatility over the full sample to focus on differences in relative risks across PC portfolios). We see that the erroneous pattern in ex ante Sharpe ratios leads standard MVO to take large amounts of risk in the unimportant PC portfolios, which turns out, ex post, to be largely betting on noise in past data. Furthermore, the notional weights on the unimportant PC portfolios are even larger because these portfolios need to be leveraged as a result of their low risk per notional amount (not shown in Figure 1). This large risk exposure to “problem portfolios” highlights why standard mean–variance optimization techniques often perform poorly out-of-sample. In contrast, the EPO method accommodates this problem. Indeed, EPO shrinkage corresponds to reducing the ex ante Sharpe ratio of unimportant PC portfolios, which leads, in turn, to much smaller amounts of realized risk in the unimportant PC portfolios, as Panel D shows.

The alpha of EPO. Having shown the underlying cause of EPO’s economically significant performance improvements, we now report the alphas of EPO over passive market exposures and other known factors. Table 3 shows the alphas of OOS EPO for TSMOM (using the Global 2 sample) in relation to several benchmarks (with all variables standardized ex post to 10% volatility for comparability of coefficients). In column 1, we simply controlled for the volatility-adjusted TSMOM factor. The resulting improvement in Sharpe ratio that we saw in Table 2 translates into a statistically significant alpha and a large information ratio despite the high R2. Column 2 reflects a further adjustment for the

Table 3. Alpha of Out-of-Sample EPO for TSMOM: Global 2, 1985–2018

Dependent variable     EPO        EPO        EPO        TSMOM      TSMOM
Column                 (1)        (2)        (3)        (4)        (5)

Alpha                  2.48%      2.17%      2.11%      –0.43%     –0.34%
                       (3.36)     (2.92)     (2.93)     (–0.57)    (–0.45)
Long only (1/σ)                   0.06       0.07                  –0.02
                                  (2.77)     (3.57)                (–0.86)
TSMOM                  0.91       0.90
                       (44.82)    (43.77)
TSMOM(COM)                                   0.53
                                             (26.02)
TSMOM(EQ)                                    0.30
                                             (14.99)
TSMOM(FI)                                    0.34
                                             (16.77)
TSMOM(FX)                                    0.32
                                             (15.69)
EPO                                                     0.91       0.92
                                                        (44.82)    (43.77)
Information ratio      0.60       0.53       0.54       –0.10      –0.08
R2                     83%        84%        85%        83%        83%

Notes: The alphas are for OOS EPO (using the Global 2 sample described in Table 1) when controlling for a volatility-scaled long-
only portfolio diversified across all instruments and volatility-scaled TSMOM portfolios diversified across all instruments or all
instruments within each asset class. Also reported are alphas of the volatility-scaled TSMOM portfolios in relation to OOS EPO. All
variables were ex post standardized to an annualized full-sample volatility of 10% to make the alphas comparable. The scaling did
not affect the t-statistics, which are reported in parentheses.
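
The remark that ex post volatility scaling leaves the t-statistics unchanged can be checked with a small sketch. It assumes a plain OLS regression with an intercept and defines the information ratio as annualized alpha divided by annualized residual volatility, which is a common convention rather than something stated in the notes. The helper names and the simulated monthly series are hypothetical.

```python
import numpy as np

def alpha_regression(y, X, periods_per_year=12):
    """OLS of y on a constant and the columns of X.
    Returns (annualized alpha, t-statistic of alpha, information ratio)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    Z = np.column_stack([np.ones(len(y)), X])      # add intercept column
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    dof = len(y) - Z.shape[1]
    sigma2 = resid @ resid / dof                   # residual variance per period
    cov_beta = sigma2 * np.linalg.inv(Z.T @ Z)
    t_alpha = beta[0] / np.sqrt(cov_beta[0, 0])
    alpha_ann = beta[0] * periods_per_year
    resid_vol_ann = np.sqrt(sigma2 * periods_per_year)
    return alpha_ann, t_alpha, alpha_ann / resid_vol_ann

def scale_to_vol(r, target=0.10, periods_per_year=12):
    """Rescale a return series ex post to a target annualized volatility."""
    return r * target / (r.std(ddof=1) * np.sqrt(periods_per_year))

# Simulated monthly returns (hypothetical, 34 years)
rng = np.random.default_rng(0)
bench = rng.normal(0.005, 0.03, size=408)
strat = 0.002 + 0.9 * bench + rng.normal(0.0, 0.01, size=408)

a_raw, t_raw, ir_raw = alpha_regression(strat, bench)
a_std, t_std, ir_std = alpha_regression(scale_to_vol(strat), scale_to_vol(bench))
```

In this sketch the alpha itself changes with the scaling, while its t-statistic and the information ratio do not, because both numerator and denominator are multiplied by the same positive constant.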


volatility-adjusted long-only portfolio (called 1/σ), which also had good performance, to see whether EPO simply benefits from holding more long exposure to passive markets. Table 3 shows that the alpha remains statistically significant. Column 3 reflects controls for volatility-adjusted TSMOM strategies in each of the four asset classes to see whether EPO statically exploits a different asset allocation strategy. This test is stringent because we are now controlling for five high-performance volatility-adjusted strategies that already implicitly do part of the job that we hoped an optimizer would do. Nevertheless, the alpha of EPO remains statistically significant. The last two columns of Table 3 turn things around, regressing the volatility-adjusted TSMOM strategy on the EPO portfolio. We found an insignificant alpha, which is consistent with the dominant performance of EPO.

Leverage and turnover. Finally, to show that EPO produces realistic and implementable portfolios, we considered the turnover and gross leverage profiles of EPO portfolios. Table 4 shows leverage and turnover statistics for the benchmark portfolios, the OOS EPO portfolio, and the EPO with various constant shrinkage parameters. We focus on the sample from Global 2 with a 5% shrunk correlation matrix. Furthermore, for comparability, gross leverage statistics are shown for portfolios ex post scaled to 10% annualized volatility, and annualized turnover statistics are reported as a percentage of average gross leverage. The lower EPO shrinkage parameters exhibit larger turnover and more gross leverage. For example, the standard MVO portfolio arising from an EPO shrinkage parameter of zero has substantially more turnover and leverage than the anchor portfolio arising from an EPO shrinkage parameter of 100%. Nevertheless, an EPO shrinkage parameter of 90% would yield turnover and leverage similar to the anchor, with a substantial improvement in performance, as shown in Table 2. The OOS EPO has a larger turnover and leverage than the anchor, but they remain of the same order of magnitude. In summary, when the EPO shrinkage parameter is chosen appropriately, EPO yields implementable portfolios with realistic leverage and turnover profiles, as well as substantial performance improvements over the standard TSMOM factors in the literature. Although the EPO method shown here abstracts from transaction costs, modeling transaction costs explicitly as a part of the optimization can potentially reduce turnover. Gârleanu and Pedersen (2013, 2016) derived the optimal portfolio in light of transaction costs but without taking estimation uncertainty into account, so their model could be combined with our enhancements in future research.

Table 4. Leverage and Turnover of Optimized TSMOM Portfolios: Global 2, 1985–2018

Portfolio                        Gross Leverage per    Annualized Turnover as %
                                 10% Volatility        of Avg. Gross Leverage

Long only: 1/N                   135%                  26%
Long only: 1/σ                   267                   43
TSMOM: equal notional weight     167                   153
TSMOM: equal risk                358                   163
EPO^s: out-of-sample             457                   254

EPO^s(w): shrinkage parameter w
0% (naive MVO)                   991%                  546%
10%                              767                   480
25%                              649                   417
50%                              551                   339
75%                              479                   263
90%                              424                   208
99%                              368                   166
100% (anchor)                    358                   163

Notes: For comparability, all statistics are reported for portfolios that were ex post scaled to an annualized full-sample volatility of 10%. Annualized turnover is reported as a percentage of each portfolio’s average gross leverage.

Results for Equity Portfolios: Beating Industry Momentum, the Market, 1/N, and Standard Factors. We have seen that EPO substantially improves the performance of time-series momentum predictors applied to a universe of global assets. We next consider the performance of EPO for equity portfolios and study the robustness of the performance to a range of choices on optimization, risk estimation, and signals about expected returns.

EPO performance vs. benchmarks. Table 5 reports the Sharpe ratios of the OOS EPO portfolio, a range of EPO portfolios with various constant shrinkage parameters, and three benchmark portfolios. The benchmark portfolios are the 1/N portfolio, a standard industry momentum (INDMOM) portfolio


Table 5. Realized Gross Sharpe Ratios of Optimized Equity Portfolios, 1942–2018

Portfolio                                  Equity 1  Equity 2  Equity 3  Equity 4  Equity 5  Equity 6  Equity 7  Equity 8

1/N                                        0.59      0.59      0.59      0.59      0.59      0.59      0.59      0.57
INDMOM                                     0.63      0.63      0.63      0.63      0.63      0.63      0.63      0.67
MVO (no correlation shrinkage)             0.19      –0.02     0.92      0.84      0.47      0.21      0.21      0.01
EPO: out-of-sample                         0.79      0.72      0.96      0.99      0.66      0.83      0.90      0.90

EPO(w): in-sample with shrinkage of w
0% (MVO with 5% correlation shrinkage)     0.56      0.82      0.97      0.96      0.66      0.50      0.51      0.60
10%                                        0.68      0.89      0.98      0.99      0.71      0.59      0.60      0.80
25%                                        0.75      0.92      0.98      0.99      0.72      0.66      0.67      0.91
50%                                        0.79      0.93      0.96      0.97      0.71      0.72      0.75      0.98
75%                                        0.80      0.91      0.93      0.94      0.69      0.85      0.91      0.98
90%                                        0.79      0.88      0.89      0.92      0.67      0.83      0.90      0.94
99%                                        0.73      0.77      0.77      0.91      0.65      0.60      0.63      0.86
100% (anchor)                              0.71      0.73      0.73      0.91      0.63      0.59      0.62      0.81

Notes: The Equity 1 through Equity 8 samples are described in Table 1. The long-only 1/N portfolio invests with equal notional
exposure across all industries; the INDMOM portfolio is long industries that outperformed over the past 12 months and short
industries that underperformed; the standard MVO portfolio is without correlation shrinkage. The optimized portfolios are the
out-of-sample and in-sample EPO portfolios. The optimal in-sample EPO portfolio and the out-of-sample EPO portfolio are shown
in bold in each column. We considered in-sample EPO portfolios for a range of shrinkage parameters. The OOS EPO chose w by
using only past data.
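
The statement that the OOS EPO chose w using only past data can be sketched as an expanding-window selection rule. The sketch assumes that the returns of the constant-shrinkage EPO(w) portfolios are already available as columns of an array and that the selection criterion is the past realized Sharpe ratio; the function name, the minimum history, and the simulated returns are hypothetical.

```python
import numpy as np

def choose_shrinkage_oos(candidate_returns, min_history):
    """For each period t >= min_history, pick the column (shrinkage value)
    with the highest Sharpe ratio over rows 0..t-1 only. Entries before
    min_history are set to -1 (no choice available yet)."""
    T, _ = candidate_returns.shape
    chosen = np.full(T, -1, dtype=int)
    for t in range(min_history, T):
        past = candidate_returns[:t]                       # strictly past data
        sharpe = past.mean(axis=0) / past.std(axis=0, ddof=1)
        chosen[t] = int(np.argmax(sharpe))
    return chosen

# Grid of shrinkage values as in Table 5; returns are simulated placeholders
w_grid = np.array([0.0, 0.10, 0.25, 0.50, 0.75, 0.90, 0.99, 1.0])
rng = np.random.default_rng(1)
T, min_history = 240, 60
candidate_returns = rng.normal(0.004, 0.03, size=(T, len(w_grid)))

chosen = choose_shrinkage_oos(candidate_returns, min_history)
live = np.arange(min_history, T)
oos_w = w_grid[chosen[live]]                        # shrinkage used each period
oos_returns = candidate_returns[live, chosen[live]] # realized OOS return series
```

Because the choice at time t depends only on rows before t, changing later returns cannot alter earlier choices, which is the property that makes the rule implementable in real time.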

(following Moskowitz and Grinblatt 1999) with notional weights given by $XSMOM_t^i$ in Equation 24, and a standard MVO using unshrunk correlations.

In all cases in Table 5, the OOS EPO portfolio outperformed 1/N, INDMOM, and the standard MVO portfolios, often by a substantial margin. The robustness of these results is noteworthy in light of the range of specifications. Recall that Equity 1 through Equity 3 varied the risk model from 40 days to 60 months, a broad span of risk models. Equity 4 and Equity 5 used different ways to scale the signals about expected returns. Equity 6 and Equity 7 used different implementations of the EPO method (the anchored EPO rather than the simple EPO) while considering different anchors. Finally, Equity 8 was based on a more granular set of test assets, that is, two portfolios per industry.

The OOS EPO portfolio comes close to realizing the highest in-sample Sharpe ratio among all EPO portfolios with a constant shrinkage parameter in all samples except for Equity 2, which shows the robustness of the process. Also, note that all the OOS EPO portfolios realized higher Sharpe ratios than all five Fama–French factors, despite the fact that the Fama–French factors are based on individual stocks whereas the EPO factors rely only on industry returns. In fact, the best OOS EPO factors even outperformed a portfolio that simultaneously invested in all five Fama–French factors (equal weighted) over the comparable time period.22

Alpha to standard factors. Table 6 reports returns after controlling for a nonoptimized INDMOM portfolio (the anchor) as well as the Fama–French five-factor model. The alpha is positive in all cases. Furthermore, the positive alphas are statistically significant at the 5% level in all samples except for Equity 6, where the t-statistic of 1.80 is significant only at the 10% level. For Equity 2 through Equity 4, the t-statistic is greater than 6, which is highly statistically significant. The weaker risk-adjusted return of Equity 6 may


Table 6. Alpha of EPO for Equity Portfolios, 1963–2018

Dependent Variable: OOS EPO Portfolio

Equity 1 Equity 2 Equity 3 Equity 4 Equity 5 Equity 6 Equity 7 Equity 8

Alpha (annualized) 3.82% 8.09% 7.65% 6.25% 2.50% 1.07% 1.31% 4.40%
(4.41) (6.68) (6.29) (6.07) (2.38) (1.80) (2.49) (5.07)
INDMOM 0.78 0.53 0.53 0.69 0.67 0.31 0.33 0.78
(32.11) (15.66) (15.64) (24.00) (22.71) (18.85) (22.43) (32.21)
Mkt – RF 0.08 –0.09 –0.07 –0.07 –0.04 0.85 0.91 0.10
(3.12) (–2.38) (–1.95) (–2.10) (–1.23) (45.87) (55.57) (3.69)
SMB –0.06 –0.04 –0.02 –0.04 –0.07 0.16 0.09 –0.05
(–2.27) (–1.14) (–0.66) (–1.32) (–2.14) (9.33) (5.88) (–2.04)
HML –0.01 0.10 0.10 0.04 0.01 0.03 0.05 –0.03
(–0.44) (2.04) (2.12) (0.94) (0.23) (1.40) (2.31) (–1.01)
CMA –0.14 –0.12 –0.12 –0.05 –0.01 –0.02 0.00 –0.10
(–4.05) (–2.43) (–2.35) (–1.22) (–0.27) (–0.84) (0.12) (–2.73)
RMW –0.04 –0.04 –0.04 –0.03 0.01 0.06 0.08 –0.03
(–1.42) (–1.04) (–1.06) (–0.95) (0.40) (3.33) (5.16) (–1.10)
Information ratio 0.63 0.96 0.90 0.87 0.34 0.26 0.36 0.73
R2 64% 29% 28% 49% 46% 83% 87% 64%

Notes: Performance of the OOS EPO portfolios is presented after controlling for standard factors. The Equity 1 through Equity 8
portfolios are described in Table 1. Each column reports a multivariate regression of EPO on a standard industry momentum factor
(INDMOM) and the Fama–French five-factor model (Mkt-RF, SMB, HML, CMA, and RMW), from 1963 to 2018. Note that the
Fama–French five-factor model data series does not begin until 1963. All variables are ex post standardized to an annualized full-
sample volatility of 10% to make coefficients comparable. This scaling did not affect the t-statistics reported in parentheses.

result from the fact that in this specification, the EPO is anchored to the long-only 1/N portfolio, which creates two issues: (1) a large market loading of 0.85 and (2) a trade-off (in the choice of the shrinkage parameter) between stabilizing the optimization and moving toward a long-only portfolio, which does not exploit signals about expected returns. Nevertheless, the EPO portfolios delivered strong performance across a range of settings, and this strong performance cannot be explained by standard factors.

Conclusion: A Practical Guide to Optimization

We developed a simple and transparent method to make portfolio optimization work in practice. The method is essentially as simple as standard mean–variance optimization. The simple EPO method uses a single extra input, namely, a correlation shrinkage parameter, which is chosen to maximize risk-adjusted returns in past data. EPO improves portfolio performance by accounting for noise in the investor’s estimates of risk and expected return. The method encompasses several optimization procedures in the literature (notably, Black–Litterman (1992), robust optimization, and regularization methods used in machine learning), so it demystifies, unifies, and simplifies much of this literature.

To illuminate why standard MVO techniques often fail, we identified the problem portfolios, to which MVO gives large weight despite their poor performance. Our EPO method addresses this issue via correlation shrinkage, which, perhaps surprisingly, downweights the problem portfolios.

Despite the method’s simplicity, EPO delivers powerful results empirically. Applying our EPO method to several realistic examples, we found surprisingly large performance improvements in optimized industry momentum and time-series momentum portfolios relative to standard benchmarks and predictors used in the literature. When applied to global assets, our EPO


time-series momentum portfolio substantially outperformed the market portfolio, the 1/N portfolio, and even relatively sophisticated benchmarks that already perform substantially better than the 1/N portfolio. Indeed, the EPO method delivered significant alpha relative even to volatility-scaled long-only and standard time-series momentum portfolios. These sophisticated benchmarks already deliver high Sharpe ratios because they exploit the lowest hanging fruits of optimization by (1) using information about expected returns, (2) controlling for volatility differences across assets and over time, (3) potentially exploiting market risk premiums and risk-parity effects, and (4) potentially readjusting asset class weights. This benchmark is a tough one to beat, yet EPO beat it.

When applied to equities, our EPO industry momentum portfolio substantially outperformed the market portfolio, the 1/N benchmark, and a standard industry momentum portfolio. This strong outperformance of EPO cannot be explained by exposure to existing factors in the literature, such as the Fama–French factors. Furthermore, the performance enhancements are robust to a range of specifications. Although for simplicity we focused on momentum predictors, future research could use this approach to enhance other predictors.

Appendix A. Summary of Notation and Auxiliary Results

In addition to a summary of the notation, we discuss the construction of TSMOM factors and random matrix theory.

Summary of Notation

Symbol                                   Meaning

$r = (r^1, \ldots, r^n)'$                Vector of excess returns
$x = (x^1, \ldots, x^n)'$                Vector of portfolio holdings
$\gamma$                                 Relative risk aversion
$s = (s^1, \ldots, s^n)'$                Vector of signals about expected excess returns
$\Sigma = \mathrm{var}(r \mid s)$        Variance–covariance matrix
$\tilde{\Sigma}$                         Enhanced risk estimate
$\Sigma_w = (1 - w)\Sigma + wV$          Shrunk variance–covariance matrix
$w$                                      EPO shrinkage parameter
$\sigma = \mathrm{diag}(\sqrt{\Sigma_{11}}, \ldots, \sqrt{\Sigma_{nn}}) = \mathrm{diag}(\sigma^1, \ldots, \sigma^n)$
                                         Diagonal matrix of volatilities
$V = \sigma^2$                           Diagonal matrix of variances
$\Omega = PDP'$                          Correlation matrix
$P$                                      Matrix whose columns are principal-component portfolio weights (eigenvectors)
$D$                                      Diagonal matrix of variances of principal-component portfolios (eigenvalues)
$a$                                      Anchor portfolio
$\mu$                                    True, but unobserved, expected return
$\tau$                                   Variation in true expected returns
$\Lambda, \lambda$                       Error in the estimation of expected returns
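
The decomposition Ω = PDP′ listed above can be made concrete with a short sketch that computes, for each principal-component portfolio, an ex ante expected return (Pⁱ)′σ⁻¹s, an ex ante volatility equal to the square root of the eigenvalue, and their ratio. The function name and the three-asset inputs are hypothetical, and the ordering of PCs by descending eigenvalue is a convention assumed here.

```python
import numpy as np

def pc_ex_ante_stats(corr, vols, signal):
    """Eigendecompose the correlation matrix and return, per PC portfolio,
    the ex ante expected return, volatility, and Sharpe ratio."""
    eigvals, eigvecs = np.linalg.eigh(corr)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # PC#1 = largest eigenvalue
    D = eigvals[order]
    P = eigvecs[:, order]                      # columns are PC portfolio weights
    exp_ret = P.T @ (signal / vols)            # (P^i)' sigma^{-1} s
    vol = np.sqrt(D)                           # ex ante volatility of each PC
    return P, D, exp_ret, vol, exp_ret / vol

# Hypothetical inputs (not taken from the article)
vols = np.array([0.10, 0.20, 0.15])
corr = np.array([[1.0, 0.8, 0.3],
                 [0.8, 1.0, 0.4],
                 [0.3, 0.4, 1.0]])
signal = np.array([0.02, -0.01, 0.03])

P, D, exp_ret, vol, sharpe = pc_ex_ante_stats(corr, vols, signal)
```

The sign of each eigenvector is arbitrary, so only the absolute value of each PC's ex ante Sharpe ratio is meaningful in this sketch; small eigenvalues in the denominator are what inflate the ex ante Sharpe ratios of the least important PCs.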


Standard Time-Series Momentum (TSMOM) Factors

Following Moskowitz et al. (2012), we used the “global assets” data to construct standard TSMOM factors. In particular, we considered the equal-notional-weighted TSMOM factor with the following notional positions:

$$x_t^{\text{TSMOM, equal-notional-weighted}} = \frac{1}{n_t}\,\mathrm{sign}\!\left(r^i_{t-12,t}\right). \tag{A1}$$

This factor goes long or short depending on the sign of the past 12 months’ excess returns and invests equally across the $n_t$ available assets. Notional-weighted portfolios are not common in practice when investing across asset classes with large cross-sectional dispersion of volatilities. In such cases, a notional-weighted portfolio’s risk may be dominated by a few assets or asset classes with higher volatilities. Nevertheless, we include comparisons to notional-weighted portfolios because they are the standard benchmark in the academic literature, are close to the 1/N portfolio, and are still used by some investors in practice (e.g., many investors who are benchmarked to a 60/40 stock/bond portfolio). We also considered the equal-volatility-weighted TSMOM factor with notional positions given by

$$x_t^{\text{TSMOM, equal-volatility-weighted}} = \frac{1}{n_t}\,\frac{40\%}{\sigma_t^i}\,\mathrm{sign}\!\left(r^i_{t-12,t}\right). \tag{A2}$$

Baltas (2015) and Yang et al. (2019) considered equal-risk-contribution TSMOM portfolios by extending the concept of “equal risk contribution” to long–short portfolios. In contrast, our anchor portfolio simply targets equal standalone volatility in each asset, thus matching the Moskowitz et al. (2012) implementation.

Random Matrix Theory

The subsection “Problem Portfolios” in the section “Identifying the Problem with Standard Optimization” shows that errors in the estimated risk model lead to problems for MVO. Specifically, small eigenvalues of the variance–covariance matrix give rise to “problem portfolios.” These problem portfolios may be accommodated by stabilizing the correlation matrix, but what is the best way to do this?

The subsection “Shrinking Correlations: The Simple EPO” discusses a simple way to stabilize risk, namely, by shrinking correlations toward zero. How do we choose the shrinkage parameter, q? One approach is to choose a parameter that works well empirically (by looking at past data), but one can also use random matrix theory to derive an asymptotically optimal choice (Ledoit and Wolf 2004). Furthermore, RMT can be used to derive more general forms of stabilized correlation matrices, such as a nonlinear shrinkage of the eigenvalues (see El Karoui 2008; Ledoit and Wolf 2017; Bun et al. 2017; and the references in them).

Whereas standard statistics relies on estimates to be close to the true values when the number of time periods, T, is large, RMT, instead, deals with the “big data” environment of modern financial markets, that is, when we have large values of both the number of securities, n, and the number of time periods. Specifically, RMT considers what happens when T → ∞ and n → ∞ such that n/T → q, where the number q is typically in (0,1). In practice, this aspect of the theory means that we can learn a lot about a variance–covariance matrix simply from knowing the ratio of the number of securities to the number of time periods used for estimation.

In line with our analysis in “Shrinking Correlations: The Simple EPO,” RMT is focused on the eigenvalues of the matrix. A basic result is that if all returns are independent across securities and over time, then the asymptotic distribution of the eigenvalues is known explicitly and is given by Marčenko and Pastur (1967). As shown in the example in Figure A1, the Marčenko–Pastur distribution fits the distribution of the observed eigenvalues well even in a single sample. This characteristic is called the self-averaging property of random matrices.

Of course, security returns from real financial data are not independent, so the distribution of eigenvalues from real data does not closely fit Marčenko and Pastur (1967). The point is that on the basis of Marčenko and Pastur, we know what random noise in eigenvalues looks like. In particular, the “bulk” of small eigenvalues inside the Marčenko–Pastur distribution are probably just noise, whereas the larger eigenvalues outside the bulk are more likely to reflect true common return factors. Interestingly, we can talk about a specific “bulk” because the Marčenko–Pastur distribution is concentrated on a bounded interval; this distribution is very different from the


Figure A1. Distribution of Eigenvalues for Independent Securities

[Histogram of estimated eigenvalues (horizontal axis, 0 to 3.0) with the Marčenko–Pastur distribution overlaid (vertical axis, 0 to 1.0). Series: Eigenvalues, Marčenko–Pastur Distribution.]

Notes: This figure is a histogram of the eigenvalues for the correlation matrix for 1,000 securities
with returns simulated over 2,000 days, where the returns are assumed to be independently
and identically distributed (i.i.d.) normal. The true eigenvalues are all 1 for a correlation matrix of
independent securities, but estimation noise creates randomness (smaller and larger estimated
eigenvalues), which is well captured by Marčenko and Pastur (1967).
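
The experiment described in these notes can be reproduced in miniature with the sketch below. It assumes the standard Marčenko–Pastur density for a ratio q = n/T in (0, 1), with support between (1 − √q)² and (1 + √q)², and uses a smaller sample (200 securities, 400 days) than the figure so that it runs quickly. The function name, the seed, and the sample size are choices made for illustration.

```python
import numpy as np

def marchenko_pastur_pdf(d, q):
    """Marchenko-Pastur density for ratio q = n/T in (0, 1);
    zero outside the interval [(1 - sqrt(q))^2, (1 + sqrt(q))^2]."""
    q_minus = (1.0 - np.sqrt(q)) ** 2
    q_plus = (1.0 + np.sqrt(q)) ** 2
    inside = np.maximum((q_plus - d) * (d - q_minus), 0.0)
    return np.sqrt(inside) / (2.0 * np.pi * q * d)

# Simulate i.i.d. normal returns: T days by n securities (q = 0.5 as in the figure)
n, T = 200, 400
q = n / T
rng = np.random.default_rng(0)
returns = rng.standard_normal((T, n))

corr = np.corrcoef(returns, rowvar=False)      # n-by-n estimated correlation matrix
eigenvalues = np.linalg.eigvalsh(corr)

q_minus = (1.0 - np.sqrt(q)) ** 2
q_plus = (1.0 + np.sqrt(q)) ** 2

# Numerical check that the density integrates to roughly one (trapezoid rule)
grid = np.linspace(q_minus, q_plus, 20001)
pdf = marchenko_pastur_pdf(grid, q)
total_mass = np.sum((pdf[:-1] + pdf[1:]) / 2.0) * (grid[1] - grid[0])
```

Although every true eigenvalue equals 1, the estimated eigenvalues spread out over roughly the interval between `q_minus` and `q_plus`; a histogram of `eigenvalues` plotted against `marchenko_pastur_pdf` gives a picture of the same kind as Figure A1.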

normal distribution that we are used to seeing as a limiting distribution in standard statistics.

RMT offers various methods to “clean” the correlation matrix in the following two steps. First, we replace the estimated eigenvalues, $(D^1, \ldots, D^n)$, with cleaned eigenvalues $(\tilde{D}^1, \ldots, \tilde{D}^n)$ while, typically, leaving the eigenvectors, P, unchanged.23 For this cleaning of eigenvalues, we focus here on the “IWs” method described in Bun et al. (2017), which is essentially the same as the RIE (rotationally invariant estimator) method described in Box 1 of Bun et al. (2016), with the extra steps of sorting the cleaned eigenvalues by size (to ensure that the ordering of the cleaned eigenvalues matches that of the original eigenvalues) and rescaling the cleaned eigenvalues to ensure that their sum matches that of the original eigenvalues. Then, we recover the cleaned correlation matrix as $\tilde{\Omega} = P\tilde{D}P^{-1}$ and the cleaned variance–covariance matrix as $\tilde{\Sigma} = \sigma\tilde{\Omega}\sigma$.

To understand Marčenko and Pastur (1967) in more detail, we start with the estimated correlation matrix, Ω, for n i.i.d. random returns observed over T time periods:

$$\Omega^{ij} = \frac{1}{T}\sum_{t=1}^{T}\left(\frac{r_t^i - \bar{r}^i}{\sigma^i}\right)\left(\frac{r_t^j - \bar{r}^j}{\sigma^j}\right), \tag{A3}$$

where $\bar{r}^i$ is the average return of security i and $\sigma^i$ is the standard deviation of the return. In standard “frequentist statistics,” we then let the number of time periods go to infinity, concluding that the estimated correlation matrix converges to the population counterpart (and having access to the central limit theorem).

RMT, instead, considers the limit T, n → ∞ such that n/T → q. The remarkable result is that the empirical distribution of eigenvalues of Ω converges to the Marčenko–Pastur distribution. When ratio q satisfies q ∈ (0, 1), then density f of the Marčenko–Pastur distribution is given by

$$f(d) = \frac{\sqrt{(q_+ - d)(d - q_-)}}{2\pi q d} \tag{A4}$$


for $d \in (q_-, q_+)$, and otherwise, f(d) = 0, where $q_- = (1 - \sqrt{q})^2$ and $q_+ = (1 + \sqrt{q})^2$. A slightly more complicated result holds for q > 1. This density is plotted in Figure A1 together with a histogram of estimated eigenvalues. This is a form of central limit theorem for RMT.

Appendix B. Proofs

Proof of Proposition 1

This model yields the following posterior mean for μ:

$$
\begin{aligned}
E(\mu \mid s) &= E(\mu) + \mathrm{cov}(\mu, s)\,\mathrm{var}(s)^{-1}(s - E(s)) \\
&= \gamma\Sigma a + \tau\Sigma(\tau\Sigma + \Lambda)^{-1}(s - \gamma\Sigma a) \\
&= \Sigma(\tau\Sigma + \Lambda)^{-1}\tau s + \gamma\Sigma\left[I - (\tau\Sigma + \Lambda)^{-1}\tau\Sigma\right]a \\
&= \Sigma(\tau\Sigma + \Lambda)^{-1}\tau s + \gamma\Sigma(\tau\Sigma + \Lambda)^{-1}\Lambda a \\
&= \Sigma(\tau\Sigma + \Lambda)^{-1}(\tau s + \gamma\Lambda a),
\end{aligned}
$$

where the first equality is due to the standard formula for conditional means of normally distributed random variables (or, equivalently, the standard ordinary least-squares, OLS, formula for regressing μ on s) and the fourth equality uses the Woodbury matrix identity.24

Proof of Proposition 2

We first solve the minimization problem inside Equation 15. For this, consider the Lagrangian:

$$L = (x - a)'\mu + l\left[(\mu - s)'\Lambda^{-1}(\mu - s) - c^2\right],$$

where l is the Lagrange multiplier. Differentiating with respect to μ, we get the first-order condition:

$$0 = (x - a) + 2l\Lambda^{-1}(\mu - s),$$

so $\mu = s - \frac{1}{2l}\Lambda(x - a)$. Choosing l so that the constraint specifying the uncertainty region is satisfied with equality, we see that the solution to the minimization problem is

$$\mu = s - \frac{c}{\sqrt{(x - a)'\Lambda(x - a)}}\,\Lambda(x - a).$$

Based on this solution to the minimization problem, we can write the robust portfolio problem in the following way:

$$\max_x \left\{ (x - a)'s - \frac{\gamma}{2}x'\Sigma x - c\sqrt{(x - a)'\Lambda(x - a)} \right\}.$$

Given that c can be chosen freely, the set of solutions (as we vary the parameter c) is the same as the set of solutions where we drop the square root (see Lemma 1 below). Further, for consistency with the other sections, we replace the parameter c by the parameter τ (which we put in the denominator) and drop constant terms:

$$\max_x \left\{ x's - \frac{\gamma}{2}x'\Sigma x - \frac{\gamma}{2\tau}(x - a)'\Lambda(x - a) \right\}.$$

The first-order condition is

$$0 = s - \gamma\Sigma x - \frac{\gamma}{\tau}\Lambda(x - a),$$

which yields the final solution to the robust portfolio optimization problem:

$$x = \frac{1}{\gamma}(\tau\Sigma + \Lambda)^{-1}(\tau s + \gamma\Lambda a).$$

Lemma 1

For any vector $a \in \mathbb{R}^n$ and positive definite matrices $B, C \in \mathbb{R}^{n \times n}$, the set of solutions to Problem A, $\{x_A^*(c)\}_{c \ge 0}$, equals the set of solutions to Problem B, $\{x_B^*(d)\}_{d \ge 0}$, where

Problem A: $\max_x \left( x'a - x'Bx - c\,x'Cx \right)$

Problem B: $\max_x \left( x'a - x'Bx - d\sqrt{x'Cx} \right)$.

Proof of Lemma 1

For a given c, note that the solution $x_A^*(c)$ to Problem A satisfies the first-order condition:

$$0 = a - Bx - 2cCx.$$

We wish to show that $x_A^*(c)$ also satisfies the first-order condition corresponding to Problem B for an appropriate choice of d:
appropriate choice of d:


$$0 = a - Bx - \frac{d}{\sqrt{x'Cx}}\,Cx.$$

We see that the result holds for

$$d = 2c\sqrt{(x_A^*(c))'\,C\,x_A^*(c)}.$$

Similarly, for any given $d$ with corresponding solution $x_B^*(d)$ to Problem B, we see that this vector is also a solution to Problem A when we let

$$c = \frac{d}{2\sqrt{(x_B^*(d))'\,C\,x_B^*(d)}}.$$

Proof of Proposition 3

Parts 1 and 2 are clear. Regarding part 3, the derivation is shown in “Anchoring Expected Returns: A Bayesian Approach.” Regarding the relation to Black and Litterman (1992), we use the superscript BL to indicate their notation. With the relations that $\Pi^{BL} = \gamma\Sigma a$, $Q^{BL} = s$, $P^{BL} = I$, $\Omega^{BL} = \Lambda$, $\Sigma^{BL} = \Sigma$, and $\tau^{BL} = \tau$, their expression in point 8 of their appendix can be shown to equal our expression for the conditional mean:

$$\begin{aligned}
E(\mu \mid s) &= \Sigma(\tau\Sigma + \Lambda)^{-1}(\tau s + \gamma\Lambda a) \\
&= (\tau I + \Lambda\Sigma^{-1})^{-1}(\tau s + \gamma\Lambda a) \\
&= (\tau\Lambda^{-1} + \Sigma^{-1})^{-1}\Lambda^{-1}(\tau s + \gamma\Lambda a) \\
&= (\tau\Lambda^{-1} + \Sigma^{-1})^{-1}(\tau\Lambda^{-1}s + \gamma a) \\
&= \left[(\tau\Sigma)^{-1} + \Lambda^{-1}\right]^{-1}(\tau^{-1}\gamma a + \Lambda^{-1}s) \\
&= \left[(\tau^{BL}\Sigma^{BL})^{-1} + (\Omega^{BL})^{-1}\right]^{-1} \cdot \left[(\tau^{BL}\Sigma^{BL})^{-1}\Pi^{BL} + (\Omega^{BL})^{-1}Q^{BL}\right].
\end{aligned}$$

Regarding our part 4, the derivation of robust optimization is in “Anchoring Expected Returns: Robust Optimization,” using Lemma 1, which is stated and proved in this appendix.

Regarding part 5, note that a ridge regression is a method used to mitigate noise and collinearity in a regression setting. Specifically, consider the regression $y = z\beta + \varepsilon$, where $\beta$ is the vector of regression coefficients. The ridge regression chooses the $\beta$ that minimizes the sum of squared errors plus a scalar, say $\lambda$, times the sum of squared regression coefficients, $(y - z\beta)'(y - z\beta) + \lambda\beta'\beta$. The solution is $\hat{\beta}^{\text{ridge}} = (z'z + \lambda I)^{-1}z'y$, so we see that the symmetric matrix $z'z$ is being pushed toward the identity matrix $I$, ensuring invertibility. So, if expected returns (summarized by $s$) are estimated in a regression, then a ridge regression can be used to stabilize the parameter estimates. This is related to, but somewhat different from, the stabilization of the optimization behind the EPO solution.

To see the direct relation to EPO, recall that we seek to solve the first-order condition for the optimal portfolio problem (Equation 3), $s = \gamma\Sigma x$. That is, we need to solve for the optimal portfolio $x$ based on the noisy data on $\Sigma$ and $s$. We rewrite this equation as $\frac{1}{\gamma}\Sigma^{-1/2}s = \Sigma^{1/2}x + \varepsilon$, introducing an error term $\varepsilon$ in order to interpret this equation as a regression (and to indicate that we are willing to accept that the equation does not hold with equality, in exchange for robustness).25 We interpret the left-hand side as the dependent variable in a regression and the right-hand side as the independent variable multiplied by the “regression coefficient” $x$. The ridge regression estimator is $x = \frac{1}{\gamma}(\Sigma + \lambda I)^{-1}s$, which is closely related to the EPO solution.

The Tikhonov regularization introduces a matrix $\Gamma$ (instead of the multiple of the identity matrix, $\lambda I$) and minimizes $(y - z\beta)'(y - z\beta) + \beta'\Gamma'\Gamma\beta$ with solution $\hat{\beta}^{\text{Tikhonov}} = (z'z + \Gamma'\Gamma)^{-1}z'y$. In our context, we can use the same regression as above with $\Gamma = \sqrt{\lambda}\,\sigma$, which yields $x = \frac{1}{\gamma}(\Sigma + \lambda V)^{-1}s$ using $V = \sigma'\sigma$. This solution is proportional to the simple EPO; that is, it is the simple EPO solution with a different risk aversion.

Next, consider the Lavrentiev regularization (which is a generalized version of the Tikhonov regularization when $z$ is symmetric and positive definite), which generally solves $y = z\beta + \varepsilon$ by choosing $\beta$ in order to minimize $\|z\beta - y\|^2_{z^{-1}} + \|\beta - \beta_0\|^2_Q$, where the norm is defined as $\|x\|^2_Q = x'Qx$, $Q$ is a symmetric matrix, and $\beta_0$ is a base-case parameter choice. The solution is $\hat{\beta}^{\text{Lavrentiev}} = (z + Q)^{-1}(y + Q\beta_0)$. Next, consider this regularization for the regression $\frac{1}{\gamma}s = \Sigma x + \varepsilon$, where again, we are solving for $x$, letting the anchor portfolio $a$ play the role of $\beta_0$ and $\frac{1}{\tau}\Lambda$ play the role of $Q$. Then, we have

$$x = \left(\Sigma + \frac{1}{\tau}\Lambda\right)^{-1}\left(\frac{1}{\gamma}s + \frac{1}{\tau}\Lambda a\right),$$

which is exactly equal to the EPO portfolio.

Lastly, consider the regression of a vector of ones, $\mathbf{1}$, on a matrix, $R$, of realized excess returns for all $n$ assets over $T$ time periods, $\mathbf{1} = Rx + \varepsilon$. As pointed out by Britten-Jones (1999), the OLS estimate, $x = \left(\frac{1}{T}R'R\right)^{-1}\frac{1}{T}R'\mathbf{1}$, is the standard MVO when we view the average realized return, $\frac{1}{T}R'\mathbf{1}$, as the signal about expected returns and the realized second moment, $\frac{1}{T}R'R$, as the variance estimate. If we use the Tikhonov regularization with $\Gamma = \sqrt{\lambda T}\,\sigma$, we get $x = \left(\frac{1}{T}R'R + \lambda V\right)^{-1}\frac{1}{T}R'\mathbf{1}$, which is the simple EPO under the stated assumptions.

Editor’s Note
Submitted 13 August 2020
Accepted 17 November 2020 by Stephen J. Brown
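The regression interpretations in part 5 of the proof of Proposition 3 can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: the simulated returns, the equal-weight anchor, and the values chosen for $\gamma$, $\lambda$, and $\tau$ are illustrative assumptions, and the helper names (`ridge`, `tikhonov`, `lavrentiev`, `sym_sqrt`) are hypothetical. It compares each regularized regression estimator with the corresponding closed-form portfolio stated in the proof.

```python
import numpy as np

def ridge(z, y, lam):
    """Ridge estimator (z'z + lam*I)^{-1} z'y."""
    return np.linalg.solve(z.T @ z + lam * np.eye(z.shape[1]), z.T @ y)

def tikhonov(z, y, G):
    """Tikhonov estimator (z'z + G'G)^{-1} z'y for a penalty matrix G."""
    return np.linalg.solve(z.T @ z + G.T @ G, z.T @ y)

def lavrentiev(z, y, Q, beta0):
    """Lavrentiev estimator (z + Q)^{-1} (y + Q beta0), z symmetric positive definite."""
    return np.linalg.solve(z + Q, y + Q @ beta0)

def sym_sqrt(m):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(w)) @ v.T

# Illustrative inputs (assumptions, not data from the article).
rng = np.random.default_rng(0)
n, T = 4, 500
R = rng.normal(0.01, 0.05, size=(T, n))   # toy realized excess returns
Sigma = np.cov(R, rowvar=False)           # variance-covariance estimate
V = np.diag(np.diag(Sigma))               # diagonal matrix of variances
sigma = np.sqrt(V)                        # diagonal matrix of volatilities, V = sigma'sigma
s = R.mean(axis=0)                        # signal about expected returns
a = np.full(n, 1.0 / n)                   # anchor portfolio (equal weights, an assumption)
gamma, lam, tau = 3.0, 0.5, 2.0
Lam = lam * V                             # Lambda = lambda * V

# Regression (1/gamma) Sigma^{-1/2} s = Sigma^{1/2} x + eps.
half = sym_sqrt(Sigma)
y = np.linalg.solve(half, s) / gamma
x_ridge = ridge(half, y, lam)
x_tik = tikhonov(half, y, np.sqrt(lam) * sigma)

# Lavrentiev regularization of (1/gamma) s = Sigma x + eps.
x_lav = lavrentiev(Sigma, s / gamma, Lam / tau, a)
# Gradient of the Lavrentiev objective, which should vanish at the solution.
grad_lav = 2 * (Sigma @ x_lav - s / gamma) + 2 * (Lam / tau) @ (x_lav - a)

# Closed forms stated in the proof.
x_ridge_cf = np.linalg.solve(Sigma + lam * np.eye(n), s) / gamma
x_tik_cf = np.linalg.solve(Sigma + lam * V, s) / gamma
x_epo_cf = np.linalg.solve(Sigma + Lam / tau, s / gamma + Lam @ a / tau)

# Regression of a vector of ones on realized returns (Britten-Jones 1999).
ones = np.ones(T)
x_ols, *_ = np.linalg.lstsq(R, ones, rcond=None)
x_mvo = np.linalg.solve(R.T @ R / T, R.T @ ones / T)
x_reg = tikhonov(R, ones, np.sqrt(lam * T) * sigma)
x_reg_cf = np.linalg.solve(R.T @ R / T + lam * V, R.T @ ones / T)

for name, lhs, rhs in [("ridge", x_ridge, x_ridge_cf), ("Tikhonov", x_tik, x_tik_cf),
                       ("Lavrentiev", x_lav, x_epo_cf), ("OLS vs MVO", x_ols, x_mvo),
                       ("regularized OLS", x_reg, x_reg_cf)]:
    print(name, np.allclose(lhs, rhs))
```

The symmetric square root is used for $\Sigma^{1/2}$ so that $z'z = \Sigma$ and $z'y = \frac{1}{\gamma}s$ hold exactly; a Cholesky factor with the matching transpose convention would serve equally well.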

Notes
1. A large literature has addressed estimation noise: for example, Ledoit and Wolf (2003, 2004) on noise in variance–covariance matrices and Black and Litterman (1992) on noise in expected returns.

2. Note that this result is not simply the same as saying that averaging portfolios improves performance (as shown by Tu and Zhou 2011). We found that EPO can work even better. For example, if we first compute the standard MVO portfolio without shrinkage, $x_{w=0}$, and the solution with full shrinkage, $x_{w=1}$, and then take the average of these, $\alpha x_{w=0} + (1 - \alpha)x_{w=1}$, the result does not work as well as our EPO method for any $\alpha$, especially if the MVO is particularly ill behaved. The EPO method first shrinks and then optimizes, not the other way around, which is useful because shrinking the correlations stabilizes the optimization process.

3. We unify several leading approaches to optimization, but EPO obviously does not nest all methods. Roncalli (2013) and Bruder, Gaussel, Richard, and Roncalli (2013) reviewed various methods of regularizing MVO, including a discussion of the eigendecomposition of the variance–covariance matrix similar to our problem portfolios, showing that the risk of these portfolios is low. We additionally show that the expected return of problem portfolios is too high (see Panel B of Figure 1) and that large EPO shrinkage can help address both these problems. DeMiguel et al. (2009), considering 14 methods of optimization, found that none consistently outperformed the simple 1/N portfolio. Some methods do show promise in outperforming the 1/N portfolio, however, such as methods that constrain the portfolio norm (Jagannathan and Ma 2003; DeMiguel, Garlappi, Nogales, and Uppal 2009), methods based on ambiguity aversion (Garlappi, Uppal, and Wang 2007), methods that average several approaches (Tu and Zhou 2011), and methods that apply careful MVO with good inputs (Allen, Lizieri, and Satchell 2019).

4. Although a version of EPO can be shown to be equivalent to Black and Litterman (1992), there are several differences. Indeed, Black and Litterman always shrank toward the market portfolio, whereas we consider a general anchor (or no anchor); they considered long–short “view portfolios,” whereas we simply consider signals about expected returns, such as industry momentum or time-series momentum, and we allow “double shrinkage” of both the estimated expected returns and the variance–covariance matrix. Most importantly, our contribution is to unify this approach with other optimization methods by showing the link to correlation shrinkage (which is not clear from the equations in Black and Litterman, p. 42), by presenting a simple, new, and powerful way to operationalize the method, and by documenting empirically how it works.

5. Appendix A describes a method to stabilize the risk model, called “random matrix theory” (RMT), that is more sophisticated than shrinking correlations. We have found empirically, however, that EPO works as well with simple correlation shrinkage as with RMT.

6. The variance of $\eta$ is proportional to $\Sigma$ in order to capture the idea that true fluctuations in expected returns are correlated across correlated assets (similar to the assumption made in Point 7 of the appendix of Black and Litterman 1992). Expressed in a different way, the PC portfolios have expected returns $P'\sigma^{-1}\mu = \gamma P'\sigma^{-1}\Sigma a + P'\sigma^{-1}\eta$, where the random fluctuation term, $P'\sigma^{-1}\eta$, has variance $\tau P'\sigma^{-1}\Sigma\sigma^{-1}P = \tau D$, implying that the expected returns of the least important principal components vary the least.

7. To understand the anchor at a deeper level, consider again the case of $\eta = 0$. In this case, the expected excess return on any asset, say, asset number 1, is $E(r_1) = \gamma(1, 0, \ldots, 0)\Sigma a = \gamma\,\mathrm{cov}(r_1, r_a \mid s)$. Using this relationship for anchor portfolio $a$ and solving for $\gamma = E(r_a)/\mathrm{var}(r_a \mid s)$, we get $E(r_1) = \left[\mathrm{cov}(r_1, r_a \mid s)/\mathrm{var}(r_a \mid s)\right]E(r_a) =: \beta_{1,a}E(r_a)$. If $a$ is the market portfolio, this relationship is simply the conditional capital asset pricing model (CAPM). Hence, Equation 12 defining $\mu$ means that the CAPM holds, on average, but $\eta$ pushes the expected returns around in such a way that the CAPM does not always hold exactly, resulting in trading opportunities. More generally, Equation 12 says that the anchor is the tangency portfolio when there are no shocks ($\eta = 0$).

8. To our knowledge, the specification of Equation 15 and its solution is new, but Fabozzi et al. (2010) considered a version of Equation 15 that is simpler in two ways: First, whereas we consider a general $\Lambda$, Fabozzi et al. assumed that $\Lambda$ equals $\Sigma$, which means that there is no shrinkage of the variance–covariance matrix, and second, Fabozzi et al. did not have an anchor portfolio.

9. The assumption of independence of errors in the expected returns across securities, $\Lambda = \lambda V$, implies that the error in the measurement of the expected return of the principal components has a variance given by $P'\sigma^{-1}(\lambda V)\sigma^{-1}P = \lambda I$,


where $I$ is the identity matrix. That is, errors of all the principal components are independent and of equal magnitude.

10. Alternatively, we can think of the anchor being $a = 0$, which gives the same result as Equation 20 up to a constant that can be absorbed in the risk aversion coefficient. However, we think of the anchor as also being the EPO portfolio with full shrinkage, $w = 1$, implying that $a = (1/\gamma)V^{-1}s$ is the more natural interpretation of Equation 20.

11. Investors can also avoid specifying $\gamma$ altogether by solving an equivalent optimization that maximizes expected returns subject to a maximum volatility constraint, thus specifying a volatility target in lieu of $\gamma$.

12. Choosing $\gamma$ may be done in several other, related ways, some of which work better than others. For example, although $\gamma$ in Equation 22 equalizes the variance of the anchor with that of $(1/\gamma)\Sigma_w^{-1}s$, one could also replace the latter with the variance of the standard MVO solution, $(1/\gamma)\Sigma^{-1}s$, but this is a poor choice if the standard MVO is ill behaved. Ao et al. (2019) and Raponi et al. (2020) also considered methods where $\gamma$ is based on variance.

13. Specifically, the general EPO is the solution to a Lavrentiev regularization (Lavrentiev 1967), and the simple EPO is the solution to a Tikhonov regularization. The simple EPO can also be seen as a ridge regression of a vector of 1s on the matrix of realized returns when risk and expected returns are estimated by their sample counterparts.

14. If we had included non-USD currency pairs, then the variance–covariance matrix would not be of full rank because, for example, EUR–USD, EUR–JPY, and USD–JPY are linked through a triangular arbitrage.

15. Available at https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html.

16. The annualized variance of instrument $i$ was estimated as $(\sigma_t^i)^2 = 261\sum_{k=0}^{\infty}(1 - \delta)\delta^k\left(r^i_{t-1-k} - \bar{r}^i_t\right)^2$, where $\bar{r}^i_t$ is the exponentially weighted average return computed similarly, 261 annualizes the daily returns, and $\delta$ was chosen to achieve a center of mass of $\sum_{k=0}^{\infty}(1 - \delta)\delta^k k = \delta/(1 - \delta) = 60$ days. The correlations were estimated by first computing covariance and volatilities in the corresponding way, using 3-day returns with 150-day center of mass, and then computing the correlations as ratios of the covariances to the product of the volatilities. We required at least 300 days of data to be available for an asset before it entered the covariance matrix.

17. In other words, the covariance of assets $i$ and $j$ is estimated as $\frac{1}{K - 1}\sum_{k=1}^{K}\left(r^i_{t-k} - \bar{r}^i_t\right)\left(r^j_{t-k} - \bar{r}^j_t\right)$.

18. Some studies have considered longer time horizons (for example, past five-year returns). Past long-term returns, however, predict returns negatively, if at all, perhaps because securities that have risen in price over a long time have become expensive (De Bondt and Thaler 1985). Alas, comparing optimization methods using a faulty signal of expected returns is not informative.

19. Babu et al. (2020) reported a median time-series momentum Sharpe ratio per asset of 0.34 per year (i.e., 0.10 per month) for traditional assets.

20. Indeed, this coefficient implies that the EPO portfolio with full shrinkage, $\mathrm{EPO}^s(w = 100\%) = (1/\gamma_t)V_t^{-1}s_t$, has a notional exposure to asset $i$ that matches that of Moskowitz et al. (2012) given in Appendix A. That is, $\mathrm{EPO}^s(w = 100\%)_i = (1/\gamma_t)\left[s_t^i/(\sigma_t^i)^2\right] = (1/n_t)\left(40\%/\sigma_t^i\right)\mathrm{sign}\left(r^i_{t-12,t}\right)$.

21. Because the number of assets in our sample varied over time, we scaled the realized and ex ante average returns and volatilities to preserve the trace of the correlation matrix, that is, ensuring that the sum of variances would equal the largest number of assets in our sample, 55.

22. From 1963 to 2018, the five Fama–French factors realized Sharpe ratios between 0.27 and 0.49 and the equal-weighted portfolio of all five factors realized a Sharpe ratio of 0.93.

23. Estimates of the eigenvectors are kept equal to the sample eigenvectors to make the estimate of the correlation matrix rotational invariant, meaning that rotating the data by some orthogonal matrix rotates the estimator in the same way (see Ledoit and Wolf 2012; Bun et al. 2017).

24. The Woodbury matrix identity shows a way to rewrite the inverse of a sum of matrices and, using the Woodbury formula, we see that

$$I - (\tau\Sigma + \Lambda)^{-1}\tau\Sigma = \left[I + \Lambda^{-1}\tau\Sigma\right]^{-1} = \left[\Lambda^{-1}(\Lambda + \tau\Sigma)\right]^{-1} = (\tau\Sigma + \Lambda)^{-1}\Lambda.$$

25. We can also write the regression in a simpler way, $\frac{1}{\gamma}s = \Sigma x + \varepsilon$, as we do when we consider the Lavrentiev regularization. When we use the standard ridge regression on this simpler equation, we get $x = \frac{1}{\gamma}\left(\Sigma^2 + \lambda I\right)^{-1}\Sigma s$, so we have written the regression differently to avoid the $\Sigma$-squared.
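Two of the formulas in these notes lend themselves to a quick numerical illustration: the exponentially weighted variance estimator in Note 16 and the chain of equalities in Note 24. The sketch below is only an illustration under stated assumptions. The function names are hypothetical, the return series and matrices are simulated, and the infinite sum in Note 16 is truncated at the sample length with the weights renormalized, which is a choice made here rather than something the notes specify.

```python
import numpy as np

def delta_from_center_of_mass(com):
    """Solve delta / (1 - delta) = com for the decay parameter delta."""
    return com / (1.0 + com)

def ewma_annualized_variance(returns, com=60.0, periods_per_year=261):
    """Annualized exponentially weighted variance of one daily return series.

    `returns` is ordered oldest to newest. The infinite sum is truncated at the
    sample length and the weights are renormalized (an assumption of this sketch).
    """
    r = np.asarray(returns, dtype=float)
    delta = delta_from_center_of_mass(com)
    k = np.arange(len(r))[::-1]            # k = 0 is the most recent return
    w = (1.0 - delta) * delta ** k
    w /= w.sum()                           # renormalize the truncated weights
    mean = np.sum(w * r)                   # exponentially weighted average return
    return periods_per_year * np.sum(w * (r - mean) ** 2)

def woodbury_sides(Sigma, Lam, tau):
    """Return the first, second, and last expressions in the Note 24 equalities."""
    I = np.eye(Sigma.shape[0])
    first = I - np.linalg.solve(tau * Sigma + Lam, tau * Sigma)
    second = np.linalg.inv(I + np.linalg.solve(Lam, tau * Sigma))
    third = np.linalg.solve(tau * Sigma + Lam, Lam)
    return first, second, third

# Simulated inputs (illustrative assumptions only).
rng = np.random.default_rng(1)
daily = rng.normal(0.0, 0.01, size=1000)           # toy daily returns with 1% volatility
vol = np.sqrt(ewma_annualized_variance(daily))     # annualized volatility estimate

A = rng.normal(size=(5, 5))
Sigma = A @ A.T + 5 * np.eye(5)                    # a positive definite "covariance" matrix
Lam = np.diag(np.diag(Sigma))                      # a diagonal Lambda, as in Lambda = lambda * V
first, second, third = woodbury_sides(Sigma, Lam, tau=2.0)

print(vol, np.allclose(first, second), np.allclose(second, third))
```

With a center of mass of 60 days, `delta_from_center_of_mass` gives $\delta = 60/61$, which follows directly from $\delta/(1 - \delta) = 60$.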

References
Allen, D., C. Lizieri, and S. Satchell. 2019. “In Defense of Portfolio Optimization: What If We Can Forecast?” Financial Analysts Journal 75 (3): 20–38.

Ao, Mengmeng, Yingying Li, and Xinghua Zheng. 2019. “Approaching Mean–Variance Efficiency for Large Portfolios.” Review of Financial Studies 32 (7): 2890–919.


Asness, C., T. Moskowitz, and L. H. Pedersen. 2013. “Value and Momentum Everywhere.” Journal of Finance 68 (3): 929–85.

Babu, Abhilash, Ari Levine, Yao Hua Ooi, Lasse Heje Pedersen, and Erik Stamelos. 2020. “Trends Everywhere.” Journal of Investment Management 18 (1): 52–68.

Baltas, N. 2015. “Trend-Following, Risk-Parity and the Influence of Correlations.” In Risk-Based and Factor Investing, edited by Emmanuel Jurczenko, 65–95. Amsterdam: Elsevier.

Baltas, N., and R. Kosowski. 2020. “Demystifying Time-Series Momentum Strategies: Volatility Estimators, Trading Rules and Pairwise Correlations.” In Market Momentum: Theory and Practice, edited by Stephen Satchell and Andrew Grant, 30–67. Hoboken, NJ: Wiley.

Black, Fischer, and Robert Litterman. 1992. “Global Portfolio Optimization.” Financial Analysts Journal 48 (5): 28–43.

Britten-Jones, Mark. 1999. “The Sampling Error in Estimates of Mean–Variance Efficient Portfolio Weights.” Journal of Finance 54 (2): 655–71.

Bruder, Benjamin, Nicolas Gaussel, Jean-Charles Richard, and Thierry Roncalli. 2013. “Regularization of Portfolio Allocation.” (June). Available at SSRN: https://ssrn.com/abstract=2767358 or http://dx.doi.org/10.2139/ssrn.2767358.

Bun, Joël, Jean-Philippe Bouchaud, and Marc Potters. 2016. “Cleaning Correlation Matrices.” Risk.net (29 March).

Bun, Joël, Jean-Philippe Bouchaud, and Marc Potters. 2017. “Cleaning Large Correlation Matrices: Tools from Random Matrix Theory.” Physics Reports 666: 1–109.

Clarke, R. G., H. de Silva, and S. Thorley. 2006. “Minimum–Variance Portfolios in the U.S. Equity Market.” Journal of Portfolio Management 33 (1): 10–24.

De Bondt, Werner F. M., and Richard Thaler. 1985. “Does the Stock Market Overreact?” Journal of Finance 40 (3): 793–805.

DeMiguel, Victor, Lorenzo Garlappi, Francisco J. Nogales, and Raman Uppal. 2009. “A Generalized Approach to Portfolio Optimization: Improving Performance by Constraining Portfolio Norms.” Management Science 55 (5): 798–812.

DeMiguel, Victor, Lorenzo Garlappi, and Raman Uppal. 2009. “Optimal versus Naive Diversification: How Inefficient Is the 1/N Portfolio Strategy?” Review of Financial Studies 22 (5): 1915–53.

El Karoui, Noureddine. 2008. “Spectrum Estimation for Large Dimensional Covariance Matrices Using Random Matrix Theory.” Annals of Statistics 36 (6): 2757–90.

Elton, Edwin J., Martin J. Gruber, and Jonathan Spitzer. 2006. “Improved Estimates of Correlation Coefficients and Their Impact on Optimum Portfolios.” European Financial Management 12 (3): 303–18.

Fabozzi, Frank J., Dashan Huang, and Guofu Zhou. 2010. “Robust Portfolios: Contributions from Operations Research and Finance.” Annals of Operations Research 176 (1): 191–220.

Fama, Eugene F., and Kenneth R. French. 1993. “Common Risk Factors in the Returns on Stocks and Bonds.” Journal of Financial Economics 33 (1): 3–56.

Fama, Eugene F., and Kenneth R. French. 2015. “A Five-Factor Asset Pricing Model.” Journal of Financial Economics 116 (1): 1–22.

Fan, Jianqing, Yingying Fan, and Jinchi Lv. 2008. “High Dimensional Covariance Matrix Estimation Using a Factor Model.” Journal of Econometrics 147 (1): 186–97.

Garlappi, Lorenzo, Raman Uppal, and Tan Wang. 2007. “Portfolio Selection with Parameter and Model Uncertainty: A Multi-Prior Approach.” Review of Financial Studies 20 (1): 41–81.

Gârleanu, Nicolae, and Lasse Heje Pedersen. 2013. “Dynamic Trading with Predictable Returns and Transaction Costs.” Journal of Finance 68 (6): 2309–40.

Gârleanu, Nicolae, and Lasse Heje Pedersen. 2016. “Dynamic Portfolio Choice with Frictions.” Journal of Economic Theory 165 (September): 487–516.

Jagannathan, Ravi, and Tongshu Ma. 2003. “Risk Reduction in Large Portfolios: Why Imposing the Wrong Constraints Helps.” Journal of Finance 58 (4): 1651–83.

Kozak, Serhiy, Stefan Nagel, and Shrihari Santosh. 2020. “Shrinking the Cross-Section.” Journal of Financial Economics 135 (2): 271–92.

Lavrentiev, M. M. 1967. Some Improperly Posed Problems of Mathematical Physics. New York: Springer.

Ledoit, Olivier, and Michael Wolf. 2003. “Improved Estimation of the Covariance Matrix of Stock Returns with an Application to Portfolio Selection.” Journal of Empirical Finance 10 (5): 603–21.

Ledoit, Olivier, and Michael Wolf. 2004. “A Well-Conditioned Estimator for Large-Dimensional Covariance Matrices.” Journal of Multivariate Analysis 88 (2): 365–411.

Ledoit, Olivier, and Michael Wolf. 2012. “Nonlinear Shrinkage Estimation of Large-Dimensional Covariance Matrices.” Annals of Statistics 40 (2): 1024–60.

Ledoit, Olivier, and Michael Wolf. 2017. “Nonlinear Shrinkage of the Covariance Matrix for Portfolio Selection: Markowitz Meets Goldilocks.” Review of Financial Studies 30 (12): 4349–88.

Marčenko, Vladimir A., and Leonid A. Pastur. 1967. “Distribution of Eigenvalues for Some Sets of Random Matrices.” Mathematics of the USSR-Sbornik 1 (4): 457–83.

Markowitz, H. 1952. “Portfolio Selection.” Journal of Finance 7 (1): 77–91.

Michaud, R. O. 1989. “The Markowitz Optimization Enigma: Is ‘Optimized’ Optimal?” Financial Analysts Journal 45 (1): 31–42.

Moskowitz, T. J., and M. Grinblatt. 1999. “Do Industries Explain Momentum?” Journal of Finance 54 (4): 1249–90.

Moskowitz, T., Y. H. Ooi, and L. H. Pedersen. 2012. “Time Series Momentum.” Journal of Financial Economics 104 (2): 228–50.

Raponi, Valentina, Raman Uppal, and Paolo Zaffaroni. 2020. “Robust Portfolio Choice.” Unpublished working paper, Imperial College Business School.

Roncalli, T. 2013. Introduction to Risk Parity and Budgeting. Boca Raton, FL: CRC Press.

Tu, Jun, and Guofu Zhou. 2011. “Markowitz Meets Talmud: A Combination of Sophisticated and Naive Diversification Strategies.” Journal of Financial Economics 99 (1): 204–15.

Yang, K., E. Qian, and B. Belton. 2019. “Protecting the Downside of Trend When It Is Not Your Friend.” Journal of Portfolio Management 45 (5): 99–111.
