Annual Review of Financial Economics

Factor Models, Machine Learning, and Asset Pricing
Stefano Giglio,1 Bryan Kelly,1,2 and Dacheng Xiu3
Annu. Rev. Financ. Econ. 2022.14:337-368. Downloaded from www.annualreviews.org

1 Yale School of Management, Yale University, New Haven, Connecticut, USA; email: [email protected], [email protected]
2 AQR Capital Management, Greenwich, Connecticut, USA
3 Booth School of Business, University of Chicago, Chicago, Illinois, USA; email: [email protected]
Access provided by 107.186.235.202 on 12/25/23. For personal use only.

Annu. Rev. Financ. Econ. 2022. 14:337–68
First published as a Review in Advance on August 8, 2022
The Annual Review of Financial Economics is online at financial.annualreviews.org
https://doi.org/10.1146/annurev-financial-101521-104735
Copyright © 2022 by Annual Reviews. All rights reserved
JEL codes: C52, C55, C58, G0, G1, G17

Keywords
asset pricing, machine learning, factor models, stochastic discount factor, risk premium

Abstract
We survey recent methodological contributions in asset pricing using factor models and machine
learning. We organize these results based on their primary objectives: estimating expected returns,
factors, risk exposures, risk premia, and the stochastic discount factor as well as model comparison
and alpha testing. We also discuss a variety of asymptotic schemes for inference. Our survey is a
guide for financial economists interested in harnessing modern tools with rigor, robustness, and
power to make new asset pricing discoveries, and it highlights directions for future research and
methodological advances.

1. INTRODUCTION
Factor models are natural workhorses for modeling equity returns because they offer a parsimo-
nious statistical description of the returns’ cross-sectional dependence structure. Return factor
models evolved from early asset pricing theories, most notably the capital asset pricing model
(CAPM) of Sharpe (1964) and the intertemporal CAPM (ICAPM) of Merton (1973). These and
other seminal factor models used observable financial and macroeconomic variables as risk factors
motivated by economic theory.
The arbitrage pricing theory (APT) of Ross (1976) later provided a rigorous economic link
between the factor structure in returns and risk premia through no-arbitrage conditions. One
important innovation in the APT was the ability to speak directly to foundational economic con-
cepts, such as risk exposures and risk premia, without requiring a specific identity or economic
interpretation for the factors. The APT’s focus on a common factor structure (that could be rep-
resented by any type of factors, whether observable or unobservable, traded or nontraded) spurred
a line of inquiry lending itself to primarily statistically oriented models of returns. In light of this,
factor models have become the single most widely adopted empirical research paradigm for aca-
demics and practitioners alike. In particular, the APT opens the door to latent factor models for
returns. Being intimately tied to unsupervised and semisupervised machine learning, the APT and
latent factor models can be viewed as catalysts for the revolution of machine learning methods in
empirical asset pricing.
Linking risk premia and the (observable or latent) factor structure requires first providing a
measurement of those quantities. Measurement issues are notoriously difficult for expected re-
turns because market efficiency forces return variation to be dominated by unforecastable news.
In addition, the sample size of equity returns is small relative to the predictor count. Structural
breaks, regime switches, and nonstationarity in general further diminish the effective sample size.
Furthermore, the collection of candidate conditioning variables is large, and such variables are
often close cousins and highly correlated. Further still, complicating the problem is ambiguity
regarding functional forms through which the high-dimensional predictor set enters into the ex-
pected returns. All these issues result in a low signal-to-noise ratio environment that affects the
measurement of risk premia and is in stark contrast to prediction problems in computer science
and other domains.
Certain aspects of the machine learning paradigm, such as variable selection and dimension
reduction, have been part of empirical asset pricing since the very beginning of this research field.
In the early days, economic theories and parsimonious model specifications were adopted in order to
regularize learning problems in financial markets. Indeed, we have become accustomed to sorting
stocks by their characteristics, forming equal or value-weighted portfolios, and selecting a small
number of portfolios as factor proxies. These choices have been made, either explicitly or implic-
itly, to cope with nonlinearity, low signal-to-noise ratios, and the curse of dimensionality, which
are difficult realities when studying asset returns.
Recent decades have seen the rapid growth of exploratory and predictive techniques proposed
by the statistics and machine learning communities. These tools complement economic theory to
provide a data-driven solution to the empirical challenges of asset pricing. Embracing these tools
enables economists to make rigorous, robust, and powerful empirical discoveries about which
economic theory alone may not be a sufficient guide. Conversely, these new discoveries can offer
new insights from data that in turn lead to improved economic theories.
Our objectives in this article are twofold. First, we survey recent methodological contributions
in empirical asset pricing. We categorize these methodologies based on their primary purposes,
which range from estimating expected returns, factors and assets’ factor exposures, risk premia, and
stochastic discount factors (SDFs) to comparing asset pricing models and testing alphas. Second,
we discuss the accompanying asymptotic theory, broken out by the focus on time series asymptotics
(large T), cross-sectional asymptotics (large N), or two-dimensional panel asymptotics (large T and
N), to help guide financial economists to the methods most appropriate for their specific research
needs. Along the way, we compare methodologies, highlight their strengths and limitations, and
point out future directions for improvement.
Throughout the survey, we use $(A : B)$ to denote the concatenation (by columns) of two matrices
$A$ and $B$. $e_i$ is a vector with 1 in the $i$th entry and 0 elsewhere whose dimension depends on the
context. $\iota_k$ denotes a $k$-dimensional vector with all entries being 1, and $I_K$ denotes the $K \times K$
identity matrix. For any time series of vectors $\{a_t\}_{t=1}^{T}$, we denote $\bar{a} = \frac{1}{T}\sum_{t=1}^{T} a_t$. In addition, we
write $\bar{a}_t = a_t - \bar{a}$. We use the capital letter $A$ to denote the matrix $(a_1 : a_2 : \ldots : a_T)$ and write
$\bar{A} = A - \bar{a}\iota_T^\top$ correspondingly. We denote $P_A = A(A^\top A)^{-1}A^\top$ and $M_A = I_K - P_A$ for some $K \times T$
matrix $A$. We use $a \vee b$ to denote the max of $a$ and $b$, and $a \wedge b$ as their min, for any scalars $a$ and
$b$. We also use the notation $a \lesssim b$ to denote $a \le Cb$ for some constant $C > 0$. Similarly, we use
$x \lesssim_P y$ to denote $x = O_P(y)$ for two random variables $x$ and $y$.
We use $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ to denote the minimum and maximum eigenvalues of $A$ and use
$\lambda_i(A)$ to denote the $i$th largest eigenvalue of $A$. Similarly, we use $\sigma_i(A)$ to denote the $i$th singular
value of $A$. We use $\|A\|_1$, $\|A\|_\infty$, $\|A\|$, and $\|A\|_F$ to denote the $L_1$ norm, the $L_\infty$ norm, the operator
norm (or $L_2$ norm), and the Frobenius norm of a matrix $A = (a_{ij})$, that is, $\max_j \sum_i |a_{ij}|$, $\max_i \sum_j |a_{ij}|$,
$\sqrt{\lambda_{\max}(A^\top A)}$, and $\sqrt{\mathrm{Tr}(A^\top A)}$, respectively. We also use $\|A\|_{\mathrm{MAX}} = \max_{i,j} |a_{ij}|$ to denote the $L_\infty$ norm
of $A$ on the vector space. Finally, we use $\mathrm{Diag}(A)$ to denote the diagonal matrix of $A$ and $A_{[I]}$, a
submatrix of $A$ whose rows are indexed in $I$.
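
For readers who work numerically, the matrix norms and the demeaning convention above map onto standard NumPy calls. The following is a minimal sketch, assuming NumPy is available; the small matrix and all variable names are hypothetical and chosen only so that the sums can be checked by hand.

```python
import numpy as np

# Hypothetical 2 x 3 matrix, chosen only to make the sums easy to verify by hand
A = np.array([[1.0, -2.0, 3.0],
              [4.0, 0.0, -1.0]])

norm_1 = np.linalg.norm(A, 1)         # max_j sum_i |a_ij|: largest absolute column sum
norm_inf = np.linalg.norm(A, np.inf)  # max_i sum_j |a_ij|: largest absolute row sum
norm_op = np.linalg.norm(A, 2)        # sqrt(lambda_max(A'A)): largest singular value
norm_fro = np.linalg.norm(A, "fro")   # sqrt(Tr(A'A))
norm_max = np.abs(A).max()            # max_{i,j} |a_ij|

# Largest eigenvalue of A'A, used to cross-check the operator norm
lam_max = np.linalg.eigvalsh(A.T @ A).max()

# Demeaning across time: a_bar is the mean of the columns, A_bar = A - a_bar * iota_T'
a_bar = A.mean(axis=1, keepdims=True)
A_bar = A - a_bar
```

Here the absolute column sums are 5, 2, and 4 and the absolute row sums are 6 and 5, so `norm_1` is 5 and `norm_inf` is 6, while each row of `A_bar` sums to zero.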

2. MODEL SPECIFICATIONS
We start by introducing a static factor model, which serves as a benchmark throughout the article.

2.1. Static Factor Models


In its simplest form, a static factor model can be written as
$$r_t = \mathrm{E}(r_t) + \beta v_t + u_t, \tag{1}$$
where $r_t$ is an $N \times 1$ vector of excess returns of test assets (e.g., size and value double-sorted
portfolios) over the risk-free rate, $\beta$ is an $N \times K$ matrix of factor exposures, $v_t$ is a $K \times 1$ vector of
(zero-mean) factor innovations, and $u_t$ is an $N \times 1$ vector of idiosyncratic errors.
The expected return can (always) be decomposed as
$$\mathrm{E}(r_t) = \alpha + \beta\gamma, \tag{2}$$
where $\gamma$ is a $K \times 1$ vector of risk premia and $\alpha$ is an $N \times 1$ vector of pricing errors. This
representation always holds, as the right-hand side has more degrees of freedom than does the left, but the
APT of Ross (1976) and follow-up work by Huberman (1982), Chamberlain & Rothschild (1983)
and Ingersoll (1984) predicts that no asymptotic arbitrage implies $\alpha^\top \Sigma_u^{-1} \alpha < \infty$ as $N$ increases,
where $\Sigma_u$ is the covariance matrix of $u_t$.
The most common framework in academic finance literature assumes that factors are known
and observable (an example would be industrial production growth, as in Chen, Roll & Ross 1986).
That is,
$$f_t = \mu + v_t, \tag{3}$$
where $\mu$ is some unknown parameter of the (population) expectation of $f_t$. If factors are tradable
portfolios (such as the Fama-French factors in Fama & French 1993, 2015), then μ = γ : The risk
premium of a tradable factor is just its expected excess return.

A second framework, which has regained popularity recently but dates back to as early as
Connor & Korajczyk (1986), assumes that all factors and their exposures are latent, which re-
laxes the somewhat restrictive assumption in the setting discussed above that all factors are known
and observable to econometricians.
A third framework assumes that factor exposures are observable, but the factors are latent. This
is arguably the most prevalent framework for practitioners and is rooted in the MSCI Barra model
originally proposed by Rosenberg (1974). The popularity of this model stems from the fact that it
conveniently accommodates time-varying exposures of individual equity returns. We turn to the
topic of time-varying exposures next.

2.2. Conditional Factor Models


One might argue that the static model in Equation 1 is suitable for certain portfolios of assets
(though even in this case the static assumption is dubious), but it is clearly inadequate for most
individual assets. Yet it is important that models are capable of describing the behavior of individual
assets, not just sorted portfolios, to more thoroughly understand the full range of heterogeneity
in asset markets. Once we begin considering individual assets, conditional model formulations
become critical.1 For example, risk exposures of individual stocks very likely change over time
as firms evolve. In addition, assets with fixed maturities and nonlinear payoff structures (e.g., bonds
and options) experience mechanical variation in their risk exposures as their maturity rolls down
or the value of the underlying asset changes (Kelly, Palhares & Pruitt 2021; Büchner & Kelly
2022). In this case, a factor model should accommodate time-varying conditional risk exposures.
In its general form, the conditional factor model is
$$\tilde{r}_t = \alpha_{t-1} + \beta_{t-1}\gamma_{t-1} + \beta_{t-1} v_t + \tilde{u}_t, \tag{4}$$
where $\tilde{r}_t$ and $\tilde{u}_t$ are $M \times 1$ vectors of excess returns and idiosyncratic errors of individual assets,
respectively. In this equation, $\beta_{t-1}\gamma_{t-1}$ is the conditional risk premium earned through exposure to
the common risk factor $v_t$, as assets earn conditional compensation of $\gamma_{t-1}$ per unit of conditional
beta on factors $v_t$. The term $\alpha_{t-1}$ includes any excess compensation an asset earns that is not
associated with factor exposure.
Obviously, the right-hand side of Equation 4 contains too many degrees of freedom, and the
model cannot be identified without additional restrictions. One example of additional restrictions
is provided by the model of Rosenberg (1974), which imposes that $\beta_{t-1} = b_{t-1}\beta$, where $b_{t-1}$
is an $M \times N$ matrix of observable characteristics, and $\beta$ is an $N \times K$ matrix of parameters. In this
case, the general form of Equation 4 becomes
$$\tilde{r}_t = b_{t-1}\tilde{f}_t + \tilde{\varepsilon}_t, \tag{5}$$
where $\tilde{f}_t := \beta(\gamma_{t-1} + v_t)$ is a new $N \times 1$ vector of latent factors, and $\tilde{\varepsilon}_t := \alpha_{t-1} + \tilde{u}_t$.2 This is
the MSCI Barra model prototype that has been embraced by practitioners for its simplicity and
versatility in modeling individual equity returns.
Barra’s model includes several dozen characteristics and industrial variables in $b_{t-1}$. Their ad
hoc selection procedure is opaque, and evidence suggests it is heavily overparameterized. When
the number of firm characteristics $N$ is large, the number of free parameters in $\{\tilde{f}_t\}$, $N \times T$,
can be large compared to sample size, and thus these parameters would be noisily estimated.

1 Even for static models, Ang, Liu & Schwarz (2020) discuss the benefits of using individual stocks rather than
portfolios when analyzing factor models. In early papers, researchers wrestled with the technical challenges
of dealing with large cross sections of test assets. Sections 3 and 4 discuss how modern methodologies exploit
large cross sections to develop tractable factor model estimators with attractive statistical properties.
2 This model also allows for additional approximation error, if any, of $\beta_{t-1}$ using $b_{t-1}\beta$ because such error can
be absorbed into $\tilde{\varepsilon}_t$ as well.

Kelly, Pruitt & Su (2019) suggest a new modeling approach known as instrumented principal
components analysis (IPCA). IPCA inherits Barra’s versatility and tractability, yet avoids its
statistical inefficiency via a built-in dimension reduction,
$$\tilde{r}_t = b_{t-1}\beta f_t + \tilde{\varepsilon}_t, \tag{6}$$
where $\beta$ and $\{f_t\}$ have $N \times K$ and $K \times T$ unknown parameters, respectively. This model of individual
asset returns has a direct link with the static model for portfolios in Equation 1. As discussed by
Giglio & Xiu (2021), if we project $b_{t-1}$ on both sides of Equation 6 at each $t$, we obtain
$$r_t := (b_{t-1}^\top b_{t-1})^{-1} b_{t-1}^\top \tilde{r}_t = \beta f_t + u_t, \quad \text{where } u_t := (b_{t-1}^\top b_{t-1})^{-1} b_{t-1}^\top \tilde{\varepsilon}_t. \tag{7}$$
This echoes the static factor model in Equation 1 (which is why we use consistent notation,
such as $\beta$, $N$, and $K$, for both models). Moreover, $(b_{t-1}^\top b_{t-1})^{-1} b_{t-1}^\top$ can be interpreted as portfolio
weights for characteristic-sorted portfolio returns, $(b_{t-1}^\top b_{t-1})^{-1} b_{t-1}^\top \tilde{r}_t$. This derivation is consistent
with the convention of estimating (static) asset pricing models using characteristic-sorted port-
folios as test assets: The intuition is that to the extent that characteristics drive risk exposures,
sorting by those characteristics removes the time variation in exposure for the sorted portfolios.
Therefore, the static portfolio representation of Equation 1 can be applied directly to portfo-
lios appropriately sorted by relevant characteristics; alternatively, individual stocks can be used
as test assets, using IPCA to explicitly account for the time-varying risk loadings related to their
characteristics.
In general, the risk premia associated with factors $\gamma_{t-1} := \mathrm{E}_{t-1}(f_t)$ could also be time varying,
but the time series path of risk premia $\{\gamma_{t-1}\}$ is not identifiable without additional restrictions.
Only $\mathrm{E}(\gamma_{t-1})$ can be identified. To recover the path of risk premia, Gagliardini, Ossola & Scaillet
(2016) employ a parametric model of risk premia, as suggested by Harvey & Ferson (1999),
$\gamma_{t-1} = z_{t-1}\theta$, where $z$ includes macro time series such as the term spread and $\theta$ is an unknown
parameter. Combined with the assumption that factor loadings are linear functions of observed
characteristics and macro time series, one can rewrite the dynamics of individual stock returns as
$$\tilde{r}_{i,t} = x_{i,t}\tilde{\beta}_i + \tilde{\varepsilon}_{i,t}, \tag{8}$$
where $\{x_{i,t}\}$ are multidimensional regressors that depend on observable factors, macro variables,
and firm characteristics, and $\{\tilde{\beta}_i\}$ contain (functions of) unknown parameters.
In essence, IPCA and related models employ a linear approximation for risk exposures based on
observable characteristics data. But there are no obvious theoretical or intuitive justifications for
the linearity assumption beyond tractability. To the contrary, there are many reasons to expect that
this assumption is violated. Essentially all leading theoretical asset pricing models predict nonlin-
earities in return dynamics as a function of state variables; Campbell & Cochrane (1999), Bansal
& Yaron (2004), Santos & Veronesi (2004), and He & Krishnamurthy (2013) provide prominent
examples.
To overcome this limitation, Connor, Hagmann & Linton (2012) and Fan, Liao & Wang (2016)
replace the assumption that factor betas are linear in characteristics with an assumption that factor
betas are nonparametric functions of characteristics (although these characteristics are assumed
to not vary over time for theoretical tractability). Kim, Korajczyk & Neuhierl (2021) adopt this
framework to construct arbitrage portfolios.
Gu, Kelly & Xiu (2021) extend the Barra and IPCA models to a nonlinear setting using a conditional
autoencoder model, augmented with additional explanatory variables. This model replaces
the linear beta specification in Equation 6 with a more realistic and flexible beta function. The
Gu, Kelly & Xiu (2021) autoencoder model is the first deep learning model of equity returns that
explicitly accounts for the risk-return trade-off. Thanks to recent progress in algorithms and com-
puting power, deep learning models like this are readily available and increasingly popular among
practitioners. Nevertheless, deep learning models are often criticized for their black-box nature.
Although these models are composed of simple composite functions (not much more complicated
than a regression model), training them can be tedious and is sometimes more art than science.
Rigorous theoretical justification still lags far behind the evolution of model architectures and
training algorithms.
Continuous-time factor models can sometimes be preferable for modeling the time-varying dy-
namics of asset returns, particularly when high-frequency returns data are available. Return factors
have complex dynamics such as stochastic volatility and jumps, and individual asset returns respond
to these factors with time-varying risk exposures. High-frequency modeling is well-suited for tack-
ling such complexities, though details are beyond the scope of this review. We refer interested
readers to Aït-Sahalia, Jacod & Xiu (2021).

3. METHODOLOGIES
The conventional methodologies for statistical inference of asset pricing models are designed for
low-dimensional settings, e.g., 25 test assets with a handful of factors over tens of years. Recently,
the set of explanatory variables (potentially) associated with equity returns has expanded rapidly
(e.g., Harvey, Liu & Zhu 2016), and researchers have begun using individual securities as test
assets (e.g., Kelly, Pruitt & Su 2019). With the transition to large-scale sets of factors and test
assets, high-dimensional statistical methods are increasingly relevant for empirical asset pricing
analysis. Our review covers classical methods but places particular emphasis on statistical method-
ologies designed to cope with a high-dimensional setting. We begin in Section 3.1 by discussing
machine learning methods to measure conditional expected returns without imposing restrictions
of factor pricing models. In Sections 3.2–3.5, we discuss various facets of factor model specifica-
tion, estimation, and evaluation. Then in Section 3.6 we focus on the divergence between expected
returns and factor exposures to discuss alpha tests.

3.1. Measuring Expected Returns


A central objective of asset pricing is to understand the behavior of expected returns. But ex-
pected returns are shrouded in noise in the form of unforecastable news that moves asset prices.
This makes expected returns difficult to measure. If we can improve measurement to see ex-
pected returns more clearly, we can better devise economic theories to explain their behavior. In
other words, return prediction (i.e., the measurement of expected returns) is critical to develop-
ing a clearer understanding of financial markets. Much of the asset pricing literature is dedicated
to understanding differences in expected returns across assets through the lens of factor pric-
ing models. However, there is an accumulation of evidence that assets earn average returns that
sometimes deviate substantially from the restrictions implied by factor pricing. Before discussing
factor models, we thus provide an overview of reduced-form models for expected returns that
do not impose asset pricing restrictions, which provides a backdrop to the analysis in subse-
quent sections. We pay particular attention to machine learning approaches to expected return
measurement.
The empirical literature on stock return prediction has three basic strands. The first models
differences in expected returns across stocks as a function of a small list of stock-level characteristics,
exemplified by Fama & French (2008) and Lewellen (2015). It mostly approaches estimation
via cross-sectional regression (CSR) of future returns on lagged stock characteristics.3 The second
strand estimates expected returns via a time series regression (TSR) of portfolio returns on a small
number of predictor variables (surveyed in Welch & Goyal 2007, Koijen & Nieuwerburgh 2011,
and Rapach & Zhou 2013).
These traditional methods have potentially severe limitations that more advanced statistical
tools in machine learning can help overcome. Most important is that regressions and portfolio
sorts are ill-suited to handle the large numbers of predictor variables that the literature has accu-
mulated over five decades. The challenge is how to assess the incremental predictive content of a
newly proposed predictor while jointly controlling for the gamut of extant signals (or, relatedly,
handling the problems of overfit and multiple comparisons).
The third strand of stock return predictions is newly emerging and is rooted in the methods
of machine learning. With an emphasis on variable selection and dimension reduction tech-
niques, machine learning is well-suited for such challenging prediction problems by reducing
degrees of freedom and condensing redundant variation among predictors. A first wave of high-
dimensional models used linear methods such as partial least squares (e.g., Kelly & Pruitt 2013,
Rapach et al. 2013) and lasso (Chinco, Clark-Joseph & Ye 2019; Freyberger, Neuhierl & Weber
2020).
More recently, Gu, Kelly & Xiu (2020) conduct a wide-ranging analysis of machine learning
methods for return prediction, considering not only regularized linear methods but also more
cutting-edge nonlinear methods including random forest, boosted regression trees, and deep
learning. Their research illustrates the substantial gains of incorporating machine learning when
estimating expected returns. This translates into improvements in out-of-sample predictive R2
as well as large gains for investment strategies that leverage machine learning predictions. The
empirical analysis also identifies the most informative predictor variables, which helps facilitate
deeper investigation into economic mechanisms of asset pricing.
Machine learning also makes it possible to improve expected return estimates using predic-
tive information in complex and unstructured data sets. For example, Ke, Kelly & Xiu (2019)
propose a new supervised topic model for constructing return predictions from raw news text
and demonstrate its prowess for out-of-sample forecasting. Jiang, Kelly & Xiu (2021) and Obaid
& Pukthuanthong (2022) demonstrate how to tap return predictive information in image data
using machine learning models from the computer vision literature. Both text and image data
confer particularly strong return forecasting gains at short horizons of days and weeks and are
likely underpinned by comparatively fast-moving market sentiments, rather than fundamental in-
formation that arguably plays a dominant role at forecast horizons of quarters or years. Indeed,
sentiment and related behavioral economic driving forces are becoming a core aspect of finan-
cial markets research. These are subtle phenomena with circuitous transmission and feedback
effects. As such, they are fertile ground for machine learning methods, which offer an ability
to approximate complex nonlinear associations by exploiting rich and unwieldy data
sets.
In general, the return prediction literature delves little into understanding the economic
mechanisms (such as risk-return trade-offs, market frictions, or behavioral biases) that may be
responsible for observed predictability. Distinguishing, for example, between risk premia and
mispricing requires a more structured modeling approach, and factor models are the dominant
tool researchers have used in this pursuit.

3 In addition to least squares regression, the literature often sorts assets into portfolios on the basis of
characteristics and studies portfolio averages—a form of nonparametric regression.


3.2. Estimating Factors and Exposures
In a factor model, the total variance of an asset is decomposed into a systematic risk compo-
nent driven by covariances with the factors and a component that is idiosyncratic to the asset.
There are many factor modeling strategies available that differ in their assumptions about whether
factors and their exposures are assumed known, and whether the model uses a conditional or
unconditional risk decomposition.
3.2.1. Time series and cross-sectional regressions. Consider a static factor model given by
Equation 1. If factors are known, we can estimate factor exposures via asset-by-asset TSRs, which,
in matrix form, can be written as

$$\text{TSR}: \quad \widehat{\beta} = \bar{R}\bar{F}^\top (\bar{F}\bar{F}^\top)^{-1}. \tag{9}$$
If asset returns are assumed to be time constant functions of static, asset-level characteristics, as
in Gagliardini, Ossola & Scaillet (2016) and Equation 8, then asset-by-asset TSRs yield estimates
for β̃i , which in turn leads to estimates of parameterized factor loadings.
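
As a rough illustration of Equation 9, the sketch below simulates returns from a static factor model with observable factors and recovers the exposures by time series regression. The sizes, noise scale, random seed, and all variable names are assumptions made for this example and do not come from the survey.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, T = 25, 3, 600                      # test assets, factors, periods (illustrative sizes)

beta_true = rng.normal(size=(N, K))       # N x K exposures used to simulate the data
F = 0.5 + rng.normal(size=(K, T))         # K x T observable factors with nonzero mean
R = 0.1 + beta_true @ F + 0.3 * rng.normal(size=(N, T))   # N x T excess returns

# Demean returns and factors over time (the bars in Equation 9)
R_bar = R - R.mean(axis=1, keepdims=True)
F_bar = F - F.mean(axis=1, keepdims=True)

# beta_hat = R_bar F_bar' (F_bar F_bar')^{-1}, computed with a linear solve
beta_hat = np.linalg.solve(F_bar @ F_bar.T, F_bar @ R_bar.T).T

# Cross-check: asset-by-asset OLS with an intercept gives the same slopes
X = np.column_stack([np.ones(T), F.T])    # T x (1 + K) regressors
coef = np.linalg.lstsq(X, R.T, rcond=None)[0]
beta_ols = coef[1:].T                     # drop the intercept row
```

Using `np.linalg.solve` avoids forming the inverse explicitly. Demeaning both sides is equivalent to including an intercept in each asset's regression, which is why `beta_hat` and `beta_ols` should agree up to rounding.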


If factors are instead latent but exposures are observable (as in Rosenberg 1974 and MSCI
Barra), then we can estimate factors by CSRs at each time point. In matrix form, we can write the
estimator as

$$\text{CSR}: \quad \widehat{F} = (\beta^\top \beta)^{-1}\beta^\top R. \tag{10}$$

This approach is most commonly used for individual stocks, for which their loadings can be
proxied by firm characteristics. It is convenient for the CSR to accommodate time-varying char-
acteristics, as in Equation 5, in which case we can rewrite Equation 10 accordingly, for each t, as


$$\widehat{\tilde{f}}_t = (b_{t-1}^\top b_{t-1})^{-1} b_{t-1}^\top \tilde{r}_t. \tag{11}$$

The limitation of TSRs and CSRs is their reliance on the strong assumption that either factors
or factor exposures are fully observable to the econometrician. Though theory offers some guid-
ance on the nature of common risk factors, and though firm attributes are likely to correlate with
their factor exposures, the necessary observability assumptions for the success of TSRs or CSRs
are unlikely to be satisfied in the data.
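
The period-by-period cross-sectional regression in Equation 11 can be sketched as follows. This is a simulation under assumed sizes and noise levels, with hypothetical variable names; the array `b[t]` stands in for the lagged characteristics matrix $b_{t-1}$ that applies to returns at time $t$.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, T = 400, 5, 60                       # assets, characteristics, periods (illustrative sizes)

b = rng.normal(size=(T, M, N))             # b[t] plays the role of b_{t-1}, an M x N matrix
f_true = 0.05 * rng.normal(size=(T, N))    # latent factor realizations, one per characteristic
noise = 0.1 * rng.normal(size=(T, M))
r = np.einsum("tmn,tn->tm", b, f_true) + noise   # Equation 5: r_t = b_{t-1} f_t + eps_t

f_hat = np.empty((T, N))
for t in range(T):
    # Equation 11: f_hat_t = (b'b)^{-1} b' r_t, solved as a least-squares problem
    f_hat[t] = np.linalg.lstsq(b[t], r[t], rcond=None)[0]

# Explicit normal-equations form for the first period, for comparison
f_first = np.linalg.solve(b[0].T @ b[0], b[0].T @ r[0])
```

Equation 10 corresponds to the special case in which the same loading matrix is used in every period. The least-squares routine is used instead of an explicit inverse because characteristics are often highly correlated, which can make the normal equations poorly conditioned.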
3.2.2. Principal components analysis. If neither factors nor loadings are known, we can re-
sort to PCA to extract latent factors and their loadings. The use of PCA in asset pricing dates
back to as early as Chamberlain & Rothschild (1983) and Connor & Korajczyk (1986) and has
become increasingly popular (see, e.g., Kozak, Nagel & Santosh 2018; Kelly, Pruitt & Su 2019;
Pukthuanthong, Roll & Subrahmanyam 2019; Giglio & Xiu 2021). For a static factor model
(Equation 1) PCA can identify factors and their loadings up to some unknown linear trans-
formation. It is more convenient to implement this via a singular value decomposition (SVD)
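of $\bar{R}$,
$$\bar{R} = \sum_{j=1}^{\widehat{K}} \sigma_j \varsigma_j \xi_j^\top + \widehat{U}, \tag{12}$$
where $\{\sigma_j\}$, $\{\varsigma_j\}$, and $\{\xi_j\}$ correspond to the first $\widehat{K}$ singular values and the left and right singular
vectors of $\bar{R}$, respectively, and $\widehat{K}$ can be any consistent estimator (e.g., Bai & Ng 2002) of the
number of factors in $r_t$. This decomposition yields a pair of estimates of factor innovations and
exposures as
$$\widehat{V} = T^{1/2}(\xi_1 : \xi_2 : \ldots : \xi_{\widehat{K}})^\top, \quad \widehat{\beta} = T^{-1/2}(\sigma_1\varsigma_1 : \sigma_2\varsigma_2 : \ldots : \sigma_{\widehat{K}}\varsigma_{\widehat{K}}). \tag{13}$$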

Because of the fundamental indeterminacy of latent factor models, it is equivalent to use
$(\widehat{\beta}H^{-1}, H\widehat{V})$ as alternative estimates for any invertible matrix $H$. Said differently, a rotation of
factors and an inverse rotation of betas leave model fits exactly unchanged. While allowing for latent
factors and exposures can add great flexibility to a research project, this rotation indeterminacy
makes it difficult to interpret the factors in a latent factor model.
The PCA approach is also applicable if some but not all factors are observable. In such a case,
Giglio, Liao & Xiu (2021) suggest conducting PCA on residuals from TSRs of returns onto ob-
servable factors. They show that the estimated betas of observable and latent factors are, again,
consistent with respect to the true betas up to some unknown linear transformation.
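
The SVD-based estimates in Equations 12 and 13, and the rotation indeterminacy discussed above, can be illustrated with a short simulation. This is a sketch under assumptions: the number of factors is treated as known rather than estimated, and the sizes, seed, and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
N, K, T = 100, 3, 500                      # illustrative sizes

beta_true = rng.normal(size=(N, K))
V_true = rng.normal(size=(K, T))
R = 0.2 + beta_true @ V_true + 0.5 * rng.normal(size=(N, T))
R_bar = R - R.mean(axis=1, keepdims=True)  # remove each asset's average return

# Thin SVD: columns of S are left singular vectors, rows of Xt are right singular vectors
S, sigma, Xt = np.linalg.svd(R_bar, full_matrices=False)

K_hat = K                                  # assumed known here; the text uses a consistent estimator
V_hat = np.sqrt(T) * Xt[:K_hat]                        # K x T factor innovations (Equation 13)
beta_hat = S[:, :K_hat] * sigma[:K_hat] / np.sqrt(T)   # N x K exposures (Equation 13)

# Rotation indeterminacy: (beta_hat H^{-1}, H V_hat) gives the same fitted values
H = rng.normal(size=(K_hat, K_hat))        # an arbitrary matrix, invertible with probability one
fit = beta_hat @ V_hat
fit_rotated = (beta_hat @ np.linalg.inv(H)) @ (H @ V_hat)

# Share of the true (demeaned) factors not spanned by the estimated factors
V_c = V_true - V_true.mean(axis=1, keepdims=True)
resid = V_c - (V_c @ V_hat.T / T) @ V_hat
unexplained = np.linalg.norm(resid) / np.linalg.norm(V_c)
```

With this normalization the estimated factors satisfy $\widehat{V}\widehat{V}^\top / T = I_{\widehat{K}}$, so only the space spanned by the factors can be compared with the truth, not the individual factors themselves.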

3.2.3. Risk-premia principal components analysis. One potential shortcoming of PCA is that
it extracts information about latent factors solely from realized return covariances. To see this, the
SVD in Equation 12 is applied to R̄, which eliminates the average return from each column of R.
In fact, if we assume α = 0 in Equation 2, the expected return is also spanned by beta, so that the
information in average returns (r̄) can be exploited for more efficient recovery of factors.
Lettau & Pelger (2020b) exploit this intuition and propose a so-called risk-premia PCA estimator
for factors. Instead of using $T^{-1}\bar R\bar R^\top = T^{-1}RR^\top - \bar r\bar r^\top$, they conduct PCA on $T^{-1}RR^\top + \lambda\bar r\bar r^\top$,
where λ is a tuning parameter. Risk-premia PCA generalizes the proposal of Connor & Korajczyk
(1986), which corresponds to the special case of λ = 0. Lettau & Pelger (2020a) further prove
that the risk-premia PCA could achieve a smaller asymptotic variance for factor loadings than the
standard PCA if all factors are pervasive; it outperforms PCA empirically when factors are weak.
We defer a more detailed discussion of weak factors to Section 3.3.4.
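
As a rough sketch of the idea, risk-premia PCA amounts to an eigendecomposition of $T^{-1}RR^\top + \lambda\bar r\bar r^\top$. The code below is an illustration under assumptions: the function name `rp_pca`, the unit-norm scaling of the loadings, and the simulated data are hypothetical and do not reproduce the normalization used by Lettau & Pelger. By the identity above, setting λ = −1 recovers PCA on the demeaned returns, which the last lines check.

```python
import numpy as np

def rp_pca(R, K, lam):
    """Top-K eigenvectors of T^{-1} R R' + lam * r_bar r_bar' for an N x T matrix R."""
    T = R.shape[1]
    r_bar = R.mean(axis=1, keepdims=True)
    target = R @ R.T / T + lam * (r_bar @ r_bar.T)
    eigval, eigvec = np.linalg.eigh(target)          # eigenvalues in ascending order
    loadings = eigvec[:, np.argsort(eigval)[::-1][:K]]
    factors = loadings.T @ R                         # factors as portfolios of returns
    return loadings, factors

# Simulated returns with nonzero means (illustrative assumption).
rng = np.random.default_rng(1)
N, T, K = 50, 200, 3
beta = rng.standard_normal((N, K))
R = beta @ (np.array([[0.5], [0.3], [0.2]]) + rng.standard_normal((K, T))) \
    + 0.1 * rng.standard_normal((N, T))

loadings_rp, factors_rp = rp_pca(R, K, lam=10.0)     # lam is a tuning parameter
loadings_cov, _ = rp_pca(R, K, lam=-1.0)             # equals PCA on demeaned returns

# Compare the spanned spaces, since eigenvectors are only defined up to sign.
R_bar = R - R.mean(axis=1, keepdims=True)
left = np.linalg.svd(R_bar, full_matrices=False)[0][:, :K]
same_space = np.allclose(loadings_cov @ loadings_cov.T, left @ left.T, atol=1e-8)
```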

3.2.4. Instrumented principal components analysis. A limitation of PCA is that it only ap-
plies to static factor models. It also lacks the flexibility to incorporate other data beyond returns. To
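address both issues, Kelly, Pruitt & Su (2019) estimate the conditional factor model (Equation 6) by
solving the optimization problem $\min_{\beta,\{f_t\}} \sum_{t=2}^{T} \|\tilde r_t - b_{t-1}\beta f_t\|^2$. The estimates satisfy the first-order
conditions:

$$\widehat f_t = (\widehat\beta^\top b_{t-1}^\top b_{t-1}\widehat\beta)^{-1}\widehat\beta^\top b_{t-1}^\top \tilde r_t, \tag{14}$$

$$\mathrm{vec}(\widehat\beta^\top) = \Big(\sum_{t=2}^{T} b_{t-1}^\top b_{t-1} \otimes \widehat f_t \widehat f_t^\top\Big)^{-1} \Big(\sum_{t=2}^{T} (b_{t-1} \otimes \widehat f_t^\top)^\top \tilde r_t\Big). \tag{15}$$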

Consistent with the discussion in Section 3.2.1, Equation 14 shows that, given conditional be-
tas, factors are estimated from CSRs of returns on betas. Equation 14 resembles Equation 11,
but the former accommodates a potentially larger number of characteristics because of the built-
in dimension reduction assumption. Equation 15 shows that conditional betas can be recovered
from panel regressions of returns onto characteristics interacted with factors. The authors recom-
mend an alternating least squares algorithm to iteratively update β and ft until convergence. Kelly,
Pruitt & Su (2020) develop the accompanying asymptotic inference for the extracted factors and
loadings.
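
The alternating least squares idea can be sketched by iterating Equations 14 and 15 directly. This is a simplified illustration, not the authors' implementation: the function name `ipca_als`, the array layout, the starting value (leading singular vectors of managed-portfolio returns), and the stopping rule are all assumptions, and no identification normalization is imposed on the estimates.

```python
import numpy as np

def ipca_als(r, b, K, n_iter=100, tol=1e-10):
    """r: (T, M) returns. b: (T, M, N) characteristics, already lagged so that
    b[t] is known before r[t] is realized. Returns beta (N, K), f (T, K), losses."""
    T, M, N = b.shape
    # Starting value (assumption): leading directions of managed-portfolio returns.
    x = np.stack([np.linalg.lstsq(b[t], r[t], rcond=None)[0] for t in range(T)])
    beta = np.linalg.svd(x.T, full_matrices=False)[0][:, :K]
    losses = []
    for _ in range(n_iter):
        # Equation 14: cross-sectional regression of r_t on b_{t-1} beta, period by period.
        f = np.stack([np.linalg.lstsq(b[t] @ beta, r[t], rcond=None)[0]
                      for t in range(T)])
        # Equation 15: pooled regression of r_t on characteristics interacted with factors.
        lhs = np.zeros((N * K, N * K))
        rhs = np.zeros(N * K)
        for t in range(T):
            z = np.kron(b[t], f[t][None, :])         # (M, N*K) regressors
            lhs += z.T @ z
            rhs += z.T @ r[t]
        beta = np.linalg.solve(lhs, rhs).reshape(N, K)   # row-major reshape undoes vec(beta')
        resid = r - np.einsum("tmn,nk,tk->tm", b, beta, f)
        losses.append(float(np.sum(resid ** 2)))
        if len(losses) > 1 and losses[-2] - losses[-1] < tol * losses[-2]:
            break
    return beta, f, losses

# Simulated conditional factor model (illustrative assumption).
rng = np.random.default_rng(2)
T, M, N, K = 40, 60, 4, 2
b = rng.standard_normal((T, M, N))
r = np.einsum("tmn,nk,tk->tm", b, rng.standard_normal((N, K)),
              rng.standard_normal((T, K))) + 0.1 * rng.standard_normal((T, M))
beta_hat, f_hat, losses = ipca_als(r, b, K)
r_squared = 1.0 - losses[-1] / float(np.sum(r ** 2))
```

Because each half-step solves a least squares problem exactly, the recorded objective cannot increase from one iteration to the next, which is a useful sanity check on any implementation.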
Kelly, Moskowitz & Pruitt (2021) apply this IPCA framework to explain momentum and long-
term reversal phenomena in equity returns. The general framework of IPCA also extends beyond
equity into other asset classes, such as corporate bonds (Kelly, Palhares & Pruitt 2021) and options
(Büchner & Kelly 2022).

3.2.5. Autoencoder learning. IPCA assumes that conditional factor loadings are a linear func-
tion of asset characteristics. While this is a tractable assumption, there is no a priori reason for
imposing linearity on the mapping from characteristics to loadings. A natural extension of IPCA
is to leverage machine learning to develop a richer and more flexible mapping from conditioning
variables to factor loadings. While there are a number of possible machine learning specifica-
tions for this map, the natural candidate is a deep learning model known as an autoencoder. The
machine learning literature has long recognized the close connection between autoencoders and
PCA (e.g., Baldi & Hornik 1989). Unlike the linearity embedded in PCA, an autoencoder uses a
neural network specification to estimate factors and loadings. That said, a standard autoencoder
only uses returns data to estimate the latent factor model. Thus, it does not utilize the additional
nonreturn conditioning variables that are central to the success of the IPCA specification.
Gu, Kelly & Xiu (2021) propose a customized autoencoder factor model that benefits from a
flexible neural network formulation (as in a standard autoencoder) while at the same time lever-
aging additional nonreturn conditioning information (in the spirit of IPCA). In the Gu, Kelly &
Xiu (2021) conditional autoencoder, stock characteristics are mapped into betas through a feed-
forward neural network, thus replacing IPCA betas with a more realistic nonlinear specification.
Figure 1 illustrates the model’s basic structure. At a high level, the mathematical representation
of the model is identical to Equation 4. On the left side of the network, factor loadings are a
nonlinear function of covariates (e.g., firm characteristics), while the right side of the network
models factors as portfolios of individual stock returns.

[Figure 1: network diagram. Beta side: input layer 1 (M × N), hidden layer(s) with activation g, beta
output layer. Factor side: input layer 2 (N × 1 or M × 1), factor output layer (K × 1). A dot product
of the two output layers produces the output layer (M × 1).]

Figure 1
A diagram of a conditional autoencoder model in which an autoencoder is augmented to incorporate
covariates in the factor loading specification. (Left) A diagram of how factor loadings β t−1 at time
t − 1 (green) depend on firm characteristics bt−1 (yellow) of input layer 1 through an activation
function g on neurons of the hidden layer. Each row of yellow neurons represents the N × 1 vector of
characteristics of one ticker. (Right) A diagram showing the corresponding factors at time t. ft nodes
(purple) are weighted combinations of neurons of input layer 2, which can be either N
characteristic-managed portfolios rt (pink) or M individual asset returns r̃t (red). In the former
case, the dashed arrows indicate that the characteristic-managed portfolios rely on individual assets
through predetermined weights (not to be estimated). In either case, the effective input can be
regarded as individual asset returns, exactly what the output layer (red) aims to approximate; thus,
this model shares the same spirit as a standard autoencoder.
In particular, the K × 1 vector $\beta_{i,t-1}$ is specified as a neural network model of lagged firm
characteristics $b_{i,t-1}$. The recursive formulation for the nonlinear beta function is

$$b^{(0)}_{i,t-1} = b_{i,t-1}, \tag{16}$$

$$b^{(l)}_{i,t-1} = g\big(b^{(l-1)} + W^{(l-1)} b^{(l-1)}_{i,t-1}\big), \quad l = 1, \ldots, L_\beta, \tag{17}$$

$$\beta_{i,t-1} = b^{(L_\beta)} + W^{(L_\beta)} b^{(L_\beta)}_{i,t-1}. \tag{18}$$

Equation 16 initializes the network as a function of the baseline characteristic data, bi, t−1 .
Equation 17 describes the nonlinear (and interactive) transformation of characteristics as they
propagate through hidden layer neurons. Equation 18 describes how a set of K-dimensional factor
betas emerge from the terminal output layer.
On the right side of Figure 1, we see an otherwise standard autoencoder for the factor
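specification. The recursive mathematical formulation of the factors is

$$r_t^{(0)} = (b_{t-1}^\top b_{t-1})^{-1} b_{t-1}^\top r_t, \tag{19}$$

$$r_t^{(l)} = g\big(b^{(l-1)} + \widetilde W^{(l-1)} r_t^{(l-1)}\big), \quad l = 1, \ldots, L_f, \tag{20}$$

$$f_t = b^{(L_f)} + \widetilde W^{(L_f)} r_t^{(L_f)}. \tag{21}$$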

Equation 19 initializes the network with characteristic-sorted portfolios of individual asset re-
turns, as defined by Equation 7. This sidesteps the incompleteness issue of the panel of individual
stock returns and in the meantime performs a preliminary reduction of data. The expressions in
Equation 20 transform and compress the dimensionality of returns as they propagate through
hidden layers. Equation 21 describes the final set of K factors at the output layer. If a single linear
layer is included on the factor network, that is, if Lf = 1, this structure maintains the economic
interpretation of factors: They are themselves portfolios (linear combination of returns).
Finally, the dot product operation multiplies the M × K matrix output from the beta network
with the K × 1 output from the factor network to produce the final model fit for each
individual asset return.
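
To make the two-sided structure concrete, the following is a minimal forward-pass sketch with untrained, randomly drawn weights. It is an illustration under assumptions rather than the Gu, Kelly & Xiu (2021) implementation: the helper names, the ReLU activation, the single hidden layer, and the purely linear factor side are all hypothetical simplifications.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def beta_network(chars, weights, biases, g=relu):
    """Map an M x N characteristics matrix to M x K betas (Equations 16 to 18)."""
    h = chars                                        # Equation 16: start from raw characteristics
    for W, c in zip(weights[:-1], biases[:-1]):
        h = g(c + h @ W.T)                           # Equation 17: hidden layers
    return biases[-1] + h @ weights[-1].T            # Equation 18: linear output layer

def factor_network(chars, returns, W_f, c_f):
    """Linear factor side: factors are portfolios of managed-portfolio returns."""
    managed = np.linalg.lstsq(chars, returns, rcond=None)[0]   # (b'b)^{-1} b' r, length N
    return c_f + W_f @ managed                       # K factors

rng = np.random.default_rng(3)
M, N, K, H = 30, 5, 2, 8                             # assets, characteristics, factors, hidden units
chars = rng.standard_normal((M, N))
returns = rng.standard_normal(M)
weights = [rng.standard_normal((H, N)), rng.standard_normal((K, H))]
biases = [np.zeros(H), np.zeros(K)]

beta = beta_network(chars, weights, biases)          # M x K
f = factor_network(chars, returns, rng.standard_normal((K, N)), np.zeros(K))
r_fit = beta @ f                                     # dot product: one fitted return per asset
```

With no hidden layer and zero biases, `beta_network` reduces to a linear map from characteristics to betas, which mirrors the sense in which this architecture nests a linear-in-characteristics specification.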
When the autoencoder has one hidden layer and a linear activation function, it is equivalent
to the PCA estimator for linear factor models described in Section 3.2.2. Just as the autoencoder
model nests the static linear factor model, the augmented autoencoder nests the IPCA factor
model as a special case. The high capacity of a neural network model enhances its flexibility to
construct the most informative features from data. With enhanced flexibility, however, comes a
higher propensity to overfit. Next we discuss several generic algorithms that are likely applicable
to any deep learning model.

3.2.5.1. Training, validation, and testing. To curb overfitting, the entire sample is typically
divided into three disjoint subsamples that maintain the temporal ordering of the data. The first, or
training, subsample is used to estimate the model subject to a specific set of tuning hyperparameter
values.
The second, or validation, subsample is used for tuning the hyperparameters. Fitted values
are constructed for data points in the validation sample based on the estimated model from the
training sample. Next, the objective function is calculated based on errors from the validation
sample, and hyperparameters are then selected to optimize the validation objective.
The validation sample fits are of course not truly out of sample because they are used for tuning,
which is in turn an input to the estimation. Thus the third, or testing, subsample is used for neither
estimation nor tuning. It is thus used to evaluate a method’s out-of-sample performance.
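
A minimal sketch of this three-way protocol follows. Ridge regression is used here only as a stand-in model with a single hyperparameter; the split fractions, function names, and simulated data are assumptions for illustration.

```python
import numpy as np

def temporal_split(T, train_frac=0.6, val_frac=0.2):
    """Split time indices 0..T-1 into ordered, disjoint training, validation, and test blocks."""
    t1 = int(T * train_frac)
    t2 = int(T * (train_frac + val_frac))
    return np.arange(0, t1), np.arange(t1, t2), np.arange(t2, T)

def ridge_fit(x, y, lam):
    return np.linalg.solve(x.T @ x + lam * np.eye(x.shape[1]), x.T @ y)

def tune_and_test(x, y, lams):
    tr, va, te = temporal_split(len(y))
    # Estimate on the training sample for each candidate hyperparameter value ...
    val_err = [np.mean((y[va] - x[va] @ ridge_fit(x[tr], y[tr], lam)) ** 2)
               for lam in lams]
    # ... select the value that minimizes the validation objective ...
    best_lam = lams[int(np.argmin(val_err))]
    theta = ridge_fit(x[tr], y[tr], best_lam)
    # ... and evaluate once on the untouched test sample.
    test_err = float(np.mean((y[te] - x[te] @ theta) ** 2))
    return best_lam, test_err

rng = np.random.default_rng(4)
x = rng.standard_normal((500, 20))
y = x[:, 0] - 0.5 * x[:, 1] + rng.standard_normal(500)
lams = [0.01, 0.1, 1.0, 10.0, 100.0]
best_lam, test_err = tune_and_test(x, y, lams)
```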

3.2.5.2. Regularization techniques. The most common machine learning device for guarding
against overfitting is to append a penalty to the objective function in order to favor more parsimo-
nious specifications. This regularization approach mechanically deteriorates a model’s in-sample
performance in the hope of improving its stability out of sample. This is the case when penalization
manages to reduce the model’s fit of noise while preserving its fit of the signal.
Gu, Kelly & Xiu (2021) define the estimation objective to be

$$L(\theta; \cdot) = \frac{1}{NT}\sum_{t=1}^{T}\sum_{i=1}^{N} \big\|\tilde r_{i,t} - \beta_{i,t-1}^\top f_t\big\|^2 + \phi(\theta; \cdot), \tag{22}$$

where θ summarizes the weight parameters in the loading and factor networks of Equations 16–21,
and φ(θ) is a penalty function, such as lasso (or l1) penalization, which takes the form
$\phi(\theta; \lambda) = \lambda\sum_j |\theta_j|$.
In addition to l1 penalization, Gu, Kelly & Xiu (2021) employ a second machine learning reg-
ularization tool known as early stopping. By ending the parameter search early (as soon as the
validation sample error begins to increase), parameters are shrunken toward the initial guess, for
which parsimonious parameterization is often imposed. It is a popular substitute for l2
penalization of θ parameters because of its convenience in implementation and effectiveness in
combating overfitting.
As a third regularization technique, Gu, Kelly & Xiu (2021) adopt an ensemble approach in
training neural networks. In particular, they use multiple random seeds to initialize neural net-
work estimation and construct model predictions by averaging estimates from all networks. This
enhances the stability of the results because the stochastic nature of the optimization can cause
different seeds to settle at different optima.
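
The three devices can be combined in a few lines. The sketch below applies them to a linear model as a stand-in for a neural network, so it only illustrates the mechanics: the l1 penalty is handled by a proximal gradient (soft-thresholding) step, and the `patience` rule, learning rate, and simulated data are assumptions. For this convex stand-in the seeds matter far less than they would for a nonconvex network.

```python
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def fit_l1_early_stop(x_tr, y_tr, x_va, y_va, lam, seed,
                      lr=0.01, max_epochs=500, patience=5):
    """Proximal gradient descent on mean squared error + lam * sum |theta_j|,
    stopped once validation error has failed to improve for `patience` epochs."""
    rng = np.random.default_rng(seed)
    theta = 0.01 * rng.standard_normal(x_tr.shape[1])    # small random initial guess
    best_theta, best_err, bad_epochs = theta.copy(), np.inf, 0
    for _ in range(max_epochs):
        grad = 2.0 * x_tr.T @ (x_tr @ theta - y_tr) / len(y_tr)
        theta = soft_threshold(theta - lr * grad, lr * lam)   # l1 proximal step
        err = np.mean((x_va @ theta - y_va) ** 2)
        if err < best_err:
            best_theta, best_err, bad_epochs = theta.copy(), err, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:                   # early stopping
                break
    return best_theta

rng = np.random.default_rng(5)
x = rng.standard_normal((300, 10))
theta_true = np.zeros(10)
theta_true[:2] = [2.0, -1.0]
y = x @ theta_true + 0.5 * rng.standard_normal(300)
x_tr, y_tr = x[:200], y[:200]
x_va, y_va = x[200:250], y[200:250]
x_te, y_te = x[250:], y[250:]

# Ensemble: average the predictions of fits started from different seeds.
thetas = [fit_l1_early_stop(x_tr, y_tr, x_va, y_va, lam=0.05, seed=s) for s in range(5)]
y_hat = np.mean([x_te @ th for th in thetas], axis=0)
ensemble_mse = float(np.mean((y_te - y_hat) ** 2))
```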

3.2.5.3. Optimization algorithms. The high degree of nonlinearity and nonconvexity in neu-
ral networks, together with their rich parameterization, makes brute force optimization highly
computationally intensive (often to the point of infeasibility). Gu, Kelly & Xiu (2021) adopt the
adaptive moment estimation algorithm (Adam), an efficient version of stochastic gradient de-
scent introduced by Kingma & Ba (2014), which computes adaptive learning rates for individual
parameters using estimates of first and second moments of the gradients.
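
For reference, the Adam update can be written compactly as below. This is a sketch of the standard update rule applied to a simple quadratic objective; the full-gradient setting (rather than minibatches), the step size, and the test problem are assumptions made for illustration.

```python
import numpy as np

def adam_minimize(grad, theta0, lr=0.05, beta1=0.9, beta2=0.999,
                  eps=1e-8, n_steps=3000):
    """Adam: per-parameter step sizes from running first and second gradient moments."""
    theta = np.array(theta0, dtype=float)
    m = np.zeros_like(theta)                         # first-moment estimate
    v = np.zeros_like(theta)                         # second-moment estimate
    for t in range(1, n_steps + 1):
        g = grad(theta)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g ** 2
        m_hat = m / (1.0 - beta1 ** t)               # bias corrections
        v_hat = v / (1.0 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

target = np.array([1.0, -2.0, 3.0])
loss = lambda th: float(np.sum((th - target) ** 2))
theta_hat = adam_minimize(lambda th: 2.0 * (th - target), np.zeros(3))
```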
Gu, Kelly & Xiu (2021) also adopt batch normalization (Ioffe & Szegedy 2015) to control the
variability of predictors across different regions of the network and across different data sets. This
method is motivated by the phenomenon of internal covariate shift in which inputs of hidden
layers follow different distributions than their counterparts in the validation sample.

3.2.6. Matrix completion. It is not uncommon in finance applications to deal with unbalanced
panels. Giglio, Liao & Xiu (2021) adopt a matrix completion algorithm to handle missing data
when extracting factors and loadings of a factor model.
The matrix completion approach relies on the assumption that the full matrix can be writ-
ten as a noisy low-rank matrix. This assumption is naturally justified for Equation 1 (assuming
α = 0), which, in matrix form, can be rewritten as $R = \beta(V + \gamma\iota_T^\top) + U$ and thus clearly satisfies
the assumption.

348 Giglio • Kelly • Xiu


The goal is to recover an N × T low-rank matrix $X := \beta(V + \gamma\iota_T^\top)$. Suppose R (the noisy
version of X) is not fully observed, and Ω is an N × T matrix whose (i, t)th element is
$\omega_{it} = 1\{r_{it} \text{ is observed}\}$. Using this notation, econometricians can only observe $R \circ \Omega$ and Ω,
where $\circ$ represents the element-wise matrix product.
The following nuclear-norm penalized regression approach can be employed to recover X:4

$$\widehat X = \arg\min_X \|(R - X) \circ \Omega\|^2 + \lambda_{NT}\|X\|_n, \tag{23}$$

where $\|X\|_n$ denotes the matrix nuclear norm, and $\lambda_{NT} > 0$ is a tuning parameter. By penalizing
the singular values of X, the algorithm achieves a low-rank matrix as the output. The latent factors
and betas can then be estimated via the corresponding singular vectors of $\widehat X$.
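
One standard way to solve a problem of the form in Equation 23 is proximal gradient descent with singular value soft-thresholding (often called soft-impute). The sketch below uses that solver as an assumption; it is not necessarily the algorithm used by Giglio, Liao & Xiu (2021), and the squared norm is taken to be the Frobenius norm. The function names, penalty level, and simulated panel are hypothetical.

```python
import numpy as np

def svt(Z, tau):
    """Soft-threshold the singular values of Z by tau."""
    u, s, vt = np.linalg.svd(Z, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt

def complete_matrix(R, mask, lam, n_iter=300):
    """Minimize ||(R - X) o mask||_F^2 + lam * ||X||_nuclear by proximal gradient steps.
    mask is True where R is observed. With step size 1/2, each iteration fills the
    missing entries with the current X and thresholds singular values at lam / 2."""
    R_obs = np.where(mask, R, 0.0)
    X = np.zeros_like(R_obs)
    for _ in range(n_iter):
        X = svt(np.where(mask, R_obs, X), lam / 2.0)
    return X

# Simulated unbalanced panel (illustrative assumption).
rng = np.random.default_rng(6)
N, T, K = 40, 60, 2
X_true = rng.standard_normal((N, K)) @ rng.standard_normal((K, T))
R = X_true + 0.1 * rng.standard_normal((N, T))
mask = rng.random((N, T)) < 0.7                      # roughly 70% of entries observed
X_hat = complete_matrix(R, mask, lam=4.0)
rel_err = float(np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true))
```

Factors and betas can then be read off the leading singular vectors of `X_hat`, in the same way as in the balanced-panel PCA case.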

3.3. Estimating Risk Premia


The risk premium of a factor is informative about the equilibrium compensation investors demand
to hold risk associated with that factor. One of the central predictions of asset pricing models
is that some risk factors—for example, consumption growth, intermediary capital, or aggregate
liquidity—should command a risk premium: Investors should be compensated for their exposure
to those factors, holding constant their exposure to all other sources of risk.
For tradable factors—such as the market portfolio in the CAPM—estimating risk premia re-
duces to calculating the sample average excess return of the factor. This estimate is simple and
robust and requires minimal modeling assumptions.
However, many theoretical models are formulated with regard to nontradable factors—factors
that are not themselves portfolios—such as consumption, inflation, liquidity, and so on. To estimate
the risk premium of any of these factors, it is necessary to construct its tradable incarnation. Such
a tradable factor is a hedging portfolio that isolates the risk of the nontradable factor while holding
all other risks constant. There are two standard approaches to constructing tradable counterparts
of a nontradable factor: two-pass regressions and factor-mimicking portfolios.

3.3.1. Classical two-pass regressions. The classical two-pass (or Fama-MacBeth) regression
requires a model like Equation 1 with all factors observable. The first time series pass yields esti-
mates of beta using regressions in Equation 9. Then the second cross-sectional pass estimates risk
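premia via an ordinary least squares (OLS) regression of average returns on the estimated beta:

$$\widehat\gamma = (\widehat\beta^\top\widehat\beta)^{-1}\widehat\beta^\top \bar r. \tag{24}$$

The generalized least squares (GLS) version of Equation 24 replaces the OLS in the
cross-sectional pass with

$$\widehat\gamma_{GLS} = (\widehat\beta^\top\widehat\Sigma_u^{-1}\widehat\beta)^{-1}\widehat\beta^\top\widehat\Sigma_u^{-1}\bar r, \tag{25}$$

where $\widehat\Sigma_u = T^{-1}\bar R M_{\bar V}\bar R^\top$ is the sample covariance matrix of the residuals.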


Lewellen, Nagel & Shanken (2010) advocate the GLS approach and suggest reporting GLS
R2 , partially because their simulations suggest that obtaining a high GLS R2 appears to be a
more rigorous hurdle than obtaining a high OLS R2 . In our view, this benefit of GLS is partially
overshadowed by its disastrous finite sample performance due to the poor estimates of $\Sigma_u$, in
particular when N is large. In Section 4.2, we also show that OLS and infeasible GLS (which
assumes perfect knowledge of $\Sigma_u$) are asymptotically equivalent when both N and T are large, so
that there is no asymptotic efficiency gain from using GLS in that setting.

4 The nuclear norm is $\|X\|_n := \sum_{i=1}^{\min\{N,T\}} \psi_i(X)$, where $\psi_1(X) \ge \psi_2(X) \ge \ldots$ are the sorted singular
values of X.
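
The two passes in Equations 24 and 25 can be sketched as follows, assuming all factors are observed and N is small relative to T so that the residual covariance matrix is invertible. The function name `two_pass` and the simulated design are illustrative assumptions.

```python
import numpy as np

def two_pass(R, V):
    """R: N x T returns, V: K x T observed factors. Returns betas, OLS and GLS risk premia."""
    T = R.shape[1]
    R_bar = R - R.mean(axis=1, keepdims=True)
    V_bar = V - V.mean(axis=1, keepdims=True)
    # First pass: time series regressions of returns on factors.
    beta = R_bar @ V_bar.T @ np.linalg.inv(V_bar @ V_bar.T)
    r_bar = R.mean(axis=1)
    # Second pass, OLS (Equation 24).
    gamma_ols = np.linalg.solve(beta.T @ beta, beta.T @ r_bar)
    # Second pass, GLS (Equation 25), weighting by the residual covariance matrix.
    U = R_bar - beta @ V_bar
    sigma_u = U @ U.T / T
    sigma_inv_beta = np.linalg.solve(sigma_u, beta)
    gamma_gls = np.linalg.solve(beta.T @ sigma_inv_beta, sigma_inv_beta.T @ r_bar)
    return beta, gamma_ols, gamma_gls

# Simulated model with zero pricing errors (illustrative assumption).
rng = np.random.default_rng(7)
N, T, K = 25, 600, 2
gamma = np.array([0.5, 0.3])
beta_true = 1.0 + 0.5 * rng.standard_normal((N, K))
v = rng.standard_normal((K, T))
R = beta_true @ (gamma[:, None] + v) + 0.5 * rng.standard_normal((N, T))
beta_hat, gamma_ols, gamma_gls = two_pass(R, v)
```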

3.3.2. Factor-mimicking portfolios. In contrast to Equation 24, Fama & MacBeth (1973)
propose an inference procedure that regresses realized returns at each time t onto $\widehat\beta$:

$$\widehat\gamma_t = (\widehat\beta^\top\widehat\beta)^{-1}\widehat\beta^\top r_t. \tag{26}$$

Note that the estimated slope of the Fama-MacBeth regression at each time t, $\widehat\gamma_t$, is itself a
portfolio return, corresponding to the portfolio weights $(\widehat\beta^\top\widehat\beta)^{-1}\widehat\beta^\top$. This highlights an important point:
The classical two-pass regression discussed above or Fama-MacBeth (both of which yield the
same point estimates of the risk premium) obtains the risk premium of a nontradable factor by
first building a factor-mimicking portfolio for it and then estimating the corresponding risk
premium as the average excess return of this portfolio return, $\widehat\gamma_t$ (Fama & MacBeth 1973). Regularity
conditions in Giglio & Xiu (2021) imply that

$$\widehat\gamma_t = (\widehat\beta^\top\widehat\beta)^{-1}\widehat\beta^\top[\beta(\gamma + v_t) + u_t] \approx \gamma + v_t,$$

and thus the Fama-MacBeth procedure is an effective approach to estimating risk premia γ.
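
The period-by-period regressions in Equation 26 take only a few lines. The sketch below is illustrative: the function name is hypothetical, and the standard error shown is the simple time series standard error of the period slopes, with no correction for estimated betas or autocorrelation.

```python
import numpy as np

def fama_macbeth(R, beta):
    """R: N x T returns, beta: N x K exposures. Returns period slopes, their mean, and
    a simple standard error based on the time series of slopes."""
    weights = np.linalg.solve(beta.T @ beta, beta.T)   # K x N portfolio weights
    gamma_t = weights @ R                              # K x T returns on mimicking portfolios
    gamma = gamma_t.mean(axis=1)
    se = gamma_t.std(axis=1, ddof=1) / np.sqrt(R.shape[1])
    return gamma_t, gamma, se

rng = np.random.default_rng(8)
N, T, K = 25, 600, 2
beta = 1.0 + 0.5 * rng.standard_normal((N, K))
R = beta @ (np.array([[0.5], [0.3]]) + rng.standard_normal((K, T))) \
    + 0.5 * rng.standard_normal((N, T))
gamma_t, gamma_fm, se_fm = fama_macbeth(R, beta)

# The average of the period slopes equals the cross-sectional OLS slope on average returns.
gamma_cs = np.linalg.solve(beta.T @ beta, beta.T @ R.mean(axis=1))
```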
Now, suppose we are interested in estimating the risk premium of a measured nontradable risk
factor gt , say, a climate risk measure that is not tradable, that satisfies

gt = ξ + ηvt + zt . 27.

This is a general representation where the measured risk factor gt is related through the vector η to
the fundamental risk factors v t (it could itself be part of v t , but it could also just be correlated with
it and therefore still command a risk premium). The representation also allows for measurement
error zt (e.g., measurement error in consumption growth).
Obviously, the risk premium of $g_t$ is $\gamma_g = \eta\gamma$. This can be readily estimated using the
Fama-MacBeth procedure by first building the mimicking portfolios for $v_t$ ($\widehat\gamma_t$) and then obtaining
the mimicking portfolio for $g_t$ as $\widehat\eta\widehat\gamma_t$, where $\widehat\eta$ is simply the vector of coefficients of a TSR
(Equation 27). This yields the risk premium estimate as $\widehat\eta\widehat\gamma$.
Another standard approach to tracking a nontradable factor is the maximal-correlation
factor-mimicking portfolio approach (e.g., Huberman, Kandel & Stambaugh 1987; Lamont 2001). This
directly projects $g_t$ onto a set of basis asset returns, $y_t$, that yields weights of the mimicking
portfolio,

$$w_g = \mathrm{Var}(y_t)^{-1}\mathrm{Cov}(y_t, g_t),$$

whose returns and expected returns are given by $w_g^\top y_t$ and $w_g^\top E(y_t)$, respectively.
How do we reconcile these two approaches? Is $\gamma_g$ the same as $w_g^\top E(y_t)$ for some choice of $y_t$?
How do we select such $y_t$? Under data-generating processes given by Equations 1, 2, and 27, if
we select $y_t = Af_t$ (recall that $f_t = \mu + v_t$) for any invertible matrix A, then $w_g = (A^\top)^{-1}\eta^\top$, and
$w_g^\top E(y_t) = \eta\gamma$. In this scenario, both approaches are equivalent, suggesting that we should use the
same factors we use in Fama-MacBeth regressions to build a mimicking portfolio for $g_t$. Obviously,
this is only possible if $f_t$ is a vector of tradable portfolios. If not, we can use mimicking portfolios
of $f_t$, $\widehat\gamma_t$, but this implies that the mimicking portfolio needs hedging portfolios already built by
the Fama-MacBeth approach described above, limiting its usefulness.
There is, however, a more interesting choice for $y_t$ that obviates the need to build hedging
portfolios from Fama-MacBeth regressions in the first place. The idea is to use all returns $y_t = r_t$ as basis
assets when building a mimicking portfolio for $g_t$. In this case, we find that $w_g = \Sigma^{-1}\beta\Sigma_v\eta^\top$, and
$w_g^\top E(r_t) = \eta\Sigma_v\beta^\top\Sigma^{-1}\beta\gamma$. This appears to give a different risk premium parameter in the
population. Nevertheless, Giglio & Xiu (2021) prove that as N → ∞, $w_g^\top E(r_t)$ converges to ηγ. This
result suggests that, in the limit, the maximal-correlation factor-mimicking portfolio approach
targets the same risk premium parameter as do Fama-MacBeth regressions.
What makes the second approach more appealing is the fact that it does not require a fully
specified factor model because all it needs are the factor of interest, gt , and the cross section of
test assets, rt . This suggests that estimating a factor’s risk premium does not require knowledge
about the identities of factors driving asset returns, an important advantage when, as we discuss
more in the next section, the entire factor model is not always known. This approach, however,
has a fundamental drawback: the curse of dimensionality—it requires a large cross section of assets
(N → ∞), which may exceed the sample size T to the extent that a projection of gt on rt becomes
infeasible. We now turn to a three-pass estimator that adopts PCA regression to resolve this high-
dimensionality issue.
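
A small simulation can illustrate the maximal-correlation projection in the low-dimensional case where the basis assets are the tradable factors themselves (A equal to the identity, so that the factor means equal the risk premia). The function name and all simulated quantities are assumptions for illustration.

```python
import numpy as np

def mimicking_portfolio(Y, g):
    """Project g (length T) on basis asset returns Y (J x T).
    Returns weights Var(y)^{-1} Cov(y, g) and the implied risk premium w' E(y)."""
    T = Y.shape[1]
    Y_c = Y - Y.mean(axis=1, keepdims=True)
    g_c = g - g.mean()
    w = np.linalg.solve(Y_c @ Y_c.T / T, Y_c @ g_c / T)
    return w, float(w @ Y.mean(axis=1))

rng = np.random.default_rng(9)
T = 5000
gamma = np.array([0.5, 0.3])
eta = np.array([1.0, -0.5])
v = rng.standard_normal((2, T))
f = gamma[:, None] + v                                 # tradable factors: mean equals risk premium
g = 0.2 + eta @ v + 0.5 * rng.standard_normal(T)       # nontradable factor with measurement error
w_g, premium_g = mimicking_portfolio(f, g)
```

In this design the weights should be close to η and the implied premium close to ηγ, in line with the equivalence described above.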

3.3.3. Three-pass regressions and the omitted factor bias. Giglio & Xiu (2021) suggest
using principal component regression (PCR) when building the factor-mimicking portfolios for gt
using all returns rt as basis assets. The PCR approach is a natural choice among high-dimensional
regressions in that rt follows a factor model according to Equation 1. The three-pass method
proceeds as follows:
1. The first pass is an SVD of $\bar R$ to obtain $\widehat\beta$ and $\widehat V$, as in Equation 13.
2. The second pass runs a cross-sectional OLS regression (Equation 24) to obtain risk premia
of $\widehat V$.
3. Finally, the third pass projects $g_t$ onto $\widehat V$,

$$\widehat\eta = \bar G\widehat V^\top(\widehat V\widehat V^\top)^{-1},$$

thus recovering the weights of the mimicking portfolio.

The three-pass estimator of the risk premium of $g_t$ is then obtained by multiplying the portfolio
weights $\widehat\eta$ by the risk premia of these portfolios $\widehat\gamma$. In a compact form, it is given by

$$\widehat\gamma_g = \bar G\widehat V^\top(\widehat V\widehat V^\top)^{-1}(\widehat\beta^\top\widehat\beta)^{-1}\widehat\beta^\top\bar r. \tag{28}$$

As we discuss in Section 4.2, this estimator has asymptotic guarantees in the large N, large
T setting, but only if all factors are pervasive.
Because this estimator does not rely on any prespecified asset pricing model for risk-premia
estimation, it is especially useful for cases in which a researcher is interested in estimating the risk
premium of a nontradable factor predicted by theory (e.g., consumption growth, liquidity, etc.)
but does not want to take a stand on what the other factors are in the model. In contrast, stan-
dard Fama-MacBeth regressions would be biased if some true factors are omitted in this model.
Relatedly, Gagliardini, Ossola & Scaillet (2019) propose a diagnostic criterion for detecting the
number of omitted factors.
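
The three passes can be strung together as in the sketch below. This is an illustrative implementation of Equation 28 on simulated data with pervasive factors, not the authors' code; the function name, the known number of factors K, and the simulation design are assumptions.

```python
import numpy as np

def three_pass(R, g, K):
    """R: N x T returns, g: length-T nontradable factor, K: number of latent factors."""
    T = R.shape[1]
    R_bar = R - R.mean(axis=1, keepdims=True)
    # Pass 1: SVD of demeaned returns (Equation 13).
    left, sing, right_t = np.linalg.svd(R_bar, full_matrices=False)
    V_hat = np.sqrt(T) * right_t[:K]
    beta_hat = left[:, :K] * sing[:K] / np.sqrt(T)
    # Pass 2: cross-sectional OLS of average returns on estimated betas (Equation 24).
    gamma_hat = np.linalg.solve(beta_hat.T @ beta_hat, beta_hat.T @ R.mean(axis=1))
    # Pass 3: time series projection of demeaned g on the estimated factors.
    G_bar = g - g.mean()
    eta_hat = np.linalg.solve(V_hat @ V_hat.T, V_hat @ G_bar)
    return float(eta_hat @ gamma_hat)                  # Equation 28

rng = np.random.default_rng(10)
N, T, K = 200, 1000, 2
gamma = np.array([0.5, 0.3])
eta = np.array([1.0, -0.5])
beta = 1.0 + rng.standard_normal((N, K))
v = rng.standard_normal((K, T))
R = beta @ (gamma[:, None] + v) + 0.5 * rng.standard_normal((N, T))
g = 0.2 + eta @ v + 0.3 * rng.standard_normal(T)
gamma_g_hat = three_pass(R, g, K)
```

The product of the second and third passes is invariant to the unknown rotation of the latent factors, which is why the estimator does not require the factors themselves to be identified.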

3.3.4. Weak factors. Besides the omitted factor bias, another severe issue that plagues the clas-
sical two-pass regression is weak identification. Kan & Zhang (1999) first noted that the inference
on risk premia from two-pass regressions becomes distorted when a useless factor—a factor to
which test assets have zero exposure—is included in the model. Kleibergen (2009) further points
out that standard inference fails if betas are relatively small. This issue is quite relevant in prac-
tice because many test assets are not very sensitive to macroeconomic shocks. Moreover, the same
rank-deficiency problem arises when betas are collinear (even if the factors are individually strong);
that is, some factors are redundant in terms of explaining the variation of expected returns. This
is again a relevant issue in practice due to the existence of hundreds of factors discovered in the
literature, many of which are close cousins and do not add any explanatory power for the cross
section.
When beta is close to zero, its estimation error dominates the true signal, resulting in an error-
in-variables problem. Kleibergen (2009) proposes several test statistics for risk premia that are
uniformly valid over all values of beta. The beneficial robustness of these tests comes at the cost
of a lack of power when weak factors exist. These tests are also designed for testing risk premia of
all factors jointly but are often not informative about the risk premium on any particular factor.
Bryzgalova (2015) suggests eliminating weak factors via a penalized two-pass regression so as to
improve the power for detecting strong factors. However, eliminating weaker factors can lead to
invalid inference and potentially large biases in the risk-premia estimates of the remaining factors.
Jegadeesh et al. (2019) propose a sample-splitting and instrumental variable estimator to cor-
rect the error-in-variables bias. In the same spirit and more rigorously, Anatolyev & Mikusheva
(2022) propose a four-split approach that addresses the issues of weak factors and omitted factors.
They assume that part of v t in Equation 1, call it v 1t , is observable, though potentially weak. They
also assume that its beta, namely β 1 , fully spans the space of expected returns. The other part of
v t , call it v 2t , is latent (hence omitted by econometricians) and unpriced. The four-split estimator
aims for valid inference on the risk premia of v 1t . Note that omitted factors in their setup must be
unpriced to achieve valid inference. In practice, however, it is the omitted priced factors that are
most concerning.
Giglio, Xiu & Zhang (2021) argue that the weak factor problem is fundamentally an issue of test
asset selection. They argue that factor strength is not an inherent property of a factor but instead is
dictated by the selection of test assets. Weaker factors may still be priced, so just eliminating them
is an undesirable solution. Instead, Giglio, Xiu & Zhang (2021) suggest actively selecting test assets
to guarantee that the selected assets have sufficient exposure to the factors of interest; in other
words, a factor can be made stronger by appropriate asset selection—by selecting assets highly
exposed to it. To simultaneously address the weak and omitted factor problems, they propose
an iterative supervised PCA procedure that integrates correlation screening with the three-pass
estimator of Giglio & Xiu (2021). This estimator is robust to both omitted variable bias and the
weak factor problem, as well as to measurement error in observed factors.
Test assets are an important component of empirical asset pricing, yet little work has been
dedicated to rigorously and systematically investigating how they should be selected. When a
model is composed of tradable factors, many important asset pricing analyses are independent of
the test assets. For example, risk premia of tradable factors are best calculated as simple averages
of factor returns, the maximum Sharpe ratio portfolio in the model economy can be inferred from
the tradable factors alone, and model comparison can be conducted without test assets (Barillas &
Shanken 2017). In contrast, test assets are central to the study of nontradable factors because they
are used to construct the necessary factor-mimicking portfolios that in turn are inputs to most
asset pricing analyses.
The choice of test assets in the literature has mainly followed one of three approaches. The first
approach, adopted by the vast majority of the literature, uses a standard set of portfolios sorted on
a few characteristics, such as size and value, following the seminal work of Fama & French (1993).
Lewellen, Nagel & Shanken (2010) argue that this approach sets a rather low hurdle for a factor
pricing model. They suggest augmenting the set of test assets with industry portfolios. Giglio, Xiu
& Zhang (2021) argue that using the standard cross section likely creates a weak factor problem
because these assets may not have exposure to the factor of interest. Alternatively, Ahn, Conrad &
Dittmar (2009) suggest forming portfolios as test assets by clustering individual securities based
on their correlations so that securities within clusters are similar and those across clusters are
different. There is not a clear theoretical rationale behind this proposal, however.
A second approach that has gained traction more recently expands the set of test assets to in-
clude portfolios sorted on a much larger set of characteristics discovered in the past few decades,
on the order of hundreds of portfolios (e.g., Bryzgalova, Pelger & Zhu 2020; Kozak, Nagel &
Santosh 2020). Along these lines, an attractive property of IPCA is that it can be viewed and
assessed from the perspective of individual stocks or characteristic-managed portfolios as test as-
sets. Kelly, Pruitt & Su (2019) argue that this has the attractive property of reducing researcher
discretion over test asset selection.
A third approach curates test assets that are targeted for a specific factor of interest (see, e.g.,
Ang et al. 2006). A common approach is to estimate stock-level betas on a given factor, then sort
assets into portfolios based on the estimated exposure. A small cross section of these sorted port-
folios is expected to be particularly informative about the factor of interest, but it is affected by
the omitted factor problem because it tends to focus only on univariate exposures.
The approach of Giglio, Xiu & Zhang (2021) builds on these approaches by starting from a
large universe of test assets but then selecting only informative assets for estimation.

3.4. Estimating the Stochastic Discount Factor and Its Loadings


A factor’s risk premium is equal to its (negative) covariance with the SDF. In the setup of
Equation 1, an SDF can be written as

$$m_t = 1 - b^\top v_t, \tag{29}$$

where $b = \Sigma_v^{-1}\gamma$, and $\Sigma_v$ is the covariance matrix of factor innovations. The SDF is central to the
field of asset pricing because, in the absence of arbitrage, covariances with the SDF alone explain
cross-sectional differences in expected returns.
As shown in Equation 29, the vector of SDF loadings, b, is related to mean-variance optimal
portfolio weights. SDF loadings b and risk premia γ are directly related through the covariance
matrix of the factors, but they differ substantially in their interpretations. The SDF loading of a
factor tells us whether that factor is useful in pricing the cross section of returns. For example, a
factor could command a nonzero risk premium without appearing in the SDF simply because it is
correlated with the true factors driving the SDF. It is therefore not surprising to see many factors
with significant risk premia. For this reason, it makes more sense to tame the factor zoo by testing
if a new factor has a nonzero SDF loading (or has a nonzero weight in the mean-variance efficient
portfolio) rather than testing if it has a significant risk premium.

3.4.1. Generalized method of moments. The classical approach to estimating SDF loadings
is the generalized method of moments (GMM). In light of Equations 3 and 29 and the definition
of the SDF, we can formulate a set of moment conditions,

$$E(m_t r_t) = 0_{N\times 1}, \qquad E(v_t) = 0_{K\times 1}.$$

Because there are in total K + N moments with 2K parameters (μ and b), we need N ≥ K to ensure
the system is identified.
The GMM estimator is thereby defined as the solution to the optimization problem,

$$\min_{b,\mu}\; g_T(b,\mu)^\top\, \widehat{W}\, g_T(b,\mu), \tag{30}$$

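where the sample moments are given by
$$g_T(b,\mu) = \begin{pmatrix} \dfrac{1}{T}\sum_{t=1}^{T} r_t\,\bigl(1 - b^\top (f_t - \mu)\bigr) \\[2ex] \dfrac{1}{T}\sum_{t=1}^{T} f_t - \mu \end{pmatrix}_{(N+K)\times 1}.$$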

The inference procedure follows the usual GMM formulation (Hansen 1982). For efficiency reasons,
it is customary to choose the optimal weighting matrix, $\widehat{W}_{\mathrm{opt}} = \widehat{\Omega}^{-1}$, where $\widehat{\Omega}$ is a consistent
estimator of $\Omega$, as in Section 4.1. As an alternative, there is a special class of weighting matrices
for which a closed-form solution to Equation 30 is available,

$$\widehat{b} = (\widehat{C}^\top \widehat{W}_{11} \widehat{C})^{-1} (\widehat{C}^\top \widehat{W}_{11}\, \bar{r}), \qquad \widehat{\mu} = \bar{f}, \tag{31}$$
where $\widehat{W}_{11}$ is the top N × N submatrix of some $\widehat{W}$, and $\widehat{C}$ is the N × K sample covariance matrix
between $r_t$ and $v_t$. Recall that $b = \Sigma_v^{-1}\gamma$, and note also that $\widehat{\beta}\,\widehat{\Sigma}_v = \widehat{C}$. It follows that
$\widehat{b} = \widehat{\Sigma}_v^{-1}\widehat{\gamma}$, where $\widehat{\gamma}$ is given by Equation 24 ($\widehat{W}_{11} = I_N$) or 25 ($\widehat{W}_{11} = \widehat{\Sigma}_u^{-1}$). In other words,
Equation 31 amounts to running two-pass CSRs with (univariate) covariances in place of $\widehat{\beta}$. This is
not surprising because according to Equation 2 we have

$$E(r_t) = \alpha + \beta\gamma = \alpha + Cb, \tag{32}$$

where $C = \beta\Sigma_v$ is the covariance between $r_t$ and $v_t$. The two-pass procedure, with univariate
covariances instead of multivariate betas, thereby achieves an estimate of $b$.
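
The closed-form estimator in Equation 31 can be sketched in a few lines. The function name `sdf_loadings_closed_form`, the simulated data, and the choice of the identity weighting matrix are assumptions made for illustration. The final check verifies numerically that, with an identity weighting matrix, the estimate coincides with $\widehat{\Sigma}_v^{-1}\widehat{\gamma}$, where $\widehat{\gamma}$ is taken here to be an OLS cross-sectional regression of average returns on time-series betas without an intercept.

```python
import numpy as np

def sdf_loadings_closed_form(R, F, W11=None):
    """Closed-form GMM estimate of SDF loadings in the spirit of Equation 31.

    R: T x N array of test asset returns; F: T x K array of factors.
    W11: optional N x N weighting matrix (identity if omitted).
    """
    T, N = R.shape
    r_bar = R.mean(axis=0)
    mu_hat = F.mean(axis=0)
    V = F - mu_hat                      # estimated factor innovations v_t
    C_hat = (R - r_bar).T @ V / T       # N x K covariances between r_t and v_t
    if W11 is None:
        W11 = np.eye(N)
    b_hat = np.linalg.solve(C_hat.T @ W11 @ C_hat, C_hat.T @ W11 @ r_bar)
    return b_hat, mu_hat, C_hat

# Simulated data (all numbers are illustrative).
rng = np.random.default_rng(0)
T, N, K = 600, 25, 2
beta = rng.normal(1.0, 0.5, size=(N, K))
gamma = np.array([0.5, 0.2])
V = rng.normal(size=(T, K))
F = 0.1 + V
R = beta @ gamma + V @ beta.T + 0.5 * rng.normal(size=(T, N))

b_hat, mu_hat, C_hat = sdf_loadings_closed_form(R, F)

# Cross-check against the two-pass route: b_hat = Sigma_v_hat^{-1} gamma_hat.
Sigma_v_hat = np.cov(F, rowvar=False, bias=True)
beta_hat = C_hat @ np.linalg.inv(Sigma_v_hat)          # time-series betas
gamma_hat = np.linalg.solve(beta_hat.T @ beta_hat, beta_hat.T @ R.mean(axis=0))
same = np.allclose(b_hat, np.linalg.solve(Sigma_v_hat, gamma_hat))
print(same)
```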

3.4.2. Principal components analysis–based methods. Kozak, Nagel & Santosh (2018) argue
that the absence of near-arbitrage opportunities forces expected returns to (approximately) align
with common factor covariances, even in a world in which belief distortions can affect asset prices.
The strong covariation among asset returns suggests that the SDF can be represented as a function
of a few dominant sources of return variation. PCA of asset returns recovers the common compo-
nents that dominate return variation. Specifically, the first two passes of the three-pass procedure
in Section 3.3.3 yield an SDF estimator without relying on knowledge of factor identities:
$$\widehat{m}_t = 1 - \widehat{\gamma}^\top \widehat{v}_t, \tag{33}$$
where $\widehat{v}_t$ is the $t$th column of $\widehat{V}$.

3.4.3. Penalized regressions. In the PCA approach, the SDF is essentially parameterized as a
small number of linear combinations of factors, as shown in Equation 29. Kozak, Nagel & Santosh
(2020) consider an SDF represented in terms of a set of tradable test asset returns,
$$\widetilde{m}_t = 1 - b^\top [r_t - E(r_t)], \tag{34}$$
where $b$ satisfies $E(r_t) = \Sigma b$, and $\Sigma$ is the covariance matrix of $r_t$. Giglio, Xiu & Zhang (2021)
show that the relationship between the two SDFs (Equations 29 and 34) depends on the degree
of completeness of markets. Assuming that $r_t$ follows Equation 1 and some regularity conditions
hold, these two forms of SDF are asymptotically equivalent as N → ∞ in the sense that

$$\frac{1}{T}\sum_{t=1}^{T} |m_t - \widetilde{m}_t|^2 \lesssim \frac{1}{\lambda_{\min}(\beta^\top\beta)}.$$
Because the right-hand side diminishes as N → ∞ even for relatively weak factors, there is
generally no theoretical difference between estimands.
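To estimate the SDF (Equation 34), Kozak, Nagel & Santosh (2020) suggest solving an
optimization problem, which amounts to a regression of $\bar{r}$ onto $\widehat{\Sigma}$,
$$\widehat{b} = \arg\min_{b}\left\{ (\bar{r} - \widehat{\Sigma} b)^\top \widehat{\Sigma}^{-1} (\bar{r} - \widehat{\Sigma} b) + p_\lambda(b) \right\}, \tag{35}$$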

for which the estimated pricing kernel is given by
$$\widehat{m}_t = 1 - \widehat{b}^\top (r_t - \bar{r}). \tag{36}$$
In Equation 35, $\widehat{\Sigma}$ is the sample covariance matrix of $r_t$, and $p_\lambda(b)$ is a penalty term (such as ridge,
lasso, or elastic net) through which economic priors are imposed. Relatedly, Korsaye, Quaini &
Trojani (2019) provide a rigorous framework for regularization techniques in the recovery of the
SDF in economies with frictions or ambiguity.
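The objective function in Equation 35 appears to require the inverse of the sample covariance
matrix, $\widehat{\Sigma}^{-1}$, which is not well-defined when N > T. However, expanding the quadratic form and
dropping the term that does not depend on $b$ shows that the problem is equivalent to optimizing a
different form of Equation 35,
$$\widehat{b} = \arg\min_{b}\left\{ b^\top \widehat{\Sigma}\, b - 2\, b^\top \bar{r} + p_\lambda(b) \right\}, \tag{37}$$
which avoids calculating $\widehat{\Sigma}^{-1}$.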
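
As a minimal sketch of Equation 37, consider the special case of a ridge penalty, $p_\lambda(b) = \lambda\, b^\top b$, for which the first-order condition gives $\widehat{b} = (\widehat{\Sigma} + \lambda I_N)^{-1}\bar{r}$. The function name `ridge_sdf_loadings`, the penalty level, and the simulated returns are illustrative assumptions. The example deliberately uses more assets than time periods, so the sample covariance matrix is singular and Equation 35 could not be evaluated directly.

```python
import numpy as np

def ridge_sdf_loadings(R, lam):
    """Minimize b' S b - 2 b' r_bar + lam * b'b, i.e., Equation 37 with a ridge penalty.

    R: T x N array of returns; lam: ridge penalty (a tuning parameter).
    """
    r_bar = R.mean(axis=0)
    S = np.cov(R, rowvar=False, bias=True)    # N x N sample covariance, singular if N > T
    N = S.shape[0]
    b_hat = np.linalg.solve(S + lam * np.eye(N), r_bar)
    return b_hat, S, r_bar

rng = np.random.default_rng(1)
T, N = 60, 100                                # more assets than periods
R = 0.01 + 0.05 * rng.normal(size=(T, N))     # simulated returns (illustrative)

lam = 0.01
b_hat, S, r_bar = ridge_sdf_loadings(R, lam)

# Estimated pricing kernel as in Equation 36.
m_hat = 1.0 - (R - r_bar) @ b_hat
print(b_hat.shape, m_hat.mean())
```

Lasso or elastic net penalties do not admit such a closed form and would require an iterative solver.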

3.4.4. Double machine learning. A fundamental task facing the asset pricing field today is
how to bring more discipline to the proliferation of factors. In particular, a question that re-
mains open is how to judge whether a new factor adds explanatory power for asset pricing, relative
to the hundreds of factors the literature has so far produced. Feng, Giglio & Xiu (2020) address
this question with a method for systematically evaluating the contribution of individual factors
relative to existing factors and for conducting appropriate statistical inference in this
high-dimensional setting. While the machine learning methods discussed in the previous section perform
well by employing regularization to trade off bias against variance, both regularization and overfitting cause
a bias that distorts inference. Chernozhukov et al. (2018) introduce a general double machine
learning (DML) framework to mitigate bias and restore valid inference on a low-dimensional pa-
rameter of interest in the presence of high-dimensional nuisance parameters. Feng, Giglio & Xiu
(2020) make use of this framework to test the SDF loading of a newly proposed factor.
Suppose that $g_t$ is the factor of interest and $h_t$ a vector of potentially confounding factors such
that $v_t = (g_t^\top : h_t^\top)^\top$. To test if $g_t$ (e.g., a newly proposed factor) contributes to expected returns
beyond the variables in $h_t$ (e.g., factors that have already been discovered by previous literature),
we should conduct inference on $b_g$ while controlling for $b_h$, where $b = (b_g^\top : b_h^\top)^\top$ satisfies
$E(r_t) = Cb = C_g b_g + C_h b_h$, and $C = \beta\Sigma_v$ is the covariance between $r_t$ and $v_t$. If the number of factors in $v_t$, $K$,
is finite, then the GMM approach introduced in Section 3.4.1 is adequate. When it comes to a
large K setting, however, the classical inference procedure is no longer valid. This is certainly a
relevant case in practice, as T is typically in the hundreds, roughly of the same scale as the number
of factors studied.
In the spirit of DML, Feng, Giglio & Xiu (2020) select controls from $\{\widehat{C}_h\}$ via two respective
lasso regressions: $\bar{r}$ onto $\widehat{C}_h$ and $\widehat{C}_g$ onto $\widehat{C}_h$. The selected controls are the union of the controls
selected by each of the two lasso regressions and are denoted by $\widehat{C}_{h[I]}$. Then, $\widehat{C}_g$ and $\widehat{C}_{h[I]}$ serve as
regressors in another CSR of $\bar{r}$. The resulting estimator of $b_g$,
$$\widehat{b}_g = (\widehat{C}_g^\top M_{\widehat{C}_{h[I]}} \widehat{C}_g)^{-1} (\widehat{C}_g^\top M_{\widehat{C}_{h[I]}}\, \bar{r}),$$
is a desirable candidate for inference because the regularization biases in lasso diminish at a rate
faster than $1/\sqrt{T}$ after partialing out the effect of $\widehat{C}_h$ from $\widehat{C}_g$.
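
The double-selection steps above can be sketched as follows for a single factor of interest. This is a simplified illustration rather than the authors' implementation: the lasso is a hand-rolled coordinate-descent routine (to keep the sketch dependency-free), no intercept is included, the penalty levels `lam_r` and `lam_g` are arbitrary, and the covariances are simulated directly instead of being estimated from returns. All function names are hypothetical.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent lasso: min (1/2n)||y - Xw||^2 + lam * ||w||_1 (no intercept)."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * w[j]                     # remove coordinate j's contribution
            rho = X[:, j] @ resid / n
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]   # soft threshold
            resid -= X[:, j] * w[j]
    return w

def double_selection_bg(r_bar, C_g, C_h, lam_r, lam_g):
    """Select controls with two lassos, then run a post-selection regression for b_g."""
    sel_r = lasso_cd(C_h, r_bar, lam_r) != 0            # lasso of r_bar onto C_h
    sel_g = lasso_cd(C_h, C_g, lam_g) != 0              # lasso of C_g onto C_h
    keep = sel_r | sel_g                                # union of selected controls
    X = C_h[:, keep]

    def partial_out(y):
        # Apply the annihilator M_X = I - X (X'X)^{-1} X' to y.
        if X.shape[1] == 0:
            return y
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return y - X @ coef

    g_res, r_res = partial_out(C_g), partial_out(r_bar)
    return (g_res @ r_res) / (g_res @ g_res), keep

# Simulated cross section (illustrative): g is correlated with the first control.
rng = np.random.default_rng(4)
N, p = 200, 30
C_h = rng.normal(size=(N, p))
C_g = 0.8 * C_h[:, 0] + 0.6 * rng.normal(size=N)
r_bar = 0.5 * C_g + 1.0 * C_h[:, 0] - 1.0 * C_h[:, 1] + 0.01 * rng.normal(size=N)

bg_hat, keep = double_selection_bg(r_bar, C_g, C_h, lam_r=0.1, lam_g=0.1)
print(bg_hat, np.flatnonzero(keep))
```

The second lasso matters because a control that is only weakly related to average returns, but strongly related to $\widehat{C}_g$, would otherwise be omitted and bias the estimate of $b_g$.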

3.4.5. Parametric portfolios and deep learning stochastic discount factors. Because the
SDF (when projected onto tradable assets) is spanned by optimal portfolio returns, estimating the
SDF is effectively a problem of optimal portfolio formation. A fundamental obstacle to the
conventional mean-variance analysis is the low signal-to-noise ratio: Expected returns and covariances
of a large cross section of investable assets cannot be learned with high precision. In the previous
sections, we have discussed factor-based approaches that either exploit economic intuition and
theory or rely on statistical machine learning methods to regularize this learning problem. With
better estimates of expected returns and covariances comes improved portfolio performance.
Brandt, Santa-Clara & Valkanov (2009) propose an innovative solution to the portfolio opti-
mization problem by directly parameterizing portfolio weights as functions of asset characteristics
and then estimating the parameters by solving a utility optimization problem,

$$\max_{\theta}\; \frac{1}{T}\sum_{t=2}^{T} U\!\left(\sum_{i=1}^{N_t} w(\theta, b_{i,t-1})\,\tilde{r}_{i,t}\right),$$

where $w(\theta, b_{i,t-1})$ is a parametric function of stock characteristics, and $U(\cdot)$ is some prespecified
utility function. DeMiguel et al. (2020) show that this approach, when restricted to the special
case of a linear parametric weight function and mean-variance utility, is equivalent to the usual
mean-variance portfolio allocation paradigm but with characteristic-sorted portfolios as basis
assets. Cong et al. (2021) extend this framework to a more flexible neural network model and
optimize the Sharpe ratio of the portfolio (SDF) via reinforcement learning, with more than
50 features plus their lagged values. Chen, Pelger & Zhu (2019) parameterize the SDF loadings
and weights of test asset portfolios as two separate neural networks and adopt an adversarial
minimax approach to estimate the SDF. Both studies adopt long short-term memory (LSTM) models to
incorporate lagged time series information from macro variables, firm characteristics, or past returns.

3.5. Model Specification Tests and Model Comparison


Although financial economists suggest using economic theory to pin down the best model, their
efforts, unfortunately, have led to a zoo of factors and numerous candidate models. Some recent
and prominent models with observable portfolios as factors include Fama & French (2015),
Hou, Xue & Zhang (2015), He, Kelly & Manela (2017), Stambaugh & Yuan (2017), and Daniel,
Hirshleifer & Sun (2020). However, purely statistical tests are often powerless because the sample
sizes are too limited to tease out the true model.

3.5.1. GRS test and extensions. Specifically, assessments of factor pricing models can be
formalized as statistical hypothesis testing problems. Such tests most commonly focus on the zero-
alpha condition: If the factor model reflects the true SDF, then it should price all test assets with
zero alpha (up to sampling variation). A standard formulation for the null hypothesis is
$$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_N = 0. \tag{38}$$
In a simple setting in which all factors are observable and tradable, the model given by Equations 1
and 2 can be written as $r_t = \alpha + \beta f_t + u_t$, so that alphas can be estimated via asset-wise TSRs:
$$\widehat{\alpha}_{TS} = (R M_F \iota_T)(\iota_T^\top M_F \iota_T)^{-1}. \tag{39}$$
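Gibbons, Ross & Shanken (1989) constructed a quadratic test statistic,
$$\widehat{F} = \frac{T-N-K}{N}\,\frac{\widehat{\alpha}_{TS}^\top\, \widehat{\Sigma}_u^{-1}\, \widehat{\alpha}_{TS}}{1 + \bar{f}^\top \widehat{\Sigma}_v^{-1} \bar{f}}, \tag{40}$$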
and developed its exact finite sample distribution, a noncentral F-distribution, under the
assumption of Gaussian errors.
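
A minimal sketch of the statistic in Equation 40 follows. It assumes that both covariance matrices are computed with a 1/T normalization, which is the usual convention for this statistic, and it uses simulated data. The function name `grs_statistic` is hypothetical, and the reference distribution needed for a p-value is not computed here.

```python
import numpy as np

def grs_statistic(R, F):
    """GRS-type statistic of Equation 40.

    R: T x N array of excess returns; F: T x K array of tradable factors.
    Covariances use a 1/T normalization (an assumption of this sketch).
    """
    T, N = R.shape
    K = F.shape[1]
    X = np.column_stack([np.ones(T), F])
    coef, *_ = np.linalg.lstsq(X, R, rcond=None)     # asset-wise time-series regressions
    alpha_hat = coef[0]                              # intercepts, one per asset
    U = R - X @ coef
    Sigma_u = U.T @ U / T                            # unrestricted residual covariance
    f_bar = F.mean(axis=0)
    Sigma_v = np.cov(F, rowvar=False, bias=True).reshape(K, K)
    quad = alpha_hat @ np.linalg.solve(Sigma_u, alpha_hat)
    denom = 1.0 + f_bar @ np.linalg.solve(Sigma_v, f_bar)
    return (T - N - K) / N * quad / denom, alpha_hat

rng = np.random.default_rng(2)
T, N, K = 240, 10, 3
F = 0.5 + rng.normal(size=(T, K))
beta = rng.normal(1.0, 0.3, size=(N, K))
R0 = F @ beta.T + rng.normal(size=(T, N))            # returns simulated with zero alpha

stat_null, _ = grs_statistic(R0, F)
stat_alt, alpha_alt = grs_statistic(R0 + 1.0, F)     # add an alpha of 1.0 to every asset
print(stat_null, stat_alt)
```

Because the function inverts the full N × N residual covariance matrix, it fails or becomes unreliable once N approaches T, which is the limitation discussed next.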
An important limitation of this result is that it requires that T > N + K. In practice, N can be
much larger than T. Even in the case of N < T, the power of the Gibbons-Ross-Shanken (GRS)
test may be compromised because it employs an unrestricted sample covariance matrix, $\widehat{\Sigma}_u$, that is
known to perform badly even for moderate values of N. When asset returns follow an approximate
factor model (Chamberlain & Rothschild 1983), the idiosyncratic errors may be weakly correlated;
thus, it is possible to enhance the power of the GRS test by imposing structure on $\Sigma_u$.
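Pesaran & Yamagata (2017) suggest a simple quadratic test statistic that ignores off-diagonal
elements of $\widehat{\Sigma}_u$ (see footnote 5),
$$J_1 = \frac{T\,\widehat{\alpha}_{TS}^\top\, \mathrm{Diag}(\widehat{\Sigma}_u)^{-1}\, \widehat{\alpha}_{TS}\,(1 + \bar{f}^\top \widehat{\Sigma}_v^{-1} \bar{f})^{-1} - N}{\sqrt{2N\,[1 + (N-1)\,\rho_{N,T}^2]}},$$
where $\rho_{N,T}$ is a correction term related to the sparsity of $\Sigma_u$. This term can be omitted if $\Sigma_u$ is
assumed diagonal, which in turn leads to a simpler test statistic.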
Alternatively, Fan, Liao & Yao (2015) suggest imposing a sparsity structure on $\Sigma_u$. They exploit
this sparsity to achieve a consistent estimator of $\Sigma_u$, denoted as $\widehat{\Sigma}_u^{\mathcal{T}}$, following Fan, Liao &
Mincheva (2011). The resulting test statistic is
$$J_2 = \frac{T\,\widehat{\alpha}_{TS}^\top\, (\widehat{\Sigma}_u^{\mathcal{T}})^{-1}\, \widehat{\alpha}_{TS}\,(1 + \bar{f}^\top \widehat{\Sigma}_v^{-1} \bar{f})^{-1} - N}{\sqrt{2N}}.$$
They propose other enhancements to improve the power of their test against sparse alternatives.
These extensions remedy some of the drawbacks of the GRS test and have asymptotic guaran-
tees as N, T → ∞, which represent an important step forward for tests of asset pricing models. Tests
in this section all rely on models entirely composed of tradable factors, but in light of Equation 46
below, the same test statistics and asymptotic inference should be directly applicable to models
with nontradable and latent factors via Equation 44.

3.5.2. Model comparison tests. Testing models is perhaps less informative than comparing
models. After all, all models are wrong, but some are more useful than others. As Gibbons, Ross
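& Shanken (1989) emphasize, the factor model given in Equation 1 directly implies the following
equality for the GRS test statistic:
$$\alpha^\top \Sigma_u^{-1} \alpha \equiv \begin{pmatrix} (\alpha + \beta\gamma)^\top, & \gamma^\top \end{pmatrix} \begin{pmatrix} \beta\Sigma_v\beta^\top + \Sigma_u & \beta\Sigma_v \\ \Sigma_v\beta^\top & \Sigma_v \end{pmatrix}^{-1} \begin{pmatrix} \alpha + \beta\gamma \\ \gamma \end{pmatrix} - \gamma^\top \Sigma_v^{-1} \gamma. \tag{41}$$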

Going one step further, we obtain $\alpha^\top \Sigma_u^{-1} \alpha = \mathrm{SR}^2(\{r_t, v_t + \gamma\}) - \mathrm{SR}^2(\{v_t + \gamma\})$, where $\mathrm{SR}(\{a_t\})$
denotes the optimal Sharpe ratio of a portfolio using assets $a_t$. In other words, the classical GRS
test statistic can be interpreted as a test of whether the factors achieve the maximal Sharpe ratio,
or whether one can improve on that Sharpe ratio by trading the test assets in addition to the
factors. Intuitively, if $\{v_t + \gamma\}$ already span the optimal portfolio (i.e., the asset pricing model is
correctly specified), the Sharpe ratio gains from augmenting this portfolio with additional test
assets $r_t$ should be zero.
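
The Sharpe ratio interpretation can be checked numerically at the population level. The sketch below draws arbitrary (hypothetical) model parameters, builds the joint mean and covariance of test assets and factors implied by the factor model, and compares the two sides of the identity. The helper `max_sq_sharpe` is an illustrative name for the maximal squared Sharpe ratio, $\mu^\top \Sigma^{-1} \mu$.

```python
import numpy as np

def max_sq_sharpe(mean, cov):
    """Maximal squared Sharpe ratio attainable from assets with the given moments."""
    return float(mean @ np.linalg.solve(cov, mean))

rng = np.random.default_rng(3)
N, K = 5, 2
beta = rng.normal(size=(N, K))
alpha = rng.normal(scale=0.1, size=N)
gamma = np.array([0.4, 0.2])
A = rng.normal(size=(K, K))
Sigma_v = A @ A.T + np.eye(K)            # positive definite factor covariance
B = rng.normal(size=(N, N))
Sigma_u = B @ B.T + np.eye(N)            # positive definite idiosyncratic covariance

# Joint moments of (r_t, v_t + gamma) implied by r_t = alpha + beta (v_t + gamma) + u_t.
mean_joint = np.concatenate([alpha + beta @ gamma, gamma])
cov_joint = np.block([[beta @ Sigma_v @ beta.T + Sigma_u, beta @ Sigma_v],
                      [Sigma_v @ beta.T, Sigma_v]])

lhs = max_sq_sharpe(alpha, Sigma_u)                                    # alpha' Sigma_u^{-1} alpha
rhs = max_sq_sharpe(mean_joint, cov_joint) - max_sq_sharpe(gamma, Sigma_v)
print(lhs, rhs)
```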
Indeed, we can compare models using the left-hand side of Equation 41 as a criterion. Specifically,
consider two models with tradable factor sets $\{f_t^{(1)}\}$ and $\{f_t^{(2)}\}$, respectively. Barillas &
Shanken (2017) advocate comparing these models on their ability to price all returns, both test
assets and traded factors. With this perspective comes an insight that test assets tell us nothing
about model comparison beyond what we learn from each model’s ability to price factors of the
other models! This observation is verified from Equation 41, since

$$\alpha^{(1)\top} (\Sigma_u^{(1)})^{-1} \alpha^{(1)} < \alpha^{(2)\top} (\Sigma_u^{(2)})^{-1} \alpha^{(2)} \iff \mathrm{SR}^2(\{f_t^{(1)}\}) > \mathrm{SR}^2(\{f_t^{(2)}\}). \tag{42}$$

5 We omit finite sample adjustment terms from the original construction of their test statistic for simplicity and clarity.

Barillas et al. (2020) exploit this insight and build asymptotically valid tests of model
comparison using differences of squared Sharpe ratios. Their analysis allows for pairwise compar-
ison between nonnested models and accounts for estimation error in factor-mimicking portfolio
weights for nontradable factors. Alternative criteria for model comparison also include the
Hansen-Jagannathan distance of Hansen & Jagannathan (1997) (see, e.g., Kan & Robotti 2009;
Gospodinov, Kan & Robotti 2013) and the cross-sectional $R^2$ (see Kan, Robotti & Shanken 2013).

3.5.3. Bayesian approach. As the set of candidate models expands, model comparison via
pairwise asymptotic tests becomes a daunting task. And pairwise model comparison may not un-
ambiguously isolate the best-performing model. Moreover, multiple testing issues can arise. To
find the best factor pricing model, Barillas & Shanken (2018) develop a Bayesian procedure that
computes model probabilities for a collection of asset pricing models with tradable factors. They
adopt an off-the-shelf Jeffreys prior on betas and residual covariances, following the earlier work
of Harvey & Zhou (1990):

$$P(\beta, \Sigma_u) \propto |\Sigma_u|^{-(N+1)/2}.$$

Under the null hypothesis of no alpha, the prior on alpha is a point mass (a Dirac delta) at 0. Under
the alternative, alpha is distributed as
$$P(\alpha \mid \beta, \Sigma_u) = \mathcal{N}(0, k\Sigma_u), \quad \text{for some } k > 0.$$
The benefit of this prior is its convenience and economic sensibility: It imposes that the expected
squared Sharpe ratio of the arbitrage portfolio, $\alpha^\top \Sigma_u^{-1} \alpha$, is $kN$, which does not take implausibly large values.
Having an otherwise diffuse prior on α would force the Bayes factor to favor the null (Kass &
Raftery 1995). Barillas & Shanken (2018) provide closed-form expressions of the Bayes factor for
testing zero alpha and, more importantly, of the marginal likelihood of each model. In light of
Barillas & Shanken (2017), the model comparison of Barillas & Shanken (2018) is based on an
aggregation of evidence from all possible multivariate regressions of excluded factors on factor
subsets—i.e., it takes test assets out of the picture. Chib, Zeng & Zhao (2020) show that the use
of the standard Jeffreys priors on model-specific nuisance parameters is unsound for Bayes factors
and propose a new class of improper priors for nuisance parameters based on invertible maps,
which leads to valid marginal likelihoods and model comparisons.
Bryzgalova, Huang & Julliard (2022) further extend the Bayesian framework for model selec-
tion in the presence of potentially weak and nontradable factors. They reparameterize the expected
returns using Equation 32 and propose a spike-and-slab prior on b to encourage model selection
and ensure the validity of Bayes factors (because a flat prior would otherwise inflate Bayes factors
for models that contain weak factors).
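More specifically, they introduce a vector of binary latent variables $\delta = (\delta_1, \delta_2, \ldots, \delta_K)^\top$, where
$\delta_j \in \{0, 1\}$. The vector $\delta$ indexes $2^K$ possible models. The $j$th variable, $b_j$ (with associated loadings $C_j$), is
included if and only if $\delta_j = 1$. Their prior on $b$ has the following spike-and-slab form:

$$P(b \mid \delta, \sigma^2) = \prod_{j=1}^{K} \left[ (1 - \delta_j)\,\mathrm{Dirac}(b_j) + \delta_j\, P(b_j \mid \sigma^2) \right], \qquad P(b_j \mid \sigma^2) \sim \mathcal{N}(0, \sigma^2 \psi_j);$$

$$P(\delta \mid w) = \prod_{j=1}^{K} w^{\delta_j} (1 - w)^{1 - \delta_j}, \qquad w \sim P(w); \qquad P(\sigma^2) \propto \sigma^{-2}.$$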

The Gaussian prior is used to model the nonnegligible entries (the slab), and the Dirac mass
at zero is used to model the negligible entries (the spike), which could be replaced by a continuous
density heavily concentrated around zero. This prior, originally proposed by Mitchell &
Beauchamp (1988), is known to favor parsimonious models in high dimensions, avoiding the curse
of dimensionality. Another crucial component of this prior lies in their choice of $\psi_j$,
$$\psi_j = \psi\, \rho_j^\top \rho_j,$$
where $\rho_j$ is an N × 1 vector of correlation coefficients between factor j and the test assets, and
$\psi > 0$ is a tuning parameter that controls the degree of shrinkage over all factors. If $\rho_j$ is close
to zero, the prior discourages factor j from being selected. This prior, however, does not seem to guard
against models with highly correlated factors that cause a rank-deficiency issue similar to that of
weak factors.
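
The role of $\psi_j$ can be illustrated with a small simulation. The sketch below is an illustration under stated assumptions, not the authors' code: the function names `slab_scales` and `draw_b_given_delta` are hypothetical, the data are simulated with one pervasive factor and one pure-noise factor, and the draw conditions on fixed values of $\delta$ and $\sigma^2$ rather than sampling them from their priors.

```python
import numpy as np

def slab_scales(R, F, psi=1.0):
    """psi_j = psi * rho_j' rho_j, with rho_j the correlations of factor j with the N test assets."""
    T = R.shape[0]
    Rz = (R - R.mean(axis=0)) / R.std(axis=0)     # standardized returns (ddof = 0)
    Fz = (F - F.mean(axis=0)) / F.std(axis=0)     # standardized factors (ddof = 0)
    rho = Rz.T @ Fz / T                           # N x K matrix of sample correlations
    return psi * (rho ** 2).sum(axis=0)

def draw_b_given_delta(delta, sigma2, psi_j, rng):
    """Draw b from the spike-and-slab prior conditional on delta and sigma^2."""
    slab = rng.normal(0.0, np.sqrt(sigma2 * psi_j))
    return np.where(delta == 1, slab, 0.0)        # excluded factors sit exactly at zero

rng = np.random.default_rng(5)
T, N = 500, 20
f_strong = rng.normal(size=T)                     # factor that drives returns
f_noise = rng.normal(size=T)                      # factor unrelated to returns
loadings = rng.uniform(0.5, 1.5, size=N)
R = np.outer(f_strong, loadings) + rng.normal(size=(T, N))
F = np.column_stack([f_strong, f_noise])

psi_j = slab_scales(R, F, psi=1.0)
b_draw = draw_b_given_delta(np.array([1, 0]), sigma2=1.0, psi_j=psi_j, rng=rng)
print(psi_j, b_draw)
```

The noise factor receives a slab variance close to zero, so even when it is included its loading is shrunk heavily toward zero, which is how the prior penalizes weak factors.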
Taking test assets as given, Bryzgalova, Huang & Julliard (2022) aim to select an SDF that
does not contain weak factors. The weak factors are defined in terms of C, which is similar to but
distinct from the definition in Section 3.3.4, in which the weak factor problem is with respect to
beta (see footnote 6).

3.6. Alphas and Multiple Testing


Alphas are the portion of expected returns that cannot be explained by risk exposures. Thus, a
portfolio with a significant alpha relative to a status quo model (e.g., the CAPM or the Fama-
French three-factor model) is dubbed an anomaly. Harvey, Liu & Zhu (2016) investigate more
than 300 anomalies proposed in the literature and argue that many of these anomalies are statistical
artifacts due to data snooping or multiple testing (MT).
The literature in asset pricing has long been aware of data-snooping concerns and MT issues
in alpha tests and has taken various approaches to address them over the years. Leading examples
include Lo & MacKinlay (1990) and Sullivan, Timmermann & White (1999), among many others.
Early proposals suggest replacing a multitude of null hypotheses with one single null hypothesis:
$H_0: \max_i \alpha_i \le 0$ or $H_0: E(\alpha_i) = 0$ (see, e.g., White 2000, Kosowski et al. 2006, Fama & French
2010). While these are interesting null hypotheses for testing, more relevant and informative
hypotheses for alpha testing are perhaps
$$H_0^i: \alpha_i = 0, \quad i = 1, \ldots, N. \tag{43}$$
This collection of hypotheses is fundamentally different from the single null hypothesis of
Gibbons, Ross & Shanken (1989) in Equation 38. Suppose $t_i$ is a test statistic for the null $H_0^i$
(often taken as the t-statistic) and that a corresponding test rejects the null whenever $t_i > c_i$ for
some prespecified cutoff $c_i$. Let $\mathcal{H}_0 \subset \{1, \ldots, N\}$ denote the set of indices for which the
corresponding null hypotheses are true. In addition, let R be the total number of rejections in a sample,
and let F be the number of false rejections in that sample:

$$F = \sum_{i=1}^{N} 1\{i \le N: t_i > c_i \text{ and } i \in \mathcal{H}_0\}, \qquad R = \sum_{i=1}^{N} 1\{i \le N: t_i > c_i\}.$$

Both F and R are random variables. Note that, in a specific sample, we can obviously observe R,
but we cannot observe F. Nonetheless, we can design procedures to effectively limit F relative to
R in expectation.
More formally, the MT literature often works with the false discovery proportion (defined as
$\mathrm{FDP} = F/\max\{R, 1\}$) and seeks procedures to control its expectation, known as the false discovery
rate [defined as $\mathrm{FDR} = E(\mathrm{FDP})$]. Other objects that some MT approaches seek to control are the
per-test error rate, $E(F)/N$, and the family-wise error rate [defined as $\mathrm{FWER} = P(F \ge 1)$].

6 In related work, Gospodinov, Kan & Robotti (2014) take a frequentist approach to this problem.

A naïve procedure that tests each individual hypothesis at a predetermined level $\tau \in (0, 1)$
guarantees that $E(F)/N \le \tau$. Alternatively, the Bonferroni procedure tests each hypothesis at a
level $\tau/N$, which translates into a higher t-statistic hurdle. This guarantees that $P(F \ge 1) \le \tau$ and
keeps the FDR below τ . Naturally, raising the hurdle for a discovery reduces the incidence of
false discovery, but this also mechanically reduces the rate of true positives. In other words, false
discovery control sacrifices power. The FDR control procedures of Benjamini & Hochberg (1995)
and Benjamini & Yekutieli (2001) attempt to strike a better balance between false discovery and
power. By accepting a certain number of false discoveries, we pay a lesser price in power and thus
have fewer missed discoveries. Barras, Scaillet & Wermers (2010), Bajgrowicz & Scaillet (2012),
and Harvey, Liu & Zhu (2016) are among the first to import these statistical methods into asset
pricing contexts (see footnote 7).
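
The trade-off between the two kinds of control can be seen in a small sketch that applies the Bonferroni rule and the Benjamini & Hochberg (1995) step-up rule to the same p-values. The p-values below are made up for illustration, and the function names are hypothetical.

```python
def bonferroni(pvalues, tau):
    """Reject hypothesis i if p_i <= tau / N (controls the family-wise error rate)."""
    n = len(pvalues)
    return [p <= tau / n for p in pvalues]

def benjamini_hochberg(pvalues, tau):
    """Step-up rule: find the largest rank k with p_(k) <= k * tau / N and reject
    the k hypotheses with the smallest p-values."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * tau / n:
            k_max = rank
    reject = [False] * n
    for i in order[:k_max]:
        reject[i] = True
    return reject

# Illustrative p-values for N = 10 alpha tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
tau = 0.05

n_naive = sum(p <= tau for p in pvals)           # each test at level tau, no correction
n_bonf = sum(bonferroni(pvals, tau))
n_bh = sum(benjamini_hochberg(pvals, tau))
print(n_naive, n_bonf, n_bh)
```

With these inputs the uncorrected tests reject five hypotheses, Bonferroni rejects one, and the step-up rule rejects two, which reflects the ordering in power described above.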

More recently, to obtain valid p-values and t-statistics for alphas in this context, Giglio, Liao
& Xiu (2021) develop a rigorous framework with asymptotic guarantees to conduct inference
on alphas in linear factor models, accounting for the high dimensionality of test assets, missing
data, and potentially omitted factors. Factor model presentations up to this point have imposed
that alphas are zero, which makes risk premia identifiable. Giglio, Liao & Xiu (2021) relax the
zero-alpha assumption and impose an assumption that alpha is cross-sectionally independent of
beta (and accompany this with a large N asymptotic scheme). Their alpha estimator is given by

$$\widehat{\alpha} = \bar{r} - \widehat{\beta}\,\widehat{\gamma}, \qquad \widehat{\gamma} = (\widehat{\beta}^\top M_{\iota_N} \widehat{\beta})^{-1} (\widehat{\beta}^\top M_{\iota_N}\, \bar{r}), \tag{44}$$

where $\widehat{\beta}$ is given by Equation 9 if all factors are observable or by Equation 13 if factors are
latent. Including an intercept term in the CSR in Equation 44 allows for a possibly nonzero
cross-sectional mean for alpha. Then p-values of $\widehat{\alpha}$ can be constructed using Equation 46 below,
which serve as inputs for FDR control.
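
A minimal sketch of the estimator in Equation 44 follows, taking the betas as given. It assumes that $M_{\iota_N} = I_N - \iota_N \iota_N^\top / N$ is the cross-sectional demeaning matrix, and the function name `alpha_from_csr` and the simulated inputs are illustrative. The example is constructed so that every asset has the same alpha of 0.3, which shows that demeaning leaves the risk premium estimate unaffected by a nonzero cross-sectional mean of alpha.

```python
import numpy as np

def alpha_from_csr(r_bar, beta_hat):
    """Alphas from a cross-sectional regression with an intercept (Equation 44).

    r_bar: length-N vector of average returns; beta_hat: N x K matrix of betas.
    """
    N = len(r_bar)
    M = np.eye(N) - np.ones((N, N)) / N          # cross-sectional demeaning matrix
    gamma_hat = np.linalg.solve(beta_hat.T @ M @ beta_hat, beta_hat.T @ M @ r_bar)
    alpha_hat = r_bar - beta_hat @ gamma_hat
    return alpha_hat, gamma_hat

rng = np.random.default_rng(7)
N, K = 50, 2
beta_hat = rng.normal(1.0, 0.5, size=(N, K))
gamma_true = np.array([0.5, 0.2])
r_bar = 0.3 + beta_hat @ gamma_true              # common alpha of 0.3, no noise

alpha_hat, gamma_hat = alpha_from_csr(r_bar, beta_hat)
print(gamma_hat, alpha_hat[:3])
```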
The aforementioned frequentist MT corrections tend to be very conservative to limit false dis-
coveries. Generally speaking, they widen confidence intervals and raise p-values but do not alter
the underlying point estimate. Jensen, Kelly & Pedersen (2021) take an empirical Bayes approach
to understanding alphas in the high-dimensional context of the factor zoo, including addressing
concerns about false anomaly discoveries. They propose a Bayesian hierarchical model to accom-
plish their MT correction, which leverages two key model attributes. First is a zero-alpha prior,
which imposes statistical conservatism in analogy to frequentist MT methods. It anchors alpha
estimates to a sensible null in case the data are insufficiently informative about the parameters
of interest. Bayesian false discovery control comes from shrinking estimates toward this prior. A
benefit of the Bayesian approach, however, is that the degree of FDR control decreases as data ac-
cumulate. Eventually, with enough data, the prior gets zero weight and there is no MT correction.
This is justified: In the large data limit, there are no false discoveries! In other words, Bayesian
modeling flexibly decides on the severity of MT correction based on how much information there
is in the data.
7 In addition, Harvey & Liu (2020) propose an innovative double-bootstrap method to control FDR while also considering a false negative rate and odds ratio.

Second, the hierarchical structure in the Jensen, Kelly & Pedersen (2021) model leverages the
joint behavior of factors, allowing factors’ alpha estimates to borrow strength from one another.
As a result, alphas for different factors are shrunk not only toward zero but also toward each other.
The frequentist corrections typically treat factors in isolation, making those corrections even more
conservative in some cases, and those corrections always widen confidence intervals and reduce
discoveries. A fascinating feature of the Bayesian hierarchical model is that jointly modeling factors
can in some cases narrow confidence intervals. If increased precision of alpha estimates from joint
estimation overshadows the discovery-reducing effect of shrinkage, the Bayesian MT approach
can in fact enhance statistical power. Indeed, Jensen, Kelly & Pedersen (2021) show, in global factor
returns data, that conservative shrinkage to the prior and improved alpha estimate precision almost
exactly net out, and the number of discoveries is roughly the same as in the frequentist analysis
without an MT correction.
In related work, Chen (2021) argues that it would require an absurd amount of hacking
attempts for p-hacking to explain the anomaly alpha discoveries documented in the literature.
More explicitly, these anomalies are, broadly speaking, replicable, as demonstrated by Chen &
Zimmermann (2022) and Jensen, Kelly & Pedersen (2021).

4. ASYMPTOTIC THEORY

Three main asymptotic schemes have emerged in the literature for characterizing the statistical
properties of factor models, risk premia, and alphas. Classical inference relies on the usual large T,
fixed N asymptotics. This remains the most common setup in asset pricing. The second scheme
allows both N and T to increase to ∞ (with some rate restrictions). The third scheme adopts a
large N, fixed T design. There are pros and cons with each scheme that should be considered when
conducting inference. We illustrate this point with several examples here.

4.1. Fixed N, Large T


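Under the classical scheme, Shanken (1992) developed the central limit theorem of the two-pass
estimator (Equations 9 and 24). The asymptotic variances of the OLS and GLS two-pass risk-premia
estimators are given by
$$\text{OLS:}\quad \mathrm{Avar}(\widehat{\gamma}) = \frac{1}{T}\Bigl[ (\beta^\top\beta)^{-1} \beta^\top \Sigma_u \beta\, (\beta^\top\beta)^{-1} \underbrace{\bigl(1 + \gamma^\top (\Sigma_v)^{-1} \gamma\bigr)}_{\text{Shanken adjustment for } \widehat{\beta}} + \Sigma_v \Bigr],$$
$$\text{GLS:}\quad \mathrm{Avar}(\widehat{\gamma}) = \frac{1}{T}\Bigl[ (\beta^\top (\Sigma_u)^{-1} \beta)^{-1} \underbrace{\bigl(1 + \gamma^\top (\Sigma_v)^{-1} \gamma\bigr)}_{\text{Shanken adjustment for } \widehat{\beta}} + \Sigma_v \Bigr].$$

In the same vein, the GMM estimator of SDF loadings, $\widehat{b}$, given by Equation 30, has asymptotic
variance,
$$\mathrm{Avar}(\widehat{b}) = \frac{1}{T}\, (G^\top W G)^{-1} G^\top W\, \Omega\, W G\, (G^\top W G)^{-1},$$
where
$$W = \operatorname*{plim}_{T\to\infty} \widehat{W}, \qquad G = \operatorname*{plim}_{T\to\infty} \nabla_{(b,\mu)}\, g_T(b,\mu), \qquad \text{and} \qquad \Omega = \lim_{T\to\infty} \mathrm{Var}\bigl(\sqrt{T}\, g_T(b,\mu)\bigr).$$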

4.2. Large N, Large T


Suppose that N is allowed to increase with T. Additionally, suppose that betas satisfy a pervasiveness
assumption $\|N^{-1}\beta^\top\beta - \Sigma_\beta\| = o_P(1)$ for some $\Sigma_\beta > 0$ as well as a bounded eigenvalue
assumption $\|\Sigma_u\| \lesssim 1$. Then we have

$$\|(\beta^\top\beta)^{-1}\| \lesssim N^{-1}, \qquad \|\beta^\top \Sigma_u \beta\| \lesssim N, \qquad \|[\beta^\top (\Sigma_u)^{-1} \beta]^{-1}\| \lesssim N^{-1}.$$

As a result, it is straightforward to show that the asymptotic variances of both OLS and (infeasible)
GLS share the form
$$\mathrm{Avar}(\widehat{\gamma}) = T^{-1}\Sigma_v + O\bigl(N^{-1}T^{-1}\bigr). \tag{45}$$

Heuristically, we see that when N is large, there is no need to worry about estimating a large
covariance matrix $\Sigma_u$ or making a Shanken adjustment. Moreover, both OLS and infeasible GLS
are asymptotically equivalent to the sample mean estimator $\bar{f}$ regardless of whether f is tradable
or not. All these estimators achieve the same asymptotic variance, $\Sigma_v/T$. In this regard, adopting
the large N, large T scheme greatly simplifies the inference on $\gamma$!
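
The convergence in Equation 45 can be illustrated by evaluating the fixed-N OLS variance formula from Section 4.1 for increasing N. The sketch below uses hypothetical parameter values, simulated betas, and an identity idiosyncratic covariance matrix, and the function name `avar_ols_two_pass` is an illustrative choice. It reports how far the Shanken-adjusted variance is from $\Sigma_v/T$.

```python
import numpy as np

def avar_ols_two_pass(beta, Sigma_u, Sigma_v, gamma, T):
    """Fixed-N asymptotic variance of the OLS two-pass risk premium estimator."""
    BtB_inv = np.linalg.inv(beta.T @ beta)
    shanken = 1.0 + gamma @ np.linalg.solve(Sigma_v, gamma)    # Shanken adjustment
    sandwich = BtB_inv @ beta.T @ Sigma_u @ beta @ BtB_inv
    return (shanken * sandwich + Sigma_v) / T

rng = np.random.default_rng(6)
K, T = 2, 600
gamma = np.array([0.5, 0.2])
Sigma_v = np.array([[1.0, 0.3],
                    [0.3, 1.0]])

gaps = []
for N in (10, 100, 1000):
    beta = rng.normal(1.0, 0.5, size=(N, K))     # pervasive (strong) factor exposures
    Sigma_u = np.eye(N)                          # bounded idiosyncratic covariance
    avar = avar_ols_two_pass(beta, Sigma_u, Sigma_v, gamma, T)
    gaps.append(np.linalg.norm(avar - Sigma_v / T))
print(gaps)
```

The gap shrinks roughly in proportion to 1/N in this design, consistent with the remainder term in Equation 45.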
Similarly, in light of the aforementioned relationship between $\widehat{\gamma}$ (Equation 24) and $\widehat{b}$
(Equation 31, so that $\widehat{b} = \widehat{\Sigma}_v^{-1}\widehat{\gamma}$), as well as Equation 45, we can heuristically derive the asymptotic
variance of $\widehat{b}$ for both OLS and (infeasible) GLS in the large N, large T setting. Simply applying
the delta method to the joint distribution of $\bar{v}$ and $\widehat{\Sigma}_v$, we have

$$\mathrm{Avar}(\widehat{b}) = \frac{1}{T}\Bigl\{ (\Sigma_v)^{-1} - 2E\bigl[ (\Sigma_v)^{-1} v_t v_t^\top (\Sigma_v)^{-1}\, \gamma^\top (\Sigma_v)^{-1} v_t \bigr] + \mathrm{Var}\bigl[ (\Sigma_v)^{-1} v_t v_t^\top (\Sigma_v)^{-1} \gamma \bigr] \Bigr\}.$$

Again, both infeasible GLS and OLS estimates of $b$ are asymptotically equivalent when N and T
are large. But OLS is simpler, as GLS requires $(\Sigma_u)^{-1}$, which would be poorly estimated without
additional restrictions on $\Sigma_u$.
Another blessing of high dimensionality (N → ∞) is that econometricians need not know
the factors’ identities. Latent factors and factor exposures can be consistently recovered via SVD
in Equation 12, up to some invertible matrix H. Consequently, factor risk premia, $\gamma$, are also
recoverable up to this transformation. Formally, Giglio & Xiu (2021) establish that

$$\widehat{\gamma} - H\gamma = H\bar{v} + O_P(N^{-1} + T^{-1}).$$

Even though these estimated factors cannot be interpreted, which is a major drawback of any
latent factor model, Giglio & Xiu (2021) show that these factors serve as controls that facilitate
the inference on $\gamma_g = \eta\gamma$, which can be identified and hence interpreted for any factor of
interest, $g_t$.
With respect to alphas, Giglio, Liao & Xiu (2021) show that alpha estimates satisfy

$$\sigma_{i,NT}^{-1}\,(\widehat{\alpha}_i - \alpha_i) \xrightarrow{d} \mathcal{N}(0, 1),$$
$$\sigma_{i,NT}^{2} = \frac{1}{T}\mathrm{Var}\bigl(u_{it}(1 - v_t^\top \Sigma_v^{-1} \gamma)\bigr) + \frac{1}{N}\mathrm{Var}(\alpha_i)\,\frac{1}{N}\beta_i^\top S_\beta^{-1} \beta_i, \tag{46}$$

for each i ≤ N as N, T → ∞. Here we have $S_\beta = \frac{1}{N}\beta^\top M_{1_N} \beta$. The second term is $O_P(N^{-1})$,
suggesting that $\widehat{\alpha}$ is inconsistent if N is finite. This formula holds whether factors are observable or
latent. If T log N = o(N), the second term diminishes sufficiently fast that one only needs the first
term in Equation 46 to construct p-values for each individual alpha.
A critical assumption behind the above analysis is that all factors are pervasive. While this
assumption is widely adopted in modern factor analysis (e.g., Bai 2003) due to its simplicity and
convenience, it is often in conflict with empirical evidence. If this assumption is violated, factors
and their risk exposures may not be discovered by PCA.
There is a growing strand of econometrics literature on weak factor models. Bai & Ng (2008)
argue that the properties of idiosyncratic errors should be considered when constructing principal
components. Dropping some data, if they are noisy, may improve the forecasting. They compare
the empirical performance of hard thresholding, lasso, elastic net, and least angle regressions for
the selection of subsets for factor estimation (without theoretical analysis). Huang et al. (2021)
propose a scaled PCA approach that incorporates information from the forecasting target into
the factor extraction procedure. Bailey, Kapetanios & Pesaran (2021) assume a sparse structure on
the loading matrix of factor exposure. Under this assumption, they propose a measure of factor
strength. Freyaldenhoven (2019) proposes an estimator of the number of factors in the presence
of weak factors, though the notion of weak factors is somewhat strong because PCA in that setting
can still recover such factors consistently. Pesaran & Smith (2019) investigate the impact of factor
strength and pricing error on risk premium estimation. They point out that the conventional
two-pass risk premium estimator converges at a lower rate as the factors become weaker.
Lettau & Pelger (2020a) compare their risk-premia PCA with the standard PCA estimator
in a setting in which all factors are extremely weak, so much so that they are not statistically
distinguishable from idiosyncratic noise [see theoretical results by Onatski (2009, 2012) in similar
weak factor models]. In that case, no estimator can be consistent for either risk premia or the SDF.
Lettau & Pelger (2020a) show that risk-premia PCA does not consistently recover the SDF, but
it correlates more with the SDF than does the SDF obtained from standard PCA. Rather than
focusing on this extreme case of weak factors, Giglio, Xiu & Zhang (2021) develop asymptotic
theory covering a whole range of factor weaknesses, which permits consistent estimation of factors,
risk premia, and the SDF. Formally, they allow for the case in which the minimum eigenvalues of
the factor component in the covariance matrix of returns diverge, whereas the largest eigenvalue
due to the idiosyncratic errors is bounded. In this general setup, a weak factor problem arises if
and only if $N/[\lambda_{\min}(\beta^\top\beta)T] \nrightarrow 0$, in which case the three-pass estimator of Giglio & Xiu (2021),
ridge or partial least squares estimators, and the risk-premia PCA estimator of Lettau & Pelger
(2020a) all give a biased risk premium estimate, but the supervised PCA estimator of Giglio, Xiu
& Zhang (2021) still works.
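The ratio that appears in this condition is easy to compute for a given loading matrix. The sketch below is only a finite-sample illustration of an asymptotic statement: it assumes the loadings are known, and the function name `weak_factor_ratio` and the two simulated designs are hypothetical. With pervasive loadings the smallest eigenvalue of the loading Gram matrix grows in proportion to N and the ratio is of order 1/T, whereas a factor that loads on only a handful of assets leaves the ratio far from zero.

```python
import numpy as np

def weak_factor_ratio(beta, T):
    """Finite-sample analogue of N / [lambda_min(beta' beta) * T]."""
    N = beta.shape[0]
    lam_min = np.linalg.eigvalsh(beta.T @ beta)[0]   # eigvalsh returns ascending eigenvalues
    return N / (lam_min * T)


rng = np.random.default_rng(0)
N, T = 1000, 100

# Strong case: both factors load on every asset.
beta_strong = rng.normal(size=(N, 2))

# Weak case: the second factor loads on only the first 10 assets.
beta_weak = beta_strong.copy()
beta_weak[10:, 1] = 0.0

ratio_strong = weak_factor_ratio(beta_strong, T)
ratio_weak = weak_factor_ratio(beta_weak, T)
print(ratio_strong, ratio_weak)
```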

4.3. Large N, Fixed T


Raponi, Robotti & Zaffaroni (2020) propose a different asymptotic framework to estimate and
test linear asset pricing models. In their setup, T is fixed, yet N increases. As explained by Shanken
(1992), when T is fixed, it is impossible to have a consistent estimator of risk premia. Raponi,
Robotti & Zaffaroni (2020) therefore focus on the so-called ex post risk premia, defined as
$\gamma^{p} = \gamma + \bar{f} - E(f_t)$, and establish that the two-pass OLS estimator, after some bias correction,
converges to $\gamma^{p}$ at the rate of $N^{-1/2}$.
Not surprisingly, their central limit theorem provides a more accurate finite sample description
of the two-pass estimator when T is small. The caveat, nonetheless, is that the estimand is
dominated by factor innovations. This is because $\bar{f} - E(f_t) \sim O_P(T^{-1/2})$ is typically large relative
to γ, as $\gamma/\mathrm{std}(\bar{f} - E(f_t))$ is effectively the t-statistic for testing the factor risk premium, which is
small or insignificant unless T is large.
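A short simulation illustrates why the ex post risk premium is the natural estimand when T is fixed. This is a sketch under a simplifying assumption that is not part of the cited framework: the true betas are treated as known, which sidesteps the errors-in-variables bias correction entirely. The function name `ex_post_demo` and the data-generating process are hypothetical. As N grows with T held at 12, the cross-sectional OLS estimate settles on γ plus the realized mean of the factor innovations, not on γ itself.

```python
import numpy as np

def ex_post_demo(N, T, gamma, rng):
    """Cross-sectional OLS with known betas and a fixed, short time series."""
    K = len(gamma)
    beta = 1.0 + rng.normal(size=(N, K))     # true exposures, assumed known
    F_innov = rng.normal(size=(K, T))        # f_t - E(f_t)
    U = rng.normal(size=(N, T))              # idiosyncratic errors
    R = beta @ (gamma[:, None] + F_innov) + U

    gamma_ex_post = gamma + F_innov.mean(axis=1)   # ex post risk premium
    est, *_ = np.linalg.lstsq(beta, R.mean(axis=1), rcond=None)
    return est, gamma_ex_post


rng = np.random.default_rng(0)
gamma = np.array([0.5, 0.2])
est, gamma_ex_post = ex_post_demo(N=20000, T=12, gamma=gamma, rng=rng)
print("estimate:           ", est)
print("ex post risk premia:", gamma_ex_post)
print("ex ante risk premia:", gamma)
```

The gap between the ex post and ex ante quantities is the realized factor innovation mean, which does not shrink with N.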
Zaffaroni (2019) extends this framework to allow for latent factors, providing new asymptotic
analysis on PCA-based estimators of ex post risk premia and the associated ex post SDF. The
strength of this setup is that it naturally handles time-varying factor models, in which every feature
is allowed to be time varying, including loadings, idiosyncratic risk, and the number of risk factors.

5. CONCLUSION
Factor models have historically been the workhorse framework for empirical analysis in asset
pricing. In this review, we survey the next generation of factor models with an emphasis on
high-dimensional settings and the concomitant statistical tools of machine learning. Our review
highlights a recent revival of (highly sophisticated) methodological research into factor modeling
in asset markets. The advances and insights that have come with this revival ensure that factor
models will continue to be central to empirical asset pricing in coming years.
Machine learning is neither an empirical panacea nor a substitute for economic theory and
the structure it lends to empirical work. In other words, finance domain knowledge remains an
indispensable component of statistical learning problems in asset markets. Indeed, our view is that
the most promising direction for future empirical asset pricing research is developing a genuine
fusion of economic theory and machine learning. It is a natural marriage, as asset pricing theory
revolves around price formation through the aggregation of investor beliefs, which undoubtedly
enter prices in subtle, complex, and sometimes surprising ways. At the same time, machine learning
constitutes a sophisticated collection of statistical models that flexibly adapt to settings with rich
and complex information sets.
Machine learning factor models are one such example of this fusion. Almost all leading theoretical
asset pricing models predict a low-dimensional factor structure in asset prices. Where these
models differ is in their predictions regarding the identity of the common factors. Much of the
frontier work in empirical asset pricing can be viewed as using the (widely agreed upon) factor
structure skeleton as a theory-based construct within which various machine learning schemes
are injected to conduct an open-minded investigation into the economic nature of the common
factors.
Our survey is inevitably selective and disproportionally influenced by our own research on
these topics. We have mainly focused on methodological contributions, leaving a detailed review of
empirical discoveries via these methodologies for future work. A frequently discussed dichotomy
in the literature is observable factor versus latent factor models. While some of the methods we
discuss apply to observable factor settings (or hybrid settings), we have also skewed our coverage
in favor of latent factor methods given the growing emphasis on them in the literature. In addition,
we have focused on statistical frameworks as opposed to theoretical economic underpinnings
or specifications implied by structural models (which mirrors the emphasis in the literature as a
whole).
This area of research is evolving quickly, and there is a myriad of opportunities for improvements
and new directions. The first CAPM-based return factor models were analyzed to test
specific predictions of theoretical models. In the time since, the research pendulum has swung
far in the opposite direction toward purely statistical model formulations with little connection
to theory. Perhaps the most important direction for future research is to reestablish the link
between asset pricing theory and empirical models of returns. Machine learning, through its ability
to cast a wide net for detecting the underlying determinants of return behavior, can be a critical
tool for this endeavor. To do so, it will need to focus more squarely on integrating the behavior
of returns with data on fundamental microeconomic and macroeconomic activity and cash flows
as well as emphasizing the economic interpretability of the associations it finds. Along these lines,
new theories in behavioral finance present opportunities to marry returns with more readily available
nonprice data such as survey responses and textual narratives. Machine learning methods can
be a key ingredient in deriving the empirical map between prices and the beliefs of economic
agents encoded in these nonstandard data sources. Another important research direction is to
take seriously structural change in financial markets and asset returns. How should our return
models accommodate structural evolution in the economy, regulatory and political regime shifts,
and financial technological progress? How can we capture the subtle return dynamics of more
gradual economic feedback mechanisms, for example, alpha decay emerging from learning and
competition effects in markets?

DISCLOSURE STATEMENT
The authors are not aware of any affiliations, memberships, funding, or financial holdings that
might be perceived as affecting the objectivity of this review.

LITERATURE CITED
Ahn DH, Conrad J, Dittmar RF. 2009. Basis assets. Rev. Financ. Stud. 22(12):5133–74
Aït-Sahalia Y, Jacod J, Xiu D. 2021. Inference on risk premia in continuous-time asset pricing models. Work. Pap.,
Univ. Chicago, Chicago, IL. https://dachxiu.chicagobooth.edu/download/RPContTime.pdf
Anatolyev S, Mikusheva A. 2022. Factor models with many assets: strong factors, weak factors, and the two-pass
procedure. J. Econom. 229(1):103–26
Ang A, Hodrick R, Xing Y, Zhang X. 2006. The cross-section of volatility and expected returns. J. Finance
61:259–99
Ang A, Liu J, Schwarz K. 2020. Using individual stocks or portfolios in tests of factor models. J. Financ. Quant.
Anal. 55:709–50
Bai J. 2003. Inferential theory for factor models of large dimensions. Econometrica 71(1):135–71
Bai J, Ng S. 2002. Determining the number of factors in approximate factor models. Econometrica 70:191–221
Bai J, Ng S. 2008. Forecasting economic time series using targeted predictors. J. Econom. 146(2):304–17
Bailey N, Kapetanios G, Pesaran MH. 2021. Measurement of factor strength: theory and practice. J. Appl.
Econom. 36(5):587–613
Bajgrowicz P, Scaillet O. 2012. Technical trading revisited: false discoveries, persistence tests, and transaction
costs. J. Financ. Econ. 106(3):473–91
Baldi P, Hornik K. 1989. Neural networks and principal component analysis: learning from examples without
local minima. Neural Netw. 2(1):53–58
Bansal R, Yaron A. 2004. Risks for the long run: a potential resolution of asset pricing puzzles. J. Finance
59(4):1481–509
Barillas F, Kan R, Robotti C, Shanken J. 2020. Model comparison with Sharpe ratios. J. Financ. Quant. Anal.
55(6):1840–74
Barillas F, Shanken J. 2017. Which alpha? Rev. Financ. Stud. 30(4):1316–38
Barillas F, Shanken J. 2018. Comparing asset pricing models. J. Finance 73(2):715–54
Barras L, Scaillet O, Wermers R. 2010. False discoveries in mutual fund performance: measuring luck in
estimated alphas. J. Finance 65(1):179–216
Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to
multiple testing. J. R. Stat. Soc. Ser. B Methodol. 57(1):289–300
Benjamini Y, Yekutieli D. 2001. The control of the false discovery rate in multiple testing under dependency.
Ann. Stat. 28(4):1165–88
Brandt MW, Santa-Clara P, Valkanov R. 2009. Parametric portfolio policies: exploiting characteristics in the
cross-section of equity returns. Rev. Financ. Stud. 22(9):3411–47
Bryzgalova S. 2015. Spurious factors in linear asset pricing models. Work. Pap., Stanford Univ., Stanford, CA
Bryzgalova S, Huang J, Julliard C. 2022. Bayesian solutions for the factor zoo: We just ran two quadrillion models.
Work. Pap., Lond. Sch. Econ. Political Sci., London, UK. https://personal.lse.ac.uk/julliard/papers/
BSftFT.pdf
Bryzgalova S, Pelger M, Zhu J. 2020. Forest through the trees: building cross-sections of asset returns. SSRN Work.
Pap. https://dx.doi.org/10.2139/ssrn.3493458
Büchner M, Kelly BT. 2022. A factor model for option returns. J. Financ. Econ. 143(3):1140–61
Campbell JY, Cochrane JH. 1999. By force of habit: a consumption-based explanation of aggregate stock
market behavior. J. Political Econ. 107(2):205–51
Chamberlain G, Rothschild M. 1983. Arbitrage, factor structure, and mean-variance analysis on large asset
markets. Econometrica 51:1281–304
Chen AY. 2021. The limits of p-hacking: some thought experiments. J. Finance 76(5):2447–80
Chen AY, Zimmermann T. 2022. Open source cross-sectional asset pricing. Crit. Finance Rev. 11(2):207–64
Chen L, Pelger M, Zhu J. 2019. Deep learning in asset pricing. Work. Pap., Stanford Univ., Stanford, CA
Chen NF, Roll R, Ross SA. 1986. Economic forces and the stock market. J. Bus. 59(3):383–403
Chernozhukov V, Chetverikov D, Demirer M, Duflo E, Hansen C, et al. 2018. Double/debiased machine
learning for treatment and structural parameters. Econom. J. 21(1):C1–68
Chib S, Zeng X, Zhao L. 2020. On comparing asset pricing models. J. Finance 75(1):551–77
Chinco A, Clark-Joseph AD, Ye M. 2019. Sparse signals in the cross-section of returns. J. Finance 74:449–92
Cong LW, Tang K, Wang J, Zhang Y. 2021. Alphaportfolio: direct construction through deep reinforcement learning
and interpretable AI. Work. Pap., Cornell Univ., Ithaca, NY
Connor G, Hagmann M, Linton O. 2012. Efficient semiparametric estimation of the Fama–French model and
extensions. Econometrica 80(2):713–54
Connor G, Korajczyk RA. 1986. Performance measurement with the arbitrage pricing theory: a new
framework for analysis. J. Financ. Econ. 15(3):373–94
Daniel K, Hirshleifer D, Sun L. 2020. Short- and long-horizon behavioral factors. Rev. Financ. Stud.
33(4):1673–736
DeMiguel V, Martin-Utrera A, Nogales FJ, Uppal R. 2020. A transaction-cost perspective on the multitude
of firm characteristics. Rev. Financ. Stud. 33(5):2180–222
Fama EF, French KR. 1993. Common risk factors in the returns on stocks and bonds. J. Financ. Econ. 33(1):3–56
Fama EF, French KR. 2008. Dissecting anomalies. J. Finance 63(4):1653–78
Fama EF, French KR. 2010. Luck versus skill in the cross-section of mutual fund returns. J. Finance 65(5):1915–47
Fama EF, French KR. 2015. A five-factor asset pricing model. J. Financ. Econ. 116(1):1–22
Fama EF, MacBeth JD. 1973. Risk, return, and equilibrium: empirical tests. J. Political Econ. 81(3):607–36
Fan J, Liao Y, Mincheva M. 2011. High-dimensional covariance matrix estimation in approximate factor
models. Ann. Stat. 39(6):3320–56
Fan J, Liao Y, Wang W. 2016. Projected principal component analysis in factor models. Ann. Stat. 44(1):219–54
Fan J, Liao Y, Yao J. 2015. Power enhancement in high-dimensional cross-sectional tests. Econometrica
83(4):1497–541
Feng G, Giglio S, Xiu D. 2020. Taming the factor zoo: a test of new factors. J. Finance 75(3):1327–70
Freyaldenhoven S. 2019. A generalized factor model with local factors. Work. Pap. 19-23, Fed. Reserve Bank Phila.,
Philadelphia, PA
Freyberger J, Neuhierl A, Weber M. 2020. Dissecting characteristics nonparametrically. Rev. Financ. Stud.
33(5):2326–77
Gagliardini P, Ossola E, Scaillet O. 2016. Time-varying risk premium in large cross-sectional equity datasets.
Econometrica 84(3):985–1046
Gagliardini P, Ossola E, Scaillet O. 2019. A diagnostic criterion for approximate factor structure. J. Econom.
212(2):503–21
Gibbons M, Ross SA, Shanken J. 1989. A test of the efficiency of a given portfolio. Econometrica 57(5):1121–52
Giglio S, Liao Y, Xiu D. 2021. Thousands of alpha tests. Rev. Financ. Stud. 34(7):3456–96
Giglio S, Xiu D. 2021. Asset pricing with omitted factors. J. Political Econ. 129(7):1947–90
Giglio S, Xiu D, Zhang D. 2021. Test assets and weak factors. NBER Work. Pap. 29002
Gospodinov N, Kan R, Robotti C. 2013. Chi-squared tests for evaluation and comparison of asset pricing
models. J. Econom. 173(1):108–25
Gospodinov N, Kan R, Robotti C. 2014. Misspecification-robust inference in linear asset-pricing models with
irrelevant risk factors. Rev. Financ. Stud. 27(7):2139–70
Gu S, Kelly B, Xiu D. 2020. Empirical asset pricing via machine learning. Rev. Financ. Stud. 33(5):2223–73
Gu S, Kelly BT, Xiu D. 2021. Autoencoder asset pricing models. J. Econom. 222:429–50
Hansen LP. 1982. Large sample properties of generalized method of moments estimators. Econometrica
50:1029–54
Hansen LP, Jagannathan R. 1997. Assessing specification errors in stochastic discount factor models. J. Finance
52:557–90
Harvey CR, Ferson WE. 1999. Conditioning variables and the cross-section of stock returns. J. Finance
54:1325–60
Harvey CR, Liu Y. 2020. False (and missed) discoveries in financial economics. J. Finance 75(5):2503–53
Harvey CR, Liu Y, Zhu H. 2016. ... and the cross-section of expected returns. Rev. Financ. Stud. 29(1):5–68
Harvey CR, Zhou G. 1990. Bayesian inference in asset pricing tests. J. Financ. Econ. 26(2):221–54
He Z, Kelly B, Manela A. 2017. Intermediary asset pricing: new evidence from many asset classes. J. Financ.
Econ. 126(1):1–35
He Z, Krishnamurthy A. 2013. Intermediary asset pricing. Am. Econ. Rev. 103(2):732–70
Hou K, Xue C, Zhang L. 2015. Digesting anomalies: an investment approach. Rev. Financ. Stud. 28(3):650–705
Huang D, Jiang F, Li K, Tong G, Zhou G. 2021. Scaled PCA: a new approach to dimension reduction. Manag.
Sci. 68(3):1678–95
Huberman G. 1982. A simple approach to arbitrage pricing theory. J. Econ. Theory 28(1):183–91
Huberman G, Kandel S, Stambaugh RF. 1987. Mimicking portfolios and exact arbitrage pricing. J. Finance
42(1):1–9
Ingersoll JE. 1984. Some results in the theory of arbitrage pricing. J. Finance 39(4):1021–39
Ioffe S, Szegedy C. 2015. Batch normalization: accelerating deep network training by reducing internal
covariate shift. In Proceedings of the 32nd International Conference on Machine Learning, ed. F Bach, D Blei,
pp. 448–56. New York: Assoc. Comput. Mach.
Jegadeesh N, Noh J, Pukthuanthong K, Roll R, Wang J. 2019. Empirical tests of asset pricing models
with individual assets: resolving the errors-in-variable bias in risk premium estimation. J. Financ. Econ.
133(2):273–98
Jensen TI, Kelly B, Pedersen LH. 2021. Is there a replication crisis in finance? J. Finance. In press
Jiang J, Kelly B, Xiu D. 2021. (Re-)Imag(in)ing price trends. SSRN Work. Pap. https://dx.doi.org/10.2139/
ssrn.3756587
Kan R, Robotti C. 2009. Model comparison using the Hansen-Jagannathan distance. Rev. Financ. Stud.
22(9):3449–90
Kan R, Robotti C, Shanken J. 2013. Pricing model performance and the two-pass cross-sectional regression
methodology. J. Finance 68(6):2617–49
Kan R, Zhang C. 1999. Two-pass tests of asset pricing models with useless factors. J. Finance 54(1):203–35
Kass RE, Raftery AE. 1995. Bayes factors. J. Am. Stat. Assoc. 90(430):773–95
Ke T, Kelly B, Xiu D. 2019. Predicting returns with text data. Work. Pap. 2019-69, Univ. Chicago, Chicago, IL.
https://bfi.uchicago.edu/wp-content/uploads/BFI_WP_201969.pdf
Kelly B, Moskowitz T, Pruitt S. 2021. Understanding momentum and reversal. J. Financ. Econ. 140(3):726–43
Kelly B, Palhares D, Pruitt S. 2021. Modeling corporate bond returns. J. Finance. In press
Kelly B, Pruitt S. 2013. Market expectations in the cross-section of present values. J. Finance 68(5):1721–56
Kelly B, Pruitt S, Su Y. 2019. Characteristics are covariances: a unified model of risk and return. J. Financ. Econ.
134(3):501–24
Kelly B, Pruitt S, Su Y. 2020. Instrumented principal component analysis. SSRN Work. Pap. https://dx.doi.org/
10.2139/ssrn.2983919
Kim S, Korajczyk RA, Neuhierl A. 2021. Arbitrage portfolios. Rev. Financ. Stud. 34(6):2813–56
Kingma D, Ba J. 2014. Adam: a method for stochastic optimization. arXiv:1412.6980 [cs.LG]
Kleibergen F. 2009. Tests of risk premia in linear factor models. J. Econom. 149(2):149–73
Koijen R, Nieuwerburgh SV. 2011. Predictability of returns and cash flows. Annu. Rev. Financ. Econ. 3:467–91
Korsaye SA, Quaini A, Trojani F. 2019. Smart SDFs. Work. Pap., Univ. Geneva, Geneva, Switz.
Kosowski R, Timmermann A, Wermers R, White H. 2006. Can mutual fund “stars” really pick stocks? New
evidence from a bootstrap analysis. J. Finance 61(6):2551–95
Kozak S, Nagel S, Santosh S. 2018. Interpreting factor models. J. Finance 73(3):1183–223
Kozak S, Nagel S, Santosh S. 2020. Shrinking the cross section. J. Financ. Econ. 135(2):271–92
Lamont OA. 2001. Economic tracking portfolios. J. Econom. 105(1):161–84
Lettau M, Pelger M. 2020a. Estimating latent asset-pricing factors. J. Econom. 218(1):1–31
Lettau M, Pelger M. 2020b. Factors that fit the time series and cross-section of stock returns. Rev. Financ. Stud.
33(5):2274–325
Lewellen J. 2015. The cross-section of expected stock returns. Crit. Finance Rev. 4(1):1–44
Lewellen J, Nagel S, Shanken J. 2010. A skeptical appraisal of asset pricing tests. J. Financ. Econ. 96(2):175–94
Lo AW, MacKinlay AC. 1990. Data-snooping biases in tests of financial asset pricing models. Rev. Financ. Stud.
3(3):431–67
Merton RC. 1973. An intertemporal capital asset pricing model. Econometrica 41(5):867–87
Mitchell TJ, Beauchamp JJ. 1988. Bayesian variable selection in linear regression. J. Am. Stat. Assoc.
83(404):1023–32
Obaid K, Pukthuanthong K. 2022. A picture is worth a thousand words: measuring investor sentiment by
combining machine learning and photos from news. J. Financ. Econ. 144(1):273–97
Onatski A. 2009. Testing hypotheses about the number of factors in large factor models. Econometrica
77(5):1447–79
Onatski A. 2012. Asymptotics of the principal components estimator of large factor models with weakly
influential factors. J. Econom. 168(2):244–58
Pesaran MH, Smith R. 2019. The role of factor strength and pricing errors for estimation and inference in asset pricing
models. SSRN Work. Pap. http://dx.doi.org/10.2139/ssrn.3480925
Pesaran MH, Yamagata T. 2017. Testing for alpha in linear factor pricing models with a large number of securities.
SSRN Work. Pap. http://dx.doi.org/10.2139/ssrn.2973079
Pukthuanthong K, Roll R, Subrahmanyam A. 2019. A protocol for factor identification. Rev. Financ. Stud.
32(4):1573–607
Rapach D, Strauss JK, Zhou G. 2013. International stock return predictability: What is the role of the United
States? J. Finance 68(4):1633–62
Rapach D, Zhou G. 2013. Forecasting stock returns. In Handbook of Economic Forecasting, Vol. 2, ed. G Elliott,
A Timmermann, pp. 328–83. Amsterdam: Elsevier
Raponi V, Robotti C, Zaffaroni P. 2020. Testing beta-pricing models using large cross-sections. Rev. Financ.
Stud. 33:2796–842
Rosenberg B. 1974. Extra-market components of covariance in security returns. J. Financ. Quant. Anal.
9(2):263–74
Ross SA. 1976. The arbitrage theory of capital asset pricing. J. Econ. Theory 13(3):341–60
Santos T, Veronesi P. 2004. Conditional betas. NBER Work. Pap. 10413
Shanken J. 1992. On the estimation of beta pricing models. Rev. Financ. Stud. 5(1):1–33
Sharpe WF. 1964. Capital asset prices: a theory of market equilibrium under conditions of risk. J. Finance
19(3):425–42
Stambaugh RF, Yuan Y. 2017. Mispricing factors. Rev. Financ. Stud. 30(4):1270–315
Sullivan R, Timmermann A, White H. 1999. Data-snooping, technical trading rule performance, and the
bootstrap. J. Finance 54(5):1647–91
Welch I, Goyal A. 2007. A comprehensive look at the empirical performance of equity premium prediction.
Rev. Financ. Stud. 21(4):1455–508
White H. 2000. A reality check for data snooping. Econometrica 68(5):1097–126
Zaffaroni P. 2019. Factor models for asset pricing. Work. Pap., Imp. Coll. Lond., London, UK

