
Journal of Financial Economics 150 (2023) 103737

Contents lists available at ScienceDirect

Journal of Financial Economics


journal homepage: www.elsevier.com/locate/jfec

Machine learning and fund characteristics help to select mutual funds with positive alpha ✩
Victor DeMiguel a,∗, Javier Gil-Bazo b, Francisco J. Nogales c, André A.P. Santos d
a Management Science and Operations, London Business School, United Kingdom
b Department of Economics and Business, Universitat Pompeu Fabra, Barcelona School of Economics, and UPF Barcelona School of Management, Spain
c Department of Statistics, Universidad Carlos III de Madrid, Spain
d CUNEF Universidad, Spain

ARTICLE INFO

Dataset link: https://doi.org/10.17632/rpgb99m5zy.3

JEL classification:
G11
G17
G23

Keywords:
Active asset management
Mutual-fund performance
Mutual-fund misallocation
Machine learning
Tradable strategies
Nonlinearities and interactions

ABSTRACT

Machine-learning methods exploit fund characteristics to select tradable long-only portfolios of mutual funds that earn significant out-of-sample annual alphas of 2.4% net of all costs. The methods unveil interactions in the relation between fund characteristics and future performance. For instance, past performance is a particularly strong predictor of future performance for more active funds. Machine learning identifies managers whose skill is not sufficiently offset by diseconomies of scale, consistent with informational frictions preventing investors from identifying the outperforming funds. Our findings demonstrate that investors can benefit from active management, but only if they have access to sophisticated prediction methods.

✩ Nikolai Roussanov was the editor for this article. A previous version of this manuscript was circulated under the title “Can Machine Learning Help to Select Portfolios of Mutual Funds?” We are grateful for detailed and constructive feedback from an anonymous referee and the editor. We are also grateful for comments from Eddie Anderson, Fahiz Baba-Yara, Paul Borochin, Andrea Buraschi, Joao Cocco, Pasquale Della Corte, Francisco Gomes, Jin Guo, Martin Haugh, Juan Imbet, Marcin Kacperczyk, Howard Kung, Narayan Naik, Jean Pauphilet, Anna Pavlova, Markus Pelger, Zhen Qi, Alexandre Rubesam, Stephen Schaefer, Henri Servaes, Raman Uppal, Michael Young, and Paolo Zaffaroni, as well as seminar participants at CUNEF, Imperial College Business School, London Business School, Universidad Autónoma de Madrid, Universidad de Zaragoza, University of Bath, University of Bristol and Université Paris Dauphine and conference participants at the 2021 conference of the French Finance Association, 2021 Finance Forum (Spanish Finance Association), 2021 EFMA annual meeting, 2021 conference of the Brazilian Finance Society, 2021 FMA annual meeting, 2021 INFORMS Annual Meeting, 2021 Paris December Finance Meeting, 2022 Funcas (Jornadas de Análisis Financiero y Big Data), 2022 Global Finance Conference, 2022 UF Research Conference on Machine Learning in Finance, 2023 Oxford-Man Institute Machine Learning in Quantitative Finance Conference, and 2023 SoFiE conference. Javier Gil-Bazo acknowledges financial support from the Spanish Government, Ministry of Science and Innovation grant PID2020-118541GB-I00, and Spanish Agencia Estatal de Investigación (AEI), through the Severo Ochoa Programme for Centres of Excellence in R&D (Barcelona School of Economics CEX2019-000915-S). Francisco J. Nogales acknowledges the financial support from the Spanish Government through project PID2020-116694GB-I00. André A. P. Santos acknowledges the financial support from the Comunidad de Madrid Government through project 2022-T1/SOC-24167.
∗ Corresponding author.

https://doi.org/10.1016/j.jfineco.2023.103737
Received 19 September 2022; Received in revised form 12 October 2023; Accepted 12 October 2023
Available online 26 October 2023
0304-405X/© 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Introduction

Mutual-fund research consistently shows that the average active fund earns negative risk-adjusted returns (alpha) after transaction costs, fees, and other expenses (Sharpe, 1966; Jensen, 1968; Gruber, 1996; Ferreira et al., 2013). Moreover, although several studies document the existence of a subset of managers that outperform their benchmarks (Wermers, 2000; Barras et al., 2010; Fama and French, 2010; Kacperczyk et al., 2014; Berk and Van Binsbergen, 2015), it is notoriously difficult to identify the outperforming funds ex ante. We show that machine-learning methods that exploit nonlinearities and interactions in the relation between fund characteristics and performance can help to construct tradable long-only portfolios of mutual funds that earn significant out-of-sample alphas net of all costs. Our results imply that investors can earn economically significant alpha by investing in active mutual funds, but only if they have access to sophisticated prediction methods that capture the complexity in the relation between fund characteristics and performance.

To understand the economic mechanism behind our results, we study whether the performance of our portfolios can be explained by capital misallocation in the mutual-fund market (Roussanov et al., 2021), and indeed find that nonlinear machine-learning methods select funds that are “too small” relative to their managers’ skill. Thus, machine learning helps to select outperforming funds not only because it can identify skilled managers, but also because it can identify managers whose skill is not sufficiently offset by diseconomies of scale. This is consistent with informational frictions preventing investors from identifying some of the funds whose managers have the highest skill, so that these funds remain small relative to their managers’ skill. Our work implies that there is scope for pension-plan administrators and financial advisors to integrate machine learning with other tools in order to help investors select active mutual funds with positive alpha.

Passive funds have recently surpassed active funds in terms of assets under management in U.S. domestic equity mutual funds. Many interpret this victory of passive management as a result of the persistent inability of the average active manager to outperform cheaper passive alternatives (Gittelsohn, 2019). To determine whether at least some active managers outperform, researchers have investigated whether future fund performance can be predicted using past returns. The consensus that emerges from this literature is that positive net alpha does not persist, particularly after accounting for the exposure of mutual-fund returns to the momentum factor (Carhart, 1997).1

1 A notable exception is the study of Bollen and Busse (2005), who find evidence of short-term (quarterly) persistence among top-performing funds.

Lack of persistence in fund net alpha is consistent with the model of Berk and Green (2004), in which investors supply capital with infinite elasticity to funds they expect to outperform, based on past returns. If there are diseconomies of scale in portfolio management, in equilibrium funds with positive past alpha attract more assets, and thus, earn the same expected net alpha as any other active fund: that of the alternative passive benchmark (zero). However, informational frictions may prevent investor flows from driving fund performance to zero (Dumitrescu and Gil-Bazo, 2018; Roussanov et al., 2021). Consequently, whether mutual-fund performance is predictable is ultimately an empirical question that has received considerable attention in the literature. Several studies have shown that mutual-fund characteristics can be used to predict fund performance; see Jones and Mo (2020) for a review. Typically, these studies rank funds every month or quarter on the basis of a mutual-fund characteristic. They then allocate funds to quintile or decile portfolios and evaluate the performance of long-short portfolios of funds. However, only a small subset of the mutual-fund characteristics considered in the literature can be used to select long-only portfolios of funds with positive alpha after transaction costs, fees, and other expenses. This is crucial because open-end funds cannot be easily shorted, and thus, investors can only benefit from active management via long-only portfolios of funds that deliver positive net alpha.

Our goal is to study whether investors can benefit from active management, and thus, we take on the challenge of identifying long-only portfolios of mutual funds with positive future alpha net of all costs. Our approach departs from the existing literature along three dimensions. First, we jointly exploit 17 mutual-fund characteristics to predict fund performance, which allows us to account for the complex nature of the problem. Fund performance is determined by a host of different factors including the manager’s multifaceted ability, portfolio constraints, manager incentives and agency problems, as well as fund trading costs, fees, and other expenses. Thus, it seems unlikely that using a single variable to predict performance would be as efficient as exploiting a large set of characteristics.

Second, we use three machine-learning methods to forecast fund performance: elastic net, gradient boosting, and random forests. These methods can accommodate irrelevant or highly correlated predictors, and thus, they allow us to consider multiple characteristics with lower risk of overfitting than Ordinary Least Squares (OLS). In addition, the two decision-tree based methods (gradient boosting and random forests) can exploit nonlinearities and interactions, and thus, they may uncover predictability that would be missed by linear methods such as elastic net or OLS. As a robustness test, Section IA.6 of the Internet Appendix also considers neural networks.

Third, we focus on identifying tradable portfolios of funds. In particular, we consider long-only portfolios of mutual funds, we construct the portfolios using exclusively past data, and we evaluate their future (out-of-sample) performance in terms of alpha net of fees, transaction costs, and other expenses. Finally, we employ a dynamic approach: the decision whether to exploit a fund characteristic is taken every time we rebalance the portfolio. By allowing for variation over time in the relation between characteristics and performance, our method can accommodate changes in the determinants of fund performance due to investor learning or shifts in market conditions.

We compare the out-of-sample and net-of-costs performance of the portfolios of funds constructed using the three machine-learning methods, OLS, and two naive strategies (equally weighted and asset-weighted portfolios of all funds). We use monthly data on the returns and 17 characteristics of no-load actively managed U.S. domestic equity mutual funds spanning the 1980 to 2020 period. We consider only no-load funds to ensure that our alphas are net of all costs. We use the first 10 years of data to train the three machine-learning methods and OLS to predict future annual net alpha, estimated using the five-factor model of Fama and French (2015) augmented with momentum. As predictors, we use lagged values of the 17 fund characteristics. We then form a long-only equally weighted portfolio of the funds in the top decile of predicted net alpha, and compute the net return of the portfolio in the following 12 months. For every remaining year, we expand the training sample forward by one year, construct a new top-decile portfolio, and track its net return for the next 12 months. This way, we construct a time series of monthly out-of-sample net returns of the top-decile portfolio spanning the period from 1990 to 2020. Finally, we evaluate the net alpha of the portfolio over the whole out-of-sample period with respect to four models: the Carhart (1997) four-factor model; the Fama and French (2015) five-factor model (FF5); FF5 augmented with momentum; and FF5 augmented with momentum and the liquidity factor of Pástor and Stambaugh (2003).

We highlight five findings. First, the two machine-learning methods that exploit nonlinearities and interactions (gradient boosting and random forests) select long-only portfolios of funds that earn statistically significant alphas net of all costs of 2.36% and 2.69% per year, respectively, relative to the FF5 model augmented with momentum. These alphas are also economically significant: for instance, they are more than double the average expense ratio in our sample (1.11%). In contrast, the portfolios based on the linear methods (elastic net and OLS) deliver annual net alphas of 1.09% and 1.21%, respectively, which are statistically indistinguishable from zero. The equally weighted and asset-weighted portfolios earn negative annual net alphas of −0.22% and −0.44%, respectively, consistent with existing evidence that the average active fund underperforms passive benchmarks after costs. Our findings are similar when we evaluate out-of-sample alpha using other factor models. In summary, while portfolios that exploit predictability in the data help investors to avoid underperforming funds, only the machine-learning methods that exploit nonlinearities and interactions—gradient
boosting and random forests—allow them to earn significantly positive net alpha by investing in active funds.

Second, machine learning unveils nonlinearities and interactions in the relation between fund characteristics and future performance. The most important characteristics for the nonlinear machine-learning methods include various measures of past performance and fund activeness. We find that the relation between fund activeness and future performance is highly nonlinear, with the relation being strongly positive for the most active funds, but flat for the rest of the funds. The nonlinear methods also unveil important interactions between past-performance and fund-activeness measures. In particular, we find that, although investors may generally achieve higher net alpha by holding funds with good past performance, past performance is a particularly strong predictor of future performance for more active funds.

Third, given the importance of the interactions between past performance and fund activeness for the nonlinear machine-learning portfolios, we explore whether it is possible to achieve positive net alpha by double sorting funds across one measure of past performance and one measure of fund activeness. We find that, although it is possible to achieve positive net alpha by double sorting mutual funds, the performance of such double-sorted portfolios is quite sensitive to the particular measures of past performance and fund activeness considered. Moreover, we find that the relative predictive ability of the measures of past performance and fund activeness varies substantially over time, and thus, to achieve superior out-of-sample performance, investors should use machine learning dynamically to identify the characteristics and interactions that are important at each point in time using only past data.

Fourth, we build on the work by Roussanov et al. (2021) to study whether capital misallocation in the mutual-fund market explains the performance of the nonlinear machine-learning portfolios. Roussanov et al. (2021) estimate managerial skill using a Bayesian approach and find that funds in the top decile of the skill distribution are “too small” for diseconomies of scale to offset the skill of their managers. We compute the average net skill and fund size of the decile portfolios of funds generated by the four prediction methods and, consistent with Roussanov et al. (2021), we find that the funds in the top decile are “too small” given the skill of their managers, with funds in the top decile of the two nonlinear machine-learning methods being particularly small. These findings provide an economic interpretation of our results: Machine learning helps to select mutual funds not only because it can identify skilled managers, but also because it can identify managers whose skill is not sufficiently offset by diseconomies of scale. This is consistent with a competition framework à la Berk and Green (2004) in which informational frictions prevent a substantial fraction of the investor population from identifying some of the funds whose managers have the highest skill, so that these funds remain small relative to their managers’ skill.

Fifth, Jones and Mo (2020) show that the ability of fund characteristics to predict performance has declined over time due to increased arbitrage activity and mutual-fund competition. Motivated by their work, we study how the alpha of the different portfolios varies from 1991 to 2020. We find that the three prediction-based portfolios (gradient boosting, random forests, and OLS) outperform the two naive portfolios (equally weighted and asset weighted) from 1991 to 2011. Consistent with Jones and Mo (2020), however, the performance of the prediction-based portfolios is similar to that of the naive portfolios from 2012 until 2018. Interestingly, all three prediction-based portfolios outperform the two naive portfolios in the last two years of our sample (2019 and 2020). We also find that the difference in the performance of the nonlinear machine-learning portfolios across different business-cycle and sentiment regimes is not statistically significant.

We check the robustness of our findings to various alternative methodological choices in the Internet Appendix. First, we show that our results are robust to considering the post-publication decay in predictability documented by McLean and Pontiff (2016). Second, our results continue to hold if we use other performance measures, such as alphas based on the factor models of Cremers et al. (2013), Hou et al. (2015), and Stambaugh and Yuan (2017). Third, the performance of the top-decile portfolio is just as good or even better if we exclude institutional share classes from our sample, which implies that our results are not driven by the presence of share classes targeted to sophisticated investors. Fourth, performance is only slightly weaker if we construct portfolios consisting of funds in the top 5% or 20% of the predicted alpha distribution. Fifth, if we extend the holding period to 24 months instead of 12 months, the performance of the top-decile portfolios selected by gradient boosting and random forests improves substantially. For instance, the annual net alpha for the random-forest portfolio is 4%. Sixth, we find that although neural networks can deliver portfolios with positive alphas, their alphas are systematically smaller and less significant than those obtained with gradient boosting and random forests. Seventh, the performance of the machine-learning portfolios is similar if we use a cross-validation method that accounts for time-series properties of the data. Eighth, the performance of the machine-learning methods does not decline if we invest in at most one share class per fund. Ninth, the performance of the machine-learning methods is similar if we use as a predictor the “value-added” characteristic proposed by Berk and Van Binsbergen (2015) estimated over a 36-month window instead of a 12-month window. Finally, the performance of the machine-learning methods is similar if we use alternative methods to impute missing observations of fund characteristics.

We emphasize two implications of our work for investment managers and regulators. First, the economically large positive net alphas that we document show that investors can benefit from active management in the mutual-fund industry, but only if they have access to the predictions of sophisticated nonlinear methods. Thus, our findings suggest that there is scope for managers of funds of funds, pension-plan administrators, financial advisors, and independent analysts to integrate machine learning with other tools in order to help investors select active mutual funds with positive alpha. This may help to improve the efficiency of capital allocation in the mutual-fund market. Second, we show that mutual-fund characteristics that do not require information on fund portfolio holdings are enough to predict positive alpha. This is particularly relevant given the recent debate on the SEC proposal to raise the asset threshold for mandatory portfolio disclosure (Form 13F) from US$ 100 million to US$ 3.5 billion (Aliaj, 2020). While information on portfolio holdings is potentially valuable to investors, it can also reveal portfolio strategies and reduce active managers’ incentives to identify mispriced assets, which can be detrimental for market efficiency (Aragon et al., 2013; Shi, 2017). Our results imply that even if no information on portfolio holdings had been available during our sample period, our methods would have identified funds with positive net alpha on average.

Our work is related to the literature that documents associations between a single mutual-fund characteristic and fund performance (Jones and Mo, 2020). A strong association between a fund characteristic and performance does not guarantee that long-only portfolios of funds based on that characteristic earn positive net alphas. For instance, higher expense ratios are negatively associated with net fund alphas (in our sample, funds in the bottom decile of the expense-ratio distribution outperform funds in the top decile by 1% per year relative to the FF5 model augmented with momentum), but a portfolio that invests only in the cheapest funds does not outperform passive benchmarks in net terms. Thus, expense ratios help investors to avoid expensive underperforming funds, but not to select outperforming funds with positive net alphas. In fact, only seven of the 27 studies identified by Jones and Mo (2020) report positive and statistically significant in-sample Carhart (1997) alphas after fees and transaction costs for long-only portfolios of mutual funds (Chan et al., 2002; Busse and Irvine, 2006; Mamaysky et al., 2008; Cremers and Petajisto, 2009; Elton et al., 2011; Amihud and Goyenko, 2013; Gupta-Mukherjee, 2014). We contribute to this literature by showing that it is possible to select long-only portfolios of
mutual funds with significant positive net alpha by exploiting multiple characteristics and using machine learning.

Our paper is related to an emerging literature that uses machine learning to predict fund performance. Wu et al. (2021) predict future hedge-fund returns by exploiting characteristics constructed from fund historical returns. Instead, we predict future mutual-fund alphas by exploiting both fund historical returns and other fund characteristics. Like us, Li and Rossi (2020) use machine learning to select portfolios of mutual funds, but a fundamental difference between the two papers is that they use disjoint sets of predictors: while Li and Rossi (2020) exploit data on fund holdings and stock characteristics, we exploit data on fund characteristics. Our findings complement theirs by showing that investors can select portfolios of mutual funds with positive net alpha by exploiting solely the information contained in fund characteristics. Kaniel et al. (2023) use neural networks to predict mutual-fund alpha using a comprehensive set of predictors that includes stock characteristics, fund characteristics, and macroeconomic variables. They not only corroborate our finding that fund characteristics predict performance, but also show that when fund characteristics are included as predictors, stock characteristics no longer help to predict alpha. A key distinguishing feature of our work is the focus on tradable portfolios of mutual funds, which allows us to study whether investors can actually benefit from active management. In particular, we identify long-only portfolios of mutual funds using exclusively past data, and evaluate their future (out-of-sample) performance net of all costs (including loads). Kaniel et al. (2023) focus on long-short portfolios of mutual funds, forecast performance using three-fold cross validation over the entire sample, and do not account for fund loads. Moreover, most of the predictability in after-fee alpha documented by Kaniel et al. (2023, Figure 6b) comes from the short leg of their long-short portfolios of funds.

Our paper is also related to studies that use Bayesian methods to construct optimal portfolios of mutual funds (Baks et al., 2001; Pástor and Stambaugh, 2002; Jones and Shanken, 2005; Avramov and Wermers, 2006; Banegas et al., 2013). Unlike these papers, we do not study how investors should allocate their wealth across funds given their preferences and priors about managerial skill and predictability. Instead, our goal is to identify active funds with positive alpha that investors can combine with passive funds to achieve better risk-return tradeoffs.

Finally, our paper is related to the growing literature that employs machine learning to address empirical problems in finance such as predicting global equity-market returns (Rapach et al., 2013); predicting consumer credit-card defaults (Butaru et al., 2016); measuring equity-risk premia (Gu et al., 2020; Chen et al., 2020a); detecting predictability in bond risk premia (Bianchi et al., 2021); building test assets that capture nonlinearities and interactions in asset pricing (Feng et al., 2020; Bryzgalova et al., 2019); forecasting inflation (Garcia et al., 2017; Medeiros et al., 2021); and studying the relation between investor characteristics and portfolio allocations (Rossi and Utkus, 2020). In the context of mutual funds, Pattarin et al. (2004), Moreno et al. (2006), and Mehta et al. (2020) employ machine learning to classify mutual funds by investment category, but they do not study fund performance. Chiang et al. (1996) and Indro et al. (1999) use neural networks to predict mutual-fund net asset value and return, respectively. While these authors focus on forecasting accuracy, our goal is to identify funds with superior performance.

2. Data

In this section, we describe the data we use in our analysis. Section 2.1 describes the sample data. Section 2.2 defines the 17 monthly mutual-fund characteristics that we consider. Section 2.3 explains how we transform these monthly characteristics to generate the annual target and predicting variables for the machine-learning methods.

2.1. CRSP sample data

We collect monthly information on U.S. domestic-equity mutual funds from the CRSP Survivor-Bias-Free US Mutual Fund database. To keep our analysis as close as possible to the actual selection problem faced by investors, we perform the analysis at the share-class level.2 Moreover, we restrict our analysis to share classes that charge no front-end or back-end loads, and thus rebalancing our portfolios of mutual funds does not incur any costs. Our sample includes both institutional and retail share classes and spans from January 1980 to December 2020.3

We apply a few filters that are common in the mutual-fund literature. First, we include only share classes of actively managed funds, therefore excluding ETFs and passive mutual funds.4 Second, we include only share classes of funds with more than 70% of their portfolios invested in equities. Third, to avoid previously documented biases in the CRSP database, we exclude observations of a share class before it reaches 36 months of age and before the first observation with at least US$ 5 million of Total Net Assets (TNA); see Elton et al. (2001) and Evans (2010). Our final sample contains 8,767 unique share classes, of which 7,921 correspond to diversified equity funds (representing 95% of aggregate TNA in the sample) and 846 to sector funds.

2 Section IA.8 of the Internet Appendix shows that our findings are robust to investing in at most one share class per fund.
3 Section IA.3 of the Internet Appendix shows that our results are robust to considering only retail classes, and it also studies how the differences between retail and institutional classes affect the different prediction methods.
4 We use the index-fund identifier from CRSP, index_fund_flag, to identify funds that aim to replicate an index. When the identifier is missing, we use the fund name to infer whether it is passively managed.

2.2. Mutual-fund characteristics

We construct a dataset of 17 share-class characteristics using readily available information on fund characteristics and historical returns. None of our characteristics requires information about portfolio holdings, and thus, our set of predictors is disjoint from that used by Li and Rossi (2020).

For the $i$th share class in the $m$th month, we obtain data on its return in excess of the risk-free rate net of expenses and transaction costs ($r_{i,m}$), total net assets ($TNA_{i,m}$), expense ratio ($ER_{i,m}$), and portfolio turnover ratio.5 In addition, we compute the class age as the number of months since its inception; we estimate the monthly flows as the relative growth in the class TNA adjusted for returns net of expenses

$$
flow_{i,m} = \frac{TNA_{i,m} - TNA_{i,m-1}\,\left(1 + r_{i,m}\right)}{TNA_{i,m-1}}; \tag{1}
$$

we estimate the volatility of flows as the standard deviation of flows in the calendar year; and we compute the manager tenure in years.6 All of these characteristics have been identified as predictors of mutual-fund performance (Chen et al., 2004; Rakowski, 2010; Jones and Mo, 2020).

Moreover, we obtain several characteristics associated with the time-series regression of share-class returns on the five Fama and French (2015) and momentum factors (hereafter, FF5+MOM). In particular, for each share class and month in our sample, we run a “rolling-window” regression of the share-class returns on the FF5+MOM factor returns for the previous 36 months.7 We then compute alpha $t$-stat (the inter-

5 We proxy for the risk-free rate using the one-month T-bill rate downloaded from Ken French’s website.
6 We cross-sectionally winsorize flows at the 1st and 99th percentiles; that is, each month we replace extreme observations that are below the 1st percentile or above the 99th percentile with the value of those percentiles. The computation of the standard deviation of flows is based on winsorized flows. For each calendar year, we require at least ten monthly flow observations to compute volatility of flows.
7 To run each regression, we require at least 30 months of non-missing returns in the 36-month window.

4
V. DeMiguel, J. Gil-Bazo, F.J. Nogales, and A.A.P. Santos Journal of Financial Economics 150 (2023) 103737

Table 1
Share-class characteristics: Definitions. This table lists the 17 monthly mutual-fund
share-class characteristics that we consider. The first column gives the name of each
characteristic and the second column provides its definition.

Variable Definition

realized alpha Monthly realized alpha calculated using Equation (2)


flows Monthly flows calculated using Equation (1)
value added Monthly dollar value extracted by the fund’s manager from
asset market calculated using Equation (3)
volatility of flows Standard deviation of monthly flows in calendar year
total net assets (TNA) Total assets minus total liabilities at end of month
expense ratio Annual expenses as percentage of assets under management
age (months) Number of months since share-class’s inception date
manager tenure (years) Number of years since beginning of manager’s mandate
turnover ratio Minimum of annual aggregate sales and annual aggregate
purchases divided by total net assets
alpha 𝑡-stat Alpha 𝑡-stat from rolling-window regression on
FF5+MOM factors for previous 36 months
market beta 𝑡-stat Market beta 𝑡-stat from rolling-window regression on
FF5+MOM factors for previous 36 months
profitability beta 𝑡-stat Profitability beta 𝑡-stat from rolling-window regression on
FF5+MOM factors for previous 36 months
investment beta 𝑡-stat Investment beta 𝑡-stat from rolling-window regression on
FF5+MOM factors for previous 36 months
size beta 𝑡-stat Size beta 𝑡-stat from rolling-window regression on
FF5+MOM factors for previous 36 months
value beta 𝑡-stat Value beta 𝑡-stat from rolling-window regression on
FF5+MOM factors for previous 36 months
momentum beta 𝑡-stat Momentum beta 𝑡-stat from rolling-window regression on
FF5+MOM factors for previous 36 months
𝑅2 R-squared from rolling-window regression on
FF5+MOM factors for previous 36 months

cept scaled by its standard error) and beta 𝑡-stats. We use 𝑡-stats instead Table 2
of raw alphas and betas as predictors to account for estimation error Share-class characteristics: Descriptive statistics. This table reports monthly
(Hunter et al., 2014). In addition, we use the 𝑅2 from the FF5+MOM descriptive statistics (mean, median, standard deviation, and number of class-
rolling-window regression as a predictor of fund performance, as pro- month observations) for the mutual-fund share-class characteristics we con-
sider. All variables are measured at the share-class level and correspond to U.S.
posed by Amihud and Goyenko (2013), who explain that 𝑅2 is a mea-
domestic equity funds in the 1980 to 2020 period.
sure of fund activeness because low-𝑅2 funds track the benchmark less
closely.8 We also compute the monthly realized alpha for the 𝑖th share Mean Median Standard Class-month
class in the 𝑚th month (𝛼𝑖,𝑚 ) as: deviation observations

monthly return 0.86% 1.25% 5.23% 718,928


𝛼𝑖,𝑚 = 𝑟𝑖,𝑚 − 𝛽̂𝑀𝐾𝑇 ,𝑖,𝑚 𝑀𝐾𝑇𝑚 − 𝛽̂𝑆𝑀𝐵,𝑖,𝑚 𝑆𝑀𝐵𝑚 − 𝛽̂𝐻𝑀𝐿,𝑖,𝑚 𝐻𝑀𝐿𝑚 monthly realized alpha -0.14% -0.13% 2.22% 676,147
alpha 𝑡-stat -0.431 -0.430 1.209 676,475
− 𝛽̂𝑅𝑀𝑊 ,𝑖,𝑚 𝑅𝑀𝑊𝑚 − 𝛽̂𝐶𝑀𝑊 ,𝑖,𝑚 𝐶𝑀𝑊𝑚 − 𝛽̂𝑀𝑂𝑀,𝑖,𝑚 𝑀𝑂𝑀𝑚 , TNA (USD mill.) 679.9 97.4 2,593 719,398
(2) expense ratio 1.11% 1.04% 0.52% 712,564
age (months) 145.7 117.0 109.8 719,398
where 𝑀𝐾𝑇𝑚 , 𝑆𝑀𝐵𝑚 , 𝐻𝑀𝐿𝑚 , 𝑅𝑀𝑊𝑚 , 𝐶𝑀𝑊𝑚 , and 𝑀𝑂𝑀𝑚 are the flows 0.002 -0.004 0.094 718,734
returns in month 𝑚 of the five Fama-French and momentum factors, manager tenure (years) 8.219 7.005 5.352 656,418
and 𝛽̂𝑀𝐾𝑇 ,𝑖,𝑚 , 𝛽̂𝑆𝑀𝐵,𝑖,𝑚 , 𝛽̂𝐻𝑀𝐿,𝑖,𝑚 , 𝛽̂𝑅𝑀𝑊 ,𝑖,𝑚 , 𝛽̂𝐶𝑀𝑊 ,𝑖,𝑚 , 𝛽̂𝑀𝑂𝑀,𝑖,𝑚 are the
turnover ratio 0.790 0.550 1.141 711,568
volatility of flows 0.173 0.091 0.240 704,945
factor loadings of the 𝑖th share class excess return with respect to the value added -0.295 -0.016 37.233 669,727
FF5+MOM factors estimated using the 36-month estimation window market beta 𝑡-stat 16.667 15.064 10.591 676,475
ending in month 𝑚 − 1. profitability beta 𝑡-stat -0.125 -0.125 1.463 676,475
Finally, we use the realized alpha defined in Equation (2) to compute investment beta 𝑡-stat -0.444 -0.495 1.544 676,475
size beta 𝑡-stat 1.460 0.617 3.801 676,475
the value added for each class and month, which we define as in Berk value beta 𝑡-stat 0.022 -0.081 2.195 676,475
and Van Binsbergen (2015): momentum beta 𝑡-stat 0.009 0.026 1.878 676,475
𝑅2 0.907 0.944 0.122 676,475
𝑣𝑎𝑙𝑢𝑒 𝑎𝑑𝑑𝑒𝑑𝑖,𝑚 = (𝛼𝑖,𝑚 + 𝐸𝑅𝑖,𝑚 ∕12) × 𝑇 𝑁𝐴𝑖,𝑚−1 . (3)
This variable captures the dollar value extracted by the fund’s manager of class-month observations for each characteristic. Consistent with the
from the asset market.9 mutual-fund literature, we observe that the average share class in our
Table 1 lists the 17 share-class characteristics and their definitions, sample has negative alpha and loads positively on the market factor.
and Table 2 reports the mean, median, standard deviation, and number The average 𝑅2 is 90.7%, which suggests that the FF5+MOM factors
explain most of the time-series variation in equity mutual-fund returns.
8
The total number of class-month observations varies across variables
Another popular measure of fund activeness is the active share of Cremers
from 656,418 to 719,398.
and Petajisto (2009). We do not use this measure because we rely only on fund
characteristics that do not require information on mutual-fund holdings.
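The rolling-window computation behind Equation (2) and the value-added definition in Equation (3) can be illustrated with a short sketch. It is written in Python with NumPy and is not the authors' code: the helper names `realized_alpha` and `value_added`, the array layout (one share class, months in rows, factor columns ordered MKT, SMB, HML, RMW, CMA, MOM), the expense ratio expressed as a decimal annual rate, and the least-squares fit with an intercept are all illustrative assumptions. The 36-month window ending in month m-1 and the 30-observation minimum follow the text and footnote 7.

```python
import numpy as np

# Illustrative sketch, not the authors' code. Hypothetical layout: one share class,
# months in rows; factor columns ordered MKT, SMB, HML, RMW, CMA, MOM.

def realized_alpha(excess_ret, factors, window=36, min_obs=30):
    """Monthly realized alpha in the spirit of Equation (2).

    For month m, factor loadings are estimated by OLS (with an intercept) on the
    `window` months ending in m-1; the realized alpha is the month-m excess return
    minus the fitted factor exposure (the intercept is NOT subtracted).
    """
    excess_ret = np.asarray(excess_ret, dtype=float)
    factors = np.asarray(factors, dtype=float)
    n_months = len(excess_ret)
    alpha = np.full(n_months, np.nan)
    for m in range(window, n_months):
        if np.isnan(excess_ret[m]):
            continue
        y = excess_ret[m - window:m]          # months m-window .. m-1
        X = factors[m - window:m]
        ok = ~np.isnan(y)
        if ok.sum() < min_obs:                # e.g. at least 30 of 36 months
            continue
        design = np.column_stack([np.ones(ok.sum()), X[ok]])
        coef, *_ = np.linalg.lstsq(design, y[ok], rcond=None)
        betas = coef[1:]                      # drop the estimated intercept
        alpha[m] = excess_ret[m] - factors[m] @ betas
    return alpha


def value_added(alpha, expense_ratio, tna):
    """Equation (3): (alpha_m + ER_m / 12) * TNA_{m-1}; the first month is undefined."""
    alpha = np.asarray(alpha, dtype=float)
    er = np.asarray(expense_ratio, dtype=float)
    tna = np.asarray(tna, dtype=float)
    va = np.full(len(alpha), np.nan)
    va[1:] = (alpha[1:] + er[1:] / 12.0) * tna[:-1]
    return va
```

With synthetic returns built as a constant 0.1% per month plus exact factor exposures, `realized_alpha` recovers that constant for every month after the first 36, because the betas are estimated only from earlier months and the intercept is left in the residual.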
9 Berk and Van Binsbergen (2015) estimate before-fee alpha by regressing fund gross returns on the gross returns of passive mutual funds tracking different indexes. In unreported analysis, we follow their approach and obtain similar results to those based on the FF5+MOM model.

2.3. Target and predicting variables

We now explain how we transform the 17 mutual-fund characteristics to generate the target and predicting variables for machine learning.
First, we convert our sample from monthly to annual frequency because some of the characteristics are available only at the quarterly or annual frequency, and even some of the characteristics available at the monthly frequency are very persistent. For each calendar year, we compute annual realized alpha, value added, and flows as the average of their monthly values multiplied by twelve.10 Flow volatility is already defined for each calendar year, and we multiply it by the square root of 12 to annualize it. For all other characteristics, we use their values in December of each year.

Second, like Green et al. (2017), we standardize each characteristic so that it has a cross-sectional mean of zero and a standard deviation of one. This ensures that the estimation process of the machine-learning methods is scale invariant. We set missing observations of each standardized characteristic equal to its cross-sectional mean (zero). Section IA.10 of the Internet Appendix shows that our findings are robust to using an alternative imputation method for missing observations that exploits cross-sectional and time-series dependence in the data.

Third, we build our final dataset consisting of the target variable and the characteristics that we use as predictors when training the prediction methods. Our target variable is the share-class realized alpha in the calendar year. This choice is consistent with our goal to exploit share-class characteristics to generate positive alpha. In contrast, Li and Rossi (2020) use fund excess returns as their target variable, which allows them to study whether the returns of mutual funds can be predicted from the characteristics of the stocks they hold. The 17 characteristics we use as predictors are the following one-year-lagged standardized variables: realized alpha, alpha t-stat, TNA, expense ratio, age, flows, volatility of flows, manager tenure, turnover ratio, value added, R², and the t-stats of the market, profitability, investment, size, value, and momentum betas.11 Fig. 1 shows the correlation matrix of the target and predicting variables. The target variable has low correlation with the lagged predictors. However, some predictors exhibit substantial correlations, the highest absolute correlation being that between lagged flows and volatility of flows (61%).

3. Machine-learning methods

We use well-known software packages to implement the machine-learning methods; the interested reader can refer to their documentation for a detailed description of the methods.12 Gu et al. (2020) also provide an extensive description of various machine-learning methods in the context of asset pricing. In the remainder of this section, we briefly describe the methods we consider and the five-fold cross-validation procedure we use to tune their hyper parameters.

We organize our data in a panel structure, with years indexed as $t = 1, 2, \ldots, T$ and share classes as $i = 1, 2, \ldots, N_t$. As a benchmark, we use the ordinary least squares (OLS) method:

$$\min_{\theta} \sum_{t=1}^{T-1} \sum_{i=1}^{N_t} \left(\alpha_{i,t+1} - z_{i,t}'\theta\right)^2,$$

where $\alpha_{i,t+1}$ is the realized alpha of the $i$th share class in year $t+1$, $z_{i,t}$ is a $K$-dimensional vector of standardized characteristics for the $i$th share class in year $t$, and $\theta$ is the $K$-dimensional parameter vector. The OLS predictor of realized alpha, $z_{i,t}'\theta$, is a linear function of the share-class characteristics. Although OLS provides an unbiased and interpretable prediction, machine-learning methods often outperform OLS for data that exhibit high variance, nonlinearities, and interactions.

We consider three machine-learning methods: elastic net, random forests, and gradient boosting. Elastic net is a linear method, like OLS, but uses regularization to alleviate overfitting. To capture nonlinearities and interactions, we consider two types of ensembles of decision trees (random forests and gradient boosting), which often outperform linear methods on structured (tabular) data like our mutual-fund database; see, for instance, Medeiros et al. (2021).

Another popular machine-learning method is neural networks, which tend to perform well on non-structured data or highly nonlinear structured data. To capture these nonlinearities, neural networks employ a large number of parameters, and hence they require a large number of observations to deliver accurate estimates. Consequently, neural networks are not as well suited to our setting as ensembles of trees. Nonetheless, as a robustness check, we evaluate the performance of feed-forward neural networks with up to three hidden layers in Section IA.6 of the Internet Appendix.13

10 We require at least ten monthly observations in a calendar year to compute annual realized alpha, value added, and flows in that year. Section IA.9 of the Internet Appendix shows that using a 36-month window to estimate value added instead of a 12-month window does not help to improve the performance of the different portfolios.

11 The target variable and some predictors are not observable and must be estimated from the data. While this may pose a problem for inference, our goal is to predict future performance rather than to conduct inference.

12 Specifically, we use the glmnet, randomForest, xgboost, and h2o packages to implement elastic net, random forests, gradient boosting, and neural networks, respectively. The documentation for these four packages can be found in Friedman et al. (2010), Liaw and Wiener (2002), Chen et al. (2020b), and LeDell et al. (2020), respectively.

13 We have not considered other classes of machine-learning methods such as principal-component regression or partial least squares because they are typically outperformed by elastic net; see Elliott et al. (2013).

3.1. Elastic net

Regularization is often employed to alleviate overfitting in datasets with a large number of predicting variables. The elastic net of Zou and Hastie (2005) uses both 1-norm and 2-norm regularization terms to shrink the size of the estimated parameters. The objective function for the elastic net, with two regularization terms, is:

$$\min_{\theta} \sum_{t=1}^{T-1} \sum_{i=1}^{N_t} \left(\alpha_{i,t+1} - z_{i,t}'\theta\right)^2 + \lambda\rho\,\|\theta\|_1 + \lambda(1-\rho)\,\|\theta\|_2^2, \tag{4}$$

where $\|\theta\|_1 = \sum_{k=1}^{K} |\theta_k|$ and $\|\theta\|_2 = \left(\sum_{k=1}^{K} \theta_k^2\right)^{1/2}$ are the 1-norm and 2-norm of the parameter vector $\theta$, and $\lambda$ and $\rho$ are hyper parameters. The 1-norm term ($\lambda\rho\|\theta\|_1$) can be used to control the sparsity of the estimated parameter vector $\theta$, and the 2-norm term ($\lambda(1-\rho)\|\theta\|_2^2$) to increase its stability. For the case with $\rho = 0$, the objective function in (4) includes only the 2-norm term, and thus elastic net is equivalent to ridge regression, which provides a dense estimator of the parameter vector $\theta$. If, on the other hand, $\rho = 1$, the objective function includes only the 1-norm term, and a Least Absolute Shrinkage and Selection Operator (LASSO) regression is performed, which provides a sparse estimator. We explain in Section 3.4 how we calibrate the two hyper parameters $\rho$ and $\lambda$.

3.2. Random forests

Random forests are ensembles of decision trees formed by bootstrap aggregation (Breiman, 2001). Decision trees split a sample recursively into homogeneous and non-overlapping regions shaped like high-dimensional boxes. The procedure to generate these boxes is often represented as a tree, in which the sample is split at each node based on the characteristic that is most relevant at that particular node. The tree grows from the root node to the leaf nodes, and the prediction is the average value of the target variable for the observations in each leaf node.

Decision trees are highly interpretable, but their performance can be poor because of the high variance of their predictions. Random forests reduce the prediction variance by averaging across the predictions of numerous decision trees in a forest. The reduction in prediction variance
is inversely related to the correlation between trees, and thus, ideally, the trees should be uncorrelated. To accomplish this, random forests use the bootstrap to select the observations for each tree and consider a random subset of characteristics at each node.

Our random-forest method uses bootstrap with replacement to generate $B = 1{,}000$ samples from the original data. For each bootstrap sample, the method grows a decision tree by choosing a random subset of $m < K$ characteristics at each node and choosing the best out of these $m$ characteristics to split the sample. Section 3.4 discusses how we tune the hyper parameter $m$. The existing literature shows that random forests achieve good prediction performance, especially when there are many prediction variables and their relation to the target variable is nonlinear and contains interactions (Medeiros et al., 2021; Coulombe et al., 2020).

Fig. 1. Correlation matrix between the target variable and fund characteristics. This figure reports correlation coefficients between the target variable (annual realized alpha) and the 17 fund characteristics used as predictors. Predictors are lagged one year with respect to the target variable. (For interpretation of the colors in the figure(s), the reader is referred to the web version of this article.)

3.3. Gradient boosting

Gradient boosting also uses ensembles of decision trees, but instead of aggregating independent decision trees like random forests, gradient boosting aggregates decision trees sequentially to give more influence to those observations that are poorly predicted by previous trees. As a result, the gradient-boosting method combines weak decision trees (those with prediction performance only slightly better than random guessing) into a strong ensemble (better performance). In this fashion, boosting achieves improved predictions by reducing not only the prediction variance but also the prediction bias (Schapire and Freund, 2012).

At each iteration of gradient boosting, a new decision tree is fit to the residuals of the current ensemble of decision trees. Thus, this new decision tree gives more weight to those observations that are poorly predicted by the current ensemble. Then, gradient boosting updates the ensemble using the new decision tree. A key hyper parameter in gradient boosting is the learning rate, which determines the weight the ensemble gives to the most recent decision tree.

Unlike random forests, gradient boosting tends to overfit the data. To avoid overfitting, gradient boosting employs several regularization techniques that require tuning additional hyper parameters. For instance, gradient boosting often imposes constraints on the number of
decision trees aggregated, the depth and number of nodes of each tree, and the minimum number of observations in a leaf node.

3.4. Cross validation of hyper parameters

For each estimation window, we tune the hyper parameters of the elastic net, random forests, and gradient boosting using five-fold cross-validation; see Hastie et al. (2009, Chapter 7). Specifically, we select a grid of possible values for the hyper parameters and divide the sample into five equal intervals, or "folds." For $j$ from 1 to 5, we remove the $j$th fold, use the remaining four folds to fit the method for each value of the hyper parameters in the grid, and evaluate the prediction error (or cross-validation error) of each fitted model on the $j$th fold. After completing this process for each of the five folds, we select the value of the hyper parameters that minimizes the average cross-validation error.

An alternative to $k$-fold cross validation that accounts for the time-series properties of the data is time-series cross validation, which reserves a section at the end of the training sample for evaluation. Section IA.7 of the Internet Appendix reports the results of a robustness check in which we use time-series cross validation. We find that five-fold cross validation performs slightly better, consistent with Bergmeir et al. (2018) and Coulombe et al. (2020).

4. Performance of machine-learning portfolios

In this section, we first describe our performance-evaluation methodology and then compare the out-of-sample performance of the various portfolios.

Table 3
Out-of-sample alpha of fund portfolios. This table reports the monthly out-of-sample alphas (in %) net of all costs of the top-decile fund portfolios obtained with three machine-learning methods (gradient boosting, random forests, and elastic net), with Ordinary Least Squares (OLS), and with two naive strategies (equally weighted and asset-weighted portfolios of all available funds). Alphas are computed by regressing the out-of-sample excess monthly portfolio returns net of all costs against the Fama and French (1993) three-factor model augmented with momentum (FF3+MOM), the Fama and French (2015) five factors (FF5), and the FF5 model augmented with momentum (FF5+MOM) and with the liquidity risk factor of Pástor and Stambaugh (2003) (FF5+MOM+LIQ). The out-of-sample period spans from January 1991 to December 2020. We report standard errors with Newey-West adjustment for 12 lags in parentheses. One, two, and three asterisks indicate that the alpha is significant at the 10%, 5%, and 1% level, respectively.

                     FF3+MOM    FF5        FF5+MOM    FF5+MOM+LIQ
Gradient boosting    0.178**    0.222***   0.197**    0.198**
                     (0.077)    (0.085)    (0.080)    (0.081)
Random forest        0.210**    0.263***   0.224**    0.226**
                     (0.086)    (0.097)    (0.087)    (0.089)
Elastic net          0.044      0.075      0.091      0.098
                     (0.065)    (0.067)    (0.069)    (0.068)
OLS                  0.056      0.085      0.101      0.109*
                     (0.063)    (0.065)    (0.066)    (0.066)
Equally weighted     -0.018     -0.007     -0.018     -0.017
                     (0.045)    (0.045)    (0.044)    (0.045)
Asset weighted       -0.043     -0.033     -0.037     -0.036
                     (0.036)    (0.035)    (0.035)    (0.036)
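The alphas and standard errors in Table 3 come from time-series regressions of monthly portfolio excess returns on factor returns, with a Newey-West adjustment for 12 lags. A minimal NumPy sketch of that calculation follows. The function name `alpha_newey_west` is hypothetical, the sketch assumes the Bartlett-kernel weights of Newey and West with no small-sample correction, and the authors' software may differ in such details; a library routine such as the HAC covariance option of statsmodels' OLS could be used instead.

```python
import numpy as np

# Illustrative sketch of an alpha regression with Newey-West (Bartlett kernel)
# standard errors; no small-sample correction is applied.

def alpha_newey_west(port_excess, factors, lags=12):
    """Return (alpha, standard error) from regressing portfolio excess returns on
    a constant plus the factor returns, with a HAC covariance using `lags` lags."""
    y = np.asarray(port_excess, dtype=float)
    F = np.asarray(factors, dtype=float)
    if F.ndim == 1:
        F = F[:, None]
    T = len(y)
    X = np.column_stack([np.ones(T), F])      # intercept first
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ coef                          # regression residuals
    scores = X * u[:, None]                   # x_t * u_t, shape (T, k)
    S = scores.T @ scores                     # lag-0 term
    for lag in range(1, lags + 1):
        w = 1.0 - lag / (lags + 1.0)          # Bartlett weight
        gamma = scores[lag:].T @ scores[:-lag]
        S += w * (gamma + gamma.T)
    bread = np.linalg.inv(X.T @ X)
    cov = bread @ S @ bread                   # sandwich covariance of coefficients
    return coef[0], np.sqrt(cov[0, 0])
```

For the 360-month out-of-sample series, `port_excess` would hold the portfolio's net excess returns and `factors` the contemporaneous factor returns (six columns for FF5+MOM). The returned intercept is monthly, so multiplying by 12 gives the annualized figures quoted in the text (for example, 0.197% per month is about 2.36% per year).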

4.1. Performance-evaluation methodology

We now describe the procedure we use to select share classes and evaluate the performance of the resulting portfolios. Although the analysis is carried out at the share-class level, for simplicity herein we refer to share classes as funds.

We use the first 10 years of data on one-year-ahead realized alphas (from 1981 until 1990) and one-year-lagged fund characteristics (from 1980 until 1989) to train each machine-learning method and OLS. We then use the values of fund characteristics in December of 1990, which are not employed in the training process, to predict fund performance in 1991. We form an equally weighted portfolio of the funds in the top decile of the predicted-performance distribution and track its return (net of expenses, fees, loads, and transaction costs) in the 12 months of 1991. If, during that period, a fund that belongs to the portfolio disappears from the sample, the amount invested in that fund is equally distributed across the remaining funds. For every successive year, we expand the training sample forward one year, train the algorithm again on the expanded sample, make new predictions for the following year, construct a new top-decile fund portfolio, and track its net return in the next 12 months. This way, we construct a time series of monthly out-of-sample net returns of the top-decile fund portfolio that spans from January 1991 to December 2020 (360 months). The average number of funds selected into the top-decile portfolios is 159, with a minimum of 11 and a maximum of 326.

To evaluate the out-of-sample performance of the top-decile fund portfolio, we run a time-series regression of the 360 out-of-sample monthly portfolio excess returns on contemporaneous risk-factor returns. The portfolio alpha is the intercept of the time-series regression. We consider four risk-factor models to evaluate portfolio performance: the Fama and French (1993) three-factor model augmented with momentum (FF3+MOM) proposed by Carhart (1997); the Fama and French (2015) five-factor model (FF5); the FF5 model augmented with momentum (FF5+MOM); and the FF5 model augmented with momentum and the aggregate liquidity factor of Pástor and Stambaugh (2003) (FF5+MOM+LIQ). Note, however, that in all cases, fund selection is based on performance predicted according to the FF5+MOM model.

4.2. Out-of-sample and net-of-costs performance

Table 3 reports the out-of-sample alpha net of all costs of the top-decile fund portfolios selected by the three machine-learning methods (gradient boosting, random forests, and elastic net) and by OLS. For comparison purposes, we also report the alpha of two naive fund portfolios: an equally weighted and an asset-weighted portfolio of all share classes, both rebalanced annually.

Our main finding is that the two machine-learning methods that exploit nonlinearities and interactions (gradient boosting and random forests) select long-only portfolios of funds that deliver statistically significant net alphas of 19.7 bp and 22.4 bp per month (2.36% and 2.69% per year), respectively, relative to the FF5+MOM model. In contrast, the portfolios based on linear methods (elastic net and OLS) deliver net alphas of 9.1 bp and 10.1 bp per month (1.09% and 1.21% per year), respectively, which are statistically indistinguishable from zero. The equally weighted and asset-weighted portfolios earn negative net alphas of −1.8 bp and −3.7 bp per month (−0.22% and −0.44% per year), respectively. Interestingly, the asset-weighted portfolio underperforms the equally weighted portfolio, which implies that the average dollar invested in active funds earns lower risk-adjusted after-cost returns than the average fund. In summary, while portfolios that exploit predictability in the data help investors to avoid underperforming funds, only the machine-learning methods that exploit nonlinearities and interactions (gradient boosting and random forests) allow them to significantly benefit from investing in actively managed funds. Table 3 shows that these findings are remarkably stable when we evaluate out-of-sample alpha using the other three factor models we consider, with the only exception being that the OLS portfolio alpha is statistically significant at the 10% level for the FF5+MOM+LIQ factor model.14

The positive net alphas achieved by the long-only portfolios of funds selected by gradient boosting and random forests are also economically significant. For instance, the median of the in-sample alpha spreads between the top and bottom quintile portfolios of funds sorted by the predictors considered by Jones and Mo (2020, Table 2) is 21.91 bp per month (2.62% per year). Gradient boosting and random forests achieve a similar net alpha, even though their portfolios are long-only and their alphas are measured out of sample. Note also that the out-of-sample net alphas achieved by the portfolios of funds selected by gradient boosting and random forests are more than double the average expense ratio in our sample of active funds (1.11%). This means that if the average fund cut all fees and expenses to zero, its net performance would improve by less than half the alpha we find for our best portfolios.

Our best method, random forests, selects a portfolio of mutual funds that earns a net alpha of 21 bp per month (2.52% per year) with respect to the FF3+MOM model, which is very similar to that of the best top-decile portfolio of Li and Rossi (2020, Table 4) (2.88% per year). This is somewhat surprising given that the two studies use disjoint sets of predictors: fund characteristics in our case, and stock characteristics combined with fund holdings in Li and Rossi (2020). Thus, our empirical findings complement those of Li and Rossi (2020) by showing that, just like manager portfolio holdings, fund traits contain information that can be used to construct portfolios of funds with large positive alpha.15 Moreover, our findings demonstrate that it is possible to select mutual funds with positive net alpha even in the absence of information on portfolio holdings, which is relevant for the debate on the costs and benefits of mandatory portfolio disclosure (Aliaj, 2020).

Although the alphas of the nonlinear machine-learning portfolios are significantly different from zero, it is unclear whether they are also significantly different from that of the OLS portfolio. To answer this question, we evaluate the performance of a self-financed portfolio that goes long in each machine-learning portfolio and short in the OLS portfolio. Table 4 shows that the difference in performance between the gradient-boosting and OLS portfolios is positive and significant, ranging from 8.9 bp to 13.6 bp per month (1.1% to 1.6% per year) across the four factor models we consider. A similar conclusion holds for the random-forest portfolio, whose outperformance of the OLS portfolio ranges between 11.7 bp and 17.8 bp per month (1.4% and 2.1% per year), depending on the model. In contrast, the performance of the elastic-net portfolio is statistically indistinguishable from that of the OLS portfolio. Finally, both the equally weighted and asset-weighted portfolios underperform OLS, with the difference being generally statistically significant.

Table 4
Out-of-sample alpha with respect to OLS. This table reports the monthly out-of-sample alphas (in %) net of all costs of the portfolio that goes long in the funds selected by one of the methods we consider (gradient boosting, random forests, elastic net, equally weighted, asset weighted) and short in the funds selected by OLS. For instance, "gradient boosting minus OLS" refers to a long-short portfolio that is long in the prediction-based top-decile portfolio obtained with the gradient-boosting method and short in the top-decile portfolio obtained with the OLS method. Alphas are computed by regressing the out-of-sample excess monthly long-short portfolio returns net of all costs against the Fama and French (1993) three-factor model augmented with momentum (FF3+MOM), the Fama and French (2015) five factors (FF5), and the FF5 model augmented with the momentum factor (FF5+MOM) and with the liquidity risk factor of Pástor and Stambaugh (2003) (FF5+MOM+LIQ). The out-of-sample period spans from January 1991 to December 2020. We report standard errors with Newey-West adjustment for 12 lags in parentheses. One, two, and three asterisks indicate that the alpha is significant at the 10%, 5%, and 1% level, respectively.

                                FF3+MOM    FF5        FF5+MOM    FF5+MOM+LIQ
Gradient boosting minus OLS     0.122***   0.136**    0.096**    0.089**
                                (0.046)    (0.056)    (0.043)    (0.044)
Random forest minus OLS         0.154***   0.178**    0.123**    0.117**
                                (0.053)    (0.069)    (0.050)    (0.051)
Elastic net minus OLS           -0.012     -0.010     -0.010     -0.010
                                (0.011)    (0.011)    (0.011)    (0.010)
Equally weighted minus OLS      -0.074     -0.092*    -0.119**   -0.126***
                                (0.048)    (0.052)    (0.048)    (0.048)
Asset weighted minus OLS        -0.100**   -0.118**   -0.137***  -0.145***
                                (0.050)    (0.054)    (0.052)    (0.051)

14 Section IA.2 of the Internet Appendix shows that our findings are also robust to evaluating performance with respect to the factor models proposed by Cremers et al. (2013), Hou et al. (2015), and Stambaugh and Yuan (2017).

15 Li and Rossi (2020, Sections 5.3 and 6.3) show that a linear combination of fund characteristics cannot improve the information contained in fund holdings and stock characteristics about future fund returns. Nonetheless, we show that using only fund characteristics with machine learning, one can construct portfolios of mutual funds with alphas similar to those obtained by exploiting fund holdings and stock characteristics.

Table 5
Out-of-sample mean excess return and risk. For each fund portfolio, this table reports the following monthly out-of-sample performance metrics: mean excess returns net of all costs; standard deviation; Sharpe ratio (mean excess return divided by the standard deviation); Sortino ratio (mean excess return divided by the semi-deviation); information ratio (alpha net of all costs with respect to the FF5+MOM model divided by idiosyncratic volatility); maximum drawdown; and value-at-risk (VaR) based on the historical simulation method with 99% confidence. The last column reports the average annual portfolio turnover.

                    Mean     Standard    Sharpe   Sortino   Information   Maximum    VaR      Turnover
                             deviation   ratio    ratio     ratio         drawdown   99%
Gradient boosting   0.90%    4.71%       0.192    0.292     0.174         50.3%      12.0%    1.476
Random forest       0.93%    4.96%       0.188    0.290     0.163         55.4%      13.4%    1.410
Elastic net         0.81%    4.81%       0.168    0.249     0.075         58.3%      12.4%    1.219
OLS                 0.82%    4.80%       0.170    0.253     0.083         58.5%      12.3%    1.218
Equally weighted    0.78%    4.39%       0.178    0.263     -0.029        51.4%      10.2%    0.414
Asset weighted      0.73%    4.42%       0.166    0.243     -0.069        52.8%      10.7%    0.369

Our main goal is to identify funds with positive net alpha. The alpha of a fund measures its ability to improve the Sharpe ratio of an investor who already has access to the factors in the model (Gibbons et al., 1989). However, investors may choose to invest only in mutual funds instead of combining them with benchmark portfolios. Thus, it is interesting to study how the various portfolios of active funds perform in terms of mean return and risk. To answer this question, Table 5 reports the following measures for each portfolio of funds: mean excess net returns; standard deviation of net returns; Sharpe ratio (mean excess net return divided by standard deviation); Sortino ratio (mean excess net return divided by semi-deviation); information ratio (alpha net of all costs with respect to the FF5+MOM model divided by idiosyncratic volatility); maximum drawdown; and value-at-risk (VaR) based on the historical simulation method with 99% confidence. The ranking of mean excess net returns closely mirrors the ranking in alphas. This result is far from obvious because the target variable we use to train the methods is fund alpha, not fund excess returns, unlike the studies of Wu et al. (2021) and Li and Rossi (2020). Higher mean excess net returns for the prediction-based portfolios are at least partially explained by higher standard deviation. However, the two best methods in terms of alpha (gradient boosting and random forests) also deliver portfolios with the highest Sharpe ratios. Our conclusions do not change if we consider downside risk: gradient boosting and random forests select portfolios of funds with the highest Sortino ratio. In terms of maximum drawdown, the portfolios selected by elastic net and OLS appear to be the riskiest, and in terms of VaR, the equally weighted and asset-weighted portfolios are the safest. Finally, the relative perfor-

portfolios also have the highest Sharpe, Sortino, and information ratios of all the portfolios considered.

5. Which characteristics and interactions matter?

We now study the importance of characteristics and their interactions for the performance of gradient boosting and random forests. We also analyze the nature of the nonlinearities and interactions exploited
mance of the different portfolios in terms of information ratio closely by these nonlinear machine-learning methods. Finally, we investigate
parallels that based on net alpha reported in Table 3.16 whether it is possible to replicate the performance of the machine-
Although our measures of performance are net of all costs, it is use- learning portfolios by using a simple strategy based on double sorting
ful to know how much trading the top-decile portfolios require. The last funds across two of the most important characteristics.
column of Table 5 reports the average annual turnover of the top-decile To study the importance of characteristics, we estimate SHAP val-
portfolios. Annual turnover is calculated at the beginning of each cal- ues (Lundberg and Lee, 2017). SHapley Additive exPlanations (SHAP)
endar year, when the portfolio is rebalanced, as the sum of the absolute is a method based on cooperative game theory and used to estimate
values of changes in portfolio weights with respect to the last month of the contribution of each characteristic to each individual prediction.
the previous year across all funds in the sample. For instance, a turnover SHAP is an additive method because aggregating SHAP values across
value of one means that 50% of the wealth in the portfolio is reallocated characteristics, one recovers the difference between the prediction for
across funds each year. As expected, the naive portfolios have very low an individual observation and the average prediction across all obser-
turnover. Approximately, only 20% of the portfolio is reallocated from vations.17 Fig. 2 reports characteristic importance for OLS, elastic net,
year to year due to changes in the pool of available funds and (for the gradient-boosting, and random forests. To quantify the importance of a
equally weighted portfolio) also to changes in fund values. In contrast, characteristic, we compute the mean across all observations of the abso-
managing a portfolio based on the performance predictions of elastic lute SHAP value for the characteristic. We evaluate importance within
net and OLS involves trading roughly 60% of the portfolio value each the last estimation window, which spans the 1980 to 2019 period.
year, whereas investing based on gradient boosting and random forests We highlight two main findings from Fig. 2. First, value added, alpha
requires trading 70% of the portfolio value. These findings suggest that intercept 𝑡-stat, market beta 𝑡-stat, and 𝑅2 are among the top five most
to achieve superior performance investing in actively managed funds, important characteristics for both nonlinear methods (gradient boosting
portfolio managers must also actively trade their wealth across these and random forests). This demonstrates that the nonlinear machine-
funds, and thus, it is important to account for fund loads when we eval- learning methods can exploit at least two different measures of past
uate portfolio performance. performance (alpha intercept 𝑡-stat and value added) to predict future
Taken together, the results in this section suggest that it is possible alpha.18 The nonlinear methods also exploit measures of fund active-
to exploit readily available fund characteristics to select portfolios of ness to predict future performance. To see this, note that market beta
mutual funds that significantly outperform (in terms of net alpha) the 𝑡-stat can be interpreted as a measure of fund activeness because one
equally weighted or asset-weighted average mutual fund. This is true
even if investors use the worst-performing forecasting methods, elastic
17 The SHAP method is model-agnostic, applicable to any type of data, and
net and OLS, to predict performance. In other words, elastic net and
provides additive interpretation (contribution of each characteristic to the pre-
OLS help investors to avoid underperforming funds. However, neither
diction) of machine-learning models, including feature importance, feature de-
elastic net nor OLS allow investors to identify funds with significant
pendence, interactions, clustering and summary plots. Moreover, the tree-based
positive net alpha ex-ante. Only methods that allow for nonlinearities versions take into account the dependencies between characteristics (Lundberg
and interactions in the relation between fund characteristics and subse- et al., 2020). For these reasons, SHAP has recently become the method of choice
quent performance, namely gradient boosting and random forests, can to visualize feature importance and interactions. For a general discussion see
detect funds with large and significant alphas. Moreover, the resulting Molnar (2019) and for applications in finance see Pedersen (2022) and Bali et
al. (2023).
18 Note that the other measure of past performance we consider (realized al-
16
Note that there is a close relation between information ratio and alpha t-stat. pha) is only the eighth most important characteristic for gradient boosting and
In particular, equation (4) in Gibbons et al. (1989) implies that the alpha 𝑡-stat the twelfth for random forests, which demonstrates that alpha intercept 𝑡-stat
of a portfolio is proportional to its information ratio, with the proportionality and value added are much more important measures of past performance for
constant depending on the number of observations and the maximum Sharpe our nonlinear methods. This finding contrasts with that of Kaniel et al. (2023),
ratio of the factors in the model. This explains why the relative performance of who find that their 12-month fund-momentum characteristic, which is closely
the different portfolios in terms of out-of-sample information ratio is similar to related to our annual realized alpha, is the second most important predictor for
that in terms of alpha. their neural networks.

would expect less active funds to have highly statistically significant betas on the market. Indeed, Fig. 1 shows that market beta 𝑡-stat has a high correlation of 54% with 𝑅², which Amihud and Goyenko (2013) consider a measure of fund activeness.

Fig. 2. Characteristic importance. This figure reports the importance of each characteristic, measured as the average across all observations of the absolute SHAP value of the characteristic, for ordinary least squares (OLS), elastic net, gradient boosting, and random forests. We compute characteristic importance for the last estimation window, which spans the period from 1980 to 2019.

Our second finding is that nonlinear and linear methods differ in characteristic importance. For the two linear methods, characteristic importance declines sharply beyond the two most important characteristics. It declines much more gradually for the two nonlinear methods, for which around seven characteristics are similarly important. Another difference is that value added is one of the two most important characteristics for the nonlinear methods but is not very important for the linear methods. Finally, fund expense ratio is the sixth most important characteristic for the linear methods, but it is less important for the nonlinear methods.

Nonlinear and linear methods differ in both performance and characteristic importance. These differences suggest that the relation between characteristics and performance contains nonlinearities and interactions that investors can exploit to select actively managed equity funds. To explore the nature of these nonlinear relations, Figs. 3 and 4 display SHAP plots for gradient boosting and random forests. The plots cover four of the most important characteristics: alpha intercept 𝑡-stat, value added, market beta 𝑡-stat, and 𝑅². In each SHAP plot, the horizontal axis shows the cross-sectionally standardized characteristic. The vertical axis shows the characteristic’s SHAP value for each observation (green dots) and the mean SHAP value conditional on the value of the characteristic (solid dark green line).19

Comparing Figs. 3 and 4, we find that the nonlinear patterns identified by the two machine-learning methods are very similar. In particular, the solid lines depicting the conditional mean SHAP value for each characteristic are quite similar across the two nonlinear methods.20

Interestingly, we find an approximately linear relation between alpha intercept 𝑡-stat and its conditional mean SHAP value. This may explain why alpha intercept 𝑡-stat is the most important characteristic for both linear methods, OLS and elastic net.21 However, predicted performance relates in a substantially nonlinear way to the other three characteristics, which are important mainly for the nonlinear methods. For instance, the relation between fund activeness and future performance is highly nonlinear: strongly positive for the most active funds but flat for the rest of the funds. In particular, very low standardized market beta 𝑡-stats predict superior performance, but the relation between market beta 𝑡-stat and future performance is flat for larger market beta 𝑡-stats. Similarly, and consistent with Amihud and Goyenko (2013), 𝑅² and performance are inversely related for standardized 𝑅² between −2.75 and −2. The relation is roughly flat for standardized 𝑅² above −2. Finally, consider the relation between value added and its conditional mean SHAP value. It is flat for standardized value added below −0.06, U-shaped for intermediate values, monotonically increasing between zero and 0.15, and decreasing above 0.15.

19 To estimate the conditional mean SHAP value, we split the horizontal axis into a set of bins and compute the average SHAP value for each bin.
20 Comparing Figs. 3 and 4, we also find that one difference between the two nonlinear methods is that the SHAP values for random forests are much more dispersed than those for gradient boosting. As explained in Section 3, random forests employ ensembles of uncorrelated regression trees. Gradient boosting, by contrast, employs a sequence of regression trees that build on each other and are therefore potentially correlated.
21 In unreported results, we also find a linear relation between expense ratio and predicted alpha. This is not surprising because the expense ratio is linearly subtracted from gross alpha to obtain net alpha.
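A small self-contained sketch can illustrate the importance measure behind Fig. 2, the mean absolute SHAP value of each characteristic. It is not the tree-based SHAP estimator the text relies on. Under our own assumptions, it computes exact Shapley values by enumerating every coalition of characteristics. It uses an interventional value function: characteristics outside a coalition are replaced by the values in each row of a background sample. The three-characteristic `toy_model` and its coefficients are hypothetical. Enumeration grows exponentially with the number of characteristics, so this approach is practical only for a handful of them.

```python
from itertools import combinations
from math import factorial
from statistics import mean


def toy_model(x):
    """Hypothetical alpha predictor over three standardized characteristics:
    x[0] = past performance, x[1] = activeness proxy (low = more active),
    x[2] = expense ratio. Coefficients are made up for illustration."""
    past, activeness, expense = x
    more_active = 1.0 if activeness < 0 else 0.0
    # Past performance counts double for more active funds (an interaction).
    return 0.5 * past + 0.5 * past * more_active - 0.1 * expense


def coalition_value(f, x, subset, background):
    """Average prediction when characteristics in `subset` come from x
    and all other characteristics come from each background row."""
    preds = []
    for z in background:
        mixed = [x[j] if j in subset else z[j] for j in range(len(x))]
        preds.append(f(mixed))
    return mean(preds)


def shapley_values(f, x, background):
    """Exact Shapley value of every characteristic for one observation x."""
    m = len(x)
    phi = [0.0] * m
    for j in range(m):
        others = [k for k in range(m) if k != j]
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for coalition in combinations(others, size):
                s = set(coalition)
                gain = coalition_value(f, x, s | {j}, background) - coalition_value(f, x, s, background)
                phi[j] += weight * gain
    return phi


def mean_abs_importance(f, data, background):
    """Importance of each characteristic: mean |Shapley value| across observations."""
    all_phi = [shapley_values(f, x, background) for x in data]
    return [mean(abs(p[j]) for p in all_phi) for j in range(len(data[0]))]


data = [[1.0, -1.0, 0.2], [-0.5, 1.0, -0.3], [2.0, 0.5, 0.0], [0.0, -2.0, 1.0]]
phi = shapley_values(toy_model, data[0], data)
baseline = mean(toy_model(z) for z in data)
# Additivity: the values sum to this prediction minus the average prediction.
print(sum(phi), toy_model(data[0]) - baseline)
print(mean_abs_importance(toy_model, data, data))
```

The printed pair illustrates the additivity property described in the text. The expense-ratio term enters linearly, so its Shapley value reduces to the coefficient times the deviation from the background mean. This mirrors the linear expense-ratio effect noted in footnote 21.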


Fig. 3. Nonlinearity in the relation between fund characteristics and performance for gradient boosting. This figure displays SHAP plots for the gradient-boosting method corresponding to four characteristics: alpha intercept 𝑡-stat (top left graph), value added (top right graph), market beta 𝑡-stat (bottom left graph), and 𝑅² (bottom right graph). For each SHAP plot, the horizontal axis shows the cross-sectionally standardized characteristic and the vertical axis the characteristic’s SHAP value for each observation (green dots) and the mean SHAP value conditional on the value of the characteristic (solid dark green line). Estimates are for the last estimation window spanning the period from 1980 to 2019.
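Footnote 19 describes how the solid line in Figs. 3 and 4 is obtained: the horizontal axis is split into bins, and the SHAP values are averaged within each bin. The footnote states neither the number of bins nor whether they are equal-width. The sketch below assumes equal-width bins (ten by default). The inputs are hypothetical lists of standardized characteristic values and their SHAP values.

```python
from statistics import mean


def binned_conditional_mean(x_values, shap_values, n_bins=10):
    """Return (bin midpoint, mean SHAP value) for each non-empty equal-width bin
    of a standardized characteristic."""
    lo, hi = min(x_values), max(x_values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant characteristic
    buckets = {}
    for x, s in zip(x_values, shap_values):
        k = min(int((x - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        buckets.setdefault(k, []).append(s)
    return [(lo + (k + 0.5) * width, mean(v)) for k, v in sorted(buckets.items())]


# Hypothetical standardized characteristic and SHAP values for eight observations.
x = [-2.0, -1.5, -1.0, -0.2, 0.1, 0.4, 1.2, 2.0]
s = [0.30, 0.25, 0.05, 0.00, 0.01, 0.02, 0.02, 0.03]
for midpoint, avg in binned_conditional_mean(x, s, n_bins=4):
    print(f"{midpoint:+.2f}: {avg:.3f}")
```

Quantile bins, which would hold roughly equal numbers of observations per bin, are an equally plausible reading of the footnote. Swapping them in only changes how `k` is assigned.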

Fig. 4. Nonlinearity in the relation between fund characteristics and performance for random forests. This figure displays SHAP plots for the random-forest method corresponding to four characteristics: alpha intercept 𝑡-stat (top left graph), value added (top right graph), market beta 𝑡-stat (bottom left graph), and 𝑅² (bottom right graph). For each SHAP plot, the horizontal axis shows the cross-sectionally standardized characteristic and the vertical axis the characteristic’s SHAP value for each observation (green dots) and the mean SHAP value conditional on the value of the characteristic (solid dark green line). Estimates are for the last estimation window spanning the period from 1980 to 2019.

We now turn our attention to interaction importance. Fig. 5 depicts the strength of the 30 most important interactions of characteristics for gradient boosting and random forests.22 The figure reveals that past-performance measures such as alpha intercept 𝑡-stat and value added matter in two ways. They are important as standalone predictors, as shown in Fig. 2. They are also crucial through their interactions with measures of fund activeness such as market beta 𝑡-stat and 𝑅². For instance, the most important interaction for random forests is alpha intercept 𝑡-stat with market beta 𝑡-stat. Also, all four possible interactions between the two measures of past performance and the two measures of fund activeness are among the top 30 most important interactions.23 Similarly, for gradient boosting, three of these four interactions are among the top 30. This suggests that the ability of fund past performance to predict future performance may depend on the activeness of the fund.

To further explore this conjecture, Figs. 6 and 7 illustrate the interaction between measures of past performance (alpha intercept 𝑡-stat or value added) and measures of fund activeness (market beta 𝑡-stat or 𝑅²) for gradient boosting and random forests. For each interaction, we split all observations into deciles of the fund-activeness characteristic. For each decile, we then depict the conditional mean SHAP value of the past-performance characteristic. For instance, the top-left graph in Fig. 6 illustrates the interaction between alpha intercept 𝑡-stat and market beta 𝑡-stat for gradient boosting. As expected, the SHAP values increase with alpha intercept 𝑡-stat for every decile of market beta 𝑡-stat. However, the increase is much steeper for lower deciles of market beta 𝑡-stat (blue solid lines). That is, alpha intercept 𝑡-stat is a particularly strong predictor of future performance for more active mutual funds. In other words, investors may generally achieve higher net alpha by holding funds with good past performance, but the effect is much stronger for more active funds. Similarly, the top-right graph in Fig. 6 shows that alpha intercept 𝑡-stat is particularly helpful to predict the future performance of funds with low 𝑅², that is, funds whose returns are largely unexplained by common risk factors. The bottom-left and bottom-right graphs in Fig. 6 show that the interactions between value added and the two measures of fund activeness have a similar, albeit weaker, effect. Fig. 7 shows very similar effects for random forests.24

22 As mentioned before, SHAP values are additive across characteristics. Aggregating the SHAP values for each observation across the characteristics recovers the difference between the prediction for that observation and the average prediction across all observations. Moreover, the SHAP value for each characteristic can be decomposed into the pure effect of the characteristic and the SHAP interaction values of the characteristic with each of the other characteristics; see Molnar (2019, Section 9.6.8). Thus, the SHAP method estimates interaction strength by computing the mean across all observations of the absolute SHAP interaction value for each pair of characteristics.
23 Note that there is a total of 136 pairwise interactions between the 17 characteristics in our dataset. Thus, all interactions among the top 30 are in the top quartile of importance.
24 To understand how the nonlinearities and interactions exploited by machine learning affect portfolio composition, we compute the fund overlap for the portfolios of the four prediction methods, averaged over the out-of-sample period. The fund portfolios selected by the two linear methods (OLS and elastic net) are very similar, with an average fund overlap of 94%. The overlap between the portfolios of the two nonlinear methods and OLS is much smaller, around 45%. The shrinkage of elastic net therefore has a negligible impact on portfolio composition. In contrast, the nonlinearities and interactions exploited by gradient boosting and random forests lead to portfolios of funds that differ substantially from the OLS portfolios.

Given the importance of the measures of past performance and fund activeness and of their interactions for the nonlinear machine-learning portfolios, we study whether a simpler strategy can also earn positive net alpha. This strategy double sorts funds across one measure of past performance and one measure of fund activeness. At the beginning of each year in our out-of-sample period, we first sort all funds in terms of the performance measure for the previous year and select funds that are above the top-√10th percentile. Second, we sort the selected funds in terms of the activeness measure at the end of the previous year and select funds below the bottom-√10th percentile.25 This procedure results in a portfolio that contains 10% of the funds. Table 6 reports the monthly out-of-sample alphas, net of all costs, of the resulting long-only portfolios of funds. These portfolios combine one of two past-performance measures (alpha 𝑡-stat and value added) with one of two fund-activeness measures (𝑅² and market beta 𝑡-stat).

25 Note that 𝑅² and market beta 𝑡-stat are inverse measures of fund activeness. Thus, we select funds below the bottom-√10th percentile of their distribution.
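The double sort described above keeps a fraction of funds at each of two stages so that about 10% of all funds survive. We read the √10 notation as keeping a fraction √0.1 ≈ 31.6% of the funds sorted at each stage. This reading matches the stated 10% outcome, since 0.316 × 0.316 ≈ 0.10. The sketch below implements that reading. The dictionary keys `alpha_t` and `r2`, the rounding of fund counts, and the synthetic funds are all assumptions made for illustration.

```python
import math


def double_sort(funds, perf_key, activeness_key, target_share=0.10):
    """Keep the top sqrt(target_share) fraction of funds by past performance, then the
    bottom sqrt(target_share) fraction of those by an inverse activeness measure
    (such as R-squared or market beta t-stat), leaving roughly target_share of all funds."""
    stage_share = math.sqrt(target_share)  # about 0.316 when target_share = 0.10
    by_perf = sorted(funds, key=lambda f: f[perf_key], reverse=True)
    survivors = by_perf[: max(1, round(stage_share * len(by_perf)))]
    # Low values of the inverse activeness measure identify the most active funds.
    by_activeness = sorted(survivors, key=lambda f: f[activeness_key])
    return by_activeness[: max(1, round(stage_share * len(survivors)))]


# Hypothetical cross-section of 1,000 funds with deterministic pseudo-random traits.
funds = [{"id": i, "alpha_t": (i * 37 % 1000) / 100 - 5, "r2": (i * 61 % 997) / 997}
         for i in range(1000)]
portfolio = double_sort(funds, "alpha_t", "r2")
print(len(portfolio), len(portfolio) / len(funds))
```

With 1,000 funds, the first stage keeps 316 and the second keeps 100 of those. The second stage sorts only the funds that survived the first, which is what makes the two fractions multiply.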


Fig. 5. Interaction importance. This figure reports the interaction strength of the 30 most important interactions for the gradient-boosting and random-forest methods. We compute interaction strength as the average across all observations of the absolute SHAP interaction value for each pairwise combination of characteristics. We compute interaction importance for the last estimation window, which spans the period from 1980 to 2019.
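Footnote 22 defines interaction strength as the mean absolute SHAP interaction value of a pair of characteristics. The sketch below computes SHAP interaction values exactly by enumerating coalitions, using the pairwise formula of Lundberg et al. (2020). In that formula, each pairwise effect is split equally between Φ_ij and Φ_ji. The model, the single all-zero background row, and the interventional value function are our own illustrative choices, not the paper's implementation. Enumeration is feasible only for a few characteristics. The last lines check the pair count cited in footnote 23.

```python
from itertools import combinations
from math import comb, factorial


def coalition_value(f, x, subset, background):
    """Average prediction with characteristics in `subset` fixed at x and the rest
    taken from each background row (interventional value function)."""
    total = 0.0
    for z in background:
        total += f([x[k] if k in subset else z[k] for k in range(len(x))])
    return total / len(background)


def shap_interaction(f, x, i, j, background):
    """Exact SHAP interaction value Phi_ij (i != j) for one observation x."""
    m = len(x)
    rest = [k for k in range(m) if k not in (i, j)]
    phi = 0.0
    for size in range(m - 1):
        weight = factorial(size) * factorial(m - size - 2) / (2 * factorial(m - 1))
        for coalition in combinations(rest, size):
            s = set(coalition)
            delta = (coalition_value(f, x, s | {i, j}, background)
                     - coalition_value(f, x, s | {i}, background)
                     - coalition_value(f, x, s | {j}, background)
                     + coalition_value(f, x, s, background))
            phi += weight * delta
    return phi


def interaction_strength(f, data, i, j, background):
    """Mean absolute SHAP interaction value of characteristics i and j."""
    return sum(abs(shap_interaction(f, x, i, j, background)) for x in data) / len(data)


def toy_model(x):
    # Hypothetical model: characteristics 0 and 1 interact; characteristic 2 is additive.
    return x[0] * x[1] + x[2]


background = [[0.0, 0.0, 0.0]]
data = [[2.0, 3.0, 5.0], [-1.0, 4.0, 0.0]]
print(interaction_strength(toy_model, data, 0, 1, background))  # prints 2.5
print(interaction_strength(toy_model, data, 0, 2, background))  # prints 0.0

# Footnote 23: 17 characteristics give 136 pairs, so a top-30 pair is in the top quartile.
print(comb(17, 2), 30 / comb(17, 2))
```

With a single all-zero background row, Φ_01 for this toy model equals half of x0·x1. The other half is assigned to Φ_10, and the purely additive pair (0, 2) has zero interaction.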


Fig. 6. Interactions between past performance and activeness measures for gradient boosting. Each graph illustrates the interaction between one past-performance characteristic (alpha intercept 𝑡-stat or value added) and one fund-activeness characteristic (market beta 𝑡-stat or 𝑅²) for gradient boosting. For each graph, the horizontal axis depicts the cross-sectionally standardized past-performance characteristic and the vertical axis the characteristic’s SHAP value for each observation (green dots). To visualize the interaction, we split all observations into deciles of the fund-activeness characteristic and depict, for each decile, the conditional mean SHAP value of the past-performance characteristic (solid lines). Estimates are for the last estimation window spanning the period from 1980 to 2019.
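The decile construction behind Figs. 6 and 7 can be sketched as follows. The figures plot one conditional-mean line per decile. As a compact numerical summary of how steep each line is, this sketch instead fits a least-squares slope of the past-performance SHAP value on past performance within each decile of the activeness measure; that summary is our own simplification. The data are synthetic. They are built so that the SHAP slope is three times steeper in the lowest decile of the activeness proxy, mimicking the pattern described in the text.

```python
from statistics import mean


def decile_labels(values):
    """Label each observation with its decile (0 = lowest) by rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    labels = [0] * len(values)
    for rank, k in enumerate(order):
        labels[k] = rank * 10 // len(values)
    return labels


def ls_slope(xs, ys):
    """Least-squares slope of ys on xs (0.0 if xs has no variation)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx if sxx else 0.0


def slope_by_decile(past_perf, shap_past, activeness):
    """Slope of the past-performance SHAP value within each activeness decile."""
    labels = decile_labels(activeness)
    slopes = {}
    for d in range(10):
        idx = [k for k, lab in enumerate(labels) if lab == d]
        if len(idx) >= 2:
            slopes[d] = ls_slope([past_perf[k] for k in idx], [shap_past[k] for k in idx])
    return slopes


# Synthetic example: 200 observations; the activeness proxy (e.g., market beta t-stat)
# is lowest for the first 20, and their SHAP slope is 3 instead of 1.
n = 200
activeness = [k / n for k in range(n)]
past = [(k * 37 % n) / n - 0.5 for k in range(n)]
shap_past = [(3.0 if a < 0.1 else 1.0) * p for a, p in zip(activeness, past)]
print(slope_by_decile(past, shap_past, activeness))
```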

Table 6 shows that it is indeed possible to achieve positive net alpha by double sorting mutual funds based on past performance and fund activeness. For instance, the portfolios of funds based on a double sort of alpha 𝑡-stat and 𝑅² achieve alphas that are statistically significant at the 10% level or better. These alphas are slightly smaller than those attained by the nonlinear machine-learning methods in Table 3. Interestingly, the portfolios based on a double sort of alpha 𝑡-stat and market beta 𝑡-stat achieve even higher alphas. These alphas are generally statistically significant at the 5% level and comparable in magnitude to those of the nonlinear machine-learning methods. This confirms the importance of the interaction of 𝑅² with measures of past performance documented by Amihud and Goyenko (2013). It also reveals market beta 𝑡-stat as an alternative measure of fund activeness whose interaction with past performance helps to identify outperforming funds. However, Table 6 also shows that the performance of the double-sorted portfolios is quite heterogeneous across different pairs of characteristics. For instance, consider the double-sorted portfolios based on value added and either market beta 𝑡-stat or 𝑅². Their out-of-sample net alphas are not significantly different from zero and are substantially smaller than those of the nonlinear machine-learning portfolios. Moreover, the results in Table 6 suffer from look-ahead bias. The pairs of characteristics for the double sort were selected based on characteristic and interaction importance computed using the entire sample. Portfolios obtained from a simple double sort can therefore achieve good out-of-sample performance. Even so, the results in Table 6 demonstrate that investors should resort to nonlinear machine-learning methods to achieve good performance in real time. These methods identify the relevant characteristics and interactions at each point in time, based only on past data.

To investigate whether the predictive ability of some characteristics changes over time, Figs. 8 and 9 depict the importance of each predictor in each year of the out-of-sample period for gradient boosting and random forests, respectively. Figs. 8 and 9 exhibit some remarkable similarities, which suggests that the two methods identify similar patterns in the data. More importantly, the figures show that the importance of characteristics such as alpha 𝑡-stat, value added, and 𝑅² varies substantially over time.

Overall, our findings suggest that various measures of past performance and fund activeness, and their interactions, are important for the ability of the nonlinear machine-learning portfolios to achieve significant positive net alphas. It is possible to achieve positive net alpha by double sorting mutual funds based on past performance and fund activeness. However, the performance of such double-sorted portfolios is heterogeneous across different pairs of characteristics. Moreover, the relative predictive ability of the measures of past performance and fund activeness varies substantially over time. To achieve superior out-of-sample performance, investors should therefore use machine learning dynamically to identify the characteristics and interactions that are important at each point in time.

Fig. 7. Interactions between past-performance and activeness measures for random forests. Each graph illustrates the interaction between one past-performance characteristic (alpha intercept 𝑡-stat or value added) and one fund-activeness characteristic (market beta 𝑡-stat or 𝑅²) for random forests. For each graph, the horizontal axis depicts the cross-sectionally standardized past-performance characteristic and the vertical axis the characteristic’s SHAP value for each observation (green dots). To visualize the interaction, we split all observations into deciles of the fund-activeness characteristic and depict, for each decile, the conditional mean SHAP value of the past-performance characteristic (solid lines). Estimates are for the last estimation window spanning the period from 1980 to 2019.

6. Capital misallocation and machine learning

To investigate the economic mechanism behind our results, we now build on the work by Roussanov et al. (2021). We study whether capital misallocation in the mutual-fund market can explain the performance of the nonlinear machine-learning portfolios. To do this, we compute the average net skill and size of funds in the decile portfolios generated by the four prediction methods. Our main finding is that funds in the top decile are “too small” for diseconomies of scale to completely offset the skill of their managers. Funds in the top decile generated by the nonlinear methods are particularly small. This provides an economic interpretation of our results. Nonlinear machine-learning methods help to select outperforming mutual funds because they can identify skilled managers, and also because they can identify managers whose skill is not offset by diseconomies of scale.

In the perfectly competitive equilibrium of Berk and Green (2004), fund size is such that diseconomies of scale and fees completely offset the manager’s ability to generate gross alpha. Expected net alpha is therefore zero for every fund. However, Roussanov et al. (2021) show that, in a structural model where investors face informational frictions, funds do not necessarily reach their Berk and Green (2004) equilibrium size. Consequently, in expectation some funds may earn positive net alpha while others earn negative net alpha. Roussanov et al. (2021) use data on U.S. active domestic equity funds from 1964 to 2015 and a Bayesian approach to estimate managerial skill. They find that about 80% of funds manage assets above their efficient size, while funds in the top decile of skill are “too small” relative to their manager’s skill.

Following Roussanov et al. (2021), we assume that the net alpha of a fund can be decomposed into skill, diseconomies of scale, expense ratio, and a zero-mean idiosyncratic shock. Thus, the expected net alpha of fund 𝑖 can be written as:

$$E\left(\alpha_{i,t+1} \mid \mathcal{I}_t\right) = \hat{a}_{i,t+1} - D(Q_{i,t}) - p_{i,t}, \tag{5}$$
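Equation (5) can be made concrete with the logarithmic diseconomies of scale that this section adopts from Roussanov et al. (2021), D(Q) = η log Q with η = 0.0048. Setting expected net alpha to zero gives the Berk and Green (2004) efficient log size, (â − p)/η. The sketch below only checks that algebra and classifies a fund as too small or too large. The skill, expense-ratio, and size inputs are hypothetical. They only need to share the units in which η was calibrated, which this excerpt does not pin down.

```python
ETA = 0.0048  # diseconomies-of-scale parameter quoted from Roussanov et al. (2021)


def expected_net_alpha(skill, log_size, expense_ratio, eta=ETA):
    """Equation (5) with D(Q) = eta * log(Q): expected skill minus
    diseconomies of scale minus the expense ratio."""
    return skill - eta * log_size - expense_ratio


def efficient_log_size(skill, expense_ratio, eta=ETA):
    """Berk-Green efficient log size: the log(Q) at which expected net alpha is zero."""
    return (skill - expense_ratio) / eta


def misallocation(skill, log_size, expense_ratio, eta=ETA):
    """Classify a fund as too small (positive expected net alpha) or too large."""
    gap = log_size - efficient_log_size(skill, expense_ratio, eta)
    return "too small" if gap < 0 else "too large" if gap > 0 else "efficient"


# Hypothetical fund: expected skill of 0.03 and expense ratio of 0.01.
skill, fee = 0.03, 0.01
q_bg = efficient_log_size(skill, fee)
print(q_bg, expected_net_alpha(skill, q_bg, fee))
print(misallocation(skill, q_bg - 1.0, fee), expected_net_alpha(skill, q_bg - 1.0, fee))
```

A fund one log unit below its efficient size earns an expected net alpha of exactly η under this specification. This is the sense in which top-decile funds that are “too small” can deliver positive net alpha.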


Table 6
Out-of-sample alpha of double-sorted portfolios. This table reports the monthly out-of-sample alphas (in %), net of all costs, of the portfolio of funds obtained by double sorting the funds in terms of past performance and fund activeness. Specifically, at the beginning of each year in the out-of-sample period, we sort all funds in terms of the performance measure for the previous year and select funds that are above the top-√10th percentile. Second, we sort the remaining funds in terms of the activeness measure at the end of the previous year and select funds below the bottom-√10th percentile. This procedure results in a portfolio that contains 10% of the funds. We consider two past-performance measures (alpha 𝑡-stat and value added) and two fund-activeness measures (𝑅² and market beta 𝑡-stat). The portfolio alphas are computed by regressing the out-of-sample excess monthly portfolio returns, net of all costs, on four factor models. These are the Fama and French (1993) three-factor model augmented with momentum (FF3+MOM), the Fama and French (2015) five factors (FF5), the FF5 model augmented with the momentum factor (FF5+MOM), and the FF5+MOM model further augmented with the liquidity risk factor of Pástor and Stambaugh (2003) (FF5+MOM+LIQ). The out-of-sample period spans from January 1991 to December 2020. We report standard errors with Newey-West adjustment for 12 lags in parentheses. One, two, and three asterisks indicate that the alpha is significant at the 10%, 5%, and 1% level, respectively.

Double sort on                          FF3+MOM   FF5       FF5+MOM   FF5+MOM+LIQ
Alpha 𝑡-stat and 𝑅²                     0.179**   0.195**   0.177*    0.179*
                                        (0.088)   (0.097)   (0.090)   (0.092)
Alpha 𝑡-stat and market beta 𝑡-stat     0.181*    0.235**   0.207**   0.211**
                                        (0.096)   (0.108)   (0.099)   (0.100)
Value added and 𝑅²                      0.109     0.154     0.113     0.111
                                        (0.091)   (0.102)   (0.095)   (0.096)
Value added and market beta 𝑡-stat      0.110     0.181     0.137     0.136
                                        (0.098)   (0.113)   (0.102)   (0.104)
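The alphas in Table 6 come from time-series regressions of monthly excess returns on factor returns, with Newey-West standard errors for 12 lags. The dependency-free sketch below shows one way to compute such an alpha and its Newey-West (Bartlett-kernel) standard error. It applies no small-sample correction and is not the authors' code. In practice one would typically use a statistics package, for example statsmodels' `OLS(...).fit(cov_type="HAC", cov_kwds={"maxlags": 12})`, whose default options may differ slightly. The inputs in the usage lines are hypothetical.

```python
import math


def invert(a):
    """Gauss-Jordan inverse of a small square matrix, with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def newey_west_alpha(excess_returns, factors, lags=12):
    """OLS of excess returns on a constant plus factors; returns (alpha, NW std. error)."""
    T = len(excess_returns)
    X = [[1.0] + list(f) for f in factors]
    k = len(X[0])
    xtx_inv = invert([[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)])
    xty = [sum(row[a] * y for row, y in zip(X, excess_returns)) for a in range(k)]
    beta = [sum(xtx_inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [y - sum(b * v for b, v in zip(beta, row)) for row, y in zip(X, excess_returns)]
    scores = [[v * u for v in row] for row, u in zip(X, resid)]  # x_t * u_t
    # Long-run covariance of the scores with Bartlett weights 1 - lag / (lags + 1).
    S = [[0.0] * k for _ in range(k)]
    for lag in range(lags + 1):
        w = 1.0 - lag / (lags + 1)
        for t in range(lag, T):
            g_t, g_s = scores[t], scores[t - lag]
            for a in range(k):
                for b in range(k):
                    term = g_t[a] * g_s[b]
                    S[a][b] += w * (term if lag == 0 else term + g_s[a] * g_t[b])
    # Sandwich (X'X)^-1 S (X'X)^-1; alpha is the first coefficient.
    var_alpha = sum(xtx_inv[0][a] * S[a][b] * xtx_inv[b][0] for a in range(k) for b in range(k))
    return beta[0], math.sqrt(max(var_alpha, 0.0))  # guard against tiny negative rounding


# Hypothetical monthly data: two factors and an exact linear relation with alpha = 0.002.
factors = [[0.010, 0.002], [-0.020, 0.004], [0.015, -0.003], [0.000, 0.001],
           [-0.005, 0.006], [0.030, -0.002], [0.012, 0.000], [-0.008, -0.004]]
returns = [0.002 + 1.1 * f1 - 0.3 * f2 for f1, f2 in factors]
print(newey_west_alpha(returns, factors, lags=2))
```

With an exact linear relation, the regression recovers the alpha and the standard error is essentially zero. With real portfolio returns and `lags=12`, the standard error plays the role of the figures in parentheses in Table 6.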

In equation (5), $\hat{a}_{i,t+1} = E(a_{i,t+1} \mid \mathcal{I}_t)$ is the expected skill of fund $i$ conditional on the information set $\mathcal{I}_t$. $D(Q_{i,t})$ is the impact of diseconomies of scale given the size of fund $i$ at time $t$, $Q_{i,t}$. Finally, $p_{i,t}$ is the expense ratio of fund $i$ at time $t$; given the persistence of fund expense ratios, it is a reliable predictor of the expense ratio at time $t+1$. Roussanov et al. (2021) further assume that the diseconomies of scale are logarithmic, $D(Q_{i,t}) = \eta \log(Q_{i,t})$. Thus, consider the perfectly competitive equilibrium of Berk and Green (2004), where the expected net alpha in equation (5) is zero. In that equilibrium, the efficient size of fund $i$ should satisfy

$$\log\left(Q^{BG}_{i,t}\right) = \frac{\hat{a}_{i,t+1} - p_{i,t}}{\eta},$$

where $\hat{a}_{i,t+1} - p_{i,t}$ is the net skill of fund $i$ at time $t+1$.

To estimate the expected skill for fund $i$ in year $t$, $\hat{a}_{i,t+1}$, we follow Zhu (2018). We average the fund's (annual) realized alphas before fees and diseconomies of scale since the fund's inception. We compute the diseconomies of scale as $D(Q_{i,t}) = \eta \log(Q_{i,t})$, where $\eta = 0.0048$, as estimated by Roussanov et al. (2021). $Q_{i,t}$ equals the assets under management of all of the fund's share classes at the end of year $t$, expressed in 2015 dollars.26

Fig. 10 illustrates capital misallocation for the decile portfolios generated by the four prediction methods. For the $j$th decile portfolio of funds ranked by predicted alpha, the horizontal axis gives the mean net skill, $E(\hat{a}_{i,t+1} - p_{i,t} \mid i \in D_j)$, where $D_j$ is the set of funds in the $j$th decile. The vertical axis gives the mean log size, $E(\log(Q_{i,t}) \mid i \in D_j)$. The colored lines plot the mean log size for each decile portfolio generated by OLS (orange stars), elastic net (yellow squares), gradient boosting (purple crosses), and random forests (green diamonds). For every method, the first decile portfolio has the lowest net skill and mean log size. We also plot the efficient (Berk-Green) log size, $\log(Q^{BG}_{i,t})$, for each level of net skill (straight black line).

Fig. 10 shows that mean net skill increases monotonically across the decile portfolios of all four prediction methods. That is, the four prediction methods identify managers with higher net skill. The figure also shows that mean fund size increases monotonically across the bottom nine decile portfolios. This is consistent with investors being generally able to identify funds with higher net skill. However, funds in the top decile of alpha predicted by all four methods manage, on average, substantially smaller portfolios than funds in the second-best decile. This pattern is particularly striking for the top decile of alpha predicted by the two nonlinear machine-learning methods (gradient boosting and random forests). Those funds are surprisingly small, similar in size to funds in the fourth decile from the bottom of the predicted alpha distribution.

These findings suggest that informational frictions prevent investors from identifying some of the funds whose managers have the highest net skill. These funds therefore remain small relative to their managers' skill. We compare the mean log size of the decile portfolios of the four prediction methods with the straight black line that depicts the efficient (Berk and Green) log size. Our findings are largely consistent with those of Roussanov et al. (2021), even though the two papers use different methodologies. Funds in the bottom 80% of the predicted net alpha distribution are “too large” for their estimated skill, while funds in the top 10% of the distribution are below their efficient size.

Overall, the findings in this section suggest that the conclusions of Roussanov et al. (2021) regarding capital misallocation in the U.S. mutual-fund industry are robust to the method of finding misallocated funds. Moreover, the findings provide an economic interpretation of our results. Nonlinear machine-learning methods help to select mutual funds because they can identify skilled managers, and also because they can identify managers whose skill is not sufficiently offset by diseconomies of scale. Our findings are consistent with a competition framework à la Berk and Green (2004) in which frictions prevent a substantial fraction of the investor population from identifying some of the funds whose managers have the highest skill, and thus, these funds

26 To adjust assets under management for inflation, we follow Roussanov et al. (2021) and multiply assets in year $t$ by the Consumer Price Index (CPI) at the end of 2015 divided by the CPI at the end of year $t$. We download data for CPI using the FRED series “Consumer Price Index for All Urban Consumers: All Items
in U.S. City Average, Index 1982-1984=100, Monthly, Seasonally Adjusted.”. remaining small relative to their manager’s skill.
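The inflation adjustment in footnote 26 is a single ratio of price levels. The following is a minimal Python sketch, assuming that "end of year" means the December observation of the monthly series (which we believe is FRED's CPIAUCSL, though the footnote names it only by title). The helper `to_2015_dollars` is hypothetical, and the CPI levels in the example are invented, not FRED data.

```python
def to_2015_dollars(nominal_aum, cpi_end_of_year, base_year=2015):
    """Convert {year: nominal AUM} into base-year (2015) dollars.

    Real AUM in year t = nominal AUM in year t * CPI(end of 2015) / CPI(end of t).
    `cpi_end_of_year` maps each year to its end-of-year CPI level (assumed here
    to be the December value of the monthly, seasonally adjusted index).
    """
    base_cpi = cpi_end_of_year[base_year]
    return {
        year: aum * base_cpi / cpi_end_of_year[year]
        for year, aum in nominal_aum.items()
    }


# Illustrative numbers only: these CPI levels are invented, not FRED data.
cpi = {2005: 200.0, 2015: 240.0}
real = to_2015_dollars({2005: 100.0, 2015: 50.0}, cpi)
print(real)  # {2005: 120.0, 2015: 50.0}
```

Assets from years with lower price levels are scaled up, and 2015 assets are left unchanged.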

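The chain from realized alphas to the efficient Berk-Green size in this section can be sketched in Python as below. This is a hypothetical illustration, not the authors' code. Only $\eta = 0.0048$ and the functional forms $D(Q) = \eta \log Q$ and $\log Q^{BG} = (\hat{a} - p)/\eta$ come from the text. The following are assumptions:

- the timing convention (adding back $\eta \log Q$ at the size the fund had entering each year);
- the helper names;
- the toy inputs.

Because $\eta$ is tied to the units in which $Q$ was measured when it was estimated, log sizes are only comparable when $Q$ uses those same units.

```python
import numpy as np

ETA = 0.0048  # diseconomies-of-scale parameter, as estimated by Roussanov et al. (2021)


def diseconomies(aum, eta=ETA):
    """D(Q) = eta * log(Q), with Q in constant (2015) dollars."""
    return eta * np.log(np.asarray(aum, dtype=float))


def expected_skill(gross_alphas, aum_entering_year, eta=ETA):
    """Average, from inception, of annual alphas before fees and diseconomies of scale.

    Assumption: the year-s observation is the before-fee alpha realized in year s
    plus D(Q) evaluated at the fund's size at the start of year s.
    """
    before_dos = np.asarray(gross_alphas, dtype=float) + diseconomies(aum_entering_year, eta)
    return float(before_dos.mean())


def efficient_log_size(net_skill, eta=ETA):
    """Berk-Green efficient log size: log Q^BG = (a_hat - p) / eta."""
    return net_skill / eta


def misallocation_gap(actual_aum, net_skill, eta=ETA):
    """log Q - log Q^BG: positive means "too large", negative means below efficient size."""
    return float(np.log(actual_aum)) - efficient_log_size(net_skill, eta)


# Toy fund (hypothetical inputs): three years of before-fee alphas and sizes.
a_hat = expected_skill([0.010, 0.006, 0.008], [1.5, 2.0, 2.5])
net_skill = a_hat - 0.009  # subtract the current expense ratio p_{i,t}
print(a_hat, efficient_log_size(net_skill), misallocation_gap(3.0, net_skill))
```

Here, a fund whose actual log size exceeds $\log Q^{BG}$ is "too large" in the sense used above. A fund in the top predicted-alpha decile with a negative gap is the kind of fund the section argues remains small relative to its manager's skill.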
V. DeMiguel, J. Gil-Bazo, F.J. Nogales, and A.A.P. Santos Journal of Financial Economics 150 (2023) 103737

Fig. 8. Time evolution of characteristic importance for gradient boosting. This figure plots the time evolution of the importance of each characteristic for
gradient boosting. We measure the importance of each characteristic as the average across all observations of the absolute SHAP value of the characteristic. We
scale characteristic importance so that it ranges between zero for the least important characteristic and 100 for the most important characteristic and report relative
importance for each year from 1980 to 2019.
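The importance measure in the captions of Figs. 8 and 9 is the mean absolute SHAP value of each characteristic, rescaled so that the least important characteristic scores 0 and the most important scores 100. It can be sketched as follows. The sketch makes three assumptions:

- SHAP values have already been computed for each year's observations (for example, with a tree explainer);
- the rescaling is linear min-max, which the captions do not specify;
- the function names are hypothetical.

```python
import numpy as np


def scaled_importance(shap_values):
    """Mean |SHAP| per characteristic, linearly rescaled to [0, 100].

    shap_values: array of shape (n_observations, n_characteristics) for one year.
    """
    raw = np.abs(np.asarray(shap_values, dtype=float)).mean(axis=0)
    lo, hi = raw.min(), raw.max()
    if hi == lo:
        # Degenerate case (all characteristics equally important): a convention
        # chosen for this sketch, not taken from the paper.
        return np.full_like(raw, 100.0)
    return 100.0 * (raw - lo) / (hi - lo)


def importance_by_year(shap_by_year):
    """Apply the rescaling separately to each year's SHAP matrix."""
    return {year: scaled_importance(values) for year, values in shap_by_year.items()}


# Toy SHAP matrix: 2 observations x 3 characteristics.
# Mean |SHAP| is [1, 2, 0.5], so the scaled values are about [33.3, 100, 0].
print(scaled_importance([[1.0, -2.0, 0.5], [-1.0, 2.0, 0.5]]))
```

Because the scale is reset every year, the figures show relative rankings within a year, not changes in the absolute size of SHAP contributions over time.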

7. Performance over time and across market conditions

Jones and Mo (2020) show that the ability of fund characteristics to predict performance has declined over time due to increased arbitrage activity and mutual-fund competition. Motivated by their work, we study how the alpha of the different portfolios varies over time. To do this, we compute the cumulative net alpha of the top-decile portfolio for gradient boosting, random forests, and OLS in each month of the out-of-sample period from 1991 to 2020, as well as those of the equally weighted and asset-weighted portfolios.27 Fig. 11 shows the time series of cumulative abnormal returns. The three prediction-based portfolios (gradient boosting, random forests, and OLS) outperform the two naive portfolios (equally weighted and asset weighted) over the whole 30-year out-of-sample period. In particular, while the gradient-boosting, random-forests, and OLS portfolios achieve cumulative net alphas of 69%, 78%, and 34%, respectively, the equally weighted and asset-weighted portfolios earn negative cumulative net alphas of −7% and −13%, respectively. Consistent with Jones and Mo (2020), however, the performance of the prediction-based portfolios is similar to that of the naive portfolios from 2012 until 2018. Nevertheless, all three prediction-based portfolios outperform the two naive portfolios in the last two years of our sample (2019 and 2020). In particular, while the gradient-boosting, random-forests, and OLS portfolios achieve cumulative (2019–2020) net alphas of 4.7%, 2.2%, and −0.1%, respectively, the equally weighted and asset-weighted portfolios earn negative cumulative net alphas of −2.8% and −3.9%, respectively.

Li and Rossi (2020) study whether the ability of mutual-fund holdings and stock characteristics to predict fund performance varies across market conditions. Inspired by their work, we now investigate whether the ability of fund characteristics to select funds with positive alpha changes across market conditions. Like Li and Rossi (2020), we condition estimates of performance on expansions and recessions, as well as on high and low investor sentiment. Specifically, we regress the out-of-sample monthly excess returns of the top-decile portfolios selected by gradient boosting and random forests on the Fama and French (2015) five factors and momentum, as well as on indicator variables for expansions and recessions, and for high and low investor sentiment. Expansions and recessions are defined following the NBER convention. The high (low) investor sentiment indicator equals one if investor sentiment, as defined in Baker and Wurgler (2006, 2007), is above (below) its median over the July 1965 to December 2020 period. To construct this indicator, we download from Jeffrey Wurgler's website the version of investor sentiment based on the first principal component of five sentiment proxies, where each of the proxies has first been orthogonalized with respect to six macroeconomic indicators. Table 7 reports estimated alphas for different market conditions and their standard errors with Newey-West adjustment for 12 lags. We also report differences in alphas across market conditions. Our main finding is that the gradient-boosting and random-forest portfolios achieve positive alphas across all market conditions, and although they perform better in recessions and in times of high investor sentiment, the differences in alpha across market conditions are not statistically significant.

27 We compute monthly net alphas as the portfolio excess returns net of all costs each month minus the product of the factor realizations in that month and the portfolio betas estimated over the whole out-of-sample period using the FF5 model augmented with momentum.

Fig. 9. Time evolution of characteristic importance for random forests. This figure plots the time evolution of the importance of each characteristic for random forests. We measure the importance of each characteristic as the average across all observations of the absolute SHAP value of the characteristic. We scale characteristic importance so that it ranges between zero for the least important characteristic and 100 for the most important characteristic and report relative importance for each year from 1980 to 2019.

Table 7
Out-of-sample alpha of fund portfolios under different market conditions. This table reports the monthly out-of-sample alphas (in %) net of all costs for the top-decile fund portfolios obtained with gradient boosting and random forests under different market conditions. Alphas are computed by regressing the out-of-sample excess monthly portfolio returns net of all costs against the Fama and French (2015) five factors and momentum as well as indicator variables for expansions and recessions (Panel A), and high and low investor sentiment (Panel B). Expansions and recessions are defined following the NBER convention. The high (low) investor sentiment indicator equals one if investor sentiment, as defined in Baker and Wurgler (2006, 2007), is above (below) the median of the July 1965 to December 2020 period. The out-of-sample period spans from January 1991 to December 2020. We report standard errors with Newey-West adjustment for 12 lags in parentheses. One, two, and three asterisks indicate that the alpha is significant at the 10%, 5%, and 1% level, respectively.

                     Panel A. Business Cycle                 Panel B. Investor Sentiment
                     Expansion   Recession   Exp. − Rec.     High       Low       High − Low
Gradient boosting    0.179**     0.375       −0.196          0.233***   0.150     0.083
                     (0.082)     (0.228)     (0.226)         (0.085)    (0.109)   (0.106)
Random forests       0.202**     0.445*      −0.243          0.266***   0.169     0.097
                     (0.087)     (0.248)     (0.236)         (0.102)    (0.118)   (0.131)
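The conditional alphas in Table 7 come from a time-series regression in which two complementary state indicators take the place of the intercept, so each indicator's coefficient is the alpha in that state. Below is a minimal NumPy sketch of that regression with Newey-West standard errors (Bartlett kernel, 12 lags), run once per panel. The following are assumptions, not details from the paper:

- running the two panels as separate regressions;
- the helper names;
- the absence of a small-sample correction;
- the simulated inputs.

Packaged HAC estimators may differ from this sketch in such details.

```python
import numpy as np


def newey_west_ols(y, X, lags=12):
    """OLS coefficients with a Newey-West (Bartlett-kernel) covariance matrix."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    xtx_inv = np.linalg.inv(X.T @ X)
    coef = xtx_inv @ (X.T @ y)
    scores = X * (y - X @ coef)[:, None]    # x_t * u_t, one row per month
    meat = scores.T @ scores                # lag-0 term
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1.0)   # Bartlett weight
        gamma = scores[lag:].T @ scores[:-lag]
        meat += weight * (gamma + gamma.T)
    return coef, xtx_inv @ meat @ xtx_inv   # no small-sample correction


def conditional_alphas(excess_returns, factors, in_state, lags=12):
    """State-specific alphas (e.g. expansion vs. recession) and their difference.

    The indicators 1{state} and 1{not state} replace the intercept, so their
    coefficients are the two alphas; factor loadings are common to both states.
    """
    in_state = np.asarray(in_state, dtype=bool)
    X = np.column_stack([in_state, ~in_state, np.asarray(factors, dtype=float)])
    coef, cov = newey_west_ols(excess_returns, X, lags)
    var_diff = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]
    return {
        "alpha_in_state": coef[0],
        "alpha_out_of_state": coef[1],
        "difference": coef[0] - coef[1],
        "se": np.sqrt(np.clip(np.diag(cov)[:2], 0.0, None)),
        "se_difference": float(np.sqrt(max(var_diff, 0.0))),  # guard against rounding
    }


# Simulated example: 360 months (the length of the 1991-2020 sample),
# six factors standing in for FF5 + momentum, and a made-up expansion flag.
rng = np.random.default_rng(0)
n_months = 360
factors = rng.normal(0.0, 4.0, size=(n_months, 6))
expansion = rng.random(n_months) > 0.15
returns = (0.2 * expansion + 0.4 * ~expansion
           + factors @ np.full(6, 0.3) + rng.normal(0.0, 1.0, n_months))
result = conditional_alphas(returns, factors, expansion)
print(result["alpha_in_state"], result["alpha_out_of_state"], result["se_difference"])
```

The standard error of the difference uses the covariance between the two alpha estimates, which is nonzero because both share the same factor loadings.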

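The asterisks in Table 7 can be checked against the reported coefficients and standard errors. The sketch below assumes two-sided tests with standard normal critical values; the caption states only the 10%, 5%, and 1% levels. Under that assumption, every entry reproduces the table's asterisks, including three asterisks for the random-forest high-sentiment alpha (t ≈ 2.61). The helper name is hypothetical.

```python
from statistics import NormalDist

# Two-sided critical values of the standard normal, strictest level first.
CRITICAL_VALUES = [
    (NormalDist().inv_cdf(1 - level / 2), mark)
    for level, mark in ((0.01, "***"), (0.05, "**"), (0.10, "*"))
]


def significance_stars(coef, se):
    """Asterisks for a coefficient given its (Newey-West) standard error."""
    t_stat = abs(coef / se)
    for critical, mark in CRITICAL_VALUES:
        if t_stat >= critical:
            return mark
    return ""


# Coefficients and standard errors as reported in Table 7.
table7 = {
    "GB expansion": (0.179, 0.082),
    "GB recession": (0.375, 0.228),
    "GB exp. - rec.": (-0.196, 0.226),
    "GB high sentiment": (0.233, 0.085),
    "GB low sentiment": (0.150, 0.109),
    "GB high - low": (0.083, 0.106),
    "RF expansion": (0.202, 0.087),
    "RF recession": (0.445, 0.248),
    "RF exp. - rec.": (-0.243, 0.236),
    "RF high sentiment": (0.266, 0.102),
    "RF low sentiment": (0.169, 0.118),
    "RF high - low": (0.097, 0.131),
}
for name, (coef, se) in table7.items():
    print(f"{name:18s} {coef:7.3f} t={coef / se:5.2f} {significance_stars(coef, se)}")
```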

Fig. 10. Capital misallocation and machine learning. This figure illustrates capital misallocation for the decile portfolios generated by the four prediction methods we consider. For the $j$th decile portfolio of funds ranked by predicted alpha, the horizontal axis gives the mean net skill, $E(\hat{a}_{i,t+1} - p_{i,t} \mid i \in D_j)$, where $D_j$ is the set of funds in the $j$th decile, and the vertical axis gives the mean log size, $E(\log Q_{i,t} \mid i \in D_j)$. The colored lines plot the mean log size for each decile portfolio generated by OLS (orange stars), elastic net (yellow squares), gradient boosting (purple crosses), and random forests (green diamonds). For every method, the first decile portfolio has the lowest mean net skill and the lowest mean log size. We also plot the efficient (Berk-Green) log size, $\log Q^{BG}_{i,t}$, for each level of net skill (straight black line). Net skill is the average of past realized alpha before fees and diseconomies of scale, estimated using the approach of Zhu (2018), minus the current expense ratio. Diseconomies of scale are computed following Roussanov et al. (2021) as the log of fund size multiplied by the diseconomies-of-scale parameter, $\eta = 0.0048$, as estimated by Roussanov et al. (2021). The efficient (Berk and Green) log fund sizes for each level of skill are computed by dividing net skill by the diseconomies-of-scale parameter, $\eta$.
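The coordinates plotted in Fig. 10 are decile means. The Python sketch below takes one pooled cross-section of funds and sorts it by predicted alpha into ten groups of (nearly) equal size. For each group, it reports mean net skill, mean log size, and the efficient log size implied by the mean net skill. The following are assumptions of this sketch, not details from the paper:

- pooling the cross-section;
- the equal-count split via `np.array_split`;
- evaluating $\log Q^{BG}$ at the decile's mean net skill.

Because $\log Q^{BG}$ is linear in net skill, this last choice equals the decile mean of fund-level efficient log sizes. The inputs are simulated.

```python
import numpy as np

ETA = 0.0048


def decile_summary(pred_alpha, net_skill, log_size, n_groups=10, eta=ETA):
    """Rows of (decile, mean net skill, mean log size, efficient log size).

    Decile 1 holds the funds with the lowest predicted alpha.
    """
    pred_alpha = np.asarray(pred_alpha, dtype=float)
    net_skill = np.asarray(net_skill, dtype=float)
    log_size = np.asarray(log_size, dtype=float)
    order = np.argsort(pred_alpha, kind="stable")
    rows = []
    for decile, idx in enumerate(np.array_split(order, n_groups), start=1):
        mean_skill = float(net_skill[idx].mean())
        rows.append((decile, mean_skill, float(log_size[idx].mean()), mean_skill / eta))
    return rows


# Hypothetical cross-section of 200 funds (simulated, not the paper's data).
rng = np.random.default_rng(0)
pred = rng.normal(size=200)
skill = 0.002 * pred + rng.normal(scale=0.002, size=200)
size = 5.0 + rng.normal(size=200)
for row in decile_summary(pred, skill, size):
    print(row)
```

Plotting the third element of each row against the second gives one colored line in Fig. 10, and the fourth element against the second traces the straight Berk-Green line.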

Fig. 11. Cumulative portfolio alpha. This figure plots the time series of cumulative out-of-sample portfolio realized alphas of the excess returns net of all costs
of the top-decile fund portfolios. Realized portfolio alphas are based on the regressions on the five Fama-French factors augmented with momentum (FF5+MOM).
Portfolios are obtained with gradient boosting (GB), random forests (RF), OLS, and with two naive strategies (equally weighted (EW) and asset-weighted (AW)
portfolios of all available funds).
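Footnote 27 describes how the realized alphas cumulated in Fig. 11 are built. Factor loadings are estimated once over the whole out-of-sample period with the FF5 + momentum model. Each month's net alpha is then the portfolio's net excess return minus the loadings times that month's factor realizations. The NumPy sketch below assumes that cumulation is a simple running sum; the footnote does not say whether alphas are summed or compounded. The function names and the simulated data are hypothetical.

```python
import numpy as np


def realized_alphas(excess_returns, factors):
    """Monthly realized alphas: r_t - beta' f_t, with beta fit over the whole sample."""
    y = np.asarray(excess_returns, dtype=float)
    F = np.asarray(factors, dtype=float)
    X = np.column_stack([np.ones(len(y)), F])    # intercept + FF5 + MOM
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    betas = coef[1:]                              # drop the intercept
    return y - F @ betas


def cumulative_alpha(excess_returns, factors):
    """Running sum of realized alphas (assumed: summed, not compounded)."""
    return np.cumsum(realized_alphas(excess_returns, factors))


# Simulated portfolio: 360 months, six factors, 0.2% true monthly alpha.
rng = np.random.default_rng(0)
months = 360
factors = rng.normal(0.0, 0.04, size=(months, 6))
net_excess = 0.002 + factors @ np.full(6, 0.2) + rng.normal(0.0, 0.01, months)
path = cumulative_alpha(net_excess, factors)
print(path[-1], path[-1] - path[-25])  # whole sample; last 24 months
```

Because the regression includes an intercept, the OLS residuals sum to zero. The final cumulative value therefore equals the number of months times the estimated monthly intercept. Under the same summation assumption, a sub-period figure such as the 2019–2020 cumulative alpha is the sum of the last 24 monthly values.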


8. Conclusions

The question of whether mutual-fund investors can earn positive net alpha by investing in active mutual funds has received much attention from academics, practitioners, and regulators. We posit that the pessimistic results that dominate the literature could be a consequence of the methods employed to exploit predictability in fund performance. In particular, we show that machine-learning methods can dynamically identify and exploit nonlinearities and interactions in the relation between fund characteristics and performance, and help investors to select funds that earn significant and positive alphas net of fees and transaction costs. The machine-learning methods reveal that the interactions between measures of past performance and fund activeness help to predict future fund performance. Our results demonstrate that investors can benefit from actively managed mutual funds, but only if they have access to sophisticated predictions that allow flexibility in the relation between fund characteristics and performance.

To understand the economic mechanism behind our results, we study whether the performance of our portfolios can be explained by capital misallocation in the mutual-fund market, and find that machine learning indeed selects funds that are small relative to their managers' skill, consistent with informational frictions preventing some investors from identifying the outperforming funds. Our work implies that there is scope for pension-plan administrators and financial advisors to integrate machine learning with other tools in order to help investors select active mutual funds with positive alpha.

Finally, our finding that mutual-fund characteristics that do not require information on fund portfolio holdings are enough to predict positive alpha implies that, even if no information on portfolio holdings had been available during our sample period, our methods would have identified funds with positive net alpha on average. This is relevant to the debate around the recent SEC proposal to raise the asset threshold for mandatory portfolio disclosure.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The code and data packet for this article can be found at https://doi.org/10.17632/rpgb99m5zy.3: Machine Learning and Fund Characteristics Help to Select Mutual Funds with Positive Alpha (Original Data) (Mendeley Data).

Acknowledgements

A.P. Santos gratefully acknowledge the support of the Agencia Estatal de Investigación (Spain) under grant PID2022-138289NB-I00.

Appendix A. Supplementary material

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.jfineco.2023.103737.

References

Aliaj, O., 2020. Most hedge funds to be allowed to keep equity holdings secret. Financial Times, July 11. https://www.ft.com/content/c68ca89c-3f9b-45f9-8205-6dbea70ed859.
Amihud, Y., Goyenko, R., 2013. Mutual fund's R² as predictor of performance. Rev. Financ. Stud. 26 (3), 667–694.
Aragon, G.O., Hertzel, M., Shi, Z., 2013. Why do hedge funds avoid disclosure? Evidence from confidential 13F filings. J. Financ. Quant. Anal. 48 (5), 1499–1518.
Avramov, D., Wermers, R., 2006. Investing in mutual funds when returns are predictable. J. Financ. Econ. 81 (2), 339–377.
Baker, M., Wurgler, J., 2006. Investor sentiment and the cross-section of stock returns. J. Finance 61 (4), 1645–1680.
Baker, M., Wurgler, J., 2007. Investor sentiment in the stock market. J. Econ. Perspect. 21 (2), 129–152.
Baks, K.P., Metrick, A., Wachter, J., 2001. Should investors avoid all actively managed mutual funds? A study in Bayesian performance evaluation. J. Finance 56 (1), 45–85.
Bali, T.G., Beckmeyer, H., Moerke, M., Weigert, F., 2023. Option return predictability with machine learning and big data. Rev. Financ. Stud. 36 (9), 3548–3602.
Banegas, A., Gillen, B., Timmermann, A., Wermers, R., 2013. The cross section of conditional mutual fund performance in European stock markets. J. Financ. Econ. 108 (3), 699–726.
Barras, L., Scaillet, O., Wermers, R., 2010. False discoveries in mutual fund performance: measuring luck in estimated alphas. J. Finance 65 (1), 179–216.
Bergmeir, C., Hyndman, R.J., Koo, B., 2018. A note on the validity of cross-validation for evaluating autoregressive time series prediction. Comput. Stat. Data Anal. 120, 70–83.
Berk, J., Green, R., 2004. Mutual fund flows and performance in rational markets. J. Polit. Econ. 112 (6), 1269–1295.
Berk, J.B., Van Binsbergen, J.H., 2015. Measuring skill in the mutual fund industry. J. Financ. Econ. 118 (1), 1–20.
Bianchi, D., Büchner, M., Tamoni, A., 2021. Bond risk premiums with machine learning. Rev. Financ. Stud. 34 (2), 1046–1089.
Bollen, N.P., Busse, J.A., 2005. Short-term persistence in mutual fund performance. Rev. Financ. Stud. 18 (2), 569–597.
Breiman, L., 2001. Random forests. Mach. Learn. 45 (1), 5–32.
Bryzgalova, S., Pelger, M., Zhu, J., 2019. Forest Through the Trees: Building Cross-Sections of Stock Returns. Available at SSRN 3493458.
Busse, J.A., Irvine, P.J., 2006. Bayesian alphas and mutual fund persistence. J. Finance 61 (5), 2251–2288.
Butaru, F., Chen, Q., Clark, B., Das, S., Lo, A.W., Siddique, A., 2016. Risk and risk management in the credit card industry. J. Bank. Finance 72, 218–239.
Carhart, M.M., 1997. On persistence in mutual fund performance. J. Finance 52 (1), 57–82.
Chan, L.K., Chen, H.-L., Lakonishok, J., 2002. On mutual fund investment styles. Rev. Financ. Stud. 15 (5), 1407–1437.
Chen, J., Hong, H., Huang, M., Kubik, J.D., 2004. Does fund size erode mutual fund performance? The role of liquidity and organization. Am. Econ. Rev. 94 (5), 1276–1302.
Chen, L., Pelger, M., Zhu, J., 2020a. Deep learning in asset pricing. Manag. Sci. https://doi.org/10.1287/mnsc.2023.4695. Forthcoming.
Chen, T., He, T., Benesty, M., Khotilovich, V., Tang, Y., Cho, H., Chen, K., Mitchell, R., Cano, I., Zhou, T., Li, M., Xie, J., Lin, M., Geng, Y., Li, Y., 2020b. xgboost: Extreme gradient boosting. R package version 1.2.0.1.
Chiang, W.-C., Urban, T.L., Baldridge, G.W., 1996. A neural network approach to mutual fund net asset value forecasting. Omega 24 (2), 205–215.
Coulombe, P.G., Leroux, M., Stevanovic, D., Surprenant, S., 2020. How is machine learning useful for macroeconomic forecasting? Available at arXiv:2008.12477.
Cremers, K.M., Petajisto, A., 2009. How active is your fund manager? A new measure that predicts performance. Rev. Financ. Stud. 22 (9), 3329–3365.
Cremers, M., Petajisto, A., Zitzewitz, E., 2013. Should benchmark indices have alpha? Revisiting performance evaluation. Crit. Finance Rev. 2 (1), 001.
Dumitrescu, A., Gil-Bazo, J., 2018. Market frictions, investor sophistication, and persistence in mutual fund performance. J. Financ. Mark. 40, 40–59.
Elliott, G., Gargano, A., Timmermann, A., 2013. Complete subset regressions. J. Econom. 177 (2), 357–373.
Elton, E.J., Gruber, M.J., Blake, C.R., 2001. A first look at the accuracy of the CRSP mutual fund database and a comparison of the CRSP and Morningstar mutual fund databases. J. Finance 56 (6), 2415–2430.
Elton, E.J., Gruber, M.J., Blake, C.R., 2011. Holdings data, security returns, and the selection of superior mutual funds. J. Financ. Quant. Anal. 46 (2), 341–367.
Evans, R.B., 2010. Mutual fund incubation. J. Finance 65 (4), 1581–1611.
Fama, E.F., French, K.R., 1993. Common risk factors in the returns on stocks and bonds. J. Financ. Econ. 33 (1), 3–56.
Fama, E.F., French, K.R., 2010. Luck versus skill in the cross-section of mutual fund returns. J. Finance 65 (5), 1915–1947.
Fama, E.F., French, K.R., 2015. A five-factor asset pricing model. J. Financ. Econ. 116 (1), 1–22.
Feng, G., Polson, N.G., Xu, J., 2020. Deep Learning in Characteristics-Sorted Factor Models. Available at SSRN 3243683.
Ferreira, M.A., Keswani, A., Miguel, A.F., Ramos, S.B., 2013. The determinants of mutual fund performance: a cross-country study. Rev. Finance 17 (2), 483–525.
Friedman, J., Hastie, T., Tibshirani, R., 2010. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33 (1), 1–22.
Garcia, M.G., Medeiros, M.C., Vasconcelos, G.F., 2017. Real-time inflation forecasting with high-dimensional models: the case of Brazil. Int. J. Forecast. 33 (3), 679–693.
Gibbons, M.R., Ross, S.A., Shanken, J., 1989. A test of the efficiency of a given portfolio. Econometrica 57, 1121–1152.


Gittelsohn, J., 2019. End of era: passive equity funds surpass active in epic shift. Bloomberg, September 11. https://www.bloomberg.com/news/articles/2019-09-11/passive-u-s-equity-funds-eclipse-active-in-epic-industry-shift.
Green, J., Hand, J.R., Zhang, X.F., 2017. The characteristics that provide independent information about average U.S. monthly stock returns. Rev. Financ. Stud. 30 (12), 4389–4436.
Gruber, M.J., 1996. Another puzzle: the growth in actively managed mutual funds. J. Finance 51 (3), 783–810.
Gu, S., Kelly, B., Xiu, D., 2020. Empirical asset pricing via machine learning. Rev. Financ. Stud. 33 (5), 2223–2273.
Gupta-Mukherjee, S., 2014. Investing in the "new economy": mutual fund performance and the nature of the firm. J. Financ. Quant. Anal. 49 (1), 165–191.
Hastie, T., Tibshirani, R., Friedman, J., 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.
Hou, K., Xue, C., Zhang, L., 2015. Digesting anomalies: an investment approach. Rev. Financ. Stud. 28 (3), 650–705.
Hunter, D., Kandel, E., Kandel, S., Wermers, R., 2014. Mutual fund performance evaluation with active peer benchmarks. J. Financ. Econ. 112 (1), 1–29.
Indro, D.C., Jiang, C., Patuwo, B., Zhang, G., 1999. Predicting mutual fund performance using artificial neural networks. Omega 27 (3), 373–380.
Jensen, M.C., 1968. The performance of mutual funds in the period 1945–1964. J. Finance 23 (2), 389–416.
Jones, C.S., Mo, H., 2020. Out-of-sample performance of mutual fund predictors. Rev. Financ. Stud. 34 (1), 149–193.
Jones, C.S., Shanken, J., 2005. Mutual fund performance with learning across funds. J. Financ. Econ. 78 (3), 507–552.
Kacperczyk, M., Nieuwerburgh, S.V., Veldkamp, L., 2014. Time-varying fund manager skill. J. Finance 69 (4), 1455–1484.
Kaniel, R., Lin, Z., Pelger, M., Van Nieuwerburgh, S., 2023. Machine-learning the skill of mutual fund managers. J. Financ. Econ. 150 (1), 94–138.
LeDell, E., Gill, N., Aiello, S., Fu, A., Candel, A., Click, C., Kraljevic, T., Nykodym, T., Aboyoun, P., Kurka, M., Malohlava, M., 2020. H2O: R interface for the 'H2O' scalable machine learning platform. R package version 3.30.1.3.
Li, B., Rossi, A.G., 2020. Selecting mutual funds from the stocks they hold: A machine learning approach. Available at SSRN 3737667.
Liaw, A., Wiener, M., 2002. Classification and regression by random forest. R News 2 (3), 18–22.
Lundberg, S.M., Erion, G., Chen, H., DeGrave, A., Prutkin, J.M., Nair, B., Katz, R., Himmelfarb, J., Bansal, N., Lee, S.-I., 2020. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2 (1), 56–67.
Lundberg, S.M., Lee, S.-I., 2017. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 30, 1–10.
Mamaysky, H., Spiegel, M., Zhang, H., 2008. Estimating the dynamics of mutual fund alphas and betas. Rev. Financ. Stud. 21 (1), 233–264.
McLean, R.D., Pontiff, J., 2016. Does academic research destroy stock return predictability? J. Finance 71 (1), 5–32.
Medeiros, M.C., Vasconcelos, G.F., Veiga, Á., Zilberman, E., 2021. Forecasting inflation in a data-rich environment: the benefits of machine learning methods. J. Bus. Econ. Stat. 39 (1), 1–22.
Mehta, D., Desai, D., Pradeep, J., 2020. Machine learning fund categorizations. Available at arXiv:2006.00123.
Molnar, C., 2019. Interpretable Machine Learning. Lulu.com. https://christophm.github.io/interpretable-ml-book/.
Moreno, D., Marco, P., Olmeda, I., 2006. Self-organizing maps could improve the classification of Spanish mutual funds. Eur. J. Oper. Res. 174 (2), 1039–1054.
Pástor, L., Stambaugh, R.F., 2002. Investing in equity mutual funds. J. Financ. Econ. 63 (3), 351–380.
Pástor, L., Stambaugh, R.F., 2003. Liquidity risk and expected stock returns. J. Polit. Econ. 111 (3), 642–685.
Pattarin, F., Paterlini, S., Minerva, T., 2004. Clustering financial time series: an application to mutual funds style analysis. Comput. Stat. Data Anal. 47 (2), 353–372.
Pedersen, L.H., 2022. Big data asset pricing 5: Machine learning in asset pricing. Available at SSRN 4068797.
Rakowski, D., 2010. Fund flow volatility and performance. J. Financ. Quant. Anal. 45 (1), 223–237.
Rapach, D.E., Strauss, J.K., Zhou, G., 2013. International stock return predictability: what is the role of the United States? J. Finance 68 (4), 1633–1662.
Rossi, A.G., Utkus, S.P., 2020. Who Benefits from Robo-Advising? Evidence from Machine Learning. Available at SSRN 3552671.
Roussanov, N., Ruan, H., Wei, Y., 2021. Marketing mutual funds. Rev. Financ. Stud. 34 (6), 3045–3094.
Schapire, R.E., Freund, Y., 2012. Boosting: Foundations and Algorithms. MIT Press.
Sharpe, W.F., 1966. Mutual fund performance. J. Bus. 39 (1), 119–138.
Shi, Z., 2017. The impact of portfolio disclosure on hedge fund performance. J. Financ. Econ. 126 (1), 36–53.
Stambaugh, R.F., Yuan, Y., 2017. Mispricing factors. Rev. Financ. Stud. 30 (4), 1270–1315.
Wermers, R., 2000. Mutual fund performance: an empirical decomposition into stock-picking talent, style, transactions costs, and expenses. J. Finance 55 (4), 1655–1695.
Wu, W., Chen, J., Yang, Z., Tindall, M.L., 2021. A cross-sectional machine learning approach for hedge fund return prediction and selection. Manag. Sci. 67 (7), 4577–4601.
Zhu, M., 2018. Informative fund size, managerial skill, and investor rationality. J. Financ. Econ. 130 (1), 114–134.
Zou, H., Hastie, T., 2005. Regularization and variable selection via the elastic net. J. R. Stat. Soc., Ser. B, Stat. Methodol. 67 (2), 301–320.
