Moving the Goalposts: Mutual Fund Benchmark Changes and Relative Performance Manipulation
Abstract
∗We thank Lauren Cohen (our editor) and two anonymous referees for a very constructive set of comments.
For helpful feedback, we thank Vikas Agarwal, Itzhak Ben-David, Andriy Bodnaruk, Sugata Ray, William
O’Brien (discussant), Jon Fulkerson (discussant), George Aragon (discussant), Berk Sensoy, Yuehua Tang,
and seminar participants at the University of Central Florida, University of Arizona, University of California
Riverside, Michigan State University, University of Toledo, the University of Texas at Dallas, Naresuan
University, the Securities and Exchange Commission (SEC), the Center for Financial Research Cologne,
the Arizona/ASU Finance Conference, the Behavioural Finance Working Group annual conference, the 2023
Spring Chicago Quantitative Alliance Meeting, the 2022 Financial Management Association Annual Meeting,
and the 2022 Midwestern Finance Association conference. Nesrine Karout and Scott Jones provided helpful
research assistance. This article was previously titled “Benchmark Backdating by Mutual Funds”. Mullally
is with the University of Central Florida and Rossi is with the University of Arizona. The authors have no
conflict of interest to disclose. Emails: [email protected] and [email protected].
1 Introduction
A large percentage of households invest their wealth via intermediaries such as mutual
funds.1 Extant research has documented that mutual fund investors base their capital al-
location decisions on funds’ past performance (e.g., Sirri and Tufano, 1998; Ivković and
Weisbenner, 2009; Barber, Huang, and Odean, 2016). Moreover, investors use readily avail-
able information (e.g., Sensoy, 2009; Kaniel and Parham, 2017; Evans and Sun, 2021) and
relatively simple performance measures (e.g., Del Guercio and Tkac, 2008; Elton, Gruber,
and Blake, 2014) to make these decisions. Combined, these findings suggest that fund man-
agers have an incentive to manipulate the performance information they present to investors.
In this paper we explore whether and how mutual funds take actions, ex-post, to alter the ap-
pearance of the performance information investors receive, and whether such actions “work”
in attracting investor flows.
Securities and Exchange Commission (SEC) Rule 33-6988 requires mutual funds to dis-
close at least one “appropriate” broad-based market index serving as a performance bench-
mark. Specifically, funds are required to provide comparisons of their past 1-, 5-, and 10-year
returns to those of at least one self-designated benchmark index. The SEC’s stated ratio-
nale for this requirement is to help investors evaluate “how much value the management of
the fund added by showing whether the fund outperformed or underperformed the market.”
Given this rationale, it is perhaps surprising that the rule allows funds to add and remove
benchmark indexes with little justification, and does not prohibit funds from comparing their
past returns to those of newly-chosen index(es) rather than to the returns of the index(es)
they selected at the time the returns were generated. In essence, a side effect of this rule is
that it allows funds to alter the appearance of the benchmark-adjusted returns (henceforth,
“BAR”) they present to investors simply by changing their benchmarks.2
1 According to the 2016 Federal Reserve Survey of Consumer Finances, only 13% of surveyed households
directly invested in the stock market while 52% of households had stock market exposure through an inter-
mediary vehicle such as a mutual fund. The report can be found at https://www.federalreserve.gov/
publications/2017-September-changes-in-us-family-finances-from-2013-to-2016.htm.
2 We discuss the rules governing mutual funds’ benchmark choices and changes in Section 2 below.
We present the Advisors’ Inner Circle Cambiar Opportunity Fund as an example of
these benchmark changes. In 2016, the fund’s only benchmark was the S&P 500. In its
2017 prospectus, filed on March 15, 2018, the fund added the Russell 1000 Value Index.
The 2017 returns for the Russell 1000 Value and S&P 500 indexes were 13.66% and 21.83%,
respectively. Comparing the fund’s performance to the Russell 1000 Value leads to an 8.17%
improvement in the fund’s BAR. The fund deleted references to the S&P 500 in its 2018
prospectus, which means that investors viewing the fund’s prospectuses from 2018 onward
would not know that the benchmark had been changed.3
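The arithmetic behind this example is simple enough to sketch. A minimal illustration using the figures above (the function name is ours):

```python
def bar_improvement(old_benchmark_ret, new_benchmark_ret):
    """Change in benchmark-adjusted return (BAR) from switching benchmarks.

    The fund's own return cancels out: BAR under the new benchmark minus
    BAR under the old one equals the old benchmark's return minus the
    new benchmark's return.
    """
    return old_benchmark_ret - new_benchmark_ret

# 2017 index returns from the Cambiar Opportunity Fund example (percent)
sp500_2017 = 21.83              # S&P 500
russell_1000_value_2017 = 13.66  # Russell 1000 Value

print(round(bar_improvement(sp500_2017, russell_1000_value_2017), 2))  # 8.17
```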
In this study, we examine changes in mutual funds’ self-selected benchmarks using funds’
actual SEC disclosures. Unlike prior studies that rely solely on Morningstar Direct for
information on funds’ benchmarks, we download and parse funds’ prospectuses and summary
prospectuses (i.e., forms 485BPOS and 497K, respectively) from the SEC EDGAR website
to obtain information on their benchmark indexes. The raw data from funds’ SEC filings
allows us to observe all of the benchmark indexes that appear on funds’ prospectuses each
year and thus detect any changes. Our data reveals that benchmark changes are common.
We find that 1,050 out of 2,870 funds (36.5%) made changes to their prospectus benchmarks
at least once during our sample period of 2006 to 2018. For funds that make at least one
benchmark change, the mean (median) number of changes is 2.27 (2) per fund. Benchmark
changes occur in 6.85% of all fund-year observations. Obtaining the data directly from
funds’ disclosures has additional advantages; for instance, we can read funds’ discussion of
benchmark changes and we can detect the presence of multiple benchmarks. Indeed, we find
that 39.4% of all fund-year prospectus observations contain more than one benchmark index.
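The change-detection step described above amounts to comparing the set of benchmarks reported in consecutive prospectus years. A minimal sketch with illustrative data mirroring the Cambiar example (the data structure and names are ours, not the paper's actual parsing pipeline):

```python
def benchmark_changes(panel):
    """Given {year: set of benchmark names} for one fund, return the
    indexes added and dropped between consecutive prospectus years."""
    changes = {}
    years = sorted(panel)
    for prev, curr in zip(years, years[1:]):
        added = panel[curr] - panel[prev]
        dropped = panel[prev] - panel[curr]
        if added or dropped:
            changes[curr] = {"added": sorted(added), "dropped": sorted(dropped)}
    return changes

# Illustrative fund: adds the Russell 1000 Value in 2017, drops the
# S&P 500 in 2018
panel = {
    2016: {"S&P 500"},
    2017: {"S&P 500", "Russell 1000 Value"},
    2018: {"Russell 1000 Value"},
}
print(benchmark_changes(panel))
```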
Our main hypothesis is that funds take advantage of the loophole in SEC rules by strate-
gically changing their benchmarks to improve the appearance of their BAR. We first examine
whether funds have incentives to increase BAR by estimating various flow-performance re-
gressions. These regressions reveal a positive and statistically significant relation between
3 The performance tables from this fund’s prospectus filings as well as several other examples can be found
in Appendix B.
fund flows and funds’ BAR. Specifically, a one standard deviation increase in BAR is asso-
ciated with a 4.63% to 7.80% increase in annual fund flows. Controlling for BAR and other
known determinants of fund flows, funds that choose benchmarks with higher returns and
a lower tracking error relative to their self-selected benchmarks attract incrementally more
flows. These results suggest that, ceteris paribus, funds also have incentives to choose bench-
marks associated with “hotter” investment styles and those that more accurately reflect their
investment strategies. Combined, these results suggest that benchmark choices are complex
and funds may make these decisions dynamically, based on their particular incentives and
circumstances.
To test the hypothesis that funds make benchmark changes to increase BAR, we compare
the past returns of the benchmarks funds add or drop to the returns of several control groups.
Our main finding is that funds add indexes with low past returns and drop indexes with
high past returns, which leads to systematic decreases in the past benchmark returns they
report. These changes have economically meaningful effects on the benchmark index returns
presented to investors. For example, funds add indexes with 2.39% lower 5-year returns than
their existing benchmark indexes, and 5.56% lower 5-year returns than the returns of the
index that best matches their strategy. These results are highly statistically significant, and
we confirm that they are not driven by the past return horizon considered nor by outliers.
Further, the fraction of cases in which a benchmark addition or deletion increases a fund’s
BAR ranges from 54% to 66%.
We next examine the determinants of funds making benchmark changes and find evidence
consistent with the incentives implied by our flow-performance regressions. Funds with
lower past BAR, lower past flows, and higher tracking errors are more likely to change their
benchmarks. In short, our results indicate that many funds’ benchmark changes improve the
appearance of their past relative performance, and that funds with the greatest incentive to
make this type of benchmark change are the most likely to do so.
Although a majority (54% to 66%) of benchmark changes result in an increase of funds’
BAR, a sizable group of benchmark changes do not lead to this outcome. We next investigate
why some funds make changes that do not increase BAR. Our analysis of the flow-based in-
centives surrounding funds’ benchmark indexes suggests that these choices are multi-faceted
and that funds benefit from having indexes that better match their current investment strat-
egy and indexes that reflect “hotter” investment styles. Consistent with these incentives, the
two most common reasons funds provide for making benchmark changes in their prospec-
tuses are i) choosing an index that “better reflects the fund’s investment strategy” and ii)
a concurrent change in investment style. Indeed, we find evidence that benchmark changes
that do not increase BAR are consistent with these rationales.
Our final set of tests examines the consequences of benchmark changes. Funds making
benchmark changes that increase BAR receive positive abnormal flows during the five-year
period after the change. These extra flows represent 9.6 to 11.4% of fund size, or $67.0 to
$79.8 million, for the average fund. Lastly, we find that investors who allocate addi-
tional capital to these funds are adversely affected, as funds that change their benchmarks
generate about 0.5% per year lower Fama and French (1993) three-factor alphas, net of fees,
in the subsequent five years than funds that do not change their benchmarks. Overall, our
study provides evidence that i) funds change their benchmarks in ways consistent with their
flow-based incentives, ii) these benchmark changes help funds attract flows, and iii) fund
investors are adversely affected by responding to these benchmark changes.
In sum, our study has implications for regulators, investors, and academics. For regu-
lators, our results suggest that a significant number of funds take advantage of SEC regu-
lation that allows them to change their benchmarks by choosing indexes that improve the
appearance of their past performance. These actions appear to conflict with the stated
purpose of disclosure, which is to increase transparency for investors. Recent rule changes
that shorten mutual funds’ reports to shareholders could make these performance compar-
isons more salient for retail investors. Our results suggest that this increased salience could
increase funds’ incentives to choose benchmark indexes with lower past returns and, as a
result, may distort investors’ capital allocation decisions.4 Regulators could amend the rules
on disclosure to require funds to compare their past performance only to the benchmark
indexes they cited during the performance period. Such a change would create little, if any,
additional costs for funds while drastically improving the quality of the disclosures provided
to investors. Moreover, such a rule amendment would still allow funds to choose benchmarks
that more accurately reflect their investment strategies in a forward-looking sense. For in-
vestors, our results prescribe caution when consuming the performance information mutual
funds publish in their prospectuses and related sales material.
For academics, our paper contributes to multiple strands of the forensic finance literature
(Griffin and Kruger, 2023). Most broadly, we contribute to the literature on the strategic
behaviors mutual funds engage in to attract flows from investors. Several studies document
that funds choose benchmarks that are mismatched relative to their investment strategies
(Sensoy, 2009; Elton, Gruber, and Blake, 2014; Chen, Cohen, and Gurun, 2021; Cremers,
Fulkerson, and Riley, 2020). A key contribution of our study is that we document fund
managers often change their benchmarks and do so in a systematic, strategic manner based
on the past returns of these indexes. We note that the benchmark changes we document can
affect the appearance of a fund’s BAR for up to 10 years. The long-lasting nature of this
effect is important given the notion that investors’ capital allocations to active managers
likely depend on the value they perceive these managers add relative to passive investing.
Other studies have found strategic behavior at the family level in the form of cross-fund
subsidization (Gaspar, Massa, and Matos, 2006) and fund incubation (Evans, 2010).
Second, we also contribute to the literature on fund performance manipulation. Cici,
Gibson, and Merrick Jr (2011) find evidence that bond mutual funds take advantage of
holding relatively harder-to-value bonds to smooth their returns. Similarly, Emin and James
(2022) find that mutual funds trading bank loans smooth their returns. Unlike these studies,
4 See the press release announcing the format change at: https://www.sec.gov/news/
press-release/2020-172 and the proposed format at: https://www.sec.gov/files/final_2020_
im_annual-shareholder%20report.pdf.
which show behaviors that directly affect the returns they present to investors, we document
a way funds alter the framing in which fund performance is viewed by investors after the
performance has been generated.
Finally, we add to the literature documenting ways funds strategically respond to manda-
tory disclosure requirements. For instance, funds window dress their portfolio holdings
(Lakonishok, Shleifer, Thaler, and Vishny, 1991; Musto, 1999; Meier and Schaumberg, 2004;
Agarwal, Gay, and Ling, 2014), opportunistically change their fund names (Cooper, Gulen,
and Rau, 2005), and withhold the names of certain portfolio managers (Massa, Reuter, and
Zitzewitz, 2010). Our study shows that managers take advantage of disclosure regulation by
retroactively choosing benchmarks that exaggerate the value they add for investors.
2 Institutional Details
SEC Rule 33-6988 governs the disclosure of mutual funds’ performance information. Item
5A describes how funds must compare their past returns to those of a benchmark index. In
its discussion of the rule, the SEC notes that the “index comparison requirement is designed
to show how much value the management of the fund added by showing whether the fund
‘out-performed’ or ‘under-performed’ the market.” As such, Item 5A(b) requires funds to
provide both visual and tabular representations of this value added. Funds must provide
a line graph displaying the account values a fund investor would have in the most recently
completed ten fiscal years if she invested in the fund and the benchmark index(es) and a
table containing the average annual total returns for the past one, five, and ten year periods
for the fund and its chosen benchmark index(es).
Instruction 7 of Item 5A provides the only details on the criteria by which funds should
select an “appropriate broad-based securities market index.” The parameters governing this
choice are that the index 1) should not be administered by an affiliate of the fund, 2) must
be adjusted to reflect the reinvestment of dividends but not the fund’s expenses, and 3) must
cover the entire past ten-year period. Instruction 8 of Item 5A is titled “Use of Additional
indexes” and urges funds to compare their performance to “other, more narrowly-based
indexes which reflect the market sectors in which they invest.” The rule also allows funds
to report “an additional broad-based index” or a “non-securities index.” Despite this text,
Rule 33-6988 does not appear to provide any guidance on how to determine whether an
additional index is an “additional broad-based index” or a “more narrowly-defined index”
for a given fund.5 Although funds are not permitted to use a peer-based benchmark as
their only benchmark index, many funds choose peer-based benchmarks as an additional
benchmark. Peer-based benchmarks are indexes that track the performance of a group of
funds (e.g., large-cap growth funds) and are compiled by fund rating agencies such as Lipper
and Morningstar.
Finally, Instruction 12 of Item 5A dictates the rules for benchmark changes. If a fund
selects a different index from the one being used in the preceding fiscal year, it must provide
a reason for the change and report the performance comparisons for both the newly-selected
index as well as the previous one(s). This requirement only applies during the reporting year
in which the change takes place, and only if a fund intends to completely replace an existing
benchmark. As alluded to in the introduction, the rule does not require a fund to compare
its past performance to the returns of previously-selected benchmarks for the years those
benchmarks were selected. Moreover, there appear to be no guidelines or rules regarding
the reasons funds choose a new index or even what defines an “appropriate” index. Indeed,
funds often justify benchmark changes with boilerplate language and choose benchmarks
that do not reflect their investment styles. Examples of prospectus benchmark changes can
be found in Appendix B. In short, SEC Rule 33-6988 provides funds with significant latitude
5 In fact, we identify cases in which different funds reporting the same additional index (e.g., the Russell
1000 Value) categorize it differently. Since funds’ rationales for changing their broad-based or more narrowly-
based indexes may differ, we repeat all of our main analyses on each subsample of index change to mitigate
concerns that we are falsely inferring nefarious behavior by mutual funds. We describe the algorithm we
used to classify funds’ benchmarks as “broad-based” or “more narrowly-based” and report the results of
these tests in the Internet Appendix. We refer to these tests throughout the paper where appropriate. We
also discuss this issue in Section 5.2.2.
in how they choose and change their benchmark indexes.
Our data comes from multiple sources. First, we obtain data on mutual funds from the
Center for Research on Security Prices (CRSP) Mutual Fund database. We begin by iden-
tifying the sample of U.S. domestic equity mutual funds from 2005 to 2018. Consistent with
most academic studies, we only include diversified U.S. equity funds and exclude balanced
funds and sector funds. We identify and exclude from our sample any index fund, exchange-
traded fund (ETF), exchange-traded note (ETN), and target-date fund (TDF) using CRSP
classification codes and fund names following Dannhauser and Pontiff (2019) and Ben-David,
Li, Rossi, and Song (2022). The analysis is at the fund level (rather than at the share class
level). Continuous variables such as fund size and returns, which are known to contain out-
liers and data entry errors, are winsorized at the 99% level. We aggregate the assets under
management (AUM) of different share classes within a fund using the CRSP CL GRP iden-
tifier and calculate fund-level returns by value-weighting each share class. The CRSP data
also contains information on the funds’ expense ratios, turnover ratios, and other charac-
teristics. We augment the CRSP Mutual Fund data with manager and investment strategy
data from Morningstar Direct.
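The share-class aggregation step can be sketched in pandas. This is a minimal illustration under assumed column names (`fund_id` stands in for the CRSP CL GRP identifier; the column names and sample data are ours):

```python
import pandas as pd

def fund_level_returns(share_classes: pd.DataFrame) -> pd.Series:
    """Value-weight share-class returns into fund-level returns.

    Expects one row per share class per month with columns
    ['fund_id', 'month', 'ret', 'tna'] (TNA = total net assets).
    """
    def vw(group: pd.DataFrame) -> float:
        # TNA-weighted average return within a fund-month
        return (group["ret"] * group["tna"]).sum() / group["tna"].sum()
    return share_classes.groupby(["fund_id", "month"])[["ret", "tna"]].apply(vw)

# Two share classes of one fund: a $300m class returning 2% and a
# $100m class returning 1% give a value-weighted fund return of 1.75%
df = pd.DataFrame({
    "fund_id": ["A", "A"], "month": ["2018-01", "2018-01"],
    "ret": [0.02, 0.01], "tna": [300.0, 100.0],
})
print(fund_level_returns(df))
```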
Second, we locate each fund in the SEC EDGAR database using a combination of ticker
and name searches. Starting in 2006, the SEC began requiring fund companies to include
fund-level (Series ID) and share class-level (Class ID) identifiers in their SEC filings to make
it easier to locate individual funds’ filings within a given fund family’s documents.6 We
6 Our sample period is limited to filings made from 2006 to 2019 for this reason. The SEC has made
these identifiers relatively easy to search and match by providing tables with Series ID, Class ID, and ticker
information. The tables are available here: https://www.sec.gov/open/datasets-investment_company.
html.
download each fund’s prospectus (485) and summary prospectus (497K) filings and use text
processing code to identify the table that contains the fund’s returns and those of their
benchmarks. As described in Section 2, each fund is required to include in these documents
a table of its average annual total returns for the one-, five- and ten-year periods as well
as the corresponding returns of its self-designated benchmark indexes. The content of these
tables allows us to create a panel of fund-year-benchmark index observations and identify
time series variation within a given fund. We present examples of these tables in Appendix B.
The reader should note that, in many cases, investors viewing a fund’s current prospectus
would not be able to detect a change in the fund’s benchmark index(es) without viewing prior
prospectuses. To ensure the accuracy of our data, we manually check any instance in which
we detect a change in a fund’s benchmark or a fund reports more than three benchmarks.
Our main sample contains 2,870 unique funds and 27,288 annual observations.
Lastly, we obtain monthly return data for the Standard and Poor’s, Russell, and other
major equity indexes from Compustat and Bloomberg and monthly return data for the
Morningstar peer groups from Morningstar Direct. The Lipper Peer Group indexes are
computed by taking an equally weighted average of the returns of the 30 largest mutual
funds based on fund total net assets in a given category.7 We compute these returns using
the Lipper classification and monthly fund data from the CRSP Mutual Fund Data.
Figure 1: Frequency of Benchmark Changes Over Time
This figure contains a year-by-year plot of the number of funds that add or drop a benchmark index, as well
as the percentage of funds in our sample that make any change each year. Our sample of changes begins in
2006 because we collect benchmark data starting in 2005 and thus use the 2005 observations as the baseline
for our sample of funds.
[Bar chart: number of added benchmarks and dropped benchmarks per year (left axis) and the percentage of funds making any change (right axis), 2006 to 2018.]
Notably, we find that 1,050 out of 2,870 funds made at least one change to their prospectus
benchmarks during our 13-year sample period. Because we collect data on funds’ benchmarks
beginning in 2005, the first year in which we can detect changes is 2006. The average fund in
our sample reports 1.44 benchmarks per year and makes 0.84 benchmark changes during our
sample period. However, these overall statistics obscure major differences between the groups
of funds that do and do not make any changes. Funds that make at least one benchmark
change make an average of 2.27 changes during this period, suggesting that there is a serial
component to this behavior. Funds making at least one benchmark change also report
significantly more benchmarks each year (1.74) than funds that never make a benchmark
change (1.23).
As discussed in Section 2, funds can also compare their past performance to other types
of indexes. We find that the use of these alternative indexes is quite common, a fact that
has been overlooked in prior literature due to data limitations. In Table 1 we break down
Table 1: Summary Statistics
This table contains summary statistics for the prospectus benchmarks reported by our sample of
2,870 funds. Column 1 reports statistics across all funds. Column 2 reports statistics for the
set of funds that never make a change to their benchmarks. Columns 3 to 8 report statistics for
the set of funds that make at least one change to their benchmarks. Major equity indexes are
commonly-used S&P and Russell stock-based indexes with a well-defined placement in the classic
3×3 style box. Peer-based benchmarks are indexes comprised of groups of mutual funds. “Other
benchmarks” include seldom-used indexes, sector indexes, and custom “blended” indexes computed
as a weighted average of two or more indexes.
prospectus benchmarks into three categories: major equity indexes, peer-based benchmarks,
and a residual category labeled “other benchmarks,” which includes rarely-used stock indexes
and custom “blended” indexes that reflect the average of two or more indexes. Overall, we
find that funds include an average of 0.22 peer-based benchmarks and 0.06 other bench-
marks. As shown in Column 3 of the table, funds that make a benchmark change during
our sample period are significantly more likely to include these alternative types of indexes.
The average number of peer-based and other benchmarks in that subsample is 0.46 and 0.12,
respectively. Unless otherwise noted, our analyses throughout the paper include only major
equity indexes. Appendix A.1 presents a full list of the 72 indexes used in this study and
provides additional details.
A contribution of our paper is to document the frequency with which funds change their
benchmarks. On average, funds make a change to their reported performance benchmarks
in 6.8% of fund years. Within the subset of funds that make at least one change, the average
fraction of years with benchmark changes is 18.7%. In contrast to our study, prior academic
papers have traditionally relied on Morningstar data to identify fund benchmarks and have
generally assumed that benchmark changes are rare or inconsequential (e.g., Sensoy, 2009;
Cremers, Fulkerson, and Riley, 2020). Because the Morningstar data does not contain time-
series information on funds’ benchmarks, researchers can only detect changes with this data
if they possess multiple snapshots of the data as in Chen, Evans, and Sun (2022). Moreover,
because Morningstar only has data on funds’ primary and sometimes secondary benchmarks,
researchers using that data do not observe the cases in which a fund claims more than two
benchmarks. In our sample, funds include three or more benchmark indexes in over 10% of
fund-year observations.
We begin our analysis by examining whether investors reward higher BAR with higher
capital flows. Specifically, we start by regressing flows onto a fund’s 3-year return decomposed
into two parts: the BAR and the return of the self-reported benchmark.8 We then estimate
additional regressions controlling for Morningstar star ratings, unadjusted fund returns, fund
characteristics, tracking error relative to the benchmark, and various sets of fixed effects.
The results of these regressions are presented in Table 2. The insights discussed in
this section are very robust to the choice of the empirical specification, e.g., the inference is
unchanged when using five years as the performance evaluation period, as shown in Table C.1
in the Appendix.
First and foremost, we find a positive and statistically significant relation between fund
flows and BAR in all specifications. This result strongly suggests that funds have incentives
8 In cases in which a fund reports more than one stock-based benchmark, we take the average of the
returns of the benchmarks before subtracting it from the return of the fund. The coefficients and t-stats on
BAR and benchmark return are extremely robust to variation in this choice.
to outperform their benchmark and to potentially alter how investors view their performance
relative to their chosen benchmark. Consistent with prior literature (Ben-David et al., 2022;
Evans and Sun, 2021), we also find that the coefficient on fund performance declines once
we control for Morningstar star ratings in Columns 2 – 9. Starting in Column 4, we present
specifications in which we explicitly control for funds’ unadjusted returns; these specifica-
tions show that BAR is a flow determinant that is economically important and statistically
different from the unadjusted fund return. In Column 5, we employ Fama and MacBeth
(1973) regressions as a robustness test (Ben-David et al., 2022). Column 6 includes indi-
cator variables for past fund return deciles constructed based on past 3-year unadjusted
fund returns within each month to account for convexity in the flow-performance relation
(Chevalier and Ellison, 1997; Sirri and Tufano, 1998). Column 7 includes lags of individual
monthly unadjusted returns to account for the fact that investors respond more strongly
to recent performance (Barber, Huang, and Odean, 2016). We include fund fixed effects in
Column 9 to ensure that funds’ incentives to outperform their benchmarks exist both in the
cross-section and within their own time series. Regardless of the specification chosen, we
find a positive, economically large, and statistically significant relation between fund flows
and BAR. Specifically, the coefficients on BAR in Columns 2 – 9 imply that a 1% increase in
a fund’s cumulative 3-year BAR is associated with an increase in annual fund flows of
0.38% to 0.64%.
Second, specifications 1 and 2 indicate that fund flows are also positively related to the
returns of funds’ benchmark indexes themselves. In Columns 8 – 9, we include indicator
variables for “hot” and “cold” benchmark styles, which are equal to 1 if a given benchmark’s
returns were in the top 20% or bottom 20% of all benchmark index returns in the prior
year, respectively, and 0 otherwise. The results indicate that, all else equal, having a hot
(cold) benchmark is associated with 2.04% higher (3.12% lower) fund flows annually. This
result suggests that funds may have incentives to choose benchmarks in hot styles and shun
benchmarks in cold styles, consistent with prior literature (Cooper, Gulen, and Rau, 2005;
Lynch and Musto, 2003).
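The hot/cold style dummies could be constructed as follows under the 20% cutoffs described in Table 2 (the function name and sample returns are ours):

```python
def style_indicators(bench_ret, all_bench_rets):
    """Return (hot, cold) dummies for one benchmark.

    Hot = 1 if the benchmark's prior-year return falls in the top 20%
    of all benchmark returns; cold = 1 if it falls in the bottom 20%.
    """
    n = len(all_bench_rets)
    rank = sorted(all_bench_rets).index(bench_ret)  # 0 = lowest return
    hot = rank >= 0.8 * n
    cold = rank < 0.2 * n
    return int(hot), int(cold)

# Illustrative cross-section of ten benchmark returns
rets = [0.01, 0.03, 0.05, 0.08, 0.12, 0.15, 0.02, 0.06, 0.09, 0.11]
print(style_indicators(0.15, rets))  # (1, 0): hot style
print(style_indicators(0.01, rets))  # (0, 1): cold style
```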
Table 2: Flow-Performance Regressions
This table contains regressions of funds’ monthly flows on various measures of fund performance,
benchmark returns, and fund characteristics. Benchmark-adjusted return is the fund’s return minus
the return of its self-selected benchmark index(es) for the previous three years. Benchmark return is
the return of the fund’s self-selected benchmark index(es) for the previous three years. Unadjusted
return is the fund’s net return for the previous three years. Tracking error is the standard deviation
of the residuals from a regression of the fund’s returns on the returns of its self-selected benchmark
index(es). 1[Mismatched Benchmark] is an indicator variable equal to 1 if the fund does not have
a benchmark that matches its Morningstar classification, and 0 otherwise. 1[Hot benchmark style]
is an indicator variable equal to 1 if the fund’s benchmark is in the top 20% of all benchmark
returns for the past year, and 0 otherwise. 1[Cold benchmark style] is an indicator variable equal
to 1 if the fund’s benchmark is in the bottom 20% of all benchmark returns for the past year,
and 0 otherwise. The control variables are the natural logarithms of fund size and age, expense
ratio, and the fund’s return volatility from the prior 36 months. Regression-based t-statistics are
shown in parentheses below the coefficients and are computed from standard errors which are
double clustered by fund and month. Column 5 presents coefficients for a specification estimated
via Fama-MacBeth regressions with t-statistics calculated using Newey-West standard errors.
*p < .10; **p < .05; ***p < .01.
Control Variables Yes Yes Yes Yes Yes Yes Yes Yes Yes
Time F.E. Yes No No No No No No No No
Time × Morningstar Star F.E. No Yes Yes Yes Yes Yes Yes Yes Yes
Fund F.E. No No No No No No No No Yes
Return Decile F.E. No No No No No Yes No No No
Individual Monthly Return Lags No No No No No No Yes No No
Adj. R2 0.138 0.165 0.161 0.165 0.153 0.166 0.185 0.164 0.243
N obs 297,014 296,850 296,850 296,850 296,850 296,850 296,850 297,014 296,991
Finally, our results also indicate that fund flows are negatively related to funds’ tracking
errors relative to their benchmarks, consistent with studies showing that investors prefer
funds that choose benchmarks that closely reflect their investment styles (Del Guercio and
Tkac, 2002; Chen, Evans, and Sun, 2022). In Columns 8 – 9, we use an indicator variable,
Mismatched Benchmark, that is equal to 1 if none of the fund’s benchmarks match the fund’s
Morningstar 3×3 investment style box assignment (following Sensoy, 2009) and 0 otherwise.
The coefficient in Column 8 indicates that using a mismatched benchmark is associated with
1.2% lower flows each year.
In sum, the results in Table 2 suggest that funds’ incentives around benchmark choices are
multi-faceted. While funds have incentives to choose benchmark indexes they have previously
outperformed, they also have incentives to choose indexes that reflect hot investment styles
and those that accurately reflect their investment strategies.9
In this section, we examine whether funds take advantage of the loophole in SEC rules
by changing their benchmarks to improve the appearance of their past BAR. Funds have
multiple options when they decide to make changes to their benchmark indexes. A fund can
choose to add a new index, delete an existing index, or replace one or more benchmarks.
These indexes can be stock-based or peer-based. A challenge when evaluating the rationale
for making these changes is determining the appropriate counterfactual to which to compare
them. Our empirical strategy is to examine funds’ benchmark changes relative to multiple
control groups, each proxying for a reasonable counterfactual. In this spirit, our baseline
analysis includes several quasi-independent tests, each of which has a different economic
interpretation. This empirical strategy allows us to decrease the sensitivity of our inferences
with respect to the choice of a single control group while simultaneously minimizing the
9
In Internet Appendix Table IA.4, we repeat these regressions after classifying each fund’s benchmarks(s)
as being “broad-based” or “narrowly-based” and find that both BAR types predict incremental flows.
15
chance that the overall results would arise by chance.
Consider a fund that adds a new stock-based benchmark index. We calculate the 1-, 5-,
and 10-year returns of the added index in the year prior to the filing and subtract from them
the returns of the stock-based index(es) i) the fund currently uses, ii) that best reflect the
fund’s investment strategy, and iii) the fund did not choose but that are in the same style as the added index.10 Second, when a fund drops a stock-based index, we compare the returns of
the stock-based index(es) it retains to the return of the index it dropped.
Next, we consider peer-based indexes. When a fund adds a peer-based index (e.g., the
Lipper Small Cap Growth Funds index), we compare this index’s returns to those of i) the
fund’s existing stock index(es), ii) dropped peer-based index(es), and iii) the peer-based
benchmark(s) that best matches its investment strategy.
These control groups are designed to compare the returns of a fund’s newly chosen bench-
mark(s) to the index(es) the fund previously used, the index(es) the fund should have used
(i.e., the benchmark(s) that best reflect the fund’s investment category), and the indexes the
fund could have used given its index style choice.
If funds systematically revise their benchmarks to improve the appearance of their BAR,
we expect the average of each of these return differences to be negative. We gauge the
statistical significance of these differences in two ways. The first method relies on a standard
t-test with standard errors adjusted for heteroskedasticity. A potential issue with a t-test is
that the return differences we compute are likely correlated within years and across years.
For example, the S&P 500’s 5-year return in 2018 is a function of its 1-year returns from
2014 through 2018. Moreover, benchmark changes in a given year could be correlated across
funds if funds are adding or dropping certain indexes because of their realized returns. It
is not clear that we should fully control for the latter form of correlation, since this pattern
is precisely what our main hypothesis predicts. Nevertheless, as an alternative method of
gauging statistical significance, we calculate bootstrapped p-values that control for clustering of style choices within a year as well as index return dependence within and across time.11

10 To examine whether funds choose indexes from styles with lower returns, we also compare the style returns of the newly-added index to the average style returns of all the non-chosen styles as well as to the return of the style that best matches the fund’s investment strategy. We report the results of these tests in Appendix Table C.2.
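As a simplified sketch of the year-clustering idea (the full procedure, including the treatment of index return dependence, is described in Appendix C.2), a one-sided cluster bootstrap for the mean return difference can be written as follows; the function and data layout are illustrative, not our exact implementation:

```python
import random

def bootstrap_pvalue(diffs_by_year, n_boot=10_000, seed=7):
    """One-sided cluster bootstrap for the mean return difference.
    Resampling entire years (with replacement) preserves the correlation
    of differences within a year. The p-value is the fraction of
    re-centered bootstrap means at least as negative as the observed mean."""
    years = list(diffs_by_year)
    all_diffs = [d for ds in diffs_by_year.values() for d in ds]
    observed = sum(all_diffs) / len(all_diffs)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_boot):
        sample = []
        for _ in years:
            # Draw a whole year of differences at a time.
            sample.extend(diffs_by_year[rng.choice(years)])
        # Under the null the mean difference is zero, so re-center at `observed`.
        if sum(sample) / len(sample) - observed <= observed:
            extreme += 1
    return extreme / n_boot
```

Re-centering the bootstrap distribution at the observed mean is the standard way to impose the null hypothesis of a zero mean difference while preserving the within-year dependence structure.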
Finally, to mitigate concerns that our results may be driven by outliers, for each of
the tests described above, we also tabulate the median difference, as well as the fraction
of differences that are negative and its statistical significance under the null of a binomial
distribution in which positive and negative numbers occur with the same probability.
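The summary statistics just described can be sketched as follows; this illustrative version uses a two-sided normal approximation to the binomial sign test, and the variable names are hypothetical:

```python
from math import erf, sqrt
from statistics import mean, median

def summarize_differences(diffs):
    """Summarize benchmark-return differences: mean, median, fraction
    negative, and a sign test of the null that positive and negative
    differences are equally likely (normal approximation to Binomial(n, 0.5))."""
    n = len(diffs)
    n_neg = sum(1 for d in diffs if d < 0)
    # Standardized count of negative differences under the null.
    z = (n_neg - n / 2) / sqrt(n / 4)
    p_two_sided = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return {
        "mean": mean(diffs),
        "median": median(diffs),
        "frac_negative": n_neg / n,
        "sign_test_p": p_two_sided,
    }

# Illustration with the 5-year add-minus-existing counts reported below:
# 430 negative differences out of 776 benchmark additions.
stats = summarize_differences([-1.0] * 430 + [1.0] * 346)
```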
Table 3 contains the results of tests comparing the past returns of the revised stock-
based benchmarks to those of the control groups. For brevity, we primarily discuss the
tests comparing the 5-year return differences. Panel A contains the tests that use funds’
previously chosen stock-based index(es) as the control groups. Funds add (retain) indexes
with statistically lower returns than their existing (dropped) indexes. For instance, Column 2
of Panel A shows that the newly-added indexes have, on average, 2.39% lower 5-year returns
and that this difference is statistically significant at the 1% level using either the conventional
t-statistic or the bootstrapped p-value. In the case of the 5-year returns, funds add a lower
performing benchmark in 430 out of 776 cases. The probability of this pattern occurring
randomly is just 0.001.
The results comparing the returns of the dropped indexes to those of retained indexes are
presented in Columns 4 – 6 of Panel A and are similar to those described in the preceding
paragraph in terms of both economic and statistical significance. We note that the past 1-year returns of dropped and retained indexes do not differ statistically. This result is
likely driven by the SEC’s rule governing benchmark changes. Specifically, the SEC requires
funds that decide to permanently replace their benchmark with another one to wait one
additional year before they can drop the old one. Put differently, funds cannot completely replace an existing benchmark with a new one in a single year. This implies that, in the majority of cases, funds likely made the “drop” decision one year before it is reflected in their prospectuses, so we should not expect to observe a negative retain-minus-drop 1-year return difference even if the drop choices are strategic.

11 Further details on our bootstrapping method can be found in Appendix C.2.
Panel B of Table 3 contains the tests using the control groups representing the stock-
based indexes that funds could or should have selected. Columns 1 – 3 of Panel B compare
the returns of the indexes that funds add to those of the benchmark that best reflects the
fund’s investment strategy over the performance reporting period. We calculate two variables
to determine the index that best reflects the fund’s investment strategy. First, we regress the fund’s monthly returns on those of each index in our sample to compute a univariate R².
Second, we calculate the fund’s tracking error with respect to each index.12 The index with the highest R² and the lowest tracking error is deemed the best-match index. If the highest R² and the lowest tracking error belong to two different indexes, we use the average returns of these two indexes. Fund-year observations in which a fund chooses the best-match index according to either or both criteria are excluded from this test.
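The best-match selection can be sketched as follows; the function and data layout are illustrative rather than our exact implementation:

```python
import numpy as np

def best_match_index(fund_returns, index_returns):
    """Identify the benchmark(s) that best reflect a fund's strategy using
    (i) the highest univariate R-squared from regressing fund returns on
    each index's returns, and (ii) the lowest tracking error (standard
    deviation of the regression residuals). Returns one index if the two
    criteria agree, otherwise both."""
    fund = np.asarray(fund_returns, dtype=float)
    r2, te = {}, {}
    for name, ret in index_returns.items():
        # Univariate OLS of fund returns on index returns with an intercept.
        x = np.column_stack([np.ones_like(fund), np.asarray(ret, dtype=float)])
        beta, *_ = np.linalg.lstsq(x, fund, rcond=None)
        resid = fund - x @ beta
        r2[name] = 1.0 - float(resid @ resid) / float(((fund - fund.mean()) ** 2).sum())
        te[name] = float(resid.std())
    return {max(r2, key=r2.get), min(te, key=te.get)}
```

When the criteria disagree, the returned set contains two indexes, whose returns are then averaged as described above.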
We note that, for 652 out of 784 index additions, funds do not add the best match index.
More importantly, the differences in the returns for the index the fund actually adds and
the one that would be most appropriate are strikingly large. Funds add indexes with 5.56%
lower 5-year returns than the best match index and this difference is statistically significant
at the 1% level. 66.3% of these differences are negative, and the probability that this pattern occurs randomly is essentially zero. We again stress that, although the best match index is a reasonable
counterfactual, the SEC only requires a fund to report an appropriate broad-based index,
not necessarily the one that best matches its investment strategy.13
Columns 4 – 6 in Panel B compare the returns of the benchmarks a fund chooses to
12 Recall that, after a benchmark change, a fund’s prospectus will compare the past return of the fund and of the newly-added index. For this reason, the R² and tracking error are computed using fund and index returns prior to and until the year in which the index change takes place.
13 We repeat the analyses in Table 3 after subdividing indexes into “broad-based” and “narrowly-based.” These analyses can be found in Table IA.5 of the Internet Appendix.
Table 3: How Do Benchmark Changes Affect Past Reported Index Returns?
This table illustrates the effect of benchmark changes on past reported index returns using different counter-
factuals. The first three columns of Panel A show the difference between the returns of added indexes and
those of pre-existing indexes. The last three columns of Panel A show the difference between the returns
of retained indexes and those of dropped indexes. Panel B uses non-chosen indexes as the counterfactuals.
The first three columns of Panel B show the difference between the returns of added indexes and those of
non-added indexes that best match the funds’ returns. The last three columns of Panel B show the difference
between the returns of added indexes and the average return of non-added indexes in the same investment
style. Panel C repeats the Add minus Existing test after splitting the sample based on whether a fund
reports a single or multiple benchmarks. The table contains the mean of each difference and its associated
t-statistic based on White’s heteroskedasticity-consistent standard errors and bootstrapped p-values. We
also present the percentage of the differences that are negative, and their statistical significance based on a
binomial test where the null hypothesis is that positive and negative differences each occur 50% of the time.
*p < .10; **p < .05; ***p < .01.
those of benchmarks within that same style the fund could, but did not, choose. For most
investment styles, there are only two to four indexes that funds can choose from (see Table A.1
for more details on the number of benchmark indexes in each investment strategy). For
instance, a fund choosing a small value index is likely to select either the Russell 2000 Value
or the S&P 600 Value Index. The average correlation of the monthly returns in a given style
is 0.97 and the average within-style standard deviations of 5- and 10-year returns are only
1.54% and 3.73%, respectively. We cite these statistics to emphasize that the within-style
choice is one in which funds have limited degrees of freedom to alter the appearance of their
past BAR. Despite this, we continue to find evidence that funds change their benchmarks
strategically; funds add indexes with 1.68% lower 5-year returns than the average of the
non-added indexes in the same style and this difference is statistically significant at the 1%
level.
As detailed in Section 2, funds are allowed to report multiple benchmarks. The ambiguity
in the wording of the rule calls for nuance when interpreting the actions of funds that report
more than one benchmark. One potential concern is that our results on return manipulation
are driven by funds that report multiple benchmark indexes. A potential interpretation of
such a result could be that these funds are changing a secondary benchmark for reasons
unrelated to performance manipulation, such as choosing an index that better reflects the
fund’s investment strategy. In the Internet Appendix, we elaborate and provide robustness
tests related to these nuances.
Here, we perform a simple exercise to test whether our headline results are driven by
funds that report multiple benchmarks. To do so, we split the sample of benchmark-changing
funds based on whether they report a single benchmark or multiple ones. We define single-
benchmark reporters as those funds that have only one benchmark before and after the
fiscal year in which a benchmark change occurs. The subsample of changes made by single-
benchmark reporters represents cases in which a fund simply replaces its only existing index
with another index. By focusing on these changes, we can rule out that our results may be
driven solely by “secondary” (i.e., additional) benchmarks.
In Panel C of Table 3, we present the difference between the returns of added and existing indexes (i.e., the test statistic reported in Panel A) after splitting the sample into single- and multiple-benchmark reporters. The test statistics are negative across the board, indicating that our headline results are not driven by funds that report multiple benchmarks. If
anything, the results are slightly stronger in the subsample of single-benchmark reporters,
especially for 10-year returns.
Table 4 contains the analysis of the peer-based benchmark changes. Peer-based bench-
mark indexes, which are calculated based on the net returns of mutual funds, have mechani-
cally lower returns than similar stock-based benchmark indexes because of the effect of fees.
For this reason, mutual funds have incentives to compare their returns to those of peer-based
indexes. Our first test (Columns 1 – 3) compares the past returns of newly-added peer bench-
marks to those of the funds’ existing stock-based indexes. This analysis is a “within-fund”
analysis, similar to that of Panel A of Table 3. As expected, we find that the newly-added
peer benchmark indexes have statistically lower returns than the funds’ current stock-based indexes over the past 1-, 5-, and 10-year horizons. We note that these differences in Columns 1 –
3 do not, on their own, provide evidence of strategic behavior because of the effect of fund
fees.
We conduct two additional tests that are likely to be more informative about potential
strategic behavior. First, we compute a double-difference statistic. The first difference is
the added peer index return minus the existing same-fund stock index return (as is shown
in Columns 1 – 3). The second difference is the return of dropped peer-based indexes minus
Table 4: Peer Benchmark Changes
This table presents tests comparing the returns of added peer-based indexes to three sets of control
groups. Columns 1 – 3 compare the returns of the added peer indexes to the average return
of the existing stock-based benchmarks reported by the funds making the peer index additions.
Columns 4 – 6 present a double difference: added peer indexes minus stock-based equity indexes of
funds adding peer indexes (as in Columns 1 – 3), minus the average difference between dropped peer
indexes minus existing equity indexes of the funds dropping peer indexes. Columns 7 – 9 compare
the returns of added peer indexes to the average of the peer indexes in the style that best matches
the fund’s investment style. The table contains the mean of each difference and its associated
t-statistic based on White’s heteroskedasticity-consistent standard errors and the bootstrapped p-
values described in the text. We also present the percentage of the differences that are negative,
and their statistical significance based on a binomial test where the null hypothesis is that positive
and negative differences each occur 50% of the time. *p < .10; **p < .05; ***p < .01.
Control:               Existing stock indexes         Dropped peer indexes           Best-match peer indexes
                       1 year    5 year    10 year    1 year    5 year    10 year    1 year    5 year    10 year
                       (1)       (2)       (3)        (4)       (5)       (6)        (7)       (8)       (9)
Mean                   -1.16***  -6.83***  -13.38***  -0.29     -2.07***  -3.23**    -0.42     -3.78***  -5.08**
t-statistic            (-5.35)   (-12.41)  (-10.06)   (-1.36)   (-3.77)   (-2.43)    (-1.04)   (-3.54)   (-2.46)
Bootstrapped p-value   (0.000)   (0.000)   (0.000)    (0.129)   (0.010)   (0.031)    (0.202)   (0.003)   (0.020)
Median                 -1.17     -7.07     -12.30     -0.40     -2.32     -2.16      -0.14     -3.13     -3.85
% < 0                  68.8***   79.7***   78.6***    54.5*     60.9***   56.3**     50.9      62.1***   67.0***
N obs                  266       266       206        266       266       206        116       116       94
those of the funds’ existing stock-based indexes.14 We report the results of these tests
in Columns 4 – 6. Consistent with the results from the tests using stock-based indexes,
we find that these differences are uniformly negative, economically large, and statistically
significant for both the 5- and 10-year return horizons. Next, we compare the returns of the
added peer-based indexes to those of peer-based indexes whose style best matches a fund’s
investment style, where the best matching style is defined the same way as in Panel B of
Table 3. This test only includes the subsample of 116 cases in which funds choose a peer-
based benchmark that does not match their investment style. We continue to find strong
evidence that benchmark changes inflate the appearance of funds’ BAR. Specifically, funds
choose peer-based benchmarks with 3.78% lower 5-year returns than those of the peer-based
index that best matches the fund’s investment style.

14 Because most peer additions and deletions are carried out by funds that have only one peer index at the time of the change, it is not possible to perform a direct within-fund add-minus-drop analysis using only peer indexes.
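The double-difference statistic reported in Columns 4 – 6 of Table 4 can be sketched as follows; the function and tuple layout are illustrative:

```python
def double_difference(adders, droppers):
    """Added-peer minus existing-stock return differences, net of the
    mechanical fee gap: subtract the average dropped-peer minus
    existing-stock difference observed among funds dropping peer indexes.
    Each input is a list of (peer_index_return, existing_stock_return) pairs."""
    add_gaps = [peer - stock for peer, stock in adders]
    # Average gap for dropped peer indexes proxies for the fee-driven shortfall.
    drop_gap = sum(peer - stock for peer, stock in droppers) / len(droppers)
    return [g - drop_gap for g in add_gaps]
```

For example, if added peer indexes trail funds’ existing stock indexes by 7% while dropped peer indexes trailed by only 2%, the double difference of -5% isolates the component that cannot be explained by fees alone.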
The results presented in Tables 3 and 4 indicate that funds’ benchmark changes lead
to a systematic decrease in the benchmark returns to which they compare their own re-
turns. Combined, these results strongly support our main hypothesis that funds change
their benchmarks to improve the appearance of their BAR.
In this section, we conduct some robustness checks to ensure that the results of the six
key tests in Tables 3 and 4 are not driven by winsorization choices, wholesale changes at the
fund family level, or concurrent fund name or manager changes.15
Figure 2 summarizes the key results presented in Section 5.2 and reproduces those results after winsorizing extreme benchmark return differences at the 5% and 95% levels and
without winsorization (test 1 and test 2, respectively). There is no consistent pattern in how
the level of winsorization affects the economic or statistical significance of the results across
the six tests. We note that the other key statistics for these tests (i.e., median differences
and the fraction of differences that are negative) are by definition unaffected by outliers or
winsorization choices. Combined, these results suggest that our main results are neither
driven by outliers nor by our decision to winsorize them.
We next explore the possibility that the decision to change funds’ benchmarks is made at
the fund family level rather than the individual fund level. For instance, a fund family may
decide to change the index(es) for all of its funds to economize on index licensing costs.16
These considerations are likely to be economically important for low-fee passive products
(e.g., see An, Benetton, and Song, 2021) rather than for the actively-managed mutual funds
we analyze in this study. Regardless, we explore this possibility here.
15 Here, we do not reproduce the test presented in Columns 1 – 3 of Table 4 because that particular test does not necessarily imply a gaming motive. Unsurprisingly, in unreported tests, we find that those results, too, are not driven by outliers.
16 For example, in 2012, Vanguard began using CRSP stock indexes as its benchmark indexes rather than MSCI indexes. The stated rationale for doing so was cost savings. See https://www.morningstar.com/articles/569258/vanguard-to-switch-benchmarks-for-22-index-funds for more information.
Figure 2: Robustness Tests
This figure presents robustness tests for the baseline results presented in Tables 3 and 4. Baseline results
use returns winsorized at the 1% and 99% level. In Test 1, benchmark returns are winsorized at the 5% and
95% level. In Test 2, benchmark returns are not winsorized. In Test 3, we exclude wholesale family-level
benchmark changes. See Section 5.3 for more details.
[Charts omitted: each panel plots t-statistics for the six tests (Within-fund Add minus Existing and Retain minus Drop from Table 3; Index Selection Add minus Best-match and Add minus Same-style from Table 3; Peer Benchmarks Add minus Drop and Add minus Best-match from Table 4) under the Baseline, Test 1, Test 2, and Test 3 specifications. All t-statistics remain negative, ranging in magnitude from roughly 1.55 to 11.30.]
We identify instances in which two-thirds or more of a family’s funds change their bench-
marks in a given year (conditional on a family managing at least three funds in that year).
This analysis reveals that wholesale family changes of active fund benchmarks are quite rare,
suggesting that these decisions are generally made at the fund level rather than the family
level. Specifically, wholesale fund family changes only account for 146 benchmark changes,
or 6.6%, of the 2,201 observations used in the tests presented in Table 3 and Table 4. More
importantly, as summarized in test 3 of Figure 2, removing these wholesale changes from
our main analysis has virtually no effect on the magnitude or statistical significance of our results. Some of the test results become slightly stronger or slightly weaker after removing the wholesale changes, as we would expect if these changes were no different from changes made in
isolation. Finally, our tests are not sensitive to the percentage of a family’s funds being
simultaneously changed. In untabulated analyses, we confirm that these results hold if we
change the wholesale definition cutoff to 50%, 75%, or 100% of a family’s funds.
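The wholesale-change screen can be sketched as follows; the function and record layout are illustrative:

```python
def wholesale_family_changes(funds, cutoff=2 / 3, min_funds=3):
    """Flag (family, year) cells in which at least `cutoff` of a family's
    funds changed a benchmark, conditional on the family managing at
    least `min_funds` funds that year. `funds` is a list of
    (family, year, changed_benchmark) tuples."""
    cells = {}
    for family, year, changed in funds:
        total, n_changed = cells.get((family, year), (0, 0))
        # `changed` is a bool; True counts as one benchmark change.
        cells[(family, year)] = (total + 1, n_changed + changed)
    return {
        key for key, (total, n_changed) in cells.items()
        if total >= min_funds and n_changed / total >= cutoff
    }
```

Passing a different `cutoff` (0.5, 0.75, or 1.0) reproduces the alternative wholesale definitions used in the untabulated robustness checks.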
Lastly, another possible explanation for why mutual funds change their benchmarks is
that benchmark changes are a part of other major changes such as fund name, portfolio
manager, or investment strategy changes. Prior literature has documented that funds are
more likely to replace their managers and change their fund names or investment strategies
after periods of poor performance (Kostovetsky and Warner, 2015; Khorana, 1996; Lynch
and Musto, 2003; Cooper, Gulen, and Rau, 2005). Thus, it is possible that instances in
which funds make benchmark changes that improve the appearance of their past perfor-
mance are concentrated in years in which they make other major changes like fund name or
portfolio manager changes. If this were the case, we would need to take it into account when
interpreting the results and assessing our main hypothesis.
We identify investment style changes using Morningstar’s 3×3 category classification
and portfolio manager changes using data from Morningstar Direct. Fund name changes
are detected using the official fund names reported to the SEC. We find that funds change
either their name or their portfolio manager in 3,908 out of 27,288 fund-prospectus year
observations in our sample. We present the results of our analysis in Table 5.
Unsurprisingly, funds are more likely to change their benchmark indexes when other
significant fund-level changes take place. Panel A shows that funds change their benchmarks
in 10.7% of the years in which they also change their name or their manager and in 6.2% of
the years in which they do not. Similarly, Panel B shows that benchmark changes are more
likely to occur at the same time as an investment style change.
Interestingly, although benchmark changes are more likely when
Table 5: Robustness Tests: Other Fund-level Changes
This table splits each fund-year observation in the sample based on whether a fund changed its
name, portfolio manager, or investment style. The second and third columns report the frequency
and fraction of fund-years that have benchmark changes within each subsample. The last three
columns report the average benchmark return difference and average t-statistic across the tests
from Table 3 and the tests in Columns 4 – 9 in Table 4 recomputed within each subsample of
change.
other fund-level changes occur, we find that the results from the previous section are not
disproportionately driven by these other events. We present the average of the 1-, 5-, and
10-year return differences for the tests presented in Tables 3 – 4 as well as the average t-
statistics of these differences, recomputed within each subsample of Table 5. The differences
are negative and tend to have a similar magnitude within each subsample, especially over
the 5-year and 10-year horizons. This suggests that, on average, funds prefer to add indexes
with low past returns and drop indexes with high past returns, regardless of whether these
benchmark changes happen concurrently with other fund-level events. More importantly for
the interpretation of our previous results, more than two-thirds of the benchmark changes
occur in years when there are no other fund-level changes, and these observations drive the
overall result that benchmark changes tend to increase past BAR.
Our results so far suggest that, on average, funds appear to change their benchmarks
to improve the appearance of their past BAR. However, it is also possible that funds are choosing new benchmarks in investment strategies they believe they will subsequently outperform. Indeed, the results in Sensoy (2009) and Chen, Cohen, and Gurun (2021) suggest
that equity and fixed income funds strategically choose mismatched benchmarks that they
anticipate outperforming based on the benchmarks’ expected return or risk characteristics.
For instance, Sensoy (2009) finds that equity funds tilted their benchmarks towards large
indexes or growth indexes given the historically higher expected returns from investing in
small or value stocks during his sample period. In recent years, however, the returns to
the size and value factors have been less predictable, suggesting that it is more difficult for
funds to choose indexes with low expected returns.17 To shed light on this issue, we explore
whether funds change their benchmarks based on the past returns to the size and value
factors.
We classify each benchmark’s value and size tilt in a manner akin to the way the Fama
and French (1993) SMB and HML factors are formed (e.g., small minus big, value minus
growth). For the value tilt, an index receives a score of 1 if it contains value stocks, 0 if it
contains blend stocks, and a -1 if it contains growth stocks. For the size tilt, an index receives
a score of 1 if it contains small stocks, 0 if it contains mid cap stocks, and -1 if it contains
large stocks. For instance, the Russell 2000 Value index would have a value tilt equal to 1
and a size tilt equal to 1 since it contains small value stocks. We test whether funds are more
likely to add (drop) a value or size index when its relative returns are lower (higher). Figure 3 presents the results in visual form, while Table 6 contains the formal regression results.

17 See, for instance, articles on the performance of the size and value factors at https://alphaarchitect.com/2018/11/15/factor-investing-fact-check-are-value-and-momentum-dead/ and https://www.blackrock.com/us/individual/investment-ideas/what-is-factor-investing/factor-commentary/andrews-angle/the-sleeping-giant-values-dormant-not-dead.
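The tilt scoring described above can be sketched as follows; the lookup tables and function name are illustrative:

```python
# Style tilts scored in the spirit of the Fama-French SMB/HML factors:
# value/small = +1, blend/mid = 0, growth/large = -1.
VALUE_SCORE = {"value": 1, "blend": 0, "growth": -1}
SIZE_SCORE = {"small": 1, "mid": 0, "large": -1}

def index_tilts(size_style, value_style):
    """Return the (size_tilt, value_tilt) pair for a benchmark index,
    e.g., the Russell 2000 Value index maps to (1, 1)."""
    return SIZE_SCORE[size_style], VALUE_SCORE[value_style]
```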
The x-axis of each graph in Panel A of Figure 3 maps the difference in the past 5-year
returns of the value minus growth indexes or small minus large indexes while the y-axis maps
the value or size tilt of the indexes funds add or drop. The graph shows a clear negative
correlation between the value tilt of added indexes and the realized value return. In other
words, funds are more likely to add indexes with a value tilt after value returns have been
low, and vice versa. Such a negative correlation is not present when repeating the same test using the value tilt of dropped indexes, which rules out the interpretation that funds may
simply be swapping indexes with a certain value tilt for other indexes with the same value
tilt, or that the negative correlation observed for added indexes is driven by an overall time-
trend rather than by realized factor returns. Graph (b) of the same panel suggests that there
is no clear pattern for the size tilt. However, a dynamic strategic motive seems to resurface
even on the size dimension once we focus on “mismatched” benchmark changes.
Panel B of Figure 3 contains a similar analysis for the subset of cases in which funds
add a benchmark that is not the most appropriate given their investment strategy. 58% of
benchmark additions have a style mismatch.18 Relative to Panel A, we make one change to
the specification of our test, namely, the y-axis in Panel B represents the difference between
the tilt of the index a fund adds and that of the index that would be most appropriate
given the fund’s investment strategy. The picture remains clear: funds add indexes with
a value tilt if the returns to value investing have been relatively low in the recent past.
This result provides confirmation that funds engage in this behavior regardless of their
current investment style. Moreover, graph (b) shows that a similar pattern exists for the
size dimension of added benchmarks. That is, we observe a negative correlation between the
direction of size mismatch of benchmark additions and the realized return of the size factor.
Table 6 contains formal regression analysis quantifying the patterns of Figure 3. Specifi-
cally, we estimate linear regressions in which the dependent variable is the style (either size
18 The definition of style mismatch is similar to that used in Section 5.2. A fund is considered to be mismatched if its chosen benchmark style does not maximize R² or minimize tracking error.
Figure 3: Do Funds Choose Benchmark Styles with Lower Past Returns?
This figure contains charts illustrating the relation between funds’ style tilts and the past relative returns
of the value and size factors. The x-axis for each graph is the 3-year returns of i) the average value index
minus the average growth index or ii) the average small stock index minus the average large stock index.
The y-axis in Panel A is the style tilt of the indexes funds add or drop. The y-axis in Panel B is the style
tilt of the index funds add minus the style tilt of the index that best reflects the funds investment strategy
(this difference is labeled value or size mismatch). Panel B contains only the subset of additions that appear
mismatched relative to a fund’s actual investment style.
[Charts omitted: scatter plots whose x-axes are the realized value minus growth return and the realized small minus large return.]
or value) tilt of the benchmarks funds add and the main independent variable of interest is
the average difference in returns for either the subsets of value and growth indexes or the
small and large capitalization indexes. First, for the columns examining funds’ value tilts,
Table 6: Style Choice as a Function of Past Returns
This table contains the results of linear regressions examining the determinants of the style tilt of
funds’ benchmark choices. In Panel A, the dependent variable is the style tilt of added benchmarks. In Panel B, the dependent variable is the style tilt of a fund’s added benchmarks minus the average
style tilt of the benchmarks that funds dropped in the same year. In Panel C, the dependent
variable is the difference in the tilts for the benchmark a fund adds and that of the fund’s most
appropriate benchmark (this difference is labeled style mismatch). The independent variable for
each panel is either the i) difference in the past returns for the value minus growth indexes or ii)
the small capitalization minus large capitalization indexes. The sample of observations used in
Panel C is only the subset of observations in which funds choose a style-mismatched benchmark.
t-statistics based on standard errors clustered by year are reported in parentheses. *p < .10, **p
< .05, ***p < .01.
the coefficients on the relative return variables are negative and statistically significant.19
19 The results based on 10-year index returns are weaker, and disappear in Panel C. Due to the high degree of mean-reversion in the value factor observed in our sample period, it would have been particularly difficult for funds to dynamically cherry-pick the value tilt of their added benchmarks so as to consistently choose benchmarks with low style returns for the entire 10-year look-back period. Moreover, some funds in our sample were less than 10 years old at the time of the benchmark changes, and they would have no reason to pick styles with low realized returns before their inception.
Second, the magnitude of this effect is economically large. For instance, using the 5-year
past return column in the “Value tilt” section of Panel A as an example, a 1% increase in
the 5-year annualized return difference between value and growth indexes is associated with
a 4.62% increase in the likelihood that a fund tilts its benchmark 1 cell towards growth.20
Lastly, although the results on the size tilt are weaker than those for value, we do find some
evidence that funds change their benchmarks to reflect the relative returns to size investing,
especially when focusing on the most recent 3 or 5 years. Consistent with Panel B(b) of
Figure 3, this effect is particularly pronounced when we restrict attention to the subsample of
instances in which a fund picks a mismatched benchmark. For example, in the 5-year return
column in the “Size mismatch” section of Panel C, we can see that a 1% change in the
annualized 5-year return difference is associated with a 2.34% increase in the likelihood a
fund tilts its benchmark 1 cell towards a large index.21
The results in this subsection provide additional evidence that funds are changing their
benchmarks based on ex-post, realized returns, and these results further distinguish our study from
those of Sensoy (2009), Cremers, Fulkerson, and Riley (2020), and Chen, Cohen, and Gurun
(2021). However, we acknowledge that because we do not observe funds’ expectations about
future index returns, we cannot rule out the possibility that these expectations also influence
benchmark change decisions.
To further investigate why funds change their benchmarks, we conduct an event study
around the disclosure of these changes. We compare the performance of the added benchmarks
to that of the best-matching index that we identified and discussed in Section 5.2
for each year around these disclosures. First, we examine whether the return differences we
document in Table 3 exist outside the performance period funds are required to report in
20 A 1-cell tilt could mean that funds move from a value index to a blend or from a blend to a growth index.
21 We repeat these analyses and find similar results when we subdivide benchmarks into “broad-based” and “more narrowly-based” indexes. These results are in the Internet Appendix.
their prospectuses. Consider a fund filing its prospectus for year t − 1, an action that takes
place at some point during year t. The fund is required to compare its performance to that
of the benchmark for the periods of (t − 10, t − 1), (t − 5, t − 1), and t − 1. If funds primarily
choose their new benchmarks based on the benchmarks’ realized returns in those periods, we
expect the return differences to be smaller for the periods before and after these windows,
say in (t − 15, t − 11) and (t, t + 4).
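This window comparison can be sketched as follows; the return differences below are hypothetical and chosen only to mimic the documented pattern, whereas the actual analysis uses each fund's added-index returns minus its best-match index returns:

```python
# Sketch of the Table 7 window comparison: average the annual return
# difference (added index minus best-match index) inside the reported
# prospectus window and in the windows just before and after it.
def window_mean(diffs, years):
    """Mean annual return difference (%) over a set of event years."""
    vals = [diffs[y] for y in years if y in diffs]
    return sum(vals) / len(vals)

# diffs[k]: hypothetical return difference (in %) in year t+k, with lower
# added-index returns inside the reported window, as documented.
diffs = {k: (-0.7 if -10 <= k <= -1 else -0.1) for k in range(-15, 5)}

inside = window_mean(diffs, range(-10, 0))    # reported window (t-10, t-1)
before = window_mean(diffs, range(-15, -10))  # (t-15, t-11)
after  = window_mean(diffs, range(0, 5))      # (t, t+4)
print(inside - before, inside - after)        # both negative
```

A negative inside-minus-outside difference indicates that the added indexes look worst precisely during the period funds must report.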
Table 7 contains the results. We calculate the returns of the added indexes minus the
returns of the best-match indexes for each fund for different time periods around a benchmark
change. In Columns 1 and 2, we compare the average annual index returns in the 10-year
prospectus period to the 5-year periods immediately before and after. We find that the
returns of the indexes that funds add are significantly lower in the prospectus period than
they are outside of it. Funds add indexes with 0.69% and 0.34% lower per annum returns
in the 10-year reporting period relative to the 5-year periods immediately before and after,
respectively. These differences are statistically significant at the 1% level.
Second, we test for discontinuities at salient prospectus cut-off points. As discussed,
funds must present a comparison of their past 1-, 5- and 10-year returns with those of the
chosen benchmarks. If funds are indeed adding indexes in order to embellish their past
performance, we expect these return differences to become larger as they get closer to time
t-1. The reasoning behind this expectation is straightforward. The t-10 index return will
only affect the current reported calculation of the fund’s 10-year benchmark return. By
contrast, the t-1 index return will affect three reported performance numbers (the past 1-
, 5-, and 10-year returns). Further, the t-1 benchmark return will continue to affect the
fund’s BAR for nine additional years, while the t-10 benchmark return will do so only for
the current year.
Columns 3 – 5 of Table 7 contain the results of these tests. We compare the index returns
at time t to those at time t-1, the returns from (t-5, t-2) to those at t-1, and those from
(t-10, t-6) to those from (t-5, t-1). Indeed, the results show that each of these differences
is negative and statistically significant at the 5% level or more. They are also economically
large; for instance, we find that the difference in returns is 0.99% lower at time t-1 than it
is at time t. In sum, the results in this section provide further evidence that funds change
their benchmarks based on past returns.
We next examine whether the group of funds changing their benchmarks are those with
the greatest incentives to do so. Specifically, we estimate logistic regressions in which the
dependent variable is Change, an indicator variable equal to 1 for years in which a fund
changes its benchmarks and 0 otherwise. We include various measures of BAR and fund
flows as our main independent variables of interest. If funds change their benchmarks to
improve the appearance of their BAR, we expect funds with lower BAR to be more likely to
do so. Further, we also include the fund’s tracking error relative to its current benchmark(s)
given our results in Table 2 which show that investors prefer lower tracking error. For
this reason, we expect that funds with higher tracking errors will be more likely to make
benchmark changes.
Additionally, because a fund changing its benchmark only superficially changes its past
BAR, we also expect funds with less sophisticated clientele to be more likely to make changes.
Prior literature has found that broker-sold funds have less sophisticated clientele and may
reflect greater conflicts of interest in the sales channel (Del Guercio, Reuter, and Tkac, 2010;
Del Guercio and Reuter, 2014; Edelen, Evans, and Kadlec, 2012; Bergstresser, Chalmers, and
Tufano, 2008). We follow Edelen, Evans, and Kadlec (2012) and consider a share class to be
broker sold if it has a load charge or a 12b-1 fee and construct a fund-level measure, % Assets
Broker Sold, by value-weighting a fund’s share classes by their assets under management.
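This measure can be sketched as follows (a minimal illustration with hypothetical share-class data; the field names are ours):

```python
# Sketch of % Assets Broker Sold: a share class counts as broker-sold if it
# has a load charge or a 12b-1 fee, and the fund-level measure asset-weights
# the fund's share classes. Share-class data below are hypothetical.
def pct_assets_broker_sold(share_classes):
    """share_classes: list of dicts with 'aum', 'load', 'fee_12b1'."""
    total = sum(sc["aum"] for sc in share_classes)
    broker = sum(sc["aum"] for sc in share_classes
                 if sc["load"] > 0 or sc["fee_12b1"] > 0)
    return broker / total

fund = [
    {"aum": 300.0, "load": 0.0575, "fee_12b1": 0.0025},  # A shares
    {"aum": 700.0, "load": 0.0,    "fee_12b1": 0.0},     # institutional
]
print(pct_assets_broker_sold(fund))  # → 0.3
```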
We present the results of these logistic regressions in Table 8. The coefficients on lagged
returns, BAR, and 5-year fund flows are negative and statistically significant. For instance, a
one standard deviation increase in a fund’s past 5-year BAR (5-year fund flows) is associated
with a 7.72% (11.28%) decrease in the likelihood that a fund changes its benchmarks. In
Column 5, we construct a measure of BAR that decays over time (e.g., each year’s return is
decayed by 20%). The coefficient on that measure is also negative and statistically significant,
which suggests that funds are more likely to change their benchmarks after recent periods
of underperforming their self-selected benchmark(s).22 Tracking error is also a statistically
and economically significant determinant of a benchmark change. A one standard deviation
increase in a fund’s tracking error relative to its current benchmarks is associated with a
16.35% increase in the likelihood that the fund changes its benchmarks. Lastly, broker-sold
funds are also more likely to change their benchmarks, consistent with the notion that the
objective of some benchmark changes may be to try to take advantage of less sophisticated
investors and conflicts of interest in the sales channel.
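The time-decayed BAR measure from Column 5 can be sketched as follows; the 20% annual decay follows the description above, while the data and the exact aggregation are illustrative assumptions:

```python
# Sketch of a time-decayed BAR measure: each additional year of lag shrinks
# that year's BAR by 20%, so recent under- or over-performance dominates.
# The aggregation (a decayed sum) and the inputs are our assumptions.
def decayed_bar(annual_bars, decay=0.20):
    """annual_bars: BAR by year, most recent first (lag 0, 1, 2, ...)."""
    return sum(bar * (1 - decay) ** lag for lag, bar in enumerate(annual_bars))

# A fund that underperformed recently (-2%, -1%) after three good years.
bars = [-2.0, -1.0, 1.0, 1.0, 1.0]
print(decayed_bar(bars))  # negative: recent underperformance dominates
```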
Combined, the results in Sections 5.2 – 5.6 provide strong evidence that funds change
their benchmarks to improve the appearance of their BAR, do so largely on the basis of the
benchmark indexes’ realized returns, and that funds with the greatest incentives to make
benchmark changes to increase BAR are the ones most likely to do so.
22 In untabulated results, we also estimated models using individual years’ lagged returns and find that funds with low BAR in more recent years are more likely to change their benchmarks.
5.7 Other Motivations for Benchmark Changes
Although we find that the majority of funds make benchmark changes that increase their
BAR, we also find that between 34% and 46% of benchmark changes do not increase BAR.
This suggests that some benchmark changes are made for other reasons. We investigate the
potential rationales for this type of benchmark change in this section.
To begin, we read funds’ prospectuses to see whether they contain any written explanation
for the actions taken. We find that funds’ prospectuses contain justification for only 25%
of benchmark changes. The top two rationales provided are i) the new benchmark better
reflecting the fund’s investment strategy and ii) concurrent changes in investment strategy.23
We note that funds’ stated rationales are consistent with the flow-based incentives implied
by our regressions in Table 2.24
We examine whether benchmark changes that do not increase BAR are accompanied
by changes in investment style or whether they improve how well a fund’s benchmark reflects its
investment strategy. We use funds’ Morningstar 3×3 investment category boxes to detect
style changes and changes to “hotter” investment styles. Consistent with the definition of
“hot” and “cold” styles used in Table 2, we define a fund as having switched to a “hotter”
style if the fund switches to an investment category in the top 20% of returns or away from an
investment category in the bottom 20% of returns from the prior year. We also calculate the
differences in a fund’s tracking error and R² using its prior and current benchmarks in year
t. If some funds that make benchmark changes do so to improve how well their benchmarks
reflect their investment strategies, we expect these benchmark changes to decrease (increase)
tracking error (R²).
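The hot-style switch flag can be sketched as follows (hypothetical category returns for the 3×3 grid; the percentile helper assumes distinct returns):

```python
# Sketch of the "switch to a hotter style" flag: a switch counts as hotter if
# the new Morningstar category is in the top 20% of prior-year returns, or the
# old category was in the bottom 20%. Category returns below are hypothetical.
def style_percentile(cat, cat_returns):
    """Fractional rank of a category's prior-year return (0 = lowest)."""
    ranked = sorted(cat_returns.values())
    return ranked.index(cat_returns[cat]) / (len(ranked) - 1)

def switched_to_hotter(old_cat, new_cat, cat_returns):
    return (style_percentile(new_cat, cat_returns) >= 0.8 or
            style_percentile(old_cat, cat_returns) <= 0.2)

# Hypothetical prior-year returns (%) for the 3x3 size/value grid.
returns = {"LV": 2, "LB": 4, "LG": 12, "MV": 1, "MB": 5,
           "MG": 10, "SV": 0, "SB": 6, "SG": 9}
print(switched_to_hotter("LB", "LG", returns))  # True: LG is a top-20% style
print(switched_to_hotter("LB", "MB", returns))  # False: neither leg qualifies
```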
We put each fund-year observation into one of three categories based on whether the
23 We tabulated this textual analysis; it can be found in Table B.1 in the Appendix. We thank an anonymous referee for suggesting that we conduct this analysis.
24 These stated rationales are also consistent with the findings of prior studies. For instance, Lynch and Musto (2003) and Cooper, Gulen, and Rau (2005) find evidence that mutual funds change their styles, or at least make investors believe they are doing so, as a way of attracting or retaining fund flows. Christoffersen and Simutin (2017) and Del Guercio and Tkac (2002) provide evidence that mutual fund investors allocate capital based on how well funds track their benchmark indexes.
fund makes a change to its benchmarks that increases BAR, a change to its benchmarks that
decreases its BAR, or no change to its benchmarks in year t. We then estimate regressions to
examine the concurrent actions of each group of funds and present the results in Table 9. The
first two columns contain the results of logistic regressions in which the dependent variables
are indicator variables, ∆Style (∆ToHotStyle), which are equal to 1 if the fund changes its
investment style (changes to a hotter investment style) in year t and 0 otherwise. The next
two columns contain the results of ordinary least squares regressions where the dependent
variables are ∆TrackingError and ∆R2 , and are as defined in the preceding paragraph. We
omit the group of funds that do not make a benchmark change so that these funds serve as
the comparison group for the coefficients on Increase BAR Change and Decrease BAR Change.
We test differences in coefficients using Wald tests.
The results of these regressions are consistent with our hypotheses for why not all funds
make benchmark changes that increase their BAR. In Columns 1 and 2, we find that funds
making benchmark changes that decrease BAR are more likely to both change their invest-
ment styles and to change to hot styles than both funds making benchmark changes that
increase BAR and funds not changing their benchmarks. These differences in probability
are both statistically and economically significant. For example, funds making benchmark
changes that decrease BAR are 71.7% more likely to change to a hotter investment style than
those making changes that increase BAR. This difference is statistically significant at the
5% level. Funds making benchmark changes that decrease BAR are also 24.9% more likely
to make any style change (i.e., a change to any style, not necessarily the “currently hot”
style) than funds making a change which increases BAR, a difference that is statistically
significant at the 10% level.25 Finally, funds making changes that decrease BAR also choose
benchmarks that lead to greater improvements in how well these benchmarks reflect their
investment strategies. Specifically, benchmark changes that decrease BAR are associated
with 0.11% larger decreases (0.91% larger increases) in tracking error (R²) than those benchmark
25 These differences are computed from the marginal effects of the logistic regressions in Columns 1 – 2.
Table 9: Benchmark Changes and Concurrent Fund Actions
This table contains the results of regressions modeling the actions funds take concurrently to
benchmark changes. Columns 1 and 2 contain the results of logistic regressions in which the
dependent variables are indicator variables, ∆Style and ∆ToHotStyle, equal to 1 if a fund changes
its investment style and changes to a hot investment style in year t, respectively, or 0 otherwise.
Columns 3 and 4 contain the results of ordinary least squares regressions with continuous dependent
variables, ∆TrackingError and ∆R², which are the differences in tracking error and R² when using
a fund’s previous and revised benchmarks for the three-year period before time t. The independent
variables of interest are indicator variables, Decrease BAR Change and Increase BAR Change,
which are equal to 1 if the fund makes a benchmark change that does not (does) improve the
appearance of its BAR, and 0 otherwise. The omitted category is funds that do not change their
benchmarks in a given year. The control variables are the fund’s return and flow for the prior three
years, % Assets broker sold, Expense Ratio, Turnover Ratio, and the natural logarithms of both
fund size and fund age as defined in prior tables. The regressions include year and strategy
fixed effects. z- or t-statistics are shown in parentheses below the coefficients and are
computed from standard errors double clustered by fund and year. The last two rows of the table
present the differences in Decrease BAR Change and Increase BAR Change and the p-values from
Wald tests of the difference in their coefficients. *p < .10; **p < .05; ***p < .01.
changes that increase BAR. Each of these differences is statistically significant at the 5%
level or higher.
The analysis in this section complements our analysis in Sections 5.2 – 5.6 to provide
a more complete picture of why funds change their benchmarks. While the majority of
benchmark changes improve the appearance of funds’ performance, we document a significant
number of changes that do not. Benchmark changes that do not increase BAR are often
made with concurrent changes in the fund’s investment style or improvements in how well
its benchmark(s) reflect its investment strategy.26 Combined, our analysis suggests that
funds’ decisions to change their benchmarks are complex and depend dynamically on their
particular circumstances and flow-based incentives.
6 Investor Responses to Benchmark Changes
In this section, we examine whether investors respond to funds’ benchmark changes with
their capital allocation decisions and, if so, whether they are adversely affected by doing so.
To begin, we plot the flows received by funds that change their benchmarks in Panel A of
Figure 4. For the sample of funds that make a benchmark change, we construct event-time
indicator variables for each year around a fund’s first benchmark change, where year t is
the reporting year of the prospectus containing the benchmark change. We plot flows and
abnormal flows. The flow coefficients come from regressing the monthly flows of funds that
make a benchmark change onto the event-time indicators and time fixed effects to control for
potential common time trends. Abnormal flow coefficients are from a model that includes
all funds in the sample and in which flows are regressed on the aforementioned indicator
variables, funds’ lagged 3-year unadjusted return, time-by-Morningstar-star fixed effects, and
the fund-level control variables described in Table 10.
Panel A of Figure 4 reveals several patterns. First, consistent with our results in Table 8,
funds experience low flows and low abnormal flows in the five-year period leading up to a
benchmark change. Second, these funds begin receiving higher flows in the years after
they make the benchmark changes. Specifically, the figure indicates that these funds begin
receiving annual abnormal flows between 0.51 – 2.64% in the five years after the benchmark
26 We also considered other possibilities for why funds could be changing their benchmarks. For instance, in Table IA.6 of the Internet Appendix, we examine whether funds change their “broad-based” benchmarks to ones that are more representative of the overall stock market. We did not find evidence consistent with this hypothesis.
change.
Panel B of Figure 4 compares the alphas and expense ratios of funds making benchmark
changes to those of funds that never change their benchmarks. For each of these two groups,
we form equally-weighted portfolios of funds and calculate the difference in their net-of-fee
three-factor Fama and French (1993) alphas using an event-time methodology. Consistent
with the results in Section 5.6, funds that change their benchmarks generate 0.39% lower
alphas in the two years prior to making benchmark changes. Importantly, these funds
continue to underperform their peers by 15 – 33 bps per year in the five years after they
change their benchmarks. These net-of-fee alpha differences are partially explained by the
fact that funds changing their benchmarks charge expense ratios that are 8 – 10 bps higher
than their peers.
Combined, the results in Panels A and B of Figure 4 suggest that investors allocate more
capital to funds that change their benchmark and that these benchmark changes are not
associated with higher future performance.
We ensure the robustness of the pattern depicted in Panel A of Figure 4 by estimating
a battery of flow-performance regressions. The dependent variable for these regressions is a
fund’s monthly percentage flow. All funds in the sample are included in the regressions, re-
gardless of whether they make a benchmark change. For funds making a benchmark change,
we retain the monthly observations that occur in the five-year periods before and after their
first benchmark change to compare the flows these funds receive after benchmark changes
to their prior flows and to those of their peers. Our independent variable of interest in
Columns 1 – 3 is Any Benchmark Change, an indicator variable equal to one for monthly
observations in the five-year period after a fund makes a benchmark change, and zero other-
wise. The regressions also include the same control variables as in Table 2 and various sets
of fixed effects. We should note that benchmark changes are not random and therefore the
effects we document here should not be interpreted as “treatment effects.”
We present the results of these regressions in Table 10. Consistent with the evidence
Figure 4: Flows and Alphas around Benchmark Changes
Panel A of this figure plots the flows and abnormal flows that funds making benchmark changes receive in
the years around the benchmark change. The year for which the benchmark is first changed is labeled year
0, the subsequent year (i.e., the year in which the prospectus change is first filed with the SEC) is labeled
year 1, and so on. The blue line with the square marks plots average annualized unadjusted flows. The red
line with circular marks shows annualized average abnormal fund flows obtained from a flow-performance
regression (see the text for additional details). Panel B shows the annualized differences in average net-of-fee
three factor Fama and French (1993) alpha and in expense ratio between benchmark-changing funds and
non-changing funds. The blue line with square marks plots the differences in the difference in alphas while
the red line with circular marks plots the inverted differences in expense ratio. The latter is inverted so that
the expense ratio line shows the degree to which the difference in fees contributes to the difference in net
alpha between the two groups.
[Panel A: annualized flows (%), plotted against years from benchmark change. Panel B: relative performance and fees of funds that make a benchmark change; y-axis: alpha and expense ratio, per annum (%); x-axis: years from first prospectus benchmark change.]
presented in Figure 4, these regressions indicate that funds attract statistically higher flows
after they change their benchmarks. Specifically, Columns 1 – 3 indicate that funds receive
2.31% to 2.68% higher flows per year in the five-year period after they make a benchmark
change. Consistent with the patterns visible in the figure, in Appendix Table C.3 we find
Table 10: Fund Flows around Benchmark Changes
This table contains the results of regressions examining the impact of benchmark changes on
monthly fund flows for the 5-year periods surrounding a benchmark change. The dependent vari-
able in each column is a fund’s monthly flow. The main independent variables of interest are
indicator variables equal to one for fund-month observations that occur in the 5-year period after a
fund makes a given type of benchmark change, and zero otherwise. 1[Any Benchmark Change] is
equal to one for any benchmark change. 1[Increase BAR Change] is equal to one for a benchmark
change that improves the appearance of a fund’s BAR. 1[Better-Fit Change] is equal to one for
a benchmark change that leads to a fund’s benchmark no longer being mismatched relative to its
Morningstar style. 1[Change to Hot Style Benchmark] is equal to one for instances in which a fund
changes to a hot style benchmark, as defined in Table 2. 1[Other Benchmark Change] is equal to
one for all changes that do not increase BAR, do not correct a mismatch, and are not shifts to hot
style benchmarks. The control variables are the fund’s past 3-year cumulative return, the natural
logarithms of fund size and fund age, expense ratio, return volatility, tracking error, indicator vari-
ables equal to 1 if the fund’s benchmarks represent hot and cold styles and 0 otherwise, and an
indicator variable equal to 1 if the fund’s benchmark does not match its Morningstar defined style
and 0 otherwise. t-statistics are shown in parentheses below the coefficients and are computed from
standard errors which are double clustered by fund and time. *p < .10; **p < .05; ***p < .01.
that the insights of the regressions presented in Table 10 are not sensitive to changing the
flow-measurement period around the benchmark change.
As discussed in Section 4, funds have flow-based incentives to choose benchmarks that
increase BAR, reflect hot styles, and accurately reflect their investment strategies. We
next examine whether benchmark changes that achieve these different objectives lead to
differential flow effects. To do so, we construct indicator variables equal to one for the five-
year period after a benchmark change if it achieves a given objective, and zero otherwise.
Increase BAR Change is equal to one if a benchmark change improves the appearance of
a fund’s BAR. Better-Fit Change is equal to one if a benchmark change leads to a fund’s
benchmark no longer being mismatched relative to its Morningstar style. Change to Hot Style
Benchmark is equal to one if a fund changes to a hot style benchmark as defined in Table 2. It
is important to note that these three indicator variables are not mutually exclusive. Finally,
Other Benchmark Change is equal to one for all benchmark changes for which the previous
three indicator variables are equal to zero. Given that these remaining benchmark changes
do not accomplish any of the aforementioned objectives, we do not expect the coefficient on
Other Benchmark Change to be statistically different from zero.
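The construction of these non-mutually-exclusive indicators, with Other Benchmark Change as the residual category, can be sketched as:

```python
# Sketch of the benchmark-change indicators for the flow regressions: the
# three objective flags may overlap, and "other" captures changes that
# accomplish none of the objectives. Argument names are ours.
def change_indicators(increases_bar, better_fit, to_hot_style):
    flags = {
        "increase_bar": int(increases_bar),
        "better_fit": int(better_fit),
        "to_hot_style": int(to_hot_style),
    }
    flags["other"] = int(not any(flags.values()))
    return flags

print(change_indicators(True, False, True))    # overlapping objectives allowed
print(change_indicators(False, False, False))  # residual "other" category
```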
The regressions in Columns 4 – 6 of Table 10 each contain one of the three indicator
variables as the main independent variable of interest. The coefficient of 0.19 on
Increase BAR Change (t-statistic = 2.81) in Column 4 indicates that benchmark changes
that increase the appearance of a fund’s BAR are associated with a 2.28% increase in annual
fund flows. The economic magnitude of this effect is consistent with that implied by the
standard flow-performance regression estimates from Table 2.27 Consistent with the cross-
sectional flow-performance results of Table 2, we also find positive and statistically significant
relations between fund flows and benchmark changes that indicate shifts to hot styles or
improve how well a fund’s benchmark reflects its investment strategy.
In Columns 7 – 10, we include all three of the aforementioned indicator variables along
with Other Benchmark Change. We continue to find positive and statistically significant
27 Specifically, the average benchmark change that increases BAR in Panel A of Table 3 improves the appearance of a fund’s 3-year BAR by 5.88%. Multiplying this increase by the annualized coefficients in Columns 2 – 8 of Table 2 implies that the average benchmark change that increases a fund’s BAR is associated with an annual flow increase between 2.22% and 3.74%.
coefficients on Increase BAR Change and Change to Hot Style Benchmark. The coefficients
on these variables are also the two largest. The coefficient on Other Benchmark Change is
statistically insignificant, suggesting that investors do not respond to benchmark changes
that do not improve BAR, improve the fit of a fund’s benchmark index, or indicate a shift
to a hot style. These inferences are robust to the inclusion of variables that capture other
major fund-level events such as a change in manager, fund name, or fund family (Columns 8
– 10), mitigating the concern that the observed flow response to improved BAR may be due
to other concurrent fund-level events. In Column 9, we control for convexity in the flow-
performance relation by including indicator variables for the decile of past performance in
which a fund falls (Chevalier and Ellison, 1997; Sirri and Tufano, 1998). In Column 10, we
include individual lags of monthly returns to account for the fact that investors have stronger
responses to more recent past performance (Barber, Huang, and Odean, 2016). Regardless of
the way we model the flow-performance relation or the set of control variables we include, we
continue to find strong evidence that investors respond to benchmark changes that increase
BAR with subsequently higher flows.
Next, we formally compare the performance of funds that change their benchmarks to
that of non-benchmark-changing funds in the years after the benchmark changes are made to
confirm the pattern documented in Panel B of Figure 4. Again, we conduct this analysis to
assess whether investors earn lower returns by allocating more capital to funds that change
their benchmarks.
We create two portfolios of funds to conduct this analysis. The first portfolio contains all
funds that change their benchmarks. The monthly returns of this portfolio are constructed
using only the returns of the benchmark-changing funds for the five years after the funds make
the changes. The second portfolio contains all funds that never change their benchmarks
during our sample period. We equally weight the monthly returns of each fund in each
portfolio to calculate portfolio returns following the literature (e.g., Cremers and Petajisto,
2009). We then compare both the gross-of-fee and net-of-fee returns of each portfolio. We
Table 11: Fund Performance After Benchmark Changes
Funds are sorted into portfolios based on whether they ever change their benchmark indexes.
Portfolio returns are calculated by equal-weighting the gross-of-fee or net-of-fee returns of the funds
within each portfolio. Funds that make a benchmark change are included in the portfolio only in the
five years following their first benchmark change. Panel A compares the post-change performance
of benchmark-changing funds to the performance of funds that never make a benchmark change.
Panel B compares the post-change performance of funds that make a BAR-increasing change to
those that make a BAR-decreasing change. Unadj. is the unadjusted portfolio return. CAPM
alpha, FF3 alpha, and FFC4 alpha are the intercepts from regressions of the difference in portfolio
returns on the factors of the given model. All intercepts are annualized and t-statistics are based
on White’s heteroskedasticity-consistent standard errors. *p < .10, **p < .05, ***p < .01.
also calculate CAPM alpha, Fama and French (1993) three-factor alpha, and Carhart (1997)
four-factor alpha as the intercepts from regressions of these portfolios’ monthly returns on
the corresponding monthly factor returns. We present the results of these tests in Panel A of
Table 11. Regardless of the factor model chosen or the inclusion of fund fees, we find strong
evidence that funds that change their benchmarks continue to underperform non-changing
funds in the years after the benchmark changes occur. These performance differences range
from 47 to 52 bps net of fees and from 36 to 41 bps gross of fees.
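The portfolio comparison can be sketched as follows (hypothetical monthly returns; we show a CAPM-style intercept for brevity, whereas the paper also reports three- and four-factor alphas):

```python
# Sketch of the Panel A test: equal-weight funds within each portfolio,
# difference the two monthly return series, and take the annualized intercept
# from a one-factor regression of the difference on the market return.
def equal_weight(fund_returns):
    """fund_returns: list of per-fund monthly return lists."""
    return [sum(month) / len(month) for month in zip(*fund_returns)]

def capm_alpha(diff, mkt):
    """Annualized intercept from an OLS regression of diff on mkt."""
    n = len(diff)
    mx, my = sum(mkt) / n, sum(diff) / n
    beta = (sum((x - mx) * (y - my) for x, y in zip(mkt, diff))
            / sum((x - mx) ** 2 for x in mkt))
    return (my - beta * mx) * 12  # annualize the monthly intercept

# Hypothetical monthly returns for changers, non-changers, and the market.
changers = equal_weight([[0.004, -0.002, 0.010, 0.001],
                         [0.004, -0.002, 0.010, 0.001]])
non_changers = equal_weight([[0.008, 0.000, 0.013, 0.005]])
mkt = [0.005, -0.003, 0.012, 0.002]
diff = [a - b for a, b in zip(changers, non_changers)]
print(capm_alpha(diff, mkt))  # negative: changers underperform
```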
Finally, in Panel B of Table 11, we compare the post-change net-of-fee performance of
funds making benchmark changes that increase BAR to that of funds making benchmark
changes that do not, using a portfolio regression methodology analogous to that described
for the tests in Panel A. We find no statistically significant difference in these groups’ performance
after the benchmark changes occur. Comparing across Panels A and B, we see that
both groups of benchmark-changing funds deliver negative net alphas that are lower than the
alphas of funds that do not make any benchmark changes. These findings are consistent with
the overarching narrative from the rest of the paper. That is, on average, funds that make
changes to their benchmark are poor-performing funds; these funds continue to underper-
form after making changes to their benchmarks; and they likely attempt to use benchmark
changes (that increase BAR, or signal a style pivot, etc.) in order to retain investors and/or
attract new flows.
7 Conclusion
In this paper, we document that mutual funds take advantage of a loophole in the SEC’s
disclosure requirements. Specifically, we find that many mutual funds systematically change
their self-designated benchmark indexes to improve the appearance of their BAR. Simply
put, funds tend to add indexes with low past returns and drop indexes with high past
returns. Investors respond to these changes by allocating more capital to these funds and
subsequently experience persistently low returns.
Our study has implications for academics and regulators alike. For academics, it provides a striking example of agency conflicts manifesting as strategic behavior in response to disclosure requirements. For regulators, although the behavior we document does
not appear to be technically illegal, it does seem to conflict with the SEC’s stated goal of pro-
viding investors with transparency and a clear measure of the value a fund creates. The SEC
has recently changed its rules to simplify funds’ disclosures to investors. Our study suggests
that these new disclosure guidelines, which make these relative performance comparisons
more salient, may adversely affect unsophisticated investors if enacted without additional
requirements for how funds report their past performance information. Regulators should
consider requiring funds to compare their past returns only to those of the benchmark in-
dexes they cited at the time the returns were generated. This requirement would effectively
close the existing loophole without limiting the ability of funds to make “legitimate” changes
to their investment strategy or benchmarks in a forward-looking sense.
References
Agarwal, V., G. D. Gay, and L. Ling. 2014. Window dressing in mutual funds. The Review
of Financial Studies 27:3133–70.
An, Y., M. Benetton, and Y. Song. 2021. Index providers: Whales behind the scenes of ETFs. Working paper, SSRN.
Barber, B. M., X. Huang, and T. Odean. 2016. Which factors matter to investors? Evidence from mutual fund flows. The Review of Financial Studies 29:2600–42.
Ben-David, I., J. Li, A. Rossi, and Y. Song. 2022. What do mutual fund investors really care
about? The Review of Financial Studies 35:1723–74.
Bergstresser, D., J. M. Chalmers, and P. Tufano. 2008. Assessing the costs and benefits of
brokers in the mutual fund industry. Review of Financial Studies 22:4129–56.
Carhart, M. M. 1997. On persistence in mutual fund performance. Journal of Finance 52:57–82.
Chen, H., L. Cohen, and U. Gurun. 2021. Don't take their word for it: The misclassification of bond mutual funds. Journal of Finance forthcoming.
Chen, H., R. Evans, and Y. Sun. 2022. Self-declared benchmarks and fund manager intent:
Cheating or competing? Working paper, SSRN.
Chevalier, J., and G. Ellison. 1997. Risk taking by mutual funds as a response to incentives.
Journal of Political Economy 105:1167–200.
Christoffersen, S. E., and M. Simutin. 2017. On the demand for high-beta stocks: Evidence
from mutual funds. The Review of Financial Studies 30:2596–620.
Cici, G., S. Gibson, and J. J. Merrick Jr. 2011. Missing the marks? Dispersion in corporate bond valuations across mutual funds. Journal of Financial Economics 101:206–26.
Cooper, M. J., H. Gulen, and P. R. Rau. 2005. Changing names with style: Mutual fund
name changes and their effects on fund flows. Journal of Finance 60:2825–58.
Cremers, K. M., and A. Petajisto. 2009. How active is your fund manager? A new measure that predicts performance. The Review of Financial Studies 22:3329–65.
Cremers, M., J. A. Fulkerson, and T. B. Riley. 2020. Benchmark discrepancies and mutual
fund performance evaluation. Journal of Financial and Quantitative Analysis forthcoming.
Dannhauser, C. D., and J. Pontiff. 2019. Flow. Working paper, Boston College.
Del Guercio, D., and J. Reuter. 2014. Mutual fund performance and the incentive to generate
alpha. Journal of Finance 69:1673–704.
Del Guercio, D., J. Reuter, and P. A. Tkac. 2010. Broker incentives and mutual fund market
segmentation. Working Paper, National Bureau of Economic Research.
Del Guercio, D., and P. A. Tkac. 2002. The determinants of the flow of funds of managed
portfolios: Mutual funds vs. pension funds. Journal of Financial and Quantitative Analysis
37:523–57.
———. 2008. Star power: The effect of Morningstar ratings on mutual fund flow. Journal
of Financial and Quantitative Analysis 43:907–36.
Edelen, R. M., R. B. Evans, and G. B. Kadlec. 2012. Disclosure and agency conflict: Evidence
from mutual fund commission bundling. Journal of Financial Economics 103:308–26.
Elton, E. J., M. J. Gruber, and C. R. Blake. 2014. The performance of separate accounts
and collective investment trusts. Review of Finance 18:1717–42.
Emin, M., and C. M. James. 2022. Bank loan funds: Discretionary values as a source of
stability. Working paper, SSRN.
Evans, R. B., and Y. Sun. 2021. Models or stars: The role of asset pricing models and
heuristics in investor risk adjustment. Review of Financial Studies 34:67–107.
Fama, E. F., and K. R. French. 1993. Common risk factors in the returns on stocks and
bonds. Journal of Financial Economics 33:3–56.
Fama, E. F., and J. D. MacBeth. 1973. Risk, return, and equilibrium: Empirical tests.
Journal of Political Economy 81:607–36.
Gaspar, J.-M., M. Massa, and P. Matos. 2006. Favoritism in mutual fund families? Evidence on strategic cross-fund subsidization. The Journal of Finance 61:73–104.
Griffin, J. M., and S. Kruger. 2023. What is forensic finance? Working paper, SSRN.
Ivković, Z., and S. Weisbenner. 2009. Individual investor mutual fund flows. Journal of
Financial Economics 92:223–37.
Kaniel, R., and R. Parham. 2017. WSJ category kings: The impact of media attention
on consumer and mutual fund investment decisions. Journal of Financial Economics
123:337–56.
Kostovetsky, L., and J. B. Warner. 2015. You’re fired! new evidence on portfolio manager
turnover and performance. Journal of Financial and Quantitative Analysis 50:729–55.
Lynch, A. W., and D. K. Musto. 2003. How investors interpret past fund returns. Journal
of Finance 58:2033–58.
Massa, M., J. Reuter, and E. Zitzewitz. 2010. When should firms share credit with employees? Evidence from anonymously managed mutual funds. Journal of Financial Economics
95:400–24.
Meier, I., and E. Schaumberg. 2004. Do funds window dress? Evidence for U.S. domestic equity mutual funds. Working paper, HEC Montreal.
Sirri, E. R., and P. Tufano. 1998. Costly search and mutual fund flows. Journal of Finance
53:1589–622.
Appendix A Benchmark Indexes
In this Appendix, we report additional details regarding fund benchmark indexes. The
full list of indexes utilized in the tests carried out in the paper is presented in Table A.1.
The statistics are based on the prospectus-year observations for funds that meet the sample
requirements. We report only indexes for which we were able to obtain a time series of
monthly returns over the duration of the sample. There are a total of 72 indexes in our
sample.
The indexes listed in Panel A have a well-defined place in the 3×3 size/value style box
based on how they are constructed. The indexes listed in Panel B are assigned a style
classification using regression analysis. Specifically, for each index in Panel B, we analyze
the univariate regression beta and R2 with respect to the indexes in Panel A. This analysis
results in unambiguous categorization for all indexes. For instance, the Dow Jones Industrial
Average index is classified as a large-blend benchmark because it loads on the S&P 500 and
the other major large-blend indexes with a beta just below one and an R2 greater than 90%.
The S&P 500 Dividend Aristocrats, the Nasdaq Dividend Achievers, and the Dow Jones
Select Dividend Index are all classified as large-value indexes. This procedure also classifies
peer-based indexes in a way that is consistent with the style implied by their name. Only
two indexes receive a style classification that merits further comment. The S&P 500 Equal
Weight index is classified as large-value because its returns covary with those of other large-
value indexes more than with those of the (value-weighted) S&P 500 or other large-blend
indexes. The Russell 1000 Equal Weight index is classified as mid-blend because its returns covary with those of mid-blend indexes more than with those of the (value-weighted) Russell 1000 or other large-blend indexes. These classification choices do not impact our
results in a meaningful way because only 9 of the benchmark changes we study involve the
S&P 500 Equal Weight and the Russell 1000 Equal Weight, which constitute less than 0.4%
of the sample of changes studied. Moreover, as discussed in the main text, the analysis
presented in the paper is robust to including only the indexes listed in Panel A.
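As a concrete illustration of this regression-based classification, the following Python sketch assigns an unclassified index the style of the Panel A anchor index against which it attains the highest univariate R2. The return series and the two anchor names are hypothetical, not the data used in the paper.

```python
# Sketch of the style-assignment procedure for Panel B indexes: regress an
# unclassified index's returns on each Panel A (style-anchored) index and
# assign the style of the anchor with the highest univariate R^2.
# All return series below are illustrative, not actual index data.

def r_squared(y, x):
    """R^2 from a univariate OLS regression of y on x (with intercept)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical monthly returns for two style anchors and one candidate index
anchors = {
    "large-blend": [0.01, -0.02, 0.03, 0.00, 0.02],
    "large-value": [0.02, -0.01, 0.01, 0.01, 0.00],
}
candidate = [0.011, -0.019, 0.028, 0.001, 0.021]  # tracks large-blend closely

assigned_style = max(anchors, key=lambda s: r_squared(candidate, anchors[s]))
```

In this toy example the candidate series closely mimics the large-blend anchor, so it is assigned the large-blend style, analogous to how the Dow Jones Industrial Average is classified in the text.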
Table A.2 shows the 5 most frequently used benchmarks by funds within each of the
Morningstar 3×3 size/value style boxes, assigned by Morningstar based on a fund’s actual
holdings. Panel A presents stock-based indexes and Panel B presents peer-based indexes.
Table A.1 (continued): Benchmark Index Usage and Changes. For each index, the columns report the number of prospectus-year observations, the number of benchmark changes involving the index, and those changes as a percentage of all benchmark changes.

         Value                              Blend                              Growth
Large
  S&P 500V         256    44   1.84    Russell 1000      891    69   2.89    S&P 500G         113    15   0.63
  Russell 200V      15     1   0.04    S&P 1500           52     3   0.13    Russell 200G      10     0   0.00
  S&P 1500V         11     3   0.13    Russell 200        21     1   0.04    S&P 1500G          1     1   0.04
Mid
  Russell MCV    1,354    36   1.51    Russell MC        903    67   2.81    Russell MCG    1,905    58   2.43
  Russell 2500V    459    20   0.84    S&P 400           710    70   2.93    Russell 2500G    483    39   1.63
  S&P 400V          60     8   0.34    Russell 2500      603    49   2.05    S&P 400G          23     6   0.25
  S&P 1000V          0     0   0.00    S&P 1000            3     0   0.00    S&P 1000G          0     0   0.00
Small
  Russell 2000V  2,037    62   2.60    Russell 2000    3,182   164   6.87    Russell 2000G  2,251    77   3.22
  S&P 600V          46     3   0.13    S&P 600           247    37   1.55    S&P 600G           8     1   0.04
  Russell MicroV    15     3   0.13    Russell Micro     127    18   0.75    Russell MicroG    69    10   0.42
  Lipper SV        187    27   1.13    Wilshire Micro     17     3   0.13    Lipper SG        307    45   1.88
Table A.2: Benchmark Indexes by Fund Investment Style
This table presents a list of the most commonly used benchmark indexes by funds in each 3×3 Morningstar category. Panel A reports (up to) the top 5 most commonly used stock-based benchmarks (i.e., those constructed using various groups of stocks), while Panel B reports (up to) the top 5 peer-based benchmarks. Peer-based indexes are averages of fund returns computed by Lipper and Morningstar.
Panel A: Stock-based indexes (count, % of category)

         Value                          Blend                          Growth
Large
  Russell 3000V    254   5.59    Russell 3000     360   8.88    Russell 3000G    449   7.79
  S&P 500V         185   4.07    Russell 1000V    255   6.29    Russell 1000     160   2.78
  Russell 1000     145   3.19    Russell 3000V     72   1.78    Russell 3000     125   2.17
Mid
  Russell MCV      815  53.03    Russell MC       327  21.04    Russell MCG    1,470  41.15
  S&P 500          147   9.56    S&P 500          264  16.99    S&P 500          479  13.41
  Russell MC       129   8.39    Russell MCV      242  15.57    S&P 400          338   9.46
  Russell 2500V    104   6.77    S&P 400          167  10.75    Russell MC       320   8.96
  Russell 3000V     96   6.25    Russell 2500     130   8.37    Russell 2500G    277   7.75
Small
  Russell 2000V  1,023  64.22    Russell 2000   1,359  51.85    Russell 2000G  1,784  53.14
  Russell 2000     323  20.28    Russell 2000V    684  26.10    Russell 2000     885  26.36
  Russell 2500V     90   5.65    Russell 2500     133   5.07    S&P 500          177   5.27
  S&P 500           38   2.39    S&P 500          120   4.58    Russell 2500G    130   3.87
  S&P 600V          29   1.82    S&P 600           86   3.28    S&P 600          100   2.98

Panel B: Peer-based indexes (count, % of category)

         Value                                   Blend                                  Growth
Large
  Lipper MultiCap Value    117  15.60    Lipper Equity Income     57   9.95    Lipper LC Core          113  11.80
  Morningstar LV            95  12.67    Morningstar LB           45   7.85    Morningstar LG           94   9.81
  Lipper Growth & Income    31   4.13    Lipper LC Value          42   7.33    Lipper MultiCap Core     71   7.41
Mid
  Lipper MC Value        3,405   5.49    Lipper MC Core        8,956  14.07    Lipper MC Growth      3,485   4.65
  Morningstar MV           575   1.47    Morningstar MB          922   3.27    Morningstar MG          681   1.93
  Lipper MC Core           256   1.84    Lipper MultiCap Core    891   2.89    Lipper MultiCap Growth  113   0.63
  Lipper MultiCap Value     15   0.04    Lipper MC Value          52   0.13    Lipper MC Core           10   0.00
  Lipper Equity Income      11   0.13    Morningstar MG           21   0.04    Lipper SC Growth          1   0.04
Small
  Lipper SC Value          122  67.03    Lipper SC Core          182  65.94    Lipper SC Growth        285  58.40
  Morningstar SV            29  15.93    Lipper SC Value          48  17.93    Lipper SC Core          117  23.98
Appendix B Prospectus Information and Examples
To gain further insight into why funds change their performance benchmarks, we read and classify the rationales funds give for changing their benchmark indexes in their prospectuses. Table B.1 summarizes the results of this analysis. Around three-quarters of changes are not accompanied by any explanation.28 For funds that did explain their choice,
the most common explanation was that the investment manager or advisor “believe the new
benchmark is more representative of the Fund’s investment strategies.” This explanation
was found for 17.11% of fund-year observations. The next two most common explanations were that the fund was changing its style (4.90%) or that it was adding an additional benchmark without intending to drop the pre-existing benchmark(s) (1.55%).
28 We note that an explanation is mandated only in cases in which a fund completely "replaces" its benchmark(s) with a new one (or new ones). Since most changes involve only the addition or the deletion of a benchmark (i.e., without a full replacement occurring within the same year), an explanation is not required in most cases; evidently, funds take advantage of this rule and provide no explanation in many of the cases in which it is not strictly mandated.
Table B.1: What Do Funds Say When They Change Their Benchmarks?
This table tabulates the frequency of each reason funds cite for changing their benchmark indexes. We manually gathered the text surrounding funds' benchmark changes by reading their prospectuses. We grouped the reasons into 9 categories and present the frequency with which each reason is cited.
This appendix presents examples of how we identified the benchmark indexes and bench-
mark changes in our sample using funds’ prospectuses. Even though the SEC requires funds
to present their benchmark indexes in a tabular format, these tables are formatted differently
across funds. We present four cases in which funds made changes to the list of performance
benchmarks that appear in their Average Annual Total Returns table. The objective of this
Appendix is solely to familiarize the reader with the source of the data. Inclusion in this
appendix does not suggest or imply misconduct on the part of the mutual fund companies,
managers, or advisors mentioned.
Figure B.1: Example 1: S&P 500 to Russell 1000 Value
In April 2018, the Advisors' Inner Circle Cambiar Opportunity Fund disclosed that it added the Russell 1000 Value Index as its new benchmark. In 2017, the Russell 1000 Value Index underperformed the S&P 500 Index by 8.17% (13.66% – 21.83%). The 10-year cumulative return differential between the two indexes is -27.40%. The next year, the fund deleted any mention of the S&P 500 Index.
2016: https://www.sec.gov/Archives/edgar/data/878719/000113542817000946/cambiarof-497k.txt
2017: https://www.sec.gov/Archives/edgar/data/878719/000139834418004250/fp0031913_497k.htm
2018: https://www.sec.gov/Archives/edgar/data/878719/000139834419004824/fp0039604_497k.htm
Figure B.2: Example 2: Fund uses Large-Blend and Small-Value Indexes.
In 2011, the Auer Growth Fund added the Russell 2000 Value Index; in 2016, the fund
dropped the Russell 2000 Value Index. The Russell 2000 Value Index return was 31.55%
in 2016, compared to a return of 11.96% for the retained index (S&P 500). An investor
presented with the 2011 prospectus and related marketing material would most likely not
have known that the fund had only begun reporting the Russell 2000 Value as a benchmark
index that year. Similarly, the post-2016 prospectus and summary prospectus do not indicate
that the fund once listed the Russell 2000 Value Index as a performance benchmark.
2010: https://www.sec.gov/Archives/edgar/data/1199046/000119312511081413/d485bpos.htm
2011: https://www.sec.gov/Archives/edgar/data/1199046/000119312512140109/d308767d485bpos.htm
2015: https://www.sec.gov/Archives/edgar/data/1199046/000119312516520569/d139037d485bpos.htm
2016: https://www.sec.gov/Archives/edgar/data/1199046/000119312517103411/d324224d485bpos.htm
Figure B.3: Example 3: Fund Reports Multiple Peer-based Benchmarks.
The Nuveen Symphony Small-Mid Cap Core Fund reported three different peer-based indexes in three years. In 2009, its self-designated peer index was the Lipper Small Cap Core Funds Index; in 2010, the Lipper Mid Cap Growth Funds Index; and in 2011, the Lipper Mid Cap Core Index. None of these tables contains the previously used peer-based benchmark indexes.
2009: https://www.sec.gov/Archives/edgar/data/1041673/000119312510044401/d497k.htm
2010: https://www.sec.gov/Archives/edgar/data/1041673/000119312511009118/d497k.htm
2011: https://www.sec.gov/Archives/edgar/data/1041673/000119312512031566/d274934d497k.htm
Figure B.4: Example 4: Fund Switching between Mid-cap and Small-cap Benchmarks
The Massachusetts Mutual Life (MML) Small Cap Equity Fund went back and forth between
the Russell 2000 and the Russell 2500 within four years.
2009: https://www.sec.gov/Archives/edgar/data/1317146/000119312510101907/d485bpos.htm
2010: https://www.sec.gov/Archives/edgar/data/1317146/000119312511118799/d485bpos.htm
Footnote: Going forward, the Fund’s performance will be compared to the Russell 2500 Index
rather than the Russell 2000 Index because the Russell 2500 Index more closely represents
the Fund’s investment strategy.
2011: https://www.sec.gov/Archives/edgar/data/1317146/000119312512199905/d278436d485bpos.htm
2012: https://www.sec.gov/Archives/edgar/data/1317146/000119312513185062/d465624d485bpos.htm
Footnote: Going forward, the Fund’s performance will be compared to the Russell 2000 Index
rather than the Russell 2500 Index because the Russell 2000 Index more closely represents
the Fund’s current investment strategy.
2013: https://www.sec.gov/Archives/edgar/data/1317146/000119312514176151/d712419d497k.htm
Appendix C Supplemental Results
This appendix reports results that are cited but not tabulated in the paper.
Table C.1 presents a robustness test of the flow-performance regressions reported in Table 2. In the regressions presented in the paper, the return-based measures (i.e., BAR, benchmark return, etc.) are computed using three years of past returns; in Table C.1, we use five years of past returns. The inference is unchanged. Tracking error becomes more negatively correlated with future monthly flows in this robustness table than in Table 2, while the other relationships remain the same. These results reinforce the insights drawn in the paper.
Table C.2 presents the results of tests similar to those of Table 3, but using benchmarks’
style returns as opposed to benchmark index returns. Columns 1 to 3 compare the style
return of added indexes to the average style return of non-added indexes. Columns 4 to 6
compare the style return of added indexes to the return of the best-matching style for each
fund. Columns 1 to 3 show that, on average, funds tend to add indexes in styles with low
past returns, while columns 4 to 6 show that funds tend to add indexes belonging to styles
that have a significantly lower return than the style best matching their investments.
Table C.3 presents robustness tests for the regressions presented in columns 3 and 7 of
Table 10. The baseline regression specifications presented in the main text of the paper used
5 years before and 5 years after a benchmark change as the flow measurement horizon. In
this robustness test, we vary the length of the pre-change period and post-change period.
Table C.1: Flow-Performance Regressions (Robustness Test)
This table replicates the regression results presented in Table 2 using 5 years as the past performance
horizon instead of 3 years. All other econometric specifications are the same. *p < .10; **p < .05;
***p < .01.
Control Variables Yes Yes Yes Yes Yes Yes Yes Yes Yes
Time F.E. Yes No No No No No No No No
Fund F.E. No No No No No No No No Yes
Time × Morningstar Star F.E. No Yes Yes Yes Yes Yes Yes Yes Yes
Lagged Monthly Returns No No No No No No Yes No No
Return Decile F.E. No No No No No Yes No No No
Adj. R2 0.107 0.137 0.135 0.137 0.140 0.143 0.174 0.138 0.227
N obs 232,951 232,825 232,825 232,825 232,825 232,825 232,825 232,951 232,933
Table C.2: Style Selection
This table contains results comparing the style component of returns of the benchmark indexes funds add to the style returns of two control groups of indexes they did not choose to add. Columns 1–3 compare the average style returns of the indexes funds add to the average across all non-added style indexes. Columns 4–6 compare the average style returns of the indexes funds add to the average of the indexes in the style that best matches the fund's investment style. To better represent random counterfactuals, index returns in the control groups are frequency-weighted. We also present the percentage of the differences that are negative. In parentheses, we present p-values calculated from binomial tests where the null hypothesis is that positive and negative differences each occur 50% of the time. *p < .10; **p < .05; ***p < .01.
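For reference, the binomial sign test described in the note can be sketched in a few lines of Python. The two-sided p-value sums the probabilities of all outcomes at least as unlikely as the observed count under a fair-coin null; the counts used below are hypothetical, not the paper's.

```python
from math import comb

# Sketch of the sign (binomial) test used for the style-selection comparisons:
# under the null, negative and positive differences are equally likely, and
# the two-sided exact p-value sums the binomial probabilities of all outcomes
# at least as extreme as the observed one. Counts below are hypothetical.

def binomial_two_sided_p(k_neg, n):
    """Two-sided exact binomial p-value for k_neg negatives in n trials, p = 0.5."""
    p_k = lambda k: comb(n, k) * 0.5 ** n
    observed = p_k(k_neg)
    return min(1.0, sum(p_k(k) for k in range(n + 1)
                        if p_k(k) <= observed + 1e-12))

p_value = binomial_two_sided_p(k_neg=42, n=60)  # e.g., 70% of differences negative
```

With 42 of 60 hypothetical differences negative, the test strongly rejects the fair-coin null; with an exactly even split, the p-value is 1.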
Table C.3: Fund Flows around Benchmark Changes (Robustness Test)
This table presents robustness tests for the baseline regressions presented in Table 10 (columns 3
and 7). The baseline regression specifications estimated in Table 10 used 5 years before and 5 years
after a benchmark change as the flow measurement horizon. Here, we report results using shorter
and longer horizons, as well as the original results (based on the ±5 horizon) as a point of reference.
C.2 Bootstrapped Statistics for Main Tests
In Section 5.2, we presented a series of tests examining how benchmark changes affect
the past returns of the indexes that funds report (see Table 3 and Table 4). Here, we provide
more details about the methodology used to compute bootstrapped p-values.
For each test, we run five different bootstrap simulations, which are summarized in
Table C.4. We calculate bootstrapped p-values that control for clustering (i.e., repetition)
of style choices within a year as well as index return dependence within and across time.
We use the “add-minus-existing” test from Panel A of Table 3 to illustrate the bootstrap
simulation procedure. For this test, we compute the difference between the returns of the
index(es) a fund adds and those of its pre-existing index(es). There are 776 fund-year
observations (with complete past return data) in this test. In the simulations, for each
instance in which a fund adds a new benchmark, we bootstrap 1-year, 5-year and 10-year
returns for both the index the fund adds as well as its existing index, and then take their
difference as we do in the actual data. We repeat this process to obtain 10,000 panels
containing 776 index return differences.
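The core resampling step can be sketched as follows. This minimal Python version corresponds roughly to the unrestricted draws of the simplest simulation: for each add event, counterfactual returns for the added and the existing index are drawn at random from a pooled sample of index returns, and their difference is recorded. The return pool is hypothetical, standing in for the actual sample of stock-based index returns.

```python
import random

# Sketch of the simplest bootstrap: for each of the 776 add events, draw
# counterfactual returns for the added and the existing index at random from
# the pooled index returns, take their difference, and repeat to build many
# simulated panels. The return pool below is illustrative, not actual data.

random.seed(42)
index_return_pool = [random.gauss(0.08, 0.15) for _ in range(500)]  # toy 1-yr returns

N_EVENTS = 776   # fund-year add events with complete past return data
N_PANELS = 1000  # the paper uses 10,000; fewer here for speed

panel_means = []
for _ in range(N_PANELS):
    diffs = [random.choice(index_return_pool) - random.choice(index_return_pool)
             for _ in range(N_EVENTS)]
    panel_means.append(sum(diffs) / N_EVENTS)

# Under the null of random index choice, the simulated mean add-minus-existing
# differences are centered near zero.
grand_mean = sum(panel_means) / N_PANELS
```

The later simulation versions replace the unrestricted `random.choice` draws with draws restricted by year, by index identity, or by style, as described in the text.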
We construct five simulated samples using different methods to address each potential
type of correlation. We create these five samples sequentially to better understand the magnitude of each source of return dependence. Panel A of Table C.4 summarizes the characteristics of each simulation. In Panel B of the same table, we report a Herfindahl–Hirschman index (HHI) measuring the frequency with which a given pair of added and existing indexes appears in the data and in the simulations (the reported HHI figures are averaged across the 10,000 simulations). We compute and report this measure at both the index and the style level. The purpose of the HHI is to verify the extent to which the simulated data reflect the same degree of "clustering" of returns as observed in the actual data.
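As an illustration, a pair-level HHI of this kind can be computed as follows, scaled to the conventional 0–10,000 range. The (added, existing) index pairs shown are hypothetical.

```python
from collections import Counter

# Sketch of the pair-level HHI used to check that simulated panels match the
# clustering of (added, existing) index pairs in the data. The HHI is the sum
# of squared shares, scaled to 0-10,000. Pairs below are hypothetical.

def hhi(pairs):
    counts = Counter(pairs)
    total = len(pairs)
    return sum((c / total) ** 2 for c in counts.values()) * 10_000

pairs = ([("Russell 1000 Value", "S&P 500")] * 3
         + [("Russell 2000", "Russell 2500")] * 2
         + [("S&P 400", "Russell 1000")])
concentration = hhi(pairs)  # higher values indicate more repeated pairs
```

A sample in which every event involves the same pair would score 10,000; here the repeated pairs yield an HHI of roughly 3,889.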
In the first version of the simulation (v1), the counterfactual returns for the added and
existing indexes are bootstrapped randomly from the entire sample of stock-based indexes
used by any of the funds in our sample and according to the frequency with which they
appear. The purpose of this simulation exercise is to serve as a baseline for the other
versions of the simulation. In the second version of the simulation (v2), we begin to account
for the return dependence observed in the data. Specifically, we account for within-year
return dependence. To do so, the returns of the added and existing benchmarks are drawn
randomly from the sample of index returns in the same year. In simulation v3, we account
for the frequency with which added and existing indexes appear in the data each year. The
purpose is to account for the fact that some of our fund-year observations are duplicates
(e.g., in a given year, index A may be added by more funds than what we would expect by
chance). To account for the clustering of indexes as observed in the data, in each simulation
run, we bootstrap by index within year (without replacement). To illustrate, suppose that
in a given year in the data, index A is the added benchmark for three funds and is also the
existing benchmark for three funds; index C is randomly drawn to be the counterfactual
index for index A, so the return of index C will replace the return of index A in all six
cases (i.e., the three “added” returns and the three “existing” returns). As verified in Panel
B of the table, this allows us to precisely match the add-existing pair HHI at the index
level observed in the data (354.2). In the fourth version of the simulation, we also account
for time dependence of index returns across years. To do so, we modify simulation v3 and
bootstrap by index across the entire panel (as opposed to within year). To illustrate, if
index C is drawn to be the counterfactual index for index A, the bootstrapped returns for
index A are those of index C across all years in that simulation. The last simulation (v5) is
designed to account for return dependence at the style level, in addition to within-year and
across-year return dependence. To do so, we bootstrap by style across the entire panel. As
in the rest of the paper, indexes are classified in one of nine styles along the 3×3 size-value
dimensions. Within each simulation, we first bootstrap at the style level, and then randomly
draw index returns within the bootstrapped styles. For instance, if in a given simulation run
the midcap-value style is drawn to be the counterfactual style for the large-blend style, the
bootstrapped returns for added and existing indexes in the large-blend style are drawn from
the returns of midcap-value indexes in the same year. As shown in Panel B of the table,
this allows us to precisely match the frequency of add-existing style pairs as observed in the
data (HHI of 687.9). The HHI at the add-existing index pair level in simulation v5 is slightly
greater than in the data (415.2 and 354.2, respectively).
Panel C of Table C.4 reproduces the standard t-test statistic for the add-minus-existing
test and also reports the bootstrapped p-values under the five simulations described above.
For brevity, in the paper, we report the most conservative bootstrapped p-values (i.e., those
from simulation v5) for each of the tests presented in Table 3 and Table 4.
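Once the simulated panels are in hand, a bootstrapped p-value of this kind can be read off the simulated distribution as in the following Python sketch. The null distribution and the observed statistic are hypothetical, and the two-sided absolute-value comparison is one common convention; the paper's exact tail definition may differ.

```python
import random

# Sketch of reading a bootstrapped p-value off the simulated distribution:
# the fraction of simulated panels whose mean add-minus-existing return
# difference is at least as extreme as the one observed in the actual data.
# All numbers are illustrative, not the paper's.

random.seed(7)
simulated_means = [random.gauss(0.0, 0.01) for _ in range(10_000)]  # null panels
actual_mean_diff = -0.025  # hypothetical observed add-minus-existing mean

# Two-sided p-value: compare absolute magnitudes against the null draws
p_boot = (sum(abs(m) >= abs(actual_mean_diff) for m in simulated_means)
          / len(simulated_means))
```

With the observed statistic about 2.5 null standard deviations from zero, only a small fraction of simulated panels are as extreme, so the test rejects at conventional levels.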