Tracking Retail Investor Activity

Ekkehart Boehmer, Charles M. Jones, Xiaoyan Zhang and Xinran Zhang*

Journal of Finance, forthcoming

Abstract

We provide an easy method to identify purchases and sales initiated by retail investors
using recent, widely available U.S. equity transactions data. Individual stocks with net buying by
retail investors outperform stocks with negative imbalances by approximately 10 basis points over
the following week. Less than half of the predictive power of marketable retail order imbalances
is attributable to order flow persistence; contrarian trading (a proxy for liquidity provision) and
public news sentiment explain little of the remaining predictability. There is suggestive (but only
suggestive) evidence that retail marketable orders contain firm-level information that is not yet
incorporated into prices.

*Ekkehart Boehmer is with Singapore Management University, LKC School of Business. Charles
M. Jones is with Columbia Business School. Xiaoyan Zhang and Xinran Zhang are with Tsinghua
University, PBC School of Finance. We thank Kevin Crotty, Larry Glosten, Frank Hatheway, Eric
Kelley, Jamil Nazarali, Jeff Pontiff, Meijun Qian, Tarun Ramadorai, Paul Tetlock; seminar
participants at Brigham Young University, University of California at Riverside, the Frankfurt
School, Humboldt University Berlin, INSEAD, Purdue University, Rice University, Singapore
Management University, Southern Methodist University, Tsinghua University, University of Utah,
National Singapore University, Nanyang Technology University; and conference audiences at the
2017 SFS Cavalcade, 2017 German Finance Association, 2017 ABFER, the 2017 CICF and the
2018 AFA for their helpful comments. The contents of this publication are solely the responsibility
of the authors. The authors have no conflicts of interest to disclose.

Electronic copy available at: https://ssrn.com/abstract=2822105


Can retail equity investors predict future stock returns? Or do they make systematic, costly
mistakes in their trading decisions? The answers to these questions are important for other market
participants looking for useful signals about future price moves, for behavioral finance researchers,
and for policymakers who need to decide whether these investors should be protected from
themselves.

Many researchers have concluded that retail equity investors are generally uninformed and,
if anything, make systematic mistakes when selecting equity investments (see, for example, Barber
and Odean (2000, 2008)). However, some more recent evidence, including Kaniel, Saar, and
Titman (2008), Barber, Odean and Zhu (2009), Kaniel, Liu, Saar, and Titman (2012), Kelley and
Tetlock (2013), Fong, Gallagher, and Lee (2014), and Barrot, Kaniel, and Sraer (2016), suggests
otherwise. These studies show that retail investors’ trading can predict future stock returns.
Unfortunately, most existing studies of retail order flow are based on proprietary datasets with
relatively small subsets of overall retail order flow. For example, Barber and Odean (2000)
analyze data from a single U.S. retail brokerage firm, while Barber and Odean (2008) examine
individual investor trading data from a total of three different retail or discount brokerage firms.
Kelley and Tetlock (2013) use data from a single U.S. wholesaler; Fong, Gallagher, and Lee (2014)
analyze data from the Australian Securities Exchange (ASX); and Barrot, Kaniel, and Sraer (2016)
use data from a single French brokerage firm. Kaniel, Saar, and Titman (2008), Kaniel et al. (2012),
and Boehmer, Jones, and Zhang (2008) use proprietary account-type data from the NYSE during
the early 2000s. During that sample period, only a small number of brokerages sent their retail
order flow to the NYSE. As a result, the NYSE’s market share of overall retail order activity was
(and has remained) quite small.

In existing work, many researchers use trade size as a proxy for retail order flow. Before
the spread of computer algorithms that “slice and dice” large institutional parent orders into a
sequence of small child orders, small trades were much more likely to come from retail customers,
while institutions were likely behind the larger reported trades. For example, Lee and
Radhakrishna (2000) use a $20,000 cutoff point to separate smaller individual trades from larger
institutional trades. More recently, Campbell, Ramadorai, and Schwartz (2009) effectively allow
these cutoff points to vary through a regression approach that is calibrated to observed quarterly
changes in institutional ownership, but they maintain the same basic assumption that small trades
are more likely to arise from individual trading. However, once algorithms become an important
feature of institutional order executions, in the early 2000s, this trade-size partition becomes far
less useful as a proxy for retail order flow. In fact, the tendency for algorithms to slice orders into
smaller and smaller pieces has progressed so far that we find that during our recent sample period,
the retail order flow that we identify actually has a slightly larger average trade size compared to
other flow.

Given the current automated and algorithm-driven market structure, researchers need an
easily implementable method to isolate retail order flow. We introduce such a measure in this
paper. As one of our main contributions, we show that our measure can identify a broad swath of
marketable retail order flow. Our measure builds on the fact that, due to regulatory restrictions in
the U.S. and the resulting institutional arrangements, retail order flow, but not institutional order
flow, can receive price improvement, measured in small fractions of a cent per share. We use this
fact to identify marketable retail price-improved orders from the TAQ data, a publicly available
data set that contains all transactions for stocks listed on a national exchange in the U.S. We do
this by identifying trades that execute at share prices with fractional pennies. Most such price-
improved transactions take place off-exchange and are reported to a Trade Reporting Facility
(TRF). Using this TRF data, we identify transactions as retail buys if the transaction price is
slightly below the round penny, and retail sells if the transaction price is slightly above the round
penny. This approach isolates retail investors’ marketable orders from institutional ones, because
institutional trades cannot receive this type of fractional penny price improvement.1 We discuss
our approach in greater detail in the data section. Notice that our retail order flow measure only
includes marketable orders, but not limit orders. Overall, we believe that our method of retail trade
identification is conservative, and we cross-validate the accuracy of our approach using a small
sample of NASDAQ TRF audit trail data.

1
In contrast, institutional trades often occur at the midpoint of the prevailing bid and ask prices. If the bid-ask spread
is an odd number of cents, the resulting midpoint trade price ends in a half-penny. Many of these midpoint trades take
place on crossing networks and are reported to the TRF. Thus, trades at or near a half-penny are likely to be from
institutions and are not assigned to the retail category.

We analyze retail marketable order flow from the U.S. equity market for six years between
January 2010 and December 2015. We find that retail investors are slightly contrarian at a weekly
horizon, and that the cross-section of weekly marketable retail order imbalances predicts the cross-
section of returns over the next several weeks, consistent with the findings of Kaniel, Saar, and
Titman (2008), Kaniel et al. (2012), Kelley and Tetlock (2013), Fong, Gallagher, and Lee (2014),
and Barrot, Kaniel, and Sraer (2016), but inconsistent with the findings of many others. The
predictability of marketable retail order flow for future returns is consistent with three hypotheses:
persistence in retail order flow, liquidity provision, and informed trading. We conduct a
decomposition exercise and separate the marketable retail order imbalance into proxies for these
components. The empirical findings show that persistence in order flow and order flow driven
by return reversals (our proxy for liquidity provision) together account for about half of the predictive
power of the marketable retail order imbalance for future returns, and we attribute the other half
to potential informed trading. We go one step further and investigate the nature of the information
embedded in retail trading. Our results show that the marketable retail order imbalance is positively
correlated with some firm-level surprises in public news, and the marketable retail order flow has
predictive power beyond public news, which suggests (but only suggests) that retail investors
might possess firm-level information that is not yet incorporated into prices. Finally, we conduct
a battery of robustness checks and provide further discussion. Our results are robust, and we
provide additional evidence that, despite the predictive power of marketable retail order flows in
the cross section, aggregate marketable retail flows cannot predict future market returns.

Given the nature of our data, our work is also related to recent studies of off-exchange
trading in the U.S. For instance, Kwan, Masulis, and McInish (2015) study the competition
between traditional stock exchanges and new dark trading venues and find that the minimum
pricing increment regulation (typically one penny) drives orders to dark pools and limits the
competitiveness of the exchanges. Battalio, Corwin, and Jennings (2016) examine make-take fees
and how brokers route order flow, and suggest that current order routing practices may not
maximize the quality of limit order execution. Menkveld, Yueshen, and Zhu (2017) directly
investigate the pecking order of trading venues in dark pools and document that investors
strategically put low-cost-low-immediacy orders in front of high-cost-high-immediacy orders.

Compared to the earlier literature on retail orders and studies of off-exchange trades, we
make three main contributions. First and most importantly, we propose a novel methodology for
identifying and signing marketable retail trades using publicly available data with substantial
coverage. Second, our empirical results show that the marketable retail trades we identify can
predict the cross-section of future stock returns. Third, we analyze the nature of the predictive
power of marketable retail order flow and show that half of its predictability is likely driven by
order imbalance persistence and liquidity provision, while the other half is consistent with
informed trading. We also track potential informed trading to different types of news and provide
some suggestive evidence on the nature of the information possessed by these retail investors.

Two studies, Kaniel, Saar, and Titman (2008) and Kelley and Tetlock (2013), study similar
questions, and are closely related to our research, but with different data and different
interpretations. For instance, using proprietary data from the NYSE between January 2000 and
December 2003, Kaniel, Saar, and Titman (2008) document that retail order flows can predict
stock returns. In addition, Kaniel, Saar and Titman (2008) examine the contemporaneous relation
between their retail order flows and stock returns. They find that the contemporaneous return is
significantly positive for stocks that retail investors sell, and negative for stocks that they buy,
which is consistent with a liquidity provision interpretation and inconsistent with the information
story. We follow their approach using our new marketable retail order flow variables. We are able
to replicate the predictive relation between retail order flow and future stock returns. However, our
results for the contemporaneous relation are different: the contemporaneous return is significantly
negative for stocks that retail investors sell, and positive for stocks that they buy, when they use
marketable orders. Our findings are more in line with an information interpretation than a liquidity
provision interpretation.

Kelley and Tetlock (2013) obtain data from a major retail wholesaler between February
2003 and December 2007. Their data allow them to separate retail orders into market orders and
limit orders. They find that retail market orders and limit orders can both predict future stock
returns, but for different reasons. The aggressive market orders can correctly predict future news,
suggesting these trades are informed, while the passive limit orders are contrarian, consistent with
the liquidity provision hypothesis. Our marketable retail order flow measure only identifies market
orders, and for these marketable orders we follow their tests and replicate their results. In addition,
we decompose our marketable retail order imbalance into components related to order flow
persistence, contrarian trading, public news and a residual, which potentially contains non-public
information. The decomposition exercise shows that public news barely contributes to the
predictive power of the marketable retail trades, and the residual part is more important. With more
recent data and wider coverage, our study provides interesting new findings, which complement
the original studies by Kaniel, Saar, and Titman (2008) and Kelley and Tetlock (2013).


The remainder of this paper is organized as follows. We describe the data and our
identification method in Section I. Section II presents our main empirical results. We provide
further discussion of the results and perform robustness and plausibility checks in Section III.
Finally, Section IV concludes the paper.

I. Identifying Retail Order Flows


As we noted in the introduction, our most important contribution is to provide a simple
new method to identify a wide swath of marketable retail order flow using publicly available equity
transaction data. We introduce data sources in Section I.A, and the institutional background in
Section I.B. Summary statistics and cross validation are reported in Sections I.C and I.D,
respectively.

A. Data Sources

From TAQ trade data, we keep only trades that occur off-exchange, with exchange code
“D.” We merge these TAQ data with stock returns and accounting data from CRSP and Compustat,
respectively. We include only the common stocks with share code 10 or 11 (which excludes mainly
ETFs, ADRs, and REITs) listed on the NYSE, NYSE MKT (formerly the Amex), and NASDAQ.
We remove low-priced stocks by requiring the minimum stock price to be $1 at the previous
month-end.
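
The screens described above can be sketched as a single predicate over trade records. This is only an illustration under stated assumptions: the record layout and the field names (`exchange_code`, `share_code`, `listing_exchange`, `prev_month_end_price`) are hypothetical and are not the actual TAQ, CRSP, or Compustat variable names.

```python
from dataclasses import dataclass

# Hypothetical record layout; the field names are illustrative assumptions,
# not actual TAQ/CRSP variable names.
@dataclass
class TradeRecord:
    exchange_code: str           # TAQ exchange code; "D" marks off-exchange trades
    share_code: int              # CRSP share code; 10 or 11 denotes common stock
    listing_exchange: str        # primary listing venue of the stock
    prev_month_end_price: float  # stock price at the previous month-end

ALLOWED_LISTINGS = {"NYSE", "NYSE MKT", "NASDAQ"}

def in_sample(rec: TradeRecord) -> bool:
    """Apply the four sample screens described in Section I.A to one record."""
    return (
        rec.exchange_code == "D"                       # off-exchange trades only
        and rec.share_code in (10, 11)                 # common stocks only
        and rec.listing_exchange in ALLOWED_LISTINGS   # NYSE, NYSE MKT, NASDAQ
        and rec.prev_month_end_price >= 1.0            # drop low-priced stocks
    )
```

For example, `in_sample(TradeRecord("D", 11, "NYSE", 25.0))` passes all four screens, while a record with exchange code `"N"` or a previous month-end price of 0.50 is dropped.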

Our sample period covers January 3, 2010 to December 31, 2015. Data on subpenny price
improvement actually extend back to 2005. In Appendix Figure A1, we show the time series from
January 2005 (the start of Regulation NMS, which established the current regulatory framework
for subpenny price improvement in the U.S.) to December 2017. We choose to study the period
from 2010 to 2015 for two reasons. First, for the first few years of Reg NMS, there is a strong
upward trend in the number of subpenny trades, possibly because an increasing number of
brokerage firms were adopting the practice of providing fractional cents of price improvement to
retail investors via internalization or wholesalers. The upward trend disappears after 2009, and the
number of subpenny trades stabilizes. Second, from 2016 to September 2018, the SEC adopted a tick size pilot program (TSPP)
that affects tick size and brokers’ ability to provide price improvement for many stocks, which
likely affects the prevalence of subpenny price improvements unevenly in the cross section.
Therefore, our main results are focused on the middle part of the data, 2010 to 2015. For each day,
we have an average of around 3,000 firms included in the sample.



B. Institutional Background and Methodology

In the U.S., most marketable equity orders initiated by retail investors do not take place on
one of the dozen or so registered exchanges. Instead, these retail orders are typically executed
either by wholesalers or via internalization, meaning that orders are filled from a broker’s own
inventory. Orders executed by wholesalers or through internalization must be publicly reported;
they are usually reported to a FINRA Trade Reporting Facility (TRF), which provides broker-
dealers with a mechanism through which to report transactions that take place off-exchange. These
TRF executions are then included in the TAQ “consolidated tape” of all reported transactions with
exchange code “D.” Many orders that are internalized or executed by wholesalers are given a small
price improvement relative to the National Best Bid or Offer (NBBO).2 For instance, wholesalers
are willing to provide a small price improvement to induce the retail trader’s broker to route the
order to the wholesaler. Internalizers, who are subject to Regulation 606T, need to show that they
execute their clients’ orders optimally, and thus also have incentives to provide price improvement
to their clients. This price improvement is typically only a small fraction of a cent. Common price
improvement amounts are 0.01, 0.1, and 0.2 cents.

Brokerage firms in the U.S. are required to provide regular summary statistics in SEC Rule
606 filings about their order routing practices for non-directed orders. A directed order instructs
the broker to execute an order on a given exchange or trading venue; a non-directed order gives
the broker discretion regarding the execution venue. The vast majority of retail orders are non-
directed. For example, Charles Schwab reports that 98.6% of their security orders during the
second quarter of 2016 were non-directed orders. The corresponding figure for TD Ameritrade is
99%. According to the Rule 606 filings by these two retail brokerage firms, more than 90% of
these orders receive price improvement.

2
As a rough estimate of the frequency of subpenny price improvement, we find in a NASDAQ subsample used for
robustness tests (introduced in Section I.D) that 60% of trades on “retail” venues receive subpenny price
improvements, with 14% reported at the halfpenny and 46% taking place at a different subpenny. For subpenny trades
that do not execute at half pennies and constitute the focus of our study, more than 99% are reported to a TRF with
exchange code “D.”

Our communications with a major retail wholesaler and a major exchange suggest that
these types of price improvement are not a feature of institutional order executions, as institutional
orders are almost never internalized or sold to wholesalers. Instead, their orders are sent to
exchanges and dark pools, and Regulation NMS prohibits these orders from having subpenny limit
prices. Thus, institutional transaction prices are usually in round pennies. The only exception
applies to midpoint trades. Reg NMS has been interpreted to allow executions at the midpoint
between the best bid and best offer. As a result, institutions are heavy users of crossing networks
and midpoint peg orders that generate transactions at this midpoint price. Since the quoted spread
is now typically one cent per share, this means that many institutional transactions are reported at
a half-penny price. In the early part of our sample, a small number of dark pools allowed some
subpenny orders and provided non-midpoint subpenny execution prices, but our results are robust
when we exclude this subperiod.3
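
The half-penny argument above can be checked with a small worked example. The sketch below is illustrative only: the function name `midpoint_subpenny` and the quote values are assumptions, and prices are passed as strings so that `Decimal` arithmetic avoids binary floating-point rounding.

```python
from decimal import Decimal

def midpoint_subpenny(bid: str, ask: str) -> Decimal:
    """Return the fraction of a penny (in [0, 1)) of the bid-ask midpoint."""
    mid = (Decimal(bid) + Decimal(ask)) / 2
    return (mid % Decimal("0.01")) * 100

# One-cent spread: the midpoint 10.005 lies exactly on the half-penny.
one_cent = midpoint_subpenny("10.00", "10.01")
# Two-cent spread: the midpoint 10.01 is a round penny.
two_cent = midpoint_subpenny("10.00", "10.02")
# Three-cent spread: the midpoint 10.015 is again on the half-penny.
three_cent = midpoint_subpenny("10.00", "10.03")
```

With these inputs, odd-cent spreads give a fraction of 0.5 and the even-cent spread gives 0, which is why midpoint executions cluster at round and half-pennies.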

Based on these institutional arrangements, identifying transactions initiated by retail
customers is fairly straightforward. Transactions with a retail seller tend to be reported on a TRF
at prices that are just above a round penny due to the small price improvement, while transactions
with a retail buyer tend to be reported on a TRF at prices just below a round penny. To be precise,
for all trades reported to a FINRA TRF (exchange code “D” in TAQ), let $P_{it}$ be the transaction
price in stock i at time t, and let $Z_{it} \equiv 100 \times \mathrm{mod}(P_{it}, 0.01)$ be the fraction of a penny associated
with that transaction price. $Z_{it}$ can take any value in the unit interval [0,1). If $Z_{it}$ is in the interval
(0,0.4), we identify it as a retail sell transaction. If $Z_{it}$ is in the interval (0.6,1), then the transaction
is coded as a retail buy transaction. To be conservative, transactions at a round penny ($Z_{it} = 0$) or
near the half-penny ($0.4 \le Z_{it} \le 0.6$) are not assigned to the retail category.
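
The classification rule above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names and return labels are hypothetical, and prices are passed as strings so that `Decimal` arithmetic avoids the rounding problems that binary floats cause when taking a remainder modulo 0.01.

```python
from decimal import Decimal

def subpenny_fraction(price: str) -> Decimal:
    """Return Z = 100 * mod(P, 0.01), the fraction of a penny in [0, 1)."""
    return (Decimal(price) % Decimal("0.01")) * 100

def classify_trade(price: str, exchange_code: str = "D") -> str:
    """Label a trade as retail_sell, retail_buy, or unassigned (labels are illustrative)."""
    if exchange_code != "D":
        return "unassigned"          # only TRF-reported trades are considered
    z = subpenny_fraction(price)
    if Decimal("0") < z < Decimal("0.4"):
        return "retail_sell"         # price just above a round penny
    if Decimal("0.6") < z < Decimal("1"):
        return "retail_buy"          # price just below a round penny
    return "unassigned"              # round penny or near the half-penny
```

For instance, a TRF trade at 10.0025 has a fraction of 0.25 and is labeled a retail sell, a trade at 10.0099 has a fraction of 0.99 and is labeled a retail buy, and trades at 10.01, 10.005, or 10.004 are left unassigned.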

As discussed above, Reg NMS requires that limit orders be priced at round pennies, so our
approach will by definition identify only marketable retail orders.4 The 606 filings by brokerage
firms are also partitioned into market and limit orders, which allows us to gauge the relative
prevalence of these two types of orders. For example, the Charles Schwab brokerage firm reports
that for the second quarter of 2016, market orders account for 50.0% of its customers’ non-directed
orders in NYSE-listed securities, while limit orders account for 45.1%, and other orders account
for the remainder. For securities listed on NASDAQ, limit orders are slightly more prevalent than
market orders at Schwab, with market orders accounting for 44.0% and limit orders, 50.7%. Note
also that non-marketable limit orders may be cancelled without being executed, so most overall
retail trading activity is likely to arise from marketable orders. Thus, our approach probably picks
up a majority of the overall retail trading activity.5

3
According to SEC litigation releases (see, for example, https://www.sec.gov/litigation/admin/2015/33-9697.pdf and
https://www.sec.gov/litigation/admin/2016/33-10013.pdf), at least two dark pool operators (Credit Suisse and UBS)
were accused of violations of Regulation NMS in accepting, ranking, and executing orders based on subpenny prices.
These alleged violations occurred through mid-2011 and were eventually settled. A back-of-the-envelope calculation
suggests that these violations could have accounted for about 0.5% of total share volume during this part of our sample
period. Since these dark pools cater to institutions, including high-frequency traders, our identification of retail flows
using subpenny trades could be “contaminated” during this period, and we cannot identify using public TAQ data
which trades are from the affected dark pools. Given the potential contamination accounts for a small part of overall
subpenny trades for our sample period, our main analysis still focuses on the full sample period from 2010 to 2015.
We conduct a subsample analysis for 2012 to 2015 in a robustness check in Section III.B, and the results are similar
to those for the whole sample. We thank the associate editor for pointing this out.
4
Marketable orders by definition demand immediacy, and according to Kelley and Tetlock (2013), market orders are
more informed than are limit orders. Thus, any predictive power from retail market orders is likely to be stronger than
that of retail limit orders and overall retail orders.

C. Summary Statistics

Table I presents summary statistics on the marketable retail orders identified by our
method. We pool observations across stocks and days, and compute the mean, standard deviation,
median, and 25th and 75th percentiles. Our sample comprises over 4.6 million stock-day
observations. For the number of shares traded per day (vol), the mean share volume is around 1.23
million, and the standard deviation is about 6.85 million shares. The average stock has 5,917 trades
each day (trd). These numbers suggest that the average trade size over this sample period is about
200 shares. Our identified marketable retail investor activity represents only a small part of the
overall trading volume. The identified average daily buy volume from marketable retail orders
(mrbvol) is 42,481 shares, and the average daily sell volume from marketable retail orders (mrsvol)
is 42,430 shares. Throughout the paper, we use “mr” to represent “marketable retail”. Thus, we
identify an average of 84,911 shares per stock-day traded by marketable retail orders, about 6.91%
of the average total shares traded each day. The average number of buy trades from marketable
retail orders (mrbtrd) each day is 110, and the average number of sell trades from marketable retail
orders (mrstrd) each day is 108. Thus, the total number of identified trades per stock-day from
marketable retail orders is 218, around 3.68% of the total number of trades. Interestingly, the buy
volumes closely match the sell volumes, and the number of buy trades closely matches the number of sell
trades, both indicating that many marketable retail trades offset each other. In terms of average
share volumes and number of trades, there is slightly more buying than selling by marketable retail
trades over our sample period.
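
The ratios quoted in this paragraph can be reproduced from the rounded figures in the text. The sketch below is only a consistency check with hypothetical variable names. Because the mean daily volume is quoted as "around 1.23 million", the volume share computed here comes out at roughly 6.90% rather than exactly the reported 6.91%.

```python
# Rounded figures quoted in the text (Table I); variable names are illustrative.
mean_daily_volume = 1_230_000      # "around 1.23 million" shares per stock-day
mean_daily_trades = 5_917          # trades per stock-day
mrbvol, mrsvol = 42_481, 42_430    # marketable retail buy / sell share volume
mrbtrd, mrstrd = 110, 108          # marketable retail buy / sell trade counts

mr_volume = mrbvol + mrsvol                               # total identified retail shares
mr_trades = mrbtrd + mrstrd                               # total identified retail trades
avg_trade_size = mean_daily_volume / mean_daily_trades    # shares per trade
volume_share = 100 * mr_volume / mean_daily_volume        # percent of share volume
trade_share = 100 * mr_trades / mean_daily_trades         # percent of trades

print(mr_volume, mr_trades)
print(round(avg_trade_size), round(volume_share, 2), round(trade_share, 2))
```

The average trade size works out to roughly 208 shares, in line with the "about 200 shares" stated above.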

5
One might wonder whether the market or marketable orders can be offset, in aggregate, by limit or non-marketable
orders in the opposite direction. This is possible. Unfortunately, we do not have data to directly check this possibility.



Information on odd lot trades (trades of fewer than 100 shares) is reported on the TRF and
on the consolidated tape beginning in December 2013 (see O’Hara, Yao, and Ye, 2014). During
the sample period from December 2013 to December 2015, for which odd lot data are available,
the daily averages of odd lot marketable retail buy and sell volumes (oddmrbvol and
oddmrsvol, respectively) are 506 and 443 shares, totaling 949 shares traded by
marketable retail investors in odd lots per average stock-day. This is about one-third of the total
odd lot share volume, at 3,027 shares. The pattern for the number of trades is similar. Prior studies
of odd lots generally find that these marketable retail-dominated orders are virtually uninformed,
so later in the paper we study odd lots separately to determine whether the information content in
odd lots executed by marketable retail trades differs from that of marketable retail round lots.

Figure 1 provides further statistics on the overall properties of our identified marketable
retail trades. Panel A presents trade sizes in dollars. For each marketable retail trade, we compute
the trade size in dollars by multiplying the number of executed shares by the transaction price. For
each year of our sample, we compute the 25th percentile, the median, and the 75th percentile of
marketable retail trade size. The median marketable retail trade size is around $8,000, and the
interquartile range is mostly between $2,000 and $25,000. Panel B reports the distribution of
subpenny prices. We separate all trades into 12 groups or bins. We separate out trades that take
place at a round penny or half penny; the other bins are each 0.1 cent wide. We pool the sample
across days and stocks, and we report the number of shares reported in the different subpenny
buckets. Not surprisingly, most of the share volume occurs at round and half-pennies, with average
stock-day share volumes of around 27,000 and 7,000, respectively. The next most prevalent
occurrence, averaging around 3,000 shares per day per stock, is a subpenny price within 0.1 cent
of a round penny. Other subpenny bins are less prevalent, with most averaging around 1,000 shares
per stock per day.
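
The 12-bin grouping used in Panel B can be sketched as follows. The text does not say how bin edges are treated, so the half-open intervals below are an assumption, and the function name and bin labels are hypothetical.

```python
from decimal import Decimal

def subpenny_bin(price: str) -> str:
    """Assign a price to one of 12 bins: round penny, half-penny, or ten 0.1-cent bins."""
    z = (Decimal(price) % Decimal("0.01")) * 100   # fraction of a penny in [0, 1)
    if z == 0:
        return "round"
    if z == Decimal("0.5"):
        return "half"
    # Assumption: each 0.1-cent bin is closed on the left, [k/10, (k+1)/10),
    # with the exact round-penny and half-penny values already removed above.
    k = int(z * 10)
    return f"[{k / 10:.1f}, {(k + 1) / 10:.1f})"
```

Under this convention a price of 10.0001 falls in the bin nearest the round penny from above, 10.0099 falls in the bin nearest the round penny from below, and 10.005 is reported separately as a half-penny.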

We measure marketable retail investors’ directional trades by computing four order
imbalance measures for each stock i on each day t:

\[ \mathit{mroibvol}(i,t) = \frac{\mathit{mrbvol}(i,t) - \mathit{mrsvol}(i,t)}{\mathit{mrbvol}(i,t) + \mathit{mrsvol}(i,t)}, \tag{1} \]

\[ \mathit{mroibtrd}(i,t) = \frac{\mathit{mrbtrd}(i,t) - \mathit{mrstrd}(i,t)}{\mathit{mrbtrd}(i,t) + \mathit{mrstrd}(i,t)}, \tag{2} \]

\[ \mathit{oddmroibvol}(i,t) = \frac{\mathit{oddmrbvol}(i,t) - \mathit{oddmrsvol}(i,t)}{\mathit{oddmrbvol}(i,t) + \mathit{oddmrsvol}(i,t)}, \tag{3} \]

\[ \mathit{oddmroibtrd}(i,t) = \frac{\mathit{oddmrbtrd}(i,t) - \mathit{oddmrstrd}(i,t)}{\mathit{oddmrbtrd}(i,t) + \mathit{oddmrstrd}(i,t)}. \tag{4} \]

The first two measures are calculated using marketable retail round lot executions between January
2010 and December 2013, and by aggregating round lot and odd lot executions thereafter, while
the last two measures are calculated using only marketable retail odd lots, and thus these latter
measures begin in December 2013 instead of January 2010.
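
All four measures share the same functional form, so one helper is enough to illustrate them. The sketch below is a minimal example with a hypothetical function name. Returning `None` when a stock-day has no identified marketable retail activity is an assumption, since the text does not say how such observations are handled.

```python
from typing import Optional

def order_imbalance(buys: float, sells: float) -> Optional[float]:
    """Compute (buys - sells) / (buys + sells) for one stock-day."""
    total = buys + sells
    if total == 0:
        # Assumption: the measure is treated as undefined when there is
        # no identified marketable retail activity on that stock-day.
        return None
    return (buys - sells) / total
```

For example, `order_imbalance(300, 100)` gives 0.5 and `order_imbalance(0, 50)` gives -1.0. By construction the measure lies between -1 (all sells) and 1 (all buys), whether the inputs are share volumes or trade counts.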

Summary statistics on the marketable retail order imbalance measures are reported at the
bottom of Table I. Across all stocks and all days, the mean order imbalance for share volume,
mroibvol, is -0.038, with a standard deviation of 0.464, and the mean order imbalance for trade,
mroibtrd, is -0.032, with a standard deviation of 0.437. The correlation between mroibtrd and
mroibvol is around 85%. Our later discussions mostly focus on mroibvol, but the results using
these two measures are quite similar given the high correlation between the two. Overall, the order
imbalance measured in shares is close to zero on average, but with sells slightly more prevalent
than buys, which is consistent with findings in Kaniel, Saar, and Titman (2008). More importantly,
the sizable standard deviation measures show that there is substantial cross-sectional variation in
the activity levels and trading direction of retail investors. The odd lot order imbalance measures
exhibit similar patterns.

In Figure 2, we plot the time-series of the cross-sectional mean, median, and 25th and 75th
percentiles of the marketable retail order imbalance measures over the six-year sample period.
Across all four order imbalance measures, the means and medians are all close to zero, while the
25th percentiles are mostly around -0.3, and the 75th percentiles are mostly around 0.2. There are
no obvious time trends or structural breaks in the time-series observations.

We extensively examine other properties of the marketable retail order imbalance
measures. To save space, we put them in the Appendix. The order imbalance measure’s daily
autocorrelations are reported in Appendix Figure A2 Panel A. The daily order imbalance measures
are mostly significantly positively correlated with their nearby lags, with a cross-firm median
correlation of 0.15. This positive autocorrelation is statistically significant over horizons up to a
few months. The persistence of marketable retail order flow is slightly higher for larger firms than
for smaller firms. Appendix Figure A2 Panel B presents the time-series correlation between the
marketable retail order imbalance measure and past returns. The results display a V-shape,
indicating that the correlation between the current marketable retail imbalance and the previous
one-day return is positive on average (which is consistent with momentum trading), while the
correlations with returns at lags of two to 30 trading days are negative (which is consistent with
contrarian trading). Finally, Appendix Table AI reports the results for the measure’s seasonality
and its relation with variables reflecting firm fundamentals.

D. Cross Validation with NASDAQ TRF Data

Our main data source is TAQ, which does not provide direct information on the direction
of the trade or the identity of the traders. We validate our marketable retail order imbalance
algorithm through a small sample of proprietary NASDAQ data. 6 The same dataset is used in
Menkveld, Yueshen, and Zhu (2017), who provide more details about the data. The NASDAQ
sample covers all intraday transactions on its TRF for 117 stocks for the month of October 2010.
The 117 stocks are chosen from different size groups, but they are generally larger than a typical
firm in TAQ.7

For each trade, the NASDAQ TRF data provide a trade direction indicator: “buy,” “sell,”
or “cross.” Our algorithm identifies all subpenny trades with subpenny prices between 0.61 and
0.99 cents inclusive as “buy” trades. We separate all subpenny “true buy” trades (as indicated in
the NASDAQ TRF data) with price below $100 into two categories: “identified buy” and
“identified sell (false identification).” We falsely identify 1.37% of all subpenny “true buy” trades
as “sell.” Similarly, our algorithm identifies all subpenny trades with subpenny prices between
0.01 and 0.39 cents inclusive as “sell” trades. In this case, we falsely identify 2.12% of all subpenny
“true sell” trades with price below $100 as “buy.” If we put identified marketable retail “buys” and
“sells” together, for stocks with a share price below $100, our subpenny approach matches the
NASDAQ TRF’s correct buy/sell sign 98.2% of the time, while the standard Lee and Ready (1991)
trade-signing algorithm gets the trade sign right 96.7% of the time. Overall, we find our algorithm
is quite accurate for trade direction identification.
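
The signing rule validated above can be sketched as follows. This is an illustration under stated assumptions rather than the exact implementation: the function name is hypothetical, prices are passed as strings so that decimal arithmetic avoids floating-point artifacts, and the open intervals (0, 0.4) and (0.6, 1) of a cent are used for the sell and buy ranges.

```python
from decimal import Decimal

def sign_subpenny_trade(price):
    """Classify a trade price (a string such as "25.0075") as 'buy', 'sell', or None."""
    cents = Decimal(price) * 100
    frac = cents % 1  # fraction of a cent, in [0, 1)
    if Decimal("0") < frac < Decimal("0.4"):
        return "sell"  # small improvement above the round penny
    if Decimal("0.6") < frac < Decimal("1"):
        return "buy"   # small improvement below the next round penny
    return None        # round-penny or near-half-cent prices are left unsigned

for p in ["25.0075", "25.0020", "25.0050", "25.00"]:
    print(p, sign_subpenny_trade(p))
```

Prices at a round penny or near the half-cent return None, matching the choice not to sign trades whose direction cannot be inferred from the subpenny fraction.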

6 We thank NASDAQ for generously providing the data.
7 The smallest market cap of the 117 NASDAQ firms is 257 million dollars, while our sample’s 40th percentile market
cap is merely 243 million dollars.

Menkveld et al. (2017) describe how incoming order flow is routed to
different types of off-exchange venues, depending on the cost and immediacy of the trade
execution. The NASDAQ TRF data identify five types of off-exchange venues: “DarkNMid,”
“DarkMid,” “DarkOther,” “DarkPrintB,” and “DarkRetail.” Our communication with a major
retail wholesaler and with NASDAQ indicates that, other than “DarkRetail,” the venue
types are a mix of all kinds of traders. Thus, the venue is not a precise indicator for a trader’s
identity, and even if one had access to the NASDAQ TRF sample for a larger cross-section over a
longer period of time, there would still be an important role for our algorithm in identifying
marketable retail buys and sells.

Our main measure in this article is order imbalance. The correlation between our order
imbalance measure and the order imbalance calculated from “DarkRetail” trades for the 117
stocks is 0.70. This correlation is less than one for three main reasons: First, our order imbalance
measure includes trades printed on the competing NYSE TRF, while the NASDAQ TRF dataset
does not; second, our order imbalance measure includes some subpenny trades from the
“DarkNMid” and “DarkMid” venues, in addition to those in “DarkRetail”; third, some
marketable retail market orders do not receive price improvement or receive a full half-cent of
price improvement, and we do not sign these trades or include them in our marketable retail sample
because we cannot be sure that we have the correct trade direction. Nevertheless, the high
correlation between our marketable retail order imbalance measure and the actual NASDAQ
“DarkRetail” venue data strongly suggests that our order imbalance measures closely reflect the
true marketable buy and sell activities of retail investors.8

8 Kelley and Tetlock (2013) compute retail order imbalance measures using data from one large wholesaler. As part
of a conference discussion of our paper, Kelley computed the retail order imbalance measure for 2007 using our
algorithm and found that the correlations between our measure and their measure ranged between 0.345 and 0.507
when defining marketable retail flow using different subpenny ranges. For instance, 0.345 is the correlation of our
measure and their measure using the number of shares for subpenny prices in the (0, 0.4) and (0.6, 1) cent intervals,
while 0.507 is the correlation between our measure and their measure using the number of trades for subpenny prices
at 0.99 cents and 0.01 cents. These correlations should be less than one because their flow comes from only one
wholesaler, while our measure comes from TRF, which covers nearly all retail order executions. We are grateful to
Eric Kelley for computing and sharing these calculations with us.

II. Empirical Results

In the data section, we measure order imbalances at the daily level to minimize the amount
of aggregation. For our main empirical results, we focus on weekly horizons to reduce the impact
of microstructure noise on our results. That is, our main variables of interest are firm-level average
marketable retail order imbalances over five-day horizons and five-day firm-level stock returns.
Blume and Stambaugh (1986) show that using the end-of-day closing price to compute daily
returns can generate an upward bias due to bid-ask bounce. Therefore, we compute two versions
of weekly returns, one by compounding CRSP daily returns, based on daily closing prices, and
one by compounding daily returns using the end-of-day bid-ask average price. We always report
the results for both types of returns but focus our attention on returns based on closing bid-ask
averages.
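
A small sketch of the weekly aggregation described above: daily returns are compounded over five trading days and daily order imbalances are averaged over the same window, producing overlapping weekly observations at daily frequency. The function names and the list-based layout for a single stock are hypothetical choices for illustration.

```python
def compound(returns):
    """Compound simple daily returns into one holding-period return."""
    total = 1.0
    for r in returns:
        total *= 1.0 + r
    return total - 1.0

def rolling_weekly(daily_ret, daily_oib, window=5):
    """Overlapping window-day compounded returns and average order imbalances.

    Element j of the result covers days j .. j + window - 1 of the inputs.
    """
    out = []
    for start in range(len(daily_ret) - window + 1):
        stop = start + window
        weekly_ret = compound(daily_ret[start:stop])
        weekly_oib = sum(daily_oib[start:stop]) / window
        out.append((weekly_ret, weekly_oib))
    return out

# Six hypothetical trading days give two overlapping five-day windows.
weekly = rolling_weekly([0.01] * 6, [0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
print(weekly)
```

Because consecutive windows share four of their five days, the resulting series is serially correlated by construction, which is why overlap-robust standard errors are used in the regressions that follow.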

We start by investigating the properties of the order imbalance measures in Section II.A.
In Section II.B, we examine whether past marketable retail order imbalance measures can predict
future stock returns using Fama-MacBeth regressions and long-short portfolios. In Section II.C,
we compare alternative hypotheses for the predictive power of marketable retail order imbalances
for future stock returns. In Section II.D, we explore the nature of the information contained in
marketable retail flow by linking it to Thomson Reuters News Analytics data.

A. What Explains Marketable Retail Investor Order Imbalances?

We start our empirical investigation by examining what drives the trading of retail
investors. Specifically, we examine how retail investors’ marketable order flow is related to past
order flow and past returns. To allow maximal time-series flexibility and focus on cross-sectional
patterns, we adopt the Fama and MacBeth (1973) two-stage estimation. At the first stage, for each
day, we estimate the following predictive regression:

$$\mathit{mroib}(i,w) = b0(w) + b1(w)'\,\mathit{ret}(i,w-1) + b2(w)'\,\mathit{controls}(i,w-1) + u1(i,w), \tag{5}$$

where we use various horizons of past weekly returns, 𝑟𝑒𝑡(𝑖, 𝑤 − 1) and various control variables
from the past to explain the order imbalance measure, 𝑚𝑟𝑜𝑖𝑏(𝑖, 𝑤), for firm i during week w. The
first-stage estimation generates a daily overlapping time-series of weekly
coefficients, {𝑏0(𝑤), 𝑏1(𝑤)′ , 𝑏2(𝑤)′ }. At the second stage, we conduct statistical inference using
the time-series of the coefficients. Because we use overlapping daily frequency data for weekly
order imbalance and return measures, the standard errors are calculated using Newey-West (1987)
with five lags.9
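
The two-stage procedure can be sketched as below. This is a generic Fama-MacBeth illustration with a Bartlett-kernel Newey-West standard error for the mean of each coefficient series, not the exact code behind the tables: the function names and simulated data are hypothetical, the lag count is left as a parameter, and no small-sample adjustment is applied.

```python
import numpy as np

def newey_west_se(series, lags):
    """Newey-West (Bartlett-kernel) standard error of the mean of a 1-D series."""
    x = np.asarray(series, dtype=float)
    T = x.size
    e = x - x.mean()
    long_run_var = e @ e / T
    for j in range(1, lags + 1):
        weight = 1.0 - j / (lags + 1.0)
        long_run_var += 2.0 * weight * (e[j:] @ e[:-j]) / T
    return np.sqrt(long_run_var / T)

def fama_macbeth(cross_sections, lags=5):
    """cross_sections: iterable of (y, X) pairs, one per period; X excludes the constant."""
    coefs = []
    for y, X in cross_sections:
        # First stage: one cross-sectional OLS regression per period.
        X1 = np.column_stack([np.ones(len(y)), X])
        beta, *_ = np.linalg.lstsq(X1, np.asarray(y, dtype=float), rcond=None)
        coefs.append(beta)
    coefs = np.array(coefs)
    # Second stage: inference on the time series of estimated coefficients.
    means = coefs.mean(axis=0)
    ses = np.array([newey_west_se(coefs[:, k], lags) for k in range(coefs.shape[1])])
    return means, ses

# Simulated panel (illustrative only): 60 periods, 200 firms, true slope of 2.
rng = np.random.default_rng(0)
panel = []
for _ in range(60):
    x = rng.normal(size=200)
    y = 1.0 + 2.0 * x + 0.1 * rng.normal(size=200)
    panel.append((y, x))

means, ses = fama_macbeth(panel, lags=5)
print(means, means / ses)
```

Estimating a separate cross-sectional regression each period lets every coefficient vary freely over time, and the second stage then asks only whether the average coefficient differs from zero.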

To explain the order imbalance over week w, from day 1 to day 5, we first include its own
lag, the past week order imbalance measure from day -4 to day 0, or mroib(w-1). We also include
past returns over three different horizons: the previous week (ret(w-1)), the previous month (ret(m-
1)), and the previous six months (ret(m-7,m-2)). For control variables, we use log market cap, log
book-to-market ratio, turnover (share volume over shares outstanding), and daily return volatility,
all computed from the previous month’s data.

The results are presented in Table II, with regressions I and II explaining the order
imbalance measured in shares, and regressions III and IV explaining the order imbalance measured
using the number of trades. In the first regression, the order imbalance using share volume,
mroibvol, has a positive correlation with its own lag, with a highly significant coefficient of 0.22,
indicating that directional marketable retail trading activity is somewhat persistent over successive
weeks, as suggested in Chordia and Subrahmanyam (2004). The coefficients for the past week,
past one month, and past six-month returns are -0.9481, -0.2778, and -0.0586, respectively. All
three coefficients are negative and highly significant, which shows that marketable retail order
flows are contrarian for horizons ranging between one week and six months. The control variables
indicate that investors tend to buy more aggressively in larger firms, growth firms, and firms with
higher turnover and higher volatility. All coefficients are highly significant. The average adjusted
R2 from the first stage cross sectional estimation is about 6%.

We use different return and order imbalance measures for regressions II, III, and IV. At the
weekly horizon, the results are similar across methods of computing returns and order imbalances.
Henceforth, we focus our discussion on bid-ask midpoint returns, which do not have bid-ask
bounce and thus exhibit a smaller degree of time-series predictability than returns based on
transaction prices. We also include CRSP returns in the results for the sake of completeness and
robustness.

The negative coefficients on past returns match some of the findings in the literature. For
example, marketable retail order flows are found to be contrarian by Kaniel, Saar, and Titman
(2008) over monthly horizons, and by Barrot, Kaniel, and Sraer (2016) over daily and weekly
horizons. In contrast, Kelley and Tetlock (2013) paint a more complex picture. They find that at
weekly horizons, marketable retail order imbalance measures are contrarian and have negative
coefficients on past returns. Over shorter (daily) horizons, however, they find that market order
imbalances actually have a positive coefficient on the lagged one-day return, which implies
momentum rather than contrarian behavior.

9 The optimal lag number is chosen using BIC.

Appendix Figure A2 Panel B plots the correlation of daily order imbalance with past
returns for the previous one to 80 trading days. Similar to Kelley and Tetlock (2013), the
correlation between the current marketable retail order imbalance and the previous-day return is
positive, indicating a momentum trading pattern on average. However, at lags between two days
and 30 days, our average correlation is slightly negative. Our results are thus consistent with the
findings of Kelley and Tetlock (2013) at short horizons and with those of other researchers at
longer horizons. 10,11

10 Lee et al. (2004) also find a mixed pattern of contrarian and momentum, using the overall market order imbalance.
They find evidence that, after up-market moves, overall trades tend to follow a momentum pattern, while overall trades
tend to be contrarian after down-market moves. We provide similar results using daily retail order flows in Internet
Appendix Table AI Panel A. When we use weekly retail order flows, the patterns both become contrarian, as shown
in Internet Appendix Table AI Panel B.
11 In addition, we examine how firm-level order imbalance measures are related to firm fundamentals, as in Chordia,
Huh, and Subrahmanyam (2007). The results shown in Internet Appendix Table AI Panel E indicate that retail order
imbalances are positively related to firm size, number of analysts, analyst dispersion, and leverage and negatively
related to past return, firm age, and book-to-market ratio.

Our results in Table II reveal two important drivers affecting weekly order imbalance. The
first is its own lag, which indicates that the marketable retail order imbalance measures are
persistent. The second is past returns, for which we find a mix of contrarian and
momentum patterns, with the contrarian pattern prevailing at weekly horizons.

B. Predicting Future Stock Returns with Marketable Retail Order Imbalance Measures
B.1. Methodology and Overall Predictive Power

Can marketable retail investors’ activity provide useful information for future stock
returns? In this section, we examine the predictive power of our order imbalance measures using
Fama-MacBeth regressions as follows:

$$\mathit{ret}(i,w) = c0(w) + c1(w)\,\mathit{mroib}(i,w-1) + c2(w)'\,\mathit{controls}(i,w-1) + u2(i,w), \tag{6}$$

where we use the marketable retail order imbalance measure from the previous week,
𝑚𝑟𝑜𝑖𝑏(𝑖, 𝑤 − 1), and various control variables to predict the next week’s stock return, 𝑟𝑒𝑡(𝑖, 𝑤),
for firm i during week w. As in the previous section, because we use overlapping daily frequency
data for weekly order imbalances and return measures, the standard errors of the time-series are
adjusted using Newey-West (1987) with five lags. If past marketable retail order imbalances
predict the cross-section of future returns in the same direction, we expect coefficient c1 to be
significantly positive. For example, if retail buys dominate retail sells for a particular stock during
a particular week, a positive c1 means that that stock’s future return tends to be above the cross-
sectional average. There are several possible explanations, none of which are mutually exclusive.
There could be persistence in marketable retail order imbalances, marketable retail orders could
be compensated for providing liquidity, or retail traders may have valuable information that is
incorporated into stock prices at some point after they trade. We examine these hypotheses in
Section II.C. If coefficient c1 turns out to be significantly negative, again there are multiple possible
explanations. These retail investors might be making systematic trading mistakes, or retail
investors might be mainly “liquidity” or “noise” traders who end up trading at temporarily
disadvantageous prices because rational but risk-averse market makers require compensation for
trading with them. Either way, a negative c1 would constitute a drag on the overall returns of these
retail investors. Finally, if c1 is insignificantly different from zero, we cannot reject the null that
our measure of marketable retail order flow is uninformative on average about the cross-section of
future stock returns.

We again include past returns as control variables, using three different horizons: the
previous week, the previous month, and the previous six months (from month m-7 to month m-2).
In addition, we include log market cap, log book-to-market ratio, turnover, and daily return
volatility, all from the previous month. We report the estimation results in Table III. In regression
I, we use the order imbalance based on share volume, mroibvol, to predict the next week’s return
based on bid-ask midpoints. The coefficient on mroibvol is 0.0009, with a t-statistic of 15.60. The
positive and significant coefficient shows that, if retail investors buy more than they sell in a given
week, the return on that stock in the next week is significantly higher. In terms of magnitude, we
report at the bottom of the table that the inter-quartile range for the mroibvol measure is 1.1888
per week. Multiplying the interquartile difference by the regression coefficient of 0.0009 generates
a weekly return difference of 10.89 basis points (or 5.66% per year) when moving from the 25th to
the 75th percentile of the mroibvol variable. The same pattern is present when we use different
order imbalance and return measures, and the weekly interquartile difference in the conditional
mean return ranges from 9.31 basis points to 11.44 basis points (4.84% to 5.94% per year).
Whether economic magnitudes are large or small depends on the beholder, but this strikes us as a
non-trivial amount of cross-sectional predictability that lasts for a relatively long time (weeks, not
days, as we will show later in the paper). Overall, past week marketable retail order imbalances
can significantly predict future returns in the correct direction.
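
The magnitude calculation above can be checked with a few lines of arithmetic. The sketch assumes simple (non-compounded) annualization over 52 weeks, which reproduces the annual figures quoted in the text; the function name is hypothetical. Note that the product of the interquartile range and the rounded coefficient 0.0009 is about 10.70 basis points, slightly below the reported 10.89, and one plausible reason (an assumption here) is that the table coefficient carries more digits than the rounded value.

```python
WEEKS_PER_YEAR = 52

def annualize_weekly_bps(weekly_bps):
    """Convert a weekly return in basis points to a simple annual percentage."""
    return weekly_bps * WEEKS_PER_YEAR / 100.0

iqr = 1.1888           # interquartile range of mroibvol, as reported
coef_rounded = 0.0009  # coefficient as quoted in the text (rounded)

product_bps = iqr * coef_rounded * 1e4  # weekly return difference in basis points
print(round(product_bps, 2))
print(round(annualize_weekly_bps(10.89), 2))
print(round(annualize_weekly_bps(9.31), 2), round(annualize_weekly_bps(11.44), 2))
```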

For the control variables, we observe negative coefficients on the previous week’s return,
which indicates weekly return reversals, and positive coefficients on the other longer-horizon
returns, which indicates momentum. Size, book-to-market, turnover, and volatility all carry the
expected signs, and most are not statistically significant. This result also confirms that the
predictability we find is not simply a manifestation of some other size, book-to-market, turnover,
or volatility anomaly. The average adjusted R2’s from the first stage cross-sectional estimation are
mostly around 3.85%.

B.2. Subgroups in the Cross Section

Our sample includes on average more than 3,000 firms each day. Is the predictive power
of marketable retail order imbalances restricted to a particular type of firm? Do informed retail
investors have preferences for particular types of firms? We investigate these questions by
analyzing various firm subgroups in this section. We first sort all firms into three groups based on
a firm or stock characteristic observed at the end of the previous month. Then, we estimate equation
(6) within each characteristic group. That is, we allow all coefficients in equation (6) to be different
within each group, which allows substantial flexibility in the possible predictive relationship across
these different groups.

To save space, we include only the results on weekly returns that are computed using the
end-of-day bid-ask average price. We first sort all stocks into three different size groups based on
market capitalization: small, medium, and large. The results are reported in Panel A of Table IV.
In the left panel, we report coefficients on mroibvol, the order imbalance computed from share
volume. When we move from the smallest one-third of firms by market cap to the largest tercile,
the coefficient on mroibvol decreases from 0.0013 to 0.0003, and the t-statistic decreases from
13.90 to 3.68. Clearly, the predictive power of marketable retail order imbalances is much stronger
for smaller firms than for larger-cap firms, but the predictability remains reliably present in all
three groups. Economically, the interquartile difference in weekly returns is 21.9 basis points for
the smallest firms (11.39% per year), and 2.6 basis points for the largest firms (1.35% per year).
The results in the right panel using order imbalance based on the number of trades (mroibtrd) are
quite similar.

In Panel B of Table IV, we sort all firms into three groups based on the previous month-
end share price. In the left panel, moving from the lowest share-price firms to the highest, the
coefficient on mroibvol decreases from 0.0014 to 0.0002, and the t-statistics go from 13.34 to 3.23.
In terms of magnitude, the interquartile weekly return difference is 20.5 basis points (10.66% per
year) for the lowest-price firms and only 2.0 basis points for the firms with the highest share price
(1.04% per year). The results are similar for specifications using mroibtrd, reported in the right
panel, with slightly lower coefficients and t-statistics. The pattern is clear: the predictive power of
marketable retail order imbalances for future returns is stronger for low-price firms.

Next, we sort all firms based on previous-month turnover, which may be a proxy for
liquidity. In the left panel, moving from the tercile of low trading activity to the firms with more
turnover, the coefficient on mroibvol decreases from 0.0011 to 0.0007, and the t-statistic decreases
from 15.60 to 4.98. In terms of magnitude, the interquartile weekly return difference is 20.5 basis
points (10.66% per year) for the firms with the lowest turnover and 6.5 basis points for the firms
with the highest turnover (3.38% per year). For specifications based on mroibtrd in the right panel,
the results are similar, with slightly lower coefficients and t-statistics. Overall, marketable retail
order imbalances better predict returns for firms with lower trading activity.

In this section, we find that the predictive power of the marketable retail order imbalance
is significant and positive for all subgroups, which shows that the predictive power is not
driven by special subgroups. However, a clear cross-sectional pattern for the predictive power is
observed. The predictive power of the marketable retail order imbalance is much stronger for small
firms and firms with low share prices and low liquidity.

B.3. Longer Horizons

The results in the previous section show that marketable retail order imbalances can predict
next week’s returns positively and significantly. It is natural to now ask whether the predictive
power is transient or persistent. If the predictive power quickly reverses, the retail investors may
be capturing price reversals; if the predictive power continues over time and then vanishes beyond
some horizon, the retail investors may be informed about information relating to firm
fundamentals. To answer this question, we extend equation (6) to longer horizons as follows:

$$\mathit{ret}(i,w+k) = c0(w) + c1(w)\,\mathit{mroib}(i,w) + c2(w)'\,\mathit{controls}(i,w) + u3(i,w+k). \tag{7}$$

That is, we use one week of order imbalance measures to predict k-week ahead returns, ret(i,w+k),
with k=1 to 12. To observe the decay of the predictive power of marketable retail order imbalance,
the return to be predicted is a weekly return over a one-week period, rather than a cumulative
return over k weeks, which is an average over all weeks involved. If marketable retail order
imbalances have only short-lived predictive power for future returns, we might observe the
coefficient c1 decrease to zero within a couple of weeks. Alternatively, if the marketable retail
order imbalance has longer predictive power, the coefficient c1 should remain statistically
significant for a longer period. In our empirical estimation, we choose k ranging from two to 12
weeks, since k = 1 corresponds to the next-week regression in equation (6).
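
To make the timing in equation (7) concrete, the following sketch pairs each weekly order imbalance with the single one-week return that ends k weeks later, rather than with a cumulative k-week return. The function name and the daily-indexed list layout for one stock are hypothetical, and five trading days per week is assumed.

```python
def pair_with_future_return(signal, weekly_ret, k, days_per_week=5):
    """Pair signal[t] with the one-week return ending k weeks after day t.

    Both inputs are daily-indexed lists of equal length for one stock:
    signal[t] is the order imbalance averaged over the 5 days ending on day t,
    and weekly_ret[t] is the return over the 5 days ending on day t.
    """
    shift = k * days_per_week
    return [(signal[t], weekly_ret[t + shift])
            for t in range(len(signal) - shift)]

# Hypothetical 12-day series, with values chosen so the alignment is easy to read.
signal = list(range(12))
weekly_ret = [10 * t for t in range(12)]
print(pair_with_future_return(signal, weekly_ret, k=1))
print(pair_with_future_return(signal, weekly_ret, k=2))
```

With k = 1 the return window starts the day after the signal window ends, and each additional week shifts the return window by another five trading days without overlapping the earlier one.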

We report the results in Table V, with results based on bid-ask average returns in Panel A,
and those based on closing transaction prices in Panel B. In Panel A, when we extend the window
from two to 12 weeks, the coefficient on mroibvol monotonically decreases from 0.00055 to
0.00007, and the coefficient on mroibtrd gradually decreases from 0.00048 to 0.00006. The
coefficients are statistically significant up to six or eight weeks ahead. The results in Panel B are
similar. There is no evidence of price reversals at any horizon. Thus, our marketable retail order
imbalances potentially capture either longer-lived information or slow information diffusion.

B.4. Long-Short Portfolios

One might wonder whether we can use marketable retail order imbalances as a signal to
form a profitable trading strategy. As discussed earlier, both mroibvol and mroibtrd are publicly
available information. In this section, we form quintile portfolios based on the previous week’s
average order imbalance and then hold the quintile portfolios for up to 12 weeks. If retail investors
on average can select the right stocks to buy and sell, then firms with higher or positive marketable
retail order imbalance would outperform firms with lower or negative order imbalance. Notice that
this exercise uses marketable retail order imbalance measures merely as a signal to predict future
stock returns, and it thus provides no information on whether retail investors with marketable
orders make profits from their own trades. We ignore trade frictions and transaction costs here,
and the results are therefore not definitive on whether outsiders can profit from these signals.

Table VI reports long-short portfolio returns, where we buy the stocks in the highest order
imbalance quintile and short the stocks in the lowest order imbalance quintile each day using the
previous 5-day marketable retail order flow measures, and hold them for the next few weeks.
Portfolio returns are value-weighted using the previous month-end market cap. Because the
holding period can be as long as 12 weeks, we report both the raw and risk-adjusted returns using
the Fama-French three-factor model. Given the usage of overlapping data, we adjust the standard
errors of the portfolio return time-series using Hansen and Hodrick (1980) standard errors with the
corresponding number of lags.12
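
The portfolio formation step can be sketched for a single formation date as follows. This is a simplified illustration, not the exact procedure behind Table VI: the function names and data layout are hypothetical, any stocks left over when the count is not divisible by five are assigned to the middle groups, and the overlapping daily rebalancing and factor adjustment are omitted.

```python
def value_weighted_return(stocks):
    """Value-weighted average return; stocks is a list of dicts with 'ret' and 'mktcap'."""
    total_cap = sum(s["mktcap"] for s in stocks)
    return sum(s["ret"] * s["mktcap"] for s in stocks) / total_cap

def long_short_return(stocks, n_groups=5):
    """Return of the highest-signal group minus the lowest-signal group."""
    ranked = sorted(stocks, key=lambda s: s["signal"])
    size = len(ranked) // n_groups
    if size == 0:
        raise ValueError("need at least n_groups stocks")
    low, high = ranked[:size], ranked[-size:]
    return value_weighted_return(high) - value_weighted_return(low)

# Ten hypothetical stocks: signal is last week's order imbalance rank,
# ret is the holding-period return, mktcap is the previous month-end market cap.
caps = [1, 3, 1, 1, 1, 1, 1, 1, 1, 1]
stocks = [{"signal": i, "ret": 0.01 * i, "mktcap": caps[i]} for i in range(10)]
ls = long_short_return(stocks)
print(ls)
```

Value weighting tilts each leg toward large firms, so a signal that is strongest among small stocks will look weaker here than in an equally weighted comparison.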

In Panel A, the long-short strategy is based on the previous week’s mroibvol, and we report
bid-ask average returns. Over a one-week horizon, the long-short portfolio return is 0.092%, or
4.78% per year annualized. The t-statistic is 2.66. Risk adjustment using the Fama-French three-
factor model does not make much difference: the weekly Fama-French alpha for the long-short
portfolio is 0.084%, with a t-statistic of 2.43. When we increase the holding horizon to 12 weeks,
the mean return becomes 0.588%, with a t-statistic of 2.09. The general pattern is that holding-
period returns (and alphas) continue to grow at a decreasing rate over time. We observe no
evidence of a reversal in returns. In terms of statistical significance, the t-statistics are significant
or marginally significant up to the 12-week horizon. These results are slightly weaker than those
of the Fama-MacBeth regressions, mainly because, in this section, we value-weight the portfolio
returns across firms, while the Fama-MacBeth approach implicitly weights each stock equally.

When we restrict portfolio formation to one of the three market cap groups, the one-week
return is 0.403% (or 20.96% per year) with a t-statistic of 9.16 for the smallest firms, while the
one-week return is 0.067% (or 3.48% per year) with a t-statistic of 1.78 for the largest firms. When
the holding horizon becomes longer, the return on the long-short strategy is still significant and
positive for up to 12 weeks for the smallest third of firms, but the results are statistically
insignificant for the largest tercile. The results in Panel B, obtained using mroibtrd, are
qualitatively similar but with smaller magnitude and lower statistical significance. This result is
expected since, as mentioned, the information provided by mroibvol is similar to but finer than that
provided by mroibtrd.13

12 For example, for a one-week holding period portfolio, we use Hansen and Hodrick (1980) with 5 lags. For a two-
week holding period, we use Hansen and Hodrick (1980) with 10 lags, and so on.

To make sure that the statistical significance in return differences is not driven by particular
sample periods, we provide a time-series plot of the return differences between quintiles 1 and 5
in Figure 3, where the portfolios are sorted on mroibvol and the holding period is one week. Over
our six-year sample period, we observe both time-variation in the return differences and positive
and negative spikes. However, most data points are positive, and the positive returns are not driven
by particular sample subperiods.

C. Alternative Hypotheses for Marketable Retail Order Imbalance Predictive Power for
Future Returns

The predictive power of marketable retail order imbalances for future stock returns is
consistent with three hypotheses. First, as in Chordia and Subrahmanyam (2004), order flows are
persistent, and, as the retail buying/selling pressure is persistent, this could lead directly to the
predictability of future returns. Second, as in Kaniel, Saar, and Titman (2008), these retail traders
are contrarian at weekly horizons, and since their contrarian trading provides liquidity to the
market, their trades might positively predict future returns. Third, as in Kelley and Tetlock (2013),
retail investors, especially the aggressive investors using market orders, may have valuable
information about the firm, and thus their trading could correctly predict the direction of future
returns. The above three hypotheses are not mutually exclusive. In Section II.C.1, we conduct a simple
decomposition to separate alternative hypotheses. In Section II.C.2, we provide more evidence
regarding the liquidity provision hypothesis.

13 We also conduct a rough calculation that includes transaction costs. Frazzini, Israel and Moskowitz (2018) state
that a reasonable estimate of the one-way transaction cost on value-weighted US stocks is about 12 basis points for
the period January 2006 to June 2016. To be conservative, we assume that for each rebalance, we change 100% of the
positions. That is, each rebalance we incur a 2*12 bps = 24 bps rebalance cost. For instance, for a weekly rebalance
or 1 week holding period, each year’s transaction cost would be 52 rebalances * 2 * 12bps = 1248 bps. After this
drastic transaction cost adjustment, the mean returns and alphas remain positive and significant for the small firms
over all holding horizons. For the medium and big firms, the mean returns and alphas stay positive for longer holding
periods, but they are mostly insignificant.

C.1. Two-Stage Decomposition

To distinguish among these alternative hypotheses for the predictive relation between
previous period marketable retail order imbalance and next period stock return as in equation (6),
we adopt a two-step decomposition procedure. In the first step, we decompose the independent
variable in equation (6), the previous week’s marketable retail order imbalance mroib(w-1), into
three components, with the following cross-sectional regressions for each week w-1:

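\[
\mathit{mroib}(i, w-1) = d0(w-1) + d1(w-1)\,\mathit{mroib}(i, w-2) + d2(w-1)'\,\mathit{ret}(i, w-2) + u4(i, w-1). \tag{8}
\]

From the regressions for each week w-1, we obtain the time series of coefficient estimates, $\{\widehat{d0}(w-1), \widehat{d1}(w-1), \widehat{d2}(w-1)'\}$. Next, we calculate the three components of mroib(i,w-1) as follows:

\[
\begin{aligned}
\widehat{\mathit{mroib}}^{\,persistence}_{i,w-1} &= \widehat{d1}(w-1)\,\mathit{mroib}(i, w-2), \\
\widehat{\mathit{mroib}}^{\,contrarian}_{i,w-1} &= \widehat{d2}(w-1)'\,\mathit{ret}(i, w-2), \\
\widehat{\mathit{mroib}}^{\,other}_{i,w-1} &= \widehat{u4}(i, w-1) + \widehat{d0}(w-1).
\end{aligned} \tag{9}
\]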

From equations (8) and (9), we know that

\[
\mathit{mroib}(i, w-1) = \widehat{\mathit{mroib}}^{\,persistence}_{i,w-1} + \widehat{\mathit{mroib}}^{\,contrarian}_{i,w-1} + \widehat{\mathit{mroib}}^{\,other}_{i,w-1}. \tag{10}
\]

That is, we denote the part of 𝑚𝑟𝑜𝑖𝑏(𝑖, 𝑤 − 1) related to the past order imbalance as the
“persistence,” which is related to the price pressure hypothesis. The part related to past returns
over different horizons is denoted as “contrarian,” which relates to the liquidity provision
hypothesis. After we take out predictability due to “persistence” and “contrarian” trading, we
denote the residual part as “other,” which potentially contains other relevant information about
future returns. Note that this empirical decomposition is an identity. When we add up these three
components, by definition we obtain the explanatory variable 𝑚𝑟𝑜𝑖𝑏(𝑖, 𝑤 − 1) in our basic
predictive regression in equation (6).
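The first-stage decomposition can be sketched in a few lines of code. This is a minimal illustration on simulated data, not the estimation code behind the reported tables: the function name `decompose_week`, the simulated inputs, and the use of plain OLS via NumPy are all assumptions made for the example.

```python
import numpy as np


def decompose_week(mroib_lag1, mroib_lag2, ret_lags):
    """Split one week's cross-section of mroib(i, w-1) into three components.

    mroib_lag1 : (N,) array, mroib(i, w-1), the variable being decomposed
    mroib_lag2 : (N,) array, mroib(i, w-2)
    ret_lags   : (N, K) array of past returns over K horizons, ret(i, w-2)
    """
    n = mroib_lag1.shape[0]
    X = np.column_stack([np.ones(n), mroib_lag2, ret_lags])
    coef, *_ = np.linalg.lstsq(X, mroib_lag1, rcond=None)  # cross-sectional OLS
    d0, d1, d2 = coef[0], coef[1], coef[2:]
    resid = mroib_lag1 - X @ coef
    persistence = d1 * mroib_lag2      # part explained by lagged order imbalance
    contrarian = ret_lags @ d2         # part explained by past returns
    other = resid + d0                 # residual plus intercept
    return persistence, contrarian, other


# Simulated cross-section for one week (hypothetical numbers).
rng = np.random.default_rng(0)
n_firms = 500
mroib_lag2 = rng.normal(size=n_firms)
ret_lags = rng.normal(scale=0.05, size=(n_firms, 3))
mroib_lag1 = 0.2 * mroib_lag2 - 0.9 * ret_lags[:, 0] + rng.normal(size=n_firms)

persistence, contrarian, other = decompose_week(mroib_lag1, mroib_lag2, ret_lags)

# The three components add back up to the original variable by construction.
identity_holds = np.allclose(persistence + contrarian + other, mroib_lag1)
print(identity_holds)
```

Because the intercept is folded into the "other" component, the sum of the three pieces reproduces the original variable exactly, which is the identity described in the text.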

At the second stage, we replace mroib(i,w-1) in equation (6) by its three
components, and we estimate the following regression using the Fama-MacBeth methodology:

\[
\begin{aligned}
\mathit{ret}(i, w) = {} & e0(w) + e1(w)\,\widehat{\mathit{mroib}}^{\,persistence}_{i,w-1} + e2(w)\,\widehat{\mathit{mroib}}^{\,contrarian}_{i,w-1} \\
& + e3(w)\,\widehat{\mathit{mroib}}^{\,other}_{i,w-1} + e4(w)'\,\mathit{controls}(i, w-1) + u5(i, w).
\end{aligned} \tag{11}
\]

Since we decompose the original order imbalance measure 𝑚𝑟𝑜𝑖𝑏(𝑖, 𝑤 − 1) into three parts,
related to order flow persistence, a contrarian trading pattern, and the residual, the coefficient
estimates in equation (11) reveal how each component of 𝑚𝑟𝑜𝑖𝑏(𝑖, 𝑤 − 1) helps to predict future
stock returns. The advantage of the two-stage decomposition approach is that it includes
components of 𝑚𝑟𝑜𝑖𝑏(𝑖, 𝑤 − 1) from alternative hypotheses in a unified and internally consistent
empirical framework. The caveat of this approach is that without a structural model, interpreting
the results may be more difficult. In particular, we must make empirical assumptions on proxies
for the persistence and contrarian components. These empirical assumptions seem to us to be
reasonable, but we still need to be cautious that the interpretation of the results depends on the
validity of our empirical assumptions.
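For readers who want to see the mechanics of the second stage, the following is a minimal sketch of a Fama-MacBeth estimation on simulated data. The function name `fama_macbeth`, the simulated slopes, and the simple time-series standard errors (no Newey-West or other autocorrelation adjustment) are assumptions for illustration and may differ from the exact procedure used for the reported tables.

```python
import numpy as np


def fama_macbeth(y_by_week, X_by_week):
    """Weekly cross-sectional OLS followed by time-series averaging.

    y_by_week : list of (N_w,) arrays, e.g. ret(i, w)
    X_by_week : list of (N_w, K) arrays of regressors measured at w-1
                (no constant column; an intercept is added here)
    Returns the mean coefficients and simple t-statistics.
    """
    betas = []
    for y, X in zip(y_by_week, X_by_week):
        Xc = np.column_stack([np.ones(len(y)), X])
        b, *_ = np.linalg.lstsq(Xc, y, rcond=None)
        betas.append(b)
    betas = np.asarray(betas)                        # shape (T, K + 1)
    mean = betas.mean(axis=0)
    se = betas.std(axis=0, ddof=1) / np.sqrt(len(betas))
    return mean, mean / se


# Simulated panel: three regressors standing in for the persistence,
# contrarian, and other components (hypothetical slopes).
rng = np.random.default_rng(1)
true_slopes = np.array([0.003, 0.0, 0.001])
ys, Xs = [], []
for _ in range(200):                                 # 200 hypothetical weeks
    X = rng.normal(size=(300, 3))                    # 300 hypothetical firms
    ys.append(0.001 + X @ true_slopes + rng.normal(scale=0.02, size=300))
    Xs.append(X)

coef, tstat = fama_macbeth(ys, Xs)
print(coef, tstat)
```

The first element of `coef` is the average intercept, and the remaining elements are the average slopes on each component.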

We report the decomposition results in Table VII. Panel A presents the first-stage
estimates from equation (8), which are quite similar to those reported in Table II. Take the first
regression as an example. The order imbalance measure, mroibvol, has a highly significant and
positive coefficient of 0.22 on its own lag, which indicates order persistence. In terms of past
returns, the coefficients for the past week, past month, and past six-month returns are -0.9286,
-0.2029, and -0.0267, respectively, all implying contrarian trading patterns.

After we decompose the previous week’s order imbalance into “persistence,” “contrarian,”
and “other,” we include them together to predict future stock returns, as in equation (11). In the
first regression, we use the past week’s mroibvol to predict future bid-ask return. The coefficient
estimate on mroib(persistence) is 0.0027, with a t-statistic of 8.75, which implies that price
pressure significantly and positively contributes to the predictive power of the marketable retail
flow. The coefficient estimate on mroib(contrarian) is -0.0044, and it is insignificantly different
from zero, implying that we cannot reject the null hypothesis that the contrarian component does
not contribute significantly to the predictive power of marketable retail order imbalances. Finally,



for the mroib(other) component, the coefficient is 0.0008, with a strongly significant t-statistic of
14.47.14

In terms of economic magnitude, we compute the interquartile range of all three
components of the order imbalance measure. For the mroib(persistence), if we move from the 25th
percentile firm to the 75th percentile firm, the difference in future one-week stock return is 0.0688%
(3.58% per year). For the mroib(other) variable, if we move from the 25th percentile firm to the
75th percentile firm, the difference in future one-week stock return is 0.0915% (4.76% per year).
For the mroib(contrarian) measure, the sign is the opposite and has no statistical significance. The
results in other specifications are quite similar.15
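The economic-magnitude calculation amounts to multiplying a coefficient by the interquartile range of the corresponding component and then annualizing. The sketch below illustrates this under stated assumptions: the function name `iqr_effect` and the toy component values are hypothetical, and the annualization by a factor of 52 is inferred from the reported weekly and annual figures rather than stated explicitly in the text.

```python
import numpy as np

WEEKS_PER_YEAR = 52


def iqr_effect(coef, component):
    """Return the weekly and annualized return difference between the
    75th and 25th percentile firms for one component."""
    q25, q75 = np.percentile(component, [25, 75])
    weekly = coef * (q75 - q25)
    return weekly, weekly * WEEKS_PER_YEAR


# Toy component values and coefficient (hypothetical, for illustration only).
toy_component = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
weekly, annual = iqr_effect(0.5, toy_component)

# Annualizing the weekly differences reported in the text (in percent).
annual_persistence = round(0.0688 * WEEKS_PER_YEAR, 2)
annual_other = round(0.0915 * WEEKS_PER_YEAR, 2)
print(annual_persistence, annual_other)
```

Multiplying the reported weekly differences by 52 matches the annual figures given in parentheses above.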

14
We also try to include the past order imbalance as a control variable for the second stage estimation. We cannot
directly include mroib(w-1) or mroib(w-2), because doing so would create collinearity issues. Therefore, we include either
mroib(w-3) or mroib(m-1) to control for past mroib. These results are presented in Appendix Table AII. We also
consider using the contemporaneous return in equation (8) rather than lagged returns, and these results are reported in
Appendix Table AIII. No matter which specification we use, the main results stay quite similar to those in Table VII.
15
All three components of the marketable order imbalance measure in equation (11) are measured at week w-1 so that
we are using this identity to decompose the predictive relation between mroib(w-1) and ret(w) in equation (6). If we
are willing to depart from this predictive decomposition framework, we can examine other relationships. For example,
we might want to examine the contemporaneous relation between mroib(w) and ret(w), and use the three fitted values
from week w rather than from week w-1. In this case, the first stage estimation for week w becomes
\[
\mathit{mroib}(i, w) = d0(w) + d1(w)\,\mathit{mroib}(i, w-1) + d2(w)'\,\mathit{ret}(i, w-1) + u4(i, w). \tag{8'}
\]
These are period-by-period cross-sectional regressions, which means that when we estimate coefficients
$\{\widehat{d0}(w), \widehat{d1}(w), \widehat{d2}(w)'\}$ in equation (8’), they are estimated using information from both week w and week w-1.
From this first stage estimation, we can define the relevant components as:
\[
\begin{aligned}
\widehat{\mathit{mroib}}^{\,persistence}_{i,w} &= \widehat{d1}(w)\,\mathit{mroib}(i, w-1), \\
\widehat{\mathit{mroib}}^{\,contrarian}_{i,w} &= \widehat{d2}(w)'\,\mathit{ret}(i, w-1), \\
\widehat{\mathit{mroib}}^{\,other}_{i,w} &= \widehat{u4}(i, w) + \widehat{d0}(w),
\end{aligned} \tag{9'}
\]
so that
\[
\mathit{mroib}(i, w) = \widehat{\mathit{mroib}}^{\,persistence}_{i,w} + \widehat{\mathit{mroib}}^{\,contrarian}_{i,w} + \widehat{\mathit{mroib}}^{\,other}_{i,w}. \tag{10'}
\]
Notice that coefficients $\{\widehat{d0}(w), \widehat{d1}(w), \widehat{d2}(w)'\}$ are estimated using information from week w, and thus
$\widehat{\mathit{mroib}}^{\,persistence}_{i,w}$, $\widehat{\mathit{mroib}}^{\,contrarian}_{i,w}$, and $\widehat{\mathit{mroib}}^{\,other}_{i,w}$ all use information from week w. Then the second stage
estimation for the contemporaneous relation between return and marketable retail order imbalance becomes
\[
\begin{aligned}
\mathit{ret}(i, w) = {} & e0(w) + e1(w)\,\widehat{\mathit{mroib}}^{\,persistence}_{i,w} + e2(w)\,\widehat{\mathit{mroib}}^{\,contrarian}_{i,w} \\
& + e3(w)\,\widehat{\mathit{mroib}}^{\,other}_{i,w} + e4(w)'\,\mathit{controls}(i, w-1) + u5(i, w).
\end{aligned} \tag{11'}
\]
In comparison with equation (11), equation (11’) gives us something closer to an estimate of the contemporaneous
relation between components of 𝑚𝑟𝑜𝑖𝑏(𝑖, 𝑤) and 𝑟𝑒𝑡(𝑖, 𝑤), rather than a predictive relation between 𝑚𝑟𝑜𝑖𝑏(𝑖, 𝑤 −
1) and 𝑟𝑒𝑡(𝑖, 𝑤). We present the estimation results in Appendix Table AIV. Appendix Table AIV Panel A reports the
estimation results of equation (8’), and the results are quite similar to those in Table II and Panel A of Table VII.
Appendix Table AIV Panel B shows the estimation results for equation (11’). For instance, in regression I, the coefficient on mroib(w, persistence) is
0.0045 with a t-statistic of 14.26, while the coefficient on mroib(w-1, persistence) in Table VII Panel B is 0.0027 with
a t-statistic of 8.75. That is, the coefficient on past order persistence becomes larger and more significant in equation
(11’) than in equation (11), indicating that contemporaneous price pressure (proxied by lagged order imbalance) is more
important than lagged price pressure in equation (11). The coefficient on mroib(w, contrarian) stays insignificant. The
coefficient on mroib(w, other) is 0.0006 with a t-statistic of 5.07, while the coefficient on mroib(w-1, other) is 0.0008
with a t-statistic of 14.47 in Table VII Panel B. The residual component remains significant but becomes slightly
smaller in this case.

Our decomposition exercise shows that close to half of the predictive power of the
marketable retail order imbalance comes from the persistence of the order imbalance measures,16
and that most of the rest comes from the residual component, after we take out order persistence
and the contrarian trading pattern. Since this residual component significantly predicts future stock
returns, it is consistent with the hypothesis that marketable retail investor trading contains valuable
information about future stock price movements.

C.2. A Closer Look at the Liquidity Provision Hypothesis

The liquidity provision hypothesis receives substantial attention in the existing literature,
so here we take a closer look at this hypothesis. Kaniel, Saar, and Titman (2008) argue that retail
investors’ contrarian trading provides liquidity to the market, and this leads to the positive
predictive power of past marketable retail order imbalance for future stock returns. Therefore, in
equation (8), we use the part of the marketable retail order imbalance related to past returns,
mroib(contrarian), as a proxy for the “liquidity provision” hypothesis. Our results in Table
VII show that the contrarian component of marketable retail order flow cannot significantly predict
future stock returns. Does this finding completely rule out the “liquidity provision” hypothesis for
the predictive power of marketable retail order flow? We are afraid not. As mentioned in our earlier
discussion of the caveat to our approach, we can rule out the liquidity provision hypothesis only under
the assumption that the contrarian trading pattern captured by mroib(contrarian) is a perfect proxy
for the liquidity provision hypothesis. This seems to us to be a reasonable assumption, but as far
as we can tell it cannot be directly confirmed by any data that we observe.17 In this subsection, we
provide more results regarding the liquidity provision hypothesis through different approaches
beyond the predictive regression.

16
To be more specific, the retail order imbalance has a low autocorrelation coefficient between 10-20%, but the
positive autocorrelation lasts for a long period. Here the persistence refers to the long horizon rather than the
magnitude.
17
For example, recent studies, such as Arif, Ben-Rephael and Lee (2016), and Chakrabarty, Moulton, and Trzcinka
(2017), show that directional trading by active funds is highly persistent and price destabilizing. If the retail trades
provide liquidity to these active funds, then liquidity provision can also go through the persistence channel, rather than
the contrarian channel.



An important piece of evidence in support of the liquidity provision hypothesis in Kaniel,
Saar, and Titman (2008) is the contemporaneous relation between marketable retail order
imbalance and stock returns.18 To be more specific, Kaniel, Saar, and Titman (2008) examine the
past, contemporaneous, and future returns of intense buy and sell portfolios of retail investors. In
their paper, the buy and sell order flows by retail investors are measured using the “net individual
trading” (NIT) measure. For each week, all firms are first sorted into decile groups using the
previous-week NIT, and then Kaniel, Saar, and Titman (2008) track the excess returns to these
different groups for the four weeks before and after the portfolio construction. The excess return
of each portfolio is computed by subtracting the return on a market proxy (the equal-weighted
portfolio of all stocks in the sample). Here we follow their approach, while using our marketable
retail order flow measures, mroibvol and mroibtrd. Results using mroibvol are reported in Table
VIII, and results using mroibtrd are reported in Appendix Table AV.
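The portfolio-sort step described above can be sketched as follows. This is a simplified single-period illustration on simulated data: the function name `decile_excess_returns`, the simulated signal and returns, and the use of one return window are assumptions, whereas the analysis in the text tracks several windows before and after portfolio formation.

```python
import numpy as np


def decile_excess_returns(signal, ret, n_groups=10):
    """Sort firms into groups on `signal` and return each group's mean excess return.

    signal : (N,) array, e.g. the formation-week mroibvol
    ret    : (N,) array, firm returns over the window of interest
    Excess return = firm return minus the equal-weighted mean return of all firms.
    """
    excess = ret - ret.mean()
    order = np.argsort(signal, kind="stable")
    groups = np.array_split(order, n_groups)   # group 0 = most intense selling
    return np.array([excess[g].mean() for g in groups])


# Simulated cross-section (hypothetical numbers).
rng = np.random.default_rng(2)
n_firms = 1000
signal = rng.normal(size=n_firms)
ret = 0.01 * signal + rng.normal(scale=0.01, size=n_firms)

group_means = decile_excess_returns(signal, ret)
print(group_means[0], group_means[-1])   # intense selling vs. intense buying
```

Because excess returns are measured against the equal-weighted market proxy, the group means average to zero across equally sized groups.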

The main results of Kaniel, Saar and Titman (2008) are reported in their Table III, which
contains three main findings. First, the stocks the retail investors sell during the portfolio
construction week (week 0), the intense selling group, experience significantly positive excess
returns before week 0, while the stocks the retail investors buy during week 0, the intense buying
group, experience negative excess returns. This is a typical contrarian trading pattern of selling
winners and buying losers. In Panel A of Table VIII, the first row contains the firms intensely sold
by the retail investors using marketable orders, and the mean excess return in the 20 days prior to
the selling week is 0.67%. The bottom row contains the firms intensely bought by retail investors
using marketable orders, and the mean excess return on these stocks in the 20 days prior to the
selling week is -1.29%. Both numbers are highly significant, and confirm Kaniel, Saar and
Titman’s first finding.

The second finding of Kaniel, Saar and Titman (2008) is that after retail investors buy or
sell, the stocks the retail investors sell during week 0, the intense selling group, experience negative
excess returns, while the stocks the retail investors buy during week 0, the intense buying group,
experience positive excess returns. This shows that retail trading can predict returns in the correct
direction. In Panel A of Table VIII, we find that firms intensely sold by retail investors using
marketable orders (in the first row) experience a mean excess return in the 20 days after the selling
week of -0.30%, while the firms intensely bought by retail investors using marketable orders in
the bottom row experience a mean excess return of 0.57%. Again, both numbers are highly
significant and confirm Kaniel, Saar, and Titman’s second finding.

18
We thank an anonymous referee for this suggestion.

Finally, for the contemporaneous relation over week 0, Kaniel, Saar, and Titman (2008)
find that the contemporaneous excess return is significantly positive for stocks retail investors sell,
and negative for stocks they buy. Since the return signs are opposite of the retail trading direction,
they interpret this finding in favor of the liquidity provision hypothesis. From the column of k=0
in Table VIII Panel A, however, we find that for firms intensely sold by retail investors using
marketable orders, the contemporaneous return is significantly negative at -0.24% with a t-statistic
of -5.30. For the intensely bought firms in the bottom row, the contemporaneous return is
significantly positive at 0.11% with a t-statistic of 2.69. Our findings show consistent, rather than
opposite, signs between contemporaneous marketable retail trading and return direction, which
does not line up with the liquidity provision hypothesis proposed in Kaniel, Saar, and Titman
(2008).

What might cause the differences in our results? They might come from differences in the
retail order imbalance variable, the sample period, or coverage. Our main variable comes from the
marketable retail order flows, while Kaniel, Saar, and Titman (2008) use retail order imbalance
from both marketable and non-marketable order flows. Between the marketable and non-marketable
orders, it is likely that the marketable orders are more aggressive. In terms of the
sample period, the Kaniel, Saar, and Titman (2008) sample is January 2000 through December
2003, and our sample is January 2010 through December 2015, which are about ten years apart.
Coverage-wise, the Kaniel, Saar, and Titman (2008) sample is from NYSE’s Consolidated Equity
Audit Data (CAUD), which contains only retail trades that are executed on that exchange. During
the Kaniel, Saar, and Titman (2008) sample period, only a small number of brokerages sent their
retail order flow to the NYSE. As a result, the NYSE’s market share of overall retail order activity
was (and has remained) quite small. In comparison, our sample is from TAQ, which contains all
off-exchange trades and nearly all retail marketable orders. To summarize, the liquidity provision
hypothesis receives at most mixed support in our data sample.

D. Public News and Marketable Retail Order Imbalance



Our earlier results indicate that marketable retail order flows may contain valuable
information about future stock price movements, which might be surprising to many. As Kaniel,
Saar, and Titman (2008) note, “… it is unclear how individuals, who have far fewer resources than
institutions, could gain the upper hand in discovering private information and trading on it
profitably in such a widespread fashion.”

Therefore, to better understand whether the marketable retail investors can be informed
traders and the nature of information they might possess, we examine the relation between
marketable retail order flow and public news in this section. We introduce the public news data in
Section D.1, and investigate whether and how the information in marketable retail flow is related
to public news in Section D.2.

D.1. Marketable Retail Order Imbalance and Future Returns across News Topics

We obtain news data from Thomson Reuters News Analytics (TRNA), which contains
prominent public news articles for a broad set of firms starting from 2003. TRNA provides key
information about each news item, such as the ticker, the time stamp of the news story, the news
topics the story belongs to, and sentiment scores for each article. News topics are grouped into five
categories: cross market, general news, economy, equities, and money/debt. Each category
contains several news subtopics, and we collect 58 such subtopics in our sample. The sentiment
score measures the probabilities of the article being positive, negative, or neutral, computed using
Thomson Reuters’ proprietary algorithm. We compute a net sentiment score as the difference
between the positive and negative sentiment scores for each stock each day. The news data are
available from January 2010 to December 2014, which covers most of our main sample. We use
tickers to match the news data with our marketable retail order imbalance data, generating a
merged sample of 3,854,813 stock-day observations.

We first provide some simple statistics for the relation among news, returns, and
marketable retail order flow. To examine whether our measure of public news can predict future
stock returns, we estimate the following Fama-MacBeth regression:

𝑟𝑒𝑡(𝑖, 𝑤) = 𝑓0(𝑤) + 𝑓1(𝑤) × 𝑠𝑒𝑛𝑡(𝑖, 𝑤 − 1) + 𝑓2(𝑤)′𝑐𝑜𝑛𝑡𝑟𝑜𝑙𝑠(𝑖, 𝑤 − 1) + 𝑢6(𝑖, 𝑤). (12)

Here, variable 𝑠𝑒𝑛𝑡(𝑖, 𝑤 − 1) is the average TRNA net sentiment score for firm i during week w-1,
calculated by averaging non-missing news sentiment for firm i within week w-1. The results are
reported in regressions I and II of Table IX Panel A. In regression I, the coefficient of past week



public news sentiment is 0.0008 with a t-statistic of 3.31. The positive and significant coefficient
indicates that net sentiment in public news can predict the cross-section of next week’s stock
returns. When we include the past marketable retail order imbalance in regressions III and IV, the
predictive power of the public news sentiment stays about the same. Interestingly, in the presence
of the contemporaneous public news sentiment in regressions III and IV, the coefficients on past
marketable retail order imbalance are also positive and significant, with similar magnitudes to
those in Table III, indicating that public news sentiment does not take away the predictive power
of marketable retail order flow for future stock returns.
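The construction of the weekly sentiment variable can be illustrated with a small sketch. The function names `net_sentiment` and `weekly_sent`, the example scores, and the assumption of a single positive and negative score per stock-day are all hypothetical; the TRNA data may contain several articles per day, and how those are combined is not specified here.

```python
import numpy as np


def net_sentiment(pos, neg):
    """Daily net sentiment: positive score minus negative score."""
    return np.asarray(pos, dtype=float) - np.asarray(neg, dtype=float)


def weekly_sent(daily_net):
    """Average the non-missing daily net sentiment within one week.

    Returns NaN when the firm has no news during the week.
    """
    daily_net = np.asarray(daily_net, dtype=float)
    valid = daily_net[~np.isnan(daily_net)]
    return valid.mean() if valid.size else np.nan


# One hypothetical firm-week: news on three of five trading days.
pos = [0.7, np.nan, 0.2, np.nan, 0.5]
neg = [0.1, np.nan, 0.6, np.nan, 0.3]

daily = net_sentiment(pos, neg)
sent_week = weekly_sent(daily)
print(sent_week)
```

Days without news are simply skipped in the average, so a firm-week with no news at all has a missing value rather than a zero.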

To better understand how marketable retail order imbalances are related to public news, we
next estimate the contemporaneous relation between the two using the following Fama-MacBeth
specification:

𝑠𝑒𝑛𝑡(𝑖, 𝑤) = 𝑔0(𝑤) + 𝑔1(𝑤) × 𝑚𝑟𝑜𝑖𝑏(𝑖, 𝑤) + 𝑔2(𝑤)′𝑐𝑜𝑛𝑡𝑟𝑜𝑙𝑠(𝑖, 𝑤 − 1) + 𝑢7(𝑖, 𝑤). (13)

We find that the current week’s marketable retail order imbalance is significantly and positively
related to the same-week public news sentiment in 10 out of the 58 subtopics. To save space, we
present the coefficient estimates for the 10 cases in Panel B of Table IX. These 10 subtopics
represent about 38% of total news days, and they mostly contain firm-level news. For instance, for
the subtopic “RESF” (results forecast) in the news type “equities”, the coefficient g1 is 0.0054,
with a significant t-statistic of 3.90, indicating that the marketable retail order imbalance has a
positive and significant contemporaneous relation with news related to forecasts of company
results. Out of the ten subtopics, four are from the category “money/debt” and three are
from the category “equities”, with the two highest t-statistics for the subtopics “results
forecast” and “debt rating news.” Interestingly, the marketable retail order imbalances are never
statistically significantly correlated with the “economy” type of news. This finding implies that
marketable retail investors may have valuable information at the firm level rather than at the
market level. This is confirmed in some of our later results (Section III.A), where we find that
retail investors cannot reliably predict future market-wide returns.

In Appendix Table AVI, we also examine whether retail order imbalance can directly
predict our measure of public news. We find the coefficient signs to be mixed. In fact, there are
several cases where marketable retail order flows significantly predict future public news
sentiment with negative coefficients. This does not contradict our earlier results of a positive and



significant contemporaneous relation between marketable retail order flow and our measure of
public news. This only shows that marketable retail flows cannot predict our measure of future
public news with the expected signs for these subtopics.19 Using a joint test, we fail to reject the
null that marketable retail order flow cannot jointly predict future public news.

D.2. Public Information and Other Information

The above results show that marketable retail order imbalances are associated with some
types of contemporaneous public news, particularly firm-level news. In this subsection, we probe
deeper into the fraction of marketable retail order flows’ predictive power that is associated with
these public news releases, because it is also possible that marketable retail traders possess and
trade on non-public information that eventually makes its way into prices, but not via an
identifiable public news release.

We investigate this issue empirically using a two-step decomposition procedure similar to
that in Section II.C. In the first step, we estimate a Fama-MacBeth regression and decompose the
weekly order imbalance into four components for week w-1, as follows:

\[
\begin{aligned}
\mathit{mroib}(i, w-1) = {} & h0(w-1) + h1(w-1)\,\mathit{mroib}(i, w-2) + h2(w-1)'\,\mathit{ret}(i, w-2) \\
& + h3(w-1)\,\mathit{sent}(i, w-1) + u8(i, w-1).
\end{aligned} \tag{14}
\]

Here, variable sent(i,w-1) is the average TRNA net sentiment score for firm i during week w-1,
which we use to capture information in contemporaneous public news releases. After we obtain
the time series of coefficients, $\{\widehat{h0}(w-1), \widehat{h1}(w-1), \widehat{h2}(w-1)', \widehat{h3}(w-1)\}$, we define the
following terms:

\[
\begin{aligned}
\widehat{\mathit{mroib}}^{\,persistence}_{i,w-1} &= \widehat{h1}(w-1)\,\mathit{mroib}(i, w-2), \\
\widehat{\mathit{mroib}}^{\,contrarian}_{i,w-1} &= \widehat{h2}(w-1)'\,\mathit{ret}(i, w-2), \\
\widehat{\mathit{mroib}}^{\,publicnews}_{i,w-1} &= \widehat{h3}(w-1)\,\mathit{sent}(i, w-1), \\
\widehat{\mathit{mroib}}^{\,other}_{i,w-1} &= \widehat{u8}(i, w-1) + \widehat{h0}(w-1).
\end{aligned} \tag{15}
\]

19
To give some context for why retail order flow does not show significant predictive power for future public
sentiment in our sample, it could be that the public news we observe is noisy; it could be that the information the retail
investors have does not warrant a specific news story; it could be that the public news is published further into the
future than the horizons we examine here; or another possibility is that retail investors incorporate other useful public
news into their trading, such as the SeekingAlpha posts studied by Farrell et al. (2020).



The sum of the above four components is exactly 𝑚𝑟𝑜𝑖𝑏(𝑖, 𝑤 − 1). As before, we denote the part
related to past order imbalance as the “persistence” component, which is related to the price
pressure hypothesis; the part related to past returns is denoted as the “contrarian” component,
which is connected to the liquidity provision hypothesis. We define the part related to
contemporaneous public news sentiment as the “public news” component. Finally, we denote the
residual part as the “other” component, which we attribute to marketable retail investors’ non-
public information that is not incorporated into prices via an identifiable news release.

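At the second stage, we estimate the following regression using the Fama-MacBeth
methodology, which is parallel to equation (6):

\[
\begin{aligned}
\mathit{ret}(i, w) = {} & j0(w) + j1(w)\,\widehat{\mathit{mroib}}^{\,persistence}_{i,w-1} + j2(w)\,\widehat{\mathit{mroib}}^{\,contrarian}_{i,w-1} + j3(w)\,\widehat{\mathit{mroib}}^{\,publicnews}_{i,w-1} \\
& + j4(w)\,\widehat{\mathit{mroib}}^{\,other}_{i,w-1} + j5(w)'\,\mathit{controls}(i, w-1) + u9(i, w).
\end{aligned} \tag{16}
\]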

Since we decompose the original order imbalance measure 𝑚𝑟𝑜𝑖𝑏(𝑖, 𝑤 − 1) into four parts,
related to persistence, contrarian trading pattern, public information, and the residual, the
coefficients in equation (16) reveal how each component helps to predict future stock returns.

Notice that for the first stage estimation, the public news component is derived from a
contemporaneous relation between the current news and current marketable retail order flow,
rather than past news. From the perspective of the empirical design, we can link the marketable
retail order imbalance with the past, contemporaneous, or future public news, but the
interpretations would be different. If we use future public news, the interpretation would be
whether and how marketable retail order flow “anticipates” future public news. If we use past
public news, the interpretation would be that previously “incorporated” public news can be a
component of the marketable retail order imbalance. When we use contemporaneous public news
sentiment, we interpret the related part of marketable retail order flow as contemporaneously
“processed” public news. Here we choose not to use future public news, because if we project
mroib(w-1) on sent(w) at the first stage, then $\widehat{\mathit{mroib}}^{\,publicnews}_{i,w}$ would capture news from week w
and would have a mechanical correlation with ret(w), which is the dependent variable in the second
stage estimation, so the regression would no longer be predictive. We also choose not to use past public
news, because we would like to maximize the explanatory power of public news for marketable



retail order flow, while contemporaneous public news sentiment very likely nests the information
in past public news.

Table X Panel A provides results for the first-stage decomposition. The patterns of how
past marketable retail order imbalance and past returns affect the current order imbalance are very
similar to those in Table II. The coefficient on the contemporaneous public news sentiment ranges
between 0.0249 and 0.0305, all with t-statistics higher than 10. This clearly indicates that more
positive news is associated with more contemporaneous purchases by marketable retail investors.
The average adjusted R² values for the first-stage estimation are mostly between 5.49% and 8.59%.

Panel B of Table X reports the results on the second-stage decomposition. From the top
half, the coefficients on $\widehat{\mathit{mroib}}^{\,persistence}_{i,w-1}$ are positive and highly significant, and the coefficients
on $\widehat{\mathit{mroib}}^{\,contrarian}_{i,w-1}$ are mostly insignificant, similar to the findings in Table VII. The coefficients
on the public news components of order imbalance, $\widehat{\mathit{mroib}}^{\,publicnews}_{i,w-1}$, are also all insignificant,
indicating that the contemporaneous public news component of marketable retail order imbalances
does not help to predict future returns significantly. In contrast, the “other” component of the
marketable retail order imbalance measure is always positive and significant in the regressions.
For instance, in the first regression, it has a coefficient of 0.0008 with a highly significant t-statistic
of 13.98. This result is consistent with the hypothesis that marketable retail investors trade on
information that is not incorporated into prices via the public news releases we measure. The
bottom half of the panel shows that when we move from the 25th percentile firm to the 75th percentile firm, the
“other” component of the marketable retail order flow accounts for 0.07% to 0.10% of weekly
return differences, which is more than half of the return difference that the marketable retail order
imbalance can predict overall. These results suggest that either public news is noisy, the predictive
power of marketable retail investors’ order imbalance is not related to an identifiable public news
release, or the retail order imbalance is only related to public news releases in the more distant
future.

Returning to the interesting question raised at the beginning of this subsection, how can
retail investors using marketable orders, with far fewer resources than institutions, get the upper
hand in discovering non-public information? Here we offer two possible explanations.



First, as an investor group, retail investors can be heterogeneous. For instance, it is possible
that some individual investors might simply be endowed with non-public and valuable firm-
specific information. These individuals might work in the same industry or for a customer or
supplier and might naturally obtain value-relevant information in this way. They may also have
some resources for information acquisition. For example, it may be possible for individuals to
study the parking lots of retailers to assess demand growth. Farrell et al. (2020) provide evidence
that some retail investors make use of valuable firm-level analysis contained in SeekingAlpha
posts.

Second, our data contain only marketable retail market orders, not retail limit orders.
Retail limit orders could carry opposite information, which would at least partially offset our
findings for marketable retail market orders. In addition, it is not clear that the counterparties to
marketable retail orders are necessarily “better informed” institutional investors. For example,
Chakrabarty, Moulton, and Trzcinka (2017) document the presence of uninformed “short-term”
institutional investors as a non-trivial part of the market.

III. Further Discussion


Marketable retail order imbalances can predict future stock returns. This predictive ability
lasts up to eight weeks and is stronger for smaller and lower-priced firms. In this section, we
discuss several related issues to put the marketable retail order imbalance’s predictive power in
perspective. In Section III.A, we discuss whether marketable retail investors’ trading can predict
the market’s overall movement. We look into potential contamination of the retail subpenny trades
by dark pools, using subsample analysis in Section III.B. We examine whether the predictive
power is related to overall market conditions in Section III.C. We investigate the predictive power
of odd lot marketable retail orders in Section III.D. Marketable retail trades occur with different
sizes, and we examine the predictive power of large vs. small trade sizes in Section III.E. It is
important to understand the role of wholesalers in this setup. Thus, Section III.F examines the
magnitude of price improvement and the profitability of interacting with marketable retail order
flow. We identify the nature of the information captured by marketable retail order flows by linking
marketable retail order imbalances to earnings news in Section III.G. We examine whether
marketable retail order imbalances can still predict future returns if we control for overall market
order imbalances in Section III.H. Finally, we examine the implicit assumption of price
improvements in Section III.I. To save space, all returns in this section are bid-ask returns.

A. Aggregate Marketable Retail Order Imbalance

If marketable retail order imbalances can predict future stock returns in the cross section,
retail investors using marketable orders may also be able to predict aggregate market moves. To
investigate this possibility, we aggregate marketable retail order imbalances across all firms to
predict aggregate stock market returns. We estimate the following equation:

$$\mathit{mkt}(w+1, w+k) = m_0 + m_1 \times \mathit{aggmroib}(w) + u_{10}(w+1, w+k), \qquad (17)$$

where mkt(w+1,w+k) is the future k-week cumulative market return from week w+1 to week w+k,
and aggmroib(w) is the current aggregated marketable retail order imbalance measure for week w.
We compute aggmroib using either value-weighted or equal-weighted mroibvol or mroibtrd
measures. The results are shown in Table XI Panel A. They are the same regardless of the
weighting scheme or order imbalance measure: there is no evidence that marketable retail order
flows can reliably predict future market returns.
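
The aggregation and regression in equation (17) can be sketched as follows. This is a minimal illustration only, not the code used for Table XI: the function names, the array layout (weeks in rows, firms in columns), and the choice to compound weekly returns into the k-week cumulative return are all assumptions, and the sketch omits the standard-error adjustment that overlapping k-week returns would require.

```python
import numpy as np

def aggregate_mroib(mroib, weights=None):
    """Cross-sectional average of firm-level imbalances for each week.

    mroib:   (weeks, firms) array, NaN where a firm has no observation.
    weights: same shape (e.g. market caps) for value weighting,
             or None for equal weighting. Hypothetical layout.
    """
    if weights is None:
        return np.nanmean(mroib, axis=1)
    missing = np.isnan(mroib)
    w = np.where(missing, 0.0, weights)   # drop weights of missing firms
    x = np.where(missing, 0.0, mroib)
    return (w * x).sum(axis=1) / w.sum(axis=1)

def predictive_ols(agg, mkt, k):
    """OLS of mkt(w+1, w+k) on agg(w); returns (m0, m1).

    mkt holds weekly market returns; the k-week cumulative return is
    compounded here, which is an assumption of this sketch.
    """
    n = len(mkt) - k
    y = np.array([np.prod(1.0 + mkt[w + 1 : w + k + 1]) - 1.0 for w in range(n)])
    X = np.column_stack([np.ones(n), agg[:n]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]

# Toy usage with made-up numbers: next week's return is exactly
# half of this week's aggregate imbalance, so m1 should be 0.5.
agg = np.array([0.1, -0.2, 0.3, 0.0, -0.1])
mkt = np.array([0.0, 0.05, -0.1, 0.15, 0.0])
m0, m1 = predictive_ols(agg, mkt, k=1)
```

Any inference on m1 at horizons longer than one week would need an adjustment for the overlap in the dependent variable, which this sketch leaves out.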

Our approach can also be used to identify the marketable retail order flow in exchange-
traded funds (ETFs). In Table XI Panel B, we examine marketable retail order flow in a large cross
section of ETFs over the same time period. In cross-sectional predictive regressions of the form in
equation (6), the coefficient is mostly around or below one basis point, which is much smaller than
the comparable coefficients shown in Table III, and the t-statistics are mostly insignificant. This
result suggests that marketable retail order flows cannot predict sector returns or overall equity
market returns. To separate sector-oriented information from broader market-wide information,
we select the six largest ETFs that focus on the overall U.S. equity market by tracking
comprehensive U.S. equity indexes: SPY, IVV, VTI, VOO, IWM, and IWB. The results are
reported in the last row of Panel B. Consistent with the market timing results in Panel A, we find
little evidence that marketable retail order flow can predict future returns on broad equity market
ETFs.

B. Subsample Analysis


As mentioned in the data section, from 2008 and into 2011, a few dark pool operators were
accused of violating Regulation NMS in accepting, ranking, and executing subpenny trades and
were eventually fined by the SEC. These questionable dark pool trades account for about 0.5% of
total share volume during this period. Since these dark pools mainly cater to institutions, our
identification of retail flows using subpenny trades could be “contaminated”, and we cannot
identify which trades are from the affected dark pools. In this subsection, we conduct a robustness
check using subsamples, and examine whether our main results hold for both the “contaminated”
subsample and a later subsample that is not contaminated by these dark pool subpenny practices.

To be more specific, we re-estimate the key results in Table III, for subsamples 2010-2012
and 2013-2015. For comparison, in Table III, we find that the marketable retail order flow can
predict future stock returns, with a regression coefficient of 0.0009, a t-statistic of 15.60, and an
interquartile weekly return difference of 10.89 bps. In Table XI Panel C, for the subsample of
2010-2012, the coefficient on the marketable retail order imbalance is 0.0010, with a t-statistic of
11.52, and an interquartile weekly return difference of 12.13 bps. For the subsample of 2013-2015,
the coefficient on the marketable retail order imbalance is 0.0009, with a t-statistic of 10.57, and
an interquartile weekly return difference of 9.74 bps. We also test whether the coefficients from
the two subperiods are significantly different from each other. For the mroib(w-1) coefficients in
regressions I and II, the difference is 0.0001, with a t-statistic of 1.85; for the coefficients in regressions III and IV, the
difference is 0.0001, with a t-statistic of 0.67. Therefore, we cannot reject the hypothesis that the
coefficients from these two subperiods are the same.

We draw two observations from this exercise. First, the predictive power of retail order
flow for the “cleaner” second subperiod is significant, which shows that our results are robust.
Second, we cannot reject the hypothesis that the relevant coefficients from these two subperiods
are the same. In addition, even during the earlier subperiod, we believe that fewer than 10% of all
subpenny trades are misclassified as retail. For these reasons, we choose to use the whole 2010-
2015 sample period to conduct our main analysis.

C. Market Conditions

Barrot, Kaniel, and Sraer (2016) find that marketable retail trades contain more information
when markets are volatile, specifically when the VIX option-implied volatility index is high. Their
sample spans from 2002 to 2010, during which the VIX experiences dramatic changes. In contrast,
our sample period is 2010 to 2015, where the VIX is far less volatile. Nevertheless, we divide our
sample in half: one portion when VIX is higher than the historical median of 18% and the other
when the VIX is below this historical median.

We re-estimate equation (6) for the high- and low-VIX subsamples. The results are
presented in Panel D of Table XI. Comparing the low- and high-VIX regimes, we find that the
coefficient on mroibvol is quite similar, yet the t-statistic is higher when VIX is low than when it
is high. This result might not be surprising, given that the volatility of all variables increases when
VIX is high. Overall, the predictive power in both high- and low-VIX regimes is positive and
significant.

D. Odd Lots

In this section, we investigate the behavior of odd lot marketable retail trades over the post-
December 2013 period when odd lot transactions are reported to the consolidated tape. Can odd
lot marketable retail order flow predict future firm-level returns? We estimate regression (6) using
odd lot marketable retail order imbalances and present the results in Table XI Panel E. Both
coefficients are positive but not statistically significant. In Appendix Table AVII, we find that
daily odd lot order imbalance measures can significantly predict returns for the next trading day
but not at longer horizons than that. We conclude that the odd lot marketable retail order imbalance
measure’s predictive power is much weaker than that of the overall marketable retail order
imbalance.

E. Order Sizes

As Figure 1 Panel A shows, a median market order submitted by a retail investor is around
$7,000. The median marketable retail trade is about 400 shares. The “stealth trading” literature
argues that medium-size orders are more likely to be informed and that large orders are usually
broken into smaller orders.

To determine if information content differs according to order size, we partition the orders
into large vs. small groups using 400 shares as the cutoff, and we estimate the predictive regression
for each group separately. The results are reported in Table XI Panel F. We find that both large
and small orders predict future stock returns, but the larger orders’ predictive power is stronger.
Our results suggest that more informed marketable retail investors may demand immediacy by
using larger market orders and that stealth trading does not seem to characterize the trading of
retail investors who use marketable orders.

F. Wholesaler/Internalizer’s Perspective: Profitability of Marketable Retail Order Flow

If marketable retail order flow is sufficiently informed, trading with these orders would be
unprofitable. This might raise the question of whether our results are consistent with the apparently
profitable business model of internalizers and wholesalers. Ultimately, as long as the information
content of marketable retail order flow is less than the bid-ask spread being charged, internalizers
and wholesalers on average can still earn positive revenues by trading with these orders. For
example, if a marketable retail buy and a marketable retail sell order arrive at the same time, they
offset each other, and a wholesaler earns the full bid-ask spread charged (the quoted spread less
the price improvement given). Ultimately, internalizers and wholesalers are only exposed to
adverse selection on marketable retail order imbalances. The summary statistics in Table I show
that there is a substantial amount of offsetting marketable retail order flow. The interquartile range
for the volume-based daily order imbalance measure is from -0.301 to 0.217, indicating that, even
at the ends of these ranges, more than two-thirds of the marketable retail order flow in such a stock
on a given day is offsetting buys and sells.
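
The "more than two-thirds" figure follows directly from the definition of the imbalance. If B and S denote marketable retail buy and sell volume, the matched volume is 2 × min(B, S), so the offsetting share of total volume is 1 − |B − S| / (B + S). A small sketch of this arithmetic, with a hypothetical function name:

```python
def offsetting_share(imbalance):
    """Share of total retail volume that is buys matched against sells,
    given an order imbalance (B - S) / (B + S) in [-1, 1]."""
    if not -1.0 <= imbalance <= 1.0:
        raise ValueError("imbalance must lie in [-1, 1]")
    return 1.0 - abs(imbalance)

# Quartile values of the daily volume-based imbalance from Table I.
q1_share = round(offsetting_share(-0.301), 3)  # 0.699
q3_share = round(offsetting_share(0.217), 3)   # 0.783
```

Both quartile values give an offsetting share above two-thirds, which is the statement in the text.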

To get a better sense of the profitability of interacting with marketable retail order flow,
we compute standard microstructure information-content measures for the marketable retail trades
in our sample. Specifically, we calculate proportional effective spreads, one-minute price impacts,
and one-minute realized spreads for all marketable retail buys and sells during 2015. Realized
spreads are a standard proxy for trading revenue earned by a liquidity provider such as a
wholesaler. We apply standard data filters, eliminating all trades where effective spreads exceed
$1, and we calculate dollar-volume weighted averages across all stocks. We find that the mean
effective half-spread is 16 basis points. The one-minute price impact is four basis points, leaving
a realized half-spread of 12 basis points. In other words, interacting with our identified marketable
retail order flow appears to be profitable (at least before other costs) for the
wholesalers/internalizers because the bid-ask spreads are sufficiently large. The liquidity provider
(in this case, the wholesaler or internalizer) loses about four basis points (the price impact) of the
bid-ask spread to short-term price moves, but this leaves about 12 basis points (the realized spread)
of the bid-ask spread as average trading revenue to the liquidity provider. Note that the realized
spread is a very crude measure of trading revenue. Furthermore, we cannot measure payments
made by wholesalers to introducing brokers, nor can we measure the other costs associated with a
wholesaling or internalization operation. However, these realized spreads are considerably higher
than the realized spreads associated with on-exchange transactions, so we feel comfortable in
concluding that the price-improvement business model is quite profitable for wholesalers and
internalizers who can successfully segment order flow.
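
The decomposition of the effective half-spread into price impact and realized half-spread can be illustrated with the textbook definitions below. The paper does not spell out its exact formulas, so the expressions, the function name, and the example prices are assumptions chosen to reproduce the reported 16, 4, and 12 basis points.

```python
def spread_measures_bps(price, mid, mid_later, side):
    """Per-trade spread measures in basis points of the prevailing midpoint.

    price:     trade price
    mid:       quote midpoint at the time of the trade
    mid_later: quote midpoint one minute after the trade
    side:      +1 for a marketable buy, -1 for a marketable sell
    Returns (effective half-spread, price impact, realized half-spread).
    """
    effective = side * (price - mid) / mid * 1e4
    impact = side * (mid_later - mid) / mid * 1e4
    realized = side * (price - mid_later) / mid * 1e4
    return effective, impact, realized

# Illustrative buy: trade at 50.08 against a 50.00 midpoint that
# moves to 50.02 one minute later.
eff, imp, real = spread_measures_bps(50.08, 50.00, 50.02, side=+1)
# eff is about 16 bps, imp about 4 bps, real about 12 bps
```

By construction the effective half-spread equals the price impact plus the realized half-spread, so the liquidity provider keeps whatever part of the spread is not lost to the subsequent move in the midpoint.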

We can also examine some of the segmentation that is performed by these liquidity
providers. For instance, while there is substantial competition among wholesalers, the magnitude
of price improvement can vary substantially across orders. These liquidity providers are likely to
rationally incorporate the potential information embedded in marketable retail orders and offer
price improvement only up to the point at which they can still profit from the trade. That is, on the
one hand, if they infer there might be relevant information in the marketable retail order, they
might offer less price improvement, and on the other hand, if they conclude that the marketable
retail order is unlikely to contain relevant information, they might be willing to offer more price
improvement. If this is true, the predictive power of marketable retail order imbalances should be
greater for marketable retail trades with less price improvement.

In the earlier sections, we group all orders with subpenny prices between 0.6 and 1 as
marketable retail-initiated buy orders and group those between 0 and 0.4 as marketable retail-
initiated sell orders. In this section, we further divide orders into “less price improvement” and
“more price improvement” types. For transactions with less improvement, we define buyer-
initiated trades as transactions with prices between 0.8 and the round penny, and seller-initiated
trades as trades with transaction prices between the round penny and 0.2 cents. For the “more price
improvement” category, we define buyer-initiated trades as trades with transaction prices between
0.6 and 0.8, and seller-initiated trades as trades with transaction prices between 0.2 and 0.4. We
compute marketable retail order imbalances following equations (1) and (2). We compare the
predictive power of marketable retail order imbalances for “more” vs. “less” price improvement
by estimating equation (6) on each order imbalance measure separately.
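
The partition above can be sketched as a small classifier on the subpenny fraction of the trade price. This is only an illustration: the function names are hypothetical, and the treatment of prices that fall exactly on the 0.2 and 0.8 boundaries is an assumption, since the text does not say which category they belong to. Decimal arithmetic is used so that the fraction of a cent is computed exactly.

```python
from decimal import Decimal

def subpenny_fraction(price):
    """Fraction of a cent in a trade price, in [0, 1).
    Pass the price as a string (or Decimal) to avoid float rounding."""
    cents = Decimal(str(price)) * 100
    return cents % 1

def classify(price):
    """Return (side, improvement) for a subpenny-priced trade,
    or (None, None) for round-penny and near-half-penny prices."""
    z = subpenny_fraction(price)
    if Decimal("0") < z < Decimal("0.2"):
        return ("sell", "less")   # just above the round penny
    if Decimal("0.2") <= z < Decimal("0.4"):
        return ("sell", "more")   # boundary at 0.2 assigned here by assumption
    if Decimal("0.6") < z <= Decimal("0.8"):
        return ("buy", "more")    # boundary at 0.8 assigned here by assumption
    if Decimal("0.8") < z < Decimal("1"):
        return ("buy", "less")    # just below the next round penny
    return (None, None)

examples = {p: classify(p) for p in ["20.0095", "20.0065", "20.0010", "20.0030", "20.0050", "20.01"]}
```

Trades at a round penny or between 0.4 and 0.6 of a cent are left unclassified, consistent with the earlier sections.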

Recall that the distribution of subpenny price improvements is displayed in Figure 1 Panel
B. Most transactions occur at a round penny or half penny. Based on the other bins, each covering
0.1 cent, there is slightly more trading volume for the “less price improvement” category than for
the “more price improvement.” The regression results for the cross-section of future returns are
shown in Table XI Panel G. For the “less price improvement” type, the coefficients range from
0.0004 to 0.0007, both with t-statistics above 5. For the “more price improvement” type, the
coefficients range from 0.0001 to 0.0002, both with t-statistics below 4. Clearly, both sets of
marketable retail order imbalances have predictive power for future stock returns, but the
marketable retail trades with less price improvement have stronger predictive power, indicating
that internalizers/wholesalers successfully price-discriminate against marketable retail orders with
potentially more information content. Similar to the presence of large realized spreads, this
observation also supports the viability of the business model, particularly for internalizers and
wholesalers who can successfully distinguish between more- and less-informed order flows.

G. Earnings Announcements and Marketable Retail Order Flow

Kelley and Tetlock (2013) use the Dow Jones news archive to identify whether marketable
retail order flows are informed about cash flow news, and find that marketable retail orders
can predict earnings surprises.

Here, we examine whether marketable retail order flow becomes more predictive around
earnings news. Specifically, we estimate a variant of equation (6) that allows the predictive
relationship to differ based on the variable eventday, an indicator that takes a value of 1 if day t is
an earnings announcement day and zero otherwise. The results are shown in Table XI Panel H.
They show that the predictive power of marketable retail order flow is greater on announcement
days, but the difference is not statistically significant. In Appendix Table AVIII, we directly
replicate the results in Kelley and Tetlock (2013), and find that our marketable retail order flows
can predict earnings news positively. The predictive power is statistically significant at the 1-day
horizon, but insignificant over longer horizons. That is, we are able to partially confirm KT’s
results. The difference may be attributable to the different sample periods and coverage we use.
While we cover all subpenny trades for most stocks over the 2010 to 2015 period, Kelley and
Tetlock cover about 1/3 of all marketable retail trades between 2003 and 2007.

H. Controlling for Overall Order Imbalances

Previous studies such as Chordia and Subrahmanyam (2004) find that overall order
imbalances (calculated using all reported transactions, including individual and institutional types)
can predict future stock returns. We use the Lee-Ready algorithm to compute the overall order
imbalance from TAQ data. In our dataset, overall order imbalances and marketable retail order
imbalances are significantly correlated at around 30%. An interesting question is whether overall
and marketable retail order imbalances are relatively orthogonal to each other: Specifically, if we
control for the overall order imbalance, can the marketable retail order imbalance still predict
future stock returns?

We proceed in two steps to address this question and report the results in Table XI Panel I.
In the first step, we re-estimate equation (6) using the overall order imbalance from the previous
week rather than marketable retail order imbalance as a key predicting variable. Consistent with
the literature, we find that overall order imbalances significantly predict future stock returns, with
a coefficient of 0.0004 and a significant t-statistic of 3.32.

In the second step, we estimate equation (6) using the marketable retail order imbalance
variables as key predicting variables, and include the overall order imbalance as a control. With
both marketable retail and overall order imbalances in the model, marketable retail imbalances are
significantly positive, and they completely drive out the effect of overall order imbalances. Thus,
the predictive power of the marketable retail order imbalance seems to be stronger than that of the
overall order imbalance measure.

Here, we want to be cautious about the interpretation in the sense that this finding does not
necessarily indicate that the retail order flow that we identify is more informed than order flow
from institutional investors. First, due to different calculation methods for the two measures, the
difference between the overall oib and the marketable retail order imbalance is not the order
imbalance from institutional investors. Second, we only calculate the order flow from marketable
retail orders, which accounts for about half of the trades from retail investors, and the overall oib’s
weaker predictive power might be partially a result of uninformed trading by other participants in
the market.

I. When the Effective Spread is Less Than 1 Cent

Our identification for buy and sell orders relies on the implicit assumption that price
improvements are always a small fraction (less than half) of a cent. If price improvements are
larger, our method may not correctly sign trades. For example, if a stock has a bid price of $50.01
and an ask price of $50.04, and a marketable retail buy order arrives and is improved by
0.75 cents, the reported transaction price would be $50.0325, and our trade-signing approach
would erroneously conclude that this is a sell order. We investigate whether our identification
method is reliable in three ways. First, recall that when we cross-validate using the 2010 NASDAQ
TRF sample, we find a trade sign error rate of only about 2%. Second, we examine intraday quote
data from TAQ. For all 2015 trades that we can sign using our approach, we compare our buy-sell
assignment to the trade sign from the Lee and Ready algorithm, and we find that the trade signs
match for 89.9% of the observations. Last, we impose a strict filter that requires the average
effective spread from the previous month to be at most one cent, and re-examine our results. For
stocks with a one-cent spread, our trade-sign approach for subpenny-priced trades should match
the Lee-Ready algorithm exactly and should be virtually error-free overall. This strict filter
excludes more than 80% of the data, and we retain only the most liquid stocks in the sample. The
results are shown in Table XI Panel J. We find the marketable retail order imbalance still
significantly predicts the next week’s stock returns, with a coefficient of 0.0008 and a significant
t-statistic of 4.48, consistent with the findings shown in Table III.
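
The mis-signing example at the start of this subsection can be reproduced with a few lines of arithmetic. The sketch below is illustrative only; the function names are hypothetical, and it applies the 0 to 0.4 (sell) and 0.6 to 1 (buy) subpenny rule from the earlier sections.

```python
from decimal import Decimal

def improved_buy_price(ask, improvement_cents):
    """Execution price of a marketable buy that receives the given
    price improvement (in cents) relative to the ask."""
    return Decimal(ask) - Decimal(improvement_cents) / 100

def subpenny_sign(price):
    """Sign a trade from the fraction of a cent in its price."""
    z = (price * 100) % 1
    if Decimal("0") < z < Decimal("0.4"):
        return "sell"
    if Decimal("0.6") < z < Decimal("1"):
        return "buy"
    return None

# Three-cent spread (bid 50.01, ask 50.04), improvement of 0.75 cents:
wide = improved_buy_price("50.04", "0.75")      # Decimal("50.0325")
wide_sign = subpenny_sign(wide)                 # "sell": the buy is mis-signed

# Same ask, improvement of 0.25 cents (less than half a cent):
small = improved_buy_price("50.04", "0.25")     # Decimal("50.0375")
small_sign = subpenny_sign(small)               # "buy": correctly signed
```

The rule signs the trade correctly whenever the improvement is below 0.4 cents, and it fails in the example because the improvement exceeds half a cent, which is only possible when the quoted spread is wider than one cent.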

IV. Conclusions

In this paper, we exploit the fact that most marketable retail order flows in U.S. equity
markets are internalized or sold to wholesalers. As a part of this routing process, marketable retail
orders are typically given a small fraction of a penny per share of price improvement relative to
the national best bid or offer price, and this price improvement can be observed when the trade is
reported to the consolidated tape. Institutional orders almost never receive this kind of price
improvement, so it becomes possible to use subpenny trade prices to identify a broad swath of
marketable retail order flow. It is also straightforward to identify whether the
marketable retail order is buying or selling stock: transactions at prices that are just above a round
penny are classified as marketable retail sales, while transactions that are just below a round penny
are marketable retail purchases.

We use this methodology to characterize the trading behavior and information content of
marketable retail orders. We find that marketable retail order flows are on average contrarian over
weekly horizons, buying stocks that have experienced recent price declines and selling stocks that
have risen in the past week. More significantly, we find that the marketable retail order flow can
predict the cross section of future stock returns. Over the next week, stocks with more positive
marketable retail order imbalances outperform stocks with relatively negative marketable retail
order imbalances by about 10 basis points, which is on the order of 5% annualized. This
predictability extends to about 12 weeks before dying off. Through an empirical decomposition
exercise, we attribute less than half of the predictive power of marketable retail order imbalances
to the order imbalance’s persistence and potential liquidity provision by marketable retail
investors’ contrarian trading. The remainder of the predictive power (over half of it) is consistent
with the hypothesis that the marketable retail order flow contains valuable information about future
returns. Concerning the information content of the marketable retail trades, we provide some
suggestive evidence that marketable retail order flows contain relevant information about short-
term future earnings news that is not yet incorporated into price, but our examination of a standard
news database does not find evidence that our retail investors can anticipate that particular set of
future public news.

An important advantage of our method is that it is based on widely available intraday
transaction data: Anyone with access to TAQ can easily identify marketable retail buys and sells
using our approach. Our approach has many possible research applications. For example, future
researchers can investigate certain behavioral biases to determine whether individual traders as a
group exhibit them. Another possibility is studying the seasonality and time-series variation of
marketable retail order flow, including tax-related and calendar-driven trading, as well as activity
around corporate events, such as dividends, stock splits, and equity issuance.20

20 Our measure is already used in a few studies. For instance, Farrell et al. (2020) find our retail order imbalances are
strongly correlated with the sentiment of “Seeking Alpha” articles and that the ability of retail order imbalances to
predict returns is roughly twice as high on days with SeekingAlpha posts. Israeli, Kasznik, and Sridharan (2019) use
our methodology to identify retail investor trading and then use abnormal retail trading volume and Bloomberg
searches as specific measures of retail and institutional investor attention.



REFERENCES

Arif, Salman, Azi Ben-Rephael, and Charles M.C. Lee, 2016, Short-Sellers and Mutual
Funds: Why Does Short-Sale Volume Predict Stock Returns? Working Paper.
Barber, Brad M., and Terrance Odean, 2000, Trading is hazardous to your wealth: The common
stock investment performance of individual investors, Journal of Finance 55, 773−806.
Barber, Brad M., and Terrance Odean, 2008, All that glitters: The effect of attention and news
on the buying behavior of individual and institutional investors, Review of Financial Studies 21, 785-818.
Barber, Brad M., Terrance Odean, and Ning Zhu, 2009, Do retail trades move markets? Review
of Financial Studies 22, 151−186.
Barrot, Jean-Noel, Ron Kaniel, and David Alexandre Sraer, 2016, Are Retail Traders
Compensated for Providing Liquidity? Journal of Financial Economics 120, 146-168.
Battalio, Robert, Shane A. Corwin, and Robert Jennings, 2016, Can Brokers Have It All? On
the Relation between Make-Take Fees and Limit Order Execution Quality, Journal of Finance 71,
2193-2238.
Blume, Marshall E., and Robert F. Stambaugh, 1983, Biases in Computed Returns: An
Application to the Size Effect, Journal of Financial Economics 12, 387-404.
Boehmer, Ekkehart, Charles M. Jones, and Xiaoyan Zhang, 2008, Which Shorts Are Informed?
Journal of Finance 63, 491-527.
Campbell, John Y., Tarun Ramadorai, and Allie Schwartz, 2009, Caught on tape: Institutional
Trading, Stock Returns, and Earnings Announcements, Journal of Financial Economics, 92(1), 66-91.
Chakrabarty, Bidisha, Pamela C. Moulton, and Charles Trzcinka, 2017, The Performance of
Short-Term Institutional Trades, Journal of Financial and Quantitative Analysis 52, 1403-1428.
Chordia, Tarun, and Avanidhar Subrahmanyam, 2004, Order Imbalance and Stock Returns:
Theory and Evidence, Journal of Financial Economics 72, 485−518.
Chordia, Tarun, Sahn-Wook Huh, and Avanidhar Subrahmanyam, 2007, The Cross-Section of
Expected Trading Activity, Review of Financial Studies 20, 709-740.
Fama, Eugene F., and James D. MacBeth, 1973, Risk, Return, and Equilibrium: Empirical
Tests, Journal of Political Economy 81, 607−636.



Farrell, Michael, T. Clifton Green, Russell Jame, and Stanimir Markov, 2020, The
Democratization of Investment Research and the Informativeness of Retail Investor Trading, Working
Paper.
Fong, Kingsley YL, David R. Gallagher, and Adrian D. Lee, 2014, Individual Investors and
Broker Types, Journal of Financial and Quantitative Analysis 49(2), 431-451.
Frazzini, Andrea, Ronen Israel, and Tobias J. Moskowitz, 2018, Trading Costs, Working
Paper.
Hansen, Lars Peter, and Robert J. Hodrick, 1980, Forward exchange rates as optimal predictors
of future spot rates: An econometric analysis, Journal of political economy, 88(5), 829-853.
Israeli, Doron, Ron Kasznik, and Suhas A. Sridharan, 2019, Unexpected Distractions and
Investor Attention to Corporate Announcements. Working Paper.
Kaniel, Ron, Gideon Saar, and Sheridan Titman, 2008, Individual Investor Sentiment and
Stock Returns, Journal of Finance 63, 273−310.
Kaniel, Ron, Shuming Liu, Gideon Saar, and Sheridan Titman, 2012, Individual Investor
Trading and Return Patterns Around Earnings Announcements, Journal of Finance 67, 639-680.
Kelley, Eric K. and Paul C. Tetlock, 2013, How Wise Are Crowds? Insights from Retail Orders
and Stock Returns, Journal of Finance 68, 1229-1265.
Kwan, Amy, Ronald Masulis, and Thomas H. McInish, 2015, Trading rules, Competition for
Order Flow and Market Fragmentation, Journal of Financial Economics, 115, 330-348.
Lee, Charles M. and Balkrishna Radhakrishna, 2000, Inferring Investor Behavior: Evidence
from TORQ Data, Journal of Financial Markets, 3(2), 83-111.
Lee, Charles M.C. and Mark J. Ready, 1991, Inferring Trade Direction from Intraday Data,
Journal of Finance, 46(2), 733-746.
Menkveld, Albert J., Bart Z. Yueshen, and Haoxiang Zhu, 2017, Shades of Darkness: A
Pecking Order of Trading Venues, Journal of Financial Economics, 124, 503-534.
Newey, Whitney K., and Kenneth D. West, 1987, A Simple, Positive Semi-Definite,
Heteroskedasticity and Autocorrelation Consistent Covariance Matrix, Econometrica 55, 703−708.
O’Hara, Maureen, Chen Yao, and Mao Ye, 2014, What’s Not There: Odd Lots and Market
Data, Journal of Finance, 69(5), 2199-2236.



Table I. Summary Statistics
This table reports summary statistics of our measure of marketable retail investor trading activity.
Our sample period covers January 2010 to December 2015, and our sample firms are common
stocks listed on all U.S. stock exchanges with a share price of at least $1. Across all stocks and all
days, we report the pooled sample mean for the daily number of shares traded (vol), marketable
retail buy volume (mrbvol), marketable retail sell volume (mrsvol), number of trades (trd),
marketable retail buy trades (mrbtrd), marketable retail sell trades (mrstrd), as well as their odd
lot counterparts (prefix odd). Odd lot measures are available starting at the end of 2013. We include
odd lot-related data starting January 2014. We compute order imbalance measures (variables
containing mroib) as in equations (1) to (4).

| Variable | N | Mean | Std | Median | Q1 | Q3 |
|---|---|---|---|---|---|---|
| Round lots and odd lots | | | | | | |
| Vol | 4,628,957 | 1,229,004 | 6,849,849 | 221,234 | 51,768 | 819,615 |
| Trd | 4,628,957 | 5,917 | 13,909 | 1,505 | 312 | 5,502 |
| Mrbvol | 4,628,957 | 42,481 | 280,474 | 5,165 | 1,200 | 20,681 |
| Mrsvol | 4,628,957 | 42,430 | 264,704 | 5,635 | 1,369 | 21,828 |
| Mrbtrd | 4,628,957 | 110 | 410 | 22 | 5 | 79 |
| Mrstrd | 4,628,957 | 108 | 355 | 24 | 6 | 81 |
| Mroibvol | 4,628,957 | -0.038 | 0.464 | -0.027 | -0.301 | 0.217 |
| Mroibtrd | 4,628,957 | -0.032 | 0.437 | -0.010 | -0.276 | 0.205 |
| Odd lots only | | | | | | |
| Oddvol | 1,446,749 | 6,561 | 20,141 | 1,811 | 629 | 5,250 |
| Oddtrd | 1,446,749 | 222 | 669 | 64 | 21 | 186 |
| Oddmrbvol | 1,446,749 | 1,108 | 5,054 | 211 | 58 | 690 |
| Oddmrsvol | 1,446,749 | 968 | 3,488 | 210 | 62 | 663 |
| Oddmrbtrd | 1,446,749 | 37 | 171 | 7 | 2 | 23 |
| Oddmrstrd | 1,446,749 | 33 | 114 | 7 | 2 | 23 |
| Oddmroibvol | 1,446,749 | -0.004 | 0.559 | 0.014 | -0.338 | 0.331 |
| Oddmroibtrd | 1,446,749 | -0.017 | 0.506 | 0.000 | -0.290 | 0.250 |



Table II. Determinants of Marketable Retail Order Imbalances
This table reports determinants of retail investor trading activity. Our sample period covers January 2010 to December 2015, and our
sample firms are all common stocks listed on U.S. stock exchanges with a share price of at least $1. We estimate Fama-MacBeth
regressions as specified in equation (5). The dependent variables are two scaled marketable retail order imbalance measures: mroibvol
(based on the number of shares traded) and mroibtrd (based on the number of trades). As independent variables, we include the previous-
week return, ret(w-1), previous-month return, ret(m-1), and previous 6-month return, ret(m-7, m-2). We compute the weekly returns in
two ways: using the end-of-day bid-ask average price or the CRSP closing price. The control variables are monthly turnover (lmto),
monthly volatility of daily returns (lvol), log market cap (size), and log book-to-market ratio (lbm), all measured at the end of the previous
month. To account for serial correlation in the coefficients, the standard errors of the time-series are adjusted using Newey-West (1987)
with six lags.

Reg I II III IV
Dep.var Mroibvol Mroibvol Mroibtrd Mroibtrd
Return Bid-ask return CRSP return Bid-ask return CRSP return
Coef. t-stat Coef. t-stat Coef. t-stat Coef. t-stat
Intercept -0.4013 -20.03 -0.4065 -20.19 -0.4326 -22.00 -0.4357 -22.01
Mroib(w-1) 0.2200 92.53 0.2201 92.57 0.2865 150.01 0.2866 150.06
Ret(w-1) -0.9481 -40.60 -0.9620 -41.43 -0.9003 -35.92 -0.9156 -36.74
Ret(m-1) -0.2778 -19.24 -0.2784 -19.30 -0.2258 -14.84 -0.2262 -14.87
Ret(m-7,m-2) -0.0586 -11.49 -0.0584 -11.46 -0.0380 -6.50 -0.0378 -6.48
Lmto 0.0003 5.31 0.0003 5.19 0.0002 3.93 0.0002 3.83
Lvol 0.8100 8.37 0.8478 8.79 0.4366 4.24 0.4633 4.51
Size 0.0154 12.06 0.0157 12.31 0.0209 16.37 0.0211 16.48
Lbm -0.0275 -17.66 -0.0274 -17.61 -0.0274 -18.09 -0.0273 -18.05
Adj.R2 6.00% 6.01% 9.49% 9.50%
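
The estimation procedure named in the caption (Fama-MacBeth regressions with Newey-West adjusted standard errors) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes one cross-sectional OLS per week, a time-series average of the weekly coefficients, and a Bartlett-kernel Newey-West standard error for that average. The function names and the input layout (one array per week) are hypothetical.

```python
import numpy as np


def newey_west_se(series, lags):
    """Newey-West (Bartlett kernel) standard error of the mean of a time series."""
    x = np.asarray(series, dtype=float)
    n_obs = x.size
    dev = x - x.mean()
    long_run_var = dev @ dev / n_obs            # lag-0 autocovariance
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1.0)       # Bartlett weight
        autocov = dev[lag:] @ dev[:-lag] / n_obs
        long_run_var += 2.0 * weight * autocov
    return np.sqrt(long_run_var / n_obs)


def fama_macbeth(y_by_week, x_by_week, lags=5):
    """Average weekly cross-sectional OLS coefficients and their t-statistics.

    y_by_week: list of (n_t,) arrays; x_by_week: list of (n_t, k) arrays
    without a constant (an intercept is added here).
    """
    weekly_coefs = []
    for y, x in zip(y_by_week, x_by_week):
        design = np.column_stack([np.ones(len(y)), x])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        weekly_coefs.append(coef)
    weekly_coefs = np.vstack(weekly_coefs)
    mean_coef = weekly_coefs.mean(axis=0)
    se = np.array([newey_west_se(weekly_coefs[:, j], lags)
                   for j in range(weekly_coefs.shape[1])])
    return mean_coef, mean_coef / se
```

With this layout, the first returned element is the intercept and the remaining elements follow the column order of the weekly regressor arrays.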



Table III. Predicting Next-week Returns Using Marketable Retail Order Imbalances
This table reports estimation results on whether retail investors’ trading activity can predict the cross-section of one-week-ahead returns.
Our sample period covers January 2010 to December 2015, and our sample firms are all common stocks listed on U.S. stock exchanges
with a share price of at least $1. We estimate Fama-MacBeth regressions as specified in equation (6). The dependent variable is weekly
individual stock returns, computed in two ways: using end-of-day bid-ask average price or CRSP closing price. The independent
variables are two scaled marketable retail order imbalance measures: mroibvol (based on the number of shares traded) and mroibtrd
(based on the number of trades). As additional control variables, we include the previous-week return, ret(w-1), previous-month return,
ret(m-1), and previous 6-month return, ret(m-7, m-2). Other control variables are log book-to-market ratio (lbm), log market cap (size), monthly
turnover (lmto), and monthly volatility of daily returns (lvol), all measured at the end of the previous month. To account for serial
correlation in the coefficients, the standard errors of the estimated coefficients are adjusted using Newey-West (1987) with five lags.

Reg I II III IV
Order imbalance Mroibvol Mroibvol Mroibtrd Mroibtrd
Dep.var Bid-ask return CRSP return Bid-ask return CRSP return
Coef. t-stat Coef. t-stat Coef. t-stat Coef. t-stat
Intercept 0.0050 2.58 0.0056 2.85 0.0050 2.58 0.0056 2.85
Mroib(w-1) 0.0009 15.60 0.0010 16.29 0.0008 12.30 0.0008 13.20
Ret (w-1) -0.0185 -5.83 -0.0220 -6.85 -0.0186 -5.88 -0.0222 -6.91
Ret (m-1) 0.0006 0.35 0.0006 0.34 0.0005 0.29 0.0005 0.29
Ret (m-7, m-2) 0.0008 1.16 0.0008 1.16 0.0008 1.12 0.0008 1.12
Lmto 0.0000 -3.37 0.0000 -3.76 0.0000 -3.36 0.0000 -3.75
Lvol -0.0223 -1.41 -0.0205 -1.31 -0.0217 -1.37 -0.0198 -1.27
Size -0.0001 -0.86 -0.0001 -0.92 -0.0001 -0.90 -0.0001 -0.96
Lbm -0.0001 -0.39 0.0000 -0.07 -0.0001 -0.42 0.0000 -0.10
Adj.R2 3.85% 3.85% 3.84% 3.84%
Interquartile 1.1888 1.1888 1.2292 1.2292
Interquartile weekly return diff 0.1089% 0.1144% 0.0931% 0.0997%



Table IV. Marketable Retail Return Predictability within Subgroups
This table reports whether marketable retail investor order imbalances can predict the cross section of returns for subsets of stocks. Our
sample period covers January 2010 to December 2015, and our sample firms are all common stocks listed on U.S. stock exchanges with
a share price of at least $1. We first sort all firms into 3 groups based on previous month-end characteristics. Then we estimate Fama-
MacBeth regressions, specified in equation (6), for each subgroup. The dependent variable is weekly returns on individual stocks,
computed using end-of-day bid-ask average price. The independent variables are two scaled marketable retail order imbalance measures:
mroibvol (based on number of shares traded) and mroibtrd (based on number of trades). To account for serial correlation in the
coefficients, the standard errors of the time-series are adjusted using Newey-West (1987) with five lags. For each regression, we also
provide the interquartile range for the relevant explanatory order imbalance along with the difference in predicted week-ahead returns
for observations at the two ends of the interquartile range. Control variables are the same as those in Table III and are not reported.

Panel A. Market cap groups


Mroib measure Mroibvol Mroibtrd
Mkt cap Coef. t-stat Interquartile Weekly return diff Coef. t-stat Interquartile Weekly return diff
Small 0.0013 13.90 1.662 0.219% 0.0012 11.58 1.736 0.207%
Medium 0.0007 9.18 1.323 0.087% 0.0004 5.63 1.346 0.059%
Big 0.0003 3.68 0.892 0.026% 0.0002 2.52 0.929 0.019%

Panel B. Share price groups


Mroib measure Mroibvol Mroibtrd
Price groups Coef. t-stat Interquartile Weekly return diff Coef. t-stat Interquartile Weekly return diff
Low 0.0014 13.34 1.432 0.205% 0.0012 10.34 1.586 0.185%
Medium 0.0007 10.00 1.289 0.089% 0.0005 7.56 1.309 0.070%
High 0.0002 3.23 0.961 0.020% 0.0002 2.19 0.961 0.015%

Panel C. Turnover groups


Mroib measure Mroibvol Mroibtrd
Turnover groups Coef. t-stat Interquartile Weekly return diff Coef. t-stat Interquartile Weekly return diff
Low 0.0011 15.60 1.837 0.205% 0.0011 14.71 1.777 0.195%
Medium 0.0008 10.21 1.219 0.094% 0.0006 7.05 1.228 0.071%
High 0.0007 4.98 0.910 0.065% 0.0004 2.55 1.005 0.037%
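
The "Weekly return diff" columns can be approximated directly from the caption's description: the estimated coefficient multiplied by the interquartile range of the order imbalance. The sketch below is an arithmetic check under that reading; the function name is hypothetical, and the inputs are the rounded small-cap mroibvol values from Panel A.

```python
def interquartile_return_diff_pct(coef, interquartile_range):
    """Predicted weekly return gap, in percent, between the two ends of the interquartile range."""
    return coef * interquartile_range * 100.0


# Panel A, small-cap group, mroibvol: coefficient 0.0013, interquartile range 1.662.
small_cap_diff = interquartile_return_diff_pct(0.0013, 1.662)
print(f"{small_cap_diff:.3f}%")
```

The product is about 0.216%, close to the 0.219% reported in the table; the small gap is consistent with the coefficient being rounded to four decimals.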



Table V. Predicting Returns k-weeks Ahead
This table reports estimation results on whether marketable retail investor trading activity can predict the cross-section of stock returns
at more distant horizons. Our sample period covers January 2010 to December 2015, and our sample firms are all common stocks listed
on U.S. stock exchanges with a share price of at least $1. We estimate Fama-MacBeth regressions, as specified in equation (7). The
dependent variable is the weekly individual stock return k weeks ahead, computed in two ways: using the end-of-day bid-ask average
price (Panel A) or CRSP closing price (Panel B). The independent variables are two scaled marketable retail order imbalance measures,
mroibvol (based on the number of shares traded) or mroibtrd (based on the number of trades), respectively. To account for serial
correlation in the coefficients, the standard errors of the time-series are adjusted using Newey-West (1987) with five lags. Control
variables are the same as those in Table III; those coefficients are not reported.

Panel A. Predict bid-ask average return k weeks ahead


Mroibvol Mroibtrd
# of weeks ahead Coef. t-stat Coef. t-stat
1 week 0.00092 15.60 0.00076 12.30
2 weeks 0.00055 9.35 0.00048 7.89
4 weeks 0.00031 5.56 0.00026 4.66
6 weeks 0.00022 3.90 0.00015 2.60
8 weeks 0.00021 3.47 0.00011 1.75
10 weeks 0.00010 1.82 0.00002 0.35
12 weeks 0.00007 1.29 0.00009 1.52

Panel B. Predict CRSP return k weeks ahead


Mroibvol Mroibtrd
# of weeks ahead Coef. t-stat Coef. t-stat
1 week 0.00096 16.29 0.00081 13.20
2 weeks 0.00058 9.99 0.00052 8.57
4 weeks 0.00032 5.92 0.00028 5.05
6 weeks 0.00024 4.18 0.00017 2.93
8 weeks 0.00021 3.50 0.00011 1.80
10 weeks 0.00011 2.04 0.00005 0.81
12 weeks 0.00008 1.39 0.00010 1.76



Table VI. Long-short Strategy Returns Based on Marketable Retail Order Imbalances
This table reports portfolio returns using a long-short strategy wherein we buy the stocks in the highest quintile of scaled marketable
retail order imbalance, and we short the stocks in the lowest scaled marketable retail order imbalance quintile. The order imbalance is
computed during the previous week. Our sample period covers January 2010 to December 2015, and our sample firms are all common
stocks listed on U.S. stock exchanges with a share price of at least $1. Portfolio returns are value-weighted, and market cap terciles are
based on the previous month-end market cap. Because the holding period can be as long as 12 weeks, we report both the raw returns
and risk-adjusted returns using the Fama-French three-factor model. As our data are overlapping, we adjust the standard errors of the
portfolio return time-series using Hansen-Hodrick (1980) standard errors with the corresponding number of overlapping lags.

Panel A. Form portfolios on the previous week marketable retail order imbalance based on number of shares traded
Holding Whole sample Small Medium Big
Period Mean t-stat alpha t-stat alpha t-stat alpha t-stat alpha t-stat
1 week 0.092% 2.66 0.084% 2.43 0.403% 9.16 0.170% 6.24 0.067% 1.78
2 weeks 0.147% 2.45 0.135% 2.46 0.669% 9.01 0.292% 6.81 0.105% 1.70
4 weeks 0.223% 1.89 0.208% 2.00 1.124% 10.43 0.423% 6.36 0.143% 1.22
6 weeks 0.310% 1.72 0.277% 1.73 1.399% 13.02 0.558% 6.07 0.171% 1.05
8 weeks 0.448% 1.92 0.460% 2.26 1.709% 17.13 0.623% 4.18 0.342% 1.69
10 weeks 0.515% 1.99 0.484% 1.81 1.704% 11.17 0.578% 3.87 0.381% 1.53
12 weeks 0.588% 2.09 0.629% 1.89 1.857% 7.65 0.556% 3.20 0.477% 1.48

Panel B. Form portfolios on the previous week marketable retail order imbalance based on number of trades
Holding Whole sample Small Medium Big
Period Mean t-stat alpha t-stat alpha t-stat alpha t-stat alpha t-stat
1 week 0.056% 1.34 0.061% 1.44 0.343% 7.04 0.104% 3.52 0.055% 1.42
2 weeks 0.137% 1.72 0.143% 1.89 0.557% 6.72 0.194% 4.02 0.119% 1.61
4 weeks 0.238% 1.61 0.251% 1.88 0.880% 6.98 0.277% 3.75 0.214% 1.61
6 weeks 0.311% 1.50 0.350% 1.93 1.145% 6.25 0.313% 2.62 0.304% 1.84
8 weeks 0.427% 1.58 0.523% 2.26 1.468% 6.40 0.353% 1.91 0.449% 2.19
10 weeks 0.454% 1.41 0.539% 1.74 1.442% 5.37 0.292% 1.56 0.483% 1.64
12 weeks 0.529% 1.47 0.667% 1.70 1.672% 5.30 0.228% 1.05 0.567% 1.51
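
The long-short construction described in the caption can be sketched for a single formation week as follows. This is a simplified illustration rather than the authors' procedure: the quintile breakpoints come from np.quantile, boundary stocks are included in the extreme quintiles, and the function name and inputs are hypothetical. Risk adjustment and the Hansen-Hodrick correction are not shown.

```python
import numpy as np


def long_short_return(mroib_prev_week, next_return, market_cap):
    """Value-weighted return of the top imbalance quintile minus the bottom quintile."""
    mroib = np.asarray(mroib_prev_week, dtype=float)
    ret = np.asarray(next_return, dtype=float)
    cap = np.asarray(market_cap, dtype=float)

    # Assumed breakpoints: 20th and 80th percentiles of the previous-week imbalance.
    low_cut, high_cut = np.quantile(mroib, [0.2, 0.8])
    short_leg = mroib <= low_cut   # most net selling
    long_leg = mroib >= high_cut   # most net buying

    def value_weighted(mask):
        return np.average(ret[mask], weights=cap[mask])

    return value_weighted(long_leg) - value_weighted(short_leg)


# Hypothetical cross-section of ten stocks.
example_mroib = np.arange(10) / 10.0
example_ret = np.arange(10) / 100.0
example_cap = np.array([1, 3, 1, 1, 1, 1, 1, 1, 1, 3], dtype=float)
example_spread = long_short_return(example_mroib, example_ret, example_cap)
```

In this toy cross-section the two lowest-imbalance stocks form the short leg and the two highest form the long leg, with the larger stock in each leg receiving three times the weight.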



Table VII. Predictability Decomposition
This table reports estimation results on a decomposition of marketable retail order flow’s predictive power for the cross-section of future
stock returns. Our sample period covers January 2010 to December 2015, and our sample firms are all common stocks listed on U.S.
stock exchanges with a share price of at least $1. We estimate two-stage Fama-MacBeth regressions. Panel A reports the first-stage
estimation results, where the order imbalance measures are decomposed into three components, as specified in equation (8). Panel B
reports the second-stage decomposition of order imbalance’s predictive power, as specified in equations (9) - (11). The weekly returns
are computed in two ways: using end-of-day bid-ask average price or CRSP closing price. The scaled marketable retail order imbalance
measures are mroibvol (based on the number of shares traded) and mroibtrd (based on the number of trades). The variable mroib(w-1,
persistence) is estimated in the first stage using past order imbalance and reflects price pressure. The variable mroib(w-1, contrarian)
is estimated in the first stage using past returns over different horizons and is connected to the liquidity provision hypothesis. The
residual part of the previous-week order imbalance from the first-stage estimation is denoted as “other,” which can be attributed to
private information about future returns on the part of these marketable retail investors. As additional control variables, we include
previous-week return, ret(w-1), previous-month return, ret(m-1), and previous 6-month return, ret(m-7, m-2). The control variables are
log book-to-market ratio (lbm), log market cap (size), monthly turnover (lmto), and monthly volatility of daily returns (lvol), all measured
at the end of the previous month. To account for serial correlation in the coefficients, the standard errors of the time-series are adjusted
using Newey-West (1987) with five lags.

Panel A. First stage of projecting order imbalance on persistence and past return
Reg I II III IV
Dep.var Mroibvol(w-1) Mroibvol(w-1) Mroibtrd(w-1) Mroibtrd(w-1)
Return Bid-ask return CRSP return Bid-ask return CRSP return
Coef. t-stat Coef. t-stat Coef. t-stat Coef. t-stat
Intercept -0.1413 -24.66 -0.1408 -24.61 -0.1054 -17.23 -0.1049 -17.19
Mroib(w-2) 0.2227 96.20 0.2228 96.20 0.2906 149.82 0.2907 149.85
Ret(w-2) -0.9286 -38.93 -0.9422 -39.80 -0.8926 -34.92 -0.9076 -35.81
Ret(m-1) -0.2029 -13.93 -0.2025 -13.90 -0.1591 -10.72 -0.1588 -10.70
Ret(m-7,m-2) -0.0267 -4.98 -0.0268 -4.99 -0.0054 -0.86 -0.0055 -0.88
Adj.R2 5.62% 5.63% 8.99% 9.00%



Panel B. Second-stage decomposition of order imbalance’s predictive power
Reg I II III IV
Order Imbalance Mroibvol(w) Mroibvol(w) Mroibtrd(w) Mroibtrd(w)
Dep.var Bid-ask return CRSP return Bid-ask return CRSP return
Coef. t-stat Coef. t-stat Coef. t-stat Coef. t-stat
Intercept 0.0046 2.25 0.0052 2.54 0.0046 2.23 0.0052 2.52
Mroib(w-1,persistence) 0.0027 8.75 0.0029 9.41 0.0018 7.80 0.0019 8.56
Mroib(w-1,contrarian) -0.0044 -0.42 -0.1310 -1.46 -0.0073 -0.73 0.0328 1.62
Mroib(w-1,other) 0.0008 14.47 0.0009 15.48 0.0006 10.51 0.0007 11.64
Ret(w-1) -0.0176 -5.41 -0.0206 -6.27 -0.0177 -5.45 -0.0207 -6.30
Ret(m-1) -0.0060 -0.67 0.0002 0.03 0.0017 0.56 0.0093 1.13
Ret(m-7,m-2) -0.0009 -0.65 -0.0127 -1.12 0.0017 0.95 -0.0008 -0.34
Lmto 0.0000 -3.49 0.0000 -3.80 0.0000 -3.48 0.0000 -3.78
Lvol -0.0230 -1.48 -0.0231 -1.50 -0.0224 -1.44 -0.0225 -1.46
Size -0.0001 -0.61 -0.0001 -0.67 -0.0001 -0.65 -0.0001 -0.72
Lbm -0.0001 -0.46 0.0000 -0.14 -0.0001 -0.56 -0.0001 -0.23
Adj.R2 4.26% 4.27% 4.25% 4.26%
Int’quartile range Return diff Int’quartile range Return diff Int’quartile range Return diff Int’quartile range Return diff
Mroib(w-1,persistence) 0.2591 0.0688% 0.2593 0.0739% 0.3498 0.0620% 0.3500 0.0679%
Mroib(w-1,contrarian) 0.0627 -0.0277% 0.0631 -0.8265% 0.0614 -0.0445% 0.0619 0.2031%
Mroib(w-1,other) 1.1141 0.0915% 1.1141 0.0977% 1.1326 0.0718% 1.1327 0.0792%
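
The first-stage decomposition described in the caption can be sketched for one cross-section as follows. This is an illustration under stated assumptions, not the authors' implementation: a single OLS projection of the previous-week imbalance on its own lag and on past returns, with the fitted pieces labeled "persistence" and "contrarian" and everything else (including the intercept, which is an assumption here) assigned to "other". The function and argument names are hypothetical.

```python
import numpy as np


def decompose_imbalance(mroib_w1, mroib_w2, past_returns):
    """Split mroib(w-1) into persistence, contrarian, and other components.

    mroib_w1: (n,) imbalance in week w-1; mroib_w2: (n,) imbalance in week w-2;
    past_returns: (n, k) past returns over the horizons used in the first stage.
    """
    mroib_w1 = np.asarray(mroib_w1, dtype=float)
    mroib_w2 = np.asarray(mroib_w2, dtype=float)
    past_returns = np.asarray(past_returns, dtype=float)

    design = np.column_stack([np.ones(mroib_w1.size), mroib_w2, past_returns])
    coef, *_ = np.linalg.lstsq(design, mroib_w1, rcond=None)

    persistence = coef[1] * mroib_w2          # part explained by lagged imbalance
    contrarian = past_returns @ coef[2:]      # part explained by past returns
    other = mroib_w1 - persistence - contrarian  # intercept plus residual (assumption)
    return persistence, contrarian, other
```

The three components sum to the original imbalance by construction, so the second stage can include them as separate regressors in place of mroib(w-1).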



Table VIII. Marketable Retail Order Imbalance and Contemporaneous Returns, Replicating KST Table III using mroibvol
This table presents analysis of market-adjusted returns around net buying and selling activity as given by our scaled marketable retail
order imbalance measure mroibvol (based on the number of shares traded). The sample extends from Jan 2010 to Dec 2015. For each
(non-overlapping) week in the sample period, we aggregate the daily order imbalance measures to weekly to form Mroib deciles. Each
stock is put into 1 of 10 deciles according to the Mroib value in the current week. Decile 1 contains the stocks with the most net selling
(negative Mroib) while decile 10 contains the stocks with the most net buying (positive Mroib). We present the results for four portfolios:
(i) decile 1, (ii) deciles 1 and 2, (iii) deciles 9 and 10, and (iv) decile 10. Let k be the number of days prior to or following the portfolio
formation each week. In Panel A, we calculate eight cumulative return numbers for each of the stocks in a portfolio: 𝐶𝑅(𝑡 − 𝑘, 𝑡 − 1),
where 𝑘 ∈ {20,15,10,5} days and t is the first day of the formation week, and 𝐶𝑅(𝑡 + 1, 𝑡 + 𝑘), where 𝑘 ∈ {5,10,15,20} days and t is
the last day of the formation week. The return on each portfolio is then adjusted by subtracting the return on a market proxy (the equal-
weighted portfolio of all stocks in the sample). We present the time-series mean and t-statistic for each market-adjusted cumulative
return measure and for the market-adjusted return during the intense trading week (k=0). In Panel B, we present the time-series mean
and t-statistic for weekly market-adjusted returns in the 4 weeks around the formation week (i.e., 𝐶𝑅(𝑡 − 𝑘, 𝑡 − 𝑘 + 4), where 𝑘 ∈
{20,15,10,5} days and t is the first day of the formation week, and 𝐶𝑅(𝑡 + 𝑘 − 4, 𝑡 + 𝑘), where 𝑘 ∈ {5,10,15,20} and t is the last day
of the formation week). ** indicates significance at 1% level and * indicates significance at 5% level (both against a two-sided
alternative). The t-statistic is computed using Newey-West standard errors.

Panel A. Cumulative market-adjusted return


Mroibvol Intense Selling Selling Buying Intense Buying
Bid-ask return (decile 1) (decile 1&2) (decile 9&10) (decile10)
Mean t-stat Mean t-stat Mean t-stat Mean t-stat
k=-20 0.0067** 5.48 0.0063** 8.10 -0.0111** -19.00 -0.0129** -12.49
k=-15 0.0056** 5.62 0.0055** 8.71 -0.0096** -20.87 -0.0109** -12.91
k=-10 0.0041** 5.40 0.0042** 9.04 -0.0074** -20.83 -0.0084** -12.93
k=-5 0.0027** 6.06 0.0028** 10.07 -0.0047** -24.09 -0.0053** -15.62
k=0 -0.0024** -5.30 -0.0019** -5.65 0.0011** 4.02 0.0011** 2.69
k=5 -0.0016** -3.89 -0.0012** -5.07 0.0018** 9.58 0.0024** 6.99
k=10 -0.0023** -3.16 -0.0018** -4.35 0.0028** 8.38 0.0036** 5.89
k=15 -0.0025* -2.45 -0.0022** -3.97 0.0036** 8.39 0.0046** 5.51
k=20 -0.0030* -2.36 -0.0025** -3.63 0.0043** 8.89 0.0057** 5.74



Panel B. Weekly market-adjusted return
Mroibvol Intense Selling Selling Buying Intense Buying
Bid-ask return (decile 1) (decile 1&2) (decile 9&10) (decile10)
Mean t-stat Mean t-stat Mean t-stat Mean t-stat
k=-20 0.0010** 2.75 0.0008** 3.36 -0.0016** -8.15 -0.0019** -5.90
k=-15 0.0016** 4.12 0.0014** 5.66 -0.0022** -12.70 -0.0026** -7.97
k=-10 0.0014** 3.59 0.0015** 5.88 -0.0028** -12.66 -0.0030** -7.62
k=-5 0.0027** 6.06 0.0028** 10.07 -0.0047** -24.09 -0.0053** -15.62
k=0 -0.0024** -5.30 -0.0019** -5.65 0.0011** 4.02 0.0011** 2.69
k=5 -0.0016** -3.89 -0.0012** -5.07 0.0018** 9.58 0.0024** 6.99
k=10 -0.0006 -1.51 -0.0006* -2.42 0.0010** 5.38 0.0013** 3.68
k=15 -0.0001 -0.21 -0.0004 -1.74 0.0009** 4.61 0.0010** 2.77
k=20 -0.0005 -1.34 -0.0003 -1.29 0.0006** 3.84 0.0010** 2.80
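
The portfolio formation described in the caption (sorting stocks into Mroib deciles each week and subtracting an equal-weighted market proxy) can be sketched as follows. This is a simplified single-week illustration: the rank-based decile assignment, the arbitrary handling of ties, and the equal weighting within each portfolio are assumptions, and the function names are hypothetical.

```python
import numpy as np


def decile_labels(values):
    """Assign deciles 1 (lowest values) to 10 (highest values) by rank."""
    values = np.asarray(values, dtype=float)
    # Rank 0 is the smallest value; ties are ordered by position (assumption).
    ranks = np.argsort(np.argsort(values, kind="stable"), kind="stable")
    return ranks * 10 // values.size + 1


def market_adjusted_return(returns, labels, deciles):
    """Equal-weighted return of the chosen deciles minus the equal-weighted return of all stocks."""
    returns = np.asarray(returns, dtype=float)
    in_portfolio = np.isin(labels, deciles)
    return returns[in_portfolio].mean() - returns.mean()


# Hypothetical week with 20 stocks.
week_mroib = np.linspace(-1.0, 1.0, 20)
week_ret = 0.01 * week_mroib
week_labels = decile_labels(week_mroib)
intense_buying = market_adjusted_return(week_ret, week_labels, [10])
selling = market_adjusted_return(week_ret, week_labels, [1, 2])
```

Passing [1], [1, 2], [9, 10], or [10] as the decile list yields the four portfolios reported in the table.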



Table IX. Relation between Public News and Marketable Retail Order Flow
This table reports analysis of the relation between public news and our marketable retail investor order imbalance. Our sample period
covers January 2010 to December 2014, and our sample firms are all common stocks listed on U.S. stock exchanges with a share price
of at least $1. In Panel A, we examine whether public news can predict the cross-section of future stock returns. We estimate the
Fama-MacBeth regression specified in equation (12). The dependent variable is weekly returns, computed in two ways: using end-of-day
bid-ask average price or CRSP closing price. The independent variable is sent(w-1), the average TRNA net sentiment score for firm i,
computed by averaging non-missing news sentiment for firm i within week w-1. We also include past marketable retail order
imbalance mroibvol (based on the number of shares traded by marketable retail) in regressions III and IV. As control variables, we
include the previous-week return, ret(w-1), previous-month return, ret(m-1), and previous 6-month return, ret(m-7, m-2), log book-to-
market ratio (lbm), log market cap (size), monthly turnover (lmto), and monthly volatility of daily returns (lvol), measured at the end of
the previous month. In Panel B, we examine the relation between contemporaneous public news sentiment and marketable retail order
imbalance across subtopics. We estimate the Fama-MacBeth regression specified in equation (13). The dependent variable is weekly
net sentiment score, sent(i,w). The independent variable is the marketable retail order imbalance measure mroibvol. As control variables,
we include the previous-week return, ret(w-1), previous-month return, ret(m-1), and previous 6-month return, ret(m-7, m-2), log
book-to-market ratio (lbm), log market cap (size), monthly turnover (lmto), and monthly volatility of daily returns (lvol). Coefficients on controls are
not reported for brevity. To account for serial correlation in the coefficients, the standard errors of the estimated coefficients are adjusted
using Newey-West (1987) with 5 lags.



Panel A. Predicting returns using public news and marketable retail order flow
Reg I II III IV
Dep.var Bid-ask return CRSP return Bid-ask return CRSP return
Order Imbalance Mroibvol Mroibvol
Coef. t-stat Coef. t-stat Coef. t-stat Coef. t-stat
Intercept 0.0057 2.48 0.0066 2.83 0.0061 2.66 0.0070 3.02
Sent(w-1) 0.0008 3.31 0.0009 3.64 0.0008 3.33 0.0009 3.66
Mroib(w-1) 0.0009 10.61 0.0010 11.53
Ret(w-1) -0.0088 -2.65 -0.0105 -3.10 -0.0090 -2.70 -0.0107 -3.16
Ret(m-1) 0.0008 0.38 0.0009 0.44 0.0013 0.64 0.0015 0.72
Ret(m-7,m-2) 0.0001 0.15 0.0001 0.11 0.0002 0.21 0.0001 0.18
Lmto 0.0000 -1.03 0.0000 -1.29 0.0000 -1.14 0.0000 -1.41
Lvol -0.0435 -2.15 -0.0465 -2.34 -0.0444 -2.20 -0.0477 -2.40
Size -0.0001 -0.90 -0.0001 -1.04 -0.0001 -0.99 -0.0002 -1.14
Lbm 0.0001 0.29 0.0001 0.61 0.0001 0.44 0.0002 0.79
Adj.R2 5.01% 5.01% 5.06% 5.08%

Panel B. Contemporaneous relation between sentiment and order imbalance


Topic Type Description N g1 t-stat
RESF equities results forecast 102,515 0.0054 3.90
AAA money/debt debt rating news 23,405 0.0131 3.66
DIP general news diplomacy 6,057 0.0362 3.34
DIV equities dividend 24,282 0.0093 2.77
IGD money/debt investment grade debt 6,760 0.0261 2.76
DRV cross market derivatives 18,061 0.0238 2.58
DBT money/debt debt markets 73,600 0.0060 2.55
MTG money/debt mortgage-backed debt 7,764 0.0264 2.52
RES equities corporate results 176,699 0.0031 2.36
JUDIC general news judicial 28,280 0.0096 2.31



Table X. Predictability Decomposition using Public News Releases

This table reports estimation results on a decomposition of our marketable retail order flow measure’s predictive power for future returns.
Our sample period covers January 2010 to December 2014, and our sample firms are all common stocks listed on U.S. stock exchanges
with a share price of at least $1. We estimate two-stage Fama-MacBeth regressions specified in equations (14) - (16). The dependent
variable is weekly returns, computed in two ways: using end-of-day bid-ask average price or CRSP closing price. The independent
variables are scaled marketable retail order imbalance measures: mroibvol (based on the number of shares traded) and mroibtrd (based
on the number of trades). In the first-stage estimation, the order imbalance measures are decomposed into four components. The variable
mroib(w-1, persistence) is estimated in the first stage using past order imbalance and reflects price pressure. The variable mroib(w-1,
contrarian) is estimated in the first stage using past returns over different horizons, which is connected to the liquidity provision
hypothesis. The variable mroib(w-1, public) is estimated in the first stage using week w-1 news sentiment, which proxies for marketable
retail order imbalances that predict returns associated with future news releases. The residual part of previous-week order imbalance
from first-stage estimation is denoted as “other,” which we attribute to retail investors’ valuable private information about future returns
that is incorporated into prices but is not associated with an identifiable public news release. As additional control variables, we include
previous-week return, ret(w-1), previous-month return, ret(m-1), and previous 6-month return, ret(m-7, m-2). Other control variables
are log book-to-market ratio (lbm), log market cap (size), monthly turnover (lmto), and monthly volatility of daily returns (lvol), all
measured at the end of the previous month. To account for serial correlation in the coefficients, the standard errors of the time-series
are adjusted using Newey-West (1987) with five lags.

Panel A. First stage of projecting order imbalance on persistence, past return, and public news.
Reg I II III IV
Dep.var Mroibvol(w-1) Mroibvol(w-1) Mroibtrd(w-1) Mroibtrd(w-1)
Return Bid-ask return CRSP return Bid-ask return CRSP return
Coef. t-stat Coef. t-stat Coef. t-stat Coef. t-stat
Intercept -0.1510 -24.08 -0.1505 -24.03 -0.1132 -16.89 -0.1127 -16.85
Mroib(w-2) 0.2208 112.64 0.2209 112.67 0.2827 125.44 0.2829 125.48
Ret(w-2) -0.8918 -36.22 -0.9051 -37.00 -0.8940 -34.04 -0.9098 -35.01
Ret(m-1) -0.2169 -13.68 -0.2156 -13.60 -0.1702 -10.56 -0.1687 -10.47
Ret(m-7,m-2) -0.0264 -4.54 -0.0264 -4.55 -0.0080 -1.17 -0.0081 -1.19
Sent(w-1) 0.0249 11.60 0.0249 11.60 0.0305 13.81 0.0305 13.81
Adj.R2 5.49% 5.50% 8.58% 8.59%



Panel B. Second-stage decomposition of order imbalance’s predictive power
Reg I II III IV
Order Imbalance Mroibvol(w) Mroibvol(w) Mroibtrd(w) Mroibtrd(w)
Dep.var Bid-ask return CRSP return Bid-ask return CRSP return
Coef. t-stat Coef. t-stat Coef. t-stat Coef. t-stat
Intercept 0.0055 2.40 0.0060 2.63 0.0055 2.39 0.0060 2.62
Mroib(w-1,persistence) 0.0027 8.50 0.0029 9.16 0.0018 7.28 0.0020 8.02
Mroib(w-1,contrarian) 0.0088 0.55 0.6947 1.03 -0.0182 -1.00 0.0419 1.18
Mroib(w-1,public news) 0.1150 1.17 -0.0134 -0.35 -0.0386 -0.84 0.0020 0.04
Mroib(w-1,other) 0.0008 13.98 0.0009 15.16 0.0006 10.15 0.0007 11.35
Ret(w-1) -0.0217 -6.25 -0.0250 -7.12 -0.0218 -6.28 -0.0251 -7.16
Ret(m-1) 0.0059 0.54 0.6585 1.03 0.0106 1.05 0.0119 1.41
Ret(m-7,m-2) 0.0014 1.01 0.0511 1.02 0.0059 0.99 -0.0004 -0.22
Lmto 0.0000 -2.40 0.0000 -2.75 0.0000 -2.35 0.0000 -2.71
Lvol -0.0273 -1.63 -0.0252 -1.52 -0.0266 -1.59 -0.0244 -1.47
Size -0.0001 -0.68 -0.0001 -0.73 -0.0001 -0.73 -0.0001 -0.79
Lbm 0.0001 0.31 0.0001 0.66 0.0001 0.25 0.0001 0.59
Adj.R2 4.22% 4.23% 4.21% 4.22%
Interquartile Return diff Interquartile Return diff Interquartile Return diff Interquartile Return diff
Mroib(w-1,persistence+contrarian+public news) 0.2760 0.0707% 0.2763 0.0761% 0.3609 0.0614% 0.3611 0.0678%
Mroib(w-1,other) 1.1202 0.0932% 1.1203 0.1006% 1.1654 0.0745% 1.1654 0.0829%



Table XI. Additional Analysis
Our sample period covers January 2010 to December 2015, and our sample firms are all common stocks listed on U.S. stock exchanges
with a share price of at least $1. Standard errors are calculated using Newey-West (1987). In Panel A, we estimate equation (17). The
dependent variable is the n-week ahead weekly value-weighted market return. The independent variables are two scaled marketable
retail order imbalance measures, mroibvol (based on the number of marketable retail shares traded), and mroibtrd (based on the number
of marketable retail trades), respectively. For all other panels, the regression is specified in equation (6) and estimated using Fama-
MacBeth regressions. In Panel B, the dependent variable is weekly returns on approximately 1000 ETFs. In Panel C, we estimate the
coefficients for different subsamples. In Panel D, we estimate the coefficients for different VIX regimes. In Panel E, the independent
variables are two scaled odd lot marketable retail order imbalance measures, oddmroibvol (based on the number of odd lot shares traded)
and oddmroibtrd (based on the number of odd lot trades), respectively. In Panel F, we estimate the coefficients for different trade sizes.
In Panel G, we estimate the coefficients for different amounts of price improvement. In Panel H, we estimate a variant of equation (6)
that allows the predictive relationship to differ based on the variable event day, an indicator that takes a value of 1 if day t is an
announcement day and zero otherwise. In Panel I, we estimate the coefficients controlling for overall order imbalance computed using
the Lee-Ready algorithm. In Panel J, we estimate the coefficient when the effective spread is less than 1 cent. The dependent variable is
weekly returns, computed in two ways: using end-of-day bid-ask average price or CRSP closing price. The independent variable is one
of the two scaled marketable retail order imbalance measures mroibvol or mroibtrd. Control variables for the cross-sectional regressions
are the same as those shown in Table III, except that we do not include a book-to-market variable in the ETF regression; those coefficients
are not reported.

Panel A. Predicting future n-week market return


Mroibvol Mroibvol Mroibtrd Mroibtrd
Weights Value weight Equal weight Value weight Equal weight
Horizon Coef. t-stat Coef. t-stat Coef. t-stat Coef. t-stat
1 week 0.0037 0.50 -0.0053 -0.57 0.0054 0.92 -0.0038 -0.46
2 weeks 0.0101 0.79 -0.0030 -0.20 0.0120 1.21 0.0007 0.06
4 weeks 0.0044 0.20 -0.0236 -1.04 0.0073 0.43 -0.0136 -0.63
6 weeks -0.0061 -0.22 -0.0356 -1.25 0.0022 0.10 -0.0216 -0.80
8 weeks 0.0075 0.20 -0.0046 -0.10 0.0118 0.41 0.0044 0.11
10 weeks 0.0051 0.11 -0.0114 -0.23 0.0101 0.28 -0.0038 -0.08
12 weeks -0.0059 -0.10 -0.0315 -0.58 0.0000 0.00 -0.0227 -0.46



Panel B. Using marketable retail mroib to predict ETF returns
Order imbalance Mroibvol Mroibtrd
Dep.var Bid-ask return Bid-ask return
Coef. t-stat Coef. t-stat
All ETFs 0.0001 2.04 0.0001 1.68
Interquartile 1.4726 1.4737
Return diff 0.0153% 0.0118%
Broad market ETFs -0.0004 -0.81 0.0005 1.52

Panel C. Subsample analysis


Reg I II III IV
Period 2010-2012 2013-2015 2010-2012 2013-2015
Order imbalance Mroibvol Mroibvol Mroibtrd Mroibtrd
Dep.var Bid-ask Return Bid-ask Return Bid-ask Return Bid-ask Return
Coef. t-stat Coef. t-stat Coef. t-stat Coef. t-stat
Mroib 0.0010 11.52 0.0009 10.57 0.0008 9.39 0.0007 8.10
Interquartile 1.2357 1.1424 1.3328 1.1266
Return diff 0.1213% 0.0974% 0.1041% 0.0826%

Panel D. Different market conditions


Vix <=18% Vix>18%
Dep.var Bid-ask return Bid-ask return
Indep.var Coef. t-stat Coef. t-stat
Mroibvol 0.0009 13.49 0.0010 9.36
Mroibtrd 0.0007 10.32 0.0008 7.60

Panel E. Predicting stock returns using odd-lot order imbalances


Order imbalance Mroibvol Mroibtrd
Dep.var Bid-ask return Bid-ask return
Coef. t-stat Coef. t-stat
Odd lot 0.0001 1.41 0.0001 0.77
Interquartile 1.2734 1.1314
Return diff 0.0154% 0.0086%

Panel F. Different marketable retail trade sizes


Order imbalance Mroibvol Mroibtrd
Dep.var Bid-ask return Bid-ask return
Coef. t-stat Coef. t-stat
Small trades (< 400 shares) 0.0004 5.77 0.0004 4.48
Large trades (≥ 400 shares) 0.0009 7.25 0.0008 5.85

Panel G. Different price improvement amounts


Order imbalance Mroibvol Mroibtrd



Dep.var Bid-ask return Bid-ask return
Coef. t-stat Coef. t-stat
Less price improvement 0.00071 9.30 0.00042 5.57
More price improvement 0.00021 3.04 0.00018 2.43

Panel H. Earnings surprises


Order imbalance Mroibvol Mroibtrd
Dep.var Bid-ask return Bid-ask return
Coef. t-stat Coef. t-stat
Mroib 0.0003 8.16 0.0004 11.98
Mroib* eventday 0.0003 1.47 0.0002 1.31

Panel I. Marketable retail vs. overall order imbalance


Order imbalance Overall Mroib Mroibvol Mroibtrd
Dep.var Bid-ask return Bid-ask return Bid-ask return
Coef. t-stat Coef. t-stat Coef. t-stat
Retail Mroib 0.0011 6.14 0.0006 3.33
Overall Mroib 0.0004 3.32 0.0000 0.10 0.0001 0.51

Panel J. When effective spread is less than one cent


Order imbalance Mroibvol Mroibtrd
Dep.var Bid-ask return Bid-ask return
Coef. t-stat Coef. t-stat
Mroib 0.0008 4.48 0.0004 2.45



Figure 1. Distribution of Trade Size and Subpenny Prices for Marketable Retail Orders
These figures report summary statistics for the marketable retail investor trading we identify. Our
sample period covers January 2010 to December 2015, and our sample firms are all common stocks
listed on U.S. stock exchanges with a share price of at least $1. In Panel A, we compute the trade
size in dollars as the number of shares multiplied by transaction price. For each year, we report the
cross-sectional median, q1 (25th percentile) and q3 (75th percentile). In Panel B, we separate trades
into 12 groups based on subpenny increments: trades at the whole penny, at the half penny, and in
buckets that are 0.1 cent wide. We report the cross-sectional median of the daily number of shares
traded in each group.

Panel A. Marketable retail order trade size in dollars
[Figure omitted: yearly 25th percentile, median, and 75th percentile of trade size in dollars, 2010 to 2015; vertical axis from 0 to 30,000.]

Panel B. Median share volumes for different subpenny groups
[Figure omitted: median daily share volume for each subpenny group; vertical axis from 0 to 30,000; horizontal-axis groups, in fractions of a cent: =0, (0, 0.1], (0.1, 0.2], (0.2, 0.3], (0.3, 0.4], (0.4, 0.5), =0.5, (0.5, 0.6], (0.6, 0.7], (0.7, 0.8], (0.8, 0.9], (0.9, 1).]



Figure 2. Time Series of Marketable Retail Investor Order Imbalances
These figures report time series statistics of our identified marketable retail investor trading activity. Our sample period covers January
2010 to December 2015, and our sample firms are all common stocks listed on U.S. stock exchanges with a share price of at least $1.
We present cross-sectional means, medians, q1 (25th percentile), and q3 (75th percentile) for each day.



Figure 3. Portfolio Return Difference Using Previous Week Marketable Retail Order
Imbalance
These figures plot weekly value-weighted portfolio return differences between quintile 5 and
quintile 1, where stocks are sorted on the previous-week marketable retail order imbalance
calculated using the number of shares traded (mroibvol). Our sample period covers January 2010
to December 2015, and our sample firms are all common stocks listed on U.S. stock exchanges
with a share price of at least $1. The portfolio returns are computed using the end-of-day bid-ask
average price (bidaskret) in Panel A and the CRSP closing price (crspret) in Panel B.

Panel A. Weekly portfolio return difference using end-of-day bid-ask average prices
[Figure omitted: weekly return difference, quintile 5 minus quintile 1; vertical axis from -0.04 to 0.06; horizontal axis from 201001 to 201501.]

Panel B. Weekly portfolio return difference using CRSP closing prices
[Figure omitted: weekly return difference, quintile 5 minus quintile 1; vertical axis from -0.04 to 0.07; horizontal axis from 201001 to 201501.]
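
The quintile spread plotted in Figure 3 can be illustrated with a minimal sketch. It assumes one week of data held in NumPy arrays; the function name `quintile_spread` and the toy inputs are hypothetical and are not taken from the paper.

```python
import numpy as np

def quintile_spread(signal, ret, mktcap):
    """Value-weighted return of the top quintile minus the bottom quintile,
    where stocks are sorted on `signal` (e.g. previous-week mroibvol)."""
    signal = np.asarray(signal, dtype=float)
    ret = np.asarray(ret, dtype=float)
    mktcap = np.asarray(mktcap, dtype=float)
    lo, hi = np.quantile(signal, [0.2, 0.8])   # quintile breakpoints
    bottom, top = signal <= lo, signal >= hi

    def vw(mask):
        # Value-weighted average return of the stocks selected by mask.
        return np.average(ret[mask], weights=mktcap[mask])

    return vw(top) - vw(bottom)

# Toy cross-section (hypothetical): ten stocks with equal market caps.
toy_signal = np.arange(10.0)
toy_ret = 0.01 * toy_signal
toy_spread = quintile_spread(toy_signal, toy_ret, np.ones(10))
```

Repeating this calculation week by week would give a time series comparable in construction to the one in the figure, under the stated assumptions.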



Internet Appendix.
Appendix Figure A1. Marketable Retail Order Flows between 2005 and 2017
These figures report the time series mean of marketable retail investor trading activity from
January 2005 (the start of Reg NMS, which allows subpenny price improvement in its current form)
to December 2017. Our sample firms are all common stocks listed on U.S. stock exchanges with
a share price of at least $1. In Panel A, we separately present the cross-sectional mean for each
stock-day for marketable retail buy trades and marketable retail sell trades. In Panel B, we present
the cross-sectional mean for each stock-day for marketable retail buy volume and marketable retail
sell volume.
Panel A. Number of Trades
[Figure omitted: series "retail buy trade mean" and "retail sell trade mean"; vertical-axis ticks from 50 to 250.]

Panel B. Share Volume
[Figure omitted: series "retail buy volume mean" and "retail sell volume mean"; vertical-axis ticks from 20,000 to 120,000.]



Appendix Figure A2. Marketable Retail Order Imbalance Persistence and Cross-correlations with Returns
These figures report the autocorrelation of marketable retail order imbalances and cross-
autocorrelations with past returns. Our sample period covers January 2010 to December 2015, and
our sample firms are common stocks listed on all U.S. stock exchanges with a share price of at
least $1. We also report subgroup correlations based on market cap (size) measured at the end of
the previous month. Panel A shows the auto-correlations of marketable retail order imbalances
using the number of shares for each firm at a lag of N days (N = 1, 2, 3, 4, 5, 10, 20, 30, 40, 50,
60, 70, and 80), reporting the cross-firm median of autocorrelations. Panel B shows the cross-
autocorrelations of marketable retail order imbalances using number of trades with past returns for
each firm, with the correlation calculated at a lag of N days (N = 1, 2, 3, 4, 5, 10, 20, 30, 40, 50,
60, 70, and 80) and reporting the cross-firm median of these cross-autocorrelations.
Panel A. Auto-correlations of Marketable Retail Order Imbalance Measure Using Number of Shares
[Figure omitted: series "All", "size group1", "size group2", and "size group3"; horizontal axis is the lag in days (1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80); vertical axis from 0.00 to 0.14.]

Panel B. Correlation of Marketable Retail Order Imbalance Measure Using Number of Trades with Past Returns
[Figure omitted: series "All", "size group1", "size group2", and "size group3"; horizontal axis is the lag in days (1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80); vertical axis from -0.020 to 0.030.]
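
The statistic described for Panel A (a per-firm autocorrelation at lag N, then the median across firms) can be sketched as follows. This is an illustrative sketch only: the helper names and the simulated AR(1) panel are hypothetical, and no data from the paper are used.

```python
import numpy as np

def lag_autocorr(x, lag):
    """Sample correlation between a series and its own value `lag` days earlier."""
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[lag:], x[:-lag])[0, 1]

def median_autocorr(panel, lags):
    """panel has shape (firms, days). For each lag, compute every firm's
    autocorrelation and report the cross-firm median."""
    return {n: float(np.median([lag_autocorr(row, n) for row in panel]))
            for n in lags}

# Simulated panel (hypothetical): 50 firms following an AR(1) with rho = 0.5.
rng = np.random.default_rng(1)
firms, days, rho = 50, 1500, 0.5
panel = np.zeros((firms, days))
for t in range(1, days):
    panel[:, t] = rho * panel[:, t - 1] + rng.normal(size=firms)
med = median_autocorr(panel, lags=[1, 2, 5])
```

For the cross-autocorrelations in Panel B, the same pattern would apply with the lagged series replaced by past returns.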



Appendix Table AI. What Affects Marketable Retail Order Imbalances?
This table reports the determinants of our marketable retail order imbalance measures. Our sample
period covers January 2010 to December 2015, and our sample firms are common stocks listed on
all U.S. stock exchanges with a share price of at least $1. We estimate Fama–MacBeth regressions
in all panels. The dependent variables are two scaled marketable retail order imbalance measures:
mroibvol (based on the number of marketable retail shares traded) and mroibtrd (based on the
number of marketable retail trades). As independent variables, we include the previous-week
return, ret(w-1), previous-month return, ret(m-1), and previous 6-month return, ret(m-7,m-2). We compute
weekly returns in two ways: using end-of-day bid–ask average price or CRSP closing price. The
control variables are monthly turnover (lmto), monthly volatility of daily returns (lvol), log market
cap (size), and log book-to-market ratio (lbm), all measured at the end of the previous month. In
Panels A and B, variable Max(ret[t],0) is equal to ret[t] if the return is positive and 0 otherwise;
Min(ret[t],0) is equal to ret[t] if the return is negative and 0 otherwise. In Panel C, firms are sorted
into 3 groups based on previous month-end characteristics. The Rank2 interactive dummy is 1 if
the firm belongs to the medium characteristic group and 0 otherwise. The Rank3 interactive
dummy is 1 if the firm belongs to the large characteristic group and 0 otherwise. In Panel D, the
Monday dummy is equal to 1 if the trading day is Monday and 0 otherwise. The dummy variables
Friday, December, and January are defined similarly. In Panel E, the dependent variables are two
normalized order imbalance measures: mroibvol[m] (the number of monthly marketable retail buy
shares minus the number of monthly marketable retail sell shares divided by the sum of the number
of monthly marketable retail buy shares and the number of monthly marketable retail sell shares),
mroibtrd[m] (the number of monthly marketable retail buy trades minus the number of monthly
marketable retail sell trades divided by the sum of the number of monthly marketable retail buy
trades and the number of monthly marketable retail sell trades). As independent variables, we
include the monthly return of an individual stock if positive and 0 otherwise (Ret+); the monthly
return of an individual stock if negative and 0 otherwise (Ret-); the logarithm of month-end market
value (ASIZE=ln(MV)); the logarithm of one plus the number of months since its listing on an
exchange M (FAGE=log(1+M)); the logarithm of price (ALNP=ln(Prc)); the logarithm of one plus
the number of analysts who follow a company and report forecasts to the I/B/E/S database
(ALANA=log(1+ANA)); the standard deviation of the most recent eight quarterly earnings
(EVOLA); the absolute value of the most recent quarterly earnings minus the earnings four quarters
ago (ESURP); the analyst forecast dispersion (FDISP), which is defined as the standard deviation
of earnings per share forecasts from multiple (two or more) analysts; firm leverage (LEVRG),
which is the book debt divided by total assets, where book debt is current liabilities, long term
debt, and preferred stocks. Beta is the ex-ante rolling beta regressed on the market factor using
three years of daily returns.
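
The normalized monthly imbalance and the positive/negative return split described above can be written out as a short sketch. The function names and the three-stock inputs are hypothetical; the formulas follow the verbal definitions in this caption.

```python
import numpy as np

def scaled_imbalance(buy, sell):
    """(buy - sell) / (buy + sell); NaN where there is no retail activity."""
    buy = np.asarray(buy, dtype=float)
    sell = np.asarray(sell, dtype=float)
    total = buy + sell
    out = np.full(total.shape, np.nan)
    np.divide(buy - sell, total, out=out, where=total > 0)
    return out

def split_return(ret):
    """Return (Max(ret, 0), Min(ret, 0)); the two pieces sum back to ret."""
    ret = np.asarray(ret, dtype=float)
    return np.maximum(ret, 0.0), np.minimum(ret, 0.0)

# Hypothetical monthly buy and sell share counts for three stocks.
mroibvol_m = scaled_imbalance([600, 100, 0], [400, 300, 0])
raw_ret = np.array([0.02, -0.01, 0.0])
ret_pos, ret_neg = split_return(raw_ret)
```

Returning NaN for stock-months with no marketable retail trades is a choice made for this sketch; the paper does not state how such observations are handled.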



Panel A. Momentum or contrarian over daily horizon: past returns
Reg I II III IV
Dep.var Mroibvol Mroibvol Mroibtrd Mroibtrd
Return Bid-ask return CRSP return Bid-ask return CRSP return
Coef. t-stat Coef. t-stat Coef. t-stat Coef. t-stat
Intercept -0.1173 -26.85 -0.1195 -27.27 -0.1322 -31.78 -0.1335 -31.88
Mroib[-1] 0.0854 75.44 0.0854 75.40 0.1357 104.59 0.1357 104.70
Max(ret[-1],0) 0.1975 12.27 0.1919 11.77 0.2701 16.79 0.2563 15.82
Min(ret[-1],0) -0.5474 -21.69 -0.5510 -21.30 -0.4896 -20.27 -0.5061 -20.41
Max(ret[-6,-2],0) -0.0987 -12.81 -0.1015 -13.23 -0.0482 -6.56 -0.0514 -7.00
Min(ret[-6,-2],0) -0.3551 -36.09 -0.3604 -36.64 -0.3229 -32.99 -0.3267 -33.61
Max(ret[-27,-7],0) -0.0546 -10.32 -0.0548 -10.44 -0.0316 -6.07 -0.0307 -5.90
Min(ret[-27,-7],0) -0.1281 -21.84 -0.1309 -22.40 -0.1278 -21.25 -0.1301 -21.75
Ret(m-7,m-2) -0.0147 -13.03 -0.0146 -12.92 -0.0103 -8.27 -0.0102 -8.19
Lmto 0.0000 3.29 0.0000 3.09 0.0000 2.41 0.0000 2.37
Lvol -0.0254 -0.95 -0.0182 -0.69 -0.1266 -4.87 -0.1283 -4.92
Size 0.0048 18.02 0.0050 18.68 0.0065 24.66 0.0066 25.01
Lbm -0.0057 -15.90 -0.0056 -15.79 -0.0058 -17.29 -0.0057 -17.20

Panel B. Momentum or contrarian over weekly horizon: past returns


Reg I II III IV
Dep.var Mroibvol Mroibvol Mroibtrd Mroibtrd
Return Bid-ask return CRSP return Bid-ask return CRSP return
Coef. t-stat Coef. t-stat Coef. t-stat Coef. t-stat
Intercept -0.4355 -21.86 -0.4416 -22.08 -0.4692 -24.00 -0.4731 -24.07
Mroib(w-1) 0.2191 94.13 0.2192 94.19 0.2855 151.68 0.2856 151.75
Max(ret(w-1),0) -0.5250 -17.36 -0.5262 -17.56 -0.4566 -15.05 -0.4582 -15.37
Min(ret(w-1),0) -1.4741 -35.98 -1.5144 -36.98 -1.4696 -34.09 -1.5140 -34.95
Ret(m-1) -0.2711 -19.11 -0.2712 -19.13 -0.2187 -14.63 -0.2184 -14.62
Ret(m-7,m-2) -0.0582 -11.56 -0.0580 -11.53 -0.0376 -6.50 -0.0374 -6.48
Lmto 0.0003 4.92 0.0002 4.81 0.0002 3.49 0.0002 3.40
Lvol 0.5660 5.90 0.5858 6.13 0.1688 1.66 0.1761 1.73
Size 0.0172 13.68 0.0176 14.00 0.0228 18.17 0.0231 18.35
Lbm -0.0267 -17.36 -0.0266 -17.30 -0.0266 -17.78 -0.0264 -17.73

Panel C. Momentum or contrarian for firms with different characteristics


Reg I II III IV
Dep.var Mroibvol Mroibvol Mroibvol Mroibvol
Return Bid-ask return Bid-ask return Bid-ask return Bid-ask return
Rank Size Price Mto Vol
Coef t-stat Coef t-stat Coef t-stat Coef t-stat
Intercept -0.4018 -21.36 -0.3984 -21.20 -0.3929 -20.76 -0.4002 -21.15
Mroib(w-1) 0.2198 99.54 0.2200 99.35 0.2201 99.42 0.2199 99.23
Ret(w-1) -0.9037 -30.01 -0.9104 -35.86 -1.2887 -30.04 -1.5742 -28.87
Ret(w-1)*Rank2 -0.1524 -3.85 -0.1846 -5.19 0.0676 1.46 0.3994 7.63
Ret(w-1)*Rank3 -0.0192 -0.47 -0.0583 -1.48 0.6252 13.25 0.7857 14.12
Ret(m-1) -0.2775 -20.39 -0.2777 -20.33 -0.2764 -20.35 -0.2754 -20.34
Ret(m-7,m-2) -0.0586 -12.07 -0.0589 -12.19 -0.0581 -12.00 -0.0587 -12.13
Lmto 0.0003 5.63 0.0003 5.64 0.0002 4.29 0.0003 5.68
Lvol 0.7981 8.58 0.7937 8.57 0.7847 8.52 0.6572 7.02
Size 0.0154 12.85 0.0152 12.67 0.0149 12.39 0.0157 13.13
Lbm -0.0275 -18.51 -0.0273 -18.45 -0.0273 -18.48 -0.0272 -18.45

Panel D. Seasonality
Reg I II III IV
Dep.var Mroibvol Mroibvol Mroibtrd Mroibtrd
Coef. t-stat Coef. t-stat Coef. t-stat Coef. t-stat
Intercept -0.0383 -138.89 -0.0367 -155.82 -0.0323 -124.24 -0.0303 -136.36
Monday -0.0034 -6.04 -0.0049 -9.19
Friday 0.0027 4.94 0.0039 7.50
December -0.0343 -43.95 -0.0413 -56.10
January 0.0153 19.01 0.0168 22.08

Panel E. Firm fundamentals vs. monthly order imbalance


Reg I II
Dep.var Mroibvol Mroibtrd
Coef. t-stat Coef. t-stat
Intercept -0.0732 -6.60 -0.0875 -7.37
Ret(-) -0.1951 -16.39 -0.1980 -15.41
Ret(+) -0.0102 -1.16 0.0004 0.04
ALNP -0.0011 -0.59 0.0015 0.68
FAGE -0.0054 -5.03 -0.0052 -4.78
BTM -0.0034 -5.42 -0.0044 -7.82
SIZE 0.0022 3.10 0.0047 6.57
ALNAN 0.0082 7.36 0.0033 2.54
Beta 0.0010 0.48 -0.0043 -2.41
ESURP 0.0000 -0.06 0.0000 -0.45
EVOLA 0.0001 0.61 0.0001 0.64
FDISP 0.0009 2.70 0.0007 2.40
LEVRG 0.0357 10.79 0.0203 5.40
Avg adj R2 0.0243 0.0245



Appendix Table AII. Predictability Decomposition with Different Control Variables in the Second Stage
This table reports second-stage decomposition results for the predictive power of marketable retail order flow for the cross-section of
future stock returns, using different control variables in the second stage, as discussed in footnote 14. Our sample period covers January
2010 to December 2015, and our sample firms are all common stocks listed on U.S. stock exchanges with a share price of at least $1.
We estimate two-stage Fama-MacBeth regressions. In the first stage, the order imbalance measures are decomposed into
three components, as specified in equation (8). The second stage decomposes order imbalance’s predictive power, as specified
in equations (9)-(11). Panel A reports an alternative second-stage estimation that departs from equation (11) by including the marketable retail
order imbalance from week w-3, mroib(w-3), to control for past order imbalance, as specified in equation (A1). Panel B reports an
alternative second-stage estimation that departs from equation (11) by including the marketable retail order imbalance from the previous month,
mroib(m-1), to control for past order imbalance, as specified in equation (A2).
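\[
\begin{aligned}
ret(i,w) ={}& e0(w) + e1(w)\,\widehat{mroib}^{\,persistence}_{i,w-1} + e2(w)\,\widehat{mroib}^{\,contrarian}_{i,w-1} + e3(w)\,\widehat{mroib}^{\,other}_{i,w-1} \\
& + e4(w)\,mroib(w-3) + e5(w)\,controls(i,w-1) + u5(i,w).
\end{aligned}
\tag{A1}
\]
\[
\begin{aligned}
ret(i,w) ={}& e0(w) + e1(w)\,\widehat{mroib}^{\,persistence}_{i,w-1} + e2(w)\,\widehat{mroib}^{\,contrarian}_{i,w-1} + e3(w)\,\widehat{mroib}^{\,other}_{i,w-1} \\
& + e4(w)\,mroib(m-1) + e5(w)\,controls(i,w-1) + u5(i,w).
\end{aligned}
\tag{A2}
\]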
The weekly returns are computed in two ways: using end-of-day bid-ask average price or CRSP closing price. The scaled order
imbalance measures are mroibvol (based on the number of marketable retail shares traded) and mroibtrd (based on the number of
marketable retail trades). The variable mroib(w-1, persistence) is estimated in the first stage using past order imbalance and reflects
price pressure. The variable mroib(w-1, contrarian) is estimated in the first stage using past returns over different horizons and is
connected to the liquidity provision hypothesis. The residual part of the previous-week order imbalance from the first-stage estimation
is denoted as “other,” which can be attributed to private information about future returns on the part of these marketable retail investors.
As additional control variables, we include previous-week return, ret(w-1), previous-month return, ret(m-1), and previous 6-month
return, ret(m-7, m-2). The control variables are log book-to-market ratio (lbm), log market cap (size), monthly turnover (lmto), and
monthly volatility of daily returns (lvol), all measured at the end of the previous month. To account for serial correlation in the
coefficients, the standard errors of the time-series are adjusted using Newey-West (1987) with five lags.
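
As an illustration of the estimation approach named above (weekly cross-sectional regressions, then time-series averages with Newey-West standard errors at five lags), here is a minimal NumPy sketch. The function `fama_macbeth`, the Bartlett-kernel implementation, and the simulated panel are assumptions made for this example, not the authors' code.

```python
import numpy as np

def fama_macbeth(y, X, lags=5):
    """y: (T, N) returns; X: (T, N, K) regressors without a constant.
    Returns time-series mean coefficients and Newey-West t-statistics."""
    T, N, K = X.shape
    coefs = np.empty((T, K + 1))
    for t in range(T):
        Z = np.column_stack([np.ones(N), X[t]])          # add an intercept
        coefs[t] = np.linalg.lstsq(Z, y[t], rcond=None)[0]
    mean = coefs.mean(axis=0)
    dev = coefs - mean
    # Long-run variance of each coefficient series with Bartlett weights.
    lrv = (dev ** 2).mean(axis=0)
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1.0)
        lrv += 2.0 * weight * (dev[lag:] * dev[:-lag]).sum(axis=0) / T
    se = np.sqrt(lrv / T)
    return mean, mean / se

# Simulated panel (hypothetical data): 60 weeks, 200 stocks, 2 regressors.
rng = np.random.default_rng(0)
n_weeks, n_stocks = 60, 200
true_beta = np.array([0.5, -0.2])
X_sim = rng.normal(size=(n_weeks, n_stocks, 2))
y_sim = 0.01 + X_sim @ true_beta + 0.1 * rng.normal(size=(n_weeks, n_stocks))
mean_coef, t_stat = fama_macbeth(y_sim, X_sim, lags=5)
```

The sketch uses a balanced panel for simplicity; with real data the set of stocks changes from week to week, so each cross-section would need its own row selection.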



Panel A. Include mroib(w-3) as a control for past order imbalance in the second stage
Reg I II III IV
Order Imbalance Mroibvol(w) Mroibvol(w) Mroibtrd(w) Mroibtrd(w)
Dep.var Bid-ask return CRSP return Bid-ask return CRSP return
Coef. t-stat Coef. t-stat Coef. t-stat Coef. t-stat
Intercept 0.0044 2.08 0.0050 2.35 0.0044 2.05 0.0049 2.32
Mroib(w-1,persistence) 0.0024 7.89 0.0026 8.44 0.0016 6.73 0.0017 7.34
Mroib(w-1, contrarian) -0.0066 -0.60 -0.1559 -1.50 -0.0067 -0.69 0.0356 1.60
Mroib(w-1,other) 0.0008 13.70 0.0009 14.65 0.0006 9.45 0.0007 10.49
Mroib(w-3) 0.0002 4.24 0.0003 4.65 0.0002 2.87 0.0002 3.27
Ret(w-1) -0.0173 -5.31 -0.0201 -6.10 -0.0174 -5.34 -0.0202 -6.13
Ret(m-1) -0.0062 -0.68 -0.0030 -0.33 0.0020 0.65 0.0117 1.17
Ret(m-7,m-2) -0.0010 -0.69 -0.0152 -1.13 0.0019 1.03 -0.0013 -0.52
Lmto 0.0000 -3.49 0.0000 -3.78 0.0000 -3.47 0.0000 -3.75
Lvol -0.0218 -1.39 -0.0224 -1.44 -0.0210 -1.34 -0.0217 -1.39
Size -0.0001 -0.49 -0.0001 -0.53 -0.0001 -0.52 -0.0001 -0.58
Lbm -0.0001 -0.47 0.0000 -0.14 -0.0001 -0.59 -0.0001 -0.25
Adj.R2 4.34% 4.36% 4.33% 4.35%
Interquartile return diff Interquartile return diff Interquartile return diff Interquartile return diff
Mroib(w-1,persistence) 0.2591 0.0628% 0.2593 0.0672% 0.3498 0.0546% 0.3500 0.0597%
Mroib(w-1,contrarian) 0.0627 -0.0415% 0.0631 -0.9834% 0.0614 -0.0409% 0.0619 0.2205%
Mroib(w-1,other) 1.1141 0.0888% 1.1141 0.0950% 1.1326 0.0676% 1.1327 0.0748%
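
The "return diff" entries appear to be the product of each coefficient and the interquartile range of the corresponding component, expressed in percent. The check below assumes that definition and uses the rounded figures printed for Regression I, so the products match the reported values only approximately.

```python
# (coefficient, interquartile range) for Regression I in Panel A, as printed.
rows = {
    "persistence": (0.0024, 0.2591),
    "contrarian": (-0.0066, 0.0627),
    "other": (0.0008, 1.1141),
}

# Assumed definition: return difference (%) = 100 * coefficient * interquartile range.
return_diff_pct = {name: 100.0 * coef * iqr for name, (coef, iqr) in rows.items()}

# Values reported in the table for the same regression, in percent.
reported_pct = {"persistence": 0.0628, "contrarian": -0.0415, "other": 0.0888}
gaps = {name: abs(return_diff_pct[name] - reported_pct[name]) for name in rows}
```

The remaining gaps are consistent with the coefficients being rounded to four decimal places before printing.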



Panel B. Include mroib(m-1) as a control for past order imbalance in the second stage
Reg I II III IV
Order Imbalance Mroibvol(w) Mroibvol(w) Mroibtrd(w) Mroibtrd(w)
Dep.var Bid-ask return CRSP return Bid-ask return CRSP return
Coef. t-stat Coef. t-stat Coef. t-stat Coef. t-stat
Intercept 0.0046 2.21 0.0052 2.50 0.0045 2.19 0.0052 2.48
Mroib(w-1,persistence) 0.0026 8.60 0.0028 9.23 0.0017 7.62 0.0019 8.36
Mroib(w-1, contrarian) -0.0058 -0.52 -0.1499 -1.42 -0.0081 -0.82 0.0318 1.58
Mroib(w-1,other) 0.0008 14.08 0.0009 15.16 0.0006 10.03 0.0007 11.22
Mroib(m-1) 0.0000 1.92 0.0000 2.20 0.0000 1.50 0.0000 1.71
Ret(w-1) -0.0173 -5.32 -0.0203 -6.17 -0.0174 -5.36 -0.0204 -6.22
Ret(m-1) -0.0062 -0.68 0.0005 0.07 0.0015 0.52 0.0092 1.12
Ret(m-7,m-2) -0.0010 -0.69 -0.0152 -1.11 0.0017 0.98 -0.0008 -0.31
Lmto 0.0000 -3.47 0.0000 -3.75 0.0000 -3.46 0.0000 -3.74
Lvol -0.0232 -1.48 -0.0239 -1.54 -0.0227 -1.44 -0.0233 -1.50
Size -0.0001 -0.55 -0.0001 -0.61 -0.0001 -0.59 -0.0001 -0.66
Lbm -0.0001 -0.50 0.0000 -0.17 -0.0001 -0.60 -0.0001 -0.27
Adj.R2 4.28% 4.30% 4.28% 4.29%
Interquartile return diff Interquartile return diff Interquartile return diff Interquartile return diff
Mroib(w-1,persistence) 0.2591 0.0683% 0.2593 0.0731% 0.3498 0.0607% 0.3500 0.0665%
Mroib(w-1,contrarian) 0.0627 -0.0361% 0.0631 -0.9454% 0.0614 -0.0498% 0.0619 0.1965%
Mroib(w-1,other) 1.1141 0.0899% 1.1141 0.0964% 1.1326 0.0697% 1.1327 0.0774%



Appendix Table AIII. Predictability Decomposition with Contemporaneous Return in the First Stage
This table reports estimation results for a decomposition of the predictive power of marketable retail order flow for the cross-section of future
stock returns, using the contemporaneous return in the first stage, as discussed in footnote 14. Our sample period covers January 2010 to
December 2015, and our sample firms are all common stocks listed on U.S. stock exchanges with a share price of at least $1. We estimate
two-stage Fama-MacBeth regressions. In the first stage, the order imbalance measures are decomposed into three components,
as specified in equation (8). The second stage decomposes order imbalance’s predictive power, as specified in equations (9)-(11).
Panel A reports alternative first-stage estimation results that depart from equation (8): the order imbalance measures mroib(w-1)
are decomposed into three components using the contemporaneous return, ret(w-1), as specified in equation (A3). Panel B reports the
second-stage decomposition of order imbalance’s predictive power after the first-stage estimation in Panel A, as specified in equations (A4)
and (A5).
\[
\begin{aligned}
mroib(i,w-1) ={}& d0(w-1) + d1(w-1)\,mroib(i,w-2) + d2(w-1)\,ret(i,w-1) + d3(w-1)\,ret(i,m-1) \\
& + d4(w-1)\,ret(i,m-2,m-7) + u4(i,w-1).
\end{aligned}
\tag{A3}
\]
\[
\begin{aligned}
\widehat{mroib}^{\,persistence}_{i,w-1} &= \widehat{d1}(w-1)\,mroib(i,w-2), \\
\widehat{mroib}^{\,contrarian}_{i,w-1} &= \widehat{d2}(w-1)\,ret(i,w-1) + \widehat{d3}(w-1)\,ret(i,m-1) + \widehat{d4}(w-1)\,ret(i,m-2,m-7), \\
\widehat{mroib}^{\,other}_{i,w-1} &= \widehat{u4}(i,w-1) + \widehat{d0}(w-1).
\end{aligned}
\tag{A4}
\]
\[
\begin{aligned}
ret(i,w) ={}& e0(w) + e1(w)\,\widehat{mroib}^{\,persistence}_{i,w-1} + e2(w)\,\widehat{mroib}^{\,contrarian}_{i,w-1} + e3(w)\,\widehat{mroib}^{\,other}_{i,w-1} \\
& + e4(w)'\,controls(i,w-1) + u5(i,w).
\end{aligned}
\tag{A5}
\]
Panel C reports alternative first-stage estimation results that depart from equation (8): the order imbalance measures mroib(w-1)
are decomposed into four components using contemporaneous positive and negative returns, posret(i,w-1) and negret(i,w-1), as
specified in equation (A6). Here posret(i,w-1) equals ret(i,w-1) if ret(i,w-1) is positive and zero otherwise, and
negret(i,w-1) equals ret(i,w-1) if ret(i,w-1) is negative and zero otherwise. Panel D reports the second-stage decomposition
of order imbalance’s predictive power after the first-stage estimation in Panel C, as specified in equations (A7) and (A8).
\[
\begin{aligned}
mroib(i,w-1) ={}& d0(w-1) + d1(w-1)\,mroib(i,w-2) + d2(w-1)\,posret(i,w-1) + d3(w-1)\,negret(i,w-1) \\
& + d4(w-1)\,ret(i,m-1) + d5(w-1)\,ret(i,m-2,m-7) + u4(i,w-1).
\end{aligned}
\tag{A6}
\]
\[
\begin{aligned}
\widehat{mroib}^{\,persistence}_{i,w-1} &= \widehat{d1}(w-1)\,mroib(i,w-2), \\
\widehat{mroib}^{\,posret}_{i,w-1} &= \widehat{d2}(w-1)\,posret(i,w-1) + 0.5 \times \widehat{d4}(w-1)\,ret(i,m-1) + 0.5 \times \widehat{d5}(w-1)\,ret(i,m-2,m-7), \\
\widehat{mroib}^{\,negret}_{i,w-1} &= \widehat{d3}(w-1)\,negret(i,w-1) + 0.5 \times \widehat{d4}(w-1)\,ret(i,m-1) + 0.5 \times \widehat{d5}(w-1)\,ret(i,m-2,m-7), \\
\widehat{mroib}^{\,other}_{i,w-1} &= \widehat{u4}(i,w-1) + \widehat{d0}(w-1).
\end{aligned}
\tag{A7}
\]
\[
\begin{aligned}
ret(i,w) ={}& e0(w) + e1(w)\,\widehat{mroib}^{\,persistence}_{i,w-1} + e2(w)\,\widehat{mroib}^{\,posret}_{i,w-1} + e3(w)\,\widehat{mroib}^{\,negret}_{i,w-1} + e4(w)\,\widehat{mroib}^{\,other}_{i,w-1} \\
& + e5(w)'\,controls(i,w-1) + u5(i,w).
\end{aligned}
\tag{A8}
\]
The weekly returns are computed in two ways: using end-of-day bid-ask average price or CRSP closing price. The scaled marketable
retail order imbalance measures are mroibvol (based on the number of marketable retail shares traded) and mroibtrd (based on the
number of marketable retail trades). The variable mroib(w-1, persistence) is estimated in the first stage using past order imbalance and
reflects price pressure. The variable mroib(w-1, contrarian) is estimated in the first stage using contemporaneous return and past returns
over different horizons and is connected to the liquidity provision hypothesis. The residual part of the previous-week order imbalance
from the first-stage estimation is denoted as “other,” which can be attributed to private information about future returns on the part of
these marketable retail investors. As additional control variables, we include previous-week return, ret(w-1), previous-month return,
ret(m-1), and previous 6-month return, ret(m-7, m-2). The control variables are log book-to-market ratio (lbm), log market cap (size),
monthly turnover (lmto), and monthly volatility of daily returns (lvol), all measured at the end of the previous month. To account for
serial correlation in the coefficients, the standard errors of the time-series are adjusted using Newey-West (1987) with five lags.
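
The first-stage decomposition into persistence, contrarian, and other components can be sketched for a single weekly cross-section as follows. The function name, argument layout, and simulated inputs are hypothetical; the split mirrors the three-component definitions in this table (fitted lagged-imbalance term, fitted return terms, and residual plus intercept).

```python
import numpy as np

def decompose_imbalance(mroib, lag_mroib, rets):
    """Split one cross-section of order imbalance into three components.
    mroib, lag_mroib: arrays of shape (N,); rets: (N, R) return regressors."""
    n = mroib.shape[0]
    Z = np.column_stack([np.ones(n), lag_mroib, rets])
    d = np.linalg.lstsq(Z, mroib, rcond=None)[0]     # d[0] is the intercept
    persistence = d[1] * lag_mroib                   # fitted lagged-imbalance part
    contrarian = rets @ d[2:]                        # fitted return part
    other = mroib - persistence - contrarian         # residual plus intercept
    return persistence, contrarian, other

# Simulated cross-section (hypothetical data): 500 stocks, 3 return horizons.
rng = np.random.default_rng(2)
n_stocks = 500
lag_sim = rng.normal(size=n_stocks)
rets_sim = 0.05 * rng.normal(size=(n_stocks, 3))
mroib_sim = (-0.1 + 0.22 * lag_sim - 0.9 * rets_sim[:, 0]
             + 0.5 * rng.normal(size=n_stocks))
pers, contr, oth = decompose_imbalance(mroib_sim, lag_sim, rets_sim)
```

By construction the three pieces add up to the original imbalance, and the demeaned "other" component is orthogonal to the first-stage regressors, which is what lets the second stage attribute predictive power to each piece separately.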



Panel A. First stage of projecting order imbalance on persistence, contemporaneous return and past return
Reg I II III IV
Dep.var Mroibvol(w-1) Mroibvol(w-1) Mroibtrd(w-1) Mroibtrd(w-1)
Return Bid-ask return CRSP return Bid-ask return CRSP return
Coef. t-stat Coef. t-stat Coef. t-stat Coef. t-stat
Intercept -0.1468 -26.10 -0.1469 -26.12 -0.1122 -18.52 -0.1124 -18.55
Mroib(w-2) 0.2216 96.52 0.2216 96.53 0.2885 151.74 0.2885 151.77
Ret(w-1) 0.2267 6.51 0.2538 7.34 0.5391 13.90 0.5677 14.71
Ret(m-1) -0.2170 -14.83 -0.2170 -14.84 -0.1722 -11.67 -0.1722 -11.67
Ret(m-7,m-2) -0.0281 -5.38 -0.0281 -5.39 -0.0072 -1.17 -0.0072 -1.16
Adj.R2 5.40% 5.41% 8.88% 8.89%

Panel B. Second stage decomposition of order imbalance’s predictive power, after first stage in Panel A
Reg I II III IV
Order Imbalance Mroibvol(w) Mroibvol(w) Mroibtrd(w) Mroibtrd(w)
Dep.var Bid-ask return CRSP return Bid-ask return CRSP return
Coef. t-stat Coef. t-stat Coef. t-stat Coef. t-stat
Intercept 0.0045 2.19 0.0051 2.47 0.0045 2.15 0.0051 2.44
Mroib(w-1,persistence) 0.0027 8.72 0.0029 9.36 0.0018 7.73 0.0019 8.49
Mroib(w-1,contrarian) 0.0420 0.71 0.0545 0.74 0.2910 1.73 0.1832 0.94
Mroib(w-1,other) 0.0008 13.99 0.0009 14.94 0.0006 10.17 0.0007 11.25
Ret(w-1) -0.0609 -2.36 -0.0883 -2.23 -0.3131 -1.80 -0.2255 -1.27
Ret(m-1) 0.0220 0.72 0.0080 0.35 0.0455 1.12 0.0190 0.82
Ret(m-7,m-2) 0.0006 2.04 0.0006 2.06 0.0006 1.88 0.0006 1.91
Lmto 0.0000 -3.36 0.0000 -3.67 0.0000 -3.36 0.0000 -3.68
Lvol -0.0230 -1.46 -0.0232 -1.49 -0.0225 -1.43 -0.0227 -1.45
Size -0.0001 -0.56 -0.0001 -0.62 -0.0001 -0.59 -0.0001 -0.66
Lbm -0.0001 -0.50 0.0000 -0.18 -0.0001 -0.59 -0.0001 -0.27
Adj.R2 4.04% 4.05% 4.03% 4.04%
Interquartile return diff Interquartile return diff Interquartile return diff Interquartile return diff
Mroib(w-1,persistence) 0.2579 0.0689% 0.2579 0.0739% 0.3473 0.0617% 0.3472 0.0675%
Mroib(w-1,contrarian) 0.0497 0.2087% 0.0499 0.2721% 0.0546 1.5878% 0.0551 1.0096%
Mroib(w-1,other) 1.1152 0.0911% 1.1152 0.0974% 1.1332 0.0714% 1.1332 0.0789%



Panel C. First stage of projecting order imbalance on persistence, contemporaneous positive and negative return and past return
Reg I II III IV
Dep.var Mroibvol(w-1) Mroibvol(w-1) Mroibtrd(w-1) Mroibtrd(w-1)
Return Bid-ask return CRSP return Bid-ask return CRSP return
Coef. t-stat Coef. t-stat Coef. t-stat Coef. t-stat
Intercept -0.1742 -29.49 -0.1737 -29.44 -0.1364 -21.58 -0.1358 -21.51
Mroib(w-2) 0.2203 96.94 0.2203 96.93 0.2875 151.11 0.2876 151.23
Posret(w-1) 0.9453 25.97 0.9506 26.20 1.1696 29.19 1.1721 29.35
Negret(w-1) -0.8228 -15.18 -0.7778 -14.35 -0.3954 -6.78 -0.3375 -5.75
Ret(m-1) -0.2159 -15.08 -0.2159 -15.09 -0.1705 -11.76 -0.1706 -11.76
Ret(m-7,m-2) -0.0268 -5.09 -0.0268 -5.10 -0.0059 -0.93 -0.0058 -0.93
Adj.R2 5.55% 5.55% 9.01% 9.01%

Panel D. Second-stage decomposition of order imbalance’s predictive power, after first stage in Panel C
Reg I II III IV
Order Imbalance Mroibvol(w) Mroibvol(w) Mroibtrd(w) Mroibtrd(w)
Dep.var Bid-ask return CRSP return Bid-ask return CRSP return
Coef. t-stat Coef. t-stat Coef. t-stat Coef. t-stat
Intercept 0.0049 2.43 0.0054 2.69 0.0048 2.39 0.0053 2.66
Mroib(w-1,persistence) 0.0027 8.64 0.0029 9.20 0.0018 7.80 0.0020 8.51
Mroib(w-1,positive) 0.0196 0.08 0.4550 1.12 -0.0102 -0.03 -0.6046 -0.84
Mroib(w-1,negative) 0.0952 0.38 -0.6257 -1.19 -0.3945 -0.88 0.5624 0.77
Mroib(w-1,other) 0.0008 14.56 0.0009 15.53 0.0006 10.82 0.0007 11.93
Ret(w-1) 0.0213 0.07 -0.2911 -0.89 -0.1151 -0.26 0.1337 0.45
Ret(m-1) 0.0061 0.24 -0.0287 -0.88 0.0087 0.42 0.0017 0.15
Ret(m-7,m-2) 0.0005 1.65 0.0006 1.92 0.0005 1.88 0.0005 1.81
Lmto 0.0000 -3.44 0.0000 -3.76 0.0000 -3.44 0.0000 -3.76
Lvol -0.0202 -1.32 -0.0207 -1.37 -0.0199 -1.30 -0.0203 -1.34
Size -0.0001 -0.71 -0.0001 -0.76 -0.0001 -0.75 -0.0001 -0.80
Lbm -0.0001 -0.57 -0.0001 -0.25 -0.0001 -0.67 -0.0001 -0.34
Adj.R2 4.30% 4.32% 4.29% 4.31%
Interquartile return diff Interquartile return diff Interquartile return diff Interquartile return diff
Mroib(w-1,persistence) 0.2563 0.0690% 0.2564 0.0737% 0.3461 0.0622% 0.3461 0.0677%
Mroib(w-1,positive) 0.0357 0.0699% 0.0360 1.6397% 0.0405 -0.0412% 0.0407 -2.4626%
Mroib(w-1,negative) 0.0321 0.3052% 0.0315 -1.9690% 0.0292 -1.1499% 0.0289 1.6224%
Mroib(w-1,other) 1.1136 0.0923% 1.1137 0.0986% 1.1323 0.0730% 1.1322 0.0806%



Appendix Table AIV. Predictability Decomposition with Contemporaneous Components in the Second Stage
This table reports second-stage decomposition results for the predictive power of marketable retail order flow for the cross-section of
future stock returns, using contemporaneous components in the second stage, as discussed in footnote 15. Our sample period covers January
2010 to December 2015, and our sample firms are all common stocks listed on U.S. stock exchanges with a share price of at least $1.
We estimate two-stage Fama-MacBeth regressions. Panel A reports the first-stage estimation, where the order imbalance measures are
decomposed into three components, as specified in equation (A9), which is equation (8’) in footnote 15. Panel B reports the second-stage
decomposition of order imbalance’s predictive power, as specified in equations (A10) and (A12), which correspond to equations (9’)-(11’) in
footnote 15.

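\[
mroib(i,w) = d0(w) + d1(w)\,mroib(i,w-1) + d2(w)'\,ret(i,w-1) + u4(i,w).
\tag{A9}
\]
\[
\begin{aligned}
\widehat{mroib}^{\,persistence}_{i,w} &= \widehat{d1}(w)\,mroib(i,w-1), \\
\widehat{mroib}^{\,contrarian}_{i,w} &= \widehat{d2}(w)'\,ret(i,w-1), \\
\widehat{mroib}^{\,other}_{i,w} &= \widehat{u4}(i,w) + \widehat{d0}(w).
\end{aligned}
\tag{A10}
\]
\[
mroib(i,w) = \widehat{mroib}^{\,persistence}_{i,w} + \widehat{mroib}^{\,contrarian}_{i,w} + \widehat{mroib}^{\,other}_{i,w}.
\tag{A11}
\]
\[
\begin{aligned}
ret(i,w) ={}& e0(w) + e1(w)\,\widehat{mroib}^{\,persistence}_{i,w} + e2(w)\,\widehat{mroib}^{\,contrarian}_{i,w} + e3(w)\,\widehat{mroib}^{\,other}_{i,w} \\
& + e4(w)'\,controls(i,w-1) + u5(i,w).
\end{aligned}
\tag{A12}
\]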
The weekly returns are computed in two ways: using end-of-day bid-ask average price or CRSP closing price. The scaled order
imbalance measures are mroibvol (based on the number of marketable retail shares traded) and mroibtrd (based on the number of
marketable retail trades). The variable mroib(w, persistence) is estimated in the first stage using past order imbalance and reflects price
pressure. The variable mroib(w, contrarian) is estimated in the first stage using past returns over different horizons and is connected to
the liquidity provision hypothesis. The residual part of the previous-week order imbalance from the first-stage estimation is denoted as
“other,” which can be attributed to private information about future returns on the part of these marketable retail investors. As additional
control variables, we include previous-week return, ret(w-1), previous-month return, ret(m-1), and previous 6-month return, ret(m-7, m-
2). The control variables are log book-to-market ratio (lbm), log market cap (size), monthly turnover (lmto), and monthly volatility of
daily returns (lvol), all measured at the end of the previous month. To account for serial correlation in the coefficients, the standard
errors of the time-series are adjusted using Newey-West (1987) with five lags.



Panel A. First stage of projecting order imbalance on persistence and past return, on week w
Reg I II III IV
Dep.var Mroibvol(w) Mroibvol(w) Mroibtrd(w) Mroibtrd(w)
Return Bid-ask return CRSP return Bid-ask return CRSP return
Coef. t-stat Coef. t-stat Coef. t-stat Coef. t-stat
Intercept -0.1413 -24.66 -0.1408 -24.61 -0.1054 -17.23 -0.1049 -17.19
Mroib(w-1) 0.2227 96.20 0.2228 96.20 0.2906 149.82 0.2907 149.85
Ret(w-1) -0.9286 -38.93 -0.9422 -39.80 -0.8926 -34.92 -0.9076 -35.81
Ret(m-1) -0.2029 -13.93 -0.2025 -13.90 -0.1591 -10.72 -0.1588 -10.70
Ret(m-7,m-2) -0.0267 -4.98 -0.0268 -4.99 -0.0054 -0.86 -0.0055 -0.88
Adj.R2 5.62% 5.63% 8.99% 9.00%

Panel B. Second stage decomposition of order imbalance’s predictive power, on week w


Reg I II III IV
Order Imbalance Mroibvol(w) Mroibvol(w) Mroibtrd(w) Mroibtrd(w)
Dep.var Bid-ask return CRSP return Bid-ask return CRSP return
Coef. t-stat Coef. t-stat Coef. t-stat Coef. t-stat
Intercept 0.0063 3.00 0.0069 3.26 0.0069 3.29 0.0075 3.55
Mroib(w, persistence) 0.0045 14.26 0.0047 15.20 0.0027 11.61 0.0030 12.73
Mroib(w, contrarian) -0.0812 -0.90 0.2662 0.65 0.3711 0.88 0.0662 1.10
Mroib(w, other) 0.0006 5.07 0.0006 5.80 0.0017 13.69 0.0018 14.47
Ret(w-1) -0.0512 -0.66 0.1940 0.60 0.1537 0.43 0.0332 0.58
Ret(m-1) -0.0214 -0.58 0.2237 0.92 -0.0403 -0.60 0.0257 1.41
Ret(m-7,m-2) 0.0000 -0.53 0.0000 1.00 0.0000 0.90 0.0000 1.04
Lmto 0.0000 -3.69 0.0000 -4.07 0.0000 -3.72 0.0000 -4.10
Lvol -0.0226 -1.41 -0.0214 -1.34 -0.0221 -1.37 -0.0208 -1.31
Size -0.0002 -1.42 -0.0002 -1.46 -0.0002 -1.72 -0.0002 -1.78
Lbm 0.0000 -0.16 0.0000 0.18 0.0000 -0.04 0.0001 0.29
Adj.R2 4.01% 4.03% 4.11% 4.14%
Interquartile return diff Interquartile return diff Interquartile return diff Interquartile return diff
Mroib(w,persistence) 0.2591 0.1161% 0.2593 0.1229% 0.3498 0.0955% 0.3500 0.1041%
Mroib(w,contrarian) 0.0627 -0.5090% 0.0631 1.6793% 0.0614 2.2796% 0.0619 0.4092%
Mroib(w,other) 1.1141 0.0623% 1.1141 0.0711% 1.1326 0.1919% 1.1327 0.2015%
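
The last three rows of Panel B appear to pair each component's interquartile range with the implied return difference, that is, the second-stage coefficient multiplied by the interquartile range. The sketch below redoes that arithmetic for regression I using the rounded numbers printed in the table; the helper name is hypothetical, and small gaps relative to the reported percentages presumably reflect rounding of the displayed coefficients.

```python
def interquartile_return_diff(coef, iqr):
    """Return difference implied by moving across the interquartile range."""
    return coef * iqr


# (second-stage coefficient, interquartile range) from Panel B, regression I.
rows = {
    "persistence": (0.0045, 0.2591),
    "contrarian": (-0.0812, 0.0627),
    "other": (0.0006, 1.1141),
}

for name, (coef, iqr) in rows.items():
    diff = interquartile_return_diff(coef, iqr)
    print(f"Mroib(w, {name}): {100 * diff:.4f}% per week")
```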



Appendix Table AV. Marketable Retail Order Imbalance and Contemporaneous Returns, Replicating KST Table III using
mroibtrd
This table presents analysis of market-adjusted returns around net buying and selling activity as given by the scaled marketable retail
order imbalance measure mroibtrd (based on the number of trades). For each non-overlapping week in the sample period (which extends
from Jan 2010 to Dec 2015), we aggregate the daily order imbalance measures to form weekly Mroib deciles. Each stock is put into 1
of 10 deciles according to the Mroib value in the current week. Decile 1 contains the stocks with the most net selling (negative Mroib)
while decile 10 contains the stocks with the most net buying (positive Mroib). We present the results for four portfolios: (i) decile 1, (ii)
deciles 1 and 2, (iii) deciles 9 and 10, and (iv) decile 10. Let k be the number of days prior to or following the portfolio formation each
week. In Panel A, we calculate eight cumulative return numbers for each of the stocks in a portfolio: 𝐶𝑅(𝑡 − 𝑘, 𝑡 − 1), where 𝑘 ∈
{20,15,10,5} days and t is the first day of the formation week, and 𝐶𝑅(𝑡 + 1, 𝑡 + 𝑘), where 𝑘 ∈ {5,10,15,20} days and t is the last day
of the formation week. The return on each portfolio is then adjusted by subtracting the return on a market proxy (the equal-weighted
portfolio of all stocks in the sample). We present the time-series mean and t-statistic for each market-adjusted cumulative return measure
and for the market-adjusted return during the intense trading week (k=0). In Panel B, we present the time-series mean and t-statistic for
weekly market-adjusted returns in the 4 weeks around the formation week (i.e., CR(t − k, t − k + 4), where 𝑘 ∈ {20,15,10,5} days and
t is the first day of the formation week, and CR(t + k − 4, t + k), where 𝑘 ∈ {5,10,15,20} and t is the last day of the formation week).
** indicates significance at 1% level and * indicates significance at 5% level (both against a two-sided alternative). The t-statistic is
computed using Newey-West standard errors.
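
The portfolio construction described above can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes cumulative returns are compounded (the caption does not say whether returns are compounded or summed), breaks ties by sort order, and uses hypothetical names and data throughout.

```python
def assign_deciles(mroib_by_stock):
    """Rank stocks by weekly Mroib: decile 1 = most net selling, 10 = most net buying."""
    ranked = sorted(mroib_by_stock, key=mroib_by_stock.get)
    n = len(ranked)
    return {stock: (i * 10) // n + 1 for i, stock in enumerate(ranked)}


def cumulative_return(daily_returns):
    """Compounded return over a window of daily returns (an assumption)."""
    total = 1.0
    for r in daily_returns:
        total *= 1.0 + r
    return total - 1.0


def market_adjusted(portfolio_stocks, returns_by_stock):
    """Equal-weighted portfolio return minus the equal-weighted return of all stocks."""
    def mean_cr(stocks):
        return sum(cumulative_return(returns_by_stock[s]) for s in stocks) / len(stocks)

    return mean_cr(portfolio_stocks) - mean_cr(list(returns_by_stock))


# Hypothetical data: 20 stocks with increasing Mroib and a 5-day return window.
mroib = {f"s{i}": (i - 10) / 10 for i in range(20)}
returns = {f"s{i}": [0.001 * (i - 10)] * 5 for i in range(20)}

deciles = assign_deciles(mroib)
intense_buying = [s for s, d in deciles.items() if d == 10]
print(intense_buying, market_adjusted(intense_buying, returns))
```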

Panel A. Cumulative market adjusted return


Mroibtrd Intense Selling Selling Buying Intense Buying
Bid-ask return (decile 1) (decile 1&2) (decile 9&10) (decile10)
Mean t-stat Mean t-stat Mean t-stat Mean t-stat
k=-20 0.0051** 4.29 0.0047** 6.58 -0.0101** -18.94 -0.0131** -14.20
k=-15 0.0047** 4.76 0.0044** 7.54 -0.0088** -19.68 -0.0111** -14.65
k=-10 0.0035** 4.72 0.0033** 7.55 -0.0067** -18.91 -0.0086** -14.30
k=-5 0.0022** 5.17 0.0022** 8.77 -0.0041** -19.93 -0.0051** -15.34
k=0 -0.0035** -8.02 -0.0030** -9.66 0.0025** 8.19 0.0024** 5.69
k=5 -0.0016** -3.92 -0.0010** -4.84 0.0015** 8.65 0.0019** 5.95
k=10 -0.0025** -3.46 -0.0016** -4.16 0.0023** 7.96 0.0029** 5.31
k=15 -0.0026** -2.68 -0.0018** -3.48 0.0029** 7.64 0.0038** 4.86
k=20 -0.0031** -2.67 -0.0020** -3.16 0.0033** 7.37 0.0046** 4.92



Panel B. Weekly market adjusted return
Mroibtrd Intense Selling Selling Buying Intense Buying
Bid-ask return (decile 1) (decile 1&2) (decile 9&10) (decile10)
Mean t-stat Mean t-stat Mean t-stat Mean t-stat
k=-20 0.0005 1.36 0.0004 1.81 -0.0014** -8.03 -0.0020** -6.52
k=-15 0.0013** 3.39 0.0012** 5.17 -0.0022** -12.83 -0.0026** -8.88
k=-10 0.0013** 3.36 0.0011** 4.66 -0.0027** -12.34 -0.0035** -9.72
k=-5 0.0022** 5.17 0.0022** 8.77 -0.0041** -19.93 -0.0051** -15.34
k=0 -0.0035** -8.02 -0.0030** -9.66 0.0025** 8.19 0.0024** 5.69
k=5 -0.0016** -3.92 -0.0010** -4.84 0.0015** 8.65 0.0019** 5.95
k=10 -0.0008* -2.08 -0.0005* -2.32 0.0007** 4.02 0.0010** 3.17
k=15 -0.0002 -0.48 -0.0004 -1.77 0.0006** 3.22 0.0009* 2.39
k=20 -0.0005 -1.37 -0.0002 -0.95 0.0005** 2.93 0.0007* 2.17



Appendix Table AVI. Marketable Retail Order Imbalance and Future News Sentiment
This table reports estimation results on whether our marketable retail investor order imbalances
can predict one-week-ahead sentiment for each news release category. Our sample period covers
January 2010 to December 2014, and our sample firms are all common stocks listed on U.S. stock
exchanges with a share price of at least $1. We estimate Fama-MacBeth regressions for each month
as in the following equation using only firm-day observations that have public news for a given
subtopic:
𝑠𝑒𝑛𝑡(𝑖, 𝑤) = 𝑎0 + 𝑎1 × 𝑚𝑟𝑜𝑖𝑏(𝑖, 𝑤 − 1) + 𝑎2′𝑐𝑜𝑛𝑡𝑟𝑜𝑙𝑠(𝑖, 𝑤 − 1) + 𝑢1(𝑖, 𝑤). (A13)
The dependent variable is the weekly net sentiment score for a given subtopic, sent(i,w). The
independent variables are the scaled marketable retail order imbalance measure mroibvol (based
on the number of shares traded) and mroib(w-1,other) from the first stage decomposition in Table
VII. As control variables, we include the previous-week return, ret(w-1), previous-month return,
ret(m-1) and previous 6-month return, ret(m-7,m-2), log book-to-market ratio (lbm), log market
cap (size), monthly turnover (lmto), and monthly volatility of daily returns (lvol). Coefficients on
controls are not reported for brevity. To account for serial correlation in the coefficients, the
standard errors of the time-series are adjusted using Newey–West (1987) with five lags. We report
the coefficient on the marketable retail order imbalance and its t-statistic for each topic. The
bottom of the table reports the p-values of a Wald test that all coefficients are jointly equal to
zero and of a t-test that the average of all coefficients equals zero.
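
As an illustration of the inference step, the sketch below averages a time series of per-period coefficient estimates, as in a Fama-MacBeth procedure, and computes a Newey-West t-statistic with five lags. It assumes the standard Bartlett kernel with weights 1 - j/(lags + 1); the function name and the coefficient series are hypothetical.

```python
import math


def newey_west_tstat(series, lags=5):
    """Mean of a coefficient series and its Newey-West (Bartlett kernel) t-statistic."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    # Long-run variance: lag-0 variance plus weighted autocovariances.
    lrv = sum(d * d for d in dev) / n
    for j in range(1, lags + 1):
        gamma_j = sum(dev[t] * dev[t - j] for t in range(j, n)) / n
        lrv += 2.0 * (1.0 - j / (lags + 1)) * gamma_j
    se = math.sqrt(lrv / n)
    return mean, mean / se


# Hypothetical per-period estimates of a1, for illustration only.
a1_by_period = [-0.004, -0.002, -0.006, 0.001, -0.003, -0.005, -0.001, -0.004,
                0.002, -0.003, -0.006, -0.002]
mean_a1, t_stat = newey_west_tstat(a1_by_period, lags=5)
print(f"mean a1 = {mean_a1:.4f}, Newey-West t = {t_stat:.2f}")
```

With `lags=0` the function reduces to the ordinary t-statistic of the mean based on the population variance.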

mroibvol mroibother
Topic Type Description N a1 t-stat a1 t-stat
FUND cross market investment funds 93,142 -0.0039 -1.19 -0.0003 -0.42
REGS cross market regulatory issues 61,513 -0.0044 -1.63 -0.0008 -1.20
MNGISS cross market management issues 54,170 -0.0022 -0.96 -0.0002 -0.31
NEWS cross market top stories 36,265 -0.0019 -0.30 -0.0016 -1.08
EXCA cross market exchange activities 20,533 -0.0114 -1.39 -0.0037 -1.56
DRV cross market derivatives 18,061 0.0032 0.45 0.0002 0.12
ISU cross market new issues 17,061 0.0073 0.96 0.0011 0.60
INV cross market Investing 13,916 0.0090 1.20 0.0015 0.82
PRESS cross market press digests 9,783 0.0029 0.22 0.0005 0.19
TRD cross market international trade 4,782 0.0192 1.65 0.0055 1.66
BKRT cross market bankruptcy 4,700 -0.0090 -0.75 -0.0002 -0.08
RTM cross market Retirement 3,075 -0.0187 -1.43 -0.0016 -0.46
HEA general news health/medicines 52,110 -0.0099 -3.13 -0.0020 -2.51
POL general news domestic politics 49,699 -0.0129 -4.45 -0.0029 -4.26
JUDIC general news judicial 28,280 0.0056 1.46 0.0018 1.62
LAW general news legislation 17,135 -0.0016 -0.28 0.0003 0.22
ENV general news environment/nature 16,189 -0.0178 -3.87 -0.0044 -3.64
SCI general news science/technology 11,035 -0.0175 -1.87 -0.0051 -2.32
CRIM general news crime 9,866 0.0230 2.27 0.0036 1.76
SECUR general news security 7,528 0.0121 1.44 0.0021 0.89
DIP general news diplomacy 6,057 0.0167 1.81 0.0024 0.91

WEA general news weather 3,860 0.0195 0.88 0.0042 0.58
DIS general news disasters/accidents 3,836 -0.0123 -1.31 -0.0055 -1.85
VOTE general news elections 1,899 -0.0304 -1.16 -0.0081 -1.05
VIO general news civil unrest 1,339 0.0187 0.59 0.0070 0.91
WAR general news war/insurgencies 1,053 0.0134 0.32 0.0088 0.58
EMRG economy emerging markets 88,990 -0.0069 -3.51 -0.0014 -2.61
WASH economy US government news 32,835 -0.0027 -0.48 -0.0006 -0.44
JOB economy labor/employment 28,670 -0.0041 -1.24 -0.0005 -0.47
MCE economy macroeconomics 18,148 0.0042 0.62 0.0006 0.34
ECI economy economic indicators 11,982 0.0077 0.77 0.0017 0.88
CEN economy central banks 9,462 -0.0093 -0.80 -0.0043 -1.23
TAX economy tax 4,871 -0.0405 -3.63 -0.0096 -3.77
FED economy Federal Reserve Board 3,632 -0.0378 -2.85 -0.0131 -4.29
PLCY economy policymakers speak 3,263 -0.0187 -1.04 -0.0042 -0.89
INT economy interest rates 2,360 0.0004 0.02 0.0038 0.59
RES equities corporate results 176,699 -0.0017 -1.83 0.0000 0.14
RCH equities broker research 141,765 -0.0137 -6.35 -0.0025 -4.92
RESF equities results forecast 102,515 0.0013 1.19 0.0001 0.48
STX equities stock markets 101,829 -0.0071 -2.31 -0.0019 -2.91
MRG equities ownership changes 69,775 -0.0073 -3.53 -0.0016 -3.28
HOT equities hot stocks 66,185 -0.0073 -2.17 -0.0017 -2.14
PVE equities private equity 25,740 -0.0167 -3.29 -0.0033 -2.64
DIV equities dividend 24,282 0.0021 0.76 0.0002 0.20
IPO equities initial public offer 13,913 0.0113 1.94 0.0027 1.90
DBT money/debt debt markets 73,600 0.0012 0.45 0.0000 0.04
USC money/debt US corporate bonds 26,796 -0.0068 -2.19 -0.0020 -2.50
AAA money/debt debt rating news 23,405 0.0072 1.45 0.0015 1.25
LOA money/debt loans 17,718 -0.0053 -1.40 -0.0022 -1.86
HYD money/debt high-yield debt/junk 10,222 -0.0021 -0.38 -0.0011 -0.77
GVD money/debt government debt 8,759 0.0182 1.44 0.0027 0.86
MUNI money/debt muni news 7,933 0.0074 0.83 0.0025 0.88
MTG money/debt mortgage-backed debt 7,764 0.0063 0.67 0.0004 0.13
FRX money/debt forex 7,006 0.0211 1.86 0.0037 1.21
IGD money/debt investment grade debt 6,760 0.0110 0.95 0.0023 0.74
ABS money/debt asset-backed debt 2,982 0.0120 0.53 0.0028 0.54
TNC money/debt bond terms & conditions 1,826 0.0619 1.37 0.0117 1.04
MMT money/debt money markets 1,574 -0.0390 -1.25 -0.0066 -0.81
P-value of Wald Test for all coefficients jointly equal to zero 0.10 1.00
P-value of t-test for average of all coefficients equal to zero 0.71 0.64



Appendix Table AVII. Predicting Returns k-days Ahead Using Odd Lot Marketable Retail Order Imbalances
This table reports estimation results on whether retail investors’ odd lot trading activity can predict the cross-section of k-days ahead
returns. Our sample period covers January 2014 to December 2015, and our sample firms are all common stocks listed on U.S. stock
exchanges with a share price of at least $1. We estimate Fama-MacBeth regressions as specified in equation (A14)
𝑟𝑒𝑡(𝑖, 𝑑 + 𝑘) = 𝑏0(𝑑) + 𝑏1(𝑑)𝑂𝑑𝑑𝑚𝑟𝑜𝑖𝑏(𝑖, 𝑑) + 𝑏2(𝑑)′𝑐𝑜𝑛𝑡𝑟𝑜𝑙𝑠(𝑖, 𝑑) + 𝑢2(𝑖, 𝑑 + 𝑘) (A14)
The dependent variable is daily individual stock return k-days ahead, computed in two ways: using the end-of-day bid-ask average price
or CRSP closing price. The independent variables are two scaled daily odd lot marketable retail order imbalance measures, oddmroibvol
(based on the number of shares traded) or oddmroibtrd (based on the number of trades), respectively. The control variables are the one-day
return, 𝑟𝑒𝑡(𝑖, 𝑑), the previous-week return, 𝑟𝑒𝑡(𝑖, 𝑑 − 1, 𝑑 − 5), the previous-month return, 𝑟𝑒𝑡(𝑖, 𝑑 − 6, 𝑑 − 26), log book-to-market ratio
(lbm), log market cap (size), monthly turnover (lmto), and monthly volatility of daily returns (lvol) measured at the end of the previous
month. To account for serial correlation in the coefficients, the standard errors of the time-series are adjusted using Newey-West
(1987) with five lags.
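
The timing in equation (A14) can be sketched as below: the order imbalance on day d is paired with the return on day d + k, and one cross-sectional slope is estimated per day. This is a deliberately simplified, univariate version that omits the control variables; the data layout (`oddmroib[d][i]`, `ret[d][i]`) and all names are hypothetical.

```python
def cross_sectional_slope(x, y):
    """OLS slope of y on x (with intercept) for one cross-section."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def k_day_ahead_slopes(oddmroib, ret, k):
    """Slope b1(d) for every day d whose k-day-ahead return is in the sample."""
    return [
        cross_sectional_slope(oddmroib[d], ret[d + k])
        for d in range(len(ret) - k)
    ]


# Hypothetical panel: 3 days, 3 stocks; next-day returns equal 0.5 x imbalance.
oddmroib = [[-1.0, 0.0, 1.0], [1.0, -1.0, 0.0], [0.0, 1.0, -1.0]]
ret = [[0.0, 0.0, 0.0], [-0.5, 0.0, 0.5], [0.5, -0.5, 0.0]]

slopes = k_day_ahead_slopes(oddmroib, ret, k=1)
print(slopes, sum(slopes) / len(slopes))
```

The reported coefficient for each horizon would then be the time-series average of these daily slopes.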

Order imbalance Oddmroibvol Oddmroibvol Oddmroibtrd Oddmroibtrd
Dep.var Bid-ask return CRSP return Bid-ask return CRSP return
# of days ahead Coef. t-stat Coef. t-stat Coef. t-stat Coef. t-stat
1 day 0.0003 6.13 0.0003 5.69 0.0003 5.28 0.0003 4.89
2 day 0.0000 0.49 0.0000 0.74 0.0000 -0.16 0.0000 0.09
3 day 0.0001 1.40 0.0001 1.38 0.0000 0.39 0.0000 0.35
4 day 0.0000 0.35 0.0000 0.54 0.0000 0.12 0.0000 0.20
5 day 0.0001 2.11 0.0001 2.25 0.0001 1.93 0.0001 2.15
6 day 0.0000 -0.90 0.0000 -0.66 0.0000 0.08 0.0000 0.43
7 day 0.0001 1.00 0.0001 1.35 0.0001 0.99 0.0001 1.19
8 day 0.0000 -0.41 0.0000 -0.56 0.0000 -0.30 0.0000 -0.21
9 day 0.0001 1.20 0.0001 1.10 0.0000 0.49 0.0000 0.35
10 day 0.0001 1.10 0.0001 1.32 0.0000 0.72 0.0001 0.89



Appendix Table AVIII. Replicating Kelley and Tetlock (2013) Table IV
This table presents results from daily logistic regressions of earnings forecast errors on scaled marketable retail imbalances (Mroibvol[0])
and control variables as specified in equation (A15):
PosFE[x, y] = c0 + 𝑐1 𝑀𝑟𝑜𝑖𝑏[0] + 𝑐2 𝐿𝑎𝑔𝑁𝑒𝑔′ + 𝑐4 𝐿𝑎𝑔𝑅𝑒𝑡 ′ + 𝑐5 𝐹𝑖𝑟𝑚𝐶ℎ𝑎𝑟𝑠 ′ + 𝑒1. (A15)
The dependent variable PosFE[x,y] is one if the analyst forecast error for quarterly earnings announcements occurring from day t+x
through day t+y is positive and zero if the forecast error is negative. The forecast error is the difference between actual earnings-per-
share and the median analyst forecast from I/B/E/S. At least 50 earnings announcements with corresponding forecast data during the
window of the dependent variable are required for each daily logistic regression. Average coefficients and Newey and West (1987) t-
statistics are reported with lags equal to twice the horizons of the dependent variable. We use the negative probability in TRNA as Neg,
while Kelley and Tetlock (2013) use negative news from Dow Jones archives.
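
The construction of the dependent variable can be sketched as follows. The caption defines PosFE only for positive and negative forecast errors, so returning `None` for a forecast error of exactly zero is an assumption made here, as are the function name and the sample numbers.

```python
from statistics import median


def pos_fe(actual_eps, analyst_forecasts):
    """1 if actual EPS beats the median analyst forecast, 0 if it falls short."""
    forecast_error = actual_eps - median(analyst_forecasts)
    if forecast_error > 0:
        return 1
    if forecast_error < 0:
        return 0
    # Zero forecast error: not covered by the caption, so treated as undefined here.
    return None


# Hypothetical announcements, for illustration only.
print(pos_fe(1.10, [1.00, 1.05, 1.20]))  # actual EPS above the median forecast
print(pos_fe(0.90, [1.00, 1.10]))        # actual EPS below the median forecast
```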

Kelley and Tetlock (2013) Table IV Our paper
Reg I II I II III IV
Dep.var PosFE[1,5] PosFE[6,20] PosFE[1] PosFE[1,3] PosFE[1,5] PosFE[6,20]
Order Imbalance Imb mkt Imb mkt Mroibvol Mroibvol Mroibvol Mroibvol
Ret CRSP Return CRSP Return CRSP Return CRSP Return CRSP Return CRSP Return
Coef. t-stat Coef. t-stat Coef. t-stat Coef. t-stat Coef. t-stat Coef. t-stat
Intercept -1.614 -7.58 -1.664 -7.43 -1.334 -5.34 -3.197 -2.92 -2.783 -9.37 -2.758 -9.78
Mroib[0] 0.126 5.73 0.047 2.61 0.121 2.81 0.043 1.39 0.035 1.30 0.007 0.38
Neg[0] 0.050 0.68 -0.015 -0.27 3.357 1.60 1.611 1.03 2.812 2.35 0.288 1.62
Neg[-5,-1] -0.014 -1.00 -0.029 -2.23 0.249 0.49 0.054 0.18 0.043 0.12 -0.300 -1.85
Neg[-26,-6] -0.093 -6.20 -0.110 -5.50 -0.599 -0.97 -1.465 -2.46 -0.651 -0.94 -1.398 -2.94
Ret[0] 0.033 6.60 0.028 9.33 5.608 3.92 5.785 6.20 3.017 5.19 1.951 4.88
Ret[-5,-1] 0.031 7.75 0.025 8.33 3.330 5.34 2.827 2.24 3.243 7.18 1.661 5.33
Ret[-26,-6] 0.024 8.00 0.014 7.00 2.166 6.14 2.945 2.64 1.799 7.51 1.227 5.82
Size 0.193 12.87 0.191 13.64 0.153 8.62 0.275 3.92 0.254 11.16 0.249 11.61
Lbm -0.384 -5.05 -0.367 -4.59 -0.189 -3.68 0.082 0.33 -0.129 -2.74 -0.090 -2.36
Days 673 1193 289 566 745 1210
Average R2 9.99% 8.29% 13.16% 11.06% 11.52% 8.98%
Average N 265 482 125 229 302 585
