A Simple Estimation of Bid-Ask Spreads from Daily Close, High, and Low Prices
Angelo Ranaldo
University of St. Gallen
We propose a new method to estimate the bid-ask spread when quote data are not available.
Received July 17, 2016; editorial decision May 23, 2017 by Editor Andrew Karolyi.
This paper provides a new method to accurately estimate the bid-ask spread
based on readily available daily close, high, and low prices. Akin to the seminal
model proposed by Roll (1984), the rationale of our estimator is the departure of
the security price from its efficient value because of transaction costs. However,
our estimator improves the Roll measure in two important respects: First,
our method exploits a wider information set, namely, close, high, and low
prices, which are readily available, rather than only close prices like in the Roll
measure. Second, our estimator is completely independent of trade direction
dynamics, unlike in the Roll measure, which relies on the occurrence of bid-ask
bounces, and, consequently, relies on the assumption of serially independent
trade directions that are equally likely.
We thank the editor Andrew Karolyi, an anonymous referee, Yakov Amihud, Allaudeen Hameed, Joel Hasbrouck,
Robert Korajczyk, Asani Sarkar, Avanidhar Subrahmanyam, Paul Söderlind, and Jan Wrampelmeyer, as well
as the participants of the 2017 AFA meetings in Chicago, 2013 CFE conference in London, and 2016 SFI
research days in Gerzensee for comments and suggestions. All remaining errors are our own. We acknowledge
financial support from the Swiss National Science Foundation (SNSF; grants 159418 and 154445). Parts of this
paper were written while Abdi visited the Stern School of Business, New York University, whose hospitality is
gratefully acknowledged. Supplementary data can be found on The Review of Financial Studies web site. Send
correspondence to Angelo Ranaldo, Swiss Institute of Banking and Finance, University of St. Gallen, Unterer
Graben 21, 9000 St. Gallen, Switzerland; telephone: +41712247010. E-mail: [email protected].
© The Author 2017. Published by Oxford University Press on behalf of The Society for Financial Studies.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial
License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution,
and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please
[email protected].
doi:10.1093/rfs/hhx084 Advance Access publication August 26, 2017
1 For example, end-of-day bid and ask quotes are missing in the CRSP data set from 1942 to 1992.
2 The key advantages of using daily data, including large computational time savings, are comprehensively
discussed by Holden, Jacobsen, and Subrahmanyam (2014).
3 The use of end-of-period quotes, at frequencies lower than daily, goes back to Stoll and Whaley (1983).
4 Rather than approximating and estimating transaction costs, an alternative approach to measuring illiquidity is
to use proxies for the price impact, in particular the Amihud (2002) illiquidity measure.
and close prices allows our model to benefit from the richest readily available
information set of price data.5 Second, unlike Roll (1984), our measure does
not rely on bid-ask bounces and, therefore, is independent of trade direction
time-series dynamics of close prices. Third, unlike Corwin and Schultz’s (2012)
HL estimator, our model neither needs to violate Jensen’s inequality in order
to construct the closed-form estimator nor does it need ad hoc adjustments
for nontrading periods, such as weekends, holidays, and overnight closings.
Finally, our estimates using the mid-range and close price are only marginally
sensitive to the number of trades per day, whereas the high-low estimator
proposed by Corwin and Schultz (2012) further underestimates effective costs
when the daily number of trades is lower, that is, when stocks (and markets)
are less liquid.
5 Unlike the availability of close, high, and low prices, the availability of open prices is subject to additional
limitations. For example, open prices are missing in the CRSP data between July 1962 and June 1992.
6 NYSE (NASDAQ) decimalization started for a few of the listed stocks in August 2000 (March 2001), followed
by wider implementation in the next months and completion in January 2001 (April 2001).
to the TAQ effective spreads, whereas our estimates show the highest average
cross-sectional correlations and the lowest estimation errors.
Second, our estimator provides the most accurate estimates in the absence
of quote data, making it the best choice for applications that rely on longer
time horizons, going back beyond 1993. Compared with other bid-ask spread
estimators that do not rely on quote data (the HL, Roll, Gibbs, EffTick, and FHT
measures), it provides the highest cross-sectional correlation with the intraday
effective spread. On a monthly basis, the average cross-sectional correlation of
our estimates with the Daily TAQ effective spreads is 0.74, whereas the other
estimators range from 0.37 to 0.65. The analysis of Monthly TAQ data from
1993 to 2003 delivers consistent results, that is, our estimates have the highest
average cross-sectional correlation of 0.86, whereas those of other estimators
1. The Estimator
We first explain our model in theory and then provide details for its best use
in practice.
1.1 Model
Our model relies on assumptions similar to those made in the Roll (1984) model.
We assume that the efficient price follows a geometric Brownian motion (GBM)
and the observed price at each time point can be either buyer initiated or seller
initiated. To keep the notation concise, we directly implement the model in log-prices,
and the superscript e refers to efficient prices. Equation (1) shows how
the observed market price and efficient price at the closing time are related. The
random variable $c_t$ represents the observable close log-price, and the random
variable $c_t^e$ represents the efficient log-price at the closing time. The random
variable $q_t$ is the trade direction indicator, and s is the relative spread, which
we aim to estimate. In line with Roll (1984), we assume that trade directions
Definition 1. We define the mid-range as the average of daily high and low
log-prices:
$$\eta_t \equiv \frac{l_t + h_t}{2}. \qquad (4)$$
One can replace the efficient high and low log-prices with the observed values
since the spreads cancel out.
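As a minimal sketch of Definition 1, the mid-range can be computed from a day's high and low prices as follows. The function name and the example prices are hypothetical and serve only as illustration.

```python
import math


def mid_range(high, low):
    """Mid-range eta_t: average of the daily high and low log-prices (Equation (4))."""
    if high <= 0 or low <= 0:
        raise ValueError("prices must be positive")
    return (math.log(high) + math.log(low)) / 2.0


# Hypothetical daily high and low prices, for illustration only.
eta = mid_range(high=101.0, low=99.0)
print(round(eta, 6))
```

Because the average is taken in logs, the exponential of the mid-range is the geometric mean of the high and low prices.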
7 Using Daily TAQ data between October 2003 and December 2015 and an algorithm similar to Lee and Ready
(1991), we observe that around 90% (91%) of stock-days include high (low) prices that are above (below) the
quote midpoints. The Internet Appendix provides more details.
Proposition 2. The squared distance between the close log-price of day t and the
proposed midpoint proxy includes two components: a bid-ask spread component
and an efficient-price variance component. Equation (7) shows this relation:
$$E\left[\left(c_t - \frac{\eta_t + \eta_{t+1}}{2}\right)^2\right] = \frac{s^2}{4} + \left(\frac{1}{2} - \frac{k_1}{8}\right)\sigma_e^2, \qquad k_1 \equiv 4\ln(2). \qquad (7)$$
Garman and Klass (1980), Parkinson (1980), and Beckers (1983) use the
value of k1 for the purpose of estimating volatility using the daily price range.
Here, rather than using the range, we take the average of high and low prices and
use it as an efficient price proxy. Proofs for Propositions 2 and 3 are available
in Appendix A. The effective half-spread, by definition, is the distance between
the price and the contemporaneous midquote. We interpret Equation (7) to be
a characterization of the standard definition of the effective half-spread, that
is, when the unobservable midpoint is proxied by the average mid-ranges. We
argue that the average of the consecutive mid-ranges of days t and t +1 is a
natural proxy for the midquote or the efficient price at the closing time of day
t since the mid-range of day t occurs before the closing time and the mid-
range of the next day occurs after it. As expressed in Equation (7), the squared
distance between the close price and the proxy for the midquote contains
two components: the squared effective half-spread and the transitory variance.
The squared effective half-spread term represents the squared distance between
the observed close price and the midquote at the time of market close. The
transitory variance term represents the squared distance between the midquote
at the close time and its approximation, that is, the average of two consecutive
mid-ranges. Figure 1 provides a graphical illustration of the two components
of the dispersion measure introduced in Equation (7) in the framework of the
Roll (1984) model. The figure illustrates that the distance between the close
price and the average of the two consecutive mid-ranges reflects two quantities,
namely, the effective spread and the intraday efficient-price variation ($\sigma_e^2$).
As the next step, we propose a way to compute a measure of intraday volatility,
which we will remove from the dispersion between the close price and the
midquote proxy.
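To give a sense of the relative size of the two components in Equation (7), the following sketch evaluates them for a 1% relative spread and a 3% daily efficient-price volatility. These input values are illustrative assumptions, and the function name is hypothetical.

```python
import math

K1 = 4.0 * math.log(2.0)  # k_1 as defined in Equation (7)


def dispersion_components(spread, sigma_e):
    """Return the spread part and the variance part of Equation (7)."""
    spread_part = spread ** 2 / 4.0
    variance_part = (0.5 - K1 / 8.0) * sigma_e ** 2
    return spread_part, variance_part


# Illustrative inputs (assumed): s = 1%, sigma_e = 3%.
spread_part, variance_part = dispersion_components(spread=0.01, sigma_e=0.03)
print(spread_part, variance_part)
```

Under these assumed inputs the variance part is several times larger than the spread part, which is why the variance has to be removed before the spread can be recovered.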
considerably biased if the trades are observed less frequently. Figure 2 illustrates
these simulation results. By its accurate estimation of efficient price
variance, Proposition 3 provides us with a way to remove the efficient price
variance part introduced in Proposition 2.
Unlike Roll (1984), the derivation of Equation (9) does not rely on
additional restrictive assumptions about the serial independence of trades and the equal
likelihood of buyer-initiated and seller-initiated close prices, which do not find
empirical support.8 Compared to the HL estimator (Corwin and Schultz 2012),
our model should perform better for at least three reasons: First, it benefits
from the richer readily available information set of price data, that is, the daily
high, low, and close prices. Second, unlike Corwin and Schultz’s (2012) HL
estimator, our model is robust to the price movements in nontrading periods,
such as weekends, holidays, and overnight price changes. Therefore, it does not
rely on ad hoc overnight price adjustments.9 Finally, by relying on the average
of high and low prices instead of the price range, our model is less sensitive
to the number of observed trades per day. This is a key advantage that we will
$$\hat{s}_{\text{two-day corrected}} = \frac{1}{N}\sum_{t=1}^{N} \hat{s}_t, \qquad \hat{s}_t = \sqrt{\max\{4(c_t-\eta_t)(c_t-\eta_{t+1}),\,0\}}, \qquad (11)$$
where N is the number of days in the month and $\hat{s}_t$ refers to the two-day
estimates. As shown in Equation (11), to calculate the two-day corrected
8 Hasbrouck and Ho (1987) and Choi, Salandro, and Shastri (1988), among others, find a serial dependence in the
trade directions, and Harris (1989) and McInish and Wood (1990) show that close prices are more likely to be
buyer initiated than seller initiated.
9 Appendix B provides the proof.
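The two-day corrected estimator of Equation (11) can be sketched in code as follows. This is an illustrative reading rather than a reference implementation: the function name is hypothetical, each two-day estimate is the square root of the truncated product (in line with the squared-distance relation in Proposition 2), and the average is taken over the consecutive day pairs available in the input, which is an assumption about how the month's boundary is handled.

```python
import math


def chl_two_day_corrected(closes, highs, lows):
    """Average of two-day spread estimates from daily close, high, and low prices."""
    n = len(closes)
    if n < 2 or len(highs) != n or len(lows) != n:
        raise ValueError("need at least two days of equally long price series")
    c = [math.log(p) for p in closes]
    # Mid-range: average of the daily high and low log-prices.
    eta = [(math.log(h) + math.log(l)) / 2.0 for h, l in zip(highs, lows)]
    estimates = []
    for t in range(n - 1):
        product = 4.0 * (c[t] - eta[t]) * (c[t] - eta[t + 1])
        # Negative two-day products are set to zero before taking the root.
        estimates.append(math.sqrt(max(product, 0.0)))
    return sum(estimates) / len(estimates)
```

Setting negative products to zero day by day, rather than once after averaging over the month, is what distinguishes the two-day corrected version from the monthly corrected one.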
10 Joel Hasbrouck has kindly provided the SAS codes for the Gibbs sampler estimator on his personal Web page.
We modify the codes by altering the estimation windows from stock-years into stock-months. We only consider
stock-months in which there are at least 12 days with trades. As he already noted on his Web page, the monthly
estimator is less accurate than is the annual version because of the weight of the prior density in the outputs.
Table 1
Other bid-ask estimation methods using daily data
Label    Inputs       Description
Roll     Close price  $\text{Roll} = 2\sqrt{\max\{-\operatorname{cov}(\Delta c_{t+1}, \Delta c_t),\,0\}}$, where c is the close log-price
Gibbs    Close price  Gibbs sampler Bayesian estimation of spreads by setting a nonnegative
                      prior density for the spreads
EffTick  Close price  $\text{EffTick} = \frac{\sum_{j=1}^{J} \hat{\gamma}_j S_j}{\bar{P}}$, where $S_j = \$1/8, \$1/4, \$1/2, \$1$,
                      $\hat{\gamma}_j = \begin{cases} \min\{\max\{U_j, 0\}, 1\}, & j = 1 \\ \min\{\max\{U_j, 0\}, 1 - \sum_{k=1}^{j-1} \hat{\gamma}_k\}, & j = 2, \ldots, J \end{cases}$
                      $U_j = \begin{cases} 2F_j, & j = 1 \\ 2F_j - F_{j-1}, & j = 2, \ldots, J-1 \\ F_j - F_{j-1}, & j = J \end{cases}$
Fong, Holden, and Trzcinka (2017) develop an estimator, named FHT, which
relies on the assumption that price movements that are smaller than the bid-ask
spread will be unobservable and are reflected in the days with zero returns. They
argue that the measure simplifies the LOT measure developed by Lesmond,
Ogden, and Trzcinka (1999) and it performs very well in estimating liquidity
of the global equity market to the extent that it becomes one of the most accurate
measures.
Holden (2009), jointly with Goyenko, Holden, and Trzcinka (2009), develops
a proxy for the effective spread based on observable price clustering. Larger
spreads are associated with larger effective tick sizes. The steps to calculate
their EffTick measure are shown in Table 1.
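For comparison, the Roll entry in Table 1 can be sketched as below. The sketch uses the serial covariance of consecutive changes in close log-prices, as in the standard Roll (1984) measure; the function name and the use of the sample (n − 1) covariance are assumptions made for illustration.

```python
import math


def roll_spread(closes):
    """Roll-type estimate: 2 * sqrt(max(-cov of consecutive log-price changes, 0))."""
    if len(closes) < 4:
        raise ValueError("need at least four close prices")
    c = [math.log(p) for p in closes]
    changes = [c[i + 1] - c[i] for i in range(len(c) - 1)]
    x, y = changes[:-1], changes[1:]  # change at t and change at t + 1
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)
    # A positive serial covariance yields a zero estimate.
    return 2.0 * math.sqrt(max(-cov, 0.0))
```

The truncation at zero is the reason a large share of Roll estimates can be exactly zero in samples where price changes are positively autocorrelated.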
More recently, Corwin and Schultz (2012) develop an estimator based on
daily high and low prices. They argue that high (low) prices are almost always
buyer (seller) initiated. Therefore, the daily price range reflects both the efficient
price volatility and its bid-ask spread. They build their model on the comparison
of one- and two-day price ranges. The latter should twice reflect the variance of
the former, but they should have the same bid-ask spread. This reasoning gives
2. Numerical Simulations
In this section, we perform several numerical simulations under different
settings. For ease of comparison, we define the setting of simulations similar to
that in Corwin and Schultz (2012). We compare two versions of our measure,
labeled CHL, that is, the monthly corrected and the two-day corrected versions,
with the HL and Roll estimates.11
Panel A of Table 2 shows the results for the near-ideal settings. For each
relative spread under analysis, we perform 10,000 simulations of 21-day
months of the price process. Each day consists of 390 minutes in which trades
are observable. We simply draw from
$$M_t = M_{t-1}\, e^{z\sigma/\sqrt{390}}, \qquad P_t = M_t\, e^{q_t s/2}, \qquad z \sim N(0,1),$$
where $M_t$ and $P_t$ represent the efficient price and observed transaction
price at time t, respectively. We set the daily standard deviation of the
efficient-price return, σ, to be 3%. $q_t$ can be equally likely −1 or +1 for every individual
observed trade, relaxing the assumption of buyer- (seller-) initiated high (low)
prices. We report both the bias and the estimation errors, in terms of RMSEs,
in the table. The results shown in panel A are twofold: First, both CHL and
HL show considerably lower estimation errors than the Roll estimates. Second,
although the CHL monthly corrected estimates tend to be less-biased than the
two-day corrected version, they do not show very different estimation errors.
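The near-ideal price process described above can be sketched as follows. The sketch works directly in log-prices, so the efficient log-price moves by zσ/√390 each minute and the observed log-price adds q·s/2. The function name, the starting price of one, and the default seed are illustrative assumptions.

```python
import math
import random


def simulate_month(spread, sigma=0.03, days=21, minutes=390, rng=None):
    """Simulate daily close, high, and low log-prices for one month."""
    rng = rng or random.Random(0)
    step_sd = sigma / math.sqrt(minutes)
    m = 0.0  # efficient log-price, starting from a price of 1 (assumption)
    closes, highs, lows = [], [], []
    for _ in range(days):
        observed = []
        for _ in range(minutes):
            m += rng.gauss(0.0, 1.0) * step_sd   # efficient price step
            q = rng.choice((-1, 1))              # trade direction, equally likely
            observed.append(m + q * spread / 2.0)
        closes.append(observed[-1])
        highs.append(max(observed))
        lows.append(min(observed))
    return closes, highs, lows


closes, highs, lows = simulate_month(spread=0.01)
print(len(closes), round(highs[0] - lows[0], 4))
```

Repeating this for many months and feeding the daily series into each estimator would give bias and RMSE figures of the kind reported in the table, though the exact numbers depend on the random draws.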
11 Shane Corwin has kindly provided the SAS codes for the HL estimator on his personal Web site. The code
produces several versions of spread estimates. We consider two of them in our simulations. The first version,
named MSPREAD_0, is calculated by setting two-day negative estimates to zero and then taking the monthly
average. The second version, named XSPREAD_0, is calculated by directly setting the negative monthly averaged
estimates to zero. Although the second version produces less-biased results in some simulation cases, Corwin
and Schultz (2012) advocate the former method, which is better associated with the TAQ benchmark.
Table 2
Estimated bid-ask spreads from simulations
               Bias                                    RMSEs
               CHL             HL              Roll    CHL             HL              Roll
               2-day   Month   2-day   Month           2-day   Month   2-day   Month
A. Near-ideal conditions
0.5% spread     0.7%    0.2%    0.9%    0.1%    0.7%    0.8%    0.8%    1.0%    0.5%    1.5%
1.0% spread     0.3%    0.0%    0.8%    0.0%    0.3%    0.5%    0.8%    0.8%    0.6%    1.5%
3.0% spread    −0.6%   −0.1%    0.2%   −0.1%   −0.4%    0.8%    0.7%    0.5%    0.6%    1.9%
5.0% spread    −0.7%    0.0%    0.0%   −0.1%   −0.4%    0.9%    0.6%    0.6%    0.6%    2.2%
8.0% spread    −0.4%    0.0%   −0.2%   −0.2%   −0.5%    0.7%    0.5%    0.6%    0.6%    2.7%
B. Each trade is visible with a chance of 10% (average of 39 trades per day)
0.5% spread     0.7%    0.2%    0.6%   −0.3%    0.7%    0.8%    0.8%    0.7%    0.4%    1.5%
1.0% spread     0.3%   −0.1%    0.3%   −0.5%    0.3%    0.5%    0.8%    0.5%    0.7%    1.5%
3.0% spread    −0.6%   −0.1%   −0.3%   −0.8%   −0.4%
aim to assess how the environment of infrequent trades affects bid-ask spread
estimates. As the downward bias is larger for the cases with less-frequently
observed trades, we design two separate settings. In panel B of Table 2, each
per-minute trade has a 10% chance of being observed, allowing an average of
39 trades per day. In panel C, each trade only has a chance of 2/390≈0.5%
of being observed, allowing an average of two trades per day. This implies
that sometimes there are no transactions or only one trade per day meaning
identical high and low prices, and zero range. To avoid these cases, we discard
any two-day period that includes a nontrading day or a day with zero price
range, and calculate the spreads for the rest of the two-day periods in the
sample.
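The infrequent-trading settings can be sketched as a thinning step followed by a filter on two-day periods. All function names are hypothetical, and the example prices are made up; the logic only illustrates the rule of discarding any two-day period that contains a nontrading day or a zero-range day.

```python
import random


def observe_trades(day_prices, prob, rng):
    """Keep each per-minute trade with probability prob."""
    return [p for p in day_prices if rng.random() < prob]


def daily_summary(observed):
    """Return (close, high, low), or None for a nontrading or zero-range day."""
    if not observed:
        return None
    high, low = max(observed), min(observed)
    if high == low:
        return None
    return observed[-1], high, low


def usable_pairs(days):
    """Two-day periods in which both days have a valid summary."""
    return [(days[t], days[t + 1]) for t in range(len(days) - 1)
            if days[t] is not None and days[t + 1] is not None]


# Made-up example: three days of observed trades, the second with one trade only.
rng = random.Random(0)
summaries = [daily_summary(d) for d in ([1.00, 1.02, 1.01], [1.01], [1.03, 1.00])]
print(len(usable_pairs(summaries)))
```

With prob set to 0.1 or 2/390, `observe_trades` would reproduce the panel B and panel C sampling schemes, respectively.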
Three clear results emerge from this analysis. First, under the most
stock-days in our sample include fewer than 100 trades, but also these
stock-days belong to 77% of stocks in the sample. These numbers suggest that the
HL estimates’ sensitivity to the daily number of trades can be a broader issue
that goes well beyond a limited number of illiquid stocks.
12 We perform additional numerical simulations reported in the Internet Appendix. These include overnight price
movements, and the relaxation of the assumption of equal likelihood of buyer-initiated and seller-initiated trades.
The trade direction imbalance highly affects the Roll estimates but the effect on CHL and HL estimates is marginal.
roots and average over the month; and (3) we discard estimates for months in
which there are fewer than 12 applicable days.13, 14
To calculate the HL estimates, we exactly follow Corwin and Schultz (2012).
More specifically, (1) we keep the previous daily high and low prices on those
days when a stock does not trade or has a zero price range, and, for the days
with zero range, we adjust the high and low prices of the previous day in the ad hoc
way explained in their paper; (2) we perform the ad hoc overnight adjustment
as explained in their paper; (3) we use the two-day corrected version, that is,
we set negative two-day estimates to zero; and (4) we discard stock-months
with fewer than 12 two-day estimates. We then calculate the other measures and
merge all the estimations. We finally discard stock-months in which (1) any of
the estimates produce a missing value, (2) a stock split or enormous distribution
13 An applicable day is defined as one with a closing price, high price, low price, price range, and volume above
zero. Inclusion or exclusion of the volume criterion does not visibly change any outcomes. It is also possible
and accurate to replace missing $\eta_{t+1}$ values, for the two-day estimates in which no trade occurs on day t+1, with
readily available midquotes. However, to have a fair comparison with other estimates, we refrain from using
midquotes in our estimates. In favor of the Corwin and Schultz (2012) estimates, we keep using the midquotes
for their nontrading days and overnight price adjustments.
14 As we merge the estimates in the next step, this filter will be applied to other estimates as well. Therefore, all the
estimates will have similar quality in terms of the selected stock-months.
15 We discard stock-months in which the cumulative price adjustment factor (cfacpr) changes more than 20%.
16 For example, the Gibbs estimator’s code returns errors for the few stock-months in which the price remains
constant for most of the days in the month, because the initial trade directions used in the simulations are
calculated as sign of daily returns.
17 To calculate the effective spreads using Daily TAQ data, we use the same SAS codes kindly provided by Craig
Holden on his Web site. We add a criterion to keep only the trade and quote records with no symbol suffixes.
Table 3
Summary statistics for different estimators
                      N         Mean    Median   SD      ρ(·, ES_{i,t})   %0
Effective spread      579,872   0.82%   0.27%    1.41%   1.000            0.00
CHL - Two-day         579,872   1.39%   1.02%    1.30%   0.745            0.00
CHL - Monthly         579,872   1.24%   0.74%    1.85%   0.680            33.96
HL - Two-day          579,872   1.21%   0.93%    1.03%   0.660            0.00
HL - Monthly          579,872   0.58%   0.31%    0.87%   0.625            24.25
Roll                  579,872   1.50%   0.72%    2.56%   0.454            42.81
Gibbs                 579,872   2.13%   1.47%    2.96%   0.397            0.00
EffTick               579,872   2.03%   0.64%    4.81%   0.419            27.44
EffTick – Alt. incr.  579,872   0.25%   0.07%    0.72%   0.514            0.00
FHT                   579,872   0.26%   0.00%    0.69%   0.436            61.61
CRSP_S                579,872   0.82%   0.21%    1.61%   0.957            0.30
This table provides the main summary statistics for the pooled sample of the main estimators considered in
this paper. The column labeled N refers to the number of stock-months of estimates in the sample. The column
labeled ρ(·, ES_{i,t}) refers to the correlation of different estimates with the TAQ effective spread benchmark.
consolidated quotes (CQ) file if the spread is more than five dollars. Second,
we merge the CQ and NBBO (cleaned) data to construct a complete official
NBBO data set. Third, we match trades with constructed official NBBO quotes
one millisecond before them.18 In addition to the above-mentioned filters, we
discard all trades outside the market opening hours and with proportional
effective spreads above 40%. We compute the dollar-weighted average for
intraday proportional effective spreads to obtain the average daily spreads.
Then we take the average of daily spreads to construct the monthly benchmark.
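The aggregation from trades to the monthly benchmark can be sketched as follows. The sketch assumes that a trade's proportional effective spread is twice the absolute distance between the trade price and the matched midquote, divided by the midquote, and that the dollar weight is price times size; the function names and input layout are hypothetical.

```python
def daily_effective_spread(trades, max_spread=0.40):
    """Dollar-weighted average proportional effective spread for one day.

    trades: iterable of (price, size, midquote) tuples (assumed layout).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for price, size, mid in trades:
        spread = 2.0 * abs(price - mid) / mid
        if spread > max_spread:
            continue  # drop trades with proportional spreads above 40%
        weight = price * size  # dollar volume of the trade
        weighted_sum += weight * spread
        total_weight += weight
    return weighted_sum / total_weight if total_weight > 0 else None


def monthly_benchmark(daily_spreads):
    """Simple average of the available daily spreads in a month."""
    valid = [s for s in daily_spreads if s is not None]
    return sum(valid) / len(valid) if valid else None
```

Dollar weighting means that large trades dominate the daily figure, whereas the monthly figure treats every trading day equally.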
The final step in the data preparation is to link the CRSP and Daily TAQ
using CUSIPs in the TAQ master files.19 This matching strategy allows us to
cover 98% of stock-months estimates from the CRSP. We provide the summary
statistics for the estimates in Table 3. As we compare the pooled data in Table 3,
18 Starting from July 27, 2015, Daily TAQ timestamps are provided in microseconds, and we match trades with the
official NBBO quotes one microsecond before them.
19 We use the monthly master files, which cover a longer portion of our sample. For 2015, however, we rely on
daily master files because monthly master files are not available after 2014.
both the mean and standard deviation convey valuable information about the
explanatory power of the estimators. The mean provides a simple measure for
the level or size of the estimated transaction costs, and the standard deviation
gives information about the time-series and cross-sectional dispersion of spread
estimates around the mean. We also include the overall correlations of the estimates
with the TAQ effective spread benchmark, which confirm that the two-day corrected
estimates are more closely associated with the benchmark than the monthly corrected
estimates. Running a pooled regression of TAQ effective spreads on the CHL
two-day corrected estimates, $ES_{i,t} = a + b\,CHL_{i,t} + \varepsilon_{i,t}$, we obtain the values of
−0.29%, 0.8053, and 56% for a, b, and $R^2$, respectively, whereas the same
regression on the CHL monthly corrected estimates delivers the values of
0.18%, 0.5169, and 46% for a, b, and $R^2$, respectively. Although the sample
second subsampling attempt, we group stocks into quintiles, sorting them
by their average number of daily trades during the sample period. Table 4 shows
the correlation coefficients between different estimates and the TAQ effective
spread benchmark for each quintile and for the entire sample. As expected, in
the absence of quote data the CHL estimates have the highest correlation with
the TAQ benchmark for the entire sample, as well as for the first three quintiles,
which represent less frequently traded stocks.20
As a decomposition of the standard deviations reported in Table 3, we also
compute the cross-sectional standard deviation of the estimates on a monthly
basis to assess how well the estimators’ dispersion follows that of the TAQ
benchmark across time. Figure 5 shows the results for some estimators. It is
evident that the cross-sectional dispersion of our estimates most
closely tracks that of the benchmark.
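The month-by-month cross-sectional dispersion can be sketched as below. The record layout and the function name are assumptions, and the sketch uses the sample standard deviation, which is one of several reasonable choices.

```python
import statistics
from collections import defaultdict


def cross_sectional_sd(records):
    """Standard deviation of estimates across stocks, computed month by month.

    records: iterable of (month, stock_id, estimate) tuples (assumed layout).
    """
    by_month = defaultdict(list)
    for month, _stock, estimate in records:
        by_month[month].append(estimate)
    # A month needs at least two stocks for a sample standard deviation.
    return {month: statistics.stdev(values)
            for month, values in sorted(by_month.items())
            if len(values) >= 2}
```

Applying the same function to each estimator and to the benchmark yields one dispersion series per measure, which can then be compared over time.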
We now turn to identifying which criteria should be used to assess the
measurement performance of the effective spread estimators. As stressed by
20 We also observe that CHL has the highest correlation with the Amihud price impact measure, which reflects
another dimension of market liquidity. This holds for the entire sample, and for each of the five quintiles, sorted
by average number of daily trades. Moreover, the correlations are higher for less-frequently traded quintiles. The
Internet Appendix provides the results.
Table 4
Correlations for quintiles based on the average number of trades
N CHL HL Roll Gibbs EffTick FHT CRSP_S
Full sample 579,872 0.745 0.660 0.454 0.397 0.419 0.436 0.957
ANTD quintile 1 77,978 0.820 0.762 0.572 0.685 0.371 0.367 0.936
ANTD quintile 2 103,920 0.785 0.721 0.459 0.459 0.406 0.450 0.945
ANTD quintile 3 110,083 0.701 0.677 0.357 0.334 0.444 0.444 0.943
ANTD quintile 4 130,725 0.616 0.627 0.282 0.251 0.485 0.430 0.931
ANTD quintile 5 157,166 0.529 0.557 0.246 0.240 0.519 0.463 0.912
The table shows the correlation coefficients between different monthly estimates and the TAQ effective spread
benchmark. We group the stocks into five quintiles sorting them by their average number of trades per day during
the sample period. The daily number of trades is counted using TAQ consolidated trades data for trades that
occur between 9:30 and 16:00 and have a positive price and volume. The first four quintiles are constructed of
1,392 stocks, and the fifth is constructed of 1,393 stocks. The labels in the first row refer to our estimator (CHL)
and the estimators proposed by Corwin and Schultz (HL; 2012), Roll (Roll; 1984), Hasbrouck (Gibbs; 2009),
Holden (EffTick; 2009), Fong, Holden, and Trzcinka (FHT; 2017), and Chung and Zhang (CRSP_S; 2014). N
refers to the number of stock-months of estimates for the entire sample, as well as for each quintile. To compare
Figure 5
Cross-sectional dispersion of monthly spread estimates
This figure shows the standard deviations of spread estimates across stocks for each month from October 2003 to
December 2015. In addition to the effective spread based on the Daily TAQ data, the labels refer to our estimator
(CHL) and the estimators proposed by Corwin and Schultz (HL; 2012) and Roll (Roll; 1984).
Goyenko, Holden, and Trzcinka (2009), the choice of the best estimator,
depending on the specific application, should be based on different criteria.
For the sake of completeness, our analysis encompasses the three main criteria
used in the literature: cross-sectional correlation, time-series correlation, and
Table 5
Average cross-sectional correlations with the TAQ benchmark
N CHL HL Roll Gibbs EffTick FHT CRSP_S
A. Average cross-sectional correlations with effective spreads for monthly estimates
Full period 3,944.7 0.738 0.642 0.424 0.369 0.409 0.426 0.959
2003–2007 4,380.5 0.762 0.664 0.435 0.378 0.458 0.522 0.963
2008–2011 3,870.8 0.736 0.635 0.428 0.442 0.391 0.401 0.959
2012–2015 3,555.5 0.715 0.625 0.409 0.288 0.374 0.349 0.956
B. Average cross-sectional correlations with changes in effective spreads for monthly estimates
Full period 3,895.7 0.298 0.284 0.114 0.093 0.026 0.037 0.666
2003–2007 4,266.2 0.328 0.306 0.128 0.093 0.029 0.066 0.659
2008–2011 3,765.9 0.304 0.292 0.121 0.122 0.028 0.036 0.643
2012–2015 3,461.3 0.245 0.239* 0.086 0.057 0.018 0.002 0.684
C. Analysis across different markets
NYSE 1,337.1 0.495* 0.481* 0.213 0.239 0.500 0.405 0.919
Table 6
Average time-series correlations for spread estimates of individual stocks compared to the TAQ
benchmark
N CHL HL Roll Gibbs EffTick FHT CRSP_S
A. Average time-series correlations with effective spreads: Monthly estimates
Full period 7,210 0.518 0.510 0.242 0.330 0.310 0.181 0.739
2003–2007 5,652 0.393 0.377 0.140 0.247 0.252 0.124 0.614
2008–2011 4,783 0.611 0.604 0.317 0.436 0.267 0.150 0.757
2012–2015 4,406 0.314 0.325 0.106 0.175 0.148 0.072 0.614
B. Average time-series correlations with changes in effective spreads: Monthly estimates
Full period 7,124 0.287* 0.290 0.115 0.166 0.050 0.024 0.452
2003–2007 5,574 0.256* 0.258 0.096 0.167 0.040 0.012 0.386
2008–2011 4,727 0.340 0.351 0.146 0.211 0.064 0.034 0.470
2012–2015 4,331 0.190 0.196 0.066 0.102 0.016 0.003 0.388
C. Analysis across different markets
Table 7
Prediction errors
N CHL HL Roll Gibbs EffTick FHT CRSP_S
A. RMSEs, breakdown for different periods, and across different markets
Full period 3,944.7 0.0104 0.0107 0.0221 0.0289 0.0441 0.0130 0.0043
2003–2007 4,380.5 0.0084 0.0086 0.0182 0.0250 0.0368 0.0101 0.0030
2008–2011 3,870.8 0.0141 0.0141* 0.0291 0.0317 0.0551 0.0175 0.0062
2012–2015 3,555.5 0.0089 0.0094 0.0192 0.0302 0.0408 0.0117 0.0037
NYSE 1,337.1 0.0089 0.0077 0.0162 0.0231 0.0170 0.0030 0.0012
AMEX 297.1 0.0115 0.0124 0.0286 0.0253 0.0994 0.0190 0.0062
NASDAQ 2,310.5 0.0111 0.0118 0.0238 0.0316 0.0436 0.0154 0.0050
B. RMSEs, excluding stock-months with zero estimates
Full period 648.4 0.0115 0.0127 0.0261 0.0201 0.0771 0.0168 0.0059
2003–2007 819.2 0.0089 0.0096 0.0214 0.0173 0.0623 0.0124 0.0039
2008–2011 617.8 0.0156 0.0173 0.0347 0.0245 0.0965 0.0227 0.0089
2012–2015 497.5 0.0101 0.0114 0.0226 0.0186 0.0733 0.0155 0.0050
when stocks are traded with large (small) effective spreads.21 The time-series
correlation analysis confirms the previous findings that our estimator generally
provides the most accurate estimates of effective costs, especially for less liquid
stocks.
21 As an additional test, which we report in the Internet Appendix, we construct equally weighted portfolios of stocks
and then compare the correlation of the estimated portfolios’ spread to that of the high-frequency benchmark.
The estimated spreads of market-wide portfolio show a time-series correlation of 0.965 with the ones of the TAQ
benchmark.
22 We repeat this analysis using mean-absolute errors (MAEs) and, confirming the results of this section, find that
for the entire sample CHL estimates have the lowest MAEs compared with other estimates. The Internet Appendix
provides the results.
In panel A, we include the entire sample, including the zero estimates for
all measures, to compare the overall accuracy of the estimates. In panel B, we
exclude the stock-months in which Roll, EffTick, or FHT estimates are zero to
compare the accuracy of nonzero estimates. In both settings, end-of-day quoted
spreads show the lowest RMSEs. However, in the absence of end-of-day quotes, our
estimator (CHL) provides the lowest RMSEs compared with other estimators
across the entire sample, as well as for AMEX- and NASDAQ-listed stocks. The
difference between average RMSEs of our estimates and the other estimates
is also significant, using Newey-West (1987) standard errors with four lags to
test whether the time-series of pairwise difference of RMSEs is statistically
different from zero.
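A test of this kind can be sketched as a t-statistic for the mean of the monthly difference series, with a Newey-West long-run variance. The sketch assumes Bartlett kernel weights and autocovariances divided by the series length; the function name is hypothetical.

```python
import math


def newey_west_tstat(series, lags=4):
    """Mean, Newey-West standard error of the mean, and t-statistic."""
    n = len(series)
    if n <= lags:
        raise ValueError("series must be longer than the number of lags")
    mean = sum(series) / n
    dev = [x - mean for x in series]

    def autocov(lag):
        return sum(dev[t] * dev[t - lag] for t in range(lag, n)) / n

    long_run_var = autocov(0)
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1.0)  # Bartlett kernel weight
        long_run_var += 2.0 * weight * autocov(lag)
    se = math.sqrt(long_run_var / n)
    if se == 0.0:
        raise ValueError("series has zero variance")
    return mean, se, mean / se
```

Here `series` would hold the monthly differences between the RMSEs of two estimators, so a large absolute t-statistic indicates that one estimator's errors are systematically smaller.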
23 We also compare the correlation of CHL estimates and the effective spread benchmark with those
from combinations of other estimates. To do so, we combine other estimates both by taking their simple average
and by using their first principal component. As reported in the Internet Appendix, our estimates show the highest
time-series and cross-sectional correlation with the effective spread benchmark.
Table 8
Partial correlations
                     N         CHL | HL   CHL | HL,   CHL | HL,     CHL | HL,      CHL | HL, Roll,
                                          Roll        Roll, Gibbs   Roll, Gibbs,   Gibbs, EffTick,
                                                                    EffTick        FHT
A. Average partial cross-sectional correlations with the TAQ benchmark
All stocks, levels   3,944.7   0.478      0.455       0.450         0.439          0.430
All stocks, changes  3,895.7   0.159      0.155       0.151         0.150          0.149
NYSE                 1,337.1   0.188      0.202       0.190         0.166          0.159
AMEX                   297.1   0.473      0.436       0.412         0.408          0.405
NASDAQ               2,310.5   0.496      0.469       0.465         0.456          0.450
ES quintile 1        1,035.0   0.042      0.067       0.056         0.048          0.045
ES quintile 2          841.4   0.078      0.095       0.089         0.074          0.070
ES quintile 3          753.2   0.159      0.175       0.176         0.164          0.157
ES quintile 4          715.9   0.338      0.325       0.330         0.326          0.320
Newey-West (1987) standard errors with four lags in the time-series of monthly-
estimated cross-sectional correlations. All average cross-sectional correlations
are significantly different from zero and positive, indicating that CHL has
some additional explanatory power, not already included in any overidentified
models, in predicting the effective spread. For instance, the average partial
cross-sectional correlation of CHL and TAQ effective spreads after controlling
for HL, Roll, Gibbs, EffTick, and FHT is 0.430 for the entire sample and
0.159, 0.405, and 0.450 for NYSE, AMEX, and NASDAQ stocks, respectively.
Another interesting result is that the additional explanatory ability of CHL is
larger for less liquid stocks as indicated by the increasing partial correlations
from quintiles 1 to 5 in rows 8 to 12. All these findings remain consistent when
average partial time-series correlations are considered (panel B of Table 8).
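A partial correlation of this kind can be computed as the correlation between two sets of regression residuals. The sketch below is illustrative only: it assumes ordinary least squares with an intercept, and the function name `partial_correlation` and its argument layout are hypothetical rather than the authors' code.

```python
import numpy as np

def partial_correlation(benchmark, estimate, controls):
    """Correlation of benchmark and estimate after removing the controls.

    Assumptions: `controls` is an (n, k) array of the other spread
    estimates (e.g. HL and Roll); both regressions include an intercept.
    """
    y = np.asarray(benchmark, dtype=float)
    x = np.asarray(estimate, dtype=float)
    Z = np.column_stack([np.ones(y.size), np.asarray(controls, dtype=float)])
    # Residuals of each variable after projecting on the controls.
    res_y = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    res_x = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    return np.corrcoef(res_y, res_x)[0, 1]
```

In the setting of Table 8, `benchmark` would be the TAQ effective spreads in one month's cross-section, `estimate` the CHL estimates, and `controls` the competing estimates listed in the column header; the monthly values would then be averaged over time.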
4. Other Applications
Well-performing estimators of transaction costs can be applied in a variety
of research areas. To illustrate their potential uses, we propose two simple
applications. The first example is a description of the historical spread estimates
for stocks listed on NYSE (AMEX) from 1926 (1962) to 2015. In the
second example, the spread estimates are applied to measure systematic risks
originating from liquidity issues.
24 See the Internet Appendix for more details on the construction of the Monthly TAQ benchmark and additional
analysis.
25 Intuitively, when tick sizes are larger, measuring end-of-day spreads produces larger estimation variance and,
consequently, larger estimation errors. For example, when the tick size is large enough that the spread is
only two ticks (one tick) wide, observing either the end-of-day bid or ask one tick away from its intraday value
causes a 50% (100%) measurement error.
Figure 7
Average partial correlations after controlling for HL and Roll
We split the stock sample into three illiquidity terciles by sorting stocks on their average effective spread
during the sample period. Then we break down each illiquidity tercile into three volatility terciles using the
daily volatility of the stocks during the sample period. The partial correlations are the correlations between the
residuals from regressing TAQ effective spreads and our estimates (CHL) on Corwin and Schultz’s (HL; 2012) and
Roll’s (Roll; 1984) estimates.
Table 9
Comparison with the monthly TAQ benchmark, January 1993–September 2003
                         N         CHL      HL       Roll     Gibbs    EffTick  FHT      CRSP_S
A. Average cross-sectional correlations with the TAQ benchmark
All stocks, levels       5,009.2   0.861    0.833    0.605    0.713    0.637    0.644    0.846
All stocks, changes      4,925.6   0.471    0.460    0.206    0.266    0.194    0.153    0.578
1993–1995                3,922.4   0.812    0.808*   0.609    0.747    0.562    0.607    0.787
1996–2000                5,830.4   0.890    0.869    0.620    0.737    0.728    0.684    0.836
2001–2003                4,701.5   0.860    0.795    0.572    0.635    0.552    0.614    0.927
NYSE                     1,578.4   0.810*   0.808*   0.453    0.629    0.812    0.755    0.856
AMEX                       353.1   0.929    0.918    0.651    0.846    0.788    0.743    0.850
NASDAQ                   4,925.6   0.471    0.460    0.206    0.266    0.194    0.153    0.578
B. Average time-series correlations for spread estimates of individual stocks
All stocks, levels       10,783    0.586    0.580    0.280    0.445    0.464    0.402    0.778
All stocks, changes      10,676    0.401    0.400*   0.155    0.285    0.221    0.116    0.586
1993–1995                 6,137    0.496    0.504    0.229    0.379    0.527    0.243    0.634
Figure 8 shows the time development of the estimated spreads computed for
three equally weighted portfolios: the smallest and largest market capitalization
deciles, as well as the entire stock sample. The spreads generated by our
model display relatively stable variation over time. Reassuringly, this also
applies to the smallest market capitalization decile. In contrast, Corwin and
Schultz (2012) document that the spread estimates generated by their model
display considerable variation over time and are extraordinarily high
during the Great Depression, in which the market-wide average estimates of
the effective spreads are as high as 20% for NYSE stocks and 50% for small cap
stocks. Instead, panel A (B) of Figure 8 shows that our estimates for the NYSE
(AMEX) stocks evolve steadily across every decade, remaining within
an economically reasonable range; that is, the market-wide estimated effective
spread does not exceed 4% (6%) for NYSE (AMEX) stocks. Moreover, the
average estimated effective spread for the small cap stocks listed on the NYSE
(AMEX) does not exceed 12% (19%) during the entire sample.
Figure 8
Time-series evolution of estimated spread, calculated as equally weighted portfolios of stocks
This figure shows the monthly historical developments of spread estimates from our model. Small cap and large
cap portfolios are represented by the first and last decile of stocks sorted by market capitalization at the end of
each month. Panel A (B) shows the estimates for stocks listed on the NYSE (AMEX) between 1926 (1962) and
2015.
The results in this subsection suggest that our estimator can be used in various
research areas across many types of markets and assets, including less actively
traded ones. This is especially true for researchers interested in the ability of
an estimator to capture the temporal evolution of spreads over long time spans
that predate quote data, or in international markets without quote data.
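$$
\beta_3 = \frac{\mathrm{cov}\left(r^{i}_{t+1},\, s^{M}_{t+1}\right)}{\mathrm{var}\left(r^{M}_{t+1} - s^{M}_{t+1}\right)}, \qquad
\beta_4 = \frac{\mathrm{cov}\left(s^{i}_{t+1},\, r^{M}_{t+1}\right)}{\mathrm{var}\left(r^{M}_{t+1} - s^{M}_{t+1}\right)} \tag{14}
$$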
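As an illustration of Equation (14), the two betas can be computed from aligned time series of stock and market returns and spreads. The sketch below is a minimal example under stated assumptions: the function name `liquidity_betas` is hypothetical, sample (ddof=1) moments are used, and the inputs are assumed to be already aligned in time (whether they are levels or innovations is left to the caller).

```python
import numpy as np

def liquidity_betas(r_i, s_i, r_m, s_m):
    """Return (beta3, beta4) as defined in Equation (14).

    r_i, s_i: return and spread series of stock i.
    r_m, s_m: return and spread series of the market portfolio.
    Assumption: all four series have equal length and matching dates.
    """
    r_i, s_i, r_m, s_m = (np.asarray(a, dtype=float) for a in (r_i, s_i, r_m, s_m))
    # Common denominator: variance of the market return net of the market spread.
    denom = np.var(r_m - s_m, ddof=1)
    beta3 = np.cov(r_i, s_m, ddof=1)[0, 1] / denom  # stock return vs. market spread
    beta4 = np.cov(s_i, r_m, ddof=1)[0, 1] / denom  # stock spread vs. market return
    return beta3, beta4
```

Running the same function once with benchmark spreads and once with estimated spreads gives the pairs of betas whose cross-sectional correlations are reported in Table 10.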
Table 10
Cross-sectional correlations of estimated systematic liquidity risks with the ones of TAQ benchmark
                       ρ(β2^ES, β2^Estimates)     ρ(β3^ES, β3^Estimates)     ρ(β4^ES, β4^Estimates)
              N        CHL      HL       Roll     CHL      HL       Roll     CHL      HL       Roll
A. Cross-section of estimated systematic risks: All stocks
Full period   5,547    0.830    0.744    0.652    0.935    0.919    0.838    0.755    0.673    0.459
2003–2007     4,119    0.501    0.457    0.170    0.670    0.658*   0.407    0.532    0.439    0.214
2008–2011     3,574    0.736    0.604    0.354    0.971    0.970*   0.896    0.557    0.428    0.296
2012–2015     3,268    0.325    −0.032   0.078    0.545    0.327    0.298    0.571    0.510    0.243
B. Cross-section of estimated systematic risks considering liquidity shocks of AR(2) model
Full period   5,433    0.524    0.446    0.158    0.856    0.879    0.563    0.530    0.266    0.234
2003–2007     4,010    0.285    0.152    −0.078   0.808*   0.812    0.595    0.364    0.147    0.064
2008–2011     3,654    0.398    0.265    −0.031   0.893    0.913    0.627    0.520    0.107    0.142
2012–2015     3,231    0.137    0.126*   0.080    0.724    0.752    0.398    0.295    0.221    0.148
26 We repeat the analysis using 25 portfolios sorted by illiquidity level, as in Acharya and Pedersen (2005). As
reported in the Internet Appendix, the results are fully consistent.
The results in panel B of Table 10 confirm the high accuracy of CHL estimates
in gauging systematic liquidity risks using spread innovations. The correlation
coefficients between the estimates of β2, β3, and β4 from our model and those from
the TAQ spreads are 0.524, 0.856, and 0.530, whereas the same correlations for HL
estimates are 0.446, 0.879, and 0.266, and those for the Roll estimates are 0.158,
0.563, and 0.234, respectively. The subsampling analysis across shorter periods
delivers consistent results, confirming that CHL estimates provide systematic
risk estimates that follow the ones of the TAQ benchmark more closely,
regardless of whether transaction costs are measured in levels or innovations.
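Spread innovations of the kind used in panel B of Table 10 can be obtained as residuals of an AR(2) model. The sketch below is only an illustration: it assumes the AR(2) model is fitted by ordinary least squares with an intercept, which the text does not spell out, and the function name `ar2_innovations` is hypothetical.

```python
import numpy as np

def ar2_innovations(series):
    """Residuals from regressing x_t on a constant, x_{t-1}, and x_{t-2}.

    Assumption: `series` is a 1-D array of spreads without gaps; the
    first two observations are lost to the lags.
    """
    x = np.asarray(series, dtype=float)
    y = x[2:]
    X = np.column_stack([np.ones(y.size), x[1:-1], x[:-2]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    # The unexplained part of each observation is treated as the liquidity shock.
    return y - X @ coef
```

The resulting innovation series would then replace the spread levels when the covariances behind the liquidity betas are computed.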
As in Section 3, we repeat the subsampling analysis across primary
exchange, market capitalization, and effective spread size (panels C, D,
and E).27 Overall, our estimator outperforms the other measures when stocks
5. Conclusion
Building on the seminal model proposed by Roll (1984), we have derived
a new way to estimate bid-ask spreads using price data. Compared with the
Roll measure, our model has two important benefits: First, it takes advantage
of a richer information set of daily close, high, and low prices, whereas the
Roll measure relies solely on close prices. Our model thereby improves
estimation accuracy. From the high and low prices, we can compute the
mid-range, that is, the mean of the daily high and low log-prices, which proxies
the efficient price. Second, our estimator is fully independent of order-flow
dynamics, and therefore it does not rely on bid-ask bounces, as the original Roll
measure does. Our method of estimating effective spreads is straightforward,
is easy to compute, and has an intuitive closed-form solution that resembles
the Roll measure. Whereas the Roll measure relies on the covariance of
consecutive close-to-close price returns, our estimator relies on the covariance
of close-to-mid-range returns around the same close price.
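The closed-form estimator summarized above can be sketched in a few lines. The example assumes the relation s² = 4E[(c_t − η_t)(c_t − η_{t+1})], with c_t the log close and η_t the mid-range of log high and low, as used in the appendix. Flooring negative values of the estimated s² at zero is an assumption made here for illustration, and the function name `chl_spread` is hypothetical.

```python
import numpy as np

def chl_spread(close, high, low):
    """Spread estimate from daily close, high, and low prices.

    Assumptions: inputs are positive price arrays of equal length in
    chronological order; a negative estimate of s^2 is set to zero.
    """
    c = np.log(np.asarray(close, dtype=float))
    eta = (np.log(np.asarray(high, dtype=float))
           + np.log(np.asarray(low, dtype=float))) / 2.0
    # (c_t - eta_t)(c_t - eta_{t+1}) for each pair of consecutive days.
    products = (c[:-1] - eta[:-1]) * (c[:-1] - eta[1:])
    s_squared = 4.0 * products.mean()
    return np.sqrt(max(s_squared, 0.0))
```

Because only the close of day t and the mid-ranges of days t and t+1 enter each product, the function needs at least two days of data and returns a proportional spread (e.g., 0.02 for 2%).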
We tested our method numerically and empirically by using Trade and Quotes
(TAQ) data. The simulation analysis shows that considering all imperfections
together (i.e., infrequent trading, inconstant spreads, and nontrading periods),
our model provides more accurate estimates than those from the high-low
estimator proposed by Corwin and Schultz (2012) and the Roll model for
less liquid securities, for which transaction costs and liquidity issues are of
much greater concern. In the empirical analysis, the effective spread computed
with TAQ data serves as the benchmark for our comparative considerations.
When end-of-day quote data are available, that is, from 1993 onwards, the
closing percentage quoted spread generally represents the most accurate
low-frequency spread proxy. This is especially true in the post-decimalization
era of the U.S. stock market, from 2001 onwards. Before 2001, the closing
percentage quoted spread (our estimator) outperforms the other estimators in
terms of average time-series correlations (average cross-sectional correlations
and lowest estimation errors).
27 To facilitate comparisons, we use the same quintile groups as in Section 3. However, here we remove a few
more stocks that have fewer than 30 months of data.
On the other hand, when quote data are unavailable, our estimator is the
most accurate one. Assessed against other estimates, it generally provides
the highest cross-sectional and average time-series correlation with the TAQ
$$
E\left[\left(h^{e}_{t} - c^{e}_{t}\right)^{2}\right] = E\left[\left(l^{e}_{t} - c^{e}_{t}\right)^{2}\right] = \sigma_e^{2}. \tag{A3}
$$
Plugging (A2) and (A3) into (A1) leads to the proof.
Proof of Proposition 2
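Now we use the two propositions for the proof of Proposition 2 of the paper. The stepwise proof
is as follows:
$$
\begin{aligned}
E\left[\left(c_t - (\eta_t + \eta_{t+1})/2\right)^{2}\right]
&= E\left[\left(c^{e}_{t} + q_t s/2 - \eta_t/2 - \eta_{t+1}/2\right)^{2}\right] && \text{(A7)} \\
&= E\Big[q_t^{2} s^{2}/4 + \tfrac{1}{4}\left(c^{e}_{t} - \eta_t\right)^{2} + \tfrac{1}{4}\left(c^{e}_{t} - \eta_{t+1}\right)^{2} + \tfrac{1}{2}\left(c^{e}_{t} - \eta_{t+1}\right)\left(c^{e}_{t} - \eta_t\right) \\
&\qquad + \tfrac{q_t s}{2}\left(c^{e}_{t} - \eta_{t+1}\right) + \tfrac{q_t s}{2}\left(c^{e}_{t} - \eta_t\right)\Big], && \text{(A8)} \\
&= s^{2}/4 + \tfrac{1}{2} E\left[\left(c^{e}_{t} - h^{e}_{t}/2 - l^{e}_{t}/2\right)^{2}\right], && \text{(A9)} \\
&= s^{2}/4 + \tfrac{1}{2} E\left[\tfrac{1}{4}\left(c^{e}_{t} - h^{e}_{t}\right)^{2} + \tfrac{1}{4}\left(c^{e}_{t} - l^{e}_{t}\right)^{2} + \tfrac{1}{2}\left(c^{e}_{t} - l^{e}_{t}\right)\left(c^{e}_{t} - h^{e}_{t}\right)\right], && \text{(A10)}
\end{aligned}
$$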
Proof of Proposition 3
The proof for Proposition 3 of the paper is similar to that of Proposition 2:
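$$
\begin{aligned}
E\left[\left(\eta_{t+1} - \eta_t\right)^{2}\right]
&= E\left[\left(\eta_{t+1} - c^{e}_{t} + c^{e}_{t} - \eta_t\right)^{2}\right], && \text{(A12)} \\
&= 2E\left[\left(c^{e}_{t} - \eta_t\right)^{2}\right] = 2E\left[\left(c^{e}_{t} - h^{e}_{t}/2 - l^{e}_{t}/2\right)^{2}\right], && \text{(A13)} \\
&= 2E\left[\tfrac{1}{4}\left(c^{e}_{t} - h^{e}_{t}\right)^{2} + \tfrac{1}{4}\left(c^{e}_{t} - l^{e}_{t}\right)^{2} + \tfrac{1}{2}\left(c^{e}_{t} - l^{e}_{t}\right)\left(c^{e}_{t} - h^{e}_{t}\right)\right], && \text{(A14)} \\
&= (2 - 2\log(2))\,\sigma_e^{2}. && \text{(A15)}
\end{aligned}
$$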
Equation (A13) is the result of Proposition A2, and, finally, we derive Equation (A15) using
Proposition A1.
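As a worked check of the last step, suppose Proposition A1 supplies the cross-moment $E[(c^{e}_{t} - l^{e}_{t})(c^{e}_{t} - h^{e}_{t})] = (1 - 2\log 2)\,\sigma_e^{2}$. This value is an assumption here: it is the one implied by comparing (A14) with (A15) and is not restated in this part of the appendix. Combined with the second moments in (A3), the expectation in (A14) evaluates as

$$
E\left[\left(c^{e}_{t} - \eta_t\right)^{2}\right]
= \tfrac{1}{4}\sigma_e^{2} + \tfrac{1}{4}\sigma_e^{2} + \tfrac{1}{2}(1 - 2\log 2)\,\sigma_e^{2}
= (1 - \log 2)\,\sigma_e^{2},
$$

so that $2E[(c^{e}_{t} - \eta_t)^{2}] = (2 - 2\log 2)\,\sigma_e^{2} \approx 0.614\,\sigma_e^{2}$, in line with (A15). The same moment inserted into (A9) gives $s^{2}/4 + (1/2 - \log(2)/2)\,\sigma_e^{2}$, which agrees with Equation (B2) when the nontrading variance is zero.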
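Definition B1. The nontrading period (e.g., overnight) efficient-price variance is defined as
follows:
$$
\sigma^{2}_{\mathrm{Nontrading}} = E\left[\left(o^{e}_{t+1} - c^{e}_{t}\right)^{2}\right]. \tag{B1}
$$
Proposition B1. If we consider a price movement during nontrading periods with the variance
of $\sigma^{2}_{\mathrm{Nontrading}}$, Equation (B2) holds:
$$
E\left[\left(c_t - (\eta_t + \eta_{t+1})/2\right)^{2}\right] = s^{2}/4 + (1/2 - \log(2)/2)\,\sigma_e^{2} + \tfrac{1}{4}\sigma^{2}_{\mathrm{Nontrading}}. \tag{B2}
$$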
Proof of Proposition B1: The proof is similar to the proof of Proposition 2, which is explained in
Appendix A. The only difference arises because the distance between the efficient close price of day
t and the efficient high (low) price of day t+1 is larger than the distance between the efficient close
price of day t and the efficient high (low) price on the same day. Therefore, Equation (A5) no longer
holds, and, instead, Equation (B3) shows the link between the two quantities. Using Equation (B3)
and following the steps of the proof in Appendix A leads to the proof of Proposition B1.
$$
\begin{aligned}
E\left[\left(c^{e}_{t} - h^{e}_{t+1}\right)^{2}\right]
&= E\left[\left(c^{e}_{t} - o^{e}_{t+1} + o^{e}_{t+1} - h^{e}_{t+1}\right)^{2}\right]
= \sigma^{2}_{\mathrm{Nontrading}} + E\left[\left(o^{e}_{t+1} - h^{e}_{t+1}\right)^{2}\right] \\
&= \sigma^{2}_{\mathrm{Nontrading}} + E\left[\left(c^{e}_{t} - h^{e}_{t}\right)^{2}\right]. && \text{(B3)}
\end{aligned}
$$
Proposition B2. If we consider a price movement during nontrading periods (e.g., overnight)
with the variance of $\sigma^{2}_{\mathrm{Nontrading}}$, Equation (B4) holds:
$$
E\left[\left(\eta_{t+1} - \eta_t\right)^{2}\right] = (2 - 2\log(2))\,\sigma_e^{2} + \sigma^{2}_{\mathrm{Nontrading}}. \tag{B4}
$$
Proof of Proposition B2: The proof is very similar to the proof of Proposition B1.
Proposition C1. Theorem 1 still holds if the assumptions of buyer- (seller-) initiated high (low)
prices are replaced with the following assumptions:
(1) The trade directions of high and low prices are independent of those of the previous day.
(2) The trade directions of high and low prices are independent of those of close prices.
(3) The chance of the high price being buyer-initiated is equal to the chance of the low price being
seller-initiated.28 This symmetry between the two trade directions is specified more formally
in Equation (C3).
$$
E\left[q^{h}_{t}\right] = -E\left[q^{l}_{t}\right] \tag{C3}
$$
Proof of Proposition C1: We start from the right-hand side of Equation (9) and replace the
observed close, high, and low prices with the right-hand sides of Equations (1), (C1), and (C2).
Using the assumptions that the efficient price path and trade directions are independent of each
other, and the expected symmetry in efficient log-price movements, one can derive Equation
(C4):
$$
4E\left[(c_t - \eta_t)(c_t - \eta_{t+1})\right]
= s^{2} E\left[q_t^{2} + \tfrac{1}{4}\left(q^{h}_{t} + q^{l}_{t}\right)\left(q^{h}_{t+1} + q^{l}_{t+1}\right) - \tfrac{1}{2} q_t\left(q^{h}_{t} + q^{l}_{t} + q^{h}_{t+1} + q^{l}_{t+1}\right)\right]. \tag{C4}
$$
Then, using the assumptions in Proposition C1, the expectation term on the right-hand side of
Equation (C4) reduces to $E[q_t^{2}]$, which recovers Equation (9) of the paper. It is important
to note that $q^{h}_{t}$ and $q^{l}_{t}$ refer to the trade direction of observed rather than efficient high (low)
prices. Hence, Equation (C3) does not necessarily impose dependence between trade directions
and efficient price values. More specifically, while trade directions can be independent of the
efficient price path, the high (low) observed trades might more often reflect buyer- (seller-)
initiated trades because these trades are more likely to be selected as high (low) observed
prices.
28 As shown in the Internet Appendix, the analysis of Daily TAQ data provides empirical support for this assumption.
References
Acharya, V. V., and L. H. Pedersen. 2005. Asset pricing with liquidity risk. Journal of Financial Economics
77:375–410.
Amihud, Y. 2002. Illiquidity and stock returns: Cross section and time-series effects. Journal of Financial Markets
5:31–56.
Amihud, Y., and H. Mendelson. 1986. Asset pricing and the bid-ask spread. Journal of Financial Economics
17:223–49.
Asparouhova, E. N., H. Bessembinder, and I. Kalcheva. 2010. Liquidity biases in asset pricing tests. Journal of
Financial Economics 96:215–37.
———. 2013. Noisy prices and inference regarding returns. Journal of Finance 68:665–714.
Beckers, S. 1983. Variance of security price returns based on high, low, and closing prices. Journal of Business
56:97–112.
Chordia, T., R. Roll, and A. Subrahmanyam. 2000. Commonality in liquidity. Journal of Financial Economics
56:3–28.
Chung, K. H., and H. Zhang. 2014. A simple approximation of intraday spreads using daily data. Journal of
Financial Markets 17:94–120.
Corwin, S. A., and P. Schultz. 2012. A simple way to estimate bid-ask spreads from daily high and low prices.
Journal of Finance 67:719–59.
Fong, K., C. W. Holden, and C. A. Trzcinka. 2017. What are the best liquidity proxies for global research? Review
of Finance Forthcoming.
Garman, M. B., and M. J. Klass. 1980. On the estimation of security price volatilities from historical data. Journal
of Business 53:67–78.
Goyenko, R. Y., C. W. Holden, and C. A. Trzcinka. 2009. Do liquidity measures measure liquidity? Journal of
Financial Economics 92:153–81.
Hameed, A., W. Kang, and S. Viswanathan. 2010. Stock market declines and liquidity. Journal of Finance
65:257–93.
Harris, L. E. 1989. A day-end transaction price anomaly. Journal of Financial and Quantitative Analysis 24:
29–45.
———. 1990. Statistical properties of the Roll serial covariance bid/ask spread estimator. Journal of Finance
45:579–90.
Hasbrouck, J. 2004. Liquidity in the futures pits: Inferring market dynamics from incomplete data. Journal of
Financial and Quantitative Analysis 39:305–26.
———. 2009. Trading costs and returns for US equities: the evidence from daily data. Journal of Finance
64:1445–77.
Hasbrouck, J., and T. S. Y. Ho. 1987. Order arrival, quote behavior, and the return-generating process.
Journal of Finance 42:1035–48.
Hasbrouck, J., and D. J. Seppi. 2001. Common factors in prices, order flows, and liquidity. Journal of Financial
Economics 59:383–411.
Holden, C. W. 2009. New low-frequency liquidity measures. Journal of Financial Markets 12:778–813.
Holden, C. W., and S. Jacobsen. 2014. Liquidity measurement problems in fast, competitive markets: expensive
and cheap solutions. Journal of Finance 69:1747–85.
Holden, C. W., S. Jacobsen, and A. Subrahmanyam. 2014. The empirical analysis of liquidity. Foundations and
Trends in Finance 8:263–365.
Huberman, G., and D. Halka. 2001. Systematic liquidity. Journal of Financial Research 24:161–78.
Kamara, A., X. Lou, and R. Sadka. 2008. The divergence of liquidity commonality in the cross-section of stocks.
Journal of Financial Economics 89:444–66.
Karolyi, G. A., K.-H. Lee, and M. A. van Dijk. 2012. Understanding commonality in liquidity around the world.
Journal of Financial Economics 105:82–112.
Lee, C. M. C., and M. J. Ready. 1991. Inferring trade direction from intraday data. Journal of Finance 46:733–46.
Lesmond, D. A., J. P. Ogden, and C. A. Trzcinka. 1999. A new estimate of transaction costs. Review of Financial
Studies 12:1113–41.
McInish, T. H., and R. A. Wood. 1990. An analysis of transactions data for the Toronto Stock Exchange: Return
patterns and the end-of-day effect. Journal of Banking and Finance 14:441–58.
Parkinson, M. 1980. The extreme value method for estimating the variance of the rate of return. Journal of
Business 53:61–65.
Pastor, L., and R. F. Stambaugh. 2003. Liquidity risk and expected stock returns. Journal of Political Economy
111:642–85.
Roll, R. 1984. A simple implicit measure of the effective bid-ask spread in an efficient market. Journal of Finance
39:1127–39.
Stoll, H., and R. E. Whaley. 1983. Transaction costs and the small firm effects. Journal of Financial Economics
12:57–79.
Watanabe, A., and M. Watanabe. 2008. Time-varying liquidity risk and the cross section of stock returns. Review
of Financial Studies 21:2449–86.