VolatilityMatrix JASA
$w^T \Sigma_{t,\tau}\, w$, \qquad (2)

where $w^T \mathbf{1} = 1$ and

$$\Sigma_{t,\tau} = E_t \int_t^{t+\tau} S_u \, du, \qquad (3)$$
with $E_t$ denoting the conditional expectation given the history up to time $t$. Let $w^+$ be the proportion of long positions and $w^-$ be the proportion of short positions. Then, $\|w\|_1 = w^+ + w^-$ is
the gross exposure of the portfolio. To simplify the problem,
following Jagannathan and Ma (2003) and Fan et al. (2011),
we consider only the risk optimization problem. In practice, the
expected return constraint can be replaced by the constraints
of sectors or industries, to avoid unreliable estimates of the
expected return vector. For a short-time horizon, the expected
return is usually negligible. Following Fan et al. (2011), we
consider the following risk optimization under gross-exposure
constraints:
$$\min_{w}\ w^T \Sigma_{t,\tau}\, w, \quad \text{s.t. } \|w\|_1 \le c \ \text{ and } \ w^T \mathbf{1} = 1, \qquad (4)$$
where c is the total exposure allowed. Note that using $w^+ - w^- = 1$, the gross-exposure constraint is equivalent to the short-sale constraint $w^- \le (c-1)/2$. As noted by
Jagannathan and Ma (2003), the constrained optimization prob-
lem (4) is equivalent to unconstrained risk optimization with
a regularized covariance matrix. Other methods of regulariza-
tion are also possible to handle the noise-accumulation problem
(e.g., the shrinkage method of Ledoit and Wolf (2004)).
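To make the optimization in (4) concrete, here is a minimal numpy sketch for the no-short-sale case c = 1, where the constraint set reduces to the probability simplex and projected gradient descent applies. The function names and the example covariance matrix are our own illustration, not from the paper.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def min_risk_no_short(sigma, n_iter=5000, step=0.05):
    """Projected gradient descent for min w' Sigma w over the simplex
    (the c = 1, no-short-sale case of problem (4))."""
    p = sigma.shape[0]
    w = np.full(p, 1.0 / p)              # start from the equal-weight portfolio
    for _ in range(n_iter):
        w = project_simplex(w - step * 2.0 * sigma @ w)  # gradient of w'Sw is 2Sw
    return w

# Example: uncorrelated assets with variances 1, 2, 4; the interior optimum
# puts weight proportional to the inverse variances, i.e. (4/7, 2/7, 1/7).
w = min_risk_no_short(np.diag([1.0, 2.0, 4.0]))
```

For c > 1 the feasible set is the intersection of $\|w\|_1 \le c$ with the hyperplane $w^T \mathbf{1} = 1$, for which a general quadratic-programming solver would be used instead.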
The problem (4) involves the conditional expected volatility matrix (3) in the future, which cannot be computed unless we know exactly the dynamics of the volatility process; these dynamics are usually unknown, even if we observed the entire continuous paths up to the current time t. As a result, we rely on an approximation, even with the ideal data in which we were able to observe the processes continuously without error. The typical approximation is
$$\Sigma_{t,\tau} \approx \tau h^{-1} \int_{t-h}^{t} S_u \, du \qquad (5)$$
414 Journal of the American Statistical Association, March 2012
for an appropriate window width h, and we estimate $\int_{t-h}^{t} S_u \, du$ based on the historical data in the time interval $[t-h, t]$.
Approximation (5) holds reasonably well when $\tau$ and h are both small. This relies on the continuity assumption: the local time-varying volatility matrices are continuous in time. The approximation is also reasonable when both $\tau$ and h are large. This relies on the ergodicity assumption, so that both quantities are approximately $E S_u$ when the stochastic volatility matrix $S_u$ is ergodic. The approximation is not good when $\tau$ is small whereas h is large, as long as $S_u$ is time varying, whether the stochastic volatility $S_u$ is stationary or not. In other words, when the holding time horizon is short, as long as $S_u$ is time varying, we can only use a short time window $[t-h, t]$ to estimate $\Sigma_{t,\tau}$. The recent arrival of high-frequency data makes this estimation feasible.
The approximation error in (5) cannot usually be evaluated unless we have a specific parametric model for the stochastic volatility matrix $S_u$. However, this comes at the risk of model misspecification, and a nonparametric approach is usually preferred for high-frequency data. With $p^2$ elements approximated, which can be on the order of hundreds of thousands or millions, a natural question to ask is whether these errors accumulate and whether the result (risk) is stable. The gross-exposure constraint gives a stable solution to the problem, as shown by Fan et al. (2011).
We would like to close this section by noting that formulation
(4) is a one-period, not a multi-period, portfolio optimization
problem.
2.2 Risk Approximations With Gross-Exposure
Constraints
The utility of the gross-exposure constraint can easily be seen through the following inequality. Let $\hat{\Sigma}_{t,\tau}$ be an estimated covariance matrix and

$$\hat{R}_{t,\tau}(w) = w^T \hat{\Sigma}_{t,\tau}\, w \qquad (6)$$

be the estimated risk of the portfolio. Then, for any portfolio with gross exposure $\|w\|_1 \le c$, we have

$$|\hat{R}_{t,\tau}(w) - R_{t,\tau}(w)| \le \sum_{i=1}^{p} \sum_{j=1}^{p} |\hat{\sigma}_{i,j} - \sigma_{i,j}|\, |w_i|\, |w_j| \le \|\hat{\Sigma}_{t,\tau} - \Sigma_{t,\tau}\|_{\infty}\, \|w\|_1^2 \le \|\hat{\Sigma}_{t,\tau} - \Sigma_{t,\tau}\|_{\infty}\, c^2, \qquad (7)$$

where $\hat{\sigma}_{i,j}$ and $\sigma_{i,j}$ are, respectively, the $(i,j)$ elements of $\hat{\Sigma}_{t,\tau}$ and $\Sigma_{t,\tau}$, and $\|\hat{\Sigma}_{t,\tau} - \Sigma_{t,\tau}\|_{\infty} = \max_{i,j} |\hat{\sigma}_{i,j} - \sigma_{i,j}|$ is the maximum elementwise estimation error. Risk approximation (7) reveals that there is no large error-accumulation effect when the gross exposure c is moderate.

From now on, we drop the dependence on t and $\tau$ whenever there is no confusion. This simplifies the notation.
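The chain of bounds in (7) is elementary to verify numerically. The sketch below, with made-up matrices and our own variable names, checks that the risk discrepancy never exceeds the maximum elementwise error times the squared gross exposure.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 20
A = rng.standard_normal((p, p))
sigma = A @ A.T / p                                  # the "true" covariance matrix
noise = rng.standard_normal((p, p))
sigma_hat = sigma + 0.01 * (noise + noise.T) / 2.0   # a perturbed estimate

w = rng.standard_normal(p)
w = w - w.mean() + 1.0 / p        # enforce w' 1 = 1; short positions allowed
c = np.abs(w).sum()               # this portfolio's gross exposure ||w||_1

risk_gap = abs(w @ (sigma_hat - sigma) @ w)          # |R_hat(w) - R(w)|
bound = np.abs(sigma_hat - sigma).max() * c ** 2     # right-hand side of (7)
```

The inequality holds for any symmetric perturbation, which is the point of (7): the elementwise errors cannot accumulate beyond the $c^2$ factor.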
Fan et al. (2011) showed further that the risks of optimal portfolios are indeed close. Let

$$w_{\mathrm{opt}} = \arg\min_{w^T \mathbf{1} = 1,\ \|w\|_1 \le c} R(w), \qquad \hat{w}_{\mathrm{opt}} = \arg\min_{w^T \mathbf{1} = 1,\ \|w\|_1 \le c} \hat{R}(w) \qquad (8)$$

be, respectively, the theoretical (oracle) optimal allocation vector we want and the estimated optimal allocation vector we get. Then, $R(w_{\mathrm{opt}})$ is the theoretical minimum risk and $R(\hat{w}_{\mathrm{opt}})$ is the actual risk of our selected portfolio, whereas $\hat{R}(\hat{w}_{\mathrm{opt}})$ is our perceived risk, which is the quantity known to financial econometricians. They showed that

$$|R(\hat{w}_{\mathrm{opt}}) - R(w_{\mathrm{opt}})| \le 2 a_p c^2, \qquad (9)$$
$$|R(\hat{w}_{\mathrm{opt}}) - \hat{R}(\hat{w}_{\mathrm{opt}})| \le a_p c^2, \qquad (10)$$
$$|R(w_{\mathrm{opt}}) - \hat{R}(\hat{w}_{\mathrm{opt}})| \le a_p c^2, \qquad (11)$$

with $a_p = \|\hat{\Sigma} - \Sigma\|_{\infty}$.
Given estimates of the integrated volatilities of the sum and difference processes $X^{(i)} + X^{(j)}$ and $X^{(i)} - X^{(j)}$, the integrated covariation can be estimated by polarization as

$$\hat{\sigma}_{i,j} = \widehat{\left\langle X^{(i)}, X^{(j)} \right\rangle} = \left( \widehat{\left\langle X^{(i)} + X^{(j)} \right\rangle} - \widehat{\left\langle X^{(i)} - X^{(j)} \right\rangle} \right) / 4. \qquad (12)$$
In particular, the diagonal elements are estimated by the method
itself. When the two-scale realized volatility (TSRV) (Zhang et
al. 2005) is used, this results in the two-scale realized covariance
(TSCV) estimate (Zhang 2011).
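The polarization identity behind (12) is exact for quadratic-variation-type estimators applied to synchronized data. The toy check below uses the plain realized (co)variance for clarity; all function names are our own.

```python
import numpy as np

def realized_var(x):
    """Realized variance: sum of squared log-price increments."""
    d = np.diff(x)
    return np.sum(d * d)

def realized_cov_polarized(x, y):
    """Estimate the covariation via (12): (<x+y> - <x-y>) / 4."""
    return (realized_var(x + y) - realized_var(x - y)) / 4.0

rng = np.random.default_rng(1)
x = np.cumsum(rng.standard_normal(1000)) * 0.01   # a toy log-price path
y = np.cumsum(rng.standard_normal(1000)) * 0.01

direct = np.sum(np.diff(x) * np.diff(y))   # realized covariance, computed directly
polar = realized_cov_polarized(x, y)       # identical by the algebraic identity
```

Since $((dx+dy)^2 - (dx-dy)^2)/4 = dx\,dy$ termwise, the two quantities agree up to floating-point rounding.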
3.2 Pairwise-Refresh Method and TSCV
We now focus on the pairwise estimation method. To facilitate the presentation, we reintroduce the notation.
We consider two log-price processes X and Y that satisfy

$$dX_t = \mu_t^{(X)}\, dt + \sigma_t^{(X)}\, dB_t^{(X)} \quad \text{and} \quad dY_t = \mu_t^{(Y)}\, dt + \sigma_t^{(Y)}\, dB_t^{(Y)}, \qquad (13)$$
where $\mathrm{cor}(B_t^{(X)}, B_t^{(Y)}) = \rho_t^{(X,Y)}$. For the two processes X and Y, consider the problem of estimating $\langle X, Y \rangle_T$ with T = 1. Denote by $\mathcal{T}_n$ the observation times of X and by $\mathcal{S}_m$ the observation times of Y. Denote the elements in $\mathcal{T}_n$ and $\mathcal{S}_m$ by $\{\tau_{n,i}\}_{i=0}^{n}$ and $\{\theta_{m,i}\}_{i=0}^{m}$, respectively, in ascending order ($\tau_{n,0}$ and $\theta_{m,0}$ are set to be
0). The actual log prices are not directly observable, but are
observed with microstructure noises:
$$X^{o}_{\tau_{n,i}} = X_{\tau_{n,i}} + \epsilon^{X}_{i} \quad \text{and} \quad Y^{o}_{\theta_{m,i}} = Y_{\theta_{m,i}} + \epsilon^{Y}_{i}, \qquad (14)$$
where $X^o$ and $Y^o$ are the observed transaction prices on the logarithmic scale, and X and Y are the latent log prices governed by the stochastic dynamics (13). We assume that the microstructure noise processes $\{\epsilon^{X}_{i}\}$ and $\{\epsilon^{Y}_{i}\}$ are independent of the X and Y processes and that

$$\epsilon^{X}_{i} \overset{\mathrm{i.i.d.}}{\sim} N\big(0, \eta_X^2\big) \quad \text{and} \quad \epsilon^{Y}_{i} \overset{\mathrm{i.i.d.}}{\sim} N\big(0, \eta_Y^2\big). \qquad (15)$$

Note that this assumption is mainly for simplicity of presentation; as can be seen from the proof, one can, for example, easily replace the identical Gaussian assumption with a not necessarily identically distributed (but equal-variance) sub-Gaussian assumption without affecting our results.
The pairwise-refresh times $\mathcal{V} = \{v_0, v_1, \ldots, v_{\tilde{n}}\}$ can be obtained by setting $v_0 = 0$ and

$$v_i = \max\left\{ \min\{\tau \in \mathcal{T}_n : \tau > v_{i-1}\},\ \min\{\theta \in \mathcal{S}_m : \theta > v_{i-1}\} \right\},$$

where $\tilde{n}$ is the total number of refresh times in the interval (0, 1]. The actual sample times for the two individual processes X and Y that correspond to the refresh times are

$$t_i = \max\{\tau \in \mathcal{T}_n : \tau \le v_i\} \quad \text{and} \quad s_i = \max\{\theta \in \mathcal{S}_m : \theta \le v_i\},$$

which are indeed the previous-tick measurements.
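The refresh-time construction and the previous-tick sampling above can be coded directly. This is a sketch with our own function names, operating on sorted observation-time lists.

```python
import bisect

def pairwise_refresh_times(T, S):
    """Refresh times v_1 < v_2 < ... after v_0 = 0: each v_i is the first
    time at which BOTH processes have traded again since v_{i-1}."""
    v, prev = [], 0.0
    i = j = 0
    while True:
        while i < len(T) and T[i] <= prev:
            i += 1
        while j < len(S) and S[j] <= prev:
            j += 1
        if i == len(T) or j == len(S):
            return v
        prev = max(T[i], S[j])
        v.append(prev)

def previous_tick(times, v):
    """For each refresh time, the last observation time not exceeding it."""
    return [times[bisect.bisect_right(times, vi) - 1] for vi in v]

T = [1, 3, 4, 7]   # observation times of X
S = [2, 3, 5, 6]   # observation times of Y
v = pairwise_refresh_times(T, S)   # [2, 3, 5, 7]
t = previous_tick(T, v)            # [1, 3, 4, 7]
s = previous_tick(S, v)            # [2, 3, 5, 6]
```

Note that the previous-tick prices $X^o_{t_i}$ and $Y^o_{s_i}$ are what enter the TSCV estimator below.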
We study the property of the TSCV based on the asynchronous data:

$$\widehat{\langle X, Y \rangle}_1 = [X^o, Y^o]^{(K)}_1 - \frac{\bar{n}_K}{\bar{n}_J}\, [X^o, Y^o]^{(J)}_1, \qquad (16)$$

where

$$[X^o, Y^o]^{(K)}_1 = \frac{1}{K} \sum_{i=K}^{\tilde{n}} \left( X^o_{t_i} - X^o_{t_{i-K}} \right) \left( Y^o_{s_i} - Y^o_{s_{i-K}} \right)$$

and $\bar{n}_K = (\tilde{n} - K + 1)/K$. As discussed by Zhang (2011), the optimal choice of K has order $K = O(\tilde{n}^{2/3})$, and J can be taken to be a constant such as 1.
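A direct transcription of (16) is short; here is a numpy sketch with our own names. The example exercises the diagonal, where TSCV reduces to TSRV and the microstructure-noise bias of the naive realized variance is visible; the noise level 0.0005 mimics the paper's simulation design.

```python
import numpy as np

def lagged_cov(x, y, K):
    """[X, Y]^(K): average lag-K realized covariance of synchronized series."""
    dx, dy = x[K:] - x[:-K], y[K:] - y[:-K]
    return np.sum(dx * dy) / K

def tscv(x, y, K, J=1):
    """Two-scale realized covariance (16); x, y are previous-tick log prices
    sampled at the refresh times, so len(x) == len(y) == n_tilde + 1."""
    n_tilde = len(x) - 1
    nbar_K = (n_tilde - K + 1) / K
    nbar_J = (n_tilde - J + 1) / J
    return lagged_cov(x, y, K) - (nbar_K / nbar_J) * lagged_cov(x, y, J)

# Diagonal sanity check: a latent Brownian path with integrated variance 0.04,
# contaminated by N(0, 0.0005^2) noise.
rng = np.random.default_rng(2)
n = 23400
latent = np.cumsum(rng.normal(0.0, 0.2 / np.sqrt(n), n))
obs = latent + rng.normal(0.0, 0.0005, n)
K = int(n ** (2 / 3))                 # K = O(n^{2/3})
rv_naive = lagged_cov(obs, obs, 1)    # biased upward by roughly 2 n eta^2
rv_tsrv = tscv(obs, obs, K)           # close to the true value 0.04
```

Here the naive realized variance carries a bias of about $2 n \eta^2 \approx 0.0117$, while the two-scale estimate removes it.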
When either the microstructure error or the asynchronicity exists, the realized covariance is seriously biased. An asymptotic normality result in Zhang (2011) reveals that TSCV can simultaneously remove the bias due to the microstructure error and the bias due to the asynchronicity. However, this result is not adequate for our application to the vast volatility matrix estimation. To understand its impact on $a_p$, we need to establish a concentration inequality. In particular, for a sufficiently large $|x| = O((\log p)^{\gamma})$, if

$$\max_{i,j} P\left\{ \sqrt{n}\, |\hat{\sigma}_{ij} - \sigma_{ij}| > x \right\} < C_1 \exp\left( -C_2 x^{1/\gamma} \right), \qquad (17)$$

for some positive constants $C_1$, $C_2$, and $\gamma$, then

$$a_p = \|\hat{\Sigma} - \Sigma\|_{\infty} = O_P\left( \frac{(\log p)^{\gamma}}{\sqrt{n}} \right). \qquad (18)$$

We will show in the next section that the result indeed holds for some $\gamma$ which depends on the tail of the volatility process, with n replaced by the minimum of the subsample sizes ($\bar{n}_K \asymp \tilde{n}^{1/3}$). Hence, the impact of the number of assets is limited, being only of logarithmic order.
3.3 Concentration Inequalities
Inequality (17) requires conditions on both the diagonal elements and the off-diagonal elements. Technically, they are treated differently. For the diagonal case, the problem corresponds to the estimation of the integrated volatility, and there is no issue of asynchronicity. TSCV (16) reduces to TSRV (Zhang et al. 2005), which is explicitly given by
$$\widehat{\langle X, X \rangle}_1 = [X^o, X^o]^{(K)}_1 - \frac{\bar{n}_K}{\bar{n}_J}\, [X^o, X^o]^{(J)}_1, \qquad (19)$$

where $[X^o, X^o]^{(K)}_1 = \frac{1}{K} \sum_{i=K}^{n} \big( X^o_{t_i} - X^o_{t_{i-K}} \big)^2$ and $\bar{n}_K = (n - K + 1)/K$. As shown by Zhang et al. (2005), the optimal choice of K has order $K = O(n^{2/3})$, and J can be taken to be a constant such as 1.
To facilitate the reading, we relegate the technical conditions
and proofs to the Appendix. The following two results establish
the concentration inequalities for the integrated volatility and
integrated covariation.
Theorem 1. Let the X process be as in (13), and let n be the total number of observations of the X process during the time interval (0, 1]. Under Conditions 1-4 in the Appendix,

(A) if $\sigma_t^{(X)}$ is bounded above by a constant, then for $0 < x \le c\, n^{1/6}$,

$$P\left\{ n^{1/6} \left| \widehat{\langle X, X \rangle}_1 - \int_0^1 \big(\sigma_t^{(X)}\big)^2\, dt \right| > x \right\} \le 4 \exp\{ -C x^2 \}$$
for positive constants c and C. A set of candidate values
for c and C are given in (A.25).
(B) if the tail behavior of $\sigma_t^{(X)}$ can be described as

$$P\left\{ \sup_{0 \le t \le 1} \sigma_t^{(X)} > C \right\} \le k_{\sigma} \exp\left\{ -a C^b \right\}, \quad \text{for any } C > 0,$$

with positive constants $k_{\sigma}$, a, and b, then for $0 < x \le c\, n^{1/6}$,

$$P\left\{ n^{1/6} \left| \widehat{\langle X, X \rangle}_1 - \int_0^1 \big(\sigma_t^{(X)}\big)^2\, dt \right| > x \right\} \le (4 + k_{\sigma}) \exp\left\{ -C x^{\frac{2b}{4+b}} \right\}.$$
A set of candidate values for c and C are given in (A.26).
Theorem 2. Let X and Y be as in (13), and let $\tilde{n}$ be the total number of refresh times for the processes X and Y during the time interval (0, 1]. Under Conditions 1-5 in the Appendix,

(A) if $\sigma_t^{(i)}$, for i = X and Y, are bounded above by a constant, then for $0 < x \le c\, \tilde{n}^{1/6}$,

$$P\left\{ \tilde{n}^{1/6} \left| \widehat{\langle X, Y \rangle}_1 - \int_0^1 \sigma_t^{(X)} \sigma_t^{(Y)} \rho_t^{(X,Y)}\, dt \right| > x \right\} \le 8 \exp\{ -C x^2 \}$$
for positive constants c and C. A set of candidate values
for c and C are given in (A.29).
(B) if the tail behavior of $\sigma_t^{(i)}$ for i = X or Y satisfies

$$P\left\{ \sup_{0 \le t \le 1} \sigma_t^{(i)} > C \right\} \le k_{\sigma} \exp\left\{ -a C^b \right\}, \quad \text{for any } C > 0,$$

with positive constants $k_{\sigma}$, a, and b, then for $0 < x \le c\, \tilde{n}^{1/6}$,

$$P\left\{ \tilde{n}^{1/6} \left| \widehat{\langle X, Y \rangle}_1 - \int_0^1 \sigma_t^{(X)} \sigma_t^{(Y)} \rho_t^{(X,Y)}\, dt \right| > x \right\} \le (8 + 2 k_{\sigma}) \exp\left\{ -C x^{\frac{2b}{4+b}} \right\}.$$
A set of candidate values for c and C are given in (A.30).
3.4 Error Rates on Risk Approximations
Having established the above concentration inequalities, we can now readily give an upper bound on the risk approximations. Consider the p log-price processes as in Section 2.1. Suppose the processes are observed with market microstructure noises. Let $n^{(i,j)}$ be the observation frequency (number of refresh times) obtained by the pairwise-refresh method for the two processes $X^{(i)}$ and $X^{(j)}$, and let $n^{*}$ be the observation frequency obtained by the all-refresh method. Clearly, $n^{(i,j)}$ is typically much larger than $n^{*}$.
Using (18), an application of Theorems 1 and 2 to each element of the estimated integrated covariance matrix yields

$$a_p^{\mathrm{pairwise\text{-}refresh}} = \|\hat{\Sigma}_{\mathrm{pairwise}} - \Sigma\|_{\infty} = O_P\left( \frac{(\log p)^{\gamma}}{n_{\min}^{1/6}} \right), \qquad (20)$$

where $\gamma$ is $\tfrac{1}{2}$ when the volatility processes are bounded and is a constant depending on the tail behavior of the volatility processes when they are unbounded, and $n_{\min} = \min_{i,j} n^{(i,j)}$ is the minimum number of observations of the pairwise-refresh times.
Note that since our proofs do not rely on any particular properties of the pairwise-refresh times, the results of Theorems 1 and 2 are applicable to the all-refresh method as well, with the observation frequency of the pairwise-refresh times replaced by that of the all-refresh times. Hence, using the all-refresh scheme, we have

$$a_p^{\mathrm{all\text{-}refresh}} = \|\hat{\Sigma}_{\mathrm{all\text{-}refresh}} - \Sigma\|_{\infty} = O_P\left( \frac{(\log p)^{\gamma}}{(n^{*})^{1/6}} \right), \qquad (21)$$

with the same $\gamma$ as above. Clearly, $n_{\min}$ is larger than $n^{*}$. Hence, the pairwise-refresh method gives a somewhat more accurate estimate in terms of the maximum elementwise estimation error.
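The claim that the minimum pairwise-refresh frequency dominates the all-refresh frequency is easy to see empirically: an all-refresh time requires every one of the p assets to trade again, while a pairwise-refresh time waits for only two of them. A small sketch with our own names and synthetic trading times:

```python
import numpy as np

def refresh_times(series):
    """Refresh times for a collection of sorted trading-time arrays: each
    refresh time is the first instant at which every series has a new tick."""
    idx = [0] * len(series)
    v, prev = [], 0.0
    while True:
        for k, s in enumerate(series):
            while idx[k] < len(s) and s[idx[k]] <= prev:
                idx[k] += 1
            if idx[k] == len(s):
                return v
        prev = max(s[idx[k]] for k, s in enumerate(series))
        v.append(prev)

rng = np.random.default_rng(3)
# Four assets trading at different frequencies over a 23,400-second day.
series = [np.sort(rng.uniform(0.0, 23400.0, size)) for size in (500, 1000, 2000, 4000)]

n_all = len(refresh_times(series))              # all-refresh observation count
n_pair_min = min(
    len(refresh_times([series[i], series[j]]))
    for i in range(4) for j in range(i + 1, 4)
)
```

Between two consecutive all-refresh times every pair also refreshes at least once, so the inequality `n_pair_min >= n_all` holds deterministically, not just on average.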
3.5 Projections of Estimated Volatility Matrices
The risk approximations (9)-(11) hold for any solution to (8), whether the matrix $\hat{\Sigma}$ is positive semidefinite or not. However, convex optimization algorithms typically require positive semidefiniteness of the matrix $\hat{\Sigma}$. Yet, the estimates based on elementwise estimation sometimes fail to satisfy this, and even the ones from the all-refresh method can have the same problem when TSCV is applied. This leads to the issue of how to project a symmetric matrix onto the space of positive semidefinite matrices.
There are two intuitive methods for projecting a $p \times p$ symmetric matrix A onto the space of positive semidefinite matrices. Consider the spectral decomposition $A = \Gamma^T \mathrm{diag}(\lambda_1, \ldots, \lambda_p)\, \Gamma$, where $\Gamma$ is an orthogonal matrix consisting of the p eigenvectors. The two intuitively appealing projection methods are

$$A_1^+ = \Gamma^T \mathrm{diag}(\lambda_1^+, \ldots, \lambda_p^+)\, \Gamma, \qquad (22)$$

where $\lambda_j^+$ is the positive part of $\lambda_j$, and

$$A_2^+ = (A + \lambda_{\min}^- I_p)/(1 + \lambda_{\min}^-), \qquad (23)$$

where $\lambda_{\min}^-$ is the negative part of the minimum eigenvalue. For both projection methods, the eigenvectors remain the same as those of A. When A is a positive semidefinite matrix, we obviously have $A_1^+ = A_2^+ = A$.
In applications, we apply the above transformations to the estimated correlation matrix A rather than directly to the volatility matrix estimate $\hat{\Sigma}$. The correlation matrix A has diagonal elements of 1. The resulting matrix under projection method (23) clearly still satisfies this property, whereas the one under projection method (22) does not. As a result, projection method (23) keeps the integrated volatility of each asset intact.
In our initial simulation and empirical studies, we applied both projections. It turns out that there is no significant difference between the two projection methods in terms of results. We decided to apply only projection (23) in all numerical studies.
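Both projections are a few lines of linear algebra. The sketch below, with our own function names, implements (22) and (23); it can be applied to an estimated correlation matrix as described above.

```python
import numpy as np

def project_psd_eigen(A):
    """Method (22): zero out the negative eigenvalues, keep the eigenvectors."""
    lam, gamma = np.linalg.eigh(A)       # numpy convention: A = gamma @ diag(lam) @ gamma.T
    return gamma @ np.diag(np.maximum(lam, 0.0)) @ gamma.T

def project_psd_shrink(A):
    """Method (23): shrink toward the identity just enough to remove negativity.
    Preserves a unit diagonal, hence each asset's own integrated volatility."""
    lam_min_neg = max(0.0, -np.linalg.eigvalsh(A).min())
    p = A.shape[0]
    return (A + lam_min_neg * np.eye(p)) / (1.0 + lam_min_neg)

# An indefinite "correlation" matrix of the kind elementwise estimation can produce.
A = np.array([[ 1.0,  0.9, -0.9],
              [ 0.9,  1.0,  0.9],
              [-0.9,  0.9,  1.0]])
A1 = project_psd_eigen(A)
A2 = project_psd_shrink(A)
```

Note how `project_psd_shrink` maps each diagonal entry $(1 + \lambda_{\min}^-)/(1 + \lambda_{\min}^-) = 1$, the property emphasized in the text, while the eigenvalue-truncation method does not.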
3.6 Comparisons Between Pairwise- and All-Refresh
Methods
The pairwise-refresh method keeps far richer information in
the high-frequency data than the all-refresh method. Thus, it is
Fan, Li, and Yu: Vast Volatility Matrix Estimation Using High-Frequency Data 417
expected to estimate each element more precisely. Yet, the estimated correlation matrix is typically not positive semidefinite. As a result, projection (23) can distort the accuracy of the elementwise estimation. On the other hand, the all-refresh method more often gives positive semidefinite estimates. Therefore, projection (23) has less impact on the all-refresh method than on the pairwise-refresh method.
Risk approximations (9)-(11) are only upper bounds. The upper bounds are controlled by $a_p$, which has rates of convergence governed by (20) and (21). The average number of observations of the pairwise-refresh times is far larger than the number of observations of the all-refresh times.

In the simulation studies, the latent log-price processes are generated from the model

$$dX_t^{(i)} = \mu^{(i)}\, dt + \rho^{(i)} \sigma_t^{(i)}\, dB_t^{(i)} + \sqrt{1 - (\rho^{(i)})^2}\, \sigma_t^{(i)}\, dW_t + \nu^{(i)}\, dZ_t, \quad i = 1, \ldots, p, \qquad (24)$$
where the elements of B, W, and Z are independent standard Brownian motions. The spot volatilities obey independent Ornstein-Uhlenbeck processes:

$$d\varsigma_t^{(i)} = \alpha^{(i)} \big( \beta_0^{(i)} - \varsigma_t^{(i)} \big)\, dt + \beta_1^{(i)}\, dU_t^{(i)}, \qquad (25)$$

where $\varsigma_t^{(i)} = \log \sigma_t^{(i)}$ and $U_t^{(i)}$ is an independent Brownian motion.
The number of assets p is taken to be 50. Slightly modified from Barndorff-Nielsen et al. (2008), the parameters are set to be $(\mu^{(i)}, \beta_0^{(i)}, \beta_1^{(i)}, \alpha^{(i)}, \rho^{(i)}) = (0.03 x_1^{(i)}, -x_2^{(i)}, 0.75 x_3^{(i)}, (1/40) x_4^{(i)}, -0.7)$ and $\nu^{(i)} = \exp(\beta_0^{(i)})$, where $x_j^{(i)}$ is an independent realization from the uniform distribution on [0.7, 1.3]. The parameters are kept fixed across the simulations.
Model (24) is used to generate the latent log-price values with initial values $X_0^{(i)} = 1$ (log price) and $\varsigma_0^{(i)}$ drawn from its stationary distribution. The Euler scheme is used to generate the latent prices at a frequency of once per second. To account for the market microstructure noise, Gaussian noises $\epsilon_t^{(i)} \overset{\mathrm{i.i.d.}}{\sim} N(0, \eta^2)$ with $\eta = 0.0005$ are added. Therefore, as in (14), the observed log prices are $X_t^{o(i)} = X_t^{(i)} + \epsilon_t^{(i)}$.
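A single-asset version of this design can be sketched as follows. For clarity we drop the common factors W and Z and the leverage correlation, keeping only the idiosyncratic Brownian motion, the log-volatility OU dynamics, and the additive N(0, 0.0005^2) noise; this is an illustration of the Euler scheme, with our own names and simplified parameter values on the scale of Section 4.1, not the paper's full model.

```python
import numpy as np

def simulate_one_asset(n=23400, mu=0.03, alpha=0.025, beta0=-1.0, beta1=0.75,
                       eta=5e-4, x0=1.0, seed=4):
    """Euler scheme on a one-second grid over a trading day of n seconds;
    the day is scaled to the unit interval, so dt = 1/n."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n
    varsigma = beta0                   # start log-volatility at its long-run mean
    x = np.empty(n + 1)
    x[0] = x0
    for t in range(n):
        sigma = np.exp(varsigma)
        x[t + 1] = x[t] + mu * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        varsigma += alpha * (beta0 - varsigma) * dt \
            + beta1 * np.sqrt(dt) * rng.standard_normal()
    noise = rng.normal(0.0, eta, n + 1)   # market microstructure noise
    return x, x + noise                   # latent and observed log prices

latent, observed = simulate_one_asset()
```

The observed series is what the TSRV/TSCV estimators would be applied to; the latent series plays the role of the unobservable oracle prices.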
To model the nonsynchronicity, p independent Poisson processes with intensity parameters $\lambda_1, \lambda_2, \ldots, \lambda_p$ are used to simulate the trading times of the assets. Motivated by the U.S. equity trading dataset (the total number of seconds in a common trading day of U.S. equities is 23,400), we set the trading intensity parameters $\lambda_i$ to be $0.02 i \times 23{,}400$ for $i = 1, 2, \ldots, 50$, meaning that the average numbers of trading times for the assets are spread out in an arithmetic sequence over the interval [468, 23,400].
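Simulating the asynchronous trading times amounts to drawing exponential gaps and keeping the arrivals within the day. A sketch, with our own names, for the slowest asset of the design above:

```python
import numpy as np

def poisson_trading_times(lam_per_sec, day_seconds=23400, seed=5):
    """Trading times of one asset: a Poisson process on (0, day_seconds]."""
    rng = np.random.default_rng(seed)
    # Draw more exponential gaps than needed, then keep arrivals within the day.
    n_draw = int(2 * lam_per_sec * day_seconds) + 100
    arrivals = np.cumsum(rng.exponential(1.0 / lam_per_sec, n_draw))
    return arrivals[arrivals <= day_seconds]

# Slowest asset: intensity 0.02 trades per second,
# i.e., 468 trades per 23,400-second day on average.
times = poisson_trading_times(0.02)
```

Each of the 50 assets would get its own intensity, and the resulting time lists feed the refresh-time constructions of Section 3.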
4.2 An Oracle Investment Strategy and Risk
Assessment
An oracle investment strategy is usually a decent benchmark
for other portfolio strategies to compare with. There are several
oracle strategies. The one we choose makes the portfolio allocation based on the covariance matrix estimated using the latent prices on the finest grid (one per second). Latent prices are the noise-free prices of each asset at every time point (one per second), which are unobservable in practice and are available to us only in the simulation. Therefore, for each asset, there are 23,400 latent prices in a normal trading day. We will refer to the investment strategy based on the latent prices as the oracle or latent strategy. This strategy is not available for the empirical studies.
The assessment of risk is based on the high-frequency data. For a given portfolio strategy, its risk is computed from the latent prices at every 15 minutes in the simulation studies, whereas in the empirical studies, the observed prices at every 15 minutes are used to assess its risk. This mitigates the influence of the microstructure noises. For the empirical study, we do not hold positions overnight and are therefore immune to overnight price jumps (we discuss the details in Section 5).
4.3 Out-of-Sample Optimal Allocation
One of the main purposes of this article is to investigate
the comparative advantage of the high-frequency-based meth-
ods against the low-frequency-based methods, especially in the
context of portfolio investment. Hence, it is essential for us to
run the following out-of-sample investment strategy test which
includes both the high-frequency- and low-frequency-based ap-
proaches. Moreover, since in the empirical studies, we do not
know the latent asset prices, the out-of-sample test should be
designed so that it can also be conducted in the empirical studies.
We simulate the prices of 50 traded assets as described in
Section 4.1 for the duration of 200 trading days (numbered as
day 1, day 2, . . . , day 200) and record all the tick-by-tick trading
times and trading prices of the assets.
We start investing 1 unit of capital into the pool of assets with the low-frequency- and high-frequency-based strategies from day 101 (the portfolios are bought at the opening of day 101). For the low-frequency strategy, we use the previous 100 trading days' daily closing prices to compute the sample covariance matrix and make the portfolio allocation accordingly under the gross-exposure constraints. For the high-frequency strategies, we use the previous h = 10 trading days' tick-by-tick trading data. For
the all-refresh strategy, we use all-refresh time to synchronize
the trades of the assets before applying TSCV to estimate the
integrated volatility matrix and make the portfolio allocation,
while for the pairwise-refresh high-frequency strategy, we use
pairwise-refresh times to synchronize each pair of assets and
apply TSCV to estimate the integrated covariance for the corresponding pair. With projection technique (23), the resulting TSCV integrated volatility matrix can always be transformed into a positive semidefinite matrix, which facilitates the optimization.
We run two investment plans. In the first plan, the portfolio is held for $\tau = 1$ trading day before we reestimate the covariation structure and adjust the portfolio weights accordingly. The second plan is the same as the first except that the portfolio is held for $\tau = 5$ trading days before rebalancing.
Over the investment horizon (from day 101 to day 200 in this case), we record the 15-minute portfolio returns based on the latent prices of the assets, the variation of the portfolio weights across the 50 assets, and other relevant characteristics. While 100 trading days may appear short, calculating 15-minute returns increases the size of the relevant data for computing the risk by a factor of 26.
We study those portfolio features over a whole range of the gross-exposure constraint c, from c = 1, which stands for the no-short-sale portfolio strategy, to c = 3. This is usually the relevant range of gross exposure for investment purposes.
The standard deviations and other characteristics of the strategy for $\tau = 1$ are presented in Table 1 (the case $\tau = 5$ gives similar comparisons and hence is omitted). The standard deviations, which are calculated from the 15-minute returns as mentioned before, represent the actual risks of the strategy. As we only optimize the risk profile, we should not read too much into the returns of the optimal portfolios; they cannot even be estimated with good accuracy over such a short investment horizon. Figures 1 and 2 provide graphical details of these characteristics for both $\tau = 1$ and $\tau = 5$.
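The annualization of the reported standard deviations can be reproduced as follows. This is a hedged sketch, with our own names, assuming a 6.5-hour session (26 fifteen-minute returns per day, matching the factor of 26 mentioned above) and 252 trading days per year, a convention the text does not state explicitly.

```python
import numpy as np

def annualized_risk_pct(returns_15min, intervals_per_day=26, days_per_year=252):
    """Annualized standard deviation (%) from a series of 15-minute returns."""
    per_interval_sd = np.std(returns_15min, ddof=1)
    return 100.0 * per_interval_sd * np.sqrt(intervals_per_day * days_per_year)

# Example: i.i.d. 15-minute returns with sd 0.002 give roughly 16% annualized risk,
# on the scale of the entries in Table 1.
rng = np.random.default_rng(6)
risk = annualized_risk_pct(rng.normal(0.0, 0.002, 26 * 100))
```

The scaling $\sqrt{26 \times 252}$ simply converts per-interval volatility to per-year volatility under independent returns.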
Table 1. The out-of-sample performance of daily-rebalanced optimal portfolios with gross-exposure constraint

Methods                  Std Dev (%)   Max Weight   Min Weight   No. of Long   No. of Short
Low-frequency sample covariance matrix estimator
  c = 1 (No short)          16.69         0.19         0.00          13             0
  c = 2                     16.44         0.14        -0.05          28.5          20
  c = 3                     16.45         0.14        -0.05          28.5          20
High-frequency all-refresh TSCV covariance matrix estimator
  c = 1 (No short)          16.08         0.20         0.00          15             0
  c = 2                     14.44         0.14        -0.05          30            19
  c = 3                     14.44         0.14        -0.05          30            19
High-frequency pairwise-refresh TSCV covariance matrix estimator
  c = 1 (No short)          15.34         0.18         0.00          15             0
  c = 2                     12.72         0.13        -0.03          31            18
  c = 3                     12.72         0.13        -0.03          31            18

NOTE: We simulate one trial of intra-day trading data for 50 assets, make portfolio allocations for 100 trading days, and rebalance daily. The standard deviations and other characteristics of these portfolios are recorded. All the characteristics are annualized (Max Weight, median of maximum weights; Min Weight, median of minimum weights; No. of Long, median of numbers of long positions whose weights exceed 0.001; No. of Short, median of numbers of short positions whose absolute weights exceed 0.001).
From Table 1 and Figures 1 and 2, we see that for both holding lengths $\tau = 1$ and $\tau = 5$, the all-refresh TSCV and pairwise-refresh TSCV approaches significantly outperform the low-frequency one in terms of risk profile over the whole range of gross-exposure constraints. This supports our theoretical results and intuitions. First, the shorter estimation window allows these two high-frequency approaches to deliver consistently better results than the low-frequency one. Second, the pairwise-refresh method outperforms the all-refresh method, as expected. Finally, both the low-frequency strategy and the high-frequency strategies significantly outperform the equal-weight portfolio (see Figures 1 and 2).
All the risk curves attain their minimum around c = 1.2 (see Figures 1 and 2), which again meets our expectation, since that must be the point where the marginal increase in estimation error outpaces the marginal decrease in specification error. This, coupled with the results in the Empirical Studies section, gives some guidance on what gross-exposure constraint to use in investment practice.
In terms of portfolio weights, neither the low-frequency nor the high-frequency optimal no-short-sale portfolios are well diversified, with all approaches assigning a concentrated weight of around 20% to one individual asset. Their portfolio risks can be improved by relaxing the gross-exposure constraint (see Figures 1 and 2).
5. EMPIRICAL STUDIES
Risk minimization problem (6) has important applications in asset allocation. We demonstrate its application to stock portfolio investment in the 30 Dow Jones Industrial Average (30 DJIA) constituent stocks.

To make the asset allocation, we use the high-frequency data of the 30 DJIA stocks from January 1, 2008, to September 30, 2008. These stocks are highly liquid. The period covers the onset of the financial crisis of 2008.
At the end of each holding period of $\tau = 1$ or $\tau = 5$ trading days in the investment period (from May 27, 2008, to September 30, 2008), the covariance matrix of the 30 stocks is estimated by the different estimators: the sample covariance of the previous 100 trading days' daily return data (low-frequency), the all-refresh TSCV estimator of the previous 10 trading days, and the pairwise-refresh TSCV estimator of the previous 10 trading days. These estimated covariance matrices are used to construct optimal portfolios with a range of exposure constraints. For $\tau = 5$, we do not count the overnight risks of the portfolio. The reason is that overnight price jumps are often due to the arrival of news and are irrelevant to the topic of our study. The standard deviations and other characteristics of these portfolio returns for $\tau = 1$ are presented in Table 2, together with the standard deviation of an equally weighted portfolio of the 30 DJIA stocks rebalanced daily. The standard deviations are for the 15-minute returns, which represent the actual risks. Figures 3 and 4 provide the graphical details of these characteristics for both $\tau = 1$ and $\tau = 5$.
Table 2 and Figures 3 and 4 reveal that, in terms of the portfolios' actual risk, the all-refresh TSCV and pairwise-refresh TSCV strategies perform at least as well as the low-frequency-based strategy when the gross exposure is small and outperform the latter significantly when the gross exposure
Figure 1. Out-of-sample performance of daily-rebalanced optimal portfolios based on high-frequency and low-frequency estimation of the integrated covariance matrix. (a) Annualized risk (%) of portfolios, shown for the low-frequency, all-refresh, pairwise-refresh, oracle (latent price), and equal-weight strategies over the exposure-constraint range c = 1 to 3. (b) Maximum weight of allocations. (The online version of this figure is in color.)
is large. Both facts support our theoretical results and intuitions. Even given 10 times the length of the covariance estimation window, the low-frequency approach still cannot perform better than the high-frequency TSCV approaches, which affirms our belief that the high-frequency TSCV approaches can significantly shorten the necessary covariance estimation window and better capture the short-term time-varying covariation structure (or the local covariance). These results, together with the ones presented in the Simulation Studies section, lend strong support to the above statement.
Figure 2. Out-of-sample performance of optimal portfolios based on high-frequency and low-frequency estimation of the integrated covariance matrix with holding period $\tau = 5$. (a) Annualized risk (%) of portfolios, shown for the low-frequency, all-refresh, pairwise-refresh, oracle (latent price), and equal-weight strategies. (b) Maximum weight of allocations. (The online version of this figure is in color.)
Table 2. The out-of-sample performance of daily-rebalanced optimal portfolios of the 30 DJIA stocks

Methods                  Std Dev (%)   Max Weight   Min Weight   No. of Long   No. of Short
Low-frequency sample covariance matrix estimator
  c = 1 (No short)          12.73         0.50         0.00           8             0
  c = 2                     14.27         0.44        -0.12          16            10
  c = 3                     15.12         0.45        -0.18          18            12
High-frequency all-refresh TSCV covariance matrix estimator
  c = 1 (No short)          12.55         0.40         0.00           8             0
  c = 2                     12.36         0.36        -0.10          17            12
  c = 3                     12.50         0.36        -0.10          17            12
High-frequency pairwise-refresh TSCV covariance matrix estimator
  c = 1 (No short)          12.54         0.39         0.00           9             0
  c = 2                     12.23         0.35        -0.08          17            12
  c = 3                     12.34         0.35        -0.08          17            12
Unmanaged index
  30 DJIA equally weighted  22.12
As the gross-exposure constraint increases, the portfolio risk of the low-frequency approach increases drastically relative to that of the high-frequency TSCV approaches. The reason could be a combination of the fact that the low-frequency approach does not produce a well-conditioned estimated covariance matrix due to the lack of data and the fact that the low-frequency approach can only attain the long-run covariation but cannot capture well the local covariance dynamics. The portfolio risk of the high-frequency TSCV approaches increases only moderately as the gross-exposure constraint increases. From financial practitioners' standpoint, this is also one of the comparative advantages of the high-frequency TSCV approaches: investors do not need to be much concerned about the choice of the gross-exposure constraint when using them.
It can be seen that both the low-frequency and high-frequency optimal no-short-sale portfolios are not diversified enough. Their risk profiles can be improved by relaxing the gross-exposure constraint to around c = 1.2, that is, 10% short positions and 110% long positions are allowed. The no-short-sale portfolios under all approaches have maximum portfolio weights of 22%-50%. As the gross-exposure constraint relaxes, the pairwise-refresh TSCV approach has its maximum weight reaching the smallest value, around 30%-34%, while the low-frequency approach goes down only to around 40%. This is another comparative advantage of the high-frequency approach in practice, as a portfolio with less weight concentration is typically considered preferable.
Another interesting fact is that the equally weighted daily-rebalanced portfolio of the 30 DJIA stocks carries an annualized return of only -10%, while the DJIA went down 13.5% during the same period (May 27, 2008, to September 30, 2008), giving an annualized return of -38.3%. The cause of the difference is that we intentionally avoided holding portfolios overnight; hence the portfolios are not affected by the overnight price jumps. In the turbulent financial market of May to September 2008, our portfolio strategies are not affected by the numerous sizeable downward jumps. Those jumps are mainly caused by news of the distressed economy and corporations. The moves can deviate far from what the previously held covariation structure dictates.
Figure 3. Out-of-sample performance of daily-rebalanced optimal portfolios for the 30 DJIA constituent stocks with investment period from May 27, 2008, to September 30, 2008 (89 trading days). (a) Annualized risk (%) of portfolios, shown for the low-frequency, all-refresh, pairwise-refresh, and equal-weight strategies. (b) Maximum weight of allocations. (The online version of this figure is in color.)
Fan, Li, and Yu: Vast Volatility Matrix Estimation Using High-Frequency Data 421
[Figure 4 here: (a) annualized risk (%) and (b) maximum weight, each plotted against the exposure constraint c from 1 to 3; curves: low frequency, all-refresh, pairwise-refresh, and equal weight.]
Figure 4. Out-of-sample performance of 5-day-rebalanced optimal portfolios for 30 DJIA constituent stocks with investment period from May 27, 2008, to September 30, 2008 (89 trading days). (a) Annualized risk of portfolios. (b) Maximum weight of allocations. (The online version of this figure is in color.)
6. CONCLUSION
We advocate portfolio selection with a gross-exposure constraint (Fan et al. 2011). It is less sensitive to errors in covariance estimation and mitigates noise accumulation. The out-of-sample portfolio performance depends on the expected volatility in the holding period, which can at best be approximated, and the gross-exposure constraint helps reduce the error accumulation in such approximations.
Two approaches are proposed for using high-frequency data to estimate the integrated covariance: the all-refresh and pairwise-refresh methods. The latter retains far more data on average and hence estimates each element more precisely. Yet the pairwise-refresh estimates are often not positive semidefinite, and projections are needed for the convex optimization algorithms. The projection somewhat distorts the performance of the pairwise-refresh strategies. New optimization algorithms need to be developed to take full advantage of the pairwise-refresh method. Further investigation of the relative merits of the pairwise-refresh, blocking, and all-refresh approaches is needed.
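One standard way to carry out the projection mentioned above (the paper does not specify which projection is used; eigenvalue clipping, which gives the Frobenius-nearest positive semidefinite matrix, is shown here as one common choice):

```python
import numpy as np

def project_to_psd(a, eps=0.0):
    """Frobenius-nearest positive semidefinite matrix to a symmetric matrix:
    symmetrize, then clip eigenvalues below eps."""
    sym = (a + a.T) / 2
    vals, vecs = np.linalg.eigh(sym)
    return (vecs * np.maximum(vals, eps)) @ vecs.T

# a pairwise-refresh-style estimate: symmetric but indefinite (hypothetical numbers)
est = np.array([[1.0, 0.9, 0.8],
                [0.9, 1.0, 0.9],
                [0.8, 0.9, 0.2]])
psd = project_to_psd(est)
```

The projected matrix can then be passed to a convex quadratic-programming solver, at the cost of the distortion discussed above.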
The use of high-frequency financial data significantly increases the available sample size for volatility estimation, and hence shortens the time window for estimation and adapts better to local covariations. Our theoretical observations are supported by the empirical studies and simulations, in which we demonstrate convincingly that the high-frequency-based strategies generally outperform the low-frequency-based one.
With the gross-exposure constraint, the impact of the size of the candidate pool for portfolio allocation is limited. We derive concentration inequalities to demonstrate this theoretically; simulation and empirical studies lend further support to it.
APPENDIX: CONDITIONS AND PROOFS
Conditions
We derive our theoretical results under the following conditions. For simplicity, we state the conditions for the integrated covariation (Theorem 2). The conditions for the integrated volatility (Theorem 1) are simply the ones with Y = X.

Condition 1. The drift processes are such that $\mu^{(X)}_t = \mu^{(Y)}_t = 0$ for all $t \in [0, 1]$.

Condition 2. $\sigma^{(i)}_t$, $i = X, Y$, are continuous stochastic processes which are either bounded by $0 < \sigma^{(i)}_t \le C_\sigma$, or such that $P\{\sup_{0 \le t \le 1} \sigma^{(i)}_t \ge C_\sigma\} \le k \exp\{-a C_\sigma^b\}$ for any $C_\sigma > 0$, with positive constants $k$, $a$, and $b$.

Condition 3. The observation times are independent of the X and Y processes. The synchronized observation times for the X and Y processes satisfy
\[ \sup_{1 \le j \le \bar n} \bar n\,(v_j - v_{j-1}) \le C_\Delta < \infty. \tag{A.1} \]

Condition 4. $\tfrac12\, \bar n^{1/3} \le \bar n_K \le 2\, \bar n^{1/3}$.

Remark 2. Note that, in the derivations of Theorems 1 and 2, $\sigma^{(X)}_t$ and $\sigma^{(Y)}_t$ are only required to be càdlàg; in other words, the continuity assumption is not needed for the concentration inequalities to hold. The continuity of the volatility processes is only needed in approximation (5) in our study; it can be removed or relaxed for other applications of the concentration inequalities.
Lemmas
We need the following three lemmas for the proofs of Theorems 1 and 2. In particular, Lemma 2 is an exponential-type inequality for arbitrarily dependent random variables that have finite moment generating functions; it is useful for many statistical learning problems. Lemma 3 is a concentration inequality for the realized volatility based on a discretely observed latent process.

Lemma 1. When $Z \sim N(0, 1)$, for any $|\lambda| \le \tfrac14$, $E \exp\{\lambda(Z^2 - 1)\} \le \exp(2\lambda^2)$.
Proof. Using the moment generating function of $Z^2$, we have
\[ E \exp\{\lambda(Z^2 - 1)\} = \exp\Big\{-\lambda - \tfrac12 \log(1 - 2\lambda)\Big\}. \]
Let $g(x) = \log(1 - x) + x + x^2$ with $|x| \le 1/2$. Then $g'(x) = x(1 - 2x)/(1 - x)$ is nonnegative when $x \in [0, 1/2]$ and negative when $x \in [-1/2, 0)$. In other words, $g$ has its minimum at the point $0$, namely $g(x) \ge 0$ for $|x| \le 1/2$. Consequently, for $|\lambda| \le 1/4$, $-\log(1 - 2\lambda) \le 2\lambda + (2\lambda)^2$. Hence, $E \exp\{\lambda(Z^2 - 1)\} \le \exp(2\lambda^2)$.
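As a quick numerical sanity check of Lemma 1 (not part of the original proof), one can compare the closed-form moment generating function $E\exp\{\lambda(Z^2-1)\} = e^{-\lambda}(1-2\lambda)^{-1/2}$ against the bound $\exp(2\lambda^2)$ on a grid over $|\lambda| \le 1/4$:

```python
import numpy as np

# exact MGF of Z^2 - 1 for Z ~ N(0, 1): exp(-l) / sqrt(1 - 2l), valid for l < 1/2
lam = np.linspace(-0.25, 0.25, 1001)
mgf = np.exp(-lam) / np.sqrt(1.0 - 2.0 * lam)
bound = np.exp(2.0 * lam**2)
gap = bound - mgf  # Lemma 1 asserts gap >= 0 on the whole grid
```

The gap is zero at $\lambda = 0$ and strictly positive elsewhere on the interval.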
Lemma 2. For a set of random variables $X_i$, $i = 1, \ldots, K$, and an event $A$, if there exist two positive constants $C_1$ and $C_2$ such that for all $|\lambda| \le C_1$,
\[ E\big(\exp(\lambda X_i)\, I_A\big) \le \exp(C_2 \lambda^2), \tag{A.2} \]
then for weights $w_i$ satisfying $\sum_{i=1}^K |w_i| \le w \in [1, \infty)$, we have
\[ P\Big(\Big\{\Big|\sum_{i=1}^K w_i X_i\Big| > x\Big\} \cap A\Big) \le 2 \exp\Big(-\frac{x^2}{4 C_2 w^2}\Big), \quad \text{when } 0 \le x \le 2 C_1 C_2. \]
Proof. By the Markov inequality and the convexity of the exponential function, for $0 \le \lambda \le C_1/w$ and $w \ge \sum_{i=1}^K |w_i|$, we have
\[
P\Big(\Big\{\sum_{i=1}^K w_i X_i > x\Big\} \cap A\Big) \le \exp(-\lambda x)\, E\Big(\exp\Big(\lambda \sum_{i=1}^K w_i X_i\Big)\, I_A\Big)
\le \exp(-\lambda x)\, w^{-1} \sum_{i=1}^K |w_i|\, E\big(\exp(\lambda w |X_i|)\, I_A\big)
\le 2 \exp\big(C_2 \lambda^2 w^2 - \lambda x\big). \tag{A.3}
\]
Taking $\lambda = x/(2 C_2 w^2)$, we have
\[
P\Big(\Big\{\sum_{i=1}^K w_i X_i > x\Big\} \cap A\Big) \le 2 \exp\Big(-\frac{x^2}{4 C_2 w^2}\Big), \quad \text{when } 0 \le x \le 2 C_1 C_2. \tag{A.4}
\]
The same bound holds for the lower tail, which completes the proof.
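Lemma 2 can be sanity-checked numerically (a toy check, not part of the proof) with Rademacher variables, whose moment generating function $\cosh\lambda \le e^{\lambda^2/2}$ satisfies (A.2) with $C_2 = 1/2$ and arbitrarily large $C_1$:

```python
import numpy as np

rng = np.random.default_rng(0)
K, w = 50, 1.0
wts = np.full(K, 1.0 / K)                 # sum |w_i| = 1 <= w
# Rademacher draws: E exp(l X) = cosh(l) <= exp(l^2 / 2), so C2 = 1/2 in (A.2)
xs = rng.choice([-1.0, 1.0], size=(200_000, K))
s = xs @ wts                              # weighted sums, 200,000 replications
x = 0.3
emp = float(np.mean(np.abs(s) > x))       # empirical tail probability
bound = 2.0 * np.exp(-x**2 / (4 * 0.5 * w**2))  # Lemma 2's bound with C2 = 1/2
```

The empirical tail frequency is far below the (deliberately conservative) exponential bound.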
Lemma 3. Let $[X, X]_1 = \sum_{i=1}^{\bar n} (X_{v_i} - X_{v_{i-1}})^2$ be the realized volatility based on the discretely observed X process from model (1) in the univariate case: $dX_t = \mu_t\, dt + \sigma_t\, dW_t$. Under Conditions 1-3,

(A) if $\sigma^{(X)}_t \le C_\sigma < \infty$ for all $t \in [0, 1]$, then for all large $\bar n$ and for $x \in [0, c\sqrt{\bar n}]$,
\[ P\Big(\sqrt{\bar n}\,\Big|[X, X]_1 - \int_0^1 \sigma_t^2\, dt\Big| > x\Big) \le 2 \exp\{-C x^2\}, \]
where the constants $c$ and $C$ can be taken as in (A.7);

(B) if the tail behavior of $\sigma^{(X)}_t$ satisfies
\[ P\Big\{\sup_{0 \le t \le 1} \sigma^{(X)}_t \ge C_\sigma\Big\} \le k \exp\big\{-a C_\sigma^b\big\}, \quad \text{for any } C_\sigma > 0, \]
with positive constants $k$, $a$, and $b$, then for $x$ in the range given before (A.8),
\[ P\Big(\sqrt{\bar n}\,\Big|[X, X]_1 - \int_0^1 \sigma_t^2\, dt\Big| > x\Big) \le (2 + k) \exp\big\{-C x^{\frac{2b}{4+b}}\big\}. \]
A set of candidate values for $c$ and $C$ is given in (A.8).
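The realized-volatility quantity that Lemma 3 controls can be illustrated with a small simulation (an illustration only, with an arbitrary toy volatility path and equally spaced observation times; the constants of the lemma play no role here):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
t = np.linspace(0.0, 1.0, n + 1)
sigma = 0.2 + 0.1 * np.sin(2 * np.pi * t[:-1])        # smooth toy volatility path
dx = sigma * np.sqrt(1.0 / n) * rng.standard_normal(n)  # zero drift (Condition 1)
x = np.concatenate([[0.0], np.cumsum(dx)])

rv = float(np.sum(np.diff(x) ** 2))      # realized volatility [X, X]_1
iv = float(np.sum(sigma ** 2) / n)       # Riemann sum for the integrated volatility
err = np.sqrt(n) * abs(rv - iv)          # the normalized deviation in Lemma 3
```

Here the integrated volatility is about 0.045, and the normalized deviation stays of order one, consistent with the exponential concentration above.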
Proof. For any constant $C_\sigma$, define the stopping time $\tau_C := \inf\{t : \sup_{0 \le s \le t} \sigma_s > C_\sigma\} \wedge 1$. Let
\[ \tilde\sigma_s = \begin{cases} \sigma_s, & \text{when } \sigma_s \le C_\sigma, \\ C_\sigma, & \text{when } \sigma_s > C_\sigma, \end{cases} \]
and $\tilde X_t = \int_0^t \tilde\sigma_s\, dW_s$. By the time change for martingales [see, e.g., Theorem 4.6 of Chapter 3 of Karatzas and Shreve (2000)], if $T_t = \inf\{s : [\tilde X]_s \ge t\}$, where $[\tilde X]_s$ is the quadratic variation process, then $B_t := \tilde X_{T_t}$ is a Brownian motion with regard to $\{\mathcal F_{T_t}\}_{t \ge 0}$. We then have
\[ E \exp\Big\{\lambda\Big(\tilde X_t^2 - \int_0^t \tilde\sigma_s^2\, ds\Big)\Big\} = E \exp\big\{\lambda\big(B^2_{[\tilde X]_t} - [\tilde X]_t\big)\big\}. \]
Note further that, for any $t$, $[\tilde X]_t$ is a stopping time with regard to $\{\mathcal F_{T_s}\}_{s \ge 0}$, and the process $\exp(\lambda(B_s^2 - s))$ is a submartingale for any $\lambda$. By the optional sampling theorem, using $[\tilde X]_u \le C_\sigma^2 u$ (a bounded stopping time), we have
\[ E \exp\big\{\lambda\big(B^2_{[\tilde X]_u} - [\tilde X]_u\big)\big\} \le E \exp\big\{\lambda\big(B^2_{C_\sigma^2 u} - C_\sigma^2 u\big)\big\}. \]
Therefore, noting that $\Delta X_i = \Delta \tilde X_i$ on the set $\{\tau_C = 1\}$, where $\Delta X_i = X_{v_i} - X_{v_{i-1}}$, we have, under Condition 3,
\[ E\Big(\exp\Big\{\lambda\Big((\Delta X_i)^2 - \int_{v_{i-1}}^{v_i} \sigma_t^2\, dt\Big)\Big\}\, I_{\{\tau_C = 1\}}\, \Big|\, \mathcal F_{v_{i-1}}\Big) \le E \exp\Big\{\lambda\Big(B^2_{C_\sigma^2 C_\Delta/\bar n} - \frac{C_\sigma^2 C_\Delta}{\bar n}\Big)\Big\} = E \exp\Big\{\frac{\lambda C_\sigma^2 C_\Delta}{\bar n}\big(Z^2 - 1\big)\Big\}, \tag{A.5} \]
where $Z \sim N(0, 1)$.
It follows from the law of iterated expectations and (A.5) that
\[
\begin{aligned}
&E\Big(\exp\Big\{\lambda\Big([X, X]_1 - \int_0^1 \sigma_t^2\, dt\Big)\Big\}\, I_{\{\tau_C = 1\}}\Big) \\
&\quad = E\Big(\exp\Big\{\lambda\Big(\sum_{i=1}^{\bar n - 1}(\Delta X_i)^2 - \int_0^{v_{\bar n - 1}} \sigma_t^2\, dt\Big)\Big\}\, I_{\{\tau_C = 1\}}\,
E\Big(\exp\Big\{\lambda\Big((\Delta X_{\bar n})^2 - \int_{v_{\bar n - 1}}^{v_{\bar n}} \sigma_t^2\, dt\Big)\Big\}\, I_{\{\tau_C = 1\}}\, \Big|\, \mathcal F_{v_{\bar n - 1}}\Big)\Big) \\
&\quad \le E\Big(\exp\Big\{\lambda\Big(\sum_{i=1}^{\bar n - 1}(\Delta X_i)^2 - \int_0^{v_{\bar n - 1}} \sigma_t^2\, dt\Big)\Big\}\, I_{\{\tau_C = 1\}}\Big)\, E \exp\Big\{\frac{\lambda C_\sigma^2 C_\Delta}{\bar n}\big(Z^2 - 1\big)\Big\},
\end{aligned}
\]
where $Z \sim N(0, 1)$. Repeating the process above, we obtain
\[
E\Big(\exp\Big\{\lambda\Big([X, X]_1 - \int_0^1 \sigma_t^2\, dt\Big)\Big\}\, I_{\{\tau_C = 1\}}\Big) \le \Big(E \exp\Big\{\frac{\lambda C_\sigma^2 C_\Delta}{\bar n}\big(Z^2 - 1\big)\Big\}\Big)^{\bar n}.
\]
By Lemma 1, we have, for $|\lambda| \le \frac{\bar n}{4 C_\sigma^2 C_\Delta}$,
\[
E\Big(\exp\Big\{\lambda\Big([X, X]_1 - \int_0^1 \sigma_t^2\, dt\Big)\Big\}\, I_{\{\tau_C = 1\}}\Big) \le \exp\Big(\frac{2 \lambda^2 C_\sigma^4 C_\Delta^2}{\bar n}\Big). \tag{A.6}
\]
By Lemma 2, we have
\[
P\Big(\Big\{\sqrt{\bar n}\,\Big|[X, X]_1 - \int_0^1 \sigma_t^2\, dt\Big| > x\Big\} \cap \{\tau_C = 1\}\Big) \le 2 \exp\Big(-\frac{x^2}{8 C_\sigma^4 C_\Delta^2}\Big), \tag{A.7}
\]
when $0 \le x \le C_\sigma^2 C_\Delta \sqrt{\bar n}$. This proves part (A). For part (B),
\[
\begin{aligned}
P\Big(\sqrt{\bar n}\,\Big|[X, X]_1 - \int_0^1 \sigma_t^2\, dt\Big| > x\Big)
&\le P\Big(\Big\{\sqrt{\bar n}\,\Big|[X, X]_1 - \int_0^1 \sigma_t^2\, dt\Big| > x\Big\} \cap \{\tau_C = 1\}\Big) + P\{\tau_C < 1\} \\
&\le 2 \exp\Big(-\frac{x^2}{8 C_\sigma^4 C_\Delta^2}\Big) + k \exp\big\{-a C_\sigma^b\big\},
\end{aligned}
\]
when $0 \le x \le C_\sigma^2 C_\Delta \sqrt{\bar n}$. Now let $C_\sigma = \big(\frac{x^2}{8 a C_\Delta^2}\big)^{\frac{1}{4+b}}$; we have that, when $0 \le x \le (8a)^{-2/b}\, C_\Delta\, \bar n^{\frac{4+b}{2b}}$,
\[
P\Big(\sqrt{\bar n}\,\Big|[X, X]_1 - \int_0^1 \sigma_t^2\, dt\Big| > x\Big) \le (2 + k) \exp\Big\{-8^{-\frac{b}{4+b}}\, a^{\frac{4}{4+b}}\, C_\Delta^{-\frac{2b}{4+b}}\, x^{\frac{2b}{4+b}}\Big\}. \tag{A.8}
\]
Proof of Theorem 1
We first prove the results conditioning on the set of observation times $\mathcal V$. Recall the notation introduced in Sections 3.2 and 3.3, and let $\bar n$ be the observation frequency. For simplicity of notation, without ambiguity, we write $\epsilon_{\bar n, i}$ as $\epsilon_i$ and $\sigma^{(X)}_t$ as $\sigma_t$. Again let $\tau_C := \inf\{t : \sup_{0 \le s \le t} \sigma_s > C_\sigma\} \wedge 1$. Recall the TSRV estimator
\[
\langle X, X\rangle^{(K)}_1 = [X, X]^{(K)}_1 - \frac{\bar n_K}{\bar n_J}\, [X, X]^{(J)}_1, \tag{A.9}
\]
where $[X, X]^{(K)}_1 = K^{-1} \sum_{i=K}^{\bar n} \big(X_{v_i} - X_{v_{i-K}}\big)^2$ is computed from the observed process. Then, from the definition, we have
\[
\begin{aligned}
\langle X, X\rangle_1 &= [X, X]^{(K)}_1 + [\epsilon, \epsilon]^{(K)}_1 + 2[X, \epsilon]^{(K)}_1 - \frac{\bar n_K}{\bar n_J}\Big([X, X]^{(J)}_1 + [\epsilon, \epsilon]^{(J)}_1 + 2[X, \epsilon]^{(J)}_1\Big) \\
&= \frac{1}{K} \sum_{l=0}^{K-1} V^{(l)}_K - \frac{\bar n_K}{\bar n_J}\, [X, X]^{(J)}_1 + R_1 + R_2, \tag{A.10}
\end{aligned}
\]
where $R_1 = [\epsilon, \epsilon]^{(K)}_1 - \frac{\bar n_K}{\bar n_J}\, [\epsilon, \epsilon]^{(1)}_1$, $R_2 = 2[X, \epsilon]^{(K)}_1 - 2\frac{\bar n_K}{\bar n_J}\, [X, \epsilon]^{(1)}_1$, and
\[
V^{(l)}_K = \sum_{j=1}^{\bar n_K} \big(X_{jK+l} - X_{(j-1)K+l}\big)^2, \quad \text{for } l = 0, 1, \ldots, K - 1.
\]
Note that we have assumed above that $\bar n_K = \frac{\bar n - K + 1}{K}$ is an integer, to simplify the presentation.
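The estimator in (A.9)-(A.10) can be sketched in a few lines of code. The following toy implementation (with hypothetical sample size, noise level, and $K$ on the $\bar n^{2/3}$ scale, so that $\bar n_K$ is on the $\bar n^{1/3}$ scale as in Condition 4) illustrates how the second-scale correction removes the noise-induced bias that ruins the naive realized volatility:

```python
import numpy as np

def avg_rv(x, k):
    """[X, X]^(K): average-lag-K realized volatility, K^{-1} sum_i (X_i - X_{i-K})^2."""
    d = x[k:] - x[:-k]
    return float(np.sum(d ** 2)) / k

def tsrv(x, k):
    """Two-scale estimator <X, X>^(K) = [X,X]^(K) - (nbar_K / nbar) [X,X]^(1),
    with nbar = len(x) - 1 and nbar_K = (nbar - K + 1) / K."""
    nbar = len(x) - 1
    nbar_k = (nbar - k + 1) / k
    return avg_rv(x, k) - (nbar_k / nbar) * avg_rv(x, 1)

# latent path with constant sigma = 0.2 plus i.i.d. microstructure noise (toy values)
rng = np.random.default_rng(1)
n = 23_400
dx = 0.2 * np.sqrt(1.0 / n) * rng.standard_normal(n)
x = np.concatenate([[0.0], np.cumsum(dx)])
y = x + 0.001 * rng.standard_normal(n + 1)   # observed = latent + noise
k = int(round(n ** (2 / 3)))                 # K on the n^{2/3} scale
```

The naive estimator `avg_rv(y, 1)` is inflated by roughly $2\bar n \sigma_\epsilon^2$, while `tsrv(y, k)` recovers the integrated volatility (0.04 here) up to sampling error.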
Recall that we consider the case where $J = 1$, so that $\bar n_J = \bar n$. Let
\[
I_1 = \frac{1}{K} \sum_{l=0}^{K-1} \sqrt{\bar n_K}\Big(V^{(l)}_K - \int_0^1 \sigma_t^2\, dt\Big) - \Big(\frac{\bar n_K}{\bar n}\Big)^{3/2} \sqrt{\bar n}\Big([X, X]^{(1)}_1 - \int_0^1 \sigma_t^2\, dt\Big) + \sqrt{\bar n_K}\, R_1 + \sqrt{\bar n_K}\, R_2, \tag{A.11}
\]
and $I_2 = -\frac{\bar n_K^{3/2}}{\bar n} \int_0^1 \sigma_t^2\, dt$. We are interested in
\[
\sqrt{\bar n_K}\Big(\langle X, X\rangle_1 - \int_0^1 \sigma_t^2\, dt\Big) = I_1 + I_2.
\]
The key idea is to compute the moment generating functions of each term in $I_1$ and then use Lemma 2 to conclude.
For the first term in $I_1$: since $V^{(l)}_K$ is a realized volatility based on a discretely observed X process, with observation frequency $\bar n_K$ satisfying $\sup_{1 \le i \le \bar n_K} \bar n_K\big(v_{iK+l} - v_{(i-1)K+l}\big) \le C_\Delta$, the same argument leading to (A.6) yields, for $|\lambda| \le \frac{\sqrt{\bar n_K}}{4 C_\sigma^2 C_\Delta}$,
\[
E\Big(\exp\Big\{\lambda \sqrt{\bar n_K}\Big(V^{(l)}_K - \int_0^1 \sigma_t^2\, dt\Big)\Big\}\, I_{\{\tau_C = 1\}}\Big) \le \exp\big(2 \lambda^2 C_\sigma^4 C_\Delta^2\big). \tag{A.12}
\]
For the second term in $I_1$, we have obtained in (A.6) that
\[
E\Big(\exp\Big\{\lambda \sqrt{\bar n}\Big([X, X]^{(1)}_1 - \int_0^1 \sigma_t^2\, dt\Big)\Big\}\, I_{\{\tau_C = 1\}}\Big) \le \exp\big(2 \lambda^2 C_\sigma^4 C_\Delta^2\big), \quad \text{when } |\lambda| \le \frac{\sqrt{\bar n}}{4 C_\sigma^2 C_\Delta}. \tag{A.13}
\]
We introduce an auxiliary sequence $a_n$ that grows with $\bar n$ at a moderate rate, to facilitate the presentation in what follows; in particular, we can set $a_n = \bar n^{1/12}$. Let us now deal with $R_1$, the third term in $I_1$. Note that, from the definition and with $\sigma_X^2$ denoting the variance of the noise $\epsilon_i$,
\[
\begin{aligned}
\sqrt{\bar n_K}\, R_1 = {}& \frac{\sqrt{\bar n_K}}{K} \sum_{i=K}^{\bar n} \big(\epsilon_i - \epsilon_{i-K}\big)^2 - \frac{\bar n - K + 1}{\bar n} \cdot \frac{\sqrt{\bar n_K}}{K} \sum_{i=1}^{\bar n} \big(\epsilon_i - \epsilon_{i-1}\big)^2 \\
= {}& \frac{2\sqrt{\bar n_K}\,(\bar n - K + 1)}{K \bar n} \sum_{i=1}^{\bar n} \epsilon_i \epsilon_{i-1} - \frac{2\sqrt{\bar n_K}}{K} \sum_{i=K}^{\bar n} \epsilon_i \epsilon_{i-K} \\
& - \frac{\sqrt{\bar n_K}\sqrt{K-1}\, a_n}{K} \cdot \frac{1}{a_n \sqrt{K-1}} \sum_{i=1}^{K-1} \big(\epsilon_i^2 - \sigma_X^2\big) - \frac{\sqrt{\bar n_K}\sqrt{K-1}\, a_n}{K} \cdot \frac{1}{a_n \sqrt{K-1}} \sum_{i=\bar n - K + 1}^{\bar n - 1} \big(\epsilon_i^2 - \sigma_X^2\big) \\
& + \frac{\sqrt{\bar n_K}\,(K-1)\, a_n}{K \sqrt{\bar n}} \cdot \frac{1}{a_n \sqrt{\bar n}} \sum_{i=1}^{\bar n} \big(\epsilon_i^2 - \sigma_X^2\big) + \frac{\sqrt{\bar n_K}\,(K-1)\, a_n}{K \sqrt{\bar n}} \cdot \frac{1}{a_n \sqrt{\bar n}} \sum_{i=0}^{\bar n - 1} \big(\epsilon_i^2 - \sigma_X^2\big).
\tag{A.14}
\end{aligned}
\]
The first two terms in (A.14) are not sums of independent variables, but they can be decomposed into sums of independent random variables whose moment generating functions can be computed. To simplify the argument without losing the essential ingredient, let us focus on the first term of (A.14). We have the decomposition
\[ \sum_{i=1}^{\bar n} \epsilon_i \epsilon_{i-1} = \sum_{\text{odd } i} \epsilon_i \epsilon_{i-1} + \sum_{\text{even } i} \epsilon_i \epsilon_{i-1}, \]
and the summands in each term on the right-hand side are now independent. Therefore, we only need to calculate the moment generating function of $\epsilon_i \epsilon_{i-1}$.
For two independent normally distributed random variables $X \sim N(0, \sigma_X^2)$ and $Y \sim N(0, \sigma_Y^2)$, it can easily be computed that
\[
E\big(\exp\{\lambda \bar n^{-1/2} X Y\}\big) = \Big(\frac{1}{1 - \sigma_X^2 \sigma_Y^2 \lambda^2/\bar n}\Big)^{1/2} \le \exp\big(\sigma_X^2 \sigma_Y^2 \lambda^2/\bar n\big), \quad \text{when } |\lambda| \le \frac{\sqrt{\bar n}}{\sqrt 2\, \sigma_X \sigma_Y},
\]
where we have used the fact that $-\log(1 - x) \le 2x$ when $0 \le x \le \tfrac12$. Hence, by independence, it follows that (we assume $\bar n$ is even to simplify the presentation)
\[
E \exp\Big\{2 \lambda \bar n^{-1/2} \sum_{\text{odd } i} \epsilon_i \epsilon_{i-1}\Big\} = \Big(\frac{1}{1 - 4 \sigma_X^4 \lambda^2/\bar n}\Big)^{\bar n/4} \le \exp\big(2 \sigma_X^4 \lambda^2\big), \quad \text{when } |\lambda| \le \frac{\sqrt{\bar n}}{2\sqrt 2\, \sigma_X^2}. \tag{A.15}
\]
The second term in $R_1$ works similarly and has the same bound. For example, when $\bar n_K$ is even, one can use the decomposition
\[ \sum_{i=K}^{\bar n} \epsilon_i \epsilon_{i-K} = \sum_{j=1}^{\bar n_K/2}\ \sum_{i=2jK-K}^{2jK-1} \epsilon_i \epsilon_{i-K} + \sum_{j=1}^{\bar n_K/2}\ \sum_{i=2jK}^{2jK+K-1} \epsilon_i \epsilon_{i-K}. \]
The last four terms in (A.14) are sums of independent $\chi^2$-distributed random variables, and their moment generating functions can easily be bounded by using Lemma 1. Taking the term $\frac{1}{a_n\sqrt{K-1}} \sum_{i=1}^{K-1}\big(\epsilon_i^2 - \sigma_X^2\big)$ for example, we have
\[
E \exp\Big\{\frac{\lambda}{a_n \sqrt{K-1}} \sum_{i=1}^{K-1}\big(\epsilon_i^2 - \sigma_X^2\big)\Big\} \le \exp\big(2 \sigma_X^4 \lambda^2/a_n^2\big), \quad \text{when } |\lambda| \le \frac{a_n \sqrt{K-1}}{4 \sigma_X^2}.
\]
For the term $R_2$, we have
\[
\sqrt{\bar n_K}\, R_2 = \frac{2 a_n \bar n_K^{3/2}}{\bar n} \cdot \frac{1}{a_n}\Big(\sum_{i=1}^{\bar n} \Delta X_i\, \epsilon_{i-1} - \sum_{i=1}^{\bar n} \Delta X_i\, \epsilon_i\Big) + \frac{2}{a_n} \cdot \frac{a_n \sqrt{\bar n_K}}{K}\Big(\sum_{i=K}^{\bar n} \Delta^{(K)} X_i\, \epsilon_i - \sum_{i=K}^{\bar n} \Delta^{(K)} X_i\, \epsilon_{i-K}\Big), \tag{A.16}
\]
where $\Delta X_i = X_{v_i} - X_{v_{i-1}}$ and $\Delta^{(K)} X_i = X_{v_i} - X_{v_{i-K}}$. The first term above satisfies
\[
\begin{aligned}
E\Big(\exp\Big\{\frac{\lambda}{a_n} \sum_{i=1}^{\bar n} \Delta X_i\, \epsilon_{i-1}\Big\}\, I_{\{\tau_C = 1\}}\Big)
&= E\Big(\exp\Big\{\sum_{i=1}^{\bar n}\Big(\frac{\lambda}{a_n}\Delta X_i\Big)^2 \sigma_X^2/2\Big\}\, I_{\{\tau_C = 1\}}\Big)
\le \Big[E \exp\big\{\sigma_X^2 C_\sigma^2 C_\Delta \lambda^2 Z^2/\big(2 \bar n a_n^2\big)\big\}\Big]^{\bar n} \\
&= \Big(\frac{1}{1 - \sigma_X^2 C_\sigma^2 C_\Delta \lambda^2/\big(\bar n a_n^2\big)}\Big)^{\bar n/2}
\le \exp\big(\sigma_X^2 C_\sigma^2 C_\Delta \lambda^2/a_n^2\big), \quad \text{when } |\lambda| \le \frac{\sqrt{\bar n}\, a_n}{2 C_\sigma \sigma_X \sqrt{C_\Delta}}, \tag{A.17}
\end{aligned}
\]
where in the second line we have again used the optional sampling theorem and the law of iterated expectations as in the derivation of Lemma 3; $Z$ denotes a standard normal random variable. The second term in $R_2$ works similarly and has the same bound. For the third term, by conditioning on the X process first, we have
\[
\begin{aligned}
E\Big(\exp\Big\{\frac{\lambda a_n \sqrt{\bar n_K}}{K} \sum_{i=K}^{\bar n} \Delta^{(K)} X_i\, \epsilon_i\Big\}\, I_{\{\tau_C = 1\}}\Big)
&= E\Big(\exp\Big\{\frac{a_n^2 \lambda^2 \bar n_K \sigma_X^2}{2 K^2} \sum_{l=0}^{K-1} \sum_{j=1}^{\bar n_K} \big(\Delta^{(K)} X_{jK+l}\big)^2\Big\}\, I_{\{\tau_C = 1\}}\Big) \\
&\le \prod_{l=0}^{K-1}\Big[E\Big(\exp\Big\{\frac{a_n^2 \lambda^2 \bar n_K \sigma_X^2}{2 K} \sum_{j=1}^{\bar n_K} \big(\Delta^{(K)} X_{jK+l}\big)^2\Big\}\, I_{\{\tau_C = 1\}}\Big)\Big]^{1/K} \\
&\le \Big(\frac{1}{1 - a_n^2 \lambda^2 \sigma_X^2 C_\sigma^2 C_\Delta/K}\Big)^{\bar n_K/2}
\le \exp\Big(\frac{a_n^2 \lambda^2 \bar n_K \sigma_X^2 C_\sigma^2 C_\Delta}{K}\Big), \quad \text{when } |\lambda| \le \frac{\sqrt K}{\sqrt 2\, a_n \sigma_X C_\sigma \sqrt{C_\Delta}}, \tag{A.18}
\end{aligned}
\]
where we have used Hölder's inequality in the second line. The fourth term works similarly and has the same bound.
Combining the results for all the terms (A.12)-(A.18) together, and applying Lemma 2 to $I_1$, we conclude that the conditions of Lemma 2 are satisfied with $A = \{\tau_C = 1\}$, $C_1 = C_{1,x} \sqrt{\bar n_K}$, where
\[
C_{1,x} = \min\Big\{\frac{1}{4 C_\sigma^2 C_\Delta},\ \frac{\sqrt{\bar n/\bar n_K}}{2\sqrt 2\, \sigma_X^2},\ \frac{a_n \sqrt{(K-1)/\bar n_K}}{4 \sigma_X^2},\ \frac{a_n \sqrt{\bar n/\bar n_K}}{2 C_\sigma \sigma_X \sqrt{C_\Delta}},\ \frac{\sqrt{K/\bar n_K}}{\sqrt 2\, a_n \sigma_X C_\sigma \sqrt{C_\Delta}}\Big\}, \tag{A.19}
\]
and
\[
C_2 = \max\Big\{2 C_\sigma^4 C_\Delta^2,\ 2 \sigma_X^4,\ 2 \sigma_X^4/a_n^2,\ \sigma_X^2 C_\sigma^2 C_\Delta/a_n^2,\ \frac{a_n^2 \bar n_K \sigma_X^2 C_\sigma^2 C_\Delta}{K}\Big\} \tag{A.20}
\]
\[
= \max\big\{2 C_\sigma^4 C_\Delta^2,\ 2 \sigma_X^4\big\} \ \text{for big enough } \bar n,
\quad = 2 C_\sigma^4 C_\Delta^2 \ \text{in the typical case when } C_\Delta \ge 1 \text{ and } C_\sigma \ge \sigma_X.
\]
Let $w = 8$, which is larger, when $\bar n$ is sufficiently large, than
\[
\underbrace{\frac{1}{K} \sum_{l=0}^{K-1} 1 + \Big(\frac{\bar n_K}{\bar n}\Big)^{3/2}}_{\text{coef.\ of the first two terms of } I_1}
+ \underbrace{\frac{2 \bar n_K^{3/2}}{\sqrt{\bar n}} + \frac{2 \sqrt{\bar n_K (K-1)}\, a_n}{K} + \frac{2 \sqrt{\bar n_K}\, (K-1)\, a_n}{K \sqrt{\bar n}}}_{\text{controls the coefficients in (A.14)}}
+ \underbrace{\frac{4 a_n \bar n_K^{3/2}}{\bar n} + \frac{4}{a_n}}_{\text{coefficients in (A.16)}}.
\]
Set $C_{I_1} = \big(4 C_2 w^2\big)^{-1}$. By Lemma 2, when $0 \le x \le 2 C_{1,x} C_2 \sqrt{\bar n_K}$,
\[ P\big(\{|I_1| > x\} \cap \{\tau_C = 1\}\big) \le 2 \exp\big(-C_{I_1} x^2\big). \]
Hence, when $0 \le x \le 4 C_{1,x} C_2 \sqrt{\bar n_K}$,
\[ P\Big(\Big\{|I_1| > \frac x2\Big\} \cap \{\tau_C = 1\}\Big) \le 2 \exp\Big(-\frac{C_{I_1}}{4}\, x^2\Big). \tag{A.21} \]
For the term $I_2$, let $C_{I_2} = 2^{3/2} C_\sigma^2$. By Condition 4, on the set $\{\tau_C = 1\}$,
\[ |I_2| = \frac{\bar n_K^{3/2}}{\bar n} \int_0^1 \sigma_t^2\, dt \le C_{I_2}/\sqrt{\bar n}. \tag{A.22} \]
Hence
\[ P\Big(\Big\{|I_2| > \frac x2\Big\} \cap \{\tau_C = 1\}\Big) \le \begin{cases} 0, & \text{if } x > 2 C_{I_2}/\sqrt{\bar n}, \\ 1, & \text{if } x \le 2 C_{I_2}/\sqrt{\bar n}. \end{cases} \]
Since for all large $\bar n$ we have $1 \le 2 \exp\big(-C_{I_1} C_{I_2}^2/\bar n\big)$, and the latter is smaller than $2 \exp\big(-\frac{C_{I_1}}{4} x^2\big)$ when $x \le \frac{2 C_{I_2}}{\sqrt{\bar n}}$, we obtain that, for all large $\bar n$,
\[ P\Big(\Big\{|I_2| > \frac x2\Big\} \cap \{\tau_C = 1\}\Big) \le 2 \exp\Big(-\frac{C_{I_1}}{4}\, x^2\Big). \tag{A.23} \]
Combining (A.21) and (A.23), we have, when $0 \le x \le 4 C_{1,x} C_2 \sqrt{\bar n_K}$,
\[
\begin{aligned}
P\Big(\Big\{\sqrt{\bar n_K}\,\Big|\langle X, X\rangle_1 - \int_0^1 \sigma_t^2\, dt\Big| > x\Big\} \cap \{\tau_C = 1\}\Big)
&= P\big(\{|I_1 + I_2| > x\} \cap \{\tau_C = 1\}\big) \\
&\le P\big(\{|I_1| > x/2\} \cap \{\tau_C = 1\}\big) + P\big(\{|I_2| > x/2\} \cap \{\tau_C = 1\}\big) \\
&\le 4 \exp\Big(-\frac{C_{I_1}}{4}\, x^2\Big).
\end{aligned}
\]
By Condition 4 again, $\sqrt{\bar n_K} \ge \bar n^{1/6}/\sqrt 2$, and hence, when $0 \le x \le c\, \bar n^{1/6}$,
\[
\begin{aligned}
P\Big(\Big\{\bar n^{1/6}\Big|\langle X, X\rangle_1 - \int_0^1 \sigma_t^2\, dt\Big| > x\Big\} \cap \{\tau_C = 1\}\Big)
&\le P\Big(\Big\{\sqrt{\bar n_K}\,\Big|\langle X, X\rangle_1 - \int_0^1 \sigma_t^2\, dt\Big| > x/\sqrt 2\Big\} \cap \{\tau_C = 1\}\Big) \\
&\le 4 \exp\big(-C x^2\big), \tag{A.24}
\end{aligned}
\]
where $c = 4\sqrt 2\, C_{1,x} C_2$ and $C = \frac{C_{I_1}}{8} = \big(32 C_2 w^2\big)^{-1}$. For big enough $\bar n$, and in the typical case when $C_\Delta \ge 1$ and $C_\sigma \ge \sigma_X$, we have
\[ c = 2\sqrt 2\, C_\sigma^2 C_\Delta \quad \text{and} \quad C = \frac{1}{64 w^2 C_\sigma^4 C_\Delta^2}. \tag{A.25} \]
Notice that this conditional result depends only on the observation frequency $\bar n$ and not on the locations of the observation times; hence, as long as Condition 3 is satisfied, (A.24) holds unconditionally on the set of the observation times. This proves the first half of Theorem 1, for the case of bounded volatility ($\sigma_t \le C_\sigma$).
For the second half of the theorem, we have
\[
\begin{aligned}
P\Big(\bar n^{1/6}\Big|\langle X, X\rangle_1 - \int_0^1 \sigma_t^2\, dt\Big| > x\Big)
&\le P\Big(\Big\{\bar n^{1/6}\Big|\langle X, X\rangle_1 - \int_0^1 \sigma_t^2\, dt\Big| > x\Big\} \cap \{\tau_C = 1\}\Big) + P\big(\{\tau_C < 1\}\big) \\
&< 4 \exp\Big(-\frac{x^2}{64 w^2 C_\sigma^4 C_\Delta^2}\Big) + k \exp\big\{-a C_\sigma^b\big\},
\end{aligned}
\]
when $0 \le x \le 2\sqrt 2\, C_\sigma^2 C_\Delta\, \bar n^{1/6}$.
Letting $C_\sigma = \big(\frac{x^2}{64 w^2 a C_\Delta^2}\big)^{\frac{1}{4+b}}$, we have, when $0 \le x \le 2^{\frac{3b - 12}{2b}}\, C_\Delta\, \big(w^2 a\big)^{-2/b}\, \bar n^{\frac{4+b}{6b}}$,
\[
P\Big(\bar n^{1/6}\Big|\langle X, X\rangle_1 - \int_0^1 \sigma_t^2\, dt\Big| > x\Big) \le (4 + k) \exp\Big\{-(64 w^2)^{-\frac{b}{4+b}}\, a^{\frac{4}{4+b}}\, C_\Delta^{-\frac{2b}{4+b}}\, x^{\frac{2b}{4+b}}\Big\}. \tag{A.26}
\]
Remark 3. In the above proof, we have demonstrated, by using a sequence $a_n$ that goes to $\infty$ at a moderate rate, that one can eliminate the impact of the small-order terms on the choices of the constants, as long as those terms have moment generating functions satisfying inequalities of the form (A.2). We will use this technique again in the next subsection.
Proof of Theorem 2
We again conduct all the analysis assuming that the observation times are given. Our final result holds because the conditional result does not depend on the locations of the observation times, as long as Condition 3 is satisfied.
Recall the notation for the observation times introduced in Section 3.2. Define
\[ Z^+ = X + Y \quad \text{and} \quad Z^- = X - Y. \]
$Z^+$ and $Z^-$ are again Itô processes, driven by the Brownian motions
\[ dW^+_t = \frac{\sigma^{(X)}_t\, dB^{(X)}_t + \sigma^{(Y)}_t\, dB^{(Y)}_t}{\sqrt{\big(\sigma^{(X)}_t\big)^2 + \big(\sigma^{(Y)}_t\big)^2 + 2 \rho_t\, \sigma^{(X)}_t \sigma^{(Y)}_t}} \]
and
\[ dW^-_t = \frac{\sigma^{(X)}_t\, dB^{(X)}_t - \sigma^{(Y)}_t\, dB^{(Y)}_t}{\sqrt{\big(\sigma^{(X)}_t\big)^2 + \big(\sigma^{(Y)}_t\big)^2 - 2 \rho_t\, \sigma^{(X)}_t \sigma^{(Y)}_t}}. \]
$W^+$ and $W^-$ are standard Brownian motions. Writing
\[ \sigma^{Z^+}_t = \sqrt{\big(\sigma^{(X)}_t\big)^2 + \big(\sigma^{(Y)}_t\big)^2 + 2 \rho_t\, \sigma^{(X)}_t \sigma^{(Y)}_t} \quad \text{and} \quad \sigma^{Z^-}_t = \sqrt{\big(\sigma^{(X)}_t\big)^2 + \big(\sigma^{(Y)}_t\big)^2 - 2 \rho_t\, \sigma^{(X)}_t \sigma^{(Y)}_t}, \]
we have
\[ dZ^+_t = \sigma^{Z^+}_t\, dW^+_t \quad \text{and} \quad dZ^-_t = \sigma^{Z^-}_t\, dW^-_t, \]
with $0 \le \sigma^{Z^+}_t, \sigma^{Z^-}_t \le 2 C_\sigma$ when $\sigma^{(X)}_t$, $\sigma^{(Y)}_t$ are bounded by $C_\sigma$, or
\[
P\Big\{\sup_{0 \le t \le 1} \sigma^{Z^\pm}_t \ge 2 C_\sigma\Big\} \le P\Big\{\sup_{0 \le t \le 1} \sigma^{(X)}_t \ge C_\sigma\Big\} + P\Big\{\sup_{0 \le t \le 1} \sigma^{(Y)}_t \ge C_\sigma\Big\} \le 2 k \exp\big\{-a C_\sigma^b\big\},
\]
when $\sigma^{(X)}_t$, $\sigma^{(Y)}_t$ are such that their tail probabilities are bounded as in Condition 2. In fact, let $\tau_C := \inf\{t : \sup_{0 \le s \le t} \sigma^{(X)}_s > C_\sigma \text{ or } \sup_{0 \le s \le t} \sigma^{(Y)}_s > C_\sigma\} \wedge 1$. We have
\[ P\{\tau_C < 1\} \le P\Big\{\sup_{0 \le t \le 1} \sigma^{(X)}_t \ge C_\sigma\Big\} + P\Big\{\sup_{0 \le t \le 1} \sigma^{(Y)}_t \ge C_\sigma\Big\} \le 2 k \exp\big\{-a C_\sigma^b\big\}. \]
For the observed $Z^+$ and $Z^-$ processes, we have
\[ Z^{+,o}_{v_i} = X^o_{t_i} + Y^o_{s_i} = Z^+_{v_i} + \epsilon_{i,+} \quad \text{and} \quad Z^{-,o}_{v_i} = X^o_{t_i} - Y^o_{s_i} = Z^-_{v_i} + \epsilon_{i,-}, \]
where $t_i$ and $s_i$ are the last ticks at or before $v_i$, and
\[ \epsilon_{i,+} = X_{t_i} - X_{v_i} + Y_{s_i} - Y_{v_i} + \epsilon^X_i + \epsilon^Y_i, \qquad \epsilon_{i,-} = X_{t_i} - X_{v_i} - Y_{s_i} + Y_{v_i} + \epsilon^X_i - \epsilon^Y_i. \]
Note that $\langle X, Y\rangle_1 = \frac14\big(\langle Z^+, Z^+\rangle_1 - \langle Z^-, Z^-\rangle_1\big)$. We can first prove results analogous to those in Theorem 1 for $\langle Z^+, Z^+\rangle_1$ and $\langle Z^-, Z^-\rangle_1$, and then use the results to obtain the final conclusion for the TSCV.
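The reduction of the TSCV to two TSRVs via the polarization identity $\langle X, Y\rangle_1 = \frac14(\langle Z^+, Z^+\rangle_1 - \langle Z^-, Z^-\rangle_1)$ can be sketched as follows (a toy illustration with hypothetical parameters and synchronous observations; the TSRV helper is restated so the snippet is self-contained):

```python
import numpy as np

def avg_rv(x, k):
    """[X, X]^(K) = K^{-1} sum_i (X_i - X_{i-K})^2."""
    d = x[k:] - x[:-k]
    return float(np.sum(d ** 2)) / k

def tsrv(x, k):
    """Two-scale estimator <X, X>^(K), as in (A.9) with J = 1."""
    nbar = len(x) - 1
    return avg_rv(x, k) - ((nbar - k + 1) / k / nbar) * avg_rv(x, 1)

def tscv(x, y, k):
    """<X, Y>_1 = ( <Z+, Z+>_1 - <Z-, Z->_1 ) / 4, with Z+- = X +- Y."""
    return (tsrv(x + y, k) - tsrv(x - y, k)) / 4.0

# two correlated latent paths observed with additive noise (hypothetical toy setup)
rng = np.random.default_rng(2)
n = 23_400
dw1 = rng.standard_normal(n) * np.sqrt(1.0 / n)
dw2 = 0.6 * dw1 + 0.8 * rng.standard_normal(n) * np.sqrt(1.0 / n)  # correlation 0.6
x = np.concatenate([[0.0], np.cumsum(0.2 * dw1)])
y = np.concatenate([[0.0], np.cumsum(0.3 * dw2)])
xo = x + 0.001 * rng.standard_normal(n + 1)
yo = y + 0.001 * rng.standard_normal(n + 1)
k = int(round(n ** (2 / 3)))
```

The true integrated covariance here is 0.2 x 0.3 x 0.6 = 0.036, which `tscv(xo, yo, k)` recovers up to sampling error.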
For $\langle Z^+, Z^+\rangle_1$, the derivation differs from that of Theorem 1 only in the terms that involve the noise, namely $\sqrt{\bar n_K}\, R_1$ and $\sqrt{\bar n_K}\, R_2$. Write $\delta^X_i = X_{t_i} - X_{v_i}$ and $\delta^Y_i = Y_{s_i} - Y_{v_i}$. Then the first term in $\sqrt{\bar n_K}\, R_1$ becomes
\[
\begin{aligned}
\frac{\sqrt{\bar n_K}\, \bar n_K}{\bar n} \sum_{i=1}^{\bar n} \epsilon_{i,+}\, \epsilon_{i-1,+}
= \frac{\sqrt{\bar n_K}\, \bar n_K}{\bar n} \sum_{i=1}^{\bar n}\Big[&\delta^X_i \delta^X_{i-1} + \delta^X_i \delta^Y_{i-1} + \delta^X_i\big(\epsilon^X_{i-1} + \epsilon^Y_{i-1}\big) + \delta^Y_i \delta^X_{i-1} + \delta^Y_i \delta^Y_{i-1} + \delta^Y_i\big(\epsilon^X_{i-1} + \epsilon^Y_{i-1}\big) \\
&+ \big(\epsilon^X_i + \epsilon^Y_i\big)\delta^X_{i-1} + \big(\epsilon^X_i + \epsilon^Y_i\big)\delta^Y_{i-1} + \big(\epsilon^X_i + \epsilon^Y_i\big)\big(\epsilon^X_{i-1} + \epsilon^Y_{i-1}\big)\Big].
\end{aligned}
\]
The only $O_P(1)$ term is the last one, which involves only independent normals and can be dealt with in the same way as before (again assuming $\bar n$ even for simplicity of presentation):
\[
\begin{aligned}
E \exp\Big\{2 \lambda \bar n^{-1/2} \sum_{\text{odd } i}\big(\epsilon^X_i + \epsilon^Y_i\big)\big(\epsilon^X_{i-1} + \epsilon^Y_{i-1}\big)\Big\}
&= E \exp\Big\{2 \lambda \bar n^{-1/2} \sum_{\text{even } i}\big(\epsilon^X_i + \epsilon^Y_i\big)\big(\epsilon^X_{i-1} + \epsilon^Y_{i-1}\big)\Big\} \\
&= \Big(\frac{1}{1 - 4\big(\sigma_X^2 + \sigma_Y^2\big)^2 \lambda^2/\bar n}\Big)^{\bar n/4}
\le \exp\Big\{2\big(\sigma_X^2 + \sigma_Y^2\big)^2 \lambda^2\Big\}, \quad \text{when } |\lambda| \le \frac{\sqrt{\bar n}}{2\sqrt 2\,\big(\sigma_X^2 + \sigma_Y^2\big)}.
\end{aligned}
\]
The other terms are of a smaller order of magnitude. By applying an $a_n$ sequence which grows moderately with $\bar n$, as in the proof of Theorem 1 (we can again set $a_n = \bar n^{1/12}$), one sees easily that, as long as the moment generating functions of these terms can be suitably bounded as in (A.2), their exact bounds have no effect on our choice of $C_1$, $C_2$, or $w$. To show the bounds for the moment generating functions, first note that, for any positive number $a$ and real-valued $b$, by the optional sampling theorem (applied to the submartingales $\exp(a B_s^2)$ and $\exp(b \tilde y B_s)$, with the bounded stopping time $[\tilde X]_u \le C_\sigma^2 C_\Delta/\bar n$, for a real number $\tilde y$), we have
\[
E\big(\exp\big\{a \big(\delta^X_i\big)^2\big\}\, I_{\{\tau_C = 1\}}\, \big|\, \mathcal F_{i-1}\big)
\le E\big(\exp\big\{a C_\sigma^2 C_\Delta Z^2/\bar n\big\}\big)
= \Big(\frac{1}{1 - 2 a C_\sigma^2 C_\Delta/\bar n}\Big)^{1/2}, \tag{A.27}
\]
for $Z \sim N(0, 1)$, where $\mathcal F_i$ is the information collected up to time $v_i$. Inequality (A.27) also holds when $\delta^X_i$ is replaced by $\delta^Y_i$. Similarly,
\[
\begin{aligned}
E\big(\exp\big\{b\, \delta^X_i \delta^Y_{i-1}\big\}\, I_{\{\tau_C = 1\}}\, \big|\, \mathcal F_{i-2}\big)
&\le E\Big(E\big(\exp\big\{b\, \delta^X_i \delta^Y_{i-1}\big\}\, I_{\{\tau_C = 1\}}\, \big|\, \mathcal F_{i-1}\big)\, \Big|\, \mathcal F_{i-2}\Big) \\
&\le E\Big(\exp\big\{b^2 C_\sigma^2 C_\Delta \big(\delta^Y_{i-1}\big)^2/(2 \bar n)\big\}\, I_{\{\tau_C = 1\}}\, \Big|\, \mathcal F_{i-2}\Big)
\le \Big(\frac{1}{1 - b^2 C_\sigma^4 C_\Delta^2/\bar n^2}\Big)^{1/2}. \tag{A.28}
\end{aligned}
\]
Inequalities (A.27) and (A.28) can be used to obtain the bounds we need. For example, by (A.28) and the law of iterated expectations,
\[
E\Big(\exp\Big\{\lambda \sum_{\text{odd } i} \delta^X_i \delta^Y_{i-1}\Big\}\, I_{\{\tau_C = 1\}}\Big)
\le \Big(\frac{1}{1 - \lambda^2 C_\sigma^4 C_\Delta^2/\bar n^2}\Big)^{\bar n/4}
\le \exp\Big(\frac{\lambda^2 C_\sigma^4 C_\Delta^2}{2 \bar n}\Big), \quad \text{when } |\lambda| \le \frac{\bar n}{\sqrt 2\, C_\sigma^2 C_\Delta},
\]
and by independence, normality of the noise, the law of iterated expectations, and (A.27), we have
\[
\begin{aligned}
E\Big(\exp\Big\{\frac{\lambda}{a_n} \sum_{i=1}^{\bar n} \delta^X_i\big(\epsilon^X_{i-1} + \epsilon^Y_{i-1}\big)\Big\}\, I_{\{\tau_C = 1\}}\Big)
&= E\Big(\exp\Big\{\sum_{i=1}^{\bar n}\Big(\frac{\lambda}{a_n}\delta^X_i\Big)^2 \frac{\sigma_X^2 + \sigma_Y^2}{2}\Big\}\, I_{\{\tau_C = 1\}}\Big) \\
&\le \Big(\frac{1}{1 - \big(\sigma_X^2 + \sigma_Y^2\big) \lambda^2 C_\sigma^2 C_\Delta/\big(\bar n a_n^2\big)}\Big)^{\bar n/2}
\le \exp\Big\{\big(\sigma_X^2 + \sigma_Y^2\big)\, C_\sigma^2 C_\Delta\, \lambda^2/a_n^2\Big\},
\end{aligned}
\]
when $|\lambda| \le \frac{\sqrt{\bar n}\, a_n}{C_\sigma \sqrt{2 C_\Delta\big(\sigma_X^2 + \sigma_Y^2\big)}}$.
Similar results can be obtained for the other terms above with the same techniques. The second term in $\sqrt{\bar n_K}\, R_1$ works similarly and has the same bound. The other terms in $\sqrt{\bar n_K}\, R_1$ (with $\sigma_X^2$ replaced by $\sigma_X^2 + \sigma_Y^2$) and the whole term $\sqrt{\bar n_K}\, R_2$ are of order $o_p(1)$ and have good tail behaviors. Again, by using the sequence $a_n$, we can conclude that their exact bounds will not matter in our choice of the constants, and we only need to show that their moment generating functions are appropriately bounded as in (A.2). The arguments needed to prove inequalities of the form (A.2) for each element of these terms are similar to those presented above, and are omitted here.
Hence, still letting $w = 8$ and redefining
\[ C_{1,x} = \frac{1}{4\,(2 C_\sigma)^2\, C_\Delta} \quad \text{and} \quad C_2 = \max\Big\{2\,(2 C_\sigma)^4\, C_\Delta^2,\ 2\big(\sigma_X^2 + \sigma_Y^2\big)^2\Big\} = 32\, C_\sigma^4 C_\Delta^2 \]
(in the typical case when $C_\Delta \ge 1$ and $C_\sigma \ge \sigma_X, \sigma_Y$),
we have, when $0 \le x \le c^\star\, \bar n^{1/6}$,
\[ P\Big(\Big\{\bar n^{1/6}\Big|\langle Z^+, Z^+\rangle_1 - \int_0^1 \big(\sigma^{Z^+}_t\big)^2 dt\Big| > x\Big\} \cap \{\tau_C = 1\}\Big) \le 4 \exp\big(-C^\star x^2\big), \]
and
\[ P\Big(\Big\{\bar n^{1/6}\Big|\langle Z^-, Z^-\rangle_1 - \int_0^1 \big(\sigma^{Z^-}_t\big)^2 dt\Big| > x\Big\} \cap \{\tau_C = 1\}\Big) \le 4 \exp\big(-C^\star x^2\big), \]
where $c^\star = 4\sqrt 2\, C_{1,x} C_2$ and $C^\star = \big(32 C_2 w^2\big)^{-1}$.
Finally, for the TSCV, when $0 \le x \le c\, \bar n^{1/6}$,
\[
\begin{aligned}
&P\Big(\Big\{\bar n^{1/6}\Big|\langle X, Y\rangle_1 - \int_0^1 \sigma^{(X)}_t \sigma^{(Y)}_t \rho^{(X,Y)}_t\, dt\Big| > x\Big\} \cap \{\tau_C = 1\}\Big) \\
&\quad \le P\Big(\Big\{\bar n^{1/6}\Big|\langle Z^+, Z^+\rangle_1 - \int_0^1 \big(\sigma^{Z^+}_t\big)^2 dt\Big| > 2x\Big\} \cap \{\tau_C = 1\}\Big) \\
&\qquad + P\Big(\Big\{\bar n^{1/6}\Big|\langle Z^-, Z^-\rangle_1 - \int_0^1 \big(\sigma^{Z^-}_t\big)^2 dt\Big| > 2x\Big\} \cap \{\tau_C = 1\}\Big)
\le 8 \exp\big(-C x^2\big),
\end{aligned}
\]
where $c = c^\star/2 = 2\sqrt 2\, C_{1,x} C_2$ and $C = 4 C^\star = \big(8 C_2 w^2\big)^{-1}$. For big enough $\bar n$, and in the typical case when $C_\Delta \ge 1$ and $C_\sigma \ge \sigma_X, \sigma_Y$, we have
\[ c = 4\sqrt 2\, C_\sigma^2 C_\Delta \quad \text{and} \quad C = \big(256\, w^2 C_\sigma^4 C_\Delta^2\big)^{-1}. \tag{A.29} \]
This completes the proof of the first half of the statement of Theorem 2, for the case of bounded volatility.
For the second half of the theorem, we have
\[
\begin{aligned}
P\Big(\bar n^{1/6}\Big|\langle X, Y\rangle_1 - \int_0^1 \sigma^{(X)}_t \sigma^{(Y)}_t \rho^{(X,Y)}_t\, dt\Big| > x\Big)
&\le P\Big(\Big\{\bar n^{1/6}\Big|\langle X, Y\rangle_1 - \int_0^1 \sigma^{(X)}_t \sigma^{(Y)}_t \rho^{(X,Y)}_t\, dt\Big| > x\Big\} \cap \{\tau_C = 1\}\Big) + P\big(\{\tau_C < 1\}\big) \\
&< 8 \exp\Big\{-\big(256\, w^2 C_\sigma^4 C_\Delta^2\big)^{-1} x^2\Big\} + 2 k \exp\big\{-a C_\sigma^b\big\},
\end{aligned}
\]
when $0 \le x \le 4\sqrt 2\, C_\sigma^2 C_\Delta\, \bar n^{1/6}$.
Letting $C_\sigma = \big(\frac{x^2}{256\, w^2 a C_\Delta^2}\big)^{\frac{1}{4+b}}$, the above inequality becomes
\[
P\Big(\bar n^{1/6}\Big|\langle X, Y\rangle_1 - \int_0^1 \sigma^{(X)}_t \sigma^{(Y)}_t \rho^{(X,Y)}_t\, dt\Big| > x\Big)
\le (8 + 2k) \exp\Big\{-(256\, w^2)^{-\frac{b}{4+b}}\, a^{\frac{4}{4+b}}\, C_\Delta^{-\frac{2b}{4+b}}\, x^{\frac{2b}{4+b}}\Big\}, \tag{A.30}
\]
when $0 \le x \le 2^{\frac{5b - 12}{2b}}\, C_\Delta\, \big(w^2 a\big)^{-2/b}\, \bar n^{\frac{4+b}{6b}}$.
Remark 4. Note that the argument is not restricted to the TSCV based on the pairwise-refresh times; it works the same (only with $\bar n$ replaced by $\bar n$