Modelling Volatility and Correlation: 'Introductory Econometrics for Finance' © Chris Brooks 2008
[Figure omitted: time-series plot with a Date axis running from 11/01/93 to 9/01/97.]
Models with nonlinear $g(\cdot)$ are non-linear in mean, while those with
nonlinear $\sigma^2(\cdot)$ are non-linear in variance.
Heteroscedasticity Revisited
An example of a structural model is
$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t$
with $u_t \sim N(0, \sigma_u^2)$.
The assumption that the variance of the errors is constant is known as
homoscedasticity, i.e. $\mathrm{Var}(u_t) = \sigma_u^2$.
What if the variance of the errors is not constant?
- heteroscedasticity
- would imply that standard error estimates could be wrong.
Is the variance of the errors likely to be constant over time? Not for financial
data.
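Instead of calling the variance $\sigma_t^2$, in the literature it is usually called
$h_t$, so the model is
$y_t = \beta_1 + \beta_2 x_{2t} + \ldots + \beta_k x_{kt} + u_t$, $u_t \sim N(0, h_t)$
where $h_t = \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_2 u_{t-2}^2 + \ldots + \alpha_q u_{t-q}^2$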
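or, equivalently (shown here for the ARCH(1) case),
$u_t = v_t \sigma_t$, $\sigma_t = \sqrt{\alpha_0 + \alpha_1 u_{t-1}^2}$, $v_t \sim N(0,1)$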
The two are different ways of expressing exactly the same model. The
first form is easier to understand while the second form is required for
simulating from an ARCH model, for example.
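The second form can be turned into a simulation almost line by line. The following is a minimal sketch, assuming an ARCH(1) process; the function name `simulate_arch1`, the parameter values and the start-up value $u_0 = 0$ are illustrative assumptions, not part of the original slides.

```python
import math
import random

def simulate_arch1(n, alpha0, alpha1, seed=0):
    """Simulate u_t = v_t * sigma_t with sigma_t^2 = alpha0 + alpha1 * u_{t-1}^2."""
    rng = random.Random(seed)
    u_prev = 0.0  # assumed start-up value for u_0
    u, sigma2 = [], []
    for _ in range(n):
        s2 = alpha0 + alpha1 * u_prev ** 2  # conditional variance for this period
        v = rng.gauss(0.0, 1.0)             # v_t ~ N(0, 1)
        u_t = v * math.sqrt(s2)             # u_t = v_t * sigma_t
        u.append(u_t)
        sigma2.append(s2)
        u_prev = u_t
    return u, sigma2

# Hypothetical parameter values chosen only for illustration.
u, sigma2 = simulate_arch1(1000, alpha0=0.2, alpha1=0.5)
```

Because each $\sigma_t^2$ is built from the previous simulated $u_{t-1}$, large shocks tend to be followed by high-variance periods, which is the clustering the model is designed to capture.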
Note that the ARCH test is also sometimes applied directly to returns
instead of the residuals from Stage 1 above.
How do we decide on q?
The required value of q might be very large
Non-negativity constraints might be violated.
When we estimate an ARCH model, we require $\alpha_i > 0$ for all $i = 1, 2, \ldots, q$
(since variance cannot be negative)
$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i u_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2$$
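$$\mathrm{Var}(u_t) = \frac{\alpha_0}{1 - (\alpha_1 + \beta)}$$
when $\alpha_1 + \beta < 1$
$\alpha_1 + \beta = 1$ (the unit root in variance case, known as integrated GARCH)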
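1. Specify the appropriate equations for the mean and the variance, e.g. an
   AR(1)-GARCH(1,1) model:
   $y_t = \mu + \phi y_{t-1} + u_t$, $u_t \sim N(0, \sigma_t^2)$
   $\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta \sigma_{t-1}^2$
2. Specify the log-likelihood function to maximise.
3. The computer will maximise the function and give parameter values and
   their standard errors.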
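$$f(y_1, y_2, \ldots, y_T \mid \beta_1 + \beta_2 X_t, \sigma^2) = \prod_{t=1}^{T} f(y_t \mid \beta_1 + \beta_2 X_t, \sigma^2) \qquad (2)$$
$$LF(\beta_1, \beta_2, \sigma^2) = \frac{1}{\sigma^T (\sqrt{2\pi})^T} \exp\left\{ -\frac{1}{2} \sum_{t=1}^{T} \frac{(y_t - \beta_1 - \beta_2 x_t)^2}{\sigma^2} \right\} \qquad (4)$$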
Then, using the various laws for transforming functions containing logarithms, we
obtain the log-likelihood function, LLF:
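$$LLF = -T \log \sigma - \frac{T}{2} \log(2\pi) - \frac{1}{2} \sum_{t=1}^{T} \frac{(y_t - \beta_1 - \beta_2 x_t)^2}{\sigma^2}$$
which is equivalent to
$$LLF = -\frac{T}{2} \log \sigma^2 - \frac{T}{2} \log(2\pi) - \frac{1}{2} \sum_{t=1}^{T} \frac{(y_t - \beta_1 - \beta_2 x_t)^2}{\sigma^2} \qquad (5)$$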
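Differentiating (5) with respect to $\beta_1$, $\beta_2$ and $\sigma^2$ gives
$$\frac{\partial LLF}{\partial \beta_1} = -\frac{1}{2} \sum_{t=1}^{T} \frac{(y_t - \beta_1 - \beta_2 x_t) \cdot 2 \cdot (-1)}{\sigma^2} \qquad (6)$$
$$\frac{\partial LLF}{\partial \beta_2} = -\frac{1}{2} \sum_{t=1}^{T} \frac{(y_t - \beta_1 - \beta_2 x_t) \cdot 2 \cdot (-x_t)}{\sigma^2} \qquad (7)$$
$$\frac{\partial LLF}{\partial \sigma^2} = -\frac{T}{2} \frac{1}{\sigma^2} + \frac{1}{2} \sum_{t=1}^{T} \frac{(y_t - \beta_1 - \beta_2 x_t)^2}{\sigma^4} \qquad (8)$$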
Setting (6)-(8) to zero to maximise the function, and putting hats above the
parameters to denote the maximum likelihood estimators,
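From (6),
$$\begin{aligned}
\sum (y_t - \hat\beta_1 - \hat\beta_2 x_t) &= 0 \\
\sum y_t - T\hat\beta_1 - \hat\beta_2 \sum x_t &= 0 \\
\frac{1}{T}\sum y_t - \hat\beta_1 - \hat\beta_2 \frac{1}{T}\sum x_t &= 0 \\
\hat\beta_1 &= \bar{y} - \hat\beta_2 \bar{x} \qquad (9)
\end{aligned}$$
From (7),
$$\begin{aligned}
\sum (y_t - \hat\beta_1 - \hat\beta_2 x_t) x_t &= 0 \\
\sum y_t x_t - \hat\beta_1 \sum x_t - \hat\beta_2 \sum x_t^2 &= 0 \\
\hat\beta_2 \sum x_t^2 &= \sum y_t x_t - (\bar{y} - \hat\beta_2 \bar{x}) \sum x_t \\
\hat\beta_2 \sum x_t^2 &= \sum y_t x_t - T\bar{x}\bar{y} + \hat\beta_2 T \bar{x}^2 \\
\hat\beta_2 \left( \sum x_t^2 - T\bar{x}^2 \right) &= \sum y_t x_t - T\bar{x}\bar{y} \\
\hat\beta_2 &= \frac{\sum y_t x_t - T\bar{x}\bar{y}}{\sum x_t^2 - T\bar{x}^2} \qquad (10)
\end{aligned}$$
From (8),
$$\frac{T}{\hat\sigma^2} = \frac{1}{\hat\sigma^4} \sum (y_t - \hat\beta_1 - \hat\beta_2 x_t)^2$$
Rearranging,
$$\hat\sigma^2 = \frac{1}{T} \sum (y_t - \hat\beta_1 - \hat\beta_2 x_t)^2 = \frac{1}{T} \sum \hat{u}_t^2 \qquad (11)$$
For comparison, the OLS estimator of the error variance divides by $T - k$ rather than $T$:
$$\hat\sigma^2 = \frac{1}{T-k} \sum \hat{u}_t^2$$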
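The closed-form results (9) to (11) can be checked numerically. The sketch below computes the ML estimates for the bivariate regression and also returns the variance estimate with a $T - k$ divisor ($k = 2$ here) for comparison. The function name `ml_bivariate` and the five data points are made up purely for illustration.

```python
def ml_bivariate(x, y):
    """Return (beta1_hat, beta2_hat, sigma2_ml, sigma2_ols) for y_t = beta1 + beta2 x_t + u_t."""
    T = len(x)
    xbar = sum(x) / T
    ybar = sum(y) / T
    # Equation (10): slope estimate.
    beta2 = (sum(a * b for a, b in zip(x, y)) - T * xbar * ybar) / (
        sum(a * a for a in x) - T * xbar ** 2
    )
    # Equation (9): intercept estimate.
    beta1 = ybar - beta2 * xbar
    rss = sum((b - beta1 - beta2 * a) ** 2 for a, b in zip(x, y))
    sigma2_ml = rss / T         # equation (11), divisor T
    sigma2_ols = rss / (T - 2)  # divisor T - k with k = 2 regressors
    return beta1, beta2, sigma2_ml, sigma2_ols

# Hypothetical data, used only to exercise the formulas.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
beta1, beta2, sigma2_ml, sigma2_ols = ml_bivariate(x, y)
```

The coefficient estimates coincide with what OLS would give, while `sigma2_ml` is always smaller than `sigma2_ols` because of the larger divisor.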
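$$\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta \sigma_{t-1}^2$$
$$L = -\frac{T}{2} \log(2\pi) - \frac{1}{2} \sum_{t=1}^{T} \log(\sigma_t^2) - \frac{1}{2} \sum_{t=1}^{T} \frac{(y_t - \mu - \phi y_{t-1})^2}{\sigma_t^2}$$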
Unfortunately, the LLF for a model with time-varying variances cannot be maximised
analytically, except in the simplest of cases. So a numerical procedure is used to
maximise the log-likelihood function. A potential problem: local optima or
multimodalities in the likelihood surface.
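To make the numerical procedure concrete, the function that an optimiser would be asked to maximise can be written down directly. This is a sketch under stated assumptions: the name `garch11_loglik`, the start-up values ($u_0 = 0$ and a user-supplied initial variance) and the choice to condition on the first observation are illustrative choices, not something the slides prescribe.

```python
import math

def garch11_loglik(y, mu, phi, alpha0, alpha1, beta, sigma2_init):
    """Gaussian log-likelihood of an AR(1)-GARCH(1,1) model, conditioning on y[0]."""
    n = len(y) - 1  # one observation is lost to the AR(1) lag
    loglik = -0.5 * n * math.log(2 * math.pi)
    u_prev, s2_prev = 0.0, sigma2_init  # assumed start-up values
    for t in range(1, len(y)):
        # Conditional variance recursion.
        s2 = alpha0 + alpha1 * u_prev ** 2 + beta * s2_prev
        # Residual from the AR(1) mean equation.
        u = y[t] - mu - phi * y[t - 1]
        loglik += -0.5 * math.log(s2) - 0.5 * u ** 2 / s2
        u_prev, s2_prev = u, s2
    return loglik

# Hypothetical series and parameter values, for illustration only.
y = [0.1, -0.2, 0.3, 0.05, -0.4]
value = garch11_loglik(y, mu=0.0, phi=0.0, alpha0=0.5, alpha1=0.1, beta=0.8, sigma2_init=0.5)
```

In practice a numerical optimiser would search over the parameters for the largest value of this function, typically from several starting points to guard against the local optima mentioned above.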
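$u_t = v_t \sigma_t$, $v_t \sim N(0,1)$
$\sigma_t = \sqrt{\alpha_0 + \alpha_1 u_{t-1}^2 + \beta \sigma_{t-1}^2}$, so that $v_t = u_t / \sigma_t$
The sample counterpart is $\hat{v}_t = \hat{u}_t / \hat\sigma_t$.
Are the $\hat{v}_t$ normal? Typically the $\hat{v}_t$ are still leptokurtic, although less so than the $\hat{u}_t$.
Is this a problem? Not really, as we can use ML with a robust variance/covariance
estimator.
ML with robust standard errors is called Quasi-Maximum Likelihood or QML.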
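$$\log(\sigma_t^2) = \omega + \beta \log(\sigma_{t-1}^2) + \gamma \frac{u_{t-1}}{\sqrt{\sigma_{t-1}^2}} + \alpha \left[ \frac{|u_{t-1}|}{\sqrt{\sigma_{t-1}^2}} - \sqrt{\frac{2}{\pi}} \right]$$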
$\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta \sigma_{t-1}^2 + \gamma u_{t-1}^2 I_{t-1}$
where $I_{t-1} = 1$ if $u_{t-1} < 0$
            $= 0$ otherwise
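The effect of the indicator term is easiest to see by feeding the variance equation a negative and a positive shock of the same size. The sketch below does exactly that; the function name `gjr_variance` and all parameter values are hypothetical.

```python
def gjr_variance(u_prev, sigma2_prev, alpha0, alpha1, beta, gamma):
    """One step of the variance equation with an asymmetry term for negative shocks."""
    indicator = 1.0 if u_prev < 0 else 0.0  # I_{t-1}
    return (alpha0 + alpha1 * u_prev ** 2 + beta * sigma2_prev
            + gamma * u_prev ** 2 * indicator)

# Hypothetical parameter values chosen only for illustration.
params = dict(alpha0=0.1, alpha1=0.05, beta=0.8, gamma=0.1)
after_bad_news = gjr_variance(-1.0, 1.0, **params)   # negative shock
after_good_news = gjr_variance(1.0, 1.0, **params)   # positive shock of equal size
```

With a positive asymmetry coefficient, `after_bad_news` exceeds `after_good_news`, which is the leverage effect the extra term is meant to capture.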
$y_t = 0.172$
(3.198)
GARCH-in-Mean
$y_t = \mu + \delta \sigma_{t-1} + u_t$, $u_t \sim N(0, \sigma_t^2)$
$\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta \sigma_{t-1}^2$
$\delta$ can be interpreted as a sort of risk premium.
GARCH can model the volatility clustering effect since the conditional
variance is autoregressive. Such models can be used to forecast volatility.
Forecasting Variances
using GARCH Models (Contd)
Let $\sigma_{1,T}^{f\,2}$ be the one-step-ahead forecast for $\sigma^2$ made at time $T$. This is
easy to calculate since, at time $T$, the values of all the terms on the
RHS are known.
$\sigma_{1,T}^{f\,2}$ would be obtained by taking the conditional expectation of the
first equation at the bottom of slide 36:
$$\sigma_{1,T}^{f\,2} = \alpha_0 + \alpha_1 u_T^2 + \beta \sigma_T^2$$
Given $\sigma_{1,T}^{f\,2}$, how is $\sigma_{2,T}^{f\,2}$, the 2-step-ahead forecast for $\sigma^2$ made at time $T$,
calculated? Taking the conditional expectation of the second equation
at the bottom of slide 36:
$$\sigma_{2,T}^{f\,2} = \alpha_0 + \alpha_1 E(u_{T+1}^2 \mid \Omega_T) + \beta \sigma_{1,T}^{f\,2}$$
where $E(u_{T+1}^2 \mid \Omega_T)$ is the expectation, made at time $T$, of $u_{T+1}^2$, which is
the squared disturbance term.
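We can write
$$E(u_{T+1}^2 \mid \Omega_T) = \sigma_{T+1}^2$$
But $\sigma_{T+1}^2$ is not known at time $T$, so it is replaced with the forecast for it,
$\sigma_{1,T}^{f\,2}$, so that the 2-step-ahead forecast is given by
$$\sigma_{2,T}^{f\,2} = \alpha_0 + \alpha_1 \sigma_{1,T}^{f\,2} + \beta \sigma_{1,T}^{f\,2}$$
$$\sigma_{2,T}^{f\,2} = \alpha_0 + (\alpha_1 + \beta) \sigma_{1,T}^{f\,2}$$
By similar arguments, the 3-step-ahead forecast will be given by
$$\begin{aligned}
\sigma_{3,T}^{f\,2} &= E_T(\alpha_0 + \alpha_1 u_{T+2}^2 + \beta \sigma_{T+2}^2) \\
&= \alpha_0 + (\alpha_1 + \beta) \sigma_{2,T}^{f\,2} \\
&= \alpha_0 + (\alpha_1 + \beta) [\alpha_0 + (\alpha_1 + \beta) \sigma_{1,T}^{f\,2}] \\
&= \alpha_0 + \alpha_0 (\alpha_1 + \beta) + (\alpha_1 + \beta)^2 \sigma_{1,T}^{f\,2}
\end{aligned}$$
Any s-step-ahead forecast ($s \ge 2$) would be produced by (writing $h$ for $\sigma^2$)
$$h_{s,T}^{f} = \alpha_0 \sum_{i=1}^{s-1} (\alpha_1 + \beta)^{i-1} + (\alpha_1 + \beta)^{s-1} h_{1,T}^{f}$$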
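The step-by-step argument and the general s-step formula should give identical numbers, which is easy to confirm in code. The sketch below builds the forecasts recursively and then evaluates the closed form; the function names and the parameter values are assumptions made for illustration.

```python
def garch11_forecasts(alpha0, alpha1, beta, u_T, sigma2_T, steps):
    """Recursive 1- to `steps`-step-ahead conditional variance forecasts made at time T."""
    persistence = alpha1 + beta
    # One-step-ahead forecast: every term on the right-hand side is known at T.
    forecasts = [alpha0 + alpha1 * u_T ** 2 + beta * sigma2_T]
    for _ in range(steps - 1):
        # Each further step replaces the unknown variance with its own forecast.
        forecasts.append(alpha0 + persistence * forecasts[-1])
    return forecasts

def closed_form_forecast(alpha0, alpha1, beta, h1, s):
    """General s-step-ahead formula (s >= 2) expressed in terms of the 1-step forecast h1."""
    p = alpha1 + beta
    return alpha0 * sum(p ** (i - 1) for i in range(1, s)) + p ** (s - 1) * h1

# Hypothetical parameter values, for illustration only.
fc = garch11_forecasts(alpha0=0.1, alpha1=0.1, beta=0.8, u_T=0.5, sigma2_T=1.2, steps=10)
check = closed_form_forecast(0.1, 0.1, 0.8, fc[0], 10)
```

When $\alpha_1 + \beta < 1$, the forecasts converge towards $\alpha_0 / (1 - \alpha_1 - \beta)$ as the horizon grows.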
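$$\beta_{i,t} = \frac{\sigma_{im,t}}{\sigma_{m,t}^2}$$
$$h = p \, \frac{\sigma_s}{\sigma_F}, \qquad h_t = p_t \, \frac{\sigma_{s,t}}{\sigma_{F,t}}$$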
Usual t- and F-tests are still valid in non-linear models, but they are
not flexible enough.
$u_t \sim N(0, \sigma_t^2)$, $\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta \sigma_{t-1}^2$
We estimate the model imposing the restriction and observe the maximised
LLF falls to 64.54. Can we accept the restriction?
LR = -2(64.54-66.85) = 4.62.
The test statistic follows a $\chi^2(1)$ distribution under the null, with a 5% critical value of 3.84. Since 4.62 > 3.84, reject the null.
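The arithmetic of this likelihood ratio test can be reproduced in a few lines. The sketch below uses the two maximised LLF values quoted above and, as an addition not in the slides, computes a p-value for one restriction using the identity $P(\chi^2(1) > x) = \mathrm{erfc}(\sqrt{x/2})$. The function name `lr_test_1df` is hypothetical.

```python
import math

def lr_test_1df(llf_restricted, llf_unrestricted):
    """Likelihood ratio statistic and p-value for a single restriction."""
    lr = -2.0 * (llf_restricted - llf_unrestricted)
    # Upper-tail probability of a chi-squared(1) variable; requires lr >= 0.
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value

lr, p_value = lr_test_1df(llf_restricted=64.54, llf_unrestricted=66.85)
reject_at_5pct = lr > 3.84  # compare with the 5% critical value
```

The p-value falls below 0.05, which agrees with the comparison against the critical value.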
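Denote the maximised value of the LLF by unconstrained ML as $L(\hat\theta)$
and the constrained optimum as $L(\tilde\theta)$. Then we can illustrate the 3 testing
procedures in the following diagram:
[Figure omitted: plot of the log-likelihood $L$, with points A and B and the constrained value $\tilde\theta$ marked.]
We know at the unrestricted MLE, $L(\hat\theta)$, the slope of the curve is zero.
But is it "significantly steep" at $L(\tilde\theta)$?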
The Models
The Base Models
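For the conditional mean
$$R_{Mt} - R_{Ft} = \lambda_0 + \lambda_1 \sqrt{h_t} + u_t \qquad (1)$$
And for the variance
$$h_t = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta_1 h_{t-1} \qquad (2)$$
or
$$\ln(h_t) = \alpha_0 + \beta_1 \ln(h_{t-1}) + \alpha_1 \left( \theta \frac{u_{t-1}}{\sqrt{h_{t-1}}} + \gamma \left[ \frac{|u_{t-1}|}{\sqrt{h_{t-1}}} - \left( \frac{2}{\pi} \right)^{1/2} \right] \right) \qquad (3)$$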
where
RMt denotes the return on the market portfolio
RFt denotes the risk-free rate
$h_t$ denotes the conditional variance from the GARCH-type models while $\sigma_t^2$ denotes
the implied variance from option prices.
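Adding the lagged implied variance to (2) and (3) gives
$$h_t = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta_1 h_{t-1} + \delta \sigma_{t-1}^2 \qquad (4)$$
$$\ln(h_t) = \alpha_0 + \beta_1 \ln(h_{t-1}) + \alpha_1 \left( \theta \frac{u_{t-1}}{\sqrt{h_{t-1}}} + \gamma \left[ \frac{|u_{t-1}|}{\sqrt{h_{t-1}}} - \left( \frac{2}{\pi} \right)^{1/2} \right] \right) + \delta \ln(\sigma_{t-1}^2) \qquad (5)$$
We are interested in testing $H_0: \delta = 0$ in (4) or (5).
Also, we want to test $H_0: \alpha_1 = 0$ and $\beta_1 = 0$ in (4),
and $H_0: \alpha_1 = 0$ and $\beta_1 = 0$ and $\theta = 0$ and $\gamma = 0$ in (5).
If this second set of restrictions holds, then (4) and (5) collapse to
$$h_t = \alpha_0 + \delta \sigma_{t-1}^2 \qquad (4')$$
$$\ln(h_t) = \alpha_0 + \delta \ln(\sigma_{t-1}^2) \qquad (5')$$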
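$$R_{Mt} - R_{Ft} = \lambda_0 + \lambda_1 \sqrt{h_t} + u_t \qquad (8.78)$$
$$h_t = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta_1 h_{t-1} \qquad (8.79)$$
$$h_t = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta_1 h_{t-1} + \delta \sigma_{t-1}^2 \qquad (8.81)$$
$$h_t = \alpha_0 + \delta \sigma_{t-1}^2 \qquad (8.81')$$

| Equation for variance specification | $\lambda_0$ | $\lambda_1$ | $\alpha_0 \times 10^{-4}$ | $\alpha_1$ | $\beta_1$ | $\delta$ | Log-L | $\chi^2$ |
|---|---|---|---|---|---|---|---|---|
| (8.79) | 0.0072 (0.005) | 0.071 (0.01) | 5.428 (1.65) | 0.093 (0.84) | 0.854 (8.17) | - | 767.321 | 17.77 |
| (8.81) | 0.0015 (0.028) | 0.043 (0.02) | 2.065 (2.98) | 0.266 (1.17) | -0.068 (-0.59) | 0.318 (3.00) | 776.204 | - |
| (8.81') | 0.0056 (0.001) | -0.184 (-0.001) | 0.993 (1.50) | - | - | 0.581 (2.94) | 764.394 | 23.62 |

Notes: t-ratios in parentheses. Log-L denotes the maximised value of the log-likelihood function in
each case. $\chi^2$ denotes the value of the test statistic, which follows a $\chi^2(1)$ in the case of (8.81) restricted
to (8.79), and a $\chi^2(2)$ in the case of (8.81) restricted to (8.81'). Source: Day and Lewis (1992).
Reprinted with the permission of Elsevier Science.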
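$$R_{Mt} - R_{Ft} = \lambda_0 + \lambda_1 \sqrt{h_t} + u_t \qquad (8.78)$$
$$\ln(h_t) = \alpha_0 + \beta_1 \ln(h_{t-1}) + \alpha_1 \left( \theta \frac{u_{t-1}}{\sqrt{h_{t-1}}} + \gamma \left[ \frac{|u_{t-1}|}{\sqrt{h_{t-1}}} - \left( \frac{2}{\pi} \right)^{1/2} \right] \right) \qquad (8.80)$$
$$\ln(h_t) = \alpha_0 + \beta_1 \ln(h_{t-1}) + \alpha_1 \left( \theta \frac{u_{t-1}}{\sqrt{h_{t-1}}} + \gamma \left[ \frac{|u_{t-1}|}{\sqrt{h_{t-1}}} - \left( \frac{2}{\pi} \right)^{1/2} \right] \right) + \delta \ln(\sigma_{t-1}^2) \qquad (8.82)$$

| Variance specification | $\lambda_0$ | $\lambda_1$ | $\alpha_0$ | $\beta_1$ | $\theta$ | $\gamma$ | $\delta$ | Log-L | $\chi^2$ |
|---|---|---|---|---|---|---|---|---|---|
| (8.80) | -0.0026 (-0.03) | 0.094 (0.25) | -3.62 (-2.90) | 0.529 (3.26) | -0.273 (-4.13) | 0.357 (3.17) | - | 776.436 | 8.09 |
| (8.82) | 0.0035 (0.56) | -0.076 (-0.24) | -2.28 (-1.82) | 0.373 (1.48) | -0.282 (-4.34) | 0.210 (1.89) | 0.351 (1.82) | 780.480 | - |
| (8.82') | 0.0047 (0.71) | -0.139 (-0.43) | -2.76 (-2.30) | - | - | - | 0.667 (4.01) | 765.034 | 30.89 |

Notes: t-ratios in parentheses. Log-L denotes the maximised value of the log-likelihood function in
each case. $\chi^2$ denotes the value of the test statistic, which follows a $\chi^2(1)$ in the case of (8.82) restricted
to (8.80), and a $\chi^2(2)$ in the case of (8.82) restricted to (8.82'). Source: Day and Lewis (1992).
Reprinted with the permission of Elsevier Science.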
But the in-sample models do not represent a true test of the predictive ability of IV.
There are 729 data points. They use the first 410 to estimate the models,
and then make a 1-step-ahead forecast of the following week's volatility.
Estimates of regression (8.83), by forecasting model and proxy for ex post volatility:

| Forecasting Model | Proxy for ex post volatility | $b_0$ | $b_1$ | $R^2$ |
|---|---|---|---|---|
| Historic | SR | 0.0004 (5.60) | 0.129 (21.18) | 0.094 |
| Historic | WV | 0.0005 (2.90) | 0.154 (7.58) | 0.024 |
| GARCH | SR | 0.0002 (1.02) | 0.671 (2.10) | 0.039 |
| GARCH | WV | 0.0002 (1.07) | 1.074 (3.34) | 0.018 |
| EGARCH | SR | 0.0000 (0.05) | 1.075 (2.06) | 0.022 |
| EGARCH | WV | -0.0001 (-0.48) | 1.529 (2.58) | 0.008 |
| Implied Volatility | SR | 0.0022 (2.22) | 0.357 (1.82) | 0.037 |
| Implied Volatility | WV | 0.0005 (0.389) | 0.718 (1.95) | 0.026 |

Notes: Historic refers to the use of a simple historical average of the squared returns to forecast
volatility; t-ratios in parentheses; SR and WV refer to the square of the weekly return on the S&P 100,
and the variance of the week's daily returns multiplied by the number of trading days in that week,
respectively. Source: Day and Lewis (1992). Reprinted with the permission of Elsevier Science.
Forecast comparison: estimates of regression (8.86)

| $b_0$ | $b_1$ | $b_2$ | $b_3$ | $b_4$ | $R^2$ |
|---|---|---|---|---|---|
| -0.00010 (-0.09) | 0.601 (1.03) | 0.298 (0.42) | - | - | 0.027 |
| 0.00018 (1.15) | 0.632 (1.02) | -0.243 (-0.28) | - | 0.123 (7.01) | 0.038 |
| -0.00001 (-0.07) | 0.695 (1.62) | - | 0.176 (0.27) | - | 0.026 |
| 0.00026 (1.37) | 0.590 (1.45) | - | -0.374 (-0.57) | 0.118 (7.74) | 0.038 |
| 0.00005 (0.37) | - | 1.070 (2.78) | -0.001 (-0.00) | - | 0.018 |

Notes: t-ratios in parentheses; the ex post measure used in this table is the variance of the week's daily
returns multiplied by the number of trading days in that week. Source: Day and Lewis (1992).
Reprinted with the permission of Elsevier Science.
Conclusions of Paper
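$$H_t = \begin{pmatrix} h_{11t} & h_{12t} \\ h_{21t} & h_{22t} \end{pmatrix}, \qquad VECH(H_t) = \begin{pmatrix} h_{11t} \\ h_{22t} \\ h_{12t} \end{pmatrix}, \qquad \Xi_t \mid \psi_{t-1} \sim N(0, H_t)$$
In the VECH model, the elements of $H_t$ evolve as
$$\begin{aligned}
h_{11t} &= c_{11} + a_{11} u_{1t-1}^2 + a_{12} u_{2t-1}^2 + a_{13} u_{1t-1} u_{2t-1} + b_{11} h_{11t-1} + b_{12} h_{22t-1} + b_{13} h_{12t-1} \\
h_{22t} &= c_{21} + a_{21} u_{1t-1}^2 + a_{22} u_{2t-1}^2 + a_{23} u_{1t-1} u_{2t-1} + b_{21} h_{11t-1} + b_{22} h_{22t-1} + b_{23} h_{12t-1} \\
h_{12t} &= c_{31} + a_{31} u_{1t-1}^2 + a_{32} u_{2t-1}^2 + a_{33} u_{1t-1} u_{2t-1} + b_{31} h_{11t-1} + b_{32} h_{22t-1} + b_{33} h_{12t-1}
\end{aligned}$$
In the diagonal VECH model, each element depends only on its own lag and the corresponding product of lagged disturbances, for example
$$h_{22t} = \beta_0 + \beta_1 u_{2t-1}^2 + \beta_2 h_{22t-1}$$
$$h_{12t} = \gamma_0 + \gamma_1 u_{1t-1} u_{2t-1} + \gamma_2 h_{12t-1}$$
Neither the VECH nor the diagonal VECH ensures a positive definite variance-covariance matrix.
An alternative approach is the BEKK model (Engle & Kroner, 1995), which uses a
quadratic form for the parameter matrices to ensure a positive definite variance /
covariance matrix $H_t$.
In matrix form, the BEKK model is
$$H_t = W'W + A'H_{t-1}A + B'\Xi_{t-1}\Xi_{t-1}'B$$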
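The reason the quadratic form helps can be seen by computing one step of the BEKK recursion for a two-asset case. This is a sketch only: the helper names (`matmul`, `transpose`, `bekk_update`), the argument `xi_prev` for the lagged disturbance vector, and every numerical value are illustrative assumptions.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*X)]

def bekk_update(W, A, B, H_prev, xi_prev):
    """One step of H_t = W'W + A' H_{t-1} A + B' xi xi' B for a 2x2 system."""
    outer = [[a * b for b in xi_prev] for a in xi_prev]  # xi_{t-1} xi_{t-1}'
    terms = [
        matmul(transpose(W), W),
        matmul(matmul(transpose(A), H_prev), A),
        matmul(matmul(transpose(B), outer), B),
    ]
    return [[sum(term[i][j] for term in terms) for j in range(2)] for i in range(2)]

# Hypothetical parameter matrices and starting values, for illustration only.
W = [[0.3, 0.1], [0.0, 0.2]]
A = [[0.9, 0.05], [0.02, 0.85]]
B = [[0.3, 0.0], [0.1, 0.25]]
H_prev = [[1.0, 0.6], [0.6, 0.8]]
xi_prev = [0.5, -1.2]
H = bekk_update(W, A, B, H_prev, xi_prev)
det_H = H[0][0] * H[1][1] - H[0][1] * H[1][0]
```

Each of the three terms has the form $M'SM$ with $S$ positive semi-definite, so their sum is symmetric with a positive determinant here without any sign restrictions on the individual parameters.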
Data comprises 3580 daily observations on the FTSE 100 stock index and
stock index futures contract spanning the period 1 January 1985 - 9 April
1999.
Several competing models for determining the optimal hedge ratio are
constructed. Define the hedge ratio as $\beta$.
- No hedge ($\beta = 0$)
- Naïve hedge ($\beta = 1$)
- Multivariate GARCH hedges:
  - Symmetric BEKK
  - Asymmetric BEKK
In both cases, estimating the OHR involves forming a 1-step-ahead
forecast and computing
$$OHR_{t+1} = \left. \frac{h_{CF,t+1}}{h_{F,t+1}} \right| \Omega_t$$
OHR Results

In Sample

| | Unhedged ($\beta = 0$) | Naïve Hedge ($\beta = 1$) | Symmetric Time Varying Hedge ($\beta_t = h_{FC,t} / h_{F,t}$) | Asymmetric Time Varying Hedge ($\beta_t = h_{FC,t} / h_{F,t}$) |
|---|---|---|---|---|
| Return | 0.0389 {2.3713} | -0.0003 {-0.0351} | 0.0061 {0.9562} | 0.0060 {0.9580} |
| Variance | 0.8286 | 0.1718 | 0.1240 | 0.1211 |

Out of Sample

| | Unhedged ($\beta = 0$) | Naïve Hedge ($\beta = 1$) | Symmetric Time Varying Hedge ($\beta_t = h_{FC,t} / h_{F,t}$) | Asymmetric Time Varying Hedge ($\beta_t = h_{FC,t} / h_{F,t}$) |
|---|---|---|---|---|
| Return | 0.0819 {1.4958} | -0.0004 {0.0216} | 0.0120 {0.7761} | 0.0140 {0.9083} |
| Variance | 1.4972 | 0.1696 | 0.1186 | 0.1188 |
[Figure omitted: two series, Symmetric BEKK and Asymmetric BEKK, plotted over observations 500 to 3000 with a vertical axis running from 0.65 to 1.00.]
Conclusions
- OHR is time-varying and less than 1
- M-GARCH OHR provides a better hedge, both in-sample and out-of-sample.
- No role in calculating OHR for asymmetries