Lecture 4: Vector Autoregressive Models
Massimo Guidolin
February 2018
• E[yt ] ≡ µ < ∞ ∀t
where
µ = [µ1 , µ2 , ..., µN ];
Γ0 = N × N covariance matrix where the ith diagonal element is the variance of yi,t and the
(i, j)th element is the covariance between yi,t and yj,t ;
Γh =cross-covariance matrix at lag h;
and the expectations are taken element-by-element over the joint distribution of yt .
ρ0 = D−1 Γ0 D−1
where D= N × N diagonal matrix collecting on its main diagonal the standard deviations
of yi,t for i = 1, ..., N and
ρi,j(0) = Cov[yi,t, yj,t]/(σi,t σj,t)
• ρ0 is a symmetric matrix with unit diagonal elements, because ρi,j(0) = ρj,i(0), −1 ≤ ρi,j(0) ≤ 1, and ρi,i(0) = 1 for all 1 ≤ i, j ≤ N.
where µ=mean vector of yt and the (i,j)th element of Γh =covariance between yi,t and yj,t−h .
ρh = D−1 Γh D−1
where D is the diagonal matrix of standard deviations of the individual series yi,t
and
ρi,j(h) = Cov[yi,t, yj,t−h]/(σi,t σj,t)
• When h > 0, ρi,j (h) measures the linear dependence of yi,t on yj,t−h , while ρj,i (h)
measures the linear dependence of yj,t on yi,t−h .
• In general ρj,i(h) ≠ ρi,j(h) for i ≠ j; therefore Γh and ρh need not be symmetric.
1.3 Sample Cross-Covariance and Cross-Correlation Matrices
where
ȳ = [ȳ1, ȳ2, ..., ȳN]′
ȳi = T⁻¹ Σ_{t=1}^{T} yi,t with i = 1, ..., N
where D̂ is the N × N diagonal matrix of the sample standard deviations of each of the
component series.
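As a minimal illustration (not part of the original notes), the sample cross-covariance matrices Γ̂h and cross-correlation matrices ρ̂h can be computed with a few lines of NumPy; the data matrix `y` and the function names below are made up for the example.

```python
import numpy as np

def cross_cov(y, h):
    """Sample cross-covariance matrix Gamma_hat_h of a T x N data matrix y at lag h >= 0."""
    T, _ = y.shape
    yc = y - y.mean(axis=0)                       # demean each component series
    # (i, j) element: sample covariance between y_{i,t} and y_{j,t-h}
    return yc[h:].T @ yc[:T - h] / T

def cross_corr(y, h):
    """Sample cross-correlation matrix rho_hat_h = D_hat^{-1} Gamma_hat_h D_hat^{-1}."""
    d_inv = np.diag(1.0 / np.sqrt(np.diag(cross_cov(y, 0))))   # inverse sample std deviations
    return d_inv @ cross_cov(y, h) @ d_inv

# usage on simulated data
rng = np.random.default_rng(0)
y = rng.standard_normal((500, 3))
print(cross_corr(y, 0))   # symmetric, unit diagonal
print(cross_corr(y, 1))   # close to zero for white noise
```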
Multivariate portmanteau test: the test statistic is
Q(m) = T² Σ_{h=1}^{m} [1/(T − h)] tr(Γ̂h′ Γ̂0⁻¹ Γ̂h Γ̂0⁻¹)
where
T = sample size
N = dimension of yt
m = maximum lag length we want to test
tr(A) = trace of some matrix A, defined as the sum of the diagonal elements of A.
• Under H0, Q(m) is asymptotically distributed as χ²(N²m).
• When T is small, the χ2 approximation to the distribution of the test statistic may be
misleading.
• When T is small, the empirical size of the portmanteau test tends to fall below the chosen significance level and the test has low power against many alternatives.
Adjusted versions of the Q statistic can be used:
– Hosking’s statistic:
Q*(m) = T(T + 2) Σ_{h=1}^{m} [1/(T − h)] tr(Γ̂h′ Γ̂0⁻¹ Γ̂h Γ̂0⁻¹)
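A sketch (under the assumptions of the formulas above) of how the two portmanteau statistics could be computed; it reuses the demeaned cross-covariance construction from the earlier snippet, and SciPy is used only for the χ² p-values.

```python
import numpy as np
from scipy.stats import chi2

def portmanteau(y, m):
    """Q(m) and Hosking's adjusted Q*(m) for a T x N data matrix y, with chi-square p-values."""
    T, N = y.shape
    yc = y - y.mean(axis=0)
    gamma = [yc[h:].T @ yc[:T - h] / T for h in range(m + 1)]   # Gamma_hat_0, ..., Gamma_hat_m
    g0_inv = np.linalg.inv(gamma[0])
    terms = [np.trace(gamma[h].T @ g0_inv @ gamma[h] @ g0_inv) / (T - h)
             for h in range(1, m + 1)]
    q, q_star = T**2 * sum(terms), T * (T + 2) * sum(terms)
    df = N**2 * m                                # degrees of freedom under H0
    return q, q_star, chi2.sf(q, df), chi2.sf(q_star, df)

# under H0 (multivariate white noise) neither statistic should be significant
rng = np.random.default_rng(1)
print(portmanteau(rng.standard_normal((300, 2)), m=5))
```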
Multivariate White Noise: Let zt = [z1,t, z2,t, ..., zN,t]′ be an N × 1 vector of random variables. This multivariate time series is said to be a multivariate white noise if it is a stationary vector with zero mean, and if the values of zt at different times are uncorrelated, i.e., Γh is an N × N matrix of zeros for all h ≠ 0.
• Assuming that the values of zt are uncorrelated does not necessarily imply that they are independent. Independence can be inferred from the lack of correlation at all leads and lags among the random variables that enter zt when the random vector follows a multivariate normal distribution.
2 Introduction to VAR Analysis
yt = a0 + A1 yt−1 + A2 yt−2 + ... + Ap yt−p + ut
where
yt = N × 1 vector containing the N endogenous variables
a0 = N × 1 vector of constants
A1, A2, ..., Ap = p N × N matrices of autoregressive coefficients
ut = N × 1 vector of serially uncorrelated, white noise disturbances.
In matrix notation
[1, b1,2; b2,1, 1] [y1,t; y2,t] = [b1,0; b2,0] + [ϕ1,1, ϕ1,2; ϕ2,1, ϕ2,2] [y1,t−1; y2,t−1] + [ε1,t; ε2,t]
or in compact form
Byt = Q0 + Q1 yt−1 + εt
• y1,t depends on its own lag and on both the lagged and the contemporaneous value of y2,t; y2,t depends on its own lag and on both the lagged and the contemporaneous value of y1,t.
2. −b2,1 measures the contemporaneous effect of a unit change of y1,t on y2,t .
• Each contemporaneous variable is correlated with the error term of the equation in which it appears as a regressor; therefore the regressors are not uncorrelated with the error terms, as required by OLS estimation techniques.
• When b1,2 ≠ 0, y1,t depends on y2,t and hence on ε2,t, and is therefore correlated with it; when b2,1 ≠ 0, y2,t depends on y1,t and hence on ε1,t.
yt = a0 + A1 yt−1 + ut
ut = B⁻¹εt, then
u1,t = (ε1,t − b1,2 ε2,t)/(1 − b1,2 b2,1) and u2,t = (ε2,t − b2,1 ε1,t)/(1 − b1,2 b2,1)
1.
E[u1,t] = E[(ε1,t − b1,2 ε2,t)/(1 − b1,2 b2,1)] = 0
E[u2,t] = E[(ε2,t − b2,1 ε1,t)/(1 − b1,2 b2,1)] = 0
2.
Var[u1,t] = Var[ε1,t − b1,2 ε2,t]/(1 − b1,2 b2,1)² = (Var[ε1,t] + b²1,2 Var[ε2,t] − 2b1,2 Cov[ε1,t, ε2,t])/(1 − b1,2 b2,1)² = (σ²ε,1 + b²1,2 σ²ε,2)/(1 − b1,2 b2,1)²
since Cov[ε1,t, ε2,t] = 0, and analogously
Var[u2,t] = (σ²ε,2 + b²2,1 σ²ε,1)/(1 − b1,2 b2,1)²
Both variances are constant over time.
3.
Cov[u1,t, u2,t] = E[(ε1,t − b1,2 ε2,t)(ε2,t − b2,1 ε1,t)]/(1 − b1,2 b2,1)² = −(b2,1 σ²ε,1 + b1,2 σ²ε,2)/(1 − b1,2 b2,1)²
• u1,t and u2,t are serially uncorrelated, but are cross-correlated unless b1,2 = b2,1 =
0.
4.
Var[u1,t ] Cov[u1,t , u2,t ] σ12 σ1,2
Σu = =
Cov[u1,t , u2,t ] Var[u2,t ] σ1,2 σ22
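As a quick numerical sanity check (mine, not in the notes), items 1.-4. can be verified by picking arbitrary values for b1,2, b2,1 and the structural variances and computing Σu = B⁻¹ Σε (B⁻¹)′ directly:

```python
import numpy as np

# Numerical check of items 1.-4.: with arbitrary (made-up) values for b_{1,2}, b_{2,1}
# and the structural variances, Sigma_u = B^{-1} Sigma_eps (B^{-1})' reproduces the
# closed-form expressions derived above.
b12, b21 = 0.4, -0.3
s1, s2 = 1.0, 2.0                                   # sigma^2_{eps,1}, sigma^2_{eps,2}
B = np.array([[1.0, b12], [b21, 1.0]])
Sigma_eps = np.diag([s1, s2])                       # structural shocks are uncorrelated
B_inv = np.linalg.inv(B)
Sigma_u = B_inv @ Sigma_eps @ B_inv.T
den = (1.0 - b12 * b21)**2
print(Sigma_u[0, 0], (s1 + b12**2 * s2) / den)      # Var[u_{1,t}]
print(Sigma_u[1, 1], (s2 + b21**2 * s1) / den)      # Var[u_{2,t}]
print(Sigma_u[0, 1], -(b21 * s1 + b12 * s2) / den)  # Cov[u_{1,t}, u_{2,t}]
```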
• In general, it is not possible to identify the structural parameters and errors from
the OLS estimates of the parameters and the residuals of the standard form VAR,
unless some restrictions are imposed on the primitive system.
Under the restriction b1,2 = 0, B⁻¹ = [1, 0; −b2,1, 1] and pre-multiplying the primitive system by B⁻¹ yields the standard form
[y1,t; y2,t] = [b1,0; b2,0 − b1,0 b2,1] + [ϕ1,1, ϕ1,2; ϕ2,1 − b2,1 ϕ1,1, ϕ2,2 − b2,1 ϕ1,2] [y1,t−1; y2,t−1] + [ε1,t; ε2,t − b2,1 ε1,t]
so that
a1,0 = b1,0, a2,0 = b2,0 − b1,0 b2,1, a1,1 = ϕ1,1, a1,2 = ϕ1,2,
a2,1 = ϕ2,1 − b2,1 ϕ1,1, a2,2 = ϕ2,2 − b2,1 ϕ1,2, u1,t = ε1,t, u2,t = ε2,t − b2,1 ε1,t
It follows that
σ1² ≡ Var[u1,t] = σ²ε,1
σ2² ≡ Var[u2,t] = σ²ε,2 + b²2,1 σ²ε,1
Cov[u1,t, u2,t] = −b2,1 σ²ε,1
• The restriction implies that the observed values of u1,t are completely attributed
to pure (structural) shocks to y1,t .
We can go back from the estimated Σu to the original (and unobserved) diagonal matrix Σε; after a little algebra, this is equivalent to
• In an N-variate VAR, we need to impose (N² − N)/2 restrictions in order to retrieve the N structural shocks from the residuals of the OLS estimates.
so that
ut = B⁻¹εt = [1, 0, 0; −b2,1, 1, 0; −b3,1, −b3,2, 1] [ε1,t; ε2,t; ε3,t] = [ε1,t; ε2,t − b2,1 ε1,t; ε3,t − b3,1 ε1,t − b3,2 ε2,t]
• There are as many Choleski decompositions as all the possible orderings of the
variables. Therefore, when we apply a Choleski triangular identification scheme
to a VAR model we are introducing a number of (potentially arbitrary) assump-
tions on the contemporaneous relationships among the variables.
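A minimal numerical sketch (assumptions and names are mine) of the triangular identification just described: the Choleski factor of Σ̂u, rescaled to have a unit diagonal, plays the role of B⁻¹ for the chosen ordering, and the structural shocks follow by solving a triangular system.

```python
import numpy as np

def cholesky_identify(u):
    """Recover structural shocks from reduced-form residuals u (T x N), assuming a
    lower-triangular (Choleski) contemporaneous structure in the given ordering."""
    T, _ = u.shape
    sigma_u = u.T @ u / T                      # reduced-form residual covariance
    P = np.linalg.cholesky(sigma_u)            # Sigma_u = P P'
    d = np.diag(P)
    B_inv = P / d                              # unit lower-triangular matrix playing the role of B^{-1}
    eps = np.linalg.solve(B_inv, u.T).T        # structural shocks eps_t = B u_t
    return eps, B_inv, np.diag(d**2)           # shocks, B^{-1}, diagonal Sigma_eps

# quick check on simulated reduced-form residuals
rng = np.random.default_rng(2)
u = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.4], [0.4, 2.0]], size=1000)
eps, B_inv, Sigma_eps = cholesky_identify(u)
print(np.cov(eps.T))   # approximately diagonal: the recovered shocks are (nearly) uncorrelated
```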
(a)
E[yt] = a0 + A1 E[yt−1]
Under stationarity E[yt] = E[yt−1] ≡ µ is time invariant, and thus µ = (IN − A1)⁻¹ a0.
(b)
µt|t−1 ≡ E[yt|ℑt−1] = E[yt|yt−1] = a0 + A1 yt−1
(c) Given that a0 = (IN − A1)µ, the VAR(1) model can be rewritten as
yt − µ = A1(yt−1 − µ) + ut
or, defining ỹt ≡ yt − µ,
ỹt = A1 ỹt−1 + ut
Substituting ỹt−1 = A1 ỹt−2 + ut−1, then ỹt−2 = A1 ỹt−3 + ut−2, and iterating, we obtain
yt = µ + Σ_{i=1}^{∞} A1^i ut−i + ut
Properties:
(a) ut is serially uncorrelated and it is also uncorrelated with the past values of yt
Cov[ut , yt−1 ] = 0
(b)
Cov[yt , ut ] = Σu
derived by post-multiplying yt = µ + Σ_{i=1}^{∞} A1^i ut−i + ut by ut′, taking the expectation, and exploiting the fact that ut is serially uncorrelated.
(c) yt depends on the past innovations ut−j with coefficient matrix A1^j.
• yt is stable if det(IN − A1 z) ≠ 0 for |z| ≤ 1.
(d)
Cov[yt] ≡ Γ0 = Σu + A1 Σu A1′ + A1² Σu (A1²)′ + ... = Σ_{i=0}^{∞} A1^i Σu (A1^i)′
Given that Cov[ut, yt−j] = E[ut ỹt−j′] = 0 for j > 0, if we post-multiply the expression for ỹt by ỹt−h′ and take expectations we obtain
E[ỹt ỹt−h′] = A1 E[ỹt−1 ỹt−h′] for h > 0
Therefore
Γh = A1 Γh−1 = A1^h Γ0 for h > 0
Finally, if we pre- and post-multiply Γh by D^{−1/2} we obtain
ρh = D^{−1/2} A1 Γh−1 D^{−1/2} = (D^{−1/2} A1 D^{1/2})(D^{−1/2} Γh−1 D^{−1/2}) = Ψ ρh−1 = Ψ^h ρ0 for h > 0
where Ψ = D^{−1/2} A1 D^{1/2}
(e)
Cov[yt|ℑt−1] = Cov[yt|yt−1] = A1 Cov[yt−1|yt−1] A1′ + Σu = Σu
because Cov[yt−1|yt−1] = 0.
In compact form
A(L)yt = a0 + ut
where A(L) = IN − A1 L − ... − Ap Lp .
Properties:
(a)
µ = E[yt ] = (IN − A1 − ... − Ap )−1 a0
provided that the inverse of the matrix (IN − A1 − ... − Ap ) exists, and
µt|t−1 ≡ E[yt|ℑt−1] = a0 + Σ_{j=1}^{p} Aj yt−j
(b)
Cov[yt , ut ] = Σu
(c)
Cov[yt−h , ut ] = 0 for any h > 0
(d)
Γh = A1 Γh−1 + ... + Ap Γh−p for h > 0
(e)
ρh = Ψ1 ρh−1 + ... + Ψp ρh−p for h > 0
where Ψi = D−1/2 Ai D1/2 .
In VMA representation
ξt = Ut + F1 Ut−1 + F1² Ut−2 + ... = Σ_{i=1}^{∞} F1^i Ut−i + Ut = Σ_{i=1}^{∞} Πi Ut−i + Ut
• A VAR(p) model is stable (and thus stationary) as long as the eigenvalues of the companion matrix F1 are all less than one in modulus, which implies det(IN − A1 z − ... − Ap z^p) ≠ 0 for |z| ≤ 1.
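This condition is easy to check numerically; a minimal sketch (the coefficient matrices below are made-up examples) builds the companion matrix and inspects the moduli of its eigenvalues.

```python
import numpy as np

def companion(A_list):
    """Companion matrix F1 of a VAR(p) with coefficient matrices A_1, ..., A_p."""
    N, p = A_list[0].shape[0], len(A_list)
    F = np.zeros((N * p, N * p))
    F[:N, :] = np.hstack(A_list)              # first block row: [A1 A2 ... Ap]
    F[N:, :-N] = np.eye(N * (p - 1))          # identity blocks below the first block row
    return F

def is_stable(A_list):
    """True if all eigenvalues of the companion matrix are inside the unit circle."""
    return bool(np.all(np.abs(np.linalg.eigvals(companion(A_list))) < 1.0))

# made-up coefficient matrices for a bivariate VAR(2)
A1 = np.array([[0.5, 0.1], [0.2, 0.3]])
A2 = np.array([[0.1, 0.0], [0.0, 0.1]])
print(is_stable([A1, A2]))                    # True: the VAR(2) is stable
```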
Multivariate LS estimator:
Starting from
Y = BZ + U
where Y ≡ [y1, y2, ..., yT], B ≡ [a0, A1, A2, ..., Ap], U ≡ [u1, u2, ..., uT], Z ≡ [Z0, Z1, ..., ZT−1] with Zt ≡ [1, yt′, yt−1′, ..., yt−p+1′]′. Given that y ≡ vec(Y), β ≡ vec(B), and u ≡ vec(U), the multivariate LS estimator is
β̂ = ((ZZ′)⁻¹ ⊗ Σu)(Z ⊗ Σu⁻¹) y = ((ZZ′)⁻¹ Z ⊗ IN) y
that minimizes
S(β) = u′(IT ⊗ Σu)⁻¹ u
(b)
Σ̂u = [1/(T − Np)] Σ_{t=1}^{T} ût ût′ or Σ̃u = (1/T) Σ_{t=1}^{T} ût ût′
where ût = yt − B̂Zt−1.
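The estimator above reduces to equation-by-equation OLS and can be coded directly; a minimal sketch (all names, and the simulated data, are illustrative) computes B̂ = YZ′(ZZ′)⁻¹ and both residual covariance estimators.

```python
import numpy as np

def var_ls(y, p):
    """Multivariate LS estimation of a VAR(p) for a T x N data matrix y.
    Returns B_hat = [a0, A1, ..., Ap], the residuals and both covariance estimators."""
    T, N = y.shape
    # columns of Z are Z_{t-1} = [1, y'_{t-1}, ..., y'_{t-p}]' for t = p, ..., T-1
    Z = np.column_stack([np.ones(T - p)] +
                        [y[p - j:T - j] for j in range(1, p + 1)]).T
    Y = y[p:].T                                          # N x (T - p) matrix of left-hand sides
    B_hat = Y @ Z.T @ np.linalg.inv(Z @ Z.T)             # N x (Np + 1)
    U_hat = Y - B_hat @ Z                                # residuals u_hat_t = y_t - B_hat Z_{t-1}
    Te = T - p
    sigma_tilde = U_hat @ U_hat.T / Te                   # ML-type estimator
    sigma_hat = U_hat @ U_hat.T / (Te - N * p)           # degrees-of-freedom adjusted estimator
    return B_hat, U_hat, sigma_hat, sigma_tilde

# example: recover the coefficients of a simulated bivariate VAR(1)
rng = np.random.default_rng(3)
A1 = np.array([[0.5, 0.1], [0.2, 0.3]])
y = np.zeros((1000, 2))
for t in range(1, 1000):
    y[t] = A1 @ y[t - 1] + rng.standard_normal(2)
B_hat, *_ = var_ls(y, p=1)
print(B_hat)   # first column close to zero (intercept), remaining block close to A1
```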
Multivariate ML estimator:
(a) Sample of T observations on Y and a pre-sample of p initial conditions y−p+1 , y−p+2 , ..., y0 .
(c) Gaussian multivariate white noise (then innovations at different times are inde-
pendent).
(d) The error terms are independent over time with covariance matrix Σu; then the covariance matrix of u is ΣU = IT ⊗ Σu and its normal density is
fu(u) = (2π)^{−NT/2} |IT ⊗ Σu|^{−1/2} exp(−½ u′(IT ⊗ Σu⁻¹) u).
(e) fy(y) = (2π)^{−NT/2} |IT ⊗ Σu|^{−1/2} exp(−½ vec(Y − BZ)′(IT ⊗ Σu⁻¹) vec(Y − BZ)).
so that the log-likelihood is
ln L(β, Σu) = −(NT/2) ln(2π) − (T/2) ln|Σu| − (1/2) tr(U′ Σu⁻¹ U)
• For an unconstrained VAR, the ML and OLS estimators are the same under the
assumption of Gaussian innovations.
Average cross-product of the OLS residual vectors: the ML estimator of the matrix Σu is
Σ̃u = (1/T) Σ_{t=1}^{T} ût ût′
Iterating between the two estimates (of B and of Σu) until convergence is achieved will return identical results.
• As the order of the VAR model increases, the (absolute) size of the residuals decreases and the in-sample fit of the model improves.
• More generally, if the number of parameters increases, the model's in-sample accuracy increases, while its out-of-sample predictive power tends to decrease.
Restricted, standard VAR: models in which the structure and number of lags in-
cluded in each equation may vary across different equations.
Methods to select p:
i. (M)AIC: ln|Σ̃u| + 2K/T
ii. (M)SBC: ln|Σ̃u| + (K/T) ln(T)
where K denotes the total number of estimated parameters.
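A sketch of how p could be selected with these criteria, reusing the var_ls function and the simulated series y from the estimation sketch above; taking K = N(Np + 1), i.e., all slope coefficients plus the intercepts, is an assumption of this example, not a statement from the notes.

```python
import numpy as np

def select_order(y, p_max):
    """(M)AIC and (M)SBC for p = 1, ..., p_max; returns the order chosen by each criterion."""
    T, N = y.shape
    aic, sbc = {}, {}
    for p in range(1, p_max + 1):
        _, _, _, sigma_tilde = var_ls(y, p)     # ML residual covariance (see the estimation sketch)
        Te = T - p                              # effective sample size
        K = N * (N * p + 1)                     # total number of estimated parameters (assumption)
        logdet = np.log(np.linalg.det(sigma_tilde))
        aic[p] = logdet + 2.0 * K / Te
        sbc[p] = logdet + K * np.log(Te) / Te
    return min(aic, key=aic.get), min(sbc, key=sbc.get), aic, sbc

p_aic, p_sbc, *_ = select_order(y, p_max=6)     # y as simulated in the estimation sketch
print(p_aic, p_sbc)                             # SBC tends to select the more parsimonious order
```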
Alternative statistics:
– Likelihood ratio test: LRT(p0, p1) = (T − Np1 − 1)(ln|Σ̃u^{p0}| − ln|Σ̃u^{p1}|), which is asymptotically distributed as χ²(N²(p1 − p0)) under the null that the p1 − p0 additional lags are not needed.
• If the assumption that errors from each equation are normally distributed
is not respected, the test is not valid.
• When the sample size is small, the test may be subject to substantial size
distortions.
Assumption: ut is an independent multivariate white noise, such that ut and us are independent for t ≠ s ⇒ E[ut+h|ℑt] = 0 for h > 0.
Properties:
or, for example in the case of a VAR(1):
[y1,t; y2,t] = [µ1; µ2] + Σ_{i=0}^{∞} [θ1,1(i), θ1,2(i); θ2,1(i), θ2,2(i)] [u1,t−i; u2,t−i]
where
[u1,t; u2,t] = [1/(1 − b1,2 b2,1)] [1, −b1,2; −b2,1, 1] [ε1,t; ε2,t]
Then
Φi = (A1^i/(1 − b1,2 b2,1)) [1, −b1,2; −b2,1, 1] = (Θi/(1 − b1,2 b2,1)) [1, −b1,2; −b2,1, 1]
and
yt = µ + Σ_{i=0}^{∞} Φi εt−i
For example, φ1,2(0) is the instantaneous impact on y1,t of a one-unit change in ε2,t.
For example, Σ_{i=0}^{H} φ1,2(i) is the cumulative effect of a one-unit shock (or impulse) to ε2,t on the variable y1,t after H periods.
• The set of elements φj,k(i), with i = 0, 1, ..., H, is the impulse response function of the jth variable of the system to the kth structural shock, up to period H.
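As an illustration (a sketch under a Choleski identification, with made-up parameter values), the responses Φi of a VAR(1) to one-unit structural shocks can be computed as powers of A1 times the unit-diagonal triangular factor of Σu:

```python
import numpy as np

def irf_var1(A1, sigma_u, H):
    """Responses Phi_i, i = 0, ..., H, of a VAR(1) to one-unit structural shocks,
    identified with a Choleski (lower-triangular) scheme."""
    P = np.linalg.cholesky(sigma_u)
    B_inv = P / np.diag(P)                       # unit lower-triangular B^{-1}
    Phi = [np.linalg.matrix_power(A1, i) @ B_inv for i in range(H + 1)]
    return np.array(Phi)                         # Phi[i, j, k] = phi_{j,k}(i)

# made-up parameter values
A1 = np.array([[0.5, 0.1], [0.2, 0.3]])
sigma_u = np.array([[1.0, 0.3], [0.3, 1.5]])
Phi = irf_var1(A1, sigma_u, H=10)
print(Phi[0, 0, 1])        # phi_{1,2}(0): zero by construction under this ordering
print(Phi[:, 0, 1].sum())  # cumulative effect of a one-unit shock to eps_2 on y_1 after 10 periods
```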
• IRFs are constructed using estimated coefficients and will therefore contain sampling error. It is thus advisable to construct confidence intervals around them to account for the uncertainty deriving from parameter estimation.
• Bootstrapping techniques are usually used to compute confidence intervals,
as these are more reliable and avoid the complex computation of exact ex-
pressions for the asymptotic variance of the IRF coefficients.
Bootstrapping methods:
iii. Discard the coefficients used to generate {yt^b} and estimate new coefficients from {yt^b}. The impulse response functions are computed from the newly estimated coefficients and saved, also indexed by the bootstrap iteration b.
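A compact residual-bootstrap sketch for a VAR(1) along these lines, reusing var_ls and irf_var1 from the earlier snippets; steps i.-ii. (resampling the residuals and regenerating an artificial sample) are spelled out in the code, and every name is illustrative.

```python
import numpy as np

def bootstrap_irf(y, H=10, n_boot=200, seed=0):
    """Percentile confidence bands for the Choleski IRFs of a VAR(1),
    reusing var_ls and irf_var1 from the earlier sketches."""
    rng = np.random.default_rng(seed)
    T, N = y.shape
    B_hat, U_hat, _, _ = var_ls(y, p=1)
    a0, A1 = B_hat[:, 0], B_hat[:, 1:]
    draws = []
    for b in range(n_boot):
        u_b = U_hat.T[rng.integers(0, T - 1, size=T - 1)]   # i. resample the residuals with replacement
        y_b = np.zeros((T, N))
        y_b[0] = y[0]
        for t in range(1, T):                               # ii. regenerate an artificial sample {y_t^b}
            y_b[t] = a0 + A1 @ y_b[t - 1] + u_b[t - 1]
        Bb, _, _, sb = var_ls(y_b, p=1)                     # iii. re-estimate and recompute the IRFs
        draws.append(irf_var1(Bb[:, 1:], sb, H))
    lo, hi = np.percentile(np.array(draws), [5, 95], axis=0)
    return lo, hi                                           # 90% percentile bands, shape (H+1, N, N)

lo, hi = bootstrap_irf(y)                # y as simulated in the estimation sketch
print(lo[:, 0, 1], hi[:, 0, 1])          # band for the response of y_1 to a shock to eps_2
```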
then
uy1(h) = y1,t+h − Et[y1,t+h] = φ1,1(0)ε1,t+h + φ1,1(1)ε1,t+h−1 + ... + φ1,1(h−1)ε1,t+1 + φ1,2(0)ε2,t+h + φ1,2(1)ε2,t+h−1 + ... + φ1,2(h−1)ε2,t+1
• The variance of the forecast error increases as the forecast horizon h increases, because all the terms φ²j,k(i) are non-negative, being squares.
Therefore, the h-step-ahead forecast error variance can be decomposed into
ii. the proportion of forecast error variance due to the shocks in the sequence {ε2,t}:
σ²ε,2 [φ²1,2(0) + φ²1,2(1) + ... + φ²1,2(h − 1)] / σ²y1(h)
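A sketch of the corresponding computation for a VAR(1) under a Choleski identification (made-up parameter values); using the responses to one-standard-deviation shocks absorbs the σ²ε,k factors that appear in the formula above.

```python
import numpy as np

def fevd_var1(A1, sigma_u, h):
    """Share of the h-step-ahead forecast error variance of each variable attributable
    to each Choleski-identified structural shock, for a VAR(1)."""
    P = np.linalg.cholesky(sigma_u)                      # responses to one-std-deviation shocks
    N = A1.shape[0]
    contrib = np.zeros((N, N))
    for i in range(h):
        Phi = np.linalg.matrix_power(A1, i) @ P
        contrib += Phi**2                                # squared responses summed over i = 0, ..., h-1
    return contrib / contrib.sum(axis=1, keepdims=True)  # rows: variables, columns: shocks

# made-up parameter values
A1 = np.array([[0.5, 0.1], [0.2, 0.3]])
sigma_u = np.array([[1.0, 0.3], [0.3, 1.5]])
print(fevd_var1(A1, sigma_u, h=10))   # each row sums to one
```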
Granger causality: Let ℑt be the information set containing all the relevant information available up to and including time t. In addition, let yt(h|ℑt) be the optimal (minimum MSFE) h-step-ahead prediction of the process {yt} at the forecast origin t, based on ℑt. The vector time series process {xt} is said to Granger-cause {yt} if and only if
Feedback system: represented by the joint process [xt′, yt′]′ when {xt} causes {yt} and {yt} causes {xt}.
• We usually only consider the information in the past and present values of the processes under examination, rather than the entire ℑt, because the set of all the existing relevant information is rarely available.
MSFE(E[yt | xt−1, xt−2, ..., yt−1, yt−2, ...]) ≤ MSFE(E[yt | yt−1, yt−2, ...])
• The lack of Granger causality can be verified using a standard F-test of the restriction a1,2(1) = a1,2(2) = ... = a1,2(p) = 0 (a sketch of this test follows below), where, for instance, in the case of N = 2
[y1,t; y2,t] = [a1,0; a2,0] + [a1,1(1), a1,2(1); a2,1(1), a2,2(1)] [y1,t−1; y2,t−1] + ... + [a1,1(p), a1,2(p); a2,1(p), a2,2(p)] [y1,t−p; y2,t−p] + [u1,t; u2,t]
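A sketch of this F-test using equation-by-equation OLS, reusing the simulated series y from the estimation sketch; the function name and the lag length are illustrative.

```python
import numpy as np
from scipy.stats import f

def granger_f_test(y, caused=0, causing=1, p=2):
    """F-test of H0: all p lags of y[:, causing] can be excluded from the OLS equation
    for y[:, caused] in a VAR(p) estimated equation by equation."""
    T, N = y.shape
    Y = y[p:, caused]
    X_full = np.column_stack([np.ones(T - p)] +
                             [y[p - j:T - j] for j in range(1, p + 1)])   # constant + N*p lags
    keep = [0] + [1 + (j - 1) * N + k for j in range(1, p + 1)
                  for k in range(N) if k != causing]
    X_restr = X_full[:, keep]                          # drop the lags of the "causing" variable
    ssr = lambda X: np.sum((Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0])**2)
    ssr_r, ssr_u = ssr(X_restr), ssr(X_full)
    df1, df2 = p, (T - p) - X_full.shape[1]
    F = ((ssr_r - ssr_u) / df1) / (ssr_u / df2)
    return F, f.sf(F, df1, df2)                        # statistic and p-value

# with the simulated VAR(1) from the estimation sketch, y_2 does Granger-cause y_1
print(granger_f_test(y, caused=0, causing=1, p=2))
```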
Block-causality tests: verify whether one variable, yn,t, Granger-causes any other variable in the system, that is, whether taking into account the lagged values of yn,t helps forecast any of the other variables in the VAR. They consist of likelihood ratio tests of the form
(T − m)(ln|Σ̃u^R| − ln|Σ̃u^U|)
where Σ̃u^R is the covariance matrix of the residuals from a model in which all the coefficients on the lags of the variable yn,t have been restricted to zero, and Σ̃u^U is the residual covariance matrix of the unrestricted model.