
Vector Autoregressive Moving Average (VARMA) Models

Massimo Guidolin

February 2018

1 Foundations of Multivariate Time Series Analysis

1.1 Weak Stationarity of Multivariate Time Series

Weak Stationarity (multivariate case): Consider an N-dimensional time series yt = [y1,t, y2,t, ..., yN,t]′. Formally, yt is said to be weakly stationary if its first two unconditional moments are finite and constant through time, i.e.,

• E[yt ] ≡ µ < ∞ ∀t

• E[(yt − µ)(yt − µ)0 ] ≡ Γ0 < ∞ ∀t

• E[(yt − µ)(yt−h − µ)0 ] ≡ Γh ∀t, ∀h

where
µ = [µ1 , µ2 , ..., µN ];
Γ0 = N × N covariance matrix where the ith diagonal element is the variance of yi,t and the
(i, j)th element is the covariance between yi,t and yj,t ;
Γh =cross-covariance matrix at lag h;
and the expectations are taken element-by-element over the joint distribution of yt .

1.2 Cross-Covariance and Cross-Correlation Matrices

Lag-0 correlation matrix of yt :

ρ0 = D−1 Γ0 D−1

where D = N × N diagonal matrix collecting on its main diagonal the standard deviations of yi,t for i = 1, ..., N, and

ρi,j(0) = Cov[yi,t, yj,t] / (σi,t σj,t)

• ρ0 is a symmetric matrix with unit diagonal elements, because ρi,j(0) = ρj,i(0), −1 ≤ ρi,j(0) ≤ 1, and ρi,i(0) = 1 for 1 ≤ i, j ≤ N.

Lag-h cross-covariance matrix of yt :

Γh = E[(yt − µ)(yt−h − µ)0 ]

where µ=mean vector of yt and the (i,j)th element of Γh =covariance between yi,t and yj,t−h .

• It is time-invariant if the time-series is weakly stationary.

Lag-h cross-correlation matrix:

ρh = D−1 Γh D−1

where D is the diagonal matrix of standard deviations of the individual series yi,t
and
ρi,j(h) = Cov[yi,t, yj,t−h] / (σi,t σj,t)

• When h > 0, ρi,j (h) measures the linear dependence of yi,t on yj,t−h , while ρj,i (h)
measures the linear dependence of yj,t on yi,t−h .

• ρi,i (h) is the lag-h autocorrelation coefficient of yi,t .

• ρj,i(h) ≠ ρi,j(h) for any i ≠ j, therefore Γh and ρh do not need to be symmetric.

• Information summarized by the cross-correlation matrices: the diagonal elements ρi,i(h) describe the serial dependence of each individual series, the lag-0 off-diagonal elements describe the contemporaneous linear relationships between the series, and the lag-h off-diagonal elements describe their lead-lag linear relationships.

1.3 Sample Cross-Covariance and Cross-Correlation Matrices

Sample cross-covariance matrix:

Γ̂h = (1/T) Σ_{t=h+1}^{T} (yt − ȳ)(yt−h − ȳ)′   with h ≥ 0

where
ȳ = [ȳ1, ȳ2, ..., ȳN]′
ȳi = T^{−1} Σ_{t=1}^{T} yi,t with i = 1, ..., N

Sample cross-correlation matrix:

ρ̂h = D̂−1 Γ̂h D̂−1 with h ≥ 0

where D̂ is the N × N diagonal matrix of the sample standard deviations of each of the
component series.
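
To make these definitions concrete, here is a minimal NumPy sketch (not part of the original notes; the function names and the simulated data are purely illustrative) that computes Γ̂h and ρ̂h exactly as defined above:

import numpy as np

def sample_cross_cov(Y, h):
    """Gamma_hat(h) = (1/T) * sum_{t=h+1}^{T} (y_t - ybar)(y_{t-h} - ybar)'  (Y is T x N)."""
    T, N = Y.shape
    Yc = Y - Y.mean(axis=0)
    G = np.zeros((N, N))
    for t in range(h, T):                      # python index t corresponds to calendar time t+1
        G += np.outer(Yc[t], Yc[t - h])
    return G / T

def sample_cross_corr(Y, h):
    """rho_hat(h) = D^{-1} Gamma_hat(h) D^{-1}, D = diag of sample standard deviations."""
    D_inv = np.diag(1.0 / Y.std(axis=0))
    return D_inv @ sample_cross_cov(Y, h) @ D_inv

# toy example with simulated bivariate data
rng = np.random.default_rng(0)
Y = rng.standard_normal((500, 2))
print(sample_cross_corr(Y, 0))   # close to the identity matrix for white noise
print(sample_cross_corr(Y, 1))   # close to a matrix of zeros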

1.4 Multivariate Portmanteau Tests

Multivariate version of the Ljung-Box statistic:

H0 : ρ1 = ... = ρm = 0 vs H1 : ρi ≠ 0 for some i ∈ {1, ..., m}

Q(m) = T² Σ_{h=1}^{m} (1/(T − h)) tr(Γ̂′h Γ̂0^{−1} Γ̂h Γ̂0^{−1})
where
T = sample size
N = dimension of yt
m = maximum lag length we want to test
tr(A) = trace of some matrix A, defined as the sum of the diagonal elements of A.

• Under H0, Q(m) is asymptotically distributed as χ²(N²m).

• When T is small, the χ2 approximation to the distribution of the test statistic may be
misleading.

• When T is small, the empirical size of the portmanteau test tends to be lower than the nominal significance level chosen, and the test has low power against many alternatives. Adjusted versions of the Q statistic can be used:

– Hosking’s statistic:

Q∗(m) = T(T + 2) Σ_{h=1}^{m} (1/(T − h)) tr(Γ̂′h Γ̂0^{−1} Γ̂h Γ̂0^{−1})

– Li and McLeod’s statistic:

Q∗∗(m) = T Σ_{h=1}^{m} (1/(T − h)) tr(Γ̂′h Γ̂0^{−1} Γ̂h Γ̂0^{−1}) + N²m(m + 1)/(2T)
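
A hedged sketch of the Q(m) statistic and its χ²(N²m) p-value, with the sample cross-covariances computed as in Section 1.3 (the function name and the simulated data are illustrative only):

import numpy as np
from scipy import stats

def portmanteau_Q(Y, m):
    """Multivariate Ljung-Box Q(m) = T^2 * sum_{h=1}^m tr(G_h' G_0^{-1} G_h G_0^{-1}) / (T - h)."""
    T, N = Y.shape
    Yc = Y - Y.mean(axis=0)
    def G(h):
        return sum(np.outer(Yc[t], Yc[t - h]) for t in range(h, T)) / T
    G0_inv = np.linalg.inv(G(0))
    Q = 0.0
    for h in range(1, m + 1):
        Gh = G(h)
        Q += np.trace(Gh.T @ G0_inv @ Gh @ G0_inv) / (T - h)
    Q *= T ** 2
    dof = N ** 2 * m
    return Q, 1 - stats.chi2.cdf(Q, dof)

rng = np.random.default_rng(1)
Q, pval = portmanteau_Q(rng.standard_normal((300, 2)), m=5)
print(Q, pval)   # under H0 (white noise) the p-value should not be small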

1.5 Multivariate White Noise Process

Multivariate White Noise: Let zt = [z1,t, z2,t, ..., zN,t]′ be an N × 1 vector of random variables. This multivariate time series is said to be a multivariate white noise if it is a stationary vector with zero mean, and if the values of zt at different times are uncorrelated, i.e., Γh is an N × N matrix of zeros for all h ≠ 0.

• Each component of zt simply behaves like a univariate white noise.

• The individual white noises are uncoupled in a linear sense.

• Assuming that the values of zt are uncorrelated does not necessarily imply that they are independent. Independence can be inferred from the lack of correlation at all leads and lags among the random variables that enter zt only when the random vector follows a multivariate normal distribution.

2 Introduction to VAR Analysis

2.1 From Structural to Reduced-Form VARs

Vector Autoregressive Model VAR(p): A vector autoregressive model of order p is a process that can be represented as

yt = a0 + A1 yt−1 + A2 yt−2 + ... + Ap yt−p + ut = a0 + Σ_{j=1}^{p} Aj yt−j + ut

where
yt = N × 1 vector containing N endogenous variables
a0 = N × 1 vector of constants
A1, A2, ..., Ap = p N × N matrices of autoregressive coefficients
ut = N × 1 vector of serially uncorrelated white noise disturbances.

Structural VAR or VAR in primitive form:

y1,t = b1,0 − b1,2 y2,t + ϕ1,1 y1,t−1 + ϕ1,2 y2,t−1 + ε1,t

y2,t = b2,0 − b2,1 y1,t + ϕ2,1 y1,t−1 + ϕ2,2 y2,t−1 + ε2,t

where
y1,t and y2,t are assumed to be stationary
ε1,t and ε2,t are uncorrelated white-noise disturbances with standard deviations σε,1 and σε,2, respectively.

In matrix notation

[ 1     b1,2 ] [ y1,t ]   [ b1,0 ]   [ ϕ1,1  ϕ1,2 ] [ y1,t−1 ]   [ ε1,t ]
[ b2,1  1    ] [ y2,t ] = [ b2,0 ] + [ ϕ2,1  ϕ2,2 ] [ y2,t−1 ] + [ ε2,t ]

or in compact form

B yt = Q0 + Q1 yt−1 + εt

• y1,t depends on its own lag and on both the lagged and the current value of y2,t; y2,t depends on its own lag and on both the lagged and the current value of y1,t.

• It captures contemporaneous feedback effects:

1. −b1,2 measures the contemporaneous effect of a unit change of y2,t on y1,t ;

2. −b2,1 measures the contemporaneous effect of a unit change of y1,t on y2,t .

• Each contemporaneous regressor is correlated with the error term of the equation in which it appears, therefore the regressors are not uncorrelated with the error terms as required by OLS estimation techniques.

• When b2,1 ≠ 0, y2,t depends on y1,t and hence on ε1,t, and will be correlated with it; when b1,2 ≠ 0, y1,t depends on y2,t and hence on ε2,t.

• Contemporaneous terms cannot be used in forecasting.

Reduced-form VAR or VAR in standard form:

y1,t = a1,0 + a1,1 y1,t−1 + a1,2 y2,t−1 + u1,t

y2,t = a2,0 + a2,1 y1,t−1 + a2,2 y2,t−1 + u2,t


that is obtained by pre-multiplying both sides of B yt = Q0 + Q1 yt−1 + εt by B−1:

yt = a0 + A1 yt−1 + ut

where a0 = B−1 Q0, A1 = B−1 Q1, ut = B−1 εt.

• It does not contain contemporaneous feedback terms.

• It can be estimated equation by equation using OLS.

• u1,t and u2,t are composites of ε1,t and ε2,t: in fact, since ut = B−1 εt,

u1,t = (ε1,t − b1,2 ε2,t)/(1 − b1,2 b2,1)   and   u2,t = (ε2,t − b2,1 ε1,t)/(1 − b1,2 b2,1).

Properties (derived from the white noise properties of ε1,t and ε2,t):

1.
E[u1,t] = E[(ε1,t − b1,2 ε2,t)/(1 − b1,2 b2,1)] = 0
E[u2,t] = E[(ε2,t − b2,1 ε1,t)/(1 − b1,2 b2,1)] = 0

2.
Var[u1,t] = Var[ε1,t − b1,2 ε2,t]/(1 − b1,2 b2,1)²
          = (Var[ε1,t] + b²1,2 Var[ε2,t] − 2 b1,2 Cov[ε1,t, ε2,t])/(1 − b1,2 b2,1)²
          = (σ²ε,1 + b²1,2 σ²ε,2)/(1 − b1,2 b2,1)²
Var[u2,t] = (σ²ε,2 + b²2,1 σ²ε,1)/(1 − b1,2 b2,1)²
both constant over time.

3.
Cov[u1,t, u2,t] = E[(ε1,t − b1,2 ε2,t)(ε2,t − b2,1 ε1,t)]/(1 − b1,2 b2,1)²
               = −(b2,1 σ²ε,1 + b1,2 σ²ε,2)/(1 − b1,2 b2,1)²

• u1,t and u2,t are serially uncorrelated, but are cross-correlated unless b1,2 = b2,1 =
0.

4.
Σu = [ Var[u1,t]        Cov[u1,t, u2,t] ]   [ σ²1   σ1,2 ]
     [ Cov[u1,t, u2,t]  Var[u2,t]       ] = [ σ1,2  σ²2  ]

• In general, it is not possible to identify the structural parameters and errors from
the OLS estimates of the parameters and the residuals of the standard form VAR,
unless some restrictions are imposed on the primitive system.
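
The following small numerical sketch (parameter values are arbitrary and only for illustration) verifies the mapping ut = B−1 εt, i.e. Σu = B−1 Σε (B−1)′, and shows why the reduced form alone cannot pin down the structural parameters:

import numpy as np

# arbitrary structural parameters (for illustration only)
b12, b21 = 0.4, 0.3
sig_eps = np.diag([1.0, 0.5])             # diagonal Sigma_eps of the structural shocks

B = np.array([[1.0, b12],
              [b21, 1.0]])
B_inv = np.linalg.inv(B)                  # equals 1/(1 - b12*b21) * [[1, -b12], [-b21, 1]]

Sigma_u = B_inv @ sig_eps @ B_inv.T       # covariance matrix of the reduced-form errors
print(Sigma_u)
# Sigma_u has 3 distinct elements, while the structural side has 4 unknowns
# (b12, b21 and the two shock variances): without restrictions the system is under-identified.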

Recursive Choleski triangularization: impose a Choleski decomposition on the covariance matrix of the residuals of the VAR in its standard form, that is, impose the restriction b1,2 = 0, so that

y1,t = b1,0 + ϕ1,1 y1,t−1 + ϕ1,2 y2,t−1 + ε1,t
y2,t = b2,0 − b2,1 y1,t + ϕ2,1 y1,t−1 + ϕ2,2 y2,t−1 + ε2,t

then

u1,t = ε1,t and u2,t = ε2,t − b2,1 ε1,t
In matrix form, the restriction b1,2 = 0 means that

B−1 = [ 1      0 ]
      [ −b2,1  1 ]

so that

[ y1,t ]   [ 1      0 ] [ b1,0 ]   [ 1      0 ] [ ϕ1,1  ϕ1,2 ] [ y1,t−1 ]   [ 1      0 ] [ ε1,t ]
[ y2,t ] = [ −b2,1  1 ] [ b2,0 ] + [ −b2,1  1 ] [ ϕ2,1  ϕ2,2 ] [ y2,t−1 ] + [ −b2,1  1 ] [ ε2,t ]

         [ b1,0             ]   [ ϕ1,1              ϕ1,2              ] [ y1,t−1 ]   [ ε1,t             ]
       = [ b2,0 − b1,0 b2,1 ] + [ ϕ2,1 − b2,1 ϕ1,1  ϕ2,2 − b2,1 ϕ1,2  ] [ y2,t−1 ] + [ ε2,t − b2,1 ε1,t ]

so that
a1,0 = b1,0,  a2,0 = b2,0 − b1,0 b2,1,  a1,1 = ϕ1,1,  a1,2 = ϕ1,2,
a2,1 = ϕ2,1 − b2,1 ϕ1,1,  a2,2 = ϕ2,2 − b2,1 ϕ1,2,  u1,t = ε1,t,  u2,t = ε2,t − b2,1 ε1,t.
It follows that
σ²1 ≡ Var[u1,t] = σ²ε,1
σ²2 ≡ Var[u2,t] = σ²ε,2 + b²2,1 σ²ε,1
Cov[u1,t, u2,t] = −b2,1 σ²ε,1

• The restriction implies that the observed values of u1,t are completely attributed
to pure (structural) shocks to y1,t .

Choleski decomposition of the symmetric matrix: the covariance matrix of the residuals is forced to be equal to

Σu = W Σ W′ = Σu^{1/2} (Σu^{1/2})′

where W = B−1, Σ is the diagonal covariance matrix of the structural innovations, and Σu^{1/2} is the lower-triangular “square root” of the covariance matrix Σu:

Σu = [ 1      0 ] [ σ²ε,1  0     ] [ 1      0 ]′
     [ −b2,1  1 ] [ 0      σ²ε,2 ] [ −b2,1  1 ]

   = [ 1      0 ] [ σ²ε,1  0     ] [ 1  −b2,1 ]
     [ −b2,1  1 ] [ 0      σ²ε,2 ] [ 0   1    ]

   = [ σ²ε,1         −b2,1 σ²ε,1         ]
     [ −b2,1 σ²ε,1   σ²ε,2 + b²2,1 σ²ε,1 ]

We can go back from the estimated Σu to the original (and unobserved) diagonal matrix Σ; after a little algebra, this is equivalent to

Σ = W−1 Σu (W′)−1

• In an N-variate VAR, we need to impose (N² − N)/2 restrictions in order to retrieve the N structural shocks from the residuals of the OLS estimates.

Example for a VAR(1) with three endogenous variables:

We need to impose (3² − 3)/2 = 3 restrictions, which is equivalent to pre-multiplying the structural VAR by the lower-triangular matrix

B−1 = [ 1      0      0 ]
      [ −b2,1  1      0 ]
      [ −b3,1  −b3,2  1 ]

so that

ut = B−1 εt = [ 1      0      0 ] [ ε1,t ]   [ ε1,t                         ]
              [ −b2,1  1      0 ] [ ε2,t ] = [ ε2,t − b2,1 ε1,t             ]
              [ −b3,1  −b3,2  1 ] [ ε3,t ]   [ ε3,t − b3,1 ε1,t − b3,2 ε2,t ]

• There are as many Choleski decompositions as there are possible orderings of the variables. Therefore, when we apply a Choleski triangular identification scheme to a VAR model we are introducing a number of (potentially arbitrary) assumptions on the contemporaneous relationships among the variables.
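
A minimal sketch, assuming a given ordering of the variables, of how the Choleski factorization of an estimated Σu recovers B−1 and the diagonal covariance matrix of the structural shocks (the numbers are illustrative):

import numpy as np

# pretend this is the residual covariance matrix estimated from the reduced-form VAR
Sigma_u = np.array([[1.00, 0.30],
                    [0.30, 0.61]])

P = np.linalg.cholesky(Sigma_u)           # lower-triangular "square root": Sigma_u = P P'
D_half = np.diag(np.diag(P))              # diagonal scale of P
W = P @ np.linalg.inv(D_half)             # unit-diagonal lower-triangular matrix = B^{-1}
Sigma_eps = D_half @ D_half               # diagonal covariance matrix of the structural shocks

print(W)          # the (2,1) element is -b_{2,1} in the notation above
print(Sigma_eps)
print(W @ Sigma_eps @ W.T)                # reproduces Sigma_u up to rounding

Reordering the variables changes the Choleski factor and hence the identified shocks, which is exactly the arbitrariness discussed in the bullet above.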

2.2 Stationarity Conditions and the Population Moments of a VAR(1) Process

Properties of a reduced-form, standard VAR(1) model:

(a)
E[yt] = a0 + A1 E[yt−1]
which is time-invariant under weak stationarity, and thus

µ ≡ E[yt] = (IN − A1)−1 a0

where (IN − A1) is a non-singular matrix and IN is the N × N identity matrix

(b)
µt|t−1 ≡ E[yt | ℑt−1] = E[yt | yt−1] = a0 + A1 yt−1

(c) Given that a0 = (IN − A1)µ, the VAR(1) model can be rewritten as

yt − µ = A1(yt−1 − µ) + ut

so that, defining the mean-corrected time series ỹt ≡ yt − µ,

ỹt = A1 ỹt−1 + ut

Substituting ỹt−1 = A1 ỹt−2 + ut−1, then ỹt−2 = A1 ỹt−3 + ut−2, and iterating, we obtain

ỹt = A1(A1 ỹt−2 + ut−1) + ut = A1² ỹt−2 + A1 ut−1 + ut = ...
   = ut + A1 ut−1 + A1² ut−2 + A1³ ut−3 + ... = ut + Σ_{i=1}^{∞} A1^i ut−i

and

yt = µ + ut + Σ_{i=1}^{∞} A1^i ut−i

Vector moving average (VMA) infinite representation of the VAR(1) model:

yt = µ + ut + Σ_{i=1}^{∞} Θi ut−i

which is derived from yt = µ + ut + Σ_{i=1}^{∞} A1^i ut−i by defining Θi ≡ A1^i.

• It represents the multivariate extension of Wold's representation theorem.

Properties:

(a) ut is serially uncorrelated and it is also uncorrelated with the past values of yt

Cov[ut , yt−1 ] = 0

and ut is called vector of innovations of the series at time t.

(b)
Cov[yt, ut] = Σu
which is derived by post-multiplying yt = µ + ut + Σ_{i=1}^{∞} A1^i ut−i by ut′, taking the expectation, and exploiting the fact that ut is serially uncorrelated.

(c) yt depends on the past innovations ut−j with coefficient matrix A1^j.

• yt is stable if det(IN − A1 z) ≠ 0 for |z| ≤ 1.

(d)
Cov[yt] ≡ Γ0 = Σu + A1 Σu A1′ + A1² Σu (A1²)′ + ... = Σ_{i=0}^{∞} A1^i Σu (A1^i)′

where A1^0 is the N × N identity matrix IN, or alternatively

Γ0 = Σ_{i=0}^{∞} Θi Σu Θi′

where the Θi are the coefficients of the moving average representation of the VAR, which can be derived as follows. Write the model as

yt = a0 + A(L)yt + ut

where A(L) ≡ A1 L + ... + Ap L^p, which can be rewritten as

Ā(L)yt = a0 + ut

where Ā(L) ≡ IN − A(L). Let

Θ(L) ≡ Σ_{i=0}^{∞} Θi L^i

be an operator such that Θ(L)Ā(L) = IN; then pre-multiply Ā(L)yt = a0 + ut by Θ(L) and obtain

yt = Θ(L)a0 + Θ(L)ut

that is

yt = Σ_{i=0}^{∞} Θi a0 + Σ_{i=0}^{∞} Θi ut−i = µ + Σ_{i=0}^{∞} Θi ut−i

Because Θ(L) is the inverse of Ā(L), the coefficients can be computed recursively as

Θi = Σ_{j=1}^{i} Θi−j Aj with Θ0 = IN (and Aj = 0 for j > p); for the VAR(1), Θi = Θi−1 A1 = A1^i.

Given that Cov[ut, yt−j] = E[ut ỹ′t−j] = 0 for j > 0, post-multiplying the mean-adjusted VAR(1) ỹt = A1 ỹt−1 + ut by ỹ′t−h and taking expectations yields

E[ỹt ỹ′t−h] = A1 E[ỹt−1 ỹ′t−h] for h > 0

Therefore

Γh = A1 Γh−1 = A1^h Γ0 for h > 0

Finally, pre- and post-multiplying Γh by D−1/2 (where D now denotes the diagonal matrix collecting the variances of the yi,t), we obtain

ρh = D−1/2 A1 Γh−1 D−1/2 = (D−1/2 A1 D1/2)(D−1/2 Γh−1 D−1/2) = Ψ ρh−1 = Ψ^h ρ0 for h > 0

where Ψ ≡ D−1/2 A1 D1/2.

(e)
Cov[yt | ℑt−1] = Cov[yt | yt−1] = A1 Cov[yt−1 | yt−1] A1′ + Σu = Σu
because Cov[yt−1 | yt−1] = 0.

• When the residuals are simultaneously uncorrelated (i.e., Σu is diagonal), then Cov[yt | yt−1] will also be diagonal.
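
A sketch that evaluates these population moments numerically for a stable bivariate VAR(1); Γ0 is obtained from the discrete Lyapunov equation Γ0 = A1 Γ0 A1′ + Σu implied by property (d) (all coefficient values are illustrative):

import numpy as np
from scipy.linalg import solve_discrete_lyapunov

a0 = np.array([0.5, 1.0])
A1 = np.array([[0.5, 0.1],
               [0.2, 0.3]])
Sigma_u = np.array([[1.0, 0.2],
                    [0.2, 0.5]])

# stability: all eigenvalues of A1 inside the unit circle
assert np.all(np.abs(np.linalg.eigvals(A1)) < 1)

mu = np.linalg.solve(np.eye(2) - A1, a0)                 # mu = (I - A1)^{-1} a0
Gamma0 = solve_discrete_lyapunov(A1, Sigma_u)            # solves Gamma0 = A1 Gamma0 A1' + Sigma_u
Gamma1 = A1 @ Gamma0                                     # Gamma_h = A1 Gamma_{h-1}

D_inv_half = np.diag(1.0 / np.sqrt(np.diag(Gamma0)))     # D^{-1/2}
rho0 = D_inv_half @ Gamma0 @ D_inv_half
rho1 = D_inv_half @ Gamma1 @ D_inv_half
print(mu, Gamma0, rho1, sep="\n")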

2.3 Generalization to a VAR(p) Model

Starting from the VAR(p) model equation

yt = a0 + A1 yt−1 + A2 yt−2 + ... + Ap yt−p + ut

Rewrite using the lag operator

(IN − A1 L − ... − Ap L^p) yt = a0 + ut

In compact form
A(L)yt = a0 + ut
where A(L) = IN − A1 L − ... − Ap Lp .

Properties:

(a)
µ = E[yt ] = (IN − A1 − ... − Ap )−1 a0
provided that the inverse of the matrix (IN − A1 − ... − Ap ) exists, and
µt|t−1 ≡ E[yt | ℑt−1] = a0 + Σ_{j=1}^{p} Aj yt−j

(b)
Cov[yt , ut ] = Σu

(c)
Cov[yt−h , ut ] = 0 for any h > 0

(d)
Γh = A1 Γh−1 + ... + Ap Γh−p for h > 0

(e)
ρh = Ψ1 ρh−1 + ... + Ψp ρh−p for h > 0
where Ψi = D−1/2 Ai D1/2 .

Representation of a VAR(p) as an Np-dimensional VAR(1) (for simplicity the intercept is omitted, i.e., one can work with the mean-adjusted process):

Define

ξt ≡ [ yt′, yt−1′, ..., yt−p+1′ ]′   (Np × 1),   Ut ≡ [ ut′, 0, ..., 0 ]′   (Np × 1),

       [ A1  A2  ...  Ap−1  Ap ]
       [ IN  0   ...  0     0  ]
F1 ≡   [ 0   IN  ...  0     0  ]    (Np × Np).
       [ ...                   ]
       [ 0   0   ...  IN    0  ]

Then

ξt = F1 ξt−1 + Ut

where

              [ Σu  0  ...  0 ]
E[Ut Ut′] =   [ 0   0  ...  0 ]     and   E[Ut Ut−h′] = 0 for h > 0
              [ ...            ]
              [ 0   0  ...  0 ]

In VMA representation

ξt = Ut + F1 Ut−1 + F1² Ut−2 + ... = Ut + Σ_{i=1}^{∞} F1^i Ut−i

and, selecting the first N rows (yt = J ξt, Ut = J′ ut),

yt = Σ_{i=0}^{∞} Πi ut−i

where Πi ≡ J F1^i J′ and J ≡ [IN, 0, ..., 0] is the N × Np selection matrix.

• A VAR(p) model is stable (and thus stationary) as long as the eigenvalues of the companion matrix F1 are all less than one in modulus, which is equivalent to det(IN − A1 z − ... − Ap z^p) ≠ 0 for |z| ≤ 1.
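
A short sketch, under illustrative coefficient values, that builds the companion matrix F1 and checks the stability condition through its eigenvalues:

import numpy as np

def companion(A_list):
    """Stack the N x N matrices A_1, ..., A_p into the Np x Np companion matrix F1."""
    p = len(A_list)
    N = A_list[0].shape[0]
    F = np.zeros((N * p, N * p))
    F[:N, :] = np.hstack(A_list)                  # first block row: [A1 A2 ... Ap]
    F[N:, :-N] = np.eye(N * (p - 1))              # identity blocks below the diagonal
    return F

A1 = np.array([[0.5, 0.1], [0.2, 0.3]])
A2 = np.array([[0.2, 0.0], [0.0, 0.1]])
F1 = companion([A1, A2])

eigvals = np.linalg.eigvals(F1)
print(np.abs(eigvals))
print("stable:", np.all(np.abs(eigvals) < 1))     # stability <=> all eigenvalues inside the unit circle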

2.4 Estimation of a VAR(p) Model

Multivariate LS estimator:

Starting from
Y = BZ + U
where Y ≡ [y1 , y2 , ..., yt ], B ≡ [a0 , A1 , A2 , ..., Ap ], U ≡ [u1 , u2 , ..., uT ], Z ≡ [Z0 , Z1 , ZT −1 ]
with Zt ≡ [10 , yt−1
0 0
, yt−2 0
, ..., yt−p+1 ]0 . Given that y ≡ vec(Y), β ≡ vec(B) and
u ≡ vec(U) the multivariate LS estimator is
β̂ = ((ZZ0 )−1 ⊗ Σu )(Z ⊗ Σ−1 0 −1
u )y = ((ZZ ) Z ⊗ IN )y

13
that minimizes
S(β) = u0 (IN Σu )−1 u

• When a reduced-form VAR is unconstrained, the GLS estimator is the same as


the OLS estimator, B̂, and therefore an unconstrained VAR can be estimated
equation by equation by OLS.
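
A minimal sketch of equation-by-equation OLS estimation of a VAR(p), written directly in terms of the regression Y = BZ + U above; the simulated data and the degrees-of-freedom correction used for Σ̂u are illustrative choices, not prescriptions from the notes:

import numpy as np

def estimate_var(Y, p):
    """OLS/multivariate LS estimate of B = [a0, A1, ..., Ap] and of Sigma_u for a VAR(p).
    Y is T x N; returns B_hat (N x (1 + N p)) and Sigma_hat (N x N)."""
    T, N = Y.shape
    Z, y = [], []
    for t in range(p, T):
        Z.append(np.concatenate([[1.0]] + [Y[t - j] for j in range(1, p + 1)]))
        y.append(Y[t])
    Z, y = np.array(Z), np.array(y)                  # (T-p) x (1+Np) and (T-p) x N
    B_hat = np.linalg.lstsq(Z, y, rcond=None)[0].T   # each equation regressed on the same Z
    resid = y - Z @ B_hat.T
    Sigma_hat = resid.T @ resid / (T - p - N * p - 1)   # degrees-of-freedom adjusted estimate
    return B_hat, Sigma_hat

# simulate a bivariate VAR(1) and recover its coefficients
rng = np.random.default_rng(2)
A1_true = np.array([[0.5, 0.1], [0.2, 0.3]])
Y = np.zeros((600, 2))
for t in range(1, 600):
    Y[t] = 0.5 + A1_true @ Y[t - 1] + rng.standard_normal(2)
B_hat, Sigma_hat = estimate_var(Y, p=1)
print(B_hat)        # first column approx a0, remaining block approx A1_true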

Asymptotic properties of the OLS estimator B̂ (under standard assumptions):

(a) Consistent and asymptotically normally distributed:

√T vec(B̂ − B) →d N(0, ΣB̂), or, asymptotically, vec(B̂) ∼ N(vec(B), ΣB̂/T)

where ΣB̂ = plim(ZZ′/T)−1 ⊗ Σu.

(b)
Σ̂u = (1/(T − Np)) Σ_{t=1}^{T} ût ût′   or   Σ̃u = (1/T) Σ_{t=1}^{T} ût ût′

where ût = yt − B̂ Zt−1.

Multivariate ML estimator:

Under the assumptions:

(a) Sample of T observations on Y and a pre-sample of p initial conditions y−p+1 , y−p+2 , ..., y0 .

(b) Stationary process and Gaussian multivariate white noise innovations.


⇒ Y = [y1 , y2 , ..., yT ]0 is jointly normally distributed.

(c) Gaussian multivariate white noise (so that innovations at different times are independent).

(d) The error terms are independent over time with covariance matrix Σu; then the covariance matrix of u is ΣU = IT ⊗ Σu and its normal density is

fu(u) = (2π)^{−NT/2} |IT ⊗ Σu|^{−1/2} exp(−(1/2) u′(IT ⊗ Σu^{−1}) u).

(e) fy(y) = (2π)^{−NT/2} |IT ⊗ Σu|^{−1/2} exp(−(1/2) vec(Y − BZ)′(IT ⊗ Σu^{−1}) vec(Y − BZ)).

the ML estimator maximizes

ℓ(B, Σu; Y, Z) = ln fy(Y) = −(NT/2) ln(2π) − (T/2) ln|Σu| − (1/2) vec(Y − BZ)′(IT ⊗ Σu^{−1}) vec(Y − BZ)
               = −(NT/2) ln(2π) − (T/2) ln|Σu| − (1/2) tr(U′ Σu^{−1} U)
2 2 2

• For an unconstrained VAR, the ML and OLS estimators are the same under the
assumption of Gaussian innovations.

Average cross-vector product of the OLS residuals: the ML estimator of the matrix Σu is

Σ̃u = (1/T) Σ_{t=1}^{T} ût ût′

Concentrated log-likelihood of the VAR(p) model: substituting the expression for the matrix Σu that maximizes the likelihood, in the class of all symmetric positive definite matrices, we obtain

ℓ(B, Σ̃u; Y, Z) = −(NT/2) ln(2π) − (T/2) ln|Σ̃u| − (1/2) NT

• Optimizing ℓ(B, Σu; Y, Z) = −(NT/2) ln(2π) − (T/2) ln|Σu| − (1/2) tr(U′ Σu^{−1} U) in one pass, or iterating between the estimate Σ̃u and the concentrated log-likelihood ℓ(B, Σ̃u; Y, Z) = −(NT/2) ln(2π) − (T/2) ln|Σ̃u| − (1/2) NT until convergence is achieved, will return identical results.

2.5 Specification of a VAR Model and Hypothesis Testing

• If the order of the VAR model increases, the (absolute) size of the residuals decreases and the in-sample fit of the model improves.

• However, as the number of parameters increases, in-sample accuracy increases while out-of-sample predictive power tends to decrease.

Restricted, standard VAR: models in which the structure and number of lags in-
cluded in each equation may vary across different equations.

Methods to select p:

(a) Multivariate information criteria (with K = total number of estimated parameters):

i. (M)AIC: ln|Σ̃u| + (2/T) K

ii. (M)SBC: ln|Σ̃u| + (ln(T)/T) K

iii. (M)HQIC: ln|Σ̃u| + (2 ln(ln(T))/T) K

(b) Final prediction error (FPE):

FPE(p) = [(T + Np + 1)/(T − Np − 1)]^N |Σ̃u|

where |Σ̃u| = determinant of the estimated covariance matrix of the residuals from a given VAR(p) model.

(c) General-to-simple approach: use sequential likelihood ratio (LR) tests:

H0 : p0 lags are sufficient

LRT(p0, p1) = T (ln|Σ̃u(p0)| − ln|Σ̃u(p1)|) ∼a χ²(N²(p1 − p0))

where Σ̃u(p0) is the residual covariance matrix estimated under H0 (the VAR includes p0 lags) and Σ̃u(p1) is the residual covariance matrix estimated under H1 (the VAR includes p1 lags).

Alternative statistic:

LRT′(p0, p1) = (T − Np1 − 1)(ln|Σ̃u(p0)| − ln|Σ̃u(p1)|) ∼a χ²(N²(p1 − p0))

• LR tests can only be used to perform a pairwise comparison of two VAR systems, one of which is a restricted version nested inside the bigger VAR. Hence a simple-to-general approach is not possible.

• If the assumption that errors from each equation are normally distributed
is not respected, the test is not valid.

• When the sample size is small, the test may be subject to substantial size
distortions.
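
A sketch of lag selection with the multivariate information criteria listed above, all computed on the same effective sample so the values are comparable across p (function names and the simulated process are illustrative):

import numpy as np

def fit_and_ic(Y, p_max):
    """AIC, SBC and HQIC (as defined above, K = total number of estimated parameters)
    for VAR(p), p = 1, ..., p_max, evaluated on the same effective sample."""
    T_full, N = Y.shape
    out = {}
    for p in range(1, p_max + 1):
        Z = np.array([np.concatenate([[1.0]] + [Y[t - j] for j in range(1, p + 1)])
                      for t in range(p_max, T_full)])
        y = Y[p_max:]
        T = len(y)
        B = np.linalg.lstsq(Z, y, rcond=None)[0]
        resid = y - Z @ B
        Sigma_tilde = resid.T @ resid / T
        K = N * (1 + N * p)                         # coefficients across all N equations
        logdet = np.linalg.slogdet(Sigma_tilde)[1]
        out[p] = {"AIC": logdet + 2 * K / T,
                  "SBC": logdet + K * np.log(T) / T,
                  "HQIC": logdet + 2 * K * np.log(np.log(T)) / T}
    return out

rng = np.random.default_rng(3)
A1_true = np.array([[0.5, 0.1], [0.2, 0.3]])
Y = np.zeros((500, 2))
for t in range(1, 500):
    Y[t] = A1_true @ Y[t - 1] + rng.standard_normal(2)
for p, ic in fit_and_ic(Y, 4).items():
    print(p, ic)                                   # the criteria should favour p = 1 here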

2.6 Forecasting with a VAR Model

Forecasting method: minimization of the mean squared forecast error (MSFE), which is used as the loss function.

Assumption: ut is an independent multivariate white noise, such that ut and us are independent for t ≠ s, which implies E[ut+h | ℑt] = 0 for h > 0.

Minimized time-t MSFE prediction at horizon h:

E[yt+h | ℑt] = E[yt+h | {ys | s ≤ t}] = a0 + A1 E[yt+h−1 | ℑt] + ... + Ap E[yt+h−p | ℑt]

(with the convention E[yt+j | ℑt] = yt+j for j ≤ 0), which is also the best linear predictor in terms of MSFE minimization.

Properties:

i. Unbiased predictor, i.e., E[yt+h − E[yt+h | ℑt]] = 0.

ii. If ut is an independent white noise vector, then MSFE[E[yt+h | ℑt]] = MSFE[E[yt+h | yt, yt−1, ...]].
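
A sketch of the recursive computation of the MSFE-minimizing point forecasts, using the convention E[yt+j | ℑt] = yt+j for j ≤ 0 (the coefficient values are assumed to be already estimated and are illustrative):

import numpy as np

def forecast_var(y_hist, a0, A_list, h):
    """Recursive forecasts E[y_{t+1}|I_t], ..., E[y_{t+h}|I_t] of a VAR(p).
    y_hist holds the most recent observations, newest last."""
    p = len(A_list)
    path = [np.asarray(v, dtype=float) for v in y_hist[-p:]]   # known values serve as "forecasts" for j <= 0
    forecasts = []
    for _ in range(h):
        y_next = np.asarray(a0, dtype=float) + sum(A @ path[-j] for j, A in enumerate(A_list, start=1))
        path.append(y_next)
        forecasts.append(y_next)
    return np.array(forecasts)

a0 = np.array([0.5, 1.0])
A1 = np.array([[0.5, 0.1], [0.2, 0.3]])
print(forecast_var([np.array([1.0, 2.0])], a0, [A1], h=3))
# as h grows, the forecasts converge to mu = (I - A1)^{-1} a0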

3 Structural Analysis with VAR Models

3.1 Impulse Response Functions

• VAR models can be used to understand the dynamic relationships between


the variables of interest.

Impulse Response Function: In the context of a VAR model, an impulse


response function traces out the time path of the effects of an exogenous shock
to one (or more) of the endogenous variables on some or all of the other variables
in a VAR system.

Impact multipliers: coefficients of the matrix Φi .

Starting from the moving average representation of a VAR(1)

yt = a0 + A1 yt−1 + ut = µ + Σ_{i=0}^{∞} A1^i ut−i = µ + Σ_{i=0}^{∞} Θi ut−i

or, written out for the bivariate VAR(1):

[ y1,t ]   [ µ1 ]    ∞   [ θ1,1(i)  θ1,2(i) ] [ u1,t−i ]
[ y2,t ] = [ µ2 ] +  Σ   [ θ2,1(i)  θ2,2(i) ] [ u2,t−i ]
                    i=0

where

[ u1,t ]                        [ 1      −b1,2 ] [ ε1,t ]
[ u2,t ] = (1/(1 − b1,2 b2,1)) · [ −b2,1  1    ] [ ε2,t ]

Then

Φi = (A1^i / (1 − b1,2 b2,1)) [ 1      −b1,2 ]  =  (Θi / (1 − b1,2 b2,1)) [ 1      −b1,2 ]
                              [ −b2,1  1     ]                            [ −b2,1  1     ]

and

yt = µ + Σ_{i=0}^{∞} Φi εt−i

For example, φ1,2(0) is the instantaneous impact on y1,t of a one-unit change in ε2,t.

Cumulative response of variable j to a shock to variable k:

Σ_{i=0}^{H} φj,k(i)

For example, Σ_{i=0}^{H} φ1,2(i) is the cumulative effect of a one-unit shock (or impulse) to ε2,t on the variable y1,t after H periods.

Long-run impact multipliers: the cumulative multipliers obtained as H → ∞.

• The set of elements φj,k(i), with i = 0, 1, ..., H, is the impulse response function of the jth variable to a shock to the kth variable, up to period H.

• A VAR in its reduced form is under-identified by construction and therefore the φj,k(i) cannot be computed from the OLS estimates of the VAR in its standard form without imposing adequate restrictions.

• Choleski decompositions provide a minimal set of restrictions concerning the simultaneous relationships among variables that can be used to identify the structural model, but this method forces a potentially important identification asymmetry on the system.

• IRFs are constructed using estimated coefficients and thus will contain sampling error. Therefore, it is advisable to construct confidence intervals around them to account for the uncertainty that derives from parameter estimation.

• Bootstrapping techniques are usually used to compute confidence intervals, as these are more reliable and avoid the complex computation of exact expressions for the asymptotic variance of the IRF coefficients.

Bootstrapping methods:

i. Estimate each equation using OLS/MLE and construct {uᵇt} by randomly sampling with replacement from the estimated residuals.

ii. Use {uᵇt} and the estimated coefficients to construct a pseudo-vector of endogenous variable series {yᵇt}.

iii. Discard the coefficients used to generate {yᵇt} and estimate new coefficients from {yᵇt}. The impulse response functions are computed from the newly estimated coefficients and saved, indexed by the bootstrap iteration b.

• An impulse response function is considered to be statistically significant if


zero is not included in the bootstrapped confidence interval.
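
A compact sketch of Choleski-orthogonalized impulse responses for a VAR(1), with percentile confidence bands obtained by the residual bootstrap just described; everything here (coefficient values, number of replications, confidence level) is an illustrative assumption:

import numpy as np

def var1_ols(Y):
    """OLS of a VAR(1) with intercept; returns a0_hat, A1_hat and the fitted residuals."""
    Z = np.column_stack([np.ones(len(Y) - 1), Y[:-1]])
    B = np.linalg.lstsq(Z, Y[1:], rcond=None)[0].T
    resid = Y[1:] - Z @ B.T
    return B[:, 0], B[:, 1:], resid

def choleski_irf(A1, Sigma_u, H):
    """Orthogonalized responses Phi_i = A1^i P, i = 0..H, with P the Choleski factor of Sigma_u."""
    P = np.linalg.cholesky(Sigma_u)
    out, Ai = [], np.eye(len(A1))
    for _ in range(H + 1):
        out.append(Ai @ P)
        Ai = Ai @ A1
    return np.array(out)                              # shape (H+1, N, N)

# simulate data, estimate, and bootstrap the IRFs
rng = np.random.default_rng(4)
A1_true = np.array([[0.5, 0.1], [0.2, 0.3]])
Y = np.zeros((400, 2))
for t in range(1, 400):
    Y[t] = A1_true @ Y[t - 1] + rng.standard_normal(2)

a0_hat, A1_hat, resid = var1_ols(Y)
irf_hat = choleski_irf(A1_hat, resid.T @ resid / len(resid), H=10)

boot = []
for b in range(500):                                  # bootstrap replications
    u_b = resid[rng.integers(0, len(resid), len(resid))]
    Y_b = np.zeros_like(Y)
    Y_b[0] = Y[0]
    for t in range(1, len(Y)):
        Y_b[t] = a0_hat + A1_hat @ Y_b[t - 1] + u_b[t - 1]
    _, A1_b, resid_b = var1_ols(Y_b)
    boot.append(choleski_irf(A1_b, resid_b.T @ resid_b / len(resid_b), H=10))
lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)

# response of variable 1 to the first orthogonalized shock, with the 95% band
print(np.column_stack([lo[:, 0, 0], irf_hat[:, 0, 0], hi[:, 0, 0]]))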

3.2 Variance Decompositions

Forecast error variance decomposition:

Starting from the VMA representation of the model, the h-step-ahead forecast error is

ut(h) = yt+h − Et[yt+h] = Σ_{i=0}^{h−1} Φi εt+h−i

then

uy1(h) = y1,t+h − E[y1,t+h|t] = φ1,1(0)ε1,t+h + φ1,1(1)ε1,t+h−1 + ... + φ1,1(h−1)ε1,t+1 +
         + φ1,2(0)ε2,t+h + φ1,2(1)ε2,t+h−1 + ... + φ1,2(h−1)ε2,t+1

and

σ²y1(h) = σ²ε,1[φ²1,1(0) + φ²1,1(1) + ... + φ²1,1(h−1)] + σ²ε,2[φ²1,2(0) + φ²1,2(1) + ... + φ²1,2(h−1)]

• The variance of the forecast error increases as the forecast horizon h increases, because all the coefficients φ²j,k(i) are non-negative (they are squared).

Therefore, the h-step-ahead forecast error variance can be decomposed into

i. the proportion due to the shocks in {ε1,t}:

σ²ε,1[φ²1,1(0) + φ²1,1(1) + ... + φ²1,1(h−1)] / σ²y1(h)

ii. the proportion of forecast error variance due to the shocks in the sequence {ε2,t}:

σ²ε,2[φ²1,2(0) + φ²1,2(1) + ... + φ²1,2(h−1)] / σ²y1(h)

• Variance decompositions determine how much of the h-step-ahead forecast error variance of a given variable is explained by innovations to each explanatory variable, for h = 1, 2, ... .

• They require identification; therefore Choleski decompositions (or other restriction schemes) are typically imposed.
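
A sketch of the forecast error variance decomposition computed from Choleski-orthogonalized responses Φi; since the orthogonalized shocks have unit variance, the shares are just normalized sums of squared response coefficients (parameter values are illustrative):

import numpy as np

def fevd(A1, Sigma_u, H):
    """Share of the h-step forecast error variance of each variable due to each
    orthogonalized (Choleski) shock, h = 1, ..., H.  Returns array (H, N, N):
    entry [h-1, j, k] = share of variable j's error variance explained by shock k."""
    N = len(A1)
    P = np.linalg.cholesky(Sigma_u)
    Phi, Ai = [], np.eye(N)
    for _ in range(H):
        Phi.append(Ai @ P)                       # Phi_i = A1^i P
        Ai = Ai @ A1
    Phi = np.array(Phi)
    shares = np.zeros((H, N, N))
    for h in range(1, H + 1):
        contrib = (Phi[:h] ** 2).sum(axis=0)     # sum_i phi_{j,k}(i)^2; orthogonal shocks have unit variance
        shares[h - 1] = contrib / contrib.sum(axis=1, keepdims=True)
    return shares

A1 = np.array([[0.5, 0.1], [0.2, 0.3]])
Sigma_u = np.array([[1.0, 0.3], [0.3, 0.5]])
print(fevd(A1, Sigma_u, H=8)[[0, 7]])            # decomposition at h = 1 and h = 8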

Innovation accounting: approach where the forecast error variance decomposition and the impulse response function are combined to uncover the dynamic interrelationships among the endogenous variables.

3.3 Granger Causality

Granger causality: Let ℑt be the information set containing all the relevant information available up to and including time t. In addition, let yt(h | ℑt) be the optimal (minimum-MSFE) h-step-ahead prediction of the process {yt} at the forecast origin t, based on ℑt. The vector time series process {xt} is said to Granger-cause {yt} if and only if

MSFE[yt(h | ℑt)] < MSFE[yt(h | ℑt \ {xs | s ≤ t})]

for at least one horizon h, i.e., if removing the history of {xt} from the information set worsens the forecast of {yt}.

Feedback system: represented by the joint process {xt′, yt′}′ when {xt} causes {yt} and {yt} causes {xt}.

• In practice we only consider the information in the past and present values of the processes under examination, rather than the entire ℑt, because the set of all existing relevant information is rarely available.

Granger Causality - Restricted: Let yt(h | {xs, ys | s ≤ t}) be the optimal linear (minimum-MSFE) h-step-ahead prediction of the process {yt} at the forecast origin t, based on the information {xs, ys | s ≤ t}. The process {xt} is said to Granger-cause {yt} if

MSFE(E[yt | xt−1, xt−2, ..., yt−1, yt−2, ...]) < MSFE(E[yt | yt−1, yt−2, ...])

• Difference between Granger causality and exogeneity: for yt to be exogenous it is required that it is not affected by the contemporaneous value of xt, while Granger causality refers to the effects of the past values of {xt} on the current value of yt.

• The lack of causality can be assessed by looking at the representation of


the VAR in its standard form.

• The lack of Granger causality can be verified using a standard F-test of the restriction a1,2(1) = a1,2(2) = ... = a1,2(p) = 0, where, for instance, in the case of N = 2,

[ y1,t ]   [ a1,0 ]   [ a1,1(1)  a1,2(1) ] [ y1,t−1 ]         [ a1,1(p)  a1,2(p) ] [ y1,t−p ]   [ u1,t ]
[ y2,t ] = [ a2,0 ] + [ a2,1(1)  a2,2(1) ] [ y2,t−1 ] + ... + [ a2,1(p)  a2,2(p) ] [ y2,t−p ] + [ u2,t ]

Block-causality tests: verify whether one variable, yn,t, Granger-causes any of the other variables in the system, that is, whether taking into account the lagged values of yn,t helps forecast any of the other variables in the VAR. They consist of likelihood ratio tests

(T − m)(ln|Σ̃u^R| − ln|Σ̃u^U|)

where Σ̃u^R is the covariance matrix of the residuals from a model that has been restricted to set all the coefficients on the lags of the variable yn,t to zero, Σ̃u^U is the residual covariance matrix of the unrestricted model, and m is the number of parameters estimated in each equation of the unrestricted system.
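
A sketch of a likelihood-ratio Granger (non-)causality test in the spirit of the block-causality statistic above, for a bivariate VAR(p): the lags of the candidate causing variable are excluded from the equation of the caused variable and the residual covariance matrices are compared (simulated data; all names and values are illustrative):

import numpy as np
from scipy import stats

def lr_granger_test(Y, p, caused=0, causing=1):
    """LR test that variable `causing` does not Granger-cause variable `caused`
    in a bivariate VAR(p): the lags of `causing` are excluded from the `caused` equation."""
    T, N = Y.shape
    Z = np.array([np.concatenate([[1.0]] + [Y[t - j] for j in range(1, p + 1)])
                  for t in range(p, T)])
    y = Y[p:]
    T_eff, k = len(y), Z.shape[1]

    # unrestricted model: every equation uses all regressors
    resid_u = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]

    # restricted model: drop the columns of Z holding the lags of `causing` in the `caused` equation
    keep = [0] + [1 + (j - 1) * N + i for j in range(1, p + 1) for i in range(N) if i != causing]
    resid_r = resid_u.copy()
    resid_r[:, caused] = y[:, caused] - Z[:, keep] @ np.linalg.lstsq(Z[:, keep], y[:, caused], rcond=None)[0]

    Sig_u = resid_u.T @ resid_u / T_eff
    Sig_r = resid_r.T @ resid_r / T_eff
    lr = (T_eff - k) * (np.linalg.slogdet(Sig_r)[1] - np.linalg.slogdet(Sig_u)[1])
    return lr, 1 - stats.chi2.cdf(lr, df=p)          # p excluded coefficients = p restrictions

rng = np.random.default_rng(5)
Y = np.zeros((500, 2))
for t in range(1, 500):
    Y[t] = np.array([[0.5, 0.4], [0.0, 0.3]]) @ Y[t - 1] + rng.standard_normal(2)
print(lr_granger_test(Y, p=1, caused=0, causing=1))   # y2 helps predict y1: small p-value expected
print(lr_granger_test(Y, p=1, caused=1, causing=0))   # y1 does not cause y2: large p-value expected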
