Cap0_Slides
You need to review the following assumptions (See the slides of Econometria I)
Finite Sample
Assumption (FS1 - Linearity). $y_i = \beta_1 + \beta_2 x_{i2} + \cdots + \beta_K x_{iK} + \varepsilon_i$. The model specifies a linear relationship between $y$ and $x_1, \ldots, x_K$ (the model is linear in the parameters).
Assumption (FS2 - Full column rank). There is no exact linear relationship among the independent variables in the model: $\operatorname{rank}(X) = K$.
Assumption (FS3 - Exogeneity of the independent variables - Strict Exogeneity). $E(\varepsilon_i \mid x_{j1}, \ldots, x_{jK}) = 0,\ \forall i,j \iff E(\varepsilon_i \mid X) = 0,\ \forall i \iff E(\varepsilon \mid X) = 0.$
Assumption (FS4 - Homoskedasticity and Nonautocorrelation). $\operatorname{Var}(\varepsilon_i \mid X) = E(\varepsilon_i^2 \mid X) = \sigma^2 > 0,\ \forall i$; $\operatorname{Cov}(\varepsilon_i, \varepsilon_j \mid X) = 0,\ \forall i,j,\ i \neq j$.
Large Sample ($n \to \infty$)
Assumption (LS1 - Linearity, S&WD). The model is linear, $y_i = x_i'\beta + \varepsilon_i$, and $\{(y_i, x_i)\}$ is jointly stationary and weakly dependent (S&WD).
Assumption (LS2 - Rank Condition). $\frac{X'X}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i x_i' \overset{p}{\to} E(x_i x_i') = Q$ and $Q$ is nonsingular.
Assumption (LS3 - Predetermined Regressors). All the regressors are predetermined in the sense that they are orthogonal to the contemporaneous error term: $E(x_{ik}\varepsilon_i) = 0,\ \forall i,k$.
Assumption (LS4 - $\{x_i\varepsilon_i\}$ is a Martingale Difference with Finite Second Moments). $\{w_i\}$, where $w_i := x_i\varepsilon_i$, is a martingale difference sequence (so, a fortiori, $E(x_i\varepsilon_i) = 0$). The $K \times K$ matrix of cross moments, $E(\varepsilon_i^2 x_i x_i')$, is nonsingular.
Assumption (LS5 - Conditional Homoskedasticity). $\operatorname{Var}(\varepsilon_i \mid x_i) = \sigma^2,\ \forall i$.
0.1 Introduction
$$y = X\beta + \varepsilon$$
$$E(\varepsilon \mid X) = 0$$
$$E(\varepsilon\varepsilon' \mid X) = \sigma^2\Omega, \quad \text{where } \Omega \neq I$$
The hypothesis $E(\varepsilon \mid X) = 0$ is too strong and does not hold in general in time-series analysis. We may consider the weaker hypothesis $E(\varepsilon_t \mid x_t) = 0$.
Spherical Disturbances:
$$E(\varepsilon\varepsilon' \mid X) = \begin{bmatrix} E(\varepsilon_1^2 \mid X) & E(\varepsilon_1\varepsilon_2 \mid X) & \cdots & E(\varepsilon_1\varepsilon_n \mid X) \\ E(\varepsilon_2\varepsilon_1 \mid X) & E(\varepsilon_2^2 \mid X) & \cdots & E(\varepsilon_2\varepsilon_n \mid X) \\ \vdots & \vdots & \ddots & \vdots \\ E(\varepsilon_n\varepsilon_1 \mid X) & E(\varepsilon_n\varepsilon_2 \mid X) & \cdots & E(\varepsilon_n^2 \mid X) \end{bmatrix} = \begin{bmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{bmatrix} = \sigma^2 I$$
Assumptions FS1-FS3 may hold under autocorrelation and/or heteroskedasticity, so the OLS estimator may be unbiased:
$$E(b) = E(b \mid X) = \beta.$$
However, it can be proved [board] that
$$\operatorname{Var}(b \mid X) = \sigma^2 (X'X)^{-1} X'\Omega X (X'X)^{-1}.$$
Additionally, if Assumption FS5 holds (the disturbances are normally distributed), then
$$b \mid X \sim N\left(\beta,\ \sigma^2 (X'X)^{-1} X'\Omega X (X'X)^{-1}\right).$$
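The sandwich formula above can be checked numerically. The sketch below (not from the slides; parameter values $\rho = 0.6$, $\sigma_u = 1$ and the simulated regressor are illustrative assumptions) builds an AR(1) error covariance, evaluates $\operatorname{Var}(b \mid X)$ from the formula, and compares it with a Monte Carlo estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho, sigma_u = 200, 0.6, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # fixed regressors
beta = np.array([1.0, 2.0])

# AR(1) errors: Var(eps | X) = sigma2 * Omega with Omega[i, j] = rho^|i-j|
sigma2 = sigma_u**2 / (1 - rho**2)
idx = np.arange(n)
Omega = rho ** np.abs(np.subtract.outer(idx, idx))

XtX_inv = np.linalg.inv(X.T @ X)
V_theory = sigma2 * XtX_inv @ X.T @ Omega @ X @ XtX_inv  # sandwich formula

# Monte Carlo check: draw eps ~ N(0, sigma2*Omega), compute b = (X'X)^{-1} X'y
C = np.linalg.cholesky(sigma2 * Omega)
draws = np.array([XtX_inv @ X.T @ (X @ beta + C @ rng.normal(size=n))
                  for _ in range(5000)])
print("mean of b :", draws.mean(axis=0).round(3))        # close to beta (unbiased)
print("Var theory:", np.round(np.diag(V_theory), 4))
print("Var MC    :", np.round(np.diag(np.cov(draws.T)), 4))
```

The Monte Carlo variance of $b$ should track the sandwich formula, not $\sigma^2(X'X)^{-1}$.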
Assumptions LS1-LS3 may hold under autocorrelation and/or heteroskedasticity, so the OLS estimator may be consistent:
$$b \overset{p}{\to} \beta.$$
However, usual inference is not valid. To see why, note that
$$\sqrt{n}(b - \beta) \overset{d}{\to} Q^{-1}Z \quad \text{where } Z \overset{d}{\to} N(0, S), \quad \text{or}$$
$$\sqrt{n}(b - \beta) \overset{d}{\to} N\left(0,\ Q^{-1} S Q^{-1}\right).$$
Under Assumptions LS1-LS5 the expression $Q^{-1} S Q^{-1}$ reduces to $\sigma^2 Q^{-1}$. Now this simplification cannot be done because $S$ does not coincide with $\sigma^2 Q$, due to the presence of autocorrelation and/or heteroskedasticity.
Remarks:
The Gauss-Markov Theorem (based on Assumptions FS1-FS4) no longer holds for the
OLS estimator, because FS4 does not hold. The BLUE is some other estimator.
However, the OLS estimator b is unbiased and can still be used even if FS4 does not
hold.
Because the variance of the least squares estimator is not $\sigma^2 (X'X)^{-1}$, statistical inference based on $\hat\sigma^2 (X'X)^{-1}$ is incorrect. The usual t-ratio is not distributed as the t distribution. The same comment applies to the F-test.
If $\Omega$ is known, we may develop the theory under Assumptions FS1-FS3 and FS5. Otherwise we need Assumptions LS1-LS4 to estimate the asymptotic variance through a consistent estimator.
0.3 Heteroskedasticity
(See the slides of Econometria I.)
0.4 Autocorrelation
Because the issue of serial correlation arises almost always in time-series models, we use the
subscript "t" instead of "i" in this section.
$$y_t = \beta_1 + \beta_2 x_{t2} + \varepsilon_t,$$
$$\varepsilon_t = \rho\varepsilon_{t-1} + u_t, \quad |\rho| < 1$$
where $\{u_t\}$ is a sequence of i.i.d. r.v. with $E(u_t) = 0$, $\operatorname{Var}(u_t) = \sigma_u^2$ and $E(\varepsilon_{t-k} u_t) = 0$, $\forall k \in \mathbb{N}$. We say that $\{\varepsilon_t\}$ follows an AR(1) [autoregressive process of order 1]. We have:
$$\operatorname{Var}(\varepsilon_t \mid X) = \cdots = \frac{\sigma_u^2}{1-\rho^2}; \qquad E(\varepsilon_t\varepsilon_{t-j} \mid X) = \cdots = \frac{\sigma_u^2}{1-\rho^2}\rho^{j}, \quad j \geq 0$$
$$E(\varepsilon\varepsilon' \mid X) = E(\varepsilon\varepsilon') = \frac{\sigma_u^2}{1-\rho^2}\begin{bmatrix} 1 & \rho & \cdots & \rho^{n-1} \\ \rho & 1 & \cdots & \rho^{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{n-1} & \rho^{n-2} & \cdots & 1 \end{bmatrix}.$$
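These moments can be checked by simulation. A minimal sketch (the values $\rho = 0.7$, $\sigma_u = 1$ and normal innovations are illustrative assumptions): simulate a long AR(1) path and compare sample autocovariances with $\sigma_u^2\rho^j/(1-\rho^2)$.

```python
import numpy as np

rng = np.random.default_rng(42)
rho, sigma_u, n = 0.7, 1.0, 200_000

# simulate eps_t = rho*eps_{t-1} + u_t with a stationary start
u = rng.normal(scale=sigma_u, size=n)
eps = np.empty(n)
eps[0] = u[0] / np.sqrt(1 - rho**2)
for t in range(1, n):
    eps[t] = rho * eps[t-1] + u[t]

gamma0 = sigma_u**2 / (1 - rho**2)         # theoretical Var(eps_t)
for j in range(4):
    sample = np.mean(eps[j:] * eps[:n-j])  # sample autocovariance at lag j
    print(f"j={j}: sample {sample:.3f} vs theory {gamma0 * rho**j:.3f}")
```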
Prolonged influence of shocks. In time series data, random shocks (disturbances) have effects that often persist over more than one time period. An earthquake, flood, strike, pandemic, or war, for example, will probably affect the economy's operation in periods following the period in which it occurs.
Inertia. Owing to inertia or psychological conditioning, past actions often have a strong effect on current actions, so that a positive disturbance in one period is likely to influence activity in succeeding periods.
$$E(\varepsilon_t \mid X) = 0, \quad t = 1, 2, \ldots, n$$
it can be shown that the hypothesis $H_0: \rho_1 = \rho_2 = \cdots = \rho_p = 0$ can be tested through the following auxiliary regression:
regress $e_t$ on $e_{t-1}, \ldots, e_{t-p}$.
Under the null,
$$LM = nR^2 \overset{d}{\to} \chi^2(p).$$
If the regressors are not strictly exogenous, for example, if there is a lagged endogenous variable ($y_{t-1}$, or $y_{t-2}$, etc.) as an explanatory variable, the test presented in the previous slide is not valid. The reason is somewhat technical and is explained in Hayashi's book, pp. 144-146. The solution is to regress
$e_t$ on $x_t, e_{t-1}, \ldots, e_{t-p}$
and then calculate the LM statistic for the hypothesis that the $p$ coefficients of $e_{t-1}, \ldots, e_{t-p}$ are all zero. This regression is still valid when the regressors are strictly exogenous (so you may always use this regression).
Given
$$e_t = \gamma_1 + \gamma_2 x_{t2} + \cdots + \gamma_K x_{tK} + \rho_1 e_{t-1} + \cdots + \rho_p e_{t-p} + \text{error}_t,$$
the null hypothesis can be formulated as
$$H_0: \rho_1 = \rho_2 = \cdots = \rho_p = 0.$$
Under the null,
$$LM = nR^2 \overset{d}{\to} \chi^2(p).$$
Select $p$:
– For quarterly data, use e.g. $p = 4$; for monthly data, use e.g. $p = 12$.
– Test that all coefficients are zero using the $\chi^2(p)$ test.
– Test that all coefficients associated with $e_{t-j}$ are zero using the $\chi^2(p)$ test.
Example 0.4.3. Consider chnimp: the volume of imports of barium chloride from China; chempi: an index of chemical production in the USA (to control for overall demand for barium chloride); gas: the volume of gasoline production (another demand variable); rtwex: an exchange rate index (measures the strength of the dollar against several other currencies).
File: 00_auto_resid_graph.py
If you conclude that the errors are serially correlated you have a few options:
(a) You don't know the form of autocorrelation, so you rely on the OLS estimator but use the (HAC) covariance matrix estimator $\widehat{\operatorname{AVar}}(b) = \frac{1}{n}\hat{Q}^{-1}\hat{S}_{HAC}\hat{Q}^{-1}$ [sections 0.4.2 - 0.4.4].
(b) You know (at least approximately) the form of autocorrelation, and so you use a feasible GLS estimator [section 0.4.6].
(c) You are concerned only with the dynamic specification of the model and with forecasting. You may try to convert your model into a dynamically complete model [section 0.4.7].
(d) Your model may be misspecified: you respecify the model and the autocorrelation disappears [section 0.4.8].
Previous results in section 0.2 apply. However, by assuming just autocorrelation we can be more precise about the conditions that lead to consistency and the asymptotic distribution.
When the regressors include a constant (true in virtually all known applications), Assumption LS4 implies that the error term is a scalar martingale difference sequence, so if the error is found to be serially correlated (or autocorrelated), that is an indication of a failure of Assumption LS4.
Assumptions LS1-LS3 may hold under serial correlation, so the OLS estimator may be consistent even if the error is autocorrelated. However, usual inference is not valid. To see why, consider
$$\sqrt{n}(b - \beta) = \left(\frac{X'X}{n}\right)^{-1}\frac{1}{\sqrt{n}}X'\varepsilon = \left(\frac{1}{n}\sum_{t=1}^{n} x_t x_t'\right)^{-1}\left(\frac{1}{\sqrt{n}}\sum_{t=1}^{n} x_t\varepsilon_t\right).$$
We have
$$\operatorname{AVar}\left(\sqrt{n}(b - \beta)\right) = Q^{-1} S Q^{-1}$$
where
$$Q := E(x_i x_i'), \qquad S := \operatorname{AVar}\left(\frac{1}{\sqrt{n}}X'\varepsilon\right) = \lim_{n\to\infty}\operatorname{Var}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n} x_i\varepsilon_i\right)$$
(see section 0.2).
In the absence of autocorrelation (and under conditional homoskedasticity),
$$S = \operatorname{Var}(x_i\varepsilon_i) = \sigma^2 E(x_i x_i'),$$
which can be estimated by
$$\hat{S} = \frac{s^2}{n}\sum_{t=1}^{n} x_t x_t'.$$
Given
$$S = E(\varepsilon_t^2 x_t x_t') + \lim_{n\to\infty}\frac{1}{n}\sum_{j=1}^{n-1}\sum_{t=j+1}^{n}\left[E(\varepsilon_t\varepsilon_{t-j} x_t x_{t-j}') + E(\varepsilon_{t-j}\varepsilon_t x_{t-j} x_t')\right],$$
a possible estimator of $S$ based on the analogy principle would be
$$\frac{1}{n}\sum_{t=1}^{n} e_t^2 x_t x_t' + \frac{1}{n}\sum_{j=1}^{n_0}\sum_{t=j+1}^{n}\left(e_t e_{t-j} x_t x_{t-j}' + e_{t-j} e_t x_{t-j} x_t'\right), \quad n_0 < n.$$
A major problem with this estimator is that it is not positive semi-definite and hence cannot be a well-defined variance-covariance matrix.
Newey and West show that with a suitable weighting function $\omega(j)$, the estimator below is consistent and positive semi-definite:
$$\hat{S}_{HAC} = \frac{1}{n}\sum_{t=1}^{n} e_t^2 x_t x_t' + \frac{1}{n}\sum_{j=1}^{L}\sum_{t=j+1}^{n}\omega(j)\left(e_t e_{t-j} x_t x_{t-j}' + e_{t-j} e_t x_{t-j} x_t'\right)$$
where the weighting function $\omega(j)$ is
$$\omega(j) = 1 - \frac{j}{L+1}.$$
The maximum lag $L$ must be determined in advance. Autocorrelations at lags longer than $L$ are ignored. For a moving-average process, this value is in general a small number.
This estimator is known as the (HAC) covariance matrix estimator and is valid when both conditional heteroskedasticity and serial correlation are present but of an unknown form.
The term HAC estimator also applies to $\widehat{\operatorname{AVar}}\left(\sqrt{n}(b - \beta)\right) = \hat{Q}^{-1}\hat{S}_{HAC}\hat{Q}^{-1}$.
For example, with $L = 3$:
$$\omega(1) = 1 - \frac{1}{4} = 0.75, \qquad \omega(2) = 1 - \frac{2}{4} = 0.50, \qquad \omega(3) = 1 - \frac{3}{4} = 0.25.$$
[Figure: the maximum lag $L$ plotted as a function of the sample size $n$.]
Consistent estimator for $\operatorname{AVar}\left(\sqrt{n}(b - \beta)\right) = Q^{-1} S Q^{-1}$:
$$\widehat{\operatorname{AVar}}\left(\sqrt{n}(b - \beta)\right) = \hat{Q}^{-1}\hat{S}_{HAC}\hat{Q}^{-1}.$$
Suppose LS1-LS3 hold and $\widehat{\operatorname{AVar}}\left(\sqrt{n}(b - \beta)\right) \overset{p}{\to} \operatorname{AVar}\left(\sqrt{n}(b - \beta)\right)$. We have
$$b \overset{a}{\sim} N\left(\beta,\ \widehat{\operatorname{AVar}}(b)\right).$$
Under $H_0: \beta_k = \beta_k^0$ we have
$$t_k = \frac{b_k - \beta_k^0}{\hat\sigma_{b_k}} \overset{d}{\to} N(0,1), \quad \text{where } \hat\sigma^2_{b_k} = \left[\widehat{\operatorname{AVar}}(b)\right]_{kk}.$$
There are many forms of autocorrelation, and each one leads to a different structure for the error covariance matrix $\Omega$. The most popular form is known as the first-order autoregressive process. In this case the error term in
$$y_t = x_t'\beta + \varepsilon_t$$
is assumed to follow the AR(1) model $\varepsilon_t = \rho\varepsilon_{t-1} + u_t$.
We assume in this section that $\Omega$ is known. For this reason we may develop the relevant theory under Assumptions FS1-FS3.
Derivation of GLS
As in the heteroskedastic case, to obtain the GLS estimator we need to find a full rank $n \times n$ matrix $P$ such that
$$Py = PX\beta + P\varepsilon$$
$$y^* = X^*\beta + \varepsilon^*$$
and
$$E(\varepsilon^*\varepsilon^{*\prime} \mid X) = E(P\varepsilon\varepsilon'P' \mid X) = P\,E(\varepsilon\varepsilon' \mid X)\,P' = \sigma^2 P\Omega P' = \sigma^2 I.$$
Thus, $P$ is such that
$$P\Omega P' = I \iff \Omega^{-1} = P'P.$$
We use this matrix $P$ to obtain $y^* = Py$ and $X^* = PX$.
The GLS estimator is the OLS estimator applied to the transformed model $y^* = X^*\beta + \varepsilon^*$, i.e.
$$\hat\beta_{GLS} = (X^{*\prime}X^*)^{-1}X^{*\prime}y^*.$$
It follows that
$$\operatorname{Var}(\hat\beta_{GLS} \mid X) = \sigma^2 (X^{*\prime}X^*)^{-1}.$$
In the case where the error term $\varepsilon_t$ follows an AR(1), the matrix $\Omega$ is given in example 0.4.1. It can be proved (this is not straightforward) that
$$\Omega^{-1} = \begin{bmatrix} 1 & -\rho & 0 & \cdots & 0 & 0 \\ -\rho & 1+\rho^2 & -\rho & \cdots & 0 & 0 \\ 0 & -\rho & 1+\rho^2 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1+\rho^2 & -\rho \\ 0 & 0 & 0 & \cdots & -\rho & 1 \end{bmatrix}.$$
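This inverse is easy to verify numerically for a small $n$. The sketch below assumes, consistently with the covariance matrix displayed earlier, that $\sigma^2\Omega = E(\varepsilon\varepsilon')$ with $\sigma^2 = \sigma_u^2$, so $\Omega$ has entries $\rho^{|i-j|}/(1-\rho^2)$:

```python
import numpy as np

rho, n = 0.6, 6
idx = np.arange(n)
# Omega such that sigma_u^2 * Omega = E(eps eps') for AR(1) errors
Omega = rho ** np.abs(np.subtract.outer(idx, idx)) / (1 - rho**2)

# claimed inverse: tridiagonal with diagonal (1, 1+rho^2, ..., 1+rho^2, 1)
# and off-diagonal entries -rho
Oinv = np.diag(np.r_[1.0, np.full(n - 2, 1 + rho**2), 1.0])
Oinv += np.diag(np.full(n - 1, -rho), k=1) + np.diag(np.full(n - 1, -rho), k=-1)

print(np.allclose(Omega @ Oinv, np.eye(n)))   # True
```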
$$\tilde{y}_t = \tilde{x}_t'\beta + u_t$$
where
$$\tilde{y}_t = \begin{cases} \sqrt{1-\rho^2}\,y_1 & t = 1 \\ y_t - \rho y_{t-1} & t > 1 \end{cases}, \qquad \tilde{x}_t' = \begin{cases} \sqrt{1-\rho^2}\,x_1' & t = 1 \\ (x_t - \rho x_{t-1})' & t > 1 \end{cases}$$
Without the first observation, the transformed model is
$$y_t - \rho y_{t-1} = (x_t - \rho x_{t-1})'\beta + u_t, \quad t > 1.$$
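The transformation is straightforward to apply. A sketch with simulated data, treating $\rho$ as known (in practice $\rho$ must be estimated, which gives feasible GLS); the parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
n, rho = 200, 0.7
beta = np.array([1.0, 2.0])
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
eps = np.empty(n)
eps[0] = rng.normal() / np.sqrt(1 - rho**2)      # stationary start
for t in range(1, n):
    eps[t] = rho * eps[t-1] + rng.normal()
y = X @ beta + eps

# Prais-Winsten transformation (keeps the first observation)
c = np.sqrt(1 - rho**2)
y_t = np.concatenate(([c * y[0]], y[1:] - rho * y[:-1]))
X_t = np.vstack((c * X[0], X[1:] - rho * X[:-1]))

b_gls = np.linalg.lstsq(X_t, y_t, rcond=None)[0]  # OLS on transformed data
print("GLS estimate:", np.round(b_gls, 3))
```

Dropping the first row of `y_t` and `X_t` instead gives the Cochrane-Orcutt version of the estimator.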
Here is another way to obtain the transformed model assuming that $\varepsilon_t$ follows an AR(1), $\varepsilon_t = \rho\varepsilon_{t-1} + u_t$:
$$y_t = x_t'\beta + \varepsilon_t = x_t'\beta + \rho\varepsilon_{t-1} + u_t = x_t'\beta + \rho\left(y_{t-1} - x_{t-1}'\beta\right) + u_t$$
$$\underbrace{y_t - \rho y_{t-1}}_{\tilde{y}_t} = \underbrace{(x_t - \rho x_{t-1})'}_{\tilde{x}_t'}\beta + \underbrace{u_t}_{\text{white noise}}$$
The GLS estimator is the OLS estimator applied to the transformed model. So the GLS estimator can also be expressed as
$$\hat\beta_{GLS} = \left(\sum_{t=1}^{n}\tilde{x}_t\tilde{x}_t'\right)^{-1}\sum_{t=1}^{n}\tilde{x}_t\tilde{y}_t,$$
which is the same as $\hat\beta_{GLS} = \left(X'\Omega^{-1}X\right)^{-1}X'\Omega^{-1}y$.
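The equivalence of the two expressions can be checked numerically (a sketch: $\Omega$ is built for AR(1) errors as above, and $P$ is the Prais-Winsten transformation, for which $P'P = \Omega^{-1}$):

```python
import numpy as np

rng = np.random.default_rng(5)
n, rho = 50, 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, -0.5]) + rng.normal(size=n)    # any data will do here

# Omega for AR(1) errors (sigma^2 Omega = E(eps eps') with sigma^2 = sigma_u^2)
idx = np.arange(n)
Omega = rho ** np.abs(np.subtract.outer(idx, idx)) / (1 - rho**2)
Oinv = np.linalg.inv(Omega)
b1 = np.linalg.solve(X.T @ Oinv @ X, X.T @ Oinv @ y)  # (X'O^-1 X)^-1 X'O^-1 y

# OLS on the Prais-Winsten transformed data
c = np.sqrt(1 - rho**2)
y_t = np.concatenate(([c * y[0]], y[1:] - rho * y[:-1]))
X_t = np.vstack((c * X[0], X[1:] - rho * X[:-1]))
b2 = np.linalg.lstsq(X_t, y_t, rcond=None)[0]

print(np.allclose(b1, b2))   # True: the two GLS expressions coincide
```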
$$y_t = \beta_1 + \beta_2 x_{t2} + \varepsilon_t$$
$$\varepsilon_t = \rho\varepsilon_{t-1} + u_t$$
We have
$$y_t = \beta_1 + \beta_2 x_{t2} + \rho\varepsilon_{t-1} + u_t = \beta_1 + \beta_2 x_{t2} + \rho\left(y_{t-1} - \beta_1 - \beta_2 x_{t-1,2}\right) + u_t$$
$$\underbrace{y_t - \rho y_{t-1}}_{\tilde{y}_t} = \beta_1\underbrace{(1-\rho)}_{\tilde{x}_{t1}} + \beta_2\underbrace{\left(x_{t2} - \rho x_{t-1,2}\right)}_{\tilde{x}_{t2}} + u_t,$$
that is, the transformed model is
$$y_t - \rho y_{t-1} = \underbrace{\begin{bmatrix} 1-\rho & x_{t2} - \rho x_{t-1,2} \end{bmatrix}}_{\tilde{x}_t' = (x_t - \rho x_{t-1})'}\begin{bmatrix}\beta_1 \\ \beta_2\end{bmatrix} + u_t$$
or
$$\tilde{y}_t = \beta_1\tilde{x}_{t1} + \beta_2\tilde{x}_{t2} + u_t.$$
Example 0.4.8 (continuation of the previous example). Estimation using the Maximum Likelihood method / FGLS (see 00_auto_ar1.py). We assume that $\varepsilon_t = \rho\varepsilon_{t-1} + u_t$ (although other models may be more suitable in view of example 0.4.4).
=================================================================================
coef std err z P>|z| [0.025 0.975]
---------------------------------------------------------------------------------
const -38.9788 23.380 -1.667 0.095 -84.802 6.845
lchempi 2.8735 0.635 4.529 0.000 1.630 4.117
lgas 1.2002 1.004 1.195 0.232 -0.768 3.168
lrtwex 0.8469 0.453 1.871 0.061 -0.040 1.734
ar.L1.lchnimp 0.3066 0.086 3.555 0.000 0.138 0.476
Roots
=============================================================================
Real Imaginary Modulus Frequency
-----------------------------------------------------------------------------
AR.1 3.2619 +0.0000j 3.2619 0.0000
-----------------------------------------------------------------------------
Note: $\hat\rho = 0.3066$.
If we are concerned only with the dynamic specification of the model and with forecasting, we may try to convert a model with autocorrelation into a Dynamically Complete (DC) Model. Consider
$$y_t = \tilde{x}_t'\beta + u_t$$
such that $E(u_t \mid \tilde{x}_t) = 0$. This condition, although it guarantees consistency of $b$ (if other conditions are also met), does not preclude autocorrelation. You may try to increase the number of regressors to $x_t$ and get a new regression model.
If a model is DC then, once $x_t$ has been controlled for, no lags of either $y$ or $x$ help to explain current $y_t$. This is a strong requirement and is implausible when the lagged dependent variable has predictive power, which is often the case.
Theorem 0.4.1. If a model is DC then the errors are not correlated. Moreover, $\{x_t\varepsilon_t\}$ is a MDS.
$$E(y_t \mid \tilde{x}_t) = E(y_t \mid x_{t2}) = \beta_1 + \beta_2 x_{t2}.$$
Since
$$u_t = y_t - (\beta_1 + \beta_2 x_{t2}) \implies u_{t-1} = y_{t-1} - (\beta_1 + \beta_2 x_{t-1,2}),$$
we have
$$y_t = \beta_1 + \beta_2 x_{t2} + u_t = \beta_1 + \beta_2 x_{t2} + \rho u_{t-1} + \varepsilon_t = \beta_1 + \beta_2 x_{t2} + \rho\left(y_{t-1} - \beta_1 - \beta_2 x_{t-1,2}\right) + \varepsilon_t.$$
Autocorrelation test - Breusch-Godfrey test (Question: Explain how the test was imple-
mented).
p-value : 0.6389
0.4.8 Misspecification
In many cases the finding of autocorrelation is an indication that the model is misspecified. If this is the case, the most natural route is not to change your estimator (from OLS to GLS) but to change your model. Several types of misspecification may lead to a finding of autocorrelation in your OLS residuals:
dynamic misspecification;
misspecified functional form. Suppose, for example, that the true model is
$$y_t = \beta_1 + \beta_2 \log t + \varepsilon_t.$$
In the following figure we estimate a misspecified functional form, $y_t = \beta_1 + \beta_2 t + \varepsilon_t$: the residuals are clearly autocorrelated.