CLM - Assumptions
• Typical Assumptions
(A1) DGP: y = Xβ + ε is correctly specified.
(A2) E[ε|X] = 0
(A3) Var[ε|X] = σ²IT
(A4) X has full column rank: rank(X) = k, where T ≥ k.
The General Linear Hypothesis: H0: Rβ – q = 0

Example: Testing H0: Rβ – q = 0
• In the linear model
y = Xβ + ε = β1 + X2β2 + X3β3 + X4β4 + ε
• We want to test if the slopes on X3, X4 are equal to zero. That is,
H0: β3 = β4 = 0
H1: β3 ≠ 0 or β4 ≠ 0 or both β3 and β4 ≠ 0

(2) We know that imposing the restrictions leads to a loss of fit: R² must go down. Does it go down a lot? That is, significantly?
Recall (i) e* = y – Xb* = e – X(b* – b)
(ii) b* = b – (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(Rb – q)
=> e*′e* – e′e = (Rb – q)′[R(X′X)⁻¹R′]⁻¹(Rb – q)
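A minimal numerical sketch of this test on simulated data (the DGP and all variable names are illustrative, not from the lecture): it computes the loss of fit e*′e* – e′e via the quadratic form above and the resulting F-statistic for H0: β3 = β4 = 0.

```python
import numpy as np

# Minimal sketch: F-test of H0: beta3 = beta4 = 0 (simulated data).
rng = np.random.default_rng(0)
T, k = 200, 4
X = np.column_stack([np.ones(T), rng.normal(size=(T, 3))])  # [1, X2, X3, X4]
beta = np.array([1.0, 0.5, 0.0, 0.0])        # true slopes: beta3 = beta4 = 0
y = X @ beta + rng.normal(size=T)

b = np.linalg.solve(X.T @ X, X.T @ y)        # unrestricted OLS: b = (X'X)^-1 X'y
e = y - X @ b

# H0: R beta - q = 0, with R selecting beta3, beta4 and q = 0
R = np.array([[0, 0, 1, 0],
              [0, 0, 0, 1.0]])
q = np.zeros(2)

XtX_inv = np.linalg.inv(X.T @ X)
d = R @ b - q
loss_of_fit = d @ np.linalg.solve(R @ XtX_inv @ R.T, d)    # = e*'e* - e'e

J = R.shape[0]                                # number of restrictions
s2 = (e @ e) / (T - k)
F = (loss_of_fit / J) / s2                    # F ~ F(J, T-k) under H0
print(F)
```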
Note: A large σ², a small n, and large deviations from the sample means all decrease the precision of the forecast (i.e., they increase the forecast error variance).

• Interpretation: Forecast variance is smallest in the middle of our "experience" and increases as we move outside it.

• How do we estimate this? Two cases:
(1) If x0 is a vector of constants => Form the C.I. as usual.
(2) If x0 has to be estimated => Complicated (what is the variance of the product?). Use bootstrapping.
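For case (1), a minimal sketch on simulated data (the DGP and names are illustrative), using the textbook forecast-error variance σ²(1 + x0′(X′X)⁻¹x0) to form the C.I.:

```python
import numpy as np
from scipy import stats

# Minimal sketch of case (1): x0 is a vector of constants (simulated data).
rng = np.random.default_rng(1)
T, k = 100, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=T)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s2 = (e @ e) / (T - k)

x0 = np.array([1.0, 0.5, -0.2])               # known constants
y0_hat = x0 @ b
# Forecast error variance: s^2 * (1 + x0'(X'X)^-1 x0); it grows as x0
# moves away from the sample means, matching the interpretation above.
var_f = s2 * (1.0 + x0 @ np.linalg.solve(X.T @ X, x0))
t_crit = stats.t.ppf(0.975, df=T - k)
ci = (y0_hat - t_crit * np.sqrt(var_f), y0_hat + t_crit * np.sqrt(var_f))
print(y0_hat, ci)
```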
• Evaluation of a model's predictive accuracy for a group of (in-sample and out-of-sample) observations:

Mean Absolute Error (MAE) = (1/m) Σi=T+1,…,T+m |ŷi – yi| = (1/m) Σi=T+1,…,T+m |ei|

Mean Squared Error (MSE) = (1/m) Σi=T+1,…,T+m (ŷi – yi)² = (1/m) Σi=T+1,…,T+m ei²

Root Mean Squared Error (RMSE) = √[(1/m) Σi=T+1,…,T+m ei²]

Theil's U-stat: U = √[(1/m) Σi=T+1,…,T+m ei²] / √[(1/T) Σi=1,…,T yi²]
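A minimal sketch of these four measures (argument names are illustrative):

```python
import numpy as np

# Minimal sketch of the accuracy measures over an evaluation window of m
# observations; y_history holds the T observations of the estimation sample.
def forecast_accuracy(y_true, y_hat, y_history):
    e = np.asarray(y_true) - np.asarray(y_hat)   # forecast errors e_i
    mae = np.mean(np.abs(e))                     # MAE  = (1/m) sum |e_i|
    mse = np.mean(e**2)                          # MSE  = (1/m) sum e_i^2
    rmse = np.sqrt(mse)                          # RMSE = sqrt(MSE)
    # Theil's U as reconstructed above: RMSE scaled by the root mean square
    # of the observed series over the estimation sample.
    u = rmse / np.sqrt(np.mean(np.asarray(y_history)**2))
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "U": u}
```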
Then, as n grows, g(xn) ≈ g(θ) + g′(θ)(xn – θ)
=> n½[g(xn) – g(θ)] ≈ g′(θ)[n½(xn – θ)]
=> n½[g(xn) – g(θ)]/σ ≈ g′(θ)[n½(xn – θ)/σ]

The asymptotic distribution of n½[g(xn) – g(θ)]/σ is given by that of g′(θ)[n½(xn – θ)/σ], where n½(xn – θ)/σ is a standard normal.

Q: g(xn) = δ/xn →a ? (δ is a constant)
First, calculate the first two moments of g(xn):
g(xn) = δ/xn => plim g(xn) = δ/θ
g′(xn) = –δ/xn² => plim g′(xn) = –δ/θ²
Recall the delta method formula: g(xn) →a N(g(θ), [g′(θ)]²σ²/n).
Then, g(xn) = δ/xn →a N(δ/θ, (δ/θ²)²σ²/n).
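A small simulation check of this result (all parameter values are illustrative): the variance of g(xn) = δ/xn across replications should match the delta-method variance [g′(θ)]²σ²/n.

```python
import numpy as np

# Minimal delta-method check: x_n is the sample mean of iid draws with mean
# theta and variance sigma^2, so g(x_n) ->a N(delta/theta, (delta/theta^2)^2 sigma^2/n).
rng = np.random.default_rng(42)
delta, theta, sigma, n, reps = 2.0, 5.0, 1.5, 500, 20_000

x_bar = rng.normal(theta, sigma, size=(reps, n)).mean(axis=1)
g = delta / x_bar

print(g.var())                                   # simulated variance of g(x_n)
print((delta / theta**2)**2 * sigma**2 / n)      # delta-method approximation
```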
• Now, we challenge the assumption that {xi, εi} is a sequence of independent observations.

• Now, we assume plim(X′ε/T) ≠ 0, while plim(X′X/T) = Q.
IV Estimators

• Properties of bIV:
(1) Consistency:
bIV = (Z′X)⁻¹Z′y = (Z′X)⁻¹Z′(Xβ + ε)
= (Z′X/T)⁻¹(Z′X/T)β + (Z′X/T)⁻¹(Z′ε/T)
= β + (Z′X/T)⁻¹(Z′ε/T) →p β (under assumptions)

(2) Asymptotic normality:
√T(bIV – β) = √T(Z′X)⁻¹Z′ε = (Z′X/T)⁻¹ √T(Z′ε/T)
Using the Lindeberg-Feller CLT: √T(Z′ε/T) →d N(0, σ²Qzz)
Then, √T(bIV – β) →d N(0, σ²Qzx⁻¹QzzQxz⁻¹)
Est. Asy. Var[bIV] = E[(Z′X)⁻¹Z′εε′Z(X′Z)⁻¹] = σ̂²(Z′X)⁻¹Z′Z(X′Z)⁻¹

• Properties of σ̂², under IV estimation:
- We define σ̂²:
σ̂² = (1/T) Σi=1,…,T eIV,i² = (1/T) Σi=1,…,T (yi – xi′bIV)²
where eIV = y – XbIV = y – X(Z′X)⁻¹Z′y = [I – X(Z′X)⁻¹Z′]y = Mzx y
- Then, since Mzx X = 0, eIV = Mzx ε, and
σ̂² = eIV′eIV/T = ε′Mzx′Mzxε/T
= ε′ε/T – 2ε′X(Z′X)⁻¹Z′ε/T + ε′Z(X′Z)⁻¹X′X(Z′X)⁻¹Z′ε/T
=> plim σ̂² = plim(ε′ε/T) – 2 plim[(ε′X/T)(Z′X/T)⁻¹(Z′ε/T)] + plim[ε′Z(X′Z)⁻¹X′X(Z′X)⁻¹Z′ε/T] = σ²
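A minimal simulation sketch of the consistency result, with one endogenous regressor and one valid instrument (the DGP and names are illustrative):

```python
import numpy as np

# Minimal sketch of b_IV = (Z'X)^-1 Z'y versus OLS under endogeneity.
rng = np.random.default_rng(7)
T = 100_000
z = rng.normal(size=T)                 # instrument: correlated with x, not with eps
u = rng.normal(size=T)                 # common shock creating endogeneity
eps = u + rng.normal(size=T)           # error, correlated with x through u
x = 0.8 * z + u + rng.normal(size=T)   # endogenous regressor
y = 2.0 * x + eps                      # true beta = 2

X = np.column_stack([np.ones(T), x])
Z = np.column_stack([np.ones(T), z])

b_ols = np.linalg.solve(X.T @ X, X.T @ y)   # inconsistent: plim(X'eps/T) != 0
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)    # consistent: plim(Z'eps/T) = 0
print(b_ols, b_iv)                          # OLS slope biased upward, IV near 2
```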
IV Estimators: 2SLS (2-Stage Least Squares)

• Case 2: l > k, i.e., the number of instruments > the number of regressors.
- This is the usual case. We could throw away l – k instruments, but throwing away information is never optimal.
- The IV normal equations are a system of l equations in k unknowns:
Z′y = Z′Xβ + Z′ε
Note: We cannot set all l elements of Z′ε to 0 simultaneously. There will be at least l – k non-zero residuals. (Similar setup to a regression!)
- From the IV normal equations => W′Z′X bIV = W′Z′y
- We define a different IV estimator:
Let ZW = Z(Z′Z)⁻¹Z′X = PZX = X̂
- Then, X′PZX bIV = X′PZy
bIV = (X′PZX)⁻¹X′PZy = (X′PZPZX)⁻¹X′PZPZy = (X̂′X̂)⁻¹X̂′ŷ

• We can easily derive properties for bIV:
bIV = (X′PZX)⁻¹X′PZy = (X′PZPZX)⁻¹X′PZPZy = (X̂′X̂)⁻¹X̂′y = (X̂′X̂)⁻¹X̂′ŷ
(1) bIV is consistent.
(2) bIV is asymptotically normal.
- This estimator is also called GIVE (Generalized IV Estimator).

• Interpretations of bIV:
bIV = b2SLS = (X̂′X̂)⁻¹X̂′y – this is the 2SLS interpretation.
bIV = (X̂′X)⁻¹X̂′y – this is the usual IV, with Z = X̂.

• What are the finite sample properties of bIV? Since we do not have the condition E[ε|X] = 0, we cannot conclude that bIV is unbiased, or that Var[b2SLS] equals its asymptotic covariance matrix.
=> In fact, b2SLS can have very bad small-sample properties.
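A minimal 2SLS sketch with l > k (simulated data; the DGP is illustrative), computing X̂ = PZX in a first stage and then b2SLS = (X̂′X)⁻¹X̂′y:

```python
import numpy as np

# Minimal 2SLS sketch: two instruments, one endogenous regressor (l = 3 > k = 2).
rng = np.random.default_rng(11)
T = 50_000
z1, z2 = rng.normal(size=T), rng.normal(size=T)
u = rng.normal(size=T)
eps = u + rng.normal(size=T)
x = 0.5 * z1 + 0.3 * z2 + u + rng.normal(size=T)
y = 2.0 * x + eps                               # true beta = 2

X = np.column_stack([np.ones(T), x])
Z = np.column_stack([np.ones(T), z1, z2])

# First stage: X_hat = P_Z X = Z (Z'Z)^-1 Z'X
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
# Second stage: b_2SLS = (X_hat'X_hat)^-1 X_hat'y = (X_hat'X)^-1 X_hat'y
# (the two coincide because P_Z is idempotent).
b_2sls = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
print(b_2sls)                                   # slope near 2
```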
Then, we do a first-stage regression to obtain fitted values of X:
x = ZΠ + Uδ + V, V ~ N(0, σV²I)
Then, using the fitted values, we estimate and do tests on β.

• Finding a Z that meets both requirements is not easy.
- The valid condition is not that complicated to meet.
- The relevant condition is more complicated: finding a Z correlated with X. The explanatory power of Z may not be enough to allow inference on β. In this case, we say Z is a weak instrument.

• In the linear model in Yogo (2004):
X (endogenous variable): consumption growth.
Z (the IVs): twice-lagged nominal interest rates, inflation, consumption growth, and the log dividend-price ratio.
• But log consumption is close to a random walk, so consumption growth is difficult to predict. This leads to the IVs being weak.
=> Yogo (2004) finds F-statistics for H0: Π = 0 in the 1st-stage regression that lie between 0.17 and 3.53 for different countries.
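A minimal sketch of the first-stage regression and the F-statistic for H0: Π = 0, a common diagnostic for weak instruments (simulated data; the small Π values are chosen to mimic the weak-IV setting discussed above):

```python
import numpy as np

# Minimal first-stage F-statistic sketch for H0: Pi = 0.
rng = np.random.default_rng(3)
T = 500
Z = rng.normal(size=(T, 4))                     # 4 instruments
pi = np.array([0.05, 0.03, -0.02, 0.04])        # weak first-stage coefficients
x = Z @ pi + rng.normal(size=T)                 # endogenous regressor

Zc = np.column_stack([np.ones(T), Z])
g = np.linalg.solve(Zc.T @ Zc, Zc.T @ x)        # first-stage OLS
e1 = x - Zc @ g                                 # unrestricted residuals
e0 = x - x.mean()                               # restricted (Pi = 0) residuals
J, dof = Z.shape[1], T - Zc.shape[1]
F = ((e0 @ e0 - e1 @ e1) / J) / ((e1 @ e1) / dof)
print(F)    # single-digit F-statistics signal weak instruments
```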
M-Estimation

• The objective function is a sample average or a sum. For example, we want to minimize a population (first) moment:
minb E[q(z, β)]

• If s(z,b) = ∂q(z,b)/∂b′ exists (almost everywhere), we solve
Σi s(zi, bM)/T = 0   (*)
– To check the s.o.c., we define the (pd) Hessian:
H = Σi ∂²q(zi,b)/∂b∂b′
• Otherwise, the M-estimator is of ρ-type (ρ = q(z,β)).
• Nonlinear Least Squares (NLLS):
– q(z;b) = S(b) = e′e = Σi=1,…,T (yi – f(xi;b))²
– bNLLS = argmin S(b)

• Maximum Likelihood:
– Let f(xi; β) be the pdf of the data.
– L(x, β) = Πi=1,…,T f(xi; β)
– log L(x, β) = Σi=1,…,T ln f(xi; β)
– Now, we move from population to sample moments:
– q(z,b) = –log L(x,b)
– bMLE = argmin –log L(x;b)

• Asymptotic variance:
– Var[bM] = (1/T) H0⁻¹V0H0⁻¹
– H and V are evaluated at b0:
– H = Σi [∂²q(zi,b)/∂b∂b′]
– V = Σi [∂q(zi,b)/∂b][∂q(zi,b)/∂b′]
– If the model is correctly specified, the information equality holds: H = V (for q = –log f, minus the expected log-likelihood Hessian equals the expected outer product of scores). Then, Var[bM] = (1/T) V0⁻¹.
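A minimal M-estimation sketch, assuming a normal likelihood on simulated data (all names illustrative): q(z,b) = –log f(z;b) is minimized numerically, and the sandwich variance (1/T) H⁻¹VH⁻¹ is built from numerical derivatives.

```python
import numpy as np
from scipy.optimize import minimize

# Minimal M-estimation / MLE sketch with q(z,b) = -log N(mu, sig^2) density.
rng = np.random.default_rng(5)
z = rng.normal(1.0, 2.0, size=1000)
T = len(z)

def q(b, zi):   # per-observation objective (sig parameterized as exp(log_sig))
    mu, log_sig = b
    sig2 = np.exp(2 * log_sig)
    return 0.5 * np.log(2 * np.pi * sig2) + 0.5 * (zi - mu) ** 2 / sig2

res = minimize(lambda b: np.mean(q(b, z)), x0=np.array([0.0, 0.0]))
b_m = res.x

h = 1e-5
def scores(b):  # numerical per-observation score s(z_i, b), central differences
    s = np.zeros((T, 2))
    for j in range(2):
        e = np.zeros(2); e[j] = h
        s[:, j] = (q(b + e, z) - q(b - e, z)) / (2 * h)
    return s

S = scores(b_m)
V = S.T @ S / T                  # average outer product of scores
H = np.zeros((2, 2))             # numerical Hessian of the average objective
for j in range(2):
    e = np.zeros(2); e[j] = h
    H[:, j] = (scores(b_m + e).mean(axis=0) - scores(b_m - e).mean(axis=0)) / (2 * h)

var_b = np.linalg.inv(H) @ V @ np.linalg.inv(H) / T   # sandwich: (1/T) H^-1 V H^-1
print(b_m, np.sqrt(np.diag(var_b)))
```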