Lecture 3
Zhian Hu
Fall 2024
This Lecture
Sample Mean in an Intercept-Only Model
▶ The intercept-only model: Y = µ + e, assuming that E[e] = 0 and E[e²] = σ².
▶ In this model µ = E[Y] is the expectation of Y. Given a random sample, the least squares estimator µ̂ = (1/n)∑ᵢ Yᵢ = Ȳ equals the sample mean.
▶ We now calculate the expectation and variance of the estimator Ȳ .
" n # n Y
1X 1X
E[Ȳ ] = E Yi = E [Yi ] = µ
n n
i=1 i=1
▶ An estimator with the property that its expectation equals the parameter it is
estimating is called unbiased.
▶ An estimator θ̂ for θ is unbiased if E[θ̂] = θ.
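As a quick illustration of unbiasedness, here is a minimal Monte Carlo sketch in Python/numpy (not from the lecture; the values µ = 2, σ = 1, n = 50 are illustrative): the average of Ȳ across many simulated samples is close to µ.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 2.0, 1.0, 50, 10_000

# Draw many samples from the intercept-only model Y = mu + e
# and record the sample mean of each one.
ybar = np.array([rng.normal(mu, sigma, n).mean() for _ in range(reps)])

print(ybar.mean())  # close to mu = 2.0, consistent with E[Ybar] = mu
```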
Linear Regression Model
▶ We now consider the linear regression model.
▶ The variables (Y , X ) satisfy the linear regression equation
Y = X′β + e
E[e | X] = 0
E‖X‖² < ∞
▶ Under E[e | X] = 0, the conditional expectation E[Y | X] = X′β is a linear function of X.
Expectation of Least Squares Estimator
▶ The OLS estimator is unbiased in the linear regression model.
▶ In summation notation:
E[β̂ | X₁, …, Xₙ] = E[(∑ᵢ XᵢXᵢ′)⁻¹ ∑ᵢ XᵢYᵢ | X₁, …, Xₙ]
                 = (∑ᵢ XᵢXᵢ′)⁻¹ E[∑ᵢ XᵢYᵢ | X₁, …, Xₙ]
                 = (∑ᵢ XᵢXᵢ′)⁻¹ ∑ᵢ E[XᵢYᵢ | X₁, …, Xₙ]
                 = (∑ᵢ XᵢXᵢ′)⁻¹ ∑ᵢ Xᵢ E[Yᵢ | Xᵢ]
                 = (∑ᵢ XᵢXᵢ′)⁻¹ ∑ᵢ XᵢXᵢ′β = β
where all sums run over i = 1, …, n; the fourth line uses independent sampling (given Xᵢ, the other observations carry no information about Yᵢ), and the last line uses the regression assumption E[Yᵢ | Xᵢ] = Xᵢ′β.
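To make the conditional statement concrete, here is a small simulation sketch (illustrative values, not from the slides): holding the regressor matrix fixed and redrawing the errors, the average of the OLS estimates is close to the true β.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, reps = 100, 3, 5_000
beta = np.array([1.0, 0.5, -2.0])                 # illustrative true coefficients

# Fixed design: intercept plus two random regressors, held constant across replications.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
XtX_inv = np.linalg.inv(X.T @ X)

betahats = np.empty((reps, k))
for r in range(reps):
    e = rng.normal(size=n)                        # errors with E[e | X] = 0
    Y = X @ beta + e
    betahats[r] = XtX_inv @ X.T @ Y               # OLS estimate for this draw

print(betahats.mean(axis=0))                      # approximately (1.0, 0.5, -2.0)
```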
Expectation of Least Squares Estimator
▶ In matrix notation:
E[β̂ | X] = E[(X′X)⁻¹X′Y | X]
          = (X′X)⁻¹X′ E[Y | X]
          = (X′X)⁻¹X′Xβ       (using E[Y | X] = Xβ)
          = β.
▶ In the linear regression model with i.i.d. sampling,
E[β̂ | X] = β
▶ Using the law of iterated expectations, we can further prove that E[β̂] = β.
Variance of Least Squares Estimator
▶ Two facts used in the variance calculation: (A′)⁻¹ = (A⁻¹)′, and the (i, j) element of a covariance matrix is var[Z]ᵢⱼ = cov(Zᵢ, Zⱼ).
Variance of Least Squares Estimator
▶ Writing D = var[e | X] = diag(σ₁², …, σₙ²), for any matrix A that is a function of X,
var[A′Y | X] = var[A′e | X] = A′DA
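The intermediate steps of this slide did not survive extraction, but the identity var[A′Y | X] = A′DA can be checked numerically. A minimal sketch (illustrative values; it takes A = X(X′X)⁻¹ so that β̂ = A′Y, as on the Gauss-Markov slides below): the sandwich A′DA equals (X′X)⁻¹X′DX(X′X)⁻¹ and reduces to σ²(X′X)⁻¹ when all σᵢ² are equal.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
X = np.column_stack([np.ones(n), rng.normal(size=n)])

sigma2_i = rng.uniform(0.5, 2.0, size=n)      # illustrative heteroskedastic variances
D = np.diag(sigma2_i)                         # D = var[e | X] = diag(sigma_1^2, ..., sigma_n^2)

XtX_inv = np.linalg.inv(X.T @ X)
A = X @ XtX_inv                               # betahat = A'Y
V = A.T @ D @ A                               # A'DA
print(np.allclose(V, XtX_inv @ X.T @ D @ X @ XtX_inv))    # True: sandwich form

# Under homoskedasticity (all sigma_i^2 equal) this reduces to sigma^2 (X'X)^{-1}.
sigma2 = 1.3
print(np.allclose(A.T @ (sigma2 * np.eye(n)) @ A, sigma2 * XtX_inv))  # True
```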
Gauss-Markov Theorem
▶ In this model we know that the least squares estimator is unbiased for β and has covariance matrix σ²(X′X)⁻¹.
▶ Is there an alternative unbiased estimator β̃ which has a smaller covariance matrix?
▶ A linear estimator is one of the form β̃ = A′Y, so each component is a linear combination ∑ᵢ aᵢYᵢ of the observations. OLS itself is linear, with β̂ = (X′X)⁻¹X′Y, i.e. A′ = (X′X)⁻¹X′. The comparison is restricted to estimators that are both linear and unbiased.
Gauss-Markov Theorem
▶ Gauss-Markov Theorem: in the homoskedastic linear regression model, among all linear estimators β̃ = A′Y satisfying E[β̃ | X] = β, the least squares estimator has the smallest conditional covariance matrix, var[β̃ | X] ≥ σ²(X′X)⁻¹.
Gauss-Markov Theorem
▶ For β̃ = A′Y we have
E[β̃ | X] = A′E[Y | X] = A′Xβ
▶ Then β̃ is unbiased for all β if (and only if) A′X = Iₖ. Furthermore,
var[β̃ | X] = var[A′Y | X] = A′DA = A′Aσ²
(since β̃ − E[β̃ | X] = A′e, so var[β̃ | X] = E[A′ee′A | X] = A′E[ee′ | X]A = A′DA),
▶ the last equality using the homoskedasticity assumption. To establish the Theorem we need to show that for any such matrix A,
A′A ≥ (X′X)⁻¹
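The remaining step of the proof is not reproduced on the extracted slide. One standard way to check the inequality numerically (a sketch, not the lecture's own argument): any A with A′X = Iₖ can be written as A = X(X′X)⁻¹ + C with C′X = 0, and then A′A − (X′X)⁻¹ = C′C is positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])

XtX_inv = np.linalg.inv(X.T @ X)
M = np.eye(n) - X @ XtX_inv @ X.T              # annihilator matrix, MX = 0

# Build an arbitrary linear unbiased estimator: A = X(X'X)^{-1} + C with C = MW, so C'X = 0.
W = rng.normal(size=(n, k))
A = X @ XtX_inv + M @ W
print(np.allclose(A.T @ X, np.eye(k)))          # unbiasedness condition A'X = I_k holds

gap = A.T @ A - XtX_inv                         # equals C'C, hence positive semidefinite
print(np.linalg.eigvalsh(gap).min() >= -1e-8)   # True: A'A >= (X'X)^{-1}
```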
Generalized Least Squares
▶ Now consider the linear regression model with a more general error covariance structure:
E[e | X] = 0
var[e | X] = Σσ²
for some n × n matrix Σ > 0 (possibly a function of X) and some scalar σ². This includes the independent sampling framework where Σ is diagonal but allows for non-diagonal covariance matrices as well.
▶ Under these assumptions, we can calculate the expectation and variance of the
OLS estimator:
E[β̂ | X] = β
var[β̂ | X] = σ²(X′X)⁻¹X′ΣX(X′X)⁻¹
Generalized Least Squares
▶ In this case, the OLS estimator is not efficient. Instead, we develop the
Generalized Least Squares (GLS) estimator of β.
▶ When Σ is known, take the linear model and pre-multiply by Σ^{-1/2}. This produces the equation Ỹ = X̃β + ẽ, where Ỹ = Σ^{-1/2}Y, X̃ = Σ^{-1/2}X, and ẽ = Σ^{-1/2}e.
▶ Consider OLS estimation of β in this equation.
β̃_gls = (X̃′X̃)⁻¹X̃′Ỹ
       = ((Σ^{-1/2}X)′(Σ^{-1/2}X))⁻¹(Σ^{-1/2}X)′(Σ^{-1/2}Y)
       = (X′Σ⁻¹X)⁻¹X′Σ⁻¹Y
▶ You can calculate that, using E[ẽẽ′ | X] = Σ^{-1/2}E[ee′ | X]Σ^{-1/2} = Σ^{-1/2}(Σσ²)Σ^{-1/2} = Iₙσ²,
E[β̃_gls | X] = β
var[β̃_gls | X] = σ²(X′Σ⁻¹X)⁻¹
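A minimal numerical sketch of the last two slides (an illustrative AR(1)-style Σ and σ² = 1; none of the values come from the lecture): it computes β̂_ols and β̃_gls on one correlated-error sample and checks that σ²(X′X)⁻¹X′ΣX(X′X)⁻¹ − σ²(X′Σ⁻¹X)⁻¹ is positive semidefinite, i.e. GLS is weakly more efficient than OLS in this design.

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma2 = 200, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 0.5])

# Illustrative known Sigma: AR(1)-style correlation between observations.
idx = np.arange(n)
Sigma = 0.6 ** np.abs(idx[:, None] - idx[None, :])
Sigma_inv = np.linalg.inv(Sigma)

# One sample with var[e | X] = Sigma * sigma2.
e = np.sqrt(sigma2) * (np.linalg.cholesky(Sigma) @ rng.normal(size=n))
Y = X @ beta + e

XtX_inv = np.linalg.inv(X.T @ X)
beta_ols = XtX_inv @ X.T @ Y
beta_gls = np.linalg.inv(X.T @ Sigma_inv @ X) @ X.T @ Sigma_inv @ Y
print(beta_ols, beta_gls)

# Conditional covariance matrices from the slides.
V_ols = sigma2 * XtX_inv @ X.T @ Sigma @ X @ XtX_inv        # sandwich form for OLS
V_gls = sigma2 * np.linalg.inv(X.T @ Sigma_inv @ X)         # sigma^2 (X'Sigma^{-1}X)^{-1}
print(np.linalg.eigvalsh(V_ols - V_gls).min() >= -1e-10)    # True: GLS weakly more efficient
```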
Residuals
▶ What are some properties of the residuals êᵢ = Yᵢ − Xᵢ′β̂ in the context of the linear regression model?
▶ Recall that ê = Me, where M = Iₙ − X(X′X)⁻¹X′. Therefore
E[ê | X] = E[Me | X] = M E[e | X] = 0
var[ê | X] = var[Me | X] = M var[e | X] M = MDM
▶ Under the assumption of conditional homoskedasticity,
var[ê | X] = Mσ²
▶ In particular, for a single observation i we can find the variance of êᵢ by taking the i-th diagonal element. Since the i-th diagonal element of M is 1 − hᵢᵢ, we obtain
var[êᵢ | X] = E[êᵢ² | X] = (1 − hᵢᵢ)σ²
▶ Can you show the conditional expectation and variance of the prediction errors ẽᵢ = Yᵢ − Xᵢ′β̂(−i)?
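A short sketch of these identities (illustrative data, not from the lecture): it forms M and the leverage values hᵢᵢ directly and checks that ê = Me and that the i-th diagonal element of M equals 1 − hᵢᵢ.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 0.5, -0.3])
e = rng.normal(size=n)
Y = X @ beta + e

P = X @ np.linalg.inv(X.T @ X) @ X.T        # projection ("hat") matrix
M = np.eye(n) - P                           # annihilator matrix
h = np.diag(P)                              # leverage values h_ii

betahat = np.linalg.inv(X.T @ X) @ X.T @ Y
ehat = Y - X @ betahat                      # residuals

print(np.allclose(ehat, M @ e))             # ehat = Me
print(np.allclose(np.diag(M), 1 - h))       # i-th diagonal of M is 1 - h_ii
```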
Estimation of Error Variance
▶ The error variance σ² = E[e²] can be a parameter of interest.
▶ One estimator is the sample average of the squared residuals:
σ̂² = (1/n)∑ᵢ êᵢ²
▶ We can calculate the expectation of σ̂². Since ê = Me and M′M = MM = M,
σ̂² = (1/n) e′Me = (1/n) tr(e′Me) = (1/n) tr(Mee′)
▶ Then
E[σ̂² | X] = (1/n) tr(E[Mee′ | X])
           = (1/n) tr(M E[ee′ | X])
           = (1/n) tr(MD)
           = (1/n)∑ᵢ (1 − hᵢᵢ)σᵢ²
Estimation of Error Variance
▶ Adding the assumption of conditional homoskedasticity,
E[σ̂² | X] = (1/n) tr(M)σ² = ((n − k)/n)σ²
since tr(M) = n − k.
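This bias is easy to see in a simulation (a sketch with illustrative values; the rescaled estimator s² = (n − k)⁻¹∑ᵢ êᵢ² = n/(n − k) · σ̂², used again on the Measures of Fit slide, removes it under homoskedasticity).

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, sigma2, reps = 30, 4, 2.0, 20_000
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T   # annihilator for the fixed design

sig2_hat = np.empty(reps)
for r in range(reps):
    e = rng.normal(0.0, np.sqrt(sigma2), n)
    ehat = M @ e                                    # residuals equal Me whatever beta is
    sig2_hat[r] = ehat @ ehat / n

print(sig2_hat.mean(), sigma2 * (n - k) / n)        # both about 1.73: sigma2_hat is biased down
print((n / (n - k)) * sig2_hat.mean())              # about 2.0: s^2 corrects the bias
```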
Covariance Matrix Estimation Under Homoskedasticity
▶ For inference we need an estimator of the covariance matrix V_β̂ of the least squares estimator.
▶ Under homoskedasticity the covariance matrix takes the simple form
V⁰_β̂ = (X′X)⁻¹σ²
▶ It can be estimated by replacing σ² with the bias-corrected estimator s² = (n − k)⁻¹∑ᵢ êᵢ², giving V̂⁰_β̂ = (X′X)⁻¹s².
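A minimal sketch of this estimator on simulated data (illustrative values; s² is the bias-corrected variance estimator from the previous slides).

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
betahat = XtX_inv @ X.T @ Y
ehat = Y - X @ betahat
s2 = ehat @ ehat / (n - k)          # bias-corrected error variance estimate

V0 = s2 * XtX_inv                   # homoskedastic covariance matrix estimator
print(V0)
```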
Covariance Matrix Estimation Under Homoskedasticity
▶ This was the dominant covariance matrix estimator in applied econometrics for
many years and is still the default method in most regression packages.
▶ Stata uses this covariance matrix estimator by default in linear regression unless an alternative is specified.
▶ However, the above covariance matrix estimator can be highly biased if
homoskedasticity fails.
Covariance Matrix Estimation Under Heteroskedasticity
▶ Under heteroskedasticity, an ideal (but infeasible) estimator uses the true squared errors:
V̂^ideal_β̂ = (X′X)⁻¹(∑ᵢ XᵢXᵢ′eᵢ²)(X′X)⁻¹
▶ You can verify that E[V̂^ideal_β̂ | X] = V_β̂. However, the errors eᵢ² are unobserved.
Covariance Matrix Estimation Under Heteroskedasticity
▶ We can replace eᵢ² with the squared residuals êᵢ²:
V̂^HC0_β̂ = (X′X)⁻¹(∑ᵢ XᵢXᵢ′êᵢ²)(X′X)⁻¹
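A sketch of the HC0 estimator on simulated heteroskedastic data (an illustrative design where the error variance grows with |x|), compared with the homoskedastic estimator from the earlier slide.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
e = rng.normal(size=n) * (0.5 + np.abs(X[:, 1]))     # heteroskedastic errors
Y = X @ np.array([1.0, 0.5]) + e

XtX_inv = np.linalg.inv(X.T @ X)
betahat = XtX_inv @ X.T @ Y
ehat = Y - X @ betahat

meat = (X * ehat[:, None] ** 2).T @ X                # sum_i X_i X_i' ehat_i^2
V_hc0 = XtX_inv @ meat @ XtX_inv                     # heteroskedasticity-robust (HC0)

s2 = ehat @ ehat / (n - 2)
V_homo = s2 * XtX_inv                                # homoskedastic formula, biased here

print(np.sqrt(np.diag(V_hc0)))                       # robust standard errors
print(np.sqrt(np.diag(V_homo)))                      # typically too small for the slope here
```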
Standard Errors
▶ A standard error for β̂ⱼ is the square root of the j-th diagonal element of the estimated covariance matrix:
s(β̂ⱼ) = √(V̂_β̂ⱼ) = √([V̂_β̂]ⱼⱼ)
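In code this is just the square root of the diagonal of whichever covariance matrix estimator is used (a trivial sketch; `Vhat` could be the V̂⁰ or HC0 matrix computed above).

```python
import numpy as np

def standard_errors(Vhat: np.ndarray) -> np.ndarray:
    """Standard errors: square roots of the diagonal of a covariance matrix estimate."""
    return np.sqrt(np.diag(Vhat))
```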
Measures of Fit
▶ As we described in the previous chapter, a commonly reported measure of regression fit is the regression R², defined as
R² = 1 − (∑ᵢ êᵢ²)/(∑ᵢ (Yᵢ − Ȳ)²) = 1 − σ̂²/σ̂_Y²
where σ̂_Y² = n⁻¹∑ᵢ (Yᵢ − Ȳ)². R² is an estimator of the population parameter
ρ² = var[X′β]/var[Y] = 1 − σ²/σ_Y²
▶ However, σ̂² and σ̂_Y² are biased. Theil (1961) proposed replacing these by the unbiased versions s² and σ̃_Y² = (n − 1)⁻¹∑ᵢ (Yᵢ − Ȳ)², yielding what is known as R-bar-squared or adjusted R-squared:
R̄² = 1 − s²/σ̃_Y² = 1 − ((n − 1)∑ᵢ êᵢ²)/((n − k)∑ᵢ (Yᵢ − Ȳ)²)
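A short sketch computing both measures on simulated data (illustrative values only), following the formulas above.

```python
import numpy as np

rng = np.random.default_rng(9)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

betahat = np.linalg.inv(X.T @ X) @ X.T @ Y
ehat = Y - X @ betahat

sig2_hat = ehat @ ehat / n                         # sigma_hat^2
sig2_Y_hat = ((Y - Y.mean()) ** 2).sum() / n       # sigma_hat_Y^2
R2 = 1 - sig2_hat / sig2_Y_hat

s2 = ehat @ ehat / (n - k)                         # unbiased error variance
sig2_Y_tilde = ((Y - Y.mean()) ** 2).sum() / (n - 1)
R2_adj = 1 - s2 / sig2_Y_tilde                     # Theil's adjusted R-squared

print(R2, R2_adj)
```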