
Econometric Analysis of Cross Section and Panel Data

Lecture 3: Finite-Sample Properties of the OLS Estimator

Zhian Hu

Central University of Finance and Economics

Fall 2024

This Lecture

▶ Hansen (2022): Chapter 4


▶ We investigate some finite-sample properties of the least squares estimator in the
linear regression model.
▶ In particular we calculate its finite-sample expectation and covariance matrix and
propose standard errors for the coefficient estimators.

Sample Mean in an Intercept-Only Model
▶ The intercept-only model: Y = µ + e, assuming that E[e] = 0 and E[e²] = σ².
▶ In this model µ = E[Y] is the expectation of Y. Given a random sample, the least squares estimator µ̂ = Ȳ equals the sample mean.
▶ We now calculate the expectation and variance of the estimator Ȳ:

\[
E[\bar{Y}] = E\left[\frac{1}{n}\sum_{i=1}^{n} Y_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[Y_i] = \mu
\]

▶ An estimator with the property that its expectation equals the parameter it is estimating is called unbiased.
▶ An estimator θ̂ for θ is unbiased if E[θ̂] = θ.

Sample Mean in an Intercept-Only Model
▶ We next calculate the variance of the estimator.
▶ Making the substitution Yᵢ = µ + eᵢ we find Ȳ − µ = (1/n)∑ᵢ₌₁ⁿ eᵢ. Then

\[
\mathrm{var}[\bar{Y}] = E\left[(\bar{Y}-\mu)^2\right]
= E\left[\left(\frac{1}{n}\sum_{i=1}^{n} e_i\right)\left(\frac{1}{n}\sum_{j=1}^{n} e_j\right)\right]
= \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} E[e_i e_j]
= \frac{1}{n^2}\sum_{i=1}^{n}\sigma^2
= \frac{1}{n}\sigma^2.
\]

▶ The second-to-last equality holds because E[eᵢeⱼ] = σ² for i = j but E[eᵢeⱼ] = E[eᵢ]E[eⱼ] = 0 for i ≠ j due to independence.

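As a quick numerical check (my own illustration, not from the slides), the following Python sketch simulates the intercept-only model with assumed values µ = 2, σ = 1, and n = 50, and verifies that the Monte Carlo mean and variance of Ȳ are close to µ and σ²/n.

    import numpy as np

    # Simulation sketch: with assumed mu = 2, sigma = 1, n = 50, the sample mean
    # should be unbiased for mu and have variance sigma^2 / n = 0.02.
    rng = np.random.default_rng(0)
    mu, sigma, n, reps = 2.0, 1.0, 50, 20000

    ybars = np.array([rng.normal(mu, sigma, n).mean() for _ in range(reps)])
    print(ybars.mean())   # approximately 2.0
    print(ybars.var())    # approximately 0.02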
Linear Regression Model
▶ We now consider the linear regression model.
▶ The variables (Y, X) satisfy the linear regression equation

\[
Y = X'\beta + e, \qquad E[e \mid X] = 0.
\]

▶ The variables have finite second moments, E[Y²] < ∞ and E∥X∥² < ∞, and an invertible design matrix Q_{XX} = E[XX'] > 0.
▶ Homoskedastic Linear Regression Model: E[e² | X] = σ²(X) = σ² is independent of X.

Expectation of Least Squares Estimator
▶ The OLS estimator is unbiased in the linear regression model.
▶ In summation notation:
 !−1 
n n
!
h i X X
E βb | X1 , . . . , Xn = E  Xi Xi′ Xi Yi | X1 , . . . , Xn 
i=1 i=1
n
!−1 " n
! #
X X
X = Xi Xi′ E Xi Yi | X1 , . . . , Xn
i=1 i=1
n
!−1 n
X X
= Xi Xi′ E [Xi Yi | X1 , . . . , Xn ]
i=1 i=1
n
!−1 n
X X
= Xi Xi′ Xi E [Yi | Xi ] X_i X_j
i=1 i=1
n
!−1 n
E[Y_i|X_i]=X
X X _i\beta
= Xi Xi′ Xi Xi′ β = β.
i=1 i=1 6 / 25
Expectation of Least Squares Estimator

▶ In matrix notation:

\[
E[\hat{\beta} \mid X] = E\left[(X'X)^{-1}X'Y \mid X\right]
= (X'X)^{-1}X'\,E[Y \mid X]
= (X'X)^{-1}X'X\beta
= \beta.
\]

▶ In the linear regression model with i.i.d. sampling, E[β̂ | X] = β.
▶ Using the law of iterated expectations, we can further show that E[β̂] = β.

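The following sketch (my own, with an assumed true coefficient vector β = (1, 2) and a simulated design) computes β̂ = (X'X)⁻¹X'Y and checks unbiasedness by averaging the estimator over many simulated samples.

    import numpy as np

    # Monte Carlo check of E[beta_hat] = beta under an assumed design and beta = (1, 2).
    rng = np.random.default_rng(1)
    n, beta = 200, np.array([1.0, 2.0])

    def ols(X, Y):
        # Solve the normal equations; numerically equivalent to (X'X)^{-1} X'Y.
        return np.linalg.solve(X.T @ X, X.T @ Y)

    draws = []
    for _ in range(5000):
        X = np.column_stack([np.ones(n), rng.normal(size=n)])
        Y = X @ beta + rng.normal(size=n)   # errors with E[e | X] = 0
        draws.append(ols(X, Y))

    print(np.mean(draws, axis=0))           # close to (1, 2)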
Variance of Least Squares Estimator

▶ For any r × 1 random vector Z, define the r × r covariance matrix

\[
\mathrm{var}[Z] = E\left[(Z - E[Z])(Z - E[Z])'\right] = E[ZZ'] - (E[Z])(E[Z])'
\]

▶ For any pair (Z, X), define the conditional covariance matrix

\[
\mathrm{var}[Z \mid X] = E\left[(Z - E[Z \mid X])(Z - E[Z \mid X])' \mid X\right]
\]

▶ We define V_β̂ := var[β̂ | X] as the conditional covariance matrix of the regression coefficient estimators.
▶ We now derive its form.

Variance of Least Squares Estimator

▶ The conditional covariance matrix of the n × 1 regression error e is the n × n matrix

\[
\mathrm{var}[e \mid X] = E[ee' \mid X] \overset{\mathrm{def}}{=} D
\]

▶ The i-th diagonal element of D is

\[
E[e_i^2 \mid X] = E[e_i^2 \mid X_i] = \sigma_i^2
\]

▶ The ij-th off-diagonal element of D is

\[
E[e_i e_j \mid X] = E[e_i \mid X_i]\,E[e_j \mid X_j] = 0
\]
Variance of Least Squares Estimator

▶ Thus D is a diagonal matrix with i-th diagonal element σᵢ²:

\[
D = \mathrm{diag}\left(\sigma_1^2, \ldots, \sigma_n^2\right) =
\begin{pmatrix}
\sigma_1^2 & 0 & \cdots & 0 \\
0 & \sigma_2^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma_n^2
\end{pmatrix}
\]

▶ In the special case of the linear homoskedastic regression model, E[eᵢ² | Xᵢ] = σᵢ² = σ² and we have the simplification D = Iₙσ². In general, however, D need not take this simplified form.

Variance of Least Squares Estimator

▶ For any n × r matrix A = A(X),

\[
\mathrm{var}[A'Y \mid X] = \mathrm{var}[A'e \mid X] = A'DA
\]

▶ In particular, we can write β̂ = A'Y where A = X(X'X)⁻¹ and thus

\[
V_{\hat{\beta}} = \mathrm{var}[\hat{\beta} \mid X] = A'DA = (X'X)^{-1}\,X'DX\,(X'X)^{-1}
\]

▶ It is useful to note that X'DX = ∑ᵢ₌₁ⁿ XᵢXᵢ'σᵢ², a weighted version of X'X.
▶ In the special case of the linear homoskedastic regression model, D = Iₙσ², so X'DX = X'Xσ², and the covariance matrix simplifies to V_β̂ = (X'X)⁻¹σ².

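To make the sandwich formula concrete, here is a simulation sketch (my own; the skedastic function σᵢ² = exp(xᵢ) and the coefficient values are arbitrary assumptions) comparing the exact conditional covariance (X'X)⁻¹X'DX(X'X)⁻¹ with the Monte Carlo covariance of β̂ across repeated error draws, holding X fixed.

    import numpy as np

    # Exact sandwich covariance vs. simulated covariance of beta_hat, X held fixed.
    rng = np.random.default_rng(2)
    n, beta = 300, np.array([1.0, 2.0])
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    sigma2 = np.exp(x)                       # assumed conditional variances sigma_i^2

    XtX_inv = np.linalg.inv(X.T @ X)
    XDX = (X * sigma2[:, None]).T @ X        # X'DX = sum_i X_i X_i' sigma_i^2
    V_exact = XtX_inv @ XDX @ XtX_inv

    draws = []
    for _ in range(20000):
        e = rng.normal(size=n) * np.sqrt(sigma2)
        draws.append(XtX_inv @ (X.T @ (X @ beta + e)))
    print(V_exact)
    print(np.cov(np.array(draws).T))         # close to V_exact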
Gauss-Markov Theorem

▶ Write the homoskedastic linear regression model in vector format as

\[
Y = X\beta + e, \qquad E[e \mid X] = 0, \qquad \mathrm{var}[e \mid X] = I_n\sigma^2
\]

▶ In this model we know that the least squares estimator is unbiased for β and has covariance matrix σ²(X'X)⁻¹.
▶ Is there an alternative unbiased estimator β̃ which has a smaller covariance matrix?
Gauss-Markov Theorem

▶ Take the homoskedastic linear regression model. If β̃ is an unbiased estimator of β then var[β̃ | X] ≥ σ²(X'X)⁻¹.
▶ Since the variance of the OLS estimator is exactly equal to this bound, no unbiased estimator has a lower variance than OLS. Consequently, we describe OLS as efficient in the class of unbiased estimators.
▶ Let's restrict attention to linear estimators of β, which are estimators that can be written as β̃ = A'Y, where A = A(X) is an n × k function of the regressors X.
▶ This restriction gives rise to the description of OLS as the best linear unbiased estimator (BLUE).
Gauss-Markov Theorem
▶ For β̃ = A'Y we have

\[
E[\tilde{\beta} \mid X] = A'E[Y \mid X] = A'X\beta
\]

▶ Then β̃ is unbiased for all β if (and only if) A'X = I_k. Furthermore,

\[
\mathrm{var}[\tilde{\beta} \mid X] = \mathrm{var}[A'Y \mid X] = A'DA = A'A\sigma^2
\]

▶ the last equality using the homoskedasticity assumption D = Iₙσ². To establish the Theorem we need to show that for any such matrix A,

\[
A'A \geq (X'X)^{-1}
\]

▶ Set C = A − X(X'X)⁻¹. Note that X'C = 0. We calculate that

\[
\begin{aligned}
A'A - (X'X)^{-1}
&= \left(C + X(X'X)^{-1}\right)'\left(C + X(X'X)^{-1}\right) - (X'X)^{-1} \\
&= C'C + C'X(X'X)^{-1} + (X'X)^{-1}X'C + (X'X)^{-1}X'X(X'X)^{-1} - (X'X)^{-1} \\
&= C'C \geq 0.
\end{aligned}
\]
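A small numerical illustration of the key step (my own, not from the slides): build a linear unbiased estimator β̃ = A'Y by perturbing X(X'X)⁻¹ with a matrix C satisfying X'C = 0, and check that A'A − (X'X)⁻¹ is positive semi-definite.

    import numpy as np

    # Any A of the form X(X'X)^{-1} + C with X'C = 0 satisfies A'X = I_k
    # (a linear unbiased estimator); the Gauss-Markov gap A'A - (X'X)^{-1} is PSD.
    rng = np.random.default_rng(3)
    n, k = 100, 3
    X = rng.normal(size=(n, k))
    XtX_inv = np.linalg.inv(X.T @ X)

    M = np.eye(n) - X @ XtX_inv @ X.T        # annihilator matrix: X'M = 0
    C = M @ rng.normal(size=(n, k))          # hence X'C = 0
    A = X @ XtX_inv + C

    gap = A.T @ A - XtX_inv
    print(np.allclose(A.T @ X, np.eye(k)))            # True: unbiasedness condition
    print(np.linalg.eigvalsh(gap).min() >= -1e-10)    # True: the gap is PSD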
Generalized Least Squares

▶ Take the linear regression model in matrix format Y = X β + e


▶ Consider a generalized situation where the observation errors are possibly
correlated and/or heteroskedastic.

\[
E[e \mid X] = 0, \qquad \mathrm{var}[e \mid X] = \Sigma\sigma^2
\]

for some n × n matrix Σ > 0 (possibly a function of X) and some scalar σ². This includes the independent sampling framework where Σ is diagonal but allows for non-diagonal covariance matrices as well.
▶ Under these assumptions, we can calculate the expectation and variance of the OLS estimator:

\[
E[\hat{\beta} \mid X] = \beta, \qquad
\mathrm{var}[\hat{\beta} \mid X] = \sigma^2\,(X'X)^{-1}\,X'\Sigma X\,(X'X)^{-1}
\]
Generalized Least Squares
▶ In this case, the OLS estimator is not efficient. Instead, we develop the Generalized Least Squares (GLS) estimator of β.
▶ When Σ is known, take the linear model and pre-multiply by Σ^{-1/2}. This produces the equation Ỹ = X̃β + ẽ where Ỹ = Σ^{-1/2}Y, X̃ = Σ^{-1/2}X, and ẽ = Σ^{-1/2}e. Note that var[ẽ | X] = Σ^{-1/2}(Σσ²)Σ^{-1/2} = Iₙσ², so the transformed equation satisfies the homoskedastic model.
▶ Consider OLS estimation of β in this equation:

\[
\tilde{\beta}_{\mathrm{gls}} = \left(\tilde{X}'\tilde{X}\right)^{-1}\tilde{X}'\tilde{Y}
= \left(\left(\Sigma^{-1/2}X\right)'\left(\Sigma^{-1/2}X\right)\right)^{-1}\left(\Sigma^{-1/2}X\right)'\left(\Sigma^{-1/2}Y\right)
= \left(X'\Sigma^{-1}X\right)^{-1}X'\Sigma^{-1}Y
\]

▶ You can calculate that

\[
E[\tilde{\beta}_{\mathrm{gls}} \mid X] = \beta, \qquad
\mathrm{var}[\tilde{\beta}_{\mathrm{gls}} \mid X] = \sigma^2\left(X'\Sigma^{-1}X\right)^{-1}
\]
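A brief sketch (my own illustration, assuming a known diagonal Σ and simulated data) showing that pre-multiplying by Σ^{-1/2} and running OLS reproduces the direct formula (X'Σ⁻¹X)⁻¹X'Σ⁻¹Y.

    import numpy as np

    # GLS two ways: the closed form and the whitening (Sigma^{-1/2}) route.
    rng = np.random.default_rng(4)
    n = 200
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    sigma2_i = 0.5 + rng.uniform(size=n)        # assumed known conditional variances
    Y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * np.sqrt(sigma2_i)

    # Direct formula: (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} Y, with Sigma diagonal
    Sigma_inv = np.diag(1.0 / sigma2_i)
    beta_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ Y)

    # Whitening route: pre-multiply the model by Sigma^{-1/2}, then run OLS
    W = np.diag(1.0 / np.sqrt(sigma2_i))
    Xt, Yt = W @ X, W @ Y
    beta_gls2 = np.linalg.solve(Xt.T @ Xt, Xt.T @ Yt)

    print(beta_gls, beta_gls2)                  # identical up to rounding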
Residuals
▶ What are some properties of the residuals êᵢ = Yᵢ − Xᵢ'β̂ in the context of the linear regression model?
▶ Recall that ê = Me, so

\[
E[\hat{e} \mid X] = E[Me \mid X] = M\,E[e \mid X] = 0
\]
\[
\mathrm{var}[\hat{e} \mid X] = \mathrm{var}[Me \mid X] = M\,\mathrm{var}[e \mid X]\,M = MDM
\]

▶ Under the assumption of conditional homoskedasticity,

\[
\mathrm{var}[\hat{e} \mid X] = M\sigma^2
\]

▶ In particular, for a single observation i we can find the variance of êᵢ by taking the i-th diagonal element. Since the i-th diagonal element of M is 1 − hᵢᵢ, we obtain

\[
\mathrm{var}[\hat{e}_i \mid X] = E[\hat{e}_i^2 \mid X] = (1 - h_{ii})\,\sigma^2
\]

▶ Can you show the conditional expectation and variance of the prediction errors ẽᵢ = Yᵢ − Xᵢ'β̂_(−i)? (A numerical sketch follows below.)
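As a numerical answer sketch (my own, on simulated data, not from the slides), the code below computes the residuals, the leverage values hᵢᵢ, and the leave-one-out prediction errors, and checks the standard identity ẽᵢ = êᵢ/(1 − hᵢᵢ).

    import numpy as np

    # Residuals, leverage values, and leave-one-out prediction errors.
    rng = np.random.default_rng(5)
    n = 50
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    Y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y
    e_hat = Y - X @ beta_hat
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)     # leverage values h_ii

    # Brute-force leave-one-out prediction errors e_tilde_i = Y_i - X_i' beta_hat(-i)
    e_tilde = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        b_i = np.linalg.solve(X[keep].T @ X[keep], X[keep].T @ Y[keep])
        e_tilde[i] = Y[i] - X[i] @ b_i

    print(np.allclose(e_tilde, e_hat / (1 - h)))    # True: e_tilde_i = e_hat_i / (1 - h_ii)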
Estimation of Error Variance
▶ The error variance σ² = E[e²] can be a parameter of interest.
▶ One estimator is the sample average of the squared residuals:

\[
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} \hat{e}_i^2
\]

▶ We can calculate the expectation of σ̂². First note that

\[
\hat{\sigma}^2 = \frac{1}{n}e'Me = \frac{1}{n}\mathrm{tr}\left(e'Me\right) = \frac{1}{n}\mathrm{tr}\left(Mee'\right)
\]

▶ Then

\[
E[\hat{\sigma}^2 \mid X] = \frac{1}{n}\mathrm{tr}\left(E[Mee' \mid X]\right)
= \frac{1}{n}\mathrm{tr}\left(M\,E[ee' \mid X]\right)
= \frac{1}{n}\mathrm{tr}(MD)
= \frac{1}{n}\sum_{i=1}^{n}(1 - h_{ii})\,\sigma_i^2
\]
Estimation of Error Variance
▶ Adding the assumption of conditional homoskedasticity,

\[
E[\hat{\sigma}^2 \mid X] = \frac{1}{n}\mathrm{tr}\left(M\sigma^2\right) = \left(\frac{n-k}{n}\right)\sigma^2
\]

since tr(M) = n − k, which means that σ̂² is biased towards zero.
▶ So we can define an unbiased estimator by rescaling:

\[
s^2 = \frac{1}{n-k}\sum_{i=1}^{n} \hat{e}_i^2
\]

▶ By the above calculation, E[s² | X] = σ² and E[s²] = σ². Hence the estimator s² is unbiased for σ². Consequently, s² is known as the bias-corrected estimator for σ², and in empirical practice s² is the most widely used estimator of σ².

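The following simulation sketch (my own, with assumed σ² = 4, n = 30, k = 3) illustrates the (n − k)/n bias factor and the effect of the bias correction.

    import numpy as np

    # Compare E[sigma_hat^2] with sigma^2 (n-k)/n and E[s^2] with sigma^2.
    rng = np.random.default_rng(6)
    n, k, sigma2 = 30, 3, 4.0
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
    M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T   # annihilator matrix

    sig_hat, s2 = [], []
    for _ in range(20000):
        e = rng.normal(scale=np.sqrt(sigma2), size=n)
        e_hat = M @ e                    # residuals equal Me regardless of beta
        sig_hat.append(e_hat @ e_hat / n)
        s2.append(e_hat @ e_hat / (n - k))

    print(np.mean(sig_hat))   # close to sigma2 * (n - k) / n = 3.6
    print(np.mean(s2))        # close to sigma2 = 4.0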
Covariance Matrix Estimation Under Homoskedasticity
▶ For inference we need an estimator of the covariance matrix V_β̂ of the least squares estimator.
▶ Under homoskedasticity the covariance matrix takes the simple form

\[
V_{\hat{\beta}}^{0} = (X'X)^{-1}\sigma^2
\]

▶ Replacing σ² with its estimator s², we have

\[
\hat{V}_{\hat{\beta}}^{0} = (X'X)^{-1}s^2
\]

▶ Since s² is conditionally unbiased for σ², it is simple to calculate that V̂⁰_β̂ is conditionally unbiased for V_β̂ under the assumption of homoskedasticity:

\[
E\left[\hat{V}_{\hat{\beta}}^{0} \mid X\right] = (X'X)^{-1}E\left[s^2 \mid X\right] = (X'X)^{-1}\sigma^2 = V_{\hat{\beta}}
\]

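A minimal sketch (my own, on hypothetical simulated data) of the homoskedastic covariance matrix estimator (X'X)⁻¹s².

    import numpy as np

    # Classical (homoskedasticity-only) covariance matrix estimator.
    rng = np.random.default_rng(7)
    n, k = 100, 2
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    Y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    e_hat = Y - X @ beta_hat
    s2 = e_hat @ e_hat / (n - k)                 # bias-corrected error variance
    V0 = np.linalg.inv(X.T @ X) * s2
    print(V0)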
Covariance Matrix Estimation Under Homoskedasticity

▶ This was the dominant covariance matrix estimator in applied econometrics for
many years and is still the default method in most regression packages.
▶ Stata uses this covariance matrix estimator by default in linear regression unless an alternative is specified.
▶ However, the above covariance matrix estimator can be highly biased if
homoskedasticity fails.

Covariance Matrix Estimation Under Heteroskedasticity

▶ Recall that the general form for the covariance matrix is

\[
V_{\hat{\beta}} = (X'X)^{-1}\,X'DX\,(X'X)^{-1}
\]

▶ This depends on the unknown matrix D = diag(σ₁², …, σₙ²) = E[ee' | X].
▶ An ideal but infeasible estimator replaces D with D̃ = diag(e₁², …, eₙ²):

\[
\hat{V}_{\hat{\beta}}^{\mathrm{ideal}} = (X'X)^{-1}\left(X'\tilde{D}X\right)(X'X)^{-1}
= (X'X)^{-1}\left(\sum_{i=1}^{n} X_i X_i' e_i^2\right)(X'X)^{-1}
\]

▶ You can verify that E[ V̂_β̂^ideal | X ] = V_β̂. However, the errors eᵢ² are unobserved.

Covariance Matrix Estimation Under Heteroskedasticity
▶ We can replace eᵢ² with the squared residuals êᵢ²:

\[
\hat{V}_{\hat{\beta}}^{\mathrm{HC0}} = (X'X)^{-1}\left(\sum_{i=1}^{n} X_i X_i' \hat{e}_i^2\right)(X'X)^{-1}
\]

▶ The label "HC" refers to "heteroskedasticity-consistent". The label "HC0" refers to this being the baseline heteroskedasticity-consistent covariance matrix estimator.
▶ We know, however, that êᵢ² is biased towards zero. Recall that to estimate the variance σ² the unbiased estimator s² scales the moment estimator σ̂² by n/(n − k). We make the same adjustment here:

\[
\hat{V}_{\hat{\beta}}^{\mathrm{HC1}} = \left(\frac{n}{n-k}\right)(X'X)^{-1}\left(\sum_{i=1}^{n} X_i X_i' \hat{e}_i^2\right)(X'X)^{-1}
\]

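A sketch (my own, on hypothetical heteroskedastic data) computing HC0 and HC1 and the implied standard errors, which anticipates the formula on the next slide.

    import numpy as np

    # HC0 and HC1 covariance estimators and heteroskedasticity-robust standard errors.
    rng = np.random.default_rng(8)
    n, k = 200, 2
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    Y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * np.exp(0.5 * x)  # heteroskedastic

    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y
    e_hat = Y - X @ beta_hat

    meat = (X * e_hat[:, None] ** 2).T @ X      # sum_i X_i X_i' e_hat_i^2
    V_hc0 = XtX_inv @ meat @ XtX_inv
    V_hc1 = V_hc0 * n / (n - k)

    print(np.sqrt(np.diag(V_hc1)))              # standard errors: sqrt of the diagonal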
Standard Errors

▶ A standard error for the coefficient estimator β̂ⱼ is the square root of the corresponding diagonal element of the estimated covariance matrix:

\[
s\left(\hat{\beta}_j\right) = \sqrt{\hat{V}_{\hat{\beta}_j}} = \sqrt{\left[\hat{V}_{\hat{\beta}}\right]_{jj}}
\]

Measures of Fit
▶ As we described in the previous chapter, a commonly reported measure of regression fit is the regression R², defined as

\[
R^2 = 1 - \frac{\sum_{i=1}^{n}\hat{e}_i^2}{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2} = 1 - \frac{\hat{\sigma}^2}{\hat{\sigma}_Y^2}
\]

where σ̂_Y² = n⁻¹∑ᵢ₌₁ⁿ(Yᵢ − Ȳ)². R² is an estimator of the population parameter

\[
\rho^2 = \frac{\mathrm{var}[X'\beta]}{\mathrm{var}[Y]} = 1 - \frac{\sigma^2}{\sigma_Y^2}
\]

▶ However, σ̂² and σ̂_Y² are biased. Theil (1961) proposed replacing these by the unbiased versions s² and σ̃_Y² = (n − 1)⁻¹∑ᵢ₌₁ⁿ(Yᵢ − Ȳ)², yielding what is known as R-bar-squared or adjusted R-squared:

\[
\bar{R}^2 = 1 - \frac{s^2}{\tilde{\sigma}_Y^2} = 1 - \frac{(n-1)\sum_{i=1}^{n}\hat{e}_i^2}{(n-k)\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2}
\]
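Finally, a short sketch (my own, on hypothetical simulated data) computing R² and the adjusted R̄² exactly as defined above.

    import numpy as np

    # R-squared and adjusted R-squared from the residual and total sums of squares.
    rng = np.random.default_rng(9)
    n, k = 100, 2
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    Y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    e_hat = Y - X @ beta_hat

    rss = e_hat @ e_hat
    tss = ((Y - Y.mean()) ** 2).sum()
    r2 = 1 - rss / tss
    r2_adj = 1 - (rss / (n - k)) / (tss / (n - 1))   # = 1 - s^2 / sigma_tilde_Y^2
    print(r2, r2_adj)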
