Lecture5-Estimating the Linear Conditional Mean Model II - Annotated

Lecture 5 of Econ 704 focuses on estimating the conditional mean model using ordinary least squares (OLS) estimators, discussing their unbiasedness, variance, and the implications of homoskedasticity. The Gauss-Markov theorem is introduced, establishing OLS as the best linear unbiased estimator under certain conditions, while also addressing the challenges posed by heteroskedasticity and the use of generalized least squares (GLS) estimators. The lecture concludes with discussions on the consistency and asymptotic normality of the OLS estimator.

Econ 704

Lecture 5 - Estimating the Conditional Mean Model II

Xiaoxia Shi

University of Wisconsin - Madison

11/12/2018

Lecture (5) Econ 704 11/12/2018 1 / 26


We have considered the conditional mean model

    E[Y|X] = X'β.                                         (1)

Define U = Y − E[Y|X]. The conditional mean model above can be
equivalently written as

    Y = X'β + U,   E[U|X] = 0.                            (2)

The different notation does not make the model a different model.
The model describes the conditional mean function E[Y|X = x].

When E[XX'] is a full-rank matrix, the conditional mean function
uniquely defines a true parameter value β. (Full-rank condition)



Suppose that there is an i.i.d. sample {(Y_i, X_i')'}_{i=1}^n of (Y, X')'.
We can use the sample to estimate this true value β.

We introduced a sample analogue estimator

    β̂ = ( ∑_{i=1}^n X_i X_i' )^{-1} ( ∑_{i=1}^n X_i Y_i ) ≡ (X_n'X_n)^{-1}(X_n'Y_n).

This turns out to be the ordinary least squares (OLS) estimator.

The full-rank condition also guarantees that this estimator is
well-defined (at least with high probability when n is large enough).

Is this sample analogue estimator a good estimator?
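Before answering, the sample analogue formula can be made concrete. A minimal sketch in Python/NumPy (the data-generating process and variable names are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an i.i.d. sample from the conditional mean model Y = X'beta + U
n, k = 500, 3
beta = np.array([1.0, -2.0, 0.5])
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # includes an intercept
U = rng.normal(size=n)                                          # E[U|X] = 0
Y = X @ beta + U

# Sample analogue / OLS estimator: (X_n'X_n)^{-1} (X_n'Y_n)
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)   # close to beta when n is large
```

Using `np.linalg.solve` rather than forming the inverse explicitly is the standard numerically stable way to compute the same quantity.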



Outline

Mean and Variance of OLS Estimator

Consistency

Asymptotic Normality



Mean of OLS Estimator: Unbiasedness

The OLS estimator is an unbiased estimator for the β in the
conditional mean model E[Y|X] = X'β.

    E[β̂|X_n] = E[(X_n'X_n)^{-1}(X_n'Y_n)|X_n]
             = (X_n'X_n)^{-1} X_n' E[Y_n|X_n]
             = (X_n'X_n)^{-1} (X_n'X_n β)
             = (X_n'X_n)^{-1} (X_n'X_n) β = β

Thus, E[β̂] = E(E[β̂|X_n]) = E(β) = β.

Assumptions used: E[Y|X] = X'β, full-rank, i.i.d.
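Unbiasedness can be checked by Monte Carlo: averaging β̂ over many simulated samples should recover β. A sketch (the data-generating process is invented for illustration; note the errors are deliberately heteroskedastic, since unbiasedness needs only E[U|X] = 0):

```python
import numpy as np

rng = np.random.default_rng(1)
beta = np.array([1.0, -2.0])
n, reps = 100, 2000

draws = np.empty((reps, 2))
for r in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    # Heteroskedastic errors are fine: unbiasedness uses only E[U|X] = 0
    U = rng.normal(size=n) * (1 + X[:, 1] ** 2)
    Y = X @ beta + U
    draws[r] = np.linalg.solve(X.T @ X, X.T @ Y)

print(draws.mean(axis=0))   # close to beta
```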



Variance of the OLS Estimator

Under the same assumptions: E[Y|X] = X'β (linear specification of
the conditional mean model), full-rank, i.i.d., we can also derive the
variance of the OLS estimator for β.

Let's just derive the conditional variance of β̂ given X_n, as this is
sufficient for our subsequent discussion.

    Var(β̂|X_n) = Var((X_n'X_n)^{-1}(X_n'Y_n)|X_n)
               = (X_n'X_n)^{-1} X_n' Var(Y_n|X_n) X_n (X_n'X_n)^{-1}.



Let's work a bit harder on the matrix Var(Y_n|X_n). Written out, its
(i, j) entry is Cov(Y_i, Y_j|X_n), with Var(Y_i|X_n) on the diagonal:

    [ Var(Y_1|X_n)       Cov(Y_1, Y_2|X_n)  ...  Cov(Y_1, Y_n|X_n) ]
    [ Cov(Y_2, Y_1|X_n)  Var(Y_2|X_n)       ...  Cov(Y_2, Y_n|X_n) ]
    [ ...                ...                ...  ...               ]
    [ Cov(Y_n, Y_1|X_n)  Cov(Y_n, Y_2|X_n)  ...  Var(Y_n|X_n)      ]

By i.i.d. sampling, the off-diagonal covariances are all zero and
Var(Y_i|X_n) = Var(Y_i|X_i), so the matrix is diagonal:

    Var(Y_n|X_n) = diag( Var(Y_1|X_1), Var(Y_2|X_2), ..., Var(Y_n|X_n) ).



Homoskedasticity

Suppose that we add the additional assumption:

    Var(Y|X) = σ².

That is, the conditional variance of Y given X = x does not vary
with x.

This assumption is called the homoskedasticity assumption.

Then Var(Y_n|X_n) simplifies to

    diag( σ², σ², ..., σ² ) ≡ σ² I_n.



Gauss-Markov Theorem

When Var(Y_n|X_n) simplifies, Var(β̂|X_n) also simplifies:

    Var(β̂|X_n) = σ² × (X_n'X_n)^{-1}

Moreover, the OLS estimator now is the best linear unbiased
estimator (BLUE), as the Gauss-Markov Theorem shows:

Gauss-Markov Theorem. Under the conditional mean model
E[Y|X] = X'β, suppose that the i.i.d. sampling, the full-rank, and
the homoskedasticity assumptions are satisfied. Then

    Var(β̂|X_n) ≤ Var(β̃|X_n)

for any β̃ = D(X_n)Y_n, where D is a k × n matrix-valued function
of X_n such that E[β̃|X_n] = β.
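The theorem's conclusion can be checked numerically. A sketch (the design matrix and competing weights are invented for illustration): under homoskedasticity, compare the OLS conditional variance with that of another linear unbiased estimator, a weighted least squares with arbitrary weights; the difference should be positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma2 = 200, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# OLS conditional variance under homoskedasticity: sigma^2 (X'X)^{-1}
V_ols = sigma2 * np.linalg.inv(X.T @ X)

# A competing linear unbiased estimator: WLS with arbitrary positive weights,
# beta_tilde = (X'WX)^{-1} X'W Y, with conditional variance
# sigma^2 (X'WX)^{-1} X'W^2 X (X'WX)^{-1}
w = 1.0 + rng.uniform(size=n)
XtWX_inv = np.linalg.inv(X.T @ (w[:, None] * X))
V_wls = sigma2 * XtWX_inv @ (X.T @ (w[:, None] ** 2 * X)) @ XtWX_inv

# Gauss-Markov: V_wls - V_ols is positive semidefinite
print(np.linalg.eigvalsh(V_wls - V_ols).min())   # >= 0 up to rounding
```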



Discussion of Homoskedasticity

Being a BLUE is certainly good, since small variance is good.

However, the homoskedasticity assumption is very strong for
economic applications. If E[Y|X = x] is not constant across x, what
are the reasons to believe that Var(Y|X = x) is?

Thus, the Gauss-Markov Theorem is really more fun than meaningful
for us.

When homoskedasticity is not satisfied, we have heteroskedasticity.

It is customary to say we have heteroskedastic errors because
Var(Y|X) = Var(U|X).



Generalized Least Squares Estimator

When there is heteroskedasticity, other estimators may have smaller
variance than the OLS estimator, for example, the generalized least
squares (GLS) estimator.

Suppose that the heteroskedastic variance function
σ²(x) = Var(U|X = x) is known. We can rewrite the conditional
mean model Y = X'β + U as

    Y/σ(X) = (X/σ(X))'β + U/σ(X),

and then use the OLS estimator based on this transformed model.
The resulting estimator is the GLS estimator, which is a weighted
least squares (WLS) estimator.




Generalized Least Squares Estimator

The GLS estimator is the sample analogue estimator based on the
following expression of β:

    β = ( E[σ(X)^{-2} XX'] )^{-1} E[σ(X)^{-2} XY].

Thus

    β̃_WLS = ( ∑_{i=1}^n σ(X_i)^{-2} X_i X_i' )^{-1} ( ∑_{i=1}^n σ(X_i)^{-2} X_i Y_i ).

The GLS estimator is BLUE.

However, σ(x) is almost never known. A feasible generalized least
squares (FGLS) estimator may be defined using an estimated σ(x).

But the FGLS estimator is no longer BLUE. It is no longer unbiased,
and does not necessarily have smaller variance than β̂.
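A sketch of the (infeasible) GLS estimator when σ²(x) is known, implemented as OLS on the transformed model; the variance function here is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
beta = np.array([1.0, -2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# Known (illustrative) heteroskedastic scale function sigma(x)
sigma = 0.5 + np.abs(X[:, 1])
Y = X @ beta + sigma * rng.normal(size=n)

# GLS = OLS on the transformed model (Y/sigma) = (X/sigma)'beta + (U/sigma)
Xw = X / sigma[:, None]
Yw = Y / sigma
beta_gls = np.linalg.solve(Xw.T @ Xw, Xw.T @ Yw)

# Plain OLS for comparison
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_gls, beta_ols)   # both near beta; GLS has the smaller variance
```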



The unbiasedness and BLUE-ness are the small-sample properties of
the OLS estimator.

These properties are exact descriptions of (some aspects of) the
distribution of the OLS estimator. They apply regardless of sample
size, as long as their required assumptions are satisfied.

The OLS estimator also possesses some nice large-sample properties.

These are approximate descriptions of (some aspects of) the
distribution of the OLS estimator. They are meaningful only when the
sample size is large enough.





Consistency

β̂ is consistent for β. That is, β̂ →p β.

To see why, observe that β̂ is a continuous function of two sample
averages, n^{-1} ∑_{i=1}^n X_i X_i' and n^{-1} ∑_{i=1}^n X_i Y_i:

    β̂ = ( n^{-1} ∑_{i=1}^n X_i X_i' )^{-1} ( n^{-1} ∑_{i=1}^n X_i Y_i )

That means the law of large numbers (LLN) can be applied to each
element of the matrix n^{-1} ∑_{i=1}^n X_i X_i' and of the vector
n^{-1} ∑_{i=1}^n X_i Y_i.



Consistency

The LLN applies as long as each element of E(|XX'|) and of E(|XY|)
is finite. Assume that they are; then the LLN tells us

    n^{-1} ∑_{i=1}^n X_i X_i' →p E(XX'),  and  n^{-1} ∑_{i=1}^n X_i Y_i →p E(XY).   (3)

Therefore, by the Slutsky Theorem, we have

    β̂ →p [E(XX')]^{-1} E(XY) = β.   (4)
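Consistency can be visualized by computing β̂ on nested samples of growing size; a sketch (the data-generating process is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
beta = np.array([1.0, -2.0])
N = 100_000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
Y = X @ beta + rng.normal(size=N)

# beta_hat computed on the first n observations, for growing n
for n in (100, 1_000, 10_000, 100_000):
    Xn, Yn = X[:n], Y[:n]
    b = np.linalg.solve(Xn.T @ Xn, Xn.T @ Yn)
    print(n, np.abs(b - beta).max())   # error shrinks, roughly like 1/sqrt(n)
```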





Asymptotic Normality

β̂ is asymptotically normally distributed.

To see why, first observe that

    β̂ = (X_n'X_n)^{-1} X_n'(X_n β + U_n)
       = β + (X_n'X_n)^{-1} X_n'U_n.   (5)

We want to use the central limit theorem (CLT) on this. The first step in
applying the CLT is always subtracting the true value:

    β̂ − β = (X_n'X_n)^{-1} X_n'U_n = ( n^{-1} ∑_{i=1}^n X_i X_i' )^{-1} ( n^{-1} ∑_{i=1}^n X_i U_i )



Asymptotic Normality: Derivation

    β̂ − β = ( n^{-1} ∑_{i=1}^n X_i X_i' )^{-1} ( n^{-1} ∑_{i=1}^n X_i U_i )

We already know that

    n^{-1} ∑_{i=1}^n X_i X_i' →p E(XX').

We can apply the vector central limit theorem to the term
n^{-1} ∑_{i=1}^n X_i U_i.



Vector Central Limit Theorem

Vector CLT. Consider an i.i.d. sample {W_i}_{i=1}^n of a k × 1 random vector
W. Consider the thought experiment of increasing the sample size by adding
more i.i.d. observations of W. Let W̄_n = n^{-1} ∑_{i=1}^n W_i. If
each element of Var(W) is finite, then

    √n (W̄_n − E(W)) →d N(0_k, Var(W)),   (6)

where 0_k is a k × 1 vector of zeroes.

Here the convergence is for the multivariate distribution of the vector
√n (W̄_n − E(W)). It means that the joint CDF of this random vector
converges to that of a jointly normal random vector.
This is more than element-wise convergence in distribution.
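A small simulation can illustrate the vector CLT; a sketch (the distribution of W is invented, built from correlated, non-normal components). The covariance of √n(W̄_n − E(W)) across replications should match Var(W):

```python
import numpy as np

rng = np.random.default_rng(5)
k, n, reps = 2, 500, 4000

# W = A * base, where base has i.i.d. Exp(1) components (mean 1, variance 1)
A = np.array([[1.0, 0.3], [0.0, 1.0]])
mean_W = A @ np.ones(k)       # E[W]
var_W = A @ A.T               # Var(W) = A Var(base) A'

Z = np.empty((reps, k))
for r in range(reps):
    base = rng.exponential(size=(n, k))
    W = base @ A.T
    Z[r] = np.sqrt(n) * (W.mean(axis=0) - mean_W)

print(np.cov(Z.T))   # close to var_W
```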



Back to the OLS Estimator

    β̂ − β = ( n^{-1} ∑_{i=1}^n X_i X_i' )^{-1} ( n^{-1} ∑_{i=1}^n X_i U_i )

Now come back to the term n^{-1} ∑_{i=1}^n X_i U_i.

Note that XU is a k × 1 random vector, and {X_i U_i}_{i=1}^n is an i.i.d.
sample of XU.

Thus, by the vector CLT, as long as each element of Var(XU) is finite, we have

    n^{-1/2} ∑_{i=1}^n (X_i U_i − E[XU]) →d N(0, Var(XU)).

But E[XU] = E[E[XU|X]] = E[X E[U|X]] = 0. Thus

    n^{-1/2} ∑_{i=1}^n X_i U_i →d N(0, Var(XU)).



Asymptotic Distribution

We have shown

    n^{-1} ∑_{i=1}^n X_i X_i' →p E(XX'),

    n^{-1/2} ∑_{i=1}^n X_i U_i →d N(0, Var(XU)).

Therefore,

    √n (β̂ − β) = ( n^{-1} ∑_{i=1}^n X_i X_i' )^{-1} ( n^{-1/2} ∑_{i=1}^n X_i U_i )

               →d N( 0, E(XX')^{-1} Var(XU) E(XX')^{-1} ).
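In practice, E(XX') and Var(XU) are replaced by sample analogues (with residuals standing in for the unobserved errors), giving a heteroskedasticity-robust "sandwich" variance estimate. A sketch (the data-generating process is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5000
beta = np.array([1.0, -2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
U = rng.normal(size=n) * (0.5 + np.abs(X[:, 1]))   # heteroskedastic errors
Y = X @ beta + U

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Uhat = Y - X @ beta_hat                             # residuals

# Sample analogues of E(XX')^{-1} Var(XU) E(XX')^{-1}
Sxx_inv = np.linalg.inv(X.T @ X / n)
meat = (X * Uhat[:, None] ** 2).T @ X / n          # (1/n) sum u_i^2 x_i x_i'
avar = Sxx_inv @ meat @ Sxx_inv                     # avar of sqrt(n)(beta_hat - beta)

se = np.sqrt(np.diag(avar) / n)                     # robust standard errors
print(beta_hat, se)
```

This is the HC0 flavor of robust standard errors; refinements (small-sample corrections) exist but the sandwich structure is the same.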






The Asymptotic Variance

    √n (β̂ − β) →d N( 0, E(XX')^{-1} Var(XU) E(XX')^{-1} ).

The asymptotic variance can be simplified a little bit:

    Var(XU) = E[XU(XU)']
            = E[U² XX']
            = E[E(U²|X) XX'] = E[σ²(X) XX']

Note that σ²(X) = Var(U|X) = Var(Y|X) is the heteroskedastic variance function.



The Asymptotic Variance

    √n (β̂ − β) →d N( 0, E(XX')^{-1} E[σ²(X) XX'] E(XX')^{-1} ).

If the model is homoskedastic, that is, σ²(X) = σ², the asymptotic
variance of √n (β̂ − β) further simplifies:

    E(XX')^{-1} E[σ²(X) XX'] E(XX')^{-1} = σ² E(XX')^{-1}.

But as we discussed, in econometrics, we should not make this
simplifying assumption. Thus, we should stick with the more
complicated form of the asymptotic variance.



Quiz 5

Is the following statement true or false? If false, give two reasons why.

Suppose that E[Y|X] = X'β, E[XX'] has full rank, and we are estimating
β based on an i.i.d. sample of (Y, X')'. Then the OLS estimator based on
this sample is the best estimator.

