
Econometric Analysis of Cross Section and Panel Data

Lecture 4: Normal Regression

Zhian Hu

Central University of Finance and Economics

Fall 2024

1 / 29
This Lecture

▶ Hansen (2022): Chapter 5


▶ This chapter introduces the normal regression model, which is a special case of
the linear regression model.
▶ It is important as normality allows precise distributional characterizations and
sharp inferences.
▶ Therefore in this chapter we introduce likelihood methods.

2 / 29
The Normal Distribution

▶ We say that a random variable Z has the standard normal distribution, or


Gaussian, written Z ∼ N(0, 1), if it has the density
      ϕ(x) = (1/√(2π)) exp(−x²/2),   −∞ < x < ∞

▶ Properties:
  1. All integer moments of Z are finite.
  2. All odd moments of Z equal 0.
  3. For any positive integer m,

      E[Z^(2m)] = (2m − 1)!! = (2m − 1) × (2m − 3) × ⋯ × 1

3 / 29
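
These moment formulas are easy to check by simulation. A minimal Python sketch (assuming numpy is available; the sample size and seed are arbitrary choices, not from the lecture):

    import numpy as np

    rng = np.random.default_rng(0)
    z = rng.standard_normal(1_000_000)   # draws from N(0, 1)

    # Odd moments are approximately zero in the simulation.
    print(np.mean(z**3))                 # close to 0

    # Even moments match (2m - 1)!!: for m = 2, E[Z^4] = 3!! = 3.
    print(np.mean(z**4))                 # close to 3
    print(np.mean(z**6))                 # close to 5 * 3 * 1 = 15
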
The Normal Distribution

▶ If Z ∼ N(0, 1) and X = µ + σZ for µ ∈ R and σ ≥ 0, then X has the univariate
  normal distribution, written X ∼ N(µ, σ²). By change-of-variables X has the
  density

      f(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)),   −∞ < x < ∞

▶ The expectation and variance of X are µ and σ², respectively.

4 / 29
Multivariate Normal Distribution

▶ We say that the k-vector Z has a multivariate standard normal distribution,


written Z ∼ N (0, Ik ), if it has the joint density
      f(x) = (2π)^(−k/2) exp(−x′x/2),   x ∈ R^k
▶ The mean and covariance matrix of Z are 0 and Ik , respectively.

5 / 29
Multivariate Normal Distribution

▶ If Z ∼ N(0, Ik) and X = µ + BZ, then the k-vector X has a multivariate normal
  distribution, written X ∼ N(µ, Σ) where Σ = BB′ ≥ 0. If Σ > 0 then by
  change-of-variables X has the joint density function

      f(x) = (1/((2π)^(k/2) det(Σ)^(1/2))) exp(−(x − µ)′Σ⁻¹(x − µ)/2),   x ∈ R^k
▶ The expectation and covariance matrix of X are µ and Σ, respectively.
▶ If X ∼ N(µ, Σ) and Y = a + BX , then Y ∼ N (a + Bµ, BΣB ′ ).

6 / 29
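
The construction X = µ + BZ can be checked numerically. A minimal Python sketch (assuming numpy; the particular µ and Σ below are made up for illustration, and B is taken to be the Cholesky factor of Σ, one choice satisfying BB′ = Σ):

    import numpy as np

    rng = np.random.default_rng(0)
    mu = np.array([1.0, -2.0])
    Sigma = np.array([[2.0, 0.6],
                      [0.6, 1.0]])
    B = np.linalg.cholesky(Sigma)            # lower-triangular B with B B' = Sigma

    Z = rng.standard_normal((100_000, 2))    # rows are draws from N(0, I_2)
    X = mu + Z @ B.T                         # rows are draws from N(mu, Sigma)

    print(X.mean(axis=0))                    # close to mu
    print(np.cov(X, rowvar=False))           # close to Sigma
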
Properties of Multivariate Normal Distribution

▶ If (X , Y ) are multivariate normal, X and Y are uncorrelated if and only if they are
independent.
▶ If X ∼ N(0, Ik) then X′X ∼ χ²_k, chi-square with k degrees of freedom.
▶ If X ∼ N(0, Σ) with Σ > 0 then X′Σ⁻¹X ∼ χ²_k where k = dim(X).
▶ If Z ∼ N(0, 1) and Q ∼ χ²_k are independent then Z/√(Q/k) ∼ t_k, Student's t
  with k degrees of freedom.

7 / 29
Joint Normality and Linear Regression
▶ Suppose the variables (Y , X ) are jointly normally distributed. Consider the best
linear predictor of Y given X

Y = X ′β + α + e

▶ So E[Xe] = 0 and E[e] = 0: X and e are uncorrelated, and hence independent.
  (Why? Because (X, e) are jointly multivariate normal, and for jointly normal
  random vectors zero correlation implies independence.)
▶ Independence implies that

      E[e | X] = E[e] = 0   and   E[e² | X] = E[e²] = σ²

  which are properties of a homoskedastic linear CEF.


▶ We have shown that when (Y, X) are jointly normally distributed, they satisfy a
  normal linear CEF

      Y = X′β + α + e,   e ∼ N(0, σ²)

  where e is independent of X.
8 / 29
Normal Regression Model

▶ The normal regression model is the linear regression model with an independent
normal error
      Y = X′β + e,   e ∼ N(0, σ²)
▶ The normal regression model holds when (Y , X ) are jointly normally distributed.
▶ For notational convenience, X contains the intercept.

9 / 29
Normal Regression Model
▶ The normal regression model implies that the conditional density of Y given X
  takes the form

      f(y | x) = (2πσ²)^(−1/2) exp(−(y − x′β)²/(2σ²))

▶ Under the assumption that the observations are mutually independent, this implies
  that the conditional density of (Y1, . . . , Yn) given (X1, . . . , Xn) is

      f(y1, . . . , yn | x1, . . . , xn) = ∏ᵢ₌₁ⁿ f(yi | xi)
                                         = ∏ᵢ₌₁ⁿ (2πσ²)^(−1/2) exp(−(yi − xi′β)²/(2σ²))
                                         = (2πσ²)^(−n/2) exp(−(1/(2σ²)) ∑ᵢ₌₁ⁿ (yi − xi′β)²)
                                         ≝ Ln(β, σ²)
▶ This is called the likelihood function when evaluated at the sample data.
10 / 29
Normal Regression Model

▶ For convenience it is typical to work with the natural logarithm


      log Ln(β, σ²) = −(n/2) log(2πσ²) − (1/(2σ²)) ∑ᵢ₌₁ⁿ (Yi − Xi′β)² ≝ ℓn(β, σ²)

  which is called the log-likelihood function.

▶ The maximum likelihood estimator (MLE) (β̂_mle, σ̂²_mle) is the value which
  maximizes the log-likelihood.
▶ We can write the maximization problem as

      (β̂_mle, σ̂²_mle) = argmax_{β ∈ R^k, σ² > 0} ℓn(β, σ²)

11 / 29
Normal Regression Model
 
▶ The maximizers (β̂_mle, σ̂²_mle) jointly solve the first-order conditions (FOC)

      0 = ∂ℓn(β, σ²)/∂β  |_(β̂_mle, σ̂²_mle) = (1/σ̂²_mle) ∑ᵢ₌₁ⁿ Xi (Yi − Xi′β̂_mle)

      0 = ∂ℓn(β, σ²)/∂σ² |_(β̂_mle, σ̂²_mle) = −n/(2σ̂²_mle) + (1/(2σ̂⁴_mle)) ∑ᵢ₌₁ⁿ (Yi − Xi′β̂_mle)²

▶ The first FOC is proportional to the first-order conditions for the least squares
  minimization problem. It follows that the MLE satisfies

      β̂_mle = (∑ᵢ₌₁ⁿ Xi Xi′)⁻¹ (∑ᵢ₌₁ⁿ Xi Yi) = β̂_ols

▶ Solving the second FOC for σ̂²_mle we find

      σ̂²_mle = (1/n) ∑ᵢ₌₁ⁿ (Yi − Xi′β̂_mle)² = (1/n) ∑ᵢ₌₁ⁿ (Yi − Xi′β̂_ols)² = (1/n) ∑ᵢ₌₁ⁿ êᵢ² = σ̂²_ols

12 / 29
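
The equality of the MLE and OLS solutions can be verified numerically by maximizing the log-likelihood directly. A minimal Python sketch (assuming numpy and scipy are available; the design matrix, coefficient values, and noise level are arbitrary choices for illustration):

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    n, k = 200, 3
    X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])  # includes intercept
    beta_true = np.array([1.0, 2.0, -0.5])
    y = X @ beta_true + rng.standard_normal(n)          # sigma = 1

    # Negative log-likelihood in (beta, log sigma^2); the log-parameterization keeps sigma^2 > 0.
    def neg_loglik(theta):
        beta, log_s2 = theta[:k], theta[k]
        s2 = np.exp(log_s2)
        resid = y - X @ beta
        return 0.5 * n * np.log(2 * np.pi * s2) + 0.5 * resid @ resid / s2

    opt = minimize(neg_loglik, x0=np.zeros(k + 1), method="BFGS")
    beta_mle, s2_mle = opt.x[:k], np.exp(opt.x[k])

    beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_ols
    print(beta_mle, beta_ols)            # numerically identical
    print(s2_mle, resid @ resid / n)     # the MLE of sigma^2 divides by n, not n - k
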
Distribution of OLS Coefficient Vector

▶ In the normal linear regression model we can derive exact sampling distributions
for the OLS/MLE estimator, residuals, and variance estimator.
▶ The normality assumption e | X ∼ N(0, σ²) combined with independence of the
  observations has the multivariate implication

      e | X ∼ N(0, In σ²)
▶ That is, the error vector e is independent of X and is normally distributed.


▶ Recall that the OLS estimator satisfies

      β̂ = (X′X)⁻¹X′Y = (X′X)⁻¹X′(Xβ + e),   so   β̂ − β = (X′X)⁻¹X′e,

  which is a linear function of e.

13 / 29
Distribution of OLS Coefficient Vector
▶ Since linear functions of normals are also normal, this implies that conditional
  on X

      β̂ − β | X ∼ (X′X)⁻¹X′ N(0, In σ²)
                 ∼ N(0, σ²(X′X)⁻¹X′X(X′X)⁻¹)
                 = N(0, σ²(X′X)⁻¹)

▶ In the normal regression model,

      β̂ | X ∼ N(β, σ²(X′X)⁻¹)

▶ Letting βj and β̂j denote the j-th elements of β and β̂, we have

      β̂j | X ∼ N(βj, σ²[(X′X)⁻¹]jj)

14 / 29
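
With X held fixed across replications, this exact conditional normality can be illustrated by Monte Carlo. A minimal Python sketch (assuming numpy and scipy; the design and parameter values are arbitrary):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n, sigma = 50, 1.0
    X = np.column_stack([np.ones(n), rng.standard_normal(n)])   # fixed design
    beta = np.array([1.0, 0.5])
    XtX_inv = np.linalg.inv(X.T @ X)

    reps = 20_000
    bhat1 = np.empty(reps)
    for r in range(reps):
        e = sigma * rng.standard_normal(n)                       # e | X ~ N(0, sigma^2 I_n)
        bhat = np.linalg.solve(X.T @ X, X.T @ (X @ beta + e))
        bhat1[r] = bhat[1]

    # Standardize by the exact conditional standard deviation and compare with N(0, 1).
    z = (bhat1 - beta[1]) / (sigma * np.sqrt(XtX_inv[1, 1]))
    print(stats.kstest(z, "norm"))   # p-value should not be small: consistent with exact normality
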
Distribution of OLS Residual Vector
▶ Recall that ê = Me where M = In − X(X′X)⁻¹X′. So conditional on X

      ê = Me | X ∼ N(0, σ²MM′) = N(0, σ²M)

▶ Furthermore, it is useful to find the joint distribution of β̂ and ê. Both are
  linear functions of e:

      β̂ − β = (X′X)⁻¹X′e   and   ê = Me,

  so the stacked vector (β̂ − β, ê) equals Be, where B stacks (X′X)⁻¹X′ on top of M.
▶ The stacked vector therefore has a joint normal distribution with covariance
  matrix σ²BB′, which is block diagonal:

      [ σ²(X′X)⁻¹     0   ]
      [      0       σ²M  ]

  The off-diagonal block is zero because (X′X)⁻¹X′M = 0 (recall that MX = 0).
▶ Since the off-diagonal block is zero it follows that β̂ and ê are statistically
  independent.
15 / 29
Distribution of Variance Estimator

▶ Next, consider the variance estimator s².


▶ It satisfies (n − k)s² = ê′ê = e′Me. The spectral decomposition of M is
  M = HΛH′ where H′H = In and Λ is diagonal with the eigenvalues of M on the
  diagonal.
▶ Since M is idempotent with rank n − k, it has n − k eigenvalues equal to 1 and k
  eigenvalues equal to 0, so

      Λ = [ In−k   0  ]
          [   0    0k ]

16 / 29
Distribution of Variance Estimator

▶ Let u = H′e ∼ N(0, In σ²) and partition u = (u1′, u2′)′ where u1 ∼ N(0, In−k σ²).
  Then

      (n − k)s² = e′Me
                = e′H [ In−k 0 ; 0 0 ] H′e
                = u′ [ In−k 0 ; 0 0 ] u
                = u1′u1
                ∼ σ²χ²_{n−k}
▶ We see that in the normal regression model the exact distribution of s² is a
  scaled chi-square. Since ê is independent of β̂, it follows that s² is independent
  of β̂ as well.

17 / 29
t-statistic
▶ We already know that β̂j | X ∼ N(βj, σ²[(X′X)⁻¹]jj). So

      (β̂j − βj) / √(σ²[(X′X)⁻¹]jj) ∼ N(0, 1)

▶ Now take the standardized statistic and replace the unknown variance σ² with its
  estimator s². We call this a t-ratio or t-statistic

      T = (β̂j − βj) / √(s²[(X′X)⁻¹]jj) = (β̂j − βj) / s(β̂j)

  where s(β̂j) is the classical (homoskedastic) standard error for β̂j.

18 / 29
t-statistic
▶ With algebraic re-scaling we can write the t-statistic as the ratio of the
  standardized statistic and the square root of the scaled variance estimator:

      T = [(β̂j − βj) / √(σ²[(X′X)⁻¹]jj)] / √[((n − k)s²/σ²) / (n − k)]
        ∼ N(0, 1) / √(χ²_{n−k} / (n − k))
        ∼ t_{n−k}

  a Student's t distribution with n − k degrees of freedom.


▶ This derivation shows that the t-ratio has a sampling distribution which depends
only on the quantity n − k.

19 / 29
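
A Monte Carlo check that the classical t-ratio is exactly t_{n−k} under normal errors. A minimal Python sketch (assuming numpy and scipy; n is kept small so that t_{n−k} and N(0, 1) differ visibly):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n, k = 12, 3
    X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])   # fixed design
    beta = np.zeros(k)
    XtX_inv = np.linalg.inv(X.T @ X)

    reps = 20_000
    T = np.empty(reps)
    for r in range(reps):
        y = X @ beta + rng.standard_normal(n)       # sigma = 1
        bhat = np.linalg.solve(X.T @ X, X.T @ y)
        ehat = y - X @ bhat
        s2 = ehat @ ehat / (n - k)                  # classical variance estimator
        se = np.sqrt(s2 * XtX_inv[1, 1])            # homoskedastic standard error
        T[r] = (bhat[1] - beta[1]) / se

    print(stats.kstest(T, "t", args=(n - k,)))      # consistent with the t_{n-k} distribution
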
t-statistic

▶ An important caveat about the above result is that it only applies to the
  t-statistic constructed with the homoskedastic (old-fashioned) standard error.
▶ It does not apply to a t-statistic constructed with any of the
  heteroskedasticity-robust standard errors.
▶ In fact, the robust t-statistics can have finite-sample distributions which deviate
  considerably from t_{n−k} even when the regression errors are independent N(0, σ²).
▶ Thus the distributional result above, and the use of the t distribution in finite
  samples, is exact only when applied to classical t-statistics under the normality
  assumption.

20 / 29
Confidence Intervals for Regression Coefficients

▶ The OLS estimator β̂ is a point estimator for a coefficient β.


▶ A broader concept is a set or interval estimator, which takes the form Ĉ = [L̂, Û].
▶ The goal of an interval estimator Ĉ is to contain the true value, e.g. β ∈ Ĉ, with
  high probability.
▶ The interval estimator Ĉ is a function of the data and hence is random.

21 / 29
Confidence Intervals for Regression Coefficients

▶ An interval estimator Ĉ is called a 1 − α confidence interval when
  P[β ∈ Ĉ] = 1 − α for a selected value of α.
▶ The value 1 − α is called the coverage probability. Typical choices for the
coverage probability 1 − α are 0.95 or 0.90.
▶ The probability calculation P[β ∈ Ĉ] is easily misinterpreted as treating β as
  random and Ĉ as fixed ("the probability that β is in Ĉ").
▶ This is not the appropriate interpretation. Instead, the correct interpretation is
  that the probability P[β ∈ Ĉ] treats the point β as fixed and the set Ĉ as random.
  It is the probability that the random set Ĉ covers (or contains) the fixed true
  coefficient β.

22 / 29
Confidence Intervals for Regression Coefficients

▶ A good choice for a confidence interval for the regression coefficient β is obtained
  by adding and subtracting from the estimator β̂ a fixed multiple of its standard
  error:

      Ĉ = [β̂ − c × s(β̂),  β̂ + c × s(β̂)]

  where c > 0 is a pre-specified constant which determines the coverage probability.

▶ This confidence interval is symmetric about the point estimator β̂, and its length
  is proportional to the standard error s(β̂).

23 / 29
Confidence Intervals for Regression Coefficients
▶ Equivalently, Ĉ is the set of parameter values for β such that the t-statistic T(β)
  is smaller (in absolute value) than c, that is

      Ĉ = {β : |T(β)| ≤ c} = {β : −c ≤ (β̂ − β)/s(β̂) ≤ c}

▶ The coverage probability of this confidence interval is

      P[β ∈ Ĉ] = P[|T(β)| ≤ c] = P[−c ≤ T(β) ≤ c]

▶ Since the t-statistic T(β) has the t_{n−k} distribution, this probability equals
  F(c) − F(−c), where F(u) is the Student's t distribution function with n − k
  degrees of freedom.
▶ Since F(−c) = 1 − F(c), we can write it as

      P[β ∈ Ĉ] = 2F(c) − 1

  This is the coverage probability of the interval Ĉ, and it depends only on the
  constant c.
24 / 29
Confidence Intervals for Regression Coefficients

▶ When the degree of freedom is large the distinction between the student t and the
normal distribution is negligible.
▶ In particular, for n − k ≥ 61 we have c ≈ 2.00 for a 95% interval.
▶ Using this value we obtain the most commonly used confidence interval in applied
econometric practice:
      Ĉ = [β̂ − 2s(β̂),  β̂ + 2s(β̂)]

▶ This is a useful rule-of-thumb. This 95% confidence interval Ĉ is simple to
  compute and can be easily calculated from coefficient estimates and standard
  errors.

25 / 29
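
In code, the exact interval uses the t_{n−k} quantile in place of the rule-of-thumb value 2. A minimal Python sketch (assuming numpy and scipy; the data are simulated purely for illustration):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n, k, alpha = 40, 2, 0.05
    X = np.column_stack([np.ones(n), rng.standard_normal(n)])
    y = X @ np.array([1.0, 0.5]) + rng.standard_normal(n)

    bhat = np.linalg.solve(X.T @ X, X.T @ y)
    ehat = y - X @ bhat
    s2 = ehat @ ehat / (n - k)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])       # classical standard error

    c = stats.t.ppf(1 - alpha / 2, df=n - k)               # 1 - alpha/2 quantile of t_{n-k}
    ci = (bhat[1] - c * se, bhat[1] + c * se)
    print(c, ci)                                           # c is close to 2 when n - k is large
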
t Test
▶ A typical goal in an econometric exercise is to assess whether or not a coefficient
β equals a specific value β0 .
▶ Often the specific value to be tested is β0 = 0 but this is not essential. This is
called hypothesis testing.
▶ For simplicity write the coefficient to be tested as β. The null hypothesis is

H0 : β = β0

▶ This states that the hypothesis is that the true value of β equals the hypothesized
value β0 .
▶ The alternative hypothesis is the complement of H0 , and is written as

H1 : β ̸= β0

26 / 29
t Test
▶ We are interested in testing H0 against H1 .
▶ The method is to design a statistic which is informative about H1 and to
characterize its sampling distribution.
▶ The standard statistic is the absolute value of the t-statistic

      |T| = |β̂ − β0| / s(β̂)

▶ If H0 is true then we expect |T | to be small, but if H1 is true then we would


expect |T | to be large.
▶ Hence the standard rule is to reject H0 in favor of H1 for large values of the
t-statistic |T | and otherwise fail to reject H0 . Thus the hypothesis test takes the
form: Reject H0 if |T | > c.

27 / 29
t Test
▶ The constant c which appears in the statement of the test is called the critical
value.
▶ The probability of a false rejection is

      P[Reject H0 | H0] = P[|T| > c | H0]
                        = P[T > c | H0] + P[T < −c | H0]
                        = 1 − F(c) + F(−c)
                        = 2(1 − F(c))
▶ We select the value c so that this probability equals a pre-selected value called the
significance level which is typically written as α.
▶ It is conventional to set α = 0.05, though this is not a hard rule. We then select c
so that F (c) = 1 − α/2, which means that c is the 1 − α/2 quantile (inverse
CDF) of the tn−k distribution.
▶ With this choice the decision rule “Reject H0 if |T | > c ” has a significance level
(false rejection probability) of α.

28 / 29
t Test

▶ A simplification of the above test is to report what is known as the p-value of the
test.
▶ In general, when a test takes the form "Reject H0 if S > c" and S has null
  distribution G(u), then the p-value of the test is p = 1 − G(S).
▶ A test with significance level α can be restated as "Reject H0 if p < α".
▶ It is sufficient to report the p-value p and we can interpret the value of p as
indexing the test’s strength of rejection of the null hypothesis.
▶ Thus a p-value of 0.07 might be interpreted as ”nearly significant”, 0.05 as
”borderline significant”, and 0.001 as ”highly significant”.
▶ In the context of the normal regression model, the p-value of a t-statistic |T| is
  p = 2(1 − F_{n−k}(|T|)), where F_{n−k} is the t_{n−k} CDF.

29 / 29
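
Putting the test together, the p-value p = 2(1 − F_{n−k}(|T|)) and the rejection decision can be computed as follows. A minimal Python sketch (assuming numpy and scipy; the simulated data and the tested value β0 = 0 are illustrative, and since the data are generated with β1 = 0.5, a small p-value is expected):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n, k, beta0 = 40, 2, 0.0                # test H0: beta_1 = beta0 = 0
    X = np.column_stack([np.ones(n), rng.standard_normal(n)])
    y = X @ np.array([1.0, 0.5]) + rng.standard_normal(n)

    bhat = np.linalg.solve(X.T @ X, X.T @ y)
    ehat = y - X @ bhat
    se = np.sqrt(ehat @ ehat / (n - k) * np.linalg.inv(X.T @ X)[1, 1])

    T = abs(bhat[1] - beta0) / se
    p = 2 * (1 - stats.t.cdf(T, df=n - k))  # p-value from the t_{n-k} CDF
    reject = p < 0.05                        # decision at the 5% significance level
    print(T, p, reject)
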
