Lecture 4
Zhian Hu
Fall 2024
The Normal Distribution

▶ A random variable X has the normal distribution, written X ∼ N(µ, σ²), if it has the density

  f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right), \qquad -\infty < x < \infty

▶ The expectation and variance of X are µ and σ², respectively.
Multivariate Normal Distribution

▶ A random k-vector X has the multivariate normal distribution, written X ∼ N(µ, Σ), if it has the density

  f(x) = \frac{1}{(2\pi)^{k/2} \det(\Sigma)^{1/2}} \exp\left( -\frac{(x - \mu)' \Sigma^{-1} (x - \mu)}{2} \right), \qquad x \in \mathbb{R}^k

▶ The expectation and covariance matrix of X are µ and Σ, respectively.
▶ If X ∼ N(µ, Σ) and Y = a + BX, then Y ∼ N(a + Bµ, BΣB′).
Properties of Multivariate Normal Distribution
▶ If (X, Y) are multivariate normal, X and Y are uncorrelated if and only if they are independent.
▶ If X ∼ N(0, I_k) then X′X ∼ χ²_k, chi-square with k degrees of freedom.
▶ If X ∼ N(0, Σ) with Σ > 0 then X′Σ⁻¹X ∼ χ²_k where k = dim(X) (see the simulation sketch after this list).
▶ If Z ∼ N(0, 1) and Q ∼ χ²_k are independent then Z/√(Q/k) ∼ t_k, student t with k degrees of freedom.
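As a concrete check of the quadratic-form property referenced above, here is a minimal simulation sketch. It is not from the lecture; the particular Σ, the seed, and all variable names are my own illustrative choices, and it assumes NumPy and SciPy are available.

```python
# Simulation check: if X ~ N(0, Sigma) with Sigma > 0, then X' Sigma^{-1} X ~ chi^2_k.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k = 3
A = rng.normal(size=(k, k))
Sigma = A @ A.T + k * np.eye(k)                 # an arbitrary positive definite covariance
Sigma_inv = np.linalg.inv(Sigma)

X = rng.multivariate_normal(np.zeros(k), Sigma, size=100_000)
Q = np.einsum("ij,jk,ik->i", X, Sigma_inv, X)   # X_i' Sigma^{-1} X_i for each draw

# Simulated quantiles should agree closely with the chi^2_k quantiles.
for p in (0.5, 0.9, 0.95):
    print(p, np.quantile(Q, p), stats.chi2.ppf(p, df=k))
```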
Joint Normality and Linear Regression
▶ Suppose the variables (Y, X) are jointly normally distributed. Consider the best linear predictor of Y given X:

  Y = X'\beta + \alpha + e

▶ Since e is a linear function of (Y, X), the pair (e, X) is jointly normal, and since e and X are uncorrelated by construction, e is independent of X. Hence

  E[e \mid X] = E[e] = 0 \quad \text{and} \quad E[e^2 \mid X] = E[e^2] = \sigma^2
Normal Regression Model
▶ The normal regression model is the linear regression model with an independent normal error:

  Y = X'\beta + e, \qquad e \sim N(0, \sigma^2), \qquad e \text{ independent of } X

▶ The normal regression model holds when (Y, X) are jointly normally distributed.
▶ For notational convenience, X contains the intercept.
Normal Regression Model
▶ The normal regression model implies that the conditional density of Y given X takes the form

  f(y \mid x) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left( -\frac{1}{2\sigma^2} (y - x'\beta)^2 \right)

▶ Under the assumption that the observations are mutually independent, this implies that the conditional density of (Y_1, ..., Y_n) given (X_1, ..., X_n) is

  f(y_1, \ldots, y_n \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} f(y_i \mid x_i)
    = \prod_{i=1}^{n} \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left( -\frac{1}{2\sigma^2} \left( y_i - x_i'\beta \right)^2 \right)
    = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} \left( y_i - x_i'\beta \right)^2 \right)
    \overset{\text{def}}{=} L_n(\beta, \sigma^2)

▶ This is called the likelihood function when evaluated at the sample data.
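To make the formula concrete, here is a short sketch that evaluates the log of L_n directly. The function name and argument conventions are my own, not code from the lecture.

```python
# log L_n(beta, sigma^2) for the normal regression model Y = X'beta + e.
import numpy as np

def log_likelihood(beta, sigma2, y, X):
    """y is the n-vector of outcomes; X is the n x k regressor matrix
    (with the intercept included as a column, as in the slides)."""
    n = len(y)
    resid = y - X @ beta
    return -n / 2 * np.log(2 * np.pi * sigma2) - (resid @ resid) / (2 * sigma2)
```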
Normal Regression Model

▶ The maximum likelihood estimators (β̂_mle, σ̂²_mle) maximize the log-likelihood

  \ell_n(\beta, \sigma^2) = \log L_n(\beta, \sigma^2) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left( y_i - x_i'\beta \right)^2
Normal Regression Model
▶ The maximizers (β̂_mle, σ̂²_mle) jointly solve the first-order conditions (FOC)

  0 = \frac{\partial}{\partial \beta} \ell_n(\beta, \sigma^2) \Big|_{\beta = \hat\beta_{mle}, \, \sigma^2 = \hat\sigma^2_{mle}} = \frac{1}{\hat\sigma^2_{mle}} \sum_{i=1}^{n} X_i \left( Y_i - X_i' \hat\beta_{mle} \right)

  0 = \frac{\partial}{\partial \sigma^2} \ell_n(\beta, \sigma^2) \Big|_{\beta = \hat\beta_{mle}, \, \sigma^2 = \hat\sigma^2_{mle}} = -\frac{n}{2 \hat\sigma^2_{mle}} + \frac{1}{2 \hat\sigma^4_{mle}} \sum_{i=1}^{n} \left( Y_i - X_i' \hat\beta_{mle} \right)^2

▶ The first FOC is proportional to the first-order conditions for the least squares minimization problem. It follows that the MLE satisfies

  \hat\beta_{mle} = \left( \sum_{i=1}^{n} X_i X_i' \right)^{-1} \left( \sum_{i=1}^{n} X_i Y_i \right) = \hat\beta_{ols}
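The equivalence β̂_mle = β̂_ols can be verified numerically. The sketch below maximizes ℓ_n with a generic optimizer and compares the result with the closed-form OLS solution; the simulated data, seed, and parameter values are illustrative assumptions.

```python
# Check that maximizing the normal-regression log-likelihood reproduces OLS.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # intercept in X
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=1.5, size=n)

def neg_ll(theta):
    beta, log_sigma2 = theta[:-1], theta[-1]    # parameterize sigma^2 > 0 via its log
    sigma2 = np.exp(log_sigma2)
    resid = y - X @ beta
    return n / 2 * np.log(2 * np.pi * sigma2) + (resid @ resid) / (2 * sigma2)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
fit = minimize(neg_ll, x0=np.zeros(k + 1), method="BFGS")
print(beta_ols, fit.x[:-1])                     # agree up to optimizer tolerance
```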
Distribution of OLS Coefficient Vector
▶ In the normal linear regression model we can derive exact sampling distributions for the OLS/MLE estimator, residuals, and variance estimator.
▶ The normality assumption e | X ∼ N(0, σ²), combined with independence of the observations, implies that the n-vector of errors satisfies

  e \mid X \sim N(0, I_n \sigma^2)
Distribution of OLS Coefficient Vector
▶ Since linear functions of normals are also normal, this implies that conditional on X,

  \hat\beta - \beta \mid X \sim (X'X)^{-1} X' \, N(0, I_n \sigma^2)
    \sim N\left( 0, \; \sigma^2 (X'X)^{-1} X'X (X'X)^{-1} \right)
    = N\left( 0, \; \sigma^2 (X'X)^{-1} \right)
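This is an exact finite-sample statement, not an asymptotic one, and it can be checked by holding X fixed and redrawing normal errors. Everything below (design, sample size, parameters) is a made-up illustration.

```python
# With X fixed and normal errors, beta_hat_j is exactly N(beta_j, sigma^2 [(X'X)^{-1}]_{jj}).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, sigma = 30, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # design held fixed across draws
beta = np.array([1.0, 0.5])
XtX_inv = np.linalg.inv(X.T @ X)
P = XtX_inv @ X.T                                       # maps y to the OLS coefficients

draws = np.empty(20_000)
for r in range(draws.size):
    e = rng.normal(scale=sigma, size=n)                 # exact normal errors
    draws[r] = (P @ (X @ beta + e))[1]                  # slope estimate

sd = sigma * np.sqrt(XtX_inv[1, 1])                     # exact standard deviation
print(draws.std(), sd)                                  # should agree closely
print(stats.kstest((draws - beta[1]) / sd, "norm").pvalue)  # normality not rejected
```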
Distribution of OLS Residual Vector
▶ Recall that ê = Me where M = I_n − X(X′X)⁻¹X′ is idempotent. So conditional on X,

  \hat{e} = Me \mid X \sim N(0, \sigma^2 MM) = N(0, \sigma^2 M)

▶ Moreover, β̂ − β and ê are jointly normal conditional on X, since both are linear functions of the same normal vector e:

  \begin{pmatrix} \hat\beta - \beta \\ \hat{e} \end{pmatrix} = \begin{pmatrix} (X'X)^{-1} X' \\ M \end{pmatrix} e

  Their covariance is σ²(X′X)⁻¹X′M = 0 because X′M = 0, so β̂ and ê are independent.
Distribution of Variance Estimator
▶ Write the spectral decomposition of the idempotent matrix M as

  M = H \begin{pmatrix} I_{n-k} & 0 \\ 0 & 0 \end{pmatrix} H'

  where H is orthonormal, and set u = H′e, so that u | X ∼ N(0, σ²I_n). Partition u = (u_1, u_2) with u_1 the first n − k elements. Then

  (n-k) s^2 = \hat{e}'\hat{e} = e'Me
    = e' H \begin{pmatrix} I_{n-k} & 0 \\ 0 & 0 \end{pmatrix} H' e
    = u' \begin{pmatrix} I_{n-k} & 0 \\ 0 & 0 \end{pmatrix} u
    = u_1' u_1
    \sim \sigma^2 \chi^2_{n-k}

▶ We see that in the normal regression model the exact distribution of s² is a scaled chi-square. Since ê is independent of β̂, it follows that s² is independent of β̂ as well.
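A quick simulation sketch of the scaled-chi-square result; the design and sample size below are illustrative assumptions, not values from the lecture.

```python
# Check that (n - k) s^2 / sigma^2 follows chi^2_{n-k} exactly under normal errors.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, k, sigma = 25, 3, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)       # annihilator matrix

stat = np.empty(20_000)
for r in range(stat.size):
    e = rng.normal(scale=sigma, size=n)
    ehat = M @ e                                        # residuals depend only on e
    stat[r] = (ehat @ ehat) / sigma**2                  # = (n - k) s^2 / sigma^2

print(np.quantile(stat, [0.5, 0.95]))
print(stats.chi2.ppf([0.5, 0.95], df=n - k))            # should match closely
```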
t-statistic
▶ We already know that β̂_j | X ∼ N(β_j, σ²[(X′X)⁻¹]_jj). So

  \frac{\hat\beta_j - \beta_j}{\sqrt{\sigma^2 \left[ (X'X)^{-1} \right]_{jj}}} \sim N(0, 1)

▶ Now take the standardized statistic and replace the unknown variance σ² with its estimator s². We call this a t-ratio or t-statistic:

  T = \frac{\hat\beta_j - \beta_j}{\sqrt{s^2 \left[ (X'X)^{-1} \right]_{jj}}} = \frac{\hat\beta_j - \beta_j}{s(\hat\beta_j)}

  where s(β̂_j) is the classical (homoskedastic) standard error for β̂_j.
t-statistic
▶ With algebraic re-scaling we can write the t-statistic as the ratio of the standardized statistic and the square root of the scaled variance estimator. Since the numerator is N(0, 1), the denominator is an independent scaled chi-square (using the independence of β̂ and s² established above), we obtain

  T = \frac{\hat\beta_j - \beta_j}{\sqrt{\sigma^2 \left[ (X'X)^{-1} \right]_{jj}}} \Bigg/ \sqrt{ \frac{(n-k) s^2}{\sigma^2} \Big/ (n-k) }
    \sim \frac{N(0,1)}{\sqrt{\chi^2_{n-k} / (n-k)}}
    \sim t_{n-k}
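A short Monte Carlo sketch of this exact t_{n−k} result, with n deliberately small so the difference from N(0, 1) critical values is visible. The data-generating choices are illustrative assumptions.

```python
# With normal errors and the homoskedastic standard error, T has an exact t_{n-k} law.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, k = 15, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
XtX_inv = np.linalg.inv(X.T @ X)
beta = np.array([0.3, -1.0])

T = np.empty(20_000)
for r in range(T.size):
    y = X @ beta + rng.normal(size=n)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = (resid @ resid) / (n - k)
    T[r] = (b[1] - beta[1]) / np.sqrt(s2 * XtX_inv[1, 1])

print(np.quantile(np.abs(T), 0.95))                     # simulated 95% critical value
print(stats.t.ppf(0.975, df=n - k))                     # exact t_{n-k} value (about 2.16)
```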
t-statistic
▶ An important caveat about the above theorem is that it only applies to the
t-statistic constructed with the homoskedastic (old-fashioned) standard error.
▶ It does not apply to a t-statistic constructed with any of the
heteroskedasticity-robust standard errors.
▶ In fact, the robust t-statistics can have finite sample distributions which deviate considerably from t_{n−k} even when the regression errors are independent N(0, σ²).
▶ Thus the distributional result in the above theorem and the use of the t
distribution in finite samples is only exact when applied to classical t-statistics
under the normality assumption.
Confidence Intervals for Regression Coefficients
▶ A good choice for a confidence interval for the regression coefficient β is obtained by adding and subtracting from the estimator β̂ a fixed multiple of its standard error:

  \hat{C} = \left[ \hat\beta - c \times s(\hat\beta), \;\; \hat\beta + c \times s(\hat\beta) \right]
Confidence Intervals for Regression Coefficients
▶ Equivalently, Ĉ is the set of parameter values for β such that the t-statistic T(β) is smaller (in absolute value) than c, that is

  \hat{C} = \{ \beta : |T(\beta)| \le c \} = \left\{ \beta : -c \le \frac{\hat\beta - \beta}{s(\hat\beta)} \le c \right\}

▶ The coverage probability of this confidence interval is

  P[\beta \in \hat{C}] = P[|T(\beta)| \le c] = P[-c \le T(\beta) \le c]

▶ Since the t-statistic T(β) has the t_{n−k} distribution, this probability equals F(c) − F(−c), where F(u) is the student t distribution function with n − k degrees of freedom.
▶ Since F(−c) = 1 − F(c), we can write it as

  P[\beta \in \hat{C}] = 2F(c) - 1

  This is the coverage probability of the interval Ĉ, and it depends only on the constant c.
Confidence Intervals for Regression Coefficients
▶ When the degrees of freedom are large, the distinction between the student t and the normal distribution is negligible.
▶ In particular, for n − k ≥ 61 we have c ≈ 2.00 for a 95% interval.
▶ Using this value we obtain the most commonly used confidence interval in applied econometric practice:

  \hat{C} = \left[ \hat\beta - 2 s(\hat\beta), \;\; \hat\beta + 2 s(\hat\beta) \right]

▶ This is a useful rule of thumb. This 95% confidence interval Ĉ is simple to compute and can be easily calculated from coefficient estimates and standard errors.
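In code, the exact interval simply replaces the constant 2 with the appropriate t quantile. The estimate, standard error, and dimensions below are made-up inputs for illustration.

```python
# Exact 95% confidence interval from a coefficient estimate and its classical SE.
from scipy import stats

beta_hat, se, n, k = 1.37, 0.24, 100, 4
c = stats.t.ppf(0.975, df=n - k)        # 1 - alpha/2 quantile, alpha = 0.05
print(c)                                # about 1.985; the rule of thumb rounds to 2
print((beta_hat - c * se, beta_hat + c * se))
```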
t Test
▶ A typical goal in an econometric exercise is to assess whether or not a coefficient β equals a specific value β0. This is called hypothesis testing.
▶ Often the specific value to be tested is β0 = 0, but this is not essential.
▶ For simplicity write the coefficient to be tested as β. The null hypothesis is

  H_0 : \beta = \beta_0

▶ This states the hypothesis that the true value of β equals the hypothesized value β0.
▶ The alternative hypothesis is the complement of H0, and is written as

  H_1 : \beta \neq \beta_0
t Test
▶ We are interested in testing H0 against H1 .
▶ The method is to design a statistic which is informative about H1 and to
characterize its sampling distribution.
▶ The standard statistic is the absolute value of the t-statistic:

  |T| = \left| \frac{\hat\beta - \beta_0}{s(\hat\beta)} \right|
t Test
▶ The constant c which appears in the statement of the test is called the critical value. The probability of a false rejection, rejecting H0 when it is true, is

  P[\text{Reject } H_0 \mid H_0] = P[|T| > c \mid H_0]
    = P[T > c \mid H_0] + P[T < -c \mid H_0]
    = 1 - F(c) + F(-c)
    = 2(1 - F(c))
▶ We select the value c so that this probability equals a pre-selected value called the
significance level which is typically written as α.
▶ It is conventional to set α = 0.05, though this is not a hard rule. We then select c so that F(c) = 1 − α/2, which means that c is the 1 − α/2 quantile (inverse CDF) of the t_{n−k} distribution.
▶ With this choice the decision rule "Reject H0 if |T| > c" has a significance level (false rejection probability) of α.
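For example, the critical value can be computed directly from the t quantile function; the degrees of freedom here are an illustrative assumption.

```python
# Choose c with F(c) = 1 - alpha/2 so the two-sided t test has significance level alpha.
from scipy import stats

alpha, df = 0.05, 27                    # df = n - k
c = stats.t.ppf(1 - alpha / 2, df)
print(c)                                # reject H0 when |T| > c
print(2 * (1 - stats.t.cdf(c, df)))     # sanity check: 2(1 - F(c)) recovers alpha
```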
t Test
▶ A simplification of the above test is to report what is known as the p-value of the
test.
▶ In general, when a test takes the form "Reject H0 if S > c" and S has null distribution G(u), the p-value of the test is p = 1 − G(S).
▶ A test with significance level α can be restated as "Reject H0 if p < α".
▶ It is sufficient to report the p-value p, and we can interpret the value of p as indexing the test's strength of rejection of the null hypothesis.
▶ Thus a p-value of 0.07 might be interpreted as "nearly significant", 0.05 as "borderline significant", and 0.001 as "highly significant".
▶ In the context of the normal regression model, the p-value of a t-statistic |T| is p = 2(1 − F_{n−k}(|T|)), where F_{n−k} is the t_{n−k} CDF.
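In code (the value of |T| and the degrees of freedom below are made-up inputs):

```python
# p-value of a two-sided t test: p = 2(1 - F_{n-k}(|T|)).
from scipy import stats

T, df = 2.1, 40                         # illustrative t-statistic and df = n - k
p = 2 * (1 - stats.t.cdf(abs(T), df))
print(p)                                # reject at level alpha when p < alpha
```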