
Lecture 3
Introduction: Inference in Simple Linear Regression

Dragos Radu
[email protected]
5SSMN932: Introduction to Econometrics

outline lecture 3

• the OLS assumptions (continued)
• sampling distribution of the OLS estimators
• tests of significance: is the slope different from zero?
• heteroskedasticity and homoskedasticity
• the Gauss-Markov theorem

Recommended readings:
Stock and Watson, sub-chapters 4.5 and 5.1-5.4.

Examples (separate videos):
• interpretation of OLS coefficients
• hypothesis tests
(Stata output and graphical intuition)

the big picture: a five-step waltz

overview: where are we going from here?

• we want to learn about the slope of the population regression line.
• we have data from a sample, so there is sampling uncertainty.

There are five steps towards this goal:
1 State the population object of interest.
2 Provide an estimator of this population object.
3 Derive the sampling distribution of the estimator (this requires certain assumptions about the distribution of the data).
4 The square root of the estimated variance of the sampling distribution is the standard error (SE) of the estimator.
5 Use the SE to construct t-statistics (for hypothesis tests) and confidence intervals.

object of interest: β1

steps 1 and 2

• Yi = β0 + β1·Xi + ui,  i = 1, ..., n
• β1 = slope of the population regression line

Estimator: the OLS estimator β̂1.
Sampling distribution of β̂1: since the population regression line is E(Yi|Xi) = β0 + β1·Xi and E(ui|Xi) = 0, we need to also assume that:
• (Xi, Yi), i = 1, ..., n are independent, identically distributed draws.
• large outliers are unlikely.

OLS assumptions

1 The error term has conditional mean zero given Xi: E(ui|Xi) = 0.
2 (Xi, Yi), i = 1, ..., n are independent, identically distributed draws.
3 Large outliers are unlikely.

The i.i.d. and no-large-outliers conditions are our second and third OLS assumptions.
sampling distribution of β̂1

the estimates of β follow a probability distribution

• an estimator is a formula (e.g. OLS) for how to compute β̂1
• we use a sample of data to obtain an estimate: a value of β̂1 computed by the formula for a given sample
• each different sample from the same population will produce a different estimate of β
• the probability distribution of these β̂ values across different samples is called the sampling distribution of β̂

• we need to discuss the properties of this sampling distribution of β̂ based on the single sample that we have.
• but remember that the sampling distribution refers to different values of β̂ across different samples, not just within one.
• these β̂s are usually assumed to have a normal distribution, because the error term is normally distributed.

normal distribution: mean and variance
[figure slides: the normal sampling distribution of β̂, with its mean and variance marked]
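To make the idea of a sampling distribution concrete, here is a minimal simulation sketch in Stata (the program name olsdraw, the sample size, and the true coefficients are illustrative assumptions, not part of the lecture): it draws many samples from the same population, re-estimates the slope in each, and summarizes the resulting estimates.

* draw 1,000 samples of n = 100 and collect the OLS slope from each
clear all
set seed 12345
program define olsdraw, rclass
    drop _all
    set obs 100
    gen x = rnormal()
    gen y = 2 + 0.5*x + rnormal()   // true beta1 = 0.5
    regress y x
    return scalar b1 = _b[x]
end
simulate b1 = r(b1), reps(1000) nodots: olsdraw
summarize b1    // the mean of the estimates should be close to 0.5
histogram b1    // the histogram approximates the sampling distribution of beta1-hat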

bias and efficiency

An estimator β̂ is unbiased if its sampling distribution has as its expected value the true value of β: E(β̂) = β.

your turn: OLS assumptions

activity: compare four regressions

Open the file anscombe.dta, which is on KEATS. Follow these steps:

1 Run the regressions:
. reg y1 x1
. reg y2 x2
. reg y3 x3
. reg y4 x4
2 Compare the regression output (coefficients, R², SER). How different are the four models?
3 Now construct scatter plots for each of these models:
. graph twoway (scatter y1 x1)(lfit y1 x1)
. ...
. graph twoway (scatter y4 x4)(lfit y4 x4)

What do you think now about the differences between these models?
Hint: Do our OLS assumptions hold in all these models?
Lecture 3
Part I: The Sampling Distribution of OLS Estimators

Dragos Radu
[email protected]
5SSMN932: Introduction to Econometrics

outline lecture 3 part 1

• recap: sampling distribution of X̄
• the probability framework of linear regression
• sampling distribution of β̂1

What comes next:
In the next part (two) of this lecture we'll introduce hypothesis testing.

Practical example (separate video):
Revise the interpretation of regression coefficients (Stata and graphical intuition)

THE NORMAL DISTRIBUTION

probability density function of X:

f(X) = 1/(σ√(2π)) · exp( −(1/2)·((X − μ)/σ)² ),   X ~ N(μ, σ²)

[figure: the N(μ, σ²) density, horizontal axis marked from μ − 4σ to μ + 4σ]

THE DOUBLE STRUCTURE OF A SAMPLED RANDOM VARIABLE

Random variable X with unknown population mean μX.
Sample of n observations X1, X2, ..., Xn: potential distributions.

[figure: the density of X, centred on μX, repeated for each of X1, X2, ..., Xn]
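As a quick aside, the density in the figure can be drawn directly in Stata; this one-line sketch assumes the μ = 10, σ = 1 example used later in this lecture:

. twoway function y = normalden(x, 10, 1), range(6 14) xtitle("X") ytitle("f(X)")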

THE DOUBLE STRUCTURE OF A SAMPLED RANDOM VARIABLE

Random variable X with unknown population mean μX.
Actual sample of n observations x1, x2, ..., xn: a realization.

Estimator:  X̄ = (1/n)·(X1 + ... + Xn)
Estimate:   x̄ = (1/n)·(x1 + ... + xn)

The actual number that we obtain, given the realization {x1, …, xn}, is known as our estimate.

[figures: the density of X, with the realized values x1, x2, ..., xn shown against the potential distributions of X1, X2, ..., Xn]

Sampling distribution of X̄

[figure: simulated sampling distribution of X̄ for n = 1, based on 10 million samples, over the interval 0 to 1]
Sampling distribution of X̄

[figure: simulated sampling distributions of X̄ for n = 1, 10, 25 and 100, each based on 10 million samples; the distribution concentrates around μX as n grows]

THE DOUBLE STRUCTURE OF A SAMPLED RANDOM VARIABLE

Sample of n observations X1, X2, ..., Xn: potential distributions, each with variance σX².

Var(X̄) = Var[ (1/n)·(X1 + ... + Xn) ]
        = (1/n²)·Var(X1 + ... + Xn)
        = (1/n²)·[Var(X1) + ... + Var(Xn)]   (by independence)
        = (1/n²)·(σX² + ... + σX²)
        = (1/n²)·n·σX²
        = σX²/n

Thus we have demonstrated that the variance of the sample mean is equal to the variance of X divided by n, a result with which you will be familiar from your statistics course.

the sampling distribution of the OLS estimator

The OLS estimator is computed from a sample of data. A different sample yields a different value of β̂1. This is the source of the sampling uncertainty of β̂1. We want to:
• quantify the sampling uncertainty associated with β̂1
• use β̂1 to test hypotheses such as β1 = 0
• construct a confidence interval for β1

Two steps to get there:
1 probability framework for linear regression
2 distribution of the OLS estimator
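This result can also be checked by simulation; a minimal sketch (n = 25, σ = 2, and 5,000 replications are illustrative choices):

* the sd of the sample mean should be close to sigma/sqrt(n) = 2/5 = 0.4
clear all
set seed 42
set obs 5000                       // 5,000 samples, one per row
gen xbar = 0
forvalues j = 1/25 {               // build each sample mean from n = 25 draws
    gen x`j' = rnormal(0, 2)
    replace xbar = xbar + x`j'/25
}
summarize xbar                     // sd of xbar should be close to 0.4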
probability framework for OLS

The probability framework for linear regression is summarized by the three least squares assumptions.

• Population of interest
  ex: all possible school districts in different years
• Random variables: Y, X
  ex: Test Score, STR
• Joint distribution of (Y, X). We assume:
  the population regression function is linear
  E(ui|Xi) = 0 (1st Least Squares Assumption)
  outliers are unlikely (3rd L.S.A.)
• Data collection by simple random sampling implies:
  (Xi, Yi), i = 1, ..., n are i.i.d. (2nd L.S.A.)

the sampling distribution of β̂1

• like Ȳ, β̂1 has a sampling distribution
• what is E(β̂1)?
  if E(β̂1) = β1, then OLS is unbiased: a good thing!
• what is var(β̂1) (a measure of sampling uncertainty)?
  we need to derive a formula for the standard error of β̂1
• what is the distribution of β̂1 in large samples?
  in large samples, β̂1 is normally distributed.

sampling distribution of β̂1

Yi = β0 + β1·Xi + ui
Ȳ  = β0 + β1·X̄  + ū

Therefore:

Yi − Ȳ = β1·(Xi − X̄) + (ui − ū)

Substituting this into the OLS formula (sums run over i = 1, ..., n):

β̂1 = Σi (Xi − X̄)(Yi − Ȳ) / Σi (Xi − X̄)²
    = Σi (Xi − X̄)·[β1·(Xi − X̄) + (ui − ū)] / Σi (Xi − X̄)²

so that

β̂1 − β1 = Σi (Xi − X̄)(ui − ū) / Σi (Xi − X̄)²
         = Σi (Xi − X̄)·ui / Σi (Xi − X̄)²

and the variance of the sampling distribution is

σ²(β̂1) = σu² / (n·σX²)

sampling distribution of β̂1
(based on textbook SW section 4.5)

Conclusion:
The sampling variability of the estimated regression coefficients will be:
• higher, the larger the variability of the unobserved factors, and
• lower, the higher the variation in the explanatory variable (X).
variance of β̂1 and variance of X

If there is more variation in X, then there is more information in the data that we can use to fit the regression line.

[figure: regression fits compared under low and high variation in X]
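A small sketch of this point in Stata (the data-generating parameters are illustrative): the two regressions below have the same error variance, but the second has three times the spread in X and so yields a noticeably smaller standard error for the slope.

* same error variance, different variation in x
clear all
set seed 7
set obs 1000
gen x1 = rnormal(0, 1)              // low variation in x
gen x2 = rnormal(0, 3)              // high variation in x
gen y1 = 1 + 0.5*x1 + rnormal()
gen y2 = 1 + 0.5*x2 + rnormal()
reg y1 x1                           // larger se for the slope
reg y2 x2                           // smaller se for the slope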

what comes next?

• part 2 of lecture 3: hypothesis testing for regression coefficients
• before that: revise the interpretation of OLS coefficients (practical example video)

Lecture 3
Part II: Hypothesis Tests

Dragos Radu
[email protected]
5SSMN932: Introduction to Econometrics

outline lecture 3 part 2

• recap stats: testing a hypothesis related to the population mean
• testing a hypothesis related to a regression coefficient
• one-sided t-tests of hypotheses related to regression coefficients

What comes next:
In the next part (three) we discuss properties of the variance of the regression coefficient.

Practical example (separate video):
Hypothesis tests for regression coefficients using the wage regression from the previous example.

TESTING A HYPOTHESIS RELATING TO THE POPULATION MEAN

Assumption: X ~ N(μ, σ²)

Null hypothesis:        H0: μ = μ0
Alternative hypothesis: H1: μ ≠ μ0

Example
Null hypothesis:        H0: μ = 10
Alternative hypothesis: H1: μ ≠ 10

[figure: distribution of X̄ under H0: μ = 10 (standard deviation = 1 taken as given), horizontal axis from 6 to 14]

Suppose that we have a sample of data for the example model and the sample mean X̄ is 9. Would this be evidence against the null hypothesis μ = 10?
TESTING A HYPOTHESIS RELATING TO THE POPULATION MEAN

[figure: distribution of X̄ under H0: μ = 10 (standard deviation = 1 taken as given)]

Now suppose that in the example model the sample mean is equal to 14. This clearly conflicts with the null hypothesis. 14 is four standard deviations above the hypothetical mean, and the chance of getting such an extreme estimate is only 0.006%. We would reject the null hypothesis.

[figure: distribution of X̄ under H0: μ = μ0 (standard deviation taken as given), with the upper and lower 2.5% tails shaded]

The usual procedure for making decisions is to reject the null hypothesis if it implies that the probability of getting such an extreme sample mean is less than some (small) probability p. For example, we might choose to reject the null hypothesis if it implies that the probability of getting such an extreme sample mean is less than 0.05 (5%). According to this decision rule, we would reject the null hypothesis if the sample mean fell in the upper or lower 2.5% tails.
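The 0.006% figure can be checked in one line in Stata (assuming, as the slide does, a sample mean four standard deviations above μ0):

. display 2*(1 - normal(4))    // two-sided tail probability ≈ .0000633, i.e. about 0.006%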

TESTING A HYPOTHESIS RELATING TO THE POPULATION MEAN

The test statistic is

z = (X̄ − μ0) / s.d.

[figures: distribution of X̄ under H0: μ = 10 (standard deviation = 1 taken as given), with the 2.5% tails shaded for the 5% level and the 0.5% tails shaded for the 1% level]

5% and 1% acceptance regions compared:

5%: μ0 − 1.96 s.d. ≤ X̄ ≤ μ0 + 1.96 s.d.,  i.e. −1.96 ≤ z ≤ 1.96
1%: μ0 − 2.58 s.d. ≤ X̄ ≤ μ0 + 2.58 s.d.,  i.e. −2.58 ≤ z ≤ 2.58

TESTING A HYPOTHESIS RELATING TO A REGRESSION COEFFICIENT

                          Statistics                Regression model
Model                     X with unknown μ, σ²      Y = β0 + β1X + u
Estimator                 X̄                         β̂1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²
Null hypothesis           H0: μ = μ0                H0: β1 = β1⁰
Alternative hypothesis    H1: μ ≠ μ0                H1: β1 ≠ β1⁰
Test statistic            t = (X̄ − μ0)/s.e.(X̄)      t = (β̂1 − β1⁰)/s.e.(β̂1)
Reject H0 if              |t| > tcrit               |t| > tcrit
Degrees of freedom        n − 1                     n − k = n − 2

There is one important difference. When locating the critical value of t, one must take account of the number of degrees of freedom. For random variable X, this is n − 1, where n is the number of observations in the sample. For regression it is n − k, with k the number of coefficients we estimate in the regression, in this case two: intercept and slope.

Example: p = β0 + β1w + u

Null hypothesis:        H0: β1 = 1.0
Alternative hypothesis: H1: β1 ≠ 1.0

p̂ = 1.21 + 0.82w
    (0.05) (0.10)

t = (β̂1 − β1⁰)/s.e.(β̂1) = (0.82 − 1.00)/0.10 = −1.80

n = 20, so degrees of freedom = 18 and tcrit, 5% = 2.101.

The critical value of t with 18 degrees of freedom is 2.101 at the 5% level. The absolute value of the t statistic is less than this, so we do not reject the null hypothesis.
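In Stata, this test can be carried out after the regression (variable names p and w follow the example; test and lincom are standard post-estimation commands):

. reg p w
. display (_b[w] - 1)/_se[w]    // the t statistic by hand: here -1.80
. test w = 1                    // Wald test of H0: beta1 = 1; with one restriction, F = t²
. lincom w - 1                  // estimate, s.e. and t statistic for beta1 - 1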

ONE-SIDED t TESTS OF HYPOTHESES RELATING TO REGRESSION COEFFICIENTS

Null hypothesis:        H0: β1 = 0
Alternative hypothesis: H1: β1 ≠ 0

[figure: density of β̂1 under H0, with 2.5% rejection regions below −1.96 s.d. and above +1.96 s.d.]

If you use a two-sided 5% significance test, your estimate must be 1.96 standard deviations above or below 0 if you are to reject H0.

Null hypothesis:        H0: β1 = 0
Alternative hypothesis: H1: β1 > 0

[figure: density of β̂1 under H0, with a single 5% rejection region above 1.65 s.d.]

However, if you can justify the use of a one-sided test, for example with H1: β1 > 0, your estimate has to be only 1.65 standard deviations above 0.

Suppose that Y is genuinely determined by X and that the true (unknown) coefficient is β1, as shown in the figure. Suppose that we have a sample of observations and calculate the estimated slope coefficient, β̂1. If it is as shown in the diagram, what do we conclude when we test H0?

The answer is that β̂1 lies in the rejection region. It makes no difference whether we perform a two-sided test or a one-sided test. We come to the correct conclusion.

What do we conclude if β̂1 is instead below the critical value, as shown next? We fail to reject H0, irrespective of whether we perform a two-sided test or a one-sided test. We would make a Type II error in either case.

What do we conclude if β̂1 lies between 1.65 and 1.96 standard deviations above 0, as shown here? In the case of a two-sided test, β̂1 is not in the rejection region. We are unable to reject H0. However, if we are in a position to perform a one-sided test, β̂1 does lie in the rejection region, and so we have demonstrated that X has a significant effect on Y (at the 5% significance level, of course).
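The critical values quoted on these slides can be reproduced in Stata (using the normal approximation, as the slides do; invttail gives the finite-sample analogue):

. display invnormal(0.95)       // ≈ 1.645: one-sided 5% critical value
. display invnormal(0.975)      // ≈ 1.960: two-sided 5% critical value
. display invttail(18, 0.05)    // ≈ 1.734: one-sided 5% critical value with 18 d.f.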

what comes next?

• part 3 of lecture 3: properties of var(u|X)
• before that: revise hypothesis tests for regression coefficients (intuition and Stata output)

Lecture 3
Part III: Heteroskedasticity

Dragos Radu
[email protected]
5SSMN932: Introduction to Econometrics

outline lecture 3 part 3

• Gauss-Markov conditions
• heteroskedasticity
• Gauss-Markov theorem

Anscombe quartet
[figure: scatter plots of the four Anscombe regressions from the earlier activity]

Gauss-Markov conditions

simple linear regression (SLR)

SLR.1: y = β0 + β1x + u
SLR.2: random sampling from the population
SLR.3: some sample variation in the xi
SLR.4: E(u|x) = 0

bias and efficiency

An estimator β̂ is unbiased if its sampling distribution has as its expected value the true value of β: E(β̂) = β.

Theorem: unbiasedness of OLS

SLR.1–SLR.4  ⟹  E(β̂0) = β0,  E(β̂1) = β1

Under conditions SLR.1–4, the OLS coefficients are unbiased.
• the estimated coefficients may be smaller or larger, depending on the sample that is the result of a random draw
• however, on average, they will be equal to the values that characterise the true relationship between y and x in the population
• "on average" means if sampling were repeated, i.e. if drawing the random sample and doing the estimation were repeated many times
• in a given sample, estimates may differ considerably from true values
heteroskedasticity and homoskedasticity

If var(u|X = x) is constant – that is, if the variance of the conditional distribution of u given X does not depend on X – then u is said to be homoskedastic. Otherwise, u is heteroskedastic.

graphical illustration of homoskedasticity
[figure: var(u|X) does not depend on X – the spread of the observations around the regression line is constant across X]

graphical illustration of heteroskedasticity
[figure: var(u|X) depends on X – the spread of the observations around the regression line changes with X]

heteroskedasticity – residuals plot in Stata
[figure: residuals plotted against X; does var(u|X) depend on X? Here the spread of the residuals changes with X, so var(u|X) does depend on X]

homoskedasticity – residuals plot in Stata
[figure: residuals plotted against X; the spread is constant, so var(u|X) does not depend on X]
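A sketch of how such residual plots can be produced in Stata (the variable names y and x are generic placeholders):

. reg y x
. predict uhat, resid           // store the OLS residuals
. scatter uhat x, yline(0)      // residuals against the regressor
. rvpplot x                     // built-in residual-versus-predictor plot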

Gauss-Markov conditions

simple linear regression (SLR)

SLR.1: y = β0 + β1x + u
SLR.2: random sampling from the population
SLR.3: some sample variation in the xi
SLR.4: E(u|x) = 0
SLR.5: Var(u|x) = Var(u) = σ²

• under these assumptions the OLS estimator has the smallest variance among all linear unbiased estimators and is therefore BLUE (Best Linear Unbiased Estimator). This is the Gauss-Markov theorem.

precision of the OLS coefficient: the OLS estimator is BLUE.
