Theory 3. Linear Regression With One Regressor (Textbook Chapter 4)

This document discusses linear regression with one regressor. It covers estimating the population slope coefficient by ordinary least squares, how to interpret the estimated slope and intercept, measures of fit including R² and the standard error of the regression, and calculating predicted values and residuals.

Linear Regression with One Regressor

(SW Chapter 4)

Linear regression allows us to estimate, and make


inferences about, population slope coefficients. Ultimately
our aim is to estimate the causal effect on Y of a unit change
in X – but for now, just think of the problem of fitting a
straight line to data on two variables, Y and X.

4-1
The problems of statistical inference for linear regression
are, at a general level, the same as for estimation of the mean
or of the differences between two means. Statistical, or
econometric, inference about the slope entails:

• Estimation:
o How should we draw a line through the data to estimate
the (population) slope (answer: ordinary least squares).
o What are advantages and disadvantages of OLS?
• Hypothesis testing:
o How to test if the slope is zero?
• Confidence intervals:
o How to construct a confidence interval for the slope?

4-2
Linear Regression: Some Notation and Terminology
(SW Section 4.1)

The population regression line:

Test Score = β0 + β1STR

β1 = slope of population regression line

   = ΔTest score / ΔSTR

   = change in test score for a unit change in STR
• Why are β0 and β1 “population” parameters?
• We would like to know the population value of β1.
• We don’t know β1, so must estimate it using data.
4-3
The Population Linear Regression Model – general
notation

Yi = β0 + β1Xi + ui, i = 1,…, n

• X is the independent variable or regressor


• Y is the dependent variable
• β0 = intercept
• β1 = slope
• ui = the regression error
• The regression error consists of omitted factors, or
possibly measurement error in the measurement of Y. In
general, these omitted factors are other factors that
influence Y, other than the variable X
4-4
This terminology in a picture: Observations on Y and X; the
population regression line; and the regression error (the “error
term”):

4-5
The Ordinary Least Squares Estimator
(SW Section 4.2)

How can we estimate β0 and β1 from data?


Recall that Ȳ is the least squares estimator of μY: Ȳ solves

   min_m ∑i=1..n (Yi − m)²

By analogy, we will focus on the least squares ("ordinary least squares" or "OLS") estimator of the unknown parameters β0 and β1, which solves

   min_{b0,b1} ∑i=1..n [Yi − (b0 + b1Xi)]²
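
An illustrative aside, not part of the original slides: this minimization has the closed-form solution β̂1 = ∑(Xi − X̄)(Yi − Ȳ) / ∑(Xi − X̄)² and β̂0 = Ȳ − β̂1X̄ (App. 4.2). The minimal Python sketch below uses a simulated, assumed data-generating process and cross-checks the closed form against numpy's built-in least-squares fit:

# Sketch: closed-form OLS estimates on simulated data, cross-checked
# against numpy's built-in least-squares polynomial fit.
import numpy as np

rng = np.random.default_rng(6)
n = 200
x = rng.normal(20, 2, n)                    # simulated regressor
y = 699 - 2.3 * x + rng.normal(0, 19, n)    # simulated outcome

# Closed-form OLS estimators
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

print(b0, b1)
print(np.polyfit(x, y, 1))   # returns [slope, intercept]; matches (b1, b0)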

4-6
Mechanics of OLS
The population regression line: Test Score = β0 + β1STR

β1 = ΔTest score / ΔSTR = ??

4-7
The OLS estimator solves:   min_{b0,b1} ∑i=1..n [Yi − (b0 + b1Xi)]²

• The OLS estimator minimizes the average squared


difference between the actual values of Yi and the prediction
(“predicted value”) based on the estimated line.
• This minimization problem can be solved using calculus
(App. 4.2).
• The result is the OLS estimators of β0 and β1.

4-8
4-9
Application to the California Test Score – Class Size data

Estimated slope = β̂1 = −2.28

Estimated intercept = β̂0 = 698.9

Estimated regression line:  TestScore-hat = 698.9 − 2.28×STR
4-10
Interpretation of the estimated slope and intercept
TestScore-hat = 698.9 − 2.28×STR
• Districts with one more student per teacher on average
have test scores that are 2.28 points lower.
• That is, ΔTest score / ΔSTR = −2.28
• The intercept (taken literally) means that, according to this
estimated line, districts with zero students per teacher
would have a (predicted) test score of 698.9.
• This interpretation of the intercept makes no sense – it
extrapolates the line outside the range of the data – here,
the intercept is not economically meaningful.

4-11
Predicted values & residuals:

One of the districts in the data set is Antelope, CA, for which
STR = 19.33 and Test Score = 657.8
predicted value: Ŷ_Antelope = 698.9 − 2.28×19.33 = 654.8
residual: û_Antelope = 657.8 − 654.8 = 3.0
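
As a quick illustrative check (not part of the original slides), the Antelope arithmetic can be reproduced in a few lines of Python using the estimates given above:

# Sketch: predicted value and residual for Antelope, CA,
# using the estimated line TestScore-hat = 698.9 - 2.28*STR.
beta0_hat, beta1_hat = 698.9, -2.28   # OLS estimates from the slides
str_antelope, score_antelope = 19.33, 657.8

predicted = beta0_hat + beta1_hat * str_antelope   # about 654.8
residual = score_antelope - predicted              # about 3.0
print(f"predicted = {predicted:.1f}, residual = {residual:.1f}")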
4-12
OLS regression: STATA output

regress testscr str, robust

Regression with robust standard errors Number of obs = 420


F( 1, 418) = 19.26
Prob > F = 0.0000
R-squared = 0.0512
Root MSE = 18.581

-------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------+----------------------------------------------------------------
str | -2.279808 .5194892 -4.39 0.000 -3.300945 -1.258671
_cons | 698.933 10.36436 67.44 0.000 678.5602 719.3057
-------------------------------------------------------------------------

TestScore-hat = 698.9 − 2.28×STR

(we’ll discuss the rest of this output later)
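
For readers working in Python rather than Stata, a rough analogue of `regress testscr str, robust` uses statsmodels with heteroskedasticity-robust (HC1) standard errors. This is an illustrative sketch on simulated stand-in data (the data-generating numbers below are assumptions, not the California file); with the actual data the slope would be near −2.28:

# Sketch: Python analogue of `regress testscr str, robust`, on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 420
str_ = rng.normal(19.6, 1.9, n)                    # stand-in class-size data
testscr = 698.9 - 2.28 * str_ + rng.normal(0, 18.6, n)

X = sm.add_constant(str_)                          # adds the intercept column
result = sm.OLS(testscr, X).fit(cov_type="HC1")    # HC1 corresponds to Stata's ", robust"
print(result.summary())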


4-13
Measures of Fit
(Section 4.3)

A natural question is how well the regression line “fits” or


explains the data. There are two regression statistics that
provide complementary measures of the quality of fit:

• The regression R² measures the fraction of the variance of


Y that is explained by X; it is unitless and ranges between
zero (no fit) and one (perfect fit)

• The standard error of the regression (SER) measures the


magnitude of a typical regression residual in the units of
Y.
4-14
The regression R² is the fraction of the sample variance of Yi
“explained” by the regression.
Yi = Ŷi + ûi = OLS prediction + OLS residual

⇒ sample var(Y) = sample var(Ŷi) + sample var(ûi)   (why?)

⇒ total sum of squares = "explained" SS + "residual" SS

Definition of R²:   R² = ESS / TSS,

   where ESS = ∑i=1..n (Ŷi − mean(Ŷ))²  and  TSS = ∑i=1..n (Yi − Ȳ)²

• R² = 0 means ESS = 0
• R² = 1 means ESS = TSS
• 0 ≤ R² ≤ 1
• For regression with a single X, R² = the square of the
correlation coefficient between X and Y
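
The decomposition above is easy to verify numerically. The sketch below (illustrative only, on an assumed simulated data-generating process, not from the original slides) fits a one-regressor OLS line, computes R² as ESS/TSS, and checks that it equals the squared correlation between X and Y:

# Sketch: R-squared as ESS/TSS for a one-regressor OLS fit, on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n = 420
x = rng.normal(20, 2, n)                  # stand-in regressor (STR-like)
y = 699 - 2.3 * x + rng.normal(0, 19, n)  # stand-in outcome with noise

# OLS estimates from the closed-form formulas
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

ess = np.sum((y_hat - y_hat.mean()) ** 2)   # "explained" sum of squares
tss = np.sum((y - y.mean()) ** 2)           # total sum of squares
r2 = ess / tss
print(r2, np.corrcoef(x, y)[0, 1] ** 2)     # the two numbers agree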
4-15
The Standard Error of the Regression (SER)

The SER measures the spread of the distribution of u. The


SER is (almost) the sample standard deviation of the OLS
residuals:
   SER = √[ (1/(n−2)) ∑i=1..n (ûi − mean(û))² ] = √[ (1/(n−2)) ∑i=1..n ûi² ]

(the second equality holds because mean(û) = (1/n) ∑i=1..n ûi = 0).

4-16
   SER = √[ (1/(n−2)) ∑i=1..n ûi² ]

The SER:
• has the units of u, which are the units of Y
• measures the average “size” of the OLS residual (the
average “mistake” made by the OLS regression line)
• The root mean squared error (RMSE) is closely related to
the SER:
   RMSE = √[ (1/n) ∑i=1..n ûi² ]

This measures the same thing as the SER – the minor difference is division by n instead of n−2.
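
A short numerical sketch of the SER and RMSE (illustrative only, on assumed simulated data, not from the original slides); it reuses the closed-form OLS fit and shows that for n in the hundreds the two numbers are nearly identical:

# Sketch: SER (divide by n-2) vs RMSE (divide by n) for OLS residuals.
import numpy as np

rng = np.random.default_rng(1)
n = 420
x = rng.normal(20, 2, n)
y = 699 - 2.3 * x + rng.normal(0, 19, n)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
u_hat = y - (b0 + b1 * x)                     # OLS residuals (they sum to zero)

ser = np.sqrt(np.sum(u_hat ** 2) / (n - 2))   # degrees-of-freedom correction
rmse = np.sqrt(np.sum(u_hat ** 2) / n)
print(ser, rmse)                              # nearly equal when n is large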

4-17
Technical note: why divide by n–2 instead of n–1?
   SER = √[ (1/(n−2)) ∑i=1..n ûi² ]

• Division by n−2 is a "degrees of freedom" correction – just like division by n−1 in s_Y², except that for the SER, two parameters have been estimated (β0 and β1, by β̂0 and β̂1), whereas in s_Y² only one has been estimated (μY, by Ȳ).


• When n is large, it makes negligible difference whether n,
n–1, or n–2 are used – although the conventional formula
uses n–2 when there is a single regressor.
• For details, see Section 17.4

4-18
Example of the R² and the SER

   TestScore-hat = 698.9 − 2.28×STR,  R² = 0.05,  SER = 18.6

STR explains only a small fraction of the variation in test


scores. Does this make sense? Does this mean the STR is
unimportant in a policy sense?
4-19
The Least Squares Assumptions
(SW Section 4.4)

What, in a precise sense, are the properties of the OLS


estimator? We would like it to be unbiased, and to have a
small variance. Does it? Under what conditions is it an
unbiased estimator of the true population parameters?

To answer these questions, we need to make some


assumptions about how Y and X are related to each other, and
about how they are collected (the sampling scheme)

These assumptions – there are three – are known as the


Least Squares Assumptions.
4-20
The Least Squares Assumptions
Yi = β0 + β1Xi + ui, i = 1,…, n

1. The conditional distribution of u given X has mean zero,
   that is, E(u|X = x) = 0.
   • This implies that β̂1 is unbiased
2. (Xi, Yi), i = 1,…, n, are i.i.d.
   • This is true if X, Y are collected by simple random
     sampling
   • This delivers the sampling distribution of β̂0 and β̂1
3. Large outliers in X and/or Y are rare.
   • Technically, X and Y have finite fourth moments
   • Outliers can result in meaningless values of β̂1

4-21
Least squares assumption #1: E(u|X = x) = 0.
For any given value of X, the mean of u is zero:

Example: Test Scorei = β0 + β1STRi + ui, ui = other factors


• What are some of these “other factors”?
• Is E(u|X=x) = 0 plausible for these other factors?
4-22
Least squares assumption #1, ctd.
A benchmark for thinking about this assumption is to
consider an ideal randomized controlled experiment:
• X is randomly assigned to people (students randomly
assigned to different size classes; patients randomly
assigned to medical treatments). Randomization is done
by computer – using no information about the individual.
• Because X is assigned randomly, all other individual
characteristics – the things that make up u – are
independently distributed of X
• Thus, in an ideal randomized controlled experiment,
E(u|X = x) = 0 (that is, LSA #1 holds)
• In actual experiments, or with observational data, we will
need to think hard about whether E(u|X = x) = 0 holds.
4-23
Least squares assumption #2: (Xi,Yi), i = 1,…,n are i.i.d.

This arises automatically if the entity (individual, district)


is sampled by simple random sampling: the entity is selected
then, for that entity, X and Y are observed (recorded).

The main place we will encounter non-i.i.d. sampling is


when data are recorded over time (“time series data”) – this
will introduce some extra complications.

4-24
Least squares assumption #3: Large outliers are rare
Technical statement: E(X⁴) < ∞ and E(Y⁴) < ∞

• A large outlier is an extreme value of X or Y


• On a technical level, if X and Y are bounded, then they
have finite fourth moments. (Standardized test scores
automatically satisfy this; STR, family income, etc. satisfy
this too).
• However, the substance of this assumption is that a large
outlier can strongly influence the results

4-25
OLS can be sensitive to an outlier:

• Is the lone point an outlier in X or Y?


• In practice, outliers often are data glitches
(coding/recording problems) – so check your data for
outliers! The easiest way is to produce a scatterplot.
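
To make the sensitivity concrete, here is a small simulated illustration (not from the original slides; the data-generating numbers are assumptions): corrupting a single observation's Y value can move the estimated slope substantially.

# Sketch: how one miscoded observation can move the OLS slope.
import numpy as np

def ols_slope(x, y):
    """Closed-form OLS slope for a single regressor."""
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

rng = np.random.default_rng(2)
n = 100
x = rng.normal(20, 2, n)
y = 699 - 2.3 * x + rng.normal(0, 19, n)

print("slope, clean data:  ", ols_slope(x, y))

y_glitch = y.copy()
y_glitch[0] = 6990          # e.g. a misplaced decimal point in one record
print("slope, with glitch: ", ols_slope(x, y_glitch))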
4-26
The Sampling Distribution of the OLS Estimator
(SW Section 4.5)

The OLS estimator is computed from a sample of data; a
different sample gives a different value of β̂1. This is the
source of the "sampling uncertainty" of β̂1. We want to:

• quantify the sampling uncertainty associated with β̂1

• use β̂1 to test hypotheses such as β1 = 0


• construct a confidence interval for β1
• All these require figuring out the sampling distribution of
the OLS estimator. Two steps to get there…
o Probability framework for linear regression
o Distribution of the OLS estimator
4-27
Probability Framework for Linear Regression

The probability framework for linear regression is


summarized by the three least squares assumptions.
Population
The group of interest (ex: all possible school districts)
Random variables: Y, X
Ex: (Test Score, STR)
Joint distribution of (Y, X)
The population regression function is linear
E(u|X) = 0 (1st Least Squares Assumption)
X, Y have finite fourth moments (3rd L.S.A.)
Data collection by simple random sampling:
{(Xi, Yi)}, i = 1,…, n, are i.i.d. (2nd L.S.A.)
4-28
The Sampling Distribution of β̂1

Like Ȳ, β̂1 has a sampling distribution.

• What is E(β̂1)? (where is it centered?)
  o If E(β̂1) = β1, then OLS is unbiased – a good thing!
• What is var(β̂1)? (measure of sampling uncertainty)
• What is the distribution of β̂1 in small samples?
  o It can be very complicated in general
• What is the distribution of β̂1 in large samples?
  o It turns out to be relatively simple – in large samples,
    β̂1 is normally distributed.

4-29
The mean and variance of the sampling distribution of β̂1

Some preliminary algebra:
   Yi = β0 + β1Xi + ui
   Ȳ = β0 + β1X̄ + ū
so Yi − Ȳ = β1(Xi − X̄) + (ui − ū)

Thus,

   β̂1 = ∑i=1..n (Xi − X̄)(Yi − Ȳ) / ∑i=1..n (Xi − X̄)²

       = ∑i=1..n (Xi − X̄)[β1(Xi − X̄) + (ui − ū)] / ∑i=1..n (Xi − X̄)²

4-30
   β̂1 = β1 · [∑i=1..n (Xi − X̄)(Xi − X̄) / ∑i=1..n (Xi − X̄)²]
          + [∑i=1..n (Xi − X̄)(ui − ū) / ∑i=1..n (Xi − X̄)²]

so  β̂1 − β1 = ∑i=1..n (Xi − X̄)(ui − ū) / ∑i=1..n (Xi − X̄)².

Now  ∑i=1..n (Xi − X̄)(ui − ū) = ∑i=1..n (Xi − X̄)ui − [∑i=1..n (Xi − X̄)] ū

                              = ∑i=1..n (Xi − X̄)ui − [(∑i=1..n Xi) − nX̄] ū

                              = ∑i=1..n (Xi − X̄)ui

4-31
Substitute ∑i=1..n (Xi − X̄)(ui − ū) = ∑i=1..n (Xi − X̄)ui into the
expression for β̂1 − β1:

   β̂1 − β1 = ∑i=1..n (Xi − X̄)(ui − ū) / ∑i=1..n (Xi − X̄)²

so

   β̂1 − β1 = ∑i=1..n (Xi − X̄)ui / ∑i=1..n (Xi − X̄)²

4-32
Now we can calculate E(β̂1) and var(β̂1):

   E(β̂1) − β1 = E[ ∑i=1..n (Xi − X̄)ui / ∑i=1..n (Xi − X̄)² ]

              = E{ E[ ∑i=1..n (Xi − X̄)ui / ∑i=1..n (Xi − X̄)² | X1,…, Xn ] }

              = 0  because E(ui|Xi = x) = 0 by LSA #1

• Thus LSA #1 implies that E(β̂1) = β1
• That is, β̂1 is an unbiased estimator of β1.
• For details see App. 4.3
4-33
Next calculate var(β̂1):

write

   β̂1 − β1 = [(1/n) ∑i=1..n vi] / [((n−1)/n) s_X²]

where vi = (Xi − X̄)ui. If n is large, s_X² ≈ σ_X² and (n−1)/n ≈ 1, so

   β̂1 − β1 ≈ [(1/n) ∑i=1..n vi] / σ_X²,

where vi = (Xi − X̄)ui (see App. 4.3). Thus,


4-34
   β̂1 − β1 ≈ [(1/n) ∑i=1..n vi] / σ_X²

so var(β̂1 − β1) = var(β̂1) = [var(v)/n] / (σ_X²)²

so

   var(β̂1) = (1/n) × var[(Xi − μX)ui] / σ_X⁴.

Summary so far
• β̂1 is unbiased: E(β̂1) = β1 – just like Ȳ!

• var(β̂1) is inversely proportional to n – just like Ȳ!
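
These two properties are easy to see in a small Monte Carlo experiment. The sketch below (illustrative only, with an assumed data-generating process, not from the original slides) draws many samples, computes β̂1 in each, and checks that the estimates center on β1 and that their variance shrinks roughly like 1/n:

# Sketch: Monte Carlo check that beta1-hat is unbiased and var ~ 1/n.
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1 = 5.0, 2.0          # assumed true population parameters

def draw_beta1_hat(n):
    x = rng.normal(0, 1, n)
    u = rng.normal(0, 1, n)      # E(u|X) = 0 holds by construction
    y = beta0 + beta1 * x + u
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

for n in (25, 100, 400):
    draws = np.array([draw_beta1_hat(n) for _ in range(5000)])
    print(f"n={n:4d}  mean={draws.mean():.3f}  var={draws.var():.4f}")
    # mean stays near beta1 = 2; variance falls by ~1/4 as n quadruples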

4-35
What is the sampling distribution of β̂1?
The exact sampling distribution is complicated – it
depends on the population distribution of (Y, X) – but when n
is large we get some simple (and good) approximations:

(1) Because var(β̂1) ∝ 1/n and E(β̂1) = β1, β̂1 → β1 in probability

(2) When n is large, the sampling distribution of β̂1 is
well approximated by a normal distribution (CLT)

Recall the CLT: suppose {vi}, i = 1,…, n, is i.i.d. with E(v) = 0
and var(v) = σ_v². Then, when n is large, (1/n) ∑i=1..n vi is
approximately distributed N(0, σ_v²/n).

4-36
Large-n approximation to the distribution of β̂1:

   β̂1 − β1 = [(1/n) ∑i=1..n vi] / [((n−1)/n) s_X²] ≈ [(1/n) ∑i=1..n vi] / σ_X²,  where vi = (Xi − X̄)ui

• When n is large, vi = (Xi − X̄)ui ≈ (Xi − μX)ui, which is
  i.i.d. (why?) and has var(vi) < ∞ (why?). So, by the CLT,
  (1/n) ∑i=1..n vi is approximately distributed N(0, σ_v²/n).

• Thus, for n large, β̂1 is approximately distributed

   β̂1 ~ N(β1, σ_v² / (nσ_X⁴)),  where vi = (Xi − μX)ui
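
As an illustrative check (not from the original slides; the data-generating process below is an assumption), the simulation draws β̂1 repeatedly, standardizes it using the approximate variance σ_v²/(nσ_X⁴), and compares the result with a standard normal:

# Sketch: large-n normal approximation for beta1-hat.
import numpy as np

rng = np.random.default_rng(4)
beta0, beta1, n = 5.0, 2.0, 500      # assumed DGP and sample size

# Approximate variance of beta1-hat: var[(X - muX)u] / (n * sigmaX^4),
# evaluated by simulation for this DGP (X ~ N(0,1), u ~ N(0,1), independent).
x_big = rng.normal(0, 1, 200_000)
u_big = rng.normal(0, 1, 200_000)
var_v = np.var(x_big * u_big)        # muX = 0 here
sigma_x4 = 1.0                       # sigmaX^2 = 1 for X ~ N(0,1), so sigmaX^4 = 1
approx_sd = np.sqrt(var_v / (n * sigma_x4))

def beta1_hat():
    x = rng.normal(0, 1, n)
    y = beta0 + beta1 * x + rng.normal(0, 1, n)
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

z = (np.array([beta1_hat() for _ in range(5000)]) - beta1) / approx_sd
print(np.mean(z), np.std(z))         # close to 0 and 1
print(np.mean(np.abs(z) < 1.96))     # close to 0.95, as for N(0,1)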

4-37
The larger the variance of X, the smaller the variance of β̂1

The math

   var(β̂1 − β1) = (1/n) × var[(Xi − μX)ui] / σ_X⁴

where σ_X² = var(Xi). The variance of X enters (squared) in the
denominator – so increasing the spread of X decreases the
variance of β̂1.

The intuition
If there is more variation in X, then there is more
information in the data that you can use to fit the regression
line. This is most easily seen in a figure…
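
It can also be seen in a quick simulation (an illustrative sketch with an assumed data-generating process, not part of the original slides): doubling the spread of X, holding everything else fixed, roughly quarters the sampling variance of β̂1.

# Sketch: more variation in X -> smaller sampling variance of beta1-hat.
import numpy as np

rng = np.random.default_rng(5)
beta0, beta1, n = 5.0, 2.0, 100

def beta1_hat(x_sd):
    x = rng.normal(0, x_sd, n)
    y = beta0 + beta1 * x + rng.normal(0, 1, n)
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

for x_sd in (1.0, 2.0):
    draws = np.array([beta1_hat(x_sd) for _ in range(5000)])
    print(f"sd(X)={x_sd}:  var(beta1_hat) = {draws.var():.5f}")
# The variance with sd(X)=2 is roughly one quarter of that with sd(X)=1.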

4-38
The larger the variance of X, the smaller the variance of β̂1

There are the same number of black and blue dots – using
which would you get a more accurate regression line?

4-39
Summary of the sampling distribution of β̂1:
If the three Least Squares Assumptions hold, then
• The exact (finite-sample) sampling distribution of β̂1 has:
  o E(β̂1) = β1 (that is, β̂1 is unbiased)
  o var(β̂1) = (1/n) × var[(Xi − μX)ui] / σ_X⁴ ∝ 1/n.
• Other than its mean and variance, the exact distribution of
  β̂1 is complicated and depends on the distribution of (X, u)
• β̂1 → β1 in probability (that is, β̂1 is consistent)
• When n is large, (β̂1 − E(β̂1)) / √var(β̂1) ~ N(0,1) (CLT)
• This parallels the sampling distribution of Ȳ.
4-40
We are now ready to turn to hypothesis tests & confidence
intervals…
4-41
