ch02-1
y = β₀ + β₁x + u
Some Terminology
In the simple linear regression model,
where y = β₀ + β₁x + u, we typically refer
to y as the
Dependent Variable, or
Left-Hand Side Variable, or
Explained Variable, or
Regressand
A key assumption is that E(u|x) = E(u) = 0, which implies E(y|x) = β₀ + β₁x
[Figure: E(y|x) = β₀ + β₁x as a linear function of x, with the distribution of y centered about E(y|x) at x₁ and x₂]
Ordinary Least Squares
Basic idea of regression is to estimate the
population parameters from a sample
Let {(xi,yi): i=1, …,n} denote a random
sample of size n from the population
For each observation in this sample, it will
be the case that
yᵢ = β₀ + β₁xᵢ + uᵢ
[Figure: sample data points (x₁,y₁), (x₂,y₂), (x₃,y₃) with the errors u₁, u₂, u₃ shown as deviations from the population regression line]
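As a rough illustration (not part of the original slides), the following Python sketch draws a random sample {(xᵢ, yᵢ): i = 1, …, n} from the model, using made-up parameter values for β₀, β₁ and the error distribution:

```python
import numpy as np

# Hypothetical population parameters, chosen only for illustration
beta0, beta1 = 1.0, 0.5
n = 100

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=n)   # regressor
u = rng.normal(0, 2, size=n)     # error term with E(u|x) = 0
y = beta0 + beta1 * x + u        # y_i = beta0 + beta1*x_i + u_i
```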
Deriving OLS Estimates
To derive the OLS estimates we need to
realize that our main assumption of E(u|x) =
E(u) = 0 also implies that
Cov(x,u) = E(xu) = 0
Since u = y − β₀ − β₁x, we can write these two restrictions as
E(y − β₀ − β₁x) = 0
E[x(y − β₀ − β₁x)] = 0
The sample versions of these moment restrictions are
(1/n) Σᵢ (yᵢ − β̂₀ − β̂₁xᵢ) = 0
(1/n) Σᵢ xᵢ(yᵢ − β̂₀ − β̂₁xᵢ) = 0
More Derivation of OLS
Given the definition of a sample mean, and
properties of summation, we can rewrite the first
condition as follows
ȳ = β̂₀ + β̂₁x̄,
or
β̂₀ = ȳ − β̂₁x̄
More Derivation of OLS
Σᵢ xᵢ(yᵢ − (ȳ − β̂₁x̄) − β̂₁xᵢ) = 0
Σᵢ xᵢ(yᵢ − ȳ) = β̂₁ Σᵢ xᵢ(xᵢ − x̄)
Σᵢ (xᵢ − x̄)(yᵢ − ȳ) = β̂₁ Σᵢ (xᵢ − x̄)²
so the OLS slope estimate is
β̂₁ = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ (xᵢ − x̄)²
provided that Σᵢ (xᵢ − x̄)² > 0
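For concreteness, here is a minimal Python/numpy sketch of the closed-form estimates just derived (the function name ols_simple and the simulated data are illustrative, not from the slides):

```python
import numpy as np

def ols_simple(x, y):
    """OLS intercept and slope from the formulas derived above."""
    xbar, ybar = x.mean(), y.mean()
    b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    b0 = ybar - b1 * xbar
    return b0, b1

# Example with a made-up sample
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 1.0 + 0.5 * x + rng.normal(0, 2, 100)
print(ols_simple(x, y))   # estimates should be near (1.0, 0.5)
```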
[Figure: sample data points, the fitted regression line ŷ = β̂₀ + β̂₁x, and the residuals ûᵢ shown as deviations from the fitted line]
Alternate approach to derivation
Given the intuitive idea of fitting a line, we can
set up a formal minimization problem
That is, we want to choose our parameters such
that we minimize the following:
Σᵢ ûᵢ² = Σᵢ (yᵢ − β̂₀ − β̂₁xᵢ)²
Taking the first order conditions with respect to β̂₀ and β̂₁ gives
Σᵢ (yᵢ − β̂₀ − β̂₁xᵢ) = 0
Σᵢ xᵢ(yᵢ − β̂₀ − β̂₁xᵢ) = 0
which are the same conditions obtained before (multiplied by n).
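As a sanity check (an illustration with made-up data, not from the slides), one can minimize the sum of squared residuals numerically and verify that the minimizer matches the closed-form estimates:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 1.0 + 0.5 * x + rng.normal(0, 2, 100)   # made-up sample

def ssr(params):
    b0, b1 = params
    return np.sum((y - b0 - b1 * x) ** 2)    # sum of squared residuals

res = minimize(ssr, x0=np.zeros(2))          # numerical minimization
xbar, ybar = x.mean(), y.mean()
b1_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0_hat = ybar - b1_hat * xbar
print(res.x, (b0_hat, b1_hat))               # the two should agree closely
```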
Algebraic Properties of OLS
The sum of the OLS residuals is zero
Thus, the sample average of the OLS
residuals is zero as well
The sample covariance between the
regressors and the OLS residuals is zero
The OLS regression line always goes
through the mean of the sample
Σᵢ ûᵢ = 0
Σᵢ xᵢûᵢ = 0
ȳ = β̂₀ + β̂₁x̄
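These algebraic properties can be checked numerically; a small sketch with made-up data (not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 1.0 + 0.5 * x + rng.normal(0, 2, 100)    # made-up sample

xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar
uhat = y - b0 - b1 * x                       # OLS residuals

print(np.isclose(uhat.sum(), 0))             # sum of residuals is zero
print(np.isclose((x * uhat).sum(), 0))       # zero sample covariance with x
print(np.isclose(ybar, b0 + b1 * xbar))      # line passes through (xbar, ybar)
```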
More terminology
We can think of each observation as being made
up of an explained part, and an unexplained part,
yᵢ = ŷᵢ + ûᵢ. We then define the following:
Σᵢ (yᵢ − ȳ)² is the total sum of squares (SST)
Σᵢ (ŷᵢ − ȳ)² is the explained sum of squares (SSE)
Σᵢ ûᵢ² is the residual sum of squares (SSR)
Then
Σᵢ (yᵢ − ȳ)² = Σᵢ [(yᵢ − ŷᵢ) + (ŷᵢ − ȳ)]²
 = Σᵢ [ûᵢ + (ŷᵢ − ȳ)]²
 = Σᵢ ûᵢ² + 2 Σᵢ ûᵢ(ŷᵢ − ȳ) + Σᵢ (ŷᵢ − ȳ)²
and since Σᵢ ûᵢ(ŷᵢ − ȳ) = 0, we have SST = SSE + SSR
The R-squared of the regression, the fraction of the total sum of squares that is explained by the model, is R² = SSE/SST = 1 − SSR/SST
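Continuing the same kind of made-up example, the decomposition and the two expressions for R² can be verified directly:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 1.0 + 0.5 * x + rng.normal(0, 2, 100)   # made-up sample

xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar
yhat = b0 + b1 * x
uhat = y - yhat

sst = np.sum((y - ybar) ** 2)       # total sum of squares
sse = np.sum((yhat - ybar) ** 2)    # explained sum of squares
ssr = np.sum(uhat ** 2)             # residual sum of squares

print(np.isclose(sst, sse + ssr))   # SST = SSE + SSR
print(sse / sst, 1 - ssr / sst)     # the two R^2 expressions agree
```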
Unbiasedness of OLS
To think about unbiasedness, rewrite the estimator in terms of the population parameters. Start by rewriting the slope formula as
β̂₁ = Σᵢ (xᵢ − x̄)yᵢ / sₓ², where sₓ² ≡ Σᵢ (xᵢ − x̄)²
The numerator can be expanded using yᵢ = β₀ + β₁xᵢ + uᵢ:
Σᵢ (xᵢ − x̄)yᵢ = Σᵢ (xᵢ − x̄)(β₀ + β₁xᵢ + uᵢ)
 = β₀ Σᵢ (xᵢ − x̄) + β₁ Σᵢ (xᵢ − x̄)xᵢ + Σᵢ (xᵢ − x̄)uᵢ
Since Σᵢ (xᵢ − x̄) = 0 and Σᵢ (xᵢ − x̄)xᵢ = Σᵢ (xᵢ − x̄)² = sₓ², the numerator equals β₁sₓ² + Σᵢ (xᵢ − x̄)uᵢ, and so
β̂₁ = β₁ + Σᵢ (xᵢ − x̄)uᵢ / sₓ²
Unbiasedness of OLS (cont)
Letting dᵢ = (xᵢ − x̄), so that β̂₁ = β₁ + (1/sₓ²) Σᵢ dᵢuᵢ, we have
E(β̂₁) = β₁ + (1/sₓ²) Σᵢ dᵢ E(uᵢ) = β₁
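A quick Monte Carlo sketch (made-up values, for intuition only) illustrates unbiasedness: averaging β̂₁ over many samples drawn from the same population gets close to the true β₁:

```python
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, n, reps = 1.0, 0.5, 50, 5000   # made-up values
x = rng.uniform(0, 10, n)                    # keep x fixed across replications
sx2 = np.sum((x - x.mean()) ** 2)

b1_draws = np.empty(reps)
for r in range(reps):
    u = rng.normal(0, 2, n)                  # errors with E(u|x) = 0
    y = beta0 + beta1 * x + u
    b1_draws[r] = np.sum((x - x.mean()) * (y - y.mean())) / sx2

print(b1_draws.mean())                       # close to beta1 = 0.5
```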
[Figure: Homoskedastic case, f(y|x) has the same variance around E(y|x) = β₀ + β₁x at x₁ and x₂]
Heteroskedastic Case
[Figure: Heteroskedastic case, the variance of f(y|x) around E(y|x) = β₀ + β₁x increases with x]
Variance of OLS (cont)
Var(β̂₁) = Var(β₁ + (1/sₓ²) Σᵢ dᵢuᵢ)
 = (1/sₓ²)² Var(Σᵢ dᵢuᵢ)
 = (1/sₓ²)² Σᵢ dᵢ² Var(uᵢ)
 = (1/sₓ²)² Σᵢ dᵢ² σ²
 = σ² (1/sₓ²)² Σᵢ dᵢ²
 = σ² (1/sₓ²)² sₓ²
 = σ²/sₓ² = Var(β̂₁)
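The same Monte Carlo idea (again with made-up values) can be used to check that the sampling variance of β̂₁ is close to σ²/sₓ² under homoskedasticity:

```python
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1, sigma, n, reps = 1.0, 0.5, 2.0, 50, 20000   # made-up values
x = rng.uniform(0, 10, n)
sx2 = np.sum((x - x.mean()) ** 2)

b1_draws = np.empty(reps)
for r in range(reps):
    y = beta0 + beta1 * x + rng.normal(0, sigma, n)       # homoskedastic errors
    b1_draws[r] = np.sum((x - x.mean()) * (y - y.mean())) / sx2

print(b1_draws.var(), sigma**2 / sx2)        # the two should be close
```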
Variance of OLS Summary
The larger the error variance, σ², the larger
the variance of the slope estimate
The larger the variability in the xi, the
smaller the variance of the slope estimate
As a result, a larger sample size should
decrease the variance of the slope estimate
Problem that the error variance is unknown
An unbiased estimator of σ² is σ̂² = (1/(n − 2)) Σᵢ ûᵢ² = SSR/(n − 2)
Error Variance Estimate (cont)
σ̂ = √σ̂² = standard error of the regression
Recall that sd(β̂₁) = σ/sₓ
If we substitute σ̂ for σ then we have
the standard error of β̂₁,
se(β̂₁) = σ̂ / (Σᵢ (xᵢ − x̄)²)^(1/2)
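Putting the last pieces together, a minimal sketch (made-up sample, not from the slides) of the error variance estimate, the standard error of the regression, and se(β̂₁):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 100)
y = 1.0 + 0.5 * x + rng.normal(0, 2, 100)    # made-up sample

xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar
uhat = y - b0 - b1 * x
n = len(y)

sigma2_hat = np.sum(uhat ** 2) / (n - 2)         # sigma-hat^2 = SSR/(n - 2)
ser = np.sqrt(sigma2_hat)                        # standard error of the regression
se_b1 = ser / np.sqrt(np.sum((x - xbar) ** 2))   # se(b1_hat)
print(sigma2_hat, ser, se_b1)
```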