Lecture 2 Simple Regression Model
$$Y = \underset{(18.11663)}{-250.568} + \underset{(87.17442)}{3671.397}\,X$$
The Simple Regression Model
Definition of the simple linear regression model
$$y = \beta_0 + \beta_1 x + u$$
$y$: dependent variable, explained variable, response variable, regressand, left-hand-side variable, ...
$x$: independent variable, explanatory variable, control variable, regressor, covariate, right-hand-side variable, ...
$u$: error term, disturbance, unobservables, ...
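For instance, written with the wage–education example that appears later in this lecture: $wage = \beta_0 + \beta_1\, educ + u$, where $u$ collects all other factors that affect the wage (such as unobserved intelligence, discussed below).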
QUESTION
1. A dependent variable is also known as a(n) _____.
a. explanatory variable
b. control variable
c. predictor variable
d. response variable
$$\frac{\Delta y}{\Delta x} = \beta_1 \quad \text{as long as} \quad \frac{\Delta u}{\Delta x} = 0$$
By how much does the dependent variable change if the independent variable is increased by one unit? This interpretation is only correct if all other things remain equal when the independent variable is increased by one unit (infeasible!).
In the wage–education example, the conditional mean independence assumption, $E(u \mid x) = E(u)$, is unlikely to hold, because individuals with more education will also be more intelligent on average.
Population Regression Function
Population regression function (PRF)
The conditional mean independence assumption, together with $E(u) = 0$, implies that
$$E(y \mid x) = E(\beta_0 + \beta_1 x + u \mid x) = \beta_0 + \beta_1 x + E(u \mid x) = \beta_0 + \beta_1 x$$
This means that the average value of the dependent variable can be expressed as a linear function of the explanatory variable.
Formally (compared to the ideal case):
$$\beta_1 = \frac{\Delta E(y \mid x)}{\Delta x} \quad \text{and} \quad \beta_0 = E(y \mid x = 0)$$
Population regression function (figure): for individuals with $x = x_2$, the average value of $y$ is $\beta_0 + \beta_1 x_2$.
A random sample
In order to estimate the regression model one needs data
A random sample of $n$ observations, $\{(x_i, y_i): i = 1, \ldots, n\}$: first observation $(x_1, y_1)$, second observation $(x_2, y_2)$, third observation $(x_3, y_3)$, ..., $n$-th observation $(x_n, y_n)$.
Fit a regression line through the data points as well as possible:
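"As well as possible" is made precise by ordinary least squares (OLS), the estimation method used throughout this lecture: choose the intercept and slope that minimize the sum of squared deviations of the data points from the line,
$$\min_{\hat\beta_0,\,\hat\beta_1} \sum_{i=1}^n \left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right)^2
\quad\Longrightarrow\quad
\hat\beta_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}, \qquad
\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}.$$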
Example 1
CEO Salary and return on equity
Fitted regression
Causal interpretation?
Figure: the fitted regression line (depends on the sample) versus the unknown population regression line.
Example 2
Wage and education
Fitted regression
Causal interpretation?
Example 3
Voting outcomes and campaign expenditures (two parties)
Fitted or predicted values: $\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$
Residuals (deviations from the regression line): $\hat{u}_i = y_i - \hat{y}_i$
Algebraic properties of the OLS regression:
$$\sum_{i=1}^n \hat{u}_i = 0 \qquad \text{and} \qquad \sum_{i=1}^n \hat{u}_i\, x_i = 0$$
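A minimal numerical check of these two properties, using a small synthetic sample (the data-generating values below are illustrative, not from the lecture):

import numpy as np

# Illustrative sketch: generate a synthetic sample, compute the OLS fit,
# and verify the two algebraic properties stated above.
rng = np.random.default_rng(42)
n = 50
x = rng.uniform(0.0, 10.0, size=n)
y = 2.0 + 0.7 * x + rng.normal(scale=1.5, size=n)   # arbitrary line plus noise

xbar, ybar = x.mean(), y.mean()
b1 = ((x - xbar) * (y - ybar)).sum() / ((x - xbar) ** 2).sum()  # OLS slope
b0 = ybar - b1 * xbar                                           # OLS intercept

y_hat = b0 + b1 * x      # fitted values
u_hat = y - y_hat        # residuals

print("sum of residuals:      ", u_hat.sum())        # approximately 0
print("sum of x_i * residuals:", (x * u_hat).sum())  # approximately 0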
Goodness-of-Fit
How well does the explanatory variable explain the dependent variable?
Measures of Variation
Decomposition of total variation
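For reference, a standard way to write this decomposition (in the SST = total, SSE = explained, SSR = residual sum-of-squares convention) and the resulting goodness-of-fit measure:
$$SST = \sum_{i=1}^n (y_i - \bar{y})^2, \qquad SSE = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2, \qquad SSR = \sum_{i=1}^n \hat{u}_i^2$$
$$SST = SSE + SSR, \qquad R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}$$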
For example: the growth rate of the wage is 8.3% per year of education:
$$\frac{\Delta wage}{wage} = \frac{0.83\,\$}{10\,\$} = 0.083 = 8.3\% \quad \text{for } \Delta educ = 1 \text{ year}$$
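This percentage interpretation corresponds to the log-level (semi-logarithmic) form; a sketch with the wage–education variables used above:
$$\log(wage) = \beta_0 + \beta_1\, educ + u, \qquad \%\Delta wage \approx 100 \cdot \beta_1 \cdot \Delta educ,$$
so an estimate of $\beta_1 = 0.083$ corresponds to a wage roughly 8.3% higher per additional year of education.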
Log-logarithmic form
CEO salary and firm sales
$$\log(salary) = \beta_0 + \beta_1 \log(sales) + u$$
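A standard reading of this log-log (constant elasticity) form:
$$\beta_1 \approx \frac{\%\Delta salary}{\%\Delta sales},$$
i.e., $\beta_1$ is the elasticity of salary with respect to sales: a 1% increase in sales is associated with a $\beta_1$ percent change in salary, holding the other factors in $u$ fixed.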
Expected values and variances
of the OLS estimators
The estimated regression coefficients are random variables
because they are calculated from a random sample
The data are random and depend on the particular sample that has been drawn.
Standard Assumptions
The data $\{(x_i, y_i): i = 1, \ldots, n\}$ are a random sample drawn from the population.
Think of drawing a worker at random, throwing the worker back into the population, and repeating the random draw $n$ times.
The wages and years of education of the sampled workers are used to estimate the linear relationship between wages and education.
$(x_i, y_i)$ are the values drawn for the $i$-th worker.
Interpretation of unbiasedness
The estimated coefficients may be smaller or larger, depending on the sample
that is the result of a random draw
However, on average, they will be equal to the values that characterize the
true relationship between y and x in the population
“On average” means: if sampling were repeated, i.e. if drawing the random sample and doing the estimation were repeated many times.
In a given sample, estimates may differ considerably from the true values.
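A minimal simulation sketch of this repeated-sampling idea (the population values $\beta_0 = 1$, $\beta_1 = 0.5$ and the distributions below are illustrative choices, not from the lecture):

import numpy as np

# Repeated-sampling sketch: draw many random samples from a known population
# model, estimate the slope by OLS each time, and look at the average estimate.
rng = np.random.default_rng(0)
beta0, beta1 = 1.0, 0.5        # true population coefficients (illustrative)
n, replications = 100, 10_000  # sample size and number of repeated samples

estimates = np.empty(replications)
for r in range(replications):
    x = rng.normal(loc=5.0, scale=2.0, size=n)   # explanatory variable
    u = rng.normal(loc=0.0, scale=1.0, size=n)   # error term with E(u|x) = 0
    y = beta0 + beta1 * x + u                    # population model
    xbar, ybar = x.mean(), y.mean()
    estimates[r] = ((x - xbar) * (y - ybar)).sum() / ((x - xbar) ** 2).sum()

print("true slope:            ", beta1)
print("average OLS estimate:  ", estimates.mean())  # close to 0.5 (unbiasedness)
print("spread across samples: ", estimates.std())   # sampling variability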
Variances of the OLS estimators
Homoskedasticity assumption:
$$\operatorname{Var}(u_i \mid x_i) = E(u_i^2 \mid x_i) = \sigma^2$$
The variance of the error term does not depend on the value of the explanatory variable; under heteroskedasticity, by contrast, $\operatorname{Var}(u_i \mid x_i)$ varies with $x_i$.
An example of heteroskedasticity: wage and education
CAR CONSUMPTION VS INCOME OR WEALTH (another illustration of heteroskedasticity)
[Figures: Mark Zuckerberg, Bill Gates, Trump]
QUESTION
7. Which of the following is a nonlinear regression model?
a. y = β0 + β1x^(1/2) + u
b. log y = β0 + β1 log x + u
c. y = 1 / (β0 + β1x) + u
d. y = β0 + β1x + u
Conclusion:
The sampling variability of the estimated regression coefficients is higher, the larger the variability of the unobserved factors, and lower, the higher the variation in the explanatory variable.
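For reference, this conclusion summarizes the standard sampling-variance formulas for the OLS estimators under random sampling and homoskedasticity:
$$\operatorname{Var}(\hat\beta_1) = \frac{\sigma^2}{\sum_{i=1}^n (x_i - \bar{x})^2}, \qquad
\operatorname{Var}(\hat\beta_0) = \frac{\sigma^2\, n^{-1} \sum_{i=1}^n x_i^2}{\sum_{i=1}^n (x_i - \bar{x})^2}.$$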
Estimating the error variance
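For reference, the standard unbiased estimator of the error variance replaces the unobserved errors with the OLS residuals and corrects the degrees of freedom for the two estimated parameters:
$$\hat\sigma^2 = \frac{1}{n-2} \sum_{i=1}^n \hat{u}_i^2 = \frac{SSR}{n-2}.$$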
Unbiasedness of the error variance
Theorem 2.3 (Unbiasedness of the error variance): $E(\hat\sigma^2) = \sigma^2$.
When calculating standard errors of the regression coefficients, plug in $\hat\sigma^2$ for the unknown $\sigma^2$.
UNBIASEDNESS OF OLS
$$\hat\beta_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})\, y_i}{s_x^2}, \quad \text{where } s_x^2 = \sum_{i=1}^n (x_i - \bar{x})^2$$
The numerator:
$$\sum_{i=1}^n (x_i - \bar{x})\, y_i = \sum_{i=1}^n (x_i - \bar{x})(\beta_0 + \beta_1 x_i + u_i)
= \beta_0 \sum_{i=1}^n (x_i - \bar{x}) + \beta_1 \sum_{i=1}^n (x_i - \bar{x})\, x_i + \sum_{i=1}^n (x_i - \bar{x})\, u_i$$
UNBIASEDNESS OF OLS (CONT)
Since
$$\sum_{i=1}^n (x_i - \bar{x}) = 0 \qquad \text{and} \qquad \sum_{i=1}^n (x_i - \bar{x})\, x_i = \sum_{i=1}^n (x_i - \bar{x})^2,$$
it follows that
$$\hat\beta_1 = \beta_1 + \frac{\sum_{i=1}^n (x_i - \bar{x})\, u_i}{s_x^2}$$
and
$$E(\hat\beta_1) = \beta_1 + \frac{\sum_{i=1}^n (x_i - \bar{x})\, E(u_i \mid x_i)}{s_x^2} = \beta_1.$$
$$E(\hat\beta_0) = \;?$$
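A sketch of the analogous step for the intercept, under the same assumptions:
$$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} = \beta_0 + \beta_1 \bar{x} + \bar{u} - \hat\beta_1 \bar{x}
\quad\Longrightarrow\quad
E(\hat\beta_0) = \beta_0 + \beta_1 \bar{x} + E(\bar{u}) - E(\hat\beta_1)\, \bar{x} = \beta_0.$$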