Econometrics Lecture 2: Simple Regression
▶ The simple regression model can be used to study the relationship between two variables.
THE SIMPLE REGRESSION MODEL
INTERPRETATION OF THE SIMPLE LINEAR REGRESSION MODEL
▶ The simple linear regression model is rarely applicable in practice but its discussion is
useful for pedagogical reasons.
EXAMPLES
Linearity: a one-unit change in x has the same effect on y, regardless of the initial value of x. This is often unrealistic. For example, in the wage-education example, we might want to allow for increasing returns: the next year of education has a larger effect on wages than the previous year did.
CAUSAL INTERPRETATION
POPULATION REGRESSION FUNCTION (PRF)
▶ This means that the average value of the dependent variable can be expressed as a linear
function of the explanatory variable.
▶ It is important to understand that the equation tells us how the average value of y changes
with x; it does not say that y equals β0 + β1 x for all units in the population.
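In equation form (standard notation, consistent with the β0 and β1 used throughout these slides):

```latex
% Population regression function: the conditional mean of y is linear in x
E(y \mid x) = \beta_0 + \beta_1 x
% Individual outcomes deviate from this mean by an error term u with E(u \mid x) = 0
y = \beta_0 + \beta_1 x + u
```

The second line makes the point above explicit: individual units differ from the line by the error u, so only the average of y, not every y, lies on the line.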
GRAPHICAL FORM
This graph represents the linear relationship between the dependent variable y and the independent variable x for the entire population. It is important to note that we do not observe the entire population. Instead, we typically work with a sample.
DERIVING THE ORDINARY LEAST SQUARES (OLS) ESTIMATES
The residual for observation i is the difference between the actual yi and its fitted value: ûi = yi − ŷi = yi − β̂0 − β̂1 xi.
We choose β̂0 and β̂1 to make the sum of squared residuals as small as possible; this gives the same result as the method of moments.
▶ OLS estimators: β̂1 = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)² and β̂0 = ȳ − β̂1 x̄
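The closed-form OLS estimators can be computed by hand. A minimal Python sketch on hypothetical toy data (the lecture's own scripts are in R on Moodle; the numbers here are made up for illustration):

```python
# OLS by hand: slope = sample covariance of (x, y) / sample variance of x,
# intercept = ybar - slope * xbar. Toy data, hypothetical.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.9]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
cov_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
var_x = sum((xi - xbar) ** 2 for xi in x)
beta1 = cov_xy / var_x            # slope estimate
beta0 = ybar - beta1 * xbar       # intercept estimate
print(beta0, beta1)
```

The same two formulas are what any regression routine (for example R's `lm`) computes in the simple-regression case.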
Interpreting β1
1. The more x and y move together (high covariance), the larger β̂1: covariance measures the strength of the linear association between x and y, so a strong positive association yields a larger slope estimate, reflecting that changes in x strongly track changes in y.
2. Holding the covariance fixed, a larger variance of x yields a smaller β̂1: the slope is the covariance divided by the variance of x, so the same co-movement spread over more variation in x translates into a smaller estimated effect per unit change in x.
R stats: Let’s see how to calculate these coefficients in R. Open on Moodle R scripts / Lecture 2
- Simple OLS
▶ OLS fits a regression line through the data points as well as possible
EXAMPLE OF A SIMPLE REGRESSION
Fitted regression
▶ Causal interpretation?
PROPERTIES OF OLS ON ANY SAMPLE OF DATA
R stats: Let’s see if we can confirm the OLS properties in R. Open on Moodle R scripts /
Lecture 2 - Simple OLS
GOODNESS OF FIT
Measures of variation:
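The three standard measures (total, explained, and residual sums of squares) and R-squared can be sketched in Python on hypothetical toy data (the lecture's scripts are in R; the identity SST = SSE + SSR and R² = SSE/SST are the standard definitions):

```python
# Decomposing the variation in y: SST = total sum of squares,
# SSE = explained sum of squares, SSR = residual sum of squares,
# R^2 = SSE / SST. Toy data, hypothetical.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.9]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]                       # fitted values
SST = sum((yi - ybar) ** 2 for yi in y)                 # total variation
SSE = sum((yh - ybar) ** 2 for yh in yhat)              # explained variation
SSR = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))    # residual variation
r2 = SSE / SST
print(round(SST, 3), round(SSE + SSR, 3), round(r2, 3))
```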
So a low R-squared here means that the single explanatory variable x explains only a small part of the variation in salaries.
Caution: A high R-squared does not necessarily mean that the regression has a causal
interpretation!
INCORPORATING NONLINEARITIES: SEMI-LOGARITHMIC FORM
We might want to allow for increasing returns: the next year of education has a larger effect on
wages than did the previous year.
Fitted regression
▶ The log-log form postulates a constant-elasticity model, whereas the semi-log form assumes a constant semi-elasticity
▶ Linear regression must be linear in the parameters (coefficients), but the variables can undergo nonlinear transformations such as x^2, log(x), or e^x.
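A Python sketch of the semi-log idea on made-up data (not the lecture's wage data): regressing log(y) on x turns the slope into an approximate proportional effect, so 100·β1 is roughly the percentage change in y per one-unit change in x.

```python
import math

# Toy data where y grows exactly 8% per unit of x (hypothetical numbers).
x = [0, 1, 2, 3, 4, 5]
y = [10.0 * 1.08 ** xi for xi in x]

# Transform the dependent variable, then run ordinary OLS on log(y).
ly = [math.log(yi) for yi in y]
n = len(x)
xbar, lbar = sum(x) / n, sum(ly) / n
b1 = sum((xi - xbar) * (li - lbar) for xi, li in zip(x, ly)) / \
     sum((xi - xbar) ** 2 for xi in x)

# In log(y) = b0 + b1*x, the slope recovers log(1.08) ~ 0.077,
# i.e. about a 7.7% (~8%) increase in y per unit of x.
print(round(b1, 3))
```

Note that the regression is still linear in the parameters; only the variable y was transformed.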
EXPECTED VALUES AND VARIANCES OF THE OLS ESTIMATORS
▶ The estimated regression coefficients are random variables because they are calculated
from a random sample
▶ The question is what the estimators estimate on average and how large their variability is in repeated samples
STANDARD ASSUMPTIONS FOR THE LINEAR REGRESSION MODEL
ASSUMPTIONS FOR THE LINEAR REGRESSION MODEL (CONT.)
Interpretation of unbiasedness:
▶ The estimated coefficients may be smaller or larger, depending on the sample resulting
from a random draw. However, on average, they will be equal to the values that
characterize the true relationship between y and x in the population.
▶ “On average” means across repeated sampling, i.e. if drawing a random sample and estimating the model were repeated many times.
▶ In a given sample, estimates may differ considerably from the true values. But if we repeated the estimation over many different samples, we would obtain the true values on average.
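This repeated-sampling idea can be demonstrated with a small Monte Carlo simulation (a sketch with hypothetical parameters, not part of the lecture's R scripts):

```python
import random

random.seed(1)

# Unbiasedness sketch: the true model is y = 1 + 2x + u with a standard
# normal error u (hypothetical parameters). Each replication draws a fresh
# sample and re-estimates the OLS slope; the estimates average close to 2.
def ols_slope(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
           sum((a - xbar) ** 2 for a in x)

slopes = []
for _ in range(2000):
    x = [random.uniform(0, 10) for _ in range(50)]
    y = [1 + 2 * xi + random.gauss(0, 1) for xi in x]
    slopes.append(ols_slope(x, y))

mean_slope = sum(slopes) / len(slopes)
print(mean_slope)  # close to the true value 2 on average
```

Any single replication can be noticeably off from 2; it is the average across replications that hits the true value.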
Assumption SLR.5 (homoskedasticity) plays no role in showing that β̂0 and β̂1 are unbiased. We add Assumption SLR.5 because it simplifies the variance calculations and because it implies that OLS has certain efficiency properties, which we will see next class.
Homoskedasticity does not hold: more educated people likely have a wider variety of job
opportunities, which could lead to more wage variability at higher levels of education. People
with low levels of education have fewer opportunities and often must work at the minimum wage;
this reduces wage variability at low education levels.
Conclusion:
▶ The sampling variability of the estimated regression coefficients increases with the variability of the unobserved factors and decreases with the variation in the explanatory variable.
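This tradeoff appears directly in the textbook sampling-variance formula for the slope under assumptions SLR.1 to SLR.5 (a standard result, stated here for reference):

```latex
\operatorname{Var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
```

The error variance σ² (the unobserved factors) sits in the numerator, and the total variation in x sits in the denominator, exactly matching the conclusion above.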
ESTIMATING THE ERROR VARIANCE
The formulas on the previous slides isolate the factors that contribute to Var(β̂1) and Var(β̂0). But these variances are unknown, because σ² is an unknown population parameter. Nevertheless, we can use the data to estimate σ², which in turn allows us to estimate Var(β̂1) and Var(β̂0).
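A Python sketch of the estimation step on hypothetical toy data: the unbiased estimator of σ² divides the sum of squared residuals by n − 2 (two degrees of freedom are used up estimating β̂0 and β̂1), and its square root feeds into the standard error of the slope.

```python
import math

# Estimating sigma^2 and the standard error of the slope. Toy data, hypothetical.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.9]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sst_x = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x
b0 = ybar - b1 * xbar

# Sum of squared residuals, then the unbiased error-variance estimator SSR/(n-2).
ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sigma2_hat = ssr / (n - 2)
se_b1 = math.sqrt(sigma2_hat / sst_x)   # standard error of the slope
print(round(sigma2_hat, 4), round(se_b1, 4))
```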
▶ The estimated standard deviations of the regression coefficients are called “standard
errors.” They measure how precisely the regression coefficients are estimated.
REGRESSION ON A BINARY EXPLANATORY VARIABLE
▶ This regression allows the mean value of y to differ depending on whether x = 0 or x = 1
▶ Note that the statistical properties of OLS are unchanged when x is binary
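With a binary x, the OLS estimates have a simple closed form: the intercept is the mean of y in the x = 0 group and the slope is the difference in group means. A Python sketch on made-up numbers:

```python
# Regression on a binary explanatory variable (toy data, hypothetical):
# b0 equals the mean of y when x = 0, and b1 equals the difference in
# sample means of y between the x = 1 and x = 0 groups.
x = [0, 0, 0, 1, 1, 1]
y = [3.0, 4.0, 5.0, 7.0, 8.0, 9.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

mean0 = sum(yi for xi, yi in zip(x, y) if xi == 0) / 3
mean1 = sum(yi for xi, yi in zip(x, y) if xi == 1) / 3
print(b0, b1)  # b0 = mean of y in the x=0 group, b1 = mean1 - mean0
```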
REFERENCES
Heiss, F. (2020). Using R for Introductory Econometrics, 2nd edition. Chapter 2: The Simple Regression Model.