Week 2
Applied Econometrics
ECO440/ECO640
Niagara University
Lecture Outline
• Goal of OLS Regression: Estimating Empirical Regression
Equation
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$ (2.2)
• However, this equation should best “fit” the underlying relationship in the data
– Based on our data, what should be the value of $\hat{\beta}_1$?
– How does an additional year of schooling translate into additional wages?
Possible Empirical Regression Candidates
• Do any of these possible empirical regression equations seem to fit the
data well?
• Consider the errors in prediction (residuals) between the data and the
regression line ($e_i = Y_i - \hat{Y}_i$)
• $\hat{Y}_i$ = level of wages that is predicted, given a specific level of education
• Consider the slope and what that implies for the relationship between
education and wages
Preferred Empirical Regression Equation
• Best fitted regression line is seen below, chosen using a
method called OLS regression
• Why is this equation preferable to the others considered on
the previous slide?
Ordinary Least Squares Regression
(i = 1,2,…,N)
OLS Estimate Solutions for Single
Variable Regression
• The OLS solution estimates for the parameters in our
theoretical regression equation (2.1) are seen below:
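$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$ (2.1)
$\hat{\beta}_1 = \dfrac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}$ (2.4)
$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$ (2.5)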
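As a minimal sketch of how equations (2.4) and (2.5) turn into a computation, the Python function below applies them term by term. The function name `ols_simple` and the small schooling/wage data set are made up for illustration; they are not from the lecture.

```python
def ols_simple(x, y):
    """Return (beta0_hat, beta1_hat) for a single-variable regression of y on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length and at least 2 observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator of (2.4): sum of cross-products of deviations from the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator of (2.4): sum of squared deviations of X from its mean
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("X must vary across observations")
    beta1_hat = sxy / sxx
    beta0_hat = y_bar - beta1_hat * x_bar  # equation (2.5)
    return beta0_hat, beta1_hat


# Hypothetical data: years of schooling and hourly wage
schooling = [10, 12, 12, 14, 16, 18]
wage = [11.0, 14.0, 13.0, 17.0, 20.0, 24.0]
b0, b1 = ols_simple(schooling, wage)
print(f"beta0_hat = {b0:.3f}, beta1_hat = {b1:.3f}")
```

The slope printed here is read the same way as in the lecture: the estimated change in the wage associated with one additional year of schooling, for this invented sample only.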
Intuition of OLS Slope Coefficient Estimate
• Let’s consider the OLS estimate for $\hat{\beta}_1$ to add some intuition behind the
mathematical form of our estimates
• Numerator represents the covariance between X and Y
– Covariance of X and Y: Statistically tells us how closely the
movements of our independent and dependent variables are related
– Important as $\hat{\beta}_1$ is capturing how changing the X variable by 1 unit
impacts the variable of interest Y
– Can the numerator be positive or negative?
– As the relationship between X and Y becomes stronger, what
should that mean for the magnitude of $\hat{\beta}_1$?
$\hat{\beta}_1 = \dfrac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}$
Intuition of OLS Slope Coefficient Estimate
• Denominator represents the variance of X
– Variance of X: Statistically tells us the dispersion of values of X
(i.e., does X take on a wide or narrow range of values?)
Think about X as gender, years of education, age of worker:
which X will likely have the greatest (smallest) variance?
– Can the denominator be positive or negative?
– Would your choice of units likely matter?
Let’s say X is one’s salary and it’s measured in dollars or
thousands of dollars
Will that choice impact the variance of X?
$\hat{\beta}_1 = \dfrac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}$
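The units question can be checked numerically. The sketch below measures the same salaries in dollars and in thousands of dollars and compares the denominator of the slope formula and the resulting slope. The helper names and all data values (including the `spending` outcome) are hypothetical, chosen only to illustrate the scaling.

```python
def sum_sq_dev(x):
    """Sum of squared deviations of x from its mean (denominator of the slope formula)."""
    x_bar = sum(x) / len(x)
    return sum((xi - x_bar) ** 2 for xi in x)


def slope(x, y):
    """OLS slope: cross-product sum divided by the squared-deviation sum of x."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sum_sq_dev(x)


# Hypothetical salaries, first in dollars, then rescaled to thousands of dollars
salary_dollars = [30000.0, 45000.0, 52000.0, 61000.0, 80000.0]
salary_thousands = [s / 1000 for s in salary_dollars]
spending = [21.0, 30.0, 33.0, 41.0, 50.0]  # hypothetical Y variable

ratio_var = sum_sq_dev(salary_dollars) / sum_sq_dev(salary_thousands)
ratio_slope = slope(salary_thousands, spending) / slope(salary_dollars, spending)
print(f"denominator ratio (dollars / thousands): {ratio_var:.1f}")
print(f"slope ratio (thousands / dollars):       {ratio_slope:.1f}")
```

Dividing X by 1,000 shrinks the denominator by a factor of 1,000 squared and the numerator by a factor of 1,000, so the slope grows by a factor of 1,000: a one-unit change in X now means a thousand dollars rather than one dollar.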
Intuition of OLS Slope Coefficient Estimate
• $\hat{\beta}_1$ represents the ratio of the covariance of X,Y to the variance of X
– Interpretation: We are weighting the relationship between X and Y
by the dispersion in the X variable
A strong relation between X and Y leads to larger values of $\hat{\beta}_1$ (in
absolute value terms)
However, as $\hat{\beta}_1$ is an estimate of the marginal effect of increasing
X by 1 unit on Y, this relationship between X and Y should be
scaled to account for the range of values of X
$\hat{\beta}_1 = \dfrac{590.20}{92.50} = 6.38$
$\hat{\beta}_0 = 169.4 - (6.38 \times 10.35) = 103.4$
$\hat{Y}_i = 103.4 + 6.38 X_i$
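The arithmetic in this worked example can be reproduced in a few lines. The four inputs are the numbers shown on the slide; their roles (cross-product sum, squared-deviation sum, mean of Y, mean of X) are inferred from equations (2.4) and (2.5), and the variable names are only illustrative.

```python
# Inputs taken from the worked example; roles inferred from (2.4) and (2.5)
sum_cross_products = 590.20  # numerator of (2.4)
sum_squares_x = 92.50        # denominator of (2.4)
y_bar = 169.4                # sample mean of Y
x_bar = 10.35                # sample mean of X

beta1_hat = sum_cross_products / sum_squares_x
beta0_hat = y_bar - beta1_hat * x_bar  # equation (2.5)

print(f"beta1_hat = {beta1_hat:.2f}")  # 6.38
print(f"beta0_hat = {beta0_hat:.1f}")  # 103.4
```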
$\widehat{FINAID}_i = 8927 - 357\,PARENT_i + 87.4\,HSRANK_i$ (2.11)
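To see what the coefficients in (2.11) mean when the other variable is held constant, here is a small sketch. The function name `predict_finaid` and the input values are hypothetical, and the units of PARENT and HSRANK are whatever the underlying data use; they are not stated in this part of the slides.

```python
def predict_finaid(parent, hsrank):
    """Predicted financial aid from estimated equation (2.11)."""
    return 8927 - 357 * parent + 87.4 * hsrank


# Hypothetical student; the units of both inputs are assumed, not given here
base = predict_finaid(parent=10, hsrank=80)

# Raise PARENT by one unit, holding HSRANK constant
change_parent = predict_finaid(parent=11, hsrank=80) - base
# Raise HSRANK by one unit, holding PARENT constant
change_hsrank = predict_finaid(parent=10, hsrank=81) - base

print(f"one more unit of PARENT: {change_parent:+.1f}")
print(f"one more unit of HSRANK: {change_hsrank:+.1f}")
```

Each difference equals the corresponding coefficient, which is the multivariate interpretation: the change in predicted aid for a one-unit change in one variable with the other variable unchanged.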
Multivariate Regression
Interpretation Example 2 : Financial
Aid
Figure 2.1 Financial Aid as a Function of Parents’ Ability to Pay
Multivariate Regression
Interpretation Example 2 : Financial
Aid
Figure 2.2 Financial Aid as a Function of High School Rank
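$\sum_{i=1}^{N} (Y_i - \bar{Y})^2 = \sum_{i=1}^{N} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{N} e_i^2$ (2.13)
$R^2 = \dfrac{ESS}{TSS} = 1 - \dfrac{RSS}{TSS} = 1 - \dfrac{\sum e_i^2}{\sum (Y_i - \bar{Y})^2}$ (2.14)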
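The decomposition in (2.13) and the two equivalent forms of $R^2$ in (2.14) can be verified on a small sample. This is a sketch with invented data; `fit_line` is a hypothetical helper that applies the single-variable OLS formulas. The three terms of (2.13) are labeled TSS, ESS, and RSS, in that order.

```python
def fit_line(x, y):
    """Single-variable OLS: returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 9.1, 12.5]

b0, b1 = fit_line(x, y)
y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

tss = sum((yi - y_bar) ** 2 for yi in y)      # total sum of squares
ess = sum((yh - y_bar) ** 2 for yh in y_hat)  # explained sum of squares
rss = sum(e ** 2 for e in residuals)          # residual sum of squares
r2 = 1 - rss / tss

print(f"TSS = {tss:.4f}, ESS + RSS = {ess + rss:.4f}")
print(f"R^2 = {r2:.4f}, ESS/TSS = {ess / tss:.4f}")
```

TSS depends only on Y and its mean, not on the fitted values, which is why it is the same no matter which regressors are in the equation.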
• Adding a variable will not change TSS. Why?
– Adding a variable will, in most cases, decrease RSS
and increase $R^2$
– Even if the added variable is nonsensical, $R^2$ will
increase, unless the OLS coefficient for that added
variable is exactly zero
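A quick simulation illustrates the point about nonsensical variables. The sketch below fits one regression with a relevant X and a second that adds a pure-noise regressor, then compares $R^2$. Everything here is an assumption made for illustration: the simulated data, the seed, and the helper names `ols_fit` and `r_squared`, which solve the normal equations by Gauss-Jordan elimination in plain Python.

```python
import random


def ols_fit(columns, y):
    """OLS with an intercept via the normal equations (X'X) b = X'y.

    columns is a list of regressors, each a list of N values.
    Returns the coefficients with the intercept first.
    """
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[p] * r[q] for r in rows) for q in range(k)]
         + [sum(r[p] * yi for r, yi in zip(rows, y))] for p in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(k):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


def r_squared(columns, y):
    """R^2 = 1 - RSS/TSS for the OLS fit of y on the given regressors."""
    beta = ols_fit(columns, y)
    n = len(y)
    y_bar = sum(y) / n
    fitted = [beta[0] + sum(b * col[i] for b, col in zip(beta[1:], columns))
              for i in range(n)]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - rss / tss


random.seed(0)
n = 20
x = [random.uniform(0, 10) for _ in range(n)]
y = [2.0 + 1.5 * xi + random.gauss(0, 2) for xi in x]
noise = [random.gauss(0, 1) for _ in range(n)]  # unrelated to y by construction

r2_one = r_squared([x], y)
r2_two = r_squared([x, noise], y)
print(f"R^2 with X only:      {r2_one:.4f}")
print(f"R^2 with X and noise: {r2_two:.4f}")
```

The second $R^2$ cannot be lower than the first, because OLS could always reproduce the one-variable fit by setting the coefficient on the noise variable to zero.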
Example of the Interpretative
Limitations of $R^2$
Example: Chapter 1 weight-guessing regression
$R^2 = 0.74$
$R^2 = 0.75$
Additional Variables, Degrees of
Freedom, and $\bar{R}^2$
• Including post office box demonstrates the problems of r-
squared and the need for a related alternative
• Furthermore, the inclusion of the post office box variable
requires the estimation of a coefficient.
– This lessens the degrees of freedom, or the excess of
the number of observations (N) over the number of coefficients
(including the intercept) estimated (K+1).
– The lower the degrees of freedom, the less reliable the
estimates are likely to be.
• Thus, the increase in the quality of fit needs to be
compared to the decrease in the degrees of freedom.
– $\bar{R}^2$ (adjusted $R^2$) was developed for this purpose.
Using $\bar{R}^2$ To Describe the Overall Fit of
the Estimated Model
• $\bar{R}^2$ measures the percentage of the variation of Y around its mean that
is explained by the regression equation, adjusted for degrees
of freedom.
$\bar{R}^2 = 1 - \dfrac{\sum e_i^2 / (N - K - 1)}{\sum (Y_i - \bar{Y})^2 / (N - 1)}$ (2.15)
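Equation (2.15) can be sketched as a short function to show how the degrees-of-freedom adjustment can reverse the ranking of two models. The function name `adjusted_r2` is hypothetical, and the sample size (N = 20) and sums of squares below are invented; they are chosen only so that the unadjusted $R^2$ values are 0.74 and 0.75.

```python
def adjusted_r2(rss, tss, n, k):
    """Adjusted R^2 from equation (2.15); k counts slope coefficients only."""
    df = n - k - 1  # degrees of freedom: N minus the K+1 estimated coefficients
    if df <= 0:
        raise ValueError("need more observations than estimated coefficients")
    return 1 - (rss / df) / (tss / (n - 1))


# Hypothetical values: same TSS, one extra regressor lowers RSS slightly
n, tss = 20, 100.0
rss_one, rss_two = 26.0, 25.0  # unadjusted R^2 of 0.74 and 0.75

adj_one = adjusted_r2(rss_one, tss, n, k=1)
adj_two = adjusted_r2(rss_two, tss, n, k=2)
print(f"K = 1: R^2 = {1 - rss_one / tss:.2f}, adjusted R^2 = {adj_one:.4f}")
print(f"K = 2: R^2 = {1 - rss_two / tss:.2f}, adjusted R^2 = {adj_two:.4f}")
```

With these made-up numbers the unadjusted $R^2$ rises while the adjusted value falls, because the small drop in RSS does not offset the lost degree of freedom.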
$MOZZARELLA_t = 0.85 + 0.378\,INCOME_t$ (2.16)
$N = 10 \quad \bar{R}^2 = 0.88$
where:
$MOZZARELLA_t$ = U.S. per capita consumption of mozzarella
cheese (in pounds) in year t
$INCOME_t$ = U.S. real disposable per capita income (in thousands
of dollars) in year t
• On a hunch, add in a new variable:
$N = 10 \quad \bar{R}^2 = 0.97$