Ch3 Multiple Regression
Multiple Regression
Outline
1. Multiple Regression Equation
2. The Three-Variable Model: Notation and
Assumptions
3. OLS Estimation for the three-variable
model
4. Properties of OLS estimators
5. Goodness of fit – R2 and adjusted R2
6. More on Functional Form
7. Hypothesis Testing in Multiple Regression
1. Multiple regression equation
Yi = β1 + β2X2i + ... + βkXki + ui
• Y = One dependent variable (criterion)
• X = Two or more independent variables (predictor
variables).
• ui = the stochastic disturbance term
• β1 is the intercept
• βk measures the change in Y with respect to Xk,
holding other factors fixed.
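As a minimal illustration (not part of the lecture), the coefficients of such an equation can be estimated by ordinary least squares. The sketch below solves the normal equations (X'X)b = X'y in plain Python on made-up data; the data values and the exact-recovery setup are invented for the example.

```python
# Minimal OLS sketch for Y_i = b1 + b2*X2_i + b3*X3_i + u_i,
# solving the normal equations (X'X) b = X'y by Gaussian elimination.
# All data below are made up for illustration.

def ols(X, y):
    """Return the OLS coefficients. X is a list of rows (first entry 1)."""
    k = len(X[0])
    # Build the system A b = c with A = X'X and c = X'y.
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    # Forward elimination (X'X is positive definite, so no pivoting needed).
    for p in range(k):
        for q in range(p + 1, k):
            f = A[q][p] / A[p][p]
            A[q] = [aq - f * ap for aq, ap in zip(A[q], A[p])]
            c[q] -= f * c[p]
    # Back-substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b

# Toy data generated exactly from Y = 1 + 2*X2 + 3*X3 (zero error),
# so OLS should recover the coefficients.
X = [[1, x2, x3] for x2, x3 in [(0, 1), (1, 0), (2, 1), (3, 2), (1, 2)]]
y = [1 + 2 * r[1] + 3 * r[2] for r in X]
b1, b2, b3 = ols(X, y)
```

With noisy data the same routine returns the least-squares fit rather than the exact coefficients.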
Motivation for multiple regression
– Incorporate more explanatory factors into the model
– Explicitly hold fixed other factors that otherwise
would be in u
– Allow for more flexible functional forms
• Example: Wage equation
wage = β1 + β2educ + β3exper + u
(education and experience in the model; other factors remain in u)
• Example: CEO salary
log(salary) = β1 + β2log(sales) + β3ceoten + β4ceoten² + u
(log of CEO salary explained by log sales and a quadratic function of CEO tenure with the firm; other factors remain in u)
3. OLS estimation for the three-variable model
• OLS chooses β̂1, β̂2, β̂3 to minimize the residual sum of squares:
min Σ ûi² = Σ (Yi − β̂1 − β̂2X2i − β̂3X3i)²
Example- Stata output
• Model: wage = f(educ,exper )
. reg wage educ exper
• Property of the OLS residuals: Σi Ŷi ûi = 0 (the fitted values are uncorrelated with the residuals)
4. Properties of OLS estimators
Gauss-Markov Theorem: β̂1, β̂2, ..., β̂k are the best
linear unbiased estimators (BLUEs) of β1, β2, ..., βk
• An estimator β̃j is an unbiased estimator of βj if
E(β̃j) = βj
• An estimator β̃j of βj is linear if and only if it can
be expressed as a linear function of the data on the
dependent variable: β̃j = Σi wij yi
• An unbiased estimator of σ² = E(ui²):
σ̂² = Σ ûi² / (n − k)
• RSS/σ² follows the χ² distribution with df = number of
observations − number of estimated parameters = n − k
The positive square root σ̂ is called the standard error of the regression
(SER) (or Root MSE). SER is an estimator of the standard
deviation of the error term.
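A quick numeric sketch of the estimator above, with made-up RSS, n, and k:

```python
# sigma2_hat = RSS/(n - k) is the unbiased estimator of the error variance;
# SER = sqrt(sigma2_hat) is what Stata reports as Root MSE.
# The numbers below are hypothetical.
import math

RSS = 391.6     # residual sum of squares (made up)
n, k = 526, 3   # observations and estimated parameters (made up)

sigma2_hat = RSS / (n - k)   # divide by degrees of freedom, not by n
ser = math.sqrt(sigma2_hat)  # standard error of the regression
```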
4. Properties of OLS estimators
Var(β̂j) = σ² / [TSSj (1 − Rj²)]
• where TSSj = Σ (xij − x̄j)² is the total sample
variation in xj and Rj² is the R-squared from
regressing xj on all other independent
variables (and including an intercept).
• Since σ² is unknown, we replace it with its
estimator σ̂². Standard error:
se(β̂j) = σ̂ / [TSSj (1 − Rj²)]^(1/2)
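A numeric sketch of this formula, with made-up values, showing how collinearity inflates the standard error:

```python
# se(b_j) = sigma_hat / sqrt(TSS_j * (1 - R_j^2)). A higher R_j^2 (x_j well
# explained by the other regressors) shrinks the denominator and inflates
# the standard error. All numbers are hypothetical.
import math

sigma_hat = 0.865   # hypothetical estimate of sigma
TSS_j = 140.0       # hypothetical total sample variation in x_j

se_low  = sigma_hat / math.sqrt(TSS_j * (1 - 0.10))  # mild collinearity
se_high = sigma_hat / math.sqrt(TSS_j * (1 - 0.90))  # severe collinearity
# The ratio se_high/se_low = sqrt(0.90/0.10) = 3: the standard error triples.
```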
5. A measure of “Goodness of fit”
• Decomposition of total variation: TSS = ESS + RSS
• The adjusted R² replaces sums of squares with variance estimates:
R̄² = 1 − [RSS/(n − k)] / [TSS/(n − 1)] = 1 − (1 − R²)(n − 1)/(n − k)
where k = the number of parameters in the model including the
intercept term.
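A small sketch of the adjusted-R² formula (the R² values and sample size below are made up):

```python
# R2_adj = 1 - (1 - R2) * (n - 1) / (n - k). Adding a nearly useless
# regressor raises R2 slightly but can lower the adjusted R2.

def adjusted_r2(r2, n, k):
    """k counts all parameters, including the intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

# Hypothetical fits on n = 30 observations:
base = adjusted_r2(0.400, 30, 3)   # two regressors plus intercept
more = adjusted_r2(0.405, 30, 4)   # one extra regressor, tiny R2 gain
# Here more < base: the adjusted R2 penalizes the extra regressor.
```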
R2 and the adjusted R2
• It is good practice to use the adjusted R2 rather than R2
because R2 tends to give an overly optimistic
picture of the fit of the regression, particularly
when the number of explanatory variables is
not very small compared with the number of
observations.
The game of maximizing adjusted R2
• Researchers play the game of maximizing adjusted R2, that
is, choosing the model that gives the highest adjusted R2.
This may be dangerous.
• Our objective is not to obtain a high adjusted R2 per se but
rather to obtain dependable estimates of the true
population regression coefficients and draw statistical
inferences about them.
• Researchers should be more concerned about the logical or
theoretical relevance of the explanatory variables to the
dependent variable and their statistical significance.
• Even if R-squared is small (as in the given example),
regression may still provide good estimates of ceteris
paribus effects
Comparing Coefficients of Determination R2
Decision rules for testing H0: βj = 0:
H1: βj ≠ 0   Two tail     reject H0 if |t0| > t(n−k),α/2
H1: βj > 0   Right tail   reject H0 if t0 > t(n−k),α
H1: βj < 0   Left tail    reject H0 if t0 < −t(n−k),α
. use "D:\Bai giang\Kinh te luong\datasets\GPA1.DTA", clear
• If the t statistic exceeds the critical value, then you can reject the
null hypothesis; otherwise, you do not reject it
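The decision rule just described can be sketched as a small helper. The critical value is passed in from a t table (no stats library assumed), and the estimate and standard error below are hypothetical:

```python
# Two-tailed test of H0: beta_j = 0 at a given critical value t_crit.

def reject_two_tail(beta_hat, se, t_crit):
    """Reject H0: beta_j = 0 when |t0| = |beta_hat/se| exceeds t_crit."""
    t0 = beta_hat / se
    return abs(t0) > t_crit

# Hypothetical estimate: beta_hat = 0.092, se = 0.008, t_crit about 1.96
decision = reject_two_tail(0.092, 0.008, 1.96)   # t0 = 11.5, so reject
```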
Example- Stata output
• Model: wage = f(educ,exper, tenure )
. reg wage educ exper tenure
Covariance matrix of the coefficient estimates (lower triangle):

             educ        exper       tenure      _cons
educ       .00263
exper      .00019406   .00014537
tenure    -.0001254   -.00013218   .00046849
_cons     -.03570219  -.0042369    .00143314   .53138894
Example- Stata output
• We have se(β̂3 − β̂4) = 0.029635
• t = (β̂3 − β̂4) / se(β̂3 − β̂4) = −4.958, and |t| > t0.025,522 ≈ 2
• Reject H0
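The standard error of the difference can be checked directly from the covariance-matrix entries for exper and tenure shown above:

```python
# Var(b3 - b4) = Var(b3) + Var(b4) - 2*Cov(b3, b4); the entries are read
# off the covariance table for exper and tenure.
import math

var_exper = 0.00014537
var_tenure = 0.00046849
cov_exper_tenure = -0.00013218

se_diff = math.sqrt(var_exper + var_tenure - 2 * cov_exper_tenure)
# se_diff is approximately 0.029635, matching the value on the slide.
```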
7.3. Testing the Equality of Two Regression Coefficients
Method 2: F-test
• If the F statistics exceeds the critical value then you can reject
the null hypothesis; otherwise, you do not reject it.
F = {[(β̂3 − β̂4) − (β3 − β4)] / se(β̂3 − β̂4)}² ~ F(1, n−k)
Under H0: β3 = β4, this reduces to F = [(β̂3 − β̂4)/se(β̂3 − β̂4)]² = t².
7.3. Testing the Equality of Two Regression Coefficients
Method 3
• Example: Return to education at 2 year vs. at 4 year colleges
Test the null hypothesis that the two coefficients are equal against the alternative that they differ.
. test exper=tenure
( 1) exper - tenure = 0
F( 1, 522) = 24.58
Prob > F = 0.0000
We reject the hypothesis that the two effects are
equal.
7.4. Restricted Least Squares: Testing Linear Equality
Restrictions
• Example (baseball player salaries): test whether batting average, home runs
per year, and runs batted in per year jointly affect salary, i.e. H0: their
three coefficients are all zero, against the alternative that H0 is not true.
• The denominator of the F statistic uses the degrees of freedom in
the unrestricted model.
• Discussion
– The three variables are "jointly significant"
– They were not significant when tested individually
– The likely reason is multicollinearity between them
• Test hypothesis that, after controlling for cigs, parity, and faminc, parents’
education has no effect on birth weight
7.5. Testing for Structural or Parameter Stability of
Regression Models: The Chow Test
• Now we have three possible regressions:
• Time period 1970–1981: Yt = λ1 + λ2Xt + u1t (1)
Time period 1982–1995: Yt = γ1 + γ2Xt + u2t (2)
Time period 1970–1995: Yt = α1 + α2Xt + ut (3)
• The null hypothesis of parameter stability is that λ1 = γ1 and λ2 = γ2, i.e.,
there is no difference between the two time periods. The mechanics
of the Chow test are as follows:
1. Estimate regression (3), obtain RSS3 with df = (n1 + n2 − k)
We call RSS3 the restricted residual sum of squares (RSSR) because it is obtained by
imposing the restrictions that λ1 = γ1 and λ2 = γ2, that is, the subperiod regressions are
not different.
2. Estimate Eq. (1) and obtain its residual sum of squares, RSS1, with df
= (n1 − k).
3. Estimate Eq. (2) and obtain its residual sum of squares, RSS2, with df
= (n2 − k).
7.5. Testing for Structural or Parameter Stability of
Regression Models: The Chow Test
4. The unrestricted residual sum of squares (RSSUR), that is,
RSSUR = RSS1 + RSS2 with df = (n1 + n2 − 2k)
5. F ratio:
F = [(RSSR − RSSUR)/k] / [RSSUR/(n1 + n2 − 2k)] ~ F(k, n1 + n2 − 2k)
If the computed F exceeds the critical value at the chosen significance level,
reject the null hypothesis of parameter stability.
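The five steps of the Chow test reduce to simple arithmetic once the three RSS values are in hand. A sketch with made-up numbers:

```python
# Chow test F ratio: F = [(RSS_R - RSS_UR)/k] / [RSS_UR/(n1 + n2 - 2k)],
# where RSS_R is the pooled (restricted) RSS and RSS_UR = RSS1 + RSS2.

def chow_f(rss_r, rss1, rss2, n1, n2, k):
    rss_ur = rss1 + rss2
    return ((rss_r - rss_ur) / k) / (rss_ur / (n1 + n2 - 2 * k))

# Hypothetical numbers: pooled RSS3 = 120, subperiod RSSs 40 and 50,
# n1 = 12, n2 = 14 observations, k = 2 parameters per regression.
F = chow_f(120.0, 40.0, 50.0, 12, 14, 2)
```

Compare F with the critical value of F(k, n1 + n2 − 2k) to decide whether the subperiod regressions differ.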