
Econometrics – II

BA (H) Econ Core – Spring 2024 (February to May)

Chapter 6: Multiple Regression Analysis: Further Issues


Wooldridge, J.M. (2015) Introductory Econometrics, 6e

Tirtha Chatterjee
Section 6.3: More on Goodness of Fit and selection of regressors

R-squared: goodness of fit

• R² is an estimate of how much of the variation in y is explained by x₁, x₂, …, x_k in the population.
• It is the ratio of the explained variation (SSE) to the total variation (SST), i.e. the fraction of the sample variation in y explained by the x's (a short computational sketch follows below):
  R² = SSE/SST
• The relative change in R² when variables are added to an equation is very useful.
  • The F statistic for testing joint significance depends crucially on the difference between the R²'s of the unrestricted and restricted models.
• A small R² does not mean the estimators are biased.
  • Unbiasedness of the estimators of the ceteris paribus effects of the independent variables is determined by the zero conditional mean assumption (MLR.4), not by the size of R².
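
A minimal sketch of the R² = SSE/SST computation, using synthetic data (the data-generating process and variable names below are purely illustrative) and checking the hand computation against the value reported by statsmodels:

```python
# Sketch: compute R^2 = SSE/SST by hand on synthetic data (hypothetical DGP)
# and check it against statsmodels' reported value.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
res = sm.OLS(y, X).fit()

sst = np.sum((y - y.mean()) ** 2)                 # total variation in y
sse = np.sum((res.fittedvalues - y.mean()) ** 2)  # explained variation
print(sse / sst, res.rsquared)                    # the two values coincide
```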

Consequences and implications of a small R²

• A small R² makes prediction difficult, because most of the variation in y is explained by unobserved factors.
• It implies that we have not accounted for several factors that affect y.
• It implies that the error variance is large relative to the variance of y, which makes it difficult to obtain precise estimators of the β_j.
• A large data set can help here: we may be able to estimate the partial effects precisely even though we have not controlled for many unobserved factors.

Adjusted R-square
• The population R² is defined as ρ² = 1 − σ_u² / σ_y²
  • It measures the proportion of the variation in y in the population explained by the independent variables.
  • The numerator is the population variance of the error term u; the denominator is the population variance of y.
• The sample R² is supposed to estimate the population R², ρ².
  • It estimates σ_u² by SSR/n and σ_y² by SST/n. Hence R² = 1 − (SSR/n)/(SST/n).
• But the numerator and denominator of R² are both biased estimators of σ_u² and σ_y².
• Replacing them with unbiased estimators gives the adjusted R-squared:
  R̄² = 1 − [SSR/(n − k − 1)] / [SST/(n − 1)] = 1 − σ̂_u² / [SST/(n − 1)]
• But the ratio of two unbiased estimators is not itself an unbiased estimator.
• Therefore, it is incorrect to regard the adjusted R-squared as a corrected or better estimator of the population R².

Adjusted R-square

• R̄² imposes a penalty for adding additional independent variables to a model.
• R² can never fall when we add a new independent variable, because SSR falls; but R̄² also depends on k, the number of independent variables.
  • As we increase k, SSR falls, but the degrees of freedom n − k − 1 also fall.
  • SSR/(n − k − 1) can go up or down when a new independent variable is added to a regression.
• R̄² increases when a new variable is added only if
  • the t statistic on the new variable is greater than 1 in absolute value, or
  • for a group of newly added independent variables, the F statistic for their joint significance is greater than 1.

Adjusted R-square

(".,! )(-.")
• Adjusted R-square can be written in terms of 𝑅! as: 𝑅, ! =1−
(-.#.")
• For small n and large k, 𝑅, ! can be substantially below 𝑅!
• For example- if 𝑅!=0.30, n=51 and k=10, then
(".6.86)(9".") 6.:6 96 89
, !
𝑅 =1- (9"."6.") = 1 − = 1 − ;6 = 0.125
;6
• 𝑅, ! can also be negative for a small usual R-squared and small n − k − 1
• Indicating a very poor fit
• Often both 𝑅, ! and 𝑅! are reported in regression results
• Note that F statistics is computed using 𝑅! and not 𝑅, !
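
A quick numerical check of this formula (a minimal sketch; the helper function name is my own):

```python
# Sketch: adjusted R-squared from R^2, n and k, checked on the worked example above.
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adjusted_r2(0.30, 51, 10))   # 0.125, matching the example
print(adjusted_r2(0.10, 20, 10))   # -0.9: negative when R^2 and n-k-1 are both small
```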

How to Choose between Non-nested Models

• Non-nested models are models where neither is a restricted version of the other.
  • F statistics compare a restricted model with an unrestricted model, i.e. nested models.
• Consider two models:
  log(salary) = β₀ + β₁years + β₂gamesyr + β₃bavg + β₄hrunsyr + u
  log(salary) = β₀ + β₁years + β₂gamesyr + β₃bavg + β₄rbisyr + u
• One possibility is to create a composite model that contains all explanatory variables from the original models and then to test each model against the general model using an F test (sketched below).
• What do we do if both models are rejected, or neither is rejected?
  • We might not be able to compare the models in that case.
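
A sketch of the composite-model approach in Python/statsmodels; the DataFrame mlb and its column names (lsalary, years, gamesyr, bavg, hrunsyr, rbisyr, in the spirit of Wooldridge's MLB1 data) are assumptions for illustration, not part of the slides:

```python
# Sketch (assumed DataFrame `mlb` with MLB1-style variables): fit the composite
# model and test each candidate model against it with an F test.
import statsmodels.formula.api as smf

general = smf.ols("lsalary ~ years + gamesyr + bavg + hrunsyr + rbisyr",
                  data=mlb).fit()

# Model 1 excludes rbisyr; model 2 excludes hrunsyr. Each exclusion restriction
# is tested against the composite (general) model:
print(general.f_test("rbisyr = 0"))    # if rejected, model 1 is too restrictive
print(general.f_test("hrunsyr = 0"))   # if rejected, model 2 is too restrictive
```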

Using R̄² to Choose between Non-nested Models

log(salary) = β₀ + β₁years + β₂gamesyr + β₃bavg + β₄hrunsyr + u
log(salary) = β₀ + β₁years + β₂gamesyr + β₃bavg + β₄rbisyr + u

• We know that an F statistic cannot be used to compare these two models directly.
• R² and adjusted R² can be used for non-nested models.
  • R̄² for the regression containing hrunsyr is .6211, and R̄² for the regression containing rbisyr is .6226.
  • Thus, based on R̄², there is a very slight preference for the model with rbisyr.
  • The difference is very small, but it still gives us a basis for preferring one model over the other.

Using R̄² to Choose between Non-nested Models: different functional forms

• Consider two models with different functional forms:

  rdintens = β₀ + β₁log(sales) + u
  rdintens = β₀ + β₁sales + β₂sales² + u

• R² is 0.061 and 0.148 for the first and second model, respectively.
  • It appears that the quadratic model is better.
• But we know that R² does not penalize more complicated models, and the two models have different numbers of parameters.
• So we should use R̄²: R̄² is 0.030 and 0.090 for the two models, respectively.
  • Even this comparison gives the same result: the quadratic model is preferred (see the sketch below).
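
As a sketch, the comparison could be run as follows; the DataFrame rd and its columns rdintens and sales are illustrative assumptions:

```python
# Sketch: compare two specifications with the same dependent variable by
# adjusted R-squared (assumed DataFrame `rd` with columns rdintens, sales).
import numpy as np
import statsmodels.formula.api as smf

m_log  = smf.ols("rdintens ~ np.log(sales)", data=rd).fit()
m_quad = smf.ols("rdintens ~ sales + I(sales**2)", data=rd).fit()

# Different numbers of parameters, so compare adjusted R-squared:
print(m_log.rsquared_adj, m_quad.rsquared_adj)
```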

Using R̄² with Non-nested Models: different functional forms for the dependent variable

• R̄² cannot be used to compare non-nested models with different functional forms for the dependent variable, for example y and log(y).
• R² measures the explained proportion of the total variation in whatever dependent variable is used in the regression.
  • Different nonlinear functions of the dependent variable have different amounts of variation to explain.
  • For example, the total variations in y and log(y) are not the same and are often very different.
• Comparing R̄² from regressions with these different forms of the dependent variable tells us nothing about which model fits better; the regressions are fitting two separate dependent variables.

Deciding which explanatory variables to use

• This choice is critical for estimating the model.
• We have to control for all relevant variables.
• But we should not over-control, i.e. add too many variables, since this can make the ceteris paribus interpretation of the coefficients meaningless.
• We also have to be careful about explanatory variables that are highly correlated with each other.
• Further, when studying a policy impact, we have to select control variables carefully.
• Unfortunately, the issue of whether or not to control for certain factors is not always clear-cut.

Controlling for too many factors in regression analysis

• We often over-control for factors in multiple regression because we are concerned about omitted variable bias.
• But it is important to remember the ceteris paribus nature of multiple regression.
• In some cases, it makes no sense to hold certain factors fixed precisely because they should be allowed to change when a policy variable changes.
• Example: suppose we are studying the impact of state beer taxes on traffic fatalities.
  • The idea is that a higher tax on beer will reduce alcohol consumption, and likewise drunk driving, resulting in fewer traffic fatalities.

Controlling for too many factors in regression analysis

• Notice that we do not control for a variable measuring per capita beer consumption (beercons).
• If we did control for beer consumption in this equation:
  • β₁ would measure the difference in fatalities due to a one percentage point increase in the tax, holding beercons fixed.
  • This has no useful meaning, because the tax is supposed to reduce fatalities precisely through its effect on beer consumption.
• We should not control for differences in beercons across states, unless we want to test for some sort of indirect effect of beer taxes.
• Other factors, such as the gender and age distribution of the population, should be controlled for.

Controlling for too many factors in regression analysis- 2nd example

• Consider the effect of pesticide usage among farmers on family health expenditures.
• In addition to pesticide usage amounts, should we include the number of doctor visits as an explanatory variable?
• The answer is no.
  • Health expenditures include doctor visits, and we would like to pick up all effects of pesticide use on health expenditures.
  • If we include the number of doctor visits as an explanatory variable, then we are only measuring the effects of pesticide use on health expenditures other than doctor visits.
• It makes more sense to use the number of doctor visits as the dependent variable in a separate regression on pesticide amounts.

Adding regressors to reduce error variance

• What happens when we add an explanatory variable?
  • It reduces the error variance,
  • but it can also worsen the multicollinearity problem.
• Ideally, we include independent variables that affect y and are uncorrelated with the independent variables of interest.
• The issue here is not unbiasedness: the aim is to obtain an estimator with a smaller sampling variance.

Adding regressors to reduce error variance

• Example: consider estimating the individual demand for beer as a function of the average county beer price.
• It is important to control for individual characteristics, such as age and education, because
  • they affect demand,
  • they will be uncorrelated with county-level prices, and
  • controlling for them gives more precise estimates of the price elasticity of beer demand.
• Including them can significantly reduce the error variance, so the standard error of the price coefficient will be smaller.

Adding regressors when we are studying policy impact
• Another issue is the endogeneity of our explanatory variables.
• We should not worry that some of our control variables are endogenous, provided they are not themselves affected by the policy.
• For example, in studying the effect of hours in a job training program on labor earnings,
  • we can include the amount of education reported prior to the job training program.
  • We need not worry that schooling might be correlated with omitted factors, such as "ability", because we are not trying to estimate the return to schooling.
  • We are trying to estimate the effect of the job training program, and we can include any controls that are not themselves affected by the training without biasing the estimated training effect.
• We must avoid including a variable such as the amount of education after the job training program,
  • since some people may decide to get more education because of how many hours they were assigned to the job training program.
