Chapter 6
Tirtha Chatterjee
Section 6.3: More on Goodness of Fit and Selection of Regressors
R-squared: Goodness of Fit
Consequences and Implications of a Small R²
Adjusted R-squared
• The population R², denoted ρ², is defined as ρ² = 1 − σ_u²/σ_y²
• It measures the proportion of the variation in y in the population explained by the independent variables.
• The numerator is the population variance of the error term u; the denominator is the population variance of y.
• R² is supposed to be estimating the population R², i.e. ρ².
• R² estimates σ_u² by SSR/n and σ_y² by SST/n. Hence, R² = 1 − (SSR/n)/(SST/n)
• But both the numerator and the denominator of R² are biased estimators of σ_u² and σ_y².
• Replacing them with unbiased estimators gives the adjusted R-squared:
  R̄² = 1 − [SSR/(n − k − 1)]/[SST/(n − 1)] = 1 − σ̂²/[SST/(n − 1)]
• But the ratio of two unbiased estimators is not itself an unbiased estimator.
• Therefore, it is incorrect to assume that the adjusted R² is a corrected, unbiased estimator of the population R².
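The two definitions can be sketched in a few lines of Python. The sums of squares, n, and k below are hypothetical illustration values, not from any real regression.

```python
# Minimal sketch of R-squared vs. adjusted R-squared,
# computed from a regression's sums of squares.

def r_squared(ssr, sst):
    # R^2 = 1 - (SSR/n) / (SST/n) = 1 - SSR/SST
    return 1 - ssr / sst

def adj_r_squared(ssr, sst, n, k):
    # Adjusted R^2 swaps in the degrees-of-freedom-corrected
    # (unbiased) variance estimators:
    # R-bar^2 = 1 - [SSR/(n - k - 1)] / [SST/(n - 1)]
    return 1 - (ssr / (n - k - 1)) / (sst / (n - 1))

# Hypothetical regression: n = 30 observations, k = 4 regressors
ssr, sst, n, k = 30.0, 120.0, 30, 4
print(r_squared(ssr, sst))            # 0.75
print(adj_r_squared(ssr, sst, n, k))  # slightly below 0.75
```

The adjusted version is always below the usual R² whenever k ≥ 1, since the numerator loses more degrees of freedom than the denominator.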
Adjusted R-squared
• The adjusted R-squared can be written in terms of R² as: R̄² = 1 − (1 − R²)(n − 1)/(n − k − 1)
• For small n and large k, R̄² can be substantially below R².
• For example, if R² = 0.30, n = 51, and k = 10, then
  R̄² = 1 − (1 − 0.30)(51 − 1)/(51 − 10 − 1) = 1 − (0.70)(50)/40 = 1 − 35/40 = 0.125
• R̄² can also be negative when the usual R² is small and n − k − 1 is small
• This indicates a very poor fit.
• Often both R̄² and R² are reported in regression results.
• Note that the F statistic is computed using R², not R̄².
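The numeric example above can be checked directly by evaluating R̄² = 1 − (1 − R²)(n − 1)/(n − k − 1); the second call uses made-up values chosen to show a negative result.

```python
# Adjusted R^2 expressed in terms of R^2, n, and k.
def adj_r2(r2, n, k):
    # R-bar^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adj_r2(0.30, 51, 10))  # 0.125, matching the example
# A small R^2 combined with a small n - k - 1 can push it negative:
print(adj_r2(0.10, 16, 5))   # below zero: a very poor fit
```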
How to Choose between Non-nested Models
• Non-nested models are those where neither model is a restricted version of the other.
• F statistics compare a restricted model with an unrestricted model, i.e. nested models.
• Consider two models:
  log(salary) = β₀ + β₁years + β₂gamesyr + β₃bavg + β₄hrunsyr + u
  log(salary) = β₀ + β₁years + β₂gamesyr + β₃bavg + β₄rbisyr + u
• One possibility is to create a composite model that contains all explanatory variables
from the original models and then to test each model against the general model using
the F test.
• What do we do if both models are rejected, or if neither is?
• We might not be able to compare the models in that case
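Testing each original model against the composite model uses the standard F statistic for exclusion restrictions. A minimal sketch, where the SSR values are hypothetical (not taken from the actual baseball-salary data):

```python
# F statistic for testing q exclusion restrictions:
# compare a restricted model (one original model) against the
# unrestricted composite model that nests both.
def f_stat(ssr_r, ssr_ur, n, k_ur, q):
    # F = [(SSR_restricted - SSR_unrestricted) / q]
    #     / [SSR_unrestricted / (n - k_ur - 1)]
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k_ur - 1))

# Hypothetical numbers: composite model with both hrunsyr and rbisyr
# (k_ur = 5), testing the model that drops rbisyr (q = 1), n = 353.
print(f_stat(ssr_r=198.3, ssr_ur=195.1, n=353, k_ur=5, q=1))
```

A large F rejects the restricted model; the problem described above arises when this test rejects (or fails to reject) both candidate models at once.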
Using Adjusted R² to Choose between Non-nested Models with Different Functional Forms for the Dependent Variable
• R̄² cannot be used to compare non-nested models with different functional forms for the dependent variable
• For example: y and log(y)
• R² measures the explained proportion of the total variation in whatever dependent variable we are using in the regression
• And different nonlinear functions of the dependent variable have different amounts of variation to explain.
• For example, the total variations in y and log(y) are not the same and are often very different.
• Comparing R̄² from regressions with these different forms of the dependent variable does not tell us anything about which model fits better
• The regressions are fitting two different dependent variables.
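A quick sketch (with made-up salary figures) shows why the comparison fails: the total variation to be explained, SST, differs greatly between y and log(y).

```python
# The total variation (SST) in y and in log(y) are on entirely
# different scales, so R^2 values from the two regressions are
# not comparable. Data values are hypothetical.
import math

def sst(values):
    # SST = sum of squared deviations from the mean
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

y = [10.0, 40.0, 90.0, 250.0, 700.0]   # hypothetical salaries
log_y = [math.log(v) for v in y]

print(sst(y))      # large: y varies over a wide range
print(sst(log_y))  # far smaller variation on the log scale
```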
Deciding which explanatory variables to use
Controlling for too many factors in regression analysis
• Example: regressing traffic fatalities on the beer tax. Notice we do not control for a variable measuring per capita beer consumption (beercons).
• If we were to control for beer consumption in this equation,
• β₁ would measure the difference in fatalities due to a one-percentage-point increase in the tax, holding beercons fixed.
• This has no clear meaning: a higher beer tax reduces fatalities precisely by reducing beer consumption.
• We should not be controlling for differences in beercons across states, unless we want
to test for some sort of indirect effect of beer taxes.
• Other factors, such as gender and age distribution, should be controlled for.
Controlling for too many factors in regression analysis: a second example
Adding regressors to reduce error variance
• Example: consider estimating the individual demand for beer as a function of the average county beer price.
• It is important to control for individual characteristics such as age and education,
• which affect demand.
• They will be uncorrelated with county-level prices, and
• will give more precise estimates of the price elasticity of beer demand.
• Controlling for them can significantly reduce the error variance, so the standard error of the price coefficient will be smaller.
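This intuition follows from the standard error formula for an OLS slope, se(β̂_j) = √(σ̂² / [SST_j(1 − R_j²)]): controls that absorb error variance lower σ̂², while leaving SST_j and R_j² essentially unchanged when they are uncorrelated with price. The numbers below are hypothetical.

```python
# Standard error of an OLS slope coefficient:
# se(beta_hat_j) = sqrt(sigma^2 / (SST_j * (1 - R_j^2)))
def slope_se(sigma2, sst_x, r2_j=0.0):
    return (sigma2 / (sst_x * (1 - r2_j))) ** 0.5

# Hypothetical values: adding age and education controls shrinks the
# error variance sigma^2 from 4.0 to 1.0; being uncorrelated with
# price, they leave SST_x and R_j^2 alone, so the SE falls.
print(slope_se(sigma2=4.0, sst_x=100.0))  # without controls
print(slope_se(sigma2=1.0, sst_x=100.0))  # with controls: smaller
```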
Adding regressors when we are studying policy impact
• Another issue is the endogeneity of our explanatory variables.
• We need not worry that some of our explanatory variables are endogenous,
• provided they are not themselves affected by the policy.
• For example, in studying the effect of hours in a job training program on labor
earnings
• we can include the amount of education reported prior to the job training program.
• We need not worry that schooling might be correlated with omitted factors, such as “ability,”
• Because we are not trying to estimate the return to schooling.
• We are trying to estimate the effect of the job training program,
• And we can include any controls that are not themselves affected by job training without biasing
the job training effect.
• We must avoid including a variable such as the amount of education obtained after the job
training program,
• because some people may decide to get more education because of how many hours they were assigned to
the job training program.