Lecture 6 Multicollinearity
MULTICOLLINEARITY
OUTLINE
▪ Definition
▪ Sources of multicollinearity
▪ Detection
▪ Remedy
MULTICOLLINEARITY
» One of the assumptions of the classical linear regression model is that there is no
perfect linear relationship among the regressors.
» If there are one or more such relationships among the regressors, we call it
multicollinearity, or collinearity for short.
• Perfect collinearity: A perfect linear relationship between two variables.
• Imperfect collinearity: The regressors are highly (but not perfectly) collinear.
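To make the distinction concrete, here is a minimal sketch (simulated data; the NumPy illustration is an assumption, not part of the lecture) showing that a perfectly collinear regressor matrix is rank deficient, while a highly but imperfectly collinear one is merely ill-conditioned:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x2 = rng.normal(size=n)

x3_perfect = 2.0 * x2                                # exact linear relationship
x3_near = 2.0 * x2 + rng.normal(scale=0.05, size=n)  # highly, but not perfectly, collinear

X_perfect = np.column_stack([np.ones(n), x2, x3_perfect])
X_near = np.column_stack([np.ones(n), x2, x3_near])

# Under perfect collinearity, X'X is singular and OLS has no unique solution.
print(np.linalg.matrix_rank(X_perfect))  # 2: rank deficient (3 columns)
print(np.linalg.matrix_rank(X_near))     # 3: invertible, but ill-conditioned
print(np.corrcoef(x2, x3_near)[0, 1])    # pairwise correlation close to 1
```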
𝑌 AND 𝑋𝑠: NO RELATIONSHIP AMONG 𝑋𝑠
[Figure: diagram of 𝑌, 𝑋2, and 𝑋3 in which 𝑋2 and 𝑋3 do not overlap]
𝑌 AND 𝑋𝑠: RELATIONSHIP AMONG 𝑋𝑠
[Figure: diagram of 𝑌, 𝑋2, and 𝑋3 in which 𝑋2 and 𝑋3 overlap]
SOURCES OF MULTICOLLINEARITY
» Constraints on the population being sampled
• Example: two regressors, income and an occupation dummy ("teacher").
• If the sample cannot include rich physicians, the high-income respondents are concentrated among teachers, so income is correlated with the dummy variable "teacher".
VARIANCE OF THE OLS ESTIMATOR
» In the model with two regressors 𝑋2 and 𝑋3, the variance of the OLS estimator of the coefficient on 𝑋2 is

$$\operatorname{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_{2i}^2 \,(1 - r_{23}^2)}$$

where $\sigma^2$ is the variance of the error term $u_i$, and $r_{23}$ is the coefficient of correlation between $X_2$ and $X_3$.
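As a quick illustration, the sketch below (assuming $\sigma^2 = 1$ and simulated $X_2$ values purely for demonstration) evaluates this formula for increasing values of $r_{23}$; the variance of $\hat{\beta}_2$ blows up as $r_{23}$ approaches 1:

```python
import numpy as np

def var_beta2_hat(x2, r23, sigma2=1.0):
    """var(beta2_hat) = sigma^2 / (sum of squared deviations of X2 * (1 - r23^2))."""
    x2_dev = x2 - x2.mean()  # x_{2i} in deviation form
    return sigma2 / (np.sum(x2_dev**2) * (1.0 - r23**2))

rng = np.random.default_rng(0)
x2 = rng.normal(size=100)
for r23 in (0.0, 0.5, 0.9, 0.99):
    print(f"r23 = {r23:.2f} -> var(beta2_hat) = {var_beta2_hat(x2, r23):.4f}")
```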
VARIANCE INFLATION FACTOR
$$VIF = \frac{1}{1 - r_{23}^2}$$
• 𝑽𝑰𝑭 is a measure of the degree to which the variance of the OLS
estimator is inflated because of multicollinearity.
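In practice the VIF can be computed with statsmodels' variance_inflation_factor; in this sketch the data are simulated for illustration:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
n = 200
x2 = rng.normal(size=n)
x3 = x2 + rng.normal(scale=0.3, size=n)  # highly collinear with x2

X = sm.add_constant(np.column_stack([x2, x3]))  # column 0 is the intercept
for j in (1, 2):                                # skip the constant term
    print(f"VIF of regressor {j}: {variance_inflation_factor(X, j):.2f}")
```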
DETECTION OF MULTICOLLINEARITY
» High 𝑅2 but few significant 𝑡 ratios
» High pair-wise correlations among the regressors
» A significant 𝐹 test for an auxiliary regression (a regression of one regressor on the
remaining regressors), or an auxiliary-regression 𝑅2 that is higher than the 𝑅2 of the
regression of 𝑌 on the 𝑋𝑠 (see the sketch after this list)
» Wrong expected signs on coefficients despite a high 𝑅2
» A high variance inflation factor: 𝑉𝐼𝐹 > 10 (or > 5)
» Estimates that change sharply when one more independent variable is added
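The sketch below (simulated data; variable names are illustrative) runs the auxiliary regressions described above: each regressor is regressed on the remaining regressors, and a high auxiliary 𝑅2 with a significant 𝐹 statistic flags that regressor as collinear:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
x2 = rng.normal(size=n)
x3 = 0.9 * x2 + rng.normal(scale=0.4, size=n)  # collinear pair
x4 = rng.normal(size=n)                        # unrelated regressor
y = 1 + 2 * x2 + 3 * x3 + 0.5 * x4 + rng.normal(size=n)

X = np.column_stack([x2, x3, x4])
names = ["X2", "X3", "X4"]

main = sm.OLS(y, sm.add_constant(X)).fit()
print(f"main regression R^2 = {main.rsquared:.3f}")

# Auxiliary regressions: each regressor on the remaining regressors.
for j, name in enumerate(names):
    others = np.delete(X, j, axis=1)
    aux = sm.OLS(X[:, j], sm.add_constant(others)).fit()
    print(f"{name}: auxiliary R^2 = {aux.rsquared:.3f}, F p-value = {aux.f_pvalue:.4g}")
```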
EXAMPLE: HOUSEHOLD EXPENDITURE