
Topic 6

MULTICOLLINEARITY

OUTLINE
▪ Definition
▪ Sources of multicollinearity
▪ Detection
▪ Remedy

MULTICOLLINEARITY
» One of the assumptions of the classical linear regression model is that there is no
perfect linear relationship among the regressors.
» If there are one or more such relationships among the regressors, we call it
multicollinearity, or collinearity for short.
• Perfect collinearity: an exact linear relationship among two or more regressors.
• Imperfect collinearity: the regressors are highly, but not perfectly, correlated.
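A minimal numerical sketch (mine, not from the slides) of why perfect collinearity is fatal for OLS: when one regressor is an exact linear function of another, the design matrix loses rank and the normal equations have no unique solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x2 = rng.normal(size=n)
x3 = 2 * x2                      # perfect collinearity: X3 is an exact multiple of X2
y = 1 + 0.5 * x2 + 0.5 * x3 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x2, x3])
# X'X is singular, so the OLS normal equations have no unique solution
print(np.linalg.matrix_rank(X))    # 2, not 3: the third column adds no information
print(np.linalg.cond(X.T @ X))     # condition number is effectively infinite
```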
Y AND Xs: NO RELATIONSHIP AMONG Xs
[Figure: diagram of Y, X2, and X3 in which Y overlaps each regressor but X2 and X3 do not overlap each other]
Y AND Xs: RELATIONSHIP AMONG Xs
[Figure: diagram of Y, X2, and X3 in which X2 and X3 overlap each other as well as Y]
SOURCES OF MULTICOLLINEARITY
» Constraints on the population being sampled
• Example: two regressors, income and an occupation dummy.
• If rich physicians cannot be surveyed, income becomes correlated with the occupation dummy ("teacher") simply because of how the sample was drawn.
» Regressors that are inherently collinear
• Example: people with higher education tend to have higher income.
SOURCES OF MULTICOLLINEARITY
» Model specification
• Example: adding polynomial terms to a model, especially if the range of the 𝑋 variable
is small.
» Economic function
• Example: wage = f(education, experience, age)
• Experience may be correlated with age.
CONSEQUENCES
• The OLS estimators are still BLUE, but one or more regression coefficients have large
standard errors relative to the values of the coefficients, thereby making the 𝑡 ratios
small.
• Even though some regression coefficients are statistically insignificant, the 𝑅2 value
may be very high.
• Therefore, one may conclude (misleadingly) that the true values of these coefficients
are not different from zero.
• Also, the regression coefficients may be very sensitive to small changes in the data,
especially if the sample is relatively small.
• In some cases, estimated coefficients have signs opposite to those expected.
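A small simulation (mine, not the slides') illustrating the sensitivity claim: with two nearly collinear regressors, re-estimating after deleting a single observation can swing the coefficient estimates noticeably.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
x2 = rng.normal(size=n)
x3 = x2 + 0.05 * rng.normal(size=n)   # near-perfect collinearity between X2 and X3
y = 1 + x2 + x3 + rng.normal(size=n)

def ols(y, x2, x3):
    X = np.column_stack([np.ones_like(x2), x2, x3])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_full = ols(y, x2, x3)
b_drop = ols(y[1:], x2[1:], x3[1:])   # re-estimate after dropping one observation
print(b_full)   # e.g. large, mutually offsetting estimates for beta2 and beta3
print(b_drop)   # noticeably different coefficients after a one-row change
```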
VARIANCE INFLATION FACTOR
» For the three-variable regression model
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$
» it can be shown that
$$\operatorname{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_{2i}^2\,(1 - r_{23}^2)} = \frac{\sigma^2}{\sum x_{2i}^2}\,\mathrm{VIF}$$
and
$$\operatorname{var}(\hat{\beta}_3) = \frac{\sigma^2}{\sum x_{3i}^2\,(1 - r_{23}^2)} = \frac{\sigma^2}{\sum x_{3i}^2}\,\mathrm{VIF}$$
where $\sigma^2$ is the variance of the error term $u_i$, the lowercase $x$'s denote deviations from their sample means, and $r_{23}$ is the correlation coefficient between $X_2$ and $X_3$.
VARIANCE INFLATION FACTOR

$$\mathrm{VIF} = \frac{1}{1 - r_{23}^2}$$
• 𝑽𝑰𝑭 is a measure of the degree to which the variance of the OLS
estimator is inflated because of multicollinearity.
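As a quick check of the formula, here is a short sketch (mine, not from the slides) computing the VIF directly from the correlation between two simulated regressors:

```python
import numpy as np

rng = np.random.default_rng(2)
x2 = rng.normal(size=500)
x3 = 0.9 * x2 + np.sqrt(1 - 0.9**2) * rng.normal(size=500)  # built so r is ~0.9

r23 = np.corrcoef(x2, x3)[0, 1]
vif = 1 / (1 - r23**2)
print(f"r23 = {r23:.3f}, VIF = {vif:.2f}")   # r of 0.9 gives VIF of about 5.3
```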
DETECTION OF MULTICOLLINEARITY
» High R² but few significant t ratios
» High pair-wise correlations among the regressors
» A significant F test in auxiliary regressions (regressions of each regressor on the remaining regressors), or an auxiliary-regression R² higher than the R² of the regression of Y on the Xs
» Wrong expected signs despite a high R²
» A high variance inflation factor: VIF > 10 (or, more conservatively, > 5)
» Coefficient estimates that change substantially when one more regressor is added
EXAMPLE: HOUSEHOLD EXPENDITURE

» Survey data of married couples in Ho Chi Minh City, 2020.


• expense [dependent variable]: household expenditure (mil. VND/month)
• income: household income (mil. VND/month)
• age_wife: age of the wife (or female partner)
• age_husband: age of the husband (or male partner)
• hhsize: household size (members)
• children: share of children in the household (%)
SUMMARY STATISTICS
[Table: summary statistics of the variables — not reproduced in this extract]
HOUSEHOLD EXPENDITURE: OLS REGRESSION
[Table: OLS regression output — not reproduced in this extract]
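Since the slide's regression output did not survive extraction, here is a hedged sketch of how the model on this slide could be estimated with statsmodels; the file name hcmc_couples_2020.csv is hypothetical, while the variable names follow the slide.

```python
import pandas as pd
import statsmodels.formula.api as smf

# hypothetical file name; column names taken from the variable list above
df = pd.read_csv("hcmc_couples_2020.csv")

model = smf.ols(
    "expense ~ income + age_wife + age_husband + hhsize + children",
    data=df,
).fit()
print(model.summary())   # coefficient table, R-squared, t ratios
```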
DETECTING MULTICOLLINEARITY
▪ Correlation matrix
▪ Auxiliary regressions
▪ Variance inflation factors
CORRELATION MATRIX
» High pairwise correlation coefficients (a common rule of thumb is |r| ≥ 0.8) suggest high multicollinearity.
» Low correlation coefficients do not imply the absence of multicollinearity…
» …because multicollinearity may involve more than two variables.
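A short sketch of this first diagnostic, reusing the hypothetical data file from the estimation example above:

```python
import pandas as pd

df = pd.read_csv("hcmc_couples_2020.csv")   # hypothetical file, as above

# pairwise correlations among the regressors; |r| >= 0.8 is the usual red flag
regressors = ["income", "age_wife", "age_husband", "hhsize", "children"]
print(df[regressors].corr().round(2))
```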
AUXILIARY REGRESSIONS
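The slide's output is not reproduced here; a hedged sketch of the auxiliary-regression check (regress each regressor on all the others and inspect R² and the F test), again with the hypothetical data file:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("hcmc_couples_2020.csv")   # hypothetical file, as above
regressors = ["income", "age_wife", "age_husband", "hhsize", "children"]

for target in regressors:
    others = " + ".join(v for v in regressors if v != target)
    aux = smf.ols(f"{target} ~ {others}", data=df).fit()
    # a high R-squared (or a significant F) flags the target as collinear
    print(f"{target}: R2 = {aux.rsquared:.3f}, F p-value = {aux.f_pvalue:.3g}")
```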
VARIANCE INFLATION FACTORS
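A sketch of the third diagnostic using statsmodels' built-in VIF helper, with the same hypothetical data file:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("hcmc_couples_2020.csv")   # hypothetical file, as above
X = sm.add_constant(df[["income", "age_wife", "age_husband", "hhsize", "children"]])

# VIF for each regressor (skipping the constant); values above 10 (or 5) signal trouble
for i, name in enumerate(X.columns):
    if name != "const":
        print(f"{name}: VIF = {variance_inflation_factor(X.values, i):.2f}")
```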
SOLUTIONS FOR MULTICOLLINEARITY
SOLUTIONS
» General rule of thumb: DO NOT WORRY IF
• the coefficients are statistically significant, and
• the coefficients have the correct expected signs.
RESTRUCTURING THE MODEL
» There may be alternative specifications or alternative functional forms
• Example: production function
$$y = F(\text{labor},\ \text{land},\ \text{capital})$$
• Solution: express output and inputs per unit of land:
$$\frac{y}{\text{land}} = F\!\left(\frac{\text{labor}}{\text{land}},\ \text{land},\ \frac{\text{capital}}{\text{land}}\right)$$
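A minimal sketch (with simulated data, not the slides') of why this restructuring helps: inputs that scale with farm size are highly correlated in levels but much less so per unit of land.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
land = rng.lognormal(mean=1.0, sigma=0.5, size=n)
# labor and capital both scale with land, so they are collinear in levels
labor = land * rng.lognormal(mean=0.0, sigma=0.2, size=n)
capital = land * rng.lognormal(mean=0.5, sigma=0.2, size=n)

print(np.corrcoef(labor, capital)[0, 1])                 # high, e.g. around 0.9
print(np.corrcoef(labor / land, capital / land)[0, 1])   # close to zero
```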
TRANSFORMING REGRESSORS
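The body of this slide did not survive extraction. One standard transformation, tied to the polynomial-terms example earlier in the deck, is mean-centering a regressor before squaring it; a sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(10, 12, size=500)    # small range of X, as in the earlier example

print(np.corrcoef(x, x**2)[0, 1])    # x and x^2 are almost perfectly correlated

xc = x - x.mean()                    # mean-center x before adding the square term
print(np.corrcoef(xc, xc**2)[0, 1])  # correlation drops to near zero
```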
DROPPING CORRELATED REGRESSORS
[Regression output before and after dropping a regressor — not reproduced in this extract]
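A hedged sketch of this last remedy with the same hypothetical data file: drop one of a highly correlated pair, say age_husband given age_wife, and compare the fits.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("hcmc_couples_2020.csv")   # hypothetical file, as above

full = smf.ols("expense ~ income + age_wife + age_husband + hhsize + children",
               data=df).fit()
# drop age_husband, which is presumably highly correlated with age_wife
reduced = smf.ols("expense ~ income + age_wife + hhsize + children",
                  data=df).fit()

print(full.rsquared, reduced.rsquared)   # R-squared barely changes if the pair is collinear
print(reduced.bse)                       # standard errors of the kept coefficients should shrink
```

One caveat worth keeping in mind: if the dropped regressor truly belongs in the model, omitting it biases the remaining estimates, so this remedy trades variance for possible bias.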
Thank You
