Chapter 8
Multicollinearity
Perfect Multicollinearity
• Perfect multicollinearity is a violation of Classical Assumption
VI.
• It is the case where the variation in one explanatory variable
can be completely explained by movements in another
explanatory variable.
• Such a case between two independent variables would be:
  X1i = α0 + α1X2i
• where the X's are independent variables in:
  Yi = β0 + β1X1i + β2X2i + εi
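The sketch below (simulated data of my own, not the chapter's) shows what goes wrong mechanically: with an exactly collinear pair of regressors, the X matrix loses rank and the normal equations cannot be solved.

```python
# A minimal sketch, with made-up data, of why OLS breaks down under perfect
# multicollinearity: when X2 is an exact linear function of X1, the columns
# of X are linearly dependent and X'X has no inverse.
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = 2.0 * x1                               # exact linear function of x1
y = 1.0 + 0.5 * x1 - 0.25 * x2 + rng.normal(scale=0.1, size=n)

X = np.column_stack([np.ones(n), x1, x2])
print("rank of X:", np.linalg.matrix_rank(X))   # 2, not 3: columns are dependent
try:
    np.linalg.solve(X.T @ X, X.T @ y)           # the OLS normal equations
except np.linalg.LinAlgError:
    print("X'X is singular: no unique OLS estimates exist")
```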
Perfect Multicollinearity (continued)
• Other examples of perfect linear relationships arise in the real world:
  - The distance between two cities measured in miles and measured in kilometers.
  - The percent of voters voting in favor of a proposition and the percent voting against it (the two sum to 100).
Perfect Multicollinearity (continued)
• OLS is incapable of generating estimates of regression
coefficients where perfect multicollinearity is present.
• You cannot “hold all the other independent variables in the
equation constant.”
• A special case related to perfect multicollinearity is a
dominant variable.
• A dominant variable is so highly correlated with the
dependent variable that it masks the effects of other
independent variables.
• Don’t confuse dominant variables with highly significant
variables.
Imperfect Multicollinearity
• Imperfect multicollinearity: a linear functional relationship between two or more independent variables that is so strong that it can significantly affect the estimation of the coefficients.
• It occurs when two (or more) independent variables are imperfectly linearly related, as in Equation (8.7):
  X1i = α0 + α1X2i + ui
• Note ui, a stochastic error term, in Equation (8.7).
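A quick simulated sketch of the relationship in Equation (8.7) (the coefficients 1.0 and 0.8 and the error scale are illustrative choices, not from the text):

```python
# Imperfect multicollinearity: X2 is a strong but not exact linear
# function of X1 because of the stochastic error term u_i.
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
u = rng.normal(scale=0.2, size=n)        # stochastic error term u_i
x2 = 1.0 + 0.8 * x1 + u                  # related to X1, but not exactly
print(np.corrcoef(x1, x2)[0, 1])         # high (about 0.97), but below 1.0
```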
The Consequences of Multicollinearity
• The major consequences of multicollinearity are:
1. Estimates will remain unbiased.
2. The variances and standard errors of the estimates
will increase.
3. The computed t-scores will fall.
4. Estimates will become sensitive to changes in
specification.
5. The overall fit of the equation and the estimation of the
coefficients of nonmulticollinear variables will be largely
unaffected.
The Consequences of
Multicollinearity (continued)
1. Estimates will remain unbiased.
• Even if an equation has significant multicollinearity, the
estimates of β will be unbiased if the first six Classical
Assumptions hold.
2. The variances and standard errors of the estimates will
increase.
• With multicollinearity, it becomes difficult to precisely identify
the separate effects of multicollinear variables.
• OLS is still BLUE with multicollinearity.
• But the “minimum variances” can be fairly large.
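A Monte Carlo sketch of consequences 1 and 2, under an assumed data-generating process of my own (true β1 = 2, unit error variance): the estimates stay centered on the truth, but their spread balloons as the correlation ρ between the regressors approaches 1.

```python
# Unbiased but imprecise: mean of beta1-hat stays near the true value,
# while its sampling standard deviation grows with the correlation rho.
import numpy as np

def simulate(rho, reps=2000, n=50, seed=42):
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    est = []
    for _ in range(reps):
        x = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        y = 1.0 + 2.0 * x[:, 0] - 1.0 * x[:, 1] + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x])
        est.append(np.linalg.lstsq(X, y, rcond=None)[0][1])  # beta1-hat
    est = np.array(est)
    return est.mean(), est.std()

for rho in (0.0, 0.9, 0.99):
    mean, sd = simulate(rho)
    print(f"rho={rho:5}: mean of beta1-hat = {mean:.3f}, sd = {sd:.3f}")
# Means stay near the true 2.0; the sd balloons as rho approaches 1.
```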
The Consequences of
Multicollinearity (continued)
3. The computed t-scores will fall.
• Multicollinearity tends to decrease t-scores mainly because of
the formula for the t-statistic: t = β̂ / SE(β̂) for a null
hypothesis of zero.
• If the standard error increases, the t-score must fall.
• Confidence intervals also widen because the standard errors
increase.
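A worked illustration (the numbers are illustrative, not the chapter's): a coefficient estimate of 2.0 with a standard error of 0.5 gives t = 4.0; if multicollinearity doubles the standard error, the t-score halves.

```latex
t = \frac{\hat{\beta}}{SE(\hat{\beta})}:\qquad
\frac{2.0}{0.5} = 4.0
\quad\xrightarrow{\;SE(\hat{\beta})\text{ doubles}\;}\quad
\frac{2.0}{1.0} = 2.0
```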
The Consequences of
Multicollinearity (continued)
4. Estimates will become sensitive to changes in specification.
• Adding/dropping variables and/or observations will often
cause major changes in the β estimates when significant
multicollinearity exists.
• This occurs because, with severe multicollinearity, OLS is
forced to emphasize small differences between the variables in
order to distinguish the effect of one multicollinear variable
from that of another.
The Consequences of
Multicollinearity (continued)
5. The overall fit of the equation and the estimation of the
coefficients of nonmulticollinear variables will be largely
unaffected.
• The adjusted R² will not fall much, if at all, with significant
multicollinearity.
• A combination of a high adjusted R² and no statistically
significant individual variables is a classic indication of
multicollinearity.
• It is possible for an F-test of overall significance to reject the
null even though none of the individual t-tests do.
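The simulated sketch below (my own construction; it assumes statsmodels is installed) reproduces that pattern: two nearly identical regressors yield a strongly significant F-statistic while neither individual t-test rejects.

```python
# F-test rejects, individual t-tests do not: a hallmark of severe
# multicollinearity. Constructed data, not the chapter's example.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 40
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # nearly identical to x1
y = 1.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

res = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(f"F p-value: {res.f_pvalue:.2e}")              # overall fit: significant
print("t p-values:", np.round(res.pvalues[1:], 3))   # slopes: typically not
```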
Two Examples of the Consequences
of Multicollinearity
Example: Student consumption function (Equation 8.9):
COi = β0 + β1Ydi + β2LAi + εi
where:
COi = annual consumption expenditures of the ith student on
items other than tuition and room and board
Ydi = annual disposable income (including gifts) of that
student
LAi = liquid assets (savings, etc.) of the ith student
εi = stochastic error term
Two Examples of the Consequences
of Multicollinearity (continued)
• Estimate Equation 8.9 with OLS (with both Ydi and LAi included):
• Then re-estimate including only disposable income:
Two Examples of the Consequences
of Multicollinearity (continued)
Example: Demand for gasoline by state (Equation 8.12):
PCONi = β0 + β1UHMi + β2TAXi + β3REGi + εi
where:
PCONi = petroleum consumption in the ith state
(trillions of BTUs)
UHMi = urban highway miles within the ith state
TAXi = gasoline tax in the ith state (cents per
gallon)
REGi = motor vehicle registrations in the ith state
(thousands)
Two Examples of the Consequences
of Multicollinearity (continued)
• Estimate Equation 8.12 with OLS:
• If you drop UHM and re-estimate:
The Detection of Multicollinearity
• Some degree of multicollinearity exists in virtually every equation.
• The important question is how severe it is.
• The severity can change from sample to sample.
• There are no generally accepted, true statistical tests for
multicollinearity.
• Researchers develop a general feeling for the severity of
multicollinearity by examining a number of characteristics.
Two common ones are:
1. Simple correlation coefficient
2. Variance inflation factors
High Simple Correlation Coefficients
• The simple correlation coefficient, r, is a measure of the
strength and direction of the linear relationship between two
variables.
• r ranges from −1 to +1.
• The sign of r indicates the direction of the correlation.
• If r is high in absolute value, then the two variables are quite
correlated and multicollinearity is a potential problem.
High Simple Correlation Coefficients
(continued)
• How high is high?
• Some researchers select an arbitrary cutoff, such as 0.80.
• A better answer: r is high if it causes unacceptably
large variances.
• The use of r to detect multicollinearity has a major limitation:
groups of variables acting together can cause multicollinearity
without any single simple correlation coefficient being high.
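A minimal sketch of this first check (the variable names and data are placeholders, not the chapter's):

```python
# Inspect pairwise simple correlation coefficients among the regressors.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=100)   # strongly related to X1
x3 = rng.normal(size=100)                          # unrelated
df = pd.DataFrame({"X1": x1, "X2": x2, "X3": x3})
print(df.corr().round(2))   # flag any |r| near or above the ~0.8 rule of thumb
```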
High Variance Inflation Factors (VIFs)
• Variance inflation factor (VIF) is a method of detecting the
severity of multicollinearity by looking at the extent to which
a given explanatory variable can be explained by all other
explanatory variables in an equation.
• Suppose the following model with K independent variables:
  Yi = β0 + β1X1i + β2X2i + … + βKXKi + εi
• We need to calculate a VIF for each of the K independent
variables.
High Variance Inflation
Factors (VIFs) (continued)
• To calculate VIFs:
1. Run an OLS regression that has Xi as a function of
all the other explanatory variables in the equation. For i = 1,
this auxiliary regression is:
   X1i = α0 + α2X2i + α3X3i + … + αKXKi + vi
2. Calculate the variance inflation factor for β̂i:
   VIF(β̂i) = 1 / (1 − Ri²)
where Ri² is the unadjusted R² from the auxiliary regression in step 1.
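The sketch below implements the two-step recipe directly with numpy (simulated placeholder data; the variable names are mine). statsmodels also provides an equivalent helper, variance_inflation_factor, in statsmodels.stats.outliers_influence.

```python
# Hand-rolled VIFs: regress each X_i on the other X's, then
# VIF_i = 1 / (1 - R_i^2).
import numpy as np

def vifs(X):
    """X: (n, K) array of explanatory variables, no constant column."""
    n, k = X.shape
    out = []
    for i in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, i], rcond=None)
        resid = X[:, i] - others @ beta
        r2 = 1.0 - resid.var() / X[:, i].var()   # R^2 of auxiliary regression
        out.append(1.0 / (1.0 - r2))
    return out

rng = np.random.default_rng(5)
x1 = rng.normal(size=200)
x2 = 0.95 * x1 + rng.normal(scale=0.3, size=200)   # collinear pair
x3 = rng.normal(size=200)                           # clean regressor
print([round(v, 1) for v in vifs(np.column_stack([x1, x2, x3]))])
# High VIFs for X1 and X2, roughly 1 for X3.
```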
High Variance Inflation
Factors (VIFs) (continued)
• The higher the VIF, the more severe the effects of
multicollinearity.
• But, there are no formal critical VIF values.
• A common rule of thumb: if VIF > 5, multicollinearity is
severe.
• It’s possible to have large multicollinearity effects without
having a large VIF.
Remedies for Multicollinearity
Remedy 1: Do nothing
• The existence of multicollinearity might not matter (i.e., the
coefficients may still be significant and consistent with expectations).
• If you delete a multicollinear variable that belongs in the model,
you cause specification bias.
• Every time a regression is rerun, we risk finding a
specification that accidentally works on the specific sample.
Remedies for Multicollinearity
(continued)
Remedy 2: Drop a redundant variable
• Two or more variables in an equation measuring essentially
the same thing might be called redundant.
• Dropping a redundant variable is nothing more than making up
for a specification error.
• In a case of severe multicollinearity, it makes no statistical
difference which variable is dropped.
• The theoretical underpinnings of the model should be the basis
for deciding which redundant variable to drop.
Remedies for Multicollinearity
(continued)
Example: Student consumption function:
Remedies for Multicollinearity
(continued)
Remedy 3: Increase the size of the sample
• Normally, a larger sample will reduce the variance of the
estimated coefficients, diminishing the impact of multicollinearity,
as the sketch below illustrates.
• Unfortunately, while this is a useful alternative to consider,
collecting more data may be impossible.
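A simulated sketch of this remedy (the design, ρ = 0.9 and true β1 = 2, is assumed for illustration): holding the degree of collinearity fixed, quadrupling the sample size roughly halves the sampling spread of the estimate.

```python
# More data shrinks the variance of beta1-hat even when rho stays at 0.9.
import numpy as np

def sd_beta1(n, rho=0.9, reps=2000, seed=11):
    rng = np.random.default_rng(seed)
    est = []
    for _ in range(reps):
        x = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)
        y = 1.0 + 2.0 * x[:, 0] - 1.0 * x[:, 1] + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x])
        est.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
    return np.std(est)

for n in (30, 120, 480):
    print(f"n={n:4}: sd of beta1-hat = {sd_beta1(n):.3f}")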
An Example of Why Multicollinearity
Often is Best Left Unadjusted
Example: Impact of marketing on soft drink sales:
St = β0 + β1Pt + β2At + β3Bt + εt
where:
St = sales of the soft drink in year t
Pt = average relative price of the drink in year t
At = advertising expenditures for the company in year t
Bt = advertising expenditures for the company's main
competitor in year t
An Example of Why Multicollinearity
Often is Best Left Unadjusted (continued)
• If variable B is dropped:
• Note the expected bias in the estimated coefficient of At: by the standard omitted-variable-bias result, E[β̂A] = βA + βB·α1, where α1 is the slope from a regression of the omitted Bt on At.
CHAPTER 8: the end
