Collinearity

The document defines and discusses collinearity, which occurs when the independent variables in a multiple regression model are correlated with each other. This violates the assumption that each predictor contributes independent information and can cause problems, including inflated standard errors and unreliable estimates of the regression coefficients. The document recommends testing for collinearity using pairwise correlation coefficients or variance inflation factors, and discusses solutions such as dropping variables or using alternative estimation methods when collinearity is present.


What is collinearity?

 Two independent variables, X1 and X2, are collinear when they are correlated
with each other.
 In a multiple regression study, we assume that the X variables are independent of
each other.
 We also assume that each X variable contains a unique piece of information
about Y.
Y = β0 + β1X1 + β2X2 + ε
 We need Y to depend on the X's, not the X's to depend on one another.
 In the multiple linear regression model
Y = β0 + β1X1 + β2X2 + ε
 we believe that
 β1 = the change in Y for a 1-unit change in X1, while X2 is held constant
 and
 β2 = the change in Y for a 1-unit change in X2, while X1 is held constant
 But if X1 and X2 are correlated, then
 β1 ≠ the change in Y for a 1-unit change in X1, with X2 held constant,
and
 β2 ≠ the change in Y for a 1-unit change in X2, with X1 held constant
Effects of multicollinearity
1. The variances (and standard errors) of the regression coefficient estimators (i.e.,
the bi) are inflated. This means var(bi) is too large (see the simulation sketch after this list).
2. The magnitudes of the bi may be different from what we expect.
3. The signs of the bi may be the opposite of what we expect.
4. Adding or removing any of the X variables produces large changes in the
values of the remaining bi or in their signs.
5. Sometimes removing a single data point causes large changes in the values of the bi
or in their signs.
6. In some cases the overall F-test is significant, but the individual t-tests on the bi are not.
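
A minimal simulation (our own illustration, not from the slides) of effect 1: as the correlation rho between X1 and X2 rises, the spread of the estimated coefficient b1 across repeated samples grows. All names and values here are hypothetical choices for demonstration.

import numpy as np

rng = np.random.default_rng(0)

def sd_of_b1(rho, n=100, reps=2000):
    """Standard deviation of the OLS estimate b1 across simulated samples."""
    b1s = []
    for _ in range(reps):
        x1 = rng.normal(size=n)
        # build x2 with correlation rho to x1
        x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
        y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)
        A = np.column_stack([np.ones(n), x1, x2])
        b = np.linalg.lstsq(A, y, rcond=None)[0]
        b1s.append(b[1])
    return np.std(b1s)

print("sd(b1) with rho = 0.0 :", sd_of_b1(0.0))   # baseline spread
print("sd(b1) with rho = 0.95:", sd_of_b1(0.95))  # clearly inflated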
Test for multicollinearity
1. Calculate the correlation coefficient (r) for each pair of X variables. If any of
the r values is significantly different from zero, then the independent variables
involved may be collinear (see the sketch after this list).
 t = r√(n − 2) / √(1 − r²), which follows a t distribution with n − 2 degrees of freedom when the true correlation is zero
 Caveat: even when r for any two X variables is small, three
independent variables X1, X2, and X3 may still be highly correlated as a
group.
2. Check whether the variance inflation factor (VIF) is too high.
 Rule of thumb: collinearity exists if VIF > 5. A VIF of 10, for example, means var(bi)
is 10 times what it would be if no collinearity existed (with no collinearity, VIF
is 1).
 VIF is a more rigorous check for collinearity than the correlation coefficient.
 For the regression model Y = β0 + β1X1 + β2X2 + β3X3 + ε, the VIF for Xi is
VIFi = 1 / (1 − R²i),
where R²1 is obtained by regressing X1 on X2 and X3:
X1 = β0 + β2X2 + β3X3 + ε
Similarly,
 X2 = β0 + β1X1 + β3X3 + ε => to obtain R²2
 X3 = β0 + β1X1 + β2X2 + ε => to obtain R²3
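
As a rough illustration (not from the slides), the sketch below computes both checks with NumPy and SciPy: the t statistic for a pairwise correlation, and the VIF for each predictor via the auxiliary regressions above. The function names corr_t_test and vif are our own.

import numpy as np
from scipy import stats

def corr_t_test(x1, x2):
    """Test 1: is the pairwise correlation r significantly nonzero?"""
    n = len(x1)
    r = np.corrcoef(x1, x2)[0, 1]
    t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
    p = 2 * stats.t.sf(abs(t), df=n - 2)   # two-sided p-value
    return r, t, p

def vif(X):
    """Test 2: VIF_i = 1 / (1 - R_i^2) from regressing X_i on the rest.

    X is an (n, k) array of predictors without an intercept column.
    """
    n, k = X.shape
    out = []
    for i in range(k):
        xi = X[:, i]
        others = np.delete(X, i, axis=1)
        A = np.column_stack([np.ones(n), others])           # intercept + other X's
        fitted = A @ np.linalg.lstsq(A, xi, rcond=None)[0]  # auxiliary OLS fit
        r2 = 1 - np.sum((xi - fitted) ** 2) / np.sum((xi - xi.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return out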
Solutions for multicollinearity
1. Drop the variables causing the problem.
 If using a large number of X variables, a stepwise regression can be
used to determine which of the variables to drop.
 Removing collinear X variables is the simplest way of solving the
multicollinearity problem.
2. If all the X variables are retained, avoid making inferences about the
individual β parameters. Also, restrict inferences about the mean value of Y to
values of X that lie in the experimental region.
3. Re-code the form of the independent variables.
 Example: if X1 and X2 are collinear, you might try using X1 and the ratio
X2/X1 instead.
4. Use ridge regression, an alternative estimation procedure to ordinary least
squares (a sketch follows this list).
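
As a minimal sketch (not from the slides), ridge regression has a closed form: it adds a penalty λI to XᵀX before solving, which keeps the solution stable when the X variables are nearly collinear. The penalty value lam below is an arbitrary choice for illustration.

import numpy as np

def ridge(X, y, lam=1.0):
    """Closed-form ridge estimate; the intercept is left unpenalized
    by centering X and y first."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    k = X.shape[1]
    b = np.linalg.solve(Xc.T @ Xc + lam * np.eye(k), Xc.T @ yc)
    b0 = y.mean() - X.mean(axis=0) @ b
    return b0, b

In practice λ is chosen by cross-validation; larger λ shrinks the coefficients more strongly toward zero.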
Example
 It is believed that increases in the tar and nicotine contents of a cigarette are
accompanied by an increase in the carbon monoxide emitted in the cigarette
smoke. We wish to model carbon monoxide content, Y, as a function of tar (X1),
nicotine (X2), and weight (X3). The following model is specified:
 Y = β0 + β1X1 + β2X2 + β3X3 + ε
 Y = carbon monoxide
 X1 = tar (milligrams)
 X2 = nicotine (milligrams)
 X3 = weight (grams)
 A cross-sectional dataset of 25 observations representing 25 brands of
cigarettes is presented.
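
A hedged sketch of how this model could be fit and screened for collinearity. The 25-brand dataset is not reproduced in the slides, so the file name cigarettes.csv and its column names below are assumptions for illustration only.

import numpy as np
import pandas as pd

df = pd.read_csv("cigarettes.csv")                 # hypothetical file name
X = df[["tar", "nicotine", "weight"]].to_numpy()   # assumed column names
y = df["co"].to_numpy()

A = np.column_stack([np.ones(len(y)), X])          # design matrix with intercept
b, *_ = np.linalg.lstsq(A, y, rcond=None)          # OLS estimates b0..b3
print("coefficients:", b)
print("VIFs:", vif(X))   # vif() as sketched in the test section above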
