330 Lecture8 2014
Collinearity
6.08.2014
R-hint(s) of the day
Random numbers and variables
> rnorm(5, mean = 2, sd = 1)
[1] 1.9199447 3.8475595 2.8962234 2.6015305 0.8656212
> runif(5, min = 2, max = 5)
[1] 4.409922 4.232709 3.444322 3.192482 4.263457
> sample(2:5, size = 5, replace = TRUE)
[1] 2 2 4 2 5
> rpois(5, lambda = 3)
[1] 2 2 3 5 4
Writing functions
> mymean <- function(x){sum(x)/length(x)}
> test <- rnorm(20)
> mymean(test)-mean(test)
[1] 0
R-hint(s) of the day
Manipulating functions (e.g., pairs20x)
> pairs20x
function (x, ...)
{
    panel.hist <- function(x, ...) {
        usr <- par("usr")
        on.exit(par(usr))
        par(usr = c(usr[1:2], 0, 1.5))
        h <- hist(x, plot = FALSE)
        breaks <- h$breaks
        nB <- length(breaks)
        y <- h$counts
        y <- y/max(y)
        rect(breaks[-nB], 0, breaks[-1], y, col = "cyan", ...)
    }
    ...
    pairs(x, upper.panel = panel.smooth,
        lower.panel = panel.cor, diag.panel = panel.hist, ...)
}
Aims of today's lecture
$Y = 1 + 2x_1 - x_2 + \varepsilon$
[Figure: scatter plots of x2 against x1 for two data sets; both axes run from -4 to 4.]
Fitted planes for data sets A and B
[Figure: fitted planes over the (x1, x2) plane, one panel per data set.]
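The effect of correlation between x1 and x2 on the fitted plane can be explored with a small simulation. The sketch below is only illustrative: the lecture's data sets A and B are not reproduced here, so two artificial data sets are generated instead, and the coefficients 1, 2 and -1 are an assumed reading of the model on the slide. The names `dA`, `dB`, `fit.A` and `fit.B` are hypothetical.

```r
# Artificial stand-ins for data sets A and B (assumption: the real data are not available)
set.seed(330)
n <- 100

# Data set A: x1 and x2 generated independently
dA <- data.frame(x1 = rnorm(n), x2 = rnorm(n))

# Data set B: x2 is nearly a copy of x1, so the two are highly correlated
x1 <- rnorm(n)
dB <- data.frame(x1 = x1, x2 = x1 + rnorm(n, sd = 0.1))

# Same assumed model for both: Y = 1 + 2*x1 - x2 + error
dA$y <- 1 + 2 * dA$x1 - dA$x2 + rnorm(n)
dB$y <- 1 + 2 * dB$x1 - dB$x2 + rnorm(n)

fit.A <- lm(y ~ x1 + x2, data = dA)
fit.B <- lm(y ~ x1 + x2, data = dB)

# Standard errors of the estimated coefficients in each fit
se.A <- summary(fit.A)$coefficients[, "Std. Error"]
se.B <- summary(fit.B)$coefficients[, "Std. Error"]
print(round(rbind(A = se.A, B = se.B), 3))
```

With settings like these, the standard errors of the x1 and x2 coefficients should come out much larger for the correlated data set, which is another way of saying that the fitted plane is poorly determined in directions where the data give little information.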
Conclusion
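The factor
$$\frac{1}{1 - R_j^2}$$
where $R_j^2$ is the $R^2$ from regressing $x_j$ on the other explanatory variables,
represents the increase in variance caused by correlation between
the explanatory variables and is called the variance inflation factor
(VIF).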
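As a rough check of this definition, the VIFs can be computed in two ways: by regressing each explanatory variable on the others, and from the diagonal of the inverse correlation matrix (the shortcut used with `solve(cor(...))` in this lecture). The data frame `X` and the helper names below are hypothetical and the data are simulated; this is a sketch, not the lecture's own code.

```r
# Simulated explanatory variables (assumption): x3 is strongly related to x1
set.seed(1)
n <- 50
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
X$x3 <- X$x1 + rnorm(n, sd = 0.3)

# VIF for variable j: regress x_j on the other explanatory variables,
# take R_j^2 from that fit, and compute 1 / (1 - R_j^2)
vif.by.regression <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  fit <- lm(reformulate(others, response = v), data = X)
  1 / (1 - summary(fit)$r.squared)
})

# The same quantities from the diagonal of the inverse correlation matrix
vif.by.inverse <- diag(solve(cor(X)))

print(round(rbind(vif.by.regression, vif.by.inverse), 3))
```

The two rows should agree up to rounding error, and every VIF is at least 1, with 1 corresponding to a variable that is uncorrelated with the others.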
Calculating the VIF: Theory
[Figure: pairs plot of t.temp, p.temp, t.vp and p.vp; the pairwise correlations printed in the panels range from 0.81 to 0.98.]
Collinearity
Explanatory variables:
> round(cor(cement),2)
      x1    x2    x3    x4     y
x1  1.00  0.23 -0.82 -0.25  0.73
x2  0.23  1.00 -0.14 -0.97  0.82
x3 -0.82 -0.14  1.00  0.03 -0.53
x4 -0.25 -0.97  0.03  1.00 -0.82
y   0.73  0.82 -0.53 -0.82  1.00
Cement data: VIF
> diag(solve(cor(cement[,-5])))
       x1        x2        x3        x4
 38.49621 254.42317  46.86839 282.51286
> apply(cement[,-5],1,sum)
 1  2  3  4  5  6  7  8  9 10 11 12 13
99 97 95 97 98 97 97 98 96 98 98 98 98
Cement data: Drop x4
> diag(solve(cor(cement[,-c(4,5)])))
      x1       x2       x3
3.251068 1.063575 3.142125
...
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 48.19363    3.91330  12.315 6.17e-07 ***
x1           1.69589    0.20458   8.290 1.66e-05 ***
x2           0.65691    0.04423  14.851 1.23e-07 ***
x3           0.25002    0.18471   1.354    0.209
...
Residual standard error: 2.312 on 9 degrees of freedom
Multiple R-squared: 0.9823, Adjusted R-squared: 0.9764
F-statistic: 166.3 on 3 and 9 DF, p-value: 3.367e-08
Collinearity
Step 1: Calculate the residuals from regressing the response on all the
explanatory variables except x.
Step 2: Calculate the residuals from regressing x on the other
explanatory variables.
Step 3: Plot the first set of residuals versus the second set.
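The steps of this construction can be sketched in a few lines of R. The example assumes the usual added variable plot recipe, in which the second set of residuals comes from regressing x on the remaining explanatory variables. The data frame `d` and all variable names are hypothetical and the data are simulated.

```r
# Simulated data (assumption): two correlated explanatory variables x and z
set.seed(2)
n <- 60
d <- data.frame(x = rnorm(n))
d$z <- 0.7 * d$x + rnorm(n)
d$y <- 1 + 2 * d$x - d$z + rnorm(n)

# Residuals from regressing the response on everything except x
res.y <- resid(lm(y ~ z, data = d))

# Residuals from regressing x on the other explanatory variables
# (assumed intermediate step of the construction)
res.x <- resid(lm(x ~ z, data = d))

# Plot the first set of residuals against the second, with the least squares line
plot(res.x, res.y, xlab = "x | others", ylab = "y | others",
     main = "Added variable plot for x")
avp.fit <- lm(res.y ~ res.x)
abline(avp.fit)

# Compare the slope of that line with the coefficient of x in the full regression
full.fit <- lm(y ~ x + z, data = d)
slope.avp <- unname(coef(avp.fit)["res.x"])
slope.full <- unname(coef(full.fit)["x"])
print(c(avp = slope.avp, full = slope.full))
```

The two printed slopes should match: the least squares line through an added variable plot has the same slope as the coefficient of x in the regression on all the explanatory variables.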
Hydrocarbon emission
[Figure: four residual plots, one panel each for t.temp, p.temp, t.vp and p.vp.]
Some curious facts about AVPs