Chapter 3
Linear Regression
Christian Kleiber, Achim Zeileis © 2008–2017 Applied Econometrics with R – 3 – Linear Regression – 0 / 97
Linear Regression
Overview
Linear regression model
$y_i = x_i^\top \beta + \varepsilon_i, \quad i = 1, \dots, n.$
In matrix form:
$y = X\beta + \varepsilon.$
Assumptions
Notation
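OLS estimator of $\beta$:
$\hat\beta = (X^\top X)^{-1} X^\top y.$
Fitted values:
$\hat y = X \hat\beta.$
Residuals:
$\hat\varepsilon = y - \hat y.$
Residual sum of squares (RSS):
$\sum_{i=1}^{n} \hat\varepsilon_i^2 = \hat\varepsilon^\top \hat\varepsilon.$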
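The quantities in this notation can be computed directly from their definitions. The following is a minimal sketch on simulated data; the data, the seed, and all object names are illustrative assumptions and not part of the original example. It compares the hand-computed results with lm().

```r
## Simulated data (illustrative only): intercept plus one regressor
set.seed(123)
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)

## Regressor matrix X with a column of ones for the intercept
X <- cbind("(Intercept)" = 1, x = x)

## OLS estimator: solve the normal equations (X'X) beta = X'y
beta_hat <- drop(solve(crossprod(X), crossprod(X, y)))

## Fitted values, residuals, and residual sum of squares
y_hat <- drop(X %*% beta_hat)
eps_hat <- y - y_hat
rss <- sum(eps_hat^2)

## Reference fit with lm(); both comparisons should agree up to rounding
fm <- lm(y ~ x)
all.equal(beta_hat, coef(fm), check.attributes = FALSE)
all.equal(rss, deviance(fm))
```

Solving the normal equations with solve() avoids forming the inverse explicitly; lm() itself relies on a QR decomposition, which is numerically more stable.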
R tools
Demand for economics journals
Goal: Estimate the effect of the price per citation on the number of
library subscriptions.
Regression equation:
$\log(\text{subs})_i = \beta_1 + \beta_2 \log(\text{citeprice})_i + \varepsilon_i.$
[Figure: scatterplot of log(subs) against log(citeprice).]
Fitted-model objects
Generic functions
Summary of fitted-model objects
R> summary(jour_lm)
Call:
lm(formula = log(subs) ~ log(citeprice), data = journals)
Residuals:
    Min      1Q  Median      3Q     Max
-2.7248 -0.5361  0.0372  0.4662  1.8481
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)      4.7662     0.0559    85.2   <2e-16
log(citeprice)  -0.5331     0.0356   -15.0   <2e-16
Analysis of variance
R> anova(jour_lm)
Analysis of Variance Table
Response: log(subs)
                Df Sum Sq Mean Sq F value Pr(>F)
log(citeprice)   1    126   125.9     224 <2e-16
Residuals      178    100     0.6
ANOVA breaks the sum of squares about the mean of log(subs) into
two parts:
part accounted for by linear function of log(citeprice),
part attributed to residual variation.
anova() produces
ANOVA table for a single “lm” object, and also
comparisons of several nested “lm” models using F tests.
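Both uses of anova() can be illustrated with a small sketch. It uses the built-in cars data as a stand-in (an assumption made for self-containedness; the journals data require package AER), and the object names are illustrative.

```r
## Built-in 'cars' data as a stand-in example
fm0 <- lm(log(dist) ~ 1, data = cars)            # intercept only
fm1 <- lm(log(dist) ~ log(speed), data = cars)   # simple regression

## ANOVA table for a single "lm" object
tab <- anova(fm1)

## The "Sum Sq" entries add up to the sum of squares about the mean
tss <- sum((log(cars$dist) - mean(log(cars$dist)))^2)
all.equal(sum(tab[["Sum Sq"]]), tss)

## Comparison of two nested "lm" models via an F test
cmp <- anova(fm0, fm1)

## With a single added regressor, both tables report the same F statistic
all.equal(cmp[["F"]][2], tab[["F value"]][1])
```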
Point and interval estimates
Confidence intervals:
R> confint(jour_lm, level = 0.95)
                 2.5 %  97.5 %
(Intercept)     4.6559  4.8765
log(citeprice) -0.6033 -0.4628
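The intervals reported by confint() are the usual t-based intervals, estimate plus or minus a t quantile times the standard error. A minimal sketch on the built-in cars data (a stand-in chosen so the code runs without extra packages; object names are illustrative):

```r
fm <- lm(log(dist) ~ log(speed), data = cars)

## Ingredients: point estimates, standard errors, t quantile
est <- coef(fm)
se <- sqrt(diag(vcov(fm)))
crit <- qt(0.975, df = df.residual(fm))

## 95% confidence intervals computed by hand
ci_manual <- cbind(lower = est - crit * se, upper = est + crit * se)

## Same intervals from the extractor function
ci <- confint(fm, level = 0.95)
all.equal(ci_manual, ci, check.attributes = FALSE)
```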
Prediction
[Figure: log(subs) plotted against log(citeprice).]
Diagnostic plots
The plot() method for objects of class “lm” provides six types of diagnostic
plots, four of which are shown by default.
R> plot(jour_lm)
produces
residuals versus fitted values,
QQ plot for normality,
scale-location plot,
standardized residuals versus leverages.
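The quantities behind these plots are available through extractor functions, and all six plot types can be requested via the which argument. The sketch below uses the built-in cars data as a stand-in; object names are illustrative, and the null graphics device is only there so that the code runs without opening a window.

```r
fm <- lm(log(dist) ~ log(speed), data = cars)

## Quantities displayed in the diagnostic plots
fit <- fitted(fm)          # x-axis of residuals vs fitted
res <- residuals(fm)       # raw residuals
rs <- rstandard(fm)        # standardized residuals (QQ, scale-location)
lev <- hatvalues(fm)       # leverages
cd <- cooks.distance(fm)   # Cook's distances

## Cook's distance combines standardized residual and leverage
p <- length(coef(fm))
all.equal(cd, rs^2 * lev / ((1 - lev) * p), check.attributes = FALSE)

## All six plot types; drop pdf(NULL)/dev.off() to draw on screen
pdf(NULL)
plot(fm, which = 1:6)
invisible(dev.off())
```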
[Figure: the four default diagnostic plots for jour_lm, including the panels "Residuals vs Fitted" and "Normal Q−Q"; labeled observations include MEPiTE, BoIES, IO, RoRPE, and Ecnmt.]
None of these journals is overly expensive: each is either heavily cited
(Econometrica), resulting in a low price per citation, or has few
citations, resulting in a rather high price per citation.
Testing a linear hypothesis
$R\beta = r$,
Hypothesis:
log(citeprice) = - 0.5
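A hypothesis of the form $R\beta = r$ can be tested with a Wald-type F statistic built from the estimated coefficients and their covariance matrix. The output above has the layout printed by linearHypothesis() from package car; the sketch below is a base-R illustration of the underlying computation instead. The cars data and the null value 1.5 are hypothetical stand-ins, not taken from the original example.

```r
fm <- lm(log(dist) ~ log(speed), data = cars)

## Restriction R beta = r: slope equals a hypothetical value of 1.5
R <- matrix(c(0, 1), nrow = 1)
r <- 1.5

b <- coef(fm)
V <- vcov(fm)
q <- nrow(R)                        # number of restrictions

## F statistic: (Rb - r)' [R V R']^{-1} (Rb - r) / q
d <- R %*% b - r
F_stat <- drop(t(d) %*% solve(R %*% V %*% t(R)) %*% d) / q
p_val <- pf(F_stat, q, df.residual(fm), lower.tail = FALSE)

## For a single restriction, F is the square of the t statistic
t_stat <- (b[2] - r) / sqrt(V[2, 2])
all.equal(F_stat, unname(t_stat^2))
```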
Wage equation
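Model:
$\log(\text{wage}) = \beta_1 + \beta_2\,\text{experience} + \beta_3\,\text{experience}^2 + \beta_4\,\text{education} + \beta_5\,\text{ethnicity} + \varepsilon$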
In R:
R> cps_lm <- lm(log(wage) ~ experience + I(experience^2) +
+ education + ethnicity, data = CPS1988)
R> summary(cps_lm)
Call:
lm(formula = log(wage) ~ experience + I(experience^2) +
education + ethnicity, data = CPS1988)
Residuals:
   Min     1Q Median     3Q    Max
-2.943 -0.316  0.058  0.376  4.383
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      4.321395   0.019174   225.4   <2e-16
experience       0.077473   0.000880    88.0   <2e-16
I(experience^2) -0.001316   0.000019   -69.3   <2e-16
education        0.085673   0.001272    67.3   <2e-16
ethnicityafam   -0.243364   0.012918   -18.8   <2e-16
Dummy variables and contrast codings
Factor ethnicity:
Only a single coefficient for level "afam".
No coefficient for level "cauc" which is the “reference category”.
"afam" coefficient codes the difference in intercepts between the
"afam" and the "cauc" groups.
In statistical terminology: “treatment contrast”.
In econometric jargon: “dummy variable”.
Internally:
R produces a dummy variable for each level.
Resulting overspecifications are resolved by applying “contrasts”,
i.e., a constraint on the underlying parameter vector.
Contrasts can be attributed to factors (or queried and changed) by
contrasts().
Default for unordered factors: use all dummy variables except for
reference category.
This is typically what is required for fitting econometric regression
models.
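The treatment contrasts and the resulting dummy variable can be inspected with contrasts() and model.matrix(). A minimal sketch with a small hand-made factor (the five observations are made up for illustration; the level names mirror those of ethnicity):

```r
## Toy factor with "cauc" as the first level, i.e., the reference category
eth <- factor(c("cauc", "afam", "cauc", "afam", "cauc"),
              levels = c("cauc", "afam"))

## Treatment contrasts: one dummy column, coding level "afam"
contrasts(eth)

## Regressor matrix as seen by lm(): intercept plus the "afam" dummy
X <- model.matrix(~ eth)
colnames(X)

## Changing the reference category changes which dummy is kept
eth2 <- relevel(eth, ref = "afam")
colnames(model.matrix(~ eth2))
```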
The function I()
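In a formula, operators such as + and ^ are interpreted as formula algebra rather than arithmetic, which is why the wage equation wraps the squared term in I(). The following sketch on simulated data (data and names are illustrative assumptions) shows the difference:

```r
## Simulated data with a quadratic relationship (illustrative only)
set.seed(1)
d <- data.frame(x = runif(30, 0, 10))
d$y <- 1 + 0.5 * d$x - 0.04 * d$x^2 + rnorm(30, sd = 0.2)

## Without I(): x^2 is formula algebra and collapses to x
fm_plain <- lm(y ~ x + x^2, data = d)

## With I(): x^2 is evaluated arithmetically and enters as a regressor
fm_quad <- lm(y ~ x + I(x^2), data = d)

names(coef(fm_plain))
names(coef(fm_quad))
```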
Comparison of models
Response: log(wage)
                   Df Sum Sq Mean Sq F value Pr(>F)
experience          1    840     840    2462 <2e-16
I(experience^2)     1   2249    2249    6597 <2e-16
education           1   1620    1620    4750 <2e-16
ethnicity           1    121     121     355 <2e-16
Residuals       28150   9599       0
Partially linear models
Call:
lm(formula = log(wage) ~ bs(experience, df = 5) +
education + ethnicity, data = CPS1988)
Residuals:
   Min     1Q Median     3Q    Max
-2.931 -0.308  0.057  0.367  3.994
Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
(Intercept)              2.77558    0.05608    49.5   <2e-16
bs(experience, df = 5)1  1.89167    0.07581    24.9   <2e-16
bs(experience, df = 5)2  2.25947    0.04647    48.6   <2e-16
bs(experience, df = 5)3  2.82458    0.07077    39.9   <2e-16
bs(experience, df = 5)4  2.37308    0.06520    36.4   <2e-16
bs(experience, df = 5)5  1.73934    0.11969    14.5   <2e-16
education                0.08818    0.00126    70.1   <2e-16
ethnicityafam           -0.24820    0.01273   -19.5   <2e-16
Details:
Construct a list cps_bs of fitted linear models via lapply().
Apply extractor functions, e.g., sapply(cps_bs, AIC).
The call above additionally sets the penalty term to log(n) (yielding
BIC instead of the default AIC), and assigns names via
structure().
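The steps listed under Details can be sketched as follows. This is an illustration on simulated data, not the original call: the data frame d, the range 3:10 of degrees of freedom, and all object names are assumptions.

```r
library("splines")   # provides bs()

## Simulated data with a smooth nonlinear effect (illustrative only)
set.seed(42)
n <- 300
d <- data.frame(x = runif(n, 0, 10))
d$y <- sin(d$x) + rnorm(n, sd = 0.3)

## List of fitted models, one per spline degrees of freedom
fits <- lapply(3:10, function(i) lm(y ~ bs(x, df = i), data = d))

## Extract AIC with penalty log(n), i.e., BIC, and attach names
bic <- structure(sapply(fits, AIC, k = log(n)),
                 names = as.character(3:10))

## Degrees of freedom with the smallest BIC
best_df <- as.integer(names(bic)[which.min(bic)])
```

Because sapply() passes k = log(n) on to AIC(), each entry of bic coincides with what BIC() would return for the corresponding model.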
In R:
Set alpha for color (0: fully transparent, 1: opaque).
Argument alpha available in various color functions, e.g., rgb().
rgb() implements RGB (red, green, blue) color model.
Selecting equal RGB intensities yields a shade of gray.
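A short sketch of a semi-transparent gray for dense scatterplots. The intensity 0.5, the alpha values, and the simulated points are illustrative assumptions; the null device merely keeps the code from opening a window.

```r
## Equal RGB intensities give gray; alpha close to 0 is nearly transparent
gray_semi <- rgb(0.5, 0.5, 0.5, alpha = 0.02)

## Decompose the color again (components on a 0-255 scale)
comp <- col2rgb(gray_semi, alpha = TRUE)

## Usage: overplotting many points reveals the density of the point cloud
set.seed(1)
x <- rnorm(5000)
y <- x + rnorm(5000)
pdf(NULL)   # drop pdf(NULL)/dev.off() to draw on screen
plot(x, y, pch = 19, col = rgb(0.5, 0.5, 0.5, alpha = 0.1))
invisible(dev.off())
```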
Alternatives:
Visualization: Employ tiny plotting character such as pch = ".".
Model specification: Use penalized splines with package mgcv –
or kernels instead of splines with package np.
Factors and Interactions
Interactions
Formula               Description
y ~ a + x             Model without interaction: identical slopes
                      with respect to x but different intercepts with
                      respect to a.
y ~ a * x             Model with interaction: the term a:x gives
y ~ a + x + a:x       the difference in slopes compared with the
                      reference category.
y ~ a / x             Model with interaction: produces the same
y ~ a + x %in% a      fitted values as the model above but using a
                      nested coefficient coding. An explicit slope
                      estimate is computed for each category in a.
y ~ (a + b + c)^2     Model with all two-way interactions
y ~ a*b*c - a:b:c     (excluding the three-way interaction).
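The relation between the a * x and a / x codings can be checked numerically. The sketch below uses simulated data with a two-level factor; the data, the level names g1 and g2, and the object names are illustrative assumptions.

```r
## Simulated data: two groups with different intercepts and slopes
set.seed(7)
d <- data.frame(a = factor(rep(c("g1", "g2"), each = 25)), x = rnorm(50))
d$y <- 1 + 0.5 * (d$a == "g2") + ifelse(d$a == "g2", 2, 1) * d$x +
  rnorm(50, sd = 0.3)

fm_star <- lm(y ~ a * x, data = d)   # slope difference coding
fm_nest <- lm(y ~ a / x, data = d)   # nested coding: one slope per group

## Same fitted values, different parametrization
all.equal(fitted(fm_star), fitted(fm_nest))

cf_star <- coef(fm_star)
cf_nest <- coef(fm_nest)

## Slope in g2: reference slope plus difference (a * x) vs explicit (a / x)
all.equal(unname(cf_star["x"] + cf_star["ag2:x"]), unname(cf_nest["ag2:x"]))
```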
Equivalently:
R> cps_int <- lm(log(wage) ~ experience + I(experience^2) +
+ education + ethnicity + education:ethnicity,
+ data = CPS1988)
Separate regressions for each level
Comparison:
R> cps_sep_cf <- matrix(coef(cps_sep), nrow = 2)
R> rownames(cps_sep_cf) <- levels(CPS1988$ethnicity)
R> colnames(cps_sep_cf) <- names(coef(cps_lm))[1:4]
R> cps_sep_cf
     (Intercept) experience I(experience^2) education
cauc       4.310    0.07923      -0.0013597   0.08575
afam       4.159    0.06190      -0.0009415   0.08654
R> anova(cps_sep, cps_lm)
Analysis of Variance Table
Change of the reference category
Weighted least squares
[Figure: log(subs) against log(citeprice) with fitted lines; legend: OLS, WLS1, WLS2.]
Feasible generalized least squares
Iterate further:
R> gamma2i <- coef(auxreg)[2]
R> gamma2 <- 0
R> while(abs((gamma2i - gamma2)/gamma2) > 1e-7) {
+ gamma2 <- gamma2i
+ fglsi <- lm(log(subs) ~ log(citeprice), data = journals,
+ weights = 1/citeprice^gamma2)
+ gamma2i <- coef(lm(log(residuals(fglsi)^2) ~
+ log(citeprice), data = journals))[2]
+ }
R> gamma2
log(citeprice)
0.2538
R> jour_fgls2 <- lm(log(subs) ~ log(citeprice), data = journals,
+ weights = 1/citeprice^gamma2)
[Figure: log(subs) against log(citeprice) with fitted lines; legend: OLS, FGLS.]
Linear regression with time series data
Two solutions:
Data preprocessing (e.g., lags and differences) “by hand” before
calling lm(). (See also Chapter 6.)
Use dynlm() from package dynlm.
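The first solution, preprocessing by hand, can be sketched in base R: build the lagged series, align it with the original series, and pass the result to lm(). The simulated quarterly series and all object names are illustrative assumptions.

```r
## Simulated quarterly series (illustrative only)
set.seed(10)
x <- ts(cumsum(rnorm(40)), start = c(2000, 1), frequency = 4)
y <- ts(0.8 * x + rnorm(40, sd = 0.2), start = c(2000, 1), frequency = 4)

## lag(x, k = -1) shifts the series so that time t carries x from t - 1;
## ts.intersect() keeps only the time points available in all series
d <- ts.intersect(y = y, x = x, x_lag1 = stats::lag(x, k = -1),
                  dframe = TRUE)

## Distributed lag regression on the aligned data
fm <- lm(y ~ x + x_lag1, data = d)
coef(fm)
```

One observation is lost at the start of the sample because the lagged regressor is not available for the first quarter.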
Interpretation:
Distributed lag model: consumption responds to changes in
income only over two periods.
Autoregressive distributed lag: effects of income changes persist.
In R:
R> library("dynlm")
R> cons_lm1 <- dynlm(consumption ~ dpi + L(dpi), data = USMacroG)
R> cons_lm2 <- dynlm(consumption ~ dpi + L(consumption),
+ data = USMacroG)
R> summary(cons_lm1)
Time series regression with "ts" data:
Start = 1950(2), End = 2000(4)
Call:
dynlm(formula = consumption ~ dpi + L(dpi),
data = USMacroG)
Residuals:
   Min     1Q Median     3Q    Max
-190.0  -56.7    1.6   49.9  323.9
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -81.0796    14.5081   -5.59  7.4e-08
dpi           0.8912     0.2063    4.32  2.4e-05
L(dpi)        0.0309     0.2075    0.15     0.88
R> summary(cons_lm2)
Time series regression with "ts" data:
Start = 1950(2), End = 2000(4)
Call:
dynlm(formula = consumption ~ dpi + L(consumption),
data = USMacroG)
Residuals:
    Min      1Q  Median      3Q     Max
-101.30   -9.67    1.14   12.69   45.32
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)     0.53522    3.84517    0.14     0.89
dpi            -0.00406    0.01663   -0.24     0.81
L(consumption)  1.01311    0.01816   55.79   <2e-16
Graphically:
R> plot(merge(as.zoo(USMacroG[,"consumption"]), fitted(cons_lm1),
+ fitted(cons_lm2), 0, residuals(cons_lm1),
+ residuals(cons_lm2)), screens = rep(1:2, c(3, 3)),
+ col = rep(c(1, 2, 4), 2), xlab = "Time",
+ ylab = c("Fitted values", "Residuals"), main = "")
R> legend(0.05, 0.95, c("observed", "cons_lm1", "cons_lm2"),
+ col = c(1, 2, 4), lty = 1, bty = "n")
[Figure: two stacked time series panels plotted against Time. Top panel "Fitted values" with legend observed, cons_lm1, cons_lm2; bottom panel "Residuals".]
Details:
merge() original series with fitted values from both models, a zero
line and residuals of both models.
merged series is plotted on two screens with different colors and
some more annotation.
Before merging, original “ts” series is coerced to class “zoo” (from
package zoo) via as.zoo().
“zoo” generalizes “ts” with slightly more flexible plot() method.
Encompassing test
Idea:
Transform nonnested model comparison into nested model
comparison.
Fit the encompassing model comprising all regressors from both
competing models.
Compare each of the two nonnested models with the
encompassing model.
If one model is not significantly worse than the encompassing
model while the other is, this test would favor the former model
over the latter.
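The idea can be sketched with plain anova() comparisons against the encompassing model. The example uses simulated cross-section data in which only x2 matters; the data and all object names are illustrative assumptions. Package lmtest also provides encomptest() for this kind of comparison.

```r
## Simulated data: y depends on x1 and x2, while x3 plays no role
set.seed(99)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1 + d$x1 + 2 * d$x2 + rnorm(n)

## Two nonnested competitors and the encompassing model
m1 <- lm(y ~ x1 + x2, data = d)
m2 <- lm(y ~ x1 + x3, data = d)
m_enc <- lm(y ~ x1 + x2 + x3, data = d)

## Each competitor is nested in the encompassing model: two F tests
p1 <- anova(m1, m_enc)[["Pr(>F)"]][2]   # does adding x3 improve m1?
p2 <- anova(m2, m_enc)[["Pr(>F)"]][2]   # does adding x2 improve m2?
c(m1_vs_enc = p1, m2_vs_enc = p2)
```

A small p-value for a competitor means it is significantly worse than the encompassing model; here m2 omits the relevant regressor x2 and is therefore rejected.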
Static linear models
Data structure:
Two-dimensional index.
Cross-sectional objects are called “individuals”.
Time identifier is called “time”.
Data handling: Select subset of three firms for illustration and declare
individuals ("firm") and time identifier ("year").
R> data("Grunfeld", package = "AER")
R> library("plm")
R> gr <- subset(Grunfeld, firm %in% c("General Electric",
+ "General Motors", "IBM"))
R> pgr <- plm.data(gr, index = c("firm", "year"))
Basic model:
Remarks:
two-way model upon setting effect = "twoways",
fixed effects via fixef() method and associated summary()
method.
R> summary(gr_fe)
Oneway (individual) effect Within Model
Call:
plm(formula = invest ~ value + capital, data = pgr,
model = "within")
Residuals :
   Min. 1st Qu.  Median 3rd Qu.    Max.
-167.33  -26.14    2.09   26.84  201.68
Coefficients :
        Estimate Std. Error t-value Pr(>|t|)
value     0.1049     0.0163    6.42  3.3e-08
capital   0.3453     0.0244   14.16  < 2e-16
R-Squared: 0.871
Adj. R-Squared: 0.861
F-statistic: 185.407 on 2 and 55 DF, p-value: <2e-16
Answer: Compare fixed effects and pooled OLS fits via pFtest().
R> pFtest(gr_fe, gr_pool)
F test for individual effects
Random effects:
Specify model = "random" in plm() call.
Select method for estimating the variance components.
Recall: Random-effects estimator is essentially FGLS estimator,
utilizing OLS after “quasi-demeaning” all variables.
Precise form of quasi-demeaning depends on random.method
selected.
Four methods available: Swamy-Arora (default), Amemiya,
Wallace-Hussain, and Nerlove.
R> summary(gr_re)
Oneway (individual) effect Random Effect Model
(Wallace-Hussain's transformation)
Call:
plm(formula = invest ~ value + capital, data = pgr,
model = "random", random.method = "walhus")
Effects:
                 var std.dev share
idiosyncratic 4389.3    66.3  0.35
individual    8079.7    89.9  0.65
theta: 0.837
Residuals :
   Min. 1st Qu.  Median 3rd Qu.    Max.
-187.40  -32.92    6.96   31.43  210.20
Coefficients :
             Estimate Std. Error t-value Pr(>|t|)
(Intercept) -109.9766    61.7014   -1.78     0.08
value          0.1043     0.0150    6.95  3.8e-09
capital        0.3448     0.0245   14.06  < 2e-16
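The reported theta is the quasi-demeaning parameter and can be reproduced from the variance components in the output. The sketch below assumes the standard balanced-panel formula and T = 20 time periods per firm; T is not printed in the output but follows from the 60 observations implied by the F statistic's 2 and 55 degrees of freedom with three firms. Object names are illustrative.

```r
## Variance components as printed in the summary above
sigma2_idios <- 4389.3   # idiosyncratic
sigma2_indiv <- 8079.7   # individual
T_periods <- 20          # assumption: 20 annual observations per firm

## Quasi-demeaning parameter for a balanced one-way random effects model
theta <- 1 - sqrt(sigma2_idios / (T_periods * sigma2_indiv + sigma2_idios))
round(theta, 3)

## Shares of the two components in the total error variance
share <- c(idiosyncratic = sigma2_idios, individual = sigma2_indiv) /
  (sigma2_idios + sigma2_indiv)
round(share, 2)
```

A theta of 0 would reproduce pooled OLS and a theta of 1 the within (fixed effects) transformation, so the value obtained here places the random effects fit close to the fixed effects fit, in line with the similar coefficient estimates.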
Dynamic linear models
R> summary(empl_ab)
Twoways effects Two steps model
Call:
pgmm(formula = dynformula(form, list(2, 1, 0, 1)),
data = EmplUK, effect = "twoways", model = "twosteps",
index = c("firm", "year"), gmm.inst = ~log(emp),
lag.gmm = list(c(2, 99)))
Residuals
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-0.6191 -0.0256  0.0000 -0.0001  0.0332  0.6410
Coefficients
Note: Due to constructing lags and taking first differences, three cross
sections are lost. Hence, the estimation period is 1979–1984 and only 611
observations are effectively available for estimation.
Systems of linear equations
Fit model:
R> library("systemfit")
R> gr_sur <- systemfit(invest ~ value + capital,
+ method = "SUR", data = pgr2)
Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
Chrysler_(Intercept)  -5.7031    13.2774   -0.43  0.67293
Chrysler_value         0.0780     0.0196    3.98  0.00096
Chrysler_capital       0.3115     0.0287   10.85  4.6e-09
IBM_(Intercept)       -8.0908     4.5216   -1.79  0.09139
IBM_value              0.1272     0.0306    4.16  0.00066
IBM_capital            0.0966     0.0983    0.98  0.33951
Details:
summary() provides standard regression results for each
equation in compact layout plus some measures of overall fit.
More detailed output (between-equation correlations, etc.)
available, but was suppressed here.
Output indicates again that there is substantial variation among
firms.