Linear Mixed Effects Modeling Using R
Linear mixed effects models simply model the fixed and random effects as having a linear form. Similar to the general linear model, the outcome variable is modeled as the sum of additive fixed and random effects, plus an error term. Using familiar matrix notation, the linear mixed effects model takes the form

Y = Xβ + Zb + ε

where Y is the vector of outcome values, X is the design matrix of the fixed effects with coefficient vector β, Z is the design matrix of the random effects with effects vector b, and ε is the vector of residual errors.
3. Example Data
The example used for this article is fictional data where the interval scaled outcome variable Extroversion (extro) is predicted by fixed effects for the interval scaled predictor Openness to new experiences (open), the interval scaled predictor Agreeableness (agree), the interval scaled predictor Social engagement (social), and the nominal scaled predictor Class (class); as well as the random (nested) effect of Class within School (school). The data contains 1200 cases evenly distributed among 24 nested groups (4 classes within 6 schools). The data set is available here.
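Assuming the data set has been downloaded as a comma separated file (the file name and location below are hypothetical placeholders), it can be read in and the nesting structure confirmed before any modeling; a minimal sketch:

# Read in the example data (file name / location are hypothetical).
lmm.data <- read.csv("lmm.data.csv", header = TRUE)

# Confirm the structure: 1200 cases, 4 classes nested within 6 schools.
str(lmm.data)
table(lmm.data$class, lmm.data$school)  # should show 50 cases in each of the 24 cells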
> library(lme4)
Loading required package: lattice
Attaching package: 'Matrix'
The following object(s) are masked from 'package:base':
    det
Attaching package: 'lme4'
The following object(s) are masked from 'package:stats':
    AIC
4.2. Running the Analysis. Now we are prepared to fit the model, named lmm.2, using the lmer function. Some of the optional arguments are shown here, each with its default value specified. For example, the family = gaussian argument could be changed to specify other distributions (e.g., binomial, poisson, etc.). The REML = TRUE argument specifies that the REstricted Maximum Likelihood criterion, rather than the log-likelihood criterion, be used for optimization of the parameter estimates. The verbose = FALSE argument suppresses the iteration history, which if TRUE would display the iteration number, the value of the deviance (negative twice the log-likelihood), and the value of the parameter s, which is the standard deviation of the random effects relative to the standard deviation of the residuals (Bates, 2010, p. 4).

Also note the form of the formula for specifying the model. The formula (from left to right) begins with the outcome variable, then the tilde, followed by all the predictors. The first four predictors (open, agree, social, and class) represent fixed effects; then, in parentheses, each random effect is listed. The random effect specifies the nested effect of class within (or under) school; class would be considered the level one variable and school the level two variable, which is why the forward slash is used. By default, the lmer function will also model the random effect for the highest level variable (school) of the nesting. A random interaction term can be specified using the colon; for example, (1|school:class) would specify a random effect (the parentheses) for the interaction of school and class (the colon). Likewise, a fixed effect interaction can be specified with the colon separating the variables; for example, + open:agree + open:agree:social + would specify the interaction of open and agree, then the interaction of open, agree, and social. The absence of parentheses identifies these interactions as fixed effects. The sketch below collects these formula variations.
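The model names in the sketch (lmm.a, lmm.b, lmm.c) are illustrative placeholders, and only the first two models are equivalent to one another:

library(lme4)

# The nested random effect shorthand...
lmm.a <- lmer(extro ~ open + agree + social + class
              + (1|school/class), data = lmm.data)

# ...is equivalent to listing both random effects explicitly.
lmm.b <- lmer(extro ~ open + agree + social + class
              + (1|school) + (1|school:class), data = lmm.data)

# Fixed effect interactions use the colon with no parentheses.
lmm.c <- lmer(extro ~ open + agree + social + class
              + open:agree + open:agree:social
              + (1|school/class), data = lmm.data)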
> lmm.2 <- lmer(formula = extro ~ open + agree + social + class + (1|school/class),
+               data = lmm.data, family = gaussian, REML = TRUE, verbose = FALSE)
> summary(lmm.2)
Linear mixed model fit by REML
Formula: extro ~ open + agree + social + class + (1 | school/class)
   Data: lmm.data
  AIC  BIC logLik deviance REMLdev
 3548 3599  -1764     3509    3528
Random effects:
 Groups       Name        Variance Std.Dev.
 class:school (Intercept)  2.88365 1.69813
 school       (Intercept) 95.17339 9.75569
 Residual                  0.96837 0.98406
Number of obs: 1200, groups: class:school, 24; school, 6

Fixed effects:
              Estimate Std. Error t value
(Intercept) 57.3838787  4.0559632  14.148
open         0.0061302  0.0049634   1.235
agree       -0.0077361  0.0056985  -1.358
social       0.0005313  0.0018523   0.287
classb       2.0547978  0.9837345   2.089
classc       3.7049300  0.9837165   3.766
classd       5.6657332  0.9837285   5.759
Correlation of Fixed Effects:
       (Intr) open    ...  classb classc
open   -0.048
agree  -0.047 -0.012
social -0.045 -0.006
classb -0.121 -0.002
classc -0.121 -0.001        0.500
classd -0.121  0.000        0.500  0.500
4.3. Interpreting the Default Summary Output. The output (above) begins by showing what was done: a linear mixed model was fit using the REML criterion, and the model (formula) and data are listed. Next, two rows of fit statistics are shown, beginning with the Akaike Information Criterion (AIC; Akaike, 1974), followed by the Bayesian Information Criterion (BIC; Schwarz, 1978), the log-likelihood, the deviance for the maximum likelihood criterion (smaller deviance indicates better fit), and the deviance for the REML criterion. Generally I tend to use and recommend the BIC for comparing models and assessing fit; the lower the BIC, the better the model fits the data (e.g., a BIC of -55.22 indicates a better fitting model than one with a BIC of +23.56). One common way to test a model's fit is to rerun the analysis including only the intercept terms (often called the null model) and compare that model's BIC to the hypothesized (full) model's BIC.

The next section of the output provides estimates for the random effects, in the form of variances and standard deviations. Notice that there are three values shown: the nested effect of class within school, the random effect of the higher level variable, school, and the residual term, which represents error. The variance estimates are of interest here because we can add them together to find the total variance (of the random effects) and then divide each random effect's variance by that total to see what proportion of the random effect variance is attributable to it (similar to R² in traditional regression). So, if we add the variance components:
> 2.88365 + 95.17339 + 0.96837
[1] 99.02541
Then we can divide the nested effect variance by this total variance, which gives the proportion of the random effect variance accounted for by the nested effect and indicates whether or not this effect is meaningful.
> 2.88365/99.02541
[1] 0.02912030
So, we can see that only 2.9% of the total random effect variance is attributable to the nested effect. If all of these percentages were very small, the random effects would be negligible and linear mixed modeling would not be appropriate (i.e., remove the random effects from the model and use general linear or generalized linear modeling instead). Here, however, the effect of school alone is quite substantial (96%):
> 95.17339/99.02541
[1] 0.9611007
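Rather than retyping the variance estimates by hand, the same proportions can be computed directly from the fitted model. A minimal sketch, assuming a version of lme4 in which the VarCorr output has an as.data.frame method (current versions do):

# Pull the variance components out of the fitted model.
vc <- as.data.frame(VarCorr(lmm.2))

# Total random effect variance, including the residual term.
total.var <- sum(vc$vcov)

# Proportion of the total attributable to each component.
round(setNames(vc$vcov / total.var, vc$grp), 4)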
Another way to think about these variance components is in terms used with standard analysis of variance (ANOVA): the residual variance estimate can be thought of as the within groups variance, and each random effect variance estimate can be thought of as a between groups estimate (recall the ubiquitous ANOVA summary table).

The next section of the output details the estimates of the fixed effects. These estimates are interpreted the same way one would interpret estimates from a traditional ordinary least squares linear regression: as the constant (intercept) and the slope of each fixed effect predictor. The intercept is interpreted as the mean of the outcome (extro) when all the predictors have a value of zero. The predictor estimates (coefficients or slopes) are interpreted the same way as coefficients from a traditional regression. For instance, a one unit increase in the predictor Openness to new experiences (open) corresponds to a 0.0061302 increase in the outcome Extroversion (extro). Likewise, a one unit increase in the predictor Agreeableness (agree) corresponds to a 0.0077361 decrease in the outcome Extroversion (extro). Furthermore, the categorical predictor classb has a coefficient of 2.0547978, which means the mean Extroversion score of the second group of class (b) is 2.0547978 higher than the mean Extroversion score of the first group of class (a). Class (a) was automatically coded as the reference category by the lmer function because, as in most R functions, the category with the lowest numeric value (or alphabetically first letter) is coded as the reference category. This is very important to note because both SPSS and SAS use the opposite strategy: they code categorical variables so that the reference category is the category with the highest numerical value (or alphabetically last letter). This difference in strategies means that output from SPSS and SAS will agree with each other but be very different from output produced using the lmer function in R. The key differences will be in the intercept term (which will be substantially different) and the categorical fixed effects coefficients (which will be similar, but not the same). Of course, the really important point is that those differences then produce very different predicted values. If interested in getting the three programs to match, simply reverse code the categorical variable values in the SPSS and SAS versions of the data.

The last section of the output simply provides the correlations among the fixed effects variables, which can be used to assess multicollinearity. As we can see in our output (above), the predictors are not related, with the obvious and expected exception of the categories of class; therefore, multicollinearity is not a concern.

4.4. Extracting Elements of the Output. The default output shown by the summary function (above) has elements which can be extracted and either viewed or assigned to an object. There are also several other elements of the lmer object which can be extracted and may be useful or meaningful. To extract the estimates of the fixed effects:
> fixef(lmm.2)
  (Intercept)          open         agree        social        classb        classc        classd
57.3838786610  0.0061301543 -0.0077360956  0.0005312872  2.0547977919  3.7049300287  5.6657331872
To extract the estimates of the random effects, i.e. each group's deviation from the corresponding fixed effect estimate (only the tail of the nested effect's output, groups c:III through d:VI, is shown here):

> ranef(lmm.2)
$`class:school`
      (Intercept)
...
c:III  -0.3458363
c:IV   -0.2497661
c:V    -0.3678312
c:VI   -0.1753169
d:I     1.2898957
d:II   -1.1384331
d:III  -1.3554610
d:IV   -1.2252249
d:V    -0.9876851
d:VI    3.4169085
...
To extract the coefficients for the random effects, one set for each grouping level; here, the nested effect (4 groups of class within 6 groups of school) and the higher level effect of school:
> coef(lmm.2)
$`class:school`
      (Intercept)        open        agree       social   classb  classc   classd
a:I      53.97657 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
a:II     58.31526 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
a:III    58.73534 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
a:IV     58.65125 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
a:V      58.58580 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
a:VI     56.03906 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
b:I      57.68797 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
b:II     57.65618 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
b:III    57.67410 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
b:IV     57.65030 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
b:V      57.72731 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
b:VI     55.90742 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
c:I      58.77320 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
c:II     57.13330 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
c:III    57.03804 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
c:IV     57.13411 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
c:V      57.01605 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
c:VI     57.20856 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
d:I      58.67377 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
d:II     56.24545 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
d:III    56.02842 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
d:IV     56.15865 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
d:V      56.39619 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
d:VI     60.80079 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733

$school
    (Intercept)        open        agree       social   classb  classc   classd
I      43.39230 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
II     51.26820 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
III    55.41672 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
IV     59.32421 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
V      63.64807 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
VI     71.25377 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
As you can see above, we can further use the $ operator to extract just the coefficients for the random effect of school (or just the coefficients for the nested effect, via $`class:school`):
> coef(lmm.2)$'school'
    (Intercept)        open        agree       social   classb  classc   classd
I      43.39230 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
II     51.26820 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
III    55.41672 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
IV     59.32421 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
V      63.64807 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
VI     71.25377 0.006130154 -0.007736096 0.0005312872 2.054798 3.70493 5.665733
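Note how these extractor functions relate to one another: for each grouping level, the coefficients returned by coef are simply the fixed effects with that level's random effect deviations added to the intercept. A small sketch verifying this relationship, using the model object from above:

# The school intercepts from coef() equal the fixed intercept
# plus the school level deviations from ranef().
fixed.int <- unname(fixef(lmm.2)["(Intercept)"])
fixed.int + ranef(lmm.2)$school[, "(Intercept)"]
coef(lmm.2)$school[, "(Intercept)"]  # the same six values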
To extract the fitted or predicted values based on the model parameters and data; here the predicted values are assigned the name yhat:
> yhat <- fitted(lmm.2)
> summary(yhat)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  39.91   54.43   60.16     ...     ...   80.49
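A quick visual check of model fit is to plot these fitted values against the observed outcome; a minimal sketch in base graphics, using the data frame and yhat from above:

# Observed versus fitted Extroversion; points near the diagonal indicate good fit.
plot(lmm.data$extro, yhat, xlab = "Observed extro", ylab = "Fitted extro")
abline(0, 1)  # reference line with intercept 0 and slope 1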
To extract the residuals (errors) and summarize them, as well as plot them (they should be approximately normally distributed around a mean of zero):
> residuals <- resid(lmm.2)
> summary(residuals)
     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
 -9.84100 -0.32980  0.00553  0.00000      ...      ...
> hist(residuals)
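Beyond the histogram, a normal quantile-quantile plot is a common additional check on the residuals; a minimal sketch in base graphics:

# Q-Q plot of the residuals against a normal distribution.
qqnorm(residuals)
qqline(residuals)  # the points should fall close to this reference line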
5. Intraclass Correlation. A null model containing only the intercept and the random effect of school provides the variance estimates needed to calculate the Intraclass Correlation Coefficient (ICC) for school. The model name lmm.null used below is illustrative:

> lmm.null <- lmer(extro ~ 1 + (1|school), data = lmm.data)
> summary(lmm.null)
Linear mixed model fit by REML
Formula: extro ~ 1 + (1 | school)
   Data: lmm.data
  AIC  BIC logLik deviance REMLdev
 5812 5827  -2903     5811    5806
Random effects:
 Groups   Name        Variance Std.Dev.
 school   (Intercept) 95.8720  9.7914
 Residual              7.1399  2.6721
Number of obs: 1200, groups: school, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)   60.267      3.997   15.08
Next, add the variance estimates together and then divide the school random effect variance estimate by the total variance estimate.
> 95.8720 + 7.1399
[1] 103.0119
> 95.8720 / 103.0119
[1] 0.9306886
So, we see that the ICC is .9306886 (verified below). Another way to get the ICC is with the multilevel package (Bliese, 2009). First, conduct a standard one way ANOVA using the base aov function.
> aov.1 <- aov(extro ~ school, lmm.data)
> summary(aov.1)
              Df Sum Sq Mean Sq F value    Pr(>F)
school         5  95908 19181.5  2686.5 < 2.2e-16 ***
Residuals   1194   8525     7.1
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Then load the multilevel library so that we can use the ICC1 and ICC2 functions.
> library(multilevel)
Loading required package: nlme
Attaching package: 'nlme'
The following object(s) are masked from 'package:lme4':
    BIC, fixef, lmList, ranef, VarCorr
Loading required package: MASS
Next, we can run the ICC1 function to obtain the Intraclass Correlation (which matches the value from above).
> ICC1(aov.1)
[1] 0.930689
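For reference, ICC1 can be reproduced by hand from the ANOVA table above using the standard one way formula (Bliese, 2009), where k is the number of cases per school group; a quick sketch (the rounded mean squares make the result approximate):

MSB <- 19181.5   # mean square between schools, from the ANOVA table
MSW <- 7.1       # mean square within schools (residual)
k   <- 1200 / 6  # cases per school group
(MSB - MSW) / (MSB + (k - 1) * MSW)  # approximately 0.931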
ICC1 indicates that 93.07% of the variance in 'extro' can be "explained" by school group membership. We can also get the ICC2, which is a measure of reliability.
> ICC2(aov.1)
[1] 0.9996278
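ICC2, the reliability of the school group means, follows from the same mean squares; continuing the sketch above:

(MSB - MSW) / MSB  # approximately 0.9996, the reliability of the group means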
The ICC2 value of .9996 indicates that school groups can be very reliably differentiated in terms of extro scores. Remember to detach the multilevel package before continuing with the next section.
> detach("package:multilevel")
Markov Chain Monte Carlo (MCMC) sampling can be used to generate the posterior distribution of the parameters of the fitted model; here, 5000 samples are drawn from the lmm.2 model with the mcmcsamp function and saved as an object named mcmc.5000 (the call appears at the top of the next output block). To show the structure of the MCMC object, elements of which can then be extracted using the MCMC object name and @ (examples are further below):
> mcmc.5000 <- mcmcsamp(lmm.2, n = 5000)
> str(mcmc.5000)
Formal class 'merMCMC' [package "lme4"] with 9 slots
  ..@ Gp      : int [1:3] 0 24 30
  ..@ ST      : num [1:2, 1:5000] 1.73 9.91 1.68 9.96 1.67 ...
  ..@ call    : language lmer(formula = extro ~ open + agree + social + class + (1 | school/class), data = lmm.data)
  ..@ deviance: num [1:5000] 3509 3509 3509 3509 3509 ...
  ..@ dims    : Named int [1:18] 2 1200 7 30 1 2 0 1 2 5 ...
  .. ..- attr(*, "names")= chr [1:18] "nt" "n" "p" "q" ...
  ..@ fixef   : num [1:7, 1:5000] 57.383879 0.00613 -0.007736 0.000531 2.054798 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:7] "(Intercept)" "open" "agree" "social" ...
  .. .. ..$ : NULL
  ..@ nc      : int [1:2] 1 1
  ..@ ranef   : num [1:30, 1:5000] -3.407 0.931 1.351 1.267 1.202 ...
  ..@ sigma   : num [1, 1:5000] 0.984 0.968 1.003 0.996 0.98 ...
To extract the fixed effect parameter estimates from the MCMC object (output of the 7 by 5000 matrix of parameter estimates not shown):
> mcmc.5000@fixef
To extract the random effect parameter estimates from the MCMC object (output of the matrix of parameter estimates not shown):
> mcmc.5000@ranef
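Because these slots are plain numeric matrices (parameters in rows, MCMC samples in columns), posterior summaries can be computed directly with base R; a minimal sketch using the object from above:

# Posterior means of the fixed effects across the 5000 samples.
rowMeans(mcmc.5000@fixef)

# Empirical 95% quantile intervals; the result has one column per
# parameter, with the 2.5% and 97.5% bounds in its rows (compare
# with the HPD intervals shown further below).
apply(mcmc.5000@fixef, 1, quantile, probs = c(0.025, 0.975))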
Deviance is a measure of fit; the smaller the deviance statistic, the better the model fits the data. To extract and summarize the Maximum Likelihood Deviance:
> dev <- as.vector(mcmc.5000@deviance)
> summary(dev)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   3509    3808    3828    3807    3846    3918
To show the Highest Posterior Density (HPD) intervals for the parameters of an MCMC distribution (which essentially provides confidence intervals for the posterior parameters):
> HPDinterval(mcmc.5000, prob = 0.95)
$fixef
                   lower        upper
(Intercept) 55.648253310 58.975371325
open        -0.005528170  0.016827333
agree       -0.020263349  0.005456398
social      -0.003820699  0.004483850
classb       0.843105909  3.165402291
classc       2.509578421  4.816869867
classd       4.503296720  6.840049292
attr(,"Probability")
[1] 0.95

$ST
         lower    upper
[1,] 0.7267248 1.018011
[2,] 0.8629657 3.426454
attr(,"Probability")
[1] 0.95

$sigma
        lower   upper
[1,] 1.014880 1.21526
attr(,"Probability")
[1] 0.95

$ranef
            lower       upper
 [1,]  -6.8912029  -3.1898835
 [2,]  -1.2099349   1.4023157
 [3,]  -0.1084854   2.2773701
 [4,]   0.2550491   2.6539355
 [5,]   0.7257380   3.3318154
 [6,]  -1.4673513   2.0908638
 [7,]  -3.2369864   0.3176006
 [8,]  -1.8458023   0.7151407
 [9,]  -1.1854551   1.1665241
[10,]  -0.6519829   1.6672193
[11,]  -0.1414238   2.5049994
[12,]  -1.4623917   2.0688023
[13,]  -2.2266973   1.3912863
[14,]  -2.3333123   0.2616849
[15,]  -1.7605533   0.6277444
[16,]  -1.1287503   1.2296900
[17,]  -0.8618493   1.7422243
[18,]  -0.2751656   3.3046945
[19,]  -2.3236627   1.2985081
[20,]  -3.3204482  -0.7280814
[21,]  -2.7984293  -0.3895353
[22,]  -2.0954523   0.2725904
[23,]  -1.4421025   1.1768267
[24,]   3.2136308   6.7736435
[25,] -13.9539192 -10.1167238
[26,]  -6.6835666  -3.5847653
[27,]  -3.2718712  -0.2672930
[28,]   0.1375870   3.1094643
[29,]   3.8363158   6.9669211
[30,]   9.9230090  13.8334339
attr(,"Probability")
[1] 0.95
As with some of the objects above, we can use the $ to extract elements of the HPD interval output. Note that the 30 rows of $ranef correspond to the 24 class:school intercepts followed by the 6 school intercepts (consistent with the Gp slot in the str output above). Here we extract just the intervals for the fixed effects ($fixef):
> HPDinterval(mcmc.5000, prob = 0.95)$fixef
                   lower        upper
(Intercept) 55.648253310 58.975371325
open        -0.005528170  0.016827333
agree       -0.020263349  0.005456398
social      -0.003820699  0.004483850
classb       0.843105909  3.165402291
classc       2.509578421  4.816869867
classd       4.503296720  6.840049292
attr(,"Probability")
[1] 0.95
6. Alternatives
As mentioned at the beginning of this article, there are other R packages available and in development for doing mixed effects modeling (linear and otherwise) or multilevel modeling; some of these alternatives are also capable of applying MCMC methods to the fitted model. A few of those alternatives are HGLMMM, MCMCglmm, and multilevel. However, the lme4 package represents the long term (and continuing) development of the nlme package for mixed effects modeling, which has been developed and used for more than 10 years.

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, AC-19, 716-723. Available at: http://www.unt.edu/rss/class/Jon/MiscDocs/Akaike_1974.pdf

Bates, D., & Maechler, M. (2010). Package 'lme4'. Reference manual for the package, available at: http://cran.r-project.org/web/packages/lme4/lme4.pdf

Bates, D. (2010). Linear mixed model implementation in lme4. Package lme4 vignette, available at: http://cran.r-project.org/web/packages/lme4/vignettes/Implementation.pdf

Bliese, P. (2009). Package 'multilevel'. Reference manual for the package, available at: http://cran.r-project.org/web/packages/multilevel/index.html

Gelman, A. (2005). Analysis of variance: Why it is more important than ever. The Annals of Statistics, 33(1), 1-53. Available at: http://www.unt.edu/rss/class/Jon/MiscDocs/Gelman_2005.pdf

Kreft, I., & De Leeuw, J. (1998). Introducing multilevel modeling. London: Sage Publications Ltd.

Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461-464. Available at: http://www.unt.edu/rss/class/Jon/MiscDocs/Schwarz_1978.pdf

Additional Resources

Bates, D. (2010). Computational methods for mixed models. Package lme4 vignette, available at: http://cran.r-project.org/web/packages/lme4/vignettes/Theory.pdf

Bates, D. (2010). Penalized least squares versus generalized least squares representations of linear mixed models. Package lme4 vignette, available at: http://cran.r-project.org/web/packages/lme4/vignettes/PLSvGLS.pdf

Bliese, P. (2009). Multilevel modeling in R: A brief introduction to R, the multilevel package and the nlme package. Available at: http://cran.r-project.org/doc/contrib/Bliese_Multilevel.pdf

Draper, D. (1995). Inference and hierarchical modeling in the social sciences. Journal of Educational and Behavioral Statistics, 20(2), 115-147. Available at: http://www.unt.edu/rss/class/Jon/MiscDocs/Draper_1995.pdf

Fox, J. (2002). Linear mixed models: An appendix to An R and S-PLUS companion to applied regression. Available at: http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-mixed-models.pdf

Hofmann, D. A., Griffin, M. A., & Gavin, M. B. (2000). The application of hierarchical linear modeling to organizational research. In K. J. Klein (Ed.), Multilevel theory, research, and methods in organizations: Foundations, extensions, and new directions (pp. 467-511). San Francisco, CA: Jossey-Bass. Available at: http://www.unt.edu/rss/class/Jon/MiscDocs/Hofmann_2000.pdf

Raudenbush, S. W. (1995). Reexamining, reaffirming, and improving application of hierarchical models. Journal of Educational and Behavioral Statistics, 20(2), 210-220. Available at: http://www.unt.edu/rss/class/Jon/MiscDocs/Raudenbush_1995.pdf

Raudenbush, S. W. (1993). Hierarchical linear models and experimental design. In L. Edwards (Ed.), Applied analysis of variance in behavioral science (pp. 459-496). New York: Marcel Dekker. Available at: http://www.unt.edu/rss/class/Jon/MiscDocs/Raudenbush_1993.pdf

Rogosa, D., & Saner, H. (1995). Longitudinal data analysis examples with random coefficient models. Journal of Educational and Behavioral Statistics, 20(2), 149-170. Available at: http://www.unt.edu/rss/class/Jon/MiscDocs/Rogosa_1995.pdf
Until next time; Freedom is just another word for nothing left to lose