
Journal of Statistical Software
January 2010, Volume 33, Issue 2. http://www.jstatsoft.org/

MCMC Methods for Multi-Response Generalized Linear Mixed Models: The MCMCglmm R Package

Jarrod D. Hadfield
University of Edinburgh

Abstract
Generalized linear mixed models provide a flexible framework for modeling a range of
data, although with non-Gaussian response variables the likelihood cannot be obtained
in closed form. Markov chain Monte Carlo methods solve this problem by sampling
from a series of simpler conditional distributions that can be evaluated. The R package
MCMCglmm implements such an algorithm for a range of model fitting problems. More
than one response variable can be analyzed simultaneously, and these variables are allowed
to follow Gaussian, Poisson, multi(bi)nomial, exponential, zero-inflated and censored
distributions. A range of variance structures are permitted for the random effects, including
interactions with categorical or continuous variables (i.e., random regression), and more
complicated variance structures that arise through shared ancestry, either through a
pedigree or through a phylogeny. Missing values are permitted in the response variable(s) and
data can be known up to some level of measurement error as in meta-analysis. All
simulation is done in C/C++ using the CSparse library for sparse linear systems.

Keywords: MCMC, linear mixed model, pedigree, phylogeny, animal model, multivariate,
sparse, R.

Due to their flexibility, linear mixed models are now widely used across the sciences (Brown
and Prescott 1999; Pinheiro and Bates 2000; Demidenko 2004). However, generalizing these
models to non-Gaussian data has proved difficult because integrating over the random effects
is intractable (McCulloch and Searle 2001). Although techniques that approximate these
integrals (Breslow and Clayton 1993) are now popular, Markov chain Monte Carlo (MCMC)
methods provide an alternative strategy for marginalizing the random effects that may be more
robust (Zhao, Staudenmayer, Coull, and Wand 2006; Browne and Draper 2006). Developing
MCMC methods for generalized linear mixed models (GLMM) is an active area of research
(e.g., Zeger and Karim 1991; Damien, Wakefield, and Walker 1999; Sorensen and Gianola
2002; Zhao et al. 2006), and several software packages are now available that implement
these techniques, e.g., WinBUGS (Spiegelhalter, Thomas, Best, and Lunn 2003), MLwiN
(Rasbash, Steele, Browne, and Prosser 2005), glmmBUGS (Brown 2009), JAGS (Plummer
2003). However, these methods often require a certain level of expertise on behalf of the
user and may take a great deal of computing time. The MCMCglmm package for R (R
Development Core Team 2009) implements Markov chain Monte Carlo routines for fitting
multi-response generalized linear mixed models. A range of distributions are supported and
several types of variance structure for the random effects and the residuals can be fitted. The
aim is to provide routines that require little expertise on behalf of the user while reducing
the amount of computing time required to adequately sample the posterior distribution. The
package is available from the Comprehensive R Archive Network at
http://CRAN.R-project.org/package=MCMCglmm.
In this paper we explain the underlying structure of GLMMs and then briefly describe a
general strategy for estimating the parameters. Few new results are presented, and we would
like to acknowledge that many of the statistical results can be found in Sorensen and Gianola
(2002) and many of the algorithm details that allow the models to be fitted efficiently can be
found in Davis (2006). The main body of the paper introduces the software, using a worked
example taken from a quantitative genetic experiment. We end by comparing the routines
with WinBUGS (Spiegelhalter et al. 2003), and find MCMCglmm to be nearly 40 times faster
per iteration, and to have an effective sample size per iteration more than 3 times greater.

1. Model form
The model has three components: a) probability density functions that relate the data y to
latent variables l on the link scale; b) a standard linear mixed model with fixed and random
predictors applied to l; and c) variance structures that describe the expected (co)variances
between the location effects (fixed and random effects). Although we develop these models in
a Bayesian context, where the distinction between fixed and random effects does not technically
exist, we make the distinction throughout the manuscript as the terminology is well entrenched
and understood.

1.1. Probability of the data y given the latent variable l


The probability of the i-th data point is represented by:

f_i(y_i | l_i)    (1)

where f_i is the probability density function associated with y_i. For example, if y_i was assumed
to be Poisson distributed and we used the canonical log link function, then Equation 1 would
have the form:

f_P(y_i | λ = exp(l_i))    (2)

where λ is the canonical parameter of the Poisson density function f_P.
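As a minimal sketch of Equation 2 (illustrative R, not MCMCglmm internals), the likelihood contribution of a single Poisson datum given its latent variable can be evaluated directly:

R> y_i <- 3                       # a hypothetical observed count
R> l_i <- 0.9                     # its latent variable on the log scale
R> dpois(y_i, lambda = exp(l_i))  # f_P(y_i | lambda = exp(l_i)) of Equation 2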

1.2. Linear model for the latent variables l


The vector of latent variables is predicted by the linear model:

l = Xβ + Zu + e    (3)

where X is a design matrix relating fixed predictors to the data, and Z is a design matrix
relating random predictors to the data. These predictors have associated parameter vectors
β and u, and e is a vector of residuals. In the Poisson case these residuals deal with any
over-dispersion in the data after accounting for fixed and random sources of variation.
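The roles of the matrices in Equation 3 can be made concrete with a small simulation sketch (all dimensions and values hypothetical):

R> set.seed(1)
R> n <- 100
R> x <- rnorm(n)                            # a continuous fixed predictor
R> group <- factor(sample(1:10, n, TRUE))   # a random grouping factor
R> X <- model.matrix(~ x)                   # fixed-effect design matrix
R> Z <- model.matrix(~ group - 1)           # random-effect design matrix
R> beta <- c(0.5, 1)                        # fixed effects
R> u <- rnorm(10, 0, 0.7)                   # random effects
R> e <- rnorm(n, 0, 0.3)                    # residuals
R> l <- X %*% beta + Z %*% u + e            # Equation 3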

1.3. Variance structures for the model parameters


The location effects (β and u) and the residuals (e) are assumed to come from a multivariate
normal distribution:

\begin{bmatrix} \beta \\ u \\ e \end{bmatrix} \sim N\left( \begin{bmatrix} \beta_0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} B & 0 & 0 \\ 0 & G & 0 \\ 0 & 0 & R \end{bmatrix} \right)    (4)

where β0 are the prior means for the fixed effects with prior covariance matrix B, and G and
R are the expected (co)variances of the random effects and residuals respectively. The zero
off-diagonal matrices imply a priori independence between fixed effects, random effects, and
residuals. Generally, G and R are large square matrices with dimensions equal to the number
of random effects and residuals. Typically they are unknown and must be estimated from
the data, usually by assuming they are structured in a way that allows them to be parametrized
by a few parameters. Below we will focus on the structure of G, but the same logic can be applied
to R.
At its most general, MCMCglmm allows variance structures of the form:

G = (V_1 ⊗ A_1) ⊕ (V_2 ⊗ A_2) ⊕ …    (5)

where the parameter (co)variance matrices (V) are usually low-dimensional and are to be
estimated, and the structured matrices (A) are usually high dimensional and treated as
known. We will refer to terms separated by a direct sum (⊕) as component terms, and the use
of a direct sum explicitly assumes random effects associated with different component terms
are independent. Each component term, however, is formed through the Kronecker product
(⊗) which allows for possible dependence between random effects within a component term.
Equation 5 can be expanded to give:

G = \begin{bmatrix} V_1 \otimes A_1 & 0 \\ 0 & V_2 \otimes A_2 \end{bmatrix}    (6)

where the zero off-diagonals represent the independence between component terms.
In the simplest models the structured matrices of each component term are often assumed to
be identity matrices and the parameter (co)variance matrices scalar variances:

V_1 ⊗ A_1 = σ²_1 I    (7)

which assumes that random effects within a component term are independent but have a
common variance. However, independence between different levels is often too strong an
assumption. For example, if we had made two visits to a sample of schools and recorded test
scores for the children, we may expect dependence between measurements made in the same
school although they were sampled at different times. If the random effects are ordered schools
within time periods (u^⊤ = [u_1^⊤ u_2^⊤]), where u_1 are the random effects for the schools at
time period one, and u_2 for the same set of schools at time period two, then an appropriate G
component may have the form:

V_1 \otimes A_1 = \begin{bmatrix} \sigma^2_{u_1} & \sigma_{u_1,u_2} \\ \sigma_{u_2,u_1} & \sigma^2_{u_2} \end{bmatrix} \otimes I    (8)

Here the diagonal elements model different variances for the two sampling periods, and the
covariance captures any persistent differences between schools. The identity matrix in the
Kronecker product implies the schools are independent. Although the assumption of inde-
pendence may be adequate in many applications, there are situations where it is not tenable.
For example, when data have been collected on related individuals, or related species, then
complicated patterns of dependence can arise if the characteristics are heritable. In these
cases A is not an identity matrix but a matrix whose elements are equal to the proportion of
genes the two individuals have in common.
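The structures in Equations 5-8 can be sketched with base R's kronecker() and the Matrix package's bdiag() (purely illustrative; MCMCglmm constructs these matrices internally):

R> library("Matrix")
R> V1 <- matrix(c(1.0, 0.3, 0.3, 0.5), 2, 2)  # 2 x 2 parameter matrix, as in Equation 8
R> A1 <- diag(4)                              # four schools, assumed independent
R> V2 <- matrix(2)                            # a single scalar variance, as in Equation 7
R> A2 <- diag(3)
R> G <- bdiag(kronecker(V1, A1), kronecker(V2, A2))  # Equation 5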

2. Parameter estimation and DIC


For most types of model (non-Gaussian data) the distribution of l is not in a recognizable
form and is updated using either Metropolis-Hastings updates or the slice sampling method
of Damien et al. (1999). Latent variables whose residuals are non-independent are sampled in
blocks using Metropolis-Hastings updates and an efficient proposal distribution is determined
during the burn-in phase using adaptive methods (Haario, Saksman, and Tamminen 2001;
Ovaskainen, Rekola, Meyke, and Arjas 2008). The parameters of the mixed model (β and u)
follow a multivariate normal distribution and can be Gibbs sampled in a single block using
the method of Garcia-Cortes and Sorensen (2001). This method requires solving a large, but
often sparse set of linear equations which can be done efficiently using methods provided in
the CSparse library (Davis 2006). With conjugate priors the variance structures (R and G)
follow an inverse-Wishart distribution which can also be Gibbs sampled in a single block in
many instances. By fitting non-identified multiplicative working parameters for the random
effects, non-central F-distributed priors for the variance components can be fitted (Gelman
2006). This involves updating the working parameters each iteration, which again can be
achieved using the method of Garcia-Cortes and Sorensen (2001).
The deviance and hence the deviance information criterion (DIC) can be calculated in dif-
ferent ways depending on what is in ‘focus’ (Spiegelhalter, Best, Carlin, and van der Linde
2002). For non-Gaussian response variables (including censored Gaussian) MCMCglmm cal-
culates the deviance using the probability of the data given the latent variables. For Gaussian
data, however, the deviance is calculated using the probability of the data given the location
parameters θ^⊤ = [β^⊤ u^⊤].
In the appendix the conditional distributions, and computational strategies for sampling from
them, are described in more detail, together with a more in-depth explanation of the
computation of the deviance and DIC.

3. Software
To illustrate the software we reanalyze experimental data collected on a Eurasian passerine
bird, the blue tit (Cyanistes caeruleus); see Hadfield, Nutall, Osorio, and Owens (2007). The
data consist of measurements taken on 828 chicks distributed across 106 broods:

R> library("MCMCglmm")
R> data("BTdata")
R> BTdata[1,]

     tarsus     back  animal     dam fosternest  hatchdate sex
1 -1.892297 1.146421 R187142 R187557      F2102 -0.6874021 Fem

The day after the chicks hatch, approximately half of the brood are reciprocally swapped with
chicks from another nest. This results in an unbalanced cross-classified data structure where
chicks share a fosternest with both relatives and non-relatives. Using molecular methods
(Griffiths, Double, Orr, and Dawson 1998) the sex of the chicks was determined in 94% of
cases, and the response variables, tarsus length and back color, were measured in all birds.
The response variables are approximately normal and were mean centered and scaled to unit
variance. The date on which the chicks hatched was recorded for all nests. The parental
generation is assumed to consist of unrelated individuals and all chicks from the same family
are assumed to share the same mother and father. Although in this example family structure
could be modeled more efficiently by fitting the genetic mother (dam) as a random effect, we
will use the more general animal model (Henderson 1976), which is parametrized in terms of
the relationship matrix, A. The relationship matrix is defined by the pedigree:

R> data("BTped")
R> BTped[1,]

   animal  dam sire
1 R187557 <NA> <NA>

a three-column data frame with an individual's identifier (animal) in the first column and its
parental identifiers in the second and third columns. The pedigree often contains more
individuals than are present in the data frame (in this example the pedigree also includes the
parental generation), but all animals in the data frame must have a row in the pedigree.

3.1. MCMCglmm arguments


The function MCMCglmm within the R package of the same name is used for model fitting.
Hadfield et al. (2007) were interested in estimating the covariance between tarsus and back
for different sources of variation and to achieve this we fitted the model:

R> m1 <- MCMCglmm(
+    fixed = cbind(tarsus, back) ~ trait:sex + trait:hatchdate - 1,
+    random = ~ us(trait):animal + us(trait):fosternest,
+    rcov = ~ us(trait):units, prior = prior,
+    family = c("gaussian", "gaussian"), nitt = 60000, burnin = 10000,
+    thin = 25, data = BTdata, pedigree = BTped)

In the following sections we work through the four main arguments taken by MCMCglmm: those
that specify the response variables and fixed effects (fixed), the distribution of the response
variables (family), the random effects and associated G-structure (random), and the R-
structure (rcov). The syntax used to specify the model closely follows that used by asreml
(Butler, Cullis, Gilmour, and Gogel 2007), an R interface to ASReml (Gilmour, Gogel, Cullis,
Welham, and Thompson 2002), a program for fitting GLMMs using restricted maximum
likelihood (REML).

3.2. fixed: Response variables and fixed effects


The fixed argument follows the standard R formula language, and although multiple re-
sponses can be passed as a single vector, it is perhaps easier in many cases to pass them as a
matrix using cbind. For example,

fixed = cbind(tarsus, back) ~ trait:sex + trait:hatchdate - 1

defines a bivariate model with the responses tarsus and back. For multi-response models
it is usual to make use of the reserved variables trait and units which index columns and
rows of the response matrix, respectively. To understand the use of these variables it can be
easier to think of the response as stacked column-wise:

         tarsus       back                       y   trait  units
  1  -1.8922972  1.1464212             -1.8922972  tarsus      1
  2   1.1361098 -0.7596521    =⇒        1.1361098  tarsus      2
 ..          ..         ..                     ..      ..     ..
828   0.8332690 -1.4387430              0.8332690  tarsus    828
                                        1.1464212    back      1
                                       -0.7596521    back      2
                                               ..      ..     ..
                                       -1.4387430    back    828

By fitting trait as a fixed effect we allow the two responses to have different means, and
by fitting interactions such as trait:hatchdate we allow different regression slopes of the
traits on hatchdate. Multi-response models are generally easier to interpret when an
overall intercept is suppressed (-1); otherwise the parameter estimates associated with back
are interpreted as contrasts with tarsus.
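The stacking can be mimicked explicitly (a sketch of the internal reshaping only; the reserved variables trait and units are created automatically and need not be constructed by the user):

R> stacked <- data.frame(
+    y     = c(BTdata$tarsus, BTdata$back),
+    trait = gl(2, nrow(BTdata), labels = c("tarsus", "back")),
+    units = rep(seq_len(nrow(BTdata)), 2))
R> head(stacked)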

3.3. family: Response variable distributions


For the above model, two distributions must be specified in the family argument, and we
assume Gaussian distributions with identity link functions for both:

family = c("gaussian", "gaussian")

Other distributions and link functions can be specified (see Table 1). Some distributions
require more data columns than linear predictors. For example, censored data are passed as
two columns, the first specifying the lowest value the data could take and the second column
specifying the highest value the data could take. However, only a single linear predictor
(associated with the uncensored but unobserved data) is fitted for that distribution, and it
should be remembered that in this case trait is really indexing linear predictors, not data.
Another example of this is the binomial distribution (specified as "multinomial2" in the
family argument), which is generally specified as a two-column response of successes and
failures but is parametrized by a single linear predictor of the log odds ratio. In addition,
some distributions actually have more linear predictors than data columns. For example, the
zero-inflated Poisson has two linear predictors: one for predicting zero-inflation and one for
predicting the Poisson counts. Similarly, categorical data, although passed as a single response,
are treated as a multinomial response with J - 1 linear predictors (where J is the number
of categories). Again, it should be remembered that in this case several levels of trait may
be associated with different aspects of the same data column.

Distribution       #Data  #Liability  Density function
"gaussian"           1        1       P(y) = f_N(wθ, σ²_e)
"poisson"            1        1       P(y) = f_P(exp(l))
"categorical"        1      J - 1     P(y = k | k ≠ 1) = exp(l_k) / (1 + Σ_{j=1}^{J-1} exp(l_j))
                                      P(y = 1) = 1 / (1 + Σ_{j=1}^{J-1} exp(l_j))
"multinomial"        J      J - 1     P(y_k = n_k | k ≠ J) = (exp(l_k) / (1 + Σ_{j=1}^{J-1} exp(l_j)))^{n_k}
                                      P(y_k = n_k | k = J) = (1 / (1 + Σ_{j=1}^{J-1} exp(l_j)))^{n_k}
"ordinal"            1        1       P(y = k) = F_N(γ_k | l, 1) - F_N(γ_{k-1} | l, 1)
"exponential"        1        1       P(y) = f_E(exp(-l))
"cengaussian"        2        1       P(y_1 < y < y_2) = F_N(y_2 | wθ, σ²_e) - F_N(y_1 | wθ, σ²_e)
"cenpoisson"         2        1       P(y_1 < y < y_2) = F_P(y_2 | l) - F_P(y_1 | l)
"cenexponential"     2        1       P(y_1 < y < y_2) = F_E(y_2 | l) - F_E(y_1 | l)
"zipoisson"          1        2       P(y = 0) = exp(l_2)/(1 + exp(l_2)) + (1 - exp(l_2)/(1 + exp(l_2))) f_P(0 | exp(l_1))
                                      P(y | y > 0) = (1 - exp(l_2)/(1 + exp(l_2))) f_P(y | exp(l_1))

Table 1: Distribution types that can be fitted using MCMCglmm. The prefix "zi" stands for
zero-inflated, and the prefix "cen" stands for censored, where y_1 and y_2 are the lower and
upper bounds for the unobserved datum y. J stands for the number of categories in the
multinomial/categorical distributions, and this must be specified in the family argument for
the multinomial distribution. The density function is for a single datum in a univariate model,
with w being a row vector of W. f and F are the density and distribution functions for the
subscripted distribution (N = normal, P = Poisson, E = exponential). The J - 1 γ's in the
ordinal models are the cutpoints, with γ_1 set to zero.
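For instance, a censored Gaussian analysis might be specified as follows (a hypothetical sketch in which low and up are assumed columns of cdata holding the censoring bounds):

R> m.cen <- MCMCglmm(cbind(low, up) ~ 1, family = "cengaussian",
+    data = cdata)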

3.4. random: Random effects and G


Simple variance structures, as represented in Equation 7, can also be specified as a standard
R formula:

random = ~ fosternest + ...

although this is often inappropriate, especially for multi-response models where the implicit
assumption has been made that fosternest effects are identical for both traits. Table 2
summarizes covariance matrix specifications for the general 3 × 3 case, but to illustrate, we
will focus on a 2 × 2 (co)variance matrix (V_f) associated with fosternest effects.
The diagonal elements are the fosternest variance components for tarsus length and back
color, and the off-diagonal elements are the covariance between fosternest effects on the
two traits. The specification above, without an interaction, forces the structure:

V_f = \begin{bmatrix} \sigma^2_f & \sigma^2_f \\ \sigma^2_f & \sigma^2_f \end{bmatrix}    (9)
where all components are forced to be the same. It is natural to form interactions with trait,
as we did with the fixed effects, although there are three possible ways this could be done.
The straightforward interaction trait:fosternest, although still fitting a single variance
component across both traits, assumes that individual effects are independent between traits:

V_f = \begin{bmatrix} \sigma^2_f & 0 \\ 0 & \sigma^2_f \end{bmatrix}    (10)
More useful interactions can be formed using the idh() and us() functions. For example,
idh(trait):fosternest fits heterogeneous variances across traits:

V_f = \begin{bmatrix} \sigma^2_{f:tarsus} & 0 \\ 0 & \sigma^2_{f:back} \end{bmatrix}    (11)
although it still assumes that the two traits are independent at the fosternest level. The
specification us(trait):fosternest fits the completely parametrized matrix that allows for
covariance across traits:
V_f = \begin{bmatrix} \sigma^2_{f:tarsus} & \sigma_{f:tarsus,back} \\ \sigma_{f:back,tarsus} & \sigma^2_{f:back} \end{bmatrix}    (12)

Syntax                      n   Covariance                      Correlation
rfactor                     1   [ V      V      V      ]        [ 1      1      1      ]
                                [ V      V      V      ]        [ 1      1      1      ]
                                [ V      V      V      ]        [ 1      1      1      ]

us(ffactor):rfactor         6   [ V_1,1  V_1,2  V_1,3  ]        [ 1      r_1,2  r_1,3  ]
                                [ V_1,2  V_2,2  V_2,3  ]        [ r_1,2  1      r_2,3  ]
                                [ V_1,3  V_2,3  V_3,3  ]        [ r_1,3  r_2,3  1      ]

ffactor:rfactor             1   [ V      0      0      ]        [ 1      0      0      ]
                                [ 0      V      0      ]        [ 0      1      0      ]
                                [ 0      0      V      ]        [ 0      0      1      ]

rfactor + ffactor:rfactor   2   [ V_1+V_2  V_1      V_1     ]   [ 1      r      r      ]†
                                [ V_1      V_1+V_2  V_1     ]   [ r      1      r      ]
                                [ V_1      V_1      V_1+V_2 ]   [ r      r      1      ]

idh(ffactor):rfactor        3   [ V_1,1  0      0      ]        [ 1      0      0      ]
                                [ 0      V_2,2  0      ]        [ 0      1      0      ]
                                [ 0      0      V_3,3  ]        [ 0      0      1      ]

Table 2: Different random effect specifications, where ffactor is a factor with three levels and
rfactor is a factor with (usually) many levels. The resulting 3 × 3 covariance and correlation
matrices of rfactor effects within and across ffactor levels are given, together with the number
of parameters to be estimated (n). Continuous variables can also go within the variance
structure functions (e.g., us, idh); in this case the associated parameters are regression
coefficients for which (co)variances are estimated. †: r > 0.

Since the experiment was designed to measure the covariances between the two response
variables, completely parametrized (co)variance matrices are specified:

random = ~ us(trait):fosternest + us(trait):animal

For models that have pedigree or phylogenetic effects the vector of random effects needs to
be associated with the inverse relationship matrix A^{-1}. This matrix is formed by passing a
pedigree or phylogeny to the pedigree argument of MCMCglmm. The individuals (or taxa) need
to be associated with a column in the data frame, and this column must be called animal.
It is also possible to fit random interactions between categorical and continuous variables as
in random regression models. For example, a random intercept-slope model with a covariance
term fitted could be specified:

random = ~ us(1+age):individual

or for higher order polynomials the poly function could be used:

random = ~ us(1 + poly(age, 2)):individual

Another form of random effect structure that does not arise in the worked example is that
arising in meta-analysis, where each data point is measured with some error. If the sampling
error around the true value is approximately normal, and the variance of the sampling errors
is known, then random effect meta-analyses can be fitted by passing the sampling variances
to the mev argument of MCMCglmm. In the simplest case, without additional random effects
and with an i.i.d. R-structure, the latent variables are assumed to have the multivariate
normal distribution:

l ~ N(Xβ, D + σ²_e I)    (13)

where D is a diagonal matrix with mev along the diagonal.
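A hypothetical random-effect meta-analysis sketch, assuming a data frame meta with observed effect sizes effect and their standard errors se:

R> m.meta <- MCMCglmm(effect ~ 1, mev = meta$se^2, data = meta)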

3.5. rcov: Residual variance structure R


The R-structure can be parametrized in the same way as the G-structure, although currently
direct sums are not possible. However, unlike the G-structure, it is important that the residual
model is specified in a way that allows each linear predictor to have a unique residual. For
multi-response models, forming an interaction between trait and units satisfies this condition,
and as with the G-structure various types of interaction could be considered. Again, we will
use a fully parametrized covariance matrix:

rcov = ~ us(trait):units

3.6. prior: Prior specifications


If not defined, default priors are used; these are improper and can lead to both inferential
and numerical problems. The prior specification is passed to MCMCglmm via the argument
prior, which takes a list of three elements specifying the priors for the fixed effects (B), the
G-structure (G) and the R-structure (R).
For the fixed effects, a multivariate normal prior distribution can be specified through the
mean vector mu (β0 ) and a (co)variance matrix V (B) passed as list elements of B. The default
has a zero mean vector and a diagonal variance matrix with large variances (1e+10).
For non-parameter-expanded models, the parameter (co)variance matrices are assumed to
have (conditional) inverse-Wishart prior distributions, and individual elements for each
component of the variance structure take the arguments V, n and fix, which specify the expected
(co)variance matrix at the limit, the degree of freedom parameter, and the partition to
condition on. The variance structure prior specification for the above models was

R> prior <- list(R = list(V = diag(2)/3, n = 2), G = list(
+    G1 = list(V = diag(2)/3, n = 2), G2 = list(V = diag(2)/3, n = 2)))

where the expected covariance matrices for all three components of the variance structure
are diagonal matrices implying a priori independence between tarsus and back. The traits
were scaled to have unit variance prior to analysis and so the specification implies the prior
belief that the total variance is evenly split across all three terms. The term fix has been left
unspecified and so all variance parameters are estimated. However, for certain types of model
it is advantageous to be able to fix sub-matrices at certain values and not estimate them. The
fix argument partitions V into (potentially) 4 sub-matrices where the partition occurs on the
fix-th diagonal element. For example, if V is an n × n matrix then V is partitioned:

V = \begin{bmatrix} V_{1:(fix-1),1:(fix-1)} & V_{1:(fix-1),fix:n} \\ V_{fix:n,1:(fix-1)} & V_{fix:n,fix:n} \end{bmatrix}    (14)

and the lower right sub-matrix (V_{fix:n,fix:n}) is fixed and not estimated. When fix = 1 the
whole matrix is fixed.
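For example, with a single-response categorical model the residual variance is not identified by the data and is conventionally fixed at some value; a sketch:

R> prior.fix <- list(R = list(V = 1, fix = 1),
+    G = list(G1 = list(V = 1, n = 1)))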
Two further arguments that can be passed are alpha.mu and alpha.V, which specify the prior
distribution for the non-identified working parameters. When the matrix alpha.V is non-null,
parameter-expanded models are fitted. When the variance structure defines a single variance,
the prior distribution is a scaled non-central F-distribution (Gelman 2006). Without loss of
generality we can have V = 1 in the prior to give:

P(σ²) = f_F(σ²/alpha.V | 1, nu, alpha.mu²/alpha.V)

where f_F is the density function of the F-distribution defined by three parameters: the
numerator and denominator degrees of freedom and the non-centrality parameter, respectively.
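A parameter-expanded prior for a single variance component might then be written as follows (a sketch; nu here is the degree-of-freedom argument written n elsewhere in this section, and a large alpha.V gives a heavy-tailed prior in the spirit of Gelman 2006):

R> prior.px <- list(R = list(V = 1, nu = 0.002),
+    G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 1000)))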

3.7. MCMC output


The model was run for 60,000 iterations with a burn-in phase of 10,000 and a thinning interval
of 25. MCMCglmm returns a list with elements:

- Sol: Posterior distribution of location effects (and cutpoints for ordinal models).

- VCV: Posterior distribution of (co)variance matrices.

- Liab: Posterior distribution of latent variables.

- Deviance: Deviance.

- DIC: Deviance information criterion.

Figure 1: Trace of the sampled output and density estimates for male and female tarsus
length and back color. [Figure not reproduced.]


Figure 2: Trace of the sampled output and density estimates for the genetic covariance matrix
of tarsus length and back color. [Figure not reproduced.]

The samples from the posterior distribution are stored as mcmc objects, which can be summa-
rized and visualized using the coda package (Plummer, Best, Cowles, and Vines 2008). The
element Sol contains the fixed effects (β), and, if pr = TRUE, also the random effects (u). The
element VCV contains the parameter (co)variance matrices stacked column-wise, and, if pl =
TRUE, Liab contains the posterior distribution of latent variables l. The element Deviance
contains the deviance at each stored iteration and DIC contains the deviance information
criterion (Spiegelhalter et al. 2002) calculated over all iterations after burn-in. Traces of the
sampled output and density estimates are shown for the effects of gender on trait expression
(Figure 1) and the genetic covariance matrix associated with animal (Figure 2).
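For example (a sketch; these coda functions operate directly on the returned mcmc objects):

R> plot(m1$Sol[, 1:2])     # trace and density plots, as in Figure 1
R> effectiveSize(m1$VCV)   # effective sample sizes for the (co)variances
R> HPDinterval(m1$VCV)     # 95% highest posterior density intervals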

animal   fosternest   units   DIC
us       us           us      4043.8/4041.9
idh      us           us      4050.5/4050.7
idh      idh          us      4063.0/4062.8
idh      idh          idh     4077.9/4076.7
us       idh          us      4056.2/4059.2
us       idh          idh     4091.1/4089.5
idh      us           idh     4069.8/4069.9
us       us           idh     4081.8/4082.4

Table 3: Deviance information criteria for several models where the covariance between the
response variables for a designated source of variation (columns give the variance function for
each term) was either estimated (us) or set to zero (idh). Each model was run twice in order
to assess the level of Monte Carlo error in calculating DIC.

We also fitted alternative variance structures where some or all covariances were set to zero,
and Table 3 shows the DIC for each model. The priors on the reduced models were set up so
that the marginal prior for the variances was the same as that in the full model. The sampling
error of DIC can be large and so we ran all models for an additional 500,000 iterations.
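For example, the model with the animal-level covariance set to zero differs from m1 only in the random formula (a sketch; prior.idh stands for the correspondingly adjusted prior just described):

R> m2 <- MCMCglmm(cbind(tarsus, back) ~ trait:sex + trait:hatchdate - 1,
+    random = ~ idh(trait):animal + us(trait):fosternest,
+    rcov = ~ us(trait):units, prior = prior.idh,
+    family = c("gaussian", "gaussian"), data = BTdata, pedigree = BTped)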

3.8. Comparison with WinBUGS


We also fitted an identical model in WinBUGS (code available from the author) using a
multivariate extension to the method proposed by Waldmann (2009). On a 2.5 GHz dual-core
MacBook Pro with 2 GB RAM, MCMCglmm took 7.6 minutes and WinBUGS took
4.8 hours to fit the model. Moreover, the number of effective samples was 3.2 times higher
in MCMCglmm (averaged over all parameters) indicating that the chain has better mixing
properties. Because MCMCglmm samples all location parameters in a single block the gains
in efficiency are expected to be even higher when the parameters show stronger posterior
correlation.

4. Concluding remarks
This paper introduces an R package for fitting multi-response generalized linear mixed models
using Markov chain Monte Carlo techniques developed in quantitative genetics (Sorensen and
Gianola 2002). A key aspect of these techniques is that they update all location effects (fixed
and random) as a single block which results in better mixing properties and shorter chain
lengths than alternative strategies. This can involve repeatedly solving a very large but sparse
set of mixed model equations, and the computational cost of doing this is minimized by using
the CSparse C libraries for solving sparse linear systems (Davis 2006). For the example data
set analyzed, MCMCglmm collected 120 times more effective samples per unit time than
the same model fitted in WinBUGS. A range of distributions for the response variables are
permitted, and flexible variance structures for the random effects and residuals are included. It
is hoped that this package makes the flexibility and simplicity of generalized linear mixed
modeling available to a wider range of researchers.

Acknowledgments
This work would not have been possible without the CSparse library written by Tim Davis and
the comprehensive book on MCMC and mixed models by Sorensen and Gianola (2002). This
work was funded by NERC and a Leverhulme Trust award to Loeske Kruuk, who together with
Shinichi Nakagawa and two anonymous reviewers made helpful comments on this manuscript.
I also thank Dylan Childs for help with fitting the example in WinBUGS.

References

Breslow NE, Clayton DG (1993). "Approximate Inference in Generalized Linear Mixed
Models." Journal of the American Statistical Association, 88(421), 9-25.

Brown H, Prescott R (1999). Applied Mixed Models in Medicine. John Wiley & Sons, New
York.

Brown P (2009). glmmBUGS: Generalized Linear Mixed Models and Spatial Models with
BUGS. R package version 1.6.4, URL http://CRAN.R-project.org/package=glmmBUGS.

Browne WJ, Draper D (2006). "A Comparison of Bayesian and Likelihood-Based Methods
for Fitting Multilevel Models." Bayesian Analysis, 1(3), 473-514.

Browne WJ, Steele F, Golalizadeh M, Green MJ (2009). "The Use of Simple Reparameteri-
zations to Improve the Efficiency of Markov Chain Monte Carlo Estimation for Multilevel
Models with Applications to Discrete Time Survival Models." Journal of the Royal
Statistical Society A, 172, 579-598.

Butler D, Cullis BR, Gilmour AR, Gogel BJ (2007). Analysis of Mixed Models for S-Language
Environments: ASReml-R Reference Manual. Queensland DPI, Brisbane, Australia. URL
http://www.vsni.co.uk/resources/doc/asreml-R.pdf.

Cowles MK (1996). "Accelerating Monte Carlo Markov Chain Convergence for Cumulative-
Link Generalized Linear Models." Statistics and Computing, 6(2), 101-111.

Damien P, Wakefield J, Walker S (1999). "Gibbs Sampling for Bayesian Non-Conjugate and
Hierarchical Models by Using Auxiliary Variables." Journal of the Royal Statistical Society
B, 61, 331-344.

Davis TA (2006). Direct Methods for Sparse Linear Systems. SIAM, Philadelphia.

Demidenko E (2004). Mixed Models: Theory and Application. John Wiley & Sons, New
Jersey.

Garcia-Cortes LA, Sorensen D (2001). "Alternative Implementations of Monte Carlo EM
Algorithms for Likelihood Inferences." Genetics Selection Evolution, 33(4), 443-452.

Gelman A (2006). "Prior Distributions for Variance Parameters in Hierarchical Models."
Bayesian Analysis, 1(3), 515-533.

Gelman A, Carlin JB, Stern HS, Rubin DB (2004). Bayesian Data Analysis. 2nd edition.
Chapman & Hall.

Gelman A, van Dyk DA, Huang ZY, Boscardin WJ (2008). "Using Redundant Parameter-
izations to Fit Hierarchical Models." Journal of Computational and Graphical Statistics,
17(1), 95-122.

Gilmour AR, Gogel BJ, Cullis BR, Welham SJ, Thompson R (2002). ASReml User Guide
Release 1.0. VSN International Ltd, Hemel Hempstead, UK. URL http://www.VSN-Intl.com/.

Griffiths R, Double MC, Orr K, Dawson RJG (1998). "A DNA Test to Sex Most Birds."
Molecular Ecology, 7(8), 1071-1075.

Haario H, Saksman E, Tamminen J (2001). "An Adaptive Metropolis Algorithm." Bernoulli,
7(2), 223-242.

Hadfield JD, Nakagawa S (2010b). "General Quantitative Genetic Methods for Comparative
Biology: Phylogenies, Taxonomies, Meta-Analysis and Multi-Trait Models for Continuous
and Categorical Characters." Journal of Evolutionary Biology.

Hadfield JD, Nutall A, Osorio D, Owens IPF (2007). "Testing the Phenotypic Gambit:
Phenotypic, Genetic and Environmental Correlations of Colour." Journal of Evolutionary
Biology, 20(2), 549-557.

Henderson CR (1976). "Simple Method for Computing Inverse of a Numerator Relationship
Matrix Used in Prediction of Breeding Values." Biometrics, 32(1), 69-83.

Korsgaard IR, Andersen AH, Sorensen D (1999). "A Useful Reparameterisation to Obtain
Samples from Conditional Inverse Wishart Distributions." Genetics Selection Evolution,
31(2), 177-181.

Liu CH, Rubin DB, Wu YN (1998). "Parameter Expansion to Accelerate EM: The PX-EM
Algorithm." Biometrika, 85(4), 755-770.

Liu JS, Wu YN (1999). "Parameter Expansion for Data Augmentation." Journal of the
American Statistical Association, 94(448), 1264-1274.

McCulloch CE, Searle SR (2001). Generalized, Linear and Mixed Models. John Wiley & Sons,
New York.

Meuwissen THE, Luo Z (1992). "Computing Inbreeding Coefficients in Large Populations."
Genetics Selection Evolution, 24(4), 305-313.

Ovaskainen O, Rekola H, Meyke E, Arjas E (2008). "Bayesian Methods for Analyzing Move-
ments in Heterogeneous Landscapes from Mark-Recapture Data." Ecology, 89(2), 542-554.

Pinheiro JC, Bates DM (2000). Mixed-Effects Models in S and S-PLUS. Springer-Verlag, New
York.

Plummer M (2003). "JAGS: A Program for Analysis of Bayesian Graphical Models Using
Gibbs Sampling." In K Hornik, F Leisch, A Zeileis (eds.), Proceedings of the 3rd Interna-
tional Workshop on Distributed Statistical Computing, Vienna, Austria. ISSN 1609-395X,
URL http://www.ci.tuwien.ac.at/Conferences/DSC-2003/Proceedings/.

Plummer M, Best N, Cowles K, Vines K (2008). coda: Output Analysis and Diagnostics for
MCMC. R package version 0.13-3, URL http://CRAN.R-project.org/package=coda.

Rasbash J, Steele F, Browne W, Prosser B (2005). A User's Guide to MLwiN Version 2.0.
University of Bristol, Bristol. URL http://www.cmm.bris.ac.uk/MLwiN/download/manuals.shtml.

R Development Core Team (2009). R: A Language and Environment for Statistical Computing.
R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL
http://www.R-project.org/.

Sorensen D, Gianola D (2002). Likelihood, Bayesian and MCMC Methods in Quantitative
Genetics. Springer-Verlag, New York.

Spiegelhalter DJ, Best NG, Carlin BR, van der Linde A (2002). "Bayesian Measures of Model
Complexity and Fit." Journal of the Royal Statistical Society B, 64(4), 583-639.

Spiegelhalter DJ, Thomas A, Best NG, Lunn D (2003). WinBUGS Version 1.4 User Manual.
MRC Biostatistics Unit, Cambridge. URL http://www.mrc-bsu.cam.ac.uk/bugs/.

van Dyk DA, Meng XL (2001). "The Art of Data Augmentation." Journal of Computational
and Graphical Statistics, 10(1), 1-50.

Waldmann P (2009). "Easy and Flexible Bayesian Inference of Quantitative Genetic Param-
eters." Evolution, 63(6), 1640-1643.

Zeger SL, Karim MR (1991). "Generalized Linear Models with Random Effects - A Gibbs
Sampling Approach." Journal of the American Statistical Association, 86(413), 79-86.

Zhao Y, Staudenmayer J, Coull BA, Wand MP (2006). "General Design Bayesian Generalized
Linear Mixed Models." Statistical Science, 21(1), 35-51.

A. Updating the latent variables l


The conditional density of l is given by:

P(l_i | y, θ, R, G) ∝ f_i(y_i | l_i) f_N(e_i | r_i^⊤ R_{/i}^{-1} e_{/i}, r_{i,i} - r_i^⊤ R_{/i}^{-1} r_i)    (15)

where f_N indicates a multivariate normal density with specified mean vector and covariance
matrix. Equation 15 is the probability of the data point yi with linear predictor li on the
link scale for distribution fi , multiplied by the probability of the linear predictor residual.
The linear predictor residual follows a conditional normal distribution where the conditioning
is on the residuals associated with data points other than i. Vectors and matrices with the
row and/or column associated with i removed are denoted /i. In practice, this conditional
distribution only involves other residuals which are expected to show some form of residual
covariation, as defined by the R structure. Because of this we actually update latent variables
in blocks, where the block is defined as groups of residuals which are expected to be correlated:

P(l_j | y, θ, R, G) ∝ ∏_{i∈j} p_i(y_i | l_i) f_N(e_j | 0, R_j)    (16)

where j indexes blocks of latent variables that have non-zero residual covariances. A special
case arises for multi-parameter distributions in which each parameter is associated with a
linear predictor. For example, in the zero-inflated Poisson two linear predictors are used
to model the same data point, one to predict zero-inflation, and one to predict the Poisson
variable. In this case the two linear predictors are updated in a single block even when the
residual covariance between them is set to zero, because the first probability in Equation 16
cannot be factored:

P(l_j | y, θ, R, G) ∝ p_i(y_i | l_j) f_N(e_j | 0, R_j)    (17)

We use adaptive methods during the burn-in phase to determine an efficient multivariate
normal proposal distribution centered at the previous value of l_j with covariance matrix mM.
For computational efficiency we use the same M for each block j, where M is the average
posterior (co)variance of l_j within blocks and is updated each iteration of the burn-in period
(Haario et al. 2001). The scalar m is chosen using the method of Ovaskainen et al. (2008)
so that the proportion of successful jumps is optimal, with a rate of 0.44 when l_j is a scalar
declining to 0.23 when l_j is high dimensional (Gelman, Carlin, Stern, and Rubin 2004).
For the standard linear mixed model with a Gaussian response and identity link, P(l_i =
y_i | y, θ, R, G) is always unity and so the Metropolis-Hastings steps are always omitted. When
the latent variables within a block j are associated with missing data then their conditional
distribution is multivariate normal and can be Gibbs sampled directly:

l_j | y, θ, R, G ∼ N(X_j β + Z_j u, R_j)    (18)

where design matrices subscripted by j are the rows of the original design matrices associated
with the latent variables in block j.

B. Updating the location vector θ^⊤ = [β^⊤ u^⊤]


Garcia-Cortes and Sorensen (2001) provide a method for sampling θ as a complete block that
involves solving the sparse linear system:

θ̃ = C^{-1} W^⊤ R^{-1} (l - Wθ* - e*)    (19)

where C is the mixed model coefficient matrix:

C = W^⊤ R^{-1} W + \begin{bmatrix} B^{-1} & 0 \\ 0 & G^{-1} \end{bmatrix}    (20)

and W = [X Z], and B is the prior (co)variance matrix for the fixed effects.
θ* and e* are random draws from the multivariate normal distributions:

θ* ∼ N\left( \begin{bmatrix} \beta_0 \\ 0 \end{bmatrix}, \begin{bmatrix} B & 0 \\ 0 & G \end{bmatrix} \right)    (21)

and

e* ∼ N(0, R)    (22)

θ̃ + θ* gives a realization from the required probability distribution:

P(θ | l, W, R, G)    (23)

Equation 19 is solved using Cholesky factorization. Because C is sparse and the pattern of
non-zero elements fixed, an initial symbolic Cholesky factorization of PCP^⊤ is performed,
where P is a fill-reducing permutation matrix (Davis 2006). Numerical factorization must
be performed each iteration, but the fill-reducing permutation (found via a minimum degree
ordering of C + C^⊤) reduces the computational burden dramatically compared to a direct
factorization of C (Davis 2006).
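The same idea can be sketched in R with the Matrix package (illustrative only; MCMCglmm's sampler does this in C via CSparse): the symbolic, fill-reducing factorization is computed once and reused across repeated numeric solves.

R> library("Matrix")
R> M <- rsparsematrix(1000, 1000, density = 0.01)
R> C <- forceSymmetric(crossprod(M) + Diagonal(1000))  # sparse s.p.d. matrix
R> ch <- Cholesky(C, perm = TRUE)  # symbolic + numeric factorization of PCP'
R> b <- rnorm(1000)
R> x <- solve(ch, b)               # solve C x = b reusing the factorization
R> ch <- update(ch, C)             # numeric refactorization, symbolic part reused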
Forming the inverse of the variance structures is usually simpler because they can be expressed
as a series of direct sums and Kronecker products:

G = (V_1 ⊗ A_1) ⊕ (V_2 ⊗ A_2) ⊕ …    (24)

and the inverse of such a structure has the form

G^{-1} = (V_1^{-1} ⊗ A_1^{-1}) ⊕ (V_2^{-1} ⊗ A_2^{-1}) ⊕ …    (25)

which involves inverting the parameter (co)variance matrices (V), which are usually of low
dimension, and inverting A. For many problems A is actually an identity matrix and so
inversion is not required. When A is a relationship matrix associated with a pedigree,
Henderson (1976) and Meuwissen and Luo (1992) give efficient recursive algorithms for obtaining
the inverse, and Hadfield and Nakagawa (2010b) derive a similar procedure for phylogenies.

C. Updating the variance structures G and R


Components of the direct sum used to construct the desired variance structures are
conditionally independent. The sum of squares matrix associated with each component term has
the form:

S = U^⊤ A^{-1} U    (26)

where U is a matrix of random effects where each column is associated with the relevant
row/column of V and each row associated with the relevant row/column of A. The parameter
(co)variance matrix can then be sampled from the inverse Wishart distribution:

V ∼ IW((S_p + S)^{-1}, n_p + n_u)    (27)

where n_u is the number of rows in U, and S_p and n_p are the prior sum of squares and prior
degrees of freedom, respectively.
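A sketch of this conjugate draw in R (hypothetical objects U, A, Sp and np as defined above; stats::rWishart samples a Wishart, and inverting the draw gives the inverse-Wishart):

R> S <- crossprod(U, solve(A, U))  # S = U' A^{-1} U, Equation 26
R> df <- np + nrow(U)              # np + nu
R> V <- solve(rWishart(1, df, solve(Sp + S))[, , 1])  # Equation 27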
In some models, some elements of a parameter (co)variance matrix cannot be estimated from
the data and all the information comes from the prior. In these cases it can be advantageous
to fix these elements at some value and Korsgaard, Andersen, and Sorensen (1999) provide
a strategy for sampling from a conditional inverse-Wishart distribution which is appropriate
when the rows/columns of the parameter matrix can be permuted so that the conditioning
occurs on some diagonal sub-matrix. When this is not possible Metropolis-Hastings updates
can be made.

D. Ordinal models
For ordinal models it is necessary to update the cutpoints which define the bin boundaries
for latent variables associated with each category of the outcome. To achieve good mixing we
used the method developed by Cowles (1996), which allows the latent variables and cutpoints
to be updated simultaneously using a Hastings-with-Gibbs update.

E. Parameter expansion
As a covariance matrix approaches singularity the mixing of the chain becomes notoriously
slow. This problem is often encountered in single-response models when a variance component
is small and the chain becomes stuck at values close to zero. Similar problems occur for the
EM algorithm, and Liu, Rubin, and Wu (1998) introduced parameter expansion to speed up
the rate of convergence. The idea was quickly applied to Gibbs sampling problems (Liu and
Wu 1999) and has now been extensively used to develop more efficient mixed-model samplers
(e.g., van Dyk and Meng 2001; Gelman, van Dyk, Huang, and Boscardin 2008; Browne, Steele,
Golalizadeh, and Green 2009).
The columns of the design matrix (W) can be multiplied by the non-identified working
parameters α = [1, α_1, α_2, …, α_k]^⊤:

W_α = [X  Z_1α_1  Z_2α_2  …  Z_kα_k]    (28)

where the indices denote sub-matrices of Z which pertain to effects associated with the same
variance component. Replacing W with Wα we can sample the new location effects θα as
described above, and rescale them to obtain θ:

θ = (I_{n_β} ⊕_{i=1}^{k} I_{n_{u_i}} α_i) θ_α    (29)

where the identity matrices are equal in dimension to n_x, the number of elements in the
subscripted parameter vector x.
Likewise, the (co)variance matrices can be rescaled by the set of α’s associated with the
variances of a particular variance structure component (αV ):

V = Diag(α_V) V_α Diag(α_V)    (30)

The working parameters are not identifiable in the likelihood, but do have a proper conditional
distribution. Defining Xα as an n × (k + 1) design matrix, with each column equal to the
sub-matrices in Equation 28 post-multiplied by the relevant sub-vectors of θα , we can see that
α is a vector of regression coefficients:

l = X_α α + e    (31)

and so the methods described above can be used to update them.

F. Deviance and DIC


The deviance D is defined as:

D = -2 log(P(y | Ω))    (32)

where Ω is some parameter set of the model. The deviance can be calculated in different ways
depending on what is in ‘focus’, and MCMCglmm calculates this probability for the lowest level
of the hierarchy (Spiegelhalter et al. 2002). For Gaussian response variables the likelihood is
the density:

f_N(y | Xβ + Zu, R)    (33)

where Ω = {θ, R}, but for other response variables it is the product:

∏_i f_i(y_i | l_i)    (34)

with Ω = l.
For multivariate models with mixtures of Gaussian and non-Gaussian data (including missing
values) the likelihood of the Gaussian data is the density of yg in the conditional density:
 
f_N(y_g | X_gβ + Z_gu + R_{g,l} R_{l,l}^{-1}(l - X_lβ - Z_lu), R_{g,g} - R_{g,l} R_{l,l}^{-1} R_{l,g})    (35)

where the subscripts g and l denote rows of the data vector/design matrices that pertain
to Gaussian data, and non-Gaussian data respectively. Subscripts on the R-structure index
both rows and columns. The likelihood of the non-Gaussian data is identical to Equation 34,
giving the complete parameter set Ω = {θ_g, R, l}.
The deviance is calculated at each iteration if DIC = TRUE and stored at each thin-th iteration
after burn-in. The mean deviance (D̄) is calculated over all iterations, as is the mean of the
latent variables (l), the R-structure and the vector of predictors (Xβ + Zu). The deviance
is calculated at the mean estimate of the parameters (D(Ω̄)) and the deviance information
criterion is calculated as:

DIC = 2D̄ - D(Ω̄)    (36)

Affiliation:
Jarrod D. Hadfield
Institute of Evolutionary Biology
University of Edinburgh
Edinburgh, EH9 3JT, United Kingdom
E-mail: [email protected]
URL: http://wildevolution.biology.ed.ac.uk/jhadfield/

Journal of Statistical Software                        http://www.jstatsoft.org/
published by the American Statistical Association      http://www.amstat.org/
Volume 33, Issue 2                                     Submitted: 2009-02-18
January 2010                                           Accepted: 2009-12-21
