Bayesian Inference of Genetic Parameters Based On
114249
∗ Department of Forest Genetics and Plant Physiology, SE-90183 Swedish University of Agricultural Sciences, Umeå, Sweden
† Department of Mathematics and Statistics, Rolf Nevanlinna Institute, FIN-00014 University of Helsinki, Helsinki, Finland
‡ Department of Agricultural Sciences, FIN-00014 University of Helsinki, Helsinki, Finland
§ Thetastats, SE-22471 Uardavägen 91, Lund, Sweden
∗∗ Department of Animal Science, Iowa State University, Ames, Iowa, 50011-3150
Running head: Conditional decompositions
Keywords: WinBUGS, Mixed linear model, Genetic evaluation,
Complex pedigree, Markov chain Monte Carlo
ABSTRACT
It is widely recognized that the mixed linear model is an important tool for parameter es-
timation in the analysis of complex pedigrees, which include both pedigree and genomic
information, and where mutually dependent genetic factors are often assumed to follow
tistical method based on the decomposition of the multivariate normal prior distribution
ally demanding genetic evaluations of complex pedigrees, within the user-friendly computer
package WinBUGS. In order to demonstrate and evaluate the flexibility of the method, we
analyzed two example pedigrees: a large non-inbred pedigree of Scots pine (Pinus sylvestris
L.) that includes additive and dominance polygenic relationships; and a simulated pedigree
where genomic relationships have been calculated based on a dense marker map. The anal-
ysis showed that our method was fast and provided accurate estimates, and that it should
therefore be a helpful tool for estimating genetic parameters of complex pedigrees quickly
and reliably.
Much effort in genetics has been devoted to revealing the underlying genetic architecture of
quantitative or complex traits. Traditionally, the polygenic model has been used extensively
to estimate genetic variances and breeding values of natural and breeding populations, where
an infinite number of genes is assumed to code for the trait of interest (Bulmer 1971; Falconer
and Mackay 1996). The genetic variance of a quantitative trait can be decomposed into
an additive part that corresponds to the effects of individual alleles, and a part that is
non-additive because of interactions between alleles. Attention has generally been focused
on the estimation of additive genetic variance (and heritability), since additive variation
is directly proportional to the response of selection via the breeder's equation (Falconer
and Mackay 1996, chap 11). However, in order to estimate additive genetic variation and
genetic evaluations (Misztal 1997; Ovaskainen et al. 2008; Waldmann et al. 2008), especially
if the pedigree being analyzed contains a large proportion of full-sibs and clones, as these
in particular give rise to non-additive genetic relationships (Lynch and Walsh 1998, pp
145). The polygenic model using pedigree and phenotypic information, i.e. the animal model
(Henderson 1984), has been the model of choice for estimating genetic parameters in breeding
and natural populations (Abney et al. 2000; Sorensen and Gianola 2002; O'Hara et al. 2008).
wide, single nucleotide polymorphism (SNP) maps. These maps have helped to uncover a
vast number of new loci responsible for trait expression and have provided general insights
into the genetic architecture of quantitative traits (e.g. Valdar et al. 2006; Visscher 2008;
Flint and Mackay 2009). These insights can help when calculating disease risks in humans,
when attempting to increase the yield from breeding programs, and when estimating re-
science and agriculture can now be scored quickly and relatively cheaply, for example in mice
(Valdar et al. 2006), chickens (Muir et al. 2008), and dairy cattle (VanRaden et al. 2009).
In the analysis of populations of breeding stock, the inclusion of dense marker data has
improved the predictive ability (i.e. reliability) of genetic evaluations compared to the tra-
ditional phenotype model, both in simulations (Meuwissen et al. 2001; Calus et al. 2008;
Hayes et al. 2009) and when using real data (Legarra et al. 2008; VanRaden et al. 2009;
González-Recio et al. 2009). Meuwissen et al. (2001) suggested that the effect of all markers
should first be estimated, and then summed, in order to obtain genomic estimated breeding
values (GEBVs). An alternative procedure, where all markers are used to compute the ge-
nomic relationship matrix (in place of the additive polygenic relationship matrix) has also
been suggested (e.g. Villanueva et al. 2005; VanRaden 2008; Hayes et al. 2009); this matrix
is then incorporated into the statistical analysis to estimate GEBVs. A comparison of both
procedures (VanRaden 2008) yielded similar estimates of GEBVs in cases where the effect
of an individual allele was small. In addition, if not all pedigree members have marker in-
formation, a combined relationship matrix derived from both genotyped and ungenotyped
individuals could be computed; this has been shown to increase the accuracy of GEBVs
(Legarra et al. 2009; Misztal et al. 2009). Another plausible option to incorporate marker
information is to use low-density SNP panels within families, and to trace the effect of SNPs
from high-density genotyped ancestors, as suggested by Habier et al. (2009) and Weigel
et al. (2009). However, fast and powerful computer algorithms, which can use the marker
information as efficiently as possible in the analysis of quantitative traits, are needed in order
The present study describes the development of an efficient Bayesian method for incorporating
general relationships into the genetic evaluation procedure. The method is based on
distributions, each conditioned on the descending variables. When evaluating the genetic pa-
rameters of natural and breeding populations, high dimensional distributions are often used
as prior distributions of various genetic effects, such as the additive polygenic effect (Wang
et al. 1993), multivariate additive polygenic effects (Van Tassell and Van Vleck 1996), and
quantitative trait loci (QTL) effects via the identical-by-descent matrix (Yi and Xu 2000).
parameters, estimated by using Markov chain Monte Carlo (MCMC) sampling algorithms
in the software package WinBUGS (Lunn et al. 2000; 2009). By performing prior calculations
in the form of the factorized product of simple univariate conditional distributions,
the computational time of the MCMC estimation procedure is reduced considerably. This
feature permits rapid inference for both the polygenic model and the genomic relationship
model. Moreover, the decomposition allows for inbreeding of varying degree, since the correct
genetic covariance structure can be incorporated into the analysis. In the present paper,
we test the method on two previously published pedigree datasets: phenotype data from
a large pedigree of Scots pine, incorporating information on both additive and dominance
genetic relationships (Waldmann et al. 2008); and genomic information obtained from a
METHODS
Statistical model: Following Henderson (1984), we made use of the following linear
mixed effect model under Gaussian assumptions:
y = Xb + Zu + e, (1)
for all members in the population; b is a vector of size p × 1 containing systematic en-
multivariate normal distribution with zero mean vector, and covariance structure $G\sigma_u^2$;
X and Z are known incidence matrices relating phenotypic records to respective location
parameters included in (1); and e is a vector containing independent residual errors
that follow a multivariate normal distribution with zero mean vector, and covariance
structure $I\sigma_e^2$, where I is the identity matrix of order n. Note that records can be
missing for some pedigree members (here, $y_i$ = 'NA' if individual i has a missing record).
Typically, u contains the additive polygenic effect, although non-additive genetic effects
such as dominance, or QTL effects estimated from marker data, could be included in
the model. To make inferences in model (1), the mixed model equations (MMEs) can
for the estimation of genetic parameters in the linear mixed eect model is Markov chain
Monte Carlo (MCMC) methods (e.g. Wang et al. 1993; Sorensen and Gianola 2002; Bauer
et al. 2009).
The most common distribution used for u in model (1) is the multivariate normal distribution
(Lynch and Walsh 1998, pp 194), since u contains n variables ($u_1, u_2, \ldots, u_n$), that
on their own are assumed to be normally distributed. The multivariate normal is, therefore,
a natural choice of distribution for u. For example, the traditional polygenic model relies
on normal distribution assumptions for the Mendelian inheritance of genes from parents to
offspring (Bulmer 1971). In addition, when using Bayesian inference to estimate parameters
in model (1), the property of the multivariate normal distribution helps to form conditional
distributions, which are of key importance in MCMC sampling (Sorensen and Gianola 2002;
Rue and Held 2005). The hierarchical structure of model (1) can be usefully interpreted
as a graphical model, which facilitates computations because this representation allows the
joint distribution of genetic effects and other parameters to be broken down into products
of local components (Lauritzen et al. 1990). In the present paper, genetic effects (u) are
either the pedigree, estimated relatedness from markers; or both. The factorization of the
dependency structure in the graph gives (1) a Markov property (Lauritzen et al. 1990),
which can be successfully utilized in Bayesian MCMC methods. See Rue and Held (2005)
for a comprehensive survey of this topic; and see Steinsland and Jensen (2010) for how to
use the Markov property for making inference in the classical animal model. In addition, the
standard decomposition procedure of the additive polygenic effect (Thomas 1992; Lin 1999;
Waldmann 2009), utilizes the Markov property of the animal model, where an offspring is
conditioned on its parents in the analyzed pedigree.
number of parameters becomes large, e.g. when performing genetic evaluations of complex
distribution into conditionally dependent parts. If we replace the multivariate normal prior,
with the product of these (lower dimensional) distributions, both the mean and the variance
are shifted, for each conditional distribution. Let us assume that we wish to decompose u
into two subsets of column vectors, $u^T = [u_1^T \; u_2^T]$, where $u_1$ and $u_2$ are of length l and
n − l, respectively. The mean and variance can be expressed as:
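\[
\mathrm{E}(u) = \begin{bmatrix} \mathrm{E}(u_1) \\ \mathrm{E}(u_2) \end{bmatrix}
\quad \text{and} \quad
\mathrm{Var}(u) = \begin{bmatrix} \mathrm{Var}(u_1) & \mathrm{Cov}(u_1, u_2) \\ \mathrm{Cov}(u_1, u_2)^T & \mathrm{Var}(u_2) \end{bmatrix}, \tag{2}
\]
\[
\mathrm{E}(u_2 | u_1) = \mathrm{E}(u_2) + (u_1 - \mathrm{E}(u_1))^T \, \mathrm{Var}(u_1)^{-1} \, \mathrm{Cov}(u_2, u_1), \tag{3}
\]
\[
\mathrm{Var}(u_2 | u_1) = \mathrm{Var}(u_2) - \mathrm{Cov}(u_2, u_1)^T \, \mathrm{Var}(u_1)^{-1} \, \mathrm{Cov}(u_2, u_1). \tag{4}
\]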
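As a small worked illustration of (3) and (4), consider the simplest case l = 1 and n = 2. The numbers are assumptions chosen for the example, not values from the data: zero means, $\mathrm{Var}(u_1) = \mathrm{Var}(u_2) = \sigma_u^2$ and $\mathrm{Cov}(u_2, u_1) = \tfrac{1}{2}\sigma_u^2$, which is the additive covariance between a non-inbred parent and its offspring.

\[
\mathrm{E}(u_2 | u_1) = 0 + (u_1 - 0)\,(\sigma_u^2)^{-1}\,\tfrac{1}{2}\sigma_u^2 = \tfrac{1}{2}u_1,
\qquad
\mathrm{Var}(u_2 | u_1) = \sigma_u^2 - \tfrac{1}{2}\sigma_u^2\,(\sigma_u^2)^{-1}\,\tfrac{1}{2}\sigma_u^2 = \tfrac{3}{4}\sigma_u^2.
\]

Conditioning on the parent shifts the mean to half the parental value and reduces the variance by one quarter.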
(typically the number of members in the pedigree, i.e. N = n), we have
\[
p(u_1, u_2, u_3, \ldots, u_N | \sigma_u^2) = \prod_{i=1}^{N} p(u_i | u_{i-1}, \ldots, u_1, \sigma_u^2) = p(u_1 | \sigma_u^2) \, p(u_2 | u_1, \sigma_u^2) \, p(u_3 | u_2, u_1, \sigma_u^2) \cdots \tag{6}
\]
where $\sigma_u^2$ is the variance component of u. Our target is to generate u, which is a realization
from a multivariate normal distribution with given mean (vector of zeros) and covariance
that have been drawn so far, i.e., from $p(u_i | u_1, u_2, \ldots, u_{i-1}, \sigma_u^2)$. It should be emphasized
that this sequential strategy is exact and will lead to the correct vector u, drawn from the
full multivariate normal distribution $MVN(0, G\sigma_u^2)$. First, let us assume individual-wise
partitions for u. Conditional expectation and conditional variance of $p(u_i | u_1, \ldots, u_{i-1}, \sigma_u^2)$
are thought of here as the weighted mean and the weighted variance for pedigree member i.
To compute weights for the mean (for individual i), the following general expression can be
used:
\[
W(i) = \sum_{j=1}^{i-1} w(i, j) \, u_j. \tag{7}
\]
The precalculated weights are then read into WinBUGS together with the data. The code
for the weights in the model only includes one indexed univariate normal distribution with
conditional mean (eqn. 7) and variance (eqn. 4) as a prior for $u_i$. Hence, for every pedigree
member i, we have one vector $w(i, j) : j = 1, 2, \ldots, (i - 1)$ calculated for the mean
where most of the terms are zero; this feature yields a sparse format which is suitable for
storing the weights. The weights for the mean ($w(i, j)$) and variance, which specify the conditional
prior distribution of each individual, need to be calculated only once and are thus
computed outside the MCMC estimation (i.e. before compilation of the code). The order of
the weights is important since the drawing of samples needs to follow the same unique order
throughout the simulation process. Furthermore, it is also possible to use the same principle
to update the parameters in blocks, sampling from multiple multivariate normal distributions,
each of small dimension. When estimating the additive polygenic effect, the approach
proposed here gives identical results to the standard decomposition suggested earlier (e.g.
Thomas 1992; Lin 1999; Waldmann 2009) for non-inbred pedigrees (non-related parents).
Our proposed method could, however, incorporate non-zero covariances between parents,
and inbreeding coefficients greater than zero, if complex relationships between relatives arising
from dominance are neglected (e.g. Abney et al. 2000). Two small numerical examples
for computing a realization of additive and dominance polygenic effects, and illustrating the
effect of inbreeding on the additive polygenic covariance structure are given in the Appendix.
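The sequential scheme described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' ANSI C program or their WinBUGS code: the function names (`solve`, `conditional_weights`, `draw_u`) are hypothetical, the covariance matrix G is assumed to be positive definite, and a dense Gaussian elimination is used for clarity rather than speed. For each member i the mean weights w(i, j) come from solving Var(u_1..i-1) w = Cov(u_1..i-1, u_i), and the variance weight follows equation (4).

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[k]] for k, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def conditional_weights(G, tol=1e-12):
    """Return, per member i, (sparse mean weights {j: w(i, j)}, variance weight)."""
    out = []
    for i in range(len(G)):
        if i == 0:
            out.append(({}, G[0][0]))  # first member: unconditional prior
            continue
        cov = [G[i][j] for j in range(i)]
        w = solve([row[:i] for row in G[:i]], cov)
        var = G[i][i] - sum(wj * cj for wj, cj in zip(w, cov))
        # keep only non-zero weights (sparse storage)
        out.append(({j: wj for j, wj in enumerate(w) if abs(wj) > tol}, var))
    return out

def draw_u(weights, sigma2_u, rng):
    """Draw u sequentially, each u_i from its univariate conditional normal."""
    u = []
    for sparse, var in weights:
        mean = sum(wj * u[j] for j, wj in sparse.items())  # equation (7)
        u.append(rng.gauss(mean, (var * sigma2_u) ** 0.5))
    return u

# Two unrelated, non-inbred parents and one offspring (additive relationships).
A = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.5],
     [0.5, 0.5, 1.0]]
weights = conditional_weights(A)
u = draw_u(weights, sigma2_u=2.0, rng=random.Random(1))
```

For this small pedigree the offspring receives a weight of 0.5 on each parent and a variance weight of 0.5, which is the standard parent-offspring decomposition of the animal model for non-inbred pedigrees.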
was shown to be a good and robust non-informative choice of prior for variance components
(Gelman 2006). This prior corresponds to an inverse-$\chi^2$ distribution with −1 degrees
of freedom, that is $p(\sigma_i^2) \propto \sigma_i^{-1}$, i = u, e. To obtain an upper boundary for the uniform
distributions were assigned as priors for the variance components, thereby, obtaining esti-
mates of standard deviations; these were then multiplied by 5 to obtain the upper bounds.
The chosen upper bounds were considerably larger than the upper bounds of the 95% highest
probability density (HPD; Box and Tiao 1973) regions obtained in the preliminary analysis.
Note that this procedure is a pragmatic solution and should not be viewed as a strictly
Bayesian solution. A flat, noninformative prior was assigned to the systematic environmental
fixed effect in both examples, as $b_j \sim N(0, 10^6)$ for systematic effect level j. For u, we used
the common multivariate normal distribution as prior: $u \sim MVN(0, G\sigma_u^2)$ (Sorensen and
Gianola 2002), and then used the decomposition of the multivariate normal distribution into
univariate normal distributions, proposed here as $p(u | \sigma_u^2) = \prod_{i=1}^{q} p(u_i | u_1, u_2, \ldots, u_{i-1}, \sigma_u^2)$.
The vector y is assumed to follow a Gaussian distribution; thus the likelihood function is
given as:
\[
p(y | b, u, \sigma_e^2) = \prod_{i=1}^{n} p(y_i | b_j, u_i, \sigma_e^2) = \prod_{i \in O} \frac{1}{\sqrt{2\pi}\,\sigma_e} \exp\left\{ -\frac{(y_i - b_j - u_i)^2}{2\sigma_e^2} \right\}, \tag{8}
\]
where $b_j$ is the corresponding systematic effect level (covariate) of pedigree member i, connected
through X; and O is the set of members in the pedigree for which phenotypic
records are available. If individual i has a missing record, note that $p(y_i | b_j, u_i, \sigma_e^2) = 1$. If
be independent a priori, the joint distribution of all parameters conditional on the data is
proportional to:
\[
p(b, u, \sigma_u^2, \sigma_e^2 | y) \propto p(b) \, p(u | \sigma_u^2) \, p(\sigma_u^2) \, p(\sigma_e^2) \, p(y | b, u, \sigma_e^2). \tag{9}
\]
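The likelihood term (8) that enters (9) can be evaluated on the log scale as in the sketch below. The function name `log_likelihood`, the use of `None` to mark a missing record, and the `block` index list mapping each member to its systematic effect level are all assumptions made for this illustration; they are not taken from the supplementary code.

```python
import math

def log_likelihood(y, block, b, u, sigma2_e):
    """Log of equation (8); a missing record (None) contributes log(1) = 0."""
    total = 0.0
    for i, yi in enumerate(y):
        if yi is None:
            continue  # member i is not in the set O of phenotyped members
        resid = yi - b[block[i]] - u[i]
        total += -0.5 * math.log(2.0 * math.pi * sigma2_e)
        total -= resid ** 2 / (2.0 * sigma2_e)
    return total

# Two members in the same block; the second has no phenotypic record.
value = log_likelihood(y=[1.0, None], block=[0, 0], b=[1.0],
                       u=[0.0, 5.0], sigma2_e=1.0)
```

Because the second member has no record, its genetic effect is informed only through the prior, and changing `u[1]` leaves `value` unchanged.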
The phenotype model (1) was run in the Bayesian software package WinBUGS (Lunn
WinBUGS exploits a graphical modelling technique to translate the supplied prior distribu-
tions of the parameters into corresponding full conditional distributions. The computation
of the weights (7) using (3) and (4) was executed in ANSI C. The WinBUGS code is available
in electronic supplement S1, while the computer code used for calculating weights is available
and compare the mixing to alternative implementations, we calculated the effective sample
size (ESS) of the obtained MCMC chains (Kass et al. 1998; Waagepetersen et al. 2008).
ESS can be seen as the number of independent samples from the estimated posterior that
contain the same amount of information (i.e. give the same estimation accuracy) as our
dependent MCMC samples. Low ESS values indicate poor mixing (i.e. high auto-correlation
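One common way to estimate ESS from a single chain is sketched below. The exact estimator used in the analysis is not stated here, so this is an assumed variant: ESS = N / (1 + 2 Σ ρ_k), with the sum of lag-k autocorrelations truncated at the first non-positive value. The function name is hypothetical.

```python
def effective_sample_size(chain):
    """ESS = N / (1 + 2 * sum of positive-lag autocorrelations).

    The sum is truncated at the first non-positive autocorrelation,
    which is one simple, commonly used stopping rule.
    """
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain) / n
    if var == 0.0:
        return float(n)  # degenerate constant chain: no autocorrelation defined
    rho_sum = 0.0
    for lag in range(1, n):
        acov = sum((chain[t] - mean) * (chain[t + lag] - mean)
                   for t in range(n - lag)) / n
        rho = acov / var
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)

# A slowly drifting chain carries far less information than its length suggests.
slow = [float(t) for t in range(100)]
ess_slow = effective_sample_size(slow)
```

A chain whose successive values are strongly positively correlated, such as `slow` above, yields an ESS much smaller than its nominal length.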
Polygenic example: To verify our proposed decomposition method, data acquired from
a 26-year-old field trial of Scots pine (Pinus sylvestris L.), previously published by Waldmann
et al. (2008), Finley et al. (2009), Hallander and Waldmann (2009), and Waldmann
(2009), were analyzed to obtain posterior distributions of additive and dominance polygenic
effects for all trees in the pedigree. The pedigree consists of 52 parents crossed according
to a partial diallel design resulting in mixed half-sib and full-sib families totaling 4970 surviving
offspring. The parents were assumed to be non-related and non-inbred. In total, 202
families were distributed over approximately 4 ha of forest. The field trial was subdivided
into 70 square (or nearly square) blocks, which were used in the subsequent evaluations as a
systematic environmental effect. Several traits of interest for breeding purposes were measured
in 1997, although for the present study we chose to analyze only trunk diameter at
breast height (DBH). The mean value of DBH was 114 mm. We made use of the following
covariance structure in the mixed linear model: $G_1\sigma_{u_1}^2 = A\sigma_a^2$ and $G_2\sigma_{u_2}^2 = D\sigma_d^2$, where A
is the additive relationship matrix; $\sigma_a^2$ is the variance component of additive genetic effects;
D is the dominance relationship matrix; and $\sigma_d^2$ is the variance component of dominance
genetic effects. A and D were calculated using standard equations (Lynch and Walsh 1998,
pp 763 and 768). Uniform densities were assigned as prior distributions for $\sigma_a$ and $\sigma_d$ as
described in the previous section. See Waldmann et al. (2008), Finley et al. (2009) and
Hallander and Waldmann (2009), for a more detailed analysis of the Scots pine pedigree.
Genomic example: This was a simulated dataset, typical of the data acquired from an
animal breeding protocol, consisting of 5865 pedigree members from seven generations. The
There are 6000 loci evenly distributed over six chromosomes (1000 markers per chromosome),
with 0.1 cM between markers. Forty-eight QTLs, each of small eect, were assumed to code
for the trait of interest. Pedigree and phenotype information were available from the first four
generations of animals. The animals from generations five to seven had no given phenotype,
but did have complete marker information. From each generation, 15 males and 150 females
were randomly selected and mated according to a hierarchical mating design, resulting in a
total of 1500 animals being born per generation. Interested readers are invited to consult
Lund et al. (2009) for further details of the dataset. The genomic relationship matrix (or
realized relationship matrix) was computed using the second method proposed by VanRaden
of size n × m containing genotypes at all m loci for all n members of the pedigree; P is
a matrix of the same size containing allele frequencies that differ from 0.5 at all loci; and
finally, D is a diagonal matrix of size m × m with diagonal elements $\frac{1}{m[2p_i(1-p_i)]}$, where $p_i$ is
the allele frequency at locus i. We analyzed a subset of the complete pedigree consisting of
the first four generations (1014 animals in total) in order to reduce computational time. See
electronic Table S1 for original identification numbers of pedigree members in the analyzed
sub-population.
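A minimal sketch of a genomic relationship matrix built from the quantities M, P and D described above is given below. Several details are assumptions made for the illustration, following a common reading of VanRaden (2008): genotypes in M are coded −1, 0 and 1; column i of P holds 2(p_i − 0.5); and the matrix is formed as (M − P) D (M − P)^T. The function name and the toy data are hypothetical.

```python
def genomic_relationship(M, p):
    """Weighted genomic relationship matrix (M - P) D (M - P)^T.

    M: n x m genotype matrix, assumed coded -1, 0, 1.
    p: allele frequency at each of the m loci (0 < p_i < 1).
    """
    n, m = len(M), len(p)
    # Centre genotypes: column i of P is assumed to hold 2 * (p_i - 0.5).
    Z = [[M[a][k] - 2.0 * (p[k] - 0.5) for k in range(m)] for a in range(n)]
    # Diagonal of D: 1 / (m * [2 p_i (1 - p_i)]).
    d = [1.0 / (m * 2.0 * p[k] * (1.0 - p[k])) for k in range(m)]
    return [[sum(Z[a][k] * d[k] * Z[c][k] for k in range(m))
             for c in range(n)] for a in range(n)]

# Two individuals, two loci, both with allele frequency 0.5.
G = genomic_relationship([[1, -1], [-1, 1]], [0.5, 0.5])
```

In this toy case the two individuals carry opposite homozygous genotypes at both loci, so the off-diagonal element of `G` is negative.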
RESULTS
Polygenic example: To facilitate a comparison between the results obtained from our
proposed method and the results reported by Waldmann et al. (2008), we performed the same
procedure as Waldmann et al. (2008) when analyzing the Scots pine pedigree. One MCMC chain
was run for a total of 225,000 iterations, from which the first 25,000 iterations were omitted
(burn-in) from the analysis, and every 10th iteration was saved (thinned), resulting in an ef-
fective sample of 20,000 iterations. Table 1 shows the results of the analysis using our method,
together with the results from Waldmann et al. (2008) for the trait DBH. Our posterior
point-estimates and their 95% HPD regions closely agree with those obtained by Waldmann
et al. (2008), although slightly different degrees of freedom for the inverse-$\chi^2$ distributions
used as priors for the variance components (−1 in our method while Waldmann et al. (2008)
used −2 in theirs) could cause some differences in the respective posterior distributions. We
believe, however, that the priors have little influence on the parameter estimates obtained
from the analyzed data, mainly due to both the large size of the pedigree and the similar
parameter estimates obtained by Waldmann et al. (2008) and Finley et al. (2009) using
different priors.
The additive and dominance covariance structures resulted in 9,940 and 61,247 non-
zero weights, respectively. For variance components and heritability, 95% HPD intervals
were estimated using the R library, 'boa' (Smith 2007). In WinBUGS, each scan of the
MCMC took 0.4241 seconds on an AMD Opteron(tm) Dual Core Processor (2.39 GHz) with
1 GB of RAM. The corresponding average time, for each MCMC scan, using the method in
Waldmann et al. (2008) was 1.840 seconds. In each iteration in the MCMC procedure, all
non-zero weights for means need to be multiplied by the corresponding genetic parameter
of the preceding pedigree members (a matrix-vector multiplication: see equation 7). The
actual number of non-zero weights depends on the covariance structure, as more non-zero
relationships will result in a higher number of non-zero weights. For the Scots pine data,
little computational time was needed to obtain reliable posterior estimates due to the leve
the total computational time is greatly reduced in the current example because of the few
non-zero weights. Hence, our proposed method seems to be very beneficial for analyzing
polygenic data.
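The link between the number of non-zero weights and the cost of one scan can be illustrated with a small sketch. It assumes the simplest additive case, with unrelated, non-inbred parents, both known for every offspring, so that each offspring carries a weight of 0.5 on each parent and founders carry none. The function names and the toy pedigree layout are hypothetical.

```python
def additive_weights(pedigree):
    """Sparse mean weights for an additive, non-inbred pedigree.

    pedigree: list of (parent1, parent2) index pairs, with (None, None)
    for founders; parents must be listed before their offspring.
    """
    return [{} if p1 is None else {p1: 0.5, p2: 0.5} for p1, p2 in pedigree]

def conditional_means(rows, u):
    """Sparse evaluation of equation (7): cost is one multiply per non-zero weight."""
    return [sum(w * u[j] for j, w in row.items()) for row in rows]

def count_nonzero(rows):
    return sum(len(row) for row in rows)

# 52 founders and 4970 offspring, as in the Scots pine pedigree; the actual
# mating pairs are not reproduced here, so every offspring gets parents 0 and 1.
rows = additive_weights([(None, None)] * 52 + [(0, 1)] * 4970)
n_weights = count_nonzero(rows)
```

Under these assumptions the count is 2 × 4970 = 9,940, which is consistent with the number of non-zero additive weights reported above.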
The slightly lower effective sample size (ESS; sample size adjusted for auto-correlation)
obtained by our method reflects the fact that single-site Gibbs sampling is performed in
WinBUGS, while the hybrid Gibbs sampler, proposed by Waldmann et al. (2008) also
applies block updating of parameters, which is known to improve the mixing of the MCMC
et al. (2008) improves the ESS further. However, in this case, the marginally lower ESS of
the current method is well compensated for by the improved speed. Furthermore, we tested
two additional updating options (block hybrid and conjugated multivariate) in OpenBUGS
version 3.0.3 (Thomas et al. 2006), to see whether the ESS was improved. Only the ESS of
dominance genetic effects was improved while the ESS of the other parameters was unaffected
or even decreased (results not shown). On the other hand, the computing time increased
markedly: we therefore believe that the standard updating (multivariate forward) option in
whether changing the sequential order of conditioning would generate a different number of
weights and, thereby, a difference in computational time. However, the number of weights
remained the same, regardless of the sequence, and we therefore can conclude that the order
Genomic example: First, a purely additive polygenic model was used to obtain initial
values of the variance components for all 225,000 iterations. A preliminary analysis was then
conducted in order to estimate the standard deviations of the variance components, which
were used to set the upper limit of the uniform prior distributions. The MCMC procedure was
run for 23,000 scans in total, from which the rst 3,000 scans were discarded to give 20,000
saved iterations. The heritability estimates obtained are shown in Table 2. The posterior
point-estimates from our analyzed subset agree closely with the true corresponding parameter
values given in Lund et al. (2009), both for posterior mean and mode. However, the additive
polygenic model resulted in a heritability point-estimate that was too large, although with both
models the true value is included within the estimated 95% HPD regions. In addition, in Table 2,
it is important to note that the 95% HPD region obtained with the genomic model is clearly
narrower than the one given by the polygenic model, suggesting that the inclusion of the genomic
relationship matrix improves the estimation accuracy of heritability. Similar conclusions have
been drawn by, for example, Meuwissen et al. (2001), Villanueva et al. (2005) and Hayes
The computational effort in the genomic example was unfortunately massive, due to the
large number of non-zero weights (in total: 513,591). On the same computer as that used for
the polygenic example, each scan took 39.40 seconds. Initially, we truncated small elements in
the genomic relationship matrix in order to reduce the number of non-zero elements, but this
modification effectively prevented convergence of the MCMC chain. Truncating small values
in the resulting weight matrix, instead of the genomic matrix itself, might be a more suitable
procedure because the individual weights for computing the variance of each genetic parameter
(i.e. each normal distribution) are correctly calculated given the genomic covariance matrix,
and only weights used for computing the mean of each normal distribution are affected.
Conversely, if elements in the genomic covariance structure are truncated before the
decomposition procedure, then weights for computing the means and the variances of the
normal distributions of all parameters are affected. By truncating small elements in the weight
matrix, the computational time could potentially be greatly reduced, although the accuracy
of estimated posterior and 95% HPD regions would be negatively affected. However, some
preliminary experiments on truncating weights have given promising results (i.e. the obtained
posterior estimates were only slightly affected), although we chose to include the full weight
matrix when analyzing the simulated data (other results not shown).
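The truncation of the weight matrix discussed above could look like the following sketch. It assumes the weights are stored per individual as a pair (sparse mean weights, variance weight); the function name and the threshold value are hypothetical and are not taken from the preliminary experiments mentioned in the text.

```python
def truncate_weights(rows, threshold):
    """Drop mean weights smaller in magnitude than threshold.

    rows: list of (sparse mean weights {j: w(i, j)}, variance weight).
    The variance weights are left untouched, so only the conditional
    means are approximated.
    """
    return [({j: w for j, w in sparse.items() if abs(w) >= threshold}, var)
            for sparse, var in rows]

rows = [({}, 1.0), ({0: 0.5, 1: 1e-4}, 0.7)]
approx = truncate_weights(rows, threshold=1e-3)
```

Here the small weight on member 1 is removed while the variance weight 0.7 is kept exactly as computed from the full covariance matrix.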
DISCUSSION
Bayesian method that allows us to make inferences in linear mixed effect models with a large
number of genetic parameters. For example, the proposed method can be used for the following:
variance component based linkage and association mapping methods for the estimation of QTL
effects; estimating non-additive genetic effects, such as dominance and epistasis; estimation
which are important in breeding evaluations. The approach was implemented in the user
friendly computer software WinBUGS, and was shown to be fast and accurate on both real
and simulated data. By using this approach, researchers will be able to perform advanced
genetic evaluations of complex traits and pedigrees without possessing advanced knowledge
Recently, several studies have shown that the accuracy of genetic evaluations can be in-
creased by incorporating the genomic relationship matrix (Villanueva et al. 2005; Misztal et
al. 2009; Hayes et al. 2009). For large, complex pedigrees with a high number of polymor-
phic markers, the resulting genomic relationship matrix will probably be dense, i.e. most
pedigree members will have non-zero, pair-wise estimated relatedness. A general problem
with this approach is that when making inferences in animal models, most currently available
methods rely on either sparse solvers (e.g. Schaeffer and Kennedy 1986; Johnson and
Thompson 1995; Waldmann et al. 2008) or on efficient graph model techniques (Wilkinson
and Yeung 2004; Rue and Held 2005; Steinsland and Jensen 2010). Compared to standard,
non-sparse methods, these methods will not result in the same reduction in computational
time when incorporating sparse covariance structures (i.e. when using pedigree information
only). Unfortunately, even though our proposed method can handle dense genetic covari-
ance structures, as demonstrated in the genomic example, the computational time required
is massive. One way to overcome this hurdle would be to truncate small elements in the
weight matrix obtained by our approach, thereby obtaining a good approximation of the
the covariance matrix to facilitate the inference of parameter estimation as, for example,
suggested by Mrode and Thompson (1989), Wilkinson and Yeung (2004) and Waldmann et al.
(2008). As a result, estimating parameters using the linear mixed model does not depend
A good mixing property of the MCMC method is very important in Bayesian analysis in
order to obtain reliable posterior estimates, especially if parameters are highly correlated in
the model. In Gibbs sampling, the updater samples from the fully conditional posterior dis-
tribution, which is proportional to the likelihood function and the prior distribution through
Bayes theorem. Our proposed method samples from the multivariate normal distribution for
the prior, but samples from the likelihood function are taken for one parameter at a time,
which introduces dependencies to the posterior (i.e. introduces higher correlation between
parameters in the posterior as these are drawn element-wise). Thus, the mixing property
of our algorithm does not match the mixing properties achieved with block updating of parameters,
where sampling from both prior distribution and the likelihood are performed in
a block (García-Cortés and Sorensen 1996; Roberts and Sahu 1997). On the other hand, for
and prior are made for each parameter, which introduces heavy dependencies to the posterior
distribution, resulting in poor mixing properties of the MCMC chain (Sorensen and Gianola
2002). Hence, our method should result in better mixing than single site updating but result
in less effective mixing than block updating of parameters. This insight is confirmed, to some
extent, empirically when the mixing property in our approach was compared to the mixing
property of the standard single-site sampler (Sorensen and Gianola 2002) implemented in C
(results not shown). However, this comparison should be interpreted bearing in mind that
WinBUGS uses an expert system that attempts to utilize the most appropriate sampling
bine our suggested decomposition approach with block updating of parameters into a hy-
brid sampler. A similar approach (i.e. combining single-site and block sampling) was im-
plemented by Waldmann et al. (2008), which resulted in better mixing of the MCMC
chain than was obtained in pure single-site updating. However, it would not be possible
to implement the combined sampling approach in WinBUGS, as the block updating re-
quires a large equation system to be solved during each iteration in the MCMC procedure.
can be randomly updated in each MCMC step and not in the same sequential order, as
to apply transformation of the location parameters in the model (Vine et al. 1996;
The lack of freely available computer packages designed for the genetic evaluation of com-
plex pedigrees using a Bayesian framework has, unfortunately, prevented more regular use
of these models. The graphical model representation within the Bayesian software package
WinBUGS is very well suited for decomposition of joint distributions into products of local
components, i.e. parent and offspring nodes in a directed acyclic graph (DAG) (Lunn et al.
2000; 2009). WinBUGS also applies an intelligent, automatic approach to the choice of updaters
needed in the implementation of the MCMC procedure. Both Damgaard (2007) and
Waldmann (2009) successfully executed the animal model in WinBUGS and produced re-
sults that show how evaluating the genetics of complex pedigrees can be performed smoothly
without the need of expensive hardware and software. As an extension to these studies, we
have shown how general relationship structures can be decomposed and hence be efficiently
implemented in WinBUGS.
One important improvement offered by our proposed method, compared to the standard
factorization method of additive polygenic effects (e.g. Thomas 1992; Lin 1999; Waldmann
2009), is the ability to obtain realizations from the correct additive covariance structure of
inbred populations. If such populations are analyzed with the standard factorization model,
level of inbreeding and the size of the pedigree. Problems with handling inbred pedigrees
arise with the standard model because the covariances between parents are assumed to be
zero. Since it is not uncommon to have some degree of inbreeding in both breeding and
natural populations of animals and plants, the standard factorization of additive polygenic
relationships can be erroneous. Differences in the estimated posteriors of the polygenic
variance components, obtained by the standard factorization model and our approach, need
further verification in extensive computer simulations. It should be noted that the non-
We have, in the present study, demonstrated the benefit of decomposing the multivariate
normal distribution, often used as a prior for genetic effects in the standard linear mixed
effects model. To our knowledge, this procedure of decomposing the prior distribution has not
been utilized before in the context of parameter estimation in genetics. The decomposition
approach was put forward and executed successfully in WinBUGS by Vines et al. (1996); they
utilized a random effect model to analyze clinical data in the context of epidemiology. However,
they did not include covariance between random effects, which makes the decomposition
procedure more complex. In general, there also exist alternative procedures for efficiently
implementing the prior distribution, which deserve more attention. One such alternative,
which is likely to be as efficient as the approach presented here, is obtained by
root) of the original covariance matrix (Golub and Van Loan 1996). As in our approach,
(Cholesky) weights can be calculated once, prior to the WinBUGS analysis. Both procedures
involve one matrix-vector multiplication in each iteration of the MCMC process and are,
therefore, likely to be equally time-consuming computationally for the analysis of large pedigrees.
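As a rough illustration of the Cholesky alternative, the sketch below factorizes a small relationship matrix once and then reuses the factor to draw correlated effects in each iteration. It is plain Python rather than WinBUGS code; the matrix (two unrelated founders and their two full-sib offspring, with a dominance relationship of 0.25 between the sibs), the function names and the value of sigma are assumptions made purely for illustration.

```python
import math
import random


def cholesky(K):
    """Return lower-triangular L with L L^T = K (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def draw_effects(L, sigma, rng):
    """One realization d = sigma * L z with z ~ N(0, I), so Cov(d) = sigma^2 * K."""
    z = [rng.gauss(0.0, 1.0) for _ in L]
    return [sigma * sum(L[i][k] * z[k] for k in range(i + 1))
            for i in range(len(L))]


# Hypothetical relationship matrix: founders 1 and 2, full sibs 3 and 4
D = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.25],
     [0.0, 0.0, 0.25, 1.0]]

L = cholesky(D)                  # computed once, before the sampler starts
rng = random.Random(1)           # fixed seed only to make the sketch repeatable
d = draw_effects(L, sigma=2.0, rng=rng)   # one matrix-vector product per iteration
```

For this matrix the last row of L is (0, 0, 0.25, sqrt(0.9375)), so the fourth effect equals 0.25 times the third plus independent noise. That is the same univariate conditional form that a sequential decomposition of the prior yields for a pair of full sibs.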
ACKNOWLEDGMENTS
This work was supported by funding from: the Research School of Tree Breeding and Forest
Genetics for JH and PW; Föreningen svensk skogsträdsförädling for CW and PW; and re-
search grants from the Academy of Finland and the University of Helsinki's Research Funds
for MJS. The Scots pine data used in the polygenic example were provided by the Swedish
Forest Research Institute, Skogforsk. In addition, we wish to thank the associate editor and
LITERATURE CITED
Abney, M., M. S. McPeek and C. Ober, 2000 Estimation of variance components of quanti-
Bauer, A. M., T. C. Reetz, F. Hoti, W. -D. Schuh, J. Léon et al., 2009 Bayesian prediction
Box, G. E. P., and G. C. Tiao, 1973 Bayesian Inference in Statistical Analysis. Wiley, New
York.
Bulmer, M. G., 1971 Effect of selection on genetic variability. Am. Nat. 105: 201-210.
of genomic selection using different methods to define haplotypes. Genetics 178: 553-561.
Damgaard, L. H., 2007 How to use Winbugs to draw inferences in animal models. J. Anim.
New York.
Finley, A. O., S. Banerjee, P. Waldmann and T. Ericsson, 2009 Hierarchical spatial modeling
of additive and dominance genetic variance for large spatial trial datasets. Biometrics 65:
441-451.
Flint, J., and T. F. C. Mackay, 2009 Genetic architectures of quantitative traits in flies, mice
Gelman, A., 2006 Prior distributions for variance parameters in hierarchical models. Bayesian
Anal. 1: 515-534.
Golub, G. H., and C. F. Van Loan, 1996 Matrix Computations, 3rd Edition. Johns Hopkins
Habier, D., R. L. Fernando, and J. C. M. Dekkers, 2009 Genomic selection using low-density
Hallander, J., and P. Waldmann, 2009 Optimum contribution selection in large general tree
breeding populations with an application to Scots pine. Theor. Appl. Genet. 118: 1133-
1142.
Harris, D. L., 1964 Genotypic covariances between inbred relatives. Genetics 50: 1319-1348.
Hayes, B. J., P. M. Visscher and M. E. Goddard, 2009 Increased accuracy of artificial selection
Jensen, D. R., 1998 Multivariate normal distribution. In: Armitage P, Colton T, editors.
Johnson, D. L., and R. Thompson, 1995 Restricted maximum likelihood estimation of vari-
ance components for univariate animal models using sparse matrix techniques and average
Kass, R. E., B. P. Carlin, A. Gelman, and R. Neal, 1998 Markov Chain Monte Carlo in
Lauritzen, S. L., A. P. Dawid, B. N. Larsen and H.-G. Leimer, 1990 Independence properties
Legarra, A., I. Aguilar and I. Misztal, 2009 A relationship matrix including full pedigree and
Levine, R. A., and G. Casella, 2006 Optimizing random scan Gibbs samplers. J. Mult. Anal.
97: 2071-2100.
Lin, S., 1999 Monte Carlo Bayesian methods for quantitative traits. Comp. Stat. Data
analyses of the QTLMAS XII common dataset. I: Genomic selection. BMC Proc. 3: S1.
Lunn, D. J., A. Thomas, N. Best and D. Spiegelhalter, 2000 WinBUGS - a Bayesian mod-
elling framework: concepts, structure, and extensibility. Stat. Comp. 10: 325-337.
Lunn, D. J., D. Spiegelhalter, A. Thomas and N. Best, 2009 The BUGS project: evolution,
Lynch, M., and B. Walsh, 1998 Genetics and Analysis of Quantitative Traits. Sinauer Asso-
Meuwissen, T. H. E., B. J. Hayes and M. E. Goddard, 2001 Prediction of total genetic value
Misztal, I., 1997 Estimation of variance components with large-scale dominance models. J.
Misztal, I., A. Legarra and I. Aguilar, 2009 Computing procedures for genetic evaluation
including phenotypic, full pedigree, and genomic information. J. Dairy Sci. 92: 4648-4655.
Mrode, R., and R. Thompson, 1989 An alternative algorithm for incorporating the relation-
ships between animals in estimating variance components. J. Anim. Breed. Genet. 106:
89-95.
wide assessment of worldwide chicken SNP genetic diversity indicates significant absence of
rare alleles in commercial breeds. Proc. Natl. Acad. Sci. USA 105: 17312-17317.
O'Hara, R. B., J. M. Cano, O. Ovaskainen, C. Teplitsky, and J. S. Alho, 2008 Bayesian ap-
Ovaskainen, O., J. M. Cano and J. Merilä, 2008 A Bayesian framework for comparative
Roberts, G. O., and S. K. Sahu, 1997 Updating schemes, correlation structure, blocking and
parameterization for the Gibbs sampler. J. R. Stat. Soc. Ser. B 170: 419-431.
Rue, H., and L. Held, 2005 Gaussian Markov Random Fields: Theory and Applications.
Schaeffer, L. R., and B. W. Kennedy, 1986 Computing solutions to mixed model equations.
Smith, B. J., 2007 boa: an R package for MCMC output convergence assessment and poste-
Sorensen, D., and D. Gianola, 2002 Likelihood, Bayesian and MCMC Methods in Quantita-
Steinsland, I., and H. Jensen, 2010 Utilising Gaussian Markov Random Field properties of
Thomas, A., R. B. O'Hara, U. Ligges and S. Sturtz, 2006 Making BUGS Open. R News 6:
12-17.
Thomas, D. C., 1992 Fitting genetic data using Gibbs sampling - an application to nevus
Markov chain Monte Carlo computation in quantitative genetics. Genet. Sel. Evol. 40:
161-176.
genetic association of complex traits in heterogeneous stock mice. Nat. Genet. 38: 879-887.
Waldmann, P., 2009 Easy and flexible Bayesian inference of quantitative genetic parameters.
Waldmann, P., J. Hallander, F. Hoti and M. J. Sillanpää, 2008 Efficient Markov Chain Monte
Wang, C. S., J. J. Rutledge and D. Gianola, 1993 Marginal inference about variance com-
ponents in a mixed linear model using Gibbs sampling. Genet. Sel. Evol. 21: 41-62.
VanRaden, P. M., 2008 Efficient methods to compute genomic predictions. J. Dairy Sci. 91:
4414-4423.
al., 2009 Reliability of genomic predictions for North American Holstein bulls. J. Dairy Sci.
92: 16-24.
Van Tassell, C. P., and L. D. Van Vleck, 1996 Multiple-trait Gibbs sampler for animal models:
flexible programs for Bayesian and likelihood-based (covariance) component inference.
dictive ability of direct genomic values for lifetime net merit of Holstein sires using selected
Villanueva, B., R. Pong-Wong, J. Fernández and M. A. Toro, 2005 Benefits from marker-
assisted selection under an additive polygenic genetic model. J. Anim. Sci. 83: 1747-1752.
Wilkinson, D. J., and S. K. H. Yeung, 2004 A sparse matrix approach to Bayesian compu-
tation in large linear models. Comp. Stat. Data Anal. 44: 493-516.
Vines, S. K., W. R. Gilks and P. Wild, 1996 Fitting Bayesian multiple random effects models.
Visscher, P. M., 2008 Sizing up human height variation. Nat. Genet. 40: 489-490.
Xu, S., 2006 Separating nurture from nature in estimating heritability. Heredity 97: 256-
257.
Yi, N., and S. Xu, 2000 Bayesian mapping of quantitative trait loci under the identity-by-
APPENDIX
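For individual 1: $E(d_1 \mid \sigma_d^2) = 0$, $\mathrm{Var}(d_1) = \sigma_d^2$ and $d_1 \mid \sigma_d^2 \sim N(0, \sigma_d^2)$.
For individual 2: $E(d_2 \mid d_1, \sigma_d^2) = E(d_2) + [d_1]\left[\frac{1}{\sigma_d^2}\right][0] = 0$, $\mathrm{Var}(d_2 \mid d_1, \sigma_d^2) = \mathrm{Var}(d_2) - [0]\left[\frac{1}{\sigma_d^2}\right][0] = \sigma_d^2$ and $d_2 \mid d_1, \sigma_d^2 \sim N(0, \sigma_d^2)$.
For individual 3: $E(d_3 \mid d_1, d_2, \sigma_d^2) = E(d_3) + \begin{bmatrix} d_1 & d_2 \end{bmatrix} \begin{bmatrix} \frac{1}{\sigma_d^2} & 0 \\ 0 & \frac{1}{\sigma_d^2} \end{bmatrix} \begin{bmatrix} 0 \\ 0 \end{bmatrix} = 0$, $\mathrm{Var}(d_3 \mid d_1, d_2, \sigma_d^2) = \mathrm{Var}(d_3) - \begin{bmatrix} 0 & 0 \end{bmatrix} \begin{bmatrix} \frac{1}{\sigma_d^2} & 0 \\ 0 & \frac{1}{\sigma_d^2} \end{bmatrix} \begin{bmatrix} 0 \\ 0 \end{bmatrix} = \sigma_d^2$ and $d_3 \mid d_1, d_2, \sigma_d^2 \sim N(0, \sigma_d^2)$.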
For individual 4, both mean and variance are shifted (reduced) because individuals 3 and
4 are full sibs. The following weights for reduction in mean are, consequently, obtained
for individual 4: $w(4, 1) = 0$, $w(4, 2) = 0$ and $w(4, 3) = 0.25$. Using equation (7), we
obtain $W(4) = 0.25 d_3$. Hence, instead of drawing $d$ from $MVN(0, D\sigma_d^2)$, we make use of
the following univariate normal distributions: $d_1 \mid \sigma_d^2 \sim N(0, \sigma_d^2)$, $d_2 \mid d_1, \sigma_d^2 \sim N(0, \sigma_d^2)$,
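The shift for individual 4 can be checked with the same conditional normal formulas used for individuals 1 to 3. The worked step below assumes that the only non-zero off-diagonal element of $D$ involving individual 4 is the full-sib dominance relationship of 0.25 with individual 3, which is consistent with the weight $w(4, 3) = 0.25$ given above. The resulting variance factor is derived here as an illustration and is not quoted from the original text.

```latex
\begin{align*}
E(d_4 \mid d_1, d_2, d_3, \sigma_d^2)
  &= E(d_4) +
     \begin{bmatrix} d_1 & d_2 & d_3 \end{bmatrix}
     \begin{bmatrix}
       \frac{1}{\sigma_d^2} & 0 & 0 \\
       0 & \frac{1}{\sigma_d^2} & 0 \\
       0 & 0 & \frac{1}{\sigma_d^2}
     \end{bmatrix}
     \begin{bmatrix} 0 \\ 0 \\ 0.25\,\sigma_d^2 \end{bmatrix}
   = 0.25\, d_3, \\
\mathrm{Var}(d_4 \mid d_1, d_2, d_3, \sigma_d^2)
  &= \mathrm{Var}(d_4) -
     \begin{bmatrix} 0 & 0 & 0.25\,\sigma_d^2 \end{bmatrix}
     \begin{bmatrix}
       \frac{1}{\sigma_d^2} & 0 & 0 \\
       0 & \frac{1}{\sigma_d^2} & 0 \\
       0 & 0 & \frac{1}{\sigma_d^2}
     \end{bmatrix}
     \begin{bmatrix} 0 \\ 0 \\ 0.25\,\sigma_d^2 \end{bmatrix}
   = \sigma_d^2 - 0.0625\,\sigma_d^2
   = 0.9375\,\sigma_d^2 .
\end{align*}
```

Under this assumption the conditional mean of $d_4$ equals $W(4) = 0.25 d_3$, and the conditional variance is smaller than $\sigma_d^2$, in line with the statement that both mean and variance are shifted for individual 4.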
When obtaining realizations of the additive polygenic effects of pedigree members 1 to 5, our
proposed method and the standard factorization method (e.g., Lin 1999) give exactly the
same mean and variance for the univariate normal distributions: $a_1, a_2, a_3 \sim N(0, \sigma_a^2)$,
$a_4 \sim N\left(\frac{a_1 + a_2}{2}, \frac{\sigma_a^2}{2}\right)$ and $a_5 \sim N\left(\frac{a_3 + a_2}{2}, \frac{\sigma_a^2}{2}\right)$. However, for pedigree member 6, the standard
factorization method yields $a_6 \sim N\left(\frac{a_4 + a_5}{2}, \frac{\sigma_a^2}{2}\right)$. Our proposed method, on the other hand,
gives
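$$
\begin{bmatrix} a_1 & a_2 & a_3 & a_4 & a_5 \end{bmatrix}
\left(
\begin{bmatrix}
1 & 0 & 0 & 0.5 & 0 \\
0 & 1 & 0 & 0.5 & 0.5 \\
0 & 0 & 1 & 0 & 0.5 \\
0.5 & 0.5 & 0 & 1 & 0.25 \\
0 & 0.5 & 0.5 & 0.25 & 1
\end{bmatrix} \sigma_a^2
\right)^{-1}
\begin{bmatrix} 0.25 \\ 0.5 \\ 0.25 \\ 0.625 \\ 0.625 \end{bmatrix} \sigma_a^2
= \frac{a_4}{2} + \frac{a_5}{2},
$$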
Using our proposed method, for individual 6, we make use of the following univariate
normal distribution: $a_6 \mid a_1, a_2, a_3, a_4, a_5, \sigma_a^2 \sim N\left(\frac{a_4 + a_5}{2}, \frac{3\sigma_a^2}{8}\right)$. Consequently, the weight for
the variance component differs between our method (3/8) and the standard method (1/2);
this will cause $a_6$ to be sampled from an incorrect distribution if the standard method is
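The weights and the variance coefficient for individual 6 can be re-derived numerically from the relationship coefficients printed in the matrix equation above. The sketch below is a minimal check in Python using exact fractions; the helper `solve` and all variable names are illustrative and are not part of the WinBUGS implementation. It assumes a marginal variance coefficient of 1 for individual 6, since that is the value that reproduces the 3/8 stated in the text; a different diagonal element would change the conditional variance accordingly.

```python
from fractions import Fraction as F


def solve(M, b):
    """Solve M x = b exactly by Gauss-Jordan elimination (M square, nonsingular)."""
    n = len(M)
    # Augmented matrix [M | b] with exact rational entries
    aug = [[F(v) for v in row] + [F(rhs)] for row, rhs in zip(M, b)]
    for col in range(n):
        # Bring a row with a non-zero entry in this column to the pivot position
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


h, q = F(1, 2), F(1, 4)

# Relationship coefficients among individuals 1-5 (copied from the appendix)
A15 = [[1, 0, 0, h, 0],
       [0, 1, 0, h, h],
       [0, 0, 1, 0, h],
       [h, h, 0, 1, q],
       [0, h, h, q, 1]]

# Relationship coefficients between individual 6 and individuals 1-5
c6 = [q, h, q, F(5, 8), F(5, 8)]

w6 = solve(A15, c6)                              # weights on a1..a5 in the mean
explained = sum(c * w for c, w in zip(c6, w6))   # variance removed by conditioning
marginal = F(1)                                  # ASSUMPTION: coefficient used in the text
var_coef = marginal - explained                  # multiplies sigma_a^2

print(var_coef)  # 3/8
```

The solved weights are zero for individuals 1 to 3 and one half for each parent, which matches the conditional mean $(a_4 + a_5)/2$ given above.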
TABLE 1
Summary statistics including posterior estimates (mode, mean and median) and
effective sample size (ESS) obtained from the WinBUGS analysis in the polygenic model
example for additive genetic variance, dominance genetic variance, residual
variance, heritability and dominance proportion. The results of the
decomposition approach are denoted DEC while the results of Waldmann et al. (2008) are
denoted HYB, including both MCMC estimates and restricted maximum likelihood (REML)
estimates
                            Mode            Mean            Median          95% HPD region                      ESS             REML
Parameter                   DEC     HYB     DEC     HYB     DEC     HYB     DEC               HYB               DEC     HYB     HYB
Additive genetic variance   53.16   54.70   63.29   62.52   60.49   59.62   [29.47, 103.1]    [27.67, 103.7]    376.4   417.2   55.95
Dominance genetic variance  77.56   82.88   84.69   88.41   82.5    85.71   [39.86, 136.9]    [39.70, 142.2]    399.2   456.8   83.06
Residual variance           726.4   722.2   724.3   721.7   725.0   722.6   [670.7, 778.3]    [665.3, 776.8]    733.2   756.1   728.5
Heritability                0.0617  0.0630  0.0724  0.0714  0.0694  0.0685  [0.0340, 0.1158]  [0.0327, 0.1170]  372.2   420.9   0.0645
Dominance proportion        0.0894  0.0939  0.0970  0.1014  0.0946  0.1     [0.0463, 0.1567]  [0.0500, 0.1616]  394.5   447.5   0.0957
TABLE 2
Posterior estimates obtained from the WinBUGS analysis in the genomic model example for
heritability using a model including the genomic relationship matrix and a model
including the additive polygenic relationship matrix. The true value of heritability for the entire
pedigree should be 0.3 (Lund et al. 2009)
Model                                    Mode     Mean     Median   95% HPD region
Genomic relationship matrix              0.3040   0.2980   0.2997   [0.2553, 0.3384]
Additive polygenic relationship matrix   0.3376   0.3418   0.3397   [0.2191, 0.4691]
TABLE 3
Example pedigree for sampling of dominance polygenic effects where a 0 indicates that an
individual has an unknown father or mother
Individual Father Mother
1 0 0
2 0 0
3 1 2
4 1 2
TABLE 4
Example pedigree for sampling of additive polygenic effects where a 0 indicates that an
individual has an unknown father or mother