Estimation of Heritability From Limited Family Data Using Genome-Wide Identity-By-Descent Sharing
http://www.gsejournal.org/content/44/1/16
Genetics Selection Evolution
Abstract
Background: In classical pedigree-based analysis, additive genetic variance is estimated from between-family
variation, which requires large phenotyped and pedigreed populations involving numerous
families (parents). However, estimation is often complicated by confounding of genetic and environmental family
effects, the latter typically occurring among full-sibs. For this reason, genetic variance is often inferred from
covariances among more distant relatives, which reduces the power of the analysis. This simulation study shows
that genome-wide identity-by-descent sharing among close relatives can be used to quantify additive genetic
variance solely from within-family variation, using data on extremely small family samples.
Methods: Identity-by-descent relationships among full-sibs were simulated assuming a genome size similar to that
of humans (effective number of loci ~80). Genetic variance was estimated from phenotypic data assuming that
genomic identity-by-descent relationships could be accurately re-created using information from genome-wide
markers. The results were compared with standard pedigree-based genetic analysis.
Results: For a polygenic trait and a given number of phenotypes, the most accurate estimates of genetic variance
were obtained from data on a single large full-sib family. Compared with classical pedigree-based analysis, the
proposed method is more robust to selection among parents and to confounding of environmental and genetic
effects. Furthermore, in some cases, satisfactory results can be achieved even with less ideal data structures, i.e., for
selectively genotyped data and for traits for which the genetic variance is largely under the control of a few major
genes.
Conclusions: Estimation of genetic variance using genomic identity-by-descent relationships is especially useful for
studies aiming at estimating additive genetic variance of highly fecund species, using data from small populations
with limited pedigree information and/or few available parents, i.e., parents originating from non-pedigreed or
even wild populations.
© 2012 Ødegård and Meuwissen; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the
Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly cited.
Ødegård and Meuwissen Genetics Selection Evolution 2012, 44:16 Page 2 of 10
on between-family variation only, since the Mendelian sampling deviations of non-parents cannot be separated from the residual (or permanent environmental) effects on the same animals [3]. Estimation of genetic variance based on pedigree relationships is further complicated by the fact that common environmental effects may be important for some relatives, especially full-sibs (e.g., maternal environment, rearing environment, litter effects, etc.), which means that the genetic variance must be estimated from covariances among phenotypes of more distant relatives (e.g., half-sibs, cousins, etc.).

Due to linkage between loci on the same chromosome, parents tend to pass on long segments of DNA to their offspring. Hence, the "effective" number of segregating loci within a full-sib family will be much lower than the corresponding number for the whole population, even for species with a large genome. For example, recent reports have indicated that the effective number of segregating loci among full-sib pairs in humans is only about 80 [4,5]. When the effective number of segregating loci is low, the actual relationships among full-sibs vary substantially among sib pairs. Visscher et al. [5] estimated that actual relationships among human full-sibs vary from 0.37 to 0.62, and used these relationships to quantify the additive genetic variation of human height based on within-family segregation only, i.e., free from non-genetic factors. In that study, the heritability estimates were based on more than 3000 sib pairs. With such a large dataset, including numerous families, the main challenge is not to estimate between-family variation, but rather to separate genetic effects from other effects that act at the family level. Visscher et al. [5] pointed out that one limitation of their method was that it required large datasets with densely genotyped individuals. Indeed, for a sib-pair design (twin study), a large number of full-sib pairs would be needed. However, for livestock, aquaculture species and laboratory animals, population structures are usually very different from those in humans, with much larger progeny groups of either full- or half-sibs (or both). Therefore, the aim of the current study was to test whether genetic variance could be accurately estimated with relatively small datasets and a limited number of families, using a population structure typical of a high-fecundity species (e.g., insects, crustaceans, fish or poultry), and whether the results could also be generalized to species in which only one of the sexes (usually males) has a large reproductive potential (e.g., mammalian livestock).

Methods
Simulation study
Genomic identity-by-descent relationships
The IBD relationships were simulated so that they closely resembled the relationships estimated with real data for humans, and thus they were typical of species with relatively large genomes. Variation in IBD sharing was simulated using a model with 80 "effective loci" (n_e), equivalent to the human genome size. Effective loci are defined as the number of independently segregating "loci" that would yield the same standard deviation of the proportion of genome shared among full-sibs as observed in real genomic data from human sib pairs [4]. Hence, an "effective locus allele" is not a specific mutation, but is equivalent to a long haplotype block passed on from parent to offspring. For simplicity, it was assumed that different families were unrelated and that inbreeding was zero. For an "effective locus" i, the IBD relationship of two full-sibs was therefore defined as 0 if neither their paternal nor their maternal "alleles" (haplotype blocks) were IBD, 0.5 if either their paternal or their maternal "alleles" were IBD, and 1 if both their paternal and maternal "alleles" were IBD. The actual relationship between two full-sibs was then defined as the average relationship across all "effective loci" (i.e., representing the whole genome). An example of the distribution of actual relationships in a large simulated full-sib family is shown in Figure 1. Since all relationships among full-sibs are based on the inheritance of a limited number of "effective loci" (n_e = 80), the actual relationship matrix cannot be of full rank for large families, which introduces numerical problems in data simulation and analysis. Therefore, the relationship matrix was forced to be positive definite by adding a small positive value (10^-3) to each diagonal element (sufficiently small to have a negligible effect on the genetic (co)variance structure).

Data sets and data structures
Nine data structures were generated, using various numbers of full-sib families (1-10) and individuals (200-1000) with data (Table 1). Furthermore, three scenarios were defined (Table 2), all assuming moderate heritability, but differing with respect to the distribution of genetic

Figure 1 Example of actual relationships among simulated full-sibs in a single family (N = 1000).
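The simulation of actual relationships described under "Genomic identity-by-descent relationships" (and illustrated in Figure 1) can be sketched with NumPy as below. This is a minimal illustration, not the authors' simulation code: the function name `simulate_fullsib_ibd` and its arguments are hypothetical, and the sketch assumes unrelated, non-inbred parents and independently segregating effective loci, each scored 0, 0.5 or 1 as defined above.

```python
import numpy as np


def simulate_fullsib_ibd(n_sibs, n_loci=80, jitter=1e-3, rng=None):
    """Simulate actual (genomic IBD) relationships among full-sibs of one family.

    Hypothetical sketch: assumes unrelated, non-inbred parents, so each parent
    carries two distinct haplotype blocks ("effective alleles") at each of
    n_loci independently segregating effective loci.
    """
    rng = np.random.default_rng(rng)
    # 0/1: which of the father's (pat) / mother's (mat) two blocks each sib inherits.
    pat = rng.integers(0, 2, size=(n_sibs, n_loci)).astype(float)
    mat = rng.integers(0, 2, size=(n_sibs, n_loci)).astype(float)
    # Number of loci at which two sibs share the paternal (maternal) block.
    shared_pat = pat @ pat.T + (1.0 - pat) @ (1.0 - pat).T
    shared_mat = mat @ mat.T + (1.0 - mat) @ (1.0 - mat).T
    # Per-locus IBD score is 0, 0.5 or 1; average it over all effective loci.
    G = 0.5 * (shared_pat + shared_mat) / n_loci  # diagonal is exactly 1
    # Rank deficient for large families: a small ridge makes G positive definite.
    G += jitter * np.eye(n_sibs)
    return G


if __name__ == "__main__":
    G = simulate_fullsib_ibd(1000, rng=1)
    off = G[np.triu_indices_from(G, k=1)]
    print(f"mean {off.mean():.3f}, sd {off.std():.3f}, "
          f"range {off.min():.2f}-{off.max():.2f}")
```

With n_e = 80, the per-locus IBD score has variance 0.125, so off-diagonal elements should scatter around 0.5 with a standard deviation of roughly √(0.125/80) ≈ 0.04, the same order as the 0.37 to 0.62 range reported by Visscher et al. [5]. The unjittered matrix is built from at most 4·n_e = 320 inheritance indicator columns, so for a family of 1000 sibs it is necessarily singular; adding 10^-3 to the diagonal shifts every eigenvalue up by exactly that amount.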
For both models, variance components were estimated with restricted maximum likelihood methodology using the ASReml software package [7].

Selective genotyping
The genomic IBD model assumes that all animals are genotyped with a sufficiently dense marker map covering the entire genome. However, in some studies, selective genotyping of phenotypically extreme (high/low) animals within each family may be used to save costs. This may be a useful approach for QTL (Quantitative Trait Loci) detection, but our aim was to evaluate whether such data could also be used to estimate quantitative genetic variation. In these analyses, we assumed a single family with 200, 500 or 1000 full-sibs, in which only individuals with phenotypes deviating more than one residual standard deviation from the mean were genotyped. However, because including only the genotyped (phenotypically extreme) animals in the analysis would probably yield overestimated variance components, the non-genotyped animals were also included in the analysis. For the analyses, genomic IBD relationships among genotyped individuals were combined with pedigree relationships of non-genotyped individuals in a common relationship matrix [8,9].

Results
The estimated heritabilities (across-replicate means and standard deviations) for the different structures under Scenarios 1 and 2 are presented in Figures 2 and 3, respectively. For the classical pedigree-based analyses, the data structure did not make it possible to separate permanent environmental effects common to full-sibs from genetic effects, since both factors are estimated from between-family variation only (and no relatives other than full-sibs were present). Hence, the estimated heritability in the classical model was biased by the common environmental component, generally resulting in overestimated genetic variance. Furthermore, when the number of families included in the dataset was low, the estimates also varied substantially from replicate to replicate. For the one-family designs, no between-family variation existed, and therefore, by definition, genetic variance could not be estimated with a classical pedigree-based model. However, for all the designs, the genomic IBD model was able to estimate genetic variance, because the model inferred genetic variance from within-family variation, and multiple families were therefore not needed. Moreover, even with multiple families, the heritability estimates were unbiased and much more accurate than with the classical model. Furthermore, the precision of the heritability estimate in the IBD model increased with increasing family size and was highest for single-family designs (i.e., the largest family size for a given number of observations). For the latter design, heritabilities were estimated with moderate to high precision even with the smallest datasets (200 animals).

Figure 2 Across-replicate averages of estimated heritability (with between-replicate standard deviations of the estimates) by total number of observations using a classical animal model (a) and a genomic IBD animal model (b) with 1, 5 or 10 families for Scenario 1. The dotted line represents the true input heritability.

When assuming no common environmental variance (Scenario 2), the pedigree-based analyses were also unbiased, but they were less precise than the genomic IBD analyses (Figure 3). As expected, if common environmental effects were not included in the data, the precision of the estimated heritability was improved, in particular for the smallest datasets using the classical model, while the precision of the IBD model was unaffected for the largest datasets (1000 individuals). The differences between the two models were most pronounced for larger datasets with few families. For the genomic IBD model, within-family variation dominated the estimation of genetic variance, and thus reducing family sizes to make room for more families led to less precise estimates of genetic variance.

Figure 3 Across-replicate averages of estimated heritability (with between-replicate standard deviations of the estimates) by total number of observations using a classical animal model (a) and a genomic IBD animal model (b) with 1, 5 or 10 families for Scenario 2. The dotted line represents the true input heritability.

For selectively genotyped data, the genomic IBD model was also able to estimate the genetic variance based on a single large (N = 1000) family. However, single-family estimates based on smaller samples (200 or 500) tended to be overestimated, and the precision of the estimates was reduced compared with that under full genotyping (Figure 4).

If all the genetic variance was located in only one "effective locus", and no common environmental variance existed (Scenario 3), estimation of heritability was still unbiased for both the genomic IBD and the pedigree-based (more than one family) methods (Figure 5). With larger datasets (500-1000 individuals), the genomic IBD method was more precise, but the two methods were equally imprecise for the smallest datasets, and, in contrast with the earlier results, the single-family design yielded highly imprecise results with the genomic IBD model.

Discussion
This study shows that tracing IBD relationships using genomic information has clear advantages, not only for prediction of individual breeding values [10] but also for estimation of genetic (co)variance components. Both the current and earlier studies have shown that genetic variance can be estimated based on within-family variation. In contrast, estimation of genetic variance in a classical genetic analysis is based only on between-family variation. Hence, for the latter, it is imperative that genetic and non-genetic family effects are properly separated by the model, which puts major limitations on the usefulness of family data (e.g., resemblance among full-sibs may also be due to similarities in the environment). Furthermore, for an accurate estimation of genetic variance in a classical model, many families must be included in the study and selection of data should be avoided. However, by using actual IBD sharing among sibs instead of expected relationships, genetic variation can be quantified solely from within-family variation [5], which also facilitates proper separation of genetic and non-genetic family effects (as the latter do not affect within-family variation). The current study shows that with the genomic IBD approach, genetic variance can be accurately inferred from a single family, and that for a given number of observations, including more families gives less accurate results (even in the absence of common environmental effects). For a classical analysis (and in the absence of common environmental effects), Falconer and Mackay [11] showed that the optimal family size for a specific number of observations under a full-sib design was n = 2/h². However, using the genomic IBD
Figure 5 Across-replicate averages of estimated heritability (with between-replicate standard deviations of the estimates) by total
number of observations using a classical animal model (a) and a genomic IBD animal model (b) with 1, 5 or 10 families for Scenario 3.
The dotted line represents the true input heritability.
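The central idea, estimating genetic variance from within-family variation in actual relationships, can be illustrated with a small profile-REML sketch. The authors fitted their models with ASReml [7]; the grid search below is a deliberately simple stand-in, not the algorithm implemented in ASReml. It assumes a single full-sib family from unrelated, non-inbred parents and the model y = μ1 + a + e with var(a) = Gσ²_a and var(e) = Iσ²_e. The helper names `simulate_family` and `reml_h2_grid`, the intercept of 10 and the grid of h² values are all hypothetical choices for illustration.

```python
import numpy as np


def simulate_family(n_sibs, h2, n_loci=80, rng=None):
    """Simulate phenotypes and actual IBD relationships for ONE full-sib family.

    Hypothetical set-up: unrelated, non-inbred parents, each carrying two
    haplotype blocks with additive effects at every effective locus;
    phenotypic variance 1, so sigma2_a = h2 and sigma2_e = 1 - h2.
    """
    rng = np.random.default_rng(rng)
    s2 = h2 / (2 * n_loci)  # block-effect variance, so that var(a) = h2
    pat_eff = rng.normal(0.0, np.sqrt(s2), size=(2, n_loci))
    mat_eff = rng.normal(0.0, np.sqrt(s2), size=(2, n_loci))
    pat = rng.integers(0, 2, size=(n_sibs, n_loci))  # inherited paternal block
    mat = rng.integers(0, 2, size=(n_sibs, n_loci))  # inherited maternal block
    loci = np.arange(n_loci)
    a = pat_eff[pat, loci].sum(axis=1) + mat_eff[mat, loci].sum(axis=1)
    y = 10.0 + a + rng.normal(0.0, np.sqrt(1.0 - h2), size=n_sibs)
    # Actual relationships: mean over loci of 0.5*(shared paternal) + 0.5*(shared maternal).
    P, M = pat.astype(float), mat.astype(float)
    shared = P @ P.T + (1 - P) @ (1 - P).T + M @ M.T + (1 - M) @ (1 - M).T
    G = 0.5 * shared / n_loci + 1e-3 * np.eye(n_sibs)
    return y, G


def reml_h2_grid(y, G, grid=None):
    """Profile-REML grid search for h2 in y = mu*1 + a + e,
    var(a) = G*sigma2_a, var(e) = I*sigma2_e, h2 = sigma2_a / (sigma2_a + sigma2_e)."""
    if grid is None:
        grid = np.linspace(0.01, 0.99, 99)
    y = np.asarray(y, dtype=float)
    n = y.size
    d, U = np.linalg.eigh(G)   # G = U diag(d) U'
    yt = U.T @ y               # rotated phenotypes
    xt = U.T @ np.ones(n)      # rotated intercept column
    loglik = np.empty(len(grid))
    for k, h2 in enumerate(grid):
        w = 1.0 / (h2 * d + 1.0 - h2)  # eigenvalues of H^-1, H = h2*G + (1-h2)*I
        xHx, xHy, yHy = np.sum(w * xt * xt), np.sum(w * xt * yt), np.sum(w * yt * yt)
        yPy = yHy - xHy ** 2 / xHx     # quadratic form with mu absorbed
        s2p = yPy / (n - 1)            # REML estimate of sigma2_a + sigma2_e given h2
        loglik[k] = -0.5 * ((n - 1) * np.log(s2p) - np.sum(np.log(w)) + np.log(xHx))
    best = int(np.argmax(loglik))
    return float(grid[best]), loglik


if __name__ == "__main__":
    y, G = simulate_family(1000, h2=0.4, rng=42)
    h2_hat, _ = reml_h2_grid(y, G)
    print(f"estimated h2 = {h2_hat:.2f} (simulated 0.40)")
```

Because μ is treated as a fixed effect, REML works on contrasts that sum to zero within the family, so any covariance that is constant across all sibs (the between-family part ½σ²_a of the genetic covariance, or a common environmental effect) drops out. The estimate is therefore driven entirely by how the actual relationships deviate from their expectation of 0.5, which is the within-family information the genomic IBD model exploits.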
Scenarios 1 and 2 in the current study assumed that genetic variance is evenly distributed over genomic regions, as assumed in the genomic BLUP (GBLUP) model [10]. The main difference between the GBLUP model and the genomic IBD model is that the former uses identity-by-state (IBS) relationships, while the latter uses IBD relationships (based on marker alleles traced back to a common ancestor). The assumption that genetic variance is distributed evenly across genomic regions has been shown to be an appropriate approximation for a number of traits [15,16]. However, there are also counterexamples; e.g., genetic variation in resistance against infectious pancreatic necrosis in Atlantic salmon seems to be largely controlled by a single major QTL [17,18]. For the latter type of trait, some of the underlying assumptions of both the pedigree-based and the genomic IBD models are violated. First, within-family genetic variance will vary greatly among families, depending on the actual parental genotypes ("effective alleles") for the genomic region that primarily affects the trait (although it will still on average be ½σ²_a); in the Atlantic salmon example, the within-family genetic variance will depend on whether or not the parents segregate for the major QTL. Second, IBD relationships in the most important linkage group(s), rather than the overall genomic or expected (pedigree-based) IBD relationships, will dominate the genetic covariance between relatives. Still, even for such data, the genomic IBD model could estimate genetic variance more accurately than the classical pedigree-based analysis. Hence, although the genomic IBD relationships are not necessarily representative of the genetic covariance structure among sibs in this situation, they are still more informative than the pedigree-based relationships. In this setting, the differences between the classical pedigree and genomic IBD models increased with the size of the dataset (no practical difference with 200 individuals but a substantial difference with 1000 individuals). However, estimation of genetic variance within a single family was, as expected, highly prone to sampling effects. In Scenario 3, the number of distinct breeding values represented within a single full-sib family is actually limited to four (two "effective alleles" per parent), which explains the large between-replicate deviations in the estimated heritability. Thus, in real data, for which the underlying genetics of the trait is generally unknown, we recommend using more than one family for quantitative genetic analysis, even when applying the genomic IBD approach.

Conclusions
The proposed genomic IBD method is particularly relevant for quantitative genetic studies aiming to estimate the additive genetic variance of highly fecund species, using data on populations with limited pedigree information and/or few available parents. For example, genetic variance may be estimated based on a few full-sib families with parents sampled from the wild or from non-pedigreed domesticated populations. In principle, the genomic IBD model (or the equivalent gametic model) requires only a single large family for proper and accurate estimation of heritability for quantitative traits. In contrast, classical pedigree-based estimation requires the establishment of a sizeable pedigreed population consisting of numerous full- and (preferably) half-sib families to produce estimates with acceptable accuracy. Furthermore, the proposed genomic IBD model is expected to be less affected by selection among parents and will facilitate the separation of genetic and non-genetic family effects (e.g., effects of common rearing).

Acknowledgements
This work was supported by grant 203699 (New statistical tools for integrating and exploiting complex genomic and phenotypic data sets), financed by the Research Council of Norway. The helpful comments of two anonymous reviewers are gratefully acknowledged.

Author details
1 Nofima, P.O. Box 210, NO-1431 Ås, Norway. 2 Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, P.O. Box 5003, NO-1432 Ås, Norway.

Authors' contributions
JØ was mainly responsible for the conception and design of the study, data analysis and writing of the manuscript. THEM contributed to the design of the study and to writing and revision of the manuscript. Both authors read and approved the final manuscript.

Competing interests
The authors declare that they have no competing interests.

Received: 1 February 2012 Accepted: 8 May 2012 Published: 8 May 2012

References
1. Fisher RA: The correlation between relatives on the supposition of Mendelian inheritance. Trans Royal Soc Edinburgh 1918, 52:399-433.
2. Hill WG: Variation in genetic composition in backcrossing programs. J Heredity 1993, 84:212-213.
3. Ødegård J, Meuwissen THE, Heringstad B, Madsen P: A simple algorithm to estimate genetic variance in an animal threshold model using Bayesian inference. Genet Sel Evol 2010, 42:29.
4. Gagnon A, Beise J, Vaupel JW: Genome-wide identity-by-descent sharing among CEPH siblings. Genet Epidem 2005, 29:215-224.
5. Visscher PM, Medland SE, Ferreira MAR, Morley KI, Zhu G, Cornes BK, Montgomery GW, Martin NG: Assumption-free estimation of heritability from genome-wide identity-by-descent sharing between full siblings. PLoS Genet 2006, 2:e41.
6. Goddard M: Genomic selection: prediction of accuracy and maximisation of long term response. Genetica 2009, 136:245-257.
7. Gilmour AR, Gogel BJ, Cullis BR, Thompson R: ASReml user guide release 3.0. 30 edition. Hemel Hempstead: VSN International Ltd; 2009.
8. Aguilar I, Misztal I, Johnson DL, Legarra A, Tsuruta S, Lawlor TJ: Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. J Dairy Sci 2010, 93:743-752.
9. Christensen O, Lund M: Genomic prediction when some animals are not genotyped. Genet Sel Evol 2010, 42:2.
10. Meuwissen THE, Hayes BJ, Goddard ME: Prediction of total genetic value using genome-wide dense marker maps. Genetics 2001, 157:1819-1829.
11. Falconer DS, Mackay TFC: Introduction to quantitative genetics. Essex: Longman Group Ltd; 1996.
12. Habier D, Fernando RL, Dekkers JCM: Genomic selection using low-density marker panels. Genetics 2009, 182:343-353.
13. Fernando R, Grossman M: Marker assisted selection using best linear unbiased prediction. Genet Sel Evol 1989, 21:467-477.
doi:10.1186/1297-9686-44-16
Cite this article as: Ødegård and Meuwissen: Estimation of heritability
from limited family data using genome-wide identity-by-descent
sharing. Genetics Selection Evolution 2012 44:16.