
Skaa 101

This article provides a review of the current status of genomic evaluation in animal breeding. It discusses the early applications of genomic selection which relied on estimating SNP effects from genotypes and phenotypes. It describes how genomic selection was simplified through the development of single-step genomic BLUP which combines pedigree and genomic relationships. The article also discusses ongoing issues with genomic selection including new validation methods and addressing reductions in genetic variance.


Journal of Animal Science, 2020, Vol. 98, No. 4, 1–14

doi:10.1093/jas/skaa101
Advance Access publication April 8, 2020
Received: 29 January 2020 and Accepted: 7 April 2020
Board Invited Review


Current status of genomic evaluation
Ignacy Misztal,†,1 Daniela Lourenco,† and Andres Legarra‡

†Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, ‡Department of Animal Genetics, Institut National de la Recherche Agronomique, Castanet-Tolosan, France

1 Corresponding author: [email protected]

ORCiD number: 0000-0002-0382-1897 (I. Misztal).

Abstract
Early application of genomic selection relied on SNP estimation with phenotypes or de-regressed proofs (DRP). Chips of 50k
SNP seemed sufficient for an accurate estimation of SNP effects. Genomic estimated breeding values (GEBV) were composed
of an index with parent average, direct genomic value, and deduction of a parental index to eliminate double counting. Use
of SNP selection or weighting increased accuracy with small data sets but had minimal to no impact with large data sets.
Efforts to include potentially causative SNP derived from sequence data or high-density chips showed limited or no gain in
accuracy. After the implementation of genomic selection, EBV by BLUP became biased because of genomic preselection and
DRP computed based on EBV required adjustments, and the creation of DRP for females is hard and subject to double counting.
Genomic selection was greatly simplified by single-step genomic BLUP (ssGBLUP). This method based on combining genomic
and pedigree relationships automatically creates an index with all sources of information, can use any combination of male
and female genotypes, and accounts for preselection. To avoid biases, especially under strong selection, ssGBLUP requires that
pedigree and genomic relationships are compatible. Because the inversion of the genomic relationship matrix (G) becomes
costly with more than 100k genotyped animals, large data computations in ssGBLUP were solved by exploiting limited
dimensionality of genomic data due to limited effective population size. With such dimensionality ranging from 4k in chickens
to about 15k in cattle, the inverse of G can be created directly (e.g., by the algorithm for proven and young) at a linear cost. Due
to its simplicity and accuracy, ssGBLUP is routinely used for genomic selection by the major chicken, pig, and beef industries.
Single step can be used to derive SNP effects for indirect prediction and for genome-wide association studies, including
computations of the P-values. Alternative single-step formulations exist that use SNP effects for genotyped or for all animals.
Although genomics is the new standard in breeding and genetics, there are still some problems that need to be solved. This
involves new validation procedures that are unaffected by selection, parameter estimation that accounts for all the genomic
data used in selection, and strategies to address reduction in genetic variances after genomic selection was implemented.

Key words:  genomic evaluation, genomic selection, large data, single-step GBLUP

  

Introduction

Genomic selection is now widely practiced across the breeding and genetics industry. This is evident by large-scale genotyping using inexpensive SNP chips. As of November of 2019, genotypes were available for over 3 million U.S. Holsteins (https://queries.uscdcb.com/Genotype/cur_freq.html), over 700,000 for American Angus (S. P. Miller, American Angus Association, Saint Joseph, MO, personal communication), and over 100,000 animals per line for some pig and broiler breeding companies.

Generally, the beginning of genomic selection is attributed to a study by Meuwissen et al. (2001). They used simulated data to conduct analyses with a large number of equally spaced markers; no attempt was made to identify QTLs but

© The Author(s) 2020. Published by Oxford University Press on behalf of the American Society of Animal Science.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
For commercial re-use, please contact [email protected]

Abbreviations

AF       allele frequencies
APY      algorithm for proven and young
BOO      breed of origin
DGV      direct genomic value
DRP      deregressed proofs
DYD      daughter yield deviation
G        genomic relationship matrix
GBLUP    genomic BLUP
GEBV     genomic estimated breeding values
GPU      graphical processing units
GWAS     genome-wide association studies
ICS      independent chromosome segments
LR       linear regression
PCG      preconditioned conjugate gradient
PEV      prediction error variance
ssBR     single-step Bayesian regression
ssGBLUP  single-step genomic BLUP
SVD      singular value decomposition
UPG      unknown parent groups

some markers were by chance closely linked to QTLs. Computations included haplotypes, and analyses were done by methods called BayesA and BayesB that assumed different distributions of haplotype effects. With 2,200 genotyped animals, they obtained prediction accuracies of 0.85. The accuracies were >0.7 after five generations without phenotyping or with only 500 genotyped animals. Such high accuracies with small data created high hopes in the animal breeding community.

The first large-scale genotyping was possible after the introduction of the SNP 50k bovine chip (Matukumalli et al., 2009), which provided an affordable and accurate technology for genotyping. Much subsequent work on the methodology of genomic selection focused on SNP effects and on the creation of a genomic relationship matrix (G) (VanRaden, 2008), a concept that allowed conceptual comparisons between pedigree-based and genome-based predictions. Methods using either SNP effects or genomic relationships led initially to field data analyses using a multistep methodology (VanRaden, 2008; VanRaden et al., 2009), where a regular genetic evaluation by pedigree BLUP (meant as a non-genomic method throughout the paper) is followed by the extraction of pseudo-phenotypes of genotyped animals, a genomic analysis for genotyped animals, and the creation of an index combining results from BLUP and the genomic analysis (VanRaden et al., 2009). The multistep methodology is the natural choice when the genomic and pedigree/phenotypic data are owned by separate organizations.

When genomic selection was introduced, the main focus was on testing models to increase accuracy, in particular increasing the accuracy of prediction by SNP selection (or differential weighting), assuming that it was possible to identify most QTL-closest SNP pairs from data. However, as the data grow bigger, gains with SNP selection become smaller or nonexistent (Karaman et al., 2016). Consequently, most commercial evaluations do not use SNP selection.

The multistep method is relatively complicated and in its initial form relies on the existence of animals (bulls) with highly accurate EBVs from pedigree information. It is also subject to double counting of the genomic information when both parents and progenies are genotyped. Because the genomic information can be expressed as genomic relationships (VanRaden, 2008), Misztal et al. (2009) proposed a single-step evaluation that enhanced the BLUP machinery with a relationship matrix that combines pedigree and genomic relationships. Subsequently, a pedigree-based BLUP with any model of analysis could support genomic models just by replacing the relationship matrix. A combined matrix was first shown by Legarra et al. (2009), and a complete analysis using the so-called single-step genomic BLUP (ssGBLUP) was presented by Aguilar et al. (2010) and Christensen and Lund (2010). In subsequent studies, ssGBLUP was shown to be as accurate as, or more accurate than, multistep analyses.

Initially, the main focus of the single-step research was ensuring compatibility of genomic and pedigree information (Vitezica et al., 2011) because incompatibility creates biases, especially under strong selection. A later focus was extending ssGBLUP to larger numbers of genotyped animals (Legarra and Ducrocq, 2012; Fernando et al., 2014; Liu et al., 2014; Misztal et al., 2014a). Currently, ssGBLUP is the main tool for genomic evaluation in species other than dairy. If a population includes non-genotyped animals with phenotypes, the transition to some form of single step is unavoidable because BLUP, which is used to create the pseudo-observations adopted in multistep, becomes biased by genomic preselection (Patry and Ducrocq, 2011b). The only alternative to ssGBLUP that has been explored is the use of segregation analysis to partially “infer” genotypes of the ancestors of genotyped animals, to later introduce this information in a refined ssGBLUP (Meuwissen et al., 2015). This strategy gave promising results but it is computationally complex and has not been pursued.

Advances in genotyping techniques are allowing sequence data to be generated at a lower cost; therefore, there is an interest to exploit these data (Georges et al., 2019). Sequence data can be used to identify recessive genes, targets for gene editing, and also potential causative SNP that can aid a genetic prediction across breeds or lines (Hayes and Daetwyler, 2019). However, gains with using the potential causative variants for genetic prediction appear to be limited (VanRaden et al., 2017; Fragomeni et al., 2019), with perhaps the exception of across-breed prediction (Moghaddar et al., 2019).

Although the rate of increase in genetic gain delivered by genomic selection can be over 100% in some cases (García-Ruiz et al., 2016), issues have recently emerged. One important issue is the fast reduction in additive genetic variance and more undesirable genetic correlations between important traits (Hidalgo et al., 2020). This reduction is even more noticeable when genomic information is not used for variance components estimation. The same phenomenon may be responsible for the reported reduction of 33% in heritability in computations of genomic predictions for production yield traits to avoid bias in genomic estimated breeding values (GEBV; VanRaden et al., 2014).

From the onset of genomic selection, many ideas were proposed and usually tested by simulation, and many of these ideas were later applied to real data sets, first small then large. Many of these studies led to questions about various aspects of genomic selection, for example: Is it better to use haplotypes instead of SNP? What is the optimal number of SNP? Why are there discrepancies between simulation and field studies? Why is SNP selection less useful with large data? How can we use unknown parent groups (UPG) in genomic models? Is there any limit to genomic selection? The purpose of this paper is to present and evaluate proposed ideas on genomic selection considering most up-to-date experiences with field data.

Exploring Genomic Selection Developments

Initial developments

Genomic selection is generally attributed to a study by Meuwissen et al. (2001) where simulated data for up to 2,200 phenotyped animals with genomic information expressed

as 50k haplotypes. For prediction, haplotype effects were estimated by several methods including treating them as fixed effects, by haplotype-BLUP assuming haplotypes as normal random effects, by BayesA assuming a t-distribution of effects (allowing for large effects), and by BayesB assuming a mixture distribution where most haplotypes had null effects. The accuracy of predicting breeding values for the next generation was the highest using BayesB and reached 0.85. The persistence of accuracies over generations (without additional phenotypes) was excellent, decaying only to 0.72 after an additional five generations of data. Reducing the number of animals with phenotypes to 1,000 only slightly reduced the accuracy to 0.79. The study by Meuwissen et al. (2001) generated great excitement in the animal breeding community, showing the possibility of very high accuracy with small data. However, this turned out to hold only because the simulation was unrealistic, with a small genome, QTLs of large effect, and no selection. Muir (2007) showed low persistence of genomic predictions under selection and dependence of accuracy on population parameters.

Much of the work involving the methodology of genomic selection on a practical side was accomplished by VanRaden (2008) using SNP markers instead of haplotypes. Also, he showed the equivalence of BLUP with SNP effects to genomic BLUP (GBLUP) using G, where $G = ZZ'/k$, with $Z$ being the matrix of gene content, $k = 2\sum_{i=1}^{n_{SNP}} p_i (1 - p_i)$, and $p_i$ the frequency of the ith SNP. While genomic and pedigree inbreeding were highly correlated (r = 0.68) using base allele frequencies (AF) but lowly correlated using current AF (r = 0.12), any AF resulted in similar prediction accuracy. The SE for elements of G was inversely proportional to the square root of the number of markers. He stated that genomic relationships are due to shared alleles, and he related the distribution of such alleles to a study by Stam (1980). The findings of VanRaden (2008) on gene frequencies were validated by Strandén and Christensen (2011), who showed that in SNP BLUP and GBLUP, AF only affect the mean of predictions. The number of shared alleles, also known as independent chromosome segments (ICS), was used for approximating the accuracy of genomic selection based on the number of genotyped animals and heritability (Goddard, 2009); lower Ne means fewer segments to estimate and higher accuracy of genomic selection for the same population size.

Limited dimensionality of genomic information

Genomic prediction in farm animals is possible because of the small effective population size (Ne). Stretches of DNA from overrepresented ancestors (i.e., popular bulls) form relatively few segments called LD blocks (Muir, 2007), shared segments (VanRaden, 2008), or ICS (Goddard, 2009). While the segments are not easily identified and have fuzzy limits (i.e., they are broken at slightly different places across two sibs), they appear indirectly, for example, as singular G that needs to be blended to become full rank. The number of chromosome segments is usually quantified by the formula presented by Stam (1980) as 4NeL, where L is genome length. In a simulated population, Pocrnic et al. (2016a) found that the accuracy of prediction using a recursion was maximized assuming 4NeL segments. They also showed that for a large population the number of segments can be estimated as the number of the largest eigenvalues explaining 98% of the variation in G, with the remaining 2% interpreted as noise. Extension of their studies to farm animals (Pocrnic et al., 2016b) made it possible to determine the number of segments, and indirectly Ne, and the optimal size of the SNP chip for several species (see Table 1).

Genomic prediction does not act on individual segments but on their clusters, where the four largest clusters could account for 10% of the genetic variation (Pocrnic et al., 2019a). Small data allow estimating only the largest eigenvalues (or clusters), but they explain a large portion of the genomic variation in G. Consequently, moderate accuracy of genomic selection can be achieved with small data sets, and large data sets are needed for additional improvements. The same study explains why SNP selection improves accuracy in small but not in large data sets. Genomic selection works by estimating the effects of chromosome segments, and once nearly all are well estimated, the accuracy is high without SNP selection (Karaman et al., 2016) or weighting (Lourenco et al., 2017).

Estimation of haplotypes or SNP effects

If the DNA information is inherited as chromosome segments, it would be natural to base the estimation on haplotypes rather than on SNP effects. Using haplotypes would potentially account for epistasis within each block, as for instance, a segment of 5 SNP can be estimated as having $2^5$ different SNP combinations, as opposed to only 5 SNP effects. In practice, the difference in accuracy in models with haplotypes and SNP effects is negligible (Cuyabano et al., 2015; Jónás et al., 2016). Problems with using haplotypes are the need for a complex data preparation and arbitrary choices in their definition, poor estimates for rare haplotypes, and the existence of spurious haplotypes due to genotyping errors.

Multistep genomic evaluation

A study by VanRaden et al. (2009) using field data sets established a mature multistep methodology for genomic selection in dairy cattle. The steps included running pedigree-based BLUP with the national database and creating pseudo-observations for genotyped animals (bulls) such as daughter yield deviation (DYD). These pseudo-observations are fit into a model estimating SNP effects assuming normal (linear) or non-normal (nonlinear) distributions. Finally, genomic predictions for a genotyped animal or candidate to selection are obtained combining pedigree and genomic-based predictions into an index:

GEBV = w1 PA + w2 DGV − w3 PI,

Table 1. Estimated number of chromosome segments, effective population size, and the optimal size of SNP chip following Pocrnic et al. (2016)

Species           Estimated number   Estimated optimal   Estimated effective
                  of segments        size of SNP chip    population size
Broiler chicken   4.2k               50k                 44
Pig               4.1k               49k                 48
Angus cattle      10.6k              127k                113
Jersey cattle     11.5k              138k                101
Holstein cattle   14.0k              168k                149
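
The eigenvalue rule behind Table 1 (count the largest eigenvalues of G that explain 98% of its variation) can be illustrated with a short sketch. This is a minimal illustration on randomly simulated genotypes, not the procedure or data of Pocrnic et al.; the function names, the simulated sample size, and the allele frequency of 0.3 are assumptions made only for the example. The first part simply divides the chip sizes in Table 1 by the segment counts.

```python
import numpy as np

# Values copied from Table 1: (segments in thousands, chip size in thousands, Ne).
TABLE_1 = {
    "Broiler chicken": (4.2, 50, 44),
    "Pig": (4.1, 49, 48),
    "Angus cattle": (10.6, 127, 113),
    "Jersey cattle": (11.5, 138, 101),
    "Holstein cattle": (14.0, 168, 149),
}


def chip_to_segment_ratio(table):
    """SNP on the estimated optimal chip per estimated chromosome segment."""
    return {species: chip / seg for species, (seg, chip, _ne) in table.items()}


def genomic_relationship(M):
    """G = ZZ'/k, with Z the gene content (0/1/2) centered by 2p and
    k = 2 * sum(p_i * (1 - p_i)); p is taken from the sample itself."""
    p = M.mean(axis=0) / 2.0
    Z = M - 2.0 * p
    k = 2.0 * np.sum(p * (1.0 - p))
    return Z @ Z.T / k


def count_eigenvalues(G, fraction=0.98):
    """Number of largest eigenvalues of G explaining `fraction` of its variation."""
    eig = np.linalg.eigvalsh(G)[::-1]      # eigenvalues, largest first
    eig = np.clip(eig, 0.0, None)          # remove tiny negative rounding errors
    cumulative = np.cumsum(eig) / eig.sum()
    return int(np.searchsorted(cumulative, fraction) + 1)


ratios = chip_to_segment_ratio(TABLE_1)
for species, ratio in ratios.items():
    print(f"{species}: {ratio:.1f} SNP per segment")

# Hypothetical data: 50 animals, 500 independent SNP (no real LD structure).
rng = np.random.default_rng(1)
M = rng.binomial(2, 0.3, size=(50, 500)).astype(float)
G = genomic_relationship(M)
n_dim = count_eigenvalues(G, fraction=0.98)
print("eigenvalues explaining 98% of the variation in G:", n_dim)
```

In Table 1 the ratio of chip size to number of segments is close to 12 for every species. With independent simulated SNP the eigenvalue count stays near the number of animals; the limited dimensionality discussed in the text arises from the shared segments of real populations, which this toy data set does not reproduce.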

where GEBV is the genomic estimated breeding value, PA is parent average, DGV is direct genomic value, and PI is parental index created based on pedigree relationships for genotyped animals. Essentially, PI removes double counting of relationship information because part of PA is included in DGV, and weights w1 to w3 could be approximated from reliabilities of each component. Genomic prediction in the aforementioned paper was validated by forward prediction, where predictions for young bulls using truncated data were compared with pseudo-phenotypes of those bulls obtained by BLUP with complete data. That study also showed that increasing the reference size of the genotyped population has a higher impact on prediction accuracy than the number of SNP markers and that assuming non-normal SNP distribution has a positive effect only on traits with large effect QTL. Instead of creating DYD, which is time-consuming, pseudo-observations could be calculated as deregressed proofs (DRP) (VanRaden and Wiggans, 1991; Garrick et al., 2009; Wiggans et al., 2011).

Another version of a multistep method depended on using genomic predictions called molecular breeding value as a correlated trait (Kachman, 2008; MacNeil et al., 2010). In this version, GEBVs for genotyped animals based on previously estimated SNP effects were added as an extra trait with genetic correlation computed separately. Because genomic predictions indirectly include parent average, it was hard to account for double counting, and genetic trend was abnormally high even for early time periods when genomic selection was not practiced (Lourenco et al., 2018).

Single-step genomic evaluations

The multistep methodology was well suited for scenarios where the phenotype and genomic data belong to separate organizations, and especially when most information can be condensed in a small number of animals to genotype. This includes dairy populations with a large number of average information bulls with many daughters. When a population includes both males and females, creating DRP free of double counting is hard, especially when genotyping includes parents and their progeny (Legarra et al., 2014). As the genomic information can be used to capture relationships, Misztal et al. (2009) proposed combining pedigree and genomic relationships into a combined relationship matrix. Subsequently, pedigree-only analyses could be converted to genomic analyses only by replacing the pedigree relationship matrix by the combined matrix, and the steps of constructing pseudo-phenotypes and the index would no longer be needed. This combined matrix (H) was first presented by Legarra et al. (2009) who proposed to extend the genomic information to non-genotyped animals based on the joint distribution of breeding values of non-genotyped (u1) and genotyped (u2) animals:

$$
H = \begin{bmatrix} \mathrm{var}(u_1) & \mathrm{cov}(u_1, u_2) \\ \mathrm{cov}(u_2, u_1) & \mathrm{var}(u_2) \end{bmatrix}
= \begin{bmatrix} A_{11} + A_{12} A_{22}^{-1} (G - A_{22}) A_{22}^{-1} A_{21} & A_{12} A_{22}^{-1} G \\ G A_{22}^{-1} A_{21} & G \end{bmatrix}
$$

where subscripts 1 and 2 refer to non-genotyped and genotyped animals, respectively; A is the pedigree relationship matrix and G is the genomic relationship matrix. Christensen and Lund (2010) arrived at the same results, using the notion of predicting the genotype of non-genotyped individuals using pedigree information.

The inverse of H was presented by Aguilar et al. (2010) and Christensen and Lund (2010) as:

$$
H^{-1} = A^{-1} + \begin{bmatrix} 0 & 0 \\ 0 & G^{-1} - A_{22}^{-1} \end{bmatrix}
$$

In the analyses done by Aguilar et al. (2010), the reliability of the new method called ssGBLUP was as high or higher than in multistep.

Further research involving ssGBLUP was split into several directions. The first was compatibility between pedigree and genomic relationships, as incompatibility can generate biases or losses of accuracy under selection (Vitezica et al., 2011). The second was an extension to a very large number of genotyped animals as initial implementation was based on dense matrices (Aguilar et al., 2011), which restricted the number of genotyped animals to about 100k. Finally, there was an interest in accommodating SNP weighting via a weighted G, especially with potential causative SNP obtained from sequence data. The interest in single-step methods increased as genomic selection was underway, because pedigree BLUP and, therefore, multistep methods were becoming biased due to genomic preselection (Patry and Ducrocq, 2011b), whereas ssGBLUP accounts for preselection.

Different single-step formulations

Several alternative single-step formulas were proposed. These included equations where G is not inverted (Legarra and Ducrocq, 2012), where SNP effects are estimated for the genotyped animals and a polygenic effect is fit for non-genotyped animals (Legarra and Ducrocq, 2012; Liu et al., 2014), and where SNP effects were fit for all animals using imputed genotypes (Fernando et al., 2014; Taskinen et al., 2017). The purpose of these formulas was to reduce computations with many genotyped animals. As opposed to a regular ssGBLUP, which can be applied to an existing BLUP software just by replacing the relationship matrix, SNP-based models require new programming. Meuwissen et al. (2014) proposed an alternative single-step approach by combining identical by descent and identical by state approaches.

Compatibility between genomic and pedigree relationships

An important issue in single-step methodology is the compatibility of genomic and pedigree relationships. While the genomic relationships indirectly account for all the ancestors but have an arbitrary scale depending on gene frequencies, the pedigree relationships have a well-defined scale but are limited by the depth and completeness of the pedigree. When pedigrees were complete up to a base population, scaling G for compatibility (same means for diagonals and off-diagonals) with the pedigree relationship matrix for genotyped animals (A22) improved accuracy and eliminated bias for a population under strong selection (Chen et al., 2011a; Vitezica et al., 2011). With no selection, the impact of scaling was minimal. Similar scaling could be accomplished automatically by using base population gene frequencies (Strandén and Christensen, 2011; Christensen, 2012), although finding those frequencies when the base population is not genotyped, for example, using the method of Gengler et al. (2008), can be time-consuming, and it suffers sometimes from the lack of a clear definition of base population as described below.

When the base populations are heterogeneous with missing pedigrees across generations, as is typical in ruminants, ssGBLUP may diverge or become biased, and the standard way to ensure convergence was by including a parameter ω as in Tsuruta et al. (2011):

$$
H^{-1} = A^{-1} + \begin{bmatrix} 0 & 0 \\ 0 & G^{-1} - \omega A_{22}^{-1} \end{bmatrix}
$$
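
The two expressions for the inverse of H given above can be assembled directly from A, G, and the list of genotyped animals. The following is a minimal dense-matrix sketch for a toy pedigree, not the implementation used in any production software; the five-animal pedigree, the numerical values in G, the function names, and the value ω = 0.7 are all hypothetical and chosen only for illustration.

```python
import numpy as np


def pedigree_relationship(sires, dams):
    """Tabular method for the numerator relationship matrix A.
    Animals must be ordered so that parents precede progeny; -1 = unknown parent."""
    n = len(sires)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sires[i], dams[i]
        # Diagonal: 1 + inbreeding (half the relationship between the parents).
        A[i, i] = 1.0 + (0.5 * A[s, d] if s >= 0 and d >= 0 else 0.0)
        for j in range(i):
            a = 0.0
            if s >= 0:
                a += 0.5 * A[j, s]
            if d >= 0:
                a += 0.5 * A[j, d]
            A[i, j] = A[j, i] = a
    return A


def h_inverse(A, G, genotyped, omega=1.0):
    """H^-1 = A^-1 + [[0, 0], [0, G^-1 - omega * A22^-1]].
    omega = 1 gives the original formula; other values follow the omega variant."""
    block = np.ix_(genotyped, genotyped)
    Hinv = np.linalg.inv(A)
    Hinv[block] += np.linalg.inv(G) - omega * np.linalg.inv(A[block])
    return Hinv


# Hypothetical pedigree: animals 0 and 1 are founders, 2 and 3 are full sibs,
# and 4 is the progeny of 2 and 3. Animals 2, 3, and 4 are genotyped.
sires = [-1, -1, 0, 0, 2]
dams = [-1, -1, 1, 1, 3]
genotyped = [2, 3, 4]

A = pedigree_relationship(sires, dams)
A22 = A[np.ix_(genotyped, genotyped)]

# Hypothetical full-rank G on the same scale as A22 (illustrative numbers only).
G = np.array([[1.02, 0.45, 0.78],
              [0.45, 0.98, 0.70],
              [0.78, 0.70, 1.22]])

Hinv = h_inverse(A, G, genotyped)                    # omega = 1
Hinv_omega = h_inverse(A, G, genotyped, omega=0.7)   # assumed omega value

# Inverting H^-1 recovers G in the genotyped block of H.
H = np.linalg.inv(Hinv)
print(np.round(H[np.ix_(genotyped, genotyped)], 3))
```

Only the block of A⁻¹ for genotyped animals is modified, which is why a pedigree-based BLUP program can be turned into ssGBLUP by swapping the relationship matrix. Dense inversion as used here is practical only for small examples.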

The parameter ω compensates for incomplete pedigree and incomplete accounting of inbreeding (Misztal et al., 2017). Incompleteness of pedigree in A22 can be minimized by truncating old data and pedigree (Misztal et al., 2013b) or by assigning nonzero inbreeding to unknown parents (VanRaden, 1992); old missing pedigree becomes irrelevant with truncation of data. Truncation to two generations of phenotypes and three generations of pedigree reduced bias without lowering accuracy (Lourenco et al., 2014; Howard et al., 2018).

In a single-step SNP-based model known as single-step Bayesian Regression (ssBR) developed by Fernando et al. (2014), the compatibility between genomic and pedigree information is provided partially by the use of fixed effects in a model for genotyped animals (Hsu et al., 2017), similar to Vitezica et al. (2011) where this effect is implicitly fit as random. This arises from the findings of Strandén and Christensen (2011) that solutions from SNP BLUP and GBLUP are independent of gene frequencies if the model includes a mean. However, the missing pedigree problem is present in all single-step formulations and it becomes more complex with several base populations as described below.

Missing pedigree and UPG

In several species, there is a need to define several populations. This is the case in ruminants with missing parents (whereas unknown parents of animals born in 2000 are better than unknown parents of animals born in 2016), and the case in pigs and birds (with several lines collapsing into one, and with 2-, 3-, and 4-way crosses). These base populations have different means due to selection and not considering them leads to strong biases.

In BLUP, the genetic merit of these different base populations is often modeled by genetic groups or UPG (Quaas and Pollak, 1981; Quaas, 1988; Westell et al., 1988). In ssGBLUP, when UPG are applied only to pedigree relationships (A) as follows:

$$
H^{*} = A^{*} + \begin{bmatrix} 0 & 0 & 0 \\ 0 & G^{-1} - A_{22}^{-1} & 0 \\ 0 & 0 & 0 \end{bmatrix}
$$

the convergence rate can be slow or no convergence may be reached (Tsuruta et al., 2014; Matilainen et al., 2016), partly because UPG were ignored in pedigree relationships for genotyped animals (A22). Indeed, construction of A22 implicitly assumes complete pedigrees. Misztal et al. (2013b) revised UPG equations to include groups also in the genomic portion of H based on the Quaas–Pollak (QP) transformation:

$$
H^{*} = A^{*} + \begin{bmatrix} 0 & 0 & 0 \\ 0 & G^{-1} - A_{22}^{-1} & -\left(G^{-1} - A_{22}^{-1}\right) Q_2 \\ 0 & -Q_2' \left(G^{-1} - A_{22}^{-1}\right) & Q_2' \left(G^{-1} - A_{22}^{-1}\right) Q_2 \end{bmatrix}
$$

When UPGs were applied to all components of H as above, convergence dramatically improved for a multitrait model in the Nordic dairy cattle population (Matilainen et al., 2016). Revised UPGs also worked well for the U.S. Holstein data up to 2014 (Misztal et al., 2017). However, using data updated to 2015, Masuda et al. (2018a), based on cross-validation, reported lower reliabilities using revised UPG than not using UPG at all. While most animals genotyped earlier were potentially elite, with complete pedigree, most genotyped animals after 2014 were commercial cows, often with incomplete pedigree and high pedigree error rate (Bradford et al., 2019a).

It is not clear whether the equations for UPG in ssGBLUP should be considered for G for a single breed as genomic relationships are not affected by missing pedigree, and, therefore, UPG are automatically accounted for. In other words, if all animals were genotyped, terms involving UPG should disappear from H* above. Tsuruta et al. (2019a) found that removing G from the equation above improved accuracy and reduced bias. In GBLUP, using UPG for G did not increase accuracy for multi-breed populations (Plieschke et al., 2015).

Missing relationships also cause underestimation of inbreeding as animals with missing parents are automatically treated as not inbred. One solution is assigning nonzero inbreeding to missing parents (VanRaden, 1992; Lutaaya et al., 1999; Aguilar and Misztal, 2008). Such assignment improved convergence rate and bias in ssGBLUP in Holsteins (Misztal et al., 2017; Tsuruta et al., 2019a) although it only slightly affected the accuracy.

The concept of metafounders

Legarra et al. (2015) proposed to account for UPG while providing proper scaling by generalizing UPG to metafounders. In their approach, G would be derived using 0.5 AF as an “absolute reference” (Christensen, 2012), and A would be scaled for compatibility with G using relationships among and within metafounders, which are seen as pseudo-individuals. These relationships represent sizes and overlaps of the different base populations (Legarra et al., 2014). They can be estimated in such a way that they account for scaling, unaccounted inbreeding, different genetic levels (e.g., when using multi-breed animals or selected populations), and multiple breeds and crosses. Several methods were proposed to estimate the relationships, and, in practice, they imply estimating gene frequencies in the different base populations (Garcia-Baccino et al., 2017). In simulations and real data, the concept of metafounders delivered the least biased predictions (Garcia-Baccino et al., 2017; Meyer et al., 2018; Bradford et al., 2019b). When applied to dairy cattle, the relationships across metafounders could be well estimated only for metafounders associated with a sufficient number of genotypes (S. Tsuruta, University of Georgia, Athens GA, personal communication). In dairy sheep, the use of metafounders reduces biases in predictions and instability of UPG estimates for small data sizes (F Macedo, INRAE, Toulouse, France, personal communication).

Evaluations of crossbred populations

Genomic evaluation of crossbred populations may be separated into two types, for specific crosses or for complex crosses. In pigs and chicken, there is an interest in using F1 and possibly three- to four-way crosses for the evaluation of purebreds on the commercial scale. In beef and dairy, the interest is to have a joint analysis of many breeds with complex crosses (e.g., “Kiwi” Jersey–Holstein crosses in New Zealand, 10+ breed crosses by International Genetic Solutions, and 50+ beef crosses by The Irish Cattle Breeding Federation). More recently, across-breed prediction with genomic data is not successful (Erbe et al., 2012; Kachman et al., 2013) because the breeds do not share the same chromosome segments. Also, the crossbreeds generate limited information if the amount of crossbred data is small and if they are progeny of very few parents (Pocrnic et al., 2019b). Genetic by environment interaction and purebred-crossbred correlations can be considered using multiple-trait models (Xiang et al., 2016; Vandenplas et al., 2017). With purebreds and defined crosses (F1), the genomic relationships can be adjusted
separately for each breed combination using gene frequencies or other methods (Makgahlela et al., 2014; Lourenco et al., 2016), although the impact of such adjustment is small if the selection pressure is low. With many crosses, a simple approach is to ignore gene frequencies and have one set of SNP effects (Golden et al., 2018) or one G (Mäntysaari et al., 2017) for all breeds and breed combinations. Steyn et al. (2019) simulated five breeds using either shared or separate relationships. In the second case, the accuracy was compromised if the number of SNPs was reduced from 45k to 9k, and despite all breeds having identical QTLs, interbreed predictions had low accuracy. In U.S. dairy, SNP effects are estimated separately for each breed as otherwise the predictions would be based on the dominating breed, Holsteins (VanRaden et al., 2020); phenotypes of crossbreds are not used in the regular genetic evaluation of purebreds because of concerns of compromising the evaluation of purebreds. The most refined method for the F1 crossbreds is by phasing haplotypes in crossbreds originating from two parental lines and building a model with two H matrices, one per breed (sometimes called the breed of origin [BOO] model) (Christensen et al., 2014). Xiang et al. (2016) observed an increase in accuracy compared with fitting a single H matrix in an analysis of Landrace, Yorkshire, and crosses. The method becomes complex for more complex crosses as the origin of alleles in each crossbred is more difficult to establish.

The concept of metafounders provides a convenient solution to ssGBLUP applied to purebreds and crossbreds (Christensen et al., 2015; Xiang et al., 2017). In such a case, the relationships across breeds represent a distance from a common genetic origin (usually a small relationship, but potentially different across pairs of breeds), and the variances within breed reflect correct scaling separately for each breed and for all breeds simultaneously (Legarra et al., 2015). Xiang et al. (2017) fit this model treating each breed combination as a different trait to account for G × E and observed the same accuracy as in the BOO model of Xiang et al. (2016).

Modifying single step for large data sets

Single-step GBLUP requires explicit or implicit computations of $G^{-1}$ and $A_{22}^{-1}$. When created using dense matrix techniques (Aguilar et al., 2011), the practical limit is about 100k animals. This is because computations increase cubically and storage quadratically with the number of genotyped animals. Several strategies were proposed to overcome size limitations.

Indirect computations of $A_{22}^{-1}$

Matrix $A_{22}^{-1}$ is dense and, therefore, cannot be created efficiently for a large number of genotyped animals. Henderson (1976) showed that the inverse of a submatrix of A could be obtained based on the rules for inversion of a partitioned matrix:

$$
A_{22}^{-1} = A^{22} - (A^{12})' (A^{11})^{-1} A^{12}
$$

Here the superscripts denote the corresponding blocks of $A^{-1}$, with 1 for non-genotyped and 2 for genotyped animals. When only a product of $A_{22}^{-1}$ and a vector is required in the iteration process as in the preconditioned conjugate gradient (PCG) algorithm, that product can be calculated sequentially every round as follows (Masuda et al., 2017; Strandén et al., 2017):

$$
A_{22}^{-1} q = [A^{22} - (A^{12})' (A^{11})^{-1} A^{12}] q
$$

where the product:

$$
s = (A^{11})^{-1} A^{12} q
$$

is computed as a solution to:

$$
A^{11} s = A^{12} q
$$

using sparse matrix techniques, in particular, because $A^{11}$ is sparse and small. Masuda et al. (2017) found that, for a U.S. Holstein population, this algorithm required 2 min to set up and less than 1 s per round of multiplication.

Algorithm for proven and young

Because of small effective population size in farm animals, G has a rank of about 5k for pigs and chicken to about 15k for beef and dairy (Pocrnic et al., 2016b), indicating the existence of that many LD blocks or chromosome segments. Subsequently, the inverse of G can be obtained by recursion on a number of “core” animals equal to the rank of G, indirectly assuming that breeding values of N animals contain the same information as the effects of N chromosome segments. When animals are designated as core (c) or noncore (n), the inverse of G can be directly obtained as (Misztal, 2016):

$$
G_{APY}^{-1} = \begin{bmatrix} G_{cc}^{-1} & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} -G_{cc}^{-1} G_{cn} \\ I \end{bmatrix} M^{-1} \begin{bmatrix} -G_{nc} G_{cc}^{-1} & I \end{bmatrix}
$$

where M is a diagonal matrix with elements:

$$
m_{i} = g_{ii} - g_{ic} G_{cc}^{-1} g_{ci}
$$

where i refers to the ith genotyped, noncore animal. This method has almost a linear cost (computations and memory) with the number of animals (Fragomeni et al., 2015) and has been successfully applied to 2.3 M genotyped animals (Tsuruta et al., 2019b). The choice of core animals for recursion is not critical for accuracy when the number of core animals is sufficient but influences the convergence rate; the random choice is preferable (Bradford et al., 2017). Lately, Pocrnic et al. (2019a) found that accuracies obtained with N core animals are like those obtained with G ignoring all but the largest N eigenvalues. This explains why the accuracy with the algorithm for proven and young (APY) using 25% of the optimal number of core animals is almost the same, as 25% of important eigenvalues explain 90% of the genetic variation in G. In fact, the recursion acts not on individual chromosome segments but on their clusters.

Inverse by singular value decomposition

The inverse of G can be derived from the eigenvalue decomposition:

$$
G = U D U'
$$

where U is a matrix of eigenvectors and D is a matrix of eigenvalues. If all eigenvalues are positive, the inverse of G is:

$$
G^{-1} = U D^{-1} U'
$$

If G has small rank, only a small fraction of eigenvalues will be meaningful. Let $D_{t}$ indicate a fraction of D with non-negligible eigenvalues, and let $U_{t}$ be the corresponding eigenvectors. Then:

$$
G_{t}^{-1} = U_{t} D_{t}^{-1} U_{t}'
$$

While eigenvalue decomposition of G requires creating G explicitly and can be very expensive, a less expensive alternative, when there are more genotyped animals than SNP, is the
singular value decomposition (SVD) of the matrix of SNP content (Z), where $Z = U D^{0.5} V'$. The SVD for a matrix of 720k animals by 60k SNP takes less than a day (Y. Masuda, University of Georgia, Athens, GA, personal communication). The SVD concept can be applied separately for each chromosome (Ødegård et al., 2018).

Inverse by the Woodbury formula

Mäntysaari et al. (2017) proposed an inverse of $G = ZZ' + I\varepsilon$ based on the Woodbury formula to overcome computing challenges when the number of genotyped animals is greater than the number of SNP:

$$
G^{-1} = \frac{1}{\varepsilon} I - \frac{1}{\varepsilon} Z \left( \frac{1}{\varepsilon} Z'Z + I \right)^{-1} Z' \frac{1}{\varepsilon}
$$

where $Z'Z$ is the design matrix of SNP BLUP and I is an identity matrix with the same dimension. The formula is an exact inversion but is based on an arbitrary value of ε (e.g., 0.05I or 0.05$A_{22}$), without which G could not be full rank. The “Woodbury” $G^{-1}$ is dense and is not used explicitly. Its use is only for PCG systems in which only a product of this matrix by a vector is desired, being reformulated as:

$$
G^{-1} q = \frac{1}{\varepsilon} \left\{ I - Z (Z'Z + I\varepsilon)^{-1} Z' \right\} q = \frac{1}{\varepsilon} \left\{ I - Z (U D U')^{-1} Z' \right\} q = \frac{1}{\varepsilon} \left\{ I - S S' \right\} q,
$$

with

$$
S = Z U D^{-1/2}
$$

Matrix S has dimensions equal to the number of animals by the number of SNP. In practice, the SNP BLUP design matrix $Z'Z$ is not full rank, and one dimension can be reduced to the actual rank (5k to 15k for one breed) by truncating U and D to eliminate small eigenvalues.

Single-step Bayesian Regression

If 50k SNP are enough for predictions, an alternative idea was to impute genotypes of non-genotyped animals, resulting in the same 50k SNP effects to estimate regardless of the number of genotyped animals. Let $u_{2}$, the vector of breeding values for genotyped animals, be equal to Za, where a is a vector of SNP effects. Legarra et al. (2009) showed that the conditional distribution of breeding values for non-genotyped animals given those of genotyped animals has an expectation equal to $A_{12} A_{22}^{-1} u_{2}$. Replacing $u_{2}$ by Za:

$$
u_{1} = E(u_{1} \mid u_{2}) + \varepsilon = A_{12} (A_{22})^{-1} Z a + \varepsilon = T a + \varepsilon
$$

where T can be called an imputation matrix for non-genotyped animals and ε can be called an imputation error. Then, the breeding values in an animal model can be replaced by:

$$
u = \begin{bmatrix} u_{1} \\ u_{2} \end{bmatrix} = \begin{bmatrix} T \\ Z \end{bmatrix} a + \begin{bmatrix} \varepsilon \\ 0 \end{bmatrix}
$$

Regardless of the number of animals, the number of “genomic” unknowns is equal to the number of SNP, although there is an additional uncorrelated effect ε with a simple relationship structure (Fernando et al., 2014). The model was reformulated for economy of memory by Fernando et al. (2016) who called it “hybrid model,” although the same model had already been proposed by Legarra and Ducrocq (2012). As the imputation was expensive, and the model is conceived to use Gibbs sampling methods, the implementation of ssBR in the BOLT software used graphics processing units (GPU; Garrick et al., 2018). Compared with ssGBLUP, ssBR allows the user to estimate SNP effects directly but the implementation of complex models (e.g., correlated maternal effects with multiple traits) is quite complex. The method of ssBR was used for a multi-breed evaluation done by the Simmental association for more than 10 breeds (Golden et al., 2018) but they decreased the number of SNPs from 50k to about 2.5k preselected SNP, contrary to practice in all other species, where the idea of preselecting markers was abandoned because an optimal subset of markers may not be optimal a few generations later. A more general (and simple) formulation of the ssBR model was given by Taskinen et al. (2017).

Other approaches

Legarra and Ducrocq (2012) developed an asymmetric method where G was not inverted, but the method did not scale up well. Also, both Legarra and Ducrocq (2012) and Liu et al. (2014) proposed methods that used SNP effects estimated for genotyped animals. Vandenplas et al. (2019) showed that such models when solved by the PCG algorithm require a special preconditioner for convergence.

Preselection bias

Under genomic selection, BLUP becomes biased (Patry and Ducrocq, 2011a, 2011b) due to preselection on Mendelian sampling; for instance, only offspring that have received the “good” alleles from a sire get to be recorded. This has an impact on multistep methods which use BLUP as a first step, because they will tend to penalize genomically selected animals and, therefore, to underestimate the genetic trend. The bias can be corrected for (Wiggans et al., 2011, 2012), but the corrections need to be reevaluated as genotyping increases. Single-step GBLUP is expected to be resistant to selection bias (VanRaden et al., 2012; Legarra et al., 2014) as it considers all available information jointly. Masuda et al. (2018b) ran evaluations with BLUP and ssGBLUP for production traits in U.S. Holsteins. They found that the trends for BLUP level off, when they should actually increase, whereas trends for ssGBLUP were consistent. Based on the work at UGA in dairy and in pigs (unpublished), typical trends for genotyped animals by BLUP and ssGBLUP indicating preselection in BLUP are shown in Figure 1. The preselection bias can intensify when more animals are genotyped.

Differences between trends from BLUP and ssGBLUP can be used indirectly as a measure of the effectiveness of the genomic selection. If the trend by ssGBLUP is increasing and the trend by BLUP is lower, genomic selection is successful. If the trends by both methods are identical, genomic selection does not have an impact over the regular selection. If, in an extreme case, the trends by ssGBLUP decrease, it means either poor implementation of genomic selection or a change in the selection objectives.

Validation of genomic predictions

Genomic evaluations are validated by realized accuracies or reliabilities computed by comparing predictions based on incomplete data with predictions or phenotypes based on complete data (see the reviews by Daetwyler et al. [2013] and Legarra and Reverter [2018]). Several types of validations are currently applied, and each one is suitable for a different data structure. The k-fold cross-validation depends on splitting the population into n samples
and predicting phenotypes of one sample from the remaining samples (Saatchi et al., 2011). It is primarily used for small data sets when only one generation is genotyped or when other methods cannot be applied. As it follows from the decomposition of GEBV (VanRaden and Wright, 2013; Lourenco et al., 2015a), accuracies by clustering methods, such as the k-fold, depend on the algorithm for creating the clusters. In particular, BLUP may emerge as the best method if most animals in each cluster are progeny of the same ancestors.

The validation needs to consider that the breeder wants to predict the next generation from former ones, in other words, forward and not backward or “sideways.” Thus, another validation method is based on a comparison of pseudo-observations of sires (DYD or DRP) with their (G)EBV obtained without their daughters’ information (VanRaden et al., 2009). This validation type is only realistic for populations with sires that have large progeny sizes and in which phenotype recording is mainly on progeny, such as in dairy cattle. If pseudo-observations are computed by BLUP under genomic selection, this validation may be biased by preselection (Masuda et al., 2018b). If pseudo-observations are computed by ssGBLUP, the bias can be avoided but there is a danger of double counting of the genomic information, especially if progeny sizes are small. Yet another validation method that is called predictive ability or predictivity can be used when the validation animals have only their own records but not progeny (Legarra et al., 2008). It is based on correlations between GEBV obtained without a phenotype and the phenotype adjusted for fixed effects. However, it can only be computed for simple models and depends on the quality of adjustments (Legarra and Reverter, 2018). Accuracies based on validation are depressed by selection and, therefore, are lower than individual theoretical accuracies based on prediction error variance (PEV; Bijma, 2012; Lourenco et al., 2015a).

A completely different approach to validation was taken by Legarra and Reverter (2018) in a method called LR, which stands for linear regression. The method LR examines regressions and correlations of (G)EBV using complete and partial data sets while accounting for the relatedness of animals in the validation and additive variances under selection. The advantage of method LR is the ability to support any model and any data structure. For example, Bermann et al. (2020) were able to calculate the accuracy of evaluations for a threshold model. However, the method requires the additive variance for the validation population, which may be hard to estimate as typically these animals are a subset of selected animals. Without such a variance, only relative comparisons among methods are possible, although they are useful to rank methods.

Figure 1. Trend of (G)EBV with ssGBLUP (solid) and BLUP (dashed) indicating preselection bias in BLUP.

Individual theoretical accuracies

Individual accuracies are published with (G)EBV as a measure of precision and they are based on true or approximated PEV derived from mixed model equations (Henderson, 1984). The PEV can be obtained either via efficient matrix inversion, for example, by REML with sparse matrix package YAMS (Masuda et al., 2015) or via Gibbs sampling (Tsuruta et al., 2017; Garrick et al., 2018). This is affordable for up to ~100K genotyped individuals. The last option can support larger data sets if the computation is by GPU (Garrick et al., 2018). For complex models and large populations, the computation of PEV is usually too expensive and approximations are used instead. With genomic information, the PEV for the ith animal can be approximated as (Misztal et al., 2013a):

$$
\frac{PEV_{i}}{\sigma_{e}^{2}} \approx \frac{1}{\dfrac{\sigma_{e}^{2}}{\sigma_{a}^{2}} + d_{i}^{r} + d_{i}^{p} + d_{i}^{g}}
$$

where d are contributions (in terms of effective daughters or observations) due to pedigrees (r), phenotypes (p), and genomic information (g), and $\sigma_{a}^{2}$ and $\sigma_{e}^{2}$ are additive and residual variances, respectively. Approximate contributions due to pedigree and phenotypic information were determined by earlier studies (Misztal and Wiggans, 1988; Meyer, 1989; VanRaden and Wiggans, 1991).

With the multistep SNP model, the contribution due to genomic information could be calculated by inversion for any number of genotyped animals (VanRaden et al., 2011; Liu et al., 2017b). To avoid double counting, the calculations exclude the genomic information that is already included in the pedigree information. In ssGBLUP, the genomic contribution can be calculated by combined differences between genomic and pedigree relationships (Misztal et al., 2013a). Edel et al. (2019) provided formulas for avoiding double counting in ssBR. Efficient computation of genomic accuracies for any model and data set is still a research topic but not a hot one because when the models are too large for direct inversion, genomic predictions are accurate enough for selection.

Genetic parameters under genomic selection

Plant breeders estimate variance components at each genetic evaluation, partly because they have several random effects (e.g., blocks) and partly because their data sets are small. In contrast, animal breeders tend to use either once-in-a-while estimates or pedigree-based estimates for genomic evaluation purposes, for example, as in VanRaden (2008). Genetic parameters can be estimated with genomic information using ssGBLUP and normal tools such as REML or Markov chain Monte Carlo via Gibbs sampling. The use of the genomic information increases the costs of computations because the inverse of G is usually dense, whereas non-genomic mixed model equations are sparse. Masuda et al. (2014) developed a sparse matrix package that recognizes and processes dense blocks rapidly. A four-trait single-step AIREML model with 15k genotyped animals took less than 1 h with the new package (Masuda et al., 2015); the computations increase cubically with the number of genotyped animals and of traits.

Comparing genomic and pedigree-based estimates of variance components relies on the compatibility of genomic and pedigree information (Legarra, 2016). Without selection and with a complete pedigree, the estimates of variance components ignoring or using the genomic information are usually similar, although with the genomic information they have lower standard errors (Forni et al., 2011). Under strong
selection, the estimates ignoring the genomic information are biased (Gao et al., 2019; Hidalgo et al., 2020). The computed bias due to preselection depends on the accuracy of modeling and intensity of selection. For example, the popular QMSim program for simulation of genomic data (Sargolzaei and Schenkel, 2009) only performs BLUP selection. If various types of single-step methods show different results despite being equivalent models, the actual variances are affected by small details in the models (e.g., Gao et al., 2019).

Before genomic selection, the genetic parameters were thought to be generally stable, but this was not studied in depth. Under genomic selection, there are indications of rapidly changing parameters, perhaps due to the Bulmer effect (Van Grevenhof et al., 2012; Hidalgo et al., 2020). For instance, bias for U.S. dairy genomic evaluations decreased when heritability was reduced to about 70% to 50% of the original value (Wiggans et al., 2012; Misztal et al., 2017), which is an indicator of overestimated heritability. Hidalgo et al. (2020) used a Gibbs sampling approach to analyze the changes in genetic parameters for growth and fitness traits in pigs. To make the computations possible, analyses were done in time slices of 3 yr, and genotypes were restricted to parents and animals with records. Over time, heritabilities for growth were reduced by one half and the antagonistic genetic correlations between growth and fitness traits became almost twice as strong. Estimates without genomic information were quite different. The aforementioned study illustrated the tradeoffs in parameter estimation under genomic selection. Without genomic information, the estimates may be biased, and with all the genomic information available the computations are expensive. Cesarani et al. (2019) reported biased variance component estimates under genomic selection when the genomic information was truncated or too few generations were used. A modest compromise is to restrict genotypes only to those animals on which selection was more intense and to remove genotypes of all young animals and possibly of nonparents. Genetic parameter estimation with a large number of genotypes can be possible in GBLUP when the APY algorithm is applied. However, in ssGBLUP, $A_{22}^{-1}$ is relatively dense and using it in computations eliminates most of the gains due to using a sparse $G_{APY}^{-1}$.

Stability of GEBV

Under BLUP, the evaluation of an animal depends nearly only on its phenotype, parents, and progeny. Therefore, EBV for animals with no new information are stable even if the accuracy is low (and PEV high). In genomic evaluations, all genotyped animals are connected through G. It means that information on new genotyped animals affects all the other genotyped animals, causing fluctuations. Changing core animals in the APY algorithm also causes fluctuations in GEBV even though the accuracy is not affected (Misztal et al., 2019). When short-term fluctuations are undesirable, for example, for merchandising, one solution is to use full model genomic prediction (by ssGBLUP or multistep methods) periodically (say once a month), compute SNP effects, and run interim (e.g., weekly) indirect predictions based on backsolved SNP effects. While with small data the indirect predictions can be inaccurate due to ignoring parent average, in large populations, the fraction of parent average in GEBV is small and indirect predictions have similar accuracy to complete predictions (Lourenco et al., 2015b; Garcia et al., 2020). To mitigate risk associated with potential rank changes of young bulls, semen from a team of bulls may be marketed instead of semen from individual bulls (e.g., https://www.dairynz.co.nz/animal/animal-evaluation/bull-team/).

Using sequence data for genomic predictions

As sequencing is becoming less expensive, there is an interest in exploiting sequence information in animal genetics. If all causative variants and their substitution effects could be identified, genomic prediction would be perfect (i.e., selection accuracy = 1.0). If those effects were conserved across breeds, accurate multi-breed evaluation would be possible (Goddard, 2017). But substitution effects may vary from breed to breed, even at the QTL level, due to gene–environment interaction and to nonadditive gene action (Duenk et al., 2020). Sequence data are available through selective sequencing of key animals across species (e.g., 1000 Bull Genomes Project; http://www.1000bullgenomes.com/; Hayes and Daetwyler, 2019) and imputation for the remaining animals (Ros-Freixedes et al., 2020). For a successful incorporation of potential causative SNP, they need to be very close to the actual causative SNP, and their a priori variance in a model needs to be large as otherwise their value is strongly regressed toward 0 (Brøndum et al., 2015).

Practical applications of sequence data in large data sets yielded mixed results. Some studies have found no improvement (Erbe et al., 2012) and some showed a small improvement, in particular Moghaddar et al. (2019) who found an increase in the accuracy of “distant” animals of ~0.10 using selected sequence variants. In a study that yielded up to 5% improvement in reliability across traits, VanRaden et al. (2017) partly used a bin concept, where they eliminated most of the SNP close to SNP with the largest effects. The bin concept, popular in plants, recognizes that QTLs are nested in chromosome segments and attempts to locate at most a few SNP per segment (Xu, 2013); fewer SNP reduce the impact of priors and reduce shrinkage of causative SNP. Fragomeni et al. (2017) showed that ssGBLUP can account for causative SNP if they have a large weight in a weighted G. In a study on stature in U.S. Holstein using the potential causative SNP identified by VanRaden et al. (2017), Fragomeni et al. (2019) found that the addition of potential causative SNP to the current SNP panel increased reliabilities in GBLUP but not in ssGBLUP, and reliabilities from ssGBLUP were the highest. Similar results were found in Belgian Blue cattle (J. L. Gualdrón-Duarte, University of Liege, Belgium, personal communication) and for health traits in dairy cattle (S. DeNise, Zoetis, Kalamazoo, MI, personal communication). A few real, validated major genes (as identified by molecular genetics) explaining up to 10% of the genetic variance have been found and included in ssGBLUP evaluations, either as correlated traits (Legarra and Vitezica, 2015) or by weighting the G matrix appropriately. In general, either of these two strategies works and results in small, but smaller than expected, improvements in accuracy (Carillier-Jacquin et al., 2016; Teissier et al., 2018; Oget et al., 2019). Possibly, the causative SNP are already accounted for by the values of chromosome segments with large data. Some improvement with the causative SNP could be due to imperfect modeling by GBLUP with pseudo-data such as DRP or DYD instead of records.

There is a dilemma whether causative SNP with large effect, if found, should be used in selection programs for strongly selected traits. With long-term selection, most likely genes with positive effect for most traits are fixed or close to fixation, and genes that still have a large effect but are not fixed are likely to show undesirable pleiotropy. A chromosomal deletion in pigs increased growth but decreased fertility (Derks et al., 2018). Manhattan plots for mortality and milk yield, using a two-trait analysis, in U.S. Holstein showed the same peaks on chromosome 14 (Tsuruta et al., 2017). Georges et al. (2019) cite many studies indicating pleiotropy as a result of balancing
selection, for example, where disruptive variants in genes increase muscularity but affect viability or fitness. Negative effects of pleiotropy on low heritability traits may be hard to identify but can be important in the long run.

Balancing selection resulting in intermediate gene frequencies may be unlikely in cases where selection indices are utilized even with pleiotropy. While causative SNPs with large effects are not likely to be fixed after years of pedigree BLUP, the trend toward fixation will be faster with genomic selection. In the extreme, the fixation will negatively affect low heritability or sparsely recorded traits.

Genome-wide association studies

A standard tool for traditional genome-wide association studies (GWAS) is a model where one marker is analyzed at a time as a fixed effect (Kennedy et al., 1992), for example, efficient mixed-model association expedited (EMMAX; Kang et al., 2010). To reduce spurious signals due to a population structure, an animal effect using a pedigree or G is added to the model (Kennedy et al., 1992). Alternatively, many studies use Bayesian methods such as BayesB or BayesR with all SNP considered jointly, interpreting large signals as markers to nearby QTLs. While the former studies determine SNP significance using P-values, the latter usually estimate fractions of explained variance per segment of the genome, for example, 1 Mb.

Many studies, especially using small data, detect a large number of “large” markers, interpreting those as close to a QTL; however, the overlap of those markers across multiple populations or generations in a population under selection is minimal. This suggests that many detected associations are spurious (Fragomeni et al., 2014; Liu et al., 2017a). Studies using BayesB often show very high peaks, sometimes explaining >10% of the additive variance, especially with small data sets. As genomic selection with small data works on large clusters of chromosome segments (Pocrnic et al., 2019a), it is possible that some peaks may be tags to those clusters.

Many of these signals in GWAS are, therefore, probably false positives and can probably be explained by small data sets. If pedigree relationships are incomplete (e.g., ancestors not included), they would not account for population structure. In addition, P-values or False Discovery Rate are rarely reported. Classical GWAS in EMMAX is conceived for a set of individuals that are genotyped and phenotyped. When genotyped animals have only records from progeny or other relatives, this method is only applicable in a multiple-step manner, that is, creating pseudo-phenotypes such as DRP or DYD as was typically the case in dairy cattle (Boichard et al., 2003), but this is difficult to generalize to other species, where progeny sizes are smaller and many animals have phenotypes (e.g., weights) but not genotypes. However, Gualdrón-Duarte et al. (2014) and Bernal Rubio et al. (2016) showed the equivalence of P-values in GBLUP-based models with P-values in single-marker fixed regressions with a polygenic effect. Lu et al. (2018) extended the theory to ssGBLUP, and Aguilar et al. (2019) added this concept to the BLUPF90 package (Misztal et al., 2014b), with a successful implementation using 1 million birth weight phenotypes for American Angus, almost 2 million animals in the pedigrees, and 1,424 genotyped sires. The GWAS with P-values from ssGBLUP accounts for population structure, considers phenotypes from both genotyped and non-genotyped animals without additional steps, and allows for arbitrarily complex models. At this time, the method is limited to models where the left-hand side of the mixed model equations can be inverted, which sets a soft limit of perhaps ~100K genotyped animals.

Conclusions

Genomic selection methodology has been widely embraced by the animal breeding industry as evidenced by the scale of genotyping. The evaluation in most species except dairy cattle is by single-step methods, which consider all sources of information jointly, with methodology refined sufficiently to provide relatively unbiased evaluation for any data size, and easily accommodating causal genes. The dairy industry's plans to move to single step are hampered by distributed ownership of phenotypic and genomic data. Most evaluations use <100K SNP chips without SNP selection or weighting, indirectly acknowledging that the prediction acts mostly on chromosome segments and less on markers of QTLs. Whether accurate determination of causative SNPs will lead to substantially increased accuracy of selection also across breeds is unclear. While the validation methods are less than perfect, they illustrate higher accuracy of evaluation with the genomic information. An important concern in long-term genomic selection may be a serious reduction of the additive variance that may limit future gains, especially given that the parameter estimation with the genomic information is difficult.

Acknowledgments

We gratefully acknowledge the very helpful comments by the two anonymous reviewers. This research was primarily supported by grants from American Angus Association, Cobb-Vantress, Genus PIC, Holstein Association USA, Smithfield Premium Genetics, Zoetis, and U.S. Department of Agriculture’s National Institute of Food and Agriculture (Agriculture and Food Research Initiative competitive grant number 2015-67015-22936). This paper is presented at the ASAS 2019 Symposium.

Conflict of interest statement

The authors declare no real or perceived conflicts of interest.

Literature Cited

Aguilar, I., A. Legarra, F. Cardoso, Y. Masuda, D. Lourenco, and I. Misztal. 2019. Frequentist p-values for large-scale-single step genome-wide association, with an application to birth weight in American Angus cattle. Genet. Sel. Evol. 51:28. doi:10.1186/s12711-019-0469-3
Aguilar, I., and I. Misztal. 2008. Technical note: recursive algorithm for inbreeding coefficients assuming nonzero inbreeding of unknown parents. J. Dairy Sci. 91:1669–1672. doi:10.3168/jds.2007-0575
Aguilar, I., I. Misztal, D. L. Johnson, A. Legarra, S. Tsuruta, and T. J. Lawlor. 2010. Hot topic: a unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. J. Dairy Sci. 93:743–752. doi:10.3168/jds.2009-2730
Aguilar, I., I. Misztal, A. Legarra, and S. Tsuruta. 2011. Efficient computation of the genomic relationship matrix and other matrices used in single-step evaluation. J. Anim. Breed. Genet. 128:422–428. doi:10.1111/j.1439-0388.2010.00912.x
Bermann, M., A. Legarra, M. K. Hollifield, Y. Masuda, D. Lourenco, and I. Misztal. 2020. Validation of genomic and pedigree predictions from threshold models using the linear regression (LR) method: an application in chicken mortality. Genet. Sel. Evol. (under review)
Bernal Rubio, Y. L., J. L. Gualdrón Duarte, R. O. Bates, C. W. Ernst, D. Nonneman, G. A. Rohrer, A. King, S. D. Shackelford, T. L. Wheeler, R. J. Cantet, et al. 2016. Meta-analysis of
genome-wide association from genomic prediction models. Anim. Genet. 47:36–48. doi:10.1111/age.12378
Bijma, P. 2012. Accuracies of estimated breeding values from ordinary genetic evaluations do not reflect the correlation between true and estimated breeding values in selected populations. J. Anim. Breed. Genet. 129:345–358. doi:10.1111/j.1439-0388.2012.00991.x
Boichard, D., C. Grohs, F. Bourgeois, F. Cerqueira, R. Faugeras, A. Neau, R. Rupp, Y. Amigues, M. Y. Boscher, and H. Levéziel. 2003. Detection of genes influencing economic traits in three French dairy cattle breeds. Genet. Sel. Evol. 35:77–101. doi:10.1186/1297-9686-35-1-77
Bradford, H. L., Y. Masuda, J. B. Cole, I. Misztal, and P. M. VanRaden. 2019a. Modeling pedigree accuracy and uncertain parentage in single-step genomic evaluations of simulated and US Holstein datasets. J. Dairy Sci. 102:2308–2318. doi:10.3168/jds.2018-15419
Bradford, H. L., Y. Masuda, J. B. Cole, I. Misztal, and P. M. VanRaden. 2019b. Modeling pedigree accuracy and uncertain parentage in single-step genomic evaluations of simulated and US Holstein datasets. J. Dairy Sci. 102:2308–2318. doi:10.3168/jds.2018-15419
Bradford, H. L., I. Pocrnić, B. O. Fragomeni, D. A. L. Lourenco, and I. Misztal. 2017. Selection of core animals in the algorithm for proven and young using a simulation model. J. Anim. Breed. Genet. 134:545–552. doi:10.1111/jbg.12276
Brøndum, R. F., G. Su, L. Janss, G. Sahana, B. Guldbrandtsen, D. Boichard, and M. S. Lund. 2015. Quantitative trait loci markers derived from whole genome sequence data increases the reliability of genomic prediction. J. Dairy Sci. 98:4107–4116. doi:10.3168/jds.2014-9005
Carillier-Jacquin, C., H. Larroque, and C. Robert-Granié. 2016. Including αs1 casein gene information in genomic evaluations of French dairy goats. Genet. Sel. Evol. 48:54. doi:10.1186/s12711-016-0233-x
Cesarani, A., I. Pocrnic, N. P. P. Macciotta, B. O. Fragomeni, I. Misztal, and D. A. L. Lourenco. 2019. Bias in heritability estimates from genomic restricted maximum likelihood methods under different genotyping strategies. J. Anim. Breed. Genet. 136:40–50. doi:10.1111/jbg.12367
Chen, C. Y., I. Misztal, I. Aguilar, A. Legarra, and W. M. Muir. 2011. Effect of different genomic relationship matrices on accuracy and scale. J. Anim. Sci. 89:2673–2679. doi:10.2527/jas.2010-3555
Christensen, O. F. 2012. Compatibility of pedigree-based and marker-based relationship matrices for single-step genetic evaluation. Genet. Sel. Evol. 44:37. doi:10.1186/1297-9686-44-37
Christensen, O. F., A. Legarra, M. S. Lund, and G. Su. 2015. Genetic evaluation for three-way crossbreeding. Genet. Sel. Evol. 47:98. doi:10.1186/s12711-015-0177-6
Christensen, O. F., and M. S. Lund. 2010. Genomic prediction when some animals are not genotyped. Genet. Sel. Evol. 42:2. doi:10.1186/1297-9686-42-2
Christensen, O. F., P. Madsen, B. Nielsen, and G. Su. 2014. Genomic evaluation of both purebred and crossbred performances. Genet. Sel. Evol. 46:23. doi:10.1186/1297-9686-46-23
Cuyabano, B. C., G. Su, and M. S. Lund. 2015. Selection of haplotype variables from a high-density marker map for genomic prediction. Genet. Sel. Evol. 47:61. doi:10.1186/s12711-015-0143-3
Daetwyler, H. D., M. P. Calus, R. Pong-Wong, G. de Los Campos, and J. M. Hickey. 2013. Genomic prediction in animals and plants: simulation of data, validation, reporting, and benchmarking. Genetics 193:347–365. doi:10.1534/genetics.112.147983
Derks, M. F. L., M. S. Lopes, M. Bosse, O. Madsen, B. Dibbits, B. Harlizius, M. A. M. Groenen, and H. J. Megens. 2018. Balancing selection on a recessive lethal deletion with pleiotropic effects on two neighboring genes in the porcine genome. PLoS Genet. 14:e1007661. doi:10.1371/journal.pgen.1007661
Duenk, P., P. Bijma, M. P. L. Calus, Y. C. J. Wientjes, and J. H. J. van der Werf. 2020. The impact of non-additive effects on the genetic correlation between populations. G3 (Bethesda) 10:783–795. doi:10.1534/g3.119.400663
Edel, C., E. C. G. Pimentel, M. Erbe, R. Emmerling, and K. U. Götz. 2019. Short communication: calculating analytical reliabilities for single-step predictions. J. Dairy Sci. 102:3259–3265. doi:10.3168/jds.2018-15707
Erbe, M., B. J. Hayes, L. K. Matukumalli, S. Goswami, P. J. Bowman, C. M. Reich, B. A. Mason, and M. E. Goddard. 2012. Improving accuracy of genomic predictions within and between dairy cattle breeds with imputed high-density single nucleotide polymorphism panels. J. Dairy Sci. 95:4114–4129. doi:10.3168/jds.2011-5019
Fernando, R. L., H. Cheng, and D. J. Garrick. 2016. An efficient exact method to obtain GBLUP and single-step GBLUP when the genomic relationship matrix is singular. Genet. Sel. Evol. 48:80. doi:10.1186/s12711-016-0260-7
Fernando, R. L., J. C. Dekkers, and D. J. Garrick. 2014. A class of Bayesian methods to combine large numbers of genotyped and non-genotyped animals for whole-genome analyses. Genet. Sel. Evol. 46:50. doi:10.1186/1297-9686-46-50
Forni, S., I. Aguilar, and I. Misztal. 2011. Different genomic relationship matrices for single-step analysis using phenotypic, pedigree and genomic information. Genet. Sel. Evol. 43:1. doi:10.1186/1297-9686-43-1
Fragomeni, B. O., D. A. L. Lourenco, A. Legarra, P. M. VanRaden, and I. Misztal. 2019. Alternative SNP weighting for single-step genomic best linear unbiased predictor evaluation of stature in US Holsteins in the presence of selected sequence variants. J. Dairy Sci. 102:10012–10019. doi:10.3168/jds.2019-16262
Fragomeni, B. O., D. A. L. Lourenco, Y. Masuda, A. Legarra, and I. Misztal. 2017. Incorporation of causative quantitative trait nucleotides in single-step GBLUP. Genet. Sel. Evol. 49:59. doi:10.1186/s12711-017-0335-0
Fragomeni, B. O., D. A. Lourenco, S. Tsuruta, Y. Masuda, I. Aguilar, A. Legarra, T. J. Lawlor, and I. Misztal. 2015. Hot topic: use of genomic recursions in single-step genomic best linear unbiased predictor (BLUP) with a large number of genotypes. J. Dairy Sci. 98:4090–4094. doi:10.3168/jds.2014-9125
Fragomeni, Bde. O., I. Misztal, D. L. Lourenco, I. Aguilar, R. Okimoto, and W. M. Muir. 2014. Changes in variance explained by top SNP windows over generations for three traits in broiler chicken. Front. Genet. 5:332. doi:10.3389/fgene.2014.00332
Gao, H., P. Madsen, G. P. Aamand, J. R. Thomasen, A. C. Sørensen, and J. Jensen. 2019. Bias in estimates of variance components in populations undergoing genomic selection: a simulation study. BMC Genomics 20:956. doi:10.1186/s12864-019-6323-8
Garcia, A. L. S., Y. Masuda, S. Tsuruta, S. Miller, I. Misztal, and D. Lourenco. 2020. Indirect predictions with a large number of genotyped animals using the algorithm for proven and young. J. Anim. Sci. (in press)
Garcia-Baccino, C. A., A. Legarra, O. F. Christensen, I. Misztal, I. Pocrnic, Z. G. Vitezica, and R. J. Cantet. 2017. Metafounders are related to Fst fixation indices and reduce bias in single-step genomic evaluations. Genet. Sel. Evol. 49:34. doi:10.1186/s12711-017-0309-2
García-Ruiz, A., J. B. Cole, P. M. VanRaden, G. R. Wiggans, F. J. Ruiz-López, and C. P. Van Tassell. 2016. Changes in genetic selection differentials and generation intervals in US Holstein dairy cattle as a result of genomic selection. Proc. Natl. Acad. Sci. U. S. A. 113:E3995–E4004. doi:10.1073/pnas.1519061113
Garrick, D. J., D. P. Garrick, and B. L. Golden. 2018. An introduction to BOLT software for genetic and genomic evaluations. In: Proceedings of the 11th World Congress on Genetics Applied to Livestock Production; February 11 to 16, 2018; Auckland (New Zealand); p 973.
Garrick, D. J., J. F. Taylor, and R. L. Fernando. 2009. Deregressing estimated breeding values and weighting information for genomic regression analyses. Genet. Sel. Evol. 41:55. doi:10.1186/1297-9686-41-55
Gengler, N., S. Abras, C. Verkenne, S. Vanderick, M. Szydlowski, and R. Renaville. 2008. Accuracy of prediction of gene content in large animal populations and its use for candidate gene detection and genetic evaluation. J. Dairy Sci. 91:1652–1659. doi:10.3168/jds.2007-0231
Georges, M., C. Charlier, and B. Hayes. 2019. Harnessing genomic information for livestock improvement. Nat. Rev. Genet. 20:135–156. doi:10.1038/s41576-018-0082-2
Goddard, M. E. 2009. Genomic selection: prediction of accuracy and maximisation of long term response. Genetica 136:245–257. doi:10.1007/s10709-008-9308-0
Goddard, M. E. 2017. Can we make genomic selection 100% accurate? J. Anim. Breed. Genet. 134:287–288. doi:10.1111/jbg.12281
Golden, B. L., M. L. Spangler, W. M. Snelling, and D. J. Garrick. 2018. Current single-step national beef cattle evaluation models used by the American Hereford Association and International Genetic Solutions, computational aspects, and implications of marker selection. Proceedings of the Beef Improvement Federation 11th Genetic Prediction Workshop Refining Genomic Evaluation and Selection Indices; December 12 to 13, 2018; Kansas City (MO); p. 14–22.
Gualdrón Duarte, J. L., R. J. Cantet, R. O. Bates, C. W. Ernst, N. E. Raney, and J. P. Steibel. 2014. Rapid screening for phenotype-genotype associations by linear transformations of genomic evaluations. BMC Bioinformatics 15:246. doi:10.1186/1471-2105-15-246
Hayes, B. J., and H. D. Daetwyler. 2019. 1000 Bull genomes project to map simple and complex genetic traits in cattle: applications and outcomes. Annu. Rev. Anim. Biosci. 7:89–102. doi:10.1146/annurev-animal-020518-115024
Henderson, C. R. 1976. A simple method for computing the inverse of a relationship matrix used in prediction of breeding values. Biometrics 32:69.
Henderson, C. R. 1984. Applications of linear models in animal breeding. Guelph (ON), Canada: University of Guelph.
Hidalgo, J., S. Tsuruta, D. Lourenco, Y. Masuda, Y. Huang, K. A. Gray, and I. Misztal. 2020. Changes in genetic parameters for fitness and growth traits in pigs under genomic selection. J. Anim. Sci. (under review)
Howard, J. T., T. A. Rathje, C. E. Bruns, D. F. Wilson-Wells, S. D. Kachman, and M. L. Spangler. 2018. The impact of truncating data on the predictive ability for single-step genomic best linear unbiased prediction. J. Anim. Breed. Genet. 135:251–262. doi:10.1111/jbg.12334
Hsu, W. L., D. J. Garrick, and R. L. Fernando. 2017. The accuracy and bias of single-step genomic prediction for populations under selection. G3 (Bethesda). 7:2685–2694. doi:10.1534/g3.117.043596
Jónás, D., V. Ducrocq, M. N. Fouilloux, and P. Croiseau. 2016. Alternative haplotype construction methods for genomic evaluation. J. Dairy Sci. 99:4537–4546. doi:10.3168/jds.2015-10433
Kachman, S. 2008. Incorporation of marker scores into national cattle evaluations. Proceedings of the 9th Genetic Prediction Workshop; December 10 to 11, 2008; Kansas City (MO): Beef Improvement Federation; p. 92–98.
Kachman, S. D., M. L. Spangler, G. L. Bennett, K. J. Hanford, L. A. Kuehn, W. M. Snelling, R. M. Thallman, M. Saatchi, D. J. Garrick, R. D. Schnabel, et al. 2013. Comparison of molecular breeding values based on within- and across-breed training in beef cattle. Genet. Sel. Evol. 45:30. doi:10.1186/1297-9686-45-30
Kang, H. M., J. H. Sul, S. K. Service, N. A. Zaitlen, S. Y. Kong, N. B. Freimer, C. Sabatti, and E. Eskin. 2010. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42:348–354. doi:10.1038/ng.548
Karaman, E., H. Cheng, M. Z. Firat, D. J. Garrick, and R. L. Fernando. 2016. An upper bound for accuracy of prediction using GBLUP. PLoS One. 11:e0161054. doi:10.1371/journal.pone.0161054
Kennedy, B. W., M. Quinton, and J. A. van Arendonk. 1992. Estimation of effects of single genes on quantitative traits. J. Anim. Sci. 70:2000–2012. doi:10.2527/1992.7072000x
Legarra, A. 2016. Comparing estimates of genetic variance across different relationship models. Theor. Popul. Biol. 107:26–30. doi:10.1016/j.tpb.2015.08.005
Legarra, A., I. Aguilar, and I. Misztal. 2009. A relationship matrix including full pedigree and genomic information. J. Dairy Sci. 92:4656–4663. doi:10.3168/jds.2009-2061
Legarra, A., O. F. Christensen, I. Aguilar, and I. Misztal. 2014. Single step, a general approach for genomic selection. Livest. Prod. Sci. 166:54–65. doi:10.1016/j.livsci.2014.04.029
Legarra, A., O. F. Christensen, Z. G. Vitezica, I. Aguilar, and I. Misztal. 2015. Ancestral relationships using metafounders: finite ancestral populations and across population relationships. Genetics 200:455–468. doi:10.1534/genetics.115.177014
Legarra, A., and V. Ducrocq. 2012. Computational strategies for national integration of phenotypic, genomic, and pedigree data in a single-step best linear unbiased prediction. J. Dairy Sci. 95:4629–4645. doi:10.3168/jds.2011-4982
Legarra, A., and A. Reverter. 2018. Semi-parametric estimates of population accuracy and bias of predictions of breeding values and future phenotypes using the LR method. Genet. Sel. Evol. 50:53. doi:10.1186/s12711-018-0426-6
Legarra, A., C. Robert-Granié, E. Manfredi, and J. M. Elsen. 2008. Performance of genomic selection in mice. Genetics 180:611–618. doi:10.1534/genetics.108.088575
Legarra, A., and Z. G. Vitezica. 2015. Genetic evaluation with major genes and polygenic inheritance when some animals are not genotyped using gene content multiple-trait BLUP. Genet. Sel. Evol. 47:89. doi:10.1186/s12711-015-0165-x
Liu, Z., M. E. Goddard, F. Reinhardt, and R. Reents. 2014. A single-step genomic model with direct estimation of marker effects. J. Dairy Sci. 97:5833–5850. doi:10.3168/jds.2014-7924
Liu, Z., P. M. VanRaden, M. H. Lidauer, M. P. Calus, H. Benhajali, H. Jorjani, and V. Ducrocq. 2017b. Approximating genomic reliabilities for national genomic evaluation. Interbull Bull. 51:75–85.
Liu, A., Y. Wang, G. Sahana, Q. Zhang, L. Liu, M. S. Lund, and G. Su. 2017a. Genome-wide association studies for female fertility traits in Chinese and Nordic Holsteins. Sci. Rep. 7:8487. doi:10.1038/s41598-017-09170-9
Lourenco, D. A. L., B. O. Fragomeni, H. L. Bradford, I. R. Menezes, J. B. S. Ferraz, I. Aguilar, S. Tsuruta, and I. Misztal. 2017. Implications of SNP weighting on single-step genomic predictions for different reference population sizes. J. Anim. Breed. Genet. 134:463–471. doi:10.1111/jbg.12288
Lourenco, D. A., B. O. Fragomeni, S. Tsuruta, I. Aguilar, B. Zumbach, R. J. Hawken, A. Legarra, and I. Misztal. 2015a. Accuracy of estimated breeding values with genomic information on males, females, or both: an example on broiler chicken. Genet. Sel. Evol. 47:56. doi:10.1186/s12711-015-0137-1
Lourenco, D. A., I. Misztal, S. Tsuruta, I. Aguilar, T. J. Lawlor, S. Forni, and J. I. Weller. 2014. Are evaluations on young genotyped animals benefiting from the past generations? J. Dairy Sci. 97:3930–3942. doi:10.3168/jds.2013-7769
Lourenco, D. A., S. Tsuruta, B. O. Fragomeni, C. Y. Chen, W. O. Herring, and I. Misztal. 2016. Crossbreed evaluations in single-step genomic best linear unbiased predictor using adjusted realized relationship matrices. J. Anim. Sci. 94:909–919. doi:10.2527/jas.2015-9748
Lourenco, D. A., S. Tsuruta, B. O. Fragomeni, Y. Masuda, I. Aguilar, A. Legarra, J. K. Bertrand, T. S. Amen, L. Wang, D. W. Moser, et al. 2015b. Genetic evaluation using single-step genomic best linear unbiased predictor in American Angus. J. Anim. Sci. 93:2653–2662. doi:10.2527/jas.2014-8836
Lourenco, D. A. L., S. Tsuruta, B. O. Fragomeni, Y. Masuda, I. Aguilar, A. Legarra, S. Miller, D. Moser, and I. Misztal. 2018.
Single-step genomic BLUP for national beef cattle evaluation in US: from initial developments to final implementation. Proc. World. Cong. Appl. Livest. Prod. 11:495.
Lu, Y., M. J. Vandehaar, D. M. Spurlock, K. A. Weigel, L. E. Armentano, E. E. Connor, M. Coffey, R. F. Veerkamp, Y. de Haas, C. R. Staples, et al. 2018. Genome-wide association analyses based on a multiple-trait approach for modeling feed efficiency. J. Dairy Sci. 101:3140–3154. doi:10.3168/jds.2017-13364
Lutaaya, E., I. Misztal, J. K. Bertrand, and J. W. Mabry. 1999. Inbreeding in populations with incomplete pedigrees. J. Anim. Breed. Genet. 116:475–480. doi:10.1046/j.1439-0388.1999.00210.x
MacNeil, M. D., J. D. Nkrumah, B. W. Woodward, and S. L. Northcutt. 2010. Genetic evaluation of Angus cattle for carcass marbling using ultrasound and genomic indicators. J. Anim. Sci. 88:517–522. doi:10.2527/jas.2009-2022
Makgahlela, M. L., I. Strandén, U. S. Nielsen, M. J. Sillanpää, and E. A. Mäntysaari. 2014. Using the unified relationship matrix adjusted by breed-wise allele frequencies in genomic evaluation of a multibreed population. J. Dairy Sci. 97:1117–1127. doi:10.3168/jds.2013-7167
Mäntysaari, E. A., R. D. Evans, and I. Strandén. 2017. Efficient single-step genomic evaluation for a multibreed beef cattle population having many genotyped animals. J. Anim. Sci. 95:4728–4737. doi:10.2527/jas2017.1912
Masuda, Y., I. Aguilar, S. Tsuruta, and I. Misztal. 2015. Technical note: acceleration of sparse operations for average-information REML analyses with supernodal methods and sparse-storage refinements. J. Anim. Sci. 93:4670–4674. doi:10.2527/jas.2015-9395
Masuda, Y., T. Baba, and M. Suzuki. 2014. Application of supernodal sparse factorization and inversion to the estimation of (co)variance components by residual maximum likelihood. J. Anim. Breed. Genet. 131:227–236. doi:10.1111/jbg.12058
Masuda, Y., I. Misztal, A. Legarra, S. Tsuruta, D. A. Lourenco, B. O. Fragomeni, and I. Aguilar. 2017. Technical note: avoiding the direct inversion of the numerator relationship matrix for genotyped animals in single-step genomic best linear unbiased prediction solved with the preconditioned conjugate gradient. J. Anim. Sci. 95:49–52. doi:10.2527/jas.2016.0699
Masuda, Y., I. Misztal, P. VanRaden, and T. Lawlor. 2018a. Pre-selection bias and validation method in single-step GBLUP for production traits in US Holstein. In: Proceedings of the 11th World Congress on Genetics Applied to Livestock Production; February 11 to 16, 2018; Auckland (New Zealand); p. 540.
Masuda, Y., I. Misztal, P. M. VanRaden, and T. J. Lawlor. 2018b. Differing genetic trend estimates from traditional and genomic evaluations of genotyped animals as evidence of preselection bias in US Holsteins. In: ADSA Annual Meeting, Knoxville (TN); p. 5194–5206.
Matilainen, K., M. Koivula, I. Stranden, G. P. Aamand, and E. A. Mantysaari. 2016. Managing genetic groups in single-step genomic evaluations applied on female fertility traits in Nordic Red Dairy Cattle. Interbull Bull. 50:71–75.
Matukumalli, L. K., C. T. Lawley, R. D. Schnabel, J. F. Taylor, M. F. Allan, M. P. Heaton, J. O’Connell, S. S. Moore, T. P. Smith, T. S. Sonstegard, et al. 2009. Development and characterization of a high density SNP genotyping assay for cattle. PLoS One. 4:e5350. doi:10.1371/journal.pone.0005350
Meuwissen, T. H., B. J. Hayes, and M. E. Goddard. 2001. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:1819–1829.
Meuwissen, T. H., J. Odegard, I. Andersen-Ranberg, and E. Grindflek. 2014. On the distance of genetic relationships and the accuracy of genomic prediction in pig breeding. Genet. Sel. Evol. 46:49. doi:10.1186/1297-9686-46-49
Meuwissen, T. H., M. Svendsen, T. Solberg, and J. Ødegård. 2015. Genomic predictions based on animal models using genotype imputation on a national scale in Norwegian Red cattle. Genet. Sel. Evol. 47:79. doi:10.1186/s12711-015-0159-8
Meyer, K. 1989. Approximate accuracy of genetic evaluation under an animal model. Livest. Prod. Sci. 21:87–100. doi:10.1016/0301-6226(89)90041-9
Meyer, K., B. Tier, and A. Swan. 2018. Estimates of genetic trend for single-step genomic evaluations. Genet. Sel. Evol. 50:39. doi:10.1186/s12711-018-0410-1
Misztal, I. 2016. Inexpensive computation of the inverse of the genomic relationship matrix in populations with small effective population size. Genetics 202:401–409. doi:10.1534/genetics.115.182089
Misztal, I., H. L. Bradford, D. A. L. Lourenco, S. Tsuruta, Y. Masuda, A. Legarra, and T. J. Lawlor. 2017. Studies on inflation of GEBV in single-step GBLUP for type. Interbull Bull. 51:38–42.
Misztal, I., A. Legarra, and I. Aguilar. 2009. Computing procedures for genetic evaluation including phenotypic, full pedigree, and genomic information. J. Dairy Sci. 92:4648–4655. doi:10.3168/jds.2009-2064
Misztal, I., A. Legarra, and I. Aguilar. 2014a. Using recursion to compute the inverse of the genomic relationship matrix. J. Dairy Sci. 97:3943–3952. doi:10.3168/jds.2013-7752
Misztal, I., S. Tsuruta, I. Aguilar, A. Legarra, P. M. VanRaden, and T. J. Lawlor. 2013a. Methods to approximate reliabilities in single-step genomic evaluation. J. Dairy Sci. 96:647–654. doi:10.3168/jds.2012-5656
Misztal, I., S. Tsuruta, D. A. L. Lourenco, Y. Masuda, I. Aguilar, A. Legarra, and Z. Vitezica. 2014b. Manual for BLUPF90 family of programs. Available from http://nce.ads.uga.edu/wiki/lib/exe/fetch.php?media=blupf90_all7.pdf
Misztal, I., S. Tsuruta, I. Pocrnic, and D. Lourenco. 2019. Changes in predictions when using different core animals in the APY algorithm. Proceedings of the 70th Annual Meeting EAAP; August 26 to 30, 2019; Ghent, Belgium; p. 593.
Misztal, I., Z. G. Vitezica, A. Legarra, I. Aguilar, and A. A. Swan. 2013b. Unknown-parent groups in single-step genomic evaluation. J. Anim. Breed. Genet. 130:252–258. doi:10.1111/jbg.12025
Misztal, I., and G. R. Wiggans. 1988. Approximation of prediction error variance in large-scale animal models. J. Dairy Sci. 71:27–32. doi:10.1016/S0022-0302(88)79976-2
Moghaddar, N., M. Khansefid, J. H. J. van der Werf, S. Bolormaa, N. Duijvesteijn, S. A. Clark, A. A. Swan, H. D. Daetwyler, and I. M. MacLeod. 2019. Genomic prediction based on selected variants from imputed whole-genome sequence data in Australian sheep populations. Genet. Sel. Evol. 51:72. doi:10.1186/s12711-019-0514-2
Muir, W. M. 2007. Comparison of genomic and traditional BLUP-estimated breeding value accuracy and selection response under alternative trait and genomic parameters. J. Anim. Breed. Genet. 124:342–355. doi:10.1111/j.1439-0388.2007.00700.x
Ødegård, J., U. Indahl, I. Strandén, and T. H. E. Meuwissen. 2018. Large-scale genomic prediction using singular value decomposition of the genotype matrix. Genet. Sel. Evol. 50:6. doi:10.1186/s12711-018-0373-2
Oget, C., M. Teissier, J. M. Astruc, G. Tosser-Klopp, and R. Rupp. 2019. Alternative methods improve the accuracy of genomic prediction using information from a causal point mutation in a dairy sheep model. BMC Genomics 20:719. doi:10.1186/s12864-019-6068-4
Patry, C., and V. Ducrocq. 2011a. Accounting for genomic pre-selection in national BLUP evaluations in dairy cattle. Genet. Sel. Evol. 43:30. doi:10.1186/1297-9686-43-30
Patry, C., and V. Ducrocq. 2011b. Evidence of biases in genetic evaluations due to genomic preselection in dairy cattle. J. Dairy Sci. 94:1011–1020. doi:10.3168/jds.2010-3804
Plieschke, L., C. Edel, E. C. Pimentel, R. Emmerling, J. Bennewitz, and K. U. Götz. 2015. A simple method to separate base population and segregation effects in genomic relationship matrices. Genet. Sel. Evol. 47:53. doi:10.1186/s12711-015-0130-8
Pocrnic, I., D. A. L. Lourenco, C. Y. Chen, W. O. Herring, and I. Misztal. 2019b. Crossbred evaluations using single-step genomic BLUP
and algorithm for proven and young with different sources of data. J. Anim. Sci. 97:1513–1522. doi:10.1093/jas/skz042
Pocrnic, I., D. A. Lourenco, Y. Masuda, A. Legarra, and I. Misztal. 2016a. The dimensionality of genomic information and its effect on genomic prediction. Genetics 203:573–581. doi:10.1534/genetics.116.187013
Pocrnic, I., D. A. Lourenco, Y. Masuda, and I. Misztal. 2016b. Dimensionality of genomic information and performance of the Algorithm for Proven and Young for different livestock species. Genet. Sel. Evol. 48:82. doi:10.1186/s12711-016-0261-6
Pocrnic, I., D. A. L. Lourenco, K. Y. Masuda, and I. Misztal. 2019a. Accuracy of genomic BLUP when considering a genomic relationship matrix based on the number of the largest eigenvalues: a simulation study. Genet. Sel. Evol. 51:75.
Quaas, R. L. 1988. Additive genetic model with groups and relationships. J. Dairy Sci. 71:1338–1345.
Quaas, R. L., and E. J. Pollak. 1981. Modified equations for sire models with groups. J. Dairy Sci. 64:1868–1872.
Ros-Freixedes, R., A. Whalen, C. Y. Chen, G. Gorjanc, W. O. Herring, A. J. Mileham, and J. M. Hickey. 2020. Accuracy of whole-genome sequence imputation using hybrid peeling in large pedigreed livestock populations. Genet. Sel. Evol. 52:17. doi:10.1186/s12711-020-00536-8
Saatchi, M., M. C. McClure, S. D. McKay, M. M. Rolf, J. Kim, J. E. Decker, T. M. Taxis, R. H. Chapple, H. R. Ramey, S. L. Northcutt, et al. 2011. Accuracies of genomic breeding values in American Angus beef cattle using K-means clustering for cross-validation. Genet. Sel. Evol. 43:40. doi:10.1186/1297-9686-43-40
Sargolzaei, M., and F. S. Schenkel. 2009. QMSim: a large-scale genome simulator for livestock. Bioinformatics 25:680–681. doi:10.1093/bioinformatics/btp045
Stam, P. 1980. The distribution of the fraction of the genome identical by descent in finite random mating populations. Genet. Res. 35:131–155.
Steyn, Y., D. A. L. Lourenco, and I. Misztal. 2019. Genomic predictions in purebreds with a multibreed genomic relationship matrix. J. Anim. Sci. 97:4418–4427. doi:10.1093/jas/skz296
Strandén, I., and O. F. Christensen. 2011. Allele coding in genomic evaluation. Genet. Sel. Evol. 43:25. doi:10.1186/1297-9686-43-25
Strandén, I., K. Matilainen, G. P. Aamand, and E. A. Mäntysaari. 2017. Solving efficiently large single-step genomic best linear unbiased prediction models. J. Anim. Breed. Genet. 134:264–274. doi:10.1111/jbg.12257
Taskinen, M., E. A. Mäntysaari, and I. Strandén. 2017. Single-step SNP-BLUP with on-the-fly imputed genotypes and residual polygenic effects. Genet. Sel. Evol. 49:36. doi:10.1186/s12711-017-0310-9
Teissier, M., H. Larroque, and C. Robert-Granié. 2018. Weighted single-step genomic BLUP improves accuracy of genomic breeding values for protein content in French dairy goats: a quantitative trait influenced by a major gene. Genet. Sel. Evol. 50:31. doi:10.1186/s12711-018-0400-3
Tsuruta, S., D. A. L. Lourenco, Y. Masuda, I. Misztal, and T. J. Lawlor. 2019a. Controlling bias in genomic breeding values for young genotyped bulls. J. Dairy Sci. 102:9956–9970. doi:10.3168/jds.2019-16789
Tsuruta, S., D. A. L. Lourenco, Y. Masuda, I. Misztal, and T. J. Lawlor. 2019b. Validation of genomic predictions for linear type traits in US Holsteins using over 2 million genotyped animals. J. Dairy Sci. 102 (Suppl. 1):397.
Tsuruta, S., D. A. L. Lourenco, I. Misztal, and T. J. Lawlor. 2017. Genomic analysis of cow mortality and milk production using a threshold-linear model. J. Dairy Sci. 100:7295–7305. doi:10.3168/jds.2017-12665
Tsuruta, S., I. Misztal, I. Aguilar, and T. J. Lawlor. 2011. Multiple-trait genomic evaluation of linear type traits using genomic and phenotypic data in US Holsteins. J. Dairy Sci. 94:4198–4204. doi:10.3168/jds.2011-4256
Tsuruta, S., I. Misztal, D. A. Lourenco, and T. J. Lawlor. 2014. Assigning unknown parent groups to reduce bias in genomic evaluations of final score in US Holsteins. J. Dairy Sci. 97:5814–5821. doi:10.3168/jds.2013-7821
Vandenplas, J., M. P. L. Calus, H. Eding, and C. Vuik. 2019. A second-level diagonal preconditioner for single-step SNPBLUP. Genet. Sel. Evol. 51:30. doi:10.1186/s12711-019-0472-8
Vandenplas, J., J. J. Windig, and M. P. L. Calus. 2017. Prediction of the reliability of genomic breeding values for crossbred performance. Genet. Sel. Evol. 49:43. doi:10.1186/s12711-017-0318-1
Van Grevenhof, E. M., J. A. Van Arendonk, and P. Bijma. 2012. Response to genomic selection: the Bulmer effect and the potential of genomic selection when the number of phenotypic records is limiting. Genet. Sel. Evol. 44:26. doi:10.1186/1297-9686-44-26
VanRaden, P. M. 1992. Accounting for inbreeding and crossbreeding in genetic evaluation of large populations. J. Dairy Sci. 75:3136–3144.
VanRaden, P. M. 2008. Efficient methods to compute genomic predictions. J. Dairy Sci. 91:4414–4423. doi:10.3168/jds.2007-0980
VanRaden, P. M., J. R. O’Connell, G. R. Wiggans, and K. A. Weigel. 2011. Genomic evaluations with many more genotypes. Genet. Sel. Evol. 43:10. doi:10.1186/1297-9686-43-10
VanRaden, P. M., M. E. Tooker, T. C. S. Chud, H. D. Norman, J. H. Megonigal Jr, I. W. Haagen, and G. R. Wiggans. 2020. Genomic predictions for crossbred dairy cattle. J. Dairy Sci. 103:1620–1631. doi:10.3168/jds.2019-16634
VanRaden, P. M., M. E. Tooker, J. R. O’Connell, J. B. Cole, and D. M. Bickhart. 2017. Selecting sequence variants to improve genomic predictions for dairy cattle. Genet. Sel. Evol. 49:32. doi:10.1186/s12711-017-0307-4
VanRaden, P. M., M. E. Tooker, J. R. Wright, C. Sun, and J. L. Hutchison. 2014. Comparison of single-trait to multi-trait national evaluations for yield, health, and fertility. J. Dairy Sci. 97:7952–7962. doi:10.3168/jds.2014-8489
VanRaden, P. M., C. P. Van Tassell, G. R. Wiggans, T. S. Sonstegard, R. D. Schnabel, J. F. Taylor, and F. S. Schenkel. 2009. Invited review: reliability of genomic predictions for North American Holstein bulls. J. Dairy Sci. 92:16–24. doi:10.3168/jds.2008-1514
VanRaden, P. M., and G. R. Wiggans. 1991. Derivation, calculation, and use of national animal model information. J. Dairy Sci. 74:2737–2746. doi:10.3168/jds.S0022-0302(91)78453-1
VanRaden, P. M., and J. R. Wright. 2013. Measuring genomic pre-selection in theory and in practice. Interbull Bull. 47:147–150.
VanRaden, P. M., J. R. Wright, and T. A. Cooper. 2012. Adjustment of selection index coefficients and polygenic variance to improve regressions and reliability of genomic evaluations. J. Dairy Sci. 95(Suppl. 2):446–447.
Vitezica, Z. G., I. Aguilar, I. Misztal, and A. Legarra. 2011. Bias in genomic predictions for populations under selection. Genet. Res. (Camb). 93:357–366. doi:10.1017/S001667231100022X
Westell, R. A., R. L. Quaas, and L. D. Van Vleck. 1988. Genetic groups in an animal model. J. Dairy Sci. 71:1310–1318.
Wiggans, G. R., T. A. Cooper, P. M. VanRaden, and J. B. Cole. 2011. Technical note: adjustment of traditional cow evaluations to improve accuracy of genomic predictions. J. Dairy Sci. 94:6188–6193. doi:10.3168/jds.2011-4481
Wiggans, G. R., P. M. VanRaden, and T. A. Cooper. 2012. Technical note: adjustment of all cow evaluations for yield traits to be comparable with bull evaluations. J. Dairy Sci. 95:3444–3447. doi:10.3168/jds.2011-5000
Xiang, T., O. F. Christensen, and A. Legarra. 2017. Technical note: genomic evaluation for crossbred performance in a single-step approach with metafounders. J. Anim. Sci. 95:1472–1480. doi:10.2527/jas.2016.1155
Xiang, T., B. Nielsen, G. Su, A. Legarra, and O. F. Christensen. 2016. Application of single-step genomic evaluation for crossbred performance in pig. J. Anim. Sci. 94:936–948. doi:10.2527/jas.2015-9930
Xu, S. 2013. Genetic mapping and genomic selection using recombination breakpoint data. Genetics 195:1103–1115. doi:10.1534/genetics.113.155309
