Average semivariance directly yields accurate estimates of the genomic variance in complex trait analyses

This document discusses the challenges of estimating genomic variance in complex trait analyses and proposes a new method using average semivariance (ASV) to improve accuracy. The authors highlight that traditional methods often lead to overestimation or underestimation of genomic variance and heritability due to varying approaches in calculating genomic relationship matrices. The proposed KASV matrix derived from ASV provides accurate estimates of genomic variance and heritability across different populations and traits.
G3, 2022, 12(6), jkac080

https://doi.org/10.1093/g3journal/jkac080
Advance Access Publication Date: 20 April 2022
Investigation

Average semivariance directly yields accurate estimates of the genomic variance in complex trait analyses

Mitchell J. Feldmann,1,* Hans-Peter Piepho,2 Steven J. Knapp1

1 Department of Plant Sciences, University of California, Davis, CA 95616, USA
2 Biostatistics Unit, Institute of Crop Science, University of Hohenheim, 70593 Stuttgart, Germany

*Corresponding author: Department of Plant Sciences, University of California, Davis, CA 95616, USA. Email: [email protected]

Abstract
Many important traits in plants, animals, and microbes are polygenic and challenging to improve through traditional marker-assisted selection. Genomic prediction addresses this by incorporating all genetic data in a mixed model framework. The primary method for predicting breeding values is genomic best linear unbiased prediction, which uses the realized genomic relationship or kinship matrix ($\mathbf{K}$) to connect genotype to phenotype. Genomic relationship matrices share information among entries to estimate the observed entries' genetic values and predict unobserved entries' genetic values. One of the main parameters of such models is genomic variance ($\sigma_g^2$), or the variance of a trait associated with a genome-wide sample of DNA polymorphisms, and genomic heritability ($h_g^2$); however, the seminal papers introducing different forms of $\mathbf{K}$ often do not discuss their effects on the model estimated variance components despite their importance in genetic research and breeding. Here, we discuss the effect of several standard methods for calculating the genomic relationship matrix on estimates of $\sigma_g^2$ and $h_g^2$. With current approaches, we found that the genomic variance tends to be either overestimated or underestimated depending on the scaling and centering applied to the marker matrix ($\mathbf{Z}$), the value of the average diagonal element of $\mathbf{K}$, and the assortment of alleles and heterozygosity ($H$) in the observed population. Using the average semivariance, we propose a new matrix, $\mathbf{K}_{ASV}$, that directly yields accurate estimates of $\sigma_g^2$ and $h_g^2$ in the observed population and produces best linear unbiased predictors equivalent to routine methods in plants and animals.

Keywords: average semivariance; genomic heritability; genomic variance; genomic relatedness; linear mixed model; genomic best linear
unbiased predictor

Received: December 27, 2021. Accepted: March 17, 2022

© The Author(s) 2022. Published by Oxford University Press on behalf of Genetics Society of America.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Introduction

Linear mixed model (LMM) analyses are routine in the prediction of breeding values in plants and animals (Henderson 1977; VanRaden 2008; Hayes et al. 2009; Albrecht et al. 2011; Endelman 2011; Crossa et al. 2014; Meuwissen et al. 2016; Pincot et al. 2020; Petrasch et al. 2021) and polygenic risk scores in humans (de los Campos et al. 2010; Makowsky et al. 2011; Lee et al. 2012; Dudbridge 2013; Maier et al. 2018; Wray et al. 2019; Truong et al. 2020), partitioning of sources of variance (Searle et al. 1992; Lynch and Walsh 1998; Visscher et al. 2008; Kang et al. 2010; Piepho 2019; Schmidt et al. 2019a, 2019b; Feldmann et al. 2021), and controlling for confounding effects in genome-wide association studies (GWAS) (Yu et al. 2006; Visscher et al. 2012; Korte and Farlow 2013; Visscher et al. 2017). Genomic prediction approaches are widely applied in the study of complex traits in natural and experimental populations and facilitate the estimation of genomic variance ($\sigma_g^2$), genomic heritability ($h_g^2$), and other quantitative, population, and evolutionary genetic parameters (Bulmer et al. 1980; Falconer and Mackay 1996; Lynch and Walsh 1998; Meuwissen et al. 2001; Bernardo 2002; Hill et al. 2008; Van Heerwaarden et al. 2008; Crossa et al. 2010; de los Campos et al. 2015; Huang and Mackay 2016; Lehermeier et al. 2017; Noble et al. 2019), and have been widely adopted in plant breeding, human genetics, and biology (Habier et al. 2007; Goddard and Hayes 2007; Heffner et al. 2009; Bloom et al. 2013).

Genomic variance ($\sigma_g^2$), the variance explained by genome-wide associations between the underlying quantitative trait loci and DNA markers genotyped in the training population, is often estimated in genetic experiments (Visscher et al. 2007; Gao et al. 2012; Lee et al. 2012, 2013; Lipka et al. 2014; Rutkoski et al. 2014; Kumar et al. 2015; Piaskowski et al. 2018; Rice and Lipka 2019; Krause et al. 2019; Pincot et al. 2020; Petrasch et al. 2021; Yadav et al. 2021) using genomic relationship matrices (GRMs, $\mathbf{K}$), which measure the relatedness among entries (Yang et al. 2010; Habier et al. 2013). The selection of $\mathbf{K}$ is used directly in solutions to the mixed model equations and is central to estimating the correct variance components in LMM analyses (Henderson 1953; Searle et al. 1992; Lynch and Walsh 1998; Mrode 2014). The phenotypic variance–covariance ($\mathbf{V}$) is $\mathbf{V} = \mathbf{G} + \mathbf{R}$, where $\mathbf{R} = \mathbf{I}\sigma_e^2$ is the residual variance–covariance, and $\mathbf{G} = \mathbf{K}\sigma_g^2$ is the genomic variance–covariance (Henderson 1953; Searle et al. 1992; Lynch and Walsh 1998; Piepho 2019). The genomic variance $\sigma_g^2$ is a scalar and, thus, any change in $\mathbf{K}$ will impact $\sigma_g^2$ estimates. Genomic variance is found in many ratios throughout modern quantitative genetic research, including genomic heritability, prediction accuracy,
selection reliability, prediction error variance, and response to genomic selection (Goddard 2009; Hickey et al. 2009; Gorjanc et al. 2015). Of these ratios, genomic heritability has been the most frequently reported in public research (Speed et al. 2012, 2017; Speed and Balding 2015; de los Campos et al. 2015; Legarra 2016; Lehermeier et al. 2017; Yang et al. 2017).

Genomic heritability is

$$h_g^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}, \tag{1}$$

where $\sigma_g^2$ is the genomic variance and $\sigma_e^2$ is the residual variance on an entry-mean basis. Genomic heritability is often estimated by substituting restricted maximum likelihood (REML) variance component estimates into (1). We studied how different forms of $\mathbf{K}$ affect variance component estimates. We found that, even for large data sets, there are systematic differences in the genomic variance component estimates arising from different forms of $\mathbf{K}$ (VanRaden 2008; Astle et al. 2009; Yang et al. 2010; Forni et al. 2011; Endelman and Jannink 2012) and that the resulting variance component estimates may not always be correct when directly substituted into (1), as is routine practice. Despite this, researchers often simultaneously use the same approaches for genomic prediction and variance component estimation and may consequently report incorrect genomic heritability estimates.

Here, using the average semivariance (ASV), introduced by Piepho (2019) and expanded by Feldmann et al. (2021), we derive a new form for $\mathbf{K}$, referred to as $\mathbf{K}_{ASV}$, which is obtained from the product $\bar{\mathbf{K}} = \bar{\mathbf{Z}}\bar{\mathbf{Z}}^T$ of the mean-centered marker matrix $\bar{\mathbf{Z}} = \mathbf{P}\mathbf{Z}$, where $\mathbf{P} = \mathbf{I}_n - n^{-1}\mathbf{1}_n\mathbf{1}_n^T$ is the idempotent, mean-centering $n \times n$ matrix. The ASV relationship matrix is

$$\mathbf{K}_{ASV} = \frac{\bar{\mathbf{K}}}{(n-1)^{-1}\operatorname{tr}(\bar{\mathbf{K}})}. \tag{2}$$

This matrix is scaled to the residual variance–covariance matrix and directly yields accurate estimates of $\sigma_g^2$ and $h_g^2$ regardless of population constitution, population size, or true heritability. It is possible to scale other forms of $\mathbf{K}$ via a division by $(n-1)^{-1}\operatorname{tr}(\mathbf{K})$ or to scale estimates of genomic variance from any form of $\mathbf{K}$ by multiplying $(n-1)^{-1}\operatorname{tr}(\mathbf{K})$ by $\hat{\sigma}_g^2$ to obtain ASV estimates of the variance components. We explore the practical implications of $\mathbf{K}_{ASV}$ for estimating $\sigma_g^2$ and $h_g^2$ in a wild population of Arabidopsis thaliana (Atwell et al. 2010), a wheat (Triticum aestivum) breeding population (Crossa et al. 2010), a laboratory mouse (Mus musculus) population (Valdar et al. 2006), an apple (Malus × domestica) breeding population (Kumar et al. 2015), and a pig (Sus scrofa) breeding population (Cleveland et al. 2012). The ASV approach that we propose can be used to estimate variance components in genetic evaluation studies in plants, animals, microbes, and humans.

The Average Semivariance

The ASV estimator of the total variance (Piepho 2019) is half the average total pairwise variance of a difference between entries and can be decomposed into independent sources of variance, e.g. genomic and residual. There are two alternative ASV derivations, both leading to the same definitions of the estimators. The first derivation originated in geostatistics and estimated the semivariance as half of the variance among all pairwise differences among genotypic values ($g$), i.e. $\frac{1}{2}\operatorname{var}(g_i - g_j)$ (Webster and Oliver 2007; Piepho 2019). Piepho (2019) derived ASV from a study's observations, worked out the semivariance, and took the average across all pairs of observations. In our context, there is an equivalent alternative derivation based on the sample variance of the genotypic values (Estaghvirou et al. 2013). The sample variance among genotypic values is $s_g^2 = (n-1)^{-1}\sum_{i=1}^{n}(g_i - \bar{g})^2$. That is to say that the expected value of the sample variance of genotypic values is the ASV, i.e. $E(s_g^2) = \theta_g^{ASV}$. ASV can be used to estimate and partition the total variance in LMM analyses into parts, such as the total variance, as in Piepho (2019), the variance explained by large effect markers and marker–marker interactions, as in Feldmann et al. (2021), and genomic variance, as shown below.

ASV definitions of genomic variance and heritability

In complex trait analyses, there is a crucial difference in the treatment of genotypes and effects in statistical models used for data analysis vs the quantitative genetics theory (Yang et al. 2010; Speed et al. 2012, 2017; de los Campos et al. 2015; Speed and Balding 2015; Legarra 2016). In quantitative genetics theory, between-entry differences in genetic values and genomic variance are attributed to a random sampling of marker genotypes (Bulmer et al. 1980; Lande and Thompson 1990; Falconer and Mackay 1996; Lynch and Walsh 1998) and, in an LMM framework, variation stems from a random sampling of the marker effects. Despite differences in derivation and assumptions regarding the source of randomness, the resulting variance–covariance structures of the two coincide under specific experimental, population, and marker sampling conditions (de los Campos et al. 2015; Legarra 2016). With this in mind, we derived an approach using the ASV that relies on the assumptions of LMM analyses, e.g. random marker effects, but yields correct estimates of genomic variance.

The analyses shown throughout this paper assume the dependent variables are least squares means (LSMs) or other adjusted means for entries ($\mathbf{y}$). $\mathbf{R} = \mathbf{I}_n\sigma_e^2$ gives the residual variance of the LSMs. The ASV can efficiently deal with more general forms of variance–covariance matrices in generalized LMMs (Piepho 2019). The LMM for this analysis is

$$\mathbf{y} = \mathbf{1}_n\mu + \mathbf{I}_n\mathbf{g} + \mathbf{e}, \tag{3}$$

where $\mathbf{y}$ is the vector of phenotypic LSMs for $n$ entries, $n$ is the number of entries, $\mathbf{1}_n$ is an $n$-element vector of ones, $\mu$ is the population mean, $\mathbf{I}_n$ is the identity matrix of size $n$, $\mathbf{g}$ is an $n$-element vector of random effect values for entries with $\mathbf{g} \sim N(\mathbf{0}, \mathbf{K}\sigma_g^2)$, and $\mathbf{e}$ is the residual for each entry with $\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}_n\sigma_e^2)$.

The ASV definition of variance from LMM (3) is

$$\theta_y^{ASV} = (n-1)^{-1}\operatorname{tr}(\mathbf{V}\mathbf{P}) = \theta_g^{ASV} + \theta_e^{ASV}, \tag{4}$$

where $\theta_y^{ASV}$ is the phenotypic variance, $\mathbf{V} = \mathbf{K}\sigma_g^2 + \mathbf{I}_n\sigma_e^2$ is the variance–covariance among observations, $\theta_g^{ASV}$ is the genomic ASV, and $\theta_e^{ASV}$ is the ASV of the residuals. If we assume $\mathbf{G} = \mathbf{K}\sigma_g^2$, where $\mathbf{G}$ is the variance–covariance of the best linear unbiased predictors (BLUPs) of the genotypic values $\mathbf{g}$, it can be inferred that the magnitude of $\sigma_g^2$ is inversely related to $\operatorname{tr}(\mathbf{K})$ because $\mathbf{V} = \mathbf{K}\sigma_g^2 + \mathbf{I}_n\sigma_e^2$.
The ASV definition of the genomic variance is

$$\theta_g^{ASV} = (n-1)^{-1}\operatorname{tr}(\mathbf{Z}\mathbf{Z}^T\mathbf{P})\,\sigma_g^2 = (n-1)^{-1}\operatorname{tr}(\bar{\mathbf{Z}}\bar{\mathbf{Z}}^T)\,\sigma_g^2 = (n-1)^{-1}\operatorname{tr}(\bar{\mathbf{K}})\,\sigma_g^2, \tag{5}$$

where $\bar{\mathbf{Z}} = \mathbf{P}\mathbf{Z}$ is the mean-centered marker matrix, and $\bar{\mathbf{K}} = \bar{\mathbf{Z}}\bar{\mathbf{Z}}^T$ is the realized genomic relationship or kinship matrix described by VanRaden (2008), omitting the scaling constant $2\sum_j p_j(1-p_j)$, where $p_j$ is the allele frequency of the $j$th SNP, which requires Hardy–Weinberg equilibrium (HWE) to hold (de los Campos et al. 2015), and $\operatorname{tr}(\mathbf{Z}\mathbf{Z}^T\mathbf{P}) = \operatorname{tr}(\bar{\mathbf{Z}}\bar{\mathbf{Z}}^T)$. The trace of $\bar{\mathbf{Z}}\bar{\mathbf{Z}}^T$ is a function of heterozygosity in the observed population (Vitezica et al. 2013, 2017; Legarra et al. 2018). When the observed population is in HWE, $n^{-1}\operatorname{tr}(\mathbf{K}) = 1$, and when the population is not in HWE due to inbreeding, $n^{-1}\operatorname{tr}(\mathbf{K}) = 1 + f$, where $f$ is the coefficient of inbreeding (Endelman and Jannink 2012; Legarra et al. 2018). In the general case, $\theta_g^{ASV} = \left[(n-1)^{-1}\operatorname{tr}(\mathbf{K}\mathbf{P})\right]\sigma_g^2$, where $\mathbf{K}$ is any form of the GRM calculated from $\mathbf{Z}$, without centering, or $\bar{\mathbf{Z}}$, with centering, because $\operatorname{tr}(\bar{\mathbf{K}}) = \operatorname{tr}(\mathbf{K}\mathbf{P})$.

The ASV definition of the residual variance is

$$\theta_e^{ASV} = (n-1)^{-1}\sigma_e^2\operatorname{tr}(\mathbf{I}_n\mathbf{I}_n^T\mathbf{P}) = \sigma_e^2. \tag{6}$$

Notably, the genomic variance $\theta_g^{ASV}$ is on the same scale as the residual variance $\theta_e^{ASV}$, and both are defined such that (4) is accurate. REML estimates of the residual variance are equivalent to ASV estimates when best linear unbiased estimators or LSMs are the response variable $\mathbf{y}$.

Two equivalent methods yield accurate $h_g^2$ estimates

There are two equivalent ways to obtain accurate estimates of genomic variance and subsequently genomic heritability. The first method, our recommended approach, utilizes $\mathbf{K}_{ASV}$ (2) in the LMM analysis and directly yields accurate estimates of the genomic variance components from the model by rescaling the GRM. The first method works because $\mathbf{V} = \mathbf{K}\sigma_g^2 + \mathbf{I}_n\sigma_e^2$ is a true statement regardless of $\mathbf{K}$, but different choices of $\mathbf{K}$ change the scaling and interpretation of $\sigma_g^2$. Thus, variance components estimated by ASV can then be substituted directly into (1) without any adjustment.

The second method is to adjust the genomic variance component estimates from any form of $\mathbf{K}$ by multiplying them by a scaling factor, $(n-1)^{-1}\operatorname{tr}(\mathbf{K})$, defined by the population size ($n$) and the diagonals of the chosen GRM ($\operatorname{tr}(\mathbf{K})$). Through substitution of (5) and (6) into (1), the ASV estimator of genomic heritability $\hat{h}_g^{2,ASV}$ is

$$\hat{h}_g^{2,ASV} = \frac{\hat{\theta}_g^{ASV}}{\hat{\theta}_y^{ASV}} = \frac{\hat{\theta}_g^{ASV}}{\hat{\theta}_g^{ASV} + \hat{\theta}_e^{ASV}} = \frac{(n-1)^{-1}\operatorname{tr}(\mathbf{K})\,\hat{\sigma}_g^2}{(n-1)^{-1}\operatorname{tr}(\mathbf{K})\,\hat{\sigma}_g^2 + \hat{\sigma}_e^2}. \tag{7}$$

This formulation can be used directly with any form of $\mathbf{K}$ or $\bar{\mathbf{K}}$ by substituting REML variance component estimates. Note that $(n-1)^{-1}\operatorname{tr}(\mathbf{K})$ is the same as the scaling coefficient used in (2). The second strategy is analogous to the post hoc adjustment approach Feldmann et al. (2021) proposed.

Materials and methods

Genomic relationship matrices

We calculated and applied seven relationship matrices for each population, simulated or case example, including $\mathbf{K}_{ASV}$. We used AGHmatrix::Gmatrix() to calculate the Yang et al. (2010) ($\mathbf{K}_Y$) and VanRaden (2008) ($\mathbf{K}_{VR}$) relationship matrices (Rampazo Amadeu et al. 2016), rrBLUP::A.mat() to calculate the Endelman and Jannink (2012) ($\mathbf{K}_{EJ}$) relationship matrix, and statgenGWAS::kinship() to estimate the Astle et al. (2009) ($\mathbf{K}_{AB}$) and IBS ($\mathbf{K}_{IBS}$) relationship matrices (van Rossum and Kruijer 2020).

The form proposed by VanRaden (2008) is

$$\mathbf{K}_{VR} = \frac{\bar{\mathbf{Z}}\bar{\mathbf{Z}}^T}{2\sum_{j=1}^{m} p_j(1-p_j)}, \tag{8}$$

where $\bar{\mathbf{Z}}$ is the marker matrix centered on column means ($2p_j$), and $p_j$ is the minor allele frequency (MAF) for the $j$th SNP. This form assumes HWE and obtains $p_j$ from a historical reference population, not the observed population. When $p_j$ originates from the observed population, the centering by $2p_j$ is equivalent to column centering and $\mathbf{K}_{VR}$ only differs from $\mathbf{K}_{ASV}$ by a scaling factor.

The normalized relationship matrix, $\mathbf{K}_{GN}$, was introduced by Forni et al. (2011) as

$$\mathbf{K}_{GN} = \frac{\bar{\mathbf{K}}}{n^{-1}\operatorname{tr}(\bar{\mathbf{K}})}. \tag{9}$$

This form is the most numerically similar to $\mathbf{K}_{ASV}$ and only differs by a single denominator degree of freedom.

The form of the relationship matrix proposed by Endelman and Jannink (2012) is

$$\mathbf{K}_{EJ} = \frac{\delta\langle S_{ii}\rangle\mathbf{I} + (1-\delta)\mathbf{S} + \langle\bar{\mathbf{Z}}_{\bullet j}\rangle\langle\bar{\mathbf{Z}}_{\bullet j}\rangle^T}{2\langle p_j(1-p_j)\rangle}, \tag{10}$$

where $\delta \approx (n/m)\,\mathrm{CV}^2$ is a shrinkage factor, $\mathrm{CV}^2$ is the coefficient of variation of the eigenvalues of $\mathbf{S}$, $\mathbf{S} = m^{-1}\bar{\mathbf{Z}}\bar{\mathbf{Z}}^T - \langle\bar{\mathbf{Z}}_{\bullet k}\rangle\langle\bar{\mathbf{Z}}_{\bullet k}\rangle^T$, and $\langle S_{ii}\rangle$ is the mean of the diagonal elements of $\mathbf{S}$. Notably, at high marker densities, when $\delta = 0$, Endelman and Jannink (2012) is equivalent to VanRaden (2008).

The method proposed by Yang et al. (2010) also centers the columns of $\mathbf{Z}$ by subtracting $2p_j$:

$$K^{Y}_{ik} = \begin{cases} m^{-1}\displaystyle\sum_{j=1}^{m}\frac{(z_{ji} - 2p_j)(z_{jk} - 2p_j)}{2p_j(1-p_j)}, & i \neq k \\[2ex] 1 + m^{-1}\displaystyle\sum_{j=1}^{m}\frac{z_{ji}^2 - (1 + 2p_j)z_{ji} + 2p_j^2}{2p_j(1-p_j)}, & i = k \end{cases} \tag{11}$$

where $z_{ji}$ is the $j$th SNP in the $i$th individual, $z_{jk}$ is the $j$th SNP in the $k$th individual with $i \neq k$, and $m$ is the number of markers. The diagonals are treated differently than the off-diagonals in this form.

The method proposed by Astle et al. (2009) is

$$\mathbf{K}_{AB} = (2m)^{-1}\sum_{j=1}^{m}\frac{(\mathbf{z}_j - 2p_j\mathbf{1})(\mathbf{z}_j - 2p_j\mathbf{1})^T}{2p_j(1-p_j)}, \tag{12}$$

where $\mathbf{z}_j$ is the $n$-element vector of genotypes at the $j$th SNP.

The classical identity-by-state definition is (Astle et al. 2009):

$$\mathbf{K}_{IBS} = (2m)^{-1}\sum_{j=1}^{m}(\mathbf{z}_j - \mathbf{1})(\mathbf{z}_j - \mathbf{1})^T + \frac{1}{2}. \tag{13}$$

Note that this is the only calculation that is not scaled or centered by any function of $p_j$.
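The post hoc adjustment in (7) can be applied to variance components obtained with any of the relationship matrices above. The Python sketch below is a minimal illustration under stated assumptions: the 3 × 3 relationship matrix and the variance component values are invented for the example, the function names are hypothetical, and the matrix is assumed to come from a column-centered marker matrix so that $\operatorname{tr}(\mathbf{K}) = \operatorname{tr}(\mathbf{K}\mathbf{P})$. It expresses the same $\mathbf{G} = \mathbf{K}\sigma_g^2$ with two differently scaled matrices and compares the unadjusted ratio (1) with the adjusted ratio (7).

```python
def trace(K):
    return sum(K[i][i] for i in range(len(K)))


def naive_heritability(sigma2_g, sigma2_e):
    """Equation (1): plug the variance components in without adjustment."""
    return sigma2_g / (sigma2_g + sigma2_e)


def asv_heritability(K, sigma2_g, sigma2_e):
    """Equation (7): rescale sigma2_g by (n - 1)^-1 tr(K) before forming the ratio.

    Assumes K was built from a column-centered marker matrix, so tr(K) = tr(KP).
    """
    n = len(K)
    theta_g = trace(K) / (n - 1) * sigma2_g
    return theta_g / (theta_g + sigma2_e)


# Hypothetical centered relationship matrix for n = 3 entries (rows sum to zero).
K1 = [[2.0, -1.0, -1.0],
      [-1.0, 2.0, -1.0],
      [-1.0, -1.0, 2.0]]
# Hypothetical variance component estimates, for illustration only.
sigma2_g1, sigma2_e = 0.5, 1.5

# The same G = K * sigma2_g written with a GRM divided by c.
c = 4.0
K2 = [[v / c for v in row] for row in K1]
sigma2_g2 = sigma2_g1 * c

h2_naive_1 = naive_heritability(sigma2_g1, sigma2_e)
h2_naive_2 = naive_heritability(sigma2_g2, sigma2_e)
h2_asv_1 = asv_heritability(K1, sigma2_g1, sigma2_e)
h2_asv_2 = asv_heritability(K2, sigma2_g2, sigma2_e)
```

In this toy case the unadjusted ratio depends on how the matrix happens to be scaled, whereas the adjusted ratio is the same for both parameterizations.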
For each model and each simulation, we estimated two variance components ($\sigma_g^2$ and $\sigma_e^2$) using sommer::mmer() and took the ratio of variance components in R v4.1.0 (R Core Team 2020). We estimated genomic heritability using the standard form by substituting REML estimates from (3) into (1).

LMM analysis in R

In the sommer R package (Covarrubias-Pazaran 2016), LMM (3) is expressed as

    mmer(fixed = Y ~ 1,
         random = ~ vs(Entry, Gu = K),
         rcov = ~ units,
         data = data)

where data is a data frame with n rows and two columns, with Y as a column of LSMs and Entry as a column of factor-coded entries, and K is one of the seven GRMs given in this study. A large number of statistical computing solutions can fit this model, including regress (Clifford and McCullagh 2006), ASREML (Butler 2021), rrBLUP (Endelman 2011), GEMMA (Zhou and Stephens 2012), emmREML (Akdemir and Okeke 2015), brms (Bürkner 2017), and lme4GS (Caamal-Pat et al. 2021).

Simulated data

We generated 36 experiment designs with different heterozygosity $H$ = 0.0, 0.25, 0.5, and 0.75, different trait heritability $h_g^2$ = 0.2, 0.5, and 0.8, and population sizes of $n$ = 250, 500, and 1,000. In all examples, 1,000 populations genotyped at $m$ = 5,000 causal loci were used to generate the genetic traits. We simulated all $m$ = 5,000 marker effects following a normal distribution with $\mu = 0$ and $\sigma = 1$. When multiplied by the marker genotypes and summed, the score is an individual's true genetic value, $g$. Residuals were simulated with $\mu = 0$ and $\sigma_e^2 = s_g^2(1 - h^2)/h^2$ to obtain a trait with the desired genomic heritability (Endelman 2011), where $s_g^2 = (n-1)^{-1}\sum_{i=1}^{n}(g_i - \bar{g})^2$ is the sample variance among genotypic values (Estaghvirou et al. 2013). In this study, the true value of $h_g^2$ = 0.2, 0.5, or 0.8. All plots were made with the ggplot2 package (Wickham 2016) in R 4.1.0 (R Core Team 2020).

Empirical data

We analyzed five publicly available data sets using seven methods for calculating the realized relationship matrix and estimated $h_g^2$. First, we analyzed six traits from Kumar et al. (2015), who evaluated a breeding population of $n$ = 247 apple (Malus × domestica) hybrids genotyped at $m$ = 2,829 SNPs with $H$ = 0.348. The reported traits were fruit weight (WT), fruit firmness (FF), greasiness (GRE), crispiness (CRI), juiciness (JUI), and flavor intensity (FIN). The shrinkage factor $\delta$ from Endelman and Jannink (2012) was equal to 0.02. Second, we analyzed the wheat data set from Crossa et al. (2010), who evaluated $n$ = 599 wheat (Triticum aestivum) fully inbred lines ($H$ = 0.0; $\delta$ = 0.03) for grain yield (GY) in four environments genotyped for $m$ = 1,278 SNPs. We evaluated each environment (i.e. GY-E1, GY-E2, GY-E3, and GY-E4) with an independent model. Third, we analyzed data from Valdar et al. (2006), who evaluated a laboratory population of $n$ = 1,814 stock mice (M. musculus) for body mass index (BMI), body length, and weight and genotyped for $m$ = 10,346 SNPs ($H$ = 0.363; $\delta$ = 0.01). Fourth, we analyzed a population of $n$ = 1,057 naturally occurring Arabidopsis (A. thaliana) ecotypes phenotyped for the mean ($\mu$) and SD of flowering time under 10 °C (FT10) and 16 °C (FT16) and genotyped at $m$ = 193,697 SNPs ($H$ = 0.0; $\delta$ = 0.0) from Atwell et al. (2010) and Alonso-Blanco et al. (2016). Fifth, we analyzed a commercial pig (S. scrofa) population made available by PIC (a Genus company) with $n$ = 3,534 entries genotyped at $m$ = 52,843 SNPs ($H$ = 0.311; $\delta$ = 0.0) that were phenotyped for five traits: T1, T2, T3, T4, and T5 (Cleveland et al. 2012). For each population, we calculated the seven relationship matrices, (2) and (8)–(13), and applied them in (3) for each trait to estimate $\hat{h}_g^2$ with (1).

We performed cross-validation to determine predictive ability $r(\hat{g}, y)$, or the correlation between BLUPs and LSMs, which is a measure of success commonly reported in genomic prediction studies that indicates how informative the phenotype is as a measure of the genomic value. We also estimated the prediction accuracy $r(\hat{g}, y)/\sqrt{\hat{h}_g^2}$, which is a measure of success that scales the predictive ability to the upper limit ($\sqrt{\hat{h}_g^2}$) (Crossa et al. 2014). An ideal situation for genomic prediction is a low value of predictive ability and a high value of prediction accuracy. When the predictive ability is high, genomic selection is unlikely to outperform phenotypic selection. When the prediction accuracy is low, the model is bad at capturing the variation in genomic values. We first split each population into 80% train and 20% test sets and estimated genomic BLUPs, and then calculated the accuracy as the correlation between the estimated LSM $y$ and the BLUP $\hat{g}$ for all entries in the test set. We performed this cross-validation scheme 100 times for each population and each trait.

Results

Analysis of simulated data confirms that ASV yields accurate estimates of genomic variance

The ASV relationship matrix yielded suitable estimates of genomic variance and genomic heritability in the observed populations, while the other methods varied with the level of heterozygosity. When heterozygosity $H$ < 0.5, the genomic variance tends to be underestimated, and when $H$ > 0.5, the genomic variance tends to be overestimated (Fig. 1) by methods excluding (2) and (9). This pattern was realized regardless of the population size, e.g. $n$ = 250, 500, and 1,000. All methods tend to produce accurate estimates when $H$ = 0.5, in which case the inbreeding coefficient $f$ = 0 and HWE is not violated.

The precision (variance) improved by increasing the population size ($n$), but the accuracy (bias) did not improve. It has been demonstrated ad nauseam that increasing $n$ increases precision or lowers the sampling variance of the estimates but does not eliminate bias (Laird and Ware 1982; Searle et al. 1992; Lynch and Walsh 1998; Legarra 2016). Notably, the entire parameter space of $h_g^2$ was observed when the population size is small (Fig. 1). Only $\mathbf{K}_{ASV}$ and $\mathbf{K}_{GN}$ yielded stable precision as $H$ increased (Fig. 2). Other methods that we examined have variable precision and variable accuracy depending on the sample size, heterozygosity, and the true value of $h_g^2$ (Figs. 1 and 2). Interestingly, we observed an interaction between $h_g^2$ and $H$ that impacted the precision of genomic heritability estimation but did not affect $\mathbf{K}_{GN}$ or $\mathbf{K}_{ASV}$. Precision improved as $H$ increased for high heritability traits and precision worsened as $H$ increased for low heritability traits. For traits where $h_g^2$ = 0.5, precision was constant.

Fig. 1. Effect of heritability ($h_g^2$), population size ($n$), and heterozygosity ($H$) on the accuracy of genomic heritability estimates. Phenotypic observations were simulated for 1,000 samples with $n$ = 250, 500, and 1,000 (left to right) genotyped for $m$ = 5,000 SNPs and the average heterozygosity $H$ = 0%, 25%, 50%, and 75%. The accuracy of genomic heritability estimates ($\hat{h}_g^2$) from LMMs fit using the seven relationship matrices is shown for true genomic heritability $h_g^2$ = 0.2 (upper panel), 0.5 (middle panel), and 0.8 (lower panel). The upper and lower halves of each box correspond to the first and third quartiles (the 25th and 75th percentiles). The notch corresponds to the median (the 50th percentile). The upper whisker extends from the box to the highest value that is within 1.5 × IQR of the third quartile, where IQR is the interquartile range, or distance between the first and third quartiles. The lower whisker extends from the first quartile to the lowest value within 1.5 × IQR of the quartile. The dashed line in each plot is the true value from simulations.

Analysis of simulated and empirical data confirms that ASV does not impact BLUPs or prediction accuracy

Neither the predictive ability ($r(\hat{g}, y)$) nor the BLUPs from genomic best linear unbiased predictor are affected by ASV. In our simulated populations, the predictive ability was equal across all

seven GRMs that we tested (Fig. 3), but the prediction accuracy ($\hat{h}_g^{-1} r(\hat{g}, y)$) varies with the choice of GRM and therefore the heterozygosity in the sampled populations. In 22 empirical trait × population examples we evaluated, the differences in the prediction accuracy, when present, appeared to be negligible and do not lend themselves clearly to "better" or "worse" categories (Figs. 4 and 5). While the choice of $\mathbf{K}$ does not impact BLUP, it does impact estimates of genomic variance $\hat{\sigma}_g^2$, genomic heritability $\hat{h}_g^2$, prediction accuracy $\hat{h}_g^{-1} r(\hat{g}, y)$ (Fig. 5), average prediction error variance PEV, and selection reliability $1 - \sigma_g^{-2}\,\mathrm{PEV}$, which all rely on $\hat{\sigma}_g^2$. Differences in Fig. 5 are more pronounced for the fully inbred populations, e.g. Arabidopsis and wheat, than the partially inbred populations, e.g. pig, mouse, and apple. ASV allows users to understand how well genomic selection (GS) is performing relative to phenotypic selection and to predict how reliable genomic selection can be for certain traits in specific populations more accurately than other methods since it directly yields accurate estimates of $\sigma_g^2$ and $h_g^2$ (Figs. 3–5).

The relationship between $\mathbf{K}_{ASV}$ and $\mathbf{K}_{GN}$

We found that the normalized $\mathbf{K}$, i.e. $\mathbf{K}_{GN}$ (9), proposed by Forni et al. (2011) and further described by Legarra (2016), yields estimates of $\mathbf{K}$ that only deviated from $\mathbf{K}_{ASV}$ by a single degree of freedom in the denominator of the matrix scaling factor. Although these estimators were derived through different approaches and with different concepts in mind, they are numerically similar, apart from a single degree of freedom difference in the divisor of the GRM: Forni et al. (2011) used the number of entries ($n$), whereas we used $df_g = n - 1$ for calculating the sample variance (Bulmer 1979). Instead of being biased by a factor of $1/(1 + f)$, $\mathbf{K}_{GN}$ is biased by a factor of $(n - 1)/n$. Our simulations confirm this deviation and the median genomic variance estimates using $\mathbf{K}_{GN}$ were slightly larger than $\mathbf{K}_{ASV}$, which was equal to the true value in the simulations (Fig. 1). This work, Forni et al. (2011), and Legarra (2016) all arrive at numerically similar solutions through conceptually different derivations, which we feel is indicative of the value of these approaches for the plant, animal, and human genetic studies that rely on genomic relatedness, e.g. GWAS, genomic prediction, or inferring population structure and ancestry.

$\mathbf{K}_{ASV}$ yields genomic variance estimates that naturally account for inbreeding

Inbreeding changes the patterns of among and within entry genomic variance and drives deviations from HWE (Bernardo 2002; Wricke and Weber 2010; Legarra 2016; Isik et al. 2017). A challenge of partial inbreeding is that researchers may not know or infer the reference population, making unadjusted genomic variance estimates hard to interpret (Legarra 2016). In genomic evaluations in plants and animals, the current population is often interpreted as the reference population, but this is an inaccurate
Fig. 2. Precision of genomic heritability estimates from simulations. The SDs from the simulation experiments are plotted against heterozygosity ($H$), population size ($n$), and true genomic heritability ($h_g^2$) for each of the seven GRMs evaluated in this study. Points and lines are jittered around each value of $H$ to improve clarity as many of the lines are parallel and overlap one another.
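The simulation recipe described under Simulated data can be sketched in Python as follows. This is a hedged illustration, not the code used for the study: the genotype model (heterozygotes drawn with probability $H$, the two homozygotes with equal probability) is an assumption, as are the function name, the seed, and the reduced problem size. The residual variance is set to $s_g^2(1 - h^2)/h^2$, the value implied by (1) for a target heritability $h^2$.

```python
import random


def simulate_trait(n, m, H, h2, seed=1):
    """Simulate genotypes, true genetic values, and phenotypes for one population."""
    rng = random.Random(seed)

    def genotype():
        # Assumed model: heterozygote (coded 1) with probability H,
        # otherwise one of the two homozygotes (0 or 2) with equal probability.
        u = rng.random()
        if u < H:
            return 1
        return 0 if u < H + (1 - H) / 2 else 2

    Z = [[genotype() for _ in range(m)] for _ in range(n)]
    effects = [rng.gauss(0.0, 1.0) for _ in range(m)]  # marker effects ~ N(0, 1)
    g = [sum(z * a for z, a in zip(row, effects)) for row in Z]

    g_bar = sum(g) / n
    s2_g = sum((x - g_bar) ** 2 for x in g) / (n - 1)  # sample variance of g
    sigma2_e = s2_g * (1 - h2) / h2                    # residual variance for target h2
    e = [rng.gauss(0.0, sigma2_e ** 0.5) for _ in range(n)]
    y = [gi + ei for gi, ei in zip(g, e)]
    return Z, g, y, s2_g, sigma2_e


# Small hypothetical design; the study used n up to 1,000 and m = 5,000.
Z, g, y, s2_g, sigma2_e = simulate_trait(n=50, m=200, H=0.25, h2=0.5)
target_ratio = s2_g / (s2_g + sigma2_e)
```

By construction, the ratio of the sample genetic variance to the total of genetic and residual variance equals the target heritability, which is the benchmark shown as the true value in the simulation figures.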

Fig. 3. Effect of heritability ($h_g^2$), population size (n), and heterozygosity (H) on the predictive ability $r(\hat{g}, y)$. Phenotypic observations were simulated for 1,000 samples with n = 250, 500, and 1,000 (left to right) genotyped for m = 5,000 SNPs and the average heterozygosity H = 0%, 25%, 50%, and 75%. $r(\hat{g}, g)$ estimates from LMMs fit using the seven relationship matrices are shown for true genomic heritability $h_g^2$ = 0.2 (upper panel), 0.5 (middle panel), and 0.8 (lower panel). Each box's upper and lower halves correspond to the first and third quartiles (the 25th and 75th percentiles). The notch corresponds to the median (the 50th percentile). The upper whisker extends from the box to the highest value within 1.5 IQR of the third quartile, where IQR is the interquartile range or distance between the first and third quartiles. The lower whisker extends from the first quartile to the lowest value within 1.5 IQR of the quartile.
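The box, notch, and whisker conventions described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `box_stats`, the example values, and the choice of the `inclusive` quantile method are not from the article, and plotting software may use a slightly different quartile convention.

```python
import statistics


def box_stats(values):
    """Return the quantities that define one box as described in the caption."""
    data = sorted(values)
    # Quartiles by linear interpolation (assumption: "inclusive" method;
    # other quantile conventions shift the box edges slightly).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations inside the fences.
    lower_whisker = min(v for v in data if v >= lower_fence)
    upper_whisker = max(v for v in data if v <= upper_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_whisker": lower_whisker,
        "upper_whisker": upper_whisker,
    }


# Hypothetical values for illustration only, not results from the study.
example = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
stats = box_stats(example)
print(stats)
```

In this toy example the value 30 lies beyond the upper fence, so the upper whisker stops at 9 and 30 would be drawn as a separate point.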

Fig. 4. Cross-validated predictive ability from five case studies including 22 phenotypic traits using seven GRMs. Cross-validated predictive ability ($r(\hat{g}, y)$) results are presented from 100 realizations of 80:20 cross-validation using the seven relationship matrices for six traits in an apple population with n = 247 entries genotyped at m = 2,829 SNPs (Kumar et al. 2015) (first row), four traits in an Arabidopsis population with n = 1,057 entries genotyped at m = 193,697 SNPs (Atwell et al. 2010) (second row), three traits in a mouse population with n = 1,814 entries genotyped at m = 10,346 SNPs (Valdar et al. 2006) (third row), five traits in a pig population with n = 3,534 entries genotyped at 52,843 SNPs (Cleveland et al. 2012) (fourth row), and four traits in a wheat population with n = 599 entries genotyped at m = 1,278 SNPs (Crossa et al. 2010) (fifth row). For the Arabidopsis data set (second row), KY systematically produced singular systems in sommer::mmer() and prediction accuracy was not estimated for either FT10l or FT16l. Each box's upper and lower halves correspond to the first and third quartiles (the 25th and 75th percentiles). The notch corresponds to the median (the 50th percentile). The upper whisker extends from the box to the highest value within 1.5 IQR of the third quartile, where IQR is the interquartile range or distance between the first and third quartiles. The lower whisker extends from the first quartile to the lowest value within 1.5 IQR of the quartile.
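The resampling scheme named in this caption (100 realizations of an 80:20 split, scored by the correlation between predicted and observed values in the held-out 20%) can be sketched as below. All names here are hypothetical, and a one-covariate least-squares fit stands in for the LMM fitted with sommer::mmer() in the article; the sketch shows only the cross-validation loop, not the genomic model.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cross_validated_ability(y, fit_predict, n_reps=100, train_frac=0.8, seed=2022):
    """Repeat random train/test splits and return one correlation per split."""
    rng = random.Random(seed)
    n = len(y)
    n_train = int(round(train_frac * n))
    abilities = []
    for _ in range(n_reps):
        order = list(range(n))
        rng.shuffle(order)
        train, test = order[:n_train], order[n_train:]
        predicted = fit_predict(train, test)  # predictions for held-out entries
        observed = [y[i] for i in test]
        abilities.append(pearson(predicted, observed))
    return abilities


def make_simple_regression(x, y):
    """Toy stand-in for the LMM: least-squares regression of y on one covariate."""
    def fit_predict(train, test):
        mean_x = sum(x[i] for i in train) / len(train)
        mean_y = sum(y[i] for i in train) / len(train)
        sxx = sum((x[i] - mean_x) ** 2 for i in train)
        sxy = sum((x[i] - mean_x) * (y[i] - mean_y) for i in train)
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        return [intercept + slope * x[i] for i in test]
    return fit_predict


# Hypothetical data: a linear signal plus a small deterministic perturbation.
x = [float(i) for i in range(50)]
y = [2.0 * xi + 1.0 + ((7 * i) % 5 - 2) for i, xi in enumerate(x)]
abilities = cross_validated_ability(y, make_simple_regression(x, y))
print(min(abilities), max(abilities))
```

Each element of `abilities` corresponds to one realization; the distribution of these values is what a box in the figure would summarize.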

interpretation unless the population is at HWE and H = 0.5 by design or happenstance. It may be that the only reference population that is concretely defined is the sample population. In connection to Legarra (2016), our work will allow researchers to directly obtain accurate estimates of the genomic variance in the sample population regardless of whether the assumptions of HWE are met.

When the study populations are entirely, or partially, inbred as in wheat, Arabidopsis, or inbred per se evaluations in hybrid crops, such as maize, tomato, and rice, the covariance among marker effects increases. Lehermeier et al. (2017) proposed a novel method (termed method M2) to account for the covariance of marker effects, which increases the genomic variance estimates in recombinant inbred line populations. Our analyses of the same flowering time data with

Fig. 5. Cross-validated prediction accuracy from five case studies including 22 phenotypic traits using seven GRMs. Cross-validated prediction accuracy ($r(\hat{g}, y)/\sqrt{\hat{h}^2}$) results are presented from 100 realizations of 80:20 cross-validation using the seven relationship matrices for six traits in an apple population with n = 247 entries genotyped at m = 2,829 SNPs (Kumar et al. 2015) (first row), four traits in an Arabidopsis population with n = 1,057 entries genotyped at m = 193,697 SNPs (Atwell et al. 2010) (second row), three traits in a mouse population with n = 1,814 entries genotyped at m = 10,346 SNPs (Valdar et al. 2006) (third row), five traits in a pig population with n = 3,534 entries genotyped at 52,843 SNPs (Cleveland et al. 2012) (fourth row), and four traits in a wheat population with n = 599 entries genotyped at m = 1,278 SNPs (Crossa et al. 2010) (fifth row). For the Arabidopsis data set (second row), KY systematically produced singular systems in sommer::mmer() and prediction accuracy was not estimated for either FT10l or FT16l. Each box's upper and lower halves correspond to the first and third quartiles (the 25th and 75th percentiles). The notch corresponds to the median (the 50th percentile). The upper whisker extends from the box to the highest value within 1.5 IQR of the third quartile, where IQR is the interquartile range or distance between the first and third quartiles. The lower whisker extends from the first quartile to the lowest value within 1.5 IQR of the quartile.
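The prediction accuracy plotted here appears to be the predictive ability divided by the square root of the estimated heritability. Assuming that reading of the caption's formula, the conversion can be sketched as below; the function name and the predictive ability of 0.40 are hypothetical, while the two heritability values are the ASV and VanRaden (2008) estimates for wheat GY-E1 in Table 1.

```python
import math


def prediction_accuracy(predictive_ability, h2_hat):
    """Scale a predictive ability r(g_hat, y) by the square root of estimated heritability."""
    if not 0.0 < h2_hat <= 1.0:
        raise ValueError("heritability estimate must lie in (0, 1]")
    return predictive_ability / math.sqrt(h2_hat)


# Hypothetical predictive ability; heritability estimates taken from Table 1 (wheat GY-E1).
r_hypothetical = 0.40
for label, h2 in (("ASV", 0.53), ("VanRaden (2008)", 0.35)):
    print(label, round(prediction_accuracy(r_hypothetical, h2), 3))
```

Because the heritability estimate sits in the denominator, the same predictive ability maps to a different accuracy whenever two GRMs give different heritability estimates.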

ASV yielded equivalent results and patterns to Lehermeier et al. (2017), suggesting that KASV may be providing an estimate of genomic variance that naturally accounts for linkage disequilibrium (LD) and the covariance of marker effects (Table 1). We believe that the similarity in results is because LD is associated with the off-diagonal elements of K, which is taken into account using ASV.

Discussion
GRMs are routine in human, plant, animal, and microbial genetics in agriculture, medicine, and biology for both prediction of genetic values, e.g. breeding values and polygenic scores (Hayes et al. 2010; Jensen et al. 2012; Bloom et al. 2013; Gowda et al. 2014; Lipka et al. 2014, 2015; Goddard et al. 2016; Jivanji et al. 2019;

Table 1. Genomic heritability ($\hat{h}_g^2$) estimates for the 22 traits from five case studies, including six traits in an apple population with n = 247 entries genotyped at m = 2,829 SNPs (Kumar et al. 2015), four traits in a wheat population with n = 599 entries genotyped at m = 1,278 SNPs (Crossa et al. 2010), four traits in an Arabidopsis population with n = 1,057 entries genotyped at m = 193,697 SNPs (Atwell et al. 2010), three traits in a mouse population with n = 1,814 entries genotyped at m = 10,346 SNPs (Valdar et al. 2006), and five traits in a pig population with n = 3,534 entries genotyped at 52,843 SNPs (Cleveland et al. 2012), using the seven GRMs compared in this article.

Case study   Trait    ASV    Forni et al.  VanRaden  Astle and       Yang et al.  Endelman and    IBS
                             (2011)        (2008)    Balding (2009)  (2010)       Jannink (2012)
Apple        WT       0.48   0.48          0.49      0.51            0.44         0.50            0.59
             GRE      0.51   0.51          0.52      0.52            0.53         0.53            0.62
             FF       0.77   0.77          0.78      0.75            0.70         0.79            0.84
             CRI      0.54   0.54          0.55      0.50            0.54         0.56            0.64
             JUI      0.47   0.47          0.47      0.44            0.41         0.48            0.57
             FIN      0.19   0.19          0.19      0.20            0.21         0.20            0.26
Arabidopsis  FT10l    0.92   0.92          0.76      0.77            –            0.76            0.83
             FT10sd   0.27   0.27          0.09      0.09            0.09         0.09            0.14
             FT16l    0.92   0.92          0.75      0.76            –            0.76            0.83
             FT16sd   0.55   0.55          0.26      0.26            0.29         0.26            0.36
Mouse        BMI      0.21   0.21          0.21      0.23            0.23         0.21            0.26
             Length   0.35   0.35          0.34      0.34            0.34         0.35            0.41
             Weight   0.60   0.60          0.59      0.59            0.60         0.60            0.66
Pig          T1       0.03   0.03          0.03      0.04            0.04         0.03            0.05
             T2       0.27   0.27          0.26      0.27            0.31         0.26            0.36
             T3       0.23   0.23          0.22      0.27            0.23         0.22            0.31
             T4       0.35   0.35          0.34      0.35            0.41         0.34            0.45
             T5       0.39   0.39          0.38      0.38            0.46         0.38            0.49
Wheat        GY-E1    0.53   0.53          0.35      0.33            0.39         0.36            0.46
             GY-E2    0.49   0.49          0.32      0.29            0.41         0.34            0.42
             GY-E3    0.40   0.40          0.24      0.27            0.24         0.25            0.33
             GY-E4    0.45   0.45          0.29      0.27            0.29         0.30            0.38
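The ASV and Forni et al. (2011) columns agree to two decimals for every trait, which is consistent with the two matrices differing only by one degree of freedom in the scaling denominator. The sketch below illustrates that relationship and the trace-based factor (n − 1)^{-1} tr(K). It rests on assumptions, since equations (2) and (9) are not reproduced in this section: both matrices are taken to be the column-centered marker cross-product, scaled so that tr(K)/(n − 1) = 1 for KASV and tr(K)/n = 1 for KGN. The function names and the small genotype matrix are hypothetical.

```python
def center_columns(Z):
    """Subtract each marker's mean from its column."""
    n, m = len(Z), len(Z[0])
    means = [sum(row[j] for row in Z) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in Z]


def cross_product(Zc):
    """Entry-by-entry cross-product Zc Zc'."""
    n = len(Zc)
    return [[sum(a * b for a, b in zip(Zc[i], Zc[k])) for k in range(n)]
            for i in range(n)]


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


def scaled_grm(Z, denominator_df):
    """Scale the centered cross-product so that trace(K) / denominator_df equals 1."""
    S = cross_product(center_columns(Z))
    scale = trace(S) / denominator_df
    return [[value / scale for value in row] for row in S]


# Hypothetical genotypes coded 0/1/2 for five entries and four markers.
Z = [
    [0, 1, 2, 1],
    [2, 1, 0, 0],
    [1, 1, 1, 2],
    [0, 2, 2, 1],
    [2, 0, 1, 1],
]
n = len(Z)

K_asv = scaled_grm(Z, n - 1)  # assumed form of the ASV matrix
K_gn = scaled_grm(Z, n)       # assumed form of the normalized matrix

# Trace-based factor for a variance component estimated with K_gn.
asv_factor = trace(K_gn) / (n - 1)
print(asv_factor)  # equals n / (n - 1) under these assumptions
```

Under these assumptions KGN equals KASV multiplied by n/(n − 1), so a variance component estimated with KGN would be multiplied by `asv_factor` to put it on the ASV scale. For the case studies in Table 1 (n between 247 and 3,534) that factor is at most 247/246, too close to 1 to change a heritability estimate reported to two decimals.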

Pincot et al. 2020; Petrasch et al. 2021; Fan et al. 2021), and for accounting for population structure and relatedness in marker-trait association analyses (Kang et al. 2010; Yang et al. 2010, 2011; Tian et al. 2011; Peiffer et al. 2014; Spindel et al. 2016; Alqudah et al. 2016; Pincot et al. 2018; Ferguson et al. 2021; Freebern et al. 2020). As advocated by Speed and Balding (2015) and Legarra (2016), the ragged diagonal elements of KASV equal 1, on average, and the off-diagonal elements equal 0, on average. ASV directly yields accurate estimates of genomic heritability in the observed population and can be used to adjust deviations that arise from other commonly used methods for calculating genomic relationships regardless of the population constitution, such as inbred lines and F1 hybrids, unstructured GWAS populations, and animal herds or flocks (Fig. 1).

The interpretation of genomic variance and heritability estimates was systematically affected by the available methods used to estimate K. The bias that we show in this paper is independent of sampling error (large data sets mitigate sampling error) and exists even for enormous data sets. We derived a new relationship matrix, KASV, using the ASV that yielded consistent variance component estimates. We also derived a correction factor $(n - 1)^{-1}\,\mathrm{tr}(K)$ that allowed accurate estimates of genomic heritability in the observed population from LMM analyses using various software packages (Clifford and McCullagh 2006; Endelman 2011; Zhou and Stephens 2012; Pérez and de Los Campos 2014; Akdemir and Okeke 2015; Covarrubias-Pazaran 2016; Bürkner 2017; Runcie and Crawford 2019; Butler 2021; Caamal-Pat et al. 2021).

Adopting experimental designs that enable screening of a greater number of entries n yields more precise estimates of key variance components in research programs (Smith et al. 2006; Moehring et al. 2014; Borges et al. 2019; Mackay et al. 2019; Hoefler et al. 2020) and ASV can ensure that those estimates are accurate and comparable across populations. In many plant quantitative genetic studies, the population sizes are n ≤ 500, which may pose a general problem for variance component and ratio estimation as those variance components can have high sampling variability between replicated experiments (Fig. 1). For large populations, common in human and domesticated animal studies, it is possible to obtain precise (low variance) but inaccurate (high bias) estimates of $\sigma_g^2$ and $h_g^2$ resulting from different relationship matrices, unless the assumptions of HWE happen to be perfectly met in the study population.

We did not explore differences that arise from population structure or rare alleles, which is a limitation to our simulation approach (Astle et al. 2009; Lee et al. 2012, 2013; Speed et al. 2012). We believe, but have not demonstrated, that our ASV approach could be applied to many of the existing methods that have been proposed to handle these real-world situations. For example, Lee et al. (2012) propose that K be calculated among different sets of SNPs with similar MAFs and that the genomic variances for the MAF bins be jointly estimated and summed to account for unique variation attributable to common vs rare alleles. Speed et al. (2012) proposed a scaling factor for each SNP based on its own sample variance ($\mathrm{var}(x_l)^s$), where s ranges from −2 to 2 and $x_l$ is a vector of marker genotypes at the lth locus (Speed et al. 2012; Lee et al. 2013). This means that SNPs are either being centered and scaled (s = –1), which is equal to KGN, or that SNPs are being centered but not scaled (s = 0). While Speed et al. (2012) indicate that s = –1 yields more stable estimates of $h_g^2$, it is not entirely clear how to optimally select a value of s for each locus.

Our simulations exposed systematic differences between (2) and other forms of K. Our simulation and empirical experiments also suggested limited, if any, differences between genomic variance estimates from five other commonly cited GRMs (Fig. 1; Table 1). The lack of significant differences is perturbing. In every case, there are multiple reasons given for using one relationship
matrix over any other that do not seem to play any role in either bias (accuracy) or variance (precision) of the genomic variance component estimates. Both (2) and (9) have the necessary numeric properties advocated by Speed and Balding (2015) that enable the variance components from LMM (3) to be interpreted directly as the genomic variance in the sampled population. We recommend that the ASV approach be considered for adoption by genetic researchers working in humans, microbes, or (un)domesticated plants and animals.

Data availability
The input and output data from simulations and analyses have been deposited, along with the code for the simulations, in a public Zenodo repository (https://doi.org/10.5281/zenodo.6211739).

Acknowledgments
The authors thank Andrés Legarra for suggestions that significantly improved the manuscript.
MJF, HPP, and SJK: conceptualization, investigation, project administration, resources, supervision, and writing – review and editing; MJF: data curation, formal analysis, methodology, Software, validation, visualization, and writing – original draft preparation; HPP and SJK: funding acquisition.

Funding
This research was supported by grants to SJK from the United States Department of Agriculture (http://dx.doi.org/10.13039/100000199), National Institute of Food and Agriculture (NIFA) Specialty Crops Research Initiative (#2017-51181-26833), and California Strawberry Commission (http://dx.doi.org/10.13039/100006760), in addition to funding from the University of California, Davis (http://dx.doi.org/10.13039/100007707). HPP was supported by the German Research Foundation (DFG) grant PI 377/18-1. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Conflicts of interest
None declared.

Literature cited
Akdemir D, Okeke U. Emmreml: Fitting Mixed Models with Known Covariance Structures. CRAN. R Package Version 3.1; 2015.
Albrecht T, Wimmer V, Auinger HJ, Erbe M, Knaak C, Ouzunova M, Simianer H, Schön CC. Genome-based prediction of testcross values in maize. Theor Appl Genet. 2011;123(2):339–350.
Alonso-Blanco C, Andrade J, Becker C, Bemm F, Bergelson J, Borgwardt KM, Cao J, Chae E, Dezwaan TM, Ding W, et al. 1,135 genomes reveal the global pattern of polymorphism in Arabidopsis thaliana. Cell. 2016;166(2):481–491.
Alqudah AM, Koppolu R, Wolde GM, Graner A, Schnurbusch T. The genetic architecture of barley plant stature. Front Genet. 2016;7:117.
Amadeu RR, Cellon C, Olmstead JW, Garcia AAF, Resende MFR, Muñoz PR. Aghmatrix: R package to construct relationship matrices for autotetraploid and diploid species: a blueberry example. Plant Genome. 2016;9(3):1–10.
Astle W, Balding DJ. Population structure and cryptic relatedness in genetic association studies. Stat Sci. 2009;24:451–471.
Atwell S, Huang YS, Vilhjálmsson BJ, Willems G, Horton M, Li Y, Meng D, Platt A, Tarone AM, Hu TT, et al. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature. 2010;465(7298):627–631.
Bernardo R. Breeding for Quantitative Traits in Plants, Vol. 1. Woodbury (MN): Stemma Press; 2002.
Bloom JS, Ehrenreich IM, Loo WT, Lite TLV, Kruglyak L. Finding the sources of missing heritability in a yeast cross. Nature. 2013;494(7436):234–237.
Borges A, González-Reymundez A, Ernst O, Cadenazzi M, Terra J, Gutiérrez L. Can spatial modeling substitute for experimental design in agricultural experiments? Crop Sci. 2019;59(1):44–53.
Bulmer MG. Principles of Statistics. North Chelmsford, Chelmsford, MA: Courier Corporation; 1979.
Bulmer MG. The Mathematical Theory of Quantitative Genetics. Oxford: Clarendon Press; 1980.
Bürkner PC. brms: an R package for Bayesian multilevel models using Stan. J Stat Soft. 2017;80(1):1–28.
Butler D. 2021. asreml: fits the Linear Mixed Model. R package version 4.1.0.160.
Caamal-Pat D, Pérez-Rodríguez P, Crossa J, Velasco-Cruz C, Pérez-Elizalde S, Vázquez-Peña M. lme4gs: an r-package for genomic selection. Front Genet. 2021;12(982):1–12.
Cleveland MA, Hickey JM, Forni S. A common dataset for genomic analysis of livestock populations. G3 (Bethesda). 2012;2(4):429–435.
Clifford D, McCullagh P. The regress function. Newsl R Project. 2006;6(2):6:6.
Covarrubias-Pazaran G. Genome-assisted prediction of quantitative traits using the R package sommer. PLoS One. 2016;11(6):e0156744.
Crossa J, de los Campos G, Pérez P, Gianola D, Burgueño J, Araus JL, Makumbi D, Singh RP, Dreisigacker S, Yan J, et al. Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics. 2010;186(2):713–724.
Crossa J, Perez P, Hickey J, Burgueno J, Ornella L, Cerón-Rojas J, Zhang X, Dreisigacker S, Babu R, Li Y, et al. Genomic prediction in CIMMYT maize and wheat breeding programs. Heredity (Edinb). 2014;112(1):48–60.
de los Campos G, Gianola D, Allison DB. Predicting genetic predisposition in humans: the promise of whole-genome markers. Nat Rev Genet. 2010;11(12):880–886.
de los Campos G, Sorensen D, Gianola D. Genomic heritability: what is it? PLoS Genet. 2015;11(5):e1005048.
Dudbridge F. Power and predictive accuracy of polygenic risk scores. PLoS Genet. 2013;9(3):e1003348.
Endelman JB. Ridge regression and other kernels for genomic selection with R package rrblup. Plant Genome. 2011;4(3):250–255.
Endelman JB, Jannink JL. Shrinkage estimation of the realized relationship matrix. G3 (Bethesda). 2012;2:1405–1413.
Estaghvirou SBO, Ogutu JO, Schulz-Streeck T, Knaak C, Ouzunova M, Gordillo A, Piepho HP. Evaluation of approaches for estimating the accuracy of genomic prediction in plant breeding. BMC Genomics. 2013;14:860.
Falconer D, Mackay T. Introduction to Quantitative Genetics. Harlow, Essex (UK): longmans Green; 1996.
Fan M, Hall ML, Roast M, Peters A, Delhey K. Variability, heritability and condition-dependence of the multidimensional male colour phenotype in a passerine bird. Heredity. 2021;127:300–311.
Feldmann MJ, Piepho HP, Bridges WC, Knapp SJ. Average semivariance yields accurate estimates of the fraction of marker-associated genetic variance and heritability in complex trait analyses. PLoS Genet. 2021;17(8):e1009762.
Ferguson JN, Fernandes SB, Monier B, Miller ND, Allen D, Dmitrieva A, Schmuker P, Lozano R, Valluru R, Buckler ES, et al. Machine learning-enabled phenotyping for GWAS and TWAS of WUE traits in 869 field-grown sorghum accessions. Plant Physiol. 2021;187(3):1481–1500.
Forni S, Aguilar I, Misztal I. Different genomic relationship matrices for single-step analysis using phenotypic, pedigree and genomic information. Genet Sel Evol. 2011;43:1–7.
Freebern E, Santos DJ, Fang L, Jiang J, Gaddis KLP, Liu GE, VanRaden PM, Maltecca C, Cole JB, Ma L. Gwas and fine-mapping of livability and six disease traits in holstein cattle. BMC Genomics. 2020;21(1):1–11.
Gao H, Christensen OF, Madsen P, Nielsen US, Zhang Y, Lund MS, Su G. Comparison on genomic predictions using three GBLUP methods and two single-step blending methods in the nordic holstein population. Gen Sel Evol. 2012;44:1–8.
Goddard M. Genomic selection: prediction of accuracy and maximisation of long term response. Genetica. 2009;136(2):245–257.
Goddard M, Hayes B. Genomic selection. J Anim Breed Genet. 2007;124(6):323–330.
Goddard M, Kemper K, MacLeod I, Chamberlain A, Hayes B. Genetics of complex traits: prediction of phenotype, identification of causal polymorphisms and genetic architecture. Proc Roy Soc B: Biol Sci. 2016;283:20160569.
Gorjanc G, Bijma P, Hickey JM. Reliability of pedigree-based and genomic evaluations in selected populations. Genet Sel Evol. 2015;47:1–15.
Gowda M, Zhao Y, Würschum T, Longin CFH, Miedaner T, Ebmeyer E, Schachschneider R, Kazman E, Schacht J, Martinant J-P, et al. Relatedness severely impacts accuracy of marker-assisted selection for disease resistance in hybrid wheat. Heredity (Edinb). 2014;112(5):552–561.
Habier D, Fernando RL, Dekkers JC. The impact of genetic relationship information on genome-assisted breeding values. Genetics. 2007;177(4):2389–2397.
Habier D, Fernando RL, Garrick DJ. Genomic BLUP decoded: a look into the black box of genomic prediction. Genetics. 2013;194(3):597–607.
Hayes BJ, Pryce J, Chamberlain AJ, Bowman PJ, Goddard ME. Genetic architecture of complex traits and accuracy of genomic prediction: coat colour, milk-fat percentage, and type in holstein cattle as contrasting model traits. PLoS Genet. 2010;6(9):e1001139.
Hayes BJ, Visscher PM, Goddard ME. Increased accuracy of artificial selection by using the realized relationship matrix. Genet Res (Camb). 2009;91(1):47–60.
Heffner EL, Sorrells ME, Jannink JL. Genomic selection for crop improvement. Crop Sci. 2009;49(1):1–12.
Henderson C. Best linear unbiased prediction of breeding values not in the model for records. J. Dairy Sci. 1977;60(5):783–787.
Henderson CR. Estimation of variance and covariance components. Biometrics. 1953;9(2):226–252.
Hickey JM, Veerkamp RF, Calus MP, Mulder HA, Thompson R. Estimation of prediction error variances via Monte Carlo sampling methods using different formulations of the prediction error variance. Genet Sel Evol. 2009;41:1–9.
Hill WG, Goddard ME, Visscher PM. Data and theory point to mainly additive genetic variance for complex traits. PLoS Genet. 2008;4(2):e1000008.
Hoefler R, González-Barrios P, Bhatta M, Nunes JA, Berro I, Nalin RS, Borges A, Covarrubias E, Diaz-Garcia L, Quincke M, et al. Do spatial designs outperform classic experimental designs? JABES. 2020;25(4):523–552.
Huang W, Mackay TFC. The genetic architecture of quantitative traits cannot be inferred from variance component analysis. PLoS Genet. 2016;12(11):e1006421.
Isik F, Holland J, Maltecca C. Genetic Data Analysis for Plant and Animal Breeding. Berlin (Germany): Springer; 2017.
Jensen J, Su G, Madsen P. Partitioning additive genetic variance into genomic and remaining polygenic components for complex traits in dairy cattle. BMC Genet. 2012;13:44.
Jivanji S, Worth G, Lopdell TJ, Yeates A, Couldrey C, Reynolds E, Tiplady K, McNaughton L, Johnson TJ, Davis SR, et al. Genome-wide association analysis reveals qtl and candidate mutations involved in white spotting in cattle. Genet Sel Evol. 2019;51(1):62.
Kang HM, Sul JH, Service SK, Zaitlen NA, Kong S, Freimer NB, Sabatti C, Eskin E. Variance component model to account for sample structure in genome-wide association studies. Nat Genet. 2010;42(4):348–354.
Korte A, Farlow A. The advantages and limitations of trait analysis with GWAS: a review. Plant Methods. 2013;9:29.
Krause MR, González-Pérez L, Crossa J, Pérez-Rodríguez P, Montesinos-López O, Singh RP, Dreisigacker S, Poland J, Rutkoski J, Sorrells M, et al. Hyperspectral reflectance-derived relationship matrices for genomic prediction of grain yield in wheat. G3 (Bethesda). 2019;9(4):1231–1247.
Kumar S, Molloy C, Muñoz P, Daetwyler H, Chagné D, Volz R. Genome-enabled estimates of additive and nonadditive genetic variances and prediction of apple phenotypes across environments. G3 (Bethesda). 2015;5:2711–2718.
Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38(4):963–974.
Lande R, Thompson R. Efficiency of marker-assisted selection in the improvement of quantitative traits. Genetics. 1990;124(3):743–756.
Lee SH, DeCandia TR, Ripke S, Yang J, Sullivan PF, Goddard ME, Keller MC, Visscher PM, Wray NR; Molecular Genetics of Schizophrenia Collaboration (MGS). Estimating the proportion of variation in susceptibility to schizophrenia captured by common snps. Nat Genet. 2012;44(3):247–250.
Lee SH, Yang J, Chen GB, Ripke S, Stahl EA, Hultman CM, Sklar P, Visscher PM, Sullivan PF, Goddard ME, et al. Estimation of SNP heritability from dense genotype data. Am J Hum Genet. 2013;93(6):1151–1155.
Legarra A. Comparing estimates of genetic variance across different relationship models. Theor Popul Biol. 2016;107:26–30.
Legarra A, Lourenco DA, Vitezica ZG. Bases for Genomic Prediction; 2018. [accessed 2021 May 24] http://genoweb.toulouse.inra.fr/~alegarra/GSIP.pdf.
Lehermeier C, De los Campos G, Wimmer V, Schön CC. Genomic variance estimates: with or without disequilibrium covariances? J Anim Breed Genet. 2017;134(3):232–241.
Lipka AE, Kandianis CB, Hudson ME, Yu J, Drnevich J, Bradbury PJ, Gore MA. From association to prediction: statistical methods for the dissection and selection of complex traits in plants. Curr Opin Plant Biol. 2015;24:110–118.
Lipka AE, Lu F, Cherney JH, Buckler ES, Casler MD, Costich DE. Accelerating the switchgrass (Panicum virgatum L.) breeding cycle using genomic selection approaches. PLoS One. 2014;9(11):e112227.
Lynch M, Walsh B. Genetics and Analysis of Quantitative Traits, Vol. 1. Sunderland (MA): Sinauer; 1998.
Mackay I, Piepho HP, Garcia AAF. Statistical methods for plant breeding. In: Balding D, Moltke I, Marioni J, editors. Handbook of Statistical Genomics: Two Volume Set; 2019. p. 501–520. New York, NY.
Maier RM, Zhu Z, Lee SH, Trzaskowski M, Ruderfer DM, Stahl EA, Ripke S, Wray NR, Yang J, Visscher PM, et al. Improving genetic prediction by leveraging genetic correlations among human diseases and traits. Nat Comm. 2018;9:1–17.
Makowsky R, Pajewski NM, Klimentidis YC, Vazquez AI, Duarte CW, Allison DB, de Los Campos G. Beyond missing heritability: prediction of complex traits. PLoS Genet. 2011;7(4):e1002051.
Meuwissen T, Hayes B, Goddard M. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157(4):1819–1829.
Meuwissen T, Hayes B, Goddard M. Genomic selection: a paradigm shift in animal breeding. Anim Front. 2016;6(1):6–14.
Moehring J, Williams ER, Piepho HP. Efficiency of augmented p-rep designs in multi-environmental trials. Theor Appl Genet. 2014;127(5):1049–1060.
Mrode RA. Linear Models for the Prediction of Animal Breeding Values. Boston: CABI; 2014.
Noble DW, Radersma R, Uller T. Plastic responses to novel environments are biased towards phenotype dimensions with high additive genetic variation. Proc Natl Acad Sci U S A. 2019;116(27):13452–13461.
Peiffer JA, Romay MC, Gore MA, Flint-Garcia SA, Zhang Z, Millard MJ, Gardner CA, McMullen MD, Holland JB, Bradbury PJ, et al. The genetic architecture of maize height. Genetics. 2014;196(4):1337–1356.
Pérez P, de Los Campos G. Genome-wide regression and prediction with the bglr statistical package. Genetics. 2014;198(2):483–495.
Petrasch S, Mesquida-Pesci SD, Pincot DDA, Feldmann MJ, López CM, Famula R, Hardigan MA, Cole GS, Knapp SJ, Blanco-Ulate B. Genomic prediction of strawberry resistance to postharvest fruit decay caused by the fungal pathogen Botrytis cinerea. G3 (Bethesda). 2021;12(1);jkab378.
Piaskowski J, Hardner C, Cai L, Zhao Y, Iezzoni A, Peace C. Genomic heritability estimates in sweet cherry reveal non-additive genetic variance is relevant for industry-prioritized traits. BMC Genet. 2018;19(1):23.
Piepho HP. A coefficient of determination (R2) for generalized linear mixed models. Biom J. 2019;61(4):860–872.
Pincot DD, Hardigan MA, Cole GS, Famula RA, Henry PM, Gordon TR, Knapp SJ. Accuracy of genomic selection and long-term genetic gain for resistance to verticillium wilt in strawberry. Plant Genome. 2020;13(3):e20054.
Pincot DD, Poorten TJ, Hardigan MA, Harshman JM, Acharya CB, Cole GS, Gordon TR, Stueven M, Edger PP, Knapp SJ. Genome-wide association mapping uncovers fw1, a dominant gene conferring resistance to fusarium wilt in strawberry. G3 (Bethesda). 2018;8(5):1817–1828.
R Core Team. R: A Language and Environment for Statistical Computing. Vienna (Austria): R Foundation for Statistical Computing; 2020.
Rice B, Lipka AE. Evaluation of rr-blup genomic selection models that incorporate peak genome-wide association study signals in maize and sorghum. Plant Genome. 2019;12(1):180052.
Runcie DE, Crawford L. Fast and flexible linear mixed models for genome-wide genetics. PLoS Genet. 2019;15(2):e1007978.
Rutkoski JE, Poland JA, Singh RP, Huerta-Espino J, Bhavani S, Barbier H, Rouse MN, Jannink JL, Sorrells ME. Genomic selection for quantitative adult plant stem rust resistance in wheat. Plant Genome. 2014;7(3);1–10.
Schmidt P, Hartung J, Bennewitz J, Piepho HP. Heritability in plant breeding on a genotype-difference basis. Genetics. 2019a;212(4):991–1008.
Schmidt P, Hartung J, Rath J, Piepho HP. Estimating broad-sense heritability with unbalanced data from agricultural cultivar trials. Crop Sci. 2019b;59(2):525–536.
Searle SR, Casella G, McCulloch CE. Variance Components. New York: John Wiley & Sons; 1992.
Smith A, Lim P, Cullis BR. The design and analysis of multi-phase plant breeding experiments. J Agric Sci. 2006;144(5):393–409.
Speed D, Balding DJ. Relatedness in the post-genomic era: is it still useful? Nat Rev Genet. 2015;16(1):33–44.
Speed D, Cai N, Johnson MR, Nejentsev S, Balding DJ, Consortium U; UCLEB Consortium. Reevaluation of snp heritability in complex human traits. Nat Genet. 2017;49(7):986–992.
Speed D, Hemani G, Johnson MR, Balding DJ. Improved heritability estimation from genome-wide snps. Am J Hum Genet. 2012;91(6):1011–1021.
Spindel J, Begum H, Akdemir D, Collard B, Redoña E, Jannink J, McCouch S. Genome-wide prediction models that incorporate de novo gwas are a powerful new tool for tropical rice improvement. Heredity (Edinb). 2016;116(4):395–408.
Tian F, Bradbury PJ, Brown PJ, Hung H, Sun Q, Flint-Garcia S, Rocheford TR, McMullen MD, Holland JB, Buckler ES. Genome-wide association study of leaf architecture in the maize nested association mapping population. Nat Genet. 2011;43(2):159–162.
Truong B, Zhou X, Shin J, Li J, van der Werf JH, Le TD, Lee SH. Efficient polygenic risk scores for biobank scale data by exploiting phenotypes from inferred relatives. Nat Comm. 2020;11:1–11.
Valdar W, Solberg LC, Gauguier D, Burnett S, Klenerman P, Cookson WO, Taylor MS, Rawlins JNP, Mott R, Flint J. Genome-wide genetic association of complex traits in heterogeneous stock mice. Nat Genet. 2006;38(8):879–887.
Van Heerwaarden B, Willi Y, Kristensen TN, Hoffmann AA. Population bottlenecks increase additive genetic variance but do not break a selection limit in rain forest Drosophila. Genetics. 2008;179(4):2135–2146.
van Rossum BJ, Kruijer W. statgenGWAS: Genome Wide Association Studies. CRAN. R Package Version 1.0.5; 2020.
VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91(11):4414–4423.
Visscher PM, Brown MA, McCarthy MI, Yang J. Five years of GWAS discovery. Am J Hum Genet. 2012;90(1):7–24.
Visscher PM, Hill WG, Wray NR. Heritability in the genomics era–concepts and misconceptions. Nat Rev Genet. 2008;9(4):255–266.
Visscher PM, Macgregor S, Benyamin B, Zhu G, Gordon S, Medland S, Hill WG, Hottenga J-J, Willemsen G, Boomsma DI, et al. Genome partitioning of genetic variation for height from 11,214 sibling pairs. Am J Hum Genet. 2007;81(5):1104–1110.
Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, Yang J. 10 years of GWAS discovery: biology, function, and translation. Am J Hum Genet. 2017;101(1):5–22.
Vitezica ZG, Legarra A, Toro MA, Varona L. Orthogonal estimates of variances for additive, dominance, and epistatic effects in populations. Genetics. 2017;206(3):1297–1307.
Vitezica ZG, Varona L, Legarra A. On the additive and dominant variance and covariance of individuals within the genomic selection scope. Genetics. 2013;195(4):1223–1230.
Webster R, Oliver MA. Geostatistics for Environmental Scientists. New York (NY): John Wiley & Sons; 2007.
Wickham H. ggplot2: Elegant Graphics for Data Analysis. New York (NY): Springer-Verlag; 2016.
Wray NR, Kemper KE, Hayes BJ, Goddard ME, Visscher PM. Complex trait prediction from genome data: contrasting EBV in livestock to PRS in humans: genomic prediction. Genetics. 2019;211(4):1131–1141.
Wricke G, Weber E. Quantitative Genetics and Selection in Plant Breeding. New York (NY): Walter de Gruyter; 2010.
Yadav S, Wei X, Joyce P, Atkin F, Deomano E, Sun Y, Nguyen LT, Ross EM, Cavallaro T, Ks A, et al. Improved genomic prediction of clonal performance in sugarcane by exploiting non-additive genetic effects. Theor Appl Genet. 2021;134:1–18.
Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, Madden PA, Heath AC, Martin NG, Montgomery GW, et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42(7):565–569.
Yang J, Manolio TA, Pasquale LR, Boerwinkle E, Caporaso N, Cunningham JM, de Andrade M, Feenstra B, Feingold E, Hayes MG, et al. Genome partitioning of genetic variation for complex traits using common snps. Nat Genet. 2011;43(6):519–525.
Yang J, Zeng J, Goddard ME, Wray NR, Visscher PM. Concepts, estimation and interpretation of snp-based heritability. Nat Genet. 2017;49(9):1304–1310.
Yu J, Pressoir G, Briggs WH, Bi IV, Yamasaki M, Doebley JF, McMullen MD, Gaut BS, Nielsen DM, Holland JB, et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 2006;38(2):203–208.
Zhou X, Stephens M. Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 2012;44(7):821–824.
Communicating editor: A. Lipka
