Genetic Analysis of Potato Tuber Metabolite Composition: Genome-Wide Association Studies Applied to a Nontargeted Metabolome
DOI: 10.1002/csc2.20398
Crop Science
ORIGINAL RESEARCH ARTICLE
Crop Breeding & Genetics
1 INTRODUCTION
© 2020 The Authors. Crop Science © 2020 Crop Science Society of America
TABLE 1 Characteristics of the 185 potato clones used in this study, organized by market class, skin color, and flesh color
Category          No. of clones
Market class
  Chipping        71
  French fries    36
  Fresh market    78
Skin color
  White           154
tion (49.95% methanol/0.05% formic acid diluted in water) was added to each tube, resulting in a 5:1 ratio of solvent to tissue. Due to the volume of the microtiter plate wells compared with the volume of extraction solution, the extraction solution was added in two steps: 750 μl followed by 250 μl after the tissue had time to absorb the first volume of liquid. Each plate was sonicated for 15 min at 40 kHz while immersed in a water bath at 20 °C. Subsequently, each plate was centrifuged for 10 min at 1,479g. The supernatant was filtered through a 0.2-μm polytetrafluoroethylene (PTFE) filter in preparation for liquid chromatography–mass spectrometry (LC–MS) analysis.
2.4 Raw data preprocessing
Initially, over 3,000 small molecule fragments were detected by tandem mass spectrometry. Some of these were likely to be isotopomers (identical compounds containing naturally occurring isotopic variants), whereas others could have been fragments of the same molecule, such as multiple peptides derived by hydrolysis from storage proteins. To reduce these redundancies, the R clustering package ramclustR was used to group related fragments together (Broeckling, Afsar, Neumann, & Prenni, 2014). The ramclustR package simultaneously examines the retention time and correlational similarity between all pairs of features in the dataset and performs clustering and dendrogram cutting to obtain groups of features (Broeckling et al., 2014). These groups of features are used to collapse the raw feature dataset to a spectral intensity dataset using a weighted mean function. The groups, in conjunction with the signal intensities from the MS/MS analysis, yield spectra that can be searched against MS/MS databases using the National Institute of Standards and Technology (NIST) MSSearch software, allowing our novel spectra to be compared with annotated and identified spectra (Broeckling et al., 2014). Because we detected a mixture of primary metabolites, specialized metabolites, and fragments of abundant proteins, the latter of which are not true metabolites, we will refer to “features” to describe the collection of molecules we measured by UPLC–MS. After ramclustR was applied, the 3,000+ species were reduced to a set of 981 features and used for further analysis.
The 981 features were log transformed (data in Supplemental Table S1) to obtain a normal distribution prior to calculating best linear unbiased prediction (BLUP) values. The BLUPs were calculated using the lmer function of the lme4 package in R (Bates, Mächler, Bolker, & Walker, 2015). Biological replicates and technical replicates (injections) were coded as fixed effects, whereas individual clones were coded as random effects. This allowed us to account for error due to biological and technical replicates. The BLUPs (provided in Supplemental Table S2), in turn, were used as input data for subsequent genetic analysis.
matrix as designed in GWASpoly. The genotype scores for GWASpoly were those SNP scores previously reported from the Infinium array and as amended for dosage or quality control (Endelman, 2017; Supplemental Table S3). A correction for statistical significance was used to address the problem of testing multiple hypotheses and allowed us to focus on features with stronger correlations, calculated as significance threshold (.05) divided by the number of markers tested.
The Proteomics and Metabolomics Facility (PMF) at Colorado State University sought to identify any features that were significantly associated with one or more SNP markers by comparing retention times and mass over charge (m/z) ratios with databases of known compounds. Due to the relative dearth of information in plants, the PMF staff were only able to uncover structural information on 67 SNP-associated features.
3 RESULTS
3.1 Raw data collection
Metabolites were extracted from cooked potato tubers. After preprocessing raw data from UPLC–MS/MS, 981 nonredundant metabolic features were available for further analysis. Before using the data in downstream analyses, the data were first log transformed, and then BLUPs were calculated from relative abundance data to account for variation arising from biological and technical replicates (Bates et al., 2015; Piepho, Möhring, Melchinger, & Büchse, 2008; Supplemental Table S2). Among the reduced dataset of features detected, 99 could be partially or fully identified (complete list in Supplemental Table S4). Examples of features detected included amino acids (e.g., methionine), peptides, fatty acids, and glycoalkaloids (e.g., α-chaconine and β-chaconine).
3.2 Genetic analysis of tuber metabolite composition
TABLE 2 Summary of single nucleotide polymorphism (SNP) marker allele effect sizes by autotetraploid genetic model
                 Genetic modelᵇ
Effect sizeᵃ     Additive   1-dom-alt   1-dom-ref   2-dom-alt   2-dom-ref
Min.             .04        .04         .04         .05         .04
1st quartile     .14        .10         .12         .13         .15
Median           .21        .18         .20         .22         .24
3rd quartile     .31        .28         .34         .31         .38
Max.             .77        .78         .69         .57         .69
SD               .14        .15         .15         .12         .15
SNP count        1,009      385         457         230         360
ᵃ General linear models were used to estimate marker effect sizes for all loci detected by GWASpoly. Minimum, first quartile, median, third quartile, and maximum adjusted r² (i.e., effect sizes) of population-structure-corrected metabolite feature best linear unbiased predictions (BLUPs) are reported for the 2,441 loci detected by the five genetic models.
ᵇ The additive model assesses whether dosage of a SNP allele is associated with a trait. The 1-dom-ref and 1-dom-alt models assess whether a single copy of the reference or alternate SNP allele is sufficient to confer the trait (i.e., a dominant allele). The 2-dom-ref and 2-dom-alt models assess whether two copies of either allele are sufficient to confer a trait.
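The genetic models summarized in the Table 2 footnotes can be illustrated with a minimal Python sketch. It recodes the alternate-allele dosage of a tetraploid genotype (0 to 4 copies) under each model and computes an adjusted r² from a one-predictor linear regression. The function names, the dosage values, and the BLUP values are hypothetical and chosen only for illustration; the sketch does not reproduce GWASpoly's internal coding and omits the population-structure correction applied in the study.

```python
from statistics import mean

PLOIDY = 4  # autotetraploid: alternate-allele dosage ranges from 0 to 4


def encode_dosage(dosage, model):
    """Recode an alternate-allele dosage (0..4) under one genetic model."""
    if not 0 <= dosage <= PLOIDY:
        raise ValueError("dosage must be between 0 and 4")
    if model == "additive":
        return float(dosage)                 # every extra copy counts
    if model == "1-dom-alt":
        return float(dosage >= 1)            # one alternate copy suffices
    if model == "1-dom-ref":
        return float(dosage <= PLOIDY - 1)   # one reference copy suffices
    if model == "2-dom-alt":
        return float(dosage >= 2)            # two alternate copies needed
    if model == "2-dom-ref":
        return float(dosage <= PLOIDY - 2)   # two reference copies needed
    raise ValueError(f"unknown model: {model}")


def adjusted_r2(x, y):
    """Adjusted r-squared of a simple linear regression of y on x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in predictor or response
    r2 = sxy ** 2 / (sxx * syy)
    return 1 - (1 - r2) * (n - 1) / (n - 2)  # one predictor in the model


# Hypothetical dosages and BLUPs for ten clones (illustrative values only).
dosages = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
blups = [-0.42, -0.35, -0.20, -0.11, 0.02, 0.05, 0.18, 0.27, 0.39, 0.44]

for model in ("additive", "1-dom-alt", "1-dom-ref", "2-dom-alt", "2-dom-ref"):
    coded = [encode_dosage(d, model) for d in dosages]
    print(model, round(adjusted_r2(coded, blups), 2))
```

With stair-step data such as these, the additive coding would be expected to give the largest adjusted r², whereas a trait that changes only once a copy-number threshold is crossed would favor one of the dominant codings.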
before genetic dominance is achieved, requiring a higher copy number to satisfy a threshold of functional difference between haplotypes. Lastly, the general model evaluates whether any of the single haplotype classes are correlated with a significant difference in phenotype relative to the other four. To help describe the contrasts made between haplotypes, the simplex dominant will be referred to as 1-dom ref or 1-dom alt depending on which arbitrarily selected allele is dominant, whereas duplex dominant will be referred to as 2-dom ref or 2-dom alt. The additive model of inheritance yielded the most trait–marker associations, with 307 features associated with 1,009 SNPs (Supplemental Table S5). The simplex dominant models identified 241 (1-dom ref) and 205 (1-dom alt) features associated with 457 and 385 SNPs, respectively (Supplemental Table S5). The duplex dominant models identified 160 (2-dom ref) and 203 (2-dom alt) features associated with 230 and 360 SNPs, respectively (Supplemental Table S5). Taken as groups, the additive, 1-dom ref, and 2-dom alt models had similar distributions of effect sizes, with a median effect size of ∼20% for each identified genetic marker on the BLUP estimate for each biochemical trait measured (Table 2).
3.3 Marker distribution among specific features
The SNPs associated with specific features of potato tuber metabolite composition were not evenly distributed throughout the genome. Instead, we observed that several regions were significant for dozens of features, regardless of the model of inheritance tested. Figure 1 illustrates this with
596 Crop Science LEVINA ET AL.
the additive model: distinct clusters of SNPs associated with many features were observed on chromosomes 3, 7, and 8. In every genetic model, the hot spot on chromosome 3 had the largest number of marker–trait associations. This hot spot is most pronounced in the additive genetic model, with 530 marker–trait associations observed between the markers at 47.1 and 50.8 Mbp. For both the additive and duplex dominant models, SNP c2_20259 (chromosome 3, 49,317,882 bp on the pseudomolecule) was associated with the most features; with the additive model, it was associated with 109 features, whereas with the duplex dominant model, it was correlated with 121. For the simplex dominant model (1-dom), the nearby marker SNP c2_1579 (chromosome 3, 50,455,611 bp) was associated with the most features (a total of 86). A cluster of marker–trait associations was also seen on chromosome 9 around 7.0 Mbp, but only for the simplex dominant models, with 52 associations for the 1-dom alt and 30 associations for the 1-dom ref (Supplemental Table S5).
3.4 Features with SNP association and identity information
Out of 981 features detected by analytical chemistry and preprocessed by our bioinformatics pipeline, 668 had a broad-sense heritability > .5, and 472 were significantly associated with at least one SNP according to GWASpoly (p < .05). Out of 472, we were able to partially or fully determine the identity of 67. Of these, 28 represented peptides and 39 represented other small molecules, including four glycoalkaloids or their derivatives. Table 3 lists all identified features and their broad-sense heritability and provides the genomic location of the SNPs most strongly associated with each. By way of illustration, we draw readers’ attention to mapping results for two chemical families.
3.5 Amino acids
Unlike many other staple crops, potato tubers can have high levels of unincorporated amino acids in addition to those found in storage proteins (Synge, 1977). It is therefore not surprising to detect free amino acids in our samples; 5 of the 67 mapped and identified compounds were amino acids (arginine, tryptophan, tyrosine, valine, and methionine; Table 3). Tryptophan, valine, and methionine are among the essential amino acids for human and animal health. Methionine is of additional significance in potato because one of its breakdown products, methional, is a key component of flavor (Di et al., 2003). Two SNPs on chromosome 7, c1_8186 and c2_55833, were correlated with methionine levels.
3.6 Genomic regions associated with glycoalkaloids
Four different glycoalkaloids were identified among the features detected by metabolomic profiling. Genome-wide association study identified 21 SNP–glycoalkaloid associations, tagging nine unique positions in the genome on 8 of the 12 potato chromosomes (Table 4, Supplemental Table S5). Alpha-chaconine was associated with two SNPs, c1_12771 and c2_43960, located on chromosomes 2 and 7, respectively (Figures 2a and 2b, Table 4). The chromosome 7 locus has an additive mode of inheritance, whereas the chromosome 2 locus operates in a simplex dominant fashion, as illustrated in the box-and-whisker plots of Figures 2a and 2b, respectively. For α-solamarine, two linked SNPs on chromosome 7 (c1_8186 and c2_55833) were significant. SNP c2_55833 shows an additive mode of inheritance, which is illustrated in Figure 2c. Beta-chaconine was associated with four SNPs: c2_32677, c2_27485, and c2_32710, all on chromosome 8, and c2_1579 on chromosome 3 (Table 4, Figure 2d). The additive mode of inheritance for c2_32677 is illustrated in Figure 2d, whereas c2_1579 shows a simplex dominant mode of inheritance (Supplemental Table S5). For solasodine, 10 SNPs across several chromosomes (1, 5, 8, 10, and 11) were significant, and most of the SNPs were significant only for the general model (Table 4, Supplemental Table S5).
4 DISCUSSION
The overall goal of this project was to link specific metabolic features of potato tubers with genetic markers, anticipating that this will eventually make it possible to more effectively breed for desired potato composition. We chose to profile cooked samples, rather than raw, as potatoes are normally cooked before consumption. Although we were restricted to examining only one chemical extract and applying only one separation chemistry with UPLC–MS/MS, we demonstrate that a large number of metabolite features in the potato tuber are genetically determined and dramatically increase the number of traits mapped using GWAS.
In this study, we detected 981 features, 472 of which were correlated with at least one SNP; a total of 423 unique SNPs were associated with at least one feature. It is likely that extensive linkage disequilibrium (LD) made it possible to map so many features, as extensive LD ensures that even a few thousand markers are enough to provide a high level of genome coverage during GWAS. Linkage disequilibrium in potato is known to be extensive, presumably due to clonal propagation with relatively few meioses occurring since potato cultivar improvement began (D’hoop, Paulo, Visser, van Eck, & van Eeuwijk, 2011). As statistical methods have improved,
TABLE 3 Identified metabolic features, their broad-sense heritabilities (H²), and the name and chromosomal location of single nucleotide polymorphism (SNP) markers most closely associated with each feature
Metabolite identity H² Most significant SNP(s)ᵃ Chromosomeᵇ −log10(p)ᶜ
α-chaconine .92 c1_12771, c2_43960 2, 7 5.3, 5.0
α-solamarine .79 c1_8186 7 6.7
β-chaconine .84 c2_1579, c2_32677 3, 8 6.8, 13.9
solasodine .84 c2_49910, c2_49128, c1_5566, c2_44249, c2_13593 1, 5, 8, 10, 11 6.4, 6.5, 9.6, 5.3, 8.1
tetrahydropentoxyline-like .87 c2_32677 8 22.5
cyclopamine-like .83 c2_7747 10 4.7
imperialine-like .81 c2_29945 9 14.4
arginine .72 c2_12125, c2_1579 1, 3 5.2, 10.1
Gly Ser Val Arg .88 c2_19749 7 9.7
guanine .75 c2_51113, c2_12368 2, 11 7.8, 4.8
leucyl-leucyl-leucine .76 c2_32677 8 15.9
methionine .87 c2_46709 7 7.1
peptide .90 c2_20259 3 16.6
peptide .85 c2_20259 3 5.4
peptide .87 c2_56971 6 4.7
peptide .86 c2_44120 7 15.3
peptide .87 c2_55833 7 5.6
peptide .84 c2_19749 7 9.6
peptide .78 c2_55833 7 6
peptide .90 c2_46213 12 8.2
peptide .54 c2_23251 12 7.7
peptide .88 c2_20259, c2_46858 3, 11 14.2, 5.0
peptide .48 c2_44120, c1_11363 7, 4 15.5, 4.8
peptide .49 c2_55833, c2_44513 7, 8 6.3, 4.7
peptide, charge state = 2 .81 c2_1579 3 18.7
peptide, charge state = 2 .73 c1_2689 12 5.7
peptide, charge state = 3 .85 c2_32677 8 7.8
peptide, charge state = 3 .83 c2_32677 8 5.8
peptide, charge state = 3 .76 c2_32677 8 6.6
peptide, charge state = 4 .68 c2_20259 3 15
peptide, charge state = 4 .89 c2_47510 3 6
peptide, charge state = 4 .90 c2_1579 3 5.8
peptide, charge state = 5 .74 c1_13911 2 12.7
peptide, charge state = 5 .55 c2_1579 3 13.4
peptide, lysine/arginine moiety .92 c2_46709 7 8.7
peptide, phenylalanine and arginine moiety .93 c2_46709 7 8.2
peptide, charge state = 4 .93 c2_15533 10 8.2
peptide .93 c2_20274 3 41.2
Phe Trp Gly-like .66 c2_15533 10 14.4
peptide .20 c2_11005 9 4.7
peptide (as if from chymotrypsin series) .71 c2_20259 3 24.2
tryptophan .59 c1_2117 6 5.5
tyrosine .78 c1_10001 7 21.6
Val Arg Ile Tyr .68 c1_13483 7 7.2
valine .88 c2_40155 2 11.5
lysophospholipid .54 c2_14704 1 4.7
PE(P-38:2) .85 c2_4709, c1_7574, c2_43588 1, 4, 7 5.6, 5.5, 6.05
PG(44:2) .76 c2_1579 3 18.4
9-hydroxy-7-megastigmen-3-one glucoside-like .89 c2_32677, c1_5936 8, 11 6.9, 5.3
phaseic acid-like .70 c1_14293 2 8.2
C24 H38 O4 –cholanoic acid like .46 c2_56971 6 4.9
C24 H38 O4 –cholic acid-like .82 c1_10001 7 17.4
trihydroxy-octadecadienoic acid .93 c2_20274 3 47.1
trihydroxy-octadecenoic acid .73 c1_13911 2 9.1
linoleamide .92 c2_58296 3 7.6
DAT(40:2), di-acyl conjugate of disaccharide .90 c2_32677 8 7.2
hydroxyoreadone-like .55 c2_32677 8 7
thiamine .78 c2_58296 3 10.3
phosphate .54 c2_58296 3 9.3
choline .85 c2_1579, c2_32677 3, 8 5.3, 6.8
aminobenzoic acid .89 c2_46994, c1_9056, c2_53676 6, 10, 11 13.1, 10.4, 4.9
caffeoylputrescine .65 c2_17218 3 4.9
dehydroquinic acid .63 c2_1579 3 7.7
eriodictyol-O-glucoside .89 c2_20259 3 24.9
n-feruloyltyramine .36 c2_20178 3 5.6
hydroxyspheroidenone .55 c2_41044 8 8.7
thonningianin A-like .86 c1_6997, c2_32677 6, 8 5.4, 18.1
ᵃ The SNP markers most strongly correlated with each feature by GWASpoly.
ᵇ The chromosome each SNP is located on, respectively.
ᶜ The strength of each SNP–feature association [−log10(p)], respectively, as ascertained by GWASpoly.
estimates for the extent of LD have been refined, revealing that more recent cultivar releases (since 2005) have less extensive LD than older cultivars (prior to 1974; r² ≤ .1 at 2 Mb rather than 10 Mb, respectively; Vos et al., 2017). However, LD may be more extensive within recent large introgressions from wild relatives or other exotic germplasm that have brought in elite alleles at important loci (Vos et al., 2017).
Unexpectedly, SNPs associated with tuber metabolite composition features were not randomly distributed across the potato genome. Several small regions, on chromosomes 3, 7, and 8, were each associated with a dozen or more features (Figure 1). This parallels a recent description of the potato tuber proteome, where hundreds of proteins were measured quantitatively in a diploid mapping population and >150 were successfully mapped using QTL analysis (Acharjee et al., 2018). Similar to our report, the factors influencing tuber protein composition in that study were not evenly distributed but rather concentrated in several regions of the genome, with small intervals of chromosomes 3, 8, and 9 each correlated with 10–25 proteins across both seasons,
Glycoalkaloid   SNP markerᵃ   Chromosomeᵇ   Positionᶜ (bp)   Modelᵈ      −log10(p)ᵉ
β-chaconine     c2_32677      8             1,237,709        additive    12.6
                c2_1579       3             50,455,611       1-dom-ref   6.8
α-solamarine    c1_8186       7             3,246,279        additive    6.7
α-chaconine     c1_12771      2             44,810,486       1-dom-ref   5.3
                c2_43960      7             2,300,074        additive    5.0
solasodine      c2_49910      1             86,095,990       general     6.4
                c2_49128      5             47,454,994       general     6.5
                c2_54725      5             14,541,535       general     5.1
                c1_5566       8             55,137,894       general     9.6
                c2_29025      8             399,666          additive    5.1
                c2_44249      10            3,908,588        general     5.3
                c2_13593      11            35,913,857       general     8.1
ᵃ Markers listed were significant at the marker-number-corrected p value and had the highest −log10(p) for any given genetic model:chromosome combination.
ᵇ The chromosome each single nucleotide polymorphism (SNP) marker is located on.
ᶜ SNP chromosome position in version 4.03 of the potato genome (Sharma et al., 2013; The Potato Genome Consortium, 2011).
ᵈ Autotetraploid genetic model, as implemented in GWASpoly, that detected each SNP–glycoalkaloid association.
ᵉ The strength of each SNP–glycoalkaloid association [−log10(p)], as ascertained by GWASpoly.
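The marker-number correction and the −log10(p) scale used in the tables can be sketched in a few lines of Python. The sketch assumes the correction is a plain division of the .05 threshold by the number of markers tested, as the text describes. The marker count of 3,000 is a hypothetical placeholder, not the number used in the study, and the function names are illustrative only.

```python
import math

ALPHA = 0.05  # significance level stated in the text


def corrected_threshold(n_markers, alpha=ALPHA):
    """Significance threshold divided by the number of markers tested."""
    if n_markers < 1:
        raise ValueError("n_markers must be at least 1")
    return alpha / n_markers


def neg_log10(p_value):
    """Express a p value on the -log10 scale used in Manhattan plots."""
    return -math.log10(p_value)


def is_significant(p_value, n_markers, alpha=ALPHA):
    """True if a marker's p value passes the corrected threshold."""
    return p_value <= corrected_threshold(n_markers, alpha)


n_markers = 3000  # hypothetical marker count, NOT taken from the study
threshold = corrected_threshold(n_markers)
print("corrected p threshold:", threshold)
print("threshold on -log10 scale:", round(neg_log10(threshold), 2))

# A reported -log10(p) of 6.4 corresponds to p = 10 ** -6.4.
print(is_significant(10 ** -6.4, n_markers))
```

On the −log10 scale, a larger marker count raises the bar a marker must clear, which is why the dashed threshold line in a Manhattan plot depends on how many tests were performed.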
and chromosome 5 having a hot spot in only one season (Acharjee et al., 2018). Unfortunately, most of the genetic markers in that study have little information available for them. Acharjee et al. (2018) reported that their hot spot on chromosome 3, which spanned about 10 cM, contained the beta-carotene hydroxylase (bch) gene. As bch is located about 6 Mb (on the pseudomolecule) from our chromosome 3 hot spot, it is possible that the clusters of tuber metabolite composition QTL we found on chromosome 3 describe the same region. Additional annotation of the genetic markers of Acharjee et al. (2018), especially their location, would help answer this question more conclusively. About half of the features explained by chromosome 3 in our study were peptide fragments, which likely correspond to highly abundant proteins or peptides within the tuber. Near SNP c2_20259 are a number of Kunitz-type invertase inhibitors and protease inhibitors (Hirsch et al., 2014), some of which are expressed at very high levels in tubers. It has been shown in maize, using a similar metabolic profiling and GWAS approach, that mapping variation in levels of a C-terminal hydrolysis fragment of an abundant α-zein seed storage protein reveals the location of the corresponding seed storage gene (Shen et al., 2013).
Even though we were able to identify many features, including amino acids, fatty acids, and glycoalkaloids, this study was hampered by lack of a comprehensive plant metabolite database. Plant metabolome databases are still being populated and expanded, especially for LC–MS data. The two main metabolite databases that have plant metabolites are MetaCyc and KEGG, which contain 11,991 and 15,161 compounds, respectively (Altman, Travers, Kothari, Caspi, & Karp, 2013). By way of comparison, the human metabolome database has entries for 114,100 metabolites, including both water- and lipid-soluble metabolites (Wishart et al., 2007).
Potato is autotetraploid and strikingly heterozygous, averaging one SNP every 15 bp in introns and every 24 bp in exons (Uitdewilligen et al., 2013). GWASpoly is well suited for conducting GWAS in polyploids because it takes marker dosage into account as it examines four different inheritance models. The additive model assesses whether dosage of an allele is correlated with a trait, whereas the simplex dominant and duplex dominant models of inheritance test whether one or two copies of an allele are needed for a phenotype. The general model asks whether the phenotype of any genotype is different from any other genotype and is thus more useful as an indicator of a potential story than as an explanation of genetic action. Additive genetic models are the most common explanations in this dataset, with about 65% of the features associated with SNPs (307 of 472) appearing to act in an additive fashion. A visually compelling example of a locus that acts additively in autotetraploid potato to influence β-chaconine levels is provided in Figure 2d. That is, each additional copy of the elite allele increases the output of the trait in a stair-step fashion.
Although potato flavor has not been intensively studied from an analytical perspective, it is an important trait for consumer acceptance, especially for fresh market varieties. One of the compounds we detected is methionine, which breaks down during cooking (through the Strecker degradation
FIGURE 2 Identification of single nucleotide polymorphism (SNP) markers associated with select glycoalkaloids. The R package GWASpoly was used to identify markers associated with glycoalkaloid best linear unbiased prediction (BLUP) values. For each of the four panels, a Manhattan plot on the left presents the genome-wide association study (GWAS) results and contains a quantile–quantile (QQ) plot as an inset. On the right of each panel is a box-and-whisker plot showing distribution of BLUP scores by tetraploid genotype. Within the Manhattan plots, the dashed line indicates the significance threshold of .05 (corrected for the number of marker tests performed), whereas the arrow shows the SNP highlighted in the box-and-whisker plot. The QQ plots report observed vs. expected results for each SNP (as −log10 p values) and illustrate goodness of fit for the tested model. Within the box-and-whisker plots, population quartiles are shown with red boxes, and population means by horizontal green lines. Above each box-and-whisker plot, genotypes not sharing a letter are significantly different from each other (p < .05) according to ANOVA, whereas the numbers show the number of individuals observed for each tetraploid genotype. Evaluation of SNPs for association with (a) α-chaconine using an additive model of inheritance, (b) α-chaconine using a simplex dominant model, (c) α-solamarine using an additive model, and (d) β-chaconine using an additive model
reaction) to methional, which is responsible for the aroma of baked potato (Di et al., 2003). Two SNPs (c1_8186 and c2_55833) correlated with methionine levels might be useful for improving potato flavor, although it should be noted that these two SNPs, located in the region of chromosome 7 associated with many features, were also associated (in coupling) with an undesirable compound, α-solamarine. Understanding where linkage between metabolic features works in the breeder’s favor, and where recombination will be needed to break up undesirable linkages, is important in an applied breeding program.
Glycoalkaloids help potatoes defend against insects and other herbivores yet, at high levels, are poisonous to humans, and thus breeders have to ensure that new varieties have acceptably low levels in tubers. High levels of tuber glycoalkaloids are common in the offspring of crosses made with wild species (Kozukue et al., 2008). It would help breeders if DNA markers could be developed to identify clones with high levels of glycoalkaloids early, before many resources have been invested in their evaluation. In this study, we found 21 SNPs associated with various glycoalkaloids. It is important to note that our statistical associations were made to the relative abundance of each feature, and not to absolute levels. The GWAS detected glycoalkaloid-associated SNPs on chromosomes 1, 2, 3, 5, 7, 8, 10, and 11. Notably, none of the 21 SNPs associated with glycoalkaloids mapped to regions that harbor known glycoalkaloid synthetic and regulatory genes (i.e., genes reported by Cárdenas et al., 2016; Itkin et al., 2013; Mariot et al., 2016). Alleles of any gene that substantially increase tuber glycoalkaloid content are likely to have been strongly selected against during domestication. In future studies, we can test whether the loci reported here have predictive value for glycoalkaloid content so as to save time and effort in the breeding pipeline.
The large number of metabolic features that we were able to associate with SNP markers increases the number
of traits that have been mapped using GWAS in potato to date by at least one order of magnitude. Nevertheless, even though fully half of the features we measured could be mapped, in most cases, the associated SNPs only explained a small fraction of the phenotypic variation. Our efforts have added value to the SolCAP potato diversity panel, increasing the breadth and depth of knowledge associated with this important community resource. However, it is clear that a larger diversity panel should increase the resolution of GWAS efforts in the future and allow the detection of more genes of smaller genetic effect than our current threshold of 4%. Although this is a critique commonly applied to the first wave of diversity panels and GWAS mapping studies performed in various plant systems, the extent of LD within the potato genome allowed us to conduct genome-wide mapping with a fraction of the marker density required in most other plants; even so, mapping power could be improved further by genotyping with haplotype-specific markers. Thus, for a relatively small investment, we expect that expansion of the SolCAP panel into a second iteration would substantially benefit the U.S. and international potato breeding communities. Likewise, as plant metabolite databases develop, and it becomes possible to identify more LC–MS features, it should eventually be possible to conduct considerably more targeted breeding of potato tuber metabolite composition than is possible at present.
ACKNOWLEDGMENTS
We would like to thank Shuping Cheng, Matt Falise, James Keach, Teddy Yesudasan, Maria Carrizales, Nick Kaczmar, and Jaebum Park for their invaluable help in sample processing and field management. This project was partially supported by USDA National Institute of Food and Agriculture Plant Breeding and Education Grant no. 2010-85117-20551 and USDA National Research Initiative Grant no. 2008-55300-04757 (the SolCAP project).
AUTHOR CONTRIBUTIONS
Anna Levina: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Software; Validation; Visualization; Writing-original draft; Writing-review & editing. Owen Hoekenga: Conceptualization; Data curation; Formal analysis; Methodology; Supervision; Validation; Visualization; Writing-original draft; Writing-review & editing. Mikhail Gordin: Data curation; Software; Validation; Visualization; Writing-review & editing. Corey Broeckling: Data curation; Formal analysis; Methodology. Walter DeJong: Conceptualization; Funding acquisition; Investigation; Methodology; Project administration; Resources; Supervision; Visualization; Writing-original draft; Writing-review & editing.
CONFLICT OF INTEREST
The authors declare no conflict of interest.
ORCID
Anna V. Levina https://orcid.org/0000-0001-6967-9363
Owen Hoekenga https://orcid.org/0000-0003-4427-2000
REFERENCES
Acharjee, A., Chibon, P. Y., Kloosterman, B., America, T., Renaut, J., Maliepaard, C., & Visser, R. G. F. (2018). Genetical genomics of quality related traits in potato tubers using proteomics. BMC Plant Biology, 18. https://doi.org/10.1186/s12870-018-1229-1
Altman, T., Travers, M., Kothari, A., Caspi, R., & Karp, P. D. (2013). A systematic comparison of the MetaCyc and KEGG pathway databases. BMC Bioinformatics, 14. https://doi.org/10.1186/1471-2105-14-112
Balcke, G. U., Handrick, V., Bergau, N., Fichtner, M., Henning, A., Stellmach, H., . . . Frolov, A. (2012). An UPLC-MS/MS method for highly sensitive high-throughput analysis of phytohormones in plant tissues. Plant Methods, 8. https://doi.org/10.1186/1746-4811-8-47
Baldwin, S. J., Dodds, K. G., Auvray, B., Genet, R. A., Macknight, R. C., & Jacobs, J. M. E. (2011). Association mapping of cold-induced sweetening in potato using historical phenotypic data. Annals of Applied Biology, 158, 248–256. https://doi.org/10.1111/j.1744-7348.2011.00459.x
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1). https://doi.org/10.18637/jss.v067.i01
Broeckling, C. D., Afsar, F. A., Neumann, S., & Prenni, J. E. (2014). RAMClust: A novel feature clustering method enables spectral-matching-based annotation for metabolomics data. Analytical Chemistry, 86, 6812–6817. https://doi.org/10.1021/ac501530d
Broeckling, C. D., Heuberger, A. L., Prince, J. A., Ingelsson, E., & Prenni, J. E. (2013). Assigning precursor-product ion relationships in indiscriminant MS/MS data from non-targeted metabolite profiling studies. Metabolomics, 9, 33–43. https://doi.org/10.1007/s11306-012-0426-4
Camire, M. E., Kubow, S., & Donnelly, D. J. (2009). Potatoes and human health. Critical Reviews in Food Science and Nutrition, 49, 823–840. https://doi.org/10.1080/10408390903041996
Cárdenas, P. D., Sonawane, P. D., Pollier, J., Vanden Bossche, R., Dewangan, V., Weithorn, E., . . . Aharoni, A. (2016). GAME9 regulates the biosynthesis of steroidal alkaloids and upstream isoprenoids in the plant mevalonate pathway. Nature Communications, 7. https://doi.org/10.1038/ncomms10654
De Vos, R. C. H., Moco, S., Lommen, A., Keurentjes, J. J. B., Bino, R. J., & Hall, R. D. (2007). Untargeted large-scale plant metabolomics using liquid chromatography coupled to mass spectrometry. Nature Protocols, 2, 778–791. https://doi.org/10.1038/nprot.2007.95
D’hoop, B. B., Keizer, P. L. C., Paulo, M. J., Visser, R. G. F., van Eeuwijk, F. A., & van Eck, H. J. (2014). Identification of agronomically important QTL in tetraploid potato cultivars using a marker-trait association analysis. Theoretical and Applied Genetics, 127, 731–748. https://doi.org/10.1007/s00122-013-2254-y
D’hoop, B. B., Paulo, M. J., Mank, R. A., van Eck, H. J., & van Eeuwijk, F. A. (2008). Association mapping of quality traits in potato (Solanum tuberosum L.). Euphytica, 161, 47–60. https://doi.org/10.1007/s10681-007-9565-5
D’hoop, B. B., Paulo, M. J., Visser, R. G. F., van Eck, H. J., & van Eeuwijk, F. A. (2011). Phenotypic analyses of multi-environment data for two diverse tetraploid potato collections: Comparing an academic panel with an industrial panel. Potato Research, 54, 157–181. https://doi.org/10.1007/s11540-011-9186-1
Di, R., Kim, J., Martin, M. N., Leustek, T., Jhoo, J., Ho, C. T., & Tumer, N. E. (2003). Enhancement of the primary flavor compound methional in potato by increasing the level of soluble methionine. Journal of Agricultural and Food Chemistry, 51, 5695–5702. https://doi.org/10.1021/jf030148c
Douches, D. (2008). SolCAP germplasm. Solanaceae Coordinated Agricultural Project. Retrieved from http://solcap.msu.edu/potato_germplasm_data.shtml
Endelman, J. B. (2017). Software. Endelman Group. Retrieved from http://potatobreeding.cals.wisc.edu/software/
FAOSTAT. (2013). FAOSTAT. Rome: FAO. Retrieved from http://www.fao.org/faostat/en/#data/FBS
Friedman, M. (2006). Potato glycoalkaloids and metabolites: Roles in the plant and in the diet. Journal of Agricultural and Food Chemistry, 54, 8655–8681. https://doi.org/10.1021/jf061471t
Friedman, M., & Levin, C. E. (2016). Glycoalkaloids and calystegine alkaloids in potatoes. In Advances in potato chemistry and technology (2nd ed., pp. 167–194). Amsterdam: Elsevier. https://doi.org/10.1016/B978-0-12-800002-1.00007-8
Gebhardt, C., Ballvora, A., Walkemeier, B., Oberhagemann, P., & Schüler, K. (2004). Assessing genetic potential in germplasm collections of crop plants by marker-trait association: A case study for potatoes with quantitative variation of resistance to late blight and maturity type. Molecular Breeding, 13, 93–102. https://doi.org/10.1023/B:MOLB.0000012878.89855.df
Heuberger, A., Broeckling, C. D., Kirkpatrick, K., & Prenni, J. E. (2013). Application of nontargeted metabolite profiling to discover novel markers of quality traits in an advanced population of malting barley. Plant Biotechnology Journal, 12, 147–160. https://doi.org/10.1111/pbi.12122
Hirsch, C. D., Hamilton, J. P., Childs, K. L., Cepela, J., Crisovan, E., Vaillancourt, B., . . . Buell, C. R. (2014). Spud DB: A resource for mining sequences, genotypes, and phenotypes to accelerate potato breeding. The Plant Genome, 7. https://doi.org/10.3835/plantgenome2013.12.0042
Hirsch, C. N., Hirsch, C. D., Felcher, K., Coombs, J., Zarka, D., Van Deynze, A., . . . Buell, C. R. (2013). Retrospective view of North American potato (Solanum tuberosum L.) breeding in the 20th and 21st centuries. G3: Genes, Genomes, Genetics, 3, 1003–1013. https://doi.org/10.1534/g3.113.005595
Itkin, M., Heinig, U., Tzfadia, O., Bhide, A. J., Shinde, B., Cardenas, P. D., . . . Aharoni, A. (2013). Biosynthesis of antinutritional alkaloids in Solanaceous crops is mediated by clustered genes. Science, 341, 175–179. https://doi.org/10.1126/science.1240230
Kozukue, N., Yoon, K.-S., Byun, G.-I., Misoo, S., Levin, C. E., & Friedman, M. (2008). Distribution of glycoalkaloids in potato tubers of 59 accessions of two wild and five cultivated Solanum species. Journal of Agricultural and Food Chemistry, 56, 11920–11928. https://doi.org/10.1021/jf802631t
Li, L., Paulo, M. J., van Eeuwijk, F., & Gebhardt, C. (2010). Statistical epistasis between candidate gene alleles for complex tuber traits in an association mapping population of tetraploid potato. Theoretical and Applied Genetics, 121, 1303–1310. https://doi.org/10.1007/s00122-010-1389-3
Lindqvist-Kreuze, H., Gastelo, M., Perez, W., Forbes, G. A., de Koeyer, D., & Bonierbale, M. (2014). Phenotypic stability and genome-wide association study of late blight resistance in potato genotypes adapted to the tropical highlands. Phytopathology, 104, 624–633. https://doi.org/10.1094/PHYTO-10-13-0270-R
Malosetti, M., Van Der Linden, C. G., Vosman, B., & Van Eeuwijk, F. A. (2007). A mixed-model approach to association mapping using pedigree information with an illustration of resistance to Phytophthora infestans in potato. Genetics, 175, 879–889. https://doi.org/10.1534/genetics.105.054932
Mariot, R. F., de Oliveira, L. A., Voorhuijzen, M. M., Staats, M., Hutten, R. C. B., van Dijk, J. P., . . . Frazzon, J. (2016). Characterization and transcriptional profile of genes involved in glycoalkaloid biosynthesis in new varieties of Solanum tuberosum L. Journal of Agricultural and Food Chemistry, 64, 988–996. https://doi.org/10.1021/acs.jafc.5b05519
Navarre, D. A., Goyer, A., & Shakya, R. (2009). Nutritional value of potatoes: Vitamin, phytonutrient, and mineral content. In J. Singh & L. Kaur (Eds.), Advances in potato chemistry and technology (1st ed., pp. 395–424). New York: Academic Press.
Piepho, H. P., Möhring, J., Melchinger, A. E., & Büchse, A. (2008). BLUP for phenotypic selection in plant breeding and variety testing. Euphytica, 161, 209–228. https://doi.org/10.1007/s10681-007-9449-8
Roddick, J. G. (1996). Steroidal glycoalkaloids: Nature and consequences of bioactivity. Advances in Experimental Medicine and Biology, 404, 277–295. https://doi.org/10.1007/978-1-4899-1367-8_25
Rosyara, U. R., De Jong, W. S., Douches, D. S., & Endelman, J. B. (2016). Software for genome-wide association studies in autopolyploids and its application to potato. The Plant Genome, 9(2). https://doi.org/10.3835/plantgenome2015.08.0073
Sharma, S. K., Bolser, D., de Boer, J., Sønderkær, M., Amoros, W., Carboni, M. F., . . . Bryan, G. J. (2013). Construction of reference chromosome-scale pseudomolecules for potato: Integrating the potato genome with genetic and physical maps. G3: Genes, Genomes, Genetics, 3, 2031–2047. https://doi.org/10.1534/g3.113.007153
Shen, M., Broeckling, C. D., Chu, E. Y., Ziegler, G., Baxter, I. R., Prenni, J. E., & Hoekenga, O. A. (2013). Leveraging non-targeted metabolite profiling via statistical genomics. PLOS ONE, 8(2). https://doi.org/10.1371/journal.pone.0057667
Synge, R. L. M. (1977). Free amino acids of potato tubers: A survey of published results set out according to potato variety. Potato Research, 20, 1–7. https://doi.org/10.1007/BF02362296
The Potato Genome Consortium (2011). Genome sequence and analysis of the tuber crop potato. Nature, 475, 189–195. https://doi.org/10.1038/nature10158
Tian, J., Chen, J., Ye, X., & Chen, S. (2016). Health benefits of the potato affected by domestic cooking: A review. Food Chemistry, 202, 165–175. https://doi.org/10.1016/j.foodchem.2016.01.120
Tingey, W. M. (1984). Glycoalkaloids as pest resistance factors. American Potato Journal, 61, 157–167. https://doi.org/10.1007/BF02854036
Uitdewilligen, J. G. A. M. L., Wolters, A.-M. A., D’hoop, B. B., Borm, T. J. A., Visser, R. G. F., & van Eck, H. J. (2013). A next-generation sequencing method for genotyping-by-sequencing of highly heterozygous autotetraploid potato. PLOS ONE, 10(10). https://doi.org/10.1371/journal.pone.0062355
Urbany, C., Stich, B., Schmidt, L., Simon, L., Berding, H., Junghans, H., . . . Gebhardt, C. (2011). Association genetics in Solanum tuberosum provides new insights into potato tuber bruising and enzymatic tissue discoloration. BMC Genomics, 12. https://doi.org/10.1186/1471-2164-12-7
Vos, P. G., Paulo, M. J., Voorrips, R. E., Visser, R. G. F., van Eck, H. J., & van Eeuwijk, F. A. (2017). Evaluation of LD decay and various LD-decay estimators in simulated and SNP-array data of tetraploid potato. Theoretical and Applied Genetics, 130, 123–135. https://doi.org/10.1007/s00122-016-2798-8
Wishart, D. S., Tzur, D., Knox, C., Eisner, R., Guo, A. C., & Young, N. (2007). HMDB: Human metabolome database. Nucleic Acids Research, 35, D521–D526. https://doi.org/10.1093/nar/gkl923

SUPPORTING INFORMATION

Additional supporting information may be found online in the Supporting Information section at the end of the article.

How to cite this article: Levina AV, Hoekenga O, Gordin M, Broeckling C, De Jong WS. Genetic analysis of potato tuber metabolite composition: Genome-wide association studies applied to a nontargeted metabolome. Crop Science. 2021;61:1–13. https://doi.org/10.1002/csc2.20398