Biplot Analysis of MET Data IITA
Weikai Yan
May 2006
Contact: [email protected]
Multi-Environment Trials (MET)
• MET are essential
• MET are expensive
• MET data are valuable
• MET data are not fully used
Weikai Yan2006
Why biplot analysis?
• Biplot analysis can help understand MET
data
– Graphically,
– Effectively,
– Conveniently
Outline
• Multi-environment trial (MET) data
• Basics of biplot analysis
• Biplot analysis of G-by-E data
• Biplot analysis of G-by-T data
• Better understanding of MET data
• Conclusions
Multi-environment
trial data
MET data is
a genotype-environment-trait
(G-E-T) 3-way table
• Multiple Genotypes
• Multiple Environments
• Multiple Traits
A G-E-T 3-way table contains
many 2-way tables
• G by E: for each trait
• G by T (trait): in each environment;
across environments
• E by T: for each genotype; across
genotypes
A G-E-T 3-way table is
an extended 2-way table
• G by V:
– each E-T combination as a variable (V)
• P by T:
– each G-E combination as a phenotype
(P)
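The slicing and unfolding described above can be sketched with a NumPy array. This is only an illustration: the dimensions, the random values, and all variable names are assumptions, not data from the talk.

```python
import numpy as np

# Hypothetical sizes: 4 genotypes, 3 environments, 2 traits.
n_g, n_e, n_t = 4, 3, 2
rng = np.random.default_rng(0)
get = rng.normal(size=(n_g, n_e, n_t))  # G-E-T 3-way table

# G by E table for one trait (here trait 0).
g_by_e = get[:, :, 0]

# G by T table in one environment, and across environments (mean over E).
g_by_t_env0 = get[:, 0, :]
g_by_t_mean = get.mean(axis=1)

# E by T table for one genotype, and across genotypes (mean over G).
e_by_t_geno0 = get[0, :, :]
e_by_t_mean = get.mean(axis=0)

# G by V: each E-T combination becomes one variable (column).
g_by_v = get.reshape(n_g, n_e * n_t)

# P by T: each G-E combination becomes one phenotype (row).
p_by_t = get.reshape(n_g * n_e, n_t)
```

With this layout, column `e * n_t + t` of `g_by_v` holds trait `t` in environment `e`, and row `g * n_e + e` of `p_by_t` holds genotype `g` in environment `e`.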
A G-E-T 3-way table implies
informative 2-way tables
• Association by environment 2-way
tables
– Associations:
• among traits
• between traits and genetic markers
Goals of MET data analysis
• Short-term goals:
– Variety evaluation
• Response to the environment (G x E)
• Trait profiles (G x T)
• Long-term goals:
– To understand
• the target environment (G x E)
• the test environments (G x E)
• the crop (G x T)
• the genotype x environment interaction (A x E)
Basics of biplot
analysis
Most two-way tables can be
visually studied using biplots
Origin of biplot
Gabriel (1971)
One of the most important advances in data analysis in recent decades
Currently…
• > 50,000 web pages
• Numerous academic publications
• Included in most statistical analysis packages
• Still a very new technique to most scientists
Prof. Ruben Gabriel, “The founder of biplot”
Courtesy of Prof. Purificación Galindo
University of Salamanca, Spain
What is a biplot?
• “Biplot” = “bi” + “plot”
– “plot”
• scatter plot of two rows OR of two columns, or
• scatter plot summarizing the rows OR the columns
– “bi”
• BOTH rows AND columns
• 1 biplot >> 2 plots
Mathematical definition of a Biplot
Graphical display of matrix multiplication
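Matrix multiplication: $P = A \times B$
[Figure: biplot of row points A and column points B around the origin O, on X and Y axes; annotated example: vector length 4.472, cos α = 0.8944]
– $P_{ij} = |OA_i| \cdot |OB_j| \cdot \cos\alpha_{ij}$, where $\alpha_{ij}$ is the angle between vectors $OA_i$ and $OB_j$
– Implies the product matrix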
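A minimal numeric check of the inner-product rule. The two points (4, 3) and (2, 4) are assumed here for illustration; they reproduce the vector length 4.472 and the cosine 0.8944 shown on the slide.

```python
import numpy as np

a = np.array([4.0, 3.0])  # assumed row point A
b = np.array([2.0, 4.0])  # assumed column point B

len_a = np.linalg.norm(a)  # |OA|
len_b = np.linalg.norm(b)  # |OB|

# Angle between the two vectors, measured from their directions on the plot.
alpha = np.arctan2(b[1], b[0]) - np.arctan2(a[1], a[0])
cos_ab = np.cos(alpha)

# Geometric reading of the biplot versus the algebraic inner product.
product_from_plot = len_a * len_b * cos_ab
product_from_matrix = a @ b
```

Both routes give the same element of the product matrix, which is what lets a biplot be read as a picture of matrix multiplication.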
Practical definition of a biplot
“Any two-way table can be analyzed using a 2D-biplot as soon as it can be
sufficiently approximated by a rank-2 matrix.” (Gabriel, 1971)
(Now 3D-biplots are also possible…)
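Matrix decomposition: P(4, 3) = G(4, 2) × E(2, 3)
G-by-E table P (rows g1 to g4, columns e1 to e3), genotype matrix G (columns x, y), environment matrix E (rows x, y):
$$
\begin{bmatrix} 20 & -9 & 6 \\ 6 & 12 & -15 \\ -10 & -6 & 9 \\ 8 & -12 & 12 \end{bmatrix}
=
\begin{bmatrix} 4 & 3 \\ -3 & 3 \\ 1 & -3 \\ 4 & 0 \end{bmatrix}
\begin{bmatrix} 2 & -3 & 3 \\ 4 & 1 & -2 \end{bmatrix}
$$
[Figure: biplot of genotypes G1 to G4 and environments E1 to E3 plotted from their x and y values, with O the origin, on X and Y axes]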
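A sketch of the decomposition idea in NumPy. The signed coordinates below are assumptions chosen to agree with the point positions on the plot axes. The script multiplies a 4 × 2 genotype matrix by a 2 × 3 environment matrix, confirms that the product has rank 2, and recovers a rank-2 factorisation from the table alone.

```python
import numpy as np

# Assumed coordinates of genotypes (rows) and environments (columns).
G = np.array([[4, 3], [-3, 3], [1, -3], [4, 0]], dtype=float)
E = np.array([[2, -3, 3], [4, 1, -2]], dtype=float)

P = G @ E                       # the 4 x 3 G-by-E table
rank = np.linalg.matrix_rank(P)

# Starting from P only, SVD gives a rank-2 factorisation.
U, s, Vt = np.linalg.svd(P, full_matrices=False)
P_rank2 = (U[:, :2] * s[:2]) @ Vt[:2]
```

Because `P` has rank 2, `P_rank2` reproduces it up to floating-point error, so a 2D biplot displays this table without loss.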
Singular Value Decomposition (SVD) &
Singular Value Partitioning (SVP)
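SVD: $Y_{ij} = \sum_{k=1}^{r} a_{ik} \lambda_k b_{kj}$
– $r$: the ‘rank’ of Y, i.e., the minimum number of PCs required to fully represent Y
– $a_{ik}$: matrix characterising the rows
– $\lambda_k$: “singular values”
– $b_{kj}$: matrix characterising the columns
SVP: $Y_{ij} = \sum_{k=1}^{r} (a_{ik} \lambda_k^{f}) (\lambda_k^{1-f} b_{kj})$, where $f$ is the partitioning factor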
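A minimal sketch of SVD followed by singular value partitioning, assuming a partition factor `f` between 0 and 1. The function name `svp_scores` and the test table are hypothetical.

```python
import numpy as np

def svp_scores(Y, f, n_axes=2):
    """Row and column scores from SVD with partition factor f."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    U, s, Vt = U[:, :n_axes], s[:n_axes], Vt[:n_axes]
    rows = U * s**f             # a_ik * lambda_k^f
    cols = Vt.T * s**(1 - f)    # lambda_k^(1-f) * b_kj
    return rows, cols

# Hypothetical rank-2 table: 5 rows, 4 columns.
rng = np.random.default_rng(0)
Y = rng.normal(size=(5, 2)) @ rng.normal(size=(2, 4))

# The product of the scores is the same whatever f is used.
products = [r @ c.T for r, c in (svp_scores(Y, f) for f in (0.0, 0.5, 1.0))]
```

The choice of `f` changes how the points are spread on the plot, not the table they reproduce.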
Inner-product property
• A biplot with f = 1 approximates $YY^T$, the distance matrix
– Similarity/dissimilarity among row (genotype) factors
• A biplot with f = 0 approximates $Y^TY$, the variance matrix
– Similarity/dissimilarity among column (environment) factors
• Combined use of f = 0 and f = 1
(Gabriel, 2002, Biometrika; Yan, 2002, Agron J; built into the GGEbiplot software)
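The two partitioning choices can be checked numerically. In this sketch the table is random and hypothetical, and all axes are kept so that the equalities are exact rather than approximate.

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.normal(size=(6, 4))                  # hypothetical two-way table
U, s, Vt = np.linalg.svd(Y, full_matrices=False)

rows_f1 = U * s       # f = 1: singular values assigned to the rows
cols_f0 = Vt.T * s    # f = 0: singular values assigned to the columns

row_products = rows_f1 @ rows_f1.T   # compare with Y Y^T
col_products = cols_f0 @ cols_f0.T   # compare with Y^T Y
```

With only the first two axes retained, as in a 2D biplot, these become approximations whose quality depends on the goodness of fit.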
Biplot analysis is…
to use biplots to display
– a two-way data table per se ($Y$),
– its distance matrix ($YY^T$), and
– its variance matrix ($Y^TY$)
so that
– relationships among rows,
– relationships among columns, and
– interactions between rows and columns
can be graphically visualized.
Data centering prior to biplot analysis
• The general linear model for a G-by-E
data set (P)
– P = M + G + E + GE
• Possible two-way “tables” (Y):
• Y = P = M + G + E + GE —original data: QQE biplot
• Y = P – M = G + E + GE —global-centered (PCA)
• Y = P – M – E = G + GE —column-centered: GGE biplot
• Y = P – M – G = E + GE —row-centered
• Y = P – M – G – E = GE —double-centered: GE biplot
All models are useful, depending on the research objectives (built in GGEbiplot)
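The five centering options can be sketched as one function. The mode names and the function `center` are hypothetical labels for this illustration, not GGEbiplot settings; rows are taken as genotypes and columns as environments.

```python
import numpy as np

def center(P, mode):
    """Return the two-way table Y for a given centering of P (G rows, E columns)."""
    grand = P.mean()                                   # M
    row_eff = P.mean(axis=1, keepdims=True) - grand    # G
    col_eff = P.mean(axis=0, keepdims=True) - grand    # E
    if mode == "none":
        return P.copy()                      # M + G + E + GE
    if mode == "global":
        return P - grand                     # G + E + GE
    if mode == "column":
        return P - grand - col_eff           # G + GE
    if mode == "row":
        return P - grand - row_eff           # E + GE
    if mode == "double":
        return P - grand - row_eff - col_eff # GE
    raise ValueError(f"unknown centering mode: {mode}")

rng = np.random.default_rng(0)
P = rng.normal(loc=5.0, size=(5, 3))   # hypothetical G-by-E table
Y_gge = center(P, "column")
Y_ge = center(P, "double")
```

Column centering leaves every environment with a mean of zero, so the remaining variation is G plus GE.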
Data scaling prior to biplot analysis
• Different GGE biplots
• $Y_{ij} = (G_i + GE_{ij})/s_j$, where $s_j$ is the scaling factor for environment $j$
• $s_j = 1$: no scaling
(built in GGEbiplot)
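A sketch of scaling after column centering. The slide only lists the no-scaling case; dividing by the within-environment standard deviation is shown here as an assumed alternative, and the function name `gge_table` is hypothetical.

```python
import numpy as np

def gge_table(P, scale=False):
    """Column-centered table (G + GE), optionally divided by s_j."""
    Y = P - P.mean(axis=0, keepdims=True)
    if scale:
        # Assumed option: s_j is the standard deviation within environment j.
        s = Y.std(axis=0, ddof=1, keepdims=True)
    else:
        s = 1.0   # s_j = 1: no scaling
    return Y / s

rng = np.random.default_rng(0)
P = rng.normal(loc=5.0, scale=[1.0, 3.0, 0.5], size=(6, 3))  # hypothetical data
Y_unscaled = gge_table(P)
Y_scaled = gge_table(P, scale=True)
```

After scaling, every environment has unit standard deviation, so no single environment dominates the biplot because of its units or its spread.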
Four questions must be asked
before trying to interpret a biplot
[Diagram: the three components of G-by-E data analysis: genotype evaluation, test-environment evaluation, and mega-environment analysis]
Sample G-by-E data
(Yield data of 18 genotypes in 9 environments, 1993, Ontario, Canada)
Before trying to interpret a biplot…
1. Model selection? Centering = 2 (“G+GE”); Scaling = 0
2. Goodness of fit? 78%.
3. Singular value partitioning? SVP = 2 (environment-metric)
4. Draw to scale? Yes.
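Goodness of fit for a 2D biplot is commonly computed as the share of the total sum of squares captured by the first two singular values. The sketch below assumes that definition; the function name and the test tables are hypothetical.

```python
import numpy as np

def goodness_of_fit(Y, n_axes=2):
    """Proportion of the total sum of squares explained by the first n_axes PCs."""
    s = np.linalg.svd(Y, compute_uv=False)
    return float((s[:n_axes] ** 2).sum() / (s ** 2).sum())

rng = np.random.default_rng(0)
P = rng.normal(size=(18, 9))            # hypothetical 18 x 9 table
Y = P - P.mean(axis=0)                  # column-centered (G + GE)
fit_random = goodness_of_fit(Y)

# A table of exact rank 2 is displayed without loss.
Y_rank2 = rng.normal(size=(18, 2)) @ rng.normal(size=(2, 9))
fit_rank2 = goodness_of_fit(Y_rank2)
```

A value such as the 78% quoted above means the biplot shows most, but not all, of the G + GE variation.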
G By E data analysis
[Diagram: genotype evaluation, test-environment evaluation, and mega-environment analysis]
• A mega-environment is a group of geographical locations that consistently share the same (set of) best genotypes across years.
Relationships among environments
The “Environment-vector” view
• Angle vs. correlation: the cosine of the angle between two environment vectors approximates the correlation between them
• The angles among test environments
• Environment grouping
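The link between angle and correlation can be checked numerically. This sketch assumes column-centered data and environment-metric partitioning (f = 0), and keeps all axes so the match is exact; the data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
P = rng.normal(size=(8, 4))          # hypothetical 8 genotypes x 4 environments
Y = P - P.mean(axis=0)               # column-centered (G + GE)

U, s, Vt = np.linalg.svd(Y, full_matrices=False)
env = Vt.T * s                       # environment vectors, f = 0, all axes

lengths = np.linalg.norm(env, axis=1)
cosines = (env @ env.T) / np.outer(lengths, lengths)

# Pearson correlations among environments, computed directly from the data.
corr = np.corrcoef(P, rowvar=False)
```

In a 2D biplot only the first two axes are drawn, so the cosines approximate the correlations to the extent allowed by the goodness of fit.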
“Which-won-where”
[Figure: which-won-where view of the GGE biplot; genotypes labelled in the figure: G7, G18, G12, G13, G8]
ME: mega-environment
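A simplified numeric stand-in for the which-won-where view: instead of drawing the polygon, it takes the biplot-fitted values and groups environments by their winning genotype. The function name and the data are hypothetical, and grouping from one year of data is only a first hint at mega-environments.

```python
import numpy as np

def which_won_where(P, n_axes=2):
    """Group environment indices by the genotype with the highest fitted value."""
    Y = P - P.mean(axis=0)                              # G + GE
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    fitted = (U[:, :n_axes] * s[:n_axes]) @ Vt[:n_axes]  # biplot approximation
    winners = fitted.argmax(axis=0)                      # best genotype per environment
    groups = {}
    for env, geno in enumerate(winners):
        groups.setdefault(int(geno), []).append(env)
    return groups

rng = np.random.default_rng(3)
P = rng.normal(size=(18, 9))     # hypothetical 18 genotypes x 9 environments
groups = which_won_where(P)
```

Each key of `groups` is a winning genotype and each value lists the environments it wins, which mirrors the sectors of the polygon view.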
G By E data analysis
[Diagram: genotype evaluation, test-environment evaluation, and mega-environment analysis]
Discriminating ability and representativeness
Vector length: discriminating ability
Angle to the AE: representativeness
Average-environment axis
Average environment
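A sketch of the two test-environment measures, assuming column-centered data, environment-metric partitioning (f = 0), and an average environment defined as the mean of the environment coordinates. The function name and the data are hypothetical.

```python
import numpy as np

def evaluate_environments(P):
    """Vector length and angle to the average-environment axis, per environment."""
    Y = P - P.mean(axis=0)
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    env = Vt[:2].T * s[:2]                   # environment vectors on PC1 and PC2
    avg_env = env.mean(axis=0)               # the average environment
    lengths = np.linalg.norm(env, axis=1)    # discriminating ability
    cos = env @ avg_env / (lengths * np.linalg.norm(avg_env))
    angles = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))  # representativeness
    return lengths, angles

rng = np.random.default_rng(4)
g_main = rng.normal(scale=2.0, size=(18, 1))   # hypothetical genotype main effects
P = g_main + rng.normal(size=(18, 9))          # hypothetical yield table
lengths, angles = evaluate_environments(P)
```

A long vector with a small angle marks an environment that is both discriminating and representative.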
Ideal test environments:
discriminating and representative
Ideal test
environment
Classify each test environment into
one of three categories
Discriminative | Not discriminative
Vector length = discrimination
= GE = GE1 + GE2
Contribution to proportionate GE
Contribution to non-proportionate GE
G By E data analysis
[Diagram: genotype evaluation, test-environment evaluation, and mega-environment analysis]
Vector length = GGE = G + GE
Contribution To GE
(instability)
Contribution To G
(mean performance)
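The split of a genotype vector into a mean-performance part and an instability part can be sketched as a projection onto the average-environment axis and onto its perpendicular. The function name, the default partition factor `f = 1`, and the data are assumptions of this illustration.

```python
import numpy as np

def mean_and_stability(P, f=1.0):
    """Project genotype scores on the average-environment axis and its perpendicular."""
    Y = P - P.mean(axis=0)
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    geno = U[:, :2] * s[:2] ** f              # genotype scores
    env = Vt[:2].T * s[:2] ** (1 - f)         # environment scores
    axis = env.mean(axis=0)
    axis = axis / np.linalg.norm(axis)        # unit vector along the axis
    perp = np.array([-axis[1], axis[0]])      # unit vector perpendicular to it
    along = geno @ axis                       # contribution to G (mean performance)
    across = geno @ perp                      # contribution to GE (instability)
    return geno, along, across

rng = np.random.default_rng(5)
g_main = rng.normal(scale=2.0, size=(18, 1))  # hypothetical genotype main effects
P = g_main + rng.normal(size=(18, 9))         # hypothetical yield table
geno, along, across = mean_and_stability(P)
```

Genotypes can then be ranked on `along`, with the absolute value of `across` used as a modifier for stability.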
Mean vs. Stability
Genotype ranking on both MEAN and STABILITY
“The ideal
genotype”
Genotype classification
Mean / Stability | High mean performance | Low mean performance
Important comments:
– (2) and (3) are meaningful only for a single mega-environment
– Any stability analysis is meaningful only for a single mega-environment
– Any stability index can be used only as a modifier to the ranking based on mean performance
Other ways to view
a GGE biplot
Inner-product property
Ranking on a single environment
Ranking on two environments
Relative adaptation of a genotype
Compare any two genotypes
Biplot analysis of
Genotype by trait data
Objectives of G By T data analysis
Data of 4 traits for 19 covered oat
varieties (Ontario 2004)
(Background info: High yield, high groat, high protein, and low oil are desirable for milling oats)
Relationships among traits
Trait profile of each genotype
Trait profile of a genotype
Trait profile comparison between two
genotypes
Genotype ranking based on a trait
Parent selection based on trait profiles
Independent culling
Fuller understanding
of MET data
MET data are more informative
than you thought
A G-E-T 3-way dataset contains
various 2-way tables
• G by E data
• G by T data
• E by T data:
– for each genotype; all genotypes
• G by V data:
– each E-T as a variable (V)
• P by T data:
– each G-E as a phenotype (P)
• Genetic association by environment data
• Trait association by environment data
Genetic-covariate by environment biplot
(QTL by environment biplot)
Barley
Genomics
Data
Trait-association by environment biplot
Oat
MET
Data
Four-way data analysis
• Year…
Conclusions
Conclusion (1)
• “GGE biplot analysis” is an effective tool for G by E data analysis to understand…
1. the target environment,
2. the test environments, and
3. the genotypes
• Stability analysis is useful only within a single mega-environment
Conclusion (2)
• “GGE biplot analysis” is an effective tool for G by T data analysis to understand…
1. the interconnected plant system,
2. positively correlated traits,
3. negatively correlated traits, and
4. the strengths and weaknesses of the genotypes
Conclusion (3)
• “Biplot analysis” is an effective tool for
other two-way table analysis
–Marker by environment
–QTL by environment
–Gene by treatment
–Diallel cross
–…
Conclusion (4)
• Biplot analysis can be VERY EASY…
– From reading data to displaying the biplot: 2 seconds
– Displaying any of the perspectives of a biplot and
changing from one to another: 1 second
– Displaying the biplot for any subset: 1 second
– Learning how to use the software and interpret
biplots: 30 minutes
– Everything can be just one mouse-click away
Thank you
Contact: Weikai Yan: [email protected]
web: www.ggebiplot.com