Biplot Analysis of Data
SUMMARY
Diallel crosses have been used in genetic research to determine the inheritance of a trait among a set of genotypes and to identify superior parents for hybrid or cultivar development. Conventional analysis of diallel data is limited to partitioning the total variation attributable to crosses into general combining ability (GCA) of each parent and specific combining ability (SCA) of each cross. The SCA effects are just residuals not explained by the GCA effects; they are cross specific and do not provide much information on parents. The biplot approach of diallel data analysis introduced in this chapter allows a much better understanding of parents. For a given set of data, the following information can be easily visualized: 1) the GCA effect of each parent; 2) the SCA effect of each parent (not cross); 3) the best crosses; 4) the best testers; 5) the heterotic groups; and 6) genetic constitutions of parents with regard to the trait under investigation.

Diallel crosses represent matings made in all possible combinations among a set of genotypes. They have been widely used in genetic research for investigating inheritance of quantitative traits among a set of genotypes. There are four types of diallel mating schemes (Griffing, 1956): 1) method 1 diallel cross with parents and reciprocals; 2) method 2 diallel cross with parents but without reciprocals; 3) method 3 diallel cross with reciprocals but without parents; and 4) method 4 diallel cross without parents and reciprocals. Reciprocals are made for the purpose of detecting any maternal effect. We confine our discussion to method 2 diallel cross, although other types of diallel crosses can be easily accommodated. Conventionally, analysis of diallel cross data is conducted to partition total genetic variation into GCA of the parents and SCA of the crosses. In this chapter, we will use two sets of diallel data (Tables 9.1 and 9.2) to demonstrate the biplot approach to diallel analysis.
The first dataset is used to demonstrate the general steps and utilities of biplot analysis for diallel data. The second dataset is used to exemplify analysis of a large dataset for which a biplot of the first principal component (PC1) vs. the second principal component (PC2) may not be adequate. A discussion of both datasets is necessary because they contain contrasting entry-by-tester interaction or gene action patterns.
TABLE 9.1 Resistance to FHB of Seven Winter Wheat Genotypes and their F1 Hybrids, as Measured by Percentage of Uninfected Kernels Upon Inoculation
                              Testers
Entries      A      B      C      D      E      F      G    Mean
A         27.5   35.7   46.4   53.7   33.3   64.9   43.3    43.5
B         35.7   37.5   46.2   40.8   51.9   45.6   57.5    45.0
C         46.4   46.2   38.7   49.1   50.4   55.6   69.4    50.8
D         53.7   40.8   49.1   51.2   49.4   48.1   57.5    50.0
E         33.3   51.9   50.4   49.4   42.5   63.1   68.9    51.4
F         64.9   45.6   55.6   48.1   63.1   60.0   63.1    57.2
G         43.3   57.5   69.4   57.5   68.9   63.1   43.7    57.6
Mean      43.5   45.0   50.8   50.0   51.4   57.2   57.6    50.8
GCA       -7.3   -5.8    0.0   -0.8    0.6    6.4    6.8     0.0
Note: The genotype codes are A = Alidos, B = 81-F379, C = Arina, D = SVP7201717510, E = SVP-C87155, F = UNG136.1, and G = Ung226.1.
Source: Data based on Buerstmayr, H. et al., Euphytica, 110:199-206, 1999.
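The Mean and GCA rows of Table 9.1 can be checked with a few lines of code. The sketch below assumes that the tabulated GCA value of a parent is simply its mean across testers minus the grand mean; this is inferred from the numbers in the table and is not necessarily the estimator used in a conventional Griffing analysis. The names `FHB`, `PARENTS`, and `row_mean_gca` are introduced here for illustration only.

```python
# Table 9.1: rows are entries A-G, columns are testers A-G (symmetric matrix).
PARENTS = ["A", "B", "C", "D", "E", "F", "G"]
FHB = [
    [27.5, 35.7, 46.4, 53.7, 33.3, 64.9, 43.3],
    [35.7, 37.5, 46.2, 40.8, 51.9, 45.6, 57.5],
    [46.4, 46.2, 38.7, 49.1, 50.4, 55.6, 69.4],
    [53.7, 40.8, 49.1, 51.2, 49.4, 48.1, 57.5],
    [33.3, 51.9, 50.4, 49.4, 42.5, 63.1, 68.9],
    [64.9, 45.6, 55.6, 48.1, 63.1, 60.0, 63.1],
    [43.3, 57.5, 69.4, 57.5, 68.9, 63.1, 43.7],
]


def row_mean_gca(table):
    """Return (row means, grand mean, row-mean deviations from the grand mean)."""
    means = [sum(row) / len(row) for row in table]
    n_cells = sum(len(row) for row in table)
    grand = sum(sum(row) for row in table) / n_cells
    # Assumption: GCA is read as the deviation of a parent's mean from the grand mean.
    return means, grand, [m - grand for m in means]


means, grand, gca = row_mean_gca(FHB)
for parent, mean, effect in zip(PARENTS, means, gca):
    print(f"{parent}: mean = {mean:.1f}, GCA = {effect:+.1f}")
print(f"grand mean = {grand:.1f}")
```

Under this reading, parents A and B fall below the grand mean and parents F and G fall well above it, which is the pattern the biplot is later used to visualize.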
In diallel cross data, a parent is both an entry and a tester. The model used for biplot analysis of diallel data is the tester-centered principal component analysis. It was labeled as Equation 4.5 and is presented again below:

$$Y_{ij} - \mu - \beta_j = g_{i1} e_{1j} + g_{i2} e_{2j} + \varepsilon_{ij}$$

where $Y_{ij}$ is the expected value of the cross between entry $i$ and tester $j$; $\mu$ is the grand mean; $\beta_j$ is the main effect of tester $j$; $g_{i1}$ and $e_{1j}$ are called the primary effects for entry $i$ and tester $j$, respectively; $g_{i2}$ and $e_{2j}$ are the secondary effects for entry $i$ and tester $j$, respectively; and $\varepsilon_{ij}$ is the residue not explained by the primary and secondary effects. A biplot is constructed by plotting $g_{i1}$ against $g_{i2}$ and $e_{1j}$ against $e_{2j}$ in a single scatter plot. Using GGEbiplot software, a biplot will be generated in about one second after the data are read (Figure 9.1). The parents are presented in lowercase italics when they are viewed as entries; they are in regular uppercase when viewed as testers. The biplot explained 77% (46 and 31% by PC1 and PC2, respectively) of the total variation, which, in conventional analyses, would be partitioned into GCA effects of the parents and SCA effects of the crosses. The guidelines cross at the biplot origin (0,0). What we can learn from this biplot follows.
FIGURE 9.1 Biplot based on the wheat FHB research data. The seven parents are in lowercase when viewed as entries and in uppercase when viewed as testers. [Plot not reproduced; axes are PC1 (horizontal) and PC2 (vertical), symmetrical scaling.]
GCA effects, whereas entry a (Alidos, the most susceptible parent) had the lowest GCA effect. The GCA effects of the entries are in the order of: g ≈ f > d > c > b ≈ e > a. Note that this order is highly consistent with that of the last column in Table 9.1 except that E was misplaced. We see later that E is something special; it is the best tester among the tested parents (see Section 9.5). The correlation between the GCA effects and the projections onto the ATC axis is 0.926.
9.5 THE BEST TESTERS FOR ASSESSING GENERAL COMBINING ABILITY OF PARENTS
An ideal tester for revealing the GCA effects of entries should fulfill two criteria: it should be representative of all testers and, at the same time, be most discriminating of the entries. Based on this definition, an ideal tester must be located on the ATC axis to be representative of all testers; its vector should be the longest of all testers to be most discriminating. Such a tester is indicated by the center of the concentric circles in Figure 9.4. Although such an ideal tester may not exist in reality, it can be used as a reference to compare the real testers. The concentric circles are drawn for this purpose with the hypothesized ideal tester at the center (Figure 9.4). The closer a tester is to this ideal center, the more desirable it is. Clearly, tester E was the best tester in this dataset, as it was very close to the ideal tester. On the contrary, G was the poorest tester as it is the least representative of all testers. Figure 9.4 can be brought about by a single click on the GGEbiplot function Compare With the Ideal Tester (Chapter 6).
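The two criteria above translate into a simple geometric rule, sketched below. The ideal tester is placed on the ATC axis (the line from the origin through the average tester) at a distance equal to the longest tester vector, and testers are ranked by their distance to that point. The coordinates are hypothetical values made up for illustration; they are not read from Figure 9.4, and the function names are likewise assumptions.

```python
import math

# Hypothetical tester coordinates (PC1, PC2); NOT values read from Figure 9.4.
TESTERS = {
    "A": (20.0, 12.0),
    "B": (18.0, 3.0),
    "C": (22.0, -15.0),
    "E": (33.0, 1.0),
    "G": (8.0, 24.0),
}


def ideal_tester(coords):
    """Point on the ATC axis whose distance from the origin equals the longest tester vector."""
    n = len(coords)
    avg_x = sum(x for x, _ in coords.values()) / n
    avg_y = sum(y for _, y in coords.values()) / n
    avg_len = math.hypot(avg_x, avg_y)
    unit_x, unit_y = avg_x / avg_len, avg_y / avg_len  # direction of the ATC axis
    longest = max(math.hypot(x, y) for x, y in coords.values())
    return longest * unit_x, longest * unit_y


def rank_testers(coords):
    """Tester names ordered from closest to farthest from the ideal tester."""
    ideal_x, ideal_y = ideal_tester(coords)
    distance = {name: math.hypot(x - ideal_x, y - ideal_y) for name, (x, y) in coords.items()}
    return sorted(distance, key=distance.get)


ranking = rank_testers(TESTERS)
print("testers from most to least desirable:", ranking)
```

With these made-up coordinates the tester with a long vector near the ATC axis ranks first and the tester far off the axis ranks last, which mirrors the reasoning applied to E and G in the text.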
FIGURE 9.2 Average tester coordination (ATC) view of the biplot based on the wheat FHB research data. The small circle represents the average tester. Entries are in lowercase italics and testers are in uppercase. [Plot not reproduced; axes are PC1 (horizontal) and PC2 (vertical), symmetrical scaling.]
FIGURE 9.3 The tester vector view of the biplot based on the wheat FHB research data. The seven parents are in lowercase when viewed as entries and in uppercase when viewed as testers. [Plot not reproduced; axes are PC1 (horizontal) and PC2 (vertical), tester-focused scaling.]
FIGURE 9.4 Visual evaluation of the parents as testers. The concentric center represents the ideal tester, which is the most discriminating of entries and has no preference for mating partners, i.e., with zero specific combining ability. The seven parents are in lowercase italics when viewed as entries and in uppercase when viewed as testers. [Plot not reproduced; axes are PC1 (horizontal) and PC2 (vertical), tester-focused scaling.]
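[Figure 9.5 not reproduced: value of the F1 hybrid with tester E plotted against the GCA effect of each entry; the plot carries the annotation R = 0.69.]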
That tester E is the best tester implies that the GCA effect of an entry can be reasonably assessed by the value of its hybrid with tester E. To verify this, the F1 hybrid between each entry and tester E is plotted against the GCA effect of the entry (Figure 9.5). The two are highly correlated, except that entries b and e are off the regression line. The performance of BE was above expectation based on the overall pattern of the data.
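The relationship described here can be checked directly from Table 9.1. The sketch below makes two assumptions that the text does not confirm: GCA is taken as each parent's row mean minus the grand mean, and the value for entry e is the pure line E itself (the diagonal cell). It computes the Pearson correlation, fits a least-squares line, and lists the two entries that deviate most from it. The helper names are hypothetical, and the correlation may differ from the value annotated on the original figure if that was computed differently.

```python
import math

PARENTS = ["A", "B", "C", "D", "E", "F", "G"]
FHB = [
    [27.5, 35.7, 46.4, 53.7, 33.3, 64.9, 43.3],
    [35.7, 37.5, 46.2, 40.8, 51.9, 45.6, 57.5],
    [46.4, 46.2, 38.7, 49.1, 50.4, 55.6, 69.4],
    [53.7, 40.8, 49.1, 51.2, 49.4, 48.1, 57.5],
    [33.3, 51.9, 50.4, 49.4, 42.5, 63.1, 68.9],
    [64.9, 45.6, 55.6, 48.1, 63.1, 60.0, 63.1],
    [43.3, 57.5, 69.4, 57.5, 68.9, 63.1, 43.7],
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def linear_fit(xs, ys):
    """Least-squares slope and intercept for regressing ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


# Assumption: GCA is approximated by the row mean minus the grand mean.
grand = sum(sum(row) for row in FHB) / (len(FHB) * len(FHB[0]))
gca = [sum(row) / len(row) - grand for row in FHB]

# Hybrids with tester E; the cell for entry e is the pure line E itself.
e_col = PARENTS.index("E")
with_e = [row[e_col] for row in FHB]

r = pearson(gca, with_e)
slope, intercept = linear_fit(gca, with_e)
residuals = {p: y - (intercept + slope * g) for p, g, y in zip(PARENTS, gca, with_e)}
largest = sorted(residuals, key=lambda p: abs(residuals[p]), reverse=True)[:2]
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
print("entries farthest from the fitted line:", largest)
```

Under these assumptions the cross of B with E sits above the fitted line and the pure line E sits below it, in line with the remark that b and e are off the regression line.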
FIGURE 9.6 Polygon view of the biplot, showing the best crosses among all possible combinations. The best crosses are g by C, B, D, E, and F in the g sector and f by A and E in the f sector. The parents are in lowercase italics when viewed as entries and in uppercase when viewed as testers. [Plot not reproduced; axes are PC1 (horizontal) and PC2 (vertical), tester-focused scaling.]
All testers had positive PC1 scores (Figure 9.7), implying that the interaction between the entries and the testers displayed by PC1 is of non-crossover type or proportional interaction (Crossa and Cornelius, 1997; Yan et al., 2000). Thus, the entry PC1 scores should approximate the GCA effects, as discussed in Section 9.2. Figure 9.7 indicates three groups relative to GCA. Group 1 contains entry a only, with the smallest GCA; Group 2 includes entries b, c, d, and e, with intermediate GCA effects and minor differences among them; and Group 3 includes entries f and g, with the largest GCA. To explain the differences in GCA, we hypothesize that Group 2 had an additive gene (A1) relative to Group 1, i.e., entry a, and Group 3 had an additional gene (A2) relative to Group 2.

PC2 displays the nonproportional interactions between entries and testers, as the testers assumed different signs (Crossa and Cornelius, 1997; Yan et al., 2000). Specifically, PC2 displays positive interactions between two heterotic groups: A and G as one group and B, C, D, and F as the other. If we assume that heterosis arises from the accumulation of different dominant genes, then the two groups must have different dominant FHB resistance genes that are designated as D1 and D2 (Figure 9.8). Entry e is located right on the PC2 guideline, implying that there was no nonproportionate interaction between e and either of the two heterotic groups. This can be explained by one of the three hypotheses: 1) entry e carries neither D1 nor D2; 2) entry e carries both D1 and D2; and 3) entry e carries a resistance gene that is different from both D1 and D2. The first hypothesis is the least tenable because it cannot explain the heterosis observed between entry e and other parents. The second and the third hypotheses are equally satisfactory in explaining the heterosis between e and other parents, but the third hypothesis must invoke an additional gene.
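The sign rule used in this reasoning is easy to state as code. In the sketch below, a component is called non-crossover when all tester scores that are clearly away from zero share one sign, and crossover otherwise; for a crossover component the testers are split into groups by sign. The scores are hypothetical numbers chosen only to mimic the sign pattern described in the text, not values read from Figures 9.7 or 9.8, and the tolerance `tol` is an arbitrary illustrative choice.

```python
# Hypothetical tester scores that mimic the sign pattern described in the text;
# they are NOT values read from Figures 9.7 or 9.8.
PC1 = {"A": 2.9, "B": 1.6, "C": 2.4, "D": 1.2, "E": 3.8, "F": 2.0, "G": 1.1}
PC2 = {"A": 1.4, "B": -0.9, "C": -3.1, "D": -0.7, "E": 0.05, "F": -0.6, "G": 3.9}


def interaction_type(scores, tol=0.1):
    """Classify a component by the signs of tester scores farther than tol from zero."""
    signs = {score > 0 for score in scores.values() if abs(score) > tol}
    return "non-crossover" if len(signs) <= 1 else "crossover"


def sign_groups(scores, tol=0.1):
    """Split testers into positive and negative groups, skipping near-zero scores."""
    positive = sorted(t for t, score in scores.items() if score > tol)
    negative = sorted(t for t, score in scores.items() if score < -tol)
    return positive, negative


print("PC1:", interaction_type(PC1))
print("PC2:", interaction_type(PC2), sign_groups(PC2))
```

A tester whose score is within the tolerance of zero, like E on PC2 in this made-up example, is left out of both groups, which corresponds to a parent that shows no nonproportional interaction with either group.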
Figure 9.9 integrates Figures 9.6 to 9.8 to formulate the genotypes of the seven parents and to explain their performance as purelines and the performances of their hybrids. The hybrids between G and B, C, D, and F were among the best hybrids relative to FHB resistance because they each integrated the four resistance genes (A1, A2, D1, and D2). The same can be said for the cross AF. The formulations in Figure 9.9 allow some general discussion on the nature of a superior parent and a superior tester. Parents F and G are regarded as superior, because they had both high GCA and SCA. They had high GCA by having more resistance genes, and they had high SCA by having resistance genes different from those in the other heterotic groups. Parent E was regarded as a superior tester for identifying parents with high GCA, because it has a gene that is different from all the existing resistance genes in the other parents. Superior hybrids combine all or most of the resistance genes through one of the two pathways: 1) both parents exhibit high GCA but belong to different heterotic groups, and 2) one high-GCA parent and one superior tester. Caution must be used when reading this section, however. The hypotheses on the genetic constitutions of the parents with regard to resistance to FHB are highly speculative. These hypotheses have yet to be subjected to critical testing. Nevertheless, we are excited about the capability of the GGEbiplot to allow such hypotheses to be formulated, given the importance of hypotheses in scientific discovery.
FIGURE 9.7 The proposed genotypes of the parents based on PC1. Parents when viewed as entries are in lowercase italics and in uppercase when viewed as testers. [Plot not reproduced; axes are PC1 (horizontal) and PC2 (vertical), symmetrical scaling; genotype labels shown in the plot: A1 and A1A2.]
FIGURE 9.8 The proposed genotypes of the parents based on PC2. Parents when viewed as entries are in lowercase italics and in uppercase when viewed as testers. [Plot not reproduced; axes are PC1 (horizontal) and PC2 (vertical), symmetrical scaling; genotype labels shown in the plot: D1, D2, and D1D2?.]
FIGURE 9.9 Possible genotypes of the parents based on both PC1 and PC2. [Plot not reproduced; axes are PC1 (horizontal) and PC2 (vertical), tester-focused scaling; genotype labels shown in the plot: A1D1, A1D1D2?, A1A2D1, D2, and A1A2D2.]
TABLE 9.2 Tolerance of Ten Corn Inbreds (Diagonal) and their F1 Hybrids to Pink Stem Borer, as Measured by the Percentage of Yield Retained after Infestation
          A      B      C      D      E      F      G      H      I      J
A      85.8   89.1   86.3   82.0   86.6   92.4   82.9   88.1   84.2   86.0
B      89.1   88.3   79.7   72.4   89.6   78.5   97.6   84.8   83.0   89.8
C      86.3   79.7   89.9   88.3   97.2   86.0   72.2   92.5   74.9   82.6
D      82.0   72.4   88.3     -a   91.1   91.4   76.5   90.4   76.8   81.5
E      86.6   89.6   97.2   91.1   81.7   83.3   86.3   94.9   83.9   86.5
F      92.4   78.5   86.0   91.4   83.3   88.4   88.9   87.0   76.8   83.8
G      82.9   97.6   72.2   76.5   86.3   88.9   70.7   97.7   75.8   83.9
H      88.1   84.8   92.5   90.4   94.9   87.0   97.7   87.6   99.9   83.9
I      84.2   83.0   74.9   76.8   83.9   76.8   75.8   99.9   92.9   82.1
J      86.0   89.8   82.6   81.5   86.5   83.8   83.9   83.9   82.1   78.0
Mean   86.3   85.3   85.0   83.4   88.1   85.6   83.2   90.7   83.0   83.8
Note: The codes of the inbreds are: A = A509; B = A637; C = A661; D = CM105; E = EP28; F = EP31; G = EP42; H = F7; I = PB60; and J = Z77016.
a Missing cell replaced by its column average for completing the calculation.
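The footnote's treatment of the missing cell can be sketched as follows. The code fills any missing value with the mean of the observed values in its column and leaves the input table untouched. Representing the missing cell as `None`, and the names `PSB` and `impute_column_means`, are choices made for this illustration.

```python
# Table 9.2: rows and columns are inbreds A-J; None marks the missing D x D cell.
PARENTS = list("ABCDEFGHIJ")
PSB = [
    [85.8, 89.1, 86.3, 82.0, 86.6, 92.4, 82.9, 88.1, 84.2, 86.0],
    [89.1, 88.3, 79.7, 72.4, 89.6, 78.5, 97.6, 84.8, 83.0, 89.8],
    [86.3, 79.7, 89.9, 88.3, 97.2, 86.0, 72.2, 92.5, 74.9, 82.6],
    [82.0, 72.4, 88.3, None, 91.1, 91.4, 76.5, 90.4, 76.8, 81.5],
    [86.6, 89.6, 97.2, 91.1, 81.7, 83.3, 86.3, 94.9, 83.9, 86.5],
    [92.4, 78.5, 86.0, 91.4, 83.3, 88.4, 88.9, 87.0, 76.8, 83.8],
    [82.9, 97.6, 72.2, 76.5, 86.3, 88.9, 70.7, 97.7, 75.8, 83.9],
    [88.1, 84.8, 92.5, 90.4, 94.9, 87.0, 97.7, 87.6, 99.9, 83.9],
    [84.2, 83.0, 74.9, 76.8, 83.9, 76.8, 75.8, 99.9, 92.9, 82.1],
    [86.0, 89.8, 82.6, 81.5, 86.5, 83.8, 83.9, 83.9, 82.1, 78.0],
]


def impute_column_means(table):
    """Return a copy of table with each None replaced by the mean of its column."""
    filled = [row[:] for row in table]
    for j in range(len(table[0])):
        observed = [row[j] for row in table if row[j] is not None]
        col_mean = sum(observed) / len(observed)
        for row in filled:
            if row[j] is None:
                row[j] = col_mean
    return filled


complete = impute_column_means(PSB)
d = PARENTS.index("D")
print(f"imputed D x D value: {complete[d][d]:.1f}")
```

Because the imputed value equals the mean of the other nine cells in column D, the column mean is unchanged by the imputation, which agrees with the Mean row of Table 9.2.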
9.8.1 SHRINKING THE DATASET BY
The polygon view of the biplot based on the 10 × 10 diallel data (Table 9.2) explained 63% of the total variation, 37% by PC1 and 26% by PC2 (Figure 9.10). This is considerably smaller than that explained by the biplot for the FHB dataset (Figure 9.1). Less variation explained by the biplot implies that some predictions based on the biplot will be less accurate. Therefore, it would be a good strategy to try to reduce the data size by removing redundant parents. Figure 9.10 suggests that A and J, C and D, and E and F are pairs of parents that are similar both as entries and testers. Therefore, parents J, D, and F (alternatively, A, C, and E) can be removed from the data without losing critical information.
9.8.3 GCA AND SCA
The ATC view brought up by the GGEbiplot function Average Tester Coordination helps in visualizing the GCA and SCA effects of the parents (Figure 9.12). The ATC axis happens to coincide with the PC1 axis. Thus, the PC1 scores of the entries approximate their GCA effects, with r = 0.953. The highest PC1 entry is h, followed by b, e, a, i, c, and g. Entry c had the highest SCA; it interacted positively with itself and E but negatively with others.
OF PARENTS WITH REGARD TO PSB RESISTANCE
From the perspective of the entries, PC1 is the contrast of c, g, and i vs. b, e, and h, although there are differences within each group. For simplicity, we assign dominant genes R1 and R2, respectively (Figure 9.15). This assignment implies that there could be heterosis in all crosses between the two groups.
FIGURE 9.10 Polygon view of the biplot of the 10 × 10 maize diallel crosses for resistance to corn pink stem borer. The ten inbreds are in lowercase when viewed as entries and in uppercase when viewed as testers. [Plot not reproduced; axes are PC1 (horizontal) and PC2 (vertical), symmetrical scaling.]
FIGURE 9.11 Polygon view of the biplot based on a subset of the maize diallel cross data; three parents (D, F, and J) were deleted from the data. Parents when viewed as entries are in lowercase italics and in uppercase when viewed as testers. [Plot not reproduced; axes are PC1 (horizontal) and PC2 (vertical), symmetrical scaling.]
FIGURE 9.12 The ATC view of the biplot based on the maize diallel cross subset, showing the GCA of the entries. [Plot not reproduced; axes are PC1 (horizontal) and PC2 (vertical), entry-focused scaling.]
FIGURE 9.13 The ATC view of the biplot based on the maize diallel cross subset, comparing testers with the ideal tester, which is presented by the concentric center. The ideal tester is the most discriminating of entries and has no preference for mating partners. [Plot not reproduced; axes are PC1 (horizontal) and PC2 (vertical), tester-focused scaling; plot annotation: Model 1, PC1 = 53%, PC2 = 20%, Sum = 73%.]
PC2 is the contrast between c and e vs. g, i, b, and h (Figure 9.16). Interestingly, these two groups interacted negatively between groups and positively within groups, suggesting that recessive resistance genes are involved. Therefore, r3 and r4 are assigned to the two groups, respectively. Parent A is near the biplot origin, indicating lack of interaction with either of the two groups. It is, therefore, assigned the double recessive genotype r3r4. Integrating Figures 9.11, 9.15, and 9.16 gives a complete picture of the PSB resistance phenotypes of the parents and their hybrids, and possible interpretations from the perspective of genetic constitutions (Figure 9.17). Again, it should be pointed out that these interpretations are only hypotheses, which have to be critically tested.
FIGURE 9.14 The tester vector view of the biplot based on the maize diallel cross subset, showing groups of testers. [Plot not reproduced; axes are PC1 (horizontal) and PC2 (vertical), tester-focused scaling.]
FIGURE 9.15 The proposed genotypes of the entries based on PC1. [Plot not reproduced; axes are PC1 (horizontal) and PC2 (vertical), symmetrical scaling; plot annotation: Model 1, PC1 = 53%, PC2 = 20%, Sum = 73%; genotype labels shown in the plot: R1 and R2.]
FIGURE 9.16 The proposed genotypes of the entries based on PC2. [Plot not reproduced; axes are PC1 (horizontal) and PC2 (vertical), symmetrical scaling; genotype labels shown in the plot: r3, r4, and r3r4.]
FIGURE 9.17 Proposed genotypes of the entries based on both PC1 and PC2. Minor genes may exist to differentiate between entries g and i and between entries b and h. [Plot not reproduced; axes are PC1 (horizontal) and PC2 (vertical), symmetrical scaling; plot annotation: Model 1, PC1 = 53%, PC2 = 20%, Sum = 73%; genotype labels shown in the plot: R1r3, R2r3, r3r4, R2r4, and R1r4.]
visualized with reference to the biplot size. If two parents look different, they are probably different. If they look similar, they are probably not very much different. Unfortunately, decision-making in scientific research is heavily dependent on statistical tests. If two things look similar, what is the point in trying to prove that they are different? Yes, statistical tests make us more confident about our conclusions. But the bottom line is that all decisions are subjective: it is subjective to choose among many testing methods, and it is subjective to choose a significance level. Someone has said that the relationship between statistics and agriculture is like that between a lamp post and a drunk: it is for support, not illumination. It is our belief that researchers rely heavily on statistical tests partly because they have no means of getting a complete picture of their problem, research, or dataset. In the light of a biplot, statistical tests may become less crucial for making decisions. Seasoned breeders do not make decisions based on statistical tests, not because tests are not available, but because they are not needed. In most cases, the need for a statistical test for a particular hypothesis is an indication of lack of confidence by the researcher in the hypothesis; but when this is the case, statistical tests will not dramatically increase confidence. Nonetheless, the GGEbiplot software is now equipped to do conventional statistics. This will allow researchers to examine their data both via the biplot and in the conventional way.