Introduction To Biometrical Genetics-By Kenneth Mather
Biometrical Genetics
Introduction to Biometrical Genetics
KENNETH MATHER
C.B.E., D.Sc., F.R.S.
JOHN L. JINKS
D.Sc., F.Inst.Biol., F.R.S.
Professor and Head of Department of Genetics
in the University of Birmingham
LONDON
CHAPMAN AND HALL
First published 1977
by Chapman and Hall Ltd
11 New Fetter Lane, London EC4P 4EE
© 1977 K. Mather and J. L. Jinks
Set by Hope Services, Wantage
and printed in Great Britain
at the University Printing House, Cambridge
1. Continuous variation
Mendel laid the foundation of genetics by the study of differences which
divided his peas into sharply distinct categories. Thus there was never
doubt as to whether one of his plants was tall or short, or its flowers red
or white and so on: the categories did not overlap. He was able to show
that each phenotypic class corresponded to one, or at any rate only a few,
genotypes and that where there was more than one genotype in the pheno-
typic class they could be separated by further appropriate breeding tests,
that is by the clearly distinguishable classes of plant to which they gave
rise among their descendants following appropriate test matings. He was
thus able to infer the genes, or factors as he called them, upon whose
behaviour hereditary transmission depends, and it has been by the further
study of such gene differences in many species of plants and animals that
our knowledge of the genetic materials has largely been built up. We
should note, however, that plants or animals may differ in this sharply
distinct way for reasons other than the genes they carry; in fact because
of the environments in which they have lived their lives. Thus the water
crowfoot, Ranunculus aquatilis, has quite different leaves when growing
in running water than when growing on land. In such a case, of course,
observation of the environments suggests at once that the difference is
not genetic, or at least not wholly genetic, in its causation; but in general
an appropriate breeding test is necessary to establish this point.
Now, differences by which individuals are divided into sharply distinct
categories are not the only variation to be seen in either natural popu-
lations or experimental families. Mendel's peas themselves showed further
variation, for his talls ranged from 6 to 7 ft or even more in height and
his shorts from 9 to 18 inches (see Bateson, 1909). The important thing
for his experiments and their interpretation was that despite the variation
within the classes, the talls and shorts did not overlap in height: each
individual could be classified unambiguously as tall or short. There was in
fact a discontinuity in the distribution of heights between tall and short,
2 The genetical foundation
all plants below the discontinuity being short and all above it tall; and as
Mendel showed, they differed correspondingly and consistently in their
genotypes.
The same complexity of variation can be seen in other species. For
example, in man we can recognize dwarf individuals which owe their
character to a single gene difference from normals, from whom they are
generally clearly distinguishable in respect of stature. Yet people who
are not distinguishable in this way - those of normal stature - are not all
alike. Indeed they range widely in stature; but the variation they show is
of a different kind, with every stature represented between wide limits.
The middle statures of the range are the most common and if we exam-
ine a large number of individuals we find that the gradations from one
stature to the next are so fine as to be almost imperceptible. There are
in fact no discontinuities in the distribution of normal stature: the vari-
ation is continuous.
Such continuous variation is ubiquitous in living things and, apart
perhaps from a few special cases like antigenic specificity, it is displayed
by all characters. Thus in general there is no distinction between con-
tinuous and discontinuous variation in the characters by which they are
displayed and indeed, as we have already seen, we quite commonly ob-
serve the two kinds of variation side by side in the same family or popu-
lation. So, whatever the reasons for the differences between the two
kinds of variation, they are not mutually exclusive.
Some examples of continuous variation are shown in Fig. 1. In princi-
ple the number of classes into which individuals can be divided accord-
ing to the manifestation of the character is limited only by the accuracy
of the measurements we can make. We find it convenient, however, to
group the individuals whose measurements fall between certain limits,
which we choose for our own convenience, and represent the variation
by recording the numbers falling into the various classes defined in this
way. We then obtain histograms as illustrated in Figs. 1 (a) and (c) from
which the general shape of the distribution resulting from the variation
can be seen. It should nevertheless be remembered that the grouping we
are using is purely arbitrary: it does not spring from discontinuities in the
variation itself and so provides no basis for an analysis of the causes of
variation in the way that Mendel showed to be possible with discontinu-
ous variation.
One class of character, however, requires a special word. Sometimes
the very nature of the character itself imposes certain discontinuities on
the variation it shows. Thus the number of vertebrae in a vertebrate
[Fig. 1. Panel (a): Man, stature in inches (horizontal axis, 60 to 75)
against frequency in % (vertical axis). A further panel shows chaetae
(horizontal axis, 14 to 24).]
Analysis of variance
                          df      MS
Between flies             220    2.196
Within flies
  (= between sides)       221    1.996
1.996/2.196 = 91% of the variation between flies is a reflection of the
developmental variation within flies.
[Fig. 2. Plot of the difference recovered (vertical axis) against the
original difference (horizontal axis).]
Fig. 2. Mather and Harrison's (1949) data relating the genetical component
of variation for the number of abdominal chaetae in Drosophila melanogaster
to the chromosomes. The slope of the regression shows that 81 % of the vari-
ation in chaeta number is unambiguously ascribable to genes borne by the
three major chromosomes, which on allowing for genes which the experiment
could not be expected to pick up accords with all the heritable variation
being mediated by nuclear genes.
Overall m = 18.983
Overall dx = 0.2521
d2 = -0.2229 ± 0.0618
d3 = 0.6583
With X chromosome from Sam:  d2S = -0.5396, d3S = 1.0729
With X chromosome from Well: d2W = 0.0938,  d3W = 0.2438
± 0.0874
[Figure: diagrams of the position of the locus A-a relative to the markers
G-g and H-h, with the recombination frequencies p1, p2 and p3; panel (b).]
Marker class              Frequency                    Mean
               A            a            Joint
GH         1/2(1-p2)     1/2 p1       1/2(1-p3)     d(1-p1-p2)/(1-p3)
Gh         1/2 p3        0            1/2 p3        d
gH         0             1/2 p3       1/2 p3        -d
gh         1/2 p1        1/2(1-p2)    1/2(1-p3)     -d(1-p1-p2)/(1-p3)
1/2(Gh - gH) = d
1/2(GH - gh) = d(1-p1-p2)/(1-p3)
p1 + p3 = p2
we have already estimates of d and $p_3$. Now when A-a is to the left of
G-g, $p_2 = p_1 + p_3$ giving $p_3 = p_2 - p_1$. Then $p_1$ and $p_2$ can be estimated as
$\tfrac{1}{2}(p_1 + p_2 - p_3)$ and $\tfrac{1}{2}(p_1 + p_2 + p_3)$ respectively.
We can illustrate this method of locating a gene contributing to con-
tinuous variation by reference to data from Wolstenholme and Thoday
(1963). These authors report a number of such experiments in Drosophila
melanogaster, and the results of one of these experiments are set out in
the right-hand column of Table 3. The continuously varying character is
the number of sternopleural chaetae while the marker genes are clipped
wing (cp) and Stubble bristles (Sb), which are located respectively at
45.3 and 58.2 on the standard map of chromosome III. The average
number of chaetae for the four marker classes are shown in Table 3, but
the authors do not report the frequencies of these classes. A direct
estimate of $p_3$ is thus not available from this experiment, but the marker
genes are 12.9 units apart on the standard map, and $p_3$ may therefore be
taken as 0.129.
The first thing is to note that the GH class has the greatest mean number
of chaetae and gh the lowest. The gene affecting chaeta number (A-a)
must thus lie between the two markers: had it been outside, the Gh and
gH classes would have shown the extreme mean chaeta numbers (see
Table 4). We then proceed, using the formulae of Table 3, to find
$d = \tfrac{1}{2}(GH - gh) = \tfrac{1}{2}(21.16 - 17.86) = 1.650$ and
$d(p_2 - p_1)/p_3 = \tfrac{1}{2}(Gh - gH) = \tfrac{1}{2}(19.59 - 18.86) = 0.365$
giving $p_2 - p_1 = \dfrac{0.365 \times 0.129}{1.650} = 0.0285$.
Locating the genes 19
With $p_1 + p_2 = p_3$ we then find $p_1 = \tfrac{1}{2}(0.129 - 0.0285) = 0.050$ and
$p_2 = \tfrac{1}{2}(0.129 + 0.0285) = 0.079$.
The experiment thus places the locus of A-a at 0.05 × 100 = 5.0 units
to the right of cp and 7.9 units to the left of Sb, that is at locus 50.3 on
the standard map of chromosome III.
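The arithmetic of this worked example can be retraced with a short script. This is only a sketch of the calculation as described above: the function name and argument names are illustrative assumptions, and the gene is assumed to lie between the two markers, so that the relations d = 1/2(GH - gh), d(p2 - p1)/p3 = 1/2(Gh - gH) and p1 + p2 = p3 apply.

```python
def locate_between_markers(mean_GH, mean_Gh, mean_gH, mean_gh, p3, left_marker_pos):
    """Place a gene A-a assumed to lie between the markers G-g and H-h.

    Hypothetical helper that follows the formulae quoted in the text:
      d               = (GH - gh) / 2
      d (p2 - p1)/p3  = (Gh - gH) / 2
      p1 + p2         = p3
    """
    d = (mean_GH - mean_gh) / 2
    half_diff = (mean_Gh - mean_gH) / 2      # equals d * (p2 - p1) / p3
    p2_minus_p1 = half_diff * p3 / d
    p1 = (p3 - p2_minus_p1) / 2              # recombination with the left marker
    p2 = (p3 + p2_minus_p1) / 2              # recombination with the right marker
    locus = left_marker_pos + 100 * p1       # map units to the right of the left marker
    return d, p1, p2, locus


# Marker-class means and map positions as quoted in the text (cp at 45.3, p3 = 0.129).
d, p1, p2, locus = locate_between_markers(21.16, 19.59, 18.86, 17.86,
                                          p3=0.129, left_marker_pos=45.3)
print(round(d, 3), round(p1, 3), round(p2, 3), round(locus, 1))
```

After rounding, the values agree with the 1.650, 0.050, 0.079 and 50.3 obtained above.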
It has been assumed for the purpose of illustration that the effect on
sternopleural chaeta number was ascribable to a single gene. In fact
Wolstenholme and Thoday obtained evidence that two genes were most
probably involved. They used in their analysis a technique, introduced
by Thoday (1961), of using progeny tests to ascertain the number of
classes genetically different in respect of chaeta number included in each
of the marker classes. This method of Thoday's has been used by Davies
(1971) to show that genes at a minimum of fifteen loci, scattered over
the lengths of all three major chromosomes, are involved in the heritable
variation of sternopleural chaeta number in Drosophila melanogaster,
and that similarly at least fourteen or fifteen loci, not the same
as those for sternopleural chaetae, are involved in the variation of ab-
dominal chaeta number in this fly. Further evidence from other experi-
ments of various kinds also indicates that the minimum number of gene
loci in the variation of each of these two chaeta characters is likely to be
nearer 20 than 10.
Summarizing, these experiments with Drosophila melanogaster show
us that the heritable component of the continuous (or to be more pre-
cise, quasi-continuous) variation in both abdominal and sternopleural
chaeta number depends on genes which are carried on the chromosomes
and which will therefore segregate and recombine in just the same way
as the familiar genes of classical genetics. Furthermore, within the technical
limitations of the experiments, the whole of this heritable component
is accountable in terms of such chromosome-borne genes. Differences
in chaeta number may reflect the simultaneous action of genes
carried on all three of the major chromosomes and finer analysis
reveals that at least some fourteen or fifteen loci must be involved.
The effects of the different genes supplement one another, their effects
sometimes combining in a simple additive fashion, but sometimes inter-
acting in such a way that the combined effect is not simply the sum of
the individual actions. At the same time, overlaying the variation due to
these genes is variation traceable to environmental agencies or to the
vagaries of development, variation which is distinguishable from that
due to the genes only by a breeding test. Finally the effects traceable to
individual genes, or even to whole chromosomes, may be no greater in
magnitude, and indeed may often be smaller than the effects of the non-
heritable agencies. In other words, as revealed in these experiments the
heritable portion of continuous variation depends on genes transmitted
in the Mendelian fashion, but acting in polygenic systems, the member
genes of a system having effects similar to one another (and to those of
non-heritable agencies), capable of supplementing one another (whether
in simply additive fashion or not) and small in relation to the non-
heritable variation, or at least in relation to the variation in the system
as a whole.
The biometrical approach
[Fig. 4, lower panel: no dominance, unequal frequencies.]
thus changed to that shown in Fig. 4 (centre left). The mean has been
raised from 0 to $\tfrac{1}{2}$, the variance has increased to $1\tfrac{1}{4}$, and the distribution
is now asymmetrical with the long tail at the lower end. Making both A
and B dominant over their respective alleles changes the distribution
even more. The mean has risen further to 1 and the variance to $1\tfrac{1}{2}$ while
the asymmetry is now so great that the extreme large phenotype is the
most common and certain of the phenotypes have vanished altogether.
Let us now revert to the assumption of no dominance, but alter the
gene frequencies so that A and a and B and b are no longer equally
common in the population. Let A occur with three times the frequency
of a and B with three times that of b, or to put it another way, let the
gene frequencies be A $\tfrac{3}{4}$, a $\tfrac{1}{4}$ and B $\tfrac{3}{4}$, b $\tfrac{1}{4}$. The genotypes will give the
same phenotypes as in the original model at the top of Fig. 4, but they
will occur with different frequencies. Thus the proportion of AABB
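individuals will be $\tfrac{3}{4} \times \tfrac{3}{4} \times \tfrac{3}{4} \times \tfrac{3}{4} = \tfrac{81}{256}$, that of AaBB and AABb will each
be $2 \times \tfrac{3}{4} \times \tfrac{1}{4} \times \tfrac{3}{4} \times \tfrac{3}{4} = \tfrac{54}{256}$, and so on. The resulting frequency distribution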
of phenotypes is shown at the bottom of Fig. 4. In some respects the
change in the distribution resembles that brought about by dominance:
the mean is again raised to 1 and the distribution is asymmetrical with
the long tail towards the lower end. This new distribution differs how-
ever from that produced by dominance in that the variance has not been
raised but in fact reduced from 1 to 3/4. Thus both the assumptions of
dominance and unequal gene frequencies result in change of the bio-
metrical properties of the distribution of phenotypes, and each produces
its own characteristic syndrome of changes.
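The means and variances quoted for these two-gene models can be checked by enumerating the genotypes directly. The sketch below assumes that each gene has d = 1 (the value which reproduces the variance of 1 quoted for the original model), that complete dominance means h = d, and that the two genes are independent; the function names are illustrative only.

```python
from fractions import Fraction
from itertools import product


def locus(p, d=1, h=0):
    """(value, frequency) pairs for AA, Aa and aa; p is the frequency of the increasing allele."""
    p = Fraction(p)
    q = 1 - p
    return [(Fraction(d), p * p), (Fraction(h), 2 * p * q), (Fraction(-d), q * q)]


def combine(*loci):
    """Phenotype distribution for independent loci whose effects simply add."""
    dist = {}
    for combo in product(*loci):
        value = sum(v for v, _ in combo)
        freq = Fraction(1)
        for _, f in combo:
            freq *= f
        dist[value] = dist.get(value, 0) + freq
    return dist


def mean_variance(dist):
    mean = sum(v * f for v, f in dist.items())
    variance = sum(f * (v - mean) ** 2 for v, f in dist.items())
    return mean, variance


half, three_quarters = Fraction(1, 2), Fraction(3, 4)
cases = {
    "no dominance, equal frequencies": combine(locus(half), locus(half)),
    "A dominant": combine(locus(half, h=1), locus(half)),
    "A and B dominant": combine(locus(half, h=1), locus(half, h=1)),
    "no dominance, frequencies 3/4 : 1/4": combine(locus(three_quarters), locus(three_quarters)),
}
for name, dist in cases.items():
    print(name, *mean_variance(dist))
```

Under these assumptions the four cases give means of 0, 1/2, 1 and 1 and variances of 1, 5/4, 3/2 and 3/4, in agreement with the values in the text.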
Although broadly resembling the distribution of a continuously
varying character, the distributions in Fig. 4 differ from it in one import-
ant respect: they are not strictly continuous since the phenotypes fall
into a small number of discrete classes. This difference stems from three
simplifying assumptions that we have made in the models on which the
frequency distributions of Fig. 4 are derived. In the first place we have
assumed that the effects of A-a and B-b are alike: had we not made this
assumption a larger number of phenotypes would have been possible.
Secondly, we have assumed the absence of non-heritable variation: its
presence would have blurred the boundaries of the phenotypic classes
given by the various genotypes and caused them to overlap, so produc-
ing continuous variation. Thirdly, we have been considering a very
simple polygenic system comprising only two gene pairs, which when
the actions of the two gene pairs are alike produces only five phenotypic
classes, non-heritable effects apart. The consequences of raising the
number of gene pairs in the system are illustrated in Fig. 5. With four
loci involved, there are nine phenotypic classes and with eight loci there
are seventeen. Thus, given the same overall difference between the ex-
treme phenotypes the step produced by each gene substitution is smaller,
and a given change requires more gene substitutions to produce it, the
more genes there are in the system. The result is a closer approximation
to continuous variation, and although in principle there are small dis-
continuities still present in the distribution of phenotypes, decreasing
amounts of non-heritable variation would serve to blur them and give
full continuity.
One further point should be observed about the distributions shown
in Fig. 5. All of them are based on the assumptions of no dominance and
equal frequencies of the two alleles at each locus. In consequence all the
distributions are symmetrical and have means of 0. But the variances of
Genetic analysis and somatic analysis 25
[Fig. 5. Histograms on a scale from -2 to 2 for systems of two, four and
eight genes, each with mean 0; the variances are 1, 1/2 and 1/4 respectively.]
Fig. 5. The effect of change in the number of genes in the polygenic system.
The three histograms show the distributions where the systems comprise
two, four and eight genes respectively. In all cases the gene frequencies
are equal, and the genes in the system have equal and additive effects, with-
out dominance. The range between the highest and lowest expressions of
the character is the same in all three cases, the genes in the four gene and
eight gene cases thus having individual effects respectively one-half and
one-quarter of those of the genes in the two gene case. The number of
genotypic, and hence phenotypic, classes rises with the number of genes
and the approximation to fully continuous variation becomes closer. The
mean of the distribution is unchanged, but the variance falls inversely pro-
portionally as the number of genes rises.
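The relation between the number of genes, the number of classes and the variance can be sketched numerically. The code assumes, as in the caption, equal gene frequencies, equal additive effects without dominance, and a fixed range from -2 to +2, so that each of k genes has d = 2/k; the function names are illustrative.

```python
from fractions import Fraction
from math import comb


def polygenic_distribution(k, extreme=2):
    """Phenotype distribution for k additive genes with allele frequencies of 1/2.

    The extreme phenotypes are fixed at -extreme and +extreme, so d = extreme / k.
    A genotype carrying j increasing alleles (out of 2k) has the value (j - k) * d.
    """
    d = Fraction(extreme, k)
    n_alleles = 2 * k
    return {(j - k) * d: Fraction(comb(n_alleles, j), 2 ** n_alleles)
            for j in range(n_alleles + 1)}


def mean_variance(dist):
    mean = sum(v * f for v, f in dist.items())
    variance = sum(f * (v - mean) ** 2 for v, f in dist.items())
    return mean, variance


for k in (2, 4, 8):
    dist = polygenic_distribution(k)
    mean, variance = mean_variance(dist)
    print(k, "genes:", len(dist), "classes, mean", mean, "variance", variance)
```

With two, four and eight genes this gives 5, 9 and 17 classes, a mean of 0 throughout, and variances of 1, 1/2 and 1/4.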
7. Biometrical genetics
If we accept that commonly we cannot distinguish any of the individual
genes whose segregation contributes to continuous variation (and that
even with the special stocks available in Drosophila we cannot distinguish
all of them) we must be content to deal with the relevant polygenic sys-
tem as a whole. And since we cannot distinguish the segregant classes
one from another, we cannot use a form of analysis based on class fre-
quencies as in the classic Mendelian method. We can, however, recognize
the biometrical properties of the frequency distributions of the pheno-
types which are our raw material, and we can estimate the biometrical
quantities, the means, variances, and so on, which characterize these dis-
tributions. As we have seen, these parameters can reflect, and reflect in
characteristic ways, the properties of the polygenic system from which
the heritable component of the variation stems. We can thus seek to
gain information about the properties of the genes underlying continu-
ous variation by analysis of the biometrical quantities which character-
ize the frequency distributions of the phenotypes in related families and
populations. We must expect that the information so obtained will not
be just like that yielded by classical genetical analysis. In particular,
since we shall not be following individual genes we cannot learn about
their individual properties: rather, since we are considering the system
as a whole, we shall obtain information about the overall joint or aver-
age properties of its member genes. At the same time because we are
considering all the variation that the character shows we shall be bring-
ing the effects of all the relevant genes into the reckoning, and this we
can never achieve by the Mendelian technique of identifying and follow-
ing individual genes, since there must inevitably be some genes of rela-
tively small effect which escape identification.
The phenotypes of the individuals in any family or other appropriate
group yield two biometrical quantities which are of use to us, the mean
of the distribution (a first degree statistic since it is linear in x, the metric
measuring the expression of the character) and the variance (a second
degree statistic depending on $x^2$). In addition, any pair of related families
or groups may yield a covariance, which is also a second degree statistic.
Higher order statistics may also be obtained, notably that of the third-
order which measures skewness (depending on $x^3$) and the fourth order
which measures kurtosis (depending on $x^4$). These have, however, seldom
been put to use in genetical analysis and we shall consider them no further.
We shall thus be concentrating on the genetical information that can be
derived from comparisons among the means, variances and covariances of
related families or groups of individuals. These we shall seek to interpret
in terms of appropriate parameters representing the consequences of the
various genetical phenomena in which we may be interested. Having de-
fined these parameters, expectations are formulated in terms of them for
the means, variances and covariances of the families or groups that our
experiments yield. The means, etc. observed are then related to these
expectations in such a way as to yield estimates of the parameters and
tests of their significance.
In any experiment we may run into a complexity of genetical phenom-
ena, especially as we must expect to be dealing with a number of genes
whose relations one with another may not be the same for all of them:
indeed we have already seen this to be the case with the system mediating
variation of the number of sternopleural chaetae in Drosophila, where
the genes of the X chromosome interacted with those of chromosomes
II and III, although these latter show no evidence of anything but an
additive relation to one another. Such a complexity of phenomena leads
to a corresponding multiplicity of parameters which it would be necessary
to take into account in formulating expectations for the statistics ob-
served, with the consequence that except in large and complex experi-
ments there could be more parameters than there were statistics from
which to estimate them. Some simplification must therefore be made in
the approach: only those parameters which are regarded as of chief im-
portance, and with which the data can cope, should be introduced into
the analysis initially, and others added only as necessity requires.
The simplest genetical formulation to be used in the initial analysis is
generally taken as that which includes parameters representing the addi-
tive effects of the genes (that is the differences between corresponding
homozygotes, AA and aa, BB and bb, etc.) and their dominance proper-
ties. Given that the experimental material is sufficient, the experiment
adequately designed and the statistical analysis suitably carried out, we
can then estimate these parameters and also test the goodness of fit of
this initial simple formulation to the observations. If the fit proves to be
adequate, we have no grounds for postulating a more complex genetical
situation. But if, on the other hand, the fit proves to be inadequate, con-
sideration can be given to a more complex formulation incorporating
further parameters, representing interaction between non-allelic genes,
or linkage or whatever else seems appropriate. If this in turn proves to
be inadequate to fit the observations, and the data are themselves suf-
ficiently extensive, a still more complex set of parameters representing
a still more complex genetical situation can be tried.
This approach will be developed and illustrated in the following chap-
ters. We shall start by considering data from controlled breeding experi-
ments based on crosses among true-breeding lines and later turn to the
more difficult analysis of data from randomly breeding populations,
just as classical genetics began with experimental crosses and later pro-
ceeded to the genetical analysis of populations.
Additive and dominance effects
8. Components of means
With disomic inheritance, two alleles A-a can give rise to three genotypes
AA, Aa and aa. Two parameters are required to describe the differences
in phenotypic expression of these three genotypes in respect of any
character which they affect. As the origin, we take the mid-point be-
tween the two homozygotes since this does not depend on the differ-
ences between the three genotypes, but on the rest of the genotype and
the effects of the environment, and thus reflects the general circum-
stances of the observations. The two parameters measuring the differ-
ences between the genotypes may then be defined as d, measuring the
departure of each homozygote from the mid-point, and h, measuring the
departure of the heterozygote from it. Taking A as the allele which in-
creases the expression of the character, AA will exceed the mid-point
(m) by d, and so will have an expression m + d, while aa will equally fall
short of the mid-point having an expression m-d, and Aa will deviate
from m by h so having an expression m + h (Fig. 6). If h is 0 the hetero-
[Fig. 6. Scale of expression showing aa, m, Aa and AA, with the increments
-d, h and d marked.]
Fig. 6. The d and h increments of the gene difference A-a. Deviations are
measured from the mid-parent, m, midway between the two homozygotes
AA and aa. Aa may lie on either side of m and the sign of h will vary accord-
ingly.
zygote's expression of the character will be midway between the ex-
pression of the two homozygotes and dominance is absent. If h is posi-
tive, the heterozygote will be nearer to AA than to aa in its expression
and A will be partially, or if h = d completely, dominant. Similarly if
h is negative, a will be the dominant allele. If h > d Aa will fall outside
the range delimited by AA and aa, and the gene may then be said to dis-
play over-dominance. It should be noted that here the capital letter A
does not imply dominance of the allele so designated: A is the allele
which increases the expression of the character whether it be dominant
or not.
This characterization of the differences among the genotypes can be
applied to any genes, whether their effects be large or small, leading to
continuous variation or not, provided the expressions of the character in
question can be expressed in quantitative terms. Thus the sex-linked
mutant Bar-eye (B) reduces the number of facets in the eyes of Drosophila
melanogaster, wild-type females (+/+) having an average number of
779.4 facets, heterozygotes (B/+) having an average of 358.4 facets and
the homozygous mutant (B/B) having an average of 68.1 at 25° C
(Sturtevant, 1925, quoted by Goldschmidt, 1938). Then m is
$\tfrac{1}{2}(779.4 + 68.1) = 423.75$, $d = 779.4 - 423.75 = \tfrac{1}{2}(779.4 - 68.1) = 355.65$
and $h = 358.4 - 423.75 = -65.35$. Since h is negative the B mutant is
partially dominant to wild-type and we may if we wish measure its degree
of dominance by $h/d = -65.35/355.65 = -0.184$. We should note
that the effect of the Bar-eye mutant is large, and leads to discontinuous
variation, the phenotypes of B/B, B/+ and +/+ showing no overlap. No
one would go to the trouble of counting the facets in classifying the
three genotypes when Bar-eye is being used, and because its effect is
sufficiently large for it to be recognized and followed individually in
breeding experiments there would be no difficulty in disentangling it
from other gene differences whose effects were sufficiently small to
contribute only to the continuous variation in facet number that we
can observe within the phenotypes associated with each of these geno-
types for Bar-eye.
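The Bar-eye calculation can be written out as a small function. This is a sketch only; the function and argument names are illustrative, and the facet counts are those quoted above.

```python
def mid_parent_components(high_homozygote, heterozygote, low_homozygote):
    """Return (m, d, h) measured from the mid-point between the two homozygotes."""
    m = (high_homozygote + low_homozygote) / 2
    d = high_homozygote - m          # departure of each homozygote from m
    h = heterozygote - m             # departure of the heterozygote from m
    return m, d, h


# Facet numbers quoted in the text: +/+ = 779.4, B/+ = 358.4, B/B = 68.1.
m, d, h = mid_parent_components(779.4, 358.4, 68.1)
print(round(m, 2), round(d, 2), round(h, 2), round(h / d, 3))
```

A negative h/d indicates dominance in the direction of the allele that decreases the character, here the Bar mutant.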
Confining ourselves now to continuous variation, we cannot dis-
tinguish individually the genes contributing to it. If we consider two
homozygous lines the departure of each of them from their mid-point
(or mid-parent as it is often called) will reflect the simultaneous action
of all the genes affecting the character by which the lines differ. As-
suming that the effects of these genes are simply additive, the depar-
ture from the mid-point will in fact be the sum of the d's, one from each
of the genes, taking sign into account. Where, for example, the lines dif-
fer at two loci, A-a and B-b, if one of them is AABB and the other aabb,
the first will depart by da + db and the second by -(da + db). But if the
lines are AAbb and aaBB, their departures will be da - db and -da + db
respectively. Generalizing, where the homozygous lines differ at k
loci, we may define [d] as the departure from the mid-parent of the line
with the greater expression of the character, where [d] = S(d+) - S(d_),
S(d+) standing for the sum of the d's of all the genes in this line tending
to increase the phenotype, S(d_) for the sum of the d's of those tending
to decrease it and S(d+) > S(d_) since [d] must be positive. In the same
way, when we cross the two homozygous lines, the phenotype of the
heterozygote will depart from the mid-parent by [h] = S(h). Since by
definition any h may be positive or negative, [h] itself may be positive
or negative, and of course where some of the genes at some of the loci
have positive h's and others negative h's they will tend to balance out
each other's effects. [h] may thus be small or even 0, even where each
of the genes individually shows pronounced dominance, simply because
being dominant in opposite directions they are cancelling out each other's
effects.
We can now see at once that although h/d provides a measure of dominance
for a single gene difference, [h]/[d] does not provide a corresponding
measure of dominance when we are considering more than one gene.
[h]/[d] may be very small simply because some of the h's are positive
and others negative, so leading to a small value for [h] even although
none of the individual h's is small; and equally [h]/[d] may be large just
because the genes are so distributed between the parent lines that they
are tending to balance out one another's effects and [d] = S(d+) -
S(d_) is small even although every d is itself not small. Thus [h]/[d],
although depending on dominance in that it cannot depart from 0 unless
one or more of the genes show dominance, is not itself a direct measure
of that dominance. For this reason it is often referred to as the potence
ratio. It is particularly worth emphasizing that where the F1 between
two lines differing at more than one locus gives a phenotype falling outside
the range delimited by the parents and so displays heterosis, i.e.
[h] > [d], there is no reason to postulate over-dominance of any of the
genes involved since the excess of [h] over [d] can come about merely by
the d's of the various genes balancing one another to a greater extent
than do their h's. Thus to take a simple example, when ha = da and hb =
db, the F1 between AAbb and aaBB will have a phenotype of ha + hb, the
parents having phenotypes of da - db and -da + db. Then [h]/[d] =
(ha + hb)/(da - db) = (da + db)/(da - db) and heterosis is displayed even
although neither gene shows over-dominance.
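A numerical instance may make the point concrete. The values da = 3 and db = 2 below are purely hypothetical, chosen only for illustration, with complete dominance (h = d) at both loci and the parents taken as AAbb and aaBB as in the example above.

```python
from fractions import Fraction


def potence_ratio(d_plus, d_minus, h_values):
    """Return ([d], [h], [h]/[d]) for a cross between two homozygous lines.

    d_plus  : d's of the genes whose increasing allele is in the larger parent
    d_minus : d's of the genes whose increasing allele is in the smaller parent
    h_values: h's of all the genes by which the lines differ
    """
    big_d = sum(d_plus) - sum(d_minus)     # [d] = S(d+) - S(d_)
    big_h = sum(h_values)                  # [h] = S(h)
    return big_d, big_h, Fraction(big_h) / big_d


# Hypothetical values: da = 3, db = 2, with ha = da and hb = db.
da, db = 3, 2
big_d, big_h, ratio = potence_ratio(d_plus=[da], d_minus=[db], h_values=[da, db])
print("parents at", big_d, "and", -big_d, "; F1 at", big_h, "; [h]/[d] =", ratio)
```

With these values the parents lie at +1 and -1 from the mid-parent while the F1 lies at +5, so [h]/[d] = 5 although h/d = 1 for each gene separately. The function divides by [d] and so is undefined when the genes balance exactly.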
Where an F2 is raised from the F1, it will include $\tfrac{1}{4}$AA, $\tfrac{1}{2}$Aa and $\tfrac{1}{4}$aa in
respect of the gene A-a. This gene will therefore contribute $\tfrac{1}{4}d_a + \tfrac{1}{2}h_a - \tfrac{1}{4}d_a = \tfrac{1}{2}h_a$
to the departure of the average expression of the character in
Testing the model 35
F2 from the mid-parent. Assuming the effects to be additive of the k
genes by which the parent lines differed, the departure of the F2 mean
thus becomes $\tfrac{1}{2}[h]$, and it may be observed that this is equally the case
even where two or more of the genes are linked. The mean phenotype of
the F2 will then be $\bar{F}_2 = m + \tfrac{1}{2}[h]$. In the same way, where B1 is the
back-cross to the larger parent P1, it will include $\tfrac{1}{2}$AA and $\tfrac{1}{2}$Aa and A-a
contributes $\tfrac{1}{2}d_a + \tfrac{1}{2}h_a$ to the departure of the mean of B1 from the mid-parent.
Then taking all k genes into account $\bar{B}_1 = m + \tfrac{1}{2}[d] + \tfrac{1}{2}[h]$. Similarly the
back-cross to P2, the smaller parent, gives $\bar{B}_2 = m - \tfrac{1}{2}[d] + \tfrac{1}{2}[h]$.
Continuing from the F2, where a true F3 generation is raised by selfing
the F2 individuals, in respect of A-a it will comprise $\tfrac{3}{8}$AA, $\tfrac{1}{4}$Aa and
$\tfrac{3}{8}$aa when taken as a whole. This gene will then contribute
$\tfrac{3}{8}d_a + \tfrac{1}{4}h_a - \tfrac{3}{8}d_a = \tfrac{1}{4}h_a$ to the departure of the F3 mean from the mid-parent, and
taking all k genes into account the mean phenotype will be $\bar{F}_3 = m + \tfrac{1}{4}[h]$.
If however the third generation is raised by mating together pairs of
individuals taken at random from the F2 (a procedure which is sometimes
incorrectly described, especially by animal geneticists, as giving an
F3 generation) the distribution of A-a over this generation taken as a
whole will be $\tfrac{1}{4}$AA, $\tfrac{1}{2}$Aa, $\tfrac{1}{4}$aa as in the F2, and the mean phenotype will
be $\bar{S}_3 = m + \tfrac{1}{2}[h]$ where S3 indicates the third generation raised by
sibmating among the F2. This formulation of mean phenotypes in
terms of m, [d] and [h] can be extended to the F4, where $\bar{F}_4 = m + \tfrac{1}{8}[h]$,
and indeed to any of the types of family raised by the almost endless
combinations of mating systems possible among the descendants of the
initial cross. A number of these results are collected together in Table 5.
TABLE 5.

                   Mean phenotype
Generation       m      [d]     [h]
P1               1       1       0
P2               1      -1       0
F1               1       0       1
F2               1       0      1/2
B1               1      1/2     1/2
B2               1     -1/2     1/2
F3               1       0      1/4
F4               1       0      1/8
S3               1       0      1/2
S4               1       0      3/8
F2 × P1          1      1/2     1/2
F2 × P2          1     -1/2     1/2
F2 × F1          1       0      1/2
B1 selfed        1      1/2     1/4
B2 selfed        1     -1/2     1/4
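The coefficients of Table 5 follow directly from the genotype frequencies expected in each generation at a single locus A-a. As a rough check, the sketch below recomputes a few of them with exact fractions; the function names are our own and are introduced only for illustration.

```python
from fractions import Fraction as Fr

def mean_coefficients(freqs):
    """Coefficients of (m, [d], [h]) for genotype frequencies (AA, Aa, aa)."""
    AA, Aa, aa = freqs
    assert AA + Aa + aa == 1
    # AA contributes +d, aa contributes -d, Aa contributes h
    return (Fr(1), AA - aa, Aa)

def self_once(freqs):
    """Genotype frequencies after one generation of selfing."""
    AA, Aa, aa = freqs
    return (AA + Aa / 4, Aa / 2, aa + Aa / 4)

def random_mate(freqs):
    """Genotype frequencies after one generation of random mating."""
    AA, Aa, aa = freqs
    p = AA + Aa / 2          # frequency of allele A
    q = 1 - p
    return (p * p, 2 * p * q, q * q)

F1 = (Fr(0), Fr(1), Fr(0))
F2 = self_once(F1)
F3 = self_once(F2)
F4 = self_once(F3)
S3 = random_mate(F2)
B1 = (Fr(1, 2), Fr(1, 2), Fr(0))   # F1 x P1 (AA)
B2 = (Fr(0), Fr(1, 2), Fr(1, 2))   # F1 x P2 (aa)

for name, g in [("F2", F2), ("F3", F3), ("F4", F4),
                ("S3", S3), ("B1", B1), ("B2", B2)]:
    print(name, [str(c) for c in mean_coefficients(g)])
```

Because only genotype frequencies enter the calculation, the same few lines serve for any other generation whose frequencies can be written down.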
$\chi^2_{[3]} = 3.411$
variance within the family (~) by the number of individuals in that fam-
ily (Table 6). Reference to this table shows that the greater family size
of the segregating generations has more than compensated for their greater
expected variability in that the variances of their family means are smaller
than those of their non-segregating families.
Six equations are available for estimating m, [d] and [h] and these are
obtained by equating the observed family means to their expectations,
in terms of these three parameters, which are taken from Table 5. The
coefficients of m, [d] and [h] in the six equations are listed in the central
columns of Table 6. There are three more equations than unknowns and
the estimation of the three unknowns (m, [d] and [h]) must therefore be
by a least squares technique. The six generation means to which we are
fitting the m, [d] and [h] model are not known with equal precision; for
example, the variance of the mean ($V_{\bar{P}_2}$) of P2 is almost three times that
of the B1. The best estimates will be obtained, therefore, if the generation
means and their expectations are weighted, the appropriate weights being
the reciprocals of the variances of the means. For the first entry in the
table, P1, the weight is given by 1/1.0334 = 0.9677 and so on for the
other families (Table 6).
The six equations and their weights may be combined to give three
equations whose solution will lead to weighted least squares estimates
of m, [d] and [h], as follows. In order to obtain the first of these three
equations each of the six equations is multiplied through by the coef-
ficient of m which it contains, and by its weight, and the six are then
summed. We thus have
$$
\begin{array}{rrrcr}
m & [d] & [h] & & \\
0.9676800 & +\,0.9676800 & & = & 112.5411840 \\
0.6688468 & -\,0.6688468 & & = & 65.8479674 \\
1.0310340 & & +\,1.0310340 & = & 121.3269259 \\
2.0341740 & & +\,1.0170870 & = & 227.3761048 \\
2.0458265 & +\,1.0229133 & +\,1.0229132 & = & 237.3158740 \\
1.6299918 & -\,0.8149959 & +\,0.8149959 & = & 177.9315349 \\
\hline
8.3775531 & +\,0.5067506 & +\,3.8860301 & = & 942.3395910
\end{array}
$$
The second and third equations are found in the same way using the
coefficient of [d] for the second and of [h] for the third along with the
weights as multipliers. We then have three simultaneous equations, known
as normal equations, that may be solved in a variety of ways to yield esti-
mates of m, [d] and [h].
A general approach to the solution is by way of matrix inversion. The
three equations are rewritten in the form
$$
\begin{bmatrix}
8.3775531 & 0.5067506 & 3.8860301 \\
0.5067506 & 2.5554814 & 0.1039587 \\
3.8860301 & 0.1039587 & 2.4585321
\end{bmatrix}
\begin{bmatrix} m \\ [d] \\ [h] \end{bmatrix}
=
\begin{bmatrix} 942.3395910 \\ 76.3853861 \\ 442.6386827 \end{bmatrix}
$$
that is, $J\,M = S$, where J is the information matrix, M is the estimate
of the parameters and S is the matrix of the scores.
The solution then takes the general form $M = J^{-1}S$ where $J^{-1}$ is the
inverse of the information matrix and is itself a variance-covariance
matrix.
The inversion may be achieved by any one of a number of standard
procedures (Fisher, 1946; Searle, 1966). For our example, inversion leads
to the following solution.
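The inversion can be sketched in a few lines for readers who wish to repeat it. The example below inverts the 3 × 3 information matrix by the cofactor (adjugate) formula, which is only one of the standard procedures and is chosen here for brevity; the entries of J and S are those of the normal equations above, while the function name is our own.

```python
import math

J = [[8.3775531, 0.5067506, 3.8860301],
     [0.5067506, 2.5554814, 0.1039587],
     [3.8860301, 0.1039587, 2.4585321]]
S = [942.3395910, 76.3853861, 442.6386827]

def inverse_3x3(a):
    """Invert a 3 x 3 matrix as (transposed cofactor matrix) / determinant."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = a
    cof = [
        [a22 * a33 - a23 * a32, -(a21 * a33 - a23 * a31), a21 * a32 - a22 * a31],
        [-(a12 * a33 - a13 * a32), a11 * a33 - a13 * a31, -(a11 * a32 - a12 * a31)],
        [a12 * a23 - a13 * a22, -(a11 * a23 - a13 * a21), a11 * a22 - a12 * a21],
    ]
    det = a11 * cof[0][0] + a12 * cof[0][1] + a13 * cof[0][2]
    return [[cof[j][i] / det for j in range(3)] for i in range(3)]

J_inv = inverse_3x3(J)

# M = J^-1 S gives the weighted least squares estimates of m, [d] and [h]
m, d, h = (sum(J_inv[i][k] * S[k] for k in range(3)) for i in range(3))

# The diagonal of J^-1 holds the variances of the three estimates
se_m, se_d, se_h = (math.sqrt(J_inv[i][i]) for i in range(3))

print(f"m   = {m:.4f} +/- {se_m:.4f}")
print(f"[d] = {d:.4f} +/- {se_d:.4f}")
print(f"[h] = {h:.4f} +/- {se_h:.4f}")
```

Since the inverse is itself the variance-covariance matrix, the standard errors of the estimates come out of the same calculation at no extra cost.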
on the basis of this model and for the estimates obtained it has as the
expected value
$$107.3220 - \tfrac{1}{2}(8.1997) + \tfrac{1}{2}(10.0587) = 108.2515.$$
This expectation along with those for the other five families is listed
in Table 6. The agreement with the observed values appears to be very
close and in no case is the deviation more than 0.83% of the observed
value. The goodness of fit of this model can be tested statistically by
squaring the deviation of the observed from the expected value for each
type of family and multiplying by the corresponding weight. The sum of
the products over all six types of families is a $\chi^2$. Since the data comprise
six observed means, and three parameters have been estimated, this $\chi^2$
has 6 - 3 = 3 degrees of freedom.
The contribution made to the $\chi^2$ by P1, for example, is
$(116.3000 - 115.5217)^2 \times 0.96768 = 0.5862$. Summing the six such contributions,
one from each of the six types of family, gives $\chi^2_{[3]} = 3.4110$ which has
a probability of between 0.40 and 0.30. The model must therefore be
regarded as adequate: there is no evidence of anything beyond additive
and dominance effects.
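The whole joint scaling test can be put together as a short sketch. It is not the authors' own computing procedure: the weights and weighted totals are read from the six weighted equations given earlier, the observed means are recovered by dividing one by the other, and the helper names are our own. The probability for 3 degrees of freedom is obtained from a closed form so that no statistical tables or libraries are needed.

```python
import math

# (generation, weight, weight x observed mean, coefficients of m, [d], [h])
rows = [
    ("P1", 0.9676800, 112.5411840, (1.0,  1.0, 0.0)),
    ("P2", 0.6688468,  65.8479674, (1.0, -1.0, 0.0)),
    ("F1", 1.0310340, 121.3269259, (1.0,  0.0, 1.0)),
    ("F2", 2.0341740, 227.3761048, (1.0,  0.0, 0.5)),
    ("B1", 2.0458265, 237.3158740, (1.0,  0.5, 0.5)),
    ("B2", 1.6299918, 177.9315349, (1.0, -0.5, 0.5)),
]

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

# Normal equations: information matrix J and scores S
J = [[sum(w * c[i] * c[j] for _, w, _, c in rows) for j in range(3)]
     for i in range(3)]
S = [sum(wy * c[i] for _, _, wy, c in rows) for i in range(3)]
m, d, h = solve(J, S)

# Weighted sum of squared deviations of observed from expected means
chi2 = 0.0
for name, w, wy, c in rows:
    observed = wy / w
    expected = c[0] * m + c[1] * d + c[2] * h
    chi2 += w * (observed - expected) ** 2

def chi2_upper_tail_3df(x):
    """Upper-tail probability of chi-square with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p = chi2_upper_tail_3df(chi2)
print(f"chi-square (3 d.f.) = {chi2:.4f}, P = {p:.3f}")
```

Working from the six rows rather than from the summed normal equations keeps the observed means available for the goodness-of-fit step.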
The individual scaling tests, A, B and C, referred to on page 37 can,
of course, also be used to test the model. Thus with the present data
$$A = 2\bar{B}_1 - \bar{P}_1 - \bar{F}_1 = (2 \times 116.000) - 116.300 - 117.6750 = -1.975$$
and
$$V_A = 4V_{\bar{B}_1} + V_{\bar{P}_1} + V_{\bar{F}_1} = (4 \times 0.4888) + 1.0334 + 0.9699 = 3.959$$
leading to $s_A = \sqrt{V_A} = 1.990$.
Thus A = -1.98 ± 1.99 which, when entered in a table of normal deviates,
does not differ significantly from the value 0 expected. These three tests,
as applied to the present data, are summarized in Table 7. Not surprisingly
they agree with the joint scaling test in showing the model to be
adequate.
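The three individual tests are linear contrasts of generation means, so each can be computed with its standard error in the same way. In the sketch below the means and the variances of the means are recovered from the weights and weighted totals of the six weighted equations given earlier (variance of a mean = 1/weight); the function and dictionary names are assumptions of ours. Small differences in the last decimal place from the tabulated standard errors are to be expected from rounding.

```python
import math

# generation: (weight, weight x observed mean), as in the weighted equations
data = {
    "P1": (0.9676800, 112.5411840),
    "P2": (0.6688468,  65.8479674),
    "F1": (1.0310340, 121.3269259),
    "F2": (2.0341740, 227.3761048),
    "B1": (2.0458265, 237.3158740),
    "B2": (1.6299918, 177.9315349),
}
mean = {g: wy / w for g, (w, wy) in data.items()}
var_of_mean = {g: 1 / w for g, (w, _) in data.items()}

def scaling_test(coeffs):
    """Value and standard error of a linear contrast of generation means."""
    value = sum(c * mean[g] for g, c in coeffs.items())
    variance = sum(c * c * var_of_mean[g] for g, c in coeffs.items())
    return value, math.sqrt(variance)

A = scaling_test({"B1": 2, "P1": -1, "F1": -1})
B = scaling_test({"B2": 2, "P2": -1, "F1": -1})
C = scaling_test({"F2": 4, "F1": -2, "P1": -1, "P2": -1})

for label, (value, se) in (("A", A), ("B", B), ("C", C)):
    print(f"{label} = {value:.2f} +/- {se:.2f}  (ratio {value / se:.2f})")
```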
TABLE 7.
Individual scaling tests on the data from a cross in Nicotiana
used in Table 6

Test
$A = 2\bar{B}_1 - \bar{P}_1 - \bar{F}_1 = -1.98 \pm 1.99$
$B = 2\bar{B}_2 - \bar{P}_2 - \bar{F}_1 = 2.20 \pm 2.21$
$C = 4\bar{F}_2 - 2\bar{F}_1 - \bar{P}_1 - \bar{P}_2 = -2.99 \pm 3.77$
The joint scaling test, however, does more than test the adequacy of
the additive-dominance model: it provides the best possible estimates of
all the parameters required to account for differences among family
means when the model is adequate and, as we shall see in Chapter 5, it
can be readily extended to more complex situations. In the present case,
these best estimates show that the additive and dominance components
are of the same order of magnitude and since [h] is significantly positive,
alleles which increase final height must be dominant more often than
alleles which decrease it.
In this example the simple model is adequate but this is frequently
not the case, the inadequacy being revealed both by the joint scaling test
leading to a significant $\chi^2$ and by one or more of the individual scaling
tests showing a significant departure from 0. Two examples of this,
analysed in the way just described, are summarized in Table 8.
The first is the weight per loculus of fruit in a cross between the two
tomato varieties, Danmark and Red Currant grown in 1938 (Powers,
1951). The second example, again provided by Dr D. S. Virk, is plant
height at the sixth week after planting in the experimental field in a
cross between varieties 72 and 22 of Nicotiana rustica. Variety 22 was
a parent of the cross we have just analysed in detail and 72 has the same
origin as variety 73 of the earlier cross. Both crosses were grown simul-
taneously, using the same experimental design and family sizes, in 1975.
For the tomato cross all three individual scaling tests are significant as
is also the joint scaling test. For the N. rustica cross the C scaling test
TABLE 8.
Examples of crosses where the additive-dominance model is inadequate.
1. Tomato: Danmark X Red Currant, for weight per loculus of fruit, in
1938 (Powers, 1951)
2. Nicotiana rustica: varieties 72 X 22, for plant height at sixth week
in field, in 1975.
and the joint scaling test are significant. In both cases, therefore, there is
clear evidence of the inadequacy of the simple additive-dominance model.
10. Scales
A failure of the additive-dominance model to fit the data, such as we
found with the last two examples considered in the previous Section,
must imply that one (or more) of the assumptions on which the model
is based is in fact invalid. Thus, for example, in constructing the model
we have assumed that the genes show simple autosomal inheritance. If
then some of them were sex-linked or if there were a maternal element
in the determination of the character, or indeed if the pattern of inherit-
ance departed from the simple autosomal in any other way, the model
would not be appropriate and would be found to fail in its fit with an
adequate body of observational results. This does not of course mean
that biometrical analysis is impossible: it means only that a more
appropriate model must be found and fitted to the data. The failure of the
additive-dominance model in the examples of the last Section is, how-
ever, most unlikely to be due to invalidity of the assumption of simple
autosomal inheritance. Nicotiana rustica and the tomato are both her-
maphroditic plants and sex-linkage cannot therefore be involved. The
reciprocal F1's were alike in their expression of the character and this
rules out a maternal element in its determination. There is no reason to
postulate inviability of any of the genotypes included in the families
raised, and the experiment was conducted in such a way as to minimize,
if not entirely eliminate, the chance of selection disturbing the segre-
gation of the genes.
These considerations point to the assumption of simple additivity of
the d's and h's stemming from the various genes as the invalid part of
the model. Again, as we shall see in Chapter 5, the model can be elabor-
ated to accommodate non-independence of the effects of the different
genes, although only at the expense of introducing further parameters.
There is, however, one particular cause of non-independence whose
effects can be resolved in a different way, so allowing the simple additive-
dominance model to be retained and the complexity of introducing
special parameters for the accommodation of the interactions among
the genes to be avoided.
The additive-dominance model assumes that the genes involved are
independent of each other in producing their effects; or in other words
that the total effect of all the genes affecting the character (or at least
the total effect of all such genes which affect the observations we are
making) is the simple sum of their individual effects. Clearly this need
not be so. Genes might, for example, act in a multiplicative fashion, that
is their joint effect is the product, not the sum, of their individual actions,
and such multiplicativity has in fact often been postulated. In such a case
the simple model we have been using must fail when applied to an
adequate body of data. But if two genes are acting in this way, their joint
effect being $x_a x_b$, where $x_a$ and $x_b$ are their individual effects, and we
replace the measurement of the phenotype by its logarithm we have
$\log(x_a x_b) = \log x_a + \log x_b$. The multiplicative action has been removed
and they now make their own independent contributions to the phenotype.
So when in such a case we carry out the analysis in terms of the
logarithms of our initial measurements, the assumption of independence
is justified and the simple model will fit. Many other relations between
genes and phenotype are obviously possible and each would suggest a
suitable transformation of the scale on which the measurements of the
phenotype are expressed to restore independence. To take but one more
example, if the genes are additive in their effects on the linear dimensions
of an organ while the character we are following is effectively an area it
will reflect not the sum of the gene effects (as a linear character would)
but the square of the sum. In respect of the area character, then, the
model which assumes additivity will fail; but if we replace the direct
observations by their square roots, so restoring to it a linear basis, the
assumption of additive action of the genes would be valid and the model
would fit these rescaled results. In other words where the assumption of
independent action of the genes fails for this kind of reason, it is possible
in principle to transform the data to a more appropriate scale, as by
taking logs or square roots, or whatever else it may be, and to carry out
the analysis successfully using the simple additive-dominance model on
these transformed data.
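The effect of rescaling can be illustrated numerically. In the sketch below the gene effects are hypothetical values chosen only for illustration, and the interaction contrast of a 2 × 2 table of genotype values is used as a simple measure of non-additivity between two genes; a contrast of zero means that the two genes contribute independently on that scale.

```python
import math

def interaction(table):
    """Deviation from additivity in a 2 x 2 table of genotype values."""
    (y00, y01), (y10, y11) = table
    return y11 - y10 - y01 + y00

# Hypothetical multiplicative gene action: phenotype = x_a * x_b
xa = {"aa": 1.0, "AA": 1.5}
xb = {"bb": 2.0, "BB": 3.0}
raw = [[xa[a] * xb[b] for b in ("bb", "BB")] for a in ("aa", "AA")]
logged = [[math.log(v) for v in row] for row in raw]

# Hypothetical area character: genes additive on a linear dimension,
# phenotype measured as the square of that dimension
la = {"aa": 1.0, "AA": 2.0}
lb = {"bb": 3.0, "BB": 5.0}
area = [[(la[a] + lb[b]) ** 2 for b in ("bb", "BB")] for a in ("aa", "AA")]
rooted = [[math.sqrt(v) for v in row] for row in area]

print("multiplicative, direct scale:", interaction(raw))
print("multiplicative, log scale:   ", interaction(logged))
print("area, direct scale:          ", interaction(area))
print("area, square-root scale:     ", interaction(rooted))
```

In both cases the interaction is present on the direct scale and disappears, apart from rounding error, on the transformed one.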
The difficulty is, of course, that we cannot in general know how the
genes affecting a character combine in producing their effects, or even
whether in fact they all combine in the same way. So given that the
model fails when applied to a set of data, we can only cast around for a
transformation which removes, or at any rate substantially reduces, the
non-independence. Sometimes the nature of the character may suggest a
suitable transformation. Thus if a character effectively depends on the
area of an organ, the square root transformation is an obvious one to
try; but we must not be surprised if it fails, as we obviously cannot know
that the genes combine additively in their effects on linear dimensions.
In the same way the total weight of fruit yielded by say a tomato plant
can be regarded as the product of number of fruits and their average
weight. This is a multiplicative relation and suggests a log transformation;
but again it does not follow that because these components of yield are
related multiplicatively the genes affecting any one of the components
combine in a similar way or that some genes do not affect both compo-
nents simultaneously and so introduce a disturbance into the multipli-
cative relation.
Thus ultimately the only justification for any transformation that
may be used is that it works; that whereas on the original data the model
failed because of non-independence, once the data have been transformed
the non-additivity vanishes, the simple model is adequate and there is no
need to complicate the analysis or the interpretation of its results by
introducing parameters to accommodate the non-additivity. Furthermore,
because our test of the satisfactoriness of a transformation is
empirical, by showing that it is successful in allowing analysis in terms of
the simple model, we must be careful not to use its success as a
justification for drawing theoretical conclusions concerning the physiology of
gene action. At the same time, it is of course legitimate to test the agree-
ment of any empirical scale with one expected theoretically from other
considerations. This caution is reinforced when we consider that even
where the genes are not all combining in the same way to produce their
effects it may still be possible to find a scale on which their effects are
independent on average, at least as far as the data under analysis go. In
such a case it can give us little if any good information about the nature
of gene action and interaction, and indeed this same transformation may
fail when applied to a different cross involving different genes, as has in
fact been observed to happen on many occasions in practice. Even, how-
ever, where this occurs, empirically the transformation has been justified
since it has simplified the analysis of the body of data to which it was
applicable and lent more precision and confidence to the predictive use
of the results of that analysis.
We can see the value of a suitable transformation if we return to the
example already considered on page 41, where the additive-dominance
model failed to fit the data on the weight per loculus of fruit in the cross
between two tomato varieties (Table 8). Powers (1951) has published
these data on both the original scale and on a logarithmic scale. We can,
therefore, carry out the same tests on the log transformed data. These
tests summarized in Table 9 provide clear evidence of the adequacy of
TABLE 9.
Analysis of weight per loculus of fruit in the tomato cross Danmark X Red Currant
using the log transformed data (Powers, 1951). Compare with cross 1 in Table 8
As we have already seen, when the direct counts are used, h is negative
and the Bar allele appears partially dominant to its wild-type alternative.
If, however, we apply the log transformation, h becomes positive and h/d
is larger than with the direct measure of facet number, so suggesting not
only that wild-type is partially dominant to Bar but that the degree of
dominance is larger as well as being in the opposite direction. But if we
take the square root of facet number (which might be regarded as reasonable
since the number of facets is essentially a measure of area), h/d is
near to 0, so suggesting that dominance is in truth negligible.
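The dependence of the apparent dominance on the scale is easily reproduced. The facet numbers used below are hypothetical values, chosen by us only so that they show the pattern described in the text; they are not the observations themselves. Here d is measured from the mid-parent towards the homozygote with more facets (wild-type), so that a negative h indicates dominance in the direction of Bar.

```python
import math

def d_and_h(small_hom, het, large_hom, scale=lambda x: x):
    """Return (d, h) for three genotype means after applying `scale`."""
    lo, mid, hi = scale(small_hom), scale(het), scale(large_hom)
    m = (lo + hi) / 2          # mid-parent on the chosen scale
    return hi - m, mid - m

# Hypothetical mean facet numbers for B/B, B/+ and +/+
BB, Bplus, wild = 68.0, 358.0, 779.0

d_raw, h_raw = d_and_h(BB, Bplus, wild)
d_log, h_log = d_and_h(BB, Bplus, wild, scale=math.log)
d_sqrt, h_sqrt = d_and_h(BB, Bplus, wild, scale=math.sqrt)

for label, d, h in (("direct", d_raw, h_raw),
                    ("log", d_log, h_log),
                    ("square root", d_sqrt, h_sqrt)):
    print(f"{label:12s} h/d = {h / d:+.3f}")
```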
Which of these scales we choose to use, and hence what direction and
degree of dominance we choose to accept, is in this case a matter of
taste, for with a gene difference of such large and unique effect by com-
parison with the residual variation in facet number, we have no test of
whether any of the scales is preferable to the others in respect of reduc-
ing or removing interactions with other genes. If our aim is to simplify
the representation of the effect of Bar, as far as possible, the square root
transformation has the advantage of eliminating h and leaving us only
with the need to use d in describing the relation between the three geno-
types. At the same time, no matter which scale we use we can easily pre-
dict the mean facet number of an F2 , back-cross or any other type of
family we care to consider, since in the absence of other segregating
genes of comparable effect h and d give us a complete description of the
genetic determination of the action of Bar. Furthermore, we should note
that no matter which scale is used, we must conclude that dominance, if
present, is small. Neither the log nor the square root transformation (nor
for that matter, any other reasonable transformation) would show domi-
nance as other than complete, i.e. h = d, if in fact B/+ had had the same
number of facets as one or other of homozygotes, and neither transform-
ation would have failed to reveal over-dominance, i.e. h > d, if the facet
number of B/+ had fallen outside the range determined by B/B and +/+.
As has been emphasized, the justification for using a transformed scale
is not theoretical but empirical, in that it removes or so reduces non-
independence of the gene effects as to permit the use of the additive-
dominance model with the simpler analysis and more confident predic-
tion to which it leads. Furthermore the estimates of the genetical par-
ameters d and h, obtained when the additive-dominance model can be
employed, are unconditional in that they are not subject to adjustment
by the interaction parameters which non-additivity introduces and are
constant over the range of variation under consideration. For these
reasons, while we must recognize that it is not always possible to find a
transformation which in effect removes non-additivity when this is
present in the direct measurements, the search for such a transformation
is always well worth-while.
Now F is a linear function of the h's and so, like h, can take sign: it is in
fact a weighted sum of the h's, the weights being the corresponding d's.
Where F is positive the genes from the larger parent, P1, show a
preponderance of dominance over their alleles from P2, and where F is negative
the genes from the smaller parent, P2, show the preponderance of
dominance. It will be observed too that because of F the back-cross to the
parent with the preponderance of dominance gives the smaller variance.
If we assume that all k gene pairs by which P1 and P2 differ have equal
d's and equal h's, $D = S(d^2) = kd^2$, $H = S(h^2) = kh^2$ and $F = S(dh) = kdh$.
Then $\sqrt{DH} = \sqrt{kd^2 \cdot kh^2} = kdh = F$, provided the h's are all of
the same sign. But if the h's vary in their sign, some being + and others -,
$F < \sqrt{DH}$. Exactly the same conclusions are arrived at even when we
do not have equal d's and h's, providing that the dominance ratio, h/d, is
the same for all k loci. We have, therefore, in principle a test of
consistency in the sign of the h's.
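The relation between F and $\sqrt{DH}$ can be checked with a small numerical sketch. The d's and h's below are hypothetical values for three loci, chosen by us for illustration: in the first set the dominance ratio h/d is the same at every locus, and in the second the sign of h is reversed at one locus.

```python
import math

def components(ds, hs):
    """Return D = S(d^2), H = S(h^2) and F = S(dh) over the loci."""
    D = sum(d * d for d in ds)
    H = sum(h * h for h in hs)
    F = sum(d * h for d, h in zip(ds, hs))
    return D, H, F

ds = [1.0, 2.0, 3.0]

# Constant dominance ratio h/d = 0.5, all h's positive
D1, H1, F1 = components(ds, [0.5, 1.0, 1.5])

# Same magnitudes, but h negative at the second locus
D2, H2, F2 = components(ds, [0.5, -1.0, 1.5])

print("consistent sign: F =", F1, " sqrt(DH) =", math.sqrt(D1 * H1))
print("mixed sign:      F =", F2, " sqrt(DH) =", math.sqrt(D2 * H2))
```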
When analysing the components of variation the simple additive-
dominance model assumes that the various gene pairs contribute inde-
pendently to the variances and covariances just as we saw that it did
when analysing the components of means. In addition, however we now
have the further assumption that the contribution to the variation made
by non-heritable agencies is independent of that made by the genes, or
to put it in other words that there is no interaction of genotype and
environment. This is by no means always a valid assumption, for we not
uncommonly find different genotypes to be subject to different types of
non-heritable variation. Sometimes the differences can be removed, or at
least greatly reduced by a transformation of the scale.
Commonly, however, we find that an F1 between two inbred lines of
a naturally outbreeding species, while showing an intermediate mean
expression of a character, shows a variance lower than those of both parents.
No reasonable transformation of the scale will remove such differences.
Two courses are then open. A simple, if somewhat crude, allowance for
the differences can be made by taking the average of the parental and F1
variances as the direct estimate of E; and this can be refined by an
appropriate weighting of the contributions the parents and F1 make to the
average, for example, by taking $\tfrac{1}{4}V_{P1} + \tfrac{1}{4}V_{P2} + \tfrac{1}{2}V_{F1}$ (where $V_{P1}$ is the
variance of parent 1 etc.) as a direct estimate of the E component in
$V_{1F2}$, and in the summed variances of the back-crosses, $V_{B1} + V_{B2}$.
Difficulties arise when we move on to later generations, since the corresponding
weighting should change, as for example in F3 where E in the overall
variance should be found as $\tfrac{3}{8}V_{P1} + \tfrac{3}{8}V_{P2} + \tfrac{1}{4}V_{F1}$ since only $\tfrac{1}{4}$ of the
individuals in F3 are heterozygous at any locus by comparison with $\tfrac{1}{2}$ in F2.
Probably when making this simple correction for differences in the
non-heritable variation among parents and F1, putting $E = \tfrac{1}{4}V_{P1} + \tfrac{1}{4}V_{P2} + \tfrac{1}{2}V_{F1}$
is as useful a weighting as any, and well within the limits of error of such
a crude, empirical correction.
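The weighting just described can be written as a single function of the proportion of heterozygotes. The sketch below is our own generalization of the two cases given in the text (F2 and F3): the F1 variance receives a weight equal to the heterozygous fraction and the two parents share the remainder equally. The variances supplied are hypothetical.

```python
def environmental_component(v_p1, v_p2, v_f1, het_fraction):
    """Weighted estimate of E from parental and F1 variances.

    het_fraction is the expected proportion of heterozygotes at a locus
    in the generation concerned (1/2 in F2, 1/4 in F3).
    """
    w_parent = (1 - het_fraction) / 2
    return w_parent * v_p1 + w_parent * v_p2 + het_fraction * v_f1

# Hypothetical variances of P1, P2 and F1
v_p1, v_p2, v_f1 = 4.0, 8.0, 2.0

e_f2 = environmental_component(v_p1, v_p2, v_f1, 0.5)    # weights 1/4, 1/4, 1/2
e_f3 = environmental_component(v_p1, v_p2, v_f1, 0.25)   # weights 3/8, 3/8, 1/4
print(e_f2, e_f3)
```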
The second course open to us is to expand the model and introduce
into it appropriate parameters to represent the genotype X environment
interaction in the way we shall see in Chapter 6. Such an expanded model,
however, necessarily requires more data to permit the estimation of the
greater number of parameters it entails and the testing of its goodness of
fit. The use of a suitable transformation or a simple, if necessarily approxi-
mate, correction is always worth considering if the simple additive-
dominance model can thereby be made to fit satisfactorily.
* Since this reference will be in frequent use, it will hereafter be abbreviated to M and J.
be $\tfrac{3}{8}$AA; $\tfrac{1}{4}$Aa; $\tfrac{3}{8}$aa giving a mean of $\tfrac{1}{4}h_a$. The contribution of A-a to the
variance $V_{F3}$ will thus be
$\tfrac{3}{8}d_a^2 + \tfrac{1}{4}h_a^2 + \tfrac{3}{8}(-d_a)^2 - (\tfrac{1}{4}h_a)^2 = \tfrac{3}{4}d_a^2 + \tfrac{3}{16}h_a^2$.
This overall variance can, however, be broken down into two parts: the
variance of the means of the F3 families, $V_{1F3}$, round the overall mean
of the F3 generation, and the mean variance of the F3 families, $V_{2F3}$,
each calculated round its own mean but averaged over all families. The
variance of the F3 means is like the variance of F2 in that its heritable
portion reflects the genetical differences produced by segregation at
gametogenesis of the F1. These are therefore described as first rank
variances, denoted by the subscript 1. The variances within the F3 families
themselves, however, reflect the segregation at gametogenesis of the F2
individuals and the mean variance of the F3 families is thus of the second
rank, denoted by the subscript 2. As we shall see later, rank is of special
significance in relation to the effects of linkage on the components of
variation.
In respect of A-a, the F3 families will be of three kinds derived
respectively by selfing AA, Aa and aa individuals of the F2. The families from
homozygous F2 individuals will be like P1 and P2 in the contribution A-a
makes to their means and variances and the families from Aa individuals
of F2 will be like the F2 itself in the contribution to mean and variance,
thus

F2 individuals         AA        Aa                                            aa
Frequency in F2        1/4       1/2                                           1/4
F3 family mean         $d_a$     $\tfrac{1}{2}h_a$                             $-d_a$
F3 family variance     0         $\tfrac{1}{2}d_a^2 + \tfrac{1}{4}h_a^2$       0

The contribution to the variance of F3 means, $V_{1F3}$, will thus be
$\tfrac{1}{4}d_a^2 + \tfrac{1}{2}(\tfrac{1}{2}h_a)^2 + \tfrac{1}{4}(-d_a)^2 - (\tfrac{1}{4}h_a)^2$, the last term being the correction for
the overall mean of $\tfrac{1}{4}h_a$. This reduces to $\tfrac{1}{2}d_a^2 + \tfrac{1}{16}h_a^2$, which summing
over all the genes by which P1 and P2 differ gives $\tfrac{1}{2}D + \tfrac{1}{16}H$ as the
heritable portion of $V_{1F3}$. The contribution of A-a to the mean variance,
$V_{2F3}$, will be $\tfrac{1}{4}(0) + \tfrac{1}{2}(\tfrac{1}{2}d_a^2 + \tfrac{1}{4}h_a^2) + \tfrac{1}{4}(0) = \tfrac{1}{4}d_a^2 + \tfrac{1}{8}h_a^2$ which on
summing over all gene differences gives $\tfrac{1}{4}D + \tfrac{1}{8}H$ as the heritable portion of
the mean variance.
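The single-locus algebra for the two F3 variances can be verified with exact fractions. In the sketch below, which is ours and not part of the original derivation, a phenotype a·d + b·h is held as the pair (a, b) and a quadratic quantity as its coefficients of (d², dh, h²).

```python
from fractions import Fraction as Fr

def square(v):
    """(a*d + b*h)^2 as coefficients of (d^2, dh, h^2)."""
    a, b = v
    return (a * a, 2 * a * b, b * b)

def variance(freqs, values):
    """Variance of phenotypes `values` occurring with frequencies `freqs`."""
    mean = tuple(sum(f * v[i] for f, v in zip(freqs, values)) for i in range(2))
    ex2 = tuple(sum(f * square(v)[i] for f, v in zip(freqs, values))
                for i in range(3))
    return tuple(e - s for e, s in zip(ex2, square(mean)))

AA, Aa, aa = (Fr(1), Fr(0)), (Fr(0), Fr(1)), (Fr(-1), Fr(0))
f2_freqs = [Fr(1, 4), Fr(1, 2), Fr(1, 4)]

# Means of F3 families raised from AA, Aa and aa F2 individuals
f3_family_means = [AA, (Fr(0), Fr(1, 2)), aa]
V1F3 = variance(f2_freqs, f3_family_means)

# Only families from Aa parents segregate, and they do so like an F2
within_Aa = variance(f2_freqs, [AA, Aa, aa])
zero = (Fr(0), Fr(0), Fr(0))
within = [zero, within_Aa, zero]
V2F3 = tuple(sum(f * w[i] for f, w in zip(f2_freqs, within)) for i in range(3))

print("V1F3 (d^2, dh, h^2):", [str(c) for c in V1F3])
print("V2F3 (d^2, dh, h^2):", [str(c) for c in V2F3])
```

The coefficients of d² and h² are those of D and H after summing over loci, and the two variances together account for the whole heritable variance of the F3.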
Both these variances will of course also contain a non-heritable
component, E, but these E components will not in general be equal. In the
first place the effect of those non-heritable agencies that cause
differences among the members of a family will be less on the mean of the
family than on its individual members. Indeed in respect of this part of
the non-heritable variation $E_2 = \tfrac{1}{n}E_1$, where $E_2$ is the variation of the
means of families comprising n individuals each and $E_1$ is the variation
within the families. But where each family is raised in its own plot in
the case of plants, or in its own cage or culture container in the case of
animals, we must expect greater non-heritable differences between
individuals from different families, i.e. coming from different plots or
containers, than between individuals from the same family, i.e. from the
same plot or container. Thus, unless special experimental designs are
used to avoid this situation, we must expect $E_2 > \tfrac{1}{n}E_1$ and in extreme
cases $E_2$ may even be greater than $E_1$ itself. If we write $E_w$ for the
non-heritable variation within families and $E_b$ for the additional non-heritable
variation between families, we can put $E_2 = E_b + \tfrac{1}{n}E_w$ and, of course,
$E_1 = E_w$.
There is another point to be noted about the variance of family means.
Each mean will be subject to sampling variation arising from the variation
within the family, and this will be additional to the innate variation
between the family means themselves, arising from genetical or indeed any
other differences between the means as such. The component of sampling
variation in $V_{1F3}$ will be $\tfrac{1}{n}V_{2F3}$ where each family includes n individuals,
or, if the numbers vary from one family to another, where n is the
harmonic mean of these numbers. $\tfrac{1}{n}V_{2F3}$ will of course include the item $\tfrac{1}{n}E_w$,
which is the contribution of sampling variation in respect of non-heritable
variation within families to non-heritable variation between their means.
We can thus write
$$V_{1F3} = \tfrac{1}{2}D + \tfrac{1}{16}H + E_b + \tfrac{1}{n}V_{2F3}$$
$$V_{2F3} = \tfrac{1}{4}D + \tfrac{1}{8}H + E_w.$$
In addition to these two variances we can also find the covariance,
$W_{1F23}$, between the phenotype of the F2 parent and the mean of the F3
family to which it gives rise. This covariance will of course be of the first
rank. In respect of A-a, an AA F2 individual will have a phenotype of $d_a$
and will give rise to a progeny of mean $d_a$. Similarly an aa F2 individual
will have a phenotype $-d_a$ and the mean of its progeny will be $-d_a$; but
an Aa individual in F2 will have a phenotype $h_a$ itself while the mean of
its progeny will only be $\tfrac{1}{2}h_a$. The contribution of A-a to the covariance
will thus be $\tfrac{1}{4}(d_a)^2 + \tfrac{1}{2}(h_a \cdot \tfrac{1}{2}h_a) + \tfrac{1}{4}(-d_a)^2 - \tfrac{1}{2}h_a \cdot \tfrac{1}{4}h_a$, the correction term
being the product of the F2 and overall F3 means. This reduces to
$\tfrac{1}{2}d_a^2 + \tfrac{1}{8}h_a^2$ and, summing over all the relevant genes, gives $W_{1F23} = \tfrac{1}{2}D + \tfrac{1}{8}H$.
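The parent-offspring covariance can be checked in the same exact way. The sketch below is an illustration of ours: each phenotype a·d + b·h is written as the pair (a, b), and the covariance is returned as its coefficients of (d², dh, h²).

```python
from fractions import Fraction as Fr

def product(u, v):
    """(a1*d + b1*h)(a2*d + b2*h) as coefficients of (d^2, dh, h^2)."""
    return (u[0] * v[0], u[0] * v[1] + u[1] * v[0], u[1] * v[1])

def covariance(freqs, xs, ys):
    """Covariance of paired values xs, ys occurring with frequencies freqs."""
    mean_x = tuple(sum(f * x[i] for f, x in zip(freqs, xs)) for i in range(2))
    mean_y = tuple(sum(f * y[i] for f, y in zip(freqs, ys)) for i in range(2))
    exy = tuple(sum(f * product(x, y)[i] for f, x, y in zip(freqs, xs, ys))
                for i in range(3))
    correction = product(mean_x, mean_y)
    return tuple(e - c for e, c in zip(exy, correction))

f2_freqs = [Fr(1, 4), Fr(1, 2), Fr(1, 4)]                      # AA, Aa, aa
f2_parents = [(Fr(1), Fr(0)), (Fr(0), Fr(1)), (Fr(-1), Fr(0))]  # d, h, -d
f3_means = [(Fr(1), Fr(0)), (Fr(0), Fr(1, 2)), (Fr(-1), Fr(0))]  # d, h/2, -d

W1F23 = covariance(f2_freqs, f2_parents, f3_means)
print("W1F23 (d^2, dh, h^2):", [str(c) for c in W1F23])
```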
There will be no E component in the covariance provided that the
non-heritable agencies affecting the progeny are uncorrelated with those
affecting the parents. This lack of correlation can be achieved, and an E
component avoided, by independent randomization of parents and off-
spring in the experiment, so that they do not share a common family
environment. Such independent randomization is a standard practice in
experimental plant breeding; but it is difficult to achieve with higher
animals because of the essential period of maternal care for the young
offspring, with the consequence that the covariance must be expected
to contain an E component in such cases.
We can extend the calculations to the F4 generation, where there are
three variances and two covariances. The first variance, $V_{1F4}$, is that
between the means of the groups of F4 families, where the members of
each group trace back through a single F3 family to a single F2 individual,
and it is therefore of rank 1. There will be a corresponding covariance,
$W_{1F34}$, between the means of the F3 families and the means of the F4
groups. The second variance, $V_{2F4}$, is the variance of F4 family means
within the groups taken round the group means but averaged over
groups. It will be of rank 2, and will have a corresponding covariance,
$W_{2F34}$, between F3 individuals and the mean of the F4 families to which
they give rise, calculated within groups but averaged over groups. Finally
there will be the mean variance of families averaged over all the F4
families, which will be of rank 3 since it reflects differences springing from
gametogenesis in the F3 individuals. Provided that $E_b$ is no greater
between families from different groups than between those of the same
group, and making allowance for the appropriate sampling variation of
family and group means, with n individuals in each family and n' families
in each group, it can be shown that
TABLE 11.
Biparental progenies from random matings among the individuals of an F2

                              Progeny
Mating      Frequency    Mean          Variance
AA × AA     1/16         d             0
AA × Aa     1/4          (1/2)(d+h)    (1/4)(d-h)^2
AA × aa     1/8          h             0
Aa × Aa     1/4          (1/2)h        (1/2)d^2 + (1/4)h^2
Aa × aa     1/4          (1/2)(h-d)    (1/4)(d+h)^2
aa × aa     1/16         -d            0

Overall mean (1/2)h
from this table that the contribution of A-a to the variance of family
means ($V_{1S3}$) will be
$$\tfrac{1}{16}d_a^2 + \tfrac{1}{4}[\tfrac{1}{2}(d_a + h_a)]^2 + \ldots + \tfrac{1}{16}(-d_a)^2 - (\tfrac{1}{2}h_a)^2 = \tfrac{1}{4}d_a^2 + \tfrac{1}{16}h_a^2$$
where the term $-(\tfrac{1}{2}h_a)^2$ is the correction for the deviation of the overall
mean of the generation from the mid-parent m. Similarly the contribution
of A-a to the mean variance of the families ($V_{2S3}$) will be
$$\tfrac{1}{16}(0) + \tfrac{1}{4} \cdot \tfrac{1}{4}(d_a - h_a)^2 + \ldots + \tfrac{1}{16}(0) = \tfrac{1}{4}d_a^2 + \tfrac{3}{16}h_a^2.$$
Then summing over all the relevant
genes, adding the non-heritable component of variation and also the item
for sampling variation in $V_{1S3}$, we find
TABLE 12.
Components of variation in F2 and its derivatives

                                              Sampling
Statistic     D       H        Ew     Eb      variation
V1F2          1/2     1/4      1      0       0
V1F3          1/2     1/16     0      1       (1/n) V2F3
V2F3          1/4     1/8      1      0       0
W1F23         1/2     1/8      0      0       0
V1F4          1/2     1/64     0      0       (1/n') V2F4
V2F4          1/4     1/32     0      1       (1/n) V3F4
V3F4          1/8     1/16     1      0       0
W1F34         1/2     1/32     0      0       0
W2F34         1/4     1/16     0      0       0
V1S3          1/4     1/16     0      1       (1/n) V2S3
V2S3          1/4     3/16     1      0       0
W1S23         1/4     0        0      0       0
V1S4          1/4     3/128    0      0       (1/n') V2S4
V2S4          1/8     5/128    0      1       (1/n) V3S4
V3S4          1/4     11/64    1      0       0
W1S34         1/4     1/32     0      0       0
W2S34         1/8     1/32     0      0       0
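The S3 entries can be rebuilt from first principles by enumerating the matings of Table 11. The sketch below is our own check, with hypothetical helper names: it forms every ordered pair of F2 genotypes, derives the progeny distribution of each mating, and accumulates the variance of family means and the mean variance within families as coefficients of (d², dh, h²).

```python
from fractions import Fraction as Fr
from itertools import product as pairs

# Phenotype of each genotype written as (a, b), meaning a*d + b*h
GENO = {"AA": (Fr(1), Fr(0)), "Aa": (Fr(0), Fr(1)), "aa": (Fr(-1), Fr(0))}
F2 = {"AA": Fr(1, 4), "Aa": Fr(1, 2), "aa": Fr(1, 4)}
GAMETE_A = {"AA": Fr(1), "Aa": Fr(1, 2), "aa": Fr(0)}   # P(gamete carries A)

def quad(v):
    """(a*d + b*h)^2 as coefficients of (d^2, dh, h^2)."""
    a, b = v
    return (a * a, 2 * a * b, b * b)

def progeny(g1, g2):
    """Genotype distribution among the offspring of the mating g1 x g2."""
    p, q = GAMETE_A[g1], GAMETE_A[g2]
    return {"AA": p * q, "Aa": p * (1 - q) + (1 - p) * q, "aa": (1 - p) * (1 - q)}

def moments(dist):
    """Mean (a, b) and variance (d^2, dh, h^2) of a genotype distribution."""
    mean = tuple(sum(f * GENO[g][i] for g, f in dist.items()) for i in range(2))
    ex2 = tuple(sum(f * quad(GENO[g])[i] for g, f in dist.items())
                for i in range(3))
    return mean, tuple(e - s for e, s in zip(ex2, quad(mean)))

# (frequency, family mean, within-family variance) for each ordered mating
families = [(F2[g1] * F2[g2], *moments(progeny(g1, g2)))
            for g1, g2 in pairs(F2, repeat=2)]

overall = tuple(sum(f * m[i] for f, m, _ in families) for i in range(2))
V1S3 = tuple(sum(f * quad(m)[i] for f, m, _ in families) - quad(overall)[i]
             for i in range(3))
V2S3 = tuple(sum(f * v[i] for f, _, v in families) for i in range(3))

print("overall mean (d, h):", [str(c) for c in overall])
print("V1S3:", [str(c) for c in V1S3])
print("V2S3:", [str(c) for c in V2S3])
```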
13. The balance sheet of genetic variability
Like energy, genetic variability is conserved inside a closed system. Cross-
ing, segregation and recombination, may redistribute it among the vari-
ous states in which it can exist, but in the absence of mutation, random
change and selection its total quantity remains unchanged (see Mather,
1973 for a fuller discussion of the theory of variability). One aspect of
this conservation of variability is revealed by the heritable variances we
have been discussing.
The heritable portion of the phenotypic differences between
homozygotes is D-type variation. Heterozygotes contribute to the phenotypic
differences in two ways. They may contribute directly to the
phenotypic differences among the individuals of a family or generation; but
their contribution may also appear in part as the departure of the
generation mean from the mid-parent, which as we have seen depends on
[h]. Now D and H are both quadratic quantities, in terms of d and h,
but [h] on the other hand is linear. The coefficient of [h] in the
departure of the mean from the mid-parent must thus be squared if it is to be
comparable to the coefficients of D and H. The heritable variation
expressed by the phenotypes of a generation may thus be expressed as
$xD + yH + z[h]^2$ and in the absence of complicating circumstances, x,
y and z must sum to unity.
In the F1, x = y = 0 and z = 1 since the mean is [h]; but in the F2 to
which it gives rise $x = \tfrac{1}{2}$, $y = \tfrac{1}{4}$ and with the mean at $\tfrac{1}{2}[h]$, $z = (\tfrac{1}{2})^2 = \tfrac{1}{4}$ so
once again giving $x + y + z = \tfrac{1}{2} + \tfrac{1}{4} + \tfrac{1}{4} = 1$. The F3 has an overall mean
of $\tfrac{1}{4}[h]$ so giving $z = (\tfrac{1}{4})^2 = \tfrac{1}{16}$. There are two variances whose heritable
components are to be taken into account in the F3. These are
$V_{1F3} = \tfrac{1}{2}D + \tfrac{1}{16}H$ and $V_{2F3} = \tfrac{1}{4}D + \tfrac{1}{8}H$, sampling variation being left out of
account as any differences it produces are random changes. Thus taken
together these two variances contribute $\tfrac{3}{4}D + \tfrac{3}{16}H$ and $x = \tfrac{3}{4}$, $y = \tfrac{3}{16}$
while as we have seen $z = \tfrac{1}{16}$ so completing the tally and giving x + y + z
= 1. The same applies to F4 (see Table 13) and indeed to F5 or any later
generation. In the biparental progenies of the third generation the
heritable components of the two variances are $V_{1S3} = \tfrac{1}{4}D + \tfrac{1}{16}H$ and
$V_{2S3} = \tfrac{1}{4}D + \tfrac{3}{16}H$ while the mean is $\tfrac{1}{2}[h]$. So $x = \tfrac{1}{4} + \tfrac{1}{4} = \tfrac{1}{2}$,
$y = \tfrac{1}{16} + \tfrac{3}{16} = \tfrac{1}{4}$, $z = (\tfrac{1}{2})^2 = \tfrac{1}{4}$ giving once again x + y + z = 1, and the same can be shown to
apply to S4, the fourth generation, and indeed to S5 etc. raised by
continued sib-mating (see M and J).
It will be observed that the coefficient of D in the successive F
generations follows, as indeed it must, the series $1 - (\tfrac12)^{n-1}$
which gives the proportion of individuals homozygous in the nth generation
for the alleles
TABLE 13.
The balance sheet of variability

                                          Coefficient of
Generation     Statistic            D      H        [d]^2    [h]^2
Parents                             0      0        1^2 = 1  0
F1                                  0      0        0        1^2 = 1
F2                                  1/2    1/4      0        (1/2)^2 = 1/4
F3             V1F3                 1/2    1/16
               V2F3                 1/4    1/8
               Total                3/4    3/16     0        (1/4)^2 = 1/16
F4             V1F4                 1/2    1/64
               V2F4                 1/4    1/32
               V3F4                 1/8    1/16
               Total                7/8    7/64     0        (1/8)^2 = 1/64
S3             V1S3                 1/4    1/16
               V2S3                 1/4    3/16
               Total                1/2    1/4      0        (1/2)^2 = 1/4
S4             V1S4                 1/4    3/128
               V2S4                 1/8    5/128
               V3S4                 1/4    11/64
               Total                5/8    15/64    0        (3/8)^2 = 9/64
Back-crosses   Mean variance        1/4    1/4      0
               Variance of means    0      0        1/4
               Total                1/4    1/4      1/4      (1/2)^2 = 1/4
at a locus at which the parents differed. Similarly the sum of the
coefficients of H and $[h]^2$ follows the series $(\tfrac12)^{n-1}$,
since the proportion of heterozygotes at such a locus is halved in each
generation under selfing. In the same way the coefficients of D, H and
$[h]^2$ in $S_3$, $S_4$ etc. are related to the Fibonacci series which
gives the fall in the proportion of heterozygotes under continued sib-mating.
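The two series can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes that the coefficient of D equals the proportion of homozygotes, that the coefficient of $[h]$ in the generation mean equals the proportion of heterozygotes, and that the generations under sib-mating are numbered so that the heterozygote proportions run 1, 1/2, 2/4, 3/8, 5/16. The function names are illustrative only.

```python
from fractions import Fraction


def balance_sheet(het):
    """Return (x, y, z), the coefficients of D, H and [h]^2, for a generation
    in which a proportion `het` of individuals is heterozygous at a locus."""
    x = 1 - het      # D: proportion of homozygotes
    z = het ** 2     # [h]^2: square of the coefficient of [h] in the mean
    y = het - z      # H: what remains of the heterozygotes' share
    return x, y, z


def het_selfing(n):
    """Heterozygote proportion in F_n under selfing: (1/2)^(n-1)."""
    return Fraction(1, 2) ** (n - 1)


def het_sib_mating(n):
    """Heterozygote proportion in generation n under sib-mating (assumed
    numbering): Fibonacci(n) / 2^(n-1), giving 1, 1/2, 2/4, 3/8, 5/16, ..."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return Fraction(a, 2 ** (n - 1))


for label, het in [("F3", het_selfing(3)), ("F4", het_selfing(4)),
                   ("S3", het_sib_mating(3)), ("S4", het_sib_mating(4))]:
    x, y, z = balance_sheet(het)
    print(label, x, y, z, "sum =", x + y + z)
```

Exact fractions are used so that the totals can be compared directly with the "Total" rows of Table 13.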
The same principle of conservation of variability applies to the joint
back-crosses although with the introduction of a fourth component. The
heritable portion of the mean variance of the two back-crosses is
$\bar V_B = \tfrac12(V_{B1} + V_{B2}) = \tfrac14 D + \tfrac14 H$. The means
of the back-crosses are $\bar B_1 = \tfrac12([d] + [h])$ and
$\bar B_2 = \tfrac12([h] - [d])$, the overall mean of the two taken together
being $\tfrac12[h]$. The heritable variance of the back-cross means is thus

$V_{\bar B} = \tfrac12[(\tfrac12[d])^2 + (-\tfrac12[d])^2] = \tfrac14[d]^2.$

The departure of the overall mean from the mid-parent accounts for
$(\tfrac12[h])^2 = \tfrac14[h]^2$ of the variability, and the coefficients
of D, H, $[h]^2$ and the new component $[d]^2$ thus sum to unity (Table 13).
Once this fourth component of variability is recognized we can complete the
picture by noting that in the parental generation, $\bar P_1 = [d]$ and
$\bar P_2 = -[d]$, giving a total of $[d]^2$ for the variability represented
by the difference between the means of these two true-breeding lines from
whose cross all the later generations are descended.
In conclusion we should note that D, H, $[d]^2$ and $[h]^2$ are different
components of variability with different properties. Their coefficients
sum to unity because all the variability must be accounted for, but each
of them has its own special relation to the expression of variability
among the phenotypes. Thus H and $[h]^2$ depend on dominance while D
and $[d]^2$ do not. The dominance properties of the genes express themselves
in different ways in $[h]^2$ than in H: dominance in opposing directions
tends to balance out in $[h]^2$ but not in H. Furthermore $[h]^2 \neq H$
apart from the trivial case where only one gene difference is involved,
for even where all the gene pairs show dominance in the same direction
$[h]^2$ will exceed H by a factor which depends on how many gene pairs
are involved and by how much the individual h's vary from one to
another. In the same way $[d]^2$ will reflect the distribution of the genes
between the parents whereas D will not: thus D will be the same in the
cross AABB × aabb as in AAbb × aaBB, whereas $[d]^2$ will not. And
where all the increasing alleles are associated in one parent, AA BB CC
..., and all the decreasing alleles in the other, aa bb cc ..., $[d]^2$
will exceed D by a factor depending on the number of gene pairs involved
and on the extent to which the individual d's vary from one to
another. We shall have occasion again to touch on these relationships in
a later section.
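These contrasts are easy to see with invented numbers. The sketch below is illustrative only: the gene effects are hypothetical, and the function `components` and its `signs` argument (+1 where the increasing allele is in the first parent, -1 where it is in the second) are assumptions introduced here, using $D = S(d^2)$, $H = S(h^2)$, $[d] = S(\pm d)$ and $[h] = S(h)$.

```python
def components(effects, signs):
    """effects: list of (d, h), one pair per gene difference.
    signs: +1 if the increasing allele is in parent 1, -1 if in parent 2.
    Returns (D, H, [d]^2, [h]^2)."""
    D = sum(d * d for d, _ in effects)
    H = sum(h * h for _, h in effects)
    d_sq = sum(s * d for (d, _), s in zip(effects, signs)) ** 2
    h_sq = sum(h for _, h in effects) ** 2
    return D, H, d_sq, h_sq


# Two genes of equal (hypothetical) effect, dominance in the same direction.
same_direction = [(1.0, 0.5), (1.0, 0.5)]
association = components(same_direction, [1, 1])    # AABB x aabb
dispersion = components(same_direction, [1, -1])    # AAbb x aaBB

# The same genes, but with dominance in opposing directions.
opposing = components([(1.0, 0.5), (1.0, -0.5)], [1, 1])

print("association:", association)
print("dispersion: ", dispersion)
print("opposing:   ", opposing)
```

D and H are unchanged across the three cases, while $[d]^2$ vanishes under dispersion and $[h]^2$ vanishes when the dominance deviations oppose one another.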
$F/\sqrt{DH}$ = 0.1643
$V_{P2}$ and $V_{F1}$) are all estimates of $E_w$. Two of these, from the
two parental families, do not differ from one another, but they do differ
from the $F_1$ estimate which is significantly larger. We must therefore
combine them in the way described on p. 51, to give

$E_w = \tfrac14(V_{P1} + V_{P2} + 2V_{F1}) = 41.1426.$
The combined estimate of $E_w$ together with the remaining three variances
leaves us with four equations for estimating the four components D, H, F
and $E_w$. So only a perfect fit solution is possible, the equations being

$D = 4V_{1F2} - 2(V_{B1} + V_{B2}) = 59.2062$
$H = 4(V_{B1} + V_{B2} - V_{1F2} - E_w) = 27.6304$
$F = V_{B2} - V_{B1} = 6.6459.$
These estimates are tabulated in Table 14. Finally we can estimate the
dominance ratio as $\sqrt{H/D} = 0.6831$ which agrees with the relatively
high level of dominance suggested by the analysis of the means. The
relatively low value for $F/\sqrt{DH}$ provides little evidence that the
dominance deviations at different loci are particularly consistent in sign
or magnitude.
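As a check on the arithmetic, the two ratios can be recomputed from the estimates of D, H and F quoted above. This is only a sketch; the variable names are illustrative.

```python
import math

# Perfect-fit estimates quoted in the text.
D = 59.2062
H = 27.6304
F = 6.6459

dominance_ratio = math.sqrt(H / D)   # sqrt(H/D)
f_ratio = F / math.sqrt(D * H)       # F / sqrt(DH)

print(round(dominance_ratio, 4), round(f_ratio, 4))
```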
Having only four equations for the estimation of four parameters we
must obtain a perfect fit solution to them, and we can neither calculate
the standard deviation of the estimates of D, H, E and F, nor indeed can
we test the goodness of fit of the additive-dominance model as a whole.
To do so requires a more comprehensive experiment such as that described
and analysed by Hayman (1960), which is also discussed by M and J.
Hayman's experiment was again initiated by a cross between two
true-breeding lines of Nicotiana rustica, although it was not the same cross
as the one we have just been considering. The two parents were crossed
reciprocally to give the two reciprocal $F_1$'s from each of which an $F_2$,
$F_3$ and $F_4$ were raised. The $F_3$ consisted of 10 families from each
reciprocal, i.e. 20 $F_3$'s in all, and the $F_4$ of 50 families from each
reciprocal, the 100 families thus involved being obtained by selfing 5
plants from each of 20 $F_3$ families. Back-crosses were not included in the
experiment. The character we shall be considering is plant height measured
in inches. The plants were grown in two blocks, the plots within the blocks
each comprising five plants. Each of the $F_3$ and $F_4$ families occupied
one plot in each block, but each parent, $F_1$ and $F_2$ was present as five
plots in each of the two blocks. There is internal evidence from Hayman's
account of the experiment that some $F_4$ plants, and it would appear seven
$F_4$ families, failed in the experiment or were excluded for other reasons.
$V_{1F2}$, $V_{2F3}$ and $V_{3F4}$ were obtained from the variances within
plots, round the plot means, and so include $E_w$ as their non-heritable
component. $V_{1F3}$, $V_{1F4}$ and $V_{2F4}$ were found as variances between
the relevant plot means, taken round the block means, and so include $E_b$
as well as the sampling variation stemming from $V_{2F3}$, $V_{2F4}$ and
$V_{3F4}$ respectively. Since each plot included five plants, $n = 5$ and in
$F_4$ each group included five families so giving $n' = 5$ also. Thus
allowing for sampling variation (see pp. 53-4)
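$V_{1F3} = \tfrac12 D + \tfrac1{16}H + E_b + \tfrac15 V_{2F3}$
$\quad = \tfrac12 D + \tfrac1{16}H + E_b + \tfrac15(\tfrac14 D + \tfrac18 H + E_w)$
and similarly
$V_{2F4} = \tfrac14 D + \tfrac1{32}H + E_b + \tfrac15 V_{3F4}$
$\quad = \tfrac14 D + \tfrac1{32}H + E_b + \tfrac15(\tfrac18 D + \tfrac1{16}H + E_w).$
Since $n' = 5$
$V_{1F4} = \tfrac12 D + \tfrac1{64}H + \tfrac15 V_{2F4}$
$\quad = \tfrac12 D + \tfrac1{64}H + \tfrac15(\tfrac14 D + \tfrac1{32}H + E_b) + \tfrac1{25}(\tfrac18 D + \tfrac1{16}H + E_w).$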
The coefficients of D, H, $E_w$ and $E_b$ so obtained are set out in columns
5-8 of the upper part of Table 15. $P_1$, $P_2$ and the reciprocal $F_1$'s
were each raised as five plots in each block. Thus not only could an estimate
of $E_1 = E_w$ be obtained from the pooled variances of parents and $F_1$'s
within plots, but an estimate of $E_2$, the non-heritable variance between
plots, can also be found from the pooled variances between plot means,
taken round the block means. In addition to $E_b$ this will include an item
of $\tfrac15 E_w$ because of sampling variation resulting from the variances
within plots.
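The coefficients of D, H, $E_w$ and $E_b$ in these between-plot statistics follow mechanically from the expectations just given. The sketch below works them out with exact fractions; the helper names (`vec`, `add`, `scale`) are illustrative, and the only inputs are the expectations stated in the text with $n = n' = 5$.

```python
from fractions import Fraction as Fr

n = 5         # plants per plot
n_prime = 5   # F4 families per group


def vec(D=0, H=0, Ew=0, Eb=0):
    """Coefficient vector in the order (D, H, Ew, Eb)."""
    return (Fr(D), Fr(H), Fr(Ew), Fr(Eb))


def add(*vectors):
    return tuple(sum(column) for column in zip(*vectors))


def scale(k, vector):
    return tuple(k * c for c in vector)


# Within-plot variances.
V2F3 = vec(Fr(1, 4), Fr(1, 8), Ew=1)
V3F4 = vec(Fr(1, 8), Fr(1, 16), Ew=1)

# Variances between plot means: heritable part, Eb, and sampling variation.
V1F3 = add(vec(Fr(1, 2), Fr(1, 16), Eb=1), scale(Fr(1, n), V2F3))
V2F4 = add(vec(Fr(1, 4), Fr(1, 32), Eb=1), scale(Fr(1, n), V3F4))
V1F4 = add(vec(Fr(1, 2), Fr(1, 64)), scale(Fr(1, n_prime), V2F4))

# Non-heritable variance between plots of parents and F1's.
E2 = add(vec(Eb=1), scale(Fr(1, n), vec(Ew=1)))

for name, v in [("V1F3", V1F3), ("V2F4", V2F4), ("V1F4", V1F4), ("E2", E2)]:
    print(name, [float(c) for c in v])
```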
The direct estimates of $E_1$ and $E_2$, together with the variance of $F_2$,
the two variances from $F_3$ and the three from $F_4$ are shown in Table 15,
which also gives the number of degrees of freedom (df) on which each
variance is based. (The details of the derivation of their number of
degrees of freedom are given by M and J.) There are thus eight observed
statistics from which we must estimate four parameters, D, H, $E_w$ and
$E_b$. This will leave four degrees of freedom for testing the goodness of
fit of the model.
The procedure is essentially the same method of weighted least
squares already described for the analysis of means (page 38). One
difference must, however, be noted. The variances of means, whose re-
ciprocals are used as weights in the analysis, are commonly observed
empirically in the experiments. Replication is, however, seldom sufficient
to permit the use of the same procedure where variances themselves
are to be analysed, and in consequence the theoretical variance
of the variance must be used to supply the reciprocals for use as weights.
The variance of a variance V is $2V^2/N$, where N is the number of degrees
TABLE 15.
Analysis of Hayman's (1960) experiment on plant height in Nicotiana rustica
Coefficients of
Observed df First weight
D H
                        Female parent
Male parent        AA              aa              Mean
                   d               -d              0

AA                 AA              Aa
d                  d               h               1/2(d+h)

aa                 Aa              aa
-d                 h               -d              1/2(h-d)

Mean               1/2(d+h)        1/2(h-d)        1/2 h
Vr                 1/4(d-h)^2      1/4(d+h)^2      1/4(d^2+h^2)
Wr                 1/2 d(d-h)      1/2 d(d+h)      1/2 d^2
and $V_{\bar r} + \bar V_r = \tfrac14 d_a^2 + \tfrac14(d_a^2 + h_a^2) =
\tfrac12 d_a^2 + \tfrac14 h_a^2$ which equals the contribution of such a gene
difference to $V_{1F2}$ (Table 12), as indeed it obviously should since an
$F_2$ includes AA, Aa and aa individuals in the same proportions as the
families of the corresponding genotypes in Table 16.
We can take the analysis further by considering the relation between
$W_r$ and $V_r$. Now the difference between the variances of the two arrays
is $\Delta V_r = \tfrac14[(d_a + h_a)^2 - (d_a - h_a)^2] = d_a h_a$ and that
between the covariances is $\Delta W_r = \tfrac12 d_a[(d_a + h_a) -
(d_a - h_a)] = d_a h_a$. Thus if we plot $W_r$ against $V_r$ as in Fig. 7,
the line joining the two points must have a slope of $d_a h_a / d_a h_a$
Fig. 7. The $W_r/V_r$ graph, neglecting non-heritable variation, from a
diallel set of matings involving one gene difference, A-a, where h = ld.
The line passing through two points, from arrays AA and aa respectively,
also passes through the point $\bar W_r, \bar V_r$ and has a slope of 1. It
cuts the ordinate at $W_r = \tfrac14(d^2 - h^2)$.
= 1 and it will pass through the point $\bar W_r, \bar V_r$, which as we
have seen will be the point $\tfrac12 d_a^2, \tfrac14(d_a^2 + h_a^2)$. So, if
we project the line passing through the two points of the figure backwards
it will cut the ordinate, where $V_r = 0$, at the value of $W_r$ given by

$\bar W_r - \bar V_r = \tfrac12 d_a^2 - \tfrac14(d_a^2 + h_a^2) = \tfrac14(d_a^2 - h_a^2).$
The relative position of the two array points on the line will reflect
the direction of dominance. If the A allele is dominant, that is $h_a$ is
positive, the point for array 1 (common parent AA) will occupy the lower
position on the line. If, however, the a allele is dominant and $h_a$
negative the point for array 2 (common parent aa) will occupy the lower
position on the line. This graph therefore tells us a great deal about the
genetical situation. In the absence of dominance, $V_r$ is the same for both
arrays and so is $W_r$. The two points on the graph will thus coincide
except for random sampling variation in the estimates of $V_r$ and $W_r$. If
they do not coincide, the intercept on the ordinate of the line which joins
them will provide a measure of dominance, and in particular where
$h_a < d_a$ it will cut the ordinate above the origin, where $h_a = d_a$ it
will pass through the origin and where $h_a > d_a$ it will pass below the
origin. It should be noted, of course, that so far we have neglected
non-heritable variation, which will contribute to the different variances
(although in a suitably designed experiment not to the covariances) and for
which due allowance must be made in any analysis of this kind. We will
return to the nature of the necessary allowances at a later stage.
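The slope and intercept of the line can be confirmed numerically for any chosen d and h. The sketch below is illustrative: the function name and the trial values d = 2, h = 1 are assumptions, and the variances and covariances use the divisor 2 (the number of families in an array), which is the convention of the theoretical expressions above rather than that of an empirical analysis.

```python
def one_gene_diallel(d, h):
    """Return {array: (Wr, Vr)} for the 2 x 2 diallel between AA and aa,
    neglecting non-heritable variation."""
    parents = [d, -d]                        # AA, aa
    arrays = {"AA": [d, h], "aa": [h, -d]}   # progenies of each common parent
    parent_mean = sum(parents) / 2
    points = {}
    for name, family in arrays.items():
        mean = sum(family) / 2
        Vr = sum((f - mean) ** 2 for f in family) / 2
        Wr = sum((f - mean) * (p - parent_mean)
                 for f, p in zip(family, parents)) / 2
        points[name] = (Wr, Vr)
    return points


d, h = 2.0, 1.0
points = one_gene_diallel(d, h)
(w1, v1), (w2, v2) = points["AA"], points["aa"]
slope = (w2 - w1) / (v2 - v1)
intercept = w1 - slope * v1

print(points, slope, intercept)
```

With h positive the AA array gives the lower point, and changing the sign of h interchanges the two points without altering the line.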
If the two true-breeding lines which are used as the parents of the
families differ at more than one locus the effects of all the genes by
which they differ will be reflected simultaneously in the phenotypes of
the four families derived by mating them in all four possible combinations.
In other words $d_a$ and $h_a$ must be replaced by $[d]$ and $[h]$. The
information to be gained will thus be of the same kind as that obtainable
from an analysis of means (Section 8) and being restricted to parental
and $F_1$ families it will not even yield enough statistics to test the
adequacy of the model. In the previous chapter we examined the limitations
of $[d]$ and $[h]$ in respect of the information they provide about
the dominance properties of the genes they depend on. We saw too how
these limitations can be overcome by proceeding to $F_2$ and other
segregating generations, which in addition to providing the additional means
needed to test the adequacy of the model also yield second degree
statistics enabling us to estimate and bring into the interpretation the
quadratic quantities $D = S(d^2)$ and $H = S(h^2)$. We will now examine an
alternative approach.
Table 16 is the simplest example of a diallel set of matings in which a
number, n, of true-breeding lines are mated together in all possible
combinations to give $n^2$ families. Since it involved only two lines
(n = 2) it could clearly give us information about only one genetical
difference, or, if more than one such difference was involved, only about
the differences as a unitary aggregate. If more lines are used, clearly a
correspondingly greater number of differences, or aggregate differences, can
be investigated. As the next simplest case let us consider a diallel among
four lines representing all the possible combinations of two gene
differences, A-a and B-b. The genotypes of the 16 families so obtained are
shown in Table 17 as are the phenotypes expected on the assumption that A-a
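TABLE 17.
Diallel set of matings involving four true-breeding lines, being all the
combinations of two genes, A-a and B-b

                                 Female parent
Male parent   AABB       AAbb       aaBB       aabb       Mean
              da+db      da-db      -da+db     -da-db     0

AABB          AABB       AABb       AaBB       AaBb
da+db         da+db      da+hb      ha+db      ha+hb      1/2[(da+ha) + (db+hb)]

AAbb          AABb       AAbb       AaBb       Aabb
da-db         da+hb      da-db      ha+hb      ha-db      1/2[(da+ha) + (hb-db)]

aaBB          AaBB       AaBb       aaBB       aaBb
-da+db        ha+db      ha+hb      -da+db     -da+hb     1/2[(ha-da) + (db+hb)]

aabb          AaBb       Aabb       aaBb       aabb
-da-db        ha+hb      ha-db      -da+hb     -da-db     1/2[(ha-da) + (hb-db)]

Mean (female arrays AABB, AAbb, aaBB, aabb; overall):
              1/2[(da+ha) + (db+hb)]    1/2[(da+ha) + (hb-db)]
              1/2[(ha-da) + (db+hb)]    1/2[(ha-da) + (hb-db)]    1/2(ha+hb)

Vr   AABB: 1/4[(da-ha)^2 + (db-hb)^2]    AAbb: 1/4[(da-ha)^2 + (db+hb)^2]
     aaBB: 1/4[(da+ha)^2 + (db-hb)^2]    aabb: 1/4[(da+ha)^2 + (db+hb)^2]
Wr   AABB: 1/2[da(da-ha) + db(db-hb)]    AAbb: 1/2[da(da-ha) + db(db+hb)]
     aaBB: 1/2[da(da+ha) + db(db-hb)]    aabb: 1/2[da(da+ha) + db(db+hb)]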
and B-b contribute independently. At the foot of the table are the four
$V_r$'s, one for each array, and similarly the four $W_r$'s. It will be
observed that, as in the earlier example, $\Delta W_r = \Delta V_r$ when we
move from one array to another. Thus moving from array AAbb to AABB gives
$\Delta W_r = \Delta V_r = d_b h_b$, and from aabb to AABB gives
$\Delta W_r = \Delta V_r = d_a h_a + d_b h_b$. So, if we plot $W_r$ against
$V_r$ the four points, one from each array, will lie
on a straight line of slope 1. Furthermore it must pass through the point
$\bar W_r, \bar V_r$ which is $\tfrac12(d_a^2 + d_b^2), \tfrac14(d_a^2 +
h_a^2 + d_b^2 + h_b^2)$ and may be rewritten as $\tfrac12 D, \tfrac14(D + H)$.
The line will thus cut the ordinate at $\bar W_r - \bar V_r = \tfrac12 D -
\tfrac14(D + H) = \tfrac14(D - H)$. So we can learn something of the average
dominance relations of the two genes and indeed, bearing in mind that the
variance among the four parent means is $V_P = \tfrac14[(d_a + d_b)^2 +
(d_a - d_b)^2 + (-d_a + d_b)^2 + (-d_a - d_b)^2] = d_a^2 + d_b^2 = D$, we can
obtain an estimate of the average dominance as $\sqrt{(V_P - 4I)/V_P} =
\sqrt{H/D}$, where I is the intercept of the regression line with the
ordinate.
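The same relations can be verified for the four-line diallel by building the sixteen expected phenotypes and computing $W_r$ and $V_r$ array by array. This is a sketch under the assumptions of the text (independent gene action, no non-heritable variation, divisor 4 for variances and covariances); the function name and the trial values of the d's and h's are hypothetical.

```python
import math
from itertools import product


def two_gene_diallel(da, ha, db, hb):
    """Return ([(Wr, Vr) for each array], Vp) for the diallel among
    AABB, AAbb, aaBB and aabb, neglecting non-heritable variation."""
    lines = list(product([1, -1], repeat=2))   # +1 = AA or BB, -1 = aa or bb

    def value(x, y):
        # Expected phenotype of the family from lines x and y: each gene adds
        # +d or -d if both parents carry the same allele, h otherwise.
        total = 0.0
        for gx, gy, d, h in ((x[0], y[0], da, ha), (x[1], y[1], db, hb)):
            total += d * gx if gx == gy else h
        return total

    parents = [value(line, line) for line in lines]
    parent_mean = sum(parents) / 4
    points = []
    for common in lines:
        family = [value(common, other) for other in lines]
        mean = sum(family) / 4
        Vr = sum((f - mean) ** 2 for f in family) / 4
        Wr = sum((f - mean) * (p - parent_mean)
                 for f, p in zip(family, parents)) / 4
        points.append((Wr, Vr))
    Vp = sum((p - parent_mean) ** 2 for p in parents) / 4
    return points, Vp


da, ha, db, hb = 2.0, 1.0, 1.0, 0.5
points, Vp = two_gene_diallel(da, ha, db, hb)
D = da ** 2 + db ** 2
H = ha ** 2 + hb ** 2
intercept = points[0][0] - points[0][1]   # slope is 1, so I = Wr - Vr
average_dominance = math.sqrt((Vp - 4 * intercept) / Vp)

print(points, Vp, intercept, average_dominance)
```

Because every array gives the same value of $W_r - V_r$, the four points lie on a line of unit slope whose intercept is $\tfrac14(D - H)$.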
We should note, too, that now two genes, A-a and B-b, are involved
the relation of $W_r$ to $V_r$ provides a test of the additive-dominance
model of gene action. The phenotypes set out in Table 17 are those expected
when the two gene pairs make independent contributions to the expression
of the character. If their contributions are not independent,
that is if the genes interact in producing their effects, we cannot expect
the relations of $W_r$ and $V_r$ to hold good as we have derived them, and in
particular we can no longer expect the regression of $W_r$ on $V_r$ to be
rectilinear with a slope of 1.
the sets of matings reared on the two occasions, and 15 for the interaction
of matings × occasions, i.e. for the differences between the duplicate
observations after allowance has been made for the overall difference
between occasions. The 15 df for differences between matings may
be partitioned into 3 items, namely 3 df for differences among the 4
genotypes of female parents, 3 for differences among the 4 genotypes
of male parents, and 9 for the interaction of female and male parental
genotypes. The main items for differences among female and male
parents both reflect differences among the same set of four genotypes
and so, in the absence of complications such as maternal effects, should
yield estimates of the same component of variation, which will of course
be the additive variation (D). The item for interaction of female and
male parents will test for departures from simple additivity of the gene
effects, including dominance as well as non-additivity of non-allelic
genes in producing their effects. The analysis of variance is set out in
Table 19. The matings × occasions item provides an estimate of the error
variation. The mean square for occasions is significant, so confirming
that, as might be expected, the experimental conditions were not precisely
the same at the times when the progenies were raised from the
duplicate sets of matings. The mean squares for the differences among
the four genotypes are significant for both female and male parents,
showing that there is additive genetic variation among these genotypes.
The item for interaction of the differences among the female and male
parents, although not so large, is also significant, so showing that the
differences among the sixteen progenies are not wholly accountable in
terms of additive variation: there must also be present non-additive
variation to which both dominance and interaction of non-allelic genes
could contribute.

TABLE 19.
Analysis of variance of the diallel data in Table 18

Item              df    MS         VR       P
Female parents     3    1.41146    15.02    <0.001
Male parents       3    1.05563    11.23    <0.001
Interaction        9    0.28306     3.01    0.05-0.01
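The variance ratios in Table 19 are simply each mean square divided by the error mean square. The sketch below reproduces them, taking the error (matings × occasions) mean square as 0.09398, the value the text quotes for it when the components of variation are estimated; the dictionary names are illustrative.

```python
# Error mean square (matings x occasions, 15 df), as quoted in the text.
error_ms = 0.09398

mean_squares = {
    "Female parents": 1.41146,
    "Male parents": 1.05563,
    "Interaction": 0.28306,
}

# Variance ratio (VR) of each item against the error mean square.
variance_ratios = {item: ms / error_ms for item, ms in mean_squares.items()}

for item, vr in variance_ratios.items():
    print(f"{item:15s} VR = {vr:.2f}")
```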
The mean squares for female parents and male parents do not differ
significantly from one another, as would be expected if the two sexes
are contributing equally to the genotypes of the progeny. There is thus
no indication of any maternal effect, or indeed of any other departure
from simple autosomal inheritance, and the close comparison of means
for the corresponding arrays from female parents and male parents
shown in the margins of Table 18 confirms this. A further and more
stringent test is, however, possible. The four matings along the leading
diagonal of the diallel table (Table 18) are repeats of the homozygous
parental lines, the female and male parents being of the same genotype
in each case. The other twelve matings are between parents of different
genotypes and fall into six pairs of reciprocal crosses. Provided the
parents contribute equally to the progeny these reciprocals should be
alike within the limits of sampling variation. The mean square for differ-
ences between reciprocals can thus be compared with error variation to
provide a test of equilinearity in the genetical determination of the
character. The mean square is readily found. Thus the duplicate progenies
from WS × WW give values of 17.25 and 18.35 while those from WW ×
WS give 18.05 and 18.55. The difference between the reciprocals is therefore
17.25 + 18.35 - 18.05 - 18.55 = -1.0 and the contribution of this
comparison to the sum of squares (SS) is $(-1.0)^2/4$, the divisor 4
reflecting the use of 4 observations in deriving the difference. There are 6
such differences, obtained from the 6 pairs of reciprocal crosses, as set out
in Table 20, and summing their contributions yields a SS of 0.771875. This
TABLE 20.
Differences between the offspring of reciprocal crosses in the
data of Table 18

          WS       SW       SS
WW       -1.00     0.85     0.75
WS                -0.85     0.20
SW                         -0.20
Total    -0.25
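The sum of squares for reciprocal differences can be recomputed directly from the six entries of Table 20. In the sketch below the dictionary layout is illustrative, and the division by 6 to obtain a mean square is an assumption (one degree of freedom per pair of reciprocal crosses) rather than a figure quoted in the text.

```python
# Differences between reciprocal crosses, from Table 20.
differences = {
    ("WW", "WS"): -1.00, ("WW", "SW"): 0.85, ("WW", "SS"): 0.75,
    ("WS", "SW"): -0.85, ("WS", "SS"): 0.20,
    ("SW", "SS"): -0.20,
}

# Each difference is built from 4 observations, hence the divisor 4.
ss_reciprocals = sum(diff ** 2 / 4 for diff in differences.values())

# Assumed: 6 df, one for each pair of reciprocals.
ms_reciprocals = ss_reciprocals / len(differences)

print(round(ss_reciprocals, 6), round(ms_reciprocals, 6))
```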
         WW        WS        SW        SS        Mean      Wr       Vr
WW     17.5500   18.0500   18.1125   17.7375   17.8625   0.1427   0.0703
WS               18.8000   18.4875   18.9500   18.5719   0.2748   0.1582
SW                         19.2250   18.7500   18.6438   0.3232   0.2186
SS                                   19.3500   18.6969   0.5271   0.4713
Mean                                                     0.3169   0.2296
the progenies of mating within the four parental genotypes and thus are
repeats of these four parental lines. Each is the mean of two duplicate
progenies. Thus for WW × WW we have $\tfrac12(17.45 + 17.65) = 17.55$. The
off-diagonal entries on the other hand are the means of four progenies,
namely the pair of reciprocals each of which is represented by duplicate
progenies. Thus the entry for WW × WS is $\tfrac14(17.25 + 18.35 + 18.05 +
18.55) = 18.05$. In proceeding to find $W_r$ and $V_r$ we note that, after
pooling our reciprocals, it does not matter whether we work on female
or male arrays: they will give identical results. The WS array, for example,
consists of WW × WS (18.0500), WS × WS (18.8000), WS × SW (18.4875)
and WS × SS (18.9500) as shown by the linking lines in Table 21. Its $V_r$
is thus

$\tfrac13[(18.0500^2 + 18.8000^2 + 18.4875^2 + 18.9500^2) - \tfrac14(18.0500 + 18.8000 + 18.4875 + 18.9500)^2] = 0.1582$

the final divisor being 3 because there are 3 df among the 4 progenies.
These values of $V_r$ are entered in the right-hand column of the table.
The calculation of $W_r$ requires a further word of explanation. We
could have used values for the four parental lines obtained from progenies
of these lines obtained independently of the diallel itself. This is, however,
unnecessary as the four parental lines appear along the leading diagonal
of the diallel table and we can in fact utilize these four entries in the
table to provide values of the mean chaeta numbers of the four parental
genotypes. (This introduces a complication in assessing the values of the
components of variation, as we shall see later (p. 80), but one which
does not affect our immediate analysis and so may be ignored for the
moment.) So again taking the WS array as an example, we find its $W_r$ as

$\tfrac13\{[(18.0500 \times 17.5500) + (18.8000 \times 18.8000) + (18.4875 \times 19.2250) + (18.9500 \times 19.3500)] - \tfrac14(18.0500 + 18.8000 + 18.4875 + 18.9500)(17.5500 + 18.8000 + 19.2250 + 19.3500)\} = 0.2748.$

The values of $W_r$ for the four arrays are given next to those for the
corresponding $V_r$ in Table 21.
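The calculations just described can be repeated for all four arrays from the pooled entries of Table 21. The following sketch uses the divisor 3 (df among the 4 progenies) as in the worked WS example; the helper names `entry` and `cov` are illustrative, and small disagreements in the fourth decimal place are to be expected because the table entries are themselves rounded.

```python
lines = ["WW", "WS", "SW", "SS"]

# Upper triangle of Table 21 (reciprocals pooled).
upper = {
    ("WW", "WW"): 17.5500, ("WW", "WS"): 18.0500,
    ("WW", "SW"): 18.1125, ("WW", "SS"): 17.7375,
    ("WS", "WS"): 18.8000, ("WS", "SW"): 18.4875, ("WS", "SS"): 18.9500,
    ("SW", "SW"): 19.2250, ("SW", "SS"): 18.7500,
    ("SS", "SS"): 19.3500,
}


def entry(a, b):
    """Mean of the progenies from lines a and b, in either order."""
    return upper[(a, b)] if (a, b) in upper else upper[(b, a)]


def cov(xs, ys):
    """Covariance with divisor n - 1; cov(xs, xs) is the variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


# Parental values are taken from the leading diagonal.
parents = [entry(line, line) for line in lines]

results = {}
for common in lines:
    array = [entry(common, other) for other in lines]
    results[common] = {
        "mean": sum(array) / len(array),
        "Wr": cov(array, parents),
        "Vr": cov(array, array),
    }

Vp = cov(parents, parents)   # variance of the parental lines

for line in lines:
    print(line, {k: round(v, 4) for k, v in results[line].items()})
print("Vp =", round(Vp, 6))
```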
Fig. 8. The $W_r/V_r$ graph for sternopleural chaeta number in the defined
diallel among the four lines WW, WS, SW and SS in Drosophila melanogaster.
The slope of the regression line is b = 0.9172, which does not differ
significantly from 1. The position of the points along this line shows that
the genes from W are preponderantly dominant and those from S
preponderantly recessive.
TABLE 22.
Values of Wr and Vr from the two occasions

                   Occasion 1                             Occasion 2
Array    Wr       Vr       Wr+Vr    Wr-Vr       Wr       Vr       Wr+Vr    Wr-Vr
WW       0.1242   0.0275   0.1517   0.0967      0.1283   0.2004   0.3288   -0.0721
WS       0.3750   0.3317   0.7067   0.0433      0.1772   0.0518   0.2290    0.1254
SW       0.3467   0.2781   0.6248   0.0686      0.2914   0.1677   0.4590    0.1237
SS       0.4063   0.3268   0.7331   0.0794      0.6696   0.6538   1.3233    0.0158
them. There are thus eight values for each of $W_r + V_r$ and $W_r - V_r$,
one from each of the four arrays in each of the two halves of the experiment.
We can now carry out an analysis of variance on $W_r + V_r$ and another
similarly on $W_r - V_r$. In each case there will be 7 df among the 8
observed values, of which 3 can be ascribed to differences between the
arrays and the remaining 4 to the differences between the duplicate
values obtained for each of the 4 arrays. These 4 df could be further
partitioned into 1 df for the overall difference between occasions and
3 df for variation of the 4 array differences round this overall value; but
this is unnecessary in the present case since the overall difference
between occasions is not significant when compared with the residual
variation for the 3 df. We thus have a simple analysis into two parts, one of
which for 4 df is a measure of the variation within arrays between
occasions and provides the estimate of error against which the mean square
between arrays can be tested for significance.
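The two analyses can be sketched as a one-way analysis of variance with two values (one per occasion) for each of the four arrays. The code below starts from the $W_r$ and $V_r$ values of Table 22 and forms the sums and differences itself; the function `one_way` and the dictionary layout are illustrative, not the authors' own procedure.

```python
# (occasion 1, occasion 2) values from Table 22.
wr = {"WW": (0.1242, 0.1283), "WS": (0.3750, 0.1772),
      "SW": (0.3467, 0.2914), "SS": (0.4063, 0.6696)}
vr = {"WW": (0.0275, 0.2004), "WS": (0.3317, 0.0518),
      "SW": (0.2781, 0.1677), "SS": (0.3268, 0.6538)}


def one_way(values):
    """values: {array: (occasion 1, occasion 2)}.
    Returns (MS between arrays on k - 1 df, MS within arrays on k df)."""
    k = len(values)
    grand = sum(a + b for a, b in values.values())
    ss_between = (sum((a + b) ** 2 / 2 for a, b in values.values())
                  - grand ** 2 / (2 * k))
    ss_within = sum((a - b) ** 2 / 2 for a, b in values.values())
    return ss_between / (k - 1), ss_within / k


sums = {a: (wr[a][0] + vr[a][0], wr[a][1] + vr[a][1]) for a in wr}
diffs = {a: (wr[a][0] - vr[a][0], wr[a][1] - vr[a][1]) for a in wr}

ms_between_sum, ms_within_sum = one_way(sums)
ms_between_diff, ms_within_diff = one_way(diffs)

print("Wr + Vr:", round(ms_between_sum, 4), round(ms_within_sum, 4),
      "VR =", round(ms_between_sum / ms_within_sum, 2))
print("Wr - Vr:", round(ms_between_diff, 4), round(ms_within_diff, 4))
```

For $W_r - V_r$ the mean square between arrays comes out smaller than that within arrays, so no variance ratio need be formed.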
The two analyses of variance, for $W_r + V_r$ and $W_r - V_r$ respectively,
so obtained are set out in Table 23. The MS between arrays for $W_r - V_r$ is
not significant when tested against that within arrays and indeed is
smaller than it. There is thus no evidence of any non-allelic interaction;
no evidence, that is, of any inadequacy of the additive-dominance model.
Turning to the analysis of variance of $W_r + V_r$, it will be seen that the
MS between arrays is greater than that within them, but not significantly so.
On this evidence alone, therefore, we could not be confident that even
dominance was present. We should recall, however, the evidence from
the initial analysis of variance (Table 19) of non-additive effects, which
must be accounted for in some way. Since there is no evidence of
interaction between non-allelic genes, we must conclude that although not
formally significant by itself the higher value for the MS between arrays
for $W_r + V_r$ does in fact reflect dominance, and that while the
assumption of additive genetic variation alone is not adequate, the
additive-dominance model does provide an adequate basis for interpreting the
results.

TABLE 23.
Analyses of variance of Wr + Vr and Wr - Vr

Statistic   Item              df    MS
Wr + Vr     Between arrays     3    0.2200    VR = 2.77
            Within arrays      4    0.0794    P = 0.20-0.05
Returning to the overall estimates of $W_r$ and $V_r$ obtained when the
data from the two halves of the experiment are pooled (Table 21), their
mean values are $\bar W_r = 0.3169$ and $\bar V_r = 0.2296$. To these two
statistics we may add the variance of the parent lines ($V_P$) found from
the leading diagonal of the diallel table which, as has already been noted,
comprises the four parental genotypes. We thus find from Table 21

$V_P = \tfrac13[17.550^2 + 18.800^2 + 19.225^2 + 19.350^2 - \tfrac14(17.550 + 18.800 + 19.225 + 19.350)^2] = 0.675573.$

However, before we can use these estimates for deriving the values of the
genetical components of variation D and H, they must be corrected for the
non-heritable items that they contain. The original analysis of variance of
the experiment (Table 19) yielded a value of 0.09398 for the error variance
based on the differences between the duplicate observations made on each of
the sixteen matings in the table. This error variation reflects, of course,
the non-heritable differences to which the observations are subject and
hence provides the basis for finding the non-heritable components of the
three statistics in which we are now interested. We note that each value
along the leading diagonal of Table 21 is the mean of a pair of duplicate
observations. These will thus be subject to half the error variation of the
single observations and we can estimate the non-heritable component of
$V_P$, which is the variance of the values in this leading diagonal, as
$\tfrac12 \times 0.09398 = 0.04699$. Thus the heritable component of
$V_P = D = 0.67557 - 0.04699 = 0.62858$.
The off-diagonal entries in Table 21 are, however, the means of four
observations each, and so will be subject to only $\tfrac14$ the error
variation of single observations. $V_r$ for each array is based on three
such off-diagonal entries together with one diagonal entry. In other words
$\tfrac34$ of the observations on which $V_r$ is based are each subject to
$\tfrac14$ of the error variance, and $\tfrac14$ of the observations are
subject to $\tfrac12$ the error variance. Thus the non-heritable component
of each $V_r$ can be estimated as $(\tfrac34 \cdot \tfrac14 + \tfrac14 \cdot
\tfrac12)\,0.09398 = 0.02937$ and the heritable component of
$\bar V_r = \tfrac14(D + H) = 0.22959 - 0.02937 = 0.20022$.
Turning to $\bar W_r$, we note that it would contain no non-heritable item
if it had been calculated using values of the parental lines from
observations made independently of the diallel matings. In fact, however, we
are taking the parental values from the leading diagonal of the diallel
table itself. So, every $W_r$ will include, as one of the four cross-products
from which it is derived, the square of the appropriate parental value.
Thus, for example, as we have already seen, $W_r$ for the WS array is based on

$(18.0500 \times 17.5500) + (18.8000)^2 + (18.4875 \times 19.2250) + (18.9500 \times 19.3500).$

This squared value will bring in an item for non-heritable variation. It is
a value from the leading diagonal of Table 21 and so is the mean of two
observations and it provides one of the four cross-products that contribute
to each $W_r$. Hence the non-heritable component of $\bar W_r$ will be
$(\tfrac14 \cdot \tfrac12)\,0.09398 = 0.01175$ and the genetic component of
$\bar W_r = \tfrac12 D$ thus becomes $0.31693 - 0.01175 = 0.30518$. Before
proceeding we might observe that while the regression of $W_r$ on $V_r$ used
in analysing their relationship should strictly be the regression of the
genetic portion of $W_r$ on the genetic portion of $V_r$, the regression of
$W_r$ on $V_r$ uncorrected for their non-heritable components (as used in
Fig. 8) will give exactly the same value for b since we subtract a common
non-heritable item from all four $W_r$ and also a common one from all four
$V_r$. The slope of the regression line is thus not affected, even although
its position as defined by the point $\bar W_r, \bar V_r$, through which it
must pass, and hence its intercept with the ordinate, is valid only after
the non-heritable components have been deducted.
Returning to our main theme, there is another statistic which we have
not used so far but which can be calculated from the diallel table, namely
the variance of array means, $V_{\bar r}$, whose heritable component is
$\tfrac14 D$. These means are shown in Table 21 from which we find
$V_{\bar r} = 0.15278$. This variance, too, will contain a non-heritable
component. Each array mean is derived from an array as shown in Table 21,
and thus corresponds to the joint mean of the corresponding female and male
arrays of Table 18: in fact the mean of the WW array is the mean of all the
observations in the first column and first row of Table 18, the observations
in the top left corner each having been used twice. The array mean is thus
the mean of twelve observations used once each and two used twice, thus
being the equivalent of $(12 \times 1) + (2 \times 2) = 16$ observations.
But when an observation is multiplied by two, the amount it contributes to a
variance is multiplied by four. So the non-heritable component of the
variance of array sums will be $(12 \times 1) + (2 \times 4) = 20$ times the
error variance and the non-heritable variance of array means, $V_{\bar r}$,
will be correspondingly

$\frac{20}{16^2}(0.09398) = 0.00734.$
So after deducting the non-heritable components we have

$V_P = D = 0.62858$, $\quad \bar V_r = \tfrac14(D + H) = 0.20022$
$\bar W_r = \tfrac12 D = 0.30518$, $\quad V_{\bar r} = \tfrac14 D = 0.14544$

We can thus find estimates of D and H as

$D = \tfrac47(V_P + \bar W_r + V_{\bar r}) = \tfrac47(1.07920) = 0.61669$
$H = 4\bar V_r - D = 0.80088 - 0.61669 = 0.18419.$

Then as an estimate of the average level of dominance we can take

$\sqrt{H/D} = \sqrt{0.18419/0.61669} = \pm 0.54651.$
TABLE 24.
Components of variation in the diallel of Table 18
$\sqrt{H/D}$ = 0.5465
We might note that this procedure for estimating D and H is not fully
efficient as we have given $V_P$, $\bar W_r$ and $V_{\bar r}$ equal weight in
finding D. A more complex procedure can be used to provide least squares
estimates which at D = 0.62288, H = 0.17792 and $\sqrt{H/D} = 0.53445$ are
virtually identical with those yielded by the simpler procedure.
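The simpler procedure can be followed step by step in a few lines. The sketch below takes the four observed statistics and the error variance quoted in the text, deducts the non-heritable components with the multipliers derived above, and then forms D, H and $\sqrt{H/D}$; the variable names are illustrative.

```python
import math

error_var = 0.09398     # error variance from the analysis of variance

# Observed (uncorrected) statistics from Table 21.
Vp_obs = 0.675573       # variance of the parental (diagonal) values
Wr_obs = 0.31693        # mean Wr
Vr_obs = 0.22959        # mean Vr
Vrbar_obs = 0.15278     # variance of array means

# Deduct the non-heritable components.
Vp = Vp_obs - error_var / 2                                  # means of 2 observations
Vr_mean = Vr_obs - (3 / 4 * 1 / 4 + 1 / 4 * 1 / 2) * error_var
Wr_mean = Wr_obs - (1 / 4 * 1 / 2) * error_var
V_array_means = Vrbar_obs - 20 / 16 ** 2 * error_var

# Vp = D, mean Wr = D/2 and V of array means = D/4 together estimate 7D/4.
D = 4 / 7 * (Vp + Wr_mean + V_array_means)
H = 4 * Vr_mean - D
average_dominance = math.sqrt(H / D)

print(round(D, 5), round(H, 5), round(average_dominance, 5))
```

The equal weighting of the three statistics in D is the feature the text describes as not fully efficient; the least squares estimates quoted above would need the fuller weighted analysis.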
While we now have an estimate of the dominance ratio we do not as
yet have any indication as to its direction. But as we have earlier noted,
the order of the points on the $W_r$, $V_r$ graph itself gives an indication
of the relative number of dominant to recessive genes present in the
common parent of each array: the common parent with the most dominant
genes has the smallest values of $W_r$ and $V_r$ and that with the most
recessive genes the largest values of $W_r$ and $V_r$. Now it can be seen
from Table 21 that the order of the arrays from the smallest to the largest
values of $W_r$ and $V_r$ is WW, WS, SW and SS. Since WW has a smaller value
than SW and WS than SS, the W chromosome II must show dominance over its S
homologue. Similarly WW gives smaller values for $W_r$ and $V_r$ than does
WS, and SW than SS. Thus the W homologue of chromosome III also shows
dominance over the S. Since, therefore, the W homologues are also
associated with a lower score (Table 21) and the S homologues with a
higher score, the direction of dominance is clearly preponderantly for
lower score, the dominance deviations being negative.
This diallel is defined in the sense that the genotypes of the parents,
and hence of the progenies, are known for every mating. It is therefore
possible to approach its analysis in a different way. The progenies of the
sixteen matings fall into the nine genotypes expected for all the possible
combinations of two 'genes' each with two 'alleles'. The different
genotypes are not expected to be produced by the same number of matings;
the homozygotes WWWW, WWSS, SSWW, and SSSS are each represented
by single matings, although of course duplicate progenies are available
for each of them; the four single heterozygotes WWWS, WSWW, SSWS,
and WSSS each came from two matings (reciprocals) and so are
represented by four progenies; and the double heterozygote is produced by
four matings (those along the off-diagonal in Table 18) and so is
represented by eight progenies. The mean chaeta numbers of the nine
genotypes, obtained by averaging over the appropriate observations in
Table 18, are set out in Table 25, together with (in brackets) the number
of observations from which each is derived. The means in the margins of
the table are the means of all flies of the particular genotype in question.
Thus, for example, the mean of all flies homozygous for the W
chromosome II is given at the bottom of the first column, having been
found as $\tfrac{1}{8}[(17.55 \times 2) + (18.05 \times 4) + (18.80 \times 2)] = 18.1125$.
The expected departures of these marginal means from the mid-parent value
of the whole experiment are also shown in terms of $d_2$, $d_3$, $h_2$
and $h_3$, where the subscripts 2 and 3 refer to chromosomes II and III
respectively. As will be readily seen, we can estimate these four
parameters from the marginal means, the chromosome II parameters from the
lower margin of the table and the chromosome III from the right-hand
margin. Considering the chromosome II parameters
$$d_2 = \tfrac{1}{2}[(d_2 + \tfrac{1}{2}h_3) - (-d_2 + \tfrac{1}{2}h_3)] = \tfrac{1}{2}[19.0188 - 18.1125] = 0.4531$$
$$h_2 = \tfrac{1}{2}[2(h_2 + \tfrac{1}{2}h_3) - (d_2 + \tfrac{1}{2}h_3) - (-d_2 + \tfrac{1}{2}h_3)] = \tfrac{1}{2}[2 \times 18.3219 - 19.0188 - 18.1125] = -0.2438.$$
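The same contrasts apply to both margins of Table 25, so the four parameters and m can be obtained with one small helper. The sketch below uses the marginal means printed in Table 25; the function name and dictionary layout are illustrative assumptions, not the authors' notation.

```python
# Marginal means from Table 25 (W/W, W/S and S/S classes).
chromosome_II = {"WW": 18.1125, "WS": 18.3219, "SS": 19.0188}   # lower margin
chromosome_III = {"WW": 18.2500, "WS": 18.2563, "SS": 19.0125}  # right-hand margin

def d_and_h(margin):
    """Additive and dominance estimates from one set of marginal means."""
    d = (margin["SS"] - margin["WW"]) / 2
    h = (2 * margin["WS"] - margin["SS"] - margin["WW"]) / 2
    return d, h

d2, h2 = d_and_h(chromosome_II)
d3, h3 = d_and_h(chromosome_III)

# The two homozygous margins for chromosome II average to m + h3/2.
m = (chromosome_II["SS"] + chromosome_II["WW"]) / 2 - h3 / 2

print(round(d2, 4), round(h2, 4), round(d3, 4), round(h3, 4), round(m, 4))
```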
TABLE 25.
Direct estimation of genetic parameters

Chromosome                 Chromosome II
III            W/W           W/S           S/S           Mean       Expectation
W/W            17.5500 (2)   18.1125 (4)   19.2250 (2)   18.2500    $m + \tfrac{1}{2}h_2 - d_3$
W/S            18.0500 (4)   18.1125 (8)   18.7500 (4)   18.2563    $m + \tfrac{1}{2}h_2 + h_3$
S/S            18.8000 (2)   18.9500 (4)   19.3500 (2)   19.0125    $m + \tfrac{1}{2}h_2 + d_3$
Mean           18.1125       18.3219       19.0188       18.4438
Expectation    $m - d_2 + \tfrac{1}{2}h_3$   $m + h_2 + \tfrac{1}{2}h_3$   $m + d_2 + \tfrac{1}{2}h_3$   $m + \tfrac{1}{2}h_2 + \tfrac{1}{2}h_3$

m = 18.7532

               Chromosome II   Chromosome III   Mean      Diallel analysis
$h/d$          -0.5381         -0.9835          -0.7410   -0.5466
These values and those yielded similarly for $d_3$ and $h_3$ by the
right-hand marginal means are shown in the lower part of Table 25. It
will be observed that for both chromosomes the S homologue mediated a
higher chaeta number than W, and also that h is negative in both cases,
indicative that the W homologue is showing dominance over S for both
chromosomes II and III. Now $D = d_2^2 + d_3^2 = 0.4531^2 + 0.3813^2 = 0.3507$
which compares with the estimate D = 0.6167 obtained from the diallel
analysis, and similarly $H = h_2^2 + h_3^2 = 0.2001$ as compared with the
estimate H = 0.1842 from the diallel analysis. The agreement between the
two estimates of H is close, but that between the two estimates of D less
so. We should remember, however, that D and H are quadratic quantities
and hence will tend to magnify apparent discrepancies. In order to make a
comparison in linear quantities, let us note that the direct estimates of
$d_2$ and $d_3$ do not differ significantly and hence assume that they
are equal. Similarly $h_2$ and $h_3$ do not differ significantly and we
assume that they also are equal. We then replace $d_2$ and $d_3$ each by
their mean $\bar{d}$, and $h_2$ and $h_3$ similarly by $\bar{h}$. The
values for $\bar{d}$ and $\bar{h}$ obtained from the direct analysis are
shown in the column headed Mean in the lower part of Table 25. Turning to
the estimates from the diallel analysis, $D = 2\bar{d}^2$ and
$H = 2\bar{h}^2$. Then $\bar{d} = \sqrt{\tfrac{1}{2} \times 0.6167} = 0.5553$
and $\bar{h} = -\sqrt{\tfrac{1}{2} \times 0.1842} = -0.3035$, these
findings being entered in the column of the table headed Diallel
Analysis. That $\bar{h}$ from the diallel analysis must in fact have a
negative sign is shown, as already noted, by the order of the points on
the $W_r$, $V_r$ graph.
The agreement in respect of $\bar{h}$ is now strikingly good and that in
respect of $\bar{d}$ reasonably close. In fact, although it is not easy
to test the significance of the difference between the two estimates of
$\bar{d}$, it is unlikely to be significant. If we now estimate the
average level of dominance by taking $\bar{h}/\bar{d}$ we obtain -0.7416
from the entries in the mean column and -0.5466 from the diallel column.
The two analyses agree in showing dominance to be incomplete, lying
somewhere between half and three-quarters, and in the direction of low
chaeta number. Evidently the diallel analysis has produced estimates
which are compatible with those of the direct analysis, over and above it
showing that while dominance is present there is no evidence for
interaction of non-allelic genes.
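The conversion from the quadratic components D and H to the linear quantities compared above is short enough to sketch directly. The numbers are those quoted in the text; the negative sign given to the dominance term is taken from the order of the points on the graph, as the text explains, and the variable names are illustrative.

```python
from math import sqrt

# Direct estimates for the two chromosomes, as quoted in the text.
d2, d3 = 0.4531, 0.3813
D_direct = d2**2 + d3**2

# Diallel estimates of the components of variation.
D_diallel, H_diallel = 0.6167, 0.1842

# With two genes assumed equal in effect, D = 2*d_bar**2 and H = 2*h_bar**2.
d_bar = sqrt(D_diallel / 2)
h_bar = -sqrt(H_diallel / 2)   # sign assigned from the W_r, V_r ordering

print(round(D_direct, 4), round(d_bar, 4), round(h_bar, 4), round(h_bar / d_bar, 4))
```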
17. Undefined diallels
Just as a 4 × 4 diallel can be used to investigate two genetic differences,
in the way we have seen, an 8 × 8 could be designed using as parents all
the possible combinations of three genetic differences and used to
examine the behaviour of these genetic differences and to test whether
they showed non-allelic interaction. We could go on to a 16 × 16 to look at
four genetic differences in the same way, and so on. Where, however,
the genotypes of the parents, and hence of the progenies are defined
and known, as in the case we have described, the approach through
direct analysis is always open and will in general lead to more informa-
tive results since the d's and h's are then estimated individually and not
pooled in D and H. The value of applying the diallel analysis to the ex-
periment discussed in the last section, was in fact that it allowed us to
compare its results with those of direct analysis and see that it did effec-
tively extract the same information.
With the vast majority of diallels, direct analysis is not possible because
it is rare for the parental genotypes to be defined as they were in the
Drosophila experiment. Where the differences among the parental
genotypes are undefined, diallel analysis must be used and two further
complications must immediately be taken into account. In the first place we
cannot know that the two alleles (assuming that there are only two) of
any gene are equally common among the parents, other than in excep-
tional cases like the diallel referred to by Jinks et al. (1969) in which the
20 parental lines were descended by selfing from 20 individuals in an F2
of Nicotiana rustica and hence might be expected to have equal fre-
quencies for the alleles at any locus, within the limits of sampling vari-
ation.
Secondly, we cannot be sure either that the pairs of alleles at differ-
ent loci are distributed at random with respect to each other in the way
that can be ensured in a defined diallel. Clearly we must take the possi-
bility of such association of the genes into account in carrying out the
analysis and interpreting its results.
Let us look into the consequences of these complications, starting
with that of unequal gene frequencies. Consider the case where a
proportion $u_a$ of the parent lines are true-breeding for allele A and
proportion $v_a (= 1 - u_a)$ are true-breeding for allele a. The mating
AA × AA will then occur in $u_a^2$ of cases and that of aa with aa in
$v_a^2$ of cases, the remaining $2u_a v_a$ of matings being AA × aa. The
frequencies of the types of matings, together with the genotypes and
phenotypes in respect of this gene difference are shown in Table 26. The
array means, variances and covariances are also shown in the table. Just
one point needs noting about their derivation. The mating AA × AA, for
example, constitutes $u_a \times u_a = u_a^2$ of all matings in the
table, but it constitutes $u_a$ of the matings in the arrays stemming
from AA parents. Thus the mean of the AA array is $u_a d_a + v_a h_a$,
not $u_a^2 d_a + v_a^2 h_a$. Bearing the same point in mind, the variance
of that same array is found as
$$V_r = u_a d_a^2 + v_a h_a^2 - (u_a d_a + v_a h_a)^2 = u_a v_a (d_a - h_a)^2$$
and the covariance is
$$W_r = u_a d_a \cdot d_a - v_a d_a \cdot h_a - (u_a - v_a) d_a (u_a d_a + v_a h_a) = 2u_a v_a d_a (d_a - h_a).$$
The mean, $V_r$ and $W_r$ of the aa array are found similarly.
We can then see from the table that the changes in $V_r$ and $W_r$
between the arrays are respectively
$$\Delta V_r = 4u_a v_a d_a h_a \quad \text{and} \quad \Delta W_r = 4u_a v_a d_a h_a.$$
TABLE 26.

                                       Female parent
                         Genotype      AA             aa             Mean
                         Frequency     $u$            $v$
                         Expression    $d$            $-d$           $(u-v)d$
Male parent
Genotype  Frequency  Expression
AA        $u$        $d$               AA             Aa
                                       $u^2$          $uv$
                                       $d$            $h$            $ud + vh$
aa        $v$        $-d$              Aa             aa
                                       $uv$           $v^2$
                                       $h$            $-d$           $uh - vd$
Mean                                   $ud + vh$      $uh - vd$      $(u-v)d + 2uvh$
$V_r$                                  $uv(d-h)^2$    $uv(d+h)^2$    $uv[d + (v-u)h]^2 + 4u^2v^2h^2 = \tfrac{1}{4}(D_R + H_R)$
$W_r$                                  $2uvd(d-h)$    $2uvd(d+h)$    $2uvd[d + (v-u)h] = \tfrac{1}{2}D_W$

$V_P = 4uvd^2 = D_P$
$\Delta W_r = 4uvdh$, $\Delta V_r = 4uvdh$
ference to two important properties of $W_r$ and $V_r$. First the arrays
will have the same $V_r$ and $W_r$ in the absence of dominance, i.e. when
$h_a = 0$ the arrays will all give the same point, within the limits of
sampling variation, on the $W_r$, $V_r$ graph. Secondly, where
$h_a \neq 0$ the line joining the points from the two arrays on the
$W_r$, $V_r$ graph will have a slope of 1. It will be observed that if
$u_a = v_a$, i.e. if the frequencies of A and a are equal, all these
expressions reduce to those found for the simple case discussed at the
beginning of the Chapter, as indeed they clearly should.
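The single-gene expressions for unequal allele frequencies can be verified by computing each array's variance and covariance directly from the mating frequencies. The sketch below uses exact fractions and arbitrary illustrative values for u, d and h; the function name is an assumption made for the example.

```python
from fractions import Fraction as Fr

def array_stats(u, d, h, recurrent):
    """V_r and W_r for the array whose common parent is AA (+1) or aa (-1)."""
    v = 1 - u
    # Non-recurrent parents: (frequency within the array, parental phenotype).
    parents = [(u, d), (v, -d)]
    # Progeny phenotypes obtained from the AA and aa non-recurrent parents.
    progeny = [d, h] if recurrent == 1 else [h, -d]
    mean_parent = sum(f * x for f, x in parents)
    mean_progeny = sum(f * y for (f, _), y in zip(parents, progeny))
    V = sum(f * y * y for (f, _), y in zip(parents, progeny)) - mean_progeny**2
    W = sum(f * x * y for (f, x), y in zip(parents, progeny)) - mean_parent * mean_progeny
    return V, W

u, d, h = Fr(3, 10), Fr(2), Fr(1)   # illustrative values only
v = 1 - u
V_AA, W_AA = array_stats(u, d, h, 1)
V_aa, W_aa = array_stats(u, d, h, -1)

print(V_AA, W_AA, V_aa, W_aa)
print(V_aa - V_AA, W_aa - W_AA)     # both equal 4uvdh, so the slope is 1
```

Because both differences equal 4uvdh, the two points lie on a line of unit slope whenever h is not zero, whatever the allele frequencies.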
In extending our consideration to two genetic differences, we note
that where the frequencies of A and a are $u_a$ and $v_a$ respectively
among the true-breeding parents, and the frequencies of B and b are
similarly $u_b$ and $v_b$, the alleles at the two loci will be
distributed independently of each other if the frequencies of AB, Ab, aB
and ab parents are $u_a u_b$, $u_a v_b$, $v_a u_b$ and $v_a v_b$
respectively. Given that this is the case, and assuming that the effects
of non-allelic genes are simply additive, that is that there is no
non-allelic interaction, it is not difficult to derive the expressions
for the array means, variances and covariances shown in Table 27. These
expressions reduce of course to those in Table 17 when
$u_a = v_a = u_b = v_b = \tfrac{1}{2}$. It will be seen that for any pair
of arrays $\Delta V_r = \Delta W_r$. Thus for example the differences
between arrays aabb and AABB are
$\Delta V_r = \Delta W_r = 4(u_a v_a d_a h_a + u_b v_b d_b h_b)$ while
those between arrays aaBB and AAbb are
$\Delta V_r = \Delta W_r = 4(u_a v_a d_a h_a - u_b v_b d_b h_b)$. Thus in
plotting their $W_r$ against $V_r$ the four arrays will again give four
points lying in a straight line of slope 1, and also again the array with
the two dominant alleles will have the lowest values of $W_r$ and $V_r$,
and so will give the lowest point on the graph while the array with the
two recessive alleles will give the highest point with the other two
arrays giving intermediate points. Thus the test of adequacy of the
additive-dominance model developed for the defined diallel in the
previous section will apply to undefined diallels. We should note,
however, that an undefined diallel will reveal failure of the model not
only when the genes show non-allelic interaction, i.e. are not
independent in their action, but also when the genes show non-random
association among the parents, i.e. are non-independent in their
distribution. Finally, it is not difficult to see that these relations
between $W_r$ and $V_r$, and with them the test of goodness of fit of the
additive-dominance model, still hold good for three, four or indeed any
number of gene differences. They are in fact general properties of
diallel sets of matings.
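The unit-slope property for several genes can be illustrated numerically. The following sketch enumerates the arrays of a two-gene undefined diallel under the stated assumptions (independent distribution of the genes among the parents, additive action between loci) and checks that the difference between covariance and variance is the same for every array. Gene frequencies and effects are arbitrary illustrative values, and the function name is hypothetical.

```python
from fractions import Fraction as Fr
from itertools import product

def array_point(recurrent, genes):
    """(V_r, W_r) of one array; genes is a list of (u, d, h) per locus.

    Each parent is coded +1 (increasing allele) or -1 at every locus.
    """
    rows = []
    for other in product((1, -1), repeat=len(genes)):
        freq, parent, progeny = Fr(1), Fr(0), Fr(0)
        for x_rec, x_oth, (u, d, h) in zip(recurrent, other, genes):
            freq *= u if x_oth == 1 else 1 - u
            parent += d * x_oth
            # Like parents give a homozygote, unlike parents a heterozygote.
            progeny += d * x_rec if x_rec == x_oth else h
        rows.append((freq, parent, progeny))
    mean_p = sum(f * p for f, p, _ in rows)
    mean_y = sum(f * y for f, _, y in rows)
    V = sum(f * y * y for f, _, y in rows) - mean_y**2
    W = sum(f * p * y for f, p, y in rows) - mean_p * mean_y
    return V, W

# (u, d, h) for A-a and B-b: illustrative values, increasing alleles dominant.
genes = [(Fr(3, 10), Fr(2), Fr(1)), (Fr(3, 5), Fr(1), Fr(1, 2))]
points = {rec: array_point(rec, genes) for rec in product((1, -1), repeat=2)}

for rec, (V, W) in points.items():
    print(rec, V, W, W - V)
```

With these values the AABB array gives the lowest point and the aabb array the highest, and W_r - V_r is constant, so the four points fall on a line of slope 1.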
So far nothing has been said about the genetical components of
variation D and H, and indeed when we turn to these we find complexities
which were not present in the case of the defined diallel. Turning back
to the case of the single gene difference in an undefined diallel (Table 26)
we find that the contribution this pair of alleles makes to $V_P$, the
variance of the parents, is no longer $d_a^2$, but takes the more general
form $4u_a v_a d_a^2$, which of course becomes $d_a^2$ when the alleles
are equally frequent among the parents of the diallel, i.e.
$u_a = v_a = \tfrac{1}{2}$. With two genes independent in their actions
and their distribution $V_P = 4u_a v_a d_a^2 + 4u_b v_b d_b^2$ and with
any number of genes $V_P = S(4uvd^2)$. We may thus write $V_P = D_P$
where $D_P = S(4uvd^2)$.
When we turn to array variances, however, while the contributions of
A-a to $\bar{V}_r$ may still be written as the sum of two quadratic
quantities, one of which depends solely on $h^2$, the other no longer
depends solely on $d^2$. The contribution to $\bar{V}_r$ is in fact
$u_a v_a [d_a + (v_a - u_a)h_a]^2 + 4u_a^2 v_a^2 h_a^2$ and generalizing
to any number of genes independent in their actions and their
distribution the genetical component of
$\bar{V}_r = S\{uv[d + (v - u)h]^2 + 4u^2 v^2 h^2\}$. This can be cast in
the form $\bar{V}_r = \tfrac{1}{4}(D_R + H_R)$
TABLE 27.
Array frequencies, means, variances and covariances for two gene differences,
A-a with frequencies $u_a$ and $v_a$, and B-b with frequencies $u_b$ and $v_b$

Array        AABB         AAbb         aaBB         aabb         Overall
Frequency    $u_a u_b$    $u_a v_b$    $v_a u_b$    $v_a v_b$
they resembled the plants from the reciprocal crosses started off in a
different seed pan. This would produce the result observed and later
experiments in fact pointed to it as the most likely cause. Whatever the
explanation, however, it is clear that the duplicate error variance is not
a reliable yardstick to use in assessing the significance of the items in the
analysis of variance. When tested against the reciprocal mean square,
8.058, the probabilities of the variances for female and male parents are
still very small and even that for interaction still has a probability as low
as 0.001. Thus even when tested against this new and higher estimate of
error, all the items are still significant.
It should be noted that this test of significance is not strictly valid,
since the 36 df for reciprocal differences are not orthogonal to the 3
items, female parents, male parents and interaction, contained in the
80 df that these 3 items jointly comprise. Since, however, the recipro-
cals mean square is lower than any of the other 3, deduction of the
36 df from the 80 could only serve to raise the mean square attaching
to the residual df's and so raise the VR and hence the significance.
Although our test is not strictly valid, it is a conservative test and we
can therefore accept the significance that it reveals for all 3 items in
the main analysis of variance. The Hayman analysis of variance of these
data described by M and J overcomes this difficulty and confirms the
conclusions from this simple analysis.
The significant interaction item in the analysis of variance shows us
that there is non-additive heritable variation, and we must now con-
tinue the analysis to discover whether this non-additive element can be
accounted for by dominance or whether non-independence of the effects
of non-allelic genes is also involved. Proceeding just as we did in the
earlier example, the values of $W_r + V_r$ and $W_r - V_r$, taken from
M and J, are listed for each of the nine arrays from each of the two
blocks in the upper part of Table 30, with their analyses of variance in
the lower part of the table. As in the Drosophila example, we have not
taken out the single degree of freedom for the block difference because
our analysis of the original data again shows no evidence of such a
difference. It is clear that $W_r + V_r$ varies significantly from array
to array whereas $W_r - V_r$ does not. There is therefore clear evidence
of dominance but no evidence of non-independence in effect of non-allelic
genes. This means that not only is there no evidence of interaction
between non-allelic genes in producing their effects, but also that there
is no evidence of the genes being associated in a non-random way in their
distributions between the parents.

TABLE 30.
$W_r + V_r$ and $W_r - V_r$ from the two blocks, 1 and 2

             $W_r + V_r$            $W_r - V_r$
Array      Block 1    Block 2     Block 1    Block 2
1           81.939     61.763      13.728      3.419
2           25.105     14.731      11.650      6.141
3          159.529    112.761       9.055      2.106
4           80.289     41.610       7.453      9.125
5           29.814     19.128      14.277      4.919
6           51.130     36.211      21.844     15.149
7           91.263     72.303      20.668     16.212
8           55.178     55.213      19.008     16.410
9           29.152     15.093      15.535      4.691

Analyses of variance
                           $W_r + V_r$                  $W_r - V_r$
                   df    MS        VR      P          MS       VR      P
Between arrays      8    2736.0    9.67    <0.001     50.34    1.96    0.20-0.10
Within arrays       9     282.9                       25.95
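The analysis of variance of $W_r + V_r$ in Table 30 can be reproduced from the eighteen tabulated values. This is a sketch of an ordinary one-way analysis with two values (blocks) per array and no block item removed, as the text describes; the function name is illustrative.

```python
# W_r + V_r for arrays 1 to 9, blocks 1 and 2 (Table 30).
sums = [
    (81.939, 61.763), (25.105, 14.731), (159.529, 112.761),
    (80.289, 41.610), (29.814, 19.128), (51.130, 36.211),
    (91.263, 72.303), (55.178, 55.213), (29.152, 15.093),
]

def between_within(pairs):
    """Mean squares between and within arrays, and their variance ratio."""
    n = len(pairs)
    grand_total = sum(a + b for a, b in pairs)
    ss_between = sum((a + b) ** 2 for a, b in pairs) / 2 - grand_total**2 / (2 * n)
    ss_within = sum((a - b) ** 2 for a, b in pairs) / 2
    ms_between = ss_between / (n - 1)   # 8 df
    ms_within = ss_within / n           # 9 df, block difference not removed
    return ms_between, ms_within, ms_between / ms_within

ms_between, ms_within, variance_ratio = between_within(sums)
print(round(ms_between, 1), round(ms_within, 1), round(variance_ratio, 2))
```

The same function can be applied to the $W_r - V_r$ columns; small differences from the tabulated mean squares are to be expected there because the table gives only three decimal places.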
We can move on to the regression of $W_r$ on $V_r$. The arrays pooled
over blocks and reciprocals are set out in Table 31. The values of $W_r$
and $V_r$ are also shown for each array. The SS for $W_r$ is 2680.75 and
for $V_r$ is 2892.76, while the SCP for $W_r$ and $V_r$ is 2690.30. The
linear regression of $W_r$ on $V_r$ is thus
b = 2690.30/2892.76 = 0.9300 which does not differ signifi-
TABLE 31.
Half-diallel table pooled over the two blocks, with array means, $V_r$ and $W_r$

                                    Arrays
       1      2      3      4      5      6      7      8      9      Mean     $V_r$     $W_r$
1    38.90  25.30  37.10  35.45  25.80  29.10  36.30  30.30  25.30   31.506   30.1665   38.9886
2           27.05  25.80  23.30  22.35  25.15  24.05  19.95  21.75   23.856    5.0059   13.4255
3                  48.80  30.38  25.50  30.90  38.95  26.10  25.45   32.108   64.8375   70.3335
4                         34.10  24.45  30.63  30.55  22.20  24.50   28.394   23.8661   33.6539
5                                26.60  25.83  28.65  19.70  23.35   24.692    6.8419   16.0434
6                                       27.00  28.75  20.60  23.05   26.778   12.0930   30.3851
7                                              37.00  23.55  28.25   30.672   31.0601   49.6899
8                                                     15.30  20.40   22.011   18.3805   35.8031
9                                                            25.40   24.161    5.3974   15.8965
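The sums of squares, the sum of cross products and the regression coefficient quoted in the text can be recomputed from the $V_r$ and $W_r$ columns of Table 31. The sketch below uses only the standard library; the variable names are illustrative.

```python
# V_r and W_r for arrays 1 to 9 (Table 31).
V = [30.1665, 5.0059, 64.8375, 23.8661, 6.8419, 12.0930, 31.0601, 18.3805, 5.3974]
W = [38.9886, 13.4255, 70.3335, 33.6539, 16.0434, 30.3851, 49.6899, 35.8031, 15.8965]

n = len(V)
mean_V, mean_W = sum(V) / n, sum(W) / n

# Corrected sums of squares and sum of cross products.
ss_V = sum((v - mean_V) ** 2 for v in V)
ss_W = sum((w - mean_W) ** 2 for w in W)
scp = sum((v - mean_V) * (w - mean_W) for v, w in zip(V, W))

b = scp / ss_V   # regression of W_r on V_r
print(round(ss_W, 2), round(ss_V, 2), round(scp, 2), round(b, 4))
```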
Fig. 9. The $W_r$/$V_r$ graph for flowering time in the undefined diallel
among nine lines of Nicotiana rustica. The parental line giving rise to
the array represented by each point is indicated by the number against it.

Fig. 10. $W_r + V_r$ from each array of the Nicotiana rustica diallel
plotted against P, the mean flowering time (expressed in days after 1st
July) of the parental line giving rise to that array. Note that all the
points lie on a straight regression line except that from parental line
8. With the exception of line 8, the earlier the flowering of the parent,
the smaller the corresponding $W_r + V_r$, showing that in general the
alleles for earlier flowering are dominant to those for later flowering.
The position of point 8, however, indicates that this dominance relation
no longer holds when the parent's flowering time is earlier than mid-July.
TABLE 32.
Components of variation for flowering-time in Nicotiana rustica

TABLE 33.
Phenotypes from the nine genotypes comprising all combinations
of A-a and B-b

       AA                       Aa                       aa
BB     $d_a + d_b + i_{ab}$     $h_a + d_b + j_{ba}$     $-d_a + d_b - i_{ab}$
Bb     $d_a + h_b + j_{ab}$     $h_a + h_b + l_{ab}$     $-d_a + h_b - j_{ab}$
bb     $d_a - d_b - i_{ab}$     $h_a - d_b - j_{ba}$     $-d_a - d_b + i_{ab}$

$$d_a = d_b = h_a = h_b = i = j_a = j_b = l.$$
Now in our usage, the designation of the commoner phenotypes as
being produced by AABB, AaBB, etc. implies that this commoner phenotype
is the one with the greater expression of the character. Clearly there
could then be a counterpart situation where the phenotype with the
lesser expression would constitute $\tfrac{9}{16}$ of the $F_2$, and that
with the greater expression only $\tfrac{7}{16}$. This would arise where
the phenotypes of aabb, Aabb, aaBb and AaBb were alike on the one hand
and those of aaBB, AaBB, AAbb, AABb and AABB were alike on the other. The
equations then become
$$-d_a - d_b + i = h_a - d_b - j_b = -d_a + h_b - j_a = h_a + h_b + l$$
and
$$-d_a + d_b - i = h_a + d_b + j_b = d_a - d_b - i = d_a + h_b + j_a = d_a + d_b + i.$$
These equations are satisfied when
$d_a = d_b = -h_a = -h_b = -i = j_a = j_b = -l$. Thus the general
conditions for classical complementary action are that all eight
parameters are equal in size, with the two j's positive like the d's, and
i and l having the same sign as the two h's, which themselves are of the
same sign.
The second classical interaction we will consider is that of so-called
duplicate genes, which give a 15:1 ratio in $F_2$, aabb being the only
genotype to give a unique phenotype where the commoner phenotype has the
greater expression of the character and AABB being the genotype with
the unique phenotype where the commoner class has the lesser expression
of the character. In the former case AABB, AABb, AaBB, AaBb, aaBB,
aaBb, AAbb and Aabb must have the same phenotype from which it
follows that
$$d_a + d_b + i = d_a + h_b + j_a = h_a + d_b + j_b = h_a + h_b + l$$
$$= -d_a + d_b - i = -d_a + h_b - j_a = d_a - d_b - i = h_a - d_b - j_b.$$
These equations are satisfied if
$d_a = d_b = h_a = h_b = -i = -j_a = -j_b = -l$.
The counterpart situation where AABB is unique and aabb, aaBb, Aabb,
AaBb, AAbb, AABb, aaBB and AaBB are alike arises where
$$d_a = d_b = -h_a = -h_b = i = -j_a = -j_b = l.$$
So we see that duplicate interaction arises when all the parameters have
the same magnitude, and the two j's are negative while i and l have the
opposite sign to the h's. To abbreviate, the condition for complementary
action is that
$$d_a = d_b = \pm h_a = \pm h_b = \pm i = j_a = j_b = \pm l$$
while the condition for duplicate interaction is similarly
$$d_a = d_b = \pm h_a = \pm h_b = \mp i = -j_a = -j_b = \mp l.$$
The value of this approach is that we can now generalize the notion
of complementary and duplicate action. For example if we write
$$\theta(d_a = d_b = \pm h_a = \pm h_b) = \pm i = j_a = j_b = \pm l$$
we have no interaction when i = j = l = 0, i.e. $\theta = 0$, full
complementary interaction when $\theta = 1$, partial complementary
interaction when the interaction parameters are all equal but less than
the d's and h's, i.e. $0 < \theta < 1$, and over or super-complementary
interaction when $\theta > 1$. Furthermore when $\theta = -1$, we have
full duplicate interaction, when $0 > \theta > -1$ partial duplicate
interaction, and when $-1 > \theta$ over or super-duplicate interaction.
We shall see later how this generalization can be put to use. Other more
complicated generalizations about interaction are, of course, also
possible although none have yet been developed for use in practice.
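The link between the parameter conditions and the classical ratios can be checked by enumerating the nine genotypes of an $F_2$. The sketch below takes the upper signs of the generalized condition (h positive and equal to d, with i, both j's and l all equal to theta times d) and assumes two unlinked genes; the function names and the genotype coding are illustrative.

```python
from collections import Counter
from fractions import Fraction as Fr
from itertools import product

def phenotype(xa, xb, d=1, h=1, theta=0):
    """Phenotype on the model of Table 33; x = 1 (AA), 0 (Aa), -1 (aa)."""
    i = j = l = theta * d
    main = (d * xa if xa else h) + (d * xb if xb else h)
    if xa and xb:
        interaction = i * xa * xb   # both loci homozygous
    elif xa:
        interaction = j * xa        # A homozygous, B heterozygous
    elif xb:
        interaction = j * xb        # A heterozygous, B homozygous
    else:
        interaction = l             # double heterozygote
    return main + interaction

def f2_classes(theta):
    """Frequency of each distinct phenotype in an F2 of two unlinked genes."""
    freq = {1: Fr(1, 4), 0: Fr(1, 2), -1: Fr(1, 4)}
    classes = Counter()
    for xa, xb in product(freq, repeat=2):
        classes[phenotype(xa, xb, theta=theta)] += freq[xa] * freq[xb]
    return dict(classes)

print(f2_classes(1))    # full complementary interaction
print(f2_classes(-1))   # full duplicate interaction
```

With theta = 1 the nine genotypes collapse into two classes in the ratio 9:7, and with theta = -1 into two classes in the ratio 15:1, in line with the conditions derived above.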
One last point remains to be made about the classical interactions. An
$F_2$ giving a 9:3:3:1 ratio was regarded classically as showing no
interaction. In point of fact a 9:3:3:1 or one of its simple derivatives
is obtained whenever $d_a = \pm h_a$, $d_b = \pm h_b$ and
$\pm i = j_a = j_b = \pm l$. Thus the ratio does not necessarily indicate
an absence of interaction in our sense, but again implies its own
limitations in the relations among the interaction parameters.
tion, being positive when, for example, the two h's and l yielded by two
genes are in the same direction, and negative when l is in the opposite
direction to the h's. With i and j interaction, however, not only does the
direction of the interaction itself enter in, but also whether the two
genes in question are associated or dispersed in the parents, as indeed we
can see from Table 34. The i yielded by two genes will be in one
direction when the genes are associated but in the other when they are
dispersed, whereas if they are intrinsically in the same direction the
two j's will reinforce one another when the genes are associated but will
tend to cancel one another out when the genes are dispersed. The
algebraic relations of i and j to the proportions of the k genes which
are associated and dispersed are somewhat complex (see M and J) and need
not be detailed here. It is sufficient for us to note that neither [i]
nor [j] need be 0 in a given cross even where [d] = 0 as a result of
partial dispersion of the genes. As with [d] and [h], however, [i] = 0
does not necessarily imply that all the individual i's are 0, although
$[i] \neq 0$ must imply that at least some of the i's are not 0. The same
is of course true of [j] and [l].
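We can see from Table 34, but using the generalized forms for [d], [h]
and their interactions, which take into account the effects of association
and dispersion as well as the direction of the interaction
$$\bar{P}_1 = m + [d] + [i]$$
$$\bar{P}_2 = m - [d] + [i]$$
$$\bar{F}_1 = m + [h] + [l]$$
$$\bar{F}_2 = m + \tfrac{1}{2}[h] + \tfrac{1}{4}[l]$$
$$\bar{B}_1 = m + \tfrac{1}{2}[d] + \tfrac{1}{2}[h] + \tfrac{1}{4}[i] + \tfrac{1}{4}[j] + \tfrac{1}{4}[l]$$
$$\bar{B}_2 = m - \tfrac{1}{2}[d] + \tfrac{1}{2}[h] + \tfrac{1}{4}[i] - \tfrac{1}{4}[j] + \tfrac{1}{4}[l].$$
Six parameters are involved in these expressions and six means are
available for their estimation. We can therefore arrive at perfect fit
estimates of the six parameters, thus
$$m = \tfrac{1}{2}\bar{P}_1 + \tfrac{1}{2}\bar{P}_2 + 4\bar{F}_2 - 2\bar{B}_1 - 2\bar{B}_2$$
$$[d] = \tfrac{1}{2}\bar{P}_1 - \tfrac{1}{2}\bar{P}_2$$
$$[h] = 6\bar{B}_1 + 6\bar{B}_2 - 8\bar{F}_2 - \bar{F}_1 - 1\tfrac{1}{2}\bar{P}_1 - 1\tfrac{1}{2}\bar{P}_2$$
$$[i] = 2\bar{B}_1 + 2\bar{B}_2 - 4\bar{F}_2$$
$$[j] = 2\bar{B}_1 - \bar{P}_1 - 2\bar{B}_2 + \bar{P}_2$$
$$[l] = \bar{P}_1 + \bar{P}_2 + 2\bar{F}_1 + 4\bar{F}_2 - 4\bar{B}_1 - 4\bar{B}_2.$$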
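That the six estimation formulae are the exact inverse of the six expectations can be confirmed by a round trip: generate the generation means from chosen parameter values and recover the parameters. The sketch below uses exact fractions; the parameter values are arbitrary illustrations and the function names are hypothetical.

```python
from fractions import Fraction

def generation_means(m, d, h, i, j, l):
    """Expected means of P1, P2, F1, F2, B1 and B2 with digenic interaction."""
    return {
        "P1": m + d + i,
        "P2": m - d + i,
        "F1": m + h + l,
        "F2": m + h / 2 + l / 4,
        "B1": m + d / 2 + h / 2 + i / 4 + j / 4 + l / 4,
        "B2": m - d / 2 + h / 2 + i / 4 - j / 4 + l / 4,
    }

def perfect_fit(g):
    """Perfect fit estimates of the six parameters from the six means."""
    P1, P2, F1, F2, B1, B2 = (g[k] for k in ("P1", "P2", "F1", "F2", "B1", "B2"))
    return {
        "m": P1 / 2 + P2 / 2 + 4 * F2 - 2 * B1 - 2 * B2,
        "d": P1 / 2 - P2 / 2,
        "h": 6 * B1 + 6 * B2 - 8 * F2 - F1 - 3 * P1 / 2 - 3 * P2 / 2,
        "i": 2 * B1 + 2 * B2 - 4 * F2,
        "j": 2 * B1 - P1 - 2 * B2 + P2,
        "l": P1 + P2 + 2 * F1 + 4 * F2 - 4 * B1 - 4 * B2,
    }

# Arbitrary illustrative parameter values.
true = dict(m=Fraction(50), d=Fraction(7), h=Fraction(-3),
            i=Fraction(2), j=Fraction(1, 2), l=Fraction(5))
estimates = perfect_fit(generation_means(**true))
print(estimates == true)
```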
The standard errors of these estimates can be found in the usual way.
Thus, for example,
$$t = [d]/s_{[d]}.$$
Finding [i], [j] or [l] significant in such tests is obviously equivalent to
finding significant deviations from zero in the scaling tests; but it has the
additional advantage of yielding estimates of the parameters and therefore
of identifying the type or types of interactions responsible for the
departure from the simple additive-dominance situation. We should note
that the 3 degrees of freedom, from which is derived the $\chi^2_{(3)}$
testing the goodness of fit of the model in the joint scaling test
described on pages 37-40, are now being used for estimating the three
interaction parameters. No test of goodness of fit is therefore possible
of the new model incorporating the three types of digenic interaction:
indeed as we have seen it is a perfect fit estimation. More generations
such as $F_3$ or second backcrosses must be included if sufficient
equations are to be available to provide a test of goodness of fit. If in
such a case the model involving digenic interactions proves to be
inadequate to account for the results, we should have to consider the
possibility of trigenic interaction or some other further complicating
factor but this is beyond the scope of our present treatment.
We may illustrate the procedure of estimation in a simple case by ref-
erence once more to data from the cross between varieties 72 and 22 of
Nicotiana rustica for plant height six weeks after planting in the field
which was analysed in Chapter 3. The C scaling test and the joint scaling
test when applied to these data were highly significant (Table 8). The
simple additive-dominance model is clearly inadequate. Furthermore,
attempts to find an alternative scale on which this model would be ad-
equate failed. If we wish to analyse these data further we must, there-
fore, allow for the presence of non-allelic interaction (or epistasis as it is
sometimes called) in any model we attempt to fit.
Using the perfect fit formulae we can estimate the three interaction
components, [i], [j] and [l] in addition to m, [d] and [h]. As we have
already seen
$$[d] = \tfrac{1}{2}\bar{P}_1 - \tfrac{1}{2}\bar{P}_2.$$
On substituting the appropriate family means from Table 8, this gives
$$[d] = \tfrac{1}{2}(80.40 - 65.47) = 7.46.$$
Similarly,
$$s_{[d]} = \sqrt{V_{[d]}} = \sqrt{\tfrac{1}{4}(1.936)^2 + \tfrac{1}{4}(1.726)^2} = \sqrt{1.680} = \pm 1.296.$$
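The estimate of [d], its standard error and the corresponding t can be sketched as below. The two parental means and the figures 1.936 and 1.726 are those quoted in the text; treating the latter as the standard errors of the two parental means is an inference from the form of the variance expression, and the variable names are illustrative.

```python
from math import sqrt

# Parental means from Table 8, as quoted in the text.
P1_mean, P2_mean = 80.40, 65.47
# Assumed here to be the standard errors of those two means.
se_P1, se_P2 = 1.936, 1.726

d_est = (P1_mean - P2_mean) / 2

# [d] is a linear function with coefficients 1/2 and -1/2, so its variance
# is one quarter of the sum of the variances of the two means.
var_d = (se_P1**2 + se_P2**2) / 4
se_d = sqrt(var_d)

t = d_est / se_d
print(round(d_est, 3), round(se_d, 3), round(t, 2))
```

Working from the rounded standard errors gives a variance very slightly different in the third decimal place from the 1.680 of the text, which was presumably computed from unrounded figures.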
[j], does not differ significantly from zero it would appear that a model
in which it was omitted would be adequate for these data.
Fitting a five parameter model by omitting [j] would allow us to test
the goodness of fit of the model by means of a $\chi^2$ with one df, and
at the same time improve the precision with which the remaining
parameters were estimated. Estimating the five components of this model
proceeds exactly as for the simple additive-dominance model in the joint
scaling test (Chapter 3, Section 9). It leads to the estimates on the
right-hand side of Table 35. As expected the five parameter model is
adequate, the $\chi^2_{(1)}$ testing its goodness of fit being
non-significant. There is also a marginal improvement in the precision
with which we have estimated the five components, as shown by their lower
standard errors.
Since the model is adequate we can conclude that trigenic interactions
and similar complex factors are not making a significant contribution to
the differences among the generation means. We can interpret the data,
therefore, in terms of the additive, dominance and digenic non-allelic
interaction components of the gene action. The h increments of the
majority of individual loci must be negative while the l increments of
the majority of pairs of loci must be positive. The non-allelic interaction
is, therefore, mainly of the duplicate kind.
Before leaving the effects of non-allelic interaction on means we must
note the contribution it can make to heterosis. Heterosis will be observed
when $\bar{F}_1 > \bar{P}_1$, where $\bar{P}_1$ is taken as the parent
with the greater expression of the character. As we have seen earlier, in
the absence of interaction $\bar{F}_1 > \bar{P}_1$ requires that
[h] > [d], and this in turn requires that one or both of two conditions
be satisfied, namely
(i) h > d for some or all of the genes; that is there must be over-
dominance at some or all loci.
(ii) [d] < Sd; that is there must be dispersion of the genes between
the parents, the value of [d] being thus reduced by the balancing effects
of the genes of opposite effect in each parent, whereby [h] may exceed
[d] although each h is no larger and may even be smaller than its
corresponding d.
These two conditions cannot be distinguished from means alone,
although second degree statistics allow the distinction to be made. At
the same time it is a distinction of great practical importance, since
wherever heterosis depends on overdominance the maximum expression
of the character, for example yield in a crop plant, can be achieved only
by a hybrid breeding programme producing $F_1$'s for commercial use.
Where, however, heterosis is due to dispersion of the genes, it is in
principle always possible to produce a true breeding line expressing the
character to at least as high a degree as the $F_1$, although of course
this may involve the breakage of linkages between the dispersed genes.
Now where digenic interaction is displayed the requirement for
$\bar{F}_1 > \bar{P}_1$ becomes [h] + [l] > [d] + [i]. This relation
clearly offers a number of possibilities for the production of heterosis.
Two effects, reinforcing the relations by which heterosis may arise in
the absence of interaction, are however of special importance, namely
(i) That the h's and their associated l's are entirely or at any rate
preponderantly of the same sign, which of course is a feature of
complementary gene action.
(ii) Dispersion of the interacting genes between the parents, so that
although, as is required by complementary interaction, the sign of the
individual i's is the same as that of the h's, [i] will take a negative sign in
the parents.
The first relation will raise the value of [h] + [l], the expression of
the character in $F_1$. The second will limit the increase in value of
[d] + [i], and may even diminish it relative to [d].
Thus complementary interaction can increase the expression of
heterosis whether it be due to over-dominance or gene dispersion. It is
thus not surprising that wherever the data permit the analysis to be
made, non-allelic interaction, presumably of the complementary type, has
been found to be a common accompaniment of heterosis. These effects
of digenic interaction on heterosis are illustrated in Fig. 11.
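The relation [h] + [l] > [d] + [i] can be explored numerically under simplifying assumptions. The sketch below is not the authors' derivation: it assumes k genes of equal effect with every d equal to every h, every pairwise i and l equal to theta times d, and it takes each pair of genes to contribute +i to [i] when associated in the parents and -i when dispersed. The function name and its arguments are hypothetical.

```python
from math import comb

def heterosis(k1, k2, theta, d=1.0, h=1.0):
    """F1 mean minus P1 mean, with k1 increasing alleles in P1 and k2 in P2.

    Assumes equal d's and h's and pairwise i = l = theta * d (an illustrative
    simplification; the j terms do not enter the parental or F1 means).
    """
    k = k1 + k2
    i = l = theta * d
    sum_d = (k1 - k2) * d               # [d], reduced by dispersion
    sum_h = k * h                       # [h]
    sum_l = comb(k, 2) * l              # [l], one l for every pair of genes
    # Associated pairs contribute +i, dispersed pairs contribute -i.
    sum_i = (comb(k1, 2) + comb(k2, 2) - k1 * k2) * i
    return (sum_h + sum_l) - (sum_d + sum_i)

for k1, k2 in [(1, 1), (2, 2), (4, 4), (6, 2)]:
    print((k1, k2), [heterosis(k1, k2, t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)])
```

Under these assumptions positive theta (complementary interaction) increases the heterosis in every case, while sufficiently strong duplicate interaction reverses its sign except for two gene pairs within the range of theta from -1 to 1.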
[Figure 11: heterosis plotted against $\theta$, running from duplicate to complementary interaction.]
Fig. 11. Heterosis, measured by the excess of the F1 mean over that of the
better parent ($\bar{F}_1 - \bar{P}_1$), in relation to non-allelic interaction, measured by $\theta$.
Solid lines show the relationship where 2, 4, and 8 gene pairs are respectively
involved with maximum dispersal, i.e. 1 increasing allele in each parent for
2 gene pairs (1/1), 2 in each parent for 4 gene pairs (2/2) and 4 in each
parent for 8 gene pairs (4/4). The broken line shows the relationship for 8
genes where 6 increasing alleles are in one parent and 2 in the other (6/2).
Note that in all cases, except that of 2 gene pairs, the sign of the heterosis is
reversed where duplicate type interaction of sufficient strength is operating.
The diagram assumes that all d's are equal to one another and to all h's, with
all i's = all l's = $\theta d$.
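The relations behind Fig. 11 can be sketched numerically. The following is a minimal illustration, not the authors' own calculation: it assumes that [i] and [l] are simple sums over all pairs of loci, that every d equals every h, and that every i and l equals $\theta d$, as the caption states. The function names are hypothetical.

```python
from math import comb

def parent_mean(n_inc, k, d=1.0, theta=0.0):
    """Deviation from m of a true-breeding parent carrying n_inc increasing
    and k - n_inc decreasing alleles (assumed model: [d] + [i])."""
    n_dec = k - n_inc
    sum_d = (n_inc - n_dec) * d
    # i counts +theta*d for a pair of loci fixed in the same direction,
    # and -theta*d for a pair fixed in opposite directions
    sum_i = (comb(n_inc, 2) + comb(n_dec, 2) - n_inc * n_dec) * theta * d
    return sum_d + sum_i

def f1_mean(k, d=1.0, theta=0.0):
    """Deviation from m of the F1: [h] + [l] with every h = d, every l = theta*d."""
    return k * d + comb(k, 2) * theta * d

def heterosis(n_inc_p1, k, d=1.0, theta=0.0):
    """Excess of the F1 mean over the better parent."""
    p1 = parent_mean(n_inc_p1, k, d, theta)
    p2 = parent_mean(k - n_inc_p1, k, d, theta)
    return f1_mean(k, d, theta) - max(p1, p2)

# The four cases of Fig. 11, evaluated at a strongly duplicate theta
for label, n_inc, k in (("1/1", 1, 2), ("2/2", 2, 4), ("4/4", 4, 8), ("6/2", 6, 8)):
    print(label, heterosis(n_inc, k, theta=-1.0))
```

Under these assumptions the 1/1 case never gives negative heterosis for $\theta$ between -1 and 1, while the other three cases change sign once duplicate interaction is strong enough, in line with the caption.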
$\tfrac12 h_b$ and the deviations from it of AA, aa and Aa are respectively
$(d_a + \tfrac12 j_a)$, $-(d_a + \tfrac12 j_a)$ and $(h_a + \tfrac12 l)$. In the case of gene B-b, the
corresponding deviations are $(d_b + \tfrac12 j_b)$, $-(d_b + \tfrac12 j_b)$ and $(h_b + \tfrac12 l)$.
These deviations replace $d_a$, $-d_a$, $h_a$, $d_b$, $-d_b$ and $h_b$ which obtain in the
absence of interaction.
If we pass on to F3, taking the generation as a whole, the four complete
homozygotes (AABB, etc.) each comprise 9/64, the four single heterozygotes
(AABb, etc.) each comprise 3/32, and the doubly heterozygous genotype
(AaBb) comprises 1/16 of the individuals. The means of all AA, Aa and aa
individuals are thus $d_a + \tfrac14 j_a + \tfrac14 h_b$, $h_a + \tfrac14 l + \tfrac14 h_b$ and
$-(d_a + \tfrac14 j_a) + \tfrac14 h_b$, giving deviations of $(d_a + \tfrac14 j_a)$, $(h_a + \tfrac14 l)$ and
$-(d_a + \tfrac14 j_a)$. It is not surprising therefore to find that the total
variance of the F3 generation is

$$V_{F3} = V_{1F3} + V_{2F3} = \tfrac34 (d_a + \tfrac14 j_a)^2 + \tfrac34 (d_b + \tfrac14 j_b)^2 + \tfrac{3}{16} (h_a + \tfrac14 l)^2 + \tfrac{3}{16} (h_b + \tfrac14 l)^2 + \tfrac{9}{16} i^2 + \tfrac{9}{64} j_a^2 + \tfrac{9}{64} j_b^2 + \tfrac{9}{256} l^2$$

the coefficients of the terms in $i^2$, $j^2$ and $l^2$ being once again the products
of the coefficients of the relevant main effects. If we proceed further to
find $V_{1F3}$ and $V_{2F3}$ we again find terms in $(d + \tfrac14 j)^2$ and $(h + \tfrac14 l)^2$, thus

$$V_{1F3} = \tfrac12 (d_a + \tfrac14 j_a)^2 + \tfrac12 (d_b + \tfrac14 j_b)^2 + \tfrac{1}{16} (h_a + \tfrac14 l)^2 + \tfrac{1}{16} (h_b + \tfrac14 l)^2 + \tfrac14 i^2 + \tfrac{1}{32} j_a^2 + \tfrac{1}{32} j_b^2 + \tfrac{1}{256} l^2$$

$$V_{2F3} = \tfrac14 (d_a + \tfrac14 j_a)^2 + \tfrac14 (d_b + \tfrac14 j_b)^2 + \tfrac18 (h_a + \tfrac14 l)^2 + \tfrac18 (h_b + \tfrac14 l)^2 + \tfrac{5}{16} i^2 + \tfrac{7}{64} j_a^2 + \tfrac{7}{64} j_b^2 + \tfrac{1}{32} l^2.$$

The coefficients of the interaction terms are again the products of the
coefficients of the relevant main effects in $V_{1F3}$, but not in $V_{2F3}$. Indeed
since $V_{1F3} + V_{2F3} = V_{F3}$ the product rule cannot apply to $V_{2F3}$ if it
applies to $V_{1F3}$ and $V_{F3}$.
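The F3 expressions can be checked by direct enumeration. The sketch below is only an illustration under stated assumptions: two unlinked genes, the digenic model with $j_a$ as the interaction of $d_a$ with $h_b$ and $j_b$ that of $h_a$ with $d_b$, no non-heritable variation, and arbitrary (hypothetical) parameter values. It selfs every F2 genotype, then compares the variance of F3 family means and the mean variance within families with the formulae.

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical parameter values, chosen only to exercise every term.
da, db, ha, hb, i, ja, jb, l = (Fr(v) for v in (3, 2, 1, 2, 1, 2, 3, 4))

def phenotype(ga, gb):
    """Deviation from m; g is +1 (AA), 0 (Aa) or -1 (aa), likewise for B-b."""
    het_a, het_b = int(ga == 0), int(gb == 0)
    return (da * ga + db * gb + ha * het_a + hb * het_b
            + i * ga * gb + ja * ga * het_b + jb * het_a * gb + l * het_a * het_b)

F2_LOCUS = {1: Fr(1, 4), 0: Fr(1, 2), -1: Fr(1, 4)}   # single-locus F2 frequencies

def selfed(g):
    """Single-locus offspring distribution when genotype g is selfed."""
    return F2_LOCUS if g == 0 else {g: Fr(1)}

def mean_var(pairs):
    """Mean and variance of a list of (value, frequency) pairs."""
    mean = sum(f * x for x, f in pairs)
    return mean, sum(f * x * x for x, f in pairs) - mean ** 2

family_means, within = [], Fr(0)
for (ga, fa), (gb, fb) in product(F2_LOCUS.items(), repeat=2):
    family = [(phenotype(oa, ob), pa * pb)
              for oa, pa in selfed(ga).items() for ob, pb in selfed(gb).items()]
    mu, var = mean_var(family)
    family_means.append((mu, fa * fb))
    within += fa * fb * var

v1 = mean_var(family_means)[1]   # variance of F3 family means
v2 = within                      # mean variance within F3 families

A, B = da + ja / 4, db + jb / 4
Ha, Hb = ha + l / 4, hb + l / 4
v1_formula = (Fr(1, 2) * (A**2 + B**2) + Fr(1, 16) * (Ha**2 + Hb**2)
              + Fr(1, 4) * i**2 + Fr(1, 32) * (ja**2 + jb**2) + Fr(1, 256) * l**2)
v2_formula = (Fr(1, 4) * (A**2 + B**2) + Fr(1, 8) * (Ha**2 + Hb**2)
              + Fr(5, 16) * i**2 + Fr(7, 64) * (ja**2 + jb**2) + Fr(1, 32) * l**2)
print(v1 == v1_formula, v2 == v2_formula)
```

Exact fractions are used so that the comparison is an identity rather than an approximate match. The coefficient of $i^2$ in the within-family variance is 5/16, not the product 1/4 x 1/4, which is the failure of the product rule noted in the text.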
The covariance of F2 parents and F3 means is
[Figure 12: $V_{1F2}$ plotted against $\theta$, running from duplicate to complementary interaction.]
Fig. 12. The effect on $V_{1F2}$ of complementary and duplicate type interaction,
measured by $\theta$, in the cases of 2 and 5 segregating gene pairs. In each
of the two cases all d = all h, and all i = all j = all l = $\theta d$. In both the 2 and
5 gene cases the values of $V_{1F2}$ are scaled to be 1 when there is no
interaction ($\theta = 0$).
        AA              Aa                    aa
BB      q²              2pq                   p²
        da + db         ha + db               -da + db
        da + db         ½ha + db              -da + db
Bb      2pq             C 2q²;  R 2p²         2pq
        da + hb         ha + hb               -da + hb
        da + ½hb        ½ha + ½hb             -da + ½hb
bb      p²              2pq                   q²
        da - db         ha - db               -da - db
        da - db         ½ha - db              -da - db
The two hitherto unfamiliar terms in this expression involve the
recombination value, combined in one case with $d_a d_b$ and in the other with
$h_a h_b$. With free recombination $p = \tfrac12$, $1 - 2p = 0$ and the new terms
vanish to leave the expressions obtained in Section 11. With complete
linkage $p = 0$, $1 - 2p = 1$ and, aside from non-heritable variation,
$V_{1F2} = \tfrac12 (d_a + d_b)^2 + \tfrac14 (h_a + h_b)^2$. The two genes are then acting as one. Even
where recombination occurs, however, the recombinant genotypes will
be rare if $p$ is small, and the genes will effectively act as one except in
so far as selection may isolate one of the rare recombinants.
Where the genes are in repulsion the heritable variance of F2 becomes

$$V_{1F2} = \tfrac12 [d_a^2 + d_b^2 - 2(1 - 2p)\, d_a d_b] + \tfrac14 [h_a^2 + h_b^2 + 2(1 - 2p)^2 h_a h_b].$$

The sign of the term in $d_a d_b$ is changed but, as would be expected, that
in $h_a h_b$ remains the same. It should be noted, however, that $h_a h_b$ will be
positive only if $h_a$ and $h_b$ are reinforcing one another by acting in the
same direction. If they are opposing one another in action this term will
take a negative sign. Thus reinforcement versus opposition of the h's
resembles coupling versus repulsion of the genes in its effects on the
signs of the term in p. It should be remembered nevertheless that
reinforcement versus opposition is a physiological distinction while
coupling versus repulsion is a mechanical one.
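The effect of linkage on the heritable variance of F2 can be illustrated by comparing the formula with a direct enumeration of the F1 gametes. This is a sketch under the assumptions of the passage (two genes, no interaction, no non-heritable variation); the function names and the numerical values used to exercise them are hypothetical.

```python
from fractions import Fraction as Fr
from itertools import product

def f2_variance_formula(da, db, ha, hb, p, coupling=True):
    """Heritable F2 variance for two linked genes with recombination value p."""
    sign = 1 if coupling else -1
    return (Fr(1, 2) * (da**2 + db**2 + sign * 2 * (1 - 2 * p) * da * db)
            + Fr(1, 4) * (ha**2 + hb**2 + 2 * (1 - 2 * p)**2 * ha * hb))

def f2_variance_enumerated(da, db, ha, hb, p, coupling=True):
    """The same variance obtained by pairing every F1 gamete with every other."""
    parental, recombinant = (1 - p) / 2, p / 2
    # A gamete is (allele at A-a, allele at B-b): +1 increasing, -1 decreasing.
    if coupling:
        gametes = {(1, 1): parental, (-1, -1): parental,
                   (1, -1): recombinant, (-1, 1): recombinant}
    else:
        gametes = {(1, -1): parental, (-1, 1): parental,
                   (1, 1): recombinant, (-1, -1): recombinant}
    mean = second = Fr(0)
    for (g1, f1), (g2, f2) in product(gametes.items(), repeat=2):
        # Homozygote contributes +d or -d, heterozygote contributes h.
        value = sum(d * a1 if a1 == a2 else h
                    for d, h, a1, a2 in ((da, ha, g1[0], g2[0]),
                                         (db, hb, g1[1], g2[1])))
        mean += f1 * f2 * value
        second += f1 * f2 * value**2
    return second - mean**2

for p in (Fr(0), Fr(1, 5), Fr(1, 2)):
    print(p, f2_variance_formula(3, 2, 1, 2, p), f2_variance_formula(3, 2, 1, 2, p, coupling=False))
```

With p = 1/2 the coupling and repulsion values coincide, and with p = 0 in coupling the result is that of a single gene with effects $d_a + d_b$ and $h_a + h_b$.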
If we now write

$$D = d_a^2 + d_b^2 \pm 2(1 - 2p)\, d_a d_b$$

$$V_{1F3} = \tfrac12 D + \tfrac{1}{16} H + E$$
$$W_{1F23} = \tfrac12 D + \tfrac18 H$$

$$D = d_a^2 + d_b^2 \pm 2(1 - 2p)^2\, d_a d_b$$
and
$$H = h_a^2 + h_b^2 + 2(1 - 2p)^2 (1 - 2p + 2p^2)\, h_a h_b.$$
The same definition will apply to $V_{2F4}$ and $W_{2F34}$, the rank 2 statistics
of F4, just as the rank 1 definition will apply to $V_{1F4}$ and $W_{1F34}$. The
mean variance of F4 families will, however, by extension of the argument
reflect three rounds of recombination, at gametogenesis in F1, F2
and F3, and hence will be of rank 3 as is denoted by it being written as
$V_{3F4}$. The rank 3 components of variation which appear in $V_{3F4}$ are
TABLE 38.
Ear conformation in barley (Mather, 1949). D1 and H1 denote the rank 1
components, and D2 and H2 the rank 2 components. E1 and E2 are the
non-heritable variances of individuals and family means respectively
Heritable variation
Statistic Observed Expectation
Observed Expected Deviation
which reduces to
4p(1 - 2p)
4(1 - p) when da = db.
23. Diallels
The means of the families which constitute a set of diallel crosses will
reflect any interaction shown by the genes in which the parental lines
differ. On the other hand, since only these means are used in diallel
analysis, and indeed the families themselves are non-segregating in the
diallels we have been observing, linkage as such can be having no effect
on the variation that we observe and measure. At the same time the
genes in which the parental lines differ may be correlated in their dis-
tributions among the parents and in such a case their contributions to
the variation among the families of the diallel will not be independent.
The general expressions for the effects of digenic interaction on the
means, variances and covariances of a diallel are very complex (M and J,
Table 96). We can, however, learn something of the ways in which both
interaction and correlated gene distributions express themselves in diallel
analysis if we consider the special and relatively simple case of four
parental lines representing all the combinations of two gene pairs with
$u_a = v_a = u_b = v_b = \tfrac12$ (i.e. all gene frequencies equal) but having
correlated distributions among the four parents, and where $d_a = d_b = h_a = h_b$
and $i = j_a = j_b = l = \theta d$ (i.e. with digenic interaction of the
complementary-duplicate type). The correlation of the gene distributions is
measured by the parameter $c$, the frequencies of the AABB and aabb parents
each being $\tfrac14(1 + c)$ and those of the AAbb and aaBB parents each being
$\tfrac14(1 - c)$. When $c = 0$, all the parents occur with the frequency $\tfrac14$. When
$c = 1$ association is complete, the AAbb and aaBB parents being absent, with
A and B on the one hand and a and b on the other always occurring together
as a single compound gene pair. Equally when $c = -1$ dispersion is complete,
A always occurring with b and a with B, the AABB and aabb parents
being absent. Values of $c$ between 1 and -1 represent various strengths
of association and dispersion.
Similarly the interaction is measured by $\theta$. So with $d_a = d_b$ and $i = \theta d$
the phenotype of for example AABB, which in general terms is $d_a + d_b + i$,
can be written as $d(2 + \theta)$, and that of AAbb as $d(-\theta)$. Similarly
with $h = d$ and $l = i = \theta d$ the phenotype of AaBb, which in general
terms is $h_a + h_b + l$, becomes $d(2 + \theta)$ and so on. The phenotypes of the
sixteen families in the diallel are set out in these terms in the body of
Table 39, where the frequencies of the four parental lines are also shown
in terms of $c$.
TABLE 39.
The diallel table is sufficiently simple for us to undertake a full
analysis. The first point to note is that since $d_a = d_b = h_a = h_b$ and
$j_a = j_b$, the central two arrays will be alike in the values they yield for
$V_r$ and $W_r$ and so will provide only a single joint point in the $W_r/V_r$
graph, which thus will have only three points instead of the more general
four. The mean of the parents will be
$\tfrac14 d[(1 + c)(2 + \theta) - 2(1 - c)\theta + (1 + c)(-2 + \theta)] = d\theta c$
and the mean of array ab will obviously be the same. Since the
phenotype is $d(2 + \theta)$ for all four classes in the AB array, its mean will
obviously be $d(2 + \theta)$ while the means of the Ab and aB arrays will be
$\tfrac14 d[(1 + c)(2 + \theta) - (1 - c)\theta + (1 - c)(2 + \theta) - (1 + c)\theta] = d$.
$V_r$ for the AB array will clearly be 0 since the phenotypes of all its
classes will be alike, and so of course will its $W_r$ also. For the ab array

$$V_r = \tfrac14 d^2 [(1 + c)(2 + \theta)^2 + 2(1 - c)\theta^2 + (1 + c)(-2 + \theta)^2] - d^2\theta^2 c^2 = d^2 (2 + \theta^2 + 2c - \theta^2 c^2)$$

the term $d^2\theta^2 c^2$ being the correction for the mean. The variance of the
parents will obviously be the same as $V_r$ for the ab array, since the
phenotypes of the four classes in the array are the same as those of the
corresponding parents. For these reasons also $W_r$ will equal $V_r$ for this array.
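The array statistics of this special diallel can be reproduced from the model itself. The sketch below is an illustration only: it assumes $d_a = d_b = h_a = h_b = d$ and $i = j_a = j_b = l = \theta d$ as in the text, treats $W_r$ as the covariance of an array's families with their non-recurrent parents, and uses hypothetical values of d, $\theta$ and c.

```python
from fractions import Fraction as Fr

AB, Ab, aB, ab = (1, 1), (1, -1), (-1, 1), (-1, -1)   # the four parental lines

def parent_frequencies(c):
    """Frequencies of the four parents for a gene correlation c."""
    return {AB: (1 + c) / 4, Ab: (1 - c) / 4, aB: (1 - c) / 4, ab: (1 + c) / 4}

def phenotype(ga, gb, d, theta):
    """Deviation from m; g is +1, 0 or -1.  With h = d and all interactions
    equal, a heterozygote behaves exactly like the increasing homozygote."""
    xa = 1 if ga >= 0 else -1
    xb = 1 if gb >= 0 else -1
    return d * (xa + xb + theta * xa * xb)

def cross(p, q):
    """Genotype of the family from parents p and q (0 marks a heterozygote)."""
    return tuple(x if x == y else 0 for x, y in zip(p, q))

def array_stats(r, freqs, d, theta):
    """Mean, V_r and W_r of the array whose common parent is r."""
    fam = {q: phenotype(*cross(r, q), d, theta) for q in freqs}
    par = {q: phenotype(*q, d, theta) for q in freqs}
    mean_f = sum(f * fam[q] for q, f in freqs.items())
    mean_p = sum(f * par[q] for q, f in freqs.items())
    v_r = sum(f * fam[q]**2 for q, f in freqs.items()) - mean_f**2
    w_r = sum(f * fam[q] * par[q] for q, f in freqs.items()) - mean_f * mean_p
    return mean_f, v_r, w_r

# Hypothetical values for illustration
d, theta, c = Fr(1), Fr(1, 2), Fr(1, 3)
freqs = parent_frequencies(c)
for name, line in (("AB", AB), ("Ab", Ab), ("aB", aB), ("ab", ab)):
    print(name, array_stats(line, freqs, d, theta))
```

The Ab and aB arrays return identical statistics, which is why the $W_r/V_r$ graph has only three distinct points in this case.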
Turning to the central arrays we find
[Figure: $W_r/V_r$ graphs, with directions labelled Duplicate, Complementary and Association.]
Fig. 14. The effect of gene association and dispersion, measured by $c$, of two
gene pairs on the $W_r/V_r$ graph from a diallel set of matings, with
$d_a = d_b = h_a = h_b$ and $\theta = 0$. The effect of association is similar to that of
duplicate interaction and dispersion to that of complementary interaction,
illustrated in Fig. 13. The path of the middle point with change in $c$ is not
however curved, as with interaction, but follows a line parallel to the
abscissa as shown by the heavy line. The numbers indicate the values of $c$
to which the points correspond.
[Figure 15: mean number of facets (left) and log mean number of facets (right) plotted against temperature, 15 and 25° C.]
Fig. 15. Krafka's data (from Hogben, 1933) on the mean numbers of facets
in the eyes of two lines of Bar-eyed Drosophila at two temperatures. When
the direct count of eye facets is used (on the left) the difference between
the lines at 15° C ($d_{15}$) is larger than the difference at 25° C ($d_{25}$), so
indicating genotype × environment interaction. When, however, the logs of
the mean numbers of eye facets are used (on the right) $d_{15}$ and $d_{25}$ are
nearly equal. The scalar transformation has removed the interaction.
at 25° C the difference is only 49. At the higher temperature the
difference is only 1/3 of that at the lower. The lines are not reacting equally
to the change in temperature: the effects of genotype and environment
are not additive, or in other words, there must be an interaction of
genotype and environment. When, however, we change the scale by taking
logarithms of the number of facets, we obtain the picture shown on the
right of Fig. 15. In log measure the difference between L and U is 0.58
at 15° C and 0.47 at 25° C. The higher temperature still gives a smaller
difference than the lower, but the reduction is proportionately very
much less than when the untransformed facet number was used. The
log transformation has very much reduced the genotype × environment
interaction, if not entirely eliminated it.
The size of the reduction emerges even more dramatically if we sub-
ject the data to an analysis of variance. The 3 df among the four obser-
vations may be assigned 1 each to the overall effect of the genetic dif-
ference, the overall effect of the environment, and the genotype X en-
vironment interaction. The percentages of the total variation taken out
by each of these three items using direct measure and log measure are:
Item             Direct measure    Log measure
Genetic          54.1              66.1
Environmental    32.5              33.2
Interaction      13.4               0.7
Looked at in this way the interaction has been rendered negligible by
the log transformation.
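The partition of the 3 df can be sketched as follows. The four facet means used here are an assumption: they are not given in this passage, and were chosen because they reproduce the differences (49 at 25° C, 0.58 and 0.47 in logs) and the percentages quoted above. Common logarithms are also assumed.

```python
from math import log10

# Assumed mean facet numbers for lines L and U at 15 and 25 deg C
# (not quoted in this passage; see the note above).
facets = {("L", 15): 198, ("U", 15): 52, ("L", 25): 74, ("U", 25): 25}

def partition(values):
    """Percentage of the total SS (3 df) due to each single-df item."""
    l15, u15, l25, u25 = (values[k] for k in
                          (("L", 15), ("U", 15), ("L", 25), ("U", 25)))
    contrasts = {
        "Genetic": l15 + l25 - u15 - u25,
        "Environmental": l15 + u15 - l25 - u25,
        "Interaction": l15 - u15 - l25 + u25,
    }
    ss = {item: c**2 / 4 for item, c in contrasts.items()}   # SS of each contrast
    total = sum(ss.values())
    return {item: 100 * s / total for item, s in ss.items()}

direct = partition(facets)
logged = partition({k: log10(v) for k, v in facets.items()})
for item in direct:
    print(f"{item:14s} {direct[item]:5.1f} {logged[item]:5.1f}")
```

Because each item has a single degree of freedom, its sum of squares is simply the squared contrast divided by four, so the percentages depend only on the relative sizes of the three contrasts.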
One further point is worth noting before we leave this example. When
considering another Bar-eye gene, in Section 8, we saw that a square root
transformation eliminated that interaction between alleles which we
term dominance, whereas a log transformation did not, and we saw too
that a theoretical interpretation of this finding could be advanced. In
the present example, while a square root transformation reduces the
genotype X environment interaction it is much less effective than the
log transformation. This contrast emphasizes the essentially empirical
nature of choice of a transformation, and the unwisdom of seeking to
draw theoretical conclusions from a successful case of a particular
change of scale.
Not all genotype X environment interactions can, however, be as-
cribed to the use of an inappropriate scale for the representation of the
character. Table 40 sets out the mean numbers of sternopleural chaetae
borne by the two inbred lines, Samarkand (S) and Wellington (W), of
Drosophila when raised in six different environments, which comprised
all the possible combinations of two temperatures 18 and 25° C, and
three types of culture vessel, ½ pint milk bottles with yeasted food (B),
1 × 3 inch vials with yeasted food (Y), and similar vials with unyeasted
food (U). Five cultures were reared of each line in each environment,
the figures in the table being the means of all the five replicate cultures
in each case. Comparisons among the five replicates give us an estimate
of error variation which will be based on 4 df within each combination
of genotype and environment. Since there are 2 X 6 = 12 such combi-
TABLE 40.
Mean numbers of sternopleural chaetae in the S and W inbred lines of
Drosophila melanogaster, their F1 and F2 raised in six environments
TABLE 41.
Mean chaeta numbers of the S and W inbred lines at 18 and 25°C
TABLE 42.
Alternative models for the phenotypes given by two genotypes, S and W,
raised in two environments, 18 and 25°C

        18°C                  25°C
S       m + [d] + e + g       m + [d] - e - g
        m + [d] + e_s         m + [d] - e_s
W       m - [d] + e - g       m - [d] - e + g
        m - [d] + e_w         m - [d] - e_w
Item     df    MS         VR      P
[d]      1     3.1506     456.1   v.s.
e        1     0.2862      41.4   v.s.
g        1     0.5852      84.7   v.s.
Error    48    0.00691
v.s. = very small
must be compared to test their significance. Again all the three items are
highly significant. Since each MS in the analysis stems from 1 df, the VR
obtained when it is divided by the error variance is a $t^2$. Thus in the case
of the g item, the VR is 84.7, giving $t = \sqrt{VR} = 9.2$ as in the earlier
test. The two ways of testing the significance of g are thus no more than
two forms of the same test.
The significance of g shows that genotype × environment interaction
is present, or in other words that the two genotypes S and W do not
react equally to the change in temperature. This suggests an alternative
formulation for the phenotypes of the two lines at the two temperatures,
in which e and g are replaced by two different parameters $e_s$ and $e_w$
measuring respectively the differences produced in S and W by the
alteration of temperature. Thus S at 18° C has the phenotype m + [d] + $e_s$
and at 25° C m + [d] - $e_s$, while the phenotypes of W at the two
temperatures are m - [d] + $e_w$ and m - [d] - $e_w$, as set out in the lower
expressions of Table 42. This formulation has the advantage that $e_s$ and
$e_w$ are properties of the individual lines, unlike e and g, which are
compounds of the properties of the two lines. As such $e_s$ and $e_w$ are
biologically more directly meaningful than e and g, and indeed are direct
measures of the sensitivity of the two lines to change in an aspect of the
environment. They thus measure a character which is prospectively
important and whose genetic basis can be investigated in a direct way.
Now [d], $e_w$ and $e_s$ permit a complete specification of the phenotype
as do [d], e and g. Clearly therefore, since [d] is common to both
formulations, $e_s$ and $e_w$ must relate to e and g. In fact, $e_s = e + g$, and
$e_w = e - g$, or put the other way round $e = \tfrac12(e_s + e_w)$ while
$g = \tfrac12(e_s - e_w)$, and the SS jointly accounted for by $e_s$ and $e_w$ equals that
jointly accounted for by e and g, each SS corresponding of course to 2 df.
Thus given the values of one pair of parameters the values of the other two
can be found: they are no more than alternative ways of representing
the same thing and are readily converted into each other. The
formulation to be used may be chosen by its convenience for the investigation
or analysis in hand. In general, while $e_s$ and $e_w$ are the more biologically
meaningful pair, e and g are commonly the more analytically useful,
although this is not always the case.
In the present example $e_s = \tfrac12(20.45 - 20.68) = e + g = 0.2675 - 0.3825 = -0.115$
while $e_w = \tfrac12(19.44 - 18.14) = e - g = 0.2675 - (-0.3825) = 0.650$.
We note that $e_s$ and $e_w$ are each found as half the
difference between two of the observed values in Table 41 each of which
has an error variance of 0.006907. Hence $V_{e_s} = V_{e_w} = \tfrac14(2 \times 0.006907)
= 0.003454$ and $s_{e_s} = s_{e_w} = \sqrt{0.003454} = 0.05877$. The difference
between $e_s$ and $e_w$ is significant (which is, of course, an alternative way
of demonstrating genotype × environment interaction and leads in fact
to exactly the same test of significance that we have already used), and
$e_w$ is significantly greater than 0, but $e_s$ is not significantly negative on
these results. Thus while we can say that the two genotypes respond
differently to the change in temperature, we cannot say from these data
that they respond in different directions.
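The arithmetic of the two formulations can be followed in a short sketch. It uses only the four means and the error variance quoted above; the variable names are illustrative, and the t value for the difference between $e_s$ and $e_w$ is computed here on the assumption that the two estimates are independent.

```python
from math import sqrt

# Mean chaeta numbers quoted in the text (Table 41)
S18, S25, W18, W25 = 20.45, 20.68, 19.44, 18.14
error_variance = 0.006907          # error variance of each of the four means

m = (S18 + S25 + W18 + W25) / 4    # overall mean
d = (S18 + S25 - W18 - W25) / 4    # [d]
e = (S18 - S25 + W18 - W25) / 4    # overall effect of temperature
g = (S18 - S25 - W18 + W25) / 4    # genotype x environment interaction

e_s, e_w = e + g, e - g            # sensitivities of S and W
se = sqrt(2 * error_variance / 4)  # s.e. of half a difference of two means

t_es = e_s / se                    # is e_s different from 0?
t_ew = e_w / se                    # is e_w different from 0?
t_diff = (e_w - e_s) / sqrt(2 * se**2)

print(round(e_s, 4), round(e_w, 4), round(se, 5))
print(round(t_es, 2), round(t_ew, 2), round(t_diff, 2))
```

The value of t for the difference between the two sensitivities comes out the same as the t found for g, which is the sense in which the two tests of interaction are one and the same.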
Item                  df    MS         VR      P
Lines (L)             2     4.8163     232.4   v.s.
Environments (E)      5     0.6026      29.1   v.s.
Interaction (I)       10    0.2743      13.2   v.s.
Error                 48    0.02072
-----------------------
L1 (S - W)            1     9.4519     456.2   v.s.
L2 (S + W - 2F1)      1     0.1806       8.7   v.s.
I1                    5     0.4433      21.4   v.s.
I2                    5     0.1052       5.1   0.001
        Over all          Over
        environments      temperatures    Remainder
df =    5                 1               4
S       0.0508            0.0771          0.0443
W       0.6280            2.5220          0.1544
F1      0.4723            2.0651          0.0742
F2      0.3777            1.4211          0.1168
All entries are mean squares
Environment    1 (18°C B)   2 (18°C Y)   3 (18°C U)   4 (25°C B)   5 (25°C Y)   6 (25°C U)   Mean
Analysis of variance of g
Item df MS VR p
[Figure 16: g (ordinate) plotted against e (abscissa) for environments 1 to 6.]
Fig. 16. The regression of g on e for sternopleural chaeta number in two
lines of Drosophila melanogaster (S and W) raised in six environments (1 to
6). The regression line of g on e has a slope of -1.2256 which, by being
outside the range 1 to -1, shows that the two lines of flies respond in opposite
directions to the relevant change in the environment (see also Fig. 17).
These six g's are plotted against their corresponding e's in Fig. 16 from
which it is clear that there is a negative relation between g and e, g falling
as e rises. We can test whether this relation is rectilinear by finding the
regression of g on e. The calculation is shown at the foot of Table 46.
The SS for e is found simply as $e_1^2 + e_2^2 + \ldots + e_6^2$, since the sum of
the six e's must be 0. (It is nevertheless easier to find this SS as
$(m + e_1)^2 + (m + e_2)^2 + \ldots + (m + e_6)^2 - \tfrac16[(m + e_1) + (m + e_2) + \ldots + (m + e_6)]^2$
as every m + e is known exactly whereas all the e's involve
recurring decimals.) Similarly SS(g) = $g_1^2 + g_2^2 + \ldots + g_6^2$ while the
S.C.P. of g and e is $e_1 g_1 + e_2 g_2 + \ldots + e_6 g_6$. Then the linear regression
coefficient of g on e is S.C.P./SS(e) = -0.7214/0.5886 = -1.2256. The
analysis of variance of g is carried out in the customary way, the SS for
regression being $(-0.7214)^2/0.5886 = 0.8842$ which on subtracting
from SS(g) leaves 1.1084 - 0.8842 = 0.2242 as the SS remaining. Since
there are six environments each yielding an observation, there will be
5 df of which 1 is taken up by the regression itself leaving 4 df for
variation of the points round the regression line, so giving as the remainder
MS 0.2242/4 = 0.0560.
Each g value is derived from that of the difference between an S
observation and a W observation. Each observation is subject to an error
variance of 0.02072 and the difference between two of them will have
an error variance twice this value. Half the difference will have an error
variance of one-quarter that of the difference itself. Thus g will be
subject to a variance of ¼ × 2 × 0.02072 = 0.01036. When tested against
this estimate of error the remainder MS gives a VR of 5.41 for 4 and
48 df and this has a P between 0.01 and 0.001. The departures from
the linear regression are thus significant. At the same time the regression
MS tested against the remainder MS yields a VR of 15.78 which for 1
and 4 df has P = 0.02 - 0.01. Thus, despite the variation round the line,
there can be no doubt of the linear component in the regression of g on e.
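The regression of g on e and its analysis of variance can be retraced from the deviations S - m and W - m listed in Table 47. The sketch below is illustrative only; the helper name is hypothetical and the six e and g values are rebuilt from the tabulated deviations rather than taken from Table 46, so the last decimal place may differ slightly from the figures in the text.

```python
# Deviations from the mid-parent m in environments 1 to 6 (Table 47)
S = [0.9042, 0.8342, 0.5842, 0.7642, 1.2542, 0.9842]
W = [-0.0458, -0.3358, -0.3358, -1.0058, -1.5358, -2.0658]
error_ms = 0.02072                 # error variance of a single observation

n = len(S)
d = (sum(S) - sum(W)) / (2 * n)                   # [d]
e = [(s + w) / 2 for s, w in zip(S, W)]           # environmental values
g = [(s - w) / 2 - d for s, w in zip(S, W)]       # interaction values

def centred_products(x, y):
    """Sum of cross products of x and y about their means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))

ss_e = centred_products(e, e)
ss_g = centred_products(g, g)
scp = centred_products(e, g)

slope = scp / ss_e                                # regression of g on e
ss_regression = scp**2 / ss_e                     # 1 df
ms_remainder = (ss_g - ss_regression) / (n - 2)   # 4 df
error_g = error_ms * 2 / 4                        # variance of half a difference
vr_remainder = ms_remainder / error_g             # tested against error, 4 and 48 df
vr_regression = ss_regression / ms_remainder      # tested against remainder, 1 and 4 df

print(round(slope, 4), round(vr_remainder, 2), round(vr_regression, 2))
```

Testing the regression against the remainder rather than against error is the more conservative choice once the remainder itself has been shown to be significant.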
This linear component must reflect the relation between g and e which
we have already seen to be produced by the temperature changes. This
relation plays a dominant part in producing the regression line because
the effect of temperature in changing e is greater than the effects of the
changes in culture container as a glance at Fig. 16 will confirm. The sig-
nificant variation about the regression line reflects the consequences of
the changes in container, which must thus produce interactions, g, not
related in the same way to the overall effects, e, as those brought about
by the alteration of temperature. Thus the relative responses of the two
genotypes to change in culture container cannot be following the same
pattern as their relative responses to change in temperature. It is there-
fore necessary to specify the type of environmental change before we
can discuss the relative sensitivities of the two genotypes to it.
The plot of g against e in Fig. 16 brings out in a clear and simple way
the relation between these two quantities. It shows us, however, nothing
of the sensitivities to environmental change of the individual genotypes
S and W. A more informative, albeit somewhat more complex, picture
can be obtained in a slightly different way. If we deduct m from the
values given by S in the six environments we are left with $[d] + e_1 + g_1$,
$[d] + e_2 + g_2$, etc., which may of course be written in the alternative
formulation as $[d] + e_{s1}$, $[d] + e_{s2}$, etc. Similarly deducting m from the
values given by W leaves $-[d] + e_1 - g_1$, $-[d] + e_2 - g_2$, etc. which may
also be rewritten as $-[d] + e_{w1}$, $-[d] + e_{w2}$, etc. The values so obtained
for the two genotypes are set out in Table 47 and are plotted against e
in Fig. 17. The table also gives the linear regression coefficients, b, of
S - m and W - m on e, and the regression lines themselves are shown
on the figure.
[Figure 17: S - m and W - m (ordinate) plotted against e (abscissa) for environments 1 to 6.]
Fig. 17. The sensitivity diagram for sternopleural chaeta number in S
and W. The deviations of S and W from the mid-parent, m, for each
environment are plotted (ordinate) against e (abscissa). The six environments
are denoted by the numbers 1 to 6. The outer broken lines are the best
fitting regression lines of S - m and W - m on e. The mean of S - m and
W - m is e for each environment, and the central broken line derived from
these means is thus the regression of e on e and must have a slope of 1. The
diagram makes clear that W is more sensitive than S to change in the
environment and that the two change in opposite directions.
TABLE 47.
Sensitivity to environmental change in S, W, their F1 and F2

                                              Environment
                              1        2        3        4        5        6       Mean       b
S - m  (= [d] + e_s)        0.9042   0.8342   0.5842   0.7642   1.2542   0.9842   0.8875   -0.226
W - m  (= -[d] + e_w)      -0.0458  -0.3358  -0.3358  -1.0058  -1.5358  -2.0658  -0.8875    2.226
e      (= ½[e_s + e_w])     0.4292   0.2492   0.1242  -0.1208  -0.1408  -0.5408   0         1.000
F1 - m (= [h] + e_h)        0.3042   0.3342   0.4842  -0.4558  -0.7458  -1.1958  -0.2125    1.836
F2 - m (= ½[h] + ½e_h)      0.5142   0.1842   0.0742  -0.2258  -0.9958  -0.9258  -0.2292    1.603
The first point to note is that the means of S - m and W - m are [d]
and -[d] respectively. The regression line for S - m must thus cut the
ordinate of the graph at [d], and the regression line for W - m cuts it at
-[d]. These two points must be equally spaced above and below the
origin, as will be seen from the figure. Next, the slope of the S - m re-
gression line measures the rate of change of es = e + g on e: in other
words it measures the sensitivity of S to change in the environment.
Equally the slope of the W - m line measures the sensitivity of W to
change in the environment, and clearly this is much greater than the
sensitivity of S, which in so far as it changes at all does so in the op-
posite direction. Now the slope of the S line depends on the change in
es = e + g on e, while that of the W line depends on e w = e - g. The
slope observed for the S line is -0.226 while that of the W line is 2.226.
The regression of e on e, which is also shown in the figure, will obviously
have a slope of 1. Thus the slope of the S line departs from that of e by
-0.226 - 1.000 = -1.226, while that of the W line departs by 2.226 -
1.000 = 1.226. So the interaction of genotype with environment is
responsible for a slope of -1.226 in S and 1.226 in W, values which are
equal in magnitude but opposite in sign as indeed they must be since e
is found from the mean of S and W in each environment, with the S line
reflecting the change of g and the W line that of -g. We may also note
that -1.226 measuring the contribution of g to the slope of the S line
equals the slope we have already found in a different way for the re-
gression of g on e (Table 46 and Fig. 16) as indeed it must. Table 47 and
Fig. 17 thus give us all the information that we were able to obtain from
Table 46 and Fig. 16 and more besides.
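The sensitivities in the last column of Table 47 are ordinary linear regression coefficients on e, and can be recomputed from the tabulated deviations. The following is a minimal sketch with hypothetical names; because the tabulated values are rounded to four places, the recomputed coefficients agree with the printed ones only to about the third decimal.

```python
# Deviations from m in environments 1 to 6, as listed in Table 47
rows = {
    "S":  [0.9042, 0.8342, 0.5842, 0.7642, 1.2542, 0.9842],
    "W":  [-0.0458, -0.3358, -0.3358, -1.0058, -1.5358, -2.0658],
    "F1": [0.3042, 0.3342, 0.4842, -0.4558, -0.7458, -1.1958],
    "F2": [0.5142, 0.1842, 0.0742, -0.2258, -0.9958, -0.9258],
}

# e is the mean of S - m and W - m in each environment
e = [(s + w) / 2 for s, w in zip(rows["S"], rows["W"])]

def slope(y, x):
    """Least squares regression coefficient of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    scp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    return scp / ss_x

b = {name: slope(values, e) for name, values in rows.items()}
for name, value in b.items():
    print(f"{name:3s} b = {value:7.3f}   departure from unit slope = {value - 1:7.3f}")
```

Since e is by construction the mean of the S and W deviations, the two parental slopes must sum to 2, so their departures from the unit slope are equal and opposite.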
This analysis of the genotype × environment interaction is made
possible only by using the chaeta numbers displayed by S and W in the
different environments to provide their own biological measurement of the
environments and so to quantify the overall effects of various changes
of environments. The biological measure, e, has allowed us to quantify
the consequences of the changes in culture condition as well as those of
change in temperature and show both on the same scale. In doing so it
has enabled us to compare the patterns of response to temperature and
culture condition and show that they are not the same. A further advan-
tage, although not one that is brought out by our present data, is that g
may display a rectilinear relation to environmental change measured by
e, even where it fails to do so when the environment is measured in other
and perhaps more obvious ways. As an example of this, two strains of
the fungus Schizophyllum commune (Jinks and Connolly, 1973) when
grown in a series of nine environments differing by temperature, display
interactions which when quantified by g are related in a curvilinear
manner to temperature itself. But when the temperature is replaced by the
biological measure e the relation of g to the environmental change
becomes rectilinear, as is shown in Fig. 18.
[Figure 18: growth rate plotted against temperature (upper graph) and against e (lower graph).]
Fig. 18. The effect of temperature on growth rate (in mm per nine days)
of a slow (L) and a fast (H) growing strain of Schizophyllum commune. The
upper graph shows growth rates plotted against temperature, and the lower
graph shows it plotted against e, the biological measure of the nine
environments. The nine temperatures are denoted by the numbers 1 to 9, which
thus relate corresponding points on the two graphs.
Generation   Temp (°C)   Weight     m    [d]   [h]    e    g_d   g_h    Observed   Expected
F2           18          29.463     1    0     ½      1    0     ½      19.933     19.994
F2           25          29.463     1    0     ½     -1    0    -½      18.960     19.134
$\chi^2_{[2]} = 1.046$    P = 0.7 - 0.5
with their structures in terms of the six parameters, m, [d], [h], e, $g_d$, $g_h$
and also the weights attached to each observed chaeta number in the
analysis. The weights come of course from the variances given in Table
40. Since each temperature mean is found by averaging three
observations, B, Y and U, at that temperature, its variance in for example the
parent lines will be 0.02072 ÷ 3 = 0.006907 and the weight is
1/0.006907 = 144.78. The six weighted least squares equations of
estimation for the parameters are then obtained in a manner exactly
analogous to that used in Section 9 and turn out to be, in matrix form,
$$\mathbf{J}\,\hat{\mathbf{M}} = \mathbf{S}$$
with
$$\mathbf{S} = (23843.546,\ 513.979,\ 11875.702,\ 524.284,\ -220.553,\ 355.028)^{T}$$
from which we find
$\hat{m} = 19.6699 \pm 0.0411$
$\hat{g}_d = -0.3808 \pm 0.0416$
$\hat{g}_h = 0.3192 \pm 0.0587$.
the standard errors being obtained as the square roots of the entries
along the leading diagonal of the variance-covariance matrix $\mathbf{J}^{-1}$. All the
estimates are significant and no parameter is redundant therefore.
The estimates allow us to calculate expectations for the mean chaeta
number of the S, W, F1 and F2 at each temperature as shown in the last
column of Table 48. Then squaring the differences between observed
and expected means, multiplying each squared difference by the
corresponding weight and summing over all eight observations gives
$\chi^2_{[2]} = 1.046$, there being 2 df because six parameters have been estimated from
the eight observations. This $\chi^2_{[2]}$ has a probability lying between 0.7 and
0.5, indicating that so far as these data go the model is fully adequate to
account for the observations: there are no grounds for suspecting
complications such as non-allelic interaction.
We should note, however, that a more sensitive test would have been
possible if more generations, notably the two back-crosses, had been
included in the experiment and if more replicates had been raised of the
F2 to reduce the variance of its mean.
Comparing the estimates of the parameters with their contributions
to the eight observations shows that:
(i) [d] is positive because S has a larger mean number of chaetae
than W.
(ii) [h] is negative because the F1 and F2 are nearer to W, the -[d]
parent, than to S which has [d].
(iii) e is positive because the average chaeta number is higher at 18
than at 25° C.
(iv) $g_d$ is negative because the difference between S and W decreases
as the overall chaeta number rises from 25 to 18° C.
(v) $g_h$ has the opposite sign to [h] because dominance decreases as the
chaeta number rises from 25 to 18° C.
These points become clearer if, having satisfied ourselves that on the
one hand the model is adequate while on the other it contains no
redundant parameters, we set out the analysis and its results in a different
way. If we concentrate on the data from a single environment we have
no information about the effects of environmental change. The four
observations from one environment can therefore be accounted for by
estimating only m, [d] and [h], the estimates so obtained being of course
applicable only to that environment. Proceeding in this way, one
environment at a time, we obtain two estimates each of m, [d] and [h] thus:
                  25°C        18°C        s.d.
m                 19.3995     19.9403     ± 0.0581
[d]                1.2683      0.5067     ± 0.0588
[h]               -0.5316      0.1067     ± 0.0831
$\chi^2_{(1)}$     0.934       0.112
There are two $\chi^2$'s, one from each environment and each having 4 - 3 =
1 df. Neither is significant and the model is thus adequate at both
environments.
Now $\hat{m}$ in the combined analysis is a combination of the two m's from
the separate environments while $\hat{e}$ is a measure of the difference between
the two separate m's. Similarly the combined $[\hat{d}]$ is a compound of the
two separate [d]'s, while $g_d$ depends on their difference; and the
combined $[\hat{h}]$ is a compound of the two separate [h]'s while $g_h$ depends on
their difference. In the present case, where the variances of
corresponding observations are equal in the two environments, $\hat{m}$ is in fact the
simple average of $m_{18}$ and $m_{25}$, while $\hat{e}$ is half their difference, i.e. is
$\tfrac12(m_{18} - m_{25})$. Similarly $[\hat{d}] = \tfrac12([d]_{18} + [d]_{25})$ and
$g_d = \tfrac12([d]_{18} - [d]_{25})$ while $[\hat{h}] = \tfrac12([h]_{18} + [h]_{25})$ with
$g_h = \tfrac12([h]_{18} - [h]_{25})$. The interpretation and
implication of the estimates of the six parameters from the combined
analysis of the results from the two environments are now clear. [d] falls
as m rises. Hence [d] and e are moving in opposite directions, and $g_d$ is
thus negative. Similarly, while [h] is preponderantly negative, it is rising
as e rises and $g_h$ is thus positive. A further point is brought out well by
the present estimates. At 25° C [h] is significantly negative, giving a ratio
[h]/[d] = -0.43. At 18° C [h] is positive but it does not differ
significantly from 0, although it obviously does differ significantly from $[h]_{25}$.
The ratio [h]/[d] = 0.19. Thus the dominance, or to be more precise the
potence of the W genotype over the S, changes markedly with the
environment: the value of [h], as indeed that of [d] also, is not unconditional.
This is of course another way of saying that the interaction between
genotypes and environments affects dominance as well as additive variation.
One last point remains to be made about these results. The rate of
change of gd on e is -0.3808/0.2704 = -1.4085 which agrees with our
estimate of -1.4299 obtained in the previous section from consideration
of S and W alone. Since S departs from m by [d] its interaction with the
temperature change will thus be -1.4085e but with W the deviation
from m is -[d] and the interaction is thus -gd = 1.4085e. The rate of
change of gh on e is 0.3192/0.2704 = 1.1804. Thus the reaction to
temperature of the heterozygote is not only much nearer to that of W than
it is to that of S; it is in fact approaching quite closely in value to that
of W. Clearly W is dominant to S in its genotype × environment
interaction: indeed its dominance in respect of the interaction is even
greater than in respect of overall chaeta number.
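The relation between the separate and the combined estimates can be retraced with a few lines of arithmetic. The following is only a sketch: it takes the six estimates quoted above, treats 18°C as the +e environment, and uses variable names (`combine`, `slope_gd`, `slope_gh`) that are illustrative and not part of the original text. Because it starts from the rounded four-figure estimates, the ratios agree with those quoted only to about three decimal places.

```python
# Separate-environment estimates quoted in the text.
est_25 = {"m": 19.3995, "d": 1.2683, "h": -0.5316}   # 25°C
est_18 = {"m": 19.9403, "d": 0.5067, "h": 0.1067}    # 18°C


def combine(x18, x25):
    """Return (simple average, half of the 18°C minus 25°C difference)."""
    return 0.5 * (x18 + x25), 0.5 * (x18 - x25)


# Combined-analysis parameters, valid when corresponding observations
# have equal variances in the two environments (as stated in the text).
m, e = combine(est_18["m"], est_25["m"])
d_bar, gd = combine(est_18["d"], est_25["d"])
h_bar, gh = combine(est_18["h"], est_25["h"])

# Rates of change of the interaction parameters on e.
slope_gd = gd / e
slope_gh = gh / e

print(f"m = {m:.4f}, e = {e:.4f}")
print(f"[d] = {d_bar:.4f}, gd = {gd:.4f}, gd/e = {slope_gd:.4f}")
print(f"[h] = {h_bar:.4f}, gh = {gh:.4f}, gh/e = {slope_gh:.4f}")
```

The negative gd/e and positive gh/e reproduce the directions of change described in the text.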
[Figure: sensitivity plotted against e, with the line for S labelled.]
Fig. 19. The sensitivity diagram for sternopleural chaetas in the S and W
lines of Drosophila, together with their F1 and F2. F1 and F2 follow the
response pattern of W more than that of S, thus indicating the dominance
of the relevant genes in W.
29. Variance of F2
So far we have been considering the situation where the environments
are defined and hence distinguishable from one another. The expression
of the different genotypes can then be observed in each environment
and the changes of expression related directly to change from one en-
vironment to another. The analysis is thus essentially one of components
of means. Frequently, however, the environments are not so definable
and unambiguously distinguishable. Thus, for example, the results from
plants grown on distinct blocks in an experimental field can be handled
by the methods we have been discussing because although we cannot
specify the chemical or physical differences between the environments
associated with the different blocks we can at least distinguish unam-
                   Environment                  Overall
                   1            2               Mean    Variance
Parent 1   AA      d+e+gd       d-e-gd          d       (e + gd)²
F1         Aa      h+e+gh       h-e-gh          h       (e + gh)²
Parent 2   aa      -d+e-gd      -d-e+gd         -d      (e - gd)²

                   Environment                           Overall
                   1 ............... t                   Mean    Variance
Parent 1   AA      d + e1 + gd1 .... d + et + gdt        d       S(e + gd)²
F1         Aa      h + e1 + gh1 .... h + et + ght        h       S(e + gh)²
Parent 2   aa      -d + e1 - gd1 ... -d + et - gdt       -d      S(e - gd)²
Then taken over all environments, the means of the parents are d and -d
respectively, that of F1 is h and that of F2 is ½h. The variance of the AA
parent will be S(e + gd)² which will also be V(e + gd) since with each
environment carrying 1/t of the individuals the SS will also be the MS.
The variances of the other parent, F1 and F2 are similarly shown in the
right-hand column of the table. Now when the parental and F1 variances
are combined in the F2 proportions they give ½S(gd)² + ¼S(gh)² + S(e +
½gh)² and subtracting this from V1F2 gives the heritable component due
to the gene A-a as

$$
\begin{aligned}
{}_{H}V_{1F2} &= \tfrac12 d^2 + \tfrac12 S(g_d)^2 + \tfrac14 h^2 + \tfrac14 S(g_h)^2 + S(e + \tfrac12 g_h)^2 \\
&\quad - \tfrac12 S(g_d)^2 - \tfrac14 S(g_h)^2 - S(e + \tfrac12 g_h)^2 \\
&= \tfrac12 d^2 + \tfrac14 h^2
\end{aligned}
$$

just as we found earlier in the simpler case of two environments.
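The cancellation of the interaction terms for a single gene can be checked numerically. The sketch below assumes the model of the table (t equally frequent environments, with e, gd and gh each summing to zero over environments) and uses arbitrary random values; the function names are hypothetical and serve only this illustration.

```python
import random


def centred(values):
    """Shift values so that they sum to zero over the environments."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]


def weighted_variance(cells):
    """Variance of (weight, value) pairs whose weights sum to one."""
    mean = sum(w * x for w, x in cells)
    return sum(w * (x - mean) ** 2 for w, x in cells)


def corrected_f2_variance(d, h, e, gd, gh):
    """V1F2 minus the parental and F1 variances combined as 1/4 : 1/2 : 1/4."""
    t = len(e)
    genotypes = {
        "AA": (d, gd),
        "Aa": (h, gh),
        "aa": (-d, [-g for g in gd]),
    }

    def cells(name, weight):
        # Each environment carries 1/t of the individuals of a genotype.
        effect, g = genotypes[name]
        return [(weight / t, effect + e[j] + g[j]) for j in range(t)]

    v_p1 = weighted_variance(cells("AA", 1.0))
    v_f1 = weighted_variance(cells("Aa", 1.0))
    v_p2 = weighted_variance(cells("aa", 1.0))
    v_f2 = weighted_variance(
        cells("AA", 0.25) + cells("Aa", 0.5) + cells("aa", 0.25)
    )
    return v_f2 - (0.25 * v_p1 + 0.5 * v_f1 + 0.25 * v_p2)


rng = random.Random(1)          # arbitrary seed, illustration only
t = 5
e = centred([rng.uniform(-1, 1) for _ in range(t)])
gd = centred([rng.uniform(-1, 1) for _ in range(t)])
gh = centred([rng.uniform(-1, 1) for _ in range(t)])
d, h = 1.3, 0.4

residual = corrected_f2_variance(d, h, e, gd, gh)
print(residual, 0.5 * d ** 2 + 0.25 * h ** 2)
```

Whatever values are drawn for e, gd and gh, the two printed numbers should agree to rounding error, which is the single-gene result stated above.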
The extension to more than one gene difference, however, brings in a
new problem. This is simply illustrated by the case of two gene
differences, A-a and B-b, in two environments. It is easy to show that the
variances of the four possible homozygotes will be

$$
\begin{aligned}
\mathrm{AABB}:&\ (e + g_{da} + g_{db})^2 &\qquad \mathrm{AAbb}:&\ (e + g_{da} - g_{db})^2 \\
\mathrm{aaBB}:&\ (e - g_{da} + g_{db})^2 &\qquad \mathrm{aabb}:&\ (e - g_{da} - g_{db})^2
\end{aligned}
$$

Thus if we use AABB and aabb as the parents from whose cross the F2 is
raised, the average of their variances will clearly be (gda + gdb)² + e²
while with the alternative pair of parents, AAbb and aaBB, it will be
(gda - gdb)² + e². The variance of F1 will be (e + gha + ghb)² in both cases,
so combining parents and F1 variances in the F2 proportions will give
the term in gda gdb being negative where the cross was AABB × aabb and
positive where it was AAbb × aaBB. The estimate of the basic genetical
component of the variation is thus not free from the effects of the
environmental interaction where two or more genes are involved. These
residual effects depend on cross-product terms of the kinds gda gdb and
gha ghb and as the number of genes rises the number of such terms rises
relative to the number of squared terms of the kinds gda², gdb², gha², ghb²
which are eliminated. The residual effects are therefore prospectively
the more troublesome as the number of genes in the system increases.
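The residual cross-product terms can be made concrete with a small numerical sketch. It assumes two independent genes and two equally frequent environments (+e and -e), with interactions acting additively over the genes as in the single-gene table; the closed form used in the final comparison, ∓gda gdb - ½ gha ghb, is derived here from that assumed model and is not quoted from the text. All names are hypothetical.

```python
from itertools import product


def variance(weighted_genotypes, e):
    """Variance over genotypes and the two environments (+e and -e).

    Each item is (weight, genetic deviation, interaction deviation).
    """
    cells = []
    for w, effect, g in weighted_genotypes:
        for sign in (+1, -1):
            cells.append((0.5 * w, effect + sign * (e + g)))
    mean = sum(w * x for w, x in cells)
    return sum(w * (x - mean) ** 2 for w, x in cells)


def two_gene_residual(da, db, ha, hb, e, gda, gdb, gha, ghb, associated=True):
    """Corrected V1F2 minus (1/2)S(d^2) + (1/4)S(h^2) for two genes."""
    # Per-locus table: F2 frequency, genetic deviation, interaction deviation.
    locus_a = {"AA": (0.25, da, gda), "Aa": (0.5, ha, gha), "aa": (0.25, -da, -gda)}
    locus_b = {"BB": (0.25, db, gdb), "Bb": (0.5, hb, ghb), "bb": (0.25, -db, -gdb)}

    def genotype(a, b, weight):
        _, ea, ia = locus_a[a]
        _, eb, ib = locus_b[b]
        return (weight, ea + eb, ia + ib)

    f2 = [
        genotype(a, b, locus_a[a][0] * locus_b[b][0])
        for a, b in product(locus_a, locus_b)
    ]
    if associated:                      # AABB x aabb
        p1, p2 = ("AA", "BB"), ("aa", "bb")
    else:                               # AAbb x aaBB
        p1, p2 = ("AA", "bb"), ("aa", "BB")

    correction = (
        0.25 * variance([genotype(*p1, 1.0)], e)
        + 0.5 * variance([genotype("Aa", "Bb", 1.0)], e)
        + 0.25 * variance([genotype(*p2, 1.0)], e)
    )
    basic = 0.5 * (da ** 2 + db ** 2) + 0.25 * (ha ** 2 + hb ** 2)
    return variance(f2, e) - correction - basic


args = dict(da=1.0, db=0.7, ha=0.3, hb=-0.2, e=0.5,
            gda=0.20, gdb=0.10, gha=0.15, ghb=0.05)
res_assoc = two_gene_residual(associated=True, **args)
res_disp = two_gene_residual(associated=False, **args)
print(res_assoc, res_disp)
```

In this sketch the gda gdb term changes sign between the associated and dispersed crosses while the gha ghb term does not, which is the behaviour the text describes.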
In the case of the gd terms the residual effects could be eliminated if
all the homozygotes (four with two gene pairs) were available for their
variances to be compounded in finding the correction to be deducted
from V1F2, but this will seldom be possible. The signs of the terms in
gda gdb will, however, depend not only on the intrinsic signs of the
individual gd's but also on whether the relevant genes are associated or
dispersed in the parental homozygotes. If the genes are suitably dispersed
between the parents the net result could be that on summing over all pairs
of gene differences the aggregate S(gda gdb) was negligible. The estimate of
D = S(d²) would then not be greatly affected by the covariance of the
interactions. The sign of the terms in gha ghb, on the other hand, depends
only on the intrinsic signs of the individual gh's. Unless therefore there
is an approach to equality in the number of positive and negative gh's,
the aggregate S(gha ghb) cannot be expected to become negligible.
Similar terms in S(gda gdb) and S(gha ghb) will be associated with the
contributions made by D = S(d²) and H = S(h²) respectively in the variances
derived from later generations such as F3. The relative size of the
contributions made by these terms will depend not only on the variance in
question, whether for example it is V1F3 or V2F3, but also on the detailed
design of the experiment from which the variances are estimated. The
presence of genotype × environment interaction is, however, always
liable to introduce bias into the estimates of D and H, the amount of
bias depending on the extent to which the different gda gdb and gha ghb items
balance out in S(gda gdb) and S(gha ghb) respectively. Thus, wherever
differences in the variances of the two parental lines and the F1 suggest
sizeable interaction components of variation, we must treat the estimates of
D and H with corresponding caution.
Randomly breeding populations
30. The components of variation
So far we have been concerned with the analysis of data obtained from
true-breeding lines and the descendants of crosses made between them.
Following such a cross, a multiplicity of generations and types of fam-
ily can be raised experimentally - a multiplicity limited only by the bio-
logical properties of the material (whether, for example, it can be selfed
as well as crossed, whether individuals can be kept alive for crossing to
their own offspring and so on) and by the time and facilities available
for the experimental programme. Each generation and type of family
will have its own mean and variance, and its own covariances with other
related families. Thus a large number of statistics can be obtained from
which we can estimate the genetical and environmental components of
both means and variances. The specification of these components of
variation is simpler because by starting with true-breeding lines we can,
in the absence of selective elimination, specify the relative frequencies
of the types of zygotes and gametes that we expect in and from any
given type of family.
When however we turn from the descendants of crosses among true-
breeding lines to consider genetically heterogeneous populations of un-
specified constitution, not only is the situation more complex, but the
range of statistics available from the populations is commonly more
limited. We can of course ascertain the mean and variance of the popu-
lation itself; but given that it is in equilibrium and that non-heritable
effects are not changing, these will be the same within sampling vari-
ation from one generation to the next. We can also compare the vari-
ation within families raised from pairs of parents with the variation be-
tween them, and we can look at the covariation between individuals of
different genetical relationships, such as parents and offspring, full-sibs,
half-sibs, first cousins and so on, provided we can recognize individuals
with these relationships. Our analysis will thus depend on differences in
second degree statistics rather than means and we shall not in general
have the direct estimates of non-heritable variation that are provided by
the variation of homozygous lines and their F1's in the experiments we
have discussed in earlier chapters.
Let us consider the gene pair A-a in a population in which mating is at
random, the frequency of allele A being ua and that of allele a being va =
1 - ua. The incidence of the three genotypes in respect of this gene pair
will then be AA ua²; Aa 2ua va; aa va². AA and aa deviate by da and -da
respectively from the mid-parent and Aa by ha. Then in respect of this
gene pair, the population mean will be ua² da + 2ua va ha - va² da = (ua -
va)da + 2ua va ha. The contribution of A-a to the variance of the
population will thus be

$$u_a^2 d_a^2 + 2u_a v_a h_a^2 + v_a^2 d_a^2 - [(u_a - v_a)d_a + 2u_a v_a h_a]^2$$

which reduces to

$$2u_a v_a\,[d_a^2 + 2(v_a - u_a)d_a h_a + (1 - 2u_a v_a)h_a^2]$$

which in its turn can be rewritten as

$$
\begin{aligned}
&2u_a v_a\,[d_a^2 + 2(v_a - u_a)d_a h_a + (v_a - u_a)^2 h_a^2 + 2u_a v_a h_a^2] \\
&\quad = 2u_a v_a\,[d_a + (v_a - u_a)h_a]^2 + 4u_a^2 v_a^2 h_a^2 .
\end{aligned}
$$

Where the genes are independent in their action and uncorrelated in
their distribution within the population, the total heritable variation will
be the sum of a series of such terms, one from each gene pair, namely

$$V_R = S\,2uv[d + (v - u)h]^2 + S\,4u^2 v^2 h^2 .$$
[Figure: contribution plotted against gene frequency (u), from 0 to 1.0.]
Fig. 20. Change in the contribution made by a gene pair to DR and HR
according to u, the frequency of the dominant allele, in a randomly breeding
population, where d = h = 1.
The value of DR thus depends not only on the effects of the various genes
of the system when in the homozygous state, which we denote by d, but
also on h, their effects when heterozygous, and on the allele frequencies
u and v. Only when either h = 0 or u = v = ½ (or of course when both
conditions are satisfied) does DR = D = S(d²). Thus DR is not in general
the additive variation as we have defined and used this term in the earlier
chapters.
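The dependence of the two components on gene frequency, which Fig. 20 displays for d = h = 1, can be tabulated directly. The sketch below assumes the per-gene definitions DR = 4uv[d + (v - u)h]² and HR = 16u²v²h², chosen so that the variance just derived equals ½DR + ¼HR; this is the reading consistent with the later statement that the total heritable variance is ½DR + ¼HR and that DR = D and HR = H when u = v = ½. The function names are illustrative only.

```python
def gene_contributions(u, d, h):
    """Per-gene contributions to DR and HR (definitions assumed as above)."""
    v = 1.0 - u
    dr = 4.0 * u * v * (d + (v - u) * h) ** 2
    hr = 16.0 * u * u * v * v * h * h
    return dr, hr


def direct_variance(u, d, h):
    """Variance of the genotypic values d, h, -d at frequencies u^2, 2uv, v^2."""
    v = 1.0 - u
    freqs = (u * u, 2.0 * u * v, v * v)
    values = (d, h, -d)
    mean = sum(f * x for f, x in zip(freqs, values))
    return sum(f * (x - mean) ** 2 for f, x in zip(freqs, values))


# Tabulate the case of Fig. 20 (d = h = 1) over a grid of gene frequencies.
rows = []
for i in range(11):
    u = i / 10.0
    dr, hr = gene_contributions(u, 1.0, 1.0)
    rows.append((u, dr, hr, direct_variance(u, 1.0, 1.0)))
    print(f"u = {u:.1f}  DR = {dr:.4f}  HR = {hr:.4f}  VR = {rows[-1][3]:.4f}")
```

With complete dominance the DR contribution is largest when the dominant allele is the rarer one, whereas the HR contribution is symmetrical about u = ½.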
It is nevertheless frequently referred to as such. As so used it is the
TABLE 51.
The pair matings in a randomly breeding population in respect of
a single gene difference

                                        Female parents
                              AA             Aa              aa
Male parents     Frequency    u²             2uv             v²

AA     u²        Frequency    u⁴             2u³v            u²v²
                 Mean         d              ½(d + h)        h
                 Variance     0              ¼(d - h)²       0

Aa     2uv       Frequency    2u³v           4u²v²           2uv³
                 Mean         ½(d + h)       ½h              ½(h - d)
                 Variance     ¼(d - h)²      ½d² + ¼h²       ¼(d + h)²

aa     v²        Frequency    u²v²           2uv³            v⁴
                 Mean         h              ½(h - d)        -d
                 Variance     0              ¼(d + h)²       0

Overall mean (u - v)d + 2uvh
the last term being the correction for the overall mean. This reduces to

$$u_a v_a\,[d_a + (v_a - u_a)h_a]^2 + u_a^2 v_a^2 h_a^2 .$$

Summing over all relevant genes we obtain ¼DR + (1/16)HR.
These two variances sum to ½DR + ¼HR, the total heritable variance
of the population, as obviously they must. In the special case of
u = v = ½, where DR becomes D and HR becomes H, the two variances
become respectively ¼D + (3/16)H and ¼D + (1/16)H which we have already
found for V2S3 and V1S3. Thus such families within a population may be
regarded as the general case of biparental families obtained from an F2,
just as the population itself corresponds to the general case of the F2.
The members of a single family are distinguishable in the population
as full-sibs. The covariance of such full-sibs may be obtained directly,
but it is simpler to note that where a population is divided into groups
of like status, such as our families of full-sibs, the mean covariance of
two members of the same group can be shown to equal the variance of
the group means. We can therefore immediately write down the covariance
of full-sibs taken over the population as a whole as ¼DR + (1/16)HR.
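The full-sib results can be checked by enumerating the nine pair matings of Table 51. This is a sketch for a single gene with arbitrary illustrative values of u, d and h; it assumes the per-gene forms DR = 4uv[d + (v - u)h]² and HR = 16u²v²h², and the function name is hypothetical.

```python
from itertools import product


def full_sib_stats(u, d, h):
    """Mean within-family variance and variance of family means (one gene)."""
    v = 1.0 - u
    # Parental genotypes: (frequency, probability of transmitting allele A).
    parents = ((u * u, 1.0), (2.0 * u * v, 0.5), (v * v, 0.0))
    values = (d, h, -d)                       # AA, Aa, aa
    pop_mean = (u - v) * d + 2.0 * u * v * h
    mean_variance = 0.0
    variance_of_means = 0.0
    for (f1, p1), (f2, p2) in product(parents, repeat=2):
        # Offspring genotype probabilities for this mating.
        probs = (p1 * p2, p1 * (1 - p2) + (1 - p1) * p2, (1 - p1) * (1 - p2))
        fam_mean = sum(p * x for p, x in zip(probs, values))
        fam_var = sum(p * (x - fam_mean) ** 2 for p, x in zip(probs, values))
        weight = f1 * f2                      # random mating
        mean_variance += weight * fam_var
        variance_of_means += weight * (fam_mean - pop_mean) ** 2
    return mean_variance, variance_of_means


u, d, h = 0.3, 1.0, 0.6                       # illustrative values only
v = 1.0 - u
DR = 4.0 * u * v * (d + (v - u) * h) ** 2
HR = 16.0 * u * u * v * v * h * h
mean_variance, variance_of_means = full_sib_stats(u, d, h)
print(mean_variance, 0.25 * DR + 3.0 * HR / 16.0)
print(variance_of_means, 0.25 * DR + HR / 16.0)
```

The second quantity is also the covariance of full-sibs, since the mean covariance of members of the same group equals the variance of the group means.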
Where the mating system of a population is such that any parent may
leave a number of offspring, the second parent of which is however
prospectively different for each of them, this second parent being drawn
at random from the population, full-sibs will be rare but groups with one
common parent, and composed therefore of what are termed half-sibs,
may be recognized. In such a case the second parents may be regarded as
providing a set of gametes having the population frequencies of A and a,
namely ua and va. The properties of these families will thus be as shown
in Table 52. The contributions of A-a to the mean variance of the single
parent families and to the variance of their means are given at the foot of
TABLE 52.
Families of individuals having one parent in common, and thus composed
of half-sibs (HS), in a randomly breeding population in
respect of a single gene difference
[Note: since the second parents of the progeny of any common parent
are drawn at random from the population they may be
regarded as providing an array of uA + va gametes for
fusion with those of the common parent]

                           Common parent
                     AA        Aa           aa
                     u²        2uv          v²        Frequency in population
                     d         h            -d        Phenotype
Progeny
AA     d             u         ½u           0         Frequency in family
Aa     h             v         ½(u + v)     u
aa     -d            0         ½v           v
the table, and it will be seen that on summing over the relevant gene
differences, the heritable portion of mean variance becomes ⅜DR + ¼HR and
that of the variance of the family means becomes ⅛DR. These two
variances sum to ½DR + ¼HR, the heritable variance of the population, as
indeed they clearly should. The covariance of the half-sibs, of which these
families are composed, will of course be the same as the variance of the
family means, namely ⅛DR.
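The half-sib results follow in the same way from the family compositions of Table 52. The sketch below again treats a single gene with arbitrary illustrative values and assumes DR = 4uv[d + (v - u)h]² and HR = 16u²v²h² per gene; the function name is hypothetical.

```python
def half_sib_stats(u, d, h):
    """Mean within-family variance and variance of family means (one gene)."""
    v = 1.0 - u
    values = (d, h, -d)                       # AA, Aa, aa
    pop_mean = (u - v) * d + 2.0 * u * v * h
    mean_variance = 0.0
    variance_of_means = 0.0
    # Common parent: (frequency, probability of transmitting allele A).
    for freq, p in ((u * u, 1.0), (2.0 * u * v, 0.5), (v * v, 0.0)):
        # The other gamete comes from the population array uA + va.
        probs = (p * u, p * v + (1 - p) * u, (1 - p) * v)
        fam_mean = sum(q * x for q, x in zip(probs, values))
        fam_var = sum(q * (x - fam_mean) ** 2 for q, x in zip(probs, values))
        mean_variance += freq * fam_var
        variance_of_means += freq * (fam_mean - pop_mean) ** 2
    return mean_variance, variance_of_means


u, d, h = 0.3, 1.0, 0.6                       # illustrative values only
v = 1.0 - u
DR = 4.0 * u * v * (d + (v - u) * h) ** 2
HR = 16.0 * u * u * v * v * h * h
hs_mean_variance, hs_variance_of_means = half_sib_stats(u, d, h)
print(hs_mean_variance, 3.0 * DR / 8.0 + 0.25 * HR)
print(hs_variance_of_means, DR / 8.0)
```

Note that HR makes no contribution to the variance of the half-sib family means, so this statistic depends on DR alone.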
One further statistic may be found from Table 52. The covariance of
a single parent and its offspring is found as the covariance of the
common parent and the mean of its offspring as set out in the table. This is
clearly

$$
\begin{aligned}
&u_a^2 d_a[u_a d_a + v_a h_a] + u_a v_a h_a[(u_a - v_a)d_a + h_a] + v_a^2 d_a[v_a d_a - u_a h_a] \\
&\quad - [(u_a - v_a)d_a + 2u_a v_a h_a]^2 = u_a v_a\,[d_a + (v_a - u_a)h_a]^2
\end{aligned}
$$

the correction term being the square of the population mean, since this
is the mean of all the parents as well as the mean of all their progeny.
Summing over all the relevant gene differences then shows the covariance
of parent and offspring to be ¼DR.
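The parent/offspring covariance can be verified numerically in the same spirit. This sketch takes a single gene, pairs each common parent's own value with the mean of its half-sib progeny, and compares the result with uv[d + (v - u)h]²; the values of u, d and h are arbitrary and the function name is hypothetical.

```python
def parent_offspring_covariance(u, d, h):
    """Covariance of a parent with the mean of its offspring (one gene)."""
    v = 1.0 - u
    values = (d, h, -d)                       # AA, Aa, aa
    pop_mean = (u - v) * d + 2.0 * u * v * h
    cross_product = 0.0
    # Parent: (frequency, probability of transmitting A, parental value).
    for freq, p, parent_value in (
        (u * u, 1.0, d),
        (2.0 * u * v, 0.5, h),
        (v * v, 0.0, -d),
    ):
        probs = (p * u, p * v + (1 - p) * u, (1 - p) * v)
        offspring_mean = sum(q * x for q, x in zip(probs, values))
        cross_product += freq * parent_value * offspring_mean
    # Parents and offspring share the population mean.
    return cross_product - pop_mean ** 2


u, d, h = 0.3, 1.0, 0.6                       # illustrative values only
v = 1.0 - u
w_po = parent_offspring_covariance(u, d, h)
print(w_po, u * v * (d + (v - u) * h) ** 2)
```

With DR taken per gene as 4uv[d + (v - u)h]² (an assumption consistent with the totals quoted in the text), the printed value is ¼DR.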
All that remains to complete these formulations of different variances
and covariances derivable from the population is to add in the appropriate
items for non-heritable and sampling variation. Here as in our earlier
consideration (Section 12) we must distinguish between the non-heritable
variation among members of the same family and that between
families. If we denote by Ew the non-heritable variance within
families, the mean variances of full-sib and half-sib families become
respectively ¼DR + (3/16)HR + Ew and ⅜DR + ¼HR + Ew. Where Eb is the
non-heritable variance between families the variances of family means
must obviously include Eb. They will, however, also include an item for
sampling variation which will of course be (1/n)V, where V denotes the
relevant mean variance and n is the number of individuals in the family
or the harmonic mean of these numbers if they vary from one family to
another. Thus if V1SR and V2SR stand for the variance of the mean and
the mean variance of full-sib (S) families as observed in a randomly
breeding population

$$V_{1SR} = \tfrac14 D_R + \tfrac1{16} H_R + E_b + \tfrac1n V_{2SR}$$

and

$$V_{2SR} = \tfrac14 D_R + \tfrac3{16} H_R + E_w .$$

Similarly for half-sib families, denoted by the inclusion in the suffix of
HS in place of S standing for full-sibs,

$$V_{1HSR} = \tfrac18 D_R + E_b + \tfrac1n V_{2HSR}$$

and

$$V_{2HSR} = \tfrac38 D_R + \tfrac14 H_R + E_w .$$
Now if the individuals from a family are distributed independently of
one another across the range of the environments throughout their lives
there will be no cause of non-heritable variation between families ad-
ditional to those within, and Eb = O. But if families remain together,
perhaps also enjoying parental attention as in many animal species, or
being endowed by the mother with nutritional resources on which to
draw during early life as happens in both plants and animals, there will
be non-heritable differences between families which go beyond those
within: Eb is then > 0 and will be reflected by a corresponding increase
in the variance of family means. Furthermore, since members of the
same family will share the same environment in respect of such family
effects while members of different families will not, their covariance
will reflect Eb also, whether they are sibs or half-sibs. An Eb component
must therefore also be included in these covariances which thus become

$$W_{SR} = \tfrac14 D_R + \tfrac1{16} H_R + E_b$$

and

$$W_{HSR} = \tfrac18 D_R + E_b .$$

And if offspring in some measure share the environment of their parents
the same will be true to a corresponding extent of the parent/offspring
covariance, which must thus be written as

$$W_{POR} = \tfrac14 D_R + E'_b$$

the prime indicating that the non-heritable effects common to parent
and offspring may not be just the same as that shown by members of
the same progeny.
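For planning or checking an analysis it can help to have these expectations as a single function. The following is a minimal sketch: the function and key names are hypothetical, the parameter values in the example are arbitrary, and the same family size n is used for full-sib and half-sib families purely for simplicity.

```python
def expected_statistics(DR, HR, Ew, Eb, Eb_prime, n):
    """Expected second-degree statistics in a randomly breeding population.

    n is the number of individuals per family (or its harmonic mean).
    """
    v2_sr = DR / 4.0 + 3.0 * HR / 16.0 + Ew      # mean variance, full-sibs
    v2_hsr = 3.0 * DR / 8.0 + HR / 4.0 + Ew      # mean variance, half-sibs
    return {
        "V1SR": DR / 4.0 + HR / 16.0 + Eb + v2_sr / n,
        "V2SR": v2_sr,
        "WSR": DR / 4.0 + HR / 16.0 + Eb,
        "V1HSR": DR / 8.0 + Eb + v2_hsr / n,
        "V2HSR": v2_hsr,
        "WHSR": DR / 8.0 + Eb,
        "WPOR": DR / 4.0 + Eb_prime,
    }


# Arbitrary illustrative parameter values.
stats = expected_statistics(DR=1.0, HR=0.8, Ew=0.5, Eb=0.2, Eb_prime=0.1, n=10)
for name, value in stats.items():
    print(f"{name:6s} {value:.4f}")
```

In either kind of family the covariance plus the mean variance recovers the total variance of the population, ½DR + ¼HR + Ew + Eb, which gives a quick consistency check on any set of expectations.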
These various results are collected together in Table 53. Two points
remain to be made about them. First, where nutritional resources for
early life are provided by the mother, or where parental attention is
provided and it is not the same from mother and father, the Eb compo-
nent in the covariance of half-sibs will be different according to whether
the common parent is mother or father. Secondly the non-heritable vari-
ance of the population as a whole will be Ew + Eb since each individual
will reflect both effects. Where Eb = 0 this non-heritable component of
TABLE 53.
Composition of variances and covariances in a randomly breeding population

Relationship              Statistic
Full-sib families         V1SR  = ¼DR + (1/16)HR + Eb + (1/n)V2SR
(both parents common)     V2SR  = ¼DR + (3/16)HR + Ew
                          WSR   = ¼DR + (1/16)HR + Eb
Half-sib families         V1HSR = ⅛DR + Eb + (1/n)V2HSR
(one parent common)       V2HSR = ⅜DR + ¼HR + Ew
                          WHSR  = ⅛DR + Eb
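$$r_{PO} = \frac{\tfrac14 D_R + E'_b}{\tfrac12 D_R + \tfrac14 H_R + E_w + E_b} = 0.4180$$

and

$$r_{SS} = \frac{\tfrac14 D_R + \tfrac1{16} H_R + E_b}{\tfrac12 D_R + \tfrac14 H_R + E_w + E_b} = 0.4619 .$$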
The denominator used in finding rpo is, of course, the geometric mean
of the variances of parents and offspring, but when single parents and
single offspring are used in finding WpOR and these are a fair sample from
the population, the variances of both parents and offspring, and hence
their geometric mean, will all be VR as shown. The same argument applies
to the denominator used in finding rss.
If we could further assume that the non-heritable variation between
individuals from different families was no greater than that between
individuals from the same family, i.e. Eb = E'b = 0, these equations would
reduce to

$$\frac{\tfrac14 D_R}{\tfrac12 D_R + \tfrac14 H_R + E_w} = 0.4180$$

and

$$\frac{\tfrac14 D_R + \tfrac1{16} H_R}{\tfrac12 D_R + \tfrac14 H_R + E_w} = 0.4619 .$$
Although we would still have three parameters with only two equations
and so be unable to estimate the numerical values of the parameters, we
could find their values relative to one another or more usefully find the
relative contributions that DR, HR and Ew made to VR, the total variance
of the population. Thus
MZA
Vp      14.5608    14.7307    ½DR + ¼HR + ½Ew + ½Eb
V'p      9.6635     5.0000    Ew + Eb
The estimates of the total variance of the MZT and the two MZA
samples are not significantly different. This is expected on the model
since they should all be estimates of ½DR + ¼HR + Ew + Eb. Equally the
mean scores in the three samples do not differ significantly, being 9.72,
11.86 and 10.71 respectively. This too is expected on our model which
assumes that all three samples are drawn from the same population and,
therefore, have the same genetical and environmental sources of
variation. This does not of course mean that the specification of the genetical
component as ½DR + ¼HR and the environmental component as Ew + Eb
is necessarily adequate but that the genetical and environmental
components are the same for all three samples whatever their compositions. We
can, therefore, regard the males and females as replicate estimates of the
statistics for the purposes of analysis. For twins raised apart therefore
         Statistic    Observed    Expected    Model
MZT      Vp           11.0819     13.0290     ½DR + ¼HR + ½Ew + Eb
         V'p           8.1207      7.7199     Ew
MZA      Vp           14.6458     13.0290     ½DR + ¼HR + ½Ew + ½Eb
         V'p           7.3317      7.7199     Ew + Eb
DZT      Vp           11.7828     10.7368     ⅜DR + (5/32)HR + ½Ew + Eb
         V'p          13.8552     12.3045     ¼DR + (3/16)HR + Ew
Fitting the full model by least squares procedures confirms our earlier
conclusion that Eb is not significantly different from zero and also reveals
that HR is not significant. A DR, Ew model may, therefore, be fitted which
with six observed statistics leaves 4 df for testing the adequacy of the
model against the replicate error. The least squares estimates are
DR = 18.6248 and Ew = 8.1605.
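The unweighted least squares fit of the two-parameter model can be reproduced by solving the two normal equations. This sketch assumes that the six observed statistics are the first numerical column of the twin table above and that, with HR and Eb omitted, their expectations have the coefficients of DR and Ew shown in the `design` list; the function name is hypothetical.

```python
# Observed statistics: MZT, MZA and DZT, each as (pair-mean variance,
# within-pair variance), taken from the table in the text.
observed = [11.0819, 8.1207, 14.6458, 7.3317, 11.7828, 13.8552]

# Coefficients of (DR, Ew) in the expectation of each statistic once
# HR and Eb are dropped from the model.
design = [
    (0.5, 0.5),     # MZT pair means:  1/2 DR + 1/2 Ew
    (0.0, 1.0),     # MZT within:      Ew
    (0.5, 0.5),     # MZA pair means:  1/2 DR + 1/2 Ew
    (0.0, 1.0),     # MZA within:      Ew
    (0.375, 0.5),   # DZT pair means:  3/8 DR + 1/2 Ew
    (0.25, 1.0),    # DZT within:      1/4 DR + Ew
]


def fit_two_parameters(design, observed):
    """Ordinary least squares for two parameters via the normal equations."""
    s11 = sum(a * a for a, _ in design)
    s12 = sum(a * b for a, b in design)
    s22 = sum(b * b for _, b in design)
    t1 = sum(a * y for (a, _), y in zip(design, observed))
    t2 = sum(b * y for (_, b), y in zip(design, observed))
    det = s11 * s22 - s12 * s12
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det


DR_hat, Ew_hat = fit_two_parameters(design, observed)
print(f"DR = {DR_hat:.4f}, Ew = {Ew_hat:.4f}")
```

Six statistics and two fitted parameters leave the 4 df mentioned in the text for testing the adequacy of the model.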
We can of course go further and obtain improved estimates of DR and
Ew by weighting the observed statistics by their amounts of information
(Section 9). This method gives
DR = 18.3380 ± 4.9884    c = 3.68    P < 0.001
Ew =  7.7199 ± 1.2755    c = 6.05    P < 0.001
which agrees with the estimates from the simpler calculations just given
and with the estimates based on monozygotic twins alone. The test of
the fit of the model based on the comparison of the observed and ex-
pected statistics from the weighted estimation leads to an approximate
χ²(4) = 1.3717 (P = 0.80) which confirms once more the adequacy of
the simple model.
These four degrees of freedom for testing the adequacy of the model
are made up of two parts. Two degrees of freedom are testing the effect
of omitting HR and Eb from the full model and two are testing the
equality of the total variance components of the three types of twins
which are expected to be equal on the model. Since the DR, Ew model
is adequate this confirms that HR and Eb are not significantly different
from zero and that the total variance components do not differ signifi-
cantly. This in turn confirms the earlier test of the homogeneity of the
three total variances. We can conclude, therefore, that dominance and
the family environment have no detectable effects on the Neuroticism
score and that all three types of twins, that is MZT, MZA and DZT,
are subject to the same heritable and environmental sources of variation.
Hence, the results provide no evidence for the often assumed greater en-
vironmental heterogeneity experienced by dizygotic relative to mono-
zygotic twins.
We have now considered three sets of data each of which allow us to
separate heritable from non-heritable sources of variation. In each set,
however, it is the presence of monozygotic twins raised apart that has
permitted this partitioning. Indeed, as we have seen, we can make this
partitioning solely on the basis of MZA scores and at the same time have
available the best test for genotype X environment interactions. What we
cannot do however, without involving other types of twin data or other
kinds of family relationships is to test any other assumptions we may
care to make about the sources of variation, mating system, etc.
Providing that we retain MZT and DZT scores we can substitute di-
zygotic twins reared apart (or full-sibs reared apart) DZA for MZA to
obtain an almost equally effective test of the assumptions and estimates
of the parameters of the additive-dominance model of gene action if
adequate. The expectations of the two variances for DZA on this model
for a randomly mating population are
Vp  = ⅜DR + (5/32)HR + ½Ew + ½Eb
V'p = ¼DR + (3/16)HR + Ew + Eb.
As we have seen twins, or alternatively full-sibs, raised apart are in-
valuable for unambiguously separating heritable and non-heritable sources
of variation. The extent to which they allow us to achieve this objective,
however, rests on the validity of the assumption that the two individuals
of each twin pair are distributed at random among the family environ-
ments present in the population. We can test whether 'foster' homes are
a random sample of family environments by comparing their mean and
variance for any particular measure with those of a random sample of
'own' homes. There are a variety of measures we can use for this purpose.
We could, for example, measure the physical environment directly using
an index such as socio-economic class that has been developed by social
scientists for comparing family environments. Equally, of course, we
could measure the environment biologically as we did in Chapter 6 to
analyse genotype X environment interactions. One measure might then
be the phenotypes of the parents, either biological or foster, who pro-
vide the home environment in respect of the character in question.
This can tell us whether foster homes are a random sample. It does
not, however, tell us whether the separated twins were allocated to this
sample of foster homes at random. That is, whether there is a 'place-
ment' effect because successful attempts have been made to match the
fostered individuals with the foster home. In such a case the separated
twins would have been raised independently but in similar family en-
vironments. In order to test for such effects we would have to look for
a correlation between the family environments of separated twins. Our
measure of the family environments would again be based on an en-
vironment index or the phenotypes of the foster parents. Only if the
correlations were non-significant could we conclude that the separated
twins provided a valid estimate of the total environmental effects.
Much of the available twin data consists of MZT and DZT and as we
have already noted an unambiguous analysis of such data is not gener-
ally possible because the simplest additive-dominance, random mating
model has four parameters and we can fit only three as a maximum. If
one of the two parameter models fits, for example Ew and Eb or Ew and
DR' and the others fail we can be confident of the results. If, however,
all the two parameter models fit equally well or fail equally badly no
unambiguous conclusion is possible. We have no basis for choosing be-
tween the alternative two parameter models and all three parameter
models are equally satisfactory since all would lead to perfect fit sol-
utions. What can and cannot be achieved in these circumstances is well
illustrated by the work of N. G. Martin (1975).
Even more typical of the kind of twin data found in the literature are
the observations of Holt (1952) on the number of palm print ridges in
man which are presented in the form of correlations for MZT, DZT and
full-sib families. Although correlations provide a useful summary of the
data and are widely used in human genetics, they are not a good starting
point for an analysis. In particular we cannot carry out any of the tests
of assumptions that depend on a comparison of total variances. As
correlations the data have been standardized to the same unit total vari-
ance for all kinds of families and at the same time we lose one statistic
from each kind of family.
For this character mating is known to be at random. The correlation
for monozygotic twins on the additive-dominance model is therefore
$$r = 0.96 = \frac{\tfrac12 D_R + \tfrac14 H_R + E_b}{\tfrac12 D_R + \tfrac14 H_R + E_w + E_b}$$
which can be rewritten
Item                                 df     MS
Blocks                               11     0.0153
Sub-blocks                           12     0.0063
Male groups                          36     0.0167 *
Families within groups              144     0.0069 *
Plots within families               178     0.0031 *
Sampling variance of plot means     250     0.0017

The analysis is in terms of plot means.
* Significant when tested against the appropriate error variance,
which in all these cases is the MS immediately below.
the difference between sub-blocks, 15 for differences among the 16 fam-
ilies in the block and 15 for sub-block X family interaction. The first item
is of little interest to us, but the second provides information about the
effects of the genetical differences among the 16 families and the third
item is a direct measure of the variance of the non-heritable component
of variation in the family means. The 15 df for family differences are sub-
divisible into 3 for differences among the progenies of the 4 males and
3 X 4 = 12 for the differences among the progenies of the females mated
to the same male, averaged over the 4 males of the block. This last item
is clearly a measure of the variance of means of full-sib families, while
the former measures the variance among the means of half-sib groups of
families, since the 4 families tracing back to a single male each has a dif-
ferent mother and are therefore in the half-sib relationship to one
another.
Since the 12 blocks are derived from 12 different sets each of 4 males
and 16 females, we can pool corresponding items from all the blocks and
find 1 × 12 = 12 df for sub-block differences, 3 × 12 = 36 for differences
among the progenies of different males, 12 × 12 = 144 for differences
among the females mated to the same male, and 15 × 12 = 180 df
for the non-heritable component of variation of family means. Since,
however, two plots failed in the experiment, their means were estimated
by the standard missing plot technique and 2 df were lost from this total
of 180 leaving 178 in the analysis. There are of course 11 df for differ-
ences among the 12 block totals, but, like the 12 df for sub-block differ-
ences, these are of little interest to us. Each plot contained 10 plants
except in a few cases. The results were recorded as the mean yield per
plant for each plot and an analysis of variance was carried out on a
single plot basis. A further observation was, however, made. The mean
variance of plants within plots was found from a sample of the plots
used in this and another related experiment and used to derive an esti-
mate of the sampling variance of the plot means, which is recorded as
0.0017 by Robinson et al. Where Vw is the mean variance within plots
the sampling variance of the mean of plots of 10 plants would be (1/10)Vw,
but there were missing plants in a few plots and the divisor 10 was
therefore replaced by 9.4 which is the harmonic mean of the actual numbers
of plants in the plots.
The results of the analysis of variance require little comment. The non-
heritable variation of plot means, estimated from the family X sub-block
interaction, is clearly greater than the sampling variance of plot means
arising from the variance of plants within plots. The MS for family X
sub-block interaction must therefore be used for testing the MS between
females within males which, if significant, must itself be used for testing
the MS between males. Although the VR's are not large, with the high
number of df available these two items are both significant when so
tested and thus combine to provide evidence for genetical variation
among the families. The differences between sub-blocks are not signifi-
cant, while those between blocks are, but as already noted these items
are of little interest for our present analysis and will be used no further.
The further analysis of the variation into the various heritable and
non-heritable components can be carried out directly from the MS's in
the analysis of variance set out in Table 54. This is in fact the approach
used by Robinson et al. (and see M and J, pp. 226 et seq.). It is, how-
ever, somewhat easier to follow if we first find the variance of plot means,
that of family means within male groups (i.e. within groups having a com-
mon male parent) and that between male group means, all of which are
easily derivable from the MS 's ·of Table 54. Since the analysis of variance
was based on single plot observations, the variance of plot means within
families is given directly by the MS for family X sub-block interaction.
Each family included two plots, one in each sub-block, and the variance
of family means within male groups is thus 1/2 the MS between families
within groups. Finally each male group includes four families each raised
in two plots, and the variance of male group means thus becomes
1/(4 × 2) = 1/8 of the MS between males. The variances so calculated are
listed in Table 55, which also includes the mean variance within plots.
TABLE 55.
Components of variation of yield in the maize experiment
which is, of course, the expectation we have already found for the overall
variance of means of biparental families. The NC1 mating system has
thus enabled us to break the overall variance of family means into two
recognizable parts having different expectations in terms of our par-
ameters and so add a further equation for the estimation of the par-
ameters.
Returning to our analysis, we note that the means of families within
male groups are each based on two plots. Their variance will thus have an
expectation of $\tfrac{1}{8}D_R + \tfrac{1}{16}H_R + \tfrac{1}{2}E_b + \tfrac{1}{20}V_{2SR}$, allowing us to estimate
$\tfrac{1}{8}D_R + \tfrac{1}{16}H_R$ as $0.00345 - \tfrac{1}{2}(0.00313) = 0.00188$. Since the male groups
each include four families their means will be subject to a sampling variance
of one-quarter the variance of individual family means. The expectation
for the variance of male group means is thus
$\tfrac{1}{8}D_R + \tfrac{1}{4}(\tfrac{1}{8}D_R + \tfrac{1}{16}H_R + \tfrac{1}{2}E_b + \tfrac{1}{20}V_{2SR})$
and we can estimate $\tfrac{1}{8}D_R$ by deducting one-quarter
the variance of family means within male groups from the variance of
male group means, giving $\tfrac{1}{8}D_R = 0.00209 - \tfrac{1}{4}(0.00345) = 0.00123$.
We now have the estimates $\tfrac{1}{8}D_R + \tfrac{1}{16}H_R = 0.00188$ and $\tfrac{1}{8}D_R = 0.00123$
giving $D_R = 8 \times 0.00123 = 0.00984$ and
$H_R = 16 \times (0.00188 - 0.00123) = 0.0104$.
Finally we note that the variance of individuals within families is
$\tfrac{1}{4}D_R + \tfrac{3}{16}H_R + E_w = 0.01598$ and now having estimates of $D_R$ and $H_R$ we
can complete the analysis by finding
$E_w = 0.01598 - \tfrac{1}{4}(0.00984) - \tfrac{3}{16}(0.01040) = 0.0116$.
The estimates of the four parameters DR, HR, Ew and Eb are assembled
at the foot of Table 55. Since there were only four equations (provided
by the variances of male group means, of family means within male
groups, of plot means within families and of individuals within plots
respectively) the solutions give perfect fit estimates of the parameters
and we therefore have no test of adequacy of the model: at least one
more equation, whose provision would require the experiment to be
further elaborated in an appropriate way, would be needed for such a
test of adequacy.
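The arithmetic of this section can be retraced in a few lines. The following Python sketch is an illustration only: the function name `nc1_components` and its arguments are our own, the four input variances are those quoted in the text, and the expression used for Eb (the variance of plot means less the sampling variance Vw/9.4) is our reading of the analysis rather than a formula stated explicitly above.

```python
def nc1_components(v_males, v_families, v_plots, v_within,
                   families_per_male=4, plots_per_family=2,
                   plants_per_plot=9.4):
    """Perfect-fit estimates of D_R, H_R, E_w and E_b from an NC1 design."""
    # (1/8)D_R: variance of male group means less its sampling variance
    eighth_dr = v_males - v_families / families_per_male
    # (1/8)D_R + (1/16)H_R: variance of family means less the part due to plots
    eighth_dr_plus_sixteenth_hr = v_families - v_plots / plots_per_family
    d_r = 8 * eighth_dr
    h_r = 16 * (eighth_dr_plus_sixteenth_hr - eighth_dr)
    # variance within families is (1/4)D_R + (3/16)H_R + E_w
    e_w = v_within - d_r / 4 - 3 * h_r / 16
    # assumed: plot-mean variance is E_b plus the sampling variance V_w/9.4
    e_b = v_plots - v_within / plants_per_plot
    return d_r, h_r, e_w, e_b


# Variances of male group means, family means, plot means and individuals
d_r, h_r, e_w, e_b = nc1_components(0.00209, 0.00345, 0.00313, 0.01598)
print(f"D_R={d_r:.4f} H_R={h_r:.4f} E_w={e_w:.4f} E_b={e_b:.4f}")
```

Because the text rounds the intermediate quantities before multiplying by 8 and 16, the values obtained here differ slightly in the last figure from those quoted, but Ew + Eb again comes to about 0.013.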
Various more elaborate experimental designs have been proposed
from time to time, and have indeed been used in practice in a limited
number of cases. There is, for example, the design often referred to as
North Carolina 2, in which a number of male and female parents are
used, but with every male mated to every female. This yields a quasi-
diallel set of crosses, resembling the diallel in that every male genotype
is mated to every female, and of course vice versa; but differing from it
in that (a) the male parents and female parents are separate samples
from the population of genotypes, there being no necessary correspon-
dence between them in either genotype or number, and (b) being sam-
ples from an open bred population, the parents are not fully homo-
zygous as are the parents of the diallels we discussed in Chapter 4. The
data from an NC2 experiment can nevertheless be analysed like a diallel,
although for reason (b) above, they will not yield the same estimates of
the genetical parameters as a true diallel. Thus, the variances of the
means of both the male and the female arrays yield estimates of $\tfrac{1}{8}D_R$,
and not of $\tfrac{1}{4}D_R$ as with a true diallel, and similarly the term for interaction
of male and female parents in the simple analysis of variance of
the quasi-diallel table depends on $\tfrac{1}{16}H_R$, not $\tfrac{1}{4}H_R$ as in the true diallel.
Finally, the mean variance within families has a genetical component,
$\tfrac{1}{4}D_R + \tfrac{3}{16}H_R$, in a quasi-diallel whereas in a true diallel this variance
within families is wholly non-heritable. Since this design yields two
estimates of $D_R$, from the means of male and female arrays respectively,
it affords in principle a test of adequacy of the model, but it will clearly
be more a test of the assumption that male and female parents contrib-
ute equally to the phenotype of the progeny, i.e. that there are, for
example, no maternal effects, than of anything else.
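The conversion from the items of an NC2 analysis to the genetical parameters is simple enough to set out as a short sketch. It assumes that the inputs are variance components already freed of sampling and non-heritable variation, that the array components each estimate one-eighth of DR and that the interaction component estimates one-sixteenth of HR. The function name and the numerical values are hypothetical and serve only to show the arithmetic.

```python
def nc2_components(var_male_arrays, var_female_arrays, var_interaction):
    """Convert NC2 variance components into estimates of D_R and H_R."""
    d_r_from_males = 8 * var_male_arrays      # male arrays estimate (1/8)D_R
    d_r_from_females = 8 * var_female_arrays  # female arrays give a second estimate
    h_r = 16 * var_interaction                # interaction estimates (1/16)H_R
    return d_r_from_males, d_r_from_females, h_r


# Hypothetical variance components, chosen only to show the arithmetic.
d_r_m, d_r_f, h_r_nc2 = nc2_components(0.0012, 0.0015, 0.0006)
print(d_r_m, d_r_f, h_r_nc2)
```

A marked disagreement between the two estimates of DR would point, as noted above, to unequal contributions of the male and female parents rather than to any other failure of the model.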
Where a number of inbred, homozygous lines are available from the
population, or are otherwise readily made from it, a true diallel exper-
iment may be carried out and analysed in the normal way. Appropriate
sets of homozygous lines will however seldom be available, although such
a set has been used in at least one case. Where analysis can be carried out
by such a true diallel experiment, it will afford a better test of adequacy
of the model and will yield more informative estimates of the parameters
in the sense that their standard errors will be lower from an experiment
involving a given number of individuals, than will any of the other designs,
just as NC2 is more informative than NC1 (M and J, pp. 241-3). A true
diallel, however, demands a suitable sample of homozygous lines, and
even an NC2 requires the capacity for producing a series of different
progenies from a single female by controlled matings with successive
males. Such a controlled multiplicity of matings is more likely to be
possible with plants than with animals, where indeed the possibilities
must commonly be restricted to the NC1 design. In general the choice
of design will be governed more by the biological possibilities of the
species than anything else. Also because the analysis of NC1 experiments
depends on the partitioning of variances, and variances whose
genetical components involve DR and HR with such low coefficients as
1/8 and 1/16, such experiments must be large, involving large numbers
of individuals and hence demanding of resources to carry out, if they
are to yield informative estimates of the genetical components.
34. Complicating factors
The assumptions on which is based the model we have used in the gen-
etical analysis of populations are (a) that the genes, both allelic and non-
allelic, are distributed independently of one another in the population
under analysis and (within the limits imposed by the mating system used)
in the progenies on which are based the observations used in the analysis,
and (b) that the genes display neither non-allelic interaction nor geno-
type X environment interaction in expressing their effects. The assump-
tion of independence of gene distribution is primarily the assumption of
random mating: linkage will have little effect in a randomly mating popu-
lation unless the forces of selection impinging on the population are such
as to produce a marked linkage disequilibrium. The assumption of ran-
dom mating does not always hold good. We have already seen that there
is assortative mating (that is a phenotypic correlation between mates) in
man and it is known that mating can depart from randomness in popu-
lations of other animal species also. Indeed anything that affects the
time of sexual maturity or mating behaviour and choice can prospec-
tively lead to non-random mating. In plants a variety of mechanisms are
known to affect mating, some leading to an excess of self-mating and
others virtually to exclusive cross-mating. The latter may be regarded as
a means of ensuring effectively random mating in respect of all the genes
except those governing the mechanism itself (see Mather, 1973). The
former by encouraging self-mating must generally lead to marked depar-
tures from randomness in the direction of inbreeding and hence to pro-
portions of homozygotes in excess of those expected from the Hardy-
Weinberg equilibrium in respect of any genes that vary in the population.
Assortative mating is the preferential coming together of individuals in
mating pairs on the basis of similarity (or, in negatively assortative mating,
of dissimilarity) of their phenotypes. Inbreeding is the preferential coming
together of individuals in mating pairs on the basis of closer than average
family, and hence genetic, relationship. Inbreeding may be held to imply
a form of assortative mating; but the distinction between them is never-
theless an important one, as their consequences are not the same. They
differ in several ways. Inbreeding will tend to raise the proportion of
homozygotes in the population and if sufficiently close will lead to com-
plete homozygosis apart from the effect of recurrent mutation.
Furthermore it will do so for all the genes in the nucleus, with the
result that, as in Johannsen's beans, the population will consist of a mix-
ture of true-breeding lines. Assortative mating on the other hand, depen-
ding as it does only on phenotypic similarity, will be affected by non-
heritable agencies as well as by heritable: it will affect the distribution
of the genes mediating the character in question, but it need not lead to
any marked increase in homozygosis, even where the contribution of
non-heritable agencies is small. Indeed it will not result in any signifi-
cant rise in the proportion of homozygotes where the variation in the
expression of the character in question is mediated by a reasonably
large number of gene-differences whose effects are not grossly dissimi-
lar in magnitude. Thus the consequences of assortative mating and in-
breeding will appear in different ways in respect of continuous variation.
Because of the association of non-allelic genes of similar effect to which
it leads, assortative mating raises the contribution of DR to the variation
of the character in the population, while in so far as it does not lower
the proportion of heterozygotes, it leaves the contribution of HR un-
changed. Because it raises the proportion of homozygotes, inbreeding
also raises the contribution of DR to the variation, but because of the
concomitant reduction in the proportion of heterozygotes, the contribution
of HR is correspondingly lowered. With complete inbreeding
HR vanishes entirely from the composition of the variation.
Where assortative mating is operative, it can be accommodated by the
approach due to Fisher (1918) to which we have already made a brief
reference, and which has been illustrated further in its analytical situation
by Jinks and Fulker (1970). Where inbreeding is complete it is
easily accommodated in the analysis. The population then consists of
nothing but homozygotes in the proportions u AA : v aa, and its variance
will be $D_P + E_w + E_b$, where $D_P = S(4u_a v_a d_a^2)$ as shown when we were
considering the variance of the homozygous parents of a diallel in Section 18.
Where inbreeding is only partial the situation is more complex,
involving $D_R$, $H_R$ and $f$, the inbreeding coefficient, as well as $D_P$. The
analysis then becomes correspondingly complicated.
Turning to interactions, the presence of genotype X environment in-
teraction is easy to detect by a comparison of the variance of the popu-
lation over two or more environments. If the simple model assuming no
such interaction is adequate, the variances of the population will be
homogeneous: any significant heterogeneity of their variances will show
that genotype X environment interaction must be taken into account.
Kearsey (1965) has reported an analysis of the variation in flowering time
of a randomly bred population of the poppy, Papaver dubium, which he
carried out using a number of experimental designs, two of which were
NC1 and NC2. He sowed samples of each of the experimental progenies
that he used in the analysis of the population, at two different times, so
making it possible to compare the variances they yield when grown in
the two different environments experienced by plants raised at two dif-
ferent periods of the year. The mean variances of the families following
the two sowings are shown for both his NC1 and NC2 experiments in
Table 56. Each of these four MS are based on over 320 df, and it is clear
TABLE 56.
Variation in flowering time of a population of poppies (Kearsey, 1965)

                       Experiment
          Sowing     NC1     NC2     Mean     Ratio 1/2
VF           1        36      49     42.5
             2        19      23     21.0       2.02
DR           1        45      30     37.5
             2        10      14     12.0       3.13
HR           1        76     159    117.5
             2        46      42     44.0       2.67
Ew           1        11      10     10.5
             2         8      12     10.0       1.05
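The Mean and Ratio columns of Table 56 follow directly from the entries for the two experiments, as the short Python sketch below shows. The dictionary layout and the function name `sowing_ratio` are our own, introduced purely for illustration.

```python
# Table 56 entries: component -> {sowing: (NC1, NC2)}
table56 = {
    "VF": {1: (36, 49), 2: (19, 23)},
    "DR": {1: (45, 30), 2: (10, 14)},
    "HR": {1: (76, 159), 2: (46, 42)},
    "Ew": {1: (11, 10), 2: (8, 12)},
}


def sowing_ratio(rows):
    """Mean over the two experiments for each sowing, and the ratio 1/2."""
    mean1 = sum(rows[1]) / len(rows[1])
    mean2 = sum(rows[2]) / len(rows[2])
    return mean1, mean2, mean1 / mean2


ratios = {name: sowing_ratio(rows) for name, rows in table56.items()}
for name, (m1, m2, ratio) in ratios.items():
    print(f"{name}: sowing 1 = {m1:.1f}, sowing 2 = {m2:.1f}, ratio = {ratio:.3f}")
```

The ratios for VF, DR and HR all lie between 2 and a little over 3, while that for Ew is close to 1.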
35. Heritability
The proportion that the heritable variation constitutes of the total
phenotypic variation of a character in a population is commonly referred
to as the heritability of that character. The heritability is generally
denoted by $h^2$, but to avoid confusion with $h$ and $h^2$ as we have
been using them, we will here denote it by T. A distinction is further
drawn between what are termed the 'narrow' heritability and the 'broad'
heritability. The former is the proportion that the additive genetic vari-
ation constitutes of the total variation, and the latter is the proportion
that all the heritable or genotypic variation constitutes of the total.
Thus where both additive and dominance variation are present (but
leaving aside non-allelic interaction) the narrow heritability in a population
is $T_n = \tfrac{1}{2}D_R/(\tfrac{1}{2}D_R + \tfrac{1}{4}H_R + E_w + E_b)$ while the broad heritability
is $T_b = (\tfrac{1}{2}D_R + \tfrac{1}{4}H_R)/(\tfrac{1}{2}D_R + \tfrac{1}{4}H_R + E_w + E_b)$. Where dominance variation
is absent $T_n = T_b = \tfrac{1}{2}D_R/(\tfrac{1}{2}D_R + E_w + E_b)$. It should be noted that
non-allelic interaction like dominance can change Tb without altering Tn to
a corresponding extent.
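The two heritabilities are readily computed once the components of variation are in hand. The sketch below is illustrative: the function name is our own, the total variation is taken as one-half DR plus one-quarter HR plus Ew and Eb, and the division of the non-heritable variation equally between Ew and Eb is an arbitrary assumption made only so that the function can be called.

```python
def heritabilities(d_r, h_r, e_w, e_b):
    """Narrow (T_n) and broad (T_b) heritability in a randomly mating population."""
    additive = d_r / 2           # (1/2)D_R
    dominance = h_r / 4          # (1/4)H_R
    total = additive + dominance + e_w + e_b
    return additive / total, (additive + dominance) / total


# Rounded maize values: D = H = E_w + E_b = 0.01.
# The equal split of the non-heritable variation is arbitrary.
t_n, t_b = heritabilities(0.01, 0.01, 0.005, 0.005)
print(f"T_n = {t_n:.3f}, T_b = {t_b:.3f}")

# With no dominance variation the two heritabilities coincide.
t_n0, t_b0 = heritabilities(0.01, 0.0, 0.005, 0.005)
```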
The heritability, and particularly the narrow heritability, $T_n$, provides
a convenient summary of the situation with regard to the distribution of
variation between the genetic and the non-genetic within the population.
It is easily measured as the ratio that twice the parent/offspring covariance
($W_{POR} = \tfrac{1}{4}D_R$) bears to the variance of the individuals in the population,
provided that $E_b$ can be shown to be negligible or can be made
negligible or can be measured and deducted from $W_{POR}$ to leave a direct
estimate of $\tfrac{1}{4}D_R$. Furthermore, once we know the value of $T_n$ it can be
used to predict the response of the population to certain types of selection.
Thus if we select that group of individuals which has a greater expression
of the character than the remaining group of unselected individuals
and then breed them together, the mean expression of the offspring
so obtained will exceed that of the population by $R = T_n S$ where
R is referred to as the response to selection and S, the intensity of selec-
tion, is the amount by which the mean of the selected parents exceeds
that of the population (see Falconer, 1960). As Falconer points out,
this prediction of selective response will hold good in detail only where
a number of other conditions apply, for example, that there is no non-
allelic interaction and the scale of measurement is adequate. In any case
the predictions can be expected to be valid only in the short-term, since
response to selection must itself imply changes of gene frequency, includ-
ing some gene fixation. Nevertheless predictions of this kind have proved
to hold good, at least to a first approximation, in a high proportion of
cases.
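As a minimal sketch of this prediction, the following Python fragment finds S for a simple truncation selection and multiplies it by the narrow heritability. The phenotypes, the proportion selected and the value of the heritability are all hypothetical, and the function names are ours.

```python
def selection_differential(phenotypes, proportion_selected):
    """S: mean of the selected (highest) individuals less the population mean."""
    ranked = sorted(phenotypes, reverse=True)
    n_selected = max(1, round(len(ranked) * proportion_selected))
    selected = ranked[:n_selected]
    return sum(selected) / n_selected - sum(phenotypes) / len(phenotypes)


def predicted_response(t_n, s):
    """Predicted response R = T_n * S to one generation of selection."""
    return t_n * s


# Hypothetical population of eight individuals, the top quarter being selected.
phenotypes = [10, 12, 14, 16, 18, 20, 22, 24]
s = selection_differential(phenotypes, 0.25)
r = predicted_response(0.3, s)   # T_n = 0.3 is an assumed value
print(f"S = {s:.1f}, predicted R = {r:.2f}")
```

The prediction carries with it all the provisos noted above, and in particular applies only to the short term.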
The uses to which the concept of heritability can be put should not,
however, blind us to its limitation. These stem ultimately from two of its
features. In the first place it is a ratio, in the case of $T_n$ the ratio of the
additive genetical variation to the total phenotypic variation of the popu-
lation. It depends therefore not just on the amount of heritable variation
in the population, but also on the amount of non-heritable. The herita-
bility can be raised not only by injecting more genic variation into the
population but also by making more stable the environment in which
the individuals are raised and develop. Equally it can be lowered by
raising the non-heritable variation as well as by reducing the heritable.
Thus, while the heritability may be a convenient summary of the situation
for some comparisons or uses, it can never give as clear and informative
a picture as the estimates of the components of variation, $D_R$,
$H_R$ and $E$. Given such estimates we can easily construct $T_n$ or $T_b$, whichever
we need, should we need it, and at the same time we have comprehensive
information which provides an understanding beyond anything
to be obtained from heritabilities and their comparison.
The second limitation of the concept of heritability stems from the
properties of the genetical components of variation, especially $D_R$, of
which it is compounded. As we have already noted, since
$D_R = S\,4uv[d + (v - u)h]^2$ it cannot give us information about the genetical
potentialities of the population in the way that $D = S(d^2)$ can do for the
descendants of a cross between two inbred lines. The value of $D_R$ not only
varies with the gene frequencies as a result of the general factor uv that
it contains for each gene difference, but it also depends on the term (v -
u)h which is included with d. Now if the more common of two alleles is
dominant, v < u when h is positive and v > u when h is negative. In either
case (v - u)h will be negative and d + (v - u)h will be less than d. In the
same way when the less common allele is dominant, (v - u)h will be posi-
tive and d + (v - u)h will be greater than d. We can illustrate the effect
of this relationship by reference to the data of Robinson et al. (1949) on
yield in maize, which we analysed in Section 33. Although we used the
data there to illustrate the analysis of a population by means of the NC1
experimental design, the results were in fact derived from an F2 where of
course all $u = v = \tfrac{1}{2}$, giving $D_R = D = S(d^2)$ and $H_R = H = S(h^2)$. We
found $D_R = D = 0.009$, $H_R = H = 0.010$ and $E_w + E_b = 0.013$. Approximating
these findings by setting $D = H = E_w + E_b = 0.01$ for ease of presentation,
we note that if all the genes in the system are alike in their
effects $h/d = \sqrt{H/D} = 1$ and $d = h$. Then assuming that $u$ and $v$ are
the same for all genes we can calculate $T_n$ and $T_b$ for any gene frequency
that we choose. The relations of $T_n$ and $T_b$ to $u$, so obtained, are shown
in Fig. 21, from which we see that Tn becomes increasingly small as u
[Fig. 21: graph of heritability (%, vertical axis, 0 to 50) plotted against gene frequency u (horizontal axis).]
Fig. 21. Effect of gene frequency, u, on the narrow (Tn) and broad (Tb)
heritabilities, in %, in a randomly breeding population, where $S(d^2) = S(h^2) =
E = 1$. d, h and u are assumed to be the same for all gene pairs.
increases above 0.5, and in particular becomes relatively very small as u
rises to 0.8 or more. When, however, u < 0.5, Tn can rise to 3/2 the value
it has at u = v = 1/2, before falling away towards 0 as u approaches 0. Thus
when u > 0.5, Tn will always underestimate the fixable genetic variation
and will grossly underestimate it as u approaches 1. When u < 0.5, Tn
can materially overestimate the fixable genetic variation until u gets
fairly close to O. If we had taken h = -d, which is also consonant with
the data, the same pair of curves would have been obtained but with v =
1 - u replacing u along the abscissa.
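The behaviour of the two curves can be checked numerically. The sketch below takes d = h for every gene pair and S(d²) = S(h²) = E = 1, as in Fig. 21, and assumes the usual definition HR = S 16u²v²h², which reduces to S(h²) when u = v = 1/2. The function name and the grid of gene frequencies are our own.

```python
def heritability_curves(u, sd2=1.0, sh2=1.0, e=1.0):
    """T_n and T_b at gene frequency u when d = h for every gene pair."""
    v = 1.0 - u
    # D_R = S 4uv[d + (v - u)h]^2; with h = d the bracket is d(1 + v - u)
    d_r = 4 * u * v * (1 + v - u) ** 2 * sd2
    # assumed definition: H_R = S 16 u^2 v^2 h^2
    h_r = 16 * u ** 2 * v ** 2 * sh2
    total = d_r / 2 + h_r / 4 + e
    return (d_r / 2) / total, (d_r / 2 + h_r / 4) / total


t_n_half, t_b_half = heritability_curves(0.5)
grid = [i / 100 for i in range(1, 100)]
u_peak = max(grid, key=lambda u: heritability_curves(u)[0])
t_n_peak = heritability_curves(u_peak)[0]
t_n_08 = heritability_curves(0.8)[0]

print(f"T_n at u = 0.5: {100 * t_n_half:.1f}%")
print(f"T_n is greatest near u = {u_peak:.2f}, at {t_n_peak / t_n_half:.2f} times that value")
print(f"T_n at u = 0.8: {100 * t_n_08:.1f}%")
```

On these assumptions the greatest value of Tn is reached at a gene frequency a little above 0.2 and is close to 3/2 of its value at u = 0.5.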
In both cases the abscissa is the frequency of the dominant allele and
Tn always gives an underestimate of the fixable genetic variation when
the dominant gene is the more common; although it generally overesti-
mates it when this allele is the less common. Such evidence as we have
suggests that the dominant allele tends to be the more common in popu-
lations. We must expect therefore that although Tn may tell us how the
population will respond to simple mass selection, it will underestimate
the changes that can be obtained if we set about our breeding programme
in a different way. If, for example, instead of applying mass selection to
the population, we first of all raise from it a number of at least partially
inbred lines, choose the best of these, cross them together in pairs and
select further from their F2 's, progress can be made going well beyond
anything that our estimate of Tn would suggest. Experience in breeding
maize, for example, accords with this expectation.
One last point remains to be made. If we have estimates of both Tn
and $T_b$, we can find $T_b - T_n = \tfrac{1}{4}H_R/(\tfrac{1}{2}D_R + \tfrac{1}{4}H_R + E_w + E_b)$ and this can
be compared with $\tfrac{1}{2}T_n = \tfrac{1}{4}D_R/(\tfrac{1}{2}D_R + \tfrac{1}{4}H_R + E_w + E_b)$ to give us an estimate
of $H_R/D_R$. In our example, $H_R/D_R$ is always greater than 1 when the
frequency of the dominant allele is greater than 0.5. If we failed to remember
the composite nature of $D_R$, we would be in danger of taking
this as evidence of preponderant over-dominance of the genes in the popu-
lation, when no such over-dominance was, in fact, present.
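The point can be made numerically with a short sketch, again on the assumptions d = h (complete dominance, not over-dominance), E = 1 and HR = S 16u²v²h². The function names are ours, and a single gene pair is used for brevity.

```python
def hr_over_dr(t_n, t_b):
    """Estimate H_R/D_R as (T_b - T_n) divided by one-half of T_n."""
    return (t_b - t_n) / (t_n / 2)


def example_heritabilities(u):
    """T_n and T_b for d = h = 1 and E = 1 (one gene pair, for brevity)."""
    v = 1 - u
    d_r = 4 * u * v * (1 + v - u) ** 2      # D_R with h = d
    h_r = 16 * u ** 2 * v ** 2              # assumed definition of H_R
    total = d_r / 2 + h_r / 4 + 1
    return (d_r / 2) / total, (d_r / 2 + h_r / 4) / total


for u in (0.2, 0.5, 0.8):
    ratio = hr_over_dr(*example_heritabilities(u))
    print(f"u = {u}: H_R/D_R = {ratio:.2f}")
```

Under these assumptions the ratio reduces to u/v, so that it equals 1 only at u = 0.5 and rises to 4 at u = 0.8, although no gene shows more than complete dominance.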
Genes and
effective factors
and we have an estimate of k, the number of genes in which the two lines
differ.
In arriving at this estimate of k we have made four assumptions, that:
(a) there is no non-allelic interaction,
(b) the gene differences are of equal effect,
(c) there is complete association of like alleles in the parents,
(d) there is no linkage of the genes.
What are the consequences on the estimate of k if these assumptions fail?
Taking non-allelic interaction first, it will be recalled from Section 20
that when allowance is made for such interaction the means of the two
parental lines become m + [d] + [i] and m - [d] + [i]. So, half the par-
ental difference is still [d], and no complication is introduced into the
numerator of the fraction which yields our estimate of k. Turning to
the denominator, however, we note that $D = S(d_a + \tfrac{1}{2}Sj_a)^2$ in F2 and S3
and it will exceed $S(d_a^2)$ or fall short of it according to the preponderant
sign of the j's, and by an amount which will depend also on the extent
and magnitude of this interaction (see Section 21). The estimate of k
can thus be biased upwards or downwards by j interaction. If we have
the data for estimating D in more than one generation we may be able
to correct it for the effect of the interaction, since in F3, $D = S(d_a + \tfrac{1}{4}Sj_a)^2$
and in F4 it changes further to $S(d_a + \tfrac{1}{8}Sj_a)^2$ so allowing us to extrapolate
to $S(d_a^2)$. An extensive set of observations would be necessary
for such a procedure and no attempt has yet been made to find k in the
known presence of non-allelic interaction.
Turning next to the assumption of equality of gene effects, we note
that if these effects are not in fact equal we can define $\bar{d}$ as their average
and then write $d_a = \bar{d}(1 + \alpha_a)$, $d_b = \bar{d}(1 + \alpha_b)$ and so on. It can then be
shown that our estimate of k becomes $\hat{k} = k/(1 + V_\alpha)$ where $V_\alpha$ is the
variance of $\alpha$ or equally the variance of $d/\bar{d}$ (see M and J, p. 309). Thus
inequality of the gene effects must always lead to an underestimate of k.
To take an example, where there are three gene differences of equal
effect, d being 2 for each of them ($d_a = d_b = d_c = 2$) with the + alleles
all in one parent and the - alleles in the other (shown as 2 2 2 over
-2 -2 -2 in Table 58), $[d] = 2 + 2 + 2 = 6$ and $D = S(d^2) = 12$, giving
$k = 6^2/12 = 3$, which of course equals the true k. If however, we have three
genes of unequal effects, with $d_a = 3$, $d_b = 2$, $d_c = 1$, again with complete
association of like alleles in the parents (shown as 3 2 1 over -3 -2 -1 in
Table 58), $[d] = 3 + 2 + 1 = 6$ as before, but $D = 3^2 + 2^2 + 1^2 = 14$ giving
$k = 6^2/14 = 2.57$, so underestimating k. In this case $\bar{d} = \tfrac{1}{3}(3 + 2 + 1) = 2$
and $d_a = 2(1 + \tfrac{1}{2})$, $d_b = 2(1 + 0)$ and $d_c = 2(1 - \tfrac{1}{2})$ giving $\alpha_a = \tfrac{1}{2}$, $\alpha_b = 0$,
$\alpha_c = -\tfrac{1}{2}$ and $V_\alpha = \tfrac{1}{3}[(\tfrac{1}{2})^2 + 0^2 + (-\tfrac{1}{2})^2] = \tfrac{1}{6}$. Then
$\hat{k} = k/(1 + V_\alpha) = 3/(1 + \tfrac{1}{6}) = 2.57$ as already found.
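The agreement of the two routes to the estimate can be confirmed with a short Python sketch. It assumes complete association, so that [d] is the sum of the gene effects and D the sum of their squares; the function names are our own.

```python
def k_estimate(effects):
    """k = [d]^2 / D with complete association: [d] = S(d), D = S(d^2)."""
    return sum(effects) ** 2 / sum(d * d for d in effects)


def k_from_variance(effects):
    """The same estimate written as k / (1 + V), V being the variance of d / mean d."""
    k = len(effects)
    mean_d = sum(effects) / k
    deviations = [d / mean_d - 1 for d in effects]   # these average zero
    v = sum(a * a for a in deviations) / k
    return k / (1 + v)


for effects in ([2, 2, 2], [3, 2, 1]):
    print(effects, round(k_estimate(effects), 2), round(k_from_variance(effects), 2))
```

Both functions return 3 for the three equal effects and 18/7, or 2.57, for effects of 3, 2 and 1.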
Incomplete association of like alleles also leads to an underestimate,
and generally a much greater underestimate, of k, since [d] is necessarily
less than S(d). If we write $S(d_+)$ for the summed effects of the genes,
whose + alleles are present in the larger parent and $S(d_-)$ for the summed
effects of those whose - alleles are also present in that parent,
$[d] = S(d_+) - S(d_-) = S(d) - 2S(d_-)$ and we can obtain a measure of the
TABLE 58.
The consequences of inequality of gene effects and incomplete association of
like alleles for the estimate of the number of gene differences.
[Note: The effects of the three gene differences and the distribution of alleles
between the parents are shown in the left-hand column. Thus, for
example, 2 2 2 over -2 -2 -2 indicates that all gene differences are of equal
effect (all d = 2) with the + alleles concentrated in one parent and
the - alleles in the other; while 3 -2 1 over -3 2 -1 indicates gene differences
of unequal effect (d = 3, 2 and 1 for them respectively) with the -
allele of the second gene associated with the + alleles of the other two.]

                     Assumptions
Genes         Equal      Complete        [d]     r      D      Vα      k
              effects    association
  2  2  2     yes        yes              6      1      12     0       3.00
 -2 -2 -2
  3  2  1     no         yes              6      1      14     1/6     2.57
 -3 -2 -1
  2  2 -2     yes        no               2      1/3    12     0       0.33
 -2 -2  2
  3  2 -1     no         no               4      2/3    14     1/6     1.14
 -3 -2  1
  3 -2  1     no         no               2      1/3    14     1/6     0.29
 -3  2 -1
  3 -2 -1     no         no               0      0      14     1/6     0.00
 -3  2  1
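The entries of Table 58 can be recomputed from the signed gene effects of the larger parent. In the sketch below the function name is our own, and the measure r is taken to be [d]/S(d), which is an assumption on our part although it agrees with the fractions shown in the table.

```python
def table58_row(signed_effects):
    """[d], r, D and k for gene effects signed by the allele in the larger parent."""
    d_bracket = sum(signed_effects)                 # [d] = S(d+) - S(d-)
    s_d = sum(abs(d) for d in signed_effects)       # S(d)
    r = d_bracket / s_d                             # assumed: r = [d] / S(d)
    big_d = sum(d * d for d in signed_effects)      # D = S(d^2)
    return d_bracket, r, big_d, d_bracket ** 2 / big_d


arrangements = [(2, 2, 2), (3, 2, 1), (2, 2, -2),
                (3, 2, -1), (3, -2, 1), (3, -2, -1)]
for genes in arrangements:
    d_bracket, r, big_d, k = table58_row(genes)
    print(genes, d_bracket, round(r, 2), big_d, round(k, 2))
```

The fall in k from 3.00 to 0.33 when one of three equal genes is dispersed, as against the fall to 2.57 from inequality of effects alone, shows how much the more serious is the failure of association.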
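[Fig. 22: graph of the estimate k (vertical axis, 0.5 to 2.0) plotted against the recombination frequency p, with curves for C and R.]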
Fig. 22. Effect of linkage on the estimate, k, of the number of units of in-
heritance where two genes, with da = 3 and db = 1 show the recombination
frequency p. C indicates the coupling (that is, like alleles associated) and R
the repulsion (that is, like alleles dispersed) arrangements of the genes.
against p. This is of course to be expected but we should note that with
tight linkage k lies close to 1, and even with p as high as 0.1, k is still
close to 1, especially with coupling where it is 1.08, although even with
repulsion it has fallen only to 0.77. Thus even where recombination
occurs, the two genes still appear more like a single unit of inheritance
than like two, unless the linkage is loose and recombination fairly fre-
quent. Where linkage is reasonably tight therefore we are estimating not
the number of genes but the number of effective units of inheritance or
effective factors as they are termed. We should note further that with
reasonably tight linkage k is much the same whether measured from the
coupling or the repulsion cross. Thus with p = 0.05, $k_C = 1.04$ and
$k_R = 0.87$. The difference between the two cases lies not so much in
the number of effective factors as in the average effect of that factor:
with coupling $[d]_C = 4$ and $d_C = [d]_C/k_C = 4/1.04 = 3.85$ while with repulsion
$[d]_R = 2$ and $d_R = [d]_R/k_R = 2/0.87 = 2.30$. This is of course a
very simple example that we have taken for illustrative purposes. Clearly,
however, the same principle will hold where a greater number of genes
are linked and so aggregated into a single effective factor. At the same
time the number of possible arrangements of the genes in relation to
one another is much greater and the change in the effect of the factor
from the most dispersed to the most associated will be correspondingly
greater. The same principle will hold also where more than one group
of linked genes is segregating. Thus, for example, with four genes falling
into two groups, each comprising two genes with da = 3 and db = 1, and
p = 0.05 in both cases, the two groups being unlinked with each other,
we should find k = 2 X 0.87 = 1.74 and d = 2.30 when both groups
were in the dispersed arrangement, and k = 2 X 1.04 = 2.08 and d =
3.85 when both were in the associated arrangement.
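The figures quoted for the two linked genes can be reproduced by the following Python sketch. It rests on an assumption not stated in the passage above, namely that for two linked genes in F2 D = d_a² + d_b² ± 2(1 - 2p)d_a d_b, the sign being positive for coupling and negative for repulsion. The function name is our own.

```python
def k_with_linkage(d_a, d_b, p, coupling=True):
    """Estimate k and the average factor effect for two linked genes in F2.

    Assumes D = d_a^2 + d_b^2 +/- 2(1 - 2p) d_a d_b, with + for coupling
    and - for repulsion.
    """
    sign = 1 if coupling else -1
    d_bracket = d_a + sign * d_b          # [d], half the parental difference
    big_d = d_a ** 2 + d_b ** 2 + sign * 2 * (1 - 2 * p) * d_a * d_b
    k = d_bracket ** 2 / big_d
    return k, d_bracket / k


for p in (0.05, 0.1, 0.5):
    k_c, d_c = k_with_linkage(3, 1, p, coupling=True)
    k_r, d_r = k_with_linkage(3, 1, p, coupling=False)
    print(f"p = {p}: k_C = {k_c:.2f}, d_C = {d_c:.2f}, "
          f"k_R = {k_r:.2f}, d_R = {d_r:.2f}")
```

With p = 0.5 the expression for D reduces to S(d²) and the estimates return to those expected of two unlinked genes of unequal effect.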
So, if we cross two parental lines differing at a number of loci which
fall into linked groups, and with the alleles at the loci within the groups
preponderantly in the dispersion arrangement, and select for high and
low expressions of the character in the descendants of the cross, we
expect to pick up and fix recombinants within the groups and so to
have replaced the preponderantly dispersed arrangements of the parental
groups by preponderantly associated arrangements in the selected lines.
Then on estimating k from the cross between the selected lines we would
expect to find k much the same as that found from the cross of the
parent lines themselves, but with d increased to an extent corresponding
to the effectiveness of the selection in raising and lowering the ex-
pression of the character in the high and low selective lines respectively.
This is well illustrated by an experiment described by Mather (1941) in
which two lines of Drosophila melanogaster were crossed. Beginning with
the F2 , selection was practised over thirteen generations for an increased
number and over twelve generations for a decreased number of abdomi-
nal chaetae. The selected lines were then crossed with each other and an
F2 raised.
The results of this experiment are summarized in Table 59, where the
means and variances shown are the averages of males and females. The
mean numbers of abdominal chaetae ($\bar{P}_1$ and $\bar{P}_2$) are shown for the two
lines that were crossed together for both the original cross, with which
TABLE 59.
k and d in the original lines and the selection lines derived
from their cross, in a selection experiment for abdominal chaetae
in Drosophila melanogaster (Mather, 1941)
the experiment was started, and the cross between the two selected lines,
high and low, derived from that original cross. In each case the non-heritable
component of $V_{1F2}$, the variance of the F2, was estimated
by combining the variances of P1, P2 and F1 in the F2 proportions, thus
$V_E = \tfrac{1}{4}V_{P1} + \tfrac{1}{4}V_{P2} + \tfrac{1}{2}V_{F1}$. $V_{1F2} - V_E$ is taken as an estimate of $\tfrac{1}{2}D$. This
assumes that H is 0 and so almost certainly overestimates D, but the
overestimation is unlikely to be serious since there was little evidence
of dominance in these crosses and in any case H makes only half the
contribution of D to $V_{1F2}$. Nevertheless to the extent that H exceeded
0, k will be an underestimate, although the bias will be equal for the
two crosses unless the dominance ratio H/D differs between them.
It should be noted too, that the estimate of D and hence that of k will
be less precise in the case of the original cross since the small difference
between $V_{1F2}$ and $V_E$, from which D is found, will render it subject to
sampling variation proportionately much greater than in the cross between
the selected lines where the difference between $V_{1F2}$ and $V_E$ is
much larger.
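The steps of this calculation can be set out as a short Python sketch. The function name is our own and the means and variances used are hypothetical, not those of Table 59. It takes H as 0 and finds k as the square of half the parental difference divided by D, with the average effect of a factor as half the parental difference divided by k.

```python
def effective_factors(p1_mean, p2_mean, v_p1, v_p2, v_f1, v_f2):
    """Estimate D, k and the average factor effect d from a cross, taking H = 0."""
    v_e = v_p1 / 4 + v_p2 / 4 + v_f1 / 2       # non-heritable variance of the F2
    big_d = 2 * (v_f2 - v_e)                   # F2 variance less V_E estimates (1/2)D
    d_bracket = abs(p1_mean - p2_mean) / 2     # [d], half the parental difference
    k = d_bracket ** 2 / big_d
    return big_d, k, d_bracket / k


# Hypothetical means and variances, not the values of Table 59.
big_d, k, d = effective_factors(44.0, 36.0, 3.0, 3.4, 3.2, 7.2)
print(f"D = {big_d:.2f}, k = {k:.2f}, d = {d:.2f}")
```

Where the F2 variance only slightly exceeds the non-heritable variance, the denominator is small and the estimate of k correspondingly unstable, which is the imprecision noted above for the original cross.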
Despite these necessary provisos, however, the results are clear and
striking. In both crosses there are some two, or if we allow for the low-
ering of the estimate arising from inequality of their effects, perhaps
three effective factors, but in the cross of the selected lines the average
effect of the factors is about 3½ times as great as in the original cross.
The effect of selection has been to build up greatly the effects of the
units of inheritance that we can detect and whose number we can esti-
mate by biometrical methods.
These findings have a simple interpretation in terms of linked groups
of genes, and indeed as we have seen are to be expected on that basis.
They afford us the prime clue to our understanding of how selection
acts by rearranging linked combinations of the genes - polygenic com-
binations as they are called. They also emphasize to us the distinction
between the effective factors that we can detect and the genes that we
postulate and of which the factors are made up. Effective factors are
not genes which can change only by the process (or combination of
processes) that we term mutation. Their physical basis lies in the pieces
of chromosomes marked and delimited by the genes - all members of
the same polygenic system - through whose effects they are recognized.
And being pieces of chromosome, they can change their genic content
and hence their effects by recombination. They thus have a quality of
lability and hence of transience much greater than that of their constituent
genes, which can change only by mutation. True they will be
changed by the mutations of their constituent genes, but this is a rarer
event than is the recombination which takes place within them as many
experiments have shown. Recombination within effective factors rather
than mutation of their constituent genes is the basis for understanding
the reassortment of polygenic variability and hence of response to selec-
tion. It is a basis, too, which allows us to understand the way in which
selection appears to create the polygenic variability upon which response
to its impact depends (Mather, 1973) and this is reflected in the combi-
nations of constancy, or near constancy, of k with change in d.
Furthermore, since the basis of the effective factor is a piece of
chromosome, we must expect it to include not only a number of linked
genes which are members of the same polygenic system and hence affect
the expression of the character through which the factor is recognized,
but also other genes, members of other polygenic systems affecting
other characters. The properties in action of an effective factor can
thus transcend the properties of the individual genes of which it is com-
posed, in at least two ways. First a factor comprising two or more genes
in a preponderantly dispersed arrangement, each of which is dominant
Other sources of estimates 207
in the same direction, can show overdominance as a factor even though
none of the individual genes shows overdominance. This is indeed one of
the classical explanations of the occurrence of heterosis in the F1 of two
inbred lines and of course by the same token also of inbreeding depression.
Secondly, taking into account the admixture of different polygenic
combinations in the same piece of chromosome, the effective factor
can show pleiotropy in its action even though none of its constituent
genes shows pleiotropic action as an individual. Such a 'pleiotropy'
provides a basis for understanding the correlated responses to selection
that are so commonly and so extensively observed. But being a pleiotropy
that depends on linkage, it can be resolved by recombination. Thus those
correlated expressions of two or more characters which we recognize as
correlated responses to selection can be, and indeed in experiment regularly
have been, resolved by giving time and opportunity for recombination
to reassort the genic content of the effective factor (Mather, 1973).
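The first of these two points can be illustrated numerically. The sketch below is not from the text: it assumes an effective factor made up of just two completely linked genes in dispersion (one parent AAbb, the other aaBB), and the function name and the values of d and h are hypothetical, chosen only for illustration.

```python
def factor_d_and_h(d_a, h_a, d_b, h_b):
    """Return (d, h) for a factor of two fully linked genes in dispersion.

    Assumed parents: AAbb and aaBB. The additive effects oppose one another
    in the parental homozygotes, while the dominance deviations, both in
    the same direction, add together in the F1.
    """
    factor_d = abs(d_a - d_b)   # departure of each parent from the mid-parent
    factor_h = h_a + h_b        # departure of the F1 from the mid-parent
    return factor_d, factor_h


# Hypothetical values: each gene shows only partial dominance (h < d).
d, h = factor_d_and_h(d_a=3.0, h_a=1.0, d_b=2.0, h_b=1.0)
gene_ratios = (1.0 / 3.0, 1.0 / 2.0)   # h/d for gene A-a and gene B-b
factor_ratio = h / d                   # h/d for the factor as a whole
print(d, h, factor_ratio)
```

With these assumed values the factor has d = 1 and h = 2, so that it appears overdominant (h/d = 2) although neither gene taken alone has h greater than d.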
$$k = \frac{\left({}_{H}\bar{V}_{2F3}\right)^{2}}{{}_{H}V_{VF3}}$$
where V̄2F3 is of course the mean variance of these F3's. VVF3 is the variance
of the variances and the subscript H denotes that it is the heritable
portion of the variances about which we are talking (M and J, p. 311).
This estimate is the K2 of Mather and Jinks. Similar estimates can be
derived from the variances of groups of S3 and also second back-cross
families. This type of estimate has one great advantage over the esti-
mates we have been using: in the absence of linkage it is unaffected by
the association or dispersion of alleles in the parental lines, just as in
the absence of linkage D is unaffected by association or dispersion
although [d] is. It has, however, two disadvantages over and above its
requirement for an F3 or similar generation to be raised. The first is
that it is more affected by inequality of the effects of the genes than is
the k we have been using. This is, however, probably not actually so
serious a matter as the fact that to obtain it we have to estimate not
just V̄2F3 and VVF3, but the heritable components of these variances,
H V̄2F3 and H VVF3. To do so involves the use of a number of corrections
based on the estimates of non-heritable variation obtained from parents
and F1, and these corrections may not be small by comparison with the
F3 variances that they are used to correct. The estimate of k that is ulti-
mately obtained is thus likely to be subject to a proportionately greater
standard error and the confidence with which it can be used is corre-
spondingly reduced.
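Why the ratio of the squared mean variance to the variance of the variances should estimate k can be checked in the simplest case. The sketch below is an illustration under stated assumptions, not the procedure of Mather and Jinks: it assumes k unlinked genes of equal additive effect d, no dominance, and heritable components already freed of non-heritable variation. Under those assumptions an F3 family from an F2 parent heterozygous at j genes has heritable variance j·d²/2, with j following a binomial distribution. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def k2_estimate(h_mean_var, h_var_of_var):
    """Second kind of estimate: (heritable mean variance)^2 / heritable
    variance of the variances."""
    return h_mean_var ** 2 / h_var_of_var


def expected_f3_moments(k, d):
    """Exact mean and variance of heritable F3 family variances.

    Assumes k unlinked genes, equal effect d, no dominance. Each gene is
    heterozygous in an F2 individual with probability 1/2, and each
    heterozygous gene contributes d^2/2 to the variance of the F3 family.
    """
    half = Fraction(1, 2)
    probs = [comb(k, j) * half ** k for j in range(k + 1)]
    fam_var = [j * Fraction(d) ** 2 / 2 for j in range(k + 1)]
    mean = sum(p * v for p, v in zip(probs, fam_var))
    var = sum(p * (v - mean) ** 2 for p, v in zip(probs, fam_var))
    return mean, var


mean_var, var_of_var = expected_f3_moments(k=3, d=2)
print(k2_estimate(mean_var, var_of_var))
```

Exact fractions are used so that the recovery of k is not blurred by rounding. With real data the two heritable components carry sampling error, which is the source of the larger standard error noted above.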
Useful estimates of this kind can nevertheless be obtained where the
necessary data are available (M and J, pp. 319-24), and if obtainable
they can be put to very good use because as we have already noted they
are not affected by the dispersion of like alleles between the parents.
Now going back to the estimate of k that we have chiefly been discussing
in this chapter, we found k = [d]²/D, which can be rewritten as
[d]² = kD. Given, therefore, that we have an independent estimate of k
and knowing D, we can find [d]² and hence [d]. And given further that
the estimate of k we are using is independent of the association or dispersion
of the genes, the [d] that we do find will in fact be an estimate
of S(d). So if we cross two parent lines and raise from them F2,
back-crosses and F3's, or any other combination of families that will
give us the value of D together with a k of the second kind (K2), we can
calculate S(d). This will tell us whether we can expect to produce lines
that will transcend the parent lines in their expression of the character
we are considering, and indeed how far they will so transcend them.
The value of such information to a breeder concerned to enhance or
diminish the expression of the character needs no emphasis.
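The arithmetic of this calculation can be sketched as follows. The function names and the numerical values are hypothetical, and the sketch assumes that k is an estimate of the second kind and so independent of dispersion, and that D has been estimated from the same cross.

```python
from math import sqrt


def sum_of_d(k, D):
    """S(d) obtained from [d]^2 = kD, with k independent of dispersion."""
    return sqrt(k * D)


def transgression_margin(k, D, observed_d):
    """How far the best extractable line could exceed the better parent.

    The parents lie at m +/- [d] and the extreme lines at m +/- S(d), so the
    margin on either side is S(d) - |[d]|.
    """
    return sum_of_d(k, D) - abs(observed_d)


# Hypothetical cross: four genes each with d = 2, so D = 16, with one of
# them dispersed so that the observed [d] is 2 + 2 + 2 - 2 = 4.
s_d = sum_of_d(k=4, D=16.0)
margin = transgression_margin(k=4, D=16.0, observed_d=4.0)
print(s_d, margin)
```

In this assumed case S(d) is 8 while the parents differ from their mid-parent by only 4, so lines exceeding either parent by up to 4 units could in principle be extracted.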
Still a third basic method of estimating the number of effective factors
has recently been developed by Jinks and Towey (1976). It depends
on ascertaining the proportion of individuals in a generation, say the F2,
which are heterozygous for at least one gene - or rather one effective
factor. This proportion is found by raising a progeny, an F3 family for
example, from each of a number of individuals in the F2. Two individuals
are selfed from each F3 family, and if the two F4's so produced differ in
either mean or variance (or of course both) in respect of the character
under observation, the two F3 individuals must have had different geno-
types and the F2 individual which gave rise to the F3 from which they
were taken must have been heterozygous for at least one effective fac-
tor. Thus the proportion of F2 heterozygous for at least one unit is ascer-
tained and assuming no linkage of the effective factors their number can
be estimated. Once again, the estimate must be minimal since there could
have been gene differences too small to detect by families of the size used;
but equally the estimate will be unaffected by dispersion of the genic
differences in the parents. It can then be used in the same way as the K2
estimates derived from the variances of F3 or similar families and there
are fewer corrections to be made in the process of estimation, although
of course it requires continuing the experiment for an extra generation
to F4.
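The step from the proportion of heterozygotes to the number of factors can be sketched as follows. This is a simplified illustration and not a reproduction of the Jinks and Towey procedure: it assumes that the factors are unlinked, that each is heterozygous in one half of the F2, and that every F2 individual heterozygous for at least one factor is recognized as such. The proportion heterozygous for at least one of k factors is then 1 - (1/2)^k. The function name is hypothetical.

```python
from math import log2


def factors_from_het_proportion(p):
    """Solve p = 1 - (1/2)**k for k, p being the proportion of F2
    individuals heterozygous for at least one effective factor."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -log2(1.0 - p)


# Hypothetical observation: 7 out of 8 F2 individuals found heterozygous.
k_hat = factors_from_het_proportion(0.875)
print(k_hat)
```

Because heterozygotes that go undetected lower the observed proportion, a value obtained in this way is minimal, in keeping with the remark above.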
Conclusion
A-a (B-b, etc.)   A pair of alleles, a gene pair, a single gene difference. A is
                  the allele which increases and a that which decreases the
                  expression of the character.
A (B, C etc.)     Individual scaling tests.
b                 Regression coefficient.
c                 A measure of gene association in the parental lines of a
                  diallel.
d                 The departure of one of a pair of corresponding homozygotes
                  from their mid-point or mid-parent (m). It is
                  positive for the homozygote carrying the increasing
                  allele and negative for that carrying the decreasing allele.
                  The relevant gene pair may be denoted by a subscript:
                  thus AA departs from m by da and aa departs from m by
                  -da.