Introduction to

Biometrical Genetics
KENNETH MATHER
C.B.E., D.Sc., F.R.S.

Professor of Genetics in the University of Birmingham


(Formerly Vice-Chancellor and Professor of Genetics
in the University of Southampton)

JOHN L. JINKS
D.Sc., F.Inst.Biol., F.R.S.
Professor and Head of Department of Genetics
in the University of Birmingham

LONDON
CHAPMAN AND HALL
First published 1977
by Chapman and Hall Ltd
11 New Fetter Lane, London EC4P 4EE
© 1977 K. Mather and J. L. Jinks
Set by Hope Services, Wantage
and printed in Great Britain
at the University Printing House, Cambridge

ISBN-13: 978-0-412-15320-4 e-ISBN-13: 978-1-4613-3387-6


DOI: 10.1007/978-1-4613-3387-6

This title is available in both hardbound and paperback
editions. The paperback edition is sold subject to the condition
that it shall not, by way of trade or otherwise, be lent, re-sold,
hired out, or otherwise circulated without the publisher's prior
consent in any form of binding or cover other than that in which it
is published and without a similar condition including this
condition being imposed on the subsequent purchaser.

All rights reserved. No part of this book may be reprinted, or
reproduced or utilized in any form or by any electronic, mechanical
or other means, now known or hereafter invented, including
photocopying and recording, or in any information storage or
retrieval system, without permission in writing from the publisher.
Contents

Preface page vii


1. The genetical foundation
1. Continuous variation 1
2. The genic basis 4
3. Assaying the chromosomes 10
4. Locating the genes 14
2. The biometrical approach
5. The manifestation of polygenic systems 21
6. Genetic analysis and somatic analysis 25
7. Biometrical genetics 29
3. Additive and dominance effects
8. Components of means 32
9. Testing the model 35
10. Scales 42
11. Components of variation: F2 and back-crosses 47
12. Generations derived from F2 51
13. The balance sheet of genetic variability 57
14. Partitioning the variation 59
4. Diallels
15. The principles of diallel analysis 68
16. An example of a simple diallel 72
17. Undefined diallels 85
18. An example of an undefined diallel 90
5. Genic interaction and linkage
19. Non-allelic interaction 99
20. Interaction as displayed by means 104
21. Variances and covariances 111
22. Correlated gene distributions: linkage 116
23. Diallels 124
6. Interaction of genotype and environment
24. Genotype × environment interaction 130
25. Two genotypes and two environments 134
26. A more complex case 138
27. The relation of g to e 144
28. Crosses between inbred lines 151
29. Variance of F2 157
7. Randomly breeding populations
30. The components of variation 163
31. Human populations 171
32. The use of twins 174
33. Experimental analysis 183
34. Complicating factors 191
35. Heritability 195
8. Genes and effective factors
36. Estimating the number of segregating genes 199
37. Consequences of linkage: effective factors 202
38. Other sources of estimates 207
9. Conclusion
39. Designing the experiments 210
40. Concepts and uses 215
Glossary of symbols and abbreviations 219
References 224
Index 227
Preface

In the second edition of Biometrical Genetics, which appeared in 1971,
we set out to give a general account of the subject as it had developed
up to that time. Such an account necessarily had to be comprehensive
and reasonably detailed. Although it could be, and indeed has been, used
by those who were making an acquaintance with this branch of genetics
for the first time, it went beyond their needs. We have been encouraged
therefore to write an introduction to the genetical analysis of continuous
variation aimed primarily at senior undergraduate and postgraduate
students, and concentrating on basic considerations, basic principles and
basic techniques. This has meant, of course, omitting all reference to
some phenomena of more restricted interest, notably sex-linkage, ma-
ternal effects, haploidy and polyploidy. It has meant, too, that even with
some phenomena which have been included, like interactions, linkage
and effective factors, the discussions cannot go into full detail. Anyone
who is interested, however, can find further information in Biometrical
Genetics, to which detailed references have been given where it ap-
peared that these would be helpful.
The order of presentation has been changed with the aim of making it
easier for beginners. It is now presented basically in terms of phenomena,
additive-dominance variation being taken first, followed by genic interaction,
correlated gene distributions and genotype × environment interaction,
rather than in terms of the type of data to be analysed, with means
first followed by second degree statistics. We believe that this will be
found to be more acceptable to the student and will enable him to mas-
ter the basic phenomena in all their manifestations before proceeding to
those which add complexities to the fundamental models and analyses.
We have, however, continued to defer consideration of populations until
after that of crosses between true-breeding lines, since, although histori-
cally populations were dealt with by Fisher before simple crosses, the
restrictions on the information to be gained from populations and the
possibilities as well as the limitations of its interpretation cannot be
appreciated until the analysis and interpretation of data from simple
crosses are understood. In this, of course, biometrical genetics follows
the pattern already set by classical genetics.
We have taken some of the examples we use from the earlier book,
but we have sought wherever possible to use new illustrative material.
And although our aim has been to simplify the presentation, we have
taken the opportunity in a number of places to bring in relevant ad-
vances made since Biometrical Genetics was written some six years ago.
We have assumed that the reader is familiar with basic genetics and
basic statistics.
Biometrical genetics is still too widely regarded as an esoteric form of
genetical endeavour, tortuous, over-difficult and of little but theoretical
interest. Basic misapprehensions still appear to be abroad, such as that
it requires the assumptions of normal frequency distributions and simple
additivity in action of the genes and the environment if its analyses are
to be meaningful. We hope that this book will help to dispel such notions.
We hope too that it will assist the student to a balanced appreciation of
biometrical genetics, its theoretical structure and its analytical method-
ology, its aims and its approach, its capabilities and its limitations, and
above all its unique value in practical situations that many geneticists,
especially applied geneticists, inevitably encounter.
We are indebted to Dr P. D. S. Caligari for his help in the preparation
of the script, and to the Leverhulme Trust Fund for financial assistance
during the writing of this book.

November 1976 K.M.
J.L.J.
The genetical foundation

1. Continuous variation
Mendel laid the foundation of genetics by the study of differences which
divided his peas into sharply distinct categories. Thus there was never
doubt as to whether one of his plants was tall or short, or its flowers red
or white and so on: the categories did not overlap. He was able to show
that each phenotypic class corresponded to one, or at any rate only a few,
genotypes and that where there was more than one genotype in the pheno-
typic class they could be separated by further appropriate breeding tests,
that is by the clearly distinguishable classes of plant to which they gave
rise among their descendants following appropriate test matings. He was
thus able to infer the genes, or factors as he called them, upon whose
behaviour hereditary transmission depends, and it has been by the further
study of such gene differences in many species of plants and animals that
our knowledge of the genetic materials has largely been built up. We
should note, however, that plants or animals may differ in this sharply
distinct way for reasons other than the genes they carry; in fact because
of the environments in which they have lived their lives. Thus the water
crowfoot, Ranunculus aquatilis, has quite different leaves when growing
in running water than when growing on land. In such a case, of course,
observation of the environments suggests at once that the difference is
not genetic, or at least not wholly genetic, in its causation; but in general
an appropriate breeding test is necessary to establish this point.
Now, differences by which individuals are divided into sharply distinct
categories are not the only variation to be seen in either natural popu-
lations or experimental families. Mendel's peas themselves showed further
variation, for his talls ranged from 6 to 7 ft or even more in height and
his shorts from 9 to 18 inches (see Bateson, 1909). The important thing
for his experiments and their interpretation was that despite the variation
within the classes, the talls and shorts did not overlap in height: each
individual could be classified unambiguously as tall or short. There was in
fact a discontinuity in the distribution of heights between tall and short,
all plants below the discontinuity being short and all above it tall; and as
Mendel showed, they differed correspondingly and consistently in their
genotypes.
The same complexity of variation can be seen in other species. For
example, in man we can recognize dwarf individuals which owe their
character to a single gene difference from normals, from whom they are
generally clearly distinguishable in respect of stature. Yet people who
are not distinguishable in this way - those of normal stature - are not all
alike. Indeed they range widely in stature; but the variation they show is
of a different kind, with every stature represented between wide limits.
The middle statures of the range are the most common and if we exam-
ine a large number of individuals we find that the gradations from one
stature to the next are so fine as to be almost imperceptible. There are
in fact no discontinuities in the distribution of normal stature: the vari-
ation is continuous.
Such continuous variation is ubiquitous in living things and, apart
perhaps from a few special cases like antigenic specificity, it is displayed
by all characters. Thus in general there is no distinction between con-
tinuous and discontinuous variation in the characters by which they are
displayed and indeed, as we have already seen, we quite commonly ob-
serve the two kinds of variation side by side in the same family or popu-
lation. So, whatever the reasons for the differences between the two
kinds of variation, they are not mutually exclusive.
Some examples of continuous variation are shown in Fig. 1. In princi-
ple the number of classes into which individuals can be divided accord-
ing to the manifestation of the character is limited only by the accuracy
of the measurements we can make. We find it convenient, however, to
group the individuals whose measurements fall between certain limits,
which we choose for our own convenience, and represent the variation
by recording the numbers falling into the various classes defined in this
way. We then obtain histograms as illustrated in Figs. 1 (a) and (c) from
which the general shape of the distribution resulting from the variation
can be seen. It should nevertheless be remembered that the grouping we
are using is purely arbitrary: it does not spring from discontinuities in the
variation itself and so provides no basis for an analysis of the causes of
variation in the way that Mendel showed to be possible with discontinu-
ous variation.
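The point that class limits are chosen by the observer can be made concrete. The Python sketch below groups a handful of hypothetical stature measurements (invented for illustration, not the 8585 men of Fig. 1) into classes of a chosen width. Changing the width changes the histogram, although the measurements themselves are unchanged. The function name `group_into_classes` and its parameters are illustrative assumptions.

```python
import math
from collections import Counter


def group_into_classes(measurements, origin, width):
    """Group continuous measurements into classes of equal width.

    Each class is labelled by its lower limit, origin + k * width, and
    holds the measurements x with lower <= x < lower + width. Both the
    origin and the width are the observer's choice, not a property of
    the variation itself.
    """
    counts = Counter()
    for x in measurements:
        k = math.floor((x - origin) / width)  # index of the class holding x
        counts[origin + k * width] += 1
    return dict(sorted(counts.items()))


# Hypothetical statures in inches (illustrative only).
statures = [60.2, 61.7, 64.9, 65.0, 66.3, 70.1]

print(group_into_classes(statures, origin=60, width=5))
# {60: 3, 65: 2, 70: 1}
print(group_into_classes(statures, origin=60, width=2.5))
# {60.0: 2, 62.5: 1, 65.0: 2, 70.0: 1}
```

Both groupings describe the same six individuals; only the arbitrary class boundaries differ, which is why such classes give no handle for Mendelian analysis.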
[Fig. 1: (a) Man, percentage frequency against stature in inches; (b) Drosophila, percentage frequency against number of chaetae; (c) Nicotiana, percentage frequency.]

Fig. 1. Frequency distributions illustrating three examples of continuous
variation. (a) Stature, in inches, of 8585 men; (b) number of sternopleural
chaetae in 200 individuals of Drosophila melanogaster; (c) time of first
flowering, in days after sowing, in 200 individuals of Nicotiana rustica. In
all cases the frequencies are expressed as percentages of the total number
of individuals observed. In (a) and (c) the discontinuities of the histograms
are imposed on the distribution by artificial grouping of the observations,
for purposes of representation: the characters are truly continuous in their
variation. In (b), however, the discontinuities of the histogram arise from
the nature of the character, since we cannot recognize fractional chaetae:
the variation is quasi-continuous. The frequencies in (a) and (b) accord
with the normal distribution, but (c) departs from the normal in that it
shows positive skewness.

One class of character, however, requires a special word. Sometimes
the very nature of the character itself imposes certain discontinuities on
the variation it shows. Thus the number of vertebrae in a vertebrate
animal, or bristles on an insect, can display only a specific set of values,
for the number must be an integer since fractional vertebrae or bristles
are ruled out. Such a character is said to be meristic and its variation
quantal, the expression changing by quanta and not smoothly as in truly
continuous, or as it is often (although somewhat loosely) called, quanti-
tative variation. Such quantal variation is illustrated in Fig. 1 (b) which
shows the frequency distribution of the number of sternopleural chaetae
(bristles situated on the surface of the thorax between the front and mid-
legs) in a line of Drosophila melanogaster. The distribution is very like
the histograms of Figs. 1 (a) and (c) although now the class limits are not
set arbitrarily by the observer but by the quantal nature of the character
itself. Such variation has been described as 'quasi-continuous' (Grüneberg,
1952) because it suggests a truly continuous variation of an underlying
potential for the manifestation of chaetae, an interpretation which, as
we shall see, accords well with the extensive experimental information
that we have about this character in Drosophila. We may note too that
this very character can also show truly discontinuous variation, for the
gene Sp (Sternopleural) has been recognized because it increases the
number of sternopleural chaetae to an extent which, at any rate in flies
raised at higher temperatures, results in a sharp discontinuity of chaeta
number between wild-type and Sp individuals.

2. The genic basis


Continuous variation is ubiquitous and Darwin himself emphasized the
significance for continuing adaptative and evolutionary change of the
small cumulative steps which it makes possible. It is important too for
plant and animal breeders since it is as characteristic a feature of the
commercially important characters - yield, fertility, quality, confor-
mation and so on - of domesticated species as it is of the biologically
significant characters on which depends the success of a species in the
wild. Means of analysing such variation and especially of uncovering the
way in which the genetic materials play their part in its determination is
thus of prime importance to both our understanding of organisms in the
wild and our manipulation of them for practical purposes under domes-
tication. At the same time, the Mendelian approach is denied to us by
the absence of those clearly distinct classes from whose contrasts genes
can be inferred and from whose frequencies the properties of these genes
can be investigated. How then are we to proceed?
Clearly our approach must start with the frequency distribution, to
which a continuously varying character gives rise when it is observed in
a number of individuals, as illustrated in Fig. 1. Such a distribution is
characterized by certain statistics of which its mean and variance are the
most important for our purpose, and to which we can add the relevant
covariances or correlations where the simultaneous distributions of re-
lated individuals are available. If we can find a way of interpreting and
understanding these means, variances and covariances in genetical terms
they provide the analytical tool that we need.
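As a minimal sketch of these statistics, the Python below computes the mean and variance of each member of a set of paired observations, together with their covariance and correlation. The parent and offspring scores are hypothetical, the function name is illustrative, and the n - 1 divisor is the usual sample convention, assumed here rather than taken from the text.

```python
import math


def paired_statistics(parents, offspring):
    """Means, sample variances, covariance and correlation of paired scores."""
    if len(parents) != len(offspring) or len(parents) < 2:
        raise ValueError("need at least two complete pairs")
    n = len(parents)
    mean_p = sum(parents) / n
    mean_o = sum(offspring) / n
    # Sums of squares and of cross-products about the means.
    ss_p = sum((p - mean_p) ** 2 for p in parents)
    ss_o = sum((o - mean_o) ** 2 for o in offspring)
    sp = sum((p - mean_p) * (o - mean_o) for p, o in zip(parents, offspring))
    var_p, var_o, cov = ss_p / (n - 1), ss_o / (n - 1), sp / (n - 1)
    return {
        "mean_parent": mean_p,
        "mean_offspring": mean_o,
        "var_parent": var_p,
        "var_offspring": var_o,
        "covariance": cov,
        "correlation": cov / math.sqrt(var_p * var_o),
    }


# Hypothetical scores for five parent-offspring pairs.
stats = paired_statistics([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(stats)
```

For these invented pairs the covariance is 1.5 and the correlation about 0.77; the genetical task, as the text goes on to describe, is to interpret such quantities in terms of genes.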
This approach was pioneered by Galton (1889) in the attempt that he
made to elucidate the principles of heredity in the days before genetics
as we know it had come to life with the so-called 'rediscovery' of
Mendelism in 1900. Galton's investigations were continued and extended
by Pearson, and their application of statistical mathematics to biological
problems marked a significant step in the growth of that aspect of quan-
titative biology which we now call biometry or biometrics. They showed
us the quantities in terms of which continuous variation can be analysed
and Galton was indeed able to demonstrate through the calculation of
correlations between relatives (a concept which he introduced) that
there must be an hereditary component in continuous variation. He got
no further, however, and little progress was in fact made in understand-
ing the genetical implications of these statistical quantities until, in a
classical treatise published in 1918, R. A. Fisher showed how the bio-
metrical findings not only could be interpreted, but indeed in some
respects virtually demanded interpretation, in terms of Mendel's factors,
by then termed genes and known to be carried on the chromosomes. In
so bringing together the Galtonian approach and the Mendelian basis,
Fisher laid the foundation of what we know as biometrical genetics.
The first great principle of genetics is that the phenotype is the resultant
of the individual's genotype and the environment in which that individual
develops and lives its life. The phenotype can thus be altered by
both change in the genotype and change in the environment. We would
thus expect there to be an element in continuous variation that sprang
from variation of the environment as well as an element depending on
differences among the genotypes. That this was indeed the case was first
demonstrated by Johannsen (1909) from his observations on the dwarf
bean (Phaseolus vulgaris), which shares with many other species of plant,
including Mendel's peas, the property of regular self-pollination. Given
Mendelian heredity therefore, we would expect individuals generally to
be homozygous for their genes. All their progeny would thus be geneti-
cally alike, and would constitute what Johannsen called a pure line;
although of course different pure lines might be expected to be geneti-
cally different in being homozygous for different genes. Johannsen
isolated 19 such pure lines and he was able to show that when comparisons
were made between the lines, the average weights of daughter beans
were related to those of their parents, but that when comparisons were
made within lines there was no such relation (see Darlington and Mather,
1949). Thus bean weight, a continuously variable character, showed only
non-heritable variation within lines, but there was a genetical component
in the differences between lines.
Thus the heritable and non-heritable differences were jointly respon-
sible for the variation in seed weight of the beans; they were of the same
order of magnitude in their effects; and they could be distinguished only
by a breeding test. All the many analyses of continuous variation under-
taken over the years on many characters in many species, both plant and
animal, have revealed this combination of heritable and non-heritable
agencies in the determination of continuous variation. We return to it
later, but one further point remains to be noted now. The distribution
of sternopleural chaetae shown in Fig. 1b is from a pure line of
Drosophila melanogaster, which was produced by inbreeding over many
generations. The variation is therefore all non-heritable. Now a fruit fly
has sternopleural chaetae on both sides of its thorax and the numbers of
chaetae on the two sides, when averaged over many individuals, are alike.
Yet in a single individual they are not always exactly the same, differing
frequently by one or two chaetae and at times by even more (Table 1).
It is difficult to attribute these differences to differences in the external
environmental agencies impinging on the two sides of the fly, or rather
of the larva from which it developed. The differences are much more
plausibly attributable to the vagaries of development, in cell division and
so on, affecting the two sides differently. The bilateral difference is thus
generally taken as a measure of the stability, or instability if one looks at
it the other way, of the developmental processes. They are non-heritable
differences but are not due to environmental differences in the strict
sense. Furthermore, an analysis of variance of the chaeta numbers of
flies from an inbred line shows that the variation between flies, though
higher than that between the sides of the same fly (thus revealing the
action of environmentally determined differences between individuals),
is not markedly higher (Table 1). Thus the non-heritable differences
that we can observe between individuals in this or any other species, are
not always and not wholly to be attributed to differences in the environ-
ment: they may in part, even in large part, be reflecting an instability of
development.
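The figures in Table 1 can be checked from its two frequency distributions. The Python sketch below is one way to do so. It assumes, as the numbers themselves indicate, that the between-flies mean square is the variance of the two-side totals (on 220 df) and the within-flies mean square is the mean squared left-right difference (on 221 df). On that convention both are twice the usual per-side mean squares, which leaves their ratio unchanged. The single fly with a total of 17 chaetae is the count implied by the table's total of 221 and mean of 20.787. The function names are illustrative.

```python
# Frequencies from Table 1 (221 Samarkand females).
SUM_OF_SIDES = {17: 1, 18: 11, 19: 31, 20: 55, 21: 55, 22: 36, 23: 25, 24: 7}
DIFFERENCE_BETWEEN_SIDES = {0: 61, 1: 96, 2: 49, 3: 13, 4: 2}


def mean_of(freq):
    """Mean of a frequency distribution given as {value: count}."""
    return sum(v * c for v, c in freq.items()) / sum(freq.values())


def table1_analysis(sum_freq, diff_freq):
    n = sum(sum_freq.values())
    if n != sum(diff_freq.values()):
        raise ValueError("both distributions must cover the same flies")
    mean_sum = mean_of(sum_freq)
    # Between flies: variance of the two-side totals, on n - 1 df.
    ms_between = sum(c * (v - mean_sum) ** 2 for v, c in sum_freq.items()) / (n - 1)
    # Within flies: mean squared left-right difference, on n df (one per fly).
    ms_within = sum(c * d * d for d, c in diff_freq.items()) / n
    return {
        "mean_chaeta_no": mean_sum,
        "mean_difference": mean_of(diff_freq),
        "ms_between": ms_between,
        "ms_within": ms_within,
        "ratio": ms_within / ms_between,
    }


result = table1_analysis(SUM_OF_SIDES, DIFFERENCE_BETWEEN_SIDES)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

This gives a mean of 20.7873, a mean difference of 1.0905 and a ratio of 0.9089, reproducing the 91 % of the table. The two mean squares come out as 2.1955 and 1.9955, within one unit in the third decimal place of the printed 2.196 and 1.996, presumably a matter of rounding.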
TABLE 1.
Non-heritable variation for sternopleural chaeta number between and within females
of the Samarkand inbred line of Drosophila melanogaster.

Chaeta no. (sum of sides)    17   18   19   20   21   22   23   24   Total
Number of flies               1   11   31   55   55   36   25    7     221
Mean chaeta no. 20.787

Difference between sides      0    1    2    3    4   Total
Number of flies              61   96   49   13    2     221
Mean difference 1.090

Analysis of variance               df      MS
Between flies                     220   2.196
Within flies (= between sides)    221   1.996

1.996/2.196 = 91 % of the variation between flies is a reflection of the
developmental variation within flies.

Turning now to the heritable component of the variation, it was
observed by Galton that not only was there a correlation between parent
and offspring in their manifestations of the character he was observing
(usually some morphological character, like stature in man); but that the
correlation was the same between male parents and their offspring as it
was between female parents and their offspring. This strongly suggests
that both parents contribute equally to the heredity of their offspring
as reflected in the variation under observation, in other words that the
hereditary element is transmitted equilinearly from the two parents in
continuous variation just as it is with Mendelian genes. This equilinearity
of transmission has been confirmed time and time again in experiments
where reciprocal crosses made between two parents have produced fam-
ilies which, apart from differences attributable to sampling variation,
were alike in their mean expressions of the character and in their vari-
ances also. Reciprocal differences are seen no more commonly in the
study of continuous variation than in any other kind of genetical inves-
tigation, and when they do appear it is chiefly where the study of
Mendelian genes warns us to expect them to appear, for example where
the unequal transmission of sex chromosomes might be expected to be
involved.
The equilinearity of relationship between parent and offspring gives a
strong presumption that the heritable element of continuous variation
reflects the effects of genes transmitted in the same way as Mendelian
genes, that is by the chromosomes, but acting in some way to produce
this quantitative type of variation. That this is indeed the case has been
amply demonstrated by experiments, particularly in Drosophila melano-
gaster where the experimental analysis can be taken further than in other
species. In this fly, inversions are available in each of the three major
chromosomes (X, II and III) which largely, although in most cases not
entirely, suppress recombination of genes between the inverted chromo-
some and its normal wild-type counterpart in heterozygous females.
These inversion chromosomes can also be marked by dominant mutants.
The marker genes make it possible to follow the marked chromosomes
from one generation to another, and the inversions ensure that the
marked chromosomes are transmitted as units largely free from genic
erosion by recombination when they are kept heterozygous with their
normal counterparts. In consequence these chromosomes are of great
use in a variety of ways for analysing genetical differences.
Mather and Harrison (1949) had twelve lines, all wild-type, but
ranging from 36.00 to 70.25 in their average numbers of the abdominal
chaetae borne on the ventral surfaces, or sternites, of the 4th and 5th
abdominal segments - a character which shows quasi-continuous vari-
ation like that of the sternopleural chaetae. They crossed each of their
lines to a tester stock which carried inversions in all three major chromo-
somes, each of which was marked by a dominant gene, the X by Bar eye-
shape (B), II by Plum eye-colour (Pm) and III by Stubble bristles (Sb).
The abdominal chaetae numbers were determined for the F1 female flies
that were heterozygous for the B, Pm and Sb chromosomes. These F1
chaetae numbers differed, of course, from those of the parent wild-type
lines, because unlike the parents they were not homozygous for the wild-
type chromosomes but heterozygous for the wild-type and the marked
chromosomes. The B, Pm, Sb F1 females were then back-crossed each to
its wild-type parent line. The resulting families contained eight classes of
daughters, distinguishable by the segregation of the X chromosome
marked by B, II marked by Pm and III marked by Sb. We need, however,
note only that, apart from the effects of any recombination the inver-
sions had failed to suppress and from the effects of the small chromo-
some IV which was not followed in the experiment, the wild-type daugh-
ters would be genetically like the original parent line, since they were
carrying none of the marked chromosomes, while the B, Pm, Sb progeny
would be like the F1, heterozygous for the marked and wild-type
homologues of X, II and III. The chaetae numbers of these classes were also
determined.
[Fig. 2: difference recovered (vertical axis) plotted against original difference (horizontal axis) for the twelve lines.]

Fig. 2. Mather and Harrison's (1949) data relating the genetical component
of variation for the number of abdominal chaetae in Drosophila melanogaster
to the chromosomes. The slope of the regression shows that 81 % of the vari-
ation in chaeta number is unambiguously ascribable to genes borne by the
three major chromosomes, which on allowing for genes which the experiment
could not be expected to pick up accords with all the heritable variation
being mediated by nuclear genes.

The differences in chaetae number (y-ordinate) between the two
classes in the back-cross progenies from the twelve lines are plotted in
Fig. 2 against the differences in chaeta number (x-abscissa) between the
parent lines and their respective F1s. A negative value of x indicates that
the parent line had fewer chaetae than its F1, and a positive value that it
had more. Negative and positive values appeared of course when wild-type
lines with low and high chaetae numbers respectively were compared with
their F1s, and the size of the difference reflects the heritable
contributions of the wild-type chromosomes since the marked chromo-
somes were the same in all the crosses. A negative value of y indicates a
similar shortage of chaetae among the wild-type progeny in the back-cross
by comparison with their B, Pm, Sb sisters, and a positive value indicates
a corresponding excess. There is a direct relation between y and x, the
regression of y on x being 0.8073. This means that for every difference
of one chaeta between parent and Fl' a difference of 0.8 of a chaeta was
recovered in the back-cross.
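The slope quoted here is the ordinary least-squares regression coefficient of y on x, b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)². A minimal Python sketch follows. Mather and Harrison's twelve individual points are not given in this section, so the values below are hypothetical, constructed so that exactly four-fifths of each original difference is recovered. They are not the experimental data, and the function name is illustrative.

```python
def regression_slope(x, y):
    """Least-squares regression coefficient of y on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two (x, y) pairs")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical data: original parent-minus-F1 differences (x) and the
# differences recovered between wild-type and B, Pm, Sb back-cross sisters (y).
original = [-6.0, -3.0, 0.0, 4.0, 8.0, 12.0]
recovered = [-4.8, -2.4, 0.0, 3.2, 6.4, 9.6]

b = regression_slope(original, recovered)
print(f"slope = {b:.3f}; about {100 * b:.0f}% of each difference is recovered")
```

With real data the points scatter about the line, and b then summarizes the average proportion recovered, as the 0.8073 of the experiment does.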
The implications of these results are clear. First, hereditary elements
mediating the continuous variation of abdominal chaeta number must
segregate just like Mendelian genes since the differences between the
parents and F1 reappear within the back-cross families. Secondly, since
these differences reappear between just those classes whose chromosome
constitutions are like those of parents and F1, the hereditary units
in question must be borne in the chromosomes. Cytoplasmic units
cannot in any case be involved in the recovered differences as the two
types of back-cross fly were from the same mother in each case. Thirdly
and finally, since 81 % of the parent-F1 difference was recovered on average,
genes carried by these three major chromosomes must be responsible for a
minimum of 81 % of the heritable differences in abdominal
chaetae number among the parent lines. We should however recall that
inversions do not always fully suppress recombination, and in these
experiments the inversions used in chromosome III would probably
suppress recombination in only one of the two arms of the chromosome,
while that in the X would allow some recombination in the centre of the
chromosome. At the same time we should bear in mind that the small
chromosome IV was not controlled, and any difference due to its genes
would not be recovered consistently between the wild-type and B, Pm,
Sb progeny in the back-cross. The recovery of 81 % of the differences
actually achieved, therefore, makes it very likely that all the hereditary
determinants of the variations in abdominal chaeta number are carried
by the chromosomes. In other words the hereditary element in continu-
ous variation springs from genes borne on the chromosomes in just the
same way as the genes familiar from Mendelian analysis.

3. Assaying the chromosomes


Marked chromosomes can also be used to build up homozygous lines
which carry the three major chromosomes from any two wild-type
stocks in all the eight possible combinations. The wild-type stocks are
crossed with that carrying the marked chromosomes, and the wild-type
chromosomes are carried heterozygous against their marked homologues
until they have been brought together in each of the eight combinations.
Similar heterozygotes are then mated together and their wild-type
progeny, which will be true breeding for the relevant combination of
wild-type chromosomes, are used as the foundation of the desired line,
the marked chromosomes being thus eliminated at the last stage in the
construction of each line. Caligari and Mather (1975) have used this
approach in the analysis of the differences in sternopleural chaeta num-
ber between two inbred lines, Samarkand (Sam) and Wellington (Well).
Denoting the Well chromosomes X, II and III by WWW and those of
Sam correspondingly by SSS, the eight homozygous lines WWW, WWS,
WSW, WSS, SWW, SWS, SSW and SSS were built up using appropriate
marked chromosomes, WWW and SSS being of course reconstructions
of the Well and Sam lines from which the chromosomes were originally
taken. The extent to which the WWW and SSS lines differ from Well and
Sam is a measure of the recombination that went on between the wild-type and the marked chromosomes during the construction of the eight
lines and also, of course, of any effect of the small chromosome IV
which was again not controlled in the experiment.
As part of a larger experiment Caligari and Mather raised these eight
lines, together with Well and Sam, in three types of culture container at
a temperature of 21.5°C. All cultures were replicated so as to yield an
estimate of error variation. The three types of culture container differed
a little in the mean numbers of chaetae borne by the flies they yielded,
but there was no evidence that the eight lines reacted differentially to
these effects of the containers and the results have therefore been aver-
aged over containers as well as over replicate observations. The means of
the eight lines are shown in Table 2. The first point to note is that SSS
exceeded WWW by an average of 19.717 - 18.350 = 1.367 chaetae,
whereas Sam exceeded Well by 1.908 chaetae. There has thus been a
72% recovery of the parental difference in the reconstituted SSS and
WWW lines. Now during the construction of the eight lines every wild-
type chromosome was kept heterozygous with its marked homologue
for at least four generations and so had at least four opportunities of
recombining with it, by contrast with the single opportunity for recom-
bination in the experiment of Mather and Harrison discussed above. So
despite the use of marked chromosomes more effective in their sup-
pression of recombination than those of Mather and Harrison, the
greater number of opportunities for recombination has resulted in
some loss of the parental difference; but again it was a sufficiently
small reduction to be consonant with the hereditary element in the
variation of sternopleural chaeta number arising from genes borne on
the chromosomes.
We can, however, take the analysis further. Since all combinations of
the X, II and III chromosomes from Well and Sam are present equally in
the eight lines, we can obtain estimates of the effects on chaeta number
of the gene differences in each of the three chromosomes, by the use of
12 The genetical foundation
TABLE 2.
Sternopleural chaeta numbers in the eight substitution lines from the inbred stocks Samarkand and Wellington of Drosophila melanogaster raised at 21.5°C

Difference in chaeta number:   Sam - Well = 20.650 - 18.742 = 1.908
                               SSS - WWW  = 19.717 - 18.350 = 1.367

Substitution  Mean chaeta number
line          Observed  Expected 1  O-E1    Expected 2      O-E2
WWW           18.350    18.296       0.054  18.394          -0.044
                        (m-dx-d2-d3)        (m-dx-d2W-d3W)
WWS           18.925    19.613      -0.688  18.881           0.044
                        (m-dx-d2+d3)        (m-dx-d2W+d3W)
WSW           18.625    17.850       0.775  18.581           0.044
                        (m-dx+d2-d3)        (m-dx+d2W-d3W)
WSS           19.025    19.167      -0.142  19.069          -0.044
                        (m-dx+d2+d3)        (m-dx+d2W+d3W)
SWW           18.650    18.800      -0.150  18.702          -0.052
                        (m+dx-d2-d3)        (m+dx-d2S-d3S)
SWS           20.900    20.117       0.783  20.848           0.052
                        (m+dx-d2+d3)        (m+dx-d2S+d3S)
SSW           17.675    18.354      -0.679  17.623           0.052
                        (m+dx+d2-d3)        (m+dx+d2S-d3S)
SSS           19.717    19.671       0.046  19.769          -0.052
                        (m+dx+d2+d3)        (m+dx+d2S+d3S)

Overall  m  = 18.983
         dx =  0.2521
         d2 = -0.2229    (standard error of each d: ± 0.0618)
         d3 =  0.6583

With X chromosome from   Sam                Well
                         d2S = -0.5396      d2W = 0.0938
                         d3S =  1.0729      d3W = 0.2438
                         (standard error of each: ± 0.0874)

orthogonal functions such as are employed in the analysis of variance (see Mather, 1967). The effect of the X chromosome, for example, can be found as

$d_x = \tfrac{1}{8}(SSS + SSW + SWS + SWW - WSS - WSW - WWS - WWW).$

Substituting the observed line means from Table 2 we then find

$d_x = \tfrac{1}{8}(19.717 + 17.675 + 20.900 + 18.650 - 19.025 - 18.625 - 18.925 - 18.350) = 0.2521,$

which means that any line carrying the Sam X
chromosome will exceed the overall mean of the experiment (m =
18.9833) by 0.2521 chaeta because of this chromosome, while any line
carrying the Well X will similarly fall short of the overall mean by
0.2521 chaeta. The effects, d2 and d3, of chromosomes II and III can be found similarly, using the appropriate functions, and are shown just below the main body of Table 2, together with the relevant standard error based on the estimate of error variance obtained from the replication of the observations referred to above. This standard error applies to the estimates of effect of all three chromosomes, and all three d's are significant. Thus all three chromosomes must be carrying genes influencing the average number of sternopleural chaetae. It will be observed too that d2 has a negative sign, whereas both dx and d3 are positive. Now in finding dx we gave lines carrying the Sam X a positive sign and those carrying the Well X a negative one. Thus a positive value for dx means that the Sam X mediated a higher chaeta number than the Well X. Similarly the positive value for d3 means that the Sam III chromosome gives a higher chaeta number than Well III. The negative value for d2 means, however, that the Sam II chromosome gives a lower chaeta number than Well II. So the X and III chromosomes in the parental lines are reinforcing each other in their effects on chaeta number, but the II chromosome is acting in the opposing direction.
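To make the orthogonal contrasts concrete, the short Python sketch below recomputes m, dx, d2 and d3 from the eight line means of Table 2 and builds the additive expectations (Expected 1). It is an illustrative sketch, not the authors' own procedure; the names `LINE_MEANS`, `effect` and `expected_additive` are hypothetical, and because it starts from the three-decimal means it can reproduce the printed estimates only to within rounding in the last place.

```python
# Line means from Table 2. The three letters give the origin
# (S = Samarkand, W = Wellington) of chromosomes X, II and III.
LINE_MEANS = {
    "WWW": 18.350, "WWS": 18.925, "WSW": 18.625, "WSS": 19.025,
    "SWW": 18.650, "SWS": 20.900, "SSW": 17.675, "SSS": 19.717,
}


def sign(origin: str) -> int:
    """+1 for a Sam chromosome, -1 for a Well chromosome."""
    return 1 if origin == "S" else -1


def effect(position: int) -> float:
    """Orthogonal contrast for one chromosome (0 = X, 1 = II, 2 = III):
    lines carrying the Sam chromosome minus those carrying Well, over 8."""
    return sum(sign(line[position]) * mean
               for line, mean in LINE_MEANS.items()) / len(LINE_MEANS)


m = sum(LINE_MEANS.values()) / len(LINE_MEANS)
dx, d2, d3 = effect(0), effect(1), effect(2)


def expected_additive(line: str) -> float:
    """Expectation assuming no interaction: m +/- dx +/- d2 +/- d3."""
    return (m + sign(line[0]) * dx + sign(line[1]) * d2
            + sign(line[2]) * d3)


print(f"m = {m:.4f}  dx = {dx:.4f}  d2 = {d2:.4f}  d3 = {d3:.4f}")
for line, observed in LINE_MEANS.items():
    e1 = expected_additive(line)
    print(f"{line}  O = {observed:.3f}  E1 = {e1:.3f}  O-E1 = {observed - e1:+.3f}")
```

Each contrast weights every line by +1 or -1, so the three effects are estimated independently of one another, which is what makes the additive expectation a simple sum.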
We can construct expected values for the average chaeta number of each line from the overall mean of the experiment, m, and dx, d2 and d3. Thus the expected value for SSS is

$m + d_x + d_2 + d_3 = 18.9833 + 0.2521 + (-0.2229) + 0.6583 = 19.671.$

The expectations are shown in column
three of Table 2 and the differences between them and the observed
means in column 4. There is broad overall agreement with expectation
and in some lines, notably SSS and WWW, the observed and expected
values of the means agree well, but in other cases, notably WWS, WSW,
SWS and SSW, the numerical agreement is not nearly so good. The
reason for this is to be seen if we look at the way the chromosomes
combine together to produce their effects. Chromosomes II and III show
no influence on one another's effects on chaeta number: the difference
in effect between the Sam II chromosome and Well II is the same no
matter which chromosome III they are with, and vice versa. In other
words the effects of these chromosomes simply add on to one another,
and we can arrive at the joint effect of either II with either III by sum-
ming their individual effects, taking the sign into account, of course.
Thus the deviation from the mean resulting from the combination Well II and Well III is -d2 - d3, that from WS is -d2 + d3, that from SW is d2 - d3 and that from SS is d2 + d3. Effects summing in this way are said to be additive, or in the statistician's terminology they show no interaction.
The situation is different, however, when we look at the effects of these two chromosomes in relation to that of the X, for neither II nor III produces as big a difference when present with the Well X as it does when present with the Sam X. The effects of chromosomes II and III in the presence of each of the two X's are shown at the bottom of Table 2, d2S and d3S being the effects of II and III respectively with Sam X, and d2W and d3W their effects with Well X. The value of d2S is, of course, found as

$d_{2S} = \tfrac{1}{4}(SSS + SSW - SWS - SWW) = \tfrac{1}{4}(19.717 + 17.675 - 20.900 - 18.650) = -0.5396,$

and so on. The effect of chromosome II does not differ significantly from zero when it is with Well X but is quite large with Sam X. This relation is like that termed epistatic in classical genetics (see Fig. 38, Darlington and Mather, 1949). Chromosome III produces an effect even in the presence of Well X but it is only a quarter of that produced in the presence of Sam X. It should be observed, however, that this influence of the X on the effects of II and III does not alter the additive relations holding between II and III themselves: d2S and d3S are additive just as are d2W and d3W. The mean numbers of chaetae expected
for the eight lines, allowing for the interactions of X with II and III by
the use of d2S etc., are shown in the fifth column of Table 2, and the
differences between these and the means observed are given in the sixth
column. These differences are now quite small in every case: there is good agreement between observation and expectation. When allowance is made for the ways in which the genes in the different chromosomes interact with one another in producing their effects (interactions which, it should be observed, have their counterparts in classical genetics) the variation in sternopleural chaeta number is accountable in terms of these genes.
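The interaction model can be sketched in the same way: estimate the II and III effects separately within the four lines carrying each X, then rebuild the expectations (Expected 2). Again this is a minimal sketch with hypothetical names; since it starts from the rounded line means of Table 2, d2S and d3S may differ from the printed values by a unit or so in the fourth decimal place.

```python
LINE_MEANS = {
    "WWW": 18.350, "WWS": 18.925, "WSW": 18.625, "WSS": 19.025,
    "SWW": 18.650, "SWS": 20.900, "SSW": 17.675, "SSS": 19.717,
}


def sign(origin: str) -> int:
    """+1 for a Sam chromosome, -1 for a Well chromosome."""
    return 1 if origin == "S" else -1


def within_x_effect(x_origin: str, position: int) -> float:
    """Effect of chromosome II (position 1) or III (position 2) estimated
    only from the four lines carrying the X from x_origin ('S' or 'W')."""
    subset = {k: v for k, v in LINE_MEANS.items() if k[0] == x_origin}
    return sum(sign(k[position]) * v for k, v in subset.items()) / len(subset)


m = sum(LINE_MEANS.values()) / len(LINE_MEANS)
dx = sum(sign(k[0]) * v for k, v in LINE_MEANS.items()) / len(LINE_MEANS)
# cond["S"] = (d2S, d3S); cond["W"] = (d2W, d3W)
cond = {x: (within_x_effect(x, 1), within_x_effect(x, 2)) for x in "SW"}


def expected_with_interaction(line: str) -> float:
    """m +/- dx plus the II and III effects appropriate to this line's X."""
    d2_x, d3_x = cond[line[0]]
    return (m + sign(line[0]) * dx + sign(line[1]) * d2_x
            + sign(line[2]) * d3_x)


for x, (d2_x, d3_x) in cond.items():
    print(f"X from {x}: d2 = {d2_x:+.4f}, d3 = {d3_x:+.4f}")
for line, observed in LINE_MEANS.items():
    e2 = expected_with_interaction(line)
    print(f"{line}  E2 = {e2:.3f}  O-E2 = {observed - e2:+.3f}")
```

Within each X class the four residuals O - E2 come out equal in magnitude, because once the X-specific mean and the two within-X effects are fitted, only the II × III interaction contrast remains; this matches the ±0.044 and ±0.052 columns of Table 2.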

4. Locating the genes


The foregoing experiment has led us to recognize that each of the three
chromosomes whose effects have been assayed carried one or more genes
affecting the number of sternopleural chaetae. Since, however, inversions
have been used in the marked chromosomes to suppress recombination
as far as possible, each chromosome has behaved as a unit in hereditary
transmission and we cannot tell whether its effect was due to only a
single gene difference or to more than one, and if to more than one, how
these were distributed along the chromosome. In order to take the analy-
sis further we must turn to a different procedure, which is in essence anal-
ogous to the three-point experiment by which in classical genetics the
locus of a third gene is ascertained in relation to two genes of known loci.
Two marker genes, capable of being recognized and followed in trans-
mission by the familiar methods of classical genetics and of known pos-
itions in the linkage map, are used to provide the base line for mapping
the third gene which is of unknown location and, because it contributes
to continuous variation, is not capable - or at least not readily capable -
of being followed by a classical methodology.
Consider the situation where a gene A-a, which contributes to continu-
ous variation, is segregating at the same time as two marker genes, G-g
and H-h. It should be noted that this time the marker genes are not associ-
ated with any inversion, since unhampered recombination is essential in a
three-point experiment, being in fact the means by which the location of
the new gene is ascertained on the genetic map. There are two situations
possible, in the first of which A-a lies between G-g and H-h as shown in
the upper part of Fig. 3, and the second in which A-a lies outside the
length of chromosome delimited by G-g and H-h as shown in the lower
part of Fig. 3. Let the frequency of recombination between A-a and G-g be p1, that between A-a and H-h be p2, and that between G-g and H-h be p3. We will assume that the map distances between the genes are sufficiently small for interference to be complete, i.e. for there to be no
double crossing-over within the length of chromosome we are discussing.
Each of the recombination values is thus also the frequency of crossing-
over.
Let us deal first with the situation where A-a is between G-g and H-h
[Fig. 3 (upper)]. Since interference is complete, p3 = p1 + p2. The triple
heterozygote GAH/gah will produce six types of gamete with the fre-
quencies shown in the figure, gametes of types GaH and gAh not being
produced because of the absence of double crossing-over. If the triple
heterozygote is back-crossed to gah/gah, zygotes of the six correspond-
ing types (GAH/gah, gah/gah, etc.) will be produced with corresponding
frequencies. These six genotypes fall into four classes distinguishable by
the segregation of the marker genes, namely GH/gh, Gh/gh, gH/gh
and gh/gh, but we cannot distinguish between A/a and a/a in the same
way since this gene contributes to continuous variation and its segre-
gation is obscured by non-heritable variation and by the effects of any
other genes which contribute to the variation of the character and which may also be segregating. We can, however, record the average expression of the character in each of the classes GH/gh, Gh/gh, gH/gh and gh/gh. Let Aa add an increment d and aa an increment of -d to the mean expression.

[Fig. 3 diagram. (a) Gene order G A H; gametes of the triple heterozygote GAH/gah: GAH and gah, 1 - p3 in total; GAh and gaH, p2; Gah and gAH, p1; GaH and gAh, 0. (b) Gene order A G H; gametes of AGH/agh: AGH and agh, 1 - p2; AGh and agH, p3; Agh and aGH, p1; AgH and aGh, 0.]

Fig. 3. Locating a gene difference (A-a) affecting continuous variation by reference to two marker genes (G-g and H-h). The gametic output of the triple heterozygote is shown, above, where A-a lies between G-g and H-h, and below, where A-a lies outside the segment delimited by G-g and H-h. p1 is the frequency of recombination between A-a and G-g; p2 that between A-a and H-h; and p3 that between G-g and H-h. Interference is assumed to be complete.
The four classes distinguished by the marker genes are shown in
Table 3, together with the frequencies in which they occur associated
with A and a respectively, and also their overall frequencies in the
progeny, which must of course depend on p3. (Note that only the genes
which these genotypes received from the triple heterozygotes, and by
which they are distinguished, are shown in the Table: all individuals
received gah from the other parent.) Now all individuals in the marker
class GH carry A, and hence will show a mean expression of m+d in
respect of the continuously varying character, where m is the mean expression of the whole experiment. Similarly the gh class always carries a and so has a mean of m - d. The Gh class comprises two genotypes: GAh with a frequency of ½p2 and an expression m + d, and Gah with a frequency of ½p1 and an expression m - d. The mean of the Gh individuals, expressed as a deviation from m, will thus be

$\dfrac{\tfrac{1}{2}p_2 d - \tfrac{1}{2}p_1 d}{\tfrac{1}{2}p_2 + \tfrac{1}{2}p_1} = \dfrac{d(p_2 - p_1)}{p_1 + p_2} = \dfrac{d(p_2 - p_1)}{p_3}.$

The mean of the gH marker class is similarly $-d(p_2 - p_1)/p_3$. Now writing GH for the mean expression of marker class GH (again as a deviation from m), we can see that

$\tfrac{1}{2}(GH - gh) = \tfrac{1}{2}[d - (-d)] = d,$

and

$\tfrac{1}{2}(Gh - gH) = \tfrac{1}{2}d\left[\dfrac{p_2 - p_1}{p_3} - \left(-\dfrac{p_2 - p_1}{p_3}\right)\right] = \dfrac{d(p_2 - p_1)}{p_3}.$

In addition, p3 can be found from the frequencies of the four marker classes, and we can thus obtain estimates of d, p1, p2 and p3.

TABLE 3.
Locating a gene between the markers
(Observed results from Wolstenholme and Thoday, 1963)

              Frequency
Marker class  A          a          Joint      Mean            Observed
GH            ½(1-p3)    0          ½(1-p3)    d               21.16
Gh            ½p2        ½p1        ½p3        d(p2-p1)/p3     19.59
gH            ½p1        ½p2        ½p3        -d(p2-p1)/p3    18.86
gh            0          ½(1-p3)    ½(1-p3)    -d              17.86

½(GH - gh) = d = 1.650                d  = 1.650
½(Gh - gH) = d(p2-p1)/p3 = 0.365      p1 = 0.050
p1 + p2 = p3 = 0.129                  p2 = 0.079
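The derivation of Table 3 can be checked with exact rational arithmetic: the sketch below builds the marker-class frequencies and means from assumed values of d, p1 and p2, then applies the estimators ½(GH - gh) and ½(Gh - gH) to recover them. It assumes complete interference and a back-cross to gah/gah, as in the text; the function names and numerical inputs are hypothetical, chosen only for illustration.

```python
from fractions import Fraction as F


def between_markers(d, p1, p2):
    """Table 3 model: A-a lies between G-g and H-h, no double crossing-over.
    Returns the joint frequency of each marker class and its mean expressed
    as a deviation from m (A contributes +d, a contributes -d)."""
    p3 = p1 + p2
    half = F(1, 2)
    # (frequency carrying A, frequency carrying a) for each marker class
    freq = {
        "GH": (half * (1 - p3), F(0)),
        "Gh": (half * p2, half * p1),
        "gH": (half * p1, half * p2),
        "gh": (F(0), half * (1 - p3)),
    }
    joint = {k: f_A + f_a for k, (f_A, f_a) in freq.items()}
    means = {k: (f_A - f_a) * d / (f_A + f_a) for k, (f_A, f_a) in freq.items()}
    return joint, means


def estimate_between(means, p3):
    """Invert the Table 3 expectations; returns (d, p1, p2)."""
    d = (means["GH"] - means["gh"]) / 2
    p2_minus_p1 = (means["Gh"] - means["gH"]) / 2 * p3 / d
    return d, (p3 - p2_minus_p1) / 2, (p3 + p2_minus_p1) / 2


# Round trip with illustrative values (not data from the text).
d, p1, p2 = F(33, 20), F(1, 20), F(2, 25)
joint, means = between_markers(d, p1, p2)
p3_hat = joint["Gh"] + joint["gH"]   # recombinant marker classes
print(p3_hat, estimate_between(means, p3_hat))
```

Using `Fraction` rather than floats means the round trip returns the inputs exactly, so any slip in the algebra would show up as a mismatch rather than a rounding difference.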
Turning now to the second situation where A-a lies outside the piece of chromosome delimited by G-g and H-h, the types of gamete produced by the triple heterozygotes are shown, together with their frequencies, in Fig. 3 (lower). Table 4 is obtained from Fig. 3 (lower) in the same way as Table 3 was from Fig. 3 (upper). Again, of course, p3 can be found from the frequencies of occurrence of the four marker classes, but we see that the estimate of d is yielded not by ½(GH - gh) but from the recombinant marker classes as ½(Gh - gH). The difference between the parental marker classes GH and gh provides an estimate of p1 + p2, since

$\tfrac{1}{2}(GH - gh) = \dfrac{d(1 - p_1 - p_2)}{1 - p_3},$

from which p1 + p2 can be found, as we already have estimates of d and p3. Now when A-a is to the left of G-g, p2 = p1 + p3, giving p3 = p2 - p1. Then p1 and p2 can be estimated as ½(p1 + p2 - p3) and ½(p1 + p2 + p3) respectively.

TABLE 4.
Locating a gene outside the markers

              Frequency
Marker class  A          a          Joint      Mean
GH            ½(1-p2)    ½p1        ½(1-p3)    d(1-p1-p2)/(1-p3)
Gh            ½p3        0          ½p3        d
gH            0          ½p3        ½p3        -d
gh            ½p1        ½(1-p2)    ½(1-p3)    -d(1-p1-p2)/(1-p3)

½(Gh - gH) = d
½(GH - gh) = d(1-p1-p2)/(1-p3)
p1 + p3 = p2
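The same kind of exact check applies to Table 4. The sketch assumes the gene order A G H (A-a to the left of G-g), complete interference and a back-cross to agh/agh; the function names and input values are hypothetical.

```python
from fractions import Fraction as F


def outside_markers(d, p1, p3):
    """Table 4 model: gene order A, G, H with no double crossing-over,
    so p2 = p1 + p3. Returns joint frequencies and class means as
    deviations from m."""
    p2 = p1 + p3
    half = F(1, 2)
    freq = {  # (frequency carrying A, frequency carrying a)
        "GH": (half * (1 - p2), half * p1),
        "Gh": (half * p3, F(0)),
        "gH": (F(0), half * p3),
        "gh": (half * p1, half * (1 - p2)),
    }
    joint = {k: f_A + f_a for k, (f_A, f_a) in freq.items()}
    means = {k: (f_A - f_a) * d / (f_A + f_a) for k, (f_A, f_a) in freq.items()}
    return joint, means


def estimate_outside(joint, means):
    """Recover (d, p1, p2) from marker-class frequencies and means."""
    p3 = joint["Gh"] + joint["gH"]
    d = (means["Gh"] - means["gH"]) / 2
    # 1/2(GH - gh) = d(1 - p1 - p2)/(1 - p3), solved for p1 + p2
    p1_plus_p2 = 1 - (means["GH"] - means["gh"]) / 2 / d * (1 - p3)
    return d, (p1_plus_p2 - p3) / 2, (p1_plus_p2 + p3) / 2


d, p1, p3 = F(3, 2), F(1, 25), F(1, 10)   # illustrative values only
joint, means = outside_markers(d, p1, p3)
print(estimate_outside(joint, means))     # recovers d, p1 and p2 = p1 + p3
```

Here the recombinant classes Gh and gH are the pure ones, which is why d comes from them while the parental classes carry the information on p1 + p2.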
We can illustrate this method of locating a gene contributing to continuous variation by reference to data from Wolstenholme and Thoday (1963). These authors report a number of such experiments in Drosophila melanogaster, and the results of one of these experiments are set out in the right-hand column of Table 3. The continuously varying character is the number of sternopleural chaetae, while the marker genes are clipped wing (cp) and Stubble bristles (Sb), which are located respectively at 45.3 and 58.2 on the standard map of chromosome III. The average numbers of chaetae for the four marker classes are shown in Table 3, but the authors do not report the frequencies of these classes. A direct estimate of p3 is thus not available from this experiment, but the marker genes are 12.9 units apart on the standard map, and p3 may therefore be taken as 0.129.
The first thing to note is that the GH class has the greatest mean number of chaetae and gh the lowest. The gene affecting chaeta number (A-a) must thus lie between the two markers: had it been outside, the Gh and gH classes would have shown the extreme mean chaeta numbers (see Table 4). We then proceed, using the formulae of Table 3, to find

$d = \tfrac{1}{2}(GH - gh) = \tfrac{1}{2}(21.16 - 17.86) = 1.650$

and

$d(p_2 - p_1)/p_3 = \tfrac{1}{2}(Gh - gH) = \tfrac{1}{2}(19.59 - 18.86) = 0.365,$

giving

$p_2 - p_1 = \dfrac{0.365 \times 0.129}{1.650} = 0.0285.$

With p1 + p2 = p3 we then find

$p_1 = \tfrac{1}{2}(0.129 - 0.0285) = 0.050 \quad\text{and}\quad p_2 = \tfrac{1}{2}(0.129 + 0.0285) = 0.079.$

The experiment thus places the locus of A-a at 0.05 × 100 = 5.0 units to the right of cp and 7.9 units to the left of Sb, that is at locus 50.3 on the standard map of chromosome III.
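The arithmetic of this example can be reproduced directly from the four marker-class means quoted in Table 3. This is a sketch with illustrative variable names; it assumes, as the text does, that p3 is taken from the 12.9-unit standard map distance rather than estimated from class frequencies.

```python
# Observed mean sternopleural chaeta numbers (Table 3, Wolstenholme and
# Thoday, 1963), with G-g and H-h the marker classes as in the Table.
means = {"GH": 21.16, "Gh": 19.59, "gH": 18.86, "gh": 17.86}
p3 = 0.129                    # from the 12.9-unit map distance, cp to Sb
cp_locus, sb_locus = 45.3, 58.2

# GH and gh are the extreme classes, so A-a lies between the markers.
assert max(means, key=means.get) == "GH" and min(means, key=means.get) == "gh"

d = (means["GH"] - means["gh"]) / 2
p2_minus_p1 = (means["Gh"] - means["gH"]) / 2 * p3 / d
p1 = (p3 - p2_minus_p1) / 2
p2 = (p3 + p2_minus_p1) / 2

locus_from_cp = cp_locus + 100 * p1
locus_from_sb = sb_locus - 100 * p2
print(f"d = {d:.3f}, p1 = {p1:.3f}, p2 = {p2:.3f}")
print(f"locus about {locus_from_cp:.1f} (from cp), {locus_from_sb:.1f} (from Sb)")
```

The two locus estimates agree by construction, since p1 + p2 is forced to equal p3 and the markers are 100 × p3 units apart.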
It has been assumed for the purpose of illustration that the effect on sternopleural chaeta number was ascribable to a single gene. In fact Wolstenholme and Thoday obtained evidence that two genes were most probably involved. They used in their analysis a technique, introduced by Thoday (1961), of using progeny tests to ascertain the number of classes genetically different in respect of chaeta number included in each of the marker classes. This method of Thoday's has been used by Davies (1971) to show that genes at a minimum of fifteen loci, scattered over the lengths of all three major chromosomes, are involved in the heritable variation of sternopleural chaeta number in Drosophila melanogaster, and that similarly at least fourteen or fifteen loci, not the same as those for sternopleural chaetae, are involved in the variation of abdominal chaeta number in this fly. Further evidence from other experiments of various kinds also indicates that the minimum number of gene loci involved in the variation of each of these two chaeta characters is likely to be nearer 20 than 10.
Summarizing, these experiments with Drosophila melanogaster show
us that the heritable component of the continuous (or to be more pre-
cise, quasi-continuous) variation in both abdominal and sternopleural
chaeta number depends on genes which are carried on the chromosomes
and which will therefore segregate and recombine in just the same way
as the familiar genes of classical genetics. Furthermore, within the technical limitations of the experiments, the whole of this heritable component is accountable in terms of such chromosome-borne genes. Differences in chaeta number may reflect the simultaneous action of genes carried on all three of the major chromosomes, and finer analysis reveals that at least some fourteen or fifteen loci must be involved.
The effects of the different genes supplement one another, their effects
sometimes combining in a simple additive fashion, but sometimes inter-
acting in such a way that the combined effect is not simply the sum of
the individual actions. At the same time, overlaying the variation due to
these genes is variation traceable to environmental agencies or to the
vagaries of development, variation which is distinguishable from that
due to the genes only by a breeding test. Finally the effects traceable to
individual genes, or even to whole chromosomes, may be no greater in
magnitude, and indeed may often be smaller than the effects of the non-
heritable agencies. In other words, as revealed in these experiments the
heritable portion of continuous variation depends on genes transmitted
in the Mendelian fashion, but acting in polygenic systems, the member
genes of a system having effects similar to one another (and to those of
non-heritable agencies), capable of supplementing one another (whether
in simply additive fashion or not) and small in relation to the non-
heritable variation, or at least in relation to the variation in the system
as a whole.
The biometrical approach

5. The manifestation of polygenic systems


The evidence that we have examined in the previous chapter showed that
continuous variation is partly heritable and partly non-heritable, the two
components being separable only by appropriate breeding tests. The non-
heritable component springs partly from the impact of differences in
external environmental agencies, but it may also reflect vagaries in the
internal development of the individuals. The heritable component of the
variation, as exemplified in the Drosophila experiments, depends on
genes at many loci scattered over all the chromosomes, but working
together in a polygenic system. Because its constituent genes have small, similar and supplementary effects on the phenotype, such a system characteristically gives rise to continuous variation, in which the effects of the individual genes cannot be traced except by using special techniques such as are available in well-studied species like Drosophila.
Polygenic systems have properties which are basic to our understand-
ing of the genetical structure of populations, their variation and their
responses to selection (Mather, 1973). These, however, are not our
present concern, which is the genetical analysis of the continuous vari-
ation that these systems characteristically produce.
A very simple example of a polygenic system and the variation it produces is illustrated at the top of Fig. 4. Two gene pairs are involved, A-a and B-b, the alleles denoted by capital letters each adding half a unit to the expression of the character, and those denoted by small letters each subtracting half a unit from it. It is assumed that these genes show no dominance, i.e. the expression of a heterozygote, Aa or Bb, is midway between those of the corresponding homozygotes, AA and aa or BB and bb. The effects of the genes at the two loci supplement one another in a simply additive fashion, and the alleles A and a are equally common, as are B and b. The genes at the two loci are assumed to be uncorrelated in
their distribution, so that the frequencies shown for the various genotypes are those which would be obtained in an F2 where the genes are unlinked.

[Fig. 4: four histograms of phenotypic frequency, headed "No dominance, equal frequencies" (top), dominance of one gene and of both genes (centre left and right), and "No dominance, unequal frequencies" (bottom).]

Fig. 4. The polygenic interpretation of continuous variation. The uppermost histogram shows the distribution of phenotypes with two genes of equal and additive effect, and without dominance, neglecting non-heritable variation. The frequencies of alleles A and a, and also of B and b, are equal. Each capital letter adds ½ and each small letter -½ to the phenotypic expression. The two histograms in the centre show the effect of dominance for, on the left, one gene and, on the right, both genes. Dominance is assumed to show itself by the gene denoted by the small letter having no effect when heterozygous with its allele denoted by the capital letter. The histogram at the bottom shows the effect of unequal gene frequencies: the frequencies of A and B are assumed to be ¾ and those of a and b to be ¼. The mean (x̄) and variance (V) are shown below each histogram. In these examples both dominance and unequal gene frequencies produce skewness in the distribution, besides altering the mean and variance.

The genic composition of the family is shown at the top of the
figure and the distribution of the phenotypes, in the absence of non-
heritable differences, is shown immediately below it. Because of the ab-
sence of dominance and the simple additivity in their effects of the non-
allelic genes, the phenotypic expression of any genotype is proportional
to the difference between the numbers of capital letters (denoting alleles
enhancing the character) and small letters (denoting alleles diminishing
the character). As a consequence certain genotypes give the same pheno-
type as one another, the most striking example being provided by AaBb,
AAbb and aaBB, which all contribute to the central and most common
phenotypic class. This similarity of the phenotypes associated with sev-
eral genotypes combines with the greater frequencies of certain geno-
types in the family to produce a frequency distribution in which the
central expression is the most common and the extreme expressions
most rare, as is characteristic of continuous variation. Since each gene
which enhances the character is matched by an equally common allele
which diminishes it, the distribution has a mean (x̄) of 0, a variance (V) of 1 and is symmetrical.
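The two-locus model at the top of Fig. 4 is small enough to enumerate exactly. A minimal sketch, assuming unlinked loci, F2 genotype frequencies, and effects of +½ and -½ per capital and small allele as stated in the caption of Fig. 4; the names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

# F2 with equal allele frequencies: at each locus the number of capital
# alleles is 2, 1 or 0 with probabilities 1/4, 1/2, 1/4.
LOCUS = {2: F(1, 4), 1: F(1, 2), 0: F(1, 4)}


def phenotype(capitals: int, n_loci: int = 2) -> F:
    """Each capital allele adds 1/2, each small allele subtracts 1/2."""
    smalls = 2 * n_loci - capitals
    return F(capitals - smalls, 2)


dist = {}
for caps_a, caps_b in product(LOCUS, repeat=2):   # two unlinked loci
    x = phenotype(caps_a + caps_b)
    dist[x] = dist.get(x, 0) + LOCUS[caps_a] * LOCUS[caps_b]

mean = sum(p * x for x, p in dist.items())
variance = sum(p * (x - mean) ** 2 for x, p in dist.items())
for x in sorted(dist):
    print(f"phenotype {x}: frequency {dist[x]}")
print("mean", mean, "variance", variance)
```

The central class collects AaBb, AAbb and aaBB, giving it the largest frequency, 3/8, exactly as described in the text.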
We can vary the assumptions on which the model is based. Suppose, for example, that we introduce dominance at one of the two loci, say A-a, such that Aa no longer falls on the mid-point between AA and aa but has a phenotype like that of AA. The genotypes occur with the same frequencies as before, but AaBB has a phenotype of 2 like AABB, AaBb joins AABb in having a phenotype of 1, and Aabb joins AAbb and aaBB in having a phenotype of 0, leaving aaBb and aabb with phenotypes of -1 and -2 respectively. The frequency distribution of phenotypes is thus changed to that shown in Fig. 4 (centre left). The mean has been raised from 0 to ½, the variance has increased to 1¼, and the distribution is now asymmetrical with the long tail at the lower end. Making both A and B dominant over their respective alleles changes the distribution even more. The mean has risen further to 1 and the variance to 1½, while the asymmetry is now so great that the extreme large phenotype is the most common and certain of the phenotypes have vanished altogether.
Let us now revert to the assumption of no dominance, but alter the
gene frequencies so that A and a, and B and b, are no longer equally common in the population. Let A occur with three times the frequency of a and B with three times that of b, or to put it another way, let the gene frequencies be A ¾, a ¼ and B ¾, b ¼. The genotypes will give the same phenotypes as in the original model at the top of Fig. 4, but they will occur with different frequencies. Thus the proportion of AABB individuals will be

$\tfrac{3}{4} \times \tfrac{3}{4} \times \tfrac{3}{4} \times \tfrac{3}{4} = \tfrac{81}{256},$

that of AaBB and AABb will each be

$2 \times \tfrac{3}{4} \times \tfrac{1}{4} \times \tfrac{3}{4} \times \tfrac{3}{4} = \tfrac{54}{256},$

and so on. The resulting frequency distribution
of phenotypes is shown at the bottom of Fig. 4. In some respects the
change in the distribution resembles that brought about by dominance:
the mean is again raised to 1 and the distribution is asymmetrical with
the long tail towards the lower end. This new distribution differs how-
ever from that produced by dominance in that the variance has not been
raised but in fact reduced from 1 to 3/4. Thus both the assumptions of
dominance and unequal gene frequencies result in change of the bio-
metrical properties of the distribution of phenotypes, and each produces
its own characteristic syndrome of changes.
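The same enumeration, generalized to allow complete dominance and unequal allele frequencies, reproduces the means and variances quoted for the remaining histograms of Fig. 4, and its third central moment is negative in the cases the text describes as having a long lower tail. A sketch, assuming unlinked loci and random-mating genotype frequencies q², 2q(1 - q) and (1 - q)² (the products such as ¾ × ¾ × ¾ × ¾ in the text imply this); the helper names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product


def locus_scores(freq_capital: F, dominant: bool) -> dict:
    """Score of one locus (AA = +1, Aa = 0, aa = -1, i.e. +/- 1/2 per allele)
    with random-mating frequencies q^2, 2q(1-q), (1-q)^2. With complete
    dominance the heterozygote scores like AA."""
    q = freq_capital
    het_score = 1 if dominant else 0
    out = {}
    for score, prob in ((1, q * q), (het_score, 2 * q * (1 - q)),
                        (-1, (1 - q) ** 2)):
        out[score] = out.get(score, 0) + prob   # merges Aa into AA if dominant
    return out


def moments(loci):
    """Mean, variance and third central moment of the summed score over
    independent (unlinked) loci."""
    dist = {}
    for combo in product(*(locus.items() for locus in loci)):
        x = sum(score for score, _ in combo)
        p = F(1)
        for _, prob in combo:
            p *= prob
        dist[x] = dist.get(x, 0) + p
    mean = sum(p * x for x, p in dist.items())
    var = sum(p * (x - mean) ** 2 for x, p in dist.items())
    third = sum(p * (x - mean) ** 3 for x, p in dist.items())
    return mean, var, third


half, three_quarters = F(1, 2), F(3, 4)
CASES = {
    "no dominance, equal frequencies": [locus_scores(half, False)] * 2,
    "A dominant": [locus_scores(half, True), locus_scores(half, False)],
    "A and B dominant": [locus_scores(half, True)] * 2,
    "no dominance, A and B at 3/4": [locus_scores(three_quarters, False)] * 2,
}
for label, loci in CASES.items():
    mean, var, third = moments(loci)
    print(f"{label}: mean {mean}, variance {var}, third moment {third}")
```

The contrast the text draws shows up directly: dominance and unequal frequencies both raise the mean and skew the distribution downwards, but dominance raises the variance while unequal frequencies lower it.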
Although broadly resembling the distribution of a continuously
varying character, the distributions in Fig. 4 differ from it in one import-
ant respect: they are not strictly continuous since the phenotypes fall
into a small number of discrete classes. This difference stems from three
simplifying assumptions that we have made in the models from which the frequency distributions of Fig. 4 are derived. In the first place we have assumed that the effects of A-a and B-b are alike: had we not made this assumption a larger number of phenotypes would have been possible. Secondly, we have assumed the absence of non-heritable variation: its presence would have blurred the boundaries of the phenotypic classes given by the various genotypes and caused them to overlap, so producing continuous variation. Thirdly, we have been considering a very simple polygenic system comprising only two gene pairs, which, when the actions of the two gene pairs are alike, produces only five phenotypic classes, non-heritable effects apart. The consequences of raising the number of gene pairs in the system are illustrated in Fig. 5. With four loci involved, there are nine phenotypic classes and with eight loci there are seventeen. Thus, given the same overall difference between the extreme phenotypes, the more genes there are in the system, the smaller is the step produced by each gene substitution and the more gene substitutions a given change requires. The result is a closer approximation
to continuous variation, and although in principle there are small dis-
continuities still present in the distribution of phenotypes, decreasing
amounts of non-heritable variation would serve to blur them and give
full continuity.
One further point should be observed about the distributions shown in Fig. 5. All of them are based on the assumptions of no dominance and equal frequencies of the two alleles at each locus. In consequence all the distributions are symmetrical and have means of 0. But the variances of the distributions decrease as the number of gene pairs increases, that with four gene pairs having half the variance of that with two, and that with eight gene pairs having a quarter of its variance. Again we can see how the genetic properties of the polygenic system are reflected characteristically in the biometrical properties of the frequency distribution of the phenotypes.

[Fig. 5: three histograms of phenotype (scale -2 to 2) for two, four and eight genes; x̄ = 0 in each, with V = 1, ½ and ¼ respectively.]

Fig. 5. The effect of change in the number of genes in the polygenic system. The three histograms show the distributions where the systems comprise two, four and eight genes respectively. In all cases the gene frequencies are equal, and the genes in the system have equal and additive effects, without dominance. The range between the highest and lowest expressions of the character is the same in all three cases, the genes in the four gene and eight gene cases thus having individual effects respectively one-half and one-quarter of those of the genes in the two gene case. The number of genotypic, and hence phenotypic, classes rises with the number of genes and the approximation to fully continuous variation becomes closer. The mean of the distribution is unchanged, but the variance falls in inverse proportion to the number of genes.
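The trend shown in Fig. 5 follows from the binomial distribution of capital alleles. A sketch, assuming n unlinked loci with no dominance and equal allele frequencies, each allele contributing ±1/n so that the range stays at -2 to +2; under these assumptions V = 2/n. The function name is hypothetical.

```python
from fractions import Fraction as F
from math import comb


def polygenic_f2(n_loci: int):
    """F2 distribution with n unlinked additive loci, no dominance and equal
    allele frequencies. Each allele contributes +/- 1/n_loci, so the
    phenotypic range is always -2 to +2."""
    alleles = 2 * n_loci
    effect = F(1, n_loci)
    dist = {}
    for caps in range(alleles + 1):      # number of capital alleles
        x = (caps - (alleles - caps)) * effect
        dist[x] = F(comb(alleles, caps), 2 ** alleles)
    mean = sum(p * x for x, p in dist.items())
    var = sum(p * (x - mean) ** 2 for x, p in dist.items())
    return dist, mean, var


for n in (2, 4, 8):
    dist, mean, var = polygenic_f2(n)
    print(f"{n} genes: {len(dist)} classes, mean {mean}, V = {var}")
```

This prints 5, 9 and 17 phenotypic classes with V = 1, 1/2 and 1/4, matching the counts and variances given for Fig. 5.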

6. Genetic analysis and somatic analysis


The use of the special stocks and special breeding methods available in Drosophila has enabled us not only to recognize that the heritable part
of continuous variation is to be attributed to polygenic systems (and
indeed it was recognized that such systems provide a basis for understanding continuous variation, long before such experiments were
undertaken with Drosophila), but also to locate within the chromosomes,
and hence to count, at least some of the genes in the system and to inves-
tigate up to a point their action and interaction in producing their effects
on the chaeta characters under study. A somewhat similar although less
detailed analysis has been possible with several characters in wheat, again
using special stocks built up to carry known combinations of chromo-
somes derived from the two varieties under investigation (Law, 1967).
Such special stocks are however available in only a limited number of
species. How then in their absence are we to proceed to learn something
of the properties of a polygenic system, its properties of dominance and
the interaction of its genes with one another and with non-heritable
agencies, as well as their linkage relations?
The difficulty stems of course from the relatively small effects of indi-
vidual genes on the character, from the similarity of these effects and
from the obscuring effect of the non-heritable portion of the variation.
The classical technique of genetics would be to isolate as many of the
genes as possible and to study their properties individually, and this is
what has been done at any rate up to a point with Drosophila using their
specific locations on the chromosomes as the basis for recognizing them
as separate genes. In principle, even though we could not assign the
individual genes to specific locations on specific chromosomes, we could
proceed some distance in this way with any example of continuous variation.
We could seek to control the environment in which the organism is raised
so as to reduce the non-heritable variation and its blurring effect on genic
segregation, although in so far as the non-heritable differences arose from
chance effects of development rather than from the impact of outside
agencies, this variation could not be wholly eliminated. We could seek to
produce inbred lines from the population or the descendants of the cross
under investigation so as at least partially to break down the polygenic
system into smaller elements, depending on fewer gene differences, and
to provide ourselves with the means of making repeatable observations
and progeny tests to whatever extent was necessary to establish a genetic
difference however small it might be. Many such inbred lines would be
needed and, in the absence of special stocks, many generations of
inbreeding would be required to give us the material we needed. And in the end when we came to
put the parts of the polygenic system together again in order to see how
they interacted we should be faced once more with much of the genetic
complexity that we had been seeking to circumvent. Such an approach
is clearly not generally a practical proposition.
A different approach to fractionating the polygenic system has been
advocated from time to time, that of analysing the character under study
into component sub-characters in the expectation that these sub-
characters would prove to be under simpler, and hence more readily ana-
lysable, genetic control than the full character itself. Thus the yield of
grain of a wheat plant can be regarded as the product of the average
weight per grain, the average number of grains per ear, and the number
of ears borne by the plant. If different genes mediated these separate
sub-characters we should then have at least made a start on simplifying
the problem of genetically analysing the continuous variation in yield,
especially if we could at the same time reduce the non-heritable com-
ponent of the variation.
On the face of it, there are some grounds for believing that this ap-
proach through what has been called somatic analysis, might have value
as an aid to the genetic analysis. It has been reported by Spickett (1963)
that he was able to identify genes in Drosophila by their location in the
chromosomes, all of which affected the number of sternopleural chaetae
and did so in different ways, one by a local effect in a particular section
of the clump of chaetae while another had a more generally distributed
effect. At a somewhat coarser level, genes are known which affect the
number of sternopleurals while not affecting the number of abdominal
chaetae and vice versa. But other genes are also known which affect both
sets of chaetae simultaneously. These genes can be recognized by effects
other than on the sternopleurals and abdominals and they are genes pro-
ducing discontinuity in the distribution of phenotypes and so capable of
being followed by the Mendelian technique. But if their effects were
confined to the chaetae under consideration and were sufficiently small as
not to produce individually detectable characteristics, and if they were
segregating simultaneously in a family or population we should find that
seeking to analyse the genetic control of the one group of chaetae separ-
ately from that of the other did not in fact simplify the problem; for
while this somatic analysis would serve to separate some of the genes it
would not separate others which affected both sub-characters and which
therefore appeared in both analyses. Variation in the two sub-characters
would be correlated because some genes affected both, but only partially
so because other genes affected only one.
This situation is a commonplace in Mendelian genetics. Taking but one
example, that of flower colour in plants, genes are known which simul-
taneously affect both the anthocyanin and anthoxanthin pigments,
others which affect only the one class of pigment and still further genes
which affect only the other. Even these latter genes can result in corre-
lated effects in the two classes of pigment, for the two types of pigment
can share a common precursor which if in limited supply will be available
in greater quantity for the production of one type if, because of gene
action, the other type is being produced in lesser quantity and so is mak-
ing smaller demands on the pool of precursor. Thus a negative correlation
can arise between the amounts of the two kinds of pigment.
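The argument of the last two paragraphs, that genes shared by two sub-characters make them correlated but, where other genes affect only one of them, only partially so, can be put in numbers. The sketch below is a hypothetical illustration, not an analysis from the text: it assumes an F2 segregating for six unlinked genes with equal additive effects and no dominance, two of which affect both sub-characters X and Y, two only X and two only Y, and it enumerates all 3^6 genotypes to obtain the exact correlation. It does not model the negative correlation that competition for a common precursor can produce.

```python
from itertools import product
from math import sqrt

# Hypothetical genes: (effect on sub-character X, effect on sub-character Y).
# Two act on both, two on X only and two on Y only; effects are additive, h = 0.
GENES = [(1, 1), (1, 1), (1, 0), (1, 0), (0, 1), (0, 1)]

# F2 single-locus genotypes scored +1 (AA), 0 (Aa), -1 (aa) with frequencies 1/4, 1/2, 1/4
F2_LOCUS = [(1, 0.25), (0, 0.5), (-1, 0.25)]


def f2_correlation(genes):
    """Exact correlation of X and Y over all F2 genotypes, genes assumed unlinked."""
    ex = ey = exx = eyy = exy = 0.0
    for combo in product(F2_LOCUS, repeat=len(genes)):
        p, x, y = 1.0, 0.0, 0.0
        for (score, freq), (gx, gy) in zip(combo, genes):
            p *= freq
            x += score * gx
            y += score * gy
        ex += p * x
        ey += p * y
        exx += p * x * x
        eyy += p * y * y
        exy += p * x * y
    cov = exy - ex * ey
    return cov / sqrt((exx - ex * ex) * (eyy - ey * ey))


print(f2_correlation(GENES))  # 0.5, i.e. 2 shared genes / sqrt(4 * 4)
```

Making every gene act on both sub-characters would raise the correlation to 1, and removing the shared genes would drop it to 0: the somatic split separates only the genes that are not shared.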
The evidence from attempts at the somatic analysis of continuously
varying characters agrees with this expectation. If we subdivide yield of
grain in a cereal into average weight of grain, average number of grains
per ear, and number of ears in the plant, we find that there are corre-
lations, most commonly negative ones, between the sub-characters. Simi-
larly the yield of sugar by sugar beet is the product of the sugar percent-
age in the root and the weight of root; but the two are negatively corre-
lated and while it is relatively easy to raise the yield of root by selection,
the sugar percentage will then tend to fall and vice versa. In seeking to
breed for yield of sugar little advantage is gained by treating the two sub-
characters separately, for the value of this somatic analysis is largely
vitiated by the negative correlation between them.
At the fundamental level of the gene and its immediate biochemical
product, there can be a simple one-to-one correspondence between
change in the gene and change in the product, as indeed we see in the
variation of such proteins as haemoglobins and enzymes. But when we
pass to characters of the kind we have been discussing, biochemically
and developmentally remote from the primary action of the genes, the
complexity of development ensures that just as the character will be
affected by many genes, one gene may - and indeed commonly will - be
found to affect a number of characters, if we search out all its conse-
quences for the overall phenotype of the organism. So, save at a very
basic level, somatic analysis and genetic analysis will not march together
in a simple fashion, and the only way to relate changes of phenotype to
changes of genotype is to isolate the genes and ascertain their effects.
Somatic analysis is of use only after it has been validated by prior gen-
etical analysis: it is not a generally reliable precursor to genetic analysis
itself. It is of use for genetical purposes only where experiment and ob-
servation have shown its application to be justifiable and helpful: where
we are dealing with continuous variation, due to genes which in general
we cannot expect to be readily recognizable in segregation, we cannot
expect to overcome the intrinsic difficulty of the situation by attempting
a prior somatic analysis.

7. Biometrical genetics
If we accept that commonly we cannot distinguish any of the individual
genes whose segregation contributes to continuous variation (and that
even with the special stocks available in Drosophila we cannot distinguish
all of them) we must be content to deal with the relevant polygenic sys-
tem as a whole. And since we cannot distinguish the segregant classes
one from another, we cannot use a form of analysis based on class fre-
quencies as in the classic Mendelian method. We can, however, recognize
the biometrical properties of the frequency distributions of the pheno-
types which are our raw material, and we can estimate the biometrical
quantities, the means, variances, and so on, which characterize these dis-
tributions. As we have seen, these parameters can reflect, and reflect in
characteristic ways, the properties of the polygenic system from which
the heritable component of the variation stems. We can thus seek to
gain information about the properties of the genes underlying continu-
ous variation by analysis of the biometrical quantities which character-
ize the frequency distributions of the phenotypes in related families and
populations. We must expect that the information so obtained will not
be just like that yielded by classical genetical analysis. In particular,
since we shall not be following individual genes we cannot learn about
their individual properties: rather, since we are considering the system
as a whole, we shall obtain information about the overall joint or aver-
age properties of its member genes. At the same time because we are
considering all the variation that the character shows we shall be bring-
ing the effects of all the relevant genes into the reckoning, and this we
can never achieve by the Mendelian technique of identifying and follow-
ing individual genes, since there must inevitably be some genes of rela-
tively small effect which escape identification.
The phenotypes of the individuals in any family or other appropriate
group yield two biometrical quantities which are of use to us, the mean
of the distribution (a first-degree statistic since it is linear in x, the metric
measuring the expression of the character) and the variance (a second-degree
statistic depending on $x^2$). In addition, any pair of related families
or groups may yield a covariance, which is also a second-degree statistic.
Higher-order statistics may also be obtained, notably that of the third
order, which measures skewness (depending on $x^3$), and that of the fourth
order, which measures kurtosis (depending on $x^4$). These have, however, seldom
been put to use in genetical analysis and we shall consider them no further.
We shall thus be concentrating on the genetical information that can be
derived from comparisons among the means, variances and covariances of
related families or groups of individuals. These we shall seek to interpret
in terms of appropriate parameters representing the consequences of the
various genetical phenomena in which we may be interested. Having de-
fined these parameters, expectations are formulated in terms of them for
the means, variances and covariances of the families or groups that our
experiments yield. The means, etc. observed are then related to these
expectations in such a way as to yield estimates of the parameters and
tests of their significance.
In any experiment we may run into a complexity of genetical phenom-
ena, especially as we must expect to be dealing with a number of genes
whose relations one with another may not be the same for all of them:
indeed we have already seen this to be the case with the system mediating
variation of the number of sternopleural chaetae in Drosophila, where
the genes of the X chromosome interacted with those of chromosomes
II and III, although these latter show no evidence of anything but an
additive relation to one another. Such a complexity of phenomena leads
to a corresponding multiplicity of parameters which it would be necessary
to take into account in formulating expectations for the statistics ob-
served, with the consequence that except in large and complex experi-
ments there could be more parameters than there were statistics from
which to estimate them. Some simplification must therefore be made in
the approach: only those parameters which are regarded as of chief im-
portance, and with which the data can cope, should be introduced into
the analysis initially, and others added only as necessity requires.
The simplest genetical formulation to be used in the initial analysis is
generally taken as that which includes parameters representing the addi-
tive effects of the genes (that is the differences between corresponding
homozygotes, AA and aa, BB and bb, etc.) and their dominance proper-
ties. Given that the experimental material is sufficient, the experiment
adequately designed and the statistical analysis suitably carried out, we
can then estimate these parameters and also test the goodness of fit of
this initial simple formulation to the observations. If the fit proves to be
adequate, we have no grounds for postulating a more complex genetical
situation. But if, on the other hand, the fit proves to be inadequate, con-
sideration can be given to a more complex formulation incorporating
further parameters, representing interaction between non-allelic genes,
or linkage or whatever else seems appropriate. If this in turn proves to
be inadequate to fit the observations, and the data are themselves suf-
ficiently extensive, a still more complex set of parameters representing
a still more complex genetical situation can be tried.
This approach will be developed and illustrated in the following chap-
ters. We shall start by considering data from controlled breeding experi-
ments based on crosses among true-breeding lines and later turn to the
more difficult analysis of data from randomly breeding populations,
just as classical genetics began with experimental crosses and later pro-
ceeded to the genetical analysis of populations.
Additive and dominance effects
8. Components of means
With disomic inheritance, two alleles A-a can give rise to three genotypes
AA, Aa and aa. Two parameters are required to describe the differences
in phenotypic expression of these three genotypes in respect of any
character which they affect. As the origin, we take the mid-point be-
tween the two homozygotes since this does not depend on the differ-
ences between the three genotypes, but on the rest of the genotype and
the effects of the environment, and thus reflects the general circum-
stances of the observations. The two parameters measuring the differ-
ences between the genotypes may then be defined as d, measuring the
departure of each homozygote from the mid-point, and h, measuring the
departure of the heterozygote from it. Taking A as the allele which in-
creases the expression of the character, AA will exceed the mid-point
(m) by d, and so will have an expression m + d, while aa will equally fall
short of the mid-point having an expression m-d, and Aa will deviate
from m by h so having an expression m + h (Fig. 6).

[Figure: a scale of expression on which aa lies at m - d, Aa at m + h and AA at m + d]
Fig. 6. The d and h increments of the gene difference A-a. Deviations are
measured from the mid-parent, m, midway between the two homozygotes
AA and aa. Aa may lie on either side of m and the sign of h will vary
accordingly.

If h is 0 the heterozygote's expression of the character will be midway
between the expressions of the two homozygotes and dominance is absent.
If h is positive, the heterozygote will be nearer to AA than to aa in its expression
and A will be partially, or if h = d completely, dominant. Similarly if
h is negative, a will be the dominant allele. If h exceeds d in magnitude,
Aa will fall outside the range delimited by AA and aa, and the gene may
then be said to display over-dominance. It should be noted that here the capital letter A
does not imply dominance of the allele so designated: A is the allele
which increases the expression of the character whether it be dominant
or not.
This characterization of the differences among the genotypes can be
applied to any genes, whether their effects be large or small, leading to
continuous variation or not, provided the expressions of the character in
question can be expressed in quantitative terms. Thus the sex-linked
mutant Bar-eye (B) reduces the number of facets in the eyes of Drosophila
melanogaster, wild-type females (+/+) having an average number of
779.4 facets, heterozygotes (B/+) having an average of 358.4 facets and
the homozygous mutant (B/B) having an average of 68.1 at 25° C
(Sturtevant, 1925, quoted by Goldschmidt, 1938). Then
$m = \tfrac{1}{2}(779.4 + 68.1) = 423.75$, $d = 779.4 - 423.75 = \tfrac{1}{2}(779.4 - 68.1) = 355.65$
and $h = 358.4 - 423.75 = -65.35$. Since h is negative the B mutant is
partially dominant to wild-type and we may if we wish measure its
degree of dominance by $h/d = -65.35/355.65 = -0.184$. We should note
that the effect of the Bar-eye mutant is large, and leads to discontinuous
variation, the phenotypes of B/B, B/+ and +/+ showing no overlap. No
one would go to the trouble of counting the facets in classifying the
three genotypes when Bar-eye is being used, and because its effect is
sufficiently large for it to be recognized and followed individually in
breeding experiments there would be no difficulty in disentangling it
from other gene differences whose effects were sufficiently small to
contribute only to the continuous variation in facet number that we
can observe within the phenotypes associated with each of these geno-
types for Bar-eye.
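The arithmetic of the Bar-eye example is easily mechanized. The function below is a hypothetical helper (its name is not from the text): it takes the three genotype means, with the homozygote of greater expression first, and returns m, d and h exactly as defined above, so that h/d gives the degree of dominance.

```python
def m_d_h(high_homozygote, heterozygote, low_homozygote):
    """Return (m, d, h) from the mean expressions of the three genotypes.

    m is the mid-point of the two homozygotes, d the departure of the
    homozygote with the greater expression from m, and h the departure
    of the heterozygote from m (negative if it lies below m).
    """
    m = (high_homozygote + low_homozygote) / 2
    d = high_homozygote - m
    h = heterozygote - m
    return m, d, h


# Bar-eye facet counts quoted in the text: +/+, B/+ and B/B at 25 °C
m, d, h = m_d_h(779.4, 358.4, 68.1)
print(round(m, 2), round(d, 2), round(h, 2), round(h / d, 3))
```

Because wild-type has the greater facet number, it plays the part of AA here, and the negative h shows that the facet-reducing allele B is the partially dominant one.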
Confining ourselves now to continuous variation, we cannot dis-
tinguish individually the genes contributing to it. If we consider two
homozygous lines the departure of each of them from their mid-point
(or mid-parent as it is often called) will reflect the simultaneous action
of all the genes affecting the character by which the lines differ. As-
suming that the effects of these genes are simply additive, the depar-
ture from the mid-point will in fact be the sum of the d's, one from each
of the genes, taking sign into account. Where, for example, the lines
differ at two loci, A-a and B-b, if one of them is AABB and the other aabb,
the first will depart by $d_a + d_b$ and the second by $-(d_a + d_b)$. But if the
lines are AAbb and aaBB, their departures will be $d_a - d_b$ and $-d_a + d_b$
respectively. Generalizing, where the homozygous lines differ at k
loci, we may define [d] as the departure from the mid-parent of the line
with the greater expression of the character, where $[d] = S(d_+) - S(d_-)$,
$S(d_+)$ standing for the sum of the d's of all the genes in this line tending
to increase the phenotype, $S(d_-)$ for the sum of the d's of those tending
to decrease it, and $S(d_+) > S(d_-)$ since [d] must be positive. In the same
way, when we cross the two homozygous lines, the phenotype of the
heterozygote will depart from the mid-parent by [h] = S(h). Since by
definition any h may be positive or negative, [h] itself may be positive
or negative, and of course where some of the genes at some of the loci
have positive h's and others negative h's they will tend to balance out
each other's effects. [h] may thus be small or even 0, even where each
of the genes individually shows pronounced dominance, simply because
being dominant in opposite directions they are cancelling out each other's
effects.
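The cancellation of opposing h's can be shown with a small sketch. It assumes, as the text does, that effects add across loci; the helper name and the (d, h, sign) encoding of each locus are hypothetical choices for illustration.

```python
def potence_components(loci):
    """Return ([d], [h]) for a pair of homozygous lines and their F1.

    `loci` holds one (d, h, sign) tuple per locus: d and h as defined for a
    single gene, and sign = +1 if the first line carries the increasing allele
    there, -1 if the second line does. Effects are assumed additive across loci.
    """
    first_line = sum(sign * d for d, _h, sign in loci)  # departure of line 1 from mid-parent
    d_bracket = abs(first_line)                         # [d]: departure of the larger line
    h_bracket = sum(h for _d, h, _sign in loci)          # [h] = S(h): departure of the F1
    return d_bracket, h_bracket


# Hypothetical example: two loci, each completely dominant (|h| = d) but in
# opposite directions, with both increasing alleles carried by line 1.
print(potence_components([(2.0, 2.0, +1), (2.0, -2.0, +1)]))  # (4.0, 0.0)
```

Here [h] is 0 even though each gene on its own shows complete dominance.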
We can now see at once that although h/d provides a measure of
dominance for a single gene difference, [h]/[d] does not provide a
corresponding measure of dominance when we are considering more than one gene.
[h]/[d] may be very small simply because some of the h's are positive
and others negative, so leading to a small value for [h] even although
none of the individual h's is small; and equally [h]/[d] may be large just
because the genes are so distributed between the parent lines that they
are tending to balance out one another's effects and $[d] = S(d_+) - S(d_-)$
is small even although every d is itself not small. Thus [h]/[d],
although depending on dominance in that it cannot depart from 0 unless
one or more of the genes show dominance, is not itself a direct measure
of that dominance. For this reason it is often referred to as the potence
ratio. It is particularly worth emphasizing that where the F1 between
two lines differing at more than one locus gives a phenotype falling
outside the range delimited by the parents and so displays heterosis, i.e.
[h] > [d], there is no reason to postulate over-dominance of any of the
genes involved, since the excess of [h] over [d] can come about merely by
the d's of the various genes balancing one another to a greater extent
than do their h's. Thus to take a simple example, when $h_a = d_a$ and
$h_b = d_b$, the F1 between AAbb and aaBB will have a phenotype (measured
as a departure from the mid-parent) of $h_a + h_b$, the parents having
phenotypes of $d_a - d_b$ and $-d_a + d_b$. Then
$[h]/[d] = (h_a + h_b)/(d_a - d_b) = (d_a + d_b)/(d_a - d_b)$ and heterosis is displayed even
although neither gene shows over-dominance.
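The simple example can be given numbers. The sketch below assumes, purely for illustration, d_a = 3 and d_b = 1 with complete dominance (h = d) at both loci and additive effects between loci; the function name is hypothetical.

```python
def two_locus_f1(d_a, d_b):
    """Departures from the mid-parent of AAbb, aaBB and their F1 (AaBb).

    Assumes complete dominance of the increasing allele at both loci
    (h_a = d_a, h_b = d_b) and additive effects between the loci.
    """
    h_a, h_b = d_a, d_b
    p1 = d_a - d_b   # AAbb
    p2 = -d_a + d_b  # aaBB
    f1 = h_a + h_b   # AaBb
    return p1, p2, f1


p1, p2, f1 = two_locus_f1(3.0, 1.0)
d_bracket = max(p1, p2)  # [d]: departure of the larger parent from the mid-parent
print(p1, p2, f1, f1 / d_bracket)  # 2.0 -2.0 4.0 2.0
```

The F1 lies 4 units above the mid-parent while the larger parent lies only 2 above it, giving a potence ratio of 2, although no gene is over-dominant.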
Where an F2 is raised from the F1, it will include $\tfrac{1}{4}$AA, $\tfrac{1}{2}$Aa and $\tfrac{1}{4}$aa in
respect of the gene A-a. This gene will therefore contribute
$\tfrac{1}{4}d_a + \tfrac{1}{2}h_a - \tfrac{1}{4}d_a = \tfrac{1}{2}h_a$ to the departure of the average expression of
the character in F2 from the mid-parent. Assuming the effects of the k genes
by which the parent lines differed to be additive, the departure of the F2 mean
thus becomes $\tfrac{1}{2}[h]$, and it may be observed that this is equally the case
even where two or more of the genes are linked. The mean phenotype of
the F2 will then be $\bar{F}_2 = m + \tfrac{1}{2}[h]$. In the same way, where $B_1$ is the
back-cross to the larger parent, $P_1$, it will include $\tfrac{1}{2}$AA and $\tfrac{1}{2}$Aa, and A-a
contributes $\tfrac{1}{2}d_a + \tfrac{1}{2}h_a$ to the departure of the mean of $B_1$ from the mid-parent.
Then taking all k genes into account $\bar{B}_1 = m + \tfrac{1}{2}[d] + \tfrac{1}{2}[h]$. Similarly the
back-cross to $P_2$, the smaller parent, gives $\bar{B}_2 = m - \tfrac{1}{2}[d] + \tfrac{1}{2}[h]$.
Continuing from the F2, where a true F3 generation is raised by selfing
the F2 individuals, in respect of A-a it will comprise $\tfrac{3}{8}$AA, $\tfrac{1}{4}$Aa and
$\tfrac{3}{8}$aa when taken as a whole. This gene will then contribute
$\tfrac{3}{8}d_a + \tfrac{1}{4}h_a - \tfrac{3}{8}d_a = \tfrac{1}{4}h_a$ to the departure of the F3 mean from the mid-parent, and
taking all k genes into account the mean phenotype will be $\bar{F}_3 = m + \tfrac{1}{4}[h]$.
If however the third generation is raised by mating together pairs of
individuals taken at random from the F2 (a procedure which is
sometimes incorrectly described, especially by animal geneticists, as giving an
F3 generation) the distribution of A-a over this generation taken as a
whole will be $\tfrac{1}{4}$AA, $\tfrac{1}{2}$Aa, $\tfrac{1}{4}$aa as in the F2, and the mean phenotype will
be $\bar{S}_3 = m + \tfrac{1}{2}[h]$, where $S_3$ indicates the third generation raised by
sib-mating among the F2. This formulation of mean phenotypes in
terms of m, [d] and [h] can be extended to the F4, where $\bar{F}_4 = m + \tfrac{1}{8}[h]$,
and indeed to any of the types of family raised by the almost endless
combinations of mating systems possible among the descendants of the
initial cross. A number of these results are collected together in Table 5.
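Since [d] and [h] are sums over loci, the coefficients of a generation mean follow from the genotype frequencies at a single locus, so they can be regenerated mechanically. The sketch below is an illustration, not the authors' procedure: it codes a genotype by its number of A alleles, uses exact fractions, and builds each generation from a mating system, namely selfing for F3 and F4, random pairs from the F2 for S3 and, as an assumption about how S4 is raised, pairs of sibs within those S3 families. It models one locus only, and all helper names are hypothetical.

```python
from fractions import Fraction as Fr
from itertools import product

# A genotype is coded by its number of A alleles: 2 = AA, 1 = Aa, 0 = aa.


def cross(g1, g2):
    """Offspring genotype distribution {copies of A: probability} from g1 x g2."""
    p1, p2 = Fr(g1, 2), Fr(g2, 2)  # chance that each parent transmits A
    return {2: p1 * p2, 1: p1 * (1 - p2) + (1 - p1) * p2, 0: (1 - p1) * (1 - p2)}


def zero_dist():
    return {2: Fr(0), 1: Fr(0), 0: Fr(0)}


def selfing(dist):
    """Next generation when every individual is selfed."""
    out = zero_dist()
    for g, q in dist.items():
        for child, r in cross(g, g).items():
            out[child] += q * r
    return out


def random_pairs(dist):
    """Next generation from pairs of individuals taken at random from `dist`."""
    out = zero_dist()
    for (g1, q1), (g2, q2) in product(dist.items(), repeat=2):
        for child, r in cross(g1, g2).items():
            out[child] += q1 * q2 * r
    return out


def sib_pairs(grandparents):
    """Offspring of sib pairs, each sib family coming from a random pair in `grandparents`."""
    out = zero_dist()
    for (g1, q1), (g2, q2) in product(grandparents.items(), repeat=2):
        family = cross(g1, g2)
        for (s1, r1), (s2, r2) in product(family.items(), repeat=2):
            for child, r in cross(s1, s2).items():
                out[child] += q1 * q2 * r1 * r2 * r
    return out


def coefficients(dist):
    """Coefficients of m, [d] and [h] in the mean: AA adds d, Aa adds h, aa adds -d."""
    return (Fr(1), dist[2] - dist[0], dist[1])


F1 = {2: Fr(0), 1: Fr(1), 0: Fr(0)}
F2 = selfing(F1)
generations = {
    "F2": F2,
    "B1": cross(2, 1),
    "B2": cross(0, 1),
    "F3": selfing(F2),
    "F4": selfing(selfing(F2)),
    "S3": random_pairs(F2),
    "S4": sib_pairs(F2),
    "B1 selfed": selfing(cross(2, 1)),
}
for name, dist in generations.items():
    print(name, *coefficients(dist))
```

The printed coefficients reproduce the corresponding rows of Table 5, including $\tfrac{3}{8}[h]$ for S4 under the sib-mating assumption above.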

9. Testing the model


We can thus arrive at a formulation of the mean phenotypes in terms of
the mid-parent, m, which depends on the general conditions of the ob-
servations, the additive component [d] and the dominance component
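[h]. If this formulation is adequate, Table 5 shows that a number of
relations must hold good. Thus confining ourselves to the parents, $P_1$
and $P_2$, the $F_1$, the $F_2$ and the two back-crosses, $B_1$ and $B_2$, we can see
that
$$\bar{B}_1 = \tfrac{1}{2}(\bar{P}_1 + \bar{F}_1),$$
$$\bar{B}_2 = \tfrac{1}{2}(\bar{P}_2 + \bar{F}_1)$$
and
$$\bar{F}_2 = \tfrac{1}{4}(2\bar{F}_1 + \bar{P}_1 + \bar{P}_2).$$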
TABLE 5.
Components of means

                       Mean phenotype
Generation          m      [d]      [h]
P1                  1       1        0
P2                  1      -1        0
F1                  1       0        1
F2                  1       0        ½
B1                  1       ½        ½
B2                  1      -½        ½
F3                  1       0        ¼
F4                  1       0        ⅛
S3                  1       0        ½
S4                  1       0        ⅜
F2 × P1             1       ½        ½
F2 × P2             1      -½        ½
F2 × F1             1       0        ½
B1 selfed           1       ½        ¼
B2 selfed           1      -½        ¼

These expected relationships can be used to test the adequacy of the
model. The families must have been raised in comparable environments,
so that differences between their means which spring from differences
of the environments in which they have been raised do not introduce
distorting biases into the estimates of the mean phenotypes, $\bar{P}_1$, $\bar{P}_2$, $\bar{F}_1$,
$\bar{F}_2$, $\bar{B}_1$ and $\bar{B}_2$. Also these means will be subject to sampling variation
which can be estimated by normal statistical procedures from the
variances among the individuals within the families themselves. Thus if $V_{P_1}$
is the variance of the individuals within the $P_1$ family, and $V_{\bar{P}_1}$ is the
variance of $\bar{P}_1$, the mean of $P_1$, then $V_{\bar{P}_1} = V_{P_1}/n$, where n is the number of
individuals observed in $P_1$ and used in calculating $\bar{P}_1$.
Now we can rewrite the first of the relations as $A = 2\bar{B}_1 - \bar{P}_1 - \bar{F}_1 = 0$,
whereupon we can find $V_A = 4V_{\bar{B}_1} + V_{\bar{P}_1} + V_{\bar{F}_1}$, and the standard error of
A can be obtained as $\sqrt{V_A}$. The expected value of A is 0 and we can thus
test whether this relation holds good by finding $A/\sqrt{V_A}$ and looking up
its probability in a table of normal deviates in the customary way. It
should be noted that if the numbers of individuals observed within each
of the three families, $P_1$, $F_1$ and $B_1$, are small (say less than 10), $A/\sqrt{V_A}$
must be treated as t and its probability found from the table of t using
as the number of degrees of freedom the sum of the numbers of df from
the three families. The other two relations can similarly be tested by
setting $B = 2\bar{B}_2 - \bar{P}_2 - \bar{F}_1$ with correspondingly $V_B = 4V_{\bar{B}_2} + V_{\bar{P}_2} + V_{\bar{F}_1}$,
and $C = 4\bar{F}_2 - 2\bar{F}_1 - \bar{P}_1 - \bar{P}_2$ with $V_C = 16V_{\bar{F}_2} + 4V_{\bar{F}_1} + V_{\bar{P}_1} + V_{\bar{P}_2}$.
These tests of the expected relationships have been termed 'scaling tests' by
Mather (1949) and further scaling tests can be devised where observations
on additional types of family are available. Thus, for example, where
observations have also been made on the F3 generation we can test the
agreement of the relation $D = 8\bar{F}_3 - 3\bar{P}_1 - 3\bar{P}_2 - 2\bar{F}_1$ with its expected value
0 (see Table 5), using $V_D = 64V_{\bar{F}_3} + 9V_{\bar{P}_1} + 9V_{\bar{P}_2} + 4V_{\bar{F}_1}$.
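Why each of A, B, C and D has expectation 0, and where the multipliers 4, 16, 64 and 9 in their variances come from, can be checked mechanically. The sketch below (hypothetical names; coefficients taken from Table 5) writes each test as a linear combination of generation means, confirms that its m, [d] and [h] coefficients cancel, and squares the multipliers to give the weights of the variances of the means, which is valid because the families are independent samples.

```python
from fractions import Fraction as Fr

# Expected (m, [d], [h]) coefficients of each generation mean, from Table 5
EXPECT = {
    "P1": (1, 1, 0), "P2": (1, -1, 0), "F1": (1, 0, 1),
    "F2": (1, 0, Fr(1, 2)), "B1": (1, Fr(1, 2), Fr(1, 2)),
    "B2": (1, Fr(-1, 2), Fr(1, 2)), "F3": (1, 0, Fr(1, 4)),
}

# Each scaling test as {generation: multiplier of its mean}
TESTS = {
    "A": {"B1": 2, "P1": -1, "F1": -1},
    "B": {"B2": 2, "P2": -1, "F1": -1},
    "C": {"F2": 4, "F1": -2, "P1": -1, "P2": -1},
    "D": {"F3": 8, "P1": -3, "P2": -3, "F1": -2},
}


def expected_value(test):
    """Coefficients of m, [d] and [h] in the expectation of a scaling test."""
    return tuple(sum(mult * EXPECT[gen][i] for gen, mult in test.items()) for i in range(3))


def variance_multipliers(test):
    """Each variance of a family mean enters V_test multiplied by its coefficient squared."""
    return {gen: mult ** 2 for gen, mult in test.items()}


for name, test in TESTS.items():
    print(name, expected_value(test), variance_multipliers(test))
```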
Sets of such scaling tests can be devised to cover any combination of
types of family that may be available. Instead, however, of testing the
various expected relationships one at a time, a procedure proposed by
Cavalli (1952) and known as the joint scaling test may be used. This
effectively combines the whole set of scaling tests into one and thus
offers a more general, more convenient, more adaptable and more in-
formative approach. It consists of estimating the model's parameters, m,
[d] and [h] from the means of all the types of family available,
followed by a comparison of these means as observed with their expected
values derived from the estimates of the three parameters. This makes it
clear at once that at least three types of family are necessary if the par-
ameters of the model are to be estimated, but with only three types of
family available no test can be made of the goodness of fit of the model
since in such a case a perfect fit must be obtained between the observed
means and their expectations from the estimates of the three parameters.
So to provide such a test at least four types of family must be raised.
The procedure of the joint scaling test may be illustrated by reference
to data supplied by Dr D. S. Virk of a cross between two pure-breeding
varieties, 22 and 73, of the Birmingham collection of Nicotiana rustica
varieties. In Table 6 are presented the means and variances of the means
for plant height of the parental, F1, F2 and first back-cross families (B1 and
B2) derived from this cross, when grown in the summer of 1975. Family
size was deliberately varied with the kind of family. It was set as low
as 20 for the genetically uniform parents and in excess of 100 for the
F2 and back-crosses, to compensate for the greater variation expected in
these segregating families. All plants were individually randomized at the
time of sowing so that the variation within families reflects all the non-
heritable sources of variation to which the experiment is exposed. With
this design the estimate of the variance of a family mean ($V_{\bar{x}}$) valid for use
in the joint scaling test is obtained in the usual way by dividing the
variance within the family ($V_x$) by the number of individuals in that
family (Table 6). Reference to this table shows that the greater family size
of the segregating generations has more than compensated for their greater
expected variability, in that the variances of their family means are smaller
than those of the non-segregating families.

TABLE 6.
Joint scaling test on a cross between true-breeding varieties 22 and 73 of Nicotiana
rustica for the character final height of the plant in cm

              No. of            Weight            Model               Mean            Difference
Generation    plants   Vx̄       (= 1/Vx̄)     m    [d]   [h]    Observed   Expected      O - E
P1 (var 22)     20    1.0334   0.967680     1     1     0     116.3000   115.5217      0.7783
P2 (var 73)     20    1.4525   0.668847     1    -1     0      98.4500    99.1223     -0.6723
F1              60    0.9699   1.031034     1     0     1     117.6750   117.3807      0.2943
F2             160    0.4916   2.034174     1     0     ½     111.7781   112.3514     -0.5733
B1             120    0.4888   2.045827     1     ½     ½     116.0000   116.4512     -0.4512
B2             120    0.6135   1.629992     1    -½     ½     109.1610   108.2515      0.9095

                                                                              χ²[3] = 3.411
Six equations are available for estimating m, [d] and [h] and these are
obtained by equating the observed family means to their expectations,
in terms of these three parameters, which are taken from Table 5. The
coefficients of m, [d] and [h] in the six equations are listed in the central
columns of Table 6. There are three more equations than unknowns and
the estimation of the three unknowns (m, [d] and [h]) must therefore be
by a least squares technique. The six generation means to which we are
fitting the m, [d] and [h] model are not known with equal precision; for
example, the variance of the mean of P2 ($V_{\bar{P}_2}$) is almost three times that
of B1. The best estimates will be obtained, therefore, if the generation
means and their expectations are weighted, the appropriate weights being
the reciprocals of the variances of the means. For the first entry in the
table, P1, the weight is given by 1/1.0334 = 0.9677 and so on for the
other families (Table 6).
The six equations and their weights may be combined to give three
equations whose solution will lead to weighted least squares estimates
of m, [d] and [h], as follows. In order to obtain the first of these three
equations each of the six equations is multiplied through by the coef-
ficient of m which it contains, and by its weight, and the six are then
summed. We thus have
$$
\begin{array}{rrrl}
m & [d] & [h] & \\
0.9676800 & +0.9676800 & & = 112.5411840\\
0.6688468 & -0.6688468 & & = 65.8479674\\
1.0310340 & & +1.0310340 & = 121.3269259\\
2.0341740 & & +1.0170870 & = 227.3761048\\
2.0458265 & +1.0229133 & +1.0229132 & = 237.3158740\\
1.6299918 & -0.8149959 & +0.8149959 & = 177.9315349\\
\hline
8.3775531 & +0.5067506 & +3.8860301 & = 942.3395910
\end{array}
$$
The second and third equations are found in the same way using the
coefficient of [d] for the second and of [h] for the third along with the
weights as multipliers. We then have three simultaneous equations, known
as normal equations, that may be solved in a variety of ways to yield esti-
mates of m, [d] and [h].
A general approach to the solution is by way of matrix inversion. The
three equations are rewritten in the form
$$
\underbrace{\begin{bmatrix} 8.3775531 & 0.5067506 & 3.8860301\\ 0.5067506 & 2.5554814 & 0.1039587\\ 3.8860301 & 0.1039587 & 2.4585321 \end{bmatrix}}_{J}
\underbrace{\begin{bmatrix} m\\ {[d]}\\ {[h]} \end{bmatrix}}_{M}
=
\underbrace{\begin{bmatrix} 942.3395910\\ 76.3853861\\ 442.6386827 \end{bmatrix}}_{S}
$$
where J is the information matrix, M is the estimate of the parameters
and S is the matrix of the scores.
The solution then takes the general form $M = J^{-1}S$, where $J^{-1}$ is the
inverse of the information matrix and is itself a variance-covariance
matrix.
The inversion may be achieved by any one of a number of standard
procedures (Fisher, 1946; Searle, 1966). For our example, inversion leads
to the following solution.
$$
\underbrace{\begin{bmatrix} m\\ {[d]}\\ {[h]} \end{bmatrix}}_{M}
=
\underbrace{\begin{bmatrix} 0.4567853 & -0.0613140 & -0.7194160\\ -0.0613140 & 0.4002201 & 0.0799914\\ -0.7194160 & 0.0799914 & 1.5404951 \end{bmatrix}}_{J^{-1}}
\underbrace{\begin{bmatrix} 942.3395910\\ 76.3853861\\ 442.6386827 \end{bmatrix}}_{S}
$$
The estimate of m is then
$$m = (0.4567853 \times 942.3395910) - (0.0613140 \times 76.3853861) - (0.7194160 \times 442.6386827) = 107.3220362,$$
which equals 107.3220 to the accuracy required, and the S.E. of m is
$\sqrt{0.4567853} = \pm 0.675859 = \pm 0.6759$ to the accuracy required. In a
similar way
$$[d] = 8.1997 \pm 0.6326$$
and
$$[h] = 10.0587 \pm 1.2412.$$
All are highly significantly different from zero when looked up in a
table of normal deviates.
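The weighted least-squares fit can be reproduced in a few lines. The sketch below takes the weights and observed means of Table 6 (using the weights as printed, since those are the values entering the normal equations above), builds J and S as described (each equation multiplied by a coefficient and by its weight, then summed), and inverts J by Gauss-Jordan elimination, which is just one of the standard procedures the text mentions. The helper names are hypothetical; the estimates and standard errors should agree with those above to within rounding.

```python
from math import sqrt

# Table 6: (generation, weight = 1/V of the mean, coefficients of m, [d], [h], observed mean)
DATA = [
    ("P1", 0.967680, (1, 1, 0), 116.3000),
    ("P2", 0.668847, (1, -1, 0), 98.4500),
    ("F1", 1.031034, (1, 0, 1), 117.6750),
    ("F2", 2.034174, (1, 0, 0.5), 111.7781),
    ("B1", 2.045827, (1, 0.5, 0.5), 116.0000),
    ("B2", 1.629992, (1, -0.5, 0.5), 109.1610),
]


def normal_equations(data):
    """Information matrix J and score vector S for weighted least squares."""
    J = [[sum(w * c[i] * c[j] for _, w, c, _ in data) for j in range(3)] for i in range(3)]
    S = [sum(w * c[i] * y for _, w, c, y in data) for i in range(3)]
    return J, S


def invert(matrix):
    """Inverse of a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


J, S = normal_equations(DATA)
J_inv = invert(J)  # variance-covariance matrix of the estimates
estimates = [sum(J_inv[i][k] * S[k] for k in range(3)) for i in range(3)]
for name, est, var in zip(("m", "[d]", "[h]"), estimates, (J_inv[i][i] for i in range(3))):
    print(f"{name} = {est:.4f} ± {sqrt(var):.4f}")
```

In practice a linear-algebra library would replace the hand-written inverse; it is spelled out here only to keep the sketch self-contained.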
The adequacy of the additive-dominance model may now be tested
by predicting the six family means from the estimates of m, [d] and [h].
For example, the mean of $B_2$ is expected to be $m - \tfrac{1}{2}[d] + \tfrac{1}{2}[h]$, and
on the basis of this model and for the estimates obtained it has as the
expected value
$$107.3220 - \tfrac{1}{2}(8.1997) + \tfrac{1}{2}(10.0587) = 108.2515.$$
This expectation along with those for the other five families is listed
in Table 6. The agreement with the observed values appears to be very
close and in no case is the deviation more than 0.83% of the observed
value. The goodness of fit of this model can be tested statistically by
squaring the deviation of the observed from the expected value for each
type of family and multiplying by the corresponding weight. The sum of
the products over all six types of families is a $\chi^2$. Since the data comprise
six observed means, and three parameters have been estimated, this $\chi^2$
has 6 - 3 = 3 degrees of freedom.
The contribution made to the $\chi^2$ by $\bar{P}_1$, for example, is
$(116.3000 - 115.5217)^2 \times 0.96768 = 0.5862$. Summing the six such contributions,
one from each of the six types of family, gives $\chi^2_{[3]} = 3.4110$, which has
a probability of between 0.40 and 0.30. The model must therefore be
regarded as adequate: there is no evidence of anything beyond additive
and dominance effects.
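The prediction of family means and the weighted χ² can be sketched as follows. This is a minimal illustration assuming the estimates of m, [d] and [h] given above and the usual coefficients of the six families; only the P1 contribution is reproduced, since the other observed means belong to Table 6, which is not repeated here. In place of a table, the sketch uses the closed form of the χ² probability for 3 degrees of freedom in terms of the complementary error function; the function names are hypothetical.

```python
import math

# Estimates from the joint scaling test in the text.
M_HAT, D_HAT, H_HAT = 107.3220, 8.1997, 10.0587

# Coefficients of m, [d] and [h] in each family mean (additive-dominance model).
COEFFS = {
    "P1": (1.0, 1.0, 0.0),
    "P2": (1.0, -1.0, 0.0),
    "F1": (1.0, 0.0, 1.0),
    "F2": (1.0, 0.0, 0.5),
    "B1": (1.0, 0.5, 0.5),
    "B2": (1.0, -0.5, 0.5),
}


def expected_mean(family):
    c_m, c_d, c_h = COEFFS[family]
    return c_m * M_HAT + c_d * D_HAT + c_h * H_HAT


def chi_square_term(observed, expected, weight):
    """Weighted squared deviation of one observed family mean from expectation."""
    return weight * (observed - expected) ** 2


def chi2_sf_3df(x):
    """P(chi-square with 3 d.f. > x), from the closed form for 3 d.f."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


expected = {fam: expected_mean(fam) for fam in COEFFS}
p1_term = chi_square_term(116.3000, expected["P1"], 0.96768)

print(f"expected B2 mean: {expected['B2']:.4f}")   # 108.2515
print(f"P1 contribution:  {p1_term:.4f}")          # 0.5862
print(f"P(chi2[3] > 3.4110) = {chi2_sf_3df(3.4110):.3f}")
```

The full χ² is the sum of `chi_square_term` over all six families, which needs the six observed means and weights of Table 6.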
The individual scaling tests, A, B and C, referred to on page 37 can,
of course, also be used to test the model. Thus with the present data

$$A = 2\bar{B}_1 - \bar{P}_1 - \bar{F}_1 = (2 \times 116.000) - 116.300 - 117.6750 = -1.975$$

and

$$V_A = 4V_{\bar{B}_1} + V_{\bar{P}_1} + V_{\bar{F}_1} = (4 \times 0.4888) + 1.0334 + 0.9699 = 3.959,$$

leading to $s_A = \sqrt{V_A} = 1.990$.
Thus A = -1.98 ± 1.99, which, when entered in a table of normal deviates,
does not differ significantly from the value 0 expected. These three tests,
as applied to the present data, are summarized in Table 7. Not surprisingly
they agree with the joint scaling test in showing the model to be
adequate.
TABLE 7.
Individual scaling tests on the data from a cross in Nicotiana
used in Table 6

Test                                                      Value ± S.E.
$A = 2\bar{B}_1 - \bar{P}_1 - \bar{F}_1$                      -1.98 ± 1.99
$B = 2\bar{B}_2 - \bar{P}_2 - \bar{F}_1$                       2.20 ± 2.21
$C = 4\bar{F}_2 - 2\bar{F}_1 - \bar{P}_1 - \bar{P}_2$         -2.99 ± 3.77

The joint scaling test, however, does more than test the adequacy of
the additive-dominance model: it provides the best possible estimates of
all the parameters required to account for differences among family
means when the model is adequate and, as we shall see in Chapter 5, it
can be readily extended to more complex situations. In the present case,
these best estimates show that the additive and dominance components
are of the same order of magnitude and, since [h] is significantly positive,
alleles which increase final height must be dominant more often than
alleles which decrease it.
In this example the simple model is adequate but this is frequently
not the case, the inadequacy being revealed both by the joint scaling test
leading to a significant $\chi^2$ and by one or more of the individual scaling
tests showing a significant departure from 0. Two examples of this
analysed in the way just described are summarized in Table 8.
The first is the weight per loculus of fruit in a cross between the two
tomato varieties, Danmark and Red Currant grown in 1938 (Powers,
1951). The second example, again provided by Dr D. S. Virk, is plant
height at the sixth week after planting in the experimental field in a
cross between varieties 72 and 22 of Nicotiana rustica. Variety 22 was
a parent of the cross we have just analysed in detail and 72 has the same
origin as variety 73 of the earlier cross. Both crosses were grown simul-
taneously, using the same experimental design and family sizes, in 1975.
For the tomato cross all three individual scaling tests are significant, as
is also the joint scaling test. For the N. rustica cross the C scaling test
and the joint scaling test are significant. In both cases, therefore, there is
clear evidence of the inadequacy of the simple additive-dominance model.

TABLE 8.
Examples of crosses where the additive-dominance model is inadequate.
1. Tomato: Danmark × Red Currant, for weight per loculus of fruit, in
   1938 (Powers, 1951)
2. Nicotiana rustica: varieties 72 × 22, for plant height at sixth week
   in field, in 1975.

                        Mean and its S.E.
Generation      Cross 1               Cross 2
P1              10.36 ± 0.581         80.40 ± 1.936
P2               0.45 ± 0.017         65.47 ± 1.726
F1               2.33 ± 0.130         85.99 ± 1.231
F2               2.12 ± 0.105         84.03 ± 0.856
B1               4.82 ± 0.253         84.18 ± 1.160
B2               0.97 ± 0.045         73.88 ± 1.015
Scaling tests
A               -3.05 ± 0.791          1.97 ± 3.263
B               -0.85 ± 0.159         -3.70 ± 2.936
C               -6.99 ± 0.763         18.27 ± 4.950
Joint           χ²[3] = 96.59         χ²[3] = 24.18
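Because each scaling test is a linear combination of independent generation means, its standard error follows directly from the S.E.s of those means. The sketch below is an illustration with hypothetical function names: it recomputes the scaling tests for cross 2 of Table 8 and treats each value divided by its S.E. as a normal deviate, taking |deviate| > 1.96 (the two-sided 5 per cent point) as significant, which is an assumed threshold.

```python
import math

# Coefficients of the generation means in the A, B and C scaling tests.
TESTS = {
    "A": {"B1": 2, "P1": -1, "F1": -1},
    "B": {"B2": 2, "P2": -1, "F1": -1},
    "C": {"F2": 4, "F1": -2, "P1": -1, "P2": -1},
}


def scaling_tests(means, ses):
    """Return {test: (value, S.E., normal deviate)} from generation means and S.E.s."""
    results = {}
    for name, coeffs in TESTS.items():
        value = sum(c * means[g] for g, c in coeffs.items())
        # Variance of a linear combination of independent means.
        se = math.sqrt(sum(c * c * ses[g] ** 2 for g, c in coeffs.items()))
        results[name] = (value, se, value / se)
    return results


# Cross 2 of Table 8 (N. rustica 72 x 22, plant height at sixth week).
means2 = {"P1": 80.40, "P2": 65.47, "F1": 85.99, "F2": 84.03, "B1": 84.18, "B2": 73.88}
ses2 = {"P1": 1.936, "P2": 1.726, "F1": 1.231, "F2": 0.856, "B1": 1.160, "B2": 1.015}

cross2 = scaling_tests(means2, ses2)
for name, (value, se, z) in cross2.items():
    flag = "significant" if abs(z) > 1.96 else "not significant"
    print(f"{name} = {value:.2f} +/- {se:.3f}  ({flag} at 5%)")
```

Recomputed S.E.s may differ from the printed ones in the last figure, because the published means and S.E.s are themselves rounded.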

10. Scales
A failure of the additive-dominance model to fit the data, such as we
found with the last two examples considered in the previous Section,
must imply that one (or more) of the assumptions on which the model
is based is in fact invalid. Thus, for example, in constructing the model
we have assumed that the genes show simple autosomal inheritance. If
then some of them were sex-linked or if there were a maternal element
in the determination of the character, or indeed if the pattern of inherit-
ance departed from the simple autosomal in any other way, the model
would not be appropriate and would be found to fail in its fit with an
adequate body of observational results. This does not of course mean
that biometrical analysis is impossible: it means only that a more appro-
priate model must be found and fitted to the data. The failure of the
additive-dominance model in the examples of the last Section is, how-
ever, most unlikely to be due to invalidity of the assumption of simple
autosomal inheritance. Nicotiana rustica and the tomato are both her-
maphroditic plants and sex-linkage cannot therefore be involved. The
reciprocal F1s were alike in their expression of the character, and this
rules out a maternal element in its determination. There is no reason to
postulate inviability of any of the genotypes included in the families
raised, and the experiment was conducted in such a way as to minimize,
if not entirely eliminate, the chance of selection disturbing the segre-
gation of the genes.
These considerations point to the assumption of simple additivity of
the d's and h's stemming from the various genes as the invalid part of
the model. Again, as we shall see in Chapter 5, the model can be elabor-
ated to accommodate non-independence of the effects of the different
genes, although only at the expense of introducing further parameters.
There is, however, one particular cause of non-independence whose
effects can be resolved in a different way, so allowing the simple additive-
dominance model to be retained and the complexity of introducing
special parameters for the accommodation of the interactions among
the genes to be avoided.
The additive-dominance model assumes that the genes involved are
independent of each other in producing their effects; or in other words
that the total effect of all the genes affecting the character (or at least
the total effect of all such genes which affect the observations we are
making) is the simple sum of their individual effects. Clearly this need
not be so. Genes might, for example, act in a multiplicative fashion, that
is, their joint effect is the product, not the sum, of their individual actions,
and such multiplicativity has in fact often been postulated. In such a case
the simple model we have been using must fail when applied to an
adequate body of data. But if two genes are acting in this way, their joint
effect being $x_a x_b$, where $x_a$ and $x_b$ are their individual effects, and we
replace the measurement of the phenotype by its logarithm, we have
$\log(x_a x_b) = \log x_a + \log x_b$. The multiplicative action has been removed
and they now make their own independent contributions to the
phenotype. So when in such a case we carry out the analysis in terms of the
logarithms of our initial measurements, the assumption of independence
is justified and the simple model will fit. Many other relations between
genes and phenotype are obviously possible and each would suggest a
suitable transformation of the scale on which the measurements of the
phenotype are expressed to restore independence. To take but one more
example, if the genes are additive in their effects on the linear dimensions
of an organ while the character we are following is effectively an area it
will reflect not the sum of the gene effects (as a linear character would)
but the square of the sum. In respect of the area character, then, the
model which assumes additivity will fail; but if we replace the direct
observations by their square roots, so restoring to it a linear basis, the
assumption of additive action of the genes would be valid and the model
would fit these rescaled results. In other words where the assumption of
independent action of the genes fails for this kind of reason, it is possible
in principle to transform the data to a more appropriate scale, as by
taking logs or square roots, or whatever else it may be, and to carry out
the analysis successfully using the simple additive-dominance model on
these transformed data.
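A small numerical illustration shows how a log transformation restores additivity when genes act multiplicatively. The two unlinked genes and their factor values below are hypothetical, chosen only for illustration: each genotype multiplies the phenotype by a fixed factor. On the direct scale the scaling tests A, B and C depart from 0, as they must when the gene effects are not independent, whereas on the log scale they vanish, because log(x_a x_b) = log x_a + log x_b.

```python
import math
from itertools import product

# Hypothetical multiplicative factors for two unlinked genes, listed as
# (P1-type homozygote, heterozygote, P2-type homozygote).
GENE_A = (4.0, 3.0, 1.0)   # AA, Aa, aa
GENE_B = (2.0, 1.5, 1.0)   # BB, Bb, bb

# Frequencies of the three single-locus genotypes in each generation.
FREQS = {
    "P1": (1.0, 0.0, 0.0),
    "P2": (0.0, 0.0, 1.0),
    "F1": (0.0, 1.0, 0.0),
    "F2": (0.25, 0.5, 0.25),
    "B1": (0.5, 0.5, 0.0),
    "B2": (0.0, 0.5, 0.5),
}


def generation_mean(gen, scale):
    """Mean of scale(phenotype), the phenotype being the product of the two factors."""
    p = FREQS[gen]
    return sum(
        p[i] * p[j] * scale(GENE_A[i] * GENE_B[j])   # unlinked: frequencies multiply
        for i, j in product(range(3), repeat=2)
    )


def scaling_tests(scale):
    m = {g: generation_mean(g, scale) for g in FREQS}
    return {
        "A": 2 * m["B1"] - m["P1"] - m["F1"],
        "B": 2 * m["B2"] - m["P2"] - m["F1"],
        "C": 4 * m["F2"] - 2 * m["F1"] - m["P1"] - m["P2"],
    }


direct = scaling_tests(lambda x: x)   # A = -0.25, B = -0.5, C = -1.5
logged = scaling_tests(math.log)      # zero apart from rounding error
print(direct)
print(logged)
```

With real data the appropriate scale is not known in advance, which is why, as the text goes on to say, the justification for any transformation is ultimately empirical.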
The difficulty is, of course, that we cannot in general know how the
genes affecting a character combine in producing their effects, or even
whether in fact they all combine in the same way. So given that the
model fails when applied to a set of data, we can only cast around for a
transformation which removes, or at any rate substantially reduces, the
non-independence. Sometimes the nature of the character may suggest a
suitable transformation. Thus if a character effectively depends on the
area of an organ, the square root transformation is an obvious one to
try; but we must not be surprised if it fails, as we obviously cannot know
that the genes combine additively in their effects on linear dimensions.
In the same way the total weight of fruit yielded by say a tomato plant
can be regarded as the product of number of fruits and their average
weight. This is a multiplicative relation and suggests a log transformation;
but again it does not follow that because these components of yield are
related multiplicatively, the genes affecting any one of the components
combine in a similar way or that some genes do not affect both compo-
nents simultaneously and so introduce a disturbance into the multipli-
cative relation.
Thus ultimately the only justification for any transformation that
may be used is that it works; that whereas on the original data the model
failed because of non-independence, once the data have been transformed
the non-additivity vanishes, the simple model is adequate and there is no
need to complicate the analysis or the interpretation of its results by
introducing parameters to accommodate the non-additivity. Further-
more, because our test of the satisfactoriness of a transformation is em-
pirical, by showing that it is successful in allowing analysis in terms of
the simple model, we must be careful not to use its success as a justifi-
cation for drawing theoretical conclusions concerning the physiology of
gene action. At the same time, it is of course legitimate to test the agree-
ment of any empirical scale with one expected theoretically from other
considerations. This caution is reinforced when we consider that even
where the genes are not all combining in the same way to produce their
effects it may still be possible to find a scale on which their effects are
independent on average, at least as far as the data under analysis go. In
such a case it can give us little if any good information about the nature
of gene action and interaction, and indeed this same transformation may
fail when applied to a different cross involving different genes, as has in
fact been observed to happen on many occasions in practice. Even, how-
ever, where this occurs, empirically the transformation has been justified
since it has simplified the analysis of the body of data to which it was
applicable and lent more precision and confidence to the predictive use
of the results of that analysis.
We can see the value of a suitable transformation if we return to the
example already considered on page 41, where the additive-dominance
model failed to fit the data on the weight per loculus of fruit in the cross
between two tomato varieties (Table 8). Powers (1951) has published
these data on both the original scale and on a logarithmic scale. We can,
therefore, carry out the same tests on the log transformed data. These
tests, summarized in Table 9, provide clear evidence of the adequacy of
the additive-dominance model on the new scale. In contrast the data on
plant height in the cross between two Nicotiana rustica varieties,
considered along with the tomato data in Table 8, could not be successfully
transformed to a scale on which the simple model was adequate by
taking logs, antilogs, squares or square roots of the original data. The
further analysis of these data is taken up in Chapter 5.

TABLE 9.
Analysis of weight per loculus of fruit in the tomato cross Danmark × Red Currant
using the log transformed data (Powers, 1951). Compare with cross 1 in Table 8

                Mean and its S.E.
Generation      on logarithmic scale
P1               0.9769 ± 0.02661
P2              -0.3643 ± 0.01836
F1               0.3346 ± 0.02673
F2               0.2726 ± 0.01465
B1               0.6357 ± 0.01706
B2              -0.0512 ± 0.01467
Scaling tests
A               -0.0401 ± 0.05085
B               -0.0727 ± 0.04373
C               -0.1914 ± 0.08565
Joint           χ²[3] = 5.66
One last point remains to be made about scales of measurement. If
we employ a transformation to remove interactions between non-allelic
genes, as in the example we have just considered, we may, and indeed
commonly will, change the apparent degree of dominance that the
individual genes show, in other words change the value of the ratio h/d. This
is well illustrated by the data on facet number in Bar-eyed female
Drosophila quoted in Section 8. The comparisons among the facet
numbers of B/B, B/+ and +/+ flies are shown in Table 10 using the direct
counts of the facets, the logs of these counts and the square roots of them.
TABLE 10.
Effect of scalar transformation on the analysis of facet number
in Bar-eyed Drosophila

                                Mean facet number
Genotype            Direct count    Log. transformation    Square-root
+/+ (wild type)     779.4           2.892                  27.92
B/+                 358.4           2.554                  18.93
B/B                  68.1           1.833                   8.25
Components
m                   423.75          2.3625                 18.085
d                   355.65          0.5295                  9.835
h                   -65.35          0.1915                  0.845
h/d                 -0.184          0.362                   0.086

As we have already seen, when the direct counts are used, h is negative
and the Bar allele appears partially dominant to its wild-type alternative.
If, however, we apply the log transformation, h becomes positive and h/d
is larger than with the direct measure of facet number, so suggesting not
only that wild-type is partially dominant to Bar but that the degree of
dominance is larger as well as being in the opposite direction. But if we
take the square root of facet number (which might be regarded as
reasonable since the number of facets is essentially a measure of area), h/d is
near to 0, so suggesting that dominance is in truth negligible.
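The components in Table 10 are simple functions of the three genotype means: m is the mid-point of the two homozygotes, d half their difference and h the departure of the heterozygote from m. The sketch below (the function name is hypothetical) recomputes them on the three scales, taking wild type as the homozygote that gives a positive d, as the table does. Because it transforms the unrounded mean counts rather than logs and roots rounded to the printed places, the ratios may differ from the table in the last figure.

```python
import math

# Mean facet numbers of the three genotypes (direct counts, Table 10).
MEANS = {"+/+": 779.4, "B/+": 358.4, "B/B": 68.1}

SCALES = {
    "direct": lambda x: x,
    "log10": math.log10,
    "sqrt": math.sqrt,
}


def components(wild, het, bar):
    """Return m, d, h and h/d for a single gene difference."""
    m = (wild + bar) / 2       # mid-point of the two homozygotes
    d = (wild - bar) / 2       # half the difference between the homozygotes
    h = het - m                # departure of the heterozygote from the mid-point
    return m, d, h, h / d


results = {}
for label, transform in SCALES.items():
    wild, het, bar = (transform(MEANS[g]) for g in ("+/+", "B/+", "B/B"))
    results[label] = components(wild, het, bar)
    m, d, h, ratio = results[label]
    print(f"{label:>6}: m = {m:.4f}  d = {d:.4f}  h = {h:.4f}  h/d = {ratio:.3f}")
```

The change in the sign of h between the direct and log scales, and its near disappearance on the square-root scale, are the points made in the text.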
Which of these scales we choose to use, and hence what direction and
degree of dominance we choose to accept, is in this case a matter of
taste, for with a gene difference of such large and unique effect by com-
parison with the residual variation in facet number, we have no test of
whether any of the scales is preferable to the others in respect of reduc-
ing or removing interactions with other genes. If our aim is to simplify
the representation of the effect of Bar, as far as possible, the square root
transformation has the advantage of eliminating h and leaving us only
with the need to use d in describing the relation between the three geno-
types. At the same time, no matter which scale we use we can easily pre-
dict the mean facet number of an F2 , back-cross or any other type of
family we care to consider, since in the absence of other segregating
genes of comparable effect, h and d give us a complete description of the
genetic determination of the action of Bar. Furthermore, we should note
that no matter which scale is used, we must conclude that dominance, if
present, is small. Neither the log nor the square root transformation (nor
for that matter, any other reasonable transformation) would show
dominance as other than complete, i.e. h = d, if in fact B/+ had had the same
number of facets as one or other of the homozygotes, and neither
transformation would have failed to reveal over-dominance, i.e. h > d, if the facet
number of B/+ had fallen outside the range determined by B/B and +/+.
As has been emphasized, the justification for using a transformed scale
is not theoretical but empirical, in that it removes or so reduces non-
independence of the gene effects as to permit the use of the additive-
dominance model with the simpler analysis and more confident predic-
tion to which it leads. Furthermore the estimates of the genetical par-
ameters d and h, obtained when the additive-dominance model can be
employed, are unconditional in that they are not subject to adjustment
by the interaction parameters which non-additivity introduces and are
constant over the range of variation under consideration. For these
reasons, while we must recognize that it is not always possible to find a
transformation which in effect removes non-additivity when this is
present in the direct measurements, the search for such a transformation
is always well worth-while.

11. Components of variation: F2 and back-crosses


So far we have been considering the constitution of family means in
terms of the additive-dominance model and the way in which obser-
vational data can be analysed so as to yield not only estimates of the
genetical parameters [d] and [h], in terms of which the values of the
means can be interpreted, but also a test of whether the model fits the
data in the sense of providing an adequate framework for the under-
standing of the observations. We must now leave these first degree stat-
istics, the means, and turn to consider the second degree statistics, the
variances and covariances that can be calculated from the families raised
in genetical experiments, the genetical parameters in terms of which
these statistics can be analysed and the test of whether the simple model
provides an adequate basis for understanding them.
Now the variation in each of the true breeding parent lines, P1 and P2,
must be exclusively non-heritable, for all the individuals within one line
will be of the same genotype, apart from the effects of mutation which,
although detectable in suitable experiments, are in general so small as to
be safely neglected. Similarly all the individuals in the F1 between two
such parent lines will have the same genotype, although they will be
heterozygous and not homozygous like their parents. Again all the variation
will be non-heritable within the F1 family as it was in the parents. The
variances of the measurements of the character in both parents and F1
will thus provide estimates of the non-heritable variation and of its
contribution to the variances of later generations in which, because of
segregation of the genic differences between P1 and P2, heritable variation
will also be present.
Considering first the F2, in the absence of disturbing elements such as
differential fertilization or viability, its constitution in respect of any
gene pair A-a by which P1 and P2 differed will be $\tfrac14 AA$, $\tfrac12 Aa$ and $\tfrac14 aa$.
This gene pair will add increments of $d_a$, $h_a$ and $-d_a$ to the expression of
the character in individuals of the three genotypes and, as we have already
seen (Table 5), the contribution of A-a to the deviation of the F2 mean
from m, the mid-parent, will be $\tfrac12 h_a$. The contribution of A-a to the sum
of squares of deviation from the mid-parent will be

$$\tfrac14 d_a^2 + \tfrac12 h_a^2 + \tfrac14(-d_a)^2 = \tfrac12 d_a^2 + \tfrac12 h_a^2$$

and its contribution to the sum of squares from the F2 mean then becomes

$$\tfrac12 d_a^2 + \tfrac12 h_a^2 - (\tfrac12 h_a)^2 = \tfrac12 d_a^2 + \tfrac14 h_a^2,$$

the term correcting for the departure of the mean from the mid-parent
being the square of the mean itself, since we are using the proportionate
frequencies of the three genotypes and these sum to unity. For the same
reason the contribution of A-a to the mean square measuring the
heritable variation is the same as its contribution to the sum of squares,
namely $\tfrac12 d_a^2 + \tfrac14 h_a^2$.
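The single-locus result, and its extension to several genes by simple summation, can be checked exactly with rational arithmetic. The sketch below assumes unlinked genes acting additively (the assumption of independence made in the text) and uses hypothetical d and h values; it enumerates every multi-locus F2 genotype, computes the variance of the genotypic values directly, and compares it with ½S(d²) + ¼S(h²).

```python
from fractions import Fraction as Fr
from itertools import product

F2_FREQS = (Fr(1, 4), Fr(1, 2), Fr(1, 4))   # AA, Aa, aa


def variance(dist):
    """Variance of a discrete distribution given as (probability, value) pairs."""
    mean = sum(p * x for p, x in dist)
    return sum(p * (x - mean) ** 2 for p, x in dist)


def f2_distribution(loci):
    """Joint F2 distribution of genotypic values (deviations from the mid-parent).

    Each locus is a (d, h) pair; genotypes AA, Aa and aa contribute d, h and -d.
    Loci are assumed unlinked and additive, so probabilities multiply and
    contributions add.
    """
    dist = []
    for genotype in product(range(3), repeat=len(loci)):
        p, x = Fr(1), Fr(0)
        for (d, h), g in zip(loci, genotype):
            p *= F2_FREQS[g]
            x += (d, h, -d)[g]
        dist.append((p, x))
    return dist


# Hypothetical (d, h) values for three gene pairs.
loci = [(Fr(3), Fr(2)), (Fr(1), Fr(-1)), (Fr(5, 2), Fr(4))]
D = sum(d * d for d, _ in loci)
H = sum(h * h for _, h in loci)

print(variance(f2_distribution(loci)), D / 2 + H / 4)   # 107/8 107/8
```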
Assuming that non-allelic genes make independent contributions to it,
the heritable variance produced by all the genes segregating in the F2 will
be the sum of their individual contributions. It thus becomes
$\tfrac12 S(d^2) + \tfrac14 S(h^2) = \tfrac12 D + \tfrac14 H$, where we define $D = S(d^2)$ and $H = S(h^2)$.
Thus the heritable variance comprises two parts: the D component, depending on
the d's which measure the departure of homozygotes from the
mid-parent, and the H component, which depends on the h's measuring the
departures of heterozygotes from the mid-parent. The D variation can
in principle be fixed by the selection of homozygous lines and so may
be referred to as fixable variation. The H variation depends on the
properties of heterozygotes and is therefore unfixable. H may also be
described as the dominance component of variation since, when
dominance is absent at all loci, all h's = 0 and H = 0. Similarly if dominance
is complete at all loci, all h = ±d and H = D, while with overdominance
at all loci all |h| > |d| and H > D. Now since $D = S(d^2)$ and $H = S(h^2)$,
both are quadratic quantities. By contrast, therefore, with [d] and [h],
the values of D and H will be uninfluenced by the distribution between
the parent lines of the alleles at the various loci and by the direction of
dominance as reflected in the sign of h. Thus if we care to assume that
h and d are constant in magnitude (although in the case of h not
necessarily in sign) for all the genes segregating in the cross, $\sqrt{H/D} = h/d$
provides a direct estimate of the degree of dominance free of the
disturbances which we had occasion to note when we were discussing the
ratio $[h]/[d]$. If h and d are not constant in magnitude, $\sqrt{H/D}$
provides an estimate of the average dominance of the genes.
Before leaving the variance of F2 we should note that it must of course
also include a non-heritable component which, provided the heritable
and non-heritable components are independent of one another (i.e.
provided that the phenotypes given by all the genotypes are subject to
the same variation from non-heritable causes), can be denoted by a
separate term E. Thus the variance of F2 may be expressed as

$$V_{1F2} = \tfrac12 D + \tfrac14 H + E.$$

The reason for using $V_{1F2}$ rather than the simple $V_{F2}$ to denote this
variance will appear later (page 52).
Proceeding from F2 to the back-crosses, we note that in respect of A-a
the back-cross to the larger parent, P1, will comprise $\tfrac12 AA$ and $\tfrac12 Aa$
individuals and that to the smaller parent, P2, $\tfrac12 Aa$ and $\tfrac12 aa$ individuals.
Then, as we have already seen, $\bar{B}_1 = \tfrac12 d_a + \tfrac12 h_a$ and $\bar{B}_2 = \tfrac12 h_a - \tfrac12 d_a$.
The contributions of A-a to the variances of the two back-crosses will thus be

$$\tfrac12 d_a^2 + \tfrac12 h_a^2 - [\tfrac12(d_a + h_a)]^2 = \tfrac14(d_a - h_a)^2 \quad\text{to } V_{B1}$$

and similarly $\tfrac14(d_a + h_a)^2$ to $V_{B2}$. Then, assuming independence of the
contributions of the different genes, the heritable portions of the
back-cross variances become $\tfrac14 S(d - h)^2$ and $\tfrac14 S(d + h)^2$ respectively.
Clearly d and h do not make independent contributions, and we must introduce
a further component of variation, $F = S(dh)$, to give the expressions

$$V_{B1} = \tfrac14 D - \tfrac12 F + \tfrac14 H + E \quad\text{and}\quad V_{B2} = \tfrac14 D + \tfrac12 F + \tfrac14 H + E,$$

E representing the non-heritable variation as before. We may note,
however, that if we add the two variances we obtain

$$V_{B1} + V_{B2} = \tfrac12 D + \tfrac12 H + 2E,$$

and again we have an expression to which d and h make independent
contributions. Similarly, if we take the difference of the two variances

$$V_{B2} - V_{B1} = F = S(dh).$$

Now F is a linear function of the h's and so, like h, can take sign: it is in
fact a weighted sum of the h's, the weights being the corresponding d's.
Where F is positive the genes from the larger parent, P1, show a
preponderance of dominance over their alleles from P2, and where F is negative
the genes from the smaller parent, P2, show the preponderance of
dominance. It will be observed too that because of F the back-cross to the
parent with the preponderance of dominance gives the smaller variance.
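The back-cross expressions, including the sign of F, can be verified in the same exact way. This sketch assumes independent (unlinked) loci, so that the whole-cross variances are sums of single-locus contributions, and uses hypothetical d and h values; with F positive, as here, the back-cross to P1 has the smaller variance.

```python
from fractions import Fraction as Fr

HALF = Fr(1, 2)


def variance(dist):
    """Variance of a discrete distribution given as (probability, value) pairs."""
    mean = sum(p * x for p, x in dist)
    return sum(p * (x - mean) ** 2 for p, x in dist)


def backcross_contributions(d, h):
    """Heritable variance contributed by one gene pair to B1 and to B2."""
    v_b1 = variance([(HALF, d), (HALF, h)])    # 1/2 AA + 1/2 Aa
    v_b2 = variance([(HALF, h), (HALF, -d)])   # 1/2 Aa + 1/2 aa
    return v_b1, v_b2


# Hypothetical (d, h) values for three gene pairs.
loci = [(Fr(3), Fr(2)), (Fr(1), Fr(-1)), (Fr(5, 2), Fr(4))]
D = sum(d * d for d, h in loci)
H = sum(h * h for d, h in loci)
F = sum(d * h for d, h in loci)

V_B1 = sum(backcross_contributions(d, h)[0] for d, h in loci)
V_B2 = sum(backcross_contributions(d, h)[1] for d, h in loci)

print(V_B1 == D / 4 - F / 2 + H / 4, V_B2 == D / 4 + F / 2 + H / 4)  # True True
print(V_B2 - V_B1 == F, V_B1 + V_B2 == D / 2 + H / 2)                # True True
```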
If we assume that all k gene pairs by which P1 and P2 differ have equal
d's and equal h's, $D = S(d^2) = kd^2$, $H = S(h^2) = kh^2$ and $F = S(dh) = kdh$.
Then $\sqrt{DH} = \sqrt{kd^2 \cdot kh^2} = k|dh| = |F|$, provided the h's are all of
the same sign. But if the h's vary in their sign, some being + and others -,
$|F| < \sqrt{DH}$. Exactly the same conclusions are arrived at even when we
do not have equal d's and h's, providing that the dominance ratio h/d is
the same for all k loci. We have, therefore, in principle a test of
consistency in the sign of the h's.
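The test of consistency in the sign of the h's can be illustrated with a few hypothetical sets of gene effects. The sketch compares |F| with √(DH): the two agree when all h's share a sign and h/d is the same at every locus, and |F| falls short of √(DH) once the h's differ in sign, although √(H/D) is unaffected.

```python
import math


def components(loci):
    """D = S(d^2), H = S(h^2) and F = S(dh) for a list of (d, h) pairs."""
    D = sum(d * d for d, h in loci)
    H = sum(h * h for d, h in loci)
    F = sum(d * h for d, h in loci)
    return D, H, F


# Hypothetical crosses, each with k = 4 gene pairs.
crosses = {
    "equal effects, same sign": [(2.0, 1.0)] * 4,
    "equal effects, mixed sign": [(2.0, 1.0), (2.0, 1.0), (2.0, -1.0), (2.0, 1.0)],
    "unequal effects, constant h/d": [(1.0, 0.5), (3.0, 1.5), (2.0, 1.0), (4.0, 2.0)],
}

for label, loci in crosses.items():
    D, H, F = components(loci)
    print(f"{label}: |F| = {abs(F):.3f}, sqrt(DH) = {math.sqrt(D * H):.3f}, "
          f"sqrt(H/D) = {math.sqrt(H / D):.3f}")
```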
When analysing the components of variation the simple additive-
dominance model assumes that the various gene pairs contribute inde-
pendently to the variances and covariances just as we saw that it did
when analysing the components of means. In addition, however, we now
have the further assumption that the contribution to the variation made
by non-heritable agencies is independent of that made by the genes, or
to put it in other words that there is no interaction of genotype and
environment. This is by no means always a valid assumption, for we not
uncommonly find different genotypes to be subject to different types of
non-heritable variation. Sometimes the differences can be removed, or at
least greatly reduced by a transformation of the scale.
Commonly, however, we find that an F1 between two inbred lines of
a naturally outbreeding species, while showing an intermediate mean
expression of a character, shows a variance lower than those of both parents.
No reasonable transformation of the scale will remove such differences.
Two courses are then open. A simple, if somewhat crude, allowance for
the differences can be made by taking the average of the parental and F1
variances as the direct estimate of E; and this can be refined by an
appropriate weighting of the contributions the parents and F1 make to the
average, for example, by taking $\tfrac14 V_{P1} + \tfrac14 V_{P2} + \tfrac12 V_{F1}$ (where $V_{P1}$ is the
variance of parent 1, etc.) as a direct estimate of the E component in
$V_{1F2}$, and in the summed variances of the back-crosses, $V_{B1} + V_{B2}$.
Difficulties arise when we move on to later generations, since the
corresponding weighting should change, as for example in F3, where E in the
overall variance should be found as $\tfrac38 V_{P1} + \tfrac38 V_{P2} + \tfrac14 V_{F1}$, since only $\tfrac14$ of the
individuals in F3 are heterozygous at any locus by comparison with $\tfrac12$ in F2.
Probably, when making this simple correction for differences in the
non-heritable variation among parents and F1, putting $E = \tfrac14 V_{P1} + \tfrac14 V_{P2} + \tfrac12 V_{F1}$
is as useful a weighting as any, and well within the limits of error of such
a crude, empirical correction.
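The weighting of parental and F1 variances by genotype frequency can be written as one small function. This is a sketch under the assumptions of the text: homozygotes of either parental type are taken to share the variability of the corresponding parent and heterozygotes that of the F1, and the heterozygote proportion at a locus is taken to halve with each generation of selfing (½ in F2, ¼ in F3). The function names and the variances in the last line are hypothetical.

```python
from fractions import Fraction as Fr


def heterozygote_fraction(n):
    """Proportion heterozygous at one locus in generation F_n of a selfed series."""
    return Fr(1, 2 ** (n - 1))


def weighted_e(v_p1, v_p2, v_f1, het):
    """Estimate of E, weighting parental and F1 variances by genotype frequency."""
    hom = (1 - het) / 2           # frequency of each homozygote class
    return hom * v_p1 + hom * v_p2 + het * v_f1


for n in (2, 3):
    het = heterozygote_fraction(n)
    weights = [weighted_e(*unit, het) for unit in ((1, 0, 0), (0, 1, 0), (0, 0, 1))]
    print(f"F{n}: weights for V_P1, V_P2, V_F1 =", *weights)
# F2: weights for V_P1, V_P2, V_F1 = 1/4 1/4 1/2
# F3: weights for V_P1, V_P2, V_F1 = 3/8 3/8 1/4

# Hypothetical variances: V_P1 = 2.0, V_P2 = 3.0, V_F1 = 1.0.
print(weighted_e(2.0, 3.0, 1.0, heterozygote_fraction(2)))   # 1.75
```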
The second course open to us is to expand the model and introduce
into it appropriate parameters to represent the genotype X environment
interaction in the way we shall see in Chapter 6. Such an expanded model,
however, necessarily requires more data to permit the estimation of the
greater number of parameters it entails and the testing of its goodness of
fit. The use of a suitable transformation or a simple, if necessarily approxi-
mate, correction is always worth considering if the simple additive-
dominance model can thereby be made to fit satisfactorily.

12. Generations derived from F2


Further generations can be derived from the F2 and the back-crosses, and
the structures of their variances expressed in terms of D, H and F. Those
from the back-crosses will not be considered here: they are dealt with by
Mather and Jinks (1971). * In respect of the gene for A-a the overall com-
position of an F3 generation, derived by selfing the individuals of F2 will

* Since this reference will be in frequent use, it will hereafter be abbreviated to M and J.
be $\tfrac38 AA : \tfrac14 Aa : \tfrac38 aa$, giving a mean of $\tfrac14 h_a$. The contribution of A-a to the
variance $V_{F3}$ will thus be

$$\tfrac38 d_a^2 + \tfrac14 h_a^2 + \tfrac38(-d_a)^2 - (\tfrac14 h_a)^2 = \tfrac34 d_a^2 + \tfrac{3}{16} h_a^2.$$

This overall variance can, however, be broken down into two parts: the
variance of the means of the F3 families, $V_{1F3}$, round the overall mean
of the F3 generation, and the mean variance of the F3 families, $V_{2F3}$,
each calculated round its own mean but averaged over all families. The
variance of the F3 means is like the variance of F2 in that its heritable
portion reflects the genetical differences produced by segregation at
gametogenesis of the F1. These are therefore described as first rank
variances, denoted by the subscript 1. The variances within the F3 families
themselves, however, reflect the segregation at gametogenesis of the F2
individuals and the mean variance of the F3 families is thus of the second
rank, denoted by the subscript 2. As we shall see later, rank is of special
significance in relation to the effects of linkage on the components of
variation.
In respect of A-a, the F3 families will be of three kinds, derived
respectively by selfing AA, Aa and aa individuals of the F2. The families from
homozygous F2 individuals will be like P1 and P2 in the contribution A-a
makes to their means and variances, and the families from Aa individuals
of F2 will be like the F2 itself in the contribution to mean and variance,
thus

F2 individuals          AA        Aa                                    aa
Frequency in F2         1/4       1/2                                   1/4
F3 family: mean         $d_a$     $\tfrac12 h_a$                         $-d_a$
F3 family: variance     0         $\tfrac12 d_a^2 + \tfrac14 h_a^2$      0

The contribution to the variance of F3 means, $V_{1F3}$, will thus be

$$\tfrac14 d_a^2 + \tfrac12(\tfrac12 h_a)^2 + \tfrac14(-d_a)^2 - (\tfrac14 h_a)^2,$$

the last term being the correction for the overall mean of $\tfrac14 h_a$. This
reduces to $\tfrac12 d_a^2 + \tfrac1{16} h_a^2$, which, summing over all the genes by which P1
and P2 differ, gives $\tfrac12 D + \tfrac1{16} H$ as the heritable portion of $V_{1F3}$. The
contribution of A-a to the mean variance, $V_{2F3}$, will be
$\tfrac14(0) + \tfrac12(\tfrac12 d_a^2 + \tfrac14 h_a^2) + \tfrac14(0) = \tfrac14 d_a^2 + \tfrac18 h_a^2$, which on summing
over all gene differences gives $\tfrac14 D + \tfrac18 H$ as the heritable portion of
the mean variance.
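The partition of the F3 variance into its first-rank and second-rank parts can be checked by enumerating the three kinds of F3 family. The sketch assumes a single gene pair with hypothetical d and h values and no non-heritable variation; it also confirms that the two parts add up to the overall F3 contribution ¾d² + (3/16)h².

```python
from fractions import Fraction as Fr

GENOTYPES = ("AA", "Aa", "aa")
F2_FREQS = {"AA": Fr(1, 4), "Aa": Fr(1, 2), "aa": Fr(1, 4)}
# Offspring genotype frequencies (AA, Aa, aa) on selfing each F2 genotype.
SELFED = {
    "AA": (Fr(1), Fr(0), Fr(0)),
    "Aa": (Fr(1, 4), Fr(1, 2), Fr(1, 4)),
    "aa": (Fr(0), Fr(0), Fr(1)),
}


def mean_and_variance(freqs, values):
    mean = sum(p * x for p, x in zip(freqs, values))
    return mean, sum(p * (x - mean) ** 2 for p, x in zip(freqs, values))


def f3_components(d, h):
    """Single-locus heritable contributions to V_1F3 and V_2F3."""
    values = (d, h, -d)   # genotypic values of AA, Aa, aa as deviations from m
    families = {g: mean_and_variance(SELFED[g], values) for g in GENOTYPES}
    f2 = [F2_FREQS[g] for g in GENOTYPES]
    v1 = mean_and_variance(f2, [families[g][0] for g in GENOTYPES])[1]
    v2 = sum(F2_FREQS[g] * families[g][1] for g in GENOTYPES)
    return v1, v2


d, h = Fr(3), Fr(2)   # hypothetical
v1, v2 = f3_components(d, h)
_, v_total = mean_and_variance((Fr(3, 8), Fr(1, 4), Fr(3, 8)), (d, h, -d))
print(v1, v2, v_total)   # 19/4 11/4 15/2
```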
Both these variances will of course also contain a non-heritable com-
ponent, E, but these E components will not in general be equal. In the
first place the effect of those non-heritable agencies that cause differ-
ences among the members of a family will be less on the mean of the
family than on its individual members. Indeed in respect of this part of
the non-heritable variation $E_2 = \tfrac1n E_1$, where $E_2$ is the variation of the
means of families comprising n individuals each and $E_1$ is the variation
within the families. But where each family is raised in its own plot in
the case of plants, or in its own cage or culture container in the case of
animals, we must expect greater non-heritable differences between
individuals from different families, i.e. coming from different plots or
containers, than between individuals from the same family, i.e. from the
same plot or container. Thus, unless special experimental designs are
used to avoid this situation, we must expect $E_2 > \tfrac1n E_1$, and in extreme
cases $E_2$ may even be greater than $E_1$ itself. If we write $E_w$ for the
non-heritable variation within families and $E_b$ for the additional non-heritable
variation between families, we can put $E_2 = E_b + \tfrac1n E_w$ and, of course,
$E_1 = E_w$.
There is another point to be noted about the variance of family means.
Each mean will be subject to sampling variation arising from the variation
within the family, and this will be additional to the innate variation be-
tween the family means themselves, arising from genetical or indeed any
other differences between the means as such. The component of sampling
variation in $V_{1F3}$ will be $\tfrac1n V_{2F3}$, where each family includes n individuals,
or, if the numbers vary from one family to another, where n is the
harmonic mean of these numbers. $\tfrac1n V_{2F3}$ will of course include the item $\tfrac1n E_w$,
which is the contribution of sampling variation in respect of non-heritable
variation within families to non-heritable variation between their means.
We can thus write

$$V_{1F3} = \tfrac12 D + \tfrac1{16} H + E_b + \tfrac1n V_{2F3}$$

$$V_{2F3} = \tfrac14 D + \tfrac18 H + E_w.$$
In addition to these two variances we can also find the covariance,
$W_{1F23}$, between the phenotype of the F2 parent and the mean of the F3
family to which it gives rise. This covariance will of course be of the first
rank. In respect of A-a, an AA F2 individual will have a phenotype of $d_a$
and will give rise to a progeny of mean $d_a$. Similarly an aa F2 individual
will have a phenotype $-d_a$ and the mean of its progeny will be $-d_a$; but
an Aa individual in F2 will have a phenotype $h_a$ itself while the mean of
its progeny will only be $\frac{1}{2}h_a$. The contribution of A-a to the covariance
will thus be
$$\tfrac{1}{4}(d_a)^2 + \tfrac{1}{2}\left(h_a \cdot \tfrac{1}{2}h_a\right) + \tfrac{1}{4}(-d_a)^2 - \tfrac{1}{2}h_a \cdot \tfrac{1}{4}h_a,$$
the correction term being the product of the F2 and overall F3 means. This reduces to $\frac{1}{2}d_a^2 + \frac{1}{8}h_a^2$ and, summing over all the relevant genes, gives $W_{1F23} = \frac{1}{2}D + \frac{1}{8}H$.
There will be no E component in the covariance provided that the
non-heritable agencies affecting the progeny are uncorrelated with those
affecting the parents. This lack of correlation can be achieved, and an E
component avoided, by independent randomization of parents and off-
spring in the experiment, so that they do not share a common family
environment. Such independent randomization is a standard practice in
experimental plant breeding; but it is difficult to achieve with higher
animals because of the essential period of maternal care for the young
offspring, with the consequence that the covariance must be expected
to contain an E component in such cases.
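The single-gene bookkeeping that leads to $V_{1F3}$, $V_{2F3}$ and $W_{1F23}$ can be checked by brute-force enumeration. The Python sketch below is illustrative only and is not part of the original treatment: it assumes a single gene A-a with genotypic values $d$, $h$ and $-d$ measured from m, leaves out non-heritable and sampling variation, and uses exact fractions so that the results can be compared term by term with $\frac{1}{2}d^2 + \frac{1}{16}h^2$, $\frac{1}{4}d^2 + \frac{1}{8}h^2$ and $\frac{1}{2}d^2 + \frac{1}{8}h^2$. The function names are hypothetical.

```python
from fractions import Fraction as Fr


def value(g, d, h):
    """Genotypic value (measured from m) of a genotype coded by its number of A alleles."""
    return {2: d, 1: h, 0: -d}[g]


def selfed(g):
    """Offspring genotype frequencies obtained by selfing genotype g."""
    return {2: {2: Fr(1)}, 1: {2: Fr(1, 4), 1: Fr(1, 2), 0: Fr(1, 4)}, 0: {0: Fr(1)}}[g]


def mean_and_variance(dist, d, h):
    """Mean and variance of genotypic values over a genotype distribution {g: frequency}."""
    mean = sum(f * value(g, d, h) for g, f in dist.items())
    var = sum(f * value(g, d, h) ** 2 for g, f in dist.items()) - mean ** 2
    return mean, var


def f3_statistics(d, h):
    """Single-gene heritable parts of V1F3, V2F3 and W1F23."""
    f2 = selfed(1)  # the F2 is the selfed F1, Aa
    family = {g: mean_and_variance(selfed(g), d, h) for g in f2}  # F3 family mean, variance
    f2_mean = sum(f * value(g, d, h) for g, f in f2.items())
    f3_mean = sum(f * family[g][0] for g, f in f2.items())
    v1 = sum(f * family[g][0] ** 2 for g, f in f2.items()) - f3_mean ** 2
    v2 = sum(f * family[g][1] for g, f in f2.items())
    w1 = sum(f * value(g, d, h) * family[g][0] for g, f in f2.items()) - f2_mean * f3_mean
    return v1, v2, w1


print(f3_statistics(Fr(3), Fr(2)))  # (Fraction(19, 4), Fraction(11, 4), Fraction(5, 1))
```

With $d = 3$ and $h = 2$ the three values are $\frac{9}{2} + \frac{1}{4}$, $\frac{9}{4} + \frac{1}{2}$ and $\frac{9}{2} + \frac{1}{2}$, as the formulas require; summing such single-gene contributions over genes gives the coefficients of D and H.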
We can extend the calculations to the F4 generation, where there are
three variances and two covariances. The first variance, $V_{1F4}$, is that between the means of the groups of F4 families, where the members of
each group trace back through a single F3 family to a single F2 individual,
and it is therefore of rank 1. There will be a corresponding covariance,
$W_{1F34}$, between the means of the F3 families and the means of the F4
groups. The second variance, $V_{2F4}$, is the variance of F4 family means
within the groups taken round the group means but averaged over
groups. It will be of rank 2, and will have a corresponding covariance,
$W_{2F34}$, between F3 individuals and the means of the F4 families to which
they give rise, calculated within groups but averaged over groups. Finally
there will be the mean variance of families averaged over all the F4 families, $V_{3F4}$, which will be of rank 3 since it reflects differences springing from
gametogenesis in the F3 individuals. Provided that $E_b$ is no greater between families from different groups than between those of the same
group, and making allowance for the appropriate sampling variation of
family and group means, with $n$ individuals in each family and $n'$ families
in each group, it can be shown that
$$V_{1F4} = \tfrac{1}{2}D + \tfrac{1}{64}H + \tfrac{1}{n'}V_{2F4}$$
$$V_{2F4} = \tfrac{1}{4}D + \tfrac{1}{32}H + E_b + \tfrac{1}{n}V_{3F4}$$
$$V_{3F4} = \tfrac{1}{8}D + \tfrac{1}{16}H + E_w$$
$$W_{1F34} = \tfrac{1}{2}D + \tfrac{1}{32}H$$
$$W_{2F34} = \tfrac{1}{4}D + \tfrac{1}{16}H.$$
We can proceed in the same way to F5, where there will be four variances
and three covariances, and indeed to any later F generation that we wish.
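The same enumeration extends one generation further. This hypothetical sketch (names such as `f4_statistics` are illustrative, not from the text) follows each F2 plant through its F3 progeny to the F4 families and returns the single-gene heritable parts of $V_{1F4}$, $V_{2F4}$, $V_{3F4}$, $W_{1F34}$ and $W_{2F34}$; non-heritable and sampling terms are again left out.

```python
from fractions import Fraction as Fr

# Offspring genotype frequencies from selfing; genotypes coded by number of A alleles.
SELF = {2: {2: Fr(1)}, 1: {2: Fr(1, 4), 1: Fr(1, 2), 0: Fr(1, 4)}, 0: {0: Fr(1)}}


def cov(dist, fx, fy):
    """Covariance of fx(g) and fy(g) over a genotype distribution {g: frequency}."""
    ex = sum(f * fx(g) for g, f in dist.items())
    ey = sum(f * fy(g) for g, f in dist.items())
    return sum(f * fx(g) * fy(g) for g, f in dist.items()) - ex * ey


def f4_statistics(d, h):
    """Single-gene heritable parts of V1F4, V2F4, V3F4, W1F34 and W2F34."""
    def value(g):
        return {2: d, 1: h, 0: -d}[g]

    def family_mean(g):  # mean of the progeny obtained by selfing g
        return sum(f * value(k) for k, f in SELF[g].items())

    def group_mean(g2):  # mean over all F4 families descended from F2 plant g2
        return sum(f * family_mean(g3) for g3, f in SELF[g2].items())

    f2 = SELF[1]
    v1 = cov(f2, group_mean, group_mean)
    v2 = sum(f * cov(SELF[g2], family_mean, family_mean) for g2, f in f2.items())
    v3 = sum(f2[g2] * f * cov(SELF[g3], value, value)
             for g2 in f2 for g3, f in SELF[g2].items())
    w1 = cov(f2, family_mean, group_mean)
    w2 = sum(f * cov(SELF[g2], value, family_mean) for g2, f in f2.items())
    return v1, v2, v3, w1, w2


print(f4_statistics(Fr(3), Fr(2)))
```

Because every statistic is built from the same `cov` helper, the ranks are explicit: $V_{1F4}$ and $W_{1F34}$ are taken over F2 plants, while $V_{2F4}$ and $W_{2F34}$ are taken within F2-derived groups and then averaged.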
In addition to F3s other types of family can be raised from F2 parents.
The F2 individuals may for example be mated together in random pairs
to give families of the type that Mather (1949) has called BIPs (for biparental progenies of the third generation). Such random mating of the
F2 individuals will obviously give a third generation which (linkage apart)
has overall the same constitution as the F2 itself, and which will thus
have an overall mean of $\frac{1}{2}[h]$ and an overall variance of $\frac{1}{2}D + \frac{1}{4}H + E$.
As with the F3, however, we can divide this overall variance into two
parts, the variance of the family means ($V_{1S3}$) and the mean variance of
the families ($V_{2S3}$), the subscript S indicating sib-mating and so allowing
extension of the nomenclature to fourth and later generations raised by
random sib-mating within families. In respect of any gene pair, A-a, there
are six types of mating among the F2 individuals. These, together with
their frequencies where mating is at random, their means and their variances, in respect of A-a, are shown in Table 11. It is not difficult to see

TABLE 11.
Biparental progenies from random matings among the individuals of an F2

Mating      Frequency   Progeny mean   Progeny variance
AA x AA     1/16        d              0
AA x Aa     1/4         (d + h)/2      (d - h)²/4
AA x aa     1/8         h              0
Aa x Aa     1/4         h/2            d²/2 + h²/4
Aa x aa     1/4         (h - d)/2      (d + h)²/4
aa x aa     1/16        -d             0
Overall mean            h/2

from this table that the contribution of A-a to the variance of family
means ($V_{1S3}$) will be
$$\tfrac{1}{16}d_a^2 + \tfrac{1}{4}\left[\tfrac{1}{2}(d_a + h_a)\right]^2 + \dots + \tfrac{1}{16}(-d_a)^2 - \left(\tfrac{1}{2}h_a\right)^2 = \tfrac{1}{4}d_a^2 + \tfrac{1}{16}h_a^2$$
where the term $-(\frac{1}{2}h_a)^2$ is the correction for the deviation of the overall
mean of the generation from the mid-parent m. Similarly the contribution
of A-a to the mean variance of the families ($V_{2S3}$) will be
$$\tfrac{1}{16}(0) + \tfrac{1}{4}\cdot\tfrac{1}{4}(d_a - h_a)^2 + \dots + \tfrac{1}{16}(0) = \tfrac{1}{4}d_a^2 + \tfrac{3}{16}h_a^2.$$
Then summing over all the relevant genes, adding the non-heritable component of variation and also the item for sampling variation in $V_{1S3}$, we find
$$V_{1S3} = \tfrac{1}{4}D + \tfrac{1}{16}H + E_b + \tfrac{1}{n}V_{2S3}$$
$$V_{2S3} = \tfrac{1}{4}D + \tfrac{3}{16}H + E_w$$
to which may be added
$$W_{1S23} = \tfrac{1}{4}D$$
for the covariance of the family means with the phenotypes of their F2
parents. We can proceed in the same way to S4, the fourth generation
raised by random sib-mating inside the S3 families, where, just as with F4,
there will be three variances and two covariances, and indeed to later
generations (see M and J).
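For the BIPs the enumeration runs over random pairs of F2 plants rather than over single selfed plants. In this hypothetical Python sketch (function names are illustrative) ordered pairs are used, each with frequency equal to the product of the F2 genotype frequencies, which reproduces the frequencies of Table 11. The covariance is taken with the phenotype of one parent of each pair, which by symmetry equals the covariance with the mid-parent value.

```python
from fractions import Fraction as Fr
from itertools import product

F2 = {2: Fr(1, 4), 1: Fr(1, 2), 0: Fr(1, 4)}  # genotypes coded by number of A alleles


def cross(g1, g2):
    """Offspring genotype frequencies from the mating g1 x g2 at one locus."""
    p, q = Fr(g1, 2), Fr(g2, 2)  # probability that each parent transmits A
    return {2: p * q, 1: p * (1 - q) + q * (1 - p), 0: (1 - p) * (1 - q)}


def s3_statistics(d, h):
    """Single-gene mean and heritable parts of V1S3, V2S3 and W1S23."""
    def value(g):
        return {2: d, 1: h, 0: -d}[g]

    def moments(dist):
        m = sum(f * value(g) for g, f in dist.items())
        return m, sum(f * value(g) ** 2 for g, f in dist.items()) - m ** 2

    pairs = {(a, b): F2[a] * F2[b] for a, b in product(F2, F2)}  # ordered random pairs
    fam = {pair: moments(cross(*pair)) for pair in pairs}
    mean = sum(f * fam[pair][0] for pair, f in pairs.items())
    parent_mean = sum(f * value(a) for (a, _), f in pairs.items())
    v1 = sum(f * fam[pair][0] ** 2 for pair, f in pairs.items()) - mean ** 2
    v2 = sum(f * fam[pair][1] for pair, f in pairs.items())
    # covariance of the family mean with the phenotype of one F2 parent
    w1 = sum(f * value(pair[0]) * fam[pair][0] for pair, f in pairs.items()) - parent_mean * mean
    return mean, v1, v2, w1


print(s3_statistics(Fr(3), Fr(2)))
```

The absence of an H term in $W_{1S23}$ shows up directly: changing $h$ alters the mean and both variances but leaves the returned covariance at $\frac{1}{4}d^2$.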
These results are collected together in Table 12. A fuller compilation
is given by M and J (Table 44) which includes also the constitution of
variances and covariances from later generations derived from the back-
crosses.

TABLE 12.
Components of variation in F2 and its derivatives

Statistic   D      H       Ew   Eb   Sampling variation
V1F2        1/2    1/4     1    0    0
V1F3        1/2    1/16    0    1    (1/n) V2F3
V2F3        1/4    1/8     1    0    0
W1F23       1/2    1/8     0    0    0
V1F4        1/2    1/64    0    0    (1/n') V2F4
V2F4        1/4    1/32    0    1    (1/n) V3F4
V3F4        1/8    1/16    1    0    0
W1F34       1/2    1/32    0    0    0
W2F34       1/4    1/16    0    0    0
V1S3        1/4    1/16    0    1    (1/n) V2S3
V2S3        1/4    3/16    1    0    0
W1S23       1/4    0       0    0    0
V1S4        1/4    3/128   0    0    (1/n') V2S4
V2S4        1/8    5/128   0    1    (1/n) V3S4
V3S4        1/4    11/64   1    0    0
W1S34       1/4    1/32    0    0    0
W2S34       1/8    1/32    0    0    0
13. The balance sheet of genetic variability
Like energy, genetic variability is conserved inside a closed system. Crossing, segregation and recombination may redistribute it among the various states in which it can exist, but in the absence of mutation, random
change and selection its total quantity remains unchanged (see Mather,
1973 for a fuller discussion of the theory of variability). One aspect of
this conservation of variability is revealed by the heritable variances we
have been discussing.
The heritable portion of the phenotypic differences between homozygotes is D-type variation. Heterozygotes contribute to the phenotypic
differences in two ways. They may contribute directly to the phenotypic differences among the individuals of a family or generation; but
their contribution may also appear in part as the departure of the generation mean from the mid-parent, which as we have seen depends on
$[h]$. Now D and H are both quadratic quantities, in terms of d and h,
but $[h]$ on the other hand is linear. The coefficient of $[h]$ in the departure of the mean from the mid-parent must thus be squared if it is to be
comparable to the coefficients of D and H. The heritable variation expressed by the phenotypes of a generation may thus be written as
$xD + yH + z[h]^2$ and, in the absence of complicating circumstances, x,
y and z must sum to unity.
In the F1, $x = y = 0$ and $z = 1$ since the mean is $[h]$; but in the F2 to
which it gives rise $x = \frac{1}{2}$, $y = \frac{1}{4}$ and, with the mean at $\frac{1}{2}[h]$, $z = (\frac{1}{2})^2 = \frac{1}{4}$, so
once again giving $x + y + z = \frac{1}{2} + \frac{1}{4} + \frac{1}{4} = 1$. The F3 has an overall mean
of $\frac{1}{4}[h]$, so giving $z = (\frac{1}{4})^2 = \frac{1}{16}$. There are two variances whose heritable
components are to be taken into account in the F3. These are $V_{1F3} = \frac{1}{2}D + \frac{1}{16}H$ and $V_{2F3} = \frac{1}{4}D + \frac{1}{8}H$, sampling variation being left out of
account as any differences it produces are random changes. Thus taken
together these two variances contribute $\frac{3}{4}D + \frac{3}{16}H$, so that $x = \frac{3}{4}$ and $y = \frac{3}{16}$,
while as we have seen $z = \frac{1}{16}$, so completing the tally and giving $x + y + z = 1$. The same applies to F4 (see Table 13) and indeed to F5 or any later
generation. In the biparental progenies of the third generation the heritable components of the two variances are $V_{1S3} = \frac{1}{4}D + \frac{1}{16}H$ and $V_{2S3} = \frac{1}{4}D + \frac{3}{16}H$,
while the mean is $\frac{1}{2}[h]$. So $x = \frac{1}{4} + \frac{1}{4} = \frac{1}{2}$, $y = \frac{1}{16} + \frac{3}{16} = \frac{1}{4}$ and $z = (\frac{1}{2})^2 = \frac{1}{4}$, giving once again $x + y + z = 1$, and the same can be shown to
apply to S4, the fourth generation, and indeed to S5 etc. raised by continued sib-mating (see M and J).
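The balance sheet can be checked with exact arithmetic. The sketch below simply tabulates the D and H coefficients of the heritable variances as given in Table 12 (sampling terms left out), together with the coefficient of $[h]$ in each generation mean, and confirms that $x + y + z = 1$; the data structure and the name `balance` are illustrative assumptions.

```python
from fractions import Fraction as Fr

# For each generation: (D, H) coefficients of its heritable variances (Table 12),
# and the coefficient of [h] in the departure of the generation mean from m.
GENERATIONS = {
    "F2": ([(Fr(1, 2), Fr(1, 4))], Fr(1, 2)),
    "F3": ([(Fr(1, 2), Fr(1, 16)), (Fr(1, 4), Fr(1, 8))], Fr(1, 4)),
    "F4": ([(Fr(1, 2), Fr(1, 64)), (Fr(1, 4), Fr(1, 32)), (Fr(1, 8), Fr(1, 16))], Fr(1, 8)),
    "S3": ([(Fr(1, 4), Fr(1, 16)), (Fr(1, 4), Fr(3, 16))], Fr(1, 2)),
    "S4": ([(Fr(1, 4), Fr(3, 128)), (Fr(1, 8), Fr(5, 128)), (Fr(1, 4), Fr(11, 64))], Fr(3, 8)),
}


def balance(generation):
    """Return x, y and z in xD + yH + z[h]^2 for one generation."""
    variances, mean_h = GENERATIONS[generation]
    x = sum(dc for dc, _ in variances)
    y = sum(hc for _, hc in variances)
    return x, y, mean_h ** 2


for gen in GENERATIONS:
    x, y, z = balance(gen)
    print(gen, x, y, z, x + y + z)
```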
It will be observed that the coefficient of D in the successive F generations follows, as indeed it must, the series $1 - (\frac{1}{2})^{n-1}$, which gives the proportion of individuals homozygous in the nth generation for the alleles
TABLE 13.
The balance sheet of variability

                              Coefficient of
Generation     Statistic      D       H        [d]²     [h]²
Parents                       0       0        1² = 1   0
F1                            0       0        0        1² = 1
F2                            1/2     1/4      0        (1/2)² = 1/4

F3             V1F3           1/2     1/16
               V2F3           1/4     1/8
               Total          3/4     3/16     0        (1/4)² = 1/16

F4             V1F4           1/2     1/64
               V2F4           1/4     1/32
               V3F4           1/8     1/16
               Total          7/8     7/64     0        (1/8)² = 1/64

S3             V1S3           1/4     1/16
               V2S3           1/4     3/16
               Total          1/2     1/4      0        (1/2)² = 1/4

S4             V1S4           1/4     3/128
               V2S4           1/8     5/128
               V3S4           1/4     11/64
               Total          5/8     15/64    0        (3/8)² = 9/64

Back-crosses   VB             1/4     1/4      0
               Var. of means  0       0        1/4
               Total          1/4     1/4      1/4      (1/2)² = 1/4

at a locus at which the parents differed. Similarly the sum of the coefficient of H and the coefficient of $[h]^2$ follows the series $(\frac{1}{2})^{n-1}$,
since the proportion of heterozygotes at such a locus is halved in each
generation under selfing. In the same way the coefficients of D, H and
$[h]^2$ in S3, S4 etc. are related to the Fibonacci series which gives the fall
in the proportion of heterozygotes under continued sib-mating.
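Both series can be generated directly. This sketch assumes the standard recurrence for full-sib mating, $P_n = \frac{1}{2}P_{n-1} + \frac{1}{4}P_{n-2}$ for the proportion of heterozygotes, started from homozygous parental lines ($P_0 = 0$) and an all-heterozygous F1 ($P_1 = 1$); under selfing the proportion is simply $(\frac{1}{2})^{n-1}$. The function names are hypothetical.

```python
from fractions import Fraction as Fr


def het_selfing(n):
    """Proportion heterozygous in generation Fn under selfing (F1 = 1)."""
    return Fr(1, 2) ** (n - 1)


def het_sibbing(n):
    """Proportion heterozygous in generation n under full-sib mating, assuming
    P_n = P_{n-1}/2 + P_{n-2}/4 with parental lines P_0 = 0 and F1 P_1 = 1."""
    prev, cur = Fr(0), Fr(1)
    for _ in range(n - 1):
        prev, cur = cur, cur / 2 + prev / 4
    return cur


def fibonacci(n):
    """Fibonacci numbers with fibonacci(1) = fibonacci(2) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


for n in range(1, 8):
    print(n, 1 - het_selfing(n), het_sibbing(n), fibonacci(n))
```

Writing $G_n = 2^{n-1}P_n$ turns the sib-mating recurrence into $G_n = G_{n-1} + G_{n-2}$, the Fibonacci recurrence, so $P_n$ is the nth Fibonacci number divided by $2^{n-1}$. For n = 3 and 4 the homozygous proportions $1 - P_n$ are $\frac{1}{2}$ and $\frac{5}{8}$, the D totals for S3 and S4.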
The same principle of conservation of variability applies to the joint
back-crosses although with the introduction of a fourth component. The
heritable portion of the mean variance of the two back-crosses is $V_B = \frac{1}{2}(V_{B1} + V_{B2}) = \frac{1}{4}D + \frac{1}{4}H$. The means of the back-crosses are $\bar{B}_1 = \frac{1}{2}([d] + [h])$ and $\bar{B}_2 = \frac{1}{2}([h] - [d])$, the overall mean of the two taken together
being $\frac{1}{2}[h]$. The heritable variance of the back-cross means is thus
$$\tfrac{1}{2}\left\{\left[\tfrac{1}{2}([d] + [h])\right]^2 + \left[\tfrac{1}{2}([h] - [d])\right]^2\right\} - \left(\tfrac{1}{2}[h]\right)^2 = \tfrac{1}{4}[d]^2.$$
The departure of the overall mean from the mid-parent accounts for
$(\frac{1}{2}[h])^2 = \frac{1}{4}[h]^2$ of the variability, and the coefficients of D, H, $[h]^2$ and
the new component $[d]^2$ thus sum to unity (Table 13). Once this fourth
component of variability is recognized we can complete the picture by
noting that in the parental generation $\bar{P}_1 = [d]$ and $\bar{P}_2 = -[d]$, giving a
total of $[d]^2$ for the variability represented by the difference between
the means of these two true-breeding lines from whose cross all the later
generations are descended.
In conclusion we should note that D, H, $[d]^2$ and $[h]^2$ are different
components of variability with different properties. Their coefficients
sum to unity because all the variability must be accounted for, but each
of them has its own special relation to the expression of variability
among the phenotypes. Thus H and $[h]^2$ depend on dominance while D
and $[d]^2$ do not. The dominance properties of the genes express themselves differently in $[h]^2$ and in H: dominance in opposing directions tends to balance out in $[h]^2$ but not in H. Furthermore $[h]^2 \neq H$
apart from the trivial case where only one gene difference is involved,
for even where all the gene pairs show dominance in the same direction
$[h]^2$ will exceed H by a factor which depends on how many gene pairs
are involved and on how much the individual h's vary from one to
another. In the same way $[d]^2$ will reflect the distribution of the genes
between the parents whereas D will not: thus D will be the same in the
cross AABB × aabb as in AAbb × aaBB, whereas $[d]^2$ will not. And
where all the increasing alleles are associated in one parent, AA BB CC
..., and all the decreasing alleles in the other, aa bb cc ..., $[d]^2$
will exceed D by a factor depending on the number of gene pairs involved and on the extent to which the individual d's vary from one to
another. We shall have occasion again to touch on these relationships in
a later section.
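A small numerical illustration of these contrasts, using arbitrarily chosen d and h values (an assumption made purely for the example): the hypothetical `components` function returns D, H, $[d]^2$ and $[h]^2$ for a set of gene pairs, recording for each pair which parent carries the increasing allele.

```python
def components(loci):
    """D, H, [d]^2 and [h]^2 for a cross between two true-breeding lines.

    loci: list of (d, h, sign) with sign = +1 if the first parent carries the
    increasing allele at that locus and -1 if the second parent does.
    """
    D = sum(d * d for d, _, _ in loci)
    H = sum(h * h for _, h, _ in loci)
    d_sum = sum(s * d for d, _, s in loci)  # [d]
    h_sum = sum(h for _, h, _ in loci)      # [h]
    return D, H, d_sum ** 2, h_sum ** 2


# AABB x aabb (genes associated) versus AAbb x aaBB (genes dispersed), d_a = 2, d_b = 1
print(components([(2, 1, +1), (1, 1, +1)]))   # (5, 2, 9, 4)
print(components([(2, 1, +1), (1, 1, -1)]))   # (5, 2, 1, 4)
# dominance in opposing directions cancels in [h]^2 but not in H
print(components([(2, 1, +1), (1, -1, +1)]))  # (5, 2, 9, 0)
```

With k gene pairs all having the same h and dominance in the same direction, $[h]^2 = k^2h^2$ while $H = kh^2$, so $[h]^2$ exceeds H by the factor k.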

14. Partitioning the variation


The D, H and E components of variation differ in the relative contributions they make to the variances and covariances in the various generations and types of family we can raise from a cross between two true-breeding parental lines. We can therefore obtain estimates of these components by suitable comparisons among the various second degree statistics. This is seen at its simplest if we turn again to the example described
on page 37 of plant height in the P1, P2, F1, F2, B1 and B2 families raised
from the 22 × 73 cross of Nicotiana rustica. The earlier analysis showed
that a simple additive-dominance model satisfactorily accounted for the
means of these generations. Now we shall consider the variances of these
same families and obtain estimates of D, H, F and E. These variances are
set out in Table 14. Although we have six variances three of them ($V_{P1}$,
TABLE 14.
Variances within families for plant height in the cross between varieties 22 and 73 of
Nicotiana rustica (corresponding with the means in Table 6)

Family    Variance    Expectation
P1        20.6684     Ew
P2        29.0500     Ew
F1        57.4260     Ew
F2        77.6533     1/2 D + 1/4 H + Ew
B1        59.5288     1/4 D + 1/4 H - 1/2 F + Ew
B2        66.1747     1/4 D + 1/4 H + 1/2 F + Ew

Components
D         59.2062
H         27.6304
F          6.6459
Ew        41.1426     (found as 1/4 VP1 + 1/4 VP2 + 1/2 VF1)

√(H/D)     0.6831     (dominance ratio)
F/√(DH)    0.1643

$V_{P2}$ and $V_{F1}$) are all estimates of $E_w$. Two of these, from the two parental
families, do not differ from one another, but they do differ from the F1
estimate, which is significantly larger. We must therefore combine them
in the way described on p. 51, to give
$$E_w = \tfrac{1}{4}(V_{P1} + V_{P2} + 2V_{F1}) = 41.1426.$$
The combined estimate of $E_w$ together with the remaining three variances
leaves us with four equations for estimating the four components D, H, F
and $E_w$. So only a perfect fit solution is possible, the equations being
$$D = 4V_{F2} - 2(V_{B1} + V_{B2}) = 59.2062$$
$$H = 4(V_{B1} + V_{B2} - V_{F2} - E_w) = 27.6304$$
$$F = V_{B2} - V_{B1} = 6.6459.$$
These estimates are tabulated in Table 14. Finally we can estimate the
dominance ratio as $\sqrt{H/D} = 0.6831$, which agrees with the relatively high
level of dominance suggested by the analysis of the means. The relatively
low value for $F/\sqrt{DH}$ provides little evidence that the dominance deviations at different loci are particularly consistent in sign or magnitude.
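The perfect-fit solution is simple enough to reproduce directly from the variances in Table 14. The sketch below is a minimal transcription of the equations given in the text; only the function name `perfect_fit` is our own.

```python
from math import sqrt


def perfect_fit(vp1, vp2, vf1, vf2, vb1, vb2):
    """Estimate D, H, F and Ew from the six family variances (perfect-fit solution)."""
    ew = (vp1 + vp2 + 2 * vf1) / 4  # parents and F1 combined as in the text
    d = 4 * vf2 - 2 * (vb1 + vb2)
    h = 4 * (vb1 + vb2 - vf2 - ew)
    f = vb2 - vb1
    return d, h, f, ew


D, H, F, Ew = perfect_fit(20.6684, 29.0500, 57.4260, 77.6533, 59.5288, 66.1747)
print(f"D={D:.4f} H={H:.4f} F={F:.4f} Ew={Ew:.4f}")  # D=59.2062 H=27.6304 F=6.6459 Ew=41.1426
print(f"sqrt(H/D)={sqrt(H / D):.4f} F/sqrt(DH)={F / sqrt(D * H):.4f}")
```

Because there are exactly as many equations as unknowns, the same four numbers come out whatever weights are used, which is why no test of goodness of fit is possible here.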
Having only four equations for the estimation of four parameters we
must obtain a perfect fit solution to them, and we can neither calculate
the standard deviation of the estimates of D, H, E and F, nor indeed can
we test the goodness of fit of the additive-dominance model as a whole.
To do so requires a more comprehensive experiment such as that described
and analysed by Hayman (1960), which is also discussed by M and J.
Hayman's experiment was again initiated by a cross between two true-
breeding lines of Nicotiana rustica, although it was not the same cross as
the one we have just been considering. The two parents were crossed
reciprocally to give the two reciprocal F1s, from each of which an F2, F3
and F4 were raised. The F3 consisted of 10 families from each reciprocal,
i.e. 20 F3s in all, and the F4 of 50 families from each reciprocal, the 100
families thus involved being obtained by selfing 5 plants from each of
20 F3 families. Back-crosses were not included in the experiment. The
character we shall be considering is plant height measured in inches. The
plants were grown in two blocks, the plots within the blocks each com-
prising five plants. Each of the F3 and F4 families occupied one plot in
each block, but each parent, FI and F2 was present as five plots in each
of the two blocks. There is internal evidence from Hayman's account of
the experiment that some F4 plants, and it would appear seven F4 families, failed in the experiment or were excluded for other reasons. $V_{1F2}$,
$V_{2F3}$ and $V_{3F4}$ were obtained from the variances within plots, round the
plot means, and so include $E_w$ as their non-heritable component. $V_{1F3}$,
$V_{2F4}$ and $V_{1F4}$ were found as variances between the relevant plot means,
taken round the block means, and so include $E_b$ as well as the sampling
variation stemming from $V_{2F3}$, $V_{3F4}$ and $V_{2F4}$ respectively. Since each
plot included five plants, $n = 5$, and in F4 each group included five
families, so giving $n' = 5$ also. Thus allowing for sampling variation (see
pp. 53-4)
$$V_{1F3} = \tfrac{1}{2}D + \tfrac{1}{16}H + E_b + \tfrac{1}{5}V_{2F3} = \tfrac{1}{2}D + \tfrac{1}{16}H + E_b + \tfrac{1}{5}\left(\tfrac{1}{4}D + \tfrac{1}{8}H + E_w\right)$$
and similarly
$$V_{2F4} = \tfrac{1}{4}D + \tfrac{1}{32}H + E_b + \tfrac{1}{5}V_{3F4} = \tfrac{1}{4}D + \tfrac{1}{32}H + E_b + \tfrac{1}{5}\left(\tfrac{1}{8}D + \tfrac{1}{16}H + E_w\right).$$
Since $n' = 5$,
$$V_{1F4} = \tfrac{1}{2}D + \tfrac{1}{64}H + \tfrac{1}{5}V_{2F4} = \tfrac{1}{2}D + \tfrac{1}{64}H + \tfrac{1}{5}\left(\tfrac{1}{4}D + \tfrac{1}{32}H + E_b\right) + \tfrac{1}{25}\left(\tfrac{1}{8}D + \tfrac{1}{16}H + E_w\right).$$
The coefficients of D, H, $E_w$ and $E_b$ so obtained are set out in columns
5-8 of the upper part of Table 15. P1, P2 and the reciprocal F1s were
each raised as five plots in each block. Thus not only could an estimate
of $E_1 = E_w$ be obtained from the pooled variances of parents and F1s
within plots, but an estimate of $E_2$, the non-heritable variance between
plots, can also be found from the pooled variances between plot means,
taken round the block means. In addition to $E_b$ this will include an item
of $\frac{1}{5}E_w$ because of sampling variation resulting from the variances within
plots.
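The sampling-adjusted coefficients in columns 5-8 of Table 15 follow mechanically from the expectations in Table 12 with $n = n' = 5$. A minimal sketch using exact fractions; the helper names are hypothetical, and $E_2$ is given the expectation $E_b + \frac{1}{n}E_w$ described above.

```python
from fractions import Fraction as Fr


def add(*terms):
    """Sum weighted coefficient vectors (weight, (D, H, Ew, Eb)) into one vector."""
    return tuple(sum(w * vec[i] for w, vec in terms) for i in range(4))


def hayman_coefficients(n=5, n_prime=5):
    """Coefficients of (D, H, Ew, Eb) for each statistic, sampling terms included."""
    one = Fr(1)
    v2f3 = (Fr(1, 4), Fr(1, 8), one, 0)
    v3f4 = (Fr(1, 8), Fr(1, 16), one, 0)
    v2f4 = add((one, (Fr(1, 4), Fr(1, 32), 0, one)), (Fr(1, n), v3f4))
    return {
        "V1F2": (Fr(1, 2), Fr(1, 4), one, 0),
        "V1F3": add((one, (Fr(1, 2), Fr(1, 16), 0, one)), (Fr(1, n), v2f3)),
        "V2F3": v2f3,
        "V1F4": add((one, (Fr(1, 2), Fr(1, 64), 0, 0)), (Fr(1, n_prime), v2f4)),
        "V2F4": v2f4,
        "V3F4": v3f4,
        "E1": (0, 0, one, 0),
        "E2": (0, 0, Fr(1, n), one),
    }


for name, coefs in hayman_coefficients().items():
    print(name, [float(c) for c in coefs])
```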
The direct estimates of $E_1$ and $E_2$, together with the variance of F2,
the two variances from F3 and the three from F4, are shown in Table 15,
which also gives the number of degrees of freedom (df) on which each
variance is based. (The details of the derivation of their numbers of degrees of freedom are given by M and J.) There are thus eight observed
statistics from which we must estimate four parameters, D, H, $E_w$ and
$E_b$. This will leave four degrees of freedom for testing the goodness of
fit of the model.
The procedure is essentially the same method of weighted least
squares already described for the analysis of means (page 38). One
difference must, however, be noted. The variances of means, whose reciprocals are used as weights in the analysis, are commonly observed
empirically in the experiments. Replication is, however, seldom sufficient to permit the use of the same procedure where variances themselves are to be analysed, and in consequence the theoretical variance
of the variance must be used to supply the reciprocals for use as weights.
The variance of a variance V is $2V^2/N$, where N is the number of degrees
TABLE 15.
Analysis of Hayman's (1960) experiment on plant height in Nicotiana rustica

                                                          Coefficients of
Statistic                 Observed   df    First weight   D       H          Ew     Eb
V1F2                      69.29      80    0.008331       0.500   0.250000   1.00   0
V1F3                      43.12      36    0.009681       0.550   0.087500   0.20   1.0
V2F3                      36.66      160   0.059526       0.250   0.125000   1.00   0
V1F4                      67.84      36    0.003911       0.555   0.024375   0.04   0.2
V2F4                      41.29      153   0.044872       0.275   0.043750   0.20   1.0
V3F4                      26.47      770   0.549481       0.125   0.062500   1.00   0
E1 (from VP1, VP2, VF1)   12.95      160   0.477035       0       0          1.00   0
E2 (from VP1, VP2, VF1)   14.06      32    0.080937       0       0          0.20   1.0

            Expectation after iteration             Estimate after iteration
Statistic   1        2        5           Parameter   1        2        5
V1F2        64.49    64.87    65.07       D           79.98    99.01    97.51
V1F3        61.80    68.37    68.16       H           44.93     8.67    12.63
V2F3        38.88    39.04    39.12       Ew          13.27    13.20    13.16
V1F4        48.26    57.79    57.11       Eb          11.23    10.52    10.79
V2F4        37.84    40.76    40.79       sD          20.04    17.40    19.31
V3F4        26.07    26.12    26.14       sH          50.64    45.08    48.37
E1          13.27    13.20    13.16       sEw          1.43     1.35     1.35
E2          13.88    13.16    13.42       sEb          3.65     3.07     3.06
                                          χ²[4]        5.87     3.67     3.68
                                          √(H/D)       0.75     0.30     0.36

of freedom from which V is estimated. These variances of variances
should, however, be found using not the values observed for $V_{1F2}$, etc.,
but the values expected for them based on the estimates of D, H, $E_w$ and
$E_b$ obtained by the weighted analysis. In other words finding the best
estimates of the components of variation depends on using weights
which themselves depend on the estimates of the components obtained
using correct weights. We therefore proceed by the process of iteration,
calculating the weights first from the observed values of $V_{1F2}$, etc.
These weights are used to obtain estimates of D, etc., which are in turn
used to find expected values for $V_{1F2}$ etc. New weights are computed
from the expected values of the statistics and the process repeated until
further repetition fails to improve the estimates and the test of goodness
of fit. In the case of Hayman's experiment, two rounds of iteration are
sufficient to achieve this result.
The values observed for the statistics are set out in the second column
of Table 15, from which the first weights used in the first round of calculations can be found as shown in column 4. Thus for $V_{1F2}$, its variance
is
$$\frac{2 \times 69.29^2}{80} = 120.028$$
and the first weight is $\frac{1}{120.028} = 0.008331$.
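The first weights of Table 15 come straight from the formula $2V^2/N$ for the sampling variance of a variance. A short sketch reproducing them; the dictionary layout and function name are illustrative only.

```python
# Observed statistics and their degrees of freedom (Table 15).
OBSERVED = {
    "V1F2": (69.29, 80), "V1F3": (43.12, 36), "V2F3": (36.66, 160),
    "V1F4": (67.84, 36), "V2F4": (41.29, 153), "V3F4": (26.47, 770),
    "E1": (12.95, 160), "E2": (14.06, 32),
}


def variance_of_variance(v, df):
    """Sampling variance of a variance V estimated on df degrees of freedom: 2V^2/N."""
    return 2 * v ** 2 / df


weights = {name: 1 / variance_of_variance(v, df) for name, (v, df) in OBSERVED.items()}
for name, w in weights.items():
    print(f"{name:5s} {w:.6f}")
```

In later rounds the same function is applied to the expected rather than the observed values, which is the only change the iteration makes to the weights.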
We then proceed to find the J and S matrices using these weights and
the coefficients of D, H, $E_w$ and $E_b$ exactly as in the earlier example,
except that, since there are now four parameters, there will be four
equations of estimation (not three as in the earlier example), with the
consequence that J will be a 4 × 4 matrix and S a 4 × 1 matrix. Solution
of the four equations of estimation, by finding $J^{-1}S$, gives the estimates
of D, H, $E_w$ and $E_b$ shown in the second column of the lower right-hand
portion of the table, their standard errors being obtained by taking the
square roots of the four values in the leading diagonal of $J^{-1}$. The values
expected for $V_{1F2}$ etc. are computed using these estimates of D, etc., and the
$\chi^2_{[4]}$ testing goodness of fit with the model is found in exactly the same
way as in the earlier example. This $\chi^2$ has four degrees of freedom since
four parameters have been estimated from the eight observed statistics.
The $\chi^2_{[4]}$ is not significant even in the first test and there is thus no indication that the model is inadequate.
Weights for the second iteration are found from the values of $V_{1F2}$
etc. expected after the first iteration. In the case of $V_{1F2}$, its expected
value is 64.49, giving as its variance $(2 \times 64.49^2)/80$ and for the second
weight $80/(2 \times 64.49^2) = 0.009618$. Only the weights for $V_{1F3}$ and $V_{1F4}$
change substantially, in the case of $V_{1F3}$ from 0.009681 (first weight) to
0.004713 (second weight) and for $V_{1F4}$ from 0.003911 to 0.007729.
Nevertheless, when a new round of estimation is carried out exactly like
the first calculation except that the new weights are used instead of the
earlier ones, the estimates of D, and especially H, are substantially
changed, although $E_w$ and $E_b$ are not materially affected. New expectations can then be found for $V_{1F2}$ etc., as shown in the lower left portion
of the table, and $\chi^2_{[4]}$ calculated to test the goodness of fit. This now turns
out to be $\chi^2_{[4]} = 3.67$, with a probability of between 0.30 and 0.50. Again there is clearly no
indication of inadequacy of the model: indeed the fit is now better than
after the first iteration. The new expectations for $V_{1F2}$ etc. can be used to find
a third set of weights leading to a third round of calculations, and the
process continued as long as one wishes. Hayman actually carried out
five iterations, and the results of the fifth are shown in the table. It is
clear that nothing was gained by continuing beyond the second round
of calculations.
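The whole iterative procedure can be sketched in a few dozen lines of plain Python. This is an illustrative reconstruction, not Hayman's own computation: it solves the equations of estimation $J\hat{\theta} = S$ by Gauss-Jordan elimination, derives the weights $N/(2V^2)$ from the observed statistics in the first round and from the expected values thereafter, and computes $\chi^2$ with weights taken from the current expected values (an assumption on our part, though one consistent with the tabulated first-round value). Because the inputs are rounded, the printed estimates should be close to, but need not exactly match, those in Table 15.

```python
# Observed statistic, df and coefficients of (D, H, Ew, Eb), transcribed from Table 15.
DATA = [
    ("V1F2", 69.29, 80, (0.500, 0.250000, 1.00, 0.0)),
    ("V1F3", 43.12, 36, (0.550, 0.087500, 0.20, 1.0)),
    ("V2F3", 36.66, 160, (0.250, 0.125000, 1.00, 0.0)),
    ("V1F4", 67.84, 36, (0.555, 0.024375, 0.04, 0.2)),
    ("V2F4", 41.29, 153, (0.275, 0.043750, 0.20, 1.0)),
    ("V3F4", 26.47, 770, (0.125, 0.062500, 1.00, 0.0)),
    ("E1", 12.95, 160, (0.0, 0.0, 1.00, 0.0)),
    ("E2", 14.06, 32, (0.0, 0.0, 0.20, 1.0)),
]
OBS = [row[1] for row in DATA]
DF = [row[2] for row in DATA]
X = [row[3] for row in DATA]


def solve(a, b):
    """Solve a small linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                k = m[r][c] / m[c][c]
                m[r] = [x - k * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def one_round(reference):
    """One weighted least-squares fit; weights N / (2 V^2) with V taken from `reference`."""
    w = [df / (2 * v ** 2) for v, df in zip(reference, DF)]
    J = [[sum(wi * xi[r] * xi[c] for wi, xi in zip(w, X)) for c in range(4)] for r in range(4)]
    S = [sum(wi * xi[r] * yi for wi, xi, yi in zip(w, X, OBS)) for r in range(4)]
    est = solve(J, S)
    expected = [sum(ci * ei for ci, ei in zip(xi, est)) for xi in X]
    return est, expected, J, S


def standard_errors(J):
    """Square roots of the leading diagonal of J^-1."""
    n = len(J)
    return [solve(J, [1.0 if i == j else 0.0 for i in range(n)])[j] ** 0.5 for j in range(n)]


def chi_square(expected):
    """Goodness of fit; each squared deviation weighted by N / (2 E^2) (assumed)."""
    return sum(df * (o - e) ** 2 / (2 * e ** 2) for o, e, df in zip(OBS, expected, DF))


reference = OBS  # first round: weights from the observed statistics
for rnd in (1, 2):
    est, expected, J, S = one_round(reference)
    print(rnd, [round(v, 2) for v in est], [round(s, 2) for s in standard_errors(J)],
          round(chi_square(expected), 2))
    reference = expected  # next round: weights from the expected values
```

Using a hand-written solver keeps the sketch free of third-party dependencies; with NumPy available, `numpy.linalg.solve` and `numpy.linalg.inv` would serve the same purpose.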
The standard errors of D, H, $E_w$ and $E_b$ are shown in the lower right
portion of the table. $s_H$ is large, so large indeed that there is no good
evidence that H departs from 0, i.e. no good evidence of dominance.
Nor should we be surprised at this when we see how low the coefficients
of H are in the composition of the various statistics found from the ex-
perimental data: dominance clearly contributes relatively little to vari-
ation in the types of family raised in this experiment and we should
therefore expect the estimate of H to be imprecise. It is for this reason
too that the estimate of H changes so much more than those of D, Ew
and Eb as we proceed from the first to the second iteration, and it will
indeed be observed that despite the apparently large size of the change
in the estimate of H it is not in fact large when compared with $s_H$. If a
prime aim of the experiment had been to investigate the dominance
properties of the genes, it would clearly have been desirable to include
in it some types of families to whose variation dominance made greater
contributions: indeed the inclusion of back-crosses would of itself have
materially improved the estimate of dominance effects since H contrib-
utes as much as D to variation in such families.
A type of experiment especially well suited to the detection and
measurement of dominance by the partitioning of variation is the so-
called North Carolina Design III (M and J). It has the further advantage
of leading to a simple analysis of variance, as does Kearsey and Jinks'
(1968) triple test cross, which is an extension of N.C.D.III capable also
of testing for interaction between non-allelic genes. Valuable as these
are in particular respects, N.C.D.III and similar types of experiment are
however of restricted use, as they suffer from two major limitations. In
the first place only certain types of family can be utilized in them, and
the number of variances obtainable from them is so restricted that little
can be done towards testing the validity of the assumption that the genes
contribute independently to the variation under investigation. Secondly,
the analysis of variance, to which such designs lead, offers no means of
combining several different generations into a single analysis, and so of
multiplying the number of statistics available for use in estimating D,
H and E in the way necessary not only for testing their adequacy as a
representation of the variation but also for estimating the further components of variation that, as we shall see in later chapters, may be necessitated when the variation has a more complex structure than is provided
by the simple additive-dominance model. The great merits of analysis by
weighted least squares, illustrated by Hayman's experiment, are that it
leads directly to a test of the adequacy of the model, that it is com-
pletely flexible in regard to the generations and types of family whose
statistics can be brought into the analysis and that it is completely gen-
eral in that it can be extended to cover structures and models of vari-
ation of any degree of complexity.
One final point remains to be noted about Hayman's experiment. He
made no use of the covariances $W_{1F23}$, $W_{1F34}$ and $W_{2F34}$ that the F2, F3
and F4 can yield in addition to their variances. Furthermore, his F3 families were obtained by selfing F2 plants other than those which he measured
for the purpose of finding $V_{1F2}$, and his F4s were obtained by selfing F3
plants other than those from whose measurements the F3 variances were
obtained. In this way he could ensure that, being based on unrelated
plants, the variances from different generations were uncorrelated.
Suppose, however, the same F2 plants had been used for taking the
measurements from which $V_{1F2}$ was found and for raising the F3s. A
sampling correlation between $V_{1F2}$ and $V_{1F3}$ would have resulted. Also
if $W_{1F23}$ had been calculated from the same F2 measurements and F3
means, it too would have shown a sampling correlation with both the
variances. In such a case the weights used in calculating the estimates of
the components of variation can no longer be the simple reciprocals of
the variances of $V_{1F2}$ etc., but must take into account the sampling covariances of the statistics. A procedure is available for dealing with these
more complicated applications of the method of weighted least squares
(see M and J). No new basic principles are involved since the simpler
analysis we have described is just a special case of the more general
approach, but the necessary calculations become much heavier. Whereas
the analysis of an experiment, like Hayman's, designed to avoid the complication of sampling correlation between the statistics, can be carried
out without any great trouble on an electronic desk-calculator, the analysis of results where the statistics are subject to sampling correlations is
virtually impracticable without access to an electronic computer.
Diallels
15. The principles of diallel analysis
Consider two true-breeding lines which differ in the alleles they bear at
a locus, A-a, one thus being AA and the other aa. If they are mated in all
possible combinations the four progenies so produced will of course con-
sist of two which are like the two parents respectively and two which are
the reciprocal F1s. These four families can be arranged according to their
parentage as in Table 16, which also shows the respective phenotypes
TABLE 16.
The four families obtained by mating two true-breeding lines
differing in one gene, A-a

                         Female parent
                         AA            aa            Mean
Male parent              d             -d            0
AA    d                  AA: d         aA: h         (d + h)/2
aa    -d                 Aa: h         aa: -d        (h - d)/2
Mean                     (d + h)/2     (h - d)/2     h/2
Vr                       (d - h)²/4    (d + h)²/4    (d² + h²)/4
Wr                       d(d - h)/2    d(d + h)/2    d²/2

expressed as deviations from the mid-parent value, m. The table is symmetrical round its leading diagonal, each male array (row), which has a common male parent, being like the female array (column) whose common female parent has the same genotype. The table also gives the
mean and variance ($V_r$) in respect of this gene for each array. It will be
seen that the array variances, like the variances of back-crosses, will differ
only if dominance is present. A further statistic can be calculated for each
array. This is $W_r$, the covariance of the family means within the array with
the phenotypes of their respective non-recurrent parents. Thus for the
array whose common parent is AA, $W_r = \frac{1}{2}d_a \cdot d_a + \frac{1}{2}(-d_a)h_a = \frac{1}{2}d_a(d_a - h_a)$.
Again $W_r$ is the same for both arrays in the absence of dominance. The
mean variance of the arrays is $\bar{V}_r = \frac{1}{2}\left[\frac{1}{4}(d_a - h_a)^2 + \frac{1}{4}(d_a + h_a)^2\right] = \frac{1}{4}(d_a^2 + h_a^2)$ and the mean covariance is similarly $\bar{W}_r = \frac{1}{2}d_a^2$. The variance of the
array means can also be found as
$$V_{\bar{r}} = \tfrac{1}{2}\left[\tfrac{1}{2}(d_a + h_a)\right]^2 + \tfrac{1}{2}\left[\tfrac{1}{2}(h_a - d_a)\right]^2 - \left(\tfrac{1}{2}h_a\right)^2 = \tfrac{1}{4}d_a^2$$
and $V_{\bar{r}} + \bar{V}_r = \frac{1}{4}d_a^2 + \frac{1}{4}(d_a^2 + h_a^2) = \frac{1}{2}d_a^2 + \frac{1}{4}h_a^2$, which equals the contribution of such a gene difference to $V_{1F2}$ (Table 12), as indeed it obviously
should since an F2 includes AA, Aa and aa individuals in the same proportions as the families of the corresponding genotypes in Table 16.
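The array statistics of Table 16 can be checked by enumeration. In this hypothetical sketch (function names are illustrative) each array is the pair of families sharing a common parent; variances and covariances use divisor 2, as in the text, and exact fractions keep the comparison with the algebra exact.

```python
from fractions import Fraction as Fr


def diallel_arrays(d, h):
    """{common parent: (array mean, Vr, Wr)} for the 2 x 2 diallel of Table 16 (values from m)."""
    parent = {"AA": d, "aa": -d}
    family = {("AA", "AA"): d, ("AA", "aa"): h, ("aa", "AA"): h, ("aa", "aa"): -d}
    half = Fr(1, 2)
    arrays = {}
    for common in parent:
        xs = [parent[other] for other in parent]            # non-recurrent parents
        ys = [family[(common, other)] for other in parent]  # family means in the array
        mx, my = half * sum(xs), half * sum(ys)
        vr = half * sum((y - my) ** 2 for y in ys)
        wr = half * sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        arrays[common] = (my, vr, wr)
    return arrays


def summary(d, h):
    """Mean array variance, mean array covariance and variance of the array means."""
    arrays = list(diallel_arrays(d, h).values())
    half = Fr(1, 2)
    means = [m for m, _, _ in arrays]
    grand = half * sum(means)
    v_bar = half * sum(v for _, v, _ in arrays)
    w_bar = half * sum(w for _, _, w in arrays)
    v_means = half * sum((m - grand) ** 2 for m in means)
    return v_bar, w_bar, v_means


print(diallel_arrays(3, 2))
print(summary(3, 2))
```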
We can take the analysis further by considering the relation between
$W_r$ and $V_r$. Now the difference between the variances of the two arrays is
$\Delta V_r = \frac{1}{4}\left[(d_a + h_a)^2 - (d_a - h_a)^2\right] = d_a h_a$ and that between the covariances
is $\Delta W_r = \frac{1}{2}d_a\left[(d_a + h_a) - (d_a - h_a)\right] = d_a h_a$. Thus if we plot $W_r$ against $V_r$,
as in Fig. 7, the line joining the two points must have a slope of $d_a h_a / d_a h_a = 1$, and it will pass through the point $\bar{W}_r, \bar{V}_r$, which as we have seen will
be the point $\frac{1}{2}d_a^2, \frac{1}{4}(d_a^2 + h_a^2)$. So, if we project the line passing through
the two points of the figure backwards, it will cut the ordinate, where
$V_r = 0$, at the value of $W_r$ given by
$$\bar{W}_r - \bar{V}_r = \tfrac{1}{2}d_a^2 - \tfrac{1}{4}(d_a^2 + h_a^2) = \tfrac{1}{4}(d_a^2 - h_a^2).$$

[Fig. 7. The $W_r/V_r$ graph, neglecting non-heritable variation, from a diallel set of matings involving one gene difference, A-a, drawn for $0 < h < d$. The line passing through the two points, from arrays AA and aa respectively, also passes through the point $\bar{W}_r, \bar{V}_r$ and has a slope of 1. It cuts the ordinate at $W_r = \frac{1}{4}(d^2 - h^2)$.]

The relative position of the two array points on the line will reflect
the direction of dominance. If the A allele is dominant, that is $h_a$ is positive, the point for array 1 (common parent AA) will occupy the lower
position on the line. If, however, the a allele is dominant and $h_a$ is negative,
the point for array 2 (common parent aa) will occupy the lower position
on the line. This graph therefore tells us a great deal about the genetical
situation. In the absence of dominance, $V_r$ is the same for both arrays
and so is $W_r$. The two points on the graph will thus coincide except for
random sampling variation in the estimates of $V_r$ and $W_r$. If they do not
coincide, the intercept on the ordinate of the line which joins them will
provide a measure of dominance: in particular, where $|h_a| < d_a$ it will
cut the ordinate above the origin, where $|h_a| = d_a$ it will pass through the
origin and where $|h_a| > d_a$ it will pass below the origin. It should be noted,
of course, that so far we have neglected non-heritable variation, which
will contribute to the different variances (although in a suitably designed
experiment not to the covariances) and for which due allowance must be
made in any analysis of this kind. We will return to the nature of the
necessary allowances at a later stage.
If the two true-breeding lines which are used as the parents of the
families differ at more than one locus the effects of all the genes by
which they differ will be reflected simultaneously in the phenotypes of
the four families derived by mating them in all four possible combi-
nations. In other words da and ha must be replaced by [d] and [h]. The
information to be gained will thus be of the same kind as that obtainable from an analysis of means (Section 8) and, being restricted to parental and F1 families, it will not even yield enough statistics to test the
adequacy of the model. In the previous chapter we examined the limi-
tations of [d] and [h] in respect of the information they provide about
the dominance properties of the genes they depend on. We saw too how
these limitations can be overcome by proceeding to F2 and other segre-
gating generations, which in addition to providing the additional means
needed to test the adequacy of the model also yield second degree statistics enabling us to estimate and bring into the interpretation the quadratic quantities $D = S(d^2)$ and $H = S(h^2)$. We will now examine an alternative approach.
Table 16 is the simplest example of a diallel set of matings, in which a number, n, of true-breeding lines are mated together in all possible combinations to give $n^2$ families. Since it involved only two lines (n = 2) it
could clearly give us information about only one genetical difference, or,
if more than one such difference was involved, only about the differences
as a unitary aggregate. If more lines are used, clearly a correspondingly
greater number of differences, or aggregate differences, can be investi-
gated. As the next simplest case let us consider a diallel among four lines
representing all the possible combinations of two gene differences, A-a
and B-b. The genotypes of the 16 families so obtained are shown in Table 17, as are the phenotypes expected on the assumption that A-a and B-b contribute independently.

TABLE 17.
Diallel set of matings involving four true-breeding lines, being all the combinations of two genes, A-a and B-b. Each cell gives the genotype of the progeny and its expected phenotype.

Male parent      Female parent
                 AABB            AAbb            aaBB            aabb            Mean
                 (da+db)         (da-db)         (-da+db)        (-da-db)        0
AABB (da+db)     AABB da+db      AABb da+hb      AaBB ha+db      AaBb ha+hb      ½[(da+ha)+(db+hb)]
AAbb (da-db)     AABb da+hb      AAbb da-db      AaBb ha+hb      Aabb ha-db      ½[(da+ha)+(hb-db)]
aaBB (-da+db)    AaBB ha+db      AaBb ha+hb      aaBB -da+db     aaBb -da+hb     ½[(ha-da)+(db+hb)]
aabb (-da-db)    AaBb ha+hb      Aabb ha-db      aaBb -da+hb     aabb -da-db     ½[(ha-da)+(hb-db)]
Mean             ½[(da+ha)       ½[(da+ha)       ½[(ha-da)       ½[(ha-da)       ½(ha+hb)
                 +(db+hb)]       +(hb-db)]       +(db+hb)]       +(hb-db)]

Array (female parent)   Vr                                 Wr
AABB                    ¼[(da-ha)² + (db-hb)²]             ½[da(da-ha) + db(db-hb)]
AAbb                    ¼[(da-ha)² + (db+hb)²]             ½[da(da-ha) + db(db+hb)]
aaBB                    ¼[(da+ha)² + (db-hb)²]             ½[da(da+ha) + db(db-hb)]
aabb                    ¼[(da+ha)² + (db+hb)²]             ½[da(da+ha) + db(db+hb)]
Mean                    ¼(da²+ha²+db²+hb²) = ¼(D+H)        ½(da²+db²) = ½D

At the foot of the table are the four $V_r$'s, one for each array, and similarly the four $W_r$'s. It will be observed that, as in the earlier example, $\Delta W_r = \Delta V_r$ when we move from one array to another. Thus moving from array AAbb to AABB gives $\Delta W_r = \Delta V_r = d_b h_b$, and from aabb to AABB gives $\Delta W_r = \Delta V_r = d_a h_a + d_b h_b$. So, if we plot $W_r$ against $V_r$, the four points, one from each array, will lie on a straight line of slope 1. Furthermore it must pass through the point $\bar W_r, \bar V_r$, which is $\tfrac12(d_a^2 + d_b^2), \tfrac14(d_a^2 + h_a^2 + d_b^2 + h_b^2)$ and may be rewritten as $\tfrac12 D, \tfrac14(D + H)$. The line will thus cut the ordinate at $\bar W_r - \bar V_r = \tfrac12 D - \tfrac14(D + H) = \tfrac14(D - H)$. So we can learn something of the average dominance relations of the two genes and indeed, bearing in mind that the variance among the four parent means is $V_p = \tfrac14[(d_a + d_b)^2 + (d_a - d_b)^2 + (-d_a + d_b)^2 + (-d_a - d_b)^2] = d_a^2 + d_b^2 = D$, we can obtain an estimate of the average dominance as $\sqrt{(V_p - 4I)/V_p} = \sqrt{H/D}$, where $I$ is the intercept of the regression line with the ordinate.
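The same kind of check can be made for the two-gene diallel of Table 17. This sketch assumes arbitrary illustrative values of $d_a$, $h_a$, $d_b$ and $h_b$ (not data), builds the 16 expected phenotypes on the assumption that A-a and B-b contribute independently, and confirms that every pair of array points is joined by a line of slope 1, that the intercept is $\tfrac14(D - H)$, and that $\sqrt{(V_p - 4I)/V_p}$ recovers $\sqrt{H/D}$. Variances and covariances use divisor n, matching the expectations in Table 17; all function names are hypothetical.

```python
import itertools
import math

# Illustrative effects (assumed values, not from the text).
da, ha, db, hb = 1.0, 0.6, 0.8, -0.2


def locus(g1, g2, d, h):
    """Contribution of one locus to the progeny of two homozygous parents.

    g1, g2 are '+' (AA or BB) or '-' (aa or bb)."""
    if g1 != g2:
        return h                    # heterozygote
    return d if g1 == "+" else -d   # homozygote


def phenotype(p, q):
    """Expected phenotype of p x q, with A-a and B-b acting independently."""
    return locus(p[0], q[0], da, ha) + locus(p[1], q[1], db, hb)


def pvar(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def pcov(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


parents = list(itertools.product("+-", repeat=2))  # AABB, AAbb, aaBB, aabb
parent_vals = [phenotype(p, p) for p in parents]

points = []  # (Vr, Wr) for each array
for p in parents:
    array = [phenotype(p, q) for q in parents]
    points.append((pvar(array), pcov(array, parent_vals)))

D, H = da ** 2 + db ** 2, ha ** 2 + hb ** 2
Vp = pvar(parent_vals)

# Slope between every pair of array points (all Vr differ for these values).
slopes = [(w2 - w1) / (v2 - v1)
          for (v1, w1), (v2, w2) in itertools.combinations(points, 2)]
intercept = points[0][1] - points[0][0]  # Wr - Vr, constant on a slope-1 line

print(slopes)                        # all 1, up to floating-point rounding
print(Vp, D)                         # agree up to rounding
print(intercept, (D - H) / 4)        # agree up to rounding
print(math.sqrt((Vp - 4 * intercept) / Vp), math.sqrt(H / D))
```

Adding a non-allelic interaction term to `phenotype` (for example a product of the two locus contributions) would in general break the slope of 1, which is what makes the graph a test of the additive-dominance model.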
We should note, too, that now that two genes, A-a and B-b, are involved, the relation of $W_r$ to $V_r$ provides a test of the additive-dominance model of gene action. The phenotypes set out in Table 17 are those expected when the two gene pairs make independent contributions to the expression of the character. If their contributions are not independent, that is if the genes interact in producing their effects, we cannot expect the relation of $W_r$ and $V_r$ to hold good as we have derived it, and in particular we can no longer expect the regression of $W_r$ on $V_r$ to be rectilinear with a slope of 1.

16. An example of a simple diallel


An example will illustrate how the diallel analysis and the test of the
additive-dominance model can be carried out in practice. The data are
taken from a larger experiment carried out using the eight substitution
lines between the Wellington and Samarkand inbred lines of Drosophila
melanogaster to which we referred in Chapter I. The character followed
was again sternopleural chaeta number. The results of mating four of the
substitution lines WWW, WWS, WSW and WSS in all combinations are
shown in Table 18. Since the X chromosome was the same in all four
parent lines it can be ignored and the lines will thus be designated by
their contributions in respect of chromosomes II and III. It will be re-
called from Chapter I that all the substitution lines were homozygous
for their respective chromosomes. The set of sixteen matings was dupli-
cated, a complete set being raised on each of two occasions, and the
duplicates are recorded separately in the table, each entry of which is
the mean number of chaeta from five female and five male progeny.
TABLE 18.
Sternopleural chaeta number in a diallel set of matings among four true-breeding lines, being all the combinations of the Wellington (W) and Samarkand (S) chromosomes II and III in Drosophila melanogaster, made on two occasions. The two entries in each cell of the table are one from each of the two occasions

Male        Female parent                              Mean
parent      WW        WS        SW        SS
WW          17.45     17.25     18.20     17.65        17.9000
            17.65     18.35     18.45     18.20
WS          18.05     18.80     18.10     18.85        18.5938
            18.55     18.80     18.45     19.15
SW          17.40     18.40     19.05     18.50        18.6313
            18.40     19.00     19.40     18.90
SS          17.95     18.95     18.65     19.10        18.6500
            17.15     18.85     18.95     19.60
Mean        17.8250   18.5500   18.6563   18.7438      18.4438

The observations of Table 18 may be subjected to an analysis of variance. The 16 × 2 = 32 observations have 31 df, of which 15 will be for differences among the 16 matings, 1 for the overall difference between the sets of matings reared on the two occasions, and 15 for the interaction of matings × occasions, i.e. for the differences between the duplicate observations after allowance has been made for the overall difference between occasions. The 15 df for differences between matings may be partitioned into 3 items, namely 3 df for differences among the 4
genotypes of female parents, 3 for differences among the 4 genotypes
of male parents, and 9 for the interaction of female and male parental
genotypes. The main items for differences among female and male
parents both reflect differences among the same set of four genotypes
and so, in the absence of complications such as maternal effects, should
yield estimates of the same component of variation, which will of course
be the additive variation (D). The item for interaction of female and
male parents will test for departures from simple additivity of the gene
effects, including dominance as well as non-additivity of non-allelic
genes in producing their effects. The analysis of variance is set out in
Table 19. The Matings × Occasions item provides an estimate of the error variation. The mean square for occasions is significant, confirming that, as might be expected, the experimental conditions were not precisely the same at the times when the progenies were raised from the duplicate sets of matings. The mean squares for the differences among the four genotypes are significant for both female and male parents, showing that there is additive genetic variation among these genotypes. The item for interaction of the differences among the female and male parents, although not so large, is also significant, showing that the differences among the sixteen progenies are not wholly accountable in terms of additive variation: there must also be present non-additive variation, to which both dominance and interaction of non-allelic genes could contribute.

TABLE 19.
Analysis of variance of the diallel data in Table 18

Item                          df    MS         VR      P
Female parents                3     1.41146    15.02   <0.001
Male parents                  3     1.05563    11.23   <0.001
Interaction                   9     0.28306    3.01    0.05-0.01
Occasions                     1     0.94531    10.06   0.01-0.001
Matings × Occasions (Error)   15    0.09398
------------------------------------------------------------
Reciprocals                   6     0.12865    1.37    >0.20
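The partition of the 31 df described above can be sketched in a few lines of Python. The sketch assumes Table 18 is stored as one 4 × 4 list per occasion, with male parents as rows and female parents as columns (a layout chosen here for illustration), and the function name `diallel_anova` is hypothetical. It reproduces the mean squares and variance ratios of Table 19; the P values would need an F distribution, which the Python standard library does not provide.

```python
# Table 18: rows are male parents, columns female parents, both in the order
# WW, WS, SW, SS; one 4 x 4 table per occasion.
OCC1 = [[17.45, 17.25, 18.20, 17.65],
        [18.05, 18.80, 18.10, 18.85],
        [17.40, 18.40, 19.05, 18.50],
        [17.95, 18.95, 18.65, 19.10]]
OCC2 = [[17.65, 18.35, 18.45, 18.20],
        [18.55, 18.80, 18.45, 19.15],
        [18.40, 19.00, 19.40, 18.90],
        [17.15, 18.85, 18.95, 19.60]]


def diallel_anova(occ1, occ2):
    """Return {item: (df, MS)} for a full diallel raised on two occasions."""
    n = len(occ1)
    obs = [x for occ in (occ1, occ2) for row in occ for x in row]
    grand = sum(obs) / len(obs)
    cell = [[(occ1[i][j] + occ2[i][j]) / 2 for j in range(n)] for i in range(n)]
    male = [sum(cell[i]) / n for i in range(n)]                         # row means
    female = [sum(cell[i][j] for i in range(n)) / n for j in range(n)]  # column means

    ss_female = 2 * n * sum((m - grand) ** 2 for m in female)  # 2n obs per margin
    ss_male = 2 * n * sum((m - grand) ** 2 for m in male)
    ss_matings = 2 * sum((c - grand) ** 2 for row in cell for c in row)
    ss_inter = ss_matings - ss_female - ss_male

    diffs = [occ1[i][j] - occ2[i][j] for i in range(n) for j in range(n)]
    ss_occ = sum(diffs) ** 2 / len(obs)
    mean_diff = sum(diffs) / len(diffs)
    ss_error = sum((d - mean_diff) ** 2 for d in diffs) / 2  # matings x occasions

    items = {
        "Female parents": (n - 1, ss_female),
        "Male parents": (n - 1, ss_male),
        "Interaction": ((n - 1) ** 2, ss_inter),
        "Occasions": (1, ss_occ),
        "Error": (n * n - 1, ss_error),
    }
    return {k: (df, ss / df) for k, (df, ss) in items.items()}


table = diallel_anova(OCC1, OCC2)
error_ms = table["Error"][1]
for item, (df, ms) in table.items():
    print(f"{item:15s} {df:3d} {ms:.5f} VR={ms / error_ms:.2f}")
```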
The mean squares for female parents and male parents do not differ
significantly from one another, as would be expected if the two sexes
are contributing equally to the genotypes of the progeny. There is thus
no indication of any maternal effect, or indeed of any other departure
from simple autosomal inheritance, and the close comparison of means
for the corresponding arrays from female parents and male parents
shown in the margins of Table 18 confirms this. A further and more
stringent test is, however, possible. The four matings along the leading
diagonal of the diallel table (Table 18) are repeats of the homozygous
parental lines, the female and male parents being of the same genotype
in each case. The other twelve matings are between parents of different
genotypes and fall into six pairs of reciprocal crosses. Provided the
parents contribute equally to the progeny these reciprocals should be
alike within the limits of sampling variation. The mean square for differences between reciprocals can thus be compared with error variation to provide a test of equilinearity in the genetical determination of the character. The mean square is readily found. Thus the duplicate progenies from WS × WW give values of 17.25 and 18.35, while those from WW × WS give 18.05 and 18.55. The difference between the reciprocals is therefore 17.25 + 18.35 - 18.05 - 18.55 = -1.0 and the contribution of this comparison to the sum of squares (SS) is $(-1.0)^2/4$, the divisor 4 reflecting the use of 4 observations in deriving the difference. There are 6 such differences, obtained from the 6 pairs of reciprocal crosses, as set out in Table 20, and summing their contributions yields a SS of 0.771875. This SS stems from 6 comparisons and so takes 6 df, thus yielding a MS of $\tfrac16(0.771875) = 0.12865$, as shown below the main analysis in Table 19.

TABLE 20.
Differences between the offspring of reciprocal crosses in the data of Table 18

         WS       SW       SS
WW      -1.00     0.85     0.75
WS               -0.85     0.20
SW                        -0.20
Total   -0.25
This MS does not depart significantly from the estimate of error variation and there is hence no evidence of any departure from simple autosomal inheritance. The 6 df included in this test are part of the 15 df for differences among matings and represent a partition of these 15 different from the partition used in the main analysis, testing a different feature of the genetical situation. More comprehensive analyses of variance of the diallel tables are available, notably one by Hayman. These test a wider range of features, but are more complex to carry out. They will therefore not be described here, but a full account of Hayman's analysis of variance is given by M and J.
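The reciprocal comparison can be scripted directly from Table 18. This sketch assumes the same illustrative layout as before (male parents as rows, female parents as columns, one table per occasion); `reciprocal_differences` is a hypothetical name. Each entry is the cross with the row line as male parent minus its reciprocal, summed over both occasions, which is the convention of Table 20.

```python
# Table 18: rows are male parents, columns female parents (order WW, WS, SW, SS).
LINES = ["WW", "WS", "SW", "SS"]
OCC1 = [[17.45, 17.25, 18.20, 17.65],
        [18.05, 18.80, 18.10, 18.85],
        [17.40, 18.40, 19.05, 18.50],
        [17.95, 18.95, 18.65, 19.10]]
OCC2 = [[17.65, 18.35, 18.45, 18.20],
        [18.55, 18.80, 18.45, 19.15],
        [18.40, 19.00, 19.40, 18.90],
        [17.15, 18.85, 18.95, 19.60]]
ERROR_MS = 0.09398  # Matings x Occasions mean square from Table 19


def reciprocal_differences(occ1, occ2):
    """Differences between reciprocal crosses, summed over the two occasions.

    The entry for (line i, line j) is the cross with line i as male parent
    minus the reciprocal with line i as female parent."""
    n = len(occ1)
    diffs = {}
    for i in range(n):
        for j in range(i + 1, n):
            as_male = occ1[i][j] + occ2[i][j]
            as_female = occ1[j][i] + occ2[j][i]
            diffs[(LINES[i], LINES[j])] = as_male - as_female
    ss = sum(d ** 2 for d in diffs.values()) / 4  # each difference uses 4 observations
    return diffs, ss, ss / len(diffs)             # one df per pair of reciprocals


diffs, ss, ms = reciprocal_differences(OCC1, OCC2)
for pair, d in diffs.items():
    print(pair, f"{d:+.2f}")
print(f"SS = {ss:.6f}, MS = {ms:.5f}, VR = {ms / ERROR_MS:.2f}")
```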
We have now established that there is not only additive variation but also non-additive variation between the four genotypes, and that there is no evidence of reciprocal differences. We can proceed to analyse the non-additive variation further, and in particular to test whether dominance is adequate to account for it or whether interaction of non-allelic genes must also be invoked, by examining the relations between $W_r$ and $V_r$. Since there is no evidence of differences between the progenies of reciprocal crosses, we can combine these to give single values for each cross between different lines, and we can of course also pool the values from duplicate progenies. This gives us the reduced or half-diallel table shown in Table 21. The entries along the diagonal of this table are for the progenies of matings within the four parental genotypes and thus are repeats of these four parental lines. Each is the mean of two duplicate progenies. Thus for WW × WW we have $\tfrac12(17.45 + 17.65) = 17.55$. The off-diagonal entries, on the other hand, are the means of four progenies, namely the pair of reciprocals, each of which is represented by duplicate progenies. Thus the entry for WW × WS is $\tfrac14(17.25 + 18.35 + 18.05 + 18.55) = 18.05$. In proceeding to find $W_r$ and $V_r$ we note that, after pooling our reciprocals, it does not matter whether we work on female or male arrays: they will give identical results. The WS array, for example, consists of WW × WS (18.0500), WS × WS (18.8000), WS × SW (18.4875) and WS × SS (18.9500), read down the WS column to the diagonal and then along the WS row of Table 21. Its $V_r$ is thus

$V_r = \tfrac13[(18.0500^2 + 18.8000^2 + 18.4875^2 + 18.9500^2) - \tfrac14(18.0500 + 18.8000 + 18.4875 + 18.9500)^2] = 0.1582,$

the final divisor being 3 because there are 3 df among the 4 progenies. These values of $V_r$ are entered in the right-hand column of the table.

TABLE 21.
Half-diallel table from the data of Table 18

        WW        WS        SW        SS        Mean      Wr       Vr
WW      17.5500   18.0500   18.1125   17.7375   17.8625   0.1427   0.0703
WS                18.8000   18.4875   18.9500   18.5719   0.2748   0.1582
SW                          19.2250   18.7500   18.6438   0.3232   0.2186
SS                                    19.3500   18.6969   0.5271   0.4713
Mean                                                      0.3169   0.2296
The calculation of $W_r$ requires a further word of explanation. We could have used values for the four parental lines obtained from progenies of these lines raised independently of the diallel itself. This is, however, unnecessary, as the four parental lines appear along the leading diagonal of the diallel table and we can in fact utilize these four entries in the table to provide values of the mean chaeta numbers of the four parental genotypes. (This introduces a complication in assessing the values of the components of variation, as we shall see later (p. 80), but one which does not affect our immediate analysis and so may be ignored for the moment.) So, again taking the WS array as an example, we find its $W_r$ as

$W_r = \tfrac13[(18.0500 \times 17.5500) + (18.8000 \times 18.8000) + (18.4875 \times 19.2250) + (18.9500 \times 19.3500) - \tfrac14(18.0500 + 18.8000 + 18.4875 + 18.9500)(17.5500 + 18.8000 + 19.2250 + 19.3500)] = 0.2748.$

The values of $W_r$ for the four arrays are given next to those for the corresponding $V_r$ in Table 21.
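The construction of Table 21 from Table 18 can be sketched as follows. The layout of the data (male parents as rows, one table per occasion) and the helper names are assumptions made for illustration; `var` and `cov` use divisor n - 1, as in the worked calculation above, and the parental values are taken from the leading diagonal of the half-diallel itself.

```python
# Table 18: rows are male parents, columns female parents (order WW, WS, SW, SS).
LINES = ["WW", "WS", "SW", "SS"]
OCC1 = [[17.45, 17.25, 18.20, 17.65],
        [18.05, 18.80, 18.10, 18.85],
        [17.40, 18.40, 19.05, 18.50],
        [17.95, 18.95, 18.65, 19.10]]
OCC2 = [[17.65, 18.35, 18.45, 18.20],
        [18.55, 18.80, 18.45, 19.15],
        [18.40, 19.00, 19.40, 18.90],
        [17.15, 18.85, 18.95, 19.60]]


def half_diallel(occ1, occ2):
    """Pool duplicates and reciprocals: entry (i, j) is the mean of every
    observation on i x j and j x i (2 values on the diagonal, 4 off it)."""
    n = len(occ1)
    return [[(occ1[i][j] + occ2[i][j] + occ1[j][i] + occ2[j][i]) / 4
             for j in range(n)] for i in range(n)]


def var(xs):
    """Sample variance with divisor n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def cov(xs, ys):
    """Sample covariance with divisor n - 1."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


table = half_diallel(OCC1, OCC2)
parents = [table[i][i] for i in range(len(LINES))]  # leading diagonal
stats = {}
for i, line in enumerate(LINES):
    array = table[i]  # the pooled table is symmetric, so row i is array i
    stats[line] = {"mean": sum(array) / len(array),
                   "Wr": cov(array, parents),
                   "Vr": var(array)}

for line, s in stats.items():
    print(f"{line}: mean {s['mean']:.4f}  Wr {s['Wr']:.4f}  Vr {s['Vr']:.4f}")
```

Sorting the arrays by `Vr` (or by `Wr`) gives the order WW, WS, SW, SS, which is the ordering used in the text to infer the direction of dominance.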

[Figure: $W_r$ (ordinate) plotted against $V_r$ (abscissa, marked from 0 to 0.5) for the four arrays, with the fitted regression line.]
Fig. 8. The $W_r/V_r$ graph for sternopleural chaeta number in the defined diallel among the four lines WW, WS, SW and SS in Drosophila melanogaster. The slope of the regression line is b = 0.9152, which does not differ significantly from 1. The position of the points along this line shows that the genes from W are preponderantly dominant and those from S preponderantly recessive.

If we now plot $W_r$ against $V_r$ (Fig. 8) we expect to find a straight line of slope 1 if the non-additive variation is wholly ascribable to dominance. The regression of $W_r$ on $V_r$ can be calculated in the customary way. There are 3 df among the four points, one from each array, and we find SS($W_r$) = 0.076297, SCP($W_r$, $V_r$) = 0.081464 and SS($V_r$) = 0.089010. Then $b = 0.081464/0.089010 = 0.9152$, and it accounts for $0.081464^2/0.089010 = 0.074558$ of the SS($W_r$), leaving 0.001739 as the remainder SS for deviations from the regression line. Since the assignment of 1 df to the regression line leaves 2 df for the remainder SS, the error variation against which the regression SS must be tested is $\tfrac12(0.001739) = 0.000870$, and $t_{[2]}$ testing the significance of the slope of the regression line is $\sqrt{0.07456/0.000870} = 9.26$, which, even with no more than 2 df for the estimate of error, has a probability of only about 0.01. Clearly there is a significant regression of $W_r$ on $V_r$. Furthermore the standard error of b will be found as $\sqrt{\text{error variance}/\mathrm{SS}(V_r)} = \sqrt{0.000870/0.089010} = 0.0988$, and it is clear that the value of b does not depart significantly from 1. Thus, so far as this analysis goes, there is good evidence of dominance, but no evidence that dominance is not wholly able to account for the relation observed between $W_r$ and $V_r$. In other words dominance is present but there is no indication of non-allelic interaction: the additive-dominance model is sufficient to account for the data.
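The regression arithmetic can be sketched as below. As a simplifying assumption the sketch starts from the four-decimal values printed in Table 21 rather than from unrounded ones, so b comes out as about 0.9155 instead of 0.9152; the function name and the tabulated 5% point of t for 2 df (4.303) are supplied here, not taken from the text.

```python
import math

# Wr and Vr for the arrays WW, WS, SW, SS as printed (to 4 decimals) in Table 21.
WR = [0.1427, 0.2748, 0.3232, 0.5271]
VR = [0.0703, 0.1582, 0.2186, 0.4713]
T_05_2DF = 4.303  # two-tailed 5% point of t with 2 df, from standard tables


def regress_wr_on_vr(wr, vr):
    n = len(wr)
    mw, mv = sum(wr) / n, sum(vr) / n
    ss_w = sum((w - mw) ** 2 for w in wr)
    ss_v = sum((v - mv) ** 2 for v in vr)
    scp = sum((w - mw) * (v - mv) for w, v in zip(wr, vr))
    b = scp / ss_v
    ss_reg = scp ** 2 / ss_v
    error = (ss_w - ss_reg) / (n - 2)   # remainder MS on n - 2 df
    se_b = math.sqrt(error / ss_v)
    return {
        "b": b,
        "se_b": se_b,
        "t_b_vs_0": math.sqrt(ss_reg / error),  # significance of the regression
        "t_b_vs_1": (b - 1) / se_b,             # departure of the slope from 1
        # Position of the line is only meaningful after removing
        # non-heritable components, as discussed later in the text.
        "intercept": mw - b * mv,
    }


res = regress_wr_on_vr(WR, VR)
for key, value in res.items():
    print(f"{key:10s} {value:.4f}")
print("slope differs significantly from 1:", abs(res["t_b_vs_1"]) > T_05_2DF)
```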
A further and somewhat different analysis of $W_r$ and $V_r$ is possible. Instead of concentrating on $W_r$ and $V_r$, we can look at $W_r + V_r$ and $W_r - V_r$, which between them contain all the information that $W_r$ and $V_r$ carry. Now if dominance (or, for that matter, certain types of non-allelic interaction) is present, $W_r + V_r$ must change from array to array. At the same time, if there is non-allelic interaction $W_r - V_r$ will vary between arrays, although if only dominance is present, $W_r - V_r$ will not vary more than expected from error variation. Now, we can calculate $W_r$ and $V_r$ for each array not only from the data pooled over duplicates, as we did above, but also separately from each of the duplicate occasions. The calculation is, of course, exactly as with the pooled data but using the separate data from each individual occasion. The values of $W_r$ and $V_r$ so obtained are shown in Table 22, together with the $W_r + V_r$ and $W_r - V_r$ derived from them.

TABLE 22.
Values of Wr and Vr from the two occasions

         Occasion 1                            Occasion 2
Array    Wr      Vr      Wr+Vr   Wr-Vr         Wr      Vr      Wr+Vr   Wr-Vr
WW       0.1242  0.0275  0.1517  0.0967        0.1283  0.2004  0.3288  -0.0721
WS       0.3750  0.3317  0.7067  0.0433        0.1772  0.0518  0.2290  0.1254
SW       0.3467  0.2781  0.6247  0.0686        0.2914  0.1677  0.4590  0.1237
SS       0.4063  0.3268  0.7331  0.0794        0.6696  0.6538  1.3233  0.0158

There are thus eight values for each of $W_r + V_r$ and $W_r - V_r$, one from each of the four arrays in each of the two halves of the experiment.
We can now carry out an analysis of variance on $W_r + V_r$ and another similarly on $W_r - V_r$. In each case there will be 7 df among the 8 observed values, of which 3 can be ascribed to differences between the arrays and the remaining 4 to the differences between the duplicate values obtained for each of the 4 arrays. These 4 df could be further partitioned into 1 df for the overall difference between occasions and 3 df for variation of the 4 array differences round this overall value; but this is unnecessary in the present case, since the overall difference between occasions is not significant when compared with the residual variation for the 3 df. We thus have a simple analysis into two parts, one of which, for 4 df, is a measure of the variation within arrays between occasions and provides the estimate of error against which the mean square between arrays can be tested for significance.
The two analyses of variance, for $W_r + V_r$ and $W_r - V_r$ respectively, so obtained are set out in Table 23. The MS between arrays for $W_r - V_r$ is not significant when tested against that within arrays, and indeed is smaller than it. There is thus no evidence of any non-allelic interaction; no evidence, that is, of any inadequacy of the additive-dominance model.

TABLE 23.
Analyses of variance of Wr + Vr and Wr - Vr

             Item              df   MS
Wr + Vr      Between arrays    3    0.2200   VR = 2.77
             Within arrays     4    0.0794   P = 0.20-0.05
Wr - Vr      Between arrays    3    0.0029   Not significant
             Within arrays     4    0.0053
Turning to the analysis of variance of $W_r + V_r$, it will be seen that the MS between arrays is greater than that within them, but not significantly so. On this evidence alone, therefore, we could not be confident that even dominance was present. We should recall, however, the evidence from the initial analysis of variance (Table 19) of non-additive effects, which must be accounted for in some way. Since there is no evidence of interaction between non-allelic genes, we must conclude that, although not formally significant by itself, the higher value for the MS between arrays for $W_r + V_r$ does in fact reflect dominance, and that while the assumption of additive genetic variation alone is not adequate, the additive-dominance model does provide an adequate basis for interpreting the results.
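Table 23 can be reproduced, to within rounding, from the $W_r$ and $V_r$ columns of Table 22. In this sketch (function and variable names are hypothetical) each array contributes one value per occasion, so the within-arrays mean square has 1 df per array; as in the text, the overall occasions item is not separated out.

```python
# Wr and Vr for arrays WW, WS, SW, SS on each occasion (Table 22).
OCC1 = {"WW": (0.1242, 0.0275), "WS": (0.3750, 0.3317),
        "SW": (0.3467, 0.2781), "SS": (0.4063, 0.3268)}
OCC2 = {"WW": (0.1283, 0.2004), "WS": (0.1772, 0.0518),
        "SW": (0.2914, 0.1677), "SS": (0.6696, 0.6538)}
ARRAYS = ["WW", "WS", "SW", "SS"]


def between_within(pairs):
    """One-way ANOVA of arrays, each scored once on each of two occasions.

    pairs: [(value on occasion 1, value on occasion 2), ...].
    Returns (MS between arrays, MS within arrays)."""
    k = len(pairs)
    means = [(a + b) / 2 for a, b in pairs]
    grand = sum(means) / k
    ms_between = 2 * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum((a - b) ** 2 / 2 for a, b in pairs) / k  # 1 df per array
    return ms_between, ms_within


sums = [(OCC1[a][0] + OCC1[a][1], OCC2[a][0] + OCC2[a][1]) for a in ARRAYS]
diffs = [(OCC1[a][0] - OCC1[a][1], OCC2[a][0] - OCC2[a][1]) for a in ARRAYS]
results = {}
for label, pairs in (("Wr + Vr", sums), ("Wr - Vr", diffs)):
    between, within = between_within(pairs)
    results[label] = (between, within)
    print(f"{label}: between {between:.4f} (3 df), within {within:.4f} (4 df), "
          f"VR {between / within:.2f}")
```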
Returning to the overall estimates of $W_r$ and $V_r$ obtained when the data from the two halves of the experiment are pooled (Table 21), their mean values are $\bar W_r = 0.3169$ and $\bar V_r = 0.2296$. To these two statistics we may add the variance of the parent lines ($V_p$) found from the leading diagonal of the diallel table which, as has already been noted, comprises the four parental genotypes. We thus find from Table 21

$V_p = \tfrac13[17.550^2 + 18.800^2 + 19.225^2 + 19.350^2 - \tfrac14(17.550 + 18.800 + 19.225 + 19.350)^2] = 0.675573.$

However, before we can use these estimates for deriving the values of the genetical components of variation D and H, they must be corrected for the non-heritable items that they contain.
The original analysis of variance of the experiment (Table 19) yielded a
value of 0.09398 for the error variance based on the differences between
the duplicate observations made on each of the sixteen matings in the
table. This error variation reflects, of course, the non-heritable differ-
ences to which the observations are subject and hence provides the basis
for finding the non-heritable components of the three statistics in which
we are now interested. We note that each value along the leading diagonal of Table 21 is the mean of a pair of duplicate observations. These will thus be subject to half the error variation of the single observations, and we can estimate the non-heritable component of $V_p$, which is the variance of the values in this leading diagonal, as $\tfrac12 \times 0.09398 = 0.04699$. Thus the heritable component of $V_p$ is $D = 0.67557 - 0.04699 = 0.62858$. The off-diagonal entries in Table 21 are, however, the means of four observations each, and so will be subject to only $\tfrac14$ the error variation of single observations. $V_r$ for each array is based on three such off-diagonal entries together with one diagonal entry. In other words $\tfrac34$ of the observations on which $V_r$ is based are each subject to $\tfrac14$ of the error variance, and $\tfrac14$ of the observations are subject to $\tfrac12$ the error variance. Thus the non-heritable component of each $V_r$ can be estimated as $(\tfrac34 \cdot \tfrac14 + \tfrac14 \cdot \tfrac12)\,0.09398 = 0.02937$, and the heritable component of $\bar V_r = \tfrac14(D + H)$ is $0.22959 - 0.02937 = 0.20022$.
Turning to $W_r$, we note that it would contain no non-heritable item if it had been calculated using values of the parental lines from observations made independently of the diallel matings. In fact, however, we are taking the parental values from the leading diagonal of the diallel table itself. So every $W_r$ will include, as one of the four cross-products from which it is derived, the square of the appropriate parental value. Thus, for example, as we have already seen, $W_r$ for the WS array is based on

$(18.0500 \times 17.5500) + (18.8000)^2 + (18.4875 \times 19.2250) + (18.9500 \times 19.3500).$

This squared value will bring in an item for non-heritable variation. It is a value from the leading diagonal of Table 21, and so is the mean of two observations, and it provides one of the four cross-products that contribute to each $W_r$. Hence the non-heritable component of $\bar W_r$ will be $(\tfrac14 \cdot \tfrac12)\,0.09398 = 0.01175$, and the genetic component of $\bar W_r = \tfrac12 D$ thus becomes $0.31693 - 0.01175 = 0.30518$. Before proceeding we might observe that, while the regression of $W_r$ on $V_r$ used in analysing their relationship should strictly be the regression of the genetic portion of $W_r$ on the genetic portion of $V_r$, the regression of $W_r$ on $V_r$ uncorrected for their non-heritable components (as used in Fig. 8) will give exactly the same value for b, since we subtract a common non-heritable item from all four $W_r$ and also a common one from all four $V_r$. The slope of the regression line is thus not affected, even though its position, as defined by the point $\bar W_r, \bar V_r$ through which it must pass, and hence its intercept with the ordinate, is valid only after the non-heritable components have been deducted.
Returning to our main theme, there is another statistic which we have not used so far but which can be calculated from the diallel table, namely the variance of array means, $V_{\bar r}$, whose heritable component is $\tfrac14 D$. These means are shown in Table 21, from which we find $V_{\bar r} = 0.15278$. This variance, too, will contain a non-heritable component. Each array mean is derived from an array as shown in Table 21, and thus corresponds to the joint mean of the corresponding female and male arrays of Table 18: in fact the mean of the WW array is the mean of all the observations in the first column and first row of Table 18, the observations in the top left corner each having been used twice. The array mean is thus the mean of twelve observations used once each and two used twice, thus being the equivalent of $(12 \times 1) + (2 \times 2) = 16$ observations. But when an observation is multiplied by two, the amount it contributes to a variance is multiplied by four. So the non-heritable component of the variance of array sums will be $(12 \times 1) + (2 \times 4) = 20$ times the error variance, and the non-heritable variance of array means, $V_{\bar r}$, will be correspondingly

$\frac{20}{16^2}(0.09398) = 0.00734.$
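The bookkeeping of non-heritable components can be checked with exact fractions. This sketch simply encodes the reasoning of the preceding paragraphs (the share of the single-observation error variance carried by each statistic) and subtracts it from the observed totals; the dictionary keys are hypothetical labels, and the totals are the values quoted in the text.

```python
from fractions import Fraction

ERROR_MS = 0.09398  # Matings x Occasions mean square (Table 19)

# Share of the single-observation error variance carried by each statistic.
share = {
    "Vp": Fraction(1, 2),                        # diagonal entries: means of 2 obs
    "mean Wr": Fraction(1, 4) * Fraction(1, 2),  # 1 of 4 cross-products is a squared diagonal value
    "var array means": Fraction(12 * 1 + 2 * 4, 16 ** 2),  # 12 obs once, 2 twice
    "mean Vr": Fraction(3, 4) * Fraction(1, 4) + Fraction(1, 4) * Fraction(1, 2),
}
total = {"Vp": 0.675573, "mean Wr": 0.31693,
         "var array means": 0.15278, "mean Vr": 0.22959}

nonheritable = {k: float(f) * ERROR_MS for k, f in share.items()}
genetic = {k: total[k] - nonheritable[k] for k in total}
for k in total:
    print(f"{k:16s} share {str(share[k]):5s} "
          f"non-heritable {nonheritable[k]:.5f} genetic {genetic[k]:.5f}")
```

Keeping the shares as `Fraction` objects makes it easy to see that they are 1/2, 1/8, 5/64 and 5/16 before any rounding enters.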
So after deducting the non-heritable components we have

$V_p = D = 0.62858$, $\bar V_r = \tfrac14(D + H) = 0.20022$
$\bar W_r = \tfrac12 D = 0.30518$, $V_{\bar r} = \tfrac14 D = 0.14544$

Since the expectations of $V_p$, $\bar W_r$ and $V_{\bar r}$ sum to $\tfrac74 D$, we can thus find estimates of D and H as

$D = \tfrac47(V_p + \bar W_r + V_{\bar r}) = \tfrac47(1.07920) = 0.61669$
$H = 4\bar V_r - D = 0.80088 - 0.61669 = 0.18419.$

Then as an estimate of the average level of dominance we can take

$\sqrt{H/D} = \sqrt{0.18419/0.61669} = \pm 0.54651.$

These results are collected together in Table 24.

TABLE 24.
Components of variation in the diallel of Table 18

                  Total     Non-genetic   Genetic   Expectation
Vp                0.6756    0.0470        0.6286    D
Mean Wr           0.3169    0.0117        0.3052    ½D
V(array means)    0.1528    0.0073        0.1455    ¼D
Mean Vr           0.2296    0.0294        0.2002    ¼(D+H)

$D = \tfrac47(V_p + \bar W_r + V_{\bar r}) = 0.6167$
$H = 4\bar V_r - D = 0.1842$
$\sqrt{H/D} = 0.5465$

We might note that this procedure for estimating D and H is not fully efficient, as we have given $V_p$, $\bar W_r$ and $V_{\bar r}$ equal weight in finding D. A more complex procedure can be used to provide least squares estimates, which at D = 0.62288, H = 0.17792 and $\sqrt{H/D} = 0.53445$ are virtually identical with those yielded by the simpler procedure.
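The simple estimation of D and H is easily scripted. This sketch takes the genetic components of Table 24 as given and uses the fact that the expectations of $V_p$, $\bar W_r$ and $V_{\bar r}$ sum to $\tfrac74 D$; it does not attempt the least-squares procedure mentioned above, and the variable names are illustrative.

```python
import math

# Genetic components after removing non-heritable items (Table 24).
Vp, Wr_bar, V_array_means, Vr_bar = 0.62858, 0.30518, 0.14544, 0.20022

# Expectations: Vp = D, mean Wr = D/2, variance of array means = D/4,
# so the three together estimate 7D/4; mean Vr = (D + H)/4.
D = 4 / 7 * (Vp + Wr_bar + V_array_means)
H = 4 * Vr_bar - D
ratio = math.sqrt(H / D)  # average dominance; its sign comes from the Wr/Vr graph
print(f"D = {D:.5f}, H = {H:.5f}, sqrt(H/D) = {ratio:.5f}")
```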
While we now have an estimate of the dominance ratio, we do not as yet have any indication as to its direction. But as we have earlier noted, the order of the points on the $W_r, V_r$ graph itself gives an indication of the relative number of dominant to recessive genes present in the common parent of each array: the common parent with the most dominant genes has the smallest values of $W_r$ and $V_r$, and that with the most recessive genes the largest values of $W_r$ and $V_r$. Now it can be seen from Table 21 that the order of the arrays from the smallest to the largest values of $W_r$ and $V_r$ is WW, WS, SW and SS. Since WW has a smaller value than SW, and WS than SS, the W chromosome II must show dominance over its S homologue. Similarly WW gives smaller values for $W_r$ and $V_r$ than does WS, and SW than SS. Thus the W homologue of chromosome III also shows dominance over the S. Since, therefore, the W homologues are also associated with a lower score (Table 21) and the S homologues with a higher score, the direction of dominance is clearly preponderantly for lower score, the dominance deviations being negative.
This diallel is defined in the sense that the genotypes of the parents, and hence of the progenies, are known for every mating. It is therefore possible to approach its analysis in a different way. The progenies of the sixteen matings fall into the nine genotypes expected for all the possible combinations of two 'genes' each with two 'alleles'. The different genotypes are not expected to be produced by the same number of matings: the homozygotes WWWW, WWSS, SSWW and SSSS are each represented by single matings, although of course duplicate progenies are available for each of them; the four single heterozygotes WWWS, WSWW, SSWS and WSSS each came from two matings (reciprocals) and so are represented by four progenies; and the double heterozygote is produced by four matings (those along the trailing diagonal of Table 18) and so is represented by eight progenies. The mean chaeta numbers of the nine genotypes, obtained by averaging over the appropriate observations in Table 18, are set out in Table 25, together with (in brackets) the number of observations from which each is derived. The means in the margins of the table are the means of all flies of the particular genotype in question. Thus, for example, the mean of all flies homozygous for the W chromosome II is given at the bottom of the first column, having been found as $\tfrac18[(17.55 \times 2) + (18.05 \times 4) + (18.80 \times 2)] = 18.1125$. The expected departures of these marginal means from the mid-parent value of the whole experiment are also shown in terms of $d_2$, $d_3$, $h_2$ and $h_3$, where the subscripts 2 and 3 refer to chromosomes II and III respectively. As will be readily seen, we can estimate these four parameters from the marginal means, the chromosome II parameters from the lower margin of the table and the chromosome III parameters from the right-hand margin. Considering the chromosome II parameters,

$d_2 = \tfrac12[(d_2 + \tfrac12 h_3) - (-d_2 + \tfrac12 h_3)] = \tfrac12[19.0188 - 18.1125] = 0.4531$
$h_2 = \tfrac12[2(h_2 + \tfrac12 h_3) - (d_2 + \tfrac12 h_3) - (-d_2 + \tfrac12 h_3)] = \tfrac12[2 \times 18.3219 - 19.0188 - 18.1125] = -0.2438.$
TABLE 25.
Direct estimation of genetic parameters

Chromosome    Chromosome II                                  Mean       Expectation
III           W/W            W/S            S/S
W/W           17.5500 (2)    18.1125 (4)    19.2250 (2)      18.2500    m + ½h2 - d3
W/S           18.0500 (4)    18.1125 (8)    18.7500 (4)      18.2563    m + ½h2 + h3
S/S           18.8000 (2)    18.9500 (4)    19.3500 (2)      19.0125    m + ½h2 + d3
Mean          18.1125        18.3219        19.0188          18.4438
Expectation   m - d2 + ½h3   m + h2 + ½h3   m + d2 + ½h3     m + ½h2 + ½h3

m = 18.7532

              Chromosome II   Chromosome III   Mean      Diallel analysis
d              0.4531          0.3813           0.4172    0.5553
h             -0.2438         -0.3750          -0.3094   -0.3035
h/d           -0.5381         -0.9835          -0.7416   -0.5466
These values and those yielded similarly for $d_3$ and $h_3$ by the right-hand marginal means are shown in the lower part of Table 25. It will be observed that for both chromosomes the S homologue mediated a higher chaeta number than W, and also that h is negative in both cases, indicating that the W homologue is showing dominance over S for both chromosomes II and III. Now $D = d_2^2 + d_3^2 = 0.4531^2 + 0.3813^2 = 0.3507$, which compares with the estimate D = 0.6167 obtained from the diallel analysis, and similarly $H = h_2^2 + h_3^2 = 0.2001$, as compared with the estimate H = 0.1842 from the diallel analysis. The agreement between the two estimates of H is close, but that between the two estimates of D less so. We should remember, however, that D and H are quadratic quantities and hence will tend to magnify apparent discrepancies. In order to make a comparison in linear quantities, let us note that the direct estimates of $d_2$ and $d_3$ do not differ significantly and hence assume that they are equal. Similarly $h_2$ and $h_3$ do not differ significantly and we assume that they also are equal. We then replace $d_2$ and $d_3$ each by their mean d, and $h_2$ and $h_3$ similarly by h. The values for d and h obtained from the direct analysis are shown in the column headed Mean in the lower part of Table 25. Turning to the estimates from the diallel analysis, $D = 2d^2$ and $H = 2h^2$. Then $d = \sqrt{\tfrac12 \cdot 0.6167} = 0.5553$ and $h = -\sqrt{\tfrac12 \cdot 0.1842} = -0.3035$, these findings being entered in the column of the table headed Diallel analysis. That h from the diallel analysis must in fact have a negative sign is shown, as already noted, by the order of the points on the $W_r, V_r$ graph.
The agreement in respect of h is now strikingly good and that in respect of
d reasonably close. In fact, although it is not easy to test the
significance of the difference between the two estimates of d, it is
unlikely to be significant. If we now estimate the average level of
dominance by taking h/d, we obtain -0.7416 from the entries in the Mean
column and -0.5466 from the Diallel analysis column. The two analyses agree
in showing dominance to be incomplete, lying somewhere between a half and
three-quarters, and in the direction of low chaeta number. Evidently the
diallel analysis has produced estimates which are compatible with those of
the direct analysis, over and above showing that while dominance is present
there is no evidence for interaction of non-allelic genes.
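The comparison in quadratic and in linear quantities can be reproduced from the figures quoted above. This sketch assumes, as the text does, equal d and equal h for the two chromosomes, and takes the negative root for h from the order of the points on the $W_r$, $V_r$ graph; the variable names are illustrative only.

```python
from math import sqrt

# Direct estimates (Table 25) and the diallel estimates of D and H quoted in the text.
d2, d3 = 0.4531, 0.3813
h2, h3 = -0.2438, -0.3750
D_diallel, H_diallel = 0.6167, 0.1842

# Quadratic comparison: D = d2^2 + d3^2 and H = h2^2 + h3^2.
D_direct = d2**2 + d3**2
H_direct = h2**2 + h3**2

# Linear comparison, assuming equal d and equal h for the two chromosomes.
d_direct = (d2 + d3) / 2
h_direct = (h2 + h3) / 2
d_diallel = sqrt(D_diallel / 2)    # D = 2d^2
h_diallel = -sqrt(H_diallel / 2)   # H = 2h^2; sign taken from the Wr, Vr graph

for label, d, h in [("direct", d_direct, h_direct), ("diallel", d_diallel, h_diallel)]:
    print(f"{label:8s} d={d:.4f} h={h:.4f} h/d={h / d:.4f}")
print(f"D: {D_direct:.4f} vs {D_diallel}; H: {H_direct:.4f} vs {H_diallel}")
```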

17. Undefined diallels
Just as a 4 × 4 diallel can be used to investigate two genetic differences,
in the way we have seen, an 8 × 8 diallel could be designed using as parents
all the possible combinations of three genetic differences and used to
examine the behaviour of these genetic differences and to test whether they
showed non-allelic interaction. We could go on to a 16 × 16 to look at four
genetic differences in the same way, and so on. Where, however, the
genotypes of the parents, and hence of the progenies, are defined and known,
as in the case we have described, the approach through direct analysis is
always open and will in general lead to more informative results, since the
d's and h's are then estimated individually and not pooled in D and H. The
value of applying the diallel analysis to the experiment discussed in the
last section was in fact that it allowed us to compare its results with
those of direct analysis and see that it did effectively extract the same
information.
With the vast majority of diallels, direct analysis is not possible because
it is rare for the parental genotypes to be defined as they were in the
Drosophila experiment. Where the differences among the parental genotypes
are undefined, diallel analysis must be used and two further complications
must immediately be taken into account. In the first place we cannot know
that the two alleles (assuming that there are only two) of any gene are
equally common among the parents, other than in exceptional cases like the
diallel referred to by Jinks et al. (1969), in which the 20 parental lines
were descended by selfing from 20 individuals in an F2 of Nicotiana rustica
and hence might be expected to have equal frequencies for the alleles at any
locus, within the limits of sampling variation.
Secondly, we cannot be sure either that the pairs of alleles at different
loci are distributed at random with respect to each other in the way that
can be ensured in a defined diallel. Clearly we must take the possibility of
such association of the genes into account in carrying out the analysis and
interpreting its results.
Let us look into the consequences of these complications, starting with
that of unequal gene frequencies. Consider the case where a proportion $u_a$
of the parent lines are true-breeding for allele A and a proportion
$v_a\ (= 1 - u_a)$ are true-breeding for allele a. The mating of AA with AA
will then occur in $u_a^2$ of cases and that of aa with aa in $v_a^2$ of
cases, the remaining $2u_av_a$ of matings being AA × aa. The frequencies of
the types of mating, together with the genotypes and phenotypes in respect
of this gene difference, are shown in Table 26. The array means, variances
and covariances are also shown in the table. Just one point needs noting
about their derivation. The mating AA × AA, for example, constitutes
$u_a \times u_a = u_a^2$ of all matings in the table, but it constitutes
$u_a$ of the matings in the array stemming from AA parents. Thus the mean of
the AA array is $u_ad_a + v_ah_a$, not $u_a^2d_a + v_a^2h_a$. Bearing the
same point in mind, the variance of that same array is found as
$$V_r = u_ad_a^2 + v_ah_a^2 - (u_ad_a + v_ah_a)^2 = u_av_a(d_a - h_a)^2$$
and the covariance as
$$W_r = u_ad_a \cdot d_a - v_ad_a \cdot h_a - (u_a - v_a)d_a(u_ad_a + v_ah_a) = 2u_av_ad_a(d_a - h_a).$$
The mean, $V_r$ and $W_r$ of the aa array are found similarly.
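The working for the aa array runs parallel to that for the AA array. In the aa array the offspring are Aa (expression $h_a$) with frequency $u_a$ and aa (expression $-d_a$) with frequency $v_a$, while the corresponding non-recurrent parents have expressions $d_a$ and $-d_a$. A worked sketch of the three statistics, which agrees with the aa column of Table 26:

$$
\begin{aligned}
\text{mean} &= u_ah_a - v_ad_a,\\
V_r &= u_ah_a^2 + v_ad_a^2 - (u_ah_a - v_ad_a)^2 = u_av_a(d_a + h_a)^2,\\
W_r &= u_ad_a \cdot h_a + v_ad_a \cdot d_a - (u_a - v_a)d_a(u_ah_a - v_ad_a) = 2u_av_ad_a(d_a + h_a).
\end{aligned}
$$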
We can then see from the table that the changes in $V_r$ and $W_r$ between
the arrays are respectively
$$\Delta V_r = 4u_av_ad_ah_a \quad \text{and} \quad \Delta W_r = 4u_av_ad_ah_a.$$

Thus inequality of the frequencies of the alleles A and a makes no
difference to two important properties of $W_r$ and $V_r$. First, the arrays
will have the same $V_r$ and $W_r$ in the absence of dominance, i.e. when
$h_a = 0$, so that the arrays will all give the same point, within the limits
of sampling variation, on the $W_r$, $V_r$ graph. Secondly, where
$h_a \neq 0$, the line joining the points from the two arrays on the
$W_r/V_r$ graph will have a slope of 1. It will be observed that if
$u_a = v_a$, i.e. if the frequencies of A and a are equal, all these
expressions reduce to those found for the simple case discussed at the
beginning of the Chapter, as indeed they clearly should.

TABLE 26.
Diallel set of matings where u of the parents are homozygous for allele A,
and v (= 1 - u) are homozygous for allele a

                                      Female parent
                   Genotype           AA              aa              Mean
                   Frequency          u               v
                   Expression         d               -d              (u - v)d
Male parent  AA    u    d             AA  u²  d       Aa  uv  h       ud + vh
             aa    v    -d            Aa  uv  h       aa  v²  -d      uh - vd
Mean                                  ud + vh         uh - vd         (u - v)d + 2uvh
V_r                                   uv(d - h)²      uv(d + h)²      uv[d + (v - u)h]² + 4u²v²h² = ¼(D_R + H_R)
W_r                                   2uvd(d - h)     2uvd(d + h)     2uvd[d + (v - u)h] = ½D_W
V_P = 4uvd² = D_P
ΔW_r = 4uvdh     ΔV_r = 4uvdh
Each mating cell gives the progeny genotype, the frequency of the mating and the progeny phenotype.
In extending our consideration to two genetic differences, we note that
where the frequencies of A and a are $u_a$ and $v_a$ respectively among the
true-breeding parents, and the frequencies of B and b are similarly $u_b$ and
$v_b$, the alleles at the two loci will be distributed independently of each
other if the frequencies of AB, Ab, aB and ab parents are $u_au_b$, $u_av_b$,
$v_au_b$ and $v_av_b$ respectively. Given that this is the case, and
assuming that the effects of non-allelic genes are simply additive, that is
that there is no non-allelic interaction, it is not difficult to derive the
expressions for the array means, variances and covariances shown in
Table 27. These expressions reduce of course to those in Table 17 when
$u_a = v_a = u_b = v_b = \tfrac12$. It will be seen that for any pair of
arrays $\Delta V_r = \Delta W_r$. Thus, for example, the differences between
arrays aabb and AABB are
$\Delta V_r = \Delta W_r = 4(u_av_ad_ah_a + u_bv_bd_bh_b)$, while those between
arrays aaBB and AAbb are
$\Delta V_r = \Delta W_r = 4(u_av_ad_ah_a - u_bv_bd_bh_b)$. Thus in plotting
their $W_r$ against $V_r$ the four arrays will again give four points lying
on a straight line of slope 1; and, again, the array with the two dominant
alleles will have the lowest values of $W_r$ and $V_r$, and so will give the
lowest point on the graph, while the array with the two recessive alleles
will give the highest point, with the other two arrays giving intermediate
points. Thus the test of adequacy of the additive-dominance model developed
for the defined diallel in the previous section will apply to undefined
diallels. We should note, however, that an undefined diallel will reveal
failure of the model not only when the genes show non-allelic interaction,
i.e. are not independent in their action, but also when the genes show
non-random association among the parents, i.e. are non-independent in their
distribution. Finally, it is not difficult to see that these relations
between $W_r$ and $V_r$, and with them the test of goodness of fit of the
additive-dominance model, still hold good for three, four or indeed any
number of gene differences. They are in fact general properties of diallel
sets of matings.
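The claim that $\Delta V_r = \Delta W_r$ for any pair of arrays, whatever the allele frequencies, can be checked numerically. The sketch below is our own illustration rather than a procedure from the text: it enumerates every true-breeding parent type for loci assumed to be independent in distribution and additive in action, forms each array exactly (weighting the non-recurrent parents by their frequencies), and compares $W_r - V_r$ across arrays. It also checks the averages of $V_r$ and $W_r$ over arrays against $\tfrac14(D_R + H_R)$ and $\tfrac12 D_W$. The gene parameters in the example are arbitrary, hypothetical values.

```python
from itertools import product
from math import isclose, prod


def array_statistics(genes):
    """Exact V_r and W_r for every array of an undefined diallel.

    genes: list of (u, d, h), with u the frequency of the increasing allele.
    A parent is a tuple of booleans (True = homozygous for the increasing allele).
    Loci are assumed independent in distribution and additive in action.
    Returns {parent: (frequency, V_r, W_r)}.
    """
    parents = list(product([True, False], repeat=len(genes)))
    freq = {p: prod(u if inc else 1 - u for inc, (u, _, _) in zip(p, genes)) for p in parents}
    value = {p: sum(d if inc else -d for inc, (_, d, _) in zip(p, genes)) for p in parents}

    def offspring(p, q):
        # Both parents increasing: d; both decreasing: -d; otherwise heterozygous: h.
        return sum(d if (a and b) else (-d if not (a or b) else h)
                   for a, b, (_, d, h) in zip(p, q, genes))

    stats = {}
    for p in parents:
        ys = {q: offspring(p, q) for q in parents}
        mean_x = sum(freq[q] * value[q] for q in parents)
        mean_y = sum(freq[q] * ys[q] for q in parents)
        v_r = sum(freq[q] * (ys[q] - mean_y) ** 2 for q in parents)
        w_r = sum(freq[q] * (value[q] - mean_x) * (ys[q] - mean_y) for q in parents)
        stats[p] = (freq[p], v_r, w_r)
    return stats


def components(genes):
    """D_R, H_R and D_W as defined in the text, summed over genes (v = 1 - u)."""
    d_r = sum(4 * u * (1 - u) * (d + (1 - 2 * u) * h) ** 2 for u, d, h in genes)
    h_r = sum(16 * u**2 * (1 - u) ** 2 * h**2 for u, d, h in genes)
    d_w = sum(4 * u * (1 - u) * d * (d + (1 - 2 * u) * h) for u, d, h in genes)
    return d_r, h_r, d_w


# Hypothetical example: two genes with unequal allele frequencies.
genes = [(0.3, 1.0, 0.6), (0.8, 0.5, -0.4)]
stats = array_statistics(genes)
for p, (f, v_r, w_r) in stats.items():
    print(p, f"freq={f:.2f} Vr={v_r:.4f} Wr={w_r:.4f} Wr-Vr={w_r - v_r:.4f}")

d_r, h_r, d_w = components(genes)
mean_v = sum(f * v for f, v, _ in stats.values())
mean_w = sum(f * w for f, _, w in stats.values())
print(isclose(mean_v, (d_r + h_r) / 4), isclose(mean_w, d_w / 2))
```

Because $W_r - V_r$ comes out the same for all four arrays, the points lie on a line of slope 1. Introducing non-allelic interaction or correlated allele distributions into `offspring` or `freq` would break this, which is the basis of the test described above.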
So far nothing has been said about the genetical components of variation D
and H, and indeed when we turn to these we find complexities which were not
present in the case of the defined diallel. Turning back to the case of the
single gene difference in an undefined diallel (Table 26), we find that the
contribution this pair of alleles makes to $V_P$, the variance of the
parents, is no longer $d_a^2$ but takes the more general form
$4u_av_ad_a^2$, which of course becomes $d_a^2$ when the alleles are equally
frequent among the parents of the diallel, i.e. $u_a = v_a = \tfrac12$. With
two genes independent in their actions and their distribution,
$V_P = 4u_av_ad_a^2 + 4u_bv_bd_b^2$, and with any number of genes
$V_P = S(4uvd^2)$. We may thus write $V_P = D_P$ where $D_P = S(4uvd^2)$.
When we turn to array variances, however, while the contribution of A-a to
$V_r$ may still be written as the sum of two quadratic quantities, one of
which depends solely on $h^2$, the other no longer depends solely on $d^2$.
The contribution to $\bar V_r$ is in fact
$u_av_a[d_a + (v_a - u_a)h_a]^2 + 4u_a^2v_a^2h_a^2$, and, generalizing to any
number of genes independent in their actions and their distribution, the
genetical component of $\bar V_r$ is $S\{uv[d + (v-u)h]^2 + 4u^2v^2h^2\}$.
This can be cast in the form $\bar V_r = \tfrac14(D_R + H_R)$ when we use the
definitions $D_R = S\{4uv[d + (v-u)h]^2\}$ and $H_R = S(16u^2v^2h^2)$, which
again reduce to the standard forms $D = S(d^2)$ and $H = S(h^2)$ when
$u = v = \tfrac12$. We shall meet these components $D_R$ and $H_R$ again. The
covariances are different again, for $\bar W_r = \tfrac12 D_W$ where
$D_W = S\{4uvd[d + (v-u)h]\}$. It will be observed that the individual
contribution $4uvd[d + (v-u)h]$ to $D_W$ is the geometric mean of the
contributions $4uvd^2$ to $D_P$ and $4uv[d + (v-u)h]^2$ to $D_R$. This is
not surprising when we recall that $W_r$ is the average covariance of
offspring, whose average variance is $V_r$, with their non-recurrent
parents, whose variance is $V_P$.

TABLE 27.
Array frequencies, means, variances and covariances for two gene differences,
A-a with frequencies u_a and v_a, and B-b with frequencies u_b and v_b

Array    Frequency  Mean                                   V_r                                              W_r
AABB     u_a u_b    u_a d_a + v_a h_a + u_b d_b + v_b h_b  u_a v_a(d_a - h_a)² + u_b v_b(d_b - h_b)²       2u_a v_a d_a(d_a - h_a) + 2u_b v_b d_b(d_b - h_b)
AAbb     u_a v_b    u_a d_a + v_a h_a + u_b h_b - v_b d_b  u_a v_a(d_a - h_a)² + u_b v_b(d_b + h_b)²       2u_a v_a d_a(d_a - h_a) + 2u_b v_b d_b(d_b + h_b)
aaBB     v_a u_b    u_a h_a - v_a d_a + u_b d_b + v_b h_b  u_a v_a(d_a + h_a)² + u_b v_b(d_b - h_b)²       2u_a v_a d_a(d_a + h_a) + 2u_b v_b d_b(d_b - h_b)
aabb     v_a v_b    u_a h_a - v_a d_a + u_b h_b - v_b d_b  u_a v_a(d_a + h_a)² + u_b v_b(d_b + h_b)²       2u_a v_a d_a(d_a + h_a) + 2u_b v_b d_b(d_b + h_b)
Overall             S[(u - v)d + 2uvh]                     S{uv[d + (v - u)h]² + 4u²v²h²} = ¼(D_R + H_R)   S{2uvd[d + (v - u)h]} = ½D_W
V_P = 4(u_a v_a d_a² + u_b v_b d_b²) = D_P
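The geometric-mean relation for individual genes carries over to the sums as an inequality, which is used in the example of the next section. Writing, for each gene, $a = 2\sqrt{uv}\,d$ and $b = 2\sqrt{uv}\,[d + (v-u)h]$, so that $a^2$, $ab$ and $b^2$ are the contributions to $D_P$, $D_W$ and $D_R$, the Cauchy-Schwarz inequality gives

$$
D_W = S(ab) \le \sqrt{S(a^2)\,S(b^2)} = \sqrt{D_P D_R},
$$

with equality only when $b/a$, that is $[d + (v-u)h]/d$, takes the same value for every gene.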
Thus the simple assessment of the components of variation that was possible
with the defined diallel is no longer possible with the undefined. The very
differences in the definitions of D and H as they appear in $V_P$, $V_r$ and
$W_r$ can, however, be turned to profit by a more complex analysis of the
relations between $V_P$, $V_r$ and $W_r$, which can yield not only a measure
of average dominance of the form $\sqrt{S(h^2)/S(d^2)}$, but also a measure of
the average value of uv, and hence of the dissimilarity in the frequencies of
alleles, and even, under certain circumstances, of the relative frequencies
of dominant and recessive alleles. This analysis, which also brings in the
variance of array means and the corresponding covariance, $V_{\bar r}$ and
$W_{\bar r}$, would however take us beyond the scope of the present
discussion. It is set out fully in M and J's discussion of diallels.

18. An example of an undefined diallel


The number of parent lines in a defined diallel is rigidly fixed by the
number of combinations of the genes involved: thus with two genes the number
of parents is four, with three genes it is eight, and so on. In undefined
diallels, on the other hand, there is no such restriction on the number of
parent lines, and indeed any number can be used. The values of u and v will
reflect the frequencies of the two alleles in the actual set of parents
chosen, and although the frequencies of the different combinations of genes
among the parents cannot generally agree precisely with the frequencies
$u_au_b$, $u_av_b$, $v_au_b$, $v_av_b$ and so on expected from independent
distribution of the genes, provided the departures fall within sampling
variation the assumption of independent distribution will be sufficiently
well realized for the diallel analysis to proceed without disturbance arising
from non-independence of the distributions.
As an example of the analysis of an undefined diallel we will take the
9 × 9 diallel from Nicotiana rustica quoted by M and J. The parents were nine
inbred lines, the character was date of opening of the first flower in days
after 1st July (the choice of date for the origin is of no consequence, as a
change of it merely alters the mean of the experiment), and the experiment
was laid out as 2 blocks, each of which consisted of 81 plots to which the 81
progenies of the 9 × 9 matings were assigned at random. Each plot comprised
5 plants and the datum from each plot is the mean flowering time of the 5
plants it contained. The flowering time of each of the 81 progenies, averaged
over the 2 blocks, is shown in Table 28. There was no overall difference
between the 2 blocks in respect of flowering time, and all 81 df for
differences between the duplicate progenies in the 2 blocks may therefore be
used in the estimate of error, their mean square being 3.858.
The analysis proceeds in a way exactly analogous to that of the defined
diallel in Section 16. The analysis of variance corresponding to that in
Table 19 may be carried out from the data in Table 28, bearing in mind that
since the observations in this table are the means of duplicate plots, the SS
found from them must be multiplied by 2 to put them on the single-plot basis.
There will be 80 df among the 81 progenies: 8 for differences among the 9
female parents, 8 for differences among the 9 male parents and 8 × 8 = 64 for
interaction. The analysis of variance is set out in Table 29, and all these
items, for female parents, male parents and interaction, are highly
significant when tested against the duplicate error variance of 3.858.
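The partition just described can be reproduced from Table 28 with a short plain-Python sketch (it is not the Hayman analysis of variance). It treats the 81 entries as a two-way table, rows = male parent and columns = female parent as in Table 28, multiplies each SS by 2 to return to the single-plot basis, and also forms the 36-df reciprocal SS from the differences between reciprocal entries. The names `TABLE_28` and `diallel_anova` are our own.

```python
# Table 28: rows = male parent 1..9, columns = female parent 1..9 (means of 2 plots).
TABLE_28 = [
    [38.90, 26.70, 39.80, 34.80, 25.10, 29.80, 35.70, 33.80, 25.30],
    [23.90, 27.05, 25.00, 23.10, 21.50, 26.20, 23.40, 20.60, 20.20],
    [34.40, 26.60, 48.80, 29.55, 25.00, 31.50, 36.10, 24.40, 26.00],
    [36.10, 23.50, 31.20, 34.10, 23.40, 29.35, 27.20, 22.30, 25.00],
    [26.50, 23.20, 26.00, 25.50, 26.60, 27.50, 27.20, 20.20, 24.20],
    [28.40, 24.10, 30.30, 31.90, 24.15, 27.00, 27.70, 22.40, 24.80],
    [36.90, 24.70, 41.80, 33.90, 30.10, 29.80, 37.00, 24.40, 29.10],
    [26.80, 19.30, 27.80, 22.10, 19.20, 18.80, 22.70, 15.30, 21.80],
    [25.30, 23.30, 24.90, 24.00, 22.50, 21.30, 27.40, 19.00, 25.40],
]
PLOTS_PER_ENTRY = 2  # each entry is the mean of duplicate plots


def diallel_anova(x, plots=PLOTS_PER_ENTRY):
    """Mean squares on the single-plot basis for a square diallel table x."""
    n = len(x)
    grand = sum(map(sum, x)) / n**2
    row_means = [sum(row) / n for row in x]
    col_means = [sum(x[i][j] for i in range(n)) / n for j in range(n)]
    ss_male = plots * n * sum((r - grand) ** 2 for r in row_means)
    ss_female = plots * n * sum((c - grand) ** 2 for c in col_means)
    ss_total = plots * sum((v - grand) ** 2 for row in x for v in row)
    ss_inter = ss_total - ss_male - ss_female
    # Reciprocals: each pair contributes plots * (x_ij - x_ji)^2 / 2.
    ss_recip = sum(plots * (x[i][j] - x[j][i]) ** 2 / 2
                   for i in range(n) for j in range(i + 1, n))
    df_main, df_inter, df_recip = n - 1, (n - 1) ** 2, n * (n - 1) // 2
    return {
        "female": ss_female / df_main,
        "male": ss_male / df_main,
        "interaction": ss_inter / df_inter,
        "reciprocals": ss_recip / df_recip,
    }


ms = diallel_anova(TABLE_28)
for error_label, error_ms in [("duplicate", 3.858), ("reciprocal", ms["reciprocals"])]:
    for item in ("female", "male", "interaction"):
        print(f"{item:12s} MS={ms[item]:8.3f} VR vs {error_label} error={ms[item] / error_ms:6.1f}")
```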
There are 36 pairs of reciprocal crosses, and we can therefore find a SS
corresponding to 36 df for differences between reciprocals, in just the same
way that we did in the earlier example. This turns out to give a MS of
290.092/36 = 8.058, which has a probability of between 0.01 and 0.001 when
tested against the duplicate error. It must therefore be regarded as
significant. We cannot, however, regard it as clearly demonstrating an
extra-nuclear element in the determination of flowering time: true, it could
reflect such a determinant, but it could arise in other ways too. For
example, if the inbreeding of the parent lines had not been completely
effective and some residual variation remained in them, and if precisely the
same parent plants had not been used in making the reciprocal crosses,
differences such as those observed could have arisen. Equally, if the seed
for each family was sown in a single seed pan, members of a family, including
those plants grown in separate blocks as well as those in the same block,
could resemble one another more than they resembled the plants from the
reciprocal crosses started off in a different seed pan. This would produce
the result observed, and later experiments in fact pointed to it as the most
likely cause. Whatever the explanation, however, it is clear that the
duplicate error variance is not a reliable yardstick to use in assessing the
significance of the items in the analysis of variance. When tested against
the reciprocal mean square, 8.058, the probabilities of the variances for
female and male parents are still very small, and even that for interaction
is as low as 0.001. Thus even when tested against this new and higher
estimate of error, all the items are still significant.
It should be noted that this test of significance is not strictly valid,
since the 36 df for reciprocal differences are not orthogonal to the 3 items,
female parents, male parents and interaction, contained in the 80 df that
these 3 items jointly comprise. Since, however, the reciprocals mean square is
lower than any of the other 3, deduction of the 36 df from the 80 could only
serve to raise the mean square attaching to the residual df and so raise the
VR and hence the significance. Although our test is not strictly valid, it is
a conservative test, and we can therefore accept the significance that it
reveals for all 3 items in the main analysis of variance. The Hayman analysis
of variance of these data described by M and J overcomes this difficulty and
confirms the conclusions from this simple analysis.

TABLE 28.
Flowering-time in a 9 × 9 diallel set of matings in Nicotiana rustica.
All entries are means of duplicate observations from two blocks.
The flowering-times of the parental lines are the entries on the leading diagonal.

                                    Female parent
Male parent  1       2       3       4       5       6       7       8       9       Mean
1            38.90   26.70   39.80   34.80   25.10   29.80   35.70   33.80   25.30   32.2111
2            23.90   27.05   25.00   23.10   21.50   26.20   23.40   20.60   20.20   23.4389
3            34.40   26.60   48.80   29.55   25.00   31.50   36.10   24.40   26.00   31.3722
4            36.10   23.50   31.20   34.10   23.40   29.35   27.20   22.30   25.00   28.0167
5            26.50   23.20   26.00   25.50   26.60   27.50   27.20   20.20   24.20   25.2111
6            28.40   24.10   30.30   31.90   24.15   27.00   27.70   22.40   24.80   26.7500
7            36.90   24.70   41.80   33.90   30.10   29.80   37.00   24.40   29.10   31.9667
8            26.80   19.30   27.80   22.10   19.20   18.80   22.70   15.30   21.80   21.5333
9            25.30   23.30   24.90   24.00   22.50   21.30   27.40   19.00   25.40   23.6778
Mean         30.8000 24.2722 32.8444 28.7722 24.1722 26.8056 29.3778 22.4889 24.6444 27.1309

TABLE 29.
Analysis of variance of the 9 × 9 diallel in Nicotiana rustica

                                       Against duplicate error   Against reciprocals
Item              df    MS             VR       P                VR      P
Female parents     8    221.876        56.8     v.s.             27.5    v.s.
Male parents       8    289.541        74.1     v.s.             35.9    v.s.
Interaction       64     20.923         5.4     <0.001            2.6    0.001
Duplicate error   81      3.858
Reciprocals       36      8.058
v.s. = very small
The significant interaction item in the analysis of variance shows us that
there is non-additive heritable variation, and we must now continue the
analysis to discover whether this non-additive element can be accounted for
by dominance or whether non-independence of the effects of non-allelic genes
is also involved. Proceeding just as we did in the earlier example, the
values of $W_r + V_r$ and $W_r - V_r$, taken from M and J, are listed for each
of the nine arrays from each of the two blocks in the upper part of Table 30,
with their analyses of variance in the lower part of the table. As in the
Drosophila example, we have not taken out the single degree of freedom for
the block difference, because our analysis of the original data again shows
no evidence of such a difference. It is clear that $W_r + V_r$ varies
significantly from array to array whereas $W_r - V_r$ does not. There is
therefore clear evidence of dominance but no evidence of non-independence in
the effects of non-allelic genes. This means that not only is there no
evidence of interaction between non-allelic genes in producing their effects,
but also that there is no evidence of the genes being associated in a
non-random way in their distribution among the parents.

TABLE 30.
W_r + V_r and W_r - V_r from the two blocks, 1 and 2

          W_r + V_r             W_r - V_r
Array     Block 1   Block 2     Block 1   Block 2
1          81.939    61.763     13.728     3.419
2          25.105    14.731     11.650     6.141
3         159.529   112.761      9.055     2.106
4          80.289    41.610      7.453     9.125
5          29.814    19.128     14.277     4.919
6          51.130    36.211     21.844    15.149
7          91.263    72.303     20.668    16.212
8          55.178    55.213     19.008    16.410
9          29.152    15.093     15.535     4.691

Analyses of variance
                         W_r + V_r                  W_r - V_r
Item             df     MS       VR     P          MS      VR     P
Between arrays    8     2736.0   9.67   <0.001     50.34   1.96   0.20-0.10
Within arrays     9      282.9                     25.95
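The two analyses of variance in the lower part of Table 30 are simple one-way analyses of nine arrays with two blocks each. A minimal sketch, which, like the text, does not remove the single degree of freedom for blocks; the function name is our own.

```python
# Table 30: (block 1, block 2) values for arrays 1..9.
W_PLUS_V = [(81.939, 61.763), (25.105, 14.731), (159.529, 112.761), (80.289, 41.610),
            (29.814, 19.128), (51.130, 36.211), (91.263, 72.303), (55.178, 55.213),
            (29.152, 15.093)]
W_MINUS_V = [(13.728, 3.419), (11.650, 6.141), (9.055, 2.106), (7.453, 9.125),
             (14.277, 4.919), (21.844, 15.149), (20.668, 16.212), (19.008, 16.410),
             (15.535, 4.691)]


def between_within(pairs):
    """One-way ANOVA: arrays as groups, two blocks per array; block SS not removed."""
    k = len(pairs)
    grand = sum(a + b for a, b in pairs) / (2 * k)
    ss_between = sum(2 * ((a + b) / 2 - grand) ** 2 for a, b in pairs)
    ss_within = sum((a - b) ** 2 / 2 for a, b in pairs)  # 1 df per array
    ms_between, ms_within = ss_between / (k - 1), ss_within / k
    return ms_between, ms_within, ms_between / ms_within


for label, data in [("Wr + Vr", W_PLUS_V), ("Wr - Vr", W_MINUS_V)]:
    between, within, vr = between_within(data)
    print(f"{label}: between MS={between:.2f} within MS={within:.2f} VR={vr:.2f}")
```

Run on the printed entries, the within-array MS for $W_r - V_r$ comes out at about 25.8 rather than the printed 25.95; either way its VR is well short of significance.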
We can move on to the regression of $W_r$ on $V_r$. The arrays, pooled over
blocks and reciprocals, are set out in Table 31, together with the values of
$V_r$ and $W_r$ for each array. The SS for $W_r$ is 2680.75 and for $V_r$ is
2892.76, while the SCP for $W_r$ and $V_r$ is 2690.30. The linear regression
of $W_r$ on $V_r$ is thus $b = 2690.30/2892.76 = 0.9300$, which does not
differ significantly from 1, although of course it departs very significantly
from 0. Again there is clear evidence of dominance, but no evidence of
non-independence in the effects of non-allelic genes. Evidently the
additive-dominance model is adequate to account for the behaviour of this
diallel.

TABLE 31.
Half-diallel table pooled over the two blocks, with array means, V_r and W_r

Array   1      2      3      4      5      6      7      8      9      Mean     V_r       W_r
1       38.90  25.30  37.10  35.45  25.80  29.10  36.30  30.30  25.30  31.506   30.1665   38.9886
2              27.05  25.80  23.30  22.35  25.15  24.05  19.95  21.75  23.856    5.0059   13.4255
3                     48.80  30.38  25.50  30.90  38.95  26.10  25.45  32.108   64.8375   70.3335
4                            34.10  24.45  30.63  30.55  22.20  24.50  28.394   23.8661   33.6539
5                                   26.60  25.83  28.65  19.70  23.35  24.692    6.8419   16.0434
6                                          27.00  28.75  20.60  23.05  26.778   12.0930   30.3851
7                                                 37.00  23.55  28.25  30.672   31.0601   49.6899
8                                                        15.30  20.40  22.011   18.3805   35.8031
9                                                               25.40  24.161    5.3974   15.8965
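The array statistics of Table 31 and the regression quoted above can be recomputed from the half-diallel entries. In this sketch each array is a row of the symmetric 9 × 9 table, $V_r$ is its variance, $W_r$ its covariance with the non-recurrent parental values (the leading diagonal), and both use the divisor n - 1, which is the choice that reproduces the printed values. Because the pooled entries are rounded to two decimals, the results agree with Tables 31 and 32 only to about the third decimal. The names `UPPER`, `full_table` and `covariance` are our own.

```python
import statistics

# Table 31, upper triangle: row i holds the pooled entries for columns i..9;
# the first entry of each row is the parental line itself.
UPPER = [
    [38.90, 25.30, 37.10, 35.45, 25.80, 29.10, 36.30, 30.30, 25.30],
    [27.05, 25.80, 23.30, 22.35, 25.15, 24.05, 19.95, 21.75],
    [48.80, 30.38, 25.50, 30.90, 38.95, 26.10, 25.45],
    [34.10, 24.45, 30.63, 30.55, 22.20, 24.50],
    [26.60, 25.83, 28.65, 19.70, 23.35],
    [27.00, 28.75, 20.60, 23.05],
    [37.00, 23.55, 28.25],
    [15.30, 20.40],
    [25.40],
]


def full_table(upper):
    """Expand the upper triangle into the full symmetric table."""
    n = len(upper)
    table = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(upper):
        for offset, value in enumerate(row):
            table[i][i + offset] = table[i + offset][i] = value
    return table


def covariance(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


table = full_table(UPPER)
parents = [table[i][i] for i in range(len(table))]
v_r = [statistics.variance(array) for array in table]   # variance within each array
w_r = [covariance(array, parents) for array in table]   # covariance with non-recurrent parents
array_means = [statistics.fmean(array) for array in table]
v_p = statistics.variance(parents)
v_rbar = statistics.variance(array_means)               # variance of array means

# Regression of Wr on Vr over the nine arrays, with its standard error (7 df).
mv, mw = statistics.fmean(v_r), statistics.fmean(w_r)
ss_v = sum((v - mv) ** 2 for v in v_r)
ss_w = sum((w - mw) ** 2 for w in w_r)
scp = sum((v - mv) * (w - mw) for v, w in zip(v_r, w_r))
b = scp / ss_v
se_b = ((ss_w - scp**2 / ss_v) / (len(v_r) - 2) / ss_v) ** 0.5
print(f"Vp={v_p:.4f} mean Vr={mv:.4f} mean Wr={mw:.4f} V(array means)={v_rbar:.4f}")
print(f"b={b:.4f} SE={se_b:.4f} t(b=1)={(1 - b) / se_b:.2f} t(b=0)={b / se_b:.2f}")
```

With 7 df for the regression, the two t values can be compared with the usual tables: 2.365 at the 5% level and 5.408 at the 0.1% level.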
The graph showing the regression of $W_r$ on $V_r$ is plotted in Fig. 9. The
lowest point is from array 2, whose parent line must therefore carry the
largest number of dominant alleles, while the highest is from array 3, which
must carry the smallest number of dominant alleles. The other arrays give
intermediate points whose order shows the order in numbers of dominant
alleles carried by the parent. We can compare the value of $W_r + V_r$ for
each array with the mean flowering time of the common parent of that array
to see whether the distribution of dominant alleles is correlated with the
phenotypes of the common parent. The parental flowering times (P) are
plotted against $W_r + V_r$ in Fig. 10, from which it is clear that in
general the later-flowering lines give the larger values of $W_r + V_r$ and
so must be carrying fewer dominant genes. There is in fact a significant
correlation of r = 0.779 between flowering time and $W_r + V_r$. Evidently
the genes which give early flowering tend to be dominant. The anomalous
position of line 8, which, while being the earliest flowering of all the
parents, has an intermediate value of $W_r + V_r$, shows however that not all
the genes for early flowering can be dominant, and suggests that there is an
ambidirectional element in the dominance relation of the flowering-time
genes, as would be expected if this character had been under stabilizing
selection (Mather, 1973).
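A short sketch of the correlation. It assumes that the $W_r + V_r$ values used are those formed from the pooled $V_r$ and $W_r$ of Table 31 (this reproduces r = 0.779), and it also repeats the calculation with line 8 left out, to show how much that single point lowers the correlation. The function name `pearson_r` is our own.

```python
from math import sqrt

# Parental flowering times P (leading diagonal of Table 31) and the V_r, W_r columns.
P = [38.90, 27.05, 48.80, 34.10, 26.60, 27.00, 37.00, 15.30, 25.40]
V_R = [30.1665, 5.0059, 64.8375, 23.8661, 6.8419, 12.0930, 31.0601, 18.3805, 5.3974]
W_R = [38.9886, 13.4255, 70.3335, 33.6539, 16.0434, 30.3851, 49.6899, 35.8031, 15.8965]


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


w_plus_v = [w + v for w, v in zip(W_R, V_R)]
r_all = pearson_r(P, w_plus_v)
t_all = r_all * sqrt(len(P) - 2) / sqrt(1 - r_all**2)   # t with n - 2 = 7 df

# Repeat without parental line 8 (index 7), whose point is anomalous in Fig. 10.
keep = [i for i in range(len(P)) if i != 7]
r_without_8 = pearson_r([P[i] for i in keep], [w_plus_v[i] for i in keep])
print(f"r = {r_all:.3f} (t = {t_all:.2f}, 7 df); r without line 8 = {r_without_8:.3f}")
```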
Fig. 9. The $W_r/V_r$ graph for flowering time in the undefined diallel among
nine lines of Nicotiana rustica. The parental line giving rise to the array
represented by each point is indicated by the number against it.

Finally we turn to the components of variation. The values found for $V_P$,
$\bar W_r$, $\bar V_r$ and $V_{\bar r}$ are listed in Table 32. Before they
can be used for estimating the components of variation they must be
corrected for their non-genetic components. The corrections are derived using
the reciprocal mean square as the estimate of error, $V_E = 8.058$. The
coefficients to be applied to $V_E$ to find the corrections are obtained in
the same way as in the earlier example, bearing in mind that there are now 9
parent lines and 9 items in each array, not 4 as in the earlier example.
These coefficients are shown in the table, together with the actual values of
the corrections and the resulting estimates of the heritable components of
$V_P$, etc. $V_P$, $2\bar W_r$ and $4V_{\bar r}$ yield estimates of $D_P$,
$D_W$ and $D_R$ respectively, while $4(\bar V_r - V_{\bar r})$ gives an
estimate of $H_R$. Now $D_P = S(4uvd^2)$, $D_W = S\{4uvd[d + (v-u)h]\}$ and
$D_R = S\{4uv[d + (v-u)h]^2\}$. Thus each term in $D_W$ is the geometric mean
of the corresponding terms in $D_P$ and $D_R$. If the ratio of
$[d + (v-u)h]$ to $d$ is constant over all the genes, $D_W$ will itself be
the geometric mean of $D_P$ and $D_R$, but if this ratio varies $D_W$ must be
less than $\sqrt{D_P D_R}$. In fact, as we see from Table 32,
$\sqrt{D_P D_R} = \sqrt{90.162 \times 53.947} = 69.742$, while $D_W$ is
66.709. The agreement is good, and although $D_W$ is slightly less than
$\sqrt{D_P D_R}$ there is little indication of any serious variation in the
ratio $[d + (v-u)h]/d$. Evidently all the genes have much the same properties
in this respect.

Fig. 10. $W_r + V_r$ from each array of the Nicotiana rustica diallel plotted
against P, the mean flowering time (expressed in days after 1st July) of the
parental line giving rise to that array. Note that all the points lie about a
straight regression line except that from parental line 8. With the exception
of line 8, the earlier the flowering of the parent, the smaller the
corresponding $W_r + V_r$, showing that in general the alleles for earlier
flowering are dominant to those for later flowering. The position of point 8,
however, indicates that this dominance relation no longer holds when the
parent's flowering time is earlier than mid-July.

TABLE 32.
Components of variation for flowering-time in Nicotiana rustica

Statistic                         Total     Non-genetic             Genetic    Expectation
V_P (variance of parents)         94.1907   (1/2)V_E = 4.0291       90.1616    D_P
W_r (mean over arrays)            33.8022   (1/18)V_E = 0.4477      33.3545    ½D_W
V_r̄ (variance of array means)    13.7355   (5/162)V_E = 0.2487     13.4868    ¼D_R
V_r (mean over arrays)            21.9610   (5/18)V_E = 2.2384      19.7226    ¼(D_R + H_R)

D_P = 4S[uvd²]                 = V_P             = 90.1616
D_W = 4S{uvd[d + (v - u)h]}    = 2W_r            = 66.7090
D_R = 4S{uv[d + (v - u)h]²}    = 4V_r̄            = 53.9472
H_R = 16S[u²v²h²]              = 4(V_r - V_r̄)    = 24.9432
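The corrections and the derived components in Table 32 can be checked with a few lines of arithmetic. The coefficients of $V_E$ used here (1/2, 1/18, 5/162 and 5/18) are those implied by the corrections printed in the table, that is, each correction divided by $V_E$; they are not derived afresh, and the dictionary keys are our own labels.

```python
from math import sqrt

V_E = 8.058  # reciprocal mean square, used as the error variance

# Totals from Table 32, and the coefficients of V_E implied by its corrections
# (correction / V_E) for 9 parents and 2 blocks.
TOTALS = {"Vp": 94.1907, "Wr": 33.8022, "V_rbar": 13.7355, "Vr": 21.9610}
COEFFS = {"Vp": 1 / 2, "Wr": 1 / 18, "V_rbar": 5 / 162, "Vr": 5 / 18}

genetic = {k: TOTALS[k] - COEFFS[k] * V_E for k in TOTALS}

D_P = genetic["Vp"]
D_W = 2 * genetic["Wr"]
D_R = 4 * genetic["V_rbar"]
H_R = 4 * (genetic["Vr"] - genetic["V_rbar"])
geo_mean = sqrt(D_P * D_R)  # upper bound for D_W

print(f"D_P={D_P:.4f} D_W={D_W:.4f} D_R={D_R:.4f} H_R={H_R:.4f}")
print(f"sqrt(D_P*D_R)={geo_mean:.3f}; D_R/D_W={D_R / D_W:.2f}; D_R/D_P={D_R / D_P:.2f}")
```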
In the defined diallel of the earlier example (Section 16) all u = v, and
$D_P$, $D_W$ and $D_R$ will therefore all be expected to yield estimates of
$D = S(d^2)$. While $D_R$ actually turned out to be less than $D_P$, with
$D_W$ intermediate in value, the differences were small, $D_R$ having a value
over 0.92 that of $D_P$. The differences could therefore fairly be attributed
to sampling variation, and all three could be brought together to give a
common, overall estimate of $D = S(d^2)$.
In the present case, however, the differences among $D_P$, $D_W$ and $D_R$
are marked: $D_R$ is only 0.81 $D_W$ and 0.60 $D_P$. So here $[d + (v-u)h]$
must be less than d; that is, $(v-u)h$ must be negative, or, to put it in
other words, $(v-u)$ and h must be of opposite sign. h will be positive when
the increasing allele is dominant, that is the allele which when homozygous
contributes d to the phenotype (e.g. A), and h will be negative when the
decreasing allele is dominant, that is the allele which when homozygous
contributes -d to the phenotype (e.g. a). Now u is the frequency of the
increasing allele and v that of the decreasing allele, so that v - u will be
positive when the decreasing allele is the more common. So for $(v-u)h$ to be
negative, the increasing allele must be the more common (i.e. v - u
negative) when it is dominant (i.e. h positive); and equally the decreasing
allele must be the more common (v - u positive) when it is dominant (i.e. h
negative). The conditions for $D_R < D_W < D_P$ are thus not only that
dominance is present (otherwise all h = 0) and allele frequencies unequal
(otherwise v - u = 0), but further that the dominant alleles are
preponderantly more common than the recessives. Since in the present case the
decreasing alleles, leading to earlier flowering times, must in the main be
dominant, they must in general be more common than their increasing
counterparts; in other words, since h is preponderantly negative, v must in
general be greater than u.
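A one-gene sketch makes the sign argument concrete. The values of d, h and the allele frequencies are hypothetical; the function simply evaluates the single-gene contributions $4uvd^2$, $4uvd[d + (v-u)h]$ and $4uv[d + (v-u)h]^2$ to $D_P$, $D_W$ and $D_R$ defined earlier.

```python
def contributions(u, d, h):
    """Single-gene contributions to D_P, D_W and D_R (u = frequency of increasing allele)."""
    v = 1 - u
    k = d + (v - u) * h
    return 4 * u * v * d**2, 4 * u * v * d * k, 4 * u * v * k**2


# Hypothetical gene in which the decreasing allele is dominant (h < 0).
d, h = 1.0, -0.6
for v in (0.7, 0.5, 0.3):  # frequency of the decreasing (dominant) allele
    d_p, d_w, d_r = contributions(1 - v, d, h)
    print(f"v={v}: D_P={d_p:.3f} D_W={d_w:.3f} D_R={d_r:.3f}")
```

Only when the dominant allele is the more common one (here v = 0.7) does the ordering $D_R < D_W < D_P$ appear; with equal frequencies the three coincide, and with the dominant allele rarer the ordering reverses.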
It is possible to take the analysis still further and arrive at estimates
of the average dominance (h/d) and of the average value of uv (and hence
of u and v); but to do so would take us beyond the scope of our present
discussion. The methods of doing so are set out by M and J.
Genic interaction and linkage

19. Non-allelic interaction


In the analyses of the foregoing chapters we have assumed that, save in one
respect, the different genes were independent of one another in the
contributions that they made to the various statistics (means, variances and
covariances) under discussion. To put it in other words, we have assumed that
the gene effects were simply additive, the exceptional respect being that we
have accommodated dominance by incorporating the parameter h in the models we
have constructed and tested.
Our analyses have included tests of the validity of this assumption (that,
dominance apart, the genes were independent of each other in their
contributions to the means, variances and covariances) in the form of, for
example, scaling tests in the analysis of means or tests of the constancy of
$W_r - V_r$ in diallel analysis. When subjected to these tests, the
additive-dominance model by no means always proves to be adequate for the
interpretation of the data: we must then conclude that the assumption of
independence is invalid. Nor can we assume that the choice of a more
appropriate scale on which to represent our measurements would always
overcome the problem, for, as we have seen earlier, in many cases a scalar
transformation will clearly not serve to remove the difficulty. We need,
therefore, means of explicitly accommodating the consequences of
non-independence in the analysis.
Now the genes may show non-independence in two ways. First, they
may be influenced by one another in their expression, i.e. they may
interact in producing their effects. Secondly, they may be correlated
with one another in their distribution among the individuals whose
phenotypes are under investigation. We will consider interaction first,
and to see how we proceed let us return for a moment to dominance.
In the absence of dominance, individuals heterozygous for the gene, Aa,
would display a phenotype midway between those of the homozygotes AA and aa.
The effect of substituting allele A for a would be independent of whether the
allelic gene also present was A or a: the effects of the alleles would be
simply additive and there would be no need to incorporate h into the model.
The incorporation of h is at once a recognition that alleles need not be
independent of each other in exerting their effects, and the provision of a
parameter by which their interaction can be accommodated and measured.
Dominance is thus the interaction of allelic genes, and h is the parameter by
which this allelic interaction is measured. We require corresponding means of
representing and measuring the interaction of non-allelic genes, or
non-allelic interaction as it is often called.
Consider the simplest case of two gene pairs, A-a and B-b. These can
give rise to nine different genotypes each with its own phenotypic charac-
teristics as shown in Table 33. The differences among these phenotypes

TABLE 33.
Phenotypes from the nine genotypes comprising all combinations
of A-a and B-b

AA Aa aa
BB da +db ha+db -da+db
+iab +jba -iab

Bb da + hb ha + hb -da+hb
+jab +lab -jab

bb da-db ha-db -da-db


-iab -jba +iab

can therefore be completely described by eight parameters, which cor-


respond of course to the 8 df among the nine observations. Four of these
parameters we have already defined, namely da , db' ha and h b . The remain-
ing four parameters can then be conveniently defined as representing re-
spectively the interactions of da and db' da and hb' ha and db and ha andh b •
Now da measures the difference in phenotype between AA and aa, and
similarly db that between BB and bb. If da and db are independent, da
will be the same whether the difference AA-aa is measured in BB or bb
individuals. Thus with independence AABB - aaBB = AAbb - aabb or
AABB - aaBB - AAbb + aabb = 0, where AABB is the phenotype of
Non-allelic interaction 101
AABB, etc. Similarly in respect of $d_b$, $\overline{AABB} - \overline{AAbb} = \overline{aaBB} - \overline{aabb}$, or $\overline{AABB} - \overline{aaBB} - \overline{AAbb} + \overline{aabb} = 0$, as before. We can therefore accommodate prospective interaction of $d_a$ and $d_b$ by including a further parameter $i_{ab}$ such that the phenotype of AABB is $d_a + d_b + i_{ab}$, that of AAbb is $d_a - d_b - i_{ab}$, that of aaBB is $-d_a + d_b - i_{ab}$ and that of aabb is $-d_a - d_b + i_{ab}$. Then the difference of AA and aa taken over both BB and bb genotypes is $(\overline{AABB} - \overline{aaBB}) + (\overline{AAbb} - \overline{aabb}) = 4d_a$, since the $d_b$'s and $i_{ab}$'s cancel out. Similarly the overall difference of BB and bb is $(\overline{AABB} - \overline{AAbb}) + (\overline{aaBB} - \overline{aabb}) = 4d_b$, and the interaction of these differences is $\overline{AABB} - \overline{aaBB} - \overline{AAbb} + \overline{aabb} = 4i_{ab}$. The relations of these four completely homozygous classes have thus been described completely by the introduction of the new parameter $i_{ab}$ representing the interaction of $d_a$ and $d_b$. When there is no such interaction, $i_{ab} = 0$, since $\overline{AABB} - \overline{aaBB} - \overline{AAbb} + \overline{aabb} = 4i_{ab} = 0$.
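
The three contrasts can be checked numerically. The following Python sketch is illustrative only: the function names are our own, and phenotypes are taken as deviations from m. It recovers $4d_a$, $4d_b$ and $4i_{ab}$ from the four homozygote phenotypes.

```python
def homozygotes(da, db, i):
    """Phenotypes (as deviations from m) of the four homozygotes under the i_ab model."""
    return {
        "AABB": da + db + i,
        "AAbb": da - db - i,
        "aaBB": -da + db - i,
        "aabb": -da - db + i,
    }


def contrasts(p):
    """Return (4*d_a, 4*d_b, 4*i_ab) recovered from the four homozygote phenotypes."""
    four_da = (p["AABB"] - p["aaBB"]) + (p["AAbb"] - p["aabb"])
    four_db = (p["AABB"] - p["AAbb"]) + (p["aaBB"] - p["aabb"])
    four_i = p["AABB"] - p["aaBB"] - p["AAbb"] + p["aabb"]
    return four_da, four_db, four_i


print(contrasts(homozygotes(da=3, db=2, i=1)))  # (12, 8, 4)
print(contrasts(homozygotes(da=3, db=2, i=0)))  # (12, 8, 0): no interaction
```
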
Turning to the relation of $d_a$ and $h_b$: since $d_a$ represents the difference between AA and aa, absence of interaction implies that $h_b$ will be the same whether measured in individuals that are AA or individuals that are aa. In the presence of interaction between $d_a$ and $h_b$, these two measurements will not be the same, and we can accommodate the interaction by including a new parameter $j_{ab}$ such that it is added in the specification of AABb, which is basically $d_a + h_b$, but subtracted in the specification of aaBb, which is basically $-d_a + h_b$. In the absence of interaction $j_{ab} = 0$, and its value provides a measure of any interaction that may be present between $d_a$ and $h_b$. A corresponding parameter $j_{ba}$ can be similarly incorporated into the specifications of AaBB and Aabb to represent and provide a measure of the interaction between $h_a$ and $d_b$. The last of the four interactions, between $h_a$ and $h_b$, is covered by a fourth parameter $l_{ab}$, which is incorporated into the specification of AaBb, where $h_a$ and $h_b$ appear together.
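
The scheme of Table 33 can be generated mechanically, and doing so shows how $j_{ab}$ measures the dependence of $h_b$ on the A-a genotype. In this minimal Python sketch the parameter values and the names `COEF`, `phenotype`, `jab` and `jba` are hypothetical choices for illustration. Each interaction term enters with the product of the coefficients of the two main effects it links.

```python
from itertools import product

# Each single-locus genotype, indexed by its number of increasing alleles,
# maps to (d-coefficient, h-coefficient): AA -> (1, 0), Aa -> (0, 1), aa -> (-1, 0).
COEF = {2: (1, 0), 1: (0, 1), 0: (-1, 0)}
NAME_A = {2: "AA", 1: "Aa", 0: "aa"}
NAME_B = {2: "BB", 1: "Bb", 0: "bb"}


def phenotype(na, nb, p):
    """Deviation from m of the genotype with na increasing A alleles and nb increasing
    B alleles; each interaction term carries the product of the coefficients of the
    two main effects it links."""
    xa, ya = COEF[na]
    xb, yb = COEF[nb]
    return (p["da"] * xa + p["ha"] * ya + p["db"] * xb + p["hb"] * yb
            + p["i"] * xa * xb + p["jab"] * xa * yb
            + p["jba"] * ya * xb + p["l"] * ya * yb)


p = dict(da=4, db=3, ha=2, hb=1, i=0.5, jab=0.25, jba=-0.25, l=1)
table = {NAME_A[na] + NAME_B[nb]: phenotype(na, nb, p)
         for na, nb in product((2, 1, 0), repeat=2)}

# h_b measured in AA and in aa backgrounds; the two differ by 2 * j_ab.
hb_in_AA = table["AABb"] - (table["AABB"] + table["AAbb"]) / 2
hb_in_aa = table["aaBb"] - (table["aaBB"] + table["aabb"]) / 2
print(hb_in_AA, hb_in_aa, (hb_in_AA - hb_in_aa) / 2)  # 1.25 0.75 0.25
```

Here $h_b$ measured among AA individuals is $h_b + j_{ab} = 1.25$ and among aa individuals $h_b - j_{ab} = 0.75$, so half their difference returns $j_{ab} = 0.25$.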
The formulations of the phenotypes stemming from the nine genotypes are set out in terms of the eight parameters, two d's, two h's, i, two j's and l, in Table 33. The interaction terms are easy to derive: wherever the formulation includes $d_a$ and $d_b$ it also includes i; wherever it includes a d and an h it also includes the appropriate j; and wherever it includes $h_a$ and $h_b$ it includes l. In all cases the coefficient of the interaction term is the product of the coefficients of the two main items: thus $d_a + d_b$ is accompanied by $i_{ab}$, whose coefficient is $1 \times 1$, while $d_a - d_b$ takes $-i_{ab}$, the coefficient being $1 \times -1$, and so on. The system is readily extendable to the development of parameters covering trigenic and even more complicated interactions, but we shall not now concern ourselves with
102 Genic interaction and linkage
these. One further point requires clarification. d and h were defined in Chapter 3 as deviations from the mid-parent m, that is the mean of the two true-breeding parents from whose cross the families were derived. This definition of m can now be seen to be no longer adequate, for if we start with a cross between AABB and aabb, the mid-parent is

$$\tfrac{1}{2}\left[(m + d_a + d_b + i_{ab}) + (m - d_a - d_b + i_{ab})\right] = m + i_{ab}$$

whereas the alternative cross, AAbb × aaBB, gives a mid-parent of

$$\tfrac{1}{2}\left[(m + d_a - d_b - i_{ab}) + (m - d_a + d_b - i_{ab})\right] = m - i_{ab}$$

even though, in the absence of linkage, it gives just the same distribution of genotypes in F2 and other derived generations as does AABB × aabb. In neither cross do the deviations cancel out, and they leave residua which have opposite signs. The mid-parent must in fact be redefined as the mean of all the possible true-breeding combinations obtainable from the two gene pairs, in this case the mean of AABB, AAbb, aaBB and aabb, which gives

$$\tfrac{1}{4}\left[(m + d_a + d_b + i_{ab}) + (m + d_a - d_b - i_{ab}) + (m - d_a + d_b - i_{ab}) + (m - d_a - d_b + i_{ab})\right] = m.$$
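
A quick numerical check of the three mid-parent values follows, as a Python sketch with arbitrary illustrative parameter values. The signs `sa` and `sb` are our own shorthand for the homozygote at each locus.

```python
def homozygote(sa, sb, m, da, db, i):
    """Phenotype of a true-breeding genotype; sa, sb are +1 for AA/BB and -1 for aa/bb,
    so i enters with the product sa * sb, as in Table 33."""
    return m + da * sa + db * sb + i * sa * sb


m, da, db, i = 10.0, 3.0, 2.0, 1.5
associated = (homozygote(1, 1, m, da, db, i) + homozygote(-1, -1, m, da, db, i)) / 2
dispersed = (homozygote(1, -1, m, da, db, i) + homozygote(-1, 1, m, da, db, i)) / 2
all_four = sum(homozygote(sa, sb, m, da, db, i)
               for sa in (1, -1) for sb in (1, -1)) / 4
print(associated, dispersed, all_four)  # 11.5 8.5 10.0, i.e. m + i, m - i and m
```
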

Before we proceed to discuss the use in analysis of these four interaction parameters, we should observe that since, together with the d's and h's, they afford a complete account of any differences that may be observed among the phenotypes of the nine genotypes, it follows that any system we may care to consider of interrelations among the nine phenotypes can be defined in terms of these parameters. Thus all the classical types of digenic interaction elucidated by Bateson and others in the early days of genetics can now be defined in biometrical terms. To take but two of the six classical interactions illustrated by Darlington and Mather (1949, Fig. 38), complementary gene action, first elucidated by Bateson and Punnett, gives a characteristic 9:7 ratio in F2 because, as Bateson and Punnett showed by breeding tests, the genotypes AABB, AaBB, AABb and AaBb all had one phenotype, while AAbb, Aabb, aaBB, aaBb and aabb all had another. Allowing for the frequencies with which the genotypes appear in F2, the first group will include $\tfrac{1}{16}(1 + 2 + 2 + 4) = \tfrac{9}{16}$, while the second will include $\tfrac{1}{16}(1 + 2 + 1 + 2 + 1) = \tfrac{7}{16}$ of the F2 individuals. Writing these relations in our biometric notation, the likeness in phenotype of AABB, AABb, AaBB and AaBb requires that

$$d_a + d_b + i = d_a + h_b + j_a = h_a + d_b + j_b = h_a + h_b + l,$$

and the likeness of aaBB, aaBb, AAbb, Aabb and aabb requires that

$$-d_a + d_b - i = -d_a + h_b - j_a = d_a - d_b - i = h_a - d_b - j_b = -d_a - d_b + i,$$

where for the sake of convenience we write $i_{ab}$ as i, $j_{ab}$ as $j_a$, $j_{ba}$ as $j_b$ and $l_{ab}$ as l. It is not difficult to show that these equations are satisfied if, and only if,

$$d_a = d_b = h_a = h_b = i = j_a = j_b = l.$$
Now in our usage, the designation of the commoner phenotype as being produced by AABB, AaBB, etc. implies that this commoner phenotype is the one with the greater expression of the character. Clearly there could then be a counterpart situation where the phenotype with the lesser expression would constitute $\tfrac{9}{16}$ of the F2, and that with the greater expression only $\tfrac{7}{16}$. This would arise where the phenotypes of aabb, Aabb, aaBb and AaBb were alike on the one hand and those of aaBB, AaBB, AAbb, AABb and AABB were alike on the other. The equations then become

$$-d_a - d_b + i = h_a - d_b - j_b = -d_a + h_b - j_a = h_a + h_b + l$$

and

$$-d_a + d_b - i = h_a + d_b + j_b = d_a - d_b - i = d_a + h_b + j_a = d_a + d_b + i.$$

These equations are satisfied when $d_a = d_b = -h_a = -h_b = -i = j_a = j_b = -l$. Thus the general conditions for classical complementary action are that all eight parameters are equal in size, with the two j's positive like the d's, and i and l having the same sign as the two h's, which themselves are of the same sign.
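
The 9:7 result can be confirmed by tallying the nine F2 classes of Table 33 with their F2 frequencies. This Python sketch assumes unlinked genes and uses exact fractions. The parameter names mirror the text's symbols (`ja`, `jb` for $j_a$, $j_b$), and the unit values are arbitrary.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

COEF = {2: (1, 0), 1: (0, 1), 0: (-1, 0)}        # AA, Aa, aa -> (d, h) coefficients
F2_FREQ = {2: Fraction(1, 4), 1: Fraction(1, 2), 0: Fraction(1, 4)}


def phenotype(na, nb, da, db, ha, hb, i, ja, jb, l):
    xa, ya = COEF[na]
    xb, yb = COEF[nb]
    return (da * xa + ha * ya + db * xb + hb * yb
            + i * xa * xb + ja * xa * yb + jb * ya * xb + l * ya * yb)


def f2_classes(**params):
    """Map each distinct F2 phenotype to its total F2 frequency (unlinked genes)."""
    classes = Counter()
    for na, nb in product(COEF, repeat=2):
        classes[phenotype(na, nb, **params)] += F2_FREQ[na] * F2_FREQ[nb]
    return dict(classes)


# Commoner class with the greater expression: all eight parameters equal.
print(f2_classes(da=1, db=1, ha=1, hb=1, i=1, ja=1, jb=1, l=1))
# Counterpart: d's and j's positive; h's, i and l negative.
print(f2_classes(da=1, db=1, ha=-1, hb=-1, i=-1, ja=1, jb=1, l=-1))
```

The first call gives two classes, 3 at frequency 9/16 and -1 at 7/16. The second gives -3 at 9/16 and 1 at 7/16, which is the counterpart case.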
The second classical interaction we will consider is that of so-called duplicate genes, which give a 15:1 ratio in F2. Here aabb is the only genotype to give a unique phenotype where the commoner phenotype has the greater expression of the character, and AABB is the genotype with the unique phenotype where the commoner class has the lesser expression of the character. In the former case AABB, AABb, AaBB, AaBb, aaBB, aaBb, AAbb and Aabb must have the same phenotype, from which it follows that

$$\begin{aligned} d_a + d_b + i &= d_a + h_b + j_a = h_a + d_b + j_b = h_a + h_b + l \\ &= -d_a + d_b - i = -d_a + h_b - j_a = d_a - d_b - i = h_a - d_b - j_b. \end{aligned}$$

These equations are satisfied if $d_a = d_b = h_a = h_b = -i = -j_a = -j_b = -l$. The counterpart situation, where AABB is unique and aabb, aaBb, Aabb, AaBb, AAbb, AABb, aaBB and AaBB are alike, arises where

$$d_a = d_b = -h_a = -h_b = i = -j_a = -j_b = l.$$

So we see that duplicate interaction arises when all the parameters have the same magnitude, and the two j's are negative while i and l have the opposite sign to the h's. To abbreviate, the condition for complementary action is that

$$d_a = d_b = \pm h_a = \pm h_b = \pm i = j_a = j_b = \pm l$$

while the condition for duplicate interaction is similarly

$$d_a = d_b = \pm h_a = \pm h_b = \mp i = -j_a = -j_b = \mp l.$$
The value of this approach is that we can now generalize the notion of complementary and duplicate action. For example, if we write

$$\theta\,(d_a = d_b = \pm h_a = \pm h_b) = \pm i = j_a = j_b = \pm l$$

we have no interaction when $i = j = l = 0$, i.e. $\theta = 0$; full complementary interaction when $\theta = 1$; partial complementary interaction when the interaction parameters are all equal but less than the d's and h's, i.e. $0 < \theta < 1$; and over- or super-complementary interaction when $\theta > 1$. Furthermore, when $\theta = -1$ we have full duplicate interaction, when $0 > \theta > -1$ partial duplicate interaction, and when $-1 > \theta$ over- or super-duplicate interaction. We shall see later how this generalization can be put to use. Other more complicated generalizations about interaction are, of course, also possible, although none have yet been developed for use in practice.
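
The generalized condition can be explored in the same way. In this sketch, which is our own construction, `s` selects the upper (+1) or lower (-1) of the double signs and `theta` is $\theta$. The function returns the F2 class counts out of 16, ordered from greatest to least expression, assuming unlinked genes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

COEF = {2: (1, 0), 1: (0, 1), 0: (-1, 0)}        # AA, Aa, aa -> (d, h) coefficients
F2_FREQ = {2: Fraction(1, 4), 1: Fraction(1, 2), 0: Fraction(1, 4)}


def theta_params(theta, s, d=1):
    """Parameters obeying theta*(d_a = d_b = s*h_a = s*h_b) = s*i = j_a = j_b = s*l,
    where s = +1 or -1 picks the upper or lower of the double signs."""
    t = Fraction(theta) * d
    return dict(da=d, db=d, ha=s * d, hb=s * d, i=s * t, ja=t, jb=t, l=s * t)


def f2_ratio(theta, s):
    """F2 class counts out of 16, from greatest to least expression (unlinked genes)."""
    p = theta_params(theta, s)
    classes = Counter()
    for na, nb in product(COEF, repeat=2):
        (xa, ya), (xb, yb) = COEF[na], COEF[nb]
        value = (p["da"] * xa + p["ha"] * ya + p["db"] * xb + p["hb"] * yb
                 + p["i"] * xa * xb + p["ja"] * xa * yb
                 + p["jb"] * ya * xb + p["l"] * ya * yb)
        classes[value] += 16 * F2_FREQ[na] * F2_FREQ[nb]
    return [int(classes[v]) for v in sorted(classes, reverse=True)]


for theta, s in [(1, 1), (1, -1), (-1, 1), (-1, -1), (0, 1)]:
    print(f"theta={theta:+d} s={s:+d}: {f2_ratio(theta, s)}")
```

Under these assumptions $\theta = 1$ gives [9, 7] (or [7, 9] with s = -1) and $\theta = -1$ gives the duplicate [15, 1] (or [1, 15]). $\theta = 0$ gives [9, 6, 1], because setting $d_a = d_b$ here makes the A-bb and aaB- classes alike.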
One last point remains to be made about the classical interactions. An F2 giving a 9:3:3:1 ratio was regarded classically as showing no interaction. In point of fact a 9:3:3:1 ratio, or one of its simple derivatives, is obtained whenever $d_a = \pm h_a$, $d_b = \pm h_b$ and $\pm i = j_a = j_b = \pm l$. Thus the ratio does not necessarily indicate an absence of interaction in our sense, but again implies its own limitations in the relations among the interaction parameters.

20. Interaction as displayed by means


A cross producing an F1 heterozygous for two gene pairs can be made in two ways. The increasing alleles may occur together in one of the true-breeding parents and the decreasing alleles in the other, the cross thus being AABB × aabb, and the genes being said to be associated. Or each
Interaction as displayed by means 105
parent might carry the increasing allele of one gene and the decreasing allele of the other, the cross thus being AAbb × aaBB, and the genes being said to be dispersed. With association of the genes the parental phenotypes will be $m + d_a + d_b + i$ and $m - d_a - d_b + i$, while with dispersion the phenotypes will be $m + d_a - d_b - i$ and $m - d_a + d_b - i$. The F1 will have the same phenotype, $m + h_a + h_b + l$, no matter from which cross it is raised. Furthermore, in the absence of linkage, so will the F2, whose mean can be shown, by combining the classes of Table 33 in the F2 proportions, to be $m + \tfrac12 h_a + \tfrac12 h_b + \tfrac14 l$. It will be observed that again the coefficient of l is the product of the coefficients of the two h's, so illustrating in a new context the general rule for finding the coefficient of an interaction parameter.
Turning to the back-crosses, however, the results from the associated and dispersed crosses again differ. With the associated cross, the back-cross to AABB will yield the four genotypes AABB, AABb, AaBB and AaBb in equal frequencies, and the mean will thus be $m + \tfrac12 d_a + \tfrac12 d_b + \tfrac12 h_a + \tfrac12 h_b + \tfrac14 i + \tfrac14 j_a + \tfrac14 j_b + \tfrac14 l$. Similarly the mean of the back-cross to aabb will be $m - \tfrac12 d_a - \tfrac12 d_b + \tfrac12 h_a + \tfrac12 h_b + \tfrac14 i - \tfrac14 j_a - \tfrac14 j_b + \tfrac14 l$. With the dispersed cross, on the other hand, the means of the two back-crosses will be

$$m + \tfrac12 d_a - \tfrac12 d_b + \tfrac12 h_a + \tfrac12 h_b - \tfrac14 i + \tfrac14 j_a - \tfrac14 j_b + \tfrac14 l$$

and

$$m - \tfrac12 d_a + \tfrac12 d_b + \tfrac12 h_a + \tfrac12 h_b - \tfrac14 i - \tfrac14 j_a + \tfrac14 j_b + \tfrac14 l$$

respectively.
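
The back-cross rows of Table 34 can be derived by enumeration. In this Python sketch, where the names are our own, each coefficient is obtained by setting one parameter to 1 and the rest to 0. The result is then averaged over the four equally frequent back-cross genotypes, with genes assumed unlinked.

```python
from fractions import Fraction
from itertools import product

COEF = {2: (1, 0), 1: (0, 1), 0: (-1, 0)}        # AA, Aa, aa -> (d, h) coefficients
NAMES = ["m", "da", "db", "ha", "hb", "i", "ja", "jb", "l"]


def phenotype(na, nb, p):
    xa, ya = COEF[na]
    xb, yb = COEF[nb]
    return (p["m"] + p["da"] * xa + p["ha"] * ya + p["db"] * xb + p["hb"] * yb
            + p["i"] * xa * xb + p["ja"] * xa * yb + p["jb"] * ya * xb + p["l"] * ya * yb)


def backcross_mean(parent, p):
    """Mean of F1 (AaBb) x a true-breeding parent, e.g. (2, 2) = AABB, (2, 0) = AAbb.
    With unlinked genes each locus is the parent's homozygote or Aa/Bb, each 1/2."""
    return sum(Fraction(1, 4) * phenotype(na, nb, p)
               for na, nb in product((parent[0], 1), (parent[1], 1)))


def coefficients(parent):
    """Table 34 row: coefficient of each parameter, found one parameter at a time."""
    return [backcross_mean(parent, {n: int(n == k) for n in NAMES}) for k in NAMES]


for label, parent in [("associated B1", (2, 2)), ("associated B2", (0, 0)),
                      ("dispersed B1", (2, 0)), ("dispersed B2", (0, 2))]:
    print(label, [str(c) for c in coefficients(parent)])
```
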
These results are collected together in Table 34. Before they can be used in the analysis of experimental data, however, they must be generalized to cover the case of more than two genes. As we saw in Section 8, the d's of the different genes must tend to balance one another out where the genes are dispersed, so leading us to define [d] as the sum of the d's, taking sign into account, where some genes are associated in the parents while others are dispersed. We also defined [h] as the sum of the h's of the individual genes taking sign into account, although here the sign of h depends not on gene association or dispersion but on the direction of the dominance itself. In the same way, with k gene differences there will prospectively be $\tfrac12 k(k-1)$ digenic interactions of each of types i and l, and $k(k-1)$ digenic interactions of type j, since each pair of genes prospectively yields two j interactions, $j_{ab}$ and $j_{ba}$. We must therefore define [i] and [l] as the sums of the $\tfrac12 k(k-1)$ i and l interactions respectively, and [j] as the sum of the $k(k-1)$ j interactions, in each case taking sign into account. Now with l, as with h, sign will depend solely on the direction of the interaction, being positive when, for example, the two h's and the l yielded by two genes are in the same direction, and negative when l is in the opposite direction to the h's. With i and j interactions, however, not only does the direction of the interaction itself enter in, but also whether the two genes in question are associated or dispersed in the parents, as indeed we can see from Table 34. The i yielded by two genes will be in one direction when the genes are associated but in the other when they are dispersed, whereas if they are intrinsically in the same direction the two j's will reinforce one another when the genes are associated but will tend to cancel one another out when the genes are dispersed. The algebraic relations of i and j to the proportions of the k genes which are associated and dispersed are somewhat complex (see M and J) and need not be detailed here. It is sufficient for us to note that neither [i] nor [j] need be 0 in a given cross, even where [d] = 0 as a result of partial dispersion of the genes. As with [d] and [h], however, [i] = 0 does not necessarily imply that all the individual i's are 0, although [i] ≠ 0 must imply that at least some of the i's are not 0. The same is of course true of [j] and [l].

TABLE 34.
Interactions in the means of families of a digenic cross

$$
\begin{array}{ll|ccccccccc}
 & & m & d_a & d_b & h_a & h_b & i_{ab} & j_{ab} & j_{ba} & l_{ab} \\
\hline
\text{Associated} & \bar{P}_1 & 1 & 1 & 1 & & & 1 & & & \\
\text{AABB} \times \text{aabb} & \bar{P}_2 & 1 & -1 & -1 & & & 1 & & & \\
 & \bar{F}_1 & 1 & & & 1 & 1 & & & & 1 \\
 & \bar{F}_2 & 1 & & & \tfrac12 & \tfrac12 & & & & \tfrac14 \\
 & \bar{B}_1 & 1 & \tfrac12 & \tfrac12 & \tfrac12 & \tfrac12 & \tfrac14 & \tfrac14 & \tfrac14 & \tfrac14 \\
 & \bar{B}_2 & 1 & -\tfrac12 & -\tfrac12 & \tfrac12 & \tfrac12 & \tfrac14 & -\tfrac14 & -\tfrac14 & \tfrac14 \\
\hline
\text{Dispersed} & \bar{P}_1 & 1 & 1 & -1 & & & -1 & & & \\
\text{AAbb} \times \text{aaBB} & \bar{P}_2 & 1 & -1 & 1 & & & -1 & & & \\
 & \bar{F}_1 & 1 & & & 1 & 1 & & & & 1 \\
 & \bar{F}_2 & 1 & & & \tfrac12 & \tfrac12 & & & & \tfrac14 \\
 & \bar{B}_1 & 1 & \tfrac12 & -\tfrac12 & \tfrac12 & \tfrac12 & -\tfrac14 & \tfrac14 & -\tfrac14 & \tfrac14 \\
 & \bar{B}_2 & 1 & -\tfrac12 & \tfrac12 & \tfrac12 & \tfrac12 & -\tfrac14 & -\tfrac14 & \tfrac14 & \tfrac14
\end{array}
$$

We can see from Table 34, but using the generalized forms for [d], [h] and their interactions, which take into account the effects of association and dispersion as well as the direction of the interaction, that

$$\begin{aligned}
\bar{P}_1 &= m + [d] + [i] \\
\bar{P}_2 &= m - [d] + [i] \\
\bar{F}_1 &= m + [h] + [l] \\
\bar{F}_2 &= m + \tfrac12[h] + \tfrac14[l] \\
\bar{B}_1 &= m + \tfrac12[d] + \tfrac12[h] + \tfrac14[i] + \tfrac14[j] + \tfrac14[l] \\
\bar{B}_2 &= m - \tfrac12[d] + \tfrac12[h] + \tfrac14[i] - \tfrac14[j] + \tfrac14[l].
\end{aligned}$$
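Six parameters are involved in these expressions and six means are available for their estimation. We can therefore arrive at perfect fit estimates of the six parameters, thus

$$\begin{aligned}
m &= \tfrac12\bar{P}_1 + \tfrac12\bar{P}_2 + 4\bar{F}_2 - 2\bar{B}_1 - 2\bar{B}_2 \\
[d] &= \tfrac12\bar{P}_1 - \tfrac12\bar{P}_2 \\
[h] &= 6\bar{B}_1 + 6\bar{B}_2 - 8\bar{F}_2 - \bar{F}_1 - \tfrac32\bar{P}_1 - \tfrac32\bar{P}_2 \\
[i] &= 2\bar{B}_1 + 2\bar{B}_2 - 4\bar{F}_2 \\
[j] &= 2\bar{B}_1 - \bar{P}_1 - 2\bar{B}_2 + \bar{P}_2 \\
[l] &= \bar{P}_1 + \bar{P}_2 + 2\bar{F}_1 + 4\bar{F}_2 - 4\bar{B}_1 - 4\bar{B}_2.
\end{aligned}$$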
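
Because the six equations are linear, the perfect-fit formulae can be checked by a round trip: generate the six means from chosen parameter values and recover those values exactly. This is a minimal Python sketch. The parameter values are arbitrary, and the function names are our own.

```python
from fractions import Fraction as Fr


def expected_means(m, d, h, i, j, l):
    """Generation means from the six-parameter model (the expressions above)."""
    return dict(P1=m + d + i, P2=m - d + i, F1=m + h + l,
                F2=m + Fr(1, 2) * h + Fr(1, 4) * l,
                B1=m + Fr(1, 2) * (d + h) + Fr(1, 4) * (i + j + l),
                B2=m + Fr(1, 2) * (h - d) + Fr(1, 4) * (i - j + l))


def perfect_fit(P1, P2, F1, F2, B1, B2):
    """Perfect-fit estimates of m, [d], [h], [i], [j] and [l] from six means."""
    half, three_halves = Fr(1, 2), Fr(3, 2)
    return dict(
        m=half * P1 + half * P2 + 4 * F2 - 2 * B1 - 2 * B2,
        d=half * P1 - half * P2,
        h=6 * B1 + 6 * B2 - 8 * F2 - F1 - three_halves * (P1 + P2),
        i=2 * B1 + 2 * B2 - 4 * F2,
        j=2 * B1 - P1 - 2 * B2 + P2,
        l=P1 + P2 + 2 * F1 + 4 * F2 - 4 * B1 - 4 * B2,
    )


truth = dict(m=90, d=7, h=-28, i=-20, j=5, l=21)   # arbitrary illustrative values
estimates = perfect_fit(**expected_means(**truth))
print(estimates == truth)   # True: the six equations are solved exactly
```

Exact fractions are used so that the round trip can be tested for equality rather than approximate agreement.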
The standard errors of these estimates can be found in the usual way. Thus, for example,

$$V_{[d]} = \tfrac14 V_{\bar{P}_1} + \tfrac14 V_{\bar{P}_2} \quad\text{and}\quad s_{[d]} = \sqrt{V_{[d]}}.$$

The significance of [d] can then be tested by calculating

$$t = [d]/s_{[d]}.$$
Finding [i], [j] or [l] significant in such tests is obviously equivalent to finding significant deviations from zero in the scaling tests, but it has the additional advantage of yielding estimates of the parameters and therefore of identifying the type or types of interaction responsible for the departure from the simple additive-dominance situation. We should note that the 3 degrees of freedom from which is derived the $\chi^2_{(3)}$ testing the goodness of fit of the model in the joint scaling test described on pages 37-40 are now being used for estimating the three interaction parameters. No test of goodness of fit is therefore possible of the new model incorporating the three types of digenic interaction: indeed, as we have seen, it is a perfect fit estimation. More generations, such as F3 or second back-crosses, must be included if sufficient equations are to be available to provide a test of goodness of fit. If in such a case the model involving digenic interactions proves to be inadequate to account for the results, we should have to consider the possibility of trigenic interaction or some other further complicating factor, but this is beyond the scope of our present treatment.
We may illustrate the procedure of estimation in a simple case by ref-
erence once more to data from the cross between varieties 72 and 22 of
Nicotiana rustica for plant height six weeks after planting in the field
which was analysed in Chapter 3. The C scaling test and the joint scaling
test when applied to these data were highly significant (Table 8). The
simple additive-dominance model is clearly inadequate. Furthermore,
attempts to find an alternative scale on which this model would be ad-
equate failed. If we wish to analyse these data further we must, there-
fore, allow for the presence of non-allelic interaction (or epistasis as it is
sometimes called) in any model we attempt to fit.
Using the perfect fit formulae we can estimate the three interaction components, [i], [j] and [l], in addition to m, [d] and [h]. As we have already seen,

$$[d] = \tfrac12\bar{P}_1 - \tfrac12\bar{P}_2.$$

On substituting the appropriate family means from Table 8, this gives

$$[d] = \tfrac12(80.40 - 65.47) = 7.46.$$

Similarly,

$$s_{[d]} = \sqrt{V_{[d]}} = \sqrt{\tfrac14(1.936)^2 + \tfrac14(1.726)^2} = \sqrt{1.680} = \pm 1.296.$$

The t for 38 df for testing the significance of [d] is therefore

$$t_{(38)} = \frac{7.46}{1.30} = 5.74,$$

which has a probability of P < 0.001.
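
The arithmetic can be reproduced as below. We assume that 1.936 and 1.726 are the standard errors of $\bar{P}_1$ and $\bar{P}_2$ taken from Table 8, so that their squares are the variances of the means. The variable names are our own.

```python
import math

# Parental means and the quantities squared in the text for V[d]
# (taken here to be the standard errors of those means, from Table 8).
p1_mean, p1_se = 80.40, 1.936
p2_mean, p2_se = 65.47, 1.726

d = 0.5 * p1_mean - 0.5 * p2_mean              # [d] = 1/2 P1 - 1/2 P2
var_d = 0.25 * p1_se ** 2 + 0.25 * p2_se ** 2  # V[d] = 1/4 V(P1) + 1/4 V(P2)
se_d = math.sqrt(var_d)                        # s[d]
t = d / se_d                                   # compare with t for 38 df
print(f"[d] = {d:.3f}  V[d] = {var_d:.4f}  s[d] = {se_d:.4f}  t = {t:.2f}")
```

Carried through without intermediate rounding, this gives $V_{[d]} \approx 1.682$, $s_{[d]} \approx 1.297$ and $t \approx 5.76$. These differ from the printed 1.680, 1.296 and 5.74 only by rounding.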


These results, along with those for the other five components, are summarized in Table 35. Five of the estimates are significant, including the two interaction components [i] and [l]. The significance of these two interaction components confirms the earlier conclusions from the scaling tests. Now, because we have estimated six components from six observed means, we have no test of the adequacy of the present model. Normally we would have to raise further generations to provide such a test. Since, however, the estimate of one of the interaction components, [j], does not differ significantly from zero, it would appear that a model in which it was omitted would be adequate for these data.

TABLE 35.
Estimates of the additive, dominance and digenic interaction components of means for plant height in the cross between varieties 72 and 22 of Nicotiana rustica

Component    Perfect fit estimate    P             Five component estimate    P
m              92.93 ± 4.76          <0.001          93.50 ± 4.60             <0.001
[d]             7.46 ± 1.30          <0.001           8.64 ± 0.99             <0.001
[h]           -28.64 ± 12.21         0.05-0.01      -30.27 ± 12.13            0.05-0.01
[i]           -19.99 ± 4.61          <0.001         -20.43 ± 4.60             <0.001
[j]             5.68 ± 4.03          >0.05
[l]            21.71 ± 7.91          0.01-0.001      22.86 ± 7.88             0.01-0.001
χ²(1)                                                 1.99                    0.20-0.10

Fitting a five parameter model by omitting [j] would allow us to test the goodness of fit of the model by means of a $\chi^2$ with one df, and at the same time improve the precision with which the remaining parameters were estimated. Estimating the five components of this model proceeds exactly as for the simple additive-dominance model in the joint scaling test (Chapter 3, Section 9). It leads to the estimates on the right-hand side of Table 35. As expected, the five parameter model is adequate, the $\chi^2_{(1)}$ testing its goodness of fit being non-significant. There is also a marginal improvement in the precision with which we have estimated the five components, as shown by their lower standard errors.
Since the model is adequate we can conclude that trigenic interactions
and similar complex factors are not making a significant contribution to
the differences among the generation means. We can interpret the data,
therefore, in terms of the additive, dominance and digenic non-allelic
interaction components of the gene action. The h increments of the
majority of individual loci must be negative while the I increments of
the majority of pairs of loci must be positive. The non-allelic interaction
is, therefore, mainly of the duplicate kind.
Before leaving the effects of non-allelic interaction on means we must note the contribution it can make to heterosis. Heterosis will be observed when $\bar{F}_1 > \bar{P}_1$, where $\bar{P}_1$ is taken as the parent with the greater expression of the character. As we have seen earlier, in the absence of interaction $\bar{F}_1 > \bar{P}_1$ requires that [h] > [d], and this in turn requires that one or both of two conditions be satisfied, namely
(i) h > d for some or all of the genes; that is, there must be overdominance at some or all loci.
(ii) [d] < S(d); that is, there must be dispersion of the genes between the parents, the value of [d] being thus reduced by the balancing effects of the genes of opposite effect in each parent, whereby [h] may exceed [d] although each h is no larger, and may even be smaller, than its corresponding d.
These two conditions cannot be distinguished from means alone, although second degree statistics allow the distinction to be made. At the same time it is a distinction of great practical importance, since wherever heterosis depends on overdominance the maximum expression of the character, for example yield in a crop plant, can be achieved only by a hybrid breeding programme producing F1's for commercial use. Where, however, heterosis is due to dispersion of the genes, it is in principle always possible to produce a true-breeding line expressing the character to at least as high a degree as the F1, although of course this may involve the breakage of linkages between the dispersed genes.
Now where digenic interaction is displayed, the requirement for $\bar{F}_1 > \bar{P}_1$ becomes [h] + [l] > [d] + [i]. This relation clearly offers a number of possibilities for the production of heterosis. Two effects, reinforcing the relations by which heterosis may arise in the absence of interaction, are however of special importance, namely
(i) That the h's and their associated l's are entirely, or at any rate preponderantly, of the same sign, which of course is a feature of complementary gene action.
(ii) Dispersion of the interacting genes between the parents, so that although, as is required by complementary interaction, the sign of the individual i's is the same as that of the h's, [i] will take a negative sign in the parents.
The first relation will raise the value of [h] + [l], the expression of the character in F1. The second will limit the increase in value of [d] + [i], and may even diminish it relative to [d].
Thus complementary interaction can increase the expression of het-
erosis whether it be due to over-dominance or gene dispersion. It is thus
not surprising that wherever the data permit the analysis to be made,
non-allelic interaction, presumably of the complementary type, has
been found to be a common accompaniment of heterosis. These effects
of digenic interaction on heterosis are illustrated in Fig. 11.
Variances and covariances 111
[Figure: heterosis, $\bar{F}_1 - \bar{P}_1$, plotted against $\theta$, from duplicate ($\theta < 0$) to complementary ($\theta > 0$) interaction.]

Fig. 11. Heterosis, measured by the excess of the F1 mean over that of the better parent ($\bar{F}_1 - \bar{P}_1$), in relation to non-allelic interaction, measured by $\theta$. Solid lines show the relationship where 2, 4 and 8 gene pairs are respectively involved with maximum dispersal, i.e. 1 increasing allele in each parent for 2 gene pairs (1/1), 2 in each parent for 4 gene pairs (2/2) and 4 in each parent for 8 gene pairs (4/4). The broken line shows the relationship for 8 genes where 6 increasing alleles are in one parent and 2 in the other (6/2). Note that in all cases, except that of 2 gene pairs, the sign of the heterosis is reversed where duplicate type interaction of sufficient strength is operating. The diagram assumes that all d's are equal to one another and to all h's, with all i's = all l's = $\theta d$.
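
Under the caption's assumptions the curves have simple closed forms. With $n_1$ increasing alleles in one parent and $n_2$ in the other, $\bar{F}_1 - \bar{P}_1$ works out to $2d(1+\theta)$ for 1/1, $4d(1+2\theta)$ for 2/2, $8d(1+4\theta)$ for 4/4 and $4d(1+6\theta)$ for 6/2. The sign therefore reverses below $\theta = -1$, $-\tfrac12$, $-\tfrac14$ and $-\tfrac16$ respectively. The Python sketch below is our own construction and assumes unlinked genes. It computes these values directly from [d], [h], [i] and [l], with each i entering a parent with the product of the two genes' d-coefficients.

```python
from fractions import Fraction
from itertools import combinations


def heterosis(n1, n2, theta, d=1):
    """F1 mean minus better-parent mean under the Fig. 11 assumptions: k = n1 + n2
    unlinked genes, n1 increasing alleles in P1 and n2 in P2, every d = every h = d,
    and every i = every l = theta * d (j does not enter parents or F1)."""
    theta = Fraction(theta)
    signs = [1] * n1 + [-1] * n2            # d-coefficient of each gene in P1
    k = n1 + n2
    pairs = list(combinations(signs, 2))
    d_sum = d * sum(signs)                            # [d]
    i_sum = theta * d * sum(a * b for a, b in pairs)  # [i], signs by the product rule
    h_sum = k * d                                     # [h]
    l_sum = theta * d * len(pairs)                    # [l]
    f1 = h_sum + l_sum                                # m cancels, so it is omitted
    p1, p2 = d_sum + i_sum, -d_sum + i_sum
    return f1 - max(p1, p2)


for label, n1, n2 in [("1/1", 1, 1), ("2/2", 2, 2), ("4/4", 4, 4), ("6/2", 6, 2)]:
    print(label, [str(heterosis(n1, n2, t)) for t in (-1, Fraction(-1, 2), 0, 1)])
```

For 1/1 the heterosis only falls to zero at $\theta = -1$, which is consistent with the caption's exception for 2 gene pairs.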

21. Variances and covariances


Although the means of the parent lines and the F1 reflect the effects of non-allelic interaction, their variances are unaffected because, being genetically uniform, their variation is entirely non-heritable. Turning to F2, we can find its variance by squaring the phenotype of each of the nine genotypes as set out in Table 33, multiplying by the frequencies with which they appear in F2, summing, and subtracting the square of the mean phenotype, thus

$$V_{1F2} = \tfrac{1}{16}(d_a + d_b + i)^2 + \tfrac18(h_a + d_b + j_b)^2 + \dots + \tfrac14(h_a + h_b + l)^2 + \dots + \tfrac{1}{16}(-d_a - d_b + i)^2 - \left(\tfrac12 h_a + \tfrac12 h_b + \tfrac14 l\right)^2$$

which reduces to

$$V_{1F2} = \tfrac12\left(d_a + \tfrac12 j_a\right)^2 + \tfrac12\left(d_b + \tfrac12 j_b\right)^2 + \tfrac14\left(h_a + \tfrac12 l\right)^2 + \tfrac14\left(h_b + \tfrac12 l\right)^2 + \tfrac14 i^2 + \tfrac18 j_a^2 + \tfrac18 j_b^2 + \tfrac{1}{16} l^2.$$
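
The reduction can be verified by brute force. This Python sketch, with names of our own choosing, evaluates the nine F2 classes with exact fractions and compares the result with the reduced expression. Any parameter values may be used.

```python
from fractions import Fraction as Fr
from itertools import product

COEF = {2: (1, 0), 1: (0, 1), 0: (-1, 0)}        # AA, Aa, aa -> (d, h) coefficients
F2_FREQ = {2: Fr(1, 4), 1: Fr(1, 2), 0: Fr(1, 4)}


def phenotype(na, nb, p):
    xa, ya = COEF[na]
    xb, yb = COEF[nb]
    return (p["da"] * xa + p["ha"] * ya + p["db"] * xb + p["hb"] * yb
            + p["i"] * xa * xb + p["ja"] * xa * yb + p["jb"] * ya * xb + p["l"] * ya * yb)


def f2_variance(p):
    """Weighted mean of squared phenotypes minus squared mean, over the nine F2 classes."""
    cells = [(F2_FREQ[na] * F2_FREQ[nb], phenotype(na, nb, p))
             for na, nb in product(COEF, repeat=2)]
    mean = sum(f * y for f, y in cells)
    return sum(f * y * y for f, y in cells) - mean ** 2


def f2_variance_formula(p):
    """The reduced expression for V1F2 given in the text."""
    return (Fr(1, 2) * (p["da"] + Fr(1, 2) * p["ja"]) ** 2
            + Fr(1, 2) * (p["db"] + Fr(1, 2) * p["jb"]) ** 2
            + Fr(1, 4) * (p["ha"] + Fr(1, 2) * p["l"]) ** 2
            + Fr(1, 4) * (p["hb"] + Fr(1, 2) * p["l"]) ** 2
            + Fr(1, 4) * p["i"] ** 2 + Fr(1, 8) * (p["ja"] ** 2 + p["jb"] ** 2)
            + Fr(1, 16) * p["l"] ** 2)


p = dict(da=5, db=3, ha=2, hb=-1, i=4, ja=-3, jb=2, l=7)
print(f2_variance(p) == f2_variance_formula(p))   # True
```
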
Terms appear in $i^2$, $j_a^2$, $j_b^2$ and $l^2$, each with a coefficient that is the product of the coefficients of its two relevant main effects, just as in the case of contributions to means; but in addition the j's appear in combination with the d's and l with the h's. By comparison with $V_{1F2}$ where interaction is absent, $d_a$ is replaced by $(d_a + \tfrac12 j_a)$, $d_b$ by $(d_b + \tfrac12 j_b)$, $h_a$ by $(h_a + \tfrac12 l)$ and $h_b$ by $(h_b + \tfrac12 l)$. The reason for this is readily apparent if we refer to Table 33. In an F2 the mean expression of all AA individuals is the mean of the classes AABB, AABb and AAbb, where class AABb is given twice the weight of the other two, i.e. it is $\tfrac14(d_a + d_b + i) + \tfrac12(d_a + h_b + j_a) + \tfrac14(d_a - d_b - i) = d_a + \tfrac12 j_a + \tfrac12 h_b$.
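Finding the means of all Aa and aa individuals similarly, we obtain the results set out in Table 36. The mid-parent of the AA and aa homozygotes is $\tfrac12 h_b$, and the deviations from it of AA, aa and Aa are respectively $(d_a + \tfrac12 j_a)$, $-(d_a + \tfrac12 j_a)$ and $(h_a + \tfrac12 l)$. In the case of gene B-b, the corresponding deviations are $(d_b + \tfrac12 j_b)$, $-(d_b + \tfrac12 j_b)$ and $(h_b + \tfrac12 l)$. These deviations replace $d_a$, $-d_a$, $h_a$, $d_b$, $-d_b$ and $h_b$, which obtain in the absence of interaction.

TABLE 36.
Mean phenotypes of AA, Aa and aa classes in F2 and F3, expressed as deviations from the mid-parent, m

$$
\begin{array}{l|cc|cc}
\text{Class} & F_2 \text{ mean} & F_2 \text{ deviation} & F_3 \text{ mean} & F_3 \text{ deviation} \\
\hline
AA & d_a + \tfrac12 j_a + \tfrac12 h_b & d_a + \tfrac12 j_a & d_a + \tfrac14 j_a + \tfrac14 h_b & d_a + \tfrac14 j_a \\
Aa & h_a + \tfrac12 l + \tfrac12 h_b & h_a + \tfrac12 l & h_a + \tfrac14 l + \tfrac14 h_b & h_a + \tfrac14 l \\
aa & -d_a - \tfrac12 j_a + \tfrac12 h_b & -(d_a + \tfrac12 j_a) & -d_a - \tfrac14 j_a + \tfrac14 h_b & -(d_a + \tfrac14 j_a) \\
\tfrac12(AA + aa) & \tfrac12 h_b & & \tfrac14 h_b &
\end{array}
$$
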
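
The same averaging gives the class means in any selfed generation. In this sketch, which is our own and assumes unlinked genes, `fn_freq(n)` gives the single-locus frequencies in $F_n$. The AA and Aa deviations from the mid-parent come out as $d_a + w j_a$ and $h_a + w l$, with $w = \tfrac12$, $\tfrac14$ and $\tfrac18$ in F2, F3 and F4.

```python
from fractions import Fraction as Fr

COEF = {2: (1, 0), 1: (0, 1), 0: (-1, 0)}        # AA, Aa, aa -> (d, h) coefficients


def fn_freq(n):
    """Single-locus genotype frequencies in F_n, obtained by selfing from an F1."""
    het = Fr(1, 2) ** (n - 1)
    return {2: (1 - het) / 2, 1: het, 0: (1 - het) / 2}


def class_mean(na, p, n):
    """Mean phenotype in F_n of all individuals with A-locus genotype na, averaging
    over the B locus (unlinked, so its F_n frequencies apply within each A class)."""
    xa, ya = COEF[na]
    total = Fr(0)
    for nb, f in fn_freq(n).items():
        xb, yb = COEF[nb]
        total += f * (p["da"] * xa + p["ha"] * ya + p["db"] * xb + p["hb"] * yb
                      + p["i"] * xa * xb + p["ja"] * xa * yb
                      + p["jb"] * ya * xb + p["l"] * ya * yb)
    return total


p = dict(da=5, db=3, ha=2, hb=4, i=6, ja=8, jb=-2, l=12)
for n in (2, 3, 4):
    big, het, small = (class_mean(g, p, n) for g in (2, 1, 0))
    mid = (big + small) / 2
    print(f"F{n}: mid-parent {mid}, AA {big - mid}, Aa {het - mid}, aa {small - mid}")
```

With these illustrative values the AA deviation is 9, 7 and 6, and the Aa deviation is 8, 5 and 7/2, in F2, F3 and F4 respectively.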
If we pass on to F3, taking the generation as a whole, the four complete homozygotes (AABB, etc.) each comprise $\tfrac{9}{64}$, the four single heterozygotes (AABb, etc.) each comprise $\tfrac{3}{32}$, and the doubly heterozygous genotype (AaBb) comprises $\tfrac{1}{16}$ of the individuals. The means of all AA, Aa and aa individuals are thus $d_a + \tfrac14 j_a + \tfrac14 h_b$, $h_a + \tfrac14 l + \tfrac14 h_b$ and $-(d_a + \tfrac14 j_a) + \tfrac14 h_b$, giving deviations of $(d_a + \tfrac14 j_a)$, $(h_a + \tfrac14 l)$ and $-(d_a + \tfrac14 j_a)$. It is not surprising, therefore, to find that the total variance of the F3 generation is

$$V_{F3} = V_{1F3} + V_{2F3} = \tfrac34\left(d_a + \tfrac14 j_a\right)^2 + \tfrac34\left(d_b + \tfrac14 j_b\right)^2 + \tfrac{3}{16}\left(h_a + \tfrac14 l\right)^2 + \tfrac{3}{16}\left(h_b + \tfrac14 l\right)^2 + \tfrac{9}{16} i^2 + \tfrac{9}{64} j_a^2 + \tfrac{9}{64} j_b^2 + \tfrac{9}{256} l^2$$
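the coefficients of the terms in $i^2$, $j^2$ and $l^2$ being once again the products of the coefficients of the relevant main effects. If we proceed further to find $V_{1F3}$ and $V_{2F3}$, we again find terms in $(d + \tfrac14 j)^2$ and $(h + \tfrac14 l)^2$, thus

$$V_{1F3} = \tfrac12\left(d_a + \tfrac14 j_a\right)^2 + \tfrac12\left(d_b + \tfrac14 j_b\right)^2 + \tfrac{1}{16}\left(h_a + \tfrac14 l\right)^2 + \tfrac{1}{16}\left(h_b + \tfrac14 l\right)^2 + \tfrac14 i^2 + \tfrac{1}{32} j_a^2 + \tfrac{1}{32} j_b^2 + \tfrac{1}{256} l^2$$

$$V_{2F3} = \tfrac14\left(d_a + \tfrac14 j_a\right)^2 + \tfrac14\left(d_b + \tfrac14 j_b\right)^2 + \tfrac18\left(h_a + \tfrac14 l\right)^2 + \tfrac18\left(h_b + \tfrac14 l\right)^2 + \tfrac{5}{16} i^2 + \tfrac{7}{64} j_a^2 + \tfrac{7}{64} j_b^2 + \tfrac{1}{32} l^2.$$

The coefficients of the interaction terms are again the products of the coefficients of the relevant main effects in $V_{1F3}$, but not in $V_{2F3}$. Indeed, since $V_{1F3} + V_{2F3} = V_{F3}$, the product rule cannot apply to $V_{2F3}$ if it applies to $V_{1F3}$ and $V_{F3}$.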
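The covariance of F2 parents and F3 means is

$$W_{1F23} = \tfrac12\left(d_a + \tfrac12 j_a\right)\left(d_a + \tfrac14 j_a\right) + \tfrac12\left(d_b + \tfrac12 j_b\right)\left(d_b + \tfrac14 j_b\right) + \tfrac18\left(h_a + \tfrac12 l\right)\left(h_a + \tfrac14 l\right) + \tfrac18\left(h_b + \tfrac12 l\right)\left(h_b + \tfrac14 l\right) + \tfrac14 i^2 + \tfrac{1}{16} j_a^2 + \tfrac{1}{16} j_b^2 + \tfrac{1}{64} l^2$$

the product rule applying once again to the interaction coefficients. The expressions involving $d_a$, $d_b$, $h_a$ and $h_b$ are, not surprisingly, the geometric means of their counterparts in $V_{1F2}$ and $V_{1F3}$. We can proceed to find similar expressions for $V_{1F4}$, $V_{2F4}$ and $V_{3F4}$, which all include terms in $(d_a + \tfrac18 j_a)$, $(d_b + \tfrac18 j_b)$, $(h_a + \tfrac18 l)$ and $(h_b + \tfrac18 l)$. The covariances $W_{1F34}$ and $W_{2F34}$ similarly include terms in $(d_a + \tfrac14 j_a)(d_a + \tfrac18 j_a)$, etc.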
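
These F3 and F2-F3 expressions can be checked by enumerating each F2 plant and its selfed F3 family. The sketch below is our own construction, assuming unlinked genes and using exact fractions. It takes $V_{1F3}$ as the variance of F3 family means, $V_{2F3}$ as the mean within-family variance and $W_{1F23}$ as the covariance of F2 plant and F3 family mean, and it compares all three with the formulae.

```python
from fractions import Fraction as Fr
from itertools import product

COEF = {2: (1, 0), 1: (0, 1), 0: (-1, 0)}        # AA, Aa, aa -> (d, h) coefficients
F2_FREQ = {2: Fr(1, 4), 1: Fr(1, 2), 0: Fr(1, 4)}
SELFED = {2: {2: Fr(1)}, 1: F2_FREQ, 0: {0: Fr(1)}}  # offspring of selfing each genotype


def phenotype(na, nb, p):
    xa, ya = COEF[na]
    xb, yb = COEF[nb]
    return (p["da"] * xa + p["ha"] * ya + p["db"] * xb + p["hb"] * yb
            + p["i"] * xa * xb + p["ja"] * xa * yb + p["jb"] * ya * xb + p["l"] * ya * yb)


def f3_statistics(p):
    """(V1F3, V2F3, W1F23) by enumerating F2 plants and their selfed F3 families."""
    rows = []  # (F2 frequency, F2 phenotype, F3 family mean, F3 within-family variance)
    for na, nb in product(COEF, repeat=2):
        kids = [(fa * fb, phenotype(ka, kb, p))
                for (ka, fa), (kb, fb) in product(SELFED[na].items(), SELFED[nb].items())]
        fam_mean = sum(f * y for f, y in kids)
        fam_var = sum(f * y * y for f, y in kids) - fam_mean ** 2
        rows.append((F2_FREQ[na] * F2_FREQ[nb], phenotype(na, nb, p), fam_mean, fam_var))
    f2_mean = sum(w * y for w, y, _, _ in rows)
    f3_mean = sum(w * m for w, _, m, _ in rows)
    v1 = sum(w * (m - f3_mean) ** 2 for w, _, m, _ in rows)
    v2 = sum(w * v for w, _, _, v in rows)
    w1 = sum(w * (y - f2_mean) * (m - f3_mean) for w, y, m, _ in rows)
    return v1, v2, w1


def f3_formulas(p):
    """The expressions for V1F3, V2F3 and W1F23 given in the text."""
    da2, db2 = p["da"] + Fr(1, 2) * p["ja"], p["db"] + Fr(1, 2) * p["jb"]
    da4, db4 = p["da"] + Fr(1, 4) * p["ja"], p["db"] + Fr(1, 4) * p["jb"]
    ha2, hb2 = p["ha"] + Fr(1, 2) * p["l"], p["hb"] + Fr(1, 2) * p["l"]
    ha4, hb4 = p["ha"] + Fr(1, 4) * p["l"], p["hb"] + Fr(1, 4) * p["l"]
    ii, jj, ll = p["i"] ** 2, p["ja"] ** 2 + p["jb"] ** 2, p["l"] ** 2
    v1 = (Fr(1, 2) * (da4 ** 2 + db4 ** 2) + Fr(1, 16) * (ha4 ** 2 + hb4 ** 2)
          + Fr(1, 4) * ii + Fr(1, 32) * jj + Fr(1, 256) * ll)
    v2 = (Fr(1, 4) * (da4 ** 2 + db4 ** 2) + Fr(1, 8) * (ha4 ** 2 + hb4 ** 2)
          + Fr(5, 16) * ii + Fr(7, 64) * jj + Fr(1, 32) * ll)
    w1 = (Fr(1, 2) * (da2 * da4 + db2 * db4) + Fr(1, 8) * (ha2 * ha4 + hb2 * hb4)
          + Fr(1, 4) * ii + Fr(1, 16) * jj + Fr(1, 64) * ll)
    return v1, v2, w1


p = dict(da=5, db=3, ha=2, hb=-1, i=4, ja=-3, jb=2, l=7)
print(f3_statistics(p) == f3_formulas(p))   # True
```
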
The various expressions relate only to the effects of two gene pairs, A-a and B-b. They require generalization in two ways. In the first place, in so far as further genes C-c, D-d, etc. are involved, their digenic interactions with A-a will be covered for F2 if for $(d_a + \tfrac12 j_a)$ we substitute $(d_a + \tfrac12 Sj_a)$ and for $(h_a + \tfrac12 l)$ we substitute $(h_a + \tfrac12 Sl_a)$, where $Sj_a$ is the sum of $j_{ab}$, $j_{ac}$, etc. and $Sl_a$ is the sum of $l_{ab}$, $l_{ac}$, etc. The further interactions with B-b are covered by the corresponding substitutions of $Sj_b$ and $Sl_b$ for $j_b$ and l. The second stage is the generalization of the expressions to cover all genes showing digenic interactions by writing

$$V_{1F2} = \tfrac12 D + \tfrac14 H + I$$

where

$$D = S\left(d_a + \tfrac12 Sj_a\right)^2, \quad H = S\left(h_a + \tfrac12 Sl_a\right)^2 \quad\text{and}\quad I = \tfrac14 S(i^2) + \tfrac18 S(j^2) + \tfrac{1}{16} S(l^2).$$

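Similarly $V_{1F3} = \tfrac12 D + \tfrac{1}{16} H + I$ and $V_{2F3} = \tfrac14 D + \tfrac18 H + I$, where

$$D = S\left(d_a + \tfrac14 Sj_a\right)^2 \quad\text{and}\quad H = S\left(h_a + \tfrac14 Sl_a\right)^2,$$

with $I = \tfrac14 S(i^2) + \tfrac{1}{32} S(j^2) + \tfrac{1}{256} S(l^2)$ in $V_{1F3}$ and $I = \tfrac{5}{16} S(i^2) + \tfrac{7}{64} S(j^2) + \tfrac{1}{32} S(l^2)$ in $V_{2F3}$. Each expression should of course have an appropriate E attached to it to accommodate non-heritable variation.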
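
The F2 generalization can be checked for more than two genes by brute force. This sketch is our own, and its integer parameter values are arbitrary and hypothetical. In it, `j[a, b]` denotes the interaction of $d_a$ with $h_b$, so that $Sj_a$ is the sum of `j[a, b]` over b. The function `check(k)` reports whether the enumerated $V_{1F2}$ equals $\tfrac12 D + \tfrac14 H + I$.

```python
from fractions import Fraction as Fr
from itertools import combinations, permutations, product

COEF = {2: (1, 0), 1: (0, 1), 0: (-1, 0)}        # per locus: AA, Aa, aa -> (d, h) coefficients
F2_FREQ = {2: Fr(1, 4), 1: Fr(1, 2), 0: Fr(1, 4)}


def make_params(k):
    """Arbitrary (hypothetical) integer values for k genes and all digenic interactions."""
    d = {a: a + 2 for a in range(k)}
    h = {a: 1 - 2 * a for a in range(k)}
    i = {(a, b): a + 2 * b - 1 for a, b in combinations(range(k), 2)}
    l = {(a, b): 3 - a * b for a, b in combinations(range(k), 2)}
    j = {(a, b): a - b + 2 for a, b in permutations(range(k), 2)}  # j[a, b]: d_a with h_b
    return d, h, i, j, l


def f2_variance(k, d, h, i, j, l):
    """V1F2 by enumerating all 3**k F2 genotypes with their frequencies."""
    mean = square = Fr(0)
    for geno in product(COEF, repeat=k):
        x = [COEF[g][0] for g in geno]
        y = [COEF[g][1] for g in geno]
        freq = Fr(1)
        for g in geno:
            freq *= F2_FREQ[g]
        value = sum(d[a] * x[a] + h[a] * y[a] for a in range(k))
        value += sum(i[a, b] * x[a] * x[b] + l[a, b] * y[a] * y[b] for a, b in i)
        value += sum(j[a, b] * x[a] * y[b] for a, b in j)
        mean += freq * value
        square += freq * value * value
    return square - mean ** 2


def check(k):
    """True if the enumerated V1F2 equals 1/2 D + 1/4 H + I as defined in the text."""
    d, h, i, j, l = make_params(k)
    D = sum((d[a] + Fr(1, 2) * sum(j[a, b] for b in range(k) if b != a)) ** 2
            for a in range(k))
    H = sum((h[a] + Fr(1, 2) * sum(l[min(a, b), max(a, b)] for b in range(k) if b != a)) ** 2
            for a in range(k))
    I = (Fr(1, 4) * sum(v * v for v in i.values())
         + Fr(1, 8) * sum(v * v for v in j.values())
         + Fr(1, 16) * sum(v * v for v in l.values()))
    return f2_variance(k, d, h, i, j, l) == Fr(1, 2) * D + Fr(1, 4) * H + I


print([check(k) for k in (2, 3, 4)])   # [True, True, True]
```
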
Leaving aside the term I for the moment, these expressions are the same as those already found for $V_{1F2}$, etc., in the absence of interaction. The effects of interaction are accommodated by changing the definition of D from $S(d_a^2)$ to $S(d_a + wSj_a)^2$ and that of H from $S(h_a^2)$ to $S(h_a + wSl_a)^2$, the coefficient w changing with the generation, being $\tfrac12$ in F2, $\tfrac14$ in F3, $\tfrac18$ in F4 and so on. Evidence of non-allelic interaction, at least of the j and l types, is thus provided by a test of homogeneity of D and H over generations. The term I is a distraction in such a test. It too is inhomogeneous over generations, but it is also inhomogeneous within generations. Short of the cumbersome and demanding estimation and testing of $S(i^2)$, $S(j^2)$ and $S(l^2)$ as individual parameters, it is not easy to deal with the inhomogeneity of I without assuming some relation between $S(i^2)$, $S(j^2)$ and $S(l^2)$. There has as yet been insufficient study of interaction to provide any basis for the handling of I. Indeed, beyond demonstrating that interactions are exerting their effects in distorting the second degree statistics from which we estimate D and H, experimental studies have provided little information about the way their consequences are revealed by these second degree statistics.
We are assuming that genes A-a and B-b are unlinked. It therefore makes no difference to the variances and covariances of F2, F3, etc., and indeed to S2, S3 and other generations derived directly from the initial cross, whether this was AABB × aabb or AAbb × aaBB. This is not true, however, of the statistics obtained from families obtained by back-crossing to the parents. Just as we have seen to be the case with the means of the back-cross families, their variances differ according to whether the genes were associated or dispersed in the parents of the cross. Again, just as in the case of the means (Table 34), these differences appear in the signs the interaction parameters take in the various terms of the variances. This is well illustrated by the sum of the variances of the two back-crosses, which is

$$V_{B1} + V_{B2} = \tfrac12\left(d_a + \tfrac12 j_a \mp \tfrac12 j_b\right)^2 + \tfrac12\left(d_b \mp \tfrac12 j_a + \tfrac12 j_b\right)^2 + \tfrac12\left(h_a \mp \tfrac12 i + \tfrac12 l\right)^2 + \tfrac12\left(h_b \mp \tfrac12 i + \tfrac12 l\right)^2 + \tfrac18 (i \pm l)^2 + \tfrac18 (j_a \pm j_b)^2$$

where, in the case of a double sign, the upper one applies where the genes were associated in the parents of the cross (AABB × aabb) and the lower where they were dispersed (AAbb × aaBB). It will be seen too that the summed variances of the back-crosses differ from the variances of F2, F3, etc. not only in their dependence on the distribution of the genes between the parents, but also in the interaction items which are associated with d and h in the relevant terms. Thus in $V_{B1} + V_{B2}$ both j interactions appear, with appropriate signs, in both the $d_a$ and $d_b$ terms, and i appears as well as l in the h terms. Furthermore, in the purely interaction terms themselves, i, j and l do not contribute separately, but i is always joined with l and $j_a$ with $j_b$. Once again D and H as they appear in $V_{B1} + V_{B2}$ will be inhomogeneous with D and H as they appear in F2, etc., as will also the I term. $V_{B1} + V_{B2}$ can thus be brought directly into the test of second degree statistics for the effects of non-allelic interaction.
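
The signs in $V_{B1} + V_{B2}$ can be confirmed by enumerating the four equally frequent genotypes of each back-cross. In this sketch, where the names are our own and genes are assumed unlinked, `s = +1` takes the upper signs (associated genes) and `s = -1` the lower (dispersed genes).

```python
from fractions import Fraction as Fr
from itertools import product

COEF = {2: (1, 0), 1: (0, 1), 0: (-1, 0)}        # AA, Aa, aa -> (d, h) coefficients


def phenotype(na, nb, p):
    xa, ya = COEF[na]
    xb, yb = COEF[nb]
    return (p["da"] * xa + p["ha"] * ya + p["db"] * xb + p["hb"] * yb
            + p["i"] * xa * xb + p["ja"] * xa * yb + p["jb"] * ya * xb + p["l"] * ya * yb)


def backcross_variance(parent, p):
    """Variance of F1 (AaBb) x true-breeding parent, e.g. (2, 2) = AABB; unlinked genes."""
    kids = [phenotype(na, nb, p) for na, nb in product((parent[0], 1), (parent[1], 1))]
    mean = Fr(sum(kids), 4)
    return Fr(sum(y * y for y in kids), 4) - mean ** 2


def summed_formula(p, s):
    """V_B1 + V_B2 from the text; s = +1 for associated (upper signs), -1 for dispersed."""
    half, eighth = Fr(1, 2), Fr(1, 8)
    return (half * (p["da"] + half * p["ja"] - s * half * p["jb"]) ** 2
            + half * (p["db"] - s * half * p["ja"] + half * p["jb"]) ** 2
            + half * (p["ha"] - s * half * p["i"] + half * p["l"]) ** 2
            + half * (p["hb"] - s * half * p["i"] + half * p["l"]) ** 2
            + eighth * (p["i"] + s * p["l"]) ** 2
            + eighth * (p["ja"] + s * p["jb"]) ** 2)


p = dict(da=5, db=3, ha=2, hb=-1, i=4, ja=-3, jb=2, l=7)
associated = backcross_variance((2, 2), p) + backcross_variance((0, 0), p)
dispersed = backcross_variance((2, 0), p) + backcross_variance((0, 2), p)
print(associated == summed_formula(p, 1), dispersed == summed_formula(p, -1))  # True True
```
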
With back-crosses the effects of interaction in inflating or reducing
the variances will depend, at least in some measure, on the association
or dispersion of the genes in the parents. In F2 and its derived gener-
ations this is not the case: inflation or reduction of the variances de-
pends only on the direction and nature of the interaction, that is on the
intrinsic signs of the interaction parameters themselves. In
~F2 = t{da + tia)2 + t{db + !jb)2 + !Cha + !l)2 + !Ch b + !/)2
+ ti 2 + lj/ + ijb 2 + -hZZ
i will always tend to increase the variance, but j will tend to increase it when positive and generally to decrease it when negative. Equally, l will tend to increase the variance when of the same sign as h but will generally decrease it when of opposite sign. Thus in complementary type interaction where, as we saw in Section 19, j must always be positive and i and l the same sign as h, the interaction must always inflate the value of V1F2, to an extent depending on θ. It will equally inflate the variances in F3, F4, etc., although to varying degrees depending especially on w, the coefficient of j and l in the terms contributing to D and H. Equally in duplicate type interaction, where j is always negative while i and l are of opposite sign to h, the interaction will tend to reduce the variances of F2 and its derived generations, again to varying degrees depending on the value of θ, until θ attains the critical ratio where the depressing effect of j on ½(d + ½j)² is offset by the increase due to the ⅛j² term, and the effect of l on ¼(h + ½l)² is offset by the term (1/16)l². This ratio is reached in F2 when θ = -1.6 where only two gene pairs are involved in the interaction, but because of the cumulative effects of the interaction in the D and H components it is attained at values of θ nearer to zero as the number of genes involved in the system rises. Thus complementary interactions always tend to raise the variances of F2 and its derived generations, but duplicate interaction tends to reduce these variances at least when θ is not unduly large in magnitude (see Fig. 12).
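The critical value θ = -1.6 for two gene pairs can be checked directly. This is a minimal sketch, assuming two unlinked genes, the same two-gene parameterization as above, and the symmetric case of Fig. 12 (all d = all h = d, all i = j = l = θd); it enumerates the nine F2 genotypes and compares the result with the closed form for V1F2. With d = 1 both reduce to 3/2 + (3/2)θ + (15/16)θ², which returns to its no-interaction value 3/2 at θ = -8/5. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product

F2_FREQ = {1: Fraction(1, 4), 0: Fraction(1, 2), -1: Fraction(1, 4)}  # AA, Aa, aa

def value(xa, xb, d, theta):
    """Deviation from m with all d = all h = d and i = ja = jb = l = theta * d."""
    ya, yb = int(xa == 0), int(xb == 0)
    return d * (xa + xb + ya + yb) + theta * d * (xa * xb + xa * yb + ya * xb + ya * yb)

def v1f2_enumerated(theta, d=1):
    """Heritable V1F2 from the nine genotypes of an F2 with two unlinked genes."""
    theta, d = Fraction(theta), Fraction(d)
    cells = [(F2_FREQ[xa] * F2_FREQ[xb], value(xa, xb, d, theta))
             for xa, xb in product(F2_FREQ, repeat=2)]
    mean = sum(f * g for f, g in cells)
    return sum(f * (g - mean) ** 2 for f, g in cells)

def v1f2_formula(theta, d=1):
    """Closed form for V1F2 with da = db = ha = hb = d and i = ja = jb = l = theta*d."""
    theta, d = Fraction(theta), Fraction(d)
    i = j = l = theta * d
    h = d
    return (2 * Fraction(1, 2) * (d + j / 2) ** 2     # d terms, one per gene
            + 2 * Fraction(1, 4) * (h + l / 2) ** 2   # h terms, one per gene
            + Fraction(1, 4) * i ** 2
            + 2 * Fraction(1, 8) * j ** 2             # ja and jb
            + Fraction(1, 16) * l ** 2)

print(v1f2_enumerated(0), v1f2_enumerated(Fraction(-8, 5)))   # 3/2 3/2
```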
[Fig. 12: V1F2 plotted against θ, with duplicate interaction to the left of θ = 0 and complementary interaction to the right; curves for 2 and 5 gene pairs]

Fig. 12. The effect on V1F2 of complementary and duplicate type interaction, measured by θ, in the cases of 2 and 5 segregating gene pairs. In each of the two cases all d = all h, and all i = all j = all l = θd. In both the 2 and 5 gene cases the values of V1F2 are scaled to be 1 when there is no interaction (θ = 0).

22. Correlated gene distributions: linkage


The second cause of non-independence of the effects of the various
genes on the phenotype is the correlation of their distributions among
the individuals of the families, groups or generations under observation.
In the generations derived from a cross between true-breeding parents,
the primary cause of correlated distributions of the genes is linkage, to whose consequences we must turn first.
Consider first the consequences of linkage for the mean expression of the character. Now, of itself, linkage does not affect the frequencies with which the alleles of each gene pair are recovered in segregating generations: it only leads to particular combinations of the alleles of different
gene pairs appearing with frequencies other than those expected on the
basis of independence. In the absence of non-allelic interaction the in-
crements added to the phenotype by the various gene pairs are additive
and the average effect of a gene on the phenotype will be the same,
apart from sampling variation, no matter what its linkage relations may
be: the relative frequencies of particular combinations in which the
alleles occur with other non-allelic genes will have no effect, because
every one which is over-common will be balanced by another which is
correspondingly rare. Linkage therefore can of itself have no effect on
the mean measurements of segregating families provided that no non-
allelic interaction is present; and indeed the same will be true of any
correlation of gene distribution whatever its cause, provided it does not
alter the frequencies with which the combinations of allelic genes are
recovered.
Thus linkage will not vitiate the use of the scaling tests for detecting
departures from the assumption of no non-allelic interaction. At the
same time however, where non-allelic interaction is indeed present,
linkage will affect the contribution of this interaction to the mean ex-
pressions of segregating generations: since it determines the relative fre-
quencies with which different combinations of non-allelic genes appear,
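it will determine the frequencies with which the different types of interaction, i, j and l, arise. This is, however, a complex subject (see M and J, Section 18), which we will not pursue beyond noting that where the frequency of recombination between A-a and B-b is p, and q = 1 - p, the means of F2, B1 and B2 are respectively

$$\bar{F}_2 = m + \tfrac{1}{2}(h_a + h_b) \pm \tfrac{1}{2}(1 - 2p)\,i + \tfrac{1}{2}(1 - 2pq)\,l$$

$$\bar{B}_1 = m + \tfrac{1}{2}(d_a \pm d_b) + \tfrac{1}{2}(h_a + h_b) \pm \tfrac{1}{2}(1 - p)\,i + \tfrac{1}{2}p\,(j_a \pm j_b) + \tfrac{1}{2}(1 - p)\,l$$

$$\bar{B}_2 = m - \tfrac{1}{2}(d_a \pm d_b) + \tfrac{1}{2}(h_a + h_b) \pm \tfrac{1}{2}(1 - p)\,i - \tfrac{1}{2}p\,(j_a \pm j_b) + \tfrac{1}{2}(1 - p)\,l$$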
where in the case of a double sign the upper refers to coupled genes and
the lower to repulsion, as association and dispersion may properly be
styled in the case of linkage.
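These three means can be verified by brute force. The sketch below assumes the same two-gene parameterization as before and codes alleles as 1 (A, B) and 0 (a, b). It builds the F1 gametes for a recombination frequency p in coupling (AB/ab) or repulsion (Ab/aB), forms F2, B1 (F1 × AABB, or × AAbb in repulsion) and B2 (F1 × aabb, or × aaBB in repulsion), and compares each mean, as a deviation from m, with the closed forms, using s = +1 for coupling and s = -1 for repulsion in place of the double sign. All names are illustrative.

```python
from fractions import Fraction
from itertools import product

NAMES = ("da", "db", "ha", "hb", "i", "ja", "jb", "l")

def value(g1, g2, P):
    """Deviation from m for the genotype formed by gametes g1 and g2."""
    xa, xb = g1[0] + g2[0] - 1, g1[1] + g2[1] - 1
    ya, yb = int(xa == 0), int(xb == 0)
    return (P["da"] * xa + P["db"] * xb + P["ha"] * ya + P["hb"] * yb
            + P["i"] * xa * xb + P["ja"] * xa * yb + P["jb"] * ya * xb
            + P["l"] * ya * yb)

def f1_gametes(p, coupling):
    """Gamete frequencies from AB/ab (coupling) or Ab/aB (repulsion)."""
    q = 1 - p
    if coupling:
        return {(1, 1): q / 2, (0, 0): q / 2, (1, 0): p / 2, (0, 1): p / 2}
    return {(1, 0): q / 2, (0, 1): q / 2, (1, 1): p / 2, (0, 0): p / 2}

def enumerated_means(p, P, coupling):
    gam = f1_gametes(p, coupling)
    p1, p2 = ((1, 1), (0, 0)) if coupling else ((1, 0), (0, 1))  # parent gametes
    f2 = sum(fx * fy * value(x, y, P)
             for (x, fx), (y, fy) in product(gam.items(), repeat=2))
    b1 = sum(f * value(x, p1, P) for x, f in gam.items())
    b2 = sum(f * value(x, p2, P) for x, f in gam.items())
    return f2, b1, b2

def closed_form_means(p, P, coupling):
    s, q = (1 if coupling else -1), 1 - p
    h = (P["ha"] + P["hb"]) / 2
    f2 = h + s * (1 - 2 * p) * P["i"] / 2 + (1 - 2 * p * q) * P["l"] / 2
    common = h + s * q * P["i"] / 2 + q * P["l"] / 2
    d = (P["da"] + s * P["db"]) / 2
    j = p * (P["ja"] + s * P["jb"]) / 2
    return f2, common + d + j, common - d - j

P = {k: Fraction(v) for k, v in zip(NAMES, (3, 2, 1, -1, 2, 1, -3, 5))}
print(enumerated_means(Fraction(1, 10), P, True) == closed_form_means(Fraction(1, 10), P, True))
```

At p = ½ the back-cross coefficients reduce to the familiar ¼ for i, j and l, which is a quick sanity check on the closed forms.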
Turning to second degree statistics, consider the simplest case of two
genes A-a and B-b. Where the frequency of recombination between them
is again p and they are in coupling the ten genotypes are expected in F2
with the frequencies shown in Table 37, which also shows the phenotypic
deviations from m and the mean phenotypes of the corresponding F3
families, both on the assumption of no non-allelic interaction. The mean of F2, expressed as a deviation from m, is ½(ha + hb), being unaffected by linkage. The heritable variance of F2 is found as
TABLE 37.
Frequencies, F2 phenotypes and F3 mean phenotypes of the ten genotypic classes in an F2 for two coupled genes. In each cell the uppermost entry is the frequency, the middle entry is the F2 phenotype (expressed as a deviation from the mid-parent) and the lowest is the F3 mean. All frequencies should be divided by four. C indicates coupling and R repulsion double heterozygotes

          AA              Aa                        aa
BB        q²              2pq                       p²
          da + db         ha + db                   -da + db
          da + db         ½ha + db                  -da + db
Bb        2pq             2q² (C); 2p² (R)          2pq
          da + hb         ha + hb                   -da + hb
          da + ½hb        ½ha + ½hb                 -da + ½hb
bb        p²              2pq                       q²
          da - db         ha - db                   -da - db
          da - db         ½ha - db                  -da - db

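$$V_{1F2} = \tfrac{1}{4}\left[q^2(d_a + d_b)^2 + 2pq(h_a + d_b)^2 + \ldots + q^2(-d_a - d_b)^2\right] - \left[\tfrac{1}{2}(h_a + h_b)\right]^2$$

$$= \tfrac{1}{2}\left[d_a^2 + d_b^2 + 2(1 - 2p)\,d_a d_b\right] + \tfrac{1}{4}\left[h_a^2 + h_b^2 + 2(1 - 2p)^2 h_a h_b\right].$$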

The two hitherto unfamiliar terms in this expression involve the recombination value, combined in one case with da db and in the other with ha hb. With free recombination p = ½, 1 - 2p = 0 and the new terms vanish to leave the expressions obtained in Section 11. With complete linkage p = 0, 1 - 2p = 1 and, aside from non-heritable variation, V1F2 = ½(da + db)² + ¼(ha + hb)². The two genes are then acting as one. Even where recombination occurs, however, the recombinant genotypes will be rare if p is small, and the genes will effectively act as one except in so far as selection may isolate one of the rare recombinants.
Where the genes are in repulsion the heritable variance of F2 becomes
$$V_{1F2} = \tfrac{1}{2}\left[d_a^2 + d_b^2 - 2(1 - 2p)\,d_a d_b\right] + \tfrac{1}{4}\left[h_a^2 + h_b^2 + 2(1 - 2p)^2 h_a h_b\right].$$

The sign of the term in da db is changed but, as would be expected, that in ha hb remains the same. It should be noted, however, that ha hb will be positive only if ha and hb are reinforcing one another by acting in the same direction. If they are opposing one another in action this term will take a negative sign. Thus reinforcement versus opposition of the h's resembles coupling versus repulsion of the genes in its effects on the signs of the term in p. It should be remembered nevertheless that reinforcement versus opposition is a physiological distinction while coupling versus repulsion is a mechanical one.
If we now write
$$D = d_a^2 + d_b^2 \pm 2(1 - 2p)\,d_a d_b \quad\text{and}\quad H = h_a^2 + h_b^2 + 2(1 - 2p)^2 h_a h_b$$

where the ± of the term in da db denotes + for coupling and - for repulsion, we can put

$$V_{1F2} = \tfrac{1}{2}D + \tfrac{1}{4}H + E.$$

Furthermore, it is easy to show by reference to Table 37 that

$$V_{1F3} = \tfrac{1}{2}D + \tfrac{1}{16}H + E$$

$$W_{1F23} = \tfrac{1}{2}D + \tfrac{1}{8}H$$

with the same definitions of D and H.
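The three rank 1 statistics can be read off Table 37 directly. The sketch below is a minimal illustration with arbitrary effects held as exact fractions: it lists the ten classes of the table (frequencies already divided by four) and computes V1F2, V1F3 and W1F23 as the heritable variance of F2, the heritable variance of F3 family means and their covariance, leaving E out. An F2 from a repulsion F1 has the same table with p and q interchanged, which the sketch obtains by passing 1 - p. The function names are illustrative.

```python
from fractions import Fraction

def table37(p, da, db, ha, hb):
    """(frequency, F2 phenotype, F3 mean) for the ten classes of Table 37."""
    q = 1 - p
    return [
        (q * q / 4,     da + db,  da + db),         # AABB
        (2 * p * q / 4, ha + db,  ha / 2 + db),     # AaBB
        (p * p / 4,     -da + db, -da + db),        # aaBB
        (2 * p * q / 4, da + hb,  da + hb / 2),     # AABb
        (2 * q * q / 4, ha + hb,  (ha + hb) / 2),   # AaBb, coupling (C)
        (2 * p * p / 4, ha + hb,  (ha + hb) / 2),   # AaBb, repulsion (R)
        (2 * p * q / 4, -da + hb, -da + hb / 2),    # aaBb
        (p * p / 4,     da - db,  da - db),         # AAbb
        (2 * p * q / 4, ha - db,  ha / 2 - db),     # Aabb
        (q * q / 4,     -da - db, -da - db),        # aabb
    ]

def rank1_statistics(rows):
    """Heritable V1F2, V1F3 and W1F23 from (frequency, F2, F3 mean) rows."""
    m2 = sum(f * x for f, x, _ in rows)
    m3 = sum(f * y for f, _, y in rows)
    v1f2 = sum(f * (x - m2) ** 2 for f, x, _ in rows)
    v1f3 = sum(f * (y - m3) ** 2 for f, _, y in rows)
    w1f23 = sum(f * (x - m2) * (y - m3) for f, x, y in rows)
    return v1f2, v1f3, w1f23

def expected(p, da, db, ha, hb, coupling):
    """(1/2 D + 1/4 H, 1/2 D + 1/16 H, 1/2 D + 1/8 H) with the rank 1 D and H."""
    s = 1 if coupling else -1
    D = da ** 2 + db ** 2 + s * 2 * (1 - 2 * p) * da * db
    H = ha ** 2 + hb ** 2 + 2 * (1 - 2 * p) ** 2 * ha * hb
    return (D / 2 + H / 4, D / 2 + H / 16, D / 2 + H / 8)

effects = [Fraction(v) for v in (3, 2, 2, 1)]   # da, db, ha, hb (illustrative)
p = Fraction(1, 5)
print(rank1_statistics(table37(p, *effects)) == expected(p, *effects, True))       # True
print(rank1_statistics(table37(1 - p, *effects)) == expected(p, *effects, False))  # True
```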


This revision of the definition of D and H, by comparison with those
of Section 12, accommodates the effect of linkage on the variation as
expressed in any variance or covariance of rank 1 (indicated by the initial 1 in the subscript of, for example, V1F2). Now just as the mean of F2 is unaffected by the linkage relations of the genes in the F1 from which it is derived, the mean of an F3 family is unaffected by the linkage relations of the genes in its F2 parent. Thus the means of the F3 families will show the effects of linkage only by virtue of the frequencies with which the different genotypes appear in F2, that is in exactly the same way as does the F2 itself. Hence V1F2, V1F3 and by derivation W1F23 all depend on the same D and H, which themselves reflect the recombination that occurred at gametogenesis in the F1. When we turn to the mean variance of the F3 families the situation is different. The frequencies of the different types of F3 family, each with its own variance, will of course reflect recombination at gametogenesis in the F1, but their individual variances, at least in the families derived from doubly heterozygous F2 individuals, will reflect recombination at gametogenesis in the F2. Thus the mean variance of F3 is of rank 2, because it shows the effects of two rounds of recombination, just as rank 1 variances show the effects of only one round of recombination. It is not surprising therefore that while the mean variance of F3 can still be written as V2F3 = ¼D + ⅛H + E, the definitions of D and H have changed to

$$D = d_a^2 + d_b^2 \pm 2(1 - 2p)^2 d_a d_b \quad\text{and}$$

$$H = h_a^2 + h_b^2 + 2(1 - 2p)^2(1 - 2p + 2p^2)\,h_a h_b.$$
The same definition will apply to V2F4 and W2F34, the rank 2 statistics of F4, just as the rank 1 definition will apply to V1F4 and W1F34. The mean variance of F4 families will, however, by extension of the argument reflect three rounds of recombination, at gametogenesis in F1, F2 and F3, and hence will be of rank 3 as is denoted by its being written as V3F4. The rank 3 components of variation which appear in V3F4 are

$$D = d_a^2 + d_b^2 \pm 2(1 - 2p)^3 d_a d_b \quad\text{and}$$

$$H = h_a^2 + h_b^2 + 2(1 - 2p)^2(1 - 2p + 2p^2)^2 h_a h_b.$$
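The rank 2 and rank 3 definitions can be checked by following the genotype distribution through successive generations of selfing. The sketch below is a minimal illustration that assumes no non-allelic interaction, alleles coded 1 and 0, and the same recombination frequency p at every meiosis. It computes V1F2, V2F3 and V3F4 as mean within-family variances (heritable part only) and compares them with ½D + ¼H, ¼D + ⅛H and ⅛D + (1/16)H using the rank 1, 2 and 3 definitions. The coefficients ⅛ and 1/16 for V3F4 are not given in the text; they are taken here from the halving of heterozygosity at each selfing, an assumption of this sketch. All function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def gametes(geno, p):
    """Gametes of a genotype held as two haplotypes ((A, B), (A, B)), alleles 1/0."""
    (a1, b1), (a2, b2) = geno
    out = {}
    for g, f in (((a1, b1), (1 - p) / 2), ((a2, b2), (1 - p) / 2),
                 ((a1, b2), p / 2), ((a2, b1), p / 2)):
        out[g] = out.get(g, 0) + f
    return out

def selfed(geno, p):
    """Offspring distribution from selfing; phase is kept, haplotype order is not."""
    gam = gametes(geno, p)
    out = {}
    for (g1, f1), (g2, f2) in product(gam.items(), repeat=2):
        key = tuple(sorted((g1, g2)))
        out[key] = out.get(key, 0) + f1 * f2
    return out

def value(geno, da, db, ha, hb):
    (a1, b1), (a2, b2) = geno
    xa, xb = a1 + a2 - 1, b1 + b2 - 1
    return da * xa + db * xb + ha * int(xa == 0) + hb * int(xb == 0)

def variance(dist, eff):
    mean = sum(f * value(g, *eff) for g, f in dist.items())
    return sum(f * (value(g, *eff) - mean) ** 2 for g, f in dist.items())

def next_generation(dist, p):
    out = {}
    for g, f in dist.items():
        for g2, f2 in selfed(g, p).items():
            out[g2] = out.get(g2, 0) + f * f2
    return out

def mean_family_variance(parents, p, eff):
    """Average variance within the selfed families of the given parents."""
    return sum(f * variance(selfed(g, p), eff) for g, f in parents.items())

def components(p, eff, rank, coupling):
    """D and H of rank 1, 2 or 3 as defined in the text."""
    da, db, ha, hb = eff
    s = 1 if coupling else -1
    D = da ** 2 + db ** 2 + s * 2 * (1 - 2 * p) ** rank * da * db
    H = (ha ** 2 + hb ** 2
         + 2 * (1 - 2 * p) ** 2 * (1 - 2 * p + 2 * p * p) ** (rank - 1) * ha * hb)
    return D, H

def check(p, eff, coupling):
    """Observed (V1F2, V2F3, V3F4) and their expectations."""
    f1_key = ((0, 0), (1, 1)) if coupling else ((0, 1), (1, 0))   # AB/ab or Ab/aB
    f1 = {f1_key: Fraction(1)}
    f2 = next_generation(f1, p)
    f3 = next_generation(f2, p)
    observed = [mean_family_variance(g, p, eff) for g in (f1, f2, f3)]
    weights = [(Fraction(1, 2), Fraction(1, 4)), (Fraction(1, 4), Fraction(1, 8)),
               (Fraction(1, 8), Fraction(1, 16))]
    expected = []
    for rank, (wd, wh) in enumerate(weights, start=1):
        D, H = components(p, eff, rank, coupling)
        expected.append(wd * D + wh * H)
    return observed, expected

obs, exp = check(Fraction(1, 10), [Fraction(v) for v in (3, 2, 2, 1)], True)
print(obs == exp)
```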
When we turn to the back-crosses we find that, as might now be expected, while VB1 + VB2 can still be written as ½D + ½H + E the definitions of D and H reflect the effects of the linkage, being D = da² + db² ± 2(1 - 2p)da db and H = ha² + hb² + 2(1 - 2p)ha hb. The definition of D is the same as in V1F2 but that of H is different from any that we have seen before. If we go on further to the generations derived from the back-crosses we find the same thing: the effects of linkage are accommodated by characteristic changes in the definitions of D and H which reflect the number of rounds of recombination, just as they do in the generations derived from F2.
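The back-cross definitions can be checked in the same way. The sketch below assumes no non-allelic interaction and alleles coded 1 and 0; it back-crosses a coupling (AB/ab) or repulsion (Ab/aB) F1 to each parent and compares the heritable part of VB1 + VB2 with ½D + ½H, using D = da² + db² ± 2(1 - 2p)da db and H = ha² + hb² + 2(1 - 2p)ha hb. The function names are illustrative.

```python
from fractions import Fraction

def f1_gametes(p, coupling):
    q = 1 - p
    if coupling:   # AB/ab
        return {(1, 1): q / 2, (0, 0): q / 2, (1, 0): p / 2, (0, 1): p / 2}
    return {(1, 0): q / 2, (0, 1): q / 2, (1, 1): p / 2, (0, 0): p / 2}   # Ab/aB

def value(g1, g2, da, db, ha, hb):
    """Deviation from m, no non-allelic interaction."""
    xa, xb = g1[0] + g2[0] - 1, g1[1] + g2[1] - 1
    return da * xa + db * xb + ha * int(xa == 0) + hb * int(xb == 0)

def backcross_variance(gam, parent_gamete, eff):
    mean = sum(f * value(g, parent_gamete, *eff) for g, f in gam.items())
    return sum(f * (value(g, parent_gamete, *eff) - mean) ** 2 for g, f in gam.items())

def summed_backcross_variance(p, eff, coupling):
    gam = f1_gametes(p, coupling)
    parents = ((1, 1), (0, 0)) if coupling else ((1, 0), (0, 1))   # P1 and P2 gametes
    return sum(backcross_variance(gam, g, eff) for g in parents)

def expected(p, eff, coupling):
    da, db, ha, hb = eff
    s = 1 if coupling else -1
    D = da ** 2 + db ** 2 + s * 2 * (1 - 2 * p) * da * db
    H = ha ** 2 + hb ** 2 + 2 * (1 - 2 * p) * ha * hb
    return D / 2 + H / 2

eff = [Fraction(v) for v in (3, 2, 2, 1)]   # da, db, ha, hb (illustrative)
print(summed_backcross_variance(Fraction(1, 5), eff, True) == expected(Fraction(1, 5), eff, True))
```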
Unlike non-allelic interaction, linkage cannot of itself be detected and
its effects measured by the analysis of means. We must go directly to
second degree statistics for this purpose. Before we can do so, however,
we must generalize the results we have obtained from the combination
of two genes to cover any number of them. Now for every two genes
there will be a potential term in da db and another in ha hb, that in da db taking sign according to the phase of linkage, coupling or repulsion. We can thus write general expressions for D and H in the form

$$D = S(d^2) + S\left[\pm 2(1 - 2p)\,d_a d_b\right] \quad\text{and}\quad H = S(h^2) + S\left[2(1 - 2p)^2 h_a h_b\right]$$

for rank 1 variances and covariances; in the form

$$D = S(d^2) + S\left[\pm 2(1 - 2p)^2 d_a d_b\right] \quad\text{and}\quad H = S(h^2) + S\left[2(1 - 2p)^2(1 - 2p + 2p^2)\,h_a h_b\right]$$

for rank 2, and so on for rank 3 components and for back-crosses. With k genes there will be k items each to sum in S(d²) and S(h²) and ½k(k - 1) items in S[± 2(1 - 2p)da db] etc. and S[2(1 - 2p)² ha hb] etc.
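The general expressions are simple sums over genes and over pairs of genes. The sketch below is a hypothetical helper, not anything in the text: it evaluates D and H for k genes in the F2 series at rank 1, 2 or 3, given each gene's d and h, a matrix of pairwise recombination frequencies, and an orientation of +1 or -1 for each gene saying which parent carried its increasing allele (two genes with the same orientation are coupled, otherwise in repulsion). Describing phase through orientation keeps the pairwise phases mutually consistent.

```python
from fractions import Fraction
from itertools import combinations

def components(d, h, p, orientation, rank=1):
    """D and H for k genes (no interaction) in the F2 series.

    d, h        : per-gene additive and dominance increments
    p           : p[a][b], recombination frequency between genes a and b (a < b)
    orientation : +1 if a gene's increasing allele came from P1, -1 if from P2
    rank        : 1, 2 or 3, the ranks for which the text gives definitions
    """
    if rank not in (1, 2, 3):
        raise ValueError("definitions are given here for ranks 1 to 3 only")
    D = sum(x * x for x in d)                      # S(d^2)
    H = sum(x * x for x in h)                      # S(h^2)
    for a, b in combinations(range(len(d)), 2):    # the k(k - 1)/2 pairs
        r = 1 - 2 * p[a][b]
        phase = orientation[a] * orientation[b]    # +1 coupling, -1 repulsion
        D += phase * 2 * r ** rank * d[a] * d[b]
        H += 2 * r ** 2 * (1 - 2 * p[a][b] + 2 * p[a][b] ** 2) ** (rank - 1) * h[a] * h[b]
    return D, H

tight = [[Fraction(1, 10)] * 3 for _ in range(3)]
print(components([1, 1, 1], [1, 1, 1], tight, [1, 1, 1]))    # all coupled
print(components([1, 1, 1], [1, 1, 1], tight, [1, -1, 1]))   # adjacent genes repulsed
```

The second call illustrates the point made below: with alternating phases the 1st and 3rd genes are still coupled, so repulsion cannot act on every pair.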
The linkage of a number of genes will exert its maximum effect when
all are coupled and all their h's are reinforcing. All the terms in p will
then be positive. The consequences of repulsion and opposition can
never be so great, except in the special case of two genes, since more
than two genes can be neither all repulsed nor have their h's all in oppo-
sition. The maximum effects of repulsion and opposition might be ex-
pected when all are linked, the adjacent genes along the chromosome
being repulsed and their h's opposed. Even then the 1st, 3rd, 5th, etc.
will be coupled and reinforcing, as will the 2nd, 4th, 6th, etc. Inequality of the d's and h's of the various genes will also reduce the
effect of linkage on the components of variation.
Even where linkage is in fact present, its effect on the value of a
statistic could be zero, since the coupling and repulsion items could
balance out as also could reinforcement and opposition. The balance
will obviously depend on the magnitudes of effect of the genes and on
the recombination frequencies, in addition to the phasic relations of the
linkage. Furthermore, even where a balance is struck in the first rank
components of variation the items in the components of other ranks
will not balance so exactly. The effect of linkage may still thus appear,
although in such a case it must be expected to be very small.
The test for linkage is thus basically a test of homogeneity of the D
and H components of variation over rank. In the absence of linkage
these components should be as homogeneous between ranks as within them. With linkage operating, the components should be heterogeneous
between ranks by comparison with their variation within ranks. This
test is seen at its simplest by reference to a study of ear-conformation
in barley, described by Mather (1949). Ear-conformation was measured
by an index compounded of ear-length, ear width and the density of the
spikelets in the centre of the ear. A cross was made between two var-
ieties, Spratt and Goldthorpe, each of which was effectively true-
breeding (as indeed varieties of barley normally are) from which an F2
and F3 were raised. The parents thus provide estimates of the non-heritable variation E1, between individuals within the plots of ten plants into which the experiment was divided, and E2, the non-heritable variation between the means of plots. Each of the 100 F3 families occupied one plot. V1F2, W1F23, V1F3 and V2F3 were calculated, V1F2 being found as the variance of F2 individuals within the ten plots allocated to it in each of the five blocks into which the experiment was divided. The values of V1F2, W1F23, V1F3, V2F3, E1 and E2 averaged over the five blocks are shown in Table 38, together with their expectations in terms of the components of variation.

TABLE 38.
Ear conformation in barley (Mather, 1949). D1 and H1 denote the rank 1 components, and D2 and H2 the rank 2 components. E1 and E2 are the non-heritable variances of individuals and family means respectively

                                                     Heritable variation
Statistic   Observed   Expectation                   Observed   Expected   Deviation
V1F2          9713     ½D1 + ¼H1 + E1                  8492       8489          3
W1F23         6833     ½D1 + ⅛H1                       6833       6844        -11
V1F3          6247     ½D1 + (1/16)H1 + E2             6028       6021          7
V2F3          4313     ¼D2 + ⅛H2 + E1                  3093       4244      -1151
E1            1221
E2             219

D1 = 10397    H1 = 13160    D1 + ½H1 = 16977    D2 + ½H2 = 12372

A proper analysis of these results requires the use of least squares techniques, an unweighted form of which was used by Mather (loc. cit.). A much simpler analysis will, however, serve to bring out the points in which we are interested. We can first correct for the non-heritable variation by subtracting E1 from V1F2 and V2F3, and E2 from V1F3. The results of doing so are shown in the fourth column of the table. Thus corrected, V1F2 supplies an estimate of ½D1 + ¼H1, V1F3 an estimate of ½D1 + (1/16)H1 and W1F23 an estimate of ½D1 + ⅛H1, where D1 and H1 denote the first rank forms of D and H. We can thus find W1F23 + 2V1F3 - V1F2 = D1 = 10397. Then V1F2 + V1F3 + W1F23 - (3/2)D1 = (7/16)H1 = 5757.5, giving H1 = 13160.
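The simple estimation just described amounts to two linear combinations of the corrected statistics. A minimal sketch, using the values of Table 38 and exact fractions so that the arithmetic can be followed (this is the simple method of the text, not Mather's least squares fit):

```python
from fractions import Fraction

# Table 38 (Mather, 1949): observed statistics and non-heritable components.
V1F2, W1F23, V1F3 = 9713, 6833, 6247
E1, E2 = 1221, 219

# Heritable portions: subtract E1 from V1F2 and E2 from V1F3; W1F23 carries no E.
v1f2 = V1F2 - E1      # estimates 1/2 D1 + 1/4 H1
v1f3 = V1F3 - E2      # estimates 1/2 D1 + 1/16 H1
w1f23 = W1F23         # estimates 1/2 D1 + 1/8 H1

# W1F23 + 2 V1F3 - V1F2: the H1 coefficients cancel (1/8 + 2/16 - 1/4 = 0).
D1 = w1f23 + 2 * v1f3 - v1f2
# The sum of the three estimates is 3/2 D1 + 7/16 H1.
H1 = (v1f2 + v1f3 + w1f23 - Fraction(3, 2) * D1) * Fraction(16, 7)

print(D1, H1)   # 10397 13160
```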
These joint estimates of D1 and H1 allow us to formulate expectations
for the heritable portions of V1F2, W1F23 and V1F3 as set out in the fifth column of the table, and the agreement between expectation and observed values is very close. On the assumption that there is no linkage and that the second rank components, D2 and H2, will be the same as those of the first rank, D1 and H1, we can also use these same estimates
to formulate an expectation for V2F3. This expectation, also shown in
the fifth column of the table is 4244, while the value actually observed
was 3093, a difference of 1151. Thus while agreement within the rank 1
statistics is good, agreement for the rank 2 statistic is very poor. Evi-
dently D and H are homogeneous over rank 1 statistics but heterogeneous
between ranks 1 and 2. Linkage must be operating.
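The expectations in the fifth column follow from these estimates, the one for V2F3 on the assumption of no linkage, that is D2 = D1 and H2 = H1. A short sketch, with the heritable observed values taken from the fourth column of Table 38 and the expectations rounded half up to whole numbers as in the table:

```python
from fractions import Fraction
import math

D1, H1 = 10397, 13160

# statistic: (coefficient of D, coefficient of H, heritable observed value)
rows = {
    "V1F2":  (Fraction(1, 2), Fraction(1, 4), 8492),
    "W1F23": (Fraction(1, 2), Fraction(1, 8), 6833),
    "V1F3":  (Fraction(1, 2), Fraction(1, 16), 6028),
    "V2F3":  (Fraction(1, 4), Fraction(1, 8), 3093),   # assumes D2 = D1, H2 = H1
}

expected, deviation = {}, {}
for name, (cd, ch, obs) in rows.items():
    exact = cd * D1 + ch * H1
    expected[name] = math.floor(exact + Fraction(1, 2))   # round half up
    deviation[name] = obs - expected[name]

print(expected)    # {'V1F2': 8489, 'W1F23': 6844, 'V1F3': 6021, 'V2F3': 4244}
print(deviation)   # {'V1F2': 3, 'W1F23': -11, 'V1F3': 7, 'V2F3': -1151}
```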
The analysis can be taken a step further. Reverting for a moment to
the case of two genes, A-a and B-b, with coupling D1 = da² + db² + 2(1 - 2p)da db and D2 = da² + db² + 2(1 - 2p)² da db. Both are greater in value than D = da² + db² which obtains in the absence of linkage. Also D1 > D2 since (1 - 2p) > (1 - 2p)² whenever 0 < p < ½. Thus with coupling the value of D will fall from rank 1 statistics to rank 2. With repulsion there would be a corresponding rise. Furthermore, expressed as a proportion of D1 the fall will be

$$\frac{D_1 - D_2}{D_1} = \frac{\left[d_a^2 + d_b^2 + 2(1 - 2p)\,d_a d_b\right] - \left[d_a^2 + d_b^2 + 2(1 - 2p)^2 d_a d_b\right]}{d_a^2 + d_b^2 + 2(1 - 2p)\,d_a d_b}$$

which reduces to

$$\frac{4p(1 - 2p)}{4(1 - p)} \quad\text{when } d_a = d_b.$$

This ratio of the fall to D1 is at its maximum value of 0.17 when p = 0.29.
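The position of this maximum can be located numerically. The sketch below is a simple grid search, not anything in the text: it evaluates (D1 - D2)/D1 for two equal coupled genes over p from 0 to 0.5 and checks it against the reduced form p(1 - 2p)/(1 - p). Setting the derivative of the reduced form to zero gives 1 - 4p + 2p² = 0, that is p = 1 - 1/√2 ≈ 0.293.

```python
import math

def fall_ratio(p, da=1.0, db=1.0):
    """(D1 - D2) / D1 for two coupled genes at recombination frequency p."""
    r = 1 - 2 * p
    d1 = da ** 2 + db ** 2 + 2 * r * da * db          # rank 1
    d2 = da ** 2 + db ** 2 + 2 * r ** 2 * da * db     # rank 2
    return (d1 - d2) / d1

grid = [k / 10000 for k in range(5001)]               # p from 0 to 0.5
p_max = max(grid, key=fall_ratio)
print(round(p_max, 2), round(fall_ratio(p_max), 2))   # 0.29 0.17
print(round(1 - 1 / math.sqrt(2), 4))                 # 0.2929
```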
We cannot however compare D1 and D2 from the barley experiment, because with only one rank 2 variance we cannot separate D2 and H2. We must therefore work in terms of D + ½H, upon which the heritable component of V2F3 depends. V2F3 yields us a joint estimate D2 + ½H2 = 4 × 3093 = 12372. The first rank statistics yield D1 + ½H1 = 10397 + ½(13160) = 16977, which is markedly larger than D2 + ½H2. There must therefore be linkage in coupling, with a fall ratio

$$\frac{(D_1 + \tfrac{1}{2}H_1) - (D_2 + \tfrac{1}{2}H_2)}{D_1 + \tfrac{1}{2}H_1} = \frac{4605}{16977} = 0.27.$$

Now with the h's reinforcing, H1 will be greater than H2 by 2ha hb(1 - 2p)²[1 - (1 - 2p + 2p²)]. If ha = hb this fall is 0.06 of H2 when p = 0.29. If we assume that ha = hb = da = db, as is statistically consistent with the data, the fall ratio of D + ½H with 2 genes at p = 0.29 is 0.13. Although the maximum fall ratio in H is at a somewhat lower value of p than is that in D, this value 0.13 is a sufficiently good approximation to the maximum for our purpose, because it is only about half
the fall ratio actually observed. Clearly two coupled genes are incapable
of explaining the barley results. As the number k of linked genes increases, however, the number of terms in da db will increase as ½k(k - 1). They will therefore loom larger in the composition of D, and the fall
ratio will increase correspondingly, where all the genes are coupled. It
is possible to calculate the maximum fall-ratio given by three or more
genes just as we did for two, and when this is done we find that a mini-
mum of about four coupled genes is required to give a fall ratio of 0.27
as found in the barley. In point of fact, for reasons into which we need
not enter here, the polygenic system governing ear-conformation in this
barley cross is likely to be more complex even than this simple consider-
ation of the fall ratio would indicate (Mather, 1949).
Although scaling tests applied to the means of parents, F1, F2 and F3
in the barley revealed some evidence of non-allelic interaction (probably
arising from inadequacy of the scale), this was clearly too small to
account for the heterogeneity of D and H, a conclusion which is further
substantiated by the homogeneity of V1F2, W1F23 and V1F3. Where, however, the scaling tests have revealed major interaction, the test of linkage
becomes more difficult. Interaction produces heterogeneity of D and H
over generations, but not within them: linkage gives heterogeneity of D
and H between ranks, but not within them. Difficulty arises, however,
because generation and rank are themselves related, since an additional
rank can be obtained only by introducing an additional generation. In
principle an unambiguous test is possible if a sufficiently complex
crossing programme is used (Van der Veen, 1959), and Perkins and
Jinks (1970) have been successful in obtaining conclusive evidence of
linkage in the presence of interaction using generations of less familiar
types. The whole subject is, however, complex and worthy of more
study than it has yet received.

23. Diallels
The means of the families which constitute a set of diallel crosses will
reflect any interaction shown by the genes in which the parental lines
differ. On the other hand, since only these means are used in diallel
analysis, and indeed the families themselves are non-segregating in the
diallels we have been observing, linkage as such can be having no effect
on the variation that we observe and measure. At the same time the
genes in which the parental lines differ may be correlated in their dis-
tributions among the parents and in such a case their contributions to
the variation among the families of the diallel will not be independent.
The general expressions for the effects of digenic interaction on the
means, variances and covariances of a diallel are very complex (M and J,
Table 96). We can, however, learn something of the ways in which both
interaction and correlated gene distributions express themselves in diallel
analysis if we consider the special and relatively simple case of four parental lines representing all the combinations of two gene pairs with ua = va = ub = vb = ½ (i.e. all gene frequencies equal) but having correlated distributions among the four parents, and where da = db = ha = hb and i = ja = jb = l = θd (i.e. with digenic interaction of the complementary-duplicate type). The correlation of the gene distributions is measured by the parameter c, the frequencies of the AABB and aabb parents each being ¼(1 + c) and those of the AAbb and aaBB parents each being ¼(1 - c). When c = 0, all the parents occur with the frequency ¼. When c = 1 association is complete, the AAbb and aaBB parents being absent, with A and B on the one hand and a and b on the other always occurring together as a single compound gene pair. Equally when c = -1 dispersion is complete, A always occurring with b and a with B, the AABB and aabb parents being absent. Values of c between 1 and -1 represent various strengths of association and dispersion.
Similarly the interaction is measured by θ. So with da = db and i = θd the phenotype of, for example, AABB, which in general terms is da + db + i, can be written as d(2 + θ), and that of AAbb as d(-θ). Similarly with h = d and l = i = θd the phenotype of AaBb, which in general terms is ha + hb + l, becomes d(2 + θ), and so on. The phenotypes of the
sixteen families in the diallel are set out in these terms in the body of
Table 39, where the frequencies of the four parental lines are also shown
in terms of c.
The diallel table is sufficiently simple for us to undertake a full analysis. The first point to note is that since da = db = ha = hb and ja = jb, the central two arrays will be alike in the values they yield for Vr and Wr and so will provide only a single joint point in the Wr/Vr graph, which thus will have only three points instead of the more general four. The mean of the parents will be

$$\tfrac{1}{4}d\left[(1 + c)(2 + \theta) - 2(1 - c)\theta + (1 + c)(-2 + \theta)\right] = d\theta c$$

and the mean of array ab will obviously be the same. Since the phenotype is d(2 + θ) for all four classes in the AB array, its mean will obviously be d(2 + θ), while the means of the Ab and aB arrays will be

$$\tfrac{1}{4}d\left[(1 + c)(2 + \theta) - (1 - c)\theta + (1 - c)(2 + \theta) - (1 + c)\theta\right] = d.$$

Vr for
TABLE 39.
Two-gene diallel set of matings with complementary/duplicate interaction, measured by θ, and equal gene frequencies but correlated gene distributions, measured by c. The body of the table gives the phenotypes of the various classes in terms of θ and the frequencies of the parents are shown in terms of c. da = db = ha = hb = d

                                      Female parent
Male parent (frequency, phenotype)    AABB        AAbb        aaBB        aabb
AABB   ¼(1+c)   d(2+θ)                AABB        AABb        AaBB        AaBb
                                      d(2+θ)      d(2+θ)      d(2+θ)      d(2+θ)
AAbb   ¼(1-c)   d(-θ)                 AABb        AAbb        AaBb        Aabb
                                      d(2+θ)      d(-θ)       d(2+θ)      d(-θ)
aaBB   ¼(1-c)   d(-θ)                 AaBB        AaBb        aaBB        aaBb
                                      d(2+θ)      d(2+θ)      d(-θ)       d(-θ)
aabb   ¼(1+c)   d(-2+θ)               AaBb        Aabb        aaBb        aabb
                                      d(2+θ)      d(-θ)       d(-θ)       d(-2+θ)

Array        Mean      Vr                   Wr                   Wr + Vr               Wr - Vr
AB (AABB)    d(2+θ)    0                    0                    0                     0
Ab (AAbb)    d         d²(1+θ)²             d²(1+θ)(1+c)         d²(1+θ)(2+θ+c)        d²(1+θ)(c-θ)
aB (aaBB)    d         d²(1+θ)²             d²(1+θ)(1+c)         d²(1+θ)(2+θ+c)        d²(1+θ)(c-θ)
ab (aabb)    dθc       d²(2+θ²+2c-θ²c²)     d²(2+θ²+2c-θ²c²)     2d²(2+θ²+2c-θ²c²)     0

the AB array will clearly be 0 since the phenotypes of all its classes will be alike, and so of course will its Wr also. For the ab array

$$V_r = \tfrac{1}{4}d^2\left[(1 + c)(2 + \theta)^2 + 2(1 - c)\theta^2 + (1 + c)(-2 + \theta)^2\right] - d^2\theta^2c^2 = d^2(2 + \theta^2 + 2c - \theta^2c^2),$$

the term d²θ²c² being the correction for the mean. The variance of the parents will obviously be the same as Vr for the ab array, since the phenotypes of the four classes in the array are the same as those of the corresponding parents. For these reasons also Wr will equal Vr for this array.
Turning to the central arrays we find

$$V_r = \tfrac{1}{4}d^2\left[(1 + c)(2 + \theta)^2 + (1 - c)(-\theta)^2 + (1 - c)(2 + \theta)^2 + (1 + c)(-\theta)^2\right] - d^2 = d^2(1 + \theta)^2,$$

the d² being the correction for the mean. Similarly for these two arrays

$$W_r = \tfrac{1}{4}d^2\left[(1 + c)(2 + \theta)^2 + (1 - c)(2 + \theta)(-\theta) + (1 - c)(-\theta)^2 + (1 + c)(-2 + \theta)(-\theta)\right] - d^2\theta c = d^2(1 + \theta)(1 + c).$$

These various results are collected together in the lower part of Table 39, as are Wr + Vr and Wr - Vr for each array.
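The entries in the lower part of Table 39 can be reproduced by building the sixteen families directly. The sketch below assumes the same two-gene parameterization as before with da = db = ha = hb = d and i = ja = jb = l = θd, weights each member of an array by the frequency of its non-recurrent parent, and computes Vr as the weighted variance of the array and Wr as its weighted covariance with the non-recurrent parents. For the ab array, subtracting the squared parental mean dθc leaves Vr = Wr = d²(2 + θ² + 2c - θ²c²); the θ²c² term vanishes whenever θ = 0 or c = 0. The names are illustrative.

```python
from fractions import Fraction

LINES = {"AB": (1, 1), "Ab": (1, 0), "aB": (0, 1), "ab": (0, 0)}   # gametes of the four parents

def phenotype(g1, g2, d, theta):
    """Deviation from m; da = db = ha = hb = d and i = ja = jb = l = theta * d."""
    xa, xb = g1[0] + g2[0] - 1, g1[1] + g2[1] - 1
    ya, yb = int(xa == 0), int(xb == 0)
    return d * (xa + xb + ya + yb) + theta * d * (xa * xb + xa * yb + ya * xb + ya * yb)

def array_statistics(d, theta, c):
    """{array: (mean, Vr, Wr)} for the diallel of Table 39."""
    freq = {"AB": (1 + c) / 4, "Ab": (1 - c) / 4, "aB": (1 - c) / 4, "ab": (1 + c) / 4}
    parent = {k: phenotype(g, g, d, theta) for k, g in LINES.items()}
    parent_mean = sum(freq[k] * parent[k] for k in LINES)
    stats = {}
    for r, gr in LINES.items():
        family = {k: phenotype(gr, g, d, theta) for k, g in LINES.items()}
        mean = sum(freq[k] * family[k] for k in LINES)
        vr = sum(freq[k] * (family[k] - mean) ** 2 for k in LINES)
        wr = sum(freq[k] * (family[k] - mean) * (parent[k] - parent_mean) for k in LINES)
        stats[r] = (mean, vr, wr)
    return stats

d, theta, c = Fraction(1), Fraction(1, 2), Fraction(-1, 3)
for array, (mean, vr, wr) in array_statistics(d, theta, c).items():
    print(array, mean, vr, wr, wr - vr)
```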
A number of conclusions emerge from these results. In the first place
Wr - Vr = 0 for both the AB and ab arrays. Their points in the Wr/Vr graph will thus be on a line of slope 1 which passes through the origin, no matter what the situation may be about the interaction and gene distribution. Furthermore since this line intercepts the ordinate at the origin it indicates that d = h, which of course agrees with the assumption on which the analysis is based. The point from the central arrays, Ab and aB, will however lie on this line only when Wr - Vr = d²(1 + θ)(c - θ) = 0, and, leaving aside the trivial case θ = -1, this will happen only when c = θ. When both interaction and correlation of gene distribution are absent, c = θ = 0 and a straight line is obtained for the regression of Wr on Vr, after due correction has been made for any non-heritable variation, as indeed we saw in Chapter 4. This will also happen in the presence of both interaction and correlated distribution provided that, as measured by θ and c respectively, they are equally strong.
Either the interaction by itself (θ ≠ 0, c = 0) or correlation of the gene distribution by itself (θ = 0, c ≠ 0) must result in the regression of Wr on Vr departing from a straight line of slope 1. The relation of this departure to the strength of the interaction (θ) is shown in Fig. 13 and to the strength of the correlation (c) in Fig. 14. The values of Wr and Vr are divided by d²(2 + θ²) in the one case and d²(2 + 2c) in the other in order to standardize the graph by making the point for the ab array fall at Wr = Vr = 1. When θ is positive (complementary type interaction) or c is negative (dispersion of the genes) the central point lies to the right of and below the line of slope 1 through the origin delimited by the points from arrays AB and ab. When θ is negative (duplicate type interaction) or c is positive (association of the genes) it lies above and to the left of the line. The relation of the departure from the line to the value of θ or c is shown by the trajectory the central point follows with change in θ or c. These trajectories are not the same for interaction and correlated distribution of the genes. Since however the interactive properties of two genes are presumably fixed and their gene distributions are equally fixed in any set of parents, we can obtain only one point in the trajectory and so the difference in trajectories is of no help to us in
seeking to distinguish the effects of interaction from those of association or dispersion. In short, diallel analysis enables us to detect interaction and/or correlation of the gene distributions but it does not enable us to distinguish between them.

[Fig. 13: standardized Wr plotted against Vr, with the aabb point at Wr = Vr = 1 and the trajectories for duplicate and complementary interaction]

Fig. 13. The effect of complementary and duplicate type interaction, measured by θ, between two gene pairs on the Wr/Vr graph from a diallel set of matings, with da = db = ha = hb and c = 0. The AAbb and aaBB arrays give a common point which lies mid-way on a straight line between the AABB and aabb points. Complementary interaction causes this point to move to the right and upwards and the Wr/Vr graph ceases to be a straight line, becoming concave upwards. Duplicate interaction produces the opposite result, the graph becoming concave downwards. The heavy curve shows the path of the middle point as it moves under the influence of interaction, the numbers indicating the values of θ to which the points correspond.
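The trajectories of Figs. 13 and 14 can be traced from the central-array values of Table 39 after standardization. The sketch below assumes d = 1 and divides the central point's Vr = (1 + θ)² and Wr = (1 + θ)(1 + c) by 2 + θ² when c = 0 (Fig. 13) and by 2 + 2c when θ = 0 (Fig. 14), as described in the text, then reports which side of the line Wr = Vr the point falls on. With θ = 0 the standardized Wr is always ½, which is the horizontal path of Fig. 14. The function names are illustrative.

```python
from fractions import Fraction

def fig13_point(theta):
    """Standardized (Vr, Wr) of the Ab/aB arrays for interaction theta, c = 0, d = 1."""
    scale = 2 + theta ** 2
    return (1 + theta) ** 2 / scale, (1 + theta) / scale

def fig14_point(c):
    """Standardized (Vr, Wr) of the Ab/aB arrays for correlation c (c > -1), theta = 0, d = 1."""
    scale = 2 + 2 * c
    return Fraction(1) / scale, (1 + c) / scale

def side(point):
    """Position of a (Vr, Wr) point relative to the line of slope 1 through the origin."""
    vr, wr = point
    return "on the line" if wr == vr else ("above" if wr > vr else "below")

for theta in (Fraction(-1, 2), Fraction(0), Fraction(1, 2), Fraction(1)):
    print("theta", theta, fig13_point(theta), side(fig13_point(theta)))
for c in (Fraction(-1, 2), Fraction(0), Fraction(1, 2), Fraction(1)):
    print("c", c, fig14_point(c), side(fig14_point(c)))
```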
When both interaction and correlation of the gene distribution are present, they may either reinforce one another's action in moving the central point away from the line if θ and c are of opposite sign, or oppose one another's effects if θ and c are of the same sign. They will balance exactly and the central point will fall on the line itself whenever θ = c.
We thus see how interaction and correlation of the gene distributions
can affect the Wr/Vr graph in ways which are not distinguishable on the
basis of this evidence alone, and how they can reinforce, oppose and
even cancel out one another's effects on the graph. In conclusion it
should be remembered that we have been considering only the special
case of complementary-duplicate type interaction with equal gene
frequencies. Our findings still hold good when the gene frequencies are unequal. Indeed they apply generally in respect of correlated gene distributions, and they are unlikely to be modified in more than detail in respect of this general type of interaction where the d's and h's are not equal, although the graph will then have four points on it, not just three as in the special case we have been discussing. We should not, however, extrapolate these to other less simple systems of interaction, which although yet to be fully investigated are known to be capable of producing very bizarre effects on the Wr/Vr graph.

[Fig. 14: standardized Wr plotted against Vr, with the AABB point at the origin, the aabb point at Wr = Vr = 1 and the trajectory of the central point under association and dispersion]

Fig. 14. The effect of gene association and dispersion, measured by c, of two gene pairs on the Wr/Vr graph from a diallel set of matings, with da = db = ha = hb and θ = 0. The effect of association is similar to that of duplicate interaction and dispersion to that of complementary interaction, illustrated in Fig. 13. The path of the middle point with change in c is not however curved, as with interaction, but follows a line parallel to the abscissa as shown by the heavy line. The numbers indicate the values of c to which the points correspond.
Interaction of genotype and environment

24. Genotype × environment interaction


The simple additive-dominance model assumes that gene differences
contribute independently from one another to variation in the pheno-
type. We have seen how failure of this assumption can be detected and
how departures from the model may be produced by the interaction of
non-allelic genes and by the correlation of gene distributions, both of
which may be described in terms of appropriate parameters whose values
can be estimated from suitable data. As we have developed and used it
so far, the additive-dominance model further assumes that gene differ-
ences and environmental differences also contribute independently of
one another to variation in the phenotype. We must now turn to consider the interaction of gene and environmental differences (or genotype × environment interaction as it is commonly called), how such
interaction may arise, and how it can be detected, measured and inves-
tigated.
Genotype × environment interaction has long been known to occur.
An early example is that of Akerman (1922), who reported a genetic
difference affecting the chlorophyll of oats which was undetectable
when the plants were grown in subdued light but revealed itself by the
bleaching and death of one genetic class when they were grown in direct
sunlight. Interaction of genotype and environment must indeed be ex-
pected to occur and in fact some gene changes must themselves result
in marked changes of the environment which the individuals experience;
climbing beans and dwarf beans, for example, must experience very dif-
ferent environments although they may differ in only a single gene. This
is, however, an extreme example and we must expect most cases of
genotype X environment interaction to be much less dramatic. We must
Genotype X environment interaction 131
therefore seek to give a more general account of them and to develop an
appropriately general method for their investigation.
Like other forms of interaction, that between genotype and environment
may arise from the scale on which the character is measured and
represented. An example of this is afforded by the data from Hogben
(1933) quoted by Mather and Jinks (1971) concerning the average numbers
of facets in the eyes of two strains of Drosophila melanogaster,
referred to as Low-Bar (L) and Ultra-Bar (U), raised at two
temperatures, 15 and 25° C. These facet numbers are shown
diagrammatically on the left of Fig. 15. At 15° C, L had on average 146
more facets than U, but at 25° C the difference is only 49. At the
higher temperature the difference is only about one-third of that at
the lower. The lines are not reacting equally to the change in
temperature: the effects of genotype and environment are not additive,
or in other words, there must be an interaction of genotype and
environment. When, however, we change the scale by taking logarithms of
the number of facets, we obtain the picture shown on the right of
Fig. 15. In log measure the difference between L and U is 0.58 at 15° C
and 0.47 at 25° C. The higher temperature still gives a smaller
difference than the lower, but the reduction is proportionately very
much less than when the untransformed facet number was used. The log
transformation has very much reduced the genotype × environment
interaction, if not entirely eliminated it.

[Figure: two panels, each plotting the two lines against temperature
(15 and 25° C); left, mean number of facets (axis 0 to 200); right, log
of mean number of facets (axis marked at 1.5 and 2.0).]
Fig. 15. Krafka's data (from Hogben, 1933) on the mean numbers of facets
in the eyes of two lines of Bar-eyed Drosophila at two temperatures.
When the direct count of eye facets is used (on the left) the difference
between the lines at 15° C (d15) is larger than the difference at 25° C
(d25), so indicating genotype × environment interaction. When, however,
the logs of the mean numbers of eye facets are used (on the right) d15
and d25 are nearly equal. The scalar transformation has removed the
interaction.
The size of the reduction emerges even more dramatically if we subject
the data to an analysis of variance. The 3 df among the four
observations may be assigned 1 each to the overall effect of the genetic
difference, the overall effect of the environment, and the genotype ×
environment interaction. The percentages of the total variation taken
out by each of these three items using direct measure and log measure
are:

Item             Direct measure   Log measure
Genetic               54.1            66.1
Environmental         32.5            33.2
Interaction           13.4             0.7

Looked at in this way, the interaction has been rendered negligible by
the log transformation.
One further point is worth noting before we leave this example. When
considering another Bar-eye gene, in Section 8, we saw that a square root
transformation eliminated that interaction between alleles which we
term dominance, whereas a log transformation did not, and we saw too
that a theoretical interpretation of this finding could be advanced. In
the present example, while a square root transformation reduces the
genotype × environment interaction, it is much less effective than the
log transformation. This contrast emphasizes the essentially empirical
nature of choice of a transformation, and the unwisdom of seeking to
draw theoretical conclusions from a successful case of a particular
change of scale.
Not all genotype × environment interactions can, however, be ascribed to
the use of an inappropriate scale for the representation of the
character. Table 40 sets out the mean numbers of sternopleural chaetae
borne by two inbred lines of Drosophila, Samarkand (S) and Wellington
(W), when raised in six different environments, which comprised all the
possible combinations of two temperatures, 18 and 25° C, and three types
of culture vessel: 1/3 pint milk bottles with yeasted food (B), 1 × 3
inch vials with yeasted food (Y), and similar vials with unyeasted food
(U). Five cultures were reared of each line in each environment, the
figures in the table being the means of all five replicate cultures in
each case. Comparisons among the five replicates give us an estimate of
error variation which will be based on 4 df within each combination of
genotype and environment. Since there are 2 × 6 = 12 such combinations,
the pooled estimate of error variation will thus be based on 48 df. It
turns out to be 0.1036. As the entries in the table are the means of
five replicates, they will be subject to an error variance of
0.1036 ÷ 5 = 0.02072, as shown in the right-hand column.

TABLE 40.
Mean numbers of sternopleural chaetae in the S and W inbred lines of
Drosophila melanogaster, their F1 and F2 raised in six environments

              18° C                  25° C              Error
        B      Y      U        B      Y      U        variance
S     20.58  20.51  20.26    20.44  20.93  20.66      0.02072
W     19.63  19.34  19.34    18.67  18.14  17.61      0.02072
F1    19.98  20.01  20.16    19.22  18.93  18.48      0.01033
F2    20.19  19.86  19.75    19.45  18.68  18.75      0.10182

The environments are the six possible combinations of two temperatures,
18 and 25° C, with three types of culture: in 1/3 pint milk bottles with
yeasted food (B), and in 3 × 1 inch vials with yeasted (Y) and unyeasted
(U) food.
We might note in passing that the mean numbers of chaetae were also
determined for the F1 and F2 of the cross between these two lines.
Although they will not be discussed until later, these means are also
recorded in Table 40. Equal numbers of families were raised from the
reciprocal crosses, S × W and W × S, in both F1 and F2. Eight replicates,
four from each reciprocal, were raised of the F1 in each environment,
but only two, one from each reciprocal, of the F2. The entries for F1
are thus the means of eight replicates and those for F2 the means of
two. The error variances of their entries were found separately for F1
and F2, although otherwise in the same way as for S and W themselves,
and are given in the table. Not surprisingly the error variance of the
F1 entries is lower than that of the parents, but that of the F2 entries
is much higher.
Returning to the parent lines, the number of chaetae of S averaged over
B, Y and U is ⅓(20.58 + 20.51 + 20.26) = 20.45 at 18° C and
⅓(20.44 + 20.93 + 20.66) = 20.68 at 25° C, to two places of decimals.
Those for W are similarly 19.44 and 18.14. Thus W's chaeta number is
1.30 higher at 18 than at 25° C, while that of S changes much less, and
such change as there is runs the other way: S is 0.23 lower at 18 than
at 25° C. Clearly the lines are reacting differently to the change in
temperature. Since, however, the change in W is a major reduction with
increase in temperature, while that in S is if anything in the opposite
direction, no simple or even acceptable transformation of the scale on
which chaeta number is measured could eliminate this apparent
interaction of the two genotypes with the environmental difference.
Clearly, given that it is significant, we must accept the interaction as
it is and elaborate our model to accommodate it.
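The temperature means quoted above can be recomputed directly from
Table 40. The following Python sketch is only an arithmetic check; the
dictionary TABLE_40 and the helper temperature_means are illustrative
names, not part of the original analysis, and each list is assumed to be
ordered B, Y, U at 18° C followed by B, Y, U at 25° C.

```python
# Table 40 means, each list ordered B, Y, U at 18 deg C then B, Y, U at 25 deg C.
TABLE_40 = {
    "S": [20.58, 20.51, 20.26, 20.44, 20.93, 20.66],
    "W": [19.63, 19.34, 19.34, 18.67, 18.14, 17.61],
}


def temperature_means(values):
    """Average a line over the three culture types at each temperature."""
    mean_18 = sum(values[:3]) / 3
    mean_25 = sum(values[3:]) / 3
    return mean_18, mean_25


for line, values in TABLE_40.items():
    mean_18, mean_25 = temperature_means(values)
    # A positive change means more chaetae at 18 deg C than at 25 deg C.
    print(f"{line}: 18C {mean_18:.2f}  25C {mean_25:.2f}  "
          f"change {mean_18 - mean_25:+.2f}")
```

Run as written, it reports a change of -0.23 for S and +1.30 for W, the
opposite signs that rule out a simple rescaling.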

25. Two genotypes and two environments


Now if we let [d] be the genetically determined deviation of the mean
chaeta number of S from the mid-parent, m, and −[d] that of W, the
assumption made by the simple model, that the non-heritable deviations
springing from the environmental difference are independent of the
genotype, would be tantamount to saying that the environment adds a
deviation e at 18° C and a deviation −e at 25° C, equally in the cases
of both genotypes. The situation would then be as shown in Table 41,
which sets out the algebraic formulation for the two genotypes in the
two environments, with the corresponding mean numbers of chaetae
(rounded off to two decimal places) below them. We can proceed to
estimate the parameters we have used. m is the overall average of the
observations and is found as ¼(20.45 + 20.68 + 19.44 + 18.14) = 19.6775.
The genetic parameter is estimated from the line sums in the right-hand
column of the table as [d] = ¼(41.13 − 37.58) = 0.8875, and the
environmental parameter is similarly found from the environmental sums
in the bottom row of the table as e = ¼(39.89 − 38.82) = 0.2675. We can
now construct expected values for the chaeta numbers of the two lines at
the two temperatures by substituting the estimates of m, [d] and e in
the formulations that the model yields. Thus the expected chaeta number
(E) of W at 25° C is m − [d] − e = 19.6775 − 0.8875 − 0.2675 = 18.5225,
which compares with 18.14, the number observed (O), giving a difference
O − E = −0.3825. When comparing the expectations so obtained with the
observed chaeta numbers we find that S at 18° C also gives
O − E = −0.3825, while S at 25 and W at 18° C give a difference
O − E = 0.3825. The large size of these deviations relative to [d] and e
suggests strongly that the simple model we have used is inadequate and
that the two genotypes do not react equally to the change in
temperature: in other words, that genotype × environment interaction is
present.

TABLE 41.
Mean chaeta numbers of the S and W inbred lines at 18 and 25° C

                  18° C            25° C            Sum
S                 m + [d] + e      m + [d] − e      2m + 2[d]
      Obs         20.45            20.68            41.13
      Exp         20.8325          20.2975          41.13
      O − E       −0.3825          0.3825           0
W                 m − [d] + e      m − [d] − e      2m − 2[d]
      Obs         19.44            18.14            37.58
      Exp         19.0575          18.5225          37.58
      O − E       0.3825           −0.3825          0
Sum               2m + 2e          2m − 2e          4m
                  39.89            38.82            78.71

m = 19.6775   [d] = 0.8875   e = 0.2675
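The estimates in Table 41 are signed averages over the four cells, so
the whole fit can be sketched in a few lines. This Python sketch assumes
the rounded means of Table 41 as input; OBSERVED, D_SIGN, E_SIGN and
fit_additive are hypothetical names chosen for illustration.

```python
# Table 41: rounded means of S and W at the two temperatures.
OBSERVED = {("S", 18): 20.45, ("S", 25): 20.68,
            ("W", 18): 19.44, ("W", 25): 18.14}

# Signs attached to [d] and e in the simple additive model of Table 41.
D_SIGN = {"S": +1, "W": -1}
E_SIGN = {18: +1, 25: -1}


def fit_additive(observed):
    """Estimate m, [d] and e as signed averages over the four cells."""
    m = sum(observed.values()) / 4
    d = sum(D_SIGN[line] * y for (line, _), y in observed.items()) / 4
    e = sum(E_SIGN[temp] * y for (_, temp), y in observed.items()) / 4
    return m, d, e


m, d, e = fit_additive(OBSERVED)
expected = {(line, temp): m + D_SIGN[line] * d + E_SIGN[temp] * e
            for (line, temp) in OBSERVED}
deviation = {cell: OBSERVED[cell] - expected[cell] for cell in OBSERVED}

print(f"m = {m:.4f}   [d] = {d:.4f}   e = {e:.4f}")
for cell in OBSERVED:
    print(cell, f"E = {expected[cell]:.4f}", f"O - E = {deviation[cell]:+.4f}")
```

Every deviation comes out as plus or minus 0.3825, matching the O − E rows
of Table 41.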
We can accommodate this interaction by introducing a further parameter,
g, into the formulation in the way shown in the upper expressions of
Table 42. This new parameter g is a measure of the genotype ×
environment interaction and in the present case is estimated as
g = ¼(20.45 − 19.44 − 20.68 + 18.14) = −0.3825. In conjunction with [d]
and e, g completes the set of three parameters, corresponding to the
3 df among four observations, required to give a perfect fit for the
deviations of the four observed chaeta numbers from their mean, m.

TABLE 42.
Alternative models for the phenotypes given by two genotypes, S and W,
raised in two environments, 18 and 25° C

          18° C                  25° C
S         m + [d] + e + g        m + [d] − e − g
          m + [d] + e_s          m + [d] − e_s
W         m − [d] + e − g        m − [d] − e + g
          m − [d] + e_w          m − [d] − e_w

In each case the upper expression is in terms of the genetical parameter
[d] found by averaging over environments, the environmental parameter e
found by averaging over genotypes, and g the statistical interaction of
[d] and e. The lower expression is in terms of the same genetical
parameter [d], but with e_s measuring the change in expression of
genotype S between the environments and e_w similarly measuring the
change in expression of W. Three parameters are involved in each
formulation, [d] being the same in both, with e_s = e + g and
e_w = e − g.
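Adding g, whose sign in each cell is the product of the signs of [d] and
e (upper expressions of Table 42), gives a model with as many parameters
as observations. The sketch below, again using illustrative names of our
own choosing, checks that the fit is then exact and that g = −0.3825.

```python
OBSERVED = {("S", 18): 20.45, ("S", 25): 20.68,
            ("W", 18): 19.44, ("W", 25): 18.14}
D_SIGN = {"S": +1, "W": -1}
E_SIGN = {18: +1, 25: -1}


def fit_with_interaction(observed):
    """Return m, [d], e and g; g carries the product of the [d] and e signs."""
    m = sum(observed.values()) / 4
    d = sum(D_SIGN[l] * y for (l, _), y in observed.items()) / 4
    e = sum(E_SIGN[t] * y for (_, t), y in observed.items()) / 4
    g = sum(D_SIGN[l] * E_SIGN[t] * y for (l, t), y in observed.items()) / 4
    return m, d, e, g


m, d, e, g = fit_with_interaction(OBSERVED)
fitted = {(l, t): m + D_SIGN[l] * d + E_SIGN[t] * e + D_SIGN[l] * E_SIGN[t] * g
          for (l, t) in OBSERVED}

# Four parameters for four cells: the fit is exact up to rounding error.
worst = max(abs(OBSERVED[cell] - fitted[cell]) for cell in OBSERVED)
print(f"g = {g:.4f}, e_s = e + g = {e + g:.4f}, e_w = e - g = {e - g:.4f}")
print(f"largest |O - fitted| = {worst:.1e}")
```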
In the absence of interaction g will not depart significantly from 0,
and we can therefore test the adequacy of the simple model, which
assumes no interaction, by testing the significance of g. This can be
done in either of two ways, which both give the same answer. First,
since g = ¼(S18 − W18 − S25 + W25), where S18 is the mean chaeta number
of S at 18° C, etc.,

$$V_g = \tfrac{1}{16}\left(V_{S18} + V_{W18} + V_{S25} + V_{W25}\right), \qquad s_g = \sqrt{V_g}.$$

Each chaeta number in Table 41, from which g has been calculated, is the
mean of three of the observations in Table 40, and each of these
observations is subject to an error variance of 0.02072, based on 48 df,
as we have already seen. Thus each chaeta number in Table 41 has an
error variance of ⅓(0.02072) = 0.006907, and V_g will thus be
(1/16)(4 × 0.006907) = 0.001727, giving s_g = √V_g = 0.0416. Then, for
48 df, t = |g|/s_g = 0.3825/0.0416 = 9.2, giving a very small
probability. The interaction is thus significant and the simple model
must be judged to be inadequate. We may note that [d] and e will have
the same standard error as g, and when tested in the same way they also
both depart very significantly from 0.
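The standard error of g follows from the variance of a linear
combination of independent means. The Python sketch below restates that
arithmetic, assuming, as the text does, that the four means of Table 41
have independent errors, each of variance 0.02072/3.

```python
import math

ERROR_VAR_TABLE_40 = 0.02072   # error variance of each Table 40 entry (48 df)

# Each Table 41 mean averages three Table 40 entries (B, Y and U).
var_cell = ERROR_VAR_TABLE_40 / 3

# g = (1/4)(S18 - W18 - S25 + W25): four independent means, each with
# coefficient +-1/4, so Var(g) = (1/16) * 4 * var_cell.
var_g = 4 * var_cell / 16
se_g = math.sqrt(var_g)

g = (20.45 - 19.44 - 20.68 + 18.14) / 4
t = abs(g) / se_g
print(f"V_g = {var_g:.6f}, s_g = {se_g:.4f}, t (48 df) = {t:.1f}")
```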
The second way of testing g, and also [d] and e, is by an analysis of
variance of the four chaeta numbers in Table 41. As we have seen,
[d] = ¼(41.13 − 37.58) = ¼(3.55) = 0.8875. The SS accounted for by [d]
will thus be ¼(3.55)² = 3.1506. Since this SS stems from a single
parameter and hence corresponds to 1 df, the MS will be the same as the
SS. Finding the SS's accounted for by e and g similarly, we obtain the
analysis of variance shown in Table 43. This also includes the estimate
of error variance applicable to the chaeta numbers, which we found in
the previous paragraph to be 0.006907 and with which the MS's for the
three parameters must be compared to test their significance. Again all
three items are highly significant. Since each MS in the analysis stems
from 1 df, the VR obtained when it is divided by the error variance is a
t². Thus in the case of the g item, the VR is 84.7, giving
t = √(VR) = 9.2, as in the earlier test. The two ways of testing the
significance of g are thus no more than two forms of the same test.

TABLE 43.
Analysis of variance of the observations in Table 41

Item      df     MS          VR       P
[d]        1     3.1506      456.1    v.s.
e          1     0.2862       41.4    v.s.
g          1     0.5852       84.7    v.s.
Error     48     0.00691

v.s. = very small
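Table 43 can be generated by treating [d], e and g as orthogonal
contrasts with coefficients of plus or minus 1 on the four cells. In the
sketch below the contrast dictionary and variable names are illustrative
assumptions; the divisor 4 is the sum of the squared coefficients, and
the three SS together recover the whole SS among the four means.

```python
import math

OBSERVED = {("S", 18): 20.45, ("S", 25): 20.68,
            ("W", 18): 19.44, ("W", 25): 18.14}
ERROR_MS = 0.02072 / 3     # error variance of a Table 41 mean, 48 df

# Orthogonal +-1 contrasts for the three single-df items of Table 43.
CONTRASTS = {
    "[d]": {("S", 18): 1, ("S", 25): 1, ("W", 18): -1, ("W", 25): -1},
    "e":   {("S", 18): 1, ("S", 25): -1, ("W", 18): 1, ("W", 25): -1},
    "g":   {("S", 18): 1, ("S", 25): -1, ("W", 18): -1, ("W", 25): 1},
}

table_43 = {}
for item, coeffs in CONTRASTS.items():
    total = sum(coeffs[cell] * OBSERVED[cell] for cell in OBSERVED)
    ss = total ** 2 / 4        # 4 = sum of squared coefficients
    vr = ss / ERROR_MS         # MS equals SS for 1 df
    table_43[item] = (ss, vr)
    print(f"{item:4s} SS = MS = {ss:.4f}  VR = {vr:.1f}  t = {math.sqrt(vr):.1f}")

# The three items together account for all 3 df among the four means.
m = sum(OBSERVED.values()) / 4
total_ss = sum((y - m) ** 2 for y in OBSERVED.values())
```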
The significance of g shows that genotype × environment interaction is
present, or in other words that the two genotypes S and W do not react
equally to the change in temperature. This suggests an alternative
formulation for the phenotypes of the two lines at the two temperatures,
in which e and g are replaced by two different parameters, e_s and e_w,
measuring respectively the differences produced in S and W by the
alteration of temperature. Thus the phenotype of S is m + [d] + e_s at
18° C and m + [d] − e_s at 25° C, while those of W at the two
temperatures are m − [d] + e_w and m − [d] − e_w, as set out in the
lower expressions of Table 42. This formulation has the advantage that
e_s and e_w are properties of the individual lines, unlike e and g,
which are compounds of the properties of the two lines. As such, e_s and
e_w are biologically more directly meaningful than e and g, and indeed
are direct measures of the sensitivity of the two lines to change in an
aspect of the environment. They thus measure a character which is
prospectively important and whose genetic basis can be investigated in a
direct way.
Now [d], e_w and e_s permit a complete specification of the phenotypes,
as do [d], e and g. Clearly therefore, since [d] is common to both
formulations, e_s and e_w must relate to e and g. In fact e_s = e + g
and e_w = e − g, or, put the other way round, e = ½(e_s + e_w) while
g = ½(e_s − e_w); and the SS jointly accounted for by e_s and e_w equals
that jointly accounted for by e and g, each SS corresponding of course
to 2 df. Thus given the values of one pair of parameters the values of
the other two can be found: they are no more than alternative ways of
representing the same thing and are readily converted into each other.
The formulation to be used may be chosen by its convenience for the
investigation or analysis in hand. In general, while e_s and e_w are the
more biologically meaningful pair, e and g are commonly the more
analytically useful, although this is not always the case.
In the present example e_s = ½(20.45 − 20.68) = e + g = 0.2675 − 0.3825
= −0.115, while e_w = ½(19.44 − 18.14) = e − g = 0.2675 − (−0.3825)
= 0.650. We note that e_s and e_w are each found as half the difference
between two of the observed values in Table 41, each of which has an
error variance of 0.006907. Hence the variances of e_s and e_w are each
¼(2 × 0.006907) = 0.003454, and their standard errors are each
√0.003454 = 0.05877. The difference between e_s and e_w is significant
(which is, of course, an alternative way of demonstrating genotype ×
environment interaction and leads in fact to exactly the same test of
significance that we have already used), and e_w is significantly
greater than 0, but e_s is not significantly negative on these results.
Thus while we can say that the two genotypes respond differently to the
change in temperature, we cannot say from these data that they respond
in different directions.
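The alternative parameters e_s and e_w, their standard errors and the
conversion back to e and g can be sketched as follows. The names are
illustrative, and the sketch assumes an error variance of 0.02072/3 for
each Table 41 mean. For reference, the two-sided 5 per cent point of t
for 48 df is about 2.01, so the |t| of about 1.96 for e_s falls just
short of it.

```python
import math

S18, S25, W18, W25 = 20.45, 20.68, 19.44, 18.14
VAR_CELL = 0.02072 / 3             # error variance of each Table 41 mean

e_s = (S18 - S25) / 2              # response of S to the temperature change
e_w = (W18 - W25) / 2              # response of W
se = math.sqrt(2 * VAR_CELL / 4)   # half of a difference between two means

# Converting back to the [d], e, g parameterization.
e = (e_s + e_w) / 2
g = (e_s - e_w) / 2

# e_s and e_w use disjoint cells, so their difference has variance 2 * se**2.
t_diff = (e_s - e_w) / (math.sqrt(2) * se)

print(f"e_s = {e_s:+.3f} (t = {e_s / se:+.2f}), "
      f"e_w = {e_w:+.3f} (t = {e_w / se:+.2f})")
print(f"e = {e:.4f}, g = {g:.4f}, t for e_s - e_w = {t_diff:.1f}")
```

The t for e_s − e_w equals the t of about 9.2 found for g, as the text
states.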

26. A more complex case


So far we have been discussing the simplest case of two genotypes in two
environments, and we have derived two different approaches to the
detection and measurement of the interaction. The first, using [d], e
and g, leads to an analysis of variance into items for the effects of
the genetic difference, the environmental difference and their
interaction, in the familiar statistical way. The second, using [d], e_s
and e_w, depends on finding and comparing the changes produced by the
temperature difference in the two lines taken individually. Both
approaches are readily generalized to deal with any number of lines in
any number of environments.
Table 40 gives the mean numbers of sternopleural chaetae not only for S
and W but for their F1 and F2 also. Strictly we should bring neither the
F1 data nor those from F2 into the same analysis of variance as S and W,
since the observations on them are subject to error variances different
from that of the two parents. The error variance of the F1 observations,
however, differs from that of S and W only by a factor of two, which is
not likely to lead to problems of interpretation if we include them in
the same analysis, especially if we are conservative and assume that the
parental lines' error variance applies to F1 as well. The F2 results, on
the other hand, have an error variance about five times greater than
that of the parents and will be excluded from the analysis of variance
for this reason.
Taking S, W and F1 we have observations on three lines in six
environments, or eighteen observations, which will of course yield 17 df
in the analysis. Of these, 2 df will correspond to the genetic
differences among the three lines and 5 df to the differences among the
six environments. The remaining 2 × 5 = 10 df will correspond to the
interaction of the two main effects. The genetical items depend on
differences analogous to [d] in the simple case, the environmental items
on differences of type e, and the interaction on differences of type g.
The simple analysis of variance of the eighteen observations is set out
in Table 44, the error variance used being that pertaining to the
observations on S and W in Table 40, as we have already noted. It is
clear that all three items in the simple analysis of variance (set out
in the upper part of the table) are very significant when tested against
this estimate of error. There is thus evidence not only of genetical
differences among S, W and F1 and of differences among the six
environments, but also of interaction between the genetic and
environmental differences: the lines do not change equally as the
environment alters, as indeed we have already seen to be so in the
simpler case of S and W at the two temperatures.

TABLE 44.
Analysis of variance of the observations on S, W and F1 in Table 40

Item                      df    MS        VR      P
Lines (L)                  2    4.8163    232.4   v.s.
Environments (E)           5    0.6026     29.1   v.s.
Interaction (I)           10    0.2743     13.2   v.s.
Error                     48    0.02072
----------------------------------------------------
L1 (S − W)                 1    9.4519    456.2   v.s.
L2 (S + W − 2F1)           1    0.1806      8.7   v.s.
I1                         5    0.4433     21.4   v.s.
I2                         5    0.1052      5.1   0.001
----------------------------------------------------
E1 (18 − 25° C)            1    2.5163    121.4   v.s.
E2 (culture types)         2    0.1750      8.4   0.001
I1′                        2    1.0739     51.8   v.s.
I2′                        4    0.0669      3.2   0.05-0.01
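The simple analysis of variance in the upper part of Table 44 is an
ordinary two-way analysis without replication. The following sketch
assumes the S, W and F1 rows of Table 40 and the parental error variance
0.02072; two_way_anova is a hypothetical helper written here, not a
library function.

```python
# Table 40 means for the three lines analysed together (F2 excluded).
TABLE_40 = {
    "S":  [20.58, 20.51, 20.26, 20.44, 20.93, 20.66],
    "W":  [19.63, 19.34, 19.34, 18.67, 18.14, 17.61],
    "F1": [19.98, 20.01, 20.16, 19.22, 18.93, 18.48],
}
ERROR_MS = 0.02072   # error variance of a parental entry, 48 df


def two_way_anova(rows):
    """Lines x environments ANOVA with one observation per cell."""
    n_lines, n_envs = len(rows), len(rows[0])
    grand = sum(sum(r) for r in rows)
    cf = grand ** 2 / (n_lines * n_envs)          # correction factor
    total_ss = sum(y * y for r in rows for y in r) - cf
    lines_ss = sum(sum(r) ** 2 for r in rows) / n_envs - cf
    env_totals = [sum(r[j] for r in rows) for j in range(n_envs)]
    envs_ss = sum(t * t for t in env_totals) / n_lines - cf
    inter_ss = total_ss - lines_ss - envs_ss      # remainder is interaction
    return {
        "Lines": (lines_ss, n_lines - 1),
        "Environments": (envs_ss, n_envs - 1),
        "Interaction": (inter_ss, (n_lines - 1) * (n_envs - 1)),
    }


anova = two_way_anova(list(TABLE_40.values()))
for item, (ss, df) in anova.items():
    ms = ss / df
    print(f"{item:<13s} df = {df:2d}  SS = {ss:.4f}  "
          f"MS = {ms:.4f}  VR = {ms / ERROR_MS:.1f}")
```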
We can take the analysis further. First we can compare the behaviour of
S and W over all six environments. Thus in bottles (B) at 18° C we can
find from Table 40 S − W = 20.58 − 19.63 = 0.95, and so on. The sum of
the six differences is 10.65, which contributes (1/12)(10.65)² = 9.4519
for 1 df to the SS of 9.6325 for 2 df (yielding a MS of 4.8163) for
lines (L) in the main analysis. The SS of the six differences round
their mean is found as ½(0.95² + 1.17² + … + 3.05²) − (1/12)(10.65)²,
the divisors 2 and 12 being the numbers of observations that go into
each difference and into the sum of the differences, respectively. This
SS turns out to be 2.2167 for, of course, 5 df among the six
differences, giving a MS of ⅕ × 2.2167 = 0.4433 for the interaction of
the genetic difference between S and W with the six environments. These
two items appear as L1 and I1 in the middle part of Table 44 and both
are very significant, so bearing out our earlier test of the two lines
over the two temperatures (Table 43) in showing that they do not react
in the same way to changes of environment. The remaining comparison (L2)
among the three lines is of S and W, taken together, with the F1. We can
find it as the difference between the SS for 2 df for lines in the main
analysis and that for L1. Thus the SS for this comparison of parents,
taken together, with F1 (L2) is 9.6325 − 9.4519 = 0.1806, which,
corresponding as it does to 1 df, is also the MS. The interaction item
I2 is found similarly from the SS for interaction, having 10 df, in the
main analysis and the SS for I1, having 5 df. The SS for I2, which also
has 5 df, is thus 2.7426 − 2.2167 = 0.5259, giving a MS of 0.1052, as
entered in the middle part of the table. Although smaller than I1, I2 is
also significant, with a P of just about 0.001. This further analysis of
the lines and interaction items shows not only that S and W differ in
their overall effects on chaeta number and in their interaction with
environments as well, but that the F1 differs from the joint or mean
behaviour of its parents, again both in overall effect on chaeta number
and in interaction with the environments. We shall look further into
these relationships in Section 28.
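Since S − W and S + W − 2F1 are orthogonal contrasts, L2 and I2 can
also be computed directly from their own contrast rather than by
subtraction. The sketch below does both partitions with one hypothetical
helper, contrast_items; dividing by the sum of squared coefficients
generalizes the divisors 2 and 12 used in the text.

```python
ROWS = {
    "S":  [20.58, 20.51, 20.26, 20.44, 20.93, 20.66],
    "W":  [19.63, 19.34, 19.34, 18.67, 18.14, 17.61],
    "F1": [19.98, 20.01, 20.16, 19.22, 18.93, 18.48],
}
N_ENVS = 6


def contrast_items(coeffs):
    """Split a contrast among lines into its overall SS (1 df) and its
    interaction with environments (N_ENVS - 1 df)."""
    per_env = [sum(c * ROWS[line][j] for line, c in coeffs.items())
               for j in range(N_ENVS)]
    k = sum(c * c for c in coeffs.values())   # sum of squared coefficients
    overall = sum(per_env) ** 2 / (k * N_ENVS)
    interaction = sum(v * v for v in per_env) / k - overall
    return overall, interaction


l1, i1 = contrast_items({"S": 1, "W": -1})           # L1 and I1
l2, i2 = contrast_items({"S": 1, "W": 1, "F1": -2})  # L2 and I2
print(f"L1 = {l1:.4f}  I1 = {i1:.4f} (MS {i1 / 5:.4f})")
print(f"L2 = {l2:.4f}  I2 = {i2:.4f} (MS {i2 / 5:.4f})")
```

Because the contrasts are orthogonal, L1 + L2 reproduces the lines SS of
9.6325 and I1 + I2 the interaction SS of 2.7426.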
Just as we can compare the lines with one another over all six
environments, we can compare the effects of the environments with one
another over all three lines. Of the 5 df among the six environments,
one can be identified as relating to the difference between the two
temperatures, and two more as relating to the differences among the
three types of container, B, Y and U. The effect of temperature is
calculated by first finding the difference in the chaeta number at 18
and 25° C for the three lines separately. Thus the difference for W is
19.63 + 19.34 + 19.34 − 18.67 − 18.14 − 17.61 = 3.89, those for S and F1
being similarly −0.68 and 3.52. The overall difference between the two
temperatures is hence 3.89 − 0.68 + 3.52 = 6.73, which gives a SS of
(1/18)(6.73)² = 2.5163 for 1 df, the divisor 18 being of course the
number of observations of which the 6.73 is composed. This SS is part of
the SS, having 5 df, for environments in the main analysis. It is shown
as E1 in the lowest part of Table 44. The interaction of lines with the
temperature difference (I1′ in the lowest part of Table 44, not to be
confused with I1 in the middle part of the table) is found as
(1/6)[3.89² + (−0.68)² + 3.52²] − (1/18)(6.73)² = 2.1479 for 2 df, the
divisor 6 being the number of observations in each line's difference,
giving a MS of 1.0739. This is part of the interaction SS in the main
analysis of variance, and is very significant when tested against the
error variance, 0.02072.
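The temperature item E1 and its interaction I1′ are the same kind of
partition, applied now to an environmental contrast. In this sketch the
coefficients (+1 for the three 18° C environments, −1 for the three
25° C ones) are the assumption; the divisors 6 and 18 in the text are
the sums of squared coefficients for one line and for all three lines.

```python
ROWS = {
    "S":  [20.58, 20.51, 20.26, 20.44, 20.93, 20.66],
    "W":  [19.63, 19.34, 19.34, 18.67, 18.14, 17.61],
    "F1": [19.98, 20.01, 20.16, 19.22, 18.93, 18.48],
}
TEMP_COEFFS = [1, 1, 1, -1, -1, -1]   # B, Y, U at 18 deg C minus the same at 25

# 18 deg C minus 25 deg C total for each line.
per_line = {line: sum(c * y for c, y in zip(TEMP_COEFFS, values))
            for line, values in ROWS.items()}
k = sum(c * c for c in TEMP_COEFFS)   # 6 observations per line difference

e1_ss = sum(per_line.values()) ** 2 / (k * len(per_line))   # divisor 18
i1p_ss = sum(v * v for v in per_line.values()) / k - e1_ss  # 2 df

print({line: round(v, 2) for line, v in per_line.items()})
print(f"E1 SS = {e1_ss:.4f}, I1' SS = {i1p_ss:.4f} (MS {i1p_ss / 2:.4f})")
```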
The interaction with container types is most easily found by
constructing a 3 × 3 table in which each entry is the sum of two
corresponding values, one from each temperature, the nine entries being
one for each container type in each line. Thus the value for B in line W
is 19.63 + 18.67 = 38.30, that for U in F1 is 20.16 + 18.48 = 38.64, and
so on. An analysis of variance can then be carried out on the entries in
this 3 × 3 table, an additional factor of two being used in all the
divisors because each entry is the sum of two of the initial
observations from Table 40. One margin of the 3 × 3 table will yield a
SS for 2 df reflecting the line differences and will be exactly the same
as the lines item in the main analysis of variance. The other margin
yields a SS of 0.3499 for 2 df, giving a MS of 0.1750, for the overall
variation between the three container environments (E2 in the lowest
part of Table 44). This is again, of course, part of the environments
item in the main analysis of variance. Finally, to complete the analysis
of this 3 × 3 table, we obtain a SS of 0.2674 for 2 × 2 = 4 df, giving a
MS of 0.0669 for the interaction of the genetic differences among the
three lines with the environmental differences among the three container
types (I2′ in the lowest part of Table 44). The overall effect of
container types (E2) gives a VR of 0.1750/0.02072 = 8.4 when compared
with the error variance, showing significance at P ≈ 0.001. The VR for
the interaction of container types with lines (I2′) when compared with
error is 3.2, which again is significant, P lying between 0.05 and 0.01.
Evidently the three lines are not alike in their reactions to container
type, although this interaction is smaller than the interaction with
temperature (I1′), just as the overall effect of container (E2) is
smaller than that of temperature (E1). We shall discuss the comparisons
of temperature and container interactions further in the next Section.
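The 3 × 3 table described above can be built and analysed as follows.
This is a sketch under the stated layout assumption (Table 40 columns
ordered B, Y, U at each temperature); the constant ENTRY_SIZE is our own
name for the extra factor of two carried by every divisor.

```python
ROWS = {
    "S":  [20.58, 20.51, 20.26, 20.44, 20.93, 20.66],
    "W":  [19.63, 19.34, 19.34, 18.67, 18.14, 17.61],
    "F1": [19.98, 20.01, 20.16, 19.22, 18.93, 18.48],
}
ENTRY_SIZE = 2   # each 3 x 3 entry sums two Table 40 observations

# Collapse over temperature: one entry per container type (B, Y, U) per line.
collapsed = {line: [v[j] + v[j + 3] for j in range(3)] for line, v in ROWS.items()}

entries = [x for row in collapsed.values() for x in row]
cf = sum(entries) ** 2 / (ENTRY_SIZE * len(entries))      # 18 observations
lines_ss = sum(sum(row) ** 2 for row in collapsed.values()) / (ENTRY_SIZE * 3) - cf
container_totals = [sum(row[j] for row in collapsed.values()) for j in range(3)]
e2_ss = sum(t * t for t in container_totals) / (ENTRY_SIZE * 3) - cf
table_ss = sum(x * x for x in entries) / ENTRY_SIZE - cf
i2p_ss = table_ss - lines_ss - e2_ss                      # 4 df

print(f"W, B: {collapsed['W'][0]:.2f}   F1, U: {collapsed['F1'][2]:.2f}")
print(f"lines SS = {lines_ss:.4f}, E2 SS = {e2_ss:.4f} (MS {e2_ss / 2:.4f}), "
      f"I2' SS = {i2p_ss:.4f} (MS {i2p_ss / 4:.4f})")
```

As the text notes, the lines margin reproduces the lines SS of the main
analysis, which is a useful check on the layout.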
As we have noted, this approach to genotype × environment interactions,
through the analysis of variance, utilizes the formulation in [d], e and
g. The second approach, to which we now turn, utilizes the alternative
formulation which is represented in the simple case of S and W at the
two temperatures by [d], e_s and e_w. Now e_s and e_w measure the
differences in S and W produced by the change in temperature. We could
obviously introduce further parameters to represent the changes produced
in S and W by the changes in culture containers. Altogether five
orthogonal parameters would be required to specify the differences of
chaeta number in S among the six environments, and similarly five more
for W. Now given that they are orthogonal to one another, as they must
be if the specification is to be adequate, the five parameters for, say,
S will make independent contributions to the SS, for of course 5 df,
among the six chaeta numbers, one from each environment. Thus V_S, the
variance of S over environments, will reflect the values of these five
e_s parameters, and so the response of this genotype to the
environmental changes. V_W, the variance of W over environments, will
similarly reflect the values of the five e_w parameters, and if the
corresponding e_s and e_w parameters are not equal to one another, that
is if there is genotype × environment interaction, V_S will not in
general equal V_W. So we can detect the presence of genotype ×
environment interaction by comparing the variances of the different
lines taken over environments.
This test is applied to the data of Table 40, including now the F2,
since the difference between its error variance and that of the other
lines is of lesser importance in relation to this procedure. The results
of the test are set out in Table 45, from which it is immediately
apparent that the variances are not alike. In particular V_W = 0.628 is
much larger than V_S = 0.0508, giving with it a VR of V_W/V_S = 12.4,
which for 5/5 df has a probability of P < 0.02, after doubling the P to
allow for putting the larger variance V_W over the smaller V_S in the
VR. There is no doubt about the significance of the interaction of these
genotypes with the environments, as indeed we have already found using
the analysis of variance. The comparison of V_W and V_S, however, gives
us further information not immediately available from the analysis of
variance: since V_W is bigger than V_S, the W line must change more than
S over environments; W is more sensitive to environmental change than S.
Furthermore, V_F1 and V_F2 are significantly greater than V_S, although
neither is significantly smaller than V_W. It would thus appear that
both F1 and F2 are closer to their W parent than to S in their
sensitivity to environmental change. We must, however, still be a little
cautious where the F2 is concerned because of its markedly larger error
variance (Table 40).

TABLE 45.
Variances over environments of S, W, F1 and F2 (Table 40)

        Over all         Over
        environments     temperatures     Remainder
df      5                1                4
S       0.0508           0.0771           0.0443
W       0.6280           2.5220           0.1544
F1      0.4723           2.0651           0.0742
F2      0.3777           1.4211           0.1168

All entries are mean squares.
We can partition the changes over the six environments into that
related to temperature, for 1 df, and the rest, involving types of
culture container, for 4 df. In respect of W, the total SS over all six
environments is

$$SS_W = (19.63^2 + 19.34^2 + \dots + 17.61^2) - \tfrac{1}{6}(19.63 + 19.34 + \dots + 17.61)^2 = 3.1397,$$

giving V_W = 3.1397/5 = 0.6280, as entered in Table 45.

The SS for the temperature difference is similarly

$$SS_{WT} = \tfrac{1}{3}\left[(19.63 + 19.34 + 19.34)^2 + (18.67 + 18.14 + 17.61)^2\right] - \tfrac{1}{6}(112.73)^2 = 2.5220,$$

which, corresponding as it does to 1 df, is also V_WT. The SS remaining,
and corresponding to 4 df, is thus

$$SS_{WR} = 3.1397 - 2.5220 = 0.6177, \qquad V_{WR} = 0.6177/4 = 0.1544.$$

These and similar results for S, F1 and F2 are included in Table 45.
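The partition of each line's variance over environments into
temperature and remainder items can be sketched as below. It assumes the
Table 40 layout used earlier, and the helper name variance_partition is
hypothetical. Its results agree with Table 45 to within about one unit
in the fourth decimal place, and it also returns V_W/V_S of about 12.4.

```python
# Table 40 means, each list ordered B, Y, U at 18 deg C then B, Y, U at 25 deg C.
TABLE_40 = {
    "S":  [20.58, 20.51, 20.26, 20.44, 20.93, 20.66],
    "W":  [19.63, 19.34, 19.34, 18.67, 18.14, 17.61],
    "F1": [19.98, 20.01, 20.16, 19.22, 18.93, 18.48],
    "F2": [20.19, 19.86, 19.75, 19.45, 18.68, 18.75],
}


def variance_partition(values):
    """Return (variance over all 6 environments, V_T, V_R) for one line."""
    cf = sum(values) ** 2 / len(values)
    ss_all = sum(y * y for y in values) - cf                          # 5 df
    ss_temp = (sum(values[:3]) ** 2 + sum(values[3:]) ** 2) / 3 - cf  # 1 df
    ss_rem = ss_all - ss_temp                                         # 4 df
    return ss_all / 5, ss_temp / 1, ss_rem / 4


table_45 = {line: variance_partition(v) for line, v in TABLE_40.items()}
for line, (v_all, v_t, v_r) in table_45.items():
    print(f"{line:2s}  {v_all:.4f}  {v_t:.4f}  {v_r:.4f}")

ratio_w_s = table_45["W"][0] / table_45["S"][0]
print(f"V_W / V_S = {ratio_w_s:.1f}")
```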
In W, F1 and F2 the effect of temperature accounts for much of the
response to the environmental changes, V_T significantly exceeding V_R
at the 0.01 level of probability in W and F1, and at the 0.05 level in
F2. With S the effect of temperature is relatively much smaller, and
although V_T is greater than V_R even in this case, it is not
significantly so. The differences among the four lines for V_R are not
significant, but again V_R is smallest for S, largest for W and
intermediate for F1 and F2. There is thus at least a hint that the order
of the four lines is basically the same for sensitivity to changes
involving container type as for sensitivity to the temperature change.
It should be observed, however, that although V_R is much smaller than
V_T for all lines but S, it is on the other hand significantly greater
than the relevant error variance (see Table 40) in W and F1, and even in
S it is approaching significance. Thus W and F1, at least, do change
with container, although to a smaller extent than with temperature,
while S gives some appearance (even if it is not significant) of being
less sensitive than the others to changes involving container type as
well as to temperature. The high error variance of F2 renders it
relatively uninformative in the present connection; but even leaving it
aside, the results from the analysis again pose the question of whether
W and F1 are just more sensitive to any environmental change than is S,
or whether the differences in reaction to temperature and container
changes, although both smaller in S than in the others, fail in fact to
follow precisely the same relative patterns in all the lines. This is a
question which we must now examine further.
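The comparisons in the paragraph above rest on two sets of ratios: V_T
against V_R within each line, and V_R against the error variance of the
line's entries. The sketch below simply computes them from Table 45 and
the error variances as read from Table 40; it attaches no probabilities,
which would need F tables for the relevant degrees of freedom.

```python
# Table 45: (variance over all environments, V_T, V_R) for each line.
TABLE_45 = {
    "S":  (0.0508, 0.0771, 0.0443),
    "W":  (0.6280, 2.5220, 0.1544),
    "F1": (0.4723, 2.0651, 0.0742),
    "F2": (0.3777, 1.4211, 0.1168),
}
# Error variances of the entries, as read from Table 40.
ERROR_VARIANCE = {"S": 0.02072, "W": 0.02072, "F1": 0.01033, "F2": 0.10182}

ratios = {}
for line, (_, v_t, v_r) in TABLE_45.items():
    # (temperature against remainder, remainder against error)
    ratios[line] = (v_t / v_r, v_r / ERROR_VARIANCE[line])
    print(f"{line:2s}  V_T/V_R = {ratios[line][0]:5.1f}  "
          f"V_R/error = {ratios[line][1]:4.1f}")
```

The printout shows the pattern described in the text: the temperature
ratio is small for S and large for the other lines, while V_R stands well
above its error variance in W and F1 but barely above it in F2.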

27. The relation of g to e


So far our discussion of genotype × environment interaction has not required us to introduce measurements of the environment that would allow the environmental changes to be quantified, and hence allow different changes to be compared with one another in their effects on the interaction. Such quantification is of course easy enough where temperature is altered: it can be measured in °C. The temperature change used in the experiment we have been discussing was 7°C, and if more than two temperatures had been used, the changes they represented could have been compared on this scale. The changes in container, and the culture conditions which they imply, are however not so easily quantifiable. There is no obvious scale on which we can simultaneously represent the differences in size and shape of the containers themselves and the differences in food mass and supply of yeast. Furthermore, if we are to compare the differences in response of the two lines S and W to change of temperature with their differences in response to containers, we need a single scale on which all the variations in environment can be represented. The only way in which we can achieve this is by seeking a biological measurement of the environment and its changes, that is, by measuring the environment through its effects on the organisms themselves.
If for a moment we confine ourselves to the temperature difference and go back to the formulation in [d], e and g as set out in Table 42, we see that the average of the S and W chaeta numbers at 18°C was m + e and at 25°C was m - e. These averages are independent of both [d] and g. They thus afford us a measure of the average or overall effect of the change in temperature. The bottom margin of Table 41 shows that the average at 18°C was m + e = 39.89/2 = 19.945, and at 25°C was m - e = 19.410. This gives e = ½(19.945 - 19.410) = 0.2675, as indeed we found earlier (Section 25).
Now suppose that, instead of taking the average (i.e. half the sum) of the S and W chaeta numbers at 18°C, we had taken half their difference. We see from Table 42 that this will give us

$$\tfrac12(S - W) = \tfrac12\big\{(m + [d] + e + g) - (m - [d] + e - g)\big\} = [d] + g .$$

At 25°C we find similarly that ½(S - W) = [d] - g. Then taking the data of Table 41, [d] + g = ½(20.45 - 19.44) = 0.505 and [d] - g = ½(20.68 - 18.14) = 1.270, giving [d] = 0.8875 and g = -0.3825, again as found earlier. Thus when the overall effect of the environment changes by e = 0.2675, the interaction changes by g = -0.3825. In other words, the ratio of change in the interaction to that in the overall effect of the environment is g/e = -0.3825/0.2675 = -1.4299. This assumes a straight-line relation between g and e, which we cannot test with only two temperatures and must therefore assume with caution. Given that assumption, a change of temperature that produces an effect e in the average chaeta number given by these genotypes will alter their difference by -1.4299 × 2e = -2.8598e. It will do so through a change in S of e + g = -0.4299e and a change in W of e - g = 2.4299e.
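The two-temperature arithmetic above can be checked with a few lines of Python. This is a minimal sketch. The four inputs are the rounded Table 41 means quoted in the text, and the function name `two_temperature_estimates` is purely illustrative. The last line relies on the same straight-line assumption about g and e as the text.

```python
def two_temperature_estimates(s18, w18, s25, w25):
    """Estimate e, [d] and g from the S and W means at two temperatures.

    Model (Table 42): at 18 C  S = m + [d] + e + g,  W = m - [d] + e - g;
                      at 25 C  S = m + [d] - e - g,  W = m - [d] - e + g.
    """
    e = ((s18 + w18) / 2 - (s25 + w25) / 2) / 2   # half the change in the S-W average
    d_plus_g = (s18 - w18) / 2                     # [d] + g at 18 C
    d_minus_g = (s25 - w25) / 2                    # [d] - g at 25 C
    d = (d_plus_g + d_minus_g) / 2
    g = (d_plus_g - d_minus_g) / 2
    return e, d, g


# Rounded means from Table 41: S and W at 18 C, then S and W at 25 C.
e, d, g = two_temperature_estimates(20.45, 19.44, 20.68, 18.14)
ratio = g / e
print(f"e = {e:.4f}  [d] = {d:.4f}  g = {g:.4f}  g/e = {ratio:.4f}")
# Assuming g changes linearly with e, each unit of e changes
# S by 1 + g/e, W by 1 - g/e and the difference S - W by 2g/e.
print(f"S: {1 + ratio:.4f}e  W: {1 - ratio:.4f}e  S - W: {2 * ratio:.4f}e")
```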
This treatment is readily extended to take all six environments into account. For convenience the environments have been numbered 1 to 6, where 1 is B at 18°C and so on, as shown in Table 46. ½(S + W) is then found from the data of Table 40 as entered in Table 46. Each of these entries is m + e, where e1 to e6, from the six environments, sum to 0. m is the average of the six values and turns out to be 19.67583; deducting it from ½(S + W) gives the values of e1 to e6 as shown. (This value for m does not agree exactly with that found in Section 25, where only the temperatures were being considered. The reason is that the data of Table 41, although found from those of Table 40, were rounded off to the second decimal place for ease of calculation.) Next the six values of ½(S - W) are found. These are [d] + g. Since the six values of g must sum to 0, their average gives [d] = 0.8875, and deducting this from ½(S - W) leaves the six g's.

TABLE 46.
Relation of g to e in S and W

Environment          1          2          3          4          5          6          Mean
                     (18°C B)   (18°C Y)   (18°C U)   (25°C B)   (25°C Y)   (25°C U)
½(S + W) = m + e     20.105     19.925     19.800     19.555     19.535     19.135     19.6758 (= m)
e                    0.4292     0.2492     0.1242     -0.1208    -0.1408    -0.5408    0
½(S - W) = [d] + g   0.475      0.585      0.460      0.885      1.395      1.525      0.8875 (= [d])
g                    -0.4125    -0.3025    -0.4275    -0.0025    0.5075     0.6375     0

SS(e) = 0.5886    SS(g) = 1.1084    S.C.P. = -0.7214    b = -1.2256

Analysis of variance of g
Item          df    MS        VR      P
Regression    1     0.8842    85.3    v.s.
Remainder     4     0.0560    5.4     0.01-0.001
Error         48    0.0104

VR for Regression/Remainder = 15.8 with P = 0.05-0.01
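The decomposition just described amounts to splitting each row of Table 46 into its mean and a set of deviations that sum to zero. A minimal Python sketch follows, taking the ½(S + W) and ½(S - W) entries as given. The helper name `split_mean` is illustrative only.

```python
# Table 46: half-sum and half-difference of S and W in environments 1-6
half_sum = [20.105, 19.925, 19.800, 19.555, 19.535, 19.135]   # m + e_i
half_diff = [0.475, 0.585, 0.460, 0.885, 1.395, 1.525]        # [d] + g_i


def split_mean(values):
    """Return the mean and the deviations from it (which sum to zero)."""
    mean = sum(values) / len(values)
    return mean, [v - mean for v in values]


m, e = split_mean(half_sum)     # m and e_1 ... e_6
d, g = split_mean(half_diff)    # [d] and g_1 ... g_6
print(f"m = {m:.5f}  [d] = {d:.4f}")
for i, (ei, gi) in enumerate(zip(e, g), start=1):
    print(f"environment {i}: e = {ei:+.4f}  g = {gi:+.4f}")
```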

[Figure 16: the six values of g (ordinate) plotted against e (abscissa), points numbered 1 to 6 by environment, with the fitted regression line of g on e.]
Fig. 16. The regression of g on e for sternopleural chaeta number in two
lines of Drosophila melanogaster (S and W) raised in six environments (1 to
6). The regression line of g on e has a slope of -1.2256. Because this slope
lies outside the range 1 to -1, the two lines of flies respond in opposite
directions to the relevant change in the environment (see also Fig. 17).

These six g's are plotted against their corresponding e's in Fig. 16, from which it is clear that there is a negative relation between g and e, g falling as e rises. We can test whether this relation is rectilinear by finding the regression of g on e. The calculation is shown at the foot of Table 46. The SS for e is found simply as

$$SS(e) = e_1^2 + e_2^2 + \dots + e_6^2 ,$$

since the sum of the six e's must be 0. It is nevertheless easier to find this SS as

$$SS(e) = (m + e_1)^2 + (m + e_2)^2 + \dots + (m + e_6)^2 - \tfrac16\big[(m + e_1) + (m + e_2) + \dots + (m + e_6)\big]^2 ,$$

because every m + e is known exactly, whereas all the e's involve recurring decimals. Similarly $SS(g) = g_1^2 + g_2^2 + \dots + g_6^2$, while the S.C.P. of g and e is $e_1g_1 + e_2g_2 + \dots + e_6g_6$. Then the linear regression coefficient of g on e is b = S.C.P./SS(e) = -0.7214/0.5886 = -1.2256. The analysis of variance of g is carried out in the customary way. The SS for regression is (-0.7214)²/0.5886 = 0.8842; subtracting this from SS(g) leaves 1.1084 - 0.8842 = 0.2242 as the SS remaining. Since there are six environments each yielding an observation, there are 5 df. One of these is taken up by the regression itself, leaving 4 df for variation of the points round the regression line and giving a remainder MS of 0.2242/4 = 0.0560.
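The regression of g on e and the partition of SS(g) can be sketched in Python as follows. The sketch recomputes e and g from the Table 46 entries rather than using the rounded printed values. The helper names are illustrative, and sums of squares are taken directly on the deviations because each set of six sums to zero.

```python
# Table 46 entries for environments 1-6
half_sum = [20.105, 19.925, 19.800, 19.555, 19.535, 19.135]   # m + e_i
half_diff = [0.475, 0.585, 0.460, 0.885, 1.395, 1.525]        # [d] + g_i


def deviations(values):
    """Deviations from the mean; these sum to zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def regression_anova(x, y):
    """Linear regression of y on x (both centred) with the SS partition."""
    ss_x = sum(xi * xi for xi in x)
    ss_y = sum(yi * yi for yi in y)
    scp = sum(xi * yi for xi, yi in zip(x, y))
    ss_reg = scp ** 2 / ss_x             # 1 df for the regression
    df_rem = len(x) - 2                  # n - 1 df in total, minus 1 for regression
    return {"SS(e)": ss_x, "SS(g)": ss_y, "SCP": scp, "b": scp / ss_x,
            "SS regression": ss_reg, "MS remainder": (ss_y - ss_reg) / df_rem}


e = deviations(half_sum)
g = deviations(half_diff)
result = regression_anova(e, g)
for key, value in result.items():
    print(f"{key:14s} {value:8.4f}")
```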
Each g value is derived from the difference between an S observation and a W observation. Each observation is subject to an error variance of 0.02072, so the difference between two of them will have an error variance twice this value. Half the difference will have an error variance of one-quarter that of the difference itself. Thus g will be subject to a variance of ¼ × 2 × 0.02072 = 0.01036. When tested against this estimate of error, the remainder MS gives a VR of 5.41 for 4 and 48 df, with P between 0.01 and 0.001. The departures from the linear regression are thus significant. At the same time, the regression MS tested against the remainder MS yields a VR of 15.78, which for 1 and 4 df has P = 0.02-0.01. Thus, despite the variation round the line, there can be no doubt of the linear component in the regression of g on e.
This linear component must reflect the relation between g and e which we have already seen to be produced by the temperature changes. This relation plays a dominant part in producing the regression line because the effect of temperature in changing e is greater than the effects of the changes in culture container, as a glance at Fig. 16 will confirm. The significant variation about the regression line reflects the consequences of the changes in container. These changes must therefore produce interactions, g, that are not related to the overall effects, e, in the same way as those brought about by the alteration of temperature. Thus the relative responses of the two genotypes to change in culture container cannot be following the same pattern as their relative responses to change in temperature. It is therefore necessary to specify the type of environmental change before we can discuss the relative sensitivities of the two genotypes to it.
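The error variance of g and the variance ratios quoted above follow from simple arithmetic, sketched below. The sketch takes the per-observation error variance 0.02072 and the Table 46 sums of squares as given. P values are still read from F tables, as in the text, rather than computed here.

```python
def error_variance_of_half_difference(obs_error_var):
    """Variance of g, which is half the difference of two independent observations."""
    var_difference = 2 * obs_error_var     # variance of a difference
    return var_difference / 4              # halving divides the variance by 4


error_ms = error_variance_of_half_difference(0.02072)
ms_regression = 0.8842                     # regression MS, 1 df
ms_remainder = 0.2242 / 4                  # remainder SS 0.2242 on 4 df
vr_regression = ms_regression / error_ms
vr_remainder = ms_remainder / error_ms
vr_reg_vs_rem = ms_regression / ms_remainder
print(f"error variance of g = {error_ms:.5f}")
print(f"regression / error  = {vr_regression:.1f} for 1 and 48 df")
print(f"remainder / error   = {vr_remainder:.2f} for 4 and 48 df")
print(f"regression / remainder = {vr_reg_vs_rem:.2f} for 1 and 4 df")
```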
The plot of g against e in Fig. 16 brings out the relation between these two quantities clearly and simply. However, it shows us nothing of the sensitivities to environmental change of the individual genotypes S and W. A more informative, albeit somewhat more complex, picture can be obtained in a slightly different way. If we deduct m from the values given by S in the six environments, we are left with $[d] + e_1 + g_1$, $[d] + e_2 + g_2$, etc., which may of course be written in the alternative formulation as $[d] + e_{s1}$, $[d] + e_{s2}$, etc. Similarly, deducting m from the values given by W leaves $-[d] + e_1 - g_1$, $-[d] + e_2 - g_2$, etc., which may also be rewritten as $-[d] + e_{w1}$, $-[d] + e_{w2}$, etc. The values so obtained for the two genotypes are set out in Table 47 and are plotted against e in Fig. 17. The table also gives the linear regression coefficients, b, of S - m and W - m on e, and the regression lines themselves are shown on the figure.

[Figure 17: S - m and W - m (ordinate) plotted against e (abscissa) for environments 1 to 6, with the fitted regression lines.]
Fig. 17. The sensitivity diagram for sternopleural chaeta number in S
and W. The deviations of S and W from the mid-parent, m, for each environment
are plotted (ordinate) against e (abscissa). The six environments
are denoted by the numbers 1 to 6. The outer broken lines are the best
fitting regression lines of S - m and W - m on e. The mean of S - m and
W - m is e for each environment, and the central broken line derived from
these means is thus the regression of e on e and must have a slope of 1. The
diagram makes clear that W is more sensitive than S to change in the environment
and that the two change in opposite directions.
TABLE 47.
Sensitivity to environmental change in S, W, their F1 and F2

                             Environment
                             1         2         3         4         5         6         Mean      b
S - m (= [d] + es)           0.9042    0.8342    0.5842    0.7642    1.2542    0.9842    0.8875    -0.226
W - m (= -[d] + ew)          -0.0458   -0.3358   -0.3358   -1.0058   -1.5358   -2.0658   -0.8875   2.226
e (= ½[es + ew])             0.4292    0.2492    0.1242    -0.1208   -0.1408   -0.5408   0         1.000
F1 - m (= [h] + eh)          0.3042    0.3342    0.4842    -0.4558   -0.7458   -1.1958   -0.2125   1.836
F2 - m (= ½[h] + e + ½gh)    0.5142    0.1842    0.0742    -0.2258   -0.9958   -0.9258   -0.2292   1.603
The first point to note is that the means of S - m and W - m are [d] and -[d] respectively. The regression line for S - m must thus cut the ordinate of the graph at [d], and the regression line for W - m cuts it at -[d]. These two points must be equally spaced above and below the origin, as the figure shows. Next, the slope of the S - m regression line measures the rate of change of es = e + g on e. In other words, it measures the sensitivity of S to change in the environment. Equally, the slope of the W - m line measures the sensitivity of W to change in the environment. This is clearly much greater than the sensitivity of S, which, in so far as it changes at all, does so in the opposite direction. Now the slope of the S line depends on the change in es = e + g on e, while that of the W line depends on ew = e - g. The slope observed for the S line is -0.226, while that of the W line is 2.226. The regression of e on e, which is also shown in the figure, will obviously have a slope of 1. Thus the slope of the S line departs from that of e by -0.226 - 1.000 = -1.226, while that of the W line departs by 2.226 - 1.000 = 1.226. So the interaction of genotype with environment is responsible for a slope of -1.226 in S and 1.226 in W. These values are equal in magnitude but opposite in sign, as indeed they must be: e is found from the mean of S and W in each environment, with the S line reflecting the change of g and the W line that of -g. We may also note that -1.226, which measures the contribution of g to the slope of the S line, equals the slope already found in a different way for the regression of g on e (Table 46 and Fig. 16), as indeed it must. Table 47 and Fig. 17 thus give us all the information that we were able to obtain from Table 46 and Fig. 16, and more besides.
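The sensitivity slopes for S and W can be recomputed directly from the rows of Table 47, as in the minimal sketch below. The rounded table values are taken as given, and the helper name `slope` is illustrative. The final line shows that the departures of the two slopes from 1 are equal and opposite, as the text argues.

```python
# Table 47 (environments 1-6)
e = [0.4292, 0.2492, 0.1242, -0.1208, -0.1408, -0.5408]
s_minus_m = [0.9042, 0.8342, 0.5842, 0.7642, 1.2542, 0.9842]
w_minus_m = [-0.0458, -0.3358, -0.3358, -1.0058, -1.5358, -2.0658]


def slope(x, y):
    """Least-squares regression coefficient of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    scp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return scp / sum((xi - mx) ** 2 for xi in x)


b_s = slope(e, s_minus_m)
b_w = slope(e, w_minus_m)
b_e = slope(e, [(s + w) / 2 for s, w in zip(s_minus_m, w_minus_m)])  # e on e
print(f"b(S - m) = {b_s:.3f}  b(W - m) = {b_w:.3f}  b(e) = {b_e:.3f}")
# The departures from a slope of 1 are the contributions of g and -g.
print(f"interaction: S {b_s - b_e:+.3f}  W {b_w - b_e:+.3f}")
```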
This analysis of the genotype × environment interaction is possible only because the chaeta numbers displayed by S and W in the different environments provide their own biological measurement of the environments, and so quantify the overall effects of the various changes of environment. The biological measure, e, has allowed us to quantify the consequences of the changes in culture condition as well as those of the change in temperature, and to show both on the same scale. In doing so it has enabled us to compare the patterns of response to temperature and culture condition and to show that they are not the same. A further advantage, although not one brought out by our present data, is that g may display a rectilinear relation to environmental change measured by e, even where it fails to do so when the environment is measured in other and perhaps more obvious ways. As an example, two strains of the fungus Schizophyllum commune (Jinks and Connolly, 1973) were grown in a series of nine environments differing in temperature. They display interactions which, when quantified by g, are related in a curvilinear manner to temperature itself. But when temperature is replaced by the biological measure e, the relation of g to the environmental change becomes rectilinear, as is shown in Fig. 18.

[Figure 18: growth rate of strains L and H plotted against temperature (upper graph) and against e (lower graph); points numbered 1 to 9 by environment.]
Fig. 18. The effect of temperature on growth rate (in mm per nine days)
of a slow (L) and a fast (H) growing strain of Schizophyllum commune. The
upper graph shows growth rates plotted against temperature, and the lower
graph shows it plotted against e, the biological measure of the nine environ-
ments. The nine temperatures are denoted by the numbers 1 to 9, which
thus relate corresponding points on the two graphs.

28. Crosses between inbred lines


As we have seen, the F1 and F2 generations were raised in the six environments in addition to the S and W parental lines. Now the departure of the mean chaeta number of F1 from m, the mid-parent value, is [h], and that of the F2 mean is ½[h]. So whereas the interaction of the parental genetic difference is the interaction of [d] with e, the interaction of the F1 genotype, and with it any interaction of the F2 mean, will depend on the interaction of [h] with e. We must therefore distinguish between two g parameters: gd, measuring the interaction of [d] with e, and gh, measuring the interaction of [h] with e. The mean phenotype of the F1 thus becomes m + [h] + e + gh. In respect of any single gene difference A-a, the F2 will comprise individuals of whom ¼ will be AA, ½ Aa and ¼ aa, the homozygotes AA and aa being d and -d respectively. Their genetic deviations of d and -d from m thus cancel out. So equally will their genotype × environment interactions, which will of course be gd and -gd respectively. Thus, in the absence of complications, not only will the basic genetic component of the F2 mean be ½[h], but the interaction component will correspondingly be ½gh. The mean of F2 will thus be m + ½[h] + e + ½gh when summed over all the genes by which the parents differ, provided there are no complications such as those introduced by non-allelic interaction. So having observations on both parents together with their F1 and F2 means allows us to test the adequacy of the model we have developed for genotype × environment interaction, in just the same way that we tested the adequacy of the simple additive-dominance model in Section 9.
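For a single gene difference A-a in one environment, the cancellation can be written out explicitly. The deviations from m of AA, Aa and aa are weighted by their F2 frequencies ¼, ½ and ¼, and the common environmental effect e is added to every genotype. This is a worked step in the notation of this section:

$$\bar F_2 - m = \tfrac14(d + g_d) + \tfrac12(h + g_h) + \tfrac14(-d - g_d) + e = \tfrac12 h + e + \tfrac12 g_h .$$

Summing over all the genes by which the parents differ, and assuming no non-allelic interaction, gives m + ½[h] + e + ½gh as the F2 mean, compared with m + [h] + e + gh for the F1.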
We will take as our example the reaction of S, W and their derivatives to the change in temperature. The mean chaeta numbers of the parents and their F1 and F2 at both 18 and 25°C are shown in Table 48. The table also gives their structures in terms of the six parameters m, [d], [h], e, gd and gh, and the weights attached to each observed chaeta number in the analysis.

TABLE 48.
The model for genotype × environment interaction applied to the effect of temperature on sternopleural chaeta number in S, W, their F1 and F2

Generation  Temp. (°C)  Weight    m   [d]  [h]  e    gd   gh    Observed  Expected
S           18          144.783   1   1    0    1    1    0     20.450    20.447
S           25          144.783   1   1    0    -1   -1   0     20.677    20.668
W           18          144.783   1   -1   0    1    -1   0     19.437    19.434
W           25          144.783   1   -1   0    -1   1    0     18.140    18.131
F1          18          290.360   1   0    1    1    0    1     20.050    20.047
F1          25          290.360   1   0    1    -1   0    -1    18.877    18.868
F2          18          29.463    1   0    ½    1    0    ½     19.933    19.994
F2          25          29.463    1   0    ½    -1   0    -½    18.960    19.134

χ²[2] = 1.046, P = 0.7-0.5

The weights come, of course, from the variances given in Table 40. Each temperature mean is found by averaging three observations, B, Y and U, at that temperature. Its variance in, for example, the parent lines will therefore be 0.02072 ÷ 3 = 0.006907, and the weight is 1/0.006907 = 144.78. The six weighted least squares equations of estimation for the parameters are then obtained in a manner exactly analogous to that used in Section 9, and turn out to be, in matrix form,

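$$
\underbrace{\begin{bmatrix}
1218.777 & 0 & 610.183 & 0 & 0 & 0\\
0 & 579.131 & 0 & 0 & 0 & 0\\
610.183 & 0 & 595.451 & 0 & 0 & 0\\
0 & 0 & 0 & 1218.777 & 0 & 610.183\\
0 & 0 & 0 & 0 & 579.131 & 0\\
0 & 0 & 0 & 610.183 & 0 & 595.451
\end{bmatrix}}_{\mathbf{J}}
\underbrace{\begin{bmatrix} \hat m\\ [\hat d]\\ [\hat h]\\ \hat e\\ \hat g_d\\ \hat g_h \end{bmatrix}}_{\mathbf{M}}
=
\underbrace{\begin{bmatrix} 23843.546\\ 513.979\\ 11875.702\\ 524.284\\ -220.553\\ 355.028 \end{bmatrix}}_{\mathbf{S}}
$$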

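Inversion of the J matrix thus enables us to write

$$
\mathbf{M} = \mathbf{J}^{-1}\mathbf{S} =
\begin{bmatrix}
0.0016849 & 0 & -0.0017266 & 0 & 0 & 0\\
0 & 0.0017267 & 0 & 0 & 0 & 0\\
-0.0017266 & 0 & 0.0034487 & 0 & 0 & 0\\
0 & 0 & 0 & 0.0016849 & 0 & -0.0017266\\
0 & 0 & 0 & 0 & 0.0017267 & 0\\
0 & 0 & 0 & -0.0017266 & 0 & 0.0034487
\end{bmatrix}
\begin{bmatrix} 23843.546\\ 513.979\\ 11875.702\\ 524.284\\ -220.553\\ 355.028 \end{bmatrix}
$$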
from which we find

$$
\begin{aligned}
\hat m &= 19.6699 \pm 0.0411\\
[\hat d] &= 0.8875 \pm 0.0416\\
[\hat h] &= -0.2125 \pm 0.0587\\
\hat e &= 0.2704 \pm 0.0411\\
\hat g_d &= -0.3808 \pm 0.0416\\
\hat g_h &= 0.3192 \pm 0.0587,
\end{aligned}
$$
the standard errors being obtained as the square roots of the entries
along the leading diagonal of the variance-covariance matrix $\mathbf{J}^{-1}$. All the
estimates are significant, and no parameter is therefore redundant.
The estimates allow us to calculate expectations for the mean chaeta numbers of S, W, F1 and F2 at each temperature, as shown in the last column of Table 48. We then square the differences between observed and expected means, multiply each squared difference by the corresponding weight and sum over all eight observations. This gives χ²[2] = 1.046, with 2 df because six parameters have been estimated from the eight observations. This χ²[2] has a probability lying between 0.7 and 0.5, indicating that, so far as these data go, the model is fully adequate to account for the observations: there are no grounds for suspecting complications such as non-allelic interaction.
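The weighted least squares fit above can be reproduced without any external library, as in the sketch below. It assumes the Table 48 coefficients and weights, and takes the S mean at 25°C as 20.677, the value implied by Table 41 and Table 46. The helper names `invert` and `weighted_fit` are illustrative. The sketch builds J = X'WX and S = X'Wy, inverts J by Gauss-Jordan elimination, and takes standard errors from the diagonal of J⁻¹. Because the inputs are means rounded to three decimals, the estimates and χ² should agree with those above only to within that rounding. The closing P value uses the fact that, for 2 df, the upper tail of χ² is exactly exp(-χ²/2).

```python
import math

# Table 48: coefficients of (m, [d], [h], e, gd, gh), weight, observed mean
ROWS = [
    ((1, 1, 0, 1, 1, 0), 144.783, 20.450),      # S  at 18 C
    ((1, 1, 0, -1, -1, 0), 144.783, 20.677),    # S  at 25 C
    ((1, -1, 0, 1, -1, 0), 144.783, 19.437),    # W  at 18 C
    ((1, -1, 0, -1, 1, 0), 144.783, 18.140),    # W  at 25 C
    ((1, 0, 1, 1, 0, 1), 290.360, 20.050),      # F1 at 18 C
    ((1, 0, 1, -1, 0, -1), 290.360, 18.877),    # F1 at 25 C
    ((1, 0, 0.5, 1, 0, 0.5), 29.463, 19.933),   # F2 at 18 C
    ((1, 0, 0.5, -1, 0, -0.5), 29.463, 18.960), # F2 at 25 C
]
NAMES = ["m", "[d]", "[h]", "e", "gd", "gh"]


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def weighted_fit(rows):
    """Weighted least squares: solve J M = S with J = X'WX and S = X'Wy."""
    k = len(rows[0][0])
    J = [[sum(w * x[i] * x[j] for x, w, _ in rows) for j in range(k)]
         for i in range(k)]
    S = [sum(w * x[i] * y for x, w, y in rows) for i in range(k)]
    J_inv = invert(J)                       # variance-covariance matrix
    est = [sum(J_inv[i][j] * S[j] for j in range(k)) for i in range(k)]
    se = [math.sqrt(J_inv[i][i]) for i in range(k)]
    expected = [sum(b * c for b, c in zip(est, x)) for x, _, _ in rows]
    chi2 = sum(w * (y - ex) ** 2 for (_, w, y), ex in zip(rows, expected))
    return est, se, expected, chi2, len(rows) - k


est, se, expected, chi2, df = weighted_fit(ROWS)
for name, b, s in zip(NAMES, est, se):
    print(f"{name:4s} = {b:8.4f} +/- {s:.4f}")
# For 2 df the upper-tail chi-square probability is exactly exp(-x/2).
print(f"chi2[{df}] = {chi2:.3f}, P = {math.exp(-chi2 / 2):.2f}")
```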
We should note, however, that a more sensitive test would have been
possible if more generations, notably the two back-crosses, had been
included in the experiment and if more replicates had been raised of the
F2 to reduce the variance of its mean.
Comparing the estimates of the parameters with their contributions
to the eight observations shows that:
(i) [d] is positive because S has a larger mean number of chaetae
than W.
(ii) [h] is negative because the F1 and F2 are nearer to W, the -[d]
parent, than to S, which has [d].
(iii) e is positive because the average chaeta number is higher at 18
than at 25°C.
(iv) gd is negative because the difference between S and W decreases
as the overall chaeta number rises from 25 to 18°C.
(v) gh has the opposite sign to [h] because dominance decreases as the
chaeta number rises from 25 to 18°C.
These points become clearer if we set out the analysis and its results in a different way, having first satisfied ourselves that the model is adequate and that it contains no redundant parameters. If we concentrate on the data from a single environment, we have no information about the effects of environmental change. The four observations from one environment can therefore be accounted for by estimating only m, [d] and [h], the estimates so obtained being of course applicable only to that environment. Proceeding in this way, one environment at a time, we obtain two estimates each of m, [d] and [h], thus:

          25°C       18°C       s.d.
m         19.3995    19.9403    ± 0.0581
[d]       1.2683     0.5067     ± 0.0588
[h]       -0.5316    0.1067     ± 0.0831
χ²[1]     0.934      0.112

There are two χ²'s, one from each environment, and each has 4 - 3 = 1 df. Neither is significant, and the model is thus adequate in both environments.
Now m in the combined analysis is a combination of the two m's from the separate environments, while e is a measure of the difference between the two separate m's. Similarly, the combined [d] is a compound of the two separate [d]'s, while gd depends on their difference. Likewise, the combined [h] is a compound of the two separate [h]'s, while gh depends on their difference. In the present case, where the variances of corresponding observations are equal in the two environments, m is in fact the simple average of m18 and m25, while e is half their difference. The other parameters follow the same pattern:

$$
\begin{aligned}
\hat m &= \tfrac12(m_{18} + m_{25}), & \hat e &= \tfrac12(m_{18} - m_{25}),\\
[\hat d] &= \tfrac12([d]_{18} + [d]_{25}), & \hat g_d &= \tfrac12([d]_{18} - [d]_{25}),\\
[\hat h] &= \tfrac12([h]_{18} + [h]_{25}), & \hat g_h &= \tfrac12([h]_{18} - [h]_{25}).
\end{aligned}
$$

The interpretation and implication of the six parameter estimates from the combined analysis of the two environments are now clear. [d] falls as m rises. Hence [d] and e are moving in opposite directions, and gd is thus negative. Similarly, while [h] is preponderantly negative, it is rising as e rises, and gh is thus positive. The present estimates also bring out a further point well. At 25°C [h] is significantly negative, giving a ratio [h]/[d] = -0.43. At 18°C [h] is positive but does not differ significantly from 0, although it obviously does differ significantly from [h]25. The ratio [h]/[d] = 0.19. Thus the dominance, or more precisely the potence, of the W genotype over the S changes markedly with the environment: the value of [h], as indeed that of [d] also, is not unconditional. This is of course another way of saying that the interaction between genotypes and environments affects dominance as well as additive variation.
One last point remains to be made about these results. The rate of change of gd on e is -0.3808/0.2704 = -1.4085, which agrees well with our estimate of -1.4299 obtained in the previous section from consideration of S and W alone. Since S departs from m by [d], its interaction with the temperature change will thus be -1.4085e. With W the deviation from m is -[d], and the interaction is thus -gd = 1.4085e. The rate of change of gh on e is 0.3192/0.2704 = 1.1804. Thus the reaction to temperature of the heterozygote is not only much nearer to that of W than to that of S: it in fact approaches that of W quite closely in value. Clearly W is dominant to S in its genotype × environment interaction. Indeed, its dominance in respect of the interaction is even greater than in respect of overall chaeta number.
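The relations between the single-environment and combined estimates, and the two rates of change just discussed, can be checked as follows. This is a minimal sketch. The single-environment values are those tabulated above, and the helper name `combine` is illustrative. Results should match the combined analysis to within the rounding of the inputs.

```python
# Single-environment estimates quoted above: (value at 25 C, value at 18 C)
separate = {
    "m": (19.3995, 19.9403),
    "[d]": (1.2683, 0.5067),
    "[h]": (-0.5316, 0.1067),
}


def combine(at25, at18):
    """Half the sum and half the difference (18 C minus 25 C) of two estimates."""
    return (at18 + at25) / 2, (at18 - at25) / 2


m, e = combine(*separate["m"])
d, g_d = combine(*separate["[d]"])
h, g_h = combine(*separate["[h]"])
print(f"m = {m:.4f}  e = {e:.4f}")
print(f"[d] = {d:.4f}  gd = {g_d:.4f}  gd/e = {g_d / e:.4f}")
print(f"[h] = {h:.4f}  gh = {g_h:.4f}  gh/e = {g_h / e:.4f}")
```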

[Figure 19: deviations from m of S, W, F1 and F2 (ordinate) plotted against e (abscissa) for the six environments.]
Fig. 19. The sensitivity diagram for sternopleural chaetae in the S and W
lines of Drosophila, together with their F1 and F2. F1 and F2 follow the
response pattern of W more than that of S, thus indicating the dominance
of the relevant genes in W.

This is well seen from Fig. 19, which is a sensitivity diagram similar to
that already presented for the parent lines in Fig. 17 but which now includes
F1 and F2 as well. Although only the reaction to the temperature
change was taken into account in the foregoing analysis, the figure shows
the behaviour in all six environments. F1 - m and F2 - m are set out for
all the environments in the lower part of Table 47, from which Fig. 19
has been drawn.
The regression lines for the two parents, F1 and F2 have been omitted from the figure for clarity, but the regression coefficients (b) are shown in the right-hand column of Table 47. The expectations of S - m, W - m, F1 - m and F2 - m are shown on the left of the table. From these it will be seen that the b's for S and W are the rates of change on e of e + gd and e - gd respectively, and those of F1 and F2 are the rates of change of e + gh and e + ½gh. Since the rate of change of e on e is obviously 1, the rate of change (i.e. the regression) of gh on e is bF1 - 1 = 1.836 - 1 = 0.836. We can then predict that the regression of ½gh on e will be 0.836 ÷ 2 = 0.418, which gives 0.418 + 1 = 1.418 as the expected regression for the F2. The observed regression is 1.603 which, as expected, is lower than that for the F1. It is higher than the expectation, but not significantly so.
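The F1 and F2 slopes and the predicted F2 slope can be reproduced from the lower rows of Table 47 as sketched below. The rounded table values are taken as given, and the helper name `slope` is illustrative. The prediction rests on the text's model, in which the F2 carries half of gh.

```python
# Table 47 (environments 1-6)
e = [0.4292, 0.2492, 0.1242, -0.1208, -0.1408, -0.5408]
f1_minus_m = [0.3042, 0.3342, 0.4842, -0.4558, -0.7458, -1.1958]
f2_minus_m = [0.5142, 0.1842, 0.0742, -0.2258, -0.9958, -0.9258]


def slope(x, y):
    """Least-squares regression coefficient of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    scp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return scp / sum((xi - mx) ** 2 for xi in x)


b_f1 = slope(e, f1_minus_m)        # rate of change of e + gh on e
b_f2 = slope(e, f2_minus_m)        # rate of change of e + gh/2 on e
b_gh = b_f1 - 1                    # regression of gh on e
b_f2_expected = 1 + b_gh / 2       # F2 carries half of gh
print(f"b(F1) = {b_f1:.3f}  b(gh) = {b_gh:.3f}")
print(f"b(F2) observed = {b_f2:.3f}, expected = {b_f2_expected:.3f}")
```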
The figure brings out very clearly, as do the regression coefficients, the close similarity of F1 and F2 to W in their patterns of sensitivity to environmental change: in fact, the W pattern shows a high degree of dominance over that of S. In average chaeta number, on the other hand, W is again dominant over S, but the dominance is less, with the F1 and F2 means departing more in the direction of S. This difference in dominance between sensitivity and mean chaeta number has a consequence. In environments 1, 2 and 3 (the three at 18°C), the F1 and F2 are very close to half-way between W and S in chaeta number, i.e. show little or no dominance. In environments 4, 5 and 6 (at 25°C), they are much closer to W than to S. The dominance relations of W and S will thus depend on the environment in which they are measured. The diagram therefore shows us both how and why the estimate of dominance can change, and change drastically, with alteration of the environment. It also shows the value of investigating sensitivity to environmental change as a character in its own right.

29. Variance of F2
So far we have been considering the situation where the environments are defined and hence distinguishable from one another. The expression of the different genotypes can then be observed in each environment, and the changes of expression can be related directly to the change from one environment to another. The analysis is thus essentially one of components of means. Frequently, however, the environments are not so definable or unambiguously distinguishable. Consider plants grown on distinct blocks in an experimental field. Their results can be handled by the methods we have been discussing, because although we cannot specify the chemical or physical differences between the environments of the different blocks, we can at least distinguish unambiguously the plants grown in block 1 from those grown in block 2 and so on. At the same time we would expect similar, if smaller, differences between the environments in different parts of a single block. Genotype × environment interaction must therefore be affecting the results from plants in the same block. These effects, however, cannot be assigned to environments identifiable as contrasting, in the way possible where the comparison is between blocks. Where this is the case we must proceed differently, relying on variances rather than means to recognize and analyse the interaction.
Let us consider a single gene difference on the one hand and a single environmental difference on the other. We assume that, whatever its genotype, each individual has an equal chance of occurring in each of the two types of environment. Since the environments are not unambiguously distinguishable from one another, this condition must in practice generally require that the individuals, irrespective of genotype, are distributed at random over the range of environments present. Table 49 sets out the phenotypes expected from the three genotypes in the two classes of environment, expressed as deviations from the mid-parent, m.

TABLE 49.
Contribution of g × e interaction to variances of parents, F1 and F2 over two environments

                  Environment 1       Environment 2       Overall mean   Variance
Parent 1   AA     d + e + gd          d - e - gd          d              (e + gd)²
F1         Aa     h + e + gh          h - e - gh          h              (e + gh)²
Parent 2   aa     -d + e - gd         -d - e + gd         -d             (e - gd)²
F2 mean           ½h + e + ½gh        ½h - e - ½gh        ½h             ½d² + ½gd² + ¼h² + ¼gh² + (e + ½gh)²

Taking the three genotypes individually, we see that their expressions, averaged over the two environments, are d, h and -d respectively. In other words, the means alone tell us nothing about the interaction. Their variances, however, do. In the absence of interaction the variances will all be equal, but with interaction present they will no longer be equal. They become (e + gd)², (e + gh)² and (e - gd)² for AA, Aa and aa respectively. This extends to any number of environmental differences over which the genotypes are distributed at random. Even where the different environments are not directly distinguishable, genotype × environment interaction can therefore be detected by differences in the variances of the phenotypes produced by the different genotypes. This is effectively the same test that we were using in Table 45, although in that case the environments were distinguishable from one another.
Turning to the F2, where the environments are distinguishable and each individual is hence assignable to its environment, the means of the F2 in the two environments differ by 2(e + ½gh). One of them is ½h + e + ½gh and the other ½h - e - ½gh. V1F2 also differs in the two environments. It can still be represented in the form ½D + ¼H, but the definitions of D and H change with the environment. In one environment the gene contributes (d + gd)² and (h + gh)² to D and H respectively, and in the other (d - gd)² and (h - gh)². We are in fact elaborating the simple definitions of d and h used in a single environment so as to take into account the interaction of the gene with its environment.
Where the environments are not distinguishable, we must take the mean and variance of the F2 as a whole. The contribution of the gene pair to the mean then becomes ½h, which is of course the average of the means in the two separate environments, ½h + e + ½gh and ½h - e - ½gh. As with parents and F1, the overall mean gives no information about the interaction. But again, as in the earlier case, the variance does reflect the interaction, being

$$V_{1F_2} = \tfrac12 d^2 + \tfrac12 g_d^2 + \tfrac14 h^2 + \tfrac14 g_h^2 + \big(e + \tfrac12 g_h\big)^2 .$$

Suppose we use the variances of the two parental homozygotes and their F1 to estimate non-heritable variation, as in earlier chapters, combining them in the F2 ratio itself, i.e. by finding ¼VAA + ½VAa + ¼Vaa. Our estimate becomes

$$\tfrac14(e + g_d)^2 + \tfrac12(e + g_h)^2 + \tfrac14(e - g_d)^2 = \tfrac12 g_d^2 + \tfrac14 g_h^2 + \big(e + \tfrac12 g_h\big)^2 .$$

Deducting this from V1F2 to estimate the gene's contribution to the heritable component of the F2 variance leaves us with

$$\mathrm{H}\,V_{1F_2} = \tfrac12 d^2 + \tfrac12 g_d^2 + \tfrac14 h^2 + \tfrac14 g_h^2 + \big(e + \tfrac12 g_h\big)^2 - \tfrac12 g_d^2 - \tfrac14 g_h^2 - \big(e + \tfrac12 g_h\big)^2 = \tfrac12 d^2 + \tfrac14 h^2 ,$$

which is the same as is found in the absence of genotype × environment interaction.
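The F2 variance entered in Table 49 can be checked by weighting each phenotype by its genotype frequency (¼, ½, ¼) and its environment frequency (½, ½), then subtracting the square of the overall mean, ½h. The expansion below is a worked step in the notation of this section:

$$
\begin{aligned}
V_{1F_2} &= \tfrac18\big[(d+e+g_d)^2+(d-e-g_d)^2\big] + \tfrac14\big[(h+e+g_h)^2+(h-e-g_h)^2\big]\\
&\quad + \tfrac18\big[(-d+e-g_d)^2+(-d-e+g_d)^2\big] - \big(\tfrac12 h\big)^2\\
&= \tfrac14\big[d^2+(e+g_d)^2\big] + \tfrac12\big[h^2+(e+g_h)^2\big] + \tfrac14\big[d^2+(e-g_d)^2\big] - \tfrac14 h^2\\
&= \tfrac12 d^2 + \tfrac12 g_d^2 + \tfrac14 h^2 + \tfrac14 g_h^2 + \big(e+\tfrac12 g_h\big)^2 .
\end{aligned}
$$

Every term except ½d² + ¼h² also appears in ¼VAA + ½VAa + ¼Vaa, which is why the subtraction leaves the heritable component unaffected by the interaction.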
The result is not difficult to generalize for more than two environ-
160 Interaction of genotype and environment
ments. Consider t environments 1 to t, among which parents, F 1 and F 2
are distributed at random, the probability of any individual falling into
a given environment being I It i.e. equal for all environments. (Note that
if one type of environment is more common than another, it can be
accommodated in the formulation by letting an appropriate number of
the t environments all be of this kind.) Each environment has its own e,
gd and gh' those in environment 1 being el , gdl and ghl etc., where See)
= 0, S(gd) = 0 and S(gh) = O. The phenotypes of parents and Fl' and the
mean phenotype of F2 in each environment will be as set out in Table 50.
TABLE 50.
Variances of parents, F 1 and F2 over t environments

Environment Overall
1· ........... t Mean Variance
Parent 1 AA d + el + gdl· ... d + et + gdt d see + g~2
Aa h + el + ghl .... h + et + ght h see + g~2
Parent 2 aa -d+el-gdl·· -d+et-gdt -d see _g~2

!h + el + tghl . ·!h + et + !ght th !d 2+ !Sgi + ih 2


+ iSgh2 + see + tg~2

Then taken over all environments, the means of the parents are $d$ and $-d$
respectively, that of F1 is $h$ and that of F2 is $\tfrac12 h$. The variance of the AA
parent will be $S(e + g_d)^2$, which will also be $V(e + g_d)$ since, with each
environment carrying $1/t$ of the individuals, the SS will also be the MS.
The variances of the other parent, F1 and F2 are similarly shown in the
right-hand column of the table. Now when the parental and F1 variances
are combined in the F2 proportions they give $\tfrac12 S(g_d^2) + \tfrac14 S(g_h^2) + S(e + \tfrac12 g_h)^2$,
and subtracting this from $V_{1F2}$ gives the heritable component due
to the gene A-a as
$${}_HV_{1F2} = \tfrac12 d^2 + \tfrac12 S(g_d^2) + \tfrac14 h^2 + \tfrac14 S(g_h^2) + S(e + \tfrac12 g_h)^2 - \tfrac12 S(g_d^2) - \tfrac14 S(g_h^2) - S(e + \tfrac12 g_h)^2 = \tfrac12 d^2 + \tfrac14 h^2,$$
just as we found earlier in the simpler case of two environments.
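A similar sketch for t environments (again our own illustration; `centred`, `variance` and `heritable_f2_component` are hypothetical names) draws arbitrary environmental effects, shifts them so that $S(e) = S(g_d) = S(g_h) = 0$ as the text requires, and confirms that the deduction still leaves exactly $\tfrac12 d^2 + \tfrac14 h^2$. In this sketch it is the centring of $g_d$ and $g_h$ that makes the cancellation exact; without it, terms in their means would survive the subtraction.

```python
import random
from fractions import Fraction as F

def centred(xs):
    """Shift a list of effects so that they sum to zero."""
    m = F(sum(xs), len(xs))
    return [F(x) - m for x in xs]

def variance(pairs):
    """Variance of a discrete distribution given (value, probability) pairs."""
    mean = sum(x * p for x, p in pairs)
    return sum(p * (x - mean) ** 2 for x, p in pairs)

def heritable_f2_component(d, h, e, gd, gh):
    """V1F2 minus (1/4 V_AA + 1/2 V_Aa + 1/4 V_aa) over t equally frequent environments."""
    t = len(e)
    w = F(1, t)  # each environment carries 1/t of the individuals
    AA = [(d + e[i] + gd[i], w) for i in range(t)]
    Aa = [(h + e[i] + gh[i], w) for i in range(t)]
    aa = [(-d + e[i] - gd[i], w) for i in range(t)]
    f2 = ([(x, p / 4) for x, p in AA] + [(x, p / 2) for x, p in Aa]
          + [(x, p / 4) for x, p in aa])
    return variance(f2) - (variance(AA) / 4 + variance(Aa) / 2 + variance(aa) / 4)

rng = random.Random(1)
t = 5
e, gd, gh = (centred([rng.randint(-9, 9) for _ in range(t)]) for _ in range(3))
d, h = F(4), F(3)
print(heritable_f2_component(d, h, e, gd, gh))  # 41/4, i.e. 1/2 * 4**2 + 1/4 * 3**2
```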
The extension to more than one gene difference, however, brings in a
new problem. This is simply illustrated by the case of two gene differences,
A-a and B-b, in two environments. It is easy to show that the
variances of the four possible homozygotes will be
$$V_{AABB} = (e + g_{da} + g_{db})^2;\qquad V_{AAbb} = (e + g_{da} - g_{db})^2;$$
$$V_{aaBB} = (e - g_{da} + g_{db})^2;\qquad V_{aabb} = (e - g_{da} - g_{db})^2.$$
Thus if we use AABB and aabb as the parents from whose cross the F2 is
raised, the average of their variances will clearly be $(g_{da} + g_{db})^2 + e^2$,
while with the alternative pair of parents, AAbb and aaBB, it will be
$(g_{da} - g_{db})^2 + e^2$. The variance of F1 will be $(e + g_{ha} + g_{hb})^2$ in both cases,
so combining parents and F1 variances in the F2 proportions will give
$$\tfrac12(g_{da} + g_{db})^2 + \tfrac14(g_{ha} + g_{hb})^2 + (e + \tfrac12 g_{ha} + \tfrac12 g_{hb})^2$$
with the cross AABB × aabb, and
$$\tfrac12(g_{da} - g_{db})^2 + \tfrac14(g_{ha} + g_{hb})^2 + (e + \tfrac12 g_{ha} + \tfrac12 g_{hb})^2$$
with the cross AAbb × aaBB.
Now in the absence of linkage the composition of the F2 will be the same
from both crosses, and its variance in respect of these two gene pairs will
be
$$V_{1F2} = \tfrac12 d_a^2 + \tfrac12 d_b^2 + \tfrac12 g_{da}^2 + \tfrac12 g_{db}^2 + \tfrac14 h_a^2 + \tfrac14 h_b^2 + \tfrac14 g_{ha}^2 + \tfrac14 g_{hb}^2 + (e + \tfrac12 g_{ha} + \tfrac12 g_{hb})^2,$$
which, after deducting the variances of parents and F1 combined in F2
proportions, leaves
$${}_HV_{1F2} = \tfrac12 d_a^2 + \tfrac12 d_b^2 \mp g_{da}g_{db} + \tfrac14 h_a^2 + \tfrac14 h_b^2 - \tfrac12 g_{ha}g_{hb},$$
the term in $g_{da}g_{db}$ being negative where the cross was AABB × aabb and
positive where it was AAbb × aaBB. The estimate of the basic genetical
component of the variation is thus not free from the effects of the environmental
interaction where two or more genes are involved. These
residual effects depend on cross-product terms of the kinds $g_{da}g_{db}$ and
$g_{ha}g_{hb}$, and as the number of genes rises the number of such terms rises
relative to the number of squared terms of the kinds $g_{da}^2$, $g_{db}^2$, $g_{ha}^2$, $g_{hb}^2$
which are eliminated. The residual effects are therefore prospectively
the more troublesome as the number of genes in the system increases.
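The residual cross-product terms can be exhibited in the same way. In this sketch (our own; the function names are hypothetical) two unlinked genes are scored 2, 1, 0 for the homozygote, heterozygote and other homozygote, both environments are enumerated exactly, and the result is compared with the expression for ${}_HV_{1F2}$ given above for each arrangement of the genes in the parents.

```python
from fractions import Fraction as F
from itertools import product

F2_FREQ = {2: F(1, 4), 1: F(1, 2), 0: F(1, 4)}  # AA, Aa, aa in an F2

def _var(pairs):
    """Variance of a discrete distribution of (value, probability) pairs."""
    mean = sum(x * p for x, p in pairs)
    return sum(p * (x - mean) ** 2 for x, p in pairs)

def two_gene_residual(da, db, ha, hb, e, gda, gdb, gha, ghb, coupling=True):
    """Heritable F2 component left after deducting parental and F1 variances.

    Environment 1 adds e and the interaction terms, environment 2 subtracts them.
    coupling=True means parents AABB x aabb; False means AAbb x aaBB.
    """
    def effect(n, d, h, gd, gh, s):
        return {2: d + s * gd, 1: h + s * gh, 0: -d - s * gd}[n]

    def phenotype(na, nb, s):
        return s * e + effect(na, da, ha, gda, gha, s) + effect(nb, db, hb, gdb, ghb, s)

    def genotype_var(na, nb):
        return _var([(phenotype(na, nb, s), F(1, 2)) for s in (1, -1)])

    f2 = [(phenotype(na, nb, s), F2_FREQ[na] * F2_FREQ[nb] / 2)
          for na, nb, s in product((2, 1, 0), (2, 1, 0), (1, -1))]
    p1, p2 = ((2, 2), (0, 0)) if coupling else ((2, 0), (0, 2))
    correction = (genotype_var(*p1) + genotype_var(*p2)) / 4 + genotype_var(1, 1) / 2
    return _var(f2) - correction

def expected_residual(da, db, ha, hb, gda, gdb, gha, ghb, coupling=True):
    """1/2 da^2 + 1/2 db^2 -/+ gda gdb + 1/4 ha^2 + 1/4 hb^2 - 1/2 gha ghb."""
    sign = -1 if coupling else 1
    return (F(1, 2) * (da**2 + db**2) + sign * gda * gdb
            + F(1, 4) * (ha**2 + hb**2) - F(1, 2) * gha * ghb)

vals = dict(da=F(3), db=F(2), ha=F(1), hb=F(-1), gda=F(1), gdb=F(1, 2), gha=F(-1), ghb=F(2))
for coupling in (True, False):
    got = two_gene_residual(e=F(2), coupling=coupling, **vals)
    print(coupling, got, got == expected_residual(coupling=coupling, **vals))
# True 15/2 True
# False 17/2 True
```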
In the case of the $g_d$ terms the residual effects could be eliminated if
all the homozygotes (four with two gene pairs) were available for their
variances to be compounded in finding the correction to be deducted
from $V_{1F2}$, but this will seldom be possible. The signs of the terms in
$g_{da}g_{db}$ will, however, depend not only on the intrinsic signs of the individual
$g_d$'s but also on whether the relevant genes are associated or dispersed
in the parental homozygotes. If the genes are suitably dispersed between
the parents, the net result could be that on summing over all pairs of
gene differences the aggregate $S(g_{da}g_{db})$ was negligible. The estimate of
$D = S(d^2)$ would then not be greatly affected by the covariance of the
interactions. The sign of the terms in $g_{ha}g_{hb}$, on the other hand, depends
only on the intrinsic signs of the individual $g_h$'s. Unless therefore there
is an approach to equality in the number of positive and negative $g_h$'s,
the aggregate $S(g_{ha}g_{hb})$ cannot be expected to become negligible.
Similar terms in $S(g_{da}g_{db})$ and $S(g_{ha}g_{hb})$ will be associated with the
contributions made by $D = S(d^2)$ and $H = S(h^2)$ respectively in the variances
derived from later generations such as F3. The relative size of the
contributions made by these terms will depend not only on the variance in
question, whether for example it is $V_{1F3}$ or $V_{2F3}$, but also on the detailed
design of the experiment from which the variances are estimated. The
presence of genotype × environment interaction is, however, always
liable to introduce bias into the estimates of D and H, the amount of
bias depending on the extent to which the different $g_{da}g_{db}$ and $g_{ha}g_{hb}$ items
balance out in $S(g_{da}g_{db})$ and $S(g_{ha}g_{hb})$ respectively. Thus, wherever
differences in the variances of the two parental lines and the F1 suggest
sizeable interaction components of variation, we must treat the estimates of
D and H with corresponding caution.
Randomly breeding populations
30. The components of variation
So far we have been concerned with the analysis of data obtained from
true-breeding lines and the descendants of crosses made between them.
Following such a cross, a multiplicity of generations and types of fam-
ily can be raised experimentally - a multiplicity limited only by the bio-
logical properties of the material (whether, for example, it can be selfed
as well as crossed, whether individuals can be kept alive for crossing to
their own offspring and so on) and by the time and facilities available
for the experimental programme. Each generation and type of family
will have its own mean and variance, and its own covariances with other
related families. Thus a large number of statistics can be obtained from
which we can estimate the genetical and environmental components of
both means and variances. The specification of these components of
variation is simpler because by starting with true-breeding lines we can,
in the absence of selective elimination, specify the relative frequencies
of the types of zygotes and gametes that we expect in and from any
given type of family.
When however we turn from the descendants of crosses among true-
breeding lines to consider genetically heterogeneous populations of un-
specified constitution, not only is the situation more complex, but the
range of statistics available from the populations is commonly more
limited. We can of course ascertain the mean and variance of the popu-
lation itself; but given that it is in equilibrium and that non-heritable
effects are not changing, these will be the same within sampling vari-
ation from one generation to the next. We can also compare the vari-
ation within families raised from pairs of parents with the variation be-
tween them, and we can look at the covariation between individuals of
different genetical relationships, such as parents and offspring, full-sibs,
half-sibs, first cousins and so on, provided we can recognize individuals
with these relationships. Our analysis will thus depend on differences in
second degree statistics rather than means and we shall not in general
have the direct estimates of non-heritable variation that are provided by
the variation of homozygous lines and their F1s in the experiments we
have discussed in earlier chapters.
Let us consider the gene pair A-a in a population in which mating is at
random, the frequency of allele A being $u_a$ and that of allele a being
$v_a = 1 - u_a$. The incidence of the three genotypes in respect of this gene pair
will then be AA $u_a^2$; Aa $2u_a v_a$; aa $v_a^2$. AA and aa deviate by $d_a$ and $-d_a$
respectively from the mid-parent, and Aa by $h_a$. Then in respect of this
gene pair, the population mean will be
$u_a^2 d_a + 2u_a v_a h_a - v_a^2 d_a = (u_a - v_a)d_a + 2u_a v_a h_a$.
The contribution of A-a to the variance of the population will thus be
$$u_a^2 d_a^2 + 2u_a v_a h_a^2 + v_a^2 d_a^2 - [(u_a - v_a)d_a + 2u_a v_a h_a]^2,$$
which reduces to
$$2u_a v_a[d_a^2 + 2(v_a - u_a)d_a h_a + (1 - 2u_a v_a)h_a^2],$$
which in its turn, since $1 - 2u_a v_a = (v_a - u_a)^2 + 2u_a v_a$, can be rewritten as
$$2u_a v_a[d_a^2 + 2(v_a - u_a)d_a h_a + (v_a - u_a)^2 h_a^2 + 2u_a v_a h_a^2] = 2u_a v_a[d_a + (v_a - u_a)h_a]^2 + 4u_a^2 v_a^2 h_a^2.$$
Where the genes are independent in their action and uncorrelated in
their distribution within the population, the total heritable variation will
be the sum of a series of such terms, one from each gene pair, namely
$$V_R = S\,2uv[d + (v - u)h]^2 + S\,4u^2v^2h^2.$$
If we now put $D_R = S\,4uv[d + (v - u)h]^2$ and $H_R = S\,16u^2v^2h^2$, the
heritable variance becomes $\tfrac12 D_R + \tfrac14 H_R$, and apart from sampling
variations this heritable variance will be constant from one generation to
another. We have already met these expressions for $D_R$ and $H_R$ earlier,
when we were discussing undefined diallels in Section 17.
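The single-locus reduction can be checked by direct enumeration. A minimal sketch, assuming random mating and using exact rational arithmetic; `locus_moments` and `dr_hr_contributions` are illustrative names, not taken from the text.

```python
from fractions import Fraction as F

def locus_moments(u, d, h):
    """Mean and variance contributed by one gene pair under random mating."""
    v = 1 - u
    freq = {"AA": u * u, "Aa": 2 * u * v, "aa": v * v}  # random-mating proportions
    value = {"AA": d, "Aa": h, "aa": -d}                # deviations from the mid-parent
    mean = sum(freq[g] * value[g] for g in freq)
    variance = sum(freq[g] * value[g] ** 2 for g in freq) - mean ** 2
    return mean, variance

def dr_hr_contributions(u, d, h):
    """This gene pair's terms in D_R = S 4uv[d + (v-u)h]^2 and H_R = S 16u^2v^2h^2."""
    v = 1 - u
    return 4 * u * v * (d + (v - u) * h) ** 2, 16 * u**2 * v**2 * h**2

u, d, h = F(1, 5), F(2), F(1)
mean, variance = locus_moments(u, d, h)
D, H = dr_hr_contributions(u, d, h)
print(mean, variance, D / 2 + H / 4)  # the last two values agree
```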
Where $u = v = \tfrac12$ for all genes, as in the F2 of a cross between two
true-breeding parental lines, these expressions for $D_R$ and $H_R$ reduce to $S(d^2)$
and $S(h^2)$, and the heritable variance itself becomes $\tfrac12 D + \tfrac14 H$, as already
found for $V_{1F2}$. This is indeed as it should be, since an F2 can be regarded
as the special case of a population where necessarily $u = v = \tfrac12$. It will
thus be seen that if and only if $u = v$ will the contributions made to the
heritable variance by d and h be capable of complete separation. Where
$u \neq v$, $D_R$ will always be affected by h, and $H_R$ will be correspondingly
less than the sum of $h^2$. $D_R$ will be greater than $S(d^2)$ where $S(v - u)h$ is
positive, which will happen when the dominant genes are in general rarer
than their recessive alleles. Equally $D_R$ will be less than $D = S(d^2)$ where
$S(v - u)h$ is negative, that is where the dominant genes are in general
commoner than their recessive alleles (Fig. 20). In fact if in general $h > d$,
$D_R$ will become 0 where $u = (d + h)/2h$.
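To see how the contributions behave, the sketch below (ours; the helper names `contributions` and `dr_zero` are hypothetical) tabulates the contributions of one gene pair to $D_R$ and $H_R$ across gene frequencies for $d = h = 1$, the case plotted in Fig. 20, and locates the frequency at which the $D_R$ contribution vanishes when $h > d$.

```python
def contributions(u, d=1.0, h=1.0):
    """Contributions of one gene pair to D_R and H_R at frequency u of the dominant allele."""
    v = 1.0 - u
    dr = 4 * u * v * (d + (v - u) * h) ** 2
    hr = 16 * (u * v * h) ** 2
    return dr, hr

def dr_zero(d, h):
    """Frequency at which d + (v - u)h = 0, i.e. u = (d + h)/2h (at most 1 only if h >= d)."""
    return (d + h) / (2 * h)

for i in range(11):
    u = i / 10
    dr, hr = contributions(u)
    print(f"u = {u:.1f}   D_R term = {dr:.4f}   H_R term = {hr:.4f}")

print(dr_zero(0.5, 1.0))  # 0.75: with d = 0.5, h = 1 the D_R term vanishes here
```

For $d = h = 1$ the $D_R$ term reduces to $16u(1 - u)^3$, which peaks at $u = 0.25$ with value 1.6875, while the $H_R$ term peaks at $u = 0.5$ with value 1.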

[Figure: contribution of a gene pair to D_R and H_R (vertical axis, 0 to 1.5) plotted against gene frequency u (horizontal axis, 0 to 1.0)]

Fig. 20. Change in the contribution made by a gene pair to $D_R$ and $H_R$
according to u, the frequency of the dominant allele, in a randomly breeding
population, where d = h = 1.

The value of DR thus depends not only on the effects of the various genes
of the system when in the homozygous state, which we denote by d, but
also on h, their effects when heterozygous, and on the allele frequencies
u and v. Only when either $h = 0$ or $u = v = \tfrac12$ (or of course when both
conditions are satisfied) does $D_R = D = S(d^2)$. Thus $D_R$ is not in general
the additive variation as we have defined and used this term in the earlier
chapters.
It is nevertheless frequently referred to as such. As so used it is the
additive variation only in a statistical sense, and not in the genetical
sense that we have adopted. Unlike D it is not a direct measure of the
variation that is genetically fixable, and so cannot be taken as a certain
guide to the innate capacity of the population for permanent genetical
change by selection or other means of genetical manipulation.

TABLE 51.
The pair matings in a randomly breeding population in respect of
a single gene difference
(each cell gives the frequency of the mating, then the mean and variance of its progeny)

                                     Female parents
Male parents               AA ($u^2$)            Aa ($2uv$)                      aa ($v^2$)
AA ($u^2$)    frequency    $u^4$                 $2u^3v$                         $u^2v^2$
              mean         $d$                   $\tfrac12(d + h)$               $h$
              variance     $0$                   $\tfrac14(d - h)^2$             $0$
Aa ($2uv$)    frequency    $2u^3v$               $4u^2v^2$                       $2uv^3$
              mean         $\tfrac12(d + h)$     $\tfrac12 h$                    $\tfrac12(h - d)$
              variance     $\tfrac14(d - h)^2$   $\tfrac12 d^2 + \tfrac14 h^2$   $\tfrac14(d + h)^2$
aa ($v^2$)    frequency    $u^2v^2$              $2uv^3$                         $v^4$
              mean         $h$                   $\tfrac12(h - d)$               $-d$
              variance     $0$                   $\tfrac14(d + h)^2$             $0$

Overall mean: $(u - v)d + 2uvh$
If a population is composed of a series of families, each the progeny
of a pair of individuals, the variation of the population may be sub-
divided into variation within families and variation between them. Where
mates come together at random in relation to their genotypes, there are
nine possible types of mating in respect of a single gene difference (Table
51). In respect of the parental and progeny genotypes, these nine types
of mating fall into six classes, which are recognizable as equivalent to the
two parental, F1, F2 and two back-cross families, whose means and
variances are already known from Chapter 3.
The mean variance within these families is found directly from Table
51 by summing the products of the frequencies of the families and their
variances, to give
$$u_a^4 \cdot 0 + 4u_a^3 v_a \cdot \tfrac14(d_a - h_a)^2 + 2u_a^2 v_a^2 \cdot 0 + 4u_a^2 v_a^2(\tfrac12 d_a^2 + \tfrac14 h_a^2) + 4u_a v_a^3 \cdot \tfrac14(d_a + h_a)^2 + v_a^4 \cdot 0,$$
which reduces to
$$u_a v_a d_a^2 + 2u_a v_a(v_a - u_a)d_a h_a + (u_a v_a - u_a^2 v_a^2)h_a^2 = u_a v_a[d_a + (v_a - u_a)h_a]^2 + 3u_a^2 v_a^2 h_a^2.$$
On summing over all relevant genes this yields $\tfrac14 D_R + \tfrac3{16} H_R$.
The variance of family means, measuring the variation between families,
is similarly found from the frequencies and means of the types of
family, as
$$u_a^4 d_a^2 + 4u_a^3 v_a(\tfrac12 d_a + \tfrac12 h_a)^2 + 2u_a^2 v_a^2 h_a^2 + 4u_a^2 v_a^2(\tfrac12 h_a)^2 + 4u_a v_a^3(-\tfrac12 d_a + \tfrac12 h_a)^2 + v_a^4 d_a^2 - [(u_a - v_a)d_a + 2u_a v_a h_a]^2,$$
the last term being the correction for the overall mean. This reduces to
$$u_a v_a[d_a + (v_a - u_a)h_a]^2 + u_a^2 v_a^2 h_a^2.$$
Summing over all relevant genes we obtain $\tfrac14 D_R + \tfrac1{16} H_R$.
These two variances sum to give $\tfrac12 D_R + \tfrac14 H_R$, the total heritable
variance of the population, as obviously they must. In the special case of
$u = v = \tfrac12$, where $D_R$ becomes D and $H_R$ becomes H, the two variances
become respectively $\tfrac14 D + \tfrac3{16} H$ and $\tfrac14 D + \tfrac1{16} H$, which we have already
found for $V_{2S3}$ and $V_{1S3}$. Thus such families within a population may be
regarded as the general case of biparental families obtained from an F2,
just as the population itself corresponds to the general case of the F2.
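Table 51 can be turned into a short calculation. The sketch below (our own illustration, with hypothetical function names) enumerates the nine ordered matings, derives each family's progeny distribution by Mendelian segregation, and checks the two reduced expressions for the mean variance within families and the variance of family means.

```python
from fractions import Fraction as F
from itertools import product

GENOTYPES = ("AA", "Aa", "aa")
P_A = {"AA": F(1), "Aa": F(1, 2), "aa": F(0)}  # chance that a gamete carries A

def progeny(g1, g2):
    """Mendelian progeny distribution of the mating g1 x g2."""
    p, q = P_A[g1], P_A[g2]
    return {"AA": p * q, "Aa": p * (1 - q) + (1 - p) * q, "aa": (1 - p) * (1 - q)}

def full_sib_components(u, d, h):
    """Mean variance within, and variance of means of, full-sib families."""
    v = 1 - u
    pop = {"AA": u * u, "Aa": 2 * u * v, "aa": v * v}
    value = {"AA": d, "Aa": h, "aa": -d}
    within, families = F(0), []
    for g1, g2 in product(GENOTYPES, repeat=2):  # the nine ordered matings
        freq = pop[g1] * pop[g2]                  # mates chosen at random
        kids = progeny(g1, g2)
        m = sum(kids[g] * value[g] for g in GENOTYPES)
        var = sum(kids[g] * value[g] ** 2 for g in GENOTYPES) - m ** 2
        within += freq * var
        families.append((freq, m))
    grand = sum(f * m for f, m in families)
    between = sum(f * (m - grand) ** 2 for f, m in families)
    return within, between

u, d, h = F(1, 3), F(2), F(1)
v = 1 - u
alpha = d + (v - u) * h
within, between = full_sib_components(u, d, h)
assert within == u * v * alpha**2 + 3 * u**2 * v**2 * h**2
assert between == u * v * alpha**2 + u**2 * v**2 * h**2
```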
The members of a single family are distinguishable in the population
as full-sibs. The covariance of such full-sibs may be obtained directly,
but it is simpler to note that where a population is divided into groups
of like status, such as our families of full-sibs, the mean covariance of
two members of the same group can be shown to equal the variance of
the group means. We can therefore immediately write down the covariance
of full-sibs taken over the population as a whole as $\tfrac14 D_R + \tfrac1{16} H_R$.
Where the mating system of a population is such that any parent may
leave a number of offspring, the second parent of which is however
prospectively different for each of them, this second parent being drawn
at random from the population, full-sibs will be rare but groups with one
common parent, and composed therefore of what are termed half-sibs,
may be recognized. In such a case the second parents may be regarded as
providing a set of gametes having the population frequencies of A and a,
namely $u_a$ and $v_a$. The properties of these families will thus be as shown
in Table 52. The contributions of A-a to the mean variance of the single-parent
families and to the variance of their means are given at the foot of
the table, and it will be seen that on summing over the relevant gene
differences, the heritable portion of mean variance becomes $\tfrac38 D_R + \tfrac14 H_R$ and
that of the variance of the family means becomes $\tfrac18 D_R$. These two
variances sum to $\tfrac12 D_R + \tfrac14 H_R$, the heritable variance of the population, as
indeed they clearly should. The covariance of the half-sibs, of which these
families are composed, will of course be the same as the variance of the
family means, namely $\tfrac18 D_R$.

TABLE 52.
Families of individuals having one parent in common, and thus composed
of half-sibs (HS), in a randomly breeding population in
respect of a single gene difference
[Note: since the second parents of the progeny of any common parent
are drawn at random from the population, they may be
regarded as providing an array of $uA + va$ gametes for
fusion with those of the common parent]

                              Common parent
                              AA                Aa                                  aa
Frequency in population       $u^2$             $2uv$                               $v^2$
Phenotype                     $d$               $h$                                 $-d$

Progeny (phenotype)           Frequency in family
AA ($d$)                      $u$               $\tfrac12 u$                        $0$
Aa ($h$)                      $v$               $\tfrac12(u + v)$                   $u$
aa ($-d$)                     $0$               $\tfrac12 v$                        $v$

Family mean                   $ud + vh$         $\tfrac12[(u - v)d + h]$            $uh - vd$
Family variance               $uv(d - h)^2$     $2uvd^2 + \tfrac14[(v - u)d + h]^2$  $uv(d + h)^2$

Mean variance $= \tfrac32 uv[d + (v - u)h]^2 + 4u^2v^2h^2 \;\rightarrow\; V_{2HSR} = \tfrac38 D_R + \tfrac14 H_R$
Variance of means $= \tfrac12 uv[d + (v - u)h]^2 \;\rightarrow\; V_{1HSR} = W_{HSR} = \tfrac18 D_R$ (= covariance of half-sibs)
Covariance of parent and offspring $= uv[d + (v - u)h]^2 \;\rightarrow\; W_{POR} = \tfrac14 D_R$
One further statistic may be found from Table 52. The covariance of
a single parent and its offspring is found as the covariance of the common
parent and the mean of its offspring as set out in the table. This is
clearly
$$u_a^2 d_a[u_a d_a + v_a h_a] + u_a v_a h_a[(u_a - v_a)d_a + h_a] + v_a^2 d_a[v_a d_a - u_a h_a] - [(u_a - v_a)d_a + 2u_a v_a h_a]^2 = u_a v_a[d_a + (v_a - u_a)h_a]^2,$$
the correction term being the square of the population mean, since this
is the mean of all the parents as well as the mean of all their progeny.
Summing over all the relevant gene differences then shows the covariance
of parent and offspring to be $\tfrac14 D_R$.
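The half-sib results of Table 52 can be checked in the same way, treating the random second parents as a gamete pool with frequencies u and v, as the table's note describes. Again this is a sketch with a hypothetical function name; the parent-offspring covariance is computed as the covariance of each common parent with its family mean.

```python
from fractions import Fraction as F

def half_sib_components(u, d, h):
    """Mean variance, variance of means and parent-offspring covariance (Table 52)."""
    v = 1 - u
    pop = {"AA": u * u, "Aa": 2 * u * v, "aa": v * v}
    value = {"AA": d, "Aa": h, "aa": -d}
    p_A = {"AA": F(1), "Aa": F(1, 2), "aa": F(0)}
    rows = []
    for g, freq in pop.items():
        p = p_A[g]
        # Random second parents act as a gamete pool: A with chance u, a with chance v.
        kids = {"AA": p * u, "Aa": p * v + (1 - p) * u, "aa": (1 - p) * v}
        m = sum(kids[k] * value[k] for k in kids)
        var = sum(kids[k] * value[k] ** 2 for k in kids) - m ** 2
        rows.append((freq, value[g], m, var))
    grand = sum(f * m for f, _, m, _ in rows)  # equals the population mean
    mean_var = sum(f * var for f, _, _, var in rows)
    var_means = sum(f * (m - grand) ** 2 for f, _, m, _ in rows)
    cov_po = sum(f * x * m for f, x, m, _ in rows) - grand ** 2
    return mean_var, var_means, cov_po

print(half_sib_components(F(1, 3), F(2), F(1)))
# (Fraction(163, 81), Fraction(49, 81), Fraction(98, 81))
```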
All that remains to complete these formulations of different variances
and covariances derivable from the population is to add in the appropriate
items for non-heritable and sampling variation. Here, as in our earlier
consideration (Section 12), we must distinguish between the non-heritable
variation among members of the same family and that between
families. If we denote by $E_w$ the non-heritable variance within
families, the mean variances of full-sib and half-sib families become
respectively $\tfrac14 D_R + \tfrac3{16} H_R + E_w$ and $\tfrac38 D_R + \tfrac14 H_R + E_w$. Where $E_b$ is the
non-heritable variance between families, the variances of family means
must obviously include $E_b$. They will, however, also include an item for
sampling variation, which will of course be $\tfrac1n V$, where $V$ denotes the
relevant mean variance and $n$ is the number of individuals in the family,
or the harmonic mean of these numbers if they vary from one family to
another. Thus if $V_{1SR}$ and $V_{2SR}$ stand for the variance of the means and
the mean variance of full-sib (S) families as observed in a randomly breeding
population,
$$V_{1SR} = \tfrac14 D_R + \tfrac1{16} H_R + E_b + \tfrac1n V_{2SR}$$
and
$$V_{2SR} = \tfrac14 D_R + \tfrac3{16} H_R + E_w.$$
Similarly for half-sib families, denoted by the inclusion in the suffix of
HS in place of S standing for full-sibs,
$$V_{1HSR} = \tfrac18 D_R + E_b + \tfrac1n V_{2HSR}$$
and
$$V_{2HSR} = \tfrac38 D_R + \tfrac14 H_R + E_w.$$
Now if the individuals from a family are distributed independently of
one another across the range of the environments throughout their lives,
there will be no cause of non-heritable variation between families
additional to those within, and $E_b = 0$. But if families remain together,
perhaps also enjoying parental attention as in many animal species, or
being endowed by the mother with nutritional resources on which to
draw during early life as happens in both plants and animals, there will
be non-heritable differences between families which go beyond those
within: $E_b$ is then $> 0$ and will be reflected by a corresponding increase
in the variance of family means. Furthermore, since members of the
same family will share the same environment in respect of such family
effects while members of different families will not, their covariance
will reflect $E_b$ also, whether they are sibs or half-sibs. An $E_b$ component
must therefore also be included in these covariances, which thus become
$$W_{SR} = \tfrac14 D_R + \tfrac1{16} H_R + E_b$$
and
$$W_{HSR} = \tfrac18 D_R + E_b.$$
And if offspring in some measure share the environment of their parents,
the same will be true to a corresponding extent of the parent/offspring
covariance, which must thus be written as
$$W_{POR} = \tfrac14 D_R + E'_b,$$
the prime indicating that the non-heritable effects common to parent
and offspring may not be just the same as those shown by members of
the same progeny.
These various results are collected together in Table 53. Two points
remain to be made about them. First, where nutritional resources for
early life are provided by the mother, or where parental attention is
provided and it is not the same from mother and father, the $E_b$ component
in the covariance of half-sibs will be different according to whether
the common parent is mother or father. Secondly, the non-heritable variance
of the population as a whole will be $E_w + E_b$, since each individual
will reflect both effects. Where $E_b = 0$ this non-heritable component of
$V_R$ will of course reduce to $E_w$, which will obviously cover all the
environmental differences among the individuals of the population.

TABLE 53.
Composition of variances and covariances in a randomly breeding population

Relationship                Statistic
Full-sib families           $V_{1SR} = \tfrac14 D_R + \tfrac1{16} H_R + E_b + \tfrac1n V_{2SR}$
(both parents common)       $V_{2SR} = \tfrac14 D_R + \tfrac3{16} H_R + E_w$
                            $W_{SR} = \tfrac14 D_R + \tfrac1{16} H_R + E_b$
Half-sib families           $V_{1HSR} = \tfrac18 D_R + E_b + \tfrac1n V_{2HSR}$
(one parent common)         $V_{2HSR} = \tfrac38 D_R + \tfrac14 H_R + E_w$
                            $W_{HSR} = \tfrac18 D_R + E_b$
Parent and offspring        $W_{POR} = \tfrac14 D_R + E'_b$
Whole population            $V_R = \tfrac12 D_R + \tfrac14 H_R + E_w + E_b$

31. Human populations


In most species we can carry out the analysis of a population by exper-
imental means, that is by using families obtained from controlled matings
and by adopting experimental designs that will enable us to disentangle
the various heritable components of variation both from one another
and from the non-heritable components. How this can be done will be
seen in a later section, but first we must look at a species, our own,
where neither controlled mating nor the controlled distribution of indi-
viduals or groups of individuals among the differing environments is
possible. Despite these limitations, man offers many advantages for the
study of populations. In particular, in our present context, we know
more about the variation to be observed in human populations than in
those of any other species; we can trace a more complex and wider range
of relationships than in any other species; we can observe human mates
even if we cannot control their choices, and so can detect and measure
departures from random mating among them; and we can detect with
some confidence monozygotic twins and distinguish them from their
dizygotic counterparts.
The classical approach to the genetical analysis of continuous vari-
ation in man is by the use of correlations between individuals of known
relationships, an approach that was initiated by Galton a hundred years
ago and put to such good use by Fisher in 1918. In principle, such corre-
lations are obtainable for many different degrees of relationship, but in
practice relatively few have been used. We will illustrate this approach
and its limitations using two genetical relationships, those between
parent and offspring and between full-sibs. (To these we will add the
correlation between spouses which is zero when mating is at random
but which is commonly observed to depart from this expectation.)
Fisher (1918) records that Pearson and Lee observed the correlation
between parent and offspring ($r_{PO}$) to be 0.4180, and that between full-sibs
($r_{SS}$) to be 0.4619 in respect of the cubit measurement, that is the
length of the forearm from elbow to fingertip. If we assume mating to
be at random we have from Table 53
$$r_{PO} = \frac{\tfrac14 D_R + E'_b}{\tfrac12 D_R + \tfrac14 H_R + E_w + E_b} = 0.4180$$
and
$$r_{SS} = \frac{\tfrac14 D_R + \tfrac1{16} H_R + E_b}{\tfrac12 D_R + \tfrac14 H_R + E_w + E_b} = 0.4619.$$
The denominator used in finding $r_{PO}$ is, of course, the geometric mean
of the variances of parents and offspring, but when single parents and
single offspring are used in finding $W_{POR}$ and these are a fair sample from
the population, the variances of both parents and offspring, and hence
their geometric mean, will all be $V_R$ as shown. The same argument applies
to the denominator used in finding $r_{SS}$.
If we could further assume that the non-heritable variation between
individuals from different families was no greater than that between
individuals from the same family, i.e. $E_b = E'_b = 0$, these equations would
reduce to
$$\frac{\tfrac14 D_R}{\tfrac12 D_R + \tfrac14 H_R + E_w} = 0.4180 \qquad\text{and}\qquad \frac{\tfrac14 D_R + \tfrac1{16} H_R}{\tfrac12 D_R + \tfrac14 H_R + E_w} = 0.4619.$$
Although we would still have three parameters with only two equations,
and so be unable to estimate the numerical values of the parameters, we
could find their values relative to one another or, more usefully, find the
relative contributions that $D_R$, $H_R$ and $E_w$ made to $V_R$, the total variance
of the population. Thus
$$\tfrac14 D_R = r_{PO} V_R = 0.4180\,V_R, \qquad \tfrac1{16} H_R = (r_{SS} - r_{PO}) V_R = 0.0439\,V_R.$$
Then $\tfrac12 D_R$ would be $0.8360\,V_R$ and $\tfrac14 H_R$ would be $0.1756\,V_R$, leaving
$E_w = -0.0116\,V_R$, and we should conclude that the variation in the
population was almost entirely heritable, with $E_w$ very small and our estimate
of it becoming negative through sampling variation.
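The arithmetic of this simplified partition is easily reproduced. A minimal sketch (the function name `partition` is ours), valid only under the assumption $E_b = E'_b = 0$ that the following paragraph calls into question; when that assumption fails, the three outputs estimate the confounded combinations $\tfrac12 D_R + 2E'_b$, $\tfrac14 H_R + 4(E_b - E'_b)$ and $E_w - 3E_b + 2E'_b$ instead.

```python
def partition(r_po, r_ss):
    """Fractions of V_R attributed to 1/2 D_R, 1/4 H_R and E_w, assuming E_b = E'_b = 0."""
    quarter_D = r_po                # 1/4 D_R = r_PO * V_R
    sixteenth_H = r_ss - r_po       # 1/16 H_R = (r_SS - r_PO) * V_R
    half_D = 2 * quarter_D
    quarter_H = 4 * sixteenth_H
    E_w = 1.0 - half_D - quarter_H  # whatever remains of V_R
    return half_D, quarter_H, E_w

half_D, quarter_H, E_w = partition(0.4180, 0.4619)
print(f"1/2 D_R = {half_D:.4f} V_R, 1/4 H_R = {quarter_H:.4f} V_R, E_w = {E_w:.4f} V_R")
# 1/2 D_R = 0.8360 V_R, 1/4 H_R = 0.1756 V_R, E_w = -0.0116 V_R
```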
The assumption that non-heritable variation between individuals from
different families is no greater than that between individuals from the
same family would, however, be very difficult to sustain in man: indeed
our experience would point strongly the opposite way. We cannot therefore
set $E_b = E'_b = 0$ and must use the full equations which include these
parameters. When we do so we find that in place of $D_R$, $H_R$ and $E_w$, as
they appear in the solutions of the equations when simplified by the
omission of $E_b$ and $E'_b$, we have $D_R + 4E'_b$, $H_R + 16(E_b - E'_b)$ and
$E_w - 3E_b + 2E'_b$, giving as the partition of $V_R$
$$\tfrac12 D_R + 2E'_b = 0.8360\,V_R$$
$$\tfrac14 H_R + 4(E_b - E'_b) = 0.1756\,V_R$$
$$E_w - 3E_b + 2E'_b = -0.0116\,V_R.$$

Furthermore it is impossible to take the analysis further, because it is
impossible to separate the estimate of $D_R$ from $E'_b$, that of $H_R$ from
$E_b - E'_b$, and that of $E_w$ from $E_b$ and $E'_b$, although we might note that the
estimate of $H_R$ is less affected by non-heritable differences between families
than is that of $D_R$, and indeed is completely free of non-heritable effects
if $E'_b = E_b$. Thus our conclusions must be revised: the results show that
the variation of the population is almost entirely accounted for not just
by genetic differences but by genetic differences plus the non-heritable
differences between families, and until we can find some means of
separating the genetical effects from the non-genetic differences between
families we can take the analysis no further. How this can be done using
twin studies in particular we shall see in a moment, but before proceeding
to this we must look at a further complication in the analysis of
correlations among human relatives.
In deriving our formulae for rpo and rss we assumed mating to be at
random. We know, however, that this assumption is not fully justifiable.
In respect of the cubit measurement, Pearson and Lee found that there
is a correlation between spouses of $r_{FM} = 0.1977$. In other words there
is positive assortative mating in respect of this character: there is a
tendency for like to mate with like in respect of this (as indeed of most other)
characters in man. The effects of assortative mating may be complex and
a detailed consideration of them is beyond the scope of our present con-
sideration, but two results may be noted. First, where a large number of
gene differences are involved in the variation, assortative mating does
not alter the contribution of dominance deviations (h's) to the variation.
Nor does it affect the additive variation within families. It does however,
change the additive variation between families, raising it with positive
assortative mating, i.e. when rFM is positive, and lowering it with nega-
tive assortative mating, i.e. when rFM is negative. Secondly, since both
the rpo and rss depend on comparisons between families, they will be
increased by positive assortative mating such as has been observed for
the cubit measurement. So any analysis of variation based on the as-
sumption of random mating, such as the one we have carried out, will
overestimate the additive genetic component if positive assortative
mating is in fact in operation. We can obtain an idea of the extent of
this overestimation by noting that, other things being equal, $r_{PO}$ is
proportional to $(1 + r_{FM})$. Since for the cubit measurement $r_{FM}$ was found
to be 0.1977, $r_{PO}$ will be raised to approximately $(1 + 0.1977)$ times the
value it would have shown had the population been truly random mating,
as we assumed. Thus the true contribution of $D_R + 4E'_b$ to the
population variance should have been about $1/1.1977$, or say 5/6, of the value
we found, i.e. 0.70 instead of 0.84 as we calculated it on the assumption
of random mating. How this change should be apportioned between $D_R$
and $E'_b$ cannot of course be determined, since we cannot separate these
parameters in the analyses.

32. The use of twins


If twins arise at random in the population, the total variation among a
sample of them, assuming random mating, will be the same as in the
population as a whole, namely $\tfrac12 D_R + \tfrac14 H_R + E_w + E_b$. Where we have
monozygotic (identical) twins raised together (MZT) in their natural
family groupings, the variation within the pair is entirely due to
non-heritable causes operating within a family and will therefore, on average,
be $E_w$. All the remaining variation, that is $\tfrac12 D_R + \tfrac14 H_R + E_b$, will be
between the means of families of twins. This expectation, however, like all
theoretical expectations, assumes very large family sizes, whereas with
families of twins the family size is very small; indeed it is always two.
The expected variance of family means must, therefore, have added to
it half the mean variance within families, that is $\tfrac12 E_w$. (For families of
size $n$ this would be $\tfrac1n E_w$ and, of course, where $n$ is very large this reduces
for all practical purposes to zero.)
If we now have monozygotic twins that have been raised apart (MZA)
and these are a random sample of all twins, the difference within twin
pairs will still be entirely non-heritable, but it will now include both
within- and between-family components, that is both $E_w$ and $E_b$. Providing
that the separated twins are distributed at random among families in
the population, the mean variance within twin pairs will be $E_w + E_b$, and
the variation between pairs of twin means, after correcting for the effect
of families of size two, will therefore be
$$\tfrac12 D_R + \tfrac14 H_R + \tfrac12(E_w + E_b).$$
With monozygotic twins raised apart we can separate the environmental
and the heritable sources of variation independently of the model
assumed for the latter, and therefore independently of the kind of gene
action and interaction present and of the mating system. Even if we assume,
however, that an additive-dominance model with random mating is the
appropriate model, giving $E_w + E_b$ and $\tfrac12 D_R + \tfrac14 H_R$ respectively as the
non-heritable and heritable components, this does not enable us to
separate the additive and dominance variation. And even if we further
combine data from MZT and MZA we still cannot separate $D_R$ and $H_R$,
although we may now separate $E_w$ from $E_b$.
Shields (1962) reports a measure of Neuroticism in man for 29 pairs
of monozygotic female twins raised together, 26 pairs raised apart and
14 pairs of male twins raised apart, which have been analysed by Jinks
and Fulker (1970). The mean variances within families, $V_{2F}$, and the
variances of family means, $V_{1F}$, are as follows:

                    Females     Males      Expectations
MZT    $V_{1F}$     11.0819                $\tfrac12 D_R + \tfrac14 H_R + \tfrac12 E_w + E_b$
       $V_{2F}$      8.1207                $E_w$
MZA    $V_{1F}$     14.5608     14.7307    $\tfrac12 D_R + \tfrac14 H_R + \tfrac12 E_w + \tfrac12 E_b$
       $V_{2F}$      9.6635      5.0000    $E_w + E_b$

The estimates of the total variance of the MZT and the two MZA
samples are not significantly different. This is expected on the model,
since they should all be estimates of $\tfrac12 D_R + \tfrac14 H_R + E_w + E_b$. Equally the
mean scores in the three samples do not differ significantly, being 9.72,
11.86 and 10.71 respectively. This too is expected on our model, which
assumes that all three samples are drawn from the same population and,
therefore, have the same genetical and environmental sources of
variation. This does not of course mean that the specification of the genetical
component as $\tfrac12 D_R + \tfrac14 H_R$ and the environmental component as $E_w + E_b$
is necessarily adequate, but that the genetical and environmental
components are the same for all three samples whatever their compositions. We
can, therefore, regard the males and females as replicate estimates of the
statistics for the purposes of analysis. For twins raised apart therefore
$$V_{1F} = \tfrac12 D_R + \tfrac14 H_R + \tfrac12 E_w + \tfrac12 E_b = 14.6458$$
and
$$V_{2F} = E_w + E_b = 7.3318,$$
hence $\tfrac12 D_R + \tfrac14 H_R = 10.9799$.
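The partition for the twins raised apart amounts to a few lines of arithmetic, including the heritable share of the total variance. A sketch using the figures quoted above (the dictionary layout is our own):

```python
# MZA variances of pair means (V1F) and mean variances within pairs (V2F)
v1f = {"females": 14.5608, "males": 14.7307}
v2f = {"females": 9.6635, "males": 5.0000}

v1 = sum(v1f.values()) / 2   # 1/2 D_R + 1/4 H_R + 1/2 E_w + 1/2 E_b
v2 = sum(v2f.values()) / 2   # E_w + E_b
genetic = v1 - v2 / 2        # 1/2 D_R + 1/4 H_R
total = genetic + v2         # heritable plus non-heritable variance

print(f"heritable = {genetic:.4f}, non-heritable = {v2:.4f}, "
      f"heritable share = {genetic / total:.0%}")
```

The heritable component comes to about 10.98 out of a total of about 18.31.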
Some 60% of the variation (10.9799 out of a total of 18.3117) is, therefore,
due to heritable differences and 40% due to environmental differences. Since
we have not needed to take account of the make-up of either the heritable or
environmental portions in arriving at this partition, the result would be the
same irrespective of the model assumed for either.
This still remains true for the genetical component even on combin-
ing the twins raised together and apart. On our simple model we now
have four statistics but only three parameters since !-DR and !HR are
still inseparable. We can therefore, obtain least squares estimates by the
normal procedures (Section 9). These are
!-DR + !HR = 10.0291
Ew 8.7546
Eb = -2.0568.
We can now calculate the expected values of the four statistics and we
have one degree of freedom for comparing the observed and expected
values. From replicate statistics (males and females for MZA) we have
an error variance for two degrees of freedom against which to test the
significance of the discrepancy between observed and expected values.
                      Observed   Expected   Deviation
MZT   $V_{\bar F}$     11.0819    12.3496     -1.2677
      $\bar V_F$        8.1207     8.7546     -0.6339
MZA   $V_{\bar F}$     14.6458    13.3781      1.2677
      $\bar V_F$        7.3318     6.6978      0.6339
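The unweighted least squares fit can be reproduced with a short, dependency-free Python sketch. Each statistic is written as a row of coefficients on the three parameters ($\tfrac12 D_R + \tfrac14 H_R$ being treated as one, since its parts are inseparable here), and the normal equations are solved by Gaussian elimination. The helper `solve` is our own illustrative function, not a library API, and the MZA values are the replicate averages given above.

```python
# Rows: MZT V_{F-bar}, MZT V-bar_F, MZA V_{F-bar}, MZA V-bar_F.
# Columns: coefficients of (D_R/2 + H_R/4, E_w, E_b) in each expectation.
X = [
    [1.0, 0.5, 1.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.5, 0.5],
    [0.0, 1.0, 1.0],
]
y = [11.0819, 8.1207, 14.6458, 7.3318]


def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


# Normal equations (X'X) b = X'y for unweighted least squares.
k = len(X[0])
xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
genetic, e_w, e_b = solve(xtx, xty)

expected = [sum(c * p for c, p in zip(row, (genetic, e_w, e_b))) for row in X]
deviations = [obs - exp for obs, exp in zip(y, expected)]
print(f"D_R/2 + H_R/4 = {genetic:.4f}, E_w = {e_w:.4f}, E_b = {e_b:.4f}")
print("deviations:", [round(d, 4) for d in deviations])
```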
The SS of deviations is

$(-0.6339)^2 + (-1.2677)^2 + (0.6339)^2 + (1.2677)^2 = 4.0152$ for 1 df,

and the SS for replicates is

$\tfrac12(14.5608 - 14.7307)^2 + \tfrac12(9.6635 - 5.0000)^2 = 10.8871$ for 2 df.

However, for two of the four statistics to which we are fitting the
model we are working with the averages of two replicates, whose error
variance is only half the replicate mean square of $10.8871/2$. Averaging
over the four statistics, the replicate mean square appropriate for testing
the deviation mean square is

$\tfrac12\left(\tfrac{10.8871}{2} + \tfrac{10.8871}{4}\right) = 4.0827$.

The two mean squares clearly do not differ.
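The mean-square arithmetic can be sketched as follows. The replicate sum of squares is recomputed here from the four-decimal statistics quoted above, so it may differ from the printed 10.8871 in the third decimal place; the deviation mean square is taken as printed.

```python
# Replicate (male/female) values of the two MZA statistics, from the text.
mza_pair_means = (14.5608, 14.7307)
mza_within = (9.6635, 5.0000)

# Each pair of replicates contributes 1 df, with SS = (x1 - x2)**2 / 2.
ss_replicates = sum((a - b) ** 2 / 2 for a, b in (mza_pair_means, mza_within))
ms_replicates = ss_replicates / 2  # 2 df in all

# The two MZT statistics carry the full replicate MS; the two MZA statistics
# are means of two replicates and so carry half of it.
ms_for_test = (ms_replicates + ms_replicates / 2) / 2

ms_deviation = 4.0152  # deviation MS for 1 df, as printed
pooled_error = (ms_deviation + 2 * ms_for_test) / 3  # pooled over 3 df

print(round(ss_replicates, 4), round(ms_for_test, 4), round(pooled_error, 4))
```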


The use of twins 177
The deviations, therefore, are no greater than would be expected to
arise from error variation and we can conclude that the model fits ad-
equately. The deviation mean square and the error mean square being
homogeneous may be pooled to give an error variance of 4.0602 for
3 df. By multiplying this by the appropriate coefficients on the leading
diagonal of the inverted matrix (see Section 9) we obtain the error vari-
ance of each of the estimates of the three components and hence their
standard errors. These are:
$\tfrac12 D_R + \tfrac14 H_R = 10.0291 \pm 2.0400 \quad t_{(3)} = 4.92 \quad P = 0.01\text{-}0.02$
$E_w = 8.7546 \pm 1.9116 \quad t_{(3)} = 4.58 \quad P = 0.02$
$E_b = -2.0568 \pm 2.5488 \quad t_{(3)} = 0.81 \quad P = 0.40\text{-}0.50$.
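The standard errors follow from the leading diagonal of $(X'X)^{-1}$, where X holds the coefficients of the three parameters in the four expectations. This is a minimal Python sketch, assuming unweighted least squares and the pooled error variance of 4.0602 for 3 df given above; `inverse_3x3` is an illustrative helper, not a library function.

```python
import math

# Coefficients of (D_R/2 + H_R/4, E_w, E_b) for MZT V_{F-bar}, MZT V-bar_F,
# MZA V_{F-bar}, MZA V-bar_F, as in the least squares fit.
X = [[1.0, 0.5, 1.0], [0.0, 1.0, 0.0], [1.0, 0.5, 0.5], [0.0, 1.0, 1.0]]
estimates = {"1/2 D_R + 1/4 H_R": 10.0291, "E_w": 8.7546, "E_b": -2.0568}
error_variance = 4.0602  # pooled deviation and replicate MS, 3 df (from the text)


def inverse_3x3(m):
    """Invert a 3x3 matrix as its adjugate (transposed cofactors) over the determinant."""
    (a, b, c), (d, e, f), (g, h, i) = m
    cof = [
        [e * i - f * h, -(d * i - f * g), d * h - e * g],
        [-(b * i - c * h), a * i - c * g, -(a * h - b * g)],
        [b * f - c * e, -(a * f - c * d), a * e - b * d],
    ]
    det = a * cof[0][0] + b * cof[0][1] + c * cof[0][2]
    return [[cof[j][k] / det for j in range(3)] for k in range(3)]


xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
inv = inverse_3x3(xtx)

results = {}
for idx, (name, value) in enumerate(estimates.items()):
    se = math.sqrt(error_variance * inv[idx][idx])
    results[name] = (se, abs(value) / se)
    print(f"{name} = {value} ± {se:.4f}, t(3) = {abs(value) / se:.2f}")
```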
Thus, although the error variance is based on very few degrees of
freedom we can see that the estimate of the genetic component and Ew
are significant while the negative Eb is not. These estimates are in good
agreement with those obtained from our earlier analysis of MZA's alone.
Before proceeding further we should reiterate that while we make the
usual assumptions of random mating and no non-allelic interactions to
arrive at the expectations for the heritable variation, the process of fitting
the model and the estimates obtained would be unchanged irrespective of
the mating system and the nature of the gene action or interaction. As we
pointed out earlier this partitioning of the variation into a heritable and
a non-heritable component makes no assumptions about the nature of
either.
Examination of the deviations of observed and expected values of the
four statistics shows that they are identical in value but opposite in sign
between the statistics from twins raised together and twins raised apart.
The 1 df for testing the significance of these deviations is in fact testing
the difference between the total variance components of the two types
of twins which we expect to be identical. This test is, therefore, equiv-
alent to our earlier test of the homogeneity of the three total variances
and not surprisingly they agree in finding the model adequate.
We could improve these estimates and increase the power of the test
of significance by repeating the estimations using a weighted least squares
procedure (Section 9). With such a good fit to the model as is shown by
these data, however, the improvement can only be marginal. It is more
important to consider an assumption implicit in the model whose val-
idity has not been tested by the test of goodness of fit of the model.
This is the assumption of no genotype X environment interaction. Our
tests of the goodness of fit of the model are in effect tests of the hom-
ogeneity of the total variances. Since the total variances are expected to
have the same genetical and environmental components irrespective of
the constitution of these components they are also expected to have the
same genotype X environment interaction components. We cannot,
therefore, test the assumption of no genotype X environment interac-
tions by testing the homogeneity of the total variances, and we have
therefore so far no test of this assumption. Nevertheless, it is possible
to provide a sensitive test for certain kinds of genotype X environment
interactions.
The difference between a pair of monozygotic twins is solely environmental
in origin. In the absence of genotype × environment interactions,
therefore, the magnitude of this difference should be independent of the
genotypes of the twin pairs. This expectation is identical with the expectation
that in the absence of genotype × environment interactions the
variation between the individuals of a family should be the same for all
pure-breeding lines and the F1s produced by crosses between them
(Chapter 6). Our measure of the genotypic differences between twin pairs
is the difference between the family (pair) means of twins raised apart. We can
test the assumption of no genotype X environment interaction, there-
fore, by testing for the independence of the means (or sums) and differ-
ences of twin pairs where the twins have been raised apart (Jinks and
Fulker, 1970). Since in general we cannot cross-classify twins over families,
the signs we allocate to the differences are arbitrary. We can make
them all positive by always taking the smaller twin score from the
larger or all negative by doing the reverse. Equally, we can take them at
random in which case approximately half will be positive and half nega-
tive. We shall adopt the convention of making them all positive.
We can examine the sums and differences for twin pairs for evidence
of non-independence by plotting one against the other for all twin pairs.
Non-independence would then show itself by the points departing from
a random scatter by being distributed along a line or curve. Statistically
we can detect non-independence by calculating the correlation between
sums and differences over the twin pairs. For Neuroticism this leads to
a correlation of r = 0.0583 over the 40 pairs of MZA for 38 df. Clearly
there is no relationship and hence the magnitude of the environmentally
caused differences is independent of the genetical differences. That is,
there is no evidence of genotype X environment interaction.
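The sum-difference test amounts to a product-moment correlation computed over twin pairs. The sketch below is illustrative only: the scores in `example` are made up, not Shields' Neuroticism data, and the function name is our own. Differences are taken as larger minus smaller, following the convention adopted above, and the resulting r would be judged against tables of the correlation coefficient for n − 2 df.

```python
import math


def sum_diff_correlation(pairs):
    """Correlation between twin-pair sums and absolute within-pair differences.

    `pairs` is a sequence of (score_1, score_2) tuples. Each difference is the
    larger score minus the smaller, so all are non-negative.
    Returns (r, df) with df = number of pairs - 2.
    """
    sums = [a + b for a, b in pairs]
    diffs = [abs(a - b) for a, b in pairs]
    n = len(pairs)
    mean_s, mean_d = sum(sums) / n, sum(diffs) / n
    sxy = sum((s - mean_s) * (d - mean_d) for s, d in zip(sums, diffs))
    sxx = sum((s - mean_s) ** 2 for s in sums)
    syy = sum((d - mean_d) ** 2 for d in diffs)
    return sxy / math.sqrt(sxx * syy), n - 2


# Hypothetical scores, for illustration only.
example = [(9, 12), (14, 13), (7, 10), (11, 11), (16, 12), (8, 9)]
r, df = sum_diff_correlation(example)
print(f"r = {r:.4f} for {df} df")
```

A value of r near zero, as with the 0.0583 found for the MZA Neuroticism scores, indicates that the size of the environmental differences does not depend on the pair's genotype.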
Monozygotic twins raised together provide a similar but less compre-
hensive test of the assumption because the differences include only
within-family environmental effects ($E_w$), and the sums include the common
environmental effects that arise from sharing the same family environment
($E_b$). Nevertheless, if the scatter diagram and correlation show
no relationship we can still conclude that genotype X environment inter-
actions are absent. If, however, they reveal a relationship we cannot claim
that the presence of genotype X environment interactions has been unam-
biguously demonstrated because, no matter how unlikely this may be,
such a relationship could have arisen because of the non-independence of
the within and between family environmental components. For the
Neuroticism data the MZT's confirm the absence of genotype X environ-
ment interactions (r = 0.1489).
We can, therefore, claim to have separated the genetical and environ-
mental components of variation for Neuroticism without making any
untestable assumptions.
We can extend the analysis indefinitely by adding samples from other
kinds of families and other kinds of relationships. In particular, in the
present context, we can extend it to include dizygotic twins. The stat-
istics obtainable from dizygotic twins have the same expectations as
those of full-sibs for the standard within and between family variances
provided that dizygotic twins arise at random in the population. By
adding these statistics to our earlier analysis of monozygotic twins we
can now test the adequacy of a model that assumes random mating and
an additive-dominance model of the gene action.
Shields (1962) gives the Neuroticism scores of 16 pairs of female
dizygotic twins raised together (DZT) and Jinks and Fulker have pre-
sented a combined analysis of these with the data from the monozygotic
twins. The mean score and the total variance do not differ from those of
the MZT and MZA, and all three can, therefore, be regarded as random
samples drawn from the same population. The six observed statistics,
after pooling males and females as before, and their expectations on the
model are
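Source               Observed   Expected when       Model
                                $H_R = 0, E_b = 0$

MZT   $V_{\bar F}$    11.0819    13.0290    $\tfrac12 D_R + \tfrac14 H_R + \tfrac12 E_w + E_b$
      $\bar V_F$       8.1207     7.7199    $E_w$

MZA   $V_{\bar F}$    14.6458    13.0290    $\tfrac12 D_R + \tfrac14 H_R + \tfrac12 E_w + \tfrac12 E_b$
      $\bar V_F$       7.3317     7.7199    $E_w + E_b$

DZT   $V_{\bar F}$    11.7828    10.7368    $\tfrac38 D_R + \tfrac5{32} H_R + \tfrac12 E_w + E_b$
      $\bar V_F$      13.8552    12.3045    $\tfrac14 D_R + \tfrac3{16} H_R + E_w$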
Fitting the full model by least squares procedures confirms our earlier
conclusion that Eb is not significantly different from zero and also reveals
that $H_R$ is not significant. A $D_R, E_w$ model may, therefore, be fitted which
with six observed statistics leaves 4 df for testing the adequacy of the
model against the replicate error. The least squares estimates are
DR = 18.6248 and Ew = 8.1605.
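The unweighted fit of the $D_R, E_w$ model to the six statistics can be sketched in the same way. With only two parameters the normal equations are 2 × 2 and can be solved directly. This sketch assumes equal weights for all six statistics; the weighted analysis described next is not attempted here.

```python
# Six statistics as (coefficient of D_R, coefficient of E_w, observed value),
# using the model expectations with H_R = E_b = 0.
stats = [
    (1 / 2, 1 / 2, 11.0819),  # MZT, variance of pair means
    (0.0, 1.0, 8.1207),       # MZT, mean variance within pairs
    (1 / 2, 1 / 2, 14.6458),  # MZA, variance of pair means
    (0.0, 1.0, 7.3317),       # MZA, mean variance within pairs
    (3 / 8, 1 / 2, 11.7828),  # DZT, variance of pair means
    (1 / 4, 1.0, 13.8552),    # DZT, mean variance within pairs
]

# Unweighted normal equations for the two-parameter model.
s11 = sum(a * a for a, _, _ in stats)
s12 = sum(a * b for a, b, _ in stats)
s22 = sum(b * b for _, b, _ in stats)
t1 = sum(a * obs for a, _, obs in stats)
t2 = sum(b * obs for _, b, obs in stats)

det = s11 * s22 - s12 * s12
d_r = (s22 * t1 - s12 * t2) / det
e_w = (s11 * t2 - s12 * t1) / det
print(f"D_R = {d_r:.4f}, E_w = {e_w:.4f}")
```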
We can of course go further and obtain improved estimates of DR and
Ew by weighting the observed statistics by their amounts of information
(Section 9). This method gives
$D_R = 18.3380 \pm 4.9884 \quad c = 3.68 \quad P < 0.001$
$E_w = 7.7199 \pm 1.2755 \quad c = 6.05 \quad P < 0.001$
which agrees with the estimates from the simpler calculations just given
and with the estimates based on monozygotic twins alone. The test of
the fit of the model based on the comparison of the observed and ex-
pected statistics from the weighted estimation leads to an approximate
$\chi^2_{[4]} = 1.3717$ (P = 0.80), which confirms once more the adequacy of
the simple model.
These four degrees of freedom for testing the adequacy of the model
are made up of two parts. Two degrees of freedom are testing the effect
of omitting HR and Eb from the full model and two are testing the
equality of the total variance components of the three types of twins
which are expected to be equal on the model. Since the $D_R, E_w$ model
is adequate, this confirms that $H_R$ and $E_b$ are not significantly different
from zero and that the total variance components do not differ signifi-
cantly. This in turn confirms the earlier test of the homogeneity of the
three total variances. We can conclude, therefore, that dominance and
the family environment have no detectable effects on the Neuroticism
score and that all three types of twins, that is MZT, MZA and DZT,
are subject to the same heritable and environmental sources of variation.
Hence, the results provide no evidence for the often assumed greater en-
vironmental heterogeneity experienced by dizygotic relative to mono-
zygotic twins.
We have now considered three sets of data, each of which allows us to
separate heritable from non-heritable sources of variation. In each set,
however, it is the presence of monozygotic twins raised apart that has
permitted this partitioning. Indeed, as we have seen, we can make this
partitioning solely on the basis of MZA scores and at the same time have
available the best test for genotype X environment interactions. What we
cannot do however, without involving other types of twin data or other
kinds of family relationships is to test any other assumptions we may
care to make about the sources of variation, mating system, etc.
Providing that we retain MZT and DZT scores, we can substitute dizygotic
twins reared apart (DZA), or full-sibs reared apart, for MZA to
obtain an almost equally effective test of the assumptions and estimates
of the parameters of the additive-dominance model of gene action, if
adequate. The expectations of the two variances for DZA on this model
for a randomly mating population are

$V_{\bar F} = \tfrac38 D_R + \tfrac5{32} H_R + \tfrac12 E_w + \tfrac12 E_b$
$\bar V_F = \tfrac14 D_R + \tfrac3{16} H_R + E_w + E_b$.
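The DZA coefficients follow from the full-sib expectations of Table 53. Assuming, as there, that the genetic covariance of full sibs is $\tfrac14 D_R + \tfrac1{16} H_R$ and the genetic variance within a full-sib family is $\tfrac14 D_R + \tfrac3{16} H_R$, and that for twins reared in different homes both $E_w$ and $E_b$ act within pairs, the variance of pair means is the covariance plus half the within-pair variance:

```latex
\begin{aligned}
V_{\bar F} &= \left(\tfrac14 D_R + \tfrac1{16} H_R\right)
  + \tfrac12\left(\tfrac14 D_R + \tfrac3{16} H_R + E_w + E_b\right)
  = \tfrac38 D_R + \tfrac5{32} H_R + \tfrac12 E_w + \tfrac12 E_b,\\
\bar V_F &= \tfrac14 D_R + \tfrac3{16} H_R + E_w + E_b.
\end{aligned}
```

The same reasoning with $E_b$ shared by the pair gives the DZT expectations in the table above.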
As we have seen, twins, or alternatively full-sibs, raised apart are
invaluable for unambiguously separating heritable and non-heritable sources
of variation. The extent to which they allow us to achieve this objective,
however, rests on the validity of the assumption that the two individuals
of each twin pair are distributed at random among the family environ-
ments present in the population. We can test whether 'foster' homes are
a random sample of family environments by comparing their mean and
variance for any particular measure with those of a random sample of
'own' homes. There are a variety of measures we can use for this purpose.
We could, for example, measure the physical environment directly using
an index such as socio-economic class that has been developed by social
scientists for comparing family environments. Equally, of course, we
could measure the environment biologically as we did in Chapter 6 to
analyse genotype X environment interactions. One measure might then
be the phenotypes of the parents, either biological or foster, who pro-
vide the home environment in respect of the character in question.
This can tell us whether foster homes are a random sample. It does
not, however, tell us whether the separated twins were allocated to this
sample of foster homes at random. That is, whether there is a 'place-
ment' effect because successful attempts have been made to match the
fostered individuals with the foster home. In such a case the separated
twins would have been raised independently but in similar family en-
vironments. In order to test for such effects we would have to look for
a correlation between the family environments of separated twins. Our
measure of the family environments would again be based on an en-
vironment index or the phenotypes of the foster parents. Only if the
correlations were non-significant could we conclude that the separated
twins provided a valid estimate of the total environmental effects.
Much of the available twin data consists of MZT and DZT and as we
have already noted an unambiguous analysis of such data is not gener-
ally possible because the simplest additive-dominance, random mating
model has four parameters and we can fit only three as a maximum. If
one of the two-parameter models fits, for example $E_w$ and $E_b$ or $E_w$ and
$D_R$, and the others fail, we can be confident of the results. If, however,
all the two-parameter models fit equally well or fail equally badly, no
unambiguous conclusion is possible. We have no basis for choosing
between the alternative two-parameter models, and all three-parameter
models are equally satisfactory since all would lead to perfect-fit
solutions. What can and cannot be achieved in these circumstances is well
illustrated by the work of N. G. Martin (1975).
Even more typical of the kind of twin data found in the literature are
the observations of Holt (1952) on the number of palm print ridges in
man which are presented in the form of correlations for MZT, DZT and
full-sib families. Although correlations provide a useful summary of the
data and are widely used in human genetics, they are not a good starting
point for an analysis. In particular we cannot carry out any of the tests
of assumptions that depend on a comparison of total variances. As
correlations the data have been standardized to the same unit total vari-
ance for all kinds of families and at the same time we lose one statistic
from each kind of family.
For this character mating is known to be at random. The correlation
for monozygotic twins on the additive-dominance model is therefore
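$r = 0.96 = \dfrac{\tfrac12 D_R + \tfrac14 H_R + E_b}{\tfrac12 D_R + \tfrac14 H_R + E_w + E_b}$

which can be rewritten

$\tfrac12 D_R + \tfrac14 H_R + E_b = 0.96\,(\tfrac12 D_R + \tfrac14 H_R + E_w + E_b)$.

Similarly, from Holt's correlations,

$\tfrac14 D_R + \tfrac1{16} H_R + E_b = 0.47\,(\tfrac12 D_R + \tfrac14 H_R + E_w + E_b)$ for dizygotic twins of the same sex
$\tfrac14 D_R + \tfrac1{16} H_R + E_b = 0.49\,(\tfrac12 D_R + \tfrac14 H_R + E_w + E_b)$ for dizygotic twins of opposite sex
$\tfrac14 D_R + \tfrac1{16} H_R + E_b = 0.51\,(\tfrac12 D_R + \tfrac14 H_R + E_w + E_b)$ for full-sibs.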
The last three have identical expectations on this model, which assumes
that they are subject to the same environmental sources of variation.
Experimental analysis 183
Since the three correlations do not differ significantly we can accept
that this is the case. It is, however, of interest to note that the
non-significant differences between them fit a pattern in which the full-sibs
appear to have been subjected to smaller environmental differences than
dizygotic twins, and dizygotic twins of different sexes to smaller
environmental differences than twins of the same sex. In the absence of
significance, however, we may pool them. In effect we now have three
equations for solving the four unknowns, the third being
$\tfrac12 D_R + \tfrac14 H_R + E_w + E_b = 1.00$. We can, therefore, estimate
three quantities as proportions of the total variance:

$\tfrac12 D_R + 3E_b = 1.00$
$\tfrac14 H_R - 2E_b = -0.04$
$E_w = 0.04$.
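The three estimates can be traced step by step. Taking the pooled dizygotic and full-sib correlation as the simple mean of 0.47, 0.49 and 0.51, that is 0.49 (an assumption, since the method of pooling is not stated), and expressing everything as a proportion of the total variance:

```latex
\begin{aligned}
&(1)\quad \tfrac12 D_R + \tfrac14 H_R + E_b = 0.96 \\
&(2)\quad \tfrac14 D_R + \tfrac1{16} H_R + E_b = 0.49 \\
&(3)\quad \tfrac12 D_R + \tfrac14 H_R + E_w + E_b = 1.00 \\[4pt]
&(3) - (1):\quad E_w = 0.04 \\
&4\times(2) - (1):\quad \tfrac12 D_R + 3E_b = 1.96 - 0.96 = 1.00 \\
&(1) - 2\times(2):\quad \tfrac18 H_R - E_b = 0.96 - 0.98 = -0.02,
  \ \text{i.e.}\ \tfrac14 H_R - 2E_b = -0.04
\end{aligned}
```

Only these three combinations are estimable, which is why $D_R$, $H_R$ and $E_b$ cannot be separated from correlations alone.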
One conclusion that can be drawn from these estimates is that HR , Ew
and Eb must be small relative to DR but to go beyond this we should need
to test the adequacy of all possible two parameter models as described
earlier. This is practicable with variances using weighted least squares
techniques but is not with correlations.
Our aim in this section has been to illustrate the value and limitations
of twin data and for this reason we have confined the discussion to twins
and the only other kind of relationship (full-sib) that has the same ex-
pectation on a simple model. Twin data, however, are at their most
powerful when supplementing the commoner types of relationships
found in natural populations (Eaves, 1975; Jinks and Fulker, 1970).
But because they are more powerful, more complex sources of variation
become amenable to analysis, and sources of variation that do not
normally arise in experimental populations reach significance and must be
allowed for in any adequate model. These sources would, for example,
include assortative mating, genotype-environment correlations, co-
operation or competition between siblings and cultural transmission
from parent to offspring. These developments are beyond the scope of
our present treatment but they are described by Eaves et al. (1977).

33. Experimental analysis


The analysis of variation in a population becomes possible by exper-
imental means in species where we can use controlled matings and raise
the progenies in such a way that we can determine the impact on them
of the effects of the non-heritable sources of variation. A number of ex-
perimental breeding programmes are then possible, of which the simplest
is the use of biparental progenies produced by the mating of pairs of
parents taken at random from the population, no parent being used
more than once. With hermaphroditic plants, half the individuals would
be used as males and half as females, and with species where the sexes
are separate, equal numbers of males and females would be taken and
mated in pairs taken at random. We should thus have a number of full-
sib families which could all be made to comprise the same number, n, of
individuals. Then from Table 53 we should have
variance of family means ($V_{\bar F}$) $= \tfrac14 D_R + \tfrac1{16} H_R + E_b + \tfrac1n \bar V_F$
and mean variance of families ($\bar V_F$) $= \tfrac14 D_R + \tfrac3{16} H_R + E_w$
where DR and HR are the parameters of the population from which the
parents of the families were taken at random. $V_{\bar F}$ can obviously be
corrected for $\tfrac1n \bar V_F$ to give an estimate of $\tfrac14 D_R + \tfrac1{16} H_R + E_b$. The analysis
can, however, be taken further only if we make further assumptions or
elaborate the design of the experiment. Thus if each family is divided
into, say, halves and each half raised in separate randomized blocks we
can obtain from the family × block interactions an estimate of $E_b + \tfrac1n \bar V_F$
and hence of $E_b$ and $\tfrac14 D_R + \tfrac1{16} H_R$. Even so, only if we could assume $H_R$ to
be zero would we be able to estimate DR. The difficulty is that such an
elaboration would still not provide enough statistics to estimate all the
parameters. A further statistic is in fact necessary if the analysis is to be
completed.
This further statistic might be sought in either of two ways. First we
could use the parent/offspring covariance, but we should have to take
steps to ensure that in doing so we were not introducing the further par-
ameter E~ and if it were necessary to raise parents and offspring in dif-
ferent environments, as for example, if they had to be grown in differ-
ent years, their covariance might be biased by genotype X environment
interaction. Given, however, that a satisfactory estimate of the
covariance could be obtained, it would supply us with a further statistic
whose expectation is $\tfrac14 D_R$, and the analysis could be completed.
The second, and preferable, approach is to vary the design of the ex-
periment so as to include not only families of full-sibs but new families
standing in the half-sib relation to one another. This can be achieved by
adopting the design often referred to as North Carolina 1 (NC1), and
involves the mating of each parent of one sex (usually, for obvious reasons,
the male) with a number of parents of the other sex, the group of indi-
viduals used of the second sex being a different one for each individual
of the first sex. Thus Robinson et al. (1949) record an experiment with
maize in which 48 plants used as males were each crossed on to 4 fe-
males, making a total of 4 X 48 = 192 females in all, both the males and
the females to which each was crossed being taken at random from the
population. This population was in fact the F2 of a cross between two
inbred lines, CI21 and NC7, but the experiment will serve to illustrate
the use of the NC1 design, which can be used just as well with any
open-bred population as with an F2: as we have noted earlier, an F2 may
properly be regarded as a randomly bred population but with the special
condition that $u = v = \tfrac12$ for all genes. Thus the only special feature of
the results of Robinson et al. is that they will yield estimates of D and
H rather than just $D_R$ and $H_R$, since, as we have already seen, D and
H are the special cases of $D_R$ and $H_R$ where $u = v = \tfrac12$ for all genes. We
will, however, use DR and HR in our present analysis as a continuing
reminder of the general applicability of the analysis.
The families produced by the 192 crosses were grown in 12 blocks,
each block including the 16 families from the crosses of 4 males each
to its 4 females. Each block was divided into 2 sub-blocks and all of the
16 families of the block were grown in each sub-block, randomization
of the 16 families being carried out separately for the 2 sub-blocks. The
data we will use relate to yield of grain, expressed as mean pounds per
plot.
The analysis of variance is shown in Table 54. Each block includes 32
plots divided into two sub-blocks of 16 plots each. There is thus 1 df for
TABLE 54.
Analysis of variance of yield in maize (Robinson et al., 1949)

Item                                df      MS
Blocks                              11      0.0153
Sub-blocks                          12      0.0063
Male groups                         36      0.0167*
Families within groups             144      0.0069*
Plots within families              178      0.0031*
Sampling variance of plot means    250      0.0017

The analysis is in terms of plot means.
* Significant when tested against the appropriate error variance, which in all these cases is the MS immediately below.
the difference between sub-blocks, 15 for differences among the 16 fam-
ilies in the block and 15 for sub-block X family interaction. The first item
is of little interest to us, but the second provides information about the
effects of the genetical differences among the 16 families and the third
item is a direct measure of the variance of the non-heritable component
of variation in the family means. The 15 df for family differences are sub-
divisible into 3 for differences among the progenies of the 4 males and
3 X 4 = 12 for the differences among the progenies of the females mated
to the same male, averaged over the 4 males of the block. This last item
is clearly a measure of the variance of means of full-sib families, while
the former measures the variance among the means of half-sib groups of
families, since the 4 families tracing back to a single male each have a
different mother and are therefore in the half-sib relationship to one
another.
Since the 12 blocks are derived from 12 different sets each of 4 males
and 16 females, we can pool corresponding items from all the blocks and
find 1 × 12 = 12 df for sub-block differences, 3 × 12 = 36 for
differences among the progenies of different males, 12 × 12 = 144 for
differences among the females mated to the same male, and 15 × 12 = 180 df
for the non-heritable component of variation of family means. Since,
however, two plots failed in the experiment, their means were estimated
by the standard missing plot technique and 2 df were lost from this total
of 180 leaving 178 in the analysis. There are of course 11 df for differ-
ences among the 12 block totals, but, like the 12 df for sub-block differ-
ences, these are of little interest to us. Each plot contained 10 plants
except in a few cases. The results were recorded as the mean yield per
plant for each plot and an analysis of variance was carried out on a
single plot basis. A further observation was, however, made. The mean
variance of plants within plots was found from a sample of the plots
used in this and another related experiment and used to derive an esti-
mate of the sampling variance of the plot means, which is recorded as
0.0017 by Robinson et al. Where $V_w$ is the mean variance within plots,
the sampling variance of the mean of plots of 10 plants would be $\tfrac1{10} V_w$,
but there were missing plants in a few plots and the divisor 10 was there-
fore replaced by 9.4 which is the harmonic mean of the actual numbers
of plants in the plots.
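The degrees-of-freedom bookkeeping and the plot-size correction can be sketched in Python. `nc1_df` is a hypothetical helper written for this layout (each block holding the full-sib families of its own males, every family grown once in each sub-block); the plant counts passed to `harmonic_mean` are invented for illustration, since the actual counts are not given.

```python
from statistics import harmonic_mean


def nc1_df(blocks, males_per_block, females_per_male, sub_blocks, missing_plots=0):
    """Degrees of freedom for an NC1 layout like that of Robinson et al."""
    families = males_per_block * females_per_male
    return {
        "blocks": blocks - 1,
        "sub-blocks": blocks * (sub_blocks - 1),
        "male groups": blocks * (males_per_block - 1),
        "families within groups": blocks * males_per_block * (females_per_male - 1),
        # family x sub-block interaction, less 1 df per estimated missing plot
        "plots within families": blocks * (families - 1) * (sub_blocks - 1) - missing_plots,
    }


df = nc1_df(blocks=12, males_per_block=4, females_per_male=4, sub_blocks=2, missing_plots=2)
print(df)

# The sampling variance of a plot mean is V_w / n, with n the harmonic mean
# number of plants per plot.
plants_per_plot = [10, 10, 9, 10, 8, 10]  # hypothetical counts, for illustration only
n = harmonic_mean(plants_per_plot)
print(f"harmonic mean plants per plot = {n:.2f}")

# Working back from the reported figures: variance of plants within plots.
v_2sr = 9.4 * 0.0017
print(f"variance of plants within plots = {v_2sr:.5f}")
```

The last figure is the 0.01598 entered for plants within plots in Table 55.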
The results of the analysis of variance require little comment. The non-
heritable variation of plot means, estimated from the family X sub-block
interaction, is clearly greater than the sampling variance of plot means
arising from the variance of plants within plots. The MS for family X
sub-block interaction must therefore be used for testing the MS between
females within males which, if significant, must itself be used for testing
the MS between males. Although the VR's are not large, with the high
number of df available these two items are both significant when so
tested and thus combine to provide evidence for genetical variation
among the families. The differences between sub-blocks are not signifi-
cant, while those between blocks are, but as already noted these items
are of little interest for our present analysis and will be used no further.
The further analysis of the variation into the various heritable and
non-heritable components can be carried out directly from the MS's in
the analysis of variance set out in Table 54. This is in fact the approach
used by Robinson et al. (and see M and J, pp. 226 et seq.). It is, how-
ever, somewhat easier to follow if we first find the variance of plot means,
that of family means within male groups (i.e. within groups having a com-
mon male parent) and that between male group means, all of which are
easily derivable from the MS's of Table 54. Since the analysis of variance
was based on single plot observations, the variance of plot means within
families is given directly by the MS for family X sub-block interaction.
Each family included two plots, one in each sub-block, and the variance
of family means within male groups is thus ½ the MS between families
within groups. Finally, each male group includes four families, each raised
in two plots, and the variance of male group means thus becomes
$1/(4 \times 2) = \tfrac18$ of the MS between males. The variances so calculated are
listed in Table 55, which also includes the mean variance within plots.
TABLE 55.
Components of variation of yield in the maize experiment

Variance of                            Observed   Sampling correction                Corrected
Male group means ($V_M$)                0.00209   $\tfrac14 V_F = 0.00086$           $0.00123 = \tfrac18 D_R$
Family means within groups ($V_F$)      0.00345   $\tfrac12 V_P = 0.00157$           $0.00188 = \tfrac18 D_R + \tfrac1{16} H_R$
Plots within families ($V_P$)           0.00313   $\tfrac1{9.4} V_{2SR} = 0.00170$   $0.00143 = E_b$
Plants within plots ($V_{2SR}$)         0.01598                                      $0.01598 = \tfrac14 D_R + \tfrac3{16} H_R + E_w$

Since each plot mean has a sampling variance of 0.0017 as shown in


Table 54, the variance of plants within plots is this sampling variance
multiplied by 9.4, the harmonic mean number of plants per plot.
Now from Table 53 the mean variance within full-sib families is
$\tfrac14 D_R + \tfrac3{16} H_R + E_w$, and the sampling variance this contributes to the
variance of plot means is thus $\tfrac1n V_{2SR}$, where n is the harmonic mean of the
number of individuals in the various families, here 9.4. The other
component of the variance of plot means within families is $E_b$. We can thus
find from the data in Table 55, $E_b = 0.00313 - 0.00170 = 0.00143$.
Since each family mean is derived from two plots it will be subject to
a sampling variance of ½ the variance between plots within families,
and if we were taking the analysis no further we should deduct ½ the
variance of plots within families (i.e. $\tfrac12 E_b + \tfrac1{2n} V_{2SR}$) to obtain the
overall genetical component of variation between families. We have,
however, sub-divided this variation into two parts, that between fam-
ilies within male groups (i.e. between progenies each with its own
mother but having a common father) and that between male groups
(i.e. between groups of families, each of which group comprises fam-
ilies with a common father). Before we can proceed further, therefore,
we must ascertain how the genetical components divide up between
these two sub-divisions of the variation.
We can obtain this partition of the genetical components by refer-
ence back to Table 51, which sets out the matrix of matings between
the three genotypes of male and the three corresponding types of
female in respect of the gene difference A-a. Now each row repre-
sents the families obtained by mating the various types of female with
a constant, or single, male. In other words the rows in the table are
a model for our male groups. The expectation for the genetic part of
the variance of means of male groups is thus given by the variance of
row means and turns out to be $\tfrac12 u_a v_a [d_a + (v_a - u_a) h_a]^2$, which on
summing over all relevant genes becomes $\tfrac18 D_R$. The expectation for the
genetic part of the variance of family means within male groups is
similarly given by the mean variance of families within rows, and this is
found to be $\tfrac12 u_a v_a [d_a + (v_a - u_a) h_a]^2 + u_a^2 v_a^2 h_a^2$, which on summing over
all relevant genes becomes $\tfrac18 D_R + \tfrac1{16} H_R$. When summed these two
variances give

$\tfrac18 D_R + \left(\tfrac18 D_R + \tfrac1{16} H_R\right) = \tfrac14 D_R + \tfrac1{16} H_R$,
which is, of course, the expectation we have already found for the
overall variance of means of biparental families. The NC1 mating system has
thus enabled us to break the overall variance of family means into two
recognizable parts having different expectations in terms of our par-
ameters and so add a further equation for the estimation of the par-
ameters.
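The row-variance results can be checked numerically for a single gene. The sketch below builds the matings of AA, Aa and aa males with AA, Aa and aa females, as in Table 51 (frequencies $u^2$, $2uv$, $v^2$; genotypic values d, h and −d), computes the genetic mean of each full-sib family, and returns the variance of row (male-group) means and the mean variance of families within rows. The function name and the trial values of u, d and h are arbitrary choices for illustration.

```python
def row_variance_check(u, d, h):
    """Genetic variance of male-group (row) means and mean within-row variance
    of family means, for one gene A-a in a random-mating population.

    u is the frequency of A (v = 1 - u); AA, Aa and aa have values d, h, -d.
    """
    v = 1 - u
    freq = {"AA": u * u, "Aa": 2 * u * v, "aa": v * v}
    value = {"AA": d, "Aa": h, "aa": -d}
    p_a = {"AA": 1.0, "Aa": 0.5, "aa": 0.0}  # chance a parent transmits A

    def family_mean(male, female):
        pm, pf = p_a[male], p_a[female]
        return (pm * pf * value["AA"]
                + (pm * (1 - pf) + (1 - pm) * pf) * value["Aa"]
                + (1 - pm) * (1 - pf) * value["aa"])

    rows = {}
    for m in freq:  # one row per male genotype
        fams = [(freq[f], family_mean(m, f)) for f in freq]
        row_mean = sum(w * x for w, x in fams)
        row_var = sum(w * (x - row_mean) ** 2 for w, x in fams)
        rows[m] = (row_mean, row_var)

    grand = sum(freq[m] * rows[m][0] for m in rows)
    var_row_means = sum(freq[m] * (rows[m][0] - grand) ** 2 for m in rows)
    mean_within_rows = sum(freq[m] * rows[m][1] for m in rows)
    return var_row_means, mean_within_rows


u, d, h = 0.3, 1.0, 0.6
v = 1 - u
between, within = row_variance_check(u, d, h)
core = 0.5 * u * v * (d + (v - u) * h) ** 2
print(between, core)                     # variance of row means vs (1/2)uv[d + (v - u)h]^2
print(within, core + (u * v * h) ** 2)   # within rows vs the same plus u^2 v^2 h^2
```

Each printed pair agrees, whatever values of u, d and h are chosen.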
Returning to our analysis, we note that the means of families within
male groups are each based on two plots. Their variance will thus have an
expectation of $\tfrac18 D_R + \tfrac1{16} H_R + \tfrac12 E_b + \tfrac1{2n} V_{2SR}$, allowing us to estimate
$\tfrac18 D_R + \tfrac1{16} H_R$ as $0.00345 - \tfrac12(0.00313) = 0.00188$. Since the male groups
each include four families, their means will be subject to a sampling
variance of one-quarter the variance of individual family means. Their
expectation for the variance of male group means is thus
$\tfrac18 D_R + \tfrac14\left(\tfrac18 D_R + \tfrac1{16} H_R + \tfrac12 E_b + \tfrac1{2n} V_{2SR}\right)$, and we can estimate $\tfrac18 D_R$ by deducting
one-quarter the variance of family means within male groups from the variance of
male group means, giving $\tfrac18 D_R = 0.00209 - \tfrac14(0.00345) = 0.00123$.
We now have the estimates $\tfrac18 D_R + \tfrac1{16} H_R = 0.00188$ and $\tfrac18 D_R = 0.00123$,
giving $D_R = 8 \times 0.00123 = 0.00984$ and $H_R = 16 \times (0.00188 - 0.00123) = 0.0104$.
Finally we note that the variance of individuals within families is
$\tfrac14 D_R + \tfrac3{16} H_R + E_w = 0.01598$, and now having estimates of $D_R$ and $H_R$ we
can complete the analysis by finding

$E_w = 0.01598 - \tfrac14(0.00984) - \tfrac3{16}(0.01040) = 0.0116$.
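The whole chain of corrections can be run end to end. This Python sketch takes the observed variances of Table 55 as inputs and carries full precision through the intermediate steps instead of rounding to five decimal places as the text does, so its $D_R$ and $H_R$ (about 0.00982 and 0.0105) differ slightly in the last figures from 0.00984 and 0.0104.

```python
# Observed variances from Table 55 (yield in maize, Robinson et al., 1949).
v_m = 0.00209              # variance of male group means
v_f = 0.00345              # variance of family means within male groups
v_p = 0.00313              # variance of plot means within families
n_plants = 9.4             # harmonic mean number of plants per plot
v_2sr = 0.0017 * n_plants  # variance of plants within plots (0.01598)

# Plot means within families: V_P = E_b + V_2SR / n
e_b = v_p - v_2sr / n_plants
# Family means within groups: V_F = D_R/8 + H_R/16 + V_P/2
within_groups = v_f - v_p / 2
# Male group means: V_M = D_R/8 + V_F/4
eighth_d = v_m - v_f / 4

d_r = 8 * eighth_d
h_r = 16 * (within_groups - eighth_d)
# Plants within plots: V_2SR = D_R/4 + 3 H_R/16 + E_w
e_w = v_2sr - d_r / 4 - 3 * h_r / 16

print(f"D_R = {d_r:.5f}  H_R = {h_r:.5f}  E_w = {e_w:.5f}  E_b = {e_b:.5f}")
```

With four statistics and four parameters this is a perfect-fit solution, so the sketch, like the text, provides no test of the model's adequacy.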
The estimates of the four parameters DR' HR , Ew and Eb are assembled
at the foot of Table 55. Since there were only four equations (provided
by the variances of male group means, of family means within male
groups, of plot means within families and of individuals within plots
respectively) the solutions give perfect fit estimates of the parameters
and we therefore have no test of adequacy of the model: at least one
more equation, whose provision would require the experiment to be
further elaborated in an appropriate way, would be needed for such a
test of adequacy.
Various more elaborate experimental designs have been proposed
from time to time, and have indeed been used in practice in a limited
number of cases. There is, for example, the design often referred to as
North Carolina 2, in which a number of male and female parents are
used, but with every male mated to every female. This yields a quasi-
diallel set of crosses, resembling the diallel in that every male genotype
is mated to every female, and of course vice versa; but differing from it
in that (a) the male parents and female parents are separate samples
from the population of genotypes, there being no necessary correspon-
dence between them in either genotype or number, and (b) being sam-
ples from an open bred population, the parents are not fully homo-
zygous as are the parents of the diallels we discussed in Chapter 4. The
data from an NC2 experiment can nevertheless be analysed like a diallel,
although for reason (b) above, they will not yield the same estimates of
190 Randomly breeding populations
the genetical parameters as a true diallel. Thus, the variances of the means of both the male and the female arrays yield estimates of $\tfrac{1}{8}D_R$, and not of $\tfrac{1}{4}D_R$ as with a true diallel, and similarly the term for interaction of male and female parents in the simple analysis of variance of the quasi-diallel table depends on $\tfrac{1}{16}H_R$, not $\tfrac{1}{4}H_R$ as in the true diallel. Finally, the mean variance within families has a genetical component, $\tfrac{1}{4}D_R + \tfrac{3}{16}H_R$, in a quasi-diallel, whereas in a true diallel this variance within families is wholly non-heritable. Since this design yields two estimates of $D_R$, from the means of male and female arrays respectively, it affords in principle a test of adequacy of the model, but it will clearly be more a test of the assumption that male and female parents contribute equally to the phenotype of the progeny, i.e. that there are, for example, no maternal effects, than of anything else.
Where a number of inbred, homozygous lines are available from the
population, or are otherwise readily made from it, a true diallel exper-
iment may be carried out and analysed in the normal way. Appropriate
sets of homozygous lines will however seldom be available, although such
a set has been used in at least one case. Where analysis can be carried out
by such a true diallel experiment, it will afford a better test of adequacy
of the model and will yield more informative estimates of the parameters
in the sense that their standard errors will be lower from an experiment
involving a given number of individuals, than will any of the other designs,
just as NC2 is more informative than NC1 (M and J, pp. 241-3). A true
diallel, however, demands a suitable sample of homozygous lines, and
even an NC2 requires the capacity for producing a series of different
progenies from a single female by controlled matings with successive
males. Such a controlled multiplicity of matings is more likely to be
possible with plants than with animals, where indeed the possibilities
must commonly be restricted to the NC1 design. In general the choice of design will be governed more by the biological possibilities of the species than by anything else. Also, because the analysis of NC1 experiments depends on partitioning variances whose genetical components involve $D_R$ and $H_R$ with coefficients as low as 1/8 and 1/16, such experiments must be large, involving large numbers of individuals and hence demanding of resources, if they are to yield informative estimates of the genetical components.
34. Complicating factors
The assumptions on which is based the model we have used in the gen-
etical analysis of populations are (a) that the genes, both allelic and non-
allelic, are distributed independently of one another in the population
under analysis and (within the limits imposed by the mating system used)
in the progenies on which are based the observations used in the analysis,
and (b) that the genes display neither non-allelic interaction nor genotype × environment interaction in expressing their effects. The assumption of independence of gene distribution is primarily the assumption of
random mating: linkage will have little effect in a randomly mating popu-
lation unless the forces of selection impinging on the population are such
as to produce a marked linkage disequilibrium. The assumption of ran-
dom mating does not always hold good. We have already seen that there
is assortative mating (that is a phenotypic correlation between mates) in
man and it is known that mating can depart from randomness in popu-
lations of other animal species also. Indeed anything that affects the
time of sexual maturity or mating behaviour and choice can prospec-
tively lead to non-random mating. In plants a variety of mechanisms are
known to affect mating, some leading to an excess of self-mating and
others virtually to exclusive cross-mating. The latter may be regarded as
a means of ensuring effectively random mating in respect of all the genes
except those governing the mechanism itself (see Mather, 1973). The
former by encouraging self-mating must generally lead to marked depar-
tures from randomness in the direction of inbreeding and hence to pro-
portions of homozygotes in excess of those expected from the Hardy-
Weinberg equilibrium in respect of any genes that vary in the population.
Assortative mating is the preferential coming together of individuals in
mating pairs on the basis of similarity (or, in negatively assortative mating,
of dissimilarity) of their phenotypes. Inbreeding is the preferential coming
together of individuals in mating pairs on the basis of closer than average
family, and hence genetic, relationship. Inbreeding may be held to imply
a form of assortative mating; but the distinction between them is never-
theless an important one, as their consequences are not the same. They
differ in several ways. Inbreeding will tend to raise the proportion of
homozygotes in the population and if sufficiently close will lead to com-
plete homozygosis apart from the effect of recurrent mutation.
Furthermore it will do so for all the genes in the nucleus, with the
result that, as in Johannsen's beans, the population will consist of a mix-
ture of true-breeding lines. Assortative mating on the other hand, depen-
ding as it does only on phenotypic similarity, will be affected by non-
heritable agencies as well as by heritable: it will affect the distribution
of the genes mediating the character in question, but it need not lead to
any marked increase in homozygosis, even where the contribution of
non-heritable agencies is small. Indeed it will not result in any signifi-
cant rise in the proportion of homozygotes where the variation in the
expression of the character in question is mediated by a reasonably
large number of gene-differences whose effects are not grossly dissimi-
lar in magnitude. Thus the consequences of assortative mating and in-
breeding will appear in different ways in respect of continuous variation.
Because of the association of non-allelic genes of similar effect to which
it leads, assortative mating raises the contribution of DR to the variation
of the character in the population, while in so far as it does not lower
the proportion of heterozygotes, it leaves the contribution of HR un-
changed. Because it raises the proportion of homozygotes, inbreeding
also raises the contribution of DR to the variation, but because of the
concomitant reduction in the proportion of heterozygotes, the contribution of HR is correspondingly lowered. With complete inbreeding
HR vanishes entirely from the composition of the variation.
Where assortative mating is operative, it can be accommodated by the
approach due to Fisher (1918) to which we have already made a brief
reference, and which has been illustrated further in its analytical situation by Jinks and Fulker (1970). Where inbreeding is complete it is easily accommodated in the analysis. The population then consists of nothing but homozygotes in the proportions $u$ AA : $v$ aa, and its variance will be $D_P + E_w + E_b$, where $D_P = S[4u_a v_a d_a^2]$, as shown when we were considering the variance of the homozygous parents of a diallel in Section 18. Where inbreeding is only partial the situation is more complex, involving $D_R$, $H_R$ and $F$, the inbreeding coefficient, as well as $D_P$. The analysis then becomes correspondingly complicated.
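The expectation $D_P = S[4u_a v_a d_a^2]$ for a fully inbred population can be checked by brute force. The sketch below assumes independently distributed genes with no interaction, scores the + homozygote of each gene as $+d$ and the − homozygote as $-d$, and enumerates every combination of homozygotes; the function names are our own.

```python
from itertools import product


def homozygote_variance(freqs, effects):
    """Variance of a fully inbred population, by direct enumeration.

    freqs[g] is u_g, the frequency of the + homozygote of gene g (scored +d_g);
    the - homozygote (frequency v_g = 1 - u_g) is scored -d_g. Genes are
    assumed to be distributed independently of one another.
    """
    mean = sq = 0.0
    for signs in product((1, -1), repeat=len(freqs)):
        prob, value = 1.0, 0.0
        for s, u, d in zip(signs, freqs, effects):
            prob *= u if s == 1 else 1 - u
            value += s * d
        mean += prob * value
        sq += prob * value * value
    return sq - mean * mean


def D_P(freqs, effects):
    """D_P = S[4 u v d^2]."""
    return sum(4 * u * (1 - u) * d * d for u, d in zip(freqs, effects))


freqs, effects = [0.2, 0.5, 0.9], [1.0, 2.5, 0.7]   # arbitrary illustration
print(homozygote_variance(freqs, effects), D_P(freqs, effects))
```

For a single gene the identity is just $d^2 - (u - v)^2 d^2 = 4uvd^2$; independence lets the single-gene variances add.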
Turning to interactions, the presence of genotype × environment interaction is easy to detect by a comparison of the variance of the population over two or more environments. If the simple model assuming no
such interaction is adequate, the variances of the population will be
homogeneous: any significant heterogeneity of their variances will show
that genotype × environment interaction must be taken into account.
Kearsey (1965) has reported an analysis of the variation in flowering time
of a randomly bred population of the poppy, Papaver dubium, which he
carried out using a number of experimental designs, two of which were
NC1 and NC2. He sowed samples of each of the experimental progenies
that he used in the analysis of the population, at two different times, so
making it possible to compare the variances they yield when grown in
the two different environments experienced by plants raised at two dif-
ferent periods of the year. The mean variances of the families following
the two sowings are shown for both his NC1 and NC2 experiments in Table 56. Each of these four MS is based on over 320 df, and it is clear that in both experiments the mean variance of families, $V_F$, which is of course $\tfrac{1}{4}D_R + \tfrac{3}{16}H_R + E_w$, is lower with sowing 2 than with sowing 1.

TABLE 56.
Variation in flowering time of a population of poppies (Kearsey, 1965)

                      Experiment
Statistic  Sowing    NC1    NC2    Mean    Ratio 1/2
V_F        1          36     49    42.5      2.02
           2          19     23    21.0
D_R        1          45     30    37.5      3.13
           2          10     14    12.0
H_R        1          76    159   117.5      2.67
           2          46     42    44.0
E_w        1          11     10    10.5      1.05
           2           8     12    10.0
His data allow estimates of $D_R$, $H_R$ and $E_w$ to be obtained from both the NC1 and NC2 experiments, and these are also set out for both sowing times in the table. If we take their averages over the two experiments, both of the genetical parameters are about three times as high after sowing 1 as after sowing 2, but $E_w$ hardly changes between sowings.
The difference in the variation between environments is thus unlikely
to be one that can be scaled out by transforming the metric on which
the character has been measured, and we must conclude that expressions
of the genes mediating the variation in flowering time are changing mark-
edly with the change in environment.
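The "Mean" and "Ratio 1/2" columns of Table 56 are simple averages over the two designs and ratios of those averages. The sketch below recomputes them from the NC1 and NC2 entries; the dictionary layout is our own.

```python
# Table 56 entries as (NC1, NC2) pairs for sowing 1 and sowing 2.
table56 = {
    "V_F": ((36, 49), (19, 23)),
    "D_R": ((45, 30), (10, 14)),
    "H_R": ((76, 159), (46, 42)),
    "E_w": ((11, 10), (8, 12)),
}

ratios = {}
for name, (sowing1, sowing2) in table56.items():
    mean1 = sum(sowing1) / len(sowing1)   # average over the two designs
    mean2 = sum(sowing2) / len(sowing2)
    ratios[name] = mean1 / mean2          # the "Ratio 1/2" column
    print(f"{name}: means {mean1:.1f}, {mean2:.1f}; ratio {ratios[name]:.3f}")
```

The genetical ratios (about 3.1 for $D_R$ and 2.7 for $H_R$) stand well apart from the ratio of about 1.05 for $E_w$, which is the contrast the argument rests on.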
Interaction between genotype and environment is relatively simple to
detect. That between non-allelic genes, on the other hand, is difficult.
As in the descendants of a cross between true-breeding lines (Section 21),
the effects of non-allelic interaction on the genetical component of vari-
ation in a randomly breeding population are two-fold (Mather, 1974).
First, the terms in $D_R$ and $H_R$ have added to them terms in $I_R$, $J_R$ and $L_R$. These terms appear with the same coefficients as do $I$, $J$ and $L$ in the corresponding variances and covariances of F2 and its descendants. They are set out in the upper part of Table 57. Secondly, the non-allelic interaction changes the definitions of $D_R$ and $H_R$ in a randomly breeding population, just as it changes those of $D$ and $H$ in F2, although in a more complex way. Indeed $D_R$ and $H_R$ are now affected by the $i$, both $j$'s and $l$ for each pair of interacting genes, and not just by $j$ and $l$ respectively as are $D$ and $H$. The definitions of $D_R$, $H_R$, $I_R$, $J_R$ and $L_R$ are also set out in Table 57. They are very complex, but reduce to the simpler expressions for the $D$, $H$, $I$, $J$ and $L$ of F2 when all $u = v = \tfrac{1}{2}$.

TABLE 57.
Non-allelic interaction in randomly breeding populations (Mather, 1974)

$V_R = \tfrac{1}{2}D_R + \tfrac{1}{4}H_R + \tfrac{1}{4}I_R + \tfrac{1}{8}J_R + \tfrac{1}{16}L_R + E_w + E_b$
$V_{1SR} = \tfrac{1}{4}D_R + \tfrac{1}{16}H_R + \tfrac{1}{16}I_R + \tfrac{1}{64}J_R + \tfrac{1}{256}L_R + E_b + \tfrac{1}{n}V_{2SR}$
$V_{2SR} = \tfrac{1}{4}D_R + \tfrac{3}{16}H_R + \tfrac{3}{16}I_R + \tfrac{7}{64}J_R + \tfrac{15}{256}L_R + E_w$
$W_{SR} = \tfrac{1}{4}D_R + \tfrac{1}{16}H_R + \tfrac{1}{16}I_R + \tfrac{1}{64}J_R + \tfrac{1}{256}L_R$
$W_{HSR} = \tfrac{1}{8}D_R + \tfrac{1}{64}I_R$
$W_{POR} = \tfrac{1}{4}D_R + \tfrac{1}{16}I_R$

where
$D_R = S_a\big[4\pi_a\{[d_a + 2S_b(\pi_b j_{ab}) + S_b(\xi_b i_{ab})] - \xi_a[h_a + S_b(\xi_b j_{ba}) + 2S_b(\pi_b l_{ab})]\}^2\big]$
$H_R = S_a\big[16\pi_a^2\{h_a + S_b(\xi_b j_{ba}) + 2S_b(\pi_b l_{ab})\}^2\big]$
$I_R = S_{ab}\big[16\pi_a\pi_b\{i_{ab} - \xi_b j_{ab} - \xi_a j_{ba} + \xi_a\xi_b l_{ab}\}^2\big]$
$J_R = S_{ab}\big[64\pi_a\pi_b\{\pi_b(j_{ab} - \xi_a l_{ab})^2 + \pi_a(j_{ba} - \xi_b l_{ab})^2\}\big]$
$L_R = S_{ab}\big[256\pi_a^2\pi_b^2 l_{ab}^2\big]$
and
$S_a$ = summation over all genes
$S_b$ = summation over all genes interacting with A-a
$S_{ab}$ = summation over all pairs of interacting genes
$\pi_a = u_a v_a$ and $\xi_a = u_a - v_a$
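The Table 57 expressions can be checked numerically for a single pair of interacting genes. The sketch below assumes Hardy-Weinberg proportions at each locus, independent distribution of the two genes, and the usual scoring $x = 1, 0, -1$ for AA, Aa, aa with $y = 1$ for the heterozygote only; taking $j_{ab}$ to multiply $x_a y_b$ and $j_{ba}$ to multiply $y_a x_b$ is our reading of the index convention. It compares $\tfrac{1}{2}D_R + \tfrac{1}{4}H_R + \tfrac{1}{4}I_R + \tfrac{1}{8}J_R + \tfrac{1}{16}L_R$ with the genetic variance obtained by enumerating the nine genotypes. Function names and parameter values are illustrative only.

```python
from itertools import product


def table57_components(u_a, u_b, d_a, d_b, h_a, h_b, i_ab, j_ab, j_ba, l_ab):
    """D_R, H_R, I_R, J_R, L_R for one pair of interacting genes A-a and B-b."""
    pi_a, pi_b = u_a * (1 - u_a), u_b * (1 - u_b)   # pi = u v
    xi_a, xi_b = 2 * u_a - 1, 2 * u_b - 1           # xi = u - v
    # d and h of each gene after averaging over the genotypes of the other gene
    dA = d_a + 2 * pi_b * j_ab + xi_b * i_ab
    hA = h_a + xi_b * j_ba + 2 * pi_b * l_ab
    dB = d_b + 2 * pi_a * j_ba + xi_a * i_ab
    hB = h_b + xi_a * j_ab + 2 * pi_a * l_ab
    D = 4 * pi_a * (dA - xi_a * hA) ** 2 + 4 * pi_b * (dB - xi_b * hB) ** 2
    H = 16 * pi_a ** 2 * hA ** 2 + 16 * pi_b ** 2 * hB ** 2
    I = 16 * pi_a * pi_b * (i_ab - xi_b * j_ab - xi_a * j_ba + xi_a * xi_b * l_ab) ** 2
    J = 64 * pi_a * pi_b * (pi_b * (j_ab - xi_a * l_ab) ** 2
                            + pi_a * (j_ba - xi_b * l_ab) ** 2)
    L = 256 * pi_a ** 2 * pi_b ** 2 * l_ab ** 2
    return D, H, I, J, L


def genetic_variance(u_a, u_b, d_a, d_b, h_a, h_b, i_ab, j_ab, j_ba, l_ab):
    """Variance of genotypic values over the nine genotypes."""
    def hw(u):
        return {1: u * u, 0: 2 * u * (1 - u), -1: (1 - u) ** 2}
    fa, fb = hw(u_a), hw(u_b)
    mean = sq = 0.0
    for x_a, x_b in product((1, 0, -1), repeat=2):
        y_a, y_b = int(x_a == 0), int(x_b == 0)
        g = (d_a * x_a + d_b * x_b + h_a * y_a + h_b * y_b + i_ab * x_a * x_b
             + j_ab * x_a * y_b + j_ba * y_a * x_b + l_ab * y_a * y_b)
        w = fa[x_a] * fb[x_b]
        mean += w * g
        sq += w * g * g
    return sq - mean ** 2


params = dict(u_a=0.7, u_b=0.35, d_a=1.0, d_b=0.6, h_a=0.8, h_b=-0.3,
              i_ab=0.4, j_ab=-0.5, j_ba=0.25, l_ab=0.6)   # arbitrary values
D, H, I, J, L = table57_components(**params)
expected = D / 2 + H / 4 + I / 4 + J / 8 + L / 16
print(expected, genetic_variance(**params))
```

Setting $u_a = u_b = \tfrac{1}{2}$ makes every $\xi$ vanish and every $\pi$ equal $\tfrac{1}{4}$, so the components collapse to $D = S(d_a + \tfrac{1}{2}S_b j_{ab})^2$, $H = S(h_a + \tfrac{1}{2}S_b l_{ab})^2$, $I = S(i^2)$, $J = S(j^2)$ over both $j$'s, and $L = S(l^2)$, the F2 forms.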
It will be seen from the table that $I_R$ appears whenever $D_R$ is present in a variance or covariance, while $J_R$ and $L_R$ appear whenever $H_R$ is present. It is thus difficult to separate $I_R$ from $D_R$, and $J_R$ and $L_R$ from $H_R$, and this necessarily aggravates the problem we have already met in separating the E components of variation from $D_R$ and $H_R$. In an F2 and its descendants we can detect interaction by the changes in value of $D$ and $H$ with generations, but this approach is not available to us with a randomly breeding population since all the variances and covariances we
obtain from the population itself or from the test matings made in it
are the equivalent of first generation statistics, and if we go on to pro-
duce the equivalent of second or later generations we run into the diffi-
culties we have already seen to arise in partially inbred populations.
Nor, for reasons which we saw at the beginning of this chapter, can we
use that most powerful means of all for detecting non-allelic interaction,
the scaling test. Thus our estimates of DR and HR are subject to distor-
tion both by the difficulty of separating IR from DR and JR and LR from
HR , and by the direct impact of the i's, j's and l's on the DR and HR
themselves, while at the same time the presence of the interaction caus-
ing the distortion may pass undetected, save in special cases. This is a
subject worthy of more attention than it has yet received.

35. Heritability
The proportion that the heritable variation constitutes of the total phenotypic variation of a character in a population is commonly referred to as the heritability of that character. The heritability is generally denoted by $h^2$, but to avoid confusion with $h$ and $h^2$ as we have been using them, we will here denote it by $T$. A distinction is further drawn between what are termed the 'narrow' heritability and the 'broad' heritability. The former is the proportion that the additive genetic variation constitutes of the total variation, and the latter is the proportion that all the heritable or genotypic variation constitutes of the total. Thus where both additive and dominance variation are present (but leaving aside non-allelic interaction) the narrow heritability in a population is $T_n = \tfrac{1}{2}D_R/(\tfrac{1}{2}D_R + \tfrac{1}{4}H_R + E_w + E_b)$, while the broad heritability is $T_b = (\tfrac{1}{2}D_R + \tfrac{1}{4}H_R)/(\tfrac{1}{2}D_R + \tfrac{1}{4}H_R + E_w + E_b)$. Where dominance variation is absent, $T_n = T_b = \tfrac{1}{2}D_R/(\tfrac{1}{2}D_R + E_w + E_b)$. It should be noted that non-allelic interaction, like dominance, can change $T_b$ without altering $T_n$ to a corresponding extent.
The heritability, and particularly the narrow heritability, $T_n$, provides a convenient summary of the situation with regard to the distribution of variation between the genetic and the non-genetic within the population. It is easily measured as the ratio that twice the parent/offspring covariance ($W_{POR} = \tfrac{1}{4}D_R$) bears to the variance of the individuals in the population, provided that any non-heritable covariance between parent and offspring can be shown to be negligible, or can be made negligible, or can be measured and deducted from $W_{POR}$ to leave a direct estimate of $\tfrac{1}{4}D_R$. Furthermore, once we know the value of $T_n$ it can be
used to predict the response of the population to certain types of selection. Thus if we select that group of individuals which has a greater expression of the character than the remaining group of unselected individuals and then breed them together, the mean expression of the offspring so obtained will exceed that of the population by $R = T_n S$, where
R is referred to as the response to selection and S, the intensity of selec-
tion, is the amount by which the mean of the selected parents exceeds
that of the population (see Falconer, 1960). As Falconer points out,
this prediction of selective response will hold good in detail only where
a number of other conditions apply, for example, that there is no non-
allelic interaction and the scale of measurement is adequate. In any case
the predictions can be expected to be valid only in the short-term, since
response to selection must itself imply changes of gene frequency, includ-
ing some gene fixation. Nevertheless predictions of this kind have proved
to hold good, at least to a first approximation, in a high proportion of
cases.
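The two heritabilities and the predicted response $R = T_n S$ amount to a few lines of arithmetic. The sketch below uses the rounded maize estimates given later in this section ($D_R = 0.009$, $H_R = 0.010$, $E_w + E_b = 0.013$); the selection differential is a hypothetical value, and the function names are our own. It also confirms that, with no non-heritable parent/offspring covariance, twice $W_{POR} = \tfrac{1}{4}D_R$ divided by the phenotypic variance recovers $T_n$.

```python
def heritabilities(D_R, H_R, E):
    """Narrow and broad heritability, ignoring non-allelic interaction.

    E is the total non-heritable variance, E_w + E_b.
    """
    V = 0.5 * D_R + 0.25 * H_R + E          # phenotypic variance
    return 0.5 * D_R / V, (0.5 * D_R + 0.25 * H_R) / V


def narrow_from_parent_offspring(W_POR, V):
    """T_n as twice the parent/offspring covariance over the phenotypic variance."""
    return 2 * W_POR / V


# Rounded maize estimates quoted later in this section.
D_R, H_R, E = 0.009, 0.010, 0.013
T_n, T_b = heritabilities(D_R, H_R, E)
V = 0.5 * D_R + 0.25 * H_R + E

S = 2.0          # hypothetical selection differential, in units of the character
R = T_n * S      # predicted short-term response, R = T_n S
print(T_n, T_b, narrow_from_parent_offspring(D_R / 4, V), R)
```

These figures give $T_n = 0.225$ and $T_b = 0.35$, so a selection differential of 2 units would be expected to give a first-generation response of about 0.45 units, subject to all the caveats just listed.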
The uses to which the concept of heritability can be put should not, however, blind us to its limitations. These stem ultimately from two of its features. In the first place it is a ratio, in the case of $T_n$ the ratio of the additive genetical variation to the total phenotypic variation of the population. It depends therefore not just on the amount of heritable variation
in the population, but also on the amount of non-heritable. The herita-
bility can be raised not only by injecting more genic variation into the
population but also by making more stable the environment in which
the individuals are raised and develop. Equally it can be lowered by
raising the non-heritable variation as well as by reducing the heritable.
Thus, while the heritability may be a convenient summary of the situation for some comparisons or uses, it can never give as clear and informative a picture as the estimates of the components of variation, $D_R$, $H_R$ and $E$. Given such estimates we can easily construct $T_n$ or $T_b$, whichever we need, should we need it, and at the same time we have comprehensive information which provides an understanding beyond anything
to be obtained from heritabilities and their comparison.
The second limitation of the concept of heritability stems from the properties of the genetical components of variation, especially $D_R$, of which it is compounded. As we have already noted, since $D_R = S\,4uv[d + (v - u)h]^2$, it cannot give us information about the genetical potentialities of the population in the way that $D = S(d^2)$ can do for the descendants of a cross between two inbred lines. The value of $D_R$ not only varies with the gene frequencies as a result of the general factor $uv$ that
it contains for each gene difference, but it also depends on the term $(v - u)h$ which is included with $d$. Now if the more common of two alleles is dominant, $v < u$ when $h$ is positive and $v > u$ when $h$ is negative. In either case $(v - u)h$ will be negative and $d + (v - u)h$ will be less than $d$. In the same way, when the less common allele is dominant, $(v - u)h$ will be positive and $d + (v - u)h$ will be greater than $d$. We can illustrate the effect of this relationship by reference to the data of Robinson et al. (1949) on yield in maize, which we analysed in Section 33. Although we used the data there to illustrate the analysis of a population by means of the NC1 experimental design, the results were in fact derived from an F2 where of course all $u = v = \tfrac{1}{2}$, giving $D_R = D = S(d^2)$ and $H_R = H = S(h^2)$. We found $D_R = D = 0.009$, $H_R = H = 0.010$ and $E_w + E_b = 0.013$. Approximating these findings by setting $D = H = E_w + E_b = 0.01$ for ease of presentation, we note that if all the genes in the system are alike in their effects, $h/d = \sqrt{H/D} = 1$ and $d = h$. Then, assuming that $u$ and $v$ are the same for all genes, we can calculate $T_n$ and $T_b$ for any gene frequency that we choose. The relations of $T_n$ and $T_b$ to $u$, so obtained, are shown in Fig. 21, from which we see that $T_n$ becomes increasingly small as $u$
[Fig. 21: heritability (%, vertical axis, 0 to 50) plotted against gene frequency u (horizontal axis), showing curves for T_n and T_b.]
Fig. 21. Effect of gene frequency, $u$, on the narrow ($T_n$) and broad ($T_b$) heritabilities, in %, in a randomly breeding population, where $Sd^2 = Sh^2 = E = 1$. $d$, $h$ and $u$ are assumed to be the same for all gene pairs.
increases above 0.5, and in particular becomes relatively very small as $u$ rises to 0.8 or more. When, however, $u < 0.5$, $T_n$ can rise to 3/2 the value it has at $u = v = \tfrac{1}{2}$, before falling away towards 0 as $u$ approaches 0. Thus when $u > 0.5$, $T_n$ will always underestimate the fixable genetic variation and will grossly underestimate it as $u$ approaches 1. When $u < 0.5$, $T_n$ can materially overestimate the fixable genetic variation until $u$ gets fairly close to 0. If we had taken $h = -d$, which is also consonant with the data, the same pair of curves would have been obtained but with $v = 1 - u$ replacing $u$ along the abscissa.
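The curves of Fig. 21 can be regenerated from the definitions of $D_R$ and $H_R$. The sketch below makes the figure's assumptions explicit: every gene shares the same $u$ and has $h = d$, scaled so that $Sd^2 = Sh^2 = E = 1$, so the sums collapse to a single term with $d = h = 1$. The function name is our own.

```python
def fig21_heritabilities(u, E=1.0):
    """T_n and T_b for gene frequency u under the Fig. 21 assumptions.

    Every gene is assumed to share the same u and to have h = d, scaled so that
    S d^2 = S h^2 = 1; the sums then reduce to one term with d = h = 1.
    """
    v = 1 - u
    D_R = 4 * u * v * (1 + (v - u)) ** 2   # S 4uv[d + (v - u)h]^2
    H_R = 16 * u ** 2 * v ** 2             # S 16u^2 v^2 h^2
    V = 0.5 * D_R + 0.25 * H_R + E
    return 0.5 * D_R / V, (0.5 * D_R + 0.25 * H_R) / V


for u in (0.1, 0.2, 0.3, 0.5, 0.7, 0.8, 0.9):
    T_n, T_b = fig21_heritabilities(u)
    print(f"u = {u:.1f}: T_n = {100 * T_n:4.1f}%, T_b = {100 * T_b:4.1f}%")
```

At $u = 0.5$ this gives $T_n = 2/7$ (about 28.6%) and $T_b = 3/7$ (about 42.9%); $T_n$ peaks near $u \approx 0.22$ at very nearly 3/2 of its value at $u = 0.5$, and by $u = 0.8$ it has fallen below 5%.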
In both cases the abscissa is the frequency of the dominant allele and
Tn always gives an underestimate of the fixable genetic variation when
the dominant gene is the more common; although it generally overesti-
mates it when this allele is the less common. Such evidence as we have
suggests that the dominant allele tends to be the more common in popu-
lations. We must expect therefore that although Tn may tell us how the
population will respond to simple mass selection, it will underestimate
the changes that can be obtained if we set about our breeding programme
in a different way. If, for example, instead of applying mass selection to
the population, we first of all raise from it a number of at least partially
inbred lines, choose the best of these, cross them together in pairs and
select further from their F2s, progress can be made going well beyond
anything that our estimate of Tn would suggest. Experience in breeding
maize, for example, accords with this expectation.
One last point remains to be made. If we have estimates of both $T_n$ and $T_b$, we can find $T_b - T_n = \tfrac{1}{4}H_R/(\tfrac{1}{2}D_R + \tfrac{1}{4}H_R + E_w + E_b)$, and this can be compared with $\tfrac{1}{2}T_n = \tfrac{1}{4}D_R/(\tfrac{1}{2}D_R + \tfrac{1}{4}H_R + E_w + E_b)$ to give us an estimate of $H_R/D_R$. In our example, $H_R/D_R$ is always greater than 1 when the
frequency of the dominant allele is greater than 0.5. If we failed to re-
member the composite nature of DR' we would be in danger of taking
this as evidence of preponderant over-dominance of the genes in the popu-
lation, when no such over-dominance was, in fact, present.
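The trap can be made concrete. Under the Fig. 21 assumptions ($d = h$ for every gene), our own simplification using $1 + v - u = 2v$ gives $D_R = 16uv^3$ and $H_R = 16u^2v^2$, so the ratio recovered from the heritabilities is simply $u/v$: above 1 whenever $u > 0.5$, although $h = d$ and there is no over-dominance at all. A sketch, with function names of our own:

```python
def h_over_d(T_n, T_b):
    """Estimate H_R/D_R from the two heritabilities: (T_b - T_n) / (T_n / 2)."""
    return (T_b - T_n) / (0.5 * T_n)


def fig21_heritabilities(u, E=1.0):
    """Fig. 21 model: equal d = h for every gene, S d^2 = S h^2 = E = 1."""
    v = 1 - u
    D_R = 16 * u * v ** 3      # 4uv[d + (v - u)h]^2 with d = h = 1, as 1 + v - u = 2v
    H_R = 16 * u ** 2 * v ** 2
    V = 0.5 * D_R + 0.25 * H_R + E
    return 0.5 * D_R / V, (0.5 * D_R + 0.25 * H_R) / V


for u in (0.3, 0.5, 0.7, 0.9):
    ratio = h_over_d(*fig21_heritabilities(u))
    print(f"u = {u}: apparent H_R/D_R = {ratio:.2f} (u/v = {u / (1 - u):.2f})")
```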
Genes and effective factors

36. Estimating the number of segregating genes


In the absence of non-allelic interaction the mean phenotypes of two
true-breeding lines may, as we have seen in Chapter 3, be represented as
m + [d] and m - [d] respectively, where m is the mid-parent value and
[d] is the sum of the d increments of all the genes in which the lines
differ. Sign is taken into account in finding [d] to accommodate the
association in the two lines of the - alleles at some loci with the + alleles
at others. Where, however, the + alleles at all the $k$ loci in which the lines differ are associated in one parent and all the - alleles in the other, $[d] = d_a + d_b + \dots + d_k = S(d)$, and this becomes $kd$ where all the gene differences are of equal effect, that is $d_a = d_b = \dots = d_k = d$. Thus with complete association of like alleles and with all the gene differences having equal effects, the mean phenotypes of the two lines will differ by $2S(d) = 2kd$. Now in the absence of linkage $D = S(d^2) = kd^2$, and if we divide the square of half the parental difference by $D$ we find

$\dfrac{[d]^2}{D} = \dfrac{k^2 d^2}{k d^2} = k$

and we have an estimate of $k$, the number of genes in which the two lines differ.
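In code the estimate is a single expression. The sketch below (hypothetical function name and illustrative numbers) takes $[d]$ as half the difference between the two parental means and divides its square by $D$; it gives $k$ exactly only when all four assumptions listed next hold.

```python
def estimate_k(mean_p1, mean_p2, D):
    """Estimate the number of gene differences as [d]^2 / D.

    [d] is taken as half the difference between the means of the two
    true-breeding parents; D is the additive component estimated from
    their segregating descendants.
    """
    half_difference = (mean_p1 - mean_p2) / 2
    return half_difference ** 2 / D


# Hypothetical check: m = 10 and k = 4 equal genes with d = 1.5, all + alleles
# in one parent, so the parents score 10 + 6 = 16 and 10 - 6 = 4, and D = 4 * 1.5**2.
print(estimate_k(16.0, 4.0, 4 * 1.5 ** 2))   # prints 4.0
```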
In arriving at this estimate of k we have made four assumptions, that:
(a) there is no non-allelic interaction,
(b) the gene differences are of equal effect,
(c) there is complete association of like alleles in the parents,
(d) there is no linkage of the genes.
What are the consequences on the estimate of k if these assumptions fail?
Taking non-allelic interaction first, it will be recalled from Section 20
that when allowance is made for such interaction the means of the two
parental lines become $m + [d] + [i]$ and $m - [d] + [i]$. So half the parental difference is still $[d]$, and no complication is introduced into the numerator of the fraction which yields our estimate of $k$. Turning to the denominator, however, we note that $D = S(d_a + \tfrac{1}{2}S_b j_{ab})^2$ in F2 and S3, and it will exceed $S(d_a^2)$ or fall short of it according to the preponderant sign of the $j$'s, and by an amount which will depend also on the extent and magnitude of this interaction (see Section 21). The estimate of $k$ can thus be biased upwards or downwards by $j$ interaction. If we have the data for estimating $D$ in more than one generation we may be able to correct it for the effect of the interaction, since in F3, $D = S(d_a + \tfrac{1}{4}S_b j_{ab})^2$, and in F4 it changes further to $S(d_a + \tfrac{1}{8}S_b j_{ab})^2$, so allowing us to extrapolate to $S(d_a^2)$. An extensive set of observations would be necessary for such a procedure, and no attempt has yet been made to find $k$ in the known presence of non-allelic interaction.
Turning next to the assumption of equality of gene effects, we note that if these effects are not in fact equal we can define $\bar{d}$ as their average and then write $d_a = \bar{d}(1 + \sigma_a)$, $d_b = \bar{d}(1 + \sigma_b)$ and so on. It can then be shown that our estimate of $k$ becomes $\hat{k} = k/(1 + V_\sigma)$, where $V_\sigma$ is the variance of $\sigma$ or, equally, the variance of $d/\bar{d}$ (see M and J, p. 309). Thus inequality of the gene effects must always lead to an underestimate of $k$.
To take an example, where there are three gene differences of equal effect, $d$ being 2 for each of them ($d_a = d_b = d_c = 2$), with the + alleles all in one parent and the - alleles in the other (shown as 2/-2, 2/-2, 2/-2 in Table 58), $[d] = 2 + 2 + 2 = 6$ and $D = S(d^2) = 12$, giving $\hat{k} = 6^2/12 = 3$, which of course equals the true $k$. If, however, we have three genes of unequal effects, with $d_a = 3$, $d_b = 2$, $d_c = 1$, again with complete association of like alleles in the parents (shown as 3/-3, 2/-2, 1/-1 in Table 58), $[d] = 3 + 2 + 1 = 6$ as before, but $D = 3^2 + 2^2 + 1^2 = 14$, giving $\hat{k} = 6^2/14 = 2.57$, so underestimating $k$. In this case $\bar{d} = \tfrac{1}{3}(3 + 2 + 1) = 2$, and $d_a = 2(1 + \tfrac{1}{2})$, $d_b = 2(1 + 0)$ and $d_c = 2(1 - \tfrac{1}{2})$, giving $\sigma_a = \tfrac{1}{2}$, $\sigma_b = 0$, $\sigma_c = -\tfrac{1}{2}$ and $V_\sigma = \tfrac{1}{3}[(\tfrac{1}{2})^2 + 0^2 + (-\tfrac{1}{2})^2] = \tfrac{1}{6}$. Then $\hat{k} = k/(1 + V_\sigma) = 3/(1 + \tfrac{1}{6}) = 2.57$, as already found.
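Both routes to the estimate in this example, $[d]^2/D$ directly and $k/(1 + V_\sigma)$, can be compared in a short sketch. It assumes complete association and no linkage; `statistics.pvariance` is used because $V_\sigma$ is defined with divisor $k$ (the population form), and the function names are our own.

```python
from statistics import pvariance


def k_hat(effects):
    """[d]^2 / D with all + alleles in one parent (complete association, no linkage)."""
    return sum(effects) ** 2 / sum(d * d for d in effects)


def k_hat_via_v_sigma(effects):
    """The same estimate written as k / (1 + V_sigma)."""
    k = len(effects)
    d_bar = sum(effects) / k
    v_sigma = pvariance([d / d_bar for d in effects])   # variance of d / d-bar, divisor k
    return k / (1 + v_sigma)


for effects in ([2, 2, 2], [3, 2, 1]):
    print(effects, round(k_hat(effects), 2), round(k_hat_via_v_sigma(effects), 2))
```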
Incomplete association of like alleles also leads to an underestimate, and generally a much greater underestimate, of $k$, since $[d]$ is necessarily less than $S(d)$. If we write $S(d_+)$ for the summed effects of the genes whose + alleles are present in the larger parent, and $S(d_-)$ for the summed effects of those whose - alleles are also present in that parent, $[d] = S(d_+) - S(d_-) = S(d) - 2S(d_-)$, and we can obtain a measure of the degree of association by setting $r_d = [S(d) - 2S(d_-)]/S(d)$. This will of course be 1 when association is complete and 0 when dispersion of like alleles between the two parents is at its effective maximum. The estimate of $k$ thus becomes $\hat{k} = [d]^2/D = [r_d S(d)]^2/D = k r_d^2$, which must tend to be an underestimate since $r_d$ lies between 1 and 0. If the assumptions of equal gene effects and complete association fail simultaneously, it can be shown (M and J, p. 310) that $\hat{k} = k r_d^2/(1 + V_\sigma)$, with the inequality of effects and the incompleteness of association reinforcing one another in reducing $\hat{k}$.

TABLE 58.
The consequences of inequality of gene effects and incomplete association of like alleles for the estimate of the number of gene differences.
[Note: The effects of the three gene differences and the distribution of alleles between the parents are shown in the left-hand column, each entry giving the allele of that gene in one parent over the allele in the other. Thus, for example, 2/-2, 2/-2, 2/-2 indicates that all gene differences are of equal effect (all d = 2) with the + alleles concentrated in one parent and the - alleles in the other; while 3/-3, -2/2, 1/-1 indicates gene differences of unequal effect (d = 3, 2 and 1 respectively) with the - allele of the second gene associated with the + alleles of the other two.]

Alleles             Equal     Complete       [d]   r_d   D    Vσ    est. k
                    effects   association
2/-2, 2/-2, 2/-2    v         v              6     1     12   0     3.00
3/-3, 2/-2, 1/-1    f         v              6     1     14   1/6   2.57
2/-2, 2/-2, -2/2    v         f              2     1/3   12   0     0.33
3/-3, 2/-2, -1/1    f         f              4     2/3   14   1/6   1.14
3/-3, -2/2, 1/-1    f         f              2     1/3   14   1/6   0.29
3/-3, -2/2, -1/1    f         f              0     0     14   1/6   0.00

est. k = [d]²/D = k r_d²/(1 + Vσ). In all cases k = 3 and d̄ = 2; v = assumption valid; f = assumption invalid.
We can illustrate the consequences of incomplete association by reference to the basic example already used to illustrate the consequences of inequality of effects. With three genes of equal effect, all $d = 2$, but with two of their + alleles associated with the - allele of the third (2/-2, 2/-2, -2/2 in Table 58), $[d] = 2 + 2 - 2 = 2$ while $D = 12$ as before. Then $\hat{k} = 2^2/12 = 0.33$, whereas of course $k$ still is 3. Looking at it in the alternative way, $r_d = \tfrac{1}{6}[6 - (2 \times 2)] = \tfrac{1}{3}$ and $\hat{k} = k r_d^2 = 3 \times (\tfrac{1}{3})^2 = 0.33$. With the genes of unequal effects, $d_a = 3$, $d_b = 2$, $d_c = 1$, there are three possible distributions between the parents, as shown in Table 58, and each gives its own characteristic underestimate of $k$. Thus 3/-3, 2/-2, -1/1 has $[d] = 3 + 2 - 1 = 4$ and $D = 14$ as in the earlier example, giving $\hat{k} = 4^2/14 = 1.14$. Put the other way, $r_d = \tfrac{1}{6}[6 - (2 \times 1)] = \tfrac{2}{3}$, with $V_\sigma = \tfrac{1}{6}$ as found in the earlier example, so giving $\hat{k} = k r_d^2/(1 + V_\sigma) = [3 \times (\tfrac{2}{3})^2]/(1 + \tfrac{1}{6}) = 1.14$. The values of $[d]$, $D$, $r_d$, $V_\sigma$ and $\hat{k}$ are also shown in the table for the other two possible distributions of the alleles.
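Every row of Table 58 can be rebuilt from the signed allele effects alone. In the sketch below each distribution is entered as the signed effects carried by one parent ($+d$ for a + allele, $-d$ for a - allele), a convention of our own; $[d]$ is measured from the larger parent, and the estimate is computed both as $[d]^2/D$ and as $k r_d^2/(1 + V_\sigma)$ to show that the two forms agree.

```python
from statistics import pvariance


def table58_row(signed_effects):
    """Return [d], r_d, D, V_sigma and the estimate of k for one allele distribution.

    signed_effects gives, for each gene, the effect of the allele carried by
    one parent: +d if it carries the + allele, -d if it carries the - allele.
    """
    d = [abs(x) for x in signed_effects]
    S_d = sum(d)
    bracket_d = abs(sum(signed_effects))     # [d], measured from the larger parent
    r_d = bracket_d / S_d                    # degree of association of like alleles
    D = sum(x * x for x in d)
    d_bar = S_d / len(d)
    v_sigma = pvariance([x / d_bar for x in d])
    k_est = bracket_d ** 2 / D
    return bracket_d, r_d, D, v_sigma, k_est


rows = [(2, 2, 2), (3, 2, 1), (2, 2, -2), (3, 2, -1), (3, -2, 1), (3, -2, -1)]
for row in rows:
    bracket_d, r_d, D, v_sigma, k_est = table58_row(row)
    k_formula = len(row) * r_d ** 2 / (1 + v_sigma)   # k r_d^2 / (1 + V_sigma)
    print(row, bracket_d, f"{r_d:.2f}", D, f"{v_sigma:.3f}",
          f"{k_est:.2f}", f"{k_formula:.2f}")
```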

37. Consequences of linkage: effective factors


The fourth assumption we made in arriving at our estimate of the num-
ber of genes was that the genes were unlinked, and we must now con-
sider the effects of linkage. Now, as we saw in Section 22, linkage has
no effect on family means, provided there is no non-allelic interaction
of the linked genes, but it does, however, affect $D$, which no longer is $S(d^2)$ but includes terms in $d_a d_b$ and $p$, the recombination value. We
can illustrate the consequences of this change in D for our estimate of
k by considering the case of two genes A-a and B-b, where da = 3, db = 1
and the recombination value is p.
Two distributions are possible of the genes between the parent lines. In one, the like alleles are associated in the two parents, which are thus AABB and aabb, or 3/-3, 1/-1 in the notation of the previous section. In the other the genes are dispersed, the parents being AAbb and aaBB, or 3/-3, -1/1 in the same notation. The associated distribution will lead to coupling linkage in F1 and so may be denoted by C, while the dispersed distribution will give repulsion linkage and so may be denoted by R.
With the C arrangement $[d] = d_a + d_b = 3 + 1 = 4$ and $D = d_a^2 + d_b^2 + 2d_a d_b(1 - 2p) = 3^2 + 1^2 + 2 \cdot 3 \cdot 1(1 - 2p) = 16 - 12p$, and with the R arrangement $[d] = 3 - 1 = 2$ and $D = d_a^2 + d_b^2 - 2d_a d_b(1 - 2p) = 4 + 12p$. We thus find $\hat{k}_C = 4^2/(16 - 12p)$ and $\hat{k}_R = 2^2/(4 + 12p)$. In the absence of linkage $p = 0.5$, and $\hat{k}_C = 4^2/10 = 1.6$ and $\hat{k}_R = 2^2/10 = 0.4$, the departure from the true value of $k = 2$ being due partly, as with $\hat{k}_C$, to the inequality of $d_a$ and $d_b$, but now chiefly to the dispersion of like alleles between the parents.
When, however, linkage is complete, $p = 0$, and $\hat{k}_C = 4^2/16 = 1$ while $\hat{k}_R = 2^2/4 = 1$. No matter whether the genes are in the coupling or the repulsion arrangement, we now arrive at the conclusion that there is but one gene difference between the parents, as indeed we should, since two completely linked genes are effectively a single unit of inheritance. The difference between the two cases is in the effects produced by the two alleles of the composite unit; with coupling they are AB and ab, giving the components $[d] = 3 + 1 = 4$ and $D = (3 + 1)^2 = 16$, while with repulsion they are Ab and aB, giving $[d] = 3 - 1 = 2$ and $D = (3 - 1)^2 = 4$.
With two genes linked but some recombination between them, values
of k are obtained intermediate between 1.0 and 1.6 for coupling and be-
tween 1.0 and 0.4 for repulsion, as illustrated in Fig. 22 where k is plotted

[Fig. 22: graph of k (vertical axis, 0 to 2.0) against p (horizontal axis, 0 to 0.5), with one curve for C and one for R.]

Fig. 22. Effect of linkage on the estimate, k, of the number of units of
inheritance where two genes, with $d_a = 3$ and $d_b = 1$, show the recombination
frequency p. C indicates the coupling (that is, like alleles associated) and R
the repulsion (that is, like alleles dispersed) arrangements of the genes.
204 Genes and effective factors
against p. This is of course to be expected but we should note that with
tight linkage k lies close to 1, and even with p as high as 0.1, k is still
close to 1, especially with coupling where it is 1.08, although even with
repulsion it has fallen only to 0.77. Thus even where recombination
occurs, the two genes still appear more like a single unit of inheritance
than like two, unless the linkage is loose and recombination fairly fre-
quent. Where linkage is reasonably tight therefore we are estimating not
the number of genes but the number of effective units of inheritance or
effective factors as they are termed. We should note further that with
reasonably tight linkage k is much the same whether measured from the
coupling or the repulsion cross. Thus with $p = 0.05$, $k_C = 1.04$ and
$k_R = 0.87$. The difference between the two cases lies not so much in
the number of effective factors as in the average effect of that factor:
with coupling $[d]_C = 4$ and $d_C = [d]_C/k_C = 4/1.04 = 3.85$, while with
repulsion $[d]_R = 2$ and $d_R = [d]_R/k_R = 2/0.87 = 2.30$. This is of course a
very simple example that we have taken for illustrative purposes. Clearly,
however, the same principle will hold where a greater number of genes
are linked and so aggregated into a single effective factor. At the same
time the number of possible arrangements of the genes in relation to
one another is much greater and the change in the effect of the factor
from the most dispersed to the most associated will be correspondingly
greater. The same principle will hold also where more than one group
of linked genes is segregating. Thus, for example, with four genes falling
into two groups, each comprising two genes with $d_a = 3$ and $d_b = 1$, and
$p = 0.05$ in both cases, the two groups being unlinked with each other,
we should find $k = 2 \times 0.87 = 1.74$ and $d = 2.30$ when both groups
were in the dispersed arrangement, and $k = 2 \times 1.04 = 2.08$ and
$d = 3.85$ when both were in the associated arrangement.
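A sketch extending the same calculation to effective factors: it computes the average effect of a factor, $d = [d]/k$, and then combines two unlinked groups. It relies on the assumption, valid for unlinked groups without interaction, that $[d]$ and $D$ are simple sums over the groups. The helper names `two_gene_group` and `effective_factors` are hypothetical.

```python
def two_gene_group(d_a: float, d_b: float, p: float, coupling: bool) -> tuple[float, float]:
    """Return ([d], D) contributed by one linked pair of genes (additive model)."""
    sign = 1.0 if coupling else -1.0
    d_bracket = d_a + sign * d_b
    D = d_a**2 + d_b**2 + sign * 2 * d_a * d_b * (1 - 2 * p)
    return d_bracket, D


def effective_factors(groups: list[tuple[float, float]]) -> tuple[float, float]:
    """Combine unlinked groups, whose [d] and D simply add.

    Returns (k, d): k = [d]^2 / D estimates the number of effective
    factors and d = [d] / k their average effect.
    """
    d_bracket = sum(g[0] for g in groups)
    D = sum(g[1] for g in groups)
    k = d_bracket**2 / D
    return k, d_bracket / k


for coupling in (True, False):
    one = two_gene_group(3, 1, 0.05, coupling)
    k1, d1 = effective_factors([one])
    k2, d2 = effective_factors([one, one])
    print(f"coupling={coupling}: one group k={k1:.2f}, d={d1:.2f}; "
          f"two groups k={k2:.2f}, d={d2:.2f}")
```

Doubling the number of groups doubles $k$ but leaves $d$ unchanged, which is the pattern described in the text: the arrangement within groups governs the size of the factor, not the count.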
So, if we cross two parental lines differing at a number of loci which
fall into linked groups, and with the alleles at the loci within the groups
preponderantly in the dispersion arrangement, and select for high and
low expressions of the character in the descendants of the cross, we
expect to pick up and fix recombinants within the groups and so to
have replaced the preponderantly dispersed arrangements of the parental
groups by preponderantly associated arrangements in the selected lines.
Then on estimating k from the cross between the selected lines we would
expect to find k much the same as that found from the cross of the
parent lines themselves, but with d increased to an extent corresponding
to the effectiveness of the selection in raising and lowering the ex-
pression of the character in the high and low selective lines respectively.
This is well illustrated by an experiment described by Mather (1941) in
which two lines of Drosophila melanogaster were crossed. Beginning with
the F2, selection was practised over thirteen generations for an increased
number and over twelve generations for a decreased number of abdomi-
nal chaetae. The selected lines were then crossed with each other and an
F2 raised.
The results of this experiment are summarized in Table 59, where the
means and variances shown are the averages of males and females. The
mean numbers of abdominal chaetae ($\bar{P}_1$ and $\bar{P}_2$) are shown for
the two lines that were crossed together, both for the original cross,
with which the experiment was started, and for the cross between the two
selected lines, high and low, derived from that original cross.

TABLE 59.
k and d in the original lines and the selection lines derived
from their cross, in a selection experiment for abdominal chaetae
in Drosophila melanogaster (Mather, 1941)

Cross            P̄1     P̄2     [d]     VE      V1F2     D        k     d
Original lines   42.24  39.77  1.235   6.412   6.932    1.040    1.5   0.84
Selected lines   46.12  32.85  6.635   7.544   17.469   19.850   2.2   2.99

VE = ¼VP̄1 + ¼VP̄2 + ½VF1

In each case the non-heritable component of $V_{1F2}$, the variance of the
F2, was estimated by combining the variances of P1, P2 and F1 in the F2
proportions, thus $V_E = \frac{1}{4}V_{\bar{P}_1} + \frac{1}{4}V_{\bar{P}_2} + \frac{1}{2}V_{F_1}$.
$V_{1F2} - V_E$ is taken as an estimate of $\frac{1}{2}D$. This assumes that $H$
is 0 and so almost certainly overestimates $D$, but the overestimation is
unlikely to be serious, since there was little evidence of dominance in
these crosses, and in any case $H$ makes only half the contribution of $D$
to $V_{1F2}$. Nevertheless, to the extent that $H$ exceeded 0, $k$ will be an
underestimate, although the bias will be equal for the two crosses unless
the dominance ratio $H/D$ differs between them. It should be noted too
that the estimate of $D$, and hence that of $k$, will be less precise in the
case of the original cross, since the small difference between $V_{1F2}$ and
$V_E$, from which $D$ is found, will render it subject to sampling variation
proportionately much greater than in the cross between the selected lines,
where the difference between $V_{1F2}$ and $V_E$ is much larger.
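The derived columns of Table 59 can be recomputed from the tabulated means and variances. The sketch assumes, as the tabulated values imply, that $[d] = \frac{1}{2}(\bar{P}_1 - \bar{P}_2)$ and $D = 2(V_{1F2} - V_E)$ (that is, with $H$ taken as 0); the dictionary layout and the function name `summarize` are merely illustrative.

```python
# Values transcribed from Table 59 (Mather, 1941): P1 mean, P2 mean, VE, V1F2.
crosses = {
    "Original lines": (42.24, 39.77, 6.412, 6.932),
    "Selected lines": (46.12, 32.85, 7.544, 17.469),
}


def summarize(p1: float, p2: float, v_e: float, v1f2: float) -> dict[str, float]:
    d_bracket = (p1 - p2) / 2      # [d]: half the difference between parental means
    D = 2 * (v1f2 - v_e)           # V1F2 - VE estimates D/2 when H is taken as 0
    k = d_bracket**2 / D           # number of effective factors
    return {"[d]": d_bracket, "D": D, "k": k, "d": d_bracket / k}


results = {name: summarize(*vals) for name, vals in crosses.items()}
for name, r in results.items():
    print(name, {key: round(val, 3) for key, val in r.items()})

# Ratio of average factor effects, selected cross over original cross.
ratio = results["Selected lines"]["d"] / results["Original lines"]["d"]
print(f"average factor effect increased about {ratio:.1f}-fold")
```

The ratio comes out a little above 3.5, in line with the "about 3½ times" quoted for the increase in the average effect of the factors.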
Despite these necessary provisos, however, the results are clear and
striking. In both crosses there are some two, or if we allow for the low-
ering of the estimate arising from inequality of their effects, perhaps
three effective factors, but in the cross of the selected lines the average
effect of the factors is about 3½ times as great as in the original cross.
The effect of selection has been to build up greatly the effects of the
units of inheritance that we can detect and whose number we can esti-
mate by biometrical methods.
These findings have a simple interpretation in terms of linked groups
of genes, and indeed as we have seen are to be expected on that basis.
They afford us the prime clue to our understanding of how selection
acts by rearranging linked combinations of the genes - polygenic com-
binations as they are called. They also emphasize to us the distinction
between the effective factors that we can detect and the genes that we
postulate and of which the factors are made up. Effective factors are
not genes which can change only by the process (or combination of
processes) that we term mutation. Their physical basis lies in the pieces
of chromosomes marked and delimited by the genes - all members of
the same polygenic system - through whose effects they are recognized.
And being pieces of chromosome, they can change their genic content
and hence their effects by recombination. They thus have a quality of
lability and hence of transience much greater than that of their
constituent genes, which can change only by mutation. True, they will be
changed by the mutations of their constituent genes, but this is a rarer
event than is the recombination which takes place within them, as many
experiments have shown. Recombination within effective factors rather
than mutation of their constituent genes is the basis for understanding
the reassortment of polygenic variability and hence of response to selec-
tion. It is a basis, too, which allows us to understand the way in which
selection appears to create the polygenic variability upon which response
to its impact depends (Mather, 1973) and this is reflected in the combi-
nations of constancy, or near constancy, of k with change in d.
Furthermore, since the basis of the effective factor is a piece of
chromosome, we must expect it to include not only a number of linked
genes which are members of the same polygenic system and hence affect-
ing the expression of the character through which the factor is recog-
nized, but also other genes, members of other polygenic systems affect-
ing other characters. The properties in action of an effective factor can
thus transcend the properties of the individual genes of which it is com-
posed, in at least two ways. First a factor comprising two or more genes
in a preponderantly dispersed arrangement, each of which is dominant
Other sources of estimates 207
in the same direction, can show overdominance as a factor even though
none of the individual genes shows overdominance. This is indeed one of
the classical explanations of the occurrence of heterosis in the F1 of two
inbred lines and of course by the same token also of inbreeding de-
pression. Secondly, taking into account the admixture of different poly-
genic combinations in the same piece of chromosome, the effective factor
can show pleiotropy in its action even though none of its constituent
genes shows pleiotropic action as an individual. Such a 'pleiotropy'
provides a basis for understanding the correlated responses to selection
that are so commonly and so extensively observed. But being a pleiotropy
that depends on linkage, it can be resolved by recombination; thus those
correlated expressions of two or more characters which we recognize as
correlated responses to selection can be, and indeed in experiment regu-
larly have been, resolved by giving time and opportunity for recombi-
nation to reassort the genic content of the effective factor (Mather, 1973).
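A toy numerical illustration of the first point, built on assumptions of our own: two completely linked genes, each with $d = 1$ and complete dominance ($h = d$) in the same direction, carried in the dispersed arrangement Ab and aB. Each chromosome then behaves as one allele of a single effective factor. Genotypic values are taken as sums of single-gene deviations from the mid-parent (no interaction); the `GENES` table and `genotype_value` function are hypothetical.

```python
# Hypothetical single-gene effects (d, h): complete dominance (h = d) at both loci,
# with the dominant (upper-case) allele increasing the character.
GENES = {"A": (1.0, 1.0), "B": (1.0, 1.0)}


def genotype_value(hap1: str, hap2: str) -> float:
    """Sum single-locus values for a two-locus genotype.

    Each haplotype is a string such as "Ab"; upper case marks the increasing,
    dominant allele. Per locus: +d if homozygous upper, -d if homozygous
    lower, h if heterozygous.
    """
    total = 0.0
    for a1, a2 in zip(hap1, hap2):
        d, h = GENES[a1.upper()]
        n_upper = int(a1.isupper()) + int(a2.isupper())
        total += {2: d, 1: h, 0: -d}[n_upper]
    return total


# Under complete linkage, Ab and aB act as the two alleles of one effective factor.
for genotype in [("Ab", "Ab"), ("aB", "aB"), ("Ab", "aB")]:
    print(genotype, genotype_value(*genotype))
```

Both factor homozygotes score 0 while the factor heterozygote scores 2, so the factor appears overdominant although neither gene is more than completely dominant.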

38. Other sources of estimates


The estimate of the number of effective factors that we have been
discussing ($K_1$, as Mather and Jinks term it) is but one of a number of
estimates that can be derived, given an appropriate body of data. One such
further estimate can be obtained from the dominance properties of the
genes. Where $[h]$ is the deviation of the F1 mean from the mid-parent, we
can find $k = [h]^2/H$, in just the same way as we have found $k = [d]^2/D$.
This further estimate has, however, no advantage over the one we have
been using: its properties and limitations are essentially the same, except
that it will be affected not by the association or dispersion of like alleles
between the parents, which is resolvable by recombination, but by the
reinforcement or opposition of the dominance of the different genes in
the system, which is not similarly resolvable. In this sense it is of less use
than $k = [d]^2/D$, and its inferiority is all the greater because, whatever the
uncertainty arising from the sampling variation of $[d]$ and $D$, that of
$k = [h]^2/H$ will be greater, since the sampling variations of $[h]$ and
especially $H$ will generally be greater than those of their counterparts.
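Why opposition of dominance depresses this estimate can be seen in a minimal sketch, assuming $[h] = S(h)$ and $H = S(h^2)$ (no linkage or interaction). Genes of equal $|h|$ give $k = [h]^2/H$ equal to their number when dominance is all in one direction, but a much smaller value when it is opposed. The $h$ values are invented for illustration.

```python
def k_from_dominance(h_values: list[float]) -> float:
    """k = [h]^2 / H, with [h] = sum of h and H = sum of h^2."""
    h_bracket = sum(h_values)
    H = sum(h * h for h in h_values)
    return h_bracket**2 / H


print(k_from_dominance([1, 1, 1, 1]))   # dominance all in one direction
print(k_from_dominance([1, 1, 1, -1]))  # one gene dominant in the opposite direction
```

Four genes give $k = 4$ in the first case but only $k = 1$ in the second, and no amount of recombination can alter this, since it arises from the genes themselves rather than from their arrangement.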
Of more interest are the estimates of $k$ arrived at in quite a different
way. If, for example, we have available the variances of a number of F3
families raised from different individuals of the F2, we can estimate $k$ as

$$k = \frac{({}_H V_{2F3})^2}{{}_H V_{VF3}}$$
where $V_{2F3}$ is of course the mean variance of these F3's, $V_{VF3}$ is the
variance of the variances, and the prefixed subscript $H$ denotes that it is
the heritable portion of the variances about which we are talking (M and J,
p. 311). This estimate is the $K_2$ of Mather and Jinks. Similar estimates can be
derived from the variances of groups of S3 and also second back-cross
families. This type of estimate has one great advantage over the esti-
mates we have been using: in the absence of linkage it is unaffected by
the association or dispersion of alleles in the parental lines, just as in
the absence of linkage D is unaffected by association or dispersion
although $[d]$ is. It has, however, two disadvantages over and above its
requirement for an F3 or similar generation to be raised. The first is
that it is more affected by inequality of the effects of the genes than is
the $k$ we have been using. This is, however, probably not so serious a
matter as the fact that to obtain it we have to estimate not just $V_{2F3}$
and $V_{VF3}$, but the heritable components of these variances, ${}_H V_{2F3}$
and ${}_H V_{VF3}$. To do so involves the use of a number of corrections
based on the estimates of non-heritable variation obtained from parents
and F1, and these corrections may not be small by comparison with the
F3 variances that they are used to correct. The estimate of k that is ulti-
mately obtained is thus likely to be subject to a proportionately greater
standard error and the confidence with which it can be used is corre-
spondingly reduced.
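As a check on the form of this estimate, the following sketch enumerates, for unlinked genes with additive effects only (no dominance and no non-heritable variation, simplifications that are ours), every F2 heterozygosity pattern and the variance within the F3 family it gives. A gene heterozygous in the F2 parent contributes $\frac{1}{2}d^2$ to that variance. The sketch then forms $k$ as the squared mean variance divided by the variance of the variances. The function name `k2_from_f3` is hypothetical; the full treatment (M and J) allows for dominance and environmental components.

```python
from itertools import product


def k2_from_f3(d_values: list[float]) -> float:
    """K2 = (mean F3 variance)^2 / (variance of F3 variances).

    Assumes unlinked genes, additive effects only and no non-heritable
    variation. Each gene is heterozygous in an F2 individual with
    probability 1/2, independently, and a gene heterozygous in the F2
    parent contributes d^2 / 2 to the variance within its F3 family.
    """
    n = len(d_values)
    weight = 0.5**n  # every heterozygosity pattern is equally likely
    variances = [
        sum(0.5 * d * d for d, het in zip(d_values, pattern) if het)
        for pattern in product((False, True), repeat=n)
    ]
    mean_v = weight * sum(variances)
    var_v = weight * sum((v - mean_v) ** 2 for v in variances)
    return mean_v**2 / var_v


print(k2_from_f3([1, 1, 1]))  # equal effects: recovers the number of genes
print(k2_from_f3([3, 1]))     # unequal effects, as in the d_a = 3, d_b = 1 example
```

The second value, about 1.22, is the same whether the parents were AABB and aabb or AAbb and aaBB, since only the F2 heterozygosity enters; by contrast $[d]^2/D$ for this unlinked pair gives 1.6 or 0.4 according to the parental arrangement.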
Useful estimates of this kind can nevertheless be obtained where the
necessary data are available (M and J, pp. 319-24), and if obtainable
they can be put to very good use because as we have already noted they
are not affected by the dispersion of like alleles between the parents.
Now going back to the estimate of $k$ that we have chiefly been
discussing in this chapter, we found $k = [d]^2/D$, which can be rewritten as
$[d]^2 = kD$. Given, therefore, that we have an independent estimate of $k$
and knowing $D$, we can find $[d]^2$ and hence $[d]$. And given further that
the estimate of $k$ we are using is independent of the association or
dispersion of the genes, the $[d]$ that we do find will in fact be an estimate
of $S(d)$. So if we cross two parent lines and, by raising from them F2,
back-crosses and F3's or any other combination of families that will
give us the value of $D$ together with a $k$ of the second kind ($K_2$), we can
calculate $S(d)$. This will tell us whether we can expect to produce lines
that will transcend the parent lines in their expression of the character
we are considering, and indeed how far they will so transcend them.
The value of such information to a breeder concerned to enhance or
diminish the expression of the character needs no emphasis.
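The rearrangement $[d]^2 = kD$ can be put to work in a line or two. A hedged sketch, assuming equal gene effects so that the second-kind estimate equals the true number of factors: three unlinked genes with $d = 2$ give $D = 12$, and $\sqrt{K_2 D}$ recovers $S(d) = 6$ whatever the distribution of alleles between the parents, whereas the parental $[d]$ is only 2 if one of the three genes is dispersed. The function name `estimate_s_d` and all numbers are hypothetical.

```python
import math


def estimate_s_d(k2: float, D: float) -> float:
    """Estimate S(d) as sqrt(k * D), using a k (such as K2) that does not
    depend on the association or dispersion of like alleles."""
    return math.sqrt(k2 * D)


d_values = [2.0, 2.0, 2.0]        # hypothetical equal effects
D = sum(d * d for d in d_values)  # D = S(d^2) in the absence of linkage
parental_d = 2.0 + 2.0 - 2.0      # [d] when one of the three genes is dispersed
s_d = estimate_s_d(3.0, D)        # K2 = 3 for three equal, unlinked genes
print(parental_d, s_d)            # S(d) exceeds the parental [d]
```

Since $S(d)$ here is three times the parental $[d]$, lines transcending both parents would be expected from such a cross.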
Still a third basic method of estimating the number of effective factors
has recently been developed by Jinks and Towey (1976). It depends
on ascertaining the proportion of individuals in a generation, say the F2 ,
which are heterozygous for at least one gene - or rather one effective
factor. This proportion is found by raising a progeny, an F3 family for
example, from each of a number of individuals in the F2. Two individuals
are selfed from each F3 family, and if the two F4's so produced differ in
either mean or variance (or of course both) in respect of the character
under observation, the two F3 individuals must have had different geno-
types and the F2 individual which gave rise to the F3 from which they
were taken must have been heterozygous for at least one effective fac-
tor. Thus the proportion of F2 heterozygous for at least one unit is ascer-
tained and assuming no linkage of the effective factors their number can
be estimated. Once again, the estimate must be minimal since there could
have been gene differences too small to detect by families of the size used;
but equally the estimate will be unaffected by dispersion of the genic
differences in the parents. It can then be used in the same way as the K2
estimates derived from the variances of F3 or similar families and there
are fewer corrections to be made in the process of estimation, although
of course it requires continuing the experiment for an extra generation
to F4.
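The last step, from the proportion heterozygous to the number of factors, can be sketched as follows. Assuming, as the text does, no linkage of the effective factors, an F2 individual is homozygous for all $k$ factors with probability $(\frac{1}{2})^k$, so the proportion heterozygous for at least one is $P = 1 - (\frac{1}{2})^k$, giving $k = -\log_2(1 - P)$. The function name and the proportion used are invented for illustration.

```python
import math


def factors_from_heterozygous_fraction(p_het: float) -> float:
    """Solve P = 1 - (1/2)**k for k (unlinked effective factors in an F2)."""
    if not 0.0 <= p_het < 1.0:
        raise ValueError("proportion must lie in [0, 1)")
    return -math.log2(1.0 - p_het)


# Hypothetical data: 7 of 8 F2 individuals found heterozygous for at least one factor.
print(factors_from_heterozygous_fraction(7 / 8))
```

As the text stresses, an estimate obtained this way is a minimum, since factors too small to be detected with the family sizes used are missed.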
Conclusion

39. Designing the experiments


In the foregoing chapters we have seen how additive gene effects,
dominance, non-allelic interaction, linkage and g × e interaction may be
represented in biometrical terms, how they may be distinguished both from
one another and from non-heritable effects and how they may be de-
tected and measured biometrically in the descendants of single-crosses
and in randomly breeding populations. We have not covered the full
range of genetical phenomena - we have, for example, not touched on
sex-linkage, cytoplasmic inheritance and maternal effects, haploidy and
polyploid inheritance. But we have seen enough of biometrical genetics
to appreciate that it is capable of dealing with any of the many phenom-
ena that genetic analysis has taught us to recognize: we proceed by intro-
ducing the appropriate parameters into the specifications of the pheno-
typic expression of the character and then, by comparing the appropriate
statistics from relevant types of families, go on to test and measure these
parameters.
In the case of haploid inheritance, the biometrical analysis is actually
simpler than with diploids, since dominance and all dominance-related
interactions no longer enter into the specification: we can dispense with
$h$, $j$, $l$, $g_h$ and all the other parameters representing dominance and
dominance-based effects. In other cases, like those of sex-linkage and
cytoplasmic inheritance, more parameters are needed; but this need not
complicate the experiments unduly, for although we require more statistics
from which to construct the additional equations of estimation made
necessary by these additional parameters, we do not need additional
types of family, since we can obtain the extra statistics by subdividing
the observations according to sex within the families and generations in
the case of sex-linkage, or according to the direction of the initial cross
in the case of cytoplasmic effects. In still other cases, however, the
complexity of the experiments and analysis is greatly increased by the
introduction of the further parameters into the specification. More, and
Designing the experiments 211
perhaps many more, types of family may be needed to provide the
necessary statistics. We can see this without even going beyond the
phenomena we have discussed in the earlier chapters, for if we wish to
examine the capacity for digenic interaction between linked loci to
account for the behaviour of a character we need some 20 different
types of family of appropriate kinds to carry out the test. This indeed
requires a complex experiment and a complex analysis; but it has been
done (Jinks and Perkins, 1969), and so in its own way it serves to em-
phasize the point that in principle any genetical phenomena can be
accommodated in the biometrical approach, albeit at a price.
This prospective price serves in its turn to emphasize various points
about experimental design and analytical procedure. Thus, sex-linkage
and cytoplasmic effects can be detected by appropriate comparisons
between reciprocals from crossing two true-breeding lines. It behoves us
therefore to raise and compare reciprocal F1's and, where the individuals
are unisexual to record the sexes separately and compare them in these
F1's, in order to ascertain whether any complexities arising from these
phenomena must be taken into account in planning later generations of
the experiment. In other cases observations on certain specific combi-
nations of relationship are needed if the analysis is to be complete and
we must ensure that these appear in our data. Thus, in randomly breeding
populations the covariance of parent and offspring is $\frac{1}{4}D_R$ and that
of half-sibs is $\frac{1}{8}D_R$, while full-sibs give a covariance of $\frac{1}{4}D_R + \frac{1}{16}H_R$. The
comparison of either the parent/offspring or the half-sib covariance with
that of full-sibs can give evidence of dominance, but that between the
parent/offspring and half-sib covariances cannot do so. We must there-
fore ensure that data on full-sibs are obtained, whether we include both
of the other relations or only one of them. To take a second example,
non-allelic interaction can be detected from the means of parents, F1, F2
and back-crosses. But the detection of linkage requires not merely the
use of variances from segregating generations, but variances of at least
two ranks. In the absence of interaction the most informative comparison
is of the heritable portion of $V_{1F2}$ with the heritable portion of $V_{2F3}$,
since in the absence of linkage ${}_H V_{1F2} = 2\,{}_H V_{2F3}$. If they differ
significantly, linkage must be judged to be operative and the sign of the
difference will tell us its preponderant phase. So the experiment should be
designed to facilitate this comparison being made with maximum
efficiency; and the further comparison of ${}_H V_{1F2}$ with ${}_H V_{1F3}$ (which must
be available if ${}_H V_{2F3}$ can be found) will provide an additional test of
whether there are detectable differences between second degree stat-
212 Conclusion
istics of the same rank but from different generations, such as would
result from non-allelic interaction.
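The covariance expectations for randomly breeding populations quoted in this section (parent/offspring $\frac{1}{4}D_R$, half-sibs $\frac{1}{8}D_R$, full-sibs $\frac{1}{4}D_R + \frac{1}{16}H_R$) can be turned round to recover the components. The sketch makes plain why the parent/offspring and half-sib covariances alone carry no information on $H_R$: both are multiples of $D_R$ only. The function names and the values of $D_R$ and $H_R$ are hypothetical.

```python
def expected_covariances(D_R: float, H_R: float) -> dict[str, float]:
    """Expected covariances between relatives in a randomly breeding population."""
    return {
        "parent_offspring": D_R / 4,
        "half_sib": D_R / 8,
        "full_sib": D_R / 4 + H_R / 16,
    }


def estimate_components(cov: dict[str, float]) -> tuple[float, float]:
    """Recover D_R from the parent/offspring covariance and H_R from the
    excess of the full-sib covariance over it."""
    D_R = 4 * cov["parent_offspring"]
    H_R = 16 * (cov["full_sib"] - cov["parent_offspring"])
    return D_R, H_R


cov = expected_covariances(D_R=8.0, H_R=4.0)
print(cov)                       # half-sib covariance is exactly half the parent/offspring one
print(estimate_components(cov))  # H_R is recoverable only because full-sibs are included
```

Whatever value is given to $H_R$, the half-sib covariance stays at exactly half the parent/offspring covariance, so only the full-sib comparison can reveal dominance.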
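The logic of the linkage test can be checked by brute-force enumeration. The following sketch is our own construction, not the book's procedure. It takes two additive genes (the $d_a = 3$, $d_b = 1$ pair used earlier, with no dominance and no non-heritable variation) and builds the F2 from an F1 in coupling or repulsion with recombination value $p$. It then selfs every F2 genotype to give its F3 family and returns $V_{1F2} - 2V_{2F3}$. All names are hypothetical.

```python
from itertools import product

D_A, D_B = 3.0, 1.0  # hypothetical additive effects of the two genes


def gametes(hap1, hap2, p):
    """Gamete frequencies from a genotype; haplotypes are (a, b), 1 = increasing allele."""
    out = {}
    for g, f in [(hap1, (1 - p) / 2), (hap2, (1 - p) / 2),
                 ((hap1[0], hap2[1]), p / 2), ((hap2[0], hap1[1]), p / 2)]:
        out[g] = out.get(g, 0.0) + f
    return out


def value(hap1, hap2):
    """Additive genotypic value: d * (number of increasing alleles - 1) at each locus."""
    return D_A * (hap1[0] + hap2[0] - 1) + D_B * (hap1[1] + hap2[1] - 1)


def selfed_progeny(hap1, hap2, p):
    """Frequencies of progeny genotypes (ordered haplotype pairs) from selfing."""
    g = gametes(hap1, hap2, p)
    return {(x, y): fx * fy for (x, fx), (y, fy) in product(g.items(), repeat=2)}


def variance(dist):
    mean = sum(f * value(*geno) for geno, f in dist.items())
    return sum(f * (value(*geno) - mean) ** 2 for geno, f in dist.items())


def linkage_check(p, coupling=True):
    """Return V1F2 - 2 * V2F3 for an F1 in coupling or repulsion."""
    f1 = ((1, 1), (0, 0)) if coupling else ((1, 0), (0, 1))
    f2 = selfed_progeny(*f1, p)
    v1f2 = variance(f2)
    v2f3 = sum(f * variance(selfed_progeny(*geno, p)) for geno, f in f2.items())
    return v1f2 - 2 * v2f3


for p in (0.5, 0.1):
    print(p, round(linkage_check(p, True), 3), round(linkage_check(p, False), 3))
```

With free recombination the difference vanishes in both phases. With $p = 0.1$ it is about +0.48 for coupling and about -0.48 for repulsion, so in this example the sign of the difference identifies the phase.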
Other examples could readily be given of the need for care in the
genetical design of the experiments, that is for designing them so as to
permit and facilitate the detection and measurement of the genetical
phenomena at issue. We must also, however, pay attention to the
statistical design, that is to the adoption of a design which will provide
a valid estimate of error variation against which the genetically import-
ant comparisons can be tested, and which will enable us as far as poss-
ible to make these comparisons with the maximum efficiency permitted
by the numbers of individuals and families that available resources per-
mit us to raise and observe. The provision of a valid estimate of error
will always entail a design which allows a valid estimate of the non-
heritable component of variation, which is of course error variation for
the purpose of genetical analysis, and this may in its turn put restrictions
on the way we raise, for example, plants from the time the seed is sown
(see M and J, pp. 338-9).
Non-heritable variation is not, however, the only type of error vari-
ation to be taken into account: the effects of genetical phenomena
which the experiment was not designed to take into account and which
may indeed not have been recognized as operative in the material in
question, may also be affecting the comparison, the testing of which
is the prime purpose of the experiment. There is thus a need to obtain
more than one set of comparisons which will reflect the phenomena
under investigation and to compare these with one another to see
whether the phenomena are adequate to account for the heritable differ-
ences observed, or whether the sets are sufficiently different from one
another to require us to recognize that further unspecified genetical com-
plications exist. This genetico-statistical point is well illustrated by the
joint scaling tests that we discussed in Section 9. There we were testing
the additive-dominance model, with a view first to detecting and measur-
ing additive and dominance components represented explicitly by [d]
and [h] in the formulations, and secondly to testing whether these, taken
together with the non-heritable variation, were adequate to account for
the differences observed among the mean measurements of parents, F1,
F2 and back-crosses. The comparison between the means of the parents,
$\bar{P}_1 - \bar{P}_2$, would itself have been sufficient to establish that $[d]$ was
significant and that additive variation was therefore present, just as
$\bar{F}_1 - \frac{1}{2}(\bar{P}_1 + \bar{P}_2)$, by showing that $[h]$ was significant, would have
established that dominance was operative. The introduction of F2, B1 and B2 in principle
allowed further, independent comparisons from which [d] and [h] could
be measured and compared among themselves and with the estimates
from P1, P2 and F1 for consistency. This was implicitly done by the $\chi^2_{[3]}$
for goodness of fit (Table 6), which tests whether there are detectable
sources of genetical variation, and hence genetical phenomena beyond
additive gene effects and dominance, displaying their effects in these
data, that is, whether there is genetical as well as non-heritable error
variation. It is thus a test of the adequacy of the genetical formulation, for
which purpose we in fact used it.
The example we discussed yielded no evidence of such further gen-
etical complication: the additive-dominance formulation was adequate.
But had it proved to be inadequate we could have gone on to use the
degrees of freedom on which the test of adequacy was based for the
introduction into the formulation of further parameters specifying ad-
ditional genetical effects, which could then have been measured and
tested for their adequacy to account for the residual variation, as indeed
we did in the later example of Section 20.
Turning to the precision of the statistics we obtain from our families
and of the genetically meaningful comparisons that we seek to make
among them, it is obvious that, other things being equal, the bigger the
experiment the greater the precision that will be obtained. But resources
are not infinite and those available, whether of land, labour, cultural or
analytical facilities, will always set a limit to the size of the experiment
we can carry out and hence to the precision of the results and the infor-
mation we can obtain. In this connection, therefore, our task is basically
that of designing the experiment so that the maximum of relevant in-
formation is obtained from the number of individuals that we can raise,
observe and analyse. Having decided on the types of family that must be
included to provide the statistics and comparisons needed to answer the
genetical questions we have in mind, and to provide the estimates of
error variation, heritable and non-heritable, that our tests of significance
will require, we must next decide how we shall apportion the individuals
between numbers of families of each of the various kinds and numbers
of individuals within each of these families.
Taking a simple example, if genetical considerations require us to use
F3 families in order to estimate $V_{1F3}$ and $V_{2F3}$, we can obtain estimates
of these two variances with approximately equal precision by raising $n/2$
families, each of two individuals, where $n$ individuals can be raised in all.
For some purposes this would be the thing to do, but for others it would
not: to take another case, if we merely needed an estimate of ~ or we
were concerned solely with separating the D, H and E components of
variation, we might decide that n families each of one individual would
be preferable. Here, however, the matter of biological manipulation
enters again for it is as easy with, say, Drosophila to raise an S3 family
of 40 or 50 as it is to raise a family of one whereas every additional
family means an additional mating, and an additional culture. With
self-pollinating plants on the other hand little labour is involved in pro-
ducing F3 seed and n single plant families are not much more trouble-
some to raise than a single family of n plants. We might observe a further
restraint also imposed by the biology of the species. S3 is the nearest to
an F3 generation that can be obtained from Drosophila or any other
dioecious species, whereas the crossing needed to produce an S3 may be
very troublesome in naturally self-pollinating species of plants like wheat,
barley or tomatoes, in which F3's are easy to obtain. Thus many consider-
ations enter into designing experiments in biometrical genetics to derive
the maximum information for the resources available. Sometimes we can
use earlier experience to help us, but in general little attention has yet
been paid to problems of experimental design in biometrical genetics:
some of its problems have been investigated but much remains to be
done (M and J, Section 58).
One further point remains to be made about statistical precision. Some
of the analyses we have discussed have been of means, and others of the
second degree statistics, variances and covariances. Now means are sub-
ject to much lower error variances than are second degree statistics and
so yield estimates and comparisons of greater precision for any given
number of individuals observed. Thus information arising from the analy-
sis of means is easier to obtain, and in that sense more rewarding, than
information from the analysis of second degree statistics, and for this
reason the value in biometrical genetics of anything beyond the analysis
of means has on occasion been denied. This would indeed be a fair point
if first degree and second degree statistics were merely alternative ways
of obtaining the same genetical information, but we have in fact seen
that they are not. Means provide us for example with an estimate of [d],
which may range anywhere from 0 to S (d) according to the distribution
of like alleles between the parents, and an estimate of [h] which will be
reduced by any opposition in the direction of dominance between genes
at different loci. We can never, therefore, be confident of obtaining a
measure of average dominance from the analysis of means. Second degree
statistics on the other hand yield estimates of D and H, which in the
absence of linkage, are unaffected either by the distribution of alleles
Concepts and uses 215
between the parents or by differences in the direction of dominance at
different loci. So $\sqrt{H/D}$ is in principle always able to provide a measure
of the average dominance. Furthermore, linkage can be detected and
measured only by using second degree statistics, and the analysis of ran-
domly breeding populations too can be achieved only by the use of second
degree statistics. So, far from being no more than alternative sources of
the same information, first and second degree statistics provide different
and complementary information. To deny the value of one because it is
statistically more troublesome is merely to shut one's eyes to this com-
plementary quality. If we are to gain the genetical information we require
we must be prepared to face the statistical problems it entails and seek
to overcome them.
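A small numerical contrast, using invented values, makes the complementarity concrete. Two unlinked genes each have $d = 1$ and $|h| = 1$ but are dominant in opposite directions, and the parents carry the like alleles associated. The means give a potence ratio $[h]/[d]$ of 0, suggesting no dominance, while $\sqrt{H/D}$ from the second degree statistics reports complete dominance ($h/d = 1$) at each locus. The sketch assumes $D = S(d^2)$ and $H = S(h^2)$, that is, no linkage.

```python
import math

d = [1.0, 1.0]   # additive effects (hypothetical)
h = [1.0, -1.0]  # dominance of equal size, opposite direction at the two loci

d_bracket = sum(d)         # [d] with like alleles associated in the parents
h_bracket = sum(h)         # [h]: opposing dominance cancels in the F1 mean
D = sum(x * x for x in d)  # D = S(d^2)
H = sum(x * x for x in h)  # H = S(h^2)

potence = h_bracket / d_bracket       # from first degree statistics (means)
average_dominance = math.sqrt(H / D)  # from second degree statistics
print(potence, average_dominance)
```

The means alone would suggest no dominance at all, whereas the variances recover the true degree of dominance at each locus.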

40. Concepts and uses


Biometrical genetics requires statistically valid analyses of results from
experiments designed to this end. It also of course requires that the
analyses are genetically meaningful, and this in its turn makes demands
on the design of the experiments, as we saw in the previous section. The
genetical requirement goes deeper, however: ultimately it must imply
that the genetical formulations of the means, variances and covariances
that we observe and compare in the analysis, must be derived from the
basic principles of genetics and be expressed in terms of parameters that
properly represent and quantify acceptable genetical phenomena.
These basic genetical principles and (at any rate in the main) the gen-
etical phenomena that we might seek to incorporate were obtained not
from the biometrical study of continuous variation, but by using the
Mendelian approach of observing the properties and inter-relations of
individually recognizable and hence individually traceable genes. Indeed
this must be so, for biometrical genetics could not of itself have laid
the wide genetical foundation on which the biometrical analyses rest.
True, the concept of equilinear transmission from male and female
parents could have been established by biometrical means, and particu-
late inheritance could have been inferred from the excess of variation in
F2 over that shown by inbred parents and their F1. It would, however,
have been virtually impossible to establish with any confidence the pre-
cise rules of segregation of these particles or the variety of their relations
to one another in hereditary transmission. Neither could the chromosome
theory have been established as we know it, nor the nature of linkage and
the mechanism of recombination understood. Dominance and at least
some interactions could have been demonstrated, but a precise basis for
their quantitative analysis would still have been lacking.
Conceptually therefore biometrical genetics is the child of Mendelian
genetics. But it is the partner, too, since the concepts can seldom be
taken over and used just as they are. Ambiguities must first be removed
from them, and they must be refined and adapted to yield the parameters
by which they are represented and quantified for biometrical use. To take
an example, in the early days of genetics dominance was the capacity of
a gene to over-ride the expression of its recessive allele in a heterozygote,
whose phenotype was thus the same as that of the homozygote for the
dominant gene. It was soon recognized that the heterozygote might be
intermediate in phenotype between the two homozygotes, and this was
termed incomplete dominance, as distinct from the, by implication,
customary complete dominance; but no attempt was made to recognize
degrees of incompleteness or to define the absence of dominance, the
possibility of which is clearly implied as a special case of incompleteness.
Later, the Drosophila geneticists came to use the term in yet a different
way, any mutant gene which displayed its presence by changing the
phenotype when heterozygous with the wild-type allele being described
as dominant, without any reference to the relation the phenotype of the
heterozygote might bear to that of the mutant homozygote. This new
usage had the curious result of a mutant gene whose expression was not
always readily recognizable in heterozygotes being sometimes described
as 'dominant, but better used as a recessive'. Clearly, although the con-
cept of dominance obviously had to be brought into biometrical gen-
etics, it equally obviously had to be given a consistent and quantitatively
precise definition before it could be so used. We have seen in earlier chapters
how this is done in the form of the ratio h/d, and this leads us to
recognize the fundamental distinction between the phenotypic relation
an F1 bears to its parents as expressed by [h]/[d] (the potence ratio as it
has been called) and the dominance ratios, h/d, of the gene-pairs which
contribute to that relation. It also emphasizes a further feature of any
ratio which depends on the relations between three or more measure-
ments, namely the general dependence of the ratio on the choice of the
scale used in making the measurements. We saw the consequence of this
dependence for the dominance ratio in the example on p. 46, where
the choice of scale could affect its magnitude and even change its sign.
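Both points, that a potence ratio need not match the dominance ratios of the genes behind it and that such ratios depend on the scale of measurement, can be illustrated with a short sketch. All means and effects below are hypothetical (they do not reproduce the example on p. 46), log10 is just one possible transformation, P1 is taken to be the larger parent, and the genes in the second part are assumed unlinked and non-interacting. The function names are ours.

```python
import math


def cross_parameters(p1, p2, f1):
    """m, [d] and [h] from the means of two true-breeding parents and their F1."""
    m = (p1 + p2) / 2      # mid-parent
    d = (p1 - p2) / 2      # departure of the larger parent P1 from m
    h = f1 - m             # departure of the F1 from m
    return m, d, h


def potence_ratio(p1, p2, f1):
    """[h]/[d]; for a single gene difference this is also the dominance ratio h/d."""
    _, d, h = cross_parameters(p1, p2, f1)
    return h / d


# 1. Scale dependence: hypothetical means for one gene difference.
p1, p2, f1 = 100.0, 1.0, 40.0
raw = potence_ratio(p1, p2, f1)
logged = potence_ratio(math.log10(p1), math.log10(p2), math.log10(f1))
print(round(raw, 3), round(logged, 3))  # -0.212 0.602


# 2. Potence ratio versus the dominance ratios of the contributing genes.
def cross_means(genes, m=0.0):
    """P1, P2 and F1 means for unlinked, non-interacting gene pairs.

    genes: (d, h, sign) triples; sign is +1 when P1 carries the increasing
    allele of that pair and -1 when P2 does.
    """
    p1 = m + sum(sign * d for d, _, sign in genes)
    p2 = m - sum(sign * d for d, _, sign in genes)
    f1 = m + sum(h for _, h, _ in genes)
    return p1, p2, f1


genes = [(2.0, 2.0, +1), (1.0, 1.0, -1)]   # both completely dominant, dispersed
print([h / d for d, h, _ in genes])          # [1.0, 1.0]
print(potence_ratio(*cross_means(genes)))    # 3.0
```

In the first part the same F1 lies below the mid-parent on the raw scale but above it on the log scale. In the second, each gene shows complete dominance (h/d = 1), yet because the increasing alleles are dispersed between the parents [d] is reduced while [h] is not, so the F1 (3.0) exceeds the larger parent (1.0).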
If we take non-allelic interaction as a second illustration, a further
point is brought out. Various kinds of digenic interaction were early
recognized by geneticists through the aberrations they produced in the
classical 9:3:3:1 ratio expected in F2, and indeed the interpretation of
these aberrant ratios in terms of interaction made a major contribution
to the establishment of Mendelian inheritance as both ubiquitous and
virtually exclusive. As recognized by the early geneticists, these ratios
involved not only complete dominance but also complete dependence
in expression of specific alleles at the two loci, and the interactions were
classified as being of one kind or another according to the particular
combination of dominance and dependence that the various genes dis-
played. Thus before it could be used in biometrical genetics, not only
had the concept to be extended to allow of partial as well as complete
interaction, but a framework also had to be found which could
accommodate all the types of interaction, complementary, duplicate and so
on, and so avoid the need to treat each of them separately from the rest.
This framework is provided by the recognition of three basic types of
interaction: d × d, or i-type; d × h, or j-type; and h × h, or l-type, in the
way we saw in Section 19. All the classical types of interaction are
definable in terms of i, j and I, each of which makes its own character-
istic contributions to the means and variances of the different gener-
ations and families, thus affording not merely the means of specifying
and quantifying incomplete interaction but also of combining into a
single formulation the different types and degrees of interaction that
might be expressed by the member genes of a polygenic system when
taken two at a time. Again we can see the distinction between, on the
one hand, the gross or overall interaction properties taking together all
the genes by which two parent lines differ, as represented by [i], [j] and
[l], and on the other hand the interaction properties of individual pairs
of genes, as represented by the individual i's, j's and l's. Furthermore
the representation of interactions is now readily extensible to trigenic
or even higher orders, should this be required.
Since the biometrical and Mendelian techniques seek to analyse gen-
etical situations in terms of the same principles and the same phenomena,
it has sometimes been assumed that the biometrical approach is no more
than a rival alternative to the Mendelian. In principle it is true that some
situations, normally and properly dealt with by classical means, could be
handled biometrically. This would be the case, for example, where a
single gene difference in some readily measurable character, like stature,
was involved. But to use the biometrical approach rather than the
Mendelian where the genetical classes are easily recognized by inspection
would, to say the least, be inefficient and even tortuous; and in the
absence of some compelling reason, Mendelian analysis would always be
preferred to biometrical in such a case. Indeed to see the two approaches
as rival alternatives is to miss the point that each technique of analysis
has its own field of application, to which the other is less suited or even
impossible to adapt. We should recall that biometrical genetics began
and has been developed for the genetical analysis of continuous vari-
ation. Even this can be expedited (although seldom if ever carried through
to completion) where appropriate special means of genetical or cytological
manipulation are available, as we saw when discussing the information to
be gained from direct assays of variation in the sternopleural chaeta num-
ber of Drosophila melanogaster (Section 3). Such analyses of continuous
variation require, however, special marker genes and chromosome types
which are available in only a few well investigated species of animals and
plants, and only to a limited extent in most even of these. They require,
too, elaborate and lengthy breeding programmes which are justifiable
only for special reasons such as obtained, for example, in the exper-
iments to which we have just referred, where we were concerned to
ascertain the detailed nature of the genetical control of the variation
and the distribution of the controlling elements between and, as far as
possible, within the chromosome.
Thus in all but special cases in a few species, the genetical investigation
and understanding of continuous variation must require the use of
methods that only biometrical genetics provides. Without these methods
continuous variation can be neither probed nor manipulated efficiently.
Furthermore, since biometrical genetics neither depends on nor makes
use of the recognition, through their effects, of individual gene differ-
ences, its analyses will cover all the variation shown by a character,
whether non-heritable or heritable, stemming from genes of large effect
or small or for that matter from transmissible agents which are not
nuclear genes. The completeness of this coverage must often mean complexity
in the analyses themselves and in the experiments upon whose
results the analyses are based. It is, however, this capacity for not only
dealing with continuous variation but for clarifying, measuring, analys-
ing and understanding the totality of the variation shown by a character
which gives to biometrical genetics its place in our armoury of genetical
methodologies for investigating the properties and changes of variability,
its adjustment in the wild and its manipulation in those species that we
have brought into domestication.
Glossary of symbols and abbreviations

A-a (B-b, etc.) A pair of alleles, a gene pair, a single gene difference. A is
the allele which increases and a that which decreases the
expression of the character.
A (B, C, etc.) Individual scaling tests.
b Regression coefficient.
c A measure of gene association in the parental lines of a
diallel.
d The departure of one of a pair of corresponding homozygotes from their
mid-point or mid-parent (m). It is positive for the homozygote carrying
the increasing allele and negative for that carrying the decreasing allele.
The relevant gene pair may be denoted by a subscript: thus AA departs
from m by $d_a$ and aa departs from m by $-d_a$.

[d] The departure of one of a pair of true-breeding parental lines from
their mid-parent (m). The parent with the greater expression departs by
[d] and that with the lower expression by -[d]. [d] is the sum, taking sign
into account, of the d's of all the relevant genes carried by the larger
parent.
$D = S(d^2)$ The genetically additive component of variation.
$D_p = S(4uvd^2)$ The genetical component of variation among the parents
of a diallel. $D_p = D$ when all $u = v = \tfrac{1}{2}$.
$D_R = S(4uv[d + (v - u)h]^2)$ The statistically additive component in a
randomly breeding population. $D_R = D$ when all $u = v = \tfrac{1}{2}$.
$D_w = S(4uvd[d + (v - u)h])$ The genetical component in $W_r$ from a
diallel. $D_w = D$ when all $u = v = \tfrac{1}{2}$.
df Degrees of freedom.
e The departure from m ascribable to the effect of the en-
vironment, averaged over all genotypes. A biological
measure of the environment.
The departure from m of genotype X ascribable to the
environment.
E The non-heritable component of variation. $E_w$ is the non-heritable
component ascribable to differences expressed within a family and $E_b$
that ascribable to differences as expressed between families.
$E_1 = E_w$, $E_2 = E_b + \frac{1}{n}E_w$.
$F = S(dh)$.

g The departure from m ascribable to interaction of the genotype and
environment. $g_d$ is the interaction of d or [d] with e; $g_h$ is the
interaction of h or [h] with e.

$G = S(g^2)$ The component of variation ascribable to genotype × environment
interaction. $G_d = S(g_d^2)$, $G_h = S(g_h^2)$.
h The departure of the heterozygote from the mid-parent,
m. h takes sign, $h_a$ for example being positive when in its
expression of the character Aa is nearer to AA than to aa,
and negative when it is nearer to aa than to AA.
[h] The departure of an F1 from the mid-parent of the true-breeding
lines of which it may be regarded as a cross. [h] is the sum, taking sign
into account, of the h's of all the relevant genes.
$H = S(h^2)$ The dominance component of variation.
$H_R = S(16u^2v^2h^2)$ The corresponding component in a randomly breeding
population. $H_R = H$ when all $u = v = \tfrac{1}{2}$.
i The departure from m ascribable to the hom × hom interaction of A-a and
B-b: the interaction of $d_a$ and $d_b$.
I (i) $I = S(i^2)$. The component of variation ascribable to hom × hom
interaction. (ii) The sum of the interaction terms in $V_{F2}$, etc.
$I_R$ The corresponding component in a randomly breeding population.
$I_R = I$ when all $u = v = \tfrac{1}{2}$.
j The departures from m ascribable to hom × het interaction of A-a and
B-b. $j_{ab}$ is the interaction of $d_a$ and $h_b$, $j_{ba}$ that of $d_b$ and $h_a$.
$J = S(j^2)$ The component of variation ascribable to hom × het interaction.
$J_R$ The corresponding component in a randomly breeding population.
$J_R = J$ when all $u = v = \tfrac{1}{2}$.
k The number of gene pairs in which two true-breeding lines differ.
$\hat{k}$, the estimate of k, is the number of effective factors.
l The departure from m ascribable to the het × het interaction of A-a
and B-b: the interaction of $h_a$ and $h_b$.
$L = S(l^2)$ The component of variation ascribable to het × het interaction.
$L_R$ The corresponding component in a randomly breeding population.
$L_R = L$ when all $u = v = \tfrac{1}{2}$.
m The mid-point between the expressions of the character
in two true-breeding lines. Commonly termed the mid-parent (but see
also p. 102).
MS Mean square.
n The number of individuals in a family. Similarly n' is the
number of families in a group.
p (q) A recombination frequency. $p_{ab}$ is the frequency of recombination
between A-a and B-b. $1 - p = q$.
P Probability, in relation to a test of significance.
r (i) Correlation coefficient.
(ii) Measure of association of the genes in which two true-breeding lines
differ. Where the lines differ by k gene pairs of equal effect, and the
larger parent nevertheless carries reducing alleles at k' of them,
$r = \frac{1}{k}(k - 2k')$. r = 1 for complete association and 0 for
maximum dispersion.
S Indicates summation.
SCP Sum of cross products.
SS Sum of squares.
t The ratio of a quantity to its estimated standard error.
A test of significance.
T Heritability, $T_n$ being the narrow and $T_b$ the broad heritability.
u (v) The frequency of the increasing allele. $1 - u = v$. Thus $u_a$ is
the frequency of allele A and $v_a$ that of a.
V A variance, the relevance of which is indicated by a subscript. Thus
$V_{P1}$ is the variance of P1 (the larger parent), $V_{F1}$ that of F1,
$V_{F2}$ that of F2, etc.
The variance of a randomly breeding population, with $V_{SR}$ the variance
of full sibs, $V_{HSR}$ the variance of half-sibs, etc.
$V_r$ The variance of an array in a diallel. $\bar{V}_r$ is the mean variance
of all arrays and $V_{\bar{r}}$ the variance of array means.
VR Variance ratio. A test of significance.
W A covariance, the relevance of which is indicated by a subscript. Thus
$W_{1F23}$ is the covariance of an F2 individual with the mean of the F3
derived from it, etc.
A covariance in a randomly breeding population, the relevance of which
is indicated by a subscript. Thus $W_{POR}$ is the covariance of parent and
offspring, $W_{SR}$ that of full sibs, $W_{HSR}$ that of half-sibs, etc.
$W_r$ The covariance with the non-recurrent parent given by an array in a
diallel. $\bar{W}_r$ is the mean covariance of all arrays.
A measure of the variation in magnitude of a set of d's.
e A measure of the intensity of complementary and duplicate type
interactions between gene pairs. Where $d_a = d_b = h_a = h_b = d$,
$i_{ab} = j_{ab} = j_{ba} = l_{ab} = ed$; e is positive for complementary
and negative for duplicate type interaction.
The mean of a family or generation is denoted by a bar over the
designation of that family or generation. The mean of parent P1 is
$\bar{P}_1$, of F1 is $\bar{F}_1$, of F2 is $\bar{F}_2$, etc.
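Several of the entries above reduce to D and H when all u = v = ½. A small check of that reduction, and of the association measure r, is sketched below. The gene frequencies and effects are hypothetical, v is taken as 1 - u, and the function names are ours rather than the book's.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def d_p(genes):
    """D_p = S(4uv d^2), the additive component among diallel parents (v = 1 - u)."""
    return sum(4 * u * (1 - u) * d ** 2 for u, d, _ in genes)


def d_r(genes):
    """D_R = S(4uv[d + (v - u)h]^2), the statistically additive component
    of a randomly breeding population; v - u is written as 1 - 2u."""
    return sum(4 * u * (1 - u) * (d + (1 - 2 * u) * h) ** 2 for u, d, h in genes)


def h_r(genes):
    """H_R = S(16 u^2 v^2 h^2), the dominance component of a randomly breeding population."""
    return sum(16 * u ** 2 * (1 - u) ** 2 * h ** 2 for u, _, h in genes)


def association_r(k, k_prime):
    """r = (k - 2k')/k: k equal gene pairs, k' with the reducing allele in the larger parent."""
    return (k - 2 * k_prime) / k


# Hypothetical gene pairs as (u, d, h), u being the frequency of the increasing allele.
genes = [(HALF, Fraction(1), HALF), (HALF, HALF, HALF)]
D = sum(d ** 2 for _, d, _ in genes)
H = sum(h ** 2 for _, _, h in genes)
print(d_p(genes) == D, d_r(genes) == D, h_r(genes) == H)  # True True True

# With u = 1/4 the random-breeding components no longer equal S(d^2) and S(h^2).
skewed = [(Fraction(1, 4), Fraction(1), HALF)]
print(d_r(skewed), h_r(skewed))  # 75/64 9/64

print([association_r(4, k_prime) for k_prime in range(3)])  # [1.0, 0.5, 0.0]
```

The skewed case shows why the random-breeding components carry their own subscript: with unequal allele frequencies the term (v - u)h lets part of the dominance deviation enter the statistically additive component.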
References

ÅKERMAN, Å. (1922). Untersuchungen über eine in direktem Sonnenlichte nicht
lebensfähige Sippe von Avena sativa. Hereditas 3, 147-77.
BATESON, W. (1909). Mendel's Principles of Heredity. University Press, Cambridge.
CALIGARI, P.D.S. and MATHER, K. (1975). Genotype-environment interaction:
III. Interactions in Drosophila melanogaster. Proc. R. Soc. Lond. B. 191, 387-411.
CAVALLI, L.L. (1952). An analysis of linkage in quantitative inheritance. Quantitative
Inheritance (Ed. E.C.R. Reeve and C.H. Waddington) pp. 135-44. HMSO, London.
DARLINGTON, C.D. and MATHER, K. (1949). The Elements of Genetics. Allen
and Unwin, London.
DAVIES, R.W. (1971). The genetic relationship of two quantitative characters in
Drosophila melanogaster. II. Location of the effects. Genetics 69, 363-75.
EAVES, L.J. (1975). Testing models for variation in intelligence. Heredity 34, 132-6.
EAVES, L.J., LAST, K., MARTIN, N.G. and JINKS, J.L. (1977). A progressive
approach to non-additivity and genotype-environmental covariance in the analysis
of human differences. Br. J. Mathematical and Statistical Psychology (in press).
FALCONER, D.S. (1960). Introduction to Quantitative Genetics. Oliver and Boyd,
Edinburgh.
FISHER, R.A. (1918). The correlations between relatives on the supposition of
Mendelian inheritance. Trans. R. Soc. Edinb. 52, 399-433.
FISHER, R.A. (1946). Statistical Methods for Research Workers (10th Edn). Oliver
and Boyd, Edinburgh.
GALTON, F. (1889). Natural Inheritance. Macmillan, London.
GOLDSCHMIDT, R. (1938). Physiological Genetics. McGraw-Hill, New York and
London.
GRÜNEBERG, H. (1952). Genetical studies on the skeleton of the mouse. IV.
Quasi-continuous variations. J. Genet. 51, 95-114.
HAYMAN, B.I. (1960). Maximum likelihood estimation of genetic components of
variation. Biometrics 16, 369-81.
HOGBEN, L. (1933). Nature and Nurture. Allen and Unwin, London.
HOLT, S.B. (1952). Genetics of dermal ridges: inheritance of total finger ridge-count.
Ann. Eugen. 17, 140-61.
JINKS, J.L. and CONNOLLY, V. (1973). Selection for specific and general response
to environmental differences. Heredity 30, 33-40.
JINKS, J.L. and FULKER, D.W. (1970). A comparison of the biometrical genetical,
MAVA and classical approaches to the analysis of human behaviour. Psychol. Bull.
73, 311-49.
JINKS, J.L. and PERKINS, J.M. (1969). The detection of linked epistatic genes for
a metrical trait. Heredity 24, 465-75.
JINKS, J.L., PERKINS, J.M. and BREESE, E.L. (1969). A general method of
detecting additive, dominance and epistatic variation for metrical traits: II.
Application to inbred lines. Heredity 24, 45-57.
JINKS, J.L. and TOWEY, P. (1976). Estimating the number of genes in a polygenic
system by genotype assay. Heredity 37, 69-81.
JOHANNSEN, W. (1909). Elemente der exakten Erblichkeitslehre. Fischer, Jena.
KEARSEY, M.J. (1965). Biometrical analysis of a random mating population:
a comparison of five experimental designs. Heredity 20, 205-35.
KEARSEY, M.J. and JINKS, J.L. (1968). A general method of detecting additive,
dominance and epistatic variation for metrical traits: I. Theory. Heredity 23, 403-9.
LAW, C.N. (1967). The location of genetic factors controlling a number of
quantitative characters in wheat. Genetics 56, 445-61.
MARTIN, N.G. (1975). The inheritance of scholastic abilities in a sample of twins.
Ann. hum. Genet. 39, 219-29.
MATHER, K. (1941). Variation and selection of polygenic characters. J. Genet. 41,
159-93.
MATHER, K. (1949). Biometrical Genetics (1st Edn.) Methuen, London.
MATHER, K. (1967). The Elements of Biometry. Methuen, London.
MATHER, K. (1973). Genetical Structure of Populations. Chapman and Hall,
London.
MATHER, K. (1974). Non-allelic interaction in continuous variation of randomly
breeding populations. Heredity 32, 414-19.
MATHER, K. and HARRISON, B.J. (1949). The manifold effect of selection.
Heredity 3, 1-52 and 131-62.
MATHER, K. and JINKS, J.L. (1971). Biometrical Genetics (2nd Edn.) Chapman
and Hall, London. (This reference is abbreviated to M and J in the text.)
PERKINS, J.M. and JINKS, J.L. (1970). Detection and estimation of genotype-
environmental, linkage and epistatic components of variation for a metrical
trait. Heredity 25, 157-77.
POWERS, L. (1951). Gene analysis by the partitioning method when interactions
of genes are involved. Bot. Gaz. 113, 1-23.
ROBINSON, H.F., COMSTOCK, R.E. and HARVEY, P.H. (1949). Estimates of
heritability and the degree of dominance in corn. Agron. J. 41, 353-9.
SEARLE, S.R. (1966). Matrix Algebra for Biologists. Wiley, New York.
SHIELDS, J. (1962). Monozygotic Twins. Oxford University Press.
SPICKETT, S.G. (1963). Genetic and developmental studies of a quantitative
character. Nature 199, 870-3.
STURTEVANT, A.H. (1925). The effects of unequal crossing over at the Bar locus
in Drosophila. Genetics 10, 117-47.
THODAY, J.M. (1961). Location of polygenes. Nature 191, 368-70.
VAN DER VEEN, J.H. (1959). Tests of non-allelic interaction and linkage for
quantitative characters in generations derived from two diploid pure lines.
Genetica 30, 201-32.
WOLSTENHOLME, D.R. and THODAY, J.M. (1963). Effects of disruptive selection:
VII. A third chromosome polymorphism. Heredity 18, 413-31.
Index

Åkerman, 130
Analysis of variance, 7, 12, 65, 72-5, 79, 91, 93, 136, 138-42, 145, 147, 185-7
Animal geneticists, 35
Antigenic specificity, 2
Association and dispersion (of genes), 86, 88, 104-6, 110-11, 114-15, 117, 125-9, 162, 192, 199-209
- rd, 201-2
Autosomal inheritance, 42-3, 74
Back-cross, 8-10, 15, 35-8, 40-2, 45, 47 et seq. 51, 56, 58-61, 65, 69, 105-7, 114-15, 120-1, 154, 166, 208, 211-12
Bar-eye, 8, 33, 46-7, 131-2
Barley, 121-4, 214
Bateson, 1, 102
Beans, 5-6, 130, 191
Biparental matings, 54-5, 57, 167, 184, 188
Breeders (plant and animal), 4, 208
Breeding programme, 110
Breeding test, 1, 6, 19, 21, 102
Bristles (v. chaetae)
Caligari, 10, 11
Cavalli, 37
Chaetae
abdominal -, 8-10, 19, 27, 205
sternopleural -, 3-4, 6-7, 11-14, 18-19, 27, 30, 72-85, 132-57, 218
Chlorophyll, 130
Chromosomes, 5, 8-10, 15, 19, 30, 72-3, 83-4, 206, 218
- assays, 10 et seq.
inversion -, 8, 14
sex -, 7
- theory, 215
Competition (and co-operation), 183
Complementary (interaction), 102-4, 110, 115-16, 125-9, 217
Computer, need for, 67
Connolly, 150
Correlated response (expression), 207
Correlation(s), 5, 28, 54, 171-3, 178-9, 181-3
G × E -, 183
sampling -, 66-7
Coupling and repulsion (in linkage), 117-24, 203
Covariances, 5, 29-30, 48, 53-6, 60, 66, 69 et seq. 99, 111 et seq. 167-70, 184, 194-5, 211, 214-15
sampling -, 66
Crossing-over, 15
Cubit measurement, 171-4
Cultural transmission, 183
Cytoplasmic
- effects, 211
- inheritance, 210
- units, 10
Darlington, 6, 14, 102
Darwin, 4
Davies, 19
Development
stability and instability (variation), 6-7, 19
Diallel(s), 68 et seq. 99, 124 et seq. 189-90, 192
- c, 125-9
defined -, 83, 86, 88, 91, 98
half -, 76, 95
-1, 72
quasi -, 189-90
- set, 71, 124, 128
- table, 74, 75, 76, 80, 125
undefined -, 85 et seq. 90 et seq.
- Vr, 68-72, 76-83, 86-90, 94-7, 125-7
- Wr, 68-72, 76-83, 86-90, 94-7, 125-7
- Wr/Vr graph, 69-72, 77, 81-2, 85, 87-8, 94-6, 125, 127-9
- Wr + Vr and Wr - Vr, 70, 72, 78-9, 94-5, 97, 99, 127
Dispersion (of genes) (v. association)
Dominance, 21-4, 32-4, 47, 49-50, 59, 65, 69-70, 74-5, 77-9, 88, 94-8, 99-100, 154-7, 195, 211-17
degree of -, 33, 46-7, 49, 82
direction of -, 49, 59, 70, 82-3, 105, 214-15
partial -, 32-3, 46, 85, 216
- ratio, 46, 50, 60-1, 82, 85, 155, 205, 216
super (or over) -, 33, 34, 47, 49, 110, 198, 207
Drosophila, 3-4, 6-9, 12, 18-19, 21, 26, 27, 29, 30, 33, 46, 72-3, 77, 85, 94, 131-3, 146, 156, 205, 214, 216, 218
Duplicate (interaction), 103-4, 109, 111, 115-16, 125-9, 217
Ear conformation (in barley), 121-2
Eaves, 183
Effective factors, 199 et seq.
interpretation of -, 206
- K1, 207
- K2, 208-9
number of (k) -, 199-209
Environment, 1, 26, 32, 130 et seq. 181, 183, 193, 196
- agencies, 19
common -, 54, 170, 179
- e, 134-8, 141-62
Epistasis, 14, 108
Equilinearity (of inheritance), 7-8, 74, 215
Experimental design, 210 et seq.
Extra-nuclear element, 91
F1, 8-10, 36-8, 43, 48, 51-2, 57-8, 60-2, 68, 104-7, 109-11, 142-3, 149, 151-2, 156-7, 158-62, 203, 207, 211-12, 216
F2, 21, 34-8, 47 et seq. 51-62, 66, 69-70, 102, 104-7, 112-20, 143, 149, 151-2, 154, 156-7 et seq. 164, 167, 194, 197-8, 200, 209, 211
F3, 35-7, 51-6, 57-8, 61-2, 66, 106-7, 112-16, 118-24, 162, 207-9, 213-14
F4, 35-6, 54-6, 57-8, 61-3, 66, 114-15, 200, 209
F5, 54, 57
Facet number, 33, 46-7, 131
Falconer, 196
Fall ratio, 123-4
Family size, 37-8, 53-6, 174
Fertility, 4
Fibonacci series, 58
Fisher, 5, 39, 171, 192
Flower colour, 1, 28
Flowering time, 3, 91-8, 192-3
Frequency (allelic), 86-7, 98, 164-5, 196-8
- distribution, 3-4
Fulker, 175, 178, 179, 183, 192
Galton, 5, 6, 171
Gene(s)
distribution of -, 34, 49, 59, 90, 99, 125-9, 191
dominant -, 8
- effects (a), 200-2
location of -, 14 et seq. 26
major -, 1, 215
marker -, 8, 15-18, 218
nuclear -, 9, 191
number of -, 19, 199 et seq.
recognition of -, 1
Genetic analysis, 25 et seq.
Genotype, 1, 5, 23, 28, 32-3, 130
Goldschmidt, 33
Growth rate, 151
Grüneberg, 4
Haemoglobins, 28
Haploidy, 210
Hardy-Weinberg equilibrium, 191
Harrison, 8, 9, 11
Hayman, 61, 63, 64, 65, 66, 67, 75, 93
Heritability, 195 et seq.
Heritable agencies, 6
Hereditary component, 5, 19
- element, 7, 10
Hermaphrodite (organisms), 43, 184
Heterosis, 34, 109-11, 207
Hogben, 131
Holt, 182
Inbred lines, 6, 11, 26, 51, 185, 190, 196, 198
Inbreeding, 91, 191-2
coefficient of (f) -, 192
- depression, 207
Interference, 15-16
Interaction, 14, 19, 26, 43-5, 104 et seq. 216
classical types of -, 14, 102-4
genotype × environmental -, 50, 130 et seq. 177-9, 181, 184, 191-3
non-allelic -, 46, 65, 74-5, 78-9, 88, 94, 99 et seq. 104 et seq. 114-15, 124-9, 191-5, 196, 199-200, 211-12, 216-17
-0, 104, 111, 115-16, 125-9
partial -, 104
super -, 104
trigenic -, 101, 108, 109, 217
Iteration, 63-5
Jinks, 65, 86, 124, 150, 175, 178, 179, 183, 192, 209, 211
Johannsen, 5, 191
Kearsey, 65, 192
Krafka, 131
Kurtosis, 30
Law, 26
Least squares, 38, 122, 180
Lee, 171, 173
Linkage, 35, 52, 102, 116 et seq. 124, 191, 199, 202 et seq. 208, 211, 214-15
- disequilibrium, 191
- map, 15, 18-19
- of interacting genes, 117, 211
- phase, 211
Maize, 185, 197, 198
Man, 171 et seq. 174 et seq.
stature in -, 2, 3, 7, 217
Map distance, 15
Marital correlation, 171, 173, 191
Marker class, 15-19
Martin, 182
Mating - assortative, 173, 183, 191-2
- negative assortative, 191
- systems, 35, 175, 177, 191
Maternal care, 54
- effects, 42-3, 74, 190, 210
Mather, 6, 8, 9, 10, 11, 12, 14, 21, 37, 54, 57, 96, 102, 121, 122, 124, 191, 193, 205, 206, 207
M and J (Mather and Jinks), 51, 56, 57, 61, 62, 65, 66, 75, 90, 91, 93, 94, 98, 106, 117, 125, 131, 187, 190, 202, 207, 208, 212, 214
Matrices, 39, 64, 153-4, 177
Means, 4
components of -, 32 et seq.
additive ([d]), 34-40, 58-9, 70, 105-10, 134-8, 141, 144-6, 148-50, 152-5, 199-208, 212-13
dominance ([h]), 34-40, 57-9, 105-10, 149, 151-5, 207, 212-14
interaction, genotype × environment (g), 135-8, 144-62
interaction, non-allelic ([i], [j], [l]), 106-10, 200, 217
Mendel, 1, 2, 5
Mid-parent, 32-5, 57, 102, 199
Multiplicative action, 43-4
Mutation, 48, 57, 191, 206
Neuroticism, 175-80
Nicotiana, 3, 37-8, 41-3, 46, 60-1, 63, 86, 91-3, 96-7, 108-9
Non-heritable agencies, 6, 20, 26, 50, 52, 192
Normal distribution, 3
North Carolina designs
- 1, 184-9, 190, 192-3, 197
- 2, 189-90, 192-3
- 3, 65-6
Oats, 130
Orthogonal comparisons, 12, 141
Outbreeding species, 51
Palm print ridges, 182
Papaver (v. poppy)
Parent/offspring relation, 6-7, 168, 170, 171-4, 195, 211
Pearson, 5, 171, 173
Peas, 1, 5
Perkins, 124, 211
Phaseolus (v. beans)
Phenotype, 1, 5, 21, 23, 32, 130, 216
Pisum (v. peas)
Plant height, 37-42, 46, 60, 61-5, 108-9
Pleiotropy, 207
Polygenic combinations, 206-7
- systems, 20, 21 et seq. 26-7, 29, 124, 206, 217
Polyploid inheritance, 210
Poppy, 192-3
Potence ratio, 34, 155, 216
Powers, 41, 42, 45
Progeny test, 19, 26
Punnett, 102
Pure line, 5, 37, 178
Randomization, 37, 54, 181, 184, 185
Random mating, 54-5, 164, 171, 173-4, 177, 191
Rank, 52-4, 119-24, 211-12
Ranunculus (v. water crowfoot)
Reciprocal crosses, 7, 43, 68, 74-6, 91, 211
Recombination, 8, 10, 11, 15-19, 57, 117-21, 203-7
- value (p), 15-19, 117-24, 202-4
Regression, 9, 72, 77-8, 81, 94-5, 97, 127, 145-51, 156-7
Reinforcement and opposition (of dominance), 119, 121, 123, 207
Repulsion (v. coupling)
Robinson, 185, 186, 187, 197
S2, 114
S3, 35, 36, 55-6, 58, 106, 114, 200, 208, 214
S4, 36, 56, 57-8
S5, 57
Scale(s), 42 et seq. 131-2, 196, 216
transformation of -, 43-7, 51, 99, 131-2, 134, 193
Scaling tests, 36-7, 40-1, 99, 107-8, 117, 124
joint -, 37-42, 107-9, 212
Schizophyllum, 150-1
Searle, 39
Segregation, 15, 29, 43, 48, 52, 57
Selection, 21, 43, 49, 57, 166, 191, 196, 198, 204-7
intensity of - (S), 196
response to - (R), 196
stabilizing -, 96
Selfing, 5, 35, 66, 191
Sensitivity, 137, 142-4, 147-50, 156-7
Sex-chromosomes, 7
- linkage, 42-3, 210, 211
Shields, 175, 179
Sib-mating, 35, 55-6, 57-8
Sibs
full -, 163, 167-70, 172, 179, 181-3, 187, 211
half -, 164, 167-70, 211
Skewness, 3, 22, 30
Socio-economic class, 181
Somatic analysis, 25 et seq.
Spickett, 27
Standard deviation (error), 36, 40, 64, 78, 107, 136, 154
Statistics, 4
first degree -, 29, 214-15
second degree -, 29, 48, 60, 70, 110, 114-15, 120, 164, 211, 214-15
Sturtevant, 33
Sub-characters, 27-8
Substitution lines, 12, 72
Sugar beet, 28
Tester stock, 8
Thoday, 17, 18, 19
Tomatoes, 41, 42, 43, 44, 46, 214
Towey, 209
Triple test cross, 65
Triticum (v. wheat)
True-breeding line(s), 48, 59, 68, 70, 71, 104, 110, 116, 191, 193, 199, 211
Twins, 174 et seq.
dizygotic -, 171, 179-83
monozygotic -, 171, 174-83
Van der Veen, 124
Variability, conservation of, 57-8
Variance of a variance, 62-4
Variation,
balance sheet of -, 57 et seq.
components of -, 47 et seq. 59, 97, 163 et seq.
- D, 49-50, 52-6, 57-9, 60-5, 72, 80-2, 84-5, 113-14, 119-21, 124, 159, 162, 199-202, 203, 205, 207-8
- Dp, 87, 88-90, 96-8, 192
- DR, 87-90, 96-8, 164-70, 171-4, 174-7, 179-83, 184-90, 192-5, 195-8, 211
- Dw, 87, 89-90, 96-8
- D1, 122-3
- D2, 122-3
- E, 49-51, 52-4
- Eb, 53-6, 62-3, 169-70, 171-3, 174-7, 179, 182-3, 184, 188-90, 194, 195-8
- E~, 170, 171-4, 184, 195
- Ew, 53-6, 62-3, 169-70, 171-2, 174-7, 179, 182-3, 189, 192-4, 195-8
- E1, 53, 62, 121-2
- E2, 53, 62, 121-2
- F, 50, 60-1
- H, 49-50, 52-6, 57-9, 60-5, 72, 80, 84-5, 113-14, 119-21, 124, 159, 162, 205, 207
- HR, 87-90, 96-7, 164-5, 167-70, 171-3, 174-7, 179-83, 184-90, 192-5, 195-8
- H1, 122-3
- H2, 122-3
- I, 113-15
- I, J, L, IR, JR, LR, 194-5
fixable -, 49, 198
heritable -, 9, 19, 21, 48, 57, 171, 196
non-heritable -, 6-7, 15, 20, 24, 26, 48-51, 52-3, 80-2, 164, 172, 196, 212
partitioning of -, 59 et seq.
quantal -, 4
quasi-continuous -, 3-4, 8, 19
sampling -, 7, 36, 53-6, 61-2, 74, 86, 117, 163-4, 169-70, 172, 205, 207
unfixable -, 49
Vertebrae, 2, 3
Viability, 43, 48
Virk, 37, 41
Water crowfoot, 1
Weight(s), 38-40, 50, 62-6, 153, 180
Weight of fruit, 41-2, 44-5
Weighted estimates, 38, 180
Weighted least squares, 38-40, 62-5, 66, 153, 177
Wheat, 26, 27, 214
Wolstenholme, 17, 18, 19
Yield, 4, 27-8, 110, 185-9, 197
