Polymorphism
The evidence for genetic variation can be traced to Mendel’s experiments: The discovery
of the laws of heredity was made possible by the expression of segregating alleles. Since
that time, the study of genetic variation in natural populations has been characterized by a
gradual discovery of ever-increasing amounts of genetic variation. In the early decades of
this century geneticists thought that an individual is homozygous at most gene loci and that
individuals of the same species are genetically almost identical. Recent discoveries suggest
that, at least in outcrossing organisms, the DNA sequences inherited one from each parent
are likely to be different for nearly every gene locus in every individual: ie, that every
individual may be heterozygous at most, if not all, gene loci. But the efforts to obtain
precise estimates of genetic variation have been thwarted for various reasons.
Key words: genetic variation, molecular evolution, natural selection, DNA polymorphism
INTRODUCTION
This paper consists of two parts. First, I review the question of how much
molecular genetic variation exists in natural populations. Second, I present some
general considerations concerning the processes that contribute to maintain that
variation.
PROTEIN POLYMORPHISMS
Genetic variation is an attribute that cannot be exhaustively measured. It is not
possible, even if we wanted to, to examine every gene in every individual of a given
species, so as to obtain a complete enumeration of the genetic variation in the species.
The well-known solution in such a situation is to measure a sample from the group to
be evaluated. Two conditions need to be met for a valid extension of the results from
a sample to the whole set. First, the sample must be representative or unbiased;
second, the sample must be accurately measured. In the case at hand, the requirement
that the sample be unbiased applies to two levels: (1) the individual organisms sampled
must be, on the average, neither more nor less genetically variable than the population
as a whole; (2) the genes sampled must be neither more nor less polymorphic, on the
average, than the whole genome. And the condition of accuracy requires that genes
that are different be identified as such; ie, it requires that every allelic variant be
recognizable.
*Based on a lecture delivered in September, 1983 at the International Conference in Biochemical and
Developmental Genetics, Kos, Greece.
Received for publication October 23, 1983; accepted December 26, 1983.
Address reprint requests to Francisco J. Ayala, Dept. of Genetics, University of California, Davis, CA
95616.
DNA-SEQUENCE POLYMORPHISM
It has been known for more than a decade that only a small fraction, perhaps
less than 10%, of the nuclear DNA of eukaryotes is translated into protein. The
recently developed techniques of DNA cloning and sequencing have shown that genes
are separated from each other by long DNA sequences that do not become transcribed
into RNA. The genes themselves have a complex organization. At both ends they
have relatively short sequences that are present in the mature mRNA transcript, but
do not code for amino acids. Most genes contain, in addition, intervening sequences
(introns), which separate from each other the segments that code for the amino acids
(exons). The introns are transcribed in the nucleus, together with the rest of the gene,
but they are spliced out before the mRNA migrates to the cytoplasm.
The question of how much genetic variation exists in the DNA of an organism
can, thus, be formulated in various ways. One may ask the question about the whole
genome or about particular components such as, for example, the coding segments.
A number of genes have been sequenced in two or more related species, and it has
become apparent that different segments evolve at different rates. This suggests that
different kinds of segments may have different levels of polymorphism, a hypothesis
recently corroborated by direct evidence.
Slightom et al [4] have sequenced two alleles of the Aγ gene, which codes for
one of the polypeptides of fetal hemoglobin (Fig. 1). The two alleles are from a single
individual, one allele from the paternal and the other from the maternal chromosome.
The results are summarized in Figure 2. There are 13 substitutions of one nucleotide
by another and three segments deleted in one of the alleles (or inserted in the other).
None of the substitutions occurs in the exons; most (nine) are concentrated in the 5'
half of the long intron. Two deletions are each 4 np long (positions 741-744 and 791-
794 of the sequence); the third consists of 18 contiguous np (starting at position 1080).
If the Aγ gene is a typical example, it seems likely that at the level of the DNA
sequence every outcrossed individual will be heterozygous at nearly all, if not all,
loci (that is, if the noncoding sequences are taken into account). The question of
heterozygosity needs to be reformulated in terms of the proportion of nucleotide
differences, which may be called nucleotide heterozygosity or nucleotide diversity.
Trying to measure nucleotide heterozygosity, one encounters some ambiguity.
If only substitutions are considered, the nucleotide heterozygosity of Aγ is 13/1647 =
0.008. If the deletions are also taken into account, the question arises of how they are
to be counted. If each deleted segment is counted as one difference independently of
its length, then there are three additional differences between the two alleles and the
heterozygosity is 16/1647 = 0.010; if each deleted nucleotide is counted as one
difference, then the heterozygosity is 39/1647 = 0.024.
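The three ways of counting can be reproduced with a few lines of arithmetic. The sketch below uses only the counts given in the text (13 substitutions, deleted segments of 4, 4, and 18 nucleotides, and 1,647 sites compared); the function and variable names are illustrative and are not taken from Slightom et al [4].

```python
# Nucleotide heterozygosity of the two A-gamma alleles under three
# counting conventions. All counts are those quoted in the text;
# the names used here are illustrative only.

def nucleotide_heterozygosity(differences: int, length: int) -> float:
    """Proportion of differences between two alleles over `length` sites."""
    if length <= 0:
        raise ValueError("length must be positive")
    return differences / length

SEQUENCE_LENGTH = 1647         # nucleotide sites compared
SUBSTITUTIONS = 13             # single-nucleotide substitutions
DELETION_LENGTHS = [4, 4, 18]  # lengths of the three deleted segments

# 1. Substitutions only.
subs_only = nucleotide_heterozygosity(SUBSTITUTIONS, SEQUENCE_LENGTH)

# 2. Each deleted segment counted as one difference, whatever its length.
per_event = nucleotide_heterozygosity(
    SUBSTITUTIONS + len(DELETION_LENGTHS), SEQUENCE_LENGTH
)

# 3. Each deleted nucleotide counted as one difference.
per_nucleotide = nucleotide_heterozygosity(
    SUBSTITUTIONS + sum(DELETION_LENGTHS), SEQUENCE_LENGTH
)

print(f"{subs_only:.3f} {per_event:.3f} {per_nucleotide:.3f}")
# prints: 0.008 0.010 0.024
```

The threefold spread between the first and third values comes entirely from the counting convention, not from the data.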
TABLE 3. Increase in Genetic Variation Detected by Three Different Methods at the Adh Locus of
Drosophila melanogaster*
Method                        H' - H    n'e/ne
Sequential electrophoresis    0.00      1.00
Heat denaturation             0.02      1.03
Peptide mapping               0.10      1.20
*From Ayala [30].
TABLE 4. Increase in Genetic Variation for Three Groups of Organisms when Electrophoretically
Cryptic Protein Variation Is Taken Into Account*
                 Electrophoretic variation    Total variation
Organisms        H         ne                 H'        n'e
Invertebrates    0.134     1.155              0.278     1.386
Vertebrates      0.060     1.064              0.217     1.277
Plants           0.124     1.142              0.270     1.370
*It is assumed that the average increase in variation is 20%; ie, n'e/ne = 1.20. Average values from
Table 1.
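The entries of Table 4 can be recomputed from the electrophoretic heterozygosities alone. The sketch below assumes the standard relation between heterozygosity and the effective number of alleles, ne = 1/(1 - H), which is not stated explicitly in this section but is consistent with every row of the table; the function names are illustrative.

```python
# Recompute Table 4 from H, assuming ne = 1 / (1 - H) and a 20% increase
# in the effective number of alleles (n'e / ne = 1.20, as in the footnote).
# The relation ne = 1 / (1 - H) is an assumption of this sketch.

def effective_alleles(h: float) -> float:
    """Effective number of alleles for heterozygosity h."""
    return 1.0 / (1.0 - h)

def heterozygosity(ne: float) -> float:
    """Heterozygosity corresponding to an effective number of alleles."""
    return 1.0 - 1.0 / ne

def total_variation(h: float, increase: float = 1.20) -> tuple[float, float]:
    """Return (H', n'e) after multiplying ne by `increase`."""
    ne_total = effective_alleles(h) * increase
    return heterozygosity(ne_total), ne_total

ELECTROPHORETIC_H = {
    "Invertebrates": 0.134,
    "Vertebrates": 0.060,
    "Plants": 0.124,
}

for group, h in ELECTROPHORETIC_H.items():
    h_total, ne_total = total_variation(h)
    print(f"{group:14s} H={h:.3f} ne={effective_alleles(h):.3f} "
          f"H'={h_total:.3f} n'e={ne_total:.3f}")
```

A 20% increase in the effective number of alleles roughly doubles H for invertebrates and plants and more than triples it for vertebrates, because H is a nonlinear function of ne.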
384 Ayala
[Figure 1 diagram: map of the gene with a scale in base pairs (bp), 0 to 1500.]
Fig. 1. Organization of the Aγ globin gene. This gene is part of the β-globin cluster located in
chromosome 11. The functional gene consists of three exons (black) separated by two introns (white).
One intron separates the triplets coding for amino acids 30 and 31; the other is between the triplets for
amino acids 104 and 105. The Aγ gene consists of about 1,600 base-pairs (bp), 438 of which code for
the 146 amino acids of the polypeptide.
[Figure 2 plot: panel "Substitutions" (top) and panel "Deletions/Insertions" (bottom); horizontal
axis: sequence position, 0 to 1500.]
Fig. 2. Local distribution of nucleotide differences between two allelic Aγ genes. Nucleotide
substitutions are shown on top; deletions/insertions on bottom. A diagram of the gene is shown in the
middle; black regions are the exons, white regions are the introns, hatched regions are flanking
sequences. Data from Slightom et al [4].
The nucleotide heterozygosity in other genes for which two independent alleles
have been sequenced is given in Table 5. Three genes (Adh in Drosophila, Cκ in rats,
and Aγ in humans) have substitution heterozygosities between 1 and 2%. The DNA
sequenced for Adh and Cκ includes only coding regions and thus no deletions were
observed. For the insulin genes the substitution heterozygosity is only 0.003, but the
Molecular Polymorphisms-How and Why 385
TABLE 5. Heterozygosity at the Single Nucleotide Level
                                                 Length of       Heterozygosity
                                                 DNA sequence                     Substitutions
Organism                  Gene region            (base pairs)    Substitutions    and deletions    References
Drosophila melanogaster   Adh (a)                765             0.009            0.009            [31]
Mouse                     IgG2a                  1,108           0.100            0.100            [32]
Rat                       Immunoglobulin Cκ (a)  1,172           0.018            0.018            [33]
Man                       Globin Aγ              1,647           0.008            0.024            [4]
Man                       Insulin                2,721           0.003            0.175            [34]
(a) Only coding sequences are included in these comparisons.
All genetic variations arise first by the mutation process, broadly understood so
as to include not only the substitution of one nucleotide by another but also deletions,
duplications, and reorganizations of the DNA. If the mutants modify the adaptations
of organisms, they will increase or decrease in frequency as a result of natural
selection. If they have no effect on adaptation, mutants will drift in frequency as a
consequence of random sampling from generation to generation. The hypothesis that
considers a mutation or a polymorphism as adaptively neutral is the starting null
hypothesis of the population geneticist. In recent years, however, Kimura and others
[5-7] have argued that, with respect to DNA and protein evolution, adaptive neutrality
is no longer just a null hypothesis, but a notion positively supported by evidence.
Two approaches may be followed to test the hypothesis of neutrality versus
natural selection. One consists of testing each particular polymorphism to ascertain
whether natural selection is implicated [see 8-12]. The other approach is global: It
uses theoretical reasoning or empirical evidence to argue for or against the role of
natural selection with respect to a general kind of variation, molecular variation in
the case at hand [eg, 13,14].
I want to examine here two general arguments (one positive, the other negative)
that have been advanced to support the adaptive neutrality of protein variation.
The positive argument relies on the apparent existence of a molecular evolutionary
clock. When the rate of evolution is examined in a protein such as cytochrome c, it is
observed that amino acid substitutions have occurred in different branches of the
phylogeny at different times and at approximately constant rates. What is meant by
the phrase “approximately constant rates” is that the substitutions occur with a
constant probability, but stochastic variation is expected.
Langley and Fitch [15] have tested statistically the evolution of seven proteins
in 17 mammals and found that the variance in the rate of amino acid substitutions is
much too large to be consistent with the hypothesis that the rate was stochastically
constant as predicted by the neutrality theory (Table 7). It is possible, however, to
maintain that the rate is stochastically constant but that it has a variance greater than
expected from a Poisson distribution [16]. One additional problem with this sort of
evidence in support of the neutrality hypothesis is that stochastically constant rates of
molecular evolution are also predicted by models of natural selection [17]. Therefore,
the existence of a molecular evolutionary clock cannot be used in support of either
the neutrality or the adaptive hypothesis.
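The idea of a rate that is "stochastically constant" can be made concrete with the index of dispersion: for Poisson-distributed substitution counts the variance equals the mean, so the ratio of variance to mean should be close to 1. The sketch below only illustrates that idea; it is not the procedure used by Langley and Fitch [15], and the counts in it are hypothetical numbers chosen for illustration, not data.

```python
# Index of dispersion (variance / mean) for substitution counts along
# lineages of equal duration. Under a Poisson process the expected value
# is 1; values well above 1 indicate more variation in rate than a simple
# molecular clock predicts. The example counts are HYPOTHETICAL.
from statistics import mean, variance

def dispersion_index(counts: list[int]) -> float:
    """Sample variance divided by the mean of the counts."""
    if len(counts) < 2:
        raise ValueError("need at least two lineages")
    m = mean(counts)
    if m == 0:
        raise ValueError("mean count is zero")
    return variance(counts) / m

clock_like = [2, 4, 6]       # hypothetical: variance 4, mean 4
overdispersed = [1, 4, 10]   # hypothetical: variance 21, mean 5

print(dispersion_index(clock_like))     # 1.0
print(dispersion_index(overdispersed))  # 4.2
```

An index above 1 does not by itself decide between the two hypotheses: as noted above, a stochastically constant rate with a variance larger than Poisson is compatible with neutrality [16], and constant rates are also predicted by models of natural selection [17].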
others, then a number of individuals would have less than optimal genotypes at each
polymorphic locus subject to natural selection. If the number of such loci is very
large, a population might be unable to withstand the burden of so many poorly fit
individuals. This argument deserves to be examined in detail because the neutrality
hypothesis was largely proposed as the only alternative left for those who rejected
natural selection because of the enormous genetic load that would be created by
ubiquitous protein polymorphisms.
The genetic load argument is strongest in the case of heterosis; ie, when a
polymorphism is maintained owing to the adaptive superiority of the heterozygotes.
Sved et al [18], King [19], and others have suggested that an efficient method for
testing whether heterosis plays a major role in natural populations is to compare the
fitness of ordinary outbred individuals with the fitness of individuals homozygous for
a larger-than-average proportion of loci. This method permits one to ascertain whether
heterozygotes are at an overall advantage over homozygotes.
Numerous experiments, particularly in Drosophila, have shown that an increase
in homozygosity results in a decrease in fitness. The experiments published before
1970 were, in general, carried out by measuring particular components of fitness,
mostly viability [20] and fertility [21,22], and were not, in any case, performed under
population conditions [23]. Sved and Ayala [24] devised a method by which fitness
as a whole can be measured in Drosophila flies made homozygous for full chromosomes,
under conditions of equilibrium population density and a stable age distribution.
This method has now been used in a number of experiments that yield consistent
results in that the fitness of homozygotes for one full chromosome is invariably very
low, in the sublethal range.
The method of Sved and Ayala [24] is as follows. Flies homozygous and
heterozygous for whole chromosomes sampled from a natural population are obtained
using the method shown in Figure 3. The flies recovered in the F3 are used to
establish experimental populations, where the course of natural selection can be
studied over many generations. Since the balancer chromosome inhibits recombination,
only two kinds of viable zygotes can exist at any time: those homozygous for
the wild chromosome and those heterozygous for the wild and the balancer chromosome;
all zygotes homozygous for the balancer chromosome die before completing
development. If the homozygotes for the wild chromosome have lower fitness than
the balancer heterozygotes, a stable equilibrium will eventually be established
between the two types of flies. The relative fitness of the homozygotes can be directly
calculated from the zygotic equilibrium frequencies. If the balancer heterozygotes
have lower fitness than the chromosomal homozygotes, the balancer chromosome
will gradually decrease in frequency; the relative fitness of the two kinds of flies can
then be estimated from the rate of elimination. Control experimental populations are
set up with flies heterozygous for different wild chromosomes, and for these and the
balancer chromosome. The heterozygotes for different wild chromosomes have genetic
constitutions comparable to flies in a natural population. The control populations
permit an estimation of the fitness of balancer heterozygotes relative to wild
heterozygotes. This estimate of fitness can be used to estimate the fitness of chromosomal
homozygotes relative to flies heterozygous for random combinations of wild
chromosomes.
Fig. 3. Crosses used to obtain large numbers of Drosophila flies homozygous for a chromosome
sampled from a natural population. A. P generation: A wild male is crossed to females of a balanced
marker stock. The balancer stock contains a chromosome with multiple inversions (to inhibit
recombination) and two mutant markers, one dominant and the other recessive; the other chromosome
contains the recessive mutant marker. F1 generation: A single F1 male heterozygous for one wild
chromosome and the marker chromosome is crossed to females of the balanced marker stock. F2
generation: Males and females heterozygous for the same wild chromosome and the balanced marker
chromosome are intercrossed. Three kinds of progeny are expected in the F3 generation: One fourth
should be homozygous for the wild chromosome, one half should be heterozygous for the wild and the
balanced marker chromosome, one fourth should be homozygous for the balanced marker chromosome,
but this carries also a recessive lethal gene and these flies die. B. Control crosses are made by
intercrossing F2 flies heterozygous for different wild chromosomes. The wild flies in the F3 generation
of these crosses are heterozygous for different wild chromosomes, and thus are genetically similar to
wild flies.
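The equilibrium calculation described above can be sketched with a deliberately simplified model. I assume discrete generations, random mating, and selection acting only through the relative fitness w of wild homozygotes (the balancer heterozygotes have fitness 1, and the balancer homozygotes die). Under these assumptions the wild chromosome reaches an equilibrium frequency p = 1/(2 - w), so that w can be recovered from the proportion x of wild homozygotes among viable zygotes as w = (3x - 1)/(2x). This is an illustration of the logic, not the estimator actually used by Sved and Ayala [24], and all function names are hypothetical.

```python
# Simplified balanced-lethal model (assumptions of this sketch: discrete
# generations, random mating, one fitness value w for wild homozygotes,
# fitness 1 for balancer heterozygotes, balancer homozygotes lethal).

def equilibrium_wild_frequency(w: float) -> float:
    """Equilibrium frequency of the wild chromosome, p = 1 / (2 - w)."""
    return 1.0 / (2.0 - w)

def homozygote_fraction(p: float) -> float:
    """Wild homozygotes among viable zygotes: p^2 / (p^2 + 2pq) = p / (2 - p)."""
    return p / (2.0 - p)

def homozygote_fitness(x: float) -> float:
    """Relative fitness w recovered from the equilibrium fraction x."""
    if not (1.0 / 3.0 <= x <= 1.0):
        raise ValueError("x must lie between 1/3 (w = 0) and 1 (w = 1)")
    return (3.0 * x - 1.0) / (2.0 * x)

def iterate_population(w: float, p0: float = 0.5, generations: int = 200) -> float:
    """Follow the wild-chromosome frequency over successive generations."""
    p = p0
    for _ in range(generations):
        q = 1.0 - p
        # Survivors after selection: wild homozygotes (w * p^2) and
        # balancer heterozygotes (2pq); balancer homozygotes contribute nothing.
        p = (w * p * p + p * q) / (w * p * p + 2.0 * p * q)
    return p

w = 0.5
p_eq = iterate_population(w)
x_eq = homozygote_fraction(p_eq)
print(round(p_eq, 4), round(x_eq, 4), round(homozygote_fitness(x_eq), 4))
# prints: 0.6667 0.5 0.5
```

In this model a lethal chromosome (w = 0) is held at p = 1/2, with one third of viable zygotes homozygous for it, whereas any w of 1 or more leads to the elimination of the balancer, which is the second case described in the text.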
With this method, overall fitness rather than a specific fitness component is
measured under population conditions. The experiments show that under these
conditions all chromosomes become either lethal or semilethal. Figure 4 shows the fitness
distribution of 23 second chromosomes sampled from a natural population of
Drosophila melanogaster [25]. Although the method is extremely laborious, studies
have been conducted in three species of Drosophila. The results are summarized in
Table 8.
In order to estimate the number of loci that can be maintained by natural
selection in view of the fitness experiments, the assumption is made that selective
interactions between loci are multiplicative and that there is no linkage disequilibrium
[26]. If at each locus maintained by heterosis the heterozygote has a 0.01 selective
REFERENCES
1. Dobzhansky Th, Ayala FJ, Stebbins GL, Valentine JW: "Evolution." San Francisco, CA: Freeman,
1977.
2. Marshall DR, Brown AHD: The charge-state model of protein polymorphism in natural
populations. J Mol Evol 6:149-163, 1975.
3. Ramshaw JAM, Coyne JA, Lewontin RC: The sensitivity of gel electrophoresis as a detector of
genetic variation. Genetics 93:1019-1037, 1979.
4. Slightom JL, Blechl AE, Smithies O: Human fetal Gγ- and Aγ-globin genes: Complete nucleotide
sequences suggest that DNA can be exchanged between these duplicated genes. Cell 21:626-638,
1980.
5. Kimura M: Evolutionary rate at the molecular level. Nature 217:624-626, 1968.
6. King JL, Jukes TH: Non-Darwinian evolution. Science 164:788-798, 1969.
7. Kimura M, Ohta T: Protein polymorphism as a phase of molecular evolution. Nature 229:467-469,
1971.
8. Marinkovic D, Ayala FJ: Fitness of allozyme variants in Drosophila pseudoobscura. I. Selection at
the Pgm-1 and Me-2 loci. Genetics 79:85-95, 1975a.
9. Marinkovic D, Ayala FJ: Fitness of allozyme variants in Drosophila pseudoobscura. II. Selection at
the Est-5, Odh, and Mdh-2 loci. Genetical Research 24:137-149, 1975b.
10. Snyder TP, Ayala FJ: Temperature and density effects on fitness at the Mdh-2 and Pgm-1 loci of
Drosophila pseudoobscura. Genetica 51:59-67, 1979a.
11. Snyder TP, Ayala FJ: Frequency-dependent selection at the Pgm-1 locus of Drosophila
pseudoobscura. Genetics 92:995-1003, 1979b.
12. Tosic M, Ayala FJ: Density- and frequency-dependent selection at the Mdh-2 locus in Drosophila
pseudoobscura. Genetics 97:679-701, 1981.
13. Ayala FJ, Tracey ML, Barr LG, McDonald JF, Perez-Salas S: Genetic variation in natural
populations of five Drosophila species and the hypothesis of selective neutrality of protein polymorphisms.
Genetics 77:343-384, 1974.
14. Ayala FJ: Protein evolution in related species: Adaptive foci. Johns Hopkins Med J 138:262-278,
1976.
15. Langley CH, Fitch WM: An examination of the constancy of the rate of molecular evolution. J Mol
Evol 3:161-177, 1974.
16. Gillespie JH, Langley CH: Are evolutionary rates really variable? J Mol Evol 13:24-34, 1979.
17. Gillespie JH: Polymorphism and molecular evolution in a random environment. Genetics
93:737-754, 1979.
18. Sved JA, Reed TE, Bodmer WF: The number of balanced polymorphisms that can be maintained in
a natural population. Genetics 55:469-481, 1967.
19. King JL: The gene interaction component of the genetic load. Genetics 53:403-413, 1966.
20. Dobzhansky T, Spassky B: Genetics of natural populations. XXXIV. Adaptive norm, genetic load
and genetic elite in D. pseudoobscura. Genetics 48:1467-1485, 1963.
21. Gowen JW (ed): "Heterosis." Ames, IA: Iowa State College Press, 1952.
22. Marinkovic D: Genetic loads affecting fertility in natural populations of Drosophila pseudoobscura.
Genetics 57:701-709, 1967.
23. Latter BDH, Robertson A: The effects of inbreeding and artificial selection on reproductive fitness.
Genet Res 3:110-138, 1962.
24. Sved JA, Ayala FJ: A population cage test for heterosis in Drosophila pseudoobscura. Genetics
66:97-113, 1970.
25. Tracey ML, Ayala FJ: Genetic load in natural populations: Is it compatible with the hypothesis that
many polymorphisms are maintained by natural selection? Genetics 77:569-589, 1974.
26. Lewontin RC, Hubby JL: A molecular approach to the study of genic heterozygosity in natural
populations. II. Amount of variation and degree of heterozygosity in natural populations of
Drosophila pseudoobscura. Genetics 54:595-609, 1966.
27. Seager RD, Ayala FJ, Marks RW: Chromosome interactions in Drosophila melanogaster. II. Total
fitness. Genetics 102:485-502, 1982.
28. Tosic M, Ayala FJ: "Overcompensation" at an enzyme locus in Drosophila pseudoobscura. Genet
Res Camb 36:57-67, 1980.
29. Ayala FJ, Kiger JA Jr: "Modern Genetics, 2nd ed." Menlo Park, CA: Benjamin/Cummings, 1984.
30. Ayala FJ: Genetic variation in natural populations: Problem of electrophoretically cryptic alleles.
Proc Natl Acad Sci USA 79:550-554, 1982.
31. Benyajati C, Place AR, Powers DA, Sofer W: Alcohol dehydrogenase gene of Drosophila
melanogaster: Relationship of intervening sequences to functional domains in the protein. Proc Natl Acad
Sci USA 78:2717-2721, 1981.
32. Schreier PH, Bothwell ALM, Mueller-Hill B, Baltimore D: Multiple differences between the nucleic
acid sequences of the IgG2aa and IgG2ab alleles in the mouse. Proc Natl Acad Sci USA
78:4495-4499, 1981.
33. Sheppard HW, Gutman GA: Allelic forms of rat κ chain genes: Evidence for strong selection at the
level of nucleotide sequence. Proc Natl Acad Sci USA 78:7064-7068, 1981.
34. Ullrich A, Dull TJ, Gray A, Brosius J, Sures I: Genetic variation in the human insulin gene. Science
209:612-615, 1980.
35. Grula JW, Hall TJ, Hunt JA, Guigni TD, Davidson EH, Britten RJ: Sea urchin DNA sequence
polymorphism and reduced interspecies differences of the less polymorphic DNA sequences.
Evolution 36:665-676, 1982.
36. Mourao CA, Ayala FJ, Anderson WW: Darwinian fitness and adaptedness in experimental popula-
tions of Drosophila willistoni. Genetica 43:552-574, 1972.
37. Wilton AN, Sved JA: X-chromosomal heterosis in Drosophila melanogaster. Genet Res Camb
34:303-315, 1979.
38. Sved JA: An estimate of heterosis in Drosophila melanogaster. Genet Res 18:97-105, 1971b.
39. Sved JA: Fitness of third chromosome homozygotes in Drosophila melanogaster. Genet Res
25:197-200, 1975.