Genetics Genomics - en PDF
Authors:
Chapter 1: Valéria László
Chapter 2, 3, 4, 6, 7: Sára Tóth
Chapter 5: Erna Pap
Chapter 8, 9, 10, 11, 12, 13, 14: Csaba Szalai
Chapter 15: András Falus and Ferenc Oberfrank
Summary
The book contains the substance of the lectures, and partly of the practicals, of the subject
‘Genetics and Genomics’ taught at Semmelweis University to medical, pharmacy
and dental students. The book does not cover basic genetics and molecular biology,
but rather topics from human genetics, mainly from a medical point of view. Some of the
15 chapters deal with medical genetics, but the chapters also introduce the basics
of cell division, cytogenetics, epigenetics, developmental genetics, stem cell
biology, oncogenetics, immunogenetics, population genetics, evolutionary genetics,
nutrigenetics, and a relatively new subject, human genomics, and its applications to
the study of the genomic background of complex diseases, to pharmacogenomics and to
the investigation of genome-environment interactions. As genomics belongs to
systems biology, one chapter introduces the basic terms of systems biology and,
concentrating on diseases, shows some examples of the application and utilization of this
scientific field. Modern human genetics is also associated
with several ethical, social and legal issues. The last chapter of this book deals with these
issues. At the end of each chapter there are questions with which readers can
check whether they have understood and learned the chapter. Because it is an e-book,
some terms and definitions have a hyperlink to more detailed explanations on the World
Wide Web. Besides university students, the book is also recommended to all those who
are interested in modern medical genetics and genomics and want to be up to date in
these subjects.
Typotex Kiadó
2013
COPYRIGHT: András Falus, Valéria László, Ferenc Oberfrank, Erna Pap, Dr. Csaba
Szalai, Sára Tóth, Budapest University of Technology and Economics
Figure 1.1. Phases (G1, S, G2, M) and checkpoints (G1, G2, M) of the cell cycle. The cell cycle control
system allows a checkpoint to be passed only if conditions are suitable for the cell to proceed
to the next phase.
The main checkpoints are the following. The G1 checkpoint (in higher eukaryotes it is
referred to as the restriction point), where first of all the integrity of DNA is checked,
operates at the end of the G1 phase. The second checkpoint, the G2 checkpoint, is at the end of the G2 phase;
here the accuracy and integrity of DNA are monitored. Finally, the
function of the M checkpoint, in the metaphase of mitosis, is to ensure the appropriate
attachment of all chromosomes to the microtubules of the mitotic spindle before the
duplicated chromosomes are separated. A brief summary of the
multicellular (mammalian) cell cycle and its regulation follows.
1.1.1. G0 - G1 transition
In an adult multicellular organism most cells do not divide; they are in a special
phase, the G0 phase. G0 phase cells lack functional cyclins and cyclin-dependent kinases (Cdks), the
main cell cycle regulators. If proliferation is necessary, these G0 phase cells have to
re-enter the cell cycle, which essentially means passing the G1 checkpoint or restriction point. This is
induced by growth factors or extracellular matrix components, which initiate the transcription
and translation of D cyclin and reduce the level of Cdk inhibitors by stimulating their
proteasomal degradation. These Cdk inhibitors (p16, p15, p18 and p19) specifically
inhibit Cdk4 and Cdk6 by preventing the binding of the activating D cyclin, and also inhibit the
activity of the Cdk-cyclin complex. The main targets of the active Cdk4/6-D cyclin complex are the
pRb (Rb stands for retinoblastoma, a malignant disease of the retina caused by
mutation of the gene encoding pRb), p107 and p130 proteins. Phosphorylation of these
proteins causes conformational changes, and they release E2F transcription factors. This
is the turning point in the G0-G1 transition, because E2F transcription factors induce the
transcription of several S-phase specific genes, such as those of E cyclin, A cyclin, thymidine
kinase and DNA polymerase. E cyclin activates Cdk2, whose main target, like that of
Cdk4/6-D cyclin, is the Rb protein, the phosphorylation of which is thereby enhanced (positive
feedback). Cdk2 has another activator, A cyclin; their complex is essential for S phase
initiation (Figure 1.2).
Adverse environmental effects, e.g. hypoxia (excessive proliferation of cells
may result in insufficient blood flow) or DNA damage, activate the G1 checkpoint machinery,
which stops the cell cycle. The amount and activity of p53 increase, which in turn
induces the transcription of a Cdk inhibitor protein, p21. p21 is a general Cdk inhibitor;
it inhibits all Cdk-cyclin complexes (Cdk4/6-D cyclin, Cdk2-E cyclin and Cdk2-A
cyclin), so the cell cycle is halted and the cell cannot enter S phase. This general Cdk
inhibitor family has two other members, p27 and p57. These proteins prevent the
duplication of damaged DNA and suspend the cell cycle, allowing error correction. Briefly,
their activity prevents the cell cycle from producing genetically different cells (Figure 1.2).
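The regulatory logic described above can be sketched as a small Boolean model. This is only an illustrative simplification: the class and function names are hypothetical, every component is treated as simply on or off, and p27 and p57 are left out.

```python
from dataclasses import dataclass


@dataclass
class G1Signals:
    """Inputs to the simplified restriction-point decision (illustrative names)."""
    growth_factor: bool  # mitogenic stimulus present
    dna_damage: bool     # DNA damage (or hypoxia) detected


def passes_restriction_point(signals: G1Signals) -> bool:
    """Return True if the simplified cell would proceed to S phase."""
    # DNA damage raises p53, which induces the general Cdk inhibitor p21.
    p21_active = signals.dna_damage
    # Growth factors induce D cyclin and promote degradation of p15/p16/p18/p19.
    cyclin_d_present = signals.growth_factor
    ink4_inhibitors_present = not signals.growth_factor
    # Cdk4/6-D cyclin is active only with cyclin present and no inhibitor bound.
    cdk46_active = (cyclin_d_present
                    and not ink4_inhibitors_present
                    and not p21_active)
    # Active Cdk4/6 phosphorylates pRb, which then releases E2F.
    e2f_free = cdk46_active
    # E2F drives E and A cyclin expression; p21 also blocks the Cdk2 complexes.
    cdk2_active = e2f_free and not p21_active
    return cdk2_active


print(passes_restriction_point(G1Signals(growth_factor=True, dna_damage=False)))  # True
print(passes_restriction_point(G1Signals(growth_factor=True, dna_damage=True)))   # False
```

In this sketch a growth factor alone is sufficient for progression, while DNA damage overrides it through p21, mirroring the order of events in the text.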
[Figure 1.2 (diagram; only the labels were recovered): M, G1, restriction point, S, G0; stimulation, inhibition; Cdk4/6-D cyclin, Cdk2-E cyclin, A cyclin; pRb-E2F, pRb-PO4, E2F; Cdk inhibitors (CKI) p15, p16, p18, p19 and p21, p27, p57; p53.]
The genes encoding Cdk inhibitors are tumor suppressor genes, whose mutations in
homozygous form (recessive) are the main contributors to tumor development. The
best-known tumor suppressor genes are those encoding p53 and pRb.
About half of all tumors lack functional p53. The genes encoding cell cycle stimulating
proteins (Cdks, cyclins, growth factors and many others) are protooncogenes. Their
mutation in heterozygous form (dominant) is also involved in tumor development.
1.1.3. G2 – M transition
The regulation of the G2-M transition is better understood than that of the G1-S transition. The
M-phase is triggered by MPF (M-phase or Mitosis Promoting Factor), which is a complex of
B cyclin and Cdk1. After these proteins bind, post-translational modifications
are required for final activation. The Cdk1 component of the complex is the substrate of
two kinases: one is an activating kinase, which adds a phosphate group to a threonine residue; the
other is an inactivating kinase, which phosphorylates a tyrosine residue of the protein.
The latter phosphate is removed by a phosphatase (the product of a gene belonging to the Cdc25 gene
family), and this is the last step in MPF activation (Figure 1.3). All these events
happen only if the G2 checkpoint machinery finds the DNA undamaged and correctly replicated.
MPF has numerous substrates. First of all it activates the Cdc25 protein, so that by
positive feedback more and more MPF is activated. In mammalian cells there are
three Cdc25 phosphatases, Cdc25A, B and C; at this point of cell cycle regulation the C type
operates.
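The activation steps of MPF can be summarized in a short sketch. The class and function names are hypothetical, and the model deliberately reduces each modification to a yes/no flag; it only illustrates that the complex becomes active once cyclin is bound, the activating phosphate is present and the inhibitory phosphate has been removed by Cdc25.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cdk1:
    """Simplified state of the Cdk1 subunit (illustrative model)."""
    cyclin_b_bound: bool = False
    activating_phosphate: bool = False
    inhibitory_phosphate: bool = False

    def is_active_mpf(self) -> bool:
        # Active MPF needs cyclin B and the activating phosphate,
        # and must have lost the inhibitory phosphate.
        return (self.cyclin_b_bound
                and self.activating_phosphate
                and not self.inhibitory_phosphate)


def apply_cdc25(cdk1: Cdk1, g2_checkpoint_ok: bool) -> Cdk1:
    """Cdc25 removes the inhibitory phosphate only if the G2 checkpoint is passed."""
    if g2_checkpoint_ok:
        return replace(cdk1, inhibitory_phosphate=False)
    return cdk1


primed = Cdk1(cyclin_b_bound=True, activating_phosphate=True, inhibitory_phosphate=True)
print(primed.is_active_mpf())                           # False
print(apply_cdc25(primed, True).is_active_mpf())        # True
print(apply_cdc25(primed, False).is_active_mpf())       # False
```

The positive feedback mentioned in the text (active MPF activating more Cdc25) is not modeled here; adding it would require tracking a population of complexes rather than a single one.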
Then MPF triggers M-phase through the phosphorylation of further target proteins,
such as lamin A, B and C, components of the nuclear lamina, a structure attached to the inner
nuclear membrane. This results in the disintegration of the nuclear membrane.
MPF indirectly inhibits actomyosin ATPase activity, causing rearrangement of
microfilaments and consequently rounding of the cell, and also inhibiting premature
cytokinesis.
One of the major events, chromosome condensation, is also triggered by MPF,
through the phosphorylation of condensins and of H1 and H3 histones.
Phosphorylation of MAPs (microtubule-associated proteins) changes the
arrangement of the microtubule system and induces the formation of the mitotic spindle needed for
chromosome separation.
In the G2-M transition the APC (Anaphase Promoting Complex), which has a role later in the
metaphase-anaphase transition, is indirectly activated by MPF.
1.1.4. M-phase
The M-phase is a complex series of successive events, usually
divided into mitosis and cytokinesis. In the first half of the M-phase, in mitosis, the
duplicated DNA is divided in two; this is followed by the separation of the cytoplasm, the phase of
cytokinesis.
Metaphase. The chromosomes are arranged in the equatorial plane of the cell with the
help of kinetochore microtubules. The kinetochore regions face the two poles of the cell, and
the kinetochore microtubules bind to the sister chromatids of a chromosome from opposite
directions.
Anaphase. The sister chromatids of the chromosomes split and move toward the poles of
the cell. In the first half of anaphase (anaphase A) the kinetochore microtubules operate; in the second
half (anaphase B) the polar microtubules operate. It is the shortest phase of
mitosis.
Cytokinesis. The separation of the cytoplasm begins in late anaphase and is
completed after telophase. In the middle of the cell, perpendicular to the axis of the
mitotic spindle, a cleavage furrow appears, which gradually deepens, so that the
connection between the two half cells narrows. The overlapping region of the polar
microtubules forms the so-called midbody. Finally, the cytoplasm splits completely.
this packaging is still not known in detail. The major points of a widely accepted model
are described below (Figure 1.4).
The 2 nm wide DNA double helix wraps around octamers of histones (two each of the H2A, H2B,
H3 and H4 histone molecules), forming nucleosomes, disc-like structures connected by
the continuous DNA molecule. This is called the nucleosomal structure, which has a diameter of
11 nm. H1 histone folds six nucleosomes in one plane to give a fiber 30 nm in diameter,
called the chromatin fiber or solenoid. The chromatin fiber is attached to a protein scaffold and
forms loops. These loops are the basic units of replication and transcription, and this
structure is 300 nm wide. Finally, it is further compressed and folded to produce the
chromatids of the 1400 nm wide metaphase chromosome (Figure 1.4). The final step of
chromosome condensation is induced by the MPF-activated condensins. There are two
protein complexes of similar structure that influence different DNA functions: the
condensins and the cohesins. They are composed of different SMC (structural
maintenance of chromosomes) proteins, which have ATPase activity and regulatory
functions and associate in a ring-like structure (Figure 1.5).
Five pairs of human chromosomes have not only a primary but also a secondary
constriction or NOR (nucleolar organizer region), which contains a high number of
copies of the large (45S) rRNA gene.
1.1.4.2. Disappearance and re-formation of nuclear envelope
As mentioned before, the lamins of the nuclear lamina, attached to the inner surface of the
nuclear envelope, are phosphorylated by MPF, causing the dissociation of the nuclear
membrane into vesicles. Lamin B remains in the membrane of the vesicles, but lamins A and
C are found in soluble form in the cytoplasm. The highly organized nuclear pores
also disassemble. At the end of mitosis, in telophase, phosphatases are activated
and dephosphorylate the lamins. The re-formation of the nuclear envelope begins on the
surface of the chromosomes. They move closer to each other, the membranes fuse and
the pores are also reassembled. Finally the chromosomes decondense to chromatin.
1.1.4.3. Structure and role of mitotic spindle
The components of the mitotic spindle are the centrosomes and the microtubules. In
human cells the major microtubule organizing center (MTOC) is the centrosome; in
interphase cells it is usually located near the nucleus. The structure of the centrosome is
as follows: in the center there are two perpendicular cylindrical bodies (the
centrioles), which are connected by proteins at their bases. The centrioles are made of
9x3 microtubules in a windmill-like arrangement. Around them lies an amorphous,
unstructured material, the pericentriolar matrix, in which numerous
different proteins are found. The microtubules grow out from the pericentriolar matrix
in a star-like manner; this region is called the aster (Figure 1.7). The minus ends of the microtubules
face the centrioles, and their plus ends face outward. The organization of the microtubules and the
polymerization of tubulin heterodimers are induced by a special subtype of tubulin, found in
γ-tubulin rings in the pericentriolar matrix.
To produce genetically identical cells through the cell cycle not only the accurate
replication and separation of DNA, but the precise duplication and division of
Mitotic spindle organization also requires the activation of MPF. At the beginning of
division the MAPs (microtubule-associated proteins) are phosphorylated by MPF, which in
turn changes the characteristic interphase microtubule arrangement and induces the
development of the mitotic spindle. In interphase there are few, long and relatively stable
microtubules. In contrast, the mitotic spindle is characterized by many short and highly
dynamic microtubules.
In prophase many dynamic microtubules grow out from the
centrosomes in all directions. Attachment of the plus end to any structure stabilizes a microtubule.
Microtubules growing from different poles may bind to each other, giving rise to the
partly overlapping polar microtubules. In the overlapping region plus-end-directed motor proteins
are found, which stabilize the polar microtubules and are also needed to push apart the
two poles in anaphase B.
1.1.5. Cytokinesis
The division of the cytoplasm is carried out by components of the cytoskeleton other than
those separating the chromosomes, but the two cytoskeletal systems are not independent.
The site of cytoplasmic cleavage is determined by the mitotic spindle. An asymmetrically
positioned mitotic spindle results in asymmetric cytokinesis, i.e. in cells of different sizes. In
late anaphase (anaphase B), after the migration of the sister chromatids to the poles,
a contractile ring composed of actin and myosin II filaments is formed beneath the
plasma membrane, perpendicular to the mitotic spindle axis. The regulation
of contractile ring development is not exactly known, but a role of kinases and
monomeric G-proteins is suspected. The sliding of actin and myosin filaments along each
other eventually leads to the progressive cleavage of the cell. Finally, below the
contractile ring, the two cells are connected only by the so-called midbody. The new cell
membranes develop by the fusion of vesicles, whose transport probably takes place
along the microtubules of the midbody.
phosphorylation, that is why the inactive Cdc25 is not able to activate MPF by the
cleavage of inactivation phosphate group, so the cell cycle is stopped before M-phase.
At the M checkpoint the sensor proteins bind to the free kinetochores of chromosomes.
These proteins recruit a protein that is required for APC function, so
in the presence of free kinetochores the APC is not functional and the cell is retained in metaphase.
Obviously the precise operation of the checkpoint machinery is more complex than
described here; the details are still being discovered. In any case, its accurate
function is essential for the cell cycle to give rise to genetically identical cells.
Failure of regulation or of the checkpoint machinery may result in atypical
divisions. Although some of these atypical divisions, depending on the species and cell
type, are not necessarily abnormal, the majority of them are characteristic of tumor
cells. Each type of atypical division gives rise to genetically diverse cells.
In endomitosis the nuclear envelope remains intact while the
cellular DNA content increases. In parallel, the nucleus and the whole cell
enlarge too, so giant cells are formed. Sister chromatid separation inside the
intact nuclear envelope increases the chromosome number of the cell; such cells are referred
to as polyploid cells. If the sister chromatids remain together, giant (polytene)
chromosomes are formed, composed of many sister chromatids instead of two.
Failure of cytokinesis results in giant cells too, but these cells have several nuclei.
Many division abnormalities are caused by mitotic spindle defects. Normal
division is bipolar owing to the precise duplication and separation of centrosomes.
Abnormalities in either the duplication or the separation of centrosomes may cause so-called
multipolar divisions, named according to the number of poles: tri-, tetra-, etc. polar
divisions.
The consequence of non-disjunction of the sister chromatids of a chromosome can be
easily calculated: it changes the chromosome number (aneuploidy) in both
daughter cells, one of which has one chromosome more and the other one chromosome less. The
reason for such a defect is syntelic or monotelic kinetochore-microtubule
attachment (Figure 1.10).
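The calculation referred to above can be written out as a minimal sketch. The function name and its parameters are hypothetical; the model assumes a mitotic division in which, for each affected chromosome, both sister chromatids end up in the same daughter cell.

```python
from typing import Tuple


def daughter_chromosome_numbers(diploid_number: int = 46,
                                nondisjoined: int = 0) -> Tuple[int, int]:
    """Chromosome numbers of the two daughter cells after mitosis.

    nondisjoined: number of chromosomes whose two sister chromatids
    both move to the first daughter cell (assumed model).
    """
    if not 0 <= nondisjoined <= diploid_number:
        raise ValueError("nondisjoined must be between 0 and the diploid number")
    gaining_cell = diploid_number + nondisjoined
    losing_cell = diploid_number - nondisjoined
    return gaining_cell, losing_cell


print(daughter_chromosome_numbers())                # (46, 46)
print(daughter_chromosome_numbers(nondisjoined=1))  # (47, 45)
```

For a human cell, non-disjunction of a single chromosome therefore gives one daughter cell with 47 and one with 45 chromosomes.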
Merotelic attachment, however, may result in bridge formation (an anaphase
bridge, so called because it becomes visible in anaphase). At first the sister chromatid to which
microtubules bind from both poles forms a kind of bridge, but later it most probably
breaks. The breakage of chromosomes leads to structural chromosomal abnormalities.
A chromosome fragment without a centromere is excluded from the nucleus and forms
a so-called micronucleus in the cytoplasm. This phenomenon is used in mutagenicity
assays to detect compounds that cause or increase chromosomal breakage.
1.2. Meiosis
There are two forms of transmission of genetic information from one generation to the
next. The first is asexual reproduction, which is typical of less evolved (lower) organisms.
It is a simple process: the offspring develop from the somatic cells of a single parent,
thus they are genetically identical to the parent organism.
In sexual reproduction the offspring have a mixed genome from two parents, so they are
genetically different from both parents and from each other. Sexual reproduction has a
great evolutionary advantage for the species, because the individuals gain high genetic
variability, allowing adaptation to unexpected circumstances. Sexual
reproduction is crucial for the survival of species. In sexually reproducing organisms
there are two successive generations of cells: the diploid cells, which give rise to
haploid cells by meiosis, and the haploid cells, which in animals are reduced to the gametes.
The species-specific chromosome number is restored by the fusion of gametes, resulting in a
diploid zygote, and the life of a new individual starts.
How are these haploid cells formed in meiosis? The essence of the process is twofold:
on the one hand the chromosome number is halved, on the other hand the parental genetic
information is mixed.
Due to DNA condensation, the chromosomes become thicker and more visible, and
synapsis is completed in pachytene. After pairing they form structures composed of
two chromosomes, a maternal and a paternal one (bivalent), each having two sister
chromatids (tetrad). The tight binding between the homologous chromosomes leads to an
apparent decrease in the number of chromosomes (pseudoreduction). The majority of
double-stranded DNA breaks are repaired, but at some of them homologous
recombination (crossing over), the exchange of corresponding chromatid segments, occurs. This
process is mediated by the recombination nodules, large (100 nm) multi-enzyme
complexes, which appear on the synaptonemal complex. The detailed molecular
mechanism of crossing over is not discussed here. Crossing over may occur between
any chromatids, but it results in a new combination of genes only if it happens between non-sister
chromatids. The number of crossing overs between the non-sister chromatids of a chromosome
pair is 1-3. There is obligatory recombination even between the largely non-homologous
X and Y chromosomes, at their pseudoautosomal regions (PAR). Checkpoint
machinery controls the occurrence and the process of crossing over, underlining the
significance of homologous recombination.
In the diplotene stage the synaptonemal complex largely detaches, so the members of
homologous pairs may move slightly away from each other, and the chromosomes are
linked only at the sites of crossing over, referred to as chiasmata. Finally, in diakinesis the
separation of homologs continues, but the bivalents are still connected at the chiasmata,
found between non-sister chromatids of homologous chromosomes, and also by cohesins,
which hold together the sister chromatids of a chromosome. Later the cohesins dissociate
from the arms and keep the chromatids together only at the centromeric regions. During
prophase the kinetochore region develops on the chromosomes, but in contrast to mitosis, both
kinetochores of a chromosome face one pole, while the kinetochores of the homologous
chromosomes face opposite poles (Figure 1.13).
- In the metaphase of the first division not single chromosomes but chromosome
pairs are arranged in the equatorial plane, while the chiasmata still connect the
homologs. The chiasmata disappear only at the end of metaphase.
- In anaphase the kinetochore microtubules pull the homologous chromosomes,
and not the chromatids, toward the poles, since the kinetochores of a chromosome face
the same pole. Thus synapsis not only allows crossing over but is also needed to halve
the number of chromosomes. The separation of homologs, i.e. which member of a pair is
pulled to a given pole, is a random process. It further increases genetic variation: in
humans the number of possible combinations is 2^23.
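The figure for humans follows from the fact that each of the 23 homologous pairs can orient in two ways independently of the others, giving 2^23 possible combinations of maternal and paternal chromosomes in a gamete. A minimal sketch of this arithmetic (the function name is illustrative, and crossing over is ignored):

```python
def homolog_assortments(haploid_number: int) -> int:
    """Number of chromosome combinations from random assortment alone.

    Each homologous pair contributes two possibilities (maternal or
    paternal homolog); crossing over is not taken into account.
    """
    if haploid_number < 0:
        raise ValueError("haploid_number must be non-negative")
    return 2 ** haploid_number


print(homolog_assortments(23))  # 8388608
```

So random assortment alone allows roughly 8.4 million different gametes per individual, before recombination adds further variation.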
In telophase the nuclear membrane is reorganized and the cytoplasm splits. The resulting
cells are haploid, which is why the first division of meiosis is called the reduction division.
The chromosomes are still composed of two sister chromatids, which will separate in
the following part, meiosis II.
The first division is followed by a short interphase, in which there is no DNA
replication.
The second division of meiosis is also divided into pro-, meta-, ana- and telophase, and
these phases are essentially very similar to the phases of mitosis. Thus, in metaphase
single chromosomes are arranged in the equatorial plane, and in anaphase the sister
chromatids of the chromosomes are separated. The orientation of the kinetochores is also
similar to that in mitosis.
In telophase the nuclear envelopes are reorganized, two nuclei are formed and
then the cytoplasm is also divided.
Finally, meiosis produces four haploid cells, the gametes, from one diploid cell.
After the fusion of two haploid cells, the chromosome number of the species is
reconstituted in the zygote. At the same time the genetic information of the gametes differs,
owing to homologous recombination in prophase of meiosis I and the random
assortment of homologs in anaphase of meiosis I. These processes provide the high
genetic variability needed for the survival of the species.
1.2.2. Oogenesis
In most animals the female gamete (egg) is very large compared to somatic cells. Eggs
contain yolk: different nutrients (lipids, proteins, carbohydrates) sufficient for the early
development of the embryo, until it is able to feed itself. Although the mammalian egg
contains only a small amount of yolk (oligolecithal), it is much bigger than the body cells of the
organism. The size of a human egg is about 100 µm.
In the developing gonads of the embryo the primordial germ cells (46 chromosomes)
develop into oogonia (46 chromosomes), which divide by mitosis. The cell entering the
first meiotic division is the primary oocyte (46 chromosomes). Meiosis I is halted, and the cells
may remain in the diplotene stage of prophase for decades. Meanwhile a coat, the zona
pellucida, develops around them, and cortical granules accumulate in the cytoplasm;
their content is released after sperm penetration, preventing the penetration of
further sperm. From puberty, owing to hormonal effects, cyclically one cell resumes
meiosis I. Division of the cytoplasm is asymmetric: the larger cell is the secondary
oocyte (23 chromosomes), whereas the smaller cell is the polocyte (or polar body, also with
23 chromosomes). The unequal cytokinesis is probably caused by the asymmetric position of the
spindle. Each cell divides again; this is meiosis II. The polocyte gives rise to two
polocytes (23 chromosomes each), and the secondary oocyte gives a large ovum and a small
polocyte (23 chromosomes each). The second cytokinesis is asymmetric too. In higher
organisms, including humans, meiosis II is halted in metaphase. The secondary oocyte is ovulated in
this stage, and the completion of the second meiotic division is triggered by the
penetration of the sperm, i.e. by fertilization (Figure 1.15).
1.2.3. Spermatogenesis
While in most species the egg is the largest cell and is not capable of independent movement,
the other gamete, the sperm, is the smallest cell and is able to move.
In the male organism primordial germ cells (46 chromosomes) migrate into the developing
testis, where they become spermatogonia (46 chromosomes) in the external wall of the
testis. From puberty, spermatogonia divide continuously by mitosis. A group of them
enters meiosis I; these are the primary spermatocytes (46 chromosomes). Meiosis I
gives rise to two haploid cells, called secondary spermatocytes (23 chromosomes).
The second meiotic division produces four haploid, round, immotile cells, called
spermatids (23 chromosomes). In both divisions cytokinesis is incomplete; the
secondary spermatocytes and the spermatids remain connected to each other by
cytoplasmic bridges.
After meiosis a differentiation process, cytological morphogenesis, begins, which
results in actively motile sperm. This step is called spermiohistogenesis, and it
takes place embedded in Sertoli cells. From the Sertoli cells the sperm are released into the
lumen of the testis (Figure 1.15).
The typical structure of differentiated sperm (head, midpiece and tail) serves
only one purpose: to convey the DNA content safely to the egg.
The head contains the nucleus, where DNA is in a completely heterochromatic form to
occupy the smallest possible space. Protamines, proteins more positively charged than
histones, are needed for DNA to be packed so densely. In front of the nucleus a giant
secretory vesicle, the acrosome, is located. The acrosome vesicle contains hydrolytic
enzymes and is responsible for dissolving the different coats of the egg during fertilization.
In the midpiece fused mitochondria are located, forming the so-called mitochondrial sheath,
where the ATP necessary for the movement of the sperm is produced. The tail is essentially
a flagellum composed of a 9x2 + 2 microtubule system; in addition, at the periphery there
are nine dense, keratin-containing fibrils whose function is still not clear.
Recent studies have demonstrated differences between spermatogenesis and oogenesis
regarding prophase of meiosis I. These are summarized in the next table.
According to the classic definition, mutations are sudden heritable changes in the DNA.
Today the definition of mutation is more complex: a mutation is a change in the DNA
sequence whose population frequency is less than 1%. In contrast, a
polymorphism is a variant with a population frequency greater than 1%.
This separation is artificial and somewhat confusing, since in both cases there are DNA
sequence alterations, and similar mechanisms lead to their formation. Since the
mechanisms leading to mutations and polymorphisms, their origin, the level of their
expression and their consequences can be extremely diverse, there are many
ways to group them (see also Chapter 8). In both cases the scale runs from
very small alterations (point mutations; SNPs, single nucleotide polymorphisms) up to
very large ones (structural chromosomal mutations; chromosome polymorphisms
such as 1qh+). Similarly, in both cases there are variations that cause symptoms or disease and
variations that do not. This chapter deals with mutations, and within this group mainly with small-scale
mutations, while larger changes in the DNA sequence are discussed in
the Cytogenetics chapter (Chapter 3). Polymorphisms, their biological and genetic
role and their medical significance are detailed in Chapter 8, which describes the
characteristics of the genome.
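The frequency-based distinction between mutation and polymorphism given above amounts to a simple threshold rule. The sketch below illustrates it; the function name is hypothetical, and since the text does not say how a variant at exactly 1% should be labelled, treating it as a mutation here is an arbitrary assumption.

```python
def classify_variant(population_frequency: float, threshold: float = 0.01) -> str:
    """Label a DNA variant by its population frequency (1% rule)."""
    if not 0.0 <= population_frequency <= 1.0:
        raise ValueError("population_frequency must be between 0 and 1")
    # Assumption: a frequency exactly at the threshold is labelled 'mutation'.
    if population_frequency > threshold:
        return "polymorphism"
    return "mutation"


print(classify_variant(0.001))  # mutation
print(classify_variant(0.05))   # polymorphism
```

The rule says nothing about mechanism or clinical effect, which is why the text calls the separation artificial.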
modern molecular biological methods, it may be suitable for the detection of pre-
cancerous conditions.
Germline mutations, occurring in primordial germ cells or in germ cells during
gametogenesis, are transmitted to the offspring and are therefore of crucial importance
in medicine.
According to their origin there are spontaneous mutations, due to defective DNA replication,
and induced mutations, caused by various environmental effects (radiation, chemicals,
etc.).
The mutation frequency depends on the evolutionary level, since in prokaryotes, in
the absence of DNA repair, the mutation rate is much higher than in eukaryotes.
Accordingly, a high mutation frequency is observed in the mitochondrion with its
prokaryote-like DNA. It is about ten times higher than the mutation rate of
nuclear DNA, which is about 10^-5 per gene per generation.
Mutations are also classified according to the extent of the change caused in the
DNA. Thus we can speak of gene mutation (sometimes called point mutation),
chromosome mutation, when the lesion involves several genes, and genome mutation,
which can affect the entire genetic material.
Original sequence:
5'-GCC ATT TCA ACT GCC TGC AGC 3'
MUTATION
5'-GCC ATT TCG AGC CTG CAC TAG C 3' insertion
ancestral gene, are located directly one after the other in the DNA). Such is the case of
deletions affecting globin gene families in various hemoglobin disorders
(hemoglobinopathies), for example in thalassemias (the mutation mechanism is
discussed under mutational hot spots). Of course, the longer the lesion, the more severe the
consequences, that is, the more defective, altered or functionless the protein
product.
A special case of gene mutation by addition occurs when Alu sequences or
LINE elements are integrated into the coding region of a gene by transposition or
retrotransposition. The inserted jumping element (transposon or
retroposon) interrupts the original exon sequence and thus leads to RNA and protein
with altered information content. In some cases of hemophilia A the disease is caused by
the insertion of an Alu sequence.
Similarly, gene duplication due to recombination results in mutation as well. It
occurs either in meiosis, when unequal crossing over between non-sister chromatids
leads to gene duplication, or in mitosis, when, rarely, mitotic recombination (crossing
over) takes place between sister chromatids. In the latter case it is a somatic mutation
and may lead to tumorigenesis by creating daughter cells in one of which
heterozygosity is lost. (In the other daughter cell three copies of the gene are present:
the duplication is on one of the homologous chromosomes, while the other
homolog carries a single copy of the given gene.)
The so-called mutational hot spots should be mentioned here. DNA
sequences or genes are more likely to mutate where repetitive sequences are found. These
repeats may interfere with replication and with the meiotic pairing of homologous
chromosomes. The replication abnormality has physical causes: symmetrical or
repetitive sequences located on the same strand of unwound DNA can pair on the basis
of complementarity or can form loops, and thus disturb the function of the enzymes
involved in replication and repair. For example, in hemophilia B, where long direct
sequences with CG repeats occur in the gene encoding factor IX, there are 10 to 100 times
more mutations. This higher mutation frequency may be attributable to epigenetic
causes as well: methylated cytosine easily deaminates to thymine, leading to a C →
T transition on one strand and a G → A transition on the other.
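The effect of deamination on the two strands can be illustrated with a short sketch. The six-base sequence is a made-up example containing one CG dinucleotide, and the function names are hypothetical; the sketch assumes the change has already been fixed on both strands by replication.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def deaminate_methyl_c(seq: str, index: int) -> str:
    """Replace the (methylated) cytosine at `index` with thymine."""
    if seq[index] != "C":
        raise ValueError("position does not hold a cytosine")
    return seq[:index] + "T" + seq[index + 1:]


original = "GACGTA"                      # hypothetical sequence, C at index 2
mutated = deaminate_methyl_c(original, 2)

print(original, "->", mutated)           # GACGTA -> GATGTA  (C -> T)
print(reverse_complement(original), "->",
      reverse_complement(mutated))       # TACGTC -> TACATC  (G -> A)
```

The same event thus appears as a C → T transition when one strand is read and as a G → A transition when the complementary strand is read.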
The above-mentioned unequal crossing over (see Chapter 3, Cytogenetics) can explain
the repetition of larger sequences, sometimes of entire genes. A good example of this
phenomenon is the formation of α-thalassemia. Normally, both homologs of chromosome
16 contain two α-globin genes one after the other. Unequal crossing over can result in
gametes that contain, for example, 1 or 3 α-globin genes. Following the fertilization of
such gametes, zygotes may be formed in which one more or one fewer α-globin gene is
present. The person's health depends on the number of α-globin genes: 0 copies -
intrauterine lethality, 1 copy - severe anemia, 2 copies - mild anemia, 3 copies -
asymptomatic carrier. Today, more than 30 diseases are known that are caused by
unequal crossing over (e.g. red-green color blindness).
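The relationship between α-globin gene number and phenotype can be written as a small lookup. The following Python sketch is only an illustration: the function name is hypothetical, the value for 4 copies ("normal") is inferred from the statement that each homolog normally carries two genes, and totals above 4 are deliberately left undescribed because the text does not cover them.

```python
# Consequences by total alpha-globin gene number, as listed in the text (0-3).
# The entry for 4 copies is an inference: two genes on each chromosome 16.
PHENOTYPE = {
    0: "intrauterine lethality",
    1: "severe anemia",
    2: "mild anemia",
    3: "asymptomatic carrier",
    4: "normal",
}


def zygote_phenotype(gamete_a: int, gamete_b: int) -> str:
    """Phenotype of a zygote formed from two gametes with the given copy numbers."""
    total = gamete_a + gamete_b
    # Totals above 4 are not described in the text, so they are reported as such.
    return PHENOTYPE.get(total, "not described in the text")


# Gametes after unequal crossing over carry 1 or 3 copies; normal gametes carry 2.
for a, b in [(2, 2), (1, 2), (3, 2), (1, 1)]:
    print(f"{a} + {b} = {a + b} copies: {zygote_phenotype(a, b)}")
```

In this simplified model a single 1-copy gamete combined with a normal gamete gives an asymptomatic carrier, while two such gametes together give mild anemia.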
Not only direct repeats but also palindromic sequences (sequences whose base
sequence in the 5'→3' direction is the same on both strands) can frequently lead to
additions and deletions.
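The definition of a palindromic sequence can be checked mechanically: a double-stranded sequence is palindromic when one strand is identical to its own reverse complement. A minimal Python sketch follows; the function names and the example sequences (GAATTC, the EcoRI recognition site, and an arbitrary non-palindromic sequence) are chosen for illustration and do not come from the text.

```python
# Complementary base pairs in DNA.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read in its own 5'->3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic(seq: str) -> bool:
    """True if both strands read the same in the 5'->3' direction."""
    return seq.upper() == reverse_complement(seq)


print(is_palindromic("GAATTC"))   # True
print(is_palindromic("GATTACA"))  # False
```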
A special type of gene mutation is the repeat mutation, in which units consisting of
different numbers of nucleotides, the so-called repetitive units, are repeated. Besides the
best-known trinucleotide repeat mutations there are other length mutations involving
units up to 24 nucleotides long, which result in the accumulation of e.g. octapeptide units
in the protein (Creutzfeldt-Jakob disease).
It is typical of repeat mutations that disease develops only above a certain number of
repeats, so there is a so-called premutation state, and that the repeat expansion (the
increase in the repeat number) takes place during meiosis.
The increase in the number of repeats (e.g. CAG) is the result of a process (already
mentioned in connection with mutational hot spots) in which one of the DNA strands
loops out during replication because of the repetitive sequences. If this looping affects
the newly synthesized strand, the replication apparatus detects it as if the template
strand had not been completely copied, so more repeat units are added to the new
strand. This phenomenon is called replication slippage (replication skipping).
As a result, the new strand contains several additional repeat units. The old and the
new strands then differ in length, and the repair mechanism corrects this by adding a
sufficient number of new repeats to the old strand. The opposite phenomenon, which
may decrease the number of repeats, also exists. In this case looping affects the
template strand, so the newly synthesized strand will be shorter than the original.
Here the repair corrects the error as well: the surplus repeats are cut out of the old
strand, so eventually there will be a DNA molecule containing fewer repeat units.
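The effect of strand looping on the repeat number can be summarized in a few lines of Python. This is a deliberately simplified sketch under stated assumptions: the function name and its parameters are hypothetical, a whole number of repeat units is assumed to loop out, and repair is assumed always to adjust the old strand to the length of the new one, as described above.

```python
def repeat_number_after_looping(units: int, looped_units: int, looped_strand: str) -> int:
    """Repeat number after one round of replication and repair.

    looped_strand is "new" when the newly synthesized strand loops out
    (the looped units are synthesized again, so the tract expands) and
    "template" when the template strand loops out (the looped units are
    not copied, so the tract contracts).
    """
    if not 0 <= looped_units <= units:
        raise ValueError("looped_units must be between 0 and units")
    if looped_strand == "new":
        return units + looped_units
    if looped_strand == "template":
        return units - looped_units
    raise ValueError("looped_strand must be 'new' or 'template'")


start = 20  # illustrative number of CAG units
expanded = repeat_number_after_looping(start, 3, "new")
contracted = repeat_number_after_looping(start, 3, "template")
print(expanded, contracted)
print("CAG" * contracted)
```

Calling the function repeatedly with the "new" argument gives a crude picture of how the tract can grow from one replication to the next.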
Since DNA replication takes place before both mitosis and meiosis, the repeat
number can, in principle, change in both cases. In contrast, in the case of GCN
trinucleotide repeats the change in the repeat number is explained rather by a meiotic
event, namely unequal crossing over.
Since the number of repeats varies from generation to generation, repeat
mutations are not stable, and therefore they have recently also been called dynamic
mutations. In prokaryotes this dynamism has an important role in counteracting
the effects of the host's immune system; in eukaryotes it may play a role in
tumorigenesis.
The disease-causing role of repeat mutations is easy to see: as more and more repeats
are added to the coding sequence, the structure of the gene involved becomes more and
more distorted, and therefore the protein coded by it becomes more and more altered
and non-functional.
There is a phenomenon called anticipation, which has long been known in human
genetics but could not be explained for a long time. Anticipation means that a hereditary
disease transmitted from generation to generation appears at a younger and younger
age and in a more and more severe form. Since the repeat expansions of medical
importance take place especially in meiosis (or in the preceding S phase), and the gene in
question is increasingly damaged by this process, the phenomenon is well explained.
In the case of gene mutations not only the size of the mutation, i.e. the length of the
DNA sequence in question, is important; in eukaryotes, including humans, the location of
the mutation matters as well. It makes a difference whether a mutation is in a coding or
non-coding region, in an untranslated region, in an exon or an intron, or at the boundary
between them. In the latter case so-called splicing mutations can arise, because the
exon-intron boundary sequences play an important role in intron looping and lariat
formation, and thereby in spliceosome function. As a consequence of splicing mutations
an exon may be lost or an intron may be translated, and a defective protein will be the
end result.
Even within a single gene, mutations that occur at different sites, whether base
exchanges or splicing mutations, can cause the symptoms of completely different
diseases, or symptoms of different severity, as is known for the large number of
mutations in cystic fibrosis.
The role of UTR mutations was not understood until the past few years, since at first
glance one might think that a mutation that affects only non-coding DNA sequences, and
therefore does not produce a defective protein, could not cause symptoms or disease.
However, we now know that the 5' UTR of the mRNA is required for ribosome binding
and normal protein synthesis. This also explains why some trinucleotide repeat
mutations in which the expansion affects the UTR cause disease. Moreover, the
methylation of cytosines in a number of repeats induces epigenetic alterations
(attachment of methyl-binding proteins and/or non-coding RNAs, chromatin
remodeling), which also explains the role of UTR mutations in disease.
Although the human DNA sequence is almost completely known as a result of the
Human Genome Project, knowing the sequence does not imply the identification of a
gene, and knowing the gene does not automatically mean understanding its function.
This is particularly problematic in cases where the mutation results in a new
protein with different functions, but neither the original protein nor the mutation itself
is known. These are the so-called gain-of-function mutations, which make genetic
analysis more difficult. This is typical of Huntington disease, where the exact original
function of the eventually identified huntingtin protein is still not well understood.
The situation is simpler when a mutation changes the structure or the function of a
previously known protein. Then loss-of-function mutations are present like in the case
1/ whether it reverses the chemical reaction that causes mutation; it is the direct
repair or
2/ it cuts out the incorrect bases and replaces by correct ones; it is known as
excision repair.
1/ The best example of direct repair is the removal of UV-induced thymine dimers.
Photoreactivation is a process typical mainly of prokaryotes and of some eukaryotes
(e.g. yeast). In this process the cyclobutane ring formed between the pyrimidine bases
is cleaved using the energy of visible light, the bases remain in their original location,
and the previous structure is restored.
Although UV radiation is one of the most potent mutagenic agents (just think of the
growing ozone hole and the intensive UV exposure at the Earth's surface),
unfortunately many species, including humans, are not capable of photoreactivation
repair. Could this be explained by the fact that humans evolved late, after the formation
of the protective ozone layer, when only small amounts of UV rays reached the Earth's
surface?
Another direct repair mechanism is the removal of alkylated bases. The
O6-methylguanine methyltransferase enzyme removes the methyl group of guanine by
transferring it to a cysteine residue in its active site. Such enzymes are found in both
pro- and eukaryotes.
2/ The excision repair is more common than the direct repair. There are three types:
a/ base excision
b/ nucleotide excision
c/ mismatch repair
a/ During base excision repair the single incorrect base is excised and the DNA
polymerase fills the gap by using the intact complementary strand as a template.
b/ In nucleotide excision repair not only the mutated part (e.g. a thymine dimer) is
cleaved off, but also a few nucleotides preceding and following it, i.e. a short
oligonucleotide sequence. Then DNA polymerase fills the gap using the undamaged
complementary strand as a template, and DNA ligase joins the old and the repaired
sections. In humans seven different genes are required for this process, and mutations
in any of them leave UV radiation-induced mutations uncorrected. This is typical of rare
inherited disorders such as Cockayne syndrome or xeroderma pigmentosum. The latter
disease is a good example of genetic heterogeneity, since defects of different excision
repair enzymes result in the same symptoms.
c/ During mismatch repair, bases that are not exactly complementary and therefore
do not fit properly into the double helix are recognized and removed. During DNA
replication, in parallel with the synthesis, mismatched bases are detected and removed
by the proof-reading (3'→5' exonuclease) activity of the DNA polymerase enzyme. Those
that escape this process are corrected by the enzymes of the mismatch repair complex.
While bacterial mismatch repair is relatively well understood, the human one is less
well known. However, we know that a common hereditary form of colon cancer is caused
by mutations in some genes of the mismatch repair protein complex. So not only
mutations in a specific protein-coding gene, but also any molecular defect in the repair
mechanism can lead to disease.
Both direct and excision repair take place before DNA replication, ensuring that only
correct DNA molecules are replicated. However, the cells have “multiple insurance”:
there are two additional, alternative, post-replication repair mechanisms in case the
first two repair processes fail. One of them is recombination repair, the other is the
so-called error-prone or SOS repair. During recombination repair the mutation itself
remains uncorrected: for example, a thymine dimer inhibits DNA synthesis, so there will
be a gap at the corresponding place in the new strand. (The synthesis is not interrupted
completely because the DNA polymerase, as in the case of Okazaki fragments, can
synthesize a new strand in pieces.) The gap is filled by recombination with the original
strand, while the gap left in the original strand by this recombination is filled by DNA
polymerase and ligase in cooperation. This late repair mechanism makes the correction
of the mutation possible before the next DNA replication.
SOS or error-prone repair is known only in prokaryotes (although similar
mechanisms are assumed in eukaryotes as well), and it works only in extreme cases
where cell survival is at stake. When much of the DNA is damaged by strong radiation or
other mutagenic agents, there is no time for the precise yet time-consuming repair
mechanisms mentioned above; instead, the DNA damage is corrected quickly and
inaccurately, thus avoiding immediate cell death. Such a mechanism is evidently not
necessary in multicellular eukaryotes, since the death of a single cell does not lead to
the death of the whole organism: other cells take over the function of the lost cell.
The most difficult mutations to correct are those, usually caused by ionizing
radiation or oxidative damage, that result in double-stranded DNA breaks, since then - in
contrast to the previously mentioned repair mechanisms - there is no template strand
to serve as the basis of correction. The double-stranded breaks - given that free ends are
In NHEJ a special DNA ligase with a cofactor brings the broken ends together. If the
double-stranded break creates fragments with overhanging strands and
microhomologous sequences, the repair is most likely correct. If the fragments have
blunt ends, there is a high chance that unrelated pieces are joined and a structural
chromosome abnormality is generated.
For error correction, homologous recombination repair uses as a template either the
correct sequence of the homologous chromosome or, in the G2 phase of the cell cycle, the
already formed sister chromatid, by means of an enzyme system similar to that used in
crossing over.
Of course, not only various repair mechanisms are available to protect genome
integrity, but also inactivation systems that can neutralize or inactivate mutagenic
agents. One example is the peroxisomal system, in which oxidative and thus mutagenic
superoxides are eliminated: superoxide dismutase converts them to hydrogen peroxide
(H2O2), which catalase then cleaves and thus neutralizes.
chromosome fragility and instability, which is characterized by ≈ 60 SCE / cell in the
SCE test; these values serve as a basis for diagnosis.
Categories of beneficial, neutral and harmful mutations are used for the
assessment of the consequences of mutations in population and evolutionary genetics.
Here the mutation is evaluated not from the individual's point of view but from that of
the survival of the species. However, we must not forget that in this case the mutation is
not considered alone, but in relation to the environment investigated. The best, now
classic example is the case of the white and black pigmented versions (morphs) of the
peppered moth (Biston betularia) in England (see
http://en.wikipedia.org/wiki/Peppered_moth). It also warns that not only the genetic
material but also its environment changes, and that an environment that was once
beneficial or neutral for one morph was detrimental to the other, or vice versa. It
should also be noted that the wild type / mutant allele distinction applies to a
particular environment and population state, and the wild type allele always means the
allele that is the most common under those circumstances.
2.6. Questions
1. What is the difference between a point mutation and a SNP?
2. What kind of mutations do you know according to their origin?
3. Give examples of some physical and chemical mutagens!
4. Why could a double-stranded DNA break lead to structural chromosomal
abnormality?
5. What is the difference between the causes leading to polyalanine and polyglutamine
diseases?
6. What is the explanation for the existence of mutational hot spots?
7. What is the connection between the anticipation and nucleotide repeat mutations?
8. Why is SOS repair not found in multicellular organisms?
9. When does mutation repair take place?
10. Give examples of some mutagenicity tests!
11. What could be the consequence of splicing mutations?
3.1.1. Deletions
If a chromosome is broken and the broken piece is lost, we speak of a deletion. The
genetic information carried by the broken piece will then be absent from the cell
involved, so the cell does not function normally or dies. Since deletions eliminate
certain functions, certain proteins, for example enzymes, are not produced. With the
help of deletions the location of the eliminated gene can be mapped; deletion mapping
was one of the earliest methods of gene mapping.
If the break is close to the end of the chromosome, a terminal deletion is generated.
In this case the telomere is lost in addition to other genes, and this also contributes to
the severity of the symptoms and to early lethality. The best-known example of a
terminal deletion is the cat cry (cri du chat) syndrome, in which the short arm of
chromosome 5 is deleted (5p-). The disease is named after the characteristic mewing cry
of affected newborns.
There are two breaks within one chromosome in the case of interstitial deletion,
and the intermediate piece is lost. Such lesions usually may cause severe physical and
3.1.2. Duplications
During duplication a chromosomal segment is duplicated. This is due either to a
replication error or to meiotic unequal crossing over. In both cases the repetitive
sequences occurring in the affected region may explain the "slipping" of the replication
apparatus or the inexact pairing of the homologous chromosomes (skipping). Like
deletions, duplications are also used to identify the chromosomal location of a gene or
group of genes, i.e. to map a gene.
3.1.3. Translocations
For the formation of translocations more than one break, usually 2 or 3, is needed. The
broken part(s) are transferred to another chromosome. Depending on the origin of the
broken piece or on the number of fragments translocated, translocations are divided
into different subgroups.
3.1.3.1. Reciprocal translocations
At least two breakpoints are required for reciprocal translocations; they may be in
two homologous chromosomes or in two completely different, non-homologous ones.
The broken chromosome fragments are exchanged and then join at the new location.
As a result, two chromosomes of altered structure are created. In most cases,
however, this does not cause phenotypic changes, i.e. symptoms or disease. This is the
case of balanced translocation. The phenomenon can be explained by the fact that only
the position of the affected genes changes, not the genes themselves. Breakpoints are
usually in non-coding regions, as coding regions make up <2% of the human genome.
In cases where the breakpoint is within a gene, following the translocation the gene
itself is affected, so the abnormal product - with different function, activity or amount, or
perhaps unable to function - is responsible for the appearance of pathological traits, e.g.
tumor formation.
The best example of a reciprocal translocation is the one leading to the formation of
the Philadelphia chromosome (Ph1). It takes place between chromosomes 9 and 22; its
cytogenetic abbreviation is t(9;22)(q34;q11). This translocation occurs in chronic
myeloid leukemia (CML) and in acute lymphocytic leukemia (ALL). The breakpoint in
chromosome 22 is in the BCR (breakpoint cluster region) gene, while the breakpoint in
chromosome 9 affects the ABL (Abelson murine leukemia) proto-oncogene. The ABL gene
encodes a tyrosine kinase, and as a result of the translocation a bcr / abl fusion protein
is produced which not only has a greater molecular weight than the original enzyme, but
also a higher activity. In fact, during this translocation the well-regulated promoter of
the ABL gene is lost, and the gene is permanently overexpressed. Ultimately this leads to
uncontrolled cell proliferation, i.e. the development of the tumor.
Another medically important example is the Burkitt's lymphoma caused mostly by
Epstein-Barr virus. In this disease the c-MYC proto-oncogene coded by chromosome 8
The centric fusion not only results in an abnormal chromosome structure, but the
chromosome number is also reduced from 46 to 45.
Based on current cytogenetic evidence, the chromosome number reduction that
occurred in hominid evolution can be explained by two consecutive centric fusions.
While the great apes (gorillas, chimpanzees and orangutans) have 48 chromosomes,
humans have 46. This means that the centric fusion must have occurred after the
lineages of apes and humans separated during evolution.
3.1.4. Inversions
The inversion is a structural chromosome aberration in which the same chromosome
breaks twice and the fragment between the breakpoints turns 180 degrees. There are
two types:
1/ pericentric
2/ paracentric inversion
3.1.6. Isochromosome
The isochromosome is an abnormal chromosome containing the same genes on both
arms. During its formation the sister chromatids do not separate parallel to the long
axis of the chromosome before migrating towards the poles of the cell; instead, the plane
of their separation is perpendicular to the longitudinal axis.
Thus aberrant chromosomes, and ultimately cells containing them, are formed which
carry the information of either the short arm or the long arm only, on both arms, while
the information of the other arm is lost.
Since a chromosome arm contains many genes, the surplus or the lack of these leads
to severe, often lethal consequences. The X and Y chromosomes seem to be exceptional,
since these chromosomes are involved in most of the known viable isochromosome
cases. This is because the Y chromosome is relatively gene-poor and X chromosome
inactivation has a compensating effect.
The above-mentioned structural abnormalities, except duplication, are usually
formed prior to DNA replication (G1 phase), and therefore replication of the damaged
DNA leads to identical sister chromatids. Their separation at the end of mitosis results in
daughter cells carrying identical chromosomal aberrations. Although these types of
aberrations can be formed in both mitosis and meiosis, from a strictly medical point of
view the latter is more important, because gametes with chromosome mutations can
lead to the birth of affected / mutant offspring.
Among structural aberrations, translocations and inversions also exist in balanced,
asymptomatic forms. In the case of carriers, however, the risk of having a
chromosomally unbalanced child with physical and intellectual disability and severe
developmental abnormalities is very high. The symptoms often lead to intrauterine
death, and thus to spontaneous abortion or stillbirth. This is due to the difficult pairing
of structurally abnormal and normal homologous chromosomes in the first meiotic
division (Figure 3.3.).
The most severe segregation abnormalities may also inhibit gametogenesis and
thus cause infertility or sterility. The best example is centric fusion between
homologous acrocentric chromosomes. For example, carriers of t(15;15) or t(14;14)
Robertsonian translocations cannot have viable offspring, while from a t(21;21) centric
fusion only non-viable offspring or offspring with Down syndrome can be born
(Figure 3.6.).
Figure 3.6. Gametes and offspring derived from a t(14;21) centric fusion.
Source: http://mhanswers-auth.mhhe.com/biology/genetics/mcgraw-hill-answers-
changes-chromosome-structure-and-number; Figure 8.29.; 29/07/2013.
1/ euploid
2/ aneuploid
3/ mixoploid mutations
Several autosomal and especially sex chromosome trisomies occur in live-born
infants, but only one monosomy, the X chromosomal monosomy (Turner syndrome),
does. Aneuploid mutations are due to mitotic or meiotic non-disjunction, when
the sister chromatids or the chromosomes do not separate in anaphase because of an
abnormality of the kinetochore, the centromere or both. Less frequently
(uniparental disomy) a chromatid / chromosome lagging behind the others in
anaphase (anaphase lag) does not reach the right pole, and therefore does not get into
the daughter cell.
As a result, one of the daughter cells has an extra chromosome, while the other lacks
one. Of course, from a medical point of view meiotic non-disjunctions are more
important, as these lead to defective gametes and finally to affected offspring.
In the case of mitotic non-disjunction it is crucial when, and in which cell type's
division, it occurs. Early non-disjunction, which eventually involves many cells /
tissues, leads to severe consequences (mosaicism).
Meiotic non-disjunctions are grouped according to when they occur: in the first or
in the second meiotic division. In first meiotic non-disjunction a pair of homologous
chromosomes fails to segregate, whereas in second meiotic non-disjunction, as in
mitotic non-disjunction, the sister chromatids fail to separate. The consequences differ
accordingly.
Following first meiotic non-disjunction all four progeny cells (in
spermatogenesis the four sperms) will have an abnormal chromosome set, i.e. will be
aneuploid. Two have an additional chromosome (n+1); two lack one (n-1).
In second meiotic non-disjunction only half of the daughter cells are affected.
They also have an extra or a missing chromosome.
The fusion of such an abnormal gamete with a normal one results in a trisomic or
monosomic zygote.
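The gamete outcomes of the two types of meiotic non-disjunction can be tabulated for a single chromosome pair. The Python sketch below is an illustration under simple assumptions: the function names are hypothetical, only one chromosome pair is followed, and second meiotic non-disjunction is assumed to affect only one of the two cells entering the second division.

```python
from typing import List, Optional

# Zygote type by copy number of the chromosome followed.
ZYGOTE_TYPE = {1: "monosomic", 2: "normal", 3: "trisomic"}


def gamete_copy_numbers(nondisjunction: Optional[str]) -> List[int]:
    """Copies of one chromosome in the four products of a single meiosis."""
    if nondisjunction is None:
        return [1, 1, 1, 1]   # normal meiosis: all gametes are n
    if nondisjunction == "meiosis I":
        return [2, 2, 0, 0]   # homologues fail to segregate: two n+1, two n-1
    if nondisjunction == "meiosis II":
        return [2, 0, 1, 1]   # sister chromatids fail to separate in one cell
    raise ValueError("expected None, 'meiosis I' or 'meiosis II'")


def zygotes_with_normal_gamete(nondisjunction: Optional[str]) -> List[str]:
    """Zygotes formed when each product fuses with a normal gamete (1 copy)."""
    return [ZYGOTE_TYPE[copies + 1] for copies in gamete_copy_numbers(nondisjunction)]


print(zygotes_with_normal_gamete("meiosis I"))
print(zygotes_with_normal_gamete("meiosis II"))
```

The first call yields only aneuploid zygotes, the second yields two aneuploid and two normal zygotes, matching the description above.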
In trisomies the origin of the three homologues differs depending on the meiotic
division in which the mutation took place. In trisomies derived from first meiotic
non-disjunction all three homologues are of different origin (e.g. one is from the
maternal grandmother, another from the maternal grandfather, and the third is
inherited from the father). However, in trisomies derived from second meiotic
non-disjunction two homologues are identical (e.g. both from the maternal grandmother
or both from the maternal grandfather) and only the third comes from the other parent,
the father. 70% of human aneuploid chromosome mutations are derived from first
and 30% from second meiotic non-disjunction.
So most meiotic non-disjunctions occur during the first meiotic division and are
of maternal origin. The frequency of maternal non-disjunction, and thus of aneuploid
offspring (e.g. with Down syndrome), increases with maternal age (Figure 3.7).
The reason lies in the characteristics of female gametogenesis: probably the aging of the
synaptonemal complex, which reduces the chance of proper segregation of homologues,
leads to the formation of gametes with an abnormal chromosome number. This is why
above a certain maternal age (35-40) prenatal tests are recommended or required to
determine whether a fetus carries a numerical chromosome aberration.
3.2.3.1. Trisomy 21
Trisomy 21 is the cause of Down syndrome. Although non-disjunction of
chromosome 21 is not the only cause of Down syndrome (a smaller proportion of cases
is due to centric fusion or translocation), it is the most common one. Even though many
trisomy 21 fetuses die in utero, the average population frequency of Down syndrome is
1:650, and this value increases dramatically with maternal age: at 45 years of age it is
more than 1:100.
Although today live-born trisomy 21 patients have more or less the same life
expectancy as healthy individuals, the prevalence of leukemia and some other diseases
is higher among them than in the general population. In recent decades there has been a
significant change in the social status of individuals with Down syndrome: whereas
before they were excluded from society and teaching them was thought to be impossible,
now increasing efforts are made to facilitate their social integration (e.g. special
kindergartens, in many countries teaching together with healthy children in public
school classes, sporting events etc.).
3.2.3.2. Trisomy 13
Trisomy 13 is Patau syndrome. Similarly to Down syndrome, it is most commonly
derived from maternal non-disjunction. 65% of such non-disjunctions derive from the
first meiotic division. The frequency at birth is 1:12 500 - 1:21 700. Fewer than 5% of
these infants survive the first year of life.
3.2.3.3. Trisomy 18
Trisomy 18 is Edwards syndrome. It is primarily due to maternal non-disjunction.
95% of the cases are due to non-disjunction in the first meiotic division. The frequency
is 1:6000 - 1:10 000 among live births, but the frequency at the time of conception can be
much higher, since approx. 95% of the fetuses die in the womb. 30% of newborns with
Edwards syndrome die within one month and > 95% of them die within a year.
will largely influence the production of other proteins, and thus indirectly the body
height, too.
Although Turner syndrome is often characterized by normal intelligence, there is a
difference in verbal skills and social integration between patients who inherited their X
chromosome from the father and those who inherited it from the mother. According to
surveys, carriers of a maternal X are weaker in these features than patients who
inherited a paternal X. The phenomenon is explained by the different methylation of the
two types of X chromosome, i.e. by genomic imprinting.
3.2.4.2. Klinefelter syndrome
Klinefelter syndrome is characterised by a 47,XXY karyotype and male phenotype. The
frequency is 1:1000. It is derived with nearly the same probability from maternal (56%)
and paternal (44%) non-disjunction. 36% of the maternal non-disjunctions take place in
the first meiotic division. Since there are two X chromosomes, the patients are Barr
body positive. Their sterility can also be attributed to the presence of two X
chromosomes, since certain X chromosomal gene products are present in a higher dose
than in normal fertile males.
3.2.4.3. Triple X syndrome
Female phenotype and 47,XXX karyotype are present. 89% of cases are of maternal and
8% of paternal origin, and the remaining 3% are due to post-fertilization mitotic
non-disjunction. Neonatal frequency is 1:1000. Two Barr bodies are typical.
3.2.4.4. Double-Y syndrome, "superman" or Jacobs syndrome
In this case phenotypically normal males, slightly taller than average, have a 47,XYY
karyotype. The birth rate is 1:1000. The condition derives only from paternal second
meiotic non-disjunction. In contrast to other meiotic non-disjunctions, its formation is
not affected by age: as paternal gametogenesis is continuous from puberty, there are no
aged sperms.
These males are also said to be characterized by poor frustration tolerance and
stronger aggressiveness; perhaps that is why this chromosome abnormality is found in
greater numbers among imprisoned men. The aggressiveness and the possible criminal
tendency are strongly debated, and the question could only be decided with certainty if
the entire male population were karyotyped and comparative data about their
aggressiveness were available as well. Today many different aggressiveness-associated
genes and gene mutations are known, which is why the role of the Y chromosome in
aggressiveness is questioned.
Knowing the characteristics of meiotic division, we could ask whether the two
aneuploidies with normal fertility (47,XXX and 47,XYY) are characterized by a greater
prevalence of similar disorders among the offspring. For example, in the case of
double-Y syndrome offspring with the following karyotypes are expected: 2 XXY, 2 XY,
1 XX and 1 XYY. In contrast, only the birth of normal offspring has been reported so
far; the exact explanation is still not known.
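The expected 2 : 2 : 1 : 1 ratio can be reproduced by enumerating the ways in which the three sex chromosomes of a 47,XYY male may segregate. The Python sketch below rests on an assumption made here for illustration: in each meiosis one of the three chromosomes goes to one pole and the other two to the opposite pole, each chromosome being equally likely to travel alone, and every sperm fertilizes a normal X-bearing egg. The function names are hypothetical, and, as noted above, this theoretical expectation is not what is observed in practice.

```python
from collections import Counter


def xyy_sperm_types() -> Counter:
    """Sex chromosome content of sperm from 2:1 segregation of X, Y, Y."""
    chromosomes = ["X", "Y", "Y"]
    sperm = Counter()
    for i, lone in enumerate(chromosomes):
        # One chromosome travels alone, the remaining two travel together.
        pair = chromosomes[:i] + chromosomes[i + 1:]
        sperm[lone] += 1
        sperm["".join(sorted(pair))] += 1
    return sperm


def offspring_karyotypes(sperm: Counter) -> Counter:
    """Sex chromosome constitution after fertilizing a normal X-bearing egg."""
    offspring = Counter()
    for content, count in sperm.items():
        offspring["".join(sorted("X" + content))] += count
    return offspring


expected = offspring_karyotypes(xyy_sperm_types())
print(dict(expected))
```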
zygote is formed first, and subsequently the 3rd homologue is lost. Depending on
whether first or second meiotic non-disjunction occurred, uniparental heterodisomy
or uniparental isodisomy is present. In the former the child inherits two different
homologues from the parent (one grandmaternal and one grandpaternal), that is,
non-disjunction occurred in the first meiotic division. In the latter the two homologues
inherited are the same (both grandmaternal or both grandpaternal), suggesting second
meiotic non-disjunction.
In UPD, depending on the parental origin of the homologues and due to genomic
imprinting, different symptoms may be seen. Some cases of Prader-Willi and Angelman
syndrome are due not to the 15q deletion but to UPD.
3.4.1. Mosaicism
In genetics a mosaic is an organism in whose body two cell lines of different
chromosome numbers but of the same origin are present. Mosaics are either aneuploid
or polyploid.
The former arises when, as a result of mitotic non-disjunction or anaphase lag during
cleavage, two cell lines of different chromosome number are formed, one normal and
the other aneuploid, generally trisomic. For example, assuming a two-cell embryo, if one
cell divides normally and the other abnormally, then 2 normal, 1 trisomic and 1
monosomic cells are present. Since the monosomic cells are not viable, eventually the
ratio of normal to trisomic cells will be 2:1.
In polyploid mosaicism a normal and a polyploid (generally triploid / tetraploid)
cell line are present. In this case, however, a mitotic spindle error leads to the
formation of the aberration. Again assuming a two-cell embryo, if one cell divides
normally and the other does not, ultimately there will be 3 cells instead of four: two
normal and one tetraploid.
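The two-cell embryo examples for aneuploid and polyploid mosaicism can be followed by simple chromosome counting. The Python sketch below is only a bookkeeping illustration: the function names and the error labels are hypothetical, and a complete spindle failure is assumed to leave all replicated chromosomes in a single cell.

```python
from typing import List, Optional

DIPLOID = 46  # human diploid chromosome number


def divide(chromosomes: int, error: Optional[str] = None) -> List[int]:
    """Chromosome numbers of the cells produced by one mitotic division."""
    if error is None:
        return [chromosomes, chromosomes]
    if error == "non-disjunction":
        # One daughter receives both chromatids of one chromosome.
        return [chromosomes + 1, chromosomes - 1]
    if error == "spindle failure":
        # No separation at all: a single cell with a doubled chromosome set.
        return [2 * chromosomes]
    raise ValueError("unknown error type")


def classify(chromosomes: int) -> str:
    """Name the cell type by its chromosome number."""
    names = {DIPLOID: "normal", DIPLOID + 1: "trisomic",
             DIPLOID - 1: "monosomic", 2 * DIPLOID: "tetraploid"}
    return names.get(chromosomes, "other")


# Two-cell embryo: the first cell divides normally, the second one does not.
aneuploid_mosaic = divide(DIPLOID) + divide(DIPLOID, "non-disjunction")
polyploid_mosaic = divide(DIPLOID) + divide(DIPLOID, "spindle failure")
print([classify(c) for c in aneuploid_mosaic])
print([classify(c) for c in polyploid_mosaic])
```

Removing the non-viable monosomic cell from the first list leaves two normal cells and one trisomic cell, i.e. the 2:1 ratio mentioned above.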
Depending on the time when the aberration occurs (during cleavage, in organogenesis
or even later in development), the symptoms are more or less severe, so the
proportion of normal and defective cells is crucial. Mosaicisms involving sex
chromosomes are relatively common.
In gonadal mosaicism only cells of the germ line have an abnormal chromosome
number, thus the risk of numerical aberrations in the offspring is high. Unfortunately,
routine detection of such defects is still not possible, but the birth of an abnormal
offspring can indicate it. Mosaicism in a broader sense is a somatic mutation, when
different mutants (alleles) of a given gene are present in different organs or in different
cells of the same organ (for example eyes of different colors: one blue and the other
brown, or a blue eye with brown spots).
3.4.2. Chimerism
Named after the lion-headed, bird-legged, snake-tailed monster of Greek mythology, a
chimera is a creature that has two cell lines of different origin, derived from different
zygotes. A chimera arises from the fusion of fraternal twins, from double fertilization
of an egg and a polar body (polocyte), or from transplacental exchange of haematopoietic
stem cells between fraternal twins (blood group chimerism).
More recently the term chimera has also been applied to transgenic animals and plants,
which contain cells of different origin derived either from the fusion of few-cell
embryos or from the microinjection of foreign genes into fertilized oocytes.
Several recent publications have dealt with microchimerism. It has been known for
25 years that after a pregnancy with a male fetus (after birth, and even after abortion)
Y chromosome carrying or Y body-positive cells can be detected in the maternal
bloodstream. It has now been found that these foreign cells can be detected in the
maternal body many years (even decades) later, so they not only survived but probably
also proliferated. This means that stem cells of the male fetus were carried by the
bloodstream into the mother's body, where they reached and adhered to certain organs
and formed cell clones. It has therefore been hypothesized that some putative
autoimmune diseases are not truly autoimmune: the immune reaction is instead directed
against these fetal cells, which are 50 % foreign to (immunologically incompatible with)
the female body. This would also explain why autoimmune diseases are more common in
women. However, transplacental cell migration in the opposite direction (from mother to
fetus) cannot be excluded either, and may play a role in tolerance against alloantigens,
although its mechanism and consequences are not well known.
3.6. Questions:
1. What are the causes of aneuploidy and polyploidy?
2. What are the main regions of chromosomes?
3. Explain the low incidence of monosomies!
4. What is microchimerism, and what is its biological significance?
5. In what diseases has UPD an etiologic role?
6. What are the different positions of chromosomal breakpoints?
7. What techniques are used for the detection of chromosomal aberrations?
8. What are chimerism and mosaicism?
9. What are the possible consequences of centric fusions?
10. What is the explanation of the higher frequency of first meiotic non-disjunctions?
In recent years, epigenetics has become one of the fastest growing areas of genetics.
According to the PubMed database, over 10 000 scientific papers were published on this
subject last year alone.
The term epigenetics is connected to Conrad Waddington, who in the early '50s, when
studying the processes of ontogeny, spoke of a so-called epigenetic landscape to explain
how an extraordinary variety of cells can develop from a single cell, the zygote.
Although the cells are genetically identical, they differ morphologically and
functionally depending on which point of the landscape they reach (mountain, valley or
slope), that is, on how their genes are regulated during development. Today, epigenetic
phenomena are defined as mitotically and / or meiotically transmissible processes that
alter the function, i.e. the expression, of a gene without affecting the DNA sequence
itself; the changes in gene expression are not due to mutations.
The range of these phenomena, and of the known enzymes and regulatory proteins
involved in them, is expanding, and epigenetic changes related to almost all aspects of
life have been reported. In parallel with the increasing knowledge of epigenetic
processes, many previously unexplained observations and phenomena have become
understandable.
type, its metabolic condition how these CpG dinucleotides are methylated, i.e. what is the
methylation pattern. The methylation of promoter CpGs provides a basic level of
regulation of gene expression: methylation usually (though with exceptions) inhibits
gene expression. Epigenetic marks are transmitted from cell division to cell division
but usually not from generation to generation, and the DNA methylating enzyme system
is specialized accordingly. Two main types of methylating enzyme are known: the
maintenance DNA methyltransferase (DNMT1) and the de novo DNA methyltransferase
(DNMT3). During DNA replication DNMT1 methylates the cytosines of CpGs in the new
complementary DNA strand in accordance with the old strand, thereby maintaining the
original pattern of DNA methylation. DNMT3 can methylate cytosines that were not
previously methylated. This is important in gametogenesis, when the original pattern
inherited from the parents is erased and a new methylation pattern, appropriate for the
sex of the organism, is built up. DNA demethylases are involved in the removal of
methylation patterns.
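The division of labour between the two enzyme types can be pictured with a toy model.
The Python sketch below is an illustration only: each CpG site is reduced to a True/False
methylation mark, and the function names and the example pattern are hypothetical, not
taken from any real data set or library.

```python
def replicate(old_strand):
    """Semiconservative replication: the newly made strand carries no methyl marks."""
    new_strand = [False] * len(old_strand)
    return old_strand, new_strand


def dnmt1_maintenance(old_strand, new_strand):
    """Maintenance methylation: copy the marks of the old strand onto the new one."""
    return [new or old for old, new in zip(old_strand, new_strand)]


def dnmt3_de_novo(strand, target_sites):
    """De novo methylation: mark sites that were not methylated before."""
    return [marked or (i in target_sites) for i, marked in enumerate(strand)]


# Hypothetical methylation pattern of five CpG sites in a promoter.
pattern = [True, False, True, True, False]

old, new = replicate(pattern)
print("new strand right after replication:", new)      # hemimethylated duplex
print("after maintenance methylation:", dnmt1_maintenance(old, new))

erased = [False] * len(pattern)                         # pattern erased in the germ line
print("new de novo pattern:", dnmt3_de_novo(erased, {1, 4}))
```

Maintenance methylation restores the parental pattern after every replication, whereas
the de novo step can establish a pattern that did not exist before.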
frequent as well. The genetic material of a nucleus taken from an adult organism for
cloning by nuclear transfer into an enucleated oocyte has already undergone a series of
epigenetic changes, and for normal function these changes must be reversed after the
nucleus has been transferred. The oocyte cytoplasm is involved in this epigenetic
reprogramming, but under such artificial circumstances it apparently does not work
perfectly: the reprogramming is usually incomplete and imperfect.
The importance of epigenetic reprogramming is shown by the higher frequency of
Beckwith-Wiedemann syndrome in offspring conceived through IVF procedures. Although
a frequency of 1:5000 is not very high, it is higher than the value observed in naturally
conceived offspring. The artificial conditions of IVF techniques are probably not
favourable to epigenetic reprogramming. Epigenetic changes also play a crucial role in
adaptation to the environment. The importance of the environment in epigenetics is
shown by twin studies. The epigenetic similarity of identical twins (e.g. in their DNA
methylation patterns and histone modifications) is very high, but it decreases as they
get older, due to increasing epigenetic differences induced by their different
environment, lifestyle and diet.
A controversial theory based on epidemiological studies is transgenerational
epigenesis. Swedish studies have found an association between the nutrient supply of
the father and paternal grandparents in childhood and the proband’s life-span or
mortality due to diabetes or cardiovascular diseases. (Recent animal studies showed a
correlation between a parental high-fat diet and obesity and diabetes in the offspring.)
Others described a relationship between the age at which fathers started smoking and
the body mass index (BMI) of their 9-year-old offspring. These observations are very
difficult to explain at present, especially on the maternal side, where metabolic signals
transferred via the placenta should be taken into consideration. On the father's side,
however, sperm-mediated epigenetic transmission is easier to interpret.
From the point of view of transgenerational epigenetic processes, the role of
modifications created by environmental effects such as diet and environmental
pollutants (e.g. pesticides), and their transmission to the offspring, are particularly
interesting and thought-provoking. Since folates play a key role in the synthesis of the
methyl donors required for DNA methylation, it is understandable that the folate
content of the diet has epigenetic importance as well. This was demonstrated by a
murine experiment that later became famous. The fur color of wild-type mice is the
so-called agouti (a peculiar brownish-gray color); in addition there is an Avy (viable
yellow) mutation that causes a yellow coat color. This allele is metastable: it leads to
yellow fur only in the unmethylated state, whereas in the methylated state an
unchanged agouti coat color develops. In heterozygotes (uAvyA) the non-methylated
mutant allele is dominant over the wild-type allele, whereas the methylated form is
recessive to it. The unmethylated uAvy allele is also dominant over the methylated
mAvy allele, so the coat color of animals homozygous for Avy depends on the
methylation of the alleles. In a conclusive experiment homozygous (AA) mothers were
crossed with heterozygous (AvyA) fathers. During pregnancy the maternal diet in one
group was rich in methyl donors, while the other group received a normal diet. In the
first group the majority of the heterozygous offspring were agouti-colored, or had
smaller and fewer yellow spots (indicating limited expression of the non-methylated
mutant allele). A normal diet had exactly the opposite effect: more of the heterozygous
offspring were completely yellow or had large yellow spots. In this case the effect was
observed in the F2 generation as well, but not in later generations. This metastable
mutation has other consequences, e.g. the yellow furred animals are generally fat and
have higher frequency
4.6. Questions
1. What is the purpose of dosage compensation?
2. What could be an evolutionary explanation for imprinting?
3. What is a differentially methylated cluster?
4. What molecular alterations are in the background of epigenetic changes?
5. Why can CpG dinucleotides be mutation hot spots?
6. What is the role of non-coding RNAs in X inactivation?
7. What mechanisms can cause Angelman syndrome?
8. What is the histone code?
9. What are the CpG islands and what is their epigenetic significance?
10. What is chromatin remodeling?
5.1. Introduction
Mendelian inheritance is the basis of classical genetics. Although our knowledge about
classical genetics has significantly expanded lately, the understanding of the heredity of
the human diseases / traits can still be related to Mendelian inheritance.
In a simplified sense, patterns of inheritance are considered Mendelian if they fulfill
two criteria: on the one hand Mendel’s principles (see:
http://en.wikipedia.org/wiki/Mendelian_inheritance) can be applied, on the other hand
the environment has no influence on them. The classical classification of hereditary
patterns is the following: autosomal dominant and recessive, codominant, X-linked
dominant and recessive, and Y-linked.
The rigid validity and interpretation of these patterns have been questioned in the
last few decades. Numerous phenomena observed in the expression of monogenic
diseases are difficult to explain by the classical rules: the principle of dominance may
not fully apply, the severity of the disease may vary, and environmental factors may
influence the manifestation of the same mutated gene. It has become clear by now that
the „one gene – one locus” genotype determines the clinical symptoms of a disease only
partially, and that several secondary genetic effects combine with environmental
factors in the manifestation of the disease. These phenomena include, among others,
penetrance, variable expressivity, oligogenic influence and, as an epigenetic factor,
X-inactivation in women. Furthermore, a large number of already clarified and still
unclarified epigenetic components can alter the onset, the severity and the course of
a disease.
The genetics of human traits is difficult to study, as it is not possible to make
back-crosses, and the generation time is much longer than in classical model organisms
such as bacteria, yeast, Drosophila, mice and rats. Additionally, human families are
significantly smaller than those of classical models.
So far more than 6000 monogenic traits/diseases have been described. It must be
emphasized that fewer than 6000 genes lie behind these monogenic diseases. Some
diseases are caused by microdeletions or chromosome duplications, but as they follow
the rules of Mendelian heredity, they are classified as monogenic diseases in human
genetics. This also shows that the hereditary pattern applies to characteristics rather
than to genes.
Human diseases following a Mendelian inheritance pattern are much less frequent in a
population than those with polygenic (multifactorial) inheritance (Table 5.1). Some
physicians may only rarely meet a patient suffering from a monogenic disease during
their practice. This topic is still of crucial importance for future doctors, as the
genetic properties of mankind can be „deduced” this way. Several genetic processes,
intergenic interactions, even genetics as a whole can be studied relatively more easily
in monogenic systems. The knowledge obtained can then contribute to the understanding
of the development, the clinical course and the curability of polygenic, multifactorial
diseases. In these diseases the genes themselves have often not even been identified
yet. Numerous genes play a role in their development, therefore simplified connections
such as „gene – mRNA – protein - symptoms” cannot be set up either.
Listing and discussing all monogenic diseases is not the goal of the present chapter.
Besides other textbooks and the subject „Clinical Genetics”, a huge online reference
source is available: OMIM (Online Mendelian Inheritance in Man), http://omim.org/
We intend to widen the view: to draw attention to some new aspects besides the general
laws, thereby to reveal the complexity of heredity and demonstrate that there are no
absolute truths in nature, presently in genetics, and that the mutual effects of genes
with other genes and with the environment offer unlimited possible variations in the
individual. All of these may bring us closer to the desired personalised medicine,
hopefully not too far in the future. At the same time the field shifts from genetics to
the world of genomics.
Disease                           Inheritance  Prevalence
Monogenic
  Huntington chorea               AD           1:10,000
  Osteogenesis imperfecta         AD           1:10,000
  Familial hypercholesterolemia   AD           1:500
  Polycystic kidney               AD           1:800
  Cystic fibrosis                 AR           1:3,600
  Phenylketonuria                 AR           1:12,000
  Albinism                        AR           1:1,000-10,000
  Duchenne muscular dystrophy     XR           1:4,500 (in boys)
Mitochondrial
  Leber optic neuropathy                       1:50,000
Chromosomal
  Prader-Willi syndrome           Deletion     1:30,000
  Down syndrome                   Trisomy      1:500 (not hereditary in general)
Multifactorial (complex)
  Schizophrenia                                1:100
  Diabetes mellitus II*                        7:100
  Diabetes mellitus I                          4:1,000
  Manic depression                             1:10
  Breast carcinoma*                            1:10 (in women)
  Allergy                                      2:10
  Asthma                                       1:10
  Hypertension*                                3:10
  Obesity*                                     1:10
* Significantly more frequent in adulthood or in old age.
Table 5.1. The prevalence of some diseases
respect to the examined family. Obviously, anticipation complicates the exact analysis of
a pedigree as well.
Complex or compound heterozygotes: in recessive inheritance the offspring often
inherits two differently mutated alleles of the same gene from the heterozygous
parents, which means that the „aa” genotype should rather be described as „a1a2”.
The differently mutated alleles can result in differences in the severity of the
manifestation of the disease. To some extent, theoretically, the effects of the two
alleles could even compensate for each other.
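The „a1a2” notation can be illustrated with a simple enumeration of the possible
offspring. The Python sketch below is a minimal illustration, assuming one autosomal
locus, a normal allele labelled A and two hypothetical mutant alleles a1 and a2; the
function name is likewise only illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1, parent2):
    """Offspring genotype probabilities for one autosomal locus.

    Each parent is a pair of alleles; every combination of one maternal
    and one paternal allele is taken as equally likely.
    """
    counts = Counter(tuple(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Both parents are healthy carriers, but of different mutant alleles.
offspring = cross(("A", "a1"), ("A", "a2"))
for genotype, probability in sorted(offspring.items()):
    print("/".join(genotype), probability)

# Affected children carry no normal allele: here the compound heterozygote a1/a2.
affected = sum(p for g, p in offspring.items() if "A" not in g)
print("probability of an affected (a1/a2) child:", affected)
```

Under these assumptions one quarter of the children are compound heterozygotes, the same
proportion that the classical „aa” notation predicts for affected offspring.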
Pleiotropy: a single gene controls or influences multiple (and possibly unrelated)
phenotypic traits. A mutation in such a gene simultaneously affects more than one
trait. The explanation is that the gene is expressed in multiple organs, where it may
fulfill completely different functions (see Chapter 11, the example of the EDAR gene).
The protein coded by the gene may be an intermediate of further metabolic cascades, or
it may serve as a regulatory molecule participating in the regulation of several
different processes. Structural proteins, in turn, are present in practically all
tissues.
Heterogeny: it appears to be the opposite of pleiotropy, as multiple genes,
independently of one another, can each produce the same phenotype. In this case two
affected parents can have phenotypically healthy, unaffected children if the disease is
recessive.
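Why two affected parents can have healthy children is easiest to see when the genotypes
are written out. The Python sketch below is an illustration under simple assumptions:
two hypothetical, independently inherited loci (A and B), at each of which the
lowercase allele is recessive and causes the same phenotype when homozygous.

```python
from itertools import product


def gametes(genotype):
    """All possible gametes; genotype is a tuple of allele pairs, one per locus."""
    return set(product(*genotype))


def affected(genotype):
    """Affected if any locus carries two recessive (lowercase) alleles."""
    return any(a.islower() and b.islower() for a, b in genotype)


def children(parent1, parent2):
    """Set of possible child genotypes from the two parents."""
    kids = set()
    for g1, g2 in product(gametes(parent1), gametes(parent2)):
        kids.add(tuple(tuple(sorted(pair)) for pair in zip(g1, g2)))
    return kids


parent_1 = (("a", "a"), ("B", "B"))   # affected because of locus A
parent_2 = (("A", "A"), ("b", "b"))   # affected because of locus B

for child in children(parent_1, parent_2):
    print(child, "affected" if affected(child) else "unaffected")
```

Every child of this couple is a double heterozygote (Aa Bb) and therefore unaffected,
whereas two parents affected through the same locus would have only affected children.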
Phenocopy: owing to environmental influences, an inherited disease is manifested less
severely or not at all despite the presence of the mutant alleles. This can result from
medication or, for instance in phenylketonuria, from a proper diet. The opposite
situation is when environmental factors induce the manifestation of a genetic disease
despite the presence of fully healthy alleles (for example deafness caused by Rubella
virus infection). The phenocopy is called pathological when the phenotype is
pathological despite a healthy genetic background, and normal when the phenotype is
healthy despite a mutated genotype.
„de novo”, new mutation: a mutation that was neither present in nor transmitted by
earlier generations, so that the disease appears unexpectedly in one of the offspring.
In this case the new spontaneous mutation must have occurred in the germ line of one
of the parents of the affected child. Certain genes undergo mutation from time to time,
which explains why certain diseases never disappear from a population: mutations of
their genes can be „recreated” in some individuals. For instance, about 95% of Rett
syndrome cases and 80% of achondroplasia cases are due to new mutations.
Influence of the age: after having had healthy children, originally healthy parents
can have a child affected by an autosomal dominant disease because of the advanced age
of one of the parents. The mutation appears in the germline of the parent as a new
mutation. Interestingly, in monogenic diseases the mutation shows a stronger
correlation with the age of the father (~ above 50). The explanation is that in men
gametogenesis continues into old age: spermatogonia undergo multiple divisions, the
regulatory mechanisms fail to function properly, and the mutations arising during
replication become fixed.
Lethal/sublethal genes: a lethal mutation causes the death of all affected individuals,
whereas a sublethal mutation causes the death of only some of them. In special cases
the dominant allele can cause death before the affected person could have offspring
(e.g. Hutchinson-Gilford progeria,
http://ghr.nlm.nih.gov/condition/hutchinson-gilford-progeria-syndrome). This could
lead to the elimination of the mutant allele from a population, but as these lethal
mutations arise again and again, they do not disappear. In the case of Huntington
chorea the situation is different: the onset of the lethal disease is relatively late in
life, so the affected person will have had children by the time of the manifestation,
and the lethal mutated allele will therefore be transmitted.
„modifier genes”: genes that influence the expression of another gene, i.e.
interactions between two or more genes at different loci. When an originally monogenic
disease shows weaker or stronger symptoms, the reason is often that the expression of
the mutated gene is altered by the effects of other genes. The number of these modifier
genes is usually one or two. It has already been proven that the manifestation of some
diseases is slightly controlled by mutated forms of specific modifier genes. This offers
an explanation for the variable course of the disease in different persons (see
expressivity and penetrance above). Epistasis was long considered a separate
phenomenon, but by now it has become clear that it is an interaction between certain
main genes and known or still unknown modifier genes. So far relatively few modifier
genes have been identified, but it is supposed that in most cases not just one gene but
a set of modifier genes is involved in the manifestation of the disease (see Chapter 14,
Systems biology). The picture is further complicated by the fact that the modifier
genes themselves follow some hereditary pattern and can also be polymorphic, so they
can modify the main gene in different ways. The inheritance is called oligogenic in
those hereditary diseases whose development and manifestation have been proven to be
influenced by such modifier genes, as in the case of cystic fibrosis and polycystic
kidney disease. Until recently both were regarded as classical monogenic diseases. As
whole genome sequencing and exome sequencing (see Chapter 10) have become cheaper, it
has become possible to demonstrate, in accordance with systems biology theory, that
practically every monogenic disease is caused not only by the mutation of the „main
gene”: mutations of many other genes also contribute in parallel to the development of
the symptoms.
Heterozygote advantage: the superiority of the heterozygous genotype over either
homozygous genotype. The key may be that a particular allele has advantages under
given conditions, but a different allele is favored when conditions change. In the case
of certain autosomal recessive diseases heterozygotes have a reproductive advantage
due to environmental factors. This significantly alters the frequency of the disease in
these populations. Independently of the environmental factors, modifier genes are
supposedly also involved in this phenomenon.
Influence of the sex: the manifestation or severity of certain diseases differs between
men and women (see details in Chapter 6). In sex-influenced traits the autosomal genes
are expressed more strongly in one of the sexes (e.g. baldness). In congenital adrenal
hyperplasia (CAH) altered phenotypes develop in both sexes. In the so-called
sex-restricted diseases the phenotype is manifested in only one of the sexes, although
the inheritance is autosomal, because specific hormones are needed for the expression
of the disease. In pubertas praecox, for instance, the level and effect of sex hormones
play the main regulatory role in the development of the disease.
The influence of the environment: some monogenic diseases are manifested, despite the
mutated genotype, only when particular inducing environmental effects act on the
organism. The inducing factors are usually drugs or food. Earlier these diseases used
to be called ecogenetic (porphyria, malignant hyperthermia, glucose-6-phosphate
dehydrogenase deficiency) (see 5.4.4). Either an altered function of the modifier genes
or some epigenetic event lies in the background of the inducing effect.
Table 5.2 gives a short summary of the occurrence of the above-mentioned
terms/phenomena in connection with some autosomal diseases.
Column headings (the column positions of the marks are not preserved): locus
heterogeneity, allele heterogeneity, multiplex allelism, pleiotropy, variable
expressivity, incomplete penetrance, anticipation, phenocopy, heterozygote advantage,
influence of paternal age.
Achondroplasia                  X X
Marfan Syndrome                 X X X X
Osteogenesis imperfecta         ? X X X
Familial hypercholesterolemia   X X
Polydactyly                     X X X
Huntington Chorea               X
Deafness                        X
Cystic fibrosis                 X X X X
Phenylketonuria                 X
Albinism (albino phenotype)     X X
CAH                             X X X X
Xeroderma pigmentosum           X X
Sickle cell anemia              X X
Table 5.2. Summary of the genetic characteristics and phenomena in connection with
some AD and AR diseases. In column „Phenocopy” those diseases are shown whose negative
(mutated) gene effect can be compensated either by diet or by medicaments. Therefore the
treatment results in fully or partially healthy phenotype.
been mutated in the cascade. The inheritance pattern is mostly autosomal dominant,
rarely recessive. Although the mutated alleles are present in the patients, they do not
lead to phenotypic appearance unless certain specific environmental factors induce the
manifestation of the disease. These factors are of various kinds: drugs, alcohol,
steroids (for example contraceptives), stress, starvation, light, etc. The
classification of porphyrias in internal medicine and their biochemical background will
not be discussed in this chapter. Instead, once more, we intend to draw attention to
the fact that genes per se are not „omnipotent”, and although the monogenic background
is clarified, the inheritance pattern of this disease differs significantly from the
classical Mendelian scheme.
5.4.6.2. Malignant hyperthermia
The disease can be caused by mutations of at least six different genes. Mutations of
the CACNA1S and RYR1 (ryanodine receptor) genes are the most frequent. The product of
the CACNA1S gene regulates the function of the ryanodine receptor. As the ryanodine
receptor regulates the function of Ca++ ion channels, a mutation of either the CACNA1S
or the RYR1 gene results in the efflux of large amounts of Ca++ ions from the
sarcoplasmic reticulum into the cytosol, due to the faster opening and slower closing
of the ion channels. The increased Ca++ ion concentration causes increased muscle
contraction and increased heat production, resulting in uncontrollable high fever and
even death. This is indeed a pharmacogenetic disease, as it is triggered exclusively by
drugs, namely by those commonly used as general anesthetics.
5.5.2. Enzymopathies
The mutated alleles cause decreased enzymatic activity already in heterozygotes. Its
value can fall exactly between the enzymatic activity values of homozygous affected and
homozygous healthy persons. It is also possible that the level of the affected
metabolite increases significantly in the organism, yet there are no signs of the
disease.
The phenomenon of pleiotropy can be easily interpreted in the case of
enzymopathies, as the affected enzymes often catalyze steps of cascade reactions. If the
enzyme is missing from the beginning of the cascade or from a branching point, more
than one metabolic process will be probably damaged.
5.5.2.1. Phenylketonuria (PKU). The disease is caused by the lack of the phenylalanine
hydroxylase enzyme: toxic phenylpyruvate is produced instead of tyrosine. The enzyme
deficiency causes problems in the cascade of tyrosine conversion. Although tyrosine is
also supplied by the diet, the amount remains below normal needs, therefore DOPA and
melanin synthesis is impaired. PKU can be treated with a phenylalanine-free or
phenylalanine-poor diet. While the effects of the toxic product can be avoided, light
skin color, blue eyes and light hair color remain as the characteristic phenotype of
the affected persons. It has recently been discovered that the expression of the several
hundred types of phenylalanine hydroxylase mutation is influenced by modifier genes.
5.5.2.2. Classical albinism. The disease is caused by mutation of the tyrosinase gene,
therefore melanin synthesis fails. This enzyme is also a component of a cascade of
reactions, which explains the pleiotropic effects in albinism. It must be emphasized
that mutations of multiple other genes also result in a similar albino phenotype. These
diseases are caused by deficiencies in intracellular melanin transport. Mutations of
several different genes have already been identified, and the diseases have been given
different names. (See Table 5.3.)
5.5.4. Haemoglobinopathies
5.5.4.1. Sickle cell anemia. The cause of sickle cell anemia is one of the best-known
mutations. The 6th amino acid of the beta-globin chain of the haemoglobin molecule,
glutamic acid, is replaced by valine, due to a transversion in the gene. The mutated
haemoglobin is called haemoglobin S. It causes the sickle shape of the red blood cells
and gives rise to multiple additional pleiotropic effects. The development of the
heterozygote advantage against malaria can also be attributed to this mutation. (See
Chapter 11.)
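The terms used above can be made concrete with the codon change itself. The Python
sketch below is an illustration only; it assumes the widely documented GAG to GTG change
in the beta-globin gene (not stated in this chapter), contains only the two codons
needed, and uses hypothetical function names.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Only the two codons needed for this example (standard genetic code).
CODONS = {"GAG": "Glu", "GTG": "Val"}


def substitution_type(ref, alt):
    """Transition: purine<->purine or pyrimidine<->pyrimidine; otherwise transversion."""
    if ref == alt:
        raise ValueError("no substitution")
    both_purines = ref in PURINES and alt in PURINES
    both_pyrimidines = ref in PYRIMIDINES and alt in PYRIMIDINES
    return "transition" if both_purines or both_pyrimidines else "transversion"


def describe_point_mutation(ref_codon, alt_codon):
    """Locate the single changed base and report its consequence."""
    changes = [(i + 1, r, a)
               for i, (r, a) in enumerate(zip(ref_codon, alt_codon)) if r != a]
    if len(changes) != 1:
        raise ValueError("expected exactly one changed base")
    position, ref, alt = changes[0]
    return {
        "codon_position": position,
        "base_change": f"{ref}>{alt}",
        "type": substitution_type(ref, alt),
        "amino_acids": (CODONS[ref_codon], CODONS[alt_codon]),
    }


# Assumed codon change underlying haemoglobin S.
result = describe_point_mutation("GAG", "GTG")
print(result)
```

The change of A (a purine) to T (a pyrimidine) in the second position of the codon is a
transversion, and it replaces glutamic acid with valine.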
5.5.4.2. Thalassemia. The disease can be caused by several types of mutation.
Deletions, frameshift and splicing mutations can arise in the genes of both the alpha
and the beta chains of haemoglobin. This also explains the differences in the severity
and the geographical distribution of the types of thalassemia. Heterozygotes are
protected against malaria in the case of this disease as well.
not to be confused either with the fact that people can react very differently even to the
most common medicament, which can be explained by the large number of genomic
polymorphisms as well.
5.8. Conclusion
A new interpretation of classical monogenic inheritance and of the application of
Mendelian principles has arisen in the last few decades. There seems to be no doubt
today that environmental factors, epigenetic effects and the products of the so-called
modifier genes all influence the phenotypic manifestation of the allele pair (gene)
which is responsible for a given trait/disease. Even monogenic inheritance, which was
earlier believed to be relatively simple, seems to be more complicated and complex.
(See Systems biology, Chapter 14.) Although complexity used to be assigned to
polygenic, multifactorial inheritance, our view of monogenic diseases is widening due
to the new discoveries of genetics / genomics (see Fig. 5.1).
5.9. Questions
1. Describe Mendel’s principles!
2. Define the following terms! – gene, allele, multiplex allelism, complex or compound
heterozygotes, locus heterogeneity, allele heterogeneity, dominance, recessivity,
codominance.
3. Which phenomena interfere with the classical application of Mendel’s principles in
the case of monogenic diseases?
4. Define the following terms! Give examples of the diseases for each term! – pleiotropy,
expressivity, penetrance, anticipation, phenocopy, complex or compound
heterozygotes, heterogeny, sublethal/lethal gene, new mutation, modifier gene.
5. Describe the meaning and give examples: the age and the sex influence the
manifestation of some diseases.
6. Which are the classical monogenic inheritance patterns?
7. How has the discovery of oligogenic inheritance pattern affected our view of the
monogenic inheritance? Give examples!
8. What types of genes are usually mutated in the case of AD and AR diseases? Give
examples for each type!
9. Does the environment influence the manifestation of diseases following monogenic
inheritance patterns?
10. Describe the inheritance and the manifestation of tumors and pharmacogenetic
diseases with respect to the environmental effects!
Recommended readings
http://ebooks.thieme.com
Our sex influences the appearance of our characteristics on the one hand directly,
through the sex chromosomes and the genes encoded by them, and on the other hand
indirectly, through the characteristics of gametogenesis and fertilization.
In homozygous dominant XAXA females the symptoms are alleviated only by X
inactivation, whereas in heterozygous XAXa women the product (protein) coded by the
normal allele Xa can alleviate them as well.
Traits / diseases determined by genes in the PAR1 region of the X chromosome, e.g. the
Xg blood group antigen and amelogenesis imperfecta (incomplete tooth enamel
production), have such inheritance. In the latter the enamel layer of the teeth is
missing, and such teeth become carious more easily.
The best-known X-linked dominant disorder is hypophosphataemia (formerly called
vitamin D-resistant rickets, coded on the short arm of the X chromosome), which is
characterized by growth retardation in childhood, rickets and a low serum phosphate
level. It can be treated with large doses of vitamin D and phosphate.
Fragile X syndrome, a disease caused by a trinucleotide (CGG) repeat mutation, is also
X-linked dominant. It is the most common cause of mental retardation in males. While the
normal repeat number is <30, this number is between 200 and 2000 in affected
individuals. Between about 50 and 200 repeats lies the so-called premutation or
gray zone. Affected adult males are characterized by a long face, protruding ears,
large jaws and large testes. In addition to mental retardation, behavioral problems and
mood swings are part of the symptoms. The protein encoded by the FMR1 gene binds the
mRNAs of other genes involved in the functions of the nervous system, which probably
explains the symptoms.
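The repeat ranges quoted above can be summarized as a small lookup. The following is a minimal sketch, not a diagnostic rule: the function name is hypothetical, the text gives no label for 30-49 repeats (reported here as unclassified), and assigning exactly 200 repeats to the full mutation is an assumption.

```python
def classify_cgg_repeats(n_repeats: int) -> str:
    """Classify an FMR1 CGG repeat count using the ranges quoted in the text."""
    if n_repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if n_repeats < 30:
        return "normal"
    if n_repeats < 50:
        # The text does not name this range; it is flagged rather than guessed.
        return "unclassified (range not specified in the text)"
    if n_repeats < 200:
        return "premutation (gray zone)"
    # Assumption: exactly 200 repeats is counted as a full mutation.
    return "full mutation"


for n in (20, 40, 120, 800):
    print(n, "->", classify_cgg_repeats(n))
```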
The assessment of X-linked dominant pedigrees is complicated by the so-called
X-linked male lethality. Since they have no normal allele, hemizygous male embryos
die in utero. In such families there are usually not enough offspring for
the 2:1 female : male sex ratio characteristic of this inheritance to become apparent.
Incontinentia pigmenti, which is associated with hemizygous lethality, is a disorder of
pigmentation characterized by blistering of the skin in early childhood and by partial
hair loss, and it manifests only in women. Rett syndrome, which is basically a
neurodevelopmental disorder, is also characterized by male lethality, and epigenetic
phenomena are involved as well. In girls the typical progressive symptoms (loss of speech
and of acquired motor functions, compulsive hand-wringing, ataxia and seizures) are due
to mutation of the MECP2 gene, which encodes a methyl-cytosine binding protein.
1/ zigzag or knight’s move pattern: the disease is transferred from mother to son
and from son to his daughter
2/ there are many more men affected than women
3/ affected women are born to an affected father and an obligate heterozygote mother
4/ an affected man usually comes from healthy parents, where the mother is an obligate
carrier
5/ there is no man-to-man transmission
Although hemophilia has been known for at least 4,000 years (the Talmud already mentions
that in families where a son of a matrilineal relative had bled to death at
circumcision, the newborn sons were not circumcised), the
first point mutation was described only in 1986. X-linked recessive hemophilia has
two forms: hemophilia A, which is due to the failure of blood clotting factor VIII, and
hemophilia B, which is due to the failure of blood clotting factor IX.
In 40% of hemophilia A cases a specific mutation of the factor VIII gene occurs.
Intron 22 of the gene contains two small genes of unknown function, F8A and F8B.
About 400 kb away there are further copies of F8A as well. Intrachromosomal crossing
over between these copies can take place during meiosis, causing the inversion
of the intervening chromosome segment, and thus the factor VIII gene is split into
two distant pieces.
This is the cause of the lack of the clotting factor and thus of hemophilia. The most
common mutation causing hemophilia occurs in the paternal germ line during meiosis. The
large number of divisions and the concomitant increase in spontaneous mutation rate typical
of paternal gametogenesis explain, among other things, why mutations occur with higher
probability in the offspring of older fathers.
One of the best known and most studied cytoskeletal diseases is Duchenne
muscular dystrophy. This X-linked recessive disease, which was described in the
second half of the 19th century, begins in the 2nd-3rd year of life with difficulty
in standing up (Gowers' sign) and is associated with increasing muscle weakness.
The boys are wheelchair-bound around the age of 10 and die around 20 years of
age. Because the disease is relatively common (incidence of 1:3500) and is incurable to
this day, it is intensively investigated. Thus it came to light that the cause
of the disease is a mutation of the gene encoding a cytoskeletal protein called
dystrophin. Dystrophin is a muscle cell specific protein whose C-terminal end is bound
to the sarcolemma through a glycoprotein complex of six components and whose N-terminus
is linked to the actin cytoskeleton. Dystrophin is the product of the largest
currently known gene, which is 2400 kb in length, so that its transcription takes
more than 16 hours. The function of dystrophin in muscle is the stabilization of the
cell membrane. The mutation is often a deletion causing a frame-shift, and thus the cell
either does not produce dystrophin or synthesizes a protein with completely altered
structure and function. If the mutation in the dystrophin gene is in-frame, that is,
only a small part of the protein is missing, the so-called Becker muscular dystrophy
with milder symptoms develops. The Duchenne and Becker muscular dystrophies are due to
different mutant alleles of the same gene, so they are examples of allelic
heterogeneity as well. As many other mutations (for example, point mutations
and duplications) occur in the dystrophin gene, multiple allelism is also typical
for it.
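The difference between frame-shift and in-frame deletions comes down to whether the number of deleted coding nucleotides is a multiple of three. A minimal sketch of this reading-frame rule follows; the function name and the deletion sizes are hypothetical, and the rule is a simplification, since it looks only at the deleted coding length and exceptions to it are known.

```python
def deletion_consequence(deleted_coding_bases: int) -> str:
    """Predict the effect of a deletion on the reading frame.

    A deletion whose coding length is a multiple of 3 removes whole codons and
    leaves the downstream reading frame intact; any other length shifts it.
    """
    if deleted_coding_bases <= 0:
        raise ValueError("the deletion must remove at least one base")
    if deleted_coding_bases % 3 == 0:
        return "in-frame (shortened protein, Becker-type expected)"
    return "frame-shift (no functional protein, Duchenne-type expected)"


# Hypothetical deletion sizes, chosen only to illustrate both outcomes.
for size in (150, 151, 300, 1024):
    print(f"{size} bp deleted: {deletion_consequence(size)}")
```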
Since affected men generally do not reach reproductive age, they cannot
transmit their mutant gene to offspring, so this sub-lethal mutant gene should
gradually disappear from the population. However, the incidence of the disease is fairly
constant; this is possible only because the rate of new mutations is high, so that the
mutant gene is repeatedly produced. According to recent observations, deletions involving
the dystrophin gene typically take place in the maternal germ line, while the other
types of mutations are more common in the paternal germ line, but the reason
is not yet known.
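The balance between the loss of mutant alleles and new mutations can be illustrated with a classical population genetic approximation that is not part of the text above. Males carry one third of the X chromosomes of a population, so if affected males do not reproduce, about one third of the mutant alleles are lost in each generation and must be replaced by new mutations. Under this assumption, and ignoring any selection in carrier females, the mutation rate per gamete is roughly one third of the incidence in males. The function name below is hypothetical; the incidence of 1:3500 is the one quoted for Duchenne muscular dystrophy.

```python
def xlinked_recessive_mutation_rate(male_incidence: float,
                                    relative_fitness: float = 0.0) -> float:
    """Equilibrium mutation rate for an X-linked recessive disease.

    male_incidence   -- frequency of affected males at birth
    relative_fitness -- reproductive fitness of affected males (0 = none reproduce)
    """
    if not 0.0 <= relative_fitness <= 1.0:
        raise ValueError("relative_fitness must lie between 0 and 1")
    selection_coefficient = 1.0 - relative_fitness
    # One third of all X chromosomes are in males and exposed to selection.
    return selection_coefficient * male_incidence / 3.0


mu = xlinked_recessive_mutation_rate(1 / 3500)
print(f"estimated mutation rate: {mu:.2e} per gene per generation")
```

With these inputs the estimate is of the order of 10^-4 per gene per generation, which is high compared with most genes and fits the statement that the mutant gene is repeatedly produced.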
X chromosome inactivation further complicates pedigree analysis in X-linked
recessive inheritance as well. The phenotype of heterozygous females varies depending
on the ratio of cells bearing the healthy XA and the mutant Xa as the active X. If the
gene product is a soluble protein, such as the clotting factors in hemophilia, the
effect is “averaged”. In other words, these women are asymptomatic but differ
biochemically from normal. However, where the product is localized to a given cell
type, the symptoms appear in a mosaic form. An example is hypohidrotic ectodermal
dysplasia, where the mutation causes the absence of sweat glands and the abnormal
development or deficiency of the dentition.
In this case, the paternal half of the genome of the developing embryo affects the
development of the placenta in such a way that it may cause a sudden increase of
maternal blood pressure towards the end of the pregnancy.
neutrophils is a particular manifestation of the inactive X, that is, of the Barr body.
The detection and microscopic examination of Barr bodies is quick and simple; in the
past it was used for quick sex determination in connection with sports competitions.
Initially, it was thought that the whole X chromosome is inactive, but we now know that
the PAR regions are never inactivated! Moreover, non-inactivated X chromosomal
genes have also been found outside the PAR; some of them have a functional, and therefore
transcribed, homologue on the Y chromosome, while others have only a non-functional
pseudogene on the Y (such as the steroid sulfatase (STS) gene and the anosmin
gene responsible for Kallmann syndrome).
In other species where heteromorphic sex chromosomes occur, other
mechanisms of dose (dosage) compensation exist. The X chromosome in male Drosophila is
twice as active as in females. In this way the female : male ratio of X-linked gene
activity becomes 2:2, i.e. 1:1, instead of 2:1. It is also possible that both female
X chromosomes are only half as active as the male one, so that 1/2+1/2 : 1 =
1:1 is the final ratio.
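The arithmetic behind the three compensation mechanisms can be checked with a few lines of code. This is only an illustrative sketch: the names are hypothetical, and the activity of one fully active X chromosome is set to an arbitrary unit of 1.0.

```python
def total_x_activity(active_x_count: int, activity_per_x: float) -> float:
    """Total X-linked gene activity of a cell, in arbitrary units."""
    return active_x_count * activity_per_x


# (number of active X chromosomes, activity of each) for females and males.
MECHANISMS = {
    "one X inactivated in females (as in humans)": {"female": (1, 1.0), "male": (1, 1.0)},
    "single male X twice as active (Drosophila)": {"female": (2, 1.0), "male": (1, 2.0)},
    "both female X chromosomes half as active": {"female": (2, 0.5), "male": (1, 1.0)},
}


def female_to_male_ratio(mechanism: str) -> float:
    female = total_x_activity(*MECHANISMS[mechanism]["female"])
    male = total_x_activity(*MECHANISMS[mechanism]["male"])
    return female / male


for name in MECHANISMS:
    print(f"{name}: female : male = {female_to_male_ratio(name):.0f} : 1")
```

Without any compensation the same calculation gives 2 : 1, which is the imbalance all three mechanisms remove.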
An interesting variant of X chromosome inactivation is the so-called skewed X
inactivation. This means that in certain tissues it is always the same X chromosome,
let’s say always the paternal one, that is inactivated. This may have far-reaching
consequences. It has been used in attempts to explain the higher frequency of certain
autoimmune diseases (e.g. SLE) observed in females. T lymphocytes maturing in the thymus
can only become tolerant to those antigens which are encoded by the X chromosome active
there, and not to the antigens encoded by the other, inactive one. Thus, all cells and
tissues where the other X chromosome is active are considered non-self, and an immune
response is generated against them, resulting in autoimmune disease symptoms. Of course,
this cannot be the sole cause of autoimmune diseases, since it does not explain why the
disease manifests at different ages.
6.8. Questions
1. What is the role of RNAs in cytoplasmic inheritance?
2. What kinds of dose compensation mechanisms are known?
3. What is the supposed role of skewed X inactivation in the development of
autoimmune diseases?
4. Based on pedigree analysis how can we distinguish the X linked dominant
inheritance from the autosomal dominant one?
5. What are homo- and heteroplasmy?
6. What can be the consequences of maternal heteroplasmy?
7. What do you know about the genetics of pre-eclampsia?
8. What are the characteristics of the inheritance of precocious puberty?
9. Which genes can escape the X inactivation?
10. How do the symptoms of a carrier woman differ depending on whether the X-linked
gene encodes a soluble or a cell-bound product?
In this section we give a brief insight into the genetics of 3 biological processes:
a. Developmental genetics
b. Oncogenetics
c. Immunogenetics
7.1.1. Morphogens
Morphogens involved in cell differentiation are soluble molecules whose effects
depend on their concentration gradient. One such morphogen is activin, which is able
to determine different cell types depending on its concentration (e.g. in vitro, at a
concentration of ~0.1 ng/ml mesenchymal differentiation is induced, while at 1.0 ng/ml
skeletal muscle differentiation is induced).
Another morphogen is sonic hedgehog (SHH), which has a role in the
differentiation of the neural tube and in the separation of the eyes. The sonic
hedgehog produced by the ventral, central cells of the neural tube gradually diffuses
towards the dorsal cells, where, at an almost negligible concentration, sensory neurons
are generated; the ventral and lateral cells, on the other hand, are exposed to a larger
concentration and differentiate into motor neurons.
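How a single diffusing molecule can specify different cell types can be sketched with a simple threshold model. Everything numerical here is an assumption made for illustration: the exponential shape of the gradient, the arbitrary units of distance and concentration, the threshold value and the function names are not taken from the text.

```python
import math


def shh_concentration(distance_from_source: float,
                      source_concentration: float = 1.0,
                      decay_length: float = 1.0) -> float:
    """Morphogen concentration at a given distance, assuming exponential decay."""
    if distance_from_source < 0:
        raise ValueError("distance cannot be negative")
    return source_concentration * math.exp(-distance_from_source / decay_length)


def neural_fate(concentration: float, threshold: float = 0.2) -> str:
    """Cells above the (hypothetical) threshold become motor neurons."""
    return "motor neuron" if concentration >= threshold else "sensory neuron"


# Positions run from the ventral source (0.0) towards the dorsal side.
for position in (0.0, 1.0, 2.0, 3.0):
    conc = shh_concentration(position)
    print(f"position {position:.1f}: concentration {conc:.3f} -> {neural_fate(conc)}")
```

The same pattern, with more than one threshold, would describe the concentration-dependent effect of activin mentioned above.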
The next step in the differentiation cascade initiated by SRY is the production of
anti-Mullerian hormone (AMH) or MIS (Mullerian inhibiting substance) by the developing
testicular Sertoli cells. Thus development in the female direction, that
is, the development of the Mullerian duct, is inhibited. Shortly afterwards the
production of testosterone starts in the Leydig cells, and this leads to the development
of male gonads and external genitalia.
Besides the experiments mentioned above, the role of SRY was suggested by human
diseases associated with abnormalities of sex development. One of these is sex reversal,
where a male phenotype develops in the presence of XX sex chromosomes, or a female
phenotype with an XY genotype. The possible explanation is that in paternal meiosis
the obligate crossing over does not take place in PAR1 but is shifted proximally,
towards the centromere. Thus the SRY gene is transferred to the X chromosome, and a
recombinant, aberrant X and a microdeleted Y are formed.
There are also sex revertants in which a female phenotype is formed because of a
mutated SRY. In these cases the HMG (high mobility group) part, the DNA binding
domain of the protein, is defective, and in the absence of DNA binding the
differentiation cascade cannot start.
Although SRY alone is sufficient for male sex determination, i.e. to induce
differentiation, many other autosomal genes (e.g. SOX9 [SRY HMG box related gene],
a transcription factor encoding gene located on chromosome 17) and X chromosomal
genes are necessary to switch on SRY and for the whole
process of sexual differentiation.
For normal sexual differentiation not only inductors of sufficient quality and
quantity but also their adequate receptors are necessary. Mutations of the receptors
also cause disturbed sexual development.
The androgen insensitivity syndrome (AIS), formerly known as testicular
feminization (an X-linked recessive hereditary disease), drew attention to a gene located
on the X chromosome which is involved in male sexual differentiation. In this disease,
despite an XY genotype and a normal serum testosterone level, female external sexual
characteristics develop, although there are testes in the abdominal cavity! Since
neither ovaries nor a uterus develop, these patients are sterile. It was concluded from
the symptoms that the defect must lie after testosterone induction in the differentiation
cascade (either the receptor, its signal transduction or the target genes may be
defective). Finally, mutations of the testosterone receptor were verified as the cause of
the syndrome.
The role of pituitary-derived hormones in disorders of sex differentiation was
demonstrated by the symptoms of patients with Kallmann syndrome. The most common
symptoms are anosmia (lack of the sense of smell) and the complete absence of testicular
function although XY sex chromosomes are present. The disease is caused by a deletion
of a gene located proximally to the PAR1 region of the X chromosome. The gene
encodes a cell adhesion protein which has a role in neuronal migration. Some of
these stem cells migrate to the olfactory nerve, others to the hypothalamus during
development. In the latter area they produce gonadotropin-releasing hormone (GnRH)
and thus indirectly, through the gonadotropin synthesis of the pituitary, they affect
gonadal differentiation. This explains the unusual combination of symptoms:
hypogonadotropic hypogonadism and anosmia. The KAL1 gene has a Y chromosomal
homologue too, but it is an inactive pseudogene.
this is the "default pathway". Later, the examination of rare families with female to
male sex reversal revealed that there is a female sex determining gene as well, the
R-spondin1 (RSPO1). If it is mutated, a male phenotype is formed with a 46,XX karyotype.
Unlike SRY, it encodes a soluble ligand which competes with the WNT4 factor for a
membrane receptor (frizzled) and triggers the β-catenin pathway, leading to target gene
activation and thereby to female sex determination and differentiation. Of course, as in
males, a variety of other transcription and growth factors are needed to reach the
terminally differentiated state, and they are perhaps even less known than those
involved in male sex determination and differentiation. But it is certain that the
components of the two systems mutually inhibit each other.
According to our current knowledge, male and female determinants are in balance in the
bipotential gonads, and only later, at the time of the expression of SRY or
RSPO1, is the balance shifted one way or the other.
Mutations affecting steroid metabolism, which also plays an important role in sexual
differentiation, cause the autosomal recessive congenital adrenal hyperplasia
or adrenogenital syndrome. Female infants of XX genotype are then born with ambiguous
external genitalia, generally with an enlarged clitoris. Other symptoms are adrenal
enlargement, salt loss and lack of cortisol. The disease is due to mutation of the
21-α-hydroxylase enzyme. Due to this mutation progesterone cannot be converted to
deoxycorticosterone, but only to 17-OH-progesterone. The latter has an androgen-like
effect, and it is responsible for the masculinisation of the external genitals. Although
the incidence of the disease is 1:8 000–25 000 in the Caucasian population, among the
Yupik Eskimos it is very common, 1:500. This is probably due to the fact that
heterozygotes have a selective advantage against the bacterial infection caused by
Haemophilus influenzae type B, which causes not only a simple cold but also meningitis
in Eskimos of the normal AA genotype.
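The incidence figures above can be turned into approximate carrier (heterozygote) frequencies. The sketch below assumes Hardy-Weinberg proportions, which is an idealization (the heterozygote advantage discussed above is itself a departure from it), and the function name is hypothetical.

```python
import math


def carrier_frequency(incidence: float) -> float:
    """Heterozygote frequency 2pq for an autosomal recessive disease.

    Assumes Hardy-Weinberg proportions, so that incidence = q**2.
    """
    if not 0.0 < incidence < 1.0:
        raise ValueError("incidence must lie between 0 and 1")
    q = math.sqrt(incidence)   # frequency of the mutant allele
    p = 1.0 - q                # frequency of the normal allele
    return 2.0 * p * q


for label, incidence in (("1:25 000", 1 / 25000),
                         ("1:8 000", 1 / 8000),
                         ("1:500", 1 / 500)):
    freq = carrier_frequency(incidence)
    print(f"incidence {label}: about 1 carrier in {1 / freq:.0f} people")
```

Under these assumptions an incidence of 1:500 corresponds to roughly one carrier in every 12 people, which shows how common the mutant allele must be in that population.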
7.4. Oncogenetics
Although the process and causes of carcinogenesis have already been discussed in
several other subjects, this section covers the major genetic events of
tumor development, because at the cellular level tumors may also be considered genetic
disorders.
Cancer affects 1 in 3 people worldwide, and a man has a ~40% chance of developing
cancer. This high frequency in itself indicates that tumors are usually not of monogenic
origin, with the exception of rare monogenic tumors such as retinoblastoma or
Li-Fraumeni syndrome. A number of underlying genetic susceptibility
factors (mutations) and environmental effects are involved.
Cancer can be described as a group of diseases characterized by unlimited
proliferation and spread of mutant cells in the body.
7.4.1. Oncogenes
Oncogenes are actually normal genes (proto-oncogenes) with altered function, which are
essentially involved in cell cycle regulation. They include genes encoding growth
factors (such as EGF) and their receptors (such as EGFR), the components involved in
their signal transduction (such as Ras and Raf) and transcription factors. Mutations of
these lead to growth factor independent, unlimited cell proliferation; this can be the
result, for example, of the constitutive activation of mutant receptor tyrosine kinases.
Oncogenes are activated not only by point mutations in the above players, but also by
gene amplification or chromosome translocations (e.g. the t(9;22) translocation leading
to the Ph1 chromosome described in chronic myeloid leukemia, which results in a fusion
protein with increased tyrosine kinase activity).
In addition to classic genetic alterations, epigenetic changes (epimutations)
can also cause oncogene activation. It is known that the genome
hypomethylation that increases during aging often affects oncogenes. This explains not
only the higher activity of oncogenes but also the known phenomenon that the incidence
of certain cancers increases with age. A specific example of the relationship between
oncogenes and epigenetics is the imprinted IGF2 (insulin-like growth factor 2) gene.
Normal colonic epithelial cells express only the paternal allele, but in colon tumors
the imprinting is lost (LOI = Loss Of Imprinting), the maternal allele is expressed as
well, and the tumor develops.
functions, gatekeepers and caretakers are distinguished. The former include the
classic tumor suppressors, e.g. the RB and TP53 genes, the latter the DNA repair genes,
also known as mutator genes (e.g. the MLH1 and MSH2 mismatch repair genes).
While a single mutant allele of a proto-oncogene is sufficient for oncogenesis, so the
mutation is dominant, in the case of tumor suppressors both alleles must be
mutated for the growth inhibiting function to be lost. Here, then, the mutation is
recessive. In the caretaker or mutator genes haploinsufficiency may
play a role in oncogenesis: when one allele is mutated, the remaining normal
allele provides only reduced function, and in many cases even this is sufficient for
tumor induction because of the large number of uncorrected mutations.
Knudson set up the so-called two-hit hypothesis after investigating a tumor
suppressor (RB). According to this, the development of certain cancers requires two
successive mutational events affecting a tumor suppressor gene. In familial cases (e.g.
familial retinoblastoma) one mutation is inherited and is therefore present in every
cell, while the other arises only in one or a few organs; as the previously
heterozygous state is lost, the homozygous mutant tumor suppressor gene leads to tumor
formation. In sporadic cases both mutations arise in the same person. The phenomenon is
called loss of heterozygosity (LOH), and since it can be identified by modern molecular
biological methods, it may be suitable for the detection of pre-cancerous conditions.
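Why an inherited first hit matters so much can be illustrated with a toy calculation. This is a sketch under strong simplifying assumptions, not a model of any real tumor: the number of target cells and the per-allele hit probability are hypothetical round numbers, the two hits are treated as independent events with the same probability, and the function name is invented for the example.

```python
def expected_double_hit_cells(n_cells: int, hit_probability: float,
                              inherited_hits: int) -> float:
    """Expected number of cells with both tumor suppressor alleles inactivated.

    inherited_hits -- 1 if one mutant allele is inherited (familial case),
                      0 if both hits must arise somatically (sporadic case)
    """
    if inherited_hits not in (0, 1):
        raise ValueError("inherited_hits must be 0 (sporadic) or 1 (familial)")
    somatic_hits_needed = 2 - inherited_hits
    return n_cells * hit_probability ** somatic_hits_needed


N_CELLS = 1_000_000        # hypothetical number of target cells
HIT_PROBABILITY = 1e-6     # hypothetical probability of a hit per allele

familial = expected_double_hit_cells(N_CELLS, HIT_PROBABILITY, inherited_hits=1)
sporadic = expected_double_hit_cells(N_CELLS, HIT_PROBABILITY, inherited_hits=0)
print(f"familial: {familial:.2g} expected double-hit cells")
print(f"sporadic: {sporadic:.2g} expected double-hit cells")
```

Whatever values are chosen, the familial expectation exceeds the sporadic one by a factor of 1 / hit probability, in line with the observation that hereditary tumors appear earlier and are often multiple.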
As with oncogenes, epigenetics and epimutations may play a role in tumor suppressor
genes as well. While CpG dinucleotides in the promoters of
normal tumor suppressors are not methylated, thereby ensuring gene expression,
in tumors they are often hypermethylated, so that transcription of the gene is
inhibited and the protection against excessive cell proliferation is lost. Another
epigenetic link is the formation of a complex between tumor suppressor proteins and
HDAC (histone deacetylase). Normal suppressor proteins interact with HDAC, thereby
triggering chromatin remodelling, i.e. heterochromatinization, which limits the
functioning of genes in the affected area and thereby inhibits cell proliferation.
Mutant suppressors are unable to do so; therefore the euchromatic structure remains
and proliferation continues.
7.4.4. Telomerase
It is known that in somatic cells eukaryotic DNA shortens from division to division
because of the characteristics of replication. This occurs in the subtelomeric and
telomeric repetitive sequences of chromosomes, and after approx. 50-70 divisions it
leads to cell senescence, arrest of cell division and aging. In germ line cells the
telomerase enzyme, which comprises a reverse transcriptase and an RNA complementary
to the telomeric DNA, can restore the length of the telomere. This is crucial for the
transmission of a genome of the same size from generation to generation. However,
telomerase activity is also linked to cancer cells. They can restore their telomeres
either by up-regulating the telomerase enzyme or by recombination based alternative
lengthening of telomeres.
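The idea of a division counter can be sketched with a simple loop. All numbers are hypothetical and were chosen only so that the result falls within the 50-70 divisions mentioned above; real telomere lengths and loss rates vary between cell types, and the function and parameter names are invented for this illustration.

```python
from typing import Optional


def divisions_until_senescence(initial_bp: int, loss_per_division_bp: int,
                               critical_bp: int,
                               telomerase_gain_bp: int = 0) -> Optional[int]:
    """Count divisions until the telomere would fall below the critical length.

    Returns None when telomerase restores at least as much as is lost,
    i.e. the cell never reaches senescence by this mechanism.
    """
    net_loss = loss_per_division_bp - telomerase_gain_bp
    if net_loss <= 0:
        return None
    divisions = 0
    length = initial_bp
    while length - net_loss >= critical_bp:
        length -= net_loss
        divisions += 1
    return divisions


# Hypothetical values: 10 kb telomere, 100 bp lost per division, 4 kb limit.
print("somatic cell:", divisions_until_senescence(10_000, 100, 4_000))
print("with telomerase:",
      divisions_until_senescence(10_000, 100, 4_000, telomerase_gain_bp=100))
```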
If a cell, owing to various mutations, avoids the cell death caused by extremely short
telomeres, its genome becomes unstable, leading to the oncogenic transformation of the
cell through the aforementioned mutations (amplifications, translocations). This can be
further strengthened by telomerase induced by mutated genes (e.g. c-MYC can activate
telomerase by binding to its promoter).
7.5. Immunogenetics
In the teaching of immunology a number of genetic processes critical to the function of
the immune system have also been discussed. Of these, perhaps the most specific is the
process which leads to the enormous diversity of immunoglobulins (B cell receptors) and
T-cell receptors. Every person is capable of producing approx. 10^11 different
antibodies, although the human haploid genome size is "only" 3×10^9 bp.
Somatic gene rearrangement and somatic mutations resolve this
discrepancy. There are only 3 immunoglobulin (Ig) / B cell receptor (BCR) loci (IGH, IGK
and IGL) in the human genome, which determine the heavy and the two light chains (κ
and λ), and four T-cell receptor (TCR) loci (TRA, TRB, TRG and TRD), which make the
synthesis of the four TCR chains (α, β, γ and δ) possible.
Despite the enormous diversity each individual B and T cell is monospecific, i.e. it can
produce only one type of Ig or TCR heterodimer with a unique antigen binding site,
which is specific for one antigen only. Different B and T cells express different Ig and
TCR heterodimers specific to different antigens, and this enables the population of
billions of cells of the immune system to recognize virtually any antigen. The diversity
is due to the specific organization and expression of the corresponding genes.
Take the immunoglobulins as an example, by way of a reminder. Each Ig consists of 4
chains, two heavy (H) and two light (L) chains. Each chain contains a variable (V) and a
constant (C) region. There are no complete genes for the heavy and light Ig chains in the
human genome; instead, each H and L chain is defined by a number of separate genes. The
heavy chain variable region is built from three kinds of segments: V (variable), D
(diversity) and J (joining). (In the light chain the D segment is missing.) At the H
locus about 200 V, approx. 30 D and 9 J genes (including 3 pseudogenes) are found. These
genes are inactive in the cells of non-immune organs and become active only during T- and
B-cell maturation. In the primary immune organs one of each of these genes (a V, a D and
a J) is randomly selected and they are brought together so that a new fusion exon is
formed, which determines the H chain variable region.
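The contribution of random V-D-J joining to diversity can be estimated by simple multiplication of the segment numbers quoted above. The sketch assumes that every V and D gene is functional and that only the 3 J pseudogenes are unusable; the text does not state this, and the function name is hypothetical.

```python
def heavy_chain_vdj_combinations(n_v: int = 200, n_d: int = 30,
                                 n_j: int = 9, n_j_pseudogenes: int = 3) -> int:
    """Number of distinct V-D-J combinations for the heavy chain variable region."""
    functional_j = n_j - n_j_pseudogenes
    if min(n_v, n_d, functional_j) <= 0:
        raise ValueError("at least one functional gene of each type is required")
    # One V, one D and one J are chosen independently, so the counts multiply.
    return n_v * n_d * functional_j


combinations = heavy_chain_vdj_combinations()
print(f"heavy chain V-D-J combinations: {combinations:,}")
print(f"shortfall relative to 10^11 antibodies: about {1e11 / combinations:.1e}-fold")
```

Combinatorial joining of the heavy chain segments therefore yields only tens of thousands of variants, far fewer than the approx. 10^11 antibodies; the remainder must come from other sources of diversity, such as pairing with different light chains and somatic mutation.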
This process is called somatic gene rearrangement or somatic recombination
(Figure 7.2), and it is achieved through DNA splicing. The RAG1 and RAG2 enzymes,
encoded by the recombination activating genes 1 and 2, are involved in V-D-J
recombination.
The D-J and the V-DJ rearrangements of the immunoglobulin heavy chains, the V-JC
recombinations of the λ light chains and the V-J rearrangements of the κ light chains
are carried out in this way. The further joining of both heavy (VDJ-C) and light chains
(VJ-C) is due to mRNA splicing.
Figure 7.2. Somatic recombination of the V-D-J genes of the immunoglobulin heavy chain.
Source: http://en.wikipedia.org/wiki/File:VDJ_recombination.png; 29/07/2013.
RNA splicing also takes place during the switch from the membrane-bound IgM C
domain to the soluble IgM C domain.
Somatic gene rearrangement of the T-cell receptor takes place in a manner similar to
that observed in the case of immunoglobulins.
The diversity is further increased by the facts that the site of recombination can be
shifted by a few bases in the 5' to 3' direction and that the RAG recombinases cause
double strand DNA breaks whose repair is inaccurate, so that the specificity
of the antibodies can vary even more.
In addition, there is somatic hypermutation, by which random base-exchange
mutations occur in the V region in B cells. This mechanism does not function in other
cells, and other genes are not affected, only a region of ca. 1.5 kb. It takes place
only during the activation of B cells: when a B cell starts to divide after interacting
with an antigen, the resulting somatic hypermutations alter the antigen-binding region.
The cells that bind the antigen best survive and divide more than the other B cells.
This process is called affinity maturation. It is triggered by the activation-induced
cytidine deaminase (AID), which deaminates cytidine to uracil. This base mismatch,
repaired inexactly by a variety of mechanisms, can result in a number of different
mutations.
Figure 7.3.
Source: http://pandasthumb.org/archives/2006/07/3-recent-report.html; 29/07/2013.
Figure 7.4.
Source: http://en.wikipedia.org/wiki/File:Class_switch_recombination.png; 29/07/2013.
autosomal gene (chromosome 20). Then, although the mutation affects only one gene,
the varied symptoms are the consequences of defective methylation of many other
genes, i.e. they are due to the lack of correct spatial and temporal methylation.
7.7. Questions:
1. What kind of oncogene activation mechanisms do you know?
2. What is LOI?
3. What is the function of caretaker and gatekeeper genes?
4. What do you know about the iPS cells?
5. What is sonic hedgehog and what is its effect based on?
6. What is the role of HOX genes?
7. What is the role of SRY and RSPO1?
8. Give an example of epigenetic changes related to carcinogenesis!
9. Explain Knudson’s hypothesis!
10. Why is somatic recombination not regarded as an epigenetic mechanism?
8.1. Genomics
Although the science of genomics has a history of several decades, it has become
well known only in the last 20 years, even among the majority of natural scientists. It
is one of the most rapidly developing scientific areas, but for most people, even for
physicians or pharmacists who graduated before the 90s, it still covers mainly unknown
concepts. For this reason, I will define some terms here.
First of all: What is the genome? The genome is the entirety of an organism's
hereditary information. It is encoded either in DNA or, for many types of viruses, in RNA.
The genome includes both the genes and the non-coding sequences of the DNA/RNA.
The genome in humans is the total haploid DNA content of a diploid cell, plus the
mitochondrial DNA. Because the female and male genomes differ,
since males have two types of sex chromosomes (X and Y), this also has to be taken
into consideration in the definition.
The next important question is: what is genomics? A number of definitions are
possible, but perhaps the simplest is: Genomics is the study of the
function, structure and interactions of the genome. The term genomics also covers the
special genomic methods. Besides the study of DNA, genomics can also involve, among
others, the study of RNA (transcriptomics) and proteins (proteomics), and
bioinformatics. Because in English a lot of terms in this area end with “omics”, omics
has become a new synthetic term and is widely used in biology and related
sciences (http://en.wikipedia.org/wiki/Omics; http://www.omicsworld.com/).
In some definitions genomics is a synonym of molecular systems
biology, in which life is studied at the level of genomic organization. However, genomics
is rather a part of systems biology. According to its subject, genomics has
several subtypes, such as structural genomics, comparative genomics, plant
genomics, human genomics, pharmacogenomics or medical genomics, etc. Here we deal
mainly with the last three of them.
There is one more important question, which often causes problems for many
people. What is the difference between genetics and genomics? In reality, there is no
sharp difference between the two terms, but in general, if a single gene or genetic
variation is studied, or the heredity of some trait is investigated, it is called
genetics. If several genes or the genome as a system are studied, it is genomics.
Because genomics is part of systems biology, it requires more complex and sophisticated
methods. However, even in scientific usage, genetics and genomics are often used
interchangeably.
Nature, and was followed, one day later, by a Celera publication in Science. Despite some
claims that shotgun sequencing was in some ways less accurate than the clone-by-clone
method chosen by the HGP, the technique became widely accepted by the scientific
community and is still the de facto standard used today.
The published genome in both cases contained the so-called draft sequence of the human
genome, which means that it contained several gaps and sequencing mistakes. On
average, the whole genome was read at 4-5 fold coverage. Later most of the gaps were
filled and the mistakes corrected at higher coverage, but this work is still in progress
even today. The official end of the HGP was announced in April 2003, with fewer gaps
and at 8-9 fold average coverage.
More details: http://en.wikipedia.org/wiki/Whole_genome_sequencing.
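The meaning of "fold coverage" and its link to the number of gaps can be sketched with an idealized calculation. It assumes that reads start at random positions and that the depth at any base follows a Poisson distribution, which ignores repeats, cloning bias and other real difficulties. The read number and read length are hypothetical, the genome size of 3×10^9 bp is the haploid figure quoted elsewhere in this book, and the function names are invented for the example.

```python
import math


def mean_coverage(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average number of times each base is read: c = N * L / G."""
    return n_reads * read_length_bp / genome_size_bp


def expected_uncovered_fraction(coverage: float) -> float:
    """Fraction of bases expected to be read zero times under a Poisson model."""
    return math.exp(-coverage)


GENOME_SIZE_BP = 3_000_000_000

# Hypothetical run: 30 million reads of 500 bp each.
c = mean_coverage(30_000_000, 500, GENOME_SIZE_BP)
print(f"mean coverage: {c:.1f}x")

for coverage in (4.5, 8.5):
    gap_fraction = expected_uncovered_fraction(coverage)
    print(f"{coverage}x coverage: about {gap_fraction * GENOME_SIZE_BP:,.0f} bp unread")
```

Under these assumptions, going from the draft coverage to the higher final coverage reduces the expected unread fraction by more than an order of magnitude, which is consistent with gaps being filled as coverage increased.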
This was announced in 2006, and at that time it seemed to be utopian, but the high
demand for low-cost sequencing has driven the development of high-throughput
sequencing (also called next-generation sequencing, or NGS) technologies that
parallelize the sequencing process, producing thousands or millions of sequences at
once. Some examples of the new techniques:
The methods were so successful that in 2007 next-generation sequencing (NGS)
became the method of the year of the journal Nature Methods. In 2007 the genome of
James Watson was sequenced with the 454 technology in 2 months and for $1 million. This
was still far from the aim, but it was a big step forward. Since then, the price has
become lower and lower, and the time shorter and shorter (Figure 8.2). E.g. in June 2009,
Illumina announced that they were launching their own Personal Full Genome
Sequencing Service at a depth of 30× for US$48,000 per genome.
In November 2009, Complete Genomics published a peer-reviewed paper in Science
demonstrating its ability to sequence a complete human genome for US$1,700. If true,
this would mean the cost of full genome sequencing has come down exponentially
within just a single year from around US$100,000 to US$50,000 and now to US$1,700.
In 2011 Complete Genomics charged approximately US$10,000 to sequence a
complete human genome (less for large orders).
In May 2011, Illumina lowered its Full Genome Sequencing service to US$5,000 per
human genome, or US$4,000 if ordering 50 or more.
In January 2012, Life Technologies introduced a sequencer to decode a human
genome in one day for $1,000.
Taking advantage of the developments in sequencing, several genome projects have
been launched, such as the 1000 Genomes Project. The 1000 Genomes Project, launched in
January 2008, is an international research effort to establish by far the most detailed
catalogue of human genetic variation. Its aim was to sequence the genomes of at least
one thousand anonymous participants from a number of different ethnic groups. In
2010, the project finished its pilot phase, which was described in detail in a paper
in Nature. Because of the rapid development of NGS, the plan is now
to sequence the genomes of 2500 persons (http://www.1000genomes.org/).
Similar projects include the Genome 10K project, which aims to assemble a genomic zoo,
a collection of DNA sequences representing the genomes of 10,000 vertebrate species,
approximately one for every vertebrate genus (http://genome10k.soe.ucsc.edu/), and
the i5k project, which plans to sequence the genomes of 5,000 insect and related
arthropod species over 5 years (http://www.arthropodgenomes.org/wiki/i5K).
Figure 8.2. Change in the price of DNA sequencing (red line) and in the amount of DNA
sequence data between 2000 and 2010, on a logarithmic scale. In 2000, the price of
sequencing 1 million base pairs of DNA was $10 thousand, which fell to $1 by 2010. The
amount of finished DNA sequence started at 8 million base pairs (bp) in 2000 and doubled
every 18 months. By 2010 it had increased to 270 billion bp. But this number is dwarfed by
the raw data that have been created and stored by researchers around the world in the Trace
Archive and the Sequence Read Archive (SRA). Here, the amount of data was 25 trillion bp in
2010, which on this scale would be 12 m high, twice the height of a giraffe.
Source: http://www.nature.com/news/2009/091021/full/464670a.html; 15/02/2013.
complex mix of factors, including the goal of achieving diversity as well as technical issues
such as the quality of the DNA libraries and availability of immortalized cell lines.”
The official HGP collected the samples in two centers, with similar criteria
(http://www.nature.com/nature/journal/v409/n6822/full/409860a0.html). It must be
added, however, that it later turned out that Celera had sequenced mainly the genome of
Craig Venter, who thus became the first named person whose genome was sequenced.
Altogether 18 countries participated in the HGP, but the USA had the main role.
There are several web pages about the results of the HGP:
http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml,
http://en.wikipedia.org/wiki/Human_Genome_Project, etc.
Gene counts
Coding genes: 20,476 (a known gene is an Ensembl gene for which at least one known
transcript has been annotated)
Non-coding genes: 22,170
Pseudogenes: 13,322 (a noncoding sequence similar to an active protein)
Gene exons: 700,947 (the part of the genomic sequence that remains in the transcript
(mRNA) after introns have been spliced out)
Gene transcripts: 194,015 (nucleotide sequence resulting from the transcription of the
genomic DNA to mRNA; one gene can have different transcripts or splice variants
resulting from the alternative splicing of different exons)
Other
Short variants (SNPs, indels, somatic mutations): 54,418,495
Structural variants: 9,235,137
Some interesting findings about the human genome, updated with new data from, e.g.,
the 1000 Genomes Project:
Largest gene: DMD, which codes for dystrophin; size: 2,224,919 bases; location:
Xp21.2
Longest coding sequence: TTN, which codes for titin; coding sequence: 104,076 bp;
34,692 amino acids
Longest exon: in TTN; 17,106 bp
Most exons: TTN; 351
20% of the genome is gene desert (a region >500 kbp without a gene)
Gene-rich chromosomes: 17, 19, 22 (the richest is chromosome 19, with 1,484 genes and
25.10 genes/Mb)
Gene-poor chromosomes: Y, 4, 13, 18, and X; the poorest is the Y, with 72 genes
and ~1.2 genes/Mb
98.12% of introns have GT at the 5' end and AG at the 3' end;
0.76% are GC-AG
Recombination is more frequent in females than in males, but the number of
mutations is higher in male meiosis, which means that the majority of
mutations originate from males.
Every newborn receives about 60 new mutations from the parents.
Every individual has 250-300 loss-of-function mutations in annotated
genes, 50-100 of which are in genes involved in Mendelian diseases. This
shows, among other things, why it is so dangerous when the parents are relatives: the
closer the kinship, the higher the probability that the child inherits two
mutated copies of the same gene, resulting in recessive diseases or even
multigenic syndromes.
46% of the human genome consists of repeats. Many of them are transposons, i.e.
jumping genes, inactivated about 40 million years ago. The most frequent repeats
are called Alu, which occupy 10.6% of the genome.
Several hundreds of human genes originate from bacteria, through horizontal
gene transfer.
There are long repeated regions in the pericentromeric and subtelomeric
regions.
At present 156 imprinted genes are known. Imprinting is a genetic phenomenon
by which certain genes are expressed in a parent-of-origin-specific manner.
Appropriate expression of imprinted genes is important for normal development,
with numerous genetic diseases associated with imprinting defects including
Beckwith–Wiedemann syndrome, Silver–Russell syndrome, Angelman syndrome
and Prader–Willi syndrome. 56% of these genes are maternally imprinted and 44% are
paternally imprinted (http://www.geneimprint.com/site/genes-by-species;
http://en.wikipedia.org/wiki/Genomic_imprinting).
There are 27,000-29,000 CpG islands.
CpG islands or CG islands are genomic regions that contain a high frequency of
CpG sites (http://en.wikipedia.org/wiki/CpG_island). The "p" in CpG refers to the
phosphodiester bond between the cytosine and the guanine, which indicates that
the C and the G are next to each other in sequence. 99% of DNA methylation
occurs at CG dinucleotides; it influences the transcription of nearby genes
and plays important roles in genetic regulation, imprinting and cell differentiation.
The methylation occurs on the cytosine. About 70% of human promoters have a
high CpG content. In the ENCODE project it was found that 96% of CpGs exhibited
differential methylation in at least one cell type or tissue assayed, and that levels of
DNA methylation correlated with chromatin accessibility. Methylation in the
promoter reduces gene expression, whereas methylation in the gene body increases it.
In stem cells 25% of the methylation occurs at CA instead of CG.
Besides methylation of the CpG islands, modifications (methylation,
acetylation, etc.) of the histone proteins in the chromosomes also play an
important role in the regulation of gene expression. To study these phenomena
the Human Epigenome Consortium was founded and the Human Epigenome
Project was launched (http://www.epigenome.org/). From these a new scientific
area has emerged, called epigenomics, or epigenetics.
Two different types of genomic region participate in the regulation
of gene expression: promoter regions, located near the genes whose transcription
they control, on the same strand and upstream, towards the 5' region of the sense
strand; and enhancer regions, which regulate the expression of distant genes. Beyond
the linear organization of genes and transcripts on chromosomes lies a more complex (and
still poorly understood) network of chromosome loops and twists through
which promoters and more distal elements, such as enhancers, can communicate
their regulatory information to each other. In the ENCODE project more than
70,000 promoter and nearly 400,000 enhancer regions were detected.
Enhancers are often cell-type specific.
Several paralogous genes have been detected. According to the definition,
paralogs are two genes or clusters of genes at different chromosomal locations in
the same organism that have structural similarities indicating that they derived
from a common ancestral gene, and have since diverged from the parent copy by
mutation and selection or drift. By contrast, orthologous genes are ones which
code for proteins with similar functions, but exist in different species, and are
created from a speciation event.
By October 2012, 13,322 pseudogenes had been detected. In contrast to
paralogs, pseudogenes are dysfunctional relatives of genes that have lost their
protein-coding ability or are otherwise no longer expressed in the cell.
Duplicated pseudogenes have intron-exon-like genomic structures and may still
maintain the upstream regulatory sequences of their parents. In contrast,
processed pseudogenes, having lost their introns, contain only exonic sequence
and do not retain the upstream regulatory regions. In the human genome,
processed pseudogenes are the most abundant type due to a burst of
retrotranspositional activity in the ancestral primates 40 million years ago.
Originally thought to be functionless, pseudogenes have been suggested to exhibit
different types of activity. Firstly, they can regulate the expression of their
parent gene by decreasing the mRNA stability of the functional gene through
their over-expression. A good example is the MYLKP1 pseudogene, which is
up-regulated in cancer cells. The transcription of MYLKP1 creates a non-coding RNA
(ncRNA) that inhibits the mRNA expression of its functional parent, MYLK.
Moreover, studies in Drosophila and mouse have shown that small interfering
RNA (siRNA) derived from processed pseudogenes can regulate gene expression
by means of the RNA-interference pathway, thus acting as endogenous siRNAs.
In addition, it has also been hypothesized that pseudogenes with high sequence
homology to their parent genes can regulate their expression through the
generation of anti-sense transcripts. Finally, pseudogenes can compete with
their parent genes for microRNA (miRNA) binding, thereby modulating the
repression of the functional gene by its cognate miRNA. According to predictions,
at least 9% of the pseudogenes present in the human genome are actively
transcribed.
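The definition of CpG islands given earlier in this list (regions with a high frequency of
CpG sites) can be made concrete with a small calculation. The sketch below computes the GC
content and the observed/expected CpG ratio of a sequence. The function name cpg_stats and
the example sequence are illustrative only, and the thresholds mentioned in the comment are
commonly used criteria that are not taken from this chapter.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence.

    Expected CpG count is (number of C * number of G) / sequence length.
    A commonly used (assumed, not from this chapter) rule of thumb calls a
    region of at least 200 bp a CpG island if GC content > 0.5 and the
    observed/expected ratio > 0.6.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")          # C immediately followed by G
    gc_content = (c + g) / n
    expected = c * g / n
    obs_exp = cpg / expected if expected else 0.0
    return gc_content, obs_exp


# Toy sequence, far shorter than a real island, used only to show the arithmetic
gc, ratio = cpg_stats("CGCGCGATCG")
print(f"GC content = {gc:.2f}, observed/expected CpG = {ratio:.2f}")
```

For the toy sequence there are 4 C, 4 G and 4 CpG sites in 10 bases, so the expected CpG
count is 1.6 and the observed count is well above it.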
There are several web pages containing information about the genomes of human
and other organisms (e.g.: http://genome.ucsc.edu/; http://www.ensembl.org/;
http://www.ncbi.nlm.nih.gov/). One important topic not detailed above
is variation in the genome. We consider it so important that
a separate subchapter is devoted to it (see below).
The mapping of the human genome did not end with the completion of the
HGP. The Genome Reference Consortium was founded, whose main task is to
close the remaining gaps. These are located in difficult-to-sequence regions, usually
repeat-rich ones. At the completion of the HGP there were about 350 gaps in the genome.
These regions are not small; they represent about 5% of the genome. Filling these gaps
is far from easy, as shown by the fact that by 2009, 6 years after the initiation of this
project, only 50 such gaps had been closed.
individuals, or there can be more copies of a gene in some people. Usually, it causes no
large phenotypic differences, but there are several diseases, where CNVs can play a
role, like Crohn’s disease, Alzheimer disease, autism, obesity, AIDS, etc.
CNVs can play a role in transplantation. If, owing to a CNV, a gene is missing in the
organ recipient but present in the donor, a graft-versus-host disease can
develop in spite of MHC identity, i.e. an immune response could develop against the gene
product.
Structural differences were also observed in concordant twins. This observation
questions the long-standing notion that monozygotic twins are essentially genetically
identical, and also shows that structural variation might originate during somatic
development.
The discovery of the abundance of CNVs also changed our view of the genomic
differences among individuals. In the first two papers about the sequence of the human
genome it was stated that the genomic difference between two individuals is 0.1%, i.e.
any two persons are 99.9% identical. At that time this received wide media coverage.
These papers attributed the differences mainly to SNPs. Later the diploid
sequences of both Craig Venter and James Watson were published. Analysis of the
diploid sequences has shown that non-SNP variation, i.e. CNVs, accounts for much
more human genetic variation than single nucleotide diversity. It is estimated that
approximately 0.4% of the genomes of unrelated people typically differ with respect to
copy number. When copy number variation is included, human-to-human genetic
variation is estimated to be at least 0.5% (99.5% similarity). Table 8.2 shows the number
of variations in some sequenced genomes. According to more recent
studies, however, between individuals whose ancestors separated long ago, the
difference can be as high as 2-3%. Large genomic rearrangements may be
responsible for this difference. Populations separated by distance tend to drift apart
genetically over time, and roughly 95% of the variability between populations is a result
of this random drift. Natural selection is responsible for some of the differences (see
Chapter 12).
Genome                        Number of SNPs
Genome of J. Craig Venter     3,213,401
Genome of James Watson        3,322,093
Asian genome                  3,074,097
Yoruban (African) genome      4,139,196

Structural variations in Venter’s genome
Type                   n          Length (bp)
CNV                    62         8,855–1,925,949
Insertion/deletion     851,575    1–82,711
Block substitution     53,823     2–206
Inversion              90         7–670,345
Table 8.2. Number of variations in some sequenced genomes
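As a rough consistency check, the SNP counts in Table 8.2 can be compared with the 0.1%
figure quoted in the text. The sketch below is illustrative only: it assumes a haploid
genome length of about 3.1 billion bp (a round value not given in this chapter), and it
treats the number of SNPs found in one genome as a proxy for the difference between two
individuals.

```python
# Assumed haploid human genome length in base pairs (round figure, not from the text)
GENOME_BP = 3.1e9

# SNP counts taken from Table 8.2
SNP_COUNTS = {
    "J. Craig Venter": 3_213_401,
    "James Watson": 3_322_093,
    "Asian genome": 3_074_097,
    "Yoruban (African) genome": 4_139_196,
}


def snp_percent(n_snps, genome_bp=GENOME_BP):
    """Percentage of genome positions that carry a SNP."""
    return 100.0 * n_snps / genome_bp


for name, n_snps in SNP_COUNTS.items():
    print(f"{name}: {snp_percent(n_snps):.2f}% of positions")
```

Under these assumptions every genome in the table comes out at roughly 0.1%, with the
African genome somewhat higher than the others.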
There was a large change in our view of the development of the modern human
genome. In a Science paper published in May 2010, Svante Pääbo's international team
found that a small amount (1% to 4%) of the nuclear DNA of Europeans and Asians,
but not of Africans, can be traced to Neanderthals
(http://www.sciencemag.org/content/328/5979/680.full). The most likely model to
explain this was that early modern humans arose in Africa but interbred with
Neanderthals in the Middle East or Arabia before spreading into Asia and Europe, about
50,000 to 80,000 years ago. Seven months later, on 23 December, the team published in
Nature the complete nuclear genome obtained from the finger bone of a girl found in
Denisova Cave in the Altai Mountains of southern Siberia. To their surprise, the genome
was neither a Neanderthal’s nor a modern human's, yet the girl lived at the same time as
both, dating to at least 30,000 years ago and probably older than 50,000 years. Her DNA
was most like a Neanderthal's, but her people were a distinct group that had long been
separated from Neanderthals. By comparing parts of the Denisovan genome directly with the
same segments of DNA in 53 populations of living people, the team found that the Denisovans
shared 4% to 6% of their DNA with Melanesians from Papua New Guinea and the
Bougainville Islands. Those segments were not found in Neanderthals or other living
humans. The most likely scenario is that, after Neanderthal and
Denisovan populations split about 200,000 years ago, modern humans interbred with
Neanderthals as they left Africa in the past 100,000 years. Thus Neanderthals left their
mark in the genomes of living Asians and Europeans. Later, a subset of this group of
modern humans, who carried some Neanderthal DNA, headed east toward Melanesia and
interbred with the Denisovans in Asia on the way. As a result, Melanesians inherited
DNA from both Neanderthals and Denisovans, with as much as 8% of their DNA coming
from archaic people (http://en.wikipedia.org/wiki/Denisova_hominin).
Later it was shown that archaic people contributed more than half of the alleles that
code for proteins made by the human leukocyte antigen system (HLA), which helps the
immune system to recognize pathogens. Pääbo's team published the complete genome
of the Denisovan cave girl. She didn't carry B*73—and it hasn't been found in Siberia—
but she carried two other linked HLA-C variants, which occur on the same stretch of
chromosome 6. If living people have any of these variants, they almost always carry at
least two of the three variants—as did the cave girl. So even though she lacked B*73, the
researchers propose that all three variants were inherited, often in pairs, from archaic
reported that its members were able to assign biochemical functions to over 80% of the
genome.
In short, the following functional elements could be distinguished:
The 30 papers contain many interesting results; some of them are described below:
It was found that about 75% of the genome is transcribed at some point in some
cells, and that genes are highly interlaced with overlapping transcripts that are
synthesized from both DNA strands. These findings force a rethink of the definition of
a gene and of the minimum unit of heredity.
Some studies were based on the DNase I hypersensitivity (DHS) assay. DHSs are
genomic regions that are accessible to enzymatic cleavage as a result of the
displacement of nucleosomes (the basic units of chromatin) by DNA-binding proteins.
The authors identified cell-specific patterns of DNase I hypersensitive sites that
show remarkable concordance with experimentally determined and computationally
predicted binding sites of transcription factors. DNA binding of a few high-affinity
transcription factors displaces nucleosomes and creates a DHS, which in turn facilitates
the binding of further, lower-affinity factors. The results also support the idea that
transcription-factor binding can block DNA methylation, rather than the other
way around — which is highly relevant to the interpretation of disease-associated sites
of altered DNA methylation.
Beyond the linear organization of genes and transcripts on chromosomes lies a more
complex (and still poorly understood) network of chromosome loops and twists
through which promoters and more distal elements, such as enhancers, can
communicate their regulatory information to each other.
ENCODE defined 8,800 small RNA molecules and 9,600 long noncoding RNA
molecules, each of the latter at least 200 bases long. Different RNAs were found to home
in on different cell compartments, as if they had fixed addresses where they operate.
Some go to the nucleus, some to the nucleolus, and some to the cytoplasm, for example.
Earlier studies aimed at finding the genomic background of complex diseases or
traits (e.g. height) found that the majority (~93%) of disease- and trait-associated
variants lie within noncoding sequence, which complicates their functional evaluation. The
map created by ENCODE reveals that many of these disease-linked regions include
enhancers or other functional sequences. Cell type is also important. In one study,
several variants were examined that had earlier been found to be strongly associated with
systemic lupus erythematosus, a disease in which the immune system attacks the body’s own
tissues. It was noticed that the variants identified in genomic studies tended to be in
regulatory regions of the genome that were active in an immune-cell line, but not
necessarily in other types of cell.
In one cell line (K562) 127,417 promoter-centered chromatin interactions were
detected, 98% of which were intra-chromosomal (which is called gene kissing). Multi-
genic interactions could be detected in 90% of the genes including several promoter-
Comparing the mouse and human genomes, it turned out that 99% of the protein-coding
genes in the mouse have human homologs. It means that at the gene level the two
species differ from each other in about 300 genes. Thus, the mouse is suitable as a model
organism for the investigation of gene functions, for disease models in genetic diseases, etc.
The closest relative of the human species is the chimpanzee. Molecular
dating indicates that human and chimpanzee speciation occurred less than 6.3 million
years ago. However, the reduced amount of variation on the X chromosome suggests that
humans and chimpanzees were still exchanging X chromosomes 1.2 million years after
the species split. The similarities between human and chimpanzee protein sequences are
remarkable: about 50,000 amino acid differences separate us from chimpanzees. The
identification of non-alignable sequences in the two genomes, due to small-
and large-scale segmental deletions and duplications, showed that the overall
difference between the two genomes is actually ~4%.
In contrast, the common chimpanzee (Pan troglodytes) and human Y chromosomes
are very different from each other. Many of the differences between the chimpanzee
and human Y chromosomes are due to gene loss in the chimpanzee and gene gain in the
human. It was found that the chimp Y chromosome has only two-thirds as many distinct
genes or gene families as the human Y chromosome, and only 47% as many protein-
coding elements as humans.
Another interesting finding was a difference between humans and chimpanzees in the
conserved FOXP2 gene. Because mutations in this gene cause a severe speech and language
disorder in humans, it was (wrongly) named the language gene
(http://news.nationalgeographic.com/news/2001/10/1004_TVlanguagegene.html). In
contrast, Pääbo found that the FOXP2 gene is the same in modern humans and in
Neanderthals, raising the possibility that Neanderthals could speak.
Comparison of the human and Neanderthal genomes indicated that there are only 1000 to
2000 amino acid differences between the two species. The researchers found 78
protein-altering sequence changes that seem to have arisen since the divergence from
Neanderthals several hundred thousand years ago, plus a handful of other genomic
regions that show signs of positive selection in modern humans. These are linked to
sperm motility, wound healing, skin function, genetic transcription control and cognitive
development.
8.9. Literature
1. http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml 2009.
2. International Human Genome Sequencing Consortium: Initial sequencing and
analysis of the human genome. Nature 2001;409:860-921.
3. Venter JC et al. The sequence of the Human Genome. Science 2001;291:1304-51.
4. International Human Genome Sequencing Consortium: Finishing the euchromatic
sequence of the human genome Nature 431, 931 - 945 (21 October 2004)
5. http://genomics.xprize.org/
6. Rusk N, Kiermer V.Primer: Sequencing—the next generation. Nature Methods
2008;5:15.
7. http://www.genome.gov/10005107; 2009.
8. Pennisi E. 1000 Genomes Project Gives New Map Of Genetic Diversity. Science 2010;
330: 574-5.
9. http://www.epigenome.org/; 2009.
10. Redon R. et al.: Global variation in copy number in the human genome. Nature 2006;
444: 444-454.
11. Armour JA. Copy number variation and antigenic repertoire. Nat Genet.
2009;41(12):1263-4.
12. Bruder CE et al.: Phenotypically concordant and discordant monozygotic twins
display different DNA copy-number-variation profiles. Am J Hum Genet.
2008;82:763-71.
13. Ng PC et al. Genetic variation in an individual human exome. PLoS Genet. 2008 Aug
15;4(8):e1000160.
14. Reich D et al. Genetic history of an archaic hominin group from Denisova Cave in
Siberia. Nature. 2010 Dec 23;468(7327):1053-60.
15. Green RE et al. A draft sequence of the Neandertal genome. Science. 2010 May
7;328(5979):710-22.
16. Reich D et al. Denisova admixture and the first modern human dispersals into
southeast Asia and oceania. Am J Hum Genet. 2011 Oct 7;89(4):516-28.
17. Burbano HA et al. Targeted investigation of the Neandertal genome by array-based
sequence capture. Science. 2010 May 7;328(5979):723-5.
18. Gibbs W.W. (2003) "The unseen genome: gems among the junk", Scientific American,
289(5): 46-53.
19. The ENCODE Project Consortium. Identification and analysis of functional elements
in 1% of the human genome by the ENCODE pilot project. Nature 2007; 447:799-816.
20. Parker SC, Hansen L, Abaan HO, Tullius TD, Margulies EH. Local DNA Topography
Correlates with Functional Noncoding Regions of the Human Genome. Science. 2009;
324: 389 – 392.
21. Fire A, Xu S, Montgomery M, Kostas S, Driver S, Mello C (1998). "Potent and specific
genetic interference by double-stranded RNA in Caenorhabditis elegans". Nature 391
(6669): 806–11.
22. Swami M. RNA world: A new class of small RNAs Nature Reviews Genetics 2009;10,
425.
23. Waterston RH. et al. Initial sequencing and comparative analysis of the mouse
genome. Nature 2002; 420 (6915) 520 - 562.
24. Kirkness EF et al. The Dog Genome: Survey Sequencing and Comparative Analysis.
Science. 2003; 301:1898-1903
25. Krause J et al. The Derived FOXP2 Variant of Modern Humans Was Shared with
Neandertals. Current Biology 2007; 17: 1908-1912
26. ENCODE Project Consortium et al. An integrated encyclopedia of DNA elements in
the human genome. Nature. 2012 Sep 6;489(7414):57-74.
8.10. Questions
1. What is the genome?
2. What is genomics?
3. What is the difference between genetics and genomics?
4. When did the Human Genome Project start?
5. Which organizations were involved in the initialization of the HGP?
6. Give some examples of the main aims of the HGP!
7. What was the name of the sequencing method which was proposed by Craig Venter,
and what was the name of the company founded by him?
8. What is the Archon Genomics X PRIZE?
It must be noted, however, that genomic results find their way into clinical practice
only very slowly, and many things have turned out differently from what was expected in
the 1990s. Besides, genomics obeys the First Law of Technology: we invariably
overestimate the short-term impacts of new technologies and underestimate their
longer-term effects. Here are some reasons that may lie behind the above-mentioned
problems:
1. Individuals are genetically too heterogeneous, which makes personalized
therapy very difficult, although there are some positive examples, especially in
cancer therapy.
2. Even if an increased risk of a certain disease is recognized and could be
reduced by a change of lifestyle, people are usually not inclined to change
easily; e.g. everybody knows the increased risk associated with smoking,
alcoholism, drugs, or a sedentary lifestyle, but most people do not care about it.
their "phenotypes"). Heritability can change without any genetic change occurring (e.g.
when the environment starts contributing to more variation).
Any particular phenotype can be modelled as the sum of genetic and environmental
effects:
Phenotype (P) = Genotype (G) + Environment (E).
Likewise, the variance of the trait, Var(P), is the sum of the genetic and environmental
variances plus twice their covariance:
Var(P) = Var(G) + Var(E) + 2 Cov(G,E).
In a planned experiment Cov(G,E) can be controlled and held at 0. In this case,
heritability is defined as:
H² = Var(G) / Var(P).
H² is the broad-sense heritability.
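The definition can be illustrated with simulated data. The following is a minimal sketch,
not an analysis method from this chapter: the genetic and environmental standard deviations
are arbitrary assumptions, and G and E are drawn independently so that Cov(G,E) is
approximately 0. The second part keeps the genetic values unchanged and only increases the
environmental variance, which lowers the heritability.

```python
import random
import statistics


def broad_sense_heritability(genetic, phenotypic):
    """Broad-sense heritability: Var(G) / Var(P), using population variances."""
    return statistics.pvariance(genetic) / statistics.pvariance(phenotypic)


random.seed(0)  # fixed seed so that the sketch is reproducible
n = 20000

# Hypothetical trait: genetic values with SD 4, environmental deviations with SD 2
g = [random.gauss(0, 4) for _ in range(n)]
e = [random.gauss(0, 2) for _ in range(n)]
p = [gi + ei for gi, ei in zip(g, e)]
h2 = broad_sense_heritability(g, p)
print(f"H2 = {h2:.2f}")  # should be close to 16 / (16 + 4) = 0.80

# Same genetic values, but a more variable environment (SD 4 instead of 2)
e_noisy = [random.gauss(0, 4) for _ in range(n)]
p_noisy = [gi + ei for gi, ei in zip(g, e_noisy)]
h2_noisy = broad_sense_heritability(g, p_noisy)
print(f"H2 = {h2_noisy:.2f}")  # should be close to 16 / (16 + 16) = 0.50
```

The drop from about 0.8 to about 0.5 occurs without any genetic change, which is why a
heritability estimate is specific to a particular population in a particular environment.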
In the scientific literature heritability is often given in percent. E.g. the heritability of
height is 80%. Heritability cannot be interpreted at an individual level; it is specific to a
particular population in a particular environment.
A prerequisite for heritability analyses is that there is some population variation to
account for. For example, in a population with no diversity in hair color, the
"heritability" of hair color would be undefined. In practice, all traits vary and almost
all traits show some heritability. In populations with varying values of a trait, the
variance could be due to the environment (hair dye, for instance) or to genetic
differences, and heritability could vary from 0 to 100%.
This last point highlights the fact that heritability cannot take into account the effect
of factors which are invariant in the population. Factors may be invariant if they are
absent and don't exist in the population (e.g. no one has access to a particular antibiotic),
or because they are omni-present (e.g. if everyone is drinking coffee).
together with the QT. Such loci e.g. are those which segregate together with
elevated cholesterol or fasting insulin level.
Let us go back to Figure 9.1. If a locus has two alleles with equal frequency, one of
which reduces the value of the trait while the other increases it, then, as depicted in
Figure 9.1A, the population can be divided into three groups with respect to the QT.
Figures 9.1B and C show the cases in which two or three loci influence the QT.
With 3 loci there can be as many as 7 different QT values in the population, each
associated with one or more genotypes. In multifactorial traits, the QT is usually
influenced by several hundred loci plus environmental factors. In these cases the
distribution is continuous, and for any measured QT value there is a huge number of
possible combinations of genotypes and environmental factors that could be responsible
for it. It means that measuring the QT gives very little information about the
genotype.
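The numbers of groups quoted above (three for one locus, seven for three loci) can be
checked by brute-force enumeration. The following is a minimal sketch, assuming a purely
additive model in which every locus has two alleles of equal frequency and equal effect
(+1 or -1 on the trait); the function name qt_classes and the effect sizes are illustrative
assumptions and are not part of the original figure.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def qt_classes(n_loci):
    """Return {QT value: population frequency} for n_loci additive biallelic loci.

    Assumption: both alleles of every locus have frequency 1/2; each
    'increasing' allele adds +1 and each 'decreasing' allele adds -1.
    """
    # Per-locus genotype = number of increasing alleles (0, 1 or 2),
    # with Hardy-Weinberg frequencies 1/4, 1/2, 1/4.
    locus = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
    classes = Counter()
    for genotype in product(locus, repeat=n_loci):
        freq = Fraction(1)
        for g in genotype:
            freq *= locus[g]
        # g increasing and (2 - g) decreasing alleles contribute 2g - 2
        value = sum(2 * g - 2 for g in genotype)
        classes[value] += freq
    return dict(classes)


for n in (1, 2, 3):
    classes = qt_classes(n)
    print(f"{n} loci: {3 ** n} genotypes, {len(classes)} distinct QT values")
```

With three loci, 27 genotypes collapse into 7 distinct trait values, so even in this
idealized case a measured QT value does not identify the genotype.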
In Table 9.1 there are some characteristics which make the determination of the
genetic background of the multifactorial diseases difficult.
Figure 9.1. A. If a locus has two alleles with equal frequency, one of which reduces the
value of the trait while the other increases it, then the population can be
divided into three groups with respect to the QT. B, C. Two and three loci influence the
QT, respectively. In Figure 9.1C, with 3 loci, there are already 7 different QT values,
each associated with one or more genotypes. D. In multifactorial traits, the QT is usually
influenced by several hundred loci plus environmental factors. In these cases the
distribution is continuous. Source: http://www.ncbi.nlm.nih.gov/books/NBK7564/;
http://www.ncbi.nlm.nih.gov/books/NBK7564/figure/A2478/?report=objectonly;
31/05/2013.
Problem: Explanation
Genetic heterogeneity: Different allelic combinations lead to similar phenotypes.
Phenocopy: Environmental factors lead to the same clinical phenotype as do the genetic
factors. In other words, the environmental condition mimics the phenotype produced
by a gene.
Pleiotropy: The genetic variation can lead to different phenotypes.
Incomplete penetrance: Some individuals fail to express the trait, even though they
carry the trait-associated alleles.
The exact diagnosis is difficult: Often in complex diseases there are no standard
diagnoses. There are subtypes of the diseases that cannot be differentiated with
standard methods. The symptoms can change with time, or manifest in episodes.
Different diseases can have similar symptoms, and different diseases can occur together.
Table 9.1. Factors that make the determination of the genetic background of
complex diseases difficult
strong impact, like in the case of macular degeneration. In contrast, the determination of
the genetic background of monogenic diseases is a great success; it has been clarified for
about 4000 such diseases so far.
What can be the reason for this situation, which is often called the dark matter of
heritability? Some explanations have already been mentioned above; some additional
ones are given below.
Figure 9.2. Adapted from: Manolio TA et al. Finding the missing heritability of complex diseases.
Nature. 2009 Oct 8;461(7265):747-53 (http://www.ncbi.nlm.nih.gov/pubmed/19812666).
Rare variants can also cause another statistical problem called synthetic
association. In this case rare variants at a locus create multiple independent
association signals that are captured by common tagging SNPs, so that variants that
play no role in the given phenotype are falsely implicated.
The other problem is the lack of proper statistical methods. One problem is called
the multiple testing problem.
If 100 thousand genetic variations are measured in a GWAS, from a statistical point of
view this means that 100 thousand independent tests are carried out, and the
probabilities of false results add up. In statistics, p < 0.05 is commonly used as
a significance threshold. It means that, for a single test, the probability of a false
positive statement is 5% (on average we make a false statement 5 times in 100
independent investigations). One of the methods to correct for this is the Bonferroni
correction (see http://en.wikipedia.org/wiki/Bonferroni_correction). Here, 0.05 is divided
by the number of measurements (in this case by 100 thousand; p = 0.05/100,000 = 5×10⁻⁷).
However, the number of independent investigations depends not only on the number of
measurements, but also on several other factors, such as the number of samples, the
clinical parameters and the type of tests. Moreover, the Bonferroni correction is too
conservative, i.e. if the correction is applied, only the strongest effects can be detected. In
contrast, according to the CD/CV hypothesis complex diseases develop through
interactions between multiple genetic variants with weak effects and the environment.
In addition, as the genetic factors interact with each other, taking these interactions
into account as well would increase the number of independent questions to a very
large number. This means that the Bonferroni correction and similar methods
are not capable of detecting variants with weak effects, i.e. other methods are needed.
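The arithmetic of the multiple testing problem can be illustrated with a short sketch.
The function names are hypothetical and chosen only for this illustration; the numbers
are those of the example above (100 thousand tests, threshold 0.05). The pair count shows
how quickly the number of questions grows if pairwise interactions are tested as well.

```python
import math


def expected_false_positives(alpha, n_tests):
    """Expected number of false positives if no variant has a true effect."""
    return alpha * n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


def pairwise_tests(n_variants):
    """Number of distinct variant pairs (tests needed for pairwise interactions)."""
    return math.comb(n_variants, 2)


if __name__ == "__main__":
    n = 100_000  # number of measured variants, as in the text
    print("expected false positives, uncorrected:", expected_false_positives(0.05, n))
    print("Bonferroni threshold:", bonferroni_threshold(0.05, n))
    print("number of pairwise interaction tests:", pairwise_tests(n))
    print("threshold for pairwise tests:", bonferroni_threshold(0.05, pairwise_tests(n)))
```

With several billion pairwise tests the corrected threshold becomes so small that only
very strong effects could pass it, which is the limitation described above.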
The result is widespread chronic obesity and related health problems like diabetes
(http://en.wikipedia.org/wiki/Thrifty_gene_hypothesis).
The theory regarding the salt-conserving phenotype also belongs to the thrifty
phenotype hypothesis, and it offers an explanation of why the frequency of hypertension
is elevated, especially in the black population of the USA. Earlier it was advantageous if
salt loss was kept minimal, especially for populations living in warm climates. In those
times it was not as easy to get salt as it is today. As a normal level of sodium chloride is
essential for life, the ancestral sodium-conserving genotypes gave a selective advantage.
Today, when salt is abundant and many black people live in cooler climates, this
genetic background leads to a higher susceptibility to salt-sensitive hypertension. But this
is also true for non-black populations. The selection pressure made humans crave
salt, and now most people eat much more salt than is needed.
There are several spectacular examples of the thrifty gene hypothesis. Pima Indians
live in the USA and in Mexico. Earlier these people lived a hunter-gatherer existence and
enjoyed both prosperity and good health. Shortly after the Pima Indians encountered the
Anglos and Mexicans, however, they suffered a great famine. Over the course of just a
decade or two, the Pima went from being rich and healthy to being poor and unhealthy.
Today the Pima Indians living in the USA have an unusually high rate of obesity and T2DM.
The prevalence of these diseases is above 50%. In contrast, the rate of obesity in the
Pima Indians living in Mexico is about 8%, although the genetic background of the two
populations must be very similar.
According to this hypothesis, we can say that these diseases are caused by normal
genes that have ended up in an unfavorable environment.
inflammatory diseases seems at least partly due to gradually losing contact with the
range of microbes our immune systems evolved with, way back in the Stone Age.
Only now are we seeing the consequences of this, doubtless also driven by genetic
predisposition and a range of factors in our modern lifestyle -- from different diets and
pollution to stress and inactivity. It seems that some people now have inadequately
regulated immune systems that are less able to cope with these other factors."
(http://www.sciencedaily.com/releases/2012/10/121003082734.htm). "Since the
1800s, when allergies began to be more noticed, the mix of microbes we've lived with,
and eaten, drunk and breathed in has been steadily changing. Some of this has come
through measures to combat infectious diseases that used to take such a heavy toll in
those days. In London e.g., 1 in 3 deaths was a child under 5. These changes include
clean drinking water, safe food, sanitation and sewers, and maybe overuse of antibiotics.
Whilst vital for protecting us from infectious diseases, these will also have inadvertently
altered exposure to the 'microbial friends' which inhabit the same environments."
The other associated hypothesis is the Th1 maturation hypothesis. It says that
children with an inborn Th1 maturation defect might now survive thanks to better health
care and antibiotic use, at the cost of higher asthma and allergy rates.
9.14. Literature
1. Sørensen TI et al. Childhood body mass index--genetic and familial environmental
influences assessed in a longitudinal adoption study. Int J Obes Relat Metab Disord.
1992 Sep;16(9):705-14.
2. Manolio TA et al. Finding the missing heritability of complex diseases. Nature. 2009
Oct 8;461(7265):747-53.
3. Kaati G et al. Cardiovascular and diabetes mortality determined by nutrition during
parents' and grandparents' slow growth period. Eur J Hum Genet. 2002
Nov;10(11):682-8.
4. Eldar A, Elowitz MB. Functional roles for noise in genetic circuits. Nature. 2010 Sep
9;467(7312):167-73.
5. Moore JH, Asselbergs FW, Williams SM. Bioinformatics challenges for genome-wide
association studies. Bioinformatics. 2010;26:445-55.
6. Yang J et al. Genomic inflation factors under polygenic inheritance. Eur J Hum Genet.
2011;19(7):807-12.
7. Torkamani A, Topol EJ, Schork NJ. Pathway analysis of seven common diseases
assessed by genome-wide association. Genomics. 2008 Nov;92(5):265-72.
8. Johnson RJ, Andrews P, Benner SA, Oliver W. Theodore E. Woodward award. The
evolution of obesity: insights from the mid-Miocene. Trans Am Clin Climatol Assoc.
2010;121:295-305.
9. Lev-Ran A, Porta M. Salt and hypertension: a phylogenetic perspective. Diabetes
Metab Res Rev. 2005 Mar-Apr;21(2):118-31.
10. Franceschi C et al. Inflamm-aging. An evolutionary perspective on
immunosenescence. Ann N Y Acad Sci. 2000 Jun;908:244-54.
11. Capri M et al. Human longevity within an evolutionary perspective: the peculiar
paradigm of a post-reproductive genetics. Exp Gerontol. 2008 Feb;43(2):53-60.
12. Candore G et al. Inflammation, longevity, and cardiovascular diseases: role of
polymorphisms of TLR4. Ann N Y Acad Sci. 2006 May;1067:282-7.
13. ENCODE Project Consortium et al. An integrated encyclopedia of DNA elements in
the human genome. Nature. 2012 Sep 6;489(7414):57-74.
9.15. Questions
1. What are complex diseases?
2. What features do complex diseases have?
3. Why is it important to study the genomic background of complex diseases?
4. What difficulties cause genomic results to enter practice only
slowly?
5. How can we prove that a disease has a heritable fraction?
6. What are the problems with the λ values?
7. How can the bias caused by environmental factors be mitigated?
8. What is the heritability of a trait?
9. What is a QT?
10. What are discontinuous traits?
11. What is a QTL?
12. What are the factors that make it difficult to determine the genetic background of
complex diseases?
In this chapter the main methods for the investigation of the genomic backgrounds of
the complex diseases and some related theoretical considerations will be summarized.
The basic genetic methods will not be described here.
levels of heterozygosity and low levels of population differentiation and are therefore
suitable for universal human identification purposes. Multiplex genotyping assays for
these SNPs have been developed.
where there were at least two affected siblings. These studies are also called affected
sib pair (ASP) studies, or linkage studies. Here LOD scores were calculated. The LOD
score (logarithm (base 10) of odds) is a statistical test often used for linkage analysis.
The LOD score compares the likelihood of obtaining the test data if the two loci, or the
disease phenotype and a locus are indeed linked, to the likelihood of observing the same
data purely by chance. Positive LOD scores favor the presence of linkage, whereas
negative LOD scores indicate that linkage is less likely. A LOD score greater than 3.0 is
considered evidence for linkage. A LOD score of +3 indicates 1000 to 1 odds that the
linkage being observed did not occur by chance.
On the other hand, a LOD score of less than -2.0 is considered evidence to exclude
linkage (http://en.wikipedia.org/wiki/Genetic_linkage).
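The LOD score calculation can be sketched for the simplest textbook situation. This is
only an illustration under stated assumptions, not the affected sib pair procedure itself:
it assumes phase-known, fully informative meioses, in which recombinants and
non-recombinants can be counted directly, and compares a recombination fraction theta with
free recombination (theta = 0.5). The function name `lod_score` is hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Two-point LOD score: log10 of L(theta) / L(0.5) for phase-known meioses.

    recombinants: number of recombinant meioses observed
    meioses:      total number of informative meioses
    theta:        assumed recombination fraction, 0 < theta <= 0.5
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    # log10 likelihood if the loci are linked with recombination fraction theta
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1 - theta))
    # log10 likelihood if the loci are unlinked (theta = 0.5)
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


if __name__ == "__main__":
    # Hypothetical data: 1 recombinant among 20 informative meioses
    for theta in (0.01, 0.05, 0.1, 0.2, 0.5):
        print(theta, round(lod_score(1, 20, theta), 2))
```

In practice the LOD score is evaluated over a range of theta values and the maximum is
reported; values above 3 or below -2 are then interpreted as described above.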
The method has given a lot of interesting results, but there have been several problems
with it. First, it is difficult to collect families with two affected siblings; second, the
genotyping of microsatellites is very cumbersome and expensive. Because of the
latter, the number of microsatellites in the studies was limited (usually not more than
400), thus the resolution was very low. This means that there was a high chance that
disease-associated loci which were not in linkage with any of the microsatellites were
missed. In addition, these studies could determine only genomic regions (because of the
limited number of markers), and not genes. Often these regions are large, several
megabases long, and contain several hundred genes. Therefore, additional methods
are needed to identify the genes.
10.2.2. GWAS
Presently, the most popular method for the study of the genomic background of complex
diseases and traits is GWAS (genome-wide association study), also known as
whole genome association study (WGA study or WGAS). The method became
possible when arrays and chips were developed with which first 100 thousand,
then several million SNPs could be genotyped in one measurement, and the price of one
chip became relatively low, i.e. about $100. At first only SNPs were determined;
later, when the significance of CNVs became apparent, they were included as well. The
CNVs were determined through their known linkage with SNPs. In 2007 this method
was selected as the breakthrough of the year.
There are two main companies on the market, Affymetrix and Illumina. The
Affymetrix Genome-Wide Human SNP Array 6.0 features 1.8 million genetic markers,
including more than 906,600 SNPs and more than 946,000 probes for the detection of
CNVs.
The Illumina HumanOmni5-Quad (Omni5) BeadChip can detect 4.3 million tagSNPs
selected from the International HapMap and 1000 Genomes Projects that target genetic
variation down to 1% minor allele frequency (MAF).
In GWAS the distribution (frequencies) of the variants is compared between different
populations; usually one of them is affected with the trait and the other is not. With the
development of statistical methods, however, GWAS has also become capable of studying
the genomic background of continuous traits (like fasting glucose levels or blood
pressure). In this latter case there are no separate groups.
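For a single SNP, the comparison of allele frequencies between cases and controls can be
sketched as a chi-square test on a 2×2 table of allele counts. This is a minimal
illustration, not the procedure of any particular GWAS software: the function name and the
counts are hypothetical, and real analyses usually add covariates and quality control. For
1 degree of freedom the p-value can be obtained from the complementary error function.

```python
import math


def allelic_chi_square(case_minor, case_major, ctrl_minor, ctrl_major):
    """Chi-square test (1 df) comparing allele counts in cases and controls.

    Returns (chi2, p_value).
    """
    observed = [[case_minor, case_major], [ctrl_minor, ctrl_major]]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must contain at least one allele")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: 1000 cases and 1000 controls, i.e. 2000 alleles each
    chi2, p = allelic_chi_square(600, 1400, 500, 1500)
    print("chi2 =", round(chi2, 3), "p =", p)
```

In this hypothetical example the minor allele frequency is 30% in cases and 25% in
controls. The resulting p-value would be called significant in a single test, but it is
far from a genome-wide threshold such as 5×10⁻⁷.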
GWAS offers a great opportunity for investigating the genomic background of
diseases, and it has been used by many research groups and
consortia. Because of the strict statistical conditions and the large populations
investigated, GWAS results are expected to contain only a few false positives; and because
this is a hypothesis-free method, it may reveal new aspects of the disease.
To make these important results public, a web page was established on 25 November
2008 (A Catalog of Published Genome-Wide Association Studies), and it includes only
those publications which investigate at least 100,000 SNPs in the initial stage.
Publications are organized from most to least recent date of publication, indexing from
online publication if available. Studies focusing only on candidate genes are excluded
from this catalog. Studies are identified through weekly PubMed literature searches,
daily NIH-distributed compilations of news and media reports, and occasional
comparisons with an existing database of GWAS. SNP-trait associations listed here are
limited to those with p-values < 1.0 × 10⁻⁵. On 9 November the catalog included 1394
publications and 7454 SNPs.
tagSNPs in genes or regulatory regions are genotyped, which were detected by other
studies (e.g. by GWAS or gene expression measurements).
In these studies many of the statistical difficulties are solved, and there is a greater
chance of finding the responsible genes or variants.
• Pathway Genomics analyses over 100 genetic markers to identify genetic risk
for common health conditions such as melanoma, prostate cancer and
rheumatoid arthritis.
• 23andMe sells mail order kits for SNP genotyping. The information is stored in
a user profile and used to estimate the genetic risk of the consumer for 178
diseases and conditions, as well as ancestry analysis. 23andMe utilizes a DNA
array manufactured by Illumina.
• SNPedia is a wiki that collects and shares information about the consequences of
DNA variations, and through the associated program Promethease, anyone who
has obtained DNA data about themselves (from any company) can get a free,
independent report containing risk assessments and related information.
• Knome provides full genome (98% genome) sequencing services for $4,998
for whole genome sequencing and interpretation for consumers, or $29,500
for whole genome sequencing and analysis for researchers, depending on
requirements.
There are some controversial ethical and legal issues connected with these services,
e.g.: “Genetic discrimination is discriminating on the basis of information obtained from
an individual’s genome. Genetic non-discrimination laws have been enacted in some US
states and, at the federal level, by the Genetic Information Nondiscrimination Act
(GINA). The GINA legislation prevents discrimination by health insurers and employers,
but does not apply to life insurance or long-term care insurance. Patients will need to be
educated on interpreting their results and what they should be rationally taking from
the experience. It is not only the average person who needs to be educated in the
dimensions of their own genomic sequence but also professionals, including physicians
and science journalists, who must be provided with the knowledge required to inform
and educate their patients and the public.”
Some companies also use the collected data for scientific studies, naturally only after
informed consent has been signed by the participants. 23andMe, for example, has already
published a paper about the genomic background of Parkinson disease
(http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1002141).
The experiences are usually positive. Earlier it was generally believed that if an
individual learns that he/she has a greater risk of a certain disease, he/she will
become depressed or it will adversely affect his/her mental state. But according to
surveys no such effects have been observed. Possibly, it is similar to the case when
somebody is a smoker or obese, and thus has a greater chance of developing several serious
illnesses. On the contrary, most participants changed their lifestyle to avoid the disease.
It must be added, however, that these participants are probably more health-conscious
than the average person.
regions are sequenced, which is about 30 Mb, 1% of the whole genome. According to the
estimations, 85% of the mutations causing monogenic diseases are in this region. Its
disadvantage is that in complex diseases the majority of variations are outside this
region. However, the rarer variations with strong effects may be concentrated here.
With the development of NGS several methods have been worked out utilizing the
technique.
The central method of the ENCODE project was DNase-seq
(http://www.nature.com/encode/#/threads). The DNase I enzyme will preferentially
cut live chromatin preparations at sites where nearby there are specific (non-histone)
proteins. The resulting cut points are then sequenced using high-throughput sequencing
to determine those sites ‘hypersensitive’ to DNase I, corresponding to open chromatin.
The cell-specific patterns of DNase I hypersensitive sites show remarkable concordance
with experimentally determined and computationally predicted binding sites of
transcription factors and enhancers.
ChIP-seq: Chromatin immunoprecipitation followed by sequencing. Specific regions
of crosslinked chromatin, which is genomic DNA in complex with its bound proteins, are
selected by using an antibody to a specific epitope. The enriched sample is then
subjected to high-throughput sequencing to determine the regions in the genome
most often bound by the protein to which the antibody was directed. Most often
used are antibodies to any chromatin-associated epitope, including transcription factors,
chromatin binding proteins and specific chemical modifications on histone proteins.
The development of NGS also accelerated and improved the epigenetic studies. The
methylation of the DNA most often occurs at CG dinucleotides on cytosine. Usually, it is
detected by methylation-specific PCR and comparative sequencing. The method is based on
a chemical reaction of sodium bisulfite (NaHSO3) with DNA that converts unmethylated
cytosines to uracil, followed by PCR. However, methylated cytosines will not be
converted in this process, and primers are designed to overlap the CpG site of interest,
which allows one to determine methylation status as methylated or unmethylated. The
samples can also be sequenced on an NGS platform. The sequences obtained are then
re-aligned to the reference genome to determine the methylation states of CpG dinucleotides
based on mismatches resulting from the conversion of unmethylated cytosines into uracil.
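The logic of reading methylation states from bisulfite-converted sequences can be sketched
as follows. This is a toy illustration with hypothetical function names and a made-up
sequence: it assumes a single read that aligns to the reference without gaps and on the
same strand, and it uses the fact that uracil is read as thymine (T) after PCR
amplification.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Simulate bisulfite treatment followed by PCR.

    Unmethylated C is converted (via uracil) to T; methylated C stays C.
    methylated_positions: set of 0-based indices of methylated cytosines.
    """
    return "".join(
        "T" if base == "C" and index not in methylated_positions else base
        for index, base in enumerate(sequence)
    )


def call_cpg_methylation(reference, read):
    """Return {position: True/False} for each CpG cytosine in the reference.

    True means the read still shows C (methylated),
    False means the read shows T (unmethylated, converted).
    """
    if len(reference) != len(read):
        raise ValueError("this sketch assumes a gap-free alignment of equal length")
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i:i + 2] == "CG":
            if read[i] == "C":
                calls[i] = True
            elif read[i] == "T":
                calls[i] = False
    return calls


if __name__ == "__main__":
    reference = "ACGTTCGACCGA"   # CpG cytosines at positions 1, 5 and 9
    read = bisulfite_convert(reference, methylated_positions={1, 9})
    print(read)
    print(call_cpg_methylation(reference, read))
```

Real bisulfite sequencing pipelines must additionally handle both strands, sequencing
errors and incomplete conversion, which are ignored here.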
• There are different mouse strains, which differ from each other in disease
susceptibilities or other phenotypes. These can be used as animal models, or
through crosses we can study the connection between the segregation of genetic
markers and phenotypes. These animals can be kept in strictly controlled
environments, so the effects of the environment can be studied more easily.
• At the gene level humans are not so different from other animals. The essential
genes are usually the same; we differ from the mouse in only about 300 genes. But
species like Drosophila melanogaster (fruit fly) are also widely used, and a lot of
pathways (like the Hippo pathway, which is conserved and plays an important role in
organ size control) were first discovered in this animal.
Two Nobel prizes have already been awarded for studies on this species
(http://en.wikipedia.org/wiki/Drosophila_melanogaster).
• Many experiments cannot be carried out in humans for ethical reasons,
only in animals.
• It is much easier to obtain tissues from animals (lung, brain, etc.) and to measure
gene expression, etc. The diagnoses of diseases are much more accurate.
• There are a lot of animal models for different human diseases.
• There is a possibility to develop genetically modified animals.
Let us see the last two points in more detail. There are two basic types of genetically
modified animals used in these studies. One of them is the knockout (KO) animal. In
KO animals researchers inactivate, or "knock out," an existing gene by replacing it or
disrupting it with an artificial piece of DNA.
Among them, mice are the most important for studying the role of genes which
have a known sequence, but whose functions have not yet been determined
(http://en.wikipedia.org/wiki/Knockout_mouse). By inactivating a specific gene
in the mouse, and observing any differences from normal behavior or
physiology, researchers can infer its probable function.
In 2006, 33 research centers in 9 countries founded the International Knockout
Mouse Consortium (IKMC), then in July 2011 the International Mouse Phenotyping
Consortium aiming to build a huge, shared resource for biomedical research. Mouse
embryonic stem cells have been produced, in which researchers have “knocked out”
each of the more than 20,000 specific mouse genes that code for proteins. By growing
mice from these cells, researchers can gain insight into the role that the missing genes
play in health and disease. The phenotyping effort will aim to probe the anatomy,
development, physiology, behavior, and disease traits of 5000 of these mouse lines by
the end of 2016
(http://news.sciencemag.org/scienceinsider/2011/09/the-consortium-that-will-launch-.html).
Knocking out genes can also be used to create animal models of different diseases (Table
10.1). In these animals the molecular pathomechanisms or different therapies can be
studied.
The other type of genetically modified animal is called transgenic. Here, genes are
over-expressed, or new genetic information is inserted into the mouse
genome. These animals can be used for the same purposes as the KO animals.
The over-expressed genes are usually under the regulation of promoters with strong
activity. It is also possible that the promoter is active only in a certain organ or tissue.
In this way the gene will be over-expressed only in this organ. E.g. SCGB1A1 is expressed
only in the lung, but there it is highly expressed. The IL5 gene was introduced downstream
of the promoter region of this gene in a mouse strain. The mouse over-expressed IL5 in its
There are some widely used animal models for polygenic diseases, like the non-obese
diabetic (NOD) mouse, the spontaneously hypertensive rat (SHR), the Dahl salt-sensitive rat,
the New Zealand Obese (NZO) mouse, etc.
10.4. Literature
1. International HapMap 3 Consortium, Altshuler DM et al. Integrating common and
rare genetic variation in diverse human populations. Nature. 2010;467(7311):52-8.
2. International HapMap Consortium, Frazer KA et al. A second generation human
haplotype map of over 3.1 million SNPs. Nature. 2007;449(7164):851-61.
3. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000
cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661-78.
4. Pennisi E. 1000 Genomes Project Gives New Map Of Genetic Diversity. Science.
2010;330:574-5.
5. Wang K, Li M, Bucan M. Pathway-based approaches for analysis of genomewide
association studies. Am J Hum Genet. 2007 Dec;81(6):1278-83.
6. https://www.23andMe.com/
7. Kaye J. The regulation of direct-to-consumer genetic tests. Hum Mol Genet.
2008;17:180-3.
8. Allayee H, Ghazalpour A, Lusis AJ. Using mice to dissect genetic factors in
atherosclerosis. Arterioscler Thromb Vasc Biol. 2003 Sep 1;23(9):1501-9.
9. Mehrabian M et al. Identification of 5-lipoxygenase as a major gene contributing to
atherosclerosis susceptibility in mice. Circ Res. 2002 Jul 26;91(2):120-6.
10. Rapp JP. Genetic analysis of inherited hypertension in the rat. Physiol Rev. 2000
Jan;80(1):135-72.
11. ENCODE Project Consortium et al. An integrated encyclopedia of DNA elements in
the human genome. Nature. 2012 Sep 6;489(7414):57-74.
10.5. Questions
1. What are genetic markers? Give examples.
2. What are the advantages and disadvantages of STRs relative to SNPs?
3. What basic methods are known for the study of the genomic background of
complex diseases?
4. What are in silico and wet laboratory methods in genomics?
5. Give examples of hypothesis-driven and hypothesis-free genetic/genomic
methods.
6. What is a candidate gene association study?
7. In which genomic studies are microsatellites used?
8. What are the disadvantages of linkage studies?
9. What is the LOD score?
10. What is GWAS?
11. What are the main difficulties of GWAS?
12. What does genome-wide significance mean?
13. What is pathway analysis?
14. What is gene set enrichment analysis?
15. What is PGAS?
16. What is positional cloning?
17. What does personal genomics mean?
18. What problems can arise with DTC genomics companies?
19. What does exome sequencing mean?
20. What does DNase-seq mean?
21. What is ChIP-seq?
22. What are the advantages of microarray gene expression measurements?
23. What is the RNA-seq method?
24. What is CGH?
25. How can the methylation pattern be determined?
26. What are the advantages of using animal models for studying human diseases?
27. Which genetically modified animal models do you know, and what are they used for?
28. What are conditional transgenic animals?
29. What are the shortcomings of animal models?
30. Give examples of experimental disease models.
disease samples and 3,000 common controls) and additional samples from breast
cancer.
The WTCCC became a great success, and in 2008 the WTCCC2 was established. Here
120,000 samples were collected and several further diseases were investigated with
GWAS. In 2009 WTCCC3 was also launched.
In prospective studies the samples and data are usually collected from healthy
participants, and then their lives are tracked for decades through regular visits and
data and sample collections. The organization of these studies is much more complex, and
they are more expensive and last much longer than retrospective studies, but usually the
results are more valuable and less biased. In retrospective studies there can be several
biases. E.g. persons who died of the disease, or those with the mildest forms, who
remained unrecognized, can be underrepresented.
There are several famous prospective studies. The first such large study started in
1948 in Framingham, USA, and collected samples and data from 5209 participants. The
Framingham Heart Study has been in progress ever since, and the 3rd generation
is already involved. Most of our knowledge about the risk factors of
cardiovascular diseases has come from this study.
The UK Biobank project is even larger. It started in 2007 and aimed to collect
samples and data from 500,000 volunteers in the UK, aged 40 to 69. The
volunteers will be followed for at least 25 years thereafter. Its main aim is to investigate
the respective contributions of genetic predisposition and environmental exposure
(including nutrition, lifestyle, medications, etc.) to the development of disease.
The Avon Longitudinal Study of Parents and Children (ALSPAC) - which is also
known as Children of the 90s - is a long-term health research project. More than 14,000
mothers enrolled during pregnancy in 1991 and 1992, and the health and development
of their children have been followed in great detail ever since. The ALSPAC families have
provided a vast amount of genetic and environmental information over the years.
Around 2012 the first grandchildren were born and also included in the study. On the
web page of the study several interesting results can be read.
or in hypertension patients with low renin levels, or in obesity patients with high
waist-to-hip ratio or high leptin levels, or in atherosclerosis patients with high LDL-C or
low HDL-C, etc. In these cases we do not determine the genetic background of the diseases,
but the QTL for these QTs, and these are associated with the diseases, thus we can
determine some parts of the genetic background.
p + q = 1 and p² + 2pq + q² = 1,
where:
• Genotyping errors.
• The sampling was not random. E.g. there are a lot of relatives in the collected
population.
• The population is inbred. In the last two cases, the rate of homozygotes is increased.
• The studied allele is in a CNV (repeat) region. In this case the rate of
heterozygotes is elevated.
• The studied genotype plays a role in the disease. E.g. it triggers the disease in
homozygous form (recessive diseases). In that case the number of homozygotes is
larger than expected.
If there is a deviation from the HWE in controls, then the allele must usually be
excluded from the study, because it can lead to wrong conclusions. In cases (in
case-control studies) the situation is the same for the first four points, but for the
fifth, because the real causes are unknown, the results must be investigated further. In
both controls and cases the genotyping can be repeated with an alternative method, and the
family relationships must be studied, if possible. In genomic studies (e.g. in GWAS),
where several thousand genotypes are investigated in parallel, this is possible; there are
even programs for it.
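A deviation from the HWE is commonly checked with a chi-square goodness-of-fit test, which
compares the observed genotype counts with those expected from p² + 2pq + q² = 1. The
sketch below assumes a biallelic locus and 1 degree of freedom; the function name and the
genotype counts are hypothetical, and dedicated programs often use exact tests instead.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for Hardy-Weinberg equilibrium at a biallelic locus.

    n_aa, n_ab, n_bb: observed counts of the three genotypes.
    Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype must be observed")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of the first allele
    q = 1 - p                         # frequency of the second allele
    observed = (n_aa, n_ab, n_bb)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical sample in equilibrium (p = 0.6): counts match expectation
    print(hwe_chi_square(360, 480, 160))
    # Same allele frequency, but an excess of homozygotes (e.g. inbreeding)
    print(hwe_chi_square(420, 360, 220))
```

In the second hypothetical sample the allele frequencies are unchanged, but the excess of
homozygotes produces a large chi-square value, as expected for the sampling problems
listed above.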
The last point is the most interesting one in cases. If a genotype is significantly more
frequent in cases, then it can mean that it increases the risk of the disease; if it is
significantly rarer, then it may protect against it. Both are valuable pieces of information
regarding the genetic background of the disease, but must be confirmed with alternative
methods. In both cases it can be said that the genotype is associated with the disease
(see more in 11.1.6).
Example:
In a population genetic study two loci were genotyped, with alleles A and G,
respectively; both have a population frequency of 50%. If there is no linkage between
them, i.e. they are inherited independently, then the occurrence of the AG combination is
50%×50% = 25% in the population. If we measure 40% occurrence, it means that they
are not inherited independently, i.e. there is LD between them. But the situation is the
same if they occur together significantly more rarely (e.g. 10%) than expected.
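The numbers of this example can be turned into the usual LD measures. The sketch below is
only an illustration with a hypothetical function name: it computes the disequilibrium
coefficient D (observed haplotype frequency minus the frequency expected under
independence) and two standard normalizations of it, D' (reported here as an absolute value
between 0 and 1) and r².

```python
def ld_measures(p_ab, p_a, p_b):
    """Compute D, |D'| and r^2 for two biallelic loci.

    p_ab: observed frequency of the haplotype carrying both alleles
    p_a:  frequency of the allele at the first locus
    p_b:  frequency of the allele at the second locus
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denominator if denominator > 0 else 0.0
    return d, d_prime, r_squared


if __name__ == "__main__":
    # Values from the example: both alleles have a frequency of 50%
    for observed in (0.25, 0.40, 0.10):
        d, d_prime, r2 = ld_measures(observed, 0.5, 0.5)
        print(observed, round(d, 3), round(d_prime, 3), round(r2, 3))
```

An observed frequency of 25% gives D = 0 (no LD), whereas both 40% and 10% give the same
strength of LD with opposite signs of D.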
Many measures of LD have been proposed, though all are ultimately related to the
frequency difference between a two-marker haplotype and the frequency expected
assuming that the two markers are independent. The two commonly used measures of
linkage disequilibrium are D' and r². D' is a population genetics measure that is
related to recombination events between markers and is scaled between 0 and 1. A D'
value of 0 indicates complete linkage equilibrium, which implies frequent recombination
between the two markers and statistical independence under principles of Hardy-Weinberg
equilibrium. A D' of 1 indicates complete linkage disequilibrium, indicating no
recombination between the two markers. Alternatively, r² is the square of the
Let us go back to linkage. In genomic or genetic studies linkage can have two
meanings. The first one corresponds to the previously described meaning, i.e. the alleles
are often inherited together, and thus there is linkage between them. But linkage can also
exist between a phenotype (QT) and a marker. Here it can be hypothesized that there is a
locus (genetic variation) linked to the marker, which influences the QT.
Figure 11.1. LD (or heat) map. There are 3 regions of triangles grouping red blocks
inferring 3 haplotype blocks. The numbers above the map show the marker numbers and
names of the alleles. The numbers in the squares are r² between markers (SNPs).
Source: http://woratanti.files.wordpress.com/2009/11/picture6.jpg; 18/02/2013.
1. Direct effect: the marker allele has a direct effect on the disease.
2. Natural selection: the marker allele increases the chance of surviving the
disease.
3. Population stratification (see below).
4. Statistical error (type I error, see in Chapter 8).
5. The marker allele is in LD with the variation, which directly influences the
phenotype.
When the results are evaluated, all these points must be considered for a reliable
conclusion.
In association studies population stratification is the presence of a systematic
difference in allele frequencies between cases and controls possibly due to different
ancestry. Population stratification can be a problem because the association found could
be due to the underlying structure of the population and not a disease associated locus.
To take a classic example, a GWAS for skill with chopsticks carried out in San Francisco
might identify human leukocyte antigen A1 (HLA-A1) as an allele associated with
chopstick skill, simply because this allele is more common in people of East Asian origin.
Also, the real disease-causing locus might not be found in the study if the locus is less
prevalent in the population from which the case subjects are chosen.
For this reason, it was common in the 1990s to use family-based data, where the
effect of population stratification can easily be controlled for using methods such as the
transmission disequilibrium test. This test uses family trios (two parents and
the affected child) and investigates the over-transmission of an allele from
heterozygous parents to affected offspring. If the allele has a role in the disease, it is
transmitted to the affected offspring with greater probability than expected by chance.
For a given allele, only those parents can be included in the analysis who are
heterozygous for that allele.
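The transmission disequilibrium test described above can be sketched as follows. The statistic used here is the commonly used McNemar-type form, (b - c)^2 / (b + c), where b and c are the numbers of heterozygous parents who did and did not transmit the allele; the text itself does not give a formula, and the counts are hypothetical.

```python
def tdt_chi2(transmitted: int, not_transmitted: int) -> float:
    """McNemar-type TDT statistic (1 degree of freedom).

    transmitted: heterozygous parents who passed the allele to the affected child
    not_transmitted: heterozygous parents who passed the other allele
    """
    b, c = transmitted, not_transmitted
    return (b - c) ** 2 / (b + c)


# Hypothetical trio data: 100 heterozygous parents, 60 transmitted the allele
chi2 = tdt_chi2(60, 40)
CRITICAL_0_05 = 3.841  # chi-square critical value for 1 df at p = 0.05
print(chi2, "significant" if chi2 > CRITICAL_0_05 else "not significant")
```

Under the null hypothesis each heterozygous parent transmits the allele with a probability of 50%, so b and c are expected to be equal and the statistic to be small.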
In our globalized world several populations can live side by side (e.g. in
the USA), so it is not easy to collect case-control populations with balanced
ethnicity, and unbalanced populations can lead to both type I and type II errors. But
if the structure is known or a putative structure is found, there are a number of possible
ways to implement this structure in the association studies and thus compensate for any
population bias. Most contemporary genome-wide association studies take the view that
the problem of population stratification is manageable, and that the logistic advantages
of using unrelated cases and controls make these studies preferable to family-based
association studies. These methods can also handle the situation in which populations
have mixed at the genomic level (e.g. African Americans with Europeans). This is called
population admixture. E.g. it is estimated that the median proportion of European
admixture among African Americans in the USA is 18.5%. This phenomenon can also be
utilized by a genomic method called admixture mapping.
In association studies in most cases the associated allele is in LD with the
disease causing locus. Here, the next task is to identify the responsible locus. The first
step here is usually sequencing, but in complex diseases, where the responsible allele is
frequent, in silico methods can be used. For this, databases like dbSNP can be used, and
the linkage can be analysed by the Haploview software. If there is a suspected allele, its
possible function in the disease must be established with in vitro and in vivo methods.
Usually this is far from simple; often it is the most difficult task. If the variant is not in
the coding region of a gene, or does not change an amino acid codon, the task is even more
difficult. Besides wet-lab experiments, the results of the ENCODE project and
several prediction software tools can also help in this job.
usually 0.05. When the null hypothesis is rejected (p<0.05), the result is said to be
statistically significant.
In retrospective studies the odds ratio (OR) is used for the estimation of the risk.
This value has a direct connection with the p-value. The odds ratio is the ratio of the
odds of carrying an allele among the cases (subjects with the trait or disease) to the
odds of carrying it in the control group.
In prospective studies and in clinical trials the relative risk (RR) is used. Relative
risk is the ratio of the probability of developing the trait (disease) in the exposed
group (e.g. carriers of an allele) to the same probability in the unexposed group.
If these values are greater than 1, the risk is elevated; if they are <1, it is lower.
In addition to these values, their 95% confidence intervals (95%CI) must be given, which
depend mainly on the population size. The larger the population, the narrower the
95%CI is. For a statistically significant association both limits of the 95%CI must be
above 1 if the OR or RR is >1, and below 1 if these values are <1. E.g. OR = 2.2
(95%CI 1.3-3.9) is significant, OR = 2.2 (95%CI 0.9-4.5) is not significant. If the
p-value < 0.05, then the OR is significant.
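The relationship between sample size and the width of the confidence interval can be illustrated with a small calculation. This is a minimal sketch that assumes the usual log-based (Woolf) approximation for the 95%CI of the OR; the function names and all counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR and approximate 95% CI from a 2x2 table.

    a: cases with the allele,    b: cases without it
    c: controls with the allele, d: controls without it
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper


def relative_risk(a, b, c, d):
    """RR for a prospective design.

    a: exposed who developed the disease,   b: exposed who stayed healthy
    c: unexposed who developed the disease, d: unexposed who stayed healthy
    """
    return (a / (a + b)) / (c / (c + d))


# Hypothetical study with 100 cases and 100 controls
print(odds_ratio_ci(40, 60, 20, 80))
# Same proportions, but only 10 cases and 10 controls: the interval widens and includes 1
print(odds_ratio_ci(4, 6, 2, 8))
print(relative_risk(40, 60, 20, 80))
```

Both tables give the same OR, but only the larger study yields a 95%CI that lies entirely above 1.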
• Directional selection occurs when a certain allele has a greater fitness than
others, resulting in an increase of its frequency.
• Stabilizing selection lowers the frequency of alleles that have a deleterious
effect on the phenotype – that is, produce organisms of lower fitness.
• Purifying selection results in functional genetic features, such as protein-coding
genes or regulatory sequences, being conserved over time due to selective
pressure against deleterious variants.
• Balancing selection does not result in fixation, but maintains an allele at
intermediate frequencies in a population. E.g. heterozygote advantage when the
heterozygote state can provide an advantage in a certain environment.
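Balancing selection by heterozygote advantage can be illustrated with the standard one-locus viability selection model. The sketch below assumes hypothetical relative fitness values of 1 - s for one homozygote, 1 for the heterozygote and 1 - t for the other homozygote; under this model the allele frequency is expected to approach the equilibrium s / (s + t) instead of going to fixation or loss. The selection coefficients are illustrative and are not taken from the text.

```python
def next_freq(q: float, s: float, t: float) -> float:
    """Frequency of allele a after one generation of selection.

    Relative fitness: AA = 1 - s, Aa = 1, aa = 1 - t (hypothetical values).
    """
    p = 1.0 - q
    mean_fitness = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)
    return (p * q + q * q * (1 - t)) / mean_fitness


def simulate(q0: float, s: float, t: float, generations: int) -> float:
    """Iterate the selection model starting from allele frequency q0."""
    q = q0
    for _ in range(generations):
        q = next_freq(q, s, t)
    return q


s, t = 0.1, 0.8  # hypothetical selection coefficients
print("predicted equilibrium:", s / (s + t))
print("from a rare allele:   ", simulate(0.01, s, t, 500))
print("from a common allele: ", simulate(0.5, s, t, 500))
```

Whether the allele starts rare or common, its frequency converges to the same intermediate value, which is the behaviour described above for balancing selection.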
The human species has a special situation in the animal world. According to a theory
of Eva Jablonka the human genome is able to tolerate potentially toxic variants thanks to
clothing, tools, agriculture and other cultural innovations that allow individuals with
these variants to survive. The human genome has accumulated more than its fair share
of potentially harmful genetic changes — in protein coding regions, promoters and even
the loss of entire genes. The relaxed selection created by human culture allowed the
evolution of more diversity and complexity, but it has also made humans more reliant on
the innovations that freed them from selection.
11.2.1.2. Role of infections in formation of the genome
The different microorganisms and infections have played one of the largest roles in the
formation of the human genome. This selection factor played an important role even in the
history of modern humans (see e.g. Population history of indigenous peoples of the
Americas), and its effect can be felt even today. Think about the large epidemics like
cholera, plague, influenza, smallpox, tuberculosis, etc., which frequently decimated the
population. Individuals who contracted one of these infections often died and were not
able to pass their genome to the next generation. In contrast, there were individuals who
were resistant or survived and could pass on their genomes, which gave them greater
fitness. Today's population is descended from those who survived all these infections and
were able to pass their genome to the next generation. Naturally, not only the genome
influences how individuals respond to an infection, but several other factors as well, like
the actual physical state, other infections, age, epigenetic states, pure chance, etc.
In recent years several traces of these microorganism–human genome
interactions have been detected in the human genome. E.g.:
From an evolutionary genetic point of view, population genetics is the study of allele
frequency distribution and change under the influence of the evolutionary processes:
natural selection, genetic drift, mutation and gene flow.
• Cholera: With the discovery that cholera toxin requires normal host CFTR
proteins to function properly, it was hypothesized that carriers of mutant CFTR
genes benefited from resistance to cholera and other causes of diarrhea.
• Typhoid: Normal CFTR proteins are also essential for the entry of Salmonella
Typhi into cells, suggesting that carriers of mutant CFTR genes might be resistant
to typhoid fever. No in vivo study has yet confirmed this. In both cases, the low
level of cystic fibrosis outside of Europe, in places where both cholera and
typhoid fever are endemic, is not immediately explicable.
• Diarrhea: It has also been hypothesized that the prevalence of CF in Europe might
be connected with the development of cattle domestication. In this hypothesis,
carriers of a single mutant CFTR chromosome had some protection from diarrhea
caused by lactose intolerance, prior to the appearance of the mutations that
created lactose tolerance.
• Tuberculosis: Another possible explanation is that carriers of the gene could have
some resistance to TB.
Haemoglobin S (sickle cell trait) provides a survival advantage over people with
normal haemoglobin in regions where malaria is endemic. The trait is known to cause
significantly fewer deaths due to malaria, especially when Plasmodium falciparum is the
causative organism. This is a prime example of natural selection, evident by the fact that
the geographical distribution of the gene (for haemoglobin S) and the distribution of
malaria in Africa virtually overlap. Because of the unique survival advantage, people
with the trait increase in number as more people infected with malaria and having the
normal haemoglobin tend to succumb to the complications.
Although the precise mechanism for this phenomenon is not known, several factors
are believed to be responsible.
The sickle cell trait was found to be 50% protective against mild clinical malaria,
75% protective against admission to the hospital for malaria, and almost 90% protective
against severe or complicated malaria.
the cells, the lack of it protects individuals from the infection. In the European
population the frequency of the ∆32 mutation is very high: about 1 in 100 individuals is
homozygous for it, and thus protected from HIV-1 and AIDS. Even heterozygotes have
some advantage. Although they can be infected by the virus, AIDS
develops significantly more slowly in them (2-4 vs. 6-8 years, in untreated people).
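The homozygote frequency quoted above can be translated into allele and carrier frequencies if the population is assumed to be in Hardy-Weinberg equilibrium (HWE). This assumption and the function name are ours; the text gives only the 1/100 homozygote frequency.

```python
import math


def hwe_from_homozygote_freq(q2: float) -> dict:
    """Allele and genotype frequencies under HWE, given the frequency
    of homozygotes for the variant allele (q^2)."""
    q = math.sqrt(q2)  # variant allele frequency
    p = 1.0 - q        # frequency of the other allele
    return {
        "allele_freq": q,
        "heterozygotes": 2 * p * q,
        "non_carriers": p * p,
    }


# Homozygote frequency of 1/100, as given in the text
print(hwe_from_homozygote_freq(0.01))
```

Under this assumption the allele frequency is about 10% and roughly 18% of the population would be heterozygous carriers.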
Interestingly, the mutation occurs only in people with European ancestry. According
to research, the mutation appeared about 7000 (2900-15750) years ago. Possibly there was
an epidemic at that time in this population, in which the pathogen used the same
receptor for infection. So far, this infection has not been identified, but there are
some suspects like plague and smallpox. In 2012 CCR5 was identified as a cellular
determinant required for cytotoxic targeting of subsets of myeloid cells and T
lymphocytes by the Staphylococcus aureus leukotoxin ED (LukED). CCR5-deficient
mice are largely resistant to lethal S. aureus infection; this finding thus raised the
possibility that resistance to S. aureus leukotoxins may have influenced the selection of
the Δ32 allele.
mutations. These last two examples illustrate convergent evolution, which
means that different processes in different populations lead to similar phenotypes.
Often, traits can develop in individuals which are only side-effects of
the changes induced by natural selection. One of the reasons for this is that most of these
genes are pleiotropic: that is, they are individually involved in several different traits.
For example, EDAR regulates hair follicle density and the development of sweat glands
and teeth. In humans, selective pressures on EDAR favoring changes in body
temperature regulation and hair follicle density in response to colder climates may have
influenced tooth shape, although this trait probably does not affect population fitness.
This example shows how 'phenotypic hitchhiking' in genes under positive selection may
have substantially increased the observed number of physiological and morphological
traits differentiating modern human populations.
Bacteria can acquire mutations or genes which are advantageous for their survival
through horizontal gene transfer, e.g. genes for antibiotic resistance. In modern humans
it was shown that archaic people contributed more than half of the alleles that code for
proteins made by the human leukocyte antigen system (HLA), which helps the immune
system to recognize pathogens. Thus, it seems that archaic genomes contributed to
modern human HLA variation and fitness through horizontal gene
transfer.
11.3. Literature
1. International HapMap 3 Consortium, Altshuler DM et al. Integrating common and
rare genetic variation in diverse human populations. Nature. 2010;467(7311):52-8.
2. International HapMap Consortium, Frazer KA et al. A second generation human
haplotype map of over 3.1 million SNPs. Nature. 2007;449(7164):851-61.
3. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000
cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661-78.
4. Pennisi E. 1000 Genomes Project Gives New Map Of Genetic Diversity. Science.
2010;330:574-5.
5. Wang K, Li M, Bucan M. Pathway-based approaches for analysis of genomewide
association studies. Am J Hum Genet. 2007 Dec;81(6):1278-83.
6. https://www.23andMe.com/
7. Kaye J. The regulation of direct-to-consumer genetic tests. Hum Mol Genet.
2008;17:180-3.
8. Allayee H, Ghazalpour A, Lusis AJ. Using mice to dissect genetic factors in
atherosclerosis. Arterioscler Thromb Vasc Biol. 2003 Sep 1;23(9):1501-9.
9. Mehrabian M et al. Identification of 5-lipoxygenase as a major gene contributing to
atherosclerosis susceptibility in mice. Circ Res. 2002 Jul 26;91(2):120-6.
10. Rapp JP. Genetic analysis of inherited hypertension in the rat. Physiol Rev. 2000
Jan;80(1):135-72.
11. ENCODE Project Consortium et al. An integrated encyclopedia of DNA elements in
the human genome. Nature. 2012 Sep 6;489(7414):57-74.
11.4. Questions
1. What is population genetics?
2. What types of sample collection methods do you know? Give some examples!
3. What strategies do you know for the selection of patients in retrospective studies?
What are the advantages and disadvantages?
4. What does endophenotype mean?
5. What is HWE?
6. What can be the causes for the deviation from the HWE?
7. What does linkage disequilibrium mean?
8. What measures for LD do you know?
9. What does haplotype mean?
10. What is cM?
11. What can be the causes for genetic linkage?
12. What are the founder populations and how can they be used in genetic studies?
13. In population genetic studies what can be the cause of an association?
14. What does population stratification mean?
15. What methods do you know for the control of the problem of population
stratification?
16. What does population admixture mean in population genetics?
17. What values can be used for the estimation of risk in association studies?
18. What is evolutionary genetics?
19. What selection processes contributed to the development of the human genome?
20. Give examples for the microorganism–human genome interactions!
21. What is genetic drift?
22. What is the bottleneck effect?
23. Why are some lethal mutations frequent?
24. Why is ∆F508 mutation frequent?
25. Why is sickle cell anemia frequent in certain populations?
26. What mutation is protective against AIDS?
27. Give examples for effects forming the genome!
28. Give examples for the selection pressure of the sunlight today!
29. Give an example for the selection pressure of the available food!
30. Give an example for the selection pressure of the life in high altitude!
31. What do you know about the genetic background of lactose intolerance?
32. What does convergent evolution mean?
33. What can be the reason that sometimes traits may appear which have nothing to do
with the selection pressure?
34. What role did horizontal gene transfer play in the development of the human
genome?
In the previous chapter some examples were shown how environmental factors have
played roles in the formation of the human genome. In this chapter examples will be
shown how genetic variations can influence the effect of environmental factors on their
carriers.
• Highly penetrant. A mutation is highly penetrant, when the trait it produces will
almost always be apparent in an individual carrying the allele. The mutation
significantly influences the function of a gene or its products. These are e.g. the
monogenic disease causing mutations.
• Low penetrant. A mutation has low penetrance when it only sometimes produces a
perceptible phenotype. The mutation does not significantly influence the function
of the gene or its products, or the intact gene is not essential for the normal
phenotype. But in interaction with other mutations or environmental factors its
effect may become apparent. E.g. polymorphisms can belong to this class.
Naturally, the transition between the two types of mutation is continuous. The
genomic background of an individual strongly influences his/her response to
environmental stimuli. Individuals who are hypersensitive to an environmental
stimulus usually have monogenic diseases, because in them normal amounts of
environmental factors can cause disease. An example is people with cutaneous porphyria,
who are extremely light-sensitive and develop symptoms like blisters, necrosis of the
skin and gums, itching and swelling in response to normal sunlight. In these people
deficiency in the enzymes of the porphyrin pathway leads to insufficient production of
haeme. But among people without deleterious, highly penetrant mutations there are
light-sensitive (light-skinned) and light-resistant (dark-skinned) individuals, and their
population distribution is usually continuous and normal (Gaussian).
Factor V Leiden, caused by the Leiden mutation in the F5 gene (R506Q). In this disorder,
the Leiden variant of factor V of the coagulation system cannot be inactivated by
activated protein C. Factor V Leiden is the most common hereditary hypercoagulability
disorder amongst Eurasians. It is named after the city Leiden (Netherlands), where it
was first identified in 1994. It is very common; its carrier frequency is 6.5%.
dopamine receptor gene) A1, or the SLC6A3 (dopamine transporter gene) nine-
repeat allelic variants. Stress-induced craving was markedly higher for those carrying
both alleles, compared to those with neither, consistent with the separate biological
pathways involved (receptor, transporter). These findings provide strong support for
the possibility that the dopamine system is involved in stress-induced craving and
suggest a potential genetic risk factor for persistent smoking behavior. This pathway
plays also a role in drug abuse or alcoholism. Both allelic variants are associated with
lower brain dopaminergic function, and these basal deficits, in turn, are thought to
increase the incentive salience of drug use in the presence of triggers (e.g. stress) that
might be related to acute phase increases in dopamine levels.
Another important pathway that plays a role in smoking addiction is the
nicotine pathway. The CYP2A6 gene (19q13.2) codes for an enzyme responsible for the
degradation of nicotine. Deficiency of this enzyme is quite common (10-17.6%) and
causes reduced degradation of nicotine; people with this deficiency are less likely
to become addicted to smoking. If they smoke cigarettes, they smoke less and have a
reduced risk of cancer and emphysema. Nicotine accumulates in their bodies,
their craving is satisfied more quickly, and thus fewer toxic substances from the smoke
get into their bodies.
Several GWAS were carried out to study the genomic background of smoking. In
2008 three independent GWAS identified a SNP (rs1051730) in a nicotinic receptor
subunit gene, which was associated with both smoking and lung cancer. It was
debated which one was the real association. Then, with the help of association studies it
was verified that the genomic region (CHRNA5-CHRNA3-CHRNB4, 15q24; CHRNA =
neuronal acetylcholine receptor subunit alpha), which contains several nicotinic
receptor genes, was responsible for the association with heavy smoking, and that the link
with lung cancer is primarily mediated through the smoking-related phenotypes. When
patients with lung cancer were taken out of the population, the association remained.
The significance of the nicotinic receptors was shown by further studies, where
an additional nicotinic receptor gene cluster (CHRNA6–CHRNB3) on chromosome 8p11
was found to be associated with smoking and also with the number of cigarettes
smoked per day.
In a Hungarian study a highly significant association between ever smoking (past +
current smokers) and a specific MHC haplotype was observed. The 8.1 ancestral
haplotype occurred more frequently in the ever smokers than in the never smokers
[odds ratio: 4.97 (1.96-12.62); P = 0.001], and such associations were stronger in
women (odds ratio = 13.6) than in men (odds ratio = 2.79). An independent study in
Icelandic subjects (n = 351) yielded similar and confirmative results. Considering the
documented link between olfactory stimuli and smoking in females, and the presence of
a cluster of odorant receptor genes close to the MHC class I region, the findings implicate
a potential role of the MHC-linked olfactory receptor genes in the initiation of smoking.
Its null mutation is very frequent (39%). This deficiency in smokers is associated with
an increased risk of asthma and lung cancer. Vitamins C and E can be protective.
Eighty to ninety percent of patients with rheumatoid arthritis (RA) have certain
subtypes of HLA-DRB1: DRB1*0401, *0404, *0405, *0408, *0101, *0102, which are
called shared epitopes (HLA-(DRB1) SE). Carriers of HLA-SE have an increased
susceptibility to RA, and this also has prognostic significance. In RA, smoking is the
most important environmental risk factor. In recent years it has been discovered
that anti-CCP (anti-cyclic citrullinated peptide) auto-antibodies can be
detected in RA patients. In HLA-DRB1 SE carriers smoking can lead to the appearance of
anti-CCP antibodies. The process starts in the lung, and RA develops years afterwards.
Smoking activates the enzyme peptidylarginine deiminase, which catalyzes the conversion
of arginine to citrulline in the proteins of the lung. The cigarette smoke functions as a
local adjuvant, which leads to the production of anti-CCP. The HLA-DRB1-SE variants bind
and present citrullinated proteins especially well. Months or years later a mild
inflammation in the joints can trigger the appearance of citrullinated proteins. In
individuals who have a high anti-CCP level this can lead to the development of chronic RA.
If a smoker carries one HLA-SE allele, his/her relative risk is 6.5; in carriers of two
alleles it is 21.
A gene-gene-environment interaction can be observed in smokers who have a null
mutation in the GSTM1 gene and are HLA-SE carriers. They have a 58-fold risk of
developing RA.
The life expectancy of smokers is significantly lower than that of non-smokers.
Individuals who are carriers of C4B*Q0, an inactive variant of the complement C4B
gene in the HLA region (6p21.3), have reduced life expectancy. The population
frequency of this variant is 16% in young people (below 45), and falls to 6% in people
70-79 years of age. These findings were made in Hungarian populations and
confirmed in Icelandic ones; they showed that carriers of C4B*Q0 had a substantially
increased risk of myocardial infarction or stroke, and were thus selected out of
the healthy elderly population. This was strongly associated with smoking both in
Iceland and Hungary. The findings indicated that the C4B*Q0 genotype could be
considered a major covariate of smoking in precipitating the risk of acute myocardial
infarction and associated deaths.
The CYP1A1 gene belongs to the cytochrome P450 superfamily (CYP). Enzymes in
this group catalyze the oxidation of organic substances and they are the main
detoxifying agents in the organism. CYP1A1 degrades the toxins in the cigarette
smoke. The most frequent cancer in children is the acute lymphoid leukemia (ALL).
Children whose parents are smokers have a significantly higher risk. In one study it was
found that if the father smokes at home, the risk is 1.8-fold. Variations in the
CYP1A1 gene alone do not influence the risk of ALL, but if the children are carriers of
certain haplotypes of CYP1A1 and their fathers smoke at home, the risk is 2.8-fold;
if the father is a heavy smoker, the risk is 4.9-fold.
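The figures above can be used to illustrate how a gene-environment interaction may be quantified. The sketch below compares the observed joint risk with the value expected if the two factors acted independently, on a multiplicative and on an additive scale. It assumes that the quoted fold-risks can be treated as relative risks against the same reference group and that "do not influence the risk" corresponds to a relative risk of 1.0 for the haplotype alone; both are our simplifying assumptions, and the function names are hypothetical.

```python
def multiplicative_interaction(rr_joint, rr_gene, rr_env):
    """Ratio of the observed joint risk to the product of the single risks.
    A value above 1 suggests more than multiplicative interaction."""
    return rr_joint / (rr_gene * rr_env)


def additive_interaction(rr_joint, rr_gene, rr_env):
    """Relative excess risk due to interaction (RERI).
    A value above 0 suggests more than additive interaction."""
    return rr_joint - rr_gene - rr_env + 1.0


RR_GENE = 1.0   # CYP1A1 haplotype alone (assumed from "do not influence the risk")
RR_ENV = 1.8    # father smokes at home
RR_JOINT = 2.8  # haplotype carrier whose father smokes at home

print(multiplicative_interaction(RR_JOINT, RR_GENE, RR_ENV))
print(additive_interaction(RR_JOINT, RR_GENE, RR_ENV))
```

On both scales the joint risk exceeds what would be expected from the two factors separately, which is the pattern described as a gene-environment interaction.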
markers, and environmental information about exposure to ETS during infancy was
incorporated in the study. Three regions showed a significant increase from the baseline
LOD score (chromosome 1p between D1S1669-D1S1665 markers; 5q at D5S1505-
D5S816; and 9q at D9S910). The highest LOD score was found on chromosome 5q. In
this genomic region 3 candidate genes were found between the markers of D5S1505-
D5S816: ADRB2, IL4 and IL13. Among these the strongest candidate is the ADRB2 which
codes for the adrenergic β2 receptor, because it is expressed in the lung and binds
substances from the cigarette smoke. The receptor has a common variant: Arg16Gly,
which influences the amount of expressed receptors, and has pharmacogenetic
significance (see in Chapter 13). In another study it was found that compared with
never-smoking Gly-16 homozygotes, those ever-smokers who are Arg-16
homozygotes had a significantly increased risk of asthma (odds ratio = 7.81; 95%
confidence interval [CI]: 2.07 to 29.5). This association showed a clear dose-response
relationship with the number of cigarettes smoked.
Smoking increases the risk of atherosclerosis and T2DM (type 2 diabetes
mellitus) as well. The CYP1A1 gene has a polymorphism called MspI (T6235C). The C
allele is associated with a more inducible gene; its frequency is 10%, and it increases
the risk of atherosclerosis and T2DM and the rate of complications only in mild
smokers. In heavy smokers the risk of these diseases is so high that the weak effect of
this polymorphism could not be detected. This observation suggests that the presence of
the rare C allele of the CYP1A1 gene in smokers may enhance predisposition to severe
CAD and T2DM.
The variants of the gene APOE (E2, E3, E4) influence the susceptibility to several
diseases, like Alzheimer disease, or atherosclerosis. The variants are quite frequent and
differ from each other in their reduction potential, and affinity to lipoprotein receptors.
APOE4 has the lowest reduction potential, meaning that it is the least effective at
reducing the oxidative stress induced by smoking. In one study the highest levels of
oxLDL and the highest risk of atherosclerosis were measured in APOE4 smokers.
Carrying APOE4 is associated with a high risk of Alzheimer disease; the same is true
for smoking (OR = 4.93), but this risk is the highest in APOE4 smokers (OR = 6.56).
allele was associated with poor response, while that of the E4 with good response. This
implies among others that certain alleles can have both positive and negative effects.
Leukotrienes are inflammatory mediators generated from arachidonic acid by the
enzyme 5-lipoxygenase, coded by the ALOX5 gene on 10q11.2. Since atherosclerosis
involves arterial inflammation, one study investigated whether a repeat
polymorphism in the ALOX5 gene promoter is related to atherosclerosis and whether this
effect interacts with the dietary intake of competing 5-lipoxygenase substrates. The
ALOX5 genotypes, carotid-artery intima-media thickness, and markers of inflammation
were determined in a randomly sampled cohort of 470 healthy, middle-aged women and
men from the Los Angeles Atherosclerosis Study. Dietary arachidonic acid and marine
n-3 fatty acids were measured with the use of six 24-hour recalls of food intake. Variant
ALOX5 homozygotes (lacking the common allele) were found in 6.0 percent of the
cohort. Mean intima-media thickness adjusted for age, sex, height, and racial or ethnic
group was increased among carriers of two variant alleles, as compared with carriers of
the common (wild-type) allele. Increased dietary arachidonic acid significantly
enhanced the apparent atherogenic effect of genotype, whereas increased dietary
intake of n-3 fatty acids blunted the effect. According to this study variant ALOX5
genotypes identify a subpopulation with increased atherosclerosis. The observed diet-
gene interactions further suggest that dietary arachidonic acids promote, whereas
marine n-3 fatty acids inhibit the leukotriene-mediated inflammation that leads to
atherosclerosis in this subpopulation.
High level of homocysteine is associated with CAD (coronary atherosclerotic
disease). It contributes to damage of the endothelial wall, proliferation of smooth muscle
in the blood vessel, and to the development of atherosclerotic plaques. The enzyme
MTHFR (methylenetetrahydrofolate reductase) and the vitamin folic acid are important
players in the homocysteine metabolic pathway. A common thermolabile variant of the
MTHFR gene, C677T (Ala222Val) was associated with high homocysteine level and
increased CAD risk in people with low folate intake. The 677TT genotype was associated
with the highest risk, but it could be reduced with folate intake.
These last two examples showed that knowing certain genotypes can be
advantageous if the environmental factors (here food) can be easily changed to blunt
their harmful effects. This is utilized by most personal genomic DTC companies, which
give personal advice on the basis of the genotypes of their customers.
CD14 is part of innate immunity; its gene codes for the lipopolysaccharide receptor,
whose ligand is found on the surface of Gram-negative bacteria. With the help of Toll-like
receptor 4 (TLR4), it induces a Th1 immune response against the pathogens. In the SNP
-159C/T of the CD14 gene, the rarer T allele increases the level of transcription and of
soluble CD14, and decreases the IgE level. A French study investigated whether
different environments influenced the effect of this allelic variant on allergic rhinitis. The
CD14 -159TT genotype was associated with a 2-fold reduced risk of atopy and rhinitis.
Exposure to a farming environment in early life was associated with a similar reduced
risk of nasal allergies. When farm exposure and CD14 -159C/T were considered
together, the risk of nasal allergies and atopy was the most reduced in the subjects who
combined both an early-life exposure to a farming environment and the −159TT
genotype (OR = 0.21 meaning ~5-fold risk reduction). This study showed that a gene-by-
environment interaction between CD14 -159C/T and environmental exposure in
childhood may modify the development of atopy. This polymorphism could be
considered in intervention studies that use microbial stimuli to reduce sensitization. In
a similar study TLR2 and CD14 SNPs were associated with asthma and atopic asthma,
respectively. In addition, CD14, TLR2, TLR4, and TLR9 SNPs modified associations
between country living and asthma.
The CD14 gene is a good example for the observation that the effects of a genotype
can be even opposite depending on the environmental factors. An interesting finding
was published about Karelian ethnic groups living both in Finland and Russia.
Considering the prevalence of asthma/allergic diseases, an East-West gradient has been
consistently confirmed between Western affluent countries and Eastern developing
countries, with, e.g. atopic diseases being more prevalent in Western than Eastern
Europe. Finnish Karelians (Western environment) have previously been shown to have
a higher prevalence of allergic disease than Russian Karelians (Eastern environment).
These two areas are geographically adjacent and are expected to have similar outdoor
air pollution. The Karelians were one ethnic group until they were artificially divided by
a new Finnish/Russian border during the Second World War, after which changes have
occurred mainly on the Russian side of the border with an influx of white Russians.
However, the genetic make-up of the two populations should still be similar and, indeed,
any differences can be readily detected as allele distribution differences between the
populations on each side of the border. The major differences between Finnish and
Russian Karelians are therefore likely to be in the cultural, economic and lifestyle
conditions in which they have lived since their separation at the time of the Second
World War. The study analysed two asthma/atopy-related genes, CD14 and CC16, which were
chosen due to strong evidence of gene by environment interactions.
Opposite effects on asthma-related phenotypes were found for specific alleles of both
CD14 -159C/T and CC16 A38G in the Karelian women. Of particular interest was the
finding of paradoxical gene by environment responses for several allergy
phenotypes. For some of these, itchy rash in particular, the CD14 TT genotype
conferred the greatest risk among those in Finland, but the TT genotype was
associated with the least risk in Russians. A paradox was also found for CC16 as the
AA genotype was associated with the greatest risk of rhinitis and allergic eye symptoms
in Finnish subjects, but the least risk for these phenotypes in Russians. Gene by
environment interactions have been suggested to explain the inconsistencies in the
associations of CD14 -159C/T with atopic phenotypes and an endotoxin switch model
has been postulated, in which the CD14 promoter polymorphism changes the threshold,
at which environmental endotoxin stimulation leads to a TH2 immune response. Several
population-based and family-based studies showed that the CD14 -159C/T
polymorphism had an interactive effect with endotoxin exposure on atopic phenotypes.
The T allele of the CD14 -159C/T SNP, possibly associated with higher expression of CD14,
was protective against atopy at low endotoxin exposure but was a risk factor at high
endotoxin exposure. Russian Karelians in the Eastern
environment may have had higher levels of endotoxin exposure, relative to Finnish
Karelians in the Western environment. Indeed, recent studies found that Finnish
Karelian children had a lower prevalence of microbial antibodies and less exposure to
microbial loads in drinking water, compared with Russian Karelian children. According
to the endotoxin switch model, Russian women with the T allele of CD14 -159C/T,
potentially exposed to higher levels of endotoxin than Finnish women,
should have an increased risk for atopic phenotypes. However, the study found that in
Russian women the T allele was protective against atopic conditions, which is
inconsistent with the switch model and indicates the complexity of interactions between
environmental exposure to endotoxin and genotypes of CD14 (see Chapter 9, Hygiene
hypothesis).
Below are some examples of the new types of studies that use genomic methods
and large biobanks.
In a large study involving several populations, gene-environment interactions were
investigated in myocardial infarction (MI). Earlier, one of the most robust genetic
associations for cardiovascular disease (CVD) had been found with the chromosome 9p21 region.
In this study it was investigated whether environmental factors (nutrition) influence the
effect of variants in 9p21 on MI risk. All four SNP risk variants increased the risk of MI
by about a fifth. However, the effect of the SNPs on MI was influenced by the “prudent”
diet pattern score of the INTERHEART participants, a score that includes fresh fruit and
vegetable intake as recorded in food frequency questionnaires. (INTERHEART is a
multiethnic population of 8,114 individuals, 3,820 cases and 4,294 controls, from five
ethnicities: European, South Asian, Chinese, Latin American, and Arab.) That is, the risk
of MI in people carrying SNP risk variants was influenced by their diet. The strongest
interaction was seen with an SNP called rs2383206. Carriers of rs2383206 who ate a diet
poor in fruits and vegetables had a higher risk of MI than non-carriers with a similar
diet, whereas carriers and non-carriers who ate a fruit- and vegetable-rich diet had a
comparable MI risk. Overall, the combination of the least “prudent” diet and two copies
of the risk variant was associated with a two-fold
increase in risk for MI in the INTERHEART study. Additionally, data collected in the
FINRISK study, which characterized healthy individuals living in Finland at baseline and
then followed them to see whether they developed CVD (19,129 Finnish individuals with
1,014 incident cases of CVD), revealed a similar interaction between diet and 9p21 SNPs.
These findings suggest that the risk of CVD conferred by chromosome 9p21 SNPs may be
influenced by diet in multiple ethnic groups. Importantly, they suggest that the
deleterious effect of 9p21 SNPs on CVD might be mitigated by consuming a diet
rich in fresh fruits and vegetables.
A similar study has been carried out in connection with obesity. More than 20,000
individuals of European ancestry were involved. Altogether 12 SNPs that had previously
shown strong association with increased body mass index (BMI) were selected and
genotyped. A genetic predisposition score for each individual was calculated and their
occupational and leisure-time physical activities were assessed by using a validated self-
administered questionnaire. Then, the researchers used modeling techniques to
examine the main effects of the genetic predisposition score and its interaction with
physical activity on BMI/obesity risk and BMI change over time. The researchers found
that each additional BMI-increasing allele was associated with an increase in BMI
equivalent to 445 g in body weight for a person 1.70 m tall and that the size of this effect
was greater in inactive people than in active people. In individuals who have a physically
active lifestyle, this increase was only 379 g/allele, or 36% lower than in physically
inactive individuals, in whom the increase was 592 g/allele. Furthermore, in the total
sample each additional obesity-susceptibility allele increased the odds of obesity by
1.116-fold. However, the increase in odds per allele was 40% lower in
physically active individuals (odds ratio 1.095 per allele) than in physically inactive
individuals (odds ratio 1.158 per allele). These findings indicate that the genetic
predisposition to obesity can be reduced by approximately 40% by having a physically
active lifestyle. They also suggest that, while the whole population
benefits from increased physical activity levels, individuals who are genetically
predisposed to obesity would benefit more than genetically protected individuals.
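The percentages quoted above can be reproduced from the per-allele figures. The following Python sketch is only an illustration: the function names are hypothetical, the unweighted allele count is an assumed form of the genetic predisposition score, the example genotype is invented, and the gram-to-BMI conversion simply applies BMI = weight/height² to the 1.70 m reference person used in the text.

```python
def predisposition_score(risk_allele_counts):
    """Assumed unweighted score: total number of BMI-increasing alleles.

    Each entry is the number of risk alleles (0, 1 or 2) at one genotyped SNP.
    """
    if any(count not in (0, 1, 2) for count in risk_allele_counts):
        raise ValueError("each SNP must carry 0, 1 or 2 risk alleles")
    return sum(risk_allele_counts)


def relative_reduction(inactive, active):
    """Fraction by which the effect in active people is lower than in inactive people."""
    return (inactive - active) / inactive


def grams_to_bmi_units(grams, height_m=1.70):
    """Convert a weight difference in grams into BMI units (kg/m^2)."""
    return (grams / 1000.0) / height_m ** 2


# Per-allele weight increase in grams, as quoted in the text.
weight_reduction = relative_reduction(inactive=592, active=379)

# Per-allele odds ratios; the excess odds per allele are OR - 1.
odds_reduction = relative_reduction(inactive=1.158 - 1, active=1.095 - 1)

# Hypothetical individual genotyped at 12 SNPs (invented for illustration).
example_score = predisposition_score([1, 0, 2, 1, 1, 0, 0, 2, 1, 1, 0, 1])

print(f"weight effect reduction: {weight_reduction:.0%}")
print(f"odds effect reduction:   {odds_reduction:.0%}")
print(f"445 g at 1.70 m = {grams_to_bmi_units(445):.3f} BMI units per allele")
print(f"example score: {example_score} risk alleles")
```

Note that the 40% figure is obtained by comparing the excess odds per allele (odds ratio minus 1), not the odds ratios themselves.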
Several studies have shown that regular consumption of coffee reduces the risk of
Parkinson disease (PD). The next example describes a genomic study that used GWAS to
investigate which genomic background influences the protective effect of coffee drinking.
This study used genome-wide genotype data and lifetime caffeinated-coffee consumption
data from 1,458 persons with PD and 931 without PD. A
genome-wide association and interaction study (GWAIS) was performed, testing
each SNP's main effect plus its interaction with coffee, adjusting for sex, age, and two
principal components. Subjects were stratified as heavy or light coffee-drinkers and a
GWAS was carried out in each group. The rs4998386 SNP and the neighbouring SNPs in
the GRIN2A gene were associated with PD via heavy coffee consumption. GRIN2A encodes an
NMDA-glutamate-receptor subunit and regulates excitatory neurotransmission in the
brain. In stratified GWAS, the GRIN2A signal was present in heavy coffee-drinkers
(OR = 0.43; P = 6×10⁻⁷) but not in light coffee-drinkers. This study was a proof of
concept that inclusion of environmental factors can help identify genes that are missed
in GWAS. Both adenosine antagonists (caffeine-like) and glutamate antagonists
(GRIN2A-related) are being tested in clinical trials for treatment of PD. GRIN2A may be a
useful pharmacogenetic marker for subdividing individuals in clinical trials to determine
which medications might work best for which patients.
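The logic of the stratified analysis can be sketched with simple 2x2 tables. The Python example below is a minimal, unadjusted illustration and not the regression model used in the study: the counts are HYPOTHETICAL, chosen only so that the heavy-drinker stratum gives an odds ratio close to the reported 0.43, and the function and variable names are invented for this sketch.

```python
def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Odds ratio for PD in carriers versus non-carriers from a 2x2 table."""
    return (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)


# HYPOTHETICAL counts per coffee stratum, not data from the study.
strata = {
    "heavy": dict(case_carriers=30, case_noncarriers=470,
                  control_carriers=40, control_noncarriers=270),
    "light": dict(case_carriers=50, case_noncarriers=450,
                  control_carriers=40, control_noncarriers=360),
}

# One odds ratio per stratum, as in a stratified association analysis.
stratum_or = {name: odds_ratio(**counts) for name, counts in strata.items()}

# Ratio of the stratum-specific odds ratios: a value of 1 would mean
# no multiplicative gene-by-coffee interaction.
interaction_ratio = stratum_or["heavy"] / stratum_or["light"]

for name, value in stratum_or.items():
    print(f"{name} coffee-drinkers: OR = {value:.2f}")
print(f"interaction (heavy/light): {interaction_ratio:.2f}")
```

A signal that is present in one stratum and absent in the other, as in this toy table, is diluted when both strata are pooled, which is why a conventional GWAS can miss such a gene.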
There are individuals who, because of their genetic background, are extremely
sensitive to certain foods. For example, individuals with glucose-6-phosphate dehydrogenase
(G6PD) deficiency can develop fatal haemolytic anaemia after eating broad beans
(Vicia faba). It is an X-linked hereditary condition, thus it occurs mainly in males. The
disease is also called favism after the other name of the bean, fava bean. G6PD
deficiency is the most common human enzyme defect, being present in more than 400
million people worldwide. People of African, Middle Eastern and South Asian ancestry are
affected the most, along with people of mixed ancestry that includes any of these groups.
A side effect of this disease is that it confers protection against malaria, in particular
the form caused by Plasmodium falciparum, the most deadly form of malaria (see
http://en.wikipedia.org/wiki/Favism). Many pharmacological substances are
potentially harmful to people with G6PD deficiency. Henna has been known to cause
haemolytic crisis in G6PD-deficient infants. In these cases knowing the genetic
background of the carriers can be life-saving.
Naturally, nutrigenetic investigations are quite expensive, but this is not the main
problem. Population genetic studies give probabilities and population averages. A
given result may be valid and significant for the whole population, but not for every
individual. It can occur that an individual carries the “good genotype”, but because of
other factors (genetic and environmental), an environmental factor has the opposite,
or even a harmful, effect compared with the population average. For example, somebody
may have the advantageous SNP in the GRIN2A gene in connection with PD, but must not drink
coffee, because his/her stomach is sensitive to it and an ulcer can develop. But the more
factors are considered in the analysis, the more reliable the prediction at the individual
level can be.
from medical point of view it is especially important; thus the whole next chapter is
about this theme.
12.9. Literature
1. Laland KN, Odling-Smee J, Myles S. How culture shaped the human genome: bringing
genetics and the human sciences together. Nat Rev Genet. 2010 Feb;11(2):137-48.
2. International Human Genome Sequencing Consortium: Initial sequencing and
analysis of the human genome. Nature 2001;409:860-921.
3. Venter JC et al. The sequence of the Human Genome. Science 2001;291:1304-51.
4. Barreiro LB, Quintana-Murci L. From evolutionary genetics to human immunology:
how selection shapes host defence genes. Nat Rev Genet. 2010 Jan;11(1):17-30.
5. Fumagalli M et al. Genome-wide identification of susceptibility alleles for viral
infections through a population genetics approach. PLoS Genet. 2010 Feb
19;6(2):e1000849.
6. Szalai Cs, Czinner A, Császár A, Szabó T, Falus A: Frequency of the HIV-1 resistance
CCR5 deletion allele in Hungarian newborns. Eur J Pediatr. 1998;157(9):782.
7. Hütter G, Ganepola S. The CCR5-delta32 polymorphism as a model to study host
adaptation against infectious diseases and to develop new treatment strategies. Exp
Biol Med (Maywood). 2011 Aug 1;236(8):938-43.
8. Tishkoff SA et al. Convergent adaptation of human lactase persistence in Africa and
Europe. Nat Genet. 2007 Jan;39(1):31-40.
9. Tully G. Genotype versus phenotype: human pigmentation. Forensic Sci Int Genet.
2007 Jun;1(2):105-10.
10. Reich D et al. Denisova admixture and the first modern human dispersals into
Southeast Asia and Oceania. Am J Hum Genet. 2011 Oct 7;89(4):516-28.
11. Chambers V et al. Haemochromatosis-associated HFE genotypes in English blood
donors: age-related frequency and biochemical expression. J Hepatol. 2003
Dec;39(6):925-31.
12. Erblich J et al. Stress-induced cigarette craving: effects of the DRD2 TaqI RFLP and
SLC6A3 VNTR polymorphisms. Pharmacogenomics J. 2004;4(2):102-9.
13. Minematsu N et al. Association of CYP2A6 deletion polymorphism with smoking
habit and development of pulmonary emphysema. Thorax. 2003 Jul;58(7):623-8.
14. Stevens VL et al. Nicotinic receptor gene variants influence susceptibility to heavy
smoking. Cancer Epidemiol Biomarkers Prev. 2008 Dec;17(12):3517-25.
15. Füst G, Arason GJ, Kramer J, Szalai C et al. Genetic basis of tobacco smoking: strong
association of a specific major histocompatibility complex haplotype on chromosome
6 with smoking behavior. Int Immunol. 2004 Oct;16(10):1507-14.
16. Lundström E et al. Gene-environment interaction between the DRB1 shared epitope
and smoking in the risk of anti-citrullinated protein antibody-positive rheumatoid
arthritis: all alleles are important. Arthritis Rheum. 2009 Jun;60(6):1597-603.
17. Criswell LA et al. Smoking interacts with genetic risk factors in the development of
rheumatoid arthritis among older Caucasian women. Ann Rheum Dis. 2006
Sep;65(9):1163-7.
18. Blaskó B et al. Low complement C4B gene copy number predicts short-term
mortality after acute myocardial infarction. Int Immunol. 2008 Jan;20(1):31-7.
19. Füst György, Kramer Judit, Kiszel Petra, Blaskó Bernadette, Szalai Csaba, Gudmundur
Johann Arason, Chack Yung Yu . C4BQ0, egy génvariáns, amely jelentősen csökkenti
az esélyt az egészséges öregkor megélésére. Magyar Tudomány, 2006/3 266. o.
20. Lee KM et al. Paternal smoking, genetic polymorphisms in CYP1A1 and childhood
leukemia risk. Leuk Res. 2009 Feb;33(2):250-8.
21. Susan Colilla et al. Evidence for gene-environment interactions in a linkage study of
asthma and smoking exposure. J Allergy Clin Immunol 2003;111:840-6.
22. Wang Z et al. Association of asthma with beta(2)-adrenergic receptor gene
polymorphism and cigarette smoking. Am J Respir Crit Care Med. 2001
May;163(6):1404-9.
23. Wang XL et al. Effect of CYP1A1 MspI polymorphism on cigarette smoking related
coronary artery disease and diabetes. Atherosclerosis. 2002 Jun;162(2):391-7.
24. Talmud PJ, Hawe E, Miller GJ. Analysis of gene-environment interaction in coronary
artery disease: lipoprotein lipase and smoking as examples. Ital Heart J. 2002
Jan;3(1):6-9.
25. Kivipelto M et al. Apolipoprotein E epsilon4 magnifies lifestyle risks for dementia: a
population-based study. J Cell Mol Med. 2008 Dec;12(6B):2762-71.
26. Rusanen M et al. Midlife smoking, apolipoprotein E and risk of dementia and
Alzheimer's disease: a population-based cardiovascular risk factors, aging and
dementia study. Dement Geriatr Cogn Disord. 2010;30(3):277-84.
27. Drenos F, Kirkwood TB. Selection on alleles affecting human longevity and late-life
disease: the example of apolipoprotein E. PLoS One. 2010 Apr 2;5(4):e10022.
28. Dwyer JH et al. Arachidonate 5-lipoxygenase promoter genotype, dietary arachidonic
acid, and atherosclerosis. N Engl J Med. 2004 Jan 1;350(1):29-37.
29. Zhang G et al. Opposite gene by environment interactions in Karelia for CD14 and
CC16 single nucleotide polymorphisms and allergy. Allergy. 2009 Sep;64(9):1333-41.
30. Alam MA et al. Association of polymorphism in the thermolabile 5, 10-methylene
tetrahydrofolate reductase gene and hyperhomocysteinemia with coronary artery
disease. Mol Cell Biochem. 2008 Mar;310(1-2):111-7.
31. Bufalino A et al. Maternal polymorphisms in folic acid metabolic genes are associated
with nonsyndromic cleft lip and/or palate in the Brazilian population. Birth Defects
Res A Clin Mol Teratol. 2010 Nov;88(11):980-6.
32. Chen L et al. Alcohol intake and blood pressure: a systematic review implementing a
Mendelian randomization approach. PLoS Med. 2008 Mar 4;5(3):e52.
33. Hines LM et al. Genetic variation in alcohol dehydrogenase and the beneficial effect of
moderate alcohol consumption on myocardial infarction. N Engl J Med. 2001 Feb
22;344(8):549-55.
34. Capri M et al. Human longevity within an evolutionary perspective: the peculiar
paradigm of a post-reproductive genetics. Exp Gerontol. 2008 Feb;43(2):53-60.
35. Candore G et al. Inflammation, longevity, and cardiovascular diseases: role of
polymorphisms of TLR4. Ann N Y Acad Sci. 2006 May;1067:282-7.
36. Do R et al. The Effect of Chromosome 9p21 Variants on Cardiovascular Disease May
Be Modified by Dietary Intake: Evidence from a Case/Control and a Prospective
Study. PLoS Med. 2011;8(10):e1001106.
37. Li S et al. Physical activity attenuates the genetic predisposition to obesity in 20,000
men and women from EPIC-Norfolk prospective population study. PLoS Med. 2010
Aug 31;7(8). pii: e1000332. PubMed PMID: 20824172; PubMed Central PMCID:
PMC2930873.
38. Lu Y, Feskens EJ, Dolle ME et al. Dietary n-3 and n-6 polyunsaturated fatty acid intake
interacts with FADS1 genetic variation to affect total and HDL cholesterol
concentrations in the Doetinchem Cohort Study. Am J Clin Nutr 2010; 92:258–265.
39. Ordovás JM, Robertson R, Cléirigh EN. Gene-gene and gene-environment interactions
defining lipid-related traits. Curr Opin Lipidol. 2011 Apr;22(2):129-36.
40. Hamza TH et al. Genome-wide gene-environment study identifies glutamate receptor
gene GRIN2A as a Parkinson's disease modifier gene via interaction with coffee. PLoS
Genet. 2011 Aug;7(8):e1002237.
12.10. Questions
1. From a gene environmental point of view what does it mean that a mutation has high
or low penetrance?
2. What is the distribution of the population regarding responses to environmental
stimuli?
3. Give examples for the interactions between highly penetrant mutations and the
environment!
4. Give examples for the interactions between low penetrant mutations and the
environment!
5. What aspects can be investigated studying the smoking-genome interactions?
6. Give examples for genes playing roles in the addiction to smoking!
7. With which disease were genetic variants in the nicotinic receptors associated in
different GWAS?
8. What roles do the variants of the CYP2A6 genes play in smoking?
9. What does the association between the 8.1 ancestral haplotype of the MHC region
and smoking initiation imply?
10. Give examples for genes in which variations can influence the health of the smokers!
11. What kind of environmental factor interacts with HLA-DRB1 SE, and what can be the
consequences?
12. What consequences have been found for carriers of the C4B*Q0 variants?
13. What gene has an important role in the degradation of the toxins in the smoke? What
can be the consequences of the variations in this gene?
14. What gene has variations which influenced the risk to asthma in smokers?
15. Give some examples for the gene environmental interactions regarding the APOE
gene!
16. Which gene variations can influence the effect of consuming food rich in arachidonic
acid on intima-media thickness?
17. What food supplements would you recommend for men carrying promoter
polymorphisms in the ALOX5 gene?
18. What food supplements would you recommend for individuals carrying the
thermolabile variant of the MTHFR gene?
19. Which environmental factors can influence the effect of variations in the
CD14 gene, and how?
20. Is it possible that a genetic variation can have opposite effects in different
populations? Explain it!
21. What environmental factor can interact with the variations in the ADH3 gene, and
how?
22. With what diseases was the 9p21 chromosome region associated, and what influenced
this association, and how?
23. What non-genetic factor influenced the effect of polymorphisms associated with risk
of obesity?
24. What is GWAIS and what did it find in Parkinson disease?
pharmaceutical companies, but this is also harmful for those sick people, for whom the
drug was efficient. If the cause of the serious adverse effects could be determined, which
could be genetic, then the individuals who have a high risk for the adverse effects, could
be treated with alternative therapy.
Theoretically it is possible that in the future everybody will have a genomic profile
available to physicians, who, with the help of a decision-support system, would be
able to select the optimal drugs or therapy for their patients. This would be the ideal
case of personalized therapy.
Pharmacodynamic: Genetic variants, which are in the genes of the drug targets or
in their associated pathways. Pharmacodynamics is often summarized as the study of
what a drug does to the body, whereas pharmacokinetics is the study of what the body
does to a drug.
Idiosyncratic: Genetic variations in genes coding for proteins, which are not in the
drug target or pharmacokinetic pathways, but could influence the drug response. The
adverse effects could be caused by e.g. an enzymopathy, so that the triggering substance
cannot be processed properly in the organism and causes symptoms by accumulating or
blocking other substances to be processed.
On the web page of the FDA a table can be found with the list of FDA-approved
pharmacogenomic biomarkers in drug labels. In November 2012 there were 118
rows with drug names in this table. Most were connected to oncology (32), followed by
psychiatry (30). For cardiovascular diseases there are 8 listed pharmacogenomic
biomarkers. Among the genes, the most frequent ones belong to the CYP gene family
(60). The most frequent gene is CYP2D6 (37), followed by CYP2C19 (14). Because
the CYP gene family plays an important role in drug metabolism, this shows that
pharmacokinetic variants are overrepresented in this list.
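The extent of this overrepresentation can be made concrete by turning the counts quoted above into shares of the table. The Python sketch below assumes that every count refers to rows of the 118-row table, which the text implies but does not state explicitly; the variable names are illustrative only.

```python
TOTAL_ROWS = 118  # rows with drug names in the FDA table, November 2012

# Counts as quoted in the text; assumed to be numbers of table rows.
counts = {
    "oncology": 32,
    "psychiatry": 30,
    "cardiovascular": 8,
    "CYP family": 60,
    "CYP2D6": 37,
    "CYP2C19": 14,
}

# Share of the whole table, in percent, for each category or gene.
shares = {label: 100 * n / TOTAL_ROWS for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label:15s} {counts[label]:3d} rows  {pct:5.1f}%")
```

Under this assumption roughly half of the rows involve a CYP gene, and CYP2D6 alone accounts for almost a third.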
In the following we show only a few examples from the above-mentioned list, and
concentrate rather on the research carried out in this field. We use two
diseases, atherosclerosis and asthma, as main examples.
It must be noted that most results are genetic and not genomic, but in this area the
terms pharmacogenomics and pharmacogenetics are often used as synonyms, and we
use them in a similar way.
Warfarin activity is determined partially by genetic factors. The FDA suggests that
physicians use genetic tests to improve their initial estimate of a reasonable warfarin
dose for individual patients. Polymorphisms in two genes (VKORC1 and CYP2C9) are
particularly important.
to effective INR as opposed to VKORC1, but does shorten the time to INR >4
(International Normalized Ratio).
• VKORC1 polymorphisms explain 30% of the dose variation between patients:
particular mutations make VKORC1 less susceptible to suppression by warfarin.
There are two main haplotypes that explain 25% of variation: low-dose
haplotype group (A) and a high-dose haplotype group (B). VKORC1
polymorphisms explain why African Americans are on average relatively
resistant to warfarin (higher proportion of group B haplotypes), while Asian
Americans are generally more sensitive (higher proportion of group A
haplotypes). Group A VKORC1 polymorphisms lead to a more rapid achievement
of a therapeutic INR, but also a shorter time to reach an INR over 4, which is
associated with bleeding.
Because of the known clinical significance of CYP polymorphisms, there are CYP
chips available for the determination of the known predictor genotypes.
Suxamethonium chloride, also known as suxamethonium or succinylcholine, is a
nicotinic acetylcholine receptor agonist, used to induce muscle relaxation and short-term
paralysis, usually to facilitate tracheal intubation. The drug is hydrolysed by the
enzyme butyrylcholinesterase (also known as pseudocholinesterase or plasma
cholinesterase, encoded by the BCHE gene), which hydrolyses many different choline
esters. In 1/2500 individuals the enzyme has no activity because of mutations in both
copies of the gene, which can cause serious adverse reactions such as apnea.
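The 1/2500 figure also allows a rough estimate of how common symptom-free carriers are. The Python sketch below is not taken from the text: it assumes Hardy-Weinberg equilibrium and treats all non-functional BCHE alleles as a single allele class, and the function name is hypothetical.

```python
import math


def hardy_weinberg_from_affected(affected_frequency):
    """Estimate allele and carrier frequencies from the frequency of
    individuals with two non-functional alleles, assuming Hardy-Weinberg
    equilibrium (affected frequency = q^2)."""
    q = math.sqrt(affected_frequency)   # frequency of the non-functional allele
    p = 1.0 - q                         # frequency of the functional allele
    return {
        "q": q,
        "carriers": 2 * p * q,          # heterozygotes with one non-functional allele
        "affected": q * q,              # individuals with two non-functional alleles
    }


result = hardy_weinberg_from_affected(1 / 2500)

print(f"non-functional allele frequency: {result['q']:.3f}")
print(f"carrier frequency: {result['carriers']:.4f} "
      f"(about 1 in {1 / result['carriers']:.0f})")
```

Under these assumptions heterozygous carriers are far more frequent than individuals without enzyme activity, which is typical of recessive traits.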
Mercaptopurine (brand name Purinethol) is an immunosuppressive drug used
to treat e.g. leukemia, pediatric non-Hodgkin's lymphoma, and inflammatory bowel
disease (such as Crohn's disease and ulcerative colitis). Its metabolizing enzyme is
thiopurine methyltransferase (TPMT). The TPMT gene has three known SNPs that cause enzyme
deficiency. Patients with TPMT deficiency are much more likely to develop dangerous
myelosuppression.
Multidrug resistance protein 1 (MDR-1) belongs to the ABC-transporter
family; the name of its gene is ABCB1. It is expressed at the apical membrane of the
mucosal epithelium all along the gastrointestinal tract, at the biliary canalicular
membrane of hepatocytes and on the apical surface of cells in the proximal kidney
tubules, protecting our cells against toxic compounds, including some drugs. In the
ABCB1 3435T/C SNP, the T allele is associated with a lower expression level of the gene
and a higher rate of adverse drug reactions (Erdélyi et al., 2006).
Anthracyclines are potent cytostatic drugs, the correct dosage of which is critical to
avoid possible cardiac side effects. ABCC1 (MRP1) is expressed in the heart and takes
part in the detoxification and protection of the cells from toxic effects of xenobiotics,
including anthracyclines. Polymorphisms in this gene influence the cardiac side effects
of the anthracyclines (Semsei AF et al., 2012).
GWAS have also been carried out on this topic. The Study of the Effectiveness of Additional
Reductions in Cholesterol and Homocysteine (SEARCH) identified a SNP in the SLCO1B1
gene (SLCO1B1*5) which is associated with statin-induced myopathy in simvastatin
(Zocor) treated patients with cardiovascular diseases.
It must be noted, however, that the pharmacogenetic results in connection with the
statins are rather controversial, and thus in the FDA-approved list there are only two
items of statins with pharmacogenomic drug labels.
13.6.2. Clopidogrel
Clopidogrel is an oral, thienopyridine class antiplatelet agent used to inhibit blood clots
in coronary artery disease, peripheral vascular disease, and cerebrovascular disease. It
is marketed by Bristol-Myers Squibb and Sanofi under the trade name Plavix. The drug
works by irreversibly inhibiting a receptor called P2Y12, an adenosine diphosphate
(ADP) chemoreceptor on platelet cell membranes. Adverse effects include haemorrhage,
severe neutropenia, and thrombotic thrombocytopenic purpura (TTP). It is prescribed
for 40 million patients annually.
Clopidogrel is a pro-drug activated in the liver by cytochrome P450 enzymes,
including CYP2C19. Three to four percent of the Caucasian population is homozygous, and
24% heterozygous, for inactive variants of the gene, which are associated with a higher
rate of cardiovascular complications.
A GWAS was carried out in an Amish population, and a SNP in the CYP2C19 gene was
identified which was associated with reduced drug response and was responsible
for 12% of the variation in drug response. The traditional factors (BMI, age, cholesterol
level) were responsible for only 10% of the variation. This was later confirmed in
another study, and in a 12-year follow-up study the CYP2C19 status was the only
independent risk factor when cardiovascular death, non-fatal myocardial infarction or
coronary revascularization were used as endpoints. In another study two variants
of ABCB1 were shown to be associated with adverse drug response. The product of
this gene plays a role in the absorption of the drug. CYP2C19 has a gain-of-function allele
(CYP2C19*17) which codes for an ultra-fast metabolizing form of the enzyme. Carriers
of this allele respond better to the drug (Myburgh R et al., 2012). Presently, the FDA
recommends alternative therapies for poor responders, and in March 2010 warnings
about CYP2C19 genotypes were added to the drug label.
primary outcome of the clinical study was improvement in FEV1 (forced expiratory
volume in 1 second). In the unstratified population, the inhibitor produced a 12% to
14% improvement in FEV1. Patients homozygous for the wild-type promoter had a 15%
improvement in FEV1. In contrast, those patients homozygous for the mutant version of
the promoter had a significantly decreased FEV1 response. However, the ALOX5 core
promoter locus does not account for all patients who did not respond to ALOX5
inhibition, which suggests that there may be other gene defects in the pathway leading
to a lack of response to this form of treatment. It was suggested that patients who fail to
respond to ALOX5 inhibition are those in whom other mechanisms are responsible for
asthmatic airway obstruction.
LTC4 synthase is a membrane-bound glutathione transferase expressed only by cells
of hematopoietic origin and is a key enzyme in the synthesis of cys-LTs, converting LTA4
to LTC4. The gene encoding LTC4 synthase is located on 5q35. An adenine to cytosine
transversion has been found 444 bp upstream (-444) of the translation start site of the
LTC4 synthase gene, and the polymorphic C -444 allele was reported to occur more
commonly in patients with aspirin intolerant asthma (AIA) (Sanak et al., 1997 and
2000). A 5-fold greater expression of LTC4 synthase has been demonstrated in
individuals with AIA when compared with patients with aspirin-tolerant asthma;
furthermore, the expression of LTC4 synthase mRNA has also been shown to be higher
in blood eosinophils from asthmatic subjects compared with control subjects and was
particularly increased in eosinophils from patients with AIA. In addition, it was found
that, among subjects with asthma treated with zafirlukast (a leukotriene receptor
antagonist), those homozygous for the A allele at the -444 locus had a lower FEV1
response than those with the C/C or C/A genotype (Palmer et al., 2002).
It must be noted that these examples are research findings, and the results have not
yet entered clinical practice.
and the whole organism, it turned out that at present it is not even known whether it
will ever be a reality. In 2012 there are very few genomic results that have entered
clinical practice. Mainly mutations in the protein coding regions with strong effects can give
clinically relevant information; the effects of common SNPs are usually unpredictable
and clinically unusable.
But we are only at the beginning of this process, and considering the enormous
progress of the last decades, we can be sure that the number of usable
pharmacogenomic tests and personal therapies will grow in the future.
13.8. Literature
1. http://www.fda.gov/
2. http://www.fda.gov/downloads/AboutFDA/Transparency/Basics/UCM247465.pdf
3. http://www.fda.gov/Drugs/ScienceResearch/ResearchAreas/Pharmacogenetics/ucm083378.htm
4. ENCODE Project Consortium et al. An integrated encyclopedia of DNA elements in
the human genome. Nature. 2012 Sep 6;489(7414):57-74.
5. Erdelyi DJ, Kamory E, Zalka A, Semsei AF, Csokay B, Andrikovics H, Tordai A,
Borgulya G, Magyarosy E, Galantai I, Fekete G, Falus A, Szalai C, Kovacs GT. The role
of ABC-transporter gene polymorphisms in chemotherapy induced
immunosuppression, a retrospective study in childhood acute lymphoblastic
leukaemia. Cell Immunol. 2006 Dec;244(2):121-4.
6. Erdélyi DJ, Kámory E, Csókay B, Andrikovics H, Tordai A, Kiss C, Félné-Semsei Á,
Janszky I, Zalka A, Fekete G, Falus A, Kovács GT, Szalai C. Synergistic interaction of
ABCB1 and ABCG2 polymorphisms predicts the prevalence of toxic encephalopathy
during anticancer chemotherapy. Pharmacogenomics J. 2008 8: 321-327.
7. Semsei AF, Erdelyi DJ, Ungvari I, Csagoly E, Hegyi MZ, Kiszel PS, Lautner-Csorba O,
Szabolcs J, Masat P, Fekete G, Falus A, Szalai C, Kovacs GT. ABCC1 polymorphisms in
anthracycline induced cardiotoxicity in childhood acute lymphoblastic leukemia. Cell
Biol Int. 2011 Sep 20. [Epub ahead of print] PubMed PMID: 21929509.
8. Tan GM, Wu E, Lam YY, Yan BP. Role of warfarin pharmacogenetic testing in clinical
practice. Pharmacogenomics. 2010 Mar;11(3):439-48.
9. Gasche Y et al. Codeine intoxication associated with ultrarapid CYP2D6 metabolism.
N Engl J Med. 2004 Dec 30;351(27):2827-31.
10. http://en.wikipedia.org/wiki/Statin
11. Kajinami K, Brousseau ME, Ordovas JM, Schaefer EJ. CYP3A4 genotypes and plasma
lipoprotein levels before and after treatment with atorvastatin in primary
hypercholesterolemia. Am J Cardiol. 2004 Jan 1;93(1):104-7.
12. Kivistö KT et al. Lipid-lowering response to statins is affected by CYP3A5
polymorphism. Pharmacogenetics. 2004 Aug;14(8):523-5.
13. Mangravite LM, Wilke RA, Zhang J, Krauss RM. Pharmacogenomics of statin response.
Curr Opin Mol Ther. 2008 Dec;10(6):555-61.
14. Mangravite LM et al. Clinical implications of pharmacogenomics of statin treatment.
Pharmacogenomics J. 2006;6:360-374.
15. http://en.wikipedia.org/wiki/Clopidogrel
16. Myburgh R, Hochfeld WE, Dodgen TM, Ker J, Pepper MS. Cardiovascular
pharmacogenetics. Pharmacol Ther. 2012 Mar;133(3):280-90.
35. Tantisira, K.G. et al. TBX21: a functional variant predicts improvement in asthma
with the use of inhaled corticosteroids. Proc Natl Acad Sci U S A. 2004;101:18099-
18104.
36. Tantisira, K.G. et al. Molecular properties and pharmacogenetics of a polymorphism
of adenylyl cyclase type 9 in asthma: interaction between beta-agonist and
corticosteroid pathways. Hum Mol Genet. 2005, 14, 1671-1677.
37. Tantisira KG et al. Genomewide association between GLCCI1 and response to
glucocorticoid therapy in asthma. N Engl J Med. 2011 Sep 29;365(13):1173-83.
38. Palmer, L.J. et al. Pharmacogenetics of asthma. Am J Respir Crit Care Med. 2002;165:
861-866.
39. Distefano JK, Watanabe RM. Pharmacogenetics of Anti-Diabetes Drugs.
Pharmaceuticals (Basel). 2010 Aug 1;3(8):2610-2646.
40. Konoshita T; Genomic Disease Outcome Consortium (G-DOC) Study Investigators. Do
genetic variants of the Renin-Angiotensin system predict blood pressure response to
Renin-Angiotensin system-blocking drugs?: a systematic review of
pharmacogenomics in the Renin-Angiotensin system. Curr Hypertens Rep. 2011
Oct;13(5):356-61.
41. Manunta P et al. Physiological interaction between alpha-adducin and WNK1-
NEDD4L pathways on sodium-related blood pressure regulation. Hypertension. 2008
Aug;52(2):366-72.
42. Turner ST et al. Genomic association analysis suggests chromosome 12 locus
influencing antihypertensive response to thiazide diuretic. Hypertension. 2008
Aug;52(2):359-65.
43. Chung CM et al. A genome-wide association study identifies new loci for ACE activity:
potential implications for response to ACE inhibitor. Pharmacogenomics J. 2010
Dec;10(6):537-44.
44. Corvol JC et al. The COMT Val158Met polymorphism affects the response to
entacapone in Parkinson's disease: a randomized crossover clinical trial. Ann Neurol.
2011 Jan;69(1):111-8.
45. Arbouw ME et al. Novel insights in pharmacogenetics of drug response in
Parkinson's disease. Pharmacogenomics. 2010 Feb;11(2):127-9.
13.9. Questions
1. What are the main goals of pharmacogenomics?
2. What is the significance of pharmacogenomics?
3. How can genetic variations be used in clinical trials?
4. By what mechanisms can genetic variations influence drug response?
5. What are the difficulties of pharmacogenomic research?
6. What diseases and what gene family are overrepresented in the FDA table of
approved pharmacogenomic biomarkers in drug labels?
7. Give examples of genes influencing pharmacokinetics.
8. What can genetic variations of CYP2C9 and VKORC1 influence, and how?
9. How can CYP chips be used?
10. What can genetic variations in butyrylcholinesterase influence?
11. What gene can influence the adverse effect of mercaptopurine?
12. What roles can ABC-transporters have in pharmacology?
13. To what gene family does the gene whose genetic variations can influence the
cardiac side effects of the anthracyclines belong?
14.1. Introduction
In the previous chapters it has been pointed out that the development of genomic
methods, computers and bioinformatics has opened new possibilities for understanding
and modeling living organisms as complex systems, in ways that come closer to
reality. With the spread of high-throughput methods (microarray measurements,
next-generation sequencing, etc.) we can obtain an immense amount of data, and it is
well known that these data points are not independent, but are connected to and
interact with each other. For example, if a SNP is located in the regulatory region of a
gene, it influences not only the expression of this gene, but also the operation of those
proteins which interact with the product of the gene. Furthermore, another SNP can
influence the effect of this SNP in both positive and negative ways. In a living organism
these interactions occur on several levels, and it is now clear that if we want to interpret
the effect of a mutation or an environmental factor, we must consider these interactions.
In biology, the scientific field that focuses on complex networks of interactions within
biological systems and tries to map and interpret them is called systems biology.
According to the definition, systems biology is the study of the interactions
between the components of biological systems, and how these interactions give
rise to the function and behavior of that system.
In recent years, owing to the progress mentioned above, systems biology has
developed considerably. Below, concentrating on diseases, basic terms of systems
biology will be introduced, and some examples of the application and utilization of this
scientific field will be shown.
strength of evidence for some of these effects is still debated, by virtue of the many
interactions they have, one expects that the absence of a hub would affect the function of
an exceptional number of other proteins. This assumption has led to the hypothesis that,
in humans, hubs should typically be associated with disease genes. Indeed, one study
found that disease proteins in the OMIM Morbid Map have more protein-protein
interactions than non-disease proteins in literature-curated protein-protein
interaction databases.
Note, however, that the essential gene concept in simple organisms does not map
uniquely into disease genes in humans. Indeed, some human genes are essential in early
development, so functional changes in them often lead to first-trimester spontaneous
abortions (embryonic lethality). Mutations in such ‘essential’ genes cannot propagate in
the population, as individuals carrying them cannot reproduce. In contrast, individuals
can tolerate disease-causing mutations for a long time, often past their reproductive
age. The question is whether both disease genes and essential genes are associated with hubs. Goh et
al. found that essential genes show a strong tendency to be associated with hubs
and expressed in multiple tissues, i.e., they tend to be located at the functional center of
the interactome (Fig. 14.1). Yet, in contrast with our initial hypothesis, non-essential
disease genes do not show a tendency to encode hubs and tend to be tissue-
specific. That is, from a network perspective, these genes segregate at the functional
periphery of the interactome (Fig. 14.2). In summary, in human cells it is the essential
genes, and not the disease genes, that encode hubs. This difference can be
understood from an evolutionary perspective: mutations that disrupt hubs have
difficulty propagating in the population, as the absence of a hub creates so many
disruptions that the host may not survive long enough to reproduce. Thus, only
mutations that impair functionally or topologically peripheral genes can persist,
accounting for the family of heritable diseases, especially those that appear in adulthood
(Barabási AL et al., 2011).
Figure 14.2. Schematic illustration of the differences between essential and non-essential
disease genes. Non-essential disease genes (illustrated as blue nodes) are found to
segregate at the network periphery whereas in utero essential genes (illustrated as red
nodes) tend to be at the functional center (encode hubs, expressed in many tissues) of the
interactome. Source: Barabási AL et al., 2011; 18/02/2013.
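The notion of a hub can be made concrete by counting interaction partners. The following is a minimal sketch, not the method of the cited studies: the edge list is an invented toy interactome, and the function names and the degree threshold used to call a node a hub are assumptions chosen for illustration only.

```python
from collections import defaultdict

# Hypothetical toy interactome: each pair is one undirected
# protein-protein interaction (node names are invented).
edges = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
    ("B", "C"), ("E", "F"), ("F", "G"),
]

def degrees(edge_list):
    """Return the number of interaction partners (degree) of every node."""
    deg = defaultdict(int)
    for u, v in edge_list:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def hubs(edge_list, min_degree):
    """Return nodes whose degree reaches an (arbitrary) hub threshold."""
    return sorted(n for n, k in degrees(edge_list).items() if k >= min_degree)

deg = degrees(edges)
hub_nodes = hubs(edges, min_degree=4)
print(deg)
print(hub_nodes)
```

In this toy network only node A reaches the threshold. In a real interactome the threshold is a modelling choice, for example a fixed top fraction of the degree distribution.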
It must be added, however, that the findings mentioned above refer mainly to
monogenic diseases and highly penetrant mutations (Chapter 12). Considering low
penetrant mutations, common SNPs and complex diseases, there are several
examples of hub proteins in disease networks that are associated with many diseases. For
example, common SNPs in the TNF gene are associated with asthma, atherosclerosis,
obesity, T1DM, T2DM and Alzheimer's disease. Similarly, the β2-adrenergic receptor
(ADRB2) is also a hub protein. Its variations are associated with asthma, responses to
drugs, obesity and hypertension. PPARG also codes for a typical hub protein, since
mutations in it can cause hypertension, obesity, T2DM, and atherosclerosis.
If a gene or molecule is involved in a specific biochemical process or disease, its
direct interacting partners might also be suspected to play some role in the same
biochemical process. In line with this hypothesis, proteins involved in the same disease
show a high propensity to interact with each other. For example, Goh et al. observed 290
physical interactions between the products of genes associated with the same disorder,
representing a 10-fold increase relative to random expectation. Furthermore, it was
found that genes linked to diseases with similar phenotypes have a significantly
increased tendency to interact directly with each other. These observations indicate that
if we identify a few disease components, the other disease-related components will
likely be in their network-based vicinity. That is, we expect that each disease can be
linked to a well-defined neighbourhood of the interactome, often referred to as a
disease module (Fig. 14.3). Thus a disease module represents a group of network
components that together contribute to a cellular function whose disruption
results in a particular disease phenotype.
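The idea that further disease components lie in the network vicinity of known ones can be sketched as a simple neighbour count. This is an illustrative assumption, not the procedure of any cited paper: the gene names (K1, K2, K3, X, Y, Z, W), the interaction list and the function name are all hypothetical.

```python
def candidate_neighbours(interactions, known_genes):
    """Rank genes that are not yet disease genes by the number of
    distinct known disease genes they interact with directly."""
    known = set(known_genes)
    partners = {}
    for u, v in interactions:
        # Look at each undirected interaction from both ends.
        for gene, other in ((u, v), (v, u)):
            if gene not in known and other in known:
                partners.setdefault(gene, set()).add(other)
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(
        ((gene, len(p)) for gene, p in partners.items()),
        key=lambda item: (-item[1], item[0]),
    )

# Hypothetical toy data: K1-K3 are known disease genes.
interactions = [
    ("K1", "X"), ("K2", "X"), ("K3", "X"),
    ("K1", "Y"), ("K2", "Y"),
    ("K3", "Z"), ("Z", "W"), ("K1", "K2"),
]
ranking = candidate_neighbours(interactions, ["K1", "K2", "K3"])
print(ranking)
```

Gene W never appears in the ranking because it touches no known disease gene directly. A real analysis would also have to correct for highly connected proteins, which gain many disease-gene neighbours by chance.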
These disease modules can be identified by several biochemical and genomic
methods, even in silico on the basis of currently available data using bioinformatics
approaches. For example, Chen et al. relied on co-expression networks constructed from liver and
adipose tissues, facilitating the identification of sub-networks associated with genetic
loci linked to obesity- and diabetes-related DNA variations. The results confirmed the
connection between obesity and a macrophage-enriched metabolic subnetwork,
validating three previously unknown genes, LPL, LACTB, and PPM1L, as obesity genes in
transgenic mice.
Figure 14.4. Perturbations in biological systems and cellular networks may underlie
genotype-phenotype relationships. By interacting with each other, genes and their
products form complex cellular networks. The link between perturbations in network and
systems properties and phenotypes, such as Mendelian disorders, complex traits, and
cancer, might be as important as that between genotypes and phenotypes. There are
examples of node removal as well as edge modification.
Source: Vidal M et al., 2011; 18/02/2013.
Figure 14.6. Comorbidity between diseases linked in the HDN measured by the logarithm
of relative risk, indicating that if the disease-causing mutations affect the same gene (2nd
column), then the comorbidity is 2-times higher. If it affects the same domain of the shared
disease protein, then the comorbidity is even higher.
Source: Barabási AL et al., 2011; 18/02/2013.
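The relative risk used as a comorbidity measure in Figure 14.6 can be illustrated with a small calculation. The sketch below assumes one common definition: the observed number of patients with both diseases divided by the number expected if the two diseases occurred independently. The text does not state the exact formula, so this definition is an assumption, and the patient counts are invented for illustration.

```python
import math

def relative_risk(n_both, n_i, n_j, n_total):
    """Observed co-occurrence of two diseases divided by the
    co-occurrence expected if they were independent."""
    expected = n_i * n_j / n_total
    return n_both / expected

# Hypothetical counts, not taken from any study.
rr = relative_risk(n_both=40, n_i=500, n_j=400, n_total=10_000)
log_rr = math.log10(rr)
print(f"RR = {rr:.2f}, log10(RR) = {log_rr:.3f}")
```

With these numbers 20 shared patients would be expected by chance, so 40 observed patients give a relative risk of 2. A value above 1 (logarithm above 0) indicates that the two diseases co-occur more often than expected.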
While most efforts have focused on a single molecular or phenotypic measure to
capture disease-disease relationships (such as shared genes or metabolites), a
comprehensive understanding requires us to inspect multiple sources of evidence, from
shared genes to protein-protein interaction based relationships, shared environmental
factors, common treatments, affected tissues and organs, and phenotypic manifestations
(Barabási AL et al., 2011).
scientific papers. For the predicted genes, one may argue that their appearance in T1DM
publications could be a result of their interactions with the known disease genes, as
interacting genes often appear in the same publications. To address this issue, all
PubMed records that cited the known T1DM genes were excluded from the analysis of
the predicted genes. Out of the 68 new candidates, 13 (~20%) were cited significantly
more often than random in T1DM publications, as compared to only ~6.9% of the
Human Protein Reference Database genes. This was a ~3-fold enrichment. As a group,
members of the list of 68 were significantly (p<10⁻⁷) more likely to appear in T1DM-related
publications than members of a random set of 68 genes. This suggests that there is a high
probability that these genes play a role in T1DM. Out of the 68 novel candidates, more
than a third (24) interact with at least two known disease genes, and about a sixth (12)
interact with at least three. This shows the connection between disease modules and
provides further support that these are really disease genes. This is intuitive, as subsets of
genes having many more interactions with each other than with others are likely to
belong to the same functional network module, and consequently to be involved in the same
physiological processes and disease phenotypes.
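The proportions quoted in this paragraph can be checked with a few lines of arithmetic. The sketch uses only the counts given in the text (68 candidates, 13 significantly cited, a background rate of about 6.9%, and 24 and 12 genes with at least two and three known interacting disease genes). The variable names are illustrative.

```python
n_candidates = 68
n_cited = 13
background_rate = 0.069   # ~6.9% of HPRD genes, as quoted in the text

# Share of candidates cited significantly more often than random.
candidate_rate = n_cited / n_candidates

# Enrichment of the candidates relative to the background.
fold_enrichment = candidate_rate / background_rate

# Shares of candidates interacting with >=2 and >=3 known disease genes.
share_two_baits = 24 / n_candidates
share_three_baits = 12 / n_candidates

print(f"cited candidates: {candidate_rate:.1%}")
print(f"fold enrichment:  {fold_enrichment:.2f}")
print(f">=2 known genes:  {share_two_baits:.1%}")
print(f">=3 known genes:  {share_three_baits:.1%}")
```

The exact values are about 19% and a 2.8-fold enrichment, which the text rounds to ~20% and ~3-fold. The shares of 24/68 and 12/68 correspond to "more than a third" and "about a sixth".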
Figure 14.7. Protein-protein interaction network of the top 5 predictions (ellipse) in T1DM
among the 68 proteins and their corresponding baits (round rectangle; interacting known
T1DM genes). Bright magenta nodes represent genes with significant citation in T1DM-
related publications (p<0.01). Source: Gao et al. 2009; 18/02/2013.
The number of independent baits (known T1DM genes) for each gene was also
determined. Figure 14.7 shows the PPI network of the top 5 candidates in terms of
number of baits. On the top are ESR1 and VIL2, each with 6 baits. Interestingly, they are
also among the top in terms of independent citations in T1D-related publications and
network degrees. ESR1, or estrogen receptor 1, has been cited in 139 (124, after
Figure 14.8. Overall disease relatedness based on shared pathways in the Panther, KEGG
and CGAP-BioCarta databases. Green nodes indicate the neurodegenerative disorders,
whereas pink nodes highlight the autoimmune diseases. The color of the edges connecting
the nodes reflects the shared pathway rank ranging from 3 (highest relatedness) to 30
(lowest relatedness). Parkinson's disease-Park, Alzheimer's disease-Alz, multiple sclerosis-
MS, rheumatoid arthritis-RA and Type 1 diabetes-T1D.
Source: Menon R et al., 2011; 18/02/2013.
14.12. Literature
1. Barabási AL, Gulbahce N, Loscalzo J. Network medicine: a network-based approach
to human disease. Nat Rev Genet. 2011 Jan;12(1):56-68.
2. Vidal M, Cusick ME, Barabási AL. Interactome networks and human disease.
Cell. 2011 Mar 18;144(6):986-98.
3. Barabasi AL, Albert R. Emergence of scaling in random networks. Science. 1999 Oct
15;286(5439):509-12. PubMed PMID: 10521342.
4. Jeong H, Tombor B, Albert R, Oltvai ZN, Barabási AL. The large-scale organization of
metabolic networks. Nature. 2000 Oct 5;407(6804):651-4.
5. Albert R, Jeong H, Barabasi AL. Error and attack tolerance of complex networks.
Nature. 2000 Jul 27;406(6794):378-82.
6. Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabási AL. The human disease
network. Proc Natl Acad Sci U S A. 2007 May 22;104(21):8685-90.
7. Duarte NC et al. Global reconstruction of the human metabolic network based on
genomic and bibliomic data. PNAS. 2007; 104:1777–1782.
8. Ma H et al. The Edinburgh human metabolic network reconstruction and its
functional analysis. Molecular Systems Biology. 2007; 3:135.
9. Chen Y et al. Variations in DNA elucidate molecular networks that cause disease.
Nature. 2008 Mar 27;452(7186):429-35.
10. Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabási AL. The human disease
network. Proc Natl Acad Sci U S A. 2007 May 22;104(21):8685-90.
11. Lee D-S et al. The implications of human metabolic network topology for disease
comorbidity. PNAS. 2008; 105:9880–9885.
12. Lu M et al. An Analysis of Human MicroRNA and Disease Associations. PLoS ONE.
2008; 3:e3420.
13. Hidalgo C et al. A Dynamic Network Approach for the Study of Human Phenotypes.
PLoS Computational Biology. 2009; 5:e1000353.
14. van Driel MA et al. A text-mining analysis of the human phenome. European Journal
of Human Genetics. 2006; 14:535–542.
15. Edwards YJ et al. Identifying consensus disease pathways in Parkinson's disease
using an integrative systems biology approach. PLoS One. 2011 Feb 22;6(2):e16917.
16. Gao S, Wang X. Predicting Type 1 Diabetes Candidate Genes using Human Protein-
Protein Interaction Networks. J Comput Sci Syst Biol. 2009 Apr 1;2:133.
17. Menon R, Farina C. Shared molecular and functional frameworks among five complex
human disorders: a comparative study on interactomes linked to susceptibility
genes. PLoS One. 2011 Apr 21;6(4):e18660.
18. Binder CJ et al. (2004) IL-5 links adaptive and natural immunity specific for epitopes
of oxidized LDL and protects from atherosclerosis. J Clin Invest 114: 427–437.
19. Taleb S, Tedgui A, Mallat Z (2010) Adaptive T cell immune responses and
atherogenesis. Curr Opin Pharmacol 10: 197–202.
14.13. Questions
1. What is systems biology?
2. How are the interactions displayed in systems biology?
3. What properties do interaction networks have in different organisms?
4. Give examples of interaction networks in biological systems.
15.1. Background
In recent decades, our genetic knowledge has expanded rapidly. This is primarily owing
to the development of modern biomedicine; however, it was greatly accelerated by the
U.S. and British governments supporting the "Human Genome Project" and related
programs. The program has fulfilled the main objective of the first stage, that is, the
molecular description of the human genome, much faster than originally planned. This
was due to several factors:
The current situation is expected to remain valid for some time. In addition to
scientific development, this is also predicted by social needs and expectations, the
dominant geopolitical, economic, political and cultural trends.
One of the main fields of application of genetic research is diagnosis and treatment.
Major changes in the paradigm of clinical practice and medical approaches have already
started and will continue, driven by rapid developments. The center of
gravity will gradually shift from the symptoms and treatment of patients towards
preventive studies in the asymptomatic state, and personalized medical interventions are
going to be repositioned.
and other branches of justice (e.g. paternity issues) will also benefit from these scientific
procedures. Today, in the era of genomics, gene expression patterns may reveal more
variations and the application of gene diagnostics is more accurate and precise. The new
science of bioinformatics is also significant. Accessing the internet, "in silico" work can
be done by biologists: searching in DNA databases, they can perform advanced, useful
research on the computer screen. It can be said that besides "single instrumental parts"
(i.e. the individual genes), whole "orchestral harmonies" (i.e. patterns of thousands of
genes and the information content of biological pathways) will also be assessed. It is more and
more feasible to make progressive genetic "predictions", to foresee the outcome of certain
diseases (e.g. the possibility of tumor metastases in cancer), and to anticipate the side
effects of drugs. This latter option has huge benefits (not only financially, but also in
terms of health, in the management of unnecessary delays in treatment). New
personalized vaccines are under development with the computational tools of the new
science of immunogenomics. It is clear, however, that with faster, more complete
genetic diagnosis, professionals and the subjects of gene diagnostics have to face various
new legal (labor law, insurance) and ethical ("prejudices") issues.
The situation has become extremely difficult for doctors as to when and what to say
to the patient. Even if the doctor emphasizes the uncertainty and limitations of our
knowledge, the patient or his relatives might insist on facing the results, and want to
know the hazards and chances according to the current state of science. The
mass of accessible data is increasing owing to the exponentially improving
international data bank networks. In addition to the advantages, one should also see the
danger of misinterpretation. The most important tool is education: the
modernization of the teaching of biological sciences and a sober, honest, sincere dissemination
of knowledge, even if the market-oriented information industry leaves less and less
space for science. Nevertheless, there are some very encouraging trends (e.g. "University of
All Knowledge"). The basic principle of genetics is still the probability of inheritance
(and not definite fate). In spite of knowing and emphasizing this, ethically difficult
situations arise every day.
Perhaps even more problems emerge in connection with gene therapy (manipulation
of genes). Gene therapy means gene transfer into, or modification of the DNA of, human
cells with a genetic dysfunction that causes a disease. Although there are many more
failures than successes in this area, seductive promises of healing tend to overshadow the
legitimate scientific skepticism that urges caution and moderation. While it is
true that more and more successful techniques (e.g. gene silencing) are available for
genetic improvement (more accurate knowledge of the human genome
also helps), we are still far from real successes in gene therapy. The sensation-seeking
trends in the commercial mass media are worsening the public reaction by drawing an
unrealistic picture for the public. Hopefully, attractive and meaningful new forums of
science communication will gain ground in this area as well.
Today it is more or less universally agreed that, as long as there are
no technical barriers, diseases may be cured by genetic tools (probably including disease
prevention), but that skills and mental abilities must not be enhanced. It should be
noted, however, that the boundary between the two concepts is not clear enough,
which raises several additional ethical problems. In any case, perhaps fortunately, in
spite of the excesses of scientific reductionism, it is now quite clear that the processes of
mental and emotional intelligence cannot be interpreted or modified by genetic
methods (at least not more than by chemical or pharmacological effects).
This raised very serious ethical issues and generated controversy. These issues are
particularly challenging in the context of the human genome.
It is a widely accepted principle, also confirmed by international law, that the human
genome is the common heritage and property of mankind, and the results of genetic
research provide scientific evidence that humans living today share a common origin.
On this basis, research results may be exploited only for the common interest and for
charitable purposes, and unrestricted access to them has to be ensured. It is very
difficult, however, to put these noble principles into practice. A system should be
developed which is based on these common values and common interests, but which also
allows personal motivations (scientific knowledge, ambition) and
economic efficiency (value for money) to be realized. It is also very
important that the system be fair: all who contribute should benefit from the results.
The bio-innovation system is first and foremost determined by the standards that
can be derived from these values and principles. All of these are deeply rooted in the
society and culture whose political and professional institutions create them.
cloning of human embryos. In many countries, including Hungary, a legal prohibition
has already been established.
The UNESCO Universal Declaration on the Human Genome and Human Rights (1997)
is largely ceremonial and gives little practical help; still, it could lead to a global consensus
that the rules of research ethics be respected around the world. UNESCO also released an
International Declaration on Human Genetic Data, on the use and protection of genetic
data. The World Health Organization (WHO) has also released a number of decisions,
draft international guidelines, reports and recommendations since 1998. Of these,
perhaps the most important was the one on genetic databases (2003).
Ethical standards in relation to the utilization of human genetic innovation results
are included in the European Directive on the Legal Protection of Biotechnological
Inventions (1998) and the European Patent Convention as well.
The declaration of HUGO on DNA samples in 1998 was a ground-breaking document,
which was followed by the recommendations on DNA banking by the Ethics Committee
of the Royal College of Physicians, United Kingdom (2000). The
Opinion on biobanks for research by the German National Ethics
Council (2004) and the joint resolution on the governance of biobanks previously published
by the French and German national ethics committees (2003) were milestones.
Important documents were also released by the European Society of Human Genetics
(a recommendation on DNA data banking, 2001) and by the Council of
Europe (a proposal on the regulation of the biomedical archiving of human biological
materials, 2003).
Pioneering national legislation on genetic research, biobanks and data protection
was enacted in Australia, Singapore, the U.S., France and Canada (Quebec). The
Hungarian Human Genetic Law, based on thorough professional work, was adopted in 2008.
One must mention the activities of the Australian and Canadian legal reform committees,
as the effects of their activities in these areas extended beyond the national framework.
In the future, one can expect specific professional and ethical problems regarding
national genomics programs, such as the first Icelandic medical database (in the
cooperation of the Icelandic government and a company called deCODE), the Estonian
Genome Project and the residential database of Tonga.
Genetics has an epoch-making significance, which has been recognized and reflected in
world affairs: the United Nations Millennium Declaration (2000) dealt with it and the G8
Summit has also repeatedly taken a position on the issue.
15.8. Conclusion
Obviously, human knowledge cannot be separated from the mental and physical events
of the society, which we all must be aware of in the era of genetics and genomics.
Seeking renewal and strengthening, our nation has to adapt to the consequences of the
development of our knowledge in many spheres (economic, legal, ethical, religious life).
We must proceed towards the light, against unscientific darkness. This may be served by
clear priorities, values and commitment, and the accurate designation of personal
responsibility.
15.9. Bibliography
Bernice Elger, Nikola Biller-Andorno, Alexandre Mauron and Alexander M. Capron (ed.):
Ethical Issues in Governing Biobanks – Global Perspectives; Ashgate (2008)
Ferencz Antal, Kosztolányi György, Falus András, Kellermayer Miklós, Somfai Béla,
Jelenits István, Hámori Antal: Biogenetika és etika (Sapientia füzetek 4.); Vigília Kiadó
(2005)
The Advisory Committee on Health Research: Genomics and World Health; World
Health Organization (2002)
Jan Helge Solbakk, Soren Holm, Bjorn Hofmann: The Ethics of Research Biobanking;
Springer (2009)