Gene Expression and DNA Replication
Fall 2020
Supervised by:
Assis. Prof. Omneya Ramadan
Dr. Hend Helmy
By:
Nourhan Ali Abou Madawi
The gene is the fundamental unit for storage and expression of genetic
information. A gene is made up of nucleotide sequences of DNA. It typically
provides information about structural and functional components of the cell. The
information transfer occurs not only between parent and daughter cells during
cell division, but also between nucleus and the cytoplasmic machinery. When it
occurs between parent and child, the transfer of the information is called
inheritance. The transfer of information from the genome within the nucleus to
the cell proper is referred to as gene expression.
Only 1 to 2 % of human DNA serves as genes, which are the templates for protein
production. Most genes are broken down into separated coding sections known
as exons. These exons are separated from each other by intervening non-coding
sequences known as introns. Genes also have other non-coding DNA, typically
short sequences near or within genes that function as regulatory sequences
critical for controlling gene expression.
The term genome refers to the collection of all the genes that an organism
possesses. It has been estimated that between 20,000 and 30,000 genes are in the
human genome. These genes are encoded by lengthy strands of DNA which are
packed with proteins and organized linearly into chromosomes. The linear order
of the genes on the chromosomes allows for the creation of maps of the human
gene order. These maps are generally invariant throughout any given species. It
is these maps that allow us to associate specific genes with specific traits through
gene mapping.
There are two different terms that seem related but actually differ: genome (gene)
mapping and genome (gene) sequencing. Simply, gene mapping is the method of
determining the locations of genes on chromosomes, while gene sequencing is
the method of determining the sequences of nucleotides in DNA. So, gene
sequencing is more detailed than gene mapping.
In genetics, gene expression is the most fundamental level at which the genotype
gives rise to the phenotype. The genetic information stored in DNA represents the
genotype whereas the phenotype results from interpretation of that information.
The properties of cells, tissues and organisms depend largely on the aggregate
structures and properties of their proteins. Molecular biology dogmatically states
that genes control these properties by controlling the structure of proteins, the
timing and amount of their production and the coordination of their synthesis
with that of other proteins. The genetic information is transmitted by means of
RNA. Thus, the flow of genetic information is DNA → RNA → protein.
The sequence of nucleobases determines the sequence of amino acids in the
protein that results from the process of translation. The nucleotides within the
mRNA molecule (the coding mRNA) are arranged in groups of three called codons,
which determine the precise amino acid sequence. This is called the genetic code.
Because there are four bases, 64 possible combinations of nucleotides exist.
There are only 20 amino acids; thus, most amino acids are specified by more than
one codon. A specific codon, AUG, codes for methionine and indicates the
beginning of a coding sequence. The three stop codons are UAA, UGA & UAG.
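The codon arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a full translation tool: the compact amino-acid string encodes the standard genetic code in U, C, A, G order, while the function name `translate` and the example mRNA are made up for this sketch.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, in U-C-A-G order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every three-base combination to its amino acid (or stop).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Translate from the first AUG up to (not including) the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

codon_count = len(CODON_TABLE)                       # 4 bases in 3 positions
amino_acid_count = len(set(AMINO_ACIDS) - {"*"})     # distinct amino acids
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")

print(codon_count, amino_acid_count, stop_codons)
print(translate("GGAUGGCUUGGUAAGC"))  # hypothetical mRNA: AUG GCU UGG UAA
```

Because 61 sense codons are shared among 20 amino acids, most amino acids appear more than once in the table, which is the redundancy described above.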
Converting genetic information contained in the DNA sequence of a gene into a
finished protein product is a complex process consisting of several steps, with
each step involving distinct regulatory mechanisms.
Gene expression involves a series of events: transcription, post-transcriptional
modification, nuclear export, translation and post-translational modification.
The life of a cell is the life of its RNA. The function of a cell is governed by the sum
of specific proteins. Protein expression is most commonly regulated at the level
of gene transcription into RNA, which is then processed and translated.
Transcription of DNA into RNA controls cellular differentiation, proliferation and
apoptosis in all differentiating cell systems. Even DNA replication occurs by
certain enzymes expressed from genes.
Aberrant gene expression can result in disturbance in cell function manifested by
a disease.
Transcription
For transcription to take place, the DNA must be unwound from the chromatin
complex. This process of unpackaging is called chromatin remodelling and is
mediated by a family of proteins with switch/sucrose nonfermentable (SWI/SNF)
domains. These proteins utilize ATP hydrolysis to shift the nucleosome cores
along the length of the DNA, a process known as nucleosome sliding. By sliding
nucleosomes away from a gene sequence, SWI/SNF complexes can activate gene
transcription.
Only one of the two DNA strands serves as a template for transcription. The
antisense strand of DNA is read by RNA polymerase from the 3' end to the 5' end
during transcription. The complementary RNA is synthesized in the opposite,
5' → 3', direction and matches the sequence of the sense strand, except that
uracil replaces thymine.
The non-template (sense) strand of DNA is called the coding strand because its
sequence is the same as the newly created RNA transcript (except for the
substitution of uracil for thymine).
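The relationship between the template strand, the coding strand and the transcript can be checked with a short Python sketch. The sequences and the function name `transcribe` are illustrative assumptions; the template is written 3'→5' so that the RNA reads 5'→3' from left to right.

```python
# DNA template base -> complementary RNA base
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5):
    """Return the RNA (written 5'->3') made from a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)

coding_5_to_3 = "ATGGCTTGGTAA"    # sense (coding) strand, 5'->3' (hypothetical)
template_3_to_5 = "TACCGAACCATT"  # antisense (template) strand, 3'->5'

rna = transcribe(template_3_to_5)
print(rna)
# The transcript equals the coding strand with uracil in place of thymine.
print(rna == coding_5_to_3.replace("T", "U"))
```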
Transcription occurs in 4 steps: initiation, promoter escape, elongation and
termination.
1. Initiation
This process starts when certain transcription factors bind to a specific DNA
sequence located upstream of the gene, called the promoter. The binding recruits
RNA polymerase, which also binds to the promoter, forming the transcription
initiation complex.
The minimum essential transcription factors needed for transcription to occur are
termed basal (general) transcription factors and include transcription factor (TF)
IIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH.
The promoter contains the consensus sequence to bind the transcription factors
and then the RNA polymerase needed to initiate transcription. The best known
example of this sequence is the TATA box sequence, TATAAA, which binds RNA
polymerase and associated transcription factors.
However, more than 80% of mammalian protein-coding genes are driven by
promoters containing different recognition sequences, often GC boxes. GC
promoters consist of repeats of guanine and cytosine nucleotides, frequently have
multiple transcription start sites, and require alternative transcription factors such as
Specificity protein 1 (Sp1).
Basal transcription factors cannot by themselves increase or decrease the rate of
transcription. Instead, promoters can function together with other more distant
regulatory DNA regions (e.g., enhancers and silencers) to direct the level of
transcription of a given gene.
-An enhancer is a sequence of DNA that can be bound by a transcription
factor called an activator to promote gene transcription.
-A silencer is a sequence of DNA capable of binding a transcription factor called a
repressor, which blocks the attachment of RNA polymerase to the promoter, thus
preventing the transcription of the gene.
Enhancers and silencers can be located upstream or downstream of the genes
they regulate.
Coactivators and corepressors are transcriptional coregulators that bind to
activators and repressors respectively, thereby increasing or decreasing the rate of
gene expression. They lack DNA-binding domains, so they cannot bind DNA directly
and instead act through activators or repressors.
The binding strength between the promoter and transcription factors affects the
binding quality of RNA polymerase and subsequently transcription.
RNA polymerase with the basal transcription factors binds the promoter forming
an RNA polymerase-promoter closed complex. Then, RNA polymerase with the
help of transcription factors unwinds approximately 14 base pairs of DNA to form
an RNA polymerase-promoter "open complex". In the open complex the
promoter DNA is partly unwound and single stranded. The exposed single
stranded DNA is referred to as the transcription bubble. This is achieved by
breaking the hydrogen bonds between complementary DNA nucleotides.
RNA polymerase then binds a nucleoside triphosphate (NTP) complementary to the
template nucleotide at the transcription start site; this becomes the first
nucleotide of the transcript.
Nucleotides that come before the initiation site are given negative numbers and
are said to be upstream, while nucleotides that come after the initiation site are
given positive numbers and are said to be downstream.
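The numbering scheme can be illustrated with a small Python sketch. It assumes the common convention that the initiation site itself is labeled +1 and that there is no position 0; the function name `label_positions` and the example sequence are hypothetical.

```python
def label_positions(sequence, tss_index):
    """Number each nucleotide relative to the transcription start site (TSS).

    tss_index is the 0-based index of the TSS in `sequence`. The TSS is
    labeled +1 and there is no position 0 (assumed convention).
    """
    labels = []
    for i, base in enumerate(sequence):
        offset = i - tss_index
        # Skip zero: positions run ..., -2, -1, +1, +2, ...
        number = offset + 1 if offset >= 0 else offset
        if number < 0:
            region = "upstream"
        elif number == 1:
            region = "start site"
        else:
            region = "downstream"
        labels.append((base, number, region))
    return labels

labels = label_positions("TATAAAGGCATG", tss_index=8)
for base, number, region in labels:
    print(f"{base} {number:+d} {region}")
```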
2. Promoter escape
When RNA polymerase has initiated de novo phosphodiester bond synthesis, it
begins to release its hold on the promoter DNA and advance to downstream regions
of the template. In vitro, this process is marked by the release of high levels of
abortive transcripts at most promoters, reflecting the high instability of initial
transcription complexes (ITCs) and indicative of the existence of barriers to the
escape process.
Abortive initiation is an early process of genetic transcription in which RNA
polymerase binds to a DNA promoter and enters into cycles of synthesis of
short mRNA transcripts which are released before the transcription complex
leaves the promoter. Abortive initiation continues to occur until an RNA product
of a threshold length of approximately 23 nucleotides is synthesized, at which
point promoter escape occurs and a transcription elongation complex is formed.
It was found that there was a relationship between the amount of abortive
transcripts produced and the time until long RNA strands are successfully
produced.
3. Elongation
After promoter escape, RNA polymerase traverses the template
DNA strand, using nucleoside triphosphates (NTPs) to pair complementary bases and
form the sugar-phosphate backbone of the RNA.
The characteristic elongation rates in prokaryotes and eukaryotes range
between 10 and 100 nucleotides per second.
In eukaryotes, however, nucleosomes act as major barriers to transcribing
polymerases during transcription elongation. The pausing induced by
nucleosomes can be regulated by transcription elongation factors such as TFIIS.
During elongation of the RNA transcript, a correction mechanism substitutes
incorrectly incorporated bases, usually by allowing short pauses
during which appropriate RNA editing factors can bind. RNA editing mechanisms
in mRNAs include nucleoside modification of cytidine to uridine (C-to-U) and adenosine
to inosine (A-to-I) by deamination, as well as nucleoside insertions and additions
without a DNA template by protein complexes called editosomes.
RNA editing in mRNAs effectively alters the amino acid sequence of the encoded
protein so that it differs from that predicted by the genomic DNA sequence.
Many RNA transcripts may be rapidly produced from a single copy of a gene, as
multiple RNA polymerases, spaced out from one another, may be transcribing the
gene simultaneously.
4. Termination
Transcription termination involves cleavage of the new transcript followed by
template-independent addition of adenines at its new 3' end. The latter event is
called polyadenylation.
-Reverse transcription is the synthesis of a DNA molecule from an RNA template using
the reverse transcriptase (RT) enzyme. The resulting DNA is known as complementary DNA
(cDNA).
-Some viruses use RT to convert their RNA into DNA to be integrated into the
host’s genome. The viral DNA can then be transcribed along with the host DNA
to produce new viruses and viral proteins. The retroviral RT has three
sequential activities: RNA-dependent DNA polymerase, RNase H and DNA-dependent
DNA polymerase. These activities enable the enzyme to convert single-stranded
RNA into double-stranded cDNA.
-Reverse transcription can be applied in vitro in the polymerase chain reaction (PCR).
This method is known as RT-PCR. The RT enzyme here has RNA-dependent DNA
polymerase activity and can synthesize a single-stranded cDNA from an RNA
template. The cDNA can then be used as a template for PCR amplification.
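The three sequential activities of the retroviral RT can be followed at the sequence level with a simplified Python sketch. It only tracks base pairing and ignores primers and enzyme kinetics; the function names and the example RNA are assumptions made for illustration.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}

def first_strand(rna_5_to_3):
    """RNA-dependent DNA polymerase step: cDNA complementary to the RNA, written 5'->3'."""
    return "".join(RNA_TO_DNA[b] for b in reversed(rna_5_to_3))

def second_strand(cdna_5_to_3):
    """DNA-dependent DNA polymerase step: copy of the first cDNA strand, written 5'->3'."""
    return "".join(DNA_TO_DNA[b] for b in reversed(cdna_5_to_3))

rna = "AUGGCUUGG"            # hypothetical viral RNA, 5'->3'
cdna1 = first_strand(rna)    # step 1: RNA-DNA hybrid is formed
# Step 2 (RNase H): the RNA strand of the hybrid is degraded, leaving cdna1 single-stranded.
cdna2 = second_strand(cdna1) # step 3: second strand completes the double-stranded cDNA

print(cdna1)
print(cdna2)
```

The second strand carries the same sequence as the original RNA, with thymine in place of uracil.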
5' processing
This step occurs shortly after the start of transcription. It involves the addition of
7-methylguanosine (m7G) to the 5' end. First, the terminal 5' phosphate is removed
by a phosphatase. Then the enzyme guanylyl transferase catalyzes the reaction
between the 5' end of the RNA transcript and a guanosine triphosphate (GTP) molecule
through a 5'-5' triphosphate linkage, such that the guanosine is oriented in the
opposite direction to the other nucleotides in the RNA transcript chain. Then the
enzyme methyl transferase transfers a methyl group from S-adenosyl methionine
to the guanine ring. This type of cap, with just the m7G in position, is called a cap
0 structure. The ribose of the adjacent nucleotide may also be methylated to give
a cap 1. Methylation of nucleotides further downstream in the RNA molecule produces
cap 2, cap 3 structures and so on. In these cases, the methyl groups are added to
the 2'-OH groups of the ribose sugars. The cap protects the 5' end of the primary
RNA transcript from attack by ribonucleases that have specificity for 3'-5'
phosphodiester bonds. It also provides for proper attachment to the ribosome
during translation.
Figure (3): 5' processing. Cap 0 is 7-methyl guanosine. Cap-1 has a methylated 2′-hydroxy
group on the first ribose sugar, while cap-2 has methylated 2′-hydroxy groups on the first
two ribose sugars
3' processing
This involves cleavage of the 3' end, then the addition of about 250 adenine
residues to form a poly(A) tail. The cleavage and adenylation reactions occur
primarily when a polyadenylation signal sequence (5'-AAUAAA-3') is located near the
3' end of the pre-mRNA molecule and is followed by another sequence, usually
5'-CA-3', which is the site of cleavage. Synthesis of the poly(A) tail and
termination of transcription require binding of specific proteins, including
cleavage/polyadenylation specificity factor (CPSF), cleavage stimulation factor
(CstF), polyadenylate polymerase (PAP), polyadenylate binding protein 2 (PAB2),
cleavage factor I (CFI), and cleavage factor II (CFII), which function to catalyze
cleavage and protect the mRNA from exoribonucleases.
RNA splicing
RNA splicing is the process of removing introns and reconnecting exons. It requires a series of
reactions mediated by the spliceosome, a complex of small nuclear
ribonucleoproteins (snRNPs). The types of snRNPs in the spliceosome determine the
mechanism of splicing. Canonical splicing, also called the lariat pathway, utilizes the
major spliceosome and accounts for more than 99% of splicing.
The major spliceosome is composed of the nuclear snRNPs U1, U2, U4, U5 and
U6 along with specific accessory proteins, U2AF and SF1. This complex recognizes
the dinucleotide GU at the 5' end of an intron and an AG at the 3' end.
Subsequently, a lariat structure forms, providing for both excision of the intron
and proper alignment of the ends of the two bordering exons to allow precise
ligation.
When the two dinucleotide sequences GU and AG are not present, non-canonical
splicing removes these rare introns, which have different splice site sequences, using the
minor spliceosome. It is formed of U5, U11, U12, U4atac and U6atac.
Splicing may also occur without a spliceosome, as in the case of tRNA.
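The GU...AG rule for canonical introns can be illustrated with a short Python sketch that removes introns at known coordinates and checks their boundary dinucleotides. This is only a bookkeeping illustration, not a splice-site predictor; the function name `splice`, the coordinates and the sequence are hypothetical.

```python
def splice(pre_mrna, introns):
    """Remove introns given as (start, end) half-open index pairs and join the exons.

    Raises ValueError if an intron does not begin with GU and end with AG.
    """
    mature = []
    cursor = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"non-canonical intron at {start}-{end}: {intron}")
        mature.append(pre_mrna[cursor:start])  # exon preceding this intron
        cursor = end
    mature.append(pre_mrna[cursor:])           # final exon
    return "".join(mature)

# Exon 1 + intron (GU...AG) + exon 2
pre_mrna = "AUGGCU" + "GUAAGUCCUAG" + "UGGUAA"
mature_mrna = splice(pre_mrna, [(6, 17)])
print(mature_mrna)
```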
Figure (5): Alternative splicing
The mature mRNA molecule is now formed of:
1. A 5′ cap followed by a 5′-untranslated region (5′ UTR)
2. A protein-coding sequence that begins with the initiation codon AUG and ends
with one of three stop codons (UAA, UGA, UAG)
3. A 3′-untranslated region (3′ UTR) followed by the poly(A) tail
Nuclear export
The nuclear envelope serves as a major regulator of gene expression by controlling the
flow of RNA to the cytoplasm for translation. Nuclear pore complexes (NPCs)
inserted within the nuclear envelope regulate the transport of molecules into and
out of the nucleus. Ions, small metabolites and proteins under 40 kDa passively
diffuse across NPC channels. However, larger proteins and mRNA are transported
through NPCs via energy-dependent (GTP) and signal-mediated processes that
require chaperoning transport proteins.
NPCs are composed of three major parts:
(1) A central core containing a 10-nm channel
(2) A nuclear basket that can dilate in response to large cargoes
(3) Flexible fibrils that extend from the central core into the cytoplasm
The proteins that make up the NPC are called nucleoporins.
A protein called exportin, which belongs to the karyopherin family of nuclear
transport receptors, binds to the nuclear export signal in the poly(A) binding protein.
The reaction is regulated by a small Ras-related GTPase, Ran. Ran binds guanosine
triphosphate (GTP), which it hydrolyzes to guanosine diphosphate (GDP),
releasing energy. Ran adopts a different conformation depending on whether it is
bound to GTP or GDP. Exportin must bind Ran in its GTP-bound state (Ran-GTP) to
form a ternary complex with its export cargo. Once the complex has passed through
the pore to the cytoplasm, it dissociates. Ran-GTP binds a GTPase-activating protein (GAP)
and hydrolyzes GTP, and the resulting Ran-GDP is returned to the nucleus,
where it exchanges its bound GDP for GTP, forming Ran-GTP again.
Translation
In the cytoplasm, mRNA can be translated into protein. The mRNA message is
translated in segments of three adjacent nucleotides called codons. Each codon
is translated into 1 of 20 amino acids. A total of 64 different codons can be
generated from the 4 bases found in mRNA, so more than one codon can encode
a single amino acid.
The ribosome, the machinery for protein synthesis, is formed of two subunits:
a small and a large subunit. Each one is formed of one or more ribosomal RNAs and
many ribosomal proteins. The eukaryotic ribosome is an 80S ribosome formed of a 40S
small subunit and a 60S large subunit.
The ribosome contains three tRNA binding sites: A, P and E. The A site binds an
aminoacyl tRNA, the P site binds a peptidyl tRNA (tRNA bound to the polypeptide
chain) and the E site, the exit site, binds a free tRNA.
Translation occurs in three steps: initiation, elongation and termination.
Successive tRNAs carry amino acids to the ribosome for joining together in
sequence as the ribosome moves towards the 3′ end of the mRNA. The codons in
the mRNA interact by base-pairing with anticodons of the tRNAs so that amino
acids are incorporated into the growing polypeptide chain in the right order.
Amino acids are joined using peptide bonds. A bond is created by the
condensation of the α-carboxyl group (COOH) of one amino acid with the α-amino
group (NH2) of another. One water molecule (H2O) is lost during this reaction.
The free NH2 and COOH groups of the terminal amino acids define the N-terminal
end and C-terminal end of the resulting polypeptide chain respectively.
Figure (9): The mechanism of Protein Synthesis
In mammalian cells, mRNA lifetime ranges from several minutes to days.
The limited lifetime of mRNA enables a cell to alter protein synthesis in response
to its changing needs. The stability of mRNA is regulated by the untranslated
regions (UTRs) of mRNA.
*Exosome complex: a protein complex capable of degrading various types of RNAs. It has an
exoribonucleolytic function and an endoribonucleolytic function.
*Cap binding complex: binds to 5′ cap protecting it from degradation
Post translational modification
To be functional, proteins need to be properly folded to acquire their native
three-dimensional structure, are often modified, and are transported to their final
destination. To achieve this, reactions referred to as post-translational modifications
are required. Examples:
Protein folding may start during translation or after it. Folding may occur
spontaneously under the influence of temperature, pH and salt concentration.
However, most proteins require other proteins to help them fold properly. These are
called molecular chaperones. They work by reducing unwanted
aggregation of the polypeptide chains. Chaperones are not to be confused with
folding catalysts, which actually catalyze the otherwise slow steps in the
folding pathway. An example is protein disulphide isomerase, which is involved in the
formation of disulphide bonds.
The chaperones are also important in protein transport, degradation, and even
allow denatured proteins exposed to certain external noxious factors an
opportunity to refold into their correct native structure.
Proteins have four levels of structure:
1. Primary
2. Secondary
3. Tertiary
4. Quaternary
-Primary structure is the polypeptide chain produced from translation.
-Secondary structure results from the first step in the folding process.
Secondary structures include alpha helices and beta pleated sheets that fold
rapidly because they are stabilized by intramolecular hydrogen bonds. These
bonds contribute to protein stability.
Figure (11): Protein structure
DNA Replication
An accurate copy of DNA is a prerequisite for cell division to occur in both somatic
and germ cells. DNA replication occurs during the S phase of the cell cycle.
It is a semiconservative process, meaning that the original DNA double helix
is not conserved after one round of replication; instead, each strand of the double
helix becomes part of a new helix. In mammalian cells, DNA replication occurs at
a polymerization rate of about 50 nucleotides per second. The speed and
accuracy of this process are under the control of a group of enzymes forming a
replication machinery.
The replication process starts at the replication origin by untwisting the DNA
double helix by initiator proteins. DNA helicase then separates the two strands
by breaking apart the hydrogen bonds between the nitrogenous bases. It
continues to move along the DNA molecule, separating the strands and forming the
replication bubble. DNA replication proceeds in both directions in what are called
replication forks. Single-stranded DNA binding proteins bind to exposed DNA
strands without covering the bases, allowing them to remain available for base
pairing. These proteins also help open the DNA helix by stabilizing the unwound,
single-stranded conformation.
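A rough feel for why bidirectional forks and multiple origins matter can be had from a back-of-the-envelope Python sketch using the rate of about 50 nucleotides per second quoted above. The model is deliberately idealized: it assumes all origins fire at once, are evenly spaced, and launch two forks that move at a constant rate. The function name, the DNA length and the origin counts are hypothetical.

```python
def replication_time_seconds(length_nt, origins=1, rate_nt_per_s=50):
    """Idealized time to copy `length_nt` nucleotides of duplex DNA.

    Assumes every origin fires at once, origins are evenly spaced, and each
    origin launches two forks moving at a constant rate (a simplification).
    """
    forks = 2 * origins
    return length_nt / (forks * rate_nt_per_s)

one_origin = replication_time_seconds(1_000_000, origins=1)
ten_origins = replication_time_seconds(1_000_000, origins=10)
print(one_origin, ten_origins)  # seconds
```

Under these assumptions, ten origins shorten the time tenfold compared with a single origin.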
Figure (13): Replication bubbles
The actual process of replication occurs at the replication fork. DNA polymerase
moves along the old strand only in the 3′→5′ direction, creating a new strand in the
5′→3′ direction. DNA polymerase cannot begin a new chain; it can only add a
nucleotide onto a pre-existing 3′-OH group. This necessitates the synthesis of an
RNA primer (short sequence of about 10 ribonucleotides) by RNA primase. Then,
DNA polymerase adds the deoxyribonucleotides to the RNA primer.
Because the two parent strands are antiparallel, the new strands are synthesized
in opposite directions along the parent templates at each replication fork. Thus,
they are elongated by different mechanisms. The leading strand is continuously
built in the direction of the replication fork using DNA polymerase, while the
lagging strand is discontinuously synthesized in a series of short segments called
Okazaki fragments (each is about 100 to 200 nucleotides in length in eukaryotes).
They are synthesized in a direction opposite to the replication fork. For each one,
RNA primase adds an RNA primer then DNA polymerase follows by adding DNA
nucleotides.
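Since each Okazaki fragment needs its own RNA primer, the fragment length sets how many priming events the lagging strand requires. The Python sketch below does this arithmetic for the 100 to 200 nucleotide range quoted above; the function name and the 10,000-nucleotide template length are hypothetical.

```python
import math

def okazaki_fragments_needed(lagging_template_nt, fragment_nt):
    """Number of fragments (and RNA primers) needed to cover the lagging-strand template."""
    if fragment_nt <= 0:
        raise ValueError("fragment length must be positive")
    return math.ceil(lagging_template_nt / fragment_nt)

fewest = okazaki_fragments_needed(10_000, 200)  # longest fragments
most = okazaki_fragments_needed(10_000, 100)    # shortest fragments
print(fewest, most)
```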
After completion of DNA synthesis in the new strands, RNA primers in both
leading and lagging strands are removed by RNases and replaced by DNA using
DNA polymerases. DNA ligase then seals any nicks present in both strands.
DNA polymerases are highly accurate, with an intrinsic error rate of less than one
mistake for every 10^7 nucleotides added. In addition, many DNA polymerases contain
an exonuclease domain, which detects base-pair mismatches and removes
the incorrect nucleotide so that it can be replaced by the correct one. DNA proofreading
mechanisms are superior to those of RNA synthesis.
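The quoted error rate can be turned into an upper bound on mistakes per round of copying with a one-line calculation. The Python sketch below assumes a template of about 3.2 billion nucleotides (an approximate human haploid genome size, which is not given in the text) and uses a hypothetical function name.

```python
def max_expected_errors(nucleotides_added, nucleotides_per_error=10**7):
    """Upper bound on mistakes, given fewer than one error per `nucleotides_per_error`."""
    return nucleotides_added / nucleotides_per_error

GENOME_NT = 3_200_000_000  # assumed template size, not from the text
upper_bound = max_expected_errors(GENOME_NT)
print(upper_bound)
```

This bound reflects the intrinsic error rate only; exonuclease proofreading lowers the real number further.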
Unwinding of DNA by helicase forces the DNA to rotate. This process results in a
build-up of twists in the DNA. This build-up forms a torsional resistance that would
eventually halt the progress of the replication fork. DNA gyrase (a specific type of
topoisomerase) relieves the tension caused by unwinding the two strands of the
DNA helix.
DNA clamp is a protein that forms a sliding clamp around DNA, helping the DNA
polymerase maintain contact with its template, thereby assisting with
processivity. The inner face of the clamp enables DNA to be threaded through it.
Once the polymerase reaches the end of the template, the sliding clamp
undergoes a conformational change that releases the DNA polymerase.
DNA replication can be performed in vitro. DNA polymerases isolated from cells
and artificial DNA primers can be used to start DNA synthesis at known sequences
in a template DNA molecule. PCR is a popular method for DNA amplification
(replication).