Lecture Notes, Lectures 12+ - DNA Lectures
DNA Summaries
Lecture 12:
1950s – knew genes were inherited via chromosomes but unsure whether protein or DNA
Proteins | DNA
Amino acids in sequence (20 letters), variation | 4 letters in sequence (ATGC)
Protein structures varied | DNA looks similar
Enzymes studies |
Proteins unstable | DNA stable
Erwin Chargaff:
o DNA base content (composition) differs b/w species
o DNA the same within one organism
o Does not vary over time/environment (stable)
o A=T
o G=C
Rosalind Franklin, Maurice Wilkins: X-ray crystallography of DNA indicated double helix and periodicity
Watson and Crick developed double helix DNA model
o Satisfied periodicity and equal A-T, G-C ratio
o Satisfied replication and transcription – one strand acts as a template
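A minimal Python sketch of why the double helix model satisfies Chargaff's A=T and G=C ratios: once every base has a pairing partner on the other strand, the counts over both strands must match. The example sequence and the helper name `duplex_base_counts` are illustrative assumptions, not from the lecture.

```python
from collections import Counter

# Watson-Crick pairing used to build the second strand.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def duplex_base_counts(strand: str) -> Counter:
    """Count bases over both strands of a duplex built from one strand."""
    partner = "".join(PAIR[b] for b in strand)  # pairing partner of each base
    return Counter(strand + partner)

# Hypothetical sequence, deliberately skewed on the single strand (4 A but 1 T).
single = "AAAGGCAT"
counts = duplex_base_counts(single)
print(counts["A"], counts["T"], counts["G"], counts["C"])  # prints: 5 5 3 3
```

The single strand alone does not obey the ratios; they only appear when both strands are counted together.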
Transcription of DNA into complementary RNA
Translation of RNA on ribosome into polypeptide chain
RNA
o Nucleic acid
Different types of base pairing with modified bases
E.g. Triple base pair: cytosine+guanine+7-methylguanine → distortion of duplex
o Double stranded in sections, more flexible loops + varied 2° structure
Hairpins – exact reverse complement in a single strand
Internal loops (bulging)
o Involved in translation and transcription
o Forms ribonucleoprotein complexes which carry out enzymatic functions
o Regulates gene expression
o Viruses: RNA = genetic material (retrovirus = reverse transcription, RNA → DNA)
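The hairpin idea in the RNA notes above (an exact reverse complement within a single strand) can be sketched in Python. This is a simplified check under stated assumptions: only Watson-Crick pairs are considered (G-U wobble pairs are ignored), the stem is taken from the two ends of the sequence, and the function names and the minimum loop size of 3 are hypothetical choices for illustration.

```python
# RNA Watson-Crick pairs (G-U wobble pairs deliberately ignored in this sketch).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(rna))

def can_form_hairpin(rna: str, stem: int, min_loop: int = 3) -> bool:
    """True if the first `stem` bases can pair with the last `stem` bases."""
    if len(rna) < 2 * stem + min_loop:
        return False  # not enough bases left over to form the loop
    return rna[-stem:] == reverse_complement(rna[:stem])

print(can_form_hairpin("GGCAUUUUUGCC", stem=4))  # stem GGCA pairs with UGCC
print(can_form_hairpin("GGCAUUUUAAAA", stem=4))  # ends are not complementary
```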
Nucleotides
o Make up nucleic acids
o Phosphate + pentose sugar +(N-β-glycosidic bond)+ base
o Pentose
Ribose = RNA
Deoxyribose = DNA (C2 no OH)
Linear = aldehyde
Circular = β-furanose
o Bases
Adenine, Guanine = purines = two rings
Cytosine, Thymine/Uracil = pyrimidines = one ring
A-T = two H bonds; G-C = three H bonds
Pentose 5C ring not flat → puckered; 4 structures depending on whether C2’ or C3’ up/down
Lecture 13:
DNA replication occurs during interphase
o M phase is when chromosomes divide
Proteins needed so that nucleotides can be connected to template strand with phosphodiester bonds
o New nucleoside triphosphate added with pyrophosphate (PPi, 2 phosphates) cleaved
DNA polymerase – 5 types in E. coli (2 involved in DNA replication and others in DNA repair)
DNA polymerase I – functions in repair and recombination
o Wraps around DNA in pocket with palm, thumb and fingers
o Adds nucleotides along DNA template strand 5’→3’, processive
o 2nd active site for exonuclease activity
Polymerase repositions the mispaired 3’ terminus (nucleotide) into the 3’→5’ exonuclease site
The newly vacant 3’ terminus repositions back to the polymerase active site, adds correct nucleotide
Excised nucleotide with one phosphate goes back into pool since only triphosphates can be added to strand
DNA polymerase III – main polymerase in DNA replication (we need 2, one for each strand)
o Much bigger complex with more proteins
o 2 β sliding clamps keep DNA associated with polymerase
o Can only operate on 3’ end
o Adds nucleotides onto 3’ end of new strand; strand grows 5’→3’
o Asp on DNA Pol III co-ordinates with 2x Mg2+, which co-ordinate with the phosphate oxygen already on growing strand and the phosphate oxygen on nucleotide to be inserted
o RNA primer needed to give DNA polymerase a double-stranded start on the growing strand of DNA
RNA polymerase (primase) does not need double strand to start synthesis
DNA replication makes 1 mistake every 10^9-10^10 nucleotides (1/1,000-10,000 replications in E. coli)
o Binding of nucleotides and proofreading ensures accuracy
o Incorrect base pairing doesn’t fit in enzyme pocket
Property | DNA Polymerase I | DNA Polymerase III
Subunits | 1 | >10
Mr | 100,000 |
3’→5’ exonuclease (proofreading, removal of nucleotide backwards) | Yes | Yes
5’→3’ exonuclease (i.e. removes in forwards direction) | Yes | No
Initiation of Replication:
Prokaryotes – single replication origin with replication forks in either direction
Eukaryotes – multiple replication bubbles with replication forks in either direction which eventually join → more efficient for more genetic material
DnaA proteins recognise and bind to the origin sequence (oriC, 4x9bp) in E. coli
o Forces DNA into tighter coiling → higher tension (overwinding)
o Active DnaA requires ATP but does not hydrolyse it
o DUE sequence (rich in A=T pairs, 3x13bp) opens/denatures by itself under tension
DnaB protein (helicase) loads onto open DUE sequence and unwinds DNA
o 6 subunits which spin as it unwinds DNA and separates base pairs, using ATP
DnaG protein (DNA primase) synthesises RNA primers
Single-stranded DNA-binding protein (SSB) binds and protects single-stranded DNA
DNA gyrase (DNA topoisomerase II) relieves torsional strain (by forming isomers) generated by DNA unwinding by underwinding DNA just ahead of replication fork
Semiconservative
Leading/Lagging Strand:
Leading strand adds nucleotides continuously towards replication fork
Lagging strand has Okazaki fragments (5’→3’) added away from fork for about 1000 base pairs
o Lagging strand looped around so it can be replicated in correct direction at same time as leading strand
o DNA polymerase III detaches when it reaches next RNA primer (and previously synthesised
fragment) and moves up towards fork to make new fragment
β clamp releases lagging strand
o Clamp loader swivels around and attaches the β clamp and transfers this to DNA Pol III, creating
a new loop which grows as nucleotides added
DNA polymerase I removes RNA primer (since it has exonuclease activity in 5’→3’ direction) and replaces RNA with DNA one nucleotide at a time
DNA ligase forms phosphodiester bond b/w DNA fragments, joining a 3’ end to the adjacent 5’ end
Telomeres:
Primer added to end of linear chromosomes, leaving a gap when removed
Telomerase adds new DNA at the ends so they are not degraded and lost with each replication
o Telomerase is a reverse transcriptase that carries a short stretch of RNA which hybridises to
parent strand and allows its extension
Mismatch Repair:
Specific to E.coli
Error during replication that was not corrected by the polymerase
11x GATC sequence (palindrome on other strand, located in OriC locus) has A methylated by Dam
methylase
o NB: Not the same as chromatin remodelling in eukaryotes – cytosine methylation
o Hemi-methylated state for short period where proteins can distinguish b/w old and new
strand and identify mismatch
Dam methylase adds CH3 to the other (new) strand after a few minutes so that DnaA binding occurs for replication
Nick in new strand near the mismatch; helicase and an exonuclease get rid of short sequence of bases and DNA Pol III fixes region
o Reasonably expensive mechanism, uses ATP to take out bases → important process
o DNA always drawn with 5’ to the top left; promoter is left of (upstream of) the 5’ end of coding strand
RNA peels off as it is made 5’→3’
Different genes may be transcribed from different strands
RNA promoters
o Upstream=negative from RNA start, downstream=positive from RNA start
o Transcription machinery recognises -35 region, -10 region upstream
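The upstream/downstream numbering above can be sketched as a small Python helper. It assumes the usual convention that the first transcribed base is +1 and that there is no position 0; the function name and the example index of the RNA start are hypothetical.

```python
def promoter_coordinate(index: int, tss_index: int) -> int:
    """Convert a 0-based sequence index to a coordinate relative to the RNA start.

    The first transcribed base is +1; there is no position 0, so the base
    immediately upstream of it is -1.
    """
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset

tss = 40  # hypothetical 0-based index of the first transcribed base
for idx in (5, 30, 39, 40, 41):
    print(idx, promoter_coordinate(idx, tss))
# index 5 maps to -35 and index 30 maps to -10, the two recognised regions
```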
Eukaryotes:
RNA polymerase I – transcribes pre-ribosomal RNA which is processed
RNA polymerase II
o 12 subunits
o Conserved structure, function and mechanism similar to E.coli RNA polymerases but more
complex
o Transcribes/expresses almost all genes that encode mRNA to be translated into proteins
o Recognition sequence for RNA polymerase = TATA box (= TATAAAAA = directional)
o Requires transcription factors
RNA polymerase III – tRNAs and splicing RNAs
Transcription Factors:
General transcription factors required for expression of almost all genes
o TBP = TATA Box binding Protein
First protein that recognises promoter sequence
Bends the DNA at the promoter (TATA box)
Easier to open? Or easier for region to be recognised by other proteins?
o TFIIB binds to TBP, recruits RNA Pol II to bind to TATA box
o TFIIH has helicase activity, unwinds DNA at promoter so transcription can start
Not the same helicase (DNA replication) but similar activity in unwinding
TFIIH phosphorylates CTD (C-terminal domain) of RNA Pol II which alters the structure so that RNA
polymerase can transcribe
o CTD domain has repetitive structure allowing it to coil in β spirals, very flexible → different shapes bind to different proteins (like regulation centre for RNA Pol II)
mRNA Processing:
After transcription, pre-mRNA (primary transcript) has to be processed to make it more stable
o Markers regulate RNA’s stability since it is unstable and can be degraded
o Processing occurs while mRNA still being produced
CTD controls initiation, elongation, termination, 5’ cap, spliceosome and polyA tail
5’ cap
o Modified nucleotide (7-methylguanosine) connected in completely different way, signals stability
Two 5’ ends joined together (5’-5’ link) instead of 5’→3’
All 3 phosphates of nucleotide are in between two nucleotides
o Cap synthesising enzyme on CTD
As RNA comes off, cap is put on and tethered to the CTD (controlled by CTD)
Splicing introns
o Eukaryotic genes have introns that need to be spliced to form mature RNA
o Spliceosome (proteins + RNA composition) subunits called snRNPs (U1-6 = forms of snRNP)
o Recognisable sequences in the intron
GU at the start - U1 snRNP base pairs to ψ (modified bases) located after GU
A in the middle
U2 binds to A and sequence around it
Rest of the proteins (U4-6) bind and sequence loops up, bringing A over to GU, which bind, and a cut is made precisely at the LHS (5’) end of the intron
AG at the end, which is then cut and two ends are spliced together with no errors
Lasso of RNA released (lariat structure)
ATP used for this process
o Splicing occurs on CTD: snRNPs bind to CTD
As RNA produced, first intron binds to spliceosome and spliced out during transcription
Poly A tail
o Preserves RNA
o At the 3’ end of eukaryotic mRNAs there is a sequence AAUAAA but transcription continues past it
An endonuclease (attached to the CTD) cleaves the mRNA after its AAUAAA sequence
This sequence recognised by polyadenylate polymerase which adds many A’s after
o Determines how long an mRNA hangs around: long tail = long time
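The poly A steps above (find AAUAAA, cleave after it, add A's) can be sketched in Python. This is only an illustration: the cleavage distance of 15 nucleotides, the tail length of 20 and the function name are hypothetical values chosen to keep the example short, not figures from the lecture.

```python
POLY_A_SIGNAL = "AAUAAA"

def cleave_and_polyadenylate(pre_mrna: str, downstream: int = 15, tail_len: int = 20) -> str:
    """Cut `downstream` nt after the AAUAAA signal, then append a poly(A) tail.

    `downstream` and `tail_len` are assumed illustrative values.
    """
    pos = pre_mrna.find(POLY_A_SIGNAL)
    if pos == -1:
        return pre_mrna  # no signal found: transcript left untouched
    cut = min(pos + len(POLY_A_SIGNAL) + downstream, len(pre_mrna))
    return pre_mrna[:cut] + "A" * tail_len

# Hypothetical transcript: transcription has continued 30 nt past the signal.
transcript = "GGCC" + POLY_A_SIGNAL + "C" * 30
processed = cleave_and_polyadenylate(transcript)
print(len(transcript), len(processed))
```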
Aminoacylation
Aminoacyl-tRNA synthetase covalently attaches amino acid to tRNA using ATP
o Carboxyl end of amino acid has ester linkage with 3’ end of tRNA
o Can have monomeric and dimeric enzyme
o Dimer attaches amino acids to 2 tRNAs at once
o 20 aminoacyl-tRNA synthetases: one for each amino acid and all its correct tRNAs
Amino acid arm with amino acid language, 3’ end
Anticodon arm (with nucleic acid language, 5’ end) binds to codon on mRNA
Mutations:
o Exit site
o Becomes free when outgoing tRNA ejected (translocation)
Ribosomal dissociation occurs once A site encounters a stop codon. Release factor protein binds to the A-site and hydrolysis breaks the bond between the peptide’s carboxyl end and the tRNA → peptide released
o Release factor structurally similar to tRNA so it fits in the A site
Eukaryotes:
Ribosomes in cytoplasm and ER
Signal sequence in mRNA determines whether a ribosome is moved from cytosol to the membrane of
the endoplasmic reticulum
o Protein fed into ER where modifications or secretions occur
o Signal peptide cut off
o E.g. Insulin produced by β cells in pancreas and has signal sequence which determines whether it is secreted into bloodstream
o Proteins transported in vesicles to Golgi – sorting apparatus
Cytoplasm compact with protein translation apparatus
Glucose:
Lactose takes more energy input to break down than glucose (which feeds straight into glycolysis)
cAMP accumulates in low glucose, binds to and activates CAP (a protein that binds upstream of the RNA polymerase binding site in the promoter) → activates lac operon gene expression → positive regulation
Low ATP = high cAMP = active CAP = expression of genes on lac operon
Expression of lac operon never completely off, always a low level of expression
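The glucose logic above can be written as a qualitative Python sketch. The CAP branch follows the notes; the repressor branch is an assumption added for completeness (this section does not describe the Lac repressor), and the function name and the three output labels are hypothetical.

```python
def lac_operon_expression(glucose_present: bool, lactose_present: bool) -> str:
    """Qualitative lac operon output.

    Assumption (not stated in this section): without lactose, the Lac
    repressor sits on the operator and blocks transcription.
    """
    cap_active = not glucose_present       # low glucose -> high cAMP -> active CAP
    repressor_bound = not lactose_present  # assumed negative regulation
    if repressor_bound:
        return "very low"  # never completely off, always a low level
    return "high" if cap_active else "low"

for glucose in (True, False):
    for lactose in (True, False):
        print(glucose, lactose, lac_operon_expression(glucose, lactose))
```

High expression needs both conditions at once: lactose available and glucose scarce.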
Combinatorial control: several important regulators of a single gene, messages integrated for a single
response; sum total of all activators/repressors
Gene activation synergistic → collectively much greater level of transcription
Regulation important in eukaryotes since many different cell types with certain necessary proteins
o Liver cells have all transcription factors required to have increased expression of albumin
o Lens cell has transcription factors for crystallin, which makes the cell transparent
Expression of albumin in the lens would make it opaque → blindness
Transcription factors as regulatory proteins account for 12% of genes
Chromatin can block RNA polymerase access (eukaryotic genes more protective)
Basal transcription is low in eukaryotes (regulation mainly by activators), high in prokaryotes (regulation mainly by repressors)
o Majority of regulation is positive, not negative
o Also more proteins involved in transcriptional regulation
More transcription factors per gene, average of 6 sites
Mediators:
Mediator connects activator (with bound enhancer) and the coding region of a gene
o Histone modification/nucleosome modelling complex of mediator binds to activator
Mediator activates TFIIB and TBP so that other proteins will bind to coding sequence, leading to phosphorylation of the CTD of RNA Pol II and then transcription initiation
DNA Structure:
Major groove more open than minor groove
Base pairs in major groove more exposed which transcription factors recognise and bind to via DNA
binding domain
o Base pairing fit and charge more differentiable/distinct
DNA-binding protein has more circumference within major groove to dip in and reach nucleotides
At least two domains of protein structure; a DNA binding domain and an activation domain: modular
o Zinc finger motif has helix binding into the major groove
Since small, they don’t recognise a big sequence, so multiple zinc fingers improve specificity
Zn2+ co-ordinated with cysteines and/or histidines maintains structure while helix binds in DNA pocket
Can be many different transcription factors that have zinc fingers, since there are not many structures that can dip into the major groove
o Helix loop helix
Works as a dimer
Helix binds into major groove with loop (DNA-binding domain) and then rest of protein may lead into another domain (could interact with mediator)
Both helices in the dimer bind in major groove, likely to be a palindrome in the DNA sequence
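A DNA palindrome, as expected for a dimer binding site, is a sequence that reads the same 5’→3’ on both strands, i.e. it equals its own reverse complement. A minimal Python check follows; the function name is hypothetical, GATC is the Dam methylase site mentioned earlier in these notes, and GATTC is a made-up non-palindromic control.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def is_dna_palindrome(site: str) -> bool:
    """True if the site equals its own reverse complement."""
    return site == "".join(PAIR[b] for b in reversed(site))

print(is_dna_palindrome("GATC"))   # True: both strands read GATC 5'->3'
print(is_dna_palindrome("GATTC"))  # False: the other strand reads GAATC
```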
Post-translational Modifications:
Formation of disulphide bridges for folding + stability
Glycosylation (in ER) assists in folding and stability, important for extracellular functions
Ubiquitination – ubiquitin is covalently (reversibly) added to target proteins to mark them for degradation, or can act as a signal
SUMOylation – addition of SUMO protein changes localisation or binding partners (function)
Phosphorylation
o Addition of phosphate groups (charged) activates/deactivates proteins by changing
conformation
o Induces protein-protein interactions
o May change how it interacts with DNA
o Reversible modification
Palmitoylation – covalent attachment of lipid to a protein, important if protein embeds in membrane,
used in neuronal synapses
DNA Supercoiling:
DNA not a static molecule, it has a tendency to not be in a straight line
Supercoils form to relieve tension, since DNA usually under tension (over or underwound), regulated by the cell
o Next step up from DNA helix
o Coil within a coil
o Most cellular DNA is underwound → makes it easier for the strands to separate e.g. for replication
o During DNA replication, DNA released from histone core of nucleosome → DNA strongly overwound → topoisomerase required to separate strands
Agarose gel running DNA (same weight, same # base pairs) with different numbers of supercoils shows
ladders
o Most compact forms (most supercoils) run the fastest, go to bottom
RNA polymerase puts tension in DNA, tension/supercoils relieved by topoisomerase
o Type I = simple, bacterial, cuts one strand of double-stranded DNA, swivels it around and re-joins
o Type II
Multisubunit enzyme (dimer) binds a segment of a DNA molecule, with one piece to be
cut bound in the C gate
A second segment of the same DNA molecule trapped in the N gate
DNA in C gate is cleaved on both strands to form two 5’-phosphotyrosyl linkages with
amino acids to protect ends
The N gate DNA segment is passed through the break
The broken DNA in C gate is re-ligated and released error-free
More Packing:
Nucleosome + H1 b/w nucleosomes form helices = 30nm fibre (~100 fold compaction)
o Packing level for DNA not being accessed by DNA binding proteins
o 30nm affected by chromatin remodelling complex which activates gene → remodelled nucleosomes (can modify nucleosome separation), histone removal, histone replacement
o 30nm with histone-modifying enzyme → specific pattern of histone modification e.g. methylations, phosphorylations
X structure of chromosome has a chromosomal scaffold even when DNA not attached
Areas of gene activity are not as tightly packed, loops have high level of gene expression
Chromosomes move around in cell depending on how they are being expressed
o Cells have expression and repression domains
DNA Modification:
DNA methylation has no effect on base-pairing
De novo DNA methyltransferases (DNMT) methylate CpG dinucleotide clusters (islands) in promoters
Methylation represses the expression of a promoter
Can be inherited because hemi-methylated sequences are recognised by maintenance DNA
methyltransferases, causing methylation on both sides of double stranded DNA
Genes that make cells grow/divide often repressed by methylation rather than having a protein bound
(can still produce enzyme)
Lecture 20:
Genome sequencing/genomics; haploid human genome ~ 3.2 x 10^9 base pairs
o Measuring human variation
Human copy number variation, some people have extra or less DNA in certain regions → people have different numbers of genes
o Medical genetics – gene data publicly released, genomes compared to identify genetic disorders e.g. Alzheimer disease
o Evolutionary past
o Human migration map
Observing Y-chromosome or mitochondrial genomes (since they change slowly)
Genomes:
Nuclear genome: 25,000 genes, 1.5% coding, lots of introns
Mitochondrial genome
o Dense packing – little space b/w genes
o Not much regulation – expressed all the time
o Variant genetic code
4 of the 64 codons encode a different thing compared to the nuclear genome
Wobble in third base can be A, T, C, G → 4 possible options
Fewer tRNAs, slightly different
o Mitochondrial diseases affect energy, development, vision, cause seizures (CNS issues)
o Circular, own ribosomal RNAs, genes produce proteins for its own function
o Mitochondria require proteins encoded by both the mitochondrial genome and nuclear genome
Defects can be mutations in nucleus or mitochondrial gene (maternal inheritance)
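The "4 of the 64 codons" point can be illustrated with a small Python lookup. The four reassignments listed are those of the vertebrate mitochondrial genetic code; the dictionary layout and the function name `decode` are illustrative assumptions, and only the differing codons are tabulated rather than the full code.

```python
# The four codons that differ in the vertebrate mitochondrial code.
STANDARD = {"UGA": "Stop", "AUA": "Ile", "AGA": "Arg", "AGG": "Arg"}
MITOCHONDRIAL = {"UGA": "Trp", "AUA": "Met", "AGA": "Stop", "AGG": "Stop"}

def decode(codon: str, mitochondrial: bool) -> str:
    """Look up one of the variant codons; other codons are not tabulated here."""
    table = MITOCHONDRIAL if mitochondrial else STANDARD
    return table.get(codon, "same in both codes")

for codon in STANDARD:
    print(codon, decode(codon, mitochondrial=False), "->", decode(codon, mitochondrial=True))
```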
Transposons/mobile elements:
Active piece of DNA that can move from place to place by being cut and inserted into different place
(few can do this)
3 different types make up 45% of human genome, most located in an intron
DNA transposons move using a DNA intermediate
Retrotransposons move using an RNA intermediate
o LINE elements (long)
~6.1kb long
Encode reverse transcriptase so can move autonomously
o SINEs (short)
~350bp, Alu repeats
1.5 million copies in human genome
Only encode a copy of themselves
Do not encode reverse transcriptase and cannot move autonomously unless LINE
nearby with RT which can produce SINE somewhere else
Retrovirus
o Virus injects its RNA genome and proteins into host cell
o Virus’ reverse transcriptase enzyme takes RNA and produces DNA copy, which can join with
DNA of host
o This DNA will be transcribed, producing RNA + genes translated into reverse transcriptase and envelope/coat → new virus particles
Retrotransposons are an evolutionary descendant of viruses, non-infectious (can’t produce coat)
Alzheimer’s Disease:
Before whole genome sequences, when we only had maps: pedigrees of families with early onset Alzheimer collected
Genomic comparison of people on pedigree using SNPs
o Single nucleotide polymorphisms (a natural change, not causing a disease)
o Observe whether SNPs relate to same location as the disease gene
o Get a region of interest, start to isolate DNA out of that and sequence it to understand where
gene is
Identification of the chromosome and chromosomal region
Identification of the genes in the region using the database of human genome
Sequencing and comparison of these genes b/w people with and without early onset Alzheimer
Identification of the presenilin-1 gene
PCR:
Amplifies one piece of DNA
98°C denaturation → single-stranded DNA
58°C allows annealing of DNA primers specific to the gene (~20bp long)
72°C allows Taq DNA polymerase to add nucleotides to 3’ end of each primer (Taq withstands high temperatures, otherwise you have to keep adding DNA polymerase as it is degraded)
One round doubles the amount of DNA → exponential
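The doubling per round can be made concrete with a short Python sketch. It assumes ideal amplification (every template is copied in every cycle); the `efficiency` parameter and the function name are hypothetical additions to show how a less than perfect reaction would change the result.

```python
def pcr_copies(start_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after `cycles` rounds; efficiency 1.0 means perfect doubling."""
    return start_copies * (1 + efficiency) ** cycles

print(pcr_copies(1, 10))  # 1024.0 copies from a single template
print(pcr_copies(1, 30))  # 1073741824.0, about a billion copies
```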
RT-PCR:
Amount of cDNA produced from reverse transcriptase reflects how much a particular gene is expressed
When cDNA put through PCR cycle, amount of signal in exponential phase is relative to the amount of starting cDNA
Sample 1 has gene of highest expression since greater signal within fewer number of cycles
Plateau due to running out of nucleotides and primers
RT-PCR can be stopped at a certain cycle and the product run on a gel for semi-quantitative experiment → gel intensity indicates expression
Fluorescent probe (with sequence specific for DNA of interest) has a fluorophore signal which is
quenched when the probe forms a semi-stable hairpin (reverse complement sequence)
Probe binds preferentially to target DNA → fluorophore is separated from quenching molecule and fluorescence signal increases
As amount of DNA increases, more probes come out of hairpin and bind → more fluorescence
Detector measures amount of light emitted during the reaction and indicates level of expression
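Why "greater signal within fewer cycles" means higher expression can be sketched in Python. The sketch assumes perfect doubling each cycle and a fixed detection threshold; the starting copy numbers, the threshold value and both function names are hypothetical.

```python
def cycles_to_threshold(start_copies: float, threshold: float) -> int:
    """Number of doubling cycles needed for the signal to reach the threshold."""
    copies, cycle = start_copies, 0
    while copies < threshold:
        copies *= 2
        cycle += 1
    return cycle

def fold_difference(cycles_a: int, cycles_b: int) -> float:
    """How many times more starting cDNA sample A had than sample B."""
    return 2.0 ** (cycles_b - cycles_a)

threshold = 1e9                                 # hypothetical detection threshold
sample_1 = cycles_to_threshold(1000, threshold) # more starting cDNA
sample_2 = cycles_to_threshold(125, threshold)  # 8x less starting cDNA
print(sample_1, sample_2, fold_difference(sample_1, sample_2))
```

Each cycle of difference corresponds to a two-fold difference in starting cDNA, so a three-cycle gap recovers the eight-fold difference in the hypothetical inputs.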
Western Blot:
SDS-PAGE gel run
o SDS = detergent unfolds proteins which separate on basis of size
o PAGE = polyacrylamide gel electrophoresis, sandwiched b/w 2 glass plates, very thin and runs
vertically
Smaller segments run further
Transfer proteins onto a polymer sheet, which is exposed to radiolabeled 1° antibodies which select the protein of interest
o Protein bands detected by specific antibodies are exposed to film
o A 2° antibody (that detects 1° antibody) produces luminescence in proportion to the amount of protein
o Transferred to polymer sheet used where exposure to light creates image of antibodies bound
to the blot
RNA-seq database
Whole Transcriptome Shotgun Sequencing: all of the mRNAs in a cell are observed
RNA sample → Reverse Transcriptase → cDNA library → Next Gen sequencing of all cDNAs in transcriptome → RNA-seq data matched to database to identify genes
If cDNA reads are taken, then gene is expressed in the certain tissue sample
Flat signal = no cDNA detected = gene not expressed in sample
Lower quantity of expression = introns
Cyclins:
Undergo a cycle of protein synthesis and degradation (abundance
cycles with cell cycle)
Essential regulators of CDK activity, also regulated themselves
Animals have 10 cyclins
G1/S cyclins – abundant when cell passes start
o Cyclin E + Cyclin E-CDK2
S-cyclins – abundant for long period of time
o Cyclin A + Cyclin A-CDK 2 increases from S-phase and drops mid-mitosis-phase
G2/M cyclins (A) – increases before mitosis then drops during mitosis
o Cyclin B increases from G2 and drops mid-mitosis-phase
o Cyclin B-CDK1 spikes during M phase
Held inactive by phosphorylation of Tyr15 until mid-mitosis spike
o Intact DNA has no checkpoint response, no p21 produced, CDK is active and phosphorylates pRb which now cannot bind to E2F → E2F active, which activates enzymes for DNA synthesis
o Retinoblastoma = rare autosomal disease in which cancer forms in retinal cells → loss of vision
o Rb was the first tumour suppressor gene cloned and is involved in many sporadic tumours
Rb1 negatively regulates cell proliferation, loss of function causes cancer
o Oncogenes: mutation modifies a normal gene (proto-oncogene) and now the protein signals
cell proliferation constitutively
G protein dissociates from receptor and diffuses through membrane to activate the enzyme
β-adrenergic receptor:
Epinephrine binds to specific receptor and sits b/w loops at tops of helices of GPCR
G protein has bound GDP replaced by GTP → activation
G protein diffuses through membrane and activates adenylyl cyclase, which catalyses the formation of
cAMP
cAMP activates downstream signalling molecules e.g. PKA (cyclic AMP dependent protein kinase) which
phosphorylates cellular proteins
GTP is hydrolysed by the protein’s intrinsic GTPase, inactive α subunit reassociated with β,γ subunits
We have amplification: 1 epinephrine molecule → 100,000 glucose molecules released into bloodstream
o Epinephrine ligand affects 10 G-proteins which produce lots of cAMP
o Trans triple auto-phosphorylation of carboxyl-terminal Tyr residues
o Activation loop moved dramatically in tyrosine kinase domain → space for target protein in substrate-binding site
o Tetramer phosphorylates IRS-1 (insulin receptor substrate 1) on its Tyr residues
o SH2 domain of Grb2 binds to phosphorylated Tyr of IRS-1
Sos bridges Grb2 and Ras, causing GDP release and GTP binding to Ras (G-protein)
o Transcription factor activated in cytoplasm, moves to the nucleus through pores → gene expression stimulated
Dominant negative inhibition occurs when one dimer mutated and the tetramer formed has overall no
kinase activation