MG - L8 - Genomics & Proteomics
PROTEOMICS
OUTLINE
GENOMICS
WHAT IS TRANSCRIPTOMICS?
The transcriptome is the set of all RNA molecules,
including mRNA, rRNA, tRNA, and non-coding RNA
produced in one or a population of cells
Can mean the total set of transcripts in a given organism
OR
A specific subset of transcripts present in a particular cell
type
Unlike the genome, which is roughly fixed for a given
cell line (excluding mutations), the transcriptome can
vary with external environmental conditions.
GENETIC MAPS
A genetic (linkage) map gives the approximate location of
one gene relative to the locations of other known genes.
Unit: centimorgan (cM), also called a map unit
Recombination frequency between loci is estimated from
the progeny of a testcross
50% - loci on different chromosomes or far apart on the
same chromosome
< 50% - loci close together on the same chromosome
[Figure: chromosome pair carrying alleles A B and a b (paternal chromosome: a b)]
• no Ab or aB gametes
• genes are tightly linked
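The recombination frequency estimate described above can be sketched in a few lines. This is a minimal illustration, not a full mapping method: the function names and the offspring counts are hypothetical, and treating 1% recombination as 1 cM is an approximation that only holds for closely linked loci.

```python
def recombination_frequency(parental: int, recombinant: int) -> float:
    """Fraction of testcross progeny that are recombinant (0.0 to 1.0)."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("no progeny counted")
    return recombinant / total


def map_distance_cm(parental: int, recombinant: int) -> float:
    """Approximate map distance in cM, taking 1% recombination as 1 cM."""
    return 100.0 * recombination_frequency(parental, recombinant)


# Hypothetical testcross: 820 parental-type and 180 recombinant offspring
rf = recombination_frequency(parental=820, recombinant=180)
print(f"recombination frequency = {rf:.2f}")
print(f"approximate map distance = {map_distance_cm(820, 180):.0f} cM")
```

A frequency close to 0.5 would indicate unlinked loci (or loci far apart on the same chromosome), in which case the cM conversion is not meaningful.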
TESTCROSSES
A testcross is a method of determining if an
individual is heterozygous or homozygous
dominant
An individual with unknown genotype is crossed
with one that is homozygous recessive (PP x pp)
or (Pp x pp)
Genotypes:
PP - homozygous for dominant allele P
pp - homozygous for recessive allele p
Pp - heterozygous at the P gene locus
[Figure: phenotype of each genotype]
Recombinant gametes arise from crossing over during
meiosis in the heterozygous parent
Recombinant offspring are fewer than non-recombinant
offspring
Figure 5.9
PHYSICAL MAPS
SEQUENCES OF GENES &
CHROMOSOMES
A QUICK HISTORY OF
SEQUENCING
1953 – Structure of DNA solved
1977 – Sanger sequencing invented
– First genome sequenced - Phage Φ-X174 (5 kb)
1986 – First automated sequencing machine
1990 – Human genome project started
1995 – First bacterial genome
Haemophilus influenzae (1.8 Mb)
1998 – First animal genome
Caenorhabditis elegans (97 Mb)
2003 – Completion of Human genome project
Homo sapiens (3 Gb), 13 years, $2.7 billion
2005 – First ‘next-generation’ sequencing
instrument
2013 – > 10,000 genome sequences in NCBI
database
GENERATION OF SEQUENCING
TECHNOLOGIES
First generation: Sanger & Maxam-Gilbert
techniques
Next (second) generation: Roche 454, Illumina
Solexa, ABI SOLiD
Third generation: Pacific Biosciences, Helicos
Other/4th generation: Oxford Nanopore, Polonator
Ion Torrent??
SANGER SEQUENCING: DYE-
TERMINATOR SEQUENCING
CHARACTERISTICS OF NGS
There are four main sequencing methods
Pyrosequencing (454)
Reversible terminator sequencing (Illumina)
Sequencing by ligation (SOLiD)
Semiconductor sequencing (Ion Torrent)
COMPARISON OF NGS
BIOINFORMATICS
Bioinformatics = Molecular biology + Computer science
Developing databases, computer-based algorithms,
gene-prediction software, analytical tools to “mine the
data” from sequencing projects
Assembly – align overlapping sequencing reads with one
another to reconstruct a longer DNA fragment
Annotation – link sequence information to function &
expression, based on similar genes in other species
BLAST (Basic Local Alignment Search Tool)
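The assembly step above (overlapping reads merged into a longer fragment) can be illustrated with a toy greedy overlap assembler. This is a sketch under simplifying assumptions: reads are error-free, all come from the same strand, and the reads and function names are hypothetical. Real assemblers use far more elaborate graph-based methods.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


# Three hypothetical overlapping reads, given out of order
reads = ["ATTCGGACTA", "ATGGCGTGCA", "GTGCAATTCG"]
print(greedy_assemble(reads))
```

Greedy merging is easy to follow but can misassemble repeated sequences, which is one reason read length matters for de novo assembly.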
Basic scheme of NGS
Diagram of de novo sequence assembly
APPLICATIONS OF NGS
Whole genome sequencing
• De novo assembly
• Re-sequencing
• Comparative genomics
RNA-seq
Gene expression
Transcriptome assembly
APPLICATIONS OF NGS IN
ANIMAL BIOTECHNOLOGY
APPLICATIONS OF NGS IN
HUMAN HEALTH
Cancer research
Genetic disorders
Personalized medicine
Human microbiome
Pre- & post-natal diagnosis
Infectious diseases
EXPRESSED-SEQUENCE TAGS
ESTs are markers associated with DNA
sequences that are expressed as RNA
Reverse transcription -> set of cDNA fragments ->
sequencing -> tag used as a marker to find active genes
EXPRESSED SEQUENCES
In eukaryotic genomes only a small proportion of the
DNA encodes protein – about 1% in humans
Analysing cDNA (DNA complementary to RNA) or
ESTs focuses on the protein-coding content of genomes
Studying gene expression monitors changes in total
genome expression over time
During development
In response to changes in the environment
DOT BLOT OR ARRAY HYBRIDIZATION
ANALYSIS OF GENE EXPRESSION
Gene-specific
nucleotide probes are
applied to a membrane
in a specific pattern.
Labeled (fluorescent or
radioactive) cDNA
preparations are
hybridized with the
probes on the
membrane.
MICROARRAYS (GENE CHIPS)
A microarray contains thousands of hybridization
probes on a single membrane or silicon wafer to
simultaneously detect the expression of many genes
Example: a chip carrying probes for 23,000 human genes
Microarrays are produced in several ways
Microsynthesis of oligonucleotides in situ
Spotting prefabricated oligonucleotides on solid supports
Spotting DNA fragments or cDNAs on solid supports
Probes on microarrays are hybridized to fluorescent
cDNA samples
GENE CHIPS
Typical dual-colour
microarray experiment
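In a dual-colour experiment, each spot is usually summarised as the log2 ratio of the two fluorescent signals. The sketch below assumes background-corrected intensities are already available. The gene names, intensity values and the two-fold cut-off are hypothetical choices for illustration, not part of the original slide.

```python
import math


def log2_ratio(test_signal: float, reference_signal: float) -> float:
    """log2(test / reference) for one spot; both intensities must be > 0."""
    if test_signal <= 0 or reference_signal <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(test_signal / reference_signal)


def classify_spot(test_signal: float, reference_signal: float,
                  threshold: float = 1.0) -> str:
    """Label a spot using a two-fold cut-off (|log2 ratio| >= 1) by default."""
    ratio = log2_ratio(test_signal, reference_signal)
    if ratio >= threshold:
        return "up"
    if ratio <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical (test, reference) intensities for three genes
spots = {"geneA": (800.0, 200.0), "geneB": (150.0, 600.0), "geneC": (500.0, 520.0)}
for gene, (test, ref) in spots.items():
    print(gene, round(log2_ratio(test, ref), 2), classify_spot(test, ref))
```

Using a log ratio makes a two-fold increase (+1) and a two-fold decrease (-1) symmetric around zero, which is why it is preferred over the raw ratio.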
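[Figure: mRNA with poly(A) tail represented by short overlapping sequence reads, followed by interpretation]
Comparison of transcriptomics technologies
Columns, in order (labels inferred from context): microarray | cDNA or EST sequencing | RNA-seq
Application
Simultaneously map transcribed regions and gene expression: Yes | Limited for gene expression | Yes
Dynamic range to quantify gene expression level: Up to a few 100 fold | Not practical | > 8000 fold
Practical Issues
Required amount of RNA: High | High | Low
Cost of mapping transcriptome of large genomes: High | High | Low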
Pathway Analysis
Muñoz Garcia et al. (2018) Pathway analysis of transcriptomic data shows immunometabolic effects of vitamin D. J. Mol.
Endocrinology 60 pp. 95-108
Issues with RNA-seq technologies
• Involves amplification, usually by PCR, which can
introduce sequence amplification bias
• Some transcripts are reverse transcribed less efficiently
• Short reads can make RNA processing (e.g. alternative
splicing) difficult to determine
Parker et al. (2020) Nanopore direct RNA sequencing maps the complexity of Arabidopsis mRNA
processing and m6A modification. eLife 9:e49658.
GOALS OF PROTEOMICS
An mRNA will produce relatively little protein if it
is short-lived or poorly translated
Many proteins are post-translationally modified in
ways that affect their activities, and transcription
profiling gives no data regarding this level of
regulation.
The goal of proteomics is the identification of the full
set of proteins produced by a cell or tissue under a
particular set of conditions: their relative abundance,
their modifications, their interacting partner proteins.
WHAT DO WE WANT TO
KNOW?
What proteins are there?
How much protein is present?
Does the level change under certain conditions?
Is the protein active?
What does the protein do?
What other proteins does it interact with?
Is the protein modified? Under what circumstances? What
are the consequences of modification?
2D-PAGE
Key principles of 2D-PAGE (Two-Dimensional
Polyacrylamide Gel Electrophoresis)
Proteins differ from each other in terms of their mass
and charge
Both these properties can be used to separate
proteins by gel electrophoresis
The successive application of both techniques in
perpendicular directions (two dimensions) provides
maximum separation and allows thousands of
proteins to be resolved
Staining the gel reveals the positions of individual
proteins as spots or smudges. These can be picked
and analysed by mass spectrometry.
There are tens of thousands of proteins in the cell,
differing in abundance over six orders of magnitude.
2D-PAGE is not sensitive enough to detect the rare
proteins and many proteins will not be resolved.
Splitting the sample into different fractions is often
necessary to reduce the complexity of protein
mixtures prior to 2D-PAGE.
WHY SDS?
Before SDS treatment, all proteins have a different charge
Determined by the combined charge of all their amino
acids
SDS is a negatively (–ve) charged detergent
Coats proteins, denatures them, gives them
a uniform negative charge
With SDS bound, charge is dependent on molecular weight
HOW DOES 2D-PAGE WORK?
When a voltage is applied across a gel containing a pH gradient, the proteins move
through the gel until they reach the pH at which their net charge is zero (their
isoelectric point, pI). This separation is called isoelectric focusing
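The idea behind isoelectric focusing can be made concrete by estimating the pH at which a peptide carries no net charge. The sketch below is a rough illustration only: the pKa values are approximate figures assumed for this example (published tables differ slightly), the function names are hypothetical, and the calculation ignores folding and post-translational modifications.

```python
# Approximate pKa values, assumed for illustration only
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(sequence: str, ph: float) -> float:
    """Estimated net charge of an unfolded peptide at a given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))    # N-terminus (+)
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))   # C-terminus (-)
    for residue in sequence.upper():
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 0.001) -> float:
    """Find the pH where net charge is zero by bisection over pH 0 to 14."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid    # still positive, so the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2.0


for peptide in ("KKKK", "DDDD", "MEHTTSR"):
    print(peptide, round(isoelectric_point(peptide), 2))
```

Bisection works here because the estimated net charge only decreases as pH rises, so there is a single crossing point.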
BASIC PRINCIPLES OF MASS
SPECTROMETRY
Mixture of proteins is ionised by
laser pulse
Accelerated by electrostatic field
Separated by time of flight
(determined by mass/charge ratio)
Detected as a TOF spectrum
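The link between time of flight and mass/charge ratio follows from energy conservation: an ion of charge z accelerated through a voltage V gains kinetic energy zeV, so heavier ions travel more slowly down the flight tube. The sketch below assumes an idealised linear instrument. The voltage, tube length and function name are hypothetical values chosen for illustration.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms


def flight_time(mass_da: float, charge: int = 1,
                voltage: float = 20_000.0, tube_length: float = 1.0) -> float:
    """Idealised time (s) for an ion to drift down a field-free tube."""
    mass_kg = mass_da * DALTON
    energy = charge * ELEMENTARY_CHARGE * voltage   # kinetic energy gained (J)
    velocity = math.sqrt(2.0 * energy / mass_kg)    # from E = 1/2 m v^2
    return tube_length / velocity


# Flight time scales with the square root of m/z
for mass in (1000.0, 4000.0):
    print(f"{mass:.0f} Da: {flight_time(mass) * 1e6:.1f} microseconds")
```

Because the time depends on the square root of m/z, a four-fold heavier ion with the same charge takes twice as long to reach the detector.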
PEPTIDE MASS FINGERPRINT
IDENTIFICATION
Why trypsin?
It cleaves on the C-terminal side of lysine +
arginine residues
Therefore a given protein sequence
should always produce the same
peptides
Example: MEHTTSRYLLDEDDKIAQNFLLEWA gives
MEHTTSR, YLLDEDDK and IAQNFLLEWA
Peptides produced can be compared to a
database (obtained from whole genome
sequencing)
[Figure: mass spectrum, abundance against m/z]
Identification of protein!
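The trypsin example above can be reproduced with a short in silico digest. This is a simplified sketch: it cleaves after every lysine (K) and arginine (R) and ignores missed cleavages and the usual exception before proline. The function names are hypothetical, and the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in daltons
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def trypsin_digest(protein: str) -> list[str]:
    """Cleave after every K or R (simplified: ignores the proline rule)."""
    peptides, current = [], ""
    for residue in protein.upper():
        current += residue
        if residue in "KR":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)  # C-terminal peptide
    return peptides


def peptide_mass(peptide: str) -> float:
    """Monoisotopic mass of a neutral, unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide.upper()) + WATER


for peptide in trypsin_digest("MEHTTSRYLLDEDDKIAQNFLLEWA"):
    print(peptide, round(peptide_mass(peptide), 3))
```

The resulting list of masses is the theoretical fingerprint. In practice, observed peaks are matched against such lists, within a mass tolerance, for every protein in the database.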
BACKGROUND READING