MG - L8 - Genomics & Proteomics

GENOMICS & PROTEOMICS
OUTLINE

1. Genomics & Proteomics: An overview


2. Structural genomics
3. Functional genomics
4. Proteomics

GENOMICS

• Genome - the complete copy of the genetic information, or one complete set of chromosomes, of an organism.
• Genomics - mapping, sequencing, and analyzing the functions of entire genomes.

GENOMICS

• Structural genomics: the study of genome structure
• Functional genomics: the study of genome function
  • Transcriptome: the complete set of RNA transcripts produced from a genome
  • Proteome: the complete set of proteins encoded by a genome
  • Proteomics: the determination of the structures and functions of all of the proteins in an organism
• Comparative genomics: the study of genome evolution

WHAT IS TRANSCRIPTOMICS?
• The transcriptome is the set of all RNA molecules, including mRNA, rRNA, tRNA, and non-coding RNA, produced in one cell or a population of cells
• Can mean the total set of transcripts in a given organism
OR
• A specific subset of transcripts present in a particular cell type

WHAT IS TRANSCRIPTOMICS?
• Unlike the genome, which is roughly fixed for a given cell line (excluding mutations), the transcriptome can vary with external environmental conditions.
• It reflects the genes that are being actively expressed at any given time, with the exception of mRNA degradation phenomena such as transcriptional attenuation.

WHAT IS TRANSCRIPTOMICS?

• Transcriptomics (expression profiling) examines the expression level of mRNAs in a given cell population, often using high-throughput techniques based on DNA microarray technology.
• Can be used to compare gene expression associated with production traits, e.g.
  • Growth rates
  • Feed conversion rates (FCR)
  • Disease susceptibility
WHAT IS TRANSCRIPTOMICS?

• BUT this is only possible if a large amount of mRNA data is available
• Traditionally this mRNA data was generated using cDNA libraries and EST characterization.
• Problems:
  • Expensive
  • Low throughput
  • Time consuming
  • Low coverage – misses the rare transcripts
  • Requires a large amount of RNA

WHAT IS TRANSCRIPTOMICS?

• NOW next-generation sequencing: high-throughput parallel sequencing technologies can generate millions of short reads from a library of nucleotide sequences (DNA, RNA, or a mixture)

OUTLINE

1. Genomics & Proteomics: An overview


2. Structural genomics
3. Functional genomics
4. Proteomics

GENETIC MAPS
• A genetic (linkage) map gives the approximate location of one gene relative to the locations of other known genes. Unit: centimorgan (cM), also called a map unit
• Recombination frequency between loci is estimated from the progeny of a testcross
  • 50% - loci on different chromosomes or far apart on the same chromosome
  • < 50% - loci close together on the same chromosome (linked)
• Multiple two-point or three-point crosses are needed to construct a genetic map for a whole chromosome
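
The recombination-frequency estimate described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the progeny counts are invented, and the conversion of 1% recombination to 1 cM is the usual short-distance approximation.

```python
def recombination_frequency(parental: int, recombinant: int) -> float:
    """Fraction of testcross progeny that carry a recombinant chromosome."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("no progeny counted")
    return recombinant / total


def map_distance_cm(parental: int, recombinant: int) -> float:
    """Approximate map distance in cM, taking 1 cM as 1% recombination.

    Only reasonable for short distances: the observed frequency cannot
    exceed 50%, so widely separated loci look unlinked.
    """
    return 100.0 * recombination_frequency(parental, recombinant)


# Hypothetical progeny counts from a two-point testcross (AaBb x aabb).
counts = {"AB": 430, "ab": 420, "Ab": 80, "aB": 70}
parental = counts["AB"] + counts["ab"]
recombinant = counts["Ab"] + counts["aB"]

rf = recombination_frequency(parental, recombinant)
print(f"recombination frequency = {rf:.2%}")
print(f"map distance = {map_distance_cm(parental, recombinant):.1f} cM")
print("linked" if rf < 0.5 else "unlinked or far apart")
```

With these made-up counts the two loci would be placed on the same chromosome, a modest distance apart.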
LOCI OF SOME HUMAN GENES
LINKAGE GROUPS
• All genes on one chromosome are called a linkage group
• The farther apart two genes are on a chromosome, the more often crossing over occurs between them
• Linked genes are very close together; crossing over rarely occurs between them
• The probability that a crossover will separate alleles of two genes is proportional to the distance between those genes
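[Figure: Crossing over]
[Figure: Linked gene loci. Locus 1 and Locus 2 lie on the same chromosome. The maternal chromosome carries A and B (gametes AB); the paternal chromosome carries a and b (gametes ab).]

• no Ab or aB gametes
• genes are tightly linked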
TESTCROSSES
• A testcross is a method of determining whether an individual is heterozygous or homozygous dominant
• An individual with unknown genotype is crossed with one that is homozygous recessive: (PP x pp) or (Pp x pp)
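
The two possible outcomes of a testcross can be enumerated with a small Punnett-square sketch. The function names are hypothetical and the model assumes a single locus with complete dominance of P over p.

```python
from collections import Counter
from itertools import product


def cross(parent1: str, parent2: str) -> Counter:
    """Count offspring genotypes for a single-locus cross, e.g. 'Pp' x 'pp'."""
    offspring = Counter()
    for a, b in product(parent1, parent2):
        # Sort so the dominant (upper-case) allele is written first: 'Pp', not 'pP'.
        offspring["".join(sorted(a + b))] += 1
    return offspring


def phenotype_ratio(offspring: Counter, dominant: str = "P") -> dict:
    """Fraction of offspring showing the dominant vs. recessive phenotype."""
    total = sum(offspring.values())
    dom = sum(n for genotype, n in offspring.items() if dominant in genotype)
    return {"dominant": dom / total, "recessive": (total - dom) / total}


for unknown in ("PP", "Pp"):
    result = cross(unknown, "pp")
    print(unknown, "x pp ->", dict(result), phenotype_ratio(result))
```

If any offspring show the recessive phenotype, the unknown parent must have been heterozygous.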
Genotypes:
PP - homozygous for dominant allele P
pp - homozygous for recessive allele p
Pp - heterozygous at the P gene locus

[Figure 5.9: Recombinant gametes arise as a result of crossover during meiosis in the heterozygous parent. Recombinant offspring are fewer than non-recombinant offspring.]
GENETIC MAPS

• Testcross mapping is limited to the available single-locus traits
• Molecular markers and methods (RFLP, microsatellite, SNP, PCR, DNA sequencing) can be used to construct & refine genetic maps
• Limitations:
  • Lower resolution or detail
  • Do not always correspond to physical distances between loci

PHYSICAL MAPS

• A physical map locates genes by distances measured in base pairs (bp), kilobases (kb, 1,000 bp), or megabases (Mb, 1 million bp)
• Physical maps can be created by restriction mapping
• 1 cM ≈ 1 Mb of DNA (a rough average for the human genome)

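
Restriction mapping rests on locating enzyme recognition sites and measuring the fragments between cuts. The sketch below illustrates this for a linear molecule and a complete digest; the 40 bp sequence is made up, and the helper names are hypothetical. EcoRI (recognition site GAATTC, cutting after the first G) is used as the example enzyme.

```python
def restriction_sites(sequence: str, site: str) -> list[int]:
    """Return 0-based start positions of every occurrence of `site`."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def fragment_lengths(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Lengths (bp) of fragments from a complete digest of a linear molecule.

    `cut_offset` is where the enzyme cuts within its recognition site
    (1 for EcoRI, which cuts G^AATTC on the top strand).
    """
    cuts = [p + cut_offset for p in restriction_sites(sequence, site)]
    edges = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(edges, edges[1:])]


# Made-up 40 bp sequence containing two EcoRI sites.
dna = "ATGCGAATTCTTAGGCCATAGGAATTCGGCTAACGTTAGC"
print("site positions:", restriction_sites(dna, "GAATTC"))
print("fragment sizes:", fragment_lengths(dna, "GAATTC", cut_offset=1))
```

Comparing fragment sizes from digests with different enzymes, alone and in combination, is what lets the sites be ordered along the DNA.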
SEQUENCES OF GENES & CHROMOSOMES

• Why sequence DNA?
  • Understand how biological processes work
  • Find changes to explain genetic disorders

A QUICK HISTORY OF SEQUENCING
1953 – Structure of DNA solved
1977 – Sanger sequencing invented
1977 – First genome sequenced: phage ΦX174 (5 kb)
1986 – First automated sequencing machine
1990 – Human Genome Project started

A QUICK HISTORY OF SEQUENCING
1995 – First bacterial genome: Haemophilus influenzae (1.8 Mb)
1998 – First animal genome: Caenorhabditis elegans (97 Mb)
2003 – Completion of the Human Genome Project: Homo sapiens (3 Gb), 13 years, $2.7 billion
2005 – First ‘next-generation’ sequencing instrument
2013 – > 10,000 genome sequences in the NCBI database
GENERATION OF SEQUENCING TECHNOLOGIES
• First generation: Sanger & Maxam-Gilbert techniques
• Next (second) generation: Roche 454, Illumina (Solexa), ABI SOLiD
• Third generation: Pacific Biosciences, Helicos
• Other/4th generation: Oxford Nanopore, Polonator, Ion Torrent (?)

SANGER SEQUENCING: DYE-TERMINATOR SEQUENCING
[Figure: dye-terminator Sanger sequencing]

CHARACTERISTICS OF NGS
• There are four main sequencing methods
  • Pyrosequencing (454)
  • Reversible terminator sequencing (Illumina)
  • Sequencing by ligation (SOLiD)
  • Semiconductor sequencing (Ion Torrent)

• They differ from each other in terms of:
  • Read length
  • Data produced
  • Data quality
  • Bioinformatics required
CHARACTERISTICS OF NGS

• NGS platforms generate millions of reads and billions of base calls each run
• NGS reads are typically short (<400 bp)
• Dramatic reduction in cost of sequencing
  • GS-FLX provides a > 100x decrease in costs compared to Sanger sequencing
  • HiSeq and SOLiD provide a > 100x decrease in costs over GS-FLX

COMPARISON OF NGS

BIOINFORMATICS
• Bioinformatics = Molecular biology + Computer science
• Developing databases, computer-based algorithms, gene-prediction software, and analytical tools to “mine the data” from sequencing projects
• Assembly – align multiple overlapping sequencing reads with one another to reconstruct a long DNA fragment
• Annotation – link sequence information to its function & expression, based on similar genes in other species
• BLAST (Basic Local Alignment Search Tool)
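
The assembly idea above, merging reads wherever the end of one matches the start of another, can be illustrated with a toy greedy assembler. This is a sketch under simplifying assumptions (error-free reads, a single strand, no repeats); real assemblers use more elaborate graph-based methods, and the reads below are invented.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no remaining overlaps; leave separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Toy reads sampled from the made-up sequence ATGGCGTACGTTAGC.
reads = ["ATGGCGTA", "GCGTACGT", "TACGTTAGC"]
print(greedy_assemble(reads))
```

The `min_len` threshold is an assumed parameter: it stops short chance matches of one or two bases from being treated as genuine overlaps.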
[Figure: Basic scheme of NGS]
[Figure: Diagram of de novo sequence assembly]
APPLICATIONS OF NGS
• Whole genome sequencing
  • De novo assembly
  • Re-sequencing
  • Comparative genomics
• Targeted gene sequencing

APPLICATIONS OF NGS
• RNA-seq
  • Gene expression
  • Transcriptome assembly

APPLICATIONS OF NGS IN ANIMAL BIOTECHNOLOGY

APPLICATIONS OF NGS IN HUMAN HEALTH
• Cancer research
• Genetic disorders
• Personalized medicine
• Human microbiome
• Pre- & post-natal diagnosis
• Infectious diseases
OUTLINE

1. Genomics & Proteomics: An overview


2. Structural genomics
3. Functional genomics
4. Proteomics

EXPRESSED-SEQUENCE TAGS
• ESTs are markers associated with DNA sequences that are expressed as RNA
• Workflow: isolating RNA from cells → reverse transcription → set of cDNA fragments → sequencing → tag as a marker to find active genes
EXPRESSED SEQUENCES
• In eukaryotic genomes only a small proportion of the DNA encodes protein – about 1% in humans
• Analysing cDNA (DNA complementary to RNA) or ESTs focuses on the protein-coding content of genomes
• Gene expression studies monitor changes in total genome expression over time
  • During development
  • In response to changes in the environment

DOT BLOT OR ARRAY HYBRIDIZATION ANALYSIS OF GENE EXPRESSION

• Gene-specific nucleotide probes are applied to a membrane in a specific pattern.
• Labeled (fluorescent or radioactive) cDNA preparations are hybridized with the probes on the membrane.

MICROARRAYS (GENE CHIPS)
• A microarray contains thousands of hybridization probes on a single membrane or silicon wafer to simultaneously detect the expression of many genes
  • e.g. a single chip covering 23,000 human genes
• Microarrays are produced in several ways
  • Microsynthesis of oligonucleotides in situ
  • Spotting prefabricated oligonucleotides on solid supports
  • Spotting DNA fragments or cDNAs on solid supports
• Probes on microarrays are hybridized to fluorescent cDNA samples

GENE CHIPS
[Figure: Typical dual-colour microarray experiment]

Interpretation

• RED: more cDNA from disease cells hybridizes to the DNA probes – overexpression of the genes in disease cells
• GREEN: more cDNA from normal tissue hybridizes to the DNA probes – underexpression of the genes in disease cells
• YELLOW: both cDNAs hybridize equally to the DNA probes – equal expression in both types of cells
• NO COLOR: neither cDNA from the control nor cDNA from the disease cells hybridizes to the DNA probes – no expression
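
The colour rules above amount to comparing the red (disease) and green (normal) signal at each spot. A minimal sketch follows, assuming a hypothetical background cut-off of 100 intensity units and a two-fold change threshold; both numbers are illustrative choices, not values from the slides.

```python
import math


def classify_spot(red: float, green: float,
                  background: float = 100.0, fold: float = 2.0) -> str:
    """Classify one spot from its red (disease) and green (normal) intensities."""
    if red <= background and green <= background:
        return "no colour: not expressed in either sample"
    # Log2 ratio of disease to normal signal; the floor of 1.0 avoids log(0).
    log_ratio = math.log2(max(red, 1.0) / max(green, 1.0))
    threshold = math.log2(fold)
    if log_ratio >= threshold:
        return "red: overexpressed in disease cells"
    if log_ratio <= -threshold:
        return "green: underexpressed in disease cells"
    return "yellow: similar expression in both"


# Hypothetical (red, green) intensity pairs for four spots.
for red, green in [(5000, 500), (500, 5000), (2000, 2100), (50, 60)]:
    print(red, green, "->", classify_spot(red, green))
```

Working with the log ratio treats a two-fold increase and a two-fold decrease symmetrically, which is why it is a common way to summarise dual-colour data.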
RNA-seq
• Now have the technology to sequence the
mRNA (via cDNA) content of a
cell/organism
• Use Next Generation Sequencing
• Several types of NGS available
RNA-seq generates large numbers of short sequence reads

mRNA (poly-A tailed) → small cDNAs (75 – 100 bp) → short overlapping sequence reads

Computationally intensive algorithms:
• align overlapping sequence reads with statistical certainty (it helps to have a genome sequence as a scaffold)
• calculate the frequency with which reads appear for each full-length transcript sequence
• determine the proportional representation of each transcript in the original RNA sample
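
The last two steps, counting reads per transcript and converting the counts to proportions, can be sketched as follows. The slides do not name a specific measure; transcripts per million (TPM) is used here as one common choice, and the gene names, counts, and lengths are invented.

```python
def transcripts_per_million(read_counts: dict[str, int],
                            lengths_bp: dict[str, int]) -> dict[str, float]:
    """Length-normalise read counts and scale so the values sum to 1e6."""
    # Reads per kilobase: longer transcripts yield more fragments, hence more reads.
    rate = {t: read_counts[t] / (lengths_bp[t] / 1000) for t in read_counts}
    total = sum(rate.values())
    if total == 0:
        raise ValueError("no reads mapped")
    return {t: r / total * 1_000_000 for t, r in rate.items()}


# Hypothetical mapped-read counts and transcript lengths.
read_counts = {"geneA": 1000, "geneB": 1000, "geneC": 500}
lengths_bp = {"geneA": 2000, "geneB": 1000, "geneC": 500}

for gene, value in transcripts_per_million(read_counts, lengths_bp).items():
    print(f"{gene}: {value:,.0f} TPM")
```

Note that geneA and geneB have the same raw count, but geneA is twice as long, so its estimated share of the transcript pool is half that of geneB.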
Advantages of RNA-seq

Technology | Microarray | cDNA/EST sequencing | RNA-seq

Technology specification
Principle | Hybridisation | Sanger sequencing | High-throughput sequencing
Resolution | Up to 100 bp | Single base | Single base
Throughput | High | Low | High
Reliance on genomic sequence | Yes | No | In some cases
Background noise | High | Low | Low

Application
Simultaneously map transcribed regions and gene expression | Yes | Limited for gene expression | Yes
Dynamic range to quantify gene expression level | Up to a few 100-fold | Not practical | > 8000-fold
Ability to distinguish gene isoforms | Limited | Yes | Yes
Ability to distinguish allelic expression | Limited | Yes | Yes

Practical issues
Required amount of RNA | High | High | Low
Cost of mapping transcriptome of large genomes | High | High | Low
Pathway Analysis

Muñoz Garcia et al. (2018) Pathway analysis of transcriptomic data shows immunometabolic effects of vitamin D. J. Mol. Endocrinology 60, pp. 95-108.
Issues with RNA-seq technologies
• Involves amplification, usually by PCR, which can sometimes introduce sequence amplification bias
• Some transcripts reverse transcribe less efficiently
• Short reads sometimes make RNA processing (e.g. alternative splicing) difficult to ascertain

It would be better to directly sequence full-length RNAs without amplification
Direct nanopore sequencing of long RNAs
The future of RNA-seq

Direct RNA-sequencing

Single-cell transcriptomics

Parker et al. (2020) Nanopore direct RNA sequencing maps the complexity of Arabidopsis mRNA processing and m6A modification. eLife 9:e49658.
OUTLINE

1. Genomics & Proteomics: An overview


2. Structural genomics
3. Functional genomics
4. Proteomics

GOALS OF PROTEOMICS
• An mRNA will produce relatively little protein if it is short-lived or poorly translated
• Many proteins are post-translationally modified in ways that affect their activities, and transcription profiling gives no data regarding this level of regulation.
• The goal of proteomics is the identification of the full set of proteins produced by a cell or tissue under a particular set of conditions: their relative abundance, their modifications, and their interacting partner proteins.

WHAT DO WE WANT TO KNOW?
• What proteins are there?
• How much protein is present?
• Does the level change under certain conditions?
• Is the protein active?
• What does the protein do?
• What other proteins does it interact with?
• Is the protein modified? Under what circumstances? What are the consequences of modification?
2D-PAGE
Key principles of 2D-PAGE (Two-Dimensional Polyacrylamide Gel Electrophoresis)
• Proteins differ from each other in terms of their mass and charge
• Both these properties can be used to separate proteins by gel electrophoresis
• The successive application of both techniques in perpendicular directions (two dimensions) provides maximum separation and allows thousands of proteins to be resolved
2D-PAGE
Key principles of 2D-PAGE (Two-Dimensional Polyacrylamide Gel Electrophoresis)
• Staining the gel reveals the positions of individual proteins as spots or smudges. These can be picked and analysed by mass spectrometry.
• There are tens of thousands of proteins in the cell, differing in abundance over six orders of magnitude. 2D-PAGE is not sensitive enough to detect the rare proteins, and many proteins will not be resolved. Splitting the sample into different fractions is often necessary to reduce the complexity of protein mixtures prior to 2D-PAGE.
WHY SDS?
• All proteins have a different charge
  • Determined by the overall charge of all their amino acids
• SDS is a negatively (–ve) charged detergent
  • Coats proteins, denatures them, and gives them a uniform negative charge
  • After coating, charge depends on molecular weight
• A reducing agent is usually used as well (e.g. beta-mercaptoethanol or DTT)
  • Breaks disulphide bonds
• This is added before the protein is loaded onto the gel
[Figure: protein before SDS and after SDS]
HOW DOES 2D-PAGE WORK?

Two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) is used to separate mixtures of proteins, and is particularly useful for comparing related samples, such as healthy and diseased tissue. The first step is to load the proteins onto the gel, which has a pH gradient from top to bottom.

HOW DOES 2D-PAGE WORK?

When a voltage is applied across the gel, the proteins move through the gel until they reach the point at which the surrounding pH equals their isoelectric point (pI), where they carry no net charge. This separation is called isoelectric focusing.

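Where a protein stops during isoelectric focusing can be estimated from its sequence by finding the pH at which its net charge is zero. The sketch below is a rough illustration only: the pKa values are assumed approximate textbook-style figures (published tables differ), and the model ignores effects of protein folding and modifications.

```python
# Approximate pKa values (assumed for illustration; published tables differ).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5, "N_TERM": 8.6}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1, "C_TERM": 3.6}


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    charge = 0.0
    for group, pka in POSITIVE_PKA.items():
        n = 1 if group == "N_TERM" else sequence.count(group)
        charge += n / (1 + 10 ** (ph - pka))
    for group, pka in NEGATIVE_PKA.items():
        n = 1 if group == "C_TERM" else sequence.count(group)
        charge -= n / (1 + 10 ** (pka - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Find the pH where net charge is zero by bisection over pH 0-14."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid   # still positive: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


peptide = "MEHTTSRYLLDEDDKIAQNFLLEWA"
print(f"estimated pI = {isoelectric_point(peptide):.2f}")
```

Bisection works here because net charge only ever decreases as pH rises, so there is a single crossing point between pH 0 and pH 14.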
HOW DOES 2D-PAGE WORK?

Having separated the proteins by charge, the second step is to separate them according to their mass in the perpendicular dimension. Individual proteins can then be isolated and identified by mass spectrometry. Comparing the two gels can reveal proteins that are expressed at different levels. For example, the red protein is more abundant in the diseased tissue, and could be a useful drug target.
MASS SPECTROMETRY

BASIC PRINCIPLES OF MASS SPECTROMETRY
• Mixture of proteins is ionised by a laser pulse
• Ions are accelerated by an electrostatic field
• Separated by time of flight (determined by mass/charge ratio)
• Detected as a TOF spectrum
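
The dependence of flight time on mass/charge ratio follows from energy conservation: an ion of charge z accelerated through a voltage V gains kinetic energy zeV, so (1/2)mv² = zeV and t = L·sqrt(m / (2zeV)) for a drift tube of length L. The sketch below applies this; the 20 kV voltage and 1 m tube length are assumed illustrative values, not figures from the slides.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
DALTON = 1.66053906660e-27   # unified atomic mass unit, kg


def flight_time(mass_da: float, charge: int,
                voltage: float = 20_000.0, tube_m: float = 1.0) -> float:
    """Time (s) for an ion to cross a field-free drift tube.

    The ion gains kinetic energy z*e*V in the accelerating field, so
    (1/2) m v^2 = z e V and t = L * sqrt(m / (2 z e V)).
    """
    mass_kg = mass_da * DALTON
    velocity = math.sqrt(2 * charge * E_CHARGE * voltage / mass_kg)
    return tube_m / velocity


# Two hypothetical singly charged peptides.
for mass in (1000.0, 2000.0):
    print(f"{mass:.0f} Da, z=1: {flight_time(mass, 1) * 1e6:.2f} microseconds")
```

Heavier ions arrive later, and ions with the same mass/charge ratio arrive together, which is why the spectrum is plotted against m/z rather than mass.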
PEPTIDE MASS FINGERPRINT IDENTIFICATION
• Why trypsin?
  • It cleaves on the C-terminal side of lysine (K) and arginine (R) residues
  • Therefore a given protein sequence should always produce the same peptides
  • Example: MEHTTSRYLLDEDDKIAQNFLLEWA → MEHTTSR + YLLDEDDK + IAQNFLLEWA
• Peptides produced can be compared to a database (obtained from whole genome sequencing)
• Identification of the protein!
[Figure: mass spectrum, abundance vs. m/z]
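
The predictable cleavage by trypsin is what makes a peptide mass fingerprint computable from a sequence database. A minimal in-silico digest of the example sequence is sketched below. It is simplified: it cleaves after every K or R, ignoring missed cleavages and the reduced cleavage before proline, and it uses standard monoisotopic residue masses rounded to five decimals.

```python
import re

# Monoisotopic residue masses in Da (rounded).
MONO_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(protein: str) -> list[str]:
    """Cleave after every K or R (simplified model of trypsin)."""
    return re.findall(r"[^KR]*[KR]|[^KR]+", protein)


def peptide_mass(peptide: str) -> float:
    """Monoisotopic mass of an unmodified, uncharged peptide."""
    return sum(MONO_MASS[aa] for aa in peptide) + WATER


protein = "MEHTTSRYLLDEDDKIAQNFLLEWA"
for pep in tryptic_digest(protein):
    print(f"{pep:12s} {peptide_mass(pep):10.4f} Da")
```

Matching the list of observed peak masses against such predicted lists for every protein in a database is the core of the identification step.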
BACKGROUND READING

Pierce, B.A., 2012. Genetics: A Conceptual Approach, 4th Ed., Chapter 20, pp. 557-590
