Algorithms in Bioinformatics: A Practical Introduction: Introduction To Molecular Biology
Outline
Cell
DNA, RNA, Protein
Genome, Chromosome, and Gene
Central Dogma (from DNA to Protein)
Mutation
List of biotechnology tools
Brief History of Bioinformatics
Our body
Cell
Actors:
Protein
Amino acid
[Figure: structure of an amino acid. An amino group (NH2), a carboxyl group (COOH), and an R group are attached to the central carbon (C).]
Arginine (Arg, R)
Histidine (His, H)
Lysine (Lys, K)
Asparagine (Asn, N)
Cysteine (Cys, C)
Glutamine (Gln, Q)
Glycine (Gly, G)
Serine (Ser, S)
Threonine (Thr, T)
Tyrosine (Tyr, Y)
Alanine (Ala, A)
Isoleucine (Ile, I)
Leucine (Leu, L)
Methionine (Met, M)
Phenylalanine (Phe, F)
Proline (Pro, P)
Tryptophan (Trp, W)
Valine (Val, V)
1-Letter  3-Letter  Side chain polarity  Side chain acidity or basicity  Hydropathy index
A         Ala       non-polar            neutral                         1.8
R         Arg       polar                basic (strongly)                -4.5
N         Asn       polar                neutral                         -3.5
D         Asp       polar                acidic                          -3.5
C         Cys       polar                neutral                         2.5
E         Glu       polar                acidic                          -3.5
Q         Gln       polar                neutral                         -3.5
G         Gly       non-polar            neutral                         -0.4
H         His       polar                basic (weakly)                  -3.2
I         Ile       non-polar            neutral                         4.5
L         Leu       non-polar            neutral                         3.8
K         Lys       polar                basic                           -3.9
M         Met       non-polar            neutral                         1.9
F         Phe       non-polar            neutral                         2.8
P         Pro       non-polar            neutral                         -1.6
S         Ser       polar                neutral                         -0.8
T         Thr       polar                neutral                         -0.7
W         Trp       non-polar            neutral                         -0.9
Y         Tyr       polar                neutral                         -1.3
V         Val       non-polar            neutral                         4.2
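The hydropathy index values listed above can be averaged over a peptide to get a rough idea of how hydrophobic it is. The following is a minimal Python sketch, not part of the original slides; the helper name mean_hydropathy and the example peptides are hypothetical.

```python
# Hydropathy index per amino acid (1-letter code), copied from the table above.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "E": -3.5, "Q": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(peptide: str) -> float:
    """Return the average hydropathy index over all residues of a peptide."""
    if not peptide:
        raise ValueError("peptide must be non-empty")
    return sum(HYDROPATHY[aa] for aa in peptide.upper()) / len(peptide)


# Hypothetical example peptides (not taken from the slides).
print(mean_hydropathy("AILV"))  # non-polar residues: positive mean
print(mean_hydropathy("RKDE"))  # charged residues: negative mean
```

A positive mean suggests a mostly hydrophobic peptide; a negative mean suggests a mostly hydrophilic one.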
Polypeptide
A protein, or polypeptide chain, is formed by joining amino
acids together via peptide bonds.
One end of the polypeptide has a free amino group and is called
the N-terminus. The other end has a free carboxyl group and is
called the C-terminus.
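[Figure: two amino acids are joined by a peptide bond formed between the carboxyl group (OH) of one and the amino group (NH2) of the other.]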
Protein structure
Primary structure
Secondary structure
Tertiary structure
Quaternary structure
DNA
Deoxyribose
Phosphate (bound to the 5' carbon)
Base (bound to the 1' carbon)
[Figure: a nucleotide, consisting of a phosphate, a deoxyribose (2' and 3' carbons labeled), and a base (adenine).]
More on bases
There are 5 different bases: adenine (A),
cytosine (C), guanine (G), thymine (T), and uracil (U).
A and G are called purines. They have a 2-ring structure.
C, T, and U are called pyrimidines. They have a 1-ring
structure.
DNA uses only A, C, G, and T.
[Figure: ring structures of adenine, guanine, thymine, cytosine, and uracil.]
Watson-Crick rules
Complementary bases: A pairs with T, and C pairs with G.
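The Watson-Crick rules determine one strand of a DNA double helix from the other. A minimal Python sketch follows, assuming both strands are written in the 5' to 3' direction (so the partner strand is the reverse complement); the function name reverse_complement and the example sequence are illustrative, not from the slides.

```python
# Watson-Crick pairing for DNA: A with T, C with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read 5' to 3'.

    The two strands run in opposite directions, so the complemented
    sequence is also reversed.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


print(reverse_complement("ACGTTG"))  # CAACGT
```

Applying the function twice returns the original strand, which is a quick sanity check for any implementation.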
Orientation of a DNA
Genome
Chromosome
Gene
Chromosome
Example:
Gene
RNA
[Figure: an RNA nucleotide, consisting of a phosphate and a ribose sugar, with the 1', 2', and 3' carbons labeled.]
RNA vs DNA
Non-coding RNA
snoRNAs
microRNAs
siRNAs
piRNAs
long ncRNAs
[Figure: genome, then pri-miRNA, then pre-miRNA, then miRNA.]
RNA interference
Mutation
Central Dogma
[Figure: DNA gives RNA (with a poly-A tail, AAAA); in the cytoplasm, translation gives a protein and modification gives a modified protein.]
Transcription (Prokaryotes)
Transcription synthesizes a piece of RNA (messenger RNA,
mRNA) from one strand of the DNA gene.
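As a rough illustration of what transcription produces, the Python sketch below builds an mRNA sequence from a DNA strand. It assumes sequences are written 5' to 3' and ignores promoters, termination, and all other biological detail; the function names and example sequences are hypothetical.

```python
# DNA base -> complementary RNA base (RNA uses U where DNA uses T).
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe_from_template(template: str) -> str:
    """Build the mRNA (5' to 3') complementary to the template strand."""
    return "".join(DNA_TO_RNA_COMPLEMENT[b] for b in reversed(template.upper()))


def transcribe_from_coding(coding: str) -> str:
    """The mRNA equals the coding (non-template) strand with T replaced by U."""
    return coding.upper().replace("T", "U")


coding = "ATGGCC"    # hypothetical coding strand, 5' to 3'
template = "GGCCAT"  # its reverse complement, 5' to 3'
print(transcribe_from_coding(coding))      # AUGGCC
print(transcribe_from_template(template))  # AUGGCC
```

Both routes give the same mRNA, because the template strand is the reverse complement of the coding strand.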
Translation
Translation synthesizes a protein from an mRNA.
Each amino acid is encoded by a sequence of 3 consecutive
nucleotides, called a codon.
The decoding table from codons to amino acids is called the genetic
code.
Note:
There are 4^3 = 64 different codons. Thus, the codons are not in one-to-one correspondence with the 20 amino acids.
Nearly all organisms use the same decoding table (the standard genetic code)!
The codons that encode the same amino acid tend to have the
same first and second nucleotide.
Recall that amino acids can be classified into 4 groups. A single
base change in a codon is usually not sufficient to cause a codon
to code for an amino acid in a different group.
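The counting argument in the note can be checked directly. The Python sketch below is an illustration only: it encodes the standard genetic code as a 64-character string (an assumption of this example, with "*" standing for a stop codon), counts codons and amino acids, and lists the first-two-nucleotide prefixes whose four codons all encode the same amino acid.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in the order TTT, TTC, TTA,
# TTG, TCT, ... ('*' marks a stop codon).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

num_codons = len(CODE)                     # 4 ** 3
num_amino_acids = len(set(AMINO) - {"*"})  # stop codons excluded

# Group the amino acids by the first two nucleotides of their codons.
by_prefix = {}
for codon, aa in CODE.items():
    by_prefix.setdefault(codon[:2], set()).add(aa)

# Prefixes where the third nucleotide does not matter at all.
fully_degenerate = sorted(p for p, aas in by_prefix.items() if len(aas) == 1)

print(num_codons, num_amino_acids)  # 64 20
print(fully_degenerate)
```

With 64 codons and only 20 amino acids plus a stop signal, several codons must share a meaning, and the prefix grouping shows that the sharing is concentrated in the third position.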
Genetic code
Start codon: ATG (also codes for M)
Stop codons: TAA, TAG, TGA

Each block below shares the first base; each column shares the second base (T, C, A, G); each row shares the third base (T, C, A, G).

First base T
TTT Phe [F]    TCT Ser [S]    TAT Tyr [Y]      TGT Cys [C]
TTC Phe [F]    TCC Ser [S]    TAC Tyr [Y]      TGC Cys [C]
TTA Leu [L]    TCA Ser [S]    TAA Ter [end]    TGA Ter [end]
TTG Leu [L]    TCG Ser [S]    TAG Ter [end]    TGG Trp [W]

First base C
CTT Leu [L]    CCT Pro [P]    CAT His [H]      CGT Arg [R]
CTC Leu [L]    CCC Pro [P]    CAC His [H]      CGC Arg [R]
CTA Leu [L]    CCA Pro [P]    CAA Gln [Q]      CGA Arg [R]
CTG Leu [L]    CCG Pro [P]    CAG Gln [Q]      CGG Arg [R]

First base A
ATT Ile [I]    ACT Thr [T]    AAT Asn [N]      AGT Ser [S]
ATC Ile [I]    ACC Thr [T]    AAC Asn [N]      AGC Ser [S]
ATA Ile [I]    ACA Thr [T]    AAA Lys [K]      AGA Arg [R]
ATG Met [M]    ACG Thr [T]    AAG Lys [K]      AGG Arg [R]

First base G
GTT Val [V]    GCT Ala [A]    GAT Asp [D]      GGT Gly [G]
GTC Val [V]    GCC Ala [A]    GAC Asp [D]      GGC Gly [G]
GTA Val [V]    GCA Ala [A]    GAA Glu [E]      GGA Gly [G]
GTG Val [V]    GCG Ala [A]    GAG Glu [E]      GGG Gly [G]
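The genetic code table can be applied mechanically to decode a coding sequence. The following Python sketch is a simplified illustration, not a model of the ribosome: it assumes a DNA-alphabet sequence as in the table, starts at the first ATG it finds, and stops at the first in-frame stop codon. The function name translate and the example sequence are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in the order TTT, TTC, TTA, TTG, TCT, ...
# ('*' marks a stop codon).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate from the first ATG up to (not including) the first stop codon."""
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    # Read non-overlapping codons in the frame set by the start codon.
    for i in range(start, len(dna) - 2, 3):
        amino_acid = GENETIC_CODE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("CCATGGCTTTTTAAGG"))  # MAF
```

In the example, the reading frame is ATG GCT TTT TAA, which decodes to Met, Ala, Phe, and then a stop.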
Codon usage
coding region
More on tRNA
Transcription is done within the nucleus.
Translation is done outside the nucleus.
[Figure: in the nucleus, DNA is transcribed, a 5' cap and a poly-A tail (AAAAA) are added, and the RNA is spliced; after export to the cytoplasm, translation and modification take place.]
Transcription (Eukaryotes)
[Figure: in the nucleus, DNA is transcribed, a 5' cap and a poly-A tail (AAAAA) are added, the RNA is spliced, and the result is exported.]
[Figure: splicing turns exon 1, intron, exon 2, intron, exon 3 into exon 1, exon 2, exon 3.]
The yellow (highlighted) part is translated into protein.
The length of the yellow part must be a multiple of 3!
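Splicing and the multiple-of-3 requirement can be sketched in a few lines of Python. This is an illustration under simplifying assumptions: exon positions are given as hypothetical half-open (start, end) intervals, and the whole spliced sequence is treated as the translated part, which is not true in general.

```python
def splice(pre_mrna: str, exons: list) -> str:
    """Join the exon intervals (half-open, 0-based) and drop everything else."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical transcript: exon, intron, exon.
pre_mrna = "ATGGCC" + "GTAAGT" + "TTTTAA"
exons = [(0, 6), (12, 18)]

mature = splice(pre_mrna, exons)
print(mature)  # ATGGCCTTTTAA

# Each codon has 3 nucleotides, so the translated part must split evenly.
print(len(mature) % 3 == 0)  # True
```

If the check fails, the sequence cannot be read as a whole number of codons, which usually means the exon boundaries or the coding region were chosen incorrectly.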
Post-translation modification (PTM)
Structural changes
Population genetics
Copying DNA
Restriction Enzymes
Shotgun method
Cloning
Polymerase Chain Reaction (PCR)
Gel Electrophoresis
Restriction Enzymes
EcoRI
Digested by EcoRI:
5'-G          +    AATTC-3'
3'-CTTAA               G-5'
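The EcoRI digest above can be simulated on a single strand. The Python sketch below assumes the recognition site GAATTC with the cut placed right after the G, as in the fragments shown, and models only the top (5' to 3') strand; the function name digest, its parameters, and the example sequence are hypothetical.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list:
    """Cut seq at every occurrence of site, cut_offset bases into the site."""
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # remainder after the last cut
    return fragments


print(digest("AAGAATTCGGGAATTCTT"))  # ['AAG', 'AATTCGGG', 'AATTCTT']
```

Every fragment after the first begins with AATTC, matching the right-hand fragment in the digest shown above.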
Shotgun method
Cloning
Step 1
Step 2
Step 3
Step 4&5
More on cloning
Cycle 1
Phases 1 & 2
Cycle 2
Gel electrophoresis
Applications
Sequence Reconstruction
Sequencing by Gel electrophoresis
Hybridization
DNA array
DNA sample
hybridize
Sequencing by hybridization
SNP detection
Mass Spectrometry
SAGE, PET technology
ENCODE Project
HAPMAP Project