
Tutorial Note 9

Hidden Markov Model

The Chinese University of Hong Kong


CSCI3220 Algorithms for Bioinformatics
TA: Zhenghao Zhang
Agenda
• Markov Model
– Introduction and example
• Hidden Markov Model
– Introduction, example and exercise
• Hidden Markov Model and Bioinformatics

CSCI3220 Algorithms for Bioinformatics Tutorial Notes 2


Markov Model
• A stochastic model in which the next state depends only on the
previous n states (an n-th order model)
• Example: a two-state weather model
  [Diagram: Day 0 starts with initial probabilities 0.5 (Sunny) and 0.5 (Cloudy);
  transitions Sunny→Sunny 0.8, Sunny→Cloudy 0.2, Cloudy→Sunny 0.4,
  Cloudy→Cloudy 0.6. The same model is also drawn with a start node S0 (Day 0)
  and generic state labels S1 (Sunny) and S2 (Cloudy).]

CSCI3220 Algorithms for Bioinformatics Tutorial Notes 3


Markov Model
• What is the probability of having the event:
– Sunny → Cloudy → Cloudy → Sunny → Sunny
  [Diagram: the same Day-0 weather model, with initial probabilities 0.5/0.5
  and transitions 0.8, 0.2, 0.4, 0.6.]
• Probability = 0.5 × 0.2 × 0.6 × 0.4 × 0.8 = 0.0192
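As a quick check, here is a minimal Python sketch (not part of the original slides; the name chain_probability is illustrative) that multiplies the initial and transition probabilities along the path:

```python
# Minimal sketch: probability of a state path under the first-order weather model.
init = {"Sunny": 0.5, "Cloudy": 0.5}
trans = {
    ("Sunny", "Sunny"): 0.8, ("Sunny", "Cloudy"): 0.2,
    ("Cloudy", "Sunny"): 0.4, ("Cloudy", "Cloudy"): 0.6,
}

def chain_probability(states):
    """P(s1) * P(s2|s1) * ... * P(sn|sn-1)."""
    p = init[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= trans[(prev, cur)]
    return p

print(chain_probability(["Sunny", "Cloudy", "Cloudy", "Sunny", "Sunny"]))  # 0.0192
```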

CSCI3220 Algorithms for Bioinformatics Tutorial Notes 4


Hidden Markov Model
• A Markov model in which the actual state sequence is hidden (not
directly observed); whether the parameters of the model are known is
a separate matter
• Example:
  [Diagram: the same Day-0 weather model, but with every initial and
  transition probability replaced by "?"; the states are hidden and only the
  Sunny/Cloudy observations are seen.]
– We can only observe:
– Sunny → Cloudy → Cloudy → Sunny → Sunny

CSCI3220 Algorithms for Bioinformatics Tutorial Notes 5


Hidden Markov Model
• Problem Definition:
– We have a series of observations
– Each observation comes from a hidden state
• Example:
– We have records of someone observing the weather of
each day, either sunny or cloudy
– How can we guess whether the weather on subsequent days will be
sunny or cloudy?

CSCI3220 Algorithms for Bioinformatics Tutorial Notes 6


Hidden Markov Model Representation
• Model parameters: θ = (π, P, E)
– π: Initial probabilities
– P: Transition probabilities
– E: Emission probabilities
• State sequences: Q = q1, q2, …
– e.g. A, B, B, A, …
• Observations: O = o1, o2, …
– e.g. Sunny, Cloudy, Cloudy, Sunny, …

π:  si       A     B
    Init     0.5   0.5

P:  qt\qt+1  A     B
    A        0.8   0.2
    B        0.4   0.6

E:  si\o     Sunny   Cloudy
    A        1       0
    B        0       1
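A minimal sketch (not part of the slides) of how these parameters can be held as NumPy arrays, with states ordered [A, B] and observations ordered [Sunny, Cloudy]:

```python
# Minimal sketch: the weather HMM parameters theta = (pi, P, E) as NumPy arrays.
import numpy as np

pi = np.array([0.5, 0.5])             # initial probabilities
P = np.array([[0.8, 0.2],             # P[i, j] = Pr(q_{t+1} = j | q_t = i)
              [0.4, 0.6]])
E = np.array([[1.0, 0.0],             # E[i, k] = Pr(o_t = k | q_t = i)
              [0.0, 1.0]])

# Sanity checks: pi sums to 1, and every row of P and E sums to 1.
assert np.isclose(pi.sum(), 1)
assert np.allclose(P.sum(axis=1), 1) and np.allclose(E.sum(axis=1), 1)
```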

CSCI3220 Algorithms for Bioinformatics Tutorial Notes 7


Exercise1: Markov model
• Define a Markov model based on the information in the text
description below:
a) The motif typically contains four nucleotides, with a consensus
sequence of GACG.
b) The first and the last positions are completely conserved.
c) The second position has a 20% chance of being G instead.
d) The third position has a 10% chance of being A and a 25% chance
of being T instead.
e) In 10% of the cases, there is an insertion of one nucleotide
between the first and second positions, with all four nucleotides
equally likely for that inserted nucleotide.
f) In the genomic background (the nucleotides after the motif), all
four nucleotides are equally likely.

CSCI3220 Algorithms for Bioinformatics Tutorial Notes 8


Answer: Markov model
[Diagram: S1 → S2 with probability 0.1 and S1 → S3 with probability 0.9;
S2 → S3, S3 → S4, S4 → S5 and S5 → S6 each with probability 1;
S6 → S6 (self-loop) with probability 1.]

π:  si       s1   s2   s3   s4   s5   s6
    Init     1    0    0    0    0    0

P:  qt\qt+1  s1   s2   s3   s4   s5   s6
    s1       0    0.1  0.9  0    0    0
    s2       0    0    1    0    0    0
    s3       0    0    0    1    0    0
    s4       0    0    0    0    1    0
    s5       0    0    0    0    0    1
    s6       0    0    0    0    0    1

E:  si\o     A     C     G     T
    s1       0     0     1     0
    s2       0.25  0.25  0.25  0.25
    s3       0.8   0     0.2   0
    s4       0.1   0.65  0     0.25
    s5       0     0     1     0
    s6       0.25  0.25  0.25  0.25
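For illustration, a minimal sketch (not part of the slides; sample_sequence is a hypothetical helper) that encodes these matrices and samples motif-plus-background sequences from the model:

```python
# Minimal sketch: sampling from the Exercise 1 motif model.
# State order: s1..s6; symbol order: A, C, G, T.
import numpy as np

rng = np.random.default_rng(0)
symbols = np.array(list("ACGT"))
pi = np.array([1, 0, 0, 0, 0, 0], dtype=float)
P = np.array([[0, 0.1, 0.9, 0, 0, 0],
              [0, 0,   1,   0, 0, 0],
              [0, 0,   0,   1, 0, 0],
              [0, 0,   0,   0, 1, 0],
              [0, 0,   0,   0, 0, 1],
              [0, 0,   0,   0, 0, 1]], dtype=float)
E = np.array([[0,    0,    1,    0],
              [0.25, 0.25, 0.25, 0.25],
              [0.8,  0,    0.2,  0],
              [0.1,  0.65, 0,    0.25],
              [0,    0,    1,    0],
              [0.25, 0.25, 0.25, 0.25]], dtype=float)

def sample_sequence(length=8):
    """Sample `length` symbols by walking through the states."""
    state = rng.choice(6, p=pi)
    seq = []
    for _ in range(length):
        seq.append(rng.choice(symbols, p=E[state]))
        state = rng.choice(6, p=P[state])
    return "".join(seq)

print(sample_sequence())  # a GACG-like motif followed by random background bases
```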
CSCI3220 Algorithms for Bioinformatics Tutorial Notes 9
The Three Problems related to HMM
• Evaluating data likelihood, i.e. Pr(O | θ)
– Forward Algorithm
– Backward Algorithm
• Using a model, i.e. argmax_Q Pr(Q | O, θ)
– Viterbi Algorithm
• Learning a model, i.e. argmax_θ Pr({O^(d)} | θ)
– Baum-Welch Algorithm

CSCI3220 Algorithms for Bioinformatics Tutorial Notes 10


Exercise 2: Hidden Markov Model
• Construct the transition matrix (P) of the given HMM.
  [Diagram: s1 → s2 (0.7), s1 → s3 (0.3); s2 → s3 (1); s3 → s3 (0.3), s3 → s4 (0.7);
  s4 → s4 (0.9), s4 → s5 (0.1); s5 → s3 (0.8), s5 → s5 (0.2).]

P:  qt\qt+1  s1   s2   s3   s4   s5
    s1
    s2
    s3
    s4
    s5

CSCI3220 Algorithms for Bioinformatics Tutorial Notes 11


Answer: Hidden Markov Model
[Diagram: as in Exercise 2.]

P:  qt\qt+1  s1   s2   s3   s4   s5
    s1       0    0.7  0.3  0    0
    s2       0    0    1    0    0
    s3       0    0    0.3  0.7  0
    s4       0    0    0    0.9  0.1
    s5       0    0    0.8  0    0.2

CSCI3220 Algorithms for Bioinformatics Tutorial Notes 12


Exercise 3: Hidden Markov Model
• You are further provided with the emission matrix and the initial
probabilities:

π:  si     s1    s2    s3    s4    s5
    Init   0.8   0.2   0     0     0

E:  si\o   A     C     G     T
    s1     0.8   0.2   0     0
    s2     0.5   0     0     0.5
    s3     0     0.3   0.7   0
    s4     0     0     0.1   0.9
    s5     0.2   0.3   0.3   0.2

1) Can the HMM generate the following sequences?
   a. ACGT
   b. AAA
   c. CCCCCCC

CSCI3220 Algorithms for Bioinformatics Tutorial Notes 13


Answer: HMM – Generation
1) Can the HMM generate the following sequences? (π, P and E as in
Exercises 2–3.)
a. ACGT
– Yes. E.g. s1 > s3 > s3 > s4
b. AAA
– No. We can have at most two 'A's at the beginning: only s1 and s2
can start and both can emit A, but every state reachable at step 3
(s3 or s4) emits A with probability 0.
c. CCCCCCC
– Yes. E.g. s1 > s3 > s3 > s3 > s3 > s3 > s3
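As a check, a minimal sketch (not part of the slides; can_generate is a hypothetical helper) that enumerates all state paths for a short observation and tests whether any path has nonzero probability:

```python
# Minimal sketch: brute-force generability check for the Exercise 2-3 HMM.
from itertools import product
import numpy as np

pi = np.array([0.8, 0.2, 0, 0, 0], dtype=float)
P = np.array([[0, 0.7, 0.3, 0,   0],
              [0, 0,   1,   0,   0],
              [0, 0,   0.3, 0.7, 0],
              [0, 0,   0,   0.9, 0.1],
              [0, 0,   0.8, 0,   0.2]], dtype=float)
E = np.array([[0.8, 0.2, 0,   0],
              [0.5, 0,   0,   0.5],
              [0,   0.3, 0.7, 0],
              [0,   0,   0.1, 0.9],
              [0.2, 0.3, 0.3, 0.2]], dtype=float)
SYM = {"A": 0, "C": 1, "G": 2, "T": 3}

def can_generate(obs):
    """True if some state path gives the observation nonzero probability."""
    m = len(obs)
    for path in product(range(5), repeat=m):
        p = pi[path[0]] * E[path[0], SYM[obs[0]]]
        for t in range(1, m):
            p *= P[path[t - 1], path[t]] * E[path[t], SYM[obs[t]]]
        if p > 0:
            return True
    return False

print(can_generate("ACGT"), can_generate("AAA"))  # True False
```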

CSCI3220 Algorithms for Bioinformatics Tutorial Notes 14


Exercise 3: Hidden Markov Model
2) Which sequence is more likely to be generated, ATGGG or CCTTT?
(π, P and E as given above.)

CSCI3220 Algorithms for Bioinformatics Tutorial Notes 15


Forward Algorithm
• Define α(t,i) = Pr(o1, o2, ..., ot, qt=si | θ), which is the
probability that we are at state si in step t and have observed
o1, o2, ..., ot
• Initialization: α(1,i) = π_i e_i(o1)
• Recursion(/Iteration): α(t,i) = Σ_{j=1..N} α(t-1,j) e_i(ot) p_ji
  where α(t-1,j) = Pr(o1, o2, ..., ot-1, qt-1=sj | θ)
• Finally: Pr(O | θ) = Σ_{j=1..N} α(m,j)

Notation — si: state i; qt: state at time t; ot: observation at time t;
θ = (π, P, E): model parameters (π: initial, P: transition, E: emission).
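A minimal sketch (not part of the slides; the name forward is illustrative) of this recursion for a generic HMM with parameters pi (shape N), P (N×N) and E (N×K), observations given as symbol indices:

```python
# Minimal sketch: the forward algorithm.
import numpy as np

def forward(obs, pi, P, E):
    """Return (alpha, likelihood) where alpha[t, i] = Pr(o_1..o_t, q_t = s_i)."""
    m, N = len(obs), len(pi)
    alpha = np.zeros((m, N))
    alpha[0] = pi * E[:, obs[0]]                      # alpha(1,i) = pi_i * e_i(o_1)
    for t in range(1, m):
        alpha[t] = (alpha[t - 1] @ P) * E[:, obs[t]]  # sum_j alpha(t-1,j) p_ji, times e_i(o_t)
    return alpha, alpha[-1].sum()                     # Pr(O | theta) = sum_j alpha(m,j)
```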

CSCI3220 Algorithms for Bioinformatics Tutorial Notes 16


Answer: HMM – Forward Algorithm
:
si
Likelihood of ATGGG π
s1 s2 s3 s4 s5
= 0.00691488 + 0.0032928 + 0.00032928
Init 0.8 0.2 0 0 0
= 0.01053696

(t,i s s2 s3 s4 s5 P: qt
qt+1
s1 s2 s3 s4 s5
1
)
s1 0 0.7 0.3 0 0
0.8 0.2
t=1 ×0.8 ×0.5 0 0 0
s2 0 0 1 0 0
=0.64 =0.1

0.64 s3 0 0 0.3 0.7 0


(0.64×0.3
×0.7
t=2 0
×0.5
+0.1×1) × 0 0 0 s4 0 0 0 0.9 0.1
=0.224 =0
s5 0 0 0.8 0 0.2
0.224×1 O
t=3 0 0 ×0.7 0 0 E: si A C G T
=0.1568
s1 0.8 0.2 0 0
0.1568×0.3× 0.1568×0.7
t=4 0 0 0.7 ×0.1 0 s2 0.5 0 0 0.5
=0.032928 =0.010976
s3 0 0.3 0.7 0
(0.032928
0.032928 ×0.7 0.010976 0 0 0.1 0.9
t=5 0 0 ×0.3×0.7 +0.010976 ×0.1×0.3 s4
=0.00691488 ×0.9) × 0.1 =0.00032928
=0.0032928 s5 0.2 0.3 0.3 0.2
CSCI3220 Algorithms for Bioinformatics Tutorial Notes 17
Answer: HMM – Forward Algorithm
:
si
Likelihood of CCTTT π s1 s2 s3 s4 s5
= 0.0059521392 + 0.000154224
= 0.0061063632 < 0.01053696 Init 0.8 0.2 0 0 0

(t,i s1 s2 s3 s4 s5 P: qt
qt+1
s1 s2 s3 s4 s5
)
0.8 s1 0 0.7 0.3 0 0
0.2×0
t=1 ×0.2 =0 0 0 0
=0.16 s2 0 0 1 0 0
0.16 0.16×0.3
t=2 0 ×0.7×0 ×0.3 0 0 s3 0 0 0.3 0.7 0
=0 =0.0144
0.0144 0.0144×0.7 s4 0 0 0 0.9 0.1
t=3 0 0 ×0.3×0 ×0.9 0
=0 =0.009072 s5 0 0 0.8 0 0.2
O
0.009072 0.009072 E: si A C G T
t=4 0 0 0 ×0.9×0.9 ×0.1×0.2
=0.00734832 =0.00018144 s1 0.8 0.2 0 0
0.00018 (0.00734832
144×0.8 0.00734832 ×0.1 s2 0.5 0 0 0.5
t=5 0 0
×0
×0.9×0.9 +0.00018144
=0.0059521392 ×0.2) × 0.2 s3 0 0.3 0.7 0
=0 =0.000154224
s4 0 0 0.1 0.9
ATGGG is more likely than CCTTT s5 0.2 0.3 0.3 0.2
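A usage sketch (assumption, not from the slides): the two likelihoods can be reproduced with the forward() helper sketched after the Forward Algorithm slide, with the Exercise 2–3 parameters and the symbol encoding A=0, C=1, G=2, T=3:

```python
# Usage sketch: requires the forward() helper defined earlier.
import numpy as np

pi = np.array([0.8, 0.2, 0, 0, 0], dtype=float)
P = np.array([[0, 0.7, 0.3, 0,   0],
              [0, 0,   1,   0,   0],
              [0, 0,   0.3, 0.7, 0],
              [0, 0,   0,   0.9, 0.1],
              [0, 0,   0.8, 0,   0.2]], dtype=float)
E = np.array([[0.8, 0.2, 0,   0],
              [0.5, 0,   0,   0.5],
              [0,   0.3, 0.7, 0],
              [0,   0,   0.1, 0.9],
              [0.2, 0.3, 0.3, 0.2]], dtype=float)
SYM = {"A": 0, "C": 1, "G": 2, "T": 3}
encode = lambda s: [SYM[c] for c in s]

_, p_atggg = forward(encode("ATGGG"), pi, P, E)  # expected 0.01053696
_, p_ccttt = forward(encode("CCTTT"), pi, P, E)  # expected 0.0061063632
print(p_atggg, p_ccttt)
```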
CSCI3220 Algorithms for Bioinformatics Tutorial Notes 18
Backward Algorithm
• Define β(t,i) = Pr(ot+1, ot+2, ..., om | qt=si, θ)
• Initialization: β(m-1,i) = Σ_{j=1..N} e_j(om) p_ij
• Recursion(/Iteration): β(t,i) = Σ_{j=1..N} β(t+1,j) e_j(ot+1) p_ij
  where β(t+1,j) = Pr(ot+2, ot+3, ..., om | qt+1=sj, θ)
• Finally: Pr(O | θ) = Σ_{j=1..N} β(1,j) e_j(o1) π_j

Notation — si: state i; qt: state at time t; ot: observation at time t;
θ = (π, P, E): model parameters (π: initial, P: transition, E: emission).
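A minimal sketch (not part of the slides; the name backward is illustrative) of the backward recursion, using the same generic parameterisation as the forward sketch:

```python
# Minimal sketch: the backward algorithm.
import numpy as np

def backward(obs, pi, P, E):
    """Return (beta, likelihood) where beta[t, i] = Pr(o_{t+1}..o_m | q_t = s_i)."""
    m, N = len(obs), len(pi)
    beta = np.ones((m, N))                              # beta(m,i) = 1
    for t in range(m - 2, -1, -1):
        beta[t] = P @ (beta[t + 1] * E[:, obs[t + 1]])  # sum_j p_ij e_j(o_{t+1}) beta(t+1,j)
    # The returned likelihood should equal the value from the forward algorithm.
    return beta, np.sum(beta[0] * E[:, obs[0]] * pi)
```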

CSCI3220 Algorithms for Bioinformatics Tutorial Notes 19


Answer: HMM – Backward Algorithm
:
si
Likelihood of ATGGG π
s1 s2 s3 s4 s5
= 0.016464×0.8×0.8
Init 0.8 0.2 0 0 0
= 0.01053696

β(t,i) s1 s2 s3 s4 s5 P: qt
qt+1
s1 s2 s3 s4 s5
0.008466
×0.9×0.9
0.04704 0.008466 +0.049272 0.049272 s1 0 0.7 0.3 0 0
t=1 ×0.7×0.5 0 ×0.7×0.9
×0.1×0.2
×0.2×0.2
=0.016464 =0.00533358 =0.00197088
=0.007842 s2 0 0 1 0 0
9

0.0672×0. 0.0672×1
0.0672×0.3 0.0294×0.9 0.0672×0.8 s3 0 0 0.3 0.7 0
×0.7+0.0294 ×0.1+0.194 ×0.7+0.194
t=2 3×0.7 ×0.7
×0.7×0.1 ×0.1×0.3 ×0.2×0.3
=0.014112 =0.04704
=0.01617 =0.008466 =0.049272
s4 0 0 0 0.9 0.1
0.28×0.3×0.7 0.12×0.9 0.28×0.8×0.7 s5 0 0 0.8 0 0.2
0.28×0.3 0.28×1 O
+0.12×0.7 ×0.1+0.62 +0.62×0.2×0.
t=3 ×0.7
=0.0588
×0.7
=0.196
×0.1 ×0.1×0.3 3 E: si A C G T
=0.0672 =0.0294 =0.194
s1 0.8 0.2 0 0
1×0.3×0.7 1×0.9×0.1 1×0.8×0.7
1×0.3×0.7 1×1×0.7
t=4 =0.21 =0.7
+1×0.7×0.1 +1×0.1×0.3 +1×0.2×0.3
0.5 0 0 0.5
=0.28 =0.12 =0.62 s2
t=5 1 1 1 1 1 s3 0 0.3 0.7 0
s4 0 0 0.1 0.9
s5 0.2 0.3 0.3 0.2
CSCI3220 Algorithms for Bioinformatics Tutorial Notes 20
Answer: HMM – Backward Algorithm
:
si
Likelihood of CCTTT π s1 s2 s3 s4 s5
= 0.03816477×0.8×0.2
= 0.0061063632 < 0.01053696 Init 0.8 0.2 0 0 0

β(t,i) s1 s2 s3 s4 s5 P: qt
qt+1
s1 s2 s3 s4 s5
0.424053×0.
0.424053 0.424053 0.424053 0.000064
×0.3×0.3 ×1×0.3 ×0.3×0.3 ×0.1×0.3
8×0.3 s1 0 0.7 0.3 0 0
t=1 =0.0381647 =0.127215 =0.0381647 =0.0000019
+0.000064
×0.2×0.3
7 9 7 2 =0.10177656 s2 0 0 1 0 0
0.6731
0.6731×0.9
0.0016×0.2 s3 0 0 0.3 0.7 0
×0.9+0.0016
t=2 0 0 ×0.7×0.9
×0.1×0.2
×0.2
=0.424053
=0.545243
=0.000064 s4 0 0 0 0.9 0.1
0.83×0.7
0.83×0.9×0.
9+0.04×0.1 0.04×0.2×0.2
s5 0 0 0.8 0 0.2
t=3 0 0 ×0.9
×0.2 =0.0016 E: O
A C G T
=0.5229 si
=0.6731

1×0.7×0.5 1×0.7×0.9
1×0.9×0.9
1×0.2×0.2
s1 0.8 0.2 0 0
t=4 =0.35
0
=0.63
+1×0.1×0.2
=0.04
=0.83
s2 0.5 0 0 0.5
t=5 1 1 1 1 1
s3 0 0.3 0.7 0
ATGGG is more likely than CCTTT s4 0 0 0.1 0.9
s5 0.2 0.3 0.3 0.2
CSCI3220 Algorithms for Bioinformatics Tutorial Notes 21
Exercise 3: Hidden Markov Model
3) What is the most likely state sequence for the observation ACGGT?
(π, P and E as given above.)

CSCI3220 Algorithms for Bioinformatics Tutorial Notes 22


Viterbi Algorithm
• Given a fully specified HMM (π, P, E) and a sequence of
observations (O), determine the most likely state sequence (Q).
  argmax_Q Pr(Q | O, θ) = argmax_Q Pr(Q and O | θ) / Pr(O | θ)
                        = argmax_Q Pr(Q and O | θ)
• Define the sub-problem δ(t,i): the maximum joint probability over all
possible hidden state sequences ending in state si at step t.
  δ(1,i) = Pr(q1=si and o1 | θ) = Pr(o1 | q1=si, θ) Pr(q1=si | θ) = e_i(o1) π_i
  δ(t,i) = max_{1≤j≤N} δ(t-1,j) e_i(ot) p_ji
• So, max_Q Pr(Q and O | θ) = max_{1≤i≤N} δ(m,i)

Notation — si: state i; qt: state at time t; ot: observation at time t;
θ = (π, P, E): model parameters (π: initial, P: transition, E: emission).
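A minimal sketch (not part of the slides; the name viterbi is illustrative) of this recursion with back-pointers, using the same parameterisation as the forward/backward sketches. With the Exercise 2–3 parameters and ACGGT encoded as before, it should reproduce the path s1 > s3 > s3 > s3 > s4 and the probability 0.0016003008 worked out on the next slide.

```python
# Minimal sketch: the Viterbi algorithm with back-pointers.
import numpy as np

def viterbi(obs, pi, P, E):
    """Return (best_path, best_prob) for the most likely state sequence."""
    m, N = len(obs), len(pi)
    delta = np.zeros((m, N))
    back = np.zeros((m, N), dtype=int)
    delta[0] = pi * E[:, obs[0]]                 # delta(1,i) = e_i(o_1) * pi_i
    for t in range(1, m):
        cand = delta[t - 1][:, None] * P         # cand[j, i] = delta(t-1,j) * p_ji
        back[t] = cand.argmax(axis=0)            # best previous state for each i
        delta[t] = cand.max(axis=0) * E[:, obs[t]]
    path = [int(delta[-1].argmax())]             # best final state
    for t in range(m - 1, 0, -1):                # trace back-pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1], delta[-1].max()
```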

CSCI3220 Algorithms for Bioinformatics Tutorial Notes 23


Answer: HMM – Viterbi Algorithm
:
si
Sequence: ACGGT π s1 s2 s3 s4 s5
Init 0.8 0.2 0 0 0
δ(t,i) s1 s2 s3 s4 s5

t=1 0.8×0.8
=0.64
0.2×0.5
=0.1 0 0 0 P: qt
qt+1
s1 s2 s3 s4 s5

Max(0.64 s1 0 0.7 0.3 0 0


×0.3, 0.1×1)
t=2 0 0
×0.3
0 0
=0.0576 s2 0 0 1 0 0
0.0576×0.3 s3 0 0 0.3 0.7 0
0.0576×0.7×0.1
t=3 0 0 ×0.7
=0.004032
0
=0.012096 s4 0 0 0 0.9 0.1
Max(0.012096 s5
0.012096 ×0.7, 0.004032 0.004032 0 0 0.8 0 0.2
t=4 0 0 ×0.3×0.7 ×0.1×0.3 O
=0.00254016
×0.9) ×0.1
=0.00084672 =0.00012096 E: si A C G T

Max(0.00084672
s1 0.8 0.2 0 0
Max(0.00254016 ×0.1,
×0.7, 0.00084672 0.5 0 0 0.5
t=5 0 0 0 ×0.9) ×0.9 0.00012096 s2
×0.2) ×0.2
=0.0016003008
=0.0000169344
s3 0 0.3 0.7 0

Finally, max(0.0016003008, 0.0000169344) = 0.0016003008 s4 0 0 0.1 0.9


States: s1 > s3 > s3 > s3 > s4 s5 0.2 0.3 0.3 0.2
CSCI3220 Algorithms for Bioinformatics Tutorial Notes 24
Check List
• What are a Markov model and a hidden Markov model?
• What is likelihood?
• Are the likelihood values computed by the forward algorithm
and the backward algorithm equal?
• How can you deduce a sequence of states such
that the likelihood of the observations is
maximized?
• Can you give some examples of using HMMs in
bioinformatics?
CSCI3220 Algorithms for Bioinformatics Tutorial Notes 25
