Tutorial Note 9 Hidden Markov Model
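[Figure: two-state Markov chain. Sunny stays Sunny with probability 0.8 and moves to Cloudy with 0.2; Cloudy stays Cloudy with 0.6 and moves to Sunny with 0.4.]
[Figure: the same chain with states S1 and S2, plus a start state S0 that moves to S1 or S2 with probability 0.5 each.]
[Figure: the Sunny/Cloudy chain shown again, next to a copy whose states and probabilities are all marked "?".]
[Figure: state diagram of a six-state HMM (S1 to S6); its parameters are tabulated below.]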
π (initial probabilities):
        s1   s2   s3   s4   s5   s6
Init    1    0    0    0    0    0

P (transition probabilities; rows q_t, columns q_{t+1}):
        s1   s2   s3   s4   s5   s6
s1      0    0.1  0.9  0    0    0
s2      0    0    1    0    0    0
s3      0    0    0    1    0    0
s4      0    0    0    0    1    0
s5      0    0    0    0    0    1
s6      0    0    0    0    0    1

E (emission probabilities; rows s_i, columns O):
        A     C     G     T
s1      0     0     1     0
s2      0.25  0.25  0.25  0.25
s3      0.8   0     0.2   0
s4      0.1   0.65  0     0.25
s5      0     0     1     0
s6      0.25  0.25  0.25  0.25
CSCI3220 Algorithms for Bioinformatics Tutorial Notes 9
The Three Problems related to HMM
• Evaluating data likelihood, i.e. $\Pr(O \mid \theta)$
– Forward Algorithm
– Backward Algorithm
• Using a model, i.e. $\arg\max_{Q} \Pr(Q \mid O, \theta)$
– Viterbi Algorithm
• Learning a model, i.e. $\arg\max_{\theta} \Pr(\{O^{(d)}\} \mid \theta)$
– Baum-Welch algorithm
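To make the first two problems concrete, here is a minimal brute-force sketch on the six-state model tabulated earlier in these notes. It enumerates every state path, sums the joint probabilities to get the likelihood, and keeps the best path. The names `joint` and `brute_force` and the test sequence GAA are illustrative assumptions, not part of the notes, and the enumeration is exponential in the sequence length, which is exactly why the forward, backward and Viterbi algorithms are used instead. The learning problem is not covered by this sketch.

```python
from itertools import product

# Six-state model from the Init, P and E tables (index 0 is s1).
INIT = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
P = [
    [0.0, 0.1, 0.9, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
]
E = [
    {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.8, "C": 0.0, "G": 0.2, "T": 0.0},
    {"A": 0.1, "C": 0.65, "G": 0.0, "T": 0.25},
    {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]


def joint(path, obs):
    """Pr(O, Q | theta) for one state path Q (a tuple of 0-based state indices)."""
    prob = INIT[path[0]] * E[path[0]][obs[0]]
    for t in range(1, len(obs)):
        prob *= P[path[t - 1]][path[t]] * E[path[t]][obs[t]]
    return prob


def brute_force(obs):
    """Return (Pr(O | theta), best joint probability, best path) by enumeration."""
    scored = [(joint(q, obs), q)
              for q in product(range(len(INIT)), repeat=len(obs))]
    likelihood = sum(p for p, _ in scored)   # problem 1: sum over all paths
    best_prob, best_path = max(scored)       # problem 2: most probable path
    return likelihood, best_prob, best_path


likelihood, best_prob, best_path = brute_force("GAA")
print(likelihood, best_prob, [f"s{i + 1}" for i in best_path])
```

For GAA only two paths have non-zero probability, s1 s2 s3 (0.1×0.25×0.8 = 0.02) and s1 s3 s4 (0.9×0.8×0.1 = 0.072), so the likelihood is about 0.092 and the best path is s1 s3 s4.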
[Figure: state diagram of a five-state HMM (S1 to S5) with labelled transition probabilities.]

P (transition probabilities; rows q_t, columns q_{t+1}):
        s1   s2   s3   s4   s5
s1
s2
s3
s4
s5
P (transition probabilities; rows q_t, columns q_{t+1}):
        s1   s2   s3   s4   s5
s1      0    0.7  0.3  0    0
s2      0    0    1    0    0
s3      0    0    0.3  0.7  0
s4      0    0    0    0.9  0.1
s5      0    0    0.8  0    0.2
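Filling in P amounts to writing each arrow of the state diagram into the row of its source state and the column of its target state. The sketch below does this from an edge list that matches the filled-in table and checks that every row sums to 1. The names `EDGES`, `STATES` and `build_transition_matrix` are hypothetical helpers introduced only for illustration.

```python
# Edges of the five-state diagram as (from_state, to_state, probability),
# matching the non-zero entries of the filled-in table P.
EDGES = [
    ("s1", "s2", 0.7), ("s1", "s3", 0.3),
    ("s2", "s3", 1.0),
    ("s3", "s3", 0.3), ("s3", "s4", 0.7),
    ("s4", "s4", 0.9), ("s4", "s5", 0.1),
    ("s5", "s3", 0.8), ("s5", "s5", 0.2),
]
STATES = ["s1", "s2", "s3", "s4", "s5"]


def build_transition_matrix(states, edges):
    """Return P as a list of rows; P[i][j] is Pr(q_{t+1} = states[j] | q_t = states[i])."""
    index = {name: i for i, name in enumerate(states)}
    matrix = [[0.0] * len(states) for _ in states]
    for src, dst, prob in edges:
        matrix[index[src]][index[dst]] = prob
    # Each row is a probability distribution over next states.
    for name, row in zip(states, matrix):
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError(f"row {name} sums to {sum(row)}, expected 1")
    return matrix


P = build_transition_matrix(STATES, EDGES)
for name, row in zip(STATES, P):
    print(name, row)
```

The row-sum check is a cheap way to catch an arrow that was missed or copied into the wrong cell.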
π (initial probabilities):
        s1   s2   s3   s4   s5
Init    0.8  0.2  0    0    0

E (emission probabilities; rows s_i, columns O):
        A    C    G    T
s1      0.8  0.2  0    0
s2      0.5  0    0    0.5
s3      0    0.3  0.7  0
s4      0    0    0.1  0.9
s5      0.2  0.3  0.3  0.2
α(t,i), O = ATGGG:
         s1               s2              s3   s4   s5
t=1      0.8×0.8 = 0.64   0.2×0.5 = 0.1   0    0    0
α(t,i), O = CCTTT (in each row, states not listed have value 0):
t=1   s1: 0.8×0.2 = 0.16
      s2: 0.2×0 = 0
t=2   s2: 0.16×0.7×0 = 0
      s3: 0.16×0.3×0.3 = 0.0144
t=3   s3: 0.0144×0.3×0 = 0
      s4: 0.0144×0.7×0.9 = 0.009072
t=4   s4: 0.009072×0.9×0.9 = 0.00734832
      s5: 0.009072×0.1×0.2 = 0.00018144
t=5   s3: 0.00018144×0.8×0 = 0
      s4: 0.00734832×0.9×0.9 = 0.0059521392
      s5: (0.00734832×0.1 + 0.00018144×0.2)×0.2 = 0.000154224

ATGGG is more likely than CCTTT
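The forward computation worked through above can be sketched in a few lines. The code assumes the standard definition of α(t,i) as the probability of the first t observations together with being in state s_i at time t, and uses the five-state Init, P and E tables from these notes. The function names `forward` and `likelihood` are illustrative choices.

```python
# Five-state model from the Init, P and E tables (index 0 is s1).
INIT = [0.8, 0.2, 0.0, 0.0, 0.0]
P = [
    [0.0, 0.7, 0.3, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.3, 0.7, 0.0],
    [0.0, 0.0, 0.0, 0.9, 0.1],
    [0.0, 0.0, 0.8, 0.0, 0.2],
]
E = [
    {"A": 0.8, "C": 0.2, "G": 0.0, "T": 0.0},
    {"A": 0.5, "C": 0.0, "G": 0.0, "T": 0.5},
    {"A": 0.0, "C": 0.3, "G": 0.7, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": 0.1, "T": 0.9},
    {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
]
N = len(INIT)


def forward(obs):
    """Return the forward table; row t-1 holds alpha(t, i) for i = s1..s5."""
    # Base case: alpha(1, i) = e_i(o_1) * pi_i
    alpha = [[INIT[i] * E[i][obs[0]] for i in range(N)]]
    for symbol in obs[1:]:
        prev = alpha[-1]
        # alpha(t, j) = (sum_i alpha(t-1, i) * p_ij) * e_j(o_t)
        alpha.append([
            sum(prev[i] * P[i][j] for i in range(N)) * E[j][symbol]
            for j in range(N)
        ])
    return alpha


def likelihood(obs):
    """Pr(O | theta): the sum of the last row of the forward table."""
    return sum(forward(obs)[-1])


for seq in ("ATGGG", "CCTTT"):
    print(seq, likelihood(seq))
```

Summing the last row gives about 0.01053696 for ATGGG and about 0.0061063632 for CCTTT, which is the comparison behind the conclusion that ATGGG is more likely.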
Backward Algorithm
• Define $\beta(t,i) = \Pr(o_{t+1}, o_{t+2}, \ldots, o_m \mid q_t = s_i, \theta)$
• Finally: $\Pr(O \mid \theta) = \sum_{j=1}^{N} \beta(1,j)\, e_j(o_1)\, \pi_j$

s_i: State i
q_t: State at time t
θ = (π, P, E): Model Parameters
π: Initial, P: Transition, E: Emission
o_t: Observation at time t
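The definition and the final formula are linked by a base case and a recursion. The block below is a sketch of the standard form, written with $p_{ij}$ for the entry of P in row $s_i$ and column $s_j$ (a symbol assumed here; $e_j$ and $\pi_j$ are as in the formula above), N for the number of states and m for the sequence length. It agrees with the worked tables in these notes, where the last row (t = m) is all ones.

```latex
\begin{align*}
\beta(m, i) &= 1 \\
\beta(t, i) &= \sum_{j=1}^{N} p_{ij}\, e_j(o_{t+1})\, \beta(t+1, j),
  \qquad t = m-1, \dots, 1 \\
\Pr(O \mid \theta) &= \sum_{j=1}^{N} \beta(1, j)\, e_j(o_1)\, \pi_j
\end{align*}
```

Each step moves one position to the left: from state $s_i$ at time t, take a transition to $s_j$, emit $o_{t+1}$ there, and continue with whatever $\beta(t+1, j)$ already accounts for.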
β(t,i), O = ATGGG:
t=1   s1: 0.04704×0.7×0.5 = 0.016464
      s2: 0
      s3: 0.008466×0.7×0.9 = 0.00533358
      s4: 0.008466×0.9×0.9 + 0.049272×0.1×0.2 = 0.0078429
      s5: 0.049272×0.2×0.2 = 0.00197088
t=2   s1: 0.0672×0.3×0.7 = 0.014112
      s2: 0.0672×1×0.7 = 0.04704
      s3: 0.0672×0.3×0.7 + 0.0294×0.7×0.1 = 0.01617
      s4: 0.0294×0.9×0.1 + 0.194×0.1×0.3 = 0.008466
      s5: 0.0672×0.8×0.7 + 0.194×0.2×0.3 = 0.049272
t=3   s1: 0.28×0.3×0.7 = 0.0588
      s2: 0.28×1×0.7 = 0.196
      s3: 0.28×0.3×0.7 + 0.12×0.7×0.1 = 0.0672
      s4: 0.12×0.9×0.1 + 0.62×0.1×0.3 = 0.0294
      s5: 0.28×0.8×0.7 + 0.62×0.2×0.3 = 0.194
t=4   s1: 1×0.3×0.7 = 0.21
      s2: 1×1×0.7 = 0.7
      s3: 1×0.3×0.7 + 1×0.7×0.1 = 0.28
      s4: 1×0.9×0.1 + 1×0.1×0.3 = 0.12
      s5: 1×0.8×0.7 + 1×0.2×0.3 = 0.62
t=5   s1: 1   s2: 1   s3: 1   s4: 1   s5: 1
Answer: HMM – Backward Algorithm
Likelihood of CCTTT
= 0.03816477×0.8×0.2
= 0.0061063632 < 0.01053696

β(t,i), O = CCTTT:
t=1   s1: 0.424053×0.3×0.3 = 0.03816477
      s2: 0.424053×1×0.3 = 0.1272159
      s3: 0.424053×0.3×0.3 = 0.03816477
      s4: 0.000064×0.1×0.3 = 0.00000192
      s5: 0.424053×0.8×0.3 + 0.000064×0.2×0.3 = 0.10177656
t=2   s1: 0
      s2: 0
      s3: 0.6731×0.7×0.9 = 0.424053
      s4: 0.6731×0.9×0.9 + 0.0016×0.1×0.2 = 0.545243
      s5: 0.0016×0.2×0.2 = 0.000064
t=3   s1: 0
      s2: 0
      s3: 0.83×0.7×0.9 = 0.5229
      s4: 0.83×0.9×0.9 + 0.04×0.1×0.2 = 0.6731
      s5: 0.04×0.2×0.2 = 0.0016
t=4   s1: 1×0.7×0.5 = 0.35
      s2: 0
      s3: 1×0.7×0.9 = 0.63
      s4: 1×0.9×0.9 + 1×0.1×0.2 = 0.83
      s5: 1×0.2×0.2 = 0.04
t=5   s1: 1   s2: 1   s3: 1   s4: 1   s5: 1

ATGGG is more likely than CCTTT
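The two backward tables can be reproduced with the sketch below, which fills the table from the last time step towards the first and then applies the "Finally" formula. It uses the five-state Init, P and E tables from these notes; the function names `backward` and `likelihood` are illustrative choices, and the recursion used is the standard one implied by the worked entries (a sum over next states of transition × emission × next β).

```python
# Five-state model from the Init, P and E tables (index 0 is s1).
INIT = [0.8, 0.2, 0.0, 0.0, 0.0]
P = [
    [0.0, 0.7, 0.3, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.3, 0.7, 0.0],
    [0.0, 0.0, 0.0, 0.9, 0.1],
    [0.0, 0.0, 0.8, 0.0, 0.2],
]
E = [
    {"A": 0.8, "C": 0.2, "G": 0.0, "T": 0.0},
    {"A": 0.5, "C": 0.0, "G": 0.0, "T": 0.5},
    {"A": 0.0, "C": 0.3, "G": 0.7, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": 0.1, "T": 0.9},
    {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
]
N = len(INIT)


def backward(obs):
    """Return the backward table; row t-1 holds beta(t, i) for i = s1..s5."""
    m = len(obs)
    beta = [[1.0] * N]  # base case: beta(m, i) = 1
    # Walk t = m-1 down to 1; obs[t] is o_{t+1} because obs is 0-based.
    for t in range(m - 1, 0, -1):
        nxt = beta[0]
        row = [
            sum(P[i][j] * E[j][obs[t]] * nxt[j] for j in range(N))
            for i in range(N)
        ]
        beta.insert(0, row)
    return beta


def likelihood(obs):
    """Pr(O | theta) = sum_j beta(1, j) * e_j(o_1) * pi_j."""
    first = backward(obs)[0]
    return sum(first[j] * E[j][obs[0]] * INIT[j] for j in range(N))


for seq in ("ATGGG", "CCTTT"):
    print(seq, likelihood(seq))
```

Both likelihoods come out the same as with the forward algorithm (about 0.01053696 for ATGGG and 0.0061063632 for CCTTT), which is a useful cross-check on a hand-computed table.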
Exercise 3: Hidden Markov Model
3) What are the most likely states for the sequence ACGGT?

π (initial probabilities):
        s1   s2   s3   s4   s5
Init    0.8  0.2  0    0    0

E (emission probabilities; rows s_i, columns O):
        A    C    G    T
s1      0.8  0.2  0    0
s2      0.5  0    0    0.5
s3      0    0.3  0.7  0
s4      0    0    0.1  0.9
s5      0.2  0.3  0.3  0.2
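A question of this kind can be checked with a short Viterbi sketch. It keeps, for every state, the probability of the best path ending there, plus a back-pointer to the previous state on that path, and then traces the pointers back from the best final state. The five-state Init, P and E tables from these notes are assumed; the function name `viterbi` and the tie-breaking rule (lowest state index wins) are choices made for this illustration.

```python
# Five-state model from the Init, P and E tables (index 0 is s1).
INIT = [0.8, 0.2, 0.0, 0.0, 0.0]
P = [
    [0.0, 0.7, 0.3, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.3, 0.7, 0.0],
    [0.0, 0.0, 0.0, 0.9, 0.1],
    [0.0, 0.0, 0.8, 0.0, 0.2],
]
E = [
    {"A": 0.8, "C": 0.2, "G": 0.0, "T": 0.0},
    {"A": 0.5, "C": 0.0, "G": 0.0, "T": 0.5},
    {"A": 0.0, "C": 0.3, "G": 0.7, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": 0.1, "T": 0.9},
    {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
]
N = len(INIT)


def viterbi(obs):
    """Return (probability of the best path, best path as state names)."""
    # Base case: delta(1, i) = e_i(o_1) * pi_i
    delta = [INIT[i] * E[i][obs[0]] for i in range(N)]
    backpointers = []
    for symbol in obs[1:]:
        pointers, new_delta = [], []
        for j in range(N):
            # delta(t, j) = max_i(delta(t-1, i) * p_ij) * e_j(o_t)
            best_i = max(range(N), key=lambda i: delta[i] * P[i][j])
            pointers.append(best_i)
            new_delta.append(delta[best_i] * P[best_i][j] * E[j][symbol])
        backpointers.append(pointers)
        delta = new_delta
    # Trace back from the most probable final state.
    last = max(range(N), key=lambda i: delta[i])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return delta[last], [f"s{i + 1}" for i in path]


prob, path = viterbi("ACGGT")
print(prob, " ".join(path))
```

For ACGGT the sketch ends in s4 with probability about 0.0016003008, the larger of the two non-zero t=5 entries in the worked table that follows, and the traced path is s1 s3 s3 s3 s4.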
$\delta(1,i) = \Pr(q_1 = s_i \text{ and } o_1 \mid \theta) = \Pr(o_1 \mid q_1 = s_i, \theta)\,\Pr(q_1 = s_i \mid \theta) = e_i(o_1)\,\pi_i$

δ(t,i), O = ACGGT:
t=1   s1: 0.8×0.8 = 0.64
      s2: 0.2×0.5 = 0.1
      s3: 0   s4: 0   s5: 0
t=5   s1: 0   s2: 0   s3: 0
      s4: Max(0.00254016×0.7, 0.00084672×0.9)×0.9 = 0.0016003008
      s5: Max(0.00084672×0.1, 0.00012096×0.2)×0.2 = 0.0000169344

E: si A C G T
s1 0.8 0.2 0 0
s2 0.5 0 0 0.5
s3 0 0.3 0.7 0