L08-Uncertainty & Probabilistic Reasoning

The document discusses reasoning under uncertainty and probabilistic reasoning. It explains that uncertainty arises in many situations, such as travel delays or medical diagnoses, and that systems which reason about uncertainty probabilistically can outperform those that do not. The document covers representing uncertainty with probabilities and random variables, conditional probabilities, and Bayes' rule, which allows calculating the probability of a cause given an effect by combining the conditional probabilities of the effect under each possible cause with the causes' prior probabilities. Causal knowledge of this kind copes with changing priors better than purely diagnostic reasoning from effects to causes.


10S3001 – Kecerdasan Buatan (Artificial Intelligence) (Certan)

Samuel I. G. Situmeang, S.TI., M.Sc.


Odd Semester (Semester Gasal), Academic Year 2020/2021
Modified slides from Jeremy Wyatt,
Artificial Intelligence Principles, University of Birmingham, 2008
▪ Reasoning Under Uncertainty

▪ Probabilistic Reasoning Over Time

10S3001-Certan | Gasal 20/21 | Institut Teknologi Del 2



• Why do we need reasoning under uncertainty?


• How should uncertainty be represented?



▪ There are many situations where uncertainty arises:

▪ When you travel you reason about the possibility of delays


▪ When an insurance company offers a policy it has calculated the risk that you will claim
▪ When your brain estimates what an object is it filters random noise and fills in missing details
▪ When you play a game you cannot be certain what the other player will do
▪ A medical expert system that diagnoses disease has to deal with the results of tests that are
sometimes incorrect

▪ Systems which can reason about the effects of uncertainty should do better than those
that don’t

▪ But how should uncertainty be represented?



▪ I have toothache. What is the cause?
There are many possible causes of an observed event.

▪ If I go to the dentist and he examines me, and the probe catches, this indicates there
may be a cavity, rather than another cause.
The likelihood of a hypothesised cause will change as additional pieces of evidence
arrive.

▪ Bob lives in San Francisco. He has a burglar alarm on his house, which can be triggered
by burglars and earthquakes. He has two neighbours, John and Mary, who will call him if
the alarm goes off while he is at work, but each is unreliable in their own way. All these
sources of uncertainty can be quantified. Mary calls, how likely is it that there has been
a burglary?
Using probabilistic reasoning we can calculate how likely a hypothesised cause is.



▪ A random variable can be an observation, outcome or event the value of which is
uncertain.
▪ e.g. A coin. Let’s use Throw as the random variable denoting the outcome when we toss
the coin.

▪ The set of possible outcomes for a random variable is called its domain.
▪ e.g. A coin. The domain of Throw is {head, tail}

▪ A Boolean random variable has two outcomes.


▪ e.g. Cavity has the domain {true, false}
▪ e.g. Toothache has the domain {true, false}



▪ We can create new events out of combinations of the outcomes of random variables.

▪ An atomic event is a complete specification of the values of the random variables of
interest.
▪ e.g. if our world consists of only two Boolean random variables, then there are four
possible atomic events
▪ Toothache = true ∧ Cavity = true
▪ Toothache = true ∧ Cavity = false
▪ Toothache = false ∧ Cavity = true
▪ Toothache = false ∧ Cavity = false
▪ The set of all possible atomic events has two properties:
▪ It is collectively exhaustive (nothing else can happen)
▪ It is mutually exclusive (only one of the four can happen at one time)
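The four atomic events can be enumerated mechanically. A minimal Python sketch (the code is illustrative, not from the slides):

```python
from itertools import product

# Enumerate every atomic event for two Boolean random variables.
# An atomic event assigns a value to each variable of interest.
variables = ["Toothache", "Cavity"]
atomic_events = [dict(zip(variables, values))
                 for values in product([True, False], repeat=len(variables))]

for event in atomic_events:
    print(event)
print(len(atomic_events))  # 4 = 2^2 atomic events
```

Because each atomic event fixes every variable, the events are pairwise distinct (mutually exclusive) and together cover every possible world (collectively exhaustive).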



▪ We can assign probabilities to the outcomes of a random variable.
▪ 𝑃(Throw = heads) = 0.5
▪ 𝑃(Mary_Calls = true) = 0.1
▪ 𝑃(𝑎) = 0.3

▪ Some simple rules governing probabilities:

1. All probabilities are between 0 and 1 inclusive: 0 ≤ 𝑃(𝑎) ≤ 1
2. If something is necessarily true it has probability 1, and if necessarily false, probability 0: 𝑃(true) = 1, 𝑃(false) = 0
3. The probability of a disjunction is: 𝑃(𝑎 ∨ 𝑏) = 𝑃(𝑎) + 𝑃(𝑏) − 𝑃(𝑎 ∧ 𝑏)

▪ From these three laws all of probability theory can be derived.
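The disjunction rule can be sanity-checked on a concrete sample space. A small sketch using a fair six-sided die (the two events chosen here are illustrative):

```python
from fractions import Fraction

# Check P(a or b) = P(a) + P(b) - P(a and b) on a fair die.
# Events: a = "roll is even", b = "roll is at least 4".
outcomes = range(1, 7)
p = Fraction(1, 6)  # each outcome is equally likely

def prob(event):
    """Probability of an event, summed over the outcomes it contains."""
    return sum(p for o in outcomes if event(o))

a = lambda o: o % 2 == 0   # {2, 4, 6}
b = lambda o: o >= 4       # {4, 5, 6}

p_a_or_b = prob(lambda o: a(o) or b(o))  # {2, 4, 5, 6}
assert p_a_or_b == prob(a) + prob(b) - prob(lambda o: a(o) and b(o))
print(p_a_or_b)  # prints 2/3
```

The subtraction of 𝑃(𝑎 ∧ 𝑏) corrects for the outcomes {4, 6} that would otherwise be counted twice.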



▪ We can often intuitively understand the laws of probability by thinking about sets

P(a  b) = P(a ) + P(b) − P(a  b)



▪ A conditional probability expresses the likelihood that one event 𝑎 will occur if 𝑏 occurs.
We denote this as 𝑃(𝑎|𝑏).
▪ e.g.
𝑃(Toothache = true) = 0.2
𝑃(Toothache = true|Cavity = true) = 0.6

▪ So, conditional probabilities reflect the fact that some events make other events more
(or less) likely.

▪ If one event doesn’t affect the likelihood of another event, they are said to be
independent and therefore:
𝑃(𝑎|𝑏) = 𝑃(𝑎)
▪ e.g. if you roll a die, it doesn’t make it more or less likely that you will roll a 6 on the next
throw. The rolls are independent.



Product Rule
▪ How can we work out the likelihood of two events occurring together, given their
base and conditional probabilities?

𝑃(𝑎 ∧ 𝑏) = 𝑃(𝑎|𝑏) 𝑃(𝑏) = 𝑃(𝑏|𝑎) 𝑃(𝑎)

▪ So in our toy example 1:

𝑃(toothache ∧ cavity) = 𝑃(toothache|cavity) 𝑃(cavity)
                      = 𝑃(cavity|toothache) 𝑃(toothache)

▪ But this doesn’t help us answer our question:


“I have toothache. Do I have a cavity?”



Bayes’ rule
▪ We can rearrange the two parts of the product rule:

P(a  b) = P(a | b) P(b) = P(b | a) P(a)


P(a | b) P(b) = P(b | a) P(a)
P(b | a) P(a)
P ( a | b) =
P(b)
▪ This is known as Bayes’ rule.

▪ It is the cornerstone of modern probabilistic AI.

▪ But why is it useful?



▪ We can think about some events as being “hidden” causes: not necessarily directly
observed (e.g. a cavity).

▪ If we model how likely observable effects are given hidden causes (how likely
toothache is given a cavity), then Bayes’ rule allows us to use that model to infer
the likelihood of the hidden cause (and thus answer our question)
𝑃(cause|effect) = 𝑃(effect|cause) 𝑃(cause) / 𝑃(effect)

▪ In fact, good models of 𝑃 effect|cause are often available to us in real domains (e.g.
medical diagnosis)
▪ Suppose a doctor knows that meningitis causes a stiff neck in 50% of cases
𝑃(𝑠|𝑚) = 0.5
▪ She also knows that the probability in the general population of someone having a
stiff neck at any time is 1/20
𝑃(𝑠) = 0.05
▪ She also has to know the incidence of meningitis in the population (1/50,000)
𝑃(𝑚) = 0.00002
▪ Using Bayes’ rule she can calculate the probability the patient has meningitis:

𝑃(𝑚|𝑠) = 𝑃(𝑠|𝑚) 𝑃(𝑚) / 𝑃(𝑠) = (0.5 × 0.00002) / 0.05 = 0.0002 = 1/5000

𝑃(cause|effect) = 𝑃(effect|cause) 𝑃(cause) / 𝑃(effect)
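The calculation above is a one-liner in code. A minimal sketch using the slide's numbers:

```python
# Bayes' rule on the meningitis example: P(m|s) = P(s|m) * P(m) / P(s)
p_s_given_m = 0.5    # stiff neck given meningitis
p_m = 0.00002        # prior incidence of meningitis (1/50,000)
p_s = 0.05           # stiff neck in the general population (1/20)

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)   # ≈ 0.0002, i.e. 1/5000
```

Note how small the posterior stays: the rare prior 𝑃(𝑚) dominates the strong likelihood 𝑃(𝑠|𝑚).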
▪ Why wouldn’t the doctor be better off if she just knew the likelihood of meningitis
given a stiff neck? i.e. information in the diagnostic direction from symptoms to
causes?
▪ Because diagnostic knowledge is often more fragile than causal knowledge

▪ Suppose there was a meningitis epidemic? The rate of meningitis goes up 20 times
within a group
𝑃(𝑚|𝑠) = 𝑃(𝑠|𝑚) 𝑃(𝑚) / 𝑃(𝑠) = (0.5 × 0.0004) / 0.05 = 0.004 = 1/250

▪ The causal model 𝑃(𝑠|𝑚) is unaffected by the change in 𝑃(𝑚), whereas the
diagnostic model 𝑃(𝑚|𝑠) = 1/5000 is now badly wrong.
▪ If we know 𝑃(effect|cause) for every cause, we can avoid having to know 𝑃(effect):

𝑃(𝑐|𝑒) = 𝑃(𝑒|𝑐) 𝑃(𝑐) / 𝑃(𝑒) = 𝑃(𝑒|𝑐) 𝑃(𝑐) / Σ_{ℎ∈Causes} 𝑃(𝑒|ℎ) 𝑃(ℎ)

▪ Suppose there are two possible causes of a stiff neck, meningitis (𝑚) and not meningitis (¬𝑚):

𝑃(𝑀𝑒𝑛𝑖𝑛𝑔𝑖𝑡𝑖𝑠|𝑠) = α ⟨𝑃(𝑠|𝑚) 𝑃(𝑚), 𝑃(𝑠|¬𝑚) 𝑃(¬𝑚)⟩

▪ We simply calculate the top line for each hypothesis and then normalise (divide by the sum of the top line over
all hypotheses).

▪ But sometimes it’s harder to find out 𝑃(effect|cause) for all causes independently than it is simply to find
out 𝑃(effect).

▪ Note that Bayes’ rule here relies on the fact that the effect must have arisen from one of the
hypothesised causes. You can’t reason directly about causes you haven’t imagined.
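A sketch of this normalisation for the stiff-neck example. The slides don't give 𝑃(𝑠|¬𝑚), so here it is backed out from the assumption that the overall 𝑃(𝑠) = 0.05 (that derived value is an assumption of this sketch):

```python
# Normalisation over two hypotheses: meningitis (m) and not-meningitis (not_m).
p_m = 0.00002
p_not_m = 1 - p_m
p_s_given_m = 0.5
p_s = 0.05
# Assumed: choose P(s|not m) so that the marginal P(s) comes out to 0.05.
p_s_given_not_m = (p_s - p_s_given_m * p_m) / p_not_m

# "Top line" P(s|h) P(h) for each hypothesis h, then normalise.
top = {"m": p_s_given_m * p_m, "not_m": p_s_given_not_m * p_not_m}
total = sum(top.values())            # by construction this equals P(s)
posterior = {h: v / total for h, v in top.items()}
print(posterior["m"])                # ≈ 0.0002, matching the direct Bayes calculation
```

The normalising sum plays exactly the role of 𝑃(𝑒) in the denominator, so it never has to be supplied separately.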



▪ Suppose we have several pieces of evidence we want to combine:
▪ John rings and Mary rings
▪ I have toothache and the dental probe catches on my tooth

▪ How do we do this?

𝑃(cavity|toothache ∧ catch) = α 𝑃(toothache ∧ catch|cavity) 𝑃(cavity)

▪ As we have more effects our causal model becomes very complicated (for 𝑁 binary
effects there are 2^𝑁 different combinations of evidence that we need to model
given a cause):

⟨𝑃(toothache ∧ catch|cavity), 𝑃(toothache ∧ ¬catch|cavity), …⟩
▪ In many practical applications there are not a few evidence variables but hundreds

▪ Thus 2^𝑁 is very big

▪ This nearly led everyone to give up and rely on approximate or qualitative methods for reasoning about
uncertainty

▪ But conditional independence helps

▪ Toothache and catch are not independent, but they are independent given the presence or absence of a
cavity.

▪ In other words we can use the knowledge that cavities cause toothache and they cause the catch, but
the catch and the toothache do not cause each other (they have a single common cause).



▪ This can be captured in a picture, where the arcs capture conditional independence relationships

▪ Or in a new equation:

𝑃(cavity|toothache ∧ catch) = α 𝑃(toothache ∧ catch|cavity) 𝑃(cavity)
                            = α 𝑃(toothache|cavity) 𝑃(catch|cavity) 𝑃(cavity)

▪ Using conditional independence the causal model is much more compact. Rather than the number
of parameters being O(2^𝑁), it is O(𝑁), where 𝑁 is the number of effects (or evidence variables)
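This factored computation can be sketched in a few lines. All the numbers below are illustrative assumptions; only the structure, 𝑃(𝑒1|𝑐) 𝑃(𝑒2|𝑐) 𝑃(𝑐) followed by normalisation, is the point:

```python
# Two-evidence diagnosis using conditional independence given Cavity.
# The probabilities here are assumed, not taken from the slides.
p_cavity = 0.2
p_tooth_given = {True: 0.6, False: 0.1}   # P(toothache | Cavity)
p_catch_given = {True: 0.9, False: 0.2}   # P(catch | Cavity)

def posterior_cavity(toothache, catch):
    """P(Cavity=true | evidence) via alpha * P(e1|c) P(e2|c) P(c)."""
    scores = {}
    for cav, prior in ((True, p_cavity), (False, 1 - p_cavity)):
        pt = p_tooth_given[cav] if toothache else 1 - p_tooth_given[cav]
        pc = p_catch_given[cav] if catch else 1 - p_catch_given[cav]
        scores[cav] = pt * pc * prior
    alpha = 1 / sum(scores.values())      # normalisation constant
    return alpha * scores[True]

print(posterior_cavity(True, True))       # both symptoms present
```

Only 2 numbers per evidence variable (plus the prior) are needed, instead of a table over all 2^𝑁 evidence combinations.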



▪ It turns out that your brain understands Bayes rule!

Reproduced without permission from Kording & Wolpert, Nature Vol 427, 15 Jan 2004

▪ The human has to move their (hidden) hand to a target. Half way through the movement
they are given an uncertain clue as to where the hand is. The position their hand moves
to by the end is consistent with the brain using Bayes’ rule to combine the information it
receives
https://mitpress.mit.edu/books/bayesian-brain



▪ But it is not apparent that Bayes’ rule is used everywhere

▪ How will rotation of one wheel affect the other?

▪ Some kinds of inference don’t seem to be obviously explainable using probabilistic
reasoning alone




• Markov Chain
• Hidden Markov Model



What is the word at the end of this ________?



[Figure: a Markov chain over the words of the question “What is the word at the end of this ____?”.
From the state “this”, transitions lead to “sentence” (𝑃 = 0.6), “message” (𝑃 = 0.3),
“paragraph” (𝑃 = 0.05), and “line” (𝑃 = 0.05).]
▪ Design a Markov Chain to predict the weather of
tomorrow using previous information of the past days.

▪ We have three types of weather: 𝑠𝑢𝑛𝑛𝑦, 𝑟𝑎𝑖𝑛𝑦, and 𝑐𝑙𝑜𝑢𝑑𝑦.
▪ So, our model has 3 states: 𝑆 = {𝑆1, 𝑆2, 𝑆3}, and the name of each state is
𝑆1 = 𝑆𝑢𝑛𝑛𝑦, 𝑆2 = 𝑅𝑎𝑖𝑛𝑦, 𝑆3 = 𝐶𝑙𝑜𝑢𝑑𝑦.

▪ Assume that the weather lasts all day, i.e. it doesn’t change from rainy to sunny in
the middle of the day.



▪ Assume a simplified model of weather prediction:
▪ Collect statistics on what the weather was like
today based on what the weather was like yesterday,
the day before, and so forth to collect the
following probabilities:

▪ 𝑃(𝑤𝑛|𝑤𝑛−1, 𝑤𝑛−2, …, 𝑤1)

▪ With this expression, we can give probabilities of types of weather for tomorrow and
the next day using 𝑛 days of history.



▪ For example, if the past three days were {𝑠𝑢𝑛𝑛𝑦, 𝑠𝑢𝑛𝑛𝑦, 𝑐𝑙𝑜𝑢𝑑𝑦}, the probability that
tomorrow would be rainy is given by:
▪ 𝑃(𝑤4 = 𝑅𝑎𝑖𝑛𝑦|𝑤3 = 𝐶𝑙𝑜𝑢𝑑𝑦, 𝑤2 = 𝑆𝑢𝑛𝑛𝑦, 𝑤1 = 𝑆𝑢𝑛𝑛𝑦)

▪ Problem: the larger 𝑛 is, the more statistics we must collect. Suppose 𝑛 = 5; then we
must collect statistics for 3^5 = 243 past histories. Therefore, the Markov Assumption:
▪ In a sequence 𝑤1, 𝑤2, …, 𝑤𝑛:

𝑃(𝑤𝑛|𝑤𝑛−1, 𝑤𝑛−2, …, 𝑤1) ≈ 𝑃(𝑤𝑛|𝑤𝑛−1)



▪ 𝑃(𝑤𝑛|𝑤𝑛−1, 𝑤𝑛−2, …, 𝑤1) ≈ 𝑃(𝑤𝑛|𝑤𝑛−1) is called a first-order Markov assumption, since
we say that the probability of an observation at time 𝑛 depends only on the
observation at time 𝑛 − 1.

▪ A second-order Markov assumption would have the observation at time 𝑛 depend on
the observations at times 𝑛 − 1 and 𝑛 − 2.

▪ We can express the joint probability using the Markov assumption:

𝑃(𝑤1, …, 𝑤𝑛) = ∏_{𝑖=1}^{𝑛} 𝑃(𝑤𝑖|𝑤𝑖−1)

(with the convention 𝑃(𝑤1|𝑤0) = 𝑃(𝑤1))



▪ Let’s arbitrarily pick some numbers for 𝑃 𝑤𝑡𝑜𝑚𝑜𝑟𝑟𝑜𝑤 |𝑤𝑡𝑜𝑑𝑎𝑦 expressed in Table 1.

Table 1. Probabilities of tomorrow’s weather based on today’s weather

                      Tomorrow’s Weather
                      Sunny   Rainy   Cloudy
Today’s    Sunny      0.8     0.05    0.15
Weather    Rainy      0.2     0.6     0.2
           Cloudy     0.2     0.3     0.5

▪ For first-order Markov models, we can use these probabilities to draw a probabilistic
finite state automaton.



▪ For the weather domain, you would have three states (Sunny, Rainy, Cloudy) and
every day you would transition to a (possibly) new state based on the probabilities
in Table 1.
▪ Such an automaton would look like this:
𝑃(𝑆𝑢𝑛𝑛𝑦|𝑆𝑢𝑛𝑛𝑦) = 0.8
𝑃(𝑅𝑎𝑖𝑛𝑦|𝑆𝑢𝑛𝑛𝑦) = 0.05
𝑃(𝐶𝑙𝑜𝑢𝑑𝑦|𝑆𝑢𝑛𝑛𝑦) = 0.15

𝑃(𝑆𝑢𝑛𝑛𝑦|𝑅𝑎𝑖𝑛𝑦) = 0.2
𝑃(𝑅𝑎𝑖𝑛𝑦|𝑅𝑎𝑖𝑛𝑦) = 0.6
𝑃(𝐶𝑙𝑜𝑢𝑑𝑦|𝑅𝑎𝑖𝑛𝑦) = 0.2

𝑃(𝑆𝑢𝑛𝑛𝑦|𝐶𝑙𝑜𝑢𝑑𝑦) = 0.2
𝑃(𝑅𝑎𝑖𝑛𝑦|𝐶𝑙𝑜𝑢𝑑𝑦) = 0.3
𝑃(𝐶𝑙𝑜𝑢𝑑𝑦|𝐶𝑙𝑜𝑢𝑑𝑦) = 0.5

(each group of outgoing transition probabilities sums to 1)
▪ Exercise 1
▪ Given that today is Sunny, what’s the probability that tomorrow is Sunny and the day after
is Rainy?
𝑃(𝑤2, 𝑤3|𝑤1) = 𝑃(𝑤3|𝑤2, 𝑤1) 𝑃(𝑤2|𝑤1)
             = 𝑃(𝑤3|𝑤2) 𝑃(𝑤2|𝑤1)
             = 𝑃(𝑅𝑎𝑖𝑛𝑦|𝑆𝑢𝑛𝑛𝑦) 𝑃(𝑆𝑢𝑛𝑛𝑦|𝑆𝑢𝑛𝑛𝑦)
             = (0.05)(0.8)
             = 0.04



▪ Exercise 2
▪ Given that today is Cloudy, what’s the probability that it will be Rainy two days from now?
There are three ways to get from Cloudy today to Rainy two days from now:
{Cloudy, Cloudy, Rainy}, {Cloudy, Rainy, Rainy}, and {Cloudy, Sunny, Rainy}.

𝑃(𝑤3 = 𝑅|𝑤1 = 𝐶) = 𝑃(𝑤2 = 𝐶, 𝑤3 = 𝑅|𝑤1 = 𝐶) + 𝑃(𝑤2 = 𝑅, 𝑤3 = 𝑅|𝑤1 = 𝐶) + 𝑃(𝑤2 = 𝑆, 𝑤3 = 𝑅|𝑤1 = 𝐶)
= 𝑃(𝑤3 = 𝑅|𝑤2 = 𝐶) 𝑃(𝑤2 = 𝐶|𝑤1 = 𝐶) + 𝑃(𝑤3 = 𝑅|𝑤2 = 𝑅) 𝑃(𝑤2 = 𝑅|𝑤1 = 𝐶) + 𝑃(𝑤3 = 𝑅|𝑤2 = 𝑆) 𝑃(𝑤2 = 𝑆|𝑤1 = 𝐶)
= (0.3)(0.5) + (0.6)(0.3) + (0.05)(0.2)
= 0.34
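Both exercises can be checked directly against the transition probabilities of Table 1:

```python
# Transition probabilities from Table 1: T[today][tomorrow].
T = {
    "Sunny":  {"Sunny": 0.8, "Rainy": 0.05, "Cloudy": 0.15},
    "Rainy":  {"Sunny": 0.2, "Rainy": 0.6,  "Cloudy": 0.2},
    "Cloudy": {"Sunny": 0.2, "Rainy": 0.3,  "Cloudy": 0.5},
}

# Exercise 1: today Sunny -> tomorrow Sunny -> day after Rainy.
ex1 = T["Sunny"]["Sunny"] * T["Sunny"]["Rainy"]
print(ex1)  # ≈ 0.04

# Exercise 2: today Cloudy, Rainy two days from now (sum over tomorrow).
ex2 = sum(T["Cloudy"][mid] * T[mid]["Rainy"] for mid in T)
print(ex2)  # ≈ 0.34
```

The sum in Exercise 2 is exactly the two-step transition probability, i.e. one entry of the squared transition matrix.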
▪ A Markov Model is a stochastic model which models temporal or sequential data, i.e.,
data that are ordered.

▪ It provides a way to model the dependencies of current information (e.g. weather) with
previous information.

▪ It is composed of states, a transition scheme between states, and emission of outputs
(discrete or continuous).

▪ Several goals can be accomplished by using Markov models:


▪ Learn statistics of sequential data.
▪ Do prediction or estimation.
▪ Recognize patterns.



What makes a Hidden Markov Model?



▪ A Hidden Markov Model is a stochastic model where the states of the model are
hidden. Each state can emit an output which is observed.

▪ Imagine: You were locked in a room for several days, and you were asked about the
weather outside. The only piece of evidence you have is whether the person who
comes into the room carrying your daily meal is carrying an umbrella or not.
▪ What is hidden? Sunny, Rainy, Cloudy
▪ What can you observe? Umbrella or Not



[Figures: left, a Markov Chain over the weather states; right, the corresponding Hidden
Markov Model, in which each hidden state emits an observation. U = Umbrella, NU = Not Umbrella]



▪ Let’s assume that 𝑡 days had passed. Therefore, we will have an observation
sequence 𝑂 = 𝑜1, …, 𝑜𝑡, where 𝑜𝑖 ∈ {𝑈𝑚𝑏𝑟𝑒𝑙𝑙𝑎, 𝑁𝑜𝑡 𝑈𝑚𝑏𝑟𝑒𝑙𝑙𝑎}.

▪ Each observation comes from an unknown state. Therefore, we will also have an
unknown state sequence 𝑊 = 𝑤1, …, 𝑤𝑡, where 𝑤𝑖 ∈ {𝑆𝑢𝑛𝑛𝑦, 𝑅𝑎𝑖𝑛𝑦, 𝐶𝑙𝑜𝑢𝑑𝑦}.

▪ We would like to know: 𝑃 𝑤1 , … , 𝑤𝑡 |𝑜1 , … , 𝑜𝑡 .



▪ From Bayes’ Theorem, we can obtain the probability for a particular day as:

𝑃(𝑤𝑖|𝑜𝑖) = 𝑃(𝑜𝑖|𝑤𝑖) 𝑃(𝑤𝑖) / 𝑃(𝑜𝑖)

▪ For a sequence of length 𝑡:

𝑃(𝑤1, …, 𝑤𝑡|𝑜1, …, 𝑜𝑡) = 𝑃(𝑜1, …, 𝑜𝑡|𝑤1, …, 𝑤𝑡) 𝑃(𝑤1, …, 𝑤𝑡) / 𝑃(𝑜1, …, 𝑜𝑡)



▪ From the Markov property:

𝑃(𝑤1, …, 𝑤𝑡) = ∏_{𝑖=1}^{𝑡} 𝑃(𝑤𝑖|𝑤𝑖−1)

▪ Independent observations assumption:

𝑃(𝑜1, …, 𝑜𝑡|𝑤1, …, 𝑤𝑡) = ∏_{𝑖=1}^{𝑡} 𝑃(𝑜𝑖|𝑤𝑖)



▪ Thus:
▪ 𝑃(𝑤1, …, 𝑤𝑡|𝑜1, …, 𝑜𝑡) ∝ ∏_{𝑖=1}^{𝑡} 𝑃(𝑜𝑖|𝑤𝑖) ∏_{𝑖=1}^{𝑡} 𝑃(𝑤𝑖|𝑤𝑖−1)

HMM Parameters:
• Transition probabilities 𝑃 𝑤𝑖 |𝑤𝑖−1
• Emission probabilities 𝑃 𝑜𝑖 |𝑤𝑖
• Initial state probabilities 𝑃 𝑤𝑖
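With these three parameter sets fixed, the posterior can be evaluated by brute force over all state sequences. In the sketch below the transition numbers follow Table 1, while the emission and initial probabilities are assumed for illustration (the slides give no numbers for them):

```python
from itertools import product

states = ["Sunny", "Rainy", "Cloudy"]
# Transition probabilities from Table 1: T[prev][cur].
T = {"Sunny":  {"Sunny": 0.8, "Rainy": 0.05, "Cloudy": 0.15},
     "Rainy":  {"Sunny": 0.2, "Rainy": 0.6,  "Cloudy": 0.2},
     "Cloudy": {"Sunny": 0.2, "Rainy": 0.3,  "Cloudy": 0.5}}
E = {"Sunny": 0.1, "Rainy": 0.8, "Cloudy": 0.3}    # P(Umbrella | state), assumed
pi = {"Sunny": 1/3, "Rainy": 1/3, "Cloudy": 1/3}   # uniform prior, assumed

def score(seq, obs):
    """Unnormalised P(w|o): prod P(o_i|w_i) * prod P(w_i|w_i-1)."""
    p = pi[seq[0]] * (E[seq[0]] if obs[0] else 1 - E[seq[0]])
    for prev, cur, o in zip(seq, seq[1:], obs[1:]):
        p *= T[prev][cur] * (E[cur] if o else 1 - E[cur])
    return p

obs = [True, True, False]  # umbrella, umbrella, no umbrella
best = max(product(states, repeat=len(obs)), key=lambda s: score(s, obs))
print(best)
```

Enumerating all 3^𝑡 sequences only works for tiny 𝑡; in practice dynamic programming (the Viterbi algorithm) finds the best sequence efficiently.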



▪ HMM is governed by the following parameters:
𝜆 = (𝐴, 𝐵, 𝜋)
▪ State-transition probability matrix A
▪ Emission/Observation/State Conditional Output probabilities 𝐵
▪ Initial (prior) state probabilities 𝜋

▪ Determine the fixed number of states (𝑁): 𝑆 = {𝑠1, …, 𝑠𝑁}



▪ State-transition probability: 𝑎𝑖𝑗 = 𝑃(𝑞𝑡+1 = 𝑠𝑗|𝑞𝑡 = 𝑠𝑖), the probability of moving from
state 𝑠𝑖 to state 𝑠𝑗; these entries form the matrix 𝐴.



▪ Emission probability distribution: A state will generate an observation (output), but
a decision must be taken on how to model the output, i.e., as discrete or
continuous.

▪ Discrete outputs are modeled using pmfs (probability mass function).

▪ Continuous outputs are modeled using pdfs (probability density function).



▪ Discrete Emission Probabilities: 𝑏𝑗(𝑘) = 𝑃(𝑜𝑡 = 𝑣𝑘|𝑞𝑡 = 𝑠𝑗), the probability of observing
symbol 𝑣𝑘 while in state 𝑠𝑗; these entries form 𝐵.



▪ Initial (prior) probabilities: these are the probabilities of starting the observation
sequence in state 𝑠𝑖, i.e. 𝜋𝑖 = 𝑃(𝑞1 = 𝑠𝑖).



▪ Problem 1: Probability Evaluation

▪ How do we compute the probability that the observed sequence was produced by the
model?
▪ Scoring how well a given model matches a given observation sequence.



▪ Problem 2: Optimal State Sequence

▪ Attempt to uncover the hidden part of the model, that is, to find the “correct” state
sequence.
▪ For practical situations, we usually use an optimality criterion to solve this problem as best
as possible.



▪ Problem 3: Parameter Estimation

▪ Attempt to optimize the model parameters to best describe how a given observation
sequence comes about.
▪ The observation sequence used to adjust the model parameters is called a training
sequence because it is used to “train” the HMM.



[Figure: the coin-switching HMM example from MathWorks.
𝑃(𝐻|𝑅𝑒𝑑 𝐶𝑜𝑖𝑛) = 0.9    𝑃(𝐻|𝐺𝑟𝑒𝑒𝑛 𝐶𝑜𝑖𝑛) = 0.95
𝑃(𝑇|𝑅𝑒𝑑 𝐶𝑜𝑖𝑛) = 0.1    𝑃(𝑇|𝐺𝑟𝑒𝑒𝑛 𝐶𝑜𝑖𝑛) = 0.05
𝑂𝑢𝑡𝑝𝑢𝑡𝑠 = {1,2,3,4,5,6}; example output sequence = {1,1,1,1,1,1,1,2,3,4,5,6}]
http://www.mathworks.com/help/stats/hidden-markov-models-hmm.html
▪ Ergodic

▪ Left-right

▪ Parallel path left-right



▪ Features
▪ Field descriptor

▪ Edge descriptor

▪ Grass and sand

▪ Player height

P. Chang, M. Han, and Y. Gong. “Extract Highlights From Baseball Game Video With Hidden Markov Models,” In Proc. of ICIP, vol. 1, pp. 609-612, 2002.
▪ Reasoning under uncertainty is an important area of AI
▪ It is not the case that statistical methods are the only way
▪ Logics can also cope with uncertainty in a qualitative way
▪ But statistical methods, and particularly Bayesian reasoning, have become a
cornerstone of modern AI (rightly or wrongly)
▪ A Markov Model is a stochastic model which models temporal or sequential data,
i.e., data that are ordered.
▪ Hidden Markov Model (HMM) is a statistical Markov model in which the system being
modeled is assumed to be a Markov process with unobservable (i.e. hidden) states.



▪ S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach (4th Edition), Prentice Hall International, 2020.

▪ Chapter 12. Quantifying Uncertainty

▪ Chapter 13. Probabilistic Reasoning

▪ Chapter 14. Probabilistic Reasoning over Time

▪ E. Fosler-Lussier, “Markov Models and Hidden Markov Models: A Brief Tutorial,” International Computer Science Institute, 1998.

▪ L. R. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proceedings of the IEEE,
vol. 77, no. 2, pp. 257-286, Feb 1989.

▪ J. R. Deller, J. G. Proakis, and J. H. L. Hansen, “Discrete-Time Processing of Speech Signals,” Prentice Hall, New Jersey, 1987.

▪ B. Resch (modified by E. Rank and M. Magimai-Doss), “Hidden Markov Models: A Tutorial for the Course
Computational Intelligence.”

▪ H. Stark and J. W. Woods, “Probability and Random Processes with Applications to Signal Processing (3rd Edition),”
Prentice Hall, August 2001.


