
Vietnam National University of HCMC

International University
School of Computer Science and Engineering

INTRODUCTION TO ARTIFICIAL INTELLIGENCE


(IT097IU)
LECTURE 05: REASONING UNDER UNCERTAINTY

Instructor: Nguyen Trung Ky


Our Status in Intro to AI
 We’re done with Part I: Intelligent Agents and Search!

 Part II: Reasoning Under Uncertainty and Machine Learning
 Diagnosis
 Speech recognition
 Tracking objects
 Robot mapping
 Genetics
 Spell corrector
 … lots more!
Inference in Ghostbusters
 A ghost is in the grid somewhere
 Sensor readings tell how close a square is to the ghost
    On the ghost: red
    1 or 2 away: orange
    3 or 4 away: yellow
    5+ away: green

 Sensors are noisy, but we know P(Color | Distance)

[Demo: Ghostbuster – no probability (L12D1) ]


Video of Demo Ghostbuster – No probability
Uncertainty
 General situation:
 Observed variables (evidence): Agent knows certain things
about the state of the world (e.g., sensor readings or
symptoms)
 Unobserved variables: Agent needs to reason about other
aspects (e.g. where an object is or what disease is present)
 Model: Agent knows something about how the known
variables relate to the unknown variables

 Reasoning under uncertainty:
    A rational agent is one that makes rational decisions in order to maximize its performance measure
    A rational decision depends on the likelihood that, and the degree to which, the agent’s goals will be achieved
    Probability theory is the main tool for handling degrees of belief and uncertainty
Today
 Probability
 Random Variables and Events
 Joint and Marginal Distributions
 Conditional Distribution
 Product Rule, Chain Rule, Bayes’ Rule
 Inference
 Independence

 You’ll need all this stuff A LOT for the next few weeks, so make sure you go over it now!
Random Variables
 A random variable is some aspect of the world about
which we (may) have uncertainty
 L = Where is the ghost?
 R = Is it raining?
 T = Is it hot or cold?
 D = How long will it take to drive to work?

 We denote random variables with capital letters

 Like variables in a CSP, random variables have domains
    L in possible locations, maybe {(0,0), (0,1), …}
    R in {true, false} (often written as {+r, -r})
    T in {hot, cold}
    D in [0, ∞)
Probability Distributions
 Associate a probability with each value

 Temperature:                Weather:

    T      P                    W        P
    hot    0.5                  sun      0.6
    cold   0.5                  rain     0.1
                                fog      0.3
                                meteor   0.0
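As a minimal Python sketch (added for illustration, not part of the original slides), each table can be stored as a dictionary from value to probability; the assertions check that every entry is non-negative and that each table sums to one.

```python
# Each distribution is a dict mapping a value to its probability.
P_T = {"hot": 0.5, "cold": 0.5}
P_W = {"sun": 0.6, "rain": 0.1, "fog": 0.3, "meteor": 0.0}

for dist in (P_T, P_W):
    assert all(p >= 0 for p in dist.values())        # probabilities are non-negative
    assert abs(sum(dist.values()) - 1.0) < 1e-9      # and sum to one
```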
Probability Distributions
 Unobserved random variables have distributions

    T      P              W        P
    hot    0.5            sun      0.6
    cold   0.5            rain     0.1
                          fog      0.3
                          meteor   0.0

 Shorthand notation: P(hot) = P(T = hot), P(rain) = P(W = rain), …
    OK if all domain entries are unique

 A distribution is a TABLE of probabilities of values

 A probability (lower case value) is a single number, e.g. P(W = rain) = 0.1

 Must have: P(X = x) ≥ 0 for every x, and Σx P(X = x) = 1


Joint Probability Distributions
 A joint probability distribution (JPD) over a set of random variables X1, …, Xn
   specifies a real number for each assignment (or outcome): P(X1 = x1, …, Xn = xn)

    T      W      P
    hot    sun    0.4
    hot    rain   0.1
    cold   sun    0.2
    cold   rain   0.3

 Must obey: P(x1, …, xn) ≥ 0, and the sum over all assignments is 1

 Size of distribution if n variables with domain sizes d?  d^n entries

 For all but the smallest distributions, impractical to write out!
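A small sketch (added, not from the slides) of the joint table above as a dict keyed by complete assignments; `P_TW` is an ad hoc name.

```python
# Joint distribution P(T, W): keys are complete assignments (t, w).
P_TW = {
    ("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2, ("cold", "rain"): 0.3,
}

assert abs(sum(P_TW.values()) - 1.0) < 1e-9   # entries must sum to one

# With n variables of domain size d, the table has d**n rows; here n = 2, d = 2.
assert len(P_TW) == 2 ** 2
```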
Events
 An event is a set E of outcomes: P(E) = sum over the outcomes (x1, …, xn) in E of P(x1, …, xn)

    T      W      P
    hot    sun    0.4
    hot    rain   0.1
    cold   sun    0.2
    cold   rain   0.3

 From a joint distribution, we can calculate the probability of any event
    Probability that it’s hot AND sunny?
    Probability that it’s hot?
    Probability that it’s hot OR sunny?

 Typically, the events we care about are partial assignments, like P(T = hot)
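As an added illustration (not from the slides), the three event probabilities above can be computed by summing rows of the joint table; `prob` and `P_TW` are ad hoc names.

```python
P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

def prob(event):
    """P(E): add up the joint probabilities of the outcomes in the event E."""
    return sum(p for outcome, p in P_TW.items() if outcome in event)

print(prob({("hot", "sun")}))                                     # P(hot AND sunny) = 0.4
print(prob({o for o in P_TW if o[0] == "hot"}))                   # P(hot)           = 0.5
print(prob({o for o in P_TW if o[0] == "hot" or o[1] == "sun"}))  # P(hot OR sunny)  = 0.7 (up to float rounding)
```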
Quiz: Events
    X     Y     P
    +x    +y    0.2
    +x    -y    0.3
    -x    +y    0.4
    -x    -y    0.1

 P(+x, +y) ?
 P(+x) ?
 P(-y OR +x) ?
Marginal Distributions
 Marginal distributions are sub-tables which eliminate variables
 Marginalization (summing out): Combine collapsed rows by adding

    T      W      P              T      P
    hot    sun    0.4            hot    0.5
    hot    rain   0.1            cold   0.5
    cold   sun    0.2
    cold   rain   0.3            W      P
                                 sun    0.6
                                 rain   0.4
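A short sketch (added, not from the slides) of marginalization by summing out; `marginal` is a hypothetical helper name.

```python
from collections import defaultdict

P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

def marginal(joint, index):
    """Sum out every variable except the one at `index` in each assignment tuple."""
    out = defaultdict(float)
    for assignment, p in joint.items():
        out[assignment[index]] += p
    return dict(out)

print(marginal(P_TW, 0))   # P(T): {'hot': 0.5, 'cold': 0.5}
print(marginal(P_TW, 1))   # P(W): {'sun': 0.6, 'rain': 0.4} (up to float rounding)
```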
Quiz: Marginal Distributions

    X     Y     P            X      P
    +x    +y    0.2          +x
    +x    -y    0.3          -x
    -x    +y    0.4
    -x    -y    0.1          Y      P
                             +y
                             -y
Conditional Probabilities
 A simple relation between joint and conditional probabilities
 In fact, this is taken as the definition of a conditional probability:

    P(a | b) = P(a, b) / P(b)

               W = sun   W = rain   P(T)
    T = hot      0.4       0.1       0.5
    T = cold     0.2       0.3       0.5
    P(W)         0.6       0.4        1

    T      W      P
    hot    sun    0.4
    hot    rain   0.1
    cold   sun    0.2
    cold   rain   0.3
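A sketch (added for illustration) of the definition above applied to the joint table; the helper `p` is hypothetical.

```python
P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

def p(t=None, w=None):
    """Probability of a (possibly partial) assignment under the joint table."""
    return sum(prob for (tv, wv), prob in P_TW.items()
               if (t is None or tv == t) and (w is None or wv == w))

# P(W = sun | T = cold) = P(cold, sun) / P(cold) = 0.2 / 0.5 = 0.4
print(p(t="cold", w="sun") / p(t="cold"))
```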
Conditional Distributions
 Conditional distributions are probability distributions over some variables given fixed values of others

 Joint distribution P(T, W):
    T      W      P
    hot    sun    0.4
    hot    rain   0.1
    cold   sun    0.2
    cold   rain   0.3

 Conditional distributions:
    P(W | T = hot):    sun 0.8, rain 0.2
    P(W | T = cold):   sun 0.4, rain 0.6
Quiz: Conditional Probabilities
    X     Y     P
    +x    +y    0.2
    +x    -y    0.3
    -x    +y    0.4
    -x    -y    0.1

              Y = +y   Y = -y   P(X)
    X = +x      0.2      0.3     0.5
    X = -x      0.4      0.1     0.5
    P(Y)        0.6      0.4      1

 P(+x | +y) ?
 P(-x | +y) ?
 P(-y | +x) ?
Normalization Trick
 A trick to get a whole conditional distribution at once:
 Select the joint probabilities matching the evidence
 Normalize the selection (make it sum to one)
    T      W      P               T      W      P                T      P
    hot    sun    0.4   Select    hot    rain   0.1   Normalize  hot    0.25
    hot    rain   0.1   ------->  cold   rain   0.3   ---------> cold   0.75
    cold   sun    0.2
    cold   rain   0.3
 Why does this work? Sum of selection is P(evidence)! (P(r), here)
To Normalize
 (Dictionary) To bring or restore to a normal condition
    Here: all entries sum to ONE
 Procedure:
 Step 1: Compute Z = sum over all entries
 Step 2: Divide every entry by Z

 Example 1 (Z = 0.5):
    W      P                 W      P
    sun    0.2   Normalize   sun    0.4
    rain   0.3   -------->   rain   0.6

 Example 2 (Z = 50):
    T      W      P                 T      W      P
    hot    sun    20                hot    sun    0.4
    hot    rain    5    Normalize   hot    rain   0.1
    cold   sun    10    -------->   cold   sun    0.2
    cold   rain   15                cold   rain   0.3
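A minimal `normalize` sketch (added, not from the slides) reproducing both examples above.

```python
def normalize(table):
    """Step 1: Z = sum over all entries.  Step 2: divide every entry by Z."""
    z = sum(table.values())
    return {k: v / z for k, v in table.items()}

print(normalize({"sun": 0.2, "rain": 0.3}))                      # Example 1: Z = 0.5
print(normalize({("hot", "sun"): 20, ("hot", "rain"): 5,
                 ("cold", "sun"): 10, ("cold", "rain"): 15}))    # Example 2: Z = 50
```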
Probabilistic Models
 A probabilistic model is a joint distribution over a set of variables

 Inference: given a joint distribution, we can reason about unobserved variables given observations (evidence)

 General form of a query: P(Q | e1, …, ek)

 This conditional distribution is called a posterior distribution or the belief function of an agent which uses this model
Probabilistic Inference
 Probabilistic inference: compute a desired probability
from other known probabilities (e.g. conditional from
joint)

 We generally compute conditional probabilities


 P(on time | no reported accidents) = 0.90
 These represent the agent’s beliefs given the evidence

 Probabilities change with new evidence:


 P(on time | no accidents, 5 a.m.) = 0.95
 P(on time | no accidents, 5 a.m., raining) = 0.80
 Observing new evidence causes beliefs to be updated
Inference by Enumeration
 General case:
    Evidence variables:  E1 … Ek = e1 … ek
    Query* variable:     Q
    Hidden variables:    H1 … Hr
    (together, these are all the variables)
 We want: P(Q | e1 … ek)
    (* works fine with multiple query variables, too)

 Step 1: Select the entries consistent with the evidence
 Step 2: Sum out H to get the joint of Query and evidence:
    P(Q, e1, …, ek) = sum over h1 … hr of P(Q, h1, …, hr, e1, …, ek)
 Step 3: Normalize:
    P(Q | e1, …, ek) = P(Q, e1, …, ek) / Σq P(q, e1, …, ek)
Inference by Enumeration
    S        T      W      P
    summer   hot    sun    0.30
    summer   hot    rain   0.05
    summer   cold   sun    0.10
    summer   cold   rain   0.05
    winter   hot    sun    0.10
    winter   hot    rain   0.05
    winter   cold   sun    0.15
    winter   cold   rain   0.20

 P(W)?
 P(W | winter)?
 P(W | winter, hot)?
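A sketch (added for illustration, not from the slides) of the three enumeration steps on the table above; `enumerate_query` is a hypothetical helper.

```python
from collections import defaultdict

VARS = ("S", "T", "W")
joint = {  # P(S, T, W) from the table above
    ("summer", "hot",  "sun"): 0.30, ("summer", "hot",  "rain"): 0.05,
    ("summer", "cold", "sun"): 0.10, ("summer", "cold", "rain"): 0.05,
    ("winter", "hot",  "sun"): 0.10, ("winter", "hot",  "rain"): 0.05,
    ("winter", "cold", "sun"): 0.15, ("winter", "cold", "rain"): 0.20,
}

def enumerate_query(query_var, evidence):
    """P(query_var | evidence): select consistent rows, sum out hidden variables, normalize."""
    q = VARS.index(query_var)
    totals = defaultdict(float)
    for row, p in joint.items():
        if all(row[VARS.index(v)] == val for v, val in evidence.items()):   # Step 1: select
            totals[row[q]] += p                                             # Step 2: sum out
    z = sum(totals.values())
    return {value: p / z for value, p in totals.items()}                    # Step 3: normalize

print(enumerate_query("W", {}))                            # P(W)
print(enumerate_query("W", {"S": "winter"}))               # P(W | winter)
print(enumerate_query("W", {"S": "winter", "T": "hot"}))   # P(W | winter, hot)
```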
Inference by Enumeration

 Obvious problems:
    Worst-case time complexity O(d^n)
    Space complexity O(d^n) to store the joint distribution
 Solutions:
    Better techniques
    Better representation
    Simplifying assumptions
    Bayesian Networks
Bayes’ Rule

 Two ways to factor a joint distribution over two variables:
    P(x, y) = P(x | y) P(y) = P(y | x) P(x)

 Dividing, we get Bayes’ rule (“That’s my rule!”):
    P(x | y) = P(y | x) P(x) / P(y)

 Why is this at all helpful?
    Lets us build one conditional from its reverse
    Often one conditional is tricky but the other one is simple
    Foundation of many systems we’ll see later (e.g. ASR, MT)

 In the running for most important AI equation!
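As a small added illustration (not from the slides), Bayes’ rule applied to the earlier T/W example, using P(T) and P(W = sun | T) from the conditional-distributions slide, recovers P(T | W = sun).

```python
# Prior P(T) and likelihood P(W = sun | T), taken from the earlier T/W example.
P_T = {"hot": 0.5, "cold": 0.5}
P_sun_given_T = {"hot": 0.8, "cold": 0.4}

# Bayes' rule: P(t | sun) = P(sun | t) P(t) / P(sun), with P(sun) = sum_t P(sun | t) P(t)
unnormalized = {t: P_sun_given_T[t] * P_T[t] for t in P_T}
p_sun = sum(unnormalized.values())
P_T_given_sun = {t: v / p_sun for t, v in unnormalized.items()}
print(P_T_given_sun)    # roughly {'hot': 0.667, 'cold': 0.333}, matching 0.4 / 0.6 from the joint
```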


The Product Rule
 Sometimes have conditional distributions but want the joint:
    P(d, w) = P(d | w) P(w)

 Example:
    P(W)               P(D | W)                  P(D, W)
    W      P           D      W      P           D      W      P
    sun    0.8         wet    sun    0.1         wet    sun    0.08
    rain   0.2         dry    sun    0.9         dry    sun    0.72
                       wet    rain   0.7         wet    rain   0.14
                       dry    rain   0.3         dry    rain   0.06
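A short sketch (added, not from the slides) of the product rule on the example above.

```python
# P(W) and P(D | W) from the example; the product rule gives the joint P(D, W).
P_W = {"sun": 0.8, "rain": 0.2}
P_D_given_W = {("wet", "sun"): 0.1, ("dry", "sun"): 0.9,
               ("wet", "rain"): 0.7, ("dry", "rain"): 0.3}

P_DW = {(d, w): P_D_given_W[(d, w)] * P_W[w] for (d, w) in P_D_given_W}
print(P_DW)   # approximately {('wet','sun'): 0.08, ('dry','sun'): 0.72, ('wet','rain'): 0.14, ('dry','rain'): 0.06}
```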
The Chain Rule

 More generally, can always write any joint distribution as an incremental product of conditional distributions:

    P(x1, x2, x3) = P(x1) P(x2 | x1) P(x3 | x1, x2)
    P(x1, …, xn) = product over i of P(xi | x1, …, xi-1)

 Why is this always true?


Independence
 Two variables are independent if:
    P(x, y) = P(x) P(y)   for all x, y
    This says that their joint distribution factors into a product of two simpler distributions
 Another form:
    P(x | y) = P(x)   for all x, y
 We write: X ⫫ Y

 Independence is a simplifying modeling assumption
    Empirical joint distributions: at best “close” to independent
    What could we assume for {Weather, Traffic, Cavity, Toothache}?
Example: Independence?

    P(T)                 P(W)
    T      P             W      P
    hot    0.5           sun    0.6
    cold   0.5           rain   0.4

    P1(T, W)                    P2(T, W)
    T      W      P             T      W      P
    hot    sun    0.4           hot    sun    0.3
    hot    rain   0.1           hot    rain   0.2
    cold   sun    0.2           cold   sun    0.3
    cold   rain   0.3           cold   rain   0.2
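A short added check (not from the slides) of whether each joint table above equals the product of its marginals; `independent` is a hypothetical helper.

```python
from itertools import product

P1 = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1, ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}
P2 = {("hot", "sun"): 0.3, ("hot", "rain"): 0.2, ("cold", "sun"): 0.3, ("cold", "rain"): 0.2}

def independent(joint):
    """True iff P(t, w) == P(t) * P(w) for every entry (up to float tolerance)."""
    P_t = {t: sum(p for (tv, _), p in joint.items() if tv == t) for t in {k[0] for k in joint}}
    P_w = {w: sum(p for (_, wv), p in joint.items() if wv == w) for w in {k[1] for k in joint}}
    return all(abs(joint[(t, w)] - P_t[t] * P_w[w]) < 1e-9 for t, w in product(P_t, P_w))

print(independent(P1))   # False: e.g. P(hot, sun) = 0.4 but P(hot) P(sun) = 0.5 * 0.6 = 0.3
print(independent(P2))   # True: every entry equals the product of its marginals
```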
Example: Independence
 N fair, independent coin flips:

    P(X1)          P(X2)          …          P(Xn)
    H    0.5       H    0.5                  H    0.5
    T    0.5       T    0.5                  T    0.5
Conditional Independence
 P(Toothache, Cavity, Catch)
 If I have a cavity, the probability that the probe catches in it doesn't
depend on whether I have a toothache:
 P(+catch | +toothache, +cavity) = P(+catch | +cavity)
 The same independence holds if I don’t have a cavity:
 P(+catch | +toothache, -cavity) = P(+catch| -cavity)
 Catch is conditionally independent of Toothache given Cavity:
 P(Catch | Toothache, Cavity) = P(Catch | Cavity)
 Equivalent statements:
 P(Toothache | Catch , Cavity) = P(Toothache | Cavity)
 P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
 One can be derived from the other easily
Conditional Independence
 Unconditional (absolute) independence very rare (why?)

 Conditional independence is our most basic and robust form of knowledge about uncertain environments:
    X is conditionally independent of Y given Z (written X ⫫ Y | Z) iff
    P(x, y | z) = P(x | z) P(y | z)   for all x, y, z
    (equivalently, P(x | z, y) = P(x | z))

 What about this domain:
    Traffic
    Umbrella
    Raining
Probability Summary
Model-based Classification with Naïve Bayes

 A general Naive Bayes model:
    P(Y, F1, …, Fn) = P(Y) times the product over i of P(Fi | Y)

    [Figure: a Bayes net with class Y as the parent of features F1, F2, …, Fn]
      P(Y):                |Y| parameters
      P(Fi | Y) tables:    n x |F| x |Y| parameters
      (a full joint table over Y, F1, …, Fn would need |Y| x |F|^n values)

 We only have to specify how each feature depends on the class
    Total number of parameters is linear in n
    Model is very simplistic, but often works anyway
Inference for Naïve Bayes
 Goal: compute posterior distribution over label variable Y

 Step 1: get joint probability of label and evidence for each label:
    P(y, f1, …, fn) = P(y) times the product over i of P(fi | y)
 Step 2: sum to get probability of evidence:
    P(f1, …, fn) = Σy P(y, f1, …, fn)
 Step 3: normalize by dividing Step 1 by Step 2:
    P(y | f1, …, fn) = P(y, f1, …, fn) / P(f1, …, fn)
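A minimal sketch (added, not from the slides) of the three steps; the label set, features, and all numbers below are hypothetical.

```python
# Hypothetical numbers: a label Y in {spam, ham} and two binary features F1, F2.
P_Y = {"spam": 0.4, "ham": 0.6}
P_F_given_Y = [                      # P(F_i = true | Y), one dict per feature
    {"spam": 0.8, "ham": 0.1},       # F1, e.g. "contains the word FREE"
    {"spam": 0.3, "ham": 0.2},       # F2
]

def posterior(observed):
    """P(Y | f1, ..., fn) for a list of observed True/False feature values."""
    joint = {}
    for y, prior in P_Y.items():                        # Step 1: P(y) * prod_i P(f_i | y)
        p = prior
        for table, f in zip(P_F_given_Y, observed):
            p *= table[y] if f else 1 - table[y]
        joint[y] = p
    z = sum(joint.values())                             # Step 2: P(f1, ..., fn)
    return {y: p / z for y, p in joint.items()}         # Step 3: normalize

print(posterior([True, False]))                         # posterior over Y given F1 = +, F2 = -
```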


General Naïve Bayes
 What do we need in order to use Naïve Bayes?

 Inference method (we just saw this part)


 Start with a bunch of probabilities: P(Y) and the P(Fi|Y) tables
 Use standard inference to compute P(Y|F1…Fn)
 Nothing new here

 Estimates of local conditional probability tables
    P(Y), the prior over labels
    P(Fi | Y) for each feature (evidence variable)
    These probabilities are collectively called the parameters of the model and denoted by θ
    Up until now, we assumed these appeared by magic, but…
    …they typically come from training data counts: we’ll look at this soon
Example: Spam Filter
 Input: an email
 Output: spam/ham
 Setup:
    Get a large collection of example emails, each labeled “spam” or “ham”
    Note: someone has to hand label all this data!
    Want to learn to predict labels of new, future emails
 Features: The attributes used to make the ham / spam decision
    Words: FREE!
    Text Patterns: $dd, CAPS
    Non-text: SenderInContacts
    …

 Example emails:
    “Dear Sir. First, I must solicit your confidence in this transaction, this is by virture of its nature as being utterly confidencial and top secret. …”
    “TO BE REMOVED FROM FUTURE MAILINGS, SIMPLY REPLY TO THIS MESSAGE AND PUT "REMOVE" IN THE SUBJECT.”
    “99 MILLION EMAIL ADDRESSES FOR ONLY $99”
    “Ok, Iknow this is blatantly OT but I'm beginning to go insane. Had an old Dell Dimension XPS sitting in the corner and decided to put it to use, I know it was working pre being stuck in the corner, but when I plugged it in, hit the power nothing happened.”
A Spam Filter
 Naïve Bayes spam filter

 Data:
    Collection of emails, labeled spam or ham
    Note: someone has to hand label all this data!
    Split into training, validation, test sets

 Classifiers
    Learn on the training set
    (Tune it on a validation set)
    Test it on new emails

 (Example emails: same as on the previous slide.)
Naïve Bayes for Text
 Bag-of-words Naïve Bayes:
    Features: Wi is the word at position i (the word at position i, not the ith word in the dictionary!)
    As before: predict label conditioned on feature variables (spam vs. ham)
    As before: assume features are conditionally independent given label
    New: each Wi is identically distributed

 Generative model: P(Y, W1, …, Wn) = P(Y) times the product over i of P(Wi | Y)

 “Tied” distributions and bag-of-words
    Usually, each variable gets its own conditional probability distribution P(F | Y)
    In a bag-of-words model:
       Each position is identically distributed
       All positions share the same conditional probs P(W | Y)
       Why make this assumption?
    Called “bag-of-words” because the model is insensitive to word order or reordering
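A bag-of-words sketch (added, not from the slides); the vocabulary, all probabilities, and the "<unk>" fallback for unseen words are hypothetical, and log probabilities are used to avoid underflow on long emails.

```python
import math

# Hypothetical prior P(Y) and tied word distributions P(W | Y); in practice these
# come from training-data counts (later lectures). "<unk>" stands in for unseen words.
P_Y = {"spam": 0.5, "ham": 0.5}
P_W_given_Y = {
    "spam": {"free": 0.05, "million": 0.02, "meeting": 0.001, "<unk>": 0.001},
    "ham":  {"free": 0.005, "million": 0.001, "meeting": 0.02, "<unk>": 0.001},
}

def posterior(words):
    """Bag-of-words Naive Bayes: score log P(y) + sum_i log P(w_i | y), then normalize."""
    log_joint = {
        y: math.log(P_Y[y]) + sum(math.log(P_W_given_Y[y].get(w, P_W_given_Y[y]["<unk>"]))
                                  for w in words)
        for y in P_Y
    }
    m = max(log_joint.values())                  # shift by the max for numerical stability
    unnorm = {y: math.exp(v - m) for y, v in log_joint.items()}
    z = sum(unnorm.values())
    return {y: v / z for y, v in unnorm.items()}

print(posterior("free million free".split()))    # strongly favors spam with these numbers
```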
Training and Testing
Important Concepts
 Data: labeled instances, e.g. emails marked spam/ham
    Training set
    Validation set
    Test set

 Features: attribute-value pairs which characterize each x

 Experimentation cycle
    Learn parameters (e.g. model probabilities) on training set
    (Tune hyperparameters on validation set)
    Compute accuracy on test set
    Very important: never “peek” at the test set!

 Evaluation
    Accuracy: fraction of instances predicted correctly

 Overfitting and generalization
    Want a classifier which does well on test data
    Overfitting: fitting the training data very closely, but not generalizing well
    We’ll investigate overfitting and generalization formally in a few lectures
