Unit 2
Acting under uncertainty – Bayesian inference – naïve bayes models. Probabilistic reasoning –
Bayesian networks – exact inference in BN – approximate inference in BN – causal networks.
Acting Under Uncertainty
When acting under uncertainty, the right decision depends on the likelihood that, and the degree to which, the agent's goals will be achieved.
An agent possesses some initial basic knowledge of the world (assume that this knowledge is represented as first-order logic sentences). Using first-order logic to handle real-world problem domains fails for three main reasons, as discussed below:
1) Laziness:
It is too much work to list the complete set of antecedents or consequents
needed to ensure an exceptionless rule and too hard to use such rules.
2) Theoretical ignorance:
There may be no complete theory for the problem domain.
3) Practical ignorance:
Even if all the rules are known, particular aspects of the problem may not have been checked yet, or some details may not have been considered at all (missing details).
The agent's knowledge can at best provide it with a degree of belief in the relevant sentences. Probability theory is applied to this degree of belief: probability assigns a numerical degree of belief between 0 and 1 to each sentence. Probability provides a way of summarizing the uncertainty that comes from our laziness and ignorance.
Assigning a probability of 0 to a given sentence corresponds to an unequivocal belief that the sentence is false. Assigning a probability of 1 corresponds to an unequivocal belief that the sentence is true. Probabilities between 0 and 1 correspond to intermediate degrees of belief in the truth of the sentence.
These beliefs depend on the agent's percepts at a particular time. The percepts constitute the evidence on which probability assertions are based.
Assigning a probability to a proposition is analogous to saying whether the given logical sentence (or its negation) is entailed by the knowledge base, rather than whether it is true or not. As more sentences are added to the knowledge base, the entailments keep changing; similarly, the probability keeps changing with additional knowledge.
All probability statements must therefore indicate the evidence with respect to
which the probability is being assessed. As the agent receives new percepts, its
probability assessments are updated to reflect the new evidence. Before the
evidence is obtained, we talk about prior or unconditional probability; after the
evidence is obtained, we talk about posterior or conditional probability. In most
cases, an agent will have some evidence from its percepts and will be interested
in computing the posterior probabilities of the outcomes it cares about.
Uncertainty and rational decisions:
The presence of uncertainty drastically changes the way an agent makes decisions. At a particular time an agent may have several possible courses of action, from which it has to make a choice. To make such choices, an agent must have preferences between the different possible outcomes of the various plans.
An outcome is a completely specified state, together with the factors associated with that outcome.
For example: consider a car-driving agent that wants to reach the airport by a specific time, say 7.30 pm.
Here, factors such as whether the agent arrived at the airport on time and how long it had to wait at the airport are attached to the outcome.
Utility Theory
Utility theory is used to represent and reason with preferences. The term utility in this context means "the quality of being useful".
Utility theory says that every state has a degree of usefulness, called its utility, and that the agent will prefer states with higher utility.
The utility of a state is relative to the agent; the utility function is defined on the basis of that agent's preferences.
For example: the payoff functions for games are utility functions. The utility of a state in which black has won a game of chess is obviously high for the agent playing black and low for the agent playing white.
There is no universal measure of tastes or preferences: one person may love dark chocolate ice cream while another loves choco-chip ice cream. A utility function can even account for altruistic behavior, simply by including the welfare of others as one of the factors contributing to the agent's own utility.
Decision theory
Preferences, as expressed by utilities, are combined with probabilities to make rational decisions. This theory of rational decision-making is called decision theory.
Decision theory can be summarized as,
Decision theory = Probability theory + Utility theory.
• The principle of Maximum Expected Utility (MEU):
Decision theory says that an agent is rational if and only if it chooses the action that yields the highest expected utility, averaged over all the possible outcomes of the action.
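As a small illustration of the MEU principle, expected utility is the probability-weighted average of outcome utilities, and the rational agent picks the action that maximizes it. The action names, probabilities, and utilities below are made up purely for illustration; this is a sketch, not the book's algorithm.

# Minimal sketch of the Maximum Expected Utility (MEU) principle.
# The outcome probabilities and utilities are hypothetical values.

def expected_utility(outcomes):
    """Sum of probability * utility over all possible outcomes of an action."""
    return sum(prob * utility for prob, utility in outcomes)

def meu_action(actions):
    """Return the action whose expected utility is highest."""
    return max(actions, key=lambda name: expected_utility(actions[name]))

# Each action maps to a list of (probability, utility) pairs for its outcomes.
actions = {
    "leave_at_6pm": [(0.95, 100), (0.05, -1000)],   # almost surely on time
    "leave_at_7pm": [(0.60, 120), (0.40, -1000)],   # later start, higher risk
}

print(meu_action(actions))                          # -> 'leave_at_6pm'
print(expected_utility(actions["leave_at_6pm"]))    # -> 45.0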
• Design for a decision theoretic agent:
Following algorithm sketches the structure of an agent that uses decision theory
to select actions.
The algorithm
Function: DT-AGENT (percept) returns an action.
Static: belief-state, probabilistic beliefs about the current state of the world.
action, the agent's action.
- Update belief-state based on action and percept
- Calculate outcome probabilities for actions, given action descriptions and the current belief-state
- Select the action with the highest expected utility, given the probabilities of outcomes and utility information
- Return action.
A decision-theoretic agent that selects rational actions.
The decision theoretic agent is identical, at an abstract level, to the logical
agent. The primary difference is that the decision theoretic agent's knowledge of
the current state is uncertain; the agent's belief state is a representation of the
probabilities of all possible actual states of the world.
As time passes, the agent accumulates more evidence and its belief state
changes. Given the belief state, the agent can make probabilistic predictions of
action outcomes and hence select the action with highest expected utility.
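A minimal Python rendering of the DT-AGENT loop above is sketched below, assuming domain-specific stubs for belief updating, outcome prediction, and utility. The function names are illustrative and not part of the original algorithm.

# Sketch of the DT-AGENT loop: update beliefs from the percept, score each
# action by its expected utility, and return the best one. The helpers
# update_belief, outcome_distribution, and utility are domain-specific stubs
# supplied by the caller (illustrative names, not from the text).

def dt_agent_step(belief_state, last_action, percept, actions,
                  update_belief, outcome_distribution, utility):
    # Update probabilistic beliefs about the current world state.
    belief_state = update_belief(belief_state, last_action, percept)

    best_action, best_eu = None, float("-inf")
    for action in actions:
        # outcome_distribution yields (probability, outcome_state) pairs.
        eu = sum(p * utility(outcome)
                 for p, outcome in outcome_distribution(belief_state, action))
        if eu > best_eu:
            best_action, best_eu = action, eu

    # Return the chosen action together with the updated belief state.
    return best_action, belief_state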
8. Bayes' Rule
Bayes' rule is derived from the product rule.
The product rule can be written as,
P(a ∧ b) = P(a | b) P(b)    ...(7.1.6)
P(a ∧ b) = P(b | a) P(a)    ...(7.1.7)
[because conjunction is commutative]
Equating right sides of equation (7.1.6) and equation (7.1.7) and dividing by
P(a),
P(b | a) = P(a | b)P(b) / P(a)
This equation is called as Bayes' rule or Bayes' theorem or Bayes' law. This rule
is very useful in probabilistic inferences.
The generalized Bayes' rule is
P(Y | X) = P(X | Y) P(Y) / P(X)
(where Y and X are random variables; this stands for a set of equations, one for each pair of specific values).
We can have more general version, conditionalized on some background
evidence e.
P(Y | X, e) = P(X |Y, e) P(Y | e) / P(X | e)
The general form of Bayes' rule with normalization is
P(y | x) = α P(x | y) P(y),
where α is the normalization constant needed to make the posterior probabilities over all values of y sum to 1.
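As a small sketch of this normalized form, α can be recovered by making the unnormalized values sum to 1, so P(x) is never needed explicitly. The likelihoods and priors below are assumed values (echoing the dentistry example) used only for illustration.

# Bayes' rule with normalization: P(y | x) = alpha * P(x | y) * P(y).
# alpha is obtained by normalizing over both values of y, so P(x) is
# never needed explicitly. The numbers are assumed for illustration.

def posterior(likelihoods, priors):
    """likelihoods[y] = P(x | y), priors[y] = P(y); returns P(y | x)."""
    unnormalized = {y: likelihoods[y] * priors[y] for y in priors}
    alpha = 1.0 / sum(unnormalized.values())
    return {y: alpha * v for y, v in unnormalized.items()}

# Example: y is "cavity" (True/False), x is an observed toothache.
print(posterior(likelihoods={True: 0.6, False: 0.1},
                priors={True: 0.2, False: 0.8}))
# -> approximately {True: 0.6, False: 0.4}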
Applying Bayes' Rule:
1) Computing one conditional probability requires three terms in total: one conditional probability and two unconditional probabilities.
For example: the probability that a patient with low sugar also has high blood pressure is 50%.
Let m be the proposition 'the patient has low sugar', and s be the proposition 'the patient has high blood pressure'.
Suppose the doctor also knows the following unconditional facts:
i) Prior probability P(m) = 1/50,000.
ii) Prior probability P(s) = 1/20.
Then we have,
P(s | m) = 0.5
P(m) = 1/50,000
P(s) = 1/20
P(m | s) = P(s | m) P(m) / P(s)
= (0.5 × 1/50,000) / (1/20)
= 0.0002
That is, we can expect that about 1 in 5,000 patients with high blood pressure will have low sugar.
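The same arithmetic, written out in a few lines of Python as a check:

# Checking the Bayes' rule arithmetic from the example above.
p_s_given_m = 0.5        # P(s | m): high blood pressure given low sugar
p_m = 1 / 50_000         # prior P(m): low sugar
p_s = 1 / 20             # prior P(s): high blood pressure

p_m_given_s = p_s_given_m * p_m / p_s   # Bayes' rule
print(p_m_given_s)       # ≈ 0.0002, i.e. about 1 in 5,000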
2) Combining evidence in Bayes' rule.
Bayes' rule is also helpful for answering queries conditioned on several pieces of evidence.
For example: when both pieces of evidence, Toothache and Catch, are observed, the posterior distribution over Cavity can be read off the full joint distribution as
P(Cavity | Toothache ∧ Catch) = α <0.108, 0.016> ≈ <0.871, 0.129>
Using Bayes' rule to reformulate the problem:
P(Cavity | Toothache ∧ Catch) = α P(Toothache ∧ Catch | Cavity) P(Cavity)    ...(7.1.8)
For this reformulation to work, we need to know the conditional probabilities of the conjunction Toothache ∧ Catch for each value of Cavity. That might be feasible for just two evidence variables, but it will not scale up.
If there are n possible evidence variables (X-rays, diet, oral hygiene, etc.), then there are 2^n possible combinations of observed values for which we would need to know conditional probabilities.
The notion of conditional independence can be used here. Toothache and Catch are independent given the presence or the absence of a cavity: each is directly caused by the cavity, but neither has a direct effect on the other. Toothache depends on the state of the nerves in the tooth, whereas the probe's accuracy depends on the dentist's skill, to which the toothache is irrelevant.
Mathematically, this property is written as,
P(Toothache ∧ Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)    ...(7.1.9)
This equation expresses the conditional independence of toothache and catch,
given cavity.
Substituting equation (7.1.9) into equation (7.1.8) gives the probability of a cavity:
P(Cavity | Toothache ∧ Catch) = α P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)
Now the information requirements are the same as for inference using each piece of evidence separately: the prior probability P(Cavity) of the query variable and the conditional probability of each effect given its cause.
Conditional independence assertions allow probabilistic systems to scale up; moreover, they are much more commonly available than absolute independence assertions. When there are n variables that are all conditionally independent given the cause, the size of the representation grows as O(n) instead of O(2^n).
For example, consider the dentistry example, in which a single cause directly influences a number of effects, all of which are conditionally independent given the cause. The full joint distribution can be written as
P(Cause, Effect1, ..., Effectn) = P(Cause) ∏i P(Effecti | Cause).
Such a probability distribution is called a naive Bayes model - "naive" because it is often used (as a simplifying assumption) in cases where the "effect" variables are not in fact conditionally independent given the cause variable. The naive Bayes model is sometimes called a Bayesian classifier.
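A short sketch of this naive Bayes computation for the dentistry example is given below. The individual conditional probabilities are assumed values, chosen to be consistent with the <0.108, 0.016> figures quoted earlier; they are not stated in the text itself.

# Naive Bayes for the dentistry example:
#   P(Cavity | toothache, catch) = alpha * P(toothache | Cavity)
#                                        * P(catch | Cavity) * P(Cavity)
# The numbers are assumed values, chosen to match the <0.108, 0.016>
# figures quoted earlier.

prior       = {True: 0.2, False: 0.8}   # P(Cavity)
p_toothache = {True: 0.6, False: 0.1}   # P(toothache | Cavity)
p_catch     = {True: 0.9, False: 0.2}   # P(catch | Cavity)

unnormalized = {c: p_toothache[c] * p_catch[c] * prior[c] for c in (True, False)}
alpha = 1.0 / sum(unnormalized.values())
posterior = {c: alpha * v for c, v in unnormalized.items()}

print(unnormalized)   # approximately {True: 0.108, False: 0.016}
print(posterior)      # approximately {True: 0.871, False: 0.129}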
Figure 2.1 A simple Bayesian network in which Weather is independent of the other three
variables and Toothache and Catch are conditionally independent, given Cavity.
Figure 13.2 A typical Bayesian network, showing both the topology and the conditional probability tables (CPTs). In the CPTs, the letters B, E, A, J, and M stand for Burglary, Earthquake, Alarm, JohnCalls, and MaryCalls, respectively.
Figure 13.3 Network structure and number of parameters depend on the order of introduction. (a) The structure obtained with ordering M, J, A, B, E. (b) The structure obtained with M, J, E, B, A. Each node is annotated with the number of parameters required; 13 in all for (a) and 31 for (b). In Figure ??, only 10 parameters were required.
Figure 13.4 (a) A node X is conditionally independent of its non-descendants (e.g., the Z_ij's) given its parents (the U_i's shown in the gray area). (b) A node X is conditionally independent of all other nodes in the network given its Markov blanket (the gray area).
Figure 13.5 A complete conditional probability table for P(Fever | Cold, Flu, Malaria), assuming a noisy-OR model with the three q-values shown in bold.
Figure 13.6 A simple network with discrete variables (Subsidy and Buys) and continuous variables (Harvest and Cost).
Figure 13.7 The graphs in (a) and (b) show the probability distribution over Cost as a function of Harvest size, with Subsidy true and false, respectively. Graph (c) shows the distribution P(Cost | Harvest), obtained by summing over the two subsidy cases.
Figure 13.8 (a) A normal (Gaussian) distribution for the cost threshold, centered on µ = 6.0 with standard deviation σ = 1.0. (b) Expit and probit models for the probability of buys given cost, for the parameters µ = 6.0 and σ = 1.0.
Figure 13.9 A Bayesian network for evaluating car insurance applications.
Figure 13.10 The structure of the expression shown in Equation (??). The evaluation proceeds top down, multiplying values along each path and summing at the "+" nodes. Notice the repetition of the paths for j and m.
Figure 13.11 The enumeration algorithm for exact inference in Bayes nets.
Figure 13.13 The variable elimination algorithm for exact inference in Bayes nets.
Figure 13.14 Bayes net encoding of the 3-CNF sentence (W ∨ X ∨ Y) ∧ (¬W ∨ Y ∨ Z) ∧ (X ∨ Y ∨ ¬Z).
Figure 13.15 (a) A multiply connected network describing Mary's daily lawn routine: each morning, she checks the weather; if it's cloudy, she usually doesn't turn on the sprinkler; if the sprinkler is on, or if it rains during the day, the grass will be wet. Thus, Cloudy affects WetGrass via two different causal pathways. (b) A clustered equivalent of the multiply connected network.
Figure 13.16 A sampling algorithm that generates events from a Bayesian network. Each variable is sampled according to the conditional distribution given the values already sampled for the variable's parents.
Figure 13.17 The rejection-sampling algorithm for answering queries given evidence in a Bayesian network.
Figure 13.23 (a) A causal Bayesian network representing cause–effect relations among five
variables. (b) The network after performing the action “turn Sprinkler on.”
TWO MARKS