Module 3
ACTING UNDER UNCERTAINTY
When an agent knows enough facts about its environment, the logical approach
enables it to derive plans that are guaranteed to work. This is a good thing.
Unfortunately, agents almost never have access to the whole truth about their
environment. Agents must, therefore, act under uncertainty.
If a logical agent cannot conclude that any particular course of action achieves its
goal, then it will be unable to act. Conditional planning can overcome uncertainty to
some extent, but only if the agent's sensing actions can obtain the required
information and only if there are not too many different contingencies.
Handling uncertain knowledge
We now look more closely at the nature of uncertain knowledge. We will use a simple
diagnosis example to illustrate the concepts involved. Diagnosis, whether for
medicine, automobile repair, or anything else, is a task that almost always involves
uncertainty. Let us try to write rules for dental diagnosis using first-order logic, so
that we can see how the logical approach breaks down. Consider the following rule:

Toothache ⇒ Cavity

Unfortunately, this rule is wrong: not all patients with toothaches have cavities; some
of them have gum disease, an abscess, or one of several other problems. To make the rule
true, we would have to add an almost unlimited list of possible causes.
Trying to use first-order logic to cope with a domain like medical diagnosis thus fails
for three main reasons:
Laziness: It is too much work to list the complete set of antecedents or consequents
needed to ensure an exceptionless rule, and too hard to use such rules.
Theoretical ignorance: Medical science has no complete theory for the domain.
Practical ignorance: Even if we know all the rules, we might be uncertain about a
particular patient because not all the necessary tests have been or can be run.
The connection between toothaches and cavities is just not a logical consequence in
either direction.
This is typical of the medical domain, as well as most other judgmental domains:
law, business, design, automobile repair, gardening, dating, and so on.
The agent's knowledge can at best provide only a degree of belief in the relevant
sentences.
Our main tool for dealing with degrees of belief will be probability theory, which
assigns to each sentence a numerical degree of belief between 0 and 1.
Probability provides a way of summarizing the uncertainty that comes from our
laziness and ignorance. We might not know for sure what afflicts a particular patient,
but we believe that there is, say, an 80% chance (that is, a probability of 0.8) that the
patient has a cavity if he or she has a toothache.
The 80% summarizes those cases in which all the factors needed for a cavity to
cause a toothache are present and other cases in which the patient has both toothache
and cavity but the two are unconnected. The missing 20% summarizes all the other
possible causes of toothache that we are too lazy or ignorant to confirm or deny.
Assigning a probability of 0 to a given sentence corresponds to an unequivocal
belief that the sentence is false, while assigning a probability of 1 corresponds to an
unequivocal belief that the sentence is true.
Probabilities between 0 and 1 correspond to intermediate degrees of belief in the truth of the
sentence. The sentence itself is in fact either true or false.
It is important to note that a degree of belief is different from a degree of truth. A probability
of 0.8 does not mean "80% true" but rather an 80% degree of belief, that is, a fairly strong
expectation.
Thus, probability theory makes the same ontological commitment as logic, namely, that facts either
do or do not hold in the world. Degree of truth, as opposed to degree of belief, is the subject of
fuzzy logic.
In logic, a sentence such as "The patient has a cavity" is true or false depending on the
interpretation and the world; it is true just when the fact it refers to is the case. In probability theory,
a sentence such as "The probability that the patient has a cavity is 0.8" is about the agent's beliefs,
not directly about the world.
These beliefs depend on the percepts that the agent has received to date. These percepts
constitute the evidence on which probability assertions are based.
All probability statements must therefore indicate the evidence with respect to
which the probability is being assessed. As the agent receives new percepts, its
probability assessments are updated to reflect the new evidence.
Uncertainty and rational decisions
The presence of uncertainty radically changes the way an agent makes decisions. A
logical agent typically has a goal and executes any plan that is guaranteed to achieve
it. An action can be selected or rejected on the basis of whether it achieves the goal,
regardless of what other actions might achieve. When uncertainty enters the picture, this
is no longer possible: a plan might achieve the goal only with some probability, so plans
must be compared by how likely and how desirable their outcomes are. To make such choices,
an agent must first have preferences between the different possible outcomes of the various
plans.
We use utility theory to represent and reason with preferences. (The term utility is used
here in the sense of "the quality of being useful," not in the sense of the electric
company or water works.) Utility theory says that every state has a degree of
usefulness, or utility, to an agent and that the agent will prefer states with higher
utility.
Preferences, as expressed by utilities, are combined with probabilities in the general
theory of rational decisions called decision theory:
Decision theory = probability theory + utility theory.
The fundamental idea of decision theory is that an agent is rational if and only if it
chooses the action that yields the highest expected utility, averaged over all the
possible outcomes of the action. This is called the principle of Maximum Expected
Utility (MEU).
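To make the MEU principle concrete, here is a minimal Python sketch of expected-utility action selection. The actions, outcome probabilities, and utilities are invented purely for illustration; only the selection rule itself comes from the text.

```python
# Minimal sketch of the Maximum Expected Utility (MEU) principle.
# The actions, outcome probabilities, and utilities are invented
# for illustration; they do not come from the text.

# Each action maps to a list of (probability, utility) pairs, one per
# possible outcome; the probabilities for each action sum to 1.
actions = {
    "treat_cavity": [(0.9, 10), (0.1, -5)],  # likely success, small risk
    "do_nothing":   [(0.3,  0), (0.7, -8)],  # toothache probably worsens
}

def expected_utility(outcomes):
    """Average the utility over all possible outcomes of an action."""
    return sum(p * u for p, u in outcomes)

# A rational agent chooses the action with the highest expected utility.
best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best, expected_utility(actions[best]))  # treat_cavity 8.5
```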
Basic Probability Notation
Any notation for describing degrees of belief must be able to deal with two main issues:
the nature of the sentences to which degrees of belief are assigned and the dependence of
the degree of belief on the agent's experience.
The version of probability theory we present uses an extension of propositional logic for
its sentences.
The dependence on experience is reflected in the syntactic distinction between prior
probability statements, which apply before any evidence is obtained, and conditional
probability statements, which include the evidence explicitly.
Propositions
Degrees of belief are always applied to propositions, assertions that such-and-such is the
case. So far we have seen two formal languages, propositional logic and first-order logic,
for stating propositions. Probability theory typically uses a language that is slightly more
expressive than propositional logic.
The basic element of the language is the random variable, which can be thought of as
referring to a "part" of the world whose "status" is initially unknown. For example,
Cavity might refer to whether my lower left wisdom tooth has a cavity. Random variables
play a role similar to that of CSP variables in constraint satisfaction problems and that
of proposition symbols in propositional logic. We will always capitalize the names of
random variables. (However, we still use lowercase, single-letter names to represent an
unknown random variable, for example: P(a) = 1 - P(¬a).)
Types of random variables
As with CSP variables, random variables are typically divided into three kinds,
depending on the type of the domain:
Boolean random variables, such as Cavity, have the domain (true, false). We will
often abbreviate a proposition such as Cavity = true simply by the lowercase name
cavity. Similarly, Cavity = false would be abbreviated by ¬cavity.
Discrete random variables, which include Boolean random variables as a special
case, take on values from a countable domain. For example, the domain of Weather
might be (sunny, rainy, cloudy, snow). The values in the domain must be mutually
exclusive and exhaustive. Where no confusion arises, we will use, for example,
snow as an abbreviation for Weather = snow.
Continuous random variables take on values from the real numbers. The domain can be either
the entire real line or some subset such as the interval [0, 1]. For example, the proposition
X = 4.02 asserts that the random variable X has the exact value 4.02. Propositions concerning
continuous random variables can also be inequalities, such as X ≤ 4.02.
Prior probability
The unconditional or prior probability associated with a proposition a is the degree of belief
accorded to it in the absence of any other information; it is written as P(a). For example, if
the prior probability that I have a cavity is 0.1, then we would write

P(Cavity = true) = 0.1 or P(cavity) = 0.1.
It is important to remember that P(a) can be used only when there is no other information.
As soon as some new information is known, we must reason with the conditional probability
of a given that new information.
We will use an expression such as P(Weather), which denotes a vector of values for
the probabilities of each individual state of the weather. Thus, instead of writing the
four equations

P(Weather = sunny) = 0.7
P(Weather = rainy) = 0.2
P(Weather = cloudy) = 0.08
P(Weather = snow) = 0.02,

we may simply write

P(Weather) = (0.7, 0.2, 0.08, 0.02).
This statement defines a prior probability distribution for the random variable
Weather.
Sometimes we will want to talk about the probabilities of all combinations of the values
of a set of random variables. In that case, P(Weather, Cavity) can be represented by a
4 x 2 table of probabilities. This is called the joint probability distribution of
Weather and Cavity.
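As a sketch, the distribution P(Weather) above and a 4 x 2 joint distribution P(Weather, Cavity) can be stored as plain Python dictionaries. The Weather probabilities are the ones given above; the joint entries are invented for illustration by assuming, purely for convenience, that Weather and Cavity are independent with P(cavity) = 0.2.

```python
# Prior distribution P(Weather), values from the text.
weather_prior = {"sunny": 0.7, "rainy": 0.2, "cloudy": 0.08, "snow": 0.02}

# Prior for Cavity; P(cavity) = 0.2 is assumed here for illustration.
cavity_prior = {True: 0.2, False: 0.8}

# The 4 x 2 joint distribution P(Weather, Cavity): one probability per
# (weather, cavity) pair. Independence is assumed purely so that the
# entries can be filled in as products of the two priors.
joint = {
    (w, c): pw * pc
    for w, pw in weather_prior.items()
    for c, pc in cavity_prior.items()
}

assert abs(sum(joint.values()) - 1.0) < 1e-9  # entries sum to 1
print(joint[("sunny", True)])                 # 0.7 * 0.2 = 0.14
```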
Conditional probability
Once the agent has obtained some evidence concerning the previously unknown random
variables making up the domain, prior probabilities are no longer applicable. Instead, we use
conditional or posterior probabilities. The notation used is P(a | b), where a and b are any
propositions. This is read as "the probability of a, given that all we know is b." For example,

P(cavity | toothache) = 0.8

indicates that if a patient is observed to have a toothache and no other information is yet
available, then the probability of the patient's having a cavity will be 0.8. A prior probability,
such as P(cavity), can be thought of as a special case of the conditional probability P(cavity | ),
where the probability is conditioned on no evidence.
Conditional probabilities can be defined in terms of unconditional probabilities. The
defining equation is

P(a | b) = P(a ∧ b) / P(b),

which holds whenever P(b) > 0. This equation can also be written as

P(a ∧ b) = P(a | b) P(b),

which is called the product rule. The product rule is perhaps easier to remember: it
comes from the fact that, for a and b to be true, we need b to be true, and we also
need a to be true given b. We can also have it the other way around:

P(a ∧ b) = P(b | a) P(a).

The product rule also holds for whole distributions; for example,

P(X, Y) = P(X | Y) P(Y).
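A quick numeric check of the product rule, with probabilities invented purely for illustration:

```python
# Check the product rule P(a ∧ b) = P(a | b) P(b) both ways.
# All numbers below are invented for illustration.
p_b = 0.5            # P(b)
p_a_given_b = 0.6    # P(a | b)
p_a_and_b = p_a_given_b * p_b          # P(a ∧ b) = 0.3

# The other direction: P(a ∧ b) = P(b | a) P(a).
# With P(a) = 0.4, consistency forces P(b | a) = 0.3 / 0.4 = 0.75.
p_a = 0.4
p_b_given_a = p_a_and_b / p_a
assert abs(p_b_given_a * p_a - p_a_and_b) < 1e-12
```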
The Axioms of Probability
We begin with the basic axioms that serve to define the probability scale and its endpoints:

1. All probabilities are between 0 and 1. For any proposition a,

   0 ≤ P(a) ≤ 1.

2. Necessarily true (i.e., valid) propositions have probability 1, and necessarily false
(i.e., unsatisfiable) propositions have probability 0:

   P(true) = 1 and P(false) = 0.

Next, we need an axiom that connects the probabilities of logically related
propositions. The simplest way to do this is to define the probability of a disjunction
as follows:

3. The probability of a disjunction is given by

   P(a ∨ b) = P(a) + P(b) - P(a ∧ b).

This rule is easily remembered by noting that the cases where a holds, together with the
cases where b holds, certainly cover all the cases where a ∨ b holds; but summing the two
sets of cases counts their intersection twice, so we need to subtract P(a ∧ b).
These three axioms are often called Kolmogorov's axioms in honor of the Russian
mathematician Andrei Kolmogorov, who showed how to build up the rest of probability
theory from this simple foundation.
Using the axioms of probability
We can derive a variety of useful facts from the basic axioms. For example, the familiar rule
for negation follows by substituting ¬a for b in axiom 3, giving us:

P(a ∨ ¬a) = P(a) + P(¬a) - P(a ∧ ¬a)   (by axiom 3 with b = ¬a)
P(true) = P(a) + P(¬a) - P(false)       (by logical equivalence)
1 = P(a) + P(¬a)                        (by axiom 2)
P(¬a) = 1 - P(a)                        (by algebra).
Inference Using Full Joint Distributions
In this section we will describe a simple method for probabilistic inference, that is,
the computation from observed evidence of posterior probabilities for query
propositions. We will use the full joint distribution as the "knowledge base" from
which answers to all questions may be derived. Along the way we will also introduce
several useful techniques for manipulating equations involving probabilities.
We begin with a very simple example: a domain consisting of just the three Boolean
variables Toothache, Cavity, and Catch (the dentist's nasty steel probe catches in my
tooth). The full joint distribution is a 2 x 2 x 2 table:

                    toothache              ¬toothache
                 catch    ¬catch        catch    ¬catch
cavity           0.108    0.012         0.072    0.008
¬cavity          0.016    0.064         0.144    0.576

The probability of a proposition is equal to the sum of the probabilities of the
atomic events in which it holds; that is,

P(a) = Σ P(e_i), where the sum ranges over all atomic events e_i in which a holds.

This equation provides a simple method for computing the probability of any
proposition, given a full joint distribution that specifies the probabilities of all
atomic events.
For example, there are six atomic events in which cavity ∨ toothache holds:

P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28.
One particularly common task is to extract the distribution over some subset of
variables or a single variable. For example, adding the entries in the first row gives
the unconditional or marginal probability of cavity:

P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2.
This process is called marginalization, or summing out, because the variables other
than Cavity are summed out. We can write the following general marginalization rule
for any sets of variables Y and Z:

P(Y) = Σ_z P(Y, z),

where the sum is over all possible combinations of values z of the variables in Z. That is,
a distribution over Y can be obtained by summing out all the other variables from any
joint distribution containing Y. A variant of this rule involves conditional probabilities
instead of joint probabilities, using the product rule:

P(Y) = Σ_z P(Y | z) P(z).

This rule is called conditioning. Marginalization and conditioning will turn out to be
useful rules for all kinds of derivations involving probability expressions.
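The following Python sketch puts these pieces together for the Toothache, Cavity, Catch domain, using the full joint table above; the helper function is one convenient way to implement the "sum over atomic events" method, not code from the text.

```python
# Inference from the full joint distribution of Toothache, Cavity, Catch.
# Each atomic event is a (cavity, toothache, catch) triple of booleans;
# the probabilities are the table values used in this section.
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(pred):
    """P(proposition) = sum of the atomic events in which it holds."""
    return sum(p for event, p in joint.items() if pred(*event))

# Marginalization: summing out Toothache and Catch gives P(cavity) = 0.2.
p_cavity = prob(lambda cavity, toothache, catch: cavity)

# P(cavity ∨ toothache) sums the six qualifying atomic events: 0.28.
p_cav_or_tooth = prob(lambda cavity, toothache, catch: cavity or toothache)

# Conditional probability:
# P(cavity | toothache) = P(cavity ∧ toothache) / P(toothache) = 0.6.
p_toothache = prob(lambda cavity, toothache, catch: toothache)  # 0.2
p_cav_given_tooth = prob(lambda c, t, _: c and t) / p_toothache

print(p_cavity, p_cav_or_tooth, p_cav_given_tooth)
```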
Conditional probabilities can be found by first using the definition of conditional
probability to obtain an expression in terms of unconditional probabilities and then
evaluating the expression from the full joint distribution. For example, we can compute
the probability of a cavity, given evidence of a toothache, as follows:

P(cavity | toothache) = P(cavity ∧ toothache) / P(toothache)
                      = (0.108 + 0.012) / (0.108 + 0.012 + 0.016 + 0.064)
                      = 0.12 / 0.2 = 0.6.

Just to check, we can also compute the probability that there is no cavity, given a
toothache:

P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
                       = (0.016 + 0.064) / 0.2 = 0.4.

The two probabilities sum to 1.0, as they should.
These computations generalize to Bayes' theorem, which lets us compute the posterior
probability of a cause given its effect. In its simplest form,

P(b | a) = P(a | b) P(b) / P(a),

and more generally,

P(Ai | B) = P(B | Ai) P(Ai) / Σ_k P(B | Ak) P(Ak),

where A1, A2, A3, ..., An is a set of mutually exclusive and exhaustive events.
Example-1:
Question: What is the probability that a patient has the disease meningitis, given that the patient has a stiff neck?
Given Data:
A doctor is aware that the disease meningitis causes a patient to have a stiff neck 80%
of the time. He is also aware of some more facts, which are given as follows:
The known probability that a patient has meningitis is 1/30,000.
The known probability that a patient has a stiff neck is 2%.
Let a be the proposition that the patient has a stiff neck and b be the proposition that the
patient has meningitis. Then we can calculate the following:

P(a | b) = 0.8
P(b) = 1/30000
P(a) = 0.02

Applying Bayes' theorem:

P(b | a) = P(a | b) P(b) / P(a) = (0.8 × 1/30000) / 0.02 = 1/750 ≈ 0.00133.

Hence, we can conclude that about 1 patient in 750 who has a stiff neck also has meningitis.
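Here is a quick Python check of the arithmetic (a sketch, not part of the original example):

```python
# Checking the meningitis example with Bayes' theorem:
# P(b | a) = P(a | b) P(b) / P(a), with a = stiff neck, b = meningitis.
p_a_given_b = 0.8          # P(stiff neck | meningitis)
p_b = 1 / 30000            # P(meningitis)
p_a = 0.02                 # P(stiff neck)

p_b_given_a = p_a_given_b * p_b / p_a
print(p_b_given_a)         # ≈ 0.00133, i.e., about 1 in 750
```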
Question: From a standard deck of playing cards, a single card is drawn. The
probability that the card is a king is 4/52. Calculate the posterior probability
P(King | Face), the probability that the drawn card is a king given that it is a face card.
Solution: Every king is a face card, so P(Face | King) = 1. There are 12 face cards in a
standard deck of 52, so P(Face) = 12/52. By Bayes' theorem,

P(King | Face) = P(Face | King) P(King) / P(Face) = (1 × 4/52) / (12/52) = 4/12 = 1/3.
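A quick Python check of this computation (it assumes the usual twelve face cards: jacks, queens, and kings):

```python
# P(King | Face) by Bayes' theorem for a standard 52-card deck.
p_king = 4 / 52
p_face_given_king = 1.0   # every king is a face card
p_face = 12 / 52          # jacks, queens, and kings

p_king_given_face = p_face_given_king * p_king / p_face
print(p_king_given_face)  # 0.333... = 1/3
```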