
Unit-4 : Representing and Reasoning with Uncertain Knowledge

Syllabus: Probability, connection to logic, independence, Bayes' rule, Bayesian networks, probabilistic inference and sample applications.

Uncertainty:

Till now, we have learned knowledge representation using first-order logic and propositional logic with certainty, which means we were sure about the truth of the predicates. With this kind of knowledge representation we might write A→B, which means if A is true then B is true. But consider a situation where we are not sure whether A is true or not; then we cannot express this statement. This situation is called uncertainty.

So to represent uncertain knowledge, where we are not sure about the predicates, we need
uncertain reasoning or probabilistic reasoning.

Causes of uncertainty:

Following are some leading causes of uncertainty in the real world:

1. Information obtained from unreliable sources
2. Experimental errors
3. Equipment faults
4. Temperature variation
5. Climate change
Probabilistic reasoning:

Probabilistic reasoning is a way of knowledge representation in which we apply the concept of probability to indicate the uncertainty in knowledge. In probabilistic reasoning, we combine probability theory with logic to handle uncertainty.

We use probability in probabilistic reasoning because it provides a way to handle the uncertainty that results from laziness and ignorance.

In the real world there are many scenarios where the certainty of something is not confirmed, such as "It will rain today," "the behavior of someone in some situation," or "the outcome of a match between two teams or two players." These are probable sentences for which we can assume that something will happen but cannot be sure about it, so here we use probabilistic reasoning.

Need of probabilistic reasoning in AI:

o When there are unpredictable outcomes.
o When the specifications or possibilities of predicates become too large to handle.
o When an unknown error occurs during an experiment.

In probabilistic reasoning, there are two ways to solve problems with uncertain
knowledge:

o Bayes' rule
o Bayesian Statistics

Since probabilistic reasoning uses probability and related terms, let's first understand some common terms:
Probability:

Probability can be defined as the chance that an uncertain event will occur. It is a numerical measure of the likelihood that an event will occur. The value of a probability always lies between 0 and 1, which represent the two ideal extremes.

0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.

P(A) = 0 indicates total uncertainty about event A.

P(A) = 1 indicates total certainty about event A.

We can find the probability of an event not happening by using the formula below:

o P(¬A) = probability of event A not happening
o P(¬A) + P(A) = 1, so P(¬A) = 1 − P(A)

Event: Each possible outcome of a variable is called an event.

Sample space: The collection of all possible events is called sample space.

Random variables: Random variables are used to represent the events and objects in the
real world.

Prior probability: The prior probability of an event is the probability computed before observing new information.

Posterior probability: The probability that is calculated after all evidence or information has been taken into account. It is a combination of the prior probability and the new information.

Conditional probability:
Conditional probability is the probability of an event occurring given that another event has already happened.

Suppose we want to calculate the probability of event A when event B has already occurred, "the probability of A under the condition of B". It can be written as:

P(A|B) = P(A⋀B) / P(B)

Where P(A⋀B) = joint probability of A and B

P(B) = marginal probability of B.

If the probability of A is given and we need to find the probability of B given A, then it will be:

P(B|A) = P(A⋀B) / P(A)

This can be explained using a Venn diagram: once B has occurred, the sample space is reduced to the set B, and we can calculate the probability of event A given that B has occurred by dividing P(A⋀B) by P(B).
Example:

In a class, 70% of the students like English and 40% of the students like both English and Mathematics. What percentage of the students who like English also like Mathematics?

Solution:

Let A be the event that a student likes Mathematics and B be the event that a student likes English.

P(A|B) = P(A⋀B) / P(B) = 0.4 / 0.7 ≈ 0.57

Hence, about 57% of the students who like English also like Mathematics.
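The same check can be scripted directly from the definition. Below is a minimal illustrative sketch (the function and variable names are ours, not part of the original notes):

def conditional(p_joint, p_given):
    # P(A|B) = P(A and B) / P(B); assumes P(B) > 0.
    return p_joint / p_given

# Class example: P(likes Mathematics | likes English)
p_english_and_maths = 0.40    # P(A and B)
p_english = 0.70              # P(B)
print(conditional(p_english_and_maths, p_english))   # ~0.571, i.e. about 57%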
Independence:

• Let A be the event that it rains tomorrow, and suppose that P(A)= 1/3. Also
suppose that I toss a fair coin; let B be the event that it lands heads up. We have
P(B)= 1/2. Now I ask you, what is P(A|B)?

• What is your guess?

• You probably guessed that P(A|B)=P(A)=1/3 .

• You are right! The result of my coin toss does not have anything to do with
tomorrow's weather. Thus, no matter if B happens or not, the probability of A
should not change.

• This is an example of two independent events.

• Two events are independent if one does not convey any information about the
other.

• Let us now provide a formal definition of independence.

Two events A and B are independent if and only if P(A∩B)=P(A) P(B).

• Now, let's first reconcile this definition with what we mentioned earlier,
P(A|B)=P(A). If two events are independent, then P(A∩B)=P(A) P(B), so

P(A|B) = P(A∩B) / P(B)

= P(A) P(B) / P(B)

= P(A)

• Thus, if two events A and B are independent and P(B)≠0, then P(A|B)=P(A).
• To summarize, we can say that "independence means we can multiply the probabilities of events to obtain the probability of their intersection", or equivalently, "independence means that the conditional probability of one event given another is the same as the original (prior) probability".

Real-World Examples:

Figure: Two examples of factoring a large joint distribution into smaller distributions using absolute independence. (a) Weather and dental problems are independent. (b) Coin flips are independent.

Example: I pick a random number from {1,2,3,⋯,10}, and call it N. Suppose that all
outcomes are equally likely. Let A be the event that N is less than 7, and let B be the
event that N is an even number. Are A and B independent?

Answer: We have A = {1,2,3,4,5,6}, B = {2,4,6,8,10}, and A∩B = {2,4,6}. Then

P(A) = 6/10 = 0.6,

P(B) = 5/10 = 0.5,

P(A∩B) = 3/10 = 0.3

Therefore, P(A∩B)=P(A)P(B), so A and B are independent. This means that knowing


that B has occurred does not change our belief about the probability of A. In this
problem the two events are about the same random number, but they are still
independent because they satisfy the definition.
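This check can also be carried out mechanically by enumerating the sample space. A small sketch in Python (the helper names are ours):

from fractions import Fraction

N = set(range(1, 11))                  # the number is drawn uniformly from {1, ..., 10}
A = {n for n in N if n < 7}            # event: the number is less than 7
B = {n for n in N if n % 2 == 0}       # event: the number is even

def prob(event):
    return Fraction(len(event), len(N))

print(prob(A), prob(B), prob(A & B))       # 3/5 1/2 3/10
print(prob(A & B) == prob(A) * prob(B))    # True, so A and B are independent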

Bayes' Theorem:

Bayes' theorem describes the probability of an event based on prior knowledge of conditions that might be related to the event.

● In probability theory, it relates the conditional probability and marginal probability of two random events.

● Suppose that we know P(B|A), but we are interested in the probability P(A|B). Using the definition of conditional probability, we have the product rule P(A∧B) = P(A|B)P(B), which can actually be written in two forms:

P(A ⋀ B) = P(A|B) P(B)   or   P(A ⋀ B) = P(B|A) P(A)

● Equating the two right-hand sides and dividing by P(B), we get

P(A|B) = P(B|A) P(A) / P(B)

● This equation is known as Bayes' rule (also Bayes' law or Bayes' theorem).

● This simple equation underlies most modern AI systems for probabilistic inference.
● P(A|B) is known as the posterior, which we need to calculate. It is read as the probability of hypothesis A given that evidence B has occurred.

● P(B|A) is called the likelihood: assuming the hypothesis is true, it is the probability of the evidence.

● P(A) is called the prior probability: the probability of the hypothesis before considering the evidence.

● P(B) is called the marginal probability: the probability of the evidence alone.

Applying Bayes' rule:

● Bayes' rule allows us to compute the single term P(A|B) in terms of P(B|A), P(A), and P(B).

● This is very useful in cases where we have good estimates of these three terms and want to determine the fourth one.

● Often we perceive the effect of some unknown cause and want to determine that cause; then Bayes' rule becomes:

P(cause|effect) = P(effect|cause) P(cause) / P(effect)

Example-1:

Question: What is the probability that a patient has dengue given that the patient has neck pain?

Given data:

A doctor is aware that the disease dengue causes a patient to have neck pain 80% of the time. He is also aware of some more facts, which are given as follows:

● The known probability that a patient has dengue is 1/30,000.

● The known probability that a patient has neck pain is 2%.

Let a be the proposition that the patient has neck pain and b be the proposition that the patient has dengue.

So we can calculate the following:

P(a|b) = 0.8

P(b) = 1/30000

P(a) = 0.02

Applying Bayes' rule:

P(b|a) = P(a|b) P(b) / P(a) = (0.8 × 1/30000) / 0.02 = 1/750 ≈ 0.00133

Hence, we can assume that about 1 patient out of 750 patients with neck pain has dengue.
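The same calculation can be scripted as a quick check. This is a minimal sketch; the function and variable names are ours:

def bayes(likelihood, prior, evidence):
    # P(hypothesis | evidence) = P(evidence | hypothesis) * P(hypothesis) / P(evidence)
    return likelihood * prior / evidence

p_pain_given_dengue = 0.8        # likelihood P(a|b)
p_dengue = 1 / 30000             # prior P(b)
p_pain = 0.02                    # marginal probability of the evidence P(a)
print(bayes(p_pain_given_dengue, p_dengue, p_pain))   # ~0.00133, i.e. about 1 in 750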

Following are some applications of Bayes' theorem:

● It is used to calculate the next step of a robot when the previously executed step is known.

● Bayes' theorem is helpful in weather forecasting.

● It can solve the Monty Hall problem.


Example-2:

Question: A single card is drawn from a standard deck of playing cards. The probability that the card is a king is 4/52. Calculate the posterior probability P(King|Face), i.e., the probability that a drawn face card is a king.

Solution:

P(King): probability that the card is a king = 4/52 = 1/13

P(Face): probability that the card is a face card = 12/52 = 3/13

P(Face|King): probability that the card is a face card given that it is a king = 1

Putting all values into Bayes' rule, we get:

P(King|Face) = P(Face|King) P(King) / P(Face) = (1 × 1/13) / (3/13) = 1/3

Bayesian Networks:

"A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a directed acyclic graph."

It is also called a Bayes network, belief network, decision network, or Bayesian model.

Bayesian networks are probabilistic, because these networks are built from a probability
distribution, and also use probability theory for prediction and anomaly detection.

Real world applications are probabilistic in nature, and to represent the relationship
between multiple events, we need a Bayesian network. It can also be used in various
tasks including prediction, anomaly detection, diagnostics, automated insight,
reasoning, time series prediction, and decision making under uncertainty.
A Bayesian network can be used for building models from data and expert opinions, and it consists of two parts:

o Directed acyclic graph

o Table of conditional probabilities.

The generalized form of a Bayesian network that represents and solves decision problems under uncertain knowledge is known as an influence diagram.

A Bayesian network graph is made up of nodes and arcs (directed links), where:

o Each node corresponds to a random variable, and a variable can be continuous or discrete.

o Arcs or directed arrows represent causal relationships or conditional dependencies between random variables. These directed links or arrows connect pairs of nodes in the graph. A link indicates that one node directly influences the other node; if there is no directed link between two nodes, they are independent of each other.

o In the example diagram, A, B, C, and D are random variables represented by the nodes of the network graph.

o If we consider node B, which is connected to node A by a directed arrow, then node A is called the parent of node B.

o Node C is independent of node A.

Note: The Bayesian network graph does not contain any cycles. Hence, it is known as a directed acyclic graph, or DAG.

The Bayesian network has mainly two components:

o Causal Component

o Actual numbers

Each node in the Bayesian network has a conditional probability distribution P(Xi | Parents(Xi)), which quantifies the effect of the parents on that node.

A Bayesian network is based on the joint probability distribution and conditional probability, so let's first understand the joint probability distribution:

Joint probability distribution:

If we have variables x1, x2, x3, ..., xn, then the probability of each combination of values of x1, x2, ..., xn is given by the joint probability distribution.

By the chain rule, P[x1, x2, x3, ..., xn] can be written in terms of conditional probabilities:

P[x1, x2, x3, ..., xn] = P[x1 | x2, x3, ..., xn] P[x2, x3, ..., xn]

= P[x1 | x2, x3, ..., xn] P[x2 | x3, ..., xn] .... P[xn-1 | xn] P[xn].

In a Bayesian network, for each variable Xi we can write the equation as:

P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi))

assuming the variables are numbered in an order consistent with the network structure.

Explanation of Bayesian network:

Let's understand the Bayesian network through an example by creating a directed acyclic
graph:

Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm reliably responds to a burglary but also responds to minor earthquakes. Harry has two neighbors, David and Sophia, who have taken responsibility for informing Harry at work when they hear the alarm. David always calls Harry when he hears the alarm, but sometimes he gets confused with the phone ringing and calls at that time too. On the other hand, Sophia likes to listen to loud music, so sometimes she fails to hear the alarm. Here we would like to compute the probability of the burglar alarm event.

Problem: Calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake has occurred, and both David and Sophia have called Harry.

Solution:

o The Bayesian network for the above problem is given below. The network structure shows that Burglary and Earthquake are the parent nodes of Alarm and directly affect the probability of the alarm going off, whereas David's and Sophia's calls depend only on the alarm.

o The network represents the assumption that David and Sophia do not perceive the burglary directly, do not notice minor earthquakes, and do not confer before calling.

o The conditional distribution for each node is given as a conditional probability table, or CPT.

o Each row in a CPT must sum to 1 because the entries in the row represent an exhaustive set of cases for the variable.

o In a CPT, a Boolean variable with k Boolean parents has 2^k rows of probabilities. Hence, if there are two parents, the CPT will contain 4 probability values.

List of all events occurring in this network:

o Burglary (B)

o Earthquake(E)

o Alarm(A)

o David Calls(D)

o Sophia calls(S)

We can write the joint probability of the events in the problem statement as P[D, S, A, B, E], and rewrite it using the chain rule and the conditional independences encoded by the network:

P[D, S, A, B, E] = P[D | S, A, B, E] P[S, A, B, E]

= P[D | S, A, B, E] P[S | A, B, E] P[A, B, E]

= P[D | A] P[S | A, B, E] P[A, B, E]

= P[D | A] P[S | A] P[A | B, E] P[B, E]

= P[D | A] P[S | A] P[A | B, E] P[B | E] P[E]

= P[D | A] P[S | A] P[A | B, E] P[B] P[E]   (since Burglary and Earthquake are independent)
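A minimal sketch of evaluating this factorization for the query in the problem statement (alarm has sounded, no burglary, no earthquake, both neighbors called). The CPT numbers below are illustrative placeholders, since the figure with the actual tables is not reproduced in these notes:

# Placeholder CPT values (assumed for illustration only).
P_B = 0.001                                   # P(Burglary = true)
P_E = 0.002                                   # P(Earthquake = true)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}     # P(Alarm = true | B, E)
P_D_given_A = {True: 0.91, False: 0.05}       # P(David calls = true | Alarm)
P_S_given_A = {True: 0.75, False: 0.02}       # P(Sophia calls = true | Alarm)

# P(D, S, A, not B, not E) = P(D|A) P(S|A) P(A | not B, not E) P(not B) P(not E)
p = (P_D_given_A[True] * P_S_given_A[True] * P_A[(False, False)]
     * (1 - P_B) * (1 - P_E))
print(p)    # ~0.00068 with these placeholder numbers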


Probabilistic Inference:
● The basic task for any probabilistic inference system is to compute the posterior
probability distribution for a set of query variables, given some observed
event—that is, some assignment of values to a set of evidence variables.

● We will use the notation:

- X denotes the query variable;

- E denotes the set of evidence variables E1, ..., Em, and e is a particular observed event;

- Y denotes the nonevidence, nonquery variables Y1, ..., Yl (called the hidden variables).

● Thus, the complete set of variables is X = {X} ∪ E ∪ Y.

● A typical query asks for the posterior probability distribution P(X | e).

● In the burglary network, we might observe the event in which JohnCalls = true
and MaryCalls = true.

● We could then ask for, say, the probability that a burglary has occurred:

P(Burglary | JohnCalls = true, MaryCalls = true) = ⟨0.284, 0.716⟩.

Two types of probabilistic inference are:

1. Inference by enumeration: inference by enumerating (summing over) all the hidden variables.

2. The variable elimination algorithm: inference by variable removal.


Inference by enumeration:

● Any conditional probability can be computed by summing terms from the full
joint distribution.

● More specifically, a query P(X | e) can be answered using the equation:

P(X | e) = α P(X, e) = α Σy P(X, e, y)

● where α is a normalization constant, X is the query variable, e is the observed evidence, and y ranges over the possible values of the hidden variables Y.

● The terms P(x, e, y) in the joint distribution can be written as products of conditional probabilities from the network.

● Therefore, a query can be answered using a Bayesian network by computing sums


of products of conditional probabilities from the network.

● Consider the query P(Burglary | JohnCalls = true, MaryCalls = true).

Burglary - query variable

JohnCalls - Evidence Variable1 (E1)

MaryCalls - Evidence Variable2 (E2)

● The hidden variables for this query are Earthquake and Alarm.
● Using initial letters for the variables to shorten the expressions, we have

P(B | j, m) = α Σe Σa P(B, j, m, e, a)

● The semantics of Bayesian networks then gives us an expression in terms of CPT entries. For simplicity, we do this just for Burglary = true:

P(b | j, m) = α Σe Σa P(b) P(e) P(a | b, e) P(j | a) P(m | a)

● To compute this expression, we have to add four terms, each computed by multiplying five numbers.

● An improvement can be obtained from the following simple observations: the P(b) term is a constant and can be moved outside the summations over a and e, and the P(e) term can be moved outside the summation over a. Hence, we have:

P(b | j, m) = α P(b) Σe P(e) Σa P(a | b, e) P(j | a) P(m | a)

● This expression can be evaluated by looping through the variables in order,


multiplying CPT entries as we go.

● For each summation, we also need to loop over the variable’s possible values.

● The structure of this computation is shown in the figure below.

Figure: Alarm system example.

● Using the numbers from Figure 4.4, we obtain P(b | j, m) = α × 0.00059224. The corresponding computation for ¬b yields α × 0.0014919; hence,

P(B | j, m) = α ⟨0.00059224, 0.0014919⟩ ≈ ⟨0.284, 0.716⟩

● That is, the chance of a burglary, given calls from both neighbors, is about 28%.
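A compact sketch of inference by enumeration for this query, using the standard textbook CPT values for the alarm network (an assumption, since the tables themselves are not reproduced in these notes):

from itertools import product

P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A_true = {(True, True): 0.95, (True, False): 0.94,
            (False, True): 0.29, (False, False): 0.001}   # P(Alarm = true | B, E)
P_J_true = {True: 0.90, False: 0.05}                       # P(JohnCalls = true | Alarm)
P_M_true = {True: 0.70, False: 0.01}                       # P(MaryCalls = true | Alarm)

def joint(b, e, a, j, m):
    # Full joint probability of one complete assignment, as a product of CPT entries.
    pa = P_A_true[(b, e)] if a else 1 - P_A_true[(b, e)]
    pj = P_J_true[a] if j else 1 - P_J_true[a]
    pm = P_M_true[a] if m else 1 - P_M_true[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# P(Burglary | j = true, m = true): sum out the hidden variables E and A, then normalize.
unnormalized = {b: sum(joint(b, e, a, True, True)
                       for e, a in product([True, False], repeat=2))
                for b in (True, False)}
alpha = 1.0 / sum(unnormalized.values())
print({b: round(alpha * p, 3) for b, p in unnormalized.items()})   # {True: 0.284, False: 0.716}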
The variable elimination algorithm:

● The enumeration algorithm can be improved substantially by eliminating repeated calculations of the kind illustrated in the figure above.

● The idea is simple: do the calculation once and save the results for later use.

● This is a form of dynamic programming.

● Variable elimination works by evaluating expressions in right-to-left order (that is, bottom up in Figure 4.5).

● Intermediate results are stored, and summations over each variable are done only
for those portions of the expression that depend on the variable.

● Let us illustrate this process for the burglary network. We evaluate the expression

P(B | j, m) = α P(B) Σe P(e) Σa P(a | B, e) P(j | a) P(m | a)
            = α f1(B) × Σe f2(E) × Σa f3(A, B, E) × f4(A) × f5(A)
● Notice that we have annotated each part of the expression with the name of the
corresponding factor; each factor is a matrix indexed by the values of its argument
variables.

● For example, the factors f4(A) and f5(A), corresponding to P(j | a) and P(m | a), depend just on A because J and M are fixed by the query.

● They are therefore two-element vectors: f4(A) = ⟨P(j | a), P(j | ¬a)⟩ and f5(A) = ⟨P(m | a), P(m | ¬a)⟩.
● f3(A, B, E) will be a 2 × 2 × 2 matrix, which is hard to show on the printed page.


(The “first” element is given by P(a | b, e)=0.95 and the “last” by P(¬a |
¬b,¬e)=0.999.)

● In terms of factors, the query expression is written as

P(B | j, m) = α f1(B) × Σe f2(E) × Σa f3(A, B, E) × f4(A) × f5(A)

● where the "×" operator is not ordinary matrix multiplication but instead the pointwise product operation, to be described shortly.

● The process of evaluation is a process of summing out variables (right to left) from
pointwise products of factors to produce new factors, eventually yielding a factor
that is the solution, i.e., the posterior distribution over the query variable.

● The steps are as follows:

f6(B, E) = Σa f3(A, B, E) × f4(A) × f5(A)

f7(B) = Σe f2(E) × f6(B, E)

P(B | j, m) = α f1(B) × f7(B),

which can be evaluated by taking the pointwise product and normalizing the result.

● Examining this sequence, we see that two basic computational operations are required:

- pointwise product of a pair of factors, and

- summing out a variable from a product of factors.

Pointwise product of a pair of factors

● The pointwise product of two factors f1 and f2 yields a new factor f whose
variables are the union of the variables in f1 and f2 and whose elements are given
by the product of the corresponding elements in the two factors.

● Suppose the two factors have variables Y1,...,Yk in common. Then we have

f(X1 ...Xj , Y1 ...Yk, Z1 ...Zl) = f1(X1 ...Xj , Y1 ...Yk) f2(Y1 ...Yk, Z1 ...Zl).
● If all the variables are binary, then f1 and f2 have 2^(j+k) and 2^(k+l) entries, respectively, and the pointwise product has 2^(j+k+l) entries.

● For example, given two factors f1(A, B) and f2(B, C), the pointwise product f1 × f2 = f3(A, B, C) has 2^(1+1+1) = 8 entries, as illustrated in Figure 4.6 below.

● Notice that the factor resulting from a pointwise product can contain more
variables than any of the factors being multiplied and that the size of a factor is
exponential in the number of variables.

● This is where both space and time complexity arise in the variable elimination
algorithm.

Figure: Illustrating pointwise multiplication: f1(A, B) × f2(B, C) = f3(A, B, C).
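A small sketch of the pointwise product on factors stored as dictionaries keyed by assignments of their variables (an illustrative representation, not taken from the text):

from itertools import product

def pointwise_product(f1, vars1, f2, vars2):
    # Each factor maps a tuple of True/False values (one per variable) to a number.
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    out = {}
    for assignment in product([True, False], repeat=len(out_vars)):
        env = dict(zip(out_vars, assignment))
        key1 = tuple(env[v] for v in vars1)
        key2 = tuple(env[v] for v in vars2)
        out[assignment] = f1[key1] * f2[key2]
    return out, out_vars

# Example shapes: f1(A, B) x f2(B, C) = f3(A, B, C), which has 2^(1+1+1) = 8 entries.
f1 = {(True, True): 0.3, (True, False): 0.7, (False, True): 0.9, (False, False): 0.1}
f2 = {(True, True): 0.2, (True, False): 0.8, (False, True): 0.6, (False, False): 0.4}
f3, f3_vars = pointwise_product(f1, ['A', 'B'], f2, ['B', 'C'])
print(f3_vars, len(f3))           # ['A', 'B', 'C'] 8
print(f3[(True, True, True)])     # f1(a, b) * f2(b, c) = 0.3 * 0.2 = 0.06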

Summing out a variable from a product of factors is done by adding up the submatrices formed by fixing the variable to each of its values in turn.

● For example, to sum out A from f3(A, B, C), we write

f(B, C) = Σa f3(A, B, C) = f3(a, B, C) + f3(¬a, B, C)

● The only trick is to notice that any factor that does not depend on the variable to be summed out can be moved outside the summation.

● For example, if we were to sum out E first in the burglary network, the relevant part of the expression would be

Σe f2(E) × f3(A, B, E) × f4(A) × f5(A) = f4(A) × f5(A) × Σe f2(E) × f3(A, B, E)
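Summing out can be written as a similarly small helper over the same dictionary representation of factors (again illustrative, with made-up numbers):

from itertools import product

def sum_out(var, f, f_vars):
    # Sum a variable out of a factor stored as {assignment tuple: value}.
    idx = f_vars.index(var)
    out_vars = [v for v in f_vars if v != var]
    out = {}
    for assignment, value in f.items():
        key = assignment[:idx] + assignment[idx + 1:]   # drop the summed-out variable
        out[key] = out.get(key, 0.0) + value
    return out, out_vars

# A made-up factor f3(A, B, C) with 8 equal entries, just to show the shapes involved.
f3_vars = ['A', 'B', 'C']
f3 = {assignment: 0.125 for assignment in product([True, False], repeat=3)}
f_bc, bc_vars = sum_out('A', f3, f3_vars)
print(bc_vars, f_bc[(True, True)])    # ['B', 'C'] 0.25  (= 0.125 + 0.125)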

The variable elimination algorithm:

● Now the pointwise product inside the summation is computed, and the variable is summed out of the resulting matrix.

● Notice that matrices are not multiplied until we need to sum out a variable from
the accumulated product.

● At that point, we multiply just those matrices that include the variable to be
summed out.
