Chapter 4
Uncertainty:
Till now, we have learned knowledge representation using first-order logic and
propositional logic with certainty, which means we were sure about the predicates. With
this knowledge representation we might write A→B, which means that if A is true then B is
true. But consider a situation where we are not sure whether A is true or not; then we
cannot express this statement. This situation is called uncertainty.
So to represent uncertain knowledge, where we are not sure about the predicates, we need
uncertain reasoning or probabilistic reasoning.
Causes of uncertainty:
Following are some leading causes of uncertainty in the real world: information obtained
from unreliable sources, experimental errors, equipment faults, and unpredictable
environmental changes.
In the real world there are many scenarios where the certainty of something is not
confirmed, such as "It will rain today," "the behavior of someone in some situation," or
"a match between two teams or two players." These are probable sentences for which we
can assume that something will happen but cannot be sure about it, so here we use
probabilistic reasoning.
In probabilistic reasoning, there are two ways to solve problems with uncertain
knowledge:
o Bayes' rule
o Bayesian Statistics
Probability can be defined as the chance that an uncertain event will occur; it is a
numerical measure of the likelihood that an event will occur. The value of probability
always lies between 0 and 1, where 0 represents an impossible event and 1 represents a
certain event.
We can find the probability of an uncertain event by using the formula below:
P(A) = Number of favorable outcomes / Total number of outcomes
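As a quick illustrative check of this formula, consider the probability of rolling an even number with a fair six-sided die (the die example is ours, not from the text):

```python
# P(A) = number of favorable outcomes / total number of outcomes
outcomes = [1, 2, 3, 4, 5, 6]                    # a fair six-sided die
favorable = [n for n in outcomes if n % 2 == 0]  # event: roll is even

p_even = len(favorable) / len(outcomes)
print(p_even)  # → 0.5
```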
Sample space: The collection of all possible events is called sample space.
Random variables: Random variables are used to represent the events and objects in the
real world.
Prior Probability: The probability of an event that is calculated before any new evidence
is taken into account.
Posterior Probability: The probability that is calculated after all evidence or information
has been taken into account. It is a combination of prior probability and new information.
Conditional probability:
Conditional probability is the probability of an event occurring when another event has
already happened.
Let's suppose we want to calculate the probability of event A when event B has already
occurred, "the probability of A under the condition B". It can be written as:
P(A|B) = P(A⋀B) / P(B)
where P(A⋀B) is the joint probability of A and B, and P(B) is the probability of B.
If, instead, we need the probability of B given A, it is given as:
P(B|A) = P(A⋀B) / P(A)
This can be explained using a Venn diagram: once event B has occurred, the sample space
is reduced to the set B, and we can calculate the probability of event A given B by
dividing P(A⋀B) by P(B).
Example:
In a class, 70% of the students like English and 40% of the students like both English
and mathematics. What percent of the students who like English also like mathematics?
Solution:
Let E be the event that a student likes English and M be the event that a student likes
mathematics. Then:
P(M|E) = P(E⋀M) / P(E) = 0.4 / 0.7 ≈ 0.57
Hence, 57% of the students who like English also like Mathematics.
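The arithmetic in this example can be checked with a short Python sketch (the variable names are illustrative):

```python
# Conditional probability: P(M|E) = P(E and M) / P(E)
p_english = 0.70            # P(E): student likes English
p_english_and_math = 0.40   # P(E and M): student likes both subjects

p_math_given_english = p_english_and_math / p_english
print(round(p_math_given_english, 2))  # → 0.57
```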
Independence:
• Let A be the event that it rains tomorrow, and suppose that P(A)= 1/3. Also
suppose that I toss a fair coin; let B be the event that it lands heads up. We have
P(B)= 1/2. Now I ask you, what is P(A|B)?
• You are right! The result of my coin toss does not have anything to do with
tomorrow's weather. Thus, no matter if B happens or not, the probability of A
should not change.
• Two events are independent if one does not convey any information about the
other.
• Now, let's reconcile this definition with the rule mentioned earlier, P(A|B) = P(A).
If two events are independent, then P(A∩B) = P(A) P(B), so
P(A|B) = P(A∩B) / P(B) = P(A) P(B) / P(B) = P(A)
• Thus, if two events A and B are independent and P(B) ≠ 0, then P(A|B) = P(A).
• To summarize, we can say "independence means we can multiply the probabilities of
events to obtain the probability of their intersection", or equivalently, "independence
means that the conditional probability of one event given another is the same as the
original (prior) probability".
Example: I pick a random number from {1,2,3,⋯,10}, and call it N. Suppose that all
outcomes are equally likely. Let A be the event that N is less than 7, and let B be the
event that N is an even number. Are A and B independent?
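We can answer this question by brute-force enumeration over the ten equally likely outcomes:

```python
from fractions import Fraction

outcomes = range(1, 11)                  # N drawn uniformly from {1, ..., 10}
A = {n for n in outcomes if n < 7}       # event: N is less than 7
B = {n for n in outcomes if n % 2 == 0}  # event: N is even

p = lambda event: Fraction(len(event), 10)
print(p(A), p(B), p(A & B))     # → 3/5 1/2 3/10
print(p(A & B) == p(A) * p(B))  # → True
```

Since P(A∩B) = 3/10 = (3/5) × (1/2) = P(A) P(B), the events A and B are independent.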
Bayes' Theorem:
Describes the probability of an event, based on the prior knowledge of conditions that
might be related to the event.
● Suppose that we know P(a|b), but we are interested in the probability P(b|a). Using
the definition of conditional probability, we have the product rule P(A∧B) =
P(A|B)P(B), which can actually be written in two forms:
P(A∧B) = P(A|B)P(B) = P(B|A)P(A)
Equating the two right-hand sides and dividing by P(A) gives:
P(B|A) = P(A|B)P(B) / P(A)
● This equation is known as Bayes’ rule (also Bayes’ law or Bayes’ theorem).
● P(B|A) is called the likelihood: assuming that the hypothesis B is true, it is the
probability of observing the evidence A.
● Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B),
and P(A).
● This is very useful in cases where we have a good probability of these three terms
and want to determine the fourth one.
● Suppose we want to perceive the effect of some unknown cause and compute the
probability of that cause; then Bayes' rule becomes:
P(cause | effect) = P(effect | cause) P(cause) / P(effect)
Example-1: Question: What is the probability that a patient has dengue, given that
the patient has neck pain?
Given Data:
A doctor is aware that the disease dengue causes a patient to have neck pain, and it
occurs 80% of the time. He is also aware of some more facts, which are given as follows:
● The known probability that a patient has dengue is 1/30,000.
● The known probability that a patient has neck pain is 2%.
Let a be the proposition that the patient has neck pain and b be the proposition that the
patient has dengue. Then:
P(a|b) = 0.8
P(b) = 1/30000
P(a) = 0.02
Applying Bayes' rule:
P(b|a) = P(a|b) P(b) / P(a) = (0.8 × 1/30000) / 0.02 ≈ 0.00133
Hence, we can assume that about 1 out of 750 patients with neck pain has dengue.
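The same calculation, written as a short Python sketch:

```python
# Bayes' rule: P(b|a) = P(a|b) * P(b) / P(a)
p_pain_given_dengue = 0.8  # P(a|b): neck pain given dengue
p_dengue = 1 / 30000       # P(b): prior probability of dengue
p_pain = 0.02              # P(a): probability of neck pain

p_dengue_given_pain = p_pain_given_dengue * p_dengue / p_pain
print(round(1 / p_dengue_given_pain))  # → 750, i.e. about 1 patient in 750
```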
● Bayes' rule is also used, for example, to calculate the probability of a robot's next
step when the already executed step is given.
Example-2:
Question: From a standard deck of playing cards, a single card is drawn. The
probability that the card is a king is 4/52. Calculate the posterior probability
P(King|Face), i.e., the probability that the drawn card is a king given that it is a
face card.
Solution:
Every king is a face card, so P(Face|King) = 1, and a standard deck has 12 face cards,
so P(Face) = 12/52. By Bayes' rule:
P(King|Face) = P(Face|King) P(King) / P(Face) = (1 × 4/52) / (12/52) = 1/3
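A quick check of the card example with exact fractions:

```python
from fractions import Fraction

p_king = Fraction(4, 52)         # P(King)
p_face = Fraction(12, 52)        # P(Face): 12 face cards per deck
p_face_given_king = Fraction(1)  # every king is a face card

# Bayes' rule: P(King|Face) = P(Face|King) * P(King) / P(Face)
p_king_given_face = p_face_given_king * p_king / p_face
print(p_king_given_face)  # → 1/3
```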
Bayesian Networks:
Bayesian networks are probabilistic, because these networks are built from a probability
distribution, and also use probability theory for prediction and anomaly detection.
Real world applications are probabilistic in nature, and to represent the relationship
between multiple events, we need a Bayesian network. It can also be used in various
tasks including prediction, anomaly detection, diagnostics, automated insight,
reasoning, time series prediction, and decision making under uncertainty.
A Bayesian network can be used for building models from data and expert opinions, and it
consists of two parts:
o A directed acyclic graph
o A table of conditional probabilities
The generalized form of a Bayesian network that represents and solves decision problems
under uncertain knowledge is known as an influence diagram.
A Bayesian network graph is made up of nodes and arcs (directed links), where:
o Each node corresponds to a random variable, which can be continuous or discrete.
o Arcs or directed arrows represent the causal relationship or conditional dependence
between random variables: an arc from node X to node Y means that X has a direct
influence on Y (X is a parent of Y).
Note: The Bayesian network graph does not contain any directed cycle. Hence, it is known
as a directed acyclic graph, or DAG.
Each node in the network has two components:
o Causal component
o Actual numbers (its conditional probability table)
If we have variables x1, x2, x3, ..., xn, then the probabilities of the different
combinations of x1, x2, x3, ..., xn are known as the joint probability distribution.
The joint probability P[x1, x2, x3, ..., xn] can be written in the following way in terms
of the joint probability distribution, using the chain rule:
P[x1, x2, x3, ..., xn] = P[x1 | x2, x3, ..., xn] P[x2 | x3, ..., xn] ... P[xn-1 | xn] P[xn]
In general, for each variable Xi in a Bayesian network we can write the equation as:
P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi))
Let's understand the Bayesian network through an example by creating a directed acyclic
graph:
Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm
responds reliably to a burglary, but it also responds to minor earthquakes. Harry has
two neighbors, David and Sophia, who have taken responsibility to inform Harry at work
when they hear the alarm. David always calls Harry when he hears the alarm, but
sometimes he gets confused with the phone ringing and calls then as well. On the other
hand, Sophia likes to listen to loud music, so she sometimes misses the alarm. Here we
would like to compute the probability of the burglar alarm.
Problem: Calculate the probability that the alarm has sounded, but neither a burglary
nor an earthquake has occurred, and both David and Sophia called Harry.
Solution:
o The Bayesian network for the above problem is given below. The network structure
shows that Burglary and Earthquake are the parent nodes of Alarm and directly affect
the probability of the alarm going off, while David's and Sophia's calls depend on
the alarm.
o The network thus encodes our assumptions that David and Sophia do not directly
perceive the burglary, do not notice minor earthquakes, and do not confer before
calling.
o The conditional distribution for each node is given as a conditional probability
table, or CPT.
o Each row in a CPT must sum to 1 because the entries in a row represent an exhaustive
set of cases for the variable.
o In a CPT, a boolean variable with k boolean parents contains 2^k independently
specifiable probabilities. Hence, if there are two parents, the CPT will contain 4
probability values.
The list of events occurring in this network:
o Burglary (B)
o Earthquake (E)
o Alarm (A)
o David Calls (D)
o Sophia Calls (S)
We can write the events of the problem statement in the form of probability as
P[D, S, A, B, E], and we can rewrite this probability statement using the joint
probability distribution:
P[D, S, A, B, E] = P[D | A] P[S | A] P[A | B, E] P[B] P[E]
since D and S depend only on A, A depends on B and E, and B and E are independent
root nodes.
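As a sketch of how the problem's query is evaluated: P(D, S, A, ¬B, ¬E) is a product of one CPT entry per node. The CPT tables themselves are not reproduced in this text, so the numbers below are illustrative placeholders for them:

```python
# Illustrative CPT entries (placeholders; the chapter's tables are not shown here)
p_b = 0.002                    # P(Burglary)
p_e = 0.001                    # P(Earthquake)
p_a_given_not_b_not_e = 0.001  # P(Alarm | no burglary, no earthquake)
p_d_given_a = 0.91             # P(David calls | Alarm)
p_s_given_a = 0.75             # P(Sophia calls | Alarm)

# P(D, S, A, not B, not E) = P(D|A) P(S|A) P(A|not B, not E) P(not B) P(not E)
p = p_d_given_a * p_s_given_a * p_a_given_not_b_not_e * (1 - p_b) * (1 - p_e)
print(p)  # ≈ 0.00068
```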
● A typical query asks for the posterior probability distribution P(X | e).
● In the burglary network, we might observe the event in which JohnCalls = true
and MaryCalls = true.
● We could then ask for, say, the probability that a burglary has occurred:
P(Burglary | JohnCalls = true, MaryCalls = true)
● Any conditional probability can be computed by summing terms from the full joint
distribution:
P(X | e) = α P(X, e) = α Σy P(X, e, y)
● The terms P(x, e, y) in the joint distribution can be written as products of
conditional probabilities from the network.
● The hidden variables for this query are Earthquake and Alarm.
● Using initial letters for the variables to shorten the expressions, we have:
P(b | j, m) = α Σe Σa P(b) P(e) P(a | b, e) P(j | a) P(m | a)
● For each summation, we also need to loop over the variable's possible values.
● Using the numbers from Figure 4.4, we obtain P(b | j, m) = α × 0.00059224. The
corresponding computation for ¬b yields α × 0.0014919; hence,
P(B | j, m) = α ⟨0.00059224, 0.0014919⟩ ≈ ⟨0.284, 0.716⟩
● That is, the chance of a burglary, given calls from both neighbors, is about 28%.
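The normalization constant α simply rescales the two unnormalized values for b and ¬b so that they sum to 1; a minimal Python check:

```python
# Unnormalized enumeration results for b and not-b (from the text)
unnormalized = {"b": 0.00059224, "not_b": 0.0014919}

alpha = 1 / sum(unnormalized.values())  # normalization constant
posterior = {k: alpha * v for k, v in unnormalized.items()}
print(round(posterior["b"], 3))  # → 0.284
```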
The variable elimination algorithm:
● The enumeration algorithm can be improved substantially by eliminating repeated
calculations of the kind illustrated in Figure.
● The idea is simple: do the calculation once and save the results for later use.
● Intermediate results are stored, and summations over each variable are done only
for those portions of the expression that depend on the variable.
● Let us illustrate this process for the burglary network. We evaluate the expression
P(B | j, m) = α f1(B) × Σe f2(E) × Σa f3(A, B, E) × f4(A) × f5(A)
● Notice that we have annotated each part of the expression with the name of the
corresponding factor; each factor is a matrix indexed by the values of its argument
variables.
● For example, the factors f4(A) and f5(A) corresponding to P(j | a) and P(m | a)
depend just on A because J and M are fixed by the query.
● The process of evaluation is a process of summing out variables (right to left) from
pointwise products of factors to produce new factors, eventually yielding a factor
that is the solution, i.e., the posterior distribution over the query variable:
f6(B, E) = Σa f3(A, B, E) × f4(A) × f5(A)
f7(B) = Σe f2(E) × f6(B, E)
P(B | j, m) = α f1(B) × f7(B)
which can be evaluated by taking the pointwise product and normalizing the result.
● Examining this sequence, we see that two basic computational operations are
required:
- pointwise product of a pair of factors, and
- summing out a variable from a product of factors.
● The pointwise product of two factors f1 and f2 yields a new factor f whose
variables are the union of the variables in f1 and f2 and whose elements are given
by the product of the corresponding elements in the two factors.
● Suppose the two factors have variables Y1,...,Yk in common. Then we have
f(X1 ...Xj , Y1 ...Yk, Z1 ...Zl) = f1(X1 ...Xj , Y1 ...Yk) f2(Y1 ...Yk, Z1 ...Zl).
● If all the variables are binary, then f1 and f2 have 2^(j+k) and 2^(k+l) entries,
respectively, and the pointwise product has 2^(j+k+l) entries.
● For example, given two factors f1(A, B) and f2(B, C), the pointwise product f1 × f2
= f3(A, B, C) has 2^(1+1+1) = 8 entries, as illustrated in Figure 4.6.
● Notice that the factor resulting from a pointwise product can contain more
variables than any of the factors being multiplied and that the size of a factor is
exponential in the number of variables.
● This is where both space and time complexity arise in the variable elimination
algorithm.
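The pointwise product can be sketched in a few lines of Python, representing a factor over binary variables as a dict from value tuples to numbers (the variable names and factor entries below are made up for illustration):

```python
from itertools import product

def pointwise_product(f1, vars1, f2, vars2):
    """Pointwise product of two factors over binary variables.

    A factor is a dict mapping a tuple of 0/1 values (one per variable,
    in the order given by its variable list) to a number.
    """
    union = list(dict.fromkeys(vars1 + vars2))  # union of variables, in order
    result = {}
    for assignment in product([0, 1], repeat=len(union)):
        env = dict(zip(union, assignment))
        value1 = f1[tuple(env[v] for v in vars1)]
        value2 = f2[tuple(env[v] for v in vars2)]
        result[assignment] = value1 * value2
    return result

# f1(A, B) and f2(B, C) with made-up entries; B is the shared variable
f1 = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.25, (1, 1): 0.75}
f2 = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.25, (1, 1): 0.75}
f3 = pointwise_product(f1, ["A", "B"], f2, ["B", "C"])
print(len(f3))  # → 8, i.e. 2^(1+1+1) entries
```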
● For example, if we were to sum out E first in the burglary network, the relevant
part of the expression would be:
Σe f2(E) × f3(A, B, E)
● Now the pointwise product inside the summation is computed, and the variable is
summed out of the resulting matrix.
● Notice that matrices are not multiplied until we need to sum out a variable from
the accumulated product.
● At that point, we multiply just those matrices that include the variable to be
summed out.
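The other basic operation, summing a variable out of a factor, can be sketched as follows, again representing a factor over binary variables as a dict from value tuples to numbers (the entries are made up for illustration):

```python
def sum_out(factor, variables, var):
    """Sum a binary variable out of a factor, returning a smaller factor.

    A factor is a dict mapping tuples of 0/1 values (ordered as in
    `variables`) to numbers.
    """
    keep = [v for v in variables if v != var]
    idx = [variables.index(v) for v in keep]
    result = {}
    for assignment, value in factor.items():
        key = tuple(assignment[i] for i in idx)
        result[key] = result.get(key, 0.0) + value
    return result, keep

# Sum B out of a factor f(A, B) with made-up entries
f = {(0, 0): 0.25, (0, 1): 0.75, (1, 0): 0.5, (1, 1): 0.5}
g, g_vars = sum_out(f, ["A", "B"], "B")
print(g, g_vars)  # g[(0,)] = 0.25 + 0.75, g[(1,)] = 0.5 + 0.5
```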