AI Unit 3: Knowledge Representation
Knowledge Representation
By
Dr. Mrudul Dixit
Logic, Propositional logic, First order logic,
[Figure: agent cycle: Sensing → Decision Maker (Knowledge) → Action]
• The Wumpus world is a cave which has 4×4 rooms connected with passageways.
• There are in total 16 rooms which are connected with each other.
• There is a knowledge-based agent who moves forward through this world.
• The cave has a room with a beast called the Wumpus, who eats anyone who enters the room.
• The Wumpus can be shot by the agent, but the agent has a single arrow.
• In the Wumpus world, there are some pit rooms which are bottomless, and if the agent falls into a pit, he will
be stuck there forever.
• The exciting thing about this cave is that in one room there is a possibility of finding a heap of gold.
• So the agent's goal is to find the gold and climb out of the cave without falling into a pit or being eaten by the Wumpus.
• The agent gets a reward if he comes out with the gold, and he gets a penalty if he is eaten by the Wumpus or falls into
a pit.
Knowledge Representation
• Wumpus World:
Knowledge Representation
• PEAS for the Wumpus world:
• P-Performance :
1) 1000 points when gold found
2) -100 when fall in pit
3) -1 for every move
4) -10 when arrow is used
● E-Environment: a 4×4 grid with pits in some squares, gold in one square, and the agent starting at square [1,1]
● A-Actuators:
1) Turn 90 degree left or right
2) Walk 1 square forward
3) Grab or take object
4) Shoot arrow
● S-Sensor: 5 sensors:
1) In the rooms adjacent to the Wumpus (not diagonally), the agent perceives a stench
2) In the squares directly adjacent to a pit (not diagonally), the agent perceives a breeze
3) In the room containing the gold, the agent perceives a glitter
4) When the agent walks into a wall, he perceives a bump
5) When the Wumpus is killed, it screams, and the scream can be perceived anywhere in the environment
● The agent draws conclusions based on facts
● Reasoning leads the agent to take correct actions, and it depends on the correctness of the facts
● So logical reasoning is essential to reach correct conclusions (a small percept-model sketch follows)
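To make the percept model concrete, here is a minimal Python sketch of it; the wumpus, pit, and gold positions are made up for illustration.

def adjacent(a, b):
    """Rooms are adjacent if they differ by 1 in exactly one coordinate (no diagonals)."""
    (x1, y1), (x2, y2) = a, b
    return abs(x1 - x2) + abs(y1 - y2) == 1

WUMPUS, GOLD = (1, 3), (2, 3)          # hypothetical positions on the 4x4 grid
PITS = {(3, 1), (3, 3), (4, 4)}        # hypothetical pit locations

def percepts(room):
    """Return the stench/breeze/glitter percepts for a given room."""
    return {
        "stench":  adjacent(room, WUMPUS),
        "breeze":  any(adjacent(room, p) for p in PITS),
        "glitter": room == GOLD,
    }

print(percepts((2, 1)))   # breezy here, because (3, 1) is a pit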
Knowledge Representation
• Wumpus World:
• It is modeled using propositional logic, entailment, and logical inference
• Propositional Logic: Boolean logic
• It is the way in which the truth of sentences is determined
• It is a simple logic
• Entailment:
• It is the relation between a sentence and another sentence that follows from it; we will see how this
relation leads to a simple algorithm for logical inference.
• Syntax: The syntax of propositional logic defines the allowable sentences.
• The atomic sentences: the indivisible syntactic elements consist of a single proposition symbol
• Each such symbol stands for a proposition that can be true or false.
• The uppercase names for symbols: P, Q, R, and so on.
• The names are arbitrary but are often chosen to have some mnemonic value to the reader.
• Example: use W1,3 to stand for the proposition that the Wumpus is in [1,3].
• Symbols such as W1,3 are atomic, i.e., W, 1, and 3 are not meaningful parts of the symbol.
• There are two proposition symbols with fixed meanings: True is the always-true proposition
and False is the always-false proposition.
• Complex sentences are constructed from simpler sentences using logical connectives.
Logic, Propositional logic, First order logic,
● Many knowledge representation systems rely on some variant of logic, such as propositional logic, first-order logic, and
temporal logic
● Logic defines:
− Syntax: describes how sentences are formed in the language
− Semantics: describes the meaning of sentences, what is it the sentence refers to in the real world
− Inference procedure: captures the logical notion of truth and the property of completeness, so that we can establish whether a statement is true
● Propositional logic:
− Propositional logic is the simplest type of logic
− A proposition is a statement that is either true or false
− It is defined by a syntax and a semantics
● Syntax: relates to the formal structure of the language
− Atomic statements and complex statements
● Semantics is the meaning
● Examples: Atomic Statement
− Pitt is located in the Oakland section of Pittsburgh.
− It is raining today.
● Complex sentences:
− It is raining outside and the traffic in Oakland is heavy.
− It is raining outside AND (∧) traffic in Oakland is heavy
● First-order logic: a richer logic in which objects, relations, and properties are explicit
Knowledge Representation
• Propositional logic is formed using:
1. Logical constants: true, false
2. Propositional symbols: P, Q, ... (atomic sentences)
3. Wrapping parentheses: ( … )
4. Logical connectives: ¬ (not), ∧ (and), ∨ (or), → (implies), ↔ (equivalent)
Examples of Propositional Logic sentences
∙ (P ∧ Q) → R
“If it is hot and humid, then it is raining”
∙ Q→P
“If it is humid, then it is hot”
∙Q
“It is humid.”
∙ We can have any symbols for statements
A Backus–Naur form or Backus normal form (BNF) grammar of
sentences in propositional logic
BNF is a context-free grammar notation used to describe the syntax of programming
languages
S := <Sentence> ;
<Sentence> := <AtomicSentence> | <ComplexSentence> ;
<AtomicSentence> := "TRUE" | "FALSE" |
"P" | "Q" | "S" ;
<ComplexSentence> := "(" <Sentence> ")" |
<Sentence> <Connective> <Sentence> |
"NOT" <Sentence> ;
<Connective> := "AND" | "OR" | "IMPLIES" | "EQUIVALENT" ;
Formal grammar of propositional logic
Example: an expression such as A = B + C*100 is parsed according to a grammar into a syntax tree (which can be stored as a linked structure). (A small evaluation sketch follows.)
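A minimal Python sketch, assuming sentences built from this grammar are represented as nested tuples, of how such a sentence can be evaluated in a given model:

def holds(sentence, model):
    """Evaluate a propositional sentence in a model (a dict of symbol -> True/False)."""
    if isinstance(sentence, str):                 # atomic sentence: "P", "Q", ...
        return model[sentence]
    op, *args = sentence
    if op == "NOT":
        return not holds(args[0], model)
    if op == "AND":
        return holds(args[0], model) and holds(args[1], model)
    if op == "OR":
        return holds(args[0], model) or holds(args[1], model)
    if op == "IMPLIES":                           # P -> Q is false only when P is true and Q is false
        return (not holds(args[0], model)) or holds(args[1], model)
    if op == "EQUIVALENT":
        return holds(args[0], model) == holds(args[1], model)
    raise ValueError("unknown connective: " + op)

# (P AND Q) IMPLIES R  --  "if it is hot and humid, then it is raining"
sentence = ("IMPLIES", ("AND", "P", "Q"), "R")
print(holds(sentence, {"P": True, "Q": True, "R": False}))   # False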
Knowledge Representation
Propositional Logic: its foundation is a declarative, compositional semantics that is context-independent and
unambiguous; a more expressive logic is built on that foundation, borrowing representational ideas from natural
language while avoiding its drawbacks.
• For example, without precedence rules an expression like a + b*c is ambiguous: it could be read as (a+b)*c or as a + (b*c).
• In the syntax of natural language, the elements are nouns and noun phrases that refer to objects (squares, pits,
wumpuses) and verbs and verb phrases that refer to relations among objects (is breezy, is adjacent to, shoots).
• Some of these relations are functions—relations in which there is only one “value” for a given “input.”
• It is easy to start listing examples of objects, relations, and functions:
• Objects: people, houses, numbers, theories, Ronald McDonald, colors, baseball games, wars, centuries . . .
• Relations: these can be unary relations or properties such as red, round, bogus, prime, multistoried . . ., or more
general n-ary relations such as brother of, bigger than, inside, part of, has color, occurred after, owns, comes
between, . . .
• Functions: father of, best friend, third inning of, one more than, beginning of . . .
Knowledge Representation
Examples:
1. “One plus two equals three.”
• Objects: one, two, three, one plus two;
• Relation: equals;
• Function: plus. (“One plus two” is a name for the object that is obtained by applying the function “plus” to the
objects “one” and “two.” “Three” is another name for this object.)
∙ Inference rules are applied to derive proofs, and a proof is a sequence of conclusions that leads to the
desired goal.
∙ Logical inference creates new sentences that logically follow from a set of sentences (the knowledge base, KB)
∙ An inference rule is sound if every sentence X it produces when operating on a KB logically follows from
the KB, i.e., the inference rule creates no contradictions
∙ An inference rule is complete if it can produce every expression that logically follows from (is entailed by)
the KB.
Inference rules
∙ Terms in Inference :
• Implication: one of the logical connectives, written P → Q. It is a Boolean expression.
• Converse: the implication with its two sides swapped; the converse of P → Q is Q → P.
• Contrapositive: formed by swapping and negating both sides; the contrapositive of P → Q is ¬Q → ¬P, and it is logically equivalent to P → Q.
• Inverse: formed by negating both sides; the inverse of P → Q is ¬P → ¬Q. (A truth-table check follows.)
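A quick Python truth-table check of these four forms; it confirms that only the contrapositive is logically equivalent to the original implication:

from itertools import product

def implies(a, b):
    """Material implication: a -> b is false only when a is true and b is false."""
    return (not a) or b

for p, q in product([True, False], repeat=2):
    print(p, q,
          "P->Q:", implies(p, q),
          "Q->P:", implies(q, p),            # converse
          "~Q->~P:", implies(not q, not p),  # contrapositive (always matches P->Q)
          "~P->~Q:", implies(not p, not q))  # inverse (always matches the converse)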
∙ If you resolve two Horn clauses, you get back a Horn clause.
∙ Inference with Horn clauses can be done through the forward-chaining and
backward-chaining algorithms
Horn sentences
∙ A Horn sentence or Horn clause is a clause containing at most 1 positive literal
∙ A definite horn clause contains exactly 1 positive literal
∙ Clauses are used in 2 ways
● As a disjunction (∨, i.e. or): e.g. (Rain ∨ Sleet (i.e. snow flakes))
Example: "As per the law, it is a crime for an American to sell weapons to hostile nations. Country A, an enemy of
America, has some missiles, and all the missiles were sold to it by Robert, who is an American citizen."
Solution:
Convert all the above facts into first-order definite clauses, and then use a forward-chaining algorithm to reach the goal
○ It is a crime for an American to sell weapons to hostile nations. (Let's say p, q, and r are variables)
American(p) ∧ Weapon(q) ∧ Sells(p, q, r) ∧ Hostile(r) → Criminal(p) ...(1)
○ Country A has some missiles: ∃p Owns(A, p) ∧ Missile(p).
○ It can be written in two definite clauses by using Existential Instantiation, introducing new Constant T1.
Owns(A, T1) ......(2)
Missile(T1) .......(3)
○ All of the missiles were sold to country A by Robert: ∀p Missile(p) ∧ Owns(A, p) → Sells(Robert, p, A) ......(4)
○ Missiles are weapons: Missile(p) → Weapon(p) .......(5)
○ Enemy of America is known as hostile. : Enemy(p, America) →Hostile(p) ........(6)
○ Country A is an enemy of America: Enemy (A, America) .........(7)
○ Robert is American :American(Robert). ..........(8)
https://www.javatpoint.com/forward-chaining-and-backward-chaining-in-ai
Example:"As per the law, it is a crime for an American to sell weapons to hostile nations. Country A, an enemy of
America, has some missiles, and all the missiles were sold to it by Robert, who is an American citizen."
Step-1: In the first step we start with the known facts and choose the sentences which do not have implications, such
as: American(Robert), Enemy(A, America), Owns(A, T1), and Missile(T1).
Step-2: In the second step, we look at the rules whose premises are satisfied by the available facts. The premises of Rule (1) are
not yet satisfied, so it is not fired in the first iteration.
Rule (4) is satisfied with the substitution {p/T1}, so Sells(Robert, T1, A) is added; it is inferred from the conjunction of facts (2) and
(3).
Rule (6) is satisfied with the substitution {p/A}, so Hostile(A) is added; it is inferred from fact (7).
Example:"As per the law, it is a crime for an American to sell weapons to hostile nations. Country A, an enemy of
America, has some missiles, and all the missiles were sold to it by Robert, who is an American citizen."
Step-3: In step 3, Rule (1) is satisfied with the substitution {p/Robert, q/T1, r/A}, so we can add
Criminal(Robert), which follows from all the available facts, and hence we have reached the goal statement. (A toy implementation sketch follows.)
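A toy Python sketch of forward chaining over the definite clauses (1)-(8) above; predicate names are normalized slightly, and brute-force substitution over the known constants stands in for proper unification:

from itertools import product

facts = {("American", "Robert"), ("Missile", "T1"),
         ("Owns", "A", "T1"), ("Enemy", "A", "America")}

rules = [  # (premises, conclusion); lowercase terms are variables
    ([("American", "p"), ("Weapon", "q"), ("Sells", "p", "q", "r"), ("Hostile", "r")],
     ("Criminal", "p")),
    ([("Missile", "p"), ("Owns", "A", "p")], ("Sells", "Robert", "p", "A")),
    ([("Missile", "p")], ("Weapon", "p")),
    ([("Enemy", "p", "America")], ("Hostile", "p")),
]

constants = {"Robert", "T1", "A", "America"}

def substitute(atom, theta):
    return tuple(theta.get(t, t) for t in atom)

changed = True
while changed:                      # keep firing rules until no new fact can be added
    changed = False
    for premises, conclusion in rules:
        variables = sorted({t for atom in premises for t in atom[1:] if t.islower()})
        for values in product(constants, repeat=len(variables)):
            theta = dict(zip(variables, values))
            if all(substitute(p, theta) in facts for p in premises):
                new = substitute(conclusion, theta)
                if new not in facts:
                    facts.add(new)
                    changed = True

print(("Criminal", "Robert") in facts)   # True -- the goal Criminal(Robert) is entailed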
Comparison of Forward and Backward Chaining
● Forward chaining is also known as a data-driven inference technique; backward chaining is also called a goal-driven inference technique.
● Forward chaining matches the set of conditions and infers results from these conditions: it starts from new data and aims for any conclusion. Backward chaining is a backward search from the goal to the conditions used to get the goal: it starts from a possible conclusion (goal) and aims for the necessary data.
● Forward chaining continues until no more rules can be applied or some cycle limit is met. Backward chaining processes operations in a backward direction from end to start and stops when a matching initial condition is met.
● Example: "If it is cold then I will wear a sweater." In forward chaining, "it is cold" is the data and "I will wear a sweater" is the decision: it was already known that it is cold, so it was decided to wear a sweater. In backward chaining, we start from the possible conclusion "I will wear a sweater": if I am wearing a sweater, it can be inferred that it is cold, so the conclusion is derived in the backward direction.
● Forward chaining is mostly used in commercial applications, e.g. event-driven systems. Backward chaining is used in interrogative commercial applications, e.g. finding items that fulfil possible goals.
● Forward chaining can create an infinite number of possible conclusions; in backward chaining the number of possible final answers is reasonable.
First Order Logic
∙ Combining the best of formal and natural languages
∙ Propositional logic deals only with facts, which may be true or false; every expression is a
sentence that represents a fact.
∙ In the syntax of natural language, the elements are nouns and noun phrases that refer to objects (squares,
pits, wumpuses) and verbs and verb phrases that refer to relations among objects (is breezy, is adjacent
to, shoots). Some of these relations are functions: relations in which there is only one "value" for a given "input".
∙ The first order logic assumes that the world contains objects, relations and functions.
∙ First-order logic (like natural language) assumes the world contains
∙ Objects: people, houses, numbers, theories, Ronald McDonald, colors, baseball games, wars, centuries
∙ Relations: red, round, bogus, prime, multistoried . . ., brother of, bigger than, inside, part of, has color,
occurred after, owns, comes between, .
∙ Functions: father of, best friend, third inning of, one more than, end of . .
∙ Constant symbols, variables and function symbols are used to build terms, while quantifiers and
predicate symbols are used to build the sentences.
First Order Logic
∙ The primary difference between propositional and first-order logic lies in the ontological
commitment made by each language, that is, what it assumes about the
nature of reality (what kinds of things exist in the world).
∙ This commitment is expressed through the nature of the formal models with respect to which the truth of sentences is
defined.
∙ For example, propositional logic assumes that there are facts that either hold or do not hold in the
world. Each fact can be in one of two states, true or false, and each model assigns true or false to each
proposition symbol.
∙ First-order logic assumes that the world consists of objects with certain relations among them that do
or do not hold.
∙ The formal models are correspondingly more complicated than those for propositional logic.
First Order Logic
∙ Special-purpose logic: temporal logic
∙ Special-purpose logics such as temporal logic make still further ontological commitments within logic.
∙ For example, temporal logic assumes that facts hold at particular times and that those times (which may
be points or intervals) are ordered.
∙ Higher-order logic: it views the relations and functions referred to by first-order logic as objects in
themselves and allows one to make assertions about all relations.
∙ For example, one might wish to define what it means for a relation to be transitive (more a matter of logic and
mathematics). Higher-order logic is strictly more expressive than first-order logic: some sentences
of higher-order logic cannot be expressed by any finite number of first-order logic sentences.
∙ Epistemological commitments:
∙ In both propositional and first order logic, a sentence represents a fact and the agent either believes the
sentence to be true, believes it to be false, or has no opinion.
∙ These logics therefore have three possible states of knowledge regarding any sentence.
∙ Systems using probability theory,can have any degree of belief, ranging from 0 (total disbelief) to 1
(total belief)
• The syntax of first-order logic with equality can be specified in Backus–Naur form; operator
precedences are specified from highest to lowest.
• The precedence of quantifiers is such that a quantifier holds over everything to the right of it.
∙ The universal quantifier is a symbol of symbolic logic which expresses that the statements within
its scope are true for everything, or every instance of a specific thing.
∙ The symbol ∀, which appears as a vertically inverted “A”, is used as the universal quantifier.
∙ If we could make a list of everything in the domain (a1, a2, a3, …), we would have: ∀x P(x)
≡ P(a1) ∧ P(a2) ∧ P(a3) ∧ ⋯ (a finite-domain sketch in code follows below)
∙ Existential quantifier: the symbol ∃ is called the existential quantifier and represents the phrase "there exists".
∙ The existential quantification of P(x) is the statement "P(x) for some values x in the universe", or
equivalently, “There exists a value for x such that P(x) is true”, which is written ∃xP(x).
∙ If P(x) is true for at least one element in the domain, then ∃xP(x) is true. Otherwise it is false
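A small Python sketch of these semantics over a finite domain (the predicates used here are made up): ∀ behaves like a big conjunction (all) and ∃ like a big disjunction (any), exactly as in the expansion shown above.

domain = [0, 1, 2, 3, 4]
P = lambda x: x >= 0                 # hypothetical predicate P(x)

forall_P = all(P(x) for x in domain)   # ∀x P(x): P must hold for every element
exists_P = any(P(x) for x in domain)   # ∃x P(x): P must hold for at least one element
print(forall_P, exists_P)              # True True

# Duality: ¬∃x Q(x) is the same as ∀x ¬Q(x)
Q = lambda x: x > 10
print((not any(Q(x) for x in domain)) == all(not Q(x) for x in domain))  # True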
First Order Logic
∙ Universal Quantifiers :
∙ Example: Elephants are big
● All things that are Elephants are big.
● For all things x, for which x is an Elephant, x is big.
● For all things x, if x is an Elephant, then x is big.
● Finally, in FOL this is written as: ∀x Elephant(x) ⇒ Big(x)
∙ Everyone studying in Cummins is smart:
● ∀x (StudiesAt(x, Cummins) ⇒ Smart(x))
∙ All dogs are mammals : ∀x∈A,D(x)⟹M(x) .... A animals D dogs M mammals
∙ All Apples are Delicious: ∀x∈F, A(x)⟹D(x) .... F fruits, A apples, D delicious
∙ All men are mortal. : For all x, if x is a man then x is mortal: ∀x[P(x)→Q(x)].
∙ P = Man Q=Mortal
∙ https://www.zweigmedia.com/RealWorld/logic/logic7.html
First Order Logic
∙ The "all'' form. The universal quantifier is frequently encountered in the following
context: ∀x(P(x)⇒Q(x)),
∙ Examples:
∙ If we say, "if x is negative, so is its cube," we usually mean "every negative x has a
negative cube." This should be written symbolically as ∀x((x < 0) ⇒ (x³ < 0)).
∙ "If two numbers have the same square, then they have the same absolute value" should
be written as ∀x∀y((x² = y²) ⇒ (|x| = |y|)).
∙ "If x = y, then x + z = y + z" should be written as ∀x∀y∀z((x = y) ⇒ (x + z = y + z)).
∙ https://www.zweigmedia.com/RealWorld/logic/logic7.html
First Order Logic
∙ Existential quantifiers:
● Some professors are Republicans: ∃x (x is a professor ∧ x is a Republican)
● Some prime number is even: ∃x (x is a prime number ∧ x is even)
∙ Nested Quantifiers:
● Brothers are siblings: ∀x ∀y Brothers(x, y) ⇒ Siblings(x, y)
● (also written ∀x,y Brothers(x, y) ⇒ Siblings(x, y))
● Everyone loves somebody: ∀x ∃y Loves(x, y), i.e. ∀x (∃y Loves(x, y))
● This differs from ∃x ∀y Loves(x, y) ("someone loves everybody"): the order of the quantifiers matters
∙ Connection between universal and Existential quantifiers
∙ ∀ and ∃ can be connected using negation
∙ Example: "Everybody dislikes Judy" is the same as "There does not exist someone who likes
Judy":
∀x ¬Likes(x, Judy) is equivalent to ¬∃x Likes(x, Judy)
● Everyone likes roses: ∀x Likes(x, Roses) is equivalent to ¬∃x ¬Likes(x, Roses)
∙ Universal and existential quantifiers follow De Morgan's rules
Represent the following sentences by first order logic calculus.
i) Some dogs bark
ii) All dogs have four legs
iii) All barking dogs are irritating
iv) No dogs purr
i) Some dogs bark
∃x (dog(x) ∧ bark(x))
ii) All dogs have four legs
∀x (dog(x) → have_four_legs(x))
∀x (dog(x) → legs(x, 4))
iii) All barking dogs are irritating
∀x ((dog(x) ∧ barking(x)) → irritating(x))
iv) No dogs purr
¬∃x (dog(x) ∧ purr(x))
Examples of First Order Logic
1. All birds fly.
In this question the predicate is "fly(bird)."
And since there are all birds who fly so it will be represented as follows.
∀x bird(x) →fly(x).
2. Every man respects his parent.
In this question, the predicate is "respect(x, y)," where x=man, and y= parent.
Since this applies to every man, we use ∀, and it is represented as follows:
∀x man(x) → respects(x, parent).
3. Some boys play cricket.
In this question, the predicate is "play(x, y)," where x= boys, and y= game. Since there are some boys so
we will use ∃, and it will be represented as:
∃x boys(x) → play(x, cricket).
4. Not all students like both Mathematics and Science.
In this question, the predicate is "like(x, y)," where x= student, and y= subject.
Since not all students are included, we use ∀ with negation, giving the following representation:
¬∀x [student(x) → (like(x, Mathematics) ∧ like(x, Science))].
5. Only one student failed in Mathematics.
In this question, the predicate is "failed(x, y)," where x= student, and y= subject.
Since there is only one student who failed in Mathematics, we use the following representation:
∃x [student(x) ∧ failed(x, Mathematics) ∧ ∀y [¬(x == y) ∧ student(y) → ¬failed(y,
Mathematics)]].
● Write the KB in propositional logic:
○ Ram is IT professional
○ IT professionals are hardworking
○ All hardworking people are wealthy
○ All wealthy people pay heavy income tax
● All kids are naughty
● All engineers are intelligent
● Some intelligent students study intelligent systems
● Every student who opts for intelligent systems is a determined student
● There is at least one student of AI who dislikes ML assignments
● The Existential Quantifier
● A sentence ∃xP(x) is true if and only if there is at least one value of x (from the universe of discourse) that makes
P(x) true.
● Ex. ∃x (x ≥ x²) is true since x = 0 is a solution. There are many others.
● ∃x∃y (x² + y² = 2xy) is true since x = y = 1 is one of many solutions.
● The "some'' form. The existential quantifier is frequently encountered in the following context:
∃x (P(x)∧Q(x)), which may be read, "Some x satisfying P(x) also satisfies Q(x).''
● It may at first seem that "Some x satisfying P(x) satisfies Q(x)" should be translated as ∃x (P(x) ⇒ Q(x)),
like the universal quantifier. To see why this does not work, suppose P(x) = "x is an apple"
and Q(x) = "x is an orange."
The sentence "some apples are oranges" is certainly false, but ∃x (P(x) ⇒ Q(x)) is true.
To see this, suppose x₀ is some particular orange. Then P(x₀) ⇒ Q(x₀) evaluates to F ⇒ T,
which is T, and the existential quantifier is satisfied.
● We use abbreviations of the "some" form much like those for the "all" form.
Bayesian Probability and Belief network
• Uncertainty:
• If you want to reach a lecture at 5 pm, you need to start x minutes before the lecture.
• There is uncertainty about whether you will reach in x minutes,
• as there are many reasons for uncertainty, such as rain, traffic, parents not allowing the
vehicle, etc.
• So we need to deal with uncertainty for decision making
• Uncertainty is about incomplete information as well as uncertainty in reasoning
• So, to get maximum benefit, the knowledge needs to be represented in a proper manner so that
decisions can be taken; knowledge representation is a way to do this
• In day to day life we take decisions based on our ability and known facts
• In case of machines we embed knowledge
• The decision making capability of machine depends on its intelligence which is based on
sound reasoning and concise presentation of knowledge
Bayesian Probability and Belief network
• Dealing with uncertainty requires applying logic and taking appropriate actions
• Factors that result in uncertainty include omitted data, unavailability of the entire data,
partial availability of events, and missing data
• This uncertainty is handled by probability theory
• Different methods to handle uncertainty:
• Monotonic logic: unreasonable assumptions lead to the need for handling contradictions, e.g.,
raining, traffic on the road, or vehicle not available
• Certainty factor leads to belief or disbelief; belief can be handled using probability
• Fuzzy logic and probability theory represent uncertainty
• Probability theory assigns a probability value when we have uncertain information
• Conditional probabilities represent uncertainties in complex relations
• Bayesian probability is used to handle uncertainty where, instead of rules, relationships
between variables are represented
Bayesian Probability and Belief network
• Bayesian probability:
• Probability can be interpreted as
• Objective: Represents physical probability of system
• Subjective: It estimates probability from previous experience
• It calculates a degree of belief; this belief can change under new evidence
• This subjective probability is called Bayesian probability
• Bayesian probability is the process of using probability to predict the likelihood of a certain
event in the future
• The probabilities in Bayesian reasoning are conditional, as the beliefs are conditional
• Ex. Selection of a person in a cricket team depends on his previous performance, number of
matches played, etc.
• Using Bayesian probability we can model these relationships and make decisions
Bayesian Probability and Belief network
• Proposition and hypothesis:
• Propositional logic talks in terms of true or false
• A hypothesis is a proposed explanation for an observed phenomenon, i.e., an educated guess
• It is expressed in terms of if … then …
• Ex. If a person in prison learns skilled work, he is less likely to commit crime
• Antecedent and consequent: in "if A then B", A is the antecedent (hypothesis) and B is the consequent
• Probability is the likelihood of occurrence of an event
• The value of a probability ranges from 0 (a rare event) to 1 (a certain event)
Bayesian Probability and Belief network
• Types of probabilities:
1. Classical probability or a priori theory: = (number of favourable outcomes of the event) / (total number of possible
outcomes)
2. Joint probability: the probability of events occurring together, P(A,B)
3. Marginal probability: the unconditional probability of an event A irrespective of the occurrence of
event B, calculated as the sum of the joint probability of A over all occurrences of event B
4. Prior probability: an unconditional probability that corresponds to a belief before any data are observed
5. Conditional probability: the probability of event A given the occurrence of some other event B,
P(A|B) = P(A,B)/P(B) … the probability of A given B
1. For independent events, P(A|B) = P(A)
2. For dependent events, P(A and B) = P(B)·P(A|B)
3. so P(A|B) = P(A and B)/P(B) (a small worked check follows this list)
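A small worked check of P(A|B) = P(A,B)/P(B) in Python, using hypothetical events on one die roll:

from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]
A = {x for x in outcomes if x % 2 == 0}        # A = "the roll is even" -> {2, 4, 6}
B = {x for x in outcomes if x > 3}             # B = "the roll is greater than 3" -> {4, 5, 6}

P_B       = Fraction(len(B), len(outcomes))        # 3/6
P_A_and_B = Fraction(len(A & B), len(outcomes))    # {4, 6} -> 2/6
print(P_A_and_B / P_B)                             # P(A|B) = 2/3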
Bayesian Probability and Belief network
• Types of probabilities:
6. Posterior probability: the conditional probability after the relevant evidence is taken into account
1. Product rule: P(A,B) = P(A|B)·P(B)
2. Similarly, P(B|A) = P(A,B)/P(A)
3. Combining the two: P(A|B) = P(B|A)·P(A)/P(B) (see the sketch below)
• This is called Bayes Theorem.
• It is used to infer probability of hypothesis under observed data or evidence
• It is also called Bayesian Inference
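A minimal sketch of Bayes' theorem as a Python function; the numbers plugged in here are the ones reused in the clinic example later in this unit:

def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

p_a, p_b, p_b_given_a = 0.1, 0.05, 0.07   # P(A), P(B), P(B|A) from the liver-disease example below
print(bayes(p_b_given_a, p_a, p_b))       # ≈ 0.14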
Bayesian Probability and Belief network
• Types of probabilities:
1. Classical probability or Priori theory:
• Classical probability is the statistical concept that measures the likelihood of outcomes
that are all equally likely to happen
• The typical example of classical probability would be a fair dice roll because it is
equally probable that you will land on any of the 6 numbers on the die: 1, 2, 3, 4, 5, or 6.
• Another example of classical probability would be a coin toss. There is an equal
probability that your toss will yield a heads or tails result.
• Classical Probability = (no. of favourable outcomes of event)/total possible
outcomes
• Determine the probability of getting an even number when rolling a dice, 3 would be the
number of favourable outcomes because there are 3 even numbers on a die (and
obviously 3 odd numbers).
• The number of possible outcomes would be 6 because there are 6 numbers on a die.
Therefore, the probability of getting an even number when rolling a die is 3/6, or 1/2
when you simplify it.
Bayesian Probability and Belief network
• Types of probabilities:
1. Classical probability or Priori theory: Examples
• probability of odd number if dice is rolled: 1,2,3,4,5,6 there are 3 odd numbers so
probability is 3/6 =0.5
• probability of prime numbers if dice is rolled: prime nos 2,3,5 again 3/6 =0.5
• probability of getting head or tail when coin is tossed : ½= 0.5
• Probability of finding a jack of spades in deck of cards : 1/52
• Probability of finding queens in deck of cards : 4/52
• Probability of getting red cards in deck of card : 26/52
Bayesian Probability and Belief network
• Types of probabilities:
2. Joint Probability: Probability of events occurring together, P(A,B) i.e the
likelihood of more than one event occurring at the same time.
● The events are independent.
• Probability that the number five will occur twice when two dice are rolled at
the same time.
• Since each die has six possible outcomes, the probability of a five occurring on
each die is 1/6 or 0.1666.
• P(A) = 0.1666, P(B) = 0.1666, P(A,B) = 0.1666 × 0.1666 = 0.02777
• This means the joint probability that a five will be rolled on both dice at the same
time is 0.02777.
Bayesian Probability and Belief network
• Types of probabilities:
2. Joint Probability: examples
• Joint probability of two coins tossed at the same time both landing heads:
■ for one coin ½ and for the second coin ½, so the joint probability is ½ × ½ = ¼
• Joint probability of drawing a single card that is a ten and black:
■ the probability of a ten is 4/52 and of a black card is 26/52, so the joint probability is 4/52 × 26/52 = 1/26
Bayesian Probability and Belief network
• Types of probabilities:
3. Marginal Probability: Its unconditional probability of an event A irrespective of occurrence
of event B, calculated as sum of joint probability of A over all occurrences of event B
• Example: the probability that a card drawn is red: p(red) = 26/52 = 0.5.
• The probability that a card drawn is a 4: p(four) = 4/52 = 1/13.
4. Prior Probability: unconditional probability corresponds to belief without any observed
data
• For example, three acres of land have the labels A, B, and C.
• One acre has reserves of oil below its surface, while the other two do not.
• The prior probability of oil being found on acre C is one third, or 0.333.
• But if a drilling test is conducted on acre B, and the results indicate that no oil is present at
that location, then the posterior probability of oil being found on acres A and C becomes
0.5, as each acre now has one out of two chances.
Bayesian Probability and Belief network
● P(A): Prior probability: the probability of the hypothesis before the evidence is observed
● Example conditional probability tables for the cricket-selection example (each column of the original table is conditioned on the Coach variable):
P(Shyam=t | Coach=t) = 0.6, P(Shyam=t | Coach=f) = 0.5
P(Shyam=f | Coach=t) = 0.4, P(Shyam=f | Coach=f) = 0.5
P(Ram=t | Coach=t) = 0.8, P(Ram=t | Coach=f) = 0.1
P(Ram=f | Coach=t) = 0.2, P(Ram=f | Coach=f) = 0.9
Bayesian Probability and Belief network
• Applications Bayesian Networks:
• Machine learning
• Statistics
• Computer vision
• Natural language Processing
• Speech recognition
• Error-control codes
• Bioinformatics
• Medical diagnosis
• Weather forecasting
• PATHFINDER medical diagnosis system at Stanford
• Microsoft Office assistant and troubleshooters
• Space shuttle monitoring system at NASA Mission Control Center in Houston
http://sse.tongji.edu.cn/liangshuang/pr2012fall/slides/08-bn.pdf
Example
• A new Burglar alarm is installed in the house
• This alarm can detect Burglary/theft and earthquakes too
• There are 2 neighbours, Ali and Veli, who will call the person in the house when they hear
the alarm
• Ali always calls when he hears the alarm, but at times gets confused because his phone
ringtone is the same as the alarm sound
• Veli likes loud music, so sometimes he misses the alarm and does not call
• Given the evidence of who has or has not called, we want to
• 1. find the probability of a burglary
• 2. find the probability that both neighbours call and the alarm rings when there is no burglary and no earthquake
• Solution :
• Different Probabilities associated
• 1. Probability of burglary has happened /not
• 2. Probability of Earthquake has happened /not
• 3. Probability whether the alarm rings or not in all possible cases of Burglary and
Earthquake
• 4. Probability that Ali hears the alarm and calls
• 5. Probability that Veli hears the alarm and calls
Example
• Solution :
• Different probabilities associated:
• 1. Probability that a burglary (B) has happened or not: P(B=T) = 0.001, P(B=F) = 0.999
• 3. Probability that the alarm (A) rings or not in all possible cases of Burglary and
Earthquake, i.e., Alarm (A) ringing depends on its parents B and E:
B=T, E=T: P(A=T) = 0.95, P(A=F) = 0.05
B=T, E=F: P(A=T) = 0.94, P(A=F) = 0.06
B=F, E=T: P(A=T) = 0.29, P(A=F) = 0.71
B=F, E=F: P(A=T) = 0.001, P(A=F) = 0.999
Example
• Solution :
• 4. Probability that Ali hears the alarm and calls (AC); Alarm is the parent, so we condition on
whether the alarm rings:
A=T: P(AC=T) = 0.90, P(AC=F) = 0.10
A=F: P(AC=T) = 0.05, P(AC=F) = 0.95
• 5. Probability that Veli hears the alarm and calls (VC):
A=T: P(VC=T) = 0.70, P(VC=F) = 0.30
A=F: P(VC=T) = 0.01, P(VC=F) = 0.99
• Find the probability that Ali and Veli call, the alarm rings, and there is no burglary and no earthquake:
• P(AC, VC, A, ¬B, ¬E) (a computation sketch follows)
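A sketch of this query in Python using the chain rule of the network, P(AC, VC, A, ¬B, ¬E) = P(AC|A)·P(VC|A)·P(A|¬B,¬E)·P(¬B)·P(¬E); note that the earthquake prior is not listed on these slides, so P(E=T) = 0.002 is assumed here.

P_B = 0.001                                   # burglary prior from the slide
P_E = 0.002                                   # assumed earthquake prior (not given above)
P_A_given = {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001}   # P(A=T | B, E)
P_AC_given_A = {True: 0.90, False: 0.05}      # P(AC=T | A)
P_VC_given_A = {True: 0.70, False: 0.01}      # P(VC=T | A)

p = (P_AC_given_A[True] * P_VC_given_A[True] *
     P_A_given[(False, False)] * (1 - P_B) * (1 - P_E))
print(p)    # ≈ 0.00063: a false alarm with both neighbours calling is very unlikely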
Bayesian Probability and Belief network
• Four kinds of inference in belief networks:
• Diagnostic inference: from effects to causes: Example: Given that Ali Calls, infer
P(Burglary|Ali Calls)
• Causal inference: from causes to effects Example: Given Burglary, infer
P(AliCalls|Burglary) and P(Veli Calls|Burglary)
• Intercausal inference: between causes of a common effect. Given Alarm, we have
P(Burglary|Alarm) = 0.376. But with the additional evidence that Earthquake is true,
P(Burglary|Alarm ∧ Earthquake) goes down to 0.003. Even though burglaries and
earthquakes are a priori independent, once the alarm is observed the presence of one makes the other less likely (explaining away).
• Mixed inference: combining two or more of the above. Example: setting the effect AliCalls
to true and the cause Earthquake to false gives
P(Alarm|AliCalls ∧ ¬Earthquake) = 0.03, a combination of diagnostic and causal inference
Bayesian Probability and Belief network
● Naive Bayes Algorithm:
● It works on Bayes theorem of probability to predict the class of unknown data sets.
● It is a classification technique based on Bayes’ Theorem with an assumption of independence among predictors.
● A Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence
of any other feature.
● Bayes theorem provides a way of calculating posterior probability P(c|x) from P(c), P(x) and P(x|c).
● P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes).
● P(c) is the prior probability of class.
● P(x|c) is the likelihood which is the probability of predictor given class.
● P(x) is the prior probability of the predictor.
https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/
Bayesian Probability and Belief network
● Step by Step Naive Bayes algorithm:
● Step 1: Convert the data set into a frequency table
● Step 2: Create Likelihood table by finding the probabilities like Overcast probability = 0.29 and probability of playing is 0.64.
● Step 3: Now, use Naive Bayesian equation to calculate the posterior probability for each class. The class with the highest posterior probability is
the outcome of prediction.
● Assumption: in Naive Bayes it is assumed that all the features are independent of each other.
● Continuous variables also need to be converted into discrete variables.
Applications of Naive Bayes Algorithms
● Real-time prediction: Naive Bayes is an eager learning classifier and it is fast, so it can be used for making
predictions in real time.
● Multi class Prediction: This algorithm is also well known for multi class prediction feature. Here we can predict the
probability of multiple classes of target variable.
● Text classification / Spam filtering / Sentiment analysis: Naive Bayes classifiers, mostly used in text classification (due to
better results in multi-class problems and the independence assumption), have a higher success rate compared to other algorithms. As a
result, they are widely used in spam filtering (identifying spam e-mail) and sentiment analysis (in social media analysis, to identify
positive and negative customer sentiment)
● Recommendation System: Naive Bayes Classifier and Collaborative Filtering together builds a Recommendation System
that uses machine learning and data mining techniques to filter unseen information and predict whether a user would like a
given resource or not
There are three types of Naive Bayes model under the scikit-learn library (a minimal usage sketch follows this list):
● Gaussian: it is used in classification and assumes that features follow a normal distribution.
● Multinomial: it is used for discrete counts, e.g., in a text classification problem where, instead of the Bernoulli feature
"word occurs in the document", the feature is "how often the word occurs in the document"; you can think of it as
"the number of times outcome x_i is observed over the n trials".
● Bernoulli: the binomial model is useful if your feature vectors are binary (i.e. zeros and ones). One application is text
classification with a 'bag of words' model where the 1s and 0s are "word occurs in the document" and "word does not occur in
the document" respectively
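A minimal usage sketch of the Gaussian variant with scikit-learn; the tiny two-feature dataset is made up:

from sklearn.naive_bayes import GaussianNB
import numpy as np

X = np.array([[1.0, 2.1], [1.2, 1.9], [3.0, 3.2], [2.9, 3.1]])  # two numeric features
y = np.array([0, 0, 1, 1])                                       # class labels

model = GaussianNB().fit(X, y)            # estimate per-class feature means/variances
print(model.predict([[1.1, 2.0]]))        # -> [0]
print(model.predict_proba([[1.1, 2.0]]))  # posterior probability for each class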
Given are the details about outlook, temperature, humidity, and wind condition. Decide whether the game can be
played when the outlook is rainy, the temperature is cool, humidity is high, and windy is true. (A worked sketch
follows the tables.)

Sr. No. | Outlook  | Temp. | Humidity | Windy | Play (Yes/No)
1       | Rainy    | Hot   | High     | False | No
2       | Rainy    | Hot   | High     | True  | No
3       | Overcast | Hot   | High     | False | Yes
4       | Sunny    | Mild  | High     | False | Yes
5       | Sunny    | Cool  | Normal   | False | Yes
6       | Sunny    | Cool  | Normal   | True  | No
(only the first 6 of the 14 rows are shown)

Step 1:
P(Yes) = 9/14
P(No) = 5/14

Step 2 (likelihood table for Outlook):
Outlook | Yes | No | Total | P(Outlook|Yes) | P(Outlook|No)
Rainy   | 2   | 3  | 5     | 2/9            | 3/5
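A worked sketch of Steps 1-3 for this query in Python. Only the Rainy likelihoods (2/9 and 3/5) appear in the table above; the remaining likelihoods are assumed from the standard 14-row play-tennis dataset this example is based on and should be replaced with counts from the full frequency table.

from fractions import Fraction as F

prior = {"Yes": F(9, 14), "No": F(5, 14)}
likelihood = {  # P(feature value | class); only the Rainy row is taken from the slide
    "Yes": {"Rainy": F(2, 9), "Cool": F(3, 9), "High": F(3, 9), "Windy": F(3, 9)},
    "No":  {"Rainy": F(3, 5), "Cool": F(1, 5), "High": F(4, 5), "Windy": F(3, 5)},
}

# Naive Bayes score: prior times the product of the feature likelihoods
scores = {c: prior[c] * likelihood[c]["Rainy"] * likelihood[c]["Cool"]
             * likelihood[c]["High"] * likelihood[c]["Windy"] for c in prior}
total = sum(scores.values())
for c in scores:
    print(c, float(scores[c] / total))            # normalised posterior for each class
print("Prediction:", max(scores, key=scores.get)) # -> "No" under these numbers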
● Example:
● In a deck of 52 cards, find the probability that a card is a King given that it is a
face card (Jack, Queen, King)
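● One way to work it out with Bayes' rule: P(King | Face) = P(Face | King) · P(King) / P(Face) = (1 × 4/52) / (12/52) = 4/12 = 1/3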
• Find the probability of a patient having liver disease if they are an alcoholic.
• 10% of patients entering the clinic have liver disease: P(A) = 0.1
• 5% of patients are alcoholic: P(B) = 0.05
• 7% of patients with liver disease are alcoholic: P(B|A) = 0.07, i.e., the probability that a
patient is alcoholic, given that they have liver disease, is 7%
• Required: the probability that a patient has liver disease given that they are an alcoholic, P(A|B)
• Bayes Theorem = P(A|B) = P(B|A) * P(A) /P(B)
• P(A|B) = (0.07 * 0.1)/0.05 = 0.14
• In other words, if the patient is an alcoholic, their chances of having liver disease is 0.14
(14%). This is a large increase from the 10% suggested by past data. But it’s still unlikely
that any particular patient has liver disease.
Bayesian Probability and Belief network
● Example:
• Find the probability of a patient having liver disease if they are an alcoholic.
• 10% of patients entering the clinic have liver disease: P(A) = 0.1 (variants: 20%, 25%, 22%)
• 5% of patients are alcoholic: P(B) = 0.05 (variants: 10%, 15%, 12%)
• 7% of patients with liver disease are alcoholic: P(B|A) = 0.07, i.e., the probability that a
patient is alcoholic, given that they have liver disease, is 7% (variants: 8%, 10%, 10%)
• Required: the probability that a patient has liver disease given that they are an alcoholic, P(A|B)
• Bayes theorem: P(A|B) = P(B|A) × P(A) / P(B)
• P(A|B) = (0.07 × 0.1)/0.05 = 0.14; with the variant numbers: (0.08 × 0.2)/0.1 = 0.16 (16%),
(0.10 × 0.25)/0.15 ≈ 0.167 (16.7%), (0.10 × 0.22)/0.12 ≈ 0.183 (18.3%)
In other words, if the patient is an alcoholic, their chance of having liver disease is 0.14
(14%). This is a large increase from the 10% suggested by past data, but it is still unlikely
that any particular patient has liver disease.
• Suppose that an individual is extracted at random from a population of men.
• The probability of extracting a married individual is 50%.
• The probability of extracting a childless individual is 40%.
• The conditional probability that an individual is childless given that he is married is equal to
20%.
• If the individual we extract at random from the population turns out to be childless, what is the
conditional probability that he is married?
• This conditional probability is called posterior probability and it can be computed by using
Bayes' rule above.
The quantities involved in the computation are:
P(married) = 50/100 = 1/2
P(childless) = 40/100 = 4/10 = 2/5
P(childless | married) = 20/100 = 1/5
The posterior probability is:
P(married | childless) = P(childless | married) · P(married) / P(childless) = (1/5 × 1/2) / (2/5) = 1/4