
Unit III

Knowledge Representation

By
Dr. Mrudul Dixit
Logic, Propositional logic, First order logic,

● Search techniques and intelligent agents acquire information from the environment
and build knowledge with reference to the problem
● This knowledge is used for further actions and decisions
● So an appropriate and precise representation of knowledge is important
● What is knowledge?
− It is a set of patterns and associations derived from data that helps in making
decisions and resolving problems
● Ex. A teacher wants to judge the performance of a student
− The basis will be the percentile the student would obtain
− Or the previous performance, or information from other teachers
− This is the available knowledge used to come to a decision
Logic, Propositional logic, First order logic,

● Systematic reasoning is required when we relate events to outcomes or arrive at judgements
● So reasoning is the way we draw conclusions about different aspects of a problem based on the available
knowledge representation
● Logic plays an important role in reasoning
● Knowledge Representation is about representing the facts.
● Knowledge representation and reasoning is the field of artificial intelligence (AI) dedicated to
representing information about the world in a form that a computer system can utilize to
solve complex tasks such as diagnosing a medical condition or having a dialog in a natural
language.
● Knowledge representation and reasoning incorporates findings from logic to automate various
kinds of reasoning, such as the application of rules or the relations of sets and subsets
Logic, Propositional logic, First order logic,

● Knowledge representation (KR) is the study of


− how knowledge and facts about the world can be represented, and
− what kinds of reasoning can be done with that knowledge.
● Goals of KR:
● We want a representation that is:
● Rich enough to express the knowledge needed to solve the problem
● As close to the problem as possible: compact, natural and maintainable, amenable
to efficient computation
● Able to express features of the problem we can exploit for computational gain
● Able to trade off accuracy and computation time
Knowledge and Intelligence

Knowledge plays an important role in demonstrating intelligent behaviour

(Block diagram: Sensing → Decision Maker, supported by Knowledge → Action)

To represent Knowledge in a machine :


1. Language is required to represent the knowledge
2. Method to use this knowledge
3. Inference engine
4. Syntax and semantics
Laughs(Tom) =??, Likes(sunita, Rekha) =??
Logic, Propositional logic, First order logic,

● Logic is Formal Language


● Propositional Logic
Propositions
● Anil is hardworking
● Anil is intelligent
● If Anil is intelligent and Anil is hardworking Then Anil will score high grades
Knowledge Representation
• Wumpus World:
• The Wumpus world is a simple world example to illustrate the worth of a knowledge-based agent and to
demonstrate knowledge representation.
• It was inspired by the video game Hunt the Wumpus by Gregory Yob in 1973.

• The Wumpus world is a cave which has 4×4 rooms connected by passageways.
• There are a total of 16 rooms which are connected with each other.
• There is a knowledge-based agent who will move forward in this world.
• The cave has a room with a beast called the Wumpus, who eats anyone who enters the room.
• The Wumpus can be shot by the agent, but the agent has a single arrow.
• In the Wumpus world, there are some Pit rooms which are bottomless, and if the agent falls into a Pit, then he will
be stuck there forever.
• The exciting thing about this cave is that in one room there is a possibility of finding a heap of gold.
• So the agent's goal is to find the gold and climb out of the cave without falling into a Pit or being eaten by the Wumpus.
• The agent will get a reward if he comes out with the gold, and he will get a penalty if he is eaten by the Wumpus or falls into
a pit.
Knowledge Representation
• Wumpus World:

Knowledge Representation
• PEAS for the Wumpus world:
• P-Performance:
1) 1000 points when the gold is found
2) -100 when the agent falls in a pit
3) -1 for every move
4) -10 when the arrow is used
● E-Environment: A 4×4 grid with pits at some squares, gold at one square, and the agent starting at square [1,1]
● A-Actuators:
1) Turn 90 degrees left or right
2) Walk 1 square forward
3) Grab or take an object
4) Shoot the arrow
● S-Sensors: 5 sensors:
1) In the rooms adjacent to the Wumpus the agent perceives a stench (excluding diagonally)
2) In the squares adjacent to a pit the agent perceives a breeze (excluding diagonally)
3) In the room containing the gold the agent perceives a glitter
4) When the agent walks into a wall he perceives a bump
5) When the Wumpus is killed it screams, which can be perceived anywhere in the environment
● The agent draws conclusions based on facts
● Reasoning leads the agent to take correct actions, and it depends on the correctness of the facts
● So logical reasoning is essential to reach correct conclusions
Knowledge Representation
• Wumpus World:
• Its treatment is built from propositional logic, entailment and logical inference
• Propositional Logic: Boolean logic
• It is the way in which the truth of sentences is determined
• It’s a simple logic
• Entailment:
• It is the relation between a sentence and another sentence that follows from it and see how this
leads to a simple algorithm for logical inference.
• Syntax: The syntax of propositional logic defines the allowable sentences.
• The atomic sentences: the indivisible syntactic elements consist of a single proposition symbol
• Each such symbol stands for a proposition that can be true or false.
• The uppercase names for symbols: P, Q, R, and so on.
• The names are arbitrary but are often chosen to have some mnemonic value to the reader.
• Example: Say we use W1,3 to stand for the proposition that the wumpus is in [1,3].
• Symbols such as W1,3 are atomic, i.e., W, 1, and 3 are not meaningful parts of the symbol.
• There are two proposition symbols with fixed meanings: True is the always-true proposition
and False is the always-false proposition.
• Complex sentences are constructed from simpler sentences using logical connectives.
Logic, Propositional logic, First order logic,
● Many knowledge representation systems rely on some variant of logic Such as Propositional, First Order Logic and
Temporal Logic
● Logic defines:
− Syntax: describes how sentences are formed in the language
− Semantics: describes the meaning of sentences, what is it the sentence refers to in the real world
− Inference procedure: the logical notion of truth and the property of completeness, used to establish that a statement is true
● Propositional logic:
− Propositional Logic is Simplest type of logic
− A proposition is a statement that is either true or false
− Syntax and Semantic
● Syntax : related to formal language structure
− Atomic statements and complex statements
● Semantic is the meaning
● Examples: Atomic Statement
− Pitt is located in the Oakland section of Pittsburgh.
− It is raining today.
● Complex sentences:
− It is raining outside and the traffic in Oakland is heavy.
− It is raining outside AND (∧) traffic in Oakland is heavy
● First order logic: It is complex logic: objects, relations, properties are explicit
Knowledge Representation
• Propositional Logic is formed using:
1. Logical constants: true, false
2. Propositional symbols: P, Q,... (atomic sentences)

3. Wrapping parentheses: ( … )

4. Sentences are combined by connectives:


∧ and [conjunction]
∨ or [disjunction]
⇒ implies [implication / conditional]
⇔ is equivalent [biconditional]
¬ not [negation]
5. Literal: atomic sentence or negated atomic sentence P, ¬ P
Knowledge Representation
• There are five connectives in common use:
1. ¬ (not): A sentence such as ¬W1,3 is called the negation of W1,3.
• Literal: It is either an atomic sentence (a positive literal) or a negated atomic sentence
(a negative literal).
2. ∧ (and): A sentence whose main connective is ∧, such as W1,3 ∧ P1,3, is called a
conjunction; its parts are the conjuncts.
3. ∨ (or): A sentence using ∨, such as (W1,3 ∧ P1,3) ∨ W2,2, is a disjunction of the disjuncts
(W1,3 ∧ P1,3) and W2,2.
4. ⇒ (implies): A sentence such as (W1,3 ∧ P1,3) ⇒ ¬W2,2 is called an implication (or conditional);
its premise or antecedent is (W1,3 ∧ P1,3) and its conclusion or consequent is ¬W2,2.
5. ⇔ (if and only if): A sentence such as W1,3 ⇔ ¬W2,2 is a biconditional.
Examples of Propositional Logic sentences
∙ (P ∧ Q) → R
“If it is hot and humid, then it is raining”
∙ Q→P
“If it is humid, then it is hot”
∙Q
“It is humid.”
∙ We can have any symbols for statements
A Backus–Naur form or Backus normal form (BNF) grammar of
sentences in propositional logic

BNF is a type of context-free grammar notation used to describe the syntax of programming
languages
S := <Sentence> ;
<Sentence> := <AtomicSentence> | <ComplexSentence> ;
<AtomicSentence> := "TRUE" | "FALSE" |
"P" | "Q" | "S" ;
<ComplexSentence> := "(" <Sentence> ")" |
<Sentence> <Connective> <Sentence> |
"NOT" <Sentence> ;
<Connective> := "AND" | "OR" | "IMPLIES" | "EQUIVALENT" ;
(Analogy: an expression such as A = B + C*100 is likewise parsed by a grammar into a tree, which can be stored as a linked structure.)
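As an illustration (not part of the original slides), the BNF above can be turned into a small recursive-descent parser. The Python below is a minimal sketch that assumes binary connectives are fully parenthesised, as the next slide requires, and represents the parse tree as nested tuples.

```python
# A tiny recursive-descent parser for the BNF above. Binary connectives are
# assumed to be fully parenthesised, as the next slide requires.
CONNECTIVES = {"AND", "OR", "IMPLIES", "EQUIVALENT"}

def tokenize(text):
    # "(P AND Q)" -> ['(', 'P', 'AND', 'Q', ')']
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Return (parse_tree, remaining_tokens); trees are nested tuples."""
    head, rest = tokens[0], tokens[1:]
    if head == "(":
        left, rest = parse(rest)
        if rest[0] == ")":                       # "(" <Sentence> ")"
            return left, rest[1:]
        op, rest = rest[0], rest[1:]             # "(" <Sentence> <Connective> <Sentence> ")"
        assert op in CONNECTIVES, "expected a connective"
        right, rest = parse(rest)
        assert rest[0] == ")", "missing closing parenthesis"
        return (op, left, right), rest[1:]
    if head == "NOT":                            # "NOT" <Sentence>
        sub, rest = parse(rest)
        return ("NOT", sub), rest
    return head, rest                            # <AtomicSentence>: TRUE, FALSE, P, Q, S

tree, _ = parse(tokenize("(P AND (NOT Q))"))
print(tree)   # ('AND', 'P', ('NOT', 'Q'))
```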
Formal grammar of propositional logic

The grammar is very strict about parentheses: every sentence constructed
with binary connectives must be enclosed in parentheses. This ensures that the syntax is
completely unambiguous.
With an operator-precedence convention (¬ binds tightest, then ∧, ∨, ⇒, ⇔), some of these
parentheses can be dropped, so a sentence can be written more compactly instead of in its fully
parenthesised form.
Knowledge Representation
Propositional Logic: Its foundation is a declarative, compositional semantics that is context-independent and
unambiguous; a more expressive logic can be built on that foundation, borrowing representational ideas from natural
language while avoiding its drawbacks.
(Compare how a + b*c could be read as (a + b) * c or a + (b*c) without precedence rules.)
• In the syntax of natural language, the elements are nouns and noun phrases that refer to objects (squares, pits,
wumpuses) and verbs and verb phrases that refer to relations among objects (is breezy, is adjacent to, shoots).
• Some of these relations are functions—relations in which there is only one “value” for a given “input.”
• It is easy to start listing examples of objects, relations, and functions:
• Objects: people, houses, numbers, theories, Ronald McDonald, colors, baseball games, wars, centuries . . .
• Relations: these can be unary relations or properties such as red, round, bogus, prime, multistoried . . ., or more
general n-ary relations such as brother of, bigger than, inside, part of, has color, occurred after, owns, comes
between, . . .
• Functions: father of, best friend, third inning of, one more than, beginning of . . .
Knowledge Representation
Examples:
1. “One plus two equals three.”
• Objects: one, two, three, one plus two;
• Relation: equals;
• Function: plus. (“One plus two” is a name for the object that is obtained by applying the function “plus” to the
objects “one” and “two.” “Three” is another name for this object.)

2. “Squares neighboring the wumpus are smelly.”

• Objects: wumpus, squares;


• Property: smelly;
• Relation: neighboring.

3. “Evil King John ruled England in 1200.”

• Objects: John, England, 1200;


• Relation: ruled;
• Properties: evil, king.
Some terms
∙ The meaning or semantics of a sentence determines its interpretation.
∙ Given the truth values of all symbols in a sentence, it can be “evaluated” to determine its truth value (True or
False).
∙ A model for a KB is a “possible world” (assignment of truth values to propositional symbols) in which each
sentence in the KB is True.
∙ A valid sentence or tautology is a sentence that is True under all interpretations, no matter what the world is
actually like or what the semantics is. Example: “It’s raining or it’s not raining.”
∙ An inconsistent sentence or contradiction is a sentence that is False under all interpretations. The world is never
like what it describes, as in “It’s raining and it’s not raining.”
∙ P entails Q, written P |= Q, means that whenever P is True, so is Q. In other words, all models of P are also models
of Q.
Propositional Logic Examples
1.
A. It is hot
B. It is humid
C. It is raining
If B then A , B->A
If A and B then not C (A and B)->not C
2.
Rohan is intelligent and hardworking. It can be written as,
P= Rohan is intelligent,
Q= Rohan is hardworking. → P∧ Q.

3. "Ritika is a doctor or Engineer",


P= Ritika is Doctor.
Q= Ritika is Engineer, representation P ∨ Q.
Propositional Logic Examples
4. If it is raining, then the street is wet.
Let P= It is raining, and Q= Street is wet, so it is represented as P → Q
5. If I am breathing, then I am alive
P= I am breathing, Q= I am alive, it can be represented as P ⇔ Q.
6. If I have money then only I will buy dress : A= I have money B I will buy dress A ⇔ B

• Limitations of Propositional logic:


• We cannot represent relations like ALL, some, or none with propositional logic.
• Example:
All the girls are intelligent.
Some apples are sweet.
All triangles have 3 sides
None of the mangoes are ripe
• Propositional logic has limited expressive power.
• In propositional logic, we cannot describe statements in terms of their properties or logical relationships.
Truth tables for propositional Logic

Apply Propositional Logic for:


A = The sun is shining
B = The weather is raining
C = You are carrying an umbrella
D = You are getting wet
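A small illustrative Python sketch (not from the slides) that enumerates a truth table over the four symbols above; the example sentence (B ∧ ¬C) ⇒ D, "if it is raining and you are not carrying an umbrella, then you are getting wet", is an assumed choice for demonstration.

```python
from itertools import product

# Symbols from the slide:
# A = the sun is shining, B = the weather is raining,
# C = you are carrying an umbrella, D = you are getting wet.

def implies(p, q):
    # Material implication: P => Q is equivalent to (not P) or Q
    return (not p) or q

def sentence(a, b, c, d):
    # Example sentence (an assumption for illustration): (B AND NOT C) IMPLIES D
    # Note that A does not affect this particular sentence.
    return implies(b and not c, d)

print("  A      B      C      D    | (B ∧ ¬C) ⇒ D")
for a, b, c, d in product([True, False], repeat=4):
    print(f"{a!s:6} {b!s:6} {c!s:6} {d!s:6} | {sentence(a, b, c, d)}")
```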
Inference rules
∙ Inference is generating conclusions from evidence and facts

∙ An AI system should derive new sentences from existing ones or from evidence

∙ Inference rules are the templates for generating valid arguments.

∙ Inference rules are applied to derive proofs and the proof is a sequence of the conclusion that leads to the
desired goal.

∙ Logical inference creates new sentences that logically follow from a set of sentences (Knowledge Base)

∙ An inference rule is sound if every sentence X it produces when operating on a KB logically follows from
the KB i.e., inference rule creates no contradictions

∙ An inference rule is complete if it can produce every expression that logically follows from (is entailed by)
the KB.
Inference rules
∙ Terms in Inference :
• Implication: It is one of the logical connectives which can be represented as P → Q. It is a
Boolean expression.
• Converse: The converse of implication, which means the right-hand side proposition goes to
the left-hand side and vice-versa. It can be written as Q → P.
• Contrapositive: The negation of converse is termed as contrapositive, and it can be represented
as ¬ Q → ¬ P.
• Inverse: The negation of implication is called inverse. It can be represented as ¬ P → ¬ Q.

• So P → Q is equivalent to ¬ Q → ¬ P, and Q→ P is equivalent to ¬ P → ¬ Q.


∙ Resolution is a valid inference rule producing a new clause implied by two clauses containing
complementary literals
∙ To use resolution, put KB into conjunctive normal form (CNF), where each sentence written as a
disjunction of (one or more) literals
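As a rough illustration of a single resolution step on CNF clauses (a sketch, not the full resolution algorithm), clauses can be stored as sets of literal strings where a leading '¬' marks a negative literal; the representation and helper names below are assumptions for demonstration.

```python
def negate(lit):
    # '¬P' <-> 'P'
    return lit[1:] if lit.startswith("¬") else "¬" + lit

def resolve(clause1, clause2):
    """Return all resolvents of two clauses (each a frozenset of literals)."""
    resolvents = []
    for lit in clause1:
        if negate(lit) in clause2:
            # Drop the complementary pair and union the remaining literals.
            resolvents.append(frozenset((clause1 - {lit}) | (clause2 - {negate(lit)})))
    return resolvents

# (¬P ∨ Q) resolved with (P ∨ R) gives (Q ∨ R)
print(resolve(frozenset({"¬P", "Q"}), frozenset({"P", "R"})))
```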

Clauses
∙ Horn clauses and definite clauses:

∙ Definite clause is a disjunction of literals of which exactly one is positive.

∙ Horn clause is a disjunction of literals of which at most one is positive.

∙ All definite clauses are Horn clauses

∙ Goal Clauses are the clauses with no positive literals

∙ If you resolve two Horn clauses, you get back a Horn clause.

∙ Inference with Horn clauses can be done through the forward- chaining and
backward chaining algorithms
Horn sentences
∙ A Horn sentence or Horn clause is a clause containing at most 1 positive literal
∙ A definite horn clause contains exactly 1 positive literal
∙ Clauses are used in 2 ways:
● As disjunctions (∨, i.e. or): (rain ∨ sleet)
● As implications: (¬child ∨ ¬male ∨ boy), i.e. (child ∧ male) ⇒ boy
∙ Horn clause = at most 1 positive literal in the clause
● Positive/definite clause = exactly 1 positive literal: [¬p1 ∨ ¬p2 ∨ ... ∨ ¬pn ∨ q]
● Negative clause = no positive literals: [¬p1 ∨ ¬p2 ∨ ... ∨ ¬pn] and the empty clause []
∙ Resolving a negative Horn clause with a definite (positive) clause yields a negative clause; resolving two
definite clauses yields a definite clause
∙ What Horn sentences give up is handling, in a general way, (1) negation and (2)
disjunctions
Forward and Backward Chaining
∙ Forward chaining (or forward reasoning) is the method of reasoning when
using an inference engine and can be described logically as repeated
application of modus ponens.
∙ Forward chaining is implemented for expert systems, business and production
rule systems.
∙ Forward chaining:
● Starts with the available data and uses inference rules to extract more
data (from an end user, for example) until a goal is reached.
● An inference engine using forward chaining searches the inference rules until

it finds one where the antecedent (If clause) is known to be true.


● When such a rule is found, the engine can conclude, or infer, the consequent

(Then clause), resulting in the addition of new information to its data.


● Inference engines will iterate through this process until a goal is reached.
Forward and Backward Chaining
∙ Expert system programming is distinctively different from conventional
programming.
● Program = algorithm + data

● Expert system = inference engine + knowledge base + data


∙ Forward Chaining is working from the facts to a conclusion.
∙ It is also called as data-driven approach.
∙ To chain forward, match data in working memory against 'conditions' of rules
in the rule-base.
∙ When one of them fires, this is liable to produce more data.
∙ So the cycle continues
Forward and Backward Chaining
∙ Suppose that the Goal is to conclude the Colour of a pet named Fritz, given he croaks and
eat flies
∙ The Rule base given is
1) If X croaks and X eats flies (premise or antecedent)– Then X is a frog (conclusion or
consequent)
2) If X chirps and X sings – Then X is a canary
3) If X is a frog – Then X is green
4) If X is a canary – Then X is yellow
• Forward chaining follows the pattern of a computer as it evaluates the rules, assuming the facts
Fritz croaks and Fritz eats flies
• With forward reasoning the inference engine can derive that Fritz is green in a series of steps
• 1. From the base facts Fritz croaks and Fritz eats flies, the antecedent of rule 1 is satisfied; substituting
Fritz for X, the inference engine concludes
• Fritz is a frog
• 2. The antecedent of rule 3 is then satisfied by substituting Fritz for X, and the inference engine
concludes
• Fritz is green
• The name forward chaining comes from the fact that the inference engine starts with the data and
reasons its way to the answer
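A minimal Python sketch of this forward-chaining loop, with the Fritz rule base written as (premises, conclusion) pairs already instantiated for Fritz (an assumed simplification; a full system would also perform variable substitution):

```python
# Forward chaining over the Fritz rule base, propositionalised for Fritz.
rules = [
    ({"Fritz croaks", "Fritz eats flies"}, "Fritz is a frog"),     # rule 1
    ({"Fritz chirps", "Fritz sings"},      "Fritz is a canary"),   # rule 2
    ({"Fritz is a frog"},                  "Fritz is green"),      # rule 3
    ({"Fritz is a canary"},                "Fritz is yellow"),     # rule 4
]
facts = {"Fritz croaks", "Fritz eats flies"}

changed = True
while changed:                      # keep firing rules until nothing new is added
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)   # antecedent satisfied: infer the consequent
            changed = True

print(facts)   # includes 'Fritz is a frog' and 'Fritz is green'
```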
Forward and Backward Chaining

∙ Backward chaining: working from the conclusion to the facts is called as


goal-driven approach.
∙ To chain backward, match a goal in working memory against 'conclusions' of
rules in the rule-base.
∙ When one of them fires, this is liable to produce more goals.
∙ So the cycle continues.
∙ It is used in automated theorem provers, inference engines, proof
assistants, and other artificial intelligence applications.
∙ In game theory, researchers apply it to (simpler) subgames to find a solution to
the game, in a process called backward induction.
∙ In chess, it is called retrograde analysis, and it is used to generate table bases
for chess endgames for computer chess.
∙ Backward chaining systems usually employ a depth-first search
Forward and Backward Chaining
∙ An Example of Backward Chaining.
1) If X croaks and X eats flies – Then X is a frog
2) If X chirps and X sings – Then X is a canary
3) If X is a frog – Then X is green
4) If X is a canary – Then X is yellow
Known facts: Fritz croaks and Fritz eats flies. Goal: Is Fritz green?
∙ With backward reasoning, an inference engine can determine whether Fritz is green in four steps.
1. Fritz is substituted for X in rule 3 to see if the consequent matches the goal, so rule 3 becomes
• If Fritz is a frog Then Fritz is green … this would prove Fritz is green, so the rule engine now needs to
see if the antecedent (Fritz is a frog) can be proved. The antecedent is the new goal: Fritz is a
frog
2. Substitute Fritz in rule 1: If Fritz croaks and Fritz eats flies Then Fritz is a frog
The consequent matches the goal Fritz is a frog, so the inference engine now needs to see if the antecedent (Fritz
croaks and eats flies) can be proved. The antecedent is the new goal.
Forward and Backward Chaining
3. Since the goal is a conjunction of 2 statements, the inference engine breaks it into sub-goals which
must be proved:
Fritz croaks; Fritz eats flies
4. To prove both of these, the inference engine sees that both of these goals were given as initial facts, so the
conjunction is true:
Fritz croaks and Fritz eats flies
• Therefore the antecedent of rule 1 is true and the consequent must be true:
Fritz is a frog
• Therefore the antecedent of rule 3 is true and the consequent must be true:
Fritz is green
• The derivation allows the inference engine to prove that Fritz is green.
• Rules 2 and 4 were not used
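For comparison, a minimal backward-chaining sketch over the same propositionalised rule base (illustrative only; it assumes the rule base contains no cycles):

```python
rules = [
    ({"Fritz croaks", "Fritz eats flies"}, "Fritz is a frog"),     # rule 1
    ({"Fritz chirps", "Fritz sings"},      "Fritz is a canary"),   # rule 2
    ({"Fritz is a frog"},                  "Fritz is green"),      # rule 3
    ({"Fritz is a canary"},                "Fritz is yellow"),     # rule 4
]
facts = {"Fritz croaks", "Fritz eats flies"}

def prove(goal):
    """Depth-first backward chaining: is `goal` derivable from facts and rules?"""
    if goal in facts:
        return True
    for premises, conclusion in rules:
        if conclusion == goal and all(prove(p) for p in premises):
            return True     # every sub-goal (premise) was proved
    return False

print(prove("Fritz is green"))    # True  (rules 2 and 4 are never used)
print(prove("Fritz is yellow"))   # False
```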
Example:"As per the law, it is a crime for an American to sell weapons to hostile nations. Country A, an enemy of
America, has some missiles, and all the missiles were sold to it by Robert, who is an American citizen."

Prove that "Robert is criminal.

Solution:

Convert all the above facts into first-order definite clauses, and then we will use a forward-chaining algorithm to reach the goal

Facts conversion into FOL (making the rule base):

○ It is a crime for an American to sell weapons to hostile nations. (Let's say p, q, and r are variables)
American(p) ∧ Weapon(q) ∧ Sells(p, q, r) ∧ Hostile(r) → Criminal(p) ...(1)
○ Country A has some missiles: ∃p Owns(A, p) ∧ Missile(p).
○ It can be written as two definite clauses by using Existential Instantiation, introducing a new constant T1:
Owns(A, T1) ......(2)
Missile(T1) .......(3)
○ All of the missiles were sold to country A by Robert: ∀p Missile(p) ∧ Owns(A, p) → Sells(Robert, p, A) ......(4)
○ Missiles are weapons: Missile(p) → Weapon(p) .......(5)
○ An enemy of America is known as hostile: Enemy(p, America) → Hostile(p) ........(6)
○ Country A is an enemy of America: Enemy(A, America) .........(7)
○ Robert is American: American(Robert) ..........(8)

https://www.javatpoint.com/forward-chaining-and-backward-chaining-in-ai
Example:"As per the law, it is a crime for an American to sell weapons to hostile nations. Country A, an enemy of
America, has some missiles, and all the missiles were sold to it by Robert, who is an American citizen."

Prove that "Robert is criminal.

Solution: Forward chaining proof:

Step-1: In the first step we will start with the known facts and will choose the sentences which do not have implications, such
as: American(Robert), Enemy(A, America), Owns(A, T1), and Missile(T1). All these facts will be represented as below.

Step-2: In the second step, we look at the facts that can be inferred from the available facts, i.e. rules whose premises are
satisfied. Rule (1) does not yet have its premises satisfied, so it is not added in the first iteration.

Facts (2) and (3) are already added.

Rule (4) is satisfied with the substitution {p/T1}, so Sells(Robert, T1, A) is added, which is inferred from the conjunction of facts (2) and
(3).

Rule (6) is satisfied with the substitution {p/A}, so Hostile(A) is added, which is inferred from fact (7).

https://www.javatpoint.com/forward-chaining-and-backward-chaining-in-ai
Example:"As per the law, it is a crime for an American to sell weapons to hostile nations. Country A, an enemy of
America, has some missiles, and all the missiles were sold to it by Robert, who is an American citizen."

Prove that "Robert is criminal.

Solution: Forward chaining proof:

Step-3: At step 3, we can check that Rule (1) is satisfied with the substitution {p/Robert, q/T1, r/A}, so we can add
Criminal(Robert), which is inferred from all the available facts. Hence we have reached our goal statement.

○ Hence it is proved that Robert is a criminal using the forward-chaining approach.

https://www.javatpoint.com/forward-chaining-and-backward-chaining-in-ai
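A hand-grounded Python sketch of this proof (illustrative only; the substitutions are hard-coded rather than derived by general unification, and facts are represented as tuples):

```python
# Simplified, grounded forward chaining for the Robert example.
facts = {
    ("American", "Robert"),          # (8)
    ("Missile", "T1"),               # (3)
    ("Owns", "A", "T1"),             # (2)
    ("Enemy", "A", "America"),       # (7)
}

def step(facts):
    """One forward-chaining pass: return the facts inferable from `facts`."""
    new = set(facts)
    for f in facts:
        if f[0] == "Missile":                                   # rule (5)
            new.add(("Weapon", f[1]))
        if f[0] == "Missile" and ("Owns", "A", f[1]) in facts:  # rule (4)
            new.add(("Sells", "Robert", f[1], "A"))
        if f[0] == "Enemy" and f[2] == "America":               # rule (6)
            new.add(("Hostile", f[1]))
    # Rule (1): American(p) ∧ Weapon(q) ∧ Sells(p,q,r) ∧ Hostile(r) -> Criminal(p)
    for s in [f for f in new if f[0] == "Sells"]:
        _, p, q, r = s
        if ("American", p) in new and ("Weapon", q) in new and ("Hostile", r) in new:
            new.add(("Criminal", p))
    return new

while True:
    updated = step(facts)
    if updated == facts:
        break
    facts = updated

print(("Criminal", "Robert") in facts)   # True
```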
Comparison of Forward and Backward Chaining
● Forward chaining is also known as the data-driven inference technique; backward chaining is also known as the goal-driven inference technique.
● Forward chaining matches the set of conditions and infers results from these conditions: it starts from new data and aims for any conclusion. Backward chaining is a backward search from the goal to the conditions used to reach that goal: it starts from a possible conclusion or goal and aims for the necessary data.
● Forward chaining is bottom-up reasoning; backward chaining is top-down reasoning.
● Forward chaining is a breadth-first search; backward chaining is a depth-first search.
● Forward chaining continues until no more rules can be applied or some cycle limit is met; backward chaining processes operations in a backward direction from end to start and stops when the matching initial condition is met.
● Example: "If it is cold then I will wear a sweater." In forward chaining, "it is cold" is the data and "I will wear a sweater" is the decision: it was already known that it is cold, which is why it was decided to wear a sweater. In backward chaining, we start from the possible conclusion "I will wear a sweater"; if I am wearing a sweater then it can be stated that it is cold, so the conclusion is derived in a backward direction.
● Forward chaining is mostly used in commercial applications, e.g. event-driven systems; backward chaining is used in interrogative commercial applications, e.g. finding items that fulfil possible goals.
● Forward chaining can create an infinite number of possible conclusions; in backward chaining the number of possible final answers is reasonable.
First Order Logic
∙ Combining the best of formal and natural languages
∙ The propositional logic only deals with the facts, that may be true or false and every expression is a
sentence that represents a fact.
∙ In the syntax of natural language, the elements are nouns and noun phrases that refer to objects (squares,
pits, wumpuses) and verbs and verb phrases that refer to relations among objects (is breezy, is adjacent
to, shoots). Some of these relations are functions—relations in which there is only one "value" for a given "input".
∙ The first order logic assumes that the world contains objects, relations and functions.
∙ First-order logic (like natural language) assumes the world contains
∙ Objects: people, houses, numbers, theories, Ronald McDonald, colors, baseball games, wars, centuries
∙ Relations: red, round, bogus, prime, multistoried . . ., brother of, bigger than, inside, part of, has color,
occurred after, owns, comes between, .
∙ Functions: father of, best friend, third inning of, one more than, end of . .
∙ Constant symbols, variables and function symbols are used to build terms, while quantifiers and
predicate symbols are used to build the sentences.

First Order Logic
∙ The primary difference between propositional and first-order logic lies in the ontological
commitment (what the language assumes exists in the world) made by each language—that is, what it assumes about the
nature of reality.
∙ Commitment is expressed through the nature of the formal models w.r.t. which the truth of sentences is
defined.
∙ For example, propositional logic assumes that there are facts that either hold or do not hold in the
world, Each fact can be in one of two states: true or false, and each model assigns true or false to each
proposition symbol
∙ First-order logic assumes that the world consists of objects with certain relations among them that do
or do not hold.
∙ The formal models are correspondingly more complicated than those for propositional logic.

First Order Logic
∙ Special Purpose Logic: Temporal Logic
∙ Special-purpose logics such as temporal logic make still further ontological commitments and give time a status within the logic.
∙ For example, temporal logic assumes that facts hold at particular times and that those times (which may
be points or intervals) are ordered.
∙ Higher Order Logic: It views the relations and functions referred to by first-order logic as objects in
themselves and allows one to make assertions about all relations
∙ For example, one could wish to define what it means for a relation to be transitive (more a matter of logic and
mathematics). Higher-order logic is strictly more expressive than first-order logic: some sentences
of higher-order logic cannot be expressed by any finite number of first-order logic sentences.
∙ Epistemological Logic commitments :
∙ In both propositional and first order logic, a sentence represents a fact and the agent either believes the
sentence to be true, believes it to be false, or has no opinion.
∙ These logics therefore have three possible states of knowledge regarding any sentence.
∙ Systems using probability theory can have any degree of belief, ranging from 0 (total disbelief) to 1
(total belief)

• The syntax of first-order logic with equality can be specified in Backus–Naur form, with operator
precedences listed from highest to lowest.
• The precedence of quantifiers is such that a quantifier holds over everything to the right of it.

∀ Universal quantifier (which means "for all")

∃ Existential quantifier (which means "there exists" or "for some")


First Order Logic

∙ The universal quantifier is a symbol of symbolic logic which expresses that the statements within
its scope are true for everything, or every instance of a specific thing.

∙ The symbol ∀, which appears as a vertically inverted “A”, is used as the universal quantifier.

∙ The universal quantifier ∀ (which means “for all”),

∙ If we could make a list of everything in the domain (a1, a2, a3, …), we would have: ∀x P(x)
≡ P(a1) ∧ P(a2) ∧ P(a3) ∧ ⋯

∙ Existential Quantifier: The symbol ∃ is call the existential quantifier and represents the phrase

“there exists” or “for some”.

∙ The existential quantification of P(x) is the statement “P(x) for some values x in the universe”, or
equivalently, “There exists a value for x such that P(x) is true”, which is written ∃xP(x).

∙ If P(x) is true for at least one element in the domain, then ∃xP(x) is true. Otherwise it is false
First Order Logic
∙ Universal Quantifiers :
∙ Example: Elephants are big
● All things that are Elephants are big.
● For all things x, for which x is a Elephant, x is big.
● For all things x, if x is a Elephant, then x is big.
● Finally the FOL will be written as. : ∀x Elephant (x) ⇒ Big(x)
∙ Everyone studying in Cummins is smart:
● ∀ x (StudiesAt(x,Cummins) ⇒ Smart(x)
∙ All dogs are mammals : ∀x∈A,D(x)⟹M(x) .... A animals D dogs M mammals
∙ All Apples are Delicious: ∀x∈F, A(x)⟹D(x) .... F fruits, A apples, D delicious
∙ All men are mortal. : For all x, if x is a man then x is mortal: ∀x[P(x)→Q(x)].
∙ P = Man Q=Mortal

∙ https://www.zweigmedia.com/RealWorld/logic/logic7.html
First Order Logic

∙ The "all'' form. The universal quantifier is frequently encountered in the following
context: ∀x(P(x)⇒Q(x)),
∙ Examples:
∙ If we say, "if x is negative, so is its cube,'' we usually mean "every negative x has a
negative cube.'' This should be written symbolically as ∀x((x<0)⇒(x3<0)).

∙ ∙ "If two numbers have the same square, then they have the same absolute value'' should

be written as ∀x∀y((x2=y2)⇒(|x|=|y|)).
∙ ∙ "If x=y, then x+z=y+z'' should be written as ∀x∀y∀z((x=y)⇒(x+z=y+z)).

∙ https://www.zweigmedia.com/RealWorld/logic/logic7.html
First Order Logic
∙ Existential quantifiers:
● some professor are a republican: ∃x(x is a professor ∧ x is a republican)
● Some prime number is even: ∃x(x is a prime number ∧ x is even)
∙ Nested Quantifiers:
● Brothers are siblings: ∀x∀y Brothers(x,y) ⇒ Siblings(x,y)
● Everyone loves somebody: ∀x∃y Loves(x,y), i.e. ∀x (∃y Loves(x,y)); note that ∃x∀y Loves(x,y)
says something different (there is someone who loves everybody)
∙ Connection between universal and existential quantifiers:
∙ ∀ and ∃ can be connected using negation
∙ Example: "Everybody dislikes Judy" is the same as "There does not exist someone who likes
Judy":
∀x ¬Likes(x, Judy) is equivalent to ¬∃x Likes(x, Judy)
● Everyone likes roses: ∀x Likes(x, Roses) is equivalent to ¬∃x ¬Likes(x, Roses)
∙ Universal and existential quantifiers follow De Morgan's rules
Represent the following sentences by first order logic calculus.
i) Some dogs bark
ii) All dogs have four legs
iii) All barking dogs are irritating
iv) No dogs purr
i) Some dogs bark
∃x (dog(x) ∧ bark(x))
ii) All dogs have four legs
∀x (dog(x) → have_four_legs(x))
∀x (dog(x) → legs(x, 4))
iii) All barking dogs are irritating
∀x ((dog(x) ∧ barking(x)) → irritating(x))
iv) No dogs purr
¬∃x (dog(x) ∧ purr(x))
Examples of First Order Logic
1. All birds fly.
In this question the predicate is "fly(bird)."
And since there are all birds who fly so it will be represented as follows.
∀x bird(x) →fly(x).
2. Every man respects his parent.
In this question, the predicate is "respect(x, y)," where x=man, and y= parent.
Since there is every man so will use ∀, and it will be represented as follows:
∀x man(x) → respects (x, parent).
3. Some boys play cricket.
In this question, the predicate is "play(x, y)," where x = boys and y = game. Since there are some boys,
we will use ∃, and it will be represented as:
∃x (boys(x) ∧ play(x, cricket)).
4. Not all students like both Mathematics and Science.
In this question, the predicate is "like(x, y)," where x = student and y = subject.
Since not all students are involved, we will use ∀ with negation, so the representation is:
¬∀x [student(x) → (like(x, Mathematics) ∧ like(x, Science))].
5. Only one student failed in Mathematics.
In this question, the predicate is "failed(x, y)," where x = student and y = subject.
Since there is only one student who failed in Mathematics, we will use the following representation:
∃x [student(x) ∧ failed(x, Mathematics) ∧ ∀y [(student(y) ∧ ¬(x = y)) → ¬failed(y,
Mathematics)]].
● Write the KB in propositional logic:
○ Ram is IT professional
○ IT professionals are hardworking
○ All hardworking people are wealthy
○ All wealthy people pay heavy income tax
● All kids are naughty
● All engineers are intelligent
● Some intelligent students study intelligent systems
● Every students who opts for intelligent systems is determined student
● There is at least one student of AI who dislikes ML assignments
● The Existential Quantifier
● A sentence ∃x P(x) is true if and only if there is at least one value of x (from the universe of discourse) that makes
P(x) true.
● Ex. ∃x (x ≥ x²) is true since x = 0 is a solution. There are many others.
● ∃x∃y (x² + y² = 2xy) is true since x = y = 1 is one of many solutions.
● The "some'' form. The existential quantifier is frequently encountered in the following context:
∃x (P(x) ∧ Q(x)), which may be read, "Some x satisfying P(x) also satisfies Q(x).''

● It may at first seem that "Some x satisfying P(x) satisfies Q(x)'' should be translated as ∃x (P(x) ⇒ Q(x)),
like the universal quantifier. To see why this does not work, suppose P(x) = "x is an apple''
and Q(x) = "x is an orange.''
The sentence "some apples are oranges'' is certainly false, but ∃x (P(x) ⇒ Q(x)) is true: suppose x0
is some particular orange. Then P(x0) ⇒ Q(x0) evaluates to F ⇒ T, which is T, and the existential
quantifier is satisfied.

We use abbreviations of the "some'' form much like those for the "all'' form.
Bayesian Probability and Belief network
• Uncertainty:
• If you want to reach a lecture at 5 pm, you need to start x minutes before the lecture.
• There is uncertainty about whether you will reach it in x minutes
• There are many reasons for uncertainty, such as rain, traffic, parents not allowing the
vehicle, etc.
• So we need to deal with uncertainty for decision making
• Uncertainty is about incomplete information as well as uncertainty in reasoning
• To get maximum benefit from the knowledge, it needs to be presented in a proper manner so that
decisions can be taken; knowledge representation is such a way
• In day-to-day life we take decisions based on our ability and known facts
• In the case of machines we embed knowledge
• The decision-making capability of a machine depends on its intelligence, which is based on
sound reasoning and a concise presentation of knowledge
Bayesian Probability and Belief network
• Dealing with uncertainty requires : Applying logic and Take appropriate actions
• Factors that result into uncertainty are namely, Omitted data, Unavailability of entire data,
Partial availability of events, missing data
• This uncertainty is handled by Probability theory
• Different methods to handle uncertainty:
• Non-monotonic logic: default assumptions may lead to contradictions that need handling, e.g.
raining, traffic on the road, or vehicle not available
• Certainty factors lead to belief or disbelief; belief can be handled using probability
• Fuzzy logic and probability theory represent uncertainty
• Probability theory assigns a probable value when we have uncertain information
• Conditional probabilities represent uncertainties in complex relations
• Bayesian probability is used to handle uncertainty where, instead of rules, relationships
between variables are represented
Bayesian Probability and Belief network
• Bayesian probability:
• Probability can be interpreted as
• Objective: Represents physical probability of system
• Subjective: It estimates probability from previous experience
• It calculates the degree of belief, this belief can change under new evidence
• This subjective probability is called Bayesian Probability
• Bayesian Probability is process of using probability for predicting the likelihood of certain
event in future
• The probabilities are conditional in Bayesian as the beliefs are conditional
• Ex. Selection of a person in a cricket team depends on his previous performance, number of
matches played, etc.
• Using Bayesian probability we can model these relationships and make decisions
Bayesian Probability and Belief network
• Proposition and hypothesis:
• Propositional logic talk in terms of true or false
• Hypothesis refers to proposed explanation for observed phenomenon i.e. educated guess
• It is expressed in terms of if…then..
• Ex. If a person in prison learns skilled work he is less likely to commit crime
• Antecedent and consequent: If A occurs then B; here A is the antecedent or hypothesis and
B is the consequent
• Probability is the likelihood of the occurrence of an event
• The value of probability ranges from 0 to 1, i.e. from a rare event to a common event respectively
Bayesian Probability and Belief network
• Types of probabilities:
1. Classical probability or Priori theory: = (no. of favourable outcomes of event)/total possible
outcomes
2. Joint Probability: Probability of events occurring together, P(A,B)
3. Marginal Probability: Its unconditional probability of an event A irrespective of occurrence of
event B, calculated as sum of joint probability of A over all occurrences of event B
4. Prior Probability: unconditional probability corresponds to belief without any observed data
5. Conditional Probability: It is probability of event A given occurrence of some other event,
P(A|B) = P(A,B)/P(B) … prob. Of A given B
1. For independent events prob. P(A)
2. Dependent events Prob. Is P(A And B) = P(B)P(A|B)
3. P(A|B) = P(A and B)/P(B)
Bayesian Probability and Belief network
• Types of probabilities:
6. Posterior Probability: it is the conditional probability after considering the relevant evidence
1. Product rule: P(A,B) = P(A|B)P(B)
2. P(B|A) = P(A,B) / P(A)
3. P(A|B) = P(B|A)P(A)/P(B)
• This is called Bayes Theorem.
• It is used to infer the probability of a hypothesis under observed data or evidence
• This is also called Bayesian Inference
Bayesian Probability and Belief network
• Types of probabilities:
1. Classical probability or Priori theory:
• Classical probability is the statistical concept that measures the likelihood of an
event when all outcomes are equally likely to happen
• The typical example of classical probability would be a fair dice roll because it is
equally probable that you will land on any of the 6 numbers on the die: 1, 2, 3, 4, 5, or 6.
• Another example of classical probability would be a coin toss. There is an equal
probability that your toss will yield a heads or tails result.
• Classical Probability = (no. of favourable outcomes of event)/total possible
outcomes
• Determine the probability of getting an even number when rolling a dice, 3 would be the
number of favourable outcomes because there are 3 even numbers on a die (and
obviously 3 odd numbers).
• The number of possible outcomes would be 6 because there are 6 numbers on a die.
Therefore, the probability of getting an even number when rolling a die is 3/6, or 1/2
when you simplify it.
Bayesian Probability and Belief network
• Types of probabilities:
1. Classical probability or Priori theory: Examples
• probability of odd number if dice is rolled: 1,2,3,4,5,6 there are 3 odd numbers so
probability is 3/6 =0.5
• probability of prime numbers if dice is rolled: prime nos 2,3,5 again 3/6 =0.5
• probability of getting head or tail when coin is tossed : ½= 0.5
• Probability of finding a jack of spades in deck of cards : 1/52
• Probability of finding queens in deck of cards : 4/52
• Probability of getting red cards in deck of card : 26/52
Bayesian Probability and Belief network
• Types of probabilities:
2. Joint Probability: Probability of events occurring together, P(A,B) i.e the
likelihood of more than one event occurring at the same time.
● The events are independent.
• Probability that the number five will occur twice when two dice are rolled at
the same time.
• Since each die has six possible outcomes, the probability of a five occurring on
each die is 1/6 or 0.1666.
• P(A) = 0.1666, P(B) = 0.1666, P(A,B) = 0.1666 × 0.1666 = 0.02777
• This means the joint probability that a five will be rolled on both dice at the same
time is 0.02777.
Bayesian Probability and Belief network
• Types of probabilities:
2. Joint Probability: examples
• Joint probability of 2 coins tossed at time and both getting head :
■ for 1 coin ½, 2nd coin ½ so joint is ½ * ½ = ¼
• joint probability of drawing a number ten card that is black:
■ prob. of a ten card = 4/52, prob. of a black card = 26/52, joint prob. = 4/52 × 26/52 = 1/26

Bayesian Probability and Belief network
• Types of probabilities:
3. Marginal Probability: Its unconditional probability of an event A irrespective of occurrence
of event B, calculated as sum of joint probability of A over all occurrences of event B
• Example: the probability that a card drawn is red (p(red) = 0.5) 26/52.
• The probability that a card drawn is a 4 (p(four)=1/13 ).
4. Prior Probability: unconditional probability corresponds to belief without any observed
data
• For example, three acres of land have the labels A, B, and C.
• One acre has reserves of oil below its surface, while the other two do not.
• The prior probability of oil being found on acre C is one third, or 0.333.
• But if a drilling test is conducted on acre B, and the results indicate that no oil is present at
the location, then the posterior probability of oil being found on acres A and C become
0.5, as each acre has one out of two chances.
Bayesian Probability and Belief network

• Types of probabilities and examples :


5. Conditional Probability: It is probability of event A given occurrence of some other
event, P(A|B) = P(A,B)/P(B) … P(A|B) :prob. Of A given B
1. For independent events prob. P(A)
2. Dependent events Prob. Is P(A And B) = P(B)P(A|B)
3. P(A|B) = P(A and B)/P(B)

Bayesian Probability and Belief network
• Types of probabilities and examples :
5. Conditional Probability ,
• Example : In a group of 100 sports car buyers, 40 bought alarm systems, 30 purchased
bucket seats, and 20 purchased an alarm system and bucket seats.
• If a car buyer chosen at random bought an alarm system, what is the probability they also
bought bucket seats?
• Step 1: Figure out P(A). It’s given in the question as 40%, or 0.4.
• Step 2: Figure out P(A∩B). This is the intersection of A and B: both happening together.
It’s given in the question 20 out of 100 buyers, or 0.2.
• Step 3: Insert your answers into the formula:
• P(B|A) = P(A∩B) / P(A) = 0.2 / 0.4 = 0.5.
• The probability that a buyer bought bucket seats, given that they purchased an alarm
system, is 50%.
Bayesian Probability and Belief network
• Types of probabilities:
6. Posterior Probability: it is the conditional probability after considering the relevant evidence
1. Product rule P(A,B) = P(A|B)P(B)
2. P(B|A) = P(A,B) / P(A)
3. P(A|B) = P(B|A)P(A)/P(B)
• This is called Bayes Theorem.
• It is used to infer probability of hypothesis under observed data or evidence
• It is also called Bayesian Inference
Bayesian Probability and Belief network
• A Bayesian network, Bayes network, belief network, decision network, Bayes(ian) model
or probabilistic directed acyclic graphical model is a probabilistic graphical model / a type
of statistical model that represents a set of variables and their conditional dependencies via a
directed acyclic graph (DAG).
• Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that
any one of several possible known causes was the contributing factor for that event to happen
• It is a statistical method to infer an unknown situation
• Evidence that is consistent or inconsistent with the hypothesis is gathered
• As more and more evidence is gathered, the degree of belief changes
• The hypothesis (proposed event) can be true or false (the alarm will ring or will not ring)
• P(H|E) = P(E|H)·P(H) / P(E) … E is the evidence and H is the hypothesis
• P(H) is the prior probability, P(E) is the marginal probability of the evidence, and P(E|H) is the conditional probability or likelihood
• In Bayesian inference an initial belief in the hypothesis is used, and the degree of belief is
re-estimated as evidence is observed
Bayesian Probability and Belief network
• 1. Ex. A person X has elbow pain; the proposition is that X has tennis elbow. Find the mappings for
diagnosis.
• P(person with elbow pain): P(e), P(person having tennis elbow): P(t)
• P(person having tennis elbow given elbow pain): P(t|e)
• P(person having pain in the elbow given he has tennis elbow): P(e|t)
• P(t|e) = P(e|t) P(t) / P(e)
• 2. Example:
• In a deck of 52 cards, find the probability that a card is a King given that it is a face card (faces are Jack, Queen,
King)
• In a card pack there are 12 cards with faces and 4 King cards
• P(King|Face) = P(Face|King) P(King) / P(Face)
• = (1 × 4/52) / (12/52) = 1/3
..... any card which is a King is a face card, so P(Face|King) = 1
● Bayes Theorem: Proof :

● 2 events A and B , Probability of A given B has occurred is

● P(A|B) = P(B|A) . P(A) / P(B)

● P(A) : Prior Prob. : probability of the hypothesis A before the evidence is observed

● P(B): Marginal Prob. : probability of the data or evidence B on its own

● P(A|B) = P(A ∩ B)/ P(B) …… Posterior Probability : P(A|B) A is hypothesis and B


given data or evidence

● P(B|A) = P(B ∩ A) /P(A) ……. Conditional Probability or Likelihood : P(B|A) :


Probability of evidence B given Hypothesis A is true

● P(A|B) . P(B) = P(A ∩ B)

● P(B|A) . P(A) = P(B ∩ A) … but P(A ∩ B)= P(B ∩ A)

● P(A|B) . P(B) = P(B|A) . P(A) so P(A|B) = P(B|A).P(A) / P(B)


Bayesian Probability and Belief network
• Belief Network / Bayesian Network:
• A belief network or Bayesian network is a probabilistic graphical model that represents a
set of random variables and their conditional dependencies through a directed acyclic
graph
• In the graph, the nodes represent random variables and the directed edges represent conditional
dependencies between variables
• A node that depends on a parent node is a child node
• The parent node is the cause and the child node is the effect
• Every node is associated with a probability function
• The Bayesian network provides an entire description of the domain
• The network can be seen as a joint probability distribution
• The joint distribution is the product of the appropriate entries
of the conditional probability tables
https://artint.info/html1e/ArtInt_148.html
Analyse the given equation and construct the Bayesian / Belief Network
Bayesian Probability and Belief network
• Belief Network: Example:
• Probability of selection of Ram or Shyam by Coach M
• Coach is the parent node; its values represent the prior probability from past data:
P(C=t) = 0.9, P(C=f) = 0.1   (probability that M is / is not the coach)
• Conditional probability tables for the child nodes "Ram selected" and "Shyam selected":
P(Ram=t | C=t) = 0.8, P(Ram=t | C=f) = 0.1, P(Ram=f | C=t) = 0.2, P(Ram=f | C=f) = 0.9
P(Shyam=t | C=t) = 0.6, P(Shyam=t | C=f) = 0.5, P(Shyam=f | C=t) = 0.4, P(Shyam=f | C=f) = 0.5
• Compute the probability of selection of Ram:
P(R) = P(R|C)·P(C) + P(R|no C)·P(no C) = (0.8 × 0.9) + (0.1 × 0.1) = 0.73
• In the same way we can calculate the probability of Shyam being selected = 0.59
• These computations are marginal probabilities.
• If Ram is selected but we do not know whether the coach is M, then
P(C|R) = P(R|C)·P(C) / P(R) = 0.8 × 0.9 / 0.73 ≈ 0.986
It tells us whether the coach is M.
Bayesian Probability and Belief network
• Belief Network: Example (continued):
• Probability of selection of Ram or Shyam by Coach M
• Compute the probability of selection of Shyam:
P(S=T) = P(S|C)·P(C) + P(S|no C)·P(no C) = (0.6 × 0.9) + (0.5 × 0.1) = 0.59
• Calculate P(S=F), the probability of Shyam not getting selected:
P(S=F) = P(S=F|C)·P(C) + P(S=F|no C)·P(no C) = 0.4 × 0.9 + 0.5 × 0.1 = 0.41
• These computations are marginal probabilities.
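The same marginal and posterior computations, written as a short Python sketch over the CPT values given in the slides (variable names are illustrative):

```python
# CPTs from the slides for the Coach -> {Ram selected, Shyam selected} network.
P_C = {True: 0.9, False: 0.1}                    # P(coach is M)
P_R_given_C = {True: 0.8, False: 0.1}            # P(Ram selected | C)
P_S_given_C = {True: 0.6, False: 0.5}            # P(Shyam selected | C)

# Marginal probabilities, obtained by summing out the coach variable.
P_R = sum(P_R_given_C[c] * P_C[c] for c in (True, False))
P_S = sum(P_S_given_C[c] * P_C[c] for c in (True, False))
print(P_R, P_S)                                  # 0.73, 0.59

# Posterior: P(C | Ram selected) = P(R | C) * P(C) / P(R)
print(P_R_given_C[True] * P_C[True] / P_R)       # ~0.986
```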
Bayesian Probability and Belief network
• Applications Bayesian Networks:
• Machine learning
• Statistics
• Computer vision
• Natural language Processing
• Speech recognition
• Error-control codes
• Bioinformatics Medical diagnosis
• Weather forecasting
• PATHFINDER medical diagnosis system at Stanford
• Microsoft Office assistant and troubleshooters
• Space shuttle monitoring system at NASA Mission Control Center in Houston

http://sse.tongji.edu.cn/liangshuang/pr2012fall/slides/08-bn.pdf
Example
• A new Burglar alarm is installed in the house
• This alarm can detect Burglary/theft and earthquakes too
• There are 2 neighbours, Ali and Veli, who will call the person in the house when they hear the
alarm
• Ali always calls when he hears the alarm, but at times gets confused because his phone has the same
ringtone as the alarm.
• Veli likes loud music, so sometimes he misses the alarm and does not call when it rings
• Given the evidence of who has or has not called, we want to
• 1. find the probability of a burglary
• 2. find the probability that the alarm rings when there is no burglary and no earthquake
• Solution :
• Different Probabilities associated
• 1. Probability of burglary has happened /not
• 2. Probability of Earthquake has happened /not
• 3. Probability whether the alarm rings or not in all possible cases of Burglary and
Earthquake
• 4. Probability Ali hear alarm and calls
• 5. Probability Veli hears the alarm and calls
Example
• Solution:
• Different probabilities associated:
• 1. Probability that a burglary (B) has happened / not:
P(B=T) = 0.001, P(B=F) = 0.999
• 2. Probability that an earthquake (E) has happened / not:
P(E=T) = 0.002, P(E=F) = 0.998
• 3. Probability that the alarm (A) rings or not in all possible cases of Burglary and
Earthquake, i.e. the alarm (A) ringing depends on its parents B and E:

B     E     A=T     A=F
T     T     0.95    0.05
T     F     0.94    0.06
F     T     0.29    0.71
F     F     0.001   0.999
Example
• Solution:
• 4. Probability that Ali hears the alarm and calls (AC); the alarm is the parent, so we consider the
alarm-ringing probability:

A     P(AC=T)   P(AC=F)
T     0.90      0.10
F     0.05      0.95

• 5. Probability that Veli hears the alarm and calls (VC):

A     P(VC=T)   P(VC=F)
T     0.70      0.30
F     0.01      0.99

• Find the probability that Ali and Veli both call, the alarm rings, and there is no burglary and no earthquake:
• P(AC, VC, A, not(B), not(E))
  = P(AC|A) · P(VC|A) · P(A|not B, not E) · P(not(B)) · P(not(E))
  = 0.90 × 0.7 × 0.001 × 0.999 × 0.998 ≈ 0.00063
Example
• 2. Probability that
• Ali calls when he hears the alarm
• Veli does not hear it and does not call
• the alarm rings when a burglary happens and there is no earthquake
• P(AC, not(VC), A, B, not(E))
  = P(AC|A=T) · P(not(VC)|A=T) · P(A|B=T, not(E)) · P(B=T) · P(E=F)
  = 0.90 × 0.30 × 0.94 × 0.001 × 0.998 ≈ 0.00025

3. Probability that Ali and Veli don't call when the alarm rings due to an Earthquake and
NOT due to a Burglary

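A short Python sketch (illustrative) that encodes the CPTs above and evaluates the two joint-probability queries via the chain rule implied by the network structure:

```python
# CPTs from the slides (B = burglary, E = earthquake, A = alarm,
# AC = Ali calls, VC = Veli calls).
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,      # P(A=T | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_AC = {True: 0.90, False: 0.05}                     # P(AC=T | A)
P_VC = {True: 0.70, False: 0.01}                     # P(VC=T | A)

def joint(ac, vc, a, b, e):
    """Full joint probability via the chain rule over the network."""
    p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p_ac = P_AC[a] if ac else 1 - P_AC[a]
    p_vc = P_VC[a] if vc else 1 - P_VC[a]
    return p_ac * p_vc * p_a * P_B[b] * P_E[e]

# 1. Ali and Veli both call, alarm rings, no burglary, no earthquake
print(joint(ac=True, vc=True, a=True, b=False, e=False))    # ~0.00063
# 2. Ali calls, Veli does not, alarm rings, burglary, no earthquake
print(joint(ac=True, vc=False, a=True, b=True, e=False))    # ~0.00025
```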
Bayesian Probability and Belief network
• Four kinds of inference in belief networks:
• Diagnostic inference: from effects to causes: Example: Given that Ali Calls, infer
P(Burglary|Ali Calls)
• Causal inference: from causes to effects Example: Given Burglary, infer
P(AliCalls|Burglary) and P(Veli Calls|Burglary)
• Intercausal inference: between causes of a common effect Given Alarm, we have
P(Burglary|Alarm) = 0.376. But with the evidence that Earthquake is true, then
P(Burglary|Alarm∧Earthquake) goes down to 0.003. Even though burglaries and
earthquakes are independent, the presence of one makes the other less likely.
• Mixed inference: combining two or more of the above Example: Setting the effect AliCalls
to true and the cause Earthquake to false gives
P(Alarm|AliCalls ∧ ¬Earthquake) = 0.03 - a combination of diagnostic and causal inference
Bayesian Probability and Belief network
● Naive Bayes Algorithm:
● It works on Bayes theorem of probability to predict the class of unknown data sets.
● It is a classification technique based on Bayes’ Theorem with an assumption of independence among predictors.
● A Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence
of any other feature.
● Bayes theorem provides a way of calculating posterior probability P(c|x) from P(c), P(x) and P(x|c).

● P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes).
● P(c) is the prior probability of class.
● P(x|c) is the likelihood which is the probability of predictor given class.
● P(x) is the prior probability of predictor.
https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/
Bayesian Probability and Belief network
● Step by Step Naive Bayes algorithm:
● Step 1: Convert the data set into a frequency table
● Step 2: Create Likelihood table by finding the probabilities like Overcast probability = 0.29 and probability of playing is 0.64.
● Step 3: Now, use Naive Bayesian equation to calculate the posterior probability for each class. The class with the highest posterior probability is
the outcome of prediction.

Problem: Players will play if the weather is sunny. Is this statement correct?


We can solve it using method of posterior probability:
P(Yes | Sunny) = P( Sunny | Yes) * P(Yes) / P (Sunny)
Here we have P (Sunny |Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, P( Yes)= 9/14 = 0.64
Now, P (Yes | Sunny) = 0.33 * 0.64 / 0.36 = 0.60, which has higher probability.

● Assumption : In Naive Bayes it’s assumed that all the features are independent of each other.
● Also need to convert the continuous variables into discrete variables.
Applications of Naive Bayes Algorithms
● Real time Prediction: Naive Bayes is an eager learning classifier and it is certainly fast. Thus, it could be used for making
predictions in real time.
● Multi class Prediction: This algorithm is also well known for multi class prediction feature. Here we can predict the
probability of multiple classes of target variable.
● Text classification/ Spam Filtering/ Sentiment Analysis: Naive Bayes classifiers mostly used in text classification (due to
better result in multi class problems and independence rule) have higher success rate as compared to other algorithms. As a
result, it is widely used in Spam filtering (identify spam e-mail) and Sentiment Analysis (in social media analysis, to identify
positive and negative customer sentiments)
● Recommendation System: Naive Bayes Classifier and Collaborative Filtering together builds a Recommendation System
that uses machine learning and data mining techniques to filter unseen information and predict whether a user would like a
given resource or not
There are three types of Naive Bayes model under the scikit-learn library:

● Gaussian: It is used in classification and it assumes that features follow a normal distribution.
● Multinomial: It is used for discrete counts. For example, let’s say, we have a text classification problem. Here we can
consider Bernoulli trials which is one step further and instead of “word occurring in the document”, we have “count how often
word occurs in the document”, you can think of it as “number of times outcome number x_i is observed over the n trials”.
● Bernoulli: The binomial model is useful if your feature vectors are binary (i.e. zeros and ones). One application would be text
classification with ‘bag of words’ model where the 1s & 0s are “word occurs in the document” and “word does not occur in
the document” respectively

Given are the details about Outlook, Temperature, Humidity and Windy conditions.
Decide whether the game can be played when Outlook is Rainy, Temp. is Cool, Humidity is High and Windy is True.

Sr. No.   Outlook    Temp.   Humidity   Windy   Play Yes/No
1         Rainy      Hot     High       False   No
2         Rainy      Hot     High       True    No
3         Overcast   Hot     High       False   Yes
4         Sunny      Mild    High       False   Yes
5         Sunny      Cool    Normal     False   Yes
6         Sunny      Cool    Normal     True    No
7         Overcast   Cool    Normal     True    Yes
8         Rainy      Mild    High       False   No
9         Rainy      Cool    Normal     False   Yes
10        Sunny      Mild    Normal     False   Yes
11        Rainy      Mild    Normal     True    Yes
12        Overcast   Mild    High       True    Yes
13        Overcast   Hot     Normal     False   Yes
14        Sunny      Mild    High       True    No

Step 1: Priors
P(Yes) = 9/14
P(No) = 5/14

https://www.geeksforgeeks.org/naive-bayes-classifiers/
https://medium.com/@hrishavkmr/naive-bayes-in-machine-learning-5c0972340b76
https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/
Query: can the game be played when Outlook is Rainy, Temp. is Cool, Humidity is High and Windy is True? (Data set as in the previous table.)

Step 2: Frequency Tables
1. P(Rainy|Yes), P(Rainy|No)

Outlook    Yes   No   Total   P(Yes)       P(No)
Rainy      2     3    5       2/9          3/5
Overcast   4     0    4       4/9          0/5
Sunny      3     2    5       3/9          2/5
Total      9     5    14      9/9 = 100%   5/5 = 100%

P(Rainy|Yes) = 2/9
P(Rainy) = 5/14
P(Rainy|No) = 3/5

(A pandas sketch for building such frequency tables programmatically follows below.)
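Frequency and likelihood tables like the one above can also be produced programmatically. The sketch below assumes pandas is available and uses only the Outlook and Play columns of the 14-row data set; the DataFrame name df is illustrative.

```python
# Minimal sketch: building the Outlook frequency / likelihood table with pandas
# (assumes pandas is installed; only the Outlook and Play columns of the
# 14-row data set are used, and the name `df` is illustrative).
import pandas as pd

df = pd.DataFrame({
    "Outlook": ["Rainy", "Rainy", "Overcast", "Sunny", "Sunny", "Sunny", "Overcast",
                "Rainy", "Rainy", "Sunny", "Rainy", "Overcast", "Overcast", "Sunny"],
    "Play":    ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
                "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
})

freq = pd.crosstab(df["Outlook"], df["Play"])       # counts of No / Yes per Outlook value
likelihood = freq.div(freq.sum(axis=0), axis=1)     # each column divided by its class total
print(freq)
print(likelihood)   # e.g. P(Rainy | Yes) = 2/9 = 0.22 and P(Rainy | No) = 3/5 = 0.60
```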
Step 2: Frequency Tables (continued)
2. P(Cool|Yes), P(Cool|No)

Temp.   Yes   No   Total   P(Yes)   P(No)
Hot     2     2    4       2/9      2/5
Cool    3     1    4       3/9      1/5
Mild    4     2    6       4/9      2/5
Total   9     5    14      9/9      5/5

P(Cool|Yes) = 3/9
P(Cool) = 4/14
P(Cool|No) = 1/5


Step 2: Frequency Tables (continued)
3. P(High|Yes), P(High|No)

Humidity   Yes   No   Total   P(Yes)   P(No)
High       3     4    7       3/9      4/5
Normal     6     1    7       6/9      1/5
Total      9     5    14      9/9      5/5

P(High|Yes) = 3/9
P(High) = 7/14
P(High|No) = 4/5
Step 2: Frequency Tables (continued)
4. P(True|Yes), P(True|No)

Windy   Yes   No   Total   P(Yes)   P(No)
True    3     3    6       3/9      3/5
False   6     2    8       6/9      2/5
Total   9     5    14      9/9      5/5

P(True|Yes) = 3/9
P(True) = 6/14
P(True|No) = 3/5
Step 3: Posterior probability for the new instance X = (Outlook = Rainy, Temp. = Cool, Humidity = High, Windy = True)

Priors:        P(Yes) = 9/14,  P(No) = 5/14
Likelihoods:   P(Rainy|Yes) = 2/9,  P(Rainy|No) = 3/5
               P(Cool|Yes)  = 3/9,  P(Cool|No)  = 1/5
               P(High|Yes)  = 3/9,  P(High|No)  = 4/5
               P(True|Yes)  = 3/9,  P(True|No)  = 3/5

P(X | Play = Yes) * P(Play = Yes) = (2/9 * 3/9 * 3/9 * 3/9) * 9/14
                                  = 0.22 * 0.33 * 0.33 * 0.33 * 0.64 = 0.00508 (approx.)

P(X | Play = No) * P(Play = No)   = (3/5 * 1/5 * 4/5 * 3/5) * 5/14
                                  = 0.6 * 0.2 * 0.8 * 0.6 * 0.357 = 0.02057 (approx.)

P(X) = 0.00508 + 0.02057 = 0.02565

P(Play = Yes | X) = P(X | Play = Yes) * P(Play = Yes) / P(X) = 0.00508 / 0.02565 = 0.198
P(Play = No | X)  = P(X | Play = No) * P(Play = No) / P(X)  = 0.02057 / 0.02565 = 0.802

Since P(Play = No | X) > P(Play = Yes | X) (0.802 > 0.198), the game will not be played.
(A short script checking this calculation follows below.)
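The Step 3 calculation can be checked with a short script; this is a minimal sketch using the likelihoods derived above. Note that it keeps full precision, so the final posteriors come out as roughly 0.205 and 0.795 rather than 0.198 and 0.802, which were obtained from rounded intermediate values. The conclusion is the same.

```python
# Minimal sketch: checking Step 3 of the weather example with full precision.
# X = (Outlook = Rainy, Temp = Cool, Humidity = High, Windy = True)

likelihood_yes = (2/9) * (3/9) * (3/9) * (3/9)     # P(X | Play = Yes), naive independence
likelihood_no  = (3/5) * (1/5) * (4/5) * (3/5)     # P(X | Play = No)

score_yes = likelihood_yes * (9/14)                # P(X | Yes) * P(Yes)
score_no  = likelihood_no  * (5/14)                # P(X | No)  * P(No)

evidence = score_yes + score_no                    # P(X), the normalizer
print(round(score_yes / evidence, 3))              # P(Play = Yes | X) -> 0.205
print(round(score_no  / evidence, 3))              # P(Play = No  | X) -> 0.795  (do not play)
```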
Bayesian Probability and Belief network
● Naive Bayes Algorithm:
● Advantages :
○ It is easy and fast to predict the class of a test data set, and it also performs well in multi-class prediction.
○ When the assumption of independence holds, a Naive Bayes classifier performs better compared to other
models like logistic regression, and it requires less training data.
○ It performs well with categorical input variables compared to numerical variables.
● Disadvantages
○ If a categorical variable has a category in the test data set that was not observed in the training data set, the
model will assign it zero probability and will be unable to make a prediction. This is often known as the "Zero
Frequency" problem and is usually handled with Laplace smoothing, sketched below.
○ Naive Bayes is also known to be a poor estimator, so the probability outputs from predict_proba should not be
taken too seriously.
○ It assumes independent predictors. In real life, it is almost impossible to get a set of predictors that
are completely independent.
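The zero-frequency problem is usually handled with Laplace (add-one) smoothing, which adds 1 to every count so that no conditional probability becomes exactly zero. A minimal sketch with made-up counts is shown below (this situation actually occurs in the weather data set, where Overcast never appears with Play = No); in scikit-learn this corresponds to the alpha parameter of MultinomialNB, BernoulliNB and CategoricalNB.

```python
# Minimal sketch: Laplace (add-one) smoothing for the zero-frequency problem.
# Counts within the "No" class: the value "Overcast" was never observed with Play = No.
counts_no = {"Rainy": 3, "Overcast": 0, "Sunny": 2}

total_no = sum(counts_no.values())                         # 5 observations of "No"
k = len(counts_no)                                         # 3 possible categories

unsmoothed = counts_no["Overcast"] / total_no              # 0.0 -> wipes out the whole product
smoothed = (counts_no["Overcast"] + 1) / (total_no + k)    # (0 + 1) / (5 + 3) = 0.125

print(unsmoothed, smoothed)                                # 0.0 0.125
```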
Bayesian Probability and Belief network
● Applications of Naive Bayes Algorithms
● Real time Prediction
● Multi class Prediction
● Text classification/ Spam Filtering/ Sentiment Analysis
○ Mark an e-mail as spam or not spam
○ Classify a news article as technology, politics, or sports
○ Check whether a piece of text expresses positive or negative emotions
○ Also used in face recognition software

Bayesian Probability and Belief network

● Example:

● In a deck of 52 cards, find the probability that a card is a King, given that it is a face card (Jack, Queen, King).

● P(King|Face) = P(Face|King) * P(King) / P(Face)

● = (1 * 4/52) / (12/52) = 1/3 (see the worked equation below)
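Written out as a worked equation (every King is a face card, so P(Face|King) = 1):

```latex
\[
P(\text{King}\mid\text{Face})
  = \frac{P(\text{Face}\mid\text{King})\,P(\text{King})}{P(\text{Face})}
  = \frac{1 \times \frac{4}{52}}{\frac{12}{52}}
  = \frac{4}{12}
  = \frac{1}{3}
\]
```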


Bayesian Probability and Belief network
● Example:

• Find the probability that a patient has liver disease given that they are an alcoholic.
• 10% of the patients entering the clinic have liver disease: P(A) = 0.1
• 5% of the patients are alcoholics: P(B) = 0.05
• 7% of the patients with liver disease are alcoholics: P(B|A) = 0.07, i.e. the probability that a
patient is an alcoholic, given that they have liver disease, is 7%
• Required: the probability that a patient has liver disease, given that they are an alcoholic: P(A|B)
• Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
• P(A|B) = (0.07 * 0.1) / 0.05 = 0.14
• In other words, if the patient is an alcoholic, their chance of having liver disease is 0.14
(14%). This is a large increase from the 10% suggested by the prior data, but it is still unlikely
that any particular patient has liver disease.
Bayesian Probability and Belief network
● Example (the same problem with alternative figures, for practice):
• Find the probability that a patient has liver disease given that they are an alcoholic, for each of the following sets of figures:
• Variant 1: P(A) = 0.20, P(B) = 0.10, P(B|A) = 0.08 → P(A|B) = (0.08 * 0.20) / 0.10 = 0.16 (16%)
• Variant 2: P(A) = 0.25, P(B) = 0.15, P(B|A) = 0.10 → P(A|B) = (0.10 * 0.25) / 0.15 ≈ 0.167 (16.7%)
• Variant 3: P(A) = 0.22, P(B) = 0.12, P(B|A) = 0.10 → P(A|B) = (0.10 * 0.22) / 0.12 ≈ 0.183 (18.3%)
• In each case Bayes' theorem is applied exactly as before: P(A|B) = P(B|A) * P(A) / P(B).
Bayesian Probability and Belief network
● Example:
• Suppose that an individual is extracted at random from a population of men.
• The probability of extracting a married individual is 50%.
• The probability of extracting a childless individual is 40%.
• The conditional probability that an individual is childless given that he is married is equal to
20%.
• If the individual we extract at random from the population turns out to be childless, what is the
conditional probability that he is married?
• This conditional probability is called posterior probability and it can be computed by using
Bayes' rule above.
The quantities involved in the computation are:
P(married) = 50/100 = 1/2
P(childless) = 40/100 = 2/5
P(childless | married) = 20/100 = 1/5
The posterior probability is P(married | childless) = 1/4 = 0.25 (25%); the full computation is written out below.
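Written as a worked equation:

```latex
\[
P(\text{married}\mid\text{childless})
  = \frac{P(\text{childless}\mid\text{married})\,P(\text{married})}{P(\text{childless})}
  = \frac{\frac{1}{5}\times\frac{1}{2}}{\frac{2}{5}}
  = \frac{1/10}{2/5}
  = \frac{1}{4} = 0.25
\]
```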
