Knowledge Representation, Planning, Probability and Uncertainty

Knowledge Representation

Knowledge

Information

Data

Noise
Representations and Mapping
• Solving complex problems requires a large amount of knowledge, together with
mechanisms for manipulating that knowledge to create solutions to new
problems. Two entities are involved:
• Facts: truths in some relevant world. These are the things we want to
represent.
• Representations of facts in some chosen formalism. These are the things we will
actually be able to manipulate.
• Structuring these entities in two levels:
• Knowledge level: at which facts (including agent’s behaviors and current goals)
can be described.
• Symbol level: at which representations of objects at the knowledge level are
defined in terms of symbols that can be manipulated by programs.
Mapping between Facts and Representations

• Forward representation mapping: facts → internal representations.
• Backward representation mapping: internal representations → facts.
• English understanding maps English sentences onto facts; English generation
maps facts back out to English sentences (the English representation).
Approaches to Knowledge Representation
• Representational Adequacy: the ability to represent all kinds of knowledge that
are needed in that domain.
• Inferential Adequacy: the ability to manipulate the representational structures
in such a way as to derive new structures corresponding to new knowledge
inferred from old knowledge.
• Inferential Efficiency: the ability to incorporate into the knowledge structure
additional information that can be used to focus the attention of the inference
mechanism in the most promising directions.
• Acquisitional Efficiency: the ability to acquire new information easily. It
may involve direct insertion of new knowledge into the database. Ideally, the
program itself would be able to control knowledge acquisition.
Issues in KR
• Are any attributes of the objects so basic that they occur in almost every
problem domain? If yes, we need to handle these appropriately in each
mechanism we propose. If such exist, what are they?
• Are there any important relationships that exist among attributes of objects?
• At what level should knowledge be represented? Is there a good set of
primitives into which all knowledge can be broken down? Is it helpful to use
such primitives? (Choosing the granularity of the representation)
• How should sets of objects be represented?
• Given a large amount of knowledge stored in a database, how can relevant
parts be accessed when they are needed? (Finding the right structures as
needed)
T: Tom hit the ball.

J: Jane caught the ball.

S: Spot chased the ball.

Predicate calculus
• Predicate calculus is also known as “First Order Logic” (FOL)
• Predicate calculus includes:
• All of propositional logic
• Logical values true, false
• Variables x, y, a, b,...
• Connectives ¬, ⇒, ∧, ∨, ⇔
• Constants KingJohn, 2, Villanova,...
• Predicates Brother, >,...
• Functions Sqrt, MotherOf,...
• Quantifiers ∀, ∃

76
Constants, functions, and predicates
• A constant represents a “thing”--it has no truth value, and it does not
occur “bare” in a logical expression
• Examples: DavidMatuszek, 5, Earth, goodIdea
• Given zero or more arguments, a function produces a constant as its
value:
• Examples: motherOf(DavidMatuszek), add(2, 2), thisPlanet()
• A predicate is like a function, but produces a truth value
• Examples: greatInstructor(DavidMatuszek), isPlanet(Earth), greater(3,
add(2, 2))

77
Universal quantification
• The universal quantifier, ∀, is read as “for each”
or “for every”
• Example: ∀x, x² ≥ 0 (for all x, x² is greater than or equal to
zero)
• Typically, ⇒ is the main connective with ∀:
∀x, at(x,Villanova) ⇒ smart(x)
means “Everyone at Villanova is smart”
• Common mistake: using ∧ as the main connective with ∀:
∀x, at(x,Villanova) ∧ smart(x)
means “Everyone is at Villanova and everyone is smart”
• If there are no values satisfying the condition, the result is true
• Example: ∀x, isPersonFromMars(x) ⇒ smart(x) is true

78
Existential quantification
• The existential quantifier, ∃, is read “for some” or
“there exists”
• Example: ∃x, x² < 0 (there exists an x such that x² is less
than zero)
• Typically, ∧ is the main connective with ∃:
∃x, at(x,Villanova) ∧ smart(x)
means “There is someone who is at Villanova and is smart”
• Common mistake: using ⇒ as the main connective with ∃:
∃x, at(x,Villanova) ⇒ smart(x)
This is true if there is someone at Villanova who is smart...
...but it is also true if there is someone who is not at Villanova
By the rules of material implication, the result of F ⇒ T is T

79
Properties of quantifiers
• ∀x ∀y is the same as ∀y ∀x
• ∃x ∃y is the same as ∃y ∃x

• ∃x ∀y is not the same as ∀y ∃x


• ∃x ∀y Loves(x,y)
• “There is a person who loves everyone in the world”
• More exactly: ∃x ∀y (person(x) ∧ person(y) ⇒ Loves(x,y))
• ∀y ∃x Loves(x,y)
• “Everyone in the world is loved by at least one person”

• Quantifier duality: each can be expressed using the other


• ∀x Likes(x,IceCream) ≡ ¬∃x ¬Likes(x,IceCream)
• ∃x Likes(x,Broccoli) ≡ ¬∀x ¬Likes(x,Broccoli)

From aima.eecs.berkeley.edu/slides-ppt, chs 7-9


80
Parentheses
• Parentheses are often used with quantifiers
• Unfortunately, everyone uses them differently, so don’t be
upset at any usage you see
• Examples:
• (∀x) person(x) ⇒ likes(x,iceCream)
• (∀x) (person(x) ⇒ likes(x,iceCream))
• (∀x) [ person(x) ⇒ likes(x,iceCream) ]
• ∀x, person(x) ⇒ likes(x,iceCream)
• ∀x (person(x) ⇒ likes(x,iceCream))
• I prefer parentheses that show the scope of the quantifier
• ∃x (x > 0) ∧ ∃x (x < 0)

81
More rules
• Now there are numerous additional rules we can apply!
• Here are two exceptionally important rules:
• ¬∀x, p(x) ⇒ ∃x, ¬p(x)
“If not every x satisfies p(x), then there exists an x that does
not satisfy p(x)”
• ¬∃x, p(x) ⇒ ∀x, ¬p(x)
“If there does not exist an x that satisfies p(x), then no x
satisfies p(x)”
• Even with these additional rules, the search space for theorem
proving is just too large to be feasible
• This was the case until 1965, when J. A. Robinson
introduced resolution
82
Interlude: Definitions
• syntax: defines the formal structure of sentences
• semantics: determines the truth of sentences wrt (with respect to)
models
• entailment: one statement entails another if the truth of the first
means that the second must also be true
• inference: deriving sentences from other sentences
• soundness: derivations produce only entailed sentences
• completeness: derivations can produce all entailed sentences

83
Resolution
Logic by computer was infeasible
• Why is logic so hard?
• You start with a large collection of facts (predicates)
• You start with a large collection of possible transformations
(rules)
• Some of these rules apply to a single fact to yield a new fact
• Some of these rules apply to a pair of facts to yield a new fact
• So at every step you must:
• Choose some rule to apply
• Choose one or two facts to which you might be able to apply the rule
• If there are n facts
• There are n potential ways to apply a single-operand rule
• There are n * (n - 1) potential ways to apply a two-operand rule
• Add the new fact to your ever-expanding fact base
• The search space is huge!
85
The magic of resolution
• Here’s how resolution works:
• You transform each of your facts into a particular form,
called a clause (this is the tricky part)
• You apply a single rule, the resolution principle, to a pair of
clauses
• Clauses are closed with respect to resolution--that is, when you
resolve two clauses, you get a new clause
• You add the new clause to your fact base
• So the number of facts you have grows linearly
• You still have to choose a pair of facts to resolve
• You never have to choose a rule, because there’s only one

86
The fact base
• A fact base is a collection of “facts,” expressed in predicate
calculus, that are presumed to be true (valid)
• These facts are implicitly “anded” together
• Example fact base:
• seafood(X) ⇒ likes(John, X) (where X is a variable)
• seafood(shrimp)
• pasta(X) ⇒ ¬likes(Mary, X) (where X is a different variable)
• pasta(spaghetti)
• That is,
• (seafood(X) ⇒ likes(John, X)) ∧ seafood(shrimp) ∧
(pasta(Y) ⇒ ¬likes(Mary, Y)) ∧ pasta(spaghetti)
• Notice that we had to change some Xs to Ys
• The scope of a variable is the single fact in which it occurs

87
Clause form
• A clause is a disjunction ("or") of zero or more literals, some or all of
which may be negated
• Example:
sinks(X) ∨ dissolves(X, water) ∨ ¬denser(X, water)
• Notice that clauses use only “or” and “not”—they do not use “and,”
“implies,” or either of the quantifiers “for all” or “there exists”
• The impressive part is that any predicate calculus expression can be
put into clause form
• Existential quantifiers, ∃, are the trickiest ones

88
Unification
• From the pair of facts (not yet clauses, just facts):
• seafood(X) ⇒ likes(John, X) (where X is a variable)
• seafood(shrimp)
• We ought to be able to conclude
• likes(John, shrimp)
• We can do this by unifying the variable X with the constant shrimp
• This is the same “unification” as is done in Prolog
• This unification turns seafood(X) ⇒ likes(John, X) into seafood(shrimp) ⇒
likes(John, shrimp)
• Together with the given fact seafood(shrimp), the final deductive step is easy
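To make the idea concrete, here is a minimal, illustrative Python sketch of unification over simple terms (tuples for compound terms, capitalized strings for variables, in the Prolog style used above). It is a toy under those assumptions: it omits the occurs check and is not the algorithm of any particular Prolog system.

def is_var(t):
    # Convention for this sketch (borrowed from Prolog): variables are capitalized strings, e.g. "X".
    return isinstance(t, str) and t[:1].isupper()

def unify(t1, t2, subst=None):
    """Return a substitution (dict) that unifies t1 and t2, or None if none exists.
    Terms are constants ('shrimp'), variables ('X'), or tuples like ('seafood', 'X')."""
    if subst is None:
        subst = {}
    if t1 == t2:
        return subst
    if is_var(t1):
        return unify(subst[t1], t2, subst) if t1 in subst else {**subst, t1: t2}
    if is_var(t2):
        return unify(t2, t1, subst)
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None

# Unifying seafood(X) with seafood(shrimp) binds X to shrimp.
print(unify(("seafood", "X"), ("seafood", "shrimp")))   # {'X': 'shrimp'}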

89
The resolution principle
• Here it is:
• From X ∨ someLiterals
and ¬X ∨ someOtherLiterals
----------------------------------------------
conclude: someLiterals ∨ someOtherLiterals
• That’s all there is to it!
• Example:
• broke(Bob) ∨ well-fed(Bob)
¬broke(Bob) ∨ ¬hungry(Bob)
--------------------------------------
well-fed(Bob) ∨ ¬hungry(Bob)
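As a hedged sketch, the resolution principle for ground (propositional) clauses can be written in a few lines of Python, with a clause as a frozenset of string literals and "~" marking negation; the names are illustrative only.

def negate(lit):
    # "~p" <-> "p"
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """Return all resolvents of two clauses, one per clashing literal."""
    out = set()
    for lit in c1:
        if negate(lit) in c2:
            out.add(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return out

# broke(Bob) v well-fed(Bob)   and   ~broke(Bob) v ~hungry(Bob)
print(resolve(frozenset({"broke", "wellFed"}), frozenset({"~broke", "~hungry"})))
# -> {frozenset({'wellFed', '~hungry'})}

Note that the function resolves on one clashing literal at a time, which matches the point made under "A common error" below: a pair of clauses with two clashing literals yields two separate (tautological) resolvents, never both cancellations at once.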

90
A common error
• You can only do one resolution at a time
• Example:
• broke(Bob) ∨ well-fed(Bob) ∨ happy(Bob)
¬broke(Bob) ∨ ¬hungry(Bob) ∨ ¬happy(Bob)
• You can resolve on broke to get:
• well-fed(Bob) ∨ happy(Bob) ∨ ¬hungry(Bob) ∨ ¬happy(Bob) ≡ T
• Or you can resolve on happy to get:
• broke(Bob) ∨ well-fed(Bob) ∨ ¬broke(Bob) ∨ ¬hungry(Bob) ≡ T
• Note that both legal resolutions yield a tautology (a trivially true
statement, containing X ∨ ¬X), which is correct but useless
• But you cannot resolve on both at once to get:
• well-fed(Bob) ∨ ¬hungry(Bob)

91
Contradiction
• A special case occurs when the result of a resolution (the resolvent) is
empty, or “NIL”
• Example:
• hungry(Bob)
¬hungry(Bob)
----------------
NIL
• In this case, the fact base is inconsistent
• This will turn out to be a very useful observation in doing resolution
theorem proving

92
A first example
• “Everywhere that John goes, Rover goes. John is at school.”
• at(John, X) ⇒ at(Rover, X) (not yet in clause form)
• at(John, school) (already in clause form)
• We use implication elimination to change the first of these into clause
form:
• ¬at(John, X) ∨ at(Rover, X)
• at(John, school)
• We can resolve these on at(-, -), but to do so we have to unify X
with school; this gives:
• at(Rover, school)

93
Refutation resolution
• The previous example was easy because it had very few clauses
• When we have a lot of clauses, we want to focus our search on the
thing we would like to prove
• We can do this as follows:
• Assume that our fact base is consistent (we can’t derive NIL)
• Add the negation of the thing we want to prove to the fact base
• Show that the fact base is now inconsistent
• Conclude the thing we want to prove

94
Example of refutation resolution
• “Everywhere that John goes, Rover goes. John is at school. Prove that Rover is
at school.”
1. ¬at(John, X) ∨ at(Rover, X)
2. at(John, school)
3. ¬at(Rover, school) (this is the added clause)
• Resolve #1 and #3:
4. ¬at(John, X)
• Resolve #2 and #4:
5. NIL
• Conclude the negation of the added clause: at(Rover, school)
• This seems a roundabout approach for such a simple example, but it works well
for larger problems

95
A second example
• Start with:
• it_is_raining ∨ it_is_sunny
• it_is_sunny ⇒ I_stay_dry
• it_is_raining ⇒ I_take_umbrella
• I_take_umbrella ⇒ I_stay_dry
• Convert to clause form:
1. it_is_raining ∨ it_is_sunny
2. ¬it_is_sunny ∨ I_stay_dry
3. ¬it_is_raining ∨ I_take_umbrella
4. ¬I_take_umbrella ∨ I_stay_dry
• Prove that I stay dry; add the negated goal:
5. ¬I_stay_dry
• Proof:
6. (5, 2) ¬it_is_sunny
7. (6, 1) it_is_raining
8. (5, 4) ¬I_take_umbrella
9. (8, 3) ¬it_is_raining
10. (9, 7) NIL
▪ Therefore, ¬(¬I_stay_dry)
▪ I_stay_dry
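The whole refutation can be mechanized. Below is a small, self-contained Python sketch (propositional only, naive saturation) that reproduces the umbrella proof; the literal names are shortened forms of the ones above and are assumptions of this sketch, not a standard library.

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolvents(c1, c2):
    out = set()
    for lit in c1:
        if negate(lit) in c2:
            out.add(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return out

def refute(clauses):
    """Resolve pairs until the empty clause (NIL) appears or nothing new is produced."""
    clauses = set(clauses)
    while True:
        new = set()
        for a in clauses:
            for b in clauses:
                if a == b:
                    continue
                for r in resolvents(a, b):
                    if not r:            # empty clause: contradiction found
                        return True
                    new.add(r)
        if new <= clauses:               # no progress: cannot derive NIL
            return False
        clauses |= new

kb = [frozenset({"rain", "sunny"}),        # it_is_raining v it_is_sunny
      frozenset({"~sunny", "dry"}),        # ~it_is_sunny v I_stay_dry
      frozenset({"~rain", "umbrella"}),    # ~it_is_raining v I_take_umbrella
      frozenset({"~umbrella", "dry"})]     # ~I_take_umbrella v I_stay_dry
print(refute(kb + [frozenset({"~dry"})]))  # True: adding ~I_stay_dry yields NIL, so I_stay_dry holds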

96
Conversion to clause form
A nine-step process

Reference: Artificial Intelligence, by Elaine Rich and Kevin Knight


Running example
• All Romans who know Marcus either hate Caesar or think that anyone
who hates anyone is crazy

• ∀x, [ Roman(x) ∧ know(x, Marcus) ] ⇒
[ hate(x, Caesar) ∨ (∀y, (∃z, hate(y, z)) ⇒ thinkCrazy(x, y)) ]

98
Step 1: Eliminate implications
• Use the fact that x ⇒ y is equivalent to ¬x ∨ y

• ∀x, [ Roman(x) ∧ know(x, Marcus) ] ⇒
[ hate(x, Caesar) ∨ (∀y, (∃z, hate(y, z)) ⇒ thinkCrazy(x, y)) ]

• ∀x, ¬[ Roman(x) ∧ know(x, Marcus) ] ∨
[ hate(x, Caesar) ∨ (∀y, ¬(∃z, hate(y, z)) ∨ thinkCrazy(x, y)) ]

99
Step 2: Reduce the scope of ¬
• Reduce the scope of negation to a single term, using:
• ¬(¬p) ≡ p
• ¬(a ∧ b) ≡ (¬a ∨ ¬b)
• ¬(a ∨ b) ≡ (¬a ∧ ¬b)
• ¬∀x, p(x) ≡ ∃x, ¬p(x)
• ¬∃x, p(x) ≡ ∀x, ¬p(x)

• ∀x, ¬[ Roman(x) ∧ know(x, Marcus) ] ∨
[ hate(x, Caesar) ∨ (∀y, ¬(∃z, hate(y, z)) ∨ thinkCrazy(x, y)) ]

• ∀x, [ ¬Roman(x) ∨ ¬know(x, Marcus) ] ∨
[ hate(x, Caesar) ∨ (∀y, ∀z, ¬hate(y, z) ∨ thinkCrazy(x, y)) ]

100
Step 3: Standardize variables apart
• ∀x, P(x) ∨ ∀x, Q(x)
becomes
∀x, P(x) ∨ ∀y, Q(y)
• This is just to keep the scopes of variables from getting confused
• Not necessary in our running example

101
Step 4: Move quantifiers
• Move all quantifiers to the left, without changing their relative
positions

• ∀x, [ ¬Roman(x) ∨ ¬know(x, Marcus) ] ∨
[ hate(x, Caesar) ∨ (∀y, ∀z, ¬hate(y, z) ∨ thinkCrazy(x, y)) ]

• ∀x, ∀y, ∀z, [ ¬Roman(x) ∨ ¬know(x, Marcus) ] ∨
[ hate(x, Caesar) ∨ (¬hate(y, z) ∨ thinkCrazy(x, y)) ]

102
Step 5: Eliminate existential quantifiers

• We do this by introducing Skolem functions:


• If ∃x, p(x) then just pick one; call it x’
• If the existential quantifier is under control of a universal quantifier, then the
picked value has to be a function of the universally quantified variable:
• If ∀x, ∃y, p(x, y) then ∀x, p(x, y(x))
• Not necessary in our running example

103
Step 6: Drop the prefix (quantifiers)
• ∀x, ∀y, ∀z,[ ¬Roman(x) ∨ ¬know(x, Marcus) ] ∨
[hate(x, Caesar) ∨ (¬hate(y, z) ∨ thinkCrazy(x, y))]
• At this point, all the quantifiers are universal quantifiers
• We can just take it for granted that all variables are universally
quantified
• [ ¬Roman(x) ∨ ¬know(x, Marcus) ] ∨
[hate(x, Caesar) ∨ (¬hate(y, z) ∨ thinkCrazy(x, y))]

104
Step 7: Create a conjunction of disjuncts

• [ ¬Roman(x) ∨ ¬know(x, Marcus) ] ∨
[ hate(x, Caesar) ∨ (¬hate(y, z) ∨ thinkCrazy(x, y)) ]

becomes

¬Roman(x) ∨ ¬know(x, Marcus) ∨ hate(x, Caesar) ∨ ¬hate(y, z) ∨ thinkCrazy(x, y)

105
Step 8: Create separate clauses
• Every place we have an ∧, we break our expression up into separate
pieces
• Not necessary in our running example

106
Step 9: Standardize apart
• Rename variables so that no two clauses have the same variable
• Not necessary in our running example

• Final result:
¬Roman(x) ∨ ¬know(x, Marcus) ∨
hate(x, Caesar) ∨ ¬hate(y, z) ∨ thinkCrazy(x, y)

• That’s it! It’s a long process, but easy enough to do mechanically

107
Other Logics
A very brief glimpse
Situation Calculus
• Every mutable predicate includes a situation "timestamp":
weather(sunny, s)
• Actions are described by effect axioms:
forall thing(x), place(p),
(i_am_at(p, s) & at(x, p, s)
=> holding(x, Result(take(x), s)) )
• Things that don't change are described by frame axioms:
forall thing(x), action(a), (holding(x, s) & a ~= drop =>
holding(x, Result(a, s)) )

109
Second Order Predicate Calculus
• A relation E is an equivalence relation iff
• E is reflexive: for all x, E(x, x)
• reflexive(E) => for all x, E(x, x)
• E is symmetric: for all x, y, E(x, y) => E(y, x)
• symmetric(E) => for all x, y, E(x, y) => E(y, x)
• E is transitive: for all x, y, z, E(x, y) & E(y, z) => E(x, z)
• transitive(E) => for all x, y, z, E(x, y) & E(y, z) => E(x, z)
• for all E, equivalence(E) => reflexive(E)
& symmetric(E) & transitive(E)
• But you can't say this in FOPC, because it quantifies over the relation E

110
Three-valued logics:
• Kleene: Values are True, False, and Unknown

• Łukasiewicz: Values are True, False, and Indeterminate

• Bochvar: Values are True, False, and Meaningless

111
Modal Logic
• Along with models, modal logic introduces the idea of “possible
worlds”
• New operators:
• “It is possible that”(x)
• “It is necessary that”(x)
• New rules of inference for these operators

112
Bayesian Statistics
• Probabilities are between 0 and 1
• P(a /\ b) = P(a) x P(b)   (when a and b are independent)
• P(~a) = 1 - P(a)
• P(a \/ b) = 1 - P(~a /\ ~b)

113
Fuzzy Logic
• Truth values are between 0 and 1
• T(a /\ b) = min(T(a), T(b))
• T(a \/ b) = max(T(a), T(b))
• T(~a) = 1 - T(a)
• T(a \/ ~a) != 1
• Exact values don’t seem to matter much
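As a tiny illustrative sketch of these connectives (the 0.7 degree of truth is just an assumed example value):

def f_and(a, b): return min(a, b)    # T(a /\ b)
def f_or(a, b):  return max(a, b)    # T(a \/ b)
def f_not(a):    return 1 - a        # T(~a)

t_tall = 0.7                         # assumed degree of truth of "tall"
print(f_or(t_tall, f_not(t_tall)))   # 0.7, not 1: T(a \/ ~a) != 1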

114
Tense Logic
• P = it has at some time been the case that
• F = it will at some time be the case that
• H = it has always been the case that
• G = it will always be the case that
• Pp = ~H ~ p
• Fp = ~ G ~ p

115
Situation Calculus
• Time is discrete
• All statements are tagged with a time, or “situation”
• Result(action, situation_n) = situation_(n+1)
• Effect axioms: how actions change the world
• Frame axioms: how the world stays the same
• Successor-state axioms: how predicates can become
true or false

116
Defeasible Logics
• All birds fly
• except penguins
• other than Poppy, who runs an airline
• when the plane isn’t undergoing repairs

• One solution: “circumscription”


• Circumscription is very complex

117
Types of Logic
• From: A Bibliography of Non-Standard Logics
• http://www.earlham.edu/~peters/courses/logsys/nonstbib.htm#higher

Categorical logic Fuzzy logic Paraconsistent logic


Combinatory logic Higher-order logic Partial logic
Conditional logic Infinitary logic Prohairetic logic
Constructive logic Intensional logic Quantum logic
Cumulative logic Intuitionistic logic Relevant logic
Deontic logic Linear logic Stoic logic
Dynamic logic Many-sorted logic Substance logic
Epistemic logic Many-valued logic Substructural logic
Erotetic logic Modal logic Temporal (tense) logic
Free logic Non-monotonic logic Other logics
118
Standardisation of Semantic Networks
Standardisation of Network Relationships
Semantic network developed by Collins and
Quillian in their research on human
information storage and response times
(Harmon and King, 1985)
Standardisation of Network
Relationships
Semantic Network representation of
properties of snow and ice

E.g. What is common about ice and snow?


Exercises
• Try to represent the following two sentences into the appropriate
semantic network diagram:

• isa(person, mammal)
• instance(Mike-Hall, person) all in one graph
• team(Mike-Hall, Cardiff)

• score(Cardiff, Llanelli, 23-6)

• John gave Mary the book


Solution 1
• isa(person, mammal), instance(Mike-Hall, person), team(Mike-Hall, Cardiff)
Mike Hall --is_a--> person --is_a--> mammal
person --has_part--> head
Mike Hall --team--> Cardiff


Solution 2
• score(Spurs, Norwich, 3-1)

Fixture 5 --Is_a--> Game
Fixture 5 --Home_team--> Norwich
Fixture 5 --Away_team--> Spurs
Fixture 5 --Score--> 3-1
Solution 3
• John gave Mary the book

Event 1 --Action--> Gave
Event 1 --Agent--> John
Event 1 --Object--> Book_69 --Instance--> Book
Event 1 --Patient--> Mary
Advantages of Semantic Networks
• Easy to visualise and understand.
• The knowledge engineer can arbitrarily define the relationships.
• Related knowledge is easily categorised.
• Efficient in space requirements.
• Node objects are represented only once.
•…
• Standard definitions of semantic networks have been developed.
Limitations of Semantic Networks
• The limitations of conventional semantic networks were studied
extensively by a number of workers in AI.

• Many believe that the basic notion is a powerful one and has to be
complemented by, for example, logic to improve the notion’s
expressive power and robustness.

• Others believe that the notion of semantic networks can be improved


by incorporating reasoning used to describe events.
Limitations of Semantic Networks
• Binary relations are usually easy to represent, but relations involving
more than two arguments are sometimes difficult.
• E.g. try to represent the sentence:
• "John caused trouble to the party".

cause --who--> John
cause --what--> trouble
cause --where--> party
Limitations of Semantic Networks
• Other problematic statements. . .
• negation "John does not go fishing";
• disjunction "John eats pizza or fish and chips";
•…
• Quantified statements are very hard for semantic nets. E.g.:
• "Every dog has bitten a postman"
• "Every dog has bitten every postman"
• Solution: Partitioned semantic networks can represent quantified
statements.
Partitioned Semantic Networks
• Hendrix (1976 : 21-49, 1979 : 51-91) developed the so-called
partitioned semantic network to represent the difference between
the description of an individual object or process and the description
of a set of objects. The set description involves quantification.

• Hendrix partitioned semantic networks so that, loosely speaking, a
network can be divided into one or more sub-networks, each describing
an individual.
Partitioned Semantic Networks
• The central idea of partitioning is to allow groups, nodes and arcs to
be bundled together into units called spaces – fundamental entities
in partitioned networks, on the same level as nodes and arcs (Hendrix
1979:59).

• Every node and every arc of a network belongs to (or lies in/on) one
or more spaces.

• Some spaces are used to encode 'background information' or generic
relations; others, called 'scratch' spaces, are used to deal with specifics.
Graph of “Mary gave John the book.”

134
Luger: Artificial Intelligence, 6th edition. © Pearson Education Limited, 2009
Conceptual graph indicating that the dog named Emma is brown.

Conceptual graph indicating that a particular (but unnamed) dog is brown.

Conceptual graph indicating that a dog named Emma is brown.

135
Luger: Artificial Intelligence, 6th edition. © Pearson Education Limited, 2009
Conceptual graph of a person with three names.

136
Luger: Artificial Intelligence, 6th edition. © Pearson Education Limited, 2009
Conceptual graph of the sentence “The dog scratches its ear with its paw.”

137
Luger: Artificial Intelligence, 6th edition. © Pearson Education Limited, 2009
Conceptual graph of the statement “Tom believes that Jane likes pizza,” showing
the use of a propositional concept.

138
Luger: Artificial Intelligence, 6th edition. © Pearson Education Limited, 2009
Conceptual graph of the proposition “There are no pink dogs.”

139
Luger: Artificial Intelligence, 6th edition. © Pearson Education Limited, 2009
Partitioned Semantic Networks
• Suppose that we wish to make a specific statement about a dog,
Danny, who has bitten a postman, Peter:
• " Danny the dog bit Peter the postman"

• Hendrix's partitioned network would express this statement as an
ordinary semantic network, all of it lying in a single space S1:

Danny --is_a--> dog
B --is_a--> bite
Peter --is_a--> postman
B --agent--> Danny
B --patient--> Peter
Partitioned Semantic Networks
• Suppose that we now want to look at the statement:
• "Every dog has bitten a postman"
• Hendrix's partitioned semantic network now comprises two partitions, SA and S1. Node G is an
instance of the special class of General Statements about the world, comprising a statement
link, a form link, and one universal quantifier:

SA:  G --is_a--> General Statement,  G --form--> S1,  G --∀--> D
S1:  D --is_a--> dog,  B --is_a--> bite,  P --is_a--> postman,  B --agent--> D,  B --patient--> P
Partitioned Semantic Networks
• Suppose that we now want to look at the statement:
• "Every dog has bitten every postman"

SA:  G --is_a--> General Statement,  G --form--> S1,  G --∀--> D,  G --∀--> P
S1:  D --is_a--> dog,  B --is_a--> bite,  P --is_a--> postman,  B --agent--> D,  B --patient--> P
Partitioned Semantic Networks
• Suppose that we now want to look at the statement:
• "Every dog in town has bitten the postman"
SA:  town dog --ako--> dog
     G --is_a--> General Statement,  G --form--> S1,  G --∀--> D
S1:  D --is_a--> town dog,  B --is_a--> bite,  P --is_a--> postman,  B --agent--> D,  B --patient--> P

NB: 'ako' = 'A Kind Of'


Partitioned Semantic Networks
• The partitioning of a semantic network renders them more
• logically adequate, in that one can distinguish between individuals and sets
of individuals,
• and indirectly more heuristically adequate by way of controlling the search
space by delineating semantic networks.

• Hendrix's partitioned semantic network formalism has been used in
building natural-language front ends for databases and in programs
that deduce information from databases.
Exercises
• Try to represent the following two sentences into the appropriate
semantic network diagram:

• "John believes that pizza is tasty"

• "Every student loves to party"


Solution 1: "John believes that pizza is tasty"
event1 --is_a--> believes
event1 --agent--> John
event1 --object--> space1

space1 (the embedded proposition):
  object1 --is_a--> pizza
  object1 --has_property--> property1
  property1 --is_a--> tasty


Solution 2: "Every student loves to party"
GS1 --is_a--> General Statement,  GS1 --form--> S1,  GS1 --∀--> s1
GS2 --is_a--> General Statement,  GS2 --form--> S2,  GS2 --exists--> p1

S1 / S2:  s1 --is_a--> student,  p1 --is_a--> party,  l1 --is_a--> love
          l1 --agent--> s1,  l1 --receiver--> p1
Frames
Scripts
Conceptual dependency theory of four primitive conceptualizations

154
155
156
Conceptual dependency representing “John ate the egg”
(Schank and Rieger 1974).

Conceptual dependency representation of the sentence “John prevented Mary from


giving a book to Bill” (Schank and Rieger 1974).

157
Elements of a planning problem

• When is a problem a good application for planning?


• It can be decomposed into subproblems in such a way
that solving each subproblem (in some order) solves
the whole problem.
• It is conjunctive, that is, it can be expressed as a
conjunction of conditions, each representing an
elementary aspect of the world, in the initial situation and
in the final situation.
• Each subproblem can be reduced to facts about domain
objects, and relations among such objects.
Elements of a planning problem (2)
• States, state spaces.
• Actions with preconditions and postconditions.
• An action is a function from states to states. It makes local
changes that affect few objects.
• Actions are generic, facts are specific.
• Actions are elementary: one-step plans.
• A design decision: granularity of actions.
• Conditional actions: If( C, A1, A2 ).
• Frame axioms and the Closed World Assumption.
• The timing of actions: is a real-time plan needed, or do
we only need a sequence of actions with relative time?
Elements of a planning problem (3)
• A plan is a sequence of actions that lead from an initial state
to a goal state.
• In a linear plan, the actions required to solve a subproblem
precede the actions required to solve the next subproblem.
• In a non-linear plan, actions for subproblems may be (or
even may have to be) interleaved.
• Optimality of plans: a simple criterion is the number of
actions.
• A plan may be executed either “in batch mode” (after the
whole plan has been created), or with every action
performed immediately (with feedback for more planning).
Elements of a planning problem (4)
• A formal representation of plans may look, for
example, like this:
• Do( Action3, Do( Action2, Do( Action1, init ) ) )
• where init denotes an initial state.
• There are two ways of implementing state
changes (we must be able to undo such
changes, because planning algorithms usually
backtrack).

• Explicit changes in the knowledge base.


• Implicit changes, not in the knowledge base but
only in the plan representation, recomputed as
needed (this is costly, but undoing is easier).
Planning as resolution
• The blocks world (find its description in any textbook ☺)
• T( p, s ) means "predicate p is true in state s"
• Elementary conditions: On, OnTbl and Clear.
• Examples of actions: Stack, Unstack and NoOp.
• T( OnTbl( x ), s ) ∧
T( Clear( x ), s ) ∧ T( Clear( y ), s ) ∧ x ≠ y ⇒
T( On( x, y ), Do( Stack( x, y ), s ) )
• T( On( x, y ), s ) ∧ T( Clear( x ), s ) ⇒
T( OnTbl( x ), Do( Unstack( x, y ), s ) ) ∧
T( Clear( y ), Do( Unstack( x, y ), s ) )
• T( p, s ) ⇒ T( p, Do( NoOp, s ) )
Planning as resolution (2)
• On means “sitting on another block”.
• OnTbl means “sitting on the table”.
• Some facts about this worlds are always true.
For example:
• T( OnTbl( x ), s ) ⇔ ¬ ∃ y T( On( x, y ), s) [=]
• Frame axioms — two examples:
• T( Clear( u ), s ) ∧ u ≠ y ⇒
T( Clear( u ), Do( Stack( x, y ), s ) )
• T( On( u, w ), s ) ∧ u ≠ x ⇒
T( On( u, w ), Do( Unstack( x, y ), s ) )
Planning as resolution (3)
• An initial state
• T( Clear( C ), S0 ) T( On( C, A ), S0 )
• T( Clear( B ), S0 )
• T( OnTbl( A ), S0 ) T( OnTbl( B ), S0 )
• Another (possibly final) state
• T( Clear( A ), S1 ) T( On( A, B ), S1 )
• T( On( B, C ), S1 ) T( OnTbl( C ), S1 )

S0:  C is on A;  A and B are on the table.
S1:  A is on B, B is on C;  C is on the table.
Planning as resolution (4): plan construction by resolution (a very simple example)

• We want to prove that T( OnTbl( A ), t ) given S1.


• We assume that we can do it in one action α, so
that state t should look like Do( α, S1 ).
• If this is not possible, we will try Do( β, Do( α, S1 ) ),
Do( γ, Do( β, Do( α, S1 ) ) ), and so on.
• Let’s negate the fact that we want proven.
• ¬ T( OnTbl( A ), Do( α, S1 ) )
• Action α could be Unstack. That is because one of
the clauses in its clausal-form definition is this:
• ¬ T( On( x, y ), s ), ¬ T( Clear( x ), s ),
T( OnTbl( x ), Do( Unstack( x, y ), s ) )
Planning as resolution (5): plan construction by resolution (continued)
• ¬ T( OnTbl( A ), Do( α, S1 ) )
• T( OnTbl( x ), Do( Unstack( x, y ), s ) ),
¬ T( On( x, y ), s ), ¬ T( Clear( x ), s )
• This gives
• ¬ T( On( A, y ), S1 ), ¬ T( Clear( A ), S1 )
• after assigning x ← A, s ← S1, α ← Unstack( A, y ).
• This is given: T( On( A, B ), S1 ).
• We resolve and get ¬ T( Clear( A ), S1 ) with y ← B.
• This too is given: T( Clear( A ), S1 ).
• so we produce the empty resolvent: done.
• t ← Do( α, S1 ) = Do( Unstack( A, B ), S1 )
Conditional plans
• Let T( Clear( A ), S2 ) be the only given fact.
• We can find a plan for the goal
• T( OnTbl( A ), Do( γ, S2 ) )
• even though the problem is
under-constrained.
• Axioms for conditional actions
• T( p, s ) ∧ T( q, Do( α, s ) ) ⇒
T( q, Do( If( p, α, β ), s ) )
• ¬ T( p, s ) ∧ T( q, Do( β, s ) ) ⇒
T( q, Do( If( p, α, β ), s ) )
• Again, we begin by assuming a one-step
plan, but now it will be a conditional plan.
Conditional plans (2)
• ¬ T( OnTbl( A ), Do( γ, S2 ) )
• Assume γ ← If( p, α, β ).
• ¬ T( OnTbl( A ), Do( If( p, α, β ), S2 ) )
• The first axiom as a clause: [+]
• T( q, Do( If( p, α, β ), s ) ),
¬ T( p, s ), ¬ T( q, Do( α, s ) )
• Resolve with q ← OnTbl( A ), s ← S2 and
get
• ¬ T( p, S2 ), ¬ T( OnTbl( A ), Do( α, S2 ) )
• We need again that axiom for Unstack:
• T( OnTbl( x ), Do( Unstack( x, y ), s ) ),
¬ T( On( x, y ), s ), ¬ T( Clear( x ), s )
Conditional plans (3)
• ¬ T( p, S2 ), ¬ T( OnTbl( A ), Do( α, S2 ) )
• T( OnTbl( x ), Do( Unstack( x, y ), s ) ),
¬ T( On( x, y ), s ), ¬ T( Clear( x ), s )
• With x ← A, α ← Unstack( A, y ), s ← S2 we get
• ¬ T( p, S2 ), ¬ T( On( A, y ), S2 ), ¬ T( Clear( A ), S2 )
• We resolve this with our only fact:
• ¬ T( p, S2 ), ¬ T( On( A, y ), S2 )
• ... and this is it. We need something new to move on.
• One possibility is factoring: look for matching literals
on the resolvent. Here we can assign p ← On( A, y )
and reduce the resolvent to ¬ T( On( A, y ), S2 ).
Conditional plans (4)
• We can conclude the proof if we assume
• T( On( A, y ), S2 )
• Here, the conditional nature of the plan helps.
We combine everything we have found so far.
Our one-action plan looks like this:
• Do( If( On( A, y ), Unstack( A, y ), β ), S2 )
• To also determine β, we return to the point marked [+] and look
at the second axiom for conditional actions:
• T( q, Do( If( p, α, β ), s ) ), T( p, s ), ¬ T( q, Do( β, s ) )
• ¬ T( OnTbl( A ), Do( γ, S2 ) )
• Resolve with q ← OnTbl( A ), s ← S2 and get
• T( p, S2 ), ¬ T( OnTbl( A ), Do( β, S2 ) )
Conditional plans (5)
• T( p, S2 ), ¬ T( OnTbl( A ), Do( β, S2 ) )
• The clausal form of action NoOp:
• ¬ T( p, s ), T( p, Do( NoOp, s ) )
• Assign p ← OnTbl( A ), β ← NoOp and get this:
• T( p, S2 ), ¬ T( OnTbl( A ), S2 )
• It is time to recall that useful fact [=] about OnTbl:
• T( OnTbl( x ), s ) ⇔ ¬ ∃ y T( On( x, y ), s)
• So, we can replace one side with the other:
• T( p, S2 ), ∃ y T( On( A, y ), S2)
• Skolemize (σ is the Skolem constant):
• T( p, S2 ), T( On( A, σ ), S2)
Conditional plans (6)
• T( p, S2 ), T( On( A, σ ), S2)
• We perform factorization, assigning p ← On( A, σ ):
• T( On( A, σ ), S2)
• To conclude the second part of the proof, we assume
• ¬ T( On( A, σ ), S2 )
• Now, our one-action plan is either of these:
• Do( If( On( A, y ), Unstack( A, y ), β ), S2 )
• Do( If( On( A, σ ), α, NoOp ), S2 )
• We can finally put these partial plans together:
• Do( If( On( A, σ ), Unstack( A, σ ), NoOp ), S2 )
Another representation of actions
• Preconditions of an action +
• facts added (made true) by this action +
• facts deleted (made false) by this action.
• Pre( Unstack( x, y ) ) = { On( x, y ), Clear( x ) }
• Add( Unstack( x, y ) ) = { OnTbl( x ), Clear( y ) }
• Del( Unstack( x, y ) ) = { On( x, y ) }
• The Closed World Assumption serves as the frame axiom:
• facts not on the Add list and not on the Del list
are by default preserved.
• A state is represented as a set of facts, for example:
• { OnTbl( A ), OnTbl( B ) }
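A minimal Python sketch of this add/delete-list representation, applied to Unstack(x, y) with a state as a set of ground facts. The tuple-of-strings encoding and the helper names are just one illustrative choice, not a standard planner API.

def unstack(x, y):
    # Pre/Add/Del lists for Unstack(x, y), as described above.
    return {"pre": {("On", x, y), ("Clear", x)},
            "add": {("OnTbl", x), ("Clear", y)},
            "del": {("On", x, y)}}

def apply(state, action):
    """Apply an action if its preconditions hold; unlisted facts persist (closed world)."""
    if not action["pre"] <= state:
        raise ValueError("preconditions not satisfied")
    return (state - action["del"]) | action["add"]

s = {("On", "A", "B"), ("Clear", "A"), ("OnTbl", "B")}
print(apply(s, unstack("A", "B")))
# -> {('OnTbl', 'A'), ('Clear', 'A'), ('Clear', 'B'), ('OnTbl', 'B')}  (set order may vary)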
Goal regression
• Reduce one of the goals to its subgoal or subgoals.
• This means working backward from a state to a state
by looking at the preconditions and effects of some
action.
• Let Regr( q, α ) be a state achieved by regression
from q through α (“undoing” the effects of α).
• α cannot delete a fact that must be true in q:
• q ∩ Del( α ) = Ø
• If α has this property, we can define:
• Regr( q, α ) = Pre( α ) ∪ ( q − Add( α ) )
• So: keep the preconditions of α, undo the effects of
α.
Goal regression (2)
• Example
• If we begin with
• { OnTbl( A ), OnTbl( B ) }
• we can “recreate”
• { Clear( A ), On( A, B ), OnTbl( B ) }
• by regression through Unstack( A, B ).
• Protection violation
• Actions for a subproblem must not undo the
effects of earlier actions for already solved
subproblems.
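Under the same set-of-facts encoding used in the action sketch above, goal regression is only a few lines; the example below reproduces the Unstack(A, B) regression just described. It is a sketch with assumed helper names, not a complete planner.

def regress(goal, action):
    """Regr(q, alpha) = Pre(alpha) U (q - Add(alpha)), provided alpha deletes nothing in q."""
    if goal & action["del"]:
        return None                          # alpha would destroy part of the goal
    return action["pre"] | (goal - action["add"])

unstack_AB = {"pre": {("On", "A", "B"), ("Clear", "A")},
              "add": {("OnTbl", "A"), ("Clear", "B")},
              "del": {("On", "A", "B")}}

goal = {("OnTbl", "A"), ("OnTbl", "B")}
print(regress(goal, unstack_AB))
# -> {('Clear', 'A'), ('On', 'A', 'B'), ('OnTbl', 'B')}  (set order may vary)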
A planning strategy for conjunctive problems
• Going left to right, make a plan for each conjunct and freeze the
effects (put the added or preserved goals on a list of protected
goals).
• This simple method is weak. It will not work even for the
following trivial problem.
• S_initial:  On( C, A ) ∧ OnTbl( A ) ∧ OnTbl( B )
• S_final:  On( C, A ) ∧ On( A, B )
• On( C, A ) is true in S_initial and must be protected. But: it must be
temporarily destroyed to achieve On( A, B ). Goal protection
makes planning impossible.
• Here, rearrangement helps: On( A, B ) ∧ On( C, A ) “works”. In
general non-linear planning is necessary.
• One such smarter planner is Warplan.
Acting under uncertainty
• The epistemological commitment that propositions are true or false
can almost never be made.
• In practice, programs have to act under uncertainty:
• using a simple but incorrect theory of the world, which does not take into
account uncertainty and will work most of the time
• handling uncertain knowledge and utility (tradeoff between accuracy and
usefulness) in a rational way
• The right thing to do (the rational decision) depends on:
• the relative importance of various goals
• the likelihood that, and degree to which, they will be achieved
Handling uncertain knowledge
• Example of rule for dental diagnosis using first-order logic:
∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity)
• This rule is wrong and in order to make it true we have to add an almost
unlimited list of possible causes:
∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity) ∨ Disease(p, GumDisease) ∨
Disease(p, Abscess)…
• Trying to use first-order logic to cope with a domain like medical diagnosis
fails for three main reasons:
• Laziness. It is too much work to list the complete set of antecedents or consequents
needed to ensure an exceptionless rule and too hard to use such rules.
• Theoretical ignorance. Medical science has no complete theory for the domain.
• Practical ignorance. Even if we know all the rules, we might be uncertain about a
particular patient because not all the necessary tests have been or can be run.
Handling uncertain knowledge
• Actually, the connection between toothaches and cavities is just not
a logical consequence in any direction.
• In judgmental domains (medical, law, design...) the agent’s
knowledge can at best provide a degree of belief in the relevant
sentences.
• The main tool for dealing with degrees of belief is probability theory,
which assigns to each sentence a numerical degree of belief between
0 and 1.
Handling uncertain knowledge
• Probability provides a way of summarizing the uncertainty that
comes from our laziness and ignorance.
• Probability theory makes the same ontological commitment as logic:
• facts either do or do not hold in the world
• Degree of truth, as opposed to degree of belief, is the subject of fuzzy
logic.
Handling uncertain knowledge
• The belief could be derived from:
• statistical data
• 80% of the toothache patients have had cavities
• some general rules
• some combination of evidence sources
• Assigning a probability of 0 to a given sentence corresponds to an
unequivocal belief that the sentence is false.
• Assigning a probability of 1 corresponds to an unequivocal belief
that the sentence is true.
• Probabilities between 0 and 1 correspond to intermediate degrees
of belief in the truth of the sentence.
Handling uncertain knowledge
• The sentence itself is in fact either true or false.
• A degree of belief is different from a degree of truth.
• A probability of 0.8 does not mean “80% true”, but rather an 80%
degree of belief that something is true.
Handling uncertain knowledge
• In logic, a sentence such as “The patient has a cavity” is true or false.
• In probability theory, a sentence such as “The probability that the patient has
a cavity is 0.8” is about the agent’s belief, not directly about the world.
• These beliefs depend on the percepts that the agent has received to date.
• These percepts constitute the evidence on which probability assertions are
based
• For example:
• An agent draws a card from a shuffled pack.
• Before looking at the card, the agent might assign a probability of 1/52 to its being the
ace of spades.
• After looking at the card, an appropriate probability for the same proposition would be
0 or 1.
Handling uncertain knowledge
• An assignment of probability to a proposition is analogous to saying
whether a given logical sentence is entailed by the knowledge base,
rather than whether or not it is true.
• All probability statements must therefore indicate the evidence with
respect to which the probability is being assessed.
• When an agent receives new percepts/evidence, its probability
assessments are updated.
• Before the evidence is obtained, we talk about the prior or
unconditional probability.
• After the evidence is obtained, we talk about the posterior or conditional
probability.
Basic probability notation
• Propositions
• Degrees of belief are always applied to propositions, assertions that
such-and-such is the case.
• The basic element of the language used in probability theory is the random
variable, which can be thought of as referring to a “part” of the world whose
“status” is initially unknown.
• For example, Cavity might refer to whether my lower left wisdom tooth has
a cavity.
• Each random variable has a domain of values that it can take on.
Propositions
• As with CSP variables, random variables (RVs) are typically divided
into three kinds, depending on the type of the domain:
– Boolean RVs, such as Cavity, have the domain <true, false>.
– Discrete RVs, which include Boolean RVs as a special case, take on values
from a countable domain.
– Continuous RVs take on values from the real numbers.
Atomic events
• An atomic event (or sample point) is a complete specification of
the state of the world.
• It is an assignment of particular values to all the variables of
which the world is composed.
• Example:
• If the world consists of only the Boolean variables Cavity and
Toothache, then there are just four distinct atomic events.
• The proposition Cavity = false ∧ Toothache = true is one such event.
Axioms of probability
• For any propositions a, b
• 0 ≤ P(a) ≤ 1
• P(true) = 1 and P(false) = 0
• P(a ∨ b) = P(a) + P(b) - P(a ∧ b)
Prior probability
• The unconditional or prior probability associated with a proposition a is
the degree of belief accorded to it in the absence of any other information.
• It is written as P(a).
• Example:
– P(Cavity = true) = 0.1 or P(cavity) = 0.1
• It is important to remember that P(a) can be used only when there is no
other information.
• To talk about the probabilities of all the possible values of a RV:
• expressions such as P(Weather) are used, denoting a vector of values for the
probabilities of each individual state of the weather
Prior probability
– P(Weather) = <0.7, 0.2, 0.08, 0.02> (normalized, i.e., sums to 1)
• (Weather‘s domain is <sunny, rain, cloudy, snow>)
•This statement defines a prior probability distribution for the
random variable Weather.
•Expressions such as P(Weather, Cavity) are used to denote the
probabilities of all combinations of the values of a set of RVs.
•This is called the joint probability distribution of Weather and
Cavity.
Prior probability
• Joint probability distribution for a set of random variables gives the
probability of every atomic event with those random variables.
P(Weather, Cavity) = a 4 × 2 matrix of probability values:

                   sunny    rainy    cloudy   snow
Cavity = true      0.144    0.02     0.016    0.02
Cavity = false     0.576    0.08     0.064    0.08
• Every question about a domain can be answered by the joint
distribution.
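As an illustration, the table above can be stored directly and queried by summing its entries; a minimal Python sketch, with the probabilities copied from the table (the dict layout is just an assumed encoding):

# Joint distribution P(Weather, Cavity), keyed by (weather, cavity).
joint = {("sunny", True): 0.144, ("rainy", True): 0.02, ("cloudy", True): 0.016, ("snow", True): 0.02,
         ("sunny", False): 0.576, ("rainy", False): 0.08, ("cloudy", False): 0.064, ("snow", False): 0.08}

p_cavity = sum(p for (w, c), p in joint.items() if c)                  # marginal P(cavity)
p_sunny_or_cavity = sum(p for (w, c), p in joint.items() if w == "sunny" or c)
print(p_cavity, p_sunny_or_cavity)   # -> 0.2 and 0.776 (up to float rounding)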
Conditional probability
• Conditional or posterior probabilities:
e.g., P(cavity | toothache) = 0.8
i.e., given that toothache is all I know

• Notation for conditional distributions:


P(Cavity | Toothache) = 2-element vector of 2-element vectors

• If we know more, e.g., cavity is also given, then we have


P(cavity | toothache, cavity) = 1 (trivial)

• New evidence may be irrelevant, allowing simplification, e.g.,


P(cavity | toothache, sunny) = P(cavity | toothache) = 0.8

• This kind of inference, sanctioned by domain knowledge, is crucial.


Conditional probability
• Definition of conditional probability:
P(a | b) = P(a ∧ b) / P(b) if P(b) > 0

• Product rule gives an alternative formulation:


P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)

• A general version holds for whole distributions, e.g.,


P(Weather,Cavity) = P(Weather | Cavity) P(Cavity)
• (View as a set of 4 × 2 equations, not matrix multiplication)

• Chain rule is derived by successive application of product rule:


P(X1, ..., Xn) = P(X1, ..., Xn-1) P(Xn | X1, ..., Xn-1)
= P(X1, ..., Xn-2) P(Xn-1 | X1, ..., Xn-2) P(Xn | X1, ..., Xn-1)
= …
= ∏ (i = 1..n) P(Xi | X1, ..., Xi-1)
Inference by enumeration
• A simple method for probabilistic inference uses observed evidence
for computation of posterior probabilities.
• Start with the joint probability distribution:

• For any proposition φ, sum the atomic events where it is true: P(φ) =
Σω:ω╞φ P(ω)
Inference by enumeration
• Start with the joint probability distribution:

• For any proposition φ, sum the atomic events where it is true: P(φ) =
Σω:ω╞φ P(ω)
• P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
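The same enumeration can be written directly. The sketch below assumes the standard dentist-domain joint table (Toothache, Catch, Cavity) that these slides draw their numbers from; the helper name `prob` is illustrative only.

joint = {  # (toothache, catch, cavity) -> probability
    (True, True, True): 0.108,   (True, False, True): 0.012,
    (False, True, True): 0.072,  (False, False, True): 0.008,
    (True, True, False): 0.016,  (True, False, False): 0.064,
    (False, True, False): 0.144, (False, False, False): 0.576,
}

def prob(event):
    """P(phi): sum the atomic events (worlds) where the proposition phi holds."""
    return sum(p for world, p in joint.items() if event(world))

print(prob(lambda w: w[0]))            # P(toothache)          = 0.2
print(prob(lambda w: w[0] or w[2]))    # P(toothache v cavity) = 0.28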
Inference by enumeration
• Start with the joint probability distribution:

• For any proposition φ, sum the atomic events where it is true: P(φ) =
Σω:ω╞φ P(ω)
• P(toothache ∨ cavity) = 0.108 + 0.012 + 0.016 + 0.064 + 0.072 +
0.008 = 0.28
Inference by enumeration
• Start with the joint probability distribution:

• Conditional probabilities:
P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
= (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
= 0.4
Marginalization
• One particularly common task is to extract the distribution over some
subset of variables or a single variable.
• For example, adding the entries in the first row gives the
unconditional probability of cavity:
P(cavity) = 0.108+0.012+0.072+0.008 = 0.2

198
Marginalization
• This process is called marginalization or summing out, because the
variables other than Cavity are summed out.
• General marginalization rule for any sets of variables Y and Z:
P(Y) = ΣzP(Y, z)
•A distribution over Y can be obtained by summing out all the other
variables from any joint distribution containing Y.

199
Marginalization
Typically, we are interested in:
the posterior joint distribution of the query variables X
given specific values e for the evidence variables E.

Let the hidden variables be Y.

Then the required summation of joint entries is done by summing out the hidden
variables:

P(X | E = e) = P(X,E = e) / P(e) = Σy P(X,E = e, Y = y) / P(e)

• X, E and Y together exhaust the set of random variables.


Normalization
• P(cavity | toothache) = P(cavity ∧ toothache) / P(toothache)
= (0.108 + 0.012) / (0.108 + 0.012 + 0.016 + 0.064)

• P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
= (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)

• Notice that in these two calculations the term 1/P(toothache) remains
constant, no matter which value of Cavity we calculate.
• Notice that in these two calculations the term 1/P(toothache) remains
constant, no matter which value of Cavity we calculate.
201
Normalization
• The denominator can be viewed as a normalization constant α for the
distribution P(Cavity | toothache), ensuring it adds up to 1.
• With this notation and using marginalization, we can write the two
preceding equations in one:
P(Cavity | toothache) = α P(Cavity,toothache)
= α [P(Cavity,toothache,catch) + P(Cavity,toothache,¬ catch)]
= α [<0.108,0.016> + <0.012,0.064>]
= α <0.12,0.08> = <0.6,0.4>
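A quick numeric check of this normalization step (a sketch only, reusing the same joint table as in the enumeration sketch above):

joint = {(True, True, True): 0.108,   (True, False, True): 0.012,
         (False, True, True): 0.072,  (False, False, True): 0.008,
         (True, True, False): 0.016,  (True, False, False): 0.064,
         (False, True, False): 0.144, (False, False, False): 0.576}

# Unnormalized distribution over Cavity with Toothache fixed to true (Catch summed out).
unnorm = [sum(p for (t, c, cav), p in joint.items() if t and cav == v) for v in (True, False)]
alpha = 1 / sum(unnorm)                 # normalization constant, here 1 / 0.2
print([alpha * x for x in unnorm])      # -> approximately [0.6, 0.4]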

202
Normalization

P(Cavity | toothache) = α P(Cavity,toothache)


= α [P(Cavity,toothache,catch) + P(Cavity,toothache,¬ catch)]
= α [<0.108,0.016> + <0.012,0.064>]
= α <0.12,0.08> = <0.6,0.4>

General idea: compute distribution on query variable by fixing


evidence variables and summing over hidden variables
Inference by enumeration
• Obvious problems:
• Worst-case time complexity: O(d^n), where d is the size of the largest
domain and n is the number of variables
• Space complexity: O(d^n) to store the joint distribution
• How do we define the probabilities for O(d^n) entries when there can
be hundreds or thousands of variables?
• It quickly becomes completely impractical to define the vast number
of probabilities required.
Independence
• A and B are independent iff
P(A|B) = P(A) or P(B|A) = P(B) or P(A, B) = P(A) P(B)

P(Toothache, Catch, Cavity, Weather)


= P(Toothache, Catch, Cavity) P(Weather)

• 32 entries reduced to 12
• For n independent biased coins, O(2^n) → O(n)
• Absolute independence powerful but rare
• Dentistry is a large field with hundreds of variables, none of which are
independent. What to do?
Conditional independence
• P(Toothache, Cavity, Catch) has 2^3 − 1 = 7 independent entries
(because the numbers must sum to 1)

• If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a
toothache:
P(catch | toothache, cavity) = P(catch | cavity)

• The same independence holds if I haven't got a cavity:


P(catch | toothache,¬cavity) = P(catch | ¬cavity)

• Catch is conditionally independent of Toothache given Cavity:


P(Catch | Toothache,Cavity) = P(Catch | Cavity)

• Equivalent statements:
P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
Conditional independence
• Full joint distribution using product rule:
P(Toothache, Catch, Cavity)
= P(Toothache | Catch, Cavity) P(Catch, Cavity)
= P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity)
= P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)
The resulting three smaller tables contain 5 independent entries (2 × (2^1 − 1) = 2 for each conditional
probability distribution and 2^1 − 1 = 1 for the prior on Cavity)

• In most cases, the use of conditional independence reduces the size of the
representation of the joint distribution from exponential in n to linear in n.

• Conditional independence is our most basic and robust form of knowledge about
uncertain environments.
Bayes' rule
• Product rule
P(a∧b) = P(a | b) P(b) = P(b | a) P(a)
⇒ Bayes' rule:
P(a | b) = P(b | a) P(a) / P(b)

• or in distribution form
P(Y|X) = P(X|Y) P(Y) / P(X) = αP(X|Y) P(Y)

• Useful for assessing diagnostic probability from causal probability:


• P(Cause|Effect) = P(Effect|Cause) P(Cause) / P(Effect)
Bayes' rule: example
• Here's a story problem about a situation that doctors often
encounter:
1% of women at age forty who participate in routine screening have breast
cancer. 80% of women with breast cancer will get positive mammographies.
9.6% of women without breast cancer will also get positive mammographies.
A woman in this age group had a positive mammography in a routine
screening. What is the probability that she actually has breast cancer?
• What do you think the answer is?

209
Bayes' rule: example
• Most doctors get the same wrong answer on this problem - usually,
only around 15% of doctors get it right. ("Really? 15%? Is that a real
number, or an urban legend based on an Internet poll?" It's a real
number. See Casscells, Schoenberger, and Grayboys 1978; Eddy
1982; Gigerenzer and Hoffrage 1995. It's a surprising result which is
easy to replicate, so it's been extensively replicated.)
• On the story problem above, most doctors estimate the probability to
be between 70% and 80%, which is wildly incorrect.

210
Bayes' rule: example
C = breast cancer (having, not having)
M = mammographies (positive, negative)

P(C) = <0.01, 0.99>


P(m | c) = 0.8
P(m | ¬c) = 0.096

211
Bayes' rule: example
P(C | m) = P(m | C) P(C) / P(m) =
= α P(m | C) P(C) =
= α <P(m | c) P(c), P(m | ¬c) P(¬c)> =
= α <0.8 * 0.01, 0.096 * 0.99> =
= α <0.008, 0.095> = <0.078, 0.922>

P(c | m) = 7.8%
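The same computation as a few lines of Python (a sketch mirroring the slide's numbers and notation):

p_c = 0.01                 # P(c): prior probability of breast cancer
p_m_given_c = 0.80         # P(m | c): positive mammography given cancer
p_m_given_not_c = 0.096    # P(m | ~c): positive mammography given no cancer

unnorm = [p_m_given_c * p_c, p_m_given_not_c * (1 - p_c)]   # <0.008, 0.09504>
alpha = 1 / sum(unnorm)                                     # normalization constant
posterior = [alpha * x for x in unnorm]
print(posterior[0])        # P(c | m) is roughly 0.078, i.e. about 7.8%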

212
Bayes' Rule and conditional independence
P(Cavity | toothache ∧ catch)
= αP(toothache ∧ catch | Cavity) P(Cavity)
= αP(toothache | Cavity) P(catch | Cavity) P(Cavity)

• The information requirements are the same as for inference using


each piece of evidence separately:
– the prior probability P(Cavity) for the query variable
– the conditional probability of each effect, given its cause
Naive Bayes
P(Cavity, Toothache, Catch) = P(Toothache, Catch, Cavity)
= P(Toothache | Catch, Cavity) P(Catch, Cavity)
= P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity)
= P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)

• This is an example of a naïve Bayes model:


P(Cause, Effect1, …, Effectn) = P(Cause) ∏i P(Effecti | Cause)

• Total number of parameters (the size of the representation) is linear in n.
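A small sketch of this naive Bayes factorization, with conditional values derived from the dentist joint table used earlier (P(cavity) = 0.2, P(toothache | cavity) = 0.6, P(catch | cavity) = 0.9, and so on); the function and variable names are illustrative only.

p_cavity = 0.2
p_toothache_given = {True: 0.6, False: 0.1}   # P(toothache | Cavity)
p_catch_given     = {True: 0.9, False: 0.2}   # P(catch | Cavity)

def joint(toothache, catch, cavity):
    """P(toothache, catch, cavity) under the naive Bayes factorization."""
    prior = p_cavity if cavity else 1 - p_cavity
    e1 = p_toothache_given[cavity] if toothache else 1 - p_toothache_given[cavity]
    e2 = p_catch_given[cavity] if catch else 1 - p_catch_given[cavity]
    return prior * e1 * e2

print(joint(True, True, True))   # 0.2 * 0.6 * 0.9 = 0.108, matching the corresponding full-joint entry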
