4 Knowledge Representation, Planning, Probability, and Uncertainty
(Figure: the hierarchy from Noise up through Data and Information to Knowledge.)
Representations and Mapping
• Solving complex problems requires a large amount of knowledge, plus some
mechanisms for manipulating that knowledge to create solutions to new
problems. Two entities are involved:
• Facts: truths in some relevant world. These are the things we want to
represent.
• Representations of facts in some chosen formalism. These are the things we will
actually be able to manipulate.
• Structuring these entities in two levels:
• Knowledge level: at which facts (including agent’s behaviors and current goals)
can be described.
• Symbol level: at which representations of objects at the knowledge level are
defined in terms of symbols that can be manipulated by programs.
Mapping between Facts and Representations
(Figure: facts are mapped to and from internal representations; English
understanding maps English sentences to facts, and English generation maps
facts back into English sentences.)
Approaches to Knowledge Representation
• Representational Adequacy: the ability to represent all kinds of knowledge that
are needed in that domain.
• Inferential Adequacy: the ability to manipulate the representational structures
in such a way as to derive new structures corresponding to new knowledge
inferred from the old one.
• Inferential Efficiency: the ability to incorporate into the knowledge structure
additional information that can be used to focus the attention of the inference
mechanism in the most promising directions.
• Acquisitional Efficiency: the ability to acquire new information easily. It
may involve direct insertion of new knowledge into the database. Ideally, the
program itself would be able to control knowledge acquisition.
Issues in KR
• Are any attributes of objects so basic that they occur in almost every
problem domain? If such attributes exist, what are they, and how should each
mechanism we propose handle them?
• Are there any important relationships that exist among attributes of objects?
• At what level should knowledge be represented? Is there a good set of
primitives into which all knowledge can be broken down? Is it helpful to use
such primitives? (Choosing the granularity of the representation)
• How should sets of objects be represented?
• Given a large amount of knowledge stored in a database, how can relevant
parts be accessed when they are needed? (Finding the right structures as
needed)
(Example fact for the mapping: "Tom hit the ball.")
Predicate calculus
• Predicate calculus is also known as “First Order Logic” (FOL)
• Predicate calculus includes:
• All of propositional logic
• Logical values true, false
• Variables x, y, a, b,...
• Connectives ¬, ⇒, ∧, ∨, ⇔
• Constants KingJohn, 2, Villanova,...
• Predicates Brother, >,...
• Functions Sqrt, MotherOf,...
• Quantifiers ∀, ∃
Constants, functions, and predicates
• A constant represents a “thing”--it has no truth value, and it does not
occur “bare” in a logical expression
• Examples: DavidMatuszek, 5, Earth, goodIdea
• Given zero or more arguments, a function produces a constant as its
value:
• Examples: motherOf(DavidMatuszek), add(2, 2), thisPlanet()
• A predicate is like a function, but produces a truth value
• Examples: greatInstructor(DavidMatuszek), isPlanet(Earth), greater(3,
add(2, 2))
Universal quantification
• The universal quantifier, ∀, is read as “for each”
or “for every”
• Example: ∀x, x² ≥ 0 (for every x, x² is greater than or equal to
zero)
• Typically, ⇒ is the main connective with ∀:
∀x, at(x,Villanova) ⇒ smart(x)
means “Everyone at Villanova is smart”
• Common mistake: using ∧ as the main connective with ∀:
∀x, at(x,Villanova) ∧ smart(x)
means “Everyone is at Villanova and everyone is smart”
• If there are no values satisfying the condition, the result is true
• Example: ∀x, isPersonFromMars(x) ⇒ smart(x) is true
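The ⇒-vs-∧ distinction, and the vacuous-truth case, can be checked mechanically over a finite domain. A minimal Python sketch (the people and membership sets are made-up illustrations, not from the slides):

```python
# A sketch over a finite domain showing why ⇒, not ∧, is the right
# main connective with ∀, and why an empty antecedent is vacuously true.
domain = ["Ann", "Bob", "Carol"]
at_villanova = {"Ann", "Bob"}            # Carol is elsewhere
smart = {"Ann", "Bob", "Carol"}

def implies(p, q):
    return (not p) or q                  # material implication

# ∀x, at(x, Villanova) ⇒ smart(x): everyone at Villanova is smart
forall_implies = all(implies(x in at_villanova, x in smart) for x in domain)

# ∀x, at(x, Villanova) ∧ smart(x): everyone is at Villanova AND smart
forall_and = all((x in at_villanova) and (x in smart) for x in domain)

# Vacuous truth: no one is from Mars, so the universal holds trivially
from_mars = set()
vacuous = all(implies(x in from_mars, x in smart) for x in domain)

print(forall_implies, forall_and, vacuous)  # True False True
```

With Carol not at Villanova, the ∧ version fails even though every Villanovan is smart, matching the "common mistake" above.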
Existential quantification
• The existential quantifier, ∃, is read “for some” or
“there exists”
• Example: ∃x, x² < 0 (there exists an x such that x² is less
than zero)
• Typically, ∧ is the main connective with ∃:
∃x, at(x,Villanova) ∧ smart(x)
means “There is someone who is at Villanova and is smart”
• Common mistake: using ⇒ as the main connective with ∃:
∃x, at(x,Villanova) ⇒ smart(x)
This is true if there is someone at Villanova who is smart...
...but it is also true if there is someone who is not at Villanova
By the rules of material implication, the result of F ⇒ T is T
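The mirror-image pitfall can also be checked over a finite domain; in this sketch (made-up names and sets) nobody is both at Villanova and smart, yet the ⇒ version still comes out true:

```python
# The ∃-with-⇒ pitfall: the implication form is satisfied vacuously
# by anyone who is simply not at Villanova.
domain = ["Ann", "Bob", "Carol"]
at_villanova = {"Ann"}
smart = {"Bob"}

def implies(p, q):
    return (not p) or q

# ∃x, at(x, Villanova) ∧ smart(x): the intended reading
exists_and = any((x in at_villanova) and (x in smart) for x in domain)

# ∃x, at(x, Villanova) ⇒ smart(x): true because Bob (or Carol) is not
# at Villanova, so F ⇒ anything evaluates to T
exists_implies = any(implies(x in at_villanova, x in smart) for x in domain)

print(exists_and, exists_implies)  # False True
```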
Properties of quantifiers
• ∀x ∀y is the same as ∀y ∀x
• ∃x ∃y is the same as ∃y ∃x
More rules
• Now there are numerous additional rules we can apply!
• Here are two exceptionally important rules:
• ¬∀x, p(x) ⇒ ∃x, ¬p(x)
“If not every x satisfies p(x), then there exists an x that does
not satisfy p(x)”
• ¬∃x, p(x) ⇒ ∀x, ¬p(x)
“If there does not exist an x that satisfies p(x), then all x do
not satisfy p(x)”
• In any case, the search space is just too large to be
feasible
• This was the case until 1965, when J. A. Robinson
discovered resolution
Interlude: Definitions
• syntax: defines the formal structure of sentences
• semantics: determines the truth of sentences wrt (with respect to)
models
• entailment: one statement entails another if the truth of the first
means that the second must also be true
• inference: deriving sentences from other sentences
• soundness: derivations produce only entailed sentences
• completeness: derivations can produce all entailed sentences
Resolution
Logic by computer was infeasible
• Why is logic so hard?
• You start with a large collection of facts (predicates)
• You start with a large collection of possible transformations
(rules)
• Some of these rules apply to a single fact to yield a new fact
• Some of these rules apply to a pair of facts to yield a new fact
• So at every step you must:
• Choose some rule to apply
• Choose one or two facts to which you might be able to apply the rule
• If there are n facts
• There are n potential ways to apply a single-operand rule
• There are n * (n - 1) potential ways to apply a two-operand rule
• Add the new fact to your ever-expanding fact base
• The search space is huge!
The magic of resolution
• Here’s how resolution works:
• You transform each of your facts into a particular form,
called a clause (this is the tricky part)
• You apply a single rule, the resolution principle, to a pair of
clauses
• Clauses are closed with respect to resolution--that is, when you
resolve two clauses, you get a new clause
• You add the new clause to your fact base
• So the number of facts you have grows linearly
• You still have to choose a pair of facts to resolve
• You never have to choose a rule, because there’s only one
The fact base
• A fact base is a collection of “facts,” expressed in predicate
calculus, that are presumed to be true (valid)
• These facts are implicitly “anded” together
• Example fact base:
• seafood(X) ⇒ likes(John, X) (where X is a variable)
• seafood(shrimp)
• pasta(X) ⇒ ¬likes(Mary, X) (where X is a different variable)
• pasta(spaghetti)
• That is,
• (seafood(X) ⇒ likes(John, X)) ∧ seafood(shrimp) ∧
(pasta(Y) ⇒ ¬likes(Mary, Y)) ∧ pasta(spaghetti)
• Notice that we had to change some Xs to Ys
• The scope of a variable is the single fact in which it occurs
Clause form
• A clause is a disjunction ("or") of zero or more literals, some or all of
which may be negated
• Example:
sinks(X) ∨ dissolves(X, water) ∨ ¬denser(X, water)
• Notice that clauses use only “or” and “not”—they do not use “and,”
“implies,” or either of the quantifiers “for all” or “there exists”
• The impressive part is that any predicate calculus expression can be
put into clause form
• Existential quantifiers, ∃, are the trickiest ones
Unification
• From the pair of facts (not yet clauses, just facts):
• seafood(X) ⇒ likes(John, X) (where X is a variable)
• seafood(shrimp)
• We ought to be able to conclude
• likes(John, shrimp)
• We can do this by unifying the variable X with the constant shrimp
• This is the same “unification” as is done in Prolog
• This unification turns seafood(X) ⇒ likes(John, X) into seafood(shrimp) ⇒
likes(John, shrimp)
• Together with the given fact seafood(shrimp), the final deductive step is easy
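Unification itself is easy to sketch. The toy unifier below is an illustrative sketch, not a full implementation (it omits the occurs check and chained-substitution handling); the term encoding, with capitalized strings as variables in the Prolog style, is an assumption:

```python
# A minimal unifier for terms like ("seafood", "X"): variables are
# capitalized strings, constants are lowercase (Prolog-style convention).

def is_var(t):
    return isinstance(t, str) and t[0].isupper()

def unify(t1, t2, subst=None):
    """Return a substitution dict unifying t1 and t2, or None on failure."""
    if subst is None:
        subst = {}
    t1, t2 = subst.get(t1, t1), subst.get(t2, t2)   # apply known bindings
    if t1 == t2:
        return subst
    if is_var(t1):
        return {**subst, t1: t2}
    if is_var(t2):
        return {**subst, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None  # clash of distinct constants

# Unify seafood(X) with seafood(shrimp): X is bound to shrimp
print(unify(("seafood", "X"), ("seafood", "shrimp")))  # {'X': 'shrimp'}
```

Applying the resulting substitution {X → shrimp} to seafood(X) ⇒ likes(John, X) yields exactly the instantiated implication on the slide.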
The resolution principle
• Here it is:
• From X ∨ someLiterals
and ¬X ∨ someOtherLiterals
----------------------------------------------
conclude: someLiterals ∨ someOtherLiterals
• That’s all there is to it!
• Example:
• broke(Bob) ∨ well-fed(Bob)
¬broke(Bob) ∨ ¬hungry(Bob)
--------------------------------------
well-fed(Bob) ∨ ¬hungry(Bob)
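With clauses encoded as sets of literals, the resolution principle takes only a few lines. A sketch (the `~`-prefix encoding of negation is an assumption of this example):

```python
# The resolution principle on propositional clauses represented as
# frozensets of literals; "~p" denotes the negation of "p".

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """All resolvents of c1 and c2 (one complementary pair at a time)."""
    out = []
    for lit in c1:
        if negate(lit) in c2:
            out.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return out

c1 = frozenset({"broke(Bob)", "well-fed(Bob)"})
c2 = frozenset({"~broke(Bob)", "~hungry(Bob)"})
print(resolve(c1, c2))  # one resolvent: well-fed(Bob) ∨ ¬hungry(Bob)
```

Note that `resolve` removes exactly one complementary pair per resolvent, which is also why the "common error" below of cancelling two pairs at once never arises.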
A common error
• You can only do one resolution at a time
• Example:
• broke(Bob) ∨ well-fed(Bob) ∨ happy(Bob)
¬broke(Bob) ∨ ¬hungry(Bob) ∨ ¬happy(Bob)
• You can resolve on broke to get:
• well-fed(Bob) ∨ happy(Bob) ∨ ¬hungry(Bob) ∨ ¬happy(Bob) ≡ T
• Or you can resolve on happy to get:
• broke(Bob) ∨ well-fed(Bob) ∨ ¬broke(Bob) ∨ ¬hungry(Bob) ≡ T
• Note that both legal resolutions yield a tautology (a trivially true
statement, containing X ∨ ¬X), which is correct but useless
• But you cannot resolve on both at once to get:
• well-fed(Bob) ∨ ¬hungry(Bob)
Contradiction
• A special case occurs when the result of a resolution (the resolvent) is
empty, or “NIL”
• Example:
• hungry(Bob)
¬hungry(Bob)
----------------
NIL
• In this case, the fact base is inconsistent
• This will turn out to be a very useful observation in doing resolution
theorem proving
A first example
• “Everywhere that John goes, Rover goes. John is at school.”
• at(John, X) ⇒ at(Rover, X) (not yet in clause form)
• at(John, school) (already in clause form)
• We use implication elimination to change the first of these into clause
form:
• ¬at(John, X) ∨ at(Rover, X)
• at(John, school)
• We can resolve these on at(-, -), but to do so we have to unify X
with school; this gives:
• at(Rover, school)
Refutation resolution
• The previous example was easy because it had very few clauses
• When we have a lot of clauses, we want to focus our search on the
thing we would like to prove
• We can do this as follows:
• Assume that our fact base is consistent (we can’t derive NIL)
• Add the negation of the thing we want to prove to the fact base
• Show that the fact base is now inconsistent
• Conclude the thing we want to prove
Example of refutation resolution
• “Everywhere that John goes, Rover goes. John is at school. Prove that Rover is
at school.”
1. ¬at(John, X) ∨ at(Rover, X)
2. at(John, school)
3. ¬at(Rover, school) (this is the added clause)
• Resolve #1 and #3, unifying X with school:
4. ¬at(John, school)
• Resolve #2 and #4:
5. NIL
• Conclude the negation of the added clause: at(Rover, school)
• This seems a roundabout approach for such a simple example, but it works well
for larger problems
A second example
• Start with:
• it_is_raining ∨ it_is_sunny
• it_is_sunny ⇒ I_stay_dry
• it_is_raining ⇒ I_take_umbrella
• I_take_umbrella ⇒ I_stay_dry
• Convert to clause form:
1. it_is_raining ∨ it_is_sunny
2. ¬it_is_sunny ∨ I_stay_dry
3. ¬it_is_raining ∨ I_take_umbrella
4. ¬I_take_umbrella ∨ I_stay_dry
• Prove that I stay dry by adding the negated goal:
5. ¬I_stay_dry
• Proof:
6. (5, 2) ¬it_is_sunny
7. (6, 1) it_is_raining
8. (5, 4) ¬I_take_umbrella
9. (8, 3) ¬it_is_raining
10. (9, 7) NIL
▪ Therefore, ¬(¬I_stay_dry)
▪ I_stay_dry
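The whole refutation procedure can be sketched as a brute-force propositional prover. This is an illustrative implementation, not an efficient one (it resolves all pairs until closure), rerunning the umbrella example; `~p` encodes ¬p:

```python
# Brute-force refutation: resolve pairs of clauses until the empty
# clause (NIL) appears, or until no new clauses can be derived.

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolvents(c1, c2):
    for lit in c1:
        if negate(lit) in c2:
            yield (c1 - {lit}) | (c2 - {negate(lit)})

def refutes(clauses):
    """True iff the clause set is inconsistent (derives NIL)."""
    clauses = set(clauses)
    while True:
        new = set()
        for a in clauses:
            for b in clauses:
                if a != b:
                    for r in resolvents(a, b):
                        if not r:          # empty clause: contradiction
                            return True
                        new.add(r)
        if new <= clauses:                 # nothing new: consistent
            return False
        clauses |= new

kb = [frozenset({"it_is_raining", "it_is_sunny"}),
      frozenset({"~it_is_sunny", "I_stay_dry"}),
      frozenset({"~it_is_raining", "I_take_umbrella"}),
      frozenset({"~I_take_umbrella", "I_stay_dry"})]

# Adding the negated goal makes the set inconsistent, proving I_stay_dry
print(refutes(kb + [frozenset({"~I_stay_dry"})]))  # True
print(refutes(kb))                                 # False
```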
Conversion to clause form
A nine-step process
Step 1: Eliminate implications
• Use the fact that x ⇒ y is equivalent to ¬x ∨ y
Step 2: Reduce the scope of ¬
• Reduce the scope of negation to a single term, using:
• ¬(¬p) ≡ p
• ¬(a ∧ b) ≡ (¬a ∨ ¬b)
• ¬(a ∨ b) ≡ (¬a ∧ ¬b)
• ¬∀x, p(x) ≡ ∃x, ¬p(x)
• ¬∃x, p(x) ≡ ∀x, ¬p(x)
Step 3: Standardize variables apart
• ∀x, P(x) ∨ ∀x, Q(x)
becomes
∀x, P(x) ∨ ∀y, Q(y)
• This is just to keep the scopes of variables from getting confused
• Not necessary in our running example
Step 4: Move quantifiers
• Move all quantifiers to the left, without changing their relative
positions
Step 5: Eliminate existential quantifiers
• Replace each existentially quantified variable with a Skolem function of the
universally quantified variables in whose scope it appears (or a Skolem
constant, if there are none)
Step 6: Drop the prefix (quantifiers)
• ∀x, ∀y, ∀z,[ ¬Roman(x) ∨ ¬know(x, Marcus) ] ∨
[hate(x, Caesar) ∨ (¬hate(y, z) ∨ thinkCrazy(x, y))]
• At this point, all the quantifiers are universal quantifiers
• We can just take it for granted that all variables are universally
quantified
• [ ¬Roman(x) ∨ ¬know(x, Marcus) ] ∨
[hate(x, Caesar) ∨ (¬hate(y, z) ∨ thinkCrazy(x, y))]
Step 7: Create a conjunction of disjuncts
• Distribute ∨ over ∧, using a ∨ (b ∧ c) ≡ (a ∨ b) ∧ (a ∨ c), so that the
expression becomes a conjunction of disjunctions
Step 8: Create separate clauses
• Every place we have an ∧, we break our expression up into separate
pieces
• Not necessary in our running example
Step 9: Standardize apart
• Rename variables so that no two clauses have the same variable
• Not necessary in our running example
• Final result:
¬Roman(x) ∨ ¬know(x, Marcus) ∨
hate(x, Caesar) ∨ ¬hate(y, z) ∨ thinkCrazy(x, y)
Other Logics
A very brief glimpse
Situation Calculus
• Every mutable predicate includes a situation "timestamp":
weather(sunny, s)
• Actions are described by effect axioms:
forall thing(x), place(p),
(i_am_at(p, s) & at(x, p, s)
=> holding(x, Result(take(x), s)) )
• Things that don't change are described by frame axioms:
forall thing(x), action(a), (holding(x, s) & a ~= drop =>
holding(x, Result(a, s)) )
Second Order Predicate Calculus
• A relation E is an equivalence relation iff
• E is reflexive: for all x, E(x, x)
• reflexive(E) => for all x, E(x, x)
• E is symmetric: for all x, y, E(x, y) => E(y, x)
• symmetric(E) => for all x, y, E(x, y) => E(y, x)
• E is transitive: for all x, y, z, E(x, y) & E(y, z) => E(x, z)
• transitive(E) => for all x, y, z, E(x, y) & E(y, z) => E(x, z)
• for all E, equivalence(E) => reflexive(E)
& symmetric(E) & transitive(E)
• But you can't say this in FOPC, because it quantifies over the relation E
Three-valued logics:
• Kleene: Values are True, False, and Unknown
Modal Logic
• Along with models, modal logic introduces the idea of “possible
worlds”
• New operators:
• “It is possible that”(x)
• “It is necessary that”(x)
• New rules of inference for these operators
Bayesian Statistics
• Probabilities are between 0 and 1
• P(a ∧ b) = P(a) × P(b) (when a and b are independent)
• P(¬a) = 1 − P(a)
• P(a ∨ b) = 1 − P(¬a ∧ ¬b)
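A quick numeric check of these rules, assuming two independent events such as fair coin flips (an illustrative choice of numbers):

```python
# Numeric check of the probability rules for two independent events.
p_a, p_b = 0.5, 0.5

p_and = p_a * p_b                  # P(a ∧ b), assuming independence
p_not_a = 1 - p_a                  # P(¬a)
p_or = 1 - (1 - p_a) * (1 - p_b)   # P(a ∨ b) = 1 − P(¬a ∧ ¬b)

print(p_and, p_not_a, p_or)        # 0.25 0.5 0.75
```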
Fuzzy Logic
• Truth values are between 0 and 1
• T(a ∧ b) = min(T(a), T(b))
• T(a ∨ b) = max(T(a), T(b))
• T(¬a) = 1 − T(a)
• T(a ∨ ¬a) ≠ 1
• Exact values don’t seem to matter much
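These connectives are one-liners; the sketch below (with an arbitrary truth value of 0.75) shows the law of the excluded middle failing:

```python
# Fuzzy connectives as min/max/complement; note T(a ∨ ¬a) need not be 1.
def f_and(a, b): return min(a, b)
def f_or(a, b):  return max(a, b)
def f_not(a):    return 1 - a

t_a = 0.75
print(f_or(t_a, f_not(t_a)))   # 0.75, not 1
print(f_and(t_a, f_not(t_a)))  # 0.25, not 0
```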
Tense Logic
• P = it has at some time been the case that
• F = it will at some time be the case that
• H = it has always been the case that
• G = it will always be the case that
• Pp = ¬H¬p
• Fp = ¬G¬p
Situation Calculus
• Time is discrete
• All statements are tagged with a time, or “situation”
• Result(action, situation_n) = situation_(n+1)
• Effect axioms: how actions change the world
• Frame axioms: how the world stays the same
• Successor-state axioms: how predicates can become
true or false
Defeasible Logics
• All birds fly
• except penguins
• other than Poppy, who runs an airline
• when the plane isn’t undergoing repairs
Types of Logic
• From: A Bibliography of Non-Standard Logics
• http://www.earlham.edu/~peters/courses/logsys/nonstbib.htm#higher
Semantic Networks
• isa(person, mammal)
• instance(Mike-Hall, person) all in one graph
• team(Mike-Hall, Cardiff)
(Figure: a semantic network drawing these facts as a single graph, with is_a
and home_team arcs linking nodes such as Game and Norwich.)
Solution 3
• John gave Mary the book
(Figure: semantic network for the sentence, with an instance node of the
action Gave whose patient is the Book, linked to Mary.)
Advantages of Semantic Networks
• Easy to visualise and understand.
• The knowledge engineer can arbitrarily define the relationships.
• Related knowledge is easily categorised.
• Efficient in space requirements.
• Node objects represented only once.
•…
• Standard definitions of semantic networks have been developed.
Limitations of Semantic Networks
• The limitations of conventional semantic networks were studied
extensively by a number of workers in AI.
• Many believe that the basic notion is a powerful one and has to be
complemented by, for example, logic to improve the notion’s
expressive power and robustness.
Limitations of Semantic Networks
• Other problematic statements. . .
• negation "John does not go fishing";
• disjunction "John eats pizza or fish and chips";
•…
• Quantified statements are very hard for semantic nets. E.g.:
• "Every dog has bitten a postman"
• "Every dog has bitten every postman"
• Solution: Partitioned semantic networks can represent quantified
statements.
Partitioned Semantic Networks
• Hendrix (1976 : 21-49, 1979 : 51-91) developed the so-called
partitioned semantic network to represent the difference between
the description of an individual object or process and the description
of a set of objects. The set description involves quantification.
• Every node and every arc of a network belongs to (or lies in/on) one
or more spaces.
Luger: Artificial Intelligence, 6th edition. © Pearson Education Limited, 2009
Conceptual graph indicating that the dog named Emma is brown.
Conceptual graph of a person with three names.
Conceptual graph of the sentence “The dog scratches its ear with its paw.”
Conceptual graph of the statement “Tom believes that Jane likes pizza,” showing
the use of a propositional concept.
Conceptual graph of the proposition “There are no pink dogs.”
Partitioned Semantic Networks
• Suppose that we wish to make a specific statement about a dog,
Danny, who has bitten a postman, Peter:
• " Danny the dog bit Peter the postman"
(Figure: a partitioned semantic network. One space holds the general concepts
dog, bite, and postman; the statement space S1 contains D is_a dog as the
agent of B is_a bite, whose patient is P is_a postman.)
Partitioned Semantic Networks
• Suppose that we now want to look at the statement:
• "Every dog has bitten every postman"
(Figure: the same network, but now the statement node G is_a General
Statement whose form is the space S1, so that D and P are understood as
universally quantified over dogs and postmen.)
Partitioned Semantic Networks
• Suppose that we now want to look at the statement:
• "Every dog in town has bitten the postman"
(Figure: as above, with an extra node town dog, ako dog; D is_a town dog, so
the universal quantification ranges only over the dogs in town.)
(Figures: further partitioned-network examples, including nested spaces GS1
and GS2 with an exists link, used to represent quantified statements about
students, parties, and love.)
Frames
Scripts
Conceptual dependency theory of four primitive conceptualizations
Conceptual dependency representing “John ate the egg”
(Schank and Rieger 1974).
Elements of a planning problem
(Figure: a blocks-world planning problem, with an initial state S0 and a goal
state S1, each an arrangement of the blocks A, B, and C.)
Plan construction by resolution: planning as resolution
(a very simple example)
Inference by enumeration
• Start with the joint probability distribution:
• For any proposition φ, sum the atomic events where it is true: P(φ) =
Σω:ω⊨φ P(ω)
• P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
Inference by enumeration
• Start with the joint probability distribution:
• For any proposition φ, sum the atomic events where it is true: P(φ) =
Σω:ω⊨φ P(ω)
• P(toothache ∨ cavity) = 0.108 + 0.012 + 0.016 + 0.064 + 0.072 +
0.008 = 0.28
Inference by enumeration
• Start with the joint probability distribution:
• Conditional probabilities:
P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
= (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
= 0.4
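All three enumeration results above can be reproduced directly from the eight-entry joint table (these are the standard dentist-domain numbers the sums come from):

```python
# Inference by enumeration over the full joint distribution
# P(Toothache, Catch, Cavity), keyed as (toothache, catch, cavity).
P = {(True, True, True): 0.108, (True, False, True): 0.012,
     (False, True, True): 0.072, (False, False, True): 0.008,
     (True, True, False): 0.016, (True, False, False): 0.064,
     (False, True, False): 0.144, (False, False, False): 0.576}

def prob(pred):
    """P(φ) = sum over worlds ω where φ holds of P(ω)."""
    return sum(p for (t, c, cav), p in P.items() if pred(t, c, cav))

p_toothache = prob(lambda t, c, cav: t)
p_tooth_or_cav = prob(lambda t, c, cav: t or cav)
p_cond = prob(lambda t, c, cav: t and not cav) / p_toothache  # P(¬cavity|toothache)

print(round(p_toothache, 3), round(p_tooth_or_cav, 3), round(p_cond, 3))
# 0.2 0.28 0.4
```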
Marginalization
• One particularly common task is to extract the distribution over some
subset of variables or a single variable.
• For example, adding the entries in the first row gives the
unconditional probability of cavity:
P(cavity) = 0.108+0.012+0.072+0.008 = 0.2
Marginalization
• This process is called marginalization or summing out, because the
variables other than Cavity are summed out.
• General marginalization rule for any sets of variables Y and Z:
P(Y) = Σz P(Y, z)
• A distribution over Y can be obtained by summing out all the other
variables z from any joint distribution containing Y.
Marginalization
Typically, we are interested in the posterior joint distribution of the query
variables X, given specific values e for the evidence variables E.
The required summation of joint entries is then done by summing out the hidden
variables y:
P(X | e) = α P(X, e) = α Σy P(X, e, y)
Independence
• 32 entries reduced to 12
• For n independent biased coins, O(2^n) → O(n)
• Absolute independence powerful but rare
• Dentistry is a large field with hundreds of variables, none of which are
independent. What to do?
Conditional independence
• P(Toothache, Cavity, Catch) has 2³ − 1 = 7 independent entries (because the numbers
must sum to 1)
• If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a
toothache:
P(catch | toothache, cavity) = P(catch | cavity)
• Equivalent statements:
P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
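The claimed conditional independence can be verified numerically against the same eight-entry joint table used in the enumeration slides:

```python
# Check P(catch | toothache, cavity) = P(catch | cavity) in the
# dentist joint distribution, keyed as (toothache, catch, cavity).
P = {(True, True, True): 0.108, (True, False, True): 0.012,
     (False, True, True): 0.072, (False, False, True): 0.008,
     (True, True, False): 0.016, (True, False, False): 0.064,
     (False, True, False): 0.144, (False, False, False): 0.576}

def prob(pred):
    return sum(p for w, p in P.items() if pred(*w))

lhs = (prob(lambda t, c, cav: t and c and cav)
       / prob(lambda t, c, cav: t and cav))   # P(catch | toothache, cavity)
rhs = (prob(lambda t, c, cav: c and cav)
       / prob(lambda t, c, cav: cav))         # P(catch | cavity)
print(round(lhs, 3), round(rhs, 3))           # 0.9 0.9
```

Both conditional probabilities come out to 0.9: knowing about the toothache adds nothing once the cavity is known.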
Conditional independence
• Full joint distribution using product rule:
P(Toothache, Catch, Cavity)
= P(Toothache | Catch, Cavity) P(Catch, Cavity)
= P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity)
= P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)
The resultant three smaller tables contain 5 independent entries (2 × (2¹ − 1) for each conditional
probability distribution and 2¹ − 1 for the prior on Cavity)
• In most cases, the use of conditional independence reduces the size of the
representation of the joint distribution from exponential in n to linear in n.
• Conditional independence is our most basic and robust form of knowledge about
uncertain environments.
Bayes' rule
• Product rule
P(a∧b) = P(a | b) P(b) = P(b | a) P(a)
⇒ Bayes' rule:
P(a | b) = P(b | a) P(a) / P(b)
• or in distribution form
P(Y|X) = P(X|Y) P(Y) / P(X) = αP(X|Y) P(Y)
Bayes' rule: example
• Most doctors get the same wrong answer on this problem - usually,
only around 15% of doctors get it right. ("Really? 15%? Is that a real
number, or an urban legend based on an Internet poll?" It's a real
number. See Casscells, Schoenberger, and Grayboys 1978; Eddy
1982; Gigerenzer and Hoffrage 1995. It's a surprising result which is
easy to replicate, so it's been extensively replicated.)
• On the story problem (1% of women have breast cancer; a mammogram detects it
80% of the time, and gives a false positive 9.6% of the time: what is the
probability that a woman with a positive mammogram actually has cancer?), most
doctors estimate the probability to be between 70% and 80%, which is wildly
incorrect.
Bayes' rule: example
C = breast cancer (having, not having)
M = mammogram result (positive, negative)
Bayes' rule: example
P(C | m) = P(m | C) P(C) / P(m) =
= α P(m | C) P(C) =
= α <P(m | c) P(c), P(m | ¬c) P(¬c)> =
= α <0.8 * 0.01, 0.096 * 0.99> =
= α <0.008, 0.095> = <0.078, 0.922>
P(c | m) = 7.8%
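The arithmetic can be checked in a few lines, with the prior, sensitivity, and false-positive rate taken from the numbers above:

```python
# Bayes' rule with normalization for the mammography example:
# P(c) = 0.01, P(m|c) = 0.8, P(m|¬c) = 0.096.
p_c, p_m_c, p_m_nc = 0.01, 0.8, 0.096

num_c  = p_m_c * p_c             # P(m|c) P(c)   = 0.008
num_nc = p_m_nc * (1 - p_c)      # P(m|¬c) P(¬c) = 0.09504
alpha = 1 / (num_c + num_nc)     # normalization constant α

print(round(alpha * num_c, 3))   # 0.078 -> P(c | m) ≈ 7.8%
```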
Bayes' Rule and conditional independence
P(Cavity | toothache ∧ catch)
= αP(toothache ∧ catch | Cavity) P(Cavity)
= αP(toothache | Cavity) P(catch | Cavity) P(Cavity)