Proposition Al 2
Approaches to
First-Order
Theorem Proving
David A. Plaisted
UNC Chapel Hill
May 2004
History of AI
Early emphasis on general
methods
Newell, Shaw, and Simon: GPS
Robinson 1965: resolution
Cordell Green: question answering
Shift to specialized techniques
Feigenbaum: expert systems
Is logic a suitable basis for AI?
04/09/25
Approaches to AI
Weak vs. strong methods in AI
Declarative vs. procedural knowledge
My interest: general logic-based approaches
Aristotle on Deduction
A deduction is speech (logos)
in which, certain things having
been supposed, something
different from those supposed
results of necessity because of
their being so. (Prior Analytics
I.2, 24b18-20)
Proof
Proof is the idol before whom the
pure mathematician tortures
himself.
-- Sir Arthur Eddington
You may prove anything by
figures. --Thomas Carlyle
What is now proved was once
only imagined. -- William Blake
Proof
You cannot demonstrate
an emotion or prove an
aspiration. -- John Morley
Prove all things; hold
commonsense another. --
Elbert Hubbard, The Note
Book, 1927
Theorem Proving
Potentially a key technology for AI
Brittleness problem for expert
systems
An unsolved problem
Weak versus strong methods
Problems with resolution
Impact on entire field
Importance of space versus time
Theorem Proving on a
Computer
Speed and accuracy of computers
People get tired and make
mistakes
How do people prove theorems?
Potential applications
Hardware verification
Software verification
AI and expert systems
Robots
Deductive Databases
Semantic web and query
answering
Mathematics research
Education
Current theorem provers
Largely syntactic
Resolution or ME (tableau) based
First-order provers are often
poor on non-Horn clauses
Rarely can solve hard problems
Human interaction needed for
hard problems
How do humans prove
theorems?
Semantics
Case analysis
Sequential search through space
of possible structures
Focus on the theorem
People versus computers
In a few areas computers are faster
Propositional calculus
Equational logic
Geometry
Propositional Calculus
Propositional Resolution
Horn Clauses
Davis and Putnam’s Method
The Satisfiability Threshold
Propositional Calculus (continued)
Performance Obtained
Applications
P    ¬P    P ∧ ¬P
T    F     F
F    T     F
Testing Validity
Using truth tables is exponential
Resolution
Davis and Putnam’s Method
Local Search Methods
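A minimal sketch of a splitting procedure in the style of Davis and Putnam's method (the DPLL variant), for illustration only. Clauses are sets of string literals, with a leading "~" marking negation; this encoding and the function names are assumptions, not part of the slides. To test validity of a formula, put its negation into clause form and check that the result is unsatisfiable.

```python
def negate(lit):
    """Return the complementary literal ("p" <-> "~p")."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def assign(clauses, lit):
    """Simplify the clause list under the assumption that lit is true."""
    simplified = []
    for clause in clauses:
        if lit in clause:
            continue  # clause is already satisfied, drop it
        simplified.append(clause - {negate(lit)})  # the false literal disappears
    return simplified


def dpll(clauses):
    """Return True if the clause set is satisfiable, False otherwise."""
    clauses = [frozenset(c) for c in clauses]
    if not clauses:
        return True  # no clauses left: every clause was satisfied
    if any(len(c) == 0 for c in clauses):
        return False  # an empty clause cannot be satisfied
    for clause in clauses:
        if len(clause) == 1:  # unit clause: its literal is forced
            (lit,) = clause
            return dpll(assign(clauses, lit))
    lit = next(iter(clauses[0]))  # split on some literal
    return dpll(assign(clauses, lit)) or dpll(assign(clauses, negate(lit)))


# p is valid iff its negation {~p} ... here: (p or q), ~p, ~q has no model.
unsat_example = dpll([{"p", "q"}, {"~p"}, {"~q"}])
```

Unit propagation is tried before splitting, so Horn-like inputs are often decided without any case analysis.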
Conjunctive Normal
Form
Any propositional formula can be
put into conjunctive normal form
(clause form).
Example:
(p ∨ q ∨ r) ∧ (p ∨ r) ∧ (q ∨ r)
Represent as a set of clauses, each clause
a set of literals:
{p, q, r}, {p, r}, {q, r}
Conjunctive Normal
Form
A formula in conjunctive normal
form is unsatisfiable if for every
interpretation I, there is a clause
C that is false in I.
A formula in cnf is satisfiable if
there is an interpretation I that
makes all clauses true.
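The two definitions above can be checked directly by enumerating interpretations, which is the exponential truth-table method. The sketch below assumes clauses are sets of string literals with "~" for negation; the encoding and function names are illustrative.

```python
from itertools import product


def atoms_of(clauses):
    """Collect the propositional atoms that occur in the clauses."""
    return sorted({lit.lstrip("~") for clause in clauses for lit in clause})


def literal_true(lit, interp):
    """Evaluate a literal under an interpretation (dict atom -> bool)."""
    return not interp[lit[1:]] if lit.startswith("~") else interp[lit]


def satisfying_interpretation(clauses):
    """Return an interpretation making every clause true, or None."""
    atoms = atoms_of(clauses)
    for values in product([True, False], repeat=len(atoms)):  # 2^n rows
        interp = dict(zip(atoms, values))
        if all(any(literal_true(l, interp) for l in c) for c in clauses):
            return interp
    return None  # every interpretation falsifies some clause


model = satisfying_interpretation([{"p", "q", "r"}, {"p", "r"}, {"q", "r"}])
no_model = satisfying_interpretation([{"p"}, {"~p"}])
```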
Binary Resolution Step
For any two clauses C1 and C2, if there is
a literal L1 in C1 that is complementary
to a literal L2 in C2, then delete L1 and L2
from C1 and C2 respectively, and
construct the disjunction of the
remaining clauses. The constructed
clause is a resolvent of C1 and C2.
Examples of Resolution Step
C1 = a ∨ ¬b, C2 = b ∨ c
Complementary literals: ¬b, b
Resolvent: a ∨ c
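A minimal sketch of the binary resolution step for propositional clauses, reading the example as C1 = a ∨ ¬b and C2 = b ∨ c. Clauses are frozensets of string literals with "~" for negation; this representation is an assumption made for the example.

```python
def negate(lit):
    """Return the complementary literal ("b" <-> "~b")."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def resolvents(c1, c2):
    """Return all binary resolvents of clauses c1 and c2."""
    result = set()
    for lit in c1:
        if negate(lit) in c2:
            # Delete the complementary pair and join what remains.
            result.add(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return result


c1 = frozenset({"a", "~b"})
c2 = frozenset({"b", "c"})
print(resolvents(c1, c2))  # the single resolvent {a, c}
```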
Write I ⊨ C to indicate that the semantics
(interpretation) I makes the clause C true.
If C is a ground clause then I satisfies
C if I satisfies at least one of its
literals.
Otherwise I satisfies C if I satisfies all
ground instances D of C. (Herbrand
interpretations.)
If I does not satisfy C then we say I
falsifies C.
Example Semantics
Specify I by interpreting symbols
Interpret predicate p(x,y) as x = y
Interpret function f(x,y) as x + y
Interpret a as 1, b as 2, c as 3
Then p(f(a,b),c) interprets to TRUE
but p(a,b) interprets to FALSE
Thus I satisfies p(f(a,b),c) but I
falsifies p(a,b)
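The example semantics can be written out as a small evaluator. This is a sketch only: terms are nested tuples such as ("f", "a", "b"), and the table names CONSTANTS, FUNCTIONS, and PREDICATES are hypothetical.

```python
# The interpretation I from the example: p is equality, f is addition.
CONSTANTS = {"a": 1, "b": 2, "c": 3}
FUNCTIONS = {"f": lambda x, y: x + y}
PREDICATES = {"p": lambda x, y: x == y}


def eval_term(term):
    """Map a ground term to its value under I."""
    if isinstance(term, str):
        return CONSTANTS[term]
    name, *args = term
    return FUNCTIONS[name](*(eval_term(a) for a in args))


def satisfies(atom):
    """Return True if I satisfies the ground atom, False if I falsifies it."""
    name, *args = atom
    return PREDICATES[name](*(eval_term(a) for a in args))


print(satisfies(("p", ("f", "a", "b"), "c")))  # True: 1 + 2 = 3
print(satisfies(("p", "a", "b")))              # False: 1 != 2
```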
Obtaining Semantics
Humans using mathematical
knowledge
Automatic methods (finite
models)
Trivial semantics
Herbrand’s Theorem
A set S of clauses is unsatisfiable if
and only if there is a finite
unsatisfiable set T of ground instances
of S.
The basis of uniform proof
procedures.
Example: S = {{p(a)}, {¬p(x), p(f(x))},
{¬p(f(f(a)))}}
T = {{p(a)}, {¬p(a), p(f(a))},
{¬p(f(a)), p(f(f(a)))}, {¬p(f(f(a)))}}
The instances in T come from substituting
a and f(a) for x in {¬p(x), p(f(x))}.
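A sketch of the uniform proof procedure that Herbrand's theorem suggests, applied to the example S with negations read as {p(a)}, {¬p(x), p(f(x))}, {¬p(f(f(a)))}: enumerate ground instances over a growing part of the Herbrand universe and test each finite set propositionally. The term encoding (tuples), the names, and the brute-force test are assumptions for illustration.

```python
from itertools import product

# Literals are (sign, predicate, term); "x" is the only variable.
VARIABLES = {"x"}
S = [
    [(True, "p", "a")],
    [(False, "p", "x"), (True, "p", ("f", "x"))],
    [(False, "p", ("f", ("f", "a")))],
]


def substitute(term, binding):
    """Replace variables in term according to binding."""
    if isinstance(term, tuple):
        return (term[0],) + tuple(substitute(t, binding) for t in term[1:])
    return binding.get(term, term)


def variables_in(term):
    if isinstance(term, tuple):
        return set().union(*(variables_in(t) for t in term[1:]))
    return {term} if term in VARIABLES else set()


def herbrand_universe(depth):
    """Terms a, f(a), ..., up to the given nesting depth."""
    terms = ["a"]
    for _ in range(depth):
        terms.append(("f", terms[-1]))
    return terms


def ground_instances(clauses, universe):
    result = set()
    for clause in clauses:
        names = sorted(set().union(*(variables_in(t) for _, _, t in clause)))
        for values in product(universe, repeat=len(names)):
            binding = dict(zip(names, values))
            result.add(frozenset((s, p, substitute(t, binding)) for s, p, t in clause))
    return result


def unsatisfiable(ground):
    """Truth-table test on a finite set of ground clauses."""
    atoms = sorted({(p, t) for c in ground for _, p, t in c}, key=repr)
    for values in product([True, False], repeat=len(atoms)):
        interp = dict(zip(atoms, values))
        if all(any(interp[(p, t)] == s for s, p, t in c) for c in ground):
            return False
    return True


T = ground_instances(S, herbrand_universe(1))  # x ranges over a and f(a)
print(len(T), unsatisfiable(T))
```

With the universe restricted to the constant a alone, the ground set is still satisfiable; adding f(a) yields exactly the four clauses of T.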
Criteria to evaluate
provers
Don’t know versus don’t care
nondeterminism
Clauses generated by need or
possibility
Instantiation by unification or by
semantics or neither
Clauses selected by semantics
Goal sensitivity
Space versus time
Resolution Principle
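Steps for resolution refutation proofs
Put the premises or axioms into clause
form.
Add the negation of what is to be proved, in
clause form, to the set of axioms.
Resolve these clauses together, producing
new clauses that follow from them.
Produce a contradiction by generating the
empty clause.
This is possible if and only if the theorem is
valid. (Completeness)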
Prove that “Fido will die.” from the
statements “Fido is a dog.”,
“All dogs are animals.”
and “All animals will die.”
Changing premises to predicates
(∀X)(dog(X) → animal(X))
dog(fido)
(∀Y)(animal(Y) → die(Y))
4. ¬die(fido) (the negation of the
conclusion die(fido))
Equivalent Reasoning by
Resolution (continued)
[Figure: resolution proof tree; the final
step resolves die(fido) against ¬die(fido)
to give the empty clause.]
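A sketch of the refutation loop on the Fido problem, using the ground instances of the clauses (X and Y replaced by fido) so that plain propositional resolution suffices. The string encoding with "~" for negation and the saturation strategy are assumptions; practical provers use far more selective strategies.

```python
def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit


def resolvents(c1, c2):
    """All binary resolvents of two ground clauses."""
    return {
        frozenset((c1 - {lit}) | (c2 - {negate(lit)}))
        for lit in c1
        if negate(lit) in c2
    }


def refute(clauses):
    """Saturate under resolution; True if the empty clause is derived."""
    known = {frozenset(c) for c in clauses}
    while True:
        new = set()
        for c1 in known:
            for c2 in known:
                for r in resolvents(c1, c2):
                    if not r:
                        return True  # empty clause: contradiction found
                    new.add(r)
        if new <= known:
            return False  # nothing new: no refutation exists
        known |= new


fido = [
    {"~dog(fido)", "animal(fido)"},   # all dogs are animals
    {"dog(fido)"},                    # Fido is a dog
    {"~animal(fido)", "die(fido)"},   # all animals will die
    {"~die(fido)"},                   # negated conclusion
]
print(refute(fido))
```

Dropping the negated conclusion leaves a satisfiable set, so the loop saturates without producing the empty clause.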
Skolem functions replace existentially
quantified variables:
(∀X)mother(X,m(X))
(∀X)(∀Y)(∃Z)(∀W)(foo(X,Y,Z,W)) becomes
(∀X)(∀Y)(∀W)(foo(X,Y,f(X,Y),W))
Resolution on the predicate calculus
A literal and its negation in parent
clauses produce a resolvent only if they
unify under some substitution σ. σ is
then applied to the resolvent before
adding it to the clause set.
C1 = ¬dog(X) ∨ animal(X)
C2 = ¬animal(Y) ∨ die(Y)
Resolvent: ¬dog(Y) ∨ die(Y) {Y/X}
C1 = ¬p(X) ∨ q(f(X)), C2 = ¬q(Y) ∨ r(g(Y))
Resolvent: ¬p(X) ∨ r(g(f(X))) {f(X)/Y}
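A sketch of unification and the first-order resolution step on the two examples, reading them as C1 = ¬dog(X) ∨ animal(X), C2 = ¬animal(Y) ∨ die(Y) and C1 = ¬p(X) ∨ q(f(X)), C2 = ¬q(Y) ∨ r(g(Y)). Assumptions: variables are strings starting with an upper-case letter, compound terms and atoms are tuples, a literal is a (sign, atom) pair, and the two parent clauses share no variables.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def walk(t, s):
    """Follow variable bindings in substitution s."""
    while is_var(t) and t in s:
        t = s[t]
    return t


def occurs(v, t, s):
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:])


def unify(a, b, s):
    """Return a substitution extending s that unifies a and b, or None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return unify(b, a, s)
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None


def apply(t, s):
    """Apply substitution s to term or atom t."""
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(apply(a, s) for a in t[1:])
    return t


def resolve(c1, c2):
    """All binary resolvents of c1 and c2 (lists of (sign, atom) literals)."""
    out = []
    for lit1 in c1:
        for lit2 in c2:
            if lit1[0] != lit2[0]:  # opposite signs
                s = unify(lit1[1], lit2[1], {})
                if s is not None:
                    rest = [l for l in c1 if l != lit1] + [l for l in c2 if l != lit2]
                    out.append(frozenset((sg, apply(at, s)) for sg, at in rest))
    return out


dogs = [(False, ("dog", "X")), (True, ("animal", "X"))]
animals = [(False, ("animal", "Y")), (True, ("die", "Y"))]
print(resolve(dogs, animals))  # one resolvent: ~dog(Y) or die(Y)
```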
“Lucky student”
1. Anyone passing his history exams
and winning the lottery is happy
∀X (pass(X,history) ∧ win(X,lottery) →
happy(X))
2. Anyone who studies or is lucky can
pass all his exams.
∀X ∀Y (study(X) ∨ lucky(X) → pass(X,Y))
3. John did not study but he is lucky
¬study(john) ∧ lucky(john)
4. Anyone who is lucky wins the
lottery.
∀X (lucky(X) → win(X,lottery))
Clause forms of “Lucky student”
1. ¬pass(X,history) ∨ ¬win(X,lottery) ∨
happy(X)
2. ¬study(Y) ∨ pass(Y,Z)
¬lucky(W) ∨ pass(W,V)
3. ¬study(john)
lucky(john)
4. ¬lucky(V) ∨ win(V,lottery)
5. Negate the conclusion “John is
happy”
¬happy(john)
Resolution refutation for the
“Lucky Student” problem
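Resolve clause 1 with clause 4 (variable
renamed to U) under {U/X}:
¬pass(U,history) ∨ happy(U) ∨ ¬lucky(U)
Resolve with ¬happy(john) under {john/U}:
¬pass(john,history) ∨ ¬lucky(john)
Resolve with lucky(john) under {}:
¬pass(john,history)
Resolve with ¬lucky(V) ∨ pass(V,W) under
{john/V, history/W}:
¬lucky(john)
Resolve with lucky(john) under {}:
the empty clause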
Evaluating resolution
Clauses generated by possibility
(bad)
Don’t care nondeterminism (good)
Unification based (good?)
No semantics (bad)
Uses a large amount of space (bad)
Often not goal sensitive (bad)
Refinements
Many refinements of resolution
have been developed in an attempt
to improve its performance
Set of support
Hyper resolution
Ancestry filter form
Unit preference
…
Otter
PROBLEM        SEC     CLAUSES    KEPT
LCL064-1.in    0.14    1080844    8604
LCL064-2.in    0.00    9448       1954
LCL065-1.in    0.00    2992       653
LCL066-1.in    0.00    1452       306
LCL067-1.in    0.14    492984     9283
LCL068-1.in    0.29    569577     9593
LCL069-1.in    0.00    3577       288
LCL070-1.in    0.14    427166     8840
LCL071-1.in    0.29    449389     8941
LCL072-1.in    0.00    161139     6280
Hyper Linking
Separates instantiation and
inference
Given S, selects clauses C and D in S
and literals L in C and M in D, and
generates instances C’ and D’ so that
L’ and M’ are complementary. Then
C’ and D’ are added to S.
Periodically S is tested for
unsatisfiability using DPLL.
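A heavily simplified sketch of the pairwise linking step described above, not the published hyper-linking procedure. Assumptions: variables are upper-case strings, atoms and compound terms are tuples, clauses are tuples of (sign, atom) literals, remaining variables are grounded with the constant a, and a truth-table check stands in for DPLL. All function names are hypothetical.

```python
from itertools import product

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    t = walk(t, s)
    return t == v or (isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:]))

def unify(a, b, s):
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return unify(b, a, s)
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def apply(t, s):
    t = walk(t, s)
    return (t[0],) + tuple(apply(a, s) for a in t[1:]) if isinstance(t, tuple) else t

def map_vars(t, f):
    """Apply f to every variable occurring in t."""
    if isinstance(t, tuple):
        return (t[0],) + tuple(map_vars(a, f) for a in t[1:])
    return f(t) if is_var(t) else t

def canon(clause):
    """Rename variables to V0, V1, ... so that variants coincide."""
    names = {}
    f = lambda v: names.setdefault(v, "V%d" % len(names))
    return tuple((sg, map_vars(at, f)) for sg, at in clause)

def instance(clause, s):
    return canon(tuple((sg, apply(at, s)) for sg, at in clause))

def link_round(S):
    """Add instances C' and D' whose literals L' and M' are complementary."""
    new = set(S)
    for C in S:
        for D0 in S:
            D = tuple((sg, map_vars(at, lambda v: v + "'")) for sg, at in D0)
            for s1, L in C:
                for s2, M in D:
                    s = unify(L, M, {}) if s1 != s2 else None
                    if s is not None:
                        new.add(instance(C, s))
                        new.add(instance(D, s))
    return new

def ground_unsat(S):
    """Ground variables with the constant a, then run a truth-table test."""
    G = {frozenset((sg, map_vars(at, lambda v: "a")) for sg, at in c) for c in S}
    atoms = sorted({at for c in G for _, at in c}, key=repr)
    for values in product([True, False], repeat=len(atoms)):
        interp = dict(zip(atoms, values))
        if all(any(interp[at] == sg for sg, at in c) for c in G):
            return False
    return True

def hyper_link_prove(clauses, max_rounds=3):
    S = {canon(tuple(c)) for c in clauses}
    for _ in range(max_rounds):
        if ground_unsat(S):
            return True
        S = link_round(S)
    return ground_unsat(S)
```

Because instantiation only adds instances of existing clauses and inference is left to the propositional test, no resolvents are ever stored.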
Hyper Linking
Problem    Input Clauses    OTTER (sec)    Hyper Linking
Ph5        45               38606.76       1.8
Ph9        297              >24 hrs        2266.6
Latinsq    16               >24 hrs        56.4
Salt       44               1523.82        28.0
Zebra      128              >24 hrs        866.2
Eliminating Duplication with the
Hyper-Linking Strategy, Shie-Jue
Lee and David A. Plaisted,
Journal of Automated Reasoning
9 (1992) 25-42.
Later propositional
strategies
Billon’s disconnection calculus,
derived from hyper-linking
Disconnection calculus theorem
prover (DCTP), derived from
Billon’s work
FDPLL
Performance of DCTP on
TPTP, 2003
First in EPS and EPR (largely
propositional)
Third in FNE (first-order, no
equality) solving same number as
best provers
Fourth in FOF and FEQ (all first-
order formulae, and formulae with
equality)
Not tuned to 50 categories!
Definition Detection
[Figure: a sequence of interpretations
I0, I1, I2, I3, … with chosen instances
D0, D1, D2, … collected into T until T is
unsatisfiable.]
OSHL
I0 is specified by the user
Di is chosen so that Ii falsifies Di
Di is an instance of a clause in S
Ii is chosen so that Ii satisfies Dj for
all j < i
Let Ti be {D0, D1, …, D(i-1)}.
Ii falsifies Di but satisfies Ti
When Ti is unsatisfiable OSHL stops
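A ground-level sketch of the loop described above, under strong simplifying assumptions: a fixed finite pool of ground clauses stands in for the instances of S, and each new Ii is found by brute-force search for a model of Ti that differs from I0 in as few atoms as possible. The real procedure generates instances and interpretations far more cleverly; the function names here are hypothetical.

```python
from itertools import product


def falsifies(interp, clause):
    """True if no literal (sign, atom) of the clause holds in interp."""
    return not any(interp[atom] == sign for sign, atom in clause)


def models(T, atoms):
    """Yield every interpretation over atoms that satisfies all of T."""
    for values in product([True, False], repeat=len(atoms)):
        interp = dict(zip(atoms, values))
        if not any(falsifies(interp, c) for c in T):
            yield interp


def oshl_ground(pool, I0):
    """Return ("unsatisfiable", T) or ("satisfiable", model)."""
    atoms = sorted(I0)
    I, T = dict(I0), []
    while True:
        # Choose Di: a clause from the pool that the current Ii falsifies.
        D = next((c for c in pool if falsifies(I, c)), None)
        if D is None:
            return "satisfiable", I  # Ii satisfies the whole pool
        T.append(D)
        candidates = list(models(T, atoms))
        if not candidates:
            return "unsatisfiable", T  # T has no model: stop
        # Choose the next Ii: satisfies every Dj so far, stays close to I0.
        I = min(candidates, key=lambda m: sum(m[a] != I0[a] for a in atoms))


pool = [
    [(True, "p(a)")],
    [(False, "p(a)"), (True, "p(f(a))")],
    [(False, "p(f(a))"), (True, "p(f(f(a)))")],
    [(False, "p(f(f(a)))")],
]
I0 = {"p(a)": False, "p(f(a))": False, "p(f(f(a)))": False}
status, T = oshl_ground(pool, I0)
```

Each chosen Di is falsified by an interpretation that satisfies all earlier choices, so no clause is chosen twice and the loop terminates on a finite pool.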