Logic in Computer Science
Hantao Zhang • Jian Zhang
Preface
for proving a language is unrecognizable and whose comments lead to the concept
of “computably countable” in Chap. 11.
It has been a pleasure to work with the folks at Springer Nature in creating the
final product. We mention Celine Lanlan Chang, Sudha Ramachandran, and Kamesh
Senthilkumar because we have had the most contact with them, but we know that
many others have been involved, too. We would also like to thank the National
Science Foundation of China (NSFC) for its continuous support.
Finally, we would like to thank deeply our families for their patience, inspiration,
and love. The first author is grateful in particular to his son, Roy, and daughter,
Felice, for comments and proofreading of the book, and his wife, Ling Rao, for the logistics of writing the book. This book is dedicated to Ling.
Contents

1 Introduction to Logic
  1.1 Logic Is Everywhere
    1.1.1 Statement or Proposition
    1.1.2 A Brief History of Logic
    1.1.3 Thinking or Writing Logically
  1.2 Logical Fallacies
    1.2.1 Formal Fallacies
    1.2.2 Informal Fallacies
  1.3 A Glance of Mathematical Logic
    1.3.1 Set Theory
    1.3.2 Computability Theory
    1.3.3 Model Theory
    1.3.4 Proof Theory
  Exercises
  References
Index
About the Authors
Chapter 1
Introduction to Logic
Do you like to solve Sudoku puzzles? If so, how long will it take on average to solve
a Sudoku puzzle appearing in a newspaper, minutes or hours? An easy exercise in
Chap. 1 of this book asks you to write a small program in your favorite language
that will solve a Sudoku puzzle, no matter the difficulty, in less than a second on
your laptop.
Did you ever play the Tower of Hanoi? This puzzle consists of three rods and a
number of disks of different sizes, which can slide onto any rod. The puzzle starts
with the disks stacked in ascending order of size on one rod, making a conical shape
with the smallest at the top. The objective of the puzzle is to move the entire stack
to another rod, following three rules:
• Only one disk can be moved at a time.
• Each move consists of taking the top disk from one of the stacks and placing it
on top of another stack or on an empty rod.
• No larger disk may be placed on top of a smaller disk.
In the illustration (Fig. 1.1), the three rods are named a, b, and c. There are
three disks initially on rod a, and each disk is named by its size. A solution to the
puzzle, moving the disks from rod a to rod b, is given as follows:
Move disk 1 from a to b
Move disk 2 from a to c
Move disk 1 from b to c
Move disk 3 from a to b
Move disk 1 from c to a
Move disk 2 from c to b
Move disk 1 from a to b
This solution was produced by a Prolog program of 3 lines and 168 characters
(including commas and periods). The same program can produce solutions for
various numbers of disks. Prolog is a programming language based on logic and
is introduced in Chap. 8 of the book.
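Prolog is introduced in Chap. 8; the authors' 3-line program is not reproduced here. As an illustration of the same recursive idea, the following Python sketch prints exactly the seven moves listed above:

    def hanoi(n, src, dst, via):
        # move n disks from rod src to rod dst, using rod via as a buffer
        if n == 0:
            return
        hanoi(n - 1, src, via, dst)        # clear the n-1 smaller disks out of the way
        print("Move disk", n, "from", src, "to", dst)
        hanoi(n - 1, via, dst, src)        # stack the smaller disks back on top

    hanoi(3, "a", "b", "c")                # reproduces the 7-move solution above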
The above two examples illustrate that logic is not only a theoretical foundation of
computer science but also a problem-solving tool. The emphasis of this book is on
how to make this tool efficient and practical. After a rigorous introduction to basic
concepts, we will study various algorithms based on logic and introduce software
tools which are implementations of these algorithms.
To solve a problem using a logic-based tool, you need to think about the problem
in logic and specify the problem in logic. This is a skill that needs to be learned
like any other, and it takes training and practice to master. The
same skill can be applied to abstract situations such as those encountered in formal
proofs.
Logic is innate to all of us—indeed, you probably use the laws of logic
unconsciously in your everyday speech and in your own internal reasoning. Because
logic is innate, the logic principles that we learn should make sense—if you find
yourself having to memorize one of the principles in the book, without feeling a
mental “click” or comprehending why that law should work, then you will probably
not be able to use that principle correctly and effectively in practice.
In 1953 Albert Einstein wrote the following in a letter to J. E. Switzer:
The development of Western Science has been based on the two great achievements, the
invention of the formal logical system (in Euclidean geometry) by the Greek philosophers,
and the discovery of the possibility of finding out causal relationships by systematic
experiment (at the Renaissance).
Logic has been called the calculus of computer science, because logic is fundamental in computer science, similar to calculus in the physical and engineering sciences.
Logic is used in almost every field of computer science: computer architecture (such
as digital gates, hardware verification), software engineering (specification and
verification), programming languages (semantics, type theory, abstract data types,
object-oriented programming), databases (relational algebra), artificial intelligence, etc.
1.1.1 Statement or Proposition
Logic comes from natural languages as most sentences in natural languages are
statements or propositions. A proposition is a sentence that expresses a judgment
or opinion and has a truth value. We will use statement and assertion as synonyms
of proposition. Here are some examples:
• The color of the apple is green.
• Today is either Monday or Tuesday.
• He is a 20-year-old sophomore.
Every statement can be either true or false. For instance, the first statement above
can be true for one apple but false for another. True and false are called the truth
values of a statement. Some sentences in natural languages are not statements, such
as commands or exclamatory sentences. For example, consider "Run fast!" the coach shouts.
The first part of the sentence, "Run fast!", is not a statement; the second part, "the coach
shouts," is a statement. In fact, a sentence is a statement if and only if it has a truth
value.
In natural languages, we can combine or relate statements with words such as
“not” (negation), “and” (conjunction), “or” (disjunction), “if-then” (implication),
etc. That is, a statement can be obtained from other statements by these words. In
logic, these words are called logical operators, or equivalently, logical connectives.
A statement is composed if it can be expressed as a composition of several simpler
statements; otherwise, it is simple. In the above three examples of statements, the
first statement is simple; the other two are composed. That is, “today is either
Monday or Tuesday” is the composition of “today is Monday” and “today is
Tuesday," using the logical operator "or." The third sentence, "He is a 20-year-old
sophomore," is the composition of "he is a 20-year-old" and "he is a sophomore,"
using the implicit logical operator “and.” We often regard “the color of that apple is
not green” as a composed statement: It is the negation of a simple statement.
Logical operators are indispensable for expressing the relationship between
statements. For example, the following statement is a composition of several simple
statements.
(∗) If either taxes are not raised or expenditures rise, then the debt ceiling is increased.
((¬t) ∨ e) → d
scientifically successful. One does not even have to know exactly what the truth
values true and false are.
Language is a tool for thinking and writing. Being logical means that language
expresses the reality of the world accurately. Hence, language and logic are
inseparable, as logic is about truth. To think or write logically, we need to pay
attention to the following points:
• Matching words to facts or entities
A fact is something made or done. An entity is a thing with distinct and
independent existence. We will use "fact" for both facts and entities. A fact has an
objective status. A word can be used to denote a fact. The meaning of a
word changes if the word is used to represent a different fact, just like the
interpretation of a propositional variable changes in different applications. Being
logical requires that the word unambiguously represents a fact in a given context.
Many words have multiple meanings. For example, “apple” can be a fruit name or
a company’s name. Often, we cannot come up with the right word for a fact. We
may need a novel word or to add an adjective to a word so that the word matches
the fact. Mixing meanings of a word in the same context causes confusion. In a
conversation, the two parties should attach the same meaning to a word.
• Making true statements
Most sentences in a natural language are statements. Many statements can be
expressed in propositional logic. Some of them cannot, but can be expressed
in other logics, such as first-order logic. A true simple statement should be
the representation of an objective fact. When a simple statement cannot be
convincingly true, it gives us a distorted representation of the objective world.
The truth value of complex statements can be decided by logical reasoning,
which is covered later in the book. Speaking or writing incomplete sentences
is a bad habit, as these sentences cannot be statements. We should be cautious
when a statement contains subjective adjectives, as the truth of such a statement is
subjective. We should try to be straightforward in writing or speaking to help the
reader or audience understand and accept the truth of our statements.
Violating the above two points leads to mistakes, many of which are so-called
logical fallacies.
Example 1.1.1 The word “logic” has many meanings. In this section, it means “the
study of correct reasoning,” including both formal and informal logic. The focus of
this book is on a small portion of formal logic used in computer science for building
reasoning tools. Hence, the word “logic” in the title of this book means “a science
that deals with the principles and criteria of validity of inference and demonstration.”
On the other hand, "logic" in "propositional logic" means "the formal principles
of a branch of knowledge." This way of using the word "logic" may cause some
confusion, but will never be an obstacle to learning logic in computer science.
If we match one word to two facts, confusion may arise as shown by the Paradox
of the Court, also known as Protagoras’ paradox, which originated in ancient
Greece.
Paradox of the Court The famous sophist Protagoras took on a promising pupil, Euathlus,
on the understanding that the student would pay Protagoras for his instruction after winning
his first court case. After instruction, Euathlus decided not to enter the profession of law,
and so Protagoras decided to sue Euathlus for the amount he was owed. Protagoras argued
that if he won the case, he would be paid his money. If Euathlus won the case, Protagoras
would still be paid according to the original contract, because Euathlus would have won his
first case. Euathlus, however, claimed that if he won, then by the court’s decision he would
not have to pay Protagoras. If, on the other hand, Protagoras won, then Euathlus would still
not have won a case and would therefore not be obliged to pay. The question is then, which
of the two men is in the right?
The story continues by saying the jurors were embarrassed to make a decision
and postponed the case indefinitely. The confusion of this paradox lies in the mix-up
of "two types of payments": one by the contract and the other by the court order.
If we distinguish these two types of payments, then the paradox no longer exists.
Let p denote "Euathlus won his first case" and q(x) denote "Euathlus pays $x
to Protagoras." Then the following two statements are true:
• By the contract: If p then q(t) else q(0), where $t is the tuition of the
instruction.
• By the court order: If p then q(x) else q(y), where x can be negative if the
court asks Protagoras to pay Euathlus.
Merging the above two statements, the statement "if p then q(t + x) else q(y)"
is true. That is, Euathlus needs to pay Protagoras $(t + x) if he won the case, and
$y otherwise. If x = 0 and y = t, Protagoras will get paid the same amount in both
cases. Euathlus does not need to pay Protagoras if x = −t (when he won) or y = 0
(when he lost), two unlikely court decisions.
The interesting number paradox claims that, if we classify every natural number
as either “interesting” or “uninteresting,” then every natural number is interesting.
A “proof by contradiction” goes as follows: If there exists a non-empty set S of
uninteresting natural numbers, then the smallest number of S is interesting by being
the smallest uninteresting number, thus producing a contradiction. The paradox is
due to the lack of a formal definition of "interesting numbers," so that a number can be
both interesting and uninteresting.
In logic, a paradox serves as an acute example of a theory's inconsistency and
is often a motivation for the development of a new theory. In natural language,
finding a paradox in the opponent's arguments is a good way to win a debate.
Sometimes, a paradox can be a useful expression. The phrase "indescribable feeling"
is such an example. If the feeling is indescribable, then this phrase conveys nothing.
But if the word “indescribable” communicates something about the feeling, then
it does convey something: This is self-contradictory in logic but useful in natural
language. As another example, "'Impossible' is not in our vocabulary!" Though a
contradiction when read literally, the phrase effectively conveys strong determination.
1.2 Logical Fallacies
Logical fallacies are logical errors one makes when writing or speaking. In this
section, we will give a brief introduction to logical fallacies, which are the subjects
of philosophical logic. Knowing these fallacies will help us to avoid them. Writing
logically is related to, but not the same as, writing clearly, or efficiently, or
convincingly, or informatively; ideally, one would want to do all of these at once, but
one often does not have the skill to achieve them all. With practice, though, you will
be able to achieve more of your writing objectives concurrently. The big advantage
of writing logically is that you can be absolutely sure that your conclusion will be
correct, as long as all your hypotheses are correct, and your steps are logical.
In philosophical logic [2], an argument has a similar meaning to a proof in formal
logic: the argument consists of premises and conclusions. A fallacy is the use of
invalid or otherwise faulty reasoning in the construction of an argument. A fallacious
argument may be deceptive by appearing to be better than it really is. Some fallacies
are committed intentionally to manipulate or persuade by deception, while others
are committed unintentionally due to carelessness or ignorance.
Fallacies are commonly divided into “formal” and “informal.” A formal fallacy
can be expressed neatly in formal logic, such as propositional logic, while an
informal fallacy cannot be expressed in formal logic.
1.2.1 Formal Fallacies

A formal fallacy is a typical example of a conclusion that does not follow logically
from the premises or is based on irrelevant data. Here are some common logical fallacies.
• Affirming the consequent: Any argument with the invalid structure of: If A then
B. B, therefore, A.
Example. If I get a B on the test, then I will get the degree. I got the degree, so it follows
that I must have received a B. In fact, I got an A.
• Denying the antecedent: Any argument with the invalid structure of: If A then
B. Not A, therefore not B.
Example. If it’s a dog, then it’s a mammal. It’s not a dog, so it must not be a mammal.
In fact, it’s a cat.
• Affirming a disjunct: Any argument with the invalid structure of: It is the case
that A or B. A, therefore, not B.
Example. I am working or I am at home. I am working, so I must not be at home. In
fact, I am working at home.
• Denying a conjunct: Any argument with the invalid structure of: It is not the
case that both A and B. Not A, therefore B.
Example. I cannot be both at work and at home. I am not at work, so I must be at home.
In fact, I am at a park.
• Undistributed middle: Any argument with the invalid structure of: Every A has
B. C has B, so C is A.
Example. Every bird has a beak. That creature has a beak, so that creature must be a
bird. In fact, the creature is a dinosaur.
A formal fallacy occurs when the structure of the argument is incorrect, despite
the truth of the premises. A valid argument always has a correct formal structure, and
if the premises are true, the conclusion must be true. When the premises are false,
there is no formal fallacy, but the argument may still be regarded as fallacious because
the conclusion is not established.
As an application of modus ponens, the following example contains no formal
fallacies:
If you took that course on CD player repair right out of high school, you would be doing
well and vacationing on the moon right now.
Even though there is no logical error in the argument, the conclusion is unjustified
because the premise is contrary to fact. From a false premise, one can draw any
conclusion, so the composed statement is always true. However, an always-true
statement has no value in reasoning.
By contrast, an argument with a formal fallacy could still contain all true
premises:
(a) If someone owns the world’s largest diamond, then he is rich. (b) King Solomon was
rich. Therefore, (c) King Solomon owned the world’s largest diamond.
Although (a) and (b) are true statements, (c) does not follow from (a) and (b)
because the argument commits the formal fallacy of “affirming the consequent.”
1.2.2 Informal Fallacies

There are numerous kinds of informal fallacies that use an incorrect relation between
premises and conclusion. These fallacies can be grouped into four groups, each of which
contains several types of fallacies.
Some informal fallacies do not belong to the above four groups. For example, the
false dilemma fallacy presents a choice between two mutually exclusive options,
implying that there are no other options. One option is clearly worse than the
other, making the choice seem obvious. Also known as the either/or fallacy, false
dilemmas are a type of informal logical fallacy in which a faulty argument is used
to persuade an audience to agree. False dilemmas are everywhere.
• Vote for me or live through 4 more years of higher taxes.
• America: Love it or leave it.
• Subscribe to our streaming service or be stuck with cable.
For a complete list of informal fallacies, the reader is referred to Wikipedia's
page on the topic. Understanding these logical fallacies can help us more confidently
parse the arguments we encounter on a daily basis, separating fact from dressed-up
fiction.
1.3.1 Set Theory

The main purpose of this presentation is to show the notations used throughout the
book.
People often use N to denote the set of natural numbers, i.e.,
N = {0, 1, 2, 3, . . .}
R = {m/n | m, n ∈ N, n ≠ 0}
Relations
< = {⟨0, 1⟩, ⟨0, 2⟩, ⟨0, 3⟩, ⟨1, 2⟩, ⟨0, 4⟩, ⟨1, 3⟩, ⟨0, 5⟩, ⟨1, 4⟩, ⟨2, 3⟩, . . .}
Functions
Countable Sets
Let S be any set. If S is finite, |S| is a number in N , and we can compare these
numbers easily. How can we compare the sizes of infinite sets? Intuitively, if X is a proper
subset of S, then |X| < |S|. However, this intuition is only correct for finite sets.
For instance, let E be the set of all even natural numbers, then E ⊂ N . However, it
f(i, j) = g(i + j) + j
The inverse of f exists: For any k ∈ N, let m = h(k) be the integer such that
g(m) ≤ k < g(m + 1). Then f⁻¹(k) = ⟨i, j⟩, where i = g(m) + m − k =
g(h(k)) + h(k) − k and j = k − g(m) = k − g(h(k)). It is easy to check that
i + j = m and f(i, j) = k. For example, if k = 12, then m = 4, and i = j = 2
(see the Python sketch after this list).
Table 1.1 shows the first few values of k = f(i, j). Since both f and f⁻¹ are
total functions, f is bijective. So, N × N and N have the same size.
4. To show Z is countable, we define a bijection f from Z to N: f(x) = if x ≥ 0
then 2x else −2x − 1. This function maps 0 to 0, positive integers to even natural
numbers, and negative integers to odd natural numbers. f is bijective because
both f and f⁻¹ are total functions: For any n ∈ N, f⁻¹(n) = if n is even then
n/2 else (−n − 1)/2. Note that ≥ on Z is not well-founded and cannot be used
to list Z from a smallest element upward. Using f as the rank function of Z, we
list Z by the rank of each element as follows: 0, −1, 1, −2, 2, −3, 3, . . ..
5. Let S = {0, 1}* be the set of all finite-length binary strings. To show S is
countable, let us find a bijection from S to N. For any binary string s ∈ S,
let the length of s be n, i.e., |s| = n. There are 1 + 2 + 2^2 + . . . + 2^(n−1) = 2^n − 1
strings in S whose lengths are less than |s|. Let v(s) be the decimal value of s:
v(ε) = 0, where ε is the empty string, v(x0) = 2v(x), and v(x1) = 2v(x) + 1;
then there are v(s) strings of length n whose decimal values are less than v(s).
Define γ : S → N by γ(s) = 2^|s| + v(s) − 1; then γ must be a bijection,
because both γ and γ⁻¹ : N → {0, 1}* are total functions: For any x ∈ N, let
n = ⌊log2(x + 1)⌋; then γ⁻¹(x) = s such that |s| = n and v(s) = x − 2^n + 1.
6. Let Σ = {a0, a1, . . . , ak−1}. If k = 1, then Σ* = {a0^i | i ∈ N} is countable. If
k ≥ 2, we generalize the proof in the previous problem from {0, 1}* to Σ*.
Let θ : Σ → N be θ(ai) = i for 0 ≤ i ≤ k − 1. Every string s ∈ Σ* is
either ε or xa, where x ∈ Σ* and a ∈ Σ. The decimal value of s ∈ Σ*, v(s), is
computed as follows: v(ε) = 0 and v(xa) = v(x) ∗ k + θ(a). If |s| = n, there are
(k^n − 1)/(k − 1) strings of Σ* whose lengths are less than n, and there are v(s)
strings of length n whose decimal values are less than v(s). Define γ : Σ* → N
by γ(s) = (k^|s| − 1)/(k − 1) + v(s); then it is easy to check that both γ and γ⁻¹
(an exercise) are total functions.
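The bijections in items 3 and 4 can be experimented with directly; here is the Python sketch promised in item 3. It assumes g(m) = m(m + 1)/2, the number of pairs whose components sum to less than m; this assumption is consistent with the example k = 12 above, where g(4) = 10:

    def g(m):                       # assumed: g(m) = m(m + 1)/2
        return m * (m + 1) // 2

    def f(i, j):                    # item 3: f(i, j) = g(i + j) + j
        return g(i + j) + j

    def f_inv(k):
        m = 0                       # m = h(k): the integer with g(m) <= k < g(m + 1)
        while g(m + 1) <= k:
            m += 1
        j = k - g(m)                # j = k - g(h(k))
        return (m - j, j)           # i = m - j = g(h(k)) + h(k) - k

    assert f(2, 2) == 12 and f_inv(12) == (2, 2)

    def z_to_n(x):                  # item 4: maps Z bijectively onto N
        return 2 * x if x >= 0 else -2 * x - 1

    def n_to_z(n):
        return n // 2 if n % 2 == 0 else (-n - 1) // 2

    assert [n_to_z(n) for n in range(7)] == [0, -1, 1, -2, 2, -3, 3]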
The last three examples are instances of countably infinite sets. Σ*, as well as
{0, 1}* in the last two examples, plays an important role in the theory of computation. If
we order the strings of Σ* by γ, i.e., listing them as γ⁻¹(0), γ⁻¹(1), γ⁻¹(2), and so
on, this order will choose (a) shorter length first; (b) for strings of the same length,
smaller decimal value first. In this order, s ∈ Σ* is preceded by (k^|s| − 1)/(k − 1)
strings whose lengths are less than |s|, plus v(s) strings of length |s| and whose
values are less than v(s). Hence, the rank of s in this order is exactly γ(s) =
(k^|s| − 1)/(k − 1) + v(s).
Definition 1.3.5 Let γ : Σ* → N be the rank function defined in Proposition 1.3.4(6). The order induced by γ is called the canonical order of Σ*.
For instance, {0, 1}* is sorted by the canonical order as follows:
{ε, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, 101, 110, 111, 0000, . . .}
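As a sketch (again in Python), γ and γ⁻¹ for Σ = {0, 1} can be computed as follows; enumerating γ⁻¹(0), γ⁻¹(1), . . . reproduces the canonical order above. The use of bit_length to compute ⌊log2(x + 1)⌋ is an implementation choice, not from the book:

    def gamma(s):                       # rank of s: 2^|s| + v(s) - 1
        v = 0
        for c in s:                     # v(x0) = 2 v(x), v(x1) = 2 v(x) + 1
            v = 2 * v + (1 if c == "1" else 0)
        return 2 ** len(s) + v - 1

    def gamma_inv(x):
        n = (x + 1).bit_length() - 1    # n = floor(log2(x + 1))
        v = x - 2 ** n + 1              # v(s) = x - 2^n + 1
        return format(v, "b").zfill(n) if n > 0 else ""

    print([gamma_inv(x) for x in range(8)])  # ['', '0', '1', '00', '01', '10', '11', '000']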
Uncountable Sets
In 1874, in his first set theory article, Georg Cantor introduced the term “countable
set” and proved that the set of real numbers is not countable, thus showing that not all
infinite sets are countable. In 1878, he used bijective proof to identify infinite sets of
the same size, distinguishing sets that are countable from those that are uncountable
by the well-known diagonal method.
Proposition 1.3.7 The following sets are uncountable:
1. B: the set of binary strings of infinite length
2. R1: the set of non-negative real numbers less than 1
3. P(N): the power set of N
4. DF: the set of all decision functions f : N → {0, 1}
Proof For the set B of infinite-length binary strings, we will use Cantor’s diagonal
method to show that B is uncountable by a refutational proof. Assume that B
is countable, then there exists a bijection f : N → B and we write B =
{s0 , s1 , s2 , . . .}, where si = f (i). Let the j th symbol of si be sij . Now, we construct
a binary string x of infinite length as follows: The j th symbol of x, i.e., xj , is the
complement of sjj . That is, if sjj = 0, then xj = 1; if sjj = 1, then xj = 0.
Clearly, x ∈ B. Let x = sk for some k ∈ N. However, x differs from sk in the
kth symbol: xk is the complement of skk (where the complement of 0 is 1 and the
complement of 1 is 0), so xk ≠ skk. The
contradiction comes from the assumption that B is countable. Table 1.2 illustrates
the idea of Cantor’s diagonal method.
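The diagonal construction can also be phrased operationally. In the following Python sketch, an enumeration f models each si as a function from positions to bits, and diagonal(f) returns an x that differs from every f(i) at position i; the toy enumeration is purely illustrative:

    def diagonal(f):
        # x_j is the complement of s_jj, where s_i = f(i)
        return lambda j: 1 - f(j)(j)

    f = lambda i: (lambda j: 1 if j == i else 0)  # toy enumeration: s_i has a 1 only at position i
    x = diagonal(f)
    print([x(j) for j in range(5)])               # [0, 0, 0, 0, 0]; x differs from each s_i at position i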
Once we know B is uncountable, if we have a bijection from B to R1 , then R1
must be uncountable. The function f : B → R1 is defined by f (s) = 0.s, as each
number of R1 can be represented by a binary number, where s ∈ B. It is obvious
that f −1 (0.s) = s, assuming an infinite sequence of 0 is appended to the right end
of s if s is a finite sequence. Since f and f −1 are total functions, f is indeed a
bijection.
To show P(N ) is uncountable, let g : B → P(N ) be g(s) = {j | (j + 1)th
symbol of s is 1 }. For example, if s = 1101000 . . . (the rest of s are zeros), then
g(s) = {0, 1, 3}. If g(s) = A ∈ P(N ), then g −1 (A) = λ(0)λ(1)λ(2) · · · ∈ B,
where, for every i ∈ N, λ(i) = 1 if i ∈ A and 0 if i ∉ A. Obviously, g is
bijective because both g and g⁻¹ are total functions. In the literature, g⁻¹(A) is called
the characteristic sequence of A.
Similarly, to show the set DF of decision functions is uncountable, let h : B →
DF be h(s) = f , where, for every j ∈ N , f (j ) = the (j + 1)th symbol of s.
Since s is a binary string of infinite length, f : N → {0, 1} is well-defined. For
instance, if s is a string of all zeros, then f (j ) = 0 for all j . If h(s) = f , then
h−1 (f ) = f (0)f (1)f (2) · · · ∈ B. Obviously, h is bijective because both h and h−1
are total functions.
A total decision function f = h(s), s ∈ B, is also called a characteristic function,
because s = g⁻¹(A), A ∈ P(N), is the characteristic sequence of A ⊆ N, where h
and g are defined in the above proof.
The result that P(N ) is uncountable while N is countable can be generalized:
For every set S, the power set of S, i.e., P(S), has a larger size than S. This result
implies that the notion of the set of all sets is an inconsistent notion. If S were the
set of all sets, then P(S) would at the same time be bigger than S and a subset of S.
Cantor chose the symbol ℵ0 for the size of N , |N |, a cardinal number. Symbol
ℵ0 is read aleph-null, where ℵ is the first letter of the Hebrew alphabet. The size of
the reals is often denoted by ℵ1 , or c for the continuum of real numbers, because the
set of real numbers is larger than N . Cantor showed that there are infinitely many
infinite cardinal numbers, and there is no largest cardinal number.
ℵ0 = |N|
ℵ1 = |P(N)| = 2^ℵ0 > ℵ0
ℵ2 = |P(P(N))| = 2^ℵ1 > ℵ1
ℵ3 = |P(P(P(N)))| = 2^ℵ2 > ℵ2
. . .
The famous inconsistent example of naive set theory is the so-called Russell’s
paradox, discovered by Bertrand Russell in 1901:
Russell's Paradox Let T = {S | S ∉ S}; then T ∈ T ≡ T ∉ T.
That is, if "T ∈ T" is false, then "T ∈ T" is true by the definition of T, because
the condition "S ∉ S" holds for S = T, and hence T ∈ T. If "T ∈ T" is true, then
"T ∉ T" is also true because every member of T satisfies S ∉ S. This paradox showed that
some attempted formalizations of the naive set theory created by Cantor et al. led to
a contradiction. Russell's paradox shows that the concept of "a set contains itself"
is invalid. If "S ∈ S" is always false, then S ∉ S is always true and T = {S | S ∉ S}
is the "set of all sets," which does not exist, either.
A layman’s version of Russell’s paradox is called the Barber’s paradox.
Barber’s Paradox In a village, there is only one barber who shaves all those, and
those only, who do not shave themselves. The question is, does the barber shave
himself?
Answering this question results in a contradiction. The barber cannot shave
himself as he only shaves those who do not shave themselves. Conversely, if the
barber does not shave himself, then he fits into the group of people who would be
shaved by the barber, and thus, as the barber, he must shave himself.
The discovery of paradoxes in naive set theory caused mathematicians to wonder
whether mathematics itself is inconsistent, and to look for proofs of consistency.
Ernst Zermelo (1904) gave a proof that every set can be well-ordered, using the
axiom of choice, which drew heated debate and research among mathematicians. In
1908, Zermelo provided the first set of axioms for set theory. These axioms, together
with the axiom of replacement proposed by Abraham Fraenkel, are now called
Zermelo–Fraenkel set theory (ZF). Besides ZF, many other set theories have been proposed
since then to rid set theory of the paradoxes of naive set theory.
1.3.2 Computability Theory

Recursion
The main reason for the name of “recursion theory” is that recursion is used to
construct objects and functions.
Example 1.3.8 Given a constant symbol 0 of set T and a function symbol suc :
T → T , the objects of set T can be recursively constructed as follows:
1. 0 is an object of T .
2. If n is an object of T , so is suc(n).
3. Nothing else will be an object of T .
If we let suc^0(0) = 0, suc^1(0) = suc(0), suc^2(0) = suc(suc(0)), etc., then T can
be expressed as
0 = { } = ∅
1 = {0} = {∅}
2 = {0, 1} = {∅, {∅}}
3 = {0, 1, 2} = {∅, {∅}, {∅, {∅}}}
. . .
Functions can be recursively defined in an analogous way. Let pre, add, sub,
and mul be the predecessor, addition, subtraction, and multiplication functions on
the set of natural numbers (writing s for suc):
pre(0) = 0
pre(s(x)) = x
add(0, y) = y
add(s(x), y) = s(add(x, y))
sub(x, 0) = x
sub(x, s(y)) = sub(pre(x), y)
mul(0, y) = 0
mul(s(x), y) = add(mul(x, y), y)
Note that the subtraction defined above is different from the subtraction on the
integers: We have sub(s^i(0), s^j(0)) = 0 when i ≤ j. This function has the
property that sub(x, y) = 0 iff x ≤ y.
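These recursive definitions translate directly into code. A minimal Python sketch, representing the natural number s^n(0) by the built-in integer n (so the pattern s(x) corresponds to a positive integer):

    def pre(x):
        return 0 if x == 0 else x - 1                  # pre(0) = 0, pre(s(x)) = x

    def add(x, y):
        return y if x == 0 else add(x - 1, y) + 1      # add(s(x), y) = s(add(x, y))

    def sub(x, y):                                     # cut-off subtraction on N
        return x if y == 0 else sub(pre(x), y - 1)     # sub(x, s(y)) = sub(pre(x), y)

    def mul(x, y):
        return 0 if x == 0 else add(mul(x - 1, y), y)  # mul(s(x), y) = add(mul(x, y), y)

    assert sub(3, 5) == 0 and sub(5, 3) == 2           # sub(x, y) = 0 iff x <= y
    assert mul(3, 4) == 12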
The above functions are examples of so-called primitive recursive functions,
which are total functions and members of a set of so-called general recursive
functions. A primitive recursive function can be computed by a computer program
that uses an upper bound (determined before entering the loop) for the number of
iterations of every loop. The set of general recursive functions (also called partial
recursive functions) is equivalent to those functions that can be computed by Turing
machines.
In Backus–Naur form (BNF), the set of natural numbers can be defined as
N ::= 0 | suc(N)
The symbol ::= can be understood as "is defined by," and "|" separates alternative
parts of a definition. The above formula can be read as follows: a member of N is
either 0, or suc applied to (another) member of N; nothing else will be in N. From
this example, we see that to define a set N, we use multiple items on the right of
::=. For a recursive definition, we have a basic case (such as 0) and a recursive case
(such as suc(N), where N denotes a member of N).
Symbols like N are called variables in a context-free grammar and symbols
like 0, suc, (, and ) are called terminals. A context-free grammar specifies what set
of strings of terminals can be derived for each variable. For the above example, the
set of strings derived from this BNF is
{0, suc(0), suc(suc(0)), . . .}
This set is the same as the set of objects defined in Example 1.3.8, but BNF gives us a
formal and compact definition.
Example 1.3.9 To define the set Σ* of all strings (of finite length) over the alphabet
Σ, we may use BNF. For instance, if Σ = {a, b, c}, then Σ* is the set of all strings
of finite length:
symbol ::= a | b | c
Σ* ::= ε | symbol Σ*
Σ* = {ε, a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, aaa, . . .}.
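A small Python sketch can enumerate Σ* in the canonical order of Definition 1.3.5 (shorter strings first; among strings of equal length, earlier symbols first), one concrete reading of the BNF above:

    from itertools import count, islice, product

    def sigma_star(alphabet):
        # yield all finite strings over the alphabet, in canonical order
        for n in count(0):
            for letters in product(alphabet, repeat=n):
                yield "".join(letters)

    print(list(islice(sigma_star("abc"), 8)))  # ['', 'a', 'b', 'c', 'aa', 'ab', 'ac', 'ba']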
Computable Functions
general functions over natural numbers. Since the set of finite-length binary strings
is countable and there exists a bijection between binary strings and Turing machines,
the set of Turing machines is countable. In other words, we have only a countable
set of Turing machines, but the set of decision problems is uncountable. As a result, there
exists a massive number of decision problems that are uncomputable.
One notable undecidable problem is to decide if a Turing machine M will halt on
an input w. If we encode M and w as a single positive integer denoted by ⟨M, w⟩,
and define the function f(⟨M, w⟩) = 1 if M halts on w, and 0 otherwise, then f is
an undecidable decision function.
As the counterpart of Church–Turing thesis, the concept of Turing completeness
concerns whether a computing model can simulate a Turing machine. A computing
model is Turing complete if it can simulate any Turing machine. By the Church–
Turing thesis, a Turing machine is theoretically capable of doing all tasks done by
computers; on the other hand, nearly all digital computers are Turing complete if
the limitation of finite memory is ignored. Some logics are also Turing complete as
they can be used to simulate any Turing machine. As a result, some problems such
as the halting problem are undecidable for these logics. Computability theory will
help us decide whether decision procedures exist for a given logic, and we
will see some examples in Chap. 11.
1.3.3 Model Theory

Model theory is the study of the relationship between formal theories and their
models. A formal theory is a collection of sentences in a formal language expressing
statements. Models are mathematical structures (e.g., groups, fields, algebras,
graphs, logics) in which the statements of the theory hold. The aspects investigated
include the construction, the number and size of models of a theory, the relationship
of different models to each other, and their interaction with the formal language
itself. In particular, model theorists also investigate the sets that can be defined
in a model of a theory, and the relationship of such definable sets to each other.
Every formal language has its syntax and semantics. Models are semantic structures
associated with the syntactic structures of a formal language. Following this approach,
every formal logic is defined inside a formal language.
The syntax of a formal language specifies how various components of the language,
such as symbols, words, and sentences, are defined. For example, the language
for propositional logic uses only propositional variables and logic operators as its
symbols, and well-formed formulas built on these symbols as its sentences. In model
theory, a set of sentences in a formal language is one of the components that form
a theory. The language for logic often contains the two constant symbols, either
1 and 0, or ⊤ and ⊥, which are interpreted as true and false, respectively. They
are usually considered to be special logical operators which take no arguments, not
propositional variables.
The semantics of a language specifies the meaning of various components of the
language. For example, if we use symbol q to stand for the statement “Today is
Monday,” then the meaning of q is “Today is Monday.” q can be used to denote a
thousand different statements, just like a thousand Hamlets in a thousand people’s
eyes. On the other hand, a formal meaning of the formula q ∨ r is the truth value
of q ∨ r, which can be decided uniquely when the truth values of q and r are
given. In model theory, the formal meaning of a sentence is explored: It examines
semantic elements (meaning and truth) by means of syntactical elements (formulas
and proofs) of a corresponding language.
In model theory, semantics and model are synonyms. A model of a theory is an
interpretation that satisfies the sentences of that theory. Universal algebras are often
used as models. In a summary definition, dating from 1973,
Universal algebra, also called categorical algebra, is the field of mathematics that
studies algebraic structures and their models or algebras.
Boolean Algebra
In universal algebra, the most relevant algebra related to the logic discussed in this
book is Boolean algebra. Boolean algebra is not necessarily a topic of model theory,
but is often quoted as an example in model theory.
Many syntactic concepts of Boolean algebra carry over to propositional logic
with only minor changes in notation and terminology, while the semantics of
propositional logic are defined via Boolean algebras in a way that the tautologies
(theorems) of propositional logic correspond to equational theorems of Boolean
algebra.
In Boolean algebra, the values of the variables are the truth values true and false,
usually denoted by 1 and 0, respectively. The main operations of Boolean algebra are
the multiplication “·” (conjunction), the addition “+” (disjunction), and the inverse
i (negation). It is thus a formalism for describing logical operations in the same
way that elementary algebra describes numerical operations, such as addition and
multiplication. In fact, Boolean algebra is any set with binary operations + and
· and a unary operation i thereon satisfying the Boolean laws (equations), which
define the properties of the logical operations.
Boolean algebra was introduced by George Boole in his first book The Mathe-
matical Analysis of Logic (1847). Sentences that can be expressed in propositional
logic have an equivalent expression in Boolean algebra. Thus, Boolean algebra
is sometimes used to denote propositional logic performed in this way. However,
Boolean algebra is not sufficient to capture logic formulas using quantifiers, like
those from first-order logic.
1.3.4 Proof Theory

In formal logic, the axioms are a set of sentences which are assumed to be true.
Typically, the axioms contain by default the definitions of all logical operators. For
example, the negation operator ¬ is defined by ¬(⊤) = ⊥ and ¬(⊥) = ⊤.
Intuitively, the set of theorems are the formulas which can be proved to be true
from the axioms by the given proof system. For example, for any propositional
variable p, ¬(¬(p)) = p is a theorem. Using the case analysis method, if the truth
value of p is ⊤, then ¬(¬(⊤)) = ¬(⊥) = ⊤; if the truth value of p is ⊥, then
¬(¬(⊥)) = ¬(⊤) = ⊥. It is easy to see that different sets of axioms or different
proof systems would lead to different sets of theorems.
A better approach to define theorems is to use the concept of models. Given a
set of axioms, a theorem is any formula that will accept any model of the axioms as
its model. According to this definition, if a set of axioms has no models, then every
formula will be a theorem. Later, we will see that this definition of theorems leads
to two important properties of a proof system: completeness (every theorem can be
proved) and soundness (everything proved is a theorem).
There are two important properties concerning a set of axioms:
• Consistency The axiom set is consistent if all formulas in the axiom set can
be true at the same time.
• Independence The axiom set is independent if no axiom is a theorem of the
other axioms. An independent set of axioms is also called a minimum set of
axioms.
Proof Procedures
Given a set A of axioms and a formula B, we need a procedure P that can answer
the question "is B a theorem of A or not": If P(A, B) returns "yes," we say B
is proved; if P(A, B) returns "no," we say B is disproved; if it returns nothing,
we do not know whether B is a theorem or not. P is called a proof procedure. This
procedure can be carried out by hand or executed on a computer.
There are four important properties concerning a proof procedure P :
• Consistency P (A, B) is consistent iff either A is inconsistent, or it is impossible
for P (A, B) to return both true and false for any B. This concept assumes that a
proof procedure may return more than one output on the same input.
• Soundness P (A, B) is sound iff whenever P (A, B) returns true, B is a theorem
of A; whenever P (A, B) returns false, B is not a theorem of A.
• Completeness P (A, B) is complete iff for any theorem B of A, P (A, B) returns
true.
• Termination P (A, B) is terminating iff P (A, B) halts for any axiom set A and
any formula B.
Soundness is a stronger property than consistency as a proof procedure may be
consistent but not sound. For example, in propositional logic, if P ({p∨q}, p) returns
true but P ({p ∨ q}, ¬p) does not return true, then P is consistent. However, P is
not sound, since p is not a theorem of p ∨ q in propositional logic.
Proposition 1.3.11
(a) If a proof procedure P (A, B) is sound, then P (A, B) is consistent.
(b) If a proof procedure P (A, B) is sound and terminating, then P (A, B) is
complete.
Proof
(a) If A is inconsistent, then P (A, B) is consistent by definition. Now assume A is
consistent. If P (A, B) is not consistent, then there exists a formula B such that
P (A, B) returns both true and false. In this case, P (A, B) cannot be sound,
because B cannot be a theorem and a non-theorem at the same time when A is
consistent.
(b) For any theorem B of the axioms A, P(A, B) halts because P is terminating; it
cannot halt with false, as that would contradict the soundness of P(A, B). Hence,
P(A, B) returns true.
The proof of the above proposition shows that, under the condition of soundness,
the termination of P implies the completeness of P . In this case, P is called a
decision procedure for the logic if P is both sound and terminating. By the above
result, every decision procedure is sound and complete. In general, a decision
procedure usually refers to an algorithm which always halts with a correct answer
to a decision problem, such as “A is valid or not.” Some algorithms always halt with
one of the outcomes: “yes,” “no,” or “unknown”; they cannot be called “decision
procedures," because they may return "unknown" on some theorems and are thus not
complete.
Under the condition of soundness, the completeness of a proof procedure P does
not imply the termination of P , because a sound and complete proof procedure may
loop on a formula that is not a theorem and thus cannot be a decision procedure
by definition. In the literature, people call a sound and complete proof procedure a
semi-decision procedure for the logic, because it will halt on any formula which is
a theorem.
For any logic, we always look for its decision procedure that is both sound and
terminating. For example, propositional logic has different decision procedures for
different decision problems. If a decision procedure does not exist, we look for a
semi-decision procedure. For some logics, even a semi-decision procedure does not
exist. These comments will be more meaningful once we have formally defined the
logic in question.
Inference Rules
In structural proof theory, which is a major area of proof theory, inference rules
are used to construct a proof. In logic, an inference rule is a pattern which takes
formulas as premises and returns a formula as conclusion. For example, the rule of
modus ponens (MP) takes two premises, one in the form “If p then q” (i.e., p → q)
and another in the form p, and returns the conclusion q. A rule is sound if whenever
the premises are true under any interpretation, so is the conclusion.
An inference system S consists of a set A of axioms and a set R of inference
rules. The soundness of S comes from the soundness of every inference rule in R. A
proof of formula Fn in S = (A, R) is a sequence of formulas F1 , F2 , . . . , Fn such
that each Fi is either in A or can be generated by an inference rule of R, using the
formulas before Fi in the sequence as the premises.
Example 1.3.12 Let S = (A, R), A = {p → (q → r), p, q}, and R = {MP }. A
proof of r from A using R is given below:
1. p → (q → r) axiom
2. p axiom
3. q→r MP , 1, 2
4. q axiom
5. r MP , 3, 4
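Such proofs can be checked mechanically, as noted below. Here is a minimal Python sketch of a checker for an inference system whose only rule is MP; formulas are represented as nested tuples, with an implication p → q written ('->', p, q). The representation and names are illustrative, not prescribed by the book:

    def is_mp(p1, p2, concl):
        # modus ponens: from p1 and p1 -> concl (i.e., p2 = ('->', p1, concl)), infer concl
        return p2 == ("->", p1, concl)

    def check_proof(axioms, proof):
        for i, formula in enumerate(proof):
            if formula in axioms:
                continue
            if not any(is_mp(proof[a], proof[b], formula)
                       for a in range(i) for b in range(i)):
                return False       # step i is neither an axiom nor obtained by MP
        return True

    A = [("->", "p", ("->", "q", "r")), "p", "q"]
    proof = [A[0], "p", ("->", "q", "r"), "q", "r"]  # the five steps of Example 1.3.12
    print(check_proof(A, proof))                     # True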
One obvious advantage of such a proof is that these proofs can be checked either
by hand or automatically by computer. Checking formal proofs is usually simple,
whereas finding proofs (automated theorem proving) is generally hard. Checking
an informal proof in the mathematics literature, by contrast, may require weeks of
peer review, and the proof may still contain errors. Informal proofs of everyday mathematical
practice are unlike the proofs of proof theory. They are like high-level sketches
that would allow an expert to reconstruct a formal proof at least in principle, given
enough time and patience. For most mathematicians, writing a fully formal proof is
too pedantic and long-winded to be in common use.
In proof theory, proofs are typically presented as recursively defined data
structures such as lists (as shown here), or trees, which are constructed according
to the axioms and inference rules of the logical system. As such, proof theory is
syntactic in nature, in contrast to model theory, which is semantic in nature.
Every inference system can be converted to a proof procedure if a fair strategy is
adopted to control the use of inference rules. A fair strategy ensures that if an inference
rule is applicable at some point to produce a new formula, this new formula
will be derived eventually. If every inference rule is sound, the corresponding
proof procedure is sound. Extra effort is needed in general to show that the proof
procedure is complete or terminating.
Besides structural proof theory, some of the major areas of proof theory include
ordinal analysis, provability logic, reverse mathematics, proof mining, automated
theorem proving, and proof complexity.
One of the main aims of this book is to present algorithms invented for logic
and the software tools built up based on these algorithms. That is why we will
invest a disproportionate amount of effort (by the standard of conventional logic
textbooks) in the algorithmic aspect of logic. Set theory and model theory help us
to understand and specify the problems; proof theory helps us to create algorithms;
and computability theory helps us to know what problems have algorithms for them,
and what problems do not have algorithms at all.
Exercises
1. Decide if the following sentences are statements or not. If they are, use propo-
sitional variables (i.e., symbols) for simple statements and logic connectives to
express them.
(a) If I win the lottery, I’ll be poor.
(b) The man asked, “shut the door behind you.”
(c) Today is the first day of school.
(d) Today is either sunny or cloudy.
2. Pinocchio, an animated puppet, is punished for each lie that he tells by
undergoing further growth of his nose. The Pinocchio paradox arises when
Pinocchio says “My nose grows now.” Please explain why this is a paradox.
7. Let A = {0, 1}; please provide P(A) and P(P(A)).
8. Show that all countably infinite sets have the same size, and each countably
infinite set has a rank function.
9. Show that if r is the rank function of a countably infinite set S, then r induces
a well-order ⪰ on S: x ⪰ y iff r(x) ≥ r(y).
10. Let a, b be real numbers and a < b. Define a function f : [0, 1) → [a, b) by
f (x) = a + x(b − a), where [a, b) denotes the interval {x | a ≤ x < b}. Show
that f is bijective.
11. Answer the following questions with justification:
(a) Is the set of all odd integers countable?
(b) Is the set of real numbers in the interval [0, 0.1) countable?
(c) Is the set of angles in the interval [0°, 90°) countable?
(d) Is the set of all points in the plane with rational coordinates countable?
(e) Is the set of all Java programs countable?
(f) Is the set of all words using an English alphabet countable?
(g) Is the set of grains of sand on the Earth countable?
(h) Is the set of atoms in the solar system countable?
12. Prove the following properties:
(a) Function f : A → B is injective iff its inverse is well-defined and injective.
(b) An injective function f : A → B is total iff its inverse is surjective
(assuming a surjective function is not required to be total).
13. Prove Proposition 1.3.1: f : A → B is bijective iff both f and f −1 are total
functions.
14. Prove that the set R of rational numbers is countable.
15. Prove that the set N × {a, b, c} is countable, where N is the set of natural
numbers. Your proof should not use the property that countable sets are closed
under Cartesian product.
16. Decide with proof if the set N k is countable or not, where N is the set of natural
numbers and k ∈ N .
17. Decide with proof if the set Sk = {A | A ⊂ N , |A| = k} is countable or not,
where N is the set of natural numbers and k ∈ N .
18. Prove that if both A and B are countable and A∩B = ∅, then A∪B is countable.
19. Let each of A0, A1, . . . , Ai, . . . be countably infinite and, for all i, j ∈ N,
Ai ∩ Aj = ∅ for i ≠ j. Prove that S = A0 ∪ A1 ∪ A2 ∪ · · · is countably infinite.
20. Decide with proof if the set of all total functions f : {0, 1} → N is countable
or not.
21. Please provide a bijection f : N → {0, 1}∗ and show how to compute f (x) for
any x ∈ N . What is the time complexity of your algorithm for f (x)?
22. For the function γ in Definition 1.3.5, please define γ⁻¹ explicitly as a total
function from N to Σ*.
23. Prove that the set {a, b, c}∗ of all strings (of finite length) on {a, b, c} is
countable.
24. Prove that the set of all formal languages is uncountable, where a formal
language is any subset of Σ* and Σ ≠ ∅.
25. Prove that the following statements are logically equivalent:
(a) Any subset of a countable set is countable.
(b) Any subset of N is countable.
(c) If there is an injective function from set S to N , then S is countable.
(d) If there is a surjective function from N to S, then S is countable.
26. Barber’s paradox: In a village, there is only one barber who shaves all those,
and those only, who do not shave themselves. The question is, does the barber
shave himself? Let the relation s(x, y) be that “x shaves y” and b denote the
barber.
(a) Define A to be the set of the people shaved by the barber using s(x, y) and
b.
(b) Define B to be the set of the people who do not shave themselves.
(c) Use A and B to show that the answer to the question “does the barber shave
himself?” leads to a contradiction.
27. Suppose the set of natural numbers is constructed by the constant 0 and the
successor function s. Provide a definition of < and ≤ on the natural numbers.
No functions other than 0 and s are allowed when defining < or ≤.
28. Provide a definition of the power function, pow(x, y), which computes x^y, on the
natural numbers built up by 0 and the successor function s. You are only allowed
to use the functions add (addition) and mul (multiplication).
29. Provide a BNF (Backus–Naur form) for the set of binary trees, where each non-
empty node contains a natural number. You may use null : Tree for the empty
tree and node : Nat × Tree × Tree → Tree for a node of the tree.
References
1. Paul Vincent Spade and Jaakko J. Hintikka, History of logic. Encyclopedia Britannica, Dec. 17,
2020, www.britannica.com/topic/history-of-logic, retrieved Nov. 1, 2023
2. Daniel Cohnitz and Luis Estrada-Gonzalez, An Introduction to the Philosophy of Logic,
Cambridge University Press, Jun. 27, 2019
3. Joseph Mileti, Modern Mathematical Logic (Cambridge Mathematical Textbooks), Cambridge
University Press, Sep. 22, 2022
Part I
Propositional Logic
Chapter 2
Propositional Logic
2.1 Syntax
The syntax of propositional logic specifies what symbols are allowed and in what
forms these symbols are used. Propositional logic uses propositional variables and
logical operators as the only symbols. In the first chapter, we said that propositional
variables are symbols for denoting statements and take only true and false as truth
values. For convenience, we will use 1 for true and 0 for false. We will treat ⊤ and
⊥ as the two nullary logical operators such that the truth value of ⊤ is 1 and the
truth value of ⊥ is 0. Note that ⊤ and ⊥ are syntactic symbols while 1 and 0 are
semantic symbols.
Mathematicians use the words “not,” “and,” “or,” etc., for operators that change or
combine propositions. The meaning of these logical operators can be specified as a
function which takes Boolean values and returns a Boolean value. These functions
are called Boolean functions. For example, if p is a proposition, then so is ¬p and
the truth value of the proposition ¬p is determined by the truth value of p according
to the meaning of ¬: ¬(1) = 0 and ¬(0) = 1. That is, if the value of p is 1, then
the value of ¬(p) is 0; if the value of p is 0, then the value of ¬(p) is 1. Note that
the same symbol ¬ is used both as a Boolean operator (syntax side) and a Boolean
function (semantic side).
and ↔ are some of them. We may use operators of arity n > 2. For example, let
ite : B^3 → B be the if-then-else operator; if we use a truth table for the meaning of
ite(p, q, r), then the truth table will have eight rows. In general, there are 2^(2^n) n-ary
Boolean functions, because the number of functions f : A → B is |B|^|A| and, for an
n-ary Boolean function, |B| = 2 and |A| = |B^n| = |B|^n = 2^n.
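The count 2^(2^n) is easy to confirm numerically; the following Python snippet is merely illustrative:

    for n in range(4):
        # each of the 2^n rows of a truth table may independently be 0 or 1
        print(n, 2 ** (2 ** n))   # prints 0: 2, 1: 4, 2: 16, 3: 256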
2.1.2 Formulas
Not every combination of propositional variables and logic operators makes sense.
For instance, p () q ∨ ¬) is a meaningless string of symbols. The propositional
formulas are the set of well-formed strings built up by propositional variables and
logical operators by a rigorous set of rules.
Let op denote the binary operators and V the propositional variables used in an
application; then the set Formulas for this application is defined by the following BNF
grammar:
op ::= ∧ | ∨ | → | ⊕ | ↔
V ::= p | q | r | s | t
Formulas ::= ⊤ | ⊥ | V | ¬ Formulas | ( Formulas op Formulas )
We may add or delete operators or propositional variables at will in the above
definition. According to the definition, every member of Formulas is either one of
the constants ⊤ and ⊥, a propositional variable of V, the negation of a formula, or a
binary operator applied to two formulas. Here are some examples:
The formula tree of a formula A is defined recursively as follows:
1. If A is ⊤, ⊥, or a propositional variable, then the formula tree of A is a single
node labeled with A.
2. If A is ¬B, then the root node is labeled with ¬ and has one branch pointing to
the formula tree of B.
3. If A is B op C, then the root node is labeled with op and has two branches
pointing to the formula trees of B and C, respectively.
For example, the formula tree of (¬p ∧ q) → (p ∧ (q ∨ ¬r)) is displayed in
Fig. 2.1. The tree representation of a formula does not need parentheses.
A formula and its formula tree are two representations of the same object. We
are free to choose either representation for the convenience of discussion.
Definition 2.1.2 Given a formula A, we say B is a subformula of A if either B = A,
or A = ¬A1 and B is a subformula of A1, or A = (A1 op A2), where op is a binary
operator, and B is a subformula of A1 or of A2. If B is a subformula of A and B ≠ A,
then B is a proper subformula of A.
For example, ¬p ∨ ¬(q ∨ r) contains p, q, r, ¬p, q ∨ r, and ¬(q ∨ r) as proper
subformulas. Intuitively, a subformula is just a formula represented by any subtree
in the formula tree.
Definition 2.1.3 Given a subformula B of A and another formula C, we use
A[B ↦ C] to denote the formula obtained by substituting all occurrences of B
by C in A. If B is a propositional variable, then A[B ↦ C] is called an instance of
A.
For example, if A is ¬p ∨ ¬(q ∨ r), B is ¬(q ∨ r), and C is ¬q ∧ ¬r, then
A[B ↦ C] is ¬p ∨ (¬q ∧ ¬r). A[p ↦ p ∧ q] = ¬(p ∧ q) ∨ ¬(q ∨ r) is an instance
of A.
2.2 Semantics
The title of the previous section is syntax, though we talked about Boolean values,
truth tables, and the meaning of logical operators, which are, strictly speaking,
semantic concepts of the logic. Now, let us focus on the meaning of propositional
variables and formulas.
2.2.1 Interpretations
A propositional variable may be assigned a truth value, either true or false. This
truth value assignment is considered to be the semantics of the variable. For a
formula, an assignment of truth values to every propositional variable is said to be an
interpretation of the formula. If A is a formula built on the set VP of propositional
variables, then an interpretation of A is a function σ : VP → {1, 0}. It is easy to
check that there are 2^|VP| distinct interpretations.
Suppose σ is an interpretation over VP = {p, q} such that σ (p) = 1 and σ (q) =
0. We may write σ = {p → 1, q → 0}. We may also use (1, 0) for the same σ ,
assuming VP is a list (p, q) and σ (p, q) = (1, 0).
An alternative representation of an interpretation σ is to use a subset of VP . Given
σ , let
Xσ = {x ∈ VP | σ (x) = 1}
Then there is a one-to-one correspondence between σ and Xσ. So, we may use Xσ to
represent σ. For example, if VP = {p, q}, then the four interpretations over VP are
given by the power set of VP: ∅, {p}, {q}, and {p, q}.
Given Xσ, let Yσ = Xσ ∪ {¬x | x ∈ VP − Xσ};
then Yσ is a set of literals such that every variable of VP appears in Yσ exactly once.
For instance, let VP = {p, q} and Xσ = ∅, then Yσ = {¬p, ¬q}; if Xσ = {p},
Yσ = {p, ¬q}. That is, by adding the negations of the missing variables in Xσ ,
we obtain such a representation Yσ for σ . Using a set of literals to represent an
interpretation has an additional advantage: It can represent a partial interpretation
where some propositional variables do not have a truth value.
For a compact representation of an interpretation, we may write a set of literals
as a conjunction of these literals, often omitting the conjunction operator ∧. For
example, {p, ¬q} can be written as p ∧ ¬q, or simply pq.
Given σ, we can check the truth value of a formula A under σ, denoted by σ(A).
This notation means that we treat σ as a function from the formulas to {0, 1}, as the
unique extension of σ : VP → {0, 1}. On the other hand, the notation Aσ is used to
denote the formula obtained by substituting each propositional variable p by σ(p),
if we regard 1 as ⊤ and 0 as ⊥.
Example 2.2.1 Given the formula A = (p ∧ ¬q) ∨ r and an interpretation σ =
{p → 1, q → 1, r → 0}, we can rewrite σ as {p, q, ¬r}. Replacing p by 1, q by 1,
and r by 0 in A, we have Aσ = (1 ∧ ¬1) ∨ 0. Applying the meanings of ¬, ∧, and
∨, we obtain 0. In this case, we say the formula is evaluated to 0 under σ, denoted
by σ(A) = 0. Given another interpretation σ′ = {p, q, r}, the same formula will be
evaluated to 1, i.e., σ′(A) = 1.
Recall that we use A[B ↦ C] to denote an instance of A where every occurrence
of propositional variable B is replaced by C. An interpretation σ is a special case
of the substitution of formulas where B is a variable and C is either 1 or 0. For
example, if σ = {p, ¬q}, then Aσ = A[p ↦ 1][q ↦ 0]. Strictly speaking, Aσ
is not a propositional formula: It is the meaning of A under σ. Hence, we assume
⊤σ = 1 and ⊥σ = 0 for any σ, where ⊤ and ⊥ are propositional formulas, and 1
and 0 are their truth values, respectively.
To obtain σ (A), we may use the algorithm eval.
Algorithm 2.2.2 The algorithm eval will take a formula A and an interpretation σ
and returns a Boolean value.
proc eval(A, σ )
if A = ⊤ return 1
if A = ⊥ return 0
if A ∈ VP return σ(A)
if A is ¬B return ¬eval(B, σ)
else A is (B op C); return eval(B, σ) op eval(C, σ)
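The pseudocode translates almost line by line into a program. Below is a minimal Python sketch of eval; representing formulas as nested tuples, with 'T' and 'F' standing for ⊤ and ⊥, is an assumption of the sketch, not the book's notation:

# eval(A, sigma): formulas are nested tuples; variables are strings.
OPS = {'and': lambda x, y: x and y,
       'or':  lambda x, y: x or y,
       '->':  lambda x, y: (1 - x) or y,
       'xor': lambda x, y: x ^ y,
       '<->': lambda x, y: 1 - (x ^ y)}

def eval_formula(A, sigma):
    if A == 'T': return 1
    if A == 'F': return 0
    if isinstance(A, str): return sigma[A]            # propositional variable
    if A[0] == 'not': return 1 - eval_formula(A[1], sigma)
    op, B, C = A
    return OPS[op](eval_formula(B, sigma), eval_formula(C, sigma))

# Example 2.2.3: sigma = {p -> 1, q -> 0}, formula p -> (p and not q)
sigma = {'p': 1, 'q': 0}
A = ('->', 'p', ('and', 'p', ('not', 'q')))
print(eval_formula(A, sigma))    # prints 1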
Example 2.2.3 Let σ = {p, ¬q}. Then the execution of eval(p → p ∧ ¬q, σ ) will
return 1:
eval(p → p ∧ ¬q, σ ) calls
eval(p, σ ), which returns 1; and
eval(p ∧ ¬q, σ ), which calls
eval(p, σ ), which returns 1; and
eval(¬q, σ ), which calls
eval(q, σ ), which returns 0; and
returns ¬0, i.e., 1;
returns 1 ∧ 1, i.e., 1;
returns 1 → 1, i.e., 1.
What eval does is to traverse the formula tree of A bottom-up: If the node is labeled
with a variable, use σ to get the truth value; otherwise, compute the truth value of
this node under σ using the operator at that node with the truth values from its
children. The process of running eval is exactly what we do when constructing a
truth table for p → p ∧ ¬q. The truth values under p and q in the truth table give
us all the interpretations σ and the truth values of A = ¬q, p ∧ ¬q, or p → p ∧ ¬q
are the values of eval(A, σ ).
p q ¬q p ∧ ¬q p → p ∧ ¬q
0 0 1 0 1
0 1 0 0 1
1 0 1 1 1
1 1 0 0 0
Theorem 2.2.4 Algorithm eval(A, σ ) returns 1 iff σ (A) = 1, and runs in O(|A|)
time, where |A| denotes the number of symbols, excluding parentheses, in A.
|A| is also the number of nodes in the formula tree of A. The proof is left as an
exercise.
Corollary 2.2.5 For any formulas A and B, and any interpretation σ , the following
equations hold.
σ (A ∨ B) = σ (A) ∨ σ (B)
σ (A ∧ B) = σ (A) ∧ σ (B)
σ (¬A) = ¬σ (A)
The truth value of a formula in propositional logic reflects the two foundational
principles of Boolean logic: the principle of bivalence, which allows only two truth
values, and the principle of extensionality, by which the truth value of a general
formula depends only on the truth values of its parts, not on their informal meaning.
Interpretations play a very important role in propositional logic and give rise to many
important concepts.
Definition 2.2.6 Given a formula A and an interpretation σ , σ is a model of A if
eval(A, σ ) = 1. If A has a model, A is satisfiable.
Given a set VP of propositional variables, we will use AllP to denote the
set of all interpretations over VP. For example, if VP = {p, q}, then AllP =
{{p, q}, {p, ¬q}, {¬p, q}, {¬p, ¬q}}, abbreviated as AllP = {pq, pq̄, p̄q, p̄q̄}.
In general, |AllP| = 2^|VP|. We may use eval to look for a model of A by examining
every interpretation in AllP.
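Such a brute-force model search can be sketched in a few lines of Python (our own illustration; here the formula ¬p → ¬q of the upcoming Example 2.2.8 is encoded through its definition p ∨ ¬q):

from itertools import product

def models(variables, f):
    # All interpretations (as dicts) making f true; f maps a dict to 0/1.
    found = []
    for values in product([0, 1], repeat=len(variables)):
        sigma = dict(zip(variables, values))
        if f(sigma):
            found.append(sigma)
    return found

A = lambda s: s['p'] or 1 - s['q']       # not p -> not q  ==  p or not q
print(models(['p', 'q'], A))
# [{'p': 0, 'q': 0}, {'p': 1, 'q': 0}, {'p': 1, 'q': 1}]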
Definition 2.2.7 Given a formula A, let M(A) be the set of all models of A. If
M(A) = ∅, A is unsatisfiable. If M(A) = AllP, i.e., every interpretation is a
model of A, then A is valid, or a tautology. We write ⊨ A if A is valid.
Being valid is different from being useful or efficient. For instance, the statement
p ∨ ¬p is valid but neither useful nor efficient (the formula ⊤ is more concise).
If there is a truth table for a formula A, M(A) can be easily obtained from the
truth table, as the truth values under the variables of A are all the interpretations and
the models of A are given in those rows of the table where the truth value of A is 1.
Example 2.2.8 Let VP = {p, q}. If A is ¬p → ¬q, then M(A) = {pq, pq̄, p̄q̄}.
If B is p ∨ (¬q → ¬p), then M(B) = AllP. From M(A) and M(B), we know
that both A and B are satisfiable, and B is also valid, i.e., ⊨ B is true.
A valid formula is one which is always true in every interpretation, no matter
what truth values its variables may take, that is, every interpretation is its model.
The simplest example is ⊤, or p ∨ ¬p. There are many formulas that we want to
know if they are valid, and this is done by so-called theorem proving, either by hand
or automatically.
We may think about valid formulas as capturing fundamental logical truths. For
example, the transitivity property of implication states that if one statement implies
a second one, and the second one implies a third, then the first implies the third. In
logic, the transitivity can be expressed as the following formula:
(p → q) ∧ (q → r) → (p → r)
The validity of the above formula confirms the truth of this property of implication.
There are many properties like this for other logical operators, such as the
commutativity and associativity of ∧ and ∨, and they all can be stated by a tautology.
Theorem 2.2.9 (Substitution) If A is valid, so is any instance of A. That is, if p is
a propositional variable in A and B is any formula, then A[p ↦ B] is valid.
Proof For any interpretation σ of A[p ↦ B], let σ(p) = σ(B). Applying σ to the
formula trees of A and A[p ↦ B], the truth values of all the nodes of A must
be identical to those of the corresponding nodes of A[p ↦ B]. Since A is valid, the
root node of A must be 1 under σ, so the root node of A[p ↦ B] must have the
same truth value, i.e., 1. Since σ is arbitrary, A[p ↦ B] must be valid.
The significance of the above theorem is that if we have a tautology for one
variable, the tautology holds when the variable is substituted by any formula. For
example, from p ∨ ¬p, we have A ∨ ¬A for any A. On the other hand, when we
try to prove a tautology involving symbols A, B, . . . , we may treat each of these
symbols as a propositional variable. For example, to prove (A ∧ B) ↔ (B ∧ A), we
may prove instead (p ∧ q) ↔ (q ∧ p).
An unsatisfiable formula is one which does not have any model, that is, no
interpretation is its model. The simplest example is ⊥, or p ∧ ¬p. Validity and
unsatisfiability are dual concepts, as stated by the following proposition.
Proposition 2.2.10 A formula A is valid if and only if ¬A is unsatisfiable.
Theorem 2.2.11 For any formulas A and B, (a) M(A ∨ B) = M(A) ∪ M(B);
(b) M(A ∧ B) = M(A) ∩ M(B); and (c) M(¬A) = AllP − M(A).
Proof To show M(A ∨ B) = M(A) ∪ M(B), we need to show that, for any
interpretation σ , σ ∈ M(A ∨ B) iff σ ∈ M(A) ∪ M(B). By definition, σ ∈
M(A ∨ B) iff σ (A ∨ B) = 1. By Corollary 2.2.5, σ (A ∨ B) = σ (A) ∨ σ (B). So
σ (A∨B) = 1 iff σ (A) = 1 or σ (B) = 1. σ (A) = 1 or σ (B) = 1 iff σ (A)∨σ (B) =
1. σ (A) ∨ σ (B) = 1 iff σ ∈ M(A) ∪ M(B).
The proof of (b) and (c) is left as an exercise.
Example 2.2.12 Let VP = {p, q}. M(p) = {pq, pq̄}, M(q) = {pq, p̄q};
M(¬p) = AllP − M(p) = {p̄q, p̄q̄}, and M(¬p ∨ q) = M(¬p) ∪ M(q) =
{pq, p̄q, p̄q̄}.
2.2.3 Equivalence
In natural language, one statement can be expressed in different forms with the same
meaning. For example, “If I won the lottery, I must be rich.” The same meaning can
also be expressed as “Since I am not rich, I didn’t win the lottery.” Introducing p for “I
won the lottery” and q for “I’m rich,” the first statement becomes p → q and the
second statement becomes ¬q → ¬p. It happens that M(p → q) = M(¬q →
¬p), as Table 2.2 shows that both formulas have the same set of models. In logic,
these formulas are considered to be equivalent.
The formula ¬q → ¬p is called the contrapositive of the implication p → q.
The truth table shows that an implication and its contrapositive are equivalent: They
are just different ways of saying the same thing. In contrast, the converse of p → q
is the formula q → p, which has a different set of models (as shown in Table 2.2).
Definition 2.2.13 Given two formulas A and B, A and B are logically equivalent
if M(A) = M(B), denoted by A ≡ B.
A ≡ B means that, for every interpretation σ , σ (A) = σ (B), so σ ∈ M(A)
iff σ ∈ M(B). The relation ≡ over formulas is an equivalence relation as ≡ is
reflexive, symmetric, and transitive. The relation ≡ is also a congruence relation as
A ≡ C and B ≡ D imply that ¬A ≡ ¬C and A o B ≡ C o D for any binary
operator o. This property allows us to obtain an equivalent formula by replacing a
part of the formula by an equivalent one. The relation ≡ plays a very important role
in logic as it is used to simplify formulas, to convert formulas into equivalent normal
forms, or to provide alternative definitions for logical operators.
In arithmetic one writes simply s = t to express that the terms s, t represent the
same function. For example, “(x + y)² = x² + 2xy + y²” expresses the equal values
of the terms on the two sides of “=”. In propositional logic, however, we use the
equality sign like A = B only for the syntactic identity of the formulas A and B.
Therefore, the equivalence of formulas must be denoted differently, such as A ≡ B.
Theorem 2.2.14 For any formulas A, B, and C, where B is a subformula of A and
B ≡ C, we have A ≡ A[B ↦ C].
Proof For any interpretation σ, σ(B) = σ(C), since B ≡ C. Apply σ to the
formula trees of A and A[B ↦ C] and compare the truth values of all corresponding
nodes of the two trees, ignoring the proper subtrees of B and C. Since σ(B) =
σ(C), they must have the same truth values, that is, σ(A) = σ(A[B ↦ C]).
The equivalence relation is widely used to simplify formulas and has real
practical importance in computer science. Formula simplification in software can
make a program easier to read and understand. Simplified programs may also run
faster, since they require fewer operators. In hardware design, simplifying formulas
can decrease the number of logic gates on a chip because digital circuits can be
expressed by logical formulas. Minimizing logical formulas corresponds to reducing
the number of gates in the circuit. The payoff of gate minimization is potentially
enormous: A chip with fewer gates is smaller, consumes less power, has a lower
defect rate, and is cheaper to manufacture.
Suppose a formula A contains k propositional variables; then A can be viewed
as a Boolean function f : {1, 0}^k → {1, 0}. For example, ¬p ∨ q contains
two variables and can be regarded as a Boolean function f(p, q). The truth table
(Table 2.2) reveals that f(p, q), i.e., ¬p ∨ q, always has the same truth value as
p → q, so f and → are the same function. As a result, we may use ¬p ∨ q as the
definition of p → q. As another example, the if-then-else function, ite : {1, 0}³ →
{1, 0}, can be defined by ite(1, B, C) = B and ite(0, B, C) = C, instead of using
a truth table of eight rows.
The following proposition lists many equivalent pairs of formulas.
Proposition 2.2.15 For any formulas A, B, and C, the following equivalences hold:
A ∨ A ≡ A                            A ∧ A ≡ A
A ∨ B ≡ B ∨ A                        A ∧ B ≡ B ∧ A
(A ∨ B) ∨ C ≡ A ∨ (B ∨ C)            (A ∧ B) ∧ C ≡ A ∧ (B ∧ C)
A ∨ ⊥ ≡ A                            A ∧ ⊤ ≡ A
A ∨ ¬A ≡ ⊤                           A ∧ ¬A ≡ ⊥
A ∨ ⊤ ≡ ⊤                            A ∧ ⊥ ≡ ⊥
A ∨ (B ∧ C) ≡ (A ∨ B) ∧ (A ∨ C)      A ∧ (B ∨ C) ≡ (A ∧ B) ∨ (A ∧ C)
¬⊤ ≡ ⊥; ¬⊥ ≡ ⊤                       ¬¬A ≡ A
¬(A ∨ B) ≡ ¬A ∧ ¬B                   ¬(A ∧ B) ≡ ¬A ∨ ¬B
A → B ≡ ¬A ∨ B                       A → B ≡ ¬B → ¬A
A ⊕ B ≡ (A ∨ B) ∧ (¬A ∨ ¬B)          A ↔ B ≡ (A ∨ ¬B) ∧ (¬A ∨ B)
Some of the above equivalences have special names. For instance, ¬¬A ≡ A is
called double negation. ¬(A ∨ B) ≡ ¬A ∧ ¬B and ¬(A ∧ B) ≡ ¬A ∨ ¬B are
called de Morgan’s laws. A ∨ (B ∧ C) ≡ (A ∨ B) ∧ (A ∨ C) and A ∧ (B ∨ C) ≡
(A ∧ B) ∨ (A ∧ C) are called the distributive laws.
One way to show that two formulas are equivalent is to prove the validity of a
formula, as given by the following proposition:
Proposition 2.2.16 For any formulas A and B, A ≡ B iff ⊨ A ↔ B.
Proof A ≡ B means, for any interpretation σ, σ(A) = σ(B), which means σ(A ↔
B) = 1, i.e., any σ is a model of A ↔ B, i.e., A ↔ B is valid.
It follows that A ≡ ⊤ iff ⊨ A when B = ⊤. The above proposition shows the
relationship between ≡, which is a semantic notion, and ↔, which is a syntactic
symbol. Moreover, Theorem 2.2.9 allows us to prove the equivalences using the
truth table method, treating A, B, and C as propositional variables. For example,
from (p → q) ≡ (¬p ∨ q), we know (p → q) ↔ (¬p ∨ q) is valid; from the
validity of (p → q) ↔ (¬p ∨ q), we know (A → B) ↔ (¬A ∨ B) is valid for any
formulas A and B, thus (A → B) ≡ (¬A ∨ B).
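Proposition 2.2.16 thus turns equivalence checking into a mechanical test: compare the truth values of the two formulas under every interpretation. A minimal Python sketch of this test (our own illustration; the lambda encodings of the formulas are assumptions of the sketch), checking (p → q) ≡ (¬p ∨ q) with two different encodings of the implication:

from itertools import product

def equivalent(variables, A, B):
    # A ≡ B iff A <-> B holds under every interpretation.
    for vals in product([0, 1], repeat=len(variables)):
        sigma = dict(zip(variables, vals))
        if A(sigma) != B(sigma):
            return False
    return True

imp  = lambda s: 1 - (s['p'] and 1 - s['q'])   # p -> q as not (p and not q)
disj = lambda s: (1 - s['p']) or s['q']        # not p or q
print(equivalent(['p', 'q'], imp, disj))       # True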
2.2.4 Entailment
To capture the relation between the premises and the conclusion in logic, we have
the notion of entailment.
Definition 2.2.17 Given two formulas A and B, we say A entails B, or B is a
logical consequence of A, denoted by A ⊨ B, if M(A) ⊆ M(B).
Thus, since M(⊤) = AllP and AllP ⊆ M(B) implies M(B) = AllP, ⊤ ⊨ B and
⊨ B have the same meaning, i.e., B is valid.
The above definition allows many irrelevant formulas to be logical consequences
of A, including all tautologies and logically equivalent formulas. Despite this
irrelevant relationship between A and B, the concept of entailment is indispensable
in logic. For instance, an inference rule is a pattern which takes formulas as premises
and returns a formula as conclusion. To check the soundness of the inference rule,
we let the premises be represented by formula A and the conclusion represented by
B, and check if A | B, i.e., B is a logical consequence of A. If the number of
variables is small, we may use the truth table method to check if M(A) ⊆ M(B),
that is, if every model of A remains to be a model of B.
Example 2.2.18 The propositional version of the modus ponens rule says that
given the premises p → q and p, we draw the conclusion q. Let A be
(p → q) ∧ p; then M(A) = {pq}, and M(q) = {pq, p̄q}. Since M(A) ⊆ M(q),
q is a logical consequence of A, or A ⊨ q.
Proposition 2.2.19 The relation ⊨ is transitive, that is, if A ⊨ B and B ⊨ C, then
A ⊨ C.
The proof is left as an exercise.
Definition 2.2.20 Given a formula A, the set {B | A ⊨ B} is called the theory of
A and is denoted by T(A). Every member of T(A) is called a theorem of A.
In the above definition, A can be a set of formulas to represent the conjunction
of its members.
Proposition 2.2.21 The following three statements are equivalent: (a) A ⊨ B;
(b) M(A) ⊆ M(B); and (c) T(B) ⊆ T(A).
Proof (a) → (b): By definition, A ⊨ B iff M(A) ⊆ M(B).
(b) → (c): Also, by definition, B ⊨ C for any C ∈ T(B). If A ⊨ B, then
A ⊨ C because of the transitivity of ⊨, so C ∈ T(A).
(c) → (a): If T(B) ⊆ T(A), since B ∈ T(B), B ∈ T(A), thus A ⊨ B.
Corollary 2.2.22 For any formula A, T(⊤) ⊆ T(A) ⊆ T(⊥).
Proof This holds because ⊥ ⊨ A and A ⊨ ⊤. Note that T(⊤) contains every
tautology and T(⊥) contains every formula.
Corollary 2.2.23 For any formulas A and B, the following statements are equiva-
lent: (a) A ≡ B; (b) M(A) = M(B); and (c) T (A) = T (B).
Algorithm 2.3.3 The algorithm to convert a formula into NNF applies the following
three groups of rules:
1. Use the following equivalences to remove →, ⊕, and ↔ from the formula:
A → B ≡ ¬A ∨ B.
A ⊕ B ≡ (A ∨ B) ∧ (¬A ∨ ¬B).
A ↔ B ≡ (A ∨ ¬B) ∧ (¬A ∨ B).
2. Use de Morgan’s laws to move ¬ inside ∧ and ∨:
¬(A ∨ B) ≡ ¬A ∧ ¬B.
¬(A ∧ B) ≡ ¬A ∨ ¬B.
3. Use ¬¬A ≡ A to remove double negations.
For example, consider the formula
(¬p1 ∨ ((¬p2 ∨ p3) ∧ (p2 ∨ ¬p3))) ∧ (p1 ∨ ¬((¬p2 ∨ p3) ∧ (p2 ∨ ¬p3))).
Its NNF is
(¬p1 ∨ ((¬p2 ∨ p3) ∧ (p2 ∨ ¬p3))) ∧ (p1 ∨ ((p2 ∧ ¬p3) ∨ (¬p2 ∧ p3))).
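The three groups of rules can be combined into one recursive pass that eliminates →, ⊕, and ↔ and pushes negations inward. A compact Python sketch (our own, using the nested-tuple representation assumed earlier):

def nnf(A, neg=False):
    # Return an NNF of A; neg=True means we are converting "not A".
    if A in ('T', 'F'):
        return {'T': 'F', 'F': 'T'}[A] if neg else A
    if isinstance(A, str):                       # propositional variable
        return ('not', A) if neg else A
    if A[0] == 'not':
        return nnf(A[1], not neg)                # handles double negation
    op, B, C = A
    if op == '->':  return nnf(('or', ('not', B), C), neg)
    if op == 'xor': return nnf(('and', ('or', B, C), ('or', ('not', B), ('not', C))), neg)
    if op == '<->': return nnf(('and', ('or', B, ('not', C)), ('or', ('not', B), C)), neg)
    if neg:                                      # de Morgan's laws
        dual = 'and' if op == 'or' else 'or'
        return (dual, nnf(B, True), nnf(C, True))
    return (op, nnf(B, False), nnf(C, False))

# The subformula from the example above: not ((not p2 or p3) and (p2 or not p3))
A = ('not', ('and', ('or', ('not', 'p2'), 'p3'), ('or', 'p2', ('not', 'p3'))))
print(nnf(A))   # ('or', ('and', 'p2', ('not', 'p3')), ('and', ('not', 'p2'), 'p3'))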
Since ∨ is commutative and associative and X ∨ X ≡ X, a clause can be
represented by a set of literals. In this case, we use | for ∨ and p̄ for ¬p in a clause.
Definition 2.3.10 If a clause includes every variable exactly once, the clause is full.
A CNF is called a full CNF if every clause of the CNF is full.
If a clause C misses a variable r, we may replace C by two clauses: {(C | r), (C |
r̄)}. Repeating this process for every missing variable, we will obtain an equivalent
full CNF. Transforming a set of clauses into an equivalent full CNF is sound because,
for any formulas A and B, the following relation holds:
A ≡ (A ∨ B) ∧ (A ∨ ¬B).
The concept of full CNF is useful for theoretical proofs. Unfortunately, the size
of a full CNF is exponential in terms of number of variables, too big for any practical
application. CNFs are useful to specify many problems in practice, because practical
problems are often specified by a set of constraints. Each constraint can be specified
by a set of clauses, and the conjunction of all clauses from each constraint gives us
a complete specification of the problem.
For example, the following two DNFs represent the same Boolean function:
p ∧ ¬q ∨ ¬p ∧ ¬r ∨ q ∧ r ≡ ¬p ∧ q ∨ p ∧ r ∨ ¬q ∧ ¬r
A truth table is often used to define a Boolean function. In the truth table, the
truth value of the function is given for every interpretation of the function. Each
interpretation corresponds to a full product and the function can be defined as a sum
(disjunction) of all the products for which the function is true in the corresponding
interpretation.
Example 2.3.16 Suppose we want to define a Boolean function f : B³ → B,
where B = {0, 1} and the truth value of f(a, b, c) is given in the following truth
table. Following the convention of EDA (electronic design automation), we will use
+ for ∨, · for ∧, and Ā for ¬A.
a b c f Products      Clauses
0 0 0 0 m0 = āb̄c̄     M0 = a + b + c
0 0 1 1 m1 = āb̄c     M1 = a + b + c̄
0 1 0 0 m2 = ābc̄     M2 = a + b̄ + c
0 1 1 1 m3 = ābc     M3 = a + b̄ + c̄
1 0 0 0 m4 = ab̄c̄     M4 = ā + b + c
1 0 1 1 m5 = ab̄c     M5 = ā + b + c̄
1 1 0 0 m6 = abc̄     M6 = ā + b̄ + c
1 1 1 0 m7 = abc     M7 = ā + b̄ + c̄
The rows where f = 1 give the full DNF definition of f: f = m1 + m3 + m5. The
rows where f = 0 give the negation of f:
f̄ = m0 + m2 + m4 + m6 + m7
Negating f̄ yields
f = M0 · M2 · M4 · M6 · M7
where · stands for ∧ and m̄i ≡ Mi. In other words, f can be defined by a full CNF.
Note that M0 · M2 · M4 · M6 · M7 is the dual of m0 + m2 + m4 + m6 + m7. In general,
the dual of a DNF is a CNF, and the dual of a CNF is a DNF.
Once we know the principle, we can construct a full CNF directly from the truth
table of a formula. That is, for each non-model interpretation in the truth table, we
create one full clause: If a variable is true in the interpretation, the negative literal of
this variable is in the clause; if a variable is false, the positive literal of the variable
is in the clause.
Example 2.3.17 The truth table of (p ∨ q) → ¬r has eight rows, and three
interpretations of p, q, r are non-models: ⟨1, 1, 1⟩, ⟨1, 0, 1⟩, and ⟨0, 1, 1⟩. From
these three interpretations, we obtain three full clauses: (¬p ∨ ¬q ∨ ¬r) ∧ (¬p ∨
q ∨ ¬r) ∧ (p ∨ ¬q ∨ ¬r), or {(p̄ | q̄ | r̄), (p̄ | q | r̄), (p | q̄ | r̄)}.
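The recipe is easy to mechanize: one full clause per non-model row, negating the variables that are true in that row. A small Python sketch (our own) that recomputes the clauses of Example 2.3.17:

from itertools import product

def full_cnf(variables, f):
    # One full clause per non-model row of f.
    clauses = []
    for vals in product([0, 1], repeat=len(variables)):
        sigma = dict(zip(variables, vals))
        if not f(sigma):                     # a non-model of f
            clauses.append([('not ' + v) if sigma[v] else v for v in variables])
    return clauses

# (p or q) -> not r, expanded via the definition of ->
f = lambda s: (1 - (s['p'] or s['q'])) or (1 - s['r'])
print(full_cnf(['p', 'q', 'r'], f))
# the three clauses of Example 2.3.17 (in a different order)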
Let ite : {1, 0}³ → {1, 0} be the “if-then-else” operator such that, for any formulas A
and B, ite(1, A, B) ≡ A and ite(0, A, B) ≡ B. In fact, ite, 1, and 0 can be used to
represent any logical operator. For example, ¬y ≡ ite(y, 0, 1). Table 2.3 shows that
every binary logical operator can be represented by ite. Following the convention
on BDDs, we will use 1 for ⊤ and 0 for ⊥ in every propositional formula.
Definition 2.3.18 A formula is said to be an ITE normal form (INF) if the formula
is 0, 1, or of form ite(p, A, B) where p is a propositional variable, and A and B are
INFs.
An INF uses at most three operators: 0 (for ⊥), 1 (for ⊤), and ite. Note that 1
and 0 are INFs, but p is not; ite(p, 1, 0) is.
Proposition 2.3.19 Every propositional formula can be transformed into an equiv-
alent INF.
For example, let A be ¬p ∨ q ∧ ¬r; its equivalent INF is ite(p, ite(q, ite(r, 0, 1),
0), 1). To prove the above proposition, we provide the algorithm below to do the
transformation.
Algorithm 2.3.20 The algorithm to convert a formula into INF is a recursive
procedure convertINF:
proc convertINF(A)
if A = 1 return 1;
if A = 0 return 0;
else if A contains p ∈ VP
return ite(p, convertINF(A[p ↦ 1]), convertINF(A[p ↦ 0]));
else return simplify(A).
Note that A[p ↦ 1] (or A[p ↦ 0]) stands for the formula resulting from
replacing every occurrence of p by 1 (or 0) in A. The subroutine simplify(A)
will return the truth value of formula A which has no propositional variables. The
formula returned by convertINF contains only ite, 0, and 1 as the logical operators.
All the propositional variables appear as the first argument of ite.
Example 2.3.21 Let A be ¬p ∨ q ∧ ¬r; then A[p ↦ 1] = ¬1 ∨ q ∧ ¬r ≡
q ∧ ¬r and A[p ↦ 0] = ¬0 ∨ q ∧ ¬r ≡ 1. So convertINF(A) = ite(p,
convertINF(q ∧ ¬r), 1). convertINF(q ∧ ¬r) = ite(q, convertINF(¬r), 0), and
convertINF(¬r) = ite(r, 0, 1). Putting the pieces together, convertINF(A) =
ite(p, ite(q, ite(r, 0, 1), 0), 1).
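convertINF is Shannon expansion: substitute a variable by the two constants, simplify, and recurse. A Python sketch (our own, under the nested-tuple representation assumed earlier; simplify here also propagates constants inside a formula, matching the ≡ steps of the hand computation above):

def substitute(A, p, v):
    # Replace every occurrence of variable p in A by the constant v ('T' or 'F').
    if A == p: return v
    if isinstance(A, str): return A
    return (A[0],) + tuple(substitute(B, p, v) for B in A[1:])

def variables(A):
    if isinstance(A, str):
        return set() if A in ('T', 'F') else {A}
    return set().union(*(variables(B) for B in A[1:]))

def simplify(A):
    # Constant propagation: not T = F, T and B = B, F or B = B, etc.
    if isinstance(A, str): return A
    if A[0] == 'not':
        B = simplify(A[1])
        return {'T': 'F', 'F': 'T'}.get(B, ('not', B))
    op, B, C = A[0], simplify(A[1]), simplify(A[2])
    if op == 'and':
        if 'F' in (B, C): return 'F'
        if B == 'T': return C
        if C == 'T': return B
    if op == 'or':
        if 'T' in (B, C): return 'T'
        if B == 'F': return C
        if C == 'F': return B
    return (op, B, C)

def convert_inf(A):
    A = simplify(A)
    if A == 'T': return 1
    if A == 'F': return 0
    p = min(variables(A))              # pick a variable (order is arbitrary)
    return ('ite', p, convert_inf(substitute(A, p, 'T')),
                      convert_inf(substitute(A, p, 'F')))

A = ('or', ('not', 'p'), ('and', 'q', ('not', 'r')))   # not p or (q and not r)
print(convert_inf(A))   # ('ite', 'p', ('ite', 'q', ('ite', 'r', 0, 1), 0), 1)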
Fig. 2.2 (a) The BDD of ite(p, ite(q, ite(r, 0, 1), 0), 1) from the last example; (b) the BDD of
INF ite(x1, ite(x2, 1, 0), ite(x2, ite(x3, 1, 0), ite(x3, 0, 1))) derived from x̄1x̄2x̄3 ∨ x1x2 ∨ x2x3
Fig. 2.3 The first BDD uses a1 > b1 > a2 > b2 > a3 > b3 and the second BDD uses a1 > a2 >
a3 > b1 > b2 > b3 for the same formula (a1 ∧ b1 ) ∨ (a2 ∧ b2 ) ∨ (a3 ∧ b3 )
The input A to ROBDD can contain the nodes of a ROBDD, so that we can
perform various logical operations on ROBDDs.
Example 2.3.24 Assume a > b > c > d, and let A be ite(F, G, H), where F =
ite(a, 1, B), B = ite(b, 1, 0), G = ite(a, C, 0), C = ite(c, 1, 0), H =
ite(b, 1, D), and D = ite(d, 1, 0). ROBDD(A) will call ROBDD(A[a ↦ 1]) and
ROBDD(A[a ↦ 0]), which return C and J = ite(b, 0, D), respectively. Finally,
ROBDD(A) returns ite(a, C, J).
ROBDDs have been used for representing Boolean functions, symbolic simulation
of combinational circuits, equivalence checking and verification of Boolean functions,
and finding and counting models of propositional formulas.
There exist many optimization problems in propositional logic. For example, one
of them is to find a variable order for a formula, so that the resulting ROBDD is
minimal. In this section, we introduce three optimization problems.
Recall the NAND (↑) and NOR (↓) operators, defined by
A ↑ B ≡ ¬(A ∧ B) and A ↓ B ≡ ¬(A ∨ B).
Logic minimization, also called logic optimization, is a part of the circuit design
process; its goal is to obtain the smallest combinational circuit representing a given
Boolean formula. Logic minimization seeks an equivalent representation
of the specified logic circuit under one or more specified constraints. Generally,
the circuit is constrained to a minimum chip area meeting a pre-specified delay.
Decreasing the number of gates will reduce the power consumption of the circuit.
Choosing gates with fewer transistors will reduce the circuit area. Decreasing
the number of nested levels of gates will reduce the delay of the circuit. Logic
minimization can substantially reduce the cost of circuits and improve their quality.
In the early days of electronic design automation (EDA), a Boolean function was often
defined by a truth table, and we can derive a full DNF (sum of products) or full CNF
(product of sums) directly from the truth table, as shown in Example 2.3.16. The
obtained DNF or CNF needs to be minimized, in order to use the least number of
gates to implement this function.
Following the convention of EDA, we will use + for ∨, · for ∧, and Ā for ¬A.
Example 2.4.7 Let f : {0, 1}² → {0, 1} be defined by f(A, B) = AB + ĀB + AB̄.
We need three AND gates and one 3-input OR gate. However, f(A, B) ≡ A + B.
Using A + B, we need only one OR gate.
The equivalence relations used in the above simplification process are AB +
AB̄ ≡ A and A + ĀB ≡ A + B (or Ā + AB ≡ Ā + B). For the above example,
ĀB + AB ≡ B and B + AB̄ ≡ B + A.
We want to find an equivalent circuit of the minimum size possible, and this
is a difficult computational problem. There are many tools, such as Karnaugh maps,
available to achieve this goal.
The Karnaugh map (K-map) is a popular method of simplifying DNF or CNF
formulas, originally proposed by Maurice Karnaugh in 1953. The Karnaugh map
reduces the need for extensive calculations by displaying the output of the function
graphically and taking advantage of humans’ pattern-recognition capability. It is
suitable for Boolean functions with two to four variables. When the number of
variables is greater than 4, it is better to use an automated tool.
The Karnaugh map uses a two-dimensional grid where each cell represents an
interpretation (or equivalently, a full product) and the cell contains the truth value
of the function for that interpretation. The position of each cell contains all the
information of an interpretation, and the cells are arranged such that adjacent cells
differ by exactly one truth value in the interpretation. Adjacent ones in the Karnaugh
map represent opportunities to simplify the formula. The products for the final
formula are found by encircling groups of ones in the map. Product groups must
be rectangular and must have an area that is a power of two (i.e., 1, 2, 4, 8, . . . ).
Product rectangles should be as large as possible without containing any zeros (they
may contain don’t-care cells). Groups may have overlapping ones. The least number
of groups that cover all ones will represent the products of a DNF of the function.
These products can be used to write a minimal DNF representing the required
function and thus implemented by the least number of AND gates feeding into an
OR gate. The map can also be used to obtain a minimal CNF that is implemented
by OR gates feeding into an AND gate.
Let f(x, y) = x̄ȳ + x̄y + xȳ. There are four products of x and y, and the K-map
has four cells, that is, four products corresponding to the four interpretations on x
and y. As the output of f, three cells have value 1 and one cell (i.e., xy) has value
0. Initially, each cell belongs to a distinct group of one member (Fig. 2.4).
The merge operation on K-map: Merge two adjacent and disjoint groups of the same size
and of the same truth value into one larger group.
This operation creates groups of size 2, 4, 8, etc., and cells are allowed to appear
in different groups at the same time. For example, cells x̄ȳ and xȳ in Fig. 2.4 are
adjacent; they are merged into one group {x̄ȳ, xȳ}, which can be represented by
ȳ as a shorthand of the group. Similarly, cells x̄ȳ and x̄y can be merged into
{x̄ȳ, x̄y}, which can be represented by x̄. Note that x̄ȳ is used twice in the two
merge operations. Now no more merge operations can be performed, and the final
result is f = x̄ + ȳ (f is the NAND function). This result can be proved logically
as follows:
f = x̄ȳ + x̄y + xȳ
= (x̄ȳ + xȳ) + (x̄ȳ + x̄y)
= (x̄ + x)ȳ + x̄(ȳ + y)
= ȳ + x̄
Example 2.4.8 Consider the function f defined by the truth table in Fig. 2.5. The
corresponding K-map has four cells with truth value 1, as f has four models shown
in the truth table by the rows numbered 1, 3, 6, and 7. There are three possible
merge operations: (i) x̄ȳz and x̄yz merge into x̄z; (ii) xyz̄ and xyz merge into xy;
and (iii) x̄yz and xyz merge into yz. Since all the cells with value 1 are in the
groups represented by x̄z and xy, yz is redundant. So, the simplified function is
f = xy + x̄z, instead of f = xy + x̄z + yz. On the other hand, if g = xy + xz + yz,
then none of the three products in g is redundant, because if you delete one, you
will miss some ones, even though the full product xyz is covered by all three
products.
From the above examples, it is clear that after each merge operation, two products
are merged into one product containing one less variable. That is, the larger the
group, the less the number of variables in the product to represent the group. This
is possible because adjacent cells represent interpretations which differ by the truth
value of one variable.
Because K-maps are two-dimensional, some adjacency relations of cells are not
shown on K-maps. For example, in Fig. 2.5, m0 = x̄ȳz̄ and m4 = xȳz̄ are adjacent,
and m1 = x̄ȳz and m5 = xȳz are adjacent. Strictly speaking, K-maps are toroidally
connected, which means that rectangular groups can wrap across the edges. Cells
on the extreme right are actually “adjacent” to those on the far left, in the sense that
the corresponding interpretations only differ by one truth value. We need a three-
dimensional graph to completely show the adjacent relation of three variables, a
four-dimensional graph to completely show the adjacent relation of four variables,
etc.
Based on the idea of K-maps, an automated tool will create a graph of 2n nodes
for n variables, each node represents one interpretation, and two nodes have an
edge iff their interpretations differ by one truth value. The merge operation is then
implemented on this graph to generate products for a least size DNF. The same
graph is also used for generating Gray code (by finding a Hamiltonian path in the
graph).
Example 2.4.9 The K-map for f(x, y, z, w) = m0 + m2 + m5 + m7 + m8 + m10 +
m13 + m15 is given in Fig. 2.6, where m0 = x̄ȳz̄w̄, m2 = x̄ȳzw̄, m5 = x̄yz̄w,
m7 = x̄yzw, m8 = xȳz̄w̄, m10 = xȳzw̄, m13 = xyz̄w, and m15 = xyzw (i in
mi is the decimal value of xyzw as a binary number). First, m0 and m2 are adjacent
and can be merged into x̄ȳw̄; m8 and m10 are also adjacent and can be merged into
xȳw̄. Then x̄ȳw̄ and xȳw̄ can be further merged into ȳw̄. Similarly, m5 and m7
merge into x̄yw; m13 and m15 merge into xyw. Then x̄yw and xyw merge into yw.
The final result is f = ȳw̄ + yw. It would also have been possible to derive this
simplification by carefully applying the equivalence relations, but the time it takes
to do that grows exponentially with the number of products.
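The merge operation on this graph can be sketched on bit strings directly: two terms are adjacent iff they differ in exactly one 0/1 position, and merging replaces that position by a dash standing for the eliminated variable. The following Python sketch (our own; this is the merging phase of a Quine–McCluskey-style procedure, and a full minimizer would additionally select a minimal cover) reproduces Example 2.4.9:

def merge_round(terms, n):
    # terms: strings over '0', '1', '-' of length n; perform one merge pass.
    merged = set()
    for t in terms:
        for i in range(n):
            if t[i] in '01':
                other = t[:i] + ('1' if t[i] == '0' else '0') + t[i+1:]
                if other in terms:                 # adjacent: differ in bit i only
                    merged.add(t[:i] + '-' + t[i+1:])
    return merged

# Minterms of f(x,y,z,w) = m0 + m2 + m5 + m7 + m8 + m10 + m13 + m15
minterms = {format(i, '04b') for i in (0, 2, 5, 7, 8, 10, 13, 15)}
r1 = merge_round(minterms, 4)      # products of three literals
r2 = merge_round(r1, 4)            # products of two literals
print(r2)                          # {'-0-0', '-1-1'} in some order, i.e., ȳw̄ + yw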
Note that the formula obtained from a K-map is not unique in general. For
example, a function g(x, y, z, w) may admit two minimal covers that share two
products but differ in the third. This means the outputs of K-maps are not
canonical.
K-maps can be used to generate simplified CNF for a Boolean function. In
Example 2.3.16, we have shown how to obtain a full CNF from a full DNF. Using
the same idea, if we simplify the full DNF first, the CNF obtained by negating the
simplified DNF will be a simplified one.
Example 2.4.10 Let f(x, y, z) = x̄ȳz̄ + x̄ȳz + xȳz + xyz̄, whose K-map is given
in Fig. 2.7. The negation of f is
f̄ = xȳz̄ + x̄yz + x̄yz̄ + xyz.
Since x̄yz and xyz can merge into yz (shown in the figure) and x̄yz and x̄yz̄ can merge
into x̄y, the simplified f̄ is f̄ = xȳz̄ + x̄y + yz. The simplified CNF for f is thus
f = (x̄ + y + z)(x + ȳ)(ȳ + z̄).
Karnaugh maps can also be used to handle “don’t care” values and can simplify
logic expressions in software design. Boolean conditions, as used, for example, in
conditional statements, can get very complicated, which makes the code difficult
to read and to maintain. Once minimized, sum-of-products and product-of-sums
expressions can be implemented directly using AND and OR logic operators.
The above discussion illustrates the basic idea of circuit optimization. A presen-
tation of practical optimization methods used for VLSI (very large-scale integration)
chips using CMOS (complementary metal–oxide–semiconductor) technology is
beyond the scope of this book.
Propositions and logical operators arise all the time in computer programs. All
programming languages such as C, C++, and Java use symbols like “&&”, “||”,
and “!” in place of ∧, ∨, and ¬. Consider the Boolean expression appearing in an
if-statement:
if (x > 0 || (x <= 0 && y > 100)) ...
If we let p denote x > 0 and q be y > 100, then the Boolean expression is
abstracted as p ∨ ¬p ∧ q. Since (p ∨ ¬p ∧ q) ≡ (p ∨ q), the original code can be
rewritten as follows:
if (x > 0 || y > 100) ...
In other words, the knowledge about logic may help us to write neater code.
Consider another piece of code:
if (x >= 0 && A[x] == 0) ...
Here the order of the two conditions matters: && evaluates its second operand only
when the first one is true, so A[x] is accessed only when x is a legal index. Although
∧ is commutative in logic, swapping the two operands of && could cause an
out-of-bounds access.
In many programming languages, an integer can also be treated as a bit vector. For
example, for a = 67, its bit vector is (a)2 = (0, 1, 0, 0, 0, 0, 1, 1), because
67 = 64 + 2 + 1 = 2⁶ + 2¹ + 2⁰, and for b = 18, (b)2 = (0, 0, 0, 1, 0, 0, 1, 0), since
18 = 16 + 2 = 2⁴ + 2¹. In fact, a and b are stored as (a)2 and (b)2 in the computer.
Table 2.5 provides some popular bitwise operators with examples, treating each
bit as a Boolean value and operating on all bits simultaneously. We will use the bit
vectors of a = 67 and b = 18 as examples.
If necessary, other bitwise operations can be implemented using the four
operations in Table 2.5. For example, a bitwise implication on a and b can be
implemented as ∼ a | b.
The bit shifts are also considered bitwise operations, because they treat an integer
as a bit vector rather than as a numerical value. In these operations, the bits are
moved, or shifted, to the left or right. Registers in a computer processor have a fixed
width, so some bits will be “shifted out” of the register at one end, while the same
number of bits are “shifted in” from the other end. Typically, the “shifted out” bits
are discarded and the “shifted in” bits are all 0’s. When the leading bit is the sign of
an integer, the left shift operation becomes more complicated, and its discussion is
out of the scope of this book. In the table, we see that the shift operators take two
integers: x << y will shift (x)2 left by y bits; x >> y will shift (x)2 right by y
bits.
The value of the bit vector x << y is equal to x · 2^y, provided that this
multiplication does not cause an overflow. E.g., for b = 18, b << 2 = 18 · 4 = 72.
Similarly, the value of x >> y is equal to the integer division x/2^y. E.g., for a = 67,
a >> 2 = 67/4 = 16. Shifting provides an efficient way of multiplication and
division when the second operand is a power of two. Typically, bitwise operations
are substantially faster than division, several times faster than multiplication.
Suppose you are the manager for a social club of 64 members. For each gathering,
you need to record the attendance of all members. If each member has a unique
member id ∈ {0, 1, . . . , 63}, you may use a set of member ids to record the
attendance. Instead of a list of integers, you may use a single 64-bit integer, say
s, to store the set. That is, if a member attended and his id is i, then the ith bit of s
is 1; otherwise, it is 0 to show his absence. For this purpose, we need a way to set
a bit in s to be 1. Suppose s = 0 initially; we may use in Java or C the following
command:
s = s | (1 << i)
to set the ith bit of s to be 1. To set the value of the ith bit of s to be 0, use
s = s & ~(1 << i)
To get the value of the ith bit of s, use s & (1 << i). For another gathering, we use
another integer t. Now it is easy to compute the union or intersection of these sets:
s | t is the union and s & t is the intersection. For example, if you want to check
that nobody attended both gatherings, you can check if s & t = 0, which is more
efficient than using lists for sets.
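The same bookkeeping can be tried out directly; here is a small Python illustration of our own (Python integers are unbounded, so the 64-member cap is the application's, not the language's):

s = 0
for i in (3, 17, 42):          # members 3, 17, 42 attended the first gathering
    s |= 1 << i                # set bit i

t = 0
for i in (17, 60):             # attendance at the second gathering
    t |= 1 << i

print(bin(s & t))              # intersection: only bit 17 is set (member 17)
print((s >> 42) & 1)           # membership test: did member 42 attend? -> 1
s &= ~(1 << 3)                 # remove member 3 from the first set
print((s >> 3) & 1)            # -> 0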
Propositional logic is a very simple language but rich enough to specify most
decision problems in computer science, because these problems belong to the class
of NP problems and SAT is NP-complete. In fact, propositional logic is a formal
language much more rigorous than natural languages. Since natural languages are
ambiguous, expressing a sentence in logic helps us to understand the exact meaning
of the sentence. For example, the following two sentences appear to have the same
meaning.
• Students and seniors pay half price.
• Students or seniors pay half price.
One sentence uses “and” and the other uses “or,” as if “and” and “or” were
synonyms. How do we explain this? Let “being a student” and “being a senior” be abbreviated
by p and q, respectively, and “pay half price” by r. Let A be (p → r) ∧ (q → r)
and B be (p ∨ q) → r, then A and B express somewhat more precisely the factual
content of the above two sentences, respectively. It is easy to show that the formulas
A and B are logically equivalent. The everyday-language statements of A and B
obscure the structural difference of A and B through an apparently synonymous use
of the words “and” and “or.”
The formula following each property expresses the property formally and can be
easily converted into clauses. When n = 2, 3, the formulas are unsatisfiable.
For n ≥ 4, the formulas are satisfiable, and each model gives us a solution.
For example, when n = 4, we have two solutions: {p1,2 , p2,4 , p3,1 , p4,3 } and
{p1,3 , p2,1 , p3,4 , p4,2 }. If we assign the four variables in a solution to be true and
the other variables false, we obtain a model. Note that we ignore the property that
“there is at least one queen in each column,” because it is redundant.
The next example is a logical puzzle.
Example 2.5.2 Three people are going to the beach, each using a different mode of
transportation (car, motorcycle, and boat) in a different color (blue, orange, green).
Who’s using what? The following clues are given:
1. Abel loves orange, but he hates travel on water.
2. Bob did not use the green vehicle.
3. Carol drove the car.
The first step to solve a puzzle using propositional logic is to make sure what
properties are assumed and what conditions are provided by clues. For this puzzle,
the first assumed property is that “everyone uses a unique transportation tool.”
To specify this property in logic, there are several ways of doing it. One way
is to use a set of propositional variables to specify the relation of “who uses what
transportation”: Let px,y denote a set of nine variables, where x ∈ {a, b, c} (a for
Abel, b for Bob, and c for Carol), and y ∈ {c, m, b} (c for car, m for motorcycle,
and b for boat). Now the property that “everyone uses a unique transportation tool”
can be specified formally as the following clauses:
(pa,c | pa,m | pa,b), (pb,c | pb,m | pb,b), (pc,c | pc,m | pc,b),
(¬pa,c | ¬pa,m), (¬pa,c | ¬pa,b), (¬pa,m | ¬pa,b),
(¬pb,c | ¬pb,m), (¬pb,c | ¬pb,b), (¬pb,m | ¬pb,b),
(¬pc,c | ¬pc,m), (¬pc,c | ¬pc,b), (¬pc,m | ¬pc,b).
It appears quite cumbersome that 12 clauses are needed to specify one property.
Note that the goal of using logic is to find a solution automatically. For computers,
12 million clauses are not too many.
From the clues of the puzzle, we have the following two unit clauses: (pc,c) and
(¬pa,b). These two unit clauses force pc,c to be assigned 1 and pa,b to be 0. Then we
can decide the truth values of the remaining seven variables: pc,m = pc,b = pa,c =
pb,c = 0, so pa,m = 1 (since pa,b = 0), and hence pb,m = 0 and pb,b = 1.
The second assumed property is that “every transportation tool has a unique
color.” If we read the puzzle carefully, the clues involve the relation that “who will
use what color of the transportation.” So, it is better to use qx,z to represent this
relation that person x uses color z, where x ∈ {a, b, c} and z ∈ {b, o, g} (b for blue,
o for orange, and g for green). We may specify the property that “everyone uses a
unique color” in a similar way as we specify “everyone uses a unique transportation
tool.” The clues give us two unit clauses: (qa,o ) and (qb,g ). Starting from these two
unit clauses, we can derive the truth values of all nine qx,z variables.
The next puzzle looks quite different from the previous one, but its specification
is similar.
Example 2.5.3 Four people sit on a bench of four seats for a group photo: two
Americans (A), a Briton (B), and a Canadian (C). They asked that (i) two Americans
do not want to sit next to each other; (ii) the Briton likes to sit next to the Canadian.
Suppose we use propositional variables Xy , where X ∈ {A, B, C} and 1 ≤ y ≤ 4,
with the meaning that “Xy is true iff the people of nationality X sits at seat y of the
bench.” How do you specify the problem in CNF over Xy such that the models of
the CNF match exactly all the sitting solutions of the four gentlemen?
The CNF will specify the following conditions:
1. Every seat takes at least one person: (A1 | B1 | C1 ), (A2 | B2 | C2 ), (A3 | B3 |
C3 ), (A4 | B4 | C4 ). The truth of these clauses implies that there are at least four
people using these seats.
2. Each seat can take at most one person: (¬A1 | ¬B1 ), (¬A1 | ¬C1 ), (¬B1 |
¬C1 ), (¬A2 | ¬B2 ), (¬A2 | ¬C2 ), (¬B2 | ¬C2 ), (¬A3 | ¬B3 ), (¬A3 | ¬C3 ),
(¬B3 | ¬C3 ), (¬A4 | ¬B4 ), (¬A4 | ¬C4 ), (¬B4 | ¬C4 ). Combining with the
first condition, this condition implies that there are exactly four people sitting on
the bench.
3. At most two Americans on the bench: (¬A1 | ¬A2 | ¬A3 ), (¬A1 | ¬A2 | ¬A4 ),
(¬A1 | ¬A3 | ¬A4 ), (¬A2 | ¬A3 | ¬A4 ). These clauses say that for every three
seats, one of them must not be taken by an American.
4. At most one Briton on the bench: (¬B1 | ¬B2 ), (¬B1 | ¬B3 ), (¬B1 | ¬B4 ),
(¬B2 | ¬B3 ), (¬B2 | ¬B4 ), (¬B3 | ¬B4 ). These clauses say that for every two
seats, one of them must not be taken by the Briton.
5. At most one Canadian on the bench: (¬C1 | ¬C2 ), (¬C1 | ¬C3 ), (¬C1 | ¬C4 ),
(¬C2 | ¬C3 ), (¬C2 | ¬C4 ), (¬C3 | ¬C4 ).
6. The Americans do not want to sit next to each other: (¬A1 | ¬A2 ), (¬A2 | ¬A3 ),
(¬A3 | ¬A4 ).
7. The Briton likes to sit next to a Canadian: (¬C1 | B2 ), (¬C2 | B1 | B3 ), (¬C3 |
B2 | B4 ), (¬C4 | B3 ), (¬B1 | C2 ), (¬B2 | C1 | C3 ), (¬B3 | C2 | C4 ),
(¬B4 | C3 ).
If we read (¬C2 | B1 | B3 ) as C2 → B1 ∨ B3 , it means that if the Canadian takes
the second seat, then the Briton must sit at either seat 1 or seat 3.
Some conditions, such as that there are at least two Americans, and at least one Briton
and one Canadian, can also be easily specified by clauses. However, these clauses are
logical consequences of the other clauses, as conditions 1 and 2 ensure that there are
four people sitting on the bench. If there are at most one Briton and one Canadian,
then condition 2 is also redundant, and we may remove these clauses safely.
The input clauses of a problem serve as a set of axioms. This set is minimal if
no clause is a logical consequence of the other clauses. Of course, it is very expensive
to ensure that a set of clauses is minimal. That is, without a thorough search,
we cannot say each clause is independent, i.e., not a logical consequence of the
other clauses. When specifying a problem in propositional logic, we often ignore
the minimality of the input clauses.
The next problem is called Knights and Knaves, which was coined by Raymond
Smullyan in his 1978 work What Is the Name of This Book? The problem is
actually a collection of similar logic puzzles where some characters can only answer
questions truthfully, and others only falsely. The puzzles are set on a fictional island
where all inhabitants are either knights, who always tell the truth, or knaves, who
always lie. The puzzles involve a visitor to the island who meets small groups of
inhabitants. Usually, the aim is for the visitor to deduce the inhabitants’ type from
their statements, but some puzzles of this type ask for other facts to be deduced.
The puzzles may also be to determine a yes-no question which the visitor can ask in
order to discover a particular piece of information.
Example 2.5.4 You meet two inhabitants: Zoey and Mel. Zoey tells you that “Mel
is a knave.” Mel says, “Neither Zoey nor I am a knave.” Can you determine who is
a knight and who is a knave?
Let p stand for “Zoey is a knight,” with the understanding that if p is true, then
Zoey is a knight; if p is false, then Zoey is a knave. Similarly, define q as “Mel is
a knight.” Using a function says, what they said can be expressed as
says(Zoey, “Mel is a knave”): p ↔ ¬q, (1)
says(Mel, “neither Zoey nor I is a knave”): q ↔ (p ∧ q). (2)
Now using a truth table to check all the truth values of p and q, only when p → 1
and q → 0, both (1) and (2) are true. That is, we replace Zoey by knight and Mel by
knave in (1) and (2), to obtain ¬q and ¬(p∧q), respectively. Then {p → 1, q → 0}
is a model of ¬q ∧ ¬(p ∧ q). Thus, the solution is “Zoey is a knight and Mel is a
knave.”
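Such puzzles can also be checked mechanically by trying all four assignments, exactly as the truth table does. A tiny Python sketch (our own) for Example 2.5.4, where iff encodes the convention that a statement holds iff its speaker is a knight:

iff = lambda x, y: x == y

# p: "Zoey is a knight", q: "Mel is a knight"
for p in (False, True):
    for q in (False, True):
        zoey = iff(p, not q)          # Zoey says: "Mel is a knave"
        mel  = iff(q, p and q)        # Mel says: "Neither of us is a knave"
        if zoey and mel:
            print('Zoey knight:', p, ' Mel knight:', q)
# prints: Zoey knight: True  Mel knight: False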
Example 2.5.5 You meet two inhabitants: Peggy and Zippy. Peggy tells you that
“of Zippy and I, exactly one is a knight.” Zippy tells you that only a knave would
say that Peggy is a knave. Can you determine who is a knight and who is a knave?
Let p be “Peggy is a knight” and q be “Zippy is a knight.” Then what they said
can be expressed as
says(Peggy, “exactly one of us is a knight”): p ↔ (p ⊕ q), (3)
says(Zippy, “Peggy is not a knave”): q ↔ p. (4)
The only solution is p → 0 and q → 0, i.e., both Peggy and Zippy are knaves.
Exercises
1. How many different logical operators have five arguments? That is, how many
Boolean functions of type f : B⁵ → B are there, where B = {0, 1}?
2. Suppose VP = {p, q, r}. Then M((p → q)∧r) =? and M((p → q)∧¬q) =?
3. (Theorem 2.2.11) Prove that, given any two propositional formulas A and B,
M(A ∧ B) = M(A) ∩ M(B) and M(¬A) = AllP − M(A).
4. Answer the following questions with explanation:
(a) Suppose that you have shown that whenever X is true, then Y is true, and
whenever X is false, then Y is false. Have you now demonstrated that X
and Y are logically equivalent?
(b) Suppose that you have shown that whenever X is true, then Y is true, and
whenever Y is false, then X is false. Have you now demonstrated that X
and Y are logically equivalent?
(c) Suppose you know that X is true iff Y is true, and you know that Y is
true iff Z is true. Is this enough to show that X, Y , and Z are all logically
equivalent?
(d) Suppose you know that whenever X is true, then Y is true; that whenever Y
is true, then Z is true; and whenever Z is true, then X is true. Is this enough
to show that X, Y , and Z are all logically equivalent?
5. Provide the truth table for defining ite : B 3 → B, the if-then-else operator.
6. Provide a BNF grammar which defines the well-formed formulas without
unnecessary parentheses, assuming the precedence relation from the highest
to the lowest: ¬, ∧, ∨, →, {⊕, ↔}.
7. Prove by the truth table method that the following formulas are valid.
(a) (A ∧ (A → B)) → B
(b) (¬B → ¬A) → (A → B)
(c) (A ⊕ B) ↔ ((A ∨ B) ∧ (¬A ∨ ¬B))
(d) (A ↔ B) ↔ ((A ∨ ¬B) ∧ (¬A ∨ B))
8. Prove by the truth table method that the following entailments hold:
(a) (A ∧ (A → B)) ⊨ B
(b) (A → (B → C)) ⊨ (A → B) → (A → C)
(c) (¬B → ¬A) ⊨ (A → B)
(d) ((A ∨ B) ∧ (¬A ∨ C)) ⊨ (B ∨ C)
9. Prove by the truth table method that the following logical equivalences hold:
(a) ¬(A ∨ B) ≡ ¬A ∧ ¬B
(b) (A ∨ B) ∧ (A ∨ ¬B) ≡ A
(c) (A ↔ B) ≡ ((A ∧ B) ∨ (¬A ∧ ¬B))
(d) (A ⊕ B) ≡ ((A ∧ ¬B) ∨ (¬A ∧ B))
(e) (A ↓ B) ↑ C ≡ A ↑ (B ↓ C)
(f ) (A ↑ B) ↓ C ≡ A ↓ (B ↑ C)
10. Prove the equivalence of the following formulas by using other equivalence
relations:
(a) A ⊕ B ≡ ¬(A ↔ B) ≡ ¬A ↔ B ≡ A ↔ ¬B
(b) (A → B) ∧ (A → C) ≡ ¬A ∨ (A ∧ B ∧ C)
(c) A → (B → (C ∨ D)) ≡ (A → C) ∨ (B → D).
17. Construct full CNF for the following formulas, where ā is ¬a, + is ∨, and ∧ is
omitted:
(a) F = abc + abc + abc + abc.
(b) G = abc + abc + abc + abc.
(c) H = a ⊕ b ⊕ c.
(d) I = (a ∧ b) → (b → c).
18. Construct full DNF for the following formulas, where ā is ¬a, + is ∨, and ∧ is
omitted:
(a) F = (a + b + c)(a + b + c)(a + b + c)(a + b + c).
(b) G = (a + b + c)(a + b + c)(a + b + c)(a + b + c).
(c) H = a ⊕ b ⊕ c.
(d) I = (a ∧ b) → (b → c).
19. Assume a > b > c; construct ROBDDs for the following formulas, where ā is
¬a, + is ∨, and ∧ is omitted in products:
(a) F = abc + abc + abc.
(b) G = abc + abc + abc.
(c) H = a ⊕ b ⊕ c.
(d) I = (a ∧ b) → (b → c).
20. Let F be the set of all formulas built on a set of n propositional variables.
Prove that there are exactly 2^(2^n) classes of equivalent formulas in F.
21. (a) Prove that {→, ¬} is a minimally sufficient set of Boolean operators.
(b) Prove that {→, ⊥} is a minimally sufficient set of Boolean operators.
22. Prove that {∧, ∨, →, ⊤} is not a sufficient set of Boolean operators.
23. Prove that (a) {∧, ∨, ⊤, ⊥} is not a sufficient set of Boolean operators and (b)
if we add any function o which cannot be expressed in terms of ∧, ∨, ⊤, and
⊥, then {∧, ∨, ⊤, ⊥, o} is sufficient.
24. Identify the pairs of equivalent formulas from the following candidates:
25. Try to find an equivalent DNF of the minimal size for the following DNF
formulas using K-maps:
(a) f (x, y, z) = x y z + x y + x y z + x z.
(b) f (x, y, z) = x y + y z + y z + x y z.
(c) f (x, y, z, w) = x y zw + x yz + y z w + y z w.
(d) f (x, y, z, w) = m0 + m1 + m5 + m7 + m8 + m10 + m14 + m15 , where mi
is a full product of x, y, z and w, and i is the decimal value of the binary
string xyzw.
26. Try to find an equivalent CNF of the minimal size for the following Boolean
functions using K-maps:
(a) f (x, y, z) = x y + x y + x y z + x z.
(b) f (x, y, z) = x y z + x y z + x y z + x y z.
(c) f (x, y, z, w) = m0 + m1 + m3 + m4 + m5 + m7 + m12 + m13 + m15 .
(d) f (x, y, z, w) = m0 +m1 +m2 +m4 +m5 +m6 +m8 +m9 +m12 +m13 +m14 .
27. A function f : B n → B, where B = {0, 1}, is called linear if
f (x1 , x2 , . . . , xn ) = a0 + a1 x1 + · · · + an xn for suitable coefficients
a0 , . . . , an ∈ B. Here + denotes the addition modulo 2, and the default
multiplication (not shown) is the multiplication over integers.
(a) Show that the above representation of a linear function f is unique.
(b) Determine the number of n-ary linear functions.
28. Let a = 31 and b = 19. Please provide the 8-bit vectors (a)2 and (b)2 for a and
b, and show the results (both in bit vectors and decimal values) of (∼a)2, (∼b)2,
(a | b)2, (a & b)2, (a ∧ b)2, a >> 2, b >> 2, a << 2, and b << 2.
29. Suppose x is a 32-bit positive integer. Provide a Java program for each of the
following programming tasks, using as much bitwise operations as possible:
(a) Check whether x is even or odd.
(b) Get the position of the highest bit of 1 in x.
(c) Get the position of the lowest bit of 1 in x.
(d) Set the ith bit of x to 1, where i is an integer.
(e) Set the ith bit of x to 0, where i is an integer.
(f) Count trailing zeros in x as a binary number.
(g) Shift x right by 3 bits and put the last 3 bits of x at the beginning of the
result from the shifting. The whole process is called “rotate x right” by 3
bits.
(h) Convert a string of decimal digits into a binary number without using
multiplication by 10.
30. Suppose a formula A contains 15 variables, each variable is represented by an
integer from 0 to 14. We use the last 15 bits of an integer variable x to represent
an interpretation of A, such that ith bit is 1 iff we assign 1 to variable i. Modify
the algorithm eval(A, σ ), where σ is replaced by x using bitwise operations.
Design an algorithm which lists all the interpretations of A and apply each
interpretation to A using eval(A, x). The pseudo-code of the algorithm should
be like a Java program.
31. The meaning of the following propositional variables is given below:
t: “taxes are increased”
e: “expenditures rise”
d: “the debt ceiling is raised”
c: “the cost of collecting taxes rises”
g: “the government borrows more money”
i: “interest rates increase”
(a) Express the following statements as propositional formulas; (b) convert the
formulas into CNF.
Either taxes are increased or if expenditures rise then the debt ceiling is raised. If
taxes are increased, then the cost of collecting taxes rises. If a rise in expenditures
implies that the government borrows more money, then if the debt ceiling is raised,
then interest rates increase. If taxes are not increased and the cost of collecting taxes
does not increase then if the debt ceiling is raised, then the government borrows more
money. The cost of collecting taxes does not increase. Either interest rates do not
increase, or the government does not borrow more money.
32. Specify the following puzzle in propositional logic and convert the specification
into CNF. Find the model of your CNF and use this model to construct a
solution of the puzzle.
Four kids, Abel, Bob, Carol, and David, are eating lunch on a hot summer day. Each
has a big glass of water, a sandwich, and a different type of fruit (apple, banana, orange,
and grapes). Which fruit did each child have? The given clues are:
(a) Abel and Bob have to peel their fruit before eating.
(b) Carol doesn’t like grapes.
(c) Abel has a napkin to wipe the juice from his fingers.
33. Specify the following puzzle in propositional logic and convert the specification
into CNF. Find the model of your CNF and use this model to construct a
solution of the puzzle.
Four electric cars are parked at a charge station and their mileages are 10K, 20K, 30K,
and 40K, respectively. Their charge costs are $15, $30, $45, and $60, respectively.
Figure out how the charge cost for each car from the following hints:
(a) The German car has 30K miles.
(b) The Japanese car has 20K miles and it charged $15.
(c) The French car has less miles than the Italian car.
(d) The vehicle that charged $30 has 10K more mileages than the Italian car.
(e) The car that charged $45 has 20K less mileage than the German car.
34. Specify the following puzzle in propositional logic and convert the specification
into CNF. Find the model of your CNF and use this model to construct a
solution of the puzzle.
Four boys are waiting for Thanksgiving dinner and each of them has a unique favorite
food. Their names are Larry, Nick, Philip, and Tom. Their ages are all different and
fall in the set of {8, 9, 10, 11}. Find out which food they are expecting to eat and how
old they are.
(a) Larry is looking forward to eating turkey.
(b) The boy who likes pumpkin pie is 1 year younger than Philip.
(c) Tom is younger than the boy that loves mashed potato.
(d) The boy who likes ham is 2 years older than Philip.
35. Six people sit on a long bench for a group photo. Two of them are Americans
(A), two are Canadians (C), one Italian (I), and one Spaniard (S). Suppose we
use propositional variables Xy, where X ∈ {A, C, I, S} and 1 ≤ y ≤ 6, and
Xy is true iff the person of nationality X sits at seat y of the bench. Please
specify the problem in CNF over Xy such that the models of the CNF match
exactly all the seating solutions.
Chapter 3
Reasoning in Propositional Logic
Given a set A of axioms (i.e., the formulas assumed to be true) and a formula B,
a proof procedure P(A, B) answers the question whether B is a theorem of A.
If P(A, B) returns true, we say B is proved; if P(A, B) returns false, we say B is
disproved. In general, the axiom set A is regarded as equivalent to the conjunction
of the formulas in A. For propositional logic, a theorem is any formula B such that
A ⊨ B, i.e., B ∈ T(A). Thus, P(A, B) is also called a theorem prover. The
same procedure P can be used to show that B is valid (i.e., B is a tautology) by
calling P(⊤, B), or that C is unsatisfiable by calling P(C, ⊥). That is, proving the
entailment relation is no harder than proving validity or proving unsatisfiability. In
propositional logic, the three problems are equivalent and belong to the class of co-
NP-complete problems, i.e., the complement of the class of NP-complete problems.
A co-NP-complete problem is regarded as "harder" than an NP-complete problem,
because the latter has a polynomial-time algorithm to verify a solution, but the
former is not known to have such an algorithm.
In practice, each proof procedure is designed to solve one of the three problems and
belongs to one of the three types of proof procedures.
• Theorem prover
P (X, Y ) returns true iff formula Y is a theorem (an entailment) of X (assuming
a set of formulas is equivalent to the conjunction of all its members).
• Tautology prover
T (A) returns true iff formula A is valid (a tautology).
• Refutation prover
R(B) returns true iff formula B is unsatisfiable.
If we have one of the above three procedures, i.e., P (X, Y ), T (A), or R(B), we
may implement the other two as follows:
• If P(X, Y) is available, implement T(A) as P(⊤, A) and R(B) as P(B, ⊥).
• If T(A) is available, implement P(X, Y) as T(X → Y) and R(B) as T(¬B).
• If R(B) is available, implement P(X, Y) as R(X ∧ ¬Y) and T(A) as R(¬A).
The correctness of the above implementations is based on the following theorem,
whose proof is left as an exercise.
Theorem 3.1.1 Given formulas A and B, the following three statements are
equivalent:
1. A ⊨ B.
2. A → B is valid.
3. A ∧ ¬B is unsatisfiable.
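For small formulas, Theorem 3.1.1 can be checked by exhaustive enumeration. The
following Python sketch (our illustration, not part of the book; the particular
formulas A and B are an arbitrary sample) tests the three statements against each
other over all interpretations:

    from itertools import product

    # Formulas are encoded as Python functions from an assignment to a Boolean.
    # Sample choice for illustration: A = p ∧ (p → q), B = q.
    A = lambda v: v["p"] and ((not v["p"]) or v["q"])
    B = lambda v: v["q"]
    vs = ["p", "q"]

    def interpretations(names):
        for bits in product([False, True], repeat=len(names)):
            yield dict(zip(names, bits))

    entails = all(B(v) for v in interpretations(vs) if A(v))           # A |= B
    valid = all((not A(v)) or B(v) for v in interpretations(vs))       # A -> B valid
    unsat = not any(A(v) and not B(v) for v in interpretations(vs))    # A ∧ ¬B unsat

    assert entails == valid == unsat   # the three statements agree
    print(entails, valid, unsat)       # True True True for this sample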
Thus, one prover can do the work of all three, either directly or indirectly. In Chap. 2, we
have discussed several concepts and some of them can be used to construct proof
procedures as illustrated below.
• Truth table: It can be used to construct a theorem prover, a tautology prover, and
a refutation prover.
• CNF: It can be used to construct a tautology prover, because a CNF is valid iff
every clause is valid, and it is very easy to check if a clause is valid.
• DNF: It can be used to construct a refutation prover, because a DNF is
unsatisfiable iff every product is unsatisfiable, and it is very easy to check if a
product is unsatisfiable.
• ROBDD or INF: It can be used to construct a tautology prover or a refutation
prover, because ROBDD is a canonical form. Formula A is valid (unsatisfiable)
iff A's canonical form is ⊤ (⊥).
• Algebraic substitution: It can be used to construct a tautology prover or a
refutation prover, as substituting “equal by equal” preserves the equivalence
relation. If we arrive at ⊤ or ⊥, we can claim the input formula is valid or
unsatisfiable.
In terms of its working principle, a proof procedure may have one of three styles:
enumeration, reduction, and deduction.
• enumeration-style: enumerate all involved interpretations.
• reduction-style: use the equivalence relations to transform a set of formulas into
a simpler set.
• deduction-style: use inference rules to deduce new formulas.
By definition, A ⊨ B iff M(A) ⊆ M(B), that is, every model of A is a model
of B. We may use a truth table to enumerate all the interpretations involving A and
B, to see if every model of A is a model of B. A proof procedure based on truth
tables is an enumeration-style proof procedure which uses exhaustive search. This
style of proof procedures is easy to describe but highly inefficient. For example, if
A contains 30 variables, we need to construct a truth table of 2^30 rows.
Reduction-style proof procedures use the equivalence relations to transform a
set of formulas into desired simplified formulas, such as reduced ordered binary
decision diagrams (ROBDD). Since ROBDD is canonical, we decide that a formula
is valid if its ROBDD is ⊤ (or 1), or unsatisfiable if its ROBDD is ⊥ (or 0). We
may also use disjunctive normal forms (DNF) to show a formula is unsatisfiable:
Its simplified DNF is ⊥. In the following, we will introduce the semantic tableau
method, which is a reduction-style proof procedure and works in the same way as
obtaining DNF from the input formula.
Deduction-style proof procedures use inference rules to generate new formulas,
until the desired formula is generated. An inference rule typically takes one or
two formulas as input and generates one or two new formulas as output. Some of
these rules come from the equivalence relations between the input and the output
formulas. In general, we require that the output of an inference rule be entailed by
the input.
Many proof procedures are not clearly classified as one of enumeration, reduc-
tion, or deduction styles. Yet, typically their working principles can be understood as
performing enumeration, or reduction, or deduction, or a hybrid of all implicitly. For
example, converting a formula into DNF can be a reduction; however, converting
a formula into full DNF can be an enumeration, as every term in a full DNF
corresponds to one model of the formula. In practice, reduction is often used in
most deduction-style proof procedures for efficiency.
In Chap. 1, we say that a procedure is a decision procedure if the procedure
terminates for every input and answers the question correctly. Fortunately for
propositional logic, every proof procedure introduced in this chapter is a decision
procedure. Other logics may not have a decision procedure for certain questions.
The semantic tableau method, also called the "truth tree" method, is a reduction-
style proof procedure not just for propositional logic, but also for other logics,
including first-order logic, temporal logic, and modal logic. A tableau for a formula is a
tree structure; each node of the tree is associated with a set of formulas derived from
the input formula. The tableau method constructs this tree and uses it to prove/refute
the satisfiability of the input formula. The tableau method can also determine the
satisfiability of finite sets of formulas of various logics. It is the most popular proof
procedure for modal logic.
For propositional logic, the semantic tableau checks whether a formula is
satisfiable or not, by “breaking” complex formulas into smaller ones as we do for
transforming the formula into DNF (disjunctive normal form). The tableau used
by the method is a tree structure which records the transforming process. The
method starts with the root node which contains the input formula and grows the
tree by expanding a leaf node and assigning a set of formulas to new nodes. In each
expansion, one or two successors are added to the chosen leaf node.
• Each node of the tree is assigned a set of formulas, and the comma "," in the set
is a synonym for ∧.
• A node is called closed if it contains a complementary pair of literals (or
formulas), or ⊥, or ¬⊤.
• A node is called open if it contains only a set of consistent literals, which is
equivalent to a product in DNF.
• If a node is neither closed nor open and has no children (a leaf in the current tree),
then it is expandable. We may apply the transformation rules for DNF on one of
the formulas in an expandable node to create children. Note that the rules apply
only to the topmost logical operator of a formula. The expansion stops when the
tree has no expandable nodes.
• The tree links indicate the equivalence relation among formula sets. If a node
x has two successors, say y and z, the formula set of x is equivalent to the
disjunction of the two formula sets of y and z. The transformation rules used
to obtain two successors are called β-rules. If x has only one successor, the used
rules are called α-rules, and the formula set of x is equivalent to that of the
successor node.
In the following, we will use tree and tableau as synonyms. To prove the
unsatisfiability of a formula, the method starts by generating the tree as described
above and checks that every leaf node is closed, that is, no open nodes and no
expandable nodes as leaves. In this case, we say the tree is closed.
Example 3.1.2 Let formula A be p ∧ (¬q ∨ ¬p) and B be (p ∨ q) ∧ (¬p ∧ ¬q);
their tableaux are shown in Fig. 3.1. For A, the tableau has one open node and one
closed node (marked by ×). For B, it has only two closed nodes. Thus, A is
satisfiable as it has an open node, and B is unsatisfiable as its tableau is closed.
The tableaux can be displayed linearly, using strings over {1, 2} to label the
parent–child relations: The root node has the empty label. If a node has label x,
its first child has label x1 and its second child has label x2, and so on. For example,
we may display the tableaux of A and B in Fig. 3.1 as follows:
Fig. 3.1 The tableaux for A = p ∧ (¬q ∨ ¬p) and B = (p ∨ q) ∧ (¬p ∧ ¬q)
The labels of nodes can uniquely identify the positions (or addresses) of these
nodes in a tableau.
Here is the listing of α-rules and β-rules, which apply to the top logical operator in
a formula.
α            α1, α2                    β            β1        β2
A ∧ B        A, B                      ¬(A ∧ B)     ¬A        ¬B
¬(A ∨ B)     ¬A, ¬B                    A ∨ B        A         B
¬(A → B)     A, ¬B                     A → B        ¬A        B
¬(A ⊕ B)     (A ∨ ¬B), (¬A ∨ B)        A ⊕ B        A, ¬B     ¬A, B
A ↔ B        (A ∨ ¬B), (¬A ∨ B)        ¬(A ↔ B)     A, ¬B     ¬A, B
¬(A ↑ B)     A, B                      A ↑ B        ¬A        ¬B
A ↓ B        ¬A, ¬B                    ¬(A ↓ B)     A         B
¬¬A          A
Those rules are what we used for transforming a formula into DNF. If we start
with a formula in NNF (negation normal form), we need only one α-rule for A ∧ B
and one β-rule for A ∨ B. All other α-rules and β-rules are not needed for NNF,
because from NNF to DNF, only the distributive law is needed.
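To make the reduction concrete, here is a minimal Python sketch of the tableau
expansion for formulas already in NNF, using only the α-rule for A ∧ B and the
β-rule for A ∨ B (our illustration, not the book's implementation; the tuple
encoding of formulas is an assumption):

    def neg(lit):
        # complement of a literal given as "p" or "-p"
        return lit[1:] if lit.startswith("-") else "-" + lit

    def tableau_open(formulas):
        """Return the literals of an open node if the set of NNF formulas
        (tuples ("lit", l), ("and", F, G), ("or", F, G)) is satisfiable,
        or None if every branch of the tableau closes."""
        literals = {f[1] for f in formulas if f[0] == "lit"}
        if any(neg(l) in literals for l in literals):
            return None                       # closed: complementary pair
        rest = [f for f in formulas if f[0] != "lit"]
        if not rest:
            return literals                   # open node: a product in DNF
        f = rest[0]
        others = [("lit", l) for l in literals] + rest[1:]
        if f[0] == "and":                     # alpha rule: one successor
            return tableau_open([f[1], f[2]] + others)
        return (tableau_open([f[1]] + others)       # beta rule: two successors;
                or tableau_open([f[2]] + others))   # open iff one of them is

    # B = (p ∨ q) ∧ (¬p ∧ ¬q) from Example 3.1.2 is unsatisfiable:
    B = ("and", ("or", ("lit", "p"), ("lit", "q")),
                ("and", ("lit", "-p"), ("lit", "-q")))
    print(tableau_open([B]))                  # None: the tableau is closed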
In Chap. 1, we pointed out that there are two important properties concerning
a proof procedure: soundness and completeness. If semantic tableau is used as a
refutation prover, the soundness means that if the procedure says the formula is
unsatisfiable, then the formula must be unsatisfiable. The completeness means that
if the formula is unsatisfiable, then the procedure should be able to give us this
result.
Theorem 3.1.4 The semantic tableau method is a decision procedure for proposi-
tional logic.
Proof Suppose without loss of generality the formula set at one node is {X, Y, Z},
where X represents an α formula, Y a β formula, and Z any other formula, and this
set is equivalent to A = X ∧ Y ∧ Z. If X is transformed by α-rule to X1 and X2 ,
then from X ≡ X1 ∧ X2 , A ≡ X1 ∧ X2 ∧ Y ∧ Z. If Y is transformed by β-rule
to Y1 and Y2 , then from Y ≡ Y1 ∨ Y2 , A ≡ (X ∧ Y1 ∧ Z) ∨ (X ∧ Y2 ∧ Z). In
other words, the logical equivalence is maintained for every parent–child link in the
tree. By a simple induction on the structure of the tree, the formula at the root of the
tree is equivalent to the disjunction of the formula sets of all leaf nodes. That is, the
procedure modifies the tree in such a way that the disjunction of the formula sets of
all leaf nodes of the resulting tableau is equivalent to the input formula. One of these
formula sets may contain a pair of complementary literals, i.e., the corresponding
node is closed, in which case that set is equivalent to false. If the sets of all leaf
nodes are equivalent to false, i.e., every leaf node is closed, the input formula is
unsatisfiable. This shows the soundness of the semantic tableau method.
If a formula is unsatisfiable, then every term in its DNF must be equivalent to
false and the corresponding node in the tableau is closed. Thus, we will have a
closed tableau. This shows the completeness of the semantic tableau method.
Since neither α-rules nor β-rules can be applied forever (each rule reduces either
one occurrence of a binary operator or two negations), the semantic tableau method
must be terminating.
The semantic tableau method can be used to show the satisfiability of a formula.
If there is an open node in the tableau, we may assign 1 to every literal in this
node to create an interpretation, and the formula set of this node is true under this
interpretation. The input formula will also be true under this interpretation because
the input formula is equivalent to the disjunction of the formula sets of all leaf
nodes. Thus, this interpretation is a model of the input formula. This shows that the
procedure can be used to find a model when the formula is satisfiable.
Semantic tableau specifies what rules can be used but it does not specify which
rule should be used first or which leaf node should be expanded first. In general, α-
rules are better applied before β-rules, so that the number of nodes in a tree grows
slowly. The user has the freedom to apply β-rules to any formula in the set, resulting
in trees of different shapes. The procedure may terminate once an open node is found,
if the goal is to show that the input formula is satisfiable.
Semantic tableaux are much more expressive and easier to use than truth tables,
though that is not the reason for their introduction. Sometimes, a tableau uses more
space than a truth table: an unsatisfiable full CNF of n variables will
generate a tableau of more than n! nodes, which is larger than a truth table of 2^n
rows.
rows. The beauty of semantic tableaux lies in the simplicity of presenting a proof
procedure using a set of rules. We will see in later chapters how new rules are added
into this procedure so that a proof procedure for other logics can be obtained.
In logic, a deductive system S consists of a set of inference rules, where each rule
takes premises as input and returns a conclusion (or conclusions) as output. Popular
inference rules in propositional logic include modus ponens (MP), modus tollens
(MT), and contraposition (CP), which can be displayed, respectively, as follows:
A → B    A            A → B    ¬B            A → B
------------ (MP)     ------------- (MT)     ----------- (CP)
     B                     ¬A                 ¬B → ¬A
In general, an inference rule is displayed as

P1    P2    · · ·    Pk
------------------------
           C

where P1, P2, . . . , Pk are the premises and C is the conclusion. We may also write
it as

P1, P2, . . . , Pk ⊢ C
If the premises are empty (k = 0), we say this inference rule is an axiom rule.
An inference rule is sound if its premises entail the conclusion, that is,
{P1, P2, . . . , Pk} ⊨ C
1. p → (q → r) axiom
2. p axiom
3. q→r MP, 1, 2
4. ¬r axiom
5. ¬q MT, 3, 4
where “MP, 1, 2” means formula 3. is obtained by the MP rule with formulas 1. and
2. as premises.
The formulas in a proof can be rearranged so that all axioms appear in the
beginning of the sequence. For example, the above proof can be rewritten as the
following.
1. p → (q → r) axiom
2. p axiom
3. ¬r axiom
4. q→r MP, 1, 2
5. ¬q MT, 3, 4
Every proof can be represented by a directed graph where each node is a formula
in the proof, and if a formula B is derived from A1 , A2 , . . . , Ak by an inference
rule, then there are edges (Ai , B), 1 ≤ i ≤ k, in the graph. This proof graph must
be acyclic as we assume the premises must be present in the sequence before the
new formula is derived by an inference rule. Every topological sort of the nodes in
this graph gives us a linear proof. Since every directed acyclic graph (DAG) can
be converted into a tree (by duplicating some nodes) while preserving the parent–
child relation, we may present a proof by a tree as well. The proof tree for the above
proof example is given in Fig. 3.2. Note that the axioms have no incoming
edges, the final formula in the proof has no outgoing edges, and all intermediate
formulas have both incoming and outgoing edges.
The soundness of an inference system comes from the soundness of all its inference
rules. In general, we require all inference rules to be sound so as to preserve the truth of all
the derived formulas. For a sound inference system S, every derived formula B is a
theorem of A, i.e., B ∈ T(A). If A is a set of tautologies, B is a tautology, too.
Theorem 3.2.2 If an inference system S is sound and A ⊢S B, then A ⊨ B.
Proof A ⊢S B means there is a proof of B in S. Suppose the proof is
F1, F2, . . . , Fn and Fn = B. By induction on n, we show that A ⊨ Fi for all
i = 1, . . . , n. If Fi ∈ A, then A ⊨ Fi. If Fi is derived from an inference
rule "P1, P2, . . . , Pm ⊢ C", using Fj1, Fj2, . . . , Fjm as premises, then by the
induction hypothesis, A ⊨ Fjk for 1 ≤ k ≤ m, because jk < i. That is,
A ⊨ Fj1 ∧ Fj2 ∧ · · · ∧ Fjm. Since the inference rule is sound,
P1 ∧ P2 ∧ · · · ∧ Pm ⊨ C. Applying the substitution theorem,
Fj1 ∧ Fj2 ∧ · · · ∧ Fjm ⊨ Fi. Because ⊨ is transitive, A ⊨ Fi.
Different inference systems are obtained by changing the axioms or the inference
rules. In propositional logic, all these systems are equivalent in the sense that they
are sound and complete.
H1 : (A → (B → A)),
H2 : (A → (B → C)) → ((A → B) → (A → C)),
H3 : (¬B → ¬A) → (A → B),
MP : A, A → B ⊢ B.
The axiom rule H3 is not needed if → is the only operator in all formulas. Other
logical operators can be defined using → and ¬. For example, A ∧ B ≡ ¬(A →
¬B) and A ∨ B ≡ (¬A → B). ¬A can be replaced by A → ⊥ in H3 .
Example 3.2.4 To show that ⊢H (p → p), we have the following proof in H:
1. p → ((p → p) → p) H1 , A = p, B = (p → p)
2. (p → ((p → p) → p))
((p → (p → p)) → (p → p)) H2 , A = C = p, B = p → p
3. (p → (p → p)) → (p → p) MP, 1, 2
4. (p → (p → p)) H1 , A = p, B = p
5. p → p MP, 3, 4
1. p→q axiom
2. q→r axiom
3. ((q → r) → (p → (q → r))) H1
4. p → (q → r) MP, 2, 3
5. (p → (q → r)) → ((p → q) → (p → r)) H2
6. (p → q) → (p → r) MP, 4, 5
7. p→r MP, 1, 6
Both semantic tableaux and the Hilbert system for constructing proofs have their
disadvantages. It is difficult to construct proofs in the Hilbert system, and its main uses are
metalogical: the small number of rules makes it easier to prove soundness and
completeness results about a logic. The tableau method, on the other hand, is easy to use
mechanically, but, because of the form of the rules and the fact that a tableau starts from the negation
of the formula to be proved, the resulting proofs are not a natural sequence of easily
justifiable steps. Likewise, very few proofs in mathematics proceed directly from axioms;
mathematicians in practice usually reason in a more flexible way.
Natural deduction is a deductive system in which logical reasoning is expressed
by inference rules closely related to the “natural” way of reasoning. This contrasts
with the Hilbert system, which instead uses axiom rules as much as possible to express
the logical laws of deductive reasoning. Natural deduction in its modern form was
independently proposed by the German mathematician Gerhard Gentzen in 1934.
In Gentzen’s natural deductive system, a formula is represented by a set A of
formulas, A = {A1 , A2 , . . . , An }, where the comma in A is understood as ∨. To
avoid confusion, we will write A as
(A1 | A2 | · · · | An )
name       inference rule
axiom      ⊢ (A | ¬A | α)
cut        (A | α), (¬A | β) ⊢ (α | β)
thinning   (α) ⊢ (A | α)

op    introduction                                 elimination
¬     (A | α) ⊢ (¬¬A | α)                          (¬¬A | α) ⊢ (A | α)
∨     (A | B | α) ⊢ (A ∨ B | α)                    (A ∨ B | α) ⊢ (A | B | α)
∧     (A | α), (B | α) ⊢ (A ∧ B | α)               (a) (A ∧ B | α) ⊢ (A | α)
                                                   (b) (A ∧ B | α) ⊢ (B | α)
→     (¬A | B | α) ⊢ (A → B | α)                   (A → B | α) ⊢ (¬A | B | α)
¬∨    (¬A | α), (¬B | α) ⊢ (¬(A ∨ B) | α)          (a) (¬(A ∨ B) | α) ⊢ (¬A | α)
                                                   (b) (¬(A ∨ B) | α) ⊢ (¬B | α)
¬∧    (¬A | ¬B | α) ⊢ (¬(A ∧ B) | α)               (¬(A ∧ B) | α) ⊢ (¬A | ¬B | α)
¬→    (A | α), (¬B | α) ⊢ (¬(A → B) | α)           (a) (¬(A → B) | α) ⊢ (A | α)
                                                   (b) (¬(A → B) | α) ⊢ (¬B | α)
1. (p | ¬p | ¬q)               axiom, A = p, α = ¬q
2. (q | ¬q | ¬p)               axiom, A = q, α = ¬p
3. (p ∧ q | ¬p | ¬q)           ∧I, 1, 2, A = p, B = q, α = ¬p | ¬q
4. (¬¬(p ∧ q) | ¬p | ¬q)       ¬I, 3, A = (p ∧ q), α = (¬p ∨ ¬q)
5. (¬¬(p ∧ q) | (¬p ∨ ¬q))     ∨I, 4, A = ¬p, B = ¬q, α = ¬¬(p ∧ q)
6. (¬(p ∧ q) → (¬p ∨ ¬q))      →I, 5, A = ¬(p ∧ q), B = (¬p ∨ ¬q), α = ⊥
Note that I in ∧I means the introduction rule for ∧ is used. For better understanding,
we added the substitution of variables in each step.
Proposition 3.2.10 The deductive system G is sound, i.e., if A ⊢G B, then A ⊨ B.
Proof It is easy to check that every inference rule of G is sound. In particular,
the rules of operator introduction preserve the logical equivalence. In the rules of
operator elimination, some sequents generate two results. For example, (A ∧ B | α)
infers both (a) (A | α) and (b) (B | α). If we bind the two results together by
∧, these rules preserve the logical equivalence, too. On the other hand, the cut rule
and the thinning rule do not preserve the logical equivalence; only the entailment
relation holds. That is, for the cut rule, (A | α), (¬A | β) ⊢ (α | β), we have

(A ∨ α) ∧ (¬A ∨ β) ⊨ α ∨ β.
Comparing the proof in G and the tableau above, we can see a clear correspondence:
closed nodes (f) and (g) correspond to (1) and (2), which are axioms; (e) corresponds
to (3), (c) and (d) to (4), (b) to (5), and finally (a) to (6). That is, the
formula represented by each sequent in a proof of G can always find its negation in
the corresponding node of the semantic tableau.
We show below that G can be used as a refutation prover by the same example.
Example 3.2.11 Let A = {¬(¬(p ∧ q) → (¬p ∨ ¬q))}. A proof of A ⊢ ⊥ in G
is given below.
Note that ⊥ in the last step stands for the empty sequent.
In this proof, the inference rules used are the cut rule and the rules for eliminating
logical operators. If we apply repeatedly the rules of operator elimination to a
formula, we will obtain a list of sequents, each of which represents a clause. In
other words, these rules transform the formula into CNF, then the cut rule works on
them. The cut rule for clauses has another name, i.e., resolution, which is the topic
of the next section.
Indeed, G contains more rules than necessary: Using only the axiom rule and the
rules for operator introduction, it can imitate the dual of a semantic tableau upside
down. Using only the cut rule and the rules for operator elimination, it can do what
a resolution prover can do. Both semantic tableau and resolution prover are decision
procedures for propositional logic. The implication is that natural deduction G is a
decision procedure, too. We give the main result without a proof.
Theorem 3.2.12 The natural deduction system G is sound and complete, that is, A ⊨ B
iff A ⊢G B.
We have seen two examples of deductive systems: Hilbert system H and Gentzen’s
natural deduction G. To show the completeness of these deductive systems, we need
strategies on how to use the inference rules in these systems and the completeness
depends on these strategies. To facilitate the discussion, let us introduce a few
concepts.
Definition 3.2.13 Given an inference system S, the inference graph of S over the
axiom set A is defined as a directed graph G = (V, E), where each vertex x of V is
a set of formulas with A ∈ V and x ⊆ T(A), and (x, y) ∈ E iff y = x ∪ {C},
where C ∉ x is the result of an inference rule r ∈ S using some formulas in x as the
premises of r.
Graph G = (V, E) defines the search space for A ⊢S B, which becomes a search
problem: To find a proof of A ⊢S B is the same as to find a path from A to a node of
V containing B. That is, if all the axioms of A in a proof appear in the beginning of
the proof sequence, then this proof represents a path in the inference graph. Suppose
the proof is a sequence of formulas F1, F2, . . . , Fn such that the first k formulas are
from A, i.e., Fi ∈ A for 1 ≤ i ≤ k and Fi ∉ A for i > k. Then this proof represents
a directed path x0, x1, . . . , xn−k in the inference graph such that x0 = A and
Fj ∈ xj−k for k < j ≤ n. That is, a proof is a succinct way of presenting a directed
path in the inference graph G.
1. p → (q → r) axiom
2. p axiom
3. ¬r axiom
4. q→r MP, 1, 2
5. ¬q MT, 3, 4
This proof corresponds to the path (x0 , x1 , x2 ) in the inference graph, where
x0 = {1., 2., 3.}, x1 = x0 ∪ {4.}, and x2 = x1 ∪ {5.}.
Given a set of inference rules, a proof procedure can be easily constructed if
we use a fair strategy to search the inference graph. For example, starting from
A in the inference graph, a breadth-first search is a fair strategy, but a depth-first
search is not if the graph is infinite. A fair strategy requires that if an inference rule
is applicable at some point, this application will eventually happen. Any fair search
strategy, including heuristic search, best-first search, A*, etc., can be used for a proof
procedure.
If every inference rule is sound, the corresponding proof procedure will be sound.
Extra requirements, such as a fair strategy, are needed in general to show that the proof
procedure is complete or terminating. If the inference graph is finite, the termination
of the proof procedure is guaranteed, but this does not imply the completeness of the
procedure, which may fail due to the lack of certain inference rules.
3.3 Resolution
In the previous section, we introduced two deductive systems, Hilbert system and
natural deduction. These two systems are exemplary for the development of many
deductive systems but are not practical for theorem proving in propositional logic.
One of the reasons is the use of axiom rules which provide infinite possibilities.
Unlike reduction rules, inference rules may be applied without termination. For
automated theorem proving, we need a deductive system which contains very few
rules as computers can do simple things fast. Resolution R is such a system, which
contains a single inference rule, called resolution. Note that resolution is a special
case of the cut rule when all sequents are clauses.
To use the resolution rule, formula A is first transformed into CNF, which is a
conjunction of clauses (disjunctions of literals). We will use A = {C1, C2, . . . , Cm}
to denote this set of clauses, where each clause is written as

(l1 | l2 | · · · | lk)

and l1, l2, . . . , lk are literals. The resolution rule takes two clauses containing a
complementary pair of literals and generates a new clause, called the resolvent:

(A | α)    (¬A | β)
---------------------
      (α | β)

Written with clauses as implications, the rule is the transitivity of →:

(¬α → A)    (A → β)
---------------------
     (¬α → β)

and modus ponens is the special case in which α is empty:

A    A → β
------------
     β
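As a small concrete illustration (ours, not from the book), clauses can be
represented as frozensets of literal strings, and all resolvents of two clauses
computed directly from the rule:

    def neg(lit):
        return lit[1:] if lit.startswith("-") else "-" + lit

    def resolvents(c1, c2):
        # all clauses obtainable by resolving c1 and c2 on one literal;
        # duplicate literals merge automatically in the set representation
        out = set()
        for lit in c1:
            if neg(lit) in c2:
                out.add(frozenset((c1 - {lit}) | (c2 - {neg(lit)})))
        return out

    print(resolvents(frozenset({"p", "q"}), frozenset({"-p", "q", "r"})))
    # {frozenset({'q', 'r'})}, the resolvent (q | r) of Example 3.3.1 below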
Example 3.3.1 Let C = {(p | q), (p | ¬q), (¬p | q | r), (¬p | ¬q), (¬r)}. A
proof of C ⊢R ⊥ is given below:
1. (p | q) axiom
2. (p | ¬q) axiom
3. (¬p | q | r) axiom
4. (¬p | ¬q) axiom
5. (¬r) axiom
6. (q | r) R, 1, 3, A = p, α = q, β = (q | r)
7. (¬q) R, 2, 4, A = p, α = ¬q, β = ¬q
8. (r) R, 6, 7, A = q, α = r, β = ⊥
9. ( ) R, 8, 5, A = r, α = ⊥, β = ⊥
Note that we assume that duplicated literals are removed and the order of literals
is irrelevant in a clause. Thus, the resolvent generated from clauses 1 and 3 is (q |
q | r), but we keep it as (q | r) by removing a copy of q. From clauses 1 and 4, we
may obtain a resolvent (q | ¬q) on p, or another resolvent (p | ¬p) on q. These two
resolvents are tautologies and are not useful in the search for ⊥. You cannot resolve
on two variables at the same time to get ( ) directly from clauses 1 and 4.
Definition 3.3.2 A resolution proof P from the set of input clauses C is a refutation
proof of C ⊢R ⊥, and the length of P is the number of resolvents in P, denoted by
|P |. The set of the input clauses appearing in P is called the core of P , denoted by
core(P ).
The length of the resolution proof in Example 3.3.1 is 4, which is the length of
the list minus the number of axiom clauses in the proof. The core of this proof is the
entire input.
Proposition 3.3.3 The resolution R is sound, i.e., if C ⊢R B, then C ⊨ B.
Proof Using a truth table, it is easy to check that ((A ∨ α) ∧ (¬A ∨ β)) → (α ∨ β)
is a tautology. Thus, the resolution rule is sound. The proposition then follows from
Theorem 3.2.2.
Corollary 3.3.4 If C ⊢R ⊥, then C is unsatisfiable.
Proof Since C ⊢R ⊥ means C ⊨ ⊥, we have M(C) = M(C ∧ ⊥) = M(C) ∩ M(⊥) =
∅, because M(⊥) = ∅ (see Corollary 2.2.26).
This corollary allows us to design a refutation prover: To show A is valid, we
convert ¬A into an equivalent set C of clauses and show that C ⊢R ⊥, where C
is called the input clauses. Since the total number of clauses is finite, the inference
graph of R is finite. It means that the proof procedure will terminate but it does not
mean the graph is small. Before we establish the completeness of R, let us consider
how to build an effective resolution prover.
A successful implementation of a resolution prover needs the integration of
search strategies that avoid many unnecessary resolvents. Some strategies remove
redundant clauses as soon as they appear in a derivation. Some strategies avoid
generating redundant clauses in the first place. Some strategies sacrifice the
completeness for efficiency.
Here are some well-known resolution strategies, which add restrictions to the use of
the resolution rule.
• Unit resolution: One of the two parents is a unit clause.
• Input resolution: One of the two parents is an input clause (given as the axioms).
• Ordered resolution: Given an order on propositional variables, the resolved
variable must be maximal in both parent clauses.
• Positive resolution: One of the two parents is positive.
• Negative resolution: One of the two parents is negative.
• Set-of-support: The input clauses are partitioned into two sets, S and T , where
S is called the set of support, and a resolution is not allowed if both parents are
in T .
• Linear resolution: The latest resolvent is used as a parent for the next resolution
(no restrictions on the first resolution).
A resolution proof is called X resolution proof if every resolution in the proof is
an X resolution, where X is “unit,” “input,” “ordered,” “positive,” “negative,” “set-
of-support,” or “linear.”
Example 3.3.5 Let C = {1. (p | q), 2. (¬p | r), 3. (p | ¬q | r), 4. (¬r)}. In the
following proofs, the input clauses are omitted and the parents of each resolvent are
shown as a pair of numbers following the resolvent.
Note that (a) is a unit and negative resolution proof; (b) is an input, positive and
linear resolution proof; (c) is an ordered resolution proof, assuming p > q > r.
The length of proof (a) is 5; the length of (b) is 3; and the length of (c) is 4.
It is easy to check that (a) is not an input resolution proof, because clause 7 is not
obtained by input resolution. (b) is not a unit resolution proof. (c) is neither a unit
nor an input resolution proof; (c) is neither positive nor negative resolution proof.
The resolution rule does not preserve logical equivalence; that is, if D is a
resolvent of C1 and C2, then C1 ∧ C2 ⊨ D and C1 ∧ C2 ≡ C1 ∧ C2 ∧ D, but
in general C1 ∧ C2 ≢ D.
Example 3.3.6 Let q be the resolvent of p and (¬p | q). M(q) = {pq, p̄q} has two
models, where p̄ denotes ¬p. However, M(p ∧ (¬p | q)) has only one model, i.e.,
pq. Thus, q ≢ p ∧ (¬p | q). However, both formulas are satisfiable.
Definition 3.3.7 Given two formulas A and B, A and B are said to be equally
satisfiable or equisatisfiable, if whenever A is satisfiable, so is B, and vice versa.
We denote this relation by A ≈ B.
Obviously, A ≈ B means M(A) = ∅ iff M(B) = ∅. A ≈ B is weaker than the
logical equivalence A ≡ B, which requires M(A) = M(B).
Proposition 3.3.8 Let S = C ∪ {(α | β)} and S′ = C ∪ {(α | x), (¬x | β)} be two
sets of clauses, where x is a variable not appearing in S. Then S ≈ S′.
Proof If S is satisfiable and σ is a model of S, then σ(C) = 1 and σ(α | β) = 1. If
σ(α) = 0, then σ(β) = 1. We extend σ by defining σ(x) = 1 if σ(α) = 0, and σ(x) = 0
otherwise. Then both σ(α | x) = 1 and σ(¬x | β) = 1, so σ is a model of S′.
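On small instances, Proposition 3.3.8 can be sanity-checked by brute force. The
sketch below (our illustration, with an arbitrary sample clause set) splits the
clause (p | q) with a fresh variable x and confirms that satisfiability is preserved:

    from itertools import product

    def sat(clauses, variables):
        # brute-force satisfiability test; clauses are sets of literal strings
        def true_lit(l, v):
            return v[l[1:]] == 0 if l.startswith("-") else v[l] == 1
        for bits in product([0, 1], repeat=len(variables)):
            v = dict(zip(variables, bits))
            if all(any(true_lit(l, v) for l in c) for c in clauses):
                return True
        return False

    S = [{"p", "q"}, {"-p", "r"}]                   # contains (p | q)
    Sprime = [{"p", "x"}, {"-x", "q"}, {"-p", "r"}] # (p | q) split by fresh x
    print(sat(S, ["p", "q", "r"]), sat(Sprime, ["p", "q", "r", "x"]))  # True True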
(A ∧ B) ∨ C ≡ (A ∨ C) ∧ (B ∨ C)
The input size of A is 14 (excluding commas and parentheses). The total number of
literals of the resulting clause set is 30.
Example 3.3.11 In Example 2.3.9, we have converted p1 ↔ (p2 ↔ p3 ) into four
clauses:
Lemma 3.3.12 plays an important role in the completeness proof of resolution: The
set X can be obtained from C by resolution. From C to D ∪ X, we have eliminated
one variable from C. Replacing C by D ∪ X, we repeat the same process and remove
another variable, until no variables remain: either C = ∅ or C = {⊥}. From the
final result, we can tell if the set of input clauses is satisfiable or not.
The following proof is based on the above idea for the completeness of ordered
resolution.
Theorem 3.3.13 Ordered resolution is complete with any total ordering on the
variables.
Proof Suppose there are n variables in the input clauses Sn with the order xn >
xn−1 > · · · > x2 > x1 . For i from n to 1, we apply Lemma 3.3.12 with C = Si
and x = xi , and obtain Si−1 = D ∪ X. Every resolvent in X is obtained by ordered
resolution on xi , which is the maximal variable in Si . Finally, we obtain S0 which
contains no variables. Since Sn ≈ Sn−1 ≈ · · · ≈ S1 ≈ S0 , if Sn is unsatisfiable,
then S0 = {⊥}, where ⊥ denotes the empty clause; if Sn is satisfiable, we arrive at
S0 = ∅, i.e., S0 ≡ ⊤. In this case, we may use the idea in the proof of Lemma 3.3.12
to assign a truth value for each variable, from x1 to xn , to obtain a model of Sn .
Example 3.3.14 Let S = {c1 : (a | b | c), c2 : (a | e), c3 : (b | c), c4 : (b | d),
c5 : (c | d | e), c6 : (d | e) } and a > b > c > d > e. By ordered resolution,
we get c7 : (b | c | e) from c1 and c2 ; c8 : (c | d) from c3 and c4 ; c9 : (c | e)
from c3 and c7 ; and c10 : (d | e) from c5 and c9 . No other clauses can be generated
by ordered resolution, that is, c1 –c10 are saturated by ordered resolution. Since the
empty clause is not generated, we may construct a model from c1 –c10 by assigning
truth values to the variables from the least to the greatest. Let Sx be the set of clauses
from c1 –c10 such that x appears in each clause of Sx as the maximal variable.
• Se = ∅ and we can assign either 0 or 1 to e, say σe = {e}.
• Sd = {c6 , c10 }. From c10 , d has to be 1, so σd = σe ∪ {d} = {e, d}.
• Sc = {c5 , c8 , c9 }. From c8 or c9 , c has to be 1, so σc = σd ∪ {c} = {e, d, c}.
• Sb = {c3 , c4 , c7 }. Both c3 and c7 are satisfied by σc . From c4 , we have to assign
0 to b, so σb = σc ∪ {b} = {e, d, c, b}.
• Sa = {c1 , c2 }. From c2 , a has to be 1, so σa = σb ∪ {a} = {e, d, c, b, a}.
It is easy to check that σa is a model of S. Hence, S is satisfiable.
The above example illustrates that if a set of clauses is saturated by ordered
resolution and the empty clause is not there, we may construct a model from these
clauses. In other words, we may use ordered resolution to construct a decision
procedure for deciding if a formula A is satisfiable or not.
Algorithm 3.3.15 The algorithm orderedResolution will take a set C of clauses and
a list V of variables, V = (p1 < p2 < . . . < pn ). It returns true iff C is satisfiable.
It uses the procedure sort (A), which places the maximal literal of clause A as the
first one in A.
proc orderedResolution(C, V )
C := {sort (A) | A ∈ C}
for i := n downto 1 do
X := ∅
for (pi | α) ∈ C do // pi is maximal in (pi | α)
for (¬pi | β) ∈ C do // pi is maximal in (¬pi | β)
A := (α | β)
if A = ( ) return false
X := X ∪ {sort (A)}
C := C ∪ X
return true
Before returning true at the last line, this algorithm can call createModel to
construct a model of C when C is saturated by ordered resolution. That is, C
contains the results of every possible ordered resolution among the clauses of C.
proc createModel(C)
σ := ∅ // the empty interpretation
for i := 1 to n do
v := 0 // default value for pi
for (pi | α) ∈ C do // pi is maximal
if eval(α, σ ) = 0 do
v := 1 // pi has to take 1
σ := σ ∪ {pi → v}
return σ
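A runnable Python transcription of orderedResolution and createModel might look
as follows (our sketch, not the book's code; variables are numbered 1 to n in
increasing order and literals are signed integers):

    def ordered_resolution(clauses, n):
        """Clauses are frozensets of literals +i/-i over variables 1..n,
        ordered 1 < 2 < ... < n. Returns (True, model) or (False, None)."""
        C = set(clauses)
        for i in range(n, 0, -1):             # eliminate the maximal variable first
            pos = [c for c in C if max(map(abs, c)) == i and i in c]
            neg = [c for c in C if max(map(abs, c)) == i and -i in c]
            for a in pos:
                for b in neg:
                    res = frozenset((a - {i}) | (b - {-i}))
                    if not res:
                        return False, None    # empty clause: unsatisfiable
                    C.add(res)
        model = {}                            # saturated: construct a model
        for i in range(1, n + 1):
            v = False                         # default value for variable i
            for c in C:
                if max(map(abs, c)) == i and i in c and \
                   not any(model.get(abs(l)) == (l > 0) for l in c if l != i):
                    v = True                  # the rest of c is false: i must be 1
            model[i] = v
        return True, model

    # The clause set of Example 3.3.1 (p, q, r as 1, 2, 3) is unsatisfiable:
    C = [frozenset(s) for s in ([1, 2], [1, -2], [-1, 2, 3], [-1, -2], [-3])]
    print(ordered_resolution(C, 3))           # (False, None)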
proc resolution(C)
G := C; K := ∅
while G ≠ ∅ do
  A := pickClause(G)
  G := G − {A}
  for B ∈ K if resolvable(A, B) do
    res := resolve(A, B)
    if res = ( ) return false
    if res ∉ (G ∪ K)
      G := G ∪ {res}
  K := K ∪ {A}
return true
In the above algorithm, we move each clause from G to K, and compute the
resolution between this clause and all the clauses in K. When G is empty, resolution
between any two clauses of K is done.
With the exception of linear resolution, the above procedure can be used to
implement all the resolution strategies introduced in this section; the restriction
is implemented in the procedure resolvable. For set-of-support, a better
implementation initializes G with the set of support and K with the rest of the
input clauses.
The termination of this procedure is guaranteed because only a finite number
of clauses exist. When the procedure returns false, the answer is correct because
resolution is sound. When it returns true, the correctness of the answer depends on
the completeness of the resolution strategy implemented in resolvable.
In pickClause, we may implement various heuristics for selecting a clause in G
to do resolutions with clauses in K. Popular heuristics include preferring shorter
clauses or older clauses. Shorter clauses are preferred because they are stronger
constraints on the acceptance of interpretations as models. An empty
clause prohibits all interpretations as models; a unit clause prohibits half of the
interpretations as models; a binary clause prohibits a quarter of the interpretations
as models; and so on. Preferring older clauses alone would give us the breadth-first
strategy. If we wish to find a shorter proof, we may mix this preference with other
heuristics.
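A compact Python sketch of this given-clause loop (ours; it uses unrestricted
propositional resolution, with the prefer-shorter-clauses heuristic applied only to
the initial clauses for simplicity) is:

    from collections import deque

    def resolvents(a, b):
        # all resolvents of clauses a and b (frozensets of signed integers)
        return [frozenset((a - {l}) | (b - {-l})) for l in a if -l in b]

    def saturate(clauses):
        """Move one clause at a time from G (to do) to K (kept) and resolve
        it against every kept clause. Returns False if the empty clause is
        derived, True when the set is saturated without it."""
        G = deque(sorted(set(clauses), key=len))   # pickClause: shorter first
        K = []
        while G:
            A = G.popleft()
            for B in K:
                for res in resolvents(A, B):
                    if not res:
                        return False               # empty clause found
                    if res not in G and res not in K:
                        G.append(res)
            K.append(A)
        return True

    # The clause set C of Example 3.3.5 is unsatisfiable:
    print(saturate([frozenset({1, 2}), frozenset({-1, 3}),
                    frozenset({1, -2, 3}), frozenset({-3})]))  # False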
Despite the result that the algorithm resolution with a complete strategy can serve
as a decision procedure for propositional logic, and some strategies can even build
a model when the input clauses are satisfiable, resolution provers are in general
not efficient for propositional logic, because the number of possible clauses is
exponential in terms of the number of variables. The computer may quickly run
out of memory before finding a solution for real application problems.
There are several strategies which can safely remove redundant clauses inside the
resolution algorithm. They are tautology deletion, subsumption deletion, and pure
literal deletion.
A unit clause (A) subsumes every clause (A | α) in which A appears as a literal.
Unit resolution between (A) and (¬A | β), where ¬A denotes the complement of A
(¬A is ¬p if A is p), generates the resolvent (β), which subsumes (¬A | β). The rule
for replacing (¬A | β) by (β) in the presence of (A) is called unit deletion. That is,
when subsumption deletion is used in resolution, a unit clause (A) allows us to
remove all occurrences of A (together with the clauses containing A) and of ¬A (only
the literal ¬A in the clauses containing ¬A), with the unit clause itself as the only
exception. New unit clauses may be generated by unit resolution, and we can
continue to simplify the clauses with the new unit clauses, and so on. This process
is traditionally called Boolean constraint propagation (BCP), or unit propagation.
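The following Python sketch implements BCP in this naive style (our illustration,
operating on clauses as sets of signed integers rather than on the data structures
introduced below):

    def bcp(clauses):
        """Unit propagation by explicit clause rewriting. Returns None if
        the empty clause is derived (the input is then unsatisfiable);
        otherwise returns (U, S) with U the set of unit literals and S the
        remaining non-unit clauses."""
        S, U = [set(c) for c in clauses], set()
        while True:
            unit = next((c for c in S if len(c) == 1), None)
            if unit is None:
                return U, S
            (l,) = unit
            U.add(l)
            new_S = []
            for c in S:
                if l in c:
                    continue            # clause subsumed by (l): delete it
                c = c - {-l}            # unit deletion: drop the false literal
                if not c:
                    return None         # empty clause found
                new_S.append(c)
            S = new_S

    print(bcp([{1}, {-1, 2}, {-2, -3}, {3, 4}]))  # ({1, 2, -3, 4}, []), up to set order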
({¬x5, x2, x4, ¬x1}, ∅)

because we first add ¬x5 into U; c1 becomes (x2), and x2 is then added into U; c4 is
deleted because x2 ∈ U; c3 becomes (x4) and x4 is added into U; c2 becomes (¬x1)
and ¬x1 is added into U. The set U allows us to construct two models of C:

{x1 → 0, x2 → 1, x3 → v, x4 → 1, x5 → 0}

where v is either 0 or 1.
A Horn clause is a clause with at most one positive literal. Horn clauses are named
after Alfred Horn, who first pointed out their significance in 1951. A Horn clause
is called definite clause if it has exactly one positive literal. Thus, Horn clauses can
be divided into definite clauses and negative clauses. Definite clauses can be further
divided into facts (positive unit clauses) and rules (non-unit definite clauses). Horn
clauses have important applications in logic programming, formal specification, and
model theory, as they admit very efficient decision procedures based on unit resolution
or input resolution. That is, unit and input resolution are incomplete in general, but
they are complete for Horn clauses.
Theorem 3.4.4 BCP is a decision procedure for the satisfiability of Horn clauses.
Proof Suppose H is a set of Horn clauses and ⊥ ∉ H. It suffices to show that
procedure BCP(H) returns ⊥ iff H ≡ ⊥. If BCP(H) returns ⊥, by Proposition 3.4.3,
H ≡ ⊥. If BCP(H) returns (U, S), by Proposition 3.4.3, H ≡ U ∪ S, where U is a set
of unit clauses, S is a set of non-unit clauses, and U and S share no variables. In
this case, we create an assignment σ in which every literal in U is true and every
variable in S is false. Note that each variable in U appears only once, and U and S
do not share any variable; thus, this assignment is consistent. It is a model of U ∪ S,
because every unit clause in U is true under σ, and every clause in S, being a
non-unit Horn clause, must have a negative literal, which is true under σ. Thus,
H is satisfiable when BCP(H) returns (U, S).
Proposition 3.4.5 Unit resolution is sound and complete for Horn clauses.
Proof Unit resolution is sound because resolution is sound. The previous theorem
shows that BCP is a decision procedure for the satisfiability of Horn clauses. The
only inference rule used in BCP is unit resolution: every clause generated by
BCP can be generated by unit resolution, and subsumption only removes some
unnecessary clauses and does not affect the completeness.
In the following, we show that input resolution and unit resolution are equivalent,
in the sense that if there exists a unit resolution proof, then there exists an input
resolution proof and vice versa.
Theorem 3.4.6 For any set C of clauses, if there exists an input resolution proof
from C, then there exists a unit resolution proof from C.
Proof We prove this theorem by induction on the length of proofs. Let P be
an input resolution proof of minimal length from C. The last resolvent is ⊥, both
of its parents must be unit clauses, and one of them must be an input clause, say (A).
For simplicity, assume all the input clauses used in P appear in the beginning of P
and (A) is the first in P. If |P| = 1, then P must be a unit resolution proof and
we are done. If |P| > 1, then remove (A), the last clause in P (i.e., ⊥), and all
occurrences of ¬A from P, where ¬A = ¬p if A = p and ¬A = p if A = ¬p. The result
is an input resolution proof P′ from C′, where C′ is obtained from C by removing
all occurrences of ¬A in C. That is, (¬A | α) ∈ C iff (α) ∈ C′. Because |P′| < |P|,
by the induction hypothesis, there exists a unit resolution proof Q′ from C′. If (α) ∈ C′
comes from (¬A | α) ∈ C, then change the reason of (α) in Q′ from "axiom" to
"resolution from (A) and (¬A | α)". Let the modified Q′ be Q; then Q is a unit
resolution proof from C.
Example 3.4.7 We illustrate the proof of this theorem by Example 3.3.5. Let P be the
input resolution proof (b) of Example 3.3.5. Then A = ¬r, and C′ is the
result of removing r from C:

C′ = {c1: (p | q), c2′: (¬p), c3′: (p | ¬q)}

and the two input resolutions in P′ are c5: (p) from (c1, c3′) and c6: ( ) from (c2′, c5).
A unit resolution proof Q′ from C′ will contain three unit resolutions:

c5: (q) from (c1, c2′), c6: (¬q) from (c2′, c3′), c7: ( ) from (c5, c6)

Changing the reasons of c2′ and c3′ from "axiom" to unit resolutions with (¬r), we
obtain a unit resolution proof Q from C:

c2′: (¬p) from (c2, c4), c3′: (p | ¬q) from (c3, c4), c5: (q) from (c1, c2′),
c6: (¬q) from (c2′, c3′), c7: ( ) from (c5, c6)
The above result is the basis for the completeness of pure Prolog programs, which
will be discussed in Chap. 8.
For a unit clause (A) to be true, the truth value of literal A needs to be 1. We can
use a partial interpretation σ to remember which literals appear in unit clauses.
Definition 3.4.12 A partial interpretation σ is a set of literals in which no variable
appears more than once, with the understanding that for any literal A, if A ∈ σ,
then σ(A) = 1 and σ(¬A) = 0, where ¬A denotes the complement of A (¬A = ¬p if
A is p, and ¬A = p if A is ¬p). When σ contains every variable exactly once, it is a
full interpretation.
From the above definition, it is convenient to consider a set of unit clauses as a
partial interpretation. For example, if U = {p, ¬q, r}, it represents the interpretation
σ = {p → 1, q → 0, r → 1}.
We can evaluate the truth value of a clause or formula in a partial interpretation
as we do in a full interpretation.
Definition 3.4.13 Given a partial interpretation σ and a clause c,
• c is satisfied if one or more of its literals is true in σ .
• c is conflicting if all the literals of c are false in σ .
• c is unit if all but one of its literals are false and c is not satisfied.
• c is unsolved if c is not one of the above three cases.
When c becomes unit, we have to assign 1 to the unassigned literal, say A, in c,
so that c becomes satisfied. That is, c is the reason for literal A being true, and we
record this as reason(A) = c. We also say literal A is implied by c.
Example 3.4.14 Given the clauses {c1: (x1), c2: (x3), c3: (¬x1 | x2 | ¬x3)},
reason(x1) = c1, reason(x3) = c2, and reason(x2) = c3. x2 is implied to be true
by c3 because c3 is equivalent to (x1 ∧ x3) → x2.
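The four cases can be computed directly from a partial interpretation. The small
Python sketch below (ours, with literals as signed integers and σ as a set of true
literals) classifies a clause following Definition 3.4.13 and, in the unit case,
returns the implied literal:

    def clause_status(clause, sigma):
        if any(l in sigma for l in clause):
            return "satisfied"
        unassigned = [l for l in clause if -l not in sigma]
        if not unassigned:
            return "conflicting"            # every literal of the clause is false
        if len(unassigned) == 1:
            return ("unit", unassigned[0])  # the literal implied by the clause
        return "unsolved"

    sigma = {1, 3}                            # x1 -> 1 and x3 -> 1, from c1 and c2
    print(clause_status({-1, 2, -3}, sigma))  # ('unit', 2): c3 implies x2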
In BCP, we start with an empty partial interpretation σ. If we have found a
unit clause (A), we extend σ by adding A into σ so that the unit clause becomes
satisfied. To save time, we do not remove the clauses containing A from the clause
set, and we do not remove ¬A from any clause to create a new clause. Instead, we
just remember that ¬A is now false in σ.
We need to find out which clauses become unit when some literals become false.
A clause c of k literals becomes unit when k − 1 literals of c are false.
If we know that two literals of c are not false, we are certain that c is neither unit
nor conflicting. Let
us designate these two literals as head and tail literals of c, which are two distinct
literals of c; we have the following property [2].
Proposition 3.4.15 A clause c of length longer than one is neither unit nor
conflicting in a partial interpretation σ iff either one of the head/tail literals of c
is true or neither of them is false in σ .
Proof Suppose c contains k literals and k > 1. If c is satisfied, then it contains a
true literal, and we may use it as one of the head/tail literals. If c is unit or conflicting,
then either k − 1 or all k literals of c are false, and no literal in c is true. In this case,
there do not exist two non-false literals to serve as the head/tail literals. If c is
unsolved, then there must exist two unassigned literals for the head/tail literals.
Initially, let the first and last literals be the head/tail literals. When a head literal
becomes false, we scan right to find in c the next literal which is not false and make
it the new head literal; similarly, when a tail literal becomes false, we scan left to
find the next tail literal. If we cannot find new head or tail literal, we must have
found a unit clause or a conflicting clause.
To implement the above idea, we use the following data structures for clauses:
For each clause c containing more than one literal:
• lits(c): the array of literals storing the literals of c
• head(c): the index of the first non-false literal of c in lits(c)
• tail(c): the index of the last non-false literal of c in lits(c)
Let |c| denote the number of literals in c, then the valid indices for lits(c) are
{0, 1, . . . , |c| − 1}, and head(c) is initialized with 0 and tail(c) with |c| − 1.
The following are the data structures associated with each literal A:
• val(A): the truth value of A under the partial interpretation, val(A) ∈ {1, 0, ×},
where × ("unassigned") is the initial value.
• cls(A): the list of clauses c such that A occurs in c and is pointed to by either
head(c) or tail(c) in lits(c). We use insert(c, cls(A)) and remove(c, cls(A)) to
insert and remove clause c into/from cls(A), respectively.
Assuming global variables are used for clauses and literals, the procedure BCP
can be implemented as follows:
proc BCP(C)
// initialization
1 U := ∅ // U : stack of unit clauses
2 for c ∈ C do
3 if c = ( ) return ⊥
4 if (|c| > 1)
5 lits(c) := makeArray(c) // create an array of literals
6 head(c) := 0; insert(c, cls(lits(c)[0]))
7 tail(c) := |c| − 1; insert(c, cls(lits(c)[|c| − 1]))
8 else push(c, U )
9 for each literal A, val(A) := × // × = “unassigned”
// major work
10 res := BCPht(U )
// finishing
11 if (res ≠ "SAT") return ⊥
12 U := {A | val(A) = 1}
13 S := {clean(c) | val(lits(c)[head(c)]) = ×, val(lits(c)[tail(c)]) = ×}
14 return (U, S)
Note that the procedure clean(c) at line 13 removes from c those literals
which are false (under val) and returns ⊤ if one of the literals in c is true. The
procedure BCPht does the major work of BCP.
Algorithm 3.4.16 BCPht(U ) assumes the input clauses C = U ∪ S, where U is a
stack of unit clauses and S is a set of non-unit clauses stored using the head/tail data
structure, and can be accessed through cls(A). BCPht returns a conflict clause if an
empty clause is found during unit propagation; it returns “SAT” if no empty clause
is found.
proc BCPht(U)
1 while U ≠ ∅ do
2   A := pop(U)
3   if val(A) = 0 return reason(A) // an empty clause is found
4   else if (val(A) = ×) // × is "unassigned"
5     val(A) := 1; val(¬A) := 0
6     for c ∈ cls(¬A) do // ¬A is either the head or the tail literal of c
7       if (¬A = lits(c)[head(c)])
8         e1 := head(c); e2 := tail(c); step := 1 // scan from left to right
9       else
10        e1 := tail(c); e2 := head(c); step := −1 // scan from right to left
11      while true do
12        e1 := e1 + step; x := e1 // x takes values from e1 + step to e2
13        if x = e2 break // exit the inner while loop
14        B := lits(c)[x]
15        if (val(B) ≠ 0) // a new head/tail literal is found
16          remove(c, cls(¬A)); insert(c, cls(B))
17          if (step = 1) head(c) := x else tail(c) := x
18          break // exit the inner while loop
19      if (x = e2) // no new head or tail literal is available
20        B := lits(c)[e2] // check if c is unit or conflicting
21        if val(B) = 0 return c // c is conflicting
22        else if (val(B) = ×) // c is unit
23          push(B, U); reason(B) := c; val(B) := 1; val(¬B) := 0
24 return "SAT" // no empty clause is found
Example 3.4.17 For the clauses in Example 3.4.2, we have the following data
structure after the initialization (the head and tail indices are given after the literal
list in each clause):
and val(A) = × (unassigned) for every literal A. Note that each clause appears twice
in the collection of the lists cls(A).
Inside BCPht(U), when ¬x5 is popped off U, we assign val(¬x5) = 1, val(x5) =
0 (line 5) and check c1 ∈ cls(x5) (line 6). c1 generates the unit clause x2, which is
pushed into U (line 23) with the assignments val(x2) = 1, val(¬x2) = 0. When x2 is
popped off U, we check c3 ∈ cls(¬x2). c3 generates the unit clause x4; we push x4
into U and assign val(x4) = 1, val(¬x4) = 0. When x4 is popped off U, we
check c2 ∈ cls(¬x4). c2 generates the unit clause ¬x1; we push ¬x1 into U and assign
val(¬x1) = 1, val(x1) = 0. When ¬x1 is popped off U, we check c4 ∈ cls(x1). c4
is true because val(x2) = 1. Now the head index of c4 is updated to 1, and we add
c4 = ⟨[x1, x2, x3], 1, 2⟩ into cls(x2). Since the empty clause is not found, BCPht
returns "SAT." Finally, BCP returns (U, S) = ({¬x5, x2, x4, ¬x1}, ∅).
Note that the two assignments "val(B) := 1; val(¬B) := 0" at line 23 of BCPht
can be removed, as the same work will be done at line 5. However, having them here
often improves the performance of BCPht. At line 23, we also record the reason for
the assignment of a literal; this information is returned at line 3 and is useful if we
want to know why a clause became conflicting. BCPht can be improved slightly by
checking (val(lits(c)[e2]) = 1) after line 10: If it is true, then clause c can be skipped,
as c is satisfied.
In this implementation of BCP, once all the clauses of C are read in, each
occurrence of a literal in a non-unit clause is visited at most once by BCPht.
Proposition 3.4.18 BCP(C) runs in O(n) time, where n is the total number of the
literal occurrences in all clauses of C.
Proof Obviously, the initialization of BCP takes O(n) time. In BCPht, we access a non-
unit clause c only when its head (or tail) literal becomes false (line 6). We advance past
the false literal (line 12) and check the next literal B (line 14); if it is false, we continue
the scan; otherwise, we remove c from the watch list of the false literal and insert it
into cls(B) (line 16), updating the head (tail) index (line 17). If we cannot find a new
head (tail) literal (line 19), we have found a unit or conflicting clause (lines 20-23).
None of the literals of c is visited more than once by this scan. In particular, we
never access the clauses in cls(A) when literal A is true.
Combining the above proposition with Theorem 3.4.4, we have the following
result:
Theorem 3.4.19 BCP is a linear time decision procedure for the satisfiability of
Horn clauses.
Exercises
1. Prove formally that the following statements are equivalent: For any two
formulas A and B, (a) A ⊨ B; (b) A → B is valid; (c) A ∧ ¬B is unsatisfiable.
2. Use the semantic tableaux method to decide if the following formulas are valid
or not:
(a) (A ↔ (A ∧ B)) → (A → B)
(b) (B ↔ (A ∨ B)) → (A → B)
(c) (A ⊕ B) → ((A ∨ B) ∧ (¬A ∨ ¬B))
(d) (A ↔ B) → ((A ∨ ¬B) ∧ (¬A ∨ B))
(a) (A ∧ B) → C ⊨ (A → C) ∨ (B → C)
(b) (A ∧ B) → C ⊨ A → C
(c) A → (B → C) ⊨ (A → B) → (A → C)
(d) (A ∨ B) ∧ (¬A ∨ C) ⊨ B ∨ C
4. Prove by the semantic tableau method that the following statement, “either the
debt ceiling isn’t raised or expenditures don’t rise” (¬d ∨ ¬e), follows logically
from the statements in Problem 31 of Chap. 2.
5. Prove that if all the axioms are tautologies and all the inference rules are
sound, then all the derived formulas from the axioms and the inference rules
are tautologies.
6. Specify the introduction and elimination rules for the operators ⊕ and ↔ in a
natural deductive system.
7. Estimate a tight upper bound of the size of the inference graph for resolution, if
the input clauses contain n variables.
8. Given a non-unit clause c, we may split c into two clauses by introducing a new
variable: Let c be (α | β), where α and β are lists of literals, and S = {(α |
y), (¬y | β)}, where y is a new variable. Show that S ⊨ c and S ≈ c. Using this
(a) (A ↔ (A ∧ B)) → (A → B)
(b) (A ⊕ B) → (A ∨ B)
(c) ((A ∧ B) → C) → (A → C)
(d) ((A ∨ B) ∧ (¬A ∨ C)) → (B ∨ C)
10. Use the resolution method to decide if the following formulas are valid or not.
If the formula is valid, provide a resolution proof; if it is not valid, provide an
interpretation which falsifies the formula.
(a) (A ↔ (A ∧ B)) → (A → B)
(b) (B ↔ (A ∨ B)) → (A → B)
(c) (A ⊕ B) → ((A ∨ B) ∧ (¬A ∨ ¬B))
(d) ((A ↓ B) ↑ C) → (A ↑ (B ↓ C))
11. Use the resolution method to decide if the following entailment relations are
true or not. If the relation holds, provide a resolution proof; if it doesn’t, provide
an interpretation such that the premises are true, but the consequence of the
entailment is false.
(a) (A → C) ∧ (B → D) ⊨ (A ∨ B) → (C ∨ D)
(b) (A ∧ B) → C ⊨ (A → C)
(c) A → (B → C) ⊨ (A → B) → (A → C)
(d) (A ∨ B) ∧ (¬A ∨ C) ⊨ B ∨ C
12. Apply clause deletion strategies to each of the following sets of clauses and
obtain a simpler equisatisfiable set.
13. Given the clause set S, where S = {1. (t | e | d), 2. (t | c), 3. (e | d | i),
4. (g | d | i), 5. (t | c | d | g), 6. (c), 7. (i | g), 8. (c | d), 9. (c | e) }, find
the following resolution proofs for S: (a) unit resolution, (b) input resolution,
(c) positive resolution, and (d) ordered resolution using the order t > i > g >
e > d > c.
14. Apply various resolution strategies to prove that the following clause set
S entails ¬s: unit resolution, input resolution, negative resolution, positive
resolution, and ordered resolution using the order of p > q > r > s.
References
Chapter 4
Propositional Satisfiability
In 1960, Martin Davis and Hilary Putnam proposed the inference rule of resolution
for propositional logic. For realistic problems, the number of clauses generated
by resolution grows quickly. In 1962, to avoid this explosion, Davis, George
Logemann, and Donald Loveland suggested replacing the resolution rule with a case
split: Pick a variable p and consider the two cases p → 1 and p → 0 separately.
This modified algorithm is commonly referred to as the DPLL algorithm, where
DPLL stands for Davis, Putnam, Logemann, and Loveland.
DPLL can be regarded as a search procedure: The search space is all the
interpretations, partial or full. A partial interpretation can be represented by a set
of literals, where each literal in the set is true and each variable not occurring in
the set is unassigned. DPLL starts with the empty interpretation and tries to
extend the interpretation by adding either p or ¬p through the case split. DPLL stops
when every variable is assigned without contradiction and the partial interpretation
it maintains becomes a model.
In the following, we give a recursive version of DPLL. The first call to DPLL is
made with the set C of input clauses and the empty interpretation σ = {}. It is one
of the oldest backtracking procedures. DPLL uses a procedure named BCP (Boolean
constraint propagation) heavily for handling unit clauses. From the previous chapter,
we know that BCP implements unit resolution and subsumption deletion. Given a
set C of clauses, BCP(C) returns ⊥ if the empty clause is found by unit resolution;
otherwise, BCP(C) returns (U, S), such that C ≡ (U ∪ S), U is a set of unit clauses,
S does not contain any unit clauses, and U and S share no variables. Note that
BCP(C) takes O(n) time to run, where n is the number of literal occurrences of C.
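As a rough illustration of this recursive scheme, here is a minimal Python sketch of DPLL built on a naive BCP. The helper names and the DIMACS-style representation of clauses as lists of nonzero integers are assumptions of the sketch, not the book's notation, and this naive BCP recopies the clause set instead of using the destructive data structures discussed later.

def bcp(clauses):
    # Unit propagation: returns (units, rest) with units a set of literals and
    # rest free of unit clauses, or None if the empty clause is derived.
    units = {}
    changed = True
    while changed:
        changed = False
        rest = []
        for c in clauses:
            c = [l for l in c if -l not in units]   # unit resolution
            if not c:
                return None                         # empty clause found
            if any(l in units for l in c):
                continue                            # subsumed by a unit clause
            if len(c) == 1:
                units[c[0]] = True                  # a new unit clause
                changed = True
            else:
                rest.append(c)
        clauses = rest
    return set(units), clauses

def dpll(clauses, sigma=frozenset()):
    # Returns a (possibly partial) model as a set of true literals, or None.
    res = bcp(clauses)
    if res is None:
        return None
    units, rest = res
    sigma = sigma | units
    if not rest:
        return sigma                                # every clause is satisfied
    p = rest[0][0]                                  # pick a literal to split on
    for branch in (p, -p):                          # case split: p, then its complement
        model = dpll(rest + [[branch]], sigma)
        if model is not None:
            return model
    return None

For example, dpll([[1, 2], [-1, 2], [-2, 3]]) returns a model containing 1, 2, and 3, while dpll([[1], [-1]]) returns None.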
Fig. 4.1 The decision tree of DPLL for Example 4.1.3. The result of BCP is shown along the tree
links
S ≡ (S ∧ A) ∨ (S ∧ ¬A)
into the set of input clauses, the model will be blocked by this clause and cannot
be generated by the SAT solver. Since a blocking clause contains every variable,
it is unlikely to become unit and it is expensive to keep all the blocking clauses.
Therefore, it is desirable to reduce the size of blocking clauses, i.e., to construct
a shorter clause which still blocks the model. If we know the guiding path of this
model, the negation of the conjunction of all the decision literals in the guiding path
will block the model.
Many applications of SAT solvers require solving a sequence of instances by
adding more clauses. Incremental SAT solvers with the guiding path can support
this application without any modification, because the search space skipped by a
guiding path does not contain any new model if more clauses are added.
In many application scenarios, it is beneficial to be able to make several SAT
checks on the same input clauses under different forced partial interpretations,
called “assumptions.” For instance, people may be interested in questions like “Is
the formula F satisfiable under the assumption x → 1 and y → 0?” In such
applications, the input formula is read in only once; the user implements an iterative
loop that calls the same solver instance under different sets of assumptions. The
calls can be adaptive, i.e., assumptions of future SAT solver calls can depend on the
results of the previous solver calls. The SAT solver can keep its internal state from
the previous call to the next call. DPLL with a guiding path can be easily modified
to support such applications, treating the assumption as a special set of unit clauses
which can be added into or removed from the solver. The guiding path allows us to
backtrack to the new search space but never revisit the search space visited before.
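As a concrete illustration, assuming the python-sat (PySAT) package and its Glucose3 interface, incremental SAT calls under assumptions look as follows; the clauses and assumptions here are made up for the example.

# incremental SAT checks under assumptions (requires the python-sat package)
from pysat.solvers import Glucose3

solver = Glucose3()
solver.add_clause([1, 2])      # x1 | x2
solver.add_clause([-1, 3])     # -x1 | x3

# "Is the formula satisfiable under the assumptions x1 -> 1 and x3 -> 0?"
print(solver.solve(assumptions=[1, -3]))   # False: x1 forces x3
# the solver keeps its internal state (e.g., learned clauses) between calls
print(solver.solve(assumptions=[-1]))      # True
print(solver.get_model())                  # a model, e.g., [-1, 2, -3]
solver.delete()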
To obtain a high-performance SAT solver, we may use the deletion techniques
presented in Sect. 3.3.6 as preprocessing techniques. They enable us to reduce the
size of the formula before passing it to DPLL.
Inside the body of DPLL, it calls itself twice, once with S ∪ {(A)} and once with
S ∪ {(¬A)}. A persistent data structure for S would quickly run out of memory as
S is very large and the decision tree is huge for practical SAT instances. A better
approach is to use a destructive data structure for S: We remember all the operations
performed onto S inside the first call, and then undo them, if necessary, when we
enter the second call.
Head/tail literals are a special case of watch literals: when head/tail literals are
allowed to appear anywhere in a clause, they become identical to watch literals.
With watch literals, we no longer know whether the literals located before the
head literal or after the tail literal are false when BCP is used to support DPLL.
Thus, we may need to search the whole clause to locate the next watch literal.
Since the idea of watch literals is based on that of head/tail literals, the imple-
mentation of watch literals is easy if we have an implementation of head/tail literals.
In BCPht, when a head or tail literal of c becomes false, we look for its replacement
at index x, where head(c) < x < tail(c). There are two alternative ways to modify
BCPht. One solution is to allow x ∈ {0, 1, . . . , |c| − 1} − {head(c), tail(c)}; we
will introduce an implementation of this solution shortly. The other solution is to
fix head(c) = 0 and tail(c) = 1 (or tail(c) = |c| − 1) and swap the elements of
lits(c) if necessary.
The second solution saves the space for head(c) and tail(c) but may cost more
when visiting literals of lits(c). For example, if a clause has k literals which become
false in the order from left to right during BCP, the second solution may take O(k²)
time to visit the literals of the clause, while the first solution takes O(k) time. As a
result, BCP cannot be a linear time decision procedure for Horn clauses if the second
solution is used. Of course, we may introduce a pointer in the clause to remember
the position of the last visited literal, so that the second solution still gives us a linear
time algorithm.
Algorithm 4.1.7 BCPw(U ) supports the implementation of BCP(C) based on the
head/tail data structure. BCPw(U ) works the same way as BCPht(U ), where U
is a stack of unit clauses and non-unit clauses are stored using the head/tail data
structure. Like BCPht, BCPw returns a conflicting clause if the empty clause is
generated; it returns “SAT” otherwise. Unlike BCPht, values of head(c) and tail(c)
can be any valid indices of lits(c).
proc BCPw(U )
1 while U ≠ {} do
2 A := pop(U )
3 if val(A) = 0 return reason(A) // a conflicting clause is found.
4 else if (val(A) ≠ 1)
5 val(A) := 1; val(¬A) := 0
6 for c ∈ cls(¬A) do // ¬A is either head or tail of c.
7 if (¬A = lits(c)[head(c)])
8 e1 := head(c); e2 := tail(c); x := e1 ; step := 1 // scan from left to right.
9 else
10 e1 := tail(c); e2 := head(c); x := e1 ; step := −1 // scan from right to left.
11 while true do
12 x := (x + step) mod |c| // x takes any valid index of lits(c).
13 if x = e1 break // exits the inner while loop.
13’ if x = e2 continue // go to line 12.
14 B := lits(c)[x]
15 if (val(B) ≠ 0) // B can serve as a new head or tail.
16 remove(c, cls(¬A)); insert(c, cls(B))
17 if (step = 1) head(c) := x else tail(c) := x
18 break // exit the inner while loop.
19 if (x = e1 ) // no new head or tail found.
20 B := lits(c)[e2 ] // c is unit or conflicting.
21 if val(B) = 0 return c // c is conflicting.
22 else if (val(B) ≠ 1) // c is unit.
23 push(B, U ); reason(B) := c; val(B) := 1; val(¬B) := 0
24 return “SAT” // no empty clauses are found.
The above algorithm comes from BCPht (Algorithm 3.4.16) by the following
modifications, which allow the values of head(c) and tail(c) to be any valid indices
of lits(c):
• At line 12, “x := x + step” is replaced by “x := (x + step) mod |c|.”
• At line 13, the condition “x = e2 ” is replaced by “x = e1 .”
• Line 13’ is added to skip the literal watched by e2 ; this is the literal to be in the
unit clause if the clause becomes unit.
• At line 19, the condition “x = e2 ” is replaced by “x = e1 .” When x = e1 is true,
every literal of the clause is false, with the exception of the literal watched by e2 .
In DPLL, the book-keeping required to detect when a clause becomes unit can
involve a high computational overhead if implemented naively. Since it is sufficient
to watch in each clause two literals that have not been assigned yet, assignments to
the non-watched literals can be safely ignored. When a variable p is assigned 1, the
SAT solver only needs to visit clauses watched by p. Each time one of the watched
literals becomes false, the solver chooses one of the remaining unassigned literals
to watch. If this is not possible, the clause is necessarily unit, or conflicting, or
already satisfied under the current partial assignment. Any sequence of assignments
that makes a clause unit will include an assignment of one of the watched literals.
The computational overhead of this strategy is relatively low: In a formula with m
clauses and n variables, 2m literals need to be watched, and m/n clauses are visited
per assignment on average. This advantage is inherited from the idea of head/tail
literals.
The key advantage of the watch literals over the head/tail literals is that the
watched literals do not need to be updated upon backtracking in DPLL. That is, the
key advantage of BCPw over BCPht is its use inside DPLL: When you backtrack
from a recursive call of DPLL, you do not need to undo the operations on head(c)
and tail(c), because any two positions of c will be sufficient for checking if c
becomes unit.
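The following Python sketch illustrates the scheme; it is not the book's BCPw, and the data layout and helper names are assumptions. Each clause keeps its two watched literals in positions 0 and 1 of its literal list, and only the clauses watching a literal that has just become false are visited.

def value(lit, assign):
    # truth value of a literal under a partial assignment (None = unassigned)
    v = assign.get(abs(lit))
    return v if v is None or lit > 0 else (not v)

def init_watches(clauses):
    # watches[lit] = clauses currently watching literal lit
    # (clauses are assumed to be distinct list objects)
    watches = {}
    for c in clauses:
        for lit in c[:2]:
            watches.setdefault(lit, []).append(c)
    return watches

def bcp_watched(clauses, watches, assign, queue):
    # Propagate the literals in queue (just made true).
    # Returns a conflicting clause, or None if no conflict arises.
    while queue:
        lit = queue.pop()
        falsified = -lit
        for c in watches.get(falsified, [])[:]:   # copy: we may move clauses
            if c[0] == falsified:
                c[0], c[1] = c[1], c[0]           # keep the false watch in c[1]
            if value(c[0], assign) is True:
                continue                          # clause already satisfied
            for i in range(2, len(c)):            # look for a replacement watch
                if value(c[i], assign) is not False:
                    c[1], c[i] = c[i], c[1]
                    watches[falsified].remove(c)
                    watches.setdefault(c[1], []).append(c)
                    break
            else:                                 # no replacement found
                if value(c[0], assign) is False:
                    return c                      # conflicting clause
                assign[abs(c[0])] = c[0] > 0      # clause is unit: imply c[0]
                queue.append(c[0])
    return None

On backtracking, only assign needs to be undone; the watch positions can be left as they are, which is exactly the advantage just described.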
Since a compiler converts a recursive procedure into an iterative one using a
stack, to avoid the expense of this conversion, in practice the procedure DPLL is
implemented not by means of recursion but in an iterative manner, using a stack
for the partial interpretation σ . If the head/tail data structure is used, σ can be
obtained from val(a), but we still need a stack of literals to record the time when
a literal is assigned. This stack is called trail, which can also serve as a partial
interpretation. Using the trail, we keep track of the literals, either decision literals
or implied literals, from the root of the decision tree to the current node. Every time
a literal is assigned a value, either by a decision or an implication inside BCP, the
literal is pushed into the trail and its level in the decision tree is remembered.
A trail may lead to a dead end, i.e., result in a conflicting clause, in which
case we have to explore the alternative branch of a case split. This corresponds
to backtracking, which reverts the truth value of a decision literal. When we
backtrack from a recursive call, we pop those literals off the trail and mark them
as “unassigned” (denoted by ×). After backtracking enough times, the search
algorithm eventually exhausts all branches of the decision tree, either finding all
models or concluding that no model exists.
To implement the above idea, we use the trail σ which is a stack of literals and
serves as a partial interpretation. σ will keep all the assigned literals. For each literal
A, we use lvl(A) to record the level at which A is assigned a truth value. This is
actually done at line 5 of BCPw. Line 5 of BCPw now reads as follows:
5: val(A) := 1; val(¬A) := 0; lvl(A) := lvl(¬A) := level; push(A, σ )
where val, lvl, σ , and level are global variables.
In practice, it is more convenient to set the level of a right child the same as
the level of its parent in the decision tree, to indicate that the right child has no
alternatives to consider. In the following implementation of DPLL, we have used
this idea.
In short, BCP(C) used in DPLL (Algorithm 4.1.1) simplifies C to (U, S), where
U is a set of unit clauses and S is a set of non-unit clauses. Now, the clauses in S are
stored in the head/tail data structure and the clauses in U are stored in the trail σ . We
replace BCP(C) by BCPw(X) to obtain the following version of DPLL, where X
initially contains the unit clauses from the input and later contains the unit clause
created by each decision literal during the execution of DPLL.
Algorithm 4.1.8 Non-recursive DPLL(C) takes as input a set C of clauses and
returns a model of C if C is satisfiable; it returns ⊥ if C is unsatisfiable. Procedure
initialize(C) will return the unit clauses of C and store the remaining clauses in the
head/tail data structure. Global variables val, lvl, σ , and level are used in all
procedures.
proc DPLL(C)
1 U := initialize(C) // initialize the head/tail data structure
2 σ := {}; level := 0 // initialize σ (partial model of C) and level
3 while (true) do
4 if (BCPw(U ) ≠ “SAT”) // a conflicting clause is found
5 if (level = 0) return ⊥ // C is unsatisfiable
6 level := level − 1 // level of the last decision literal
7 A := undo() // A is the last decision literal picked at line 10
8 U := {(¬A)} // the second branch of A
9 else
10 A := pickLiteral() // pick an unassigned A as new decision literal
11 if (A = nil) return σ // all literals assigned, a model is found
12 level := level + 1 // new level in the first branch of A
13 U := {(A)} // create a new unit clause from A
proc undo()
// Backtrack to level, undo all assignments to literals of higher levels.
// Return the last literal of level + 1 in the trail σ , or nil.
1 A := nil
2 while (σ ≠ {}) ∧ (lvl(top(σ )) > level) do
3 A := pop(σ )
4 val(A) := val(¬A) := × // × means “unassigned”
5 return A
Procedure undo() will undo all the assignments (made at line 5 of BCP) to the
literals of levels higher than level (including their complements), and return the last
literal of level + 1 in σ , which must be the last decision literal A picked at line 10
of DPLL.
Procedure pickLiteral() (line 10) is the place for implementing various branching
heuristics. The procedure will choose an unassigned literal and assign
a truth value. As we know from Theorem 4.1.5, this choice has no impact on the
completeness of the search algorithm. It has, however, a significant impact on the
performance of the solver, since this choice is instrumental in pruning the search
space. We will address this issue later.
Recall that in Algorithm 4.1.7, when a non-unit clause c becomes unit, where c
contains an unassigned literal a, we do reason(a) := c at line 23, i.e., c is the
reason for the unassigned literal a to be true. When a conflicting clause is found in
DPLL, all of its literals are false, and we pick the literal a, which is assigned last in c,
and do resolution on a between this clause and reason(a), hoping the resolvent will
be useful to cut the search space. Since resolution is sound, new clauses generated
this way are logical consequences of the input and it is safe to keep them. For
practical reasons, we must selectively do resolution and keep at most one clause
per conflicting clause.
Algorithm 4.2.2 conflictAnalysis(c) takes as input a conflicting clause c in the
current partial interpretation at the current level, generates a new clause by
resolution, and returns it as output. Procedure latestAssignedLiteral(c) will return
a literal a of c such that a is the last literal assigned a truth value in c. Procedure
countLevel(c, lvl) will return the number of literals in c which are assigned at level
lvl. Procedure resolve(c1 , c2 ) returns the resolvent of c1 and c2 .
proc conflictAnalysis(c)
// c is conflicting in the partial interpretation val()
1 α := c
2 while (|α| > 1) do // as α has more than one literal
3 a := latestAssignedLiteral(α) // a is assigned last in α
4 α := resolve(α, reason(a)) // resolution on a
5 if (countLevel(α, level) ≤ 1) return α
6 return α
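A compact Python rendering of this loop might look as follows; the bookkeeping structures (reason, trail, level_of) are illustrative stand-ins for the solver state described above, not the book's data structures.

def conflict_analysis(conflict, reason, trail, level_of, level):
    # conflict: conflicting clause (its literals are all false);
    # reason[a]: the clause that implied literal a; trail: true literals in
    # assignment order; level_of[v]: decision level of variable v.
    alpha = set(conflict)
    while len(alpha) > 1:
        # literal of alpha falsified last: its complement is latest on the trail
        a = next(l for l in reversed(trail) if -l in alpha)
        # resolve alpha with reason(a) on the variable of a
        alpha = (alpha - {-a}) | (set(reason[a]) - {a})
        # stop once at most one literal of alpha belongs to the current level
        if sum(1 for l in alpha if level_of[abs(l)] == level) <= 1:
            break
    return list(alpha)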
Example 4.2.3 Let the input clauses C = {c1 : (x2 | x3 ), c2 : (x1 | x4 ), c3 : (x2 |
x4 ), c4 : (x1 | x2 | x3 )}. If we pick x1 as the first decision literal in DPLL, it will
result in the following assignments:
assignment   reason
x1@1         (x1)
x4@1         c2: (x1 | x4)
x2@1         c3: (x2 | x4)
x3@1         c1: (x2 | x3)
The new DPLL procedure with CDCL (conflict-driven clause learning) can be
described as follows: We will replace BCPw(U ) by BCP(U ), a minor variant of
BCPw(U ) that returns the conflicting clause itself when one is found.
The major difference of the new DPLL algorithm lies in lines 7–8: In Algo-
rithm 4.1.8, a new unit clause is created from the last decision literal; in the new
version, a new clause is created from a conflicting clause and this new clause will
become unit after the undo operation.
Algorithm 4.2.5 DPLL(C) takes as input a set C of clauses and returns a model of
C if C is satisfiable; it returns ⊥ if C is unsatisfiable. Procedure initialize(C) will
return the unit clauses of C and store the remaining clauses in the head/tail data structure.
Global variables val, lvl, σ , and level are used as in Algorithm 4.1.8.
proc DPLL(C)
1 U := initialize(C) // initialize the head/tail data structure
2 σ := {}; level := 0 // initialize σ (partial model of C) and level
3 while (true) do
4 res := BCP(U ) // U is a set of unit clauses.
5 if (res ≠ “SAT”) // res contains a conflicting clause.
6 if (level = 0) return ⊥ // C is unsatisfiable.
7 U := insertNewClause(conflictAnalysis(res)) // new level is set.
8 undo() // undo to the new level
9 else
10 A := pickLiteral() // pick an unassigned A as new decision literal
11 if (A = nil) return σ // all literals assigned, a model is found
12 level := level + 1 // new level in the first branch of A
13 U := {A} // create a new unit clause from A
proc insertNewClause(α)
1 if (|α| = 1) // α is a unit clause
2 level := 0
3 A := literal(α); reason(A) := α // assume α = (A)
4 return {A}
5 else // insert α into the head/tail data structure for clauses
6 lits(α) := makeArray(α) // create an array of literals
7 secondHigh := 0 // look for the second highest level in α
8 for x := 0 to |α| − 1 do
9 if (lvl(lits(α)[x]) = level) head(α) := x; b := lits(α)[x] // lvl(b) = level
10 else if (lvl(lits(α)[x]) > secondHigh) secondHigh := lvl(lits(α)[x]); t := x
11 tail(α) := t
12 level := secondHigh
13 reason(b) := α
14 return {b} // the head literal of α
c1 : (x1 | x2 ) c2 : (x1 | x3 | x7 )
c3 : (x2 | x3 | x4 ) c4 : (x4 | x5 | x6 )
c5 : (x3 | x4 ) c6 : (x5 | x6 )
c7 : (x2 | x3 ) c8 : (x1 | x3 )
c9 : (x1 | x2 | x3 )
If the first three literals chosen by pickLiteral in DPLL are x7 , x6 , and x5 , then the
only implied literal is x4 @3 by c4 . The next decision literal is x1 , which implies the
following assignments: x2 @4 by c1 and x3 @4 by c3 . Clause c7 becomes conflicting
at level 4.
Calling conflictAnalysis(c7 ), we obtain (x2 | x4 ) by resolution between c7 and
c3 , which contains x2 as the only literal at level 4. We add (x2 | x4 ) into DPLL as
c10 .
The level of DPLL is set to 3 as lvl(x4 ) = 3. So, we backtrack to level 3,
and c10 = (x2 | x4 ) becomes unit. The next call to BCP will make the following
assignments at level 3: x2 @3 by c10 , x1 @3 by c1 , x3 @3 by c8 , and c9 becomes
conflicting.
Calling conflictAnalysis(c9 ), we obtain c11 = (x2 ) by two resolutions between
c8 , c9 , and c1 . The new clause is unit and we set level = 0. Backjumping to level 0
by undo, the next call to BCP will imply the following assignments: x2 @0 by c11 ,
x3 @0 by c7 , x4 @0 by c10 , and c5 becomes conflicting.
Calling conflictAnalysis(c5 ), we obtain c12 = (x2 ) by two resolutions between
c5 , c10 , and c7 . Clauses c11 and c12 will generate the empty clause. Thus, the input
clauses are unsatisfiable.
Clause learning with conflict analysis does not impair the completeness of the
search algorithm: Even if the learned clauses are removed at a later point during the
search, the trail guarantees that the solver never repeatedly enters a decision level
with the same partial assignment. We have shown the correctness of clause learning
by demonstrating that each conflicting clause is implied by the original formula.
The idea of CDCL was first used in the solver grasp of Marques-Silva and
Sakallah, and the solver relsat of Bayardo and Schrag. In their solvers, they
independently introduced a novel mechanism to analyze the conflicts encountered during the
search for a satisfying assignment. There are many ways to generate new clauses
from conflicting clauses and most of them are based on implication graphs. The
method of learning new clauses through resolution is the easiest one to present.
CDCL brings a revolutionary change to DPLL, so that DPLL is no longer a simple
backtracking search procedure: the level reset by insertNewClause may be less than
level − 1, and DPLL will then backjump to a node closer to the root of the decision
tree, thus avoiding unnecessary search space. If the new clause learned contains a
single literal, the new level is set to 0. DPLL with CDCL can learn from failures
and backjump to the level where the source of the failure originated. Several new
techniques have been proposed based on CDCL, including generating a resolution
proof when the input clauses are unsatisfiable, and randomly restarting the search.
c1 : (x1 | x2 ) assumed
c3 : (x2 | x3 | x4 ) assumed
c5 : (x3 | x4 ) assumed
c7 : (x2 | x3 ) assumed
c8 : (x1 | x3 ) assumed
c9 : (x1 | x2 | x3 ) assumed
c10 : (x2 | x4 ) resolvent of c7 , c3
c11 : (x1 | x2 ) resolvent of c8 , c9
c12 : (x2 ) resolvent of c11 , c1
c13 : (x2 | x3 ) resolvent of c5 , c10
c14 : (x2 ) resolvent of c13 , c7
c15 : () resolvent of c12 , c14
This example shows the general idea of obtaining a resolution proof when DPLL
finds a set of clauses unsatisfiable. During the search inside DPLL, conflicting
clauses are generated due to decision literals. The procedure conflictAnalysis
produces a new clause to show why the previous decisions were wrong and we go
up in the decision tree according to this new clause. DPLL returns ⊥ when it finally
finds a conflicting clause at level 0. In this case, a resolution proof is generated by
conflictAnalysis, and this proof uses either the input clauses or the clauses generated
previously by conflictAnalysis.
Given an unsatisfiable set C of clauses, we can use all the resolution steps
inside the procedure conflictAnalysis to construct a resolution proof. Obviously,
such a proof provides evidence for the unsatisfiability of C. The clauses used as the
premises of the proof are a subset of the clauses of C and are called the unsatisfiable
core of the proof. Note that a formula typically does not have a unique unsatisfiable
core. Any unsatisfiable subset of C is an unsatisfiable core. Resolution proofs and
unsatisfiable cores have applications in hardware verification.
An unsatisfiable core is minimal if removing any clause from the core makes the
remaining clauses in the core satisfiable. We may use this definition to check if every
clause in the core is necessary, and this is obviously a difficult job when the core is
huge. The core of the above example contains every input clause except c2 , c4 , and
c6 ; it is a minimal core.
Suppose we are looking for a model of C by DPLL(C) and the first decision literal
is A. Unfortunately, C ∧ A is a hard unsatisfiable instance and DPLL gets stuck there
without success. Randomly restarting DPLL may be your only option. By “restart,”
we mean that DPLL throws away all the previous decision literals (this can be easily
done by level := 0; undo() in DPLL). By “random,” we mean that when you restart
the search, the first decision literal will most likely be a literal different from A.
Intuitively, random restarts give a chance of avoiding bad luck and getting luckier
with guessing the right literal assignments that would lead to a quick solution.
Without CDCL, random restarts make DPLL incomplete. With CDCL, the input
and generated clauses keep DPLL from choosing the same sequence of decision
literals and from generating the same clause again. Since the number of possible
clauses is finite, DPLL cannot run forever. This is a theoretical view. In practice,
we cannot keep all generated clauses, as the procedure would run out of memory.
Managing generated clauses in a DPLL with CDCL is an important issue not
covered in this book.
The same intuition suggests that random restarts should be much more effective
when the problem instance is in fact satisfiable. Experimental results showed that
random restarts also help when the input clauses are unsatisfiable. If restarts are
frequent (a deviation from standard practice), CDCL can be as powerful as general
resolution, while DPLL without restarts is known to correspond to the weaker
tree-like resolution. The theoretical justification for the speedup is that a restart
allows the search to benefit from the knowledge gained about persistently
troublesome conflicting variables sooner than backjumping alone would allow
the partial assignment to be similarly reset. In effect, restarts may allow the
discovery of shorter proofs of unsatisfiability.
To implement the restart strategy, when DPLL has found a certain number of
failures (conflicting clauses) without success (a model), it is time for a restart.
Today’s SAT solvers often use the following restart policy: Let ri = r0 · γ^(i−1) be
the number of failures for the ith restart, where r0 is a given integer and 1 ≤ γ < 2.
If r0 = 300 and γ = 1.2, it means the first restart happens when DPLL has 300
failures, the second restart happens after another 360 failures, and so on. If γ = 1,
it means DPLL restarts after a fixed number of failures.
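As a trivial sketch, the geometric policy can be generated as follows.

def restart_schedule(r0=300, gamma=1.2):
    # yields r_i = r0 * gamma**(i-1), the failure budget before the i-th restart
    r = float(r0)
    while True:
        yield int(r)
        r *= gamma

# with r0 = 300 and gamma = 1.2 the first budgets are 300, 360, 432, ...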
Given the common use of restarts in today’s clause learning SAT solvers, the
task of choosing a good restart policy appears appealing. While experimental
results showed that no restart policy is better than others for a wide range of
problems, a clause learning SAT solver could benefit substantially from a carefully
designed restart policy. Specifically, experiments show that nontrivial restart policies
did significantly better than if restarts were disabled and exhibited considerably
different performance among themselves. This provides motivation for the design
of better restart policies, particularly dynamic ones based on problem types and
search statistics.
Whether CDCL is used or not, every implementation of DPLL needs a function for
selecting literals for the case split. We assume that when a literal is chosen for a
case split, the literal will be assigned true in the first branch of DPLL. Prior to the
development of the CDCL techniques, branching heuristics were the primary method
used to reduce the size of the search space. The role of branching heuristics is
therefore likely to change significantly for algorithms that prune the search space
using clause learning.
It is conventional wisdom that it is advantageous to assign first the most tightly
constrained variables, i.e., variables that occur in a large number of clauses. One
representative of such a selection strategy is known as the MOMS rule, which
branches on a literal which has the maximum occurrences in clauses of minimum
size. If the clause set contains binary clauses, then the MOMS rule will choose a literal
in a binary clause. By assigning true to this literal, it is likely that a maximal number
of binary clauses will become satisfied. On the other hand, if the primary criterion
for the selection of a branch literal is to pick one that would enable a cascade of unit
propagation (the result of such a cascade is a smaller subproblem), we would assign
false to the literal chosen by the MOMS rule in a binary clause; assigning this literal
false will likely create a maximal number of unit clauses from all the binary clauses.
MOMS provides a rough but easily computed approximation to the number of unit
clauses (i.e., implied literals) that a particular variable assignment might cause.
Alternatively, one can call BCP multiple times on a set of promising variables
in turn and compute the exact number of unit clauses that would be caused by a
branching choice. Each chosen variable is assigned true and then false in turn,
and BCP is executed for each choice. The precise number of unit propagations caused
is then used to evaluate possible branching choices. Unlike the MOMS heuristic, this
rule is exact in its attempt to judge the number of unit clauses caused by a
potential variable assignment. Unfortunately, it is also considerably more expensive
to compute because of the multiple executions of BCP and the undoing of them. It has
been shown that using MOMS to choose a small number of promising candidates,
each of which is then evaluated exactly using BCP, outperforms other heuristics
on randomly generated problems.
Another strategy is to branch on variables that are likely to be backbone literals.
A backbone literal is one that must be true in all models of the input clauses.
The likelihood that any particular literal is a backbone literal is approximated by
counting the appearances of that literal in the satisfied clauses during the execution
of DPLL. This heuristic outperforms those discussed in the previous paragraphs on
many examples.
The development of CDCL techniques enabled solvers to attack more structured,
realistic problems. There are no formal studies comparing the previously discussed
heuristics on structured problems when CDCL techniques are used. CDCL tech-
niques create many new clauses and make the occurrence counts of each literal hard
to maintain or less significant. Branching techniques and learning are deeply related,
and the addition of learning to a DPLL implementation will have a significant effect
on the effectiveness of any of these branching strategies. As new clauses are learned,
the number of unit clauses caused by an assignment can be expected to vary; the
reverse is also true in that the choice of decision literal can affect the generated
clauses.
Branching heuristics that are designed to function well in the context of clause
learning generally try to branch on literals among the new clauses which have
been learned recently. This tends to allow the execution of DPLL to keep “making
progress” on a single section of the search space as opposed to moving from one
area to another; an additional benefit is that existing learned clauses tend to remain
relevant, avoiding the inefficiencies associated with losing the information present
in learned clauses that become irrelevant and are discarded.
One of the popular branching heuristics for DPLL with clause learning is called the
“dynamic largest individual sum” (DLIS) heuristic, and it behaves similarly to the
MOMS rule. At each decision point, it chooses the assignment that satisfies the
most unsatisfied clauses. Formally, let px be the number of unresolved clauses
containing x and nx be the number of unresolved clauses containing ¬x. Moreover,
let x be the variable for which px is maximal, and let y be the variable for which ny
is maximal. If px > ny , choose x as the next decision literal; otherwise, choose ¬y.
The disadvantage of this strategy is that the computational overhead is high: The
algorithm needs to visit all clauses that contain a literal that has been set to true in
order to update the values px and nx for all variables contained in these clauses.
Moreover, the process needs to be reversed upon backtracking.
A heuristic commonly used in contemporary SAT solvers favors literals in
recently added conflict clauses. Each literal is associated with a count, which is
initialized with the number of times the literal occurs in the clause set. When
a learned clause is added, the count associated with each literal in the clause is
incremented. Periodically, all counters are divided by a constant greater than 1,
resulting in a decay causing a bias towards branching on variables that appear
in recently learned clauses. At each decision point, the solver then chooses the
unassigned literal with the highest count (where ties are broken randomly by
default). This approach, known as the variable state independent decaying sum
(VSIDS) heuristics, was first implemented in zchaff. Zchaff maintains a list of
unassigned literals sorted by count. This list is only updated when learned clauses
are added, resulting in a very low overhead. Decisions can be made in constant time.
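The bookkeeping can be sketched in a few lines of Python; the class and method names are made up, and the periodic division by a constant greater than 1 is implemented here as multiplication by a decay factor less than 1, which is equivalent.

from collections import defaultdict

class VSIDS:
    def __init__(self, decay=0.95):
        self.score = defaultdict(float)
        self.decay = decay

    def init_counts(self, clauses):
        for c in clauses:                 # initial occurrence counts
            for lit in c:
                self.score[lit] += 1.0

    def on_learned_clause(self, clause):
        for lit in clause:                # bump literals of the learned clause
            self.score[lit] += 1.0

    def periodic_decay(self):
        for lit in self.score:            # bias toward recent conflicts
            self.score[lit] *= self.decay

    def pick(self, unassigned_literals):
        # choose the unassigned literal with the highest score
        return max(unassigned_literals, key=lambda l: self.score[l])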
The heuristic used in berkmin builds on this idea but responds more dynamically
to recently learned clauses. The berkmin heuristic prefers to branch on literals that
are unassigned in the most recently learned clause that is not yet satisfied.
The emphasis on variables that are involved in recent conflicting clauses leads to
a locality-based search, effectively focusing on a subspace. The subspaces induced
by this decision strategy tend to coalesce, resulting in more opportunities for reso-
lution of conflicting clauses, since most of the variables are common. Representing
counts using integer variables leads to a large number of ties; the solver minisat
avoids this problem by using a floating-point number to represent the weight.
Another possible (but significantly more complex) strategy is to concentrate only
on unresolved conflicting clauses by maintaining a stack of conflicting clauses.
When restart is used, the branching heuristic should be a combination of random
selection and other heuristics, so that DPLL will select a different branch literal
after each restart. All told, there are many competing branching heuristics for
satisfiability solvers, and there is still much to be done in evaluating their relative
effectiveness with clause learning on realistic, structured problems.
There exist excellent implementations of DPLL and you can find these SAT solvers
at the international SAT competition web page (satcompetition.org). Most of
them are available free of charge, such as minisat, glucose, lingeling,
and maple_LCM, to name a few. These SAT solvers are also called general-purpose
model generators, because they accept CNF as input and many problems can be
specified in CNF as SAT is NP-complete.
If you have a hard search problem to solve, either solving a puzzle or finding a
solution under certain constraints, you may either write a special-purpose program
or describe your problem in CNF and use one of these general-purpose model
generators. While both special-purpose tools and general-purpose model generators
rely on exhaustive search, there are fundamental differences. For the latter, every
problem has a uniform internal representation (i.e., clauses for SAT solvers). This
uniform representation may introduce redundancies and inefficiencies. However,
since a single search engine is used for all the problems, any improvement to the
engine benefits every problem it solves.
Today’s SAT solvers use the format for CNF formulas suggested many years ago
by DIMACS (the Center for Discrete Mathematics and Theoretical Computer Science).
The DIMACS format uses a text file: If the first character of a line is “c,” the line is
a comment. The CNF starts with a line “p cnf n m,” where n is the number of
propositional variables and m is the number of clauses. Most SAT solvers only
require m to be an upper bound for the number of clauses. Each literal is represented
by an integer: Positive integers are positive literals and negative integers are negative
literals. A clause is a list of integers ending with 0.
Example 4.3.1 To represent C = {(x1 | x2 | ¬x3 ), (¬x1 | x2 ), (¬x2 | ¬x3 ), (x2 | x3 )} in
the DIMACS format, the text file will look as follows:
c a tiny example
c
p cnf 3 4
1 2 -3 0
-1 2 0
-2 -3 0
2 3 0
To use today’s SAT solvers, you need to prepare your SAT instances in the
DIMACS format. This is done typically by writing a small program called encoder.
Once a model is found by a SAT solver, you may need another program, called
decoder, which translates the model into desired format. For example, the encoder
for a Sudoku puzzle will generate a formula A in CNF from the puzzle and the
decoder will take the model of A and generate the Sudoku solution.
The standard Sudoku puzzle is to fill a 9 × 9 board with the digits 1 through 9, such
that each row, each column, and each of nine 3 × 3 blocks contain all the digits from
1 to 9. A typical example of Sudoku puzzle and its solution is shown in Fig. 4.2.
To solve this problem by using a SAT solver, we need to specify the puzzle
in CNF. First, we need to define the propositional variables. Let pi,j,k be the
propositional variable such that pi,j,k = 1 iff digit k is at row i and column j
of the Sudoku board. Since 1 ≤ i, j, k ≤ 9, there are 9³ = 729 variables.
We may encode pi,j,k as 81(i−1) + 9(j−1) + k; then p1,1,1 = 1 and p9,9,9 = 729.
Another encoding is pi,j,k = 100i + 10j + k; then p1,1,1 = 111 and p9,9,9 = 999.
The latter wastes some variables, but it allows us to see clearly the values of i, j, and
k from the integer. We will use the latter; since this is a very easy SAT problem, a
tiny waste does not hurt. The first three lines of the CNF file may read as
c Sudoku puzzle
c
p cnf 999 1000000
where 1000000 is an estimate for the number of clauses from the constraints on the
puzzle.
The general constraints of the puzzle can be stated as follows:
• Each cell contains exactly one digit of any value.
• Each row contains every digit exactly once, i.e., there are no duplicate copies of
a digit in a row.
• Each column contains every digit exactly once.
• Each 3 × 3 grid contains every digit exactly once.
A solution to a given board will say which digit should be placed in which cell.
The first constraint above can be divided into two parts: (1) Each cell contains
at least one digit, and (2) each cell contains at most one digit. If every cell in a row
contributes one digit and each digit appears at most once, then it implies that every
digit appears exactly once in this row. The same reasoning applies to columns and
grids, so the next three constraints can be simplified by dropping the constraints like
“every row contains every digit at least once.”
1. Each cell contains at least one digit.
2. Each cell contains at most one digit.
3. Each row contains every digit at most once.
4. Each column contains every digit at most once.
5. Each 3 × 3 grid contains every digit at most once.
Now it is easy to convert the above constraints into clauses:
1. For 1 ≤ i, j ≤ 9, (pi,j,1 | pi,j,2 | · · · | pi,j,9 ).
2. For 1 ≤ i, j ≤ 9, 1 ≤ k < c ≤ 9, (¬pi,j,k | ¬pi,j,c ).
3. For 1 ≤ i, k ≤ 9, 1 ≤ j < b ≤ 9, (¬pi,j,k | ¬pi,b,k ).
4. For 1 ≤ j, k ≤ 9, 1 ≤ i < a ≤ 9, (¬pi,j,k | ¬pa,j,k ).
5. For 0 ≤ a, b ≤ 2, 1 ≤ i < i′ ≤ 3, 1 ≤ j < j′ ≤ 3, 1 ≤ k ≤ 9,
(¬p3a+i,3b+j,k | ¬p3a+i′,3b+j′,k ).
Finally, we need to add the initial configuration of the board as clauses. If digit c
is placed at row a and column b, we create a unit clause (pa,b,c ) for it. Each Sudoku
puzzle should have at least 17 digits placed initially on the board.
You may write a small program to convert the above clauses in the DIMACS
format and use one of the SAT solvers available on the internet to solve the Sudoku
puzzle. A SAT solver takes no time to find the solution. If you would like to design
a Sudoku puzzle for your friend, you could use a SAT solver to check if your puzzle
has no solutions or has more than one solution.
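A possible encoder for the five constraints, written in Python, is sketched below; it follows the 100i + 10j + k numbering chosen above, and the file-writing helper is an illustrative addition, not the book's code.

def var(i, j, k):
    # the encoding p_{i,j,k} = 100*i + 10*j + k chosen above
    return 100 * i + 10 * j + k

def sudoku_clauses(givens):
    # givens: iterable of (row, column, digit) triples, all 1-based
    R = range(1, 10)
    cls = []
    # 1. each cell contains at least one digit
    cls += [[var(i, j, k) for k in R] for i in R for j in R]
    # 2. each cell contains at most one digit
    cls += [[-var(i, j, k), -var(i, j, c)]
            for i in R for j in R for k in R for c in R if k < c]
    # 3. each row contains every digit at most once
    cls += [[-var(i, j, k), -var(i, b, k)]
            for i in R for k in R for j in R for b in R if j < b]
    # 4. each column contains every digit at most once
    cls += [[-var(i, j, k), -var(a, j, k)]
            for j in R for k in R for i in R for a in R if i < a]
    # 5. each 3x3 grid contains every digit at most once (pairs sharing a
    # row or column within the grid are already covered by 3 and 4)
    cls += [[-var(3*a + i, 3*b + j, k), -var(3*a + i2, 3*b + j2, k)]
            for a in range(3) for b in range(3) for k in R
            for i in range(1, 4) for i2 in range(1, 4) if i < i2
            for j in range(1, 4) for j2 in range(1, 4) if j < j2]
    # initial configuration as unit clauses
    cls += [[var(i, j, k)] for (i, j, k) in givens]
    return cls

def write_dimacs(cls, path):
    with open(path, "w") as f:
        f.write("c Sudoku puzzle\nc\np cnf 999 %d\n" % len(cls))
        for c in cls:
            f.write(" ".join(map(str, c)) + " 0\n")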
(y ∗ x = z) ∧ (z ∗ y = w) → w ∗ y = x
Without loss of generality, we may assume that L is a sorted list by adding more
clauses, to reduce some symmetric solutions.
If n = 100, k = 5, (a1 ) produces 5 clauses of length 100; (a2 ) produces 24,750
(=5∗100∗99/2) clauses of length 2; (a3 ) produces 500 clauses of length 2; and (a4 )
produces 100 clauses of length 6, for a total of 25,355 clauses. This is a substantial
reduction from 3,921,225 at the cost of introducing 500 new variables.
Given a set C of clauses and a full interpretation σ , we may count how many clauses
are false under σ . Let f (C, σ ) denote the number of clauses of C falsified by σ ;
then 0 ≤ f (C, σ ) ≤ |C|, where |C| is the number of clauses in C. When f (C, σ ) =
0, σ is a model of C. The maximum satisfiability problem MAX-SAT is thus the
optimization problem where f serves as the objective function: Find σ such that
f (C, σ ) is minimal (or |C| − f (C, σ ) is maximal).
Let Max-kSAT denote the decision version of MaxSAT which takes a positive
integer m and a set C of clauses as input, where each clause has no more than
k literals, and returns true iff there exists an interpretation σ such that at least
m clauses are satisfied under σ . When m = |C|, then Max-kSAT becomes the
satisfiability problem kSAT. It is well-known that 2SAT can be solved in polynomial
time, while Max2SAT is NP-complete. Thus, MaxSAT is substantially more difficult
than SAT.
Theorem 4.4.1 There is a linear time decision procedure for 2SAT.
Proof This sketch of the proof comes from Aspvall, Plass, and Tarjan (1979). Let C be
in 2CNF without pure literals. We construct a directed graph G = (V , E) for C as
follows: V contains the two literals of each variable appearing in C. For each clause
(A | B) ∈ C, we add two edges, (¬A, B) and (¬B, A), into E. Obviously, the edges
represent the implication relation among the literals, because (A | B) ≡ (¬A → B) ∧
(¬B → A). Since the implication relation is transitive and (A → B) ≡ (¬B → ¬A),
the following three statements are equivalent for any pair of literals x and y in G:
1. C ⊨ (x → y).
2. There is a path from x to y in G.
3. There is a path from ¬y to ¬x in G.
We then run the algorithm for strongly connected components (SCC), which can
be done in linear time, on G. It is easy to see that C ⊨ (x ↔ y) for any two
literals x and y in the same component, because x and y share a cycle in G. If both
a variable, say p, and its complement ¬p appear in the same component, then C
is unsatisfiable, because C ⊨ (p ↔ ¬p). If no such cases exist, then C must be
satisfiable. A model can be obtained by assigning the same truth value to each node
in the same component, following the topological order of the components induced
by G: Always assign 0 to a component, unless one of the literals in the component
has already received 1.
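The proof translates directly into code. Below is a Python sketch using Kosaraju's SCC algorithm; literals are nonzero integers, the function names are made up, and the recursive DFS is fine for small instances only. The final assignment rule follows the proof: a variable is set to 1 exactly when its component comes after its complement's component in the topological order.

from collections import defaultdict

def two_sat(n, clauses):
    # variables are 1..n; a clause is a pair (a, b) of nonzero integers
    graph, rev = defaultdict(list), defaultdict(list)
    for (a, b) in clauses:
        # (a | b) contributes the implications -a -> b and -b -> a
        graph[-a].append(b); rev[b].append(-a)
        graph[-b].append(a); rev[a].append(-b)

    nodes = [l for v in range(1, n + 1) for l in (v, -v)]
    order, seen = [], set()

    def dfs1(u):                       # first pass: record post-order
        seen.add(u)
        for w in graph[u]:
            if w not in seen:
                dfs1(w)
        order.append(u)

    for u in nodes:
        if u not in seen:
            dfs1(u)

    comp = {}
    def dfs2(u, c):                    # second pass on the reverse graph
        comp[u] = c
        for w in rev[u]:
            if w not in comp:
                dfs2(w, c)

    c = 0
    for u in reversed(order):          # components appear in topological order
        if u not in comp:
            dfs2(u, c); c += 1

    model = {}
    for v in range(1, n + 1):
        if comp[v] == comp[-v]:
            return None                # p and its complement share a component
        model[v] = comp[v] > comp[-v]  # true iff v's component comes later
    return model

For instance, two_sat(2, [(1, 2), (-1, 2)]) returns the model {1: True, 2: True}.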
Example 4.4.2 Given C = {c1 : (p | q), c2 : (p | r), c3 : (p | r), c4 : (q |
r), c5 : (q | s), c6 : (r | s)}, its implication graph is shown in Fig. 4.3. There
are two SCCs: S1 = {p, q, r, s} and S2 = {p, q, r, s}. We have to assign 0 to every
node in S1 (and 1 to every node in S2 ) to obtain a model of C.
Theorem 4.4.3 Max2SAT is NP-complete.
Proof This proof comes from Garey, Johnson, and Stockmeyer [2]. NP is a class of
decision problems whose solutions can be checked in polynomial time. Max2SAT
is in NP because a solution candidate is an interpretation σ and we can check how
many clauses are true under σ in polynomial time.
To show Max2SAT is NP-hard, we reduce 3SAT to Max2SAT. That is, if there
exists an algorithm A to solve Max2SAT, we can construct algorithm B, which uses
A, to solve 3SAT, and the computing times of A and B differ by a polynomial of the
input size.
Given any instance C of 3SAT, we assume that every clause of C has exactly
three literals. For every clause ci = (l1 | l2 | l3 ) ∈ C, where l1 , l2 , and l3 are
arbitrary literals, we create ten clauses in Max2SAT:
(l1 ), (l2 ), (l3 ), (xi ), (¬l1 | ¬l2 ), (¬l1 | ¬l3 ), (¬l2 | ¬l3 ), (l1 | ¬xi ), (l2 | ¬xi ), (l3 | ¬xi ),
where xi is a new variable. Let C′ be the collection of these clauses created from
the clauses in C; then |C′| = 10|C| and the following statements are true:
• If an interpretation satisfies ci , then we can make seven of the ten clauses
satisfied.
• If an interpretation falsifies ci , then at most six of the ten clauses can be satisfied.
l1 l2 l3 xi | (¬l1|¬l2) (¬l1|¬l3) (¬l2|¬l3) | (l1|¬xi) (l2|¬xi) (l3|¬xi) | sum
0  0  0  0  |     1         1         1     |     1        1        1    |  6
0  0  0  1  |     1         1         1     |     0        0        0    |  4
0  0  1  0  |     1         1         1     |     1        1        1    |  7
0  0  1  1  |     1         1         1     |     0        0        1    |  6
0  1  1  0  |     1         1         0     |     1        1        1    |  7
0  1  1  1  |     1         1         0     |     0        1        1    |  7
1  1  1  0  |     0         0         0     |     1        1        1    |  6
1  1  1  1  |     0         0         0     |     1        1        1    |  7
The first four columns serve both as the truth values of {l1 , l2 , l3 , xi } and as the
values of the first four of the ten clauses. The first two lines of the table show that
when (l1 | l2 | l3 ) is false, the sum of the truth values of the ten clauses (the last
column) is at most 6. The rest of the table shows the sums of the truth values of the
ten clauses when one, two, or three literals of {l1 , l2 , l3 } are true (some cases are
omitted due to the symmetry of l1 , l2 , and l3 ). It is clear from the table that when
(l1 | l2 | l3 ) is true, we may choose the value of xi to obtain seven true clauses out
of the ten.
Then C is satisfiable iff C′ has an interpretation which satisfies K = 7|C| clauses
of C′. If we have an algorithm to solve Max2SAT on C′ with K = 7|C|, then the
output of the algorithm will tell whether C is satisfiable or not.
C = {(xa | xe ; 5), (xb | xc ; 5), (xc | xe ; 5), (xa ; 1), (xb ; 1), (xc ; 1), (xd ; 1), (xe ; 1)}.
number of xi ’s and thus gives us a maximal clique. It is natural to specify the second
set of clauses as hard clauses in the hybrid MaxSAT format.
Local search methods are widely used for solving optimization problems. In these
methods, we first define a search space, and for each point in the search space, we
define the neighbors of the point. Starting with any point (often randomly chosen),
we replace repeatedly the current point by one of its neighbors, usually a better one
under the objective function, until the point fits the search criteria, or we run out of
time.
For SAT or MAX-SAT, the search space of common local search methods is
the set of full interpretations. If C contains n variables, then the search space size is
O(2^n). In comparison, for exhaustive search procedures like DPLL, the search space
size is O(3^n), because all partial interpretations are considered. For convenience,
we assume that a full interpretation σ is represented by a set of n literals (or
equivalently, a full product), where each variable appears exactly once. A neighbor
of σ is obtained by flipping the truth value of one variable in σ . Thus, every σ has
n neighbors.
Selman et al. proposed a greedy local search procedure called GSAT for SAT. The
running time of GSAT is controlled by two parameters: MaxTries is the maximal
number of random starting points, and MaxFlips is the maximal number of flips
allowed for any starting point.
Algorithm 4.4.8 GSAT(C) takes as input a set C of clauses and returns a model of
C if it finds one; it returns “unknown” if it terminates without finding a model.
proc GSAT(C)
1 for i := 1 to MaxTries do
2 σ := a random full interpretation of C // a starting point
3 for j := 1 to MaxFlips do
4 if σ (C) = 1 return σ // a model is found.
5 pick A ∈ σ // pick a literal of σ for flipping.
6 σ ′ := σ − {A} ∪ {¬A} // flip A in σ
7 σ := σ ′
8 return “unknown”
Example 4.4.9 Let C = {c1 : (x1 | x3 | x4 ), c2 : (x1 | x4 ), c3 : (x2 | x4 ), c4 :
(x1 | x2 | x3 ), c5 : (x2 | x4 )}. If the starting interpretation is σ0 = {x1 , x2 , x3 , x4 },
f (C, σ0 ) = 2 as σ0 (c2 ) = σ0 (c3 ) = 0. If x1 is chosen for flipping, we obtain
σ1 = {x1 , x2 , x3 , x4 } and f (C, σ1 ) = 1 since σ1 (c3 ) = 0. If x2 is chosen next for
Selman et al. proposed a variation of GSAT, called walksat, which selects the
best variable to flip in a randomly chosen conflicting clause. Walksat’s strategy for
picking a literal to flip is as follows:
1. Randomly pick a conflicting clause, say α.
2. If there is a literal B ∈ α such that the flipping of B does not turn any currently
satisfied clauses to unsatisfied, return B as the picked literal.
3. With probability p, randomly pick B ∈ α and return B.
4. With probability 1 − p, pick B ∈ α such that the flipping of B turns the least
number of currently satisfied clauses to unsatisfied, return B.
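The four steps translate into Python roughly as follows; the representation (clauses as lists of nonzero integers, assignments as a dict from variables to booleans) and the helper names are assumptions of this sketch.

import random

def pick_walksat_literal(clauses, assign, p=0.5):
    def satisfied(c, a):
        return any(a[abs(l)] == (l > 0) for l in c)

    def break_count(lit):
        # currently satisfied clauses that flipping lit's variable falsifies
        flipped = dict(assign)
        flipped[abs(lit)] = not flipped[abs(lit)]
        return sum(1 for c in clauses
                   if satisfied(c, assign) and not satisfied(c, flipped))

    # 1. randomly pick a conflicting clause
    alpha = random.choice([c for c in clauses if not satisfied(c, assign)])
    counts = {lit: break_count(lit) for lit in alpha}
    # 2. prefer a flip that breaks no currently satisfied clause
    for lit in alpha:
        if counts[lit] == 0:
            return lit
    # 3. with probability p, pick a random literal of the clause
    if random.random() < p:
        return random.choice(alpha)
    # 4. otherwise pick the literal breaking the fewest satisfied clauses
    return min(alpha, key=lambda lit: counts[lit])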
Since the early 1990s, there has been active research on designing, understand-
ing, and improving local search methods for SAT. In the framework of GSAT,
various ideas are proposed, and some of them are given below.
• When picking a literal, take the literal that was flipped longest ago for breaking
ties.
• A weight is given to each clause, incrementing the weight of unsatisfied clauses
by one for each interpretation. One of the literals occurring in more clauses of
maximum weight is picked.
• A weight is given to each variable, which is increased in variables that are often
flipped. The variable with minimum weight is chosen.
Selman et al. showed through experiments that GSAT as well as walksat
substantially outperformed the best DPLL-based SAT solvers on some classes of
formulas, including randomly generated formulas and graph coloring problems.
GSAT returns either a model or “unknown”; it can never declare that “the
input clauses are unsatisfiable.” This is a typical feature of local search methods:
they are incomplete in the sense that they provide no guarantee of eventually
either reporting a satisfying assignment or declaring that the given formula is
unsatisfiable.
There have also been attempts at hybrid approaches that explore the combination
of ideas from DPLL methods and local search techniques. There has also been work
on formally analyzing local search methods, yielding some of the best O(2^n)-time
algorithms for SAT. For instance, the expected running time, ignoring polynomial
factors, of a simple local search method with restarts after every 3n “flips” has been
shown to be ((k + 1)/k)^n for kCNF instances of n variables, where each clause
contains at most k literals. When k = 3, the result yields a complexity of O((4/3)^n)
for 3SAT.
The algorithm GSAT can be used to solve the MaxSAT problem, which asks to
find an interpretation for a set of clauses such that a maximal number of clauses are
true. Since GSAT is incomplete, the answer found by GSAT is not guaranteed to be
optimal; this is a typical feature of local search methods, which find locally
optimal solutions. GSAT can also be used to solve weighted MaxSAT and hybrid
MaxSAT.
C1 , C2 , . . . , Cm ⇒ C′1 , C′2 , . . . , C′n
1. Zero-weight deletion: Clauses of weight 0 contribute no cost and can be discarded.
(α; 0) ⇒
2. Tautology elimination: Valid clauses are always true and can be discarded.
(p | ¬p | α; w) ⇒
4. Unit clash: Complementary unit clauses can be combined into one:
(A; w1 ), (¬A; w2 ) ⇒ (A; w1 − w2 ), (⊥; w2 )
where A is a literal and w1 ≥ w2 . For example, given (p; 2) and (¬p; 3), if p → 1,
we get a cost of 3 from (¬p; 3); if p → 0, we get a cost of 2 from (p; 2). The
same result can be obtained from (¬p; 1) and (⊥; 2); the latter is added into the
cost function.
5. Weighted resolution:
For any clause c, let P (c) and N (c) be the sets of positive and negative literals
in c, respectively. Let F = {(ci ; wi ) : 1 ≤ i ≤ m} over n variables, say
{x1 , x2 , . . . , xn }. F can be formulated as the following instance of ILP:

minimize Σ_{1≤i≤m} wi (1 − yi ) // minimize the weights of falsified clauses
subject to Σ_{x∈P(ci)} x + Σ_{x∈N(ci)} (1 − x) ≥ yi , 1 ≤ i ≤ m, // ci is true if yi = 1
yi ∈ {0, 1}, 1 ≤ i ≤ m, // every clause is falsified or satisfied
xj ∈ {0, 1}, 1 ≤ j ≤ n, // every variable is false or true

where yi is a new variable for each clause ci in F .
Intuitively, yi = 1 implies that Σ_{x∈P(ci)} x + Σ_{x∈N(ci)} (1 − x) ≥ 1, i.e., ci is true.
Thus, the more yi ’s with yi = 1, the smaller the value of Σ_{1≤i≤m} wi (1 − yi ).
The above ILP instance can be relaxed to an instance L of LP:

minimize Σ_{1≤i≤m} wi (1 − yi )
subject to Σ_{x∈P(ci)} x + Σ_{x∈N(ci)} (1 − x) ≥ yi , 1 ≤ i ≤ m,
0 ≤ yi ≤ 1, 1 ≤ i ≤ m, and
0 ≤ xj ≤ 1, 1 ≤ j ≤ n.

If L has a solution, then the optimal value of Σ_{1≤i≤m} wi (1 − yi ) is a lower bound
for the solutions of F . The LP can be used to obtain an approximate solution for F
when we round up/down the values of the xi from reals to integers.
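As a toy illustration, consider F = {(x1 | x2; 3), (¬x1; 2)}. Assuming SciPy is available, the LP relaxation can be solved as follows; the variable layout is made up for this example.

from scipy.optimize import linprog

# variables v = [x1, x2, y1, y2]; the objective sum w_i (1 - y_i) equals
# 5 - (3*y1 + 2*y2), so we minimize -3*y1 - 2*y2 and add 5 afterwards
c = [0, 0, -3, -2]
# x1 + x2 >= y1    ->  -x1 - x2 + y1 <= 0
# (1 - x1) >= y2   ->   x1 + y2 <= 1
A_ub = [[-1, -1, 1, 0],
        [1, 0, 0, 1]]
b_ub = [0, 1]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * 4)
print(5 + res.fun)   # lower bound on the weight of falsified clauses: 0.0

Since x1 = 0, x2 = 1 satisfies both clauses, the bound 0 is tight here.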
Other techniques for lower bounds look for inconsistencies that force some
soft clause to be falsified. Some people have suggested learning falsified soft clauses;
others use the concepts of clones, mini-buckets, or width-restricted BDDs in
relaxation. For instance, we may treat the soft clauses as if they were hard and then
run BCP to locate falsified clauses. This can help us find unsatisfiable subsets of
clauses quickly and use them to estimate the total weight of falsified clauses. These
techniques can be effective on small combinatorial problems. Once the number of
variables in F reaches 1,000 or more, these lower-bound techniques become weak
or too expensive. Strategies for turning these techniques on and off in HyMaxSATBB
remain research topics.
Besides algorithms based on branch-and-bound, some MaxSAT solvers
convert a MaxSAT instance into a sequence of SAT instances, where each instance
encodes a decision problem of the form, for different values of k, “is there an
interpretation that falsifies soft clauses of total weight at most k?”
For example, if we start with a small value of k, the SAT instance will be
unsatisfiable. By increasing k gradually, the first k for which the SAT instance becomes
satisfiable is the value of an optimal solution. Of course, this linear strategy on
k does not necessarily yield an effective MaxSAT solver.
There is much ongoing research in this direction. The focus of this
approach is mainly on two things:
1. Developing economical ways to encode the decision problem at each stage of the
sequence.
2. Exploiting the information obtained from the SAT solver at each stage in the next stage.
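A sketch of the simplest such loop, for unit weights and assuming the python-sat package (its solvers and cardinality encodings), is given below; every soft clause gets a fresh relaxation variable, and the k-th stage asks whether at most k relaxation variables can be true.

from pysat.solvers import Glucose3
from pysat.card import CardEnc
from pysat.formula import IDPool

def maxsat_linear(hard, soft, max_var):
    # max_var: the largest variable index used in hard/soft
    pool = IDPool(start_from=max_var + 1)
    relax = [pool.id(('r', i)) for i in range(len(soft))]
    relaxed = [c + [r] for c, r in zip(soft, relax)]  # r_i "pays" for falsifying c_i
    for k in range(len(soft) + 1):
        # "is there an interpretation falsifying at most k soft clauses?"
        atmost_k = CardEnc.atmost(lits=relax, bound=k, vpool=pool)
        with Glucose3(bootstrap_with=hard + relaxed + atmost_k.clauses) as s:
            if s.solve():
                return k, s.get_model()   # first satisfiable stage is optimal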
Researchers on MaxSAT can assess the state-of-the-art MaxSAT solvers and create
a collection of publicly available MaxSAT benchmark instances.
These MaxSAT solvers are built on the successful techniques for SAT, evolve
constantly in practical solver technology, and offer an alternative to traditional
approaches, e.g., integer programming. MaxSAT solvers use propositional logic
as the underlying declarative language and are especially suited for inherently
“very Boolean” optimization problems. The performance of MaxSAT solvers has
surpassed that of specialized algorithms on several problem domains:
• Correlation clustering
• Probabilistic inference
• Maximum quartet consistency
• Software package management
• Fault localization
• Reasoning over bionetwork
• Optimal covering arrays
• Treewidth computation
• Bayesian network structure learning
• Causal discovery
• Cutting planes for IP (integer programming)
• Argumentation dynamics
The advances of contemporary SAT and MaxSAT solvers have transformed
the way we think about NP-complete problems. They have shown that, while
these problems are still unmanageable in the worst case, many instances can be
successfully tackled.
Exercises
1. Draw the four complete decision trees (trees of recursive calls) of DPLL on the
following set of clauses:
using the orders of p, q, r, s for splitting (the value 1 is used first for each
splitting variable) and report the truth values of variables at various levels. How
many models are found by DPLL? What is the guiding path when each model
is found?
3. If we feed the clause set in the previous problem to a SAT solver which works
as a black box and can produce only one model at a time, what clause should
be added into the clause set so that the SAT solver will produce a new model?
4. Estimate a tight upper bound of the size of the inference graph for resolution, if
the input clauses contain n variables.
5. Provide the pseudo-code for implementing BCPw based on BCPht such that
the space for head(c) and tail(c) is freed by assuming head(c) = 0 and
tail(c) = |c| − 1 for any non-unit clause c, and no additional space can be
used for c. In your code, you may call swap(lits(c), i, j ), which swaps the two
elements at indices i and j in the array lits(c).
6. Given a set H of m + 1 Horn clauses over m variables:
Show that the size of H is O(m), and that the algorithm BCPw in the previous
problem will take Ω(m²) time on H , if all the clauses in cls(A) must be
processed before considering cls(B), where ¬A is assigned true before ¬B.
7. We apply DPLL (without CDCL) to the following set S of clauses: S = {1. (a |
c | d | f ), 2. (a | d | e), 3. (a | c), 4. (b | d | e), 5. (b | d | f ), 6. (b | f ),
7. (c | e), 8. (c | e), 9. (c | d) }, assuming that pickLiteral picks a literal in the
following order: a, b, c, d, e, f (false is tried first for each variable). Please (a)
draw the decision tree of DPLL, (b) provide the unit clauses derived by BCP
at each node of the decision tree, and (c) identify the head/tail literals of each
clause at the end of DPLL (assuming initially the first and last literals are the
head/tail literals).
8. We apply GSAT to the set S of clauses in the previous problem. Assume that
the initial node (i.e., interpretation) is σ0 = {a, b, c, d, e, f }, and the strategy
for picking a neighbor to replace the current node is (a) the best among all
the neighbors and (b) the first neighbor which is better than the current node
(flipping variables in the order of a, b, . . . , f ). Sideways moves are allowed if
no neighbors are better than the current node. If no neighbors are better than
the current node, a local optimum is found and you need to restart GSAT with
the node σ0′ = {¬a, ¬b, ¬c, ¬d, ¬e, ¬f }.
9. We apply DPLL with CDCL to the clause set C : {c1 : (x1 | x2 | x5 ), c2 :
(x1 | x3 | x4 | x5 ), c3 : (x1 | x2 | x3 | x4 ), c4 : (x3 | x4 | x5 ), c5 : (x2 |
x3 | x5 ), c6 : (x3 | x4 | x5 ), c7 : (x3 | x4 | x5 )}. If pickLiteral picks a literal
for case split in the following order x5 , x4 , . . . , x1 (xi = 0 is tried first), please
answer the following questions:
(a) For each conflicting clause found in DPLL(C), what new clause will be
generated by conflictAnalysis?
(b) What will be the head and tail literals if the new clause generated in (a)
is not unit? And what will be the value of level at the end of the call to
insertNewClause?
(c) Draw the DPLL decision tree until either a model is found or DPLL(C)
returns ⊥.
10. The pigeonhole problem is to place n + 1 pigeons into n holes such that every
pigeon must be in a hole and no hole can hold more than one pigeon. Please
write an encoder in any programming language that reads in an integer n and
generates the CNF formula for the pigeonhole problem with n + 1 pigeons
and n holes in the DIMACS format. Store the output of your encoder in a
file named pigeonX.cnf, where X is n, and feed pigeonX.cnf to a SAT
solver which accepts CNF in the DIMACS format. Turn in the encoder,
the result of the SAT solver for n = 4, 6, 8, 10, and the CNF file for n = 4
(i.e., pigeon4.cnf), plus a summary of the outputs of the SAT solver on the
sizes of the input, the computing times, and the numbers of conflicts. What is
the relation between the size of the input and the computing time?
11. The n-queen problem is to place n queens on an n × n board such that no two
queens are in the same row, the same column, or the same diagonal. Please write
a program called encoder in any programming language that reads in an integer
n and generates the CNF formula for the n-queen problem in the DIMACS
format. Store the output of your encoder in a file named queenX.cnf and feed
queenX.cnf to a SAT solver which accepts CNF in the DIMACS format. Write
another program called decoder which reads the output of the SAT solver and
displays a solution of n-queens on your computer screen. Turn in the encoder
and decoder, the result of the SAT solver for n = 5, 10, 15, the CNF file for
n = 5 (i.e., queen5.cnf), and the output of your decoder for n = 10.
12. You are asked to write two small programs, one called encoder and the other
called decoder. The encoder will read a Sudoku puzzle from a file (some
examples are provided online) and generate the CNF formula for this puzzle
in the DIMACS format. Store the output of your encoder in a file named
sudokuN.cnf and feed sudokuN.cnf to a SAT solver which accepts CNF in
the DIMACS format. The decoder will read the output of the SAT solver and
display a solution of the Sudoku puzzle on your computer screen. Turn in both
the encoder and decoder, the result of the SAT solver, and the result of your
decoder for the provided Sudoku puzzles.
13. A Latin square over N = {0, 1, . . . , n − 1} is an n × n square where each row
and each column of the square is a permutation of the numbers from N . Every
Latin square defines a binary function ∗ : N × N → N such that (x ∗ y) = z iff
the entry at the xth row and the yth column is z. You are asked to write a small
program called encoder. The encoder will read an integer n and generate the
CNF formula for the Latin square of size n satisfying the constraints x ∗ x = x
and (x ∗ y) ∗ (y ∗ x) = y (called Stein’s third law) in the DIMACS format. Store
the output of your encoder in a file named latinN.cnf and feed latinN.cnf
to a SAT solver which accepts CNF in the DIMACS format. Turn in the encoder,
and a summary of the results of the SAT solver for n = 4, 5, . . . , 9.
14. Given an undirected simple graph G = (V, E), an independent set of G
is a subset X ⊆ V such that for any two vertices x, y ∈ X, (x, y) ∉ E.

Chapter 5
First-Order Logic
Logic is a collection of closely related formal languages. There are certain languages
called first-order languages, and together they form first-order logic. Our study of
first-order logic will parallel the study of propositional logic: We will first define
For each symbol in P ∪ F , the number of arguments for that symbol is fixed and
is dictated by the arity function arity : P ∪ F → N . If arity(p) = 0 for p ∈ P , p
is the same as a propositional variable; if arity(f ) = 0 for f ∈ F , f is a constant.
The pair (P , F ) with arity is called the signature of the language. We denote the
language L by L = (P , F, X, Op).
In general, P or F could be empty; X and Op can never be empty. By convention,
we use {p/arity(p) | p ∈ P } and {f/arity(f ) | f ∈ F } to show P and F with
arity in a compact way. For instance, even/1 says arity(even) = 1. In theory, P or
F or X could be infinite, but in practice, they are always finite. We also assume
that no symbol appears more than once across these four sets. Rather than fixing a
single language once and for all, different signatures generate different first-order
languages and allow us to specify the symbols we wish to use for any given domain
of interest.
Example 5.1.2 To specify the properties of the natural numbers listed at the
beginning of this section, we use the signature (P , F ), where P = {even/1, odd/1,
</2}, F = {0/0, s/1}. The arity is given after the symbol / following each predicate
or function symbol.
Once a signature is given, we add some variables X and logical operators Op,
to obtain the alphabet for a first-order language. First-order logic allows us to build
complex expressions out of the alphabet. Starting with the variables and constants,
we can use the function symbols to build up compound expressions which are called
terms.
Definition 5.1.3 Given first-order language L = (P, F, X, Op), the set of terms
of L is the set of strings built up from F and X, which can be formally defined by
the following BNF grammar:

⟨term⟩ ::= x | f(⟨term⟩, . . . , ⟨term⟩)

where x ∈ X and f ∈ F takes arity(f) arguments (none when f is a constant).
The set of terms, Terms, is often denoted by T(F, X). A term is ground if it
does not contain any variable. The set of all ground terms is denoted by T (F ). Note
that T (F ) = ∅ if F contains no constants.
For F = {0, s} in the previous example, T (F ) = {0, s(0), s(s(0)), . . .} and for
X = {x, y}, T (F, X) = {0, s(0), s(x), s(y), s(s(0)), s(s(x)), s(s(y)), . . .}.
Intuitively, the terms denote objects in the intended domain of interest. In a term,
besides the symbols from F and X, the parentheses “(” and “)” and the comma “,”
are also used to identify its structure. Just as every propositional formula has a
formula tree (Definition 2.1.1), every term has a term tree in which parentheses
and commas are omitted. We may assign a position, which is a sequence of positive
integers, to each node of the term tree and use this position to identify the subterm
denoted by that node.
Definition 5.1.4 Given term t, a position p of t, with the subterm at p, denoted by
t/p, is recursively defined as follows:
• If t is a constant or a variable, then p = ε, the empty sequence, and t/p = t.
• If t = f(t1, . . . , tk), then either p = ε and t/p = t, or p = i.q, where 1 ≤ i ≤ k
and t/p = ti/q, assuming p.ε = ε.p = p for any p.
For example, if t = f(x, g(g(y))), then the legal positions of t are ε, 1, 2, 2.1,
and 2.1.1, and the corresponding subterms are t, x, g(g(y)), g(y), and y,
respectively.
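To make Definition 5.1.4 concrete, here is a minimal Python sketch (the nested-tuple representation and the function name are our own choices, not from the text) that encodes a term and extracts the subterm at a given position:

  def subterm(t, p):
      """Return t/p, the subterm of t at position p (a tuple of indices).

      A variable or constant is a plain string; a compound term
      f(t1, ..., tk) is the tuple (f, t1, ..., tk).
      """
      if p == ():                 # p = epsilon: t/p = t
          return t
      i, *q = p                   # p = i.q with 1 <= i <= k
      assert isinstance(t, tuple) and 1 <= i <= len(t) - 1
      return subterm(t[i], tuple(q))

  # t = f(x, g(g(y))); the legal positions are (), 1, 2, 2.1, and 2.1.1.
  t = ("f", "x", ("g", ("g", "y")))
  assert subterm(t, ()) == t
  assert subterm(t, (2, 1)) == ("g", "y")
  assert subterm(t, (2, 1, 1)) == "y"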
Now adding the usual logical operators from propositional logic plus the two
quantifiers, we have pretty much all the symbols needed for a first-order language.
Definition 5.1.5 Given first-order language L = (P, F, X, Op), the formulas of
L are the set of strings built up from terms, predicate symbols, and logical operators.
Let op range over the binary operators used in the current application; then the
formulas for this application can be defined by the following BNF grammar:

⟨formula⟩ ::= ⊤ | ⊥ | p(⟨term⟩, . . . , ⟨term⟩) | ¬⟨formula⟩ | (⟨formula⟩ op ⟨formula⟩) | ∀x ⟨formula⟩ | ∃x ⟨formula⟩

where p ∈ P and x ∈ X.
A formula (clause, literal, atom) is ground if it does not contain any variable.
Atoms are a shorthand for atomic formulas. We can have a formula tree for each
first-order formula.
Some parentheses are unnecessary if we use a precedence among the logical
operators. When displaying a formula, it is common to drop the outermost parentheses
or those following a predicate symbol of zero arity, e.g., we write p instead of p().
We may also define subformulas of a formula as we did for propositional formulas
and terms.
Example 5.1.6 Given signature (P , F ), where P = {child/2, love/2} and F =
{ai /0 | 1 ≤ i ≤ 100}, then T (F ) = F and T (F, X) = F ∪ X, because F does not
have any non-constant function symbols. If X = {x, y, z}, the following are some
examples of formulas:
• Ground atoms: child(a2 , a1 ), child(a3 , a1 ), love(a1 , a2 ), . . .
• Non-ground atoms: child(x, y), love(y, x), love(z, x), . . .
5.1.2 Quantifiers
What makes first-order logic powerful is that it allows us to make general assertions
using quantifiers: the universal quantifier ∀ and the existential quantifier ∃. In the
definition of Formulas, every occurrence of a quantifier is followed by a variable,
say x, and then followed by a formula, say A. That is, the quantifiers appear in a formula like
either ∀x A or ∃x A; the former says “A is true for every value of x,” and the latter
says “A is true for some value of x.”
In middle school algebra, when we say + is commutative, we use the formula
x + y = y + x, where x and y are meant to take all values. When solving an equation
like x² − 3x + 2 = 0, we try to find some value of x so that the equation holds. In
first-order logic, the two cases are expressed as ∀x∀y (x + y = y + x) and
∃x (x² − 3x + 2 = 0), respectively.
In natural language, sentences using words like “every,” “always,” or “never”
are usually translated into formulas of first-order logic with ∀:
• “Everything Ada did is perfect.”: ∀x (did(Ada, x) → perfect(x)), where
did(x, y) is true iff x did y. If we can find something that Ada did not do
perfectly, then the sentence is false.
• “Bob is always late for appointments!”: ∀x (appointment(x) → late(Bob, x)),
where late(x, y) means x is late for event y. This sentence is false if Bob is on
time once.
• “Cathy never likes the food cooked by Tom.”: ∀x (cook(Tom, x) →
¬like(Cathy, x)), where cook(x, y) is true iff x cooked y.
The above three sentences are likely examples of the hasty generalization fallacy,
sometimes called the over-generalization fallacy: making a claim based on
evidence that is just too small. In general, people assume that “every
rule has an exception.” However, in first-order logic, if an exception to a rule is
not stated explicitly in the rule, then the rule cannot be true and thus cannot be
used in an argument.
Example 5.1.7 From Example 5.1.6, if child(x, y) means “x is a child of y,”
parent(x, y) means “x is a parent of y,” descen(x, y) means “x is a descendant
of y,” and love(x, y) means “x loves y,” using these predicates, we can write many
assertions in first-order logic.
which can be read as “for everything x, either Tom doesn’t own x, x is not red, x is
not a toy, or Tom loves x.” Note that if → is replaced by ∧ in the above formula, we
get
which claims that “Tom owns everything,” “everything is red,” “everything is a toy,”
and “Tom loves everything.”
On the other hand, to express “Tom loves one of his red toy cars,” the formula is
If we replace the last ∧ by → in the above formula, then the resulting formula can
be converted to the following equivalent formula:
which claims that there exists something, denoted by “it,” and one (or more) of the
following is true: “Tom doesn’t own it,” “it is not a toy,” “it is not red,” “it is not a
car,” or “Tom loves it.” Since there are many things such that “Tom doesn’t own it,”
“it is not a toy,” “it is not red,” or “it is not a car,” the above formula does not
capture the meaning of “Tom loves one of his red toy cars.”
¬, ∧, ∨, →, {⊕, ↔}
Where to place ∀ and ∃? As a common practice, we will place them right after ¬.
Terms represent objects of interest, and the set of objects is commonly called the universe
(also called the universe of discourse or domain of discourse) of the language. For
example, we can use T (F ), the set of ground terms, to represent the universe.
If we use U to denote the sort of Terms, for any signature (P, F), predicate
symbol p ∈ P represents a function p : U^{arity(p)} → Bool and function symbol
f ∈ F represents a function f : U^{arity(f)} → U. The universal and existential
quantifiers range over the universe U . For example, the first-order language in
Example 5.1.6 talks about 100 people living in a certain town, with a relation
love(x, y) to express that x loves y. In such a language, we might express the
statement that “everyone loves someone” by writing ∀x∃y love(x, y), assuming U
is the set of 100 people.
For many applications, Bool and U are disjoint, like in Examples 5.1.2 and 5.1.6.
For some applications, Bool can be regarded as a subset of U . For example, to model
the if-then-else command in a programming language, we may use the symbol ite :
Bool × U × U → U . Another example is the function symbol says : U × Bool →
Bool in the Knight and Knave puzzles (Example 2.5.4).
It is not difficult to introduce sorts (or types) in first-order logic by using unary
(or monadic) predicate symbols (arity = 1). In Example 5.1.6, we use ai to stand
for person i. If we need function age(x) to tell the age of person x, we need the
function symbols for natural numbers. If we use those in Example 5.1.2, then F =
{0, s, ai | 1 ≤ i ≤ 100}. In this application, we do need to tell which are persons
and which are numbers by using the following predicate symbols:
• person : U → Bool: person(x) is true iff x is a person.
• nat : U → Bool: nat(x) is true iff x is a natural number.
person(x) can be easily defined by asserting person(a_i) for 1 ≤ i ≤ 100. nat(x)
can also be defined by asserting nat(0) and nat(s(x)) ≡ nat(x). Using nat, we can
avoid meaningless terms like s(a_i), since nat(s(a_i)) is false.
Now, to state general properties as in Examples 5.1.2 and 5.1.6, we have to
put restrictions on the variables. For example, to express “everybody is loved by
somebody,” the formula now is

∀x (person(x) → ∃y (person(y) ∧ love(y, x)))

and to express the transitivity of < over the natural numbers, we write

∀x, y, z (nat(x) ∧ nat(y) ∧ nat(z) ∧ (x < y) ∧ (y < z) → (x < z))
Dealing with such unary predicates is tedious. To get rid of such clumsiness,
people introduce “sorts.” Let
Person = {x | person(x) ∧ x ∈ U}
Nat = {x | nat(x) ∧ x ∈ U}
Then define love : Person² → Bool and < : Nat² → Bool, and the formulas
become neat again: We do not need person(x), nat(x), etc., in the formulas. For
the age function, we have age : Person → Nat. That is the birth of many-sorted
logic, which is a light extension of first-order logic.
The idea of many sorts is very useful in programming languages, where each
sort is a set of similar objects. Sets are usually defined by unary predicates. From
a logician’s viewpoint, a set S = {x | p(x)} and a predicate p(x) have the same
meaning. In many-sorted logic, one can have different sorts of objects—such as
persons and natural numbers—and a separate set of variables ranging over each.
Moreover, the specification of function symbols and predicate symbols indicates
what sorts of arguments they expect (the domain), and, in the case of function
symbols, what sort of value they return (the range). The concepts of arity and
signature are extended to contain the domain and range information of each function
and predicate symbol.
Many-sorted logic can reflect formally our intention not to handle the universe as
a collection of objects, but to partition the universe in a way that is similar to types
in programming languages. This partition can be carried out on the syntax level:
Substitution of variables can be done only accordingly, respecting the “sorts.” We
will return to this topic in Sect. 5.2.4 after introducing the semantics of first-order
logic.
5.2 Semantics
We should be aware that, at this stage, all the function and predicate symbols are just
symbols. In propositional logic, a symbol p can represent “Tom runs fast,” or “Mary
is tall.” The same holds true in first-order logic as we may give different meanings
to a symbol.
In Example 5.1.7, we have seen the following two formulas:
• ∀x∀y (child(x, y) → descen(x, y)).
• ∀x∀y (parent (x, y) → love(x, y)).
The first formula says “if x is a child of y, then x is a descendant of y”; the second
says “every parent loves his child.” What is the difference between them? They differ
only on predicate symbols. If we give different meanings to the predicate symbols,
one can replace the other. That is, we have designed the language with a certain
interpretation in mind, but one could also interpret the same formula differently.
In this section, we will spell out the concepts of interpretations, models, and
satisfiable or valid formulas, as an extension of these concepts of propositional logic.
5.2.1 Interpretation
Procedure 5.2.3 The procedure eval takes as input a formula or a term A, with
or without free variables, in L = (P, F, X, Op), an interpretation I for L, and an
assignment θ : free(A) → D^I, and returns a Boolean value for a formula and an
element of D^I for a term. If A = f(t1, t2, . . . , tk) and f ∈ P, then A is an atom.

proc eval(A, I, θ)
  if A = ⊤ return 1 // 1 means true.
  if A = ⊥ return 0 // 0 means false.
  if A ∈ X return θ(A) // A is a free variable.
  if A = f(t1, t2, . . . , tk) return f^I(v1, v2, . . . , vk), where vi = eval(ti, I, θ)
  if A = ¬B return ¬eval(B, I, θ)
  if A = (B op C) return eval(B, I, θ) op eval(C, I, θ), where op ∈ Op
  if A = (∀x B) return allInD(B, I, θ, x, D^I)
  if A = (∃x B) return someInD(B, I, θ, x, D^I)
  else return “unknown”
Procedure allInD(B, I, θ, x, S) takes formula B, interpretation I, assignment
θ, free variable x in B, and S ⊆ D^I, and evaluates the value of B under I and
θ ∪ {x ↦ d} for every value d ∈ S. If eval(B, I, θ ∪ {x ↦ d}) returns 0 for one
d ∈ S, it returns 0; otherwise, it returns 1.

proc allInD(B, I, θ, x, S)
  if S = ∅ return 1
  pick d ∈ S
  if (eval(B, I, θ ∪ {x ↦ d}) = 0) return 0
  return allInD(B, I, θ, x, S − {d})

Procedure someInD(B, I, θ, x, S) works in the same way as allInD, except
that if one of the evaluations, eval(B, I, θ ∪ {x ↦ d}), returns 1, it returns 1; otherwise,
it returns 0.
proc someInD(B, I, θ, x, S)
  if S = ∅ return 0
  pick d ∈ S
  if (eval(B, I, θ ∪ {x ↦ d}) = 1) return 1
  return someInD(B, I, θ, x, S − {d})
Evidently, when the domain D^I of I is infinite, neither allInD(B, I, θ, x, S)
nor someInD(B, I, θ, x, S) will stop, and eval cannot be an algorithm. In this
case, we view them as recursive mathematical definitions. While the values of these
procedures cannot be decided by an algorithm, we can reason about them using
mathematical induction. In the last section of this chapter, we will see that D^I can
be represented by a Herbrand base, which is the set of all ground atomic formulas.
Since every Herbrand base is countable, a well-founded order exists for D^I and we
may apply induction on D^I. We present eval as a procedure because we assume that
the reader is familiar with algorithms. When D^I is finite, the termination of eval is
easy to establish.
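As an illustration, the following Python sketch mirrors the cases of eval, allInD, and someInD over a finite interpretation; the nested-tuple representation of formulas and all names are our own choices, not fixed by the text:

  def eval_fo(A, I, theta):
      """I = (D, funcs, preds): D is the domain, funcs maps function
      symbols to Python functions, preds maps predicate symbols to
      Boolean-valued functions. theta maps free variables to D."""
      D, funcs, preds = I
      tag = A[0]
      if tag == "var":
          return theta[A[1]]
      if tag == "app":                      # term f(t1, ..., tk) or atom
          _, f, args = A
          vals = [eval_fo(t, I, theta) for t in args]
          table = preds if f in preds else funcs
          return table[f](*vals)
      if tag == "not":
          return not eval_fo(A[1], I, theta)
      if tag == "and":
          return eval_fo(A[1], I, theta) and eval_fo(A[2], I, theta)
      if tag == "or":
          return eval_fo(A[1], I, theta) or eval_fo(A[2], I, theta)
      if tag == "forall":                   # allInD: every d in D must work
          _, x, B = A
          return all(eval_fo(B, I, {**theta, x: d}) for d in D)
      if tag == "exists":                   # someInD: some d in D works
          _, x, B = A
          return any(eval_fo(B, I, {**theta, x: d}) for d in D)
      raise ValueError(f"unknown formula {A}")

  # D = {0, 1, 2} and p(x, y) meaning x < y: is forall x exists y p(x, y) true?
  I = ({0, 1, 2}, {}, {"p": lambda x, y: x < y})
  A = ("forall", "x", ("exists", "y", ("app", "p", [("var", "x"), ("var", "y")])))
  print(eval_fo(A, I, {}))   # False: x = 2 has no y with 2 < y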
For the universal quantifier, we write

I(∀x B(x), θ) = ⋀_{d∈D^I} I(B(x), θ ∪ {x ↦ d})

which captures the meaning that “I(∀x B(x), θ) returns 1 iff B(x) is evaluated to 1
under I and θ ∪ {x ↦ d} for every value d ∈ D^I.”
Similarly, we have a notation for I(∃x B(x), θ), or equivalently,
someInD(B(x), I, θ, x, D^I):

I(∃x B(x), θ) = ⋁_{d∈D^I} I(B(x), θ ∪ {x ↦ d})

which captures the meaning that “I(∃x B(x), θ) returns 1 iff B(x) is evaluated to
true under I and θ ∪ {x ↦ d} for some value d ∈ D^I.” These notations will help
us to understand quantified formulas and establish their properties. Note that if the
domain D^I is empty, then I(∀x B(x)) = 1 and I(∃x B(x)) = 0.
The following concepts are straightforward extensions from propositional logic to first-order logic.
Recall that a sentence is a closed formula.
Definition 5.2.7 Given sentence A and an interpretation I, I is said to be a model of A
if I(A) = 1. If A has a model, A is said to be satisfiable.
The satisfiability problem of first-order logic is to decide if a set of first-order
formulas is satisfiable or not. This problem is also called constraint satisfaction
where the constraint is a set of first-order formulas. The SAT problem and the linear
programming problem are special cases of constraint satisfaction.
For a propositional formula, the number of models is finite; for a first-order
formula, the number of models is infinite in general, because we have an infinite
number of choices for domains, relations, and functions in an interpretation.
However, we can still borrow the notation M(A) from propositional logic.
Definition 5.2.8 Given a closed formula A, let M(A) be the set of all models of A.
If M(A) = ∅, A is unsatisfiable; if M(A) contains every interpretation, i.e., every
interpretation is a model of A, then A is valid.
With the identical definitions from propositional logic, the following result is
expected in first-order logic.
Proposition 5.2.9 Every valid propositional formula is a valid formula in first-
order logic, and every unsatisfiable propositional formula is unsatisfiable in
first-order logic.
Of course, there are more valid formulas in first-order logic and our attention is
on formulas with quantifiers.
Example 5.2.10 Let us check some examples.
1. A1 is ∀x (p(a, x) → p(a, a)).
Consider I = ({a, b}, {p^I}, {a^I}), where p^I = {(a, a)} and a^I = a. Then I(A1) = 1
because p^I(a, a) = 1 and

I(A1) = ⋀_{d∈{a,b}} I((p(a, x) → p(a, a)), {x ↦ d})
      = I(p(a, a) → p(a, a)) ∧ I(p(a, b) → p(a, a))
      = 1 ∧ (p^I(a, b) → p^I(a, a))
      = 1
Let I = ({a, b}, {p^I}, {a}), where p^I = {(a, b)}. Then I(A1) = 0, because
p^I(a, a) = 0 and p^I(a, b) = 1 imply that I(p(a, b) → p(a, a)) = 0, so
I(A1) = I(p(a, a) → p(a, a)) ∧ I(p(a, b) → p(a, a)) = 1 ∧ 0 = 0. Thus, A1 is
satisfiable but not valid.
2. A2 is (∀x p(a, x)) → p(a, a).
A2 is different from A1 because the scopes of x are different in A1 and A2. For
any interpretation I, we consider the truth value of p^I(a^I, a^I). If p^I(a^I, a^I) = 1,
then I(A2) = 1. If p^I(a^I, a^I) = 0, then I(∀x p(a, x)) = 0 because x will
take on value a^I. So, I(A2) = 1 in any case. Since I is arbitrary, A2 must be
valid.
3. A3 is ∃x∃y (p(x) ∧ ¬p(y)).
Consider I = ({a, b}, {p^I}, ∅), where p^I(a) = 1 and p^I(b) = 0; then p^I(a) ∧
¬p^I(b) = 1. From I((p(x) ∧ ¬p(y)), {x ↦ a, y ↦ b}) = 1, we conclude that
I is a model of A3. A3 is false in any interpretation whose domain is empty or
contains a single element. Thus, A3 is not valid.
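For intuition, one can mechanically confirm claims like item 2 on small domains. The following Python sketch (our own, not part of the text) enumerates every interpretation of p and a over a two-element domain and checks that A2 holds in all of them; of course, this is evidence for one domain size, not a proof of validity:

  from itertools import product

  D = [0, 1]
  for a in D:                                   # every choice of a^I
      for bits in product([False, True], repeat=len(D) * len(D)):
          p = {}                                # every binary relation p^I
          for (x, y), b in zip(product(D, D), bits):
              p[(x, y)] = b
          # A2 = (forall x p(a, x)) -> p(a, a)
          A2 = (not all(p[(a, x)] for x in D)) or p[(a, a)]
          assert A2
  print("A2 holds in all", 2 * 2 ** 4, "interpretations over a 2-element domain")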
Definition 5.2.11 Let A(x) be a formula where x is a free variable of A. Then
A(t) denotes the formula A(x)[x ↦ t], where every occurrence of x is replaced
by term t. The formula A(t) is called an instance of A(x), obtained by substituting
t for x (ref. Definition 2.1.3).
We will use substitutions extensively in the next chapter. The substitution θ :
X → D used in the procedure eval coincides with the above definition if D = T(F)
(the set of ground terms built on F). That is, for any interpretation I = (D, R, G)
of L = (P, F, X, Op), if D = T(F), then I(A(x), {x ↦ d}) = I(A(d)), i.e.,
eval(A(x), I, {x ↦ d}) = eval(A(d), I, ∅), where x ∈ X, d ∈ T(F), and A(d) =
A(x)[x ↦ d].
We will keep the convention that a set S of formulas denotes the conjunction of
the formulas appearing in the set. If every formula is true in the same interpretation,
then the set S is said to be satisfiable, and the interpretation is its model.
Example 5.2.12 Let S = {∀x∃y p(x, y), ∀x ¬p(x, x), ∀x∀y∀z (p(x, y) ∧
p(y, z) → p(x, z))}. The second formula states that “p is irreflexive”; the third
formula says “p is transitive.”
Consider the interpretation I = (N, {<}, ∅); it is easy to check that I is a
model of S, because < is transitive and irreflexive, i.e., ¬(x < x), and for every
x ∈ N, there exists x + 1 ∈ N such that x < x + 1.
Does S have a finite model, i.e., a model in which the universe is finite? If S has
a finite model, say I = (D, {rp }, ∅), where |D| = n, consider the directed graph
G = (D, rp ). If there exists a path from a to b in G, there must exist an edge from a
to b, because rp is transitive. We cannot have an edge from a vertex to itself because
rp is irreflexive. Thus, if G has a cycle, then there exists a path from a vertex v in the
cycle to itself, which leads to an edge from v to itself by transitivity, a contradiction.
Without cycles, every path in G = (D, rp ) has an end since D is finite. For a vertex
x at the end of a path, we cannot find another vertex y satisfying ∀x∃y p(x, y).
Hence, S cannot have any finite model.
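The argument above can be complemented by a brute-force search: the following Python sketch (ours, not from the text) enumerates every binary relation over domains of size 1 to 4 and confirms that none satisfies the three formulas of S:

  from itertools import product

  def is_model(D, r):
      serial = all(any((x, y) in r for y in D) for x in D)   # forall x exists y p(x, y)
      irrefl = all((x, x) not in r for x in D)               # forall x not p(x, x)
      trans = all((x, z) in r
                  for x, y, z in product(D, D, D)
                  if (x, y) in r and (y, z) in r)            # transitivity
      return serial and irrefl and trans

  for n in range(1, 5):
      D = range(n)
      pairs = list(product(D, D))
      found = any(is_model(D, {pq for pq, b in zip(pairs, bits) if b})
                  for bits in product([0, 1], repeat=len(pairs)))
      print(f"|D| = {n}: model found? {found}")   # always False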
In Chap. 2, given a set P of propositional variables, we used All_P to denote all
the propositional interpretations involving P. Similarly, in first-order logic, given a
set P of predicate symbols and a set F of function symbols, let All_{P,F} denote all
the interpretations involving P and F. Then we have a copy of Theorem 2.2.11 for
first-order logic, which shows a close relationship between logic and set theory.
Theorem 5.2.13 For any closed formulas A and B of L = (P, F, X, Op):
The proof of the above theorem follows the same approach as that of Theo-
rem 2.2.11.
The following two definitions are copied from Definitions 2.2.13 and 2.2.17.
Definition 5.2.14 Given two formulas A and B, A and B are logically equivalent
if M(A) = M(B), denoted by A ≡ B.
Definition 5.2.15 Given two formulas A and B, we say A entails B, or B is a
logical consequence of A, denoted by A ⊨ B, if M(A) ⊆ M(B).
When A is ⊤, we simply write ⊨ B to denote that “B is valid.”
With the same definitions, we have the same results for first-order logic.
Theorem 5.2.16 For any two formulas A and B, (a) A ⊨ B iff A → B is valid;
and (b) A ≡ B iff A ↔ B is valid.
The substitution theorem from propositional logic still holds in first-order logic.
Theorem 5.2.17 For any formulas A, B, and C, where B is a subformula of A and
B ≡ C, we have A ≡ A[B ↦ C].
Proposition 5.2.18 For any formula A(x), where x is a free variable of A and
variable y does not appear in A(x), let A(y) denote A(x)[x ↦ y]; then ∀x A(x) ≡
∀y A(y) and ∃x A(x) ≡ ∃y A(y).
Proof For any interpretation I,

I(∀x A(x)) = ⋀_{d∈D} I(A(x), {x ↦ d}) = ⋀_{d∈D} I(A(d))
I(∀y A(y)) = ⋀_{d∈D} I(A(y), {y ↦ d}) = ⋀_{d∈D} I(A(d))
The variable x serves as a placeholder in A(x), just like y serves as the same
placeholder in A(y), to tell eval when to replace x or y by d. Any distinct name
can be used for this placeholder, thus I (∀x A(x), θ ) = I (∀y A(y), θ ). Since I is
arbitrary, it must be the case that ∀x A(x) ≡ ∀y A(y). The proof for ∃x A(x) ≡
∃y A(y) is identical.
The above proposition allows us to change the names of bound variables safely,
so that different variables are used for each occurrence of quantifiers. For example,
it is better to replace ∀x (p(x) ∨ ∃x q(x, y)) by ∀x (p(x) ∨ ∃z q(z, y)), to improve
the readability.
Proposition 5.2.19 Let A(x), B(x), and C(x, y) be formulas with free variables
x and y.
1. ¬∀x A(x) ≡ ∃x ¬A(x)
2. ¬∃x A(x) ≡ ∀x ¬A(x)
3. ∀x (A(x) ∧ B(x)) ≡ (∀x A(x)) ∧ (∀x B(x))
4. ∃x (A(x) ∨ B(x)) ≡ (∃x A(x)) ∨ (∃x B(x))
5. (∀x A(x)) ∨ B ≡ ∀x (A(x) ∨ B) if x is not free in B
6. (∃x A(x)) ∧ B ≡ ∃x (A(x) ∧ B) if x is not free in B
7. ∀x∀y C(x, y) ≡ ∀y∀x C(x, y)
8. ∃x∃y C(x, y) ≡ ∃y∃x C(x, y)
Proof We give the proofs of 1, 3, 5, and 7, and leave the rest as exercises.
1. Consider any interpretation I; using de Morgan’s law, ¬(A ∧ B) ≡ ¬A ∨ ¬B,

I(¬∀x A(x)) = ¬I(∀x A(x))
            = ¬⋀_{d∈D^I} I(A(x), {x ↦ d})
            = ⋁_{d∈D^I} ¬I(A(x), {x ↦ d})
            = ⋁_{d∈D^I} I(¬A(x), {x ↦ d})
            = I(∃x ¬A(x))
5. In the following proof, we assume that x does not appear in B. Consider any
interpretation I = (D, R, G); using the distribution law of ∨ over ∧,

I((∀x A(x)) ∨ B) = I(∀x A(x)) ∨ I(B)
                 = (⋀_{d∈D} I(A(x), {x ↦ d})) ∨ I(B)
                 = ⋀_{d∈D} (I(A(x), {x ↦ d}) ∨ I(B))
                 = ⋀_{d∈D} (I(A(x), {x ↦ d}) ∨ I(B, {x ↦ d}))
                 = ⋀_{d∈D} I((A(x) ∨ B), {x ↦ d})
                 = I(∀x (A(x) ∨ B))
To show that ∃x∀y p(x, y) entails ∀y∃x p(x, y), let I = (D, R, G) be any model of
∃x∀y p(x, y). There exists d ∈ D such that I(∀y p(x, y), {x ↦ d}) = 1. Then
∀y∃x p(x, y) must be true in I.
To show that ∀y∃x p(x, y) does not entail ∃x∀y p(x, y), we need to find an
interpretation in which ∀y∃x p(x, y) is true but ∃x∀y p(x, y) is false. The detail
is left as an exercise.
Example 5.2.22 We may use known equivalence relations to show ∃x (A(x) →
B(x)) ≡ (∀x A(x)) → (∃x B(x)).
∃x (A(x) → B(x))
≡ ∃x (¬A(x) ∨ B(x)) //A → B ≡ ¬A ∨ B
≡ (∃x ¬A(x)) ∨ (∃x B(x)) //Proposition 5.2.19(4)
≡ ¬(∀x A(x)) ∨ (∃x B(x)) //Proposition 5.2.19(1)
≡ (∀x A(x)) → (∃x B(x)) //A → B ≡ ¬A ∨ B
The more equivalences we establish, the more opportunities we have to apply this
method.
Example 5.2.23 The drinker paradox can be stated as “There is someone in the
pub such that, if he or she is drinking, then everyone in the pub is drinking.” It
was popularized by the logician Raymond Smullyan, who called it the “drinking
principle” in his 1978 book What Is the Name of This Book?
Let p(x) denote “x is drinking in the pub”; the drinker paradox can be
represented by the formula

∃x (p(x) → ∀y p(y))

which is equivalent to

A : (∃x ¬p(x)) ∨ (∀y p(y))
This formula is valid because for any interpretation I , ∀y p(y) is either true or false
in I . If it is true, then A is true in I ; if it is false, then there exists an element a of
I such that p(a) is false, so ∃x ¬p(x) is true in I . Note that if the domain of I is
empty, ∀y p(y) is trivially true in I .
Let p(x) denote “x is rich”; from A, we obtain the following valid statement
in first-order logic: “There is someone in the world such that, if he or she is rich,
then everyone in the world is rich.” It appears that if we have found this person, the
world would be rid of poverty, although in logic, any poor guy can be this person.
The apparently paradoxical nature of this statement comes from the fact that the “if
then” statements often represent causation in natural language. However, in first-
order logic, A → B is true if A is false and there is no causation between A and
B.
The statement of the drinker paradox may be wrongly specified by the formula
(∃x p(x)) → (∀y p(y)), which is equivalent to (∀x ¬p(x)) ∨ (∀y p(y)).

S1 = {x | x ∈ N ∧ x ≥ 5 ∧ x ≤ 10}

where P_int(x) is true iff x is an integer. Many-sorted logic assumes a universal set
U and every type is a subset of U . There may exist relations between types. For
example, the type nat of natural numbers is a subtype of int, whose meaning is
nat = {x | x ∈ int, x ≥ 0}
In other words, nat ⊆ int. Many-sorted logic deals with a hierarchy of types and
was one of the approaches invented to avoid a paradox in naive set theory (see
Sect. 1.3.1).
Some modern programming languages use a strong type system, where types are
used by compilers as “values” for type checking, an idea coming from higher-order
logic. We illustrate the idea of strong typing by an example.
In Chap. 1, we introduced the Backus–Naur form with the following definition
of natural numbers:

⟨Num⟩ ::= 0 | suc(⟨Num⟩)

where suc denotes the successor function. A programming language with a strong
type system may accept the above definition to define a new type Num. Moreover,
it can create dynamically many new types. For instance, if Num denotes the set of
natural numbers, then suc(Num) is the type of positive integers, suc(suc(Num))
is the type of natural numbers greater than 1, and so on. If a function needs a positive
integer x (as a divisor), we may declare the type of x as suc(Num), without
explicitly adding x > 0 in the code. A compiler will add the condition x > 0
automatically before the call of the function. This way of using types can produce
neater code and avoid human errors.
A strong type system is a common feature of functional programming languages
based on typed lambda calculus (to be discussed next). When we write our code in
a functional programming language like Haskell, we can use its strong type
system to catch logical errors in advance.
It seems that first-order logic is a natural and meaningful object of study. From
the current chapter to the end of this book, seven chapters out of eight are entirely
devoted to the study of first-order logic. Even for the only chapter not on it, first-
order logic is indispensable. That is, it occupies two-thirds of the space of this book,
because first-order logic has long been regarded as the “right” logic for studying
mathematics and computer science.
Higher-Order Logic
People may wonder about the meaning of “first-order” and ask what “second-order,”
“third-order,” etc., mean. The variables of first-order logic take individuals from a set called
the universe. If we allow variables to take sets of individuals, it is called second-
order logic. Third-order logic will allow variables to take sets of sets and so on.
The term higher-order logic often refers to any nth-order logic for n ≥ 1 and allows
quantification of variables over sets that are nested arbitrarily deep.
Higher-order logic is more expressive than first-order logic, but its model-theoretic
properties are less well-behaved than those of first-order logic. In Chap. 11,
we will show that some problems of first-order logic are not computable. More
problems become uncomputable in higher-order logic.
Another definition of “higher-order logic” is that variables can take arbitrary
functions over the universe, not just individuals of the universe. A well-known
example of this kind of formal logic systems is lambda calculus, introduced by
Alonzo Church in the 1930s. Lambda calculus uses function abstraction, variable
binding, and substitution for computation. It is the foundation of type theory and
functional programming. More information related to lambda calculus can be found
in Chaps. 9 and 11. A detailed introduction to lambda calculus is beyond the scope
of this book, even though it has great impact on many fields of computer science.
Sometimes people use higher-order logic to denote the union of second-order,
third-order, . . . , nth-order, . . . , logic as a contrast to first-order logic. Since
third-order or fourth-order logic is rarely needed, “higher-order” may simply mean
second-order logic, because the advantages and the disadvantages that come along
with moving beyond first-order logic already appear in second-order logic.
According to William Ewald [3], the history of first-order logic is anything but
straightforward, and is certainly not a matter of a sudden discovery by a single
researcher. First-order logic was studied by logicians like Gottlob Frege and
explicitly identified by Charles Peirce in 1885, but was then forgotten. It was
independently re-discovered by David Hilbert in 1917. That is, Peirce was the first
to identify it, but it is Hilbert who put it on the map, very much like “the Vikings
discovered America, but they forgot about it, because they did not yet need it.”
Hilbert isolated first-order logic by posing questions of completeness, consis-
tency, and decidability for systems of logic. Although these questions represented
an enormous conceptual leap in logic, Hilbert did not treat first-order logic as more
significant than higher-order logic. It is Gödel and others who recognized first-order
logic as being importantly different from higher-order logic in the early 1930s.
Even after the Gödel results were widely understood, logicians continued to work in
higher-order logic, and it took years before first-order logic attained a “privileged”
status.
The emergence of first-order logic is bound up with technical discoveries,
with differing conceptions of what constitutes logic, with different programs of
mathematical research, and with philosophical and conceptual reflection. There are
reasons why first-order logic is regarded as a “privileged” logical system—that is,
as the “right” logic for investigations in foundations of mathematics. In general,
first-order logic is more expressive than propositional logic and less complicated
than higher-order logic. It appears that our grasp on quantification over objects is
firmer than our grasp on quantification over properties, even when the universe is
finite. For instance, solving a Sudoku puzzle is a first-order quantification problem
(see Sect. 4.3.2). Questions about Sudoku’s initial configurations, such as “can it
admit one and only one solution?”, “what is its difficulty?”, “what is the limit on the
number of cells filled initially?”, etc., are problems of second-order quantification.
These are hard problems even for the experts of Sudoku puzzles.
Every universe of first-order logic is assumed to be countable. However, quan-
tification over all subsets of a countably infinite set entails quantification over an
uncountable set (Proposition 1.3.7). In Chap. 11, we will show that uncountable
sets are uncomputable (Proposition 11.4.8). In other words, quantification in higher-
order logic is uncomputable in general when the universe is infinite.
It seems that the “privileged” status of first-order logic also came from philo-
sophical considerations: the need to avoid the set-theoretical paradoxes, a search
for secure foundations for mathematics, and a sense that higher-order logic was
both methodologically suspect and avoidable. All these things show the continuing
influence of the “Crisis of Foundations,” Grundlagenkrise in German, of the 1920s,
which did so much to set the terms of the subsequent philosophical understanding
of the foundations of mathematics [3].
Type Theory

Type theory [4] was invented by Alonzo Church in 1940; his system is called simple type
theory, and there are many generalizations of it. Church’s type theory is
based on his lambda calculus and lambda-notation is the only binding mechanism
employed in simple type theory. Since properties and relations can be regarded as
functions from the universe to truth values, the concept of a function is taken as
primitive in Church’s type theory, and the lambda-notation is incorporated into the
formal language of type theory. Moreover, quantifiers and description operators are
introduced in a way so that additional binding mechanisms can be avoided, and
lambda-notation is reused instead. Nowadays, type theory usually means the study of
systems based on typed lambda calculi.
In mathematical logic, type theory is a synonym of higher-order logic, since they
allow quantification not only over the elements of the universe (as in first-order
logic) but also over functions, predicates, and even higher-order variables. Type
theory assigns types to entities, distinguishing, for example, between numbers, sets
of numbers, functions from numbers to sets of numbers, and sets of such functions.
These distinctions allow one to discuss the conceptually rich world of sets and
functions without encountering the paradoxes of naive set theory. Hence, type theory
and higher-order logic are equivalent, and both can be viewed as the union of a family
of finite-order logics.
Type theory is a formal logical language that includes first-order logic but is more
expressive in a practical sense. Type theory constitutes an excellent formal language
for representing the knowledge in automated information systems, sophisticated
automated reasoning systems, systems for verifying the correctness of mathematical
proofs, and a range of projects involving logic and artificial intelligence. It also
plays an important role in the study of the formal semantics of natural language.
Despite the high complexity of type-theory-related algorithms, there are successful
and influential implementations based on type theory, including ML, Haskell, Coq,
the HOL family, and Twelf.
5.3 Proof Methods

Proof methods of first-order logic have the same three tasks as in propositional logic:
to prove a formula A is (a) valid, (b) unsatisfiable, or (c) satisfiable. (a) and (b) are
equivalent because A is valid iff ¬A is unsatisfiable. (c) is different because both
A and ¬A can be satisfiable at the same time. As shown in Example 5.2.22, we
may use equivalence relations to transform A to or ⊥, thus showing A is valid or
unsatisfiable. However, this approach has only very limited chance of success.
Some proof methods, like those based on truth tables or ROBDDs, do not work
for first-order logic. However, many proof methods can be extended easily from
propositional logic to first-order logic, even though the latter is much more
expressive than the former. That means first-order logic has proof methods that are
simple to reason about theoretically or to implement correctly on a computer. For example,
the well-known modus ponens rule can be specified as follows:
Definition 5.3.1 The modus ponens rule of first-order logic is the following
inference rule:

∀x (A(x) → B(x))    A(t)
―――――――――――――――――――――――― (modus ponens)   t is any term
        B(t)
The α- and β-rules of semantic tableaux carry over from propositional logic:

α : α1, α2                       β : β1 | β2
A ∧ B : A, B                     ¬(A ∧ B) : ¬A | ¬B
¬(A ∨ B) : ¬A, ¬B                A ∨ B : A | B
¬(A → B) : A, ¬B                 A → B : ¬A | B
¬(A ⊕ B) : (A ∨ ¬B), (¬A ∨ B)    A ⊕ B : A, ¬B | ¬A, B
A ↔ B : (A ∨ ¬B), (¬A ∨ B)       ¬(A ↔ B) : A, ¬B | ¬A, B
¬(A ↑ B) : A, B                  A ↑ B : ¬A | ¬B
A ↓ B : ¬A, ¬B                   ¬(A ↓ B) : A | B
¬¬A : A
To handle the quantifiers, we introduce two rules: one for the universal quantifier,
called the ∀-rule, and one for the existential quantifier, called the ∃-rule.
Definition 5.3.2 The ∀-rule is the inference rule

∀x A(x)
――――――― (∀)   t is a ground term
A(t)

and the ∃-rule and the ¬∀-rule are the inference rules

∃x A(x)
――――――― (∃)   c is a new constant
A(c)

¬∀x A(x)
―――――――― (¬∀)   c is a new constant
¬A(c)
Example 5.3.4 To show (∃x∀y p(x, y)) → (∀y∃x p(x, y)) is valid by semantic
tableaux, we work on the negation of the formula.
Since the negation of the formula has a closed tableau, the formula is valid.
Example 5.3.5 Let A be B ∧ C ∧ D, where B = (∀x p(x, s(x))), C =
(∀x∀y (q(x) ∨ q(y))), and D = (¬∃x q(x)). If we choose to work on q first, we
will find a closed tableau for A; thus, A is unsatisfiable. In the following, we will
start with {B, C, D} by omitting the use of the α∧ rule.

ε : B, C, (¬∃x q(x))                                              ∀
1 : B, C, (¬∃x q(x)), ∀y (q(c) ∨ q(y))                            ∀
11 : B, C, (¬∃x q(x)), ∀y (q(c) ∨ q(y)), q(c) ∨ q(c)              ¬∃
111 : B, C, (¬∃x q(x)), ∀y (q(c) ∨ q(y)), q(c) ∨ q(c), ¬q(c)      β∨
1111 : B, C, (¬∃x q(x)), ∀y (q(c) ∨ q(y)), q(c), ¬q(c)            closed
1112 : B, C, (¬∃x q(x)), ∀y (q(c) ∨ q(y)), q(c), ¬q(c)            closed
op    Introduction                        Elimination
∀     (A(x) | α) / (∀x A(x) | α)          (∀x A(x) | α) / (A(t) | α)
∃     (A(t) | α) / (∃x A(x) | α)          (∃x A(x) | α) / (A(c) | α)

(each rule is written premise / conclusion), where x is a variable and does not appear
in α, t is a term, and c is a new constant. From these rules, we can see that free
variables have the same function as universally quantified variables. We use two
examples to illustrate these rules.
Example 5.3.7 Let A = {∀x p(x), ∀x q(x)}. A proof of A ⊨ ∀x (p(x) ∧ q(x)) in
natural deduction is given below.
Example 5.3.8 Show that (∃x (p(x) ∨ q(x))) ⊨ (∃x p(x)) ∨ (∃x q(x)) in natural
deduction.
By Theorem 5.2.16, (∃x (p(x) ∨ q(x))) → ((∃x p(x)) ∨ (∃x q(x))) is valid.
A formula A is in prenex normal form (PNF) if

A = Q1x1 Q2x2 · · · Qkxk B(x1, x2, . . . , xk),

where each Qi is either ∀ or ∃ and B is free of quantifiers.
The idea of prenex normal form is to move all the quantifiers to the top of the
formula. For example, ∀x∀y (p(x, y) ∨ q(x)) is a PNF but ∀x ((∀y p(x, y)) ∨ q(x))
is not.
To make the discussion easier, we assume that the formulas contain only ¬, ∧,
and ∨, plus the two quantifiers, as operators. A quantifier which is not at the top must
have another operator immediately above it, and this operator will be one of
¬, ∧, or ∨. Since the quantifier can be in either parameter position of ∧ or ∨, and
there are two quantifiers, there are ten cases to consider. If we consider ∧ and ∨
to be commutative, we need six rules, which are provided in Proposition 5.2.19.
For the example, we may choose first to rename the variables so that every quantified
variable has a different name:
which is in NNF; this step is not required for obtaining PNF. We have two options
now: Either choose equivalence rule 3 or rule 5. To use rule 3, we have
which is a PNF.
Note that the order of quantified variables in this formula is xyzw. There are five
other possible outcomes where the order is xzyw, xzwy, zxyw, zxwy, or zwxy,
respectively. If we are allowed to switch the quantifiers of the same type (the last
two in Proposition 5.2.19), we obtain more equivalent formulas where the order is
xywz, xwyz, xwzy, wxyz, wxzy, or wzxy. These formulas are all equivalent, since
only equivalence-preserving rules are used in obtaining a PNF.
Proposition 5.4.5 Every first-order formula can be transformed into an equivalent
PNF.
Proof By Proposition 5.2.18, renaming quantified variables preserves equivalence.
The equivalence rules for PNF are terminating (an exercise problem)
and preserve equivalence. Applying these rules repeatedly, we will obtain the
desired result.
In the following, we will use PrenexNF(A) to denote the procedure for
transforming formula A into a prenex normal form.
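As a rough illustration of what PrenexNF(A) does, here is a Python sketch (our own simplification, not the book's algorithm) that pulls quantifiers to the top of a formula already in NNF with distinct bound variable names, using the equivalences of Proposition 5.2.19:

  def prenex(A):
      tag = A[0]
      if tag in ("forall", "exists"):
          _, x, B = A
          return (tag, x, prenex(B))
      if tag in ("and", "or"):
          _, B, C = A
          B, C = prenex(B), prenex(C)
          # Pull quantifiers from the left argument, then from the right;
          # sound because all bound variable names are distinct.
          if B[0] in ("forall", "exists"):
              return (B[0], B[1], prenex((tag, B[2], C)))
          if C[0] in ("forall", "exists"):
              return (C[0], C[1], prenex((tag, B, C[2])))
          return (tag, B, C)
      return A                      # literal: atom or negated atom

  # forall x ((forall y p(x, y)) or q(x))  ~~>  forall x forall y (p(x, y) or q(x))
  A = ("forall", "x", ("or", ("forall", "y", ("app", "p", ["x", "y"])),
                             ("app", "q", ["x"])))
  print(prenex(A))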
5.4.2 Skolemization
Skolemization is the process in which we remove all the quantifiers, both universal
and existential, leaving a formula with only free variables. This process is made
easier by having the original formula in prenex normal form, and we assume this in
this section.
Removing the universal quantifiers while preserving meaning is easy since we
assume that the meaning of a free variable is the same as a universally quantified
variable: The formula will be true for any value of a free variable. Thus, we simply
discard all the top universal quantifiers from a PNF.
Doing the same for an existentially quantified variable is not possible because a
free variable can have only one default meaning. The key idea is to introduce a new
function to replace that variable. These functions are called Skolem functions. When
the functions are nullary (i.e., the arity is 0), they are called Skolem constants.
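To illustrate, here is a minimal Python sketch of Skolemization for a formula already in prenex normal form; the list-based representation of the quantifier prefix and the Skolem names sk0, sk1, . . . are our own conventions, not from the text:

  from itertools import count

  fresh = count()

  def skolemize(prefix, body):
      """prefix: e.g. [('forall', 'x'), ('exists', 'y')]; body: a nested
      tuple/string structure over variables. Returns the quantifier-free body."""
      universals, subst = [], {}
      for q, v in prefix:
          if q == "forall":
              universals.append(v)      # universal variables become free
          else:                         # exists: y |-> sk_k(x1, ..., xn)
              sk = f"sk{next(fresh)}"
              subst[v] = (sk, *universals)   # a Skolem constant if n = 0
      def apply(t):
          if isinstance(t, str):
              return subst.get(t, t)
          return (t[0], *map(apply, t[1:]))
      return apply(body)

  # forall x exists y p(x, y)  ~~>  p(x, sk0(x))
  print(skolemize([("forall", "x"), ("exists", "y")], ("p", "x", "y")))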
Example 5.4.6 Consider the formula ∀x∃y (y² ≤ x ∧ ¬((y + 1)² ≤ x)). This
asserts, for any value of x, the existence of a maximal number y whose square is no
more than x. We may drop the quantifier ∀ so that x becomes free and keep
the meaning of the formula. The same effect could be achieved by asserting the
existence of a function f, depending on x, satisfying f(x)² ≤ x ∧ ¬((f(x) + 1)² ≤ x).
The procedure Sko(A) terminates with a weakly equivalent formula. To see that the process terminates
is easy, since the process of converting a formula into PNF is terminating and
the number of quantifiers is reduced in the body of the while loop of Sko(A).
The loop continues while a quantifier remains at the top of the formula, so the
final formula will have no quantifiers. The input and output formulas are not
necessarily equivalent, but their meanings are closely related, and we will capture
this relationship with the notion “equally satisfiable” or “equisatisfiable.”
Recall that in propositional logic (Definition 3.3.7), we say two formulas, A and
B, are equisatisfiable, denoted by A ≈ B, if M(A) = ∅ whenever M(B) = ∅
and vice versa. This relation is weaker than the logical equivalence, which requires
M(A) = M(B). We will reuse ≈ with the same definition for first-order formulas.
It is easy to see that elimination of universal quantifiers preserves equivalence.
The difficulties arise with the introduction of Skolem functions/constants.
Lemma 5.4.9 Let A be ∀x∃y B(x, y). Then there exists a function f(x) such that
(a) ∀x B(x, f(x)) ⊨ A and (b) ∀x B(x, f(x)) ≈ A.
Proof
(a): For any model I of ∀x B(x, f(x)) and for every d ∈ D^I, it must be the case
that I(B(x, f(x)), {x ↦ d}) = 1. Let e = f^I(d) ∈ D^I, where f^I : D^I →
D^I is the interpretation of f in I; then I(B(x, y), {x ↦ d, y ↦ e}) = 1. So,
we have
Proof Since a free variable and a universally quantified variable have the same
interpretation, dropping a universal quantifier does not change the semantics of a
formula. When we remove an existential quantifier, we introduce a Skolem function.
By Corollary 5.4.10, the resulting formula is equisatisfiable with the original
formula. Applying Corollary 5.4.10 to every existential quantifier, we have the
desired result, because both ⊨ and ≈ are transitive:
• (a): Sko(A) ⊨ A by Corollary 5.4.10(a).
• (b): A ≈ Sko(A) by Corollary 5.4.10(b).
Relation ≈ is what we call “weakly equivalent,” and we will see that it is
sufficient for our purposes.
c1 : (¬human(x) | mortal(x))
c2 : (¬iowan(y) | human(y))
c3 : (iowan(c))
c4 : (¬mortal(c))
When we consider interpretations for a set of formulas, there are simply too many
ways to choose a set as the universe, too many ways to choose a relation for a
predicate symbol, and too many ways to choose a function for a function symbol.
We can improve on this considerably because there is a class of interpretations,
invented by French mathematician Jacques Herbrand (1908–1931), called Herbrand
interpretations, which can stand for all the others. The Herbrand interpretations
share the same universe, called the Herbrand universe. What does this universe
contain? For each n-ary function symbol f in the formula (including constants) and
n elements, say t1 , t2 , . . . , tn , in the universe, it must contain the result of applying
f to these n elements. This can be achieved by a simple syntactic device; we put
all the ground terms in the universe and let each term denote the result of applying
its function to its parameters. This way, function symbols are bound to a unique
interpretation over the universe.
Definition 5.4.16 Given a CNF formula A in a first-order language L =
(P , F, X, Op), the Herbrand universe of A consists of all the ground terms, i.e., the
set T (F ); the Herbrand base is the set of all ground atoms, denoted by B(P , F ). A
Herbrand interpretation is a mapping I : B(P , F ) → {1, 0}.
Note that T (F ) is the set of variable-free terms that we can make from the
constants and functions in the formula. If there are no constants in F , T (F ) would
be empty. To avoid this case, we assume that there must exist at least one constant
in F , say a. From now on it will be useful to distinguish the constants from non-
constant functions, so by “function” we mean non-nullary function. If F does not
contain function symbols, then T (F ) will be a finite set of constants.
The Herbrand base B(P, F) consists of all the atoms p(t1, t2, . . . , tn), where
p is an n-ary predicate symbol in P and ti ∈ T(F). If n = 0 for all predicates, then
B(P, F) = P, and we are reduced to propositional logic. Otherwise, B(P, F) is finite
iff T(F) is finite. For convenience, we may represent an interpretation I by a set H
of ground literals such that for any g ∈ B(P, F), g ∈ H iff I(g) = 1 and ¬g ∈ H
iff I(g) = 0.
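Since T(F) is infinite as soon as F contains a non-constant function symbol, one can only enumerate it level by level. The following Python sketch (ours, not from the text) generates the ground terms up to a given nesting depth and the corresponding fragment of the Herbrand base, with signatures given as name-to-arity maps:

  from itertools import product

  def herbrand_universe(F, depth):
      """All ground terms of nesting depth <= depth."""
      terms = {c for c, n in F.items() if n == 0}       # constants first
      for _ in range(depth):
          new = {(f, *args)
                 for f, n in F.items() if n > 0
                 for args in product(terms, repeat=n)}
          terms |= new
      return terms

  def herbrand_base(P, terms):
      return {(p, *args) for p, n in P.items() for args in product(terms, repeat=n)}

  F = {"a": 0, "f": 1}              # one constant, one unary function
  U2 = herbrand_universe(F, 2)      # {a, f(a), f(f(a))}
  print(sorted(map(str, U2)))
  print(len(herbrand_base({"p": 2}, U2)))   # 9 ground atoms over U2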
Example 5.4.17 In Example 5.2.12, we showed that the set S has no finite models:
S = {∀x∃y p(x, y), ∀x ¬p(x, x), ∀x∀y∀z (p(x, y) ∧ p(y, z) → p(x, z))}
Note that S has no function symbols. Converting S to CNF, we obtain the following
set S of clauses:
A Herbrand interpretation of S is
Exercises
1. Using a first-order language with variables ranging over people and predicates
trusts(x, y), politician(x), crazy(x), know(x, y), related(x, y), and
rich(x), write down first-order sentences asserting the following:
(a) Nobody trusts a politician.
(b) Anyone who trusts a politician is crazy.
(c) Everyone knows someone who is related to a politician.
(d) Everyone who is rich is either a politician or knows a politician.
2. Let the meaning of taken(x, y) be “student x has taken CS course y.” Please
write the meaning of the following formulas:
(a) ∃x∃y taken(x, y)
(b) ∃x∀y taken(x, y)
(c) ∀x∃y taken(x, y)
(d) ∃y∀x taken(x, y)
(e) ∀y∃x taken(x, y)
3. Let c(x) mean “x is a criminal” and s(x) mean “x is sane.” Use them to write
a first-order formula for each of the following sentences.
(a) All criminals are sane.
(b) Every criminal is sane.
(c) Only criminals are sane.
(d) Some criminals are sane.
(e) There is a criminal that is sane.
(f) No criminal is sane.
(g) Not all criminals are sane.
4. Given the predicates male(x), female(x), and parent(x, y) (x is a parent of
y), define the following predicates in terms of male, female, parent, or other
predicates defined from these three predicates, in first-order logic:
(a) father(x, y): x is the father of y.
(b) mother(x, y): x is the mother of y.
(c) ancestor(x, y): x is an ancestor of y.
(d) descendant(x, y): x is a descendant of y.
(e) son(x, y): x is a son of y.
(f) daughter(x, y): x is a daughter of y.
(g) sibling(x, y): x is a sibling of y.
5. Let +, −, ∗, %, >, <, ≥, ≤, = be the usual operations and relations on the natural
numbers N, where (x % y) returns the remainder of x divided by y for y > 0.
Please define the following sets using first-order formulas:
where adjoin(e, s) adjoins an element e into the set s. Please define the
following predicates in terms of ∅ and adjoin in first-order logic:
(a) isSet(x): x is an object of Set.
(b) member(x, s): x is a member of set s.
(c) subset(s, t): s is a subset of set t.
(d) equal(s, t): s and t contain the same elements.
8. Decide if the following formulas are valid, not valid but satisfiable, or unsatis-
fiable, using the definition of interpretations:
(a) p(a, a) ∨ ¬(∃x p(x, a))
(b) ¬p(a, a) ∨ (∃x p(x, a))
(c) ¬q(a) ∨ (∀x q(x))
(d) ¬(∃x q(x)) ∨ (∀x q(x))
(e) ∃x (q(x) ∨ ¬∀x q(x))
9. Prove the following equivalence relations:
(a) ¬∃x p(x) ≡ ∀x ¬p(x)
(b) ∃x (p(x) ∨ q(x)) ≡ (∃x p(x)) ∨ (∃x q(x))
(c) (∃x (p(x)) ∧ q) ≡ (∃x (p(x) ∧ q))
(d) ∃x∃y r(x, y) ≡ ∃y∃x r(x, y)
(e) ∃x (p(x) → q(x)) ≡ (∀x p(x)) → (∃x q(x))
10. Prove that the following entailment relations are true, but the converse of each
relation is not true.
(a) (∀x p(x)) ∨ (∀x q(x)) ⊨ ∀x (p(x) ∨ q(x))
(b) ∃x (p(x) ∧ q(x)) ⊨ (∃x p(x)) ∧ (∃x q(x))
(c) (∀x p(x)) → (∀x q(x)) ⊨ ∀x (p(x) → q(x))
(d) (∃x p(x)) → (∃x q(x)) ⊨ ∃x (p(x) → q(x))
(e) (∃x p(x)) → (∀x q(x)) ⊨ ∀x (p(x) → q(x))
(f) ∀x (p(x) → q(x)) ⊨ (∃x p(x)) → (∃x q(x))
(g) ∀x (p(x) → q(x)) ⊨ (∀x p(x)) → (∃x q(x))
(h) ∀x (p(x) ↔ q(x)) ⊨ (∀x p(x)) ↔ (∀x q(x))
(i) ∀x (p(x) ↔ q(x)) ⊨ (∃x p(x)) ↔ (∃x q(x))
11. Use the semantic tableau method to show that the entailment relations in the
previous problem are true.
12. Provide the pseudo-code for the algorithm PrenexNF(A) and show that it runs
in O(n) time when A is in NNF, where n is the size of A.
13. Show that the algorithm Sko(A) runs in O(n2 ) time, where A is in NNF, and n
is the size of A.
14. Show that the algorithm Skol(A) runs in O(n2 ) time, where A is in NNF, and
n is the size of A.
15. Show that formula ∀x ∃y ((p(x) ∧ p(y)) ∨ (q(x) ∧ q(y))) can be converted
into an equivalent prenex normal form such that ∃ goes before ∀ in the prenex
normal form.
16. Prove that if every predicate symbol is monadic (i.e., unary) and every function
symbol is monadic or a constant, then any first-order formula can be converted
into a prenex normal form where ∃ goes before any ∀.
17. Convert each one of the following formulas into a set of clauses:
(a) ¬((∀x p(x)) ∨ (∀x q(x)) → ∀x (p(x) ∨ q(x)))
(b) ¬((∃x (p(x) ∧ q(x))) → ((∃x p(x)) ∧ (∃x q(x))))
(c) ¬(((∀x p(x)) → (∀x q(x))) → (∀x (p(x) → q(x))))
(d) ¬(((∃x (p(x) → q(x))) → (∃x p(x))) → (∃x q(x)))
(e) ¬((∃x p(x)) → ((∀x q(x)) → (∀x (p(x) → q(x)))))
(f) ¬(((∀x (p(x) → q(x))) → (∃x p(x))) → (∃x q(x)))
(g) ¬((∀x (p(x) → q(x))) → ((∀x p(x)) → (∃x q(x))))
(h) ¬(((∀x (p(x) ↔ q(x))) → (∀x p(x))) ↔ (∀x q(x)))
(i) ¬(((∀x (p(x) ↔ q(x))) → (∃x p(x))) ↔ (∃x q(x)))
18. Given a set S of clauses, S = {(p(f (x), c)), (¬p(f (w), z) | p(f (f (w)),
f (z))), (¬p(y, f (y)))}, please answer the following questions: (a) What is
the Herbrand universe of S? (b) What is the Herbrand base of S? (c) What is
the minimum Herbrand model of S (no subset of this model is a model)? (d)
What is the maximum Herbrand model of S (no superset of this model is a
model)?
19. Let c be (p(x, y) | p(y, z) | q(z) | p(x, z)) and S = {(p(x, y) | p(y, z) |
r(x, z)), (r(x, z) | q(x, z) | p(x, z))}. Show that S ⊨ c and S ≈ c. In general,
show that a clause c of n literals can be converted into a set S of n − 2 clauses
of 3 literals each (3CNF) such that S ⊨ c and S ≈ c.
References
1. Melvin Fitting, First-Order Logic and Automated Theorem Proving, Springer Science &
Business Media, 1990. ISBN 978-1-4612-2360-3.
2. J. P. E. Hodgson, First Order Logic, Saint Joseph’s University, Philadelphia, 1995.
3. William Ewald, “The Emergence of First-Order Logic”, The Stanford Encyclopedia of
Philosophy (Spring 2019 Edition), Edward N. Zalta (ed.),
plato.stanford.edu/archives/spr2019/entries/logic-firstorder-emergence/,
retrieved Oct. 2023.
4. Christoph Benzmüller and Peter Andrews, “Church’s Type Theory”, The Stanford Encyclopedia
of Philosophy (Winter 2023 Edition), Edward N. Zalta and Uri Nodelman (eds.),
plato.stanford.edu/archives/win2023/entries/type-theory-church/,
retrieved Oct. 2023.
Chapter 6
Unification and Resolution
Resolution is known to be a refutational proof method, which can show that the
input formula is unsatisfiable. If we would like to show that formula A is valid, we
need to convert ¬A into clauses and show that the clauses are unsatisfiable. To show
that A ⊨ B, we need to convert A ∧ ¬B into CNF and show that the clauses are
unsatisfiable. Note that A and B must be closed formulas.
For example, if we want to express that p(x, y) is a commutative relation,
we may use the formula p(x, y) → p(y, x), where x and y are assumed to be
arbitrary values. The negation of this formula is ¬(∀x∀y (p(x, y) → p(y, x))),
not ¬(p(x, y) → p(y, x)). The CNF of the former is p(a, b) ∧ ¬p(b, a), where
a and b are Skolem constants; the CNF of the latter is p(x, y) ∧ ¬p(y, x), which
is equivalent to the clause set {(p(x, y)), (¬p(y, x))}. These two unit clauses will
generate the empty clause by resolution when x is substituted by y. The process of
finding a substitution to make two atomic formulas (i.e., atoms) identical is called
“unification,” which is needed in the definition of the resolution rule for first-order
formulas.
6.1 Unification

To apply the resolution rule

    (A | α)   (¬A | β)
    ------------------
          (α | β)

on two clauses, we need to find instances of the two clauses such that one instance contains A and the other contains ¬A, where A is an atom. A clause has infinitely many instances, and the unification process helps us find the right instances for resolution.
[x ↦ g(w), y ↦ w, z ↦ w]

γ = [x ↦ tθ | (x ↦ t) ∈ σ]

Note that the composition does not necessarily preserve the idempotent property, as both σ and θ are idempotent but z ↦ f(z) appears in σθ.
Proposition 6.1.7 For any substitutions σ and θ, and any term t, (tσ)θ = t(σθ).

Proof For any x ∈ X, if x is affected by σ, say σ(x) = s, then (xσ)θ = sθ = x(σθ). If x is unaffected by σ, then (xσ)θ = xθ = x(σθ).
The above proposition states the property that applying two substitutions to a
term in succession produces the same result as applying the composition of two
substitutions to the term.
The composition operation is not commutative in general. For example, if σ = [x ↦ a] and θ = [x ↦ b], then σθ = σ and θσ = θ. Thus, σθ ≠ θσ. By definition, if x is affected by both σ and θ, which are idempotent, then the pair x ↦ θ(x) is ignored in σθ and the pair x ↦ σ(x) is ignored in θσ.
Proposition 6.1.8 The composition is associative, i.e., for any substitutions σ , θ ,
and γ , (σ θ )γ = σ (θ γ ).
The proof is left as an exercise.
Definition 6.1.11 The rule-based unification algorithm tries to transform {s ≐ t}, [ ] into ∅, σ, where σ is a mgu of s and t, by the following unification rules:

• Decompose: S ∪ {f(s1, s2, ..., sm) ≐ f(t1, t2, ..., tm)}, σ → S ∪ {si ≐ ti | 1 ≤ i ≤ m}, σ.
• Clash: S ∪ {f(s1, s2, ..., sm) ≐ g(t1, t2, ..., tn)}, σ → ∅, fail, if f ≠ g.
• Occur-check: S ∪ {x ≐ t}, σ → ∅, fail, if x ≠ t and x occurs in t.
• Redundant: S ∪ {t ≐ t}, σ → S, σ.
• Orient: S ∪ {t ≐ x}, σ → S ∪ {x ≐ t}, σ, if t is not a variable.
• Substitute: S ∪ {x ≐ t}, σ → S[x ↦ t], σ · [x ↦ t], if x does not occur in t.
Algorithm 6.1.12 The algorithm unify(s, t) takes two terms s and t and returns either fail or σ, a mgu of s and t.

proc unify(s, t)
1  S := {s ≐ t}; σ := [ ]
2  while (true) do
3    S, σ := apply one unification rule on S, σ
4    if (σ = fail) return fail
5    if (S = ∅) return σ
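For readers who want to experiment, the following Python sketch implements the rule-based algorithm; it is our own illustration, not code from the book or from any prover. A variable is a string, a term f(t1, ..., tn) is a nested tuple ("f", t1, ..., tn), and a substitution is a dict; the six unification rules appear as the branches of the loop.

def is_var(t):
    return isinstance(t, str)

def occurs(x, t):
    if is_var(t):
        return x == t
    return any(occurs(x, arg) for arg in t[1:])

def apply_sub(sub, t):
    # apply a substitution, following chains of variable bindings
    if is_var(t):
        return apply_sub(sub, sub[t]) if t in sub else t
    return (t[0],) + tuple(apply_sub(sub, arg) for arg in t[1:])

def unify(s, t):
    """Return a mgu of s and t as a dict, or None (i.e., fail)."""
    S = [(s, t)]                             # the equations to solve
    sigma = {}                               # the substitution built so far
    while S:
        l, r = S.pop()
        l, r = apply_sub(sigma, l), apply_sub(sigma, r)
        if l == r:                           # Redundant
            continue
        if not is_var(l) and is_var(r):      # Orient
            l, r = r, l
        if is_var(l):
            if occurs(l, r):                 # Occur-check
                return None
            sigma[l] = r                     # Substitute
        elif l[0] != r[0] or len(l) != len(r):
            return None                      # Clash
        else:                                # Decompose
            S.extend(zip(l[1:], r[1:]))
    return {x: apply_sub(sigma, u) for x, u in sigma.items()}

# Example 6.1.13: unify p(f(x, h(x), y), g(y)) with p(f(g(z), w, z), x)
s = ("p", ("f", "x", ("h", "x"), "y"), ("g", "y"))
t = ("p", ("f", ("g", "z"), "w", "z"), "x")
print(unify(s, t))   # x -> g(z), y -> z, w -> h(g(z))

Like the substitute rule itself, this eager use of substitutions can blow up term sizes in the worst case; the almost-linear algorithm presented later avoids that cost.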
Example 6.1.13 Given s = p(f(x, h(x), y), g(y)) and t = p(f(g(z), w, z), x), unify(s, t) will transform {s ≐ t}, [ ] into ∅, σ as follows:

{s ≐ t}, [ ]
decompose → {f(x, h(x), y) ≐ f(g(z), w, z), g(y) ≐ x}, [ ]
orient → {f(x, h(x), y) ≐ f(g(z), w, z), x ≐ g(y)}, [ ]
substitute → {f(g(y), h(g(y)), y) ≐ f(g(z), w, z)}, [x ↦ g(y)]
decompose → {g(y) ≐ g(z), h(g(y)) ≐ w, y ≐ z}, [x ↦ g(y)]
substitute → {g(z) ≐ g(z), h(g(z)) ≐ w}, [x ↦ g(z), y ↦ z]
redundant → {h(g(z)) ≐ w}, [x ↦ g(z), y ↦ z]
orient → {w ≐ h(g(z))}, [x ↦ g(z), y ↦ z]
substitute → ∅, [x ↦ g(z), y ↦ z, w ↦ h(g(z))]
The algorithm unify(s, t) does not specify the order of the rules, so we may apply them in any order when they are applicable. However, none of these rules can be applied forever. Hence, unify(s, t) will terminate eventually.

Lemma 6.1.14 Given any pair s and t, unify(s, t) will terminate with either fail or σ.

Proof First, we point out that the substitute rule may increase the sizes of terms in S, but it moves the variable x from S into σ; that is, x disappears from S, so this rule can be applied no more times than the number of variables.
For the rules used between two applications of the substitute rule: if the clash rule or the occur-check rule applies, the process ends with failure immediately. Otherwise, the decompose rule removes two occurrences of f, the orient rule switches t ≐ x to x ≐ t, where t is not a variable, and the redundant rule removes t ≐ t. None of these rules can be used forever. In the end, either fail is returned, or S becomes empty and σ is returned.
Lemma 6.1.15 If S′, σ′ is the result of applying the substitute rule on S, σ, and θ′ is a mgu of S′, then θ = [x ↦ t] · θ′ is a mgu of S.

Proof By the definition of the substitute rule, S′ = (S − {x ≐ t})[x ↦ t], where x does not appear in t. By the assumption, S′ is unifiable with a mgu θ′. Since x does not appear in S′, θ = [x ↦ t] · θ′ is a mgu of S′ ∪ {x ≐ t}. It is easy to check that θ is also a mgu of S, because Sθ = S([x ↦ t] · θ′) = (S[x ↦ t])θ′ = S′θ′ ∪ {tθ′ ≐ tθ′}.
Theorem 6.1.16 The algorithm unify(s, t) returns a mgu of s and t iff s and t are unifiable.

Proof Lemma 6.1.14 ensures that unify(s, t) terminates. Inside unify(s, t), when S, σ is transformed to S′, σ′, for each rule we can check that S is unifiable (σ does not affect the unifiability of S) iff σ′ ≠ fail and S′ is unifiable. That is, when σ′ is fail, S cannot be unifiable. When S′ = ∅, S′ is trivially unifiable, so S must be unifiable.

Before the substitute rule moves x ≐ t from S, x is not affected by σ, tσ = t, and t does not contain x. Let σ′ = σ · [x ↦ t] = σ[x ↦ t] ∪ [x ↦ t]. Since S[x ↦ t] and σ[x ↦ t] remove all the occurrences of x in S and σ, x has a unique occurrence in σ′, and the same is true for all affected variables in σ′. Thus, σ and σ′ are idempotent by Proposition 6.1.2.

Now assume s and t are unifiable. Let S0 = {s ≐ t} and σ0 = [ ], and let σi = σi−1 · [xi ↦ ti] record every application of the substitute rule for 1 ≤ i ≤ m. Let the corresponding S be S1, S2, ..., Sm, respectively, where Sm = ∅. Since Sm = ∅, its mgu is θm = [ ]. So, by Lemma 6.1.15, θi−1 = [xi ↦ ti] · θi is a mgu of Si−1 for i = m, ..., 1, and θ0 = σm is a mgu of S0 = {s ≐ t}.
the involved elements. The Union operation merges two sets into one, and the Find operation returns a representative member of a set, allowing us to find out efficiently whether two elements are in the same set or not.
In the following, we introduce an almost-linear time unification algorithm
slightly different from that of Ruzicka and Privara [1].
Given two terms s and t, the term graph for s and t is the directed acyclic graph
G = (V , E) obtained by combining the formula trees of s and t with the following
operation: All the nodes labeled with the same variable in s and t are merged into a
single node of G.
The graph G has the following features:
• Each node v of G is labeled with a symbol in s or t, denoted by label(v).
• The roots of the formula trees are the two nodes in G with zero in-degree,
representing s and t, respectively. The other non-variable nodes’ in-degrees are
one. The in-degree of a variable node is the number of occurrences of that
variable in s and t.
• The leaf nodes of the formula trees are the nodes with zero out-degree in G,
which are labeled with variables or constants.
• If a node is labeled with a function symbol f whose arity is k, then this node has
k ordered successors. We will use child(v, i) to refer to the ith successor of v.
The term associated with each node n of G can be recursively defined as
follows:
1. If label(n) is a variable or constant, term(n) = label(n).
2. If label(n) is a function f other than a constant, term(n) = f (s1 , s2 , . . . , sk ),
where si = term(child(n, i)) for 1 ≤ i ≤ arity(f ).
We say term(n) is the term represented by n in G.
The unification algorithm will work on G = (V , E) and is divided into two
stages: In stage 1, unify1 checks if s and t have no clash failure; in stage 2,
postOccurCheck returns true iff they have no occur-check failure.
In stage 1, algorithm unify1 takes the two nodes representing s and t and visits G
in tandem by depth-first search. During the search, we will possibly assign another
node u to node v as the representative of v, denoted by up(v) = u. Initially, up(v) =
v for every node of G. If up(v) = u and v = u, it means the terms represented by
v and u, respectively, are likely unifiable (no clash failure) and we use u as the
representative of v in the rest of the search. At the end of the search, unify1 returns
true if no clash failure was found. In this case, the relation R defined by up, i.e.,
R = {(u, up(u)) | u ∈ V }, can generate an equivalence relation E (i.e., E is the
reflexive, symmetric, and transitive closure of R). The relation E partitions V into
equivalence classes, and the nodes in the same equivalence class represent the terms
which must be unifiable if s and t are unifiable.
Example 6.1.18 Given t = f (f (f (a, x1 ), x2 ), x3 ) and s = f (x3 , f (x2 , f (x1 , a))),
their term graph is shown in Fig. 6.1. The up relation is initialized as up(v) = v.
The algorithm unify1 (t, s) starts the depth-first search in tandem and goes to the
first child of (t, s), i.e., (v1 , x3 ). Since x3 is a variable node, unify1 (v1 , x3 ) will do
up(x3 ) := v1 and backtrack. The second child of (t, s) is (x3 , v2 ). Since up(x3 ) =
v1 , we use v1 for x3 and call unify1 (v1 , v2 ) recursively. Since v1 and v2 have the
same label, the search goes to their children. The first child of (v1 , v2 ) is (v3 , x2 ).
Since x2 is a variable node, we do up(x2 ) := v3 and backtrack. The second child
of (v1 , v2 ) is (x2 , v4 ). Use v3 for x2 and call unify1 (v3 , v4 ) recursively. Since v3
and v4 have the same label, the search goes to their children. The first child of
(v3 , v4 ) is (v5 , x1 ). We do up(x1 ) := v5 and backtrack. The second child of (v3 , v4 )
is (x1 , v6 ). Use v5 for x1 and call unify1 (v5 , v6 ) recursively. Since (v5 , v6 ) have
the same constant label, the search backtracks after doing up(v6 ) := v5 . Now, all
children of (v3 , v4 ) are done. After up(v4 ) := v3 , the search backtracks. Similarly,
all children of (v1 , v2 ) are done, and we backtrack after doing up(v2 ) := v1 . Finally,
all children of (t, s) are done and unify1 (t, s) finishes with up(s) := t.
The content of up(v) after the termination of unify1 (t, s) is shown below:
node n s t v1 v2 v3 v4 v5 v6 x1 x2 x3
up(n) t t v1 v1 v3 v3 v5 v5 v5 v3 v1
The next step is the occur-check: for instance, does x3 occur in the term represented by v1 or v2? The procedure postOccurCheck in the second stage will confirm that the answer is "no."
It is clear from the above example that each node of V is in a singleton set
initially. When the terms represented by the two nodes are unifiable, the sets
containing the two nodes are merged into one through the assignment up(v) := u,
i.e., we obtain the "union" of the two sets containing u and v, respectively. To know which node represents a given node, we need the "find" operation. Thus, we may employ the well-known union/find data structure to support the implementation
of unify1 .
12   unmark(c1); unmark(c2)
13   Union(s1, s2)
14   return true
proc Union(s1, s2)
// Initially, for any node s, h(s) = 0, the height of a singleton tree.
1  if h(s1) < h(s2) up(s1) := s2; return
2  up(s2) := s1 // the parent of s2 is s1
3  if h(s1) = h(s2) h(s1) := h(s1) + 1

proc Find(v)
1  r := up(v) // look for the root of the tree containing v
2  while (r ≠ up(r)) r := up(r) // r is the root iff r = up(r)
3  if (r ≠ v) // compress the path from v to r
4    while (v ≠ r) w := up(v); up(v) := r; v := w
5  return r
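For concreteness, here is a small Python rendering of this union/find structure with union by height and path compression; the class and variable names are our own, and nodes are just integers.

class UnionFind:
    def __init__(self, n):
        self.up = list(range(n))   # up[v] = v: every node starts as a root
        self.h = [0] * n           # h[v]: height of the tree rooted at v

    def find(self, v):
        r = v
        while r != self.up[r]:     # r is the root iff r == up[r]
            r = self.up[r]
        while v != r:              # compress the path from v to r
            self.up[v], v = r, self.up[v]
        return r

    def union(self, s1, s2):
        r1, r2 = self.find(s1), self.find(s2)
        if r1 == r2:
            return
        if self.h[r1] < self.h[r2]:
            self.up[r1] = r2       # shorter tree hangs under the taller root
        else:
            self.up[r2] = r1
            if self.h[r1] == self.h[r2]:
                self.h[r1] += 1

uf = UnionFind(5)
uf.union(0, 1); uf.union(2, 3); uf.union(1, 3)
print(uf.find(0) == uf.find(4))   # False: node 4 is still a singleton
print(uf.find(0) == uf.find(2))   # True: 0, 1, 2, 3 are now in one set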
As said earlier, in the union/find data structure, each set of nodes is represented by a tree (defined by up(v) = u, where the parent of v is u), and the root of the tree is the name of the set. The union operation implements "union by height," making the root of the taller tree the parent of the root of the shorter tree. Find(v) returns the name of the set, i.e., the root of the tree containing v; this is done by the first while loop. For the efficiency of future calls to Find, the path from v to the root is compressed after the root is found; this is completed by the second while loop.
There are two parent–child relations used in the above algorithm. The procedure
unify1 uses the depth-first search based on the parent–child relation defined by the
term graph, not the parent–child relation defined by up(v) = u. The latter induces
an equivalence relation and is used by the union/find algorithm.
Example 6.1.20 Let s = f(x, x) and t = f(y, a). A graph G = (V, E) will be created with V = {vs, vt, vx, vy, va} and E = {(vs, vx)1, (vs, vx)2, (vt, vy)1, (vt, va)2}, where the subscripts give the order of children. Algorithm unify1(vs, vt) will call unify1(vx, vy), which sets up(vx) := vy, and unify1(vx, va), which sets up(vy) := va. After the two recursive calls, unify1(vs, vt) will set up(vs) := vt and return true. V is partitioned into two sets: {vs, vt} and {vx, vy, va}.
the algorithm uses only linear space. The algorithm runs in almost-linear time because of the following observations:

• Let G = (V, E) be the term graph of s and t; n = |V| is bounded by the total size of s and t.
• The number of Find operations performed in unify1 is twice the number of recursive calls of unify1.
• Each Union operation reduces the number of node sets by one; thus, the number of union operations is bounded by n.
• Once two subterms are shown to be unifiable, the two nodes representing them are put into the same set; thus, we never try to unify the same pair of subterms again, and the number of recursive calls of unify1 is bounded by n.
• The total cost of the Find and Union operations performed in unify1 is O(nα(n)), where α(n) is the inverse of the Ackermann function and grows very slowly. In fact, α(n) is practically a constant function, e.g., α(10⁹) ≤ 4.
• The special unions performed at lines 4 and 5 of unify1 do not affect the almost-linear complexity, because the total cost of such unions between a variable node and a non-variable node is bounded by the number of variable nodes.
• The procedure postOccurCheck checks whether G′ = (V, E′), where E′ = E ∪ {(v, up(v)) | v ∈ V, v ≠ up(v)}, contains a cycle; it can be implemented in O(n) time by depth-first search.
Theorem 6.1.21 The algorithm lunify(s, t) takes O(nα(n)) time, where n is the total size of s and t, and α(n) is the inverse of the Ackermann function; lunify(s, t) returns true iff s and t are unifiable.

The algorithm lunify tells us whether two terms are unifiable or not. However, this is different from knowing what their unifiers actually are. We leave the following problem as an exercise: find the mgu of s and t after lunify(s, t) returns true.
Since lunify does not change the term graph of s and t, we may wonder how to obtain the term tσ from G if σ is the mgu of s and t. The term tσ, which is equal to sσ, can be obtained as term′(vs) (or term′(vt)), where term′ is defined recursively as follows: for any node v of G,

1. If up(v) ≠ v, term′(v) = term′(up(v)).
2. If label(v) is a constant or a variable and up(v) = v, term′(v) = label(v).
3. If label(v) is a function f other than a constant, term′(v) = f(s1, s2, ..., sk), where si = term′(child(v, i)) for 1 ≤ i ≤ arity(f).
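As an illustration, term′ is a three-line recursion over the graph; the Node class below is our own guess at a representation (the text does not fix one), and the recursion assumes postOccurCheck has succeeded, so following the up pointers cannot loop.

class Node:
    def __init__(self, label, children=()):
        self.label, self.children = label, tuple(children)
        self.up = self        # initially, every node is its own representative

def term_prime(v):
    if v.up is not v:                       # case 1: follow the representative
        return term_prime(v.up)
    if not v.children:                      # case 2: constant or variable node
        return v.label
    # case 3: rebuild f(s1, ..., sk) from the ordered children
    return (v.label,) + tuple(term_prime(c) for c in v.children)

# Example 6.1.20 after lunify: up(vx) = vy, up(vy) = va, up(vs) = vt
va, vx, vy = Node("a"), Node("x"), Node("y")
vt, vs = Node("f", [vy, va]), Node("f", [vx, vx])
vx.up, vy.up, vs.up = vy, va, vt
print(term_prime(vs))   # ('f', 'a', 'a'), i.e., sσ = tσ = f(a, a)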
It is easy to check that term′(v) = term(v) before the execution of lunify, as at that time up(v) = v for every v. We may view {(v, up(v)) | up(v) ≠ v} as a new set of edges added to G = (V, E) during the execution of lunify. This set of edges defines not only the equivalence relation represented by the union/find data structure, but also the mgu σ when s and t are unifiable.
The contrived examples in Table 6.1 illustrate the behavior of lunify. The size of a term is the total number of symbols in the term, excluding parentheses and commas. The column "Visited" gives the number of nodes visited before a result is returned.
Table 6.1 The last column gives the number of nodes visited by unify1 in the term graph for each example

      Pair of terms                                       Size     Failure   Visited
P1    f(f(... f(f(a, x1), x2), ...), xn)                  4n + 2   None      4n + 2
      f(xn, f(xn−1, f(..., f(x1, a) ...)))
P2    f(z, gⁿ(x))                                         n + 3    Occur     n + 3
      f(g(z), gⁿ(y))
P3    h(x, f(x, f(x, ..., f(x, f(x, x)) ...)), x)         8n + 3   Occur     8
      h(f(y, ..., f(y, f(y, y)) ...), y, y)
P4    f(f(... f(f(a, x1), x2), ...), xn)                  4n + 2   Clash     4n + 2
      f(xn, f(xn−1, f(..., f(x1, b) ...)))
P5    h(x, gⁿ(x), x)                                      2n + 8   Occur     2n + 8
      h(gⁿ(y), y, y)
P6    f(f(f(... f(f(a, x1), x2), ...), xn), z)            4n + 7   Occur     4n + 7
      f(f(xn, f(xn−1, f(..., f(x1, a) ...))), g(z))
In P1, the two terms are unifiable, and the occur-check can be awfully expensive if we do not postpone the check. In P2, the first occur-check (on z ≐ g(z)) will fail, though lunify will not know it until postOccurCheck is called. In P3, the last occur-check will fail. It is interesting to notice that unify1 will report this failure by the marking mechanism in a constant number of steps, after visiting only eight nodes of the term graph. In P4, a clash failure takes place at the bottom of the depth-first search. P5 is similar to P3, though an occur-check failure can be found only after a thorough search. P6 is similar to P1, though the last occur-check will locate the failure.
6.2 Resolution

    (A | α)   (¬B | β)
    -------------------
         (α | β)σ

where σ is a mgu of the atoms A and B. The clause (α | β)σ produced by the resolution rule is called the resolvent of the resolution; c1 = (A | α) and c2 = (¬B | β) are the parents of the resolvent. This resolution rule is also called binary resolution, as it involves two clauses as premises.
Example 6.2.2 Given a pair of clauses (p(f(x1)) | q(x1)) and (¬p(x2) | ¬q(g(x2))), resolution can generate two resolvents from them: resolve on p to get (q(x1) | ¬q(g(f(x1)))) with the mgu [x2 ↦ f(x1)], and resolve on q to get (p(f(g(x2))) | ¬p(x2)) with the mgu [x1 ↦ g(x2)].

The above example shows that more than one resolvent can be generated from two clauses. We will use resolve(c1, c2) to denote the set of all resolvents from clauses c1 and c2. In propositional logic, if two clauses do not have duplicated literals and can generate more than one resolvent, then these resolvents are tautologies.
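To make resolve concrete, here is a hedged sketch over the tuple terms of Sect. 6.1, reusing the unify and apply_sub functions given there; a literal is a pair (sign, atom) with sign '+' or '-', and the two clauses are assumed to be variable-disjoint (our own conventions).

def resolve(c1, c2):
    """Return all binary resolvents of clauses c1 and c2 (lists of literals)."""
    out = []
    for i, (sg1, a1) in enumerate(c1):
        for j, (sg2, a2) in enumerate(c2):
            if sg1 != sg2:                    # complementary signs only
                mgu = unify(a1, a2)
                if mgu is not None:
                    rest = c1[:i] + c1[i+1:] + c2[:j] + c2[j+1:]
                    out.append([(sg, apply_sub(mgu, a)) for sg, a in rest])
    return out

# Example 6.2.2: (p(f(x1)) | q(x1)) and (¬p(x2) | ¬q(g(x2)))
c1 = [("+", ("p", ("f", "x1"))), ("+", ("q", "x1"))]
c2 = [("-", ("p", "x2")), ("-", ("q", ("g", "x2")))]
for r in resolve(c1, c2):
    print(r)   # two resolvents: one by resolving on p, one on q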
Proposition 6.2.3 The resolution rule is sound; that is, c1 ∧ c2 ⊨ resolve(c1, c2).

Proof Suppose c1 is (A | α), c2 is (¬B | β), and resolve(c1, c2) contains (α | β)σ, where σ is a mgu of A and B. Any model I of c1 ∧ c2 is also a model of c1σ = (Aσ | ασ) and c2σ = (¬Bσ | βσ), because free variables are treated as universally quantified variables. Since σ is a unifier of A and B, Aσ = Bσ, so the truth values of the literals Aσ and ¬Bσ are different in I. If I(Aσ) = 0, then I(ασ) = 1; otherwise I(c1σ) would be 0. If I(Aσ) = 1, then I(¬Bσ) = 0, so I(βσ) = 1. In both cases, either ασ or βσ or both are true in I, so (α | β)σ is true in I.
By the above proposition, every resolvent is a logical consequence of the input clauses, because the entailment relation is transitive. If the empty clause is generated, then the input clauses are unsatisfiable. In other words, we may use the resolution rule to design a refutation prover. Suppose we have some axioms, say A, and we want to see if some conjecture, say B, is a logical consequence of A. By the refutational strategy, we want to show that A ∧ ¬B is unsatisfiable, by putting A ∧ ¬B into clausal form and finding a contradiction through resolution. We wish that the following logical equivalences hold:
• B is a logical consequence of A, i.e., A ⊨ B, iff
• A ∧ ¬B is unsatisfiable, iff
• S, the clause set derived from A ∧ ¬B, is unsatisfiable, iff
• the empty clause can be generated by resolution from S.
Unfortunately, the last “iff” does not hold in first-order logic. Indeed, when S
is satisfiable, resolution will never generate the empty clause. If the empty clause
can be generated, we can claim that S is unsatisfiable, because resolution is sound.
However, when S is unsatisfiable, resolution may not generate the empty clause.
Example 6.2.4 Let S = {(p(x1) | p(y1)), (¬p(x2) | ¬p(y2))}. One resolvent of the two clauses in S is (p(x3) | ¬p(y3)), and we may add it into S. Any other resolvent from S will be a renaming of the three clauses. In other words, resolution alone cannot generate the empty clause from S. On the other hand, an instance of the first clause is (p(a) | p(a)), which can be simplified to (p(a)). Adding this unit clause to S, we can easily obtain the empty clause from S.
In propositional logic, we defined a proof system which uses the resolution rule as the only inference rule and showed that this proof system is a decision procedure for propositional clauses. In first-order logic, we have no such luck.
6.2.2 Factoring

To get rid of the problem illustrated in the previous example, we need the following inference rule:

Definition 6.2.5 (Factoring) Let clause C be (A | B | α), where A and B are literals and α is the rest of the literals in C. Factoring is the following inference rule:

    (A | B | α)
    -----------
     (A | α)σ

where σ is a mgu of A and B.
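In the same spirit as the resolution sketch above, factoring is a short addition (again reusing unify and apply_sub from Sect. 6.1; the clause encoding is ours):

def factors(c):
    """Return all one-step factors of clause c."""
    out = []
    for i, (sg1, a1) in enumerate(c):
        for j in range(i + 1, len(c)):
            sg2, a2 = c[j]
            if sg1 == sg2:                 # two literals of the same sign
                mgu = unify(a1, a2)
                if mgu is not None:        # merge them by applying the mgu
                    rest = c[:j] + c[j+1:]
                    out.append([(sg, apply_sub(mgu, a)) for sg, a in rest])
    return out

# Factoring (p(x1) | p(y1)) gives (p(y1)), the step needed in Example 6.2.4:
print(factors([("+", ("p", "x1")), ("+", ("p", "y1"))]))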
For propositional logic, we introduced three deletion strategies for deleting clauses
without worrying about missing a proof:
The above procedure is almost identical to Algorithm 3.3.27, with three differences: (a) resolve(α, β) returns a set of resolvents instead of a single resolvent (line 8); (b) factoring is used (line 11); and (c) the termination of the procedure is not guaranteed, as it may generate an infinite number of new clauses. Thus, it is not an algorithm. Despite this, this procedure is the basis of many resolution theorem provers, including McCune's Otter and its successor Prover9, which will be introduced shortly. In Prover9, the subsumption check at line 10 is called forward subsumption.
In Sect. 3.3.2, we listed some well-known resolution strategies which add restrictions to the use of the resolution rule: unit resolution, input resolution, ordered resolution, positive resolution, negative resolution, set of support, and linear resolution. They can be carried over to first-order logic without modification, with the exception of ordered resolution. The procedure resolvable(A, B) (line 7 of resolution) is the place where various restrictions are implemented, so we can use these restricted resolution strategies.

Unit resolution and input resolution are still incomplete in general, but they are equivalent in the sense that if there is a resolution proof under one strategy, then there exists a resolution proof under the other.
To use ordered resolution in first-order logic, we need the concept of simplification orders, which are well-founded partial orders over terms.

Recall that a partial order ≻ is an antisymmetric and transitive binary relation over a set S of elements, and ≻ is well-founded if there is no infinite sequence of distinct elements x1, x2, ..., xi, ... such that

    x1 ≻ x2 ≻ ··· ≻ xi ≻ ···
Multiset Extension

There are many ways to obtain well-founded orders. A multiset or bag over a set S of elements is a modification of the concept of a subset of S that, unlike a set, allows multiple instances of each of its elements from S. The union, intersection, and subtraction operations can be extended to multisets. If there exists a strict partial order ≻ over the set S, we may extend ≻ to ≻mul to compare multisets over S:

Definition 6.3.1 For any finite multisets S1, S2 over S, S1 ≻mul S2 if S1 − S2 ≠ ∅ and, for any y ∈ S2 − S1, there exists x ∈ S1 − S2 such that x ≻ y.

The "finite" condition is necessary. For instance, if S1 and S2 are the multisets of odd and even natural numbers, respectively, then we have both S1 ≻mul S2 and S2 ≻mul S1. In other words, ≻mul is not antisymmetric over infinite multisets.
Example 6.3.2 Let S1 = {a, b, b, c} and S2 = {a, a, b, c}; then S1 ∪ S2 = {a, a, a, b, b, b, c, c}, S1 ∩ S2 = {a, b, c}, S1 − S2 = {b}, and S2 − S1 = {a}. If c ≻ b ≻ a, then S1 ≻mul S2 because b ≻ a.
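For finite multisets, Definition 6.3.1 can be checked mechanically; the sketch below uses Python's collections.Counter and assumes the base order is supplied as a function gt (our own formulation).

from collections import Counter

def mul_gt(s1, s2, gt):
    d1 = Counter(s1) - Counter(s2)    # S1 − S2 as a multiset
    d2 = Counter(s2) - Counter(s1)    # S2 − S1 as a multiset
    if not d1:
        return False                  # S1 − S2 must be nonempty
    # every y in S2 − S1 must be dominated by some x in S1 − S2
    return all(any(gt(x, y) for x in d1) for y in d2)

# Example 6.3.2 with c > b > a:
rank = {"a": 0, "b": 1, "c": 2}
print(mul_gt("abbc", "aabc", lambda x, y: rank[x] > rank[y]))   # True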
Recall that a well-order is total and well-founded.

Proposition 6.3.3 If ≻ is a well-order over a set S, then ≻mul over the finite multisets of S is a well-order.
Lexicographic Extension
Precedence
When using ordered resolution, the user is often required to provide a partial order, called a precedence, over function or predicate symbols. Given a first-order language L = (P, F, X, Op), let Σ = P ∪ F ∪ Op. We say ≻ is a precedence if ≻ is a well-order over Σ. Note that if Σ is finite, ≻ is trivially well-founded.
For some applications, we may need to define an equivalence ≈ over a set. For instance, we may allow ∨ ≈ ∧ in a precedence. By definition, a partial order ≻ is an antisymmetric and transitive relation; allowing ≈ beyond = violates the antisymmetry property, which requires s = t if s ≽ t and t ≽ s. In the literature, if s ≽ t and t ≽ s imply s ≈ t, then ≽ is called a "quasi-order." A quasi-order ≽ can be obtained from a partial order ≻ by adding ≈, that is, ≽ = ≻ ∪ ≈. In other words, a ≽ b iff a ≻ b or a ≈ b.

Proposition 6.3.6 If the partial order ≻ is well-founded and every equivalence class of elements under ≈ is finite, then the quasi-order ≽ = ≻ ∪ ≈ is well-founded.

Since quasi-orders inherit most properties of partial orders, we will treat partial orders and quasi-orders with finite equivalence classes interchangeably.
There are several requirements on the orders used for ordered resolution; we call an order satisfying them a "simplification order."
Definition 6.3.7 (Stable and Monotonic Relation) A binary relation R over S, where S denotes the set of terms, atoms, and clauses, is said to be stable if for any s, t ∈ S, R(s, t) implies R(sσ, tσ) for any substitution σ: X → T(F, X). R is said to be monotonic if R(s, t) implies R(f(..., s, ...), f(..., t, ...)) for any symbol f, where f is either a function or predicate symbol or a Boolean operator.

For a stable order ≻ over terms, we cannot have f(y) ≻ x, because f(y)σ ≻ xσ does not hold for σ = [x ↦ f(y)].
Definition 6.3.8 (Simplification Order) A partial order ≻ over terms or clauses is called a simplification order if it is well-founded, stable, and monotonic.
For any term or clause t, let var(t) denote the set of variables appearing in t.
Definition 6.3.9 (≻lpo) The lexicographic path order ≻lpo is defined recursively as follows: s ≻lpo t if s ≠ t and either (a) t ∈ var(s), or (b) s = f(s1, ..., sm) ≻lpo t = g(t1, ..., tn) by one of the following conditions:
1. f ≻ g and s ≻lpo ti for 1 ≤ i ≤ n, or
2. f ≈ g, [s1, ..., sm] ≻lex_lpo [t1, ..., tn] (the lexicographic extension of ≻lpo), and s ≻lpo ti for 1 ≤ i ≤ n, or
3. f ≺ g and there exists j, 1 ≤ j ≤ m, such that sj ≽lpo t.
Example 6.3.10 Let ∗ ≻ +, s be x ∗ (y + z), and t be (x ∗ y) + (x ∗ z). Then s ≻lpo t by condition (b)1, because ∗ ≻ +, s ≻lpo x ∗ y, and s ≻lpo x ∗ z, both by condition (b)2, that is, from [x, (y + z)] ≻lex_lpo [x, y] and [x, (y + z)] ≻lex_lpo [x, z].
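The definition of ≻lpo translates almost line by line into code. Below is a hedged Python sketch over the tuple terms used in the earlier unification sketch (is_var is reused from there); prec maps each symbol to an integer, with equal integers playing the role of ≈.

def vars_of(t):
    if is_var(t):
        return {t}
    vs = set()
    for arg in t[1:]:
        vs |= vars_of(arg)
    return vs

def lex_gt(ss, ts, prec):
    for si, ti in zip(ss, ts):
        if si != ti:
            return lpo_gt(si, ti, prec)
    return len(ss) > len(ts)

def lpo_gt(s, t, prec):
    if s == t:
        return False
    if is_var(t):                        # case (a): t is a variable of s
        return t in vars_of(s)
    if is_var(s):
        return False
    f, g = s[0], t[0]
    if prec[f] > prec[g]:                # case (b)1
        return all(lpo_gt(s, ti, prec) for ti in t[1:])
    if prec[f] == prec[g]:               # case (b)2
        return lex_gt(s[1:], t[1:], prec) and \
               all(lpo_gt(s, ti, prec) for ti in t[1:])
    # case (b)3: f < g, so some argument of s must dominate t
    return any(si == t or lpo_gt(si, t, prec) for si in s[1:])

# Example 6.3.10 with * > +:
prec = {"*": 2, "+": 1}
s = ("*", "x", ("+", "y", "z"))
t = ("+", ("*", "x", "y"), ("*", "x", "z"))
print(lpo_gt(s, t, prec))   # True: x*(y+z) >lpo (x*y)+(x*z)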
Definition 6.3.12 (≻rpo) The recursive path order ≻rpo over T(F, X) is defined recursively as follows: s ≻rpo t if s ≠ t and either (a) t ∈ var(s), or (b) s = f(s1, ..., sm) ≻rpo t = g(t1, ..., tn) by one of the following conditions:
1. f ≻ g and s ≻rpo ti for 1 ≤ i ≤ n, or
2. f ≈ g and {s1, ..., sm} ≻mul_rpo {t1, ..., tn} (the multiset extension of ≻rpo), or
3. f ≺ g and there exists j, 1 ≤ j ≤ m, such that sj ≽rpo t.
Example 6.3.13 Let s be i(x ∗ y) and t be i(y) ∗ i(x) with i ≻ ∗. To show s ≻rpo t, condition (b)1 applies since i ≻ ∗, and we need to show s ≻rpo i(x) and s ≻rpo i(y). To show s ≻rpo i(x), condition (b)2 applies and we need {x ∗ y} ≻mul_rpo {x}, which holds by (a). The proof of s ≻rpo i(y) is identical to that of s ≻rpo i(x). In comparison, s ≻lpo t holds by the same proof.
The Knuth–Bendix order (≻kbo), invented by Donald Knuth and Peter Bendix in 1970, assigns a number, called a weight, to each function, predicate, or variable symbol, and uses the sum of weights to compare terms. When two terms have the same weight, the precedence relation is used to break ties. Using weights gives us flexibility as well as complication.
Definition 6.3.17 (Weight Function) A function w : X ∪ Σ → N is said to be a weight function if it satisfies (i) w(c) > 0 for every constant c ∈ Σ, and (ii) there exists at most one unary function symbol f ∈ Σ with w(f) = 0, and such an f must be maximal in the precedence.

The weight function w is extended to terms and formulas by w(f(t1, ..., tn)) = w(f) + w(t1) + ··· + w(tn); that is, w(t) is the sum of the weights of all occurrences of symbols in t.
On the other hand, the list of terms in increasing ≻lpo (or ≻rpo) order is
Note that all of these orders provided by KBO, LPO, or RPO are well-founded.
The above example shows the flexibility of ≻kbo, which provides various ways of comparing two terms by changing weight functions as needed. However, ≻kbo lacks the ability to compare terms like x ∗ (y + z) and x ∗ y + x ∗ z. We cannot make x ∗ (y + z) ≻kbo x ∗ y + x ∗ z, because x occurs more often on the right side than on the left, while x ∗ (y + z) ≻lpo x ∗ y + x ∗ z if ∗ ≻ +, and x ∗ y + x ∗ z ≻lpo x ∗ (y + z) if + ≻ ∗.

From now on, when we say to use a simplification order ≻, we mean either ≻lpo, ≻rpo, or ≻kbo.
Example 6.3.22 Let the operators ↔, ⊕, →, ¬, ∨, ∧ be listed in descending order of the precedence ≻; then the right side of each of the following equivalence relations is less than the corresponding left side under either ≻lpo or ≻rpo.

These equivalence relations are used to translate a formula into CNF. Thus, term orders like ≻rpo and ≻lpo can be used to show the termination of the process of converting a formula into CNF.

If we want to convert formulas into DNF, then the precedence should satisfy ∧ ≻ ∨, so that the termination of converting a formula into DNF can be shown by ≻rpo or ≻lpo.
    (A | α)   (¬B | β)
    -------------------
         (α | β)σ

Given a set S of clauses, let S∗ denote the set of clauses saturated under ordered resolution and factoring. That is, any resolvent from ordered resolution on any two clauses of S∗ is in S∗, and any clause obtained by factoring a clause in S∗ is also in S∗. Let GC be the set of all ground instances of S∗.
Lemma 6.3.24 GC is saturated by ordered resolution.

Proof Suppose (g | α) and (¬g | β) are in GC, where g is the maximal atom in both clauses. The resolvent of the two clauses is (α | β). We need to show that (α | β) ∈ GC.

Let (g | α) be an instance of (A | α′) ∈ S∗, i.e., g = Aθ and α = α′θ for some substitution θ. We assume that A is the only literal in (A | α′) such that Aθ = g; otherwise, we use factoring to achieve this condition. Similarly, let (¬g | β) be an instance of (¬B | β′) ∈ S∗, i.e., g = Bγ and β = β′γ for some γ, and ¬B is the only literal in (¬B | β′) such that Bγ = g.

Since g = Aθ = Bγ is maximal, A and B must be unifiable and maximal in (A | α′) and (¬B | β′), respectively. Let λ be the mgu of A and B; then the resolvent of (A | α′) and (¬B | β′) by ordered resolution is (α′ | β′)λ, which must be in S∗. Since θ = λθ′ and γ = λγ′ for some θ′ and γ′, we must have (α | β) = (α′ | β′)λθ′γ′, which is an instance of (α′ | β′)λ. Hence, (α | β) ∈ GC.
Theorem 6.3.25 (Refutational Completeness of Ordered Resolution) S is unsatisfiable iff the empty clause is in S∗.

Proof If the empty clause is in S∗, then S ⊨ ⊥, so S must be unsatisfiable.
If the empty clause is not in S∗, we will construct a Herbrand model for S from GC, which is the set of all ground instances of S∗.

Let GA be the set of ground atoms appearing in GC. The empty clause is not in GC, because an instance of a non-empty clause cannot be empty. By the assumption, ≻ is a well-order over GA.

We start with an empty model H in which no atom has a truth value. We then add ground literals into H one by one, starting from the minimal atom, which exists because ≻ is well-founded. Let g be any atom in GA, and define

pc(g) = {(g | α) ∈ GC : g is the maximal atom of (g | α)},
nc(g) = {(¬g | β) ∈ GC : g is the maximal atom of (¬g | β)},
upto(g) = {c ∈ GC : the maximal atom of c is ≼ g}.

That is, pc(g) contains all the clauses which contain g as their maximal literal; nc(g) contains all the clauses which contain ¬g as their maximal literal; and upto(g) contains all the clauses from GC whose maximal literals are less than or equal to g.
We claim that after g or ¬g is added into H, the following property holds:

Claim: H is a model of upto(g) after g or ¬g is added into H.

Let g1 be the minimal atom of GA. We cannot have both (g1) and (¬g1) in GC; otherwise, the empty clause would have been generated from (g1) and (¬g1), because GC is saturated. If (g1) ∈ GC, we add g1 into H (it means H(g1) = 1); otherwise, we add ¬g1 into H (it means H(g1) = 0). It is trivial to check that H is a model of upto(g1) (which may be empty), so the claim is true for g1. This is the base case of the induction based on ≻.

Let g be the minimal atom in GA which has no truth value in H. Assume as the induction hypothesis that the claim is true for all g′ ≺ g. If there exists a clause (g | α) ∈ pc(g) such that H(α) = 0, we add g into H; otherwise, we add ¬g into H.

Suppose g is added into H; then every clause in pc(g) will be true under H. For any clause (¬g | β) ∈ nc(g), there exists an ordered resolution between (g | α) and (¬g | β), and their resolvent is (α | β). By Lemma 6.3.24, (α | β) ∈ GC. Since every atom of (α | β) is less than g, by the induction hypothesis, (α | β) is true in H. Since H(α) = 0, we must have H(β) = 1. That means no clause in nc(g) will be false in H.

Now suppose ¬g is added into H. The above analysis also holds by exchanging the roles of g and ¬g. In both cases, all clauses in pc(g) and nc(g) are true in H after adding either g or ¬g into H. Thus, the claim is true for every g ∈ GA. Once every atom in GA is processed, we have found a model H of GC by the claim. Since GC is the set of all ground instances of S∗, H is a Herbrand model of S∗ and hence of S. In other words, S is satisfiable.
Note that the model construction in the above proof is not algorithmic, because GC is infinite in general. The induction is sound because it is based on the well-founded order ≻.
Now it is easy to prove Theorem 6.2.7: If the empty clause is generated from
S, then S must be unsatisfiable; if S is unsatisfiable, then the empty clause will be
generated by ordered resolution, which is a special case of resolution.
A traditional proof of Theorem 6.2.7 uses Herbrand’s theorem:
Theorem 6.3.26 (Herbrand’s Theorem) A set S of clauses is unsatisfiable iff
there exists a finite unsatisfiable set of ground instances of S.
From Herbrand's theorem, a proof of Theorem 6.2.7 goes as follows:
1. If S is unsatisfiable, then by Herbrand’s theorem, there exists a finite unsatisfiable
set G of ground instances of S.
2. Resolution for propositional logic will find a proof from G, treating each ground
atom in G as a propositional variable.
3. This ground resolution proof can be lifted to a general resolution proof in S,
using a lemma similar to Lemma 6.3.24.
The completeness proof based on simplification orders comes from [4] and does
not use Herbrand’s theorem.
Prover9 has a fully automatic mode in which the user simply gives it formulas
representing the problem. A good way to learn about Prover9 is to browse and study
the example input and output files that are available with the distribution of Prover9.
Let us look at an example.
Example 6.4.1 Once we can specify the following puzzle in first-order logic, it is
trivial to find a solution by Prover9:
Jack owns a dog. Every dog owner is an animal lover. No animal lover kills an animal.
Either Jack or Curiosity killed the cat, who is named Tuna. Did Curiosity kill the cat?
The following first-order formulas come from the statements of the puzzle:
The goal is to show kills(Curiosity, Tuna). The above formulas can be written in Prover9's syntax as follows:
exists x (dog(x) & owns(Jack, x)).
all x (exists y (dog(y) & owns(x, y)) -> animalLover(x)).
all x (animalLover(x) -> (all y (animal(y) -> -kills(x, y)))).
kills(Jack, Tuna) | kills(Curiosity, Tuna).
cat(Tuna).
all x (cat(x) -> animal(x)).
The resolution is always done between a clause from the sos list (the given list) and
some clauses from the usable list (the kept list). Once all the resolvents between this
clause and the usable list have been computed, this clause moves from the sos list to
the usable list. In other words, resolutions between the clauses in the original usable
list are omitted.
A basic Prover9 command on a Linux machine will look like
prover9 -f Tuna.in > Tuna.out
or
prover9 < Tuna.in > Tuna.out
where the file named “Tuna.in” contains the input to Prover9 and the file named
“Tuna.out” contains the output of Prover9. If the file “Tuna.in” contains the formulas
in Example 6.4.1, then the file “Tuna.out” will contain the following proof:
========================== PROOF =============================
In Prover9, the empty clause is denoted by $F. In each clause, the literals are numbered a, b, c, .... For example, clause 18, -kills(Jack, Tuna), is generated from

14 animal(Tuna). [resolve(12,a,13,a)].
15 -animal(x) | -kills(Jack,x). [resolve(10,a,11,a)].

Prover9 performed a resolution on the first literal (a) of clause 14 and the first literal (a) of clause 15, with the mgu x ↦ Tuna, and the resolvent is

18 -kills(Jack,Tuna). [resolve(14,a,15,a)].

The justification following each clause can be checked by either a human or a machine, to ensure the correctness of the proof.
There are two types of parameters in Prover9: Boolean flags and numeric
parameters. To change the default values of the former type, use set(flag) or
clear(flag). The latter can be changed by assign(parameter, value). For
example, the default automatic mode is set by
set(auto).
Turning “auto” on will cause a list of other flags to turn on, including setting
up “hyper-resolution” as the main inference rule for resolution. Hyper-resolution
reduces the number of intermediate resolvents by combining several resolution steps
into a single inference step.
Definition 6.4.2 (Hyper-Resolution) Positive hyper-resolution consists of a
sequence of positive resolutions between one non-positive clause (called nucleus)
and a set of positive clauses (called satellites), until a positive clause or the empty
clause is produced.
The number of positive resolutions in hyper-resolution is equal to the number of
negative literals in the nucleus clause.
Example 6.4.3 Let the nucleus clause be (¬p(x) | ¬q(x) | r(x)), and the satellite clauses be (q(a) | r(b)) and (p(a) | r(c)). The first resolution between the nucleus and the first satellite produces (¬p(a) | r(a) | r(b)). The second resolution between this resolvent and the second satellite produces the second resolvent (r(a) | r(b) | r(c)), which is the result of positive hyper-resolution between the nucleus and the satellites.
Negative hyper-resolution can be defined similarly by replacing “positive” by
“negative.” Hyper-resolution means both positive and negative hyper-resolutions.
Definition 6.4.4 (ur-Resolution) Unit-resulting (ur)-resolution consists of a
sequence of unit resolutions between one non-unit clause (called nucleus) and a
set of unit clauses (called satellites), until a unit clause or the empty clause is
produced.
Example 6.4.5 In Sect. 3.2.2, we discussed the Hilbert system which consists of
three axioms and one inference rule, i.e., modus ponens. To prove x → x in the
Hilbert system, we may use the following input:
op(400, infix_right, ["->"]). % infix operator
formulas(usable).
P((n(y) -> n(x)) -> (x -> y)).
-P(x) | -P(x -> y) | P(y).
end_of_list.
formulas(sos).
P(x -> (y -> x)).
P((x -> (y -> z)) -> ((x -> y) -> (x -> z))).
-P(a -> a).
end_of_list.
For the above example, ur-resolution will produce the same proof as hyper-resolution does. Hyper-resolution is the default inference rule in the automatic mode. If you would like to see a binary resolution proof, you have to turn off "auto" and turn on "binary_resolution":
clear(auto).
set(binary_resolution).
Since positive resolution is complete and unit resolution is not complete, hyper-
resolution is complete and unit-resulting resolution is not complete. To use “unit-
resulting resolution,” use the command
set(ur_resolution).
The flag ordered_res puts restrictions on the binary and hyper-resolution inference rules: resolved literals in one or more of the parents must be maximal in their clauses. Continuing from the previous example, if we turn on the ordered resolution flag, the proof generated by Prover9 will be the following:
1 -P(x) | -P(x -> y) | P(y). [assumption].
2 P(x -> y -> x). [assumption].
3 P((x -> y -> z) -> (x -> y) -> x -> z). [assumption].
4 -P(a -> a). [assumption].
5 -P(x) | P(y -> x). [resolve(2,a,1,b)].
6 -P(x -> y -> z) | P((x -> y) -> x -> z). [resolve(3,a,1,b)].
8 P(x -> y -> z -> y). [resolve(5,a,2,a)].
9 P(x -> y -> z -> u -> z). [resolve(8,a,5,a)].
10 P(x -> y -> z -> u -> w -> u). [resolve(9,a,5,a)].
14 P((x -> y) -> x -> x). [resolve(6,a,2,a)].
19 -P(x -> y) | P(x -> x). [resolve(14,a,1,b)].
21 P(x -> x). [resolve(19,a,10,a)].
22 $F. [resolve(21,a,4,a)].
Prover9 has several methods available for comparing terms or literals. The term
orders are partial orders (and sometimes total on ground terms), and they are used to
decide which literals in clauses are admissible for application of ordered resolution.
Several of the resolution rules require that some of the literals be maximal in their
clause.
The symbol precedence is a total order on function and predicate symbols
(including constants). The symbol weighting function maps symbols to non-
negative integers. Prover9 supports three simplification orders: LPO (lexicographic
path order), RPO (recursive path order), and KBO (Knuth–Bendix order), which are
introduced in Sect. 6.3.2.
The order parameter (set with assign(order, value)) selects the primary order to be used for determining maximal literals in clauses. The choices are "lpo" (lexicographic path order), "rpo" (recursive path order), and "kbo" (Knuth–Bendix order).
The default symbol precedence (for LPO, RPO, and KBO) is given by the
following rules (in order).
• Function symbols < non-equality predicate symbols.
• For function symbols: c/0 < f/2 < g/1 < h/3 < i/4 < . . ., where c is any
constant, f is any function of arity 2, g is any function of arity 1, . . . (note the
position of g/1).
• For predicate symbols: lower arity < higher arity.
• Non-Skolem symbols < Skolem symbols.
• For Skolem symbols, the lower index is the lesser.
• For non-Skolem symbols, more occurrences < fewer occurrences.
• the lexical ASCII order (UNIX strcmp() function).
The function_order and predicate_order commands can be used to change
the default symbol precedence. They contain lists of symbols ordered by the
precedence from the smallest to the largest. For example,
predicate_order([=, <=, P, Q]). % = < <= < P < Q
function_order([a, b, c, +, *, h, g]). % a < b < c < + < * < h < g
We need two separate commands for defining the precedence, because predicate
symbols are assumed to be always greater than function symbols in the precedence.
The used symbol precedence for a problem is always printed in the output file (in
the section PROCESS INPUT).
Prover9 has automatically proved many theorems in the TPTP library, where TPTP stands for Thousands of Problems for Theorem Provers. TPTP is maintained by Geoff Sutcliffe at the University of Miami. According to the TPTP website,
tptp.org, TPTP contains thousands of test problems over more than 50 domains
and supports input formats for more than 50 automated theorem proving (ATP)
systems. TPTP has been used to support many ATP competitions. TPTP supplies
the ATP community with the following functions:
• A comprehensive library of the test problems that are available today, in order to
provide an overview and a simple, unambiguous reference mechanism.
• A comprehensive list of references and other interesting information for each
problem.
Exercises
(x × y) × z = x × (y × z)
x × (y + z) = (x × y) + (x × z)
(y + z) × x = (y × x) + (z × x)
Can you use ≻rpo or ≻kbo to achieve the same result? Why?
11. Choose, with justification, a simplification order from ≻lpo, ≻rpo, and ≻kbo such that the left side s is greater than the right side t for each of the following equations (which define Ackermann's function):

A(0, y) = s(y)
A(s(x), 0) = A(x, s(0))
A(s(x), s(y)) = A(x, A(s(x), y))
And it asks for the solution of the following question: “Is there a member of
the Hoofers Club who is a mountain climber but not a skier?” Please find the
following resolution proofs for the Hoofers Club problem: (a) unit, (b) input,
(c) positive, (d) negative, and (e) linear resolutions.
13. Use Prover9 to answer the question in the Hoofers Club problem of the previous
problem. Please prepare the input to Prover9 and turn in the output file of
Prover9.
14. Use binary resolution, hyper-resolution, and ur-resolution, respectively, of
Prover9 to answer the question “Is John happy?” from the following statements:
Anyone passing his logic exams and winning the lottery is happy. But anyone who
studies or is lucky can pass all his exams. John did not study but he is lucky. Anyone
who is lucky wins the lottery.
Please prepare the input to Prover9 and turn in the output file of Prover9.
16. Following the approach illustrated in Example 6.4.5, use Prover9 to prove that
the following properties are true in the Hilbert system:
References

1. Peter Ruzicka and Igor Privara, "An Almost Linear Robinson Unification Algorithm", Acta Informatica, vol. 27, pp. 61–71, 1988.
2. Alexander Leitsch, The Resolution Calculus, Texts in Theoretical Computer Science, An EATCS Series, Springer, 1997. ISBN 978-3642606052.
3. William McCune, Prover9 and Mace4, www.cs.unm.edu/~mccune/prover9/, retrieved Nov. 11, 2023.
4. Hantao Zhang, Reduction, Superposition and Induction: Automated Reasoning in an Equational Logic, Ph.D. Thesis, Rensselaer Polytechnic Institute, New York, 1988.
Chapter 7
First-Order Logic with Equality
There are many equivalence relations in logic. When looking at a semantic level, we
have logical equivalence and equisatisfiability, which are denoted by the symbols ≡
and ≈, respectively, for both propositional logic and first-order logic. When looking
at a syntactic level, we have the logical operator ↔ to express the equality between
two statements. A first-order language has two types of objects, i.e., formulas and
terms, and we use ↔ for the formulas. The conventional symbol for the equality
of terms is “=”, which is a predicate symbol. In the previous chapter, we studied
first-order logic without equality and have been using “=” as an identity relation
outside of first-order logic. In this chapter, we study first-order logic with equality,
that is, we study “=” as a predicate symbol in first-order logic.
The above formula is often denoted by ∀x ∃!y p(x, y) in mathematics. Using "=" gives us higher expressive power to specify various statements:

• We can express "there are at least two elements for x such that A(x) holds" as

∃x ∃y (A(x) ∧ A(y) ∧ x ≠ y)
• We can express "there are at most two elements for x such that A(x) holds" as

∀x ∀y ∀z (A(x) ∧ A(y) ∧ A(z) → (x = y) ∨ (x = z) ∨ (y = z))

This states that if we have three elements satisfying A(x), then two of them must be equal.
• We can express “there are exactly two elements for x such that A(x) holds” as
the conjunction of the above two statements.
The axioms of equality consist of five types of formulas; each is sufficiently famous
to have earned itself a name [1]. In the following, the variables appearing in the
formulas are assumed universally quantified:
• Reflexivity:
x = x. An object is always equal to itself.
• Commutativity:
(x = y) → (y = x). It does not matter how the parameters of = are ordered.
That is, = is commutative.
• Transitivity:
(x = y) ∧ (y = z) → (x = z). Equality can be propagated. That is, = is
transitive.
• Function monotonicity:
(x1 = y1) ∧ ··· ∧ (xk = yk) → (f(x1, ..., xk) = f(y1, ..., yk)). If the arguments of a function are pairwise equal, the composed terms are equal. That is, = is a monotonic relation over terms.
• Predicate monotonicity:
(x1 = y1) ∧ ··· ∧ (xk = yk) ∧ p(x1, ..., xk) → p(y1, ..., yk). If the arguments of a predicate are pairwise equal, then the resulting atoms are logically equivalent.
The first three axioms together say that “=” is an equivalence relation. The fourth
and fifth are very similar and share the name monotonicity. The first monotonicity
rule governs terms and the second monotonicity rule governs formulas. Both of
them assert that having equal arguments ensures equal results. In fact, these are
axiom schemata as we need a version for each function f and each predicate p.
In other words, we will need different versions of the fourth and fifth axioms in
different first-order languages. Sometimes, we may split the monotonicity axioms
into a set of axioms for each argument. For example, if the arity of predicate p is 2, we obtain a set of two axioms from the predicate monotonicity:

(x1 = y1) ∧ p(x1, x2) → p(y1, x2)
(x2 = y2) ∧ p(x1, x2) → p(x1, y2)
These monotonicity axioms are not to be confused with the substitution rule,
which allows the monotonicity of terms for variables only.
All the five types of axioms can be easily converted into clauses so that we can
use resolution to prove theorems involving =.
Example 7.1.1 Let f(a) = b and f(b) = a. If we are not allowed to use "equal-by-equal substitution," how can we prove that f(f(a)) = a? The answer is resolution. Since resolution is a refutational prover, we need to add the negation of f(f(a)) = a, i.e., f(f(a)) ≠ a, to the set of clauses from the equality axioms and the premises, and show that the clause set is unsatisfiable:

1  (f(a) = b)                   premise
2  (f(b) = a)                   premise
3  (f(f(a)) ≠ a)                negation of the goal
4  (x ≠ y | y ≠ z | x = z)      transitivity
5  (x ≠ y | f(x) = f(y))        monotonicity
6  (f(f(a)) = f(b))             resolvent from 1 and 5
7  (f(b) ≠ z | f(f(a)) = z)     resolvent from 6 and 4
8  (f(f(a)) = a)                resolvent from 2 and 7
9  ( )                          resolvent from 3 and 8

This shows clearly that, without the equality axioms, we cannot have a resolution proof of f(f(a)) = a, and that the resolution proof is cumbersome when we cannot use "equal-by-equal substitution."
The equality symbol is meant to model the equality of objects in a domain. For example, when we say "2 + 0 = 2" or add(s(s(0)), 0) = s(s(0)), we mean that "2 + 0" and "2" represent the same object in a domain. That is, s = t is true if s and t represent "equal" or "identical" objects in any domain. We are asserting that two different descriptions refer to the same object. Because the notion of identity can be applied to virtually any domain of objects, equality is often assumed to be omnipresent in every logic. However, talk of "equality" or "identity" raises messy philosophical questions, such as "Am I the same person I was three days ago?"
We need a simple and clear way to define the meaning of =. For a first-order
language L = (P , F, X, Op), a Herbrand model for a CNF A can be simply
represented by a set H of ground atoms such that only those atoms in H are
interpreted to be true. Because of the existence of the equality axioms, the set H
must hold some properties.
Definition 7.1.2 (Congruence) A relation =m over the set of terms T (F, X) is
called a congruence if =m is an equivalence relation closed under instantiation:
sσ =m tσ if s =m t
Given s ∈ T(F), let [s] = {t | s =m t, t ∈ T(F)} denote the congruence class of s, and let T(F)/=m denote the set of all congruence classes, which is also called the Herbrand base modulo =m.
Example 7.1.4 Let F = {0/0, s/1}, and A contains s(s(s(x))) = x and the equality axioms. Let H be a Herbrand model of A in which (0 = s(0)) ∉ H; then
Now, it is easy to check whether two ground terms s and t are equal under H: (s = t) ∈ H iff t ∈ [s] (or s ∈ [t]), i.e., s =H t. In fact, when we define a Herbrand model for a formula with equality, we can first define a congruence =m and then regard T(F)/=m as the domain of the Herbrand model with equality: (s = t) ∈ H iff s =m t. This way, we do not have to worry about the equality axioms explicitly.
Example 7.1.5 Let F = {0/0, s/1, −/1}, and A contains −(0) = 0, −(−(x)) = x, s(−(s(x))) = −x, and the equality axioms.
In this example, each congruence class [t] is infinite, and the collection of all congruence classes, i.e., T(F)/=A, is infinite, too. A natural model of A is the set Z of integers, where a nonnegative integer n is represented by sⁿ(0) and a negative integer −n by −(sⁿ(0)) for n > 0. The meaning of s is "add one," and − is the minus sign. We may use T(F)/=A as the Herbrand base to define a Herbrand model. However, it would be cumbersome to define the usual operations like +, −, ∗, etc., in this setting. An alternative solution is to use {0, s, p} with {s(p(x)) = x, p(s(x)) = x} as the building blocks for the set of integers: a negative integer −n is represented by pⁿ(0) and a positive integer n by sⁿ(0).
A set of equations is simply a set of positive unit clauses where the only predicate
symbol is “=”.
Example 7.1.6 In modern algebra, a group is G = (S, ∗), where S is a set of elements and ∗ is a binary operator satisfying the properties that (a) ∗ is associative and closed on S; (b) there exists an identity element in S; and (c) every element of S has an inverse in S. Using the first-order logic approach, the closure property is implicit; the identity is denoted by e; for each x ∈ S, the inverse of x is denoted by i(x). We just need the following three equations as axioms:

e ∗ x = x
i(x) ∗ x = e
(x ∗ y) ∗ z = x ∗ (y ∗ z)

From these three equations, we may prove many interesting properties of group theory, such as i(i(x)) = x, x ∗ e = x, x ∗ i(x) = e, etc.
Definition 7.1.7 Given a set E of equations, the relation =E is defined recursively as follows:
1. s =E t if (s = t) ∈ E;
2. t =E t if t ∈ T(F, X);
3. s =E t if t =E s;
4. s =E t if s =E r and r =E t for some term r;
5. f(..., si, ...) =E f(..., ti, ...) if si =E ti and f ∈ F;
6. sσ =E tσ if s =E t and σ: X → T(F, X) is any substitution.
where [0] = {0, s(s(s(0))), s⁶(0), ...}, [s(0)] = {s(0), s⁴(0), s⁷(0), ...}, and [s(s(0))] = {s(s(0)), s⁵(0), s⁸(0), ...}.

where [x] = {x, s(s(s(x))), s⁶(x), ...}, [s(x)] = {s(x), s⁴(x), s⁷(x), ...}, and [s(s(x))] = {s(s(x)), s⁵(x), s⁸(x), ...}.
Proposition 7.1.11 Given E, let Eax be the equality axioms associated with E; then for every (s, t) ∈ =E, E ∪ Eax ⊨ (s = t).

Proof (Sketch) Check all six conditions of Definition 7.1.7 for s =E t: condition 1 holds because A ⊨ A. Conditions 2–5 hold because of Eax. Condition 6 holds because the free variables are assumed to be universally quantified.
Since =E contains all the equations that are logical consequences of E and the equality axioms, we call =E the theory of E.
Example 7.1.12 Let E be the three equations in Example 7.1.6, we may show that
x ∗ e =E x, x ∗ i(x) =E e, i(e) =E e, i(i(x)) =E x, i(x ∗ y) =E i(y) ∗ i(x), etc.
A proof will be provided in the next section.
Given any two terms, say s and t, does s =E t? This is an important
decision problem with many applications in mathematics. In computation theory,
all computable functions can be constructed by equations. Unfortunately, it is
undecidable in general to answer (s, t) ∈ =E ; in other words, we do not have an
algorithm which takes E, s, and t as input and returns yes if s =E t (see Chap. 11).
In computer science, the congruence closure problem refers to the problem
of deciding s =E t, when E is a set of ground equations. This is a decidable
problem and there exist efficient algorithms. There are a number of applications
using congruence closures. The detailed discussion on this topic can be found in
Chap. 12.
While the major use of R is to represent a congruence relation, R can be used for
representing any transitive, stable, and monotonic relation. This might be inequality
(like > or ≥) or implication.
Example 7.2.3 In Chap. 1 (Sect. 1.3.2), we have seen that the functions can be
constructed by equations. For instance, we used equations to define the predecessor,
addition, subtraction, and multiplication functions over the natural numbers. We
may treat these equations as rewrite rules:
pre(0) → 0
pre(s(x)) → x
add(0, y) → y
add(s(x), y) → s(add(x, y))
sub(x, 0) → x
sub(x, s(y)) → sub(pre(x), y)
mul(0, y) → 0
mul(s(x), y) → add(mul(x, y), y)
exp(x, 0) → s(0)
exp(x, s(y)) → mul(x, exp(x, y))
mul(s(s(0)), s(s(0)))                           at ε
⇒ add(mul(s(0), s(s(0))), s(s(0)))              at 1
⇒ add(add(mul(0, s(s(0))), s(s(0))), s(s(0)))   at 1.1
⇒ add(add(0, s(s(0))), s(s(0)))                 at 1
⇒ add(s(s(0)), s(s(0)))                         at ε
⇒ s(add(s(0), s(s(0))))                         at 1
⇒ s(s(add(0, s(s(0)))))                         at 1.1
⇒ s(s(s(s(0))))

Here "at p" gives the position of the redex rewritten in the next step, with ε denoting the root.
In this language, if a term is ground, it is not hard to prove by induction that the rewriting process will eventually terminate with a number, i.e., a term containing only 0 and s. Using this rewrite system, we can do all the addition, subtraction, multiplication, and exponentiation over the natural numbers by rewriting, if we do not care about speed.
A nice property of many rewrite systems, including the previous example, is that the
application of rules to terms, ground or not, cannot go on forever; it will eventually
terminate.
Definition 7.2.4 A rewrite system R is said to be terminating if there exists no infinite sequence of terms t0 , t1 , . . . , ti , . . . such that ti ⇒ ti+1 for all i.
To prove that a rewrite system R is terminating, we may use a simplification
order (ref. Sect. 6.3.2).
Proposition 7.2.5 Let ≻ be a simplification order and R a rewrite system. If l ≻ r for every l → r in R, then R is terminating.

Proof By definition, ≻ is a well-founded, stable, and monotonic order on terms. If ti ⇒ ti+1 by l → r, then ti ≻ ti+1 , because l ≻ r and ≻ is stable and monotonic. If there existed an infinite sequence t0 , t1 , . . . , ti , . . . such that ti ≻ ti+1 for all i, then ≻ could not be well-founded; hence no infinite rewriting sequence exists.
The above proposition provides a sufficient condition for the termination of
a rewrite system. In Sect. 6.3.2, we discussed lexicographic path order (LPO),
recursive path order (RPO), and Knuth-Bendix order (KBO). These orders are
simplification orders and can be used to prove the termination of R.
Example 7.2.6 To show the termination of R in Example 7.2.3, we may use the lexicographic path order ≻lpo with the precedence s ≺ pre ≺ add ≺ sub ≺ mul ≺ exp, where every binary operator has the left-to-right status, with the exception of sub: sub must have the right-to-left status (Definition 6.3.16), as we want [x, s(y)] ≻lex [pre(x), y].
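The definition of ≻lpo can also be turned into a short program. The following Python sketch is our illustration (it supports only a fixed left-to-right status, so it handles every rule above except the one for sub, which needs the right-to-left status; the term encoding and names are assumptions of this sketch):

PREC = {'s': 0, 'pre': 1, 'add': 2, 'sub': 3, 'mul': 4, 'exp': 5}   # the precedence above

def is_var(t): return isinstance(t, str)

def occurs(x, t):
    return t == x if is_var(t) else any(occurs(x, a) for a in t[1:])

def lpo_gt(s, t):
    """True if s >_lpo t, with left-to-right status for every symbol."""
    if is_var(s):
        return False                                   # a variable dominates nothing
    if is_var(t):
        return occurs(t, s)                            # s > x iff x occurs in s and s != x
    f, ss, g, ts = s[0], s[1:], t[0], t[1:]
    if any(si == t or lpo_gt(si, t) for si in ss):     # case 1: some argument covers t
        return True
    if PREC[f] > PREC[g]:                              # case 2: bigger head symbol
        return all(lpo_gt(s, tj) for tj in ts)
    if f == g:                                         # case 3: same head, lexicographic
        for si, ti in zip(ss, ts):
            if si != ti:
                return lpo_gt(si, ti) and all(lpo_gt(s, tj) for tj in ts)
    return False

# the rule mul(s(x), y) -> add(mul(x, y), y) is oriented correctly:
print(lpo_gt(('mul', ('s', 'x'), 'y'), ('add', ('mul', 'x', 'y'), 'y')))   # True
# with left-to-right status only, the rule for sub is (correctly) rejected:
print(lpo_gt(('sub', 'x', ('s', 'y')), ('sub', ('pre', 'x'), 'y')))        # False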
In general, how can we show that the rewriting process using R will terminate?
Unfortunately, this is an undecidable problem. That is, we do not have an algorithm
to answer this question for general R.
Definition 7.2.7 Given a rewrite system R, if s ⇒∗R t and there exists no t′ such that t ⇒R t′ , we say t is a normal form of s in R.
Why is termination an important property of a rewrite system? The termination of R ensures that every term has a normal form. Without termination, we would have a hard time controlling rewriting. In computation theory, termination is a dividing line for telling whether a decision problem is decidable or not. Some equations, such as the commutativity of + or ∨, can never be made into a terminating rule, and it is undecidable whether a given rewrite system is terminating.
Termination is one important property of a set of rewrite rules; the other important
property is confluence.
Example 7.2.8 In Example 7.1.6, we used three equations to specify a group (S, ∗). These equations can be made into a rewrite system R:

(1) e ∗ x → x
(2) i(x) ∗ x → e
(3) (x ∗ y) ∗ z → x ∗ (y ∗ z)

The term (i(x) ∗ x) ∗ y can be rewritten to e ∗ y by rule (2) and then to y by rule (1); it can also be rewritten to i(x) ∗ (x ∗ y) by rule (3), to which no rule applies. That is, there is more than one way to rewrite a term. For (i(x) ∗ x) ∗ y, we have two normal forms: i(x) ∗ (x ∗ y) and y.
Definition 7.2.9 A rewrite system R is said to be confluent if, for any term s, whenever s ⇒∗R t1 and s ⇒∗R t2 , there exists a term t such that t1 ⇒∗R t and t2 ⇒∗R t. R is said to be canonical if R is both terminating and confluent.
Figure 7.1 illustrates the concepts of “terminating,” “confluent,” and “canonical.” If R is confluent, it does not matter how you apply the rules to a term; you will get the same result. Several notions of “confluence” have been proposed in the literature; they are equivalent for terminating rewrite systems. The termination of R ensures the existence of a normal form for every term; the confluence of R ensures the uniqueness of normal forms. When R is canonical, you do not need to worry about which position or which rule to choose when rewriting a term: any choice can be made, and the end result will be the same. We denote the unique normal form of t in a canonical rewrite system by t↓R , the canonical form of t.
In Example 7.2.8, we have seen that the term t = (i(x) ∗ x) ∗ y has two normal forms, i.e., i(x) ∗ (x ∗ y) and y. The rewrite system in the example is not confluent. In 1970, Donald Knuth and Peter Bendix introduced a procedure which takes a set of equations and a simplification order as input and makes rewrite rules from the equations according to the simplification order. So-called critical pairs are then computed from the rewrite rules. If all the critical pairs have the same normal form, we obtain a canonical rewrite system. If not, such critical pairs are added into the set of equations and the process continues.
Definition 7.2.12 Given two rewrite rules which share no variables (the two rules may be identical up to variable renaming), say l1 → r1 and l2 → r2 , a critical pair of the two rules is a pair of terms (s, t), where s = r1 σ , t = l1 [p ← r2 ]σ , p is a non-variable position of l1 , and σ is the mgu of l1 /p and l2 . We say l2 is superposed at p of l1 to produce the critical pair, and this process is called superposition.
In Example 7.2.8, we are given three rewrite rules. If we rename the variables in rule (3) as (x3 ∗ y3 ) ∗ z3 → x3 ∗ (y3 ∗ z3 ), we can compute a critical pair from rule (3) and rule (2), by superposing l2 = i(x) ∗ x at position 1 of l3 , that is, unifying l2 and l3 /1 = x3 ∗ y3 with the mgu σ = [x3 ↦ i(x), y3 ↦ x, z3 ↦ y], and generate the critical pair (i(x) ∗ (x ∗ y), e ∗ y) from l3 σ = (i(x) ∗ x) ∗ y, as illustrated in Example 7.2.8.
During the computation of a critical pair, applying the mgu σ to l1 , l1 σ can be rewritten to r1 σ by l1 → r1 . At the same time, since (l1 /p)σ = l2 σ , l1 σ contains an instance of l2 at position p, so l1 σ can also be rewritten to l1 [p ← r2 ]σ by l2 → r2 . In other words, the critical pair is obtained by two rewritings on l1 σ using the two rules, respectively:

l1 σ ⇒ r1 σ              by l1 → r1
l1 σ ⇒ l1 [p ← r2 ]σ     by l2 → r2
Example 7.2.14 For the above critical pair, e ∗ y rewrites to y by rule (1), so the pair produces a fourth rule. The current set of rules is:

(1) e ∗ x → x
(2) i(x) ∗ x → e
(3) (x ∗ y) ∗ z → x ∗ (y ∗ z)
(4) i(x) ∗ (x ∗ y) → y
A critical pair comes from rule (4) (i.e., i(x4 ) ∗ (x4 ∗ y4 ) → y4 as l4 → r4 ) and itself, and the mgu of l4 /2 and i(x) ∗ (x ∗ y) is σ = [x4 ↦ i(x), y4 ↦ x ∗ y]:

i(i(x)) ∗ (i(x) ∗ (x ∗ y)) ⇒ x ∗ y           by (4)
i(i(x)) ∗ (i(x) ∗ (x ∗ y)) ⇒ i(i(x)) ∗ y     by (4) at position 2

The critical pair is x ∗ y ≐ i(i(x)) ∗ y, which produces the fifth rule:

(5) i(i(x)) ∗ y → x ∗ y
A critical pair comes from rules (4) and (2), and the mgu of l4 /2 and l2 is σ = [x4 ↦ i(x), y4 ↦ x]:

i(i(x)) ∗ (i(x) ∗ x) ⇒ x                        by (4)
i(i(x)) ∗ (i(x) ∗ x) ⇒ i(i(x)) ∗ e ⇒ x ∗ e      by (2), then (5)

The critical pair is x ≐ x ∗ e, which produces the sixth rule:

(6) x ∗ e → x
A critical pair comes from rules (6) and (2) with σ = [x6 ↦ i(e), x ↦ e] and generates the seventh rule:

(7) i(e) → e
A critical pair comes from rules (5) and (2) with σ = [x2 ↦ i(x), y ↦ i(x)]:

i(i(x)) ∗ i(x) ⇒ x ∗ i(x)    by (5)
i(i(x)) ∗ i(x) ⇒ e           by (2)

The critical pair is x ∗ i(x) ≐ e, which produces the eighth rule:

(8) x ∗ i(x) → e
A critical pair comes from rules (6) and (5) with σ = [x6 ↦ i(i(x)), y ↦ e]:

i(i(x)) ∗ e ⇒ i(i(x))       by (6)
i(i(x)) ∗ e ⇒ x ∗ e ⇒ x     by (5), then (6)

The critical pair is i(i(x)) ≐ x, which produces the ninth rule:

(9) i(i(x)) → x
A critical pair comes from rules (4) and (3), and the mgu of l4 /2 and l3 is σ = [x4 ↦ x ∗ y, y4 ↦ z]:

i(x ∗ y) ∗ ((x ∗ y) ∗ z) ⇒ z                          by (4)
i(x ∗ y) ∗ ((x ∗ y) ∗ z) ⇒ i(x ∗ y) ∗ (x ∗ (y ∗ z))   by (3) at 2

The critical pair is z ≐ i(x ∗ y) ∗ (x ∗ (y ∗ z)), which produces the tenth rule:

(10) i(x ∗ y) ∗ (x ∗ (y ∗ z)) → z
A critical pair comes from rules (10) and (8), and the mgu of l10 /2.2 and l8 is σ = [x8 ↦ y, z ↦ i(y)]:

i(x ∗ y) ∗ (x ∗ (y ∗ i(y))) ⇒ i(y)                                 by (10)
i(x ∗ y) ∗ (x ∗ (y ∗ i(y))) ⇒ i(x ∗ y) ∗ (x ∗ e) ⇒ i(x ∗ y) ∗ x    by (8), then (6)

The critical pair is i(y) ≐ i(x ∗ y) ∗ x, which produces the eleventh rule:

(11) i(x ∗ y) ∗ x → i(y)
A critical pair comes from rules (11) and (4), and the mgu of l11 /1.1 and l4 is σ = [x11 ↦ i(x), y11 ↦ x ∗ y]:

i(i(x) ∗ (x ∗ y)) ∗ i(x) ⇒ i(x ∗ y)        by (11)
i(i(x) ∗ (x ∗ y)) ∗ i(x) ⇒ i(y) ∗ i(x)     by (4) at 1.1

The critical pair is i(x ∗ y) ≐ i(y) ∗ i(x), which produces the twelfth rule:

(12) i(x ∗ y) → i(y) ∗ i(x)

The final critical pair comes from rules (4) and (5) and generates the thirteenth rule:

(13) x ∗ (i(x) ∗ y) → y
Since the left sides of rules (5), (10), and (11) can be rewritten by rules (9) and (12) (and then by other rules), these three rules reduce to identities, i.e., t ≐ t, and the final set of rewrite rules is

(1) e ∗ x → x
(2) i(x) ∗ x → e
(3) (x ∗ y) ∗ z → x ∗ (y ∗ z)
(4) i(x) ∗ (x ∗ y) → y
(6) x ∗ e → x
(7) i(e) → e
(8) x ∗ i(x) → e
(9) i(i(x)) → x
(12) i(x ∗ y) → i(y) ∗ i(x)
(13) x ∗ (i(x) ∗ y) → y

All the critical pairs from this set of ten rules can be reduced to the identity by this rewrite system.
The above example illustrates the execution of the Knuth-Bendix completion procedure, which is described by the following pseudo-code.
Procedure 7.2.15 The procedure KnuthBendix(E, ≻) takes a set E of equations and a simplification order ≻ as input and generates a canonical rewrite system from E when it succeeds. It calls four procedures:
• pickEquation(E) picks an equation from E.
• NF(t, R) returns a normal form of term t by R.
• rewritable(t, r) checks if rewrite rule r can rewrite term t.
• criticalPairs(r, R) returns the set of all critical pairs (in the form of equations) between rule r and every rule in R.
proc KnuthBendix(E, ≻)
1    R := ∅
2    while (E ≠ ∅) do
3        (s ≐ t) := pickEquation(E); E := E − {(s ≐ t)}
4        s := NF(s, R); t := NF(t, R)      // normalize s and t by R.
5        if (s = t) continue               // identity is discarded.
6        else if (s ≻ t) r := (s → t)
7        else if (t ≻ s) r := (t → s)
8        else return “failure”             // s and t are not comparable by ≻.
9        R := R ∪ {r}                      // add new rule into R
10       for (d → e) ∈ R − {r} do          // inter-reduction
11           if rewritable(d, r) R := R − {d → e}; E := E ∪ {d ≐ e}
12           if rewritable(e, r) R := R − {d → e} ∪ {d → NF(e, R)}
13       E := E ∪ criticalPairs(r, R)
14   return R
The ten rules above decide whether any two terms are equal according to the axioms of group theory: the rewrite system returned by the above procedure serves as a decision procedure by checking if the two terms have the same canonical form.
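Such a decision procedure is easy to prototype. The following Python sketch is our illustration (the term encoding and the names RULES, match, and nf are assumptions of this sketch): it normalizes terms with the ten rules and compares canonical forms.

E = ('e',)

RULES = [                                                         # the final ten rules
    (('*', E, 'x'), 'x'),                                         # (1)
    (('*', ('i', 'x'), 'x'), E),                                  # (2)
    (('*', ('*', 'x', 'y'), 'z'), ('*', 'x', ('*', 'y', 'z'))),   # (3)
    (('*', ('i', 'x'), ('*', 'x', 'y')), 'y'),                    # (4)
    (('*', 'x', E), 'x'),                                         # (6)
    (('i', E), E),                                                # (7)
    (('*', 'x', ('i', 'x')), E),                                  # (8)
    (('i', ('i', 'x')), 'x'),                                     # (9)
    (('i', ('*', 'x', 'y')), ('*', ('i', 'y'), ('i', 'x'))),      # (12)
    (('*', 'x', ('*', ('i', 'x'), 'y')), 'y'),                    # (13)
]

def match(pat, t, sub):
    """Extend sub so that pat instantiated by sub equals t; None if impossible."""
    if isinstance(pat, str):                     # a pattern variable
        if pat in sub:
            return sub if sub[pat] == t else None
        return {**sub, pat: t}
    if isinstance(t, str) or pat[0] != t[0] or len(pat) != len(t):
        return None
    for p, u in zip(pat[1:], t[1:]):
        sub = match(p, u, sub)
        if sub is None:
            return None
    return sub

def apply_sub(t, sub):
    return sub[t] if isinstance(t, str) else (t[0],) + tuple(apply_sub(a, sub) for a in t[1:])

def nf(t):
    """Canonical form: rewrite bottom-up until no rule applies (the system is canonical)."""
    if not isinstance(t, str):
        t = (t[0],) + tuple(nf(a) for a in t[1:])
    for lhs, rhs in RULES:
        sub = match(lhs, t, {})
        if sub is not None:
            return nf(apply_sub(rhs, sub))
    return t

# i(a*b) * (a * (b*c)) equals c in every group; constants are plain strings here:
t = ('*', ('i', ('*', 'a', 'b')), ('*', 'a', ('*', 'b', 'c')))
print(nf(t) == nf('c'))                          # True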
The Knuth-Bendix completion is not an algorithm because it may go on forever due to an ever-growing set of rules and critical pairs. Note that the procedure may fail if an equation cannot be oriented into a rewrite rule. There are many attempts to reduce the cases of failure. For instance, if an equation like f (x, g(x)) ≐ h(y, g(y)) cannot be oriented, we may introduce a new constant c to make two equations f (x, g(x)) ≐ c and h(y, g(y)) ≐ c. In general, if we have an equation s(x, z) ≐ t (y, z), where x does not appear in t and y does not appear in s, we may introduce a new function h(z) to create two new equations: s(x, z) ≐ h(z) and t (y, z) ≐ h(z). For the axioms AC = {(x ∗ y) ∗ z ≐ x ∗ (y ∗ z), x ∗ y ≐ y ∗ x}, we may consider rewriting over T (F, X)/=AC , the so-called rewriting modulo AC (associativity and commutativity).
Definition 7.2.16 A rewrite system R is said to be inter-reduced if for any rule
l → r ∈ R, l is not rewritable by R − {l → r}; R is reduced if both l and r are not
rewritable by R − {l → r}.
Lemma 7.2.17 If KnuthBendix(E, ≻) returns R, then R is reduced.
Proof Line 11 (resp. 12) ensures the left (resp. right) side of each rule in R is not
rewritable by any other rule.
When KnuthBendix terminates successfully, we obtain a reduced canonical
rewrite system.
Theorem 7.2.18 If KnuthBendix(E, ≻) returns R, then R is reduced and canonical.
Proof R is reduced by Lemma 7.2.17. For canonicity, first, R is terminating, because for every l → r of R, l ≻ r, where ≻ is a simplification order. Second, all the critical pairs of R are computed, and each pair is rewritten to the same term by R (if not, a new rule would be added into R).
To show that R is confluent, we do induction on t0 based on ≻. As the base case, if t0 is not rewritable by R, then R is trivially confluent at t0 . Suppose, for i = 1, 2, t0 ⇒ ti by li → ri ∈ R at position pi of t0 , i.e., ti = t0 [pi ← ri σi ].
Case 1: If p1 and p2 are disjoint positions, then both t1 and t2 can be rewritten to a common term t3 = t0 [p1 ← r1 σ1 , p2 ← r2 σ2 ].
Case 2: If p1 and p2 are not disjoint positions, without loss of generality, assume p1 = ε; then t0 /p1 = t0 = l1 σ1 and t0 /p2 = (l1 σ1 )/p2 = l2 σ2 . There are two subcases to consider:
Case 2.1: If p2 is a non-variable position of l1 , then (l1 σ1 )/p2 = (l1 /p2 )σ1 = l2 σ2 ; that is, l1 /p2 and l2 are unifiable. Thus, there exists a critical pair (s1 , s2 ) between the two rules such that t1 = s1 θ and t2 = s2 θ for some substitution θ . Since all critical pairs are rewritten to the same term by R, so are (s1 , s2 ) and hence (t1 , t2 ).
Fig. 7.2 Illustration of the proof of Theorem 7.2.18: (a) two disjoint positions; (b) two overlapping
positions
Example 7.2.19 To illustrate Case 2.2 in the above proof, let t = i(i(i(a))) ∗ i(i(a)) and R be the ten rules in Example 7.2.14. Then t can be rewritten to e by rule (2) and to i(a) ∗ i(i(a)) by rule (9) at position 1.1. The left side of rule (2) is i(x) ∗ x, where x occurs twice (n = 2). To obtain a common term, we apply rule (9) one more time (n − 1 = 1) to i(a) ∗ i(i(a)) at position 2 to obtain i(a) ∗ a, which can be rewritten to e by rule (2):

i(i(i(a))) ∗ i(i(a)) ⇒ e                                  by (2)
i(i(i(a))) ∗ i(i(a)) ⇒ i(a) ∗ i(i(a)) ⇒ i(a) ∗ a ⇒ e      by (9), (9), (2)

Note that both 1.1 and 2 are variable positions of i(x) ∗ x, the left side of rule (2).
Rewrite systems can be used to put terms in normal form, to prove terms equal or equivalent, or to serve as decision procedures for some transitive monotonic relations. They represent a powerful method of mathematical reasoning, because they can sometimes overcome the combinatorial explosion caused by other proof procedures, e.g., resolution. In the following, we briefly introduce several special cases of rewrite systems.
A ground equation is a pair of ground terms, i.e., no variables appear in the equation.
The Knuth-Bendix completion can be greatly simplified on ground equations
because of the following:
• There exists a total simplification order for ground terms. This can be achieved
by defining a total precedence over the function symbols and then applying one
of the known simplification orderings, such as the Knuth-Bendix order (KBO),
to terms.
• The critical pair computation is not needed if the rewrite system is inter-reduced.
That is, if there exists a critical pair between two rewrite rules, then one rule can
rewrite the left side of the other rule.
Example 7.2.21 Let f^3 a and f^5 a denote f (f (f (a))) and f (f (f (f (f (a))))), respectively. Feeding E = {f^5 a ≐ a, f^3 a ≐ a} to the Knuth-Bendix completion, the rewrite rules made from E should be {(1) f^5 a → a, (2) f^3 a → a}. Since f^5 a = f^2 f^3 a, the critical pair from (1) and (2) is f^2 a ≐ a, and this is the same as rewriting (1) to f^2 a ≐ a by (2). In fact, rewriting happens first (line 11) in the Knuth-Bendix procedure. From f^2 a ≐ a, we obtain (3) f^2 a → a, which rewrites (2) to f a ≐ a. The Knuth-Bendix completion halts with {f a → a}.
Theorem 7.2.22 For any set E of ground equations, there exists a canonical
rewrite system R serving as a decision procedure of E.
We leave the proof of this theorem as an exercise.
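The simplification is drastic for unary ground systems. The following Python sketch is our illustration, under the encoding that f^k(a) is just the integer k: inter-reducing the rules f^p(a) → a and f^q(a) → a with p > q replaces the first by f^(p−q)(a) → a, so completion on E = {f^m(a) ≐ a, f^n(a) ≐ a} behaves exactly like Euclid's algorithm.

def complete(m, n):
    """Exponent d such that {f^d(a) -> a} is the final canonical system for
    E = {f^m(a) = a, f^n(a) = a}."""
    rules = sorted({m, n})                 # exponents p of the rules f^p(a) -> a
    while len(rules) > 1:
        q, p = rules                       # q < p: the bigger left side is reducible
        rules = sorted({q, p - q})         # replace f^p(a) -> a by f^(p-q)(a) -> a
    return rules[0]

print(complete(5, 3))    # 1: the completion halts with {f(a) -> a}, as in Example 7.2.21
print(complete(12, 8))   # 4 = gcd(12, 8)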
We may also apply (3) first to cba and achieve the same result.
Formally, an SRS is defined as P = (Σ, R), where Σ is the alphabet and R is a finite subset of Σ∗ × Σ∗ . Each member (s, t) of R is called a rewrite rule and written as s → t.
A term rewrite system (TRS) is more expressive than an SRS, as an SRS is easily converted into a TRS, where each symbol s in the SRS is replaced by a unary function s/1 in the TRS. For example, the rewrite rule ba → ab in R of the above example is replaced by b(a(x)) → a(b(x)). All the important concepts of TRS, such as termination and confluence, carry over to SRS. For instance, it is easy to show that R in the above example is a canonical rewrite system.
For a TRS, the rewrite rules define a transitive, monotonic, and stable binary relation over terms. For an SRS, the rewrite rules define a transitive and monotonic binary relation over Σ∗ . That is, given a set R of rewrite rules, which is a binary relation → between fixed strings over Σ∗ , an SRS extends the rewriting relation to all strings in which the left side of a rule appears as a substring; that is, usv ⇒ utv for s → t, where s, t, u, and v are strings of Σ∗ . If we consider the symmetric closure of ⇒, then R is called a Thue system, whose name comes from the Norwegian mathematician Axel Thue, who introduced a systematic treatment of SRS in 1914.
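A tiny Python sketch (our illustration; srs_nf is a name chosen here) makes the string case concrete: with the single rule ba → ab, rewriting sorts all a's before all b's, and every string over {a, b} reaches the same normal form no matter where the rule is applied.

def srs_nf(s, rules):
    """Apply the first applicable rule at its leftmost occurrence until none applies."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            i = s.find(lhs)
            if i >= 0:
                s = s[:i] + rhs + s[i + len(lhs):]    # usv => utv
                changed = True
                break
    return s

print(srs_nf('bbaba', [('ba', 'ab')]))    # 'aabbb'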
Every reader of this book has seen polynomial equations in middle school algebra.
For example, the following is a system of polynomial equations:
(1) x^3 + 4x^2 − 4xz − 8 = 0
(2) x^2 y + 4xy − 4yz + 4 = 0
We will use this example to illustrate all the concepts about rewriting with
polynomials.
In abstract algebra, a field is a set of elements closed under arithmetic operations
+, −, ∗, and /. The set of rational numbers is the best-known field. Given a set
Here, the variables are treated as constants of first-order logic and the coefficient of
the left side of any rewrite rule is always simplified to one. We can use these rewrite
rules to simplify polynomials and compute critical pairs by superposition.
Given two monomials M = [1, a1 , a2 , . . . , an ] and N = [1, b1 , b2 , . . . , bn ], let the least common multiple (lcm) of M and N be

lcm(M, N ) = [1, max(a1 , b1 ), max(a2 , b2 ), . . . , max(an , bn )]
To compute a critical pair, we consider two rewrite rules whose left sides share a variable and rewrite the lcm of the left sides by the two rules, respectively. For our example, the lcm of the left sides of (1) and (2) is lcm(x^3 , x^2 y) = x^3 y:

x^3 y ⇒ (−4x^2 + 4xz + 8)y = −4x^2 y + 4xyz + 8y    by (1)
x^3 y ⇒ x(−4xy + 4yz − 4) = −4x^2 y + 4xyz − 4x     by (2)
The difference of the two results is 8y + 4x. Setting this polynomial to zero and making the leading coefficient one gives rule (3); substituting −2y for x in (1) gives rule (4):

(3) x → −2y
(4) y^3 → 2y^2 + yz − 1

Using (3) and (4), both (1) and (2) are simplified to true, and we obtain {(3), (4)} as the final set of rewrite rules.
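In a program, a monomial over fixed variables is conveniently represented by its exponent vector, and the lcm is the componentwise maximum. A two-line Python sketch (our illustration):

def lcm(M, N):                            # monomials as exponent tuples over (x, y, z)
    return tuple(max(a, b) for a, b in zip(M, N))

print(lcm((3, 0, 0), (2, 1, 0)))          # (3, 1, 0), i.e., lcm(x^3, x^2*y) = x^3*y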
7.3 Inductive Theorem Proving

Example 7.3.3 The functions over the natural numbers in Example 7.2.3 are defined by the following rewrite rules:

(1) pre(0) → 0
(2) pre(s(x)) → x
(3) add(0, y) → y
(4) add(s(x), y) → s(add(x, y))
(5) sub(x, 0) → x
(6) sub(x, s(y)) → sub(pre(x), y)
(7) mul(0, y) → 0
(8) mul(s(x), y) → add(mul(x, y), y)
(9) exp(x, 0) → s(0)
(10) exp(x, s(y)) → mul(x, exp(x, y))
In this example, the function symbols are F = {0, s, pre, add, sub, mul, exp}, which can be divided into two parts: C = {0, s} and D = {pre, add, sub, mul, exp}. T (C) = {0, s(0), s^2 (0), . . .} is a representation of the natural numbers, and D represents a set of functions over the natural numbers.
For each function f in D, f is C-complete:
• pre(0) = 0 and pre(s^i (0)) = s^{i−1} (0) for i > 0
• add(s^i (0), s^j (0)) = s^{i+j} (0) for any i, j ≥ 0
• sub(s^i (0), s^j (0)) = 0 if i ≤ j , and sub(s^i (0), s^j (0)) = s^{i−j} (0) for i > j
• mul(s^i (0), s^j (0)) = s^{i∗j} (0) for any i, j ≥ 0
It is easy to check that T (F )/=E = {[0], [s(0)], . . . , [s^i (0)], . . .}, which is the domain of the Herbrand model with equality and has a bijection h from T (C) to T (F )/=E given by h(t) = [t].
If C = {0, 1}, then the concept of C-consistency coincides with the consistency of Boolean logic. The C-consistency of E does not allow equations over C. For some applications, we may need such equations. For example, to model the set of integers, we may use three constructors: 0, s (plus one), and p (minus one), so that a negative integer −i is represented by p^i (0). In this case, we need EC = {p(s(x)) ≐ x, s(p(x)) ≐ x}, and the definition of C-consistency can be modified by replacing T (C) by T (C)/=EC .
We leave the proof of the following lemma as an exercise.
Lemma 7.3.4 Let E be the equational definitions of D = F − C. D is C-complete
iff for every ground term s ∈ T (F ), there exists t ∈ T (C) such that s =E t.
Proposition 7.3.5 Let E be a set of equations over F , C ⊂ F be the set of
constructors and D = F − C. Let h : T (C) → T (F )/=E be defined by h(t) = [t].
The following statements are true:
(a) h is injective iff E is C-consistent.
(b) h is surjective iff D is C-complete.
(c) h is bijective iff E is C-consistent and D is C-complete.
Proof
(a) h : T (C) → T (F )/=E is injective iff h(t1 ) ≠ h(t2 ) whenever t1 ≠ t2 for t1 , t2 ∈ T (C). Since h(t1 ) = h(t2 ) iff [t1 ] = [t2 ] iff t1 =E t2 , h is injective iff no two distinct terms of T (C) are related by =E , i.e., iff E is C-consistent.
(b) h : T (C) → T (F )/=E is surjective iff ∀s ∈ T (F ) ∃t ∈ T (C) h(t) = [t] = [s], iff ∀s ∈ T (F ) ∃t ∈ T (C) s =E t, iff D is C-complete (Lemma 7.3.4).
(c) Follows from (a) and (b).
It is not always an easy task to show that every defined function is C-complete or
E is C-consistent. The above proposition does not provide an effective method for
this task but may help us understand these concepts.
To prove an inductive theorem in the presence of constructors and C-complete
functions, we need to consider only the ground terms in T (C), not T (F ), thus
making the inductive proof simpler. In general, inductive theorems are properties
of defined functions.
One induction rule is called the structural induction rule, which is based on the structure of terms built up by the constructors. In the current example, the constructors are C = {0, s}, and the rule is:

Structural Induction
B(0)    B(y) → B(s(y))
B(x)
The above rule states that to prove an inductive theorem B(x), we need to prove two formulas: B(0) and B(y) → B(s(y)). Its soundness is based on the condition that all the functions in F − C are C-complete. The C-completeness ensures that every term in T (F ) is equivalent to a term in T (C) (Lemma 7.3.4). Since every term x ∈ T (C) is either 0 or of the form s(y), if B(0) and B(s(y)) are true, so is B(x). B(0) is the base case of the induction. To prove B(s(y)), we use B(y) as the “induction hypothesis,” because y is smaller than s(y) in the term structure of T (C).
Example 7.3.6 Let B(x) be add(x, 0) ≐ x. Applying structural induction to this equation, we need to prove (a) add(0, 0) ≐ 0 and (b) add(x, 0) ≐ x → add(s(x), 0) ≐ s(x). (a) is trivial; (b) is simplified by rule (4), the definition of add, to add(x, 0) ≐ x → s(add(x, 0)) ≐ s(x), which is true by equality crossing, as described below.
Equality Crossing
(t1 ≠ t2 | A(t1 ))
(t1 ≠ t2 | A(t2 ))

Since (t1 ≠ t2 | A(t1 )) is equivalent to (t1 ≐ t2 ) → A(t1 ), the result (t1 ≠ t2 | A(t2 )) can be regarded as “rewrite A(t1 ) to A(t2 ) by the context t1 ≐ t2 ” [4]. For instance, add(x, 0) ≐ x → s(add(x, 0)) ≐ s(x) is rewritten to add(x, 0) ≐ x → s(x) ≐ s(x) by equality crossing, where t1 = add(x, 0) and t2 = x. In other words, we used the induction hypothesis add(x, 0) ≐ x to show that s(add(x, 0)) ≐ s(x) is true.
Equality crossing, called cross-fertilization by Boyer and Moore, is a simplification rule in the presence of the equality axioms, due to the following result.

Proposition 7.3.7 (Soundness of Equality Crossing) Eax ⊨ (t1 ≠ t2 | A(t1 )) ↔ (t1 ≠ t2 | A(t2 )), where Eax is the equality axioms.

Proof From the monotonicity axiom of equality, i.e., (xi ≐ yi ) ∧ p(. . . , xi , . . .) → p(. . . , yi , . . .), we can obtain (t1 ≐ t2 ) ∧ A(t1 ) → A(t2 ), whose clausal form is (t1 ≠ t2 | ¬A(t1 ) | A(t2 )). The resolvent of this clause and (t1 ≠ t2 | A(t1 )) on A(t1 ) is (t1 ≠ t2 | A(t2 )). Switching t1 and t2 , the other implication also holds.
The fundamental difference between using an inference rule like resolution and using an induction rule is not just that a well-founded order, e.g., s(x) ≻ x, is needed when applying induction. When using resolution for theorem proving by refutation, the empty clause is the goal, and the resolution rule is used to derive this goal from the input clauses. When using an induction rule for theorem proving, the goal is the formula to be proved. However, we never try (and it is almost impossible) to derive this goal directly from the input formulas by the induction rule. Instead, we try to reduce the goal to subgoals by using the induction rule backward. That is, to prove B(x) by induction, we generate two subgoals B(0) and B(y) → B(s(y)) from B(x) and try to establish the validity of these subgoals. Remember that every defined function being C-complete is the condition that ensures the soundness of the induction rule.
Example 7.3.8 Continuing from Example 7.3.3, let us prove that sub(add(x, y), y) ≐ x. If we do induction on x, the proof will not go through. On the other hand, we can do induction on y. By inspecting the definitions of add and sub, we see that x is the induction position for add(x, y) and y is the induction position for sub(x, y).
The base case is sub(add(x, 0), 0) ≐ x, which is simplified to true by rule (5) and the theorem add(x, 0) ≐ x of Example 7.3.6. The inductive case is

sub(add(x, y), y) ≐ x → sub(add(x, s(y)), s(y)) ≐ x

To simplify add(x, s(y)), we need add(x, s(y)) ≐ s(add(x, y)), which is also an inductive theorem, and its proof is left as an exercise. Suppose this theorem is available to us; then sub(add(x, s(y)), s(y)) is simplified to sub(s(add(x, y)), s(y)), then to sub(pre(s(add(x, y))), y) by rule (6), then to sub(add(x, y), y) by rule (2), which is x by the induction hypothesis and equality crossing.
The previous example shows that we need to choose the right variable to do the
induction. Sometimes, we need to do induction on multiple variables.
Example 7.3.9 Add the following definitions of the predicates < and ≤ to Example 7.3.3:

(11) y < 0 → ⊥
(12) 0 < s(x) → ⊤
(13) s(y) < s(x) → y < x
(14) 0 ≤ y → ⊤
(15) s(x) ≤ 0 → ⊥
(16) s(x) ≤ s(y) → x ≤ y
To prove (x ≤ y) ≐ ¬(y < x) by structural induction, induction on x alone or on y alone will not work; we need an induction on two variables.
So far, we have focused on natural number functions based on the constructors 0 and s. In many applications, we need other data structures, such as lists and trees, which have their own constructors. In Sect. 5.2.4, we discussed briefly many-sorted logic, which is a small extension of first-order logic. A type or sort corresponds to a unary (or monadic) predicate p(x) and defines a set S = {x | p(x)}. Many-sorted algebraic specification is a language in which we define functions by equations
where the constructors for nat are 0 : nat and s : nat → nat, and the constructors for list are nil : list and cons : nat, list → list. We have seen the functions on nat; the functions over list can also be defined using equations. To prove inductive theorems involving nil and cons, we need the following rule:

Structural Induction over Lists
B(nil)    B(y) → B(cons(x, y))
B(z)
where x comes from rev(z), y from rev(y), and z from cons(x, nil). The
associativity of app can be shown to be an inductive theorem separately. Once this is
done, the proof of G2 is complete. Generalization is a heuristic technique originally
used in Boyer-Moore’s theorem prover.
The above example illustrates an interesting feature of inductive theorem proving: in order to prove rev(app(y, z)) ≐ app(rev(z), rev(y)), we need to prove extra lemmas: app(y, nil) ≐ y and app(app(x, y), z) ≐ app(x, app(y, z)). These lemmas can sometimes be generated automatically and sometimes need to be provided by the user. For rigor, they need to be proved before they are used in a proof.
Example 7.3.11 Let us prove another inductive theorem, rev(rev(y)) ≐ y, by the same induction rule. The induction rule reduces rev(rev(y)) ≐ y to two subgoals:

G3 : rev(rev(nil)) ≐ nil, which is easy to prove by rule 2, and
G4 : rev(rev(y)) ≐ y → rev(rev(cons(x, y))) ≐ cons(x, y).
We have shown how to use the language of algebraic specification of abstract data
types to write axioms (definitions of functions and predicates), theorems or lemmas
(intermediate inferences), etc. for inductive reasoning. In particular, we require that
the axioms as well as theorems in an algebraic specification be expressed as a set
of (conditional) equations. In other words, we are interested in automatic methods
which can prove equations from equations by mathematical induction.
Example 7.3.12 We may define the insertion sort algorithm using equations:
If we can find an odd integer x such that the sequence x, (3x + 1)/2, . . . comes back to x, then x will be a counterexample to the Collatz conjecture and f (x) is undefined. A procedure for computing f (x) will loop forever on this x.
As of 2020, the conjecture has been checked by computer for all x ≤ 2^68 ≈ 2.95 × 10^20 . This experimental evidence is still not a rigorous proof that the conjecture is true for all positive integers, as counterexamples might be found among very large (or possibly immense) values. The renowned mathematician Paul Erdős once said about the Collatz conjecture: “Mathematics is not yet ready for such problems.” Erdős offered $500 for its solution.
When recursion is used incorrectly, the error often involves self-reference. The circular reasoning fallacy is a typical error of self-reference, where two statements (or concepts) of the same fact are used to support each other. If the meaning of a propositional variable is defined by self-reference, it may lead to a contradiction.
For example, the statement “I am lying” leads to a contradiction if we assume that
“if someone is lying, then what he said is false.” If the statement “I am lying” is
true, since it is uttered by someone who is lying, then the statement is false. If the
statement is false, then “I am not lying” is true, so what I said (i.e., “I am lying”)
should be true. The above contradiction is called the “liar paradox” and can be found
in first-order logic, too. For instance, the Epimenides paradox, “All Cretans are liars”
when uttered by an ancient Greek Cretan, is one of the first recorded instances.
7.4 Resolution with Equality
Up to this point in this chapter, our focus has been on equations, which are positive unit clauses with “≐” as the only predicate. Equations are expressive enough to describe all computations; for instance, all Turing machines can be expressed as sets of equations. On the other hand, it is often more convenient to express axioms and theorems as general clauses. The Knuth-Bendix completion suggests that we do not need the equality axioms explicitly if we have an inference rule, like superposition, which computes critical pairs from two rewrite rules. This is also true in resolution-based proving: Alan Robinson and Larry Wos suggested the paramodulation rule [5] shortly after the invention of the resolution rule.
7.4.1 Paramodulation
Definition 7.4.1 (Paramodulation) Suppose clause c1 is (s ≐ t | α) and c2 is (A | β), where A is a literal, p is a non-variable position of A, and α and β are the remaining literals in c1 and c2 , respectively. The paramodulation rule is defined as follows:

(s ≐ t | α)    (A | β)
(A[p ← t] | α | β)σ

where σ is the mgu of s and A/p. The clause (A[p ← t] | α | β)σ produced by the paramodulation rule is called a paramodulant; c1 and c2 are the parents of the paramodulant. We use paramod(c1 , c2 ) to denote the set of all clauses of the form (A[p ← t] | α | β)σ from c1 and c2 .
Superposition is a special case of paramodulation where α and β are empty and
A is also an equation.
Example 7.4.2 Let c1 be (f (g(y)) ≐ a | r(y)) and c2 be (p(g(f (x))) | q(x)). Then f (g(y)) and p(g(f (x)))/1.1 = f (x) are unifiable with σ = [x ↦ g(y)]. The paramodulant is (p(g(a)) | q(g(y)) | r(y)).
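The heart of a paramodulation step is unification. The following Python sketch is our illustration (the uppercase-variable convention and the helper names are choices of this sketch, not the book's code); it recomputes the paramodulant of Example 7.4.2:

def is_var(t): return isinstance(t, str) and t[:1].isupper()

def subst(t, s):
    if is_var(t):
        return subst(s[t], s) if t in s else t
    if isinstance(t, str):
        return t
    return (t[0],) + tuple(subst(a, s) for a in t[1:])

def unify(u, v, s=None):
    """Most general unifier extending s, or None (occurs check omitted here)."""
    s = {} if s is None else s
    u, v = subst(u, s), subst(v, s)
    if u == v:
        return s
    if is_var(u):
        return {**s, u: v}
    if is_var(v):
        return {**s, v: u}
    if isinstance(u, str) or isinstance(v, str) or u[0] != v[0] or len(u) != len(v):
        return None
    for a, b in zip(u[1:], v[1:]):
        s = unify(a, b, s)
        if s is None:
            return None
    return s

# c1 = (f(g(Y)) = a | r(Y)) and c2 = (p(g(f(X))) | q(X)); the subterm A/1.1 is f(X)
sigma = unify(('f', ('g', 'Y')), ('f', 'X'))
print(sigma)                                    # {'X': ('g', 'Y')}
# the paramodulant replaces A/1.1 by a and instantiates the remaining literals:
print(subst(('p', ('g', 'a')), sigma))          # p(g(a))
print(subst(('q', 'X'), sigma))                 # q(g(Y))
print(subst(('r', 'Y'), sigma))                 # r(Y)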
Proposition 7.4.3 (Soundness of Paramodulation) For any clauses c1 and c2 , Eax ∪ {c1 , c2 } ⊨ paramod(c1 , c2 ), where Eax is the equality axioms.

Proof If paramod(c1 , c2 ) is empty, then paramod(c1 , c2 ) ≡ ⊤ by our assumption, and the proposition is trivially true. If paramod(c1 , c2 ) is not empty, let σ be a mgu used in the paramodulation, so that c1 σ is (s′ ≐ t′ | α′) and c2 σ is (A′[s′] | β′); then paramod(c1 , c2 ) contains c3 : (A′[t′] | α′ | β′).
From the monotonicity axiom in Eax , we can deduce (s′ ≐ t′) ∧ A′(s′) → A′(t′), whose clausal form is c4 : (s′ ≠ t′ | ¬A′(s′) | A′(t′)). Applying resolution on (s′ ≐ t′) of c1 σ and c4 , we get c5 : (¬A′(s′) | A′(t′) | α′). Finally, applying resolution on A′(s′) of c2 σ and c5 , we get c6 : (A′(t′) | α′ | β′), which is the same as c3 . Since resolution is sound, c3 is a logical consequence of c1 , c2 , and Eax .
The above proof shows that one paramodulation step is equivalent to two resolution steps with an instance of the monotonicity axiom.
Using paramodulation, we do not need most of the equality axioms. It can be shown that any consequence of the equality axioms can be deduced by paramodulation, with the only exception of the reflexivity axiom x ≐ x. Resolving (x ≐ x) with a negative equality literal of a clause gives the following rule:

(s ≠ t | α)
(α)σ

where σ is the mgu of s and t.
In a theorem prover based on resolution and paramodulation, a huge number of new clauses are generated, and we need to control redundancy in large search spaces. Restricted strategies for using resolution and paramodulation are effective in reducing redundant clauses; deleting unnecessary clauses in the search for the empty clause is also important for controlling the search space. As shown before, we may apply the following deletion strategies in a resolution-based theorem prover:
• Pure literal deletion: Clauses containing pure literals are discarded.
• Tautology deletion: Tautology clauses are discarded.
• Subsumption deletion: Subsumed clauses are discarded.
• Unit deletion: Those literals which are the negation of instances of some unit
clauses are discarded from clauses.
Simplification strategies allow us to keep clauses in simplified forms so that multiple copies of equivalent clauses are not needed. For instance, if a clause contains a term s, say (p[s] | q), and we have a rewrite rule s → t or simply an equation s ≐ t, then this clause can be rewritten to (p[t] | q) by s ≐ t. Rewriting is not the only way to obtain (p[t] | q), because paramodulation can generate it from (p[s] | q) and s ≐ t. However, there is no need to keep both (p[s] | q) and (p[t] | q), because both are logically equivalent under equality. In other words, when (p[s] | q) is rewritten to (p[t] | q), we keep the latter and throw the former away. That is why we call this kind of strategy a simplification strategy.
To work with ordered resolution and paramodulation, we may enforce a restriction on equality crossing as a simplification rule:

Ordered Equality Crossing
(s ≠ t | A(s))
(s ≠ t | A(t))

if s ≻ t.
Proof Suppose c1 is (s ≐ t | α) and σ is the substitution used in the contextual rewriting; let s′ = sσ and t′ = tσ . Then c2 is (A[s′] | β), and c2′ is (A[t′] | β).
All the inference and simplification rules discussed in this section have been
implemented in Prover9, with the exception of contextual rewriting. Rewriting using
equations, called demodulation, is supported in Prover9. In fact, most examples
provided in the distribution of Prover9 contain equality.
Example 7.4.10 In Example 7.1.6, we are given three equations which specify a free group (S, ∗). Can we reduce the number of equations to one? What is the minimal size of such a single equation? These are examples of questions of interest to mathematicians. Here is an answer to the first question:

y ∗ i(z ∗ (((u ∗ i(u)) ∗ i(x ∗ z)) ∗ y)) ≐ x

Prover9’s input file for this problem contains simply the single axiom, and the goals are the three equations in Example 7.1.6, with the only exception that we use y ∗ i(y) for the identity e.
formulas(sos).
y * i(z * (((u * i(u)) * i(x * z)) * y)) = x # label(oneAxiom).
end_of_list.
formulas(goals).
(x * y) * z = x * (y * z) # label(associativity).
x * i(x) = y * i(y) # label(inverse).
x * (y * i(y)) = x # label(identity).
end_of_list.
It took Prover9 less than a second to find proofs of all three equations after generating 1587 clauses. The proof of the associativity law is the hardest: it consists of 48 steps, after the proofs of the other equations.
The equality symbol “=” is built-in; so is “!=”, which stands for the negation of equality. The major options involving paramodulation are the following:

set(ordered_para). % default set
clear(ordered_para).

assign(para_lit_limit, n). % default n=-1, range [-1 .. INT_MAX]

If n ≠ −1, each parent in paramodulation can have at most n literals. This option may cause incompleteness of the inference system.

set(para_units_only).
clear(para_units_only). % default clear

This flag says that both parents for paramodulation must be unit clauses. The effect of this flag is the same as assigning 1 to the parameter para_lit_limit.
The major options involving rewriting are the following:
assign(demod_step_limit, n). % default n=1000, range [-1 .. INT_MAX]
This parameter limits the number of rewrite steps that are applied to a clause during
demodulation. If n = −1, there is no limit.
assign(demod_size_limit, n). % default n=1000, range [-1 .. INT_MAX]
This parameter limits the size (measured as symbol count) of terms as they
are demodulated. If any term being demodulated has more than n symbols,
demodulation of the clause stops. If n = −1, there is no limit.
set(back_demod).
clear(back_demod). % default clear

If this flag is set, a new rewrite rule will be used to rewrite (back demodulate) the equations in the usable and sos lists. Prover9 also provides options that allow non-orientable equations to be used as rewrite rules under certain conditions; rewriting with these rules is restricted to ensure termination.
7.5 Finite Model Finding in First-Order Logic
In theory, the problem of finding a model of a fixed size in first-order logic can be formulated as a SAT problem. In practice, converting to a SAT instance is not always the most effective approach, even though we have advanced SAT solvers.
Many finite model finders have been proposed in the literature. In the following, we introduce Mace4, because it is available and has the same specification language as Prover9.
Mace4 is a software tool for finite model finding in a first-order language. Mace4
is distributed together with Prover9 and was created by William McCune. Prover9
and Mace4 share a first-order language and there is a GUI interface for both of
them. Mace4 searches for finite models satisfying the formulas in a first-order
language. For a given domain size, all instances of the formulas over the domain
are constructed. The result is a set of ground clauses with equality. Then, a
decision procedure based on ground equational rewriting is applied. If satisfiability
is detected, one or more models are printed. If the formula is the denial of some
conjecture, any model found by Mace4 is a counterexample to the conjecture.
Mace4 is a valuable complement to Prover9, looking for counterexamples before (or at the same time as) Prover9 searches for a proof. It can also be used to help debug input clauses and formulas for Prover9.
For the most part, Mace4 accepts the same input files as Prover9. If the input file contains commands that Mace4 does not understand, the argument “-c” must be given to tell Mace4 to ignore those commands. For example, we can run the following two jobs in parallel on a Linux machine, with Prover9 looking for a proof and Mace4 looking for a counterexample.
prover9 -f x2.in > x2.prover9.out
mace4 -c -f x2.in > x2.mace4.out
Most of the options accepted by Mace4 can be given either on the command line or
in the input file. The following command lists the command-line options accepted
by Mace4.
mace4 -help
Mace4 searches for unsorted finite models only. That is, a model has one
underlying finite set, called the domain (or the universe), and the members are
always 0, 1, . . . , n − 1 for a set of size n. The models are the structures which
define functions and relations over the domain, as an interpretation to the function
and predicate symbols in the formula (using the same symbols). By default, Mace4
starts searching for a structure of domain size 2, and then, it increments the size until
it succeeds or reaches some limit. The size of the initial domain or the incremental
size can be specified by the user.
If a formula contains constants that are natural numbers, {0, 1, . . .}, Mace4 assumes they are members of the domain of some structure, that is, they are distinct objects; in effect, Mace4 operates under the assumptions 0 ≠ 1, 0 ≠ 2, and so on. To Prover9, natural numbers are just ordinary constants. This is a subtle difference between Prover9 and Mace4. Because Mace4 assumes that natural numbers are members of the domain, if a formula contains a natural number that is out of range (≥ n, when searching for a structure of size n), Mace4 will terminate with a fatal error.
Mace4 and Prover9 have the same restrictions on the goal formulas they accept.
Mace4 negates the goals and translates them to clauses in the same way as Prover9.
The term “goal” is not particularly intuitive for Mace4 users, because Mace4 does
not prove things. It makes more sense, however, when one thinks of Mace4 as
searching for a counterexample to the goal.
Mace4 uses the following commands to specify the initial domain size, the
maximal domain size, and the increment of the domain size.
assign(domain_size, n). % default n=2, range [2 .. 200]
% command-line -n n
assign(iterate_up_to, n). % default n=10, range [-1 .. 200]
% command-line -N n
assign(increment, n). % default n=1, range [1 .. 200]
% command-line -i n
Note that the function f in the first clause is a Skolem function from ∃z. Mace4
will find a model of size 5 for this input, and it displays the value of p(x, y, z)
as a Boolean matrix of 25 rows and 5 columns, not a very readable format for a
Latin square. From the meaning of p(x, y, z), we may reconstruct the Latin square
as follows:
0 2 1 4 3
4 1 3 2 0
3 4 2 0 1
1 0 4 3 2
2 3 0 1 4
If we specify the Latin square using x ∗ y ≐ z instead of p(x, y, z), the input to Mace4 is the following:
assign(domain_size, 5).
formulas(assumptions).
x * z != y * z | x = y. % "!=" means "not equal"
x * y != x * z | y = z.
x * x = x. % idempotent law
((y * x) * y) * y = x. % special constraint
end_of_list.
Note that “x ∗ z != y ∗ z | x = y” specifies that if rows x and y contain the same value at column z, then x = y. Similarly, “x ∗ y != x ∗ z | y = z” specifies that if columns y and z contain the same value at row x, then y = z. An alternative specification uses \ and / as the Skolem functions for u and v, respectively, in

∀x, y ∃!u, v (x ∗ u ≐ y) ∧ (v ∗ y ≐ x)

where ∃! means “there exists uniquely.” Now, Mace4’s input will look like
assign(domain_size, 5).
formulas(assumptions).
% quasigroup axioms (equational)
x * (x \ y) = y.
x \ (x * y) = y.
(x / y) * y = x.
(x * y) / y = x.
x * x = x. % idempotent law
((y * x) * y) * y = x. % special constraint
end_of_list.
The Latin square defined by ∗ is different from the one obtained by p(x, y, z). If we
use the command assign(max_models, -1), Mace4 will find all the six models
when the domain size is 5; it will find 120 models when the domain size is 7 (you
need to turn off the auto command).
Earlier versions of Mace (before Mace3) are based on the fact that a set C of first-
order clauses has a finite model of size n iff the propositional formula translated from C is satisfiable. Let D = {0, 1, . . . , n − 1} be the domain of the finite model I ; we do the following to obtain a set of propositional clauses from C:
• For each predicate symbol p/k, define a set of propositional variables q^p_{x1 ,...,xk } , xi ∈ D, such that q^p_{x1 ,...,xk } is true iff p^I (x1 , . . . , xk ) is true.
• For each function symbol f/k, define a set of propositional variables q^f_{x1 ,...,xk ,y} , xi , y ∈ D, such that q^f_{x1 ,...,xk ,y} is true iff f^I (x1 , . . . , xk ) = y, and add the following clause into C:

(f (x1 , . . . , xk ) ≠ y1 | f (x1 , . . . , xk ) ≠ y2 | y1 ≐ y2 )

where s ≠ t stands for ¬(s ≐ t).
• For each clause A of C, if f (t1 , . . . , tk ) appears in A and not in the format of f (x1 , . . . , xk ) ≐ y (or y ≐ f (x1 , . . . , xk )), then replace A by (A[f (t1 , . . . , tk ) ← z] | f (t1 , . . . , tk ) ≠ z), where z is a new variable. This step is called flattening. Once this process stops, every atom in C will be either p(x1 , . . . , xk ) or f (x1 , . . . , xk ) ≐ y, where x1 , . . . , xk , y are free variables.
• For each flattened clause A of C, for each variable x of A, for each value d of D, create all the ground instances of A. Let the set of all ground clauses be G.
• Finally, for each ground clause in G, replace p(d1 , . . . , dk ) by q^p_{d1 ,...,dk } and f (d1 , . . . , dk ) ≐ d by q^f_{d1 ,...,dk ,d} . Let the set of all propositional clauses be propo(C).
Example 7.5.2 For the equational specification of the Latin square problem, we used three function symbols: ∗, \, and /. The propositional variables will be p_{x,y,z} for x ∗ y ≐ z, q_{x,y,z} for x\y ≐ z, and r_{x,y,z} for x/y ≐ z; if |D| = 5, there will be 375 propositional variables. The flattened clauses from the axioms plus the functional constraints are

(x\y ≠ z | x ∗ z ≐ y)                  // x ∗ (x\y) ≐ y
(x ∗ y ≠ z | x\z ≐ y)                  // x\(x ∗ y) ≐ y
(x/y ≠ z | z ∗ y ≐ x)                  // (x/y) ∗ y ≐ x
(x ∗ y ≠ z | z/y ≐ x)                  // (x ∗ y)/y ≐ x
(y ∗ x ≠ z | z ∗ y ≠ w | w ∗ y ≐ x)    // ((y ∗ x) ∗ y) ∗ y ≐ x
(x ∗ y ≠ z | x ∗ y ≠ w | z ≐ w)        // “∗” is a function
(x\y ≠ z | x\y ≠ w | z ≐ w)            // “\” is a function
(x/y ≠ z | x/y ≠ w | z ≐ w)            // “/” is a function
If |D| = 5, a clause with three free variables will generate 5^3 = 125 ground clauses; a clause with four free variables will generate 5^4 = 625 ground clauses. The ground instances of the above clauses are easy to convert to propositional clauses. For the literal (z ≐ w), its ground instance becomes ⊤ if zσ = wσ = d and ⊥ if zσ = d1 ≠ d2 = wσ .
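The grounding step is mechanical. The following Python sketch is our illustration (the literal encoding and the name ground_clauses are choices of this sketch, not Mace4's code); it enumerates all ground instances of one flattened clause over D:

from itertools import product

def ground_clauses(clause, variables, D):
    """All ground instances of a clause: one per assignment of domain values
    to its variables. A literal is (sign, atom); an atom is a tuple whose
    first element is the symbol and whose other elements are variables."""
    out = []
    for values in product(D, repeat=len(variables)):
        sigma = dict(zip(variables, values))
        out.append([(sign, tuple(sigma.get(a, a) for a in atom))
                    for sign, atom in clause])
    return out

D = range(5)
# the functional constraint (x*y != z | x*y != w | z = w):
clause = [(-1, ('*', 'x', 'y', 'z')), (-1, ('*', 'x', 'y', 'w')), (1, ('=', 'z', 'w'))]
print(len(ground_clauses(clause, ['x', 'y', 'z', 'w'], D)))    # 5**4 = 625

In a final pass (not shown), each ground atom ('*', d1, d2, d3) becomes the propositional variable p_{d1,d2,d3}, and each ground equality literal between domain values simplifies to ⊤ or ⊥.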
While the conversion from a first-order formula to a set of propositional clauses uses the equality “≐”, the equality axioms (introduced in the first section of this chapter) are not needed for the conversion, because they produce only trivial propositional clauses. Only the function axiom is needed in the conversion, for each function f :

(f (x1 , . . . , xk ) ≠ y1 | f (x1 , . . . , xk ) ≠ y2 | y1 ≐ y2 )
Proposition 7.5.3 A first-order formula C in CNF has a finite model iff the propositional formula propo(C) is satisfiable.

Proof Let C be a CNF formula of (P , F, X, Op) and D = {0, 1, . . . , n − 1} be the finite domain.
If propo(C) is satisfiable, then propo(C) has a model σ , and σ defines a relation R^p over D^k for each predicate symbol p/k:

R^p = {(d1 , . . . , dk ) | σ (q^p_{d1 ,...,dk }) = 1}

On the other hand, if C has a finite model I = (D, R, G), where D is finite, then we can create the interpretation σI for propo(C) as follows:

σI (q^p_{d1 ,...,dk }) = 1 iff p^I (d1 , . . . , dk ) = 1

and

σI (q^f_{d1 ,...,dk ,d}) = 1 iff f^I (d1 , . . . , dk ) = d
Exercises

1. Show that (f (x1 , . . . , xk ) ≠ y1 | f (x1 , . . . , xk ) ≠ y2 | y1 ≐ y2 ) follows from the equality axioms.
2. Write out in first-order logic the statements that there are at least, at most, and exactly three elements x such that A(x) holds.
3. How many critical pairs can be computed between the rewrite rules in the final set of Example 7.2.14? Count them by pairs of rules.
4. Adding the equation a ∗ b ≐ b ∗ a to the rewrite system of Example 7.2.14, use the Knuth-Bendix completion to obtain a canonical rewrite system with LPO and the precedence a ≻ b ≻ i ≻ ∗ ≻ e.
5. Given the following set of ground equations

E = {I ≐ J, K ≐ L, A[I ] ≐ B[K], J ≐ A[J ], M ≐ B[L]},
(b) (1) x^3 − x^2 y − x^2 z + x ≐ 0
    (2) x^2 y − z ≐ 0
(c) (1) x^4 y^2 − z ≐ 0
    (2) x^3 y^3 − 1 ≐ 0
    (3) x^2 y^4 − 2z ≐ 0
11. Prove that the rules of Example 7.3.3 are terminating by a simplification order.
12. Prove that the following rules are terminating by a simplification order:

(a) fib(0) → 0
(b) fib(s(0)) → s(0)
(c) fib(s(s(x))) → add(fib(s(x)), fib(x))
(d) fsum(0) → 0
(e) fsum(s(x)) → add(fsum(x), fib(s(x)))

where fib(x) stands for the Fibonacci numbers and fsum(x) stands for the summation of the first x Fibonacci numbers. Add these rules to Example 7.3.3 and prove by structural induction that fsum(x) ≐ pre(fib(s(s(x)))) is true.
13. Prove that the following rules are terminating by a simplification order:

(a) A(0, y) → s(y)
(b) A(s(x), 0) → A(x, s(0))
(c) A(s(x), s(y)) → A(x, A(s(x), y))

where A(x, y) stands for Ackermann’s function. Add these rules to Example 7.3.3 and prove by structural induction that A(s(0), y) ≐ s(s(y)) and A(s(s(0)), y) ≐ s(s(s(add(y, y)))).
14. Let C = {0, s, p} be the constructors to model the set of integers with EC = {p(s(x)) ≐ x, s(p(x)) ≐ x}:
(a) Provide a C-complete definition of add (integer addition) and sub (integer subtraction).
(b) Prove that add is commutative and associative.
(c) Prove sub(add(x, y), y) ≐ x and add(sub(x, y), y) ≐ x.
15. Given the definitions of app and rev in Example 7.3.10, prove that
(a) app(x, nil) ≐ x
(b) app(x, app(y, z)) ≐ app(app(x, y), z)
(c) rev(app(x, y)) ≐ app(rev(y), rev(x))
18. Use Mace4 to decide if there exist Latin squares of sizes between 4 and 9 for each of the following constraints (called “short conjugate-orthogonal identities”), respectively.
19. Repeat the previous problem when the additional constraint x ∗ x ≐ x is added to the axiom set.
References
1. Alex Sakharov, Equational Logic, in Eric W. Weisstein (ed.), MathWorld–A Wolfram Web Resource, mathworld.wolfram.com/EquationalLogic.html, retrieved 2023-06-20.
2. Nachum Dershowitz and Jean-Pierre Jouannaud, Rewrite Systems, in Jan van Leeuwen (ed.), Handbook of Theoretical Computer Science, Vol. B, Elsevier, pp. 243–320, 1990.
3. Robert S. Boyer and J. Strother Moore, A Computational Logic, Academic Press, New York, 1979.
4. Hantao Zhang, Reduction, Superposition and Induction: Automated Reasoning in an Equational Logic, Ph.D. Thesis, Rensselaer Polytechnic Institute, New York, 1988.
5. Larry Wos and Gail Pieper, Collected Works of Larry Wos, World Scientific, January 2000, ISBN 9810240015.
Part III
Logic in Programming
Chapter 8
Prolog: Programming in Logic
A Prolog rule is written as

A :- B1 , . . . , Bn .

which represents the formula

B1 ∧ · · · ∧ Bn → A.

In other words, the symbol :- is the inverse of →, and the commas stand for ∧ in Prolog. For the above rule, A is called the head and “B1 , . . . , Bn ” the body. It is read as “A is true if B1 , . . . , Bn are all true.” Prolog’s facts are clauses with an empty body and do not need the symbol “:-”; equivalently, we may write a fact in Prolog as “A :- true.”
Prolog uses negative clauses (¬B1 | · · · | ¬Bn ) for queries. Each Prolog query is written as ?- B1 , . . . , Bn , which represents the formula ¬(B1 ∧ · · · ∧ Bn ).
Example 8.1.1 Suppose we want to build the family tree of a big family. We may
state the father-child and mother-child relations by facts and define other relations
by rules. Here is a Prolog program for the family tree.
mom(rose, john). % Rose is the mother of John
mom(rose, bob).
mom(rose, ted).
papa(joe, john). % Joe is the father of John
papa(joe, bob).
papa(joe, ted).
papa(ted, patrick).
papa(john, caroline).
papa(bob, kathleen).
papa(bob, joseph).
papa(bob, mary).
parent(X, Y) :- papa(X, Y). % X is parent of Y if X is papa of Y.
parent(X, Y) :- mom(X, Y). % X is parent of Y if X is mom of Y.
grandpa(X, Y) :- papa(X, Z), parent(Z, Y).
grandmo(X, Y) :- mom(X, Z), parent(Z, Y).
From this program, we see that variables in Prolog are identifiers whose first
character is a capital letter. Once the program is loaded into Prolog, we may issue
queries after the symbol ?-.
To query who the parents of ted are, we type parent(X, ted) after the symbol ?-. After we hit “return,” Prolog will answer with X = joe. If you type “;”, which asks for more solutions, Prolog will give you another answer: X = rose. If you hit “;” again, Prolog will terminate with “no,” because it has run out of answers.
To query the sibling relation, we define it as a rule:

sibling(X, Y) :- parent(Z, X), parent(Z, Y), X \== Y.
% X and Y share parent Z.

where \== stands for “not equal.” Now, the result can be obtained by the query

?- sibling(X, Y).
In Prolog, associated with every successful query, there exists a resolution proof. For the Prolog program in Example 8.1.1, we have the answer X = joe for the query “?- parent(X, ted).” The corresponding resolution proof is given below, where p stands for parent and f for papa.

    Prolog format                     clausal format
1   papa(joe, ted).                   (f (joe, ted))          // input
2   parent(X, Y) :- papa(X, Y).       (p(X, Y) | ¬f (X, Y))   // input
3   ?- parent(X, ted).                (¬p(X, ted))            // query
4   ?- papa(X, ted).                  (¬f (X, ted))           // resolvent of 2 and 3
5   ⊥                                 ⊥                       // resolvent of 1 and 4

In the last resolution, the mgu is {X → joe}, which is the answer to the query. The query comes from the negation of the formula ∃X p(X, ted), which is a logical consequence of the input clauses.
As another example, we list below the corresponding resolution proof for the query “?- parent(Z, X), parent(Z, Y), X \== Y.” Note again that “?- A, B, C” denotes the clause (¬A | ¬B | ¬C).
1 papa(joe, john). // input
2 papa(joe, bob). // input
3 parent(X, Y) :- papa(X, Y). // input
The mgus used in the resolutions provide the solution to the query: the resolution between 1 and 5 provides X = john, Z = joe, and the resolution between 2 and 7 provides Y = bob.
Both resolution proofs above are negative, input, and linear resolution proofs. That is, every resolution step includes a negative clause and an input clause as parents, and the latest resolvent (which is negative) is used in the next resolution. This is true for every resolution proof found by Prolog. Corollary 3.4.11 claims that, for propositional Horn clauses with only one negative clause, negative, input, and linear resolution is complete. This result can be extended to first-order clauses: the negative, input, and linear resolution strategy is complete for first-order Horn clauses. However, the depth-first search strategy used by Prolog is not a fair strategy. Thus, even if there exists a negative, input, and linear resolution proof for a set of Horn clauses, Prolog may fail to find a proof. We will discuss this issue when talking about recursion in Prolog.
In short, given a query, the Prolog engine attempts to find a resolution proof from the program and the query. If the empty clause can be derived, the query, with the found substitution applied, is a logical consequence of the program. This makes Prolog particularly useful for symbolic computation and language-parsing applications.
Example 8.1.2 Suppose we are asked to write a program to convert a propositional formula into negation normal form (NNF). This could be a day-long exercise in a conventional programming language. Suppose we use only three propositional variables, say p, q, and r, and the logical operators are “n” for ¬, “a” for ∧, “o” for ∨, “i” for →, and “e” for ↔. The Prolog program is given below, where nnf(X, Y) is true iff Y is an NNF of X.
pvar(p). % definition of propositional variables
pvar(q).
pvar(r).
The Prolog engine can be regarded as implementing a goal-reduction process: for each atom G in the original query, the engine searches for a rule in the Prolog program whose head unifies with G and recursively applies the engine to the subgoals, which are the atoms in the body of the rule (if the body is empty, the rule is a fact). When all the subgoals of G are solved, or when the body is empty, goal G is considered solved, and a substitution for the variables of G is returned; otherwise, the label “fail” is returned.
Procedure 8.1.3 Assuming P is a Prolog program, procedure engine1 takes a single goal G as input and returns a substitution as output if one rule of P can solve G; otherwise, fail is returned. Procedure engine takes a list Gs of goals as input and returns a substitution as output if every goal of Gs is solved by P .

proc engine1 (G)
1    for (H :- B) ∈ P do
2        if (H and G are unifiable with mgu σ )
3            θ := engine(Bσ )
4            if θ ≠ fail return σ θ
5    return fail

proc engine(Gs)
1    θ := [ ]
2    for A ∈ Gs do
3        σ := engine1 (Aθ )
4        if σ = fail break else θ := θ σ
5    if σ = fail return fail else return θ
There are two solutions to the query “?- dfs(a, N).”: N = l(a, l(b, l(d, g))) and N = l(a, l(c, g)), where a term like l(a, l(c, g)) represents the path (a, c, g) in the graph. Later, after introducing lists, the same path can be represented by [a, c, g]. Figure 8.1 shows the ∨-nodes in the search tree for the query “?- dfs(a, N).” The ∧-nodes are implicit, the shaded nodes are the failed nodes, and the other nodes are ok nodes.
Note that the two recursive procedures engine and engine1 just illustrate the goal-reduction process; the actual Prolog engine is an iterative procedure and uses a stack to save all the information in the ∧-∨ tree.
Procedure engine1 shows that the order of rules in P is important. Procedure engine shows that the order of atoms in the body of a rule is important. In other words, the links in the ∧-∨ tree are ordered, and Prolog implements a special case of the negative, input, and linear resolution strategy: the literals in a rule are handled from left to right, and the commas, which represent ∧ in logic, are not commutative. The input clauses are tried against a given goal from first to last. From the viewpoint of search strategies, this is depth-first search in the ∧-∨ tree.
Example 8.1.5 Adding the following rules to the Prolog program in Example 8.1.1:
descen(X, Y) :- parent(Y, X). % X is a descendant of Y
descen(X, Z) :- parent(Y, X), descen(Y, Z).
The last rule is different from the previous rules in that descen is recursively
defined. If we ask, “who are the descendants of Joe,” the query is “?- descen(X,
joe).” Prolog will find that X = john, bob, ted, patrick, caroline,
kathleen, joseph, and mary, in the given order. For the first answer, i.e., X =
john, the recursive calls of engine and engine1 go as follows:
engine1(descen(X, joe))
engine([parent(joe, X)])
engine1(parent(joe, X))
engine([papa(joe, X)])
engine1(papa(joe, X)) return X = john
...
engine1(descen(X, joe)) return X = john
For the last answer, i.e., X = mary, the recursive calls go as follows (the calls to engine are omitted when the goal list contains a single goal):
engine1(descen(X, joe))
engine([parent(Y, X), descen(Y, joe)])
engine1(parent(Y, X))
If we instead swap the atoms in the body of the recursive rule, so that the two clauses become

descen(X, Y) :- parent(Y, X).
descen(X, Z) :- descen(Y, Z), parent(Y, X).

the same query, "?- descen(X, joe)," will produce X = john, bob, ted, caroline, kathleen, joseph, mary, and patrick, the same set of eight answers in a different order, and will cause Prolog to crash if you ask for more answers.
answers. If we switch the order of the last two clauses, i.e., the last two clauses
are
descen(X, Z) :- descen(Y, Z), parent(Y, X).
descen(X, Y) :- parent(Y, X).
The same query will then cause Prolog to crash without generating any answer. First-order logic cannot explain the difference, because these programs are logically equivalent. The procedures engine and engine1, however, explain it perfectly: the initial goal list is [descen(X, joe)] in all three cases. In the last case, the rule descen(X, Z) :- descen(Y, Z), parent(Y, X) is applied first, and
the recursive calls of engine and engine1 go as follows:
engine1(descen(X, joe))
engine([descen(Y, joe), parent(Y, X)])
engine1(descen(Y, joe))
engine([descen(Y1, joe), parent(Y1, Y)])
engine1(descen(Y1, joe))
engine([descen(Y2, joe), parent(Y2, Y1)])
...
The unsolved goal list contains a long sequence of goals like
..., parent(Y3, Y2), parent(Y2, Y1), parent(Y1, Y), parent(Y, X)
until Prolog crashes when it runs out of memory. Here, we renamed the variables because each applied clause is assumed to have variables distinct from all other clauses.
The above example shows that Prolog uses a depth-first strategy, which is not a fair strategy, to implement resolution, and is therefore an incomplete theorem prover for Horn clauses. As a Prolog programmer, one needs to be careful to avoid the pitfall of infinite loops associated with the depth-first strategy. We will address this issue in the next section.
Example 8.1.6 The Tower of Hanoi, introduced in the introduction of this book, asks us to move n disks of different sizes from peg X to peg Y, using peg Z as an auxiliary holding peg. At no time may a larger disk be placed upon a smaller disk. The Prolog code contains two predicate definitions: move writes out the move information; hanoi(N, X, Y, Z) checks if N = 1 and, if so, moves disk 1 from X to Y; otherwise, it moves the top N−1 disks from X to Z, moves the bottom disk N from X to Y, and then moves the top N−1 disks from Z to Y.
move(N,X,Y) :- % print a move info
write(’Move disk ’), write(N), write(’ from ’),
write(X), write(’ to ’), write(Y), nl. % ‘nl’ = newline
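Following this description, hanoi can be defined as below (a sketch consistent with the description; the original listing may differ in details):

hanoi(1, X, Y, _) :- move(1, X, Y).
hanoi(N, X, Y, Z) :-
    N > 1, M is N - 1,
    hanoi(M, X, Z, Y),   % move the top N-1 disks from X to Z
    move(N, X, Y),       % move the bottom disk N from X to Y
    hanoi(M, Z, Y, X).   % move the top N-1 disks from Z to Y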
?- hanoi(3, a, b, c).
Move disk 1 from a to b
Move disk 2 from a to c
Move disk 1 from b to c
Move disk 3 from a to b
Move disk 1 from c to a
Move disk 2 from c to b
Move disk 1 from a to b
The code comes directly from the description of the solution. The body of the
second rule of hanoi contains two recursive calls of hanoi. The recursive calls will
terminate because the first argument of hanoi is reduced by one in each successive
call.
8.2 Prolog's Data Types

As a programming language, Prolog has several built-in data types, such as numbers (floats or integers), strings, lists, and associated functions. These data types are subtypes of a supertype called term. In this view, term is the only data type in Prolog. A term is an atom, a number, a variable, or a compound term. Atomic formulas in Prolog are called callable terms.
296 8 Prolog: Programming in Logic
We have been using "atom" for "atomic formula"; in this section, we will say "atomic formula" explicitly, because "atom" has a special meaning in Prolog: an atom is either an identifier other than a variable or a single-quoted string of characters. Examples of atoms include x, red, 'Taco', and 'some atom'. An atom is a general-purpose name with no inherent meaning.
Prolog provides the predicate atom/1, where atom(X) is true iff X is an atom. For example, atom('some atom') is true; atom(123) is false.
Numbers can be floats or integers. The range of numbers depends on the number of bits (amount of computer memory) used to represent them. Most major Prolog systems support arbitrary-length integers. The treatment of floats varies from implementation to implementation; floats are not all that heavily used in Prolog programs, as the emphasis in Prolog is on symbol manipulation.
Prolog provides all the usual predicates on numbers, such as >/2, </2, =</2, >=/2, etc., and arithmetic operations, such as -/1, +/1, +/2, -/2, */2, //2, mod/2, rem/2, gcd/2, lcm/2, abs/1, sign/1, max/2, min/2, floor/1, ceiling/1, truncate/1, sqrt/1, sin/1, cos/1, etc. You use =:=/2 to check if two numbers are equal and =\=/2 for not equal. The symbol =/2 is reserved for unification: s = t returns true if s and t are unifiable.
Prolog also provides the predicate number/1 to check whether a term is a number; integer/1 for being an integer; and float/1 for being a float, written with a decimal point ".". For example, float(1.0) returns true; float(1) returns false. Note that integer(1+1) returns false, as 1+1 is a compound term. If we check the value of 1+1<3, Prolog returns true, because 1 + 1 is evaluated to 2 before < is called on (2, 3).
Variables are denoted by an identifier consisting of letters, digits, and underscore characters, beginning with an uppercase letter or an underscore. The underscore alone, "_", denotes a nameless, distinct variable. A nameless variable
is used when we do not care about the value of the variable. For example, if we
define the multiplication relation mul(X, Y, Z), where Z = X*Y, the first rule can
be mul(0, _, 0), where the second position is a nameless variable.
Prolog variables closely resemble variables in logic in that they can be instantiated by arbitrary terms through unification. If a variable is instantiated, we say the variable is bound. Some operations require the involved variables to be bound. For instance, in the rule

descen(X, Z) :- parent(Y, X), descen(Y, Z).

the call parent(Y, X) binds Y, so the recursive call descen(Y, Z) starts with its first argument bound.
In Prolog, there is a special infix predicate called is/2, which is used in the Hanoi example. It resembles the assignment of a numeric expression to a variable in a conventional programming language. For example, X is 1+1 is true with X instantiated to 2. That is, Prolog evaluates the right side of is; if the result is a number, that number is bound to the variable on the left side of is; otherwise, it aborts with an error. In contrast, X = 1+1 will also return true, but it binds the variable X to the compound term 1+1. As another example, "X = a" will return true, while "X is a" will be aborted, because a is not a number.
In the definition of dfs, we used “_” twice in each rule; each “_” represents a
unique variable.
One special kind of compound term is the list, an ordered collection of terms. It is denoted by square brackets with the terms separated by commas or, in the case of the empty list, by []. Here are some lists of three elements: [1,2,3], [red,green,blue], and [34,tom,[2,3]]. Lists are the replacement for the arrays of conventional programming languages.
For a non-empty list, the first member is separated from the remaining members by the symbol |. For example, the result of [X | Y] = [1,2,3] is X = 1 and Y = [2,3], and the result of [X | Y] = [1] is X = 1 and Y = [].
Two lists are unifiable if they are the same length, and all their elements are
pairwise unifiable.
?- [a,b,c,d] = [Head|Tail].
Head = a,
Tail = [b,c,d] ?
yes

?- [a,b,c,d] = [X,Y|Z].
X = a,
Y = b,
Z = [c,d] ?
yes

?- [(a+X),(Y+b)] = [(W+c),(d+b)].
W = a,
X = c,
Y = d ?
yes

?- [(a+X),(Y+b)] = [(W+c),(X+Y)].
no
• append/3: Check if the third list is the concatenation of the first two lists.
append([], Y, Y).
append([F|X], Y, [F|Z]) :- append(X, Y, Z).
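• delete/3: delete(E, L, Z) is true when Z is the list L with every occurrence of E removed. A standard definition (a sketch; select/3 and the exact original listing are not reproduced here) is:

delete(_, [], []).
delete(E, [E|X], Y) :- delete(E, X, Y).
delete(E, [F|X], [F|Y]) :- E \= F, delete(E, X, Y).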
where \= means "not unifiable." For instance, delete(E, [a,b], Z) has only one solution: E = a, Z = [b]; the second answer, E = b, Z = [a], cannot be generated by the above code, because Prolog treats E \= a as false (i.e., E = a is true).¹ This shows another difference between select and delete.
• reverse/2: reverse(List1, List2) is true when the elements of List2 are
in reverse order compared to List1.
reverse([], []).
reverse([F|X], Z) :- reverse(X, Y), append(Y, [F], Z).
One common feature of these predicates is that they are all defined recursively. The termination of these definitions is easy to establish, as one of the arguments to the recursive calls gets smaller; alternatively, we may show that the head of each rule is strictly greater than its body under a simplification ordering.
Due to the relational nature of many built-in predicates, they can be used in
several directions. For example, append/3 can be used either to append two lists
into one (the first two arguments are input and the third is the output) or to split a
given list into two parts (the third argument is the input, and the first two arguments
are the output). For instance, the query "?- append(X, Y, [1,2,3])." will produce
the following four answers:
X = [], Y = [1,2,3]
X = [1], Y = [2,3]
X = [1,2], Y = [3]
X = [1,2,3], Y = []
For this reason, a comparatively small set of library predicates suffices for many Prolog programs. Note that the suggested implementation of length can compute the answer N = 3 from length([a,b,c], N) but cannot generate a list of three placeholders, L = [_1, _2, _3], from length(L, 3). Some versions of Prolog, e.g., SWI-Prolog, can do so.
For some applications, we need to obtain a set S = {x | A(x)}, that is, S is
a set of elements satisfying the condition A(x). It is not easy to write a Prolog
program which generates such a set. Fortunately, in most implementations of Prolog,
such a set can be obtained by calling bagof(X, Goal(X), S). For example, in
Example 8.1.1, we may define children(F) = { C | papa(F, C) }. A query
like the following will give us a satisfactory answer:
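?- bagof(C, papa(joe, C), S).   % a sketch; S is bound to the list of joe's children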
1 For the same query, the built-in delete in gprolog gives an infinite number of solutions.
The above code is lucid, neat, and easy to understand and maintain. This is a typical feature of a declarative programming language. Of course, its execution is not as efficient as that of conventional programming languages, which is why Prolog is used mostly for prototyping. To compensate for this weakness, most Prolog implementations provide efficient built-in implementations of popular functions such as arithmetic operations and sorting.
Another feature of declarative programming is that the correctness of an algorithm can be verified. For sorting, correctness is ensured by two properties: (1) the output is sorted; (2) the output is a permutation of the input list. In Prolog, permutation is a built-in predicate, and we can define sorted with ease.
Example 8.2.2 The following Prolog program defines the predicate sorted(L),
which returns true iff L is a sorted list, and some utility predicates for the verification
purpose.
sorted([]).
sorted([_]).
sorted([X,Y|L]) :- X=<Y, sorted([Y|L]).
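The utility clauses for verification are sketched below (the names check_isort, check_qsort, and spec_sort are ours; the original listing may differ):

check_isort(L) :- isort(L, S), sorted(S), permutation(L, S).
check_qsort(L) :- qsort(L, S), sorted(S), permutation(L, S).
% the requirement as the code itself: a correct but very slow sort
spec_sort(L, S) :- permutation(L, S), sorted(S).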
In the above code, the first three clauses define sorted/1. The next two clauses
illustrate the verification of isort and qsort. The last clause shows how the
requirement can be the code itself. The code of permutation/2 can be found in
Sect. 8.2.3.
Most Prolog implementations provide two versions of predicates for sorting: sort/2, which removes duplicate elements, and msort/2, which does not. For instance, the query "?- sort([b,a,c,a], X)." will produce X = [a,b,c], and "?- msort([b,a,c,a], X)." will produce X = [a,a,b,c]. These built-in predicates run faster than user-defined sorting algorithms.
8.3 Recursion in Prolog

Any recursive definition, whether in Prolog or another language, needs two things: base cases and recursive cases. The base cases specify the condition under which the recursion terminates; without them, the recursion would never stop! For example, the base case of append(X, Y, Z) is append([], Y, Y), when X = []. In a Prolog program, the clauses for the base cases almost always come before those for the recursive cases, which handle a similar problem of smaller size, e.g., append([U|X], Y, [U|Z]) :- append(X, Y, Z).
The danger of recursion is that the program may loop forever and never terminate. The termination problem is more complicated in Prolog than in conventional programs, because the input to a Prolog program may contain variables, and we may ask for multiple solutions from a single query.
Definition 8.3.1 A Prolog program P is said to have first termination on an input
x if P terminates with either no solution or the first solution on x. P is said to have
last termination on x if P terminates after all solutions of x are generated. If P
has first (last) termination on any ground input x, then P is said to have first (last)
ground termination.
Example 8.3.2 Consider the following program P, defining ancestor(X, Y) as "X is an ancestor of Y":

ancestor(tom, jack).
ancestor(X, Y) :- ancestor(X, Z), ancestor(Z, Y).

P has first termination on the query "?- ancestor(A, B)." with the solution A = tom, B = jack, but not on "?- ancestor(jerry, tom).", as the second clause will be used forever during the search. Since the second query is ground, P does not have first ground termination. Since last termination implies first termination, P does not have last ground termination either.
Prolog is a relational programming language, and relations can hold between
multiple entities that are reported on backtracking. Therefore, first termination does
not fully characterize the procedural behavior of a Prolog query, and we need
the concept of last termination. We may use the termination of the query “?- Q,
false.” to check if Q has last termination, because all the solutions of Q must be
generated before the query terminates.
Consider the member/2 program in Sect. 8.2.3: it has ground termination when the second argument of member is ground. However, the query "?- member(X, Y), false." will fail with a stack overflow error, showing that member(X, Y) has no last termination.
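For reference, the usual definition of member/2 (presumably the one of Sect. 8.2.3) is:

member(X, [X|_]).
member(X, [_|T]) :- member(X, T).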
Example 8.3.3 Given a propositional formula A in negation normal form (NNF,
see Sect. 2.3.1), we may use the following program to decide if A is satisfiable,
assuming the formula uses Prolog variables for propositional variables, and, or,
not for ∧, ∨, ¬, respectively.
sat(true). % positive literal
sat(not(false)). % negative literal
sat(or(X, _)) :- sat(X).
sat(or(_, Y)) :- sat(Y).
sat(and(X, Y)) :- sat(X), sat(Y).
The first two clauses are the base cases of sat which handle literals, and the next
three clauses are the recursive cases which handle or and and.
It is easy to see that the argument of each recursive call is strictly smaller than the argument in the head of the rule. Thus, we expect the program to terminate on the test examples. Now, we create three queries to test the above program:
test1(X, Y) :- sat(and(not(X), X)). % test examples
test2(X, Y) :- sat(and(X, not(Y))).
test3(X, Y) :- sat(and(X, or(not(X), Y))).
The query “?- test1(X, Y).” will say no; thus, sat has last termination on
and(not(X), X). For the second query, “?- test2(X, Y).”, Prolog will give
the first solution X=true, Y=false. The third query “?- test3(X, Y).” will
produce the first solution X=true, Y=true. Thus, sat has first termination on the
last two queries.
However, if we look for more answers from ?- test2(X, Y), we will see the
following answers:
X = not(false), Y = false
X = or(true,_), Y = false
X = or(not(false),_), Y = false
X = or(or(true,_),_), Y = false
X = or(or(not(false),_),_), Y = false
X = or(or(or(true,_),_),_), Y = false
X = or(or(or(not(false),_),_),_), Y = false
...
These answers are correct in Prolog, but not correct for propositional satisfiability (which asks for the models of a formula). If you ask for more answers from "?- test3(X, Y).", you will see an infinite number of answers, too. Thus, sat has last termination neither on and(X, not(Y)) nor on and(X, or(not(X), Y)). It is easy to check that the program has both first and last ground termination, by induction on the term structure of the input.
The problem illustrated by the above example is that when the variables appearing in the arguments of a recursive program are not bound, they can be substituted by terms containing other unbound variables, thus creating an infinite number of possibilities. This is the same reason why member(X, Y) has no last termination.
A poor definition of a recursive predicate may cause the program to loop forever. In Chap. 11, it is shown that checking whether a Prolog program terminates on a given input is an undecidable problem. Thus, people are interested in sufficient conditions which ensure the termination of a Prolog program. Focused recursion is one such condition.
Definition 8.3.4 A recursive predicate is said to be focused on an input t₀ if there exists a well-founded order ≻ such that t₁ ≻ t₂ ≻ · · · ≻ tₙ, where the tᵢ are the values of the same argument appearing in the predicate's consecutive recursive calls, starting from the call on t₀.
The argument in the above definition is called the focused position of the
predicate. Given the definition of append/3 in Sect. 8.2.3, the predicate append
is focused on the first argument when the first argument is bound to a list of items
(which can be variables) of finite length. Each recursive call reduces the number
of items in the list by one. On the other hand, if the first argument is bound to a
variable, the call may go on forever. The same can be said about member/2 or sat in Example 8.3.3, where one sequence of calls of sat in the query "?- test2(X, Y)." may contain the following values as the argument of sat:
In Prolog, due to the power of logic variables, many predicates can be naturally
written in a tail recursive way, where the recursion happens in the last position of
the body. For example, the predicates list, member, append, select, and
delete are tail recursive as given in Sect. 8.2.3. On the other hand, reverse,
permutation, and length are not tail recursive.
In many cases, tail recursion is good for performance. For conventional programs, a smart compiler can turn tail recursion into iteration, thus saving the time and space of using a stack to store the environments of recursive calls. In Prolog, the same technique can be used, so that for a tail recursive call the Prolog system can automatically reuse the allocated space of the environment on the local stack. In typical cases, this measurably reduces the memory consumption of your programs, from O(N) in the number of recursive calls to O(1). Since decreased memory consumption also reduces the stress on memory allocation and garbage collection, writing tail recursive predicates often improves both the space and time efficiency of your Prolog programs.
Example 8.3.7 The reverse/2 predicate defined in Sect. 8.2.3 is copied here:
reverse([], []).
reverse([F|X], Z) :- reverse(X, Y), append(Y, [F], Z).
Suppose the query is "?- reverse([a, b, c, d], L)." The subgoal list grows to ?- reverse([], []), append([], [d], L3), append(L3, [c], L2), append(L2, [b], L1), append(L1, [a], L) before it starts shrinking. To avoid this long list of subgoals, we may introduce a tail recursive rev/3:
this long list of subgoals, we may introduce a tail recursive rev/3:
reverse(X, Y) :- rev(X, [], Y).
rev([], R, R).
rev([U|X], Y, R) :- rev(X, [U|Y], R).
Now, for query ?- reverse([a, b, c, d], L), the subgoal list never contains
more than one item at any time:
?- reverse([a, b, c, d], L)
?- rev([a, b, c, d], [], L)
?- rev([b, c, d], [a], L)
?- rev([c, d], [b, a], L)
?- rev([d], [c, b, a], L)
?- rev([], [d, c, b, a], L)
L = [d, c, b, a].
This example illustrates that tail recursion can save space and time. However, one should not overemphasize it: for beginners, it is more important to understand termination and to focus on clear declarative descriptions.
Example 8.3.8 Below is a simple Prolog program for generating all legal positions
for the N-queens problem. The predicate queen(N, R) takes N as the number of
queens and generates a list R for the row numbers of N queens in the N columns.
% queen(N, R) generates at first a list R of N variables.
% Assuming one variable per column, then generate values for R,
% which gives the row number of the queen in each column.
queen(N, Res) :-
length( Res, N), % Res = N variables for columns
gen_list( 1, N, L), % L = [1, 2, ..., N]
solution( N, Res, L). % Res is a permutation of L
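The helper predicates gen_list/3 and solution/3 can be written as follows (a sketch consistent with the comments above; noattack/3 is a helper name we introduce):

gen_list(I, N, []) :- I > N.
gen_list(I, N, [I|L]) :- I =< N, I1 is I + 1, gen_list(I1, N, L).

solution(_, [], []).
solution(N, [Q|Qs], L) :-
    select(Q, L, Rest),      % choose a row for the current column
    solution(N, Qs, Rest),   % place the queens of the later columns
    noattack(Q, Qs, 1).      % Q attacks no queen in a later column

noattack(_, [], _).
noattack(Q, [Q1|Qs], D) :-
    Q =\= Q1 + D, Q =\= Q1 - D,   % different diagonals at distance D
    D1 is D + 1, noattack(Q, Qs, D1).

The rows are automatically all different, because Res is a permutation of L.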
Given the query “?- queen(4, R).”, the first answer is R = [2,4,1,3],
which says that the queens are placed at row 2 in the first column, row 4 in the second
column, etc., and the second answer is R = [3,1,4,2]. In the above program, each position is generated and checked against the previously generated positions, so that no two queens can attack each other. Using Prolog's depth-first search, the program is simple to write but highly inefficient: when N > 12, it takes quite a while to find one solution, much slower than a SAT solver. Sudoku puzzles can also be attacked in Prolog using the same idea; however, a Sudoku puzzle cannot be solved efficiently by this generate-and-test approach, as the search space is huge. SAT solvers are a much better tool for Sudoku puzzles.
8.4 Beyond Clauses and Logic

Most Prolog systems omit the occurs check in unification, so a query like X = f(X, Y) succeeds and creates a cyclic term; the built-in unify_with_occurs_check/2 performs unification with the occurs check:
?- X = f(X,Y).
cannot display cyclic term for X
?- unify_with_occurs_check(X, f(X,Y)).
no
The cut, in Prolog, is a predicate of arity 0, written as !, which always succeeds but cannot be backtracked over. It is best used to prevent unwanted backtracking, including the finding of extra solutions by Prolog, and to avoid unnecessary computations.
Example 8.4.1 In this contrived example, a cut operator is placed in the first rule:
a :- b, !, c. % 1
a :- c. % 2
b :- d. % 3
b :- e. % 4
c :- d. % 5
c :- e. % 6
d :- write(’d’). % 7
e :- write(’e’). % 8
For the query “:- a.”, Prolog will try rule 1 first. The goal list now is [b, !, c].
To solve b, Prolog will try rule 3, reducing b to d. Prolog will succeed with rule 7.
Now, the cut operator is passed, and Prolog will solve c by rules 5 and 7 with the first
success. If we look for the next solution, Prolog will succeed for the second time by
solving c of rule 1 with rules 6 and 8. If we ask Prolog for more solutions, Prolog
will backtrack to the cut operator. However, the meaning of the cut operator is "do not backtrack beyond this point." Thus, Prolog terminates with only two successes. Without the cut operator, Prolog would find four more solutions: two from b of rule 1 using rules 4 and 5–8, and two from rule 2 with rules 5–8.
The above example illustrates the effect of the cut operator: when "!" is reached by engine1, alternative rules for the head of the current rule, such as rule 2 for a, are eliminated; alternative rules for all the left siblings of "!", such as rule 4 for b, are also eliminated. The right siblings of "!" are not affected by "!".
Some programmers call the cut a controversial control facility, because it was added for efficiency only and has no counterpart in logic.
A green cut is a use of cut which only improves efficiency. Green cuts make programs more efficient without changing their output. For example, in the definition of the split predicate of qsort, we may insert a cut after H =< P in the second rule; that is, the three clauses become the following:
split(_, [], [], []).
split(P, [H|I], [H|S], B) :- H =< P, !, split(P, I, S, B).
split(P, [H|I], S, [H|B]) :- H > P, split(P, I, S, B).
If the current goal G is split(3, [2, 4, 1], L, R), then G unifies with the heads of the last two rules. Once the subgoal H =< P, i.e., 2 =< 3, succeeds, the last clause will never be tried, due to the cut operator, even though G unifies with the head of this clause and we may want to see more answers. This cut is a green cut, because we know the subgoal H > P, i.e., 2 > 3, would fail, so the last clause cannot be applied anyway. The cut operator merely saves the time of trying to unify G with the head of the last clause. With or without the cut, the program behaves the same, as desired.
A red cut is a use of cut whose removal changes the meaning of the program. For the above split example, if H =< P fails, then H > P will succeed. To save the cost of H > P, we can write the split program as follows:
split(_, [], [], []).
split(P, [H|I], [H|S], B) :- H =< P, !, split(P, I, S, B).
split(P, [H|I], S, [H|B]) :- split(P, I, S, B).
This program will give us the same result as the previous one for any input.
However, this program will perform differently from the following program, where
the cut is gone:
split(_, [], [], []).
split(P, [H|I], [H|S], B) :- H =< P, split(P, I, S, B).
split(P, [H|I], S, [H|B]) :- split(P, I, S, B).
So, the above cut is a red cut. Red cuts are potential pitfalls for bugs. For example, if the order of the two rules is reversed, the green-cut version still works correctly, but the red-cut version produces wrong results. If the saving is significant when using a red cut, detailed documentation should be provided along with the code. The cut should be used sparingly: proper placement of the cut operator and proper ordering of the rules are required to preserve the intended logical meaning.
We can use the same idea of "cut fail" to define the predicate not, which takes a term as an argument, using the built-in predicate call: not(G) "calls" the term G, evaluating G as though it were a goal. If G succeeds, so does call(G), and not(G) fails; otherwise, not(G) succeeds.
not(G) :- call(G), !, fail.
not(_).
Most Prolog systems have a built-in predicate like not; for instance, SWI-Prolog calls it \+. Remember that not is not the same as ¬, because it is based on the success or failure of goals. It can, however, be useful:
likes(mary, X) :- not(reptile(X)). % Mary does not like reptiles.
different(X, Y) :- not(X = Y). % Not equal means "different"
Negation as failure can be misleading. Suppose the database holds the names of members of the public, marked by whether they are innocent or guilty of some offense, expressed in Prolog:
innocent(peter_pan).
innocent(winnie_the_pooh).
innocent(X) :- occupation(X, nun).
guilty(joe_bloggs).
guilty(X) :- occupation(X, thief).
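For the queries below to behave as discussed, guilt must also be derivable from the failure to prove innocence; presumably the program contains a clause along the following lines (our reconstruction):

guilty(X) :- not(innocent(X)).   % guilty if not provably innocent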
?- innocent(einstein).
no.
?- guilty(einstein).
yes.
To justify this behavior of "negation as failure," people suggest the closed world assumption: every ground atomic formula is assumed to be false unless it can be shown to be true by the program. Under this assumption, since we cannot show that innocent(einstein) is true in the program, innocent(einstein) is assumed to be false, and this justifies the answer yes to guilty(einstein).
Some disturbing behavior is even more subtle than the innocent/guilty problem
and can lead to some extremely obscure programming errors. Here is a Prolog
program about restaurants:
good_standard(godels).
good_standard(hilberts).
expensive(godels).
reasonable(R) :- not(expensive(R)).
?- good_standard(X), reasonable(X).
X = hilberts
yes
?- reasonable(X), good_standard(X).
no.
The second query fails because reasonable(X) is called with X unbound: call(expensive(X)) succeeds with X = godels, so not(expensive(X)) fails, and no answer is produced. The order of subgoals thus changes the result, which would never happen with ¬ in logic.

Prolog also supports if-then-else in rule bodies: H :- C -> G1; G2, which is not a clause, is equivalent to two clauses: H :- C, G1 and H :- ¬C, G2 (the latter is not a Horn clause).
Once a predicate is declared as dynamic, Prolog allows the clauses of that predicate to be inserted or removed during execution. For Example 8.1.1, if we have a file which contains pairs of father-child names, we can declare papa as dynamic: dynamic(papa). During execution, we read names into Father and Child and call assertz(papa(Father, Child)), which adds the clause papa(Father, Child) after all existing clauses of the predicate papa. To add the new clause before the existing clauses, the command is asserta/1. To remove a clause during execution, the command is retract/1; for instance, retract/1 can remove a clause previously added by assertz/1. This set of commands allows the user to change the program dynamically during execution. The change does not affect any activation that is currently being executed: the database is frozen during the execution of a goal, and the list of clauses defining a predicate is fixed at the moment of its execution.
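A small sketch of this use of dynamic predicates (load_papa and the file format are our assumptions, not the book's code):

% assumed file format: one "papa(father, child)." term per line
:- dynamic(papa/2).

load_papa(File) :-
    open(File, read, S),
    repeat,
    read(S, T),
    ( T == end_of_file -> !, close(S)
    ; assertz(T), fail                 % add the clause, then read on
    ).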
Since the birth of Prolog, other paradigms for logic programming have also been
introduced. In this section, we briefly describe some of them.
Datalog

Datalog is a restricted form of Prolog used as a database query language: compound terms are not allowed as arguments, so a Datalog program always terminates, and its rules are typically evaluated bottom-up over a database of facts.

Constraint Logic Programming
In Prolog, the subgoals in the body of a rule are solved from left to right. Constraint logic programming (CLP) extends Prolog by relaxing this left-to-right order for certain subgoals. A constraint is a predefined relation on variables, e.g., X + Y > 8, of the kind that also appears in Prolog programs. The evaluation of a constraint is similar to that of Prolog; unlike Prolog, when the CLP interpreter encounters a constraint which is not yet evaluable, it puts the constraint into a constraint store.
Exercises
1. Find the four solutions for the query "?- append(X, Y, [a, b, c])." For each of the four solutions, give the corresponding resolution proof.
2. Given the program in Example 8.1.2, please answer the following questions:
(a) What is the result of the program when converting ¬(p → (q ∨ ¬p))
to NNF? Please display your NNF using both Prolog and conventional
formats.
(b) Do you get the same result if you change the order of clauses? Please
explain all possible cases.
(c) What is the resolution proof associated with your result?
3. For the program in Example 8.3.3, place the clause for handling and before
those for or, and run the query “?- test2(X, Y).” for three answers. Please
produce the resolution proof for each of the three answers.
4. Implement in Prolog the selection sort, ssort(X, Y), where X is a list of integers and Y is the result of sorting X. You will need to implement maxNum(L, A), which returns true when A is maximal among the numbers in list L. Your selection sort will repeatedly move the maximal number from the input to the output.
5. Implement in Prolog the merge sort, msort(X, Y ), where X is a list of integers
and Y is the result of sorting X.
11. Introduce a tail recursive predicate for each of the following predicates: (a) length/2 and (b) permutation/2, so that the same results can be obtained through tail recursive calls.
12. Given the following Prolog program, which gives reviews of restaurants:
not(P) :- call(P), !, fail.
not(_).
good_standard(godels).
good_standard(hilberts).
expensive(godels).
reasonable(R) :- not(expensive(R)).
Do the following two queries provide the same result? Why?
?- good_standard(X), reasonable(X).
?- reasonable(X), good_standard(X).
References
1. William F. Clocksin and Christopher S. Mellish: Programming in Prolog. Berlin; New York: Springer-Verlag, 2003
2. Daniel Diaz: The GNU Prolog web site. http://gprolog.org, retrieved 2023-06-20
Chapter 9
Hoare Logic
on natural numbers (therefore should work for every natural number), we just
do not have the ability to test the correct outcome for every natural number.
Formal verification may prove the desired property for every natural number by
mathematical tools.
Imperative programming languages like C, C++, Java, or Python are closely related to the random-access stored-program (RASP) computing model and are examples of the so-called von Neumann architecture. Algorithms written in imperative languages are called imperative programs; a program describes subsequent steps that change a computer's memory. Many formal verification techniques are shared between imperative programs and functional programs.
We will use the word "state" to denote a valuation of all variables in an imperative program. A command in the program may change the value of variables, thus causing a transition of states. For example, suppose x holds the value 1 before the command "x := x + 1"; the value of x will be 2 after its execution. A state can be easily specified by a first-order formula, called an assertion, such as "x ≐ 1," "A[0] ≐ 2," etc.
Executing an imperative program has the effect of changing the current state. To
reason about such a program, one first establishes an initial state by specifying the
initial values of variables. One then executes the program, and this transforms the
initial state into a final one through a sequence of state transitions. One then retrieves
the values of variables in the final state to get the desired results. For example, to compute "z := exp(x, y)," the initial state specifies the values of x and y, and the final state specifies the value of z satisfying "z ≐ x^y," by which we judge that the program exp(x, y) is correct.
Hoare logic (also known as Floyd-Hoare logic or Hoare rules) is a formal
system with a set of logical rules for reasoning rigorously about the correctness
of imperative programs. It was proposed in 1969 by the British computer scientist
and logician Tony Hoare and subsequently refined by Hoare and other researchers.
The original ideas were seeded by the work of Robert Floyd, who had published a
similar system for flowcharts.
Hoare logic is an axiomatic approach to formal verification, which consists of (1)
a first-order logic language for making assertions about programs and (2) rules for
establishing state transitions, that is, what assertions hold after a transition of states.
From the assumption on the input, we create a precondition of a program, and from
the requirement on the output, we create a postcondition of the program. Both the
precondition and the postcondition are assertions, i.e., formulas of the first-order
logic. We then use various methods, including theorem provers, to show that the
postcondition is a logical consequence of a formula derived from the precondition
and the program code by the rules of Hoare logic.
In the previous chapter, we mentioned that general recursive functions and Lambda
calculus are the foundations for today’s functional programming languages, such as
Lisp, ML, Clojure, or Haskell. In the imperative paradigm, we specify exactly how
an algorithm should run; in the functional paradigm, we specify what the result of
the algorithm looks like.
Example 9.1.1 Consider the following imperative program, which computes the sum of all numbers in an integer array. After the execution of the program, we expect that "s ≐ sum(n, A)" is true, where sum is defined by the equations shown after the program.

s := 0
for i := 0 to n − 1 do
    s := s + A[i]
od

sum(0, A) ≐ 0
sum(i + 1, A) ≐ sum(i, A) + A[i]

The definition of sum is in fact a functional program, which specifies the expected outcome of the imperative program.
In imperative programs, variables can be passed through parameters into a function, in general in two modes: "pass by value" and "pass by reference." In the "pass by value" mode, the values of the variables are copied to the parameters, and the copies are used inside the function. In the "pass by reference" mode, the variable itself is passed into the function, and any change to the parameter affects the value of the variable outside the function. Therefore, an assignment inside a function may modify the global state. This modification is called a "side effect." Side effects often create unexpected outcomes, as well as hard-to-detect bugs. Because of this unsafe feature, some programming languages, such as Java, prohibit "call by reference"; Java passes everything by value, including references, though people can simulate "call by reference" with Java's container objects such as arrays or collections.
In functional programs, mutable variables are absent. Parameters of a function look like variables but serve only as placeholders for values. We cannot create global variables that influence how our functions work. Functional programming intends to make all functions pure, with the exception of I/O functions, which can be modeled using special constructs such as the monad, a structure that combines program fragments and wraps their return values in a type with additional computation.
Pure functional programs do not carry the burden of maintaining a global state and resemble mathematical functions. Thus, various mathematical tools can be used to reason about their properties. For instance, in Example 7.3.12, we defined the insertion sort algorithm, isort, by a set of equations, which can be regarded as a functional program. The correctness of isort can be reduced to two theorems: (a) sorted(isort(x)) (the output of isort(x) is a sorted list), and (b) permu(x, isort(x)) (isort(x) is a permutation of x).
The Boyer-Moore theorem prover (NQTHM) [2], started in 1971, uses Lisp as
a working logic to define total recursive functions. Over the decades, NQTHM
evolved considerably and achieved significant progress in formal specification and
verification:
• (1976) an expression compiler for a stack machine (Boyer and Moore)
• (1983) invertibility of the RSA encryption algorithm (Boyer and Moore)
• (1985) FM8501 microprocessor (Warren Hunt)
• (1988) an operating system kernel (William Bevier)
• (1989) Piton assembly language on FM9001 (Boyer and Moore)
• (1990) Gauss’ law of quadratic reciprocity (David Russinoff)
• (1992) Byzantine Generals and clock synchronization (Bevier and Young)
• (1992) a compiler for the NQTHM language (A. Flatau)
• (1992) the 32-bit FM9001 microprocessor (Hunt and Brock)
• (1993) bi-phase mark asynchronous communications protocol (Boyer and
Moore)
• (1993) a small compiler (Debora Weber-Wulff)
• (1993) Motorola MC68020 and Berkeley C String Library (Yuan Yu)
An outstanding successor to NQTHM is ACL2 (a shorthand for “A Computa-
tional Logic for Applicative Common Lisp”), developed by Matt Kaufmann and J.S.
Moore. They received the 2005 ACM Software System Award for building ACL2.
Like NQTHM, ACL2 is a logic and programming language in which you can model
computer systems, together with a tool to help you prove properties of those models.
For more information, the interested reader may visit ACL2’s website:
www.cs.utexas.edu/users/moore/acl2/
Recent work using Coq and Isabelle for the verification of operating system kernels and compilers has also been encouraging and promising. Those who doubt the performance of functional languages may use them at least as a logic language in which other designs can be conveniently formalized and verified. For instance, the verification of operating systems such as seL4 is largely about imperative C programs, with functional programs used for specification; verified compilers can then generate efficient and secure code for added safety and correctness.
The central idea of Hoare logic is the Hoare triple, a notation first proposed by Hoare. A postcondition specifies the desired behavior of the program. Given a postcondition, a Hoare triple specifies exactly what precondition the program requires before execution. That is, a Hoare triple describes how the execution of a piece of code changes the state of the computation; we regard the precondition and the postcondition as the specification of the code. A Hoare triple takes the form

{P} C {Q}
where P and Q are assertions, i.e., formulas in a first-order language, and C is a piece of program code. P is the precondition and Q the postcondition: if, assuming P is true before the execution of C, C finishes its execution and Q is true afterward, we say "{P} C {Q}" is true.
For instance, {x ≐ 1} x := x + 1 {x ≐ 2} is true, because the value of x changes from 1 to 2 after the execution of x := x + 1. On the other hand, {x ≐ 1} x := x + 1 {x ≐ 3} is false. Since a triple may be true or false, we call it a statement in the general sense.
Example 9.2.1 Consider the program from Example 9.1.1, which computes the sum of all numbers in an integer array, with its precondition and postcondition.

{ P : A is an array of n integers, and n ≥ 0 }
s := 0
for i := 0 to n − 1 do
    s := s + A[i]
od
{ Q : s ≐ ∑_{i=0}^{n−1} A[i] }
At first glance, P and Q are not the formulas we saw in the previous chapters. An array, however, is not different from a list in logic. In Prolog, we have seen that a list is a compound term. Thus, it is not strange to represent an array by a term in first-order logic, and we may simply treat an array as a term in the logic. Similarly, we can also treat ∑_{i=0}^{n−1} A[i] as another term. In fact, we may define a function sum : N × Array → N recursively as follows:

sum(0, A) ≐ 0
sum(i + 1, A) ≐ sum(i, A) + A[i]

Then, ∑_{i=0}^{n−1} A[i] is equivalent to the term sum(n, A).
Now, suppose we can specify preconditions and postconditions in first-order logic; how do we prove that a postcondition is a theorem derived from the precondition and the program? This is handled by a Hoare rule for each type of command in a small programming language. In this language, for simplicity, we consider only three types of commands:
1. Assignment, i.e., x := E, where x is a variable and E is an expression as understood in a programming language
2. Conditional, i.e., if B then S1 else S2 fi, where B is a Boolean expression and S1, S2 are sequences of commands
3. While loop, i.e., while B do S od, where B is a Boolean expression and S is a sequence of commands
For example, the for loop of Example 9.1.1,

s := 0
for i := 0 to n − 1 do
    s := s + A[i]
od

is equivalent to the while loop

s := 0; i := 0
while (i < n) do
    s := s + A[i]; i := i + 1
od
Hoare rules cover the three basic commands in the small programming language,
plus one rule for the composition of commands and one rule for logical implications.
Assignment Rule
The assignment rule states that, after the assignment, any assertion that was
previously true for the right side of the assignment now holds for the variable.
Formally, let P be an assertion in which the variable x is free, then
{P[x ↦ E]} x := E {P}
Examples of valid triples include {x ≐ 4} y := x + 1 {y ≐ 5} and {x ≤ N − 1} x := x + 1 {x ≤ N}. In the first example, to show that y ≐ 5 is true, the assignment rule produces the precondition x + 1 ≐ 5, which is equivalent to x ≐ 4, from y ≐ 5. In the second example, the assignment rule produces the precondition x + 1 ≤ N, which is equivalent to x ≤ N − 1, from x ≤ N.
Since we have not given a formal definition of "state," we cannot prove rigorously the correctness of the assignment rule. However, we may argue informally. To show that {P[x ↦ E]} x := E {P} is sound, let s be the state before x := E and s′ the state after. So, s′ = s[x → E] (assuming E has no side effect). P[x ↦ E] holds in s iff P holds in s′ = s[x → E], because
1. Every variable, except x, has the same value in s and s′.
2. P[x ↦ E] has every x in P replaced by E.
3. P has every x evaluated to the value of E in s′ = s[x → E].
The major application of the assignment rule is to find the precondition, i.e., P[x ↦ E], of x := E from the desired postcondition P, not the other way around. Be careful not to apply the rule "forward" by following this incorrect way of thinking: {P} x := E {P[x ↦ E]}; this rule leads to examples like {x ≐ 4} x := 5 {5 ≐ 4}. Another incorrect rule that looks tempting at first glance is {P} x := E {P ∧ (x ≐ E)}; it leads to illogical examples like {x ≐ 5} x := x + 1 {x ≐ 5 ∧ x ≐ x + 1}.
While a given postcondition P uniquely determines the precondition P[x ↦ E] by the assignment rule, the converse is not true. That is, different postconditions are available for the same precondition and the same code. For example,
• {0 ≤ y ∧ y ≤ 4} x := y {0 ≤ x ∧ x ≤ 4}
• {0 ≤ y ∧ y ≤ 4} x := y {0 ≤ y ∧ x ≤ 4}
• {0 ≤ y ∧ y ≤ 4} x := y {0 ≤ y ∧ y ≤ 4}
are valid instances of the assignment rule.
The assignment rule proposed by Hoare extends to multiple simultaneous assignments. For example, to swap two elements of an array, some programming languages support the command A[i], A[j] := A[j], A[i]. The assignment rule for this type of command is

{P[x ↦ E1, y ↦ E2]} x, y := E1, E2 {P}

For instance, from the postcondition {x ≐ b ∧ y ≐ a}, the rule produces the precondition {y ≐ b ∧ x ≐ a} for the command x, y := y, x, so a single simultaneous assignment swaps x and y.
Implication Rule
The implication rule, also called the consequence rule, is purely logical as it does
not involve any command of a programming language but concerns the implication
relation of assertions:
P1 → P2    {P2} S {Q1}    Q1 → Q2
{P1} S {Q2}
Conditional Rule

The conditional rule states that a postcondition Q common to the two branches of an if-then-else statement is also a postcondition of the whole if-then-else statement:

{B ∧ P} S1 {Q}    {¬B ∧ P} S2 {Q}
{P} if B then S1 else S2 fi {Q}

In the "then" part and the "else" part, B and ¬B are added to the precondition P, respectively.
Example 9.2.3 To show that the following triple

{true} if x ≤ 10 then x := x else x := 10 fi {Q : x ≤ 10}

is true by the conditional rule, we need to show the following two statements:

(1) {x ≤ 10 ∧ true} x := x {Q : x ≤ 10};
(2) {¬(x ≤ 10) ∧ true} x := 10 {Q : x ≤ 10}.

To prove (1) and (2), we obtain the following by the assignment rule:

(1′) {x ≤ 10} x := x {Q : x ≤ 10},
(2′) {10 ≤ 10} x := 10 {Q : x ≤ 10}.

It is trivial to show that (1) and (2) are true by the implication rule based on (1′) and (2′).
If the "else" part is missing, the conditional rule becomes

{B ∧ P} S {Q}    (¬B ∧ P) → Q
{P} if B then S fi {Q}

For example, to show that

{x ≐ a ∧ y ≐ b} if x > y then y := x fi {y ≐ max(a, b)}

is true by the conditional rule, we need to show the following two statements:

(1) {P1 : x ≐ a ∧ y ≐ b ∧ x > y} y := x {Q : y ≐ max(a, b)};
(2) (x ≐ a ∧ y ≐ b ∧ ¬(x > y)) → y ≐ max(a, b).
Iteration Rule
{I ∧ B} S {I}
{I} while B do S od {¬B ∧ I}
Here, I is the so-called loop invariant, which is true at the beginning of the loop
and is preserved by the loop body S, i.e., after the loop is finished, I still holds. The
loop will terminate if B becomes false. The assertion ¬B, which is called the exit
condition of the loop, is added to I as the postcondition of the loop.
Example 9.2.6 Assuming x is an integer, to show that the following statement is true

{I : x ≤ 5} while x < 5 do x := x + 1 od {Q : x ≐ 5}

we apply the iteration rule, for which we need to show

{x ≤ 5 ∧ x < 5} x := x + 1 {I : x ≤ 5}

This holds by the assignment rule, since x < 5 implies x + 1 ≤ 5. The iteration rule then gives the postcondition ¬(x < 5) ∧ x ≤ 5, which, for integer x, implies Q, i.e., x ≐ 5, by the implication rule.
Sequence Rule

The assignment, conditional, and iteration rules cover the three basic commands of the small programming language. The sequence rule applies to sequentially executed programs S and T, where S executes prior to T, written S; T:

{P} S {Q}    {Q} T {R}
{P} S; T {R}

where Q is called the middle assertion. For example, consider the following two instances of the assignment axiom: {x ≐ 4} y := x + 1 {y ≐ 5} and {y ≐ 5} z := y {z ≐ 5}. By the sequence rule, one concludes {x ≐ 4} y := x + 1; z := y {z ≐ 5}.
Now, we have all the rules needed to show the correctness of any program in the
small programming language.
Example 9.2.7 Consider the program in Example 9.1.1:

{n ≥ 0}
s := 0; i := 0
while (i < n) do
    s := s + A[i]; i := i + 1
od
{s ≐ sum(n, A)}

where, as before, sum(0, A) ≐ 0 and sum(i + 1, A) ≐ sum(i, A) + A[i].
Since the code contains a while loop, a crucial step in applying Hoare logic is to find a loop invariant. Examining the body of the loop, we see that the values of s and i change inside the body. A good loop invariant must reflect these changes. Since the variable s accumulates the sum of the first i elements of A, a first guess for a suitable loop invariant is I : s ≐ sum(i, A).
By the sequence rule, we need to prove the following statements:

(1) {n ≥ 0} s := 0; i := 0 {I}
(2) {I} while i < n do s := s + A[i]; i := i + 1 od {s ≐ sum(n, A)}

The proof of (1) is easy: we just need to prove n ≥ 0 → I[i ↦ 0, s ↦ 0], which comes from {I[i ↦ 0, s ↦ 0]} s := 0 {I[i ↦ 0]} and {I[i ↦ 0]} i := 0 {I}. Since I[i ↦ 0, s ↦ 0] is 0 ≐ sum(0, A), which is true by the definition of sum, (1) is true.

The proof of (2) is reduced to the following two statements by the iteration rule and the implication rule:

(2.1) {I ∧ i < n} s := s + A[i]; i := i + 1 {I}.
(2.2) I ∧ ¬(i < n) → (s ≐ sum(n, A)).

The way to prove (2.1) is the same as that of (1): we need to prove the formula

(s ≐ sum(i, A) ∧ i < n) → (s + A[i] ≐ sum(i + 1, A)),

which is true by the definition of sum. The proof of (2.2), however, fails: s ≐ sum(i, A) ∧ i ≥ n does not imply s ≐ sum(n, A), because i may be greater than n. From the failure, we see that we need to strengthen the loop invariant by adding i ≤ n to it: the new loop invariant I′ should be s ≐ sum(i, A) ∧ i ≤ n. Using I′, the proofs of (1) and (2.1) go through as before, and the proof of (2.2) becomes easy, as (i ≤ n ∧ i ≥ n) → i ≐ n is true.
The above example shows that the loop invariant should contain all the informa-
tion about how the loop works:
• It reflects what has been done so far together with what remains to be done and
should contain all the changing variables.
• It is true at both the beginning and the end of each iteration.
• Together with the negation of the loop condition, it gives the desired result when
the loop terminates.
• If a proof cannot go through, we may need to strengthen the loop invariant by
adding a condition needed in the proof.
Example 9.2.8 Consider the following program, which finds the maximal element in an array:

{n > 0}
m := A[0]; i := 1
while i < n do
    if (m < A[i]) then m := A[i] fi
    i := i + 1
od
{m ≐ arrMax(n, A)}

where arrMax and max are defined by

arrMax(1, A) ≐ A[0]
arrMax(i + 1, A) ≐ max(arrMax(i, A), A[i])
max(x, y) ≐ x, if x > y
max(x, y) ≐ y, if x ≤ y

Examining the body of the loop, we see that the values of m and i change inside the body. A good loop invariant must reflect these changes. Since the variable m contains the maximal value of the first i elements of A, a reasonable guess for a loop invariant is

I : m ≐ arrMax(i, A) ∧ i ≤ n.
By the sequence rule, we need to prove:

(1) {n > 0} m := A[0]; i := 1 {I}
(2) {I} while i < n do if (m < A[i]) then m := A[i] fi; i := i + 1 od {m ≐ arrMax(n, A)}

The proof of (1) is similar to that in Example 9.2.7. The proof of (2) is reduced to the following two statements by the iteration rule and the implication rule:

(2.1) {I ∧ i < n} if (m < A[i]) then m := A[i] fi; i := i + 1 {I}
(2.2) I ∧ ¬(i < n) → (m ≐ arrMax(n, A))
The proof of (2.1) is focused on the proof of the following statement:

(2.1.1) {I ∧ i < n} if (m < A[i]) then m := A[i] fi {I[i ↦ i + 1]}

since {I[i ↦ i + 1]} i := i + 1 {I} holds by the assignment rule. Applying the conditional rule to (2.1.1), we need to prove the following two statements:

(2.1.1.1) {I ∧ i < n ∧ m < A[i]} m := A[i] {I[i ↦ i + 1]}
(2.1.1.2) (I ∧ i < n ∧ ¬(m < A[i])) → I[i ↦ i + 1]

Applying the assignment and implication rules to (2.1.1.1), we need to prove

(2.1.1.1.1) (I ∧ i < n ∧ m < A[i]) → I[i ↦ i + 1, m ↦ A[i]].

Opening I, the conclusion becomes A[i] ≐ arrMax(i + 1, A) ∧ i + 1 ≤ n, which holds because m ≐ arrMax(i, A) and m < A[i] imply A[i] ≐ max(arrMax(i, A), A[i]) ≐ arrMax(i + 1, A), and i < n implies i + 1 ≤ n. Hence, (2.1.1.1.1) is true. Using the same approach, (2.1.1.2) can be proved to be true. Hence, (2.1.1) is true.
Opening I in (2.2), we have

(m ≐ arrMax(i, A) ∧ i ≤ n ∧ ¬(i < n)) → (m ≐ arrMax(n, A)).

Since i ≤ n and ¬(i < n) imply i ≐ n, (m ≐ arrMax(i, A)) implies (m ≐ arrMax(n, A)). Thus, the proofs of (1) and (2) are complete, and I is indeed a good loop invariant for showing m ≐ arrMax(n, A).
Example 9.2.9 The following program computes the integer quotient q of x divided by y.

{x, y > 0}
r := x; q := 0
while y ≤ r do
    r := r − y
    q := q + 1
od
{q ≐ x/y}

Since both r and q are changed inside the loop, let the loop invariant I be

I : y > 0 ∧ r ≥ 0 ∧ x ≐ r + y · q

By the sequence rule, we need to prove:

(1) {x, y > 0} r := x; q := 0 {I}
(2) {I} while y ≤ r do r := r − y; q := q + 1 od {q ≐ x/y}

The proof of (2) is reduced to the following two statements by the iteration rule and the implication rule:

(2.1) {I ∧ y ≤ r} r := r − y; q := q + 1 {I}
(2.2) (I ∧ ¬(y ≤ r)) → (q ≐ x/y)

The way to prove (2.1) is the same as to prove (1): we need to show

(I ∧ y ≤ r) → I[r ↦ r − y, q ↦ q + 1]
Hoare rules presented in the previous section concern the partial correctness of programs: if the program stops, the postcondition will hold. If a program is both partially correct and terminating, we say the program is totally correct. The relationship between partial and total correctness can be informally expressed by the equation:

Total correctness = Partial correctness + Termination

After seeing only a few examples, the following two things are painfully clear:
• Correctness proofs in program verification are typically long and boring, even if the program being verified is quite simple.
• There are lots of fiddly little details to get right, many of which are trivial, e.g., proving (r ≐ x ∧ q ≐ 0) → (x ≐ r + y · q).
Many attempts have been made (and are still being made) to automate correctness proofs by designing systems that do the boring and tricky parts of generating formal proofs in Hoare logic. Unfortunately, logicians have shown that it is impossible in principle to design a decision procedure that decides automatically the truth or falsehood of an arbitrary mathematical statement. However, this does not mean that one cannot have procedures that prove many useful theorems. The nonexistence of a general decision procedure merely shows that one cannot hope to prove everything automatically. In practice, it is quite possible to build a system that will mechanize many of the routine aspects of verification. This section describes one common approach to doing this.
In the previous section, it was shown how to prove {P} S {Q} by reducing this goal to several subgoals and then putting them together using the Hoare rules to get the desired property of S. For example, to prove {P} S; T {Q}, we first prove {R} T {Q} and {P} S {R} for a middle assertion R and then deduce {P} S; T {Q} by the sequence rule. This process is called goal reduction, or backward reasoning, because one works backward: starting from the goal of showing {P} S {Q}, one generates subgoals, sub-subgoals, etc., until the problem is solved. Prolog uses the same process for computation.
Example 9.3.1 Suppose one wants to show
{x ≐ x₀ ∧ y ≐ y₀} r := x; x := y; y := r {y ≐ x₀ ∧ x ≐ y₀},

then by the assignment and sequence rules, the above statement is reduced to the subgoal:

{x ≐ x₀ ∧ y ≐ y₀} r := x; x := y {r ≐ x₀ ∧ x ≐ y₀}

because {r ≐ x₀ ∧ x ≐ y₀} y := r {y ≐ x₀ ∧ x ≐ y₀} is true by the assignment rule. By a similar argument, this subgoal can be reduced to

{x ≐ x₀ ∧ y ≐ y₀} r := x {r ≐ x₀ ∧ y ≐ y₀}

which is true by the assignment rule. The middle assertions for the sequence rule are generated automatically.
We wish that the process illustrated by the above example could be extended to arbitrary programs so that all middle assertions can be generated automatically, releasing programmers from the burden of providing them. A programmer then needs to provide only the precondition and postcondition of the program, derived from the specification of the problem, and all the needed assertions are generated automatically. Coupled with a powerful theorem prover, all the formulas derived by the Hoare rules can be proved automatically, and we arrive at the automation of software verification.
1. The programmer annotates the program with assertions: at least the pre- and postconditions and loop invariants; other assertions can be generated automatically.
2. A set of formulas called verification conditions is then generated from the annotated specification. This process is purely mechanical and easily done by a software tool. We describe how this is done in this section.
3. The verification conditions are proved by a theorem prover. Automating this remains a big challenge.
Definition 9.3.2 Given a program S with postcondition Q, let vc(S, Q) denote a formula P such that {P} S {Q} is true. vc(S, Q) is called a verification condition of S with respect to Q.
If vc(S, Q) can be generated automatically from S and Q, then the process of
generating all assertions becomes automated. By definition, vc(S, Q) is not unique,
and this property gives flexibility for the automated generation of vc(S, Q). Besides
being the basis for automated verification systems, verification conditions are a
useful way of working with Hoare logic by hand.
We will define a recursive procedure VC(S, Q) on inputs S and Q to generate vc(S, Q), coinciding with the goal-reduction process for Hoare triples. The definition of VC(S, Q) follows closely the Hoare rules for the three constructs and the sequence rule of our small programming language. There are only five cases to consider in the definition of VC(S, Q) to generate vc(S, Q).

In fact, VC(S, Q) will generate the weakest precondition in most cases, with the exception of while loops, but we will neither make this more precise nor go into details here. An in-depth study of weakest preconditions can be found in [3].
For while loops, we do not have an algorithm which can generate vc(S, Q) for every loop; only heuristic methods are available. To help the verification tool generate vc(S, Q), the programmer needs to suggest a loop invariant. That is, instead of the code while B do S od, we write while B do {I} S od, where I is the suggested loop invariant. The pre- and postconditions can also be inserted into the program code. Program code with such assertions is called an "annotated program"; the assertions are provided by the programmer to assist the verification.

Another special feature of while loops is that, in addition to returning a verification condition for the loop, VC creates another formula whose validity is needed to ensure the validity of the Hoare triple. This formula is pushed onto a stack denoted by s during the execution of VC. The stack s is a global variable and is initially empty; in the end, it holds as many formulas as there are while loops in the program.
First, we need to show that VC terminates. This is an easy task because
the first argument of VC, which is a piece of code, becomes strictly smaller with
each recursive call until it comes down to an assignment or a skip command;
there cannot exist an infinite chain of recursive calls. So, the algorithm VC must
terminate.
The proof of the theorem is by induction on the structure of C. Such
inductive arguments have two parts. First, it is shown that the result holds for
the assignment and skip commands. Second, it is shown that when C is not an
assignment or skip command, if the result holds for the constituent commands of C (this
is called the induction hypothesis), then it holds also for C. The first of these parts
is called the base cases of the induction, and the second part is called the inductive
cases. From the base and inductive cases, it follows that the result holds for all
commands. For instance, to show that VC returns a formula: all the base cases
return a formula, and, assuming all the recursive calls return a formula, VC
returns a formula in all the inductive cases.
The case of the skip command is trivial; the other cases will be discussed
separately.
Assignments
The verification condition for x := E and Q is Q[x ↦ E], the result of replacing
every occurrence of x in Q by E, which is the weakest precondition for x := E and Q.
Thus, we have VC(x := E, Q) = Q[x ↦ E], and
it is easy to verify that VC(x := E, Q) is a vc(x := E, Q).
Sequences
For the code "S; T" and postcondition Q, VC(S; T, Q) returns VC(S, VC(T, Q)).
If Q′ = VC(T, Q) is a vc(T, Q) and VC(S, Q′) is a vc(S, Q′),
then VC(S, VC(T, Q)) is a vc(S; T, Q).
Example 9.3.5 A verification condition for the code and the postcondition in
Example 9.3.1 is easily obtained as it involves only two cases, assignment and
sequence:

VC({r := x; x := y; y := r}, y ≐ x0 ∧ x ≐ y0)
= VC({r := x; x := y}, VC(y := r, y ≐ x0 ∧ x ≐ y0))
= VC({r := x; x := y}, r ≐ x0 ∧ x ≐ y0)
= VC(r := x, VC(x := y, r ≐ x0 ∧ x ≐ y0))
= VC(r := x, r ≐ x0 ∧ y ≐ y0)
= (x ≐ x0 ∧ y ≐ y0)

which is the precondition for the swapping of x and y. The three recursive
calls of VC generate vc(y := r, y ≐ x0 ∧ x ≐ y0), vc(x := y, r ≐ x0 ∧ x ≐ y0),
and vc(r := x, r ≐ x0 ∧ y ≐ y0), respectively.
Conditionals
For the code if B then S else T fi and postcondition Q, VC returns
(B → VC(S, Q)) ∧ (¬B → VC(T, Q)). If VC(S, Q) and VC(T, Q) are vc(S, Q)
and vc(T, Q), respectively, then the returned formula is a vc of the conditional
with respect to Q, because the Hoare triple
{(B → VC(S, Q)) ∧ (¬B → VC(T, Q))} if B then S else T fi {Q}
is true.
Iterations
All verification conditions can be generated automatically except for the case
of the iteration command. In this case, the validity of I → ((B → VC(S, I)) ∧
(¬B → Q)) ensures that I is a verification condition. This completes the proof
of Theorem 9.3.4.
Algorithm VC(S, Q) shows how the assertions needed for Hoare rules are derived
automatically from the postcondition and loop invariants in a goal-reduction or
backward-reasoning process. Backward reasoning can be useful for nested loops,
where the inner loop's postcondition can be derived from the outer loop's invariant.
Example 9.3.7 The precondition for the following program is P : n ≥ 0 ∧ x ≐
n ∧ y ≐ 1 and the loop invariant is I : x! ∗ y ≐ n!, where x! is the factorial function:
0! = 1 and (x + 1)! = x! ∗ (x + 1).

VC(while (x ≠ 0) do {I : x!y ≐ n!} y := x ∗ y; x := x − 1 od, y ≐ n!) = (x!y ≐ n!)
which can be proved to be true by the Hoare rules. Similarly, the program
while (x < 5) do x := x + 1 od with postcondition x ≐ 5 can be verified with
the loop invariant x ≤ 5. If we modify VC such that VC(while B do {I} S od, Q)
returns a single formula which is equivalent to

I ∧ (B → VC(S, I)) ∧ (¬B → Q)

and let I be x ≐ 0 ∨ x ≐ 1 for this program, then VC(while (x < 5) do {I} x :=
x + 1 od, x ≐ 5) would return

I ∧ (x < 5 → VC(x := x + 1, I)) ∧ (x ≥ 5 → x ≐ 5)
For the iteration, instead of using a stack to store formulas, we simply write
the formulas on the screen. Below is the Prolog code for treating assignments,
conditionals, iterations, and skip:
% vc1(S1, Post, VC): VC = vc(S1, Post), S1 is a single command
% assignment
vc1(let(X, E), Post, VC) :- substitute(X, E, Post, VC).
% VC is the result of Post by replacing X in Post by E.
% conditional
vc1(ite(B, S, T), Post, VC) :-
vc(S, Post, VC1), vc(T, Post, VC2),
VC = and((B -> VC1), (neg(B) -> VC2)).
% iteration
% use vc1(while(B, Inv, S), Post, Inv) as an alternative.
vc1(while(B, Inv, S), Post, or(and(neg(B), Post), Inv)) :-
vc(S, Inv, VC), % VC is vc(S, Inv)
write((and(Inv, B) -> VC)), nl, % part 1 of a conjunction
write((and(Inv, neg(B)) -> Post)), nl. % part 2 of a conjunction
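The code above relies on two helpers that are not shown: vc/3 for command
sequences and substitute/4 for the assignment case. Below is a minimal sketch of
how they might look (our own illustration, not the book's code, assuming
SWI-Prolog and the term encoding above, with a program represented as a list of
commands):

% vc(S, Post, VC): VC = vc(S, Post), where S is a list of commands.
% It folds vc1/3 from right to left, matching VC(S;T, Q) = VC(S, VC(T, Q)).
vc([], Post, Post).
vc([C|Cs], Post, VC) :-
    vc(Cs, Post, VC1),     % VC1 = vc(Cs, Post)
    vc1(C, VC1, VC).       % VC  = vc1(C, VC1)

% substitute(X, E, F, G): G is F with every occurrence of X replaced by E.
substitute(X, E, X, E) :- !.
substitute(_, _, F, F) :- atomic(F), !.
substitute(X, E, F, G) :-
    F =.. [Op|Args],
    substitute_list(X, E, Args, Args1),
    G =.. [Op|Args1].

substitute_list(_, _, [], []).
substitute_list(X, E, [A|As], [B|Bs]) :-
    substitute(X, E, A, B),
    substitute_list(X, E, As, Bs).

With these definitions, the computation of Example 9.3.5 becomes a one-line query:

?- vc([let(r,x), let(x,y), let(y,r)], and(y=x0, x=y0), VC).
VC = and(x=x0, y=y0).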
The query “?- q2” produces three formulas, the first two from the stack and the
last from “write((P -> VC))”:
and(and(0=<k, and(k=<n, result=sum(0,k,a))), k<n) ->
and(0=<k+1, and(k+1=<n, result+a(k)=sum(0,k+1,a)))
and(and(0=<k, and(k=<n, result=sum(0,k,a))), neg(k<n)) ->
result=sum(0,n,a)
and(n>=0, and(k=0, result=0)) -> and(0=<k,and(k=<n, result=sum(0,k,a)))
The concept of loop invariant is one of the foundational ideas of software construction.
We have seen several examples of loop invariants. We hope that these examples
establish the claim that the invariant is the key to every loop: to devise a new loop
that is correct requires summoning the proper invariant, and to understand an
existing loop requires understanding its invariant.
So far, finding an invariant for a loop has traditionally been the responsibility of a
human: either the person performing the verification or the programmer writing the
loop in the first place. A better solution, when applicable, is program synthesis,
a constructive approach to programming advocated by Dijkstra and others [3].
Automated generation of loop invariants is still an ongoing research topic.
For the triple {I } while B do S od {Q}, I is a correct invariant for the loop if it
satisfies the following conditions:
1. I is true in the state preceding the loop execution.
2. Every execution of the loop body S, started in any state in which both I and B
are true, will yield a state in which I holds again.
If these properties hold, then any terminating execution of the loop will yield a state
in which both I and ¬B hold; the latter is called the exit condition.
We may look at the notion of loop invariant from the constructive perspective of a
programmer directing his or her program to reach a state satisfying a certain desired
property, the postcondition. In this view, program construction is a form of problem-
solving, and the various control structures are problem-solving techniques; a loop
solves a problem through successive approximation. This idea has been strongly
advocated in Furia et al.’s survey article [4] on loop invariants. In the following, our
presentation is based on their survey.
• Generalize the postcondition (the characterization of possible solutions) into a
broader condition: the invariant.
• As a result, the postcondition can be defined as the conjunction of the invariant
and another condition, i.e., the exit condition.
• Find a way to reach the invariant from the previous state of the computation: the
initialization.
• Find a way, given a state that satisfies the invariant, to get to another state,
still satisfying the invariant but closer, in some appropriate sense, to the exit
condition: the body.
The importance of this presentation of the loop process is that it highlights the
nature of the invariant: it is a generalized form of the desired postcondition, which
in a special case (represented by the exit condition) will give us that postcondition.
This view of the invariant, as a particular way of generalizing the desired goal of the
loop computation, explains why the loop invariant is such an important property of
loops; one can argue that understanding a loop means understanding its invariant.
Many programmers still find it hard to come up with invariants. Even though the
basic idea is often clear, coming up with a good invariant is an arduous task. We
need methods to generate loop invariants automatically. To illustrate the idea of
generalizing the postcondition to obtain an invariant, we will use the millennia-old
Euclid's algorithm for computing the greatest common divisor (gcd) of
two positive integers.
Example 9.4.1 The postcondition of the gcd algorithm is

y ≐ gcd(a, b),

where the precondition is a > 0 ∧ b > 0. The first step of the generalization is to
replace this condition by

x ≐ 0 ∧ gcd(x, y) ≐ gcd(a, b)
with a new variable x, taking advantage of the mathematical property that, for
every y, gcd(0, y) ≐ y. The second conjunct, i.e., gcd(x, y) ≐ gcd(a, b), a
generalization of the postcondition, will serve as the invariant; the first conjunct
will serve as the exit condition. To obtain the loop body, we take advantage of
another mathematical property: for every x and y, gcd(x, y) ≐ gcd(y, x) and
gcd(x, y) ≐ gcd(x − y, y) if x > y, yielding the well-known algorithm:

x := a; y := b
{I : x ≥ 0 ∧ y > 0 ∧ gcd(x, y) ≐ gcd(a, b)}
while x ≠ 0 do
    if x < y then z := x; x := y; y := z fi
    x := x − y
od
The proof of correctness follows directly from the mathematical properties of gcd.
This form of Euclid's algorithm uses subtraction; we may replace the subtraction by
the remainder function, i.e., replace x − y by x%y (% is the remainder operator),
resulting in a more efficient algorithm. In this case, we need the property that
gcd(x, y) ≐ gcd(x%y, y).
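As a quick sanity check, the subtraction-based loop can be transcribed directly into
Prolog (a sketch of our own, not the book's code; euclid/3 and loop/3 are made-up
names), with one clause per case of the loop:

% euclid(A, B, G): G = gcd(A, B) for A > 0 and B > 0, following the loop above.
euclid(A, B, G) :- A > 0, B > 0, loop(A, B, G).

loop(0, Y, Y).                 % exit condition x = 0: y holds gcd(a, b)
loop(X, Y, G) :-               % one iteration of the loop body
    X > 0,
    ( X < Y -> X1 = Y, Y1 = X  % if x < y then swap x and y fi
    ; X1 = X, Y1 = Y ),
    X2 is X1 - Y1,             % x := x - y
    loop(X2, Y1, G).

% ?- euclid(12, 18, G).  gives G = 6; the invariant gcd(x, y) = gcd(a, b)
% holds before every recursive call.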
Using the classification criteria proposed in Furia et al.'s survey article, the
conjuncts of the invariant for the gcd algorithm, i.e., I : x ≥ 0 ∧ y > 0 ∧ gcd(x, y) ≐ gcd(a, b),
fall into two categories:
• The last conjunct of the invariant is an essential invariant, representing a
weakening of the postcondition. The first two conjuncts are bounding invariants,
indicating that the state remains within certain general boundaries and ensuring
that the “essential” part is well defined.
• The strategy that leads to the essential invariant is uncoupling, which replaces a
property of one variable (y in this case), used in the postcondition, by a property
of two variables (y and x), used in the invariant.
As illustrated by the above example, the essential invariant is a mutation (often,
a weakening) of the loop’s postcondition. The following mutation techniques are
particularly common.
• Constant Relaxation
Constant relaxation replaces a constant n (more generally, an expression which
does not change during the execution of the algorithm) by a variable i and uses
i ≐ n as part or all of the exit condition. In Example 9.2.7, constant relaxation
is used to obtain the loop invariant, i.e., s ≐ sum(i, A), from the postcondition
s ≐ sum(n, A). This condition is trivial to establish initially for i ≐ 0 and
s ≐ 0 and easy to extend to an incremented i, yielding the postcondition when i
reaches n.
• Uncoupling
Uncoupling replaces a variable by two, using their equality as part or all of
the exit condition. Uncoupling is used in the gcd algorithm where we replace
y ≐ gcd(a, b) by x ≐ 0 ∧ gcd(x, y) ≐ gcd(a, b). Uncoupling is also used in
Example 9.3.7 where we replace y ≐ n! by x ≐ 0 ∧ x!y ≐ n!.
• Term dropping
Term dropping removes a subformula (typically a conjunct), which gives a
straightforward weakening. Term dropping often applies after uncoupling. x ≐ 0
is dropped from x ≐ 0 ∧ gcd(x, y) ≐ gcd(a, b) in the gcd algorithm and also
from x ≐ 0 ∧ x!y ≐ n! in the factorial algorithm, to obtain the essential invariant.
The dropped term often goes into the exit condition of the loop.
• Aging
Aging replaces a variable (more generally, an expression) by an expression
that represents the value the variable had at previous iterations of the loop. Aging
typically accommodates "off-by-one" discrepancies between when a variable is
evaluated in the invariant and when it is updated in the loop body. Typically,
aging is used for generating bounding invariants. For example, if i > 0 appears
in the loop condition and i := i − 1 appears in the loop body, aging will generate
i + 1 > 0 (equivalently, i ≥ 0) as a bounding invariant. On the other hand, if
j < n appears in the loop condition and j := j + 1 appears in the body, then
aging will generate j − 1 < n (equivalently, j ≤ n) as a bounding invariant.
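Before attempting a formal proof, a candidate invariant produced by these mutation
techniques can be tested cheaply on concrete runs of the loop. The sketch below is
our own illustration in Prolog (fact/2 and check_loop/3 are made-up names); it
replays the factorial loop of Example 9.3.7 and fails as soon as the essential
invariant x! ∗ y ≐ n! is violated:

fact(0, 1).
fact(N, F) :- N > 0, N1 is N - 1, fact(N1, F1), F is N * F1.

% check_loop(X, Y, N): run the loop of Example 9.3.7 from state (X, Y),
% checking the essential invariant x! * y = n! at every iteration.
check_loop(X, Y, N) :-
    fact(X, FX), fact(N, FN),
    FX * Y =:= FN,                 % the essential invariant holds here
    ( X =:= 0 -> true              % exit condition reached
    ; Y1 is X * Y, X1 is X - 1,    % loop body: y := x*y; x := x-1
      check_loop(X1, Y1, N)
    ).

% ?- check_loop(4, 1, 4).  succeeds: the invariant holds at every state.

Such testing cannot replace a proof, but it quickly rejects wrong candidates.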
Program synthesis is the task of constructing a program that provably satisfies the
postcondition. In contrast to program verification, the program is to be constructed
rather than given. Both fields make use of formal proof techniques, and both
comprise approaches of different degrees of automation. In contrast to automatic
programming techniques, postconditions in program synthesis are usually non-
algorithmic statements in an appropriate logical language.
If we are capable of generating loop invariants from the postcondition automati-
cally, then it is relatively easy to construct the code that satisfies the postcondition.
Since different algorithms for solving the same problem require different invariants,
generating different invariants leads us to different algorithms.
Suppose we want to sort an array A of n elements. Let A0 denote the original array
A. The postcondition of a sorting algorithm is permu(A, A0) ∧ sorted(A, 0, n),
where permu(A, A0) is true iff A is a permutation of A0 and sorted(A, 0, n) is true
iff the elements of A from 0 to n − 1 are sorted. Suppose the only way to change A
is by swap(A, i, j), which swaps the elements A[i] and A[j]; then permu(A, A0)
will remain true throughout the algorithm. That is, the triple
{permu(A, A0)} swap(A, i, j) {permu(A, A0)} is true.
Example 9.4.2 If we wish to generate an invariant for the insertion sort algorithm,
the first technique to apply is constant relaxation: replace sorted(A, 0, n) by k ≐
n ∧ sorted(A, 0, k). Taking k ≐ n as the exit condition, we obtain the invariant
k > 0 ∧ sorted(A, 0, k), where k > 0 comes from the precondition n > 0, and
the outer loop looks as follows:

{n > 0}
1 k := 1
{I : k > 0 ∧ sorted(A, 0, k)}
2 while k ≠ n do
...
8 k := k + 1
9 od
It is easy to see that the exit condition of the outer loop, i.e., k ≐ n, will give us
the postcondition sorted(A, 0, n).
In order to generate a loop invariant for the inner loop, we apply uncoupling
and aging to sorted(A, 0, k) by introducing a new variable j, which breaks the first
k + 1 elements of A into two sorted lists: A[0], . . . , A[j − 1] and A[j], . . . , A[k].
We obtain invariant I for the inner loop:

I : 0 ≤ j ∧ j ≤ k ∧ sorted(A, 0, j) ∧ sorted(A, j, k + 1)
which has four conjuncts. The first two conjuncts are the bounding invariants. If we
choose ¬(j > 0 ∧ A[j − 1] > A[j ]) as the exit condition of the inner loop, together
with the last two conjuncts of I , we can show that sorted(A, 0, k + 1) is true when
the exit condition is true:
{n > 0}
1 k := 1
{I : k > 0 ∧ sorted(A, 0, k)}
2 while k ≠ n do
3 j := k
{I : 0 ≤ j ∧ j ≤ k ∧ sorted(A, 0, j ) ∧ sorted(A, j, k + 1)}
4 while (j > 0 ∧ A[j − 1] > A[j ]) do
5 swap(A, j, j − 1)
6 j := j − 1
7 od
8 k := k + 1
9 od
It is easy to see that sorted(A, 0, j) and sorted(A, j, k + 1) are true when j ≐ k,
because sorted(A, 0, k) comes from I and sorted(A, k, k + 1) comes from the
definition of sorted. To show that 0 ≤ j ∧ j ≤ k is true when j ≐ k, we need to
know k ≥ 0. That is why we have already added k > 0 as a bounding invariant
into I of the outer loop. If this were not done, we would have to do it now. So, I
can be shown to be true before entering the inner loop.
To see that I is an invariant for the inner loop, note that inside the second loop,
0 ≤ j − 1 is true because of the loop condition j > 0; sorted(A, 0, j − 1) is true because of
Informal assertions are intuitive, but being able to express them formally is even
better. Reaching this goal and, more generally, continuing to make assertions
where the first part says m is an element of A and the second part says no elements
of A are greater than m. Similar assertions can be written for the loop invariant.
Even for such a simple example, the limitations of the atomic assertion approach
are clear: the assertions become long and complicated, and they contain quantifiers,
which are hard for theorem provers. In general, this approach requires going back to
basic logical constructs every time; it does not scale. Past experience has shown
that the atomic assertion approach is futile.
The domain theory approach means that, before any attempt to reason about an
algorithm, we should develop an appropriate model of the underlying domain by
defining appropriate concepts, such as the greatest common divisor for algorithms on
integers (Example 9.4.1), and establishing the relevant theorems (e.g., x > y →
gcd(x, y) ≐ gcd(x − y, y)). These concepts and theorems need only be developed
once for every application domain of interest, not anew for every program over that
domain. The programs can then use the corresponding functions in their assertions.
The domain theory approach takes advantage of the standard abstraction mecha-
nism of mathematics. Its only practical disadvantage, for assertions embedded in a
programming language, is that the functions over a domain (such as gcd) must come
from some library and, if written in the programming language, must satisfy strict
limitations.
The assertions without quantifiers are more amenable to automated reasoning.
However, the domain theory approach has its own limitations: it needs some
functions (e.g., arrMax) which are not required by the program. Some properties of
these functions may be needed in the proof of the assertions, and the proofs of these
properties may need other reasoning tools (e.g., inductive reasoning). For instance,
in Example 9.2.8, we may need the property that arrMax(n, A) is an element of A
when n > 0.
Progress in the domain theory approach, both theoretical and on the engineering
side, remains essential. Much remains to be done to bring the tools to a level where
they can be integrated in a standard development environment and routinely suggest
assertions, including invariants, conveniently and correctly, whenever a programmer
writes a postcondition.
Formal verification has long been an active field of research in both academia
and industry. Many formal verification systems have been developed with notable
success. Here, we select a few of them for conventional programming languages:
• Verified Software Toolchain: This tool is developed at Princeton University
(website: vst.cs.princeton.edu) with worldwide participants. The tool provides
a verifiable C language and assures, with machine-checked proofs, that
the claimed assertions really hold in the C program. The tool includes static
analyzers to check assertions, optimizing compilers to translate C programs to
machine language, and verified C libraries.
• KeY: The KeY tool was developed originally at Germany's Karlsruhe
Institute of Technology and jointly by several European institutions (website:
www.key-project.org). The tool includes a collection of software tools that
support verified Java programming in the Eclipse IDE (integrated development
environment). The tool accepts specifications written in the Java Modeling
Language (JML) attached to Java programs. These specifications are transformed
into theorems of dynamic logic and then proved either interactively (i.e., with a
user) or automatically. Failed proof attempts can be used for more efficient
debugging or verification-based testing. There have been several extensions of
KeY to the verification of C programs or hybrid systems.
Exercises
3. Given the following two annotated programs, provide the loop invariant and
prove its total correctness by Hoare logic. For every simplification step used in
the proof, please state clearly which arithmetic/algebraic/logic law is used for
simplification:
{A : a ≥ 0}
int x := 0; int y := 0
while x < a do
    x := x + 1
    y := y + x
od
{B : y ≐ a(a + 1)/2}

{A : a ≥ 0}
int x := a; int y := 0
while x ≠ 0 do
    y := y + x
    x := x − 1
od
{B : y ≐ a(a + 1)/2}
4. Repeat the previous exercise for the following two programs for computing the
power function:
{A : int b ≥ 0}
int x := a; int y := 0; int z := 1
while y ≠ b do
    y := y + 1
    z := x ∗ z
od
{B : z ≐ a^b}

{A : int b ≥ 0}
int x := a; int y := b; int z := 1
while y ≠ 0 do
    if (y%2 = 1) // i.e., y is odd
        y := y − 1; z := x ∗ z
    else x := x ∗ x; y := y/2
od
{B : z ≐ a^b}
5. For the four programs in the previous two exercises, please run the algorithm
V C with proper loop invariants and show the returned verification conditions
from the postconditions.
6. Given an array A of n integers, the problem of Max-Sum is to find the maximal
sum of elements over the consecutive non-empty subarrays of A. For example, if
A = [2, −5, 8, −4, 6, −3, −1, 2], then the solution is 10, which is the sum of
the subarray [8, −4, 6]. Please provide a formal postcondition for the following
program and prove formally that the program solves the Max-Sum problem.
7. Repeat the previous exercises using the Prolog implementation of algorithm V C,
and show the Prolog code and returned results:
{A[1..n] : array of integers, n ≥ 1}
int x := 2; int y := A[1]; int z := A[1]
while x ≤ n do
y := max(y + A[x], A[x])
z := max(y, z)
x := x + 1
{z is the solution of Max-Sum }
8. Choose a different loop invariant with proper exit condition for the inner loop in
Example 9.4.2, so that the resulting algorithm is “selection sort.”
References
1. Krzysztof R. Apt and Ernst-Rüdiger Olderog, Fifty years of Hoare's logic. Formal Aspects of
Computing, vol. 31(6), pp. 751–807, 2019.
2. Robert S. Boyer and J. Strother Moore, A Computational Logic. Academic Press, New York,
1979.
3. Reiner Hähnle, Dijkstra's Legacy on Program Verification. ACM Books, 2022.
4. Carlo A. Furia, Bertrand Meyer, and Sergey Velder, Loop invariants: analysis, classification,
and examples. ACM Computing Surveys, vol. 46(3), Article 34, January 2014.
Chapter 10
Temporal Logic
In the previous chapter, we showed that Hoare logic is a framework for software
verification: it can show that a program achieves the desired property after its
execution. However, when a program involves multiple concurrent processes, its
verification becomes much more complex, and first-order logic is inadequate.
For example, when you deposit cash into an automated teller machine (ATM),
the program running inside the ATM will read the balance of your bank account and
add the cash amount to your account. At the same time, another program running
inside the bank may use the same account to pay off your credit card. The two
programs cannot run at the same time; otherwise, the balance of your account
may go wrong. Modifying the same data at the same time by multiple processes
is called a "race condition" in concurrent programming. To avoid race conditions, a
locking mechanism is needed so that an account can be accessed only when it is free
(unlocked). Due to locking mechanisms, two programs may go into a "deadlock"
state. For instance, when both you and your friend send money to each other at
the same time, one program may lock account A and ask for account B while the
other locks account B and asks for account A. In such cases, we cannot
prove the termination of the program by counting steps. To verify that a program works
correctly without race conditions or deadlocks, we need a formalism to express time,
such as "two processes cannot access critical data at the same time" or "the program
will terminate eventually."
First-order logic is an excellent logic for many applications, as we have demonstrated
in Chaps. 5–9. The so-called temperature paradox shows that it is not
convenient to express time in first-order logic in a simple way. Let t be a variable for
the current temperature, such as t = 30 to indicate that the current temperature is
30°. Let p(t) denote "temperature t is rising"; then p(30) is a well-formed formula.
However, the interpretation of p(30) is not meaningful: "temperature 30 is rising."
To express that the temperature changes over time in first-order logic, we may
define the temperature as a function t(x), where x is a time variable. To express
"temperature t is rising," we write (s > 0) → t(x + s) > t(x). In propositional
The first modal operator, □, denotes "always" and takes a formula as its argument.
For example, if p denotes "it rains today," then □p is a formula of TL, and its
intuitive meaning is "it will rain every day."
The second operator, ◇, denotes "eventually" and also takes a formula as its
argument. Thus, if p denotes "it rains today," ◇p is a formula of TL, and its intuitive
meaning is "it will rain someday."
Informally, □ states a universal property in the future, while ◇ states an existential
property in the future. For example, to specify the statement "if a request is made
to print a file, the file will be printed eventually," the statement can be denoted by
□(r → ◇p), where r stands for "a request is made" and p for "the file is printed."
The two operators, □ and ◇, appear in almost every modal logic, where □ is
usually read as "necessarily" and ◇ as "possibly." We borrow them from classical
modal logic for TL and give them the temporal meanings, i.e., "always" and
"eventually."
Using the BNF grammar, the formulas of TL can be formally defined as follows:
Definition 10.1.1 Let op be the binary logical operators of propositional logic and
VP be the propositional variables used in the current application; then the formulas
of TL can be defined by the following BNF grammar:

op ::= ∧ | ∨ | → | ⊕ | ↔
VP ::= p | q | r | s | t
Formulas ::= ⊤ | ⊥ | VP | ¬Formulas | (Formulas op Formulas) |
□Formulas | ◇Formulas

For example, □(p ∨ q) and □p ∨ q are both formulas of TL. Note the difference
between these two formulas: □ applies to the whole formula in the former and only
to the first argument of ∨ in the latter.
The conventional formal semantics of modal logic is called Kripke semantics, also
known as Kripke frames, relational semantics, or frame semantics. Kripke semantics
is a formal semantics for modal logic systems created in the late 1950s by Saul
Kripke and Andre Joyal. It was adapted to temporal logic and other nonclassical
systems. The development of Kripke semantics was a breakthrough in the theory of
nonclassical logics, because the model theory of such logics was almost nonexistent
before Kripke.
As we have seen from Chap. 2, an interpretation of a propositional formula
is a mapping of propositional variables to truth values. Given n propositional
variables, we have exactly 2^n different interpretations. For example, if we have two
propositional variables, say p and q, then there are four interpretations, I1 = pq (it
means I1(p) = 1, I1(q) = 1), I2 = pq̄, I3 = p̄q, and I4 = p̄q̄. Using the notation
from Chap. 2, VP = {p, q} and AllP = {pq, pq̄, p̄q, p̄q̄}. Kripke semantics are
based on binary relations over AllP. Note that every binary relation R over S can
be represented by a directed graph G = (S, R).
Definition 10.1.2 Given a set of propositional variables VP, a Kripke frame is a
directed graph K = (S, R), where S, called states, is a non-empty subset of AllP,
the set of all interpretations over VP, and R is a binary relation over S.
Example 10.1.3 Let VP = {p, q}, S = AllP = {pq, pq̄, p̄q, p̄q̄}, and let R consist
of the six ordered pairs over S displayed in Fig. 10.1. Then, G = (S, R) is a Kripke
frame.
It should be understood that in an application, every TL formula uses the
propositional variables from VP and every state w in a Kripke frame K = (S, R) is
an interpretation of VP. If ⟨w, w′⟩ ∈ R, we say w′ is a successor of w in K.
Definition 10.1.4 Given a Kripke frame K = (S, R) and a state s ∈ S, a TL
formula C is true in s of K, denoted by K, s ⊨ C, if one of the following conditions
is true:
1. C is ⊤.
2. C ∈ VP and s(C) = 1.
3. C is ¬A and K, s ⊭ A.
4. C is A ∧ B, K, s ⊨ A, and K, s ⊨ B.
5. C is A ∨ B, and K, s ⊨ A or K, s ⊨ B.
6. C is A → B and K, s ⊨ ¬A ∨ B.
7. C is □A and, for every successor s′ of s, K, s′ ⊨ A.
8. C is ◇A and, for some successor s′ of s, K, s′ ⊨ A.
Otherwise, we say C is false in s of K.
Clearly, the above definition is recursive, and we may write a recursive
algorithm to compute the value of K, s ⊨ A, as sketched below.
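The following is a minimal sketch of such an algorithm in Prolog (our own
illustration, assuming SWI-Prolog; holds/2 and succ/2 are made-up names). A state
is represented by the list of variables it makes true, the frame is given by succ/2
facts, and box/dia encode □/◇:

% the frame: succ(S, T) holds iff T is a successor of state S
:- dynamic succ/2.

% holds(S, C): formula C is true in state S, following Definition 10.1.4.
holds(_, top).
holds(S, P)         :- atom(P), memberchk(P, S).   % P in VP and s(P) = 1
holds(S, neg(A))    :- \+ holds(S, A).
holds(S, and(A, B)) :- holds(S, A), holds(S, B).
holds(S, or(A, _))  :- holds(S, A).
holds(S, or(_, B))  :- holds(S, B).
holds(S, imp(A, B)) :- holds(S, or(neg(A), B)).
holds(S, box(A))    :- forall(succ(S, T), holds(T, A)).  % every successor
holds(S, dia(A))    :- succ(S, T), holds(T, A).          % some successor

For instance, with the single edge succ([p,q], [q]) (state pq whose only successor
is p̄q), the query ?- holds([p,q], box(neg(p))) succeeds, matching the analysis of
□¬p in the example below.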
Example 10.1.5 Consider the Kripke frame in Fig. 10.1 and let A be □¬p. Then,
A is true in s = pq because the only successor of s is s′ = p̄q and ¬p is true in s′.
Let B be p ∧ q → □¬p. B is true in s = p̄q and in s = p̄q̄, because p is false
in s. B is true in s = pq̄ because q is false in s. B is also true in s = pq because
A = □¬p is true in pq. That is, B is true in every state of the frame.
Using the same analysis, we can show that □(p ∧ q → □¬p) is true in every state.
However, p ∧ q → ¬p is false in s = pq.
In modal logic, a Kripke frame specifies the transitions of states (also called
worlds). Necessarily means "in all successive states (successors)," and possibly
means "in one of the successors." Similarly, in TL, a formula □A is true in the
current state if A is true in every successor; ◇A is true in the current state if A is true
in one of the successors. Note that if the current state does not have any successor,
then □A is true and ◇A is false in the current state for any formula A.
Definition 10.1.6 Given a Kripke frame K = (S, R), a TL formula A is true in K,
written K ⊨ A, if for every state s ∈ S, K, s ⊨ A, and we say K is a Kripke model
of A.
A is satisfiable in TL if A has a Kripke model. That is, there exists a Kripke
frame K such that K ⊨ A.
A is valid in TL, written ⊨ A, if A is true in every Kripke frame. That is, every
Kripke frame is a model of A.
Example 10.1.5 shows that both p ∧ q → □¬p and □(p ∧ q → □¬p) are satisfiable
as they are true in the Kripke frame in Fig. 10.1. However, this frame is not a model
of p ∧ q → ¬p.
Given a TL formula A, let M(A) denote the set of all Kripke models of A.
Definition 10.1.7 Given two TL formulas A and B, A and B are equivalent, written
A ≡ B, if M(A) = M(B). We say A entails B, or B is a logical consequence of
A, written A ⊨ B, if M(A) ⊆ M(B).
Given a set VP of n propositional variables, the number of choices for S in
a Kripke frame is enormous (2^(2^n) − 1), and the number of choices for R is also
huge (2^(|S|^2) for each S); for n = 2, there are already 2^4 − 1 = 15 choices for S
and 2^16 relations over S = AllP. Hence, it is infeasible to check if a TL formula is
valid by enumerating all frames. However, all tautologies in propositional logic are
valid in TL, and all major results from propositional logic are still useful in TL.
Theorem 10.1.8
(a) (Substitution) Any instance of a valid TL formula is valid in TL. That is, if p
is a propositional variable in A, then A[p ↦ B] is valid whenever A is valid,
where B is any TL formula.
(b) (Substitution of equivalence) For any TL formulas A, B, and C, where B is a
subformula of A, if B ≡ C, then A ≡ A[B ↦ C].
(c) (Logical Equivalence) Given two TL formulas A and B, ⊨ A ↔ B iff A ≡ B.
(d) (Entailment) Given two TL formulas A and B, ⊨ A → B iff A ⊨ B.
The proofs of the above theorem are similar to those for propositional logic and
are omitted here.
We show below some examples of proofs in TL. Later, we will provide a proof
method based on semantic tableaux.
Theorem 10.1.9 (Duality) ¬□p ≡ ◇¬p.
Proof For any Kripke frame K = (S, R), if K is a model of ¬□p, then for every
state s ∈ S, ¬□p is true in s, or equivalently, □p is false in s. That means there
exists a successor s′ of s such that p is false in s′, or equivalently, ¬p is true in s′.
Hence, ◇¬p is true in s by the semantics of ◇. Since s is arbitrary, ◇¬p is true in
every state of K. In other words, K is a model of ◇¬p.
On the other hand, if K is not a model of ¬□p, then there exists a state s ∈ S such
that ¬□p is false (thus □p is true) in s. By definition, either s has no successors, or
for every successor s′ of s, p must be true in s′. If s has no successor, then
◇¬p is false in s by definition. If s has successors, since ¬p is false in every successor
of s, ◇¬p must be false in s. So, K cannot be a model of ◇¬p.
Since K is arbitrary, ¬□p and ◇¬p must share the same set of models.
Applying substitution and equivalence, we can easily deduce from the above
theorem that ¬◇p ≡ □¬p and that ¬◇p ↔ □¬p is valid.
Example 10.1.10 We can prove that (□p → ◇p) ≡ (□¬p → ◇¬p) by transforming
both sides of ≡ to the same formula using the known equivalence relations:

□p → ◇p ≡ ¬□p ∨ ◇p
         ≡ ◇¬p ∨ ◇p
□¬p → ◇¬p ≡ ¬□¬p ∨ ◇¬p
           ≡ ◇p ∨ ◇¬p

The formula ◇p ∨ ◇¬p is a negation normal form (NNF) of TL, and we will discuss
NNF later.
Theorem 10.1.11 (Distribution) ⊨ □(p → q) → (□p → □q).
Proof □(p → q) → (□p → □q) is logically equivalent to

◇(p ∧ ¬q) ∨ ◇¬p ∨ □q (2)

For any Kripke frame K = (S, R) and any s ∈ S, if (2) is false in s, that is, ◇(p ∧
¬q), ◇¬p, and □q are all false in s, then s must have some successors (otherwise,
□q is true in s). Because □q is false in s, there must exist a successor s′ of s such
that q is false in s′. p must be true in s′ because ◇¬p is false in s. Hence, p ∧ ¬q
is true in s′ and ◇(p ∧ ¬q) is true in s, a contradiction. Thus, (2) must be true in s
and thus true in every Kripke frame.
Let r denote "it rains"; then the intuitive meaning of □r is "it always rains." Given
a Kripke frame K = (S, R), if □r is true in a state s ∈ S, we know r will be true in
any successor of s. What is the truth value of r in the state s itself? If □ means "always,"
should r be true now as well, in every state including s? That is, we may ask that
□A → A be an axiom (a formula assumed to be true) of TL.
We say a Kripke frame K = (S, R) is reflexive if every state s ∈ S is a successor
of itself, that is, ⟨s, s⟩ ∈ R (called a loop in graph theory).
Theorem 10.1.12 (Reflexivity) A Kripke frame K is reflexive iff K is a model of
□A → A for any formula A.
Proof Let K = (S, R) be reflexive. If □A → A is false in K, then there exists
s ∈ S such that □A is true but A is false in s. Since □A is true, A is true in every
successor of s. However, s is also a successor of itself because K is reflexive. So, A is
true in s, too. This is a contradiction to □A → A being false in K.
On the other hand, suppose K = (S, R) is not reflexive. Pick a state s ∈ S
such that s is not a successor of itself. Let Is be the propositional interpretation
associated with s. If we view Is as a conjunction of literals each of which is true
in s, then ¬(Is) is a clause which is false in s. This clause will be true in every
interpretation (including all successors of s) other than s. Thus, □¬(Is) is true in s.
Let A be ¬(Is); then □A → A is false in s, and K is not a model of □A → A.
Since K is arbitrary, the theorem holds.
The above theorem plays a dual role: by restricting Kripke frames to reflexive
ones, we obtain the validity of □A → A in these frames; on the other hand, if
□A → A is given as an axiom, we exclude all nonreflexive Kripke frames as model
candidates.
We know from Kripke frames that if □p is true in the current state s, then p will
be true in all the successors of s. What about the truth value of p in the successors of
the successors of s? If □ means "always," should p be true in them, too? Essentially,
we ask □A → □□A to be true in every state for every formula A. Let us say a Kripke
frame K = (S, R) is transitive if R is transitive: for any states u, v, w, ⟨u, v⟩ ∈ R
and ⟨v, w⟩ ∈ R imply that ⟨u, w⟩ ∈ R. The following theorem tells us about the
relationship between transitivity and the formula □A → □□A.
Theorem 10.1.13 (Transitivity) A Kripke frame K is transitive iff K is a model of
□A → □□A for any formula A.
Proof Let K = (S, R) be transitive. If □A → □□A is false in K, then there exists
s ∈ S such that □A is true but □□A is false in s. The latter implies that there must
exist a successor s′ of s and a successor s″ of s′ such that A is false in s″. Since □A
is true, A is true in every successor of s. However, s″ is also a successor of s because
K is transitive. So, A is true in s″, too. This is a contradiction to □A → □□A being
false in K.
On the other hand, suppose K = (S, R) is not transitive. There must exist three
states s1, s2, s3 ∈ S such that s2 is a successor of s1 and s3 is a successor of s2 but not
a successor of s1. Let I3 be the propositional interpretation associated with s3. If we
view I3 as a conjunction of literals each of which is true in s3, then ¬(I3) is a clause
which is false only in s3 but true in every interpretation (including all successors of
s1) other than s3. Let A be ¬(I3); then A is false in s3 and true in every other state,
so □A is false in s2 and true in s1, and □□A is false in s1. Hence, □A → □□A is false
in s1, and K is not a model of □A → □□A.
Combining the above two theorems, we can conclude that if a Kripke frame is both
reflexive and transitive, then □A ↔ □□A will be true in this frame.
The necessitation rule of modal logic states that if A is true, so is □A. That is, if
A can be proved to be true now, the same proof can be used everywhere. The axiom
for expressing the necessitation rule is the formula A → □A. The Kripke models
of A → □A are those Kripke frames K = (S, R) in which the only successor
(if it exists) of each state is itself, i.e., if ⟨w, v⟩ ∈ R, then w = v. Such frames are
called discrete. Obviously, the only edges allowed in a discrete frame are loops (i.e.,
⟨w, w⟩), and such frames are not very useful. As some loops may be missing in a
discrete frame, a discrete frame is not always reflexive.
Besides reflexivity, transitivity, and discreteness, there are many other restrictions
on the relations in a Kripke frame, such as symmetry, density, etc. There are axioms
corresponding to these restrictions.
Semantics is useful for a logic only if the semantic consequence relation reflects its
syntactic counterpart, i.e., the result of the proof system for the logic. It is vital
to know which modal logics are sound and complete with respect to a set of Kripke
frames. Let C be a set of Kripke frames satisfying some property; for
instance, C can be the set of reflexive (or transitive) frames. We define T(C) to be the
set of all formulas that are true in every frame of C. A proof system in a modal logic is sound
with respect to C if every formula proved to be true belongs to T(C); it is complete
with respect to C if every formula in T(C) can be proved to be true. There are
over a dozen proof systems for modal logic, and the interested reader may refer to a
textbook on modal logic.
Typically, temporal logic has the ability to reason about a timeline, which is linear
by nature. The Kripke semantics use directed graphs K = (S, R) and are suitable
for reasoning about multiple timelines, as each state allows multiple successors.
We may put restrictions on R so that R acts like a linear relation: each state has at
most one successor. This restriction can work with reflexivity by allowing each state
to have at most one successor other than itself. However, this restriction does not
work well with transitivity, because transitivity requires a state to include successors of
successors as its own successors. Moreover, the number of states in a Kripke frame
is bounded by the number of propositional interpretations. We cannot model every
timeline of infinite length by Kripke frames, which can provide only cycles in the graph.
Hence, we need semantics different from Kripke frames for linear temporal logic.
Temporal logic based on Kripke frames can be viewed as a branching temporal
logic.
Linear temporal logic (LTL) or linear-time temporal logic is a modal (temporal)
logic with modalities referring to a unique timeline. LTL was first proposed for the
formal verification of computer programs by Amir Pnueli in 1977. In LTL, one can
encode formulas about the future of paths, e.g., a condition will eventually be true,
or a condition will be true until another event becomes true, etc. For simplicity, our
introduction is limited to an extension of propositional logic.
As in TL, in LTL, □ is interpreted as "always" and ◇ as "eventually." In addition
to these two common modal operators, we have a third operator, ◦, which is
interpreted as the "next" tick (or step). Like □ and ◇, ◦ takes a formula as its argument: if
r means "it rains today," then ◦r means "it will rain tomorrow," if we interpret that
a tick is a day. The operator ◦ is an important part of LTL, which was invented
for the formal verification of concurrent programs. However, ◦ is rarely used in the
specification of concurrent programs, because not much can be said about the execution
of programs in the next step. Furthermore, we want a correctness statement about
a concurrent program to hold regardless of how the interleaving processes select a
next operation. Therefore, properties are almost invariably expressed in terms of □
and ◇, not in terms of ◦.
Let VP be the set of all propositional variables used in an application of LTL.
The formal definition of the LTL formulas is given as the following BNF grammar:

V ::= p if p ∈ VP
op ::= ∧ | ∨ | → | ⊕ | ↔
Formulas ::= ⊤ | ⊥ | V | ¬Formulas | (Formulas op Formulas) |
□Formulas | ◇Formulas | ◦Formulas
π : N → AllP

π = s0 s1 s2 · · ·

π≥i = t0 t1 t2 · · · , where tj = si+j
In the language of first-order logic, π for ◇□p satisfies ∃i∀j R(i, j), while π for □◇p
satisfies ∀i∃j R(i, j), where R(i, j) denotes the formula "i ≥ 0 ∧ j ≥ i ∧ π(j)(p) = 1."
From the view of the first-order formulas, ◇□p is not equivalent to □◇p.
Example 10.2.2 For the formula □◇p ∧ □◇¬p, we choose a sequence π where
the truth value of p is alternating. □◇p is true in π because ◇p is true in every suffix
of π. Note that ◇□p is not true in π, another evidence that □◇p and ◇□p are not
equivalent. □◇¬p is also true in π because ◇¬p is true in every suffix of π. Thus,
π is a model of □◇p ∧ □◇¬p. Neither □◇p nor □◇¬p is valid because there
exist sequences in which one formula is false and the other is true.
Example 10.2.3 The formula ◇□p ∧ ◇□¬p is unsatisfiable because it is not true in any
sequence π. That is, ◇□p requires p to be true everywhere in one suffix of π, and ◇□¬p
requires p to be false everywhere in one suffix of π. The two conditions cannot be met
at the same time.
Definition 10.2.4 An LTL formula A is satisfiable if there exists an interpretation
sequence π such that π ⊨ A, and we say π is a model of A. A is valid in LTL,
written ⊨ A, if every interpretation sequence is a model of A.
Both □p and ◇p are satisfiable but not valid. From Example 10.2.2, we know
□◇p ∧ □◇¬p is satisfiable but not valid. From Example 10.2.3, ◇□p ∧ ◇□¬p is
unsatisfiable, and its negation is valid.
Theorem 10.2.5 For any LTL formula A,
(a) (reflexivity) ⊨ □A → A;
(b) (transitivity) ⊨ □A → □□A.
Moreover, for any LTL formula A,
(a) □A ⊨ ◦A;
(b) ◦A ⊨ ◇A;
(c) □A ⊨ ◇A.
The proofs of (a) and (b) come from the semantic definitions of these modal
operators; (c) comes from the transitivity of implication.
Like TL, all tautologies in propositional logic are valid in LTL, and all major
results from propositional logic are still useful here.
Theorem 10.2.8
(a) (Substitution) Any instance of a valid LTL formula is valid in LTL. That is, if p
is a propositional variable in A and B is any formula, then A[p ↦ B] is valid
whenever A is valid.
(b) (Substitution of equivalence) For any LTL formulas A, B, and C, where B is
a subformula of A, if B ≡ C, then A ≡ A[B ↦ C].
(c) (Logical Equivalence) Given two LTL formulas A and B, ⊨ A ↔ B iff A ≡ B.
(d) (Entailment) Given two LTL formulas A and B, ⊨ A → B iff A ⊨ B.
The proofs of the above theorem are analogous to those for propositional logic and
are omitted here.
The proofs of the following theorems are examples of proving a formula valid
or equivalent to another formula. In the next section, we will provide some proof
techniques based on semantic tableaux. The operator ◦ commutes with □, ◇, and ¬,
but □ and ◇ do not commute with each other, as shown in Example 10.2.3.
Theorem 10.2.9 (Commutativity)
(a) ◦□p ≡ □◦p;
(b) ◦◇p ≡ ◇◦p;
(c) ¬◦p ≡ ◦¬p.
Proof The proofs of (a) and (b) are similar to that of (c): given any interpretation
sequence π, π is a model of ¬◦p iff ◦p is false in π; ◦p is false in π iff p is false
in π≥1, or equivalently, ¬p is true in π≥1; ¬p is true in π≥1 iff ◦¬p is true in π.
Theorem 10.2.10 (Duality) (a) ¬□p ≡ ◇¬p; (b) ¬◇p ≡ □¬p.
Proof (a) Given any interpretation sequence π, ¬□p is true in π iff □p is false in
π; □p is false in π iff there exists i ≥ 0 such that π(i)(p) = 0, or equivalently, π(i)(¬p) = 1.
There exists i ≥ 0 with π(i)(¬p) = 1 iff ◇¬p is true in π.
(b) Replace p by ¬q in (a) and apply ¬¬q ≡ q; we have ¬□¬q ≡ ◇q, or
¬◇q ≡ □¬q.
In Chap. 2, we introduced a concept called "negation normal form," which is a
propositional formula where every argument of ¬ is a propositional variable. This
concept applies to formulas of LTL as well as of TL. Applying the two theorems above,
we have the following theorem.
Theorem 10.2.11 For every LTL formula A, there exists an LTL formula B such
that A ≡ B and B is in negation normal form.
Proof The additional rules needed for LTL are Theorems 10.2.9(c) and 10.2.10.
We will use negation normal form as a proving method, as shown by the next
example.
Example 10.2.12 To show that ◇□p → □◇p is valid, we transform ¬(◇□p →
□◇p) into negation normal form:

¬(◇□p → □◇p) ≡ ◇□p ∧ ¬□◇p
              ≡ ◇□p ∧ ◇¬◇p
              ≡ ◇□p ∧ ◇□¬p

which is unsatisfiable by Example 10.2.3; hence the original formula is valid.
Theorem 10.2.13 (Distribution)
(a) □(p ∧ q) ≡ □p ∧ □q;
(b) ◇(p ∨ q) ≡ ◇p ∨ ◇q;
(c) ◇(p ∧ q) ⊨ ◇p ∧ ◇q;
(d) □p ∨ □q ⊨ □(p ∨ q).
Proof
(a) For any interpretation sequence π, □(p ∧ q) is true in π iff for any i ≥ 0,
π(i)(p ∧ q) = 1. That is, π(i)(p) = 1 and π(i)(q) = 1 for any i ≥ 0. This is the
same as the truth of □p and □q, respectively.
(b) can be obtained from (a) by duality. The proofs of (c) and (d) are left as exercises.
Example 10.2.14 To show that □(p ∨ q) is not equivalent to □p ∨ □q, consider an
interpretation sequence π where p is true only at each odd-numbered state of π and
q is true only at each even-numbered state of π. Then, p ∨ q is true at every state of
π; hence, □(p ∨ q) is true in π. It is easy to see that neither □p nor □q is true in π.
Note that □p ∨ □q ⊨ □(p ∨ q).
Theorem 10.2.15 (Absorption)
(a) □□p ≡ □p
(b) ◇◇p ≡ ◇p
(c) □◇□p ≡ ◇□p
(d) ◇□◇p ≡ □◇p
Proof (a) comes from Theorem 10.2.5. (b) is a dual of (a). (c) can be obtained from
(d) by duality. Here is the proof of (d): for any interpretation sequence π, ◇□◇p is
true in π iff there exists i ≥ 0 such that □◇p is true in π≥i. □◇p is true in π≥i iff for all
j ≥ i, ◇p is true in π≥j. ◇p is true in π≥j for all j ≥ i iff ◇p is true in π≥k for any
k ≥ 0, which is equivalent to □◇p being true in π.
The above theorem allows us to compress any series of □ and ◇ operators to no more
than two. They are useful in the simplification of LTL formulas.
Fig. 10.2 (a) A Kripke frame for the oven model, where a link exists between two interpretations
that differ by one truth value; (b) a partial search tree of the frame
Fig. 10.2a, showing the results of “the oven power is on,” “open or close the oven
door,” and usual oven functions. Of course, this is an oversimplified oven model for
the purpose of illustration.
A path (repeated nodes are allowed) in a Kripke frame K = (S, R) is a sequence
of states s1, s2, s3, · · · , si, · · · over S such that ⟨si, si+1⟩ ∈ R for all i ≥ 1. Since
K is finite, an infinite path of the frame is conveniently expressed as a sequence
of states ending with a subsequence which can repeat any number of times. For
instance, s0∗, s0 s1∗, (s0 s1)∗, s0 (s1 s2 s3 s4)∗, etc. are such paths in the oven example,
where x∗ indicates that we repeat x any number of times.
If an infinite sequence π of interpretations is represented by s0 s1 s2 · · · sk (sk+1
sk+2 · · · sk+m)∗, we say it is regular (cf. regular expressions in the theory of
computation).
It is relatively easy to check if s0 s1 · · · sk (sk+1 sk+2 · · · sk+m)∗ is a model of an LTL
formula C by following Definition 10.2.1, where the condition i ≥ 0 in the last two
cases can be replaced by 0 ≤ i ≤ k + m:
7. C is □A and for all 0 ≤ i ≤ k + m, π≥i ⊨ A.
8. C is ◇A and there exists 0 ≤ i ≤ k + m, π≥i ⊨ A.
Example 10.2.17 Let's continue with Example 10.2.16 and let A be □(on →
◇hot). We show that A is true in s0 s1 s2 s3∗, a regular sequence in Fig. 10.2. For A to
be true, on → ◇hot needs to be true in all suffixes of s0 s1 s2 s3∗: they are s0 s1 s2 s3∗,
s1 s2 s3∗, s2 s3∗, and s3∗. Indeed, on → ◇hot is true in s0 s1 s2 s3∗ and s1 s2 s3∗ because on
is false in s0 and s1. on → ◇hot is also true in s2 s3∗ and s3∗ because ◇hot is true in
s2 s3∗ and s3∗ (because hot is true in s3). Thus, A is satisfiable.
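This suffix-by-suffix check can be mechanized. Below is a sketch of our own
(assuming SWI-Prolog; all predicate names are made up) that evaluates an LTL
formula on a regular sequence s0 · · · sk(sk+1 · · · sk+m)∗, represented as
lasso(Prefix, Loop) with each state given as the list of variables true in it;
following the bound above, box and dia inspect only the suffixes 0 ≤ i ≤ k + m:

suffix(lasso([_|P], L), lasso(P, L)).                           % drop one state
suffix(lasso([], [S|L]), lasso([], L1)) :- append(L, [S], L1).  % rotate the loop

state(lasso([S|_], _), S).
state(lasso([], [S|_]), S).

bound(lasso(P, L), B) :- length(P, N1), length(L, N2), B is N1 + N2 - 1.

nthsuf(0, Pi, Pi).
nthsuf(I, Pi, Q) :- I > 0, suffix(Pi, Pi1), I1 is I - 1, nthsuf(I1, Pi1, Q).

sat(Pi, P)         :- atom(P), state(Pi, S), memberchk(P, S).
sat(Pi, neg(A))    :- \+ sat(Pi, A).
sat(Pi, and(A, B)) :- sat(Pi, A), sat(Pi, B).
sat(Pi, or(A, _))  :- sat(Pi, A).
sat(Pi, or(_, B))  :- sat(Pi, B).
sat(Pi, imp(A, B)) :- sat(Pi, or(neg(A), B)).
sat(Pi, next(A))   :- suffix(Pi, Pi1), sat(Pi1, A).
sat(Pi, box(A))    :- bound(Pi, B),
                      forall(between(0, B, I), (nthsuf(I, Pi, Q), sat(Q, A))).
sat(Pi, dia(A))    :- bound(Pi, B), between(0, B, I),
                      nthsuf(I, Pi, Q), sat(Q, A), !.

Assuming, say, s0 = [], s1 = [], s2 = [on], and s3 = [on, hot] in the oven model,
the query

?- sat(lasso([[], [], [on]], [[on, hot]]), box(imp(on, dia(hot)))).

succeeds, mechanizing the argument of Example 10.2.17.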
The above example shows the typical working process of model checking:
1. Build a model of the system, abstracting away irrelevant details.
We may use this equivalence to reduce the number of regular sequences in model
checking.
For a Kripke frame K = (S, R), model checking will examine all regular
sequences of K as follows: choose a node in S as the root of a tree T and repeatedly
expand each leaf node of T according to R. A stopping criterion of the expansion
is that the path from the root to a leaf contains no more than one copy of any node.
This way, the tree will be finite for K, and all nonequivalent regular sequences
represented by T will also be finite. For each regular sequence represented by T,
model checking will examine if it is a model of the given formula.
For the Kripke frame given in Fig. 10.2a, a search tree is shown in (b). The
rightmost path, i.e., s0 s7, represents the following regular sequences: s0∗, s0 s7∗, and
(s0 s7)∗. To show that "once the power is on, the oven will eventually be hot," we
have to show that ¬A, where A is □(on → ◇hot), has no models. However, s0 s7∗ is
a model of ¬A, which is equivalent to ◇(on ∧ □¬hot); thus, A is not a theorem of
the oven model. We need to modify either A or the oven model if we want A to be
a theorem.
Like propositional logic, the extended tableau method takes an input formula A,
creates the initial node containing A, and then applies the rules repeatedly to any leaf
node to generate its successors. The tableau rules for LTL consist of the rules for
propositional logic shown in Sect. 3.1.2, plus a new α-rule for □, a new β-rule for ◇,
and a new type of rule called the X-rule (the neXt rule) for ◦. To simplify the discussion,
we will apply some simplification rules to the initial formula. The simplification rules
include the following rewrite rules and some rules involving the constants ⊤ and ⊥
(such as ⊥ ∨ A → A):
(1) ¬¬A → A
(2) ¬(A ∧ B) → ¬A ∨ ¬B
(3) ¬(A ∨ B) → ¬A ∧ ¬B
(4) ¬(A → B) → A ∧ ¬B
(5) ¬(A ↔ B) → (A ∧ ¬B) ∨ (¬A ∧ B)
(6) ¬□A → ◇¬A
(7) ¬◇A → □¬A
(8) ¬◦A → ◦¬A
(9) □◦A → ◦□A
(10) ◇◦A → ◦◇A
The last three rules are the commutativity rules which move ◦ outward; the other
rules move ¬ inward. These simplification rules are based on logical equivalences
and preserve the equivalence relation between the two formulas before and after the
application of a rewrite rule. The rules will transform the initial formula into a negation
normal form where ¬, ∨, ∧, ◦, □, and ◇ are the only operators.
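These rewrite rules translate directly into a recursive Prolog procedure. The sketch
below is our own illustration (nnf/2 and lift/3 are made-up names), using neg, and,
or, imp, iff, box, dia, and o as constructors; lift/3 realizes rules (9) and (10) by
moving any leading ◦ of a normalized argument outward:

nnf(neg(neg(A)), N)    :- !, nnf(A, N).                    % rule (1)
nnf(neg(and(A, B)), N) :- !, nnf(or(neg(A), neg(B)), N).   % rule (2)
nnf(neg(or(A, B)), N)  :- !, nnf(and(neg(A), neg(B)), N).  % rule (3)
nnf(neg(imp(A, B)), N) :- !, nnf(and(A, neg(B)), N).       % rule (4)
nnf(neg(iff(A, B)), N) :- !,                               % rule (5)
    nnf(or(and(A, neg(B)), and(neg(A), B)), N).
nnf(neg(box(A)), N)    :- !, nnf(dia(neg(A)), N).          % rule (6)
nnf(neg(dia(A)), N)    :- !, nnf(box(neg(A)), N).          % rule (7)
nnf(neg(o(A)), N)      :- !, nnf(o(neg(A)), N).            % rule (8)
nnf(imp(A, B), N)      :- !, nnf(or(neg(A), B), N).
nnf(and(A, B), and(NA, NB)) :- !, nnf(A, NA), nnf(B, NB).
nnf(or(A, B), or(NA, NB))   :- !, nnf(A, NA), nnf(B, NB).
nnf(o(A), o(N))        :- !, nnf(A, N).
nnf(box(A), R)         :- !, nnf(A, N), lift(box, N, R).   % rule (9)
nnf(dia(A), R)         :- !, nnf(A, N), lift(dia, N, R).   % rule (10)
nnf(A, A).                                                 % literals

lift(Op, o(N), o(R)) :- !, lift(Op, N, R).   % hoist a leading "next"
lift(Op, N, R) :- R =.. [Op, N].

For example, ?- nnf(neg(box(o(p))), N) yields N = o(dia(neg(p))): ¬□◦p is
rewritten to ◦◇¬p, as the rules above prescribe.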
In Chap. 3, we showed how a tableau is created by creating the initial node of the
input formula and applying α- and β-rules to a formula in a node to create successor
nodes. An α-rule creates one successor, and a β-rule creates two successors. Let n
be the current node and F(n) be the set of formulas contained in n. If an α-rule
can replace A by {B, C} for A ∈ F(n), then the successor of n, denoted n′, is
created and F(n′) = F(n) − {A} ∪ {B, C}. If a β-rule can replace A by B and C
for A ∈ F(n), then two successors of n, denoted n′ and n″, are created such that
F(n′) = F(n) − {A} ∪ {B} and F(n″) = F(n) − {A} ∪ {C}.
As in Chap. 3, given a tableau, we say a node is closed if it contains a pair of
complementary literals. A node is open if it is not closed and contains only literals. A
node is expandable if it is a leaf node which is neither closed nor open. Expandable
nodes always contain a formula to which one of the α- or β-rules applies. For LTL,
we have a new type of node, called X-nodes.
Theorem 10.3.3 (Expansion)
(a) □A ≡ A ∧ ◦□A
(b) ◇A ≡ A ∨ ◦◇A
Proof
(a) For any interpretation sequence π, A ∧ ◦□A is true in π iff A is true in π≥0
and □A is true in π≥1. □A is true in π≥1 iff A is true in π≥i for every i ≥ 1. Thus,
A must be true in π≥i for every i ≥ 0, which is exactly "□A being true in π" by
definition.
(b) is the dual case of (a).
The above theorem serves as an inductive tool for constructing a model of □A
and ◇A: from (a), A is the base case and ◦□A is the inductive case. Once both cases
are finished, we obtain a model for □A. From (b), A is the base case; if a model of
A is found, we use it as a model of ◇A. Otherwise, from the inductive case, we try
to find a model of ◦◇A.
Definition 10.3.4 The α-rule for □A and the β-rule for ◇A are given below:

α      α1, α2            β      β1     β2
□A     A, ◦□A            ◇A     A      ◦◇A
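In Prolog, the new rules can be written down directly (a sketch of our own;
alpha_rule/2, beta_rule/3, and x_rule/2 are made-up names). The α- and β-rules
follow Definition 10.3.4, and the X-rule strips one ◦ from each ◦-formula once a
node contains only literals and ◦-formulas:

alpha_rule(box(A), [A, o(box(A))]).   % box(A) expands to A, o(box(A))
beta_rule(dia(A), A, o(dia(A))).      % dia(A) splits into A | o(dia(A))

% x_rule(Fs, Next): applicable to an X-node, whose non-literal members
% are all o-formulas; Next collects their arguments.
x_rule(Fs, Next) :-
    findall(B, member(o(B), Fs), Next),
    Next \== [].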
Example 10.3.5 Below is the tableau for the formula □◇p:

n0 : □◇p
     ↓
n1 : ◇p, ◦□◇p
n2 : p, ◦□◇p        n3 : ◦◇p, ◦□◇p
     ↓                   ↓
   to n0            n4 : ◇p, □◇p
                         ↓
                       to n1
This tableau has no leaf nodes (hence neither open nor closed nodes). The X-nodes
are n2 and n3. We will show later how to derive a model from this tableau.
Given a tableau as a directed graph, we still call a node without outgoing links a
leaf, which is either closed, open, or expandable. Given two nodes n and n′ of a
directed graph, n′ is said to be reachable from n if there exists a path from n to n′ in
the graph.
For any node n of a tableau, let F(n) denote the set of formulas appearing in
n. As in Chap. 3, we assume that F(n) is equivalent to the conjunction of the
formulas in the set.
Theorem 10.3.6
(a) The construction of a tableau for an LTL formula always terminates.
(b) If node n′ is derived from n by an α-rule, then F(n) ≡ F(n′); if n′ and n″ are
derived from n by a β-rule, then F(n) ≡ F(n′) ∨ F(n″).
(c) If node n′ is derived from n by the X-rule, then an interpretation sequence π
is a model of F(n′) iff π′ = s0π is a model of F(n), where s0 ∈ AllP is an
interpretation which makes every literal A ∈ F(n) true.
Proof
(a) The construction stops when no expandable nodes are available. If the construction
did not stop, then there would be an infinite path of nodes in the tableau such
that each node contains a different set of formulas. Hence, the set of formulas from
these nodes would be infinite. However, the rules used in the construction can only generate
subformulas of the original formula or add at most one ◦ operator to them. In other
words, only a finite number of formulas can be generated. This is a contradiction,
so the tableau cannot have an infinite path other than cycles.
(b) All the α- and β-rules used in the construction preserve the equivalence relation.
(c) The X-rule is applied to n to generate n′, where F(n) = {A1, A2, . . . , Ak,
◦B1, ◦B2, . . . , ◦Bm} and F(n′) = {B1, B2, . . . , Bm}, where each Ai is a literal and each
Bj is any formula. By the assumption, F(n) ≡ (A1 ∧ · · · ∧ Ak) ∧ ◦(B1 ∧ · · · ∧ Bm).
Let π be any interpretation sequence and s0 be an interpretation such that s0(Ai) = 1
for 1 ≤ i ≤ k. Then, π is a model of F(n′) iff sπ is a model of ◦(B1 ∧ · · · ∧ Bm)
for any s ∈ AllP. Take s = s0; then π′ = s0π is a model of (A1 ∧ · · · ∧ Ak) ∧
◦(B1 ∧ · · · ∧ Bm), since s0 is a model of A1 ∧ · · · ∧ Ak. That is, π′ = s0π is a model
of F(n).
Part (a) of the above theorem tells us that the construction of a tableau for an
LTL formula will terminate such that every leaf is either closed or open, with no
expandable nodes. If the tableau has an open node, parts (b) and (c) tell us how to
construct a model (for the input formula in the initial node) from this open node
bottom-up, thus showing that the original formula is satisfiable. What happens
when a tableau has no open nodes? We will discuss this case next.
We have seen how a tableau can be constructed for every LTL formula. For
propositional logic, a formula is satisfiable iff its tableau has an open node. For
LTL, only one direction of this equivalence holds.
Theorem 10.3.7 If the tableau for an LTL formula A has an open node, then A is
satisfiable.
Proof Let (n0, n1, . . . , nm) be a simple path from the initial node n0 to the open
node nm in the tableau. We show that F(nj) has a model πj for j = m, m −
1, . . . , 1, 0 by induction.
For the base case when j = m, F(nm) is a set of consistent literals. Let sm be
the interpretation which makes every literal of F(nm) true and πm = sm+, where sm+
means we repeat sm forever; then πm is a model of F(nm).
As the induction hypothesis, we assume that πj is a model of F(nj), 0 < j ≤ m.
If nj is a successor of nj−1 by either an α-rule or a β-rule, by Theorem 10.3.6(b),
πj−1 = πj is also a model of F(nj−1). If nj is the successor of nj−1 by the X-rule,
by Theorem 10.3.6(c), πj−1 = sj−1πj is a model of F(nj−1), where sj−1 is an
interpretation made from the literals Ai ∈ F(nj−1).
Finally, when j = 0, π0 is a model of F(n0) = {A}. Thus, A is satisfiable.
For LTL, a tableau may not have an open node, as seen in Example 10.3.5. That
tableau does not have closed nodes, either. How can we tell if the input formula of
the tableau is satisfiable? Let us look at another example.
Example 10.3.8 To show that ◦p ∧ ◇¬p is satisfiable, we may construct a model by
semantic tableau. Using the tableau rules, we obtain the tableau shown in Fig. 10.3,
where X-nodes are boxed.
A model is a sequence of interpretations and an interpretation can be created
either from an open node or X-node. For instance, consider the path (n0 , n1 , n2 , n4 ):
from n2 , we create s0 = {¬p}; from n4 , we create s1 = {p}. The first two
interpretations of our model are s0 and s1 , respectively. By repeating s1 , we obtain
an infinite interpretation sequence denoted by s0s1+, where s1+ means we repeat s1
one or more times. It is easy to check that s0s1+ is a model of ◦p ∧ ◇¬p. Since
the interpretations after s1 do not affect the truth value of the formula, let s2 denote an
interpretation where p takes any value; then a general representation of the models
from this path is s0s1s2∗, where s2∗ means we can repeat s2 any number of times.
Next, consider the path (n0, n1, n3, n5, n7, n8, n9),
where n3 and n7 are X-nodes and n9 is open. n3 does not specify the truth value
of p, and we name the interpretation as s2 where p can take any truth value. The
interpretations created from n7 and n9 are s1 = {p} and s0 = {¬p}, respectively.
The model created from this path is s2 s1 s0 s2∗ .
Consider the following path:
(n0 , n1 , n3 , n5 , n7 , n8 , n10 )
where n3 , n7 , and n10 are X-nodes. If the interpretations created from them always
make p true, then no model can be found from this path. On the other hand, if the
path is
(n0 , n1 , n3 , n5 , n7 , n8 , n10 , n8 , n9 ),
the model from this path is s2s1s2s0s2∗. If we repeat n8 and n10 any number of times,
then all of the models from this path can be represented by s2s1s2∗s0s2∗, where p can
take any value in s2.
In fact, all the models of ◦p ∧ ◇¬p can be denoted by s0s1s2∗ ∪ s2s1s2∗s0s2∗. Each
model corresponds to some path in the tableau.
The above example tells us that when there is an open node, we can construct a
model for the input formula. When the path ends with a cycle, a model may or may
not be constructed.
Example 10.3.9 Look back at the tableau in Example 10.3.5: it has no open nodes.
If we pick the cycle (n0, n1, n2) and let s1 = {p} be the interpretation derived from the
X-node n2, then s1+ is a model of the input formula □◇p.
Let us look at the path (n0, n1, n3, n4), where the last three nodes are in a cycle.
The interpretation created from n3 does not care about the truth value of p; we
denote it by s2. Then s2+ is not a model.
On the other hand, consider the path:
(n0 , n1 , n2 , n0 , n1 , n3 , n4 ).
The sequence (s1s2)+ is a model of □◇p, while the sequence s1s2+ is not a model.
In other words, we need to repeat together all the interpretations generated from
the nodes in a cycle.
Note that all the models of □◇p can be denoted by the sequence (s2∗s1)+.
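The model checks in Examples 10.3.8 and 10.3.9 can be mechanized. Below is a minimal Python sketch (our own encoding, not from the book): an ultimately periodic sequence is given as a finite prefix plus a loop that repeats forever, a formula is a nested tuple, and for □ and ◇ it suffices to inspect the finitely many distinct suffixes. The names state and holds are ours.

def state(prefix, loop, i):
    """The i-th interpretation of the sequence prefix . loop^omega,
    where each interpretation is the set of variables it makes true."""
    if i < len(prefix):
        return prefix[i]
    return loop[(i - len(prefix)) % len(loop)]

def holds(f, prefix, loop, i=0):
    """Check whether formula f is true in the suffix starting at position i."""
    op = f[0]
    if op == 'var':  return f[1] in state(prefix, loop, i)
    if op == 'not':  return not holds(f[1], prefix, loop, i)
    if op == 'and':  return all(holds(g, prefix, loop, i) for g in f[1:])
    if op == 'or':   return any(holds(g, prefix, loop, i) for g in f[1:])
    if op == 'next': return holds(f[1], prefix, loop, i + 1)
    # every suffix at a position >= max(i, len(prefix)) repeats with the loop
    horizon = max(i, len(prefix)) + len(loop)
    if op == 'eventually':
        return any(holds(f[1], prefix, loop, j) for j in range(i, horizon))
    if op == 'always':
        return all(holds(f[1], prefix, loop, j) for j in range(i, horizon))
    raise ValueError('unknown operator: %s' % op)

# Example 10.3.9: with s1 = {p}, s1+ is a model of always-eventually-p,
# (s1 s2)+ is a model, but s1 s2+ is not (taking p false in s2).
s1, s2 = {'p'}, set()
A = ('always', ('eventually', ('var', 'p')))
assert holds(A, [], [s1]) and holds(A, [], [s1, s2]) and not holds(A, [s1], [s2])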
The above examples show that some paths ending with a cycle generate models
and some don’t. Our purpose of using tableaux is to decide if the input formula is
satisfiable or not. A model is just evidence that the input formula is satisfiable. If a
tableau has an open node, we know the input formula is satisfiable because a model
can be constructed from the path which connects the initial node and the open node.
If the tableau has no open nodes, we define below a marking mechanism to check
effectively if the input formula is satisfiable or not.
Definition 10.3.10 Given a tableau which has no expandable nodes, a node n is
marked dead if one of the following conditions is true:
1. n is closed.
2. All successors of n are marked dead.
3. There exists a formula ◇A ∈ F(n) such that there is no node n′ reachable from
n with A ∈ F(n′).
In the third case, the reachability condition is checked among the nodes which
are not marked dead. That is, if a node is marked dead, it cannot be used in any
path. Node n is marked dead because it contains an unfulfilled ◇A: ◇A is true in
a sequence π iff A is true in a suffix of π. If A does not appear in any node n′
reachable from n, then it is impossible to construct such a suffix of π, so π cannot
be a model of F(n).
Example 10.3.11 To show that □(p∧q) → □p is valid, we show that its negation is
unsatisfiable by semantic tableau. The negation normal form of ¬(□(p ∧ q) → □p)
is □(p ∧ q) ∧ ◇¬p, whose tableau is given below:

n0 : □(p ∧ q) ∧ ◇¬p
↓
n1 : □(p ∧ q), ◇¬p
↓
n2 : p ∧ q, ◦□(p ∧ q), ◇¬p
↓
n3 : p, q, ◦□(p ∧ q), ◇¬p
n4 : p, q, ◦□(p ∧ q), ¬p      n5 : p, q, ◦□(p ∧ q), ◦◇¬p
×                             ↓
                              to n1

For ◇¬p ∈ F(n1), there is no node n′ reachable from n1 with ¬p ∈ F(n′). Thus,
n1 is marked dead. n0 is marked dead because its successor is marked dead. In fact,
every node in this tableau can be marked dead.
Example 10.3.12 For the tableau given in Example 10.3.5, ◇p appears in n4. Since
n2 is reachable from n4 through the path (n4, n1, n2) and p ∈ F(n2), n4 cannot be
marked dead.
Algorithm 10.3.13 The algorithm satisfiabilityByTableau takes an LTL formula A
as input and returns “satisfiable” if A is satisfiable, otherwise returns “unsatisfiable.”
proc satisfiabilityByTableau(A)
1. Create a tableau for A:
1.1 Create the initial node that contains the negation normal form of A.
1.2 Apply the α-, β-, and X-rules until no expandable nodes exist.
2. Decide if A is satisfiable:
2.1 If the tableau has an open node, return “satisfiable.”
2.2 Apply Definition 10.3.10 until no nodes can be marked dead.
2.3 If the initial node is marked dead, return “unsatisfiable.”
2.4 Otherwise return “satisfiable.”
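Steps 2.2–2.4 amount to computing a least fixpoint of the three conditions in Definition 10.3.10. The following Python sketch (our own data model; the book does not fix one) makes the marking loop concrete. Each node carries its formula set, successor list, and closed/open flags; formulas reuse the tuple encoding of the earlier sketch.

def satisfiable_by_tableau(nodes, initial):
    """nodes: dict id -> {'formulas': set, 'succ': list, 'closed': bool,
    'open': bool}; returns True iff the input formula is satisfiable."""
    if any(n['open'] for n in nodes.values()):
        return True                                      # step 2.1
    dead = {i for i, n in nodes.items() if n['closed']}  # condition 1
    changed = True
    while changed:                                       # step 2.2: fixpoint
        changed = False
        for i, n in nodes.items():
            if i in dead:
                continue
            if n['succ'] and all(s in dead for s in n['succ']):
                dead.add(i); changed = True; continue    # condition 2
            reach = reachable(i, nodes, dead)
            if any(f[0] == 'eventually' and
                   all(f[1] not in nodes[j]['formulas'] for j in reach)
                   for f in n['formulas']):
                dead.add(i); changed = True              # condition 3
    return initial not in dead                           # steps 2.3-2.4

def reachable(i, nodes, dead):
    """Nodes reachable from i along edges avoiding dead nodes
    (i itself is included only when it lies on a cycle)."""
    seen, stack = set(), [i]
    while stack:
        for s in nodes[stack.pop()]['succ']:
            if s not in dead and s not in seen:
                seen.add(s); stack.append(s)
    return seen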
Example 10.3.14 To show that ◇p ∧ ◇¬p is satisfiable, we construct the following
tableau:

n0 : ◇p ∧ ◇¬p
↓
n1 : ◇p, ◇¬p
n2 : p, ◇¬p        n3 : ◦◇p, ◇¬p
n4 : p, ¬p    n5 : p, ◦◇¬p    n6 : ¬p, ◦◇p    n7 : ◦◇p, ◦◇¬p
×             ↓               ↓               ↓
              n8 : ◇¬p        n9 : ◇p         to n1
              ↓               ↓
n10 : ¬p    n11 : ◦◇¬p    n12 : p    n13 : ◦◇p
            ↓                        ↓
            to n8                    to n9
For ◇p ∈ F(n1), either n5 or n12 contains p; for ◇¬p ∈ F(n1), either n6 or n10
contains ¬p. All four nodes are reachable from n1. n0 cannot be marked dead,
so the algorithm will return “satisfiable.”
Various models can be constructed from this tableau, depending on which path
we choose in the tableau. Let s0 = {¬p}, s1 = {p}, and let s2 denote an interpretation
where we do not care about the truth value of p.
1. (n0 , n1 , n2 , n5 , n8 , n10 ): The derived model is s1 s0 s2∗ , where s1 is derived from
n5 and s0 from n10 . s2∗ is added at the end because n10 is open.
2. (n0 , n1 , n3 , n6 , n9 , n12 ): The derived model is s0 s1 s2∗ , where s0 is derived from
n6 and s1 from n12 .
3. (n0, n1, n2, n5, n8, n11): The last two nodes are in a cycle and no model can be
found from this path. To fulfill ◇¬p ∈ F(n1), we need n6 or n10.
4. (n0 , n1 , n3 , n6 , n9 , n13 ): The last two nodes are in a cycle and no model can be
found from this path.
5. (n0 , n1 , n3 , n7 ): The last three nodes are in a cycle and no model can be found
from this path.
All models of ◇p ∧ ◇¬p can be denoted by s2∗s0s2∗s1s2∗ ∪ s2∗s1s2∗s0s2∗, and each
model corresponds to a path in the tableau.
Example 10.3.15 To show that □(◇p ∧ ◇¬p) is satisfiable, we construct a semantic
tableau where we replace ∧ by “,” and A stands for □(◇p, ◇¬p).

n0 : □(◇p, ◇¬p)
↓
n1 : ◇p, ◇¬p, ◦A
n2 : p, ◇¬p, ◦A        n3 : ◦◇p, ◇¬p, ◦A
n4 : p, ¬p, ◦A    n5 : p, ◦◇¬p, ◦A    n6 : ¬p, ◦◇p, ◦A    n7 : ◦◇p, ◦◇¬p, ◦A
×                 ↓                   ↓                   ↓
n8 : ◇¬p, A    n9 : ◇p, A    n10 : ◇p, ◇¬p, A
↓              ↓             ↓
to n1          to n1         to n1
This tableau has one closed node, i.e., n4, and no open nodes. n4 is the only node
marked dead. ◇p and ◇¬p in n1 are fulfilled by n5 and n6, respectively. n1 cannot
be marked dead, and neither can n0. So, the algorithm will return “satisfiable.”
If we wish to construct a model from the tableau, we need to consider a path
starting from n0 and containing both n5 and n6 . Let s0 = {p} and s1 = {¬p} be
the interpretations derived from X-nodes n5 and n6 , respectively. Then, (s0 s1 )+
is a model of the input formula. Any model requires that we repeat these two
interpretations together; repeating a single interpretation infinitely does not produce
a model. Let s2 be any interpretation where we do not care about the value of p. All the
models of □(◇p ∧ ◇¬p) can be represented by (s2∗s0s2∗s1s2∗ ∪ s2∗s1s2∗s0s2∗)+, and each
model corresponds to some path in the tableau.
The following theorem provides the correctness of Algorithm 10.3.13.
Theorem 10.3.16 An LTL formula A is satisfiable iff Algorithm 10.3.13 returns
“satisfiable.”
Proof First, the algorithm will terminate because (1) the construction of the
tableau terminates (Theorem 10.3.6(a)), and (2) checking if the initial node
is marked dead takes a finite number of steps.
If the tableau has an open node, A is satisfiable by Theorem 10.3.7.
If the initial node is not marked dead, we show that there is a model for A. This
model is constructed from a path that starts with the initial node and contains all the
nodes required by the fulfilling condition.
The path is selected among the nodes not marked dead as follows: (1) the path
starts with the initial node; (2) if the last node of the current path has only one
unmarked successor, add the successor to the path; (3) if the current path contains
a node n and some ◇A ∈ F(n) has not been fulfilled yet, find a node n′ reachable from
n and containing A. If n′ does not appear in the path, append the path from n to n′ to
the current path. If n′ already appears in the path, add only the necessary nodes so that
all the nodes of the cycle containing n and n′ are present in the path. That is, the
last portion of the path contains a list of nodes from the same cycle, and they are
reachable from each other.
Note that if both ◇A and ◇B appear in F(n), we will work on them one at a time:
when ◇A is fulfilled in n′, either ◇B has been fulfilled in the path up to n′ or ◇B is
still present in F(n′). In the latter case, there is another node n″ reachable from n′
and containing B (if not, n′ would have been marked dead).
By repeating the selection, one obtains a path ending with a cycle in which all
formulas of the form ◇C are fulfilled. Let nk0, nk1, ..., nkm be the X-nodes in the path
and s0s1 · · · sm be the interpretations from these X-nodes (keeping them in the same
order). We define the interpretation sequence π as follows:

π = s0s1s2 · · · (sj · · · sm)+,

where sj, ..., sm are the interpretations generated from the X-nodes in the cycle. It
remains to check that π is a model of the initial formula by a structural induction on
the formulas in each node.
We claim that π≥i is a model of F(nki) for 0 ≤ i ≤ m. For any node n other
than X-nodes, if nki is the first X-node following n in the path for some i, then
F(n) ≡ F(nki) and π≥i is also a model of F(n). For any formula A ∈ F(n), the
following statements are true:
• If A does not have any temporal operator, then A is true in π≥i.
• If A is ◦B, then A is true in π≥i because B appears in the node n′ following nki
by the X-rule and B is true in the model of n′ by the induction hypothesis (because
B is smaller than ◦B and this model is the immediate suffix of π≥i).
• If A is ◇B, then A is true in π≥i because B appears in a node n′ reachable from
n, B is true in the model for n′ by the induction hypothesis (B is smaller than A),
and this model is a suffix of π≥i.
• If A is □B, then A is true in π≥i because B is true in π≥i (by the α-rule for □)
and □B is also true in the immediate suffix of π≥i (by the X-rule).
On the other hand, if the initial node is marked dead, then A is unsatisfiable
because for any node n which is marked dead, F (n) is unsatisfiable. This statement
can be proved by induction according to the recursive definition of “n is marked
dead.” There are three cases to consider:
1. n is closed: F (n) contains a pair of complementary literals and cannot be
satisfiable.
2. All successors of n are marked dead: using the induction hypothesis, the formula
sets of the successors are unsatisfiable; F(n) is unsatisfiable by Theorem 10.3.6(b)
and (c).
3. There exists ◇A ∈ F(n) such that no node n′ reachable from n satisfies A ∈ F(n′):
as discussed after Definition 10.3.10, no interpretation sequence can fulfill ◇A, so
F(n) is unsatisfiable.
10.4 Binary Temporal Operators
We will extend LTL by adding U and R to the definition of LTL formulas such
that if A and B are LTL formulas, so are (A U B) and (A R B).
We also need to extend Definition 10.2.1 with the following condition:
Given an interpretation sequence π , an LTL formula C is true in π if
• C is A U B, and there exists i ≥ 0 such that π≥i ⊨ B and for all 0 ≤ j < i, π≥j ⊨ A.
• C is A R B, and either π ⊨ □B or there exists i ≥ 0 such that π≥i ⊨ A and for all
0 ≤ j ≤ i, π≥j ⊨ B.
Theorem 10.4.2 For any LTL formulas A and B:
(a) (idempotency) A U A ≡ A
(b) (abbreviation) ◇A ≡ ⊤ U A
(c) (absorption) A U (A U B) ≡ A U B
(d) (absorption) (A U B) U B ≡ A U B
(e) (distributivity) ◦(A U B) ≡ (◦A) U (◦B)
(f) (induction) A U B ≡ B ∨ (A ∧ ◦(A U B))
Proof (a) For any interpretation sequence π , π≥0 = π . If A is true in π≥0 , then
i = 0, and for 0 ≤ j < i, A is true in π≥j vacuously. So, A U A is true in π≥0 . If A
is false in π≥0 , then A U A is false in π≥0 by definition.
(b) Here, B is ⊤ and C is A. The truth of B U C reduces to the first condition,
that there exists i ≥ 0 with π≥i ⊨ C, which is identical to the condition for ◇C.
(c) For any interpretation sequence π, if ◇B is false, then both A U (A U B) and
A U B are false in π. If ◇B is true, let k be the index such that B is true in π≥k but
false in π≥j for 0 ≤ j < k. Since B is true in π≥k, A U B is trivially true in π≥k.
A U B is true in π iff A is true in π≥j for 0 ≤ j < k. A U (A U B) is true in π iff A
is true in π≥j for 0 ≤ j < k, too.
The proofs of (d) and (e) are left as exercises.
(f) For A U B to be true, either B is true now, or we put off to the next tick the
requirement to satisfy A U B while requiring that A be true now.
Example 10.4.3 By definition, p R q is true in π means either q is true forever, or
after p becomes true, the condition of q being true is released. Let s0 = {p, q},
s1 = {p, ¬q}, s2 = {¬p, q}, and s3 = {¬p, ¬q}. Then
p R q is true in s0+, s2+, s2s0+, and s2s0s3+;
p R q is false in s1+, s1^k s0+ for k ≥ 1, s1^k s2+ for k ≥ 1, s2s1+, s2s3+, and s3+.
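The holds sketch from Sect. 10.3 extends directly to the binary operators: on an ultimately periodic sequence, the earliest position where the stop (or release) condition holds settles the matter. The clauses below are our own code, to be inserted into the same if-chain (with horizon as before); they also spot-check Example 10.4.3 and the duality p R q ≡ ¬(¬p U ¬q).

    if op == 'until':     # some suffix j satisfies B, and A holds on [i, j)
        for j in range(i, horizon):
            if holds(f[2], prefix, loop, j):
                return all(holds(f[1], prefix, loop, k) for k in range(i, j))
        return False      # B never becomes true in any suffix
    if op == 'release':   # either B holds forever, or B holds up to and
        for j in range(i, horizon):           # including the first A
            if holds(f[1], prefix, loop, j):
                return all(holds(f[2], prefix, loop, k) for k in range(i, j + 1))
        return all(holds(f[2], prefix, loop, j) for j in range(i, horizon))

# Example 10.4.3 and duality, with s0..s3 as in the example:
s0, s1, s2, s3 = {'p', 'q'}, {'p'}, {'q'}, set()
pRq = ('release', ('var', 'p'), ('var', 'q'))
dual = ('not', ('until', ('not', ('var', 'p')), ('not', ('var', 'q'))))
for pre, lp in [([], [s0]), ([], [s2]), ([s2], [s0]), ([s2], [s1]), ([s1], [s3])]:
    assert holds(pRq, pre, lp) == holds(dual, pre, lp)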
The following theorem states some properties of R.

Theorem 10.4.4 For any LTL formulas A and B:
(a) (idempotency) A R A ≡ A
(b) (abbreviation) □A ≡ ⊥ R A
(c) (absorption) A R (A R B) ≡ A R B
(d) (absorption) (A R B) R B ≡ A R B
(e) (distributivity) ◦(A R B) ≡ (◦A) R (◦B)
(f) (induction) A R B ≡ B ∧ (A ∨ ◦(A R B))
(g) (duality) A R B ≡ ¬(¬A U ¬B)
Proof (a)–(f) can be deduced from (g) and Theorem 10.4.2. To simplify the proof
of (g), we prove p R q ≡ ¬(¬p U ¬q), where p and q are propositional variables.
Replacing π(i) by π≥i in the following proof would give us the proof of A R B ≡
¬(¬A U ¬B) by the substitution theorem.
Assume p R q is true in π. If q is always true, i.e., □q is true, then ◇¬q is false,
and hence ¬p U ¬q is false. Thus, ¬(¬p U ¬q) is true in π. If q is not always true, let k be
minimal such that q is false in π(k). Since p R q is true in π and q is false in π(k),
by the semantics of R , there must exist i < k such that p is true in π(i) and for all
0 ≤ j ≤ i, q is true in π(j). In this case, let us check the truth value of ¬p U ¬q:
since π(k) is the first interpretation in which ¬q is true and ¬p is false in π(i) for
some i < k, ¬p U ¬q must be false in π by the semantics of U. Hence, ¬(¬p U ¬q) must
be true in π .
Now, assume p R q is false in π. By the semantics of R, □q cannot be true;
let k be minimal such that q is false in π(k). Hence, q is true in π(j) for 0 ≤ j < k.
Because p R q is false in π , by the semantics of R , p must be false in π(j ) for
0 ≤ j < k. Equivalently, ¬p must be true in π(j ) for 0 ≤ j < k. Hence, ¬p U ¬q
is true in π by the semantics of U . So, we conclude that ¬(¬p U ¬q) is false in π .
The above two cases complete the proof that p R q is true in π iff ¬(¬p U ¬q) is
true in π for arbitrary π .
Some researchers also define a weak until binary operator, denoted by W, with
semantics similar to that of the until operator, except that the stop condition is not
required to occur (similar to release). The strong release binary operator, denoted
by S, is the dual of weak until. It is defined similarly to the release operator, except
that the release condition has to become true at some point. Therefore, it is stronger
than the release operator.
Definition 10.4.5 The additional temporal operators are defined as follows:
• Weak until A W B: A has to be true at least until B becomes true; if B never
becomes true, A must remain true forever.
• Strong release A S B: B has to be true until and including the state where A first
becomes true, and A must be true at the current or a future state.
Instead of extending Definition 10.2.1 to include the formal meanings of A W B
and A S B, we provide below the equivalent relations as an alternative definition:
Theorem 10.4.6 For any LTL formulas A and B,
(a) A W B ≡ (A U B) ∨ □A
(b) A S B ≡ (A R B) ∧ ◇A
The above theorem shows that W and S can be defined in terms of U and R, just
like → and ↔ can be defined in terms of ¬, ∧, and ∨. We know that R can be defined
in terms of U by duality. In fact, any one of the four binary temporal operators
can be used to define the other three operators. We have seen in Theorems 10.4.2
and 10.4.4 that ◇ and □ can be defined in terms of U and R. Thus, the minimal set
of LTL needs only two temporal operators: ◦ and one of the four binary operators.
Theorem 10.4.7 For any LTL formulas A and B:
(a) A U B ≡ ◇B ∧ (A W B)
(b) A R B ≡ □B ∨ (A S B)
(c) A U B ≡ B S (A ∨ B)
(d) A R B ≡ B W (A ∧ B)
(e) A W B ≡ B R (A ∨ B)
(f) A S B ≡ B U (A ∧ B)
(g) A W B ≡ A U (□A ∨ B)
(h) A S B ≡ A R (◇A ∧ B)
(i) □A ≡ A W ⊥
(j) ◇A ≡ A S ⊤
(k) ¬(A S B) ≡ ¬A W ¬B
The last equivalence is the duality of W and S. The proof of this theorem is left
as an exercise.
To handle A U B and A R B in semantic tableaux, we add the following expansion rules:

α          α1, α2
A R B      B, A ∨ ◦(A R B)

β          β1      β2
A U B      B       A, ◦(A U B)

The α-rule is based on Theorem 10.4.4(f), and the β-rule is based on Theorem 10.4.2(f).
After a tableau is constructed using these rules, we need to decide if a model
can be generated from the tableau. In this case, the definition of dead nodes
(Definition 10.3.10) needs to be extended to consider the occurrence of (A U B)
in a node of the tableau: a node n is marked dead if there exists a formula
(A U B) ∈ F(n) such that there is no node n′ reachable from n with B ∈ F(n′).
10.5 Verification of Concurrent Programs

This chapter begins with an example of how to access a bank account concurrently
and correctly. This example serves as a motivation for the application of temporal
logic to the formal verification of concurrent programs. We will end this chapter
with an example of formal verification, which is taken from a tutorial of the
software system STeP (Stanford Temporal Prover). STeP is a system developed
by Zohar Manna et al. at Stanford University to support the computer-aided formal
verification of concurrent and reactive systems based on Hoare logic and temporal
logic. At Carnegie Mellon University, Edmund Clarke et al. developed multiple tools
based on model checking (MC) for the same purpose. Efficient automated reasoning
techniques, such as simplification methods, decision procedures, and temporal logic,
are used in STeP and some model checkers. Because of space limitations, we will
introduce model checking briefly and leave STeP for interested readers to explore.
Model checking is a general formal verification paradigm for both imperative and
functional programs. In model checking, hardware/software/algorithm designs are
modeled by finite state machines, properties are written in propositional temporal
logic, and the verification procedure is an exhaustive search in the state space of the
design. To avoid exponential space explosion, bounds are used to limit the path
lengths in finite state machines, hence the name bounded model checking. Model
checking aims at finding diagnostic counterexamples, such as unexpected behavior
of a circuit or possible violation of a safety property. When no counterexamples are
found, the verification is considered “complete.”
When LTL (linear temporal logic) is used with model checking, a finite state
machine M is a Kripke frame K = (S, R), where S is a set of states, each of which
is an interpretation of propositional variables in LTL, and R is a binary relation over
S, plus the following information:
• A nonempty subset I of S is identified as the initial states of M.
• Optionally, a subset F of S is identified as the final states of M.
• Optionally, labels are assigned to each transition in R.
Example 10.5.1 Suppose you are asked to implement an online ticket booth to sell
tickets for an event. You come up with the following procedure, where k is the total
number of tickets to be sold and reserved is a Boolean variable for “a reservation
request is being processed.”
proc ticketing()
precondition: nTickets = k > 0 ∧ reserved = false
l1 while (true) do
l2 if (nTickets > 0 ∧ ¬ reserved)
l3 reserved := true
l4 if (nTickets > 0 ∧ reserved)
l5 reserved := false
l6 nTickets := nTickets − 1 // critical area
od
• Other lines of the code do not change the states, and the state where nTickets = 0
and reserved is true is not accessible from the initial state, where nTickets > 0 and
reserved is false.
Based on the above analysis, we create a finite-state machine M shown in
Fig. 10.4 for procedure ticketing. This machine has three useful states: s0 , s1 , and
s2 , and s0 is the initial state. The code labels are attached to each transition to help
understand how the state changes during the execution.
For the reader who knows finite-state machines well, the machine M in Fig. 10.4
is a nondeterministic one, because from state s0, the transition may go to either s0
or s2 with the same label l6. This is clear from the tabular representation of M on the
right side of the figure.
In Sect. 10.2.3, we have shown that an infinite path in a Kripke frame
can be conveniently represented by a regular sequence of states, that is,
s0 s1 · · · sk (sk+1 sk+2 · · · sk+m )∗ . We will use them in a finite state machine as model
candidates of LTL formulas.
If we concatenate the labels assigned to the transitions (si, si+1) of a given path,
we obtain a string of labels, which is called the trace of the execution path.
For instance, the following is a path of M in Fig. 10.4 with explicit labels on each
transition.
l1 l2 l3 l4 l5 l6 l1 l2 l3 l4 l5 l6
s0 → s0 → s0 → s1 → s1 → s0 → s0 → s0 → s0 → s1 → s1 → s0 → s2
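The notions of path and trace can be made concrete with a small data structure. The sketch below is our own encoding (the book only draws M in Fig. 10.4): a labeled transition system with initial states, and a check that a state sequence together with a trace is indeed a path.

from dataclasses import dataclass, field

@dataclass
class FSM:
    states: dict                  # state name -> set of true variables
    trans: set                    # set of (source, label, target) triples
    initial: set                  # nonempty set of initial state names
    final: set = field(default_factory=set)   # optional final states

def is_path(m, path, labels):
    """path has one more state than labels; the labels form the trace."""
    return (path[0] in m.initial and
            all((s, l, t) in m.trans
                for s, l, t in zip(path, labels, path[1:])))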
Suppose procedure ticketing is called every time a customer places an online order. It
means that multiple copies of the procedure will run concurrently. The computation
of a concurrent program is viewed as the interleaving of the atomic operations of its
processes, where each process is a sequential program and has a control variable to
locate the code of the current execution.
Concurrent procedures create many problems not seen in sequential procedures.
For instance, if two copies of ticketing run line l6, i.e., nTickets := nTickets − 1,
simultaneously, the end result can be wrong: if nTickets = 5 before the execution,
both copies may write 4, not 3, into nTickets. If reserved is a global variable and
copy 1 runs l5 right after copy 2 runs l3, then the customer running copy 2 will not
get the ticket.
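The lost-update scenario can be reproduced, with some luck in scheduling, by a few lines of Python. This sketch is ours and only illustrates the interleaving: both threads may read 5 before either writes, so the final value can be 4 instead of 3.

import threading

nTickets = 5

def critical_line():
    global nTickets
    tmp = nTickets       # read the shared variable
    tmp = tmp - 1        # compute locally
    nTickets = tmp       # write back, possibly clobbering the other update

t1 = threading.Thread(target=critical_line)
t2 = threading.Thread(target=critical_line)
t1.start(); t2.start(); t1.join(); t2.join()
print(nTickets)          # usually 3, but 4 is a possible (wrong) outcome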
Concurrent procedures should possess the following properties:
• Mutual exclusion: Execution is never in the critical section (line l6 of ticketing)
at the same time for multiple processes.
• Accessibility: Once a process has expressed interest in entering the critical
section, it will eventually do so. For ticketing, if a process resides at l3 , it will
always eventually reach l6 .
• One-bounded overtaking: If one process wants to enter the critical section, the
other process can enter the critical section at most once before the first one does.
Mutual exclusion and one-bounded overtaking belong to the class of temporal
safety properties. Informally, such properties state that something “bad” can never
happen and are falsified if a bad state is ever reached: if a safety formula A is false
in a model, then there is a finite prefix of the model such that A is also false in every
extension of this prefix.
Accessibility, on the other hand, is a response property and belongs to the
larger class of progress properties. These properties state that something “good”
is guaranteed to happen eventually. The verification of each class of properties has
its particular requirements. For instance, safety properties are independent of the
termination requirements of the given program, whereas the verification of response
properties relies on termination.
It is a challenge to design concurrent programs with these properties. Apparently,
ticketing() cannot be used as a concurrent program because it does not have any of
the above three properties.
Early solutions to mutual exclusion suffered from starvation, where a process could
forever be denied access to the critical section, and some later solutions do not
satisfy one-bounded overtaking. Lamport’s bakery algorithm, discovered by Leslie
Lamport in 1974, is the first solution that satisfies the above three properties for the
general N-process case (in our example, N = 2).
Figure 10.5 shows a program taken from a tutorial of STeP that implements
Lamport’s bakery algorithm for mutual exclusion. Two processes, P1 and P2,
coordinate access to a critical section, where at most one process should reside at
any given time. For instance, modifying the balance of a bank account should be
done in a critical section. Each process selects a “ticket” which is a number in y1
and y2 , respectively, as the customers in a bank would, and the process with the
lower ticket is allowed to enter the critical section. Initially, y1 = y2 = 0. A ticket
with number 0 indicates that the process is not interested in accessing the critical
section. Since there are only two processes, it can be proved that y1 , y2 ∈ {0, 1, 2}.
This informal description of the algorithm can be made precise if the system and its
properties are modeled formally.
The meaning of “await (y2 = 0 ∨ y1 ≤ y2 )” is that the execution pauses until
the condition (y2 = 0 ∨ y1 ≤ y2 ) is true. This is the same as
while ¬(y2 = 0 ∨ y1 ≤ y2 ) /* do nothing */
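For concreteness, here is a Python transcription of BAKERY(2) (our own sketch; Fig. 10.5 is the authoritative version). We assume P2's await condition is the mirror image y1 = 0 ∨ y2 < y1, with the strict inequality breaking ties in favor of P1; busy-waiting stands in for await, and memory-model subtleties are ignored.

import threading

y = [0, 0]                      # y[0] is y1, y[1] is y2; 0 = not interested

def may_enter(me, other):
    if me == 0:                 # P1 awaits: y2 = 0 or y1 <= y2
        return y[other] == 0 or y[me] <= y[other]
    return y[other] == 0 or y[me] < y[other]   # P2 (assumed): y1 = 0 or y2 < y1

def process(me):
    other = 1 - me
    for _ in range(3):          # l0/m0: noncritical section, repeated
        y[me] = y[other] + 1    # l1/m1: take a ticket
        while not may_enter(me, other):
            pass                # l2/m2: await
        print('process', me + 1, 'in critical section')   # l3/m3
        y[me] = 0               # l4/m4: leave

threads = [threading.Thread(target=process, args=(i,)) for i in (0, 1)]
for t in threads: t.start()
for t in threads: t.join()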
To describe the execution of BAKERY(2) in LTL, we introduce three proposi-
tional variables, i.e., p1 , p2 , and q, with the meaning that p1 is y1 > 0, p2 is
y2 > 0, and q is y1 > y2 . We then build a finite state machine based on them. Three
variables may create a maximal number of eight states. However, the meaning of
the three variables rules out three states: ¬p1 p2 q, ¬p1 ¬p2 q, and p1 ¬p2 ¬q. Hence, our finite
state machine has five states. Initially, y1 = y2 = 0; hence, p1 , p2 , q are false,
representing the initial state. Following the execution of BAKERY(2) line by line, we
create the transitions between states, shown in Fig. 10.6. Strictly speaking, this is not
a finite state machine that we see in theory of computation, because the transitions
from s0 to s4 and s5 model the executions of lines l1 and m1 at the same time with
two possible outcomes (y1 > y2 is true or false).
Following the convention from Sect. 10.5.1, we call a path in a finite state
machine a run if the trace of the path represents a program execution. For
BAKERY(2), the trace of a run should be a “shuffle” or “interleaving” of two strings
from (l0+ l1 l2+ l3 l4)∗ and (m0+ m1 m2+ m3 m4)∗, where l0+ (resp. l2+, m0+, m2+)
means l0 can repeat one or more times, to model the effect of working in the
noncritical section or of the await command. For instance, the sequence
l0 l1 l2 l3 m0 m1 l4 m2 m3 m4 shows that
P1 enters the critical section once, while P2 starts its own at that moment. The run
associated with this trace is s0 s0 s1 s1 s1 s1 s4 s2 s2 s2 s0 , which can be verified below:
l0 l1 l2 l3 m0 m1 l4 m2 m3 m4
s0 → s0 → s1 → s1 → s1 → s1 → s4 → s2 → s2 → s2 → s0
We cannot prove the validity of the above formula since we do not have
axioms about p1 , p2 , and q. However, we have a finite state machine in Fig. 10.6
which models the program execution. To show accessibility, we show that the
negation of the above formula has no acceptable models in the finite state
machine.
Let us first consider ¬A, where A is □(p1 → ◇(¬p2 ∨ ¬q)). ¬A can be simplified to
◇(p1 ∧ □(p2 ∧ q)). The only models of ¬A in M are those regular sequences that
end with s5∗. However, they are discarded because they are not runs. Similarly, ¬B,
where B is □(p2 → ◇(¬p1 ∨ q)), has only models which end with s4∗, and they
are discarded for the same reason.
The accessibility property rules out the case of starvation, which produces
traces of form . . . l0 l1 (m0 m1 m2 m3 m4 )∗ . The next property rules out the case that
m3 appears more than once between l1 and l4 in any trace.
• One-bounded overtaking: This property is easily expressed as a property of
traces of BAKERY(2): in any trace of a run, between l1 (P1 makes a request to
enter the critical section) and l4 (P1 leaves the critical section), m3 (P2 enters
the critical section) can appear at most once. The dual property is obtained by
switching P1 and P2.
Since l1 (resp. m1) is associated with p1 (resp. p2) becoming true and l4 (resp.
m4) is associated with p1 (resp. p2) becoming false, we may use the following
formula to express the one-bounded overtaking property:

□(p1 → (¬p2 W (p2 W (¬p2 W ¬p1))))
This formula states that whenever control is at l2 (p1 is true), meaning that
P1 wants to enter the critical section, the following must occur: there may be an
interval in which P2 is not in the critical section (so all states in the interval satisfy
¬p2), followed by an interval where P2 is in the critical section (states satisfying
p2), followed by an interval where P2 is again not in the critical section (states
satisfying ¬p2), followed finally by a state where P1 leaves the critical section
(i.e., l4, and p1 is false). Thus, P2 can enter the critical section (m3) at most once
before P1 leaves the critical section (p1 becomes false).
Again, we can check that the negation of the above formula has no acceptable
models for the finite state machine of BAKERY(2).
A general version, BAKERY(N), given in Fig. 10.7 [2], is an oversimplification
of Lamport's bakery algorithm: when two or more processes have the same
positive y-values, i.e., yi = yj > 0 for i ≠ j (it means they made the request
at the same time), Lamport's algorithm will use i < j to break the tie, whereas
BAKERY(N) will allow them to finish the waiting at the same time. As a result, the
mutual exclusion property of BAKERY(N) will not hold; the other two properties of
BAKERY(2) carry over to BAKERY(N). As an exercise, you are asked to
modify BAKERY(N) so that the mutual exclusion property holds for the modified
BAKERY(N).
Exercises
1. For the TL formula A = p → p, find two Kripke frames such that A is true
in one and false in the other.
2. For the TL formula A = (p ∨ q) → (p ∨ q), find two Kripke frames such
that A is true in one and false in the other.
3. Prove that the following TL formulas are valid:
(a) p ≡ p
(b) p ≡ p.
(a) p → p
(b) p → p
(c) p → p
(d) (p → q) ∧ (q → r) → (p → r)
(a) ◦(p ∧ q) ≡ ◦p ∧ ◦q
(b) ◦(p ∨ q) ≡ ◦p ∨ ◦q.
13. Prove by semantic tableau that the LTL formulas are valid:
14. Find an equivalent negation normal form for the following formulas:
(a) (A U B) ↔ ◇B ∧ (A W B)
(b) (A R B) ↔ □B ∨ (A S B)
(c) (A U B) ↔ B S (A ∨ B)
(d) (A R B) ↔ B W (A ∧ B)
(e) (A W B) ↔ B R (A ∨ B)
(f) (A S B) ↔ B U (A ∧ B)
(g) (A W B) ↔ A U (□A ∨ B)
(h) (A S B) ↔ A R (◇A ∧ B)
(i) □A ↔ A W ⊥
(j) ◇A ↔ A S ⊤
(k) ¬(A S B) ↔ ¬A W ¬B
15. Prove by semantic tableau that the LTL formulas are valid:
(a) pRp → p
(b) ⊥ R p → □p
(c) p R (p R q) → p R q
(d) (p R q) R q → p R q
(e) p R q → q ∧ (p ∨ ◦(p R q))
(f ) p R q → ¬(¬p U ¬q)
16. Prove by semantic tableau that the LTL formulas are valid:
(a) p U q → ◇q ∧ (p W q)
(b) p R q → □q ∨ (p S q)
(c) p U q → q S (p ∨ q)
(d) p R q → q W (p ∧ q)
(e) p W q → q R (p ∨ q)
(f) p S q → q U (p ∧ q)
(g) p W q → p U (□p ∨ q)
(h) p S q → p R (◇p ∧ q)
(i) □p → p W ⊥
(j) ◇p → p S ⊤
(k) ¬(p S q) → ¬p W ¬q
17. Prove the following statements are true in LTL: (a) (A U B) U B ≡ A U B and
(b) ◦ (A U B) ≡ (◦A) U (◦B).
18. Decide how many propositional variables are needed for modeling BAKERY(N )
(Fig. 10.7) with N = 4. Please express the conditions of line l2 for the four
processes in terms of these variables.
19. (a) Modify the bakery algorithm BAKERY(N ) in Fig. 10.7 to provide a correct
description of Lamport’s bakery algorithm.
(b) For N = 3, express the condition of the await command in the modified
BAKERY (N ) as a Boolean expression of a programming language in P1,
P2, and P3, respectively.
(c) For N = 3, we define the following propositional variables: p1 (denotes
y1 > 0), p2 (y2 > 0), p3 (y3 > 0), q1 (y1 > y2), q2 (y2 > y3), q3
(y3 > y1), r1 (y1 = y2), r2 (y2 = y3), and r3 (y1 = y3). Do we need all
nine variables for the modified BAKERY(N)? Please answer this question
by expressing the await conditions of P1, P2, and P3 as formulas of these
nine variables.
20. Prove the mutual exclusion and accessibility properties of the program given
below:
References
1. Yde Venema, Temporal Logic, in Lou Goble (ed.) The Blackwell Guide to Philosophical Logic.
Blackwell, 2001
2. Nikolaj S. Bjorner, Anca Browne, Michael A. Colon, Bernd Finkbeiner, Zohar Manna, Henny
B. Sipma, Tomas E. Uribe, “Verifying temporal properties of reactive systems: A STeP tutorial”,
Formal Methods in System Design, 16, 1–45, 2000
3. Edmund M. Clarke Jr., Orna Grumberg, Daniel Kroening, Doron Peled, and Helmut Veith,
Model Checking, 2nd Ed, MIT Press, 2018. ISBN: 978-0262038836
Part IV
Logic of Computability
Chapter 11
Decidable and Undecidable Problems
In computer science, undecidability theory studies the problems that are beyond
the power of computers and is a part of computability theory. In logic, undecidability
concerns which problems are computable or uncomputable and which
formulas can be proved, disproved, or are unprovable. For example, all the problems
of propositional logic discussed in this book are computable. On the other hand,
some problems in first-order logic are uncomputable, and we will see some of
them throughout the chapter. The objective of this chapter is to introduce the formal
concepts of decidable and undecidable decision problems, as well as computable
and uncomputable functions, and to show that some problems in first-order logic are
undecidable or uncomputable. We leave the discussion of decidable problems in
first-order logic to Chap. 12, where we will present several decidable fragments of
first-order logic.
The cornerstone of undecidability theory comes from Kurt Gödel's two
incompleteness theorems:
1. In any consistent formal system S which can perform common arithmetic
operations on natural numbers, some formulas of S cannot be proved to be true
or false in S.
2. If S is consistent, then we cannot prove in S that S itself is consistent.
The Turing machine is just one of many formal computing models ever proposed. In
the 1930s, several independent attempts were made to formalize the notion of
computability [1]:
• In 1931, Gödel proved the first incompleteness theorem for a class called
primitive recursive functions. In 1934, Gödel created a more general class called
general recursive functions, also called partial recursive functions. The class
of general recursive functions is the smallest class of functions (possibly with
more than one argument) which includes all constant functions, projections,
and the successor function and is closed under function composition, recursion,
and minimization. If minimization is excluded, then it becomes the class of
primitive recursive functions. In 1936, Stephen Kleene introduced variants of
Gödel’s definition and defined a recursion theory equivalent to general recursive
functions.
• In 1932, Alonzo Church created a method for defining functions called lambda
calculus, often written as λ-calculus. Later, Church defined an encoding of the
natural numbers called Church numerals. A function on the natural numbers is
called λ-computable if the corresponding function on the Church numerals can
be represented by a term of the λ-calculus.
• In 1936, before learning of Church’s λ-calculus, Alan Turing created a theo-
retical computing model, now called Turing machines, which could carry out
calculations from inputs by manipulating symbols on a tape. Given a suitable
encoding of the natural numbers as sequences of symbols, a function on the
natural numbers is called Turing computable if there exists a Turing machine
that starts with the encoded natural numbers as input and stops with the result of
the corresponding function as output.
Church and Turing proved that these three formally defined classes of com-
putable functions coincide: a function is λ-computable iff it is Turing computable
and iff it is a general recursive function. This has led mathematicians and computer
scientists to believe that the concept of computability is accurately characterized
by these three equivalent computing models. Other formal attempts to characterize
computability, including Kleene’s recursion theory and Post’s canonical system,
have subsequently strengthened this belief. Since we cannot exhaust all possible
computing models, the Church–Turing thesis, although it has near-universal accep-
tance, cannot be formally proved.
Today, programming languages based on general recursive functions (or lambda
calculus or Kleene’s recursion theory) are called functional programming languages.
On the other hand, programming languages based on the random-access stored-
program (RASP) machines are called imperative programming languages. RASP
is a computing model equivalent in power to the above three computing models and
is an example of the so-called von Neumann architecture, which uses random-access
memories and stored programs. Lisp, ML, Clojure, and Haskell are examples of the
former; C, C++, Java, and Python are examples of the latter.
proc Collatz(x)
while x > 1 do
if even(x) then x := x/2 else x := (3 ∗ x + 1)/2
return 1
f (1) → 1
f (2x) → f (x)
f (2x + 1) → f (3x + 2)
The termination of the above rewrite system would imply that the Collatz
conjecture is solved positively.
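Here is the same procedure as runnable Python (our transcription); whether collatz(x) terminates for every positive x is exactly the open conjecture.

def collatz(x):
    """Python transcription of proc Collatz(x)."""
    while x > 1:
        if x % 2 == 0:
            x = x // 2
        else:
            x = (3 * x + 1) // 2
    return 1

assert collatz(27) == 1   # terminates after many iterations; no proof for all x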
In this chapter, we will formalize the aforementioned concepts such as com-
putability, decidability, undecidable problems, etc., based on Turing machines, and
show certain problems in logic are undecidable.
a Turing machine. In other words, despite the model's simplicity, Turing machines
are capable of simulating any computation on today's computers, whether on a
supercomputer or a smartphone.
Turing thought that when a human calculates with pen and paper, at any given
moment of the calculation, the mind can be in only one of a finite collection of
states, and that in each state, given the intermediate results thus far obtained, the
next calculation step is completely determined. The Turing machine captures all the
essential features of such a human computer.
A Turing machine operates on a tape containing an infinite number of cells, where
each cell contains one symbol. The machine positions its “head” over a cell and
“reads” the symbol there. Based on that symbol and the machine's current state, it
consults a user-specified set of instructions and operates as follows:
(a) Read the tape symbol pointed by the tape head.
(b) Get the user-specified instructions according to the symbol from (a) and the
current state. If no instructions are available, halt with “failure.” Otherwise,
perform (c)–(e) according to the instructions.
(c) Write a symbol into the current cell.
(d) Move the tape head one cell either left or right.
(e) Change the current state. If the new state is “success,” halt with “success.”
(f) Go to (a).
The tape is initially filled with an input string of symbols and blank everywhere
else. The machine positions its head to the first symbol with the designated state
called the “initial state.” The machine moves as described above through the steps
(a)–(f). There are three possible outcomes: (1) stop with “success”; (2) stop with
“reject”; or (3) loop forever.
Turing was able to answer a fundamental question in the negative: does an
algorithm exist that can determine whether a Turing machine stops on a given input
string? This is the so-called “halting problem” of Turing machines.
A Turing machine can do everything that a real computer can do. However, its
minimalist design makes it unsuitable for computation in practice: real-world
computers are based on different designs that, unlike Turing machines, use
random-access memory. For readers who know programming: you may view a
Turing machine's tape as a very long array of symbols with a single index variable
of the array (tape head). The index can increase or decrease by 1 to simulate the
move of the tape head. The user’s instructions can be stored in a two-dimensional
array indexed by the tape symbols and the state symbols.
While a Turing machine can express arbitrary computations, nonetheless, even
a Turing machine cannot solve certain problems. In a very real sense, these
problems are beyond the theoretical limits of computation and show the existence
of fundamental limitations on the power of mechanical computation.
A Turing machine (TM) uses a tape of infinitely many cells, which provides
unlimited memory for computation. The set of symbols appearing on the tape is
denoted by Γ, called the tape alphabet. By default, Γ is a superset of Σ, which is
the set of symbols used in all input strings. The blank symbol, ⊔, is in Γ − Σ. Σ∗
denotes the set of all input strings (see Example 1.3.9) and Γ∗ denotes the set of all
tape contents excluding the trailing blank symbols.
For any string w ∈ Σ∗, let w^0 = ε and w^{i+1} = ww^i, where i ≥ 0. For example,
a^3 = aaa and (ab)^2 = abab.
Since each move of a Turing machine is determined by the symbol pointed to by
the head and the current state, we formally define moves as a general
function δ which takes a state and a tape symbol as input and outputs (i) the next
state, (ii) the symbol to be written into the current cell, and (iii) the move direction
of the tape head. The definition of δ decides the behavior of a Turing machine. If the
user (i.e., the designer of the Turing machine) does not provide the definition of δ for
a state (other than qa) and a tape symbol, it is regarded as “reject” (by blocking the
next move), which is equivalent to entering qr.
Below is the formal definition of a Turing machine.
Definition 11.2.2 A Turing machine is a 6-tuple M = (Q, Σ, Γ, δ, q0, qa), where
Q is a finite set of states, Σ is the finite alphabet of input strings, Γ is the finite set of
tape symbols, Σ ⊂ Γ, ⊔ ∈ Γ − Σ, the initial state q0 ∈ Q, the accept state qa ∈ Q,
and the partial function

δ : Q × Γ → Q × Γ × {L, R}

defines each move of M, where L and R indicate that the tape head moves left or right,
respectively. If the tape head points to the first symbol on the tape, it cannot move
left (i.e., “move off the tape” is blocked).
Example 11.2.3 Let A0 = {0^i 1^j | i, j > 0}. We want to construct M0 =
(Q, Σ, Γ, δ, q0, qa) to accept the strings in A0, where Σ = {0, 1}, Γ = {0, 1, ⊔},
and Q = {q0, q1, q2, qa}. The tasks already performed before entering a state, or to
be performed in this state, are the following:
• q0: check if the input starts with 0.
• q1: the first 0 is found; skip the remaining 0s and look for the first 1.
• q2: the first 1 is found; skip the remaining 1s and look for ⊔, the end of the input.
• qa: report “accept.”
The δ function is defined as follows:

δ(q0, 0) = (q1, 0, R)   // the first 0 is found
δ(q1, 0) = (q1, 0, R)   // skip the remaining 0s
δ(q1, 1) = (q2, 1, R)   // the first 1 is found
δ(q2, 1) = (q2, 1, R)   // skip the remaining 1s
δ(q2, ⊔) = (qa, ⊔, R)   // end of input reached: accept
M0 scans the tape once and does not change the tape content. Once it enters qa, the
input must contain a string of the form 0^i 1^j, where i > 0 and j > 0. For other input
strings, M0 stops at a point where δ(q, a) is undefined, and the result is “reject.”
Alternatively, we may enter qr, the designated rejecting state, when one of these
undefined cases happens.
The undefined cases (q, a) of δ for M0, and the reasons for rejecting the input, are:

(q, a)      Reason
(q0, ⊔)     The input is the empty string
(q0, 1)     The input starts with 1
(q1, ⊔)     The input misses 1
(q2, 0)     0 appears after 1

Example 11.2.4 Continuing from Example 11.2.1, based on the algorithm in
Example 11.2.1, we want to construct M1 = (Q, Σ, Γ, δ, q0, qa) to accept the
strings in A1. The δ function of M1 is defined as follows:
1  δ(q0, ⊔) = (qa, ⊔, R)   // j = k = 0, ε ∈ A1
2  δ(q0, a) = (q1, x, R)   // j > 0, j := j − 1, go to q1
3  δ(q0, b) = (q3, b, R)   // j = 0, k > 0, go to q3
4  δ(q0, y) = (q3, y, R)   // j = 0, skip y, go to q3
5  δ(q1, a) = (q1, a, R)   // skip a, continue right
6  δ(q1, y) = (q1, y, R)   // skip y, continue right
7  δ(q1, b) = (q2, y, L)   // k := k − 1, move back and go to q2
8  δ(q2, y) = (q2, y, L)   // skip y, continue left
9  δ(q2, a) = (q2, a, L)   // skip a, continue left
10 δ(q2, x) = (q0, x, R)   // found x, go to q0 and start a new round
11 δ(q3, y) = (q3, y, R)   // skip y, continue right
12 δ(q3, b) = (q3, b, R)   // skip b, continue right
13 δ(q3, ⊔) = (qa, ⊔, L)   // j = 0, k ≥ 0, b^k ∈ A1
From the above instructions, we see that M1 enters qa (the “success” state) at line 1
(ε ∈ A1) and line 13 (j = 0, k ≥ 0, b^k ∈ A1, after the same number of copies
of a and b are erased). M1 is blocked when either the state is q1 and the current
symbol is ⊔ (more a symbols than b symbols in the input), or the state is q3 and the
current symbol is a (a appears after b symbols).
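The moves of M1 can be replayed by a small simulator. The following Python sketch is our own encoding (BLANK stands for ⊔, and a missing δ entry means the machine blocks and rejects); it encodes the thirteen instructions above and checks the two runs of Example 11.2.6.

BLANK = ' '

def run(delta, q0, qa, w):
    """Simulate a TM; return 1 if it reaches qa, 0 if it blocks."""
    tape, head, q = list(w) or [BLANK], 0, q0
    while q != qa:
        if (q, tape[head]) not in delta:
            return 0                        # blocked: reject
        q, b, d = delta[(q, tape[head])]
        tape[head] = b
        head = head + 1 if d == 'R' else max(head - 1, 0)
        if head == len(tape):
            tape.append(BLANK)              # extend the tape on demand
    return 1

delta = {('q0', BLANK): ('qa', BLANK, 'R'), ('q0', 'a'): ('q1', 'x', 'R'),
         ('q0', 'b'): ('q3', 'b', 'R'),     ('q0', 'y'): ('q3', 'y', 'R'),
         ('q1', 'a'): ('q1', 'a', 'R'),     ('q1', 'y'): ('q1', 'y', 'R'),
         ('q1', 'b'): ('q2', 'y', 'L'),     ('q2', 'y'): ('q2', 'y', 'L'),
         ('q2', 'a'): ('q2', 'a', 'L'),     ('q2', 'x'): ('q0', 'x', 'R'),
         ('q3', 'y'): ('q3', 'y', 'R'),     ('q3', 'b'): ('q3', 'b', 'R'),
         ('q3', BLANK): ('qa', BLANK, 'L')}

assert run(delta, 'q0', 'qa', 'aabb') == 1   # accepted, as in Example 11.2.6
assert run(delta, 'q0', 'qa', 'aba') == 0    # blocked at delta(q3, a)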
Definition 11.2.5 Given an input string w to a Turing machine M =
(Q, Σ, Γ, δ, q0, qa), let q ∈ Q be the current state, a ∈ Γ be the symbol pointed to by
the tape head, and α and β be the strings of tape symbols before and after the tape
head, respectively. The triple ⟨α, q, aβ⟩, or simply αqaβ, is called a configuration of
M. The initial configuration is q0w, where w is the input string. A legal move of M
is a pair (C1, C2) of configurations, written C1 ⊢ C2, such that either C1 = αqaβ,
δ(q, a) = (p, b, R), and C2 = αbpβ, or C1 = αcqaβ, δ(q, a) = (p, b, L), and
C2 = αpcbβ.
In the above definition, we assume that β does not contain blank symbols
after the last non-blank symbol on the tape.
Example 11.2.6 Continuing from the previous example, a sequence of moves of
M1 on the input string aabb is: q0aabb ⊢ xq1abb ⊢ xaq1bb ⊢ xq2ayb ⊢
q2xayb ⊢ xq0ayb ⊢ xxq1yb ⊢ xxyq1b ⊢ xxq2yy ⊢ xq2xyy ⊢ xxq0yy ⊢
xxyq3y ⊢ xxyyq3⊔ ⊢ xxyqay. So, aabb is accepted by M1.
On the other hand, q0aba ⊢ xq1ba ⊢ q2xya ⊢ xq0ya ⊢ xyq3a. Since δ(q3, a)
is not defined, the move is blocked, and the input aba is rejected by M1.
Definition 11.2.7 Let M = (Q, Σ, Γ, δ, q0, qa) be a Turing machine and w ∈ Σ∗.
w is said to be accepted by M, written M(w) = 1, if

q0w ⊢∗ αqaβ

for some α, β ∈ Γ∗.
By the above assumptions, ⟨M⟩ or ⟨M, w⟩ can be any string of Σ∗. Now, the
complement of any language in the above list makes sense. Let L̄ = Σ∗ − L denote
the complement of L.
All the decision problems we have met in the courses on data structures and
algorithms, where their complexity is bounded by a big-O function, belong to the class
of decidable problems.
11.3.4 Reduction
Once we have found one undecidable language, other undecidable languages can be
found through reduction.
Definition 11.3.5 Let X and Y be two formal languages. We say Y is reduced to X
if we can construct a decider or recognizer B for Y , using a decider or recognizer A
for X as a component of B, such that one of the following is true:
• If X is decidable, then Y is decidable.
• If X is recognizable, then Y is recognizable.
The concept of reduction can be generalized to computation problems X and
Y : when we want to construct an algorithm (or procedure) B for Y , we may
use the algorithm (or procedure) A for X as a component of B. Intuitively,
reduction resembles the idea of “procedure call” in computer programming. As an
experienced programmer, one has seen numerous procedure calls in object-oriented
programming as a way of problem-solving. That is, if we have a method A for
problem X, to solve problem Y , we may construct method B from A for Y . For
instance, let X be “measuring the diameter of a circle” and Y be “measuring the
area of a circle,” then method A of X can be used for constructing method B of Y .
That is exactly the meaning of reducing Y to X.
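As code, the circle example is just a procedure call. This is a toy sketch with made-up names:

import math

def measure_diameter(circle):      # method A for problem X, assumed given
    return circle['diameter']

def measure_area(circle):          # method B for problem Y, built from A
    d = measure_diameter(circle)
    return math.pi * (d / 2) ** 2

print(measure_area({'diameter': 2.0}))   # prints pi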
Suppose we have a decider (recognizer) A to answer the question of whether
x ∈ X or not. Using A, we then build a decider (recognizer) B for answering the
question of y ∈ Y for any y. Inside B, we may create an input x and call A on x.
The output A(x) is used to create the output of B. If decider (recognizer) B works
correctly for Y , that is, B(y) returns true iff y ∈ Y , then Y is successfully reduced
to X.
There are two special cases of reduction. In the first case, decider (recognizer)
B with input y uses an algorithm f which takes input y and produces x = f (y).
B calls A with input x and returns A(x) as the output of B(y), as illustrated in
Fig. 11.3a. In this case, y ∈ Y iff x = f (y) ∈ X, and the reduction is called
mapping reduction, because f maps yes-instances of Y to yes-instances of X and
no-instances of Y to no-instances of X.
Proposition 11.3.6 If Y is reduced to X by mapping reduction, then Ȳ is reduced
to X̄.
Fig. 11.3 Illustration of two special cases of reduction: (a) mapping reduction; (b) negation
Proof The same algorithm f can be used for the reduction from Ȳ to X̄.
In the second case of special reduction, the same input is used by both A and B.
For instance, to show that decidable languages are closed under complement, if we
have a decider A for a formal language L, we may construct B for L̄ = Σ∗ − L
such that B(x) = ¬A(x), as illustrated in Fig. 11.3b.
The application of reduction in this chapter is different and can be stated as
follows: if Y is reduced to X by constructing B from A and we know that B cannot
exist for Y, then A cannot exist for X. Formally, we have the following proposition,
which comes directly from the definition of reduction.
Proposition 11.3.7 If Y is reduced to X and Y is undecidable, then X is
undecidable.
In the following proof of the halting problem of Turing machines, X is the
encoding of the halting problem and Y is A_TM, the first known undecidable
language in this book.
Theorem 11.3.8 The halting problem of Turing machines, H_TM, is undecidable.
Proof Assume H_TM is decidable; then there exists a hypothetical decider H for
H_TM. Since H is a decider, H halts on any input, and H accepts ⟨M, w⟩ iff M
halts on w.
Using H, we can construct another decider S for A_TM as follows:
S = “On input ⟨M, w⟩, where M is a Turing machine and w is an input to M:
1. Simulate H on ⟨M, w⟩ until H halts.
2. If H rejects, return 0. // H rejects ⟨M, w⟩ if M loops on w.
3. Simulate M on w until M halts. // M will halt because H accepts ⟨M, w⟩.
4. Return M(w).”
At line 2, when H rejects ⟨M, w⟩, it means M loops on w, which justifies the
rejection of ⟨M, w⟩ by S. At line 3, we know M will halt on w. When M halts on w,
S returns the output of M(w) at line 4. All four lines of S will be finished in a finite
number of steps, so S is a decider.
The relationship between M and S is given below:
1. M accepts w: S accepts ⟨M, w⟩ (line 4).
2. M rejects w: S rejects ⟨M, w⟩ (line 4).
3. M loops on w: S rejects ⟨M, w⟩ (line 2).
Thus, M accepts w iff S accepts ⟨M, w⟩. If H can decide whether M halts on w, then S
can decide whether M accepts w. Thus, S is a decider for A_TM. Since S cannot exist, H
cannot exist, either. Thus, H_TM must be undecidable.
The above proof is a reduction of the second special case, where the input strings
to S and H are the same.
Reduction is a powerful tool to show that a problem X is undecidable: assume X
is decidable and A is a decider for X. We use A to construct another decider B for a
known undecidable problem Y. Since B cannot exist, A cannot exist, and X is thus
undecidable.
M′ = “On input x:
1. Replace x by w as the new input.
2. Simulate M on w until M halts.
3. If M(w) = 0, then go to a looping state.
4. Otherwise, return 1.”
Note that B only creates the code of M′ and does not simulate M on w. B uses
the code of M to create M′: M′ will replace its input x by w on its tape and
then simulate M on w. If M rejects w, M′ goes into an infinite loop. If M accepts
w, M′ accepts x. The relationship between M and M′ is given as follows:
1. M accepts w: M′ halts on every x.
2. M does not accept w: M′ loops on every x.
Thus, M accepts w iff M′ halts on every x. B(⟨M, w⟩) = A(⟨M′⟩) = 1 iff M′ ∈
D_TM and ⟨M, w⟩ ∈ A_TM. Hence, B is a decider for A_TM if A is a decider for D_TM.
Since A_TM has no deciders, A cannot exist and D_TM must be undecidable.
We know that both A_TM and H_TM are undecidable; are they recognizable? The
answer is yes for both of them.
Proposition 11.3.11 A_TM is recognizable.
Proof We provide the following Turing machine for A_TM.
U = “On input ⟨M, w⟩, where M is a Turing machine and w is an input to M:
1. Simulate M on w until M halts.
2. Return M(w).”
It is clear that U accepts ⟨M, w⟩ if M accepts w; U rejects if M rejects w; and U
loops if M loops on w. Thus, L(U) = A_TM.
U in the above proof is called the universal Turing machine, because it simulates
the execution of any Turing machine on any input. If we view ⟨M, w⟩ as a stored
program with data, U is equivalent to the RASP (random-access stored-program)
machine. This idea had great impact on the development of today's computers and
is of considerable theoretical importance.
Proposition 11.3.12 H_TM is recognizable.
Proof H_TM can be recognized by Turing machine T.
We continue to hold the assumption that every string of Σ∗ is the code of a Turing
machine (good or bad).
Definition 11.3.14 Given a predicate p(M) for all Turing machines M, let
Lp = {⟨M⟩ | p(M) = 1}.
M′ = “On input x:
1. Put a special symbol # at the end of x and write w after #.
2. Simulate M on w until M halts, and then erase every symbol after #, including #.
3. If M accepts w, simulate A on x until A halts and return A(x).
4. If M rejects w, go to a looping state.”
The relationship between M and M′ is given below:
1. M accepts w: L(M′) = X = L(A) and p(M′) = p(A) = 1.
2. M rejects w or loops on w: M′ loops on every x, L(M′) = ∅ and p(M′) = 0.
Thus, M accepts w iff p(M′) = 1 (or M′ ∈ Lp). T(⟨M, w⟩) = S(⟨M′⟩) = 1
iff M′ ∈ Lp and ⟨M, w⟩ ∈ A_TM. If S can decide M′ ∈ Lp, then T can decide
⟨M, w⟩ ∈ A_TM. Since A_TM has no deciders, S cannot exist. The reduction used is
illustrated in Fig. 11.6, where algorithm f does what is done in line 1 of T.
Rice’s theorem claims that any Rice property of Turing machines is undecidable.
Since many properties of Turing machines are nontrivial and language related, these
properties are undecidable. For instance, Rice’s theorem applies to the following
formal languages:
• ET M = {M | L(M) = ∅ for TM M}.
• OneT M = {M | M is a TM, |L(M)| = 1}.
Fig. 11.6 Illustration of a reduction from A_TM to Lp in the proof of Rice's theorem. A dashed
arrow is used to indicate a trigger for starting the simulation of a Turing machine
The concept of a countable set is attributed to Georg Cantor, who made the distinction
between countable and uncountable sets in 1874. The concept of a computable set is
younger and arose in the study of computing models in the 1930s by the founders of
computer science, including Gödel, Church, and Turing. In the following, we will
see what happens if we require that bijections be computable.
|F| = Σ_{S⊆A} |B|^{|S|} = Σ_{k=0}^{m} C(m, k) n^k = (n + 1)^m,

where C(m, k) = m!/(k!(m − k)!) is a binomial coefficient.
Every bijection must be total, injective, and surjective. If it is not total, its inverse
is not surjective. If f : X → Y is a bijection, the following properties of f are
desired:
• Counting: Y is N (the set of natural numbers), and we say f : N → Y is a
counting bijection of Y.
• Computable: for every x ∈ X, we can compute y = f(x) ∈ Y, and for every y ∈ Y,
we can compute x = f⁻¹(y) ∈ X.
• Increasing: let f : N → Y be a counting bijection of Y and ≺ be a well-order
of Y. For any i ∈ N, f(i + 1) ≻ f(i).
By Definition 1.3.2, the inverse of a rank function is a counting bijection, which
occurs in the study of countability. If you want to show an infinite set is countable,
you must find a counting bijection for this set. Here are some examples of bijective
proofs:
• Example 11.4.1 is one of many examples in combinatorics to show two finite sets
have the same size. The bijection in this example is computable.
• In Sect. 1.3.1, we used bijective proofs to show some infinite sets are countable
(Proposition 1.3.4). All bijections used in the proof of Proposition 1.3.4 are
computable counting bijections. They are also increasing if we choose a well-
order properly for these sets.
• We also used bijective proofs to show that some uncountable sets have the same
size (Proposition 1.3.7). These bijections are neither computable nor counting
bijections.
The above examples suggest that bijections between countable sets are
computable. In the days of Cantor, the notion of computable was not formally defined.
It is reasonable to guess that Cantor, like others, used bijections under the assumption
that counting bijections can be computed by effective methods. This assumption is
deceiving, because we will show that if a formal language has a computable counting
bijection, then this language is recognizable. Since every formal language is countable,
every counting bijection of an unrecognizable language must be uncomputable. To
facilitate the discussion, we introduce the following concept.
Definition 11.4.2 A set S is said to be computably countable if either S is finite or
S has a computable counting bijection.
In the above definition, if the word “computable” is dropped, we obtain the
definition of countable (Sect. 1.3.1). From the definition, the relationship between
countable and computably countable is given by the following proposition.
Proposition 11.4.3
(a) If S is computably countable, then S is countable.
(b) S is not computably countable iff either S is uncountable or every counting
bijection of S is uncomputable.
In the following, we show that a formal language is computably countable iff
it is recognizable. To understand the equivalence proof of computably countable
and recognizable, we need the concept of enumerator, which is a variant of Turing
machine.
An enumerator is a Turing machine M that runs forever once started and has an attached printer, as illustrated in Fig. 11.7. M can use that printer as an output device to print strings. The set of printed strings, denoted by E(M), is the language enumerated by M. A formal language L is called recursively enumerable if there exists an enumerator M such that L = E(M). Note that an enumerator has no input, so the tape is purely working space. The printer can be formally defined as a write-only tape whose head moves in one direction. The following result is known for enumerators:
Theorem 11.4.4
(a) If a formal language L is recursively enumerable, then L is recognizable.
(b) If a formal language L is recognizable, then L is recursively enumerable; moreover, if L is infinite, there is an enumerator M such that L = E(M) and every string of L is printed exactly once.
The following two lemmas show that a formal language is computably countable iff
it is recognizable. Thus, recognizable, recursively enumerable, and computably
countable are equivalent concepts even though they have different definitions.
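As a concrete illustration of direction (a), here is a minimal Python sketch that turns an enumerator into a recognizer; enumerate_strings is a hypothetical stand-in for an enumerator, and recognize may loop on strings outside the language, which is exactly what recognizability allows.

```python
# A minimal sketch of Theorem 11.4.4(a): from an enumerator to a recognizer.
# enumerate_strings() is a hypothetical stand-in for the printer of M.
from itertools import count

def enumerate_strings():
    # hypothetical enumerator: prints a^n for every even n
    for n in count(0):
        if n % 2 == 0:
            yield "a" * n

def recognize(w):
    for x in enumerate_strings():   # simulate M and watch the printer
        if x == w:
            return True             # w was printed: accept
    # for an infinite language this loop never ends, so recognize()
    # loops on any w not in L, as a recognizer is allowed to do
```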
Lemma 11.4.6 Every recognizable formal language is computably countable.
Proof Suppose L is recognizable. The case when L is finite is trivial. If L is infinite, by Theorem 11.4.4(b), we have an enumerator M such that L = E(M) and every string printed by M is unique. Then E(M) is computably countable, because the order in which the strings are printed by M defines a computable bijection f : N → E(M). That is, for n ∈ N, we compute f(n) by algorithm A(n) as follows:
Algorithm A(n): Let c := 0 and simulate M. When a string x is printed by M, check if c = n. If yes, return x; otherwise, let c := c + 1 and continue the simulation.
Algorithm A will terminate because E(M) is infinite and c, which counts the strings printed by M, will reach n eventually. A(n) can be modified into an algorithm A′(w) that computes f⁻¹(w) for w ∈ E(M): instead of checking c = n, check if x = w and, if yes, return c. Since both f and f⁻¹ are total functions, f is a bijection (Proposition 1.3.1). Algorithm A(n) is the evidence that f is computable.
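The two algorithms in this proof can be sketched in Python; enumerator is a hypothetical generator assumed to yield each string of an infinite language exactly once, in the order M prints them.

```python
# A sketch of algorithm A(n) and its modification A'(w) from the proof of
# Lemma 11.4.6; enumerator is assumed to yield each string of an infinite
# language exactly once.
def f_of(n, enumerator):
    c = 0
    for x in enumerator():      # simulate M; x is the next printed string
        if c == n:
            return x            # the n-th printed string is f(n)
        c += 1

def f_inv(w, enumerator):
    c = 0
    for x in enumerator():
        if x == w:
            return c            # w is the c-th printed string: f^{-1}(w) = c
        c += 1                  # loops forever if w is not in E(M)
```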
Lemma 11.4.7 Every computably countable formal language is recognizable.
Proof The case when L is finite is trivial. If L is computably countable, then there exists a computable bijection f : N → L. Thus, we may use f to design an enumerator M that prints the strings f(0), f(1), f(2), and so on. It is evident that E(M) = L. By Theorem 11.4.4(a), L is recognizable.
The following theorem extends the above result from formal languages to general
sets.
Theorem 11.4.8 A set is computably countable iff it is computable.
Proof Let S be any set. The case when S is finite is trivial. If S is computably countable, then there exists a computable bijection f : N → S. Hence, S can be represented by the formal language {⟨f(0)⟩, ⟨f(1)⟩, . . .}, where ⟨f(i)⟩ is a string representing f(i). By Lemma 11.4.7, S is recognizable, thus, computable.
When S is not computably countable, there are two cases to consider: either S can be represented by a formal language or it cannot. If it can, then S is unrecognizable by Lemma 11.4.6, thus uncomputable. If it cannot, then S cannot be recognized by any Turing machine, thus uncomputable.
Proposition 11.4.9 If X is countably infinite, then
1. X is decidable iff there exists a computable and increasing bijection f : N → X.
2. X is computable iff there exists a computable bijection f : N → X.
3. X is uncomputable iff every bijection f : N → X is uncomputable.
Proof By Theorem 11.4.8, X is computable iff X is computably countable. (3)
comes from Proposition 11.4.3, although the existence of an uncomputable counting
bijection is unknown. (2) is logically equivalent to (3). For (1), X is decidable iff
L = {w ∈ Σ∗ | M(w) = 1}.
E_TM, N_TM, and All_TM are the encodings of all Turing machines that accept, respectively, nothing, something, and everything. N_TM is the complement of E_TM. It is known that N_TM is recognizable and E_TM is unrecognizable (Proposition 11.4.15). By Lemma 11.4.6, N_TM is computably countable. By Lemma 11.4.7, E_TM is not computably countable. Both All_TM and its complement are unrecognizable (Proposition 11.4.17), hence, not computably countable.
Proposition 11.4.11 The set of all deciders is not computably countable; hence, D_TM is unrecognizable.
Proof Assume D_TM is computably countable with a computable bijection h : N → D_TM such that h(i) = ⟨Mi⟩ for each i ∈ N. Using h and γ (Definition 1.3.5), we construct the following Turing machine X:
X = "On input w ∈ Σ∗
1. Compute i = γ(w) and h(i) = ⟨Mi⟩ ∈ D_TM.
2. Simulate Mi on w until Mi halts. // Mi is a decider.
3. return 1 − Mi(w)." // i.e., X(w) = ¬Mi(w).
Since every step of X stops, X ∈ D_TM. Let X = Mk for some k ∈ N. However, X(wk) ≠ Mk(wk), a contradiction to X = Mk. So, the assumption is wrong and D_TM is not computably countable. By Lemma 11.4.6, D_TM is unrecognizable.
Proposition 11.4.12 The set of all algorithms is not computably countable; hence,
AlT M is unrecognizable.
Proof The proof is analogous to that of Proposition 11.4.11, where line 3 of X is
changed to “return 1 + Mi (w).”
Note that D_TM ⊂ G_TM and G_TM is countable. Since D_TM is countable but not computably countable, every counting bijection of D_TM is uncomputable (Proposition 11.4.3(b)).
Rice’s theorem is very useful for showing some languages are undecidable. If p
is a Rice property, then Lp = {M | p(M) = 1} is undecidable. If we also know
that Lp is recognizable, then Lp must be unrecognizable by Proposition 11.4.13. For
instance, ET M can be shown unrecognizable this way, because ET M is recognizable
(an exercise).
Proposition 11.4.15 E_TM is unrecognizable.
Analogous to Proposition 11.3.7, we have the following result from Definition 11.3.5:
Proposition 11.4.16 If Y is reduced to X and Y is unrecognizable, then X is unrecognizable.
The above proposition is very useful for showing that some formal languages are unrecognizable.
Proposition 11.4.17 Both All_TM and its complement are unrecognizable.
Proof We will first work on All_TM. Assume All_TM is recognizable and A is its recognizer. We reduce the complement of A_TM to All_TM by constructing a recognizer B for the complement of A_TM as follows, assuming the initial state of M is not the accept state:
B = "On input ⟨M, w⟩, M is a Turing machine and w is an input to M.
1. Generate the code of Turing machine M′ (see below), ⟨M′⟩, from ⟨M, w⟩.
2. return A(⟨M′⟩)."
M′ = "On input x
1. Simulate M on w for at most |x| moves.
2. If M accepts w in line 1, go to a loop state.
3. return 1."
The relationship between M and M′ is given as follows:
1. M accepts w in n moves: M′ loops on every x with |x| ≥ n and accepts every x with |x| < n.
2. M does not accept w: M′ accepts every x.
Thus, M does not accept w iff M′ accepts every x. B(⟨M, w⟩) = A(⟨M′⟩) = 1 iff ⟨M′⟩ ∈ All_TM, i.e., ⟨M, w⟩ is in the complement of A_TM. Hence, B is a recognizer for the complement of A_TM if A is a recognizer for All_TM. Since the complement of A_TM has no recognizers, A cannot exist and All_TM must be unrecognizable.
Now on the complement of All_TM. Assume it is recognizable and A is its recognizer. We will use the same B in the proof of Proposition 11.3.10 to reduce the complement of A_TM to the complement of All_TM. The only difference is that now A is a hypothetical recognizer for the complement of All_TM instead of a decider for D_TM, and B is a recognizer for the complement of A_TM. The relationship between M and M′ is given as follows:
1. M accepts w: M′ accepts every x and L(M′) = Σ∗.
2. M does not accept w: M′ does not accept any x and L(M′) = ∅.
Thus, M does not accept w iff L(M′) ≠ Σ∗. B(⟨M, w⟩) = A(⟨M′⟩) = 1 iff L(M′) ≠ Σ∗, i.e., ⟨M, w⟩ is in the complement of A_TM. Hence, B is indeed a recognizer for the complement of A_TM, and the complement of A_TM is reduced to the complement of All_TM. By Proposition 11.4.14, the complement of A_TM is unrecognizable. By Proposition 11.4.16, the complement of All_TM must be unrecognizable.
Note that the two reductions used in the above proof are mapping reductions. The first reduction comes from Prof. Tian Liu of Beijing University. The same mapping of the second reduction was used to show that A_TM is reduced to D_TM (Proposition 11.3.10). Applying Proposition 11.3.6, the complement of A_TM is reduced to the complement of D_TM. Applying Proposition 11.4.16, we have the following result.
Proposition 11.4.18 The complement of D_TM is unrecognizable.
All_TM is our first example of the statement "both L and its complement are unrecognizable," and D_TM is our second example.
In summary, we have three methods to show that a formal language X is unrecognizable:
1. If X is not computably countable, then X is unrecognizable (Lemma 11.4.6).
Example: D_TM, Al_TM
2. If X is recognizable but undecidable, then the complement of X is unrecognizable (Proposition 11.4.13).
Example: E_TM, the complements of A_TM and H_TM
3. If Y is reduced to X and Y is unrecognizable, then X is unrecognizable (Proposition 11.4.16).
Example: All_TM, the complement of All_TM, the complement of D_TM
Combining the above results with Lemma 11.4.7, we have the following result.
Proposition 11.4.19 The following formal languages are not computably countable:
E_TM, All_TM, the complement of All_TM, and the complements of D_TM, A_TM, and H_TM.
head (inclusive) to the right (excluding the trailing blanks); the second stack stores all the symbols from the tape head (exclusive) to the left. The state information remains the same. It is easy to show that the two-stack machine is Turing complete by simulating all the Turing moves by operations on the stacks. Initially, the input string is stored in the first stack with the first symbol on the top of the stack. The simulation of a Turing move goes as follows: a symbol a is popped off the first stack; if the stack is empty, use the blank symbol. If δ(q, a) = (p, b, R), push b onto the second stack; if δ(q, a) = (p, b, L), push b onto the first stack and then move the top symbol of the second stack (if the second stack is empty, use the blank symbol) to the first stack by stack operations.
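A minimal Python sketch of one simulated move, with each stack as a list whose last element is the top; delta is a hypothetical transition table mapping (state, symbol) to (state, symbol, direction), and BLANK stands for the blank symbol.

```python
BLANK = " "   # stand-in for the blank symbol

def step(state, first, second, delta):
    # first:  symbols from the head (inclusive) rightwards, current symbol on top
    # second: symbols to the left of the head, nearest symbol on top
    a = first.pop() if first else BLANK       # pop the current symbol
    state, b, direction = delta[(state, a)]   # delta(q, a) = (p, b, direction)
    if direction == "R":
        second.append(b)                      # written symbol joins the left part
    else:  # direction == "L"
        first.append(b)                       # written symbol stays to the right
        first.append(second.pop() if second else BLANK)  # head moves onto left neighbor
    return state
```

To run a machine, initialize first with the input string reversed (so its first symbol is on top), second empty, and iterate step until an accept or reject state is reached.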
The significance of being Turing complete is that all the undecidable properties of Turing machines are inherited by the new computing model. Thus, the halting problem of a two-stack machine is undecidable, because an algorithm that decides the termination of a two-stack machine could be used to decide the termination of a Turing machine. Similarly, if we can show the Turing completeness of a logic, we can derive the same result.
% move(L1, L2, Q, R1, R2): run from configuration (L1, Q, R1) until halting in (L2, qa, R2)
move(L, L, qa, R, R) :- !.                % halt on qa, the accept state
move(L0, L, Q, R0, R) :-                  % simulate one TM move
    current(R0, A, Rest),                 % load the current symbol in A
    delta(Q, A, P, B, Direct),            % delta(Q, A) = (P, B, Direct)
    result(Direct, L0, L1, B, Rest, R1),  % update L and R
    write('|- '), write(rev(L1)), write(P), write(R1), nl,
    move(L1, L, P, R1, R).                % continue to move

% current(R, A, Rest): A is the current symbol, Rest the rest of the stack;
% an empty stack yields the blank symbol bl (clauses assumed by the text).
current([A|Rest], A, Rest).
current([], bl, []).

% result(Dir, L1, L2, B, R1, R2): B is the symbol written by the tape head.
% According to Dir, stack1 changes from L1 to L2 and stack2 from R1 to R2.
% delta/5 (the transition table) and turing/2 are defined per machine.
result(left, [C|L], L, B, R, [C,B|R]).    % move top symbol of L to R
result(left, [], [], B, R, [bl,B|R]).     % empty left stack: use the blank
result(right, L, [B|L], B, R, R).         % push B at the top of L
?- turing([a,a,b,b], Tape).
q0[a,a,b,b]
|- rev([x])q1[a,b,b]
|- rev([a,x])q1[b,b]
|- rev([x])q2[a,y,b]
|- rev([])q2[x,a,y,b]
|- rev([x])q0[a,y,b]
|- rev([x,x])q1[y,b]
|- rev([y,x,x])q1[b]
|- rev([x,x])q2[y,y]
|- rev([x])q2[x,y,y]
|- rev([x,x])q0[y,y]
|- rev([y,x,x])q3[y]
|- rev([y,y,x,x])q3[]
|- rev([y,x,x])qa[y,bl]
Tape = [x,x,y,qa,y,bl].
Turing machine on an input, as the moves of the Turing machine can be simulated by
a Prolog program. The termination of the Prolog program in the simulation ensures
the termination of the Turing machine.
Theorem 11.5.4 The satisfiability of a set of Horn clauses is undecidable.
Proof Sketch Let H be the clauses used in the Prolog program from the proof of Proposition 11.5.2 for simulating Turing machine M, including the negative clause resulting from the query derived from q0w. It is easy to check that H is a set of Horn clauses. If we had an algorithm A to decide that H is unsatisfiable, then there would exist a resolution proof of H, and this resolution proof ensures that w is accepted by M. Thus, algorithm A could tell whether Turing machine M accepts w. This is a contradiction to the fact that the acceptance problem of Turing machines is undecidable.
Theorem 11.5.5 The satisfiability of a first-order formula is undecidable.
The proof is left as an exercise.
In Sect. 7.2.5, we briefly introduced the concept of string rewrite systems (SRS). Axel Thue introduced this notion hoping to solve the word problem for Thue systems over finite alphabets: given an SRS (Σ, R), let =_R denote the symmetric closure of ⇒∗, the transitive and monotonic relation induced by R over Σ∗; how do we decide s =_R t for any s, t ∈ Σ∗? Only in 1947 was the word problem shown to be undecidable; this result was obtained independently by Emil Post and Andrey Markov.
Proposition 11.5.6 SRS is Turing complete.
Proof Let M be any Turing machine. Assuming the states and the tape symbols are disjoint, any configuration of M is a string of tape and state symbols of the form αqβ, where q is a state symbol, and α and β are strings of tape symbols. We may use an SRS R to describe all the moves of M as follows: the initial configuration q0w is represented by the string q0w$, where $ is the end marker of all non-blank symbols on the tape of M. If δ(q, a) = (p, b, R), we have a rewrite rule qa → bp; moreover, if a is the blank symbol, we add a rewrite rule q$ → bp$. If δ(q, a) = (p, b, L), we have a set of rewrite rules: for any tape symbol c, cqa → pcb; moreover, if a is the blank symbol, we add the rewrite rules cq$ → pcb$, where c is any tape symbol. Then, it is trivial to show that q0w ⊢∗ αqaβ in M iff q0w$ ⇒∗ αqaβ$ in R.
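The rule generation in this proof is mechanical; the following Python sketch produces the rule set, assuming states and tape symbols are single characters and that delta, tape_symbols, and blank are supplied by the machine at hand (hypothetical names).

```python
# A sketch of the SRS construction in the proof of Proposition 11.5.6.
# delta: dict mapping (q, a) -> (p, b, direction); tape_symbols: the tape
# alphabet; blank: the blank symbol. States and symbols are single chars.
def tm_to_srs(delta, tape_symbols, blank):
    rules = []
    for (q, a), (p, b, d) in delta.items():
        if d == "R":
            rules.append((q + a, b + p))                      # qa -> bp
            if a == blank:
                rules.append((q + "$", b + p + "$"))          # q$ -> bp$
        else:  # d == "L"
            for c in tape_symbols:
                rules.append((c + q + a, p + c + b))          # cqa -> pcb
                if a == blank:
                    rules.append((c + q + "$", p + c + b + "$"))  # cq$ -> pcb$
    return rules
```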
Example 11.5.7 For Turing machine M0 given in Example 11.2.3, the rewrite rules for each delta move generated by the above proof are given below:
It is easy to see that q0 0011$ ⇒∗ αqaβ$ by the SRS consisting of the above rewrite rules, where αqaβ is the accepting configuration of M0 on 0011.
Since SRS is a special case of TRS (term rewrite system), the Turing complete-
ness of SRS implies that of TRS.
Theorem 11.5.8 The termination of rewrite systems is undecidable.
Proof Sketch If we had an algorithm to decide the termination of any rewrite system (either SRS or TRS), then we could use the algorithm to decide the termination of a Turing machine, as the moves of the Turing machine can be expressed as a set of rewrite rules. The termination of the rewrite system ensures the termination of the Turing machine.
Theorem 11.5.9 The word problem of Thue systems is undecidable.
Proof Sketch If we had an algorithm to decide the word problem of Thue systems, then we could use the algorithm to decide the acceptance problem of Turing machines (encoded as A_TM): given any Turing machine M and input string w, we create M′ from M by introducing a new accept state qf such that M′ simulates M on w, and once M′ enters the accept state qa, M′ erases every non-blank symbol (replacing it by the blank symbol) before entering qf, so that the accepting configuration of M′ is unique. Now, let s = q0w$ (the initial configuration) and t = qf$ (the accepting configuration); then M accepts w iff s =_R t, where R is the SRS created from the definition of M′ as we did in Example 11.5.7.
Recall that the congruence problem of first-order logic with equality (Definition 7.1.7) is the problem of deciding whether s =_E t for any two terms s and t, where E is a given set of equations. The following is a consequence of the undecidability of the word problem.
Corollary 11.5.10 The congruence problem of first-order logic with equality is undecidable.
In this chapter, we have introduced the concept of Turing machines and proved formally that the well-known halting problem of Turing machines is undecidable. Since a Prolog program or a rewrite system can simulate the moves of any Turing machine, the termination of Prolog programs or rewrite systems is undecidable, too. The acceptance problem of Turing machines is also undecidable. Since this problem can be specified by a set of Horn clauses, the satisfiability of Horn clauses is undecidable. As a consequence, the satisfiability of first-order logic is undecidable. The satisfiability of any logic containing first-order logic, such as higher-order logic, is also undecidable. These theoretical results help us understand logic thoroughly and avoid unnecessary effort in searching for algorithms for these undecidable problems.
Exercises
1. For the Turing machine M1 of Example 11.2.1, please provide the sequence of moves of M1 for each of the input strings: (a) a²b³; (b) a²b; (c) ab²a. Each sequence must end with M1 accepting or rejecting the input.
2. Provide a complete design of Turing machine M2 in Example 11.2.10 for recognizing A2 = {b^i a^j b^k | i ∗ j = k, i, j ≥ 0}. That is, provide the detailed δ definition of M2 and group the moves of your definition according to the high-level description of M2. Provide the sequence of moves on the input string w = b²a¹b².
3. Provide a high-level description of a Turing machine M3 for recognizing A3 = {a^i b^j c^k | k = ⌊i/j⌋, i, j ≥ 0}. Each step of the description either uses known Turing machines or is expressed (maybe informally) by a sequence of Turing machine moves.
4. Provide a high-level description of a Turing machine M4 for recognizing A4 = {a^i b^j | i, j ≥ 0}, and the tape contains exactly a^(i^j) when M4 stops with qa. Each step of the description either uses known Turing machines or is expressed by a sequence of Turing machine moves.
5. Suppose A is a formal language. Show that A is decidable if (a) |A| = 1; (b) A
is finite.
6. Let C10_TM = {⟨M⟩ | TM M moves at most ten steps for any input string}. Show that C10_TM is decidable by designing a decider.
7. Show the following languages are decidable:
(a) A1 is the set of all binary numbers divisible by 4.
(b) A2 is the set of all binary numbers divisible by 3.
(c) A3 = {0^i 1^j 0^k | i + j = k}.
8. Show that decidable languages are closed under (a) union, (b) intersection, and (c) complement. That is, suppose A and B are decidable; then the following languages are decidable: (a) A ∪ B; (b) A ∩ B; (c) the complement of A, i.e., Σ∗ − A.
9. Show that recognizable languages are closed under (a) union and (b) intersec-
tion. That is, suppose A and B are recognizable, the following languages are
recognizable: (a) A ∪ B; (b) A ∩ B.
10. Provide a mapping reduction from A_TM to H_TM.
11. Prove formally that the partial correctness of Hoare triples is undecidable.
12. Prove by reduction from A_TM that the following properties of Turing machines are undecidable (Rice's theorem cannot be used):
(a) One_TM = {⟨M⟩ | L(M) = {0}}.
Chapter 12
Decision Procedures
Most SMT solvers use the DPLL(T) framework [2], where T stands for theories, including equality, linear arithmetic, arrays, etc. DPLL(T) is expressed as an extension of the DPLL algorithm. At a high level, DPLL(T) works by transforming an SMT formula into a SAT formula in which atomic formulas are replaced by propositional variables. DPLL(T) repeatedly finds a model for the SAT formula and consults a theory solver to check its consistency under the domain-specific theory, until either a theory-consistent model is found or the SAT formula runs out of models.
where F does not contain any quantifier and free(F) = {x1, . . . , xn}. On the other hand, in the unification problem or in middle-school algebra, the variables in a set of equations are existentially quantified.
In the abstraction process, each free variable is regarded as "ground": these variables have the same quantification in p and ¬p. For instance, if p stands for q(x), then ¬p stands for ¬q(x). Had we given the quantification of x in p as p ≡ ∀x q(x), we would have ¬p ≡ ∃x ¬q(x); hence, the quantification of x would be different in p and ¬p.
Proposition 12.1.1 For any QF formula A, if A is satisfiable, so is γ(A).
Proof Let B = γ(A), the abstract of A. For any atomic formula Ai of A, let pi = γ(Ai) be the propositional variable which replaces Ai in the abstraction. If A is satisfiable, then A has a first-order model I such that I(A) = 1. There are three cases to consider:
• A is ground: We define a propositional interpretation σ for B as follows: σ(pi) = I(Ai), where pi = γ(Ai). It is easy to check that σ is a model of B by comparing the formula trees of A and B under I and σ, respectively.
For the DPLL(T) framework, the abstraction γ(A) plays two roles: (1) it acts as a filter of models of A, so that we waste less effort in checking whether A has models; (2) we need only check whether a conjunction of literals from A has a model.
Example 12.1.3 Let X be (a ≐ b ∨ b ≐ a) ∧ b ≐ c ∧ a ≠ c and Y = γ(X) = (p1 ∨ p2) ∧ p3 ∧ ¬p4. Y has three models: {p1, p2, p3, ¬p4}, {p1, ¬p2, p3, ¬p4}, and {¬p1, p2, p3, ¬p4}. The conjunction of atomic formulas corresponding to {p1, p2, p3, ¬p4} is a ≐ b ∧ b ≐ a ∧ b ≐ c ∧ a ≠ c, which is unsatisfiable. Similarly, the SMT formulas corresponding to the other two models of Y are also unsatisfiable. Thus, X is unsatisfiable.
Now, back to the verification example at the beginning of this section, let A be ¬X and B be ¬Y in the example. That is, B is ¬((p1 ∧ p2 ∧ p3 ∧ ¬p4) → (p5 ∧ p6)), where
p1 : m ≐ arrMax(i, A)        p2 : i ≤ n
p3 : i < n                   p4 : m < A[i]
p5 : m ≐ arrMax(i + 1, A)    p6 : i + 1 ≤ n
As a set of clauses, B is S = {p1, p2, p3, ¬p4, (¬p5 | ¬p6)}.
S has three models in which p1, p2, and p3 are true, p4 is false, and one of p5 and p6 is false. In integer arithmetic, p3 and p6, i.e., i < n and i + 1 ≤ n, are equivalent. Thus, p6 must be true when p3 is true. Moreover, p1 ∧ ¬p4 → p5 is true by the definitions of arrMax and max.
Thus, p5 must be true when p1 is true and p4 is false. In other words, no models of S are acceptable for A, so A = ¬X is unsatisfiable.
In Chap. 4, we presented the DPLL algorithm in detail and described several techniques for the efficient implementation of DPLL, which is an enumeration-based search method for propositional satisfiability. DPLL tries to find a model for a set of propositional clauses. Here, a model from DPLL can be viewed as the set of literals true in the model.
Example 12.1.4 Consider the problem of deciding the satisfiability of the following first-order formula X in conjunctive normal form:
X : g(a) ≐ c ∧ (f(g(a)) ≠ f(c) ∨ g(a) ≐ d) ∧ c ≠ d
DPLL(T) will feed these clauses to a SAT solver, which returns {p1, ¬p2, ¬p4} as its first model. The first-order formula corresponding to this model is
g(a) ≐ c ∧ f(g(a)) ≠ f(c) ∧ c ≠ d,
which can be proved to be unsatisfiable, too. DPLL(T) then adds (¬p1 ∨ ¬p3 ∨ p4) to the SAT solver, which will return "unsatisfiable." Finally, DPLL(T) claims that the original formula X is unsatisfiable as the SAT solver runs out of models.
The above example illustrates a great advantage of the DPLL(T) framework: the basic DPLL algorithm handles the logical operators, and the decision procedures for the theories T handle conjunctions of literals, thus reducing the complexity of the decision procedures required by the theories. In fact, many theories of interest have efficient decision procedures for the satisfiability of sets (or conjunctions) of literals.
Let us call the decision procedures used in DPLL(T) theory solvers for T. These theory solvers use various algorithms for each domain appearing in the input. The above example also illustrates several interesting aspects of DPLL(T).
1. Atomic formulas in the input formulas are abstracted to propositional variables. This abstraction is done once and for all as a preprocessing step.
2. The SAT solver is called multiple times, each time with an increment of input clauses to avoid repeated models. Thus, an incremental SAT solver is desired to support DPLL(T).
3. The theory solver is called multiple times, each with a conjunction of literals as input. Each input corresponds to a propositional model found by the SAT solver. The theory solver does not need to be incremental because the input changes unpredictably. That is, the interaction between the SAT solver and the theory solver goes back and forth in DPLL(T).
4. This approach of solving theory satisfiability via propositional satisfiability is said to be lazy: domain information is used lazily when checking the consistency of propositional models in the theory T. In contrast, the eager approach converts the original theory satisfiability problem into an equisatisfiable propositional instance.
5. In this lazy approach, every tool does what it is good at:
• The SAT solver takes care of Boolean combinations.
• The theory solver takes care of conjunctions of literals in the theories.
The only differences between DPLLT and Algo. 4.2.5 are the following:
• At line 0: abstraction(X) computes the abstract of X.
• At lines 11–13: Instead of returning true after (A = nil) (every literal has a truth value), we call isTheoryConsistent(σ) to check whether the SMT formula corresponding to the propositional model σ of C is consistent in the theories of X. If yes, return true; otherwise, create a new clause from the negation of the decision literals in σ: negateDecisions(σ) returns the negation of the conjunction of decision literals in σ. Then, insertNewClause is called on the newly created clause (see line 7).
The implementation of abstraction(X) is straightforward. The purpose of line 13 is to support CDCL (conflict-directed clause learning).
The algorithm DPLLT(X) illustrates how easily we can implement the DPLL(T) framework if we have a SAT solver and an implementation of isTheoryConsistent(σ). The latter depends on the theories appearing in X and is the focus of the remaining sections.
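Before turning to concrete theory solvers, the lazy loop underlying DPLLT can be sketched as follows; sat_solve and theory_consistent are hypothetical black-box interfaces, and the blocking clause here naively negates the whole model rather than only the decision literals as DPLLT does.

```python
# A sketch of the lazy DPLL(T) loop around black-box solvers.
# sat_solve(clauses) returns a model as a set of integer literals, or None;
# theory_consistent(model) checks the corresponding conjunction of SMT
# literals. Both are hypothetical interfaces.
def dpll_t(clauses, sat_solve, theory_consistent):
    clauses = list(clauses)
    while True:
        model = sat_solve(clauses)
        if model is None:
            return False                      # SAT solver ran out of models
        if theory_consistent(model):
            return True                       # theory-consistent model found
        clauses.append([-l for l in model])   # naive blocking clause; retry
```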
Example 12.1.6 Suppose T is equality and X contains the following clauses:
(b ≐ c), (h(a) ≠ h(c)) ∨ p, (a ≐ b) ∨ ¬p ∨ (a ≐ d), (a ≐ b) ∨ (a ≐ d)
The abstraction procedure will convert X into C, which contains the following clauses:
Again, isTheoryConsistent(σ) finds this formula inconsistent and returns false. The new clause from the negation of the decision literal is (t).
Now, DPLLT will backtrack to level 0 and find model σ = {p, q, r, s, t} (or σ = {p, q, ¬r, s, t}, . . . ) and return true at line 12, as isTheoryConsistent(σ) will return true.
Naturally, the minimal modification of DPLL into DPLLT does not necessarily give us an ideal implementation of DPLL(T). At the end of this chapter, we will discuss some issues in making DPLLT more practical and efficient.
Next, we will present the decision procedures for the following theories, which occur often in program verification and can be integrated in DPLLT:
• Equality with uninterpreted functions
• Linear arithmetic
Since the body of the while loop in P1 is executed twice, we may unroll the loop by introducing auxiliary variables x0, x1, x2 to record the values of x before and after x := a[i] + x. The relationship between them can be expressed by
x0 ≐ a[3] ∧ x1 ≐ a[2] + x0 ∧ x2 ≐ a[1] + x1.
Together with y ≐ a[1] + (a[2] + a[3]), we would like to prove x2 ≐ y.
If + is an interpreted function, it will be difficult to decide the values of a[2] + a[3], x2, and y. If we replace + by an uninterpreted function f, the verification becomes proving the following theorem:
(x0 ≐ a[3] ∧ x1 ≐ f(a[2], x0) ∧ x2 ≐ f(a[1], x1) ∧ y ≐ f(a[1], f(a[2], a[3]))) → x2 ≐ y.
The above can be shown to be true by rewriting, viewing each equality as a rewrite rule from left to right. Once the theorem is proved, the equivalence of P1 and P2 holds for any binary operation f on any type of a[1..3] (e.g., double or matrix).
On the other hand, if the generalized formula cannot be proved, it does not mean that the equivalence of the programs does not hold. For instance, in P2, if we have y := (a[1] + a[2]) + a[3], then the two programs are still equivalent because + is associative. However, f(a[1], f(a[2], a[3])) ≐ f(f(a[1], a[2]), a[3]) is no longer true, unless f is known to be associative.
We would like to point out that x and y used in P1 and P2 are variables of the
code; they are not variables of first-order logic. The formula for the equivalence of
P1 and P2 is indeed ground (i.e., variable-free) because x and y are 0-arity functions
in first-order logic. To avoid confusion, we call these 0-arity functions identifiers.
Recall that the axioms of equality introduced in Chap. 7 are the following five formulas, where the variables are assumed universally quantified:
reflexivity: x ≐ x;
symmetry: (x ≐ y) → (y ≐ x);
transitivity: (x ≐ y) ∧ (y ≐ z) → (x ≐ z);
function monotonicity: (x1 ≐ y1) ∧ · · · ∧ (xk ≐ yk) → f(x1, . . . , xk) ≐ f(y1, . . . , yk);
predicate monotonicity: (x1 ≐ y1) ∧ · · · ∧ (xk ≐ yk) ∧ p(x1, . . . , xk) → p(y1, . . . , yk).
A set A of literals is called ground equality if the only predicate in A is ≐ and no literals contain variables. We often write ¬(s ≐ t) as s ≠ t. A positive literal s ≐ t is called an equation and a negative literal s ≠ t is called an inequality. If constants c1 and c2 appear in A, we assume that (c1 ≠ c2) ∈ A, so that we do not distinguish identifiers (i.e., functions of zero arity) and constants. If predicates other than ≐ are needed, we treat them as functions: replace a positive literal p(x) by p(x) ≐ ⊤ and a negative literal ¬p(x) by p(x) ≐ ⊥, and add ⊤ ≠ ⊥ into A.
Given a set A of ground equality, is A satisfiable with the axioms of equality? For instance, in Example 12.2.1, let A be
{x0 ≐ a[3], x1 ≐ f(a[2], x0), x2 ≐ f(a[1], x1), y ≐ f(a[1], f(a[2], a[3])), x2 ≠ y};
then A is unsatisfiable iff x2 ≐ y is a logical consequence of the first four equations.
The above decision problem is traditionally called the congruence closure of equality with uninterpreted functions. Since a function is uninterpreted by default in first-order logic, we prefer to call it the congruence closure of ground equality. Note that ≐ is the only interpreted function and only the axioms of equality contain universally quantified variables. Some people also call it the congruence closure of quantifier-free equality. The set of quantifier-free formulas is the union of ground formulas and formulas with free variables. For the congruence closure problem under discussion, no free variables are present. Thus, "ground" is a better label than "quantifier-free" for the congruence closure problem.
The conventional approach, proposed by Shostak, Nelson, and Oppen, consists of the following steps:
1. Divide a set A of ground equality into equations E (positive literals) and inequalities D (negative literals).
2. Compute the congruence closure =_E, which is the congruence closure generated by E (Definition 7.1.7). It is the minimal equivalence relation satisfying the monotonicity axioms.
3. If there exists (s ≠ t) ∈ D such that s =_E t, then return "unsatisfiable"; otherwise, return "satisfiable."
The soundness of the above approach is stated as the following theorem:
Theorem 12.2.2 Given a set A of ground equality, including both equations and inequalities, let E ⊆ A be the set of all equations in A. If there exists (s ≠ t) ∈ A such that s =_E t, then A is unsatisfiable; otherwise, A is satisfiable.
The proof can be done by Herbrand models (Proposition 7.1.3) and is left as an
exercise. Now, we are ready for the proof of the following theorem:
Theorem 12.2.3 The congruence closure of ground equality is decidable.
Proof Given a set A of ground equality, including both equations and inequalities,
let E ⊆ A be the set of all equations in A. According to Theorem 7.2.22, there is
proc merge(v1, v2)
1 s1 := Find(v1); s2 := Find(v2)
2 if s1 = s2 return // already in the same class, exit
3 P1 := use(s1); P2 := use(s2)
4 Union(s1, s2) // union two classes into one
5 for (t1, t2) ∈ P1 × P2 do // congruence propagation
6 if Find(t1) ≠ Find(t2) ∧ congruent(t1, t2)
7 merge(t1, t2)
The pseudo-codes for congruent, Union, and Find are given below.
proc Union(s1, s2)
// Initially, for any node s, w(s) = 1, the weight of a singleton tree.
1 if w(s1) < w(s2) x := s1; s1 := s2; s2 := x // switch s1 and s2
2 up(s2) := s1 // s2's representative is s1
3 use(s1) := use(s1) ∪ use(s2); use(s2) := ∅
4 w(s1) := w(s1) + w(s2)
proc Find(v)
1 r := up(v) // look for the root of the tree containing v
2 while (r ≠ up(r)) r := up(r) // r is a root iff r = up(r)
3 if (r ≠ v) // compress the path from v to r
4 while (r ≠ v) x := up(v); up(v) := r; v := x
5 return r
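For reference, here is a compact Python sketch of weighted Union and path-compressing Find, mirroring the pseudocode above; the use sets needed by merge are omitted, and make must be called once per node.

```python
# A sketch of weighted union-find with path compression.
up, w = {}, {}

def make(v):           # register a new singleton node
    up[v] = v
    w[v] = 1

def find(v):
    r = v
    while r != up[r]:  # r is a root iff r == up[r]
        r = up[r]
    while v != r:      # compress the path from v to r
        nxt = up[v]
        up[v] = r
        v = nxt
    return r

def union(s1, s2):     # s1 and s2 are assumed to be roots
    if w[s1] < w[s2]:
        s1, s2 = s2, s1
    up[s2] = s1        # the lighter tree hangs under the heavier one
    w[s1] += w[s2]
```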
Algorithm 12.2.5 The algorithm NOS(A) takes a set A of ground equality, which is a mix of equations and inequalities, as input, creates the term graph G = (V, E) from all the terms in A, constructs the congruence classes for the equations in A, and checks the satisfiability of A.
proc NOS(A) : Boolean
1 G := createGraph(A) // up(v) = v for each v in G
2 for (s ≐ t) ∈ A do merge(vs, vt) // vs, vt represent s, t in G
3 for (s ≠ t) ∈ A do
4 if Find(vs) = Find(vt) return ⊥ // unsatisfiable
5 return ⊤ // satisfiable
Example 12.2.6 Continuing from the previous example: to handle f(a, b) ≐ a, merge(2, 3) is called. The first line of merge calls Find twice, with the results s1 = Find(2) = 2 and s2 = Find(3) = 3. Since s1 ≠ s2, we get P1 = {1} and P2 = {2}, preparing for congruence propagation.
At line 4 of merge, Union(2, 3) is called. Because w(2) = w(3) = 1, the second part of line 1 is skipped. At line 2, up(3) is changed to 2, meaning node 3 uses node 2 as its class name. At line 3, the parents of node 3 move to those of node 2. At line 4, w(2) is changed from 1 to 2, as the class tree increases its weight.
Now, back to line 5 of merge: since P1 × P2 = {(1, 2)}, t1 = 1 and t2 = 2 are the only case for t1 and t2. Since Find(1) = 1 and Find(2) = 2, congruent(1, 2) is called and returns true, because Find(child(1, 1)) = Find(2) = 2 = Find(3) = Find(child(2, 1)) and Find(child(1, 2)) = Find(child(2, 2)) = 4. Now, merge(1, 2) is recursively called at line 7 and it will merge node 1 into {2, 3} (i.e., up(1) = 2). The final congruence classes are {1, 2, 3} and {4}.
This rewrite system is canonical (confluent and terminating), flat, and singleton. It
can serve as a decision procedure for =E0 .
Let U be the set of all unit rules returned by the Knuth-Bendix completion
procedure. Any rule in U whose left side is an identifier introduced by flattening
and slicing can be removed, because this identifier occurs nowhere other than in this
rule; thus, there is no chance to use this rule. For the above example, the last rule,
i.e., c2 → c1 , can be deleted.
Example 12.2.10 Let E0 = {f³a ≐ a, f⁵a ≐ a}. Then, E1 = flattenSlicing(E0), where E1 contains the following equations:
{f(a) ≐ c1, f(c1) ≐ c2, f(c2) ≐ a, f(c2) ≐ c3, f(c3) ≐ c4, f(c4) ≐ a}
The first three equations come from f³a ≐ a; all the equations excluding f(c2) ≐ a come from f⁵a ≐ a, as we reuse the first two equations.
Feeding E1 to the Knuth-Bendix procedure, the first three equations will produce three rewrite rules:
(1) f(a) → c1, (2) f(c1) → c2, (3) f(c2) → a.
The fourth equation, f(c2) ≐ c3, is rewritten by (3) to a ≐ c3, from which we obtain
(4) c3 → a.
The fifth equation, f(c3) ≐ c4, is rewritten by (4) and (1) to c1 ≐ c4, from which we obtain
(5) c4 → c1.
The last equation, f(c4) ≐ a, is rewritten by (5) and (2) to c2 ≐ a, from which we obtain
(6) c2 → a,
which rewrites (2) to f(c1) → a. Furthermore, (6) reduces (3) to f(a) ≐ a, then by (1) to c1 ≐ a, from which we obtain
(7) c1 → a.
(1) becomes f(a) → a by (7). (7) also reduces (2) to f(a) ≐ a, then by (1) to a ≐ a.
The final rewrite system is
{f(a) → a, c1 → a, c2 → a, c3 → a, c4 → a}.
If we remove those rules in U whose left side is a new identifier, then only the first rule, i.e., f(a) → a, is needed for the decision procedure of =_E0.
During the execution of the Knuth-Bendix completion procedure (Proc. 7.2.15), let E be the set of remaining equations and R be the current set of rewrite rules; both E and R are ground, flat, and singleton. R can be partitioned into U and N, i.e., R = U ∪ N, where U are unit rules and N are non-unit rules. The above two examples illustrate some properties of (E, R), summarized by the following lemma.
Lemma 12.2.11 Suppose the input E to the Knuth-Bendix completion procedure
is a set of ground equations and R is the current set of rewrite rules. When a new
rewrite rule l → r is made, for any x → y of R, x is not rewritable by l → r at the
root of x. Moreover, if x is a constant, then x → y will stay in R forever.
Proof If l → r can rewrite x at its root, then l = x, because they are ground. So, l
can be rewritten to y by x → y, a contradiction to the assumption that l is in normal
form by R. If x is a constant, then the only way to rewrite x is at its root and that is
not possible.
We may observe the following properties when a rewrite system is kept flat and
singleton in the Knuth-Bendix completion procedure:
1. By Lemma 12.2.11, for any u → v ∈ U , i.e., it is a unit rule, u → v will never
be deleted by any other rewrite rule during the execution of Proc. 7.2.15.
2. When a new unit rule l → r is made, this unit rule can rewrite some rules of N
where l appears in the rules.
Like the Knuth-Bendix completion procedure (Proc. 7.2.15), the KBG procedure takes one equation from E at each iteration of the main loop, normalizes the equation by R = U ∪ N, tries to make a rewrite rule from the normalized equation, and adds the rule into R while keeping R inter-reduced. Some rules of R go back to E as the result of inter-reduction. The main loop terminates when E becomes empty. However, KBG has special features which make it more efficient than the general completion procedure:
• Critical pair computation is omitted because inter-reduction is sufficient for ground equations.
• The normalization procedure NF(t) uses nf(c) instead of U for each identifier c; the rewrite rules of N can be used at most once because t is flat. In fact, NF(t) can be done in O(|t|) time, since we can use a hash table or a trie structure to store N. Note that, since t is flat, |t| is bounded by a constant if the arity of any function is a constant.
• When a unit rule s → t is created, we update nf(c) to t for every c ∈ class(s) (line 9) and merge class(s) into class(t) (line 10). Recall that nf(c) is the normal form of c by U. Due to the existence of nf(c), rewrite rules in U are superficial: they are neither deleted nor simplified.
• The data structure use(s) records the set of non-unit rules which use identifier s in their left sides (line 14). This data structure is useful at line 9 to remove rules l → r from N when l is rewritable by a new rule s → t.
• When a non-unit rule is added into N (line 14), N as well as R = U ∪ N remains inter-reduced, so there is no need to rewrite N by the new rule.
The correctness of the above procedure comes directly from Theorem 7.2.22.
The time complexity of the procedure is given explicitly by the following theorem.
Theorem 12.2.13 If E is a set of ground, flat, and singleton equations, and n is the
total number of symbols in E, then procedure KBG runs in O(n log(n)).
Proof The main loop (lines 3–15) handles s ≐ t of E, and each line of the main loop takes various times to execute. Line 4 takes O(|s| + |t|) or O(1) time if we consider the arity of any symbol to be a constant. Line 9 takes O(|class(s)|) time; lines 11–12 take O(|use(s)|) time. All other lines take constant time. If each equation of E is processed only once, then the total time would be O(n), because the sums of |s| + |t|, |class(s)|, and |use(s)| over all equations s ≐ t and identifiers c are bounded by O(n). If no equations are processed more than m times, then the total time of KBG would be O(nm).
How many times can an equation go back from N to E (line 12)? Note that the root cause of moving an equation from N to E at line 12 is that we have a new unit rule s → t (line 8) and s appears in the moved equations. When making a rewrite rule s → t from s ≐ t, we ensure that |class(s)| ≤ |class(t)| (line 7) and then merge class(s) into class(t) (line 10). In other words, an equation moves from N to E only when (the left side of) this equation contains an identifier whose class will be merged into another class. Since a smaller class cannot be merged into a larger class more than log(n) times, no equation can be moved from N to E more than log(n) times. So, the total complexity of KBG is bounded by O(n log(n)).
Theorem 12.2.14 The congruence closure of ground equality can be decided in O(n log(n)), where n is the input size, i.e., the total number of symbols appearing in the input.
Proof Given a set A of ground equality, let E ⊆ A be all the equations of A. First, we call KBG(flattenSlicing(E)) to obtain a decision procedure R for =_E. Next, for each inequality (s ≠ t) ∈ A, we check if NF(s) = NF(t) using R; if yes, return "unsatisfiable." The whole process takes O(n log(n)) time (Theorem 12.2.13), as NF(s) and NF(t) take O(|s| + |t|) time. Note that s and t are not necessarily flat.
While Kapur’s ideas of flattening and slicing are used in the above procedure,
flattening is optional as the same result can be achieved by slicing alone (assuming
each hash function takes constant time); the proof is left as an exercise. Flattening
and slicing have been used successfully by Kapur to handle function symbols which
are commutative and associative. Nieuwenhuis and Oliveras’ algorithm can also
produce automated proofs of s =E t.
Linear arithmetic is a theory of first-order logic where the formulas are quantifier-free and contain existentially quantified variables over a domain D, where D can be of type rational or integer and supports the usual arithmetic operators such as +, −, ·, =, ≤, ≥, etc., in addition to the conventional Boolean operators. The formulas are linear, as multiplication, i.e., ·, is not allowed over two variables. An atomic linear arithmetic formula looks like
a1 · x1 + · · · + an · xn ≤ b, or a1x1 + · · · + anxn ≤ b,
where the ai and b are constants and the xi are variables.
Example 12.3.1 Consider the constraint ¬(x1 − 2x2 < 3). Applying ¬(A < B) ≡ B ≤ A, we obtain 3 ≤ x1 − 2x2. Moving 3 to the right and x1 − 2x2 to the left of ≤, we have −x1 + 2x2 ≤ −3.
The top two rows of the above table can also be used for preprocessing linear programming problems. Constraints like x ≥ a for a ≠ 0 can be replaced by y ≥ 0 with x = y + a.
Linear arithmetic is very expressive, as numerous problems in scientific and industrial computation can be expressed in linear arithmetic. Linear arithmetic is relevant to automated reasoning because most programs use arithmetic variables (e.g., integers) and perform arithmetic operations on those variables. Therefore, software verification has to deal with arithmetic. Linear arithmetic has been thoroughly investigated in at least two directions: (i) optimization via linear programming (LP), integer programming (ILP), and mixed real integer programming (MILP); and (ii) first-order quantifier elimination. In this chapter, we will focus on (i) only. In particular, we are interested in the satisfiability of linear arithmetic constraint problems in the context of the combination of theories, as they occur, e.g., in SMT (satisfiability modulo theories) solving.
For the objective function 4x1 + 3x2 , we may introduce a new variable x0 and
write it in standard form as x0 − 4x1 − 3x2 = 0, then (12.1) becomes (12.2):
Note that (12.2) has a simple matrix representation, which is used for the
implementation of the simplex method.
x0 x1 x2 x3 x4 x5 rhs
1 −4 −3 0 0 0 0
0 1 2 1 0 0 16
0 1 1 0 1 0 9
0 3 2 0 0 1 24
x1 + x3 = 16 x3 = 16 − x1 ≥ 0
x1 + x4 = 9 which give us x4 = 9 − x1 ≥ 0
3x1 + x5 = 24 x5 = 24 − 3x1 ≥ 0
x0 − (1/3)x2 + (4/3)x5 = 32
(4/3)x2 + x3 − (1/3)x5 = 8
(1/3)x2 + x4 − (1/3)x5 = 1
x0 x1 x2 x3 x4 x5 rhs
1 0 −1/3 0 0 4/3 32
0 0 4/3 1 0 −1/3 8
0 0 1/3 0 1 −1/3 1
0 1 2/3 0 0 1/3 8
Note that the above matrix can be obtained from the matrix representation of
(12.2) by the corresponding arithmetic operations on (12.2). For example, (12.3) is
the same as “dividing the fourth row of the matrix for (12.2) by 3.” “Adding (12.3)
4 times to (12.2)(a)” can be done by “adding the multiplication of the fourth row
by 4 to the first row of the matrix.” “Subtracting (12.3) from (12.2)(b)” can be done
by “subtracting the fourth row of the matrix from the second row of the matrix.”
“Subtracting (12.3) from (12.2)(c)” can be done by “subtracting the fourth row of
the matrix from the third row of the matrix.” The result is exactly the matrix for
(12.4). This is how the simplex method is implemented.
At the point (8, 0, 8, 1, 0), the value of x0 is 32, increasing from its old value 0.
Is this the optimal value for x0 ? From (12.4)(a), we have
x0 = (1/3)x2 − (4/3)x5 + 32
(4/3)x2 + x3 = 8 x3 = 8 − (4/3)x2 ≥ 0
(1/3)x2 + x4 = 1 which give us x4 = 1 − (1/3)x2 ≥ 0
x1 + (2/3)x2 = 8 x1 = 8 − (2/3)x2 ≥ 0
x2 + 3x4 − x5 = 3 (12.5)
x0 + x4 + x5 = 33 (12.6)
458 12 Decision Procedures
x3 − 4x4 + x5 = 4 (12.7)
x1 − 2x4 + x5 = 6 (12.8)
x0 x1 x2 x3 x4 x5 rhs
1 0 0 0 1 1 33
0 0 0 1 −4 1 4
0 0 1 0 3 −1 3
0 1 0 0 −2 1 6
Suppose the linear equational system has m constraints and n variables, with the
first m + 1 variables as basic variables [4]. Hence, we assume m ≤ n because,
when m > n, either the linear system has no solutions or one of its constraints is
redundant. Its standard form is given below:
x0 + 2x3 − 3x4 = 10
x1 + 2x3 − x4 = 3
x2 − x3 − 2x4 = 2
proc simplex(S)
1 Convert S into standard form and store it in matrix M.
2 Let r be the row of M with basic variable x0, which is to be maximized.
3 while ¬haltCondition(M) do
4 select a nonbasic variable v1 in r with the minimal coefficient.
5 select a basic variable v2 by pivoting with respect to v1.
6 switch the roles of v1 and v2 and convert M into standard form.
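As a sanity check, the worked example (12.1) can be handed to an off-the-shelf LP solver; below is a minimal sketch using SciPy's linprog, which minimizes, so the objective 4x1 + 3x2 is negated.

```python
# Checking the worked example with SciPy's LP solver.
from scipy.optimize import linprog

res = linprog(c=[-4, -3],                      # minimize -(4*x1 + 3*x2)
              A_ub=[[1, 2], [1, 1], [3, 2]],   # x1 + 2x2 <= 16, x1 + x2 <= 9,
              b_ub=[16, 9, 24],                # 3x1 + 2x2 <= 24
              bounds=[(0, None), (0, None)])
print(res.x, -res.fun)   # (6, 3) with optimal value 33, as derived above
```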
The complexity of the simplex algorithm is exponential in the worst case. Since there are polynomial decision procedures for linear programming, this means that there are at least theoretically faster decision procedures. But in contrast to those polynomial decision procedures, the simplex algorithm has all the properties necessary for an efficient linear programming solver: it produces minimal conflict explanations.
The name “linear programming” existed before the age of computers. Here “pro-
gramming” means “table,” as the data of a linear programming problem, or linear
program for short, are usually stored in a table.
The standard form of a linear program is given as follows:
Example 12.3.6 Continuing from the previous example, we introduce three variables, x3, x4, and x5, and convert (12.8) into a linear equational system:
(12.10) is identical to (12.1) in Sect. 12.3.1, and we have seen how (12.1) is solved by the simplex method. Hence, the optimal value of 4x1 + 3x2 is 33, when x1 = 6 and x2 = 3.
By definition, linear programming is an optimization problem. We can convert it
into a decision problem by introducing a constant c, so that “Maximize a0,1 x1 +
· · · + a0,n xn ” becomes “Decide a0,1 x1 + · · · + a0,n xn ≥ c.” Another decision
problem related to linear programming is to decide if the set of all constraints is
consistent. That is, can we find values of x1 , x2 , . . . , xn such that all constraints are
true? This problem can be easily solved if we convert linear programming into linear
equational system through slack variables.
The standard form of an integer program is the same as that of a linear program, with the additional constraint that the variables take only integer values:
Maximize 4x + 3y (12.12)
subject to
−x + y ≤ 1
x + 5y ≤ 12
3x + 2y ≤ 12
x, y are non-negative integers.
If we drop the constraint that x, y are integers, the problem becomes an example of linear programming. An optimal solution can be found by the simplex method: (x, y) = (2.4, 2.4), i.e., 4x + 3y = 16.8. With the constraint that x, y are integers, there are 11 feasible solutions (black dots in Fig. 12.2) and one optimal solution, i.e., (x, y) = (4, 0), or 4x + 3y = 16.
Whereas the simplex method is effective for solving linear programs, there is no single effective technique for solving integer programs. Instead, a number of techniques have been developed, and the performance of any particular technique appears to be highly problem dependent. Methods to date can be classified broadly into three approaches:
• Local search techniques: Start with some promising candidate solutions, and then choose some variables (e.g., those occurring in the violated constraints) to modify their values (e.g., add or subtract 1) in turn to find the neighbors of these candidate solutions. Select the best among these neighbors and repeat. Candidate solutions can be found by (1) relaxing the integer program to a linear program; (2) using the simplex method to find an optimal solution for the relaxed linear program; and (3) rounding the optimal solution up/down to integer solutions.
• Enumeration techniques: The well-known enumeration technique is the branch-and-bound procedure. This procedure is very much like the branch-and-bound procedure for maximum satisfiability (Sect. 4.4.4) and is especially effective for 0-1 linear programming; see the sketch after this list.
• Cutting-plane techniques: The basic idea of the cutting-plane method is to cut off parts of the feasible region of the linear programming search space, so that the optimal integer solution becomes an extreme point and therefore can be found by the simplex method. In practice, the branch-and-bound method mostly outperforms the cutting-plane method. Cutting plane was the first algorithm developed for integer programming that could be proved to converge in a finite number of steps. Even though the technique is considered not efficient, it has provided insights into integer programming that have led to other, more efficient, algorithms.
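As promised above, here is a minimal branch-and-bound sketch in Python, using SciPy's LP solver for the relaxations; the data encodes the constraints of Example 12.3.7 as printed, and bb is a hypothetical helper, not an industrial implementation.

```python
# A minimal branch-and-bound sketch for the integer program of Example 12.3.7.
import math
from scipy.optimize import linprog

c = [-4, -3]                       # minimize -(4x + 3y)
A = [[-1, 1], [1, 5], [3, 2]]      # -x + y <= 1, x + 5y <= 12, 3x + 2y <= 12
b = [1, 12, 12]

def bb(bounds, best=(-math.inf, None)):
    res = linprog(c, A_ub=A, b_ub=b, bounds=bounds)  # solve the LP relaxation
    if not res.success or -res.fun <= best[0]:
        return best                # infeasible, or bounded by the incumbent
    x = res.x
    frac = [i for i, v in enumerate(x) if abs(v - round(v)) > 1e-6]
    if not frac:                   # integral solution: new incumbent
        return (-res.fun, [round(v) for v in x])
    i = frac[0]                    # branch on the first fractional variable
    lo, hi = bounds[i]
    best = bb(bounds[:i] + [(lo, math.floor(x[i]))] + bounds[i+1:], best)
    return bb(bounds[:i] + [(math.ceil(x[i]), hi)] + bounds[i+1:], best)

print(bb([(0, None), (0, None)]))  # (16, [4, 0]): x = 4, y = 0, matching the text
```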
Example 12.3.8 In Example 12.3.7, (x, y) = (2.4, 2.4) is an optimal solution of the relaxed linear programming problem. The round-down solution is (x, y) = (2, 2), which is one of the starting points for local search. The neighbors of (x, y) = (2, 2) are {(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)}. From (2, 2), the local search will find (3, 1), then (4, 0), the optimal solution.
Applying the enumeration techniques to this example, we work on y first, with y = 0. The constraints give us 0 ≤ x ≤ 4, and the optimal value is at (x, y) = (4, 0). When y = 1, 2, the optimal values are at (x, y) = (3, 1), (2, 2), respectively. The overall optimum is (x, y) = (4, 0).
Applying the cutting-plane techniques to this example, we may add x < 2 and x ≥ 2, respectively, to the original constraints and search for optimal solutions of the more constrained subproblems. The overall optimal solution comes from x ≥ 2.
In addition, several composite procedures have been proposed, which combine
techniques using several of these approaches. In fact, there is a trend in computer
systems for integer programming to include a number of approaches and possibly
utilize them all when analyzing a given problem. The interested reader may consult
a textbook on integer programming for the details of these techniques.
duration of each execution is fixed, one can use difference arithmetic to express constraints like earliest start and end, the order of machine switches, and the sequencing of the executions.
There is a well-known, efficient decision procedure for difference constraints. Given a set of difference constraints, one can reduce the problem of checking its satisfiability to the problem of detecting negative cycles in an appropriately generated graph. Then, any of the negative-cycle-detection algorithms can be used to decide the given constraints. For instance, the classic Bellman-Ford algorithm can decide m difference constraints on n variables in O(nm) time and O(n + m) space.
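A sketch of this decision procedure in Python: each constraint x − y ≤ c becomes an edge from y to x with weight c (one common convention), every node starts at distance 0 as if connected to a virtual source, and a further successful relaxation pass signals a negative cycle.

```python
# A sketch of the difference-constraint solver via Bellman-Ford.
# constraints: triples (x, y, c) meaning x - y <= c.
def solve_difference(constraints):
    nodes = {v for x, y, _ in constraints for v in (x, y)}
    dist = {v: 0 for v in nodes}     # virtual source at distance 0 to all
    edges = [(y, x, c) for x, y, c in constraints]  # x - y <= c: edge y -> x
    for _ in range(len(nodes)):      # relax |V| - 1 times (one extra is harmless)
        for u, v, c in edges:
            if dist[u] + c < dist[v]:
                dist[v] = dist[u] + c
    for u, v, c in edges:            # one more pass detects negative cycles
        if dist[u] + c < dist[v]:
            return None              # negative cycle: unsatisfiable
    return dist                      # dist[x] - dist[y] <= c for all constraints
```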
Example 12.3.9 Consider the following set of integer difference constraints:
The directed weighted graph corresponding to this set is shown in Fig. 12.3a. Obviously, there exist no negative cycles in the graph. There exist infinitely many assignments to x, y, and z such that all the constraints are satisfied, for instance, (x, y, z) = (0, 0, 0) or (2, 6, 3).
If we add (E) : (x − y ≤ −5) to the above set of constraints, then (E) makes (A) trivially true, thus replacing (A). The new graph is shown in Fig. 12.3b. Now, there exists a negative cycle in (b): x ⇒ z ⇒ y ⇒ x. No matter how we assign values to x, y, and z, we would have 0 ≤ −1, a contradiction.
We have now seen a basic theory solver for difference arithmetic. It determines
satisfiability of a conjunction of literals by checking the absence of negative cycles
in the induced weighted graph.
12.4 Making DPLL(T ) Practical 465
proc analyzeConflict()
// return a unit clause learned from a conflicting clause
1 c := conflictClause // this is the latest conflicting clause
2 U := insertNewClause(conflictAnalysis(c)) // new level is set
3 undo(); // undo to the new level
4 return U // new clause is unit under σ after undo().
The procedures conflictAnalysis, undo, and insertNewClause are the same as
those in Sect. 4.2 for conflict-directed clause learning.
Example 12.4.2 Suppose X contains the clauses from Example 12.1.6:
(b ≐ c), (h(a) ≠ h(c)) ∨ p, (a ≐ b) ∨ ¬p ∨ (a ≐ d), (a ≐ b) ∨ (a ≐ d)
To deal efficiently with this formula, we need decision procedures for the following theories:
• The theory of uninterpreted functions (T_EUF)
• The theory of linear integer arithmetic (T_LIA)
• The theory of arrays (T_arr)
T_EUF and T_LIA have been introduced in the previous sections. For T_arr, we need two operators over arrays, read and write:
• read(A, i): read from array A at index i. It returns A[i].
• write(A, i, d): write d to array A at index i. It returns A after the execution of "A[i] := d."
The following three axioms are assumed in T_arr:
read(write(A, i, d), i) ≐ d.
read(write(A, i, d), j) ≐ read(A, j) if i ≠ j.
(∀i read(A, i) ≐ read(B, i)) ↔ (A ≐ B).
The last axiom is called the extension axiom, which is needed in the current example. In the example, "A ≐ write(B, a + 1, 4)" asks us to show that array A is equal to array B after the execution of B[a + 1] := 4.
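The first two axioms can be animated with a toy functional-array model in Python; this is only a model of the axioms (arrays as immutable dictionaries with a default content), not a decision procedure for T_arr.

```python
# A toy model of read/write illustrating the first two axioms of T_arr.
def write(A, i, d):
    B = dict(A)          # arrays are treated as immutable: copy, then update
    B[i] = d
    return B

def read(A, i):
    return A.get(i, 0)   # 0 as an arbitrary default content

A = {}
assert read(write(A, 3, 7), 3) == 7            # read(write(A, i, d), i) = d
assert read(write(A, 3, 7), 4) == read(A, 4)   # the i != j case
```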
Example 12.4.3 We may prove the unsatisfiability of
a ≐ b + 1 ∧ (A ≐ write(B, a + 1, 4) ∨ f(a − 1) ≠ f(b)) ∧ read(A, b + 2) ≐ 5
as follows: by T_LIA, a − 1 ≐ b and a + 1 ≐ b + 2 are true when a ≐ b + 1 is true. By T_EUF, f(a − 1) ≠ f(b) is false when a − 1 ≐ b is true. By T_arr, read(write(B, a + 1, 4), a + 1) ≐ 4 is true, and A ≐ write(B, a + 1, 4) is false when read(write(B, a + 1, 4), a + 1) ≐ 4, read(A, b + 2) ≐ 5, a + 1 ≐ b + 2, and 4 ≠ 5 are true. Since both f(a − 1) ≠ f(b) and A ≐ write(B, a + 1, 4) are false, the input formula is false.
Now, the question is: given theory solvers for the three individual theories, can we combine them to obtain one for T_LIA ∪ T_arr ∪ T_EUF? It appears from the proof of the above example that we may call the solver for T_EUF first to obtain all the equalities. However, new equalities may be created by T_LIA (e.g., a ≐ b from a ≥ b and b ≥ a) or T_arr. It is indeed a difficult task to combine theory solvers into one. Nelson and Oppen proposed a combination algorithm that combines theory solvers under some sufficient conditions. We will use the previous example to illustrate Nelson and Oppen's ideas.
The first step is to purify the input, which is a set of SMT literals, by introducing
interface constants and dividing the input by theories.
For the input formula in Example 12.4.3, there are three theories, and one case arising in the DPLL(T) framework is to check whether the following conjunction is satisfiable:

a ≐ b + 1 ∧ f(a − 1) ≠ f(b) ∧ read(A, b + 2) ≐ 5

The input is divided into three groups, F1, F2, and F3, plus a common group CA over the set C of interface constants. For instance, the literal f(a − 1) ≠ f(b) converts to {a − 1 ≐ e1, f(e1) ≐ e2, f(b) ≐ e3, e2 ≠ e3} by introducing three constants. The literal read(A, b + 2) ≐ 5 converts into {read(A, e4) ≐ e5, b + 2 ≐ e4, 5 ≐ e5}. For the whole input, we need seven interface constants, i.e., C = {a, b, e1, e2, e3, e4, e5}.
F1            F2            F3
b + 1 ≐ a     f(e1) ≐ e2    read(A, e4) ≐ e5
a − 1 ≐ e1    f(b) ≐ e3
b + 2 ≐ e4
5 ≐ e5
It is easy to see that F2 ∪ CA is unsatisfiable in TEUF: the equality b ≐ e1 (derived from F1 in TLIA and added to CA) implies f(e1) ≐ f(b), thus e2 ≐ e3, contradicting e2 ≠ e3. Hence, the second step ends with "unsatisfiable."
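Purification itself is a mechanical term traversal. The rough sketch below (ours; the signature map and helper names are our own simplification) replaces every subterm owned by a different theory with a fresh interface constant and records the defining equation in the owner's group; running it on f(a − 1) ≠ f(b) reproduces the conversion above.

# A rough sketch (ours) of purification: alien subterms are named by
# fresh interface constants; defining equations go to the owner's group.
import itertools

FRESH = (f"e{i}" for i in itertools.count(1))
SIG = {'f': 'EUF', 'read': 'ARR', '+': 'LIA', '-': 'LIA'}  # symbol owners

def define(t, owner, groups):
    e = next(FRESH)                       # a fresh interface constant
    groups[owner].append((e, '=', t))
    return e

def purify(term, home, groups):
    if not isinstance(term, tuple):
        return term                       # a variable or a number
    op, *args = term
    owner = SIG[op]
    t = (op, *(purify(s, owner, groups) for s in args))
    return t if owner == home else define(t, owner, groups)

groups = {'LIA': [], 'EUF': [], 'ARR': [], 'CA': []}
# The literal f(a - 1) != f(b), with terms as nested tuples:
e2 = define(purify(('f', ('-', 'a', 1)), 'EUF', groups), 'EUF', groups)
e3 = define(purify(('f', 'b'), 'EUF', groups), 'EUF', groups)
groups['CA'].append((e2, '!=', e3))
print(groups)   # LIA: e1 = a - 1; EUF: e2 = f(e1), e3 = f(b); CA: e2 != e3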
The set CA is called a constant arrangement. If Fi ∪ CA is unsatisfiable in some theory Ti, then the input is unsatisfiable. Otherwise, we add to CA any new equality over C that is true in some Ti, until no new equality exists; in this case, we say CA is saturated. When a saturated constant arrangement is found and Fi ∪ CA is satisfiable in Ti for every i, we claim that the input is satisfiable. These ideas can be described by the following procedure.
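A minimal sketch of that procedure (ours; each theory solver is assumed to expose a hypothetical method check(F, CA) returning either "unsat" or the set of equalities over interface constants implied by F ∪ CA):

# A sketch (ours) of the Nelson-Oppen saturation loop; the solver
# interface check(F, CA) is a simplifying assumption.
def nelson_oppen(groups, solvers, CA):
    changed = True
    while changed:
        changed = False
        for name, F in groups.items():
            result = solvers[name].check(F, CA)
            if result == "unsat":
                return "unsat"            # some Fi with CA is unsatisfiable
            new = result - CA             # implied equalities not yet in CA
            if new:
                CA |= new                 # propagate to the other theories
                changed = True
    return "sat"                          # CA saturated; every Fi with CA satisfiable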
Nelson and Oppen showed that the above procedure works correctly under two
sufficient conditions.
Definition 12.4.4 A theory T is stably infinite if every T -satisfiable quantifier-free
formula has an infinite model.
Many interesting theories are stably infinite, including theories of an infinite structure (e.g., integer arithmetic); TEUF is also known to be stably infinite. However, there are interesting theories that are not stably infinite, namely theories of finite structures, e.g., the theory of bit vectors of finite size, arithmetic modulo n, or the theory of strings of bounded length.
Definition 12.4.5 A theory T is convex if for any set S of literals, whenever S ⊨ a1 ≐ b1 ∨ · · · ∨ an ≐ bn, then S ⊨ ai ≐ bi for some 1 ≤ i ≤ n.
Theorem 12.4.6 If a conjunction S of literals can be split among signature-disjoint, stably infinite, and convex theories Ti, 1 ≤ i ≤ n, then the (T1 ∪ · · · ∪ Tn)-satisfiability of S can be checked with the Nelson-Oppen algorithm.
A proof of the above theorem can be found in [5].
Some popular theories, like integer linear arithmetic TLIA, are not convex. For instance, in TLIA, 1 ≤ x ≤ 2 is equivalent to x ≐ 1 ∨ x ≐ 2. However, neither TLIA, 1 ≤ x ≤ 2 ⊨ (x ≐ 1) holds, nor TLIA, 1 ≤ x ≤ 2 ⊨ (x ≐ 2).
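This failure of convexity is easy to confirm mechanically; a small check with the Z3 Python API (ours):

# Confirming that TLIA is not convex (a sketch, ours).
from z3 import And, Int, Not, Or, Solver, unsat

x = Int('x')

def entails(hyp, concl):
    s = Solver()
    s.add(hyp, Not(concl))
    return s.check() == unsat    # hyp entails concl iff hyp and not-concl is unsat

rng = And(1 <= x, x <= 2)
print(entails(rng, Or(x == 1, x == 2)))   # True: the disjunction is entailed
print(entails(rng, x == 1))               # False
print(entails(rng, x == 2))               # False: no single disjunct is entailed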
Attempts to get rid of either the stable-infiniteness condition or the convexity condition have been reported with success [5].
Example 12.4.7 Consider the input S = {f(1) ≐ a, f(x) ≐ b, f(2) ≐ c, 1 ≤ x, x ≤ 2, b ≐ a − 1, b ≐ c + 1}. To show that S is unsatisfiable, we may add case analysis to Nelson and Oppen's combination algorithm by considering the two cases where x ≐ 1 is true or false. The details are left as an exercise.
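Before doing the hand simulation, one can confirm with Z3 that S is indeed unsatisfiable (a sketch, ours):

# Confirming that S of Example 12.4.7 is unsatisfiable (a sketch, ours).
from z3 import Function, Ints, IntSort, Solver, unsat

x, a, b, c = Ints('x a b c')
f = Function('f', IntSort(), IntSort())

s = Solver()
s.add(f(1) == a, f(x) == b, f(2) == c,
      1 <= x, x <= 2, b == a - 1, b == c + 1)
assert s.check() == unsat    # x = 1 forces b = a; x = 2 forces b = c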
To see what theories are involved in SMT problems, the best place to look is the SMT-LIB website:
smtlib.cs.uiowa.edu/index.shtml
Fig. 12.4 SMT-LIB: The group structure of SMT problems by theories. The arrow shows the
subgroup relation. Source: smtlib.cs.uiowa.edu/index.shtml
Each group of problems in SMT-LIB is named by abbreviations of the theories used by the group and some major restrictions of the group, with the following nomenclature:
• QF for the restriction to quantifier-free formulas
• A or AX for the theory of arrays
• BV for the theory of fixed-size bit vectors
• IA for the theory of integer arithmetic
• RA for the theory of real-number arithmetic
• IRA for the theory of mixed integer and real arithmetic
• IDL for the theory of integer difference logic
• UF for uninterpreted function symbols (other than the operations from arithmetic
and equality)
• L before IA, RA, or IRA for the linear fragment of those arithmetics
• N before IA, RA, or IRA for the nonlinear fragment of those arithmetics
The above abbreviated names can be combined to form subgroups or supergroups. For instance, the supergroup AUFNIRA, a combination of A, UF, N, and IRA, denotes the group of problems from the nonlinear (N) fragment of mixed integer and real arithmetic (IRA), extended with arrays (A) and uninterpreted functions (UF).
The website notes that the reason for this nomenclature is mostly historical. For
instance, the set of quantifier-free (QF) formulas is the union of ground formulas
and non-ground formulas with free variables. However, the website does not have a
group named “ground.” When needed, it uses the word “closed” in the description
of a group. Indeed, closed QF formulas are exactly ground formulas.
A major effort of SMT-LIB is to provide a standard input/output language for SMT solvers. Researchers use this language to create new problems and submit them to SMT-LIB. Developers of SMT solvers use SMT-LIB to experiment with and test new algorithms. SMT-LIB also provides links to SMT solvers and related tools and utilities.
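As an illustration, here is a tiny problem written in the SMT-LIB 2 language, checked by handing the text to Z3's SMT-LIB parser (a sketch, ours; the logic name QF_UFLIA follows the nomenclature above):

# A small QF_UFLIA problem in SMT-LIB 2 syntax, parsed and solved by Z3
# (a sketch, ours).
from z3 import Solver, parse_smt2_string, unsat

script = """
(set-logic QF_UFLIA)
(declare-fun f (Int) Int)
(declare-const a Int)
(declare-const b Int)
(assert (= a (+ b 1)))
(assert (distinct (f (- a 1)) (f b)))
"""

s = Solver()
s.add(parse_smt2_string(script))
assert s.check() == unsat    # a = b + 1 forces f(a - 1) = f(b)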
The annual SMT solver competitions (SMT-COMP), organized at
github.com/SMT-COMP/
have provided further evidence that such a competition can stimulate improvement in solver implementations: solvers entered in each competition have improved significantly over those in previous competitions. SMT-LIB is also the place to find state-of-the-art SMT solvers.
The competition consists of several divisions by problem type, and each division consists of several tracks by mode of solving. For SMT-COMP 2023, there were 24 divisions and a total of 95 tracks. Some frequent or high-performance winners of these tracks are listed below:
• CVC5: an SMT solver developed mainly at Stanford University and the University
of Iowa
• Yices2: an SMT solver developed and distributed by SRI International
• Bitwuzla: an SMT solver developed at Stanford University
• SMTInterpol: an SMT solver written in Java, developed at the University of Freiburg
• OpenSMT: an SMT solver written in C++, developed at USI, Switzerland
• Z3: an SMT solver developed by Microsoft, and Z3++, a modification of Z3
(4.8.15), developed at the Chinese Academy of Sciences
• STP: a solver for QF-BV and arrays, developed at Stanford University and the
University of Melbourne
• Vampire 4.8: a theorem prover for first-order logic, developed at the University of
Manchester and the Vienna University of Technology
• iProver 3.8: a theorem prover for quantified first-order logic with support for
arithmetical reasoning, developed at the University of Manchester
These SMT solvers are freely available on the internet. The interested reader may
download a copy of these solvers and experiment with the problems from SMT-LIB.
There exist outstanding solvers that work well with a single theory, and some
examples are listed below:
• Linear Arithmetic (LA): SPASS-SATT from Max Planck Institute (MPI) for
Informatics, https://www.mpi-inf.mpg.de/departments/automation-of-logic/
software/spass-workbench/spass-satt
• Nonlinear Integer Arithmetic (NIA): AProVE from RWTH Aachen University,
https://aprove.informatik.rwth-aachen.de/ and BLAN from Chinese Academy
of Sciences, https://github.com/fuqi-jia/BLAN
• String Constraints: OSTRICH from several universities in the UK, Germany, and
China, https://github.com/uuverifiers/ostrich
This chapter will be updated in the near future to report on mature and efficient
techniques for SMT problems.
Exercises
please run the Knuth-Bendix procedure on E with the ordering f > a > c1 >
c2 > c3 > c4 , and list all the rewrite rules made from the procedure in the order
they were created.
6. Let E0 = {f(a, g(a)) ≐ g(g(a))} (Example 12.2.9). Please run
KBG(flattenSlicing(E0)) by hand and show the contents of nf(c), class(c),
and use(c) for each identifier c before and after the main loop of KBG.
7. Assuming each hash function takes constant time, prove that Theorem 12.2.14
remains true if the flatten operation is omitted. In other words, the slicing
operation is sufficient for Theorem 12.2.14.
8. The final matrix given in Sect. 12.3.1 is

   x0  x1  x2  x3  x4  x5  rhs
   1   0   0   0   1   1   33
   0   0   0   1   −4  1   4
   0   0   1   0   3   −1  3
   0   1   0   0   −2  1   6
10. Please provide the details of running Nelson and Oppen's combination algorithm on the following inputs involving TLIA and TEUF:
a. S1 = {f(f(x − 1) − f(y)) ≐ a + 1, f(0) > a + 2, x − y ≐ 1}.
b. S2 = {f(g(x − 1)) ≐ x + 1, f(g(y + 1)) ≐ y − 1, x ≐ y + 2}.
c. S3 = {f(x) ≥ 1, f(y) ≐ 0, x < y + 1, x ≥ y}.
11. Please show the execution details of Nelson and Oppen's combination algorithm on the following input:
a ≐ b + 1 ∧ A ≐ write(B, a + 1, 4) ∧ read(A, b + 2) ≐ 5
12. Please add case analysis to Nelson and Oppen's combination algorithm and show how the input S is shown to be unsatisfiable (see Example 12.4.7):
S = {f(1) ≐ a, f(x) ≐ b, f(2) ≐ c, 1 ≤ x, x ≤ 2, b ≐ a − 1, b ≐ c + 1}.
References
1. Egon Börger, Erich Grädel, Yuri Gurevich, The Classical Decision Problem, Springer-Verlag, Berlin, 1997.
2. Robert Nieuwenhuis, Albert Oliveras, Cesare Tinelli, "Solving SAT and SAT Modulo Theories: From an Abstract Davis–Putnam–Logemann–Loveland Procedure to DPLL(T)", Journal of the ACM, 53(6): 937–977, 2006.
3. Deepak Kapur, "Shostak's congruence closure as completion", in Hubert Comon (ed.), Rewriting Techniques and Applications, 8th International Conference, RTA-97, Sitges, Spain, June 2–5, 1997, Lecture Notes in Computer Science, vol. 1232, pages 23–37, Springer, 1997.
4. George Dantzig, "Origins of the simplex method", in Stephen G. Nash (ed.), A History of Scientific Computing, Association for Computing Machinery, 1987.
5. Daniel Kroening, Ofer Strichman, Decision Procedures: An Algorithmic Point of View, 2nd ed., Springer, 2016.
Index

Symbols
α-rule, 83, 369
  first-order logic, 185
  release, 381
β-rule, 83, 369
  first-order logic, 185
  until, 381
λ-calculus, 397

A
Algebra
  Boolean, 25
  categorical, 25
  universal, 25
Algorithm, 23, 409
Alphabet, 162
Argument, 8
Aristotle, 5
Arity, 163
Assertion, 316
  middle, 324
Assignment
  rule, 320
Atom
  atomic formula, 164
Axiom, 26
  consistent, 26
  independent, 26

B
Backus, John, 21
Backus–Naur form (BNF), 21, 181
Bakery algorithm, 384
Bendix, Peter, 221, 252
Bijection, 14
  computable, 419
  counting, 419
  increasing, 419
Bijective proof, 419
Binary decision diagram (BDD), 57
Boethius, 5
Boole, George, 5, 25
Boolean, 4
  algebra, 25
  function, 14, 35
  variable, 4
Boolean constraint propagation (BCP), 106
Bounded model checking (BMC), 381
Buchberger, Bruno, 258
Buchberger's algorithm, 258

C
Chomsky, Noam, 23
Church, Alonzo, 23, 182, 183, 397
Church-Turing thesis, 23, 24, 396
Clarke, Edmund, 381
Clause, 51
  binary, 94
  definite, 108, 288
  computable, 397
  degree, 414
  equivalent, 414
Turing, Alan, 23, 397
Turing machine, 396
  decider, 405
  recognizer, 405
  universal, 396, 415
Type, 168, 264, 433
  strong typing, 181
Type theory, 183
  simple, 183

U
Unifiable, 202
Unification
  clash, 204
  occur check, 204
Unifier, 202
  mgu, 204
  more general, 204
Unit deletion, 217
Unit propagation, 106
Universal halting problem, 407
Universe, 168
  Herbrand, 196
  of discourse, 11, 168
Until, 377
  weak, 380

V
Valid, 42
  first-order, 173
  LTL, 361
  TL, 355
Variable
  basic, 455
  Boolean, 4
  bounded, 167
  free, 167
  propositional, 4
  scope, 167
  slack, 460
Verification condition, 331
Von Neumann, John, 23, 397

W
Weighted CNF (WCNF), 145
Well-founded
  induction, 218
  order, 13
Well order, 13
William of Ockham, 5
Wu, Wenjun, 258
Wu's method, 258

Z
Zermelo, Ernst, 19
Zermelo–Fraenkel set theory (ZF), 20