Editors
David Gries
Fred B. Schneider
Elements of Computation
Theory
ABC
Arindama Singh
Department of Mathematics
Indian Institute of Technology Madras
Sardar Patel Road
Chennai - 600036
India
[email protected]
Series Editors
David Gries, Department of Computer Science, Upson Hall, Cornell University, Ithaca, NY 14853-7501, USA
Fred B. Schneider, Department of Computer Science, Upson Hall, Cornell University, Ithaca, NY 14853-7501, USA
What is an algorithm?
What can be computed and what cannot be computed?
What does it mean for a function to be computable?
How does computational power depend upon programming constructs?
Which algorithms can be considered feasible?
For more than 70 years, computer scientists have been searching for answers to such
questions. Their ingenious techniques used in answering these questions form the
theory of computation.
Theory of computation deals with the most fundamental ideas of computer sci-
ence in an abstract but easily understood form. The notions and techniques employed
are widely spread across various topics and are found in almost every branch of com-
puter science. It has thus become essential to revisit the foundations, learn
the techniques, and apply them with confidence. The aims of this book are the following:
• To introduce to the students of computer science and mathematics the elegant and
useful models and abstractions that have been created over the years for solving
foundational problems about computation
• To help the students develop the ability to form abstract models of their own and
to reason about them
• To strengthen the students’ capability for carrying out formal and rigorous
arguments about algorithms
• To equip the students with the knowledge of the computational procedures that
have haunted our predecessors, so that they can identify similar problems and
structures whenever they encounter them
• To make the essential elements of the theory of computation accessible to less
mature students with little mathematical background, in a way that is
mathematically uncompromising
• To make the students realize that mathematical rigour in arguing about algorithms
can be very attractive
• To keep in touch with the foundations as computer science has become a much
more mature and established discipline
Organization
Chapter 1 reviews very briefly the mathematical preliminaries such as set theory,
relations, graphs, trees, functions, cardinality, Cantor’s diagonalization, induction,
and the pigeonhole principle. The pace is not uniform. The topics likely to be unknown
to Juniors are discussed in detail.
The next three chapters are about regular languages. Chapter 2 introduces
four mechanisms for representing languages, each in its own way: regular expressions,
regular grammars, deterministic finite automata, and nondeterministic finite automata.
The fact that all these mechanisms represent the same class of
languages is shown in Chap. 3. The closure properties of such languages, existence
of other languages, other structural properties such as almost periodicity, Myhill–
Nerode theorem, and state minimization are discussed in Chap. 4.
Chapters 5 and 6 concern the class of context-free languages. Here we discuss
context-free grammars, pushdown automata, their equivalence, closure properties,
and the existence of noncontext-free languages. We also discuss parsing, ambiguity, and
the two normal forms of Chomsky and Greibach. Deterministic pushdown automata
are introduced, but their equivalence to LR(k) grammars is not proved.
Chapters 7 and 8 discuss the true nature of general algorithms introducing the
unrestricted grammars, Turing machines, and their equivalence. We show how to take
advantage of modularity of Turing machines for doing some complex jobs. Many
possible extensions of Turing machines are tried and shown to be equivalent to the
standard ones. Here, we show how Turing machines can be used to compute functions
and decide languages. This leads to the acceptance problem and its undecidability.
Chapter 9 discusses the jobs that can be done by algorithms and the jobs that
cannot be. We discuss decision problems about regular languages, context-free lan-
guages, and computably enumerable languages. The latter class is tackled greedily
by the use of Rice’s theorem. Other than problems from language theory, we discuss
unsolvability of Post’s correspondence problem, the validity problem of first order
logic, and of Hilbert’s tenth problem.
Chapter 10 is a concise account of both space and time complexity. The main
techniques of log space reduction, polynomial time reduction, and simulations in-
cluding Savitch’s theorem and tape compression are explained with motivation and
rigour. The important notions of NL-completeness and NP-completeness are ex-
plained at length. After proving the Cook–Levin theorem, the modern approach of
using gadgets in problem reduction and the three versions of optimization problems
are discussed with examples.
Special Features
There are places where the approach has become nonconventional. For example,
transducers are in additional problems, nondeterministic automata read only sym-
bols not strings, pushdown automata require both final states and an empty stack for
acceptance, normal forms are not used for proving the pumping lemma for context-
free languages, Turing machines use tapes extended both ways having an accepting
state and a rejecting state, and acceptance problem is dealt with before talking about
halting problem. Some of the other features are the following:
• All bold-faced phrases are defined in the context; these are our definitions.
• Each definition is preceded by a motivating dialogue and succeeded by one or
more examples.
• Proofs always discuss a plan of attack and then proceed in a straightforward and
rigorous manner.
• Exercises are spread throughout the text forcing lateral thinking.
• Problems are included at the end of each section for reinforcing the notions learnt
so far.
• Each chapter ends with a summary, bibliographical remarks, and additional
problems. These problems are the unusual and hard ones; they require the guid-
ance of a teacher or browsing through the cited references.
• An unnumbered chapter titled Answers/Hints to Selected Problems contains
solutions to more than 500 out of more than 2,000 problems.
• It promotes interactive learning, building the confidence of the student.
• It emphasizes the intuitive aspects and their realization with rigorous formalization.
Target Audience
This is a textbook primarily meant for a semester course at the Juniors level. In
IIT Madras, such a course is offered to undergraduate Engineering students at their
fifth semester (third year after schooling). The course is also credited by masters
students from various disciplines. Naturally, the additional problems are tried by such
masters students and sometimes by unusually bright undergraduates. The book (in
notes form) has also been used for a course on Formal Language Theory offered to
Masters and research scholars in mathematics.
languages (Sects. 5.2–5.4), pushdown automata, pumping lemma and closure prop-
erties of context-free languages (Sects. 6.2, 6.4, and 6.5), computably enumerable
languages (Chap. 7), a noncomputably enumerable language (Chap. 8), algorithmic
solvability (Sects. 9.2–9.4), and computational complexity (Chap. 10). Depending
on the stress on certain aspects, some of the proofs from these core topics can be
omitted and other topics can be added.
I cheerfully thank
My students for expressing their wish to see my notes in the book form,
IIT Madras for relieving me of teaching for a semester, for setting a deadline
for early publication, and for partial financial support under the Golden Jubilee Book
Writing Scheme,
Prof. David Gries and Prof. Fred B. Schneider, series editors for Springer texts in
computer science,
Mr. Wayne Wheeler and his editorial team for painstakingly going through the
manuscript and suggesting improvements in presentation,
Prof. Chitta Baral of Arizona State University for suggesting the inclusion of the chapter
on mathematical preliminaries,
Prof. Robert I. Soare of the University of Chicago, Dr. Abhaya Nayak of Macquarie
University, and Dr. Sounaka Mishra of IIT Madras for suggesting improvements,
My family, including my father Bansidhar Singh, mother Ragalata Singh, wife
Archana, son Anindya Ambuj, daughter Ananya Asmita, for tolerating my obses-
sion with the book, and
My friends Mr. Biswa R Patnaik (in Canada) and Mr. Sankarsan Mohanty (in Orissa)
for their ever inspiring words.
Arindama Singh
Contents
1 Mathematical Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Relations and Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Functions and Counting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.5 Proof Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.6 Summary and Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2 Regular Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.2 Language Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.3 Regular Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.4 Regular Grammars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.5 Deterministic Finite Automata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.6 Nondeterministic Finite Automata . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.7 Summary and Additional Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3 Equivalences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.2 NFA to DFA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.3 Finite Automata and Regular Grammars . . . . . . . . . . . . . . . . . . . . . . . 76
3.4 Regular Expression to NFA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.5 NFA to Regular Expression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.6 Summary and Additional Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415
1 Mathematical Preliminaries
1.1 Introduction
An African explorer conversant with the language of the Hottentot tribe asks a native,
“How many children do you have?” The tribesman answers, “Many.” The determined
explorer persists on. He shows his index finger, meaning “one?” Promptly comes the
answer, “no.” He adds his middle finger, meaning “two”; the answer is “no”; “three?,”
“no”; “four,” “no.” Now all the five fingers on the explorer’s right hand are straight.
Answer comes, “yes.” The puzzled explorer experiments with another tribesman.
Over the next week, he discovers that they have only three kinds of numbers, one,
two, and many.
It is an old story, but perhaps not without morals. The Hottentot tribesman does
not have a way of naming the numbers more than two. How does he manage his
cattle?
Our mathematical tradition has gone so far and so deep that it is indeed difficult
to imagine living without it. In this small chapter, we will discuss a fragment of this
tradition so that the rituals of learning the theory of computation can be conducted
relatively easily. In the process, we will fix our notation.
1.2 Sets
A set is a collection of objects, called its members or elements. When writing a set by
showing its elements, we enclose them within a pair of curly brackets. The curly brackets
serve two purposes: one, they show the elements inside, and two, they say that the whole
thing put together is another object in its own right. Sometimes, we become tired of
writing out all the elements, and we put three dots. For example,
{pen, pencil, knife, scissors, paper, chalk, duster, paper weight, . . .}
is the set of names of educational stationery. It is the extensional way of representing
a set. But what about the expression, “the set of names of educational stationery?”
It is a set, nonetheless, and the same set as above. We thus agree to represent a
set by specifying a property that may be satisfied by each of its elements. It is the
intensional way of representing a set. (Note the spelling of “intensional.”) If A is a
set and a is an element in A, we write it as a ∈ A. The fact that a is not an element
of A is written as a ∉ A.
When specifying a set by a property, we write it as {x : P(x)}, meaning that this
set has all and only those x as elements which satisfy a certain property P(·). Two sets
A, B are said to be the same set, written A = B, if each element of A is in B and each
element of B is in A. In that case, their defining properties must be logically equiv-
alent. For example, the set {2, 4, 6, 8} can be written as
{x : x = 2 or x = 4 or x = 6 or x = 8}. Also, {2, 4, 6, 8} =
{x : 2 divides x and x is an integer with 0 < x < 10}. Further,
{2, 4, 6, 8} = {4, 8, 6, 2}; the order of the elements, when written down
explicitly, does not matter, and it is assumed that there are no repetitions of elements
in a set.
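As an aside, the same conventions are mirrored by Python's built-in sets, which also ignore order and repetition of elements; this is only an illustrative sketch:

```python
# Python's built-in sets also ignore order and repetition of elements.
A = {2, 4, 6, 8}
assert A == {4, 8, 6, 2}            # order does not matter
assert A == {2, 2, 4, 4, 6, 8}      # repetitions collapse

# The intensional description: integers x with 0 < x < 10 divisible by 2.
B = {x for x in range(1, 10) if x % 2 == 0}
assert A == B
```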
Two sets A, B are equal, written A = B, whenever they have precisely the same
elements. We say that A is a subset of a set B, written A ⊆ B, whenever each
element of A is in B. Similarly, we say that A is a proper subset of B, written
A ⊊ B, whenever A ⊆ B but A ≠ B. Thus, A = B iff A ⊆ B and B ⊆ A. We
abbreviate the phrase “if and only if ” to “iff.”
A mathematical discourse fixes a big set, called the universal set often denoted
by U. All other sets considered are subsets of this big set in that particular context.
As a convention, this big set is never mentioned; if strict formal justification is
required, then it is brought into the picture.
Let A, B be sets and let U be the universal set (in this context, of course). The
union of A, B is written as A ∪ B = {x : x ∈ A or x ∈ B}. The intersection of A, B
is A ∩ B = {x : x ∈ A and x ∈ B}. The difference of A, B is A − B = {x : x ∈ A
but x ∉ B}. The complement of A is Aᶜ = U − A = {x : x ∉ A}.
We define the empty set ∅ as a set having no elements; ∅ = {x : x ≠ x} = { }.
For any set A, ∅ = A − A. We find that A ∪ ∅ = A and A ∩ U = A. Moreover,
∅ is unique, whatever be the universal set. When two sets A, B have no common
elements, we say that they are disjoint and write it as A ∩ B = ∅. For example, with
the universal set as the set of all natural numbers N = {0, 1, 2, . . .}, A as the set of
all prime numbers, and B as the set of all composite numbers, we see that A ∩ B = ∅.
An ordered pair (a, b) satisfies: (a, b) = (c, d) iff a = c and b = d. In fact, it is
enough for us to remember this property of the ordered pairs. The Cartesian
product of the sets A and B is A × B = {(x, y) : x ∈ A and y ∈ B}.
The operations of union, intersection, and the (Cartesian) product can be extended
further. Suppose A = {Ai : i ∈ I } is a collection of sets Ai , where I is some set,
called an index set here. Then, we define
1.3 Relations and Graphs 3
∪A = ∪i∈I Ai = {x : x is in some Ai }.
∩A = ∩i∈I Ai = {x : x is in each Ai }.
For the product, we first define an n-tuple of objects inductively by
(x₁, . . . , xₙ₊₁) = ((x₁, . . . , xₙ), xₙ₊₁). Finally, we write the product as
A₁ × · · · × Aₙ = {(x₁, . . . , xₙ) : xᵢ ∈ Aᵢ for each i}. Some standard laws of
these operations follow:
Double Complement : (Aᶜ)ᶜ = A.
De Morgan : (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ, (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ.
Commutativity : A ∪ B = B ∪ A, A ∩ B = B ∩ A.
Associativity : A ∪ (B ∪ C) = ( A ∪ B) ∪ C, A ∩ (B ∩ C) = ( A ∩ B) ∩ C.
Distributivity : A ∪ (B ∩ C) = ( A ∪ B) ∩ ( A ∪ C), A ∩ (B ∪ C) = ( A ∩ B) ∪ ( A ∩ C),
A × (B ∪ C) = ( A × B) ∪ ( A × C), A × (B ∩ C) = ( A × B) ∩ ( A × C),
A × (B − C) = ( A × B) − ( A × C).
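These identities can be sanity-checked on small concrete sets, say in Python; the universal set U and the sample sets below are arbitrary choices of this sketch, and the complement is taken relative to U:

```python
# Sample sets; the complement is taken relative to an explicit universal
# set U, which must be fixed for the check to make sense.
U = set(range(10))
A, B, C = {1, 2, 3}, {2, 3, 4}, {3, 5, 7}

def comp(X): return U - X
def prod(X, Y): return {(x, y) for x in X for y in Y}   # Cartesian product

assert comp(comp(A)) == A                          # double complement
assert comp(A | B) == comp(A) & comp(B)            # De Morgan
assert comp(A & B) == comp(A) | comp(B)
assert A | (B & C) == (A | B) & (A | C)            # distributivity
assert A & (B | C) == (A & B) | (A & C)
assert prod(A, B | C) == prod(A, B) | prod(A, C)   # product distributivity
assert prod(A, B & C) == prod(A, B) & prod(A, C)
assert prod(A, B - C) == prod(A, B) - prod(A, C)
```

Such a check is of course no proof; it merely illustrates the identities on one instance.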
We use the relations in an extensional sense. The binary relation of “is a son of ”
between human beings is thus captured by the set of all ordered pairs of human
beings, where the first coordinate of each ordered pair is a son of the second coordi-
nate. A binary relation from a set A to a set B is a subset of A × B. If R is such a
relation, a typical element in R is an ordered pair (a, b), where a ∈ A and b ∈ B are
suitable elements. The fact that a and b are related by R is written as (a, b) ∈ R; we
also write it as R(a, b) or as a Rb.
Any relation R ⊆ A × A is called a binary relation on the set A. Similarly,
an n-ary relation on a set A is some subset of An . For example, take P as a line and
a, b, c as points on P. Write B(a, b, c) for “b is between a and c.” Then B is a ternary
relation, that is, B ⊆ P 3 , and B(a, b, c) means the same thing as (a, b, c) ∈ B.
Unary relations on a set A are simply the subsets of A.
Binary relations on finite sets can conveniently be represented as diagrams. In
such a diagram, the elements of the set A are represented as small circles (points,
nodes, or vertices) on the plane, and each ordered pair (a, b) ∈ R is depicted as an
arrow from the circle for a to the circle for b. Such a diagram is called a directed
graph, or digraph.
Example 1.1. The digraph for the relation R = {(a, a), (a, b), (a, d), (b, c), (b, d),
(c, d), (d, d)} on the set A = {a, b, c, d} is given in Fig. 1.1.
Example 1.2. Figure 1.2 depicts the labeled digraph (V, E, I ), where the vertex set
V = {a, b, c, d}, the edge set E = {e1 , e2 , e3 , e4 , e5 , e6 , e7 }, and the incidence rela-
tion I = {(e1 , a, a), (e2 , a, b), (e3 , a, d), (e4 , b, c), (e5 , b, d), (e6 , c, d), (e7 , d, d)}.
Sometimes we do not need to have the arrows in a digraph; just the undirected
edges suffice. In that case, the digraph is called an undirected graph or just a graph.
Usually, we redefine this new concept. We say that a graph is an object having a set
of vertices and a set of edges as components, where each edge is a two-element set
(instead of an ordered pair) of vertices.
Example 1.3. The graph in Fig. 1.3 represents the graph G = (V, E), where V =
{a, b, c, d} and E = {{a, a}, {a, b}, {a, d}, {b, c}, {b, d}, {c, d}, {d, d}}.
Two vertices in a graph are called adjacent if there is an edge between them.
A path in a graph is a sequence of vertices v1 , v2 , . . . , vn , such that each vi is ad-
jacent to vi+1 . For example, in Fig. 1.3, the sequence a, a, b, c, d is a path, and so
are a, b, d and a, d, c. We say that the path v1 , v2 , . . . , vn connects the vertices v1
and vn . Moreover, the starting point of the path v1 , v2 , . . . , vn is v1 and its end point
is vn . Similarly, directed paths are defined in digraphs.
A graph is called connected if each vertex is connected to each other by some path
(not necessarily the same path). For example, the graph of Fig. 1.3 is a connected
graph. If you delete the edges {a, d}, {b, c}, {b, d}, then the resulting graph is not
connected. Similarly, if you remove the edges {a, b}, {a, d}, then the resulting graph
is not connected either.
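Connectivity is easy to test mechanically. The following Python sketch models the graph G of Example 1.3 and checks, by breadth-first search, the claims made above; edges are stored as frozensets, so a loop such as {a, a} becomes a singleton:

```python
from collections import deque

def connected(V, E):
    """Return True iff the undirected graph (V, E) is connected.
    Edges are frozensets; a loop such as {a, a} is the singleton {a}."""
    adj = {v: set() for v in V}
    for e in E:
        x, *rest = e
        y = rest[0] if rest else x          # singleton frozenset = loop
        adj[x].add(y)
        adj[y].add(x)
    start = next(iter(V))                   # breadth-first search
    seen, queue = {start}, deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v] - seen:
            seen.add(w)
            queue.append(w)
    return seen == V

V = {"a", "b", "c", "d"}
E = {frozenset(e) for e in [("a", "a"), ("a", "b"), ("a", "d"),
                            ("b", "c"), ("b", "d"), ("c", "d"), ("d", "d")]}
assert connected(V, E)                      # the graph of Fig. 1.3
cut = {frozenset(e) for e in [("a", "d"), ("b", "c"), ("b", "d")]}
assert not connected(V, E - cut)            # first deletion in the text
cut2 = {frozenset(e) for e in [("a", "b"), ("a", "d")]}
assert not connected(V, E - cut2)           # second deletion in the text
```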
A path v1 , v2 , . . . , vn , v1 is called a cycle when n > 2 and no vertex, other than the
starting point and the end point, is repeated. A connected cycleless graph is called a
tree. By giving directions to the edges in a tree, we obtain a directed tree.
In a directed tree, if there is exactly one vertex towards which no arrow comes
but all edges incident with it are directed outward, the vertex is called the root of the
tree. Similarly, any vertex in a directed tree from which no edge is directed outward
is called a leaf. It is easy to see that there can be only one edge incident with a leaf,
and that edge is directed toward the leaf; otherwise, there would be a cycle
containing the leaf. A tree having a root is called a rooted tree.
Example 1.4. The graph on left side in Fig. 1.4 is a rooted tree with c as its root. It
is redrawn on the right in a different way.
The trees in Fig. 1.4 are redrawn in Fig. 1.5, omitting the directions. Since the
directions of edges are always from the root toward any vertex, we simply omit the
directions. We will use the word “tree” for rooted trees and draw them without di-
rections. Sometimes, we do not put the small circles around the vertices. The tree on
left side in Fig. 1.5 uses this convention of drawing the trees in Fig. 1.4. It is further
abbreviated in the right side tree of the same figure.
In Fig. 1.5, all children of a vertex are placed below it and are joined to the parent
vertex by an undirected edge. We say that the root c has depth 0; the children of the
root are the vertices of depth 1 in the tree; the depth 2 vertices are the vertices that
have an edge from the vertices of depth 1; these are children of the children of the
root, and so on. In a tree, the depth is well defined; it shows the distance of a vertex
from the root. The depth of a tree is also called its height, and trees in computer
science grow downward!
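As a sketch, the depths just described can be computed level by level; the parent–child structure below (c with children a, b, e, and e with children d, f) is read off from Fig. 1.5 and is an assumption of this example:

```python
# Parent-child lists read off from Fig. 1.5 (an assumption of this sketch):
# the root c has children a, b, e; the vertex e has children d, f.
children = {"c": ["a", "b", "e"], "a": [], "b": [],
            "e": ["d", "f"], "d": [], "f": []}

def depths(root):
    """Walk down from the root level by level; depth = distance from root."""
    out, frontier, d = {}, [root], 0
    while frontier:
        for v in frontier:
            out[v] = d
        frontier = [w for v in frontier for w in children[v]]
        d += 1
    return out

assert depths("c") == {"c": 0, "a": 1, "b": 1, "e": 1, "d": 2, "f": 2}
leaves = {v for v in children if not children[v]}
assert leaves == {"a", "b", "d", "f"}    # matches the leaves named in the text
```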
Leaves of the tree in Fig. 1.5 are the vertices a, b, d, and f. The nonleaf vertices
are called the branch nodes (also, branch points or branch vertices).
With this short diversion on representing relations as graphs, we turn towards
various kinds of properties that a relation might satisfy. Suppose R is a binary relation
on a set A. The most common properties associated with R are
Reflexivity : for each x ∈ A, xRx.
Symmetry : for each pair of elements x, y ∈ A, if xRy, then yRx.
Antisymmetry : for each pair of elements x, y ∈ A, if xRy and yRx, then x = y.
Transitivity : for each triple of elements x, y, z ∈ A, if xRy and yRz, then xRz.
A binary relation can be both symmetric and antisymmetric, or even neither.
For example, on the set A = {a, b}, the relation R = {(a, a), (b, b)} is reflex-
ive, symmetric, transitive, and antisymmetric. On the other hand, the relation S =
{(a, b), (a, c), (c, a)} on the set {a, b, c} is neither reflexive, nor symmetric, nor antisymmetric,
nor transitive.
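The four properties are directly mechanizable. In the Python sketch below, a relation is stored as a set of ordered pairs; for S we take the underlying set to be {a, b, c}, since S mentions c:

```python
# The four properties of a binary relation R on a set A,
# with R stored as a set of ordered pairs.
def reflexive(R, A):  return all((x, x) in R for x in A)
def symmetric(R):     return all((y, x) in R for (x, y) in R)
def antisymmetric(R): return all(x == y for (x, y) in R if (y, x) in R)
def transitive(R):    return all((x, z) in R
                                 for (x, y) in R for (w, z) in R if y == w)

A = {"a", "b"}
R = {("a", "a"), ("b", "b")}
assert reflexive(R, A) and symmetric(R)
assert antisymmetric(R) and transitive(R)

S = {("a", "b"), ("a", "c"), ("c", "a")}    # taken on the set {a, b, c}
assert not reflexive(S, {"a", "b", "c"})
assert not symmetric(S) and not antisymmetric(S) and not transitive(S)
```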
Given a binary relation R on a set A, we can extend it by including some more
ordered pairs of elements of A so that the resulting relation is both reflexive and
transitive. Such a minimal extension is called the reflexive and transitive closure
of R.
Example 1.5. What is the reflexive and transitive closure of R = {(a, b), (b, c),
(c, b), (c, d)} on the set A = {a, b, c, d}?
Solution. Include the pairs (a, a), (b, b), (c, c), (d, d) to make it reflexive. You have
R1 = {(a, a), (a, b), (b, b), (b, c), (c, b), (c, c), (c, d), (d, d)}.
Since (a, b), (b, c) ∈ R1 , include (a, c). Proceeding similarly, you arrive at
R2 = {(a, a), (a, b), (a, c), (b, b), (b, c), (b, d), (c, b), (c, c), (c, d), (d, d)}.
R3 = {(a, a), (a, b), (a, c), (a, d), (b, b), (b, c), (b, d), (c, b), (c, c), (c, d), (d, d)}.
You see that R3 is already reflexive and transitive; there is nothing more required to
be included. That is, the reflexive and transitive closure of R is R3 .
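The computation in Example 1.5 is a fixed-point iteration, which can be sketched in Python: add the diagonal first, then keep composing pairs until nothing new appears.

```python
def reflexive_transitive_closure(R, A):
    """Smallest superset of R (on A) that is reflexive and transitive:
    add the diagonal, then compose pairs until a fixed point is reached."""
    closure = set(R) | {(x, x) for x in A}
    while True:
        new = {(x, z) for (x, y) in closure
                      for (w, z) in closure if y == w}
        if new <= closure:
            return closure
        closure |= new

A = {"a", "b", "c", "d"}
R = {("a", "b"), ("b", "c"), ("c", "b"), ("c", "d")}
R3 = {("a", "a"), ("a", "b"), ("a", "c"), ("a", "d"),
      ("b", "b"), ("b", "c"), ("b", "d"),
      ("c", "b"), ("c", "c"), ("c", "d"), ("d", "d")}
assert reflexive_transitive_closure(R, A) == R3   # agrees with Example 1.5
```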
[x] = {y ∈ A : x Ry}.
For each pair of elements x, y ∈ A, x Ry iff both x and y are elements of the
same subset B in A.
For (i), let [x] be an equivalence class of R for some x ∈ A. This x is in some
B ∈ A. Now y ∈ [x] iff x Ry iff y ∈ B, by the very definition of R. That is, [x] = B.
Similarly for (ii), let B ∈ A. Take x ∈ B. Now y ∈ B iff y Rx iff y ∈ [x]. This
shows that B = [x].
Let A, B be two sets. A partial function f from the set A to the set B is a relation
from A to B satisfying
Equivalently, for each x, y ∈ A, if f (x) = f (y), then x = y. It is easy to see that the
inverse of a one–one partial function is again a one–one partial function. But the
inverse of a total function (even if one–one) need not be a total function, because
there might be elements in its co-domain that are not attained by the map. We call a partial
function f : A → B an onto partial function (or say that f is a partial function from
A onto B) if
Equivalently, for an onto function, the range of f coincides with the co-domain of f.
It is easy to see that a one–one total function from A onto B has an inverse, which
is also a one–one total function from B onto A. A one–one total function is also
Recall that the Hottentot tribesman knew the meaning of one-to-one correspon-
dence; he could say that he had as many children as the fingers on the right hand of
the African explorer. The tribesman did not know the name of any number beyond
two. He could count but could not name the number he had counted. Probably, he
had a bag full of pebbles, as many as the sheep he owned. This is how he used to
keep track of the sheep in his possession. The idea behind counting the elements of
a set is the same as that of the tribesman.
We say that two sets A and B have the same cardinality when there is a bijection
between them. We write cardinality of a set A as |A|. Cardinality of a set intuitively
captures the idea of the number of elements in a set. Notice that we have not defined
what |A| is; we have only defined |A| = |B|.
To make the comparison of cardinalities easier, we say |A| ≤ |B| if there is an
injection from A to B (a one–one total function from A to B). We say that |A| ≥
|B| if |B| ≤ |A|. Cantor–Schröder–Bernstein Theorem says that if |A| ≤ |B| and
|B| ≤ |A|, then |A| = |B|. (A proof is outlined in Problem 1.29.) Further, we write
|A| < |B| when |A| ≤ |B| but |A| ≠ |B|; similarly, |A| > |B| when |A| ≥ |B| but
|A| ≠ |B|.
Since the empty set ∅ has no elements, we define 0 = |∅|. We go a bit further and
define 1 = |{0}|. And then define inductively n + 1 = |{0, 1, . . . , n}|. These are our
natural numbers, elements of the set N = {0, 1, 2, . . .}. Then the operations of +
and × are defined on N in the usual way. Notice that + is a function that maps a pair
of natural numbers to a natural number, and so is ×. Of course, we simplify notation
by writing mn for m × n.
Once the natural numbers are defined, we define the integers by extending N to
Z = N ∪ {−1, −2, . . .} = N ∪ {−n : n ∈ N}, with the convention that −0 = 0. Notice
that −n is just a symbol, where we put a minus sign preceding a natural number.
The set of positive integers is defined as Z+ = N − {0} = Z − {−n : n ∈ N}. Then
the operations + and × are extended to Z in the usual way, n + (−n) = 0, etc. This
extension allows us to solve simple equations such as x + 5 = 2 in Z.
Beyond some finite number of digits after the decimal point, a finite sequence of
digits keeps recurring.
Further, we see that these recurring decimals uniquely represent rational numbers
with one exception: a decimal number with recurring 0’s can also be written as
another decimal with recurring 9’s. We agree to use the latter and discard the former
when uniqueness is at stake. For example, the decimal 0.5 is written as 0.499999 · · · . This
guarantees a unique decimal representation of each number in Q. Also, this conven-
tion allows us to consider only the recurring infinite decimals instead of bothering
about terminating decimals.
We then extend our numbers to the real numbers. The set of real numbers, R, is
the set of all (infinite) decimals. The nonrecurring decimals are called the irrational
numbers; they form the set R − Q. This extension now allows us to talk about square
roots of numbers. For example, √2 ∈ R, but √2 ∉ Q. However, we find that it is
not enough for solving polynomial equations, for example, there is no real number x
satisfying the equation x 2 + 1 = 0.
For solving polynomial equations, we would need the complex numbers. We define
the set of complex numbers as C = {x + ı y : x, y ∈ R}, where ı is taken as
√−1. Notice that ı is again a symbol which is used as √−1. The operations of +, ×,
taking roots, etc. are extended in the usual way to C. It can be shown that our quest
for solving polynomial equations stops with C. The Fundamental Theorem of Alge-
bra states that each polynomial of degree n with complex coefficients has exactly n
complex roots, counted with multiplicity.
Besides, there are complex numbers that are not roots of any polynomial equation.
Moreover, we need to distinguish between surds like √2 and numbers like π. For
this purpose, we restrict our polynomials to have rational coefficients. We define
an algebraic number as a complex number that is a solution of a polynomial
equation with rational coefficients. Other complex numbers
are called transcendental numbers. In fact, there are more transcendental numbers
than the algebraic numbers (Problem 1.28), though we know very few of them.
Further, there is a natural partial order on N, the ≤ relation. It so happens that
this relation can be extended to Z, Q, and R. However, it stops there; it cannot be
extended to C. This does not mean that there cannot be any partial order on C. For
example, define ≤ on C by a + ı b ≤ c + ı d iff a < c, or (a = c and b ≤ d), taking
the ≤ on R as the basis. You can verify that this defines a partial order on C. But this
order does not extend the ≤ on R in a way compatible with arithmetic: since the
definition gives 0 < ı , we should have 0 × 0 < ı × ı = −1, which is not true.
Observe that by this process of extension, we have constructed some infinite sets,
the sets whose cardinalities cannot be written as natural numbers. Infinite sets can be
defined without using numbers. We say that a set is infinite or has infinite cardinality
if it has the same cardinality as one of its proper subsets. And a finite set is a set
which is not infinite. Naturally, a finite set has greater cardinality than any of its
proper subsets. For example, N is an infinite set as the function f : N → 2N defined
by f (n) = 2n is a bijection, where 2N denotes the set of all even natural numbers.
It can further be shown that a set is finite iff either it is ∅ or it is in one-to-one
correspondence with a set of the form {0, 1, . . . , n} for some natural number n. We
take it as our definition of a finite set. Using this, we would define an infinite set as
one which is not finite. Cardinalities of finite sets are now well defined: |∅| = 0, and
a set that is in one-to-one correspondence with {0, 1, . . . , n} has cardinality n + 1.
Can we similarly define the cardinalities of infinite sets?
Well, let us denote the cardinality of N as ℵ0 ; read it as aleph-null. Can we say that
all infinite sets have cardinality ℵ0 ? Again, let us increase our vocabulary. We call a
set denumerable (also called enumerable) if it is in one-to-one correspondence with
N, that is, having cardinality as ℵ0 . The one-to-one correspondence with N gives
an enumeration of the set: if f : N → A is the bijection, then the elements of the
denumerable set A can be written as f (0), f (1), f (2), . . .. The following statement
should then be obvious.
Theorem 1.2. An infinite subset of a denumerable set is denumerable.
Proof. Let A be an infinite subset of a denumerable set B. You then have a bijection
f : N → B. That is, elements of B are in the list: f (0), f (1), f (2), . . . . All elements
of A appear in this list exactly once. Define a function g : N → A by induction:
Theorem 1.3. The sets Z and Q are denumerable.
Proof. For the denumerability of Z, we put all even numbers in one-to-one corre-
spondence with all natural numbers, and then put all odd natural numbers in one-to-
one correspondence with the negative integers.
To put it formally, observe that each natural number is in one of the forms 2n or
2n + 1. Define a function f : N → Z by f (2n) = n and f (2n + 1) = −(n + 1).
To visualize, f (0) = 0, f (1) = −1, f (2) = 1, f (3) = −2, . . . . It is easy to see that f
is a bijection. Therefore, Z is denumerable.
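The bijection f can be sketched in a few lines of Python (an illustrative aside, not part of the text); the function below implements f(2n) = n and f(2n + 1) = −(n + 1).

```python
def f(k):
    # f(2n) = n for even arguments, f(2n + 1) = -(n + 1) for odd ones
    return k // 2 if k % 2 == 0 else -(k + 1) // 2

values = [f(k) for k in range(6)]   # 0, -1, 1, -2, 2, -3
```

On the inputs 0, . . . , 1000, the function hits every integer from −500 to 500 exactly once, which is what bijectivity amounts to on this finite window.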
For the denumerability of Q, let Q P denote the set of all symbols of the form p/q,
where p, q ∈ Z+ . Also, denote the set of positive rational numbers by Q+ . When
we look at these symbols as rational numbers, we find many repetitions. For exam-
ple, corresponding to the single element 1 in Q+ , there are infinitely many elements
1/1, 2/2, 3/3, . . . in Q P . We construct a one–one function from the set Q P to Z+ .
The elements of Q P can be written in a two-dimensional array as shown below.
1/1 → 1/2   1/3 → 1/4   1/5 → 1/6   · · ·
2/1   2/2   2/3   2/4   2/5   2/6   · · ·
3/1   3/2   3/3   3/4   3/5   3/6   · · ·
4/1   4/2   4/3   4/4   4/5   4/6   · · ·
5/1   5/2   5/3   5/4   5/5   5/6   · · ·
 ·     ·     ·     ·     ·     ·
(The remaining arrows run diagonally: from 1/2 down to 2/1, then ↓ to 3/1, up
through 2/2 to 1/3; from 1/4 down through 2/3 and 3/2 to 4/1, then ↓ to 5/1, up
through 4/2, 3/3, 2/4 to 1/5; and so on.)
In the first row are written all the numbers of the form 1/m, varying m over Z+ ;
in the second row, all the numbers of the form 2/m; etc. Any number p/q ∈ Q P is
the qth element in the pth row. Thus the array exhausts all numbers in Q P .
Now, start from 11 and follow the arrows to get an enumeration of numbers in the
array. This means that the (enumerating) function f : Q P → Z+ is defined by
f (1/1) = 1, f (1/2) = 2, f (2/1) = 3, f (3/1) = 4, f (2/2) = 5, · · · .
This f is one–one; hence Q P is countable. Identifying each positive rational number
with the symbol p/q in its lowest terms, Q+ is in one-to-one correspondence with an
infinite subset of Q P . Therefore, Q+ is denumerable, and so is Q: interleave the
negative rationals with the positive ones, as we did for Z.
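The same array can be traversed in code. The Python sketch below (an illustration, not from the text) sweeps the anti-diagonals p + q = s in one direction rather than zigzagging, and skips symbols that are not in lowest terms, so each positive rational appears exactly once.

```python
from math import gcd
from itertools import islice

def positive_rationals():
    # Sweep the anti-diagonals p + q = s of the array; yielding only the
    # symbols p/q in lowest terms enumerates Q+ without repetition.
    s = 2
    while True:
        for p in range(1, s):
            q = s - p
            if gcd(p, q) == 1:
                yield (p, q)
        s += 1

first_five = list(islice(positive_rationals(), 5))
# (1, 1), (1, 2), (2, 1), (1, 3), (3, 1)
```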
The method of proof in Theorem 1.3 proves that Z+ × Z+ is denumerable. All you
have to do is keep the ordered pair (m, n) in place of m/n in the array. For an
alternative proof of this fact, you can search for a one–one map from Z+ × Z+ to
Z+ . One such is defined by f (m, n) = 2^m 3^n . Just for curiosity, try to prove that the
function g : Z+ × Z+ → Z+ given by g(m, n) = 2^(m−1) (2n − 1) is a bijection.
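The claimed bijection can be checked mechanically on a finite grid. In the Python sketch below the exponent is taken as m − 1, so that the odd numbers are also covered; this reading of the formula is an assumption on my part.

```python
def g(m, n):
    # Every k in Z+ factors uniquely as 2^(m-1) * (2n - 1),
    # a power of 2 times an odd part; hence g is a bijection.
    return 2 ** (m - 1) * (2 * n - 1)

hits = {g(m, n) for m in range(1, 8) for n in range(1, 65)}
```

The 7 × 64 pairs produce 7 × 64 distinct values, among them every integer from 1 to 127.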
It is then clear that the (Cartesian) product of two countable sets is countable. You
can further extend this to any finite number of factors, as A × B × C is simply (A ×
B) × C, etc. The proof method also shows that a countable union of countable sets
is countable. Keep the first countable set on the first row, the second countable set
on the second row, and so on, and then proceed as in the above proof! Similarly,
a denumerable union of finite sets is denumerable. However, a denumerable product
of denumerable sets is not countable. Check whether you can prove it following the
proof of Theorem 1.4 below!
Sometimes a result can be quite counterintuitive; Theorem 1.3 is one such. Unlike
N, if you choose any two numbers from Q, you can always get another number (in
fact, infinitely many numbers) between them. But this does not qualify Q to have
more elements than N. What about R, the set of real numbers?
Theorem 1.4. The set R of real numbers is uncountable.
Proof. Let J = {x ∈ R : 0 < x < 1} be the open interval with 0 and 1 as its end
points; the end points are not in J. We first show that J is uncountable.
We use the famous diagonalization method of Georg Cantor. Suppose, on the
contrary, that J is countable. Then, we have a bijection g : N → J. The elements of J
can now be listed as g(0), g(1), g(2), . . . . But each number in J is a nonterminating
decimal number. Write all these numbers in J as in the following:
g(0) = 0.a11 a12 a13 a14 · · ·
g(1) = 0.a21 a22 a23 a24 · · ·
g(2) = 0.a31 a32 a33 a34 · · ·
· · ·
where each ai j is one of the digits 0, 1, . . . , 9. Using this array of decimals, construct
a real number d = 0.d1 d2 d3 d4 · · · , where for each i ∈ Z+ , di = 5 if aii ≠ 5, and
di = 6 if aii = 5.
This number d is called the diagonal number. It differs from each number in the
above list. For example, d ≠ g(0) as d1 ≠ a11 , and d ≠ g(n − 1) as dn ≠ ann . But
this d is in J, contradicting the fact that the list contains each and every number in J.
Therefore, J is uncountable.
Since J ⊆ R, uncountability of R is proved. Of course, a stronger fact holds:
|J | = |R|. To see this, define a function f : J → R by f (x) = (x − 1/2)/(x − x 2 ).
Verify that f is a bijection.
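The diagonal construction is easy to animate on finitely many rows of digits. The Python sketch below uses one standard choice of digits (5, switching to 6 on the diagonal); the choice is an assumption for illustration, not necessarily the text's.

```python
def diagonal(rows):
    # The i-th digit differs from rows[i][i]: use 5, or 6 when the
    # diagonal digit is itself 5.
    return [5 if row[i] != 5 else 6 for i, row in enumerate(rows)]

rows = [[1, 2, 3], [5, 5, 5], [9, 0, 5]]
d = diagonal(rows)   # differs from row i at position i
```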
Recall that we have agreed to write sets by specifying a defining property of the
form {x : P(x)}. The existence of an uncountable set such as R dispels the mistaken
belief that every set can be expressed by a property and that each property can give
rise to a set.
To see this, suppose you want to express properties in English. (In fact, any other
language will do.) Each such property is a finite sequence of symbols from the Ro-
man alphabet. For any fixed n, there are clearly only a finite number of properties
having n occurrences of symbols. Hence, there are only a countable number of properties.
But there is an uncountable number of sets, for example, sets of the type {r }, where r
is a real number. Hence, there are sets that do not correspond to any property. For the
converse, see the following example.
Example 1.6. Consider the property “A ∉ A.” This property is perhaps meaning-
ful when A is a set. Let S be the set of all sets A such that A ∉ A. Now, is S ∈ S or
S ∉ S?
The contradiction in Example 1.6 shows that there is no set corresponding to the
property that x ∉ x. See Russell’s paradox if you are intrigued.
This is the reason why axiomatic set theory restricts the definition of new sets as
subsets of old sets. In Example 1.6, if you take the big set as the set of all sets, then
S could be a subset of that. In fact, axiomatic set theory does a clever thing so that
existence of such a big set can never be justified. Moreover, it prevents constructing
a set that may also be a member of itself. For information on axiomatic set theory,
you may search for the set theories of Zermelo–Fraenkel, of Gödel–Bernays–von
Neumann, or of Scott–Potter.
In the proof of Theorem 1.4, we have constructed a set by changing the diagonal
elements of the array of numbers listed as g(0), g(1), . . . . Below we give a very
general result that cardinality of any set must be strictly less than the cardinality
of its power set, which was first proved by Georg Cantor using (and inventing) the
diagonalization method.
Theorem 1.5 (Cantor). No function from a set to its power set can be onto. There-
fore, |A| < |2^A |.
Proof. Let A be a set and let f : A → 2^A be any function. Consider the set
D = {x ∈ A : x ∉ f (x)}. Then D ∈ 2^A . Suppose f is onto. Then D = f (a) for some
a ∈ A. Now, if a ∈ D, then a ∉ f (a) = D; and if a ∉ D, then a ∈ f (a) = D. This
contradiction shows that f is not onto. In particular, there is no bijection between A
and 2^A , while x → {x} is a one–one function from A to 2^A . Therefore, |A| < |2^A |.
Now it is obvious that the power set of a denumerable set is uncountable. You can
derive the uncountability of the open interval J defined in the proof of Theorem 1.4
from Cantor’s theorem by using binary decimals instead of the usual decimals. This
representation will first prove the fact that |J | = |2^N |. In the last paragraph of the
proof of Theorem 1.4, we have shown that |J | = |R|. Hence, |R| = |2^N |. Cantor
conjectured that each infinite set in between N and R must be in one-to-one corre-
spondence with one of N or R; this is now known as the Continuum Hypothesis. For
this reason, we denote the cardinality of 2^N as ℵ1 . Then, the continuum hypothesis
asserts that any subset of R is either finite or has cardinality ℵ0 or ℵ1 . There are in-
teresting results about the continuum hypothesis, but you should be able to look for
them on your own.
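Cantor's construction can even be verified exhaustively on a small finite set. The Python sketch below (an illustration) checks all 8³ = 512 functions from a three-element set A to its power set: each one misses the set D = {x : x ∉ f(x)}.

```python
from itertools import product, combinations

A = (0, 1, 2)
subsets = [frozenset(c) for r in range(len(A) + 1)
           for c in combinations(A, r)]   # the 8 elements of the power set

missed = 0
for images in product(subsets, repeat=len(A)):
    f = dict(zip(A, images))
    D = frozenset(x for x in A if x not in f[x])
    if D not in images:          # D is never in the range of f
        missed += 1
```

All 512 functions fail to be onto, exactly as Theorem 1.5 predicts.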
Notice that we have only defined the cardinalities of finite sets. For infinite sets,
we know how to compare the cardinalities. Moreover, for notational convenience, we
write the cardinality of a denumerable set as ℵ0 . The cardinality of the power set of a
denumerable set is written as ℵ1 . We may thus extend this notation further by taking the
cardinality of the power set of the power set of a denumerable set as ℵ2 , etc., but we
do not have the need for it right now. The countability results discussed so far can be
summarized as:
Every infinite subset of a denumerable set is denumerable; |N| = |Z| = |Q| =
|Z+ × Z+ | = ℵ0 ; and ℵ0 < |J | = |R| = |2^N | = ℵ1 .
In all the theorems except Theorem 1.1, we have used the technique of proof by
contradiction. It says that a statement S is considered proved when a contradiction
follows from the assumption that S is not true. To spell it out explicitly, suppose we
have a set of premises Ω. We want to prove that if all the statements in Ω are true,
then the statement S must be true. The method of proof by contradiction starts by
assuming the falsity of S along with the truth of every statement in Ω. It then de-
rives a contradiction. If the premises in Ω are S1 , S2 , . . . , Sn , then the method can
be schematically written as
Required: S1 , S2 , . . . , Sn . Therefore, S.
We prove: S1 , S2 , . . . , Sn and not S. Therefore, a contradiction.
It works because, when not S implies a contradiction, not S must be false. Therefore,
S must be true. Conversely, when S is true, not S is false, and then it must imply a
contradiction.
The method of proof by contradiction appears in many disguises. Calling the
above as the first form, the second form of the method is
Required: If S1 , then S2 .
We prove: S1 and not S2 . Therefore, a contradiction.
This is justified due to the simple reason that by asserting the falsity of “if S1 , then
S2 ,” we assert the truth of S1 and the falsity of S2 .
The third form of proof by contradiction is proving the contraposition of a
statement. It says that for proving “if S1 , then S2 ,” it is sufficient to prove its con-
traposition, which is “if S2 is false, then S1 is false.” In fact, a statement and its
contraposition are logically equivalent. Why is it another form of the “proof by con-
tradiction?” Suppose you have already proved the contrapositive statement “if S2 is
false, then S1 is false”. Then, not S2 and S1 together give the contradiction that S1
is true as well as false. And then the second form above takes care. The principle of
proving the contraposition can be summarized as
Required: If S1 , then S2 .
We prove: If S2 is false, then S1 is false.
The contrapositive of a statement is not the same as its converse. The converse of
“if S1 then S2 ” is “if S2 then S1 ,” which is equivalent to “if not S1 then not S2 ”.
The fourth form does not bring a contradiction from the assumption that S is false.
Rather it derives the truth of S from the falsity of S. Then, it asserts that the proof of
S is complete. It may be summarized as
Required: S.
We prove: If not S, then S. Therefore, S.
Justification of this follows from the first form itself. Assume not S. Since you have
proved not S implies S, you also have S. That is, by assuming not S you have got
the contradiction: S and not S.
The fifth form is the so-called argument by cases. It says that to prove a statement
S, pick up any other statement P. Assume P to deduce S. Next, assume that P is
false and also deduce S. Then, you have proved S. It may be summarized as
Required: S.
We prove: If P, then S. If not P, then S. Therefore, S.
Why does it work? Since you have proved P implies S, its contraposition holds. That
is, you have not S implies not P. But you have already proved that not P implies S.
Thus, you have proved not S implies S. Now the fourth form takes care.
All the while you are using the law of double negation, that is, not not S is equiv-
alent to S. Along with it comes the law of excluded middle that one of S or not S
must be true. For example, in the argument by cases, you use the fact that one of P
or not P must hold. There have been many objections to the law of excluded middle.
One of the most fitting examples is by J. B. Bishop. It is as follows.
Example 1.7. Show that there are irrational numbers x, y such that x y is rational.
Solution. Take x = √2 and y = log2 9. Both are irrational: if log2 9 = p/q for
positive integers p, q, then 2^p = 9^q , which is impossible, the left side being even
and the right side odd. And x^y = 2^((1/2) log2 9) = 2^(log2 3) = 3, a rational number.
Here is an alternative proof. Consider the number √2^√2 . If it is rational, take
x = y = √2. Otherwise, √2^√2 is irrational; then take x = √2^√2 and y = √2.
Now, x^y = √2^(√2·√2) = √2^2 = 2, a rational number. In either case, there exist
irrational numbers x, y such that x^y is rational.
The alternative proof in the solution of Example 1.7 does not give us a pair x, y
of irrational numbers satisfying the requirements. However, it proves the existence
of such irrational numbers. In mainstream mathematics, this is a well-appreciated
proof. The question there is not only about accepting the law of excluded middle, but
also about appreciating the nature of existence in mathematics.
When we say that there exists an object with such and such property, what we
understand is: it is not the case that the property is false for every object in the
domain of discourse. It may or may not be always possible to construct that object
with exactitude. See the following example.
Example 1.8. Show that there exists a real number x satisfying x^(x^5) = 5.
Solution. Let f : R → R be given by f (x) = x^(x^5) . We see that f (1) = 1 and
f (2) = 2^32 . Also, f is a continuous function. Since f (1) < 5 < f (2), by the
intermediate value theorem, there exists an a with 1 < a < 2 such that f (a) = 5.
The a is not obtained exactly, but we know that there is at least one such point
between 1 and 2. Of course, there is a better way of getting such an a. For example,
a = 5^(1/5) does the job! However, even if we could not have got this simple a, the
solution in Example 1.8 is still valid.
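The intermediate value argument translates directly into bisection; the Python sketch below (an illustrative aside) narrows [1, 2] down to the promised point a.

```python
def f(x):
    return x ** (x ** 5)

# f is continuous and increasing on [1, 2], with f(1) = 1 < 5 < f(2) = 2**32,
# so halving the interval repeatedly closes in on the point a with f(a) = 5.
lo, hi = 1.0, 2.0
for _ in range(60):
    mid = (lo + hi) / 2
    if f(mid) < 5:
        lo = mid
    else:
        hi = mid
# lo is now a numerical approximation of a; here a = 5**(1/5) exactly
```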
Example 1.9. Show that there exists a program that reports correctly whether tomor-
row by this time, I will be alive or not.
Solution. Consider the following two programs. Program-1 : Print “I will be alive
tomorrow by this time.” Program-2 : Print “I will not be alive tomorrow by this time.”
Either I will be alive tomorrow by this time or not. That is, either Program-1 correctly
reports the fact or Program-2 correctly reports the fact. Hence, we have a program
that does the job, but we do not know which one.
What about Program-3 : Wait till tomorrow. See what happens; then report ac-
cordingly? This, of course, does the job correctly. But this is constructive, whereas
the existence in the solution of Example 1.9 is not. Looked in a different way, the
statement in Example 1.9 is ambiguous.
One meaning of it has been exploited in the example. The other meaning asks for
a program that “justifiably predicts” whether I will be alive till tomorrow or not. The
solution there does not answer this nor does Program-3. Also, none of Program-1 and
Program-2 work correctly in all cases. Program-1 can be wrong if I really die today,
and Program-2 is wrong when I do live beyond this time tomorrow.
Nonetheless, the method combined with diagonalization can be used for showing
nonexistence of programs. Choose any programming language in which you can write
a program that would compute functions with domain N and co-domain {0, 1}. Com-
puting a function here means that if f : N → {0, 1} is a given function, then you
can possibly have a program P f in your language that takes input as any n ∈ N and
outputs f (n). This program P f computes the function f.
Example 1.10. Prove that there exists a function g : N → {0, 1} that cannot be
computed by any program in whatever language you choose.
Solution. Choose your language, say, C. Since the C-programs can be enumerated in
alphabetical order, they form a countable set. The set of all C-programs that com-
pute functions is a subset of the set of all C-programs, and hence, is countable.
Enumerate the programs that compute functions in alphabetical order. Call them
C0 , C1 , C2 , . . . . Each C j takes a number n ∈ N as an input, and outputs either 0
or 1. Define a function f : N → {0, 1} by
f (n) = 0, if Cn outputs 1 on input n, and f (n) = 1 if Cn outputs 0 on input n.
Now, if there exists a C-program that computes f, then it must be one of
C0 , C1 , C2 , . . . . Suppose it is Cm . But on input m, Cm outputs a different value
than f. So, it does not compute f. Hence no C-program can compute this f.
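The diagonal argument can be mimicked on a toy scale. In the Python sketch below, a short list of total functions stands in for the enumeration C0 , C1 , C2 , . . . (the entries are made up for illustration); the diagonal function disagrees with the n-th entry on input n.

```python
programs = [
    lambda n: 0,                   # stands in for C0: constantly 0
    lambda n: 1,                   # C1: constantly 1
    lambda n: n % 2,               # C2: parity of the input
    lambda n: 1 if n > 1 else 0,   # C3: threshold at 2
]

def diag(n):
    # Flip the diagonal: diag differs from the n-th program on input n.
    return 1 - programs[n](n)

disagreements = [diag(n) != programs[n](n) for n in range(len(programs))]
```

Every entry of disagreements is True, so diag is computed by none of the listed functions; with a genuine enumeration of all programs, the same flip yields an uncomputable function.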
A nonconstructive version of the solution to Example 1.10 uses the fact that there
is an uncountable number of functions from N to {0, 1}, whereas there are only a count-
able number of C-programs. It is because the set of all such functions is in one-to-one
correspondence with the power set of N. See Problem 1.16.
In the above solution, we have used a form of proof by contradiction. All the forms
of proof by contradiction are propositional in nature; that is, they simply play with
propositional connectives like “and,” “or,” “not,” etc. One more proof method
that uses the propositional connectives is the so-called proof employing a conditional
hypothesis. It is summarized as
Required: If S1 , then S2 .
We prove: Assume S1 . Deduce S2 . Therefore, if S1 , then S2 .

Example 1.11. Let S denote the statement “If S is true, then there is no life on
earth.” Show that there is no life on earth.
Solution. We first prove S itself by using a conditional hypothesis. So, assume that
S is true. Then what S says holds; that is, if S is true, then there is no life on earth.
As S is true, we conclude that there is no life on earth. We have thus derived “there
is no life on earth” from the assumption “S is true.” By the technique of conditional
hypothesis, this proves: if S is true, then there is no life on earth. But this statement
is S itself. Therefore, S is true. Finally, since S is true, there is no life on earth.
Certainly, there is something wrong. You see that there is nothing wrong with
the proof using the technique of conditional hypothesis. It is wrong to denote the
statement
If S is true, then there is no life on earth.
by S. To further understand what is going on, see the following commercial of this
book using three seductive questions.
Example 1.12.
I : I’ll ask you three questions. Would you like to answer each with “Yes” or “No?”
You : Yes.
I : That’s my first question. Will you answer the same to the third as to the second?
You : No.
I : Will you promise me that you will read only this book on Theory of computation
and no other book on the topic throughout your life?
Now you are trapped. Since you have answered “No” to the second question, you
cannot answer “No” to the third question. Had you answered “Yes” to the second
question, you would have had to answer “Yes” to the third as well.
Of course, had you chosen to answer “No” to my first question, you would not
have been trapped. But as it is, why does it happen? The reason is the same as in
Example 1.11, a spurious self-reference. When you give notation to a statement, it
should have both the possibilities of being true or false. The notation itself cannot
impose a truth condition. In Example 1.11, the notation S violates this, as S cannot be
false there. If S is false, then the “if . . . then . . .” statement that it stands for becomes
true, which is untenable. The same way, you are trapped in Example 1.12.
Along with the propositional methods, we had also used the diagonalization tech-
nique of Cantor. There are two more general proof methods we will use in this book.
The first is the principle of mathematical induction. It is, in fact, a deductive proce-
dure. It has two versions: one is called the strong induction and the other is called
induction, without any adjective.
Writing P(n) for a property of natural numbers, the two forms of the principle
can be stated as
Strong Induction : P(0). If P(k) holds for each k < n, then P(n).
Therefore, for each n ∈ N, P(n).
Induction : P(0). If P(n) then P(n + 1).
Therefore, for each n ∈ N, P(n).
Verification of P(0) is called the basis step of induction, and the other “if . . .
then” statement is called the induction step. In case of strong induction, the fact
that all of P(0), P(1), . . . , P(n − 1) hold is called the induction hypothesis; and in
case of induction, the induction hypothesis is “P(n) holds.” Both the principles are
equivalent, and one is chosen over the other for convenience.
In the case of strong induction, the induction step involves assuming P(k) for each
k < n and then deriving P(n), while the induction step in the other case consists of
deriving P(n + 1) from the single assumption P(n). Thus it is safer to start with
strong induction when we do not know which one of them will really succeed.
The principle is also used to prove a property that might hold for all natural num-
bers greater than a fixed m. This is a generalization of the above. The formulation of
the principle now looks like:
Strong Induction : P(m). If P(k) holds for each k with m ≤ k < n, then P(n).
Therefore, for each natural number n ≥ m, P(n).
Induction : P(m). For each n ≥ m, if P(n) then P(n + 1).
Therefore, for each natural number n ≥ m, P(n).
As earlier, verification of P(m) is the basis step of induction, and the other “if . . .
then” statement is the induction step. In case of strong induction, the fact “P(k) holds
for each k with m ≤ k < n” is the induction hypothesis; and in case of induction, the
induction hypothesis is “P(n) holds.”
Not only on N, but wherever we see the structure of N, we can use this principle.
For example, it can be used on any set via the well ordering principle, which states
that every set can be well ordered; see Problem 1.10. However, we will not require
this general kind of induction.
Example 1.13 (Hilbert’s Hotel). Hilbert has a hotel with as many rooms as there
are numbers in Z+ . Show that he has rooms for any number of persons arriving in
groups, where a group might contain an infinite number of persons.
Solution. Naturally, we take the infinite involved in the story as ℵ0 . If only one such
group asks for rooms, Hilbert just assigns one to each. Suppose he has accommo-
dated n such groups, and the (n + 1)th group arrives. Then, he asks the in-
cumbents to move to other rooms by the formula: the person staying in room number
k moves to room number 2k.
Now, all odd numbered rooms are free. And the persons in the just arrived group get
accommodated there.
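A finite stand-in for the hotel shows the bookkeeping; the Python sketch below (the room counts are arbitrary) moves each incumbent from room k to room 2k and then houses a new group in the freed odd-numbered rooms.

```python
# Rooms 1..100 are occupied; each incumbent moves from room k to room 2k.
occupied = {k: f"guest-{k}" for k in range(1, 101)}
moved = {2 * k: guest for k, guest in occupied.items()}

# The odd-numbered rooms are now free for the newcomers.
for i in range(1, 51):
    moved[2 * i - 1] = f"new-{i}"
```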
Example 1.14 (König’s Lemma). Show that each finitely generated infinite tree has
an infinite branch.
Solution. Let T be an infinite tree in which each node has only a finite number of
children. We construct the branch by induction. In the basis step, take v0 as the root
of T ; the subtree rooted at v0 is T itself, which is infinite. In the induction step,
suppose that v0 , v1 , . . . , vn have been chosen so that vn is the root of an infinite
subtree of T. Since vn has only a finite number of children, and the infinite subtree
rooted at vn consists of vn together with the subtrees rooted at its children, at least
one of these subtrees is infinite.
Choose its root as vn+1 . Now, we get a sequence v0 , v1 , . . . , vn , vn+1 such that vn+1 is
the root of an infinite subtree of T. Here ends the induction step. By induction, we
thus obtain an infinite branch v0 , v1 , v2 , . . . in T.
However, there is a danger in misusing the principle of induction. See the follow-
ing example.
Example 1.15 (Fallacious Induction). In a certain tribe, each boy loves a girl. Show
that each boy loves the same girl.
Solution. In the basis step, consider any single boy. Clearly, the statement holds for
him. For the induction step, assume that if you take any group of n boys, you find that
they love the same girl. Now, take any group of n +1 boys. Call them b1 , b2 , . . . , bn+1 .
Form two groups of n boys each. The first group has the boys b1 , b2 , . . . , bn and
the second group has b2 , b3 , . . . , bn+1 . Now, by the induction hypothesis, all of
b1 , b2 , . . . , bn love the same girl and all of b2 , b3 , . . . , bn+1 love the same girl. As
b2 , b3 , . . . , bn are common to both the groups, we see that all of b1 , b2 , . . . , bn+1 love
the same girl.
For the argument of the induction step to hold, the set {b2 , b3 , . . . , bn } must be
nonempty. That means, the basis step is not n = 1 but n = 2. You will see plenty of
induction proofs later. Combining the principle of induction and proof by contradic-
tion, we get Fermat’s principle of finite descent. It is stated as
If P(n + 1), then P(n). But not P(0). Therefore, not P(n), for each n ∈ N.
Example 1.16 (Surprise Quiz). Your teacher (not of this course!) declares in the
class on a Friday that he will be conducting a surprise quiz some time on the next
week. When returning to the dormitory, your friend says − “so nice of him; he will
not be able to conduct the quiz this time.” He thus argues, “You see, he cannot afford
to keep the quiz on Friday, for, in that case, he does not conduct the quiz till Thursday.
Certainly then, we infer the quiz to be on Friday and it would not be a surprise quiz.
Now agreed that he has to conduct the quiz on or before Thursday, can he afford not
to conduct the quiz till Wednesday? No, for then, we infer that only on Thursday he
conducts the quiz. Continuing three more steps, you see that he cannot even conduct
the quiz on Monday.”
Another principle that we use frequently is the pigeon hole principle. It may be
stated as follows:
Let A, B be two finite sets. If |A| > |B| and f : A → B is a total function, then
f cannot be one–one.
Even finiteness of the sets can be dropped, but that would require transfinite induction
to justify the principle. Try proving this principle by using induction on |A|. We see
an application of the principle.
Example 1.17. Show that if seven points are chosen randomly from inside a circle
of radius 1, then there are at least two points whose distance is less than 1.
Solution. Take a circle of radius 1 and divide it into six equal parts by drawing six
radii. Each of the six sectors of the circle is bounded by two radii and an arc. If seven
points are chosen at random from inside the circle, then by the pigeon hole principle,
at least two of them are from the same sector; which sector, we do not know. Now, the
distance between those two points is less than 1.
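The sector argument is directly executable. The Python sketch below (illustrative) draws 7 random points from the open unit disk, buckets them by 60-degree sector, and measures the guaranteed close pair.

```python
import math
import random

random.seed(1)
points = []
while len(points) < 7:
    x, y = random.uniform(-1, 1), random.uniform(-1, 1)
    if 0 < x * x + y * y < 1:      # keep only points strictly inside the disk
        points.append((x, y))

# Bucket each point into one of the six 60-degree sectors.
sectors = {}
for p in points:
    idx = int((math.atan2(p[1], p[0]) % (2 * math.pi)) // (math.pi / 3))
    sectors.setdefault(idx, []).append(p)

# Seven points, six sectors: some sector holds two points, at distance < 1.
crowded = next(ps for ps in sectors.values() if len(ps) >= 2)
dist = math.dist(crowded[0], crowded[1])
```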
You have already seen how induction could be used for defining certain objects.
For example, we have defined the cardinalities of finite sets as |∅| = 0, |{0}| = 1,
|{0, 1, 2, . . . , n}| = n + 1. This definition uses induction. Another common example
of definition by induction, sometimes called a recursive definition, is of the factorial
function. It is defined by 0! = 1, (n + 1)! = n! (n + 1), for each n ≥ 0. The Fibonacci
sequence is defined recursively by f 0 = 1, f 1 = 1, f n+1 = f n + f n−1 , for each n ≥ 1.
The construction of a suitable branch in König’s lemma is by induction.
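Recursive definitions transcribe directly into recursive functions; the Python sketch below (an aside, not from the text) follows the two definitions above verbatim.

```python
def fact(n):
    # 0! = 1 and (n + 1)! = n! * (n + 1)
    return 1 if n == 0 else fact(n - 1) * n

def fib(n):
    # f0 = 1, f1 = 1, and f(n+1) = f(n) + f(n-1)
    return 1 if n < 2 else fib(n - 1) + fib(n - 2)

factorials = [fact(n) for n in range(6)]   # 1, 1, 2, 6, 24, 120
fibs = [fib(n) for n in range(6)]          # 1, 1, 2, 3, 5, 8
```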
Sometimes a definition by induction does not use any number. For example, in
defining an arithmetic expression involving the variables x, y, and the only operation
as +, you would declare that each of x, y is an arithmetic expression. This is the
basis step in the definition. Next, you will declare that if E 1 , E 2 are expressions, then
(E 1 + E 2 ) is an expression, and nothing else is an expression. In such a case, suppose
you want to prove that in every expression there is an equal number of left and right
parentheses. How do you proceed?
Obviously, the proof is by induction on the number of left parentheses, or on the
number of right parentheses, or on the number of + signs. But this has to be identified.
Suppose we pick the last. In the basis step, if there is no + sign in an expression,
then there are no parentheses. Hence, the numbers of left and right parentheses are
equal, both being 0.
Assume the induction hypothesis that if an expression has fewer than n occurrences
of + signs, then the numbers of left and right parentheses in it are equal. We plan to
use strong induction to be on the safe side. Suppose E is an expression having n
occurrences of + signs, n ≥ 1. Then, E = (E 1 + E 2 ) for some expressions E 1 and E 2 .
Now, both of E 1 , E 2 satisfy the induction hypothesis. Thus they have equal numbers
of left and right parentheses. Then so does (E 1 + E 2 ).
In the above proof, we can avoid the parameter n, which we had chosen as
the number of + signs. Here, we verify that the statement holds in the basis case of
the inductive definition of expressions. Next, in the inductive step, we see that if E 1 , E 2
are expressions in which the numbers of left and right parentheses are equal,
then the same holds for the new expression (E 1 + E 2 ). There ends the proof.
Such a use of induction without identifying a parameter is named the prin-
ciple of structural induction. To keep the matter straight, we will rather identify a
suggestive integer parameter than use this principle.
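The inductive definition of expressions mirrors a recursive generator, and the parenthesis-count property can then be tested on the structures it produces; the Python sketch below illustrates this correspondence (the generation scheme is made up).

```python
import random

def make_expr(depth):
    # Basis: each of x, y is an expression.
    # Induction: (E1 + E2) is an expression whenever E1 and E2 are.
    if depth == 0 or random.random() < 0.3:
        return random.choice(["x", "y"])
    return "(" + make_expr(depth - 1) + "+" + make_expr(depth - 1) + ")"

random.seed(0)
exprs = [make_expr(5) for _ in range(100)]
balanced = all(e.count("(") == e.count(")") for e in exprs)
```

Every generated expression has equal numbers of left and right parentheses, exactly as the structural induction predicts.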
1.6 Summary and Problems
As you have observed, the mathematical preliminaries are not at all tough. All that
we require is a working knowledge of set theory, the concept of cardinality, trees,
induction, and the pigeon hole principle.
A good reference on Set Theory including cardinality that covers all the topics
discussed here is [50]. For induction, see [102]. For a reference on formal derivations
and their applications to discrete mathematics, see [48]. These books also contain a
lot of exercises. For an interesting history of numbers and number systems, see [35].
The story of the Hottentot tribes is from this book; of course I have modified it a bit
to provide motivation for counting.
Unlike other chapters, I have neither included exercises nor problems in each
section; probably you do not require them. If you are really interested, here are some.
1.2. What is wrong in the following fallacious proof of the statement that each sym-
metric and transitive binary relation on a nonempty set must also be reflexive:
Suppose a Rb. By symmetry, b Ra. By transitivity, a Ra?
1.3. Label the edges of the graph in Fig. 1.3 as e1 , . . . , e7 and then write the corre-
sponding labeled graph as a triple, now with an incidence relation.
1.4. If |A| = n, then how many binary relations on A are there? Among them, how
many are reflexive? How many of them are symmetric? How many of them are both
reflexive and symmetric?
1.5. Show that the inverse of an equivalence relation on a set is also an equivalence
relation. What relation is there between the equivalence classes of the relation and
those of its inverse?
1.6. Let A be a finite set. Let the binary relation R on 2 A be defined by x Ry if there
is a bijection between x and y. Construct a function f : 2 A → N such that for any
x, y ∈ 2 A , f (x) = f (y) iff xRy.
1.7. Let R be an equivalence relation on any nonempty set A. Find a set B and a
function f : A → B such that for any x, y ∈ A, f (x) = f (y) iff xRy.
1.8. Let R, R′ be two equivalence relations on a set A. Let P and P′ be the partitions
of A consisting of equivalence classes of R and R′ , respectively. Show that R ⊆ R′
iff P is finer than P′ , that is, each x in P is a subset of some y in P′ .
1.9. Let A be a nonempty set. Suppose C denotes the empty collection of subsets of
A. What are the sets ∪C and ∩C?
1.14. Let A, B be any sets. Prove that there exists a one–one function from A to B
iff there exists a function from B onto A. [Hint: For the “if ” part, you may need the
axiom of choice.]
1.16. For sets A, B, define B A as the set of all functions from A to B. Prove that
for any set C, there is a one–one correspondence between {0, 1}C and the power set
2C . This is the reason we write the power set of C as 2C . [Hint: For D ⊆ C, define
its characteristic function, also called the indicator function, χ D : C → {0, 1} by “if
x ∈ D, then χ D (x) = 1; else, χ D (x) = 0.”]
1.18. Recall that a nonempty set A is finite iff there is a bijection between A and
{0, 1, . . . , n} for some n ∈ N. Here, we show how cardinality of a finite set is well
defined. Let A be a set; a ∈ A; B ⊊ A; and let n ∈ N. Prove the following without
using cardinality:
(a) There is a bijection between A and {0, 1, . . . , n+1} iff there is a bijection between
A − {a} and {0, 1, . . . , n}.
(b) Suppose there is a bijection between A and {0, 1, . . . , n + 1}. Then, there exists no
bijection between B and {0, 1, . . . , n + 1}. If B ≠ ∅, then there exists a bijection
between B and {0, 1, . . . , m}, for some m ≤ n.
1.20. Prove: If A is a nonempty set and n ∈ N, then the following are equivalent:
(a) There is a one–one function from A to {0, 1, . . . , n}.
(b) There is a function from {0, 1, . . . , n} onto A.
(c) A is finite and has at most n + 1 elements.
1.24. Using Cantor’s theorem, show that the collection of all sets is, in fact, not a set.
1.27. Show that for each n ∈ Z+ , |Qn | = |Q| and |Rn | = |R|.
1.28. Show that the set of algebraic numbers is denumerable. Then deduce that there
are more transcendental numbers than the algebraic numbers.
1.30. Let A, B be sets. Is it true that there is an injection from A to B iff there is a
surjection from B to A? [Hint: Does axiom of choice help? See Problem 1.10]
1.34. Let S denote the sentence: This sentence has no proof. Show that S is true.
Conclude that there is a true sentence having no proof. [Gödel’s proof of his incom-
pleteness theorem expresses S in the system of natural numbers.]
1.35. Let S be the sentence: This sentence has no short proof. Show that if S is true,
then there exists a sentence whose proof is not short, but the fact that it is provable
has a short proof. [A formal version of this S expressed in the system of natural
numbers is called Parikh’s sentence.]
1.36. Let P(m, n) denote a property involving two natural numbers. Suppose we
prove that P(0, 0) is true. We also prove that if P(i, j ) is true, then both P(i, j + 1)
and P(i + 1, j ) are true. Does it follow that P(m, n) is true for any m, n ∈ N?
1.37. Show that for each integer n > 1, 1/√1 + 1/√2 + 1/√3 + · · · + 1/√n > √n.
1.38. Deduce the pigeon hole principle from the principle of induction.
1.39. Show that among any n + 2 positive integers, either there are two whose sum is
divisible by 2n or there are two whose difference is divisible by 2n.
1.40. Use the pigeon hole principle to show that each rational number has a recurring
decimal representation.
1.41. Show that among any n + 1 numbers randomly chosen from {1, 2, . . . , 2n},
there are at least two such that one divides the other.
1.42. Let A = {m + n√2 : m, n ∈ Z}. Show that for each k ∈ Z+ , there is x k ∈ A
such that 0 < x k < 1/k.
2 Regular Languages
2.1 Introduction
It is said that human intelligence is mainly the capability to represent a problem, its
solution, or related facts in many seemingly different ways. You must have encoun-
tered it in several problem-solving situations. You first represent the problem in a
known language, where you might like to eliminate or omit the irrelevant aspects
and consider only the appropriate ones. This methodology is followed throughout
mathematics, starting from the first arithmetic problems such as “if you already
had five candies and your friend offers you one more, then how many candies do you
have now?”.
To give another example, the memory in a computer is only an assembly of
switches which, at any moment, may be off or on. If there are a trillion of them, then
the memory may be represented as a trillion-digit binary number, where, say, off
corresponds to 0 and on to 1. In some situations, we may not be interested in all
possible binary numbers, but only in those involving a few of the trillion digits, or
only in those having a particular pattern, such as “there is at least one 0 following ev-
ery occurrence of a 1.” There might arise a situation where we would like to have a
representational scheme having more than two symbols. We will, however, consider
only a finite number of symbols at a time, and in parallel with the existing natural
languages, we will develop formal languages out of these symbols.
In this book, we will introduce a hierarchy of formal languages. To represent
formal languages, we will study grammars. Each type in the hierarchy will have its
own type of mechanical device, which can recognize the languages of that type but
no others. In the sequel, we will have to introduce many technicalities. The technical
words or phrases, as is usual, will either be defined clearly or will be left completely
undefined. In the latter case, I will attempt a description of such undefined or
primitive notions so that you will be able to think about them in a certain
well-intended way.
We start with the primitive notion of a symbol. A symbol is any written sign. We
adhere to written scripts as the means of communication between you and me. Of
course, one could also consider "spoken signs," "body language," or pieces of any
other sign language. The implicit assumption here is that we will be able to represent
other types of symbols or signs in terms of the written ones, that is, in terms of our
symbols.
An alphabet is then a nonempty finite set of symbols, where no symbol is a part
of another. Notice that the phrase is a part of is again a primitive notion here. We
will not allow the blank symbol to be in our alphabets, for some technical reasons.
If the blank symbol has to be used in some situation, then we will rather have some
rewriting of it, say, b̸. For example, {0, 1}, {a, b, c, . . . , z}, {@, $, !, 1, a, z, 5, >} are
alphabets, but {0, 10}, {ab, c, a} are not alphabets, as 0 is a part of 10 and a is a part
of ab.
A word or a string over an alphabet is a finite sequence of symbols from the
alphabet. For example, each word in an English dictionary is a string over the Roman
alphabet. Each natural number is a string over the alphabet {0, 1, 2, . . . , 9}. Thus a
string is written as a sequence of symbols followed one after another without any
punctuation marks. This way of writing a string is referred to as the operation of
concatenation of symbols.
The operation can be defined for strings also. For example, concatenation of
strings alpha and bet is alphabet, and concatenation of bet and alpha is betalpha.
If s and t are strings (over an alphabet), then concatenation of s and t is the string
st (over the same alphabet). There is a special string, the string containing no occur-
rence of any symbol whatsoever, called the empty string. The empty string is indeed
unique, and it is a string over every alphabet. We will denote the empty string by the
symbol ε. It serves as an identity of concatenation as for any string u, uε = εu = u.
The number of occurrences of symbols in a string is called its length. The length
of alpha is 5 and the length of bet is 3. The empty string ε has length 0. Note that
the length of a string depends upon the underlying alphabet as the string itself needs
an alphabet, first of all. For example, the length of the string 1001 over the alphabet
{0, 1} is 4, while its length over the alphabet {10, 01} is 2.
The vagueness in the definition of a symbol and an alphabet is removed by follow-
ing a strict mathematical formalism. In this formalism, an alphabet is taken as any
finite set and a string over the alphabet is taken as a map from {1, 2, . . . , n} to the
alphabet, where n is some natural number. The natural number n is again the length
of the string. For example, the string 101 over the alphabet {0, 1} is simply the map
f : {1, 2, 3} → {0, 1}, where f (1) = 1, f (2) = 0, f (3) = 1. This string has length
3 as usual. Note that the map f here can be completely determined by its values at
the integers 1, 2, 3. That is, the map can be rewritten by noting down its values at
1, 2, 3 one after another, and, in that order. That is how the formal definition would
be connected to the informal.
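This formal view is easy to animate in a short program. The following Python sketch is purely our illustration (the names f and as_written are not the book's notation): a string is a map from {1, . . . , n} to the alphabet, and reading off its values in order recovers the string as written.

```python
# a sketch (not the book's notation) of the formal view:
# a string of length n over Σ is a map from {1, ..., n} to Σ
f = {1: "1", 2: "0", 3: "1"}   # the string 101 over {0, 1}

def as_written(f):
    # note down the values at 1, 2, ..., n, in that order
    return "".join(f[i] for i in range(1, len(f) + 1))

assert as_written(f) == "101"
assert as_written({}) == ""    # the empty map is the empty string ε
```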
When the natural number n is taken as 0, we have an empty domain for the map,
and then, by convention, we have the empty map, the empty string ε, having
length 0. Moreover, the length function ℓ can be defined inductively over an alphabet
Σ as in the following:

1. ℓ(ε) = 0.
2. If σ ∈ Σ and u is a string over Σ, then ℓ(uσ) = ℓ(u) + 1.
Similarly, the reversal of a string u over Σ, denoted uᴿ, can be defined inductively:

1. εᴿ = ε.
2. If σ ∈ Σ and u is a string over Σ, then (σu)ᴿ = uᴿσ.

See that the above definition of the reversal does really capture the notion of the
reversal; for example, (reverse)ᴿ = esrever. It follows that the length of the reversal
of a string is the same as the length of the string. It also follows that (uv)ᴿ = vᴿuᴿ for
strings u and v. Show these by induction (see Sect. 1.5) on the length of the string!
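The two inductive definitions translate directly into recursive programs. The following Python sketch is our own illustration, computing ℓ and the reversal exactly by the clauses above:

```python
def length(u):
    # ℓ(ε) = 0;  ℓ(uσ) = ℓ(u) + 1
    if u == "":
        return 0
    return length(u[:-1]) + 1

def reversal(u):
    # ε^R = ε;  (σu)^R = u^R σ
    if u == "":
        return ""
    sigma, rest = u[0], u[1:]
    return reversal(rest) + sigma

assert length("alpha") == 5 and length("") == 0
assert reversal("reverse") == "esrever"
# (uv)^R = v^R u^R
assert reversal("alpha" + "bet") == reversal("bet") + reversal("alpha")
```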
Exercise 2.1. Define the operation of concatenation inductively and then show that
this operation is associative but not commutative.
Exercise 2.2. Write (ab)0 = ε, (ab)1 = ab, (ab)2 = abab, . . . for the string ab.
Define (ab)n inductively for every n ∈ N. Show that
((ab)n ) = n
(ab).
A string u is a prefix of a string w iff there is a string v such that w = uv. Similarly,
if for some string v, we have w = vu, then u is called a suffix of w. In general, u is a
substring of w iff there are strings x, y such that w = xuy.
For example, pre is a prefix of prefix and fix is a suffix of suffix (and also of prefix).
The string ref is a substring of prefix. Vacuously, both pre and fix are substrings of
prefix. As the strings x, y in w = xuy can be taken as the empty string ε, every string
is a substring of itself. Also, every string is both a prefix and a suffix of itself.
Observe that all of ε, p, pr, pre, pref, prefi, prefix are prefixes of prefix. Out of
these, if you take any two, can you find any relation between them? Easy: one of them
has to be a prefix of the other! If both u and v are prefixes of the same string w, then u
matches a part of w from the left, and so does v. So, the one with smaller length
must be a prefix of the other. But this is not a proof!
Lemma 2.1 (Prefix Lemma). If u and v are prefixes of a string w over an alphabet
Σ, then u is a prefix of v or v is a prefix of u.
Proof. We prove it by induction on the length of w. If ℓ(w) = 0, then w = ε, and then
u = v = ε. This shows that u is a prefix of v. Assume the induction hypothesis that
for all strings w of length n, the statement holds. Let ℓ(w) = n + 1. Write w = zσ,
where σ ∈ Σ is the last symbol of w, so that ℓ(z) = n. Let u and v be prefixes of
w. If one of u, v equals w, then the other is a prefix of it. So, suppose that neither u
nor v is equal to w. Then both u and v are prefixes of z. (Why?) As ℓ(z) = n, by the
induction hypothesis, u is a prefix of v or v is a prefix of u.
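The lemma can also be checked exhaustively for a small alphabet and small lengths. The following Python sketch (an illustration, not part of the text) tests every pair of prefixes of every string over {a, b} up to length 5:

```python
from itertools import product

def is_prefix(u, w):
    # u is a prefix of w iff w = uv for some string v
    return w[: len(u)] == u

# exhaustive check of the Prefix Lemma over Σ = {a, b}, up to length 5
for n in range(6):
    for w in map("".join, product("ab", repeat=n)):
        prefixes = [w[:k] for k in range(len(w) + 1)]
        for u in prefixes:
            for v in prefixes:
                assert is_prefix(u, v) or is_prefix(v, u)
```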
Exercise 2.3. Formulate and prove Lemma 2.1 with suffixes instead of prefixes.
What about mixing prefixes and suffixes?
A language over an alphabet Σ is any set of strings over Σ. In particular, ∅, the
empty set (of strings) is a language over every alphabet. So are the sets {ε}, the set
Σ itself, and the set of all strings over Σ, which we denote by Σ ∗ .
Thus, any book written in English may be thought of as a language over the Roman
alphabet. The set of all binary numbers starting with 1 and of length 2, that is,
the set {10, 11}, is a language over {0, 1}. The binary palindromes (the strings that
are the same when read from right to left) form a language, as this can be written as
{w ∈ {0, 1}∗ : wᴿ = w}, a subset of {0, 1}∗.
Theorem 2.1. Let Σ be any alphabet. Then Σ∗ is denumerable. Therefore, there are
uncountably many languages over Σ.
Proof. Write Σ₀ = {ε} and Σₙ = the set of all strings of length n over Σ, for any n ∈
Z⁺. If |Σ| = m, then there are mⁿ strings in Σₙ. As Σ∗ = ∪n∈N Σₙ, a denumerable
union of finite sets, it is denumerable. Each language over Σ is a subset of Σ∗.
Thus, the number of such languages is the cardinality of the power set 2^Σ∗, which is
uncountable by Theorem 1.5.
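The enumeration used in the proof (group the strings by length and list each finite group) can be sketched as a program. This Python generator is our illustration, listing Σ∗ for Σ = {a, b} in exactly that order:

```python
from itertools import count, product

def strings(sigma):
    # list Σ0, Σ1, Σ2, ... one after another: each Σn is finite,
    # so every string of Σ* is reached after finitely many steps
    for n in count():
        yield from map("".join, product(sigma, repeat=n))

gen = strings("ab")
assert [next(gen) for _ in range(7)] == ["", "a", "b", "aa", "ab", "ba", "bb"]
```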
The question is: how to name all these languages? Obviously, whatever way we try
to name them, the names themselves will be strings over some alphabet, and there
can be only a countable number of them at the most, unless we choose to supply
names from another uncountable set such as R. So, before even attempting to name
all the languages, we see that any such attempt is bound to fail. But then, can we name
certain interesting languages? That, of course, depends upon what our interest is and
what our naming scheme is. Note that naming schemes are only finitary ways of
representing languages.
We start with some natural naming schemes. As languages are sets, we can use set
operations such as union, intersection, complementation (in Σ∗), etc. We also have
the operation of concatenation for strings, which can be adapted or extended to
languages. Let L, L₁, L₂ be languages over an alphabet Σ. Then
L₁L₂ = {uv : u ∈ L₁ and v ∈ L₂} is the concatenation of the languages L₁ and L₂.
Note the asymmetry in this and: u comes from L₁ and v from L₂, in that order.
L₁ ∪ L₂ = {w : w ∈ L₁ or w ∈ L₂} is the union of L₁ and L₂.
L₁ ∩ L₂ = {w : w ∈ L₁ and w ∈ L₂} is the intersection of L₁ and L₂.
L₁ − L₂ = {w : w ∈ L₁ but w ∉ L₂} is the difference of L₂ from L₁.
L̄ = {w ∈ Σ∗ : w ∉ L} is the complement of the language L.
The powers of L, denoted by Lᵐ, for m ∈ N, are defined inductively by L⁰ = {ε}
and Lⁿ⁺¹ = LLⁿ.
The Kleene star (or the closure or the asterate) of the language L is defined as
L∗ = ∪m∈N Lᵐ. Read it as L star. Notice that it goes along well with our earlier
notation Σ∗, the set of all strings over the alphabet Σ.
The Kleene plus of L is L⁺ = LL∗ = {u₁u₂ · · · uₖ : k > 0 and each uᵢ ∈ L}. Read it
as L plus. L⁺ is also referred to as the positive closure of L.
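For finite languages, these operations can be computed directly. The following Python sketch is our own illustration, with a bounded stand-in for the Kleene star (L∗ itself is infinite whenever L contains a nonempty string):

```python
def concat(L1, L2):
    # L1 L2 = {uv : u in L1 and v in L2}
    return {u + v for u in L1 for v in L2}

def power(L, m):
    # L^0 = {ε};  L^(n+1) = L L^n
    result = {""}
    for _ in range(m):
        result = concat(L, result)
    return result

def star_upto(L, m):
    # a finite stand-in for L*: the union of L^0, ..., L^m
    result = set()
    for k in range(m + 1):
        result |= power(L, k)
    return result

L = {"a", "bb"}
assert concat(L, L) == {"aa", "abb", "bba", "bbbb"}
assert power(L, 0) == {""}
assert "abba" in star_upto(L, 3)
```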
Similarly, other set operations will give rise to respective definitions of new
languages from old ones. Our aim is to use this symbolism for writing many
interesting languages in a compact way. See the following examples.
Example 2.1. Can we represent the language L = {w ∈ {a, b, c}∗ : w does not end
with c} using the symbolism we have developed so far?
Solution. Will it be easy if we first try representing all the strings over Σ = {a, b, c}
that do end with a c? Any such string is a string from Σ∗ followed by the symbol c;
the set of all such strings is {uc : u ∈ Σ∗} = Σ∗{c} = {a, b, c}∗{c}. Thus, L is the
complement: L = Σ∗ − {a, b, c}∗{c}.
For the time being, we only consider the operations of concatenation, union, and
the Kleene star starting from the symbols of an alphabet and the empty language
∅. We begin with such a definition of a class of expressions and the corresponding
languages they represent.
2.3 Regular Expressions
A regular expression over an alphabet Σ and the language it represents are defined
inductively by the following rules:

1. ∅ is a regular expression; it represents the empty language, L(∅) = ∅.
2. Each symbol σ ∈ Σ is a regular expression; it represents the language L(σ) = {σ}.
3. If α and β are regular expressions, then so is (αβ); it represents the concatenation
L((αβ)) = L(α)L(β).
4. If α and β are regular expressions, then so is (α ∪ β); it represents the union
L((α ∪ β)) = L(α) ∪ L(β).
5. If α is a regular expression, then so is α∗; it represents the Kleene star
L(α∗) = (L(α))∗.
The parentheses above are used to remove possible ambiguities such as those that
occur in the expression αβ ∪ γ, which may be read as α(β ∪ γ) or as (αβ) ∪ γ. However,
we will dispense with many parentheses by using the following precedence rules:
the Kleene star has the highest precedence, concatenation comes next, and union
has the lowest precedence.
We will also dispense with the outermost parentheses from a regular expression.
This means that the regular expression ((αβ) ∪ γ ∗ ) will be rewritten (or abbreviated)
as αβ ∪ γ ∗ . Instead of writing L((αβ)), L((α ∪ β)), (L(α))∗ , we will simply write
L(αβ), L(α ∪ β), L(α)∗ , respectively.
A language is called a regular language iff it can be represented by a regular
expression, that is, when L = L(α) for some regular expression α.
Example 2.2. Let Σ be any alphabet. Then ∅, {ε}, Σ, Σ∗, Σ⁺, and Σⁿ for any n ∈ N are
regular languages. Note that ∅⁰ = {ε} and ∅ⁿ = ∅ for n > 0. Thus, ∅∗ = {ε}, Σ⁺ =
ΣΣ∗, and Σⁿ = Σ · · · Σ, concatenated n times. Similarly, L = {aᵐbⁿ : m, n ∈ N} is
regular, as L = L(a∗b∗).
Example 2.3. Let L be the set of all strings over {a, b} having exactly two occurrences
of b, which are consecutive. Is L regular?
Solution. A typical string of L looks like a · · · abba · · · a: some a's (possibly none),
then bb, then some more a's (possibly none). Thus L = L(a∗bba∗), a regular language.
Example 2.4. Let L be the language over {a, b} having exactly two nonconsecutive
b’s. Is L a regular language?
Solution. You can write L as L(a∗ba∗ba∗) − L(a∗bba∗). But this does not help, as
difference or complementation is not allowed in a regular expression. Any typical
string in L has one b, followed by some a's, and then another b. It
may look something like aa · · · aba · · · aba · · · a. Note that before the first b (first
from the left) there may not be an a, and similarly there may not be an a after the
second b. But there must be at least one a in between the two b's. Thus, L = L(a∗ba⁺ba∗) =
L(a∗baa∗ba∗), a regular language.
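This solution can be cross-checked mechanically. Python's re module happens to use the same ∗ and ⁺ operators; the sketch below (ours, for illustration only) compares the expression against the direct description on all short strings:

```python
import re
from itertools import product

def in_L(w):
    # direct description: exactly two b's, and they are not consecutive
    return w.count("b") == 2 and "bb" not in w

pattern = re.compile(r"a*ba+ba*")   # Python syntax for a*ba+ba*

for n in range(8):
    for w in map("".join, product("ab", repeat=n)):
        assert (pattern.fullmatch(w) is not None) == in_L(w)
```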
Exercise 2.4. See that L(a∗ba∗baa∗ba∗(ba∗ ∪ ∅∗)) = {w ∈ {a, b}∗ : w has
three or four occurrences of b, in which the second and third occurrences are not
consecutive}.
Example 2.5. Is the complement (in {a, b}∗) of the language of Example 2.3 regular?
Solution. With Σ = {a, b}, this language L contains all the strings of the language in
Example 2.4 and the strings that do not contain exactly two b's. Thus L =
L(a∗baa∗ba∗) ∪ L(a∗) ∪ L(a∗ba∗) ∪ Σ∗{b}Σ∗{b}Σ∗{b}Σ∗. The regular expression
for L is a∗ ∪ a∗ba∗ ∪ a∗baa∗ba∗ ∪ (a ∪ b)∗b(a ∪ b)∗b(a ∪ b)∗b(a ∪ b)∗.
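The union just obtained describes exactly the strings that are not of the form a · · · abba · · · a, and this can be checked by brute force. A Python sketch (ours; Python's regex syntax writes ∪ as |):

```python
import re
from itertools import product

def two_consecutive_bs(w):
    # exactly two b's, and they are consecutive
    return w.count("b") == 2 and "bb" in w

# the union from the solution, with each ∪ written as | in Python's syntax
complement = re.compile(r"a*|a*ba*|a*baa*ba*|(a|b)*b(a|b)*b(a|b)*b(a|b)*")

for n in range(8):
    for w in map("".join, product("ab", repeat=n)):
        assert (complement.fullmatch(w) is not None) == (not two_consecutive_bs(w))
```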
Example 2.6. Is the set of all strings over {a, b} having no occurrence of three
consecutive b's regular?
Solution. Clearly, any string of a's is in the language. What else? There are also
strings with 0 or more a's followed by one or two b's, then at least one a, and
then 0 or more of b or bb, and then a string of a's. What is the middle aa∗ doing?
It prevents occurrences of three or more consecutive b's. The language is thus the
complement of L((a ∪ b)∗bbb(a ∪ b)∗), which we write by drawing an overline over
the whole expression.
Henceforward, we will make the overline short; for example, we will write
L((a ∪ b)∗ b̄b̄b̄ (a ∪ b)∗), with the overline only over bbb, instead of drawing it over
the whole of (a ∪ b)∗bbb(a ∪ b)∗.
Exercise 2.5. Does the regular expression a∗ ∪ ((a∗(b ∪ bb))(aa∗(b ∪ bb))∗)a∗
represent the set of all strings over {a, b} with no occurrence of bbb?
We will also write the regular expression itself for the language it represents. This
will simplify our notation a bit. For example, instead of writing L(a ∗ b∗ ), we will
simply write the language a ∗ b∗ . Use of the phrase “the language” will clarify the
meaning.
Exercise 2.6. Let L be the set of all strings over {a, b} with no occurrence of bbb.
Does the equality L = (∅∗ ∪ b ∪ bb)(a ∪ bb ∪ abb)∗ hold?
We will also say that two regular expressions are equivalent when they represent
the same language. Moreover, in accordance with the last section, two equivalent
regular expressions can also be written as equal. This means that, for regular
expressions R and E, we will use any one of the notations R = E (sloppy), L(R) = L(E)
(precise), or R ≡ E (R is equivalent to E, technical) to express one and the same thing.
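Whether two expressions are equivalent can at least be tested by brute force on short strings. The following Python sketch is our own illustration, using Python's re module (whose ∪ is written |); agreement up to a length bound is only evidence for R ≡ E, while a single disagreement refutes it:

```python
import re
from itertools import product

def equivalent_upto(r1, r2, sigma, n):
    # compare L(r1) and L(r2) on every string over sigma of length <= n;
    # agreement up to a bound is evidence, not a proof, of equivalence
    p1, p2 = re.compile(r1), re.compile(r2)
    for k in range(n + 1):
        for w in map("".join, product(sigma, repeat=k)):
            if (p1.fullmatch(w) is None) != (p2.fullmatch(w) is None):
                return False
    return True

# (E*)* ≡ E* holds, while (E1 E2)* ≡ E1* E2* fails in general:
assert equivalent_upto("((ab)*)*", "(ab)*", "ab", 6)
assert not equivalent_upto("(ab)*", "a*b*", "ab", 6)
```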
2.17. Give regular expressions for the following languages over {a}:
(a) {a²ⁿ⁺¹ : n ∈ N}.
(b) {aⁿ : n is divisible by 2 or 3, or n = 5}.
(c) {a², a⁵, a⁸, . . .}.
2.19. Find regular expressions for the following languages over {0, 1}:
(a) {w : w does not contain 0}.
(b) {w : w does not contain the substring 01}.
(c) Set of all strings having at least one pair of consecutive zeros.
(d) Set of all strings having no pair of consecutive zeros.
(e) Set of all strings having at least two occurrences of 1’s between any two occur-
rences of 0’s.
(f) {0ᵐ1ⁿ : m + n is even, m, n ∈ N}.
(g) {0ᵐ1ⁿ : m ≥ 4, n ≤ 3, m, n ∈ N}.
(h) {0ᵐ1ⁿ : m ≤ 4, n ≤ 3, m, n ∈ N}.
(i) Complement of {0ᵐ1ⁿ : m ≥ 4, n ≤ 3, m, n ∈ N}.
(j) Complement of {0ᵐ1ⁿ : m ≤ 4, n ≤ 3, m, n ∈ N}.
(k) Complement of {w ∈ (0 ∪ 1)∗1(0 ∪ 01)∗ : ℓ(w) ≤ 3}.
(l) {0ᵐ1ⁿ : m ≥ 1, n ≥ 1, mn ≥ 3}.
(m) {01ⁿw : w ∈ {0, 1}⁺, n ∈ N}.
(n) Complement of {0²ᵐ1²ⁿ⁺¹ : m, n ∈ N}.
(o) {uwu : ℓ(u) = 2}.
(p) Set of all strings having exactly one pair of consecutive zeros.
(q) Set of all strings ending with 01.
(r) Set of all strings not ending in 01.
(s) Set of all strings containing an even number of zeros.
(t) Set of all strings with at most two occurrences of the substring 00.
(u) Set of all strings having at least two occurrences of the substring 00. [Note: 000
contains two such occurrences.]
2.20. Write the languages of the following regular expressions in set notation, and
also give verbal descriptions:
(a) a∗(a ∪ b).
(b) (a ∪ b)∗(a ∪ bb).
(c) ((0 ∪ 1)∗(0 ∪ 1)∗)∗00(0 ∪ 1)∗.
(d) (aa)∗(bb)∗b.
(e) (1 ∪ 01)∗.
(f) (aa)∗b(aa)∗ ∪ a(aa)∗ba(aa)∗.
2.21. Give regular expressions for the following languages on {a, b, c}:
(a) Set of all strings containing exactly one a.
(b) Set of all strings containing no more than three a’s.
(c) Set of all strings that contain aaa as a substring.
(d) Set of all strings that do not have aaa as a substring.
(e) Set of all strings with number of a’s divisible by three.
(f) Set of all strings that contain at least one occurrence of each alphabet symbol.
(g) {w : ℓ(w) mod 3 = 0}.
(h) {w : #a(w) mod 3 = 0}.
(i) {w : #a(w) mod 5 > 0}.
2.22. Let E₁, E₂ be arbitrary regular expressions. Are the following true? Justify.
(a) (E₁∗)∗ ≡ E₁∗.
(b) E₁∗(E₁ ∪ E₂)∗ ≡ (E₁ ∪ E₂)∗.
(c) (E₁ ∪ E₂)∗ ≡ (E₁∗E₂∗)∗.
(d) (E₁E₂)∗ ≡ E₁∗E₂∗.
(e) (E₁ ∪ E₂)∗ ≡ E₁∗ ∪ E₂∗.
(f) (E₁ ∪ E₂)∗E₂ ≡ (E₁∗E₂)∗.
(g) (E₁ ∪ E₂)E₃ ≡ E₁E₃ ∪ E₂E₃.
(h) (E₁E₂ ∪ E₁)∗E₁ ≡ E₁(E₂E₁ ∪ E₁)∗.
(i) (E₁E₂ ∪ E₁)∗E₁E₂ ≡ (E₁E₁∗E₂)∗.
(j) E₁(E₂E₁ ∪ E₁)∗E₂ ≡ E₂E₂∗E₁(E₂E₂∗E₁)∗.
2.24. Let E be a regular expression that does not involve ∅ or ε. Give a necessary
and sufficient condition that L(E) is infinite.
2.4 Regular Grammars

Consider generating the strings of the language L = L(a∗b∗) = {aᵐbⁿ : m, n ∈ N}
by the following rules:

S → AB, A → aA, A → ε, B → bB, B → ε.

To see, for example, that a³b² ∈ L, we can apply the rules and derive a³b² as

S ⇒ AB ⇒ aAB ⇒ aaAB ⇒ aaaAB ⇒ aaaB ⇒ aaabB ⇒ aaabbB ⇒ aaabb.
The symbol ⇒ is read as "yields in one step." It means that a rule has been applied
on the left side expression to obtain the one on the right. Thus the first yield S ⇒
AB is an application of the rule S → AB. The second yield AB ⇒ aAB is an
application of the rule A → aA. Similarly, the last yield is an application of the rule
B → ε. These successive applications of the rules above show that the string a³b²
can be derived from S by a sequence of applications of the above rules. We write
this as

S ⇒∗ a³b².

Read this as S yields (in zero or more steps) a³b². We will, in fact, omit the
superscript ∗ in the yield relation, and write this simply as S ⇒ a³b². Such a device of
rules using extra symbols (as S, A, B here) is called a grammar (a phrase structure
grammar, to be specific) in common parlance with the natural languages. We can, of
course, find another grammar for generating exactly the same strings. For example,
a grammar with the rules S → aS, S → B, B → bB, B → ε will also do. Try a
derivation of a³b² by using these new rules.
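Derivability of a given string in such a grammar can be decided mechanically. The following Python sketch (our own illustration) searches breadth-first through sentential forms, pruning any form whose terminal part is already longer than the target:

```python
from collections import deque

# the productions S → AB, A → aA, A → ε, B → bB, B → ε
rules = {"S": ["AB"], "A": ["aA", ""], "B": ["bB", ""]}

def terminals(form):
    # the terminal symbols occurring in a sentential form
    return "".join(c for c in form if c not in rules)

def generates(target, start="S"):
    # breadth-first search through sentential forms reachable from start;
    # terminals are never erased, so forms with too many terminals are pruned
    seen, queue = {start}, deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        for i, sym in enumerate(form):
            if sym in rules:
                for rhs in rules[sym]:
                    new = form[:i] + rhs + form[i + 1:]
                    if len(terminals(new)) <= len(target) and new not in seen:
                        seen.add(new)
                        queue.append(new)
    return False

assert generates("aaabb")    # a³b² is derivable
assert not generates("ba")   # no b may precede an a
```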
A regular grammar G is a quadruple (N, Σ, R, S), where N is a finite set of
nonterminal symbols, Σ is an alphabet of terminal symbols with N ∩ Σ = ∅, S ∈ N is
the start symbol, and R is a finite set of productions, each of which has one of the
forms

A → ε or A → u or A → uB,

where A, B are nonterminal symbols and u is any string of terminal symbols. Thus,
the rule S → AB cannot be a production in any regular grammar if both A, B are
nonterminal symbols. Of course, it is enough to write only the productions and
mention the start symbol S; the other components such as N and Σ can be found out from
them. Sometimes, we write the four components of a grammar by using the name
of the grammar as a subscript like G = (NG , ΣG , RG , SG ). This notation becomes
convenient when many grammars are considered in a context.
With such a formal definition of a regular grammar, we must also say how the
productions are applied to generate the strings.
Let G = (N, Σ, R, S) be a regular grammar. If there is a string x ∈ Σ∗, a
nonterminal A ∈ N, and a string B ∈ (N ∪ Σ)∗ such that A → B is a production in R,
then we write xA ⇒¹ xB, and read it as xA yields in one step xB. Such a way of
writing xA ⇒¹ xB corresponding to a production A → B will be referred to as an
application of the production A → B.
Moreover, successive applications of productions are written as ⇒∗. (In fact, ⇒∗ is
the transitive and reflexive closure of the relation ⇒¹.) Note that for any string s, we
have s ⇒∗ s, that is, s yields s, though possibly there would not be any production in
a grammar to show that s yields s in one step. We will also abbreviate two (and then
many) yields A ⇒¹ B and B ⇒¹ C to A ⇒¹ B ⇒¹ C. Such an abbreviated sequence of
yields will be called a derivation, a derivation of the last string from the first string.
For example, A ⇒¹ B ⇒¹ C ⇒¹ D is a derivation of D from A.
The number of steps in the derivation, counting each "one step yield" as one, is
called the length of the derivation. Thus the derivation a ⇒∗ a has length 0, as no
rule has been applied to derive the string a from itself. Note that all the productions
in a regular grammar are such that a nonterminal can only occur as the rightmost
symbol in any derived string. Thus an application of a rule A → B to derive xB from
xA simply overlooks whatever is to the left of A; it simply replaces A by B.
The yield relations (in 0 steps, in m steps, or in 0 or more steps) can formally be
defined by

x ⇒⁰ x for any x ∈ (N ∪ Σ)∗,
y ⇒ᵐ⁺¹ zB iff y ⇒ᵐ zA and A → B ∈ R, for any y, z ∈ (N ∪ Σ)∗, A ∈ N, B ∈
Σ∗N ∪ Σ∗,
y ⇒∗ z iff for some m ∈ N, y ⇒ᵐ z.

We will write ⇒¹, ⇒ᵐ, and ⇒∗ as the same symbol ⇒, if no confusion arises.
If G = (N, Σ, R, S) is a regular grammar, then the language generated by G,
written L(G), is the set of all strings of Σ∗ derived from the start symbol S. That is,

L(G) = {w ∈ Σ∗ : S ⇒∗ w}.
Example 2.7. Consider the regular grammar G = ({S, B}, {a, b}, R, S), where R =
{S → aS | B, B → bB | ε}. Let us attempt a derivation.
The second production S → B says that we can derive B from S. That is, S ⇒ B.
The fourth production says that ε can be derived from B, that is, B ⇒ ε. Thus we
have the derivation S ⇒ B ⇒ ε, that is, S ⇒ ε. So, ε ∈ L(G).
Similarly, the derivation S ⇒ aS ⇒ aB ⇒ aε = a shows that a ∈ L(G).
You can see that no derivation in Example 2.7 would give a string of terminals
once a b precedes an a. For example, you cannot have S ⇒ ba. Can you prove it?
You may need to think a bit carefully before starting a proof.
If a string over {a, b} is generated by G, then the last step must use the production
B → ε, for this is the only production whose right-side string does not have a
nonterminal. (The derivation must terminate with terminals only.) Then, before it, B
must have been generated. For this to happen, either B → bB or S → B has been
applied. That is, once B has come into a derivation, S has gone, and then a cannot
enter the derivation.
As B can occur only as the rightmost symbol of any derived string, and B would
eventually give a string of b's, no a can be on the right of any b. This shows that
G can only generate strings of the form aᵐbⁿ. You will have to show the converse,
that all strings of the form aᵐbⁿ are indeed generated by G. This will show that
L(G) = L(a∗b∗).
However, sometimes we will not be able to accept such reasoning as a proof,
because there might be a case that we have left unnoticed. So, how would a formal
proof look?
Let us use induction on the length of derivations. We notice that the minimum
length of a derivation is 2; the particular derivation is S ⇒ B ⇒ ε, deriving the empty
string. This forms our basis case, and there is, of course, nothing to prove here.
Assume that a derivation of length n generates the string aʳbˢ. Now, before the last
step, there should be a B, and the last step is an application of the rule B → ε.
Moreover, if in any derivation B appears, then it appears as the rightmost symbol.
Thus, before the last but one step (the (n − 1)th step) of the derivation of aʳbˢ, the
string appears as aʳbˢB. On this string, we use the production B → bB and proceed to
derive

aʳbˢB ⇒ aʳbˢbB ⇒ aʳbˢb = aʳbˢ⁺¹.

Moreover, this is the only string that can be derived from aʳbˢB in the next two
steps. Further, once a B is introduced to the derivation, no a can be generated. Thus,
by induction, we have proved that L(G) = L(a∗b∗).
Exercise 2.7. There are gaps in the above inductive proof of L(G) = L(a∗b∗). Point
them out and then complete the proof.
So, what is the connection between regular expressions and regular grammars?
I would suggest that you wait a bit before attempting to answer this question. I ask
one more related question: given a language, how do we determine whether it is
regular or not? You can try to represent a given language by a regular expression.
When the answer to the first question turns out to be affirmative, you know that
the existence of a regular grammar for a given language would also prove that the
language is regular. But suppose that you could construct neither a regular expression
nor a regular grammar. Then? What do you conclude? Is the language regular or not?
2.25. Find regular grammars for Σ = {a, b} that generate the set of all strings with
(a) exactly one a.
(b) at least one a.
(c) no more than three a’s.
2.26. Let Σ be any alphabet. Construct regular grammars to generate the languages
∅, {ε}, Σ, Σ ∗ , Σ + , and Σ n , for any n ∈ N.
2.27. Let Σ = {a, b}. Construct regular grammars for (generating) the languages in
Examples 2.3–2.5.
2.28. Find a regular grammar that generates the language
(a) aa∗(ab ∪ a)∗.
(b) (aab∗ab∗)∗.
2.29. Find a regular grammar that generates the language
(a) {(ab)ᵐ : m ∈ N}.
(b) {a²ⁿ : n ≥ 1}.
(c) {wa : w ∈ {a, b}∗}.
(d) {bw : w ∈ {a, b}∗}.
(e) {aᵐbⁿcᵏ : m, n, k ≥ 1}.
(f) {aᵐbⁿ : m + n is even, m, n ∈ N}.
(g) {abᵐcⁿd : m, n ≥ 1}.
(h) {(ab)m : m ≥ 1} ∪ {(ba)n : n ≥ 1}.
2.30. Give a regular grammar with terminals in {a, b, c} for generating the language
(a) {w : w contains aaa as a substring}.
(b) {w : number of a’s in w is divisible by two}.
(c) {w : w contains no run of a’s of length greater than two}.
(d) {w : w contains at least one occurrence of each alphabet symbol}.
(e) {w : ℓ(w) mod 3 = 0}.
(f) {w : #a(w) mod 3 = 0}.
(g) {w : #a(w) mod 5 > 0}.
2.31. Show that if G is a regular grammar with L(G) ≠ ∅, then G must have a
production of the form A → x for some terminal string x.
2.32. Show that for every regular language L not containing ε, there is a regular
grammar having productions in the forms A → Ba or A → a that generates L.
[Such a grammar is a left-linear grammar. First define how a production is applied.]
2.5 Deterministic Finite Automata

What we require is a sort of algorithm that answers "yes" or "no" according as the
language is regular or not. We may think of a machine to which we give our language
as an input, and the machine would then tell us as an output whether the language is
regular or not. But how do we give any language, possibly an infinite set of
strings, as an input to a machine? We will rather think of a class of machines, where
each machine will be made for a language. Imagine a machine for the language a∗b∗.
It will read any given string as an input and will let us know whether the given string
is or is not in the language a∗b∗.
Once we have a machine for each regular language, we can think, in a certain
sense, that the class of such machines determines the class of regular languages. Then,
by arguing about machines, we may be able to say whether a given language is or is
not regular. Individual machines then will be tailor-made for particular languages
and will be used as language recognizers.
A typical machine will have a certain finite number of states out of which one will
be marked as the initial state. By default, the machine will start its operation being in
this initial state. It will have a reading head, which will read from a one-dimensional
tape, called the input tape.
The input tape has a left end and extends to the right only up to a finite
extent. Imagine a paper tape, say, 1 cm wide, one end of which you are holding
in your left hand, and starting from there you are using your right hand to search for
the other end, which you would eventually reach. The tape is also divided into small
squares (say, of dimension 1 × 1 cm² for your 1 cm wide tape). Each square will
contain an input symbol, and a string will be written on the tape starting with the first
square, from left to right. See Fig. 2.1.
If you think of a numbering of the squares on the tape, with the square at the
left end marked 1, and so on, then the mth symbol of a string is written on the mth
square. The reading head of the machine will be kept on the leftmost square of the
tape initially.
When the machine starts operating, it will be in the initial state, reading the first
symbol of the input string from the leftmost square of the input tape. Depending
upon the initial state and the input symbol, it will change its state to possibly another
state from the initial (exactly which state, only the machine knows). It will then go
to the next square. The second symbol is read similarly; but this time the machine
reads it being in the new state, not (necessarily) the initial state.
Again, depending upon this new state and the input symbol on the second square,
the machine will go to possibly another state and get ready for reading the symbol
from the next square. This sort of reading a symbol and changing its own state con-
tinues till it finishes reading the input string. Note that upon consuming the whole
string, the machine must have been in some state.
Fig. 2.1. An input tape holding the symbols a, b, . . . , c, d, with the machine's reading head placed on one square, the machine being in some state.
To know whether the machine has accepted or rejected the string, we will desig-
nate some of its states as the final states. If the last state of the machine is a final state,
we will say that the machine has accepted the input string, otherwise, the machine
has rejected the string. Final states are the accepting states, so to speak.
A deterministic finite automaton is a quintuple D = (Q, Σ, δ, s, F), where Q is a
finite set of states, Σ is an input alphabet, δ, the transition function, is a partial
function from Q × Σ to Q, s ∈ Q is the initial state, and F ⊆ Q is the set of final
states.
We use the acronym DFA for a deterministic finite automaton. Suppose a DFA is
reading the input symbol σ being in the state q. If δ(q, σ) = r, then the DFA will go to
the state r and its reading head will go to the next square. Since δ is a partial function
(not necessarily a total function), there might be a pair of state and input symbol for
which δ is not defined at all, that is, δ does not prescribe any next state. In such a case,
of course, the DFA will stop abruptly. We will see how to handle such a situation.
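As an illustration (not the book's notation), a DFA can be stored as a dictionary for its partial transition function δ. Here is a Python sketch of a machine for the language a∗b∗; the state names s and t are our own choices:

```python
# a sketch of a DFA for the language a*b*; the state names s, t and the
# dictionary encoding of the partial transition function δ are our own
delta = {
    ("s", "a"): "s",
    ("s", "b"): "t",
    ("t", "b"): "t",
    # ("t", "a") is left undefined: once a b is read, no a may follow
}
start, finals = "s", {"s", "t"}

def accepts(w):
    q = start
    for sym in w:
        if (q, sym) not in delta:
            return False      # abrupt halt on an undefined state-symbol pair
        q = delta[(q, sym)]
    return q in finals        # input consumed; accept iff in a final state

assert accepts("aaabb") and accepts("")
assert not accepts("aba")
```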
Suppose the DFA M = (Q, Σ, δ, s, F) is in operation. At any instant of time, a
description of M can be completely specified by telling in which state the automaton
is, and what is the remaining input string. Formally, a configuration (or an instan-
taneous description) of a DFA is an ordered pair (q, w), where q ∈ Q is a state, the
current state, and w ∈ Σ ∗ is a string, that is yet to be read.
For a DFA, one step of a computation can be described by specifying how a
description of the machine has changed. That is, a step of a computation can be
specified by telling which configuration has given rise to (yields) which other
configuration. For example, suppose that the DFA is currently in state q, scanning the
first symbol of the input string u = σv. If the transition function takes the DFA from
q on σ to the state r, then the remaining string to be read is v, and the next
configuration of the machine would be (r, v). In such a case, we will say that the
configuration (q, u) yields the configuration (r, v) in one step.
We will use the symbol ⊢¹ for "yields in one step." Formally, for the DFA M, the
configuration (q, u) yields in one step the configuration (r, v), that is,

(q, u) ⊢¹ (r, v) iff δ(q, σ) = r and u = σv for some σ ∈ Σ.
The machine halts (stops operating) when it encounters a state-symbol pair for
which δ is not defined. Suppose that δ(q, σ) is left undefined in the partial function.
Now the machine is in state q and the input string is σv. It halts then and there. We
will also make it a convention that when the input string is totally consumed by the
machine, that is, when the machine has read all the symbols on the input tape, then
also it halts. (Cryptically, δ is not defined for (q, ε)!)
The relation “yields” in the set of all configurations is defined as the reflexive and transitive closure of the relation “yields in one step.” Explicitly, a configuration (q, u) yields another configuration (r, v) iff there is a sequence (of length n ≥ 0) of configurations (qi, wi) such that
q0 = q, w0 = u, qn = r, wn = v, and (qi, wi) ⊢¹ (qi+1, wi+1) for i = 0, 1, . . . , n − 1.
Thus (q, u) yields (r, v) means that (q, u) yields, in zero or more steps, the configuration (r, v). If we have a sequence of configurations
(q1, w1) ⊢¹ (q2, w2), (q2, w2) ⊢¹ (q3, w3), . . . , (qn−1, wn−1) ⊢¹ (qn, wn),
then (q1, w1) yields all the configurations (q1, w1), . . . , (qn, wn). The symbol ⊢∗ is used to denote the yield relation, just as in derivations of strings in a grammar.
A formal definition of the yield relation can be given as follows:
(p, u) ⊢⁰ (p, u).
(p, σu) ⊢¹ (q, u) iff δ(p, σ) = q.
(p, σu) ⊢ⁿ⁺¹ (q, v) iff (p, σu) ⊢¹ (r, u) and (r, u) ⊢ⁿ (q, v) for some state r.
(p, u) ⊢∗ (q, v) iff (p, u) ⊢ᵐ (q, v) for some m ∈ N.
We make it a convention to omit the superscripts 1, m, ∗, and simply use the same symbol ⊢ to denote all of “yields in one step,” “yields in m steps,” and “yields (in 0 or more steps),” if no confusion arises.
A computation by the DFA then will be such a sequence of yields (in one step),
possibly terminating at a configuration (q, ε) for some state q. The configuration
(q, ε) says that the machine is currently in state q and the whole input has already
been read. We then say that the DFA M accepts a string u ∈ Σ∗ iff (s, u) ⊢ (q, ε),
where q ∈ F, a final state. This says that the machine starts from its initial state s;
reads the input string, symbol by symbol; changes its state according to its transition
function; comes to a state q after it finishes reading the whole string and then halts.
If this state q is a prescribed final state, then we would interpret that the machine has
accepted the string. Otherwise, the machine rejects the input string.
Exactly what happens when a machine rejects an input string? If the state q that
the machine enters after consuming the whole string is not a final state, then the
48 2 Regular Languages
machine rejects the input string. Moreover, if the machine halts before it finishes
reading the whole string, then also the input string is rejected.
The language accepted by the DFA D is the set of all input strings accepted by it. That is,
L(D) = {w ∈ Σ∗ : (s, w) ⊢ (q, ε) for some q ∈ F}.
Sometimes, we also say that a DFA recognizes the language that it accepts. If more than one machine is discussed in a context, we will give a subscript to the yield relation; we use (q, u) ⊢M (r, v) to say that when the machine M operates, the configuration (q, u) yields the configuration (r, v).
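The configurations-and-yields description translates directly into code. Below is a minimal sketch (the function and variable names are ours, not part of the theory): the transition function δ is a Python dictionary, so a missing key models a partial function, and the run halts abruptly exactly as described above.

```python
def run_dfa(delta, start, finals, w):
    """Trace the computation (start, w) |- ... and report acceptance.

    delta  : dict mapping (state, symbol) -> state, a partial function.
    Returns (trace, accepted), where trace lists the configurations met.
    """
    state, rest = start, w
    trace = [(state, rest)]
    while rest:
        nxt = delta.get((state, rest[0]))
        if nxt is None:              # delta undefined here: halt abruptly,
            return trace, False      # and the input is rejected
        state, rest = nxt, rest[1:]
        trace.append((state, rest))
    return trace, state in finals    # input consumed; accept iff final

# A hypothetical two-state DFA accepting a*b:
delta = {('s', 'a'): 's', ('s', 'b'): 'q'}
print(run_dfa(delta, 's', {'q'}, 'aab')[1])   # True
print(run_dfa(delta, 's', {'q'}, 'aba')[1])   # False: halts abruptly
```

The returned trace is exactly the sequence of configurations joined by ⊢ in the examples that follow.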
Example 2.8. Let Q = {s, q, r}, Σ = {a, b}, F = {s, q}. Define the transition function δ by δ(s, a) = s, δ(s, b) = q, δ(q, a) = r, δ(q, b) = q, δ(r, a) = r, δ(r, b) = r. What is the language accepted by the DFA D = (Q, Σ, δ, s, F)?
Solution. Give u = a²b² as an input to the machine D. The DFA D starts from the initial state s while u is on its input tape. That is, the initial configuration of D is (s, a²b²). The reading head of D is placed on the leftmost (first) input symbol a.
When D reads the first symbol a, the transition (an element of the transition
function) δ(s, a) = s applies and the machine enters the state s (it does not change
state here), and the reading head goes to the next symbol, which is again a. By this
time, the first symbol has already been read (or consumed) by the machine. The
remainder of the input string is ab². Thus the configuration the machine is in is (s, ab²). It reads an a and works similarly to yield in the next step the configuration (s, b²). As δ(s, b) = q, the machine now changes its state to q, and the remainder of the input string is only b. As a configuration, this information is coded as (q, b). As δ(q, b) = q, the machine reenters the state q, and by this time, all of the input string has been read. Thus the end-configuration (q, ε) is reached. The computation of D on the input string u = a²b² can be written down with the help of the yield relation as follows:
(s, a²b²) ⊢ (s, ab²) ⊢ (s, b²) ⊢ (q, b) ⊢ (q, ε).
The string a²b² is accepted by D as q ∈ F.
For u = a²ba, the computation of D is
(s, a²ba) ⊢ (s, aba) ⊢ (s, ba) ⊢ (q, a) ⊢ (r, ε).
As r ∉ F, the string a²ba is rejected by D.
Fig. 2.2. Transition diagram of the DFA of Example 2.8.
What job does the state r do in the DFA D? As the transition function δ shows,
once the machine enters the state r , it never leaves it. Moreover, it is not a final state.
That means once the machine enters the state r , the input string will eventually be
rejected. Is it not equivalent to omitting all the transitions with r as a state compo-
nent? That is, suppose we leave δ undefined for the pairs (q, a), (r, a), and (r, b).
Then there will be no change in the accepting computations. This is the reason we
allowed δ to be a partial function. The new automaton D′ = ({s, q}, {a, b}, δ′, s, {s, q}) with δ′(s, a) = s, δ′(s, b) = q, δ′(q, b) = q would be depicted as in Fig. 2.3.
Fig. 2.3. Transition diagram of the DFA D′.
On the input aba, the computation of D′ is (s, aba) ⊢ (s, ba) ⊢ (q, a), where D′ halts abruptly without consuming the whole input. Thus, the string aba is rejected by D′.
By this time you must have guessed how a transition diagram is drawn corre-
sponding to a DFA. We put the names of the states inside circles at different places.
The final states are encircled once more. The initial state is shown by putting an ar-
row going towards it and coming from nowhere. An arrow labeled with a symbol
σ is drawn from a state p to another state t iff we have δ( p, σ) = t as a transition
for a DFA. Similarly, loops are drawn from and to the same state, provided there is
a corresponding transition. Note that the labels on any edge of a DFA diagram are
members of the alphabet Σ.
You can recognize such diagrams; they are simply directed labeled graphs carrying the additional information of initial and final states. In fact, we could have defined automata as directed labeled graphs with this additional information on the initial and final states. Often we will do this, specifying automata by their diagrams instead of as quintuples.
Example 2.9. Construct a DFA for accepting the language {w ∈ {a, b}∗ : w has a
substring bab or a substring aba}.
Solution. The language can be represented by the regular expression (a ∪ b)∗ (bab ∪
aba)(a ∪ b)∗ . A DFA is drawn in Fig. 2.4 for accepting this language.
Fig. 2.4. Transition diagram of the DFA D of Example 2.9, on the states 1 to 7, with initial state 1.
Let us see how D proceeds with the input string aabab. The initial configuration of D with this input is (1, aabab). The computation can be written down with the help of the yield relation. Here is the computation:
(1, aabab) ⊢ (2, abab) ⊢ (2, bab) ⊢ (3, ab) ⊢ (4, b) ⊢ (4, ε).
As the state 4 is a final state, D accepts aabab. Find such computations for the
input strings bbabab, abbaba, baaba, and abbaab. Understand why the cross edges
(transitions) from the states 3 to 5, and 6 to 2 are made.
Exercise 2.8. Define a DFA as a directed labeled graph. Then define carefully the
yield relation, the notion of a computation, and the languages accepted by them.
The transition function of D in tabular form, with → marking the initial state:
δ    →1   2   3   4   5   6   7
a      2   2   4   4   6   2   7
b      5   3   5   4   5   7   7
Have you verified that the D in Example 2.9 is indeed a DFA? If not, do it now.
Then modify it by introducing another state so that the new transition function will
become a total function, but the new DFA must do the same job as D. This can be
done by introducing another nonfinal state and adding all missing transitions from
the relevant states to this new one, and then adding loops from this new state to itself.
Such a state in a DFA is called a state of no return.
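The completion just described is mechanical enough to write down. A sketch (the names are ours), assuming δ is given as a dictionary:

```python
def totalize(delta, states, alphabet, dead='dead'):
    """Complete a partial transition function: every missing
    (state, symbol) pair, and the new dead state itself, goes to `dead`.
    The caller must keep `dead` out of the set of final states."""
    total = dict(delta)
    for q in list(states) + [dead]:
        for a in alphabet:
            total.setdefault((q, a), dead)   # only fills the missing pairs
    return total

# Completing the two-state automaton of Example 2.8 (the one with a
# partial transition function):
delta = {('s', 'a'): 's', ('s', 'b'): 'q', ('q', 'b'): 'q'}
total = totalize(delta, ['s', 'q'], 'ab')
print(total[('q', 'a')])   # dead
```

Since the dead state is nonfinal and loops to itself, every computation entering it rejects, so the accepted language is unchanged.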
For defining the language an automaton accepts, we can avoid the configurations
and also the yield relation. It is done by following the transitions in a sequence. What
we have to do is simply extend the total function δ : Q × Σ → Q to another total function δ∗ : Q × Σ∗ → Q inductively. This is done by taking δ∗(q, ε) = q and δ∗(q, uσ) = δ(δ∗(q, u), σ) for u ∈ Σ∗ and σ ∈ Σ. Then the language accepted by D
can be defined as L(D) = {w ∈ Σ ∗ : δ ∗ (s, w) ∈ F}. Computations of the DFA can be
shown by the computations of δ ∗ on any given input string. However, we will follow
the usual way of configurations and yields.
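The inductive definition of δ∗ is a one-line recursion. A sketch (names are ours; δ is assumed total on the pairs actually reached):

```python
def delta_star(delta, q, u):
    """delta*(q, eps) = q;  delta*(q, u.sigma) = delta(delta*(q, u), sigma)."""
    if u == '':
        return q
    return delta[(delta_star(delta, q, u[:-1]), u[-1])]

# The transition table of Example 2.9, on the states 1..7, initial state 1:
delta = {}
for i, (ra, rb) in enumerate(zip([2, 2, 4, 4, 6, 2, 7],
                                 [5, 3, 5, 4, 5, 7, 7]), start=1):
    delta[(i, 'a')], delta[(i, 'b')] = ra, rb

print(delta_star(delta, 1, 'aabab'))   # 4, a final state: aabab is accepted
```

Note how δ∗ retraces exactly the states visited in the yield computation (1, aabab) ⊢ · · · ⊢ (4, ε), only without carrying the unread suffix along.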
Example 2.10. What are the languages accepted by the DFAs in Fig. 2.5?
Fig. 2.5. Transition diagrams of the DFAs D (on the states 2, 3) and D′ (on the states 4 to 7) of Example 2.10.
Solution. The initial state 2 of D is also a final state. Thus D accepts the empty string
ε. Then, upon reading an a it changes its state to 3. In this state it can only read b
and then it changes to the state 2, where it also accepts the already consumed input,
that is, the string ab. Now, it is obvious that it will go on accepting any number of
concatenations of this string ab.
Now suppose that D gets the input string b. What does it do? Being in the initial state 2, it has no transition defined for this input. Then it simply halts (abruptly). Though it is in a final state, namely 2, it does not accept the string, because it has not consumed the input yet (and it does not know how to consume it). A similar thing happens when D is in state 3 and gets the symbol a to read. Thus D accepts the language (ab)∗.
As the initial state of D′ is a final state, it also accepts the empty string. Moreover, being in its initial state, D′ can read both a and b. If it reads b, then it enters the state 7, which is a state of no return and a nonfinal state. This leads to a halt in a nonfinal state. The string is not accepted. Thus, any string that begins with b is not accepted by D′.
However, if initially being in state 4 it reads a, then it changes to state 5, where
again, an a will lead to nonacceptance. But a b at this stage leads the automaton to
state 6, which is a final state. That is, the string ab is accepted. Once it enters the
state 6, it is clear that the only way it would accept the string is by changing its state
to 5 and then to 6, in succession. That is, it will only accept concatenations of the string ab thereafter. Thus, D′ also accepts the language (ab)∗.
You can make another DFA from D in Example 2.10 by introducing a state of no return, just as the state 7 serves in D′. Then the partial function will become a total function without changing the accepted language.
Can you make a DFA to accept the language (ab)⁺? Of the DFAs D and D′ of Example 2.10, it is easier to modify the latter. Just do not keep the state 4 as a final state. That does the job, as ε is no longer accepted, and all the other strings will be accepted as they were. You can, of course, delete state 7.
Example 2.11. Construct a regular expression, a regular grammar, and a DFA for the
language L = {w ∈ {a, b}∗ : w contains an odd number of a’s}.
Solution. What are the strings in {a, b}∗ that are in L? It may be a string having any
number of b’s, followed by an a, followed by any number of b’s. Or, it may be a string
of the previous type and then followed by an a, then any number of b’s, and then an a
followed by any number of b’s. Here “any number of b’s” includes zero b’s as well. That is, the strings can be provisionally written down as b∗ab∗(ab∗ab∗)∗, which also serves as a regular expression for L. A regular grammar mirroring this description has the productions S → bS | aA, A → bA | aS | ε.
Writing formally, the DFA is D = ({s, q}, {a, b}, δ, s, {q}), where δ(s, a) = q, δ(s, b) = s, δ(q, a) = s, δ(q, b) = q. Computations of D on the strings aabbabb and bbbabba look like:
(s, aabbabb) ⊢ (q, abbabb) ⊢ (s, bbabb) ⊢ (s, babb) ⊢ (s, abb) ⊢ (q, bb) ⊢ (q, b) ⊢ (q, ε),
(s, bbbabba) ⊢ (s, bbabba) ⊢ (s, babba) ⊢ (s, abba) ⊢ (q, bba) ⊢ (q, ba) ⊢ (q, a) ⊢ (s, ε).
As q is a final state, the first string aabbabb is accepted, while the second string
bbbabba is not accepted as s is not a final state. Convince yourself that L(D) = L
as claimed. Can you prove it? Also, compare the DFA with the grammar we have
constructed.
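The claim L(D) = L can be spot-checked mechanically by running the two-state DFA against a direct count of a’s. A sketch (names are ours):

```python
from itertools import product

# The DFA of Example 2.11: state 's' = even number of a's read so far,
# state 'q' = odd number; q is the only final state.
delta = {('s', 'a'): 'q', ('s', 'b'): 's',
         ('q', 'a'): 's', ('q', 'b'): 'q'}

def accepts(w):
    state = 's'
    for sym in w:
        state = delta[(state, sym)]
    return state == 'q'

# Agree with the defining property on every string of length at most 8.
for n in range(9):
    for w in map(''.join, product('ab', repeat=n)):
        assert accepts(w) == (w.count('a') % 2 == 1)
print('checked')
```

Such an exhaustive check on short strings is not a proof, but it catches a mistaken transition immediately; the actual proof is an induction on the length of w.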
Can you see the connection between a regular expression, a regular grammar, and
an automaton? Let us have some more examples.
Example 2.12. Construct an automaton to accept the language a⁺b⁺.
Solution. This language consists of those strings of our opening example a∗b∗ that contain at least one a and at least one b. A regular grammar for the language can be constructed by having the production rules S → aA, A → aA | bB, B → ε | bB. Verify it. This also gives you hints
for constructing a DFA. See the diagram in Fig. 2.7.
Fig. 2.7. DFA for Example 2.12.
Check whether it works correctly for some strings in the language and for some
which are not in the language.
Example 2.13. Construct a DFA that accepts the language {w ∈ {0, 1}∗ : w has an even number of 1’s}.
Solution. Convince yourself that the DFA in Fig. 2.8 does the job. Write also the
regular expression and the regular grammar for the language.
Fig. 2.8. DFA for Example 2.13.
Example 2.14. Construct a DFA for accepting the language {w ∈ {0, 1}∗ : w does
not contain three consecutive 1’s}.
Solution. You have met this language earlier where your alphabet was {a, b} instead
of {0, 1}. If you have forgotten, then construct a regular expression and a regular
grammar for this. A DFA for accepting the language is given in Fig. 2.9. Write the
DFA as a quintuple and then verify that it really accepts the language.
Fig. 2.9. DFA for Example 2.14.
In the problems, I will ask you to construct an automaton that accepts all strings
satisfying certain constraints. What I mean there is that the language of your au-
tomaton should be the set of all strings satisfying the given constraint. Thus, it is not
just enough to construct an automaton that accepts all those strings, I require that
the automaton must accept those strings and nothing else. Similar caution goes with
the grammars and regular expressions, and also with some other machines we will
devise later.
2.34. Construct a DFA that recognizes any string over {a, b, . . . , z} having the substring abc.
2.35. For the following DFAs ({ p, q, r, s}, {a, b}, δ, s, F), draw the transition dia-
grams and then find the languages they accept, in both the set notation and in plain
English:
(a) F = { p}, δ( p, a) = q, δ( p, b) = s, δ(q, a) = r, δ(q, b) = r, δ(r, a) = r, δ(r, b) = r,
δ(s, a) = p, δ(s, b) = r.
(b) F = {s}, δ( p, a) = q, δ( p, b) = s, δ(q, a) = r, δ(q, b) = p, δ(r, a) = r, δ(r, b) = r,
δ(s, a) = p, δ(s, b) = r.
(c) F = {s}, δ( p, a) = q, δ( p, b) = s, δ(q, a) = q, δ(q, b) = q, δ(r, a) = s, δ(r, b) = q,
δ(s, a) = p, δ(s, b) = r.
(d) F = {s}, δ( p, a) = q, δ( p, b) = q, δ(q, a) = q, δ(q, b) = q, δ(r, a) = s, δ(r, b) = q,
δ(s, a) = p, δ(s, b) = r.
(e) F = {q}, δ( p, a) = q, δ( p, b) = q, δ(q, a) = q, δ(q, b) = p, δ(r, a) = r, δ(r, b) = r,
δ(s, a) = p, δ(s, b) = s.
2.37. Construct DFAs for accepting the following languages on Σ = {a, b}:
(a) (a +b)2 .
(b) (a + b)2 − (a + b).
2.6 Nondeterministic Finite Automata 55
2.38. Consider the set of all strings w on {0, 1} defined by the requirements below.
For each, construct an accepting DFA.
(a) In w, every occurrence of 00 is immediately followed by a 1. For instance, 101 and 0011 are in the language but 0001 is not.
(b) w contains 00 but not 000.
(c) The leftmost symbol in w differs from the rightmost symbol.
(d) Each substring of w of length at least four has at most two 0’s. For instance,
001110, 011001 are in the language but 10010 is not.
(e) ℓ(w) ≥ 5 and the fourth symbol from the right end of w is different from the leftmost symbol.
(f) In w, the leftmost symbol and the rightmost two symbols are identical.
2.39. Let M be a DFA with a as an input symbol. Suppose that δ(q, a) = q for each state q of M. Show that either a∗ ⊆ L(M) or a∗ ∩ L(M) = ∅.
2.40. Let M be a DFA with the initial state s and a single final state r . Suppose
δ(s, σ) = δ(r, σ) for each input symbol σ. Show that if a nonempty string w is in
L(M), then for all n ≥ 1, the string wn is also in L(M).
2.41. Let M = ({s, p, q}, {0, 1}, δ, s, {s, p}) be a DFA with δ(s, 0) = p, δ(s, 1) = s,
δ( p, 0) = q, δ( p, 1) = s, δ(q, 0) = q, δ(q, 1) = q. Guess the language that M accepts.
Prove that your guess is correct.
In a nondeterministic machine, the transition function δ will no longer be a function! Well, we will use a relation instead. To make the nondeterministic machines more flexible, we will allow them to read the empty string ε along with symbols from Σ. The following is a formal definition of a nondeterministic automaton.
A nondeterministic finite automaton, abbreviated to NFA, is again a quintuple
N = (Q, Σ, Δ, s, F), where Q, Σ, s, F are, respectively, the set of states, the input
alphabet, the initial state, and the set of final states, as in a DFA, and Δ is a finite
relation from Q × (Σ ∪ {ε}) to Q.
As mentioned earlier, Δ is a finite subset of (Q × (Σ ∪ {ε})) × Q, called the
transition relation of N. Any element of the transition relation Δ will be called a
transition. Instead of writing a transition as the formally correct ((q, σ), r ), we will
write it as the triple (q, σ, r ). Moreover, we will use the term finite automaton to
commonly refer to both a DFA and an NFA.
Suppose the NFA N = (Q, Σ, Δ, s, F) is currently in state q, its reading head
is scanning the symbol σ and to the right of it is the remaining string v. Suppose
further that we have a transition (q, σ, r ) ∈ Δ. Then the NFA will go to the state
r with the input string left to be read as v, as usual. However, there might be several triples in Δ whose first two components are, respectively, q and σ. That is, as Δ is a relation (and not a partial function), we might have n triples (q, σ, r1), (q, σ, r2), . . . , (q, σ, rn) in Δ. (Say, r is one of these ri.) Then, we do not know to which of the n states r1, . . . , rn the NFA goes next.
Further, the input string w can be parsed in a trivial way, that is, w = εw. Now, if (q, ε, r) ∈ Δ, then the NFA may enter the
state r without consuming any input. The transition (q, ε, r ) says that if N is in state
q, then it can change its state to r without consuming any input. Such a transition
is called an ε-transition. We should keep all these in mind, while formalizing the
notion of a computation in an NFA.
Let N = (Q, Σ, Δ, s, F) be an NFA. A configuration of N is an ordered pair
(q, u), where q ∈ Q is a state, the current state, and u ∈ Σ ∗ is a string that is yet
to be read. Let (q, u) and (r, v) be two configurations of N. We define the relation of yields in one step, using the same symbol ⊢¹, by
(q, u) ⊢¹ (r, v) iff there is σ ∈ Σ ∪ {ε} such that u = σv and (q, σ, r) ∈ Δ.
The “yield” relation is defined as the reflexive and transitive closure of “yields in one step.” As earlier, we use the same symbol ⊢ instead of the more formal ⊢∗ to denote the yield relation. Thus, (q, u) yields (r, v) iff there is a sequence of configurations, each yielding in one step the next one, starting from (q, u) and ending at (r, v). Also, (q, u) yields itself, in 0 steps. That is,
(q, u) ⊢ (q, u); and
(q, u) ⊢ (r, v) iff for some n ≥ 1 there are configurations (qi, ui) with q0 = q, u0 = u, qn = r, un = v, and (qi, ui) ⊢¹ (qi+1, ui+1) for i = 0, 1, . . . , n − 1.
In the case of an NFA, computation, that is, the sequence of yields, is nonde-
terministic in the sense that starting from a configuration (q, u) it may end up at
various configurations (r, v). For instance, suppose both (q, σ, r) and (q, σ, p) are transitions in our NFA N. In such a case, both (q, σu) ⊢ (r, u) and (q, σu) ⊢ (p, u) hold. Thus a computation starting with a configuration might end up at several configurations, and thus in different states. However, in any one instance of computation, N would follow one such path, but we do not know which.
We will say that the NFA accepts the input string if one such computation ends up at
a final state.
Formally, the NFA N accepts an input string u ∈ Σ∗ iff there is a (final) state q ∈ F such that (s, u) ⊢ (q, ε) in some computation. Then N rejects the input string u iff no possible computation yields an end-configuration (p, ε), for any final state p, from the starting configuration (s, u).
Note that in both the cases of a DFA and an NFA, the automaton halts if there is
no transition defined for any configuration met through the steps of a computation.
In a DFA, such a halt before the whole input is consumed means that the string is rejected, whereas an NFA might follow an alternate computation path leading to the acceptance of the string. The language accepted by an NFA N is
L(N) = {w ∈ Σ∗ : (s, w) ⊢ (q, ε) for some q ∈ F}.
Notice that we are a bit partial towards preferring acceptance over rejection. If N
has an accepting computation and also a rejecting computation on the same input,
our definition says that N accepts the input.
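Acceptance “by some computation” need not be found by blind search: one can track, after each input symbol, the set of all states the NFA could possibly be in, closing that set under ε-transitions. A sketch (names are ours; ε is encoded as the empty string ''):

```python
def eps_closure(Delta, states):
    """All states reachable from `states` through epsilon-transitions alone."""
    stack, seen = list(states), set(states)
    while stack:
        p = stack.pop()
        for (q, sym, r) in Delta:
            if q == p and sym == '' and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def nfa_accepts(Delta, start, finals, w):
    current = eps_closure(Delta, {start})
    for sym in w:
        moved = {r for (q, s, r) in Delta if q in current and s == sym}
        current = eps_closure(Delta, moved)
    return bool(current & finals)   # some possible state is final

# A small NFA with an epsilon-transition, accepting a*b*:
Delta = {('s', 'a', 's'), ('s', '', 'q'), ('q', 'b', 'q')}
print(nfa_accepts(Delta, 's', {'s', 'q'}, 'aabb'))   # True
print(nfa_accepts(Delta, 's', {'s', 'q'}, 'ba'))     # False
```

This set-of-states bookkeeping is exactly the idea behind the subset construction that converts an NFA to a DFA, which we will meet later.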
Example 2.15. Let N = (Q, Σ, Δ, s, F) be the NFA with Q = {s, q, r }, Σ = {a, b},
F = {s, q}, and Δ = {(s, a, s), (s, ε, q), (s, b, r), (q, b, q), (q, b, r)}. What is L(N)?
Solution. Give u = a²b² as an input to N. One possible computation is
(s, a²b²) ⊢ (s, ab²) ⊢ (s, b²) ⊢ (r, b).
Here, the machine N halts without reading further input, as Δ gives no information as to what to do once it reads the input b while in the state r. However, we cannot
conclude that N rejects the input string u, for there might be another accepting com-
putation! Let us see one more possible computation of N with the same input string
a²b². Here is one more computation in N:
(s, a²b²) ⊢ (s, ab²) ⊢ (s, b²) ⊢ (q, b²) ⊢ (q, b) ⊢ (q, ε).
As q ∈ F, this is an accepting computation; thus N accepts a²b².
Fig. 2.10. Transition diagram of the NFA of Example 2.15.
Once N enters the state r, it never leaves the state, and as r ∉ F, this leads to a rejecting computation. The only accepting computations are those that never enter r. It is then obvious that L(N) = a∗b∗. Thus we can safely delete the state r from N.
Moreover, you can also see that due to the ε-transition (s, ε, q), there is no need
to keep s as a final state in Fig. 2.10. The modified NFA is drawn in Fig. 2.11.
Fig. 2.11. The modified NFA N′.
Does N′ accept the same language as N (of Fig. 2.10), or does it not? Check the computation with some strings from {a, b}∗ to answer the question and then argue abstractly.
As in a DFA, we put the names of the states inside circles at different places; the
final states are encircled once more. The initial state is shown by putting an arrow
going towards it and coming from nowhere. An arrow labeled with a symbol σ is
drawn from a state p to another state t iff we have ( p, σ, t) ∈ Δ. Similarly, loops
are drawn from and to the same state, provided there is a corresponding transition.
The labels on the edges of an NFA diagram can be members of the alphabet Σ, and some of them can also be ε.
In case of an NFA, there can be many edges between the same pair of vertices with
different labels. For example, if in an NFA, we have the transitions (q, a, r ), (q, b, r ),
then its diagram will have an edge from q to r with label a and also another edge
from q to r with label b. In such cases, instead of having multiple edges, we will draw only one edge and label it with all those symbols separated by commas. For example, the transitions (q, a, r) and (q, b, r) will be depicted by a single edge labeled
“a, b.” Sometimes, we will not write the names of the states inside the circles; see
Example 2.16 below.
Example 2.16. Construct an NFA for accepting the language {w ∈ {a, b}∗ : w has a
substring bab or a substring aba}.
Solution. The language can be represented by the regular expression (a ∪ b)∗ (bab ∪
aba)(a ∪ b)∗ . See the NFA N given in Fig. 2.12 and try to connect the diagram
directly with this regular expression.
Fig. 2.12. NFA for Example 2.16.
Exercise 2.9. Define an NFA as a directed labeled graph. Then define carefully the
yield relation, the notion of a computation, and the languages accepted by them.
Example 2.17. Construct an NFA that accepts all strings of length 3 over the alpha-
bet {a, b}.
Solution. Trivially, you can take all the eight strings of length 3 and, from the initial state, draw a path for each of them, inserting extra states to break the strings into symbols; finally, declare the last state of each path a final state. This will do. This also tells you how to construct a regular expression and a regular grammar for the language.
However, a better way of writing the regular expression is to use union and con-
catenation. It is simply (a ∪ b)(a ∪ b)(a ∪ b). Similarly, a better grammar can be
constructed using only the productions
S → aA | bA, A → aB | bB, B → a | b.
Compare the NFA in Fig. 2.13 with this grammar. Just a look at the NFA tells that it
does the intended job.
Example 2.18. Draw the transition diagram of an NFA that accepts the language a⁺b⁺ ∪ a∗b.
Solution. As a⁺ = a∗a, you may think of a loop labeled a on the initial state and then an edge labeled a from this state to another. Similarly, from the next state, you simply have a loop and an edge to the third state, both labeled b. This accounts for a⁺b⁺, provided the third state is a final state. Notice that this itself takes care of the strings in a⁺b. The only string left out is the single b. So, have an edge labeled b from the initial state to the third one. This probably does the job. The transition diagram is
drawn in Fig. 2.14. Verify with some input strings and then decide whether this NFA
really does the job.
Fig. 2.14. NFA for Example 2.18.
Example 2.19. Construct an NFA for accepting the language L = {aⁿ : n mod 4 = 1}.
Solution. A regular expression for L is a(aaaa)∗. A regular grammar for L has the productions S → aA, A → ε | aaaS. An NFA can be easily constructed by mimicking the grammar.
Take one state as s and the other as q. Being in state s, let the machine read a and
go to the state q. Next, being in q, it reads aaa to come back to s. Finally, declare q
as the final state. We need to modify this a bit by breaking the string into symbols.
Just have two more states, say, p and r to break aaa symbol by symbol. That is, the
NFA starts from s, reads a, and changes state to q. Then it reads a and changes state
to p; again it reads a and changes its state to r . One more a is to be read. So, being
in the state r it reads an a and changes its state back to s. The final state is q, as earlier.
See the diagram in Fig. 2.15. Convince yourself taking some input strings whether
the automaton really does the intended job. This NFA is also a DFA.
Exercise 2.10. Does the NFA in Fig. 2.16 accept {w ∈ {a, b}∗ : w has aa or bb as
substring(s)}?
Fig. 2.16. NFA for Exercise 2.10.
In fact, nondeterminism can be used for doing very complicated jobs easily. This is because these machines have the power to guess and then move towards choosing the correct string. The ε-transitions especially can be used for guessing a correct string. Look at Examples 2.20 and 2.21.
Example 2.20. Construct automata for accepting the language (10 ∪ 101)∗.
Fig. 2.17. NFA with an ε-transition for Example 2.20.
An NFA without ε-transitions accepting the same language can also be constructed; see, for example, the transition diagram in Fig. 2.18.
Fig. 2.18. NFA without ε-transitions for Example 2.20.
You can then try constructing a DFA for accepting the same language. But first see why the NFA in Fig. 2.18 is not deterministic. This is because we have two transitions for the state–symbol pair (p, 0). The NFA could go to the state q or to the state s when it is in state p reading the symbol 0.
At this point, close the book and try to construct a DFA. Can you have a simpler
DFA to accept the above language than the one drawn in Fig. 2.19?
Fig. 2.19. A DFA for Example 2.20.
Example 2.21. Construct an NFA for accepting the language L = {w ∈ {a, b, c, d}∗ :
at least one of the symbols a, b, c, d does not appear in w}.
Solution. Suppose we use the state A of the machine, from where it can accept all
strings without having an occurrence of a. We can use an ε-transition to drive the
machine from the initial state to A. Similarly, ε-transitions can be used to go to any
of the states B, C, D, which correspondingly would accept the strings in which the symbols b, c, or d, respectively, do not appear. This language can be represented by the expression (Σ − {a})∗ ∪ (Σ − {b})∗ ∪ (Σ − {c})∗ ∪ (Σ − {d})∗. Look at the diagram of such a machine in Fig. 2.20.
Fig. 2.20. An NFA for Example 2.21.
You can define an NFA by using a function instead of a relation. In such a case, an NFA is a quintuple (Q, Σ, Δ, s, F), where Δ : Q × (Σ ∪ {ε}) → 2^Q. In this definition, the ordered triples (p, a, q1), (p, a, q2), . . . , (p, a, qn) are written as Δ(p, a) = {q1, q2, . . . , qn}.
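In code, this function form is just a dictionary whose values are sets of states, and the two presentations of Δ are interconvertible. A sketch (names are ours):

```python
def relation_to_function(Delta):
    """Turn a set of triples (p, a, q) into the map Delta(p, a) = {q1, ..., qn}."""
    table = {}
    for (p, a, q) in Delta:
        table.setdefault((p, a), set()).add(q)
    return table

# The transition relation of Example 2.15 ('' stands for epsilon):
Delta = {('s', 'a', 's'), ('s', '', 'q'), ('s', 'b', 'r'),
         ('q', 'b', 'q'), ('q', 'b', 'r')}
table = relation_to_function(Delta)
print(table[('q', 'b')] == {'q', 'r'})   # True
```

Nondeterminism shows up as values of size greater than one, and a pair on which Δ is “undefined” is simply a missing key (equivalently, the empty set of next states).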
A simple answer to “why nondeterminism” is: Look at the regular grammars.
They allow nondeterminism. From the same nonterminal symbol, you can derive
different strings by choosing different production rules.
2.42. For the following NFAs ({ p, q, r, s}, {a, b}, Δ, p, F), draw the transition dia-
grams and then find the languages they accept, in both the set notation and in plain
English:
(a) Δ = {( p, a, p), ( p, a, q), ( p, b, p), (r, a, r ), (r, b, r ), (r, a, s), (s, a, s), (s, b, s)},
F = {s}.
(b) Δ = {( p, a, q), ( p, a, s), ( p, b, q), (q, a, r ), (q, b, q), (q, b, r ), (r, a, s), (r, b, p),
(s, b, p)}, F = {q, s}.
(c) Δ = {( p, a, p), ( p, a, q), ( p, b, p), (q, a, r ), (q, a, s), (q, b, s), (r, a, p), (r, a, r ),
(r, b, s)}, F = {s}.
(d) Δ = {( p, a, p), ( p, a, r ), ( p, b, q), (q, ε, r ), (q, a, q), (q, b, r ), (r, ε, q), (r, a, r ),
(r, a, p)}, F = {r }.
(e) Δ = {( p, a, q), (q, b, r ), (r, a, q), (q, b, s), (s, a, r )}, F = { p, r }.
(f) Δ = {( p, ε, q), ( p, a, q), (q, a, p), (q, b, q), (q, a, r ), (q, b, r ), (r, a, r ), (r, b, q),
(r, q, s), (s, a, s), (s, b, s)}, F = {q}.
2.43. Draw transition diagrams of NFAs that accept the following languages:
(a) ((a ∗ b∗ a ∗ )∗ b∗ )∗ .
(b) (ba)∗(ab)∗ ∪ b⁺.
(c) ((ba ∪ aab) ∗ a ∗ )∗ .
(d) (a ∪ ab)∗ ∪ (b ∪ aa)∗.
(e) (ba ∪ bba ∪ aba)∗.
(f) (a ∪ b)∗ b(a ∪ b)4 .
2.44. Draw the transition diagram of the NFA with initial state p, final state q, and
another state r, having transitions ( p, 0, q), ( p, 1, q), (q, 0, p), (q, 1, q), (q, 0, r ),
(q, ε, r ), (r, 1, q). Which language does it accept?
2.45. Check whether the NFA in Fig. 2.21 accepts {w ∈ {0, 1}∗ : w has a substring
11 or has a substring 101}. Construct a DFA to accept the same language.
Fig. 2.21. NFA for Problem 2.45.
2.46. Construct a DFA/NFA that accepts, in each case below, the set of all strings
w ∈ {a, b}∗ satisfying
(a) w has exactly one a.
(b) w has at least one a.
(c) w has no more than three a’s.
2.47. Prove that for each NFA with many final states, there is an NFA with a single
final state accepting the same language. Is it true for DFAs also?
As a uniform data structure, we have started with strings of symbols; the symbols
are from a finite set, called an alphabet. Then we defined a language over an alphabet
as a set of strings. For finite representation of languages, we have defined the regular expressions, which are again strings of symbols, with five new symbols, namely ∅, ∪, ∗, ), and (, obeying certain specified formation rules. We have also introduced three other mechanisms for representing languages: regular grammars, DFAs, and NFAs. The regular expressions are called the representation devices, the
regular grammars are the generative devices, and the DFAs and NFAs are the recognizing devices. Nonetheless, all of them represent languages. Almost always it so happened that a language represented by one of the four mechanisms could also be represented by the others. This raises the natural question of whether it is always so. We will show in the next chapter that it is indeed so.
Finite state transition systems go back to McCulloch and Pitts [84] in 1943. Kleene [67] used the present form of DFAs. The NFAs were introduced by Rabin and Scott [107]. The notation for automata has been used by various authors after Hopcroft and Ullman [58].
You will find in the exercises below the so-called finite state transducers first in-
troduced in [37] and also discussed in [86] and [91]. For advanced texts on language
theory, you may consult [51, 113].
The additional problems below are a bit harder. Some of them appeared in original
research papers. If you cannot solve every one of them, do not get disheartened.
Perhaps, after some time you will be able to solve them with the help of your teacher.
2.7 Summary and Additional Problems 65
2.49. The star height h(E) of a regular expression E over an alphabet Σ is de-
fined inductively by h(∅) = 0, h(σ) = 0 for each σ ∈ Σ, h(α ∪ β) = h(αβ) =
max(h(α), h(β)), h(α ∗ ) = 1 + h(α). In each case below, find an equivalent regular
expression having the least star height.
(a) (011∗0)∗ .
(b) (a(b∗c)∗ )∗ .
(c) (0∗ ∪ 1∗ ∪ 01)∗ .
(d) (c(ab∗c)∗ )∗ .
(e) ((01)∗ ∪ 1∗ )∗ 0∗ .
(f) ((abc∗)ab)∗.
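The inductive definition of star height can be carried out mechanically. The sketch below is illustrative Python, not part of the exercise; the tuple encoding of expressions (`'union'`, `'cat'`, `'star'` nodes) is an assumption made here for convenience.

```python
# Star height of a regular expression, computed over a small AST.
# Encoding (an assumption, not notation from the text): a symbol is a plain
# string; ('union', a, b), ('cat', a, b), and ('star', a) build expressions.

def star_height(e):
    if isinstance(e, str):          # a symbol of the alphabet, or the empty set
        return 0
    op = e[0]
    if op in ('union', 'cat'):      # h(a ∪ b) = h(ab) = max(h(a), h(b))
        return max(star_height(e[1]), star_height(e[2]))
    if op == 'star':                # h(a*) = 1 + h(a)
        return 1 + star_height(e[1])
    raise ValueError(f'unknown operator: {op}')

# (0* ∪ 1* ∪ 01)*, as in part (c): star height 2 before simplification
inner = ('union', ('union', ('star', '0'), ('star', '1')), ('cat', '0', '1'))
print(star_height(('star', inner)))   # 2
```

Finding an *equivalent* expression of least star height, as the exercise asks, still requires algebraic simplification by hand.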
2.50. Show that each regular expression is equivalent to one in the form (α1 ∪α2 · · ·∪
αn ) for some n ≥ 1, where ∪ does not occur in any αi . This is called the disjunctive
normal form of a regular expression.
2.51. Is it true that every regular expression is equivalent to one in the form
α1 α2 · · · αn for some n ≥ 1, where no αi is a concatenation of other expressions?
2.52. Find a regular expression that denotes all bit strings whose value, when inter-
preted as a binary integer, is greater than or equal to 40.
2.53. Find a regular expression for all bit strings with leading bit 1, which when
interpreted as a binary integer, has value not between 10 and 30.
2.54. In the Roman number system, numbers are represented by strings over the alpha-
bet Σ = {M, D, C, L, X, V, I }. For simplicity, drop the subtraction convention, in
which 9 is represented as IX ; instead write it as VIIII, and similarly write IIII for IV,
and so on. Design a DFA that accepts all strings over Σ that represent a number in the
Roman number system.
2.55. Find an NFA that accepts the set of all strings over {a, b} not containing the
substring aba. Then construct a regular expression for the same language. Try to
simplify the expression as far as possible.
2.56. Construct a DFA in each case below that accepts all binary strings w such that
(a) w is divisible by 5 when interpreted as a binary integer.
(b) Reversal of w is divisible by 5 when interpreted as a binary integer.
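For part (a), the standard DFA keeps the remainder modulo 5 as its state: appending a bit b to a string of value v gives value 2v + b, so reading b from state r leads to state (2r + b) mod 5. A minimal sketch (illustrative Python, not part of the text):

```python
# DFA for part (a): states are remainders 0..4; the initial and the only
# final state is 0. The empty string is treated as the value 0 here.

def divisible_by_5(w):
    r = 0
    for bit in w:
        r = (2 * r + int(bit)) % 5   # appending a bit doubles the value
    return r == 0

print(divisible_by_5('1010'))   # 10 in binary -> True
print(divisible_by_5('1011'))   # 11 in binary -> False
```

Part (b) asks for a different DFA, since a DFA cannot reverse its input; the sketch above checks only part (a).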
2.57. Show that if an NFA with m states accepts any string at all, then it accepts a
string of length less than m.
2.58. Let M be a DFA with n ≥ 1 states. Show that M accepts an infinite
language iff M accepts a string w for which n ≤ ℓ(w) < 2n.
2.59. Construct an NFA with input alphabet {a} that rejects some string, but the
length of the shortest rejected string is more than the number of states.
2.60. Let N = (Q, Σ, Δ, s, {q}) be an NFA where there is no transition to s and no
transition from q. Describe the language of the modified NFA, in terms of L(N),
using each of the following modifications to N:
(a) Add an ε-transition from q to s.
(b) Add an ε-transition from s to each state reachable from s. We say that p is reach-
able from s when you can follow the arrows to go from s to p.
(c) Add an ε-transition to q from each state p with the property that q is reachable
from p.
(d) Modify using both (b) and (c).
2.61. The finite automata we have defined do not produce any output. Imagine a finite
automaton with another tape, which is initially blank having a writing head associated
with this second tape. Such automata, which can give output on the second tape,
are called transducers. Transducers are of two types: Moore machines and Mealy
machines. In a Moore machine, symbols are written on the output tape depending on
what state the machine is currently in. That is, we have an extra output alphabet, say
Γ , and an extra map, say, λ : Q → Γ . When the machine is in state q, the symbol
λ(q) is written on the second tape. In a Mealy machine, outputs are produced by
the transitions, that is, λ : Q × Σ → Γ. These transducers do not require any final
states.
(a) For a Moore machine (Q, Σ, Γ, δ, λ, q0 ), show that the output in response to the
input σ1 σ2 · · · σn is λ(q0 )λ(q1 ) · · · λ(qn ), where δ(qi−1 , σi ) = qi , for 1 ≤ i ≤ n.
Moore machines always give output λ(q0 ) for input ε.
(b) Show that the Moore machine M = ({q0 , q1 , q2 }, {0, 1}, {0, 1}, δ, λ, q0 ) with
δ(q0 , 0) = δ(q1 , 1) = q0 , δ(q0 , 1) = δ(q2 , 0) = q1 , δ(q1 , 0) = δ(q2 , 1) = q2 , and
λ(q0 ) = 0, λ(q1 ) = 1, λ(q2 ) = 2 computes the residue modulo 3 for each binary
string treated as a binary integer.
(c) For a Mealy machine (Q, Σ, Γ, δ, λ, q0 ), show that the output in response to the
input σ1 σ2 · · · σn is λ(q0 , σ1 )λ(q1 , σ2 ) · · · λ(qn−1 , σn ), where δ(qi−1 , σi ) = qi for
1 ≤ i ≤ n. [Mealy machines always give output ε for input ε.]
(d) Construct a Mealy machine for computing the residue modulo 3 for each binary
string treated as a binary integer. [This can be trivial if you have a Moore machine
for the same job.]
(e) Show that corresponding to each Moore machine, there exists a Mealy machine,
which gives the same input–output mapping.
(f) Show that corresponding to each Mealy machine, there exists a Moore machine,
which gives the same input–output mapping.
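A quick way to check part (b) is to run the machine. The sketch below (illustrative Python, not part of the exercise) uses the arithmetic fact that the transitions listed in (b) are exactly δ(q_r, b) = q_{(2r+b) mod 3}.

```python
# The Moore machine of part (b): states 0, 1, 2 stand for q0, q1, q2, and
# the output map λ is the identity. Reading bit b from state r leads to
# state (2*r + b) mod 3, which reproduces the transition table in the text.

delta = {(r, b): (2 * r + b) % 3 for r in range(3) for b in (0, 1)}

def moore_output(w):
    state, out = 0, [0]              # λ(q0) is emitted even for input ε
    for c in w:
        state = delta[(state, int(c))]
        out.append(state)
    return ''.join(map(str, out))

print(moore_output('1011'))   # 01222  (1011 is 11, and 11 mod 3 = 2)
```

The last output symbol is always the residue of the whole string modulo 3, which is the claim of part (b).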
2.62. A serial binary adder processes two binary sequences x = a1 a2 · · · an and y =
b1 b2 · · · bn , where each ai , bi is a bit (0 or 1). It adds x and y bit by bit. Each bit
addition creates a digit for the sum as well as a carry digit for the next higher position.
The rules are: 0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1 without any carry digit, and 1 + 1 = 0
with carry digit 1.
(a) A Mealy machine for the binary serial adder has three tapes: the first with x,
the second with y, and the third for the output. It can thus have two states, say p
for carry and q for no-carry. For example, typical transitions of such a machine
are δ(q, 1, 1) = p and λ(q, 1, 1) = 0. This means, if the machine is in state q and
scanning 1 on its first tape (second components of δ and λ), 1 on the second tape
(third components), then in its next move, it goes to state p and outputs 0 on the
third tape. We depict such transitions δ and λ in diagram by an arrow from q to p
labeled (1, 1)/0. Design such a Mealy machine for the binary serial adder.
(b) Design a Moore machine with three tapes in a similar way for the binary serial
adder, where contents of the third tape gives the output.
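The two-state Mealy adder of part (a) can be sketched in code. The text does not fix the bit order; the sketch below assumes the bit pairs arrive least-significant bit first, as is usual for serial adders.

```python
# Mealy serial adder: the state is the carry (0 = state q, 1 = state p in the
# text). On the pair (a, b) it outputs (a + b + carry) mod 2 and moves to the
# state carry' = (a + b + carry) // 2.
# Assumption: bits arrive least-significant first.

def serial_add(x, y):
    carry, out = 0, []
    for a, b in zip(map(int, x), map(int, y)):
        s = a + b + carry
        out.append(str(s % 2))       # the output written on the third tape
        carry = s // 2
    return ''.join(out)

# 3 + 1, least-significant bit first: '11' + '10' -> '00', carry 1 pending
print(serial_add('11', '10'))
```

Like the machine in the text, the transducer outputs exactly n bits, so a final carry is not written out.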
2.65. Design a transducer to convert a binary integer into octal. For example, the
binary string 011101100 should produce 354.
2.66. Design a transducer that computes parity of every substring of three bits of
a given binary string. That is, if the input is a1 a2 · · · an , then the output will be
b1 b2 · · · bn , where b1 = b2 = 0 and b j = (a j−2 + a j−1 + a j ) mod 2 for j ≥ 3. For
example, the input 1010110 gives the output 0001000.
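The transducer only needs to remember the last two input bits, so a finite-state machine suffices; a direct sketch of the input–output map (illustrative Python, not part of the exercise):

```python
# b1 = b2 = 0, and b_j = (a_{j-2} + a_{j-1} + a_j) mod 2 for j >= 3.

def window_parity(w):
    a = list(map(int, w))
    out = [0] * min(2, len(a))       # the first two output bits are 0
    for j in range(2, len(a)):
        out.append((a[j - 2] + a[j - 1] + a[j]) % 2)
    return ''.join(map(str, out))

print(window_parity('1010110'))   # 0001000, as in the text
```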
2.67. Design a transducer that accepts bit strings a1 a2 · · · an and computes the binary
value of each set of three consecutive bits modulo 5. That is, it outputs b1 b2 · · · bn ,
where b1 = 0 = b2 and b j = (4a j−2 + 2a j−1 + a j ) mod 5 for j ≥ 3.
2.68. Design a transducer to compute the maximum of two given binary integers.
2.69. A deterministic two-tape automaton is a device like a DFA that accepts a pair
of strings. Each state is in one of the two sets, first or second; depending on which
set the current state is in, the transition function refers to the first or second tape.
Formally define a deterministic two-tape automaton.
2.71. Find a regular expression that generates the set of all strings of triplets defining
correct binary addition.
3 Equivalences
3.1 Introduction
By now, you are familiar with regular expressions, regular grammars, determinis-
tic finite automata, and nondeterministic finite automata. Are they talking about the
same class of languages? The examples in the last chapter, at least, suggest that they
might. In this chapter, we will see that it is indeed so. We will say, informally, that
two mechanisms are equivalent if they accept the same class of languages. In fact,
we solve many subproblems to arrive at these equivalences. Our route is from NFA
to DFA, from DFA to regular grammar, from regular grammar to NFA, from regular
expression to NFA, and finally, from NFA to regular expression.
We start with our first question. We pose a subproblem: can we construct a DFA to
imitate an NFA?
Let N = (Q, Σ, Δ, s, F) be an NFA. If the transition relation Δ is already a
partial function from Q × Σ to Q, then N is itself a DFA. Otherwise, suppose that
Δ, a relation from Q × (Σ ∪ {ε}) to Q, is not a partial function from Q × Σ to Q.
Then, in Δ, we have triples of the form (q, σ, r ), where
(a) σ = ε, or
(b) there exists at least one r′ ∈ Q, r′ ≠ r , such that (q, σ, r′ ) ∈ Δ.
We will construct a new NFA that accepts the same language as N but does not satisfy
either of the above properties. Then, the new NFA is bound to be a DFA accepting the
same language as N.
We first try to eliminate the possibility (a). This says that there may be ε-
transitions, that is, edges labeled with ε. Suppose, for example, there is a triple
(q, ε, r ) ∈ Δ, an ε-transition in N. Suppose further that the automaton N is cur-
rently in the state q reading some symbol. What will be the next action of N? It may
A. Singh, Elements of Computation Theory, Texts in Computer Science,
© Springer-Verlag London Limited 2009
continue computation from the state q or it may first go to the state r and then con-
tinue computation. While changing from the state q to r , it does not consume any
input symbol due to the ε-transition. Looking from above, it amounts to telling that
N is in one of the states q or r , or that it is in the state {q, r }.
Now, look at what we are trying to discuss here. The set {q, r } can be thought of
as a new naming of the states. Instead of taking the states individually by q and r and
then telling that these are the possible states the machine can be in, we are simply
using the set {q, r } as a single state. In such a naming scheme of the states, the state
q itself can be named as the singleton {q}.
Well, suppose that we take {q, r } as a new state. We are going to shrink the edge
labeled ε, that is, merge the states q and r and rename the merged state as {q, r }. This
removes the states q and r from the picture and replaces both with {q, r } deleting the
edge from q to r labeled ε. But then, will the new machine compute the same way as
the old?
There are two objections. One, the merging of two states q and r does not take
care of the asymmetry in the directed edges. That is, after merging q and r , an ε-edge
from q to r and one from r to q would look the same; but the direction should matter. Two, there might
be another ε-transition from r to a state, say t. In that case, the automaton can go to
the state t from q without consuming any input. The merging does not take care of
this situation. The term merging is in fact, sloppy.
To overrule the objections, we have to do something more. We will have to get
all those states in N that are reachable from q in place of the only other state r .
That is, we must collect together all the states that are connected from q by an ε-
transition, all the states connected from those latter states by ε-transitions, and so
on. This renaming scheme can be formalized by defining a set of states, call it R(q),
the states reachable from q without consuming any input. This is defined as follows:
Let M = (Q, Σ, Δ, s, F) be an NFA, where Δ ⊆ (Q × (Σ ∪ {ε})) × Q. Corre-
sponding to each state q ∈ Q, the set R(q) is defined inductively by the following:
1. q ∈ R(q).
2. If p ∈ R(q) and ( p, ε, t) ∈ Δ, then t ∈ R(q).
3. R(q) contains all and only those states satisfying (1) and (2).
Thus, R(q) is the set of all states p ∈ Q such that (q, ε) ⊢ ( p, ε) in zero or more
steps. The set R(q) can also be defined as
R(q) = { p ∈ Q : (q, ε) ⊢∗ ( p, ε)}
= { p ∈ Q : (q, w) ⊢∗ ( p, w) for every w ∈ Σ ∗ }.
Each set R(q) is nonempty as q ∈ R(q). Moreover, R(q) is the transitive closure
of the set {q} under the relation {( p, t) : there is a transition ( p, ε, t) ∈ Δ}. You can
use the while loop to construct the set R(q) as in the following:
Initialize R(q) := {q} ;
while there exists a transition ( p, ε, t) ∈ Δ with p ∈ R(q) and t ∉ R(q) do
R(q) := R(q) ∪ {t} od
3.2 NFA to DFA
Example 3.1. Find R(q) for the NFA N = ({ p, q, r, s, t}, {a, b}, Δ, s, {q, r }),
where Δ = {(s, a, q), (s, a, p), (q, ε, p), (q, ε, t), (q, a, s), ( p, a, r ), ( p, ε, s),
(r, ε, s), (r, b, q), (t, a, p)}.
Solution. To understand the example, do not just read it. First, draw a transition
diagram and then proceed. Initially, we take R(q) = {q}. Now, the ε-transitions from
q are (q, ε, p) and (q, ε, t). Thus, R(q) is updated to R(q) = {q, p, t}. Next, the
only ε-transition from p is ( p, ε, s), and there is no ε-transition from t. Thus, s is again
added to R(q), updating it to R(q) = {q, p, t, s}. From s there is no ε-transition.
Now, we also see that starting from any state in R(q), if we follow ε-transitions,
we end up in a state that is also in R(q). Moreover, each state in R(q) is reached from
q by following a (possibly empty) sequence of ε-transitions. Thus, this is the required set. Verify
that R( p) = { p, s}, R(r ) = {r, s}, R(s) = {s}, and R(t) = {t}.
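The while loop above can be run directly. Here is a minimal sketch in Python; the triple encoding of Δ and the use of '' for ε are assumptions made for illustration. On the NFA of Example 3.1 it reproduces the sets just computed.

```python
# R(q): all states reachable from q by ε-transitions alone.
# delta is a set of (state, symbol, state) triples; '' stands for ε.

def R(q, delta):
    closure = {q}
    changed = True
    while changed:                    # the while loop of the text
        changed = False
        for (p, a, t) in delta:
            if a == '' and p in closure and t not in closure:
                closure.add(t)
                changed = True
    return closure

# The NFA of Example 3.1
delta = {('s','a','q'), ('s','a','p'), ('q','','p'), ('q','','t'),
         ('q','a','s'), ('p','a','r'), ('p','','s'), ('r','','s'),
         ('r','b','q'), ('t','a','p')}
print(sorted(R('q', delta)))   # ['p', 'q', 's', 't']
```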
Observe that whenever q is a state from which there exists no ε-transition to any
other state, R(q) = {q}. Thus, we have got new states that are sets of some of the
states of N. In Example 3.1, we have the new states as R( p), R(q), R(r ), R(s), and
R(t). Now, how do we construct transitions from and to these new states? They
should mimic the computations of N anyway. Well, what is the meaning attached
to this new set R(q)?
It says that R(q) contains all the states of N that are reachable from q in N by
consuming no input symbol. Moreover, these are all such states. So, when N being
in state q reads an a, it goes to the state s (since (q, a, s) ∈ Δ); our new transition
should be able to do it. Next, as N, from the state q, can follow an ε-transition to go
to the state p and then read an a to go to the state r (since (q, ε, p), ( p, a, r ) ∈ Δ),
our new transitions should also be able to capture this computation.
Now, starting from q we end up in s as well as in r . Moreover, had there been an
ε-transition from s, N could have gone to that state after reading a. The automaton N
could take any one of these paths in any single computation. But for the acceptance
of a string, it would only look for one successful computation. Thus, we must take
all these states into consideration and define a transition accordingly.
Suppose that in an NFA M, we have a state q and we have already computed the
sets R(x) for every state x. Suppose (q, a, p) is a transition in M. This means that M
can go to ( p and then to) R( p) by reading a. This also means that M can start from
any one of R(q), read an a from that state, and then as Δ permits, it goes to another
state. We take care of these eventualities by defining the new transition as follows:
δ(R(q), a) = ∪{R( p) : (r, a, p) ∈ Δ for some r ∈ R(q)}.
However, the new set δ(R(q), a) need not be any of the new states we have con-
structed, because the class of these R(x)’s may not be closed under union. Well, what
if we take every subset of Q instead of only these R(x)’s? There, of course, the defi-
nition of δ will be all right. Moreover, δ will also be a partial function from 2^Q × Σ
to 2^Q . Notice that the nondeterminism in the transitions has already been taken care of
by our new naming scheme of the states. We have collected all possible next-states
as a set of states, and this set of states is regarded as a new state. This will, proba-
bly, solve case (b) also. (Recollect that case (b) refers to the case of many possible
transitions with the same state–symbol pair.)
What about the final states? A computation with a string becomes successful in
accepting the string if it ends up in a final state. The NFA then accepts the string if
there is one such successful computation. This says that if any computation with our
new scheme leads to a state (a set of states now) that contains a final state, then the
machine should accept the string. That is, a new final state is a set that contains at
least one final state of N. Similarly, the new initial state would be the set of all those
states that are reachable from the old initial state by ε-transitions; that is, the new
initial state is simply R(s).
We collect the threads together and write formally our construction of a DFA
from an NFA, also called the subset construction algorithm as in the following:
A LGORITHM Subset Construction
Let M = (Q, Σ, Δ, s, F) be an NFA. The DFA corresponding to the NFA M is
D = (Q′, Σ, δ, s′, F′), where
Q′ = 2^Q ,
s′ = R(s),
F′ = {A ⊆ Q : A ∩ F ≠ ∅}, and
δ : 2^Q × Σ → 2^Q is defined by
δ(∅, σ) = ∅, and
δ( A, σ) = ∪ p∈ A {R(q) : (r, σ, q) ∈ Δ for some r ∈ R( p)}.
In other words, δ( A, σ) = {q ∈ Q : ( p, σu) ⊢∗ (q, u) for some p ∈ A}, and this
holds for every string u ∈ Σ ∗ . The set δ( A, σ) is the set of all states in Q that are
reachable from some state in A by reading the symbol σ. The initial state s′ of D is
that subset of Q which contains all and only the states of M that are reachable from s
without consuming any input. The final states in F′ of D are the sets of all states that
contain at least one final state of M. It is easy to see that the automaton D is indeed
a DFA, as δ is a partial function. In fact, there is no need to start the construction with
the whole of 2^Q . We can introduce the subsets of Q as and when necessary. Look at
the following example to set the idea.
Example 3.2. Construct a DFA corresponding to the NFA given in Fig. 3.1.
" p a
M : s
Fig. 3.1. NFA for a r q b
Example 3.2. b
Solution. Here, M = (Q, {a, b}, Δ, s, F), where Q = {s, p, q, r }, F = { p, q}, and
Δ = {(s, ε, p), (s, a, r ), ( p, a, p), (r, b, q), (q, b, q)}. First, we compute the sets of
states reachable from the states s, p, q, r without consumption of any input. They
are the following:
R(s) = {s, p}, R( p) = { p}, R(q) = {q}, R(r ) = {r }.
This says that the initial state of the DFA will be s′ = {s, p}. Next, we go for the
computation of the transition function δ. It is done as follows:
δ({s, p}, a)
= {R(y) : (x, a, y) ∈ Δ for some x ∈ R(s)}
∪{R(y) : (x, a, y) ∈ Δ for some x ∈ R( p)}
= {R(y) : (x, a, y) ∈ Δ for some x ∈ R(s)} [since R( p) ⊆ R(s) ]
= ∪x∈{s, p} {R(y) : (x, a, y) ∈ Δ}.
We must find out the suitable triples (x, a, y) ∈ Δ, where x = s or p. They are
(s, a, r ) and ( p, a, p). Thus, δ({s, p}, a) = R(r ) ∪ R( p) = {r } ∪ { p} = { p, r }.
We see that in constructing δ for the known sets, we have reached new states such
as { p, r } and ∅. We must at least find δ for these.
You may also construct δ for other subsets of Q; however, any computation of the
DFA will only use, at the most, these states. Now, what are the final states of the DFA
among the subsets ∅, { p}, {q}, {r }, { p, r }, {s, p}? We had originally F = { p, q}. The
subsets that intersect with F are the elements of F′. That is,
F′ = {{ p}, {q}, { p, r }, {s, p}}.
In the above example, does D accept the same language as M? The empty string
ε is clearly accepted by D. If a string begins with b, then D enters the state ∅ and
then halts without accepting the string. If the string starts with a, then D enters the
state { p, r }. If the second symbol is b, it enters {q} and for more b’s the string is
accepted, else, it is not accepted. If the third symbol happens to be a (or more a’s
thereafter), then also the string is accepted. Thus, L(D) = a ∗ ∪ abb∗, which equals
L(M).
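The construction and the example above can be checked mechanically. Here is a sketch of the subset construction in Python, introducing subsets only as they arise, as the text suggests; the frozenset encoding of states and '' for ε are assumptions made for illustration. Note that the lazy version never even generates the redundant state {r }.

```python
# Subset construction, run on the NFA of Example 3.2.
# delta is a set of (state, symbol, state) triples; '' stands for ε.

def eps_close(states, delta):
    closure = set(states)
    changed = True
    while changed:
        changed = False
        for (p, a, t) in delta:
            if a == '' and p in closure and t not in closure:
                closure.add(t)
                changed = True
    return frozenset(closure)

def subset_dfa(sigma, delta, s, F):
    start = eps_close({s}, delta)
    trans, seen, todo = {}, {start}, [start]
    while todo:
        A = todo.pop()               # A is always ε-closed
        for c in sigma:
            step = {t for (p, a, t) in delta if a == c and p in A}
            B = eps_close(step, delta)
            trans[(A, c)] = B
            if B not in seen:
                seen.add(B)
                todo.append(B)
    final = {A for A in seen if A & F}
    return start, trans, final

delta = {('s','','p'), ('s','a','r'), ('p','a','p'), ('r','b','q'), ('q','b','q')}
start, trans, final = subset_dfa('ab', delta, 's', {'p', 'q'})
print(sorted(start))                 # ['p', 's']
print(sorted(trans[(start, 'a')]))   # ['p', 'r']
print(len(final))                    # 4
```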
[Transition diagram of the DFA D, with states {s, p}, { p, r }, { p}, {q}, {r }, and ∅.]
Notice that {r } is a redundant state in D. Even though we had considered the only
subsets of Q that come up during the computation of δ, it does not guarantee that all
those states are really relevant. However, this does not matter as long as D accepts the
same language as M. But can you prove this, in general? We prove a result looking
slightly more general than this.
Lemma 3.1. Let D be the DFA corresponding to the NFA M. Let p, q be any states
of M and let u ∈ Σ ∗ . Then (q, u) ⊢∗M ( p, ε) iff (R(q), u) ⊢∗D (P, ε) for some
P ⊆ Q with p ∈ P.
Proof. We use induction on the length of u. For u = ε, P = R(q) does the job.
Assume, as the induction hypothesis, that the statement holds for all strings u ∈ Σ ∗
of length m. Let w = vσ for a string v of length m and a symbol σ ∈ Σ.
Suppose that (q, w) ⊢∗M ( p, ε). While computing with w, M reads v first and
then the symbol σ. That is, there are states q1 , q2 ∈ Q such that
(q, vσ) ⊢∗M (q1 , σ) ⊢M (q2 , ε) ⊢∗M ( p, ε),
where q2 may also be equal to p. But (q, vσ) ⊢∗M (q1 , σ) gives (q, v) ⊢∗M (q1 , ε).
By the induction hypothesis, (R(q), v) ⊢∗D (Q 1 , ε) for some Q 1 ⊆ Q with q1 ∈ Q 1 .
Thus, (R(q), vσ) ⊢∗D (Q 1 , σ). As (q1 , σ) ⊢M (q2 , ε), we have (q1 , σ, q2 ) ∈ Δ.
By the construction of D, we have P1 = δ(Q 1 , σ) ⊇ R(q2 ). Combining this with
(R(q), vσ) ⊢∗D (Q 1 , σ), we see that (R(q), vσ) ⊢∗D (P1 , ε) with p ∈ R(q2 ) ⊆ P1 .
Conversely, suppose that (R(q), vσ) ⊢∗D (P, ε) for some P ⊆ Q with p ∈ P. Then
there is R1 ⊆ Q such that (R(q), vσ) ⊢∗D (R1 , σ) ⊢D (P, ε), that is, P = δ(R1 , σ).
Then there are states r2 ∈ Q, p ∈ R(r2 ), r3 ∈ R1 such that (r3 , σ, r2 ) ∈ Δ. The con-
dition p ∈ R(r2 ) says that (r2 , ε) ⊢∗M ( p, ε). By the induction hypothesis, (q, v) ⊢∗M
(r3 , ε). Combine all these to obtain (q, vσ) ⊢∗M (r3 , σ) ⊢M (r2 , ε) ⊢∗M ( p, ε).
Lemma 3.2. Let D be the DFA corresponding to the NFA M obtained by the subset
construction algorithm. Then L(M) = L(D).
When two automata accept the same language, we say that they are equivalent.
The above result shows that corresponding to each NFA there is an equivalent DFA.
3.1. Consider the NFA M = ({s, p, q}, {0, 1}, Δ, s, {q}) with Δ = {(s, 0, s), (s, 1, s),
(s, 1, p), ( p, 0, q), ( p, 1, q)}. Find a DFA equivalent to M using the subset con-
struction algorithm. How many states does the DFA have?
3.2. Fix an n > 1. Define the NFA M = ({s, q1 , . . . , qn }, {0, 1}, Δ, s, {qn }) with Δ =
{(s, 0, s), (s, 1, s), (s, 1, q1 )} ∪ {(qi , σ, qi+1 ) : σ ∈ {0, 1}, i = 1, 2, . . . , n − 1}. Use
the subset construction algorithm to obtain a DFA equivalent to M. What is L(M)?
3.3. Let M be the NFA with input alphabet {0, 1}, states p, q, r , initial state p, final
state p, and transitions ( p, 1, q), ( p, ε, r ), (q, 0, p), (q, 0, r ), and (q, 1, r ). What is
L(M)? Construct a DFA accepting L(M).
3.4. Let M = ({ p, q, r, s}, {a, b}, Δ, s, {s}) be the NFA with Δ = {(s, a, s), (s, a, p),
(s, a, q), (s, a, r ), (r, b, q), (q, b, p), ( p, b, s)}. Give a string that starts with an a and
that is not accepted by M. Construct a DFA D equivalent to M.
3.5. Construct a DFA equivalent to the NFA with initial state p, final state q,
and transitions ( p, a, p), ( p, a, q), (q, b, q), (q, b, p). Omit inaccessible states. Also
give an equivalent regular expression.
3.6. Construct DFAs equivalent to the NFAs ({s, p, q, r }, {a, b}, Δ, s, {r }), with Δ as
given below. Show clearly which subsets correspond to the states s, p, q, r . Omit
inaccessible states.
(a) Δ = {(s, a, s), (s, b, s), (s, a, p), ( p, a, q), (q, b, r )}.
(b) Δ = {(s, a, s), (s, b, s), (s, a, p), ( p, a, q), (q, b, r ), (r, a, r ), (r, b, r )}.
3.7. Design an NFA M that accepts a ∗ and has a unique transition whose removal
from M yields an NFA accepting {a}. Can you design a DFA with the same property
instead of an NFA? Why?
3.8. Find a DFA equivalent to the NFA with initial state p, final state q, and another
state r having transitions
(a) ( p, a, q), (q, a, q), (q, ε, r ), (r, b, p).
(b) ( p, 0, p), ( p, 0, q), ( p, 1, q), (q, 0, r ), (q, 1, r ), (r, 1, r ).
(c) ( p, 0, q), ( p, 1, q), (q, 0, p), (q, 1, q), (q, 0, r ), (q, ε, r ), (r, 1, q).
(d) ( p, 0, q), ( p, ε, q), (q, 0, p), (q, 1, q), (q, 0, r ), (q, 1, r ), (r, 0, r ), (r, 1, q).
3.9. Each DFA is an NFA. Apply subset construction algorithm to construct a DFA
from a given DFA M, treating M as an NFA. What do you obtain?
Does there exist a similar connection between regular grammars and finite automata?
Can we mimic the operation of a DFA by a regular grammar?
The first point we observe is that the states in a DFA and the nonterminals in a
grammar are doing almost the same job. Suppose that D = (Q, Σ, δ, s, F) is a DFA.
Let us use the state symbols of D as the nonterminals of a grammar, s being the
start symbol. What about the final states? After reading a string when D enters a
final state, computation stops and the string is accepted. In terms of a derivation in
a grammar, if a final state, a nonterminal symbol now, is met, then we must accept
the string without generating any further string. That is, corresponding to each final
state q, we may have a production of the form q → ε. This gives a hint as to how to
define other productions from δ.
Formally, corresponding to the DFA D, we construct the regular grammar G D =
(N, Σ, R, S), where N = Q, S = s, and
R = {q → σp : δ(q, σ) = p} ∪ {q → ε : q ∈ F}.
Lemma 3.3. For the DFA D, let G D be the grammar as constructed earlier. Then
L(G D ) = L(D).
Proof. Let u, v ∈ Σ ∗ , σ ∈ Σ, and p, q ∈ Q. Now, writing ⊢ for ⊢D and ⇒ for
⇒G D , we see that ( p, σv) ⊢ (q, v) iff p → σq is a production, iff up ⇒ uσq. Hence
(s, w) ⊢∗ (q, ε) iff S ⇒∗ wq; and q ∈ F iff q → ε is a production. Therefore,
w ∈ L(D) iff S ⇒∗ wq ⇒ w for some q ∈ F, that is, iff w ∈ L(G D ).
3.3 Finite Automata and Regular Grammars
Instead of a DFA, had you started with an NFA, a similar construction of produc-
tion rules from the transitions could have served the purpose.
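The construction of G D takes only a few lines of code. The sketch below is illustrative Python; the dictionary encoding of δ and the sample DFA are assumptions made here, and '' stands for ε.

```python
# From a DFA to a regular grammar: one production q -> σp for each
# transition δ(q, σ) = p, and q -> ε for each final state q.

def dfa_to_grammar(delta, final):
    rules = [(q, sym + p) for (q, sym), p in delta.items()]   # q -> σp
    rules += [(q, '') for q in final]                         # q -> ε
    return rules

# A DFA over {a, b} accepting the strings that end in b
delta = {('s', 'a'): 's', ('s', 'b'): 't', ('t', 'a'): 's', ('t', 'b'): 't'}
for lhs, rhs in dfa_to_grammar(delta, {'t'}):
    print(lhs, '->', rhs or 'ε')
```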
Does it give any hint for constructing a DFA from a regular grammar? There is
a minor problem. We would like to have a transition such as δ(q, σ) = p from the
production q → σ p. However, all the productions in the grammar need not be in this
form. For example, a regular grammar allows a production of the form q → στ p for
nonterminals p, q, whereas the analog transition δ(q, στ ) = p is not allowed in a
DFA.
At this stage you can think of constructing another grammar from the original
where each production will be in the form q → σ p (or of the form q → ε) and
which accepts the same language as the original. However, a similar construction
can give rise to an NFA.
Observe that we also must take care of productions of the form q → u for any
string u. As such productions have no nonterminal symbol on the right hand
side, we introduce a new final state; in fact, this will be the only final state of our
NFA.
Formally, let G = (N, Σ, R, S) be a regular grammar. Define Q = N ∪ { f } where
f ∉ N, and
Δ = {(X, u, Y ) : X → uY ∈ R for X, Y ∈ N and u ∈ Σ ∗ }
∪{(X, u, f ) : X → u ∈ R for X ∈ N and u ∈ Σ ∗ }.
Here also, (X, u, Y ) with u of length greater than one is not allowed as a transition
in an NFA; we must break the string u into symbols. This is done by introducing new states as earlier. Correspond-
ing to every element (X, σ1 σ2 · · · σm , Y ) ∈ Δ, we introduce the triples
(X, σ1 , X 1 ), (X 1 , σ2 , X 2 ), . . . , (X m−1 , σm , Y ),
and remove the original triple (X, u, Y ), where X 1 , . . . , X m−1 are new states added to
Q. Notice that for each distinct u appearing in a transition in Δ, these X i ’s are added
fresh. That means, for precision, we should have written X^u_i instead of X i . After this
replacement is over, we have possibly a bigger set of states and a bigger transition
relation. Call the updated set of states as Q and the updated transition relation Δ .
The NFA corresponding to the grammar G is defined as NG = (Q , Σ, Δ , S, { f }).
Lemma 3.4. For a regular grammar G, let NG be the NFA as constructed earlier.
Then L(NG ) = L(G).
Lemma 3.4 is often expressed as: corresponding to each regular grammar there
exists an equivalent NFA.
3.10. Let N be an NFA with input alphabet {0, 1}, states p, q, r , initial state p, final
state p, and transitions ( p, 1, q), ( p, ε, r ), (q, 0, p), (q, 0, r ) and (q, 1, r ). Construct
a regular grammar generating L(N).
3.11. Construct a DFA that accepts the language generated by the grammar
with productions S → a A, A → abS|b.
3.12. Construct a DFA having three states that accepts the language accepted by the
DFA ({ p, q, r, s, t}, {0, 1}, δ, p, {t}), where δ( p, 0) = q, δ( p, 1) = s, δ(q, 0) = r,
δ(q, 1) = t, δ(r, 0) = q, δ(r, 1) = t, δ(s, 0) = r, δ(s, 1) = t, δ(t, 0) = δ(t, 1) = t. Can
you find a regular grammar for the same language?
3.13. Find a regular grammar that generates all those strings w over {a, b} for which
2 · #a (w) + 3 · #b (w) is even.
3.14. Construct regular grammars for the following languages on {a, b}:
(a) {w : #a (w) and #b (w) are both even}.
(b) {w : (#a (w) − #b (w)) mod 3 = 1}.
(c) {w : (#a (w) − #b (w)) mod 3 ≠ 0}.
(d) {w : #a (w) − #b (w) is an odd integer}.
3.15. Construct, in each case, a DFA/NFA that recognises L(G), where G is a regular
grammar with productions
(a) S → b|a S|a A, A → a|a A|bS.
(b) S → a S|bS|a A, A → b B, B → aC, C → ε.
3.16. Construct a DFA with six states that accepts all and only strings over {a, b} that
start with the prefix ab.
3.4 Regular Expression to NFA
What about finite automata and regular expressions? Regular expressions (over Σ)
are generated from the symbols ∅ and those in the alphabet Σ inductively by using
the operations of ∪, ∗ and concatenation. To begin with, we must say how to con-
struct NFAs that accept the languages (represented by) ∅ and {σ} for each σ ∈ Σ.
Trivially, an NFA with no final states accepts the language ∅. The NFA in Fig. 3.3
accepts the language {σ}.
We next try to simulate the operations of union, Kleene star, and concatenation.
What does this simulation mean? The question is:
Suppose, for two regular expressions α, β, we have the corresponding NFAs, say
M1 and M2 , that is, we have L(M1 ) = L(α), L(M2 ) = L(β). How do we construct
NFAs for the languages L(α ∪ β), L(αβ), and L(α ∗ )?
For simulating union, assume that we have NFAs M1 , M2 accepting the languages
L(α), L(β), respectively. We want to construct an NFA M to accept L(α) ∪ L(β),
which is the same language as L(α ∪ β). In M, we take a new initial state, say, s and
try simulating the computations of M1 and M2 . Think of M1 and M2 as diagrams. M
starting from its initial state can go to the initial state of either of the automata M1 or
M2 . Then M will compute as that automaton does. It is now obvious that we should
have ε-transitions from s to the initial states of the other machines. Final states of
both of them can serve as the final states of M.
Formally, let M1 = (Q 1 , Σ, Δ1 , s1 , F1 ), M2 = (Q 2 , Σ, Δ2 , s2 , F2 ), with Q 1 ∩
Q 2 = ∅. For the union of L(M1 ) and L(M2 ), define M = (Q, Σ, Δ, s, F), where
Q = Q 1 ∪ Q 2 ∪ {s} for a new state s ∉ Q 1 ∪ Q 2 ,
F = F1 ∪ F2 , and
Δ = Δ1 ∪ Δ2 ∪ {(s, ε, s1 ), (s, ε, s2 )}.
Speaking in terms of diagrams, we just take a new initial state s and add an edge
from s to s1 , and another edge from s to s2 , label the edges with ε, remove the initial
arrows from s1 and s2 , keeping the earlier diagrams of M1 and M2 as they were. The
new automaton is M. It is clear that L(M) = L(M1 ) ∪ L(M2 ).
Example 3.3. Construct an NFA that accepts the language ab ∪ baa.
Solution. By the above construction, we see that M is the NFA in Fig. 3.4. It is also
obvious that M accepts the language ab ∪ baa.
[Fig. 3.4. Transition diagram of the NFA M accepting ab ∪ baa.]
For concatenation, let a string uv be given with u ∈ L(M1 ) and v ∈ L(M2 ). The
new automaton M first mimics M1 while reading u, and then M mimics M2 . So, the initial state of M should be that of M1 and the final
states of M should be the final states of M2 . Moreover, M should not stop operating
just after computing with u. Observe that after computing with u, the NFA M is in a
final state of M1 . To continue with the computation, M must go to the initial state of
M2 from any final state of M1 without consuming any input. That is, we must have
ε-transitions from the final states of M1 to the initial state of M2 .
Formally, let M1 = (Q 1 , Σ, Δ1 , s1 , F1 ), M2 = (Q 2 , Σ, Δ2 , s2 , F2 ), with Q 1 ∩
Q 2 = ∅. For the concatenation of L(M1 ) and L(M2 ), define M = (Q 1 ∪ Q 2 , Σ, Δ,
s1 , F2 ), where Δ = Δ1 ∪ Δ2 ∪ {(q, ε, s2 ) : q ∈ F1 }.
In terms of the transition diagrams, we simply put M1 , M2 side by side, remove
the arrow sign from the initial state of M2 , remove the extra circle from the final
states of M1 , and join ε-transitions from the final states of M1 to the initial state of
M2 . It is clear that L(M) = L(M1 )L(M2 ); but show it formally.
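In code, the concatenation construction is equally short. A sketch in Python, with '' standing for ε; the state sets are assumed disjoint, and the sample machines are chosen here for illustration.

```python
# Concatenation of two NFAs: put M1 and M2 side by side and add ε-edges
# from every final state of M1 to the start state of M2.

def nfa_concat(m1, m2):
    (Q1, d1, s1, F1), (Q2, d2, s2, F2) = m1, m2
    return (Q1 | Q2,
            d1 | d2 | {(q, '', s2) for q in F1},   # glue: F1 --ε--> s2
            s1,                                    # start state of M1
            F2)                                    # final states of M2

# M1 accepts ab, M2 accepts ba; the result accepts abba
m1 = ({'1', '2', '3'}, {('1', 'a', '2'), ('2', 'b', '3')}, '1', {'3'})
m2 = ({'4', '5', '6'}, {('4', 'b', '5'), ('5', 'a', '6')}, '4', {'6'})
Q, d, s, F = nfa_concat(m1, m2)
print(('3', '', '4') in d)   # True
```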
Example 3.4. Construct an NFA that accepts the language (ab ∪ baa)ba.
Solution. You can construct such an NFA easily. However, I want you to see the
above construction at work. You have the NFA M1 as the M of Example 3.3 accepting
the language ab ∪ baa. First, construct an NFA, say, M2 for accepting ba. From these
two, construct an NFA M by using the above idea; see Fig. 3.5.
Fig. 3.5. The NFAs M1 (accepting ab ∪ baa) and M2 (accepting ba), and the NFA M for their concatenation.
For Kleene star, it looks straightforward that we add an edge from every final
state to the initial state with label as ε. That is, without consuming any input let the
automaton change its state from any final state to its initial state. One minor problem:
this new automaton will not accept the empty string ε if the initial state is originally
not a final state. There are many remedies.
We add a new state; make this new state our initial state and also an extra fi-
nal state. This will now force the automaton to accept the empty string. We add ε-
transitions from old final states to this new state; and then add one ε-transition from
this new state (a final state) to the old initial state, which is no more an initial state.
Further, we can afford to have only one final state, namely, the new initial state, as
there are already ε-transitions from the old final states to this new initial state. (What
are the other remedies?) See the following example.
Example 3.5. Construct an NFA for accepting the language (aba ∪ abb)∗.
[Figure: the NFA M1 accepting aba ∪ abb, and the NFA M for (aba ∪ abb)∗ obtained by the construction above.]
3.17. Design an NFA that accepts the language (abab∗ ∪ aba ∗ ). How many states do you require at the minimum?
82 3 Equivalences
3.18. Construct an NFA with three states that accepts {ab, abc}∗. Can you have an
NFA with fewer than three states that accepts the same language?
3.19. Use the construction of an NFA for regular expressions on the expressions a∅
and ∅∗ . What do you observe?
3.20. Design NFAs and DFAs that accept the following languages:
(a) a + ∪ b ∗ a ∗ .
(b) aa ∗ (a ∪ b).
(c) (a ∪ bb)∗(ba ∗ ∪ ε).
(d) (ab∗ aa ∪ bba ∗ab).
(e) Complement of (ab ∗ aa ∪ bba ∗ab).
(f) (aa ∗ ∪ aba ∗b∗ ).
(g) (a ∪ b)∗ b(a ∪ bb)∗ .
(h) (abab)∗ ∪ (aaa ∗ ∪ b ∗ ).
(i) ((aa ∗)∗ b)∗ .
(j) (ab∗a ∗ ) ∪ ((ab)∗ba).
(k) (ab∗ a ∗ ) ∩ ((ab)∗ba).
(l) (a ∪ b)a ∗ ∩ baa ∗.
(m) (aa ∪ bb)∗(ab ∪ ba)(aa ∪ bb)∗ .
(n) (aaa)∗b ∪ (aa)∗ b.
(o) (a(ab)∗(aa ∪ b) ∪ b(ba)∗(a ∪ bb))∗.
3.21. Design two NFAs each with four states that accept the languages a ∗ ∪ b+ a and
(ab ∪ abb ∪ abbb)∗.
such an NFA is the same as our given one. Let M be an NFA. Call its initial state q. Add a new state to M; call it s. Make s the initial state and add an ε-transition from s to q.
Clearly, the language accepted by the new NFA is the same as that accepted by
M. Note that if there is no transition from any state to q, then we can afford to keep
q as the initial state. But then, it may entail some modification in the other phases.
Try doing that after you understand the algorithm.
Phase 2: In the second phase, we want an NFA whose final state is a sink, that is, there is a single final state from which no transition goes out to any state. This
is achieved in a way very much similar to the first phase. Introduce a new final state
f , add ε-transitions from every old final state to f , and then do not regard the old
final states as final states any more.
By the end of Phase 2, we have an NFA; call it M′, with only one initial state s to which no transition comes in, and only one final state f from which no transition goes out. Moreover, there is no transition from s to itself, and none from f to itself. It is easy to see that L(M′) = L(M). As in Phase 1, if already there is only one
final state in M with no transition going from it, then we can afford to refrain from
introducing another new state. However, try this modification only after completely
understanding the full algorithm.
Phase 3: In this phase, we want an NFA, or rather a generalized NFA, where we
should have exactly one transition between any (ordered) pair of states. That is, we
should have only one transition from p to q and exactly one more from q to p, for
any states p, q of our required generalized NFA. Let p, q be any two (different) states in M′. If there is no transition from p to q, then add a transition and label it with ∅.
If there are n ≥ 2 transitions from p to q, say with labels σ1 , . . . , σn for symbols
σ1 , . . . , σn from Σ ∪ {ε}, then remove all those transitions and add a new transition
with label as σ1 ∪ . . . ∪ σn . Call the new generalized NFA M″. Notice again that the new generalized NFA M″ and the old M′ accept the same language.
The next phase is the proper node-elimination process for which the earlier phases
form a background.
Phase 4: We have a generalized NFA M″ with the initial state s and only one final state f , where there is no transition from any state to s and no transition from f to
any state. There are not even any loops around s and f. Further, if x, y are any states
of M not equal to any of s or f , then there is exactly one transition from x to y with
label as a regular expression, say, Rx,y .
Pick any state r other than s, f . Any such state is an original state of M. Corresponding to this state r , find all pairs of states ( p, q) such that a transition comes
from p to r , a transition goes to q from r , and there is a transition from p to q. There
might possibly be a loop around r . But loops around p, q are overlooked.
The relevant portion of the NFA M″ now looks like the left side diagram in Fig. 3.7 with the respective labels on the transitions. For each such pair ( p, q) of states in M″ corresponding to the state r , replace the label R p,q of the transition by R p,q ∪ R p,r Rr,r∗ Rr,q . Look at Fig. 3.7; the diagram on the left is replaced by the diagram on the right (inside M″).
Fig. 3.7. Phase 4 of the state elimination algorithm: eliminating r replaces the label R p,q by R p,q ∪ R p,r Rr,r∗ Rr,q .
This replacement must be done simultaneously for each such pair ( p, q). This means that you do not remove r immediately after making the replacement for one pair ( p, q), but first explore whether there are other such pairs for the same r.
Compute replacements for all such pairs, and then remove r by effecting all these
replacements. The pairs ( p, q) also include the possibility ( p, p). This subcase and the corresponding replacement are drawn schematically in Fig. 3.8.
Fig. 3.8. Phase 4 of the state elimination algorithm: for the pair ( p, p), the loop label R p,p is replaced by R p,p ∪ R p,r Rr,r∗ Rr,p .
Notice that the process of replacement starts with choosing a state r from M″ and then simultaneously replacing many transitions. After all these replacements are over for this state r , delete the state r and the transitions from and to r . The new generalized NFA now has one state fewer than M″ had.
Repeat this phase to eliminate all the states except the initial state s and the final
state f .
Caution: While considering pairs ( p, q) corresponding to a state r , do not exclude
the states s and f . The state s can serve as a p and the state f can also be a q.
You must consider both the possibilities ( p, q) and (q, p) separately. They are not
the same pair. The order is important because for a given r , the state p is the one
from which there is a transition to r and q is a state to which there is a transition
from r . Observe that such replacements have an effect only when there are transitions from p to r and from r to q in the original NFA M. You can also make many simplifications: when Rr,r = ∅, the expression R p,q ∪ R p,r Rr,r∗ Rr,q is simply R p,q ∪ R p,r Rr,q ; and when R p,q = ∅, we can have R p,r Rr,r∗ Rr,q instead of R p,q ∪ R p,r Rr,r∗ Rr,q . Further, the order in which we pick the pairs ( p, q) is irrelevant,
although it might affect the size of the final expression. Similarly, the order in which
3.5 NFA to Regular Expression 85
we choose different r ’s from the automaton M is also irrelevant. This phase is over
when the only remaining states are s and f . One more point to remember: we write
ε as a regular expression instead of the official expression ∅∗ .
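The four phases can be condensed into a short program. The sketch below is my encoding, not the book's: it assumes Phases 1 and 2 are already done (start has no incoming, final no outgoing transitions), and performs Phases 3 and 4 on a dictionary mapping a pair ( p, q) to a regex string, where an absent pair stands for the ∅-transition and None encodes ∅ in the label algebra.

```python
def nfa_to_regex(states, trans, start, final):
    def top_union(x):                  # is there a ∪ at parenthesis depth 0?
        d = 0
        for ch in x:
            d += ch == "("
            d -= ch == ")"
            if ch == "∪" and d == 0:
                return True
        return False
    def union(r, s):
        if r is None or r == s:
            return s
        if s is None:
            return r
        return r + " ∪ " + s
    def cat(r, s):
        if r is None or s is None:     # concatenation with ∅ is ∅
            return None
        if r == "ε":
            return s
        if s == "ε":
            return r
        w = lambda x: "(" + x + ")" if top_union(x) else x
        return w(r) + w(s)
    def star(r):
        if r is None or r == "ε":      # ∅∗ = ε∗ = ε
            return "ε"
        return r + "∗" if len(r) == 1 else "(" + r + ")∗"
    R = dict(trans)
    for r_ in sorted(q for q in states if q not in (start, final)):
        loop = star(R.pop((r_, r_), None))
        ins = [(p, R.pop((p, r_))) for p in sorted(states) if (p, r_) in R]
        outs = [(q, R.pop((r_, q))) for q in sorted(states) if (r_, q) in R]
        for p, a in ins:
            for q, b in outs:
                R[(p, q)] = union(R.get((p, q)), cat(cat(a, loop), b))
    return R.get((start, final))       # None means the language is ∅
```

Running it on the generalized NFA of Example 3.6 reproduces the expression (a ∪ b)∗b(a ∪ b).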
You must first understand why at each phase, the language accepted by the new
generalized NFA is the same as that accepted by the old one. It will be helpful to
first duplicate the states in another place and then slowly add or delete the required
transitions as asked for in the various phases. Look at the following two examples. I suggest that you try the examples yourself before looking at the solutions. Then,
of course, verify that your solution is, indeed, the same as the one given here. You
may not get the same regular expression because of the order of the states r and the
corresponding state-pairs ( p, q).
Do not be discouraged; you can show that it is the same language that you get
from your regular expression as the one given in the solution. But try to solve the
examples again by choosing the same order of the states as done in the solutions.
This reworking is sometimes tedious. But this is the easiest way to learn!
Example 3.6. Apply the state elimination algorithm on the NFA M to obtain the
regular expression representing the language accepted by it, where M = ({A, B, C},
{a, b}, Δ, A, {C}) with Δ = {( A, a, A), ( A, b, A), ( A, b, B), (B, a, C), (B, b, C)}.
Fig. 3.9. The NFA M.
In Phase 1, we add the initial state s and the final state f , and introduce the ε-transitions as in Fig. 3.10.
In Phase 2, we replace multiple transitions by regular expressions and then add transitions with label as ∅ whenever there is no transition between states. In fact, we do not show all these ∅-transitions; rather, we draw only some relevant ones and suppress the others, remembering that they are there. We then obtain the generalized NFA shown in Fig. 3.11.
Fig. 3.10. M after Phase 1.
the regular expressions will be the same as ∅, and thus this is not relevant. Similarly, all other possibilities of choosing different pairs ( p, q) for this r give only ∅-transitions.
At the end of this phase, we must delete r and all the transitions from and to it.
The r -state in Fig. 3.11 is deleted to obtain the diagram in Fig. 3.12.
In Fig. 3.12, we also show another relevant ∅-transition with our next choice of r .
Note that the r -state in the succeeding diagram is freshly chosen; it is not the same
as in the previous diagram. As shown in Fig. 3.12, we choose our state r afresh. Then
among various other pairs of ( p, q), first we take the state to the left of r as p and f
as q. This is the reason we have shown the ∅-transition in Fig. 3.12. Now, with this
choice of ( p, q), we see that the ∅-transition is to be replaced by a ∪ b, and then the r -state is deleted. Thus the generalized NFA is updated to the NFA in Fig. 3.13.
Fig. 3.13. The updated generalized NFA: s goes to r with label (a ∪ b)∗ b, and r goes to f with label a ∪ b.
We have again chosen our new r as the middle state (the only choice here), and
have shown the relevant ∅-transition from s to f . Eliminating the state r gives us the
new transition from s to f labeled (a ∪ b)∗ b(a ∪ b) as in Fig. 3.14.
Fig. 3.14. M after Phase 4: a single transition from s to f , labeled (a ∪ b)∗ b(a ∪ b).
The state elimination algorithm suggests that the language accepted by the NFA
M is (a ∪ b)∗ b(a ∪ b).
Example 3.7. Apply the state elimination algorithm on the NFA in Fig. 3.15 to obtain
the regular expression that represents the language accepted by it.
Solution. We redraw the diagram and add the ε-transitions changing the initial and
final states. The result of these operations in Phase 1 gives us the left hand side
transition diagram in Fig. 3.16.
Fig. 3.16. Left: the NFA after Phase 1. Right: the same NFA with the relevant ∅-transitions drawn for eliminating x.
We want to eliminate the state x from the NFA. Taking this state as the r -state, we have the pairs ( p, q) as (s, y), (y, y), and (y, f ). Then the two ∅-transitions as drawn on
the right hand side transition in Fig. 3.16 become relevant.
For the pair (s, y), we replace the transition from s to y by following the sequence
of states s to x to y; the other sequence is from s directly to y, which is ∅. So the union of these is simply a. Thus the ∅-transition from s to y is relabeled as a.
For the pair (y, y), we follow the sequence of states y, x, y to get the string (ex-
pression) ba. Thus we introduce a loop at y with label ba.
For the pair (y, f ), we follow the states as y, x, f or as y, f directly. These alter-
natives give us the expressions bε or ∅, respectively. Their union is simply b. This is
the label for the edge from y to f .
We delete the state x along with the transitions from and to it to obtain the gen-
eralized automaton as in the left hand side diagram of Fig. 3.17. Also look at the
relevant ∅-transitions added to it as on the right hand side diagram of Fig. 3.17.
Fig. 3.17. Left: the generalized automaton after eliminating x, with the loop ba at y. Right: the same with the relevant ∅-transitions drawn for eliminating y.
For eliminating y, the relevant pairs ( p, q) are (s, f ), (s, z), and (z, z). Considering
the pair (s, f ), we get the sequence of states from s to f through y. This way, the NFA
Once you think you have understood the state elimination algorithm, the following
result should be obvious. It says that corresponding to each NFA, there exists an
equivalent regular expression. However, its proof has to be done by induction.
[Figure: after eliminating y, the loop at z gets the label a ∪ (b(ba)∗ a), and s goes to z with label a(ba)∗ a and to f with label a(ba)∗ b; eliminating z leaves a single transition from s to f , labeled with the resulting regular expression R.]
Lemma 3.6. Let α be the regular expression constructed by the state elimination
algorithm from the NFA M. Then L(α) = L(M).
3.23. Recall that instead of a transition relation for an NFA, we can have a transition
function. In this definition, an NFA is taken as a five tuple: M = (Q, Σ, δ, q0 , F),
where Q is a finite set of states, Σ is an alphabet, q0 ∈ Q is the initial state, F ⊆ Q is
the set of final states, and δ : Q × (Σ ∪ {ε}) → 2 Q . Here, for instance, the transitions
( p, σ, q) and ( p, σ, r ) in the transition relation Δ are written as δ( p, σ) = {q, r }. The
transition function δ is extended to δ ∗ : Q × Σ ∗ → 2 Q in such a way that δ ∗ (q, w)
contains p ∈ Q iff there is a walk in the transition graph from q to p labeled w. The
language accepted by M is defined to be L(M) = {w ∈ Σ ∗ : δ ∗ (q0 , w) ∩ F ≠ ∅}.
(a) Define δ ∗ set theoretically and then show that our definition of L(M) and this
definition are equivalent.
(b) Define the operation of yield between configurations using this definition of an
NFA.
(c) Define acceptance in this case by extending δ ∗ for strings in place of symbols.
(d) Connect this definition to the State Elimination Algorithm.
(e) Is it true that L(M) = {w ∈ Σ ∗ : δ ∗ (q0 , w) ∩ F = ∅}? Prove or give a counter
example.
(f) Is it true that L(M) = {w ∈ Σ ∗ : δ ∗ (q0 , w) ∩ (Q − F) ≠ ∅}? Prove or give a
counter example.
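For experimenting with this function-style definition, here is a sketch (my encoding, not the book's): delta maps (state, symbol) to a set of states, with "" as the ε label, and delta_star(q, w) collects every state reachable from q by a walk labeled w.

```python
def eclose(delta, S):
    """All states reachable from S by epsilon-transitions alone."""
    S = set(S)
    while True:
        more = set().union(*(delta.get((q, ""), set()) for q in S)) - S
        if not more:
            return S
        S |= more

def delta_star(delta, q, w):
    """The extension of delta to strings: read w symbol by symbol,
    taking the epsilon-closure after every step."""
    cur = eclose(delta, {q})
    for c in w:
        cur = eclose(delta, set().union(*(delta.get((p, c), set()) for p in cur)))
    return cur
```

Acceptance then reads exactly as in the problem: w ∈ L(M) iff delta_star(delta, q0, w) meets F.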
3.24. Design an NFA with a single final state and without ε-transitions that accepts
{a} ∪ {bn : n ≥ 1}.
3.25. If the NFA happens to be a DFA, the state elimination algorithm becomes sim-
plified. Write the state elimination algorithm for DFAs in a step-by-step manner.
3.27. Let M be an NFA with initial state p, final state q, and another state r hav-
ing transitions ( p, a, q), (q, ε, r ), (r, ε, p). What is the complement of the language
accepted by M?
3.28. Consider a generalized NFA having transitions ( p,a, p), ( p, a∪b, q), (q, ab, r ),
(r, a, p), (r, ε, q), (r, bb, r ), where p is the initial state and r is the final state.
3.29. What is L(N), where N is a generalized NFA with initial state p, final state r ,
and transitions
(a) ( p, b, p), ( p, a, q), (q, a, q), (q, a, s), (r, b, s), (s, a, r )?
(b) ( p, a, q), ( p, b, r ), (q, b, p), (q, a, r ), (r, b, q), ( p, ε, r )?
(c) ( p, a, p), ( p, b, q), ( p, b, r ), (q, a, r ), (q, ε, r ), (r, a, q)?
(d) ( p, a, p), ( p, a, q), ( p, a ∗b ∪ c, r ), (q, a ∪ b, q), (q, a ∪ b, r ), (r, a ∪ b∗ , r )?
3.30. Find regular expressions equivalent to the DFAs ({ p, q, r }, {a, b}, δ, p, F),
with
(a) δ( p, a) = δ( p, b) = q, δ(q, a) = δ(r, a) = δ(r, b) = p, δ(q, b) = r ; F = {r }.
(b) δ( p, a) = δ(q, b) = q, δ( p, b) = δ(r, a) = r, δ(q, a) = δ(r, b) = p; F = {q, r }.
3.31. Give regular expressions for the languages accepted by the DFA (Q,Σ,δ,a,F),
where Q = {a, b, c, d, e, f, g}, Σ = {0, 1} and
(a) δ(a, 0) = b, δ(a, 1) = e, δ(b, 0) = a, δ(b, 1) = d, δ(c, 0) = g, δ(c, 1) = b,
δ(d, 0) = e, δ(d, 1) = c, δ(e, 0) = d, δ(e, 1) = c, δ( f, 0) = c, δ( f, 1) = f,
δ(g, 0) = c, δ(g, 1) = a; F = {a, b}.
(b) δ(a, 0) = b, δ(a, 1) = e, δ(b, 0) = a, δ(b, 1) = f, δ(c, 0) = d, δ(c, 1) = c,
δ(d, 0) = g, δ(d, 1) = a, δ(e, 0) = f, δ(e, 1) = g, δ( f, 0) = e, δ( f, 1) = d,
δ(g, 0) = d, δ(g, 1) = b; F = {a, b}.
(c) δ(a, 0) = b, δ(a, 1) = f, δ(b, 0) = a, δ(b, 1) = g, δ(c, 0) = e, δ(c, 1) = b,
δ(d, 0) = b, δ(d, 1) = c, δ(e, 0) = c, δ(e, 1) = a, δ( f, 0) = g, δ( f, 1) = c,
δ(g, 0) = f, δ(g, 1) = e; F = {a, b}.
(d) δ(a, 0) = a, δ(a, 1) = c, δ(b, 0) = f, δ(b, 1) = c, δ(c, 0) = e, δ(c, 1) = g,
δ(d, 0) = f, δ(d, 1) = a, δ(e, 0) = a, δ(e, 1) = g, δ( f, 0) = b, δ( f, 1) = g,
δ(g, 0) = e, δ(g, 1) = c; F = {b, d, e}.
In this chapter, we have fulfilled the promise of the previous chapter. Our sole concern has been to show that the devices of regular expressions, regular grammars, and finite automata are equivalent. The sense of equivalence here is in terms of
the class of languages the devices represent. Our path has been the following: DFA
to NFA (trivial), NFA to DFA, DFA to regular grammar, regular grammar to NFA,
regular expression to NFA, and finally, NFA to regular expression.
Equivalence of DFAs and NFAs was shown by Rabin and Scott [107]. Kleene
[67] proved the equivalence of DFAs and regular expressions; and a shorter proof
was given by McNaughton and Yamada [85]. Chomsky and Miller [17] observed
the equivalence of regular grammars and regular expressions. The state elimination
algorithm as presented here closely follows that in [34]. The equivalence problem for
two-tape automata has been examined in [10]. The equivalence of two-way automata
with the standard ones has been shown in [117].
3.6 Summary and Additional Problems 91
3.32. Let L = {w ∈ {a, b}∗ : #a (w) is even and #b (w) is odd}. Find a DFA D and then
a regular expression E such that L(D) = L = L(E).
3.33. Fix an n > 1. Let N = ({q0 , q1 , . . . , qn }, {0, 1}, Δ, q0 , {qn }), where Δ = {(q0 , 0, q0 ), (q0 , 1, q0 ), (q0 , 1, q1 )} ∪ {(qi−1 , 0, qi ), (qi−1 , 1, qi ) : 2 ≤ i ≤ n}. Show that
(a) L(N) = {w ∈ {0, 1}∗ : the nth symbol from the right end of w is 1}.
(b) If D is a DFA such that L(D) = L(N), then D has at least 2n states.
This shows that the subset construction cannot be improved to give fewer than 2n states, in general, where n is the number of states in an NFA.
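This lower bound is easy to experiment with. The sketch below is my code, assuming the reading of Δ under which q0 loops on both symbols and enters q1 on 1: it builds the NFA of Problem 3.33 and counts the subsets reachable in the subset construction.

```python
def nth_from_right_nfa(n):
    """Transitions of an NFA whose language is: the nth symbol from the
    right end of w is 1.  States are 0, ..., n; 0 is initial, n is final."""
    trans = {(0, "0", 0), (0, "1", 0), (0, "1", 1)}
    for i in range(2, n + 1):
        trans |= {(i - 1, "0", i), (i - 1, "1", i)}
    return trans

def reachable_subsets(trans):
    """Subset construction: explore all subsets reachable from {0}."""
    start = frozenset({0})
    seen, todo = {start}, [start]
    while todo:
        S = todo.pop()
        for c in "01":
            T = frozenset(q for (p, a, q) in trans if a == c and p in S)
            if T not in seen:
                seen.add(T)
                todo.append(T)
    return seen
```

For this family, the reachable part of the determinized automaton has exactly 2^n states, matching part (b).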
3.34. Consider a modification of the definition of an NFA, where there are possibly
many initial states, that is, N = (Q, Σ, δ, Q 0 , F), where Q 0 ⊆ Q is the set of initial
states. Define L(N) as the set of strings from Σ ∗ that drive at least one initial state
to at least one final state. Show that for every NFA with multiple initial states, there
is one NFA with a single initial state so that they accept the same language. What if
NFAs are replaced by DFAs here?
3.35. Let L be a regular language that does not contain ε. Show that there exists an
NFA without ε-transitions and with a single final state that accepts L.
3.37. If a DFA has n states q1 , q2 , . . . , qn , then its transition function δ can be spec-
ified by an n × n matrix T , called its transition matrix, whose ijth entry is given by
the set of input symbols taking state qi to q j , that is, (T )i j = {σ ∈ Σ : δ(qi , σ) = q j }.
Given a set of states Q having n elements and an input alphabet Σ, consider the col-
lection C of all n × n matrices, which are the transition matrices of some DFAs with
state-set as Q. Define the identity matrix I ∈ C as the one whose ijth entry is Ii j = {ε} for i = j and Ii j = ∅ for i ≠ j . Define addition and multiplication of matrices in C by
( A + B)i j = Ai j ∪ Bi j and ( AB)i j = ∪1≤k≤n Aik Bk j . Then, the powers of matrices are
defined by taking A0 = I, An+1 = An A. Similarly, the asterate of a matrix is defined
by ( A∗ )i j = ∪n≥0 ( An )i j . Let M = (Q, Σ, δ, s, F) be a DFA whose transition matrix is A, as constructed from δ. Also, let δ ∗ denote the extended transition function of δ that is defined for
the strings in Σ ∗ . Then, prove that
(a) ( An )i j = {u ∈ Σ ∗ : ℓ(u) = n and δ ∗ (qi , u) = q j }.
(b) L(M) = ∪qk ∈F ( A∗ )ik .
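A sketch (my encoding, not the book's) of this matrix algebra for a two-state DFA: entries are sets of strings, addition is union, and multiplication concatenates entrywise over the middle index.

```python
def mat_mul(A, B):
    """(AB)ij = union over k of the concatenations Aik Bkj."""
    n = len(A)
    return [[{u + v for k in range(n) for u in A[i][k] for v in B[k][j]}
             for j in range(n)] for i in range(n)]

def mat_pow(A, n):
    """A^0 = I (the set {""} on the diagonal, empty sets off it), A^(n+1) = A^n A."""
    P = [[{""} if i == j else set() for j in range(len(A))]
         for i in range(len(A))]
    for _ in range(n):
        P = mat_mul(P, A)
    return P

# transition matrix of the DFA for "even number of a's" over {a, b},
# with q1 = even, q2 = odd (an illustration, not from the text)
A = [[{"b"}, {"a"}],
     [{"a"}, {"b"}]]
```

The entries of A^n are then exactly the length-n strings driving qi to qj, as claimed in part (a).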
3.38. It is claimed that one can construct a regular expression from a finite automaton
by solving a system of linear equations of the form Ax + b = x, where A is an n × n
matrix with entries ai j as sets of strings, b is an n × 1 vector with entries b j as sets
of strings, and the unknown n × 1 vector x has entries, also, sets of strings. Here, +
denotes union and multiplication denotes concatenation. Prove the claim and give an
algorithm to solve such a linear system.
3.39. The Hamming distance between two strings u, v ∈ {0, 1}∗ of the same length
is defined as H (u, v) = the number of places where they differ. Moreover, if ℓ(u) ≠ ℓ(v), then H (u, v) = ∞. If A is a language over {0, 1} and x ∈ {0, 1}∗ , then H (x, A) =
min y∈ A H (x, y). For any L ⊆ {0, 1}∗ and n ∈ N, define Mn (L) = {w : H (w, L) ≤ n},
the set of all strings whose Hamming distance from L is at most n. Prove that if L is
regular, then so is M2 (L).
3.40. Design an algorithm which takes a DFA and a natural number n as its inputs
and computes the number of strings of length n that are accepted by the DFA.
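For Problem 3.40, one standard approach (a sketch, not necessarily the intended solution) is dynamic programming over the DFA's states: keep, for each state q, the number of strings of the current length driving the start state to q, and after n rounds sum over the final states. The even-number-of-a's DFA below is only an illustration.

```python
def count_accepted(delta, start, finals, alphabet, n):
    """Number of strings of length n accepted by the DFA (delta total)."""
    states = {p for (p, _) in delta} | set(delta.values())
    counts = {q: 0 for q in states}
    counts[start] = 1
    for _ in range(n):
        nxt = dict.fromkeys(states, 0)
        for q in states:
            for a in alphabet:
                nxt[delta[(q, a)]] += counts[q]
        counts = nxt
    return sum(counts[q] for q in finals)

# illustration: the two-state DFA for "even number of a's" over {a, b}
even_a = {("e", "a"): "o", ("o", "a"): "e", ("e", "b"): "e", ("o", "b"): "o"}
```

The algorithm runs in time proportional to n times the number of transitions, with no enumeration of strings.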
3.41. For a regular language L ⊆ Σ ∗ and a ∈ Σ, define L a = {uav : uv ∈ L}.
(a) Given an NFA for L, show how to construct an NFA for L a .
(b) How do you use an NFA for L a to write a pattern recognition program using a
regular expression for L as an input?
3.42. For n ≥ 1, define the NFAs Mn = (Q n , Σn , Δn , sn , Fn ) with Q n = {q1 , . . . , qn },
Σn = {ai j : 1 ≤ i, j ≤ n}, sn = q1 , Fn = {q1 }, and Δn = {(qi , ai j , q j ) : 1 ≤ i, j ≤ n}.
(a) What is L(Mn )? Describe it in English.
(b) What are the regular expressions for L(M2 ), L(M3 ), and L(M5 )?
(c) Can you see the plausibility in the conjecture that for each polynomial p there is
an n such that no regular expression for L(Mn ) has length smaller than p(n)? I
am not asking for a proof.
3.43. A run in a string is a maximal substring of length at least two consisting entirely of the same symbol. For instance, the string abaaaba has a run
aaa. Find DFAs for the following languages over {a, b}:
(a) L = {w : w contains no runs of length less than four}.
(b) L = {w : every run of a’s has length either two or three}.
(c) L = {w : there are at most two runs of a’s of length three}.
(d) L = {w : there are exactly two runs of a’s of length three}.
3.44. Give a regular expression for the set of all strings over {a, b, c} in which all
runs of a’s have lengths that are multiples of three.
4 Structure of Regular Languages
4.1 Introduction
To take stock, look up at the statements in Lemmas 3.2–3.6. They assert that reg-
ular languages can be defined in many ways: by regular expressions, as languages
generated by regular grammars, as languages accepted by DFAs, and as languages
accepted by NFAs. We summarize these results in the following statement:
Theorem 4.1. Let L be any language. Then the following are equivalent:
Solution. The DFA in Fig. 4.1 accepts the language and hence it is regular.
Fig. 4.1. DFA for Example 4.1.
(c) Show that L is generated by the regular grammar with the following productions:
S → 0 A, S → 1S, A → 0B, A → 1 A, B → 0S, B → 1B.
(d) Can you find a regular grammar with fewer than six productions for generating L?
(e) Can you find a regular grammar for L with only two nonterminals?
To see whether a given language is not regular is altogether a different question. The
methods we develop, in this section and the next, will enable us to answer this ques-
tion in most of the cases. In this section, we will look into some closure properties
of regular languages; they will help us in showing some languages to be nonregular
once we know that certain other languages are nonregular. We begin with an example.
Fig. 4.2. DFA for Example 4.2.
Can you show the claim in the solution of Example 4.2? Compare this DFA with
that in Example 4.1. The final states are simply interchanged. You can, of course, do that for any DFA, and then the complement of every regular language will be regular.
However, you must take care to make the transition function of the original DFA a
total function. See the following statement:
Exercise 4.2. What happens if we do not assume the transition function δ in the DFA
D to be a total function in the proof of Theorem 4.2? [Hint: Can you have a DFA
where each state is a final state but the DFA does not accept every string?]
We have shown that intersection of two regular languages is regular by using the
closure properties of regular languages with respect to union and complement, which
are themselves proved via regular expressions and finite automata. However, there is
a more direct approach to showing this using DFAs. It is the approach of construction
of a product automaton.
Suppose you have two DFAs M1 and M2 that accept the languages L 1 and L 2 ,
respectively. Now, given an input x, imagine feeding it to both the automata simul-
taneously. M1 , upon reading σ, the first symbol of x, goes from its initial state s1 to a state p, say. Similarly, M2 goes to a state q on reading σ, starting from its own initial state s2 . For monitoring just this one step of the computations of both the
automata, you must take into account both the new states p and q. In other words,
imagine a new machine M3 , which starts from the state (s1 , s2 ) and goes to the state
( p, q) upon reading σ. This new machine is able to keep track of the computations
of both M1 and M2 at least for one step. Now, computation in M3 may proceed anal-
ogously by simulating both the automata simultaneously this way.
Formally, let M1 = (Q 1 , Σ, δ1 , s1 , F1 ) and M2 = (Q 2 , Σ, δ2 , s2 , F2 ) be two DFAs,
where both δ1 and δ2 are total functions. The product DFA of M1 and M2 is the DFA
M1 × M2 = (Q 1 × Q 2 , Σ, δ, (s1 , s2 ), F1 × F2 ),
where δ : (Q 1 × Q 2 ) × Σ → Q 1 × Q 2 with δ(( p, q), σ) = (δ1 ( p, σ), δ2 (q, σ)).
Exercise 4.4. Show that L(M1 × M2 ) = L(M1 )∩L(M2 ). [Hint: You may either follow
the configurations-yield approach or δ ∗ -approach. In the latter case, first show that
δ ∗ (( p, q), x) = (δ1∗ ( p, x), δ2∗ (q, x)) for x ∈ Σ ∗ .]
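The product construction is short in code. A sketch (my tuple encoding — states, alphabet, delta as a dict, start, finals — not the book's notation):

```python
def product_dfa(m1, m2):
    """Run M1 and M2 in lockstep; with F1 × F2 as the final states the
    product accepts L(M1) ∩ L(M2)."""
    q1, sigma, d1, s1, f1 = m1
    q2, _, d2, s2, f2 = m2
    delta = {((p, q), a): (d1[(p, a)], d2[(q, a)])
             for p in q1 for q in q2 for a in sigma}
    return ({(p, q) for p in q1 for q in q2}, sigma, delta, (s1, s2),
            {(p, q) for p in f1 for q in f2})

def run(m, w):
    """Run a DFA on w and report acceptance."""
    _, _, delta, q, finals = m
    for c in w:
        q = delta[(q, c)]
    return q in finals

# illustrations: even number of a's, and even number of b's, over {a, b}
m1 = ({"e", "o"}, "ab",
      {("e", "a"): "o", ("o", "a"): "e", ("e", "b"): "e", ("o", "b"): "o"},
      "e", {"e"})
m2 = ({"E", "O"}, "ab",
      {("E", "b"): "O", ("O", "b"): "E", ("E", "a"): "E", ("O", "a"): "O"},
      "E", {"E"})
```

Taking (F1 × Q2) ∪ (Q1 × F2) as the final states instead would give the union, which is the point of Problem 4.4.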
Besides the set-theoretic closures, regular languages admit rewriting of symbols as other symbols. This notion of rewriting is formally achieved by defining a homomorphism; in this context, a homomorphism is a concatenation-preserving map. Let Σ, Γ be two alphabets. A homomorphism is a function h : Σ ∗ → Γ ∗ that satisfies h(uv) = h(u)h(v) for all strings u, v ∈ Σ ∗ .
For example, rewriting every symbol in Σ as a amounts to the homomorphism h : Σ ∗ → {a}∗ defined by h(u) = a^ℓ(u) . It is easy to see that h preserves concatenation, as ℓ(uv) = ℓ(u) + ℓ(v).
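In code, a homomorphism can be given by its values on single symbols and extended symbol by symbol, which makes h(uv) = h(u)h(v) automatic. The particular maps below are illustrations only.

```python
def extend(symbol_map):
    """Extend a map on symbols to a homomorphism on strings."""
    def h(u):
        return "".join(symbol_map[c] for c in u)
    return h

h = extend({"a": "a", "b": "a"})   # every symbol rewritten as a: h(u) = a^len(u)
g = extend({"a": "01", "b": ""})   # a homomorphism need not preserve length
```

A symbol may also be erased (mapped to the empty string), as g does with b.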
96 4 Structure of Regular Languages
4.4. For each pair of DFAs M = (Q, {a, b}, δ, p, F), M′ = (Q, {a, b}, δ′, p, F′) given below, construct product automata accepting (i) L(M) ∩ L(M′) and (ii) L(M) ∪ L(M′).
(a) Q = { p, q}, F = {q} = F′, δ( p, a) = δ( p, b) = q, δ(q, a) = δ(q, b) = p, and
δ′( p, a) = δ′(q, b) = p, δ′(q, a) = δ′( p, b) = q.
(b) Q = { p, q, r }, F = {q, r }, F′ = { p, r },
δ( p, a) = δ(r, b) = p, δ(q, a) = δ( p, b) = r, δ(r, a) = δ(q, b) = p, and
δ′( p, a) = δ′(q, b) = r, δ′( p, b) = δ′(r, a) = q, δ′(q, a) = δ′(r, b) = p.
(c) Q = { p, q}, F = {q} = F′, δ( p, a) = δ( p, b) = q, δ(q, a) = δ(q, b) = p, and
δ′( p, a) = δ′(q, b) = q, δ′(q, a) = δ′( p, b) = p.
4.7. What happens to the acceptance of languages when we interchange the final and
nonfinal states of an NFA?
4.8. Instead of DFAs, if we take NFAs and construct their product, and do simi-
lar modifications to the final states, do they accept the union or intersection of the
languages?
4.9. Define an operation tr (a shorthand for truncate) that removes the rightmost
symbol of a string. For instance, tr (abbab) = abba. Extend the operation to lan-
guages by tr (L) = {tr (w) : w ∈ L}. Show how to construct a DFA that accepts tr (L)
from a DFA that accepts L. Does it prove that if L is regular, then tr (L) is regular?
4.10. Construct a DFA that accepts the language generated by the grammar with
productions S → ab A, A → ba B, B → a A|bb.
4.13. Let h be a homomorphism. Which of the following are true for all regular languages?
(a) h(L ∪ L′) = h(L) ∪ h(L′).
(b) h(L ∩ L′) = h(L) ∩ h(L′).
(c) h(L L′) = h(L)h(L′).
(d) h(L ∗ ) = (h(L))∗ .
4.14. Show that if the sentence “for all languages L, L′, if L and L ∪ L′ are regular, then L′ is regular” were true, then all languages would be regular.
Is it not obvious that there exist nonregular languages? Well, not to intrigue you, sup-
pose Σ is an alphabet. Any regular expression is a string over Γ = Σ ∪{∪, ∅, ∗, ), (}.
By Theorem 2.1, there are a denumerable (countably infinite) number of strings over
Γ. Consequently, there are only a countable number of regular expressions over Σ.
On the other hand, by Theorem 2.1, there are an uncountable number of languages over Σ. Hence, there are an uncountable number of nonregular languages. However, do we have one such language to demonstrate?
Well, is L = {a m bm : m ∈ N} a regular language? How do you construct a regular
expression to represent L? Had it been a m bn , you would have taken a ∗ b∗ . It is not at
all clear how to get L by a regular expression. It looks as if we have to go on adding a’s on the left and simultaneously adding that many b’s on the right. But with the help of a nonterminal, in a regular grammar, we can only add on the right. So, it looks as if we will fail in every attempt. But this is not a proof that we cannot have a regular grammar for generating L.
What about a DFA or an NFA? How many states should we start with? One state
will not accept L anyway. (Why?) Suppose we have two states, say, s and q. To accept
ε, we must make s a final state. To accept ab, we must have a transition δ(s, a) = q.
This is so because, if we have a transition δ(s, a) = s, then any number of a’s can be consumed, so that an accepted string need not have an equal number of a’s and b’s. Then,
we may have δ(q, b) = s so that the string ab is accepted. To accept aabb, these transitions will not be enough. We may need to introduce another state. It looks as if this process will continue forever. To accept a bigger string, we may have to add another state. But is that a DFA, with an infinite number of states?
Suppose we have a DFA with n states. Consider the string a n+1 bn+1 , for our con-
struction might fail there. Now, the DFA changes its state after reading the first a
(including the possibility to remain in the same state). By reading a n+1 , it must have
changed its state n + 1 times. But there are only n states in the machine. So? By the
pigeon hole principle (See Sect. 1.5.), it must have entered a state at least twice. Sup-
pose the state qk has been entered (at least) twice during the reading of the input a n+1 .
To fix notation, assume that the DFA entered the state qk for the first time after reading a i . It entered qk a second time after consuming the next a j , and then it consumes a n+1−i− j bn+1 ; finally, anyway, it enters a final state, say, f , to accept the string a n+1 bn+1 . Then, it is clear that the DFA also accepts a i a j a j a n+1−i− j bn+1 . How? The
extra j -number of a’s in the middle will again drive the machine back to qk and
then it follows the same line of computation as earlier for the succeeding string
a n+1−i− j bn+1 . However, this new string a i a j a j a n+1−i− j bn+1 ∉ L. Therefore, the DFA does not accept L.
Do you see what we have done? We have started from the assumption that a certain
DFA accepts the language, and then we find that the same DFA does not accept it.
That means there cannot be any DFA that accepts the language. The language is not
regular.
We have an interesting observation. If a string is accepted by a DFA, then a certain
other string of possibly greater length should also be accepted by the DFA.
4.3 Nonregular Languages 99
This new bigger (or smaller) string is obtained from the old one by pumping a smaller string
into it. Moreover, this pumping can be done as often as we please. However,
we must start with a suitable string, and the language must be an infinite language.
Proof. The idea of the proof is contained in the discussion above about determining
whether {a^m b^m : m ∈ N} is regular or not. It is a matter of formalization only. Why
don't you try it yourself? Here it is.
Let L be a regular language, and let D = (Q, Σ, δ, s, F) be a DFA that accepts L
(see Theorem 4.1). Let m = |Q|, the number of states in D, if D has more
than one state; otherwise, take m = 2. Let u ∈ L be of length more than m.
Writing explicitly, let u = σ_1 σ_2 · · · σ_{m+k}, k ≥ 1. The automaton D under-
goes a sequence of changes of state while reading u. Let the sequence of states be
q_0, q_1, . . . , q_{m+k}. That is, D starts from q_0, reads σ_1, then changes to q_1, and so on.
Then q_0 = s, q_{m+k} ∈ F, and δ(q_i, σ_{i+1}) = q_{i+1} for 0 ≤ i ≤ m + k − 1. As there are
only m states in Q, by the pigeonhole principle, some state appears at least twice in this
sequence of states. Moreover, a repetition of states occurs within the reading of the first
m symbols of u. To fix notation, let q_j = q_l for 0 ≤ j < l ≤ m. Then take
x = σ_1 σ_2 · · · σ_j, y = σ_{j+1} · · · σ_l, z = σ_{l+1} · · · σ_{m+k}, with l ≤ m, so that ℓ(xy) ≤ m.
Observe that this notation allows x to be the empty string. This happens when j = 0,
that is, when the state q_0 = s is repeated. Writing u = xyz, we see that y ≠ ε,
ℓ(xy) ≤ m, and D accepts x y^n z for every n ∈ N.
Choose the repeated state, say, q, in such a way that no other state is repeated
between its two chosen occurrences, and q itself does not occur again between
these two occurrences.
Obviously such a choice can always be made. With this choice of q, we see that the
length of the string y has to be less than or equal to the total number of states. This
completes the proof.
The proof of the pumping lemma can be summarized easily. It says that once the
prefix x of u has been read by D, the DFA enters a certain state, say, q. Then it reads
the nonempty substring y and, on finishing this portion, re-enters the state q. Thus any
number of copies of y can be pumped into the string, and D would accept all
such new strings.
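This summary can be turned into a small experiment. The sketch below (the three-state DFA over {a} accepting strings whose length is divisible by 3 is an invented illustration, not taken from the text) records the states visited, finds the first repeated state by the pigeonhole argument, and checks that pumping y keeps the string in the language.

```python
def run(delta, s, u):
    """Return the sequence of states q_0, q_1, ..., visited while reading u."""
    states = [s]
    for sym in u:
        states.append(delta[(states[-1], sym)])
    return states

def decompose(delta, s, u):
    """Split u = xyz at the first repeated state, as in the proof:
    y is nonempty and len(xy) does not exceed the number of states."""
    seen = {}
    for pos, q in enumerate(run(delta, s, u)):
        if q in seen:                      # pigeonhole: some state repeats
            j, l = seen[q], pos
            return u[:j], u[j:l], u[l:]
        seen[q] = pos
    raise ValueError("u must be longer than the number of states")

# a three-state DFA over {a} accepting strings of length divisible by 3
delta = {(i, 'a'): (i + 1) % 3 for i in range(3)}
accepts = lambda u: run(delta, 0, u)[-1] in {0}   # F = {0}, start state 0

x, y, z = decompose(delta, 0, 'a' * 6)
assert y != '' and len(x + y) <= 3
# pumping y any number of times keeps the string in the language
assert all(accepts(x + y * n + z) for n in range(5))
```

Here the loop is entered immediately, so x = ε and y drives the machine once around the cycle.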
Observe that the number m depends upon the regular language L, as the pumping
lemma shows. The proof uses the number of states in a DFA accepting L as this m.
This m is therefore not unique, for you can always construct a DFA with a larger number
of states to accept the same language. However, there is a minimum such
m, which hints that perhaps a DFA with a minimum number of states can be
constructed to accept a given regular language. This problem is addressed under the
banner of minimization of states.
Sometimes it is easier to appeal to the proof of the pumping lemma rather than to
its statement. There are thus many strengthenings of the pumping lemma; some of
them are in the exercises at the end of the chapter. However, I would suggest that you
apply the proof method itself rather than any strengthening of the pumping lemma to
individual problems. With a good choice of the string xyz, the pumping lemma makes
things easy. See the following example.
Example 4.5. Show that L = {w ∈ {a, b}* : w^R = w}, the set of all palindromes over
{a, b}, is not a regular language.
Solution. Assume that L is regular. Let m be the number provided by the pumping
lemma. Choose w = a^m b a^m. If w = xyz with y ≠ ε and ℓ(xy) ≤ m, then y = a^k for some
k ≥ 1. Now, the string x y^2 z = a^{m+k} b a^m is not a palindrome. This violates the pumping lemma,
and therefore, L is not regular.
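The argument of Example 4.5 can be checked exhaustively for a small value of m. In the sketch below, m = 5 is an arbitrary choice of this illustration; every admissible split of a^m b a^m is tried.

```python
m = 5                                   # an arbitrary small choice
w = 'a' * m + 'b' + 'a' * m

def is_palindrome(u):
    return u == u[::-1]

assert is_palindrome(w)

# every split w = xyz with y nonempty and len(xy) <= m has y = a^k,
# and pumping y once more destroys the palindrome property
for i in range(m + 1):                  # x = w[:i]
    for j in range(i + 1, m + 1):       # y = w[i:j], so len(xy) = j <= m
        x, y, z = w[:i], w[i:j], w[j:]
        assert set(y) == {'a'}
        assert not is_palindrome(x + y * 2 + z)
```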
Sometimes, the closure properties (Theorem 4.2) also help in proving that certain
languages are not regular. This is only a method of reduction, where you must know
beforehand that certain other languages are not regular.
Example 4.7. Is L = {u ∈ {a, b}* : #a(u) = #b(u)} regular?
Solution. Suppose that L is regular. As a*b* is regular, L_1 = L ∩ a*b* must also be
regular by Theorem 4.2. But L_1 = {a^m b^m : m ∈ N} is not regular. Hence L is not a
regular language.
Application of the pumping lemma can be viewed as a game with a demon. Given a
language L ⊆ Σ*, you want to show that it is not regular, while the demon claims
that it is regular. The demon chooses m. You choose w ∈ L such that ℓ(w) > m.
The demon picks strings x, y, z ∈ Σ* in such a way that w = xyz and y ≠ ε. You
choose n ∈ N. Now, you win if x y^n z ∉ L, and the demon wins otherwise. Here, of
course, we assume that both you and the demon make the best choices possible.
That means it is not enough for you to win once; you must have a strategy to win.
For instance, in Example 4.10 above, the demon might choose any m. Then you
choose w as a^{(m+3)!}. It is a strategic choice, not a particular one. Next, the demon
chooses x = a^p, y = a^q, z = a^r with q > 0. You choose n = 2. As x y^2 z ∉ L, you
have won; and the language is not regular.
For regular languages over a single-letter alphabet (say, {a}), the pumping lemma
can be strengthened in the sense that the suffix z in the rewriting of u can be chosen to
be ε. This is so because a DFA that accepts such a language has a diagram with the
property that from any state only one arrow can go out to another state (if at all any arrow
goes out of it). Now just follow the arrows from the initial state successively. If there
is a string of length more than the number of states, then a state has to repeat. Take
the first such occurrence of a repetition; this is the second occurrence
of that state. Any sufficiently long string that is accepted must be accepted inside this loop.
That is, one of the states inside this loop is a final state.
Write a^i for the string that corresponds to the sequence of states before the first
occurrence of this state, a^j for the loop, and a^k for the path from the first oc-
currence of the repeated state to one such final state. Then the string a^{i+k} is accepted
without completing a loop. And when we go through the loop, say, n times, the string
a^{i+nj+k} is also accepted. This amounts to asserting that for any such language L, if
a^m ∈ L for some m ≥ i (take m = i + k), then there is also a j such that
a^{m+nj} ∈ L for all n ∈ N.
Surprisingly, its converse also holds. That is, for a language over a single-letter
alphabet, if this property holds, then the language must be regular. To show this, you
just start with the given numbers i, j, k with k ≤ j, and then construct a DFA whose
diagram consists of a chain of i states followed by a cycle through j states, taking the
(i + k)th state as a final state.
This result is put formally by defining an ultimately periodic subset of N. A subset
P ⊆ N is called ultimately periodic provided there exist natural numbers n ≥ 0,
k ≥ 1 such that for all m ≥ n, we have m ∈ P iff m + k ∈ P. For example, the set of
all even natural numbers is ultimately periodic, as, with n = 0, k = 2, we see that for
each natural number m, m is even iff m + 2 is even. We have thus proved that a language
L over a one-letter alphabet {a} is regular iff {n ∈ N : a^n ∈ L} is an ultimately periodic
subset of N.
What happens if a language is not over a one-letter alphabet? Can Theorem 4.5
help in getting some idea about the set of lengths of its strings? Suppose L is any
regular language and we have a DFA D that accepts it. If you rename every symbol
in the alphabet of L as a, then D would become an NFA. The language L accepted
4.16. Show that there is no DFA that accepts all (and only) palindromes over {a, b}.
4.18. Show that the following languages over {a, b} are not regular:
(a) {a^n b a^n : n ≥ 1}.
(b) {a^n : n is a perfect cube}.
(c) {w : #a(w) < #b(w)}.
(d) {w b^n : w ∈ {a, b}*, ℓ(w) = n}.
(e) {(ab)^m b^n : m > n ≥ 0}.
(f) {a^m b^n : m ≠ n, m, n ∈ N}.
(g) {a^m b^n a^k : k ≥ m + n}.
(h) {a^n b^{2n} : n ≥ 1}.
(i) {a^m b^n : 0 < m < n}.
(j) {a^m b^n a^k : k ≠ m + n}.
(k) {a^m b^n a^k : m = n or n ≠ k}.
(l) {a^m b^n : m ≤ n}.
(m) {w ∈ {a, b}* : #a(w) ≠ #b(w)}.
(n) {ww : w ∈ {a, b}*}.
(o) {w^R w : w ∈ {a, b}*}.
(p) {www^R : w ∈ {a, b}*}.
(q) {a^m b^n : m > n} ∪ {a^m b^n : m + 1 ≠ n}.
(r) {u w w^R v : u, v, w ∈ {a, b}^+}.
(s) {w w^n v : v, w ∈ {a, b}^+, n ≥ 1}.
(t) {w w̄ : w ∈ {a, b}*}, where w̄ is the string obtained from w by changing each a to b
and each b to a simultaneously.
104 4 Structure of Regular Languages
4.25. Show that the set of balanced parentheses is not a regular language.
4.26. Formal languages can be used to describe various two-dimensional figures. For
example, let Σ = {u, d, l, r}. Interpret the symbols as drawing a line segment of
one unit from the current position in the directions "up, down, left, right," respec-
tively. A string such as urdl draws a square.
(a) Draw the figures corresponding to the expressions (rd)*, (urddru)*, and (ruldr)*.
(b) Find a necessary and sufficient condition on an expression E over Σ so that the
figures it represents are closed contours.
(c) Let L be the set of all w ∈ {u, d, l, r}* that describe rectangles. Show that L is
not a regular language.
4.27. Use Theorem 4.3 to derive Theorem 4.6 from Theorem 4.5.
Consider the language L = (10 ∪ 101)* of Example 2.20. I have intrigued you by
claiming that possibly no DFA simpler than the one given in Fig. 1.19 can be constructed
to accept the same language L. What I mean by "simpler" is that the number of
states is smaller. Let us name the states of the DFA as in Fig. 4.3.
Fig. 4.3. The DFA of Fig. 1.19 with its states named s, p, q, r (transitions labeled 0, 1).
The DFA in Fig. 4.3 is M = (Q, Σ, δ, s, F), where Q = {s, p, q, r}, Σ = {0, 1},
F = {s, q, r}, and the transition function δ : Q × Σ → Q is given by
δ(s, 1) = p, δ(p, 0) = q, δ(q, 1) = r, δ(r, 0) = q, δ(r, 1) = p,
with δ undefined on the remaining state–symbol pairs. Now consider the DFA M′ of
Fig. 4.4, obtained from M by adding an extra state t.
Fig. 4.4. The DFA M′: M of Fig. 4.3 together with an extra state t that cannot be reached from s.
Clearly, L(M′) = L(M). Reason? No string can drive the state s to the state t in M′!
In the transition diagram you see that there is no path from the initial state to t. In
such a case, the state t is one that is inaccessible from the initial state.
Formally, we say that a state q in a DFA D = (Q, Σ, δ, s, F) is an accessible state
if there is some string u ∈ Σ* such that (s, u) ⊢* (q, ε) (equivalently, δ*(s, u) = q).
A state that is not accessible is said to be an inaccessible state. It is obvious that a
DFA can be cleaned up by simply deleting all its inaccessible states and the corre-
sponding transitions. The resulting DFA still accepts the same language as the
old one. If two automata accept the same language, we say that they are equivalent,
as usual. That is, the DFA (or NFA) D_1 is equivalent to the DFA (or NFA) D_2 iff
L(D_1) = L(D_2).
As long as we are interested in the equivalence of DFAs, we assume, without loss
of generality, that no DFA has an inaccessible state. Further, we will assume that the
transition function of every DFA is a total function; that is, δ : Q × Σ → Q is
defined everywhere. This assumption again does not affect generality, as a state of no
return can always be added to a DFA whose δ is a partial function. For example, to the
DFA of Fig. 4.3, we can add another state, say, t, with all missing transitions directed
to it and a loop on t labeled with all symbols from Σ. To see this, observe how the
DFA M″ of Fig. 4.5 works on the inputs 1010, 1110, 10101, and 10101101. You find
that
(s, 1010) ⊢* (q, ε), (s, 1110) ⊢* (t, ε), (s, 10101) ⊢* (r, ε), (s, 10101101) ⊢* (r, ε).
Fig. 4.5. The DFA M″: M of Fig. 4.3 completed with a state of no return t; all missing transitions are directed to t, and t loops to itself on 0 and 1.
Notice that a computation of M″ that ends in the state t corresponds to
an abrupt halt in the earlier DFA of Fig. 4.3. Take, for example, the computation
4.4 Myhill–Nerode Theorem 107
with input 1110. In the DFA of Fig. 4.3, we would have an abrupt halt with the
configuration (p, 110). The converse also holds, that is, when the earlier DFA halts
abruptly, the new DFA ends up in t, the added state of no return.
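The completion of a partial DFA by a state of no return is easy to mechanize. In the sketch below, the partial transition table is a reconstruction of the DFA of Fig. 4.3 for (10 ∪ 101)* and should be treated as an assumption of this illustration; the four computations displayed above are then reproduced.

```python
# partial transition table of the DFA of Fig. 4.3 (a reconstruction,
# assumed here, of a DFA for (10 ∪ 101)*)
partial = {('s', '1'): 'p', ('p', '0'): 'q', ('q', '1'): 'r',
           ('r', '0'): 'q', ('r', '1'): 'p'}
states, sigma, start, final = {'s', 'p', 'q', 'r'}, {'0', '1'}, 's', {'s', 'q', 'r'}

def complete(delta, states, sigma, trap='t'):
    """Make delta total by sending every missing transition to a trap state."""
    total = dict(delta)
    for q in states | {trap}:
        for a in sigma:
            total.setdefault((q, a), trap)
    return total

delta = complete(partial, states, sigma)

def run(u):
    q = start
    for a in u:
        q = delta[(q, a)]
    return q

# the four computations displayed in the text
assert run('1010') == 'q'
assert run('1110') == 't'
assert run('10101') == 'r'
assert run('10101101') == 'r'
assert run('10') in final and run('11') not in final
```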
We are interested in minimizing the states of a DFA having no inaccessible states
and whose transition function is a total function. Let us call such DFAs simple
DFAs.
This assumption of simplicity helps in presenting the minimization in a rather easy
way. The ease comes from the fact that computation in such a DFA on any string
in Σ* will terminate normally, leaving the DFA in some state. It suggests that we can
possibly categorize the strings in Σ* by looking at the terminal state of the DFA.
For example, the above computation suggests that the strings 10101 and 10101101
are in the same category, as they drive the DFA M″ (of Fig. 4.5) to the state r, start-
ing from the initial state. Notice that the categories or classes depend on the DFA in
consideration. We may start with a relation, say, ≡_D, induced by the DFA D on Σ*.
Let D = (Q, Σ, δ, s, F) be a simple DFA. Let u, v ∈ Σ* be any strings. We
write u ≡_D v iff (s, u) ⊢*_D (t, ε) and (s, v) ⊢*_D (t, ε) for the same state t ∈ Q.
Equivalently, u ≡_D v iff δ*(s, u) = δ*(s, v). The binary relation ≡_D is said to be the
relation induced by D on Σ*. You may read u ≡_D v as u is related to v by D.
Whenever we have only one DFA in question, we will omit the subscript D from
the symbol ≡_D and write it as ≡; similarly, let us abbreviate ⊢_D to ⊢. The relation ≡
is reflexive as δ is a total function. Trivially, ≡ is symmetric. Now, if u ≡ v and v ≡ w,
with (s, u) ⊢* (t, ε), then (s, v) ⊢* (t, ε) and hence (s, w) ⊢* (t, ε) for the same t;
thus ≡ is a transitive relation. We conclude that ≡ is an equivalence relation on Σ*.
We thus say that ≡, that is, ≡_D, is the equivalence relation on Σ* induced by the
simple DFA D.
The relation ≡ then decomposes Σ* into equivalence classes. A typical equiva-
lence class contains all strings that drive the DFA D from its initial state to some
particular state in Q. Thus, there are exactly |Q| such equivalence classes,
because inaccessible states have already been cleaned out of D, so that correspond-
ing to each state, there is at least one string in Σ* that drives the DFA to this state.
An equivalence relation on Σ* that has a finite number of equivalence classes is
said to be a relation of finite index, the index being the number of such equivalence
classes. In this terminology, we see that the relation ≡ is of finite index, the index
being |Q|.
If both u, v ∈ Σ* are in the same equivalence class, then there is some state, say,
t ∈ Q such that (s, u) ⊢* (t, ε) and (s, v) ⊢* (t, ε). We will write this equivalence
class as [t]. We see that [t] = {u ∈ Σ* : (s, u) ⊢* (t, ε)} = {u ∈ Σ* : δ*(s, u) = t}.
This notation for an equivalence class [x] is unlike the usual notation, where x is an
element of the class. But it is convenient for us.
The relation ≡, moreover, satisfies another elegant property. Let u, v, w ∈ Σ*
with u ≡ v. Let (s, u) ⊢* (t, ε) and (s, v) ⊢* (t, ε) for a state t ∈ Q. Now,
(s, uw) ⊢* (t, w) ⊢* (q, ε) for some q ∈ Q. Also, (s, vw) ⊢* (t, w) ⊢* (q, ε).
Therefore, uw ≡ vw.
An equivalence relation ~ on Σ* such that u ~ v implies uw ~ vw for every w ∈ Σ*
is called a right invariant relation. We have shown that ≡ is a right invariant relation.
Using the fact that L(D) is the set of all strings that drive the DFA D from its
initial state to one of the final states, we obtain the following:
Lemma 4.1. Let D = (Q, Σ, δ, s, F) be a simple DFA. Then the relation ≡ on Σ* in-
duced by D is a right invariant equivalence relation of finite index. It decomposes Σ*
into |Q| equivalence classes; that is, the index of ≡ is |Q| and Σ* = ∪_{q∈Q} [q].
Moreover, L(D) = ∪_{t∈F} [t].
Each equivalence class [t] is regular, as the DFA D_t = (Q, Σ, δ, s, {t}) accepts
it. You can now view Lemma 4.1 in a different way. If L is a regular language,
then there is a DFA D such that L(D) = L. This D gives rise to a right invariant
equivalence relation of finite index, namely ≡_D, each of whose equivalence classes is a
regular language. Remarkably, the converse of this statement also holds.
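The decomposition of Σ* described in Lemma 4.1 can be computed for strings up to a fixed length. The sketch below uses the completed DFA M″ for (10 ∪ 101)* (its transition table is reconstructed here and is an assumption of this illustration) and groups strings by the state δ*(s, u).

```python
from itertools import product

# completed DFA M'' for (10 ∪ 101)* (transition table reconstructed,
# an assumption of this sketch); missing transitions go to the trap t
delta = {('s', '1'): 'p', ('p', '0'): 'q', ('q', '1'): 'r',
         ('r', '0'): 'q', ('r', '1'): 'p'}
for q in 'spqrt':
    for a in '01':
        delta.setdefault((q, a), 't')
start, final = 's', {'s', 'q', 'r'}

def dstar(u):
    q = start
    for a in u:
        q = delta[(q, a)]
    return q

# classes[t] collects all strings of length <= 6 that drive M'' to state t
classes = {}
for n in range(7):
    for tup in product('01', repeat=n):
        u = ''.join(tup)
        classes.setdefault(dstar(u), set()).add(u)

# one class per state, and L(M'') is the union of the final-state classes
assert set(classes) == {'s', 'p', 'q', 'r', 't'}
accepted = set().union(*(classes[t] for t in final))
assert '' in accepted and '10' in accepted and '101' in accepted
assert '11' not in accepted
```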
3. (q_1, u) ⊢* (q_j, ε) iff C_1 u ⊆ C_j.
Notice that statement (3) can also be rephrased as δ*(q_1, u) = q_j iff C_1 u ⊆ C_j.
We show that L(D_i) = C_i for 1 ≤ i ≤ n. So, let u ∈ C_i. As ε ∈ C_1, u ∈ C_1 u. By
(2), we have C_1 u ⊆ C_i. By (3), we see that (q_1, u) ⊢* (q_i, ε) (i.e., δ*(q_1, u) = q_i). As
q_i is a final state, u ∈ L(D_i). Conversely, let u ∈ L(D_i). Then (q_1, u) ⊢* (q_i, ε) (i.e.,
δ*(q_1, u) = q_i). By (3), C_1 u ⊆ C_i. By (1), u ∈ C_i.
Lemma 4.2 is essentially due to J. Myhill and A. Nerode. We combine Lem-
mas 4.1 and 4.2 and summarize the above discussion as follows:
The Myhill–Nerode theorem can be used to show that certain languages are not regu-
lar; see the following example:
4.29. Design a DFA with fewer states equivalent to one with initial state p, final
states q, r, s, t, and having transitions:
δ( p, 0) = q, δ( p, 1) = t, δ(q, 0) = δ(r, 0) = r, δ(q, 1) = δ(r, 1) = s, δ(s, 0) = u,
δ(s, 1) = δ(t, 0) = δ(t, 1) = δ(u, 1) = u, δ(u, 0) = δ(v, 0) = δ(v, 1) = v.
4.30. Construct a DFA with fewer states that accepts the same language as the DFA
D = ({ p, q, r, s, t}, {0, 1}, δ, p, {s, t}) with δ( p, 0) = q, δ( p, 1) = δ(q, 0) =
δ(r, 0) = r, δ(q, 1) = δ(s, 0) = δ(s, 1) = s, δ(r, 1) = δ(t, 0) = δ(t, 1) = t.
4.32. Show that if a language has a string of minimum length n, then any DFA that
accepts this language must have at least n + 1 states.
The Myhill–Nerode theorem can be used to prove the existence of a minimal DFA, one
with the minimum number of states. Let L be a regular language over an alphabet Σ.
We then have a right invariant equivalence relation ∼ on Σ* of finite index such that
L is the union of some of the equivalence classes of ∼. Let C_1, C_2, . . . , C_n be the
equivalence classes of ∼. Notice that our construction of such a relation was done
through a DFA accepting L. In that case, the relation ∼ is simply the relation induced
by the DFA, and n is the number of states in the DFA. However, there is no guarantee
that every relation having the properties in the Myhill–Nerode theorem must be one
associated with a DFA. Is there?
We will try to find another relation of finite index, aiming to choose one whose
index is minimum. Later, we will connect this minimum index to a DFA. The idea is
to look at the definition of right invariance. It demands only a conditional statement:
if u ∼ v, then uw ∼ vw. So let us define the relation ρ_L on Σ* by:
u ρ_L v iff for every w ∈ Σ*, uw ∈ L iff vw ∈ L.
We will write ρ_L as ρ, omitting the subscript L, if the language L is clear from the
context. Our aim is to show that such a relation generates the minimum number of
equivalence classes among all those given in Theorem 4.7.
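The index of ρ_L can be explored empirically by merging strings that no short suffix distinguishes. This bounded test can only over-merge, so a count of classes that keeps growing with the bounds is evidence (not proof) of infinite index; the particular languages and bounds below are choices of this sketch.

```python
from itertools import product

def signature(L, u, sigma, bound):
    """Membership pattern of u·w in L over all suffixes w of length <= bound."""
    return tuple(L(u + ''.join(w))
                 for n in range(bound + 1)
                 for w in product(sigma, repeat=n))

def count_classes(L, sigma, prefix_len, bound):
    """Number of distinct signatures among all prefixes of length <= prefix_len."""
    return len({signature(L, ''.join(t), sigma, bound)
                for n in range(prefix_len + 1)
                for t in product(sigma, repeat=n)})

# a regular language: strings over {a,b} ending in ab; three classes
L1 = lambda w: w.endswith('ab')
assert count_classes(L1, 'ab', prefix_len=6, bound=4) == 3

# {a^m b^m}: the number of distinguishable prefixes keeps growing
L2 = lambda w: len(w) % 2 == 0 and w == 'a' * (len(w) // 2) + 'b' * (len(w) // 2)
small = count_classes(L2, 'ab', prefix_len=4, bound=4)
big = count_classes(L2, 'ab', prefix_len=8, bound=8)
assert small < big
```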
First of all, is ρ an equivalence relation? Trivially, for all w ∈ Σ*, uw ∈ L iff
uw ∈ L, showing that ρ is reflexive. Symmetry of ρ is equally trivial. If for all
w ∈ Σ*, uw ∈ L iff vw ∈ L, and for all w ∈ Σ*, vw ∈ L iff zw ∈ L, then clearly, for
all w ∈ Σ*, uw ∈ L iff zw ∈ L. Thus, ρ is transitive. Therefore, ρ is an equivalence
relation. Further, ρ is right invariant from its very definition. In general, ρ need not
be of finite index. But when L is a regular language, ρ (i.e., ρ_L) is a relation of finite
index; this is shown in the following statement:
q1 ≈ q2 . By Exercise 4.8, δ(q1 , σ) ≈ δ(q2 , σ), and then (δ(q1 , σ)) = (δ(q2 , σ)) . Our
aim is to show that the DFA D = (Q , Σ, δ , s , F ) is a minimal DFA accepting the
same language as D.
First, we must check that L(D ) = L(D). To see this, suppose u ∈ L(D). Then
(s, u) D (t, ε) for some t ∈ F. From the definition of δ (using induction on
length of u), it is obvious that (s , u) D (t , ε). Further, t ∈ F as t ∈ F. That is,
u ∈ L(D ). Conversely, suppose u ∈ L(D ). Then (s , u) D (t , ε) for some t ∈ F .
Again, from the definition of δ , we have (s1 , u) D (t1 , ε), where t1 ∈ F, s ≈ s1 ,
and t ≈ t1 . It then follows that (s, u) D (t1 , ε) for t1 ∈ F, that is, u ∈ L(D).
We next compare the number of equivalence classes of ρ_L with the number of
states in Q′. Recollect that ρ_L is supposed to give us a minimal DFA via its equiva-
lence classes.
Now, suppose |Q′| > n, where n is the index of ρ_L. Then there are two distinct
states (equivalence classes, looking from Q), say, p′ and q′ in Q′, and strings x, y ∈
Σ* such that when D′ gets the inputs x and y, it reaches p′ and q′, respectively, starting
from its initial state s′, while the strings x and y are in the same equivalence
class of ρ_L. This means that (s′, x) ⊢*_{D′} (p′, ε) and (s′, y) ⊢*_{D′} (q′, ε). As we have
defined the transition function δ′, the yields can be seen in D itself. That is, we have
(s, x) ⊢*_D (p, ε), (s, y) ⊢*_D (q, ε), p ≉ q, x ρ_L y.
As p ≉ q, there is z ∈ Σ* such that (p, z) ⊢*_D (r, ε) and (q, z) ⊢*_D (t, ε), where one
of r or t is in F and the other is not. Now,
(s, xz) ⊢*_D (p, z) ⊢*_D (r, ε) and (s, yz) ⊢*_D (q, z) ⊢*_D (t, ε).
As only one of r, t is in F, this shows that xz and yz are not related by ρ_L,
contradicting the right invariance of ρ_L (as x ρ_L y). Hence |Q′| ≤ n, the in-
dex of ρ_L. But the index of ρ_L is the minimum number of states required for any DFA
accepting L (Theorem 4.8). Therefore, D′ is a minimal DFA.
We summarize the above discussion as in the following:
Theorem 4.9. Let D = (Q, Σ, δ, s, F) be a DFA and let D′ = (Q′, Σ, δ′, s′, F′) be its
quotient DFA as constructed earlier, modulo state equivalence. Then L(D′) = L(D),
and D′ has the minimum number of states for accepting L(D).
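The quotient construction of Theorem 4.9 can be sketched on a tiny DFA. The three-state DFA below is invented for illustration; its states A and B turn out to be state equivalent, so the quotient has two states.

```python
from itertools import combinations

Q, F, start, sigma = ['A', 'B', 'C'], {'C'}, 'A', '01'
delta = {('A', '0'): 'B', ('A', '1'): 'C', ('B', '0'): 'A', ('B', '1'): 'C',
         ('C', '0'): 'C', ('C', '1'): 'C'}

# table filling: mark pairs that are provably not state equivalent
marked = {frozenset(pr) for pr in combinations(Q, 2)
          if (pr[0] in F) != (pr[1] in F)}
changed = True
while changed:
    changed = False
    for p, q in combinations(Q, 2):
        if frozenset((p, q)) not in marked and any(
                delta[(p, a)] != delta[(q, a)] and
                frozenset((delta[(p, a)], delta[(q, a)])) in marked
                for a in sigma):
            marked.add(frozenset((p, q)))
            changed = True

# unmarked pairs are equivalent; each state goes to its class q'
cls = {q: frozenset(r for r in Q if r == q or frozenset((q, r)) not in marked)
       for q in Q}
Qp = set(cls.values())                                   # states of D'
deltap = {(cls[q], a): cls[delta[(q, a)]] for q in Q for a in sigma}
startp, Fp = cls[start], {cls[q] for q in F}

assert len(Qp) == 2
assert cls['A'] == cls['B'] and cls['A'] != cls['C']
```

The transition function deltap is well defined precisely because state equivalence is a congruence with respect to δ.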
The construction of the quotient DFA that uses state equivalence can be written as
an algorithm. The idea is to tick off the nonequivalent states and keep the equivalent
ones, starting from all pairs of states. As we are computing with an equivalence
relation, there is no need to start with all ordered pairs; half of them are
sufficient, taking symmetry into consideration. The algorithm goes as follows:
ALGORITHM State Minimization
Suppose we have ordered the set of all states Q of the simple DFA D, say, Q =
{q_1, . . . , q_n}. We consider the set E = {(q_i, q_j) : 1 ≤ i < j ≤ n}. We want to mark
off all those pairs in E whose components are not state equivalent.
Notice that when a state p is a final state and a state q is not a final state, then
they cannot be state equivalent. So, initially, we mark all the pairs (p, q) where one
of them is in F and the other is not. These are the pairs that are easily seen to be
nonequivalent.
4.5 State Minimization 113
Example 4.12. Construct a minimal DFA equivalent to the simple DFA whose tran-
sition diagram is given in Fig. 4.6.
Fig. 4.6. Transition diagram of a simple DFA M with states q1, . . . , q7 over {0, 1}.
Solution. We order the states by their subscripts and form the
unordered pairs (q_i, q_j) for 1 ≤ i < j ≤ 7. Initially, we mark the pairs (p, q), where
p ∈ F and q ∈ Q − F. Owing to symmetry, we also mark the pairs (p, q), where
p ∈ Q − F and q ∈ F.
While doing this manually, as we are doing now, it is convenient to construct a trian-
gular array keeping blank slots (to be marked) corresponding to rows and columns.
See Fig. 4.7.
q6
q5
q4
q3  ×   ×   ×   ×
q2                  ×
q1                  ×
    q7  q6  q5  q4  q3  q2
Fig. 4.7. Working array for Example 4.12 after the initial marking.
The rows correspond to the states q_1 to q_6 and the columns to the states
q_2 to q_7. We omit q_1 from the columns and q_7 from the rows; they are not
required, as any state is always equivalent to itself. Moreover, we need not consider
any entry beyond the triangular region bounded by the horizontal line, the vertical
line, and the line joining the topmost point of the vertical line to the rightmost point
of the horizontal line. We want to mark those pairs (p, q) where p is not equivalent
to q (and q is not equivalent to p). After the initial marking (with a ×), our array
looks like that in Fig. 4.7.
As q_3 is the only final state, we have marked the pairs
(q_1, q_3), (q_2, q_3), (q_3, q_4), (q_3, q_5), (q_3, q_6), (q_3, q_7).
Next, consider the pair (q_1, q_2). We compute (δ(q_1, 0), δ(q_2, 0)) = (q_6, q_3).
The pair (q_3, q_6) has already been marked. Therefore, the pair (q_1, q_2) is also marked.
Similarly, the pairs (q_1, q_4), (q_1, q_6), and (q_1, q_7) are also marked, as
(δ(q_1, 0), δ(q_4, 0)) = (q_6, q_3), (δ(q_1, 0), δ(q_6, 0)) = (q_6, q_3), (δ(q_1, 0), δ(q_7, 0)) = (q_6, q_3).
For the pair (q_1, q_5), we compute
(δ(q_1, 0), δ(q_5, 0)) = (q_6, q_6), (δ(q_1, 1), δ(q_5, 1)) = (q_2, q_7).
Here the input symbol 0 lands us in
(q_6, q_6), from which nothing can be concluded. However, the input symbol 1 gives some
information. It says that
if the states q_2, q_7 are not equivalent, then q_1, q_5 are also not equivalent.
So, we must record this relation between the pairs, to be used later.
For example, if we find later, during the execution of the algorithm, that the
pair (q_2, q_7) is marked, then this will lead us to mark the pair (q_1, q_5) also. Let us
write this association as (q_2, q_7) → (q_1, q_5).
Next, we consider the pairs (q_2, q_i) for i = 4, 5, 6, 7. We see that
(δ(q_2, 0), δ(q_4, 0)) = (q_3, q_3), (δ(q_2, 1), δ(q_4, 1)) = (q_6, q_7),
(δ(q_2, 0), δ(q_5, 0)) = (q_3, q_6), (δ(q_2, 0), δ(q_6, 0)) = (q_3, q_3),
(δ(q_2, 1), δ(q_6, 1)) = (q_6, q_7), (δ(q_2, 0), δ(q_7, 0)) = (q_3, q_3),
(δ(q_2, 1), δ(q_7, 1)) = (q_6, q_4).
Thus, the pair (q_2, q_5) is marked, and the association of pairs is updated to
(q_2, q_7) → (q_1, q_5), (q_6, q_7) → (q_2, q_4), (q_6, q_7) → (q_2, q_6), (q_6, q_4) → (q_2, q_7).
Next, we consider the pairs (q_4, q_i) for i = 5, 6, 7. We see that
(δ(q_4, 0), δ(q_5, 0)) = (q_3, q_6), (δ(q_4, 0), δ(q_6, 0)) = (q_3, q_3),
(δ(q_4, 1), δ(q_6, 1)) = (q_7, q_7), (δ(q_4, 0), δ(q_7, 0)) = (q_3, q_3),
(δ(q_4, 1), δ(q_7, 1)) = (q_7, q_4).
This tells us that the pair (q_4, q_5) is marked, and no new update of the association of
pairs is possible.
For the pairs (q_5, q_6) and (q_5, q_7), we see that
(δ(q_5, 0), δ(q_6, 0)) = (q_6, q_3), (δ(q_5, 0), δ(q_7, 0)) = (q_6, q_3).
Therefore, the pairs (q_5, q_6) and (q_5, q_7) are marked. For the pair (q_6, q_7), we find that
(δ(q_6, 0), δ(q_7, 0)) = (q_3, q_3), (δ(q_6, 1), δ(q_7, 1)) = (q_7, q_4).
From this, nothing can be concluded. However, the last equality says that the associ-
ation of pairs must be updated by adding (q_7, q_4) → (q_6, q_7).
q6
q5  ×   ×
q4          ×
q3  ×   ×   ×   ×
q2          ×       ×
q1  ×   ×       ×   ×   ×
    q7  q6  q5  q4  q3  q2
Fig. 4.8. Working array for Example 4.12, continued.
In the final round, we must exploit the associations, using them recursively. As
(q_2, q_7) is unmarked, the first association cannot be used right now. The pair (q_6, q_7)
is unmarked; this also cannot be used right now. Other pairs such as (q_6, q_4) and
(q_7, q_4) are again unmarked. That is, none of the pairs to the left of → is marked.
This round does not mark any of the hitherto unmarked pairs, and we conclude
that the vacant entries in the triangular array correspond to equivalent states. But
which are they?
Look at the vacant entries in the column labeled q_7. They say that the state q_7 is
equivalent to each of (reading from the rows) q_2, q_4, q_6. Similarly, q_6 is equivalent to q_2
and q_4; q_5 is equivalent to q_1; and q_4 is equivalent to q_2. That is, the equivalence
classes are p = {q_1, q_5}, q = {q_3}, and r = {q_2, q_4, q_6, q_7}.
To construct the DFA with these equivalence classes, we look at the transi-
tion diagram of Fig. 4.6. We add a transition from one equivalence class to another
if there is a transition from at least one element of the first class to an element of the
second, keeping the same label. For example, there is a transition from state q_2 to q_3
labeled 0. Thus, we add a transition from the equivalence class {q_2, q_4, q_6, q_7} to {q_3}
labeled 0. Again, the final states will be those classes containing a final state of
the original DFA. We then have the minimized DFA as in Fig. 4.9.
Fig. 4.9. Minimized DFA for Example 4.12, with states p, q, r over {0, 1}.
Example 4.13. Construct a DFA with fewest states that accepts (10 ∪ 101)∗ .
Solution. A simple DFA accepting the language is given in Fig. 4.10.
Fig. 4.10. A simple DFA M accepting (10 ∪ 101)*, with states p, q, r, s and a state of no return t.
To minimize the number of states in the DFA M of Fig. 4.10, we start with a
triangular array and mark the pairs (p, q), (p, t), (r, q), (r, t), (s, q), (s, t), as F =
{p, r, s} and Q − F = {q, t}. At this point, make a triangular array and start the
process yourself. Then come back to this page and look at Fig. 4.11.
Next, compute (δ(qi , σ), δ(q j , σ)) for marking the nonequivalent states. You see
that
(δ( p, 0), δ(s, 0)) = (t, r ), (δ( p, 0), δ(r, 0)) = (t, t), (δ( p, 1), δ(r, 1)) = (q, s),
(δ(q, 0), δ(t, 0)) = (r, t), (δ(r, 0), δ(s, 0)) = (t, r ).
s  ×
r  ×
q      ×  ×
p  ×         ×
   t   s  r  q
Fig. 4.11. Working array for Example 4.13 after the initial marking.
For the pair (p, r), we see that (δ(p, 0), δ(r, 0)) = (t, t) gives no information, so we
have to compute (δ(p, 1), δ(r, 1)) = (q, s). We see that (t, r) and (q, s) are already
marked. Hence, all the pairs (p, s), (p, r), (q, t), (r, s) are marked nonequivalent.
(There are no associations of pairs for future marking here.) That is, no equivalent pair
exists in the DFA M. This means that the simple DFA M is already minimized.
However, this may not be a DFA with the fewest states; for there might be one
with fewer states that is not a simple DFA. This may happen due to a state of no
return in the simple DFA. For example, the DFA M has the state t, a state of no
return. Remove t from M and call this new DFA M′.
Now, is M′ a DFA with the fewest states? If not, then there is another DFA, say,
M″, with fewer than four states that accepts the language (10 ∪ 101)*. If there are
redundant states (unreachable from the initial state) in M″, delete them. Then add to
the resulting DFA a state of no return to make a simple DFA. Call the new DFA M‴.
The number of states in M‴ is no more than 4. But M‴ accepts the same language
as M. This contradicts the fact that M is a minimal simple DFA accepting the
language (10 ∪ 101)*. Therefore, M′ is a DFA with the fewest states. Check that this is
the DFA of Fig. 1.19.
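The marking procedure used in Examples 4.12 and 4.13 can be mechanized. In the sketch below, the transition table of M is reconstructed from the computations quoted in the solution; the entries for (q, 1), (s, 1), and (t, 1), which the text does not display, are assumptions completing the DFA for (10 ∪ 101)*. Instead of keeping association lists, the code simply iterates the marking until nothing changes, which has the same effect.

```python
from itertools import combinations

Q = ['p', 'q', 'r', 's', 't']
F = {'p', 'r', 's'}
delta = {('p', '1'): 'q', ('p', '0'): 't', ('q', '0'): 'r', ('q', '1'): 't',
         ('r', '1'): 's', ('r', '0'): 't', ('s', '0'): 'r', ('s', '1'): 'q',
         ('t', '0'): 't', ('t', '1'): 't'}

def nonequivalent_pairs(Q, F, delta, sigma='01'):
    # initial marking: a final and a nonfinal state are never equivalent
    marked = {frozenset(pr) for pr in combinations(Q, 2)
              if (pr[0] in F) != (pr[1] in F)}
    changed = True
    while changed:              # iterate instead of keeping association lists
        changed = False
        for p, q in combinations(Q, 2):
            if frozenset((p, q)) in marked:
                continue
            if any(frozenset((delta[(p, a)], delta[(q, a)])) in marked
                   for a in sigma if delta[(p, a)] != delta[(q, a)]):
                marked.add(frozenset((p, q)))
                changed = True
    return marked

marked = nonequivalent_pairs(Q, F, delta)
# every pair is marked: M is already minimal as a simple DFA
assert len(marked) == len(list(combinations(Q, 2)))
```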
4.33. Suppose in a DFA the states p, q, r are such that p, q are state equivalent but
p, r are not. Show that q, r are not state equivalent.
4.34. Design a minimal DFA that accepts L = L(a*b* ∪ b*a*). Also, give a regular
expression describing each of the equivalence classes of the Myhill–Nerode relation ρ_L.
4.35. Let L be the set of strings of balanced parentheses. What are the equivalence
classes of the relation ρL ?
4.36. Write a precise algorithm for state minimization employing the idea of making
a triangular array and the association of pairs via →.
4.37. Construct DFAs with minimum number of states that accept the following lan-
guages and argue why the DFAs are minimal:
(a) {a^m b^n : m ≥ 2, n ≥ 1}.
(b) {a^n b : n ≥ 0} ∪ {b^n a : n ≥ 1}.
(c) {a^n : n ≠ 2 and n ≠ 4}.
(d) {a^n : n ≥ 0, n ≠ 3}.
4.38. Minimize the states of the DFA D = ({ p, q, r, s, t, u}, {0, 1}, δ, p, {t, u}),
where δ( p, 0) = q, δ( p, 1) = t, δ(q, 0) = p, δ(q, 1) = t, δ(r, 0) = q, δ(r, 1) = s,
δ(s, 0) = δ(s, 1) = t, δ(t, 0) = δ(t, 1) = u, δ(u, 0) = δ(u, 1) = u.
4.40. Show that state equivalence is an equivalence relation, but state inequivalence
is not an equivalence relation.
4.41. Minimize the DFAs ({a, b, c, d, e, f, g, h}, {0, 1}, δ, a, F), where F and δ are
as follows. Indicate which equivalence class corresponds to which state of the
new DFA.
In this chapter, we have discussed the closure properties of regular languages with
respect to the usual set theoretical operations, concatenation, asterate (Kleene star),
homomorphism, and inverse homomorphism. We have also tried to discover the peri-
odicity structures of regular languages, including the pumping lemma. These results
help us in showing the existence of nonregular languages constructively, as well as in
proving that some languages are regular provided some others are. We have also dis-
cussed an important result, the Myhill–Nerode theorem. An application of this result
has led us to a solution of the problem of minimizing the number of states in a DFA.
Closure properties of regular languages were studied by Ginsburg and some of his
collaborators [38, 41, 42], McNaughton and Yamada [85], Rabin and Scott [107], and
Stearns and Hartmanis [127], among others. For a general treatment of ultimate peri-
odicity and some other functions that preserve regularity, see [116]. The Myhill–Nerode
theorem was proved independently, in different forms, by Myhill [93] and Nerode
[95]. For minimization of DFAs, you may like to see [57, 60, 91] and the references
therein.
The pumping lemma for regular languages as treated here was found by Bar-Hillel
et al. [8]. This only gives a necessary condition for a language to be regular. For
stronger forms of pumping lemma giving both necessary and sufficient conditions,
see [29, 61, 126].
We have not discussed here the algebraic theory of automata and regular lan-
guages. In one of the exercises below, you will find Kleene algebra and its complete
axiomatization. This was introduced by Kozen [70]. The first complete axiomatiza-
tion of the algebra of regular languages was given by Salomaa [112]. For an extensive
treatment of the topic, see Conway [22]. See also [134] for algebraic recognizability
of languages. You may like to see [6] and the references therein.
As minimal NFAs are not necessarily unique up to isomorphism, we have not
discussed them here. In one of the exercises below we ask some pertinent questions. A
part of the Myhill–Nerode theorem can be generalized to NFAs [71] using the notion of
bisimulation, which is an important notion for concurrency; see, for example, [88].
You may also like the presentations of term automata and two-way finite automata
in Kozen [71].
An important application of DFAs is pattern matching with regular expressions;
see [4, 130]. For a nice example and an algorithm for search using regular expres-
sions, see [69].
4.43. Assume that all our NFAs have only a single final state. (Can you show that
there is no loss in generality in assuming this?) There are some simple modifications
for taking care of the operations of union, concatenation, and Kleene star as given in
the following:
(a) For union, merge the two initial states into one with all the transitions of both the
NFAs. Likewise, merge the two final states having all transitions to both of them
to go to the merged state.
(b) For concatenation, merge the final state of the first NFA with the initial state of
the second.
(c) For the Kleene star, add ε-transitions from the final state to the initial state.
Show that each of these modifications does the intended job.
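As a concrete check, here is a small sketch of construction (b), concatenation by merging the final state of the first NFA with the initial state of the second. The encoding and the names `concat_nfa`, `accepts` are ours, not the book's: an NFA is a 4-tuple (states, delta, start, final) with a single final state, and delta maps (state, symbol) to a set of states; ε-transitions are not needed for this particular construction.

```python
# A sketch of construction (b): concatenation by merging the final state
# of the first NFA with the initial state of the second. The encoding is
# ours, not the book's.

def concat_nfa(n1, n2):
    states1, d1, s1, f1 = n1
    states2, d2, s2, f2 = n2
    # tag the states of the second NFA to avoid name clashes,
    # except its initial state, which is merged with f1
    ren = {q: (q, 2) for q in states2}
    ren[s2] = f1
    delta = {k: set(v) for k, v in d1.items()}
    for (q, a), targets in d2.items():
        delta.setdefault((ren[q], a), set()).update(ren[t] for t in targets)
    return (states1 | {ren[q] for q in states2}, delta, s1, ren[f2])

def accepts(nfa, w):
    _, delta, s, f = nfa
    current = {s}
    for a in w:
        current = set().union(*(delta.get((q, a), set()) for q in current))
    return f in current
```

Concatenating an NFA for {a} with one for {b} yields an NFA accepting ab and rejecting a, as intended. Constructions (a) and (c) can be sketched similarly.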
4.44. Show that the set of all triples of bits that represent correct multiplication is not
a regular language.
4.46. Let L, L′ be regular languages over an alphabet Σ. Show that the following
languages are regular, or give a counterexample:
(a) L △ L′ = {w ∈ Σ∗ : w ∈ L or w ∈ L′, but not both}. (Symmetric Difference)
(b) nor(L, L′) = {w ∈ Σ∗ : w ∉ L and w ∉ L′}. (Neither–Nor)
(c) cor(L, L′) = {w ∈ Σ∗ : w ∉ L or w ∉ L′}. (Complementary Or)
(d) L/L′ = {w ∈ Σ∗ : wv ∈ L for some v ∈ L′}. (Right Quotient)
(e) L′ \ L = {w : u ∈ L′, uw ∈ L}. (Left Quotient)
(f) head(L) = {w : wu ∈ L for some u ∈ Σ∗}.
(g) tail(L) = {w : uw ∈ L for some u ∈ Σ∗}.
(h) prune2(L) = {w : στw ∈ L for some σ, τ ∈ Σ}.
(i) shift(L) = {σ2 σ3 · · · σn σ1 : σ1 σ2 · · · σn ∈ L}.
(j) swap(L) = {τuσ : σuτ ∈ L for some σ, τ ∈ Σ, u ∈ Σ∗}.
(k) shuffle(L1, L2) = {w1 v1 w2 v2 · · · wn vn : w1 w2 · · · wn ∈ L1, v1 v2 · · · vn ∈ L2}.
(l) minus5(L) = {abcdw : abcdew ∈ L}. (Removing the fifth symbol)
(m) rem2(L) = {w ∈ Σ∗ : abw ∈ L for some a, b ∈ Σ}.
(n) min(L) = {w ∈ L : there is no u ∈ L, v ∈ Σ⁺ with w = uv}.
(o) max(L) = {w ∈ L : if v ≠ ε, then wv ∉ L}.
(p) left(L) = {w : ww^R ∈ L}.
(q) half(L) = {u : uv ∈ L and ℓ(u) = ℓ(v)}.
(r) 1third(L) = {u : uv ∈ L and 2ℓ(u) = ℓ(v)}.
(s) mthird(L) = {v : uvw ∈ L and ℓ(u) = ℓ(v) = ℓ(w)}. (Middle–Third)
(t) del1(L) = {uv : u1v ∈ L} when Σ = {0, 1}.
(u) odd(L) = {a1 a3 · · · a2n−1 : a1 a2 a3 · · · ak ∈ L, where k = 2n − 1 or 2n}.
(v) even(L) = {a2 a4 · · · a2n : a1 a2 a3 · · · ak ∈ L, where k = 2n or 2n + 1}.
(w) cycle(L) = {uv : vu ∈ L for some strings u and v}.
4.47. Find (aba∗)/(a∗baa∗). Construct two languages L, L′ to show that (LL′)/L′ is
not necessarily equal to L. [See part (d) of the last problem.]
4.48. The set of regular languages REG(Σ) over an alphabet Σ is closed under
union, concatenation, intersection, and complementation. Moreover, each finite
subset of Σ∗ is regular. Is it true that REG(Σ) is the least class of subsets of Σ∗ that
contains all finite subsets of Σ∗ and is closed under
(a) union, intersection, and complementation?
(b) union, concatenation, and complementation?
(c) intersection, complementation, and concatenation?
4.49. A substitution is a generalization of a homomorphism, in the sense that instead
of associating a string, it associates a language with each symbol. Formally, a substitution
from an alphabet Σ to an alphabet Γ is a map h : Σ → 2^{Γ∗}, that is, h(σ) is a
language over Γ. We extend such an h to Σ∗ by taking h(ε) = {ε} and h(wσ) =
h(w)h(σ) for strings w ∈ Σ∗. Show that REG(Σ) is closed under substitution.
4.50. A language L is called definite if there is k ∈ N such that whether any string w is
in L depends only on the last k symbols of w. Suppose L, L′ are any definite
languages over an alphabet Σ. Show that
(a) L is regular.
(b) L ∪ L′ is definite.
(c) Σ∗ − L is definite.
(d) LL′ is not necessarily definite.
(e) L∗ is not necessarily definite.
4.51. For i = 1, 2, let Mi = ({0, 1, 2, 3, 4}, {0, 1}, δi, 0, {0}) be two DFAs, where for
q ∈ {0, 1, 2, 3, 4} and σ ∈ {0, 1}, the transitions δ1, δ2 are given by
δ1(q, σ) = (q² − σ) mod 5, and δ2(q, σ) = (q² + σ) mod 5.
Prove that L(M1) is the set of binary strings containing an even number of 1's. What
is L(M2)? Give minimal DFAs equivalent to both M1 and M2. [Note: Here, we write
the states as 0, 1, 2, 3, 4 instead of q0, q1, q2, q3, q4, for convenience.]
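Before proving the claim, it can be tested experimentally. The following sketch (our own simulation, not part of the problem) runs δ1 on all binary strings of length at most 10 and compares acceptance with the parity of the number of 1's.

```python
# A quick experimental check of the stated claim about L(M1):
# run delta_1 on all short binary strings and compare acceptance
# with the parity of the number of 1's.
from itertools import product

def run(delta, w):
    q = 0                       # 0 is both the initial and the final state
    for c in w:
        q = delta(q, int(c))
    return q == 0

d1 = lambda q, s: (q * q - s) % 5
d2 = lambda q, s: (q * q + s) % 5

def strings(n):
    for k in range(n + 1):
        for t in product("01", repeat=k):
            yield "".join(t)

claim_holds = all(run(d1, w) == (w.count("1") % 2 == 0) for w in strings(10))
```

The same harness with `d2` can be used to form a conjecture about L(M2).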
4.52. Prove that any equivalence class of a right congruence of finite index on Σ ∗ is
a regular subset of Σ ∗ .
4.53. The usual way of writing numbers is called the decimal notation; this uses
the digits 0 to 9. In unary notation, a number n is written as 0^n. Are the following
languages regular? Justify.
(a) {w : w is the unary representation of a number divisible by 11}.
(b) {w : w is the decimal representation of a number divisible by 11}.
(c) {w : w is the unary representation of 10^n for some n ∈ N}.
(d) {w : w is the decimal representation of 10^n for some n ∈ N}.
(e) {w : w is a sequence of digits in the decimal expansion of 1/7}.
(f) {w : w is the unary representation of a number n such that there is a pair of twin
primes bigger than n} [Numbers p, p + 2 are called twin primes if both of them
are primes.]
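For part (b), the underlying idea is that a DFA with states 0, . . . , 10 can track the value modulo 11 of the digits read so far, since (10r + d) mod 11 depends only on the current remainder r and the next digit d. A sketch of this remainder automaton (the function name is ours):

```python
# Sketch for part (b): a DFA with states 0..10 tracks the value of the
# digits read so far modulo 11.

def divisible_by_11(w):
    r = 0
    for d in w:                 # each d is a decimal digit
        r = (10 * r + int(d)) % 11
    return r == 0               # accept iff the remainder is 0
```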
4.54. Construct a nonregular language that satisfies the conclusions of the pumping
lemma (for regular languages).
4.55. Let A ⊆ N. Define binary(A) = {binary representations of numbers in A}, and
unary(A) = {0^n : n ∈ A}. Which of the following sentences is true and which
is false? Justify.
4.63. Does there exist a regular language L over {0, 1} such that both L and its
complement have infinite subsets that are not regular languages?
4.64. Let p(·) be a polynomial of degree n whose coefficients are natural numbers.
Show that if L is a regular language, then so is L′ = {u : ℓ(v) = p(ℓ(u)) for some
v ∈ L}.
4.67. For any w ∈ {a, b}∗ , let tail(w) = {uw : u ∈ {a, b}∗ }; the set of strings
which end with the string w. For example, the NFA ({ p, q, r }, {a, b}, Δ, p, {r }) with
Δ = {( p, a, p), ( p, b, p), ( p, a, q), (q, b, r )} accepts tail(ab). tail(w) is also written
as tail({w}). Show that the minimal DFA for tail(w) has n + 1 states if ℓ(w) = n.
4.68. (Greibach) Let N be an NFA with a single final state. First, reverse the tran-
sitions and interchange the initial and final states to get an NFA for L(N) R . Next,
use the subset construction to get an equivalent DFA for L(N) R . Finally, repeat the
previous two steps once again. Prove that the DFA so obtained is a state-minimized
DFA for L(N).
4.73. Give a construction of NFAs having an arbitrarily large number of states, where
the length of the shortest rejected string can be exponential in the number of states.
4.74. Write a computer program that produces a minimal DFA for a given DFA.
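One possible sketch (one approach among several) uses the partition-refinement idea behind the Myhill–Nerode theorem: start from the partition {final states, nonfinal states} and refine blocks until the transitions respect the partition. The encoding below is ours: delta is a total dict from (state, symbol) to state, and all states are assumed reachable.

```python
# A minimal sketch of DFA minimization by partition refinement.

def minimize(states, alphabet, delta, start, finals):
    # start from the partition {finals, nonfinals} and refine until stable
    parts = [p for p in (set(finals), set(states) - set(finals)) if p]
    while True:
        def block(q):
            return next(i for i, p in enumerate(parts) if q in p)
        groups = {}
        for q in states:
            # states stay together iff they lie in the same block and
            # their transitions lead to the same blocks
            sig = (block(q), tuple(block(delta[(q, a)]) for a in alphabet))
            groups.setdefault(sig, set()).add(q)
        new_parts = list(groups.values())
        if len(new_parts) == len(parts):
            break
        parts = new_parts
    def block(q):
        return next(i for i, p in enumerate(parts) if q in p)
    mdelta = {(block(q), a): block(delta[(q, a)])
              for q in states for a in alphabet}
    return set(range(len(parts))), mdelta, block(start), {block(q) for q in finals}
```

Hopcroft's algorithm does the same job faster, in O(n log n) time.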
4.75. Two interesting particular cases of quotients of languages are obtained by tak-
ing one of the languages as a singleton. Fix an alphabet Σ. Let L be a language and
a be a symbol. Define L/a = {w ∈ Σ∗ : wa ∈ L}. For example, if Σ = {a, b} and
L = {ab, baa}, then L/a = {ba}. Prove or disprove the following:
(a) If L is regular, then so is L/a.
(b) (L/a)a = L.
(c) (La)/a = L.
Define similarly the left quotient a/L = {w ∈ Σ ∗ : aw ∈ L} and then ask and answer
questions similar to (a–c).
Here, for a language L over Σ and a symbol a, dL/da = {w ∈ Σ∗ : aw ∈ L}.
(d) d(L ∪ L′)/da = dL/da ∪ dL′/da.
(e) If ε ∉ L, then d(LL′)/da = (dL/da) L′.
(f) If ε ∈ L, then d(LL′)/da = (dL/da) L′ ∪ dL′/da.
5.1 Introduction
You have tried to construct a regular grammar for the language {a^k b^k : k ∈ N}, but
failed. However, the strings of the form a^k b^k can be generated by two productions
such as S → ε, S → aSb. For example, with such productions, we can have a
derivation of a^3 b^3 as
derivations u ⇒¹ u1 ⇒¹ u2 ⇒¹ · · · ⇒¹ un ⇒¹ v, or when u = v. Such a sequence is also
called a derivation. The length of this derivation is either n + 1 or 0, accordingly.
If many context-free grammars are involved in a particular context, then we will
write ⇒G to say that the derivation is done in the grammar G. A string w ∈ Σ ∗ is
generated by the grammar G iff S ⇒ w in G.
The language generated by G is L(G) = {w ∈ Σ ∗ : S ⇒ w in G}. A language is
called a context-free language (CFL) if it is generated by a context-free grammar.
The word “context-free” comes from the mode of application of a production.
The production A → u is applied on the string x Ay, where the context x (on the left)
along with y (on the right) remains unchanged. That is, it can be applied without
bothering what x and y are. When this mode of application is violated, a grammar is
called context-sensitive. We will come across such grammars later in an appropriate
place.
Example 5.1. Let G = ({S}, {a, b}, {S → ε, S → aSb}, S). What is L(G)?
Solution. This is the grammar that we discussed in the first paragraph of this chapter.
It is, of course, obvious that L(G) = {a^n b^n : n ∈ N}. But how do you prove it?
Induction?
For any n ∈ N, if we apply the production S → aSb n times, and then S → ε,
we will have a derivation of a^n b^n as
S ⇒ aSb ⇒ a²Sb² ⇒ · · · ⇒ aⁿSbⁿ ⇒ aⁿbⁿ.
Conversely, if any string is generated by G, then the last production applied must
be S → ε, as the only production in G having no nonterminal symbol on its right
hand side is S → ε. Moreover, S → ε could not have been applied anywhere other
than the last step of the derivation for obtaining a string of terminal symbols. Hence,
the production S → aSb has only been applied repeatedly before the application of
S → ε. Thus, the only strings that can be generated by G are of the form a^n b^n.
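The argument can also be checked mechanically for small lengths. The sketch below (our own encoding: the nonterminal is the character S, and the two productions S → ε and S → aSb are hard-coded) expands sentential forms breadth-first and collects all terminal strings of length at most a given bound.

```python
# Breadth-first expansion of sentential forms of the grammar of
# Example 5.1, harvesting terminal strings up to a length limit.

def derive(limit):
    found, forms = set(), {"S"}
    while forms:
        nxt = set()
        for u in forms:
            i = u.find("S")
            if i < 0:                        # a terminal string: harvest it
                found.add(u)
                continue
            for rhs in ("", "aSb"):          # the two productions
                v = u[:i] + rhs + u[i + 1:]
                if len(v.replace("S", "")) <= limit:
                    nxt.add(v)
        forms = nxt
    return found
```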
For an arbitrary context-free grammar, the proof may not be so straightforward.
It may demand the use of induction either on the length of a string or on the length
of a derivation. Sometimes we may need to use, in such proofs, whatever can
be derived from the start symbol S in a grammar rather than the derived strings
of terminals. These more general strings are called sentential forms. In this termi-
nology, a string generated by a grammar is a sentential form consisting of terminal
symbols only.
Each regular grammar is a context-free grammar. What do you conclude from this
about regular and context-free languages?
Example 5.2. Let G = ({S}, {a, b}, {S → SS | aSb | ε}, S). What is L(G)?
Solution. You must first construct a few derivations, and then look at the produc-
tion rules to generalize from these typical strings. If required, you will prove your
conjecture by induction. Here are a few derivations:
S ⇒ SS ⇒ aSbS ⇒ abS ⇒ ab.
S ⇒ SS ⇒ aSbS ⇒ abS ⇒ abab, (using the previous derivation)
S ⇒ SS ⇒ SSS ⇒ SSaSb ⇒ SSaaSbb ⇒ SaSbaaSbb ⇒ · · · ⇒ abaabb.
We observe from the derivations that the second and third production rules develop
a string of interlaced a's and b's. If you think of a as "(" and b as ")", then the string
is simply a correctly matched sequence of parentheses. The matching of parentheses
(we work with a, b instead) can be conveniently described in an inductive way by the
following formation rules:
Now you see that the grammar is only a translation of the formation rules. We have
avoided an inductive proof by appealing to these formation rules. However, this is
only a shifting of responsibility; it is no proof!
Exercise 5.2. Show that the language is as claimed in the solution of Example 5.2.
Example 5.3. Let G = ({S, A, B}, {a, b}, R, S) be the CFG with R = {S →
aA | bB, A → b | bS | aAA, B → a | aS | bBB}. What is L(G)?
Solution. What are the strings generated by G? Have some derivations before think-
ing abstractly.
S ⇒ aA ⇒ ab.
S ⇒ aA ⇒ abS ⇒ · · · ⇒ abab.
S ⇒ aA ⇒ aaAA ⇒ · · · ⇒ aababbab.
If we follow the path of B instead of A, we might have to interchange a’s with b’s,
from that point onward, for example,
So, what do you guess? No good patterns? Well, it seems that the number of a's must be
equal to the number of b's. Look at the production rules again and convince yourself
why this guess is more suggestive than others. Even if it is obvious, we do require
a proof, and naturally, by induction. All the more, as it is not very obvious, we must
give a proof. The proof might also give more insight into the pattern. Note that the
guess might turn out to be wrong, and then we may have to modify our guess and try
a fresh proof. So, let us try a proof by induction.
Can ε be generated by G? Obviously not; the rules make no mention of ε. Moreover,
no string of length 1 can be generated by G. So, let w ∈ {a, b}∗ be of length 2. We
have seen that ab ∈ L(G). Similarly, ba ∈ L(G); give a derivation. Also, argue that
aa ∉ L(G) and bb ∉ L(G). This completes the basis step; our basis step starts with
ℓ(w) = 2.
Lay out the induction hypothesis that if w ∈ {a, b}∗ and ℓ(w) < n, then w ∈ L(G)
iff #a(w) = #b(w). Let w ∈ {a, b}∗ with ℓ(w) = n. We must show that
How to show (a)? Look at the derivations above. You see that the last rule applied
will be either A → b or B → a, and that will replace the only nonterminal in the last
but one step of any derivation. According as w has the last symbol a or b, you would
have to apply the second or first rule at the last step. If your guess is correct, then
the symbol A must be generating one more b than required and the symbol B must
be generating one more a. That is, we may have to prove these statements first. Do
not be disheartened; add two more statements to be proved by induction. The revised
statements are
Exercise 5.3. Consider the context-free grammar G = ({S, A}, {a, b}, R, S) with
R = {S → AA, A → b | aA | Aa | AAA}. Show by induction on the number of
derivation steps that if S ⇒ w (for w ∈ {a, b, S, A}∗) in at least one step, then
#b(w) + #A(w) is even. What is L(G)?
that the number of "[" matches with the number of "]"? Certainly not; for example, ] [
is not a string with balanced parentheses. What we need is the restriction that the
number of right parentheses does not exceed the number of left parentheses in any
prefix of a string of balanced parentheses. Suppose x is any string over the alphabet
{[, ]}. Let left(x) denote the number of "[" in x, and right(x) the number of "]" in x.
Then w is balanced iff
left(w) = right(w) and for each prefix y of w, left(y) ≥ right(y).
The fact is that the language of balanced parentheses is generated by the CFG with
productions S → ε | SS | [S]. A proof requires that if a string is at all generated by
the CFG, then it satisfies the conditions above. This is proved by induction on the
length of a derivation. The other implication that any string over the alphabet {[, ]}
that satisfies the above conditions is generated by the CFG is proved by induction on
the length of the string. Why don’t you try it?
5.1. Is it true that if a CFG generates ε, then it does so via a derivation of length 1?
5.2. Which of the strings aaaaba, aabaab, abaaba, aabbaa are generated by the
CFG with productions S → AB | ABS, A → b | bA, B → aA, and which are not?
Give reasons.
5.3. Show that the grammar with productions S → SS | ε | aSb | bSa generates the
language L = {w ∈ {a, b}∗ : #a(w) = #b(w)}.
5.8. Show that the string aabbabba is not generated by the CFG that has the produc-
tions S → aaA, A → Ba, B → ε | bAb.
5.9. Give a verbal description of the language generated by the CFG with productions
S → ε | AB, A → aB, B → Sb.
5.10. Give a derivation of the string abbaaa in the CFG that has the productions
S → ε | aAa | bAb, A → SS. Try to describe, in English, the language generated by
this grammar.
5.11. What is L(G), where G = ({S, A}, {a, b}, R, S) with R = {S → a | aAS, A →
ba | SS | SbA}? Prove your claim.
5.12. What is the shortest string that can be derived in the CFG with productions
S → aB | aaabb, A → a | B, B → bb | AaB? Prove your claim.
5.13. Show that {w ∈ {a, b}∗ : #a (w) = #b (w)} is generated by the CFG with produc-
tions S → aA | bB, A → b | bS | BAA, B → a | aS | ABB.
5.14. Show that the CFG with productions S → a | abA | BC, A → c | abA, B → b
does not generate the language {w ∈ {a, b, c}∗ : #a(w) = #b(w)}.
5.16. What is the language generated by the CFG with productions S → AA | B,
A → aA | Aa | b, B → aBaa | b? Is it regular?
5.18. What is the language generated by the CFG given by the productions
(a) S → ab | abS?
(b) S → ab | SS?
(c) S → aA, A → a | bS?
(d) S → ε | a | b, A → ε | B, B → b | A?
(e) S → aSb | aAb, A → b | bA?
(f) S → aSb | aAb, A → ba | bAa?
(g) S → a | b | aSb | aA | bB, A → a | aA, B → b | bB?
(h) S → aA | bS | a | b, A → bA | bS | b?
(i) S → AC | CB, C → ε | aCb, A → a | aA, B → b | bB?
(j) S → abB, B → bbAa, A → ε | aaBb?
(k) S → ε | aSb | SS?
5.19. Find CFGs that generate the following languages over {a, b}:
(a) {a^(n+1) b^2 a b^n : n ∈ N}.
(b) {ww^R : w ∈ {a, b}+}.
(c) {a^n b^(n+1) : n ∈ N}.
(d) {a^m b^n : m ≠ n − 1; m, n ∈ N}.
(e) {a^m b^n : 2m ≤ n ≤ 3m; m, n ∈ N}.
(f) {a^m b^n : m ≥ n; m, n ∈ N}.
(g) {a^m b^n : m ≤ 2n; m, n ∈ N}.
(h) {a^m b^n : m ≠ 2n; m, n ∈ N}.
(i) {a^m b^n : m = n or m = 2n; m, n ∈ N}.
(j) {a^m b^n : m > 2n; m, n ∈ N}.
(k) {a^m b^n : m ≤ n + 3; m, n ∈ N}.
(l) {w ∈ {a, b}∗ : #a(w) ≠ #b(w)}.
(m) {w ∈ {a, b}∗ : #a(w) ≥ 3}.
(n) {w ∈ {a, b}∗ : #a(x) ≤ #b(x) for any prefix x of w}.
(o) {w ∈ {a, b}∗ : #b(w) = 2#a(w)}.
(p) {w ∈ {a, b}∗ : #b(w) = 2#a(w) + 1}.
(q) {a^n b^(2n) : n ≥ 1}.
(r) {a^m b^n : m, n ≥ 1, m ≠ n}.
(s) {w ∈ {a, b}∗ : #a(w) = 2 · #b(w)}.
(t) {uawb : u, w ∈ {a, b}∗, ℓ(u) = ℓ(w)}.
(u) {a^m b^m : m ≥ 1} ∪ {b^m a^m : m ≥ 1}.
(v) {a^m b^n a^m : m, n ≥ 1} ∪ {a^m b^n c^n : m, n ≥ 1}.
(w) {a^m b^n : 1 ≤ m ≤ n}.
5.21. Give a simple description of the language generated by the grammar with
productions S → aA | ε, A → bS.
5.26. Show that G1, G2 generate the same language, where their productions are
G1: S → SS | SSS | aSb | bSa | ε and G2: S → SS | aSb | bSa | ε.
5.28. Find a CFG that generates all regular expressions over the alphabet {a, b}.
5.29. Find a CFG that generates all production rules for the CFGs with a, b as ter-
minals and A, B, C as nonterminals.
5.30. Define precisely what one means by properly nested parentheses when two
types of parentheses are involved. For example, ( ), [ ], ( [ ( ) ( [ [ ] ] ) ] ) are properly
nested but ( [ ) ], ( ( ] ) [ ] are not. Using your definition, give a CFG for generating all
properly nested parentheses, when two types of parentheses are involved.
5.3 Parse Trees

Example 5.4. Let G be the grammar of Example 5.3. It has the productions:
S → aA | bB, A → b | bS | aAA, B → a | aS | bBB.
These derivations are depicted in Fig. 5.1. Both the diagrams are trees, but we
have also placed the numbers 1–6 below the derived symbols according to the step in
which the corresponding string is derived. Without these numbers, the trees are not
different.
Fig. 5.1. Two derivation trees for Example 5.4, with step numbers below the derived symbols.
The numbering in the right hand side tree in Fig. 5.1 has a nice property. In this
derivation, only the first-from-left nonterminal symbols are rewritten by applying
a production. Look at the derivation tree again and its numbering. The first-from-
left nonterminal is expanded first in the tree. In the first derivation (the left tree in
Fig. 5.1), this heuristic is not followed.
Exercise 5.4. Give a leftmost derivation of the string aabbabaababb in the context-
free grammar ({S}, {a, b}, {S → ε | SS | aSb}, S), and also label the nodes of the
tree correspondingly.
Are there context-free grammars where a string can be derived but it cannot be de-
rived by any leftmost derivation? The derivation trees above suggest that this should
not be the case. For, if we have a derivation of a string, we draw its derivation tree
and then reassign the numbers to various nodes. We can always have a renumbering
that would correspond to a leftmost derivation. This is the content of the following
lemma; you may try a formal proof by induction on the number of derivation steps.
Exercise 5.5. Use the construction in the proof of Lemma 5.1 and induction on “the
least n such that at nth step the leftmost condition is violated” to give a formal proof
of the existence of a leftmost derivation.
The first derivation in Example 5.4 (left tree in Fig. 5.1) was:
Scanning from the left (Here, take a piece of paper, write the above derivation and
follow up.), you find that at the third step of the derivation (not before it), the leftmost
condition is violated. We should have grown on the first A instead of the second. We
proceed further to check where this first A has been grown in the derivation. This is
done in the sixth step, giving rise to one b. Thus we switch the sequence of steps.
That is, the last step is brought to the third. This means that we first construct
S ⇒ aA ⇒ aaAA ⇒ aabA ⇒
and then continue as earlier, omitting the sixth and the last step, to obtain
Again, we scan from the left of whatever derivation we have got thus far, looking for
any violation of the leftmost condition. We see that there is no violation as such. We
stop, and we have got a leftmost derivation.
Revisit the derivation trees of Example 5.4 in Fig. 5.1. Omitting the step num-
bers assigned to them, you find that they are the same tree. Abstractly speaking, the
derivation trees are, in a way, fixed by the grammar. The derivation trees where we
do not mention the step numbers of the derivation are also called parse trees.
Many derivation trees can give rise to the same parse tree; this corresponds to
many different derivations having the same structure. Consider once again the gram-
mar G of Example 5.4, which has productions
S → aA | bB, A → b | bS | aAA, B → a | aS | bBB.
Fig. 5.2. Another parse tree for Example 5.4.
In constructing the parse tree in Fig. 5.2, we have used the productions S →
aA, A → bS, S → bB, B → bBB in that order. The parse tree shows a derivation
(many derivations at a time) of the string abbbaa. In fact, other rules could have
been used and also in a different order, giving rise to many parse trees. Look at the
parse trees in Fig. 5.3.
Fig. 5.3. Two parse trees, with yields bbaB and ab.
The first parse tree is a parse tree for the string bbaB and the second is for the
string ab. You see that the string for which a parse tree has been constructed is not
necessarily a string over the underlying alphabet of the grammar; it may contain some
nonterminal symbols. We will refer to all these strings as the yields or harvests of the
tree. We can have a formal definition of a parse tree.
Let G = (N, Σ, R, S) be a context-free grammar. The parse trees, their roots,
leaves, and yields can be defined inductively as in the following:
(a) For each A ∈ N ∪ Σ, the single-vertex parse tree is A, whose root is A, whose
set of leaves is {A}, and whose yield is A.
(b) If A → ε is in R, then the tree with root A and a single child labeled ε is a parse
tree, whose yield is ε.
(c) If A → A1 · · · An is in R, and T1, . . . , Tn are parse trees with roots A1, . . . , An and
yields y1, . . . , yn, respectively, then the tree with root A having the subtrees
T1, . . . , Tn from left to right is a parse tree. This parse tree has yield the
concatenated string y1 · · · yn and root A.
(d) Parse trees are generated using only the rules (a–c) above.
Exercise 5.6. Prove the last statement by using induction on the length of a derivation
or on the height of a parse tree.
Example 5.5. Let G = ({S, A}, {a, b}, {S → a | aAS, A → ba | SS | SbA}, S).
Some of the parse trees in G are drawn in Fig. 5.4. The corresponding yields are
a, abaS, and aaASaa.
Fig. 5.4. Three parse trees in G, with yields a, abaS, and aaASaa.
S ⇒ a.
S ⇒ aAS ⇒ abaS.
S ⇒ aAS ⇒ aSSS ⇒ aaASSS ⇒ aaASaS ⇒ aaASaa.
However, the last parse tree in Fig. 5.4 also represents the derivation
This derivation, of course, generates the same string. We see that the former deriva-
tion was a leftmost derivation while the latter is not. See Fig. 5.4.
It is easy to see that there is a correspondence between parse trees and leftmost
derivations. Though a parse tree represents possibly many derivations of a string,
among them there can be only one leftmost derivation.
5.31. Draw all possible parse trees for the string aabbaa in the CFG with productions
S → a | aSA, A → ba | SS | SbA.
5.32. In the CFG with productions S → aB | bA, A → a | aS | bAA, B → b | bS | aBB,
give leftmost and rightmost derivation trees, with proper numbering of the nodes,
having the same yield as the string aabbabab.
5.33. In the CFG with productions S → a | b | SS, give two strings each of which
has exactly two leftmost derivations. Give two strings each of which has a unique
leftmost derivation.
5.4 Ambiguity
Just like many possible derivations of a string, is it possible to have many parse trees
with the same yield? You have already seen that each leftmost derivation gives rise to
a unique parse tree and each parse tree also represents a unique leftmost derivation.
In this light, we ask whether there are grammars in which a string has different
leftmost derivations. You can rephrase the question via parse trees: are there
grammars in which distinct parse trees can be constructed with the same yield?
See the following example.
Example 5.6. Let G = ({S}, {a, b, +, ×}, R, S), where R = {S → a | b | S+S | S×S}.
Clearly, a × b + b ∈ L(G). Here are two different derivations:
S ⇒ S + S ⇒ S × S + S ⇒ a × S + S ⇒ a × b + S ⇒ a × b + b.
S ⇒ S × S ⇒ a × S ⇒ a × S + S ⇒ a × b + S ⇒ a × b + b.
Moreover, both the derivations are leftmost derivations. What are the parse trees?
Look at Fig. 5.5. The parse trees are clearly different.
Fig. 5.5. Two parse trees for a × b + b.
In Example 5.6, if you read the derivations and look at the parse trees, you may
see how bracketing of the symbols is represented by parse trees, though there are no
brackets used in the grammar.
Suppose that a, b are some numbers and +, × are the usual operations of plus and
product of numbers. Suppose that you evaluate a string such as the above by using
its parse tree. In this method, you start looking at the leaves. The value of a leaf is
passed to its parent. If there are three leaves somewhere, of which the middle one is
an operation, say +, then their parent node stores the value of the first leaf plus
that of the third. The evaluation proceeds toward the root. In the first tree
above, the root node stores the value (a × b) + b, while the root node of the second
tree stores the value a × (b + b).
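This bottom-up evaluation is easy to carry out on the two trees of Fig. 5.5. In the sketch below (our own encoding: a leaf is a variable name, an internal node is a triple (op, left, right)), the two trees evaluate differently; with a = 2 and b = 3, the first gives 9 and the second 12.

```python
# Bottom-up evaluation of a parse tree, as described above.

def evaluate(t, env):
    if isinstance(t, str):
        return env[t]            # a leaf passes its value to its parent
    op, left, right = t
    l, r = evaluate(left, env), evaluate(right, env)
    return l + r if op == "+" else l * r

t1 = ("+", ("x", "a", "b"), "b")   # the first tree:  (a x b) + b
t2 = ("x", "a", ("+", "b", "b"))   # the second tree: a x (b + b)
```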
This is why we would be interested in understanding when a grammar permits
many parse trees with the same yield. It is, of course, dangerous to have such a
grammar for any fragment of a programming language.
Exercise 5.7. Show that for the CFG G = ({S}, {a, b, +, ×, (, )}, R, S) with R =
{S → a | b | (S + S) | (S × S)}, there cannot be two parse trees for the same yield.
example, the grammar of Example 5.6 is ambiguous, while that in Exercise 5.7 is
unambiguous. But can we have an unambiguous context-free grammar for the lan-
guage of the grammar in Example 5.6?
Fig. 5.6. Trees for Example 5.8.
Solution. You can generate a^m b^m by a context-free grammar. Similarly, you can have
one for c^k d^k. Then just add a new start symbol, and have a production expressing "the
new start symbol can be the start symbol of the first followed by the start symbol of
the second." This will give you a grammar for a^m b^m c^k d^k. Similarly, you can try for
the second one in the union and then try to combine the grammars for obtaining the
desired one.
We have one such grammar G = ({S, A, B, C, D}, {a, b, c, d}, R, S), where R =
{S → AB | C, A → aAb | ab, B → cBd | cd, C → aCd | aDd, D → bDc | bc}. Is
G really ambiguous? Yes, as we have two distinct parse trees for the string a^2 b^2 c^2 d^2
in G. See Fig. 5.7. In fact, any context-free grammar for L is ambiguous. But the
proof of this fact is far from obvious, and we will not indulge in this tedious task.
Fig. 5.7. Two parse trees for a^2 b^2 c^2 d^2 in G.
5.34. Show that the CFG with productions S → ε | SS | [S], generating all balanced
parentheses, is ambiguous. Construct an unambiguous CFG for the same language.
5.35. The CFG G with productions S → +SS | ×SS | −SS | a | b generates arithmetic
expressions with the operations +, ×, and −, and operands a and b, in prefix
notation. Find leftmost and rightmost derivation trees for the string −×+baab. Is G
ambiguous?
5.36. Let L = L(G), where G has productions S → ε | aS | aSbS. Show that
(a) G is ambiguous.
(b) L consists of the strings whose every prefix has at least as many a's as b's.
(c) L is not inherently ambiguous.
5.37. Show that the CFG with productions S → a | aAb | abSb, A → bS | aAAb is
ambiguous.
5.39. A CFG is called simple if each of its productions is of the form A → ax, where
A is a nonterminal, a is a terminal, x is any string of terminals and nonterminals,
and the pair (A, a) occurs at most once in the productions. For example, the CFG
with productions S → ε | aS | bSS is simple, while the CFG with productions S →
ε | aS | aSS | bSS is not.
(a) Find a simple CFG to generate the language {b} ∪ aaa∗b.
(b) Find a simple CFG for {a^n b^n : n ∈ N}.
(c) Find a simple CFG for {a^n b^(n+1) : n ∈ N}.
(d) Show that every simple CFG is unambiguous.
(e) Show that if G is a simple CFG, then any w ∈ L(G) can be parsed with an effort
proportional to ℓ(w).
(f) For a simple CFG G, give an upper bound for the number of productions in terms
of the number of terminals and the number of nonterminals.
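For part (e), the point is that in a simple CFG the pair (A, a) determines the production, so a string can be parsed in one left-to-right scan with a stack of pending grammar symbols. A sketch (our own encoding: nonterminals uppercase, terminals lowercase; the example grammar for {a^n b^n : n ≥ 1} is ours, not from the text):

```python
# Deterministic parsing of a simple CFG in one left-to-right scan.
# rules maps (A, a) to the remainder x of a production A -> a x.

def parse_simple(rules, start, w):
    stack = [start]
    for c in w:
        if not stack:
            return False
        top = stack.pop()
        if top.islower():                # a terminal on the stack
            if top != c:                 # must match the input symbol
                return False
        elif (top, c) in rules:
            stack.extend(reversed(rules[(top, c)]))
        else:
            return False
    return not stack                     # all pending symbols consumed

# an example simple grammar of ours for {a^n b^n : n >= 1}:
# S -> aB, B -> b | aBb
rules = {("S", "a"): "B", ("B", "b"): "", ("B", "a"): "Bb"}
```

Each input symbol causes a bounded amount of stack work, which is the linear-effort claim of part (e).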
5.40. Show that the CFGs with productions given below are ambiguous, but the
languages they generate are not inherently ambiguous:
(a) S → AB | aaB, A → a | Aa, B → b.
(b) S → ε | SS | aSb.
5.44. Construct an unambiguous CFG for generating all regular expressions over
{a, b}.
5.45. Are the CFGs with the productions as given below ambiguous? Justify.
(a) S → ε | aSbS | bSaS.
(b) S → ε | SS | aSb | bSa.
(c) S → aAB, A → bBb, B → ε | A.
(d) S → ε | bCa, C → ε | bCa.
5.5 Eliminating Ugly Productions

While deriving strings in a CFG, you might have noticed an irritant. It is the absence
of any information as to whether you are going in the right direction. The irritation
is bound to persist, so to speak. For example, in a CFG, it can very well happen that
after some steps of a derivation, you get a string of much bigger length than the
aimed-at string of terminals. That is, after some more steps, some of the nonterminals
just give the empty string.
To avoid such a situation, you may try to construct a grammar without productions
of the form A → ε, called the empty productions or the ε-productions.
In that case, the empty string will not be generated. But can you get everything
else? If so, then later you might just add to it the production S → ε, where S is the
start symbol. Thus, we require a CFG whose language does not contain ε and where
no step of a derivation will decrease the length of the derived string. Moreover, given
any CFG, can we construct one such grammar for generating the same language, of
course, leaving the empty string as an exception?
The answer is, in fact, in the affirmative, and we have at least two such normal
forms for any CFG. These normal forms are some restricted versions of context-free
grammars. In this section, we will prepare for arriving at the two normal forms. In
the next section, we will show how to reach the normal forms.
If a derivation gives a string (of terminals and/or nonterminals) of length less than
the current one, then empty productions are present in the CFG. We must thus try to
eliminate all empty productions. To eliminate an empty production A → ε means
that except the string ε we must be able to generate all other strings that could have
been generated by using the empty production, and nothing more. The problem is,
such productions might be used in conjunction with others in deriving nonempty
strings. In that case, how are such productions used?
Let G = (N, Σ, R, S) be a CFG, where A → ε ∈ R. The only way it can
be used nontrivially (trivial when A = S) is that A must occur on the right hand
side of other productions. So, suppose B → xAy ∈ R is another production, where
x, y ∈ (N ∪ Σ)∗ . In any derivation, their conjunction gives us B ⇒ xAy ⇒ xy. This
amounts to considering the production A → ε in R and then adding B → xy to R
for all such productions B → xAy ∈ R. Once R is updated this way, we may remove
A → ε from R. But, wait a bit. First let us check whether the new grammar generates
the same language as G. Define the new set of productions R1 as in the following:
For each A ∈ N with A → ε ∈ R, construct
R_A = {B → xy : B → xAy ∈ R, x, y ∈ (N ∪ Σ)∗ }.
Next, construct Rε = ∪_{A∈N} R_A .
Finally, take R1 = R ∪ Rε .
Let G1 = (N, Σ, R1 , S). As R ⊆ R1 , any string that is derived in G is also derived
in G1 . Thus L(G) ⊆ L(G1 ). Moreover, any string that is derived in G1 using a
production B → xy in Rε can also be derived in G by using the productions
B → xAy and A → ε for some A ∈ N. This shows that L(G1 ) ⊆ L(G), and
therefore, L(G) = L(G1 ).
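The construction of Rε and the later removal of the empty productions can be sketched as a fixpoint computation. This is a rough illustration rather than the book's construction verbatim: it assumes productions are (lhs, rhs) pairs, with rhs a string of single-character symbols and "" standing for ε.

```python
def eliminate_epsilon(productions, start="S"):
    """Build R1 = R ∪ R_eps as a fixpoint (for A -> eps and B -> xAy,
    add B -> xy), then drop all eps-productions, re-admitting S -> eps
    only when the empty string is in the language."""
    rules = set(productions)
    changed = True
    while changed:
        changed = False
        nullable = {a for (a, rhs) in rules if rhs == ""}
        for (a, rhs) in list(rules):
            for i, sym in enumerate(rhs):
                if sym in nullable:
                    shorter = (a, rhs[:i] + rhs[i + 1:])
                    if shorter not in rules:
                        rules.add(shorter)
                        changed = True
    empty_in_language = (start, "") in rules
    rules = {(a, rhs) for (a, rhs) in rules if rhs != ""}
    if empty_in_language:
        rules.add((start, ""))          # the one permitted eps-production
    return rules
```

On R = {S → aSb, S → ε} this returns {S → aSb, S → ab, S → ε}, matching the sets R1 and R2 computed in Example 5.10 below.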
Now, we can proceed to removing all empty productions from R1 ; call the grammar
with the empty productions removed G2 . Consider a derivation in G1 that uses an
empty production A → ε, say

S ⇒∗G1 xAy ⇒G1 xy ⇒∗G1 u, for x ∈ (N ∪ Σ)+ or y ∈ (N ∪ Σ)+ .

The displayed occurrence of A was introduced by some production B → x′Ay′ , so
the derivation has the form

S ⇒∗G1 αBβ ⇒G1 αx′Ay′β = xAy ⇒G1 xy ⇒∗G1 u, where x = αx′ and y = y′β.

Using the production B → x′y′ ∈ Rε in its place, the step through A → ε disappears:

S ⇒∗G2 αBβ ⇒G2 αx′y′β = xy ⇒∗G2 u.
Mark S reachable.
If X ∈ N has been marked reachable, and X → αYβ ∈ R, then mark Y
reachable.
You can show easily, by induction, that the set of reachable symbols is correctly
computed by the above marking scheme. To eliminate the nongenerating nontermi-
nals, we assume that L(G) ≠ ∅. This assumption guarantees that the start symbol S
is a generating nonterminal. We compute the set of generating nonterminals by the
following iterative scheme:
The useless symbols are then eliminated by first throwing away nongenerating sym-
bols, and all productions wherever they occur, and then by eliminating all sym-
bols that are not reachable. For example, consider the CFG with productions S →
a|AB, A → b. We see that S is generating as S ⇒ a, and A is generating as A ⇒ b;
the terminals a, b are generating by default. The only nongenerating symbol is B. We
throw away the production S → AB, updating the productions to S → a, A → b.
Now, we see that the symbols A and b are not reachable. Eliminating them, we are
left with the only production S → a. And that is it; we have a CFG with no useless
symbols. That is, we proceed as follows:
Compute the generating nonterminals and mark the nongenerating ones. Delete
from R all productions involving one or more of these marked nongenerating
nonterminals.
Compute the reachable nonterminals and mark the nonreachable ones. Delete
from R all productions involving one or more of these marked nonreachable
nonterminals.
Delete all terminals that do not occur in the updated productions.
Update the CFG G and call it G 4 .
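The steps above can be sketched as two marking loops followed by two deletions. A rough illustration, not the book's code: it assumes nonterminals are single uppercase letters and productions are (lhs, rhs) pairs with rhs a string.

```python
def remove_useless(productions, start="S"):
    """Delete nongenerating symbols first, then nonreachable ones."""
    def is_nonterminal(s):
        return s.isupper()

    rules = set(productions)
    # Mark the generating symbols; terminals are generating by default.
    generating = set()
    changed = True
    while changed:
        changed = False
        for (a, rhs) in rules:
            if a not in generating and all(
                    not is_nonterminal(s) or s in generating for s in rhs):
                generating.add(a)
                changed = True
    rules = {(a, rhs) for (a, rhs) in rules if a in generating and all(
        not is_nonterminal(s) or s in generating for s in rhs)}
    # Mark the reachable symbols, starting from S.
    reachable = {start}
    changed = True
    while changed:
        changed = False
        for (a, rhs) in rules:
            if a in reachable:
                for s in rhs:
                    if s not in reachable:
                        reachable.add(s)
                        changed = True
    return {(a, rhs) for (a, rhs) in rules if a in reachable}
```

On S → a | AB, A → b this returns {S → a}, as in the worked example above; running the reachability phase first instead would leave A → b behind (compare Exercise 5.8).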
Notice that the order of elimination in the above procedure does matter. If you elimi-
nate the nonreachable nonterminals first, and then eliminate the nongenerating ones,
the resulting CFG is still equivalent to G, but it may not, in general, be free of
useless symbols.
An easy induction proof establishes the correctness of this procedure.
Exercise 5.8. What happens if we first eliminate the nonreachable symbols and then
nongenerating symbols? Will the resulting CFG be without useless symbols? [Hint:
Carry out the steps on the CFG with productions S → a, A → b.]
5.46. Find a CFG without ε-productions equivalent (except for generating ε) to the CFG
with productions S → ABaC, A → BC, B → ε | b, C → ε | D, D → c.
5.47. Find a CFG without unit productions equivalent to the CFG with productions
S → B | Aa, B → A | bb, A → a | bc | B.
5.49. Show that the CFGs G and G′ with the productions as given below are equivalent:
5.51. Eliminate all unit productions from the CFG with productions S → A | S + A,
A → B | A × B, B → C | (S), C → a | b | Ca | Cb | C0 | C1.
5.53. Eliminate useless symbols from the CFG with productions S → AB | CA,
A → a, B → AB | BC, C → b | aB.
5.54. Eliminate all unit productions and ε-productions from the CFG G with produc-
tions S → bA | bBB, A → ε | bbA, B → aB | aaC, C → B. Try simplifying the
resulting grammar further by eliminating useless symbols. What is L(G)?
5.55. Eliminate all unit productions from the CFG that has the productions S → c |
A | B, A → D | Ca, B → E | Cb, C → d | D | E, D → S | Ea, E → S | Db.
In this section, we will discuss two normal forms, the Chomsky Normal Form
and the Greibach Normal Form, named after their originators, N. Chomsky and
S. A. Greibach, respectively.
A context-free grammar G is said to be in Chomsky normal form (CNF) if all
productions in G are of the form A → BC or A → σ, for nonterminals A, B, C,
and terminals σ.
Once you have eliminated the empty productions and the unit productions, you
can convert the resulting grammar into Chomsky normal form. The productions of
the form A → σ for σ ∈ Σ can be left as they are. The problematic productions are
now in the form A → A1 A2 · · · Am , where each Ai ∈ N ∪ Σ. (Why?) If all Ai ∈ N,
then we can introduce new nonterminals so that there will be exactly two nonterminals
on the right hand side of →. For example, the production A → XYZ can be replaced
by the pair A → XB and B → YZ, provided B is a new nonterminal.
On the other hand, if an Ai is a terminal symbol σ, then we introduce another new
nonterminal, say X_σ, corresponding to it, and add the production X_σ → σ. For exam-
ple, the production A → aB can be replaced by the pair A → XB and X → a,
where X is again a new nonterminal. We summarize the discussion in the following
statement:
Proof. (Outline) We merely give the construction of G′. Using Lemmas 5.2 and 5.3,
you can give a formal proof. Let G = (N, Σ, R, S) be a CFG. The construction, to
be carried out step-by-step, is as follows:
Solution. The first step is to eliminate the empty productions. The only empty
production is S → ε. Then,
R_S = {B → xy : B → xSy ∈ R, x, y ∈ (N ∪ Σ)∗ } = {S → ab}.
Rε = ∪R_S = R_S , R1 = R ∪ Rε = {S → aSb | ab | ε}.
R2 = R1 − {S → ε} = {S → aSb | ab}.
As there is no unit production in R2 , we just introduce the new nonterminals corre-
sponding to each terminal (Steps 4 and 5). This updates, writing A for X_a and B for
X_b , the new productions along with the modified old ones to
A → a, B → b, S → ASB, S → AB.
The only production having more than two symbols on the right hand side is S →
ASB. This requires introducing another nonterminal, say, C and replacing this by
the pair S → AC and C → SB. We then take together all the productions to obtain
R′ = {A → a, B → b, S → AC, C → SB, S → AB}.
Exercise 5.9. Show that a CNF for the CFG ({S}, {a, b}, {S → aSb | SS | ε}, S) is
({S, A, B, C}, {a, b}, {S → AB, S → AC, S → SS, C → SB, A → a,
B → b}, S).
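The two replacement steps just carried out (a fresh nonterminal X_σ for each terminal σ occurring in a long right hand side, then splitting long right hand sides into pairs) can be sketched as follows. This is a rough illustration, not the book's construction: it assumes ε- and unit productions are already eliminated, right hand sides are tuples, terminals are single lowercase characters, and the fresh names N1, N2, . . . play the roles of X_a, X_b, C.

```python
from itertools import count

def to_cnf(productions):
    """Convert productions (lhs, rhs-tuple) into Chomsky normal form,
    assuming eps-productions and unit productions are already gone."""
    fresh = count(1)
    term_var = {}                        # terminal -> its nonterminal X_sigma
    rules = []
    for (a, rhs) in productions:
        if len(rhs) == 1:                # A -> sigma stays as it is
            rules.append((a, rhs))
            continue
        body = []
        for s in rhs:                    # replace terminals by fresh nonterminals
            if s.islower():
                if s not in term_var:
                    term_var[s] = f"N{next(fresh)}"
                    rules.append((term_var[s], (s,)))
                body.append(term_var[s])
            else:
                body.append(s)
        while len(body) > 2:             # split A -> B1 B2 ... Bm into pairs
            b = f"N{next(fresh)}"
            rules.append((b, tuple(body[-2:])))
            body = body[:-2] + [b]
        rules.append((a, tuple(body)))
    return rules
```

On R2 = {S → aSb | ab} this produces, up to renaming of the fresh nonterminals, the set R′ of the example above.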
A derivation of the string aabb in the CNF G of Example 5.10 looks like
S ⇒ AC ⇒ aC ⇒ aSB ⇒ aABB ⇒ aaBB ⇒ aabB ⇒ aabb.
In every step of this derivation, some progress towards the string aabb is clearly
visible. The indicator of progress is the length of the current string (of terminals and
nonterminals taken together) compared with that of the earlier one. The length grows
with every step of the derivation. A better progress indicator would be the
string itself. That is, imagine a derivation where in each new step, you get one more
symbol from the aimed at string. In such a case, you would first get a, next, another
a, next, b, and finally, the last b. To get a CFG where this can be realized, we may
have to do more work. Notice that in such a grammar, each production must start
with a terminal symbol, and then each application of a production would give rise
to one more relevant symbol of the string. The requirements are formalized by the
following:
A → α, A → αB, B → C, B → CB.
Notice that we run the risk of introducing unit productions; we will see how to tackle
those later (you have at least Lemma 5.3). For example, the string αCC can now be
derived from A as
A ⇒ αB ⇒ αCB ⇒ αCC.
Lemma 5.5 (Elimination of Self-productions). Let A be a nonterminal in a
context-free grammar G. Let A → Aα1 | Aα2 | · · · | Aαm be all self-productions in
G and A → β1 | β2 | · · · | βn be all other productions in G that have A on the left
hand side. Let B_A be a new nonterminal. Let G_A = (N ∪ {B_A }, Σ, R_A , S) with
R_A = (R − {A → Aα1 | Aα2 | · · · | Aαm }) ∪
{A → β1 B_A | · · · | βn B_A , B_A → α1 | · · · | αm , B_A → α1 B_A | · · · | αm B_A }.
Then L(G_A ) = L(G).
Proof. As A ∈ N, any leftmost derivation of any string in Σ∗ must have an applica-
tion of a production of the form A → βj . Without loss of generality, we thus assume
that any derivation that uses a production of the form A → Aα would have in it a
derivation of the following form:
A ⇒ Aαi1 ⇒ Aαi2 αi1 ⇒ · · · ⇒ Aαik · · · αi1 ⇒ βj αik · · · αi1 . (5.1)
Such a (portion of a) derivation can be mimicked in G_A by
A ⇒ βj B_A ⇒ βj αik B_A ⇒ · · · ⇒ βj αik · · · αi2 B_A ⇒ βj αik · · · αi1 . (5.2)
Thus L(G) ⊆ L(G_A ).
Conversely, any derivation in G_A that involves B_A must have come from an earlier
A. Any derivation that involves A might use directly a production A → βj or one
of the other three types of productions (that we have introduced while eliminating
A → Aα) involving B_A . The first type of productions A → βj are already in G. On
the other hand, if a derivation brings in B_A from A, then it must eventually end in an
application of a production of the type B_A → αi . Any such derivation in G_A is of
the form (5.2) above. And each of these (portions of) derivations can be replaced by
a derivation of the type (5.1) in G. Thus L(G_A ) ⊆ L(G).
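Lemma 5.5 translates directly into a small rewriting step. The sketch below is a hypothetical helper, not the book's code: right hand sides are tuples, and the fresh nonterminal B_A is written as the name with a prime.

```python
def eliminate_self_productions(a, productions):
    """Lemma 5.5: replace the self-productions A -> A a1 | ... | A am,
    given A -> b1 | ... | bn as the other A-productions, by
    A -> b1 | ... | bn | b1 B_A | ... | bn B_A and
    B_A -> a1 | ... | am | a1 B_A | ... | am B_A."""
    b_a = a + "'"                        # the fresh nonterminal B_A
    alphas = [rhs[1:] for (lhs, rhs) in productions
              if lhs == a and rhs[:1] == (a,)]
    betas = [rhs for (lhs, rhs) in productions
             if lhs == a and rhs[:1] != (a,)]
    rest = [(lhs, rhs) for (lhs, rhs) in productions if lhs != a]
    return (rest
            + [(a, beta) for beta in betas]
            + [(a, beta + (b_a,)) for beta in betas]
            + [(b_a, alpha) for alpha in alphas]
            + [(b_a, alpha + (b_a,)) for alpha in alphas])
```

For instance, on the familiar left-recursive pair E → E + T | T (a sample of mine, not from this chapter) it yields E → T | T E′ and E′ → +T | +T E′.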
We want to guarantee that in each derivation some progress is made towards the
string of terminals that is being derived. By eliminating self-productions, we see that
from an A, we would not get Aα. But we may get Cα from A. Again, somewhere
in a derivation, we may get Aβα from Cα, for example, if we have a production
C → Aβ. This would result in, essentially, a self-production again. To prevent such
a situation, let us follow an ordering of the nonterminals, say, N has the elements
A1 , A2 , . . . , Am . Our aim is to have no production of the form Ai → A j α, where
j < i . If A = A1 , C = A3 , then we cannot have a production C → Aβ after this aim
is fulfilled.
For this purpose, let us call a production of the form Ai → A j α, j < i , a
dominated production. Now, how to eliminate the dominated productions? This is
done by applying the following result inductively:
Lemma 5.6 (Elimination of Dominated Productions). Let G = (N, Σ, R, S) be
a context-free grammar. Let A → Cβ ∈ R and C → α1 | α2 | · · · | αn be all the
productions in R with C on the left hand side. Let
R_d = (R − {A → Cβ}) ∪ {A → α1 β | · · · | αn β} and G_d = (N, Σ, R_d , S).
Then L(G) = L(G_d ).
Proof. Consider any derivation involving A and then proceed as in the proof of
Lemma 5.5.
For eliminating the dominated productions, assume that all self-productions have
been eliminated. Then order the nonterminals. Suppose the ordered nonterminals are
A1 , A2 , . . . , Am . As we do not have any production of the form A1 → A1 α, trivially,
the following is satisfied for i = 1:
Notice that this might require eliminating self-productions which, in turn, will
introduce new nonterminals. Such nonterminals, call them B1 , B2 , . . . , Bl , are not
in the ordered ones such as A1 , . . . , Am . We will see how to tackle these new
nonterminals.
What we have done is that if (5.3) is satisfied for i = k −1, then we can modify the
productions in such a way that (5.3) will be satisfied for i = k. From Lemmas 5.4–
5.6, the language generated by a grammar with the modified productions remains the
same. Thus, inductively, this shows that we can have a CFG where all the productions
will satisfy (5.3) without changing the language.
Our argument shows that by the end of this step, all productions are in one of the
following forms:
Ai → Aj α with j > i, Ai → σβ with σ ∈ Σ, or Bt → γ , (5.4)
What about the Bt ’s? A look at Lemma 5.5 shows that no production with a Bt on the
left hand side can begin with any Bp . Therefore, any production of the form Bt → γ
has either a terminal or some Ai as the first symbol on the right hand side (i.e., of
γ ). If it is Ai , then use Lemma 5.6 again to replace it with productions of the form
Bt → τα, where τ ∈ Σ. Thus the updated productions are
S → aSb, S → ab, S → aSbB, S → abB, B → S, B → SB.
B → aSbB, B → abB, B → aSbBB, B → abBB.
S → aSb | ab | aSbB | abB,
B → aSb | ab | aSbB | abB, B → aSbB | abB | aSbBB | abBB.
In the final step, we just introduce new nonterminals for taking away the terminals
from all the right hand sides of productions, leaving the first one. Suppose we use A
for b. (For a, we need not introduce any new nonterminal here.) Then the updated set
of productions is
R̃ = {S → aSA | aA | aSAB | aAB, B → aSA | aA | aSAB | aAB,
B → aSAB | aAB | aSABB | aABB, A → b}.
The GNF corresponding to G is the grammar G̃ = ({S, A, B}, {a, b}, R̃, S).
Example 5.12. Construct a GNF for G = ({S}, {a, b}, R, S), where
R = {S → AB | AC | SS, C → SB, A → a, B → b}.
S → AB | AC, C → SB, A → a, B → b, S → ABE | ACE, E → S | SE.
S → aB | aC | aBE | aCE.
C → aBB | aCB | aBEB | aCEB;
E → aB | aC | aBE | aCE | aBE | aCE | aBEE | aCEE.
You may drop one of the two instances of the production E → aC E, if you wish.
There is no production where a terminal is in the middle of the right hand side; so
we need not introduce any new nonterminal. The GNF is
Next, we look for the second symbol of the given string, and choose an appropriate
production and proceed. For example, if our string is aba and our grammar has the
productions S → aAB | aBA | bAB, S → a, then in the first place, we have the
choices S → aAB and S → aBA (leaving out here S → a, which will not lead us
towards the string).
If we take up the first choice, then we have derived aAB. We look for the next
symbol in the given string, which happens to be b. So, we need a production that
would have the left hand side as A and the right hand side would begin with a b.
There is no such production; so this way we cannot derive the string. We go back
and take up the left out choice in the first step, that is, we start a fresh derivation by
choosing S → aBA.
Clearly, this gives an algorithm for the decision problem of determining whether
a given string of terminals is at all generated by a given grammar in GNF or not.
However, this process is exponential in the length of the given string. There is, of
course, a better algorithm that takes only O(n³) time, where n is the length of the
given string. It is the so-called CYK algorithm; see Problem 10.88 in the additional
problems to Chap. 10.
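The CYK idea just mentioned can be sketched in a few lines. This is a minimal version of my own, not the book's presentation in Problem 10.88: the grammar must already be in CNF, and each rule's right hand side is either a terminal character or a pair of nonterminals.

```python
def cyk(word, rules, start="S"):
    """CYK membership test, O(n^3) in the length n of the word.
    table[l][i] collects the nonterminals deriving word[i:i+l]."""
    n = len(word)
    if n == 0:
        return False                     # a CNF grammar cannot derive eps
    table = [[set() for _ in range(n)] for _ in range(n + 1)]
    for i, c in enumerate(word):         # length-1 substrings: A -> sigma
        table[1][i] = {a for (a, rhs) in rules if rhs == c}
    for l in range(2, n + 1):            # substring length
        for i in range(n - l + 1):       # start position
            for k in range(1, l):        # split point
                for (a, rhs) in rules:
                    if (isinstance(rhs, tuple)
                            and rhs[0] in table[k][i]
                            and rhs[1] in table[l - k][i + k]):
                        table[l][i].add(a)
    return start in table[n][0]
```

With the CNF productions of Example 5.10 (S → AC | AB, C → SB, A → a, B → b) it accepts aabb and rejects abb.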
5.62. Eliminate self-productions from the CFG having productions S → ε | Sa | aBc,
B → bc | Bc.
5.63. Convert the CFGs with the following productions to equivalent CFGs in CNF:
(a) S → a | b | cSS.
(b) S → Ab, A → aB, B → a | Ab.
(c) S → ε | A | ASA, A → ε | aa.
(d) S → CDa, C → aab, D → Cc.
(e) S → A | bSbA, A → a | baA.
(f) S → bB | AB, A → ε | bba, B → aaA.
(g) S → baAA, A → ε | aAB, B → ε | A | b.
(h) S → aB | ASA, A → S | B, B → ε | b.
5.65. Show that each CFG can be converted to one whose productions are in the
forms A → ε, A → BC, or A → aBC.
5.66. Convert the CFGs with the following productions to equivalent GNFs:
(a) S → ab | aSb.
(b) S → a | b | cSS.
(c) S → bb | baSa.
(d) S → a | b | aSb | bSa.
(e) S → a | AA, A → b | SS.
(f) S → Ab, A → aB, B → a | Ab.
(g) S → ba | bS | bbS.
(h) S → b | ABa, A → bbA | B, B → aAa.
(i) S → AB, A → a | aB | bA, B → a.
5.67. Construct CFGs in CNF and in GNF for generating the following languages:
(a) {w ∈ {a, b}∗ : w is not a palindrome}.
(b) {aᵐbⁿaᵐ : m, n ≥ 1}.
(c) {aᵐbⁿcᵏ : m, n, k ≥ 1, k ≤ 2m}.
(d) {aᵐb²ᵐcⁿ : m, n ≥ 1}.
(e) The language of balanced parentheses.
(f) {w ∈ {0, 1}∗ : w̄ᴿ = w, ℓ(w) ≥ 1}, where w̄ is obtained from w by interchanging
0’s and 1’s.
5.68. Prove that the CFG in GNF with productions S → aA | bB, A → b | bS | aAA,
B → a | aS | bBB generates the language {w ∈ {a, b}∗ : #a (w) = #b (w) ≥ 1}.
5.69. Suppose that the CFG G is in CNF and the CFG G′ is in GNF. If L(G) = L(G′),
then what can you say about the lengths of derivations of a string w in G and in G′;
which one is shorter?
were first considered in [8]. Chomsky normal form conversion is from [15] and
Greibach normal form conversion is from [47]. We have just introduced the concept
of parse trees, but have not given any details regarding parsing in actual program-
ming languages. You will find some references on parsing in the summary to the next
chapter. For advanced texts on context-free languages, see [38, 51].
5.70. What language is generated by the CFG with productions S → ε|a S|Sb|bSa?
Prove your claim.
5.71. Show that the context-free grammar with productions S → ε | aAB | aBA | bAA,
A → aS | bAAA, B → bS | aABB | aBBA | aBAB generates all and only strings
having exactly twice as many a’s as b’s.
5.72. Write an algorithm to construct a CFG from a given regular expression. Note
that the CFG need not be a regular grammar. Prove the correctness of the algorithm.
5.73. Suppose in the CFG G no production has the right side as ε. Let w ∈ L(G)
have ℓ(w) = n. Show that if w has a derivation of length m in G, then w has a parse
tree with m + n nodes.
5.75. Show that the following algorithm for testing membership of a string
(input string), in the language L generated by the CFG with productions
S → ε|SS|a S|a SbS, is correct:
1. If the current string begins with b, then the input string is not in L.
2. If the current string has no b’s, then the input string is in L.
3. Otherwise delete the first b and the a that occurs to its immediate left; and repeat
the three steps on the new string.
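The three steps translate directly into code. A small sketch for experimenting with the claim (the function name is mine); proving it correct is, of course, the problem itself.

```python
def in_language(w):
    """The three-step test above, for the CFG with productions
    S -> eps | SS | aS | aSbS over the alphabet {a, b}."""
    while True:
        if w.startswith("b"):            # step 1
            return False
        if "b" not in w:                 # step 2
            return True
        i = w.index("b")                 # step 3: drop the first b and
        w = w[:i - 1] + w[i + 1:]        # the a immediately to its left
```

For example, ab is accepted (S ⇒ aSbS ⇒ abS ⇒ ab) while abb is rejected.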
5.76. Some strings in L(G), where G is the CFG with productions S → ε | aS | aSbS,
have unique left-derivations and some do not. Give an algorithm to determine
whether a given string is of one type or the other.
5.79. Let G be a CFG and L = L(G). How do you construct a CFG G′ from G so
that L(G′) = head(L)?
Let G be a CFG in which every production A → x has ℓ(x) = k > 1. Show that the
height h of the parse tree for any w ∈ L(G) satisfies
(k − 1) log_k ℓ(w) ≤ h(k − 1) ≤ ℓ(w) − 1.
5.82. Use the exhaustive search parsing to parse the string baaaaaa in the CFG with
productions S → bAB, A → aBa, B → ε | A.
5.83. Show that if the method of exhaustive search parsing is used, there can be, at
the most, n + n² + · · · + n^{2ℓ(w)} number of nodes in a parse tree for a string w, where
n is the number of production rules in the CFG G.
5.84. Show that for every CFG G, there exists an algorithm that parses any w ∈ L(G)
in a number of steps proportional to (ℓ(w))³.
5.85. Prove: If G is a CFG in which each nonterminal occurs on the left side of at
most one production, then G is unambiguous.
5.87. Eliminate all useless productions from the CFGs whose productions are
(a) S → AB | bS, A → aA, B → AA.
(b) S → b | bA | B | C, A → ε | bB, B → Ab, C → aCD, D → ccc.
5.88. Prove that the marking method for eliminating useless productions works
correctly.
5.90. Show that if elimination of unit productions is done first, and then the elimina-
tion of useless symbols, then the CFG is simplified better than with the opposite order.
5.91. Define the difficulty level or complexity of a CFG G by first defining the com-
plexity of a production as complexity(A → x) = 1 + ℓ(x), and then taking
complexity(G) as the sum of the complexities of all productions of G. Show that elim-
ination of useless productions reduces the complexity of a CFG.
5.96. Let G be a CFG without an ε-production. Assume that the right hand side of
each production has length at most n. Convert G to G′ in CNF. Show that
(a) G′ has at the most a constant times n² number of productions.
(b) Considering the elimination of unit productions, show that it is possible for G′ to
have a number of productions actually proportional to n².
5.98. Let G be a CFG with m productions. Assume that the right side of each
production in G has length at most n. Show that if A ⇒∗G ε, then there is a derivation
of ε from A in at most (nᵐ − 1)/(n − 1) steps.
6 Structure of CFLs
6.1 Introduction
What can be the additions to a finite automaton so that it may accept any context-free
language? Clearly, some addition is required as each regular language is context-free
and there are context-free languages that are not regular. For example, the lan-
guage L = {aⁿbⁿ : n ∈ N} is context-free but it is not regular. Here you can see
that somehow the automaton must remember how many a’s it has read, and then it
has to consume b’s one after another, matching that number. A finite automaton
does not remember anything. So, we do not see how an automaton can accept the
language {ww̄ : w ∈ {a, b}∗ }, where w̄ is the string obtained from w by interchanging
a’s with b’s. We add some sort of memory to a finite automaton so that it remembers
the first half and then matches that with the second half, symbol by symbol.
Moreover, how would it know that it has come across the first half of the input
string? Well, we will keep that undetermined so that the automaton would guess and
then will do the right thing when it has the right guess. If it is not the right guess,
then the machine will not be penalized for it. A nondeterministic machine with some
kind of memory might do the job.
We start with such a nondeterministic machine with a very primitive type of memory,
called a stack or a pushdown storage. A stack resembles keeping books on your table,
where you can add one more on the top or take out one from the top. We plan to add
such a stack to a finite automaton. These two operations of adding one more on the
top, called pushing to the stack, and deleting one from the top, called popping off
from the stack, are inbuilt to any stack.
To see how the language {aⁿbⁿ : n ∈ N} can be accepted by an automaton with
a stack, think of an automaton that accepts {aᵐbⁿ : m, n ∈ N} and then give it the
capability of putting one pebble on the stack when an a is read and throwing one
pebble out when a b is read. So, the stack must be empty before the automaton starts
A. Singh, Elements of Computation Theory, Texts in Computer Science,
© Springer-Verlag London Limited 2009
and it must also be empty at the end of its work for accepting the string. This stack
along with a finite automaton for a∗b∗ will accept {aᵐbᵐ : m ∈ N}.
Such a machine has a finite input tape containing its input. The reading head
reads from left to right one symbol at a time, or sometimes reads nothing at all.
The control unit has a device to point out the current state of the machine. Moreover,
the control unit can change the state of the machine by following certain instructions
that are given by a transition relation.
The stack-head is capable of reading only one symbol (or can pretend not to read
anything, i.e., reading the empty string) from the top of the stack. It can also sense
when the stack (-string) is empty. It can push a string B1 B2 · · · Bk down the stack in
one move by pushing Bk first and B1 last, in that order. The stack can store any finite
amount of information; therefore, it is potentially infinite. Look at the schematic
drawing in Fig. 6.1.
[Fig. 6.1. Schematic of a pushdown automaton: an input tape holding a b · · · c d
under the reading head, a finite control pointing at the current state among p, q, r ,
and a stack with the symbol B on top.]
Intuitively, this means that when the pushdown automaton P is in the state q,
reading the symbol σ on the input tape, having the symbol A on the top of the stack,
it acts by changing its state to r , popping off A from the top of the stack, and pushing α
to the stack (replacing A by α on the stack so that α read from left to right matches
the stack read from top to bottom). The input symbol σ has thus been read, and the
reading head of P is placed on the square next to the one containing σ.
Notice that our pushdown automaton is nondeterministic; it has the freedom of
taking one of the many allowed actions even from the same state, input-symbol, and
stack-symbol tuple. And all this is done in a single move. This happens due to the
transition relation Δ, which is a relation and not (necessarily) a partial function.
During its operation, it may find zero or more transitions applicable
at a particular instant. Let us see the degenerate cases, when one or more
of the possibilities σ = ε, A = ε, or α = ε happens.
For σ = ε, the automaton does not read anything but does all the other things it
is supposed to do. This means that if (q, ε, A, r, α) ∈ Δ, the machine is currently in
state q, the symbol on the top of the stack is A, and if it chooses to follow this tran-
sition (there might be other applicable transitions), then without moving its reading
head it changes its state to r , pops off the symbol A from the top of the stack, and
pushes the string α to the stack.
For A = ε, the quintuple (q, σ, ε, r, α) says that if the machine is currently in
the state q reading the symbol σ on the input string, by following this transition,
it changes its state to r , moves its reading head to the square next to the currently
scanned square (unless σ = ε), and simultaneously, pushes the string α to the stack.
Similarly, when α = ε, the quintuple (q, σ, A, r, ε) says that if the machine is in
state q, reading σ on the input tape, and finding A on the top of the stack, then it
changes its state to r , moves the reading head one square to the right (unless σ = ε),
and does not do anything with the stack. Here, after this transition is followed, the
top of the stack remains the same as earlier.
Exercise 6.1. What do the transitions (q, ε, ε, r, α), (q, ε, A, r, ε), (q, σ, ε, r, ε), and
(q, ε, ε, r, ε) mean?
At any instant of time during its operation, a description of a PDA must inform
us in what state it is, what is the remaining part of the input string yet to be read,
including the currently scanned symbol, and what string is there on the stack, from
top to bottom. Such an instantaneous description of a PDA is called a configuration.
Formally, a configuration of P is any element of Q × Σ ∗ × Γ ∗ . For example, the
initial configuration of P with an input u is (s, u, ε).
The one-step operation of P can be described by telling which configuration gives
rise to which other configuration. That is, an operation in one step can be described
by determining what the PDA looks like after one step of operation. This can be
formally captured by the yield in one step relation from configuration to configura-
tion. Suppose that currently P is in state q, reading the input symbol σ (i.e., σv on
the tape for some v ∈ Σ ∗ ) while the symbol A is on the top of the stack (i.e., you
have Aβ on the stack from top to bottom, for some string β ∈ Γ ∗ ). This means that
P has the current configuration (q, σv, Aβ).
162 6 Structure of CFLs
Suppose further that there is a transition (q, σ, A, r, α). Then by using this transi-
tion, P would change its state from q to r , move its reading head to the next square
on the right (of that containing σ), and replace the symbol A by α on the stack by
popping off A and pushing α. Which means that the next configuration of P will be
(r, v, αβ). In such a case, we will say that the configuration (q, σv, Aβ) yields in one
step the configuration (r, v, αβ). We will use the symbol ⊢¹ for “yield in one step.”
The reflexive and transitive closure of the relation of yield in one step will be referred
to as yield, and we will denote it by the symbol ⊢∗ . Formally,

(q, σv, Aβ) ⊢¹ (r, v, αβ) if (q, σ, A, r, α) ∈ Δ.

For configurations C, C′ , C ⊢∗ C′ iff either C = C′ or
there is a sequence of configurations C1 , · · · , Cn , n ≥ 1, such that
C ⊢¹ C1 , C1 ⊢¹ C2 , . . . , Cn−1 ⊢¹ Cn , Cn ⊢¹ C′ .
Such a sequence of configurations is called a computation of the PDA. The com-
putation C ⊢∗ C is an empty computation. The degenerate cases of the configurations
and the corresponding yield relation are taken care as in the following:
1
(a) If (q, ε, A, r, α) ∈ Δ, then (q, v, Aβ) (r, v, αβ).
1
(b) If (q, σ, ε, r, α) ∈ Δ, then (q, σv, β) (r, v, αβ).
1
(c) If (q, σ, A, r, ε) ∈ Δ, then (q, σv, Aβ) (r, v, β).
1
(d) If (q, ε, ε, r, α) ∈ Δ, then (q, v, β) (r, v, αβ).
1
(e) If (q, σ, ε, r, ε) ∈ Δ, then (q, σv, β) (r, v, β).
Perhaps you do not have to remember the above degenerate cases in toto. All that
you need to understand is that in a transition, an empty second component means that
in one step of computation the automaton does not read anything, an empty third
component says that it does not pop off anything from the stack, and an empty fifth
component means that nothing is pushed to the stack.
As earlier, we will simply write ⊢ instead of ⊢¹ and ⊢∗ if no confusion arises. If
many PDAs occur in some context, then we will have a subscript in the symbol to
mention “yield in which PDA.”
Starting from its initial configuration, a PDA can halt in two ways, just like a finite
automaton. The first way is when the input string is over, and the second way is when
there is not an appropriate transition, though there might be a nonempty string on the
input tape yet to be read. In the second case, we say that the PDA has halted abruptly.
The string is accepted only when the PDA halts the first way, and on top of that,
the last state where the PDA halts must be a final state, and the stack must also be
empty.
Formally, let P = (Q, Σ, Γ, Δ, s, F) be a PDA. P accepts w iff there exists
a computation of P in which (s, w, ε) ⊢∗P (q, ε, ε), for some state q ∈ F. The
language accepted by P is L(P) = {w ∈ Σ∗ : P accepts w}.
Notice that we write the same ε for the empty input string and also for the empty
string on the stack; it is the empty string any way. If the PDA halts abruptly, then the
input string has not been read completely; consequently, it is not accepted. As you
see, an NFA is a PDA that never operates on its stack.
Example 6.1. What is L(P) if P = ({s, q, r }, {a, b}, {α, β}, Δ, s, {s}) is a PDA with
Δ = {(s, a, α, q, α), (q, b, α, s, α), (q, b, α, r, α), (r, a, α, s, α)}?
Solution. As earlier, let us have some computations. What happens if P is given the
input string ab? The initial configuration of P with the input ab is (s, ab, ε). We see
that every transition requires α to be on the top of the stack. Hence no transition is
applicable and P halts; that is, (s, ab, ε) ⊢∗ (s, ab, ε) and nothing more. The machine
does not read any nonempty string at all. However, (s, ε, ε) ⊢∗ (s, ε, ε) by default,
and s is a final state. Thus L(P) = {ε}.
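Acceptance can also be checked mechanically by searching over configurations with the yield relation. The sketch below is an illustration of mine, not the book's: transitions are quintuples as in the text, the stack is a string read top-to-bottom, and the search bounds the stack height so that ε-pushing loops terminate (which can, in principle, miss acceptances needing a deeper stack).

```python
from collections import deque

def pda_accepts(delta, start, finals, w, max_stack=None):
    """Breadth-first search over configurations (state, unread input, stack).
    A quintuple (q, sigma, A, r, alpha) reads sigma (possibly ""), pops A
    (possibly ""), and pushes alpha, whose leftmost symbol ends up on top.
    Acceptance: final state, input fully read, empty stack."""
    if max_stack is None:
        max_stack = len(w) + 4           # ad hoc bound for small examples
    frontier, seen = deque([(start, w, "")]), set()
    while frontier:
        q, u, stack = frontier.popleft()
        if u == "" and stack == "" and q in finals:
            return True
        for (p, sigma, top, r, alpha) in delta:
            if p != q or not u.startswith(sigma) or not stack.startswith(top):
                continue
            nxt = (r, u[len(sigma):], alpha + stack[len(top):])
            if len(nxt[2]) <= max_stack and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False
```

On the Δ of Example 6.1 with final state s, it accepts ε and rejects ab, agreeing with L(P) = {ε}. It is only a testing aid, not a decision procedure for arbitrary PDAs.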
In the transition diagram of a PDA, the transitions are pictured as arrows from
one state to another, labeled with an input symbol, followed by “A → α,” when
the stack symbol A is replaced by the string α. The state diagram for the PDA of
Example 6.1 is given in Fig. 6.2.
[Fig. 6.2. Transition diagram of the PDA of Example 6.1: an arrow from s to q
labeled a; α → α, arrows from q to s and from q to r labeled b; α → α, and an
arrow from r to s labeled a; α → α.]
But what is intended in the above example? The transitions in Δ say that the stack
symbol is never changed. But somehow this was not expressed correctly. See the
following modification:
Example 6.2. What is L(P) if P = ({s, q, r }, {a, b}, {α, β}, Δ, s, {s}), where Δ =
{(s, a, ε, q, ε), (q, b, ε, s, ε), (q, b, ε, r, ε), (r, a, ε, s, ε)}?
Solution. We try computations with some strings over {a, b}. Using the first and the
third transitions, we have
(s, ab, ε) ⊢ (q, b, ε) ⊢ (r, ε, ε).
It is a rejecting computation. However, we cannot just say from this that the string ab
is not accepted. There might be another accepting computation. Here is one, using
the first and the second transitions:
(s, ab, ε) ⊢ (q, b, ε) ⊢ (s, ε, ε).
Since s is a final state, the string ab ∈ L(P). Let us take one more string, say, aba.
Can you see that the stack is never used in any computation of P of Example 6.2?
The corresponding NFA, where we do not have a stack, can be given by N =
({s, q, r}, {a, b}, Δ, s, {s}) with Δ = {(s, a, q), (q, b, s), (q, b, r), (r, a, s)}. Compare
the transitions in P and N; the stack components are simply omitted. Here it is
all right, since in P the stack is never touched.
Example 6.3. What is L(P) if P = ({s, q}, {a, b}, {A, B}, Δ, s, {q}), where Δ =
{(s, a, ε, s, A), (s, b, ε, s, B), (s, ε, ε, q, ε), (q, a, A, q, ε), (q, b, B, q, ε)}?
Solution. Look at the transitions. In state s, the automaton pushes an A whenever it
reads an a, and pushes a B whenever it reads a b. How is the third transition applied?
Does it require the stack to be empty? No: even if there is a string γ on the stack,
γ = εγ, and thus the transition (s, ε, ε, q, ε) is still applicable; it is equally applicable
when the stack is empty. This transition says that without reading any symbol and
without operating on the stack, the automaton may change its state from s to q. The
fourth and the fifth transitions pop the symbols A and B off the stack upon reading
one a or one b, respectively, provided the automaton is in state q. To see how it
operates, let us try it on the strings ab and abba. One computation on ab is
(s, ab, ε) ⊢ (s, b, A) ⊢ (s, ε, BA) ⊢ (q, ε, BA).
Then it halts without accepting the string ab, as the stack is not empty. We try another
computation with the same string:
(s, ab, ε) ⊢ (q, ab, ε).
Here also it halts, as no transition is applicable. It does not accept the string, as the
input string has not been read completely. Let us take one more computation with the
same string:
(s, ab, ε) ⊢ (s, b, A) ⊢ (q, b, A).
Once again it halts without accepting the string. Does there exist an accepting computation
at all with the input string ab?
The string abba is accepted by P, as the following computation shows:
(s, abba, ε) ⊢ (s, bba, A) ⊢ (s, ba, BA) ⊢ (q, ba, BA) ⊢ (q, a, A) ⊢ (q, ε, ε).
You can find one more computation that would be a rejecting one. But that does not
matter: as we have one accepting computation, abba ∈ L(P). Look at the above
computation once more to understand how it accepts the string abba. The key fact
in this computation is that the machine nondeterministically changes state to q and
thereafter remains in the state q. Show that L(P) = {wwᴿ : w ∈ {a, b}∗}.
6.2 Pushdown Automata 165
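The search for an accepting computation can be mechanized. The following Python sketch (the function name and the encoding of transitions are ours, not the book's) explores all computations of a PDA by breadth-first search over configurations (state, unread input, stack), taking a string to be accepted exactly when some computation reads the whole input, empties the stack, and ends in a final state, as in the examples above. It is tried here on the PDA of Example 6.3.

```python
from collections import deque

def accepts(delta, start, finals, w, max_steps=100000):
    """Search the computations of a PDA.  A transition (q, a, X, r, alpha)
    reads a (or "" for epsilon), pops X (or "" for no pop) off the top of
    the stack, and pushes the string alpha.  The stack top is on the left."""
    queue = deque([(start, w, "")])
    seen = set()
    while queue and max_steps > 0:
        max_steps -= 1
        q, u, stack = queue.popleft()
        if u == "" and stack == "" and q in finals:
            return True          # an accepting computation has been found
        if (q, u, stack) in seen:
            continue
        seen.add((q, u, stack))
        for (p, a, X, r, alpha) in delta:
            if p == q and u.startswith(a) and stack.startswith(X):
                queue.append((r, u[len(a):], alpha + stack[len(X):]))
    return False                 # every computation is rejecting

# the PDA of Example 6.3, accepting {w w^R : w in {a, b}*}
delta = [("s", "a", "", "s", "A"), ("s", "b", "", "s", "B"),
         ("s", "", "", "q", ""), ("q", "a", "A", "q", ""),
         ("q", "b", "B", "q", "")]

print(accepts(delta, "s", {"q"}, "abba"))   # True
print(accepts(delta, "s", {"q"}, "ab"))     # False
```

With this encoding the stack top sits at the left end of the string, so reading w pushes its reversal, which is then matched symbol by symbol after the ε-move to q.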
Exercise 6.2. Let P = ({s, q}, {a, b}, {A, B}, Δ, s, {q}), with the transition relation
Δ = {(s, a, ε, s, A), (s, b, ε, s, B), (s, ε, ε, q, ε), (q, a, B, q, ε), (q, b, A, q, ε)}.
Compare this PDA with that in Example 6.3. Does P accept the language {ww̄ : w ∈
{a, b}∗}, where w̄ is obtained from w by interchanging the a's and b's?
Example 6.4. Construct a PDA to accept the binary strings having the same number of
0's as 1's, that is, {w ∈ {0, 1}∗ : #0(w) = #1(w)}.
Solution. Try to solve the problem yourself; return to this solution after working it out.
Let us use the stack to keep track of the excess of either digit found so far in the
input string. To use the stack this way, we push an A whenever a 0 is read and a B
whenever a 1 is read. However, when the automaton reads a 0 and the top stack symbol
is a B, it pops the B instead of pushing another A. Similarly, if it reads a 1 while the
top symbol is an A, it pops the A instead of pushing another B.
With the stack operated this way, when will a computation finish? Of course,
only after the input string is completely read. But then we also need the stack to be
empty. To this end, can we include the transition (q, ε, ε, r, ε)? We do not
want such a transition to be applicable everywhere. We can solve the problem by
initially pushing an extra symbol, say, C, so that only on finding this C on the stack
do we know that the stack is, in fact, empty. Then instead of the ε on the
stack in the above transition, we simply use this symbol C. Once met, the symbol C
will signal that the numbers of 0's and 1's read thus far are the same. So, we try the PDA
P = ({s, q, r }, {0, 1}, {A, B, C}, Δ, s, {r }), where
Δ = {(s, ε, ε, q, C), (q, 0, C, q, AC), (q, 0, A, q, AA), (q, 0, B, q, ε),
(q, 1, C, q, BC), (q, 1, B, q, BB), (q, 1, A, q, ε), (q, ε, C, r, ε)}.
Can you now show that P really does the intended job?
Exercise 6.3. Find an accepting computation of P in Example 6.4 for the input string
00011001010111. From the working of P, convince yourself that L(P) is the set
of binary strings with the same number of 0's as 1's.
We have used our memory device as a stack that comes with the two operations of
pushing and popping. What happens if we give a PDA more power by enabling it to
read a string from the stack, and not just a symbol?
A generalized PDA is like a PDA except that the transition relation Δ is a finite subset
of (Q × (Σ ∪ {ε}) × Γ∗) × (Q × Γ∗). That is, the stack-head can now read and
replace strings from Γ∗. However, generalized PDAs are no more powerful than
PDAs. To see this, suppose we have a generalized transition (q, a, β, p, γ), where
β = B₁B₂ · · · Bₙ with n > 1. We replace it by new transitions that pop B₁, B₂, . . . , Bₙ
one symbol at a time through n − 1 new intermediate states, reading a at the first step
and pushing γ at the last.
Exercise 6.4. Show that any language that is accepted by a generalized PDA is also
accepted by a PDA.
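The replacement just described can be made concrete. In the following Python fragment (the helper names are ours), a generalized transition (q, a, B₁B₂ · · · Bₙ, p, γ) is expanded into ordinary transitions that pop one Bᵢ at a time through fresh intermediate states, reading a at the first step only and pushing γ at the last.

```python
def expand(q, a, beta, p, gamma, fresh):
    """Replace the generalized transition (q, a, beta, p, gamma), where
    beta = B1 B2 ... Bn, by ordinary transitions popping one symbol each.
    `fresh` is a generator of unused state names."""
    new = []
    prev = q
    for i, B in enumerate(beta):
        last = (i == len(beta) - 1)
        nxt = p if last else next(fresh)
        # read a only at the first step; push gamma only at the last
        new.append((prev, a if i == 0 else "", B, nxt, gamma if last else ""))
        prev = nxt
    return new

def freshnames():
    i = 0
    while True:
        yield f"t{i}"
        i += 1

print(expand("q", "a", "BC", "p", "XY", freshnames()))
# [('q', 'a', 'B', 't0', ''), ('t0', '', 'C', 'p', 'XY')]
```

Each new transition pops a single stack symbol, so the expanded machine is an ordinary PDA accepting the same language.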
However, a triviality may be bothersome: how do we push or pop the empty string?
It can be done in two steps, by pushing a new symbol and then popping it out. This
is why we have allowed a transition to operate on the empty string in the stack. For
example, consider a transition of the form (q, σ, ε, r, β). It means: without regard to
what is on the stack, read the symbol σ in state q, and then change state to r while
pushing β onto the stack.
The same effect can be achieved by having a transition of the form (q, σ, X, r, βX)
for each X ∈ Γ. The difference is that in the latter case, the action of the PDA is
more explicit. PDAs in which these new transitions have been added corresponding
to each transition of the form (q, σ, ε, r, β) are called simple pushdown automata,
or SPDAs for short.
Formally, an SPDA is a PDA P = (Q, Σ, Γ, Δ, s, F), where Q, Σ, Γ, s, F are
as in a PDA, and Δ satisfies the property that whenever (q, σ, ε, r, β) ∈ Δ, we also
have (q, σ, A, r, βA) ∈ Δ, for each A ∈ Γ.
Let us call this the simplicity restriction. Our first concern is whether we lose any
power by putting such a restriction on a PDA. The discussion above says that we do
not: we can always construct an SPDA accepting the same language by introducing
new transitions into the old PDA. See the following example:
Example 6.5. Consider the PDA in Example 6.3. We add more transitions to it
corresponding to the empty-stack-transitions. The corresponding SPDA is P =
({s, q}, {a, b}, {A, B}, Δ, s, {q}), where
Δ = {(s, a, ε, s, A), (s, b, ε, s, B), (s, ε, ε, q, ε), (q, a, A, q, ε), (q, b, B, q, ε),
(s, a, A, s, AA), (s, a, B, s, AB), (s, b, A, s, BA), (s, b, B, s, BB),
(s, ε, A, q, A), (s, ε, B, q, B)}.
Exercise 6.5. Show that L(P) = L(P′), where P is the PDA of Example 6.3 and P′
is the PDA of Example 6.5.
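The passage from a PDA to an SPDA is entirely mechanical, as Example 6.5 suggests. A Python sketch of the completion (our own encoding, with ε written as the empty string):

```python
def make_simple(delta, stack_symbols):
    """For every transition (q, s, "", r, beta) that ignores the stack,
    add (q, s, A, r, beta + A) for each stack symbol A."""
    extra = [(q, s, A, r, beta + A)
             for (q, s, X, r, beta) in delta if X == ""
             for A in stack_symbols]
    return list(delta) + extra

# the PDA of Example 6.3, with stack alphabet {A, B}
delta = [("s", "a", "", "s", "A"), ("s", "b", "", "s", "B"),
         ("s", "", "", "q", ""), ("q", "a", "A", "q", ""),
         ("q", "b", "B", "q", "")]
for t in make_simple(delta, "AB"):
    print(t)   # the eleven transitions of Example 6.5
```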
Have you completed Exercise 6.5? Then you would have realized that the
addition of new transitions to satisfy the simplicity restriction does not change the
language accepted. The construction is as follows: corresponding to each transition
of the form (q, σ, ε, r, β) in Δ, add the transitions (q, σ, A, r, βA) for every A ∈ Γ.
After the construction, the resulting PDA is an SPDA. On the one hand, if a string
is accepted by the PDA without the extra transitions, then it is also accepted by the
SPDA, due to nondeterminism in using the transitions. On the other hand, these additions
cannot get any new string accepted by the SPDA, as an application of a transition
of the form (q, σ, ε, r, β) already includes an application of a transition of the form
(q, σ, A, r, βA) (Why is it so?). So we have proved the following statement:
Exercise 6.6. Can we afford to delete the transitions (q, σ, ε, r, β) from the con-
structed SPDA?
6.1. Let P be the PDA with initial state s, final state q, and transitions (s, a, ε, s, A),
(s, b, ε, s, A), (s, a, ε, q, ε), (q, a, A, q, ε), (q, b, A, q, ε). Show that aab, bab ∈
L(P) but aba, bba ∉ L(P). Describe L(P) in English.
6.2. Draw transition diagrams for the PDAs of Examples 6.2 and 6.3.
6.3. What are the languages accepted by the PDAs with initial state p, final states in
F, and transitions in Δ as specified below?
(a) Δ = {(p, b, ε, q, A), (q, b, A, q, AA), (r, a, A, s, ε), (s, a, A, t, ε), (t, a, ε, t, ε)},
F = {u}.
(b) Δ = {(p, a, ε, q, A), (p, a, ε, r, ε), (q, b, A, q, B), (q, b, B, q, B), (q, a, B, r, ε)},
F = {r}.
(c) Δ = {(p, ε, ε, q, ε), (p, a, ε, p, A), (p, b, ε, p, B), (p, a, A, p, AA), (p, b, A, p, ε),
(p, a, B, p, ε), (p, b, B, p, BB)}, F = {p, q}.
(d) Δ = {(p, a, ε, q, A), (p, a, ε, r, ε), (q, b, A, q, B), (q, b, B, q, B), (q, a, B, r, ε)},
F = {p, q, r}.
(e) Δ = {(p, a, ε, p, A), (p, a, A, p, A), (p, b, A, p, B), (p, a, B, p, B), (p, ε, B, q, B)},
F = {q}.
(f) Δ = {(p, a, A, q, BA), (p, a, A, s, ε), (p, ε, A, s, ε), (q, a, B, q, BB), (q, b, B, r, ε),
(r, b, B, r, ε), (q, ε, A, p, ε)}, F = {s}.
6.5. Construct a PDA for accepting the language generated by the CFG with productions
S → aB | aaabb, A → a | B, B → bb | AaB. Does there exist a nonaccepting
computation of the string aaabb in your PDA?
6.6. Find CFGs for generating, and PDAs for accepting, the languages {w ∈ {a, b}∗ :
P(w)}, where P(w) is the property:
(a) w contains at least three b's.
(b) w starts and ends with the same symbol.
(c) ℓ(w) is odd.
(d) ℓ(w) is odd and the middle symbol of w is b.
(e) w contains more a's than b's.
(f) w is a palindrome.
(g) w ≠ wᴿ.
(h) w has twice as many a's as b's.
(i) w ≠ aⁿbⁿ for any n ∈ N.
168 6 Structure of CFLs
6.7. Construct CFGs and PDAs for the following languages over {a, b, c}:
(a) {vcw : v, w ∈ {a, b}∗ and vᴿ is a substring of w}.
(b) {w₁cw₂c · · · cwₖ : k ≥ 1, each wᵢ ∈ {a, b}∗, and for some m, n, wₘ = wₙᴿ}.
(c) {ucv : u, v ∈ {a, b}∗, u ≠ v}.
(d) {ucv : u, v ∈ {a, b}∗, u ≠ v but ℓ(u) = ℓ(v)}.
6.8. Construct PDAs for accepting the following languages over {a, b, c}:
(a) {wwᴿ : w ∈ {a, b}∗}.
(b) {aⁿbⁿ : n ∈ N} ∪ {a}.
(c) {aⁿb²ⁿ : n ∈ N}.
(d) {aᵐbⁿcᵐ⁺ⁿ : m, n ∈ N}.
(e) {aᵐbᵐ⁺ⁿcⁿ : m ≥ 1, n ∈ N}.
(f) {aᵐbⁿ : m ≤ n ≤ 2m}.
(g) {aᵐbⁿ : m ≤ n ≤ 3m}.
(h) {aᵐbⁿ : m, n ∈ N, m ≠ n}.
(i) {ab(ab)ⁿb(ba)ⁿ : n ∈ N}.
(j) {w ∈ {a, b}∗ : #b(w) = 2 · #a(w)}.
(k) {w ∈ {a, b, c}∗ : #a(w) + #b(w) = #c(w)}.
(l) {w ∈ {a, b}∗ : #a(w) < #b(w)}.
(m) {w ∈ {a, b, c}∗ : 2 · #a(w) ≤ #b(w) ≤ 3 · #c(w)}.
(n) {vcw : v, w ∈ {a, b}∗, v ≠ wᴿ}.
6.10. Suppose that we use a start symbol on the stack of a PDA, initially, and allow
its temporary removal during computation. That is, such a symbol will always be
there at the bottom of the stack; if it is removed by following a transition, then
immediately it has to be pushed back onto the stack. This allows transitions of the type
(p, a, ⊥, q, α⊥), but neither of the type (p, a, ⊥, q, ε) nor of the type (p, a, A, q, α⊥),
where ⊥ is “bems,” the bottom-of-the-stack symbol. At the end of a computation,
we consider the stack to be empty when the only symbol in it is ⊥. Show that we
can always suitably modify the transitions in such a way that, given a PDA, there is
6.3 CFG and PDA 169
The PDA, while in operation, mimics a leftmost derivation, checking a primed terminal
symbol on the top of the stack against the currently scanned symbol of the
input string. See the following example before arguing abstractly.
Example 6.6. Carry out the above construction of a PDA from the context-free
grammar of Example 5.4.
S → aA | bB, A → b | bS | aAA, B → a | aS | bBB.
The corresponding PDA is PG = ({s, q}, {a, b}, {S, a′, b′, A, B}, Δ, s, {q}), where
Δ = {(s, ε, ε, q, S), (q, ε, S, q, a′A), (q, ε, S, q, b′B), (q, ε, A, q, b′),
(q, ε, A, q, b′S), (q, ε, A, q, a′AA), (q, ε, B, q, a′), (q, ε, B, q, a′S),
(q, ε, B, q, b′BB), (q, a, a′, q, ε), (q, b, b′, q, ε)}.
Correspondingly, PG goes through the following computation (each line below gives one
step of the derivation in the grammar):
(s, aabbba, ε) ⊢ (q, aabbba, S) ⊢ (q, aabbba, a′A)
⊢ (q, abbba, A) ⊢ (q, abbba, a′AA)
⊢ (q, bbba, AA) ⊢ (q, bbba, b′A) ⊢ (q, bba, A)
⊢ (q, bba, b′S) ⊢ (q, ba, S)
⊢ (q, ba, b′B) ⊢ (q, a, B)
⊢ (q, a, a′) ⊢ (q, ε, ε).
It is fairly clear how a string having a leftmost derivation in G is accepted by the
PDA PG . In fact, the mimicking process says a little more.
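The construction of PG from a grammar can be written out once and for all. The following Python sketch (our encoding: a primed terminal σ′ is represented as the string σ + "'", and pushed or popped stack strings as tuples of stack symbols) produces the three kinds of transitions: push S initially, expand a nonterminal on top of the stack into the primed body of a production, and match a primed terminal against the scanned input symbol. On the grammar of Example 6.6 it reproduces the Δ displayed above.

```python
def cfg_to_pda(productions, terminals):
    """Transitions of the PDA P_G that mimics leftmost derivations."""
    prime = lambda x: x + "'" if x in terminals else x
    delta = [("s", "", (), "q", ("S",))]            # push the start symbol
    for lhs, rhs in productions:
        # expand a nonterminal by the primed right-hand side
        delta.append(("q", "", (lhs,), "q", tuple(prime(x) for x in rhs)))
    for a in terminals:
        # match a primed terminal on the stack with the input symbol
        delta.append(("q", a, (a + "'",), "q", ()))
    return delta

# grammar of Example 6.6: S -> aA | bB, A -> b | bS | aAA, B -> a | aS | bBB
prods = [("S", "aA"), ("S", "bB"), ("A", "b"), ("A", "bS"), ("A", "aAA"),
         ("B", "a"), ("B", "aS"), ("B", "bBB")]
for t in cfg_to_pda(prods, "ab"):
    print(t)
```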
S ⇒ · · · (in n − 1 steps) · · · ⇒ v ⇒¹ wy.
Then v has at least one nonterminal symbol, which has been replaced by an allowed
string to obtain wy. So, assume that v = x₁Zx₂, where Z is the leftmost nonterminal
in v; the production used was Z → z ∈ R, and then wy = x₁zx₂. By the induction
hypothesis, we have
(q, x₁, S) ⊢ (q, ε, Zx₂).
Exercise 6.7. In the induction step of the proof of Lemma 6.2, where have you used
the “leftmost derivation of wy”? Further, in the last line of the same proof, you have
used the statement “If (q, x₁, S) ⊢ (q, ε, zx₂), then (q, x₁α, S) ⊢ (q, εα, zx₂).”
Prove this statement formally.
(q, w, S) ⊢ · · · (in n − 1 steps) · · · ⊢ (q, v, z) ⊢¹ (q, ε, y).
The last step says that there is a transition in Δ, which is either of the form
(q, ε, A, q, α′) or (q, σ, σ′, q, ε).
In the former case, v = ε, z = Aβ, y = α′β for some β ∈ (N ∪ Σ)∗. That is, there
is a production A → α ∈ R. Then the computation would look like
(q, w, S) ⊢ · · · (in n − 1 steps) · · · ⊢ (q, ε, Aβ) ⊢¹ (q, ε, y).
S ⇒ · · · ⇒ wAβ ⇒ wαβ = wy.
As P is an SPDA, each of its transitions is in one of the forms in (c) or (d)
above. A production of type (a) says that no computation is needed to remain in
the same state. Type (b) productions say that if the stack remains unchanged, then P
can change its state from the initial to the final one. A production of type (c) says that P
changes its state from q to r while reading u, and at that moment it either removes X
from the stack or keeps the stack unchanged. Finally, a production of type (d) says
that P changes state from q to p while removing X (or ε) from the stack through
the n states, which effect the removal of A₁, . . . , Aₙ after taking a single move in
reading the input u and changing state to r. We now prove the following result:
Proof. If [q, X, r] ⇒ u in one step, then a type (a) production has been applied.
Then q = r, u = ε, X = ε, and consequently, (q, u, X) ⊢ (r, ε, ε) trivially. Lay
out the induction hypothesis that if [q, X, r] ⇒ u in fewer than m steps, then
(q, u, X) ⊢ (r, ε, ε). Suppose that [q, X, r] ⇒ u in m steps. Then the first step
must be an application of a production of one of the types (c) or (d).
Let us first consider the type (d) productions. Here, the derivation looks like
[q, X, r] ⇒¹ v[p, A₁, q₁] · · · [qₙ₋₁, Aₙ, r] ⇒ u.
The former case, of type (c) productions, is similar to and a bit easier than this.
Proof. If (q, u, X) ⊢ (r, ε, ε) in zero steps, then q = r, u = ε, X = ε, and
[q, ε, q] ⇒ ε due to the type (a) production in GP. Lay out the induction hypothesis
that if (q, u, X) ⊢ (r, ε, ε) in a computation of length fewer than m, then
[q, X, r] ⇒ u. Suppose that (q, u, X) ⊢ (r, ε, ε) in a computation of length m and
no fewer. Then such a computation looks like
(q, u, X) ⊢¹ (p, w, A₁ · · · Aₙ) ⊢ (r, ε, ε) (in m − 1 steps).
In the nontrivial case n > 0, all the Aᵢ's are eventually removed from the stack. The
SPDA will then have states q₁, . . . , qₙ₋₁, and there are strings w₁, . . . , wₙ such that
Thus we see that the construction has the desired effect. We summarize the dis-
cussion as follows:
Theorem 6.1. Let L be a language over some alphabet. Then there is a context-free
grammar G with L(G) = L iff there is a PDA P with L(P) = L.
Proof. Let G be a context-free grammar with L(G) = L. Then by Lemma 5.1, each
w ∈ L has a leftmost derivation in G. By Lemmas 6.2 and 6.3 (with x = ε, u = w),
there is a PDA, say P, such that (q, w, S) ⊢ (q, ε, ε). Moreover, in the construction of
P, we have a transition that allows (s, w, ε) ⊢ (q, w, S). This shows that (s, w, ε) ⊢
(q, ε, ε). That is, w ∈ L(P).
Conversely, let w ∈ L(P) for a PDA P. By Lemma 6.1, there is an SPDA, say,
P′ such that w ∈ L(P′). Then (s, w, ε) ⊢ (f, ε, ε) for some final state f of P′. Using
Lemmas 6.4 and 6.5 (with q = s, X = ε, r = f), we obtain a context-free grammar
G such that [s, ε, f] ⇒ w in G. That is, w ∈ L(G).
6.11. Let G be a CFG with productions S → c | aSa | bSb. Take the PDA P with initial
state p, final state q, and transitions (p, ε, ε, q, S), (q, ε, S, q, ASA), (q, ε, S, q,
BSB), (q, ε, S, q, C), (q, a, A, q, ε), (q, b, B, q, ε), (q, c, C, q, ε). Derive the string
baacaab in G and also trace the computation of P on baacaab. Relate the two.
6.12. Construct, in each case, a PDA that accepts the language generated by the CFG
with productions as given below.
(a) S → a | aSA, A → bB, B → b.
(b) S → ε | SS | aSa.
(c) S → C | aSb, C → ε | S | bCa.
(d) S → aAA, A → a | aS | bS.
(e) S → aA, A → a | bB | aABC, B → b, C → c.
(f) S → aab | aSbb.
(g) S → ab | aSSS.
(h) S → aAA | aABB, A → a | aBB, B → A | bBB.
(i) S → a | AA, A → b | SA.
6.4 Pumping Lemma 175
6.13. Find CFGs that generate the languages accepted by the PDAs with initial state
p, final state q, and transitions
(a) (p, a, ε, p, A), (q, ε, ε, p, A), (p, a, A, q, ε), (p, b, A, r, ε), (r, ε, ε, q, ε).
(b) (p, a, ε, p, A), (p, a, A, p, A), (p, b, A, r, ε), (r, ε, ε, q, ε).
(c) (p, a, ε, p, A), (p, b, ε, p, B), (p, ε, ε, q, ε), (q, a, A, q, ε), (q, b, B, q, ε).
(d) (p, ε, ε, r, C), (r, a, C, r, AC), (r, a, A, r, AA), (r, a, B, r, ε), (r, b, C, r, BC),
(r, b, B, r, BB), (r, b, A, r, ε), (r, ε, C, q, ε).
(e) (p, 1, ε, p, A), (p, 1, A, p, AA), (p, 0, A, q, A), (p, ε, A, p, ε), (q, 1, A, q, ε),
(q, 0, ε, p, ε), accepted by empty stack.
(f) (p, 0, ε, p, A), (p, 0, A, p, AA), (p, 1, A, p, A), (p, ε, A, q, ε), (q, ε, A, q, ε),
(q, 1, A, q, AA), (q, 1, ε, q, ε), accepted by final states.
6.14. Construct a PDA with two states, one initial and the other final, that accepts the
language
(a) L = {aⁿbⁿ⁺¹ : n ∈ N}.
(b) L = {aⁿb²ⁿ : n ≥ 1}.
(c) {aᵐbⁿcᵏ : m = 2n or n = 2k; m, n, k ∈ N}.
(d) {aᵐbⁿc²⁽ᵐ⁺ⁿ⁾ : m, n ∈ N}.
6.15. Is it true that corresponding to each PDA there exists an equivalent PDA with
only two states, one of which is the initial state and the other is the final state?
6.16. Consider a PDA M with n states and m distinct input symbols. What is the
maximum possible number of rejecting computations that M can have on an input
of length k?
There are only countably many pushdown automata. Can you see this? In that
case, there are only countably many context-free languages over any alphabet. But
there are uncountably many languages over the same alphabet. Hence, there are
uncountably many languages that are not CFLs. However, this argument neither
gives a noncontext-free language nor helps in proving whether a given language is
context-free or not. In this section, we suggest a partial remedy: the pumping lemma.
Let us begin with an example.
Example 6.7. Let G = ({S}, {a, b}, {S → ε | aSb | SS}, S). A parse tree in G with
yield aabbab is shown in Fig. 6.3.
Fig. 6.3. A parse tree in G with yield aabbab.
The leftmost derivation of the same string that has been represented in the parse
tree in Fig. 6.3 is
S ⇒ SS ⇒ aSbS ⇒ aaSbbS ⇒ aabbS ⇒ aabbaSb ⇒ aabbab.
Look at the derivation of aabbab and also the parse tree. It is clear that any string of
the form aaⁿεbⁿbab can be derived in G. It is, of course, obvious from the grammar
itself, as L(G) is simply the language of matching parentheses [with a as ( and b as )].
However, I want you to realize this fact by looking at the parse tree. Can you see it?
In the second step of the derivation, the string (over N ∪ Σ) is aSbS.
Now, the first S in it can be grown into aⁿSbⁿ by repeated use of the production
S → aSb. But not only that; the same method can be used on any string of length
greater than 3. Why?
If we take any string (over Σ) of length greater than 3, there must be a parse tree
for it of height at least 2. This happens because on the right hand side of any production
there are at most three symbols, so any node in the parse tree can have at most
three children. If a parse tree has height 1, then the number of leaves in it is no
more than 3. Thus any parse tree that yields a string of length greater than 3 has
height at least 2. In that case, there is a path from the root to a leaf
of length (number of edges in the path) at least 2.
That is, such a path has at least three nodes in it. Of these nodes, the last one,
the leaf, is a terminal symbol from Σ, and all the others are nonterminals from N.
As there is only one nonterminal here, namely, S, it must occur at least twice in the
path. That is, while traveling from the root to the leaf along that path, we find S at
least twice. For example, in the parse tree of Fig. 6.3, we take the path “S to S to
S to S to ε,” which gives the tree its height. Now, taking the second and the fourth
occurrences of S, we can have a relook at the tree as in Fig. 6.4.
Fig. 6.4. Another parse tree for Example 6.7.
In Fig. 6.4, we have kept the yield aa and bb at the appropriate places by shrinking
the tree at the second occurrence of S. As the same node S is repeated, the whole
portion shown within a triangle can be inserted below the S again to have another
parse tree. Such an insertion will give us the parse tree in Fig. 6.5.
Fig. 6.5. Parse tree for Example 6.7 with inserted nodes.
This corresponds to a derivation of the string aa(aabb)bbab. In fact, we can
always construct a derivation of aa(aabb)ⁿbbab from that of aabbab by repeating
the insertion of the said portion of the parse tree. Looking at it in a slightly different
way, we can get a derivation of aabbab from a derivation of aa(aabb)ⁿbbab, where
n > 1. Loosely speaking, the latter observation says that from derived strings of certain
bigger lengths, we can obtain derivations of strings of smaller lengths.
Proof. Let T₁ be a parse tree in G with yield w ∈ Σ∗, having the least height
among all possible parse trees yielding w. Suppose that ℓ(w) > kᵐ, that is, the
number of leaves in T₁ is more than kᵐ. Each node in T₁ can have at most k
children. Hence the height of T₁ is greater than m.
Let P be a path in T₁ from the root to a leaf, which gives T₁ its height. Now, P
has at least m + 1 edges, and thus at least m + 2 nodes. The last node, the leaf,
is a terminal symbol, and all the others are nonterminals in N. But N has only m
symbols. Thus, by the pigeonhole principle, there is at least one nonterminal that is
repeated in P. Choose such a symbol, say, A ∈ N, which occurs at least twice
in P. Take the last two occurrences of A in P.
To understand what is going on, we construct five more parse trees from T₁. Look
at the schematic drawing in Fig. 6.6.
Fig. 6.6. Schematic drawing of the trees T₁, T₂, T₃ (with yields uvxyz, vxy, uvAyz) and T₄, T₅, T₆ (with yields x, vAy, uAz).
Let T₂ be the subtree of T₁ whose root is the last but one occurrence of A in the
path P. Let T₃ be the tree obtained from T₁ by cutting down the subtree from the
last occurrence of A in P, but keeping this A as a leaf. Let T₄ be the subtree of T₁
whose root is the last occurrence of A in P. That is, you get T₃ by cutting down
T₄ from T₁ and keeping this A. Moreover, T₄ is a subtree of T₂. Let T₅ be the tree
obtained from T₂ by cutting down T₄ from it and keeping the last A as a leaf. Let T₆
be the tree obtained from T₃ by cutting down T₅ from it and keeping the A (last but
one occurrence in P) as a leaf.
Now, what are the yields of these parse trees? The original parse tree T₁ has the
yield w anyway. Look at the smaller parse trees first. Suppose that the parse tree T₄
has yield x for some x ∈ Σ∗. This means that in G, we have a derivation A ⇒ x.
Similarly, the tree T₅ has the yield vAy and the tree T₆ has a yield of the form uAz
for some u, v, y, z ∈ Σ∗. Then we have derivations A ⇒ vAy and S ⇒ uAz in G.
Then what are the yields of the other parse trees?
Look at Fig. 6.6 again. The yields are written below the trees. From the yields of
T₄ and T₅, we see that the yield of T₂ is vxy. Similarly, from the yields of the trees
T₅ and T₆, we obtain the yield of T₃ as uvAyz. Finally, from the yields of T₃ and T₄,
we get the yield of T₁ as uvxyz. The sequence of derivations, in terms of growing the
parse trees, can be given as
S ⇒ uAz ⇒ uvAyz ⇒ uvxyz.
Can both v and y be empty? If so, then the tree T₅ can be completely removed
from T₁ without changing the yield. This is done by removing the subtree at the last
but one occurrence of A and adding the subtree rooted at the last A at the same
node. Notice that the subtree at the last but one A generates vxy, while the subtree
at the last A generates x. As v = y = ε, vxy = x.
This deletion can be carried out in every such path in T₁ that gives the tree its
height. This way, the height of T₁ is reduced while the yield remains w. That
is, if v = y = ε, then there exists a parse tree of height shorter than that of T₁
whose yield is w. This is a contradiction, as T₁ is a tree of shortest height with
yield w. Therefore, v ≠ ε or y ≠ ε.
To see that ℓ(vxy) ≤ kᵐ, look at the tree T₂, whose yield is vxy. We had chosen
the repeated nonterminal A arbitrarily. Now we make a special choice so that
our job gets done: we choose the repeated nonterminal A in such a way that on
the path P from the root of T₂ to the leaf, no other nonterminal is repeated. In turn,
this uses an appropriate choice of T₂.
It is obvious that such a nonterminal can always be chosen. As the path P in T₁
was a longest path, the portion of the same path in T₂ is also a longest one. However,
its length is no more than m, as there are only m nonterminals in G. Thus,
every path from the root of T₂ to any leaf has length at most m.
Moreover, each node in T₂ has at most k children. Thus, there are at most kᵐ
leaves in T₂. This shows that ℓ(vxy) ≤ kᵐ and completes the proof.
Lemma 6.6 also shows how certain strings can be pumped into a derived string.
This also shows how certain strings can be pumped out! We state and prove a formal
version of the Pumping Lemma for context-free languages, from which this pumping
out will become clearer.
S ⇒ uAz ⇒ uvAyz ⇒ uvvAyyz ⇒ · · · ⇒ uvʲAyʲz ⇒ uvʲxyʲz.
The assumption “infinite” is not really required in the pumping lemma as the
conclusion is a conditional statement. However, it guarantees that a string w of length
more than n, in fact, exists. Note that the number n is fixed for a language, and the
lengths of the pumped strings v and y are also somehow fixed by the language, for the
given string w. We have the freedom to choose only the number of pumped strings
into an already derived string. As applications of the pumping lemma we will show
that some languages are not context-free.
Solution. Suppose that L is context-free. Let n be the number fixed for L as
in the pumping lemma. Take k = n + 1. The string w = aᵏbᵏcᵏ ∈ L and it has length
more than n. We can thus write w = uvxyz, where at least one of v, y is nonempty.
Suppose that v ≠ ε; the other case, y ≠ ε, is similar. What can this string v be? As
ℓ(vxy) ≤ n, either v has occurrences of only one letter or v has occurrences of two
letters.
Suppose v has occurrences of only one letter, say, a. As ℓ(vxy) ≤ n, the string
vxy does not contain any c. Now, in the string uv²xy²z, there are more a's than c's.
The cases that v consists of only b's or of only c's are similar.
On the other hand, suppose v has occurrences of two letters, say, a and b. Then in
v all a's must precede all b's. However, in v², a b precedes an a. Thus, in the string
uv²xy²z, a b precedes an a. The cases that v has occurrences of b's and c's, or of c's
and a's, are similar.
That is, in any case, uv²xy²z ∉ L, contradicting the pumping lemma. Therefore,
our assumption that L is a context-free language is wrong.
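The contradiction can also be checked concretely. In the Python sketch below, w = aᵏbᵏcᵏ is split as uvxyz in one particular way of our own choosing (v and y lying among the a's, with ℓ(vxy) ≤ n); pumping up or pumping out then visibly leaves the language.

```python
def in_L(w):
    """Membership in {a^n b^n c^n : n in N}."""
    n = len(w) // 3
    return w == "a" * n + "b" * n + "c" * n

def pump(u, v, x, y, z, j):
    """The pumped string u v^j x y^j z."""
    return u + v * j + x + y * j + z

n = 5                                   # playing the pumping-lemma constant
w = "a" * (n + 1) + "b" * (n + 1) + "c" * (n + 1)
assert in_L(w)

# one decomposition with len(v + x + y) <= n: v and y among the a's
u, v, x, y, z = "a", "aa", "a", "aa", w[6:]
assert w == u + v + x + y + z

print(in_L(pump(u, v, x, y, z, 1)))     # True: j = 1 gives back w
print(in_L(pump(u, v, x, y, z, 2)))     # False: extra a's, same b's and c's
print(in_L(pump(u, v, x, y, z, 0)))     # False: fewer a's, same b's and c's
```

Of course, the proof above must rule out every decomposition, not just this one; the sketch only illustrates what goes wrong in each case.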
Exercise 6.8. Formulate the application of Pumping Lemma for CFLs as a game with
the demon.
aⁱ, bⁱ, cⁱ, dⁱ, aⁱbʲ, bⁱcʲ, or cⁱdʲ for i, j ≥ 1.
Suppose v = aⁱ. As ℓ(vxy) ≤ n, the string vxy has no occurrence of c. Then
uxz = uv⁰xy⁰z has fewer occurrences of a's than of c's. Similar are the cases when v is
of the form bⁱ, cⁱ, or dⁱ. In all these cases, uv⁰xy⁰z ∉ L.
On the other hand, suppose v = aⁱbʲ. Then in uv²xy²z, a b precedes an
a. The cases of v being of the form bⁱcʲ or cⁱdʲ are similar. In all these cases,
uv²xy²z ∉ L.
In either case, the pumping lemma is violated. Therefore, L is not context-free.
6.19. Following the proofs of Lemma 6.6 and of Theorem 6.2, give another proof of
the pumping lemma for regular languages. [Hint: Use regular grammars instead of
DFAs.]
You may be wondering why this delay in discussing the closure properties. Indeed,
we have gone topsy-turvy. For regular languages, we discussed all the machinery
almost at the same time, whereas for CFLs, we have relied heavily upon the context-free
grammars. You will discover the reason shortly; some closure properties can be seen
easily via PDAs! Moreover, waiting this far will prove fruitful in discovering some
more noncontext-free languages.
You have seen that union, intersection, complement, concatenation, and Kleene
star of regular languages are regular. Do they also behave the same way for context-
free languages?
R∗ = R ∪ {S → ε, S → SS}.
Is it clear why we are doing this? In G∗ you can now derive ε. Also, you can derive
from the first S in SS any string that can be derived in G, and from the second S
another such string, so that their concatenation can be derived from S. Inductively,
that must give you L∗ = ∪n∈N Lⁿ.
Let us try to prove that G∗ indeed generates L∗. Trivially, L⁰ = {ε} ⊆ L(G∗) due
to the production S → ε. Suppose that Lᵐ ⊆ L(G∗) for 0 ≤ m ≤ k. Let w ∈ Lᵏ⁺¹.
Then w = uv, where u ∈ Lᵏ and v ∈ L. That means we have derivations S ⇒G∗ u
and S ⇒G v. However, all the productions of G are also in G∗. Thus, S ⇒G∗ v.
A derivation of w in G∗ is
S ⇒ SS ⇒ uS ⇒ uv = w.
Exercise 6.10. Show that L(G∪) = L₁ ∪ L₂ and L(Gc) = L₁L₂ to complete the proof
of Theorem 6.3.
S → AB, A → ε | aAb, B → ε | cB
Exercise 6.11. Show that for a regular language L₁ and a context-free language L₂,
the intersection L₁ ∩ L₂ may not be a regular language. [Hint: What was our first
example of a nonregular language?]
Exercise 6.12. Show that L(P∩) = L₁ ∩ L₂, as claimed in the proof of Theorem 6.5.
Analogous to the regular languages, you can use the closure properties along with
the pumping lemma to prove that certain languages are or are not context-free. See
the following examples.
Example 6.11. Show that L = {ww : w ∈ {a, b}∗} is not context-free, but its complement is.
6.23. Classify each of the following languages into (i) regular, (ii) context-free but
not regular, or (iii) not context-free:
(a) {aⁿ : n = 2m for some m ∈ N}.
Abstractly speaking, a deterministic pushdown automaton has the same relation to
a PDA as a DFA has to an NFA. In a PDA, the machine has the capability to guess
a course of action based on its current state, the currently scanned input symbol, and
the stack symbol on top of the stack. This guess is nondeterministic in the
sense that there may be no possible course of action, or there may be many possible
actions from which it chooses one, the choice being unspecified. In the case of determinism,
there is either no possible course of action or a unique one.
This was so as far as DFAs were concerned.
To make matters simple for a deterministic pushdown automaton in dealing with
its stack, we assume that there is always something on the stack: a symbol that behaves
as a bottom end marker. Similarly, we will use a marker for the end of
the input string; it will signal when the input is over. Further, determinism demands
that in each possible situation there is either no way to proceed, in which case the
machine halts abruptly, or a unique action to be taken. The end marker on the
input string will allow the use of ε-transitions due to the extra information on the
stack. Further, we will use acceptance by final states only.
Formally, a deterministic pushdown automaton, or DPDA for short, is an eight-tuple
P = (Q, Σ, Γ, ⊥, $, δ, s, F), where Q is a finite set of states, Σ the input alphabet, Γ the stack alphabet, ⊥ ∉ Γ the bottom marker, $ ∉ Σ the end marker, δ the transition function, s ∈ Q the initial state, and F ⊆ Q the set of final states.
The first restriction on the transitions keeps determinism intact, even in the presence
of ε-transitions. The second condition says that ⊥ is always there at the
bottom of the stack. The machine may pop it off momentarily, but it must bring it
back immediately. We do not allow this symbol to be put on the stack anywhere else.
As earlier, the transition δ(q, σ, ε) = (r, α) will mean that upon reading the input
σ from state q, the machine changes its state from q to r, pushing the string α onto
the stack so that α read from left to right matches the stack read from
top to bottom, irrespective of what else is on the stack. Similarly, the transition
δ(q, σ, A) = (r, ε) will simply pop off the stack symbol A, which is currently at the
top of the stack, with the necessary change of state from q to r upon reading the input
symbol σ.
An instantaneous description or a configuration of P is any element of Q × Σ∗
× Γ∗, which specifies the current state of P, the input string yet to be read (the
string to the right of the scanned square and before $), and the string on the stack
read from top to bottom (above ⊥). The one-step operation of P is described by the
yield in one step relation ⊢¹ among the various possible configurations of P. That is, for
q, r ∈ Q, σ ∈ Σ, v ∈ Σ∗, A ∈ Γ, α, β ∈ (Γ ∪ {⊥})∗,

(q, σv, Aβ) ⊢¹ (r, v, αβ) iff δ(q, σ, A) = (r, α).

As before, we take the transitive and reflexive closure of ⊢¹ as the yield relation, and
denote it by ⊢∗. This means that C0 ⊢∗ Cn iff either n = 0 or there are configurations
C1, C2, . . . , Cn−1 such that C0 ⊢¹ C1, C1 ⊢¹ C2, . . . , Cn−1 ⊢¹ Cn. If there is no
confusion, we will simply use ⊢ for ⊢¹, ⊢∗, and also for the yield in n steps (⊢ⁿ).
Similarly, whenever there are many DPDAs involved in a context, we will use a
subscript with ⊢ to denote the computation of that machine. For example, the yield
relation of the machine P will be denoted by ⊢P if we talk about other machines.
We say that a string u ∈ Σ∗ is accepted by P iff (s, u, ⊥) ⊢∗ (q, ε, β) for some
q ∈ F and for some β ∈ Γ∗. That is, u is accepted by the DPDA P if P, upon
starting with its initial configuration (s, u, ⊥) (i.e., being in state s, with input u,
and having only ⊥ on the stack), is driven to a final state q after reading u completely
along with $, no matter what is left on the stack. Finally, the language of P is
the set of all strings accepted by it. That is,

L(P) = {u ∈ Σ∗ : (s, u, ⊥) ⊢∗ (q, ε, β) for some q ∈ F and for some β ∈ Γ∗}.
Example 6.12. What is L(P) if P = ({s, q, r}, {a, b}, {A, B}, ⊥, $, δ, s, {r}), with
δ(s, a, A) = (q, A), δ(q, b, A) = (s, A), δ(r, a, A) = (s, A)?
Solution. The initial configuration of P with an input u ∈ {a, b}∗ is (s, u, ⊥). We
see that no transition is applicable to such a configuration, whatever the first symbol
of u may be. Further, the initial state s is not a final state. Thus, even the empty string
ε is not accepted. The DPDA does not accept any string; thus, L(P) = ∅.
Example 6.13. What is L(P) if P = ({s, q, r}, {a, b}, {A, B}, ⊥, $, δ, s, {s}),
with δ(s, a, A) = (q, A), δ(q, b, A) = (s, A), δ(r, a, A) = (s, A)?
Solution. The only change we have made to the DPDA in the last example is that s is
now the final state instead of r. Then the initial configuration (s, ε, ⊥) becomes a final
configuration, and without any nontrivial computation the empty string is accepted.
If instead of ε we have some other input, then the initial configuration is either
(s, av, ⊥) or (s, bv, ⊥), for some string v ∈ {a, b}∗. As no transition is applicable,
the string is not accepted. Hence L(P) = {ε}.
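The definitions above, and the last two examples, can be checked with a few lines of code. The following is a minimal sketch, not the book's notation: transitions are a Python dict, '#' stands for ⊥, ε-moves are keyed by None, and the end marker is omitted since these particular machines never consult it.

```python
def dpda_accepts(delta, start, finals, w, bottom='#'):
    """Acceptance by final state. delta maps (state, symbol-or-None, top)
    to (state, pushed); None means an eps-move; the leftmost symbol of the
    pushed string becomes the new top, as in the text."""
    state, i, stack = start, 0, bottom
    for _ in range(4 * len(w) + 20):        # crude guard against eps-loops
        if i == len(w) and state in finals:
            return True
        top = stack[0] if stack else None
        if i < len(w) and (state, w[i], top) in delta:
            state, pushed = delta[(state, w[i], top)]
            stack, i = pushed + stack[1:], i + 1
        elif (state, None, top) in delta:
            state, pushed = delta[(state, None, top)]
            stack = pushed + stack[1:]
        else:
            return False                    # the machine halts abruptly
    return False

# The machine of Examples 6.12 and 6.13: every transition needs top A,
# but the stack only ever holds the bottom marker, so nothing applies.
delta = {('s', 'a', 'A'): ('q', 'A'),
         ('q', 'b', 'A'): ('s', 'A'),
         ('r', 'a', 'A'): ('s', 'A')}
assert not any(dpda_accepts(delta, 's', {'r'}, w)
               for w in ['', 'a', 'ab', 'aba'])   # Example 6.12: empty language
assert dpda_accepts(delta, 's', {'s'}, '')        # Example 6.13: only eps
assert not dpda_accepts(delta, 's', {'s'}, 'a')
```

The same helper is reused implicitly whenever we claim a DPDA accepts or rejects a given string.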
Example 6.14. Construct a DPDA for accepting the language (ab ∪ aba)∗.
Solution. As the given language is regular, operating with the stack is redundant.
The problem is thus essentially to construct a DFA and then modify its transitions to
take care of the redundant stack. Now, how do we construct a DFA accepting the
language?
First, ε is accepted, so make the initial state s also a final state. Let the machine
go to state q after reading an a, and then from q to r after reading one b. Now, r must
be a final state. From r, we let the machine go to state p after reading the next a. We
also make p a final state so that aba is accepted. If the machine, upon reading another
a, goes back to state q, then it will accept the string ab thereafter. But one more
ab would not then be accepted. So we add another edge from p to r with label b.
Check that the DFA in Fig. 6.7 accepts (ab ∪ aba)∗. [Hint: ab(aab)^n = (aba)^n ab.]
Fig. 6.7. A DFA accepting (ab ∪ aba)∗: states s, q, r, p, with edges s →a q, q →b r, r →a p, p →a q, and p →b r.
Once you have checked the claim that the DFA in Fig. 6.7 accepts (ab ∪ aba)∗,
it is fairly easy to construct a DPDA accepting the same language. Just
use a stack that is not touched at all; that is, the DPDA pops ⊥ off and immediately
puts it back, with every transition. We keep an unused stack symbol, A,
just to make Γ nonempty. So, take P = ({s, q, r, p}, {a, b}, {A}, ⊥, $, δ, s, {s, r, p}),
with δ(s, a, ⊥) = (q, ⊥), δ(s, $, ⊥) = (s, ⊥), δ(q, b, ⊥) = (r, ⊥), δ(r, a, ⊥) = (p, ⊥),
δ(r, $, ⊥) = (r, ⊥), δ(p, $, ⊥) = (p, ⊥), δ(p, a, ⊥) = (q, ⊥), δ(p, b, ⊥) = (r, ⊥).
Exercise 6.13. Check on some inputs that the DPDA of Example 6.14 accepts the
language (ab ∪ aba)∗.
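Exercise 6.13 can also be carried out mechanically. Below is a quick sketch that transcribes the DFA of Fig. 6.7 as described in the solution and compares it against the regular expression on all short strings:

```python
import itertools
import re

# DFA of Fig. 6.7: states s, q, r, p; s initial; s, r, p final.
DELTA = {('s', 'a'): 'q', ('q', 'b'): 'r', ('r', 'a'): 'p',
         ('p', 'a'): 'q', ('p', 'b'): 'r'}

def dfa_accepts(w):
    state = 's'
    for c in w:
        if (state, c) not in DELTA:
            return False                 # undefined transition: reject
        state = DELTA[(state, c)]
    return state in {'s', 'r', 'p'}

# Compare with (ab U aba)* on all strings of length <= 6.
pat = re.compile(r'(?:ab|aba)*\Z')
for n in range(7):
    for w in map(''.join, itertools.product('ab', repeat=n)):
        assert dfa_accepts(w) == bool(pat.match(w))
```

The exhaustive comparison up to length 6 is, of course, only evidence, not a proof; the hint ab(aab)^n = (aba)^n ab is what carries the general argument.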
6.6 Deterministic Pushdown Automata 189
Notice that our DPDAs accept strings by final states. By suitably modifying restriction
(2) on the transitions, it can be shown that acceptance by final states and acceptance by
empty stack are equivalent, though not as the definitions stand!
A DCFL, or deterministic context-free language, is a language accepted by
a DPDA. You have now learnt that each DFA can be simulated by a DPDA. And
this is achieved by replacing each transition of the DFA of the form δ(q, σ) = r
by δ(q, σ, ⊥) = (r, ⊥), and then adding the transitions δ(p, $, ⊥) = (p, ⊥) for each
final state p. This shows that each regular language is a deterministic context-free
language. Is this inclusion proper?
Yes, it is. Consider the nonregular language {a^m b^m : m ∈ N}; we construct a DPDA for it.
We keep track of the a's and b's by pushing one symbol A for each a
and popping one A for each b. Moreover, we do not define the transitions for the other
cases, when the input is not of the required form, so that the machine halts abruptly
in those cases.
Thus we take P = ({s, q, r}, {a, b}, {A}, ⊥, $, δ, s, {r}), where δ(s, a, ε) = (s, A),
δ(s, b, A) = (q, ε), δ(q, b, A) = (q, ε), δ(q, $, ⊥) = (r, ⊥), δ(s, $, ⊥) = (r, ⊥). Upon
reading one b, P can never read an a again. The last two transitions force the stack to hold
only ⊥ upon acceptance of a string. As the stack becomes free of A's only
after the whole input is read, P accepts no string other than one of the form
a^m b^m. Verify this.
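The verification can be automated. A sketch, with # standing for ⊥ and a literal $ appended as the end marker; a key of the form (state, symbol, None) encodes a transition such as δ(s, a, ε) = (s, A) that pushes without consulting the top:

```python
# Transitions of the DPDA above, transcribed as a dict.
DELTA = {('s', 'a', None): ('s', 'A'),   # push one A per a, whatever the top
         ('s', 'b', 'A'): ('q', ''),     # pop one A per b
         ('q', 'b', 'A'): ('q', ''),
         ('q', '$', '#'): ('r', '#'),    # accept only with a bare bottom marker
         ('s', '$', '#'): ('r', '#')}

def accepts(w):
    state, stack = 's', ['#']            # top of the stack is stack[-1]
    for c in w + '$':
        if (state, c, None) in DELTA:    # transition ignoring the stack top
            state, push = DELTA[(state, c, None)]
            stack.append(push)
        elif stack and (state, c, stack[-1]) in DELTA:
            state, push = DELTA[(state, c, stack[-1])]
            stack.pop()
            stack.extend(push)           # '' just pops; '#' restores the marker
        else:
            return False                 # the machine halts abruptly
    return state == 'r'

assert all(accepts('a' * m + 'b' * m) for m in range(6))
assert not any(accepts(w) for w in ['a', 'b', 'ba', 'aab', 'abb', 'abab'])
```

Note how the transitions on $ can fire only when the input proper is exhausted, mirroring the role of the end marker in the formal definition.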
We know that {a, b}∗ − {ww : w ∈ {a, b}∗} is a CFL, but its complement L = {ww : w ∈ {a, b}∗} is
not a CFL. Now, the class of DCFLs is closed under complementation (Problem 6.88). So, had
the language {a, b}∗ − L been a DCFL, its complement L would also have been a DCFL, and
thus a CFL. Hence, the context-free language {a, b}∗ − L is not a DCFL.
Thus, the class of regular languages (over an alphabet) is a proper subclass of that
of DCFLs and the class of DCFLs is a proper subclass of CFLs.
Because of the presence of determinism and context-freeness, programming
languages are designed as DCFLs. It is easier to construct parsers (and thus compilers) for
DCFLs than for more general CFLs. It then becomes mandatory to know what kind of
grammars generate the DCFLs.
The grammars that correspond to DCFLs are the so-called LR-grammars, which
are, naturally, restrictions of CFGs. However, the restriction is neither natural nor easy
conceptually. We will not deal with LR-grammars in this book. We also mention
that there are two more operations that preserve DCFLs but not CFLs, in general:
the operations min and max of Problem 6.48.
For details regarding the closure properties and LR-grammars, you must consult the
references quoted in the summary to this chapter.
6.27. Is {a^m b^n : m = n or m = n + 2} a DCFL?
6.29. Give an intuitive explanation why {wcw^R : w ∈ {a, b}∗} is a DCFL while
{ww^R : w ∈ {a, b}∗} is not.
6.31. Give a precise algorithm to construct a DPDA from a given DFA for accepting
the same language.
6.32. Show that if a language is accepted by a DFA with n states, then it is accepted
by a DPDA with n states and just one stack symbol.
6.33. Show that if a language is accepted by a DFA with n states, then it is accepted
by a DPDA with two states and n stack symbols. Can you further minimize on the
number of states and/or the number of stack symbols of the DPDA?
6.35. Show that if L is regular and L′ is a DCFL, then L ∪ L′ and L ∩ L′ are DCFLs.
6.37. Show that each DCFL is accepted by a DPDA that never adds more than one
symbol to its stack at a time, that is, each transition of such a DPDA looks like
δ(p, a, α) = (q, β), where β ∈ {ε, ⊥, B, B⊥}.
6.38. Draw a Venn diagram illustrating the containment relations between the four
classes of languages: regular, deterministic context-free, context-free, and
complements of context-free. Supply a language for each nonempty region in your
diagram.
6.39. Construct formally a PDA with ⊥ and $ so that each DPDA is now a PDA.
This chapter started with the context-free recognition devices, the PDAs. We have
shown equivalence of PDAs with context-free grammars. We have proved a pumping
lemma for CFLs and used it to demonstrate some languages that are not context-free.
The closure of context-free languages under union, concatenation, Kleene star, and
intersection with regular languages has also been proved. We claimed that,
unlike DFAs and NFAs, the deterministic analogues of PDAs, called DPDAs, are not
equivalent to the PDAs.
The pumping lemma for context-free languages was first given in [8] followed
by a stronger form by Ogden [97], now called Ogden’s lemma. I ask you to prove
192 6 Structure of CFLs
Ogden’s lemma in one of the problems below. Though PDAs were introduced by
Oettinger [96, 115], equivalence of PDAs and CFGs was shown in [16, 30, 115].
Closure properties of CFLs are from [8, 41, 42, 114].
The DPDAs exist not only due to our curiosity, but also due to an important practical
concern: parsing in programming languages. Early appearances of DPDAs can be
traced back to [31, 39, 49, 115]. The DPDAs are equivalent to the LR(k) grammars,
sometimes simply called LR-grammars, which first appeared in [68]. You can also refer to [58]
for an exposition of these grammars. For the use of DPDAs and LR-grammars in
parsing, you may like to read [68] and some explanatory texts such as [4, 59, 77].
In the following exercises, I ask you to prove Parikh's theorem and the Chomsky–
Schützenberger theorem. This is a bit too much to expect of you without external
help if you are learning the subject for the first time. However, you can refer to [16, 18]
for the Chomsky–Schützenberger theorem and [45, 51, 73, 100] for Parikh's theorem.
6.41. Construct a CFG and also a PDA for {w ∈ {a, b}∗ : #a (w) ≤ 2 · #b (w)}.
6.43. Show that each CFL is accepted by some PDA with a single state if acceptance
is by empty stack.
6.45. Prove the following stronger versions of the Pumping Lemma for CFLs and
then say in what sense they are stronger:
(a) Let L be a CFL over an alphabet Σ containing at least one nonempty string. There
is n ≥ 1 such that if w ∈ L is of length at least n, then w can be rewritten as w =
uvxyz such that v ≠ ε, y ≠ ε, ℓ(vxy) ≤ n, and for each k ∈ N, u v^k x y^k z ∈ L.
(b) Ogden's Lemma: Let L be a CFL over an alphabet Σ. Let L contain at least one
nonempty string. Then there exists an n ≥ 1 such that the following is satisfied:
Let w ∈ L with ℓ(w) > n. Mark any n or more occurrences of symbols in w
arbitrarily. Then there exist strings u, v, x, y, z ∈ Σ∗ such that
1. w can be rewritten as w = uvxyz,
2. the string vxy contains n or fewer marked symbols,
3. each of the strings uz, vy, x contains at least one marked symbol, and
4. for all k ∈ N, the string u v^k x y^k z ∈ L.
6.46. Use Ogden's Lemma to simplify the proof that {ww : w ∈ {a, b}∗} is not a CFL.
6.47. Use Ogden's Lemma to show that the following languages are not CFLs:
(a) {a^m b^k a^p : k = max{m, p}; m, k, p ∈ N}.
(b) {a^m b^m c^p : p ≠ m; m, p ∈ N}.
6.7 Summary and Additional Problems 193
6.48. Let L, L′ be CFLs. Show that the following languages are not necessarily
CFLs:
(a) min(L) = {w ∈ L : no proper prefix of w is in L}.
(b) max(L) = {w ∈ L : wx ∉ L for every x ≠ ε}.
(c) half(L) = {w : wx ∈ L and ℓ(w) = ℓ(x) for some string x}.
(d) alt(L, L′) = {σ1 τ1 σ2 τ2 · · · σn τn : σ1 · · · σn ∈ L, τ1 · · · τn ∈ L′}.
(e) shuffle(L, L′) = {v1 w1 v2 w2 · · · vn wn : v1 v2 · · · vn ∈ L, w1 w2 · · · wn ∈ L′}.
6.49. Show that shuffle(L, L′) is a CFL if L is a CFL and L′ is regular. What happens
to shuffle(L, L′) if L is regular and L′ is a CFL?
6.50. For a string x, define perm(x) as any string obtained by reordering the occurrences
of symbols in x. For a language L, define perm(L) = {perm(x) : x ∈ L}. For
example, perm({a^n b^n : n ∈ N}) is the set of all strings having equally many a's and b's.
(a) Show that if L is a regular language over {a, b}, then perm(L) is context-free.
(b) Give an example of a regular language L such that perm(L) is not context-free.
6.51. Let L = {a/b : a < b and a, b are positive integers written in decimal
notation}. Is L a context-free language?
6.52. A CFG is in two-standard form if each of its productions is in one of the forms
A → a, A → aB, or A → aBC.
(a) Convert the CFG with productions S → aSA, A → bABC, B → b, C → aBC
into two-standard form.
(b) Prove: If ε ∉ L(G) for a CFG G, then there is an equivalent two-standard form
for G.
6.53. Show that for each CFL, there exists an accepting PDA in which the number of
symbols in the stack never exceeds the length of the input string by more than one.
6.54. Show that corresponding to each PDA P, there is an equivalent PDA P′ such
that each transition in P′ is of the form (p, σ, α, q, β), where ℓ(α) + ℓ(β) ≤ 1.
6.56. Show that the family of CFLs is closed under regular difference, that is, if L is
a CFL and L′ is regular, then L − L′ is a CFL. Does it also follow that L′ − L is a CFL?
6.57. Show that the family of unambiguous CFLs is neither closed under union nor
under intersection.
6.58. Give a noncontext-free language that satisfies the Pumping Lemma for CFLs.
6.59. Show that if L is a CFL and R is a regular language, over an alphabet Σ, then
the quotient L/R = {w ∈ Σ ∗ : wx ∈ L for some x ∈ R} is a CFL.
(b) Suppose G has no useless nonterminals. If L(G) is regular, is it true that G does
not have any self-embedding nonterminal?
(c) Give an algorithm for determining whether a given nonterminal of a CFG is self-
embedding.
6.61. Let Σ be an alphabet. A substitution on Σ is a map s : Σ → 2^(Γ∗); that is, for
each symbol σ ∈ Σ, we have s(σ) a language over some alphabet Γ. Extend s to a
map s : Σ∗ → 2^(Γ∗) by defining s(σ1 σ2 · · · σn) = s(σ1)s(σ2) · · · s(σn). Further, if L
is a language over Σ, then define s(L) = ∪w∈L s(w), extending s to a map from the
power set of Σ∗ to the power set of Γ∗. Prove the
Substitution Theorem: Let L be a CFL over Σ. Let s be a substitution on Σ such that
s(σ) is a CFL for each σ ∈ Σ. Then s(L) is a CFL.
6.62. Use the Substitution Theorem to show that the class of CFLs (over an
alphabet Σ) is closed under union, concatenation, Kleene star, positive closure
(+), homomorphism, and inverse homomorphism.
6.64. Let PARn denote the language of properly nested parentheses of n distinct
types, where the n pairs of parentheses are [1 , ]1 , [2 , ]2 , . . . , [n , ]n . Show that PARn
is generated by the CFG with productions S → ε|SS| [1 S ]1 |[2 S ]2 | · · · | [n S ]n .
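Membership in PARn is the usual balanced-bracket check, which is easy to state in code; the encoding of a bracket as a (side, type) pair below is ours, not the book's:

```python
def nested(s):
    """Is s, a sequence of ('[', i) and (']', i) pairs, properly nested?"""
    stack = []
    for side, i in s:
        if side == '[':
            stack.append(i)              # remember which type was opened
        elif not stack or stack.pop() != i:
            return False                 # close with nothing open, or wrong type
    return not stack                     # everything opened must be closed

assert nested([])                                            # eps, via S -> eps
assert nested([('[', 1), ('[', 2), (']', 2), (']', 1)])      # [1 [2 ]2 ]1
assert nested([('[', 1), (']', 1), ('[', 2), (']', 2)])      # via S -> SS
assert not nested([('[', 1), ('[', 2), (']', 1), (']', 2)])  # crossing pairs
```

The stack here plays exactly the role that the single nesting nonterminal S plays in the grammar.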
6.67. Parikh's theorem states that every CFL is letter-equivalent to a regular language.
To understand and prove this, we need some definitions.
Let G = (N, Σ, R, S) be a CFG in CNF. Let T, T1, T2, . . . denote parse trees of
G with a nonterminal at the root, nonterminals as labels of the internal nodes, and
terminals or nonterminals at the leaves. Recall that for a parse tree T, root(T) is the
nonterminal at the root of T, yield(T) is the string of terminals at the leaves of T,
read left to right, and depth(T) is the maximum number of edges in a path of T
connecting a leaf up to the root. Let N(T) denote the set of nonterminals occurring
in T.
A parse tree T is called a pump if T contains at least two nodes, all leaves of
T are labeled with terminal symbols except one, and the nonterminal label of that
leaf is the same as the label of root(T); that is, yield(T) = u root(T) v, for some
u, v ∈ Σ∗.
For parse trees T1, T2, define a binary relation ≺ by T1 ≺ T2 iff T2 can be
obtained from T1 by splitting T1 at a node labeled with some nonterminal A and
inserting a pump whose root is labeled A. Though the relation ≺ is not a partial
order (why?), it has the property that if T1 ≺ T2, then T1 has fewer nodes than
T2. A pump T is called a basic pump if T is ≺-minimal among all pumps. That
means a basic pump can never contain another pump that can be cut out of it. It thus
follows that if T′ ≺ T and T is a basic pump, then T′ can only be the trivial one-node
parse tree labeled with the nonterminal root(T).
A parse tree T1 is dominated by a parse tree T2 , if T2 can be obtained from T1 by
inserting a finite number of basic pumps into T1 , where these basic pumps use only
(not necessarily all) nonterminals that occur in T1 . That is, in T1 , we may choose
a nonterminal and insert a basic pump at that occurrence of the nonterminal with
the nonterminal as the root of the basic pump. Moreover, the basic pump cannot use
any nonterminal that has not already occurred in T1 . This process is repeated a finite
number of times to obtain T2 .
Suppose Σ = {σ1, σ2, . . . , σm}. The Parikh map is the function φ : Σ∗ → N^m
defined by φ(w) = (#σ1(w), #σ2(w), . . . , #σm(w)). The Parikh map records the numbers
of occurrences of the symbols of Σ in the string w. For a language L over Σ, we define
the commutative image of L as the image of L under φ, which equals φ(L) = {φ(w) :
w ∈ L}. Note that the commutative image of L is a subset of N^m. For a string α of
terminals and/or nonterminals, denote by x_α the string obtained from α by deleting
all occurrences of nonterminals. Then define φ(α) = φ(x_α).
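The Parikh map and the commutative image are straightforward to compute. A small sketch, which also illustrates the letter-equivalence asserted by Parikh's theorem on the CFL {a^n b^n} and the regular language (ab)∗ (truncated here to finitely many strings):

```python
from collections import Counter

def parikh(w, sigma):
    """phi(w) = (#sigma1(w), ..., #sigmam(w)) for the ordered alphabet sigma."""
    counts = Counter(w)
    return tuple(counts[c] for c in sigma)

def commutative_image(L, sigma):
    return {parikh(w, sigma) for w in L}

assert parikh('abba', 'ab') == (2, 2)
# {a^n b^n : n < 5} and {(ab)^n : n < 5} have the same commutative image:
anbn = {'a' * n + 'b' * n for n in range(5)}
abstar = {'ab' * n for n in range(5)}
assert commutative_image(anbn, 'ab') == commutative_image(abstar, 'ab')
```

Letter-equivalence of two languages means exactly that their commutative images coincide, as the last assertion demonstrates for these finite fragments.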
6.72. Using Parikh’s theorem, show that the following languages are not context-free:
(a) {a^m b^n : m > n or (m is prime and m ≤ n)}.
(b) {a, b}∗ − {a^m b^n : n = m^2, m ∈ N}.
6.74. Let K ⊆ N^k be a semilinear set. Prove that there is a DFA M such that K =
φ(L(M)).
6.76. Show that the family of CFLs is closed under homomorphic images and ho-
momorphic preimages. We also say that CFLs are closed under homomorphisms and
inverse homomorphisms. That is, if h is a homomorphism, and L is a CFL, then
show that h(L) and h −1 (L) are also CFLs.
6.77. (Ginsburg and Rice) Prove: any CFL over a one-symbol alphabet is regular.
6.79. Recall that for a string w, perm(w) is a permuted string, where the occurrences
of symbols in w are reordered. For a language L, perm(L) is the set of all permuted
strings of strings in L. In each case below, decide whether perm(L) is regular, context-free,
or neither:
(a) L = (ab)∗.
(b) L = a ∗ ∪ b ∗ .
(c) L = (abc)∗.
(d) L = (ab ∪ bc)∗ .
6.80. Let L1 be a CFL and L2 be a regular language. Which of the following languages
are CFLs and which are not?
(a) cycle(L1) = {wx : xw ∈ L1}.
(b) perm(L1) = {w : for some x ∈ L1, for all σ ∈ Σ, #σ(x) = #σ(w)}.
(c) shuffle(L1, L2) = {w1 v1 w2 v2 · · · wn vn : w1 w2 · · · wn ∈ L1, v1 v2 · · · vn ∈ L2}.
6.81. Let L be any language over {a, b, c}. Define L 1 = {ww : w ∈ L} and L 2 =
{w : ww ∈ L}. Which one of the following is true, and which three are false? Give
reasons.
(a) If L is regular then so is L 1 .
(b) If L is regular then so is L 2 .
(c) If L is context-free then so is L 1 .
(d) If L is context-free then so is L 2 .
6.83. Let b(n) denote the binary representation of n, where leading zeros are omitted,
for example, b(5) = 101 and b(12) = 1100. Show that
(a) {b(n)2b(n + 1) : n ≥ 1} is not a CFL.
(b) {(b(n)) R 2b(n + 1) : n ≥ 1} is a CFL.
6.84. Show that there is no DPDA that accepts {ww R : w ∈ {a, b}∗}.
6.86. Show that each DCFL is generated by some unambiguous CFG. Conclude that
no DCFL is inherently ambiguous.
6.87. Here is an outline of the proof that complements of DCFLs are DCFLs. Let
M = (Q, Σ, Γ, δ, ⊥, $, s, F) be a DPDA. Construct the DPDA M′ from M by M′ =
(Q ∪ Q′ ∪ {h, h̄}, Σ, Γ, δ′, ⊥, $, s, F′), where h, h̄ ∉ Q ∪ Q′, Q′ = {q′ : q ∈ Q},
F′ = {q′ : q ∈ F}, and δ is extended to δ′ by performing the following in that
sequence:
1. For each transition of the form δ(p, σ, A) = (q, β), include δ′(p′, σ, A) = (q′, β).
2. Replace each transition of the form δ(p, $, A) = (q, β) with δ′(p, $, A) = (q′, β).
3. Replace each transition of the form δ′(p′, ε, A) = (q′, β) with δ′(p′, ε, A) =
(p′, A), for p ∈ F.
4. Include the transitions δ′(h, σ, A) = (h, A), δ′(h, $, A) = (h̄, A), and δ′(h̄, ε, A) =
(h̄, A), for each σ ∈ Σ, A ∈ Γ.
5. Replace each transition of the form δ(p, ε, A) = (q, β) with δ′(p, ε, A) = (h, A),
if p ∈ Q, or with δ′(p′, ε, A) = (h̄, A), if p′ ∈ Q′.
Prove the following:
(a) After the modifications (1–3), L(M) = L(M′).
(b) Let w be any input. Because of determinism, there exists a unique infinite sequence
of configurations of the new machine obtained after the modifications
(1–4) on input w. Let γi denote the stack contents of the modified machine at
time i ∈ N. Then there exists an infinite sequence of times i0 < i1 < i2 < · · ·
such that, for each k and each i ≥ ik, ℓ(γik) ≤ ℓ(γi).
(c) In the modified machine, after (1–4) have been applied, there exists a transition
δ′(p, ε, A) = (q, β) that has been applied infinitely often, say at times
j0 < j1 < j2 < · · ·.
(d) With M′ as the modified machine where all of (1–5) have been applied, we have
L(M) = L(M′).
(e) Let M′′ be the DPDA obtained from M′ by taking F′′ = {h̄} and keeping all else
as in M′. Then L(M′′) = Σ∗ − L(M′) = Σ∗ − L(M).
6.88. Show that the family of DCFLs is closed under regular difference, and hence,
under complementation.
6.89. Show that the family of DCFLs is closed under inverse homomorphism.
6.91. Using Problem 6.90, show that the class of DCFLs is not closed under reversal.
6.92. If the DPDAs are required to accept by empty stack (not by final states), then
their recognizing capability is limited. A language L has the prefix property iff there are
no two distinct strings x, y ∈ L such that x is a prefix of y. Prove that
(a) If a language L is accepted by some DPDA by empty stack, then L has prefix
property.
(b) If a language L is accepted by some DPDA by empty stack, then L is accepted
by some DPDA by final states.
(c) If L has the prefix property and L is accepted by a DPDA by final states, then
there is some DPDA that accepts L by empty stack.
(d) There is a regular language that is not accepted by any DPDA by empty stack.
6.93. A simplified DPDA is one that does not use ε-transitions and has neither ⊥ nor
$. Such a DPDA is a PDA whose transition relation is a partial function. Define
formally a simplified DPDA. Show that any language accepted by a simplified DPDA
is a DCFL. Is it true that each DCFL is accepted by some simplified DPDA?
6.94. A restricted PDA is one in which each transition is of the form (p, a, A, q, α),
where ℓ(α) ≤ 2. That means, in any single move, a restricted PDA can
increase the height of its stack by at most one. Show that corresponding to each
PDA, there is an equivalent restricted PDA.
6.95. A generalized PDA is one in which each transition is of the form (p, a, α, q, β),
where α ∈ Γ∗ is any string of stack symbols, instead of a single symbol. That means,
in any single move, a generalized PDA can read and pop off a whole string from
the stack. Show that corresponding to each generalized PDA, there is an equivalent
restricted PDA.
7 Computably Enumerable
Languages
7.1 Introduction
In the last chapter, you tried but failed to construct a context-free grammar for
the language {a^n b^n c^n : n ∈ N}. Later you could also prove that no such grammar
can generate this language. Obviously, the restrictions on the productions
in a context-free grammar have a role to play in this regard. The productions in
a context-free grammar look like A → u, where A is a nonterminal and u can be any
string of terminals and nonterminals.
What happens if we generalize the productions, for example, by allowing any
string of terminals and nonterminals on the left side of a production? In such a generalization,
how do we apply a production? Suppose we have a production of the form
aAbBbc → aabbcc and that we have already generated the string abcaAbBbcab.
By replacing the substring aAbBbc with aabbcc, we obtain the string abcaabbccab.
But neither A is replaced by a nor B by b in isolation. This is
not really hopeless. Let us give it a try and see what generality is achieved.
Unrestricted grammars are simply called grammars, without any adjective; they are
also known as rewriting systems.
As usual, the productions can be given as ordered pairs of strings (u, v) instead of
using the symbol →. But you are now mature enough to express such facts formally!
A. Singh, Elements of Computation Theory, Texts in Computer Science, 201
c Springer-Verlag London Limited 2009
That is, R is a finite relation from (N ∪ Σ)∗ N (N ∪ Σ)∗ to (N ∪ Σ)∗. For any strings
x, y ∈ (N ∪ Σ)∗ and for any production u → v, we say that xuy derives xvy in one
step, or that xvy is derived from xuy in one step, and we write it as xuy ⇒¹ xvy. This
constitutes an application of the production u → v. We use the symbol ⇒ (derives)
for the reflexive and transitive closure of the relation ⇒¹. That is, w0 ⇒ wn iff
either wn = w0 or there is a sequence of derivations

w0 ⇒¹ w1, w1 ⇒¹ w2, . . . , wn−1 ⇒¹ wn, for n ≥ 1.

We write ⇒ⁿ to denote "derives in n steps." If more than one grammar occurs in a
certain context, then we will write ⇒G to denote derivation in the grammar G.
Similarly, if no confusion arises, we may also write ⇒ even for a one-step or an
n-step derivation.
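The relation ⇒¹ is easy to express in code. A sketch, with productions given as (u, v) pairs of strings:

```python
def one_step(w, rules):
    """All strings derivable from w in one step: for each production u -> v,
    replace some occurrence of u in w by v."""
    out = set()
    for u, v in rules:
        i = w.find(u)
        while i != -1:
            out.add(w[:i] + v + w[i + len(u):])
            i = w.find(u, i + 1)        # next (possibly overlapping) occurrence
    return out

rules = [('S', 'aSb'), ('S', 'ab')]
assert one_step('S', rules) == {'aSb', 'ab'}
assert one_step('aSb', rules) == {'aaSbb', 'aabb'}
```

Iterating one_step and collecting terminal strings yields ⇒ and hence L(G), at least in principle; nothing here is specific to context-free rules, which is precisely the point of the generalization.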
For a grammar G, we define the language of G as L(G) = {w ∈ Σ∗ : S ⇒ w}.
We also read L(G) as the language generated by the grammar G; it is the set of
all strings over the alphabet Σ that are derived from the start symbol by following the
production rules of G. A language is called a computably enumerable language if
it is generated by some (unrestricted) grammar. Computably enumerable languages
were previously (before 1999) called recursively enumerable languages. See the
following examples.
In the derivation of abc, I have used underlines to tell you exactly which substring
has been matched for obtaining the next string. Now you know how to generate
strings in such a grammar.
7.2 Unrestricted Grammars 203
Example 7.3. Construct a grammar for {w ∈ {a, b, c}∗ : #a (w) = #b (w) = #c (w)}.
Solution. We will use a construction similar to that in Example 7.2. That is, our plan
is to first generate (ABC)^n and then somehow permute the A's, B's, and C's, and
finally replace A, B, C with a, b, c, respectively. Take G = ({S, A, B, C}, {a, b, c},
R, S) with
R, S) with
R = {S → ε, S → ABCS, AB → BA, BA → AB, BC → CB, CB → BC,
CA → AC, AC → CA, A → a, B → b, C → c}.
To know exactly what is going on, derive the strings abbacc and cabacb in this
grammar. Can you show that the grammar G generates the language?
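Derivability in this grammar can be tested by brute-force search over sentential forms; a sketch (the length bound works because S → ε is the only contracting production, so no useful form exceeds the target's length by more than one):

```python
from collections import deque

RULES = [('S', ''), ('S', 'ABCS'), ('AB', 'BA'), ('BA', 'AB'),
         ('BC', 'CB'), ('CB', 'BC'), ('CA', 'AC'), ('AC', 'CA'),
         ('A', 'a'), ('B', 'b'), ('C', 'c')]

def derivable(target):
    """Breadth-first search over sentential forms reachable from S."""
    bound = len(target) + 1
    seen, frontier = {'S'}, deque(['S'])
    while frontier:
        w = frontier.popleft()
        if w == target:
            return True
        for u, v in RULES:              # apply every production everywhere
            i = w.find(u)
            while i != -1:
                x = w[:i] + v + w[i + len(u):]
                if len(x) <= bound and x not in seen:
                    seen.add(x)
                    frontier.append(x)
                i = w.find(u, i + 1)
    return False

assert derivable('abbacc') and derivable('cabacb')
assert not derivable('ab') and not derivable('abca')
```

Such a search only decides individual strings; showing that G generates exactly the stated language still needs the invariant argument the example asks for.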
Example 7.4. Is there a grammar G such that L(G) = {a^(2^n) : n ∈ N}?
Solution. Yes. Consider the grammar G = ({S, A, B, C, D}, {a}, R, S) with the
productions S → CAD, C → CB, C → ε, BA → AAB, BD → D, D → ε, A → a.
How does the grammar work? First, S gives CAD; then C gives CB^n, the
derivation by now being CB^nAD. The B's double the A's while going to the
right, thus generating CA^(2^n)B^nD. Then B^nD gets absorbed into D. Finally, C and D
both become ε, A becomes a, and the string a^(2^n) is generated.
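The doubling mechanism just described can be traced mechanically. The productions used below, S → CAD, C → CB | ε, BA → AAB, BD → D, D → ε, A → a, are one standard choice realizing that description (our reconstruction; a different but equivalent rule set would work the same way):

```python
def canonical(n):
    """Trace the intended derivation S => CAD => C B^n A D => ... => a^(2^n),
    applying one production at a time."""
    w = 'C' + 'B' * n + 'AD'             # after S => CAD and n uses of C => CB
    while 'BA' in w:                     # each B doubles the A's it moves past
        w = w.replace('BA', 'AAB', 1)
    while w.endswith('BD'):              # B^n D is absorbed into D
        w = w[:-2] + 'D'
    w = w.replace('C', '').replace('D', '')   # C => eps, D => eps
    return w.replace('A', 'a')                # A => a

assert [canonical(n) for n in range(4)] == ['a', 'aa', 'aaaa', 'aaaaaaaa']
```

Note that the trace only follows one convenient derivation order; proving that no stray derivation produces a string outside {a^(2^n)} takes a separate argument (a stuck B can never become a terminal).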
It looks as though we have generalized the concept of a grammar too wildly;
almost any language can be generated by such grammars! But we are going too fast in
throwing out conjectures. First, we must see what kind of automaton accepts computably
enumerable languages. If an unrestricted grammar is the ultimate type of grammar,
then we must have a corresponding automaton that is, in the same sense,
equally ultimate.
7.1. Decide whether the string baabbabaaabbaba is generated by the grammar with
productions S → aABa, A → baABb, B → Aab, aA → baa, bBb → abab.
7.2. Let G be the grammar with productions S → aSb | aA | bB | a | b, A → aA | a,
B → bB | b. Decide whether ababa, aababa, aabbaa ∈ L(G).
7.3. Show that the grammar with productions S → AAB, Aa → SaB, Ab → SBb,
B → S A|ab generates no string.
7.4. Let Σ be an alphabet. Construct a grammar for generating all and only the
strings of length at most 2009 over Σ.
7.5. Construct a grammar that generates all (and only) even natural numbers below
2009, written in usual decimal notation.
7.6. Find a grammar that generates all strings of the form a m bm cm , where m ≥ 1.
7.7. Let G be the grammar with productions S → ε | aa | bb | aSa | bSb. Show that L(G)
has no string of odd length, and also that the number of strings in L(G) of length 2n
is 2^n.
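For this exercise, note (and prove by induction) that L(G) is exactly the set of even-length palindromes over {a, b}. Granting that, the count 2^n can be confirmed by brute force; a sketch:

```python
from itertools import product

def in_L(w):
    """Even-length palindromes over {a, b}: peel matching outer symbols."""
    while len(w) >= 2 and w[0] == w[-1]:
        w = w[1:-1]
    return w == ''

for n in range(5):
    count = sum(in_L(''.join(t)) for t in product('ab', repeat=2 * n))
    assert count == 2 ** n               # 2^n strings of length 2n in L(G)
```

The count is as expected because such a palindrome is determined by its first n symbols.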
7.9. For each of the following languages, find a grammar that generates it:
(a) L1 = {a^n b^(2n) : n ≥ 0}.
(b) L2 = {a^(n+2) b^n : n ≥ 1}.
(c) L3 = {a^n b^(n−3) : n ≥ 3}.
(d) L4 = {a^m b^n : n > m ≥ 0}.
(e) L4 L2.
(f) L4 ∪ L2.
(g) L4^3.
(h) L4∗.
(i) L4 − L3.
7.10. Find grammars to generate the following languages over the alphabet {a, b}:
(a) {ww : w ∈ {a, b}∗ }.
(b) {a m bn a n bm : m, n ≥ 1}.
(c) {w : #a (w) = #b (w)}.
(d) {w : #a (w) = 1 + #b (w)}.
(e) {w ∈ {a, b}∗ : #a (w) > #b (w)}.
(f) {w ∈ {a, b}∗ : #a (w) − #b (w) = ±1}.
7.11. Find grammars to generate the following languages over the alphabet {a, b, c}:
(a) {a^(2^n) : n ∈ N}.
(b) {w : #a (w) = 1 + #b (w)}.
(c) {w : #a (w) = 2 · #b (w)}.
(d) {a m bm cn : m ≥ 1, n ≥ 0}.
7.12. What difficulties would arise if we allow ε on the left side of a production in
an unrestricted grammar?
7.3 Turing Machines 205
You have worked with regular grammars and finite automata, and with context-free
grammars along with pushdown automata. Now you are working with unrestricted
grammars. The definition of an unrestricted grammar looks simpler than the others, as there
are no restrictions on the productions. But then it allows so much freedom that
life becomes difficult with too many choices. Freedom, as they say, brings responsibility.
You will see, similarly, that Turing machines are simple by definition,
but it may become difficult to handle the ample opportunities and to manipulate the
unrestricted power. However, it will certainly be enjoyable.
Like any other automaton, a Turing machine has states, a head, and
an input tape. But it has no extra memory: no stack or pushdown store. Where,
then, does the power come from? Well, instead of a reading head, a Turing machine uses a
read–write head; that is, it can use the same device to read symbols from the tape and
also to write symbols on the tape. Thus the input tape is also used as an output tape, and
hence it works as a temporary memory.
Moreover, the head can now move both left and right, instead of the only-right movement
of the reading heads in the earlier automata. The set of states, also called the control unit,
works in discrete steps as earlier. We will also have final states, which we rather call
halt states. There will be exactly two halt states in each Turing machine. Once
the machine enters a halt state, it stops operating. There will also be an
initial state, which is, of course, different from the halt states; from the initial state
the machine starts operating.
Unlike other automata, a Turing machine can move to the left. It then comes with the danger of moving off the tape. This is avoided by an extension of the tape itself. The input tape is extended both to the left and to the right; that is, unlike the tapes of the earlier machines, there is neither a left end nor a right end of the tape.
7 Computably Enumerable Languages
Once the input is written on the tape, all squares to the left of the input string are left blank. We write the blank symbol as ḇ. All squares to the right of the input string are also left blank; that is, every square preceding and following the input string contains the blank symbol ḇ. The machine will start with its read–write head placed on the 0th square (this is only conceptual; we do not number the squares); it is the square, or cell, just to the left of the input string, which initially contains the blank symbol ḇ. See Fig. 7.1.
Fig. 7.1. The tape of a Turing machine, with the read–write head on the 0th square.
We equip every Turing machine with a partial function. For example, if the partial function is f with f(q, ḇ) = (r, L), then it would mean that "if the machine, being in state q, reads ḇ, then it goes to state r and its read–write head moves one square to the left of the scanned square." The symbols ḇ, L, R are common to every Turing machine. The symbol ḇ can be used to write the blank (for erasing nonblank symbols), while L and R signify the movement of the read–write head one square to the left and one square to the right, respectively. A formal description follows.
A Turing machine or a TM is a seven-tuple M = (Q, Σ, Γ, δ, s, h, ℏ), where Q is the finite set of states, Σ is the input alphabet, Γ ⊇ Σ ∪ {ḇ} is the tape alphabet, δ, a partial function from Q × Γ to Q × (Γ ∪ {L, R}), is the transition function, s ∈ Q is the initial state, and h, ℏ ∈ Q are the halt states. We further assume that the symbols s, h, ℏ are all different and that Q ∩ (Γ ∪ {L, R}) = ∅, that is, the states and tape symbols are different and the special symbols do not occur anywhere else. The transitions of the form δ(q, β) = (p, γ) are also called instructions of the machine M. Moreover, the input alphabet Σ is also called the output alphabet of M.
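The seven-tuple and the conditions just stated can be transcribed directly into code. The sketch below is not from the book; the class name, the use of "_" for the blank symbol ḇ, and a plain state name standing in for the rejecting halt state ℏ are all choices of this illustration.

```python
from dataclasses import dataclass

BLANK = "_"   # stands in for the blank symbol

@dataclass
class TuringMachine:
    Q: frozenset      # finite set of states
    Sigma: frozenset  # input alphabet
    Gamma: frozenset  # tape alphabet, with Sigma and BLANK inside it
    delta: dict       # partial function (state, symbol) -> (state, symbol or "L"/"R")
    s: str            # initial state
    h: str            # accepting halt state
    reject: str       # rejecting halt state

    def __post_init__(self):
        assert self.Sigma | {BLANK} <= self.Gamma          # Γ ⊇ Σ ∪ {ḇ}
        assert {self.s, self.h, self.reject} <= self.Q     # s, h, ℏ are states
        assert self.Q.isdisjoint(self.Gamma | {"L", "R"})  # Q ∩ (Γ ∪ {L, R}) = ∅
        assert all(q not in (self.h, self.reject)          # δ never defined on halt states
                   for (q, _sym) in self.delta)

# a small example machine that moves right over a's and then halts:
M = TuringMachine(
    Q=frozenset({"s", "q", "h", "r"}),
    Sigma=frozenset({"a"}),
    Gamma=frozenset({"a", BLANK}),
    delta={("s", BLANK): ("q", "R"), ("q", "a"): ("q", "R"), ("q", BLANK): ("h", BLANK)},
    s="s", h="h", reject="r",
)
```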
We must describe how the machine M works. Suppose the machine is currently in state q, scanning (its read–write head is on a square containing) the symbol a. If the partial function δ is not defined for the pair (q, a), then the machine stops working. In particular, the machine halts whenever it enters one of the halt states h or ℏ, as δ is never defined for these states, whatever the scanned symbol may be. We will use the difference between the nonfunctioning of a Turing machine in a halt state and in a nonhalt state in a very meaningful way later.
If δ(q, a) = (h, R), then the read–write head of the machine moves one square to the right and then halts. Similarly, when δ(q, a) = (h, L), the machine goes one square to the left and then halts. A similar interpretation is given to the transitions δ(q, a) = (ℏ, L) and δ(q, a) = (ℏ, R), when the machine first moves to the left or right accordingly, and then halts in the state ℏ.
At any instant of time, the machine M is in some state and must be scanning some symbol. Also, there is some string on the tape to the left of the currently scanned square and one to its right. These four pieces of information, the state, the left string, the currently scanned symbol, and the right string, describe the current position of the machine. Further, we will represent the infinite sequence of ḇ's (blanks) on both sides of the input string (or even on both sides of whatever string is there currently) by the empty string ε. An instantaneous description or a configuration of a machine will have these four components.
The working of the machine can be described by specifying how the machine passes from one configuration to another. In passing from one configuration to another, it uses a transition as given by δ. Suppose M is currently in state q scanning the symbol a, to the left of which is the string ab on the tape, and on the right-hand side of the currently scanned symbol is the string abab. Then we write the current configuration of M as the quadruple (q, ab, a, abab).
If s is the initial state and if we place the read–write head on the square preceding the input u, then the initial configuration will be (s, ε, ḇ, u). Notice that the infinite ḇ's on the right are simply taken as ε and thus merged with the input u. The configuration (q, ε, ḇ, ε) means that currently the read–write head of the machine is on the 0th square scanning the symbol ḇ, and to its left is the string ε and to its right is also the string ε. That means the empty string ε is the current input to the machine.
The configuration (q, ab, a, abab) can be abbreviated to qaba̲abab, where the currently scanned symbol is underlined. The configuration (q, ε, ḇ, ε) can then be abbreviated to qḇ̲. We mention the convention again that the infinite number of ḇ's to the right or left are represented by ε and thus not explicitly mentioned in abbreviated configurations. Now from such a configuration, if the read–write head moves one square to the left, then it will scan the ḇ on the left, and again the ḇ's on the right will merge, giving the new configuration as (q, ε, ḇ, ε), or as qḇ̲ again.
Formally, a configuration is an element of Q × Γ∗ × Γ × Γ∗. We use the abbreviated configurations written as qua̲v in place of (q, u, a, v). This includes the abbreviations of (q, ε, a, v) as qa̲v, of (q, ḇ, a, ε) as qḇa̲, and of (q, ε, ḇ, ε) as qḇ̲.
We can dispense with the underline on the currently scanned symbol by writing configurations in yet another way. In this notation, the configuration (q, ab, a, abab) is written as abqaabab. In general, if q is the current state, σ is the currently scanned symbol, u is the string to the left of the currently scanned symbol, and v is the string to its right, then the configuration (q, u, σ, v) is abbreviated to the string uqσv. When the need arises to represent a configuration as a string over the alphabet Q ∪ Γ, we will use this abbreviation. Otherwise, we continue with the abbreviated configurations written in the form quσ̲v.
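The two string abbreviations can be sketched as small functions; here the bracket notation [a] is this illustration's stand-in for the underline on the scanned symbol.

```python
def abbr(q, u, a, v):
    # the quσv form; [a] stands in for the underlined scanned symbol
    return q + u + "[" + a + "]" + v

def abbr_string(q, u, a, v):
    # the uqσv form: a string over Q ∪ Γ, the state written just before the scanned symbol
    return u + q + a + v

assert abbr("q", "ab", "a", "abab") == "qab[a]abab"
assert abbr_string("q", "ab", "a", "abab") == "abqaabab"
```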
The working of the machine M is described by how a configuration yields another configuration in one step. Obviously, the transitions have a role to play in this yielding of configurations. Though this can be done abstractly for any pair of configurations, we will go the other way around: we will define the yield-in-one-step relation on configurations starting from the transitions. A formal definition of the "yield in one step" relation, written ⊢¹, can be given as follows.
Let M = (Q, Σ, Γ, δ, s, h, ℏ) be a Turing machine. Let p, q be generic states, σ, τ generic symbols from Γ, and u, v generic strings over Γ.
(a) If δ(p, σ) = (q, τ), then puσ̲v ⊢¹ quτ̲v.
(b) If δ(p, σ) = (q, L), then puτσ̲v ⊢¹ quτ̲σv.
In particular, puσḇ̲ ⊢¹ quσ̲, as the ḇ merges with ε on the right. Also, pσ̲v ⊢¹ qḇ̲σv, as a ḇ appears from the ε on the left.
(c) If δ(p, σ) = (q, R), then puσ̲τv ⊢¹ quστ̲v.
In particular, if δ(p, ḇ) = (q, R), then puḇ̲σv ⊢¹ quḇσ̲v. This includes the cases pḇ̲ ⊢¹ qḇḇ̲, as a ḇ can appear from the ε at the right, and pḇ̲σv ⊢¹ qσ̲v, as the ḇ merges with ε on the left.
(d) A configuration of the form huσ̲v or ℏuσ̲v does not yield any configuration.
(e) If δ(p, σ) is not defined, then puσ̲v does not yield any configuration.
We say that a configuration C yields a configuration C′ in n + 1 steps if there are configurations C₁, C₂, . . . , Cₙ such that
C ⊢¹ C₁, C₁ ⊢¹ C₂, · · · , Cₙ₋₁ ⊢¹ Cₙ, Cₙ ⊢¹ C′.
In such a case, we also write C ⊢ⁿ⁺¹ C′, mentioning in how many steps C yields C′. If there is no confusion, we will simply write ⊢ instead of ⊢¹ or ⊢ⁿ. The sequence of yields in one step that shows how C yields C′ is called a computation of M starting from the configuration C.
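Clauses (a)–(e) can be turned into a one-step function on configurations. The sketch below follows the conventions above, with "_" standing in for ḇ; the merging of blanks with ε on either side is done by normalizing the left and right strings after each step.

```python
BLANK = "_"   # stands in for the blank symbol

def step(delta, config):
    """Yield-in-one-step: return the next configuration, or None for clauses (d) and (e)."""
    q, u, a, v = config
    if (q, a) not in delta:                     # clauses (d) and (e): no yield
        return None
    p, act = delta[(q, a)]
    if act == "L":                              # clause (b): move one square left
        u, a, v = (u[:-1], u[-1], a + v) if u else ("", BLANK, a + v)
    elif act == "R":                            # clause (c): move one square right
        u, a, v = (u + a, v[0], v[1:]) if v else (u + a, BLANK, "")
    else:                                       # clause (a): write the symbol act
        a = act
    # blanks adjacent to the infinite blank runs merge with ε:
    return (p, u.lstrip(BLANK), a, v.rstrip(BLANK))

# a small machine (it appears as Example 7.5 below); five steps reach the halt state h:
delta = {("s", BLANK): ("q", "R"), ("q", "a"): ("q", "R"), ("q", BLANK): ("h", BLANK)}
c = ("s", "", BLANK, "aaa")
for _ in range(5):
    c = step(delta, c)
assert c == ("h", "aaa", BLANK, "")
assert step(delta, c) is None                   # clause (d): a halted configuration yields nothing
```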
We also specify the configuration from which a machine starts its work. When we use the phrase "M starts with input u," we will assume that the initial configuration of the machine is sḇ̲u, that is, the read–write head is kept on the blank square immediately to the left of the input string u and the machine is in its initial state s. See the following example.
Example 7.5. Let M = ({s, q, h, ℏ}, {a}, {a, ḇ}, δ, s, h, ℏ) be the Turing machine with δ(s, ḇ) = (q, R), δ(q, a) = (q, R), δ(q, ḇ) = (h, ḇ). What is the computation of M with input aaa?
Starting from the initial configuration sḇ̲aaa, we have sḇ̲aaa ⊢¹ qa̲aa ⊢¹ qaa̲a ⊢¹ qaaa̲ ⊢¹ qaaaḇ̲ ⊢¹ haaaḇ̲; the machine then halts in the state h.
You can also represent a Turing machine by its transition diagram, as you were doing for DFAs. A transition diagram of a Turing machine has states and transitions as usual. The transition diagram of the machine in Example 7.5 is given in Fig. 7.2.
Fig. 7.2. The machine M of Example 7.5: an arrow from s to q labeled (ḇ; R), a loop on q labeled (a; R), and an arrow from q to h labeled (ḇ; ḇ).
The transitions now go from one state to another but are labeled with a pair: the first component is what the machine reads on the scanned square, and the second component specifies its action. Thus a transition from p to q labeled (a; b) would mean: if the machine is currently scanning the symbol a, being in state p, then it changes state to q and replaces a with b.
We must also have some way of telling which state is the accepting halt state and which one is the rejecting halt state. We will put two circles around the accepting halt state, as we were doing for the final states of a DFA. We will put an outgoing arrow from the rejecting halt state. The isolated node ℏ along with its outgoing arrow is sometimes omitted.
7.15. Design a Turing machine with the least number of transitions that goes into an infinite loop.
7.16. Design a Turing machine that, started on a square not containing a, somewhere in the middle of an input string not containing ḇ, searches for an occurrence of a to its right. If it finds an a, then it halts; else, it gets a ḇ before getting an a, and in that case, it should enter an infinite computation.
7.17. Let M = ({s, p, q, h, ℏ}, {a}, {a, ḇ}, δ, s, h, ℏ) be the Turing machine with δ given by δ(s, ḇ) = (h, ḇ), δ(s, a) = (p, L), δ(p, ḇ) = (s, ḇ), δ(p, a) = (q, ḇ), δ(q, ḇ) = (s, L), and δ(q, a) = (q, a). Let n ∈ N. Describe what M does when started in the configuration saⁿa̲.
7.4 Acceptance and Rejection

A Turing machine accepts or rejects languages depending upon its last configuration.
A halted configuration of a Turing machine is a configuration whose state component
is one of the halt states. An accepting configuration is a halted configuration whose
state component is the accepting halt state. A rejecting configuration is a halted con-
figuration with the state component as the rejecting halt state. The machine accepts a
string when it finally halts in an accepting configuration. Similarly, it rejects a string
when it eventually halts in a rejecting configuration.
Formally, let M = (Q, Σ, Γ, δ, s, h, ℏ) be a Turing machine. The two types of configurations whose state components are one of h or ℏ are called the halted configurations. A halted configuration of the type huσ̲v is called an accepting configuration, and one of the type ℏuσ̲v is called a rejecting configuration. The set of all strings over Σ accepted by M is the language accepted by M, written L(M); the set of all strings over Σ rejected by M is written L_R(M). A Turing machine that either accepts or rejects every string over its input alphabet is called a total Turing machine.
Exercise 7.1. Show that if M is a total Turing machine, then L(M) and L R (M) are
complements of each other. Does the converse hold?
We also say that the machine M accepts L(M) and that it rejects L_R(M). The language of the machine (i.e., the language accepted by the machine) in Example 7.5 is, obviously, a∗, and the language rejected by it is ∅. Notice that we do not bother about what happens if an input over a different alphabet is given to the machine, because the machine already knows what its input alphabet is. Informally, the case of irrelevant input strings is referred to as garbage in, garbage out. This also applies to all the earlier machines such as finite automata and pushdown automata. For example, exactly the same transitions as in Example 7.5 can be used for constructing a TM accepting a∗ over the alphabet {a, b}.
Example 7.6. Construct a total TM for accepting the language a∗ ⊆ {a, b}∗.
Solution. The way it differs from the earlier machine of Example 7.5 is that the strings having at least one b must now be rejected, as the machine is to be total. So, we modify the earlier machine by changing its transitions, taking into account the bigger input alphabet. Take M = ({s, q, h, ℏ}, {a, b}, {a, b, ḇ}, δ, s, h, ℏ) with δ(s, ḇ) = (q, R), δ(q, a) = (q, R), δ(q, ḇ) = (h, ḇ), δ(q, b) = (ℏ, b). The transition diagram of this machine is shown in Fig. 7.3.
Notice that in the case of Turing machines, it is not required that the input string be read completely for acceptance or rejection.
Fig. 7.3. The total TM of Example 7.6.
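As a quick check, the transitions of Example 7.6 can be executed to confirm that the machine halts on every input over {a, b}. This is a sketch: "_" stands in for ḇ, and the state name "r" for the rejecting halt state ℏ.

```python
BLANK = "_"

def run(delta, w, limit=1000):
    """Run from the initial configuration on input w; return the state in which M halts."""
    q, u, a, v = "s", "", BLANK, w
    for _ in range(limit):
        if q in ("h", "r"):                 # a halted configuration
            return q
        q, act = delta[(q, a)]
        if act == "R":
            u, a, v = (u + a, v[0], v[1:]) if v else (u + a, BLANK, "")
        elif act == "L":
            u, a, v = (u[:-1], u[-1], a + v) if u else ("", BLANK, a + v)
        else:
            a = act
    return None                             # did not halt within `limit` steps

delta = {("s", BLANK): ("q", "R"),
         ("q", "a"): ("q", "R"),
         ("q", BLANK): ("h", BLANK),        # only a's were seen: accept
         ("q", "b"): ("r", "b")}            # a b is met: reject

assert all(run(delta, w) == "h" for w in ["", "a", "aaa"])
assert all(run(delta, w) == "r" for w in ["b", "ab", "aab", "aba"])
```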
Example 7.7. Construct a Turing machine that accepts the language L = {aⁿbⁿ : n ∈ N}.
Solution. Of course, you are hopeful that a Turing machine M can be constructed to
accept L. It must be able to match the number of a's and b's, and only in that particular form, where no b precedes any a. When M meets the first a, it should look for a b and then come back to the next a, and so on. To fix the idea, suppose that M meets the first a from the left; it matches the last b with this a, then comes back to the second a, goes to match the last-but-one b with it, and so on. But how does it come back to the second a after the first matching? To be able to sense how far it has matched, let us change the already-met a to another symbol, say, c. In that case, we will simply go on accumulating c's!
Well, something can be done. We will rewrite the a it meets first as a c, and after the matching with a b, we will erase this c and then start the next matching process, again rewriting the next a as a c, and so on. It looks like this one extra symbol will do. Let us try. At this point I suggest you take a piece of paper and a pen and build up the transition function δ as we go along.
First, our set of states must contain the initial state s. As the initial configuration is of the form sḇ̲w for the input w, the machine must go to the right from this. That means we must have δ(s, ḇ) = (p, R) for some state p. Here, start building the function δ as a table. Draw a horizontal line on the top of the piece of paper. Draw a vertical line crossing the horizontal line towards the left-hand side of the paper, and put δ on the top left corner of the table. Above the horizontal line keep the first symbol as ḇ, the second symbol as a, the third as b, and the fourth as c. To the left of the vertical line write the first symbol as s, and the second as p. We will write the states one below the other and the tape symbols one next to the other. Now fill in the first entry, corresponding to the pair (s, ḇ), as (p, R). The table now looks like:

δ   ḇ        a   b   c
s   (p, R)
p
The first input we should think about is ε. In such a case, the initial configuration is sḇ̲. As δ(s, ḇ) = (p, R), this configuration would yield pḇ̲, which must yield an accepting configuration. So, we must have δ(p, ḇ) = (h, ḇ). Here, (h, ḇ) may also be replaced by any other ordered pair whose first component is h. Let us stick to this pair. At this point, update your transition function (table) to

δ   ḇ        a   b   c
s   (p, R)
p   (h, ḇ)
Suppose the input string starts with an a, that is, it is of the form au. How should our machine behave now? As earlier, it reaches the configuration pa̲u. We want to replace a by c, remaining in the same state. So add δ(p, a) = (p, c). Had it been b instead of a, our machine should not have accepted it; so we leave δ(p, b) undefined. With input au, the machine now has the configuration pc̲u. It must go to the right and change state to, say, q. So we add δ(p, c) = (q, R). Now it is reading the first symbol of u. Whatever this may be, it should go to the right till it gets a ḇ, just to the right of u. So, we add δ(q, a) = δ(q, b) = (q, R). The updated table now looks like

δ   ḇ        a        b        c
s   (p, R)
p   (h, ḇ)   (p, c)            (q, R)
q            (q, R)   (q, R)
The next configuration is qcuḇ̲. From this place it should go left, changing its state. So, add δ(q, ḇ) = (r, L), where r is another state. We have replaced one a by one c, and now we want to erase one b from the right end of the input, where the read–write head is currently scanning the b. So, we must have δ(r, b) = (r, ḇ).
What if the last symbol is not a b? If the last symbol is an a, the string should not be accepted; we leave δ(r, a) undefined. If it is a c, then we have an excess a in the input, so δ(r, c) must not be defined. Now, the square that was having a b contains a ḇ, and the machine is in state r. So, what should the machine do? It must go to the left searching for the c. Can it be in the same state r while going left?
We have already decided to leave δ(r, a) undefined. If there is one more a somewhere to the left, then we will be in the soup. So, let us introduce one more state, say, t, and take δ(r, ḇ) = (t, L). Our plan is to move left until we get the c, being in state t. So, add δ(t, a) = (t, L), δ(t, b) = (t, L). When the read–write head reaches c, it must erase it, as agreed upon. Let us then add δ(t, c) = (t, ḇ). After this, the machine should start afresh, as it had started initially from state s. So, have δ(t, ḇ) = (p, R). The updated table of instructions is shown in Table 7.1.
Table 7.1

δ   ḇ        a        b        c
s   (p, R)
p   (h, ḇ)   (p, c)            (q, R)
q   (r, L)   (q, R)   (q, R)
r   (t, L)            (r, ḇ)
t   (p, R)   (t, L)   (t, L)   (t, ḇ)
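Table 7.1 can be encoded and executed directly. The runner below is a sketch ("_" stands in for ḇ); it reports "accept" on reaching h, and "abrupt" when δ is undefined for the current state and scanned symbol.

```python
BLANK = "_"

TABLE_7_1 = {
    ("s", BLANK): ("p", "R"),
    ("p", BLANK): ("h", BLANK), ("p", "a"): ("p", "c"), ("p", "c"): ("q", "R"),
    ("q", BLANK): ("r", "L"),   ("q", "a"): ("q", "R"), ("q", "b"): ("q", "R"),
    ("r", BLANK): ("t", "L"),   ("r", "b"): ("r", BLANK),
    ("t", BLANK): ("p", "R"),   ("t", "a"): ("t", "L"),
    ("t", "b"): ("t", "L"),     ("t", "c"): ("t", BLANK),
}

def run(delta, w, limit=10_000):
    q, u, a, v = "s", "", BLANK, w
    for _ in range(limit):
        if q == "h":
            return "accept"
        if (q, a) not in delta:
            return "abrupt"                 # the machine halts abruptly
        q, act = delta[(q, a)]
        if act == "R":
            u, a, v = (u + a, v[0], v[1:]) if v else (u + a, BLANK, "")
        elif act == "L":
            u, a, v = (u[:-1], u[-1], a + v) if u else ("", BLANK, a + v)
        else:
            a = act
    return "loop?"

assert [run(TABLE_7_1, w) for w in ["", "ab", "aabb", "a", "b", "aba"]] == \
       ["accept", "accept", "accept", "abrupt", "abrupt", "abrupt"]
```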
Fig. 7.4. The transition diagram of the machine given by Table 7.1.
The following is a verification that the machine works as intended for the inputs ε, a, b, ab, aba, abb, abab, aabb. Read each ⊢ as ⊢¹. Verify with other inputs.
1. sḇ̲ ⊢ pḇ̲ ⊢ hḇ̲.
2. sḇ̲a ⊢ pa̲ ⊢ pc̲ ⊢ qcḇ̲ ⊢ rc̲, an abrupt halt.
3. sḇ̲b ⊢ pb̲, an abrupt halt.
4. sḇ̲ab ⊢ pa̲b ⊢ pc̲b ⊢ qcb̲ ⊢ qcbḇ̲ ⊢ rcb̲ ⊢ rcḇ̲ ⊢ tc̲ ⊢ tḇ̲ ⊢ pḇ̲ ⊢ hḇ̲.
5. sḇ̲aba ⊢ pa̲ba ⊢ pc̲ba ⊢ qcb̲a ⊢ qcba̲ ⊢ qcbaḇ̲ ⊢ rcba̲, an abrupt halt.
6. sḇ̲abb ⊢ pa̲bb ⊢ pc̲bb ⊢ qcb̲b ⊢ qcbb̲ ⊢ qcbbḇ̲ ⊢ rcbb̲ ⊢ rcbḇ̲ ⊢ tcb̲ ⊢ tc̲b ⊢ tḇ̲b ⊢ pb̲, an abrupt halt.
7. sḇ̲abab ⊢ pa̲bab ⊢ pc̲bab ⊢ qcb̲ab ⊢ qcba̲b ⊢ qcbab̲ ⊢ qcbabḇ̲ ⊢ rcbab̲ ⊢ rcbaḇ̲ ⊢ tcba̲ ⊢ tcb̲a ⊢ tc̲ba ⊢ tḇ̲ba ⊢ pb̲a, an abrupt halt.
8. sḇ̲aabb ⊢ pa̲abb ⊢ pc̲abb ⊢ qca̲bb ⊢ qcab̲b ⊢ qcabb̲ ⊢ qcabbḇ̲ ⊢ rcabb̲ ⊢ rcabḇ̲ ⊢ tcab̲ ⊢ tca̲b ⊢ tc̲ab ⊢ tḇ̲ab ⊢ pa̲b ⊢ · · · ⊢ hḇ̲, as in 4.
From the above, you see that the computations in 1, 4, and 8 are accepting computations, while those in 2, 3, 5, 6, and 7 do not end in a halted configuration; thus the strings a, b, aba, abb, abab are not accepted by M. Now you can convince yourself how the machine M works by arguing abstractly about what the states do, and why L(M) = L.
Exercise 7.2. Can you construct a total TM that accepts aⁿbⁿ for each n ∈ N and rejects every other string over the alphabet {a, b}? Start your work from Table 7.1.
All the transitions that were left undefined were so left just to ensure that the strings not of the form aⁿbⁿ are never accepted; in such cases, the machine halts abruptly. Is it always the case that if the transition function of a machine is a total function, then the machine is a total Turing machine? In this particular case, if we modify the transition function by filling in those places so as to drive the machine to the rejecting state ℏ, the machine rejects the unwanted strings. The updated transition function is now a total function, as given in Table 7.2.
Table 7.2

δ   ḇ        a        b        c
s   (p, R)   (ℏ, ḇ)   (ℏ, ḇ)   (ℏ, ḇ)
p   (h, ḇ)   (p, c)   (ℏ, ḇ)   (q, R)
q   (r, L)   (q, R)   (q, R)   (ℏ, ḇ)
r   (t, L)   (ℏ, ḇ)   (r, ḇ)   (ℏ, ḇ)
t   (p, R)   (t, L)   (t, L)   (t, ḇ)
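The completed table can be checked the same way. In this sketch "_" stands in for ḇ and the state name "reject" for ℏ; every formerly undefined entry now leads to "reject", so the runner reports either "accept" or "reject" for every input over {a, b, c}.

```python
BLANK, R, L = "_", "R", "L"

TABLE_7_2 = {
    ("s", BLANK): ("p", R),     ("s", "a"): ("reject", BLANK),
    ("s", "b"): ("reject", BLANK), ("s", "c"): ("reject", BLANK),
    ("p", BLANK): ("h", BLANK), ("p", "a"): ("p", "c"),
    ("p", "b"): ("reject", BLANK), ("p", "c"): ("q", R),
    ("q", BLANK): ("r", L),     ("q", "a"): ("q", R),
    ("q", "b"): ("q", R),       ("q", "c"): ("reject", BLANK),
    ("r", BLANK): ("t", L),     ("r", "a"): ("reject", BLANK),
    ("r", "b"): ("r", BLANK),   ("r", "c"): ("reject", BLANK),
    ("t", BLANK): ("p", R),     ("t", "a"): ("t", L),
    ("t", "b"): ("t", L),       ("t", "c"): ("t", BLANK),
}

def run(delta, w, limit=10_000):
    q, u, a, v = "s", "", BLANK, w
    for _ in range(limit):
        if q in ("h", "reject"):
            return "accept" if q == "h" else "reject"
        q, act = delta[(q, a)]
        if act == R:
            u, a, v = (u + a, v[0], v[1:]) if v else (u + a, BLANK, "")
        elif act == L:
            u, a, v = (u[:-1], u[-1], a + v) if u else ("", BLANK, a + v)
        else:
            a = act
    return "loop?"

assert all(run(TABLE_7_2, w) == "accept" for w in ["", "ab", "aabb", "aaabbb"])
assert all(run(TABLE_7_2, w) == "reject" for w in ["a", "b", "aba", "abb", "abab", "ba"])
```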
The above computations show that the strings a, b, aba, abb, abab are all now rejected. A moment's thought will convince you that the modified machine M′ accepts the language {aⁿbⁿ ∈ {a, b}∗ : n ∈ N} and rejects the language {a, b}∗ − {aⁿbⁿ : n ∈ N}. Here, L_R(M′) is the complement of L(M′).
Will this strategy of modifying an already existing machine work in all cases? To be specific, suppose M accepts L ⊆ Σ∗ for some alphabet Σ. Construct a machine M′ by redefining the missing transitions to (ℏ, ḇ). Clearly, L(M′) = L(M). Will it be the case that L_R(M′) is the complement of L(M′)?
The strategy would not work when the machine M does not accept a string by entering an infinite computation instead of halting abruptly. Recollect that, unlike finite automata, a Turing machine can go on an infinite computation; it accepts a string only by eventually reaching its accepting state. For instance, one can construct a machine M with a total transition function that accepts the language L(M) = {aw : w ∈ {a, b}∗} and rejects ∅, where the strings in {a, b}∗ − L(M) cause M to enter an infinite computation by moving to the right. Its transition function is a total function, having nothing more to be filled in. The strategy of redefining the undefined transitions gives M′ = M; thus it fails. However, you can have another machine accepting the same language in which no transition might cause the machine to enter an infinite computation. In that case, you can use the strategy for rejecting the strings not in L(M). Can we always do that?
The question is a bit deeper now. To be precise, suppose we have a Turing machine that accepts a language L. Can we have a Turing machine that accepts L but never enters an infinite computation on any string over its input alphabet? Equivalently, if L is a language accepted by a Turing machine, can we have a machine that accepts L and rejects the complement of L? Notice that this question asks about the closure properties of the class of languages accepted by Turing machines. We will need some more experience with Turing machines to answer this question.
7.21. Design total Turing machines that accept the following languages over {a, b}:
(a) ∅.
(b) {ε}.
(c) {a}.
(d) a∗ba∗b.
7.22. Design Turing machines for accepting the following languages over {a, b}:
(a) ab∗ab∗a.
(b) {aⁿbⁿ : n ≥ 1}.
(c) a(a ∪ b)∗.
(d) bab∗a.
(e) {w : ℓ(w) is even}.
(f) {w : ℓ(w) is a multiple of 3}.
(g) {aᵐbⁿ : m ≥ 1, m ≠ n}.
(h) {w : #a(w) = #b(w)}.
(i) {aᵐbⁿaᵐ⁺ⁿ : m ≥ 0, n ≥ 1}.
(j) {(aᵐbᵐ)² : m ∈ N}.
(k) {aⁿb²ⁿ : n ≥ 1}.
(l) {ww : w ∈ {a, b}∗, ℓ(w) ≥ 1}.
(m) {wwᴿ : w ∈ {a, b}∗}.
7.23. Give an example of a Turing machine M for each of the following cases:
(a) M has the single halt state h, and M does not accept any string.
(b) M has both the halt states h and ℏ, and M does not accept any string.
7.24. Turing machines can be defined with more than two final states, as for other automata. Define such a machine and show that for each such machine there exists a Turing machine with only two final states (as we defined) accepting the same language as the original.
7.25. What happens if we adopt the convention of starting a Turing machine scanning the first input symbol instead of starting it from the ḇ preceding the input string?
7.26. What is L(M), if M = ({p, q, r, s, t, h, ℏ}, {a, b}, {a, b, ḇ}, δ, p, h, ℏ) with δ given by:
(a) δ(p, ḇ) = (q, R), δ(q, a) = (r, R), δ(r, b) = (q, R), δ(r, ḇ) = (h, R)?
(b) δ(p, ḇ) = (p, R), δ(p, a) = (q, ḇ), δ(q, ḇ) = (r, R), δ(r, b) = (q, ḇ), δ(r, ḇ) = (h, R)?
(c) δ(p, ḇ) = (p, R), δ(p, a) = (q, b), δ(q, b) = (r, R), δ(r, b) = (s, a), δ(s, a) = (t, L), δ(t, b) = (p, R), δ(r, ḇ) = (h, R)?
7.5 Using Old Machines

Our experience shows that it is not easy to construct Turing machines that accept a given language. We will, in fact, develop a better mechanism for constructing Turing machines instead of always trying from first principles.
Look back at the solution of Example 7.7. You first had an informal way of spelling out how the machine would work. The informal description took care of small jobs such as "go to the left until a ḇ is met," then "turn right," etc. Imagine that there are machines to do these small jobs. Then can you make bigger machines to do the job at hand using these small ones?
Suppose you have a machine, say, M_L, that goes only one square to the left, and a machine M_R that goes one square to the right. You can think of combining these machines so that they work one after the other. Say, the read–write head is already somewhere in the middle of the tape. Now if M_L works, the read–write head goes to the square to the left of the original. If M_R works immediately on the result, then it takes the read–write head back to the original square. That is, the combined machine M_L → M_R does not alter the input at all, leaving the read–write head where it was; that is, it does not alter the current configuration. Similarly, M_L → M_L → M_R will have the net effect of M_L.
Notice that these machines need not start from the blank square preceding the input. In fact, this holds for all machines; only when we require a machine to accept a language does it have to start from the blank square just to the left of the input.
However, all machines start from their initial states, and if nothing is defined, they halt abruptly. An abrupt halt signals that the string is not accepted. In combining machines, we will ensure that the first machine comes to its accepting halt state, and then the second machine takes over. Though this is not necessary, it will be conceptually easy, leading to a clear presentation.
To fix the idea, if we have two machines to be operated one after the other, then the first machine starts from its initial state, it should come to its halt state after some computation, and then the second machine takes over, starting from its own initial state. The first machine can start from anywhere on the tape as per our requirements, but the second machine must start from wherever the first one has left the tape.
Think of taking out the control unit of the first machine (Fig. 7.1) and plugging in the control unit of the second machine, leaving the tape and the read–write head as they were. How do we formally describe such a combined machine? Well, with Γ = Σ ∪ {ḇ}, take
M_L = ({s₁, h₁, ℏ₁}, Σ, Γ, δ_L, s₁, h₁, ℏ₁) with δ_L(s₁, γ) = (h₁, L) for each γ ∈ Γ,
M_R = ({s₂, h₂, ℏ₂}, Σ, Γ, δ_R, s₂, h₂, ℏ₂) with δ_R(s₂, γ) = (h₂, R) for each γ ∈ Γ.
The machine M_L → M_R will have the initial state of M_L, that is, s₁. On any given input, when the new machine works, the machine M_L is initiated. Upon halting of M_L, the new machine goes to the initial state of the machine M_R, and then M_R must work. Thus, no halt state of M_L should be a halt state of the new machine. The only accepting halt state of the new machine is the accepting halt state of M_R. In addition, we must have a transition to drive the new machine from the halt states of M_L to the initial state of M_R. That is, for each γ ∈ Γ, we add the transition δ(h₁, γ) = (s₂, γ).
More generally, we may combine three machines M₁, M₂, and M₃ so that, depending on the symbol scanned when M₁ halts, either M₂ or M₃ takes over (Fig. 7.5):
Start in the initial state of M₁ and operate as M₁ does. Upon the halting of M₁, check the currently scanned symbol. Then do one of the following:
(a) If the currently scanned symbol is σ, then change state to the initial state of M₂ and operate as M₂ does.
(b) If the currently scanned symbol is τ, then change state to the initial state of M₃ and operate as M₃ does.
For rejection of strings, only the accepting and rejecting states of M₂ and M₃ matter, and not the rejecting state of M₁. When the scanned symbol is neither σ nor τ, the combined machine halts abruptly, because no transition is defined in that case. Similarly, for acceptance of a string, the accepting states of M₂ and M₃ matter, and not that of M₁.
If in one of the machines Mᵢ the transition function δᵢ is not defined for (ℏᵢ, σ), then the transition function δ of the combined machine is left undefined for (ℏᵢ, σ). For example, the transition δ(ℏ₁, σ) in the combined machine M₁ → M₂ may be well defined provided δ₁(ℏ₁, σ) has been defined in M₁. In the combined machine M_L → M_R, we have not included such a transition, as in M_L the transition δ_L(ℏ₁, σ) has not been defined.
We can have much more complex combinations of machines, examples of which you will see later. A simple modification of the above machine is one where the new machine simulates M₃ whenever the currently scanned symbol is anything other than τ. The arrow from M₁ to M₃ can, in such a case, be labeled simply ≠ τ, or γ ≠ τ. This can be depicted by either of the diagrams in Fig. 7.6. We use the former when we do not require the currently scanned symbol any more, and we use the latter when we need the information that the currently scanned symbol is γ.
Fig. 7.6. Two depictions of the branch to M₃.
If M₁ has halted on a square containing the symbol τ, then M₂ takes over; else, if M₁ has halted on a square containing a symbol other than τ, then M₃ takes over.
We will refer to these diagrams as machine diagrams. Moreover, when machines are combined unconditionally, we may omit the arrow. For example, the combined machine M₁ → M₂ can also be written as M₁M₂, just by juxtaposing the machines one after the other. By induction, many machines can be combined.
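The unconditional combination M₁ → M₂ can be sketched as an operation on transition tables: rename the second machine's states apart, and add, for every tape symbol, a linking transition from the accepting halt state of M₁ to the (renamed) initial state of M₂ that writes the scanned symbol back, leaving the tape unchanged. The function name and the priming scheme below are choices of this illustration.

```python
def combine(m1, m2, gamma):
    """Combine machine quadruples (delta, s, h, reject); gamma is the tape alphabet."""
    d1, s1, h1, r1 = m1
    d2, s2, h2, r2 = m2
    ren = lambda q: q + "'"                  # keep M2's states distinct from M1's
    delta = dict(d1)
    delta.update({(ren(p), x): (ren(q), act) for (p, x), (q, act) in d2.items()})
    for x in gamma:                          # link h1 to M2's initial state
        delta[(h1, x)] = (ren(s2), x)        # writing x back leaves the tape unchanged
    return (delta, s1, ren(h2), ren(r2))

# e.g. the machines M_L and M_R of the text; M_L -> M_R has no net effect:
ML = ({("s1", x): ("h1", "L") for x in "a_"}, "s1", "h1", "r1")
MR = ({("s2", x): ("h2", "R") for x in "a_"}, "s2", "h2", "r2")
delta, s, h, r = combine(ML, MR, "a_")
assert delta[("h1", "a")] == ("s2'", "a")    # the linking transitions
assert h == "h2'"                            # the accepting halt state is that of M_R
```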
It would be nice to have some simple machines from which we can build complex ones. Some such simple machines are as follows:
Symbol Writing Machines: Let Σ be an alphabet and let σ ∈ Γ ⊇ Σ ∪ {ḇ}. Define the machine Mσ = ({s, h, ℏ}, Σ, Γ, δ, s, h, ℏ), where δ(s, τ) = (h, σ) for each τ ∈ Γ. This machine simply writes the symbol σ on the currently scanned square and then halts. The machine M_ḇ is called the erasing machine, as it writes the symbol ḇ (i.e., it erases the scanned symbol).
If there are several arrows from the machine M₁ to the machine M₂, say, the first labeled σ, the second labeled τ, and the third labeled γ ≠ ḇ, then we will replace all the three by a single arrow with the label σ, τ, γ ≠ ḇ. However, an arrow in the other direction, that is, from M₂ to M₁, cannot be mixed with the previous one. Recollect also the other abbreviations: M₁ → M₂ is written as M₁M₂, and an arrow labeled τ ≠ σ simply as ≠ σ, when the particular symbol τ different from σ is not used later.
A combined machine is also called a machine scheme, and a machine diagram when depicted in a figure using labeled graphs. See the following examples.
Example 7.8. What is the language accepted by the following combined machine?
M : the machine R, with a loop on R labeled a, and an arrow labeled ḇ to the accepting halt state h.
Solution. As usual, the input to a combined machine must be in the specified form ḇ̲u. Suppose we give the input aaa to M. This means that the initial configuration of M is sḇ̲aaa. The initial machine R takes over, and the read–write head moves to the first input symbol. We can no longer write the configurations, as there is no trace of the states of the combined machine. Moreover, the machines in a combined machine are just copies of the usual head-moving or symbol-writing machines; thus their exact states are not known. However, we can write the tape appearances. For example, initially the tape looks like ḇ̲aaa, and after the R machine has worked, the tape looks like a̲aa, etc.
At this point, we make a convention of referring to the tape appearances in a combined machine as its configurations, overusing the word "configuration." We also use the same symbol ⊢ between successive tape appearances for writing down a computation in a combined machine.
The current configuration of M is a̲aa. The machine is now scanning an a, and the machine diagram of M says that if it is scanning an a, it must simulate R. (The loop around R is labeled a.) Thus the next configuration of M is aa̲a. Twice the process is repeated, and the configuration aaaḇ̲ is reached. The currently scanned symbol is ḇ, and the machine diagram of M says that it must simulate the accepting machine. The whole computation is
sḇ̲aaa ⊢ a̲aa ⊢ aa̲a ⊢ aaa̲ ⊢ aaaḇ̲ ⊢ haaaḇ̲.
Notice the state components in the first and last of the above tape appearances. We will make it a practice to write the s in the beginning, to mark the initial configuration; the string following this sḇ̲ is the input string. Similarly, the state of the final computation step is also marked with h or ℏ, so that it is easy to read off the acceptance or rejection of strings.
It is thus obvious that any string aⁿ is accepted by M. The empty string ε is clearly accepted by M. If there is any other symbol in the input string, then the machine halts abruptly upon reaching that symbol. That is, L(M) = {aⁿ : n ∈ N}.
You can also construct a total TM accepting the same language a∗ by modifying the machine diagram of Example 7.8. What you need is that the machine must execute the rejecting machine when it meets some symbol other than a. Look at Fig. 7.7.

Fig. 7.7. A total TM for a∗: the machine R with a loop labeled a, an arrow labeled ḇ to h, and an arrow labeled b to ℏ.
You see that there is no need to think about the states of the combined machine.
Those can be found out by following the combination rules. This is possible because
we assume that all the machines used in a combined machine have distinct states. If
a machine is used many times in a combined machine, then the many occurrences of
that machine are not the same machine but identical copies.
In Example 7.8, the machine R has been simulated four times, and each time
a different copy of R has been used. For example, a copy of the machine R is
M_R = ({sᵢ, hᵢ, ⊗ᵢ}, Σ, Γ, δᵢ, sᵢ, hᵢ, ⊗ᵢ), where Γ ⊇ Σ ∪ {⊔} and δᵢ(sᵢ, τ) = (hᵢ, R)
for each τ ∈ Γ.
L_⊔ : the machine L with a loop labeled ⊔̄ (any nonblank symbol) and an arrow
labeled ⊔ to h.
Solution. The machine L_⊔ first goes one square to the left. Next, if the square it is
currently on (that is, one left of the original) contains a nonblank symbol, then it
moves another square to the left. It repeats these steps and stops when it finds a ⊔.
It does not really matter what the underlying input alphabet is; the label ⊔̄ on the
loop takes care of that. If at the start the machine is scanning the blank symbol ⊔,
then it moves one square to the left anyway and then, depending upon whether the
next left square is a ⊔ or not, it proceeds as earlier. Upon finding the ⊔ on the left,
the machine L_⊔ halts in the accepting halt state h.
Look at the combined machines in Fig. 7.8.
The machine L_⊔̄ moves its read–write head to the left until it reaches a square
containing a nonblank symbol. Once it gets a nonblank symbol, it halts then and
there in its accepting state. The machine R_⊔ goes on moving its read–write head to
the right and halts in its accepting state upon reaching a square containing the blank
symbol ⊔. Similarly, the machine R_⊔̄ halts at the first nonblank symbol to the right
of the currently scanned square.
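These head-moving machines can be sketched as Python functions (our own illustration, not the book's code) acting on a tape, given as a list of symbols with '_' for the blank, and a head position:

```python
def L_blank(tape, pos):
    """L_⊔: move left until a blank is found; the first move is unconditional."""
    pos -= 1
    while tape[pos] != "_":
        pos -= 1
    return pos

def R_blank(tape, pos):
    """R_⊔: move right until a blank is found."""
    pos += 1
    while tape[pos] != "_":
        pos += 1
    return pos

def R_nonblank(tape, pos):
    """R_⊔̄: move right until a nonblank symbol is found."""
    pos += 1
    while tape[pos] == "_":
        pos += 1
    return pos

tape = list("_ab__c_")
print(R_blank(tape, 0))      # 3: the first blank to the right of square 0
print(R_nonblank(tape, 3))   # 5: the first nonblank to the right of square 3
print(L_blank(tape, 2))      # 0: the first blank to the left of square 2
```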
7.5 Using Old Machines 223
Fig. 7.8. The machines L_⊔̄ (L looping on ⊔, exiting to h on a nonblank symbol),
R_⊔ (R looping on nonblank symbols, exiting to h on ⊔), and R_⊔̄ (R looping on ⊔,
exiting to h on a nonblank symbol).
In Fig. 7.9, the machines L_σ and L_σ̄ halt after finding σ and a symbol other than
σ to the left, respectively. The machines R_σ and R_σ̄ find the symbol σ and a symbol
other than σ to the right, respectively. We can use these machines to build other
machines; see the following example.
Fig. 7.9. More combined machines: L_σ (L looping on σ̄, exiting to h on σ),
L_σ̄ (L looping on σ, exiting to h on σ̄), R_σ (R looping on σ̄, exiting to h on σ),
and R_σ̄ (R looping on σ, exiting to h on σ̄); here an arrow labeled σ̄ is taken on
any symbol other than σ.
Fig. 7.10. The combined machine for Example 7.10: M is R; on a it writes ⊔, runs
R_⊔, then L; on b it writes ⊔ and runs L_⊔, returning to the initial R; when the
initial R reads ⊔, M goes to h.
Solution. First, the read–write head of M goes to the right. If that symbol happens to
be an a, then it erases that a and moves to the right until a blank symbol is met.
Next, it turns left, and if that symbol is b, then this b is erased. It then comes back
to the blank on the left, which must be on the square where it erased an a. Then the
control is transferred to the initial machine R. Till now, the net effect is erasing an a
from the start-side and erasing a b from the end-side of the input string. It continues
to operate this way until the initial machine gets a blank, when it halts accepting the
string. This happens if the input string has the same number of a's from the start-side
as the number of b's from the end-side, and there is nothing else in the
middle. Thus, L(M) = {aⁿbⁿ : n ∈ ℕ}.
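The erase-from-both-ends procedure can be sketched in Python (an illustration of ours, not the book's code):

```python
def accepts_anbn(s: str) -> bool:
    """Sketch of Example 7.10: repeatedly erase one a from the front and
    one b from the back, accepting when nothing is left and halting
    abruptly (rejecting) otherwise."""
    tape = list(s)
    while tape:
        if tape[0] != "a":           # initial machine reads a non-a: abrupt halt
            return False
        tape.pop(0)                  # erase the leading a
        if not tape or tape[-1] != "b":
            return False             # no matching b at the end: abrupt halt
        tape.pop()                   # erase the trailing b
    return True                      # initial machine reads a blank: accept

print(accepts_anbn("aabb"))   # True
print(accepts_anbn(""))       # True
print(accepts_anbn("aabab"))  # False
```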
Computation of M of Fig. 7.10 with the input aabb can now be shown as in the
following. While reading or writing such a computation sequence of a combined
machine, you must keep one finger on the machine that is currently operating and
then proceed:

s⊔̲aabb ⊢ a̲abb ⊢ ⊔̲abb ⊢ abb⊔̲ ⊢ abb̲ ⊢ ab⊔̲ ⊢ ⊔̲ab ⊢ a̲b ⊢ ⊔̲b ⊢ b⊔̲ ⊢ b̲ ⊢ ⊔̲ ⊢ ⊔̲ ⊢ ⊔̲ ⊢ h⊔̲.
Exercise 7.3. Show computations of the machine M of Example 7.10 (Fig. 7.10) on
the inputs aabab and aabba.
Exercise 7.4. Follow the idea used in Example 7.10 to construct a Turing machine
with states and transitions for accepting {aⁿbⁿ : n ∈ ℕ}.
You can now easily construct a machine diagram for accepting the language
{aⁿbⁿ : n ∈ ℕ} and rejecting any string in {a, b}* − {aⁿbⁿ : n ∈ ℕ}. One such
modification of the diagram of Fig. 7.10 is shown in Fig. 7.11; it has to be a total
Turing machine.
Fig. 7.11. A total TM accepting {aⁿbⁿ : n ∈ ℕ}: the diagram of Fig. 7.10 with the
symbols unexpected at each stage (σ ≠ a at the start, and so on) leading to the
rejecting machine.
Solution. How do you proceed? You are now thinking algorithmically. First give it a
try before reading further. Well, for the language at hand, our approach could be
analogous to that in Example 7.10. You now know how to match a's with b's. Match
them by replacing b's with d's instead of erasing the b's altogether. Then you are left
with d's and c's, which can be tackled as in Example 7.10.
However, we will use another idea to solve this problem. Suppose you have erased
one a from the left-end, one c from the right-end, and one b from somewhere in the
middle. Then you are left with a string of the form

aⁿ⁻¹bᵏ⊔bⁿ⁻¹⁻ᵏcⁿ⁻¹.
You are not able to use the same procedure as you might have followed earlier to
erase a, b, c, one each, because this string is not in the form aᵐbᵐcᵐ. If somehow you
can transform this string to the initial form, then things will be easier. Say, we will
transform aⁿ⁻¹bᵏ⊔bⁿ⁻¹⁻ᵏcⁿ⁻¹ to ⊔aⁿ⁻¹bᵏbⁿ⁻¹⁻ᵏcⁿ⁻¹. Can you have a machine to
do it? What you require is that if the machine is given an input in the form u⊔v, then
upon halting, it would leave the string ⊔uv on the tape, where u contains no blanks
and v may contain blanks. That is, the machine brings the currently scanned ⊔
to the square just left of the string u, which contains no ⊔'s.
The machine S_R of Fig. 7.12 is such a machine, called the right-shifting machine,
as it just shifts the string on its left containing no blanks one square to the right. S_R
copies the string u symbol by symbol.
Fig. 7.12. The right-shifting machine S_R: L; on each σ ≠ ⊔ it writes ⊔, moves
right, writes σ, and moves two squares left; on ⊔ it moves right and halts in h.
S_R starts from the ⊔ following the string; it comes left one square, remembers that
symbol (which is not equal to ⊔), erases the symbol, goes one square to the right,
writes the symbol there, and turns left two squares. By now it is at the square just to
the left of the original blank square. It repeats the loop so that the string is copied
symbol by symbol from the right side. The read–write head also moves one square
to the left each time the loop is executed. Finally, it reaches the square where the first
symbol of the original string was written, now containing ⊔. It goes left and checks
whether the input string is over or not, as usual. If the symbol on the left is a ⊔, then
the string is over. To come to the square containing the moved ⊔, a right move is
taken. S_R thus transforms w⊔u⊔̲v to w⊔⊔̲uv, where u does not contain ⊔.
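The loop of S_R can be sketched in Python (an illustration of ours, not the book's code), with the tape as a list of symbols and '_' for the blank ⊔:

```python
def shift_right(tape, pos):
    """Sketch of S_R (Fig. 7.12): pos scans the blank that follows a
    blank-free string u.  The string u is shifted one square to the right,
    symbol by symbol from its right end; the head ends on the moved blank."""
    pos -= 1                         # L: step onto the last symbol of u
    while tape[pos] != "_":
        sym = tape[pos]              # remember the scanned symbol
        tape[pos] = "_"              # erase it
        tape[pos + 1] = sym          # write it one square to the right
        pos -= 1                     # two lefts net of one right
    return pos + 1                   # final right move onto the moved blank

tape = list("_ab_cd_")
head = shift_right(tape, 3)          # scanning the blank between ab and cd
print("".join(tape), head)           # __abcd_ 1
```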
Exercise 7.5. Use the ⊢ notation for combined machines to write a com-
putation of S_R on the input ab⊔̲aa⊔ab⊔ba⊔aa. (Strictly speaking, it is a tape
appearance rather than an input; we agree to overuse the term "input.")
We can now use the right-shifting machine S_R for constructing a machine that
accepts {aⁿbⁿcⁿ : n ∈ ℕ}. If the input is ε, then the string is to be accepted. This is
done by turning right and, if a ⊔ is found, simply halting. Otherwise, we erase
the first a, the last c, the last b (the last b is the first b coming from the right side), and
then right-shift the remaining a's and b's. We then come back to the ⊔ on the left, the
first square to the left of whatever is left over from the input. The process is repeated
till we end up with the empty string on the tape; this happens when the input was
really aⁿbⁿcⁿ. If the input string is not of this form, then our machine halts abruptly
somewhere in the middle. At this point you should complete the construction yourself.
See Fig. 7.13.
See that this machine really accepts the language {aⁿbⁿcⁿ : n ∈ ℕ}. Try designing
another machine to accept this language that does not use S_R; see Problem 7.31.
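The procedure just described can be sketched in Python (our own illustration, not the book's code; the list splice plays the role of the right shift S_R):

```python
def accepts_anbncn(s: str) -> bool:
    """Sketch: repeatedly erase the first a, the last c, and the last b,
    then close the gap left by the erased b; accept when nothing remains."""
    tape = list(s)
    while tape:
        if tape[0] != "a" or tape[-1] != "c":
            return False                 # wrong symbol at an end: abrupt halt
        tape.pop(0)                      # erase the first a
        tape.pop()                       # erase the last c
        i = len(tape) - 1                # find the last b, coming from the right
        while i >= 0 and tape[i] != "b":
            i -= 1
        if i < 0:
            return False                 # no b left to match: abrupt halt
        tape.pop(i)                      # erase it; the splice shifts the rest
    return True

print(accepts_anbncn("aabbcc"))  # True
print(accepts_anbncn(""))        # True
print(accepts_anbncn("aabbc"))   # False
```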
Fig. 7.13. A combined machine accepting {aⁿbⁿcⁿ : n ∈ ℕ}: R; on a it writes ⊔
and runs R_⊔ then L; on c it writes ⊔ and runs L; on b it writes ⊔ and runs S_R,
returning to the initial R; the initial R exits to h on ⊔.
Notice that the language accepted by the machine in Fig. 7.13 is not a context-free
language.
Example 7.12. Construct a machine that shifts a string containing no blanks one
square to the left, assuming that the square just to the left of the string contains a
blank. That is, it must transform v⊔̲u⊔w to vu⊔̲⊔w, where u does not contain a ⊔.
Solution. The idea is similar to that of S_R. We shift the string symbol by symbol
from the left. See the machine diagram for S_L in Fig. 7.14.
Fig. 7.14. The left-shifting machine S_L: R; on each σ ≠ ⊔ it writes ⊔, moves left,
writes σ, and moves two squares right; on ⊔ it moves left and halts in h.
Let us see how S_L operates. It starts on the blank just to the left of the string and
ends at the blank just to the right of the left-shifted string. Here is its operation
on the tape appearing as ⊔̲ab:

⊔̲ab ⊢ ⊔a̲b ⊢ ⊔⊔̲b ⊢ ⊔̲⊔b ⊢ a̲⊔b ⊢ a⊔̲b ⊢ a⊔b̲ ⊢ a⊔⊔̲ ⊢ a⊔̲⊔ ⊢ ab̲⊔ ⊢ ab⊔̲ ⊢ ab⊔⊔̲ ⊢ hab⊔̲⊔.
Example 7.13. Construct a machine that copies a string to its right, leaving a blank
in between the string and its copy. Further, at the end, the machine must be scanning
this middle blank. That is, the machine should transform v⊔̲u to v⊔u⊔̲u, where u
does not contain a blank and v may be any string. As usual, we assume that there are
infinite ⊔'s to the right of u.
Solution. Verify that the machine C in Fig. 7.15 does the work.
Fig. 7.15. The copying machine C: R; on each σ ≠ ⊔ it writes ⊔, runs R_⊔ R_⊔,
writes σ, runs L_⊔ L_⊔, writes σ again, and loops back to R; on ⊔ it halts in h.
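The marking idea behind C can be sketched in Python (our own illustration, not the book's code; '_' stands for the blank ⊔):

```python
def copy_right(tape, pos):
    """Sketch of the copying machine C (Fig. 7.15): pos scans the blank
    just left of a blank-free string u; afterwards the tape holds u, a
    blank, and a copy of u, with the head on the middle blank."""
    while True:
        pos += 1                         # R: next symbol of u
        if tape[pos] == "_":
            return pos                   # middle blank reached: halt scanning it
        sym = tape[pos]
        tape[pos] = "_"                  # mark the symbol by erasing it
        j = pos
        for _ in range(2):               # R_blank twice: skip to the copy's end
            j += 1
            while tape[j] != "_":
                j += 1
        tape[j] = sym                    # write the symbol at the copy's end
        tape[pos] = sym                  # restore the marked symbol

tape = list("_ab____")
mid = copy_right(tape, 0)
print("".join(tape), mid)                # _ab_ab_ 3
```

In the actual machine the restoring step is done by L_⊔ L_⊔ followed by the symbol-writing machine; here we restore the marked square directly.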
7.27. Design a Turing machine that replaces the symbol following the leftmost a
by a ⊔. Moreover, if the input contains no a, it replaces the rightmost nonblank
symbol by a ⊔.
7.29. Construct another copying machine that copies from the right side unlike the
one in Example 7.13 that copies from the left; the copy must be on the right as earlier.
7.30. Design a Turing machine that outputs ww when given input w. Check your
design on the input aabba.
7.33. Can you modify all your combined machines for the languages in the preceding
problem so that they accept the given language and reject the complement of the
corresponding language?
7.35. Describe combined machines that accept the given language and reject the
complement of the corresponding language in the previous problem.
You now have sufficient experience in working with Turing machines as language ac-
ceptors. We can now try to answer the question whether these machines accept all
computably enumerable languages or something else. As you may guess, the answer
is in the affirmative. But then we have to do some more work to see that it is indeed
so. We must have a way to simulate the working of each grammar by a Turing
machine and vice versa. We will do it, albeit indirectly. Our plan is to first extend
the notion of Turing machines a little and then take up the question of estimating
their power.
In this scheme, first we think of equipping a Turing machine with more input tapes.
Imagine a machine having, say, three tapes and three read–write heads. We fix
which head works on which tape: the first head works on the first tape, the second
head on the second tape, and the third head on the third tape. As earlier, we have
the same control unit with possibly many states, one of which is marked as the
initial state. We also have the two special states h and the rejecting state ⊗, possibly
different for different machines.
We give input to this three-tape machine only on the first tape; the other tapes
are initially blank. The machine may use them whenever required. As earlier, the
machine has a transition function, now specifying its action on all three tapes.
That is, at a certain instant of time, if the machine is in a certain state, with the three
heads scanning a symbol each on their respective tapes, the transition function must
give the information as to what to do next. The next action of the machine will
possibly be a change of state, together with, on each tape, one of the three actions of
writing a symbol on the scanned square, moving to the left, or moving to the right
by the respective head.
Formally, a three-tape machine is a seventuple M = (Q, Σ, Γ, δ, s, h, ⊗), where
Q, Σ, Γ, s, h, ⊗ are as they were in a Turing machine (⊗ being the rejecting halt
state), and δ is a partial function from (Q − {h, ⊗}) × Γ × Γ × Γ to
Q × (Γ ∪ {L, R}) × (Γ ∪ {L, R}) × (Γ ∪ {L, R}).
The transition function δ tells what action is taken on each of the three tapes
while changing a state. We define an instantaneous description of the machine or a
configuration as a quadruple (q, u1 a1 v1 , u2 a2 v2 , u3 a3 v3 ). This configuration is to be
understood as
7.6 Multitape TMs 229
The current state of the machine is q; on its ith tape, the currently scanned symbol
is aᵢ, to the left of it is the string uᵢ, and to the right of it is the string vᵢ, for
i = 1, 2, 3.
The yield relation on configurations is then defined, describing any compu-
tation of the machine as earlier, but taking care of the three tapes. We say that
such a three-tape machine M accepts a string u ∈ Σ* iff the initial configuration
(s, ⊔̲u, ⊔̲, ⊔̲) yields a configuration whose state component is h. And then the lan-
guage accepted by M is defined as the set of all strings from Σ* accepted by M.
Similarly, a string rejected by the three-tape machine is one that drives the ma-
chine to the rejecting state, and the language rejected by the machine is the set of
all strings rejected by it.
Obviously, any work that is done by a Turing machine (standard, or one-tape)
can also be done by a three-tape machine. For example, the three-tape machine
that operates on its first tape just as the original standard machine works, and which
does not operate on the second and third tapes at all, would achieve this. That is,
any language accepted by a standard Turing machine is also accepted by a three-tape
machine. We intend to show the converse also. This is a bit harder. The idea is to
simulate the three tapes on a single tape.
Suppose M is a three-tape machine. It works with the three tapes simultaneously,
but step by step. This means that in a single step it changes its three head positions,
and there is only one change of state. Imagine a standard Turing machine M′, which
is to simulate faithfully one step of a computation of M. How will it work? We may
give M′ the same states as those of M (a renaming of states, of course).
What about the working of the three heads? At first, we may try to keep a
copy of the first tape of M on the tape of M′, then put a copy of the second
tape keeping some blanks in between, and then some blanks, followed by a copy of
the third tape. But this will not work, for we do not know how many squares will be
used on the first tape to its right. If we need more than the number of separating
blanks, then the contents of the second tape on M′ will get unduly rewritten. So this
method of keeping copies will fail. There is, of course, a way out: while
simulating M on M′ we can always make a space and then follow M. But we have
an easier alternative.
Think of the first squares on all the three tapes of M as a triple, and write them in
three consecutive squares on the tape of M′. For example, the blanks preceding the
strings on the three tapes are taken as a block of three blanks on the single tape of
M′. This triple represents the 0th squares of M: the first ⊔ is for the ⊔ on the first
tape of M, the second ⊔ for the ⊔ on the second tape of M, and the third ⊔ for the ⊔
on the third tape of M. In such a case, one step of computation of M will be carried
out by three steps of M′, doing the work of the three heads of M in succession.
For example, a right movement on the first tape of M would be accomplished
by three right movements of M′ starting from the corresponding square. Think of a
situation when the first head of M is on the second square (on the first tape) and the
second head of it is on the first square (on the second tape).
While simulating M, the head of M′ must be on one of the squares corresponding
to one of these heads. Then how does it keep track of the different positions of the
different heads? We will use a trick here. Instead of using a triple for the three
symbols on the corresponding squares of the tapes, we will use a six-tuple.
Suppose that we have a, b, c on the second squares of the tapes first to
third, respectively, and the first head is scanning a, the second b, and the third c.
Then the corresponding six-tuple will be (a, 1, b, 1, c, 1). Suppose, for example, the
second head is not scanning the symbol b right now, but the first is scanning a and
the third is scanning c. Then we would represent this by the six-tuple (a, 1, b, 0, c, 1).
Is the scheme clear now? The square next to the right of a symbol contains 1 if the
corresponding head is scanning it; else, it contains 0. Of course, the symbols 0 and 1
are new to Γ.
We reword the above scheme of simulation of a three-tape machine by a stan-
dard machine. At some instant of time, suppose the nth squares of the three tapes
of M contain the symbols σ₁, σ₂, σ₃, respectively. Then form the nth six-tuple as
(σ₁, b₁, σ₂, b₂, σ₃, b₃), where b₁, b₂, b₃ ∈ {0, 1}. If bᵢ = 1, then it means that the ith
head is scanning σᵢ, and if bᵢ = 0, then the ith head is not scanning the symbol σᵢ
right now. The first squares on the tapes of M are written on the first six squares of
the tape of M′, the second squares of the tapes of M are written on the next six
squares of the tape of M′, and so on.
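The encoding just described can be sketched in Python (our own illustration, not the book's code; '_' stands for the blank ⊔, and '2' is the separator introduced below):

```python
def encode(tapes, heads):
    """Sketch: flatten three tapes (lists of symbols of equal length) and
    three head positions into the single-tape list of six-tuples, with the
    marker '2' separating consecutive six-tuples."""
    single = []
    for n in range(len(tapes[0])):
        for i in range(3):                            # six squares per block:
            single.append(tapes[i][n])                # the symbol on tape i
            single.append(1 if heads[i] == n else 0)  # 1 iff head i is here
        single.append("2")                            # six-tuple separator
    return single

tapes = [list("_a"), list("__"), list("__")]          # input 'a' on tape one
print(encode(tapes, [0, 0, 0]))
# ['_', 1, '_', 1, '_', 1, '2', 'a', 0, '_', 0, '_', 0, '2']
```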
For example, initially, the first six squares of the tape of M′ will look like
⊔1⊔1⊔1. We put an extra symbol, say, 2, to distinguish between the six-tuples; this
is not needed, but will be handy. The next six squares will look like a0⊔0⊔0 if the
first input symbol is a, which is written on the first tape of M, while all other tapes of
M are blank. The point is, this is only a conceptual device so far. The following is a
description of M′, which accomplishes this representation and then the simulation of
M. (We assume that 0, 1, 2 have not been used in M.) Given an input u to the three-
tape machine M, the machine M′ operates as follows:
1. The input u is given to M′ on its tape. M′ goes to the blank on the right (R_⊔)
and writes ⊔1⊔1⊔1, using six squares in succession, goes right and writes 2,
comes back to the beginning of the tape (L_⊔ L_⊔ L_⊔), goes right, remembers the
symbol, say, σ (the first input symbol), replaces it with τ, a new symbol, goes to
the blank (after the 1), writes there in succession σ, 0, ⊔, 0, ⊔, 0, 2, comes back to
τ, erases it, and goes right, scanning a square containing a ⊔. This is done iteratively
until the input to M′ is over. The tape after the completion of this step is as per the
above description, the input to M′ representing the input to M.
2. Now M′ simulates M step by step, doing one of the following for each i ∈
{1, 2, 3}:
(a) If the ith head of M goes to the left, then M′ finds the 1 occurring as the (2i)th
symbol after a 2, changes that 1 to 0, comes left seven squares, and changes the
0 there to 1.
(b) If the ith head of M goes to the right, then M′ finds the 1 occurring as the (2i)th
symbol after a 2 (or after the ⊔ where M′ starts), changes that 1 to 0, goes right
seven squares, and changes the symbol α there to 1. Here, the symbol α need not
always be 0, as the final blank positions of the tapes of M can be accessed in a
right movement. In that case, when α ≠ 0, M is using a ⊔ on the ith tape
while the other tapes' squares corresponding to that position also contain ⊔'s.
Here, M′ replaces α by 1 and replaces the ⊔'s in the other 0–1 positions by 0.
Theorem 7.1. For every k-tape machine M, there exists a standard Turing machine
M′ such that L(M) = L(M′) and L_R(M) = L_R(M′).
In what follows, we will use a kind of restriction on Turing machines. The re-
stricted machines will have some extra power. The extra power I am talking of
is that such a Turing machine can know the square on the tape to the left of which
there are only ⊔'s, that is, the left end of the string on the tape, after some compu-
tation. Note that this is a restriction, as in an arbitrary Turing machine we do not
have any way of determining the position on the tape beyond which there is nothing
interesting, only an infinite number of ⊔'s. This knowledge can be exploited to erase
all that is left on the tape, if required.
For this purpose, we will equip the Turing machines with an extra symbol, say, ∗,
which will always be present on the tape, marking the left end of the interesting
portion. To the left of ∗ there will be only an infinite number of ⊔'s. Initially, such a
machine will be given its input in the form ∗⊔̲u, the machine scanning the ⊔ as the
underline shows. Whenever the machine tries to go to the left of ∗, it will first move
that ∗ one square to the left and then continue its operation. Thus the net effect will
be that the ∗ behaves like a ⊔ but remains on the left edge of the used portion of the
tape. Specifically, when the machine reaches the symbol ∗, say, in state p, it will
erase this ∗, go one square to the left, write the symbol ∗, and then come one square
to the right, coming back to the state p. This can be achieved by adding two extra
states, say, e and f, to the old machine and adding the transitions

δ(p, ∗) = (e, ⊔), δ(e, ⊔) = (f, L), δ(f, ⊔) = (e, ∗), δ(e, ∗) = (p, R).
A similar marker, say ⋆, can be maintained on the right end of the used portion of
the tape by adding two extra states, say, e′ and f′, to the machine, along with the
transitions

δ(p, ⋆) = (e′, ⊔), δ(e′, ⊔) = (f′, R), δ(f′, ⊔) = (e′, ⋆), δ(e′, ⋆) = (p, L).

For the final erasing sweep we add two new states m and m′ with the transitions

δ(m, ⋆) = (m, ⊔), δ(m, τ) = (m, ⊔), δ(m, ⊔) = (m, L), δ(m, ∗) = (h′, ⊔),
δ(m′, ⋆) = (m′, ⊔), δ(m′, τ) = (m′, ⊔), δ(m′, ⊔) = (m′, L), δ(m′, ∗) = (⊗′, ⊔),

for every τ ∈ Γ − {∗, ⋆, ⊔}, where h′ and ⊗′ are the accepting and rejecting halt
states of the new machine.
The new machine does everything that M does. After this, instead of halting, it
goes to the right until it finds the right-end marker ⋆. It erases ⋆ and comes left,
erasing whatever it finds on the way. It also erases ∗, and then halts in h′ provided
M would have halted in h; otherwise, it halts in ⊗′ had M halted in its rejecting
state ⊗. It is clear that the new machine accepts the same language as M and rejects
the same language as M. Thus we have shown that
Theorem 7.2. Corresponding to each standard Turing machine, there is at least one
standard Turing machine that accepts or rejects the same language leaving the tape
empty.
7.38. Consider a Turing machine that cannot write a ⊔; that is, if δ(p, a) = (q, b),
then b ≠ ⊔. Show how such a machine can simulate a standard TM.
7.39. Give a formal definition of a TM with a single tape, many control units each
with a read–write head. Show how such a machine can be simulated by a multi-
tape TM.
7.7 Nondeterministic TMs and Grammars 233
7.40. Suppose we restrict a Turing machine by requiring that it cannot write the same
symbol on the same square that it reads; that is, if δ(p, a) = (q, b), then a ≠ b. Does
this restrict the power of TMs?
7.41. Consider a TM with a different decision process, in which transitions are made
if the currently scanned symbol is not an element of a given set. For instance,
δ(p, {a, b}) = (q, R) would mean that if the currently scanned symbol is neither an a
nor a b, and the machine is in state p, then it changes state to q and goes a square to
the right. Show that such machines can be simulated by standard TMs.
7.42. Consider a variation of TMs, where transitions depend not only on the symbol
currently scanned, but also on the symbols on the squares immediately to the left
and/or right of the currently scanned square. Show that such a TM can be simulated
by a standard TM.
7.43. A multihead TM has a single tape but more than one read–write head working
on the same tape. Give a formal definition of such a machine and describe how it can
be simulated by a standard TM.
Solution. We will follow the same shorthand for writing the configurations; that is,
the configuration (s, u, σ, v) will be written as suσ̲v. Let us see what happens on the
empty input. A computation might proceed as in the following:

s⊔̲ ⊢ p⊔̲ ⊢ q⊔̲.
This computation does not accept the empty string. However, we must see all
possible computations to decide whether M accepts ε or not. Here is another
computation:

s⊔̲ ⊢ p⊔̲ ⊢ h⊔̲.

And here is one more: s⊔̲ ⊢ ⊔p⊔̲ ⊢ ⊔h⊔̲.
Exercise 7.6. Construct an NTM (not deterministic) for accepting {ε}. Also con-
struct an NTM for accepting a* ∪ a*b.
As the partial function of any Turing machine is also a relation, every Turing
machine is, by default, a nondeterministic Turing machine. The combinations of ma-
chines (when we use machine diagrams) can also be extended to nondeterministic
machines. To start with, we may take the basic machines as earlier, which are de-
terministic. (Henceforth, when both types of machines occur in a context, we will
refer to the standard Turing machines as deterministic machines (DTMs). However,
deterministic machines also include k-tape Turing machines.)
Recollect that the basic machines were the head-moving machines L and R, the
symbol-writing machines σ for every σ in the tape alphabet Γ, and the two halting
machines h and ⊗. In earlier diagrams, when we had an arrow from M₁ to M₂ labeled
τ, we would never have an arrow from M₁ to another machine M₃ labeled τ. Now
we allow that, to create nondeterministic machine combinations. Thus in a nondeter-
ministic machine combination (its diagram), we can have multiple arrows with the
same label emanating from the same machine and going to different machines.
For example, the machine diagram of Fig. 7.16 represents a nondeterministic
combination of machines: from R, two arrows are labeled a, one leading to the
erasing machine ⊔ followed by h, and the other leading to h via R_⊔. The nondeter-
ministic Turing machine M of Fig. 7.16 first goes to the right. If that symbol happens
to be an a, then it either erases it and halts, or it moves to the right until it finds a ⊔,
when it halts.
Solution. You can construct one for yourself, even a deterministic one. But before
reading further, try constructing a nondeterministic machine diagram that is not a
deterministic diagram, and then come back to compare your solution with the one
given in Fig. 7.17.

Fig. 7.17. A nondeterministic machine diagram for the language L, built from two
copies of R with arrows labeled a; b and ⊔; b leading between them and to h.
Exercise 7.7. Show why a simpler machine, M : R with a loop labeled a; b and an
exit labeled ⊔ to h, does not accept L = ((a ∪ b)* b (a ∪ b)*) ∪ {ε}. Can you make
it still simpler to accept L?
It is now clear that such machine diagrams do indeed give rise to nondeterministic
machines. We will use such diagrams for constructing nondeterministic machines
instead of writing the states and transition relations in detail. Further we will refer to
such machine diagrams as nondeterministic machine diagrams.
It is convenient to have nondeterminism. But is it just convenient, or does it increase
the power of a Turing machine? Given a nondeterministic machine M, does there
exist a deterministic machine D accepting the same language as M? Again, we will
see that the answer is in the affirmative. The idea is that D can be constructed by
looking through all possible computations of M on any given input, as at any one
particular step there are only a finite number of choices for M.
Suppose M = (Q, Σ, Γ, Δ, s, h, ⊗) is a nondeterministic Turing machine. Let C
and C′ be two configurations such that C ⊢ C′ in M. For a given C, there can be
many C′s. However, if C has the state component q and the symbol being scanned is
σ, then the number of such C′s is exactly the number of pairs of the form
((q, σ), (p, τ)) in Δ. Obviously, this number is finite and is bounded by |Q| · (|Γ| + 2),
where |A| denotes the number of elements in the set A. The "plus 2" comes from the
extra symbols L and R. Now, if r is the maximum number of transitions applicable
at any C, then r ≤ |Q| · (|Γ| + 2). This r can, of course, be found by looking
at Δ. With such an r, the machine M starts from the initial configuration and then
follows one of the possible r (or fewer) choices at every step.
If C is the current configuration of M, then it may follow one of r or fewer paths
of computation in the next step:

C ⊢ C₁, or C ⊢ C₂, or …, or C ⊢ Cᵣ.

This r here may be different from the previous r. Taking r = |Q| · (|Γ| + 2), we
have at most r choices at every step of any computation. That is, there are at
most r² possible two-step computations starting from any configuration. In general,
there are at most rᵐ possible m-step computations starting from any configuration.
The standard Turing machine D that would simulate M checks all possible
one-step computations first, and accepts a string whenever there is a one-step
computation starting from C that yields an accepting configuration (with state
component h) of M. When this fails, it goes on to check all two-step computations
for acceptance, and so on. D does not accept a string that never yields an accepting
configuration of M in these finite-step computations. If we can build such a machine
D, it will definitely accept L(M).
We take the Turing machine D as a three-tape deterministic machine. The first
tape of D is the tape of M. Initially, D copies the contents of the first tape to the
second tape. Throughout its operation, D never changes its first tape. D writes an
integer n on its third tape and then generates a string of n integers (not necessarily
distinct) from among the numbers 1, 2, …, r, separating them by ⊔'s, and keeps this
string next to n, separated from it by a ⊔. D works on the second and third tapes
towards simulating M.
For simulating the first step of a computation of M, D chooses one of the r (at
most) possible choices. It keeps its third head on the ⊔ following the first integer on
the third tape. If that integer is m, it means that out of the r choices it has chosen
the mth one. That is, if the current state–symbol pair is (q, σ), and the applicable
transitions are ((q, σ), (q₁, σ₁)), …, ((q, σ), (qᵣ, σᵣ)), then it has decided to follow
the mth transition ((q, σ), (qₘ, σₘ)). So we assume an ordering of the transitions
of M.
D begins its work by placing its third head on the first integer after n. On the sec-
ond tape it does the work by following the chosen transition, the mth applicable one
from the current configuration of M if the first integer is m. Once this computation
of one step is over, it puts its third head on the next integer. It follows the same
strategy again and continues until it halts, or until it finishes the nth integer on the
third tape. (It leaves the n at the beginning of the tape anyway.) If by this time D has
not halted, then it erases its second and third tapes, changes n to n + 1, and starts
work afresh. Notice that D simply simulates M by exploring all possible n-step
computations of M at each phase. By fixing n = 1 in the beginning, all possible
computations are eventually explored.
If M accepts an input u, say, in n steps, where n is the least such integer, then D
would enter a halted configuration in the nth phase, after n − 1 unsuccessful phases
of work. That is, u is accepted by D anyhow. Conversely, if D enters an accepting
configuration in the nth phase starting with input u, then n is the least integer such
that M accepts u in an n-step computation.
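The phase-by-phase search that D performs can be sketched in Python (our own illustration, not the book's code; here `step(config, m)` applies the mth applicable transition, returning None if there is none, and `accepting(config)` tests for state component h — both are assumed to be supplied for the particular machine M):

```python
from itertools import product

def simulates(step, accepting, start, r, max_n):
    """Sketch of D: try every choice sequence of length n = 1, 2, ...,
    max_n; return True iff some computation of M accepts within max_n steps."""
    for n in range(1, max_n + 1):              # the phases of D
        for choices in product(range(r), repeat=n):
            config = start
            for m in choices:                  # follow the chosen transitions
                config = step(config, m)
                if config is None:             # fewer than m applicable moves
                    break
                if accepting(config):          # accepting configuration found
                    return True
    return False

# Toy "machine": a counter that nondeterministically adds 1 or 2 per step.
def step(c, m):
    return c + (m + 1)                         # choice 0 adds 1, choice 1 adds 2

print(simulates(step, lambda c: c == 3, 0, 2, 4))   # True
```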
Similarly, equipping a nondeterministic machine with more than one tape makes
no difference. Thus we have proved the following result:
Exercise 7.8. Show that in the above simulation of a nondeterministic Turing ma-
chine M by a deterministic Turing machine D, if M accepts u in n steps, then D
accepts u in at most r + r² + ⋯ + rⁿ steps.
Exercise 7.9. What about the languages rejected by deterministic and nondetermin-
istic Turing machines? Does a statement similar to Theorem 7.3 hold?
We want to see that a more unified statement holds. That is, each computably
enumerable language ought to be accepted by a Turing machine. To this end, we plan
to simulate a grammatical derivation in a nondeterministic Turing machine and try
to simulate the work of any Turing machine by a grammar. This will accomplish our
job due to Theorems 7.1–7.3.
After the end of each such step, M checks whether the input string u on the first
tape is the same as the generated string w on the third tape. If yes, it stops in the
accepting halt state h; else, it enters the rejecting halt state ⊗. Notice that the ⊗ here
does not mean that the string on the first tape is rejected: because of nondeterminism,
a string is rejected only if all possible computations on the input result in the state ⊗.
It is now clear that M halts on an input u only when u is generated by the gram-
mar G. This proves the "if" part of the theorem, that every computably enumerable
language is accepted by a Turing machine, due to Theorems 7.1–7.3.
Conversely, suppose D = (Q, Σ, Γ, δ, s, h, ℏ) is a Turing machine that accepts a
language L. We want to construct a grammar G that generates L. We take G =
(N, Γ, R, S), where N = Q ∪ Γ ∪ {X, S} with two new symbols X and S. The set
of productions, R, contains the rules given in the following:
5. S → b̸hA, S → b̸ℏA ∈ R; and s → ε, A → ε ∈ R.
Our plan is to simulate the working of the machine by the derivations in the reverse
order. The extra rule in (2) takes care of the extension of the tape to the right by a
new blank; the extra rule in (4) similarly accommodates the erasing of extra blanks
on the left. The rules in (5) erase the part of the final string to the left of the input
and erase the extra symbol A. It is clear from the description above that
(p, uσv) ⊢∗D (q, xτy) iff xτqyA ⇒∗G uσpvA. (7.1)
Now, since the only rules involving S, s are as in (5), and A can give a terminal only
by the rule A → ε as in (5), we have
w ∈ L(G) iff S ⇒G b̸hA ⇒∗G b̸swA ⇒∗G wA ⇒G w.
From (7.1) above, we conclude that w ∈ L(G) iff D halts on the input w accepting it.
This completes the proof.
7.44. Construct an NTM that accepts the language {u2v : u, v ∈ {0, 1}∗, u ≠ v}.
7.45. Show that for each NTM, there exists an equivalent NTM that has a unique
accepting configuration for each accepted input. [Can you prove a theorem analogous
to Theorem 7.2?]
7.46. Construct a Turing machine to accept the language (ab)+ and then construct a
grammar equivalent to it. Derive the string abab in this grammar.
240 7 Computably Enumerable Languages
7.48. Let N be the NTM with input alphabet {a, =}, initial state s, and transitions
(s, =, p, a), (s, =, p, =), (p, =, p, =), (p, a, s, R), (p, a, h, R). Starting from the
initial configuration s=, find all possible computations of length less than five.
Explain what N eventually does starting from this initial configuration.
7.49. Suppose the tape of a TM has one nonblank symbol, say, 0, at one square only;
all other squares contain b̸. Design an NTM that starts from some square containing
a b̸, finds the 0, and then halts. Design a DTM for doing the same job.
7.50. Design NTMs (not deterministic) that decide the following languages over
{a, b}, and then construct equivalent grammars from the NTMs.
(a) ∅.
(b) {ε}.
(c) {a}.
(d) a ∗ ba ∗ b.
7.51. Prove: for each NTM M, we can construct a standard TM D such that for any
string w not containing b̸:
(a) If M accepts the string w, then D halts in its accepting state in some configuration
in which M would have halted.
(b) If M does not accept the string w, then D does not accept w.
7.59. Show that each unrestricted grammar is equivalent to one in which no terminal
symbol occurs on the left hand side of any production rule.
7.63. Show that every computation that can be done by a standard TM can be done
by a two-tape TM with at most two states (besides h and ℏ).
7.64. Show that for each TM, there exists an equivalent standard TM with at most
six states. Can you reduce the number of states further, say, to three?
7.65. Let M be a TM. Assume, without loss of generality (why?), that each compu-
tation of M is of even length. For any such computation q0 b̸w ⊢ w1 ⊢ · · · ⊢ wn,
construct the string q0 b̸w w1^R w2 w3^R · · · w(n−1)^R wn. This is called
a valid computation of M on w. Show that for every M, three CFGs G1, G2, and G3
can be constructed such that
(a) The set of all valid computations is L(G1) ∩ L(G2).
(b) The complement of the set of all valid computations is L(G3).
7.66. Consider a model of Turing machine in which each move permits the read–
write head to travel more than one square in either direction, the number of squares it
can travel being the third argument of δ. Give a precise definition of such a machine,
and then show that such a machine can be simulated by a standard Turing machine.
7.67. A nonerasing Turing machine is one that cannot replace a nonblank symbol
by a b̸. This means that if δ(p, a) = (q, b̸), then a = b̸. Show that nonerasing machines
are no less powerful than standard TMs.
7.68. A write-once Turing machine is a standard TM, where the read–write head is
allowed to replace a symbol on a cell at most once. Show that this variant of TM is
equivalent to the standard TM.
7.69. Show that single-tape TMs that are not allowed to write on the portion of the
tape containing the input accept only regular languages.
7.70. A stay-put Turing machine is a TM with a left end beyond which the read–write
head cannot move. Further, these machines do not have one-step left-movements; the L
signifies jumping back to the left-most cell. Show that this variant is not equivalent
to the standard TMs. What class of languages is accepted by these machines?
7.73. A two-stack automaton is a PDA with two stacks. Formally define a two-stack
automaton and then prove that two-stack automata are equivalent to Turing machines.
(d) What is the language of the Post System that has Σ = {a}, A = {a}, and the only
production V1 → V1 V1 ?
(e) Prove that a language is computably enumerable iff it is the language of some
Post System.
(f) In a restricted Post System, we further require that m = n, that is, the number of
occurrences of variables on the left and on the right sides of a production is the
same. Show that restricted Post Systems generate the same class of languages as
the Post Systems.
8 A Noncomputably Enumerable Language
8.1 Introduction
In Sect. 7.7, I gave a passing comment that the set of composite numbers is com-
putably enumerable. There is a minor hurdle in applying the terminology to arbitrary
sets, for example, sets of numbers, rather than to languages. That is not a big hur-
dle, for we just represent a countable set by a language and try to solve the problem
about the language. Granted that we can use the adjective “computably enumerable”
for sets, how do we proceed to show that the set of composite numbers is computably
enumerable? Its solution, as suggested there, requires a Turing machine capable of
multiplying two numbers p and q.
Now this is something unnatural to the way we were using all our automata. Why?
We have used all varieties of automata as language acceptors, which, given a string,
would signal to us whether it accepts the string or not. Here, when we say that we
want multiplication of p and q, our requirement is a bit more demanding. We want a
Turing machine not only to operate and halt on an input, but also to give us an
output upon halting.
To add to the problem, when a machine halts after giving an output, another
machine may also work on the same tape, taking the content of the tape as its in-
put tape. That is, the first machine must leave the tape in such a way that it can be
used as an initial tape for another machine. As our inputs are written as b u, where
the read/write head is placed initially on the b preceding the input, we may like
our output to be in the same form. This means that the first machine must scan the
symbol b preceding the output while halting so that the second machine will start
working.
This requirement amounts to imagining the machine as the control unit only,
whereas the tape and the read/write head can be detached from it and attached to
another machine that might work after the first had completed its job. It goes along
well with our machine combinations also. We will put it as a convention while com-
puting any function.
Computing a function f (·) means that given an input x to the machine, it must output
the value f (x) upon halting. As we have adopted our data structure as strings, we
will use them for computing functions mapping strings to strings. Shortly, we will
see how to compute arbitrary functions via representations of other types of data as
strings.
Formally, let f : Σ∗ → Σ1∗ be a partial function. We take Γ ⊇ Σ ∪ Σ1 and
think of f as a partial function from Σ∗ to Γ∗. A Turing machine M = (Q, Σ,
Γ, δ, s, h, ℏ) is said to compute the partial function f iff for every u ∈ Σ∗, when-
ever f(u) = v ∈ Σ1∗ ⊆ Γ∗, we have s b̸u ⊢∗ h b̸v in M. Moreover, if f(u) is
undefined, then M either does not halt on the input u, going on an infinite computa-
tion, or halts in the state ℏ. Notice that we exclude the case of an abrupt halt here.
A partial function f : Σ ∗ → Σ1∗ is said to be computable iff there exists a Turing
machine that computes f .
If f is a total function from Σ ∗ , then the machine M that computes it must halt
on every input from Σ ∗ . It may perhaps go on an infinite computation on inputs over
other alphabets, but never so when the input is a string over Σ.
We should first address the issue of representing other types of inputs as strings.
If the inputs are, say, numbers, then how do we go about representing them as strings?
You may do it in many ways. As a start, we use unary representation of natural
numbers. Here, 0 is represented as 0 itself, 1 is represented as 00, and so on. In
general, the natural number n is represented as 0^{n+1}. If f : N → N is a partial
function, then we represent f as f̄ : 0∗ → 0∗, where if f(m) = n, then f̄(0^{m+1}) =
0^{n+1}. For example, the square function f : N → N with f(n) = n^2 is represented as
f̄ : 0∗ → 0∗ with f̄(0^{m+1}) = 0^{m^2+1}.
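The unary convention is easy to sketch at the string level; the function names below are ad hoc, and the code only mirrors the representation just described:

```python
def to_unary(n):
    """The natural number n is represented as 0^(n+1)."""
    return "0" * (n + 1)

def from_unary(s):
    """Recover n from its unary representation 0^(n+1)."""
    return len(s) - 1

def represent(f):
    """Lift f : N -> N to the corresponding function on unary strings."""
    return lambda s: to_unary(f(from_unary(s)))

# The square function on representations: 0^(m+1) maps to 0^(m*m+1).
square = represent(lambda n: n * n)
```

For instance, `square("000")` (the representation of 2) yields the representation of 4, namely five 0's.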
What about functions from N × N to N, such as addition or multiplication of
natural numbers? Here again we follow the same procedure, but we must take care as
to how to represent an ordered pair (m, n) as an input. We just separate the numbers
m and n, or rather their representations, by a 1. The ordered pair (m, n) is represented
as 0^{m+1}10^{n+1}. Then, as an input, the initial tape appears as b̸0^{m+1}10^{n+1}. That is, the
symbol 1 represents the comma (,) in between the pair of numbers.
We assume that the tape alphabet of the machine that would compute such a func-
tion will have 1 in it and this symbol 1 is never used for any other purpose. Thus the
operation of addition of two natural numbers is represented as a (partial) function
+ : 0∗10∗ → 0∗, where if the initial configuration of the machine that computes it is
s b̸0^{m+1}10^{n+1}, then the final configuration must be h b̸0^{m+n+1}.
In general, any partial function f : N^m → N^n, called an arithmetic function, is
represented similarly. That is, the input tape of the machine that would compute f
appears initially as b̸0^{k1+1}10^{k2+1}1 · · · 10^{km+1} (with m − 1 occurrences of 1), and if
f(k1, . . . , km) = (p1, . . . , pn), then the machine halts with the tape appearing as
b̸0^{p1+1}10^{p2+1}1 · · · 10^{pn+1} (with n − 1 occurrences of 1).
In what follows, we will not distinguish between the numbers (or functions) and
their corresponding representations. We say that f is computable whenever the cor-
responding f̄ is computable. For computing a function f, we simply discuss how
to compute f̄. We will loosely use the symbol f, although what we mean is its
representation f̄.
Solution. Initially, the tape appears as b̸0^{m+1}10^{n+1}. We rewrite that 1 as a 0. The
tape then contains the string 0^{m+n+3}. We delete two 0's so that the output will be 0^{m+n+1},
as required. This is accomplished by a machine that moves right to the 1, rewrites
it as 0, erases two 0's, and halts on the blank preceding the output (machine diagram
omitted).
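The tape-level effect of this solution, though not the machine itself, is a two-step string rewrite:

```python
def add_unary(tape):
    """Tape holds 0^(m+1) 1 0^(n+1).  Rewrite the 1 as 0, giving
    0^(m+n+3); then delete two 0's, leaving the output 0^(m+n+1)."""
    assert tape.count("1") == 1      # exactly one separator
    tape = tape.replace("1", "0")    # now 0^(m+n+3)
    return tape[2:]                  # delete two 0's
```

For m = 2 and n = 3 the input is `"00010000"` and the output is six 0's, the representation of 5.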
Example 8.3. Let Σ be any alphabet. For each string u ∈ Σ ∗ , define f (u) = u R , the
reversal of the string u. This defines f to be the reversal function. Is it computable?
Solution. Our plan is to use the copying machine and then read a symbol from the
right end of the copied string, erase it, and write it on the left end of the original
string. When the copied string is over, we must have the reversal of the string on
the tape, and finally, we must come back to the blank square preceding the reversed
string. Recollect that after a string is copied, the copying machine C scans the blank
square in between the string and its copy; that is, C transforms the tape appearance
b̸u to u b̸u. The machine diagram for reversing a string is given in Fig. 8.1.
You can also construct a machine to reverse a string without using the copying
machine. Try it! You can also simplify the machine in Fig. 8.1.
Fig. 8.1. A machine for reversing a string.
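The plan can be sketched at the tape level. This is only the effect of the strategy described above, not the machine of Fig. 8.1 itself:

```python
def reverse_via_copy(u):
    """After the copying machine C the tape reads  u b u (b the blank).
    Repeatedly erase the last symbol of the copy and write it into the
    original's region, filling it from the left; when the copy is
    exhausted, that region holds the reversal of u."""
    copy = list(u)        # the copy produced by C
    region = []           # the original's squares, rewritten left to right
    while copy:
        region.append(copy.pop())   # take and erase the copy's last symbol
    return "".join(region)
```

Each symbol taken from the right end of the copy lands one square further right in the original's region, so the region ends up holding u reversed.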
Exercise 8.2. Show that multiplication of two natural numbers is a computable func-
tion. [Hint: Use repeated copying of the string 0^n exactly m times and then add one
more 0 to transform the tape from b̸0^{m+1}10^{n+1} to b̸0^{m·n+1}. You must take care of the
cases m = 0 and/or n = 0.]
Exercise 8.3. Use Exercise 8.2 to show that the set of all composite numbers is
computably enumerable (as a language over {0}). What about the set of all prime
numbers?
the contents of the third tape, that is, the enumerated strings, with the string on the
first tape. If a match is found, then M1 halts in its accepting state. You see that any
string that can be generated by M1 is eventually accepted by M1.
Let us say that a Turing machine M enumerates a language L ⊆ Σ∗ if M lists
all elements of L (and no strings outside L), one by one, as output. Such a machine
is called an enumerator. The above discussion can be summarized as follows: a
language is computably enumerable iff it is enumerated by some Turing machine.
In fact, this is the reason why computably enumerable languages are called com-
putably enumerable!
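A dovetailing enumerator can be sketched as a Python generator. Here `accepts_within(u, k)` is a hypothetical stand-in for "M accepts u within k steps"; every accepted string is eventually emitted, even though M may run forever on some inputs:

```python
from itertools import count, islice, product

def enumerate_language(alphabet, accepts_within):
    """Dovetail over (string, step bound) pairs: in phase `bound`,
    run the first `bound` strings for `bound` steps each."""
    def all_strings():
        yield ""
        for n in count(1):
            for letters in product(alphabet, repeat=n):
                yield "".join(letters)
    emitted = set()
    for bound in count(1):
        for u in islice(all_strings(), bound):
            if u not in emitted and accepts_within(u, bound):
                emitted.add(u)
                yield u

# Example: the even-length strings over {a, b}, pretending the machine
# accepts such a string u within len(u) + 1 steps.
even = enumerate_language("ab", lambda u, k: len(u) % 2 == 0 and k > len(u))
```

Taking the first four outputs of `even` gives the empty string followed by "aa", "ab", "ba".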
8.1. What function from N to N is computed by the TM with initial state s and tran-
sitions given by δ(s, b̸) = (p, L), δ(p, b̸) = (h, b̸), δ(p, 0) = (q, R), δ(q, b̸) =
(r, L), δ(r, b̸) = (h, R), δ(r, 0) = (h, R)?
8.2. Which function is computed by the TM with initial state s and transitions
δ(s, b̸) = (p, L), δ(p, a) = (p, L), δ(p, b) = (p, L), δ(p, b̸) = (q, b), δ(q, a) =
(q, R), δ(q, b) = (q, R), δ(q, b̸) = (h, b̸)?
8.3. Design Turing machines with states and transitions for doing the following
jobs:
(a) Addition of two natural numbers.
(b) Given a string of a’s, it leaves a blank to the right of the given string and then
copies the string.
(c) Given two strings of a's with a blank in between, it outputs 1 if the first string is
at least as long as the second; else, it outputs 0.
(d) Given a string w ∈ {0, 1}+ as input, it outputs w R .
8.4. Give a combined machine to compute the function f : {a, b}∗ → {a, b}∗ defined
by f (w) = ww R .
8.8. Design a combined machine which, given a string a1 · · · an an+1 · · · a2n of even
length, outputs a1 · · · an c an+1 · · · a2n, where c ∉ Σ. That is, it finds the middle of the
string.
8.13. Using the TMs for addition, subtraction, multiplication, comparison of two
natural numbers, and the copying machine, design TMs for computing the following
functions f : N → N, where
(a) f (n) = n(n + 1).
(b) f (n) = n 3 .
(c) f (n) = 2n .
(d) f (n) = n!.
(e) f(n) = n^{n!}.
8.14. Construct a TM that computes concatenation of two strings, that is, it must
compute the function f : Σ ∗ × Σ ∗ → Σ ∗ defined by f (u, v) = uv.
8.15. Describe a Turing machine that converts a given positive integer written in
decimal notation to one in binary.
8.16. Suggest a method to represent rational numbers as strings and then give a
description of a Turing machine for adding and subtracting rational numbers.
8.17. Describe Turing machines for adding and subtracting natural numbers given in
decimal notation.
8.3 TMs as Language Deciders 251
8.18. Let f : N → N be a total function. The graph of f is the set of ordered pairs
(x, f (x)). Answer the following with informal but clear constructions:
(a) How to construct a TM that accepts the graph of f from one that computes f ?
(b) How to construct a TM that computes f from one that accepts the graph of f ?
(c) If f is a partial function, then a TM that computes it need not halt on the inputs
for which f is undefined. Answer (a) and (b) for a partial function f : N → N.
8.20. Prove that a language (or a set) A is computably enumerable iff it is the range
of a computable function.
Decidable languages were also called recursive languages, though decidability and
recursiveness have different intensional meanings. Usually the terms “acceptable”
and “decidable” are used together and similarly the terms “computably enumer-
able” and “recursive.” Here, the word “recursive” has nothing to do with algorithms
that call themselves. To avoid this confusion in terminology, we will use the word
“decidable” instead.
If M is a Turing machine that computes χL, then upon input u to it, M outputs
α if u ∈ L, and M outputs ν if u ∉ L. In such a case, we say that M decides the
language L.
Compare the definitions of accepting a language and deciding one. If M accepts a
language L, then on inputs u from L, it halts (in the accepting state h), and for other
inputs not from L, the machine either enters an infinite computation, or eventually
enters a nonhalt state, or it halts in the rejecting state ℏ. When M decides L, it halts
on every input from Σ ∗ and outputs different symbols depending upon whether the
input is from L or not. Such a TM is called a decider of the language L. A decider
is necessarily a total TM, be it deterministic or nondeterministic.
Decidable languages can also be defined without using the notion of computable
functions. This is because we have two halting states, h for accepting and ℏ for rejecting
strings. If L ⊆ Σ∗ is decidable, then given an input string u ∈ Σ∗, a machine M halts
with output α or ν depending on whether u ∈ L or not. We can construct another
machine that works after M has worked, in such a way that it halts in state h starting
from α and halts in ℏ starting from ν. Look at the machine M′ in Fig. 8.2. If M
computes the characteristic function χL of L ⊆ Σ∗, then M′ clearly accepts L and
rejects the strings not in L.
Fig. 8.2. From computing a characteristic function to acceptance and rejection.
Conversely, let M be a machine that halts in state h for inputs from L and
halts in state ℏ for inputs not in L. Assume, without loss of generality, that it has
erased every symbol on the tape before halting (due to Theorem 7.2). Suppose M =
(Q, Σ, Γ, δ, s, h, ℏ) is such a TM. Take four new states h1, h2, H, H̄ and construct
the TM M′ = (Q ∪ {h1, h2, H, H̄}, Σ, Γ, δ′, s, H, H̄), where δ′ is an extension of
δ with additional transitions as
δ′(h, b̸) = (h1, R), δ′(h1, b̸) = (h1, α), δ′(h1, α) = (H, L),
δ′(ℏ, b̸) = (h2, R), δ′(h2, b̸) = (h2, ν), δ′(h2, ν) = (H̄, L).
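Both directions of this equivalence are simple wrappers at the function level. The code below is only a sketch of the two constructions, with α and ν standing for the output symbols and True/False for halting in h and ℏ:

```python
ALPHA, NU = "α", "ν"   # output symbols of a machine computing χ_L

def decider_from_chi(chi):
    """From a machine computing the characteristic function χ_L, build
    one that accepts members (halting in h, here True) and rejects
    nonmembers (halting in ℏ, here False)."""
    return lambda u: chi(u) == ALPHA

def chi_from_decider(decide):
    """The converse construction: write α or ν according to the
    halting state, then halt."""
    return lambda u: ALPHA if decide(u) else NU
```

Composing the two in either order gives back a machine with the same behavior, which is the content of the equivalence.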
Solution. L is a regular language; it should be easy to design a DTM for deciding it.
Given an input, all that we check is whether there is an occurrence of an a in it or not.
For example, let the read–write head of the TM move right if it reads a b, and halt in h
if it reads an a. Of course, this TM is also an NTM. But to see a truly nondeterministic
machine, let us ask the machine to guess a position on the string where it expects an
a. If it is an a, accept, else, reject. See Fig. 8.5 for an implementation of this idea.
Fig. 8.5. A nondeterministic decider for {a, b}∗a{a, b}∗.
The machine just takes some right movements to guess a position on the input
string, and then it checks whether the scanned symbol is an a. If it is, it accepts. If
it is not, it does not accept. But, input strings that do not contain an a are always
rejected. Hence, it decides the language L.
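A deterministic simulation of the guess simply tries every position; this one-line sketch captures the acceptance condition of the NTM, not its state diagram:

```python
def decides_contains_a(w):
    """Simulate the guessing NTM of Fig. 8.5 deterministically:
    the input is accepted iff some guessed position holds an a,
    and rejected iff every guess (every computation) fails."""
    return any(symbol == "a" for symbol in w)
```

A single failed guess rejects only that computation; the string itself is rejected only when all guesses fail, which is exactly what `any` over all positions checks.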
What does the phrase “if it is an a, accept, else reject” mean? Does it mean that
if an a is not at the guessed position, the machine will reject the string? No, because
for rejection of the string, we must ascertain that every computation on the string
leads to the rejecting state. Instead, the machine simply tries another computation. The
nondeterministic acceptance and decision process is encapsulated by: guess a value
of a parameter and verify a condition on it; if the condition is satisfied, accept, else reject.
The parameter could be a substring of the input, a symbol at some particular position
of the string, or even, any string or number not related to the input at all. The last
sentence above is heavily loaded. It is somewhat more favorable towards acceptance
than rejection. If the specified condition is satisfied, then it accepts, of course. But if
the condition is not satisfied, then the word “reject” does not mean the input string
is rejected. It really means that the choice of the parameter is rejected. The NTM
goes on guessing another value of the parameter and repeats the computation again,
setting the time spent till now to zero. The time spent on such rejections is not
accounted for. We will see later how this matters in a significant way.
Theorem 8.3. A language L over an alphabet Σ is decidable iff both L and its com-
plement L are computably enumerable.
(Machine diagrams M1 and M2 for the proof of Theorem 8.3.)
Exercise 8.6. It follows from Theorem 8.3 that “if a language L is decidable then so
is its complement.” Give a direct proof of this statement.
For deciding a language L, Theorem 8.3 gives you the freedom to design TMs for
accepting L and L separately instead of constructing one total TM that would accept
L and reject L simultaneously.
Exercise 8.7. Design machines that accept the languages {a^n : n is odd} and {a^n :
n is even}. Use these to construct a machine that decides {a^n : n is odd}.
You have seen that Turing machines accept the same languages as generated by
unrestricted grammars. It then seems reasonable to believe that we are tackling the
most general type of automata. That there cannot be any automata (or even any com-
putational device) strictly more general than Turing machines is to assert that, what-
ever ingenious computational device we may think of, all that it can do can also be
done by a Turing machine. If such a computational device can accept a language, we
can have a Turing machine to accept the same language. If such a device can compute
a function, we can have a Turing machine to compute the same function.
In fact, many such devices are already in existence, for example, λ-calculus,
μ-recursive functions, Post’s machine, unlimited register machines, Markov’s nor-
mal algorithms, while-programs, and many other grammatical models of compu-
tation such as P-Systems with suitable constraints, membrane computing, genetic
algorithms, DNA-computing, and quantum computing. All these devices have been
shown to compute the same class of functions as standard Turing machines.
This gives evidence to the belief that a total Turing machine is the correct formal-
ization of the notion of an “algorithm.” This belief, commonly known as the Church–
Turing thesis, amounts to asserting that any class of problems that cannot be solved
by total Turing machines is, in fact, algorithmically unsolvable. Nevertheless, partic-
ular problems of the same class can very well be solved by algorithms (total Turing
machines), but there cannot be a single algorithm for the whole class.
8.24. Show that the family of decidable languages is closed under Kleene star, rever-
sal, union, intersection, concatenation, and complementation. What about the family
of computably enumerable languages?
8.26. Describe how an NTM may compute a function. Give a function where it is
more natural to compute by an NTM than by a DTM.
8.27. Give a language that an NTM decides more naturally than a DTM.
8.29. Prove that the complement of a CFL is decidable. Conclude that decidable
languages need not be CFLs.
8.30. Use NTMs to show that both the classes of decidable languages and of
computably enumerable languages are closed under union, concatenation, and
Kleene star.
Theorem 8.3 does not answer the question whether decidability is any different from
acceptability. It only says that decidable languages are closed under complementa-
tion. It does not say anything similar about computably enumerable languages. It is
still possible that each computably enumerable language is decidable. It is also pos-
sible that there is a computably enumerable language that is not decidable. All that
Theorem 8.3 says is that in the latter case, the complement of such a language is not
computably enumerable.
So, our question now takes the form: “Whether there exists a computably enumer-
able language whose complement is not computably enumerable?” We ask a weaker
question first: whether there are languages at all that are not computably enumerable?
It leads to an easy attempt, the counting argument. Such an attempt might succeed
if there are more languages than machines. But then how many languages and how
many machines are there? How do we determine how many Turing machines are
there?
Each Turing machine is a seven-tuple that uses a finite number of states, a finite set
for its tape alphabet, of which a subset is the input alphabet, and a finite relation, the
transition relation (or a function). We can thus safely assume that the set of states for
each machine is drawn from a countable set
Q ∞ = {q0 , q1 , q2 , . . .}.
For different machines, we can use altogether different states by renaming the states
if necessary. The idea is that each machine has a set of states, none of the states being
used in any other machine, and each such set is a finite subset of Q∞. Notice that h
and ℏ are also states in Q∞; they, of course, differ from machine to machine.
Similarly, we can assume that the tape alphabet Γ of each machine contains b̸
and some finite number of symbols drawn from a countable set
Γ∞ = {σ0 , σ1 , σ2 , . . .}.
Naturally, we have the special symbols b̸, L, R used in every machine. (For the
machines that would decide languages, you may also require the special symbols α
and ν. But you can avoid them by using the halting states h and ℏ for accepting,
rejecting, and deciding languages.) Our plan is to represent all these symbols as
strings of 0's. We do this with the help of a map, call it ρ, which is defined as follows:
ρ(b̸) = 0, ρ(L) = 0^2, ρ(R) = 0^3, and ρ(σi) = 0^{i+4} for i = 0, 1, 2, . . . .
For the states, we take ρ(qi) = 0^{i+1}. For example, if a machine has the initial state
q0, the accepting state q2, and the rejecting state q3, then this information is written
as the string 1 0 111 0^3 11 0^4 1.
The extra 1’s in between say that q0 is the initial state, and q2 is the accepting state,
and q3 is the rejecting state. Notice the occurrence of 111; it says that the description
of the initial state is over and the description of the halting states is starting. The 11
after that tells that the accepting state is over and the rejecting state is starting.
We will represent the quadruples (p, σ, q, τ) in the same way, by separating
each of the components by a 1, that is, as 1ρ(p)1ρ(σ)1ρ(q)1ρ(τ)1. For example, the
string
1 0 1 0^5 1 0^3 1 0^2 1 stands for the quadruple (q0, σ1, q2, L).
A set of quadruples will simply be written by concatenating the strings of the above
type. For example, the string
1 0 1 0^5 1 0^3 1 0^2 11 0^2 1 0 1 0 1 0^3 1 stands for the set {(q0, σ1, q2, L), (q1, b̸, q0, R)}.
Look at the 11 (Read it as “one one,” not as “eleven.”) in the above string. It signals
the end of one transition and the beginning of the other. There will be no confusion
in knowing from such a binary number which one is a state and which one represents
a nonstate. The first and the third sequences of 0’s are states, and the second and the
fourth sequences of 0’s are symbols from { R } ∪ Γ∞ .
b , L ,
Now when we write the sequence of states and then the sequence (or set) of
quadruples, we will have again two 1’s. We will insert one more 1 as a separator.
That is, if we have in M the initial state q0, the halting states q1, q2, and the transi-
tions (q0, b̸, q0, R), (q0, σ1, q2, L), (q1, b̸, q0, R), then M will be represented as
1 0 111 0^2 11 0^3 111 0 1 0 1 0 1 0^3 11 0 1 0^5 1 0^3 1 0^2 11 0^2 1 0 1 0 1 0^3 1.
Again look at the occurrences of 1’s carefully. The first three 1’s say that the initial
state is over and the halting states are starting. The second occurrence of 111 says
that the halting states are over and the transitions are starting. We will write such a
sequence of 0’s and 1’s for representing a machine M as ψ(M). It is clear that for
each M, there is a unique binary number ψ(M). Moreover, from ψ(M), the machine
M can be constructed back easily.
Going in the reverse direction, you just scan the above binary number from left to
right. The first sequence of 0's gives you the initial state q0; then there are three 1's.
Next starts the sequence of halting states, separated by 1's. They are 0^2, 0^3, giving
the halting states as q1 and q2. Then start the transitions (after the three 1's, out of
which the last 1 is in the first transition). They are 1 0 1 0 1 0 1 0^3 1, 1 0 1 0^5 1 0^3 1 0^2 1,
and 1 0^2 1 0 1 0 1 0^3 1, which correspond to the transitions δ(q0, b̸) = (q0, R), δ(q0, σ1) =
(q2, L), and δ(q1, b̸) = (q0, R), respectively. Thus M is reconstructed.
Note that the extra symbols and states that are not at all used in the transitions will
not appear in the reconstruction of M from ψ(M). But by that we lose nothing; it
is the same machine M anyway. Thus, our implicit assumption with any machine M
here is that there are no unnecessary states and symbols mentioned in Q or Γ that
are not used in the transitions.
Formally, let M = (Q, Σ, Γ, δ, s, h, ) be a Turing machine. Scan all the
transitions in M. If a nonhalt state q ∈ Q is never used in any transition, delete
it from Q. If any symbol σ ∈ Γ is never used in any transition, then delete σ from Γ
(also from Σ). It is clear that the new machine with updated Q and updated Γ does
the same work as M. Call this updated machine the least form of M. We use only
the machines in their least form. We then find out ψ(M) as described earlier.
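The encoding of quadruples can be sketched directly, following the scheme above with ρ(q_i) = 0^{i+1} for states and ρ(b̸) = 0, ρ(L) = 0^2, ρ(R) = 0^3, ρ(σ_i) = 0^{i+4} for symbols; the names "blank", "L", "R" and the function names are, of course, ad hoc:

```python
def rho_state(i):
    """rho(q_i) = 0^(i+1)."""
    return "0" * (i + 1)

def rho_symbol(x):
    """rho(blank) = 0, rho(L) = 0^2, rho(R) = 0^3, rho(sigma_i) = 0^(i+4);
    x is 'blank', 'L', 'R', or the index i of sigma_i."""
    special = {"blank": "0", "L": "00", "R": "000"}
    return special[x] if x in special else "0" * (x + 4)

def encode_quad(p, x, q, y):
    """(q_p, x, q_q, y) -> 1 rho(q_p) 1 rho(x) 1 rho(q_q) 1 rho(y) 1."""
    return ("1" + rho_state(p) + "1" + rho_symbol(x) +
            "1" + rho_state(q) + "1" + rho_symbol(y) + "1")

def encode_quads(quads):
    """A set of quadruples is written by concatenating their encodings."""
    return "".join(encode_quad(*t) for t in quads)
```

For example, `encode_quad(0, 1, 2, "L")` produces 1 0 1 0^5 1 0^3 1 0^2 1, the string given in the text for (q0, σ1, q2, L); concatenating two encodings produces the 11 that marks the boundary between transitions.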
Further, there is a possible confusion of self-reference regarding the coding of
tape symbols in terms of 0’s and 1’s. Suppose that 0 is a symbol already used in a
machine such as one that computes an arithmetic function. Are we rewriting this 0 or
keeping it as just a 0? Is the question clear? We have assumed that our set of symbols
Γ∞ contains all possible symbols that any Turing machine might use.
Is 0 in Γ∞ or not? In general, it will be there. Suppose σm = 0 for some m. Then,
in the above scheme, ρ(0) = ρ(σm) = 0^{m+4} and not just 0. We can, of course, put
down a convention that 0 will be written as 0, but we will not do that. We will go in
conformity with our rewriting scheme, and represent 0, in this case, by 0m+4 . Similar
comments for the symbol 1 also hold. In this sense, the 0 and 1 used in the represen-
tation are really different from the 0 and 1 as symbols in any input or tape alphabet.
We now make our assumptions explicit to disambiguate possible confusions.
Assumption on Representation: All Turing machines are in least form, unless other-
wise mentioned. The 0 and 1 in Γ∞ are treated as any other symbols, whose repre-
sentations ρ (0) and ρ (1) are possibly not just 0 and 1.
With this assumption, we have thus proved that
Theorem 8.4. There exists a one–one function ψ from the set of all Turing machines
to {0, 1}∗ .
Notice that ψ(M), as constructed earlier, always has exactly two occurrences of
three consecutive 1’s; also, it begins with a 1 and ends with a 1. Thus we do not get
all strings of 0’s and 1’s in the set of all ψ(M)’s. That is, ψ is not an onto function.
Moreover, for every Turing machine M, the string ψ(M) is a binary number.
We will not distinguish between ψ(M) as a string over {0, 1} and as the number of
which it is the binary representation. That is, we take ψ(M) = n, where n has the
binary representation ψ(M). These numbers ψ(M) give an enumeration of all Turing
machines.
Let M be the set of all Turing machines. For each machine M ∈ M, we have
ψ(M), a binary number. That is, ψ : M → N is a one-to-one mapping. Hence M
is countable and each machine M ∈ M has now a unique number, ψ(M). We have
an enumeration of M. It thus makes sense to talk of the mth machine. In fact, for
all the machines, we arrange their corresponding ψ’s in ascending order. If ψ(M)
comes as the mth in this list, then M is the mth machine. (This m is not necessarily
equal to ψ(M).)
8.5 Acceptance Problem 261
Theorem 8.5. Over any alphabet Σ, there are uncountably many languages
that are not decidable; there are uncountably many languages that are not
computably enumerable; and there are uncountably many functions on Σ∗ that
are not computable. In particular, there are uncountably many arithmetic func-
tions (functions from N^m to N^n) that are not computable.
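The counting behind Theorem 8.5 can be recorded in one line: languages over Σ form the power set of the countably infinite set Σ∗, while by Theorem 8.4 the set M of all Turing machines is countable.

```latex
\[
  \bigl|\{\,L : L \subseteq \Sigma^*\,\}\bigr|
  \;=\; \bigl|2^{\Sigma^*}\bigr|
  \;=\; 2^{\aleph_0}
  \;>\; \aleph_0
  \;\ge\; |\mathcal{M}|.
\]
% Each machine accepts (decides, computes) at most one language
% (function), so only countably many languages can be computably
% enumerable or decidable; uncountably many remain.
```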
Notice that even without encoding the various parts of an arbitrary TM, we could have shown that there are only countably many TMs. We have followed this path for another reason, which will be clarified shortly.
8.31. Describe how to determine whether or not a string from {0, 1}∗ is an encoding
of a Turing machine.
8.32. Encode the TM having the transitions δ(q1 , σ1 ) = (q2 , R ), δ(q1 , σ1 ) = (q4 , σ1 ),
δ(q4 , σ1 ) = (q3 , L ), δ(q3 , σ1 ) = (q5 , σ2 ), δ(q5 , σ2 ) = (q2 , L).
8.33. Describe a Turing machine that enumerates the set {a, b}+ in dictionary order.
Find its encoding in binary.
However, Theorem 8.5 does not settle the question of whether there are computably enumerable languages that are not decidable. What we require is at least one computably enumerable language that is not decidable, or else we must show that no such language exists. To this end, our representation of any machine as a binary string will be of help. A bit of lateral thinking is of value here. We have a scheme to represent any machine as a binary string. Can we do the same for any input string?
262 8 A Noncomputably Enumerable Language
In general, an input to any Turing machine is given as a string over its input alphabet. For example, integers m and n are rewritten in unary notation as 0^{m+1} and 0^{n+1}, respectively; the ordered pair (m, n) is rewritten as 0^{m+1}10^{n+1} whenever they are used as inputs to Turing machines. The 1 used here is not from 0∗. However, the string 0^{m+1}10^{n+1} is from {0, 1}∗, enlarging our input alphabet. Thus, any machine that computes an arithmetic function has the input alphabet {0, 1}. What about other functions, not necessarily from N to N?
Any general input can be thought of as an ordered k-tuple (u1, u2, . . . , uk) of strings over an alphabet Σ ⊆ Γ∞. Each ui is represented as ρ(ui), which is a string of 0's and 1's. The 1's separate the ρ of each symbol occurring in ui, and a 1 also appears once extra at the beginning and at the end, as earlier. For example, the string u = σ0 σ1 σ2 is represented by ρ(u) = 10^4 10^5 10^6 1. Then we concatenate such strings of 0's and 1's to write the k-tuple. For example, the ordered pair (σ0 σ1 σ2, σ3 σ0) is represented as 10^4 10^5 10^6 1 10^7 10^4 1.
Name this representation of an input as ψ(u), where u is an input, a k-tuple of strings from an alphabet Σ. As earlier, ψ is a one–one map from the set of inputs to {0, 1}∗. It is so because, given a string over {0, 1}, we can first find the occurrences of double 1's and thereby separate out the components of the k-tuple input (there are k − 1 such double 1's in ψ(u), since each component's representation ends with a 1 and the next begins with a 1). Then we separate out the symbols in each component by looking at the occurrences of single 1's.
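The scheme just described can be sketched in a few lines. The symbol code below, ρ(σi) = 0^{i+4}, is read off from the example above, and the helper names are ours, not the book's.

```python
def rho_string(sym_indices):
    """Represent a string, given as a list of symbol indices, with a 1
    before, between, and after the symbol codes rho(sigma_i) = 0**(i+4)."""
    return "1" + "".join("0" * (i + 4) + "1" for i in sym_indices)

def psi_input(components):
    """Represent a k-tuple of strings by concatenating the component
    representations; component boundaries then show up as double 1's."""
    return "".join(rho_string(c) for c in components)

def decode_input(w):
    """Recover the k-tuple: split at double 1's, then read the 0-runs
    between single 1's back into symbol indices."""
    parts = w.replace("11", "1,1").split(",")
    return [[len(run) - 4 for run in part.split("1") if run] for part in parts]
```

For the pair (σ0σ1σ2, σ3σ0) this reproduces the string 10^4 10^5 10^6 1 10^7 10^4 1, and decoding recovers the pair, which is the one–one-ness argument in miniature.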
To demystify, we are aiming at representing any Turing machine and a possible input to it by a binary string. That is, we must be able to represent not only the machine M and the input u, but also the phrase "the machine M is supplied with the input u." It is fairly easy to do this. We have ψ(M), and we also have ψ(u); we just concatenate them with an extra 1 in between. That is, the phrase is represented by ψ(M)1ψ(u). Is this representation faithful? Can we reconstruct the machine M and the input u from ψ(M)1ψ(u) uniquely?
In ψ(M), there are just two occurrences of three consecutive 1's. The next such occurrence is formed by the final 1 of ψ(M), the separator 1, and the leading 1 of ψ(u). Once ψ(u) is separated out, u is reconstructed from it uniquely. Given a specific Turing machine M and a specific input u to it, we can now faithfully represent the pair (M, u) so that it can be given as an input to other machines. We first define a specific language from the pair (M, u).
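The unique-reconstruction argument can be tried out concretely. The sample ψ(M) below is fabricated only to have the stated shape (it begins and ends with a 1 and has exactly two runs of three consecutive 1's); the real ψ(M) comes from the encoding scheme of the text.

```python
def split_pair(w):
    """Split psi(M)1psi(u) at the third run of three consecutive 1's:
    psi(M) contains exactly two such runs, and the boundary run is the
    final 1 of psi(M), the separator 1, and the leading 1 of psi(u)."""
    runs = []                      # (start index, length) of maximal 1-runs
    i = 0
    while i < len(w):
        if w[i] == "1":
            j = i
            while j < len(w) and w[j] == "1":
                j += 1
            runs.append((i, j - i))
            i = j
        else:
            i += 1
    triples = [r for r in runs if r[1] >= 3]
    start = triples[2][0]          # the boundary run
    return w[:start + 1], w[start + 2:]
```

Since the components of ψ(M)1ψ(u) are recovered uniquely, the representation is faithful.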
Let M be any arbitrary Turing machine and u be an input string to M. We construct the acceptance language
L A = {ψ(M)1ψ(u) : M is a Turing machine and u is an input string accepted by M}.
Notice that the language L A does not depend upon any specific machine M or any specific input u to it. The M and u are simply used as notation (variables, indeed).
What is the meaning of decidability of L A ? If it is decidable, then we must have a Turing machine, say U, that decides it. As L A ⊆ {0, 1}∗, the input alphabet of U is {0, 1}. Thus, given any binary string w (a string of 0's and 1's), U must halt in a halting state. Moreover, if w = ψ(M)1ψ(u) for some Turing machine M and some input u to M, then U halts in the accepting state h iff M accepts u. Further, U halts in its rejecting state iff either M rejects u or M enters an infinite computation. Notice that
U decides L A means that "Given any Turing machine M and any input u to it, U decides whether M accepts u or not."
For this reason, the language L A is called the acceptance language. The corresponding decision problem, the acceptance problem, is: given any Turing machine M and any input u to M, decide whether M accepts u.
Theorem 8.6. The acceptance language L A is computably enumerable.
Proof. We design a (deterministic) Turing machine U with three tapes for accepting
L A . On the first tape is given the binary string w as an input to U. Then U copies this
input to the second tape. It first checks whether the input w is in the form ψ(M)1ψ(v)
for some machine M and some string v. If w is not in this form, then U halts in its rejecting state. That is, in this case, U does not accept w. If w is in the specified form, then U first
copies ψ(v) to the third tape, erasing it from the second. Now U simulates M (which
it gets from the second tape) on the input v, which is on the third tape in the form
ψ(v). If M accepts v, then U halts in state h. If M does not accept v, then U also
does not accept its own input as it simulates M step by step. Clearly, U accepts w iff
M accepts v, where w = ψ(M)1ψ(v) for the Turing machine M and input v to M.
Therefore, U accepts L A .
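The step-by-step simulation that U performs can be sketched in decoded form. The transition-table format and the step bound below are our simplifications: U works on the encoded strings directly and runs without any bound, which is exactly why it accepts but need not decide.

```python
def simulate(delta, accept, inp, state, max_steps=10_000):
    """Run a one-tape deterministic TM, already decoded into a table
    delta[(state, symbol)] = (new_state, written_symbol, move).
    Returns True if the accepting state is reached within max_steps;
    U itself has no bound, so it may instead run forever."""
    tape = dict(enumerate(inp))     # two-way infinite tape, 'b' as blank
    pos = 0
    for _ in range(max_steps):
        if state == accept:
            return True
        key = (state, tape.get(pos, "b"))
        if key not in delta:
            return False            # abrupt halt: input not accepted
        state, sym, move = delta[key]
        tape[pos] = sym
        pos += 1 if move == "R" else -1
    return False                    # undetermined within the bound

# A tiny machine accepting exactly the strings that begin with 0:
delta = {("s", "0"): ("h", "0", "R")}
```

As in the proof, the simulator accepts iff the simulated machine accepts; on inputs the simulated machine never accepts, U simply fails to halt in h.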
Theorem 8.7. The acceptance language L A is decidable iff every computably enu-
merable language is decidable.
Proof. The nontrivial direction is this: suppose a Turing machine U decides L A , and let L be any computably enumerable language, accepted by a Turing machine M. We construct a machine D that decides L. On input w, the machine D keeps ψ(M) on its first tape and computes ψ(w) on its third tape. (Our scheme for representing states, symbols, strings, etc. is also hard-wired into D.) It writes a 1 after the ψ(M) on the first tape and then copies ψ(w) from the third tape to the first tape, to the right of this 1. Then D erases the second and the third tapes. Now, the first tape contains the string ψ(M)1ψ(w), and the second and the third tapes are blank. Next, D comes back to the b preceding ψ(M)1ψ(w) on the first tape and simulates U on it.
From the construction of D, it is now obvious that the result of the working of D on w is the same as that of U on ψ(M)1ψ(w). That is, D halts in its accepting state if U halts in state h, and D halts in its rejecting state if U halts in its rejecting state. However, U decides L A . Thus, U always halts, in its accepting or in its rejecting state, depending upon whether M accepts w or does not. This means that D halts in its accepting state on input w if M accepts w, and D halts in its rejecting state whenever M does not accept w. But M accepts L. Therefore, D halts in its accepting state whenever w ∈ L and D halts in its rejecting state whenever w ∉ L. In other words, D decides L; L is decidable.
Consider the language L SA = {ψ(M) : M is a Turing machine that accepts the input ψ(M)}. This is called the self-acceptance language. The reason for such a name is that when L SA is decidable by a machine U S , the machine U S answers the question: "Does an arbitrary machine accept its own encoding as input?"
Can you see how? The machine U S must first determine whether its input string from {0, 1}∗ represents a machine M, and then it goes on to decide whether that M accepts the binary string ψ(M). As with the acceptance language, we have the corresponding problem for the language L SA . The self-acceptance problem is the problem of deciding
halts in its rejecting state. That is, U halts in state h if M accepts ψ(M); otherwise, it halts in its rejecting state. Accordingly, D halts in its accepting state or in its rejecting state. Therefore, D decides L SA .
Using Theorems 8.3, 8.6, 8.8, and 8.9, you can conclude that the languages L A and L SA are computably enumerable but not decidable, while their complements are not computably enumerable (Theorem 8.10).
Exercise 8.8. Show directly that the complement of L SA is not computably enumerable but L SA is computably enumerable. This will also prove Theorem 8.9.
The languages L A and L SA are computably enumerable but not decidable. Moreover, their complements are not computably enumerable. As L A is computably enumerable, there is a (one-tape) Turing machine M that accepts it. The machine M has states and a transition function. Given any input w, M starts referring to its transition function and enters a computation. Now, undecidability of L A means that we cannot have any Turing machine that determines whether M accepts an arbitrarily supplied input or not.
Whenever M accepts w, clearly this can be so determined, for example, by a machine that simulates M; even M itself does the job. But if M does not accept w, then it may halt abruptly, reject w, or go on an infinite computation. If the last case happens, that is, when M in fact goes on an infinite computation on the input w, then no Turing machine can find out that M goes on an infinite computation on w. This is the reason we had to take recourse to constructing the self-acceptance language.
Any singleton language {w} is a decidable language (why?). But there need not be any Turing machine to decide a countably infinite union of decidable languages. However, a finite union of decidable languages is decidable.
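The closure under finite union can be seen concretely: run the finitely many deciders one after another, and accept if any of them accepts. The example deciders below are ours, chosen only to be obviously total.

```python
def union_decider(deciders):
    """Given total deciders (always-halting predicates) for finitely many
    languages, return a total decider for their union: each component is
    run in turn, so the combined procedure always halts."""
    return lambda w: any(d(w) for d in deciders)

# Two hypothetical decidable languages over {0, 1}:
even_length = lambda w: len(w) % 2 == 0
all_zeros = lambda w: set(w) <= {"0"}

d = union_decider([even_length, all_zeros])
```

The same trick fails for a countably infinite union: running infinitely many deciders in turn is no longer a halting procedure.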
One final comment about Theorem 8.10. This theorem is often referred to as telling that the acceptance problem of Turing machines is unsolvable. Here, the terminology is that a problem is considered solvable iff the corresponding language is decidable. Unsolvability of the acceptance problem means that
It is not the case that there exists a Turing machine U such that given any Turing
machine M and any string w as an input to M, the machine U would be able to
decide whether M accepts w or not.
It does not say that “For a given Turing machine M, and a given string w, there can-
not exist a machine that would determine whether M accepts w or not.” The latter is
very much possible for certain Turing machines, especially when a machine decides
a language. However, for the particular machine U that accepts the acceptance lan-
guage L A , this stronger statement also holds. That is, there cannot be any machine to
decide whether U accepts a given input or not. The same holds for any machine that
accepts an undecidable language. Why?
Recall that at the end of Sect. 8.4, I asked whether corresponding to each TM, there exists a total TM accepting the same language. Does Theorem 8.10 answer it?
8.36. Determine which of the following sets are computably enumerable and which
are not computably enumerable:
(a) {ψ(M)1ψ(N) : M takes fewer steps than N on input ε}.
(b) {ψ(M) : M takes fewer than 2009^2009 steps on some input}.
(c) {ψ(M) : M takes fewer than 2009^2009 steps on at least 2009^2009 different inputs}.
(d) {ψ(M) : M takes fewer than 2009^2009 steps on all inputs}.
(e) {ψ(M) : M accepts at least 2009 strings}.
(f) {ψ(M) : M accepts at most 2009 strings}.
(g) Complement of {ψ(M) : M accepts at least 2009 strings}.
(h) Complement of {ψ(M) : M accepts at most 2009 strings}.
(i) {ψ(M) : L(M) contains at least 2009 strings}.
(j) {ψ(M) : L(M) contains at most 2009 strings}.
(k) {ψ(M) : M halts on all inputs of length less than 2009}.
(l) Complement of {ψ(M) : M halts on all inputs of length less than 2009}.
8.6 Chomsky Hierarchy 267
8.37. Like a universal TM, a universal finite automaton would accept all strings
ψ(M)1ψ(w), where M is a finite automaton accepting the input w. Explain why
a universal finite automaton cannot exist.
8.38. Show that computably enumerable languages are closed under homomorphism
but decidable languages are not.
We have tried extensions of Turing machines by equipping them with more tapes and with the power of nondeterministic choice. However, we saw that no more power is added by doing so. Now, we want to see how a nondeterministic machine can be restricted (not extended) in a useful manner.
The restriction is on the use of the tape, which is potentially infinite. Instead of using any finite portion of the tape, the restriction says that only a predefined portion of the tape is to be used throughout a computation. The input alphabet of such a machine includes two special symbols, the left end-marker ∗ and a right end-marker. The machine never moves to the left of ∗, it never moves to the right of the right end-marker, and it never rewrites these two special symbols. Read ∗ as lem and the right end-marker as rem.
A linear bounded automaton (LBA) is an eight-tuple B = (Q, Σ, Γ, Δ, s, k, h, ·), where L, R, ∗, and the right end-marker are in Γ, k is a given positive integer, the last component is the rejecting halt state, and all else is as for a nondeterministic Turing machine.
The symbols ∗ and the right end-marker never occur in any transition, so that when the read–write head encounters these symbols, the machine halts abruptly. The integer k fixes the portion of the tape that may be used by the machine throughout any computation.
Specifically, the portion is fixed as kn + 4, where n is the length of the input string.
Thus, depending upon the input string, a linear-size portion of the tape is fixed as its space resource. In the beginning, this resource bound is provided in the form of the input.
An input u will always appear as ∗ b u b^{n(k−1)} b followed by the right end-marker, where n is the length of the string u. The machine has to operate without going beyond the portion of the tape between the two end-markers. However, we will refer to this appearance on the tape by simply saying that "u is the input to the machine."
A configuration is written as earlier, but with the additional ∗ and the right end-marker on the tape. The relation of yield among configurations is defined in the usual manner. B accepts a string u iff there is a computation in which (s, ∗ b u b^{n(k−1)} b) yields (h, v), for some v ∈ Γ ∗. The language accepted by B is L(B) = {u ∈ Σ ∗ : B accepts u}.
Exercise 8.11. Show that there are only a finite number of possible distinct configu-
rations of a linear bounded automaton with an input of length n. Use it to prove that
each context-sensitive language is decidable.
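The idea behind Exercise 8.11 can be sketched for the deterministic case. With the tape portion fixed, the configurations (state, head position, tape contents) are finitely many, so acceptance is decided by stopping as soon as a configuration repeats. The transition-table format below is our simplification.

```python
def bounded_accepts(delta, accept, tape_list, state):
    """Decide acceptance for a deterministic machine confined to a fixed
    tape portion: the configuration space is finite, so a repeated
    configuration proves an infinite loop, and the loop below always halts."""
    tape, pos, seen = list(tape_list), 0, set()
    while True:
        if state == accept:
            return True
        if not (0 <= pos < len(tape)):
            return False              # fell off the marked portion: halt
        config = (state, pos, tuple(tape))
        if config in seen:
            return False              # loop detected: input not accepted
        seen.add(config)
        key = (state, tape[pos])
        if key not in delta:
            return False              # abrupt halt
        state, sym, move = delta[key]
        tape[pos] = sym
        pos += 1 if move == "R" else -1
```

For a nondeterministic LBA one explores all reachable configurations in the same finite space, which is the substance of the exercise.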
Moreover, the containment relations between these language classes, as stated earlier, are proper. That means each of the classes of regular, context-free, context-sensitive, decidable, and computably enumerable languages is properly contained in the next. Further, regular languages are those that are accepted by finite automata; context-free languages are those that are accepted by pushdown automata; context-sensitive languages are those that are accepted by linear bounded automata; and computably enumerable languages are those that are accepted by Turing machines.
Because of this hierarchy, computably enumerable languages are named Type-0 languages, context-sensitive languages Type-1 languages, context-free languages Type-2 languages, and regular languages Type-3 languages. There are languages of each lower-numbered type that are not of any higher-numbered type, and each language of a higher-numbered type is, by default, also of every lower-numbered type. This is referred to as the Chomsky hierarchy, and the results stated above constitute the extended Chomsky hierarchy.
8.44. Prove the Pumping Lemma for Linear Languages: Let L be an infinite linear language. Then there exists a positive integer n such that any w ∈ L with ℓ(w) ≥ n can be rewritten as w = uvxyz such that ℓ(vy) ≥ 1, ℓ(uvyz) ≤ n, and u v^i x y^i z ∈ L for each i ∈ N.
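The lemma (though not its proof) can be checked on a familiar linear language. The decomposition below is ours, for L = {a^n b^n : n ≥ 1}; note that both pumped pieces sit near the two ends of the string, which is the hallmark of the linear case.

```python
def in_anbn(w):
    """Membership in the linear language {a^n b^n : n >= 1}."""
    k = w.count("a")
    return k >= 1 and w == "a" * k + "b" * k

# With n = 5, take u = 'a', v = 'a', x = a^{n-2} b^{n-2}, y = 'b', z = 'b';
# then len(v + y) = 2 >= 1, len(u + v + y + z) = 4 <= n, and pumping
# both v and y in step keeps the string inside the language:
n = 5
u, v, x, y, z = "a", "a", "a" * (n - 2) + "b" * (n - 2), "b", "b"
results = [in_anbn(u + v * i + x + y * i + z) for i in range(6)]
```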
The versatility of Turing machines has been demonstrated by using them as computers of numerical functions, and then as language deciders. Though Turing machines are believed to be the most powerful automata, a simple counting argument shows that there are more languages that are not recognized by them than languages that are. Our quest has been to construct one such language. This has been done by using the diagonalization technique of Georg Cantor via an encoding of TMs as binary strings. This also demonstrated the usefulness of our chosen data structure, strings. We have shown that the acceptance problem of TMs leads to a noncomputably enumerable language, and, in turn, to a problem that cannot be solved by TMs. We then discussed the hierarchy of languages, called the extended Chomsky hierarchy.
The diagonalization technique was invented by Cantor [12] to prove that there are fewer algebraic real numbers than real numbers. A. M. Turing [131] used it to show that the halting problem for TMs (given an arbitrary TM and an arbitrary string, decide whether the TM halts on that string as input) is unsolvable. We have used his technique in showing that the language L A is not decidable; and we will see, in the next chapter, that the language of the halting problem is undecidable.
Church's thesis was first formulated in [19], observing that the μ-recursive functions of Gödel and Herbrand [44] and the λ-calculus of Kleene and Church define the same class of functions; this was done unaware of Turing's work [131], where Turing's thesis appeared. For a distinction between Church's thesis and Turing's thesis, see [124]. Turing showed that his machines compute the same class of functions as defined by the λ-calculus; moreover, his machines present a more compelling definition of the notion of computability. A good reference on λ-calculus is [9]. Other systems mentioned in the text that have been shown to be equivalent to the Turing machine model of computation are Post systems [103, 104], combinatory logic [24], Markov's normal algorithms [82], the unlimited register machines [117], and the while-programs.
8.7 Summary and Additional Problems 273
8.62. Let f : Σ0∗ → Σ1∗ be a total onto computable function. Show that there is a total computable function g : Σ1∗ → Σ0∗ such that f(g(w)) = w, for each w ∈ Σ1∗.
8.67. When grammars are used to compute functions, the order in which rules are
applied is indeterminate. This indeterminacy is avoided by a Markov system due to
8.68. Prove that left-linear grammars generate all and only the regular languages.
8.69. Suggest a method to obtain a left-linear grammar from an NFA directly. Use
this to construct a left-linear grammar that generates the language (aab∗ab∗ )∗ .
8.70. Let L be a linear language not containing ε. Show that there is a linear grammar
having productions in the forms A → a, A → a B, or A → Ba that generates L.
[Such a grammar is a linear grammar without ε-productions and without unit productions.]
8.71. You can say that a CFG is a linear grammar if no production in it has more than one nonterminal on the right hand side. A PDA is similarly called single-turn whenever, once its stack decreases in height, the stack never increases thereafter. Formalize such a PDA. Show that a language is linear iff it is accepted by a single-turn PDA.
8.73. Show that the family of linear languages is closed under union and reversal, but
neither under concatenation nor under intersection. What about complementation?
8.75. Show that the family of linear languages is closed under homomorphism.
8.76. Prove that every infinite regular language has a subset that is not computably
enumerable.
8.77. Does there exist a TM that accepts an infinite language, which does not contain
an infinite regular language?
8.78. A directed graph is an ordered pair (V, E), where V is a nonempty finite
set, called the set of vertices, and E ⊆ V × V , called the set of edges. A path
in a directed graph is a sequence of distinct vertices v1 , v2 , . . . , vn such that there
is an edge from each vi to vi+1 . Give an encoding of vertices, edges, and finally,
a directed graph as bits, that is, if x is a vertex, e is an edge, G is a directed
graph, then ρ (x), ρ (e), ρ (G) ∈ {0, 1}∗ . Design an NTM that accepts the language
{ ρ(G)2 ρ(u)2 ρ(v) : there is a path from u to v in G}.
8.79. In an undirected graph, both the edges (u, v) and (v, u) are considered the same edge. In fact, the ordered pair (u, v) is replaced by the two-element set {u, v} of vertices. In an undirected graph, a clique is a subset of vertices such that each vertex in this subset is connected to each other vertex in the subset by an edge. The size of a clique is the number of vertices in it. Show that the language { ρ(G)2ρ(k) : G has a clique of size k} is computably enumerable.
8.80. A Hamiltonian path in an undirected graph is a path that contains each vertex
of the graph exactly once. Show that the language { ρ (G) : G contains a Hamiltonian
path} is computably enumerable.
8.81. Describe NTMs for accepting the following languages. Take advantage of the available nondeterminism by preferring many branches of computation, each branch being short.
(a) {uvwvx : ℓ(v) = 99, u, v, w, x ∈ {a, b}∗ }.
(b) {w1 2w2 2 · · · 2wn : n ≥ 1, each wi ∈ {0, 1}∗ , and for at least one j, w j is the
binary representation of j }.
(c) {w1 2w2 2 · · · 2wn : n ≥ 1, each wi ∈ {0, 1}∗ , and for at least two indices j, k, w j
is the binary representation of j and wk is the binary representation of k}.
8.83. Prove that every infinite computably enumerable language has an infinite subset
which is decidable.
8.84. There is a variation of Turing machines where a machine, when it writes a symbol, also moves, either to the left or to the right. A typical transition like δ(p, a) = (q, b, L) means that if the machine is in state p scanning a, then in one move it goes to state q, replaces that a by b, and positions its read–write head on the square to the left of the current one. Similarly, the transitions in an NTM are taken as quintuples like (p, a, q, b, L) instead of quadruples.
(a) What is L(M) if M = ({p, q, r, s}, {0, 1}, {0, 1, 2, 3, b}, δ, p, h, ·) with δ(p, b) = (p, b, R), δ(p, 0) = (q, 2, R), δ(p, 3) = (s, 3, R), δ(q, 0) = (q, 0, R), δ(q, 1) = (r, 3, L), δ(q, 3) = (q, 3, R), δ(r, 0) = (r, 0, L), δ(r, 2) = (p, 2, R), δ(r, 3) = (r, 3, L), δ(s, 3) = (s, 3, R), δ(s, b) = (h, b, R)?
(b) Suppose we represent numbers in binary. Then which numerical function is computed by M = ({p, q, r, s, t, u}, {0, 1}, {0, 1, b}, δ, p, h, ·), where δ(p, b) = (p, b, R), δ(p, 0) = (q, b, R), δ(p, 1) = (u, b, R), δ(q, 0) = (q, 0, R), δ(q, 1) = (r, 1, R), δ(r, 0) = (s, 1, L), δ(r, 1) = (r, 1, R), δ(r, b) = (t, b, L), δ(s, 0) = (s, 0, L), δ(s, 1) = (s, 1, L), δ(s, b) = (p, b, R), δ(t, 0) = (t, 0, L), δ(t, 1) = (t, b, L), δ(t, b) = (h, 0, R), δ(u, 0) = (u, b, R), δ(u, 1) = (u, b, R), δ(u, b) = (h, b, R)?
(c) What is L(M) if M = ({p, q, r}, {0, 1}, {0, 1, b}, Δ, p, h, ·) is the NTM with transitions (p, b, p, b, R), (p, 0, p, 1, R), (p, 0, q, 1, R), (q, 1, r, 0, L), (r, 1, p, 1, R), and (q, b, h, b, R)?
(d) Show that these machines accept the same class of languages as the standard
machines. Further, show that these machines compute the same functions as the
standard ones.
8.85. In another variation, a Turing machine has a tape with a left end but no right end, unlike ours, which extends to infinity at both ends. During a computation of such a machine, if the read–write head attempts to move off the left end of the tape, then the machine halts abruptly. Show that these machines compute the same class of functions as ours.
8.86. An off-line Turing machine is a DTM having two tapes, one of which is the input tape; the other is both a working tape and the output tape. From the input tape, the first head only reads symbols, while everything else is done by the second read–write head on the second tape. Show that off-line machines are equivalent to standard Turing machines.
8.87. Consider an off-line TM with two tapes, in which the input can be read only
once, moving left to right, and not rewritten. On its work tape, it can use at most n
extra squares for work space, where n is fixed for the machine, not depending upon
inputs. Show that such a machine is equivalent to a finite automaton.
8.88. Give a direct proof of the fact that each language accepted by a PDA is also
accepted by some linear bounded automaton.
8.91. In what way are the class of deterministic context-sensitive languages and the class of context-free languages related?
8.92. We define a certain class of k-ary numerical functions, the functions from N^k to N, for various values of k ∈ N. Notice that for k = 0, the 0-ary functions are simply natural numbers, as such a function has nothing to depend upon. First, we define the so-called basic functions. These are the following:
1. The k-ary zero function is defined by zrk(n1, . . . , nk) = 0 for all n1, . . . , nk ∈ N.
2. The j th k-ary identity function is defined by idk,j(n1, . . . , nk) = nj, for all n1, . . . , nk ∈ N and for all j with 0 < j ≤ k. These functions are also called projection functions.
3. The successor function is the same as earlier, succ(n) = n + 1.
New functions are built from the basic functions by applying the following two
operations:
(i) Let g : N^t → N be a t-ary function, and let h1, h2, . . . , ht be m-ary functions. The composition of g with h1, . . . , ht is the m-ary function f defined by f(n1, . . . , nm) = g(h1(n1, . . . , nm), . . . , ht(n1, . . . , nm)).
(ii) Let g be a k-ary function and h be a (k + 2)-ary function. The function defined recursively by g and h is the (k + 1)-ary function f given by:
f(n1, . . . , nk, 0) = g(n1, . . . , nk),
f(n1, . . . , nk, m + 1) = h(n1, . . . , nk, m, f(n1, . . . , nk, m)),
for all n1, . . . , nk, m ∈ N.
The primitive recursive functions are functions from Nk to N obtained from the
basic functions by any (finite) number of successive applications of composition
and recursive definition. Show that the following functions are primitive recursive:
(a) p2(n) = n + 2.
(b) plus(m, n) = m + n.
(c) mult(m, n) = mn.
(d) exp(m, n) = m^n.
(e) Predecessor function: pred(0) = 0 and pred(n + 1) = n.
(f) Monus function: m ∼ n = max(m − n, 0).
(g) Constant functions: f(n1, . . . , nk) = n, for a fixed n ∈ N.
(h) Sign function: sgn(n) = 0 for n = 0, and sgn(n) = 1 for n > 0.
(i) All polynomials in one or many variables.
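The two operations of composition and recursive definition can be transcribed directly. The combinator names below are ours, and Python integers stand in for N; the point is only that plus and mult arise from the basic functions by the stated schemes.

```python
def compose(g, *hs):
    """Composition: f(n1..nm) = g(h1(n1..nm), ..., ht(n1..nm))."""
    return lambda *ns: g(*(h(*ns) for h in hs))

def prim_rec(g, h):
    """Recursive definition:
    f(n1..nk, 0) = g(n1..nk);
    f(n1..nk, m+1) = h(n1..nk, m, f(n1..nk, m))."""
    def f(*args):
        *ns, m = args
        acc = g(*ns)
        for i in range(m):
            acc = h(*ns, i, acc)
        return acc
    return f

succ = lambda n: n + 1
id11 = lambda n: n              # the identity id_{1,1}

# (b) plus, by recursion on the second argument:
plus = prim_rec(id11, lambda n, i, acc: succ(acc))
# (c) mult, using plus:
mult = prim_rec(lambda n: 0, lambda n, i, acc: plus(n, acc))
```

Each further item of the exercise is obtained the same way, stacking finitely many applications of the two combinators on the basic functions.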
8.93. A primitive recursive predicate is a primitive recursive function that takes only the values 0 or 1. When its value is 1, we say that the predicate holds; and when its value is 0, we say that the predicate does not hold. For instance, the relation m > n is captured by a primitive recursive predicate, which is evaluated to 1 when m > n; else, it is evaluated to 0. Similarly, equal(m, n) is a primitive recursive predicate, which compares m and n for equality. This predicate holds, that is, it is evaluated to 1, when m = n; else, equal(m, n) is evaluated to 0, in which case m ≠ n.
Show that if g and h are primitive recursive functions of arity k, and p is a primitive recursive predicate with the same arity k, then the function f : N^k → N defined by cases as
f(n1, . . . , nk) = g(n1, . . . , nk) if p(n1, . . . , nk) holds, and
f(n1, . . . , nk) = h(n1, . . . , nk) if p(n1, . . . , nk) does not hold,
is also a primitive recursive function.
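The claim has a one-line witness once the monus of Exercise 8.92(f) is available: f = p·g + (1 ∼ p)·h. Below is a sketch with ordinary Python arithmetic standing in for the primitive recursive operations; the names are ours.

```python
monus = lambda m, n: max(m - n, 0)          # m ~ n, Exercise 8.92(f)

def by_cases(g, h, p):
    """f = p*g + (1 ~ p)*h: the 0/1 predicate p switches between the
    branches, so f equals g where p holds and h elsewhere."""
    return lambda *ns: p(*ns) * g(*ns) + monus(1, p(*ns)) * h(*ns)

greater = lambda m, n: 1 if m > n else 0    # a stand-in 0/1 predicate
maximum = by_cases(lambda m, n: m, lambda m, n: n, greater)
```

Since sums, products, and monus are primitive recursive, so is the combination, which is the whole proof.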
8.96. Can you find a numerical function, other than Ackermann’s function, which is
a computable total function but not primitive recursive? Does there exist a numerical
partial function, which is computable but not primitive recursive?
8.98. Show that a numerical function is μ-recursive iff it is computable (by a TM).
8.99. Prove that no computably enumerable set of total μ-recursive functions can
contain all total computable functions.
In (3) and (4), the relation < can be replaced by any one of >, ≤, ≥, =, or ≠. Programs built inductively from these constructs without using (5) are called for-programs, and those built from all of these constructs are called while-programs.
The meanings are as usual in programming languages. For instance, on entering the for-loop for y do p od, the current value of the variable y is determined, and the program p is executed that many times. Similarly, upon entering the while-loop while x < y do p od, the condition x < y is tested with the current values of x and y. If the condition x < y is false, then the loop is considered to have been executed. On the other hand, if the condition x < y is satisfied, then p is executed once, and then the complete loop is executed again. We say that a program computes a numerical function f if, given an input m, it prints f(m).
Let g be a numerical function. Prove that
(a) g is primitive recursive iff there is a for-program to compute it.
(b) g is μ-recursive iff there is a while-program to compute it.
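The two loop disciplines can be contrasted in a pair of toy programs (the function names are ours): a for-loop fixes its number of iterations on entry, matching primitive recursion, while a while-loop keeps retesting its condition, which is what admits unbounded, μ-style search.

```python
def plus_for(m, n):
    """A for-program: 'x := m; for y do x := x + 1 od' with y = n.
    The iteration count n is fixed when the loop is entered."""
    x = m
    for _ in range(n):
        x = x + 1
    return x

def least_power_while(m):
    """A while-program doing unbounded search:
    'y := 0; while 2^y <= m do y := y + 1 od'
    returns the least y with 2^y > m."""
    y = 0
    while 2 ** y <= m:
        y = y + 1
    return y
```

Every for-program halts on all inputs, whereas a while-program in general need not, mirroring the gap between primitive recursive and μ-recursive functions.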
as 0. If the URM with a suitable program halts with f(m) in its first register and 0 elsewhere, we say that it computes f. Similarly, numerical functions g : N^m → N^n can be computed by URMs.
Prove that a function f : N → N is μ-recursive iff it is computed by a URM with
a suitable program. [Note: The so called random access Turing machines are a kind
of URM.]
8.102. Write a program in C that simulates a Turing machine. Such a simulator program should accept the initial configuration and the description of a Turing machine, and give as output the output of the machine.
9.1 Introduction
You have come a long way, starting from regular grammars to unrestricted grammars and from deterministic finite automata to Turing machines. You have seen how versatile Turing machines are. They seem to be the most general kind of computing devices, in the sense that all known algorithms can be realized as Turing machines. Citing various approaches to computing models such as unrestricted grammars, λ-calculus, μ-recursive functions, Post's machines, Markov's normal algorithms, unlimited register machines, and while-programs, I asked you to believe in the fact that all these models characterize the same set of languages. The search is for a correct formalization of the informal notion of an algorithm.
Informally, an algorithm is a step-by-step procedure that uses a finite amount of space and time to do a piece of work, where each step is unambiguous (effectiveness). Any formalization of this notion must also have the capability to express any effective procedure in a step-by-step manner, while being such a procedure itself. That is, any object that is formally defined to be an algorithm must be an algorithm informally. This means that the formally defined entities that are claimed to be algorithms constitute a subset of all informal algorithms.
On the other hand, any object that has been accepted as an algorithm informally
has been shown to be an algorithm formally, using any of the known formalizations.
Surprisingly, all the formalizations created so far agree, in the sense that all of them
have been found to describe the same subset of the set of all informal algorithms.
Thus we come to the conclusion that probably we have captured the notion of an algorithm; the subset that any or all of these formal devices define is possibly the whole set of informal algorithms. This is the content of the Church–Turing thesis. It says that an algorithm is nothing but a total Turing machine. As with any scientific theory, this thesis cannot be proved; it can only be verified, and possibly falsified.
If in the future someone defines another formalization of the notion of an algorithm, which does all the work that total Turing machines do and some work that a total Turing machine cannot do, then the Church–Turing thesis will be falsified. This is of course a remote possibility, but a possibility nonetheless. As to the verification part,
A. Singh, Elements of Computation Theory, Texts in Computer Science, 281
© Springer-Verlag London Limited 2009
282 9 Algorithmic Solvability
you have seen in the last chapter how odd jobs such as copying, reversing, and shifting of strings could be done by designing simple Turing machines (diagrams).
The purpose of working with machine diagrams was twofold. The first was to convince you that most of the usual tasks for which we may think of an easy step-by-step procedure can be done by total Turing machines. The second goal was to see that machine diagrams are nothing but flow-chart-like translations of procedural English, which is commonly used to express algorithms informally.
Turing machines, when written down with all their details of states and transitions, are comparable to the machine language of a modern computer. The machine diagrams built from the basic macros, such as the symbol-writing machines, the head-moving machines, and the halting machines, are analogous to macros in an assembly language. The informal procedural English used for writing an algorithm is comparable with a high-level language such as C or Java. Once you realize this point, we will express our procedures or Turing machines in simple procedural English. In each case of an algorithm written this way, you must convince yourself that a machine diagram can be constructed, so that we are, in fact, talking of a Turing machine.
It is not the case that there exists a Turing machine D such that given any Turing
machine M and any string u as an input to M, the machine D would be able to
decide whether M accepts u or not.
In the rest of this chapter, you will work through many such problems from lan-
guage theory and will come to know many interesting problems from other branches
of mathematics with regard to their algorithmic solvability. The main technique for
proving results on solvability has already been illustrated in showing that the acceptance
problem is unsolvable.
First, we have formulated a suitable language corresponding to the problem: the
acceptance language L_A for the acceptance problem. Second, we have constructed
another language L_SA, the self-acceptance language. Specifically, we have shown
that if L_A is decidable, then so is L_SA. That is, we have reduced L_SA to L_A. Third,
we have used Cantor's diagonalization technique to show that L_SA is not decidable.
Notice the direction of reduction. We say that L_SA has been reduced to L_A, not
the other way around. It is because if we have an algorithm to decide L_A, then we can
construct an algorithm to decide L_SA. That is, if we can solve the acceptance problem,
then we can solve the self-acceptance problem, but not conversely. In this section, we
concentrate on the second step, problem reduction. Once we have an unsolvable problem,
we can reduce it to another problem, thereby showing that the latter is also unsolvable.
Suppose B and C are two problems. First we construct the corresponding lan-
guages L_B and L_C over some appropriate alphabets Σ_B and Σ_C, respectively. To
reduce B to C means that when we have an algorithm for solving C, we can
somehow use this algorithm to solve B. As languages, L_B is to be transformed to
L_C. As a decision procedure for L_C is expected to work for L_B via this transforma-
tion, the transformation of L_B to L_C must itself be captured by an algorithm.
That is, we should have a map from Σ_B^* to Σ_C^* that takes L_B ⊆ Σ_B^* to exactly
L_C ⊆ Σ_C^*. Consequently, Σ_B^* − L_B is taken to Σ_C^* − L_C by the transformation. More-
over, as the languages L_B and L_C are suitable representations of the problems B and
C, respectively, we can have a uniform alphabet for both of them. Thus, without loss
of generality, we assume that Σ_B = Σ_C. Problem reduction via a transformation or a
map such as this is called a map reduction or a mapping reduction, and is formally
defined as follows.
Let L and L′ be languages over an alphabet Σ. A total function f : Σ* → Σ*
is called a map reduction of L to L′ if f is computable and, for each
w ∈ Σ*, we have w ∈ L iff f(w) ∈ L′. In such a case, we say that L is reduced to L′
by the map f, and write it as L ≤_m^f L′. When the map f is not so important to be
mentioned, we will write L ≤_m L′ and read it as "L is map-reduced to L′."
The subscript m in L ≤_m L′ indicates that this reduction is via a map, which is, in
general, many-one. The goal of such a reduction is to be able to decide L by using a
decision procedure for L′.
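As a toy illustration of the definition (our own example, not from the text), here is a sketch in Python: L is the set of strings of even length, L′ is the set of binary numerals ending in 0, and f is a trivially computable total map with w ∈ L iff f(w) ∈ L′; a decider for L′ then decides L through f.

```python
# A toy map reduction L <=_m L', sketched in Python.
# The languages and the map f are invented for this illustration.

def in_L_prime(x: str) -> bool:
    # L' = binary strings ending in '0' (even numbers)
    return x.endswith("0")

def f(w: str) -> str:
    # computable total map chosen so that: w in L  iff  f(w) in L'
    return "10" if len(w) % 2 == 0 else "11"

def in_L(w: str) -> bool:
    # L = strings of even length, decided *via* the reduction:
    # run the decider for L' on f(w)
    return in_L_prime(f(w))
```

The point of the sketch is only the shape of the argument: `in_L` contains no test of its own; it merely composes the computable map with the decider for L′.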
Exercise 9.1. Let f_1 : Σ_1^* → Σ_2^*, f_2 : Σ_2^* → Σ_3^* be computable functions. Show
that f_2 ◦ f_1 : Σ_1^* → Σ_3^* is computable.
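In miniature, and glossing over all Turing-machine detail, the composition in Exercise 9.1 looks like this in Python (the two sample functions are our own):

```python
# Composition of two total computable string functions, simulated by
# composing two always-terminating Python functions.

def compose(f2, f1):
    # returns the function w |-> f2(f1(w))
    return lambda w: f2(f1(w))

f1 = lambda w: w + w      # doubling a string: clearly computable
f2 = lambda w: w[::-1]    # string reversal: clearly computable
```

The Turing-machine proof follows the same shape: run the machine for f_1 to completion, then run the machine for f_2 on its output.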
To see the above proof more directly, let M′ be a total TM that decides L′. Con-
struct a TM M as follows:
Proof. The same proof as that of Theorem 9.1, using ξ_L instead of χ_L, would do.
However, you can have an alternative proof, as follows:
Let M′ be a TM that accepts L′ and let M′′ be a TM that computes f, where
L is reduced to L′ by the map f. Let M be the combined machine M′′ → M′. It
takes input w from Σ*, computes f(w) by simulating M′′, and then accepts f(w)
by simulating M′. As f(w) is accepted only when f(w) ∈ L′, and that happens only
when w ∈ L, we see that M accepts w exactly when w ∈ L. Therefore, M accepts L.
By combining Theorems 9.1 and 9.2 and using contraposition, we have the follow-
ing result, which justifies the implicit direction in the symbol ≤_m of a map reduction:
To see an application of map reduction, we take up the halting problem. The prob-
lem is: "Does there exist an algorithm to decide whether an arbitrary TM halts on
a given input?" Similarly, the self-halting problem is posed as: "Does there exist
an algorithm to decide whether an arbitrary TM halts on itself?" These problems dif-
fer from the acceptance problem and the self-acceptance problem by asking a question
about halting in either of the halting states rather than only in the accepting state h. The
suitable languages are
9.2 Problem Reduction 285
Similarly, M_g is constructed by taking ψ(M) as its input and giving ψ(M′) as its
output. As the complements of L_A and L_SA are not computably enumerable, Theorem 9.3
proves the following:
Therefore, both the halting problem and the self-halting problem are (algorithmi-
cally) unsolvable problems. You will meet many more unsolvable problems in the
rest of this chapter. The proofs that they are unsolvable use reduction in some form
or other. However, map reduction is not the most general type of reduction. Intu-
itively, if a problem B is reduced to another problem C, then an algorithm that might
solve C can be used to solve B. Map reduction is not capable of capturing this in its full
generality. For example, an algorithm for deciding L_A can be used to decide the
complement L̄_A by just interchanging the outputs 0 and 1. But there is no map reduction
of L̄_A to L_A, as L̄_A is not computably enumerable while L_A is computably enumerable.
The requirement that an algorithm for deciding a language L′ can be used for
deciding a language L is called a Turing reduction of L to L′. We write L ≤_T L′
for "L is Turing reducible to L′." Thus both L̄_A and L̄_SA are Turing reducible to L_A.
You will see the use of Turing reduction in the rest of the chapter.
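The "interchanging the outputs 0 and 1" example above is the simplest possible Turing reduction, and can be sketched in Python with a callable standing in for a hypothetical decider:

```python
# Turing reduction sketch: a decider for the complement of a language,
# built from a (hypothetical) decider for the language itself by
# interchanging the answers.

def complement_decider(decider, w):
    # 'decider' stands in for an algorithm deciding some language L;
    # negating its verdict decides the complement of L.
    return not decider(w)

# toy instance (our own): L = strings containing 'ab'
L = lambda w: "ab" in w
```

No map reduction could achieve this for L_A and its complement, which is precisely why Turing reduction is the more general notion.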
9.1. The state-entry problem is: Given a Turing machine M, a state q of M, and
an input string w ≠ ε, decide whether or not M enters the state q while computing on w.
Show that the state-entry problem is unsolvable by giving a reduction of the halting
problem to the state-entry problem.
9.2. Can you use the unsolvability of the state-entry problem to show that the prob-
lem raised by the question “Given a TM M and a state q of M, does M enter the
state q on some input?” is also unsolvable?
9.3. The blank-tape problem is: Given a Turing machine M, whether it halts when
started with the blank-tape. Give a reduction of the halting problem to the blank-tape
problem.
9.4. Let f : N → N be the function where f(n) equals the maximum number of
moves that can be made by any n-state TM having tape alphabet {0, 1, b} that halts
when started with a blank tape. First, show that f is well defined. Then show that f is
not a computable function by reducing the blank-tape problem to the problem of
computing f.
9.5. Reduce the halting problem to the problem: Given any TM M, a symbol σ ∈ Σ,
and a nonempty string w ∈ Σ ∗ , determine whether or not the symbol σ is ever written
when M works on the input w.
9.6. The restricted acceptance problem is: Given any string w, whether there exists
a TM M such that M accepts w. Give a reduction of the acceptance problem to
the restricted acceptance problem. Note that in the acceptance problem, we seek an
algorithm that would work for each pair of a machine M and a string w, while in
the restricted acceptance problem, we only ask for an algorithm that would work for
each w.
9.7. Determine which of the following problems about an arbitrary TM M are decid-
able, either by giving a decision procedure or by reducing the acceptance problem to
the problem at hand:
(a) Does M have at least 2009 states?
(b) Does M take more than 2009 steps on input ε?
(c) Does M take more than 2009 steps on some input?
(d) Does M take more than 2009 steps on every input?
(e) Does M move to some square at least 2009 squares away from the square con-
taining the initial b?
(f) Does M accept any string at all?
(g) Is L(M) finite?
9.8. Show that there exist a fixed TM M0 and a fixed unrestricted grammar G0 such
that the following problems about them are unsolvable:
(a) Given a string w, whether w ∈ L(M0).
(b) Given a string w, whether w ∈ L(G0).
9.3 Rice’s Theorem 287
9.9. Suppose we are given a DFA for accepting L and a TM for accepting L′. Show
that we cannot necessarily construct a DFA for L/L′ effectively. Note that if L ⊆ Σ*
is regular and L′ ⊆ Σ*, then L/L′ = {w ∈ Σ* : wx ∈ L for some x ∈ L′} is regular.
Similarly, formulate and prove an unsolvability result for L\L′ = {w ∈ Σ* : wx ∈
L for all x ∈ L′}.
9.10. Show that the Turing reduction relation ≤T is reflexive and transitive, and that
the map reduction ≤m refines ≤T .
9.11. Suppose L ≤_m L′ and that L′ is a regular language. Does that imply that L is also
regular?
9.12. Recall that L_A = {ψ(M)#ψ(w) : the TM M accepts w}. Let L_∅ = {ψ(M) : M
is a TM and L(M) = ∅}. Is L_A ≤_m L_∅? Is L_∅ ≤_m L_A?
9.17. (See Example 1.9.) For solving the halting problem, consider two TMs as
follows:
M1: Take as input a TM M and a string w. Output "M halts on w."
M2: Take as input a TM M and a string w. Output "M does not halt on w."
As M either halts on w or does not, one of M1, M2 reports correctly, whatever be the case.
Therefore, the halting problem is solvable. What is wrong with this argument?
9.18. Revisit the proof of Theorem 8.8. Is the reduction of L_SA to L_A there a map
reduction? If not, does there exist such a map reduction?
9.20. Prove directly that L_H is computably enumerable but its complement is not.
9.21. Prove directly that L_SH is computably enumerable but its complement is not.
To see how Turing reducibility is at work, we prove a nontrivial result about nontrivial
properties of computably enumerable languages. Unless otherwise stated, a property
is a unary relation on a set. For example, in the domain of natural numbers, primeness
is a property. A natural number may or may not have this property, that is, it may or
may not be a prime number. Thus the property of primeness can be identified with
the subset of prime numbers.
However, we need to see that P is represented as a language and not just a set of
TMs. This is tackled easily by using the encodings of TMs as binary strings. Thus,
we represent P as the language
We show that this problem, called the decision problem for P, is unsolvable by
reducing the self-acceptance problem to it; that is, formally, we
reduce L_SA to L_P. Notice that the phrase "a property P of computably enumerable
languages" is represented here as a language L_P ⊆ {0, 1}*; and L_P satisfies the
property:
Similarly, that P is nontrivial amounts to asserting that there are TMs M, M′ such that
ψ(M) ∈ L_P but ψ(M′) ∉ L_P.
Proof. We continue with the informal P rather than the formal L_P. We show
that the decision problem for P is unsolvable. Let P be a nontrivial property
of computably enumerable languages. For the time being, assume that P does not
contain the empty language ∅. As P is a nontrivial property, it contains a nonempty
computably enumerable language, say, L. Let M_L be a Turing machine that accepts
L. We use M_L for reducing the self-acceptance problem to the decision problem for P.
Recall that L_SA consists of strings of the form ψ(M), where M accepts ψ(M).
We give an algorithm to construct a TM M′ from a given string ψ(M), by specifying what
M′ does on an input w. It is as follows:
M′ simulates M on ψ(M). If M accepts ψ(M), then M′ simulates M_L on w.
Thus, the TM M_L is hard-wired into M′ this way. Now, either M accepts ψ(M)
or not. In the first case, M′ does to the string w whatever M_L would do. Here, M′ accepts
all and only the strings of L. That is, when M accepts ψ(M), L(M′) = L. On the other
hand, when M does not accept ψ(M), M′ also does not accept w. In fact, no w is
accepted in this case. That is, when M does not accept ψ(M), L(M′) = ∅. Since
L ∈ P and ∅ ∉ P, we see that
Here ends the reduction of the self-acceptance problem to the decision problem
for P. How?
Suppose there is an algorithm to determine whether any given computably enu-
merable language is in P. Feed to this algorithm the language L(M′). The algorithm
then determines whether L(M′) is in P or not. Writing the contrapositives
of (a) and (b) above, we see that the algorithm then determines whether M accepts
ψ(M) or not. As the self-acceptance problem is not solvable (Theorem 8.9), the de-
cision problem for P is not solvable.
The proof is not yet complete. We must get rid of the extra assumption that ∅ ∉ P.
If ∅ ∈ P, consider P′ = RE − P. The above yields a proof for P′ instead of P. That
is, L_P′ is undecidable. Then, so is L_P.
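For readers who like to experiment, the logic of the machine M′ constructed in this proof can be mimicked in Python, with callables standing in for TMs. This is only an illustration: a real M′ would discover the outcome by simulating M on ψ(M), which no Python program can decide in general, so here that outcome is passed in directly as a boolean.

```python
# A sketch of the machine M' from the proof of Rice's theorem, with
# Python callables standing in for Turing machines (illustration only).

def make_M_prime(M_accepts_its_own_code, M_L):
    # In the actual construction, M' would *simulate* M on psi(M);
    # here the outcome of that simulation is supplied as a boolean.
    def M_prime(w):
        if not M_accepts_its_own_code:
            return False      # this branch gives L(M') = the empty set
        return M_L(w)         # this branch gives L(M') = L
    return M_prime
```

When the first argument is true, the sketch accepts exactly the strings of L, and otherwise nothing, mirroring the two cases of the proof.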
Rice's theorem says that almost anything interesting we can ask about computably
enumerable sets is undecidable. But what about acceptability, if not decidability? It
can be shown that P, as a subset of RE, is computably enumerable iff P satisfies all
of the following:
Theorem 9.6. The following problems about TMs, and hence, about unrestricted
grammars, are unsolvable:
You should be able to translate the statements in Theorem 9.6 to unsolvable prob-
lems about unrestricted grammars. For example, unsolvability of the fourth statement
there reads as: “It is unsolvable whether two arbitrary unrestricted grammars gener-
ate the same language or not.” Some more problems about grammars and automata
will be discussed in the following sections.
9.22. Show that there is no algorithm to determine whether or not a given TM even-
tually halts with an empty tape given any input.
9.23. Is the problem of determining whether or not an arbitrary TM revisits its initial
square (the square containing b that is followed by the input) solvable?
9.24. Using reduction, prove that L_∅ = {ψ(M) : L(M) = ∅} is not computably enu-
merable. Also, prove that L_≠∅ = {ψ(M) : L(M) ≠ ∅} is computably enumerable.
9.26. Show that the set of all ψ(M) for TMs M that accept all inputs that are palin-
dromes (possibly along with others) is undecidable.
9.27. Show that the set of all TMs that halt when started with the blank tape is com-
putably enumerable. Is this decidable?
9.28. Let f(n) be the maximum number of tape squares examined by any n-state
TM with tape alphabet {0, 1, b} that halts when started with a blank tape. Show that
f : N → N is not a computable function.
9.30. Let M be a Turing machine with input alphabet {a, b}. Is the problem “L(M)
contains two different strings of the same length” solvable?
9.31. Given a TM M and a string w, determine whether the following problems are
solvable:
(a) Does M ever write b on its tape on input w?
(b) Does M ever write a nonblank symbol on its tape on input w?
(c) Given a symbol σ, does M ever write σ on its tape on input w?
9.32. Show that the problem of determining whether the language accepted by a
given TM equals {ww R : w ∈ {a, b}∗ } is unsolvable.
9.34. Are the following problems about a given pair of TMs M and N solvable?
(a) Is L(M) ⊆ L(N)?
(b) Is L(M) = L(N)?
(c) Does there exist a string that both M and N accept, that is, is L(M) ∩ L(N) ≠ ∅?
(d) Is L(M) ∩ L(N) finite?
(e) Is L(M) ∩ L(N) decidable?
(f) Is L(M) ∩ L(N) computably enumerable?
9.36. Show that the halting problem for TMs remains unsolvable when restricted to
machines with a fixed but small number of states. Note that if an arbitrarily large number
of states is allowed, then the universal machine would do the job.
9.40. We say that a TM M uses k cells on input w if there is a configuration uqσv
of M such that s b w ⊢* uqσv and ℓ(uσv) ≥ k. Which of the following problems
is solvable?
(a) Given a TM M, a string w, and a number k, does M use k cells on input w?
(b) Let f : N → N be a computable total function. Given a TM M and a string w,
does M use f(ℓ(w)) cells on input w?
(c) Given a TM M and a string w, does there exist a k such that M uses k cells on
input w?
9.4 About Finite Automata 293
Does there exist an algorithm to determine whether any given DFA accepts a given
string?
Yes; because you can have a total TM that simulates any given DFA. Moreover, this
simulation can be written as an algorithm. Let us use the same notation ψ(D) for a
binary string encoding of a DFA D. The corresponding language for the acceptance
problem for DFAs is:
Theorem 9.8. The following problems about regular expressions, regular grammars,
and finite automata are solvable:
On input u ∈ {0, 1}*, M first checks whether u is of the form ψ(D) for some DFA
D. If not, M enters its rejecting state. Otherwise, M
(a) Marks the start state of D.
(b) Repeats the following step until no new states are marked:
Mark any state that has a transition coming into it from an already marked
state.
(c) If no final state is marked, then M enters the accepting state h; else, it enters the
rejecting state.
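Steps (a)–(c) amount to a reachability computation. For readers who like to experiment, here is a Python sketch; the DFA is encoded as a transition dictionary (our own convenience, not the formal binary encoding ψ(D)):

```python
# The marking procedure for DFA emptiness, sketched in Python.
# delta: dict mapping (state, symbol) -> state.

def dfa_language_empty(delta, start, finals):
    marked = {start}                      # step (a): mark the start state
    changed = True
    while changed:                        # step (b): propagate marks
        changed = False
        for (q, _), p in delta.items():
            if q in marked and p not in marked:
                marked.add(p)
                changed = True
    # step (c): the language is empty iff no final state got marked
    return not (marked & set(finals))
```

The loop terminates because each pass either marks a new state or stops, and there are only finitely many states.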
Exercise 9.6. Show that the TM M constructed in the proof of the fourth statement
of Theorem 9.8 decides L_EDFA.
Most of the interesting problems about regular languages are decidable. The para-
dox is that if you discover an unsolvable problem about finite automata, it will be
really interesting!
An interesting problem concerning regular languages and TMs is this: given a
TM, can it be determined whether or not it accepts a regular language? For a particular
given TM, the answer may be "yes." But the question is whether there exists an
algorithm to do it uniformly for all TMs. The language for the problem is:
You have already seen in Theorem 9.6 that L_RTM is not decidable. Therefore, there
is no such algorithm. You may ask whether a similar result holds if PDAs are taken
instead of TMs, or when regular languages are replaced by context-free languages.
9.5 About PDA 295
9.41. Show that the problem of determining whether two given DFAs accept the same
language is solvable. Give an intuitive explanation why for DFAs this problem is
solvable, but for TMs, it is not.
9.43. Consider the problem of testing whether a DFA and a regular expression are
equivalent. Express this problem as a language and then show that the language is
decidable.
We ask a simple question about context-free grammars (CFGs). Given a CFG G and
a string u, can it be determined whether u is at all generated by G? If I actually give
you such a G and a u, then you would start deriving u in G based on your intuition,
and then possibly, by luck, derive it. If you fail in many attempts, then you may
like to proceed towards a proof that u cannot be generated in G. But my question is
different. Can you have an algorithm to determine whether any such G generates any
such u?
You may think in terms of a PDA. Construct an equivalent PDA for G, for which
there is an algorithm. If this PDA happens to be a DPDA, then run the DPDA on u.
Depending upon whether the DPDA accepts u, the answer to the original question
will be “yes” or “no.” This is fairly algorithmic. But if the PDA is not a DPDA,
then there might be more than one computation with the same input u, and it looks
too awkward to follow through all those computations. Nonetheless, there are only
a finite number of such computations, and an algorithm can find them all by brute
force approach. Thus, the problem “whether u ∈ L(G) ” is solvable.
Another way is to use a Chomsky normal form for G. Recall that in a Chom-
sky normal form CFG, each production is of the form A → BC or A → σ for
nonterminals A, B, C and a terminal σ. Each application of a production either pro-
duces a terminal or increases the length of the (to be) derived string. Then you try all
derivations of length 2ℓ(u) − 1 in the Chomsky normal form. If u is at all derived in
any of these derivations, then the answer is "yes," else, "no." Thus you have proved the
following result:
Theorem 9.9. The membership problem for CFGs (and for PDAs), that is, given an
arbitrary CFG G and a string u, whether u ∈ L(G), is solvable.
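The brute-force search through bounded-length derivations can be organized as the familiar dynamic-programming (CYK) procedure for grammars in Chomsky normal form. The following Python sketch uses our own encoding of a CNF grammar as a list of (head, body) pairs:

```python
# CYK membership test for a CFG in Chomsky normal form.
# grammar: list of productions (A, body), where body is a 1-tuple
# (a terminal) or a 2-tuple (two nonterminals).

def cyk(grammar, start, u):
    n = len(u)
    if n == 0:
        return False  # CNF as used here derives no empty string
    # T[i][l] = set of nonterminals deriving the substring u[i:i+l]
    T = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, a in enumerate(u):                 # substrings of length 1
        for A, body in grammar:
            if body == (a,):
                T[i][1].add(A)
    for l in range(2, n + 1):                 # longer substrings
        for i in range(n - l + 1):
            for k in range(1, l):             # split point
                for A, body in grammar:
                    if (len(body) == 2 and body[0] in T[i][k]
                            and body[1] in T[i + k][l - k]):
                        T[i][l].add(A)
    return start in T[0][n]
```

As a usage example, the CNF grammar S → AB | AX, X → SB, A → a, B → b generates {aⁿbⁿ : n ≥ 1}, and the table-filling above decides membership in polynomial time rather than by enumerating derivations.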
Notice that Theorem 9.9 is a restatement of the fact that the set of context-free
languages forms a subset of the set of decidable languages (over any alphabet Σ).
Similarly, the question whether a given CFG generates any string at all (L(G) ≠ ∅)
is algorithmically solvable. How? Just revisit the proof of the Pumping Lemma for CFLs.
There we had chosen a magic number n = k^m, where m is the number of
nonterminals in G and k is the maximum length of α, where X → α is any production
in G. It follows from the Pumping Lemma that corresponding to each string of L(G)
having length at least n, there exists a string of length at most n that is also in
L(G). That means, if G generates any string at all, then it must generate a string of
length at most n. Now make a list of all strings of length at most n over the terminal
alphabet, and apply Theorem 9.9. You have proved the following result:
Theorem 9.10. The emptiness problem for CFGs (and for PDAs), that is, whether an
arbitrary CFG G generates any string at all, is solvable.
Proof. We have already proved it. Here is another proof, similar to the marking
scheme we adopted for DFAs.
First, mark all the terminal symbols of G. Next, repeatedly mark each nonterminal
A for which there is a production A → α such that all symbols of α have already been
marked. Quit this loop when there are no more changes in the marking.
At the end of this loop, for each nonterminal A, either A has been marked, or in every
production A → α some symbol of α has not been marked. By induction on the number of
marking steps, it follows that if A is marked by this procedure, then there is a string
u of terminals such that A ⇒ u. Induction on the length of the derivation A ⇒ u
also shows that if A ⇒ u for some string u of terminals, then A has been marked.
Therefore, some string w is in L(G) iff the start symbol S of G has been marked.
This solves the problem whether L(G) = ∅.
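The marking procedure is short enough to sketch in Python. The grammar encoding is our own: productions are (A, α) pairs with α a tuple of symbols, and the terminals are taken to be exactly those symbols that never occur on a left-hand side.

```python
# Nonterminal marking for CFG emptiness, as in the proof above.

def cfg_language_nonempty(productions, start):
    lhs = {A for A, _ in productions}
    # mark all terminal symbols (symbols never used as a left-hand side)
    marked = {s for _, alpha in productions for s in alpha if s not in lhs}
    changed = True
    while changed:
        changed = False
        for A, alpha in productions:
            # mark A if every symbol of alpha is already marked
            # (a production A -> epsilon marks A vacuously)
            if A not in marked and all(s in marked for s in alpha):
                marked.add(A)
                changed = True
    return start in marked
```

The function returns whether L(G) ≠ ∅, i.e., whether the start symbol gets marked.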
Exercise 9.7. Complete the proof of Theorem 9.10 by carrying out the two inductive
proofs.
Recall that context-free languages are not closed under complementation. Theo-
rem 9.10 is thus of no help in deciding whether "a CFG generates all strings over
its terminal alphabet" is a solvable problem or an unsolvable one. Let us fix the
terminal alphabet as Σ. The All-CFG problem can be posed as:
For an arbitrary CFG G, is L(G) = Σ*?
We need to do some ground work to answer this question. This concerns com-
putations of Turing machines. Let M = (Q, Σ, Γ, δ, s, h, ) be a TM. Recall that
a configuration of M is a snap-shot of a TM in action. It can be written as a string
uqσv, where M is currently in the state q, is scanning the symbol σ, to the left of
which is the string u, and to the right of which is the string v. A computation of M is
a sequence of configurations, where a configuration yields in one step the next con-
figuration in the sequence. When a computation starts with an initial configuration
and ends with an accepting configuration, we say that it is a valid computation.
A valid computation history is a valid computation written in a slightly twisted
way. We use a new symbol, say #, assumed not to occur in M, to separate
the configurations in a computation. A valid computation history of M is a string
in either of the forms:
1. # w1 # w2^R # w3 # w4^R # · · · # w2m^R #, or
2. # w1 # w2^R # w3 # w4^R # · · · # w2m^R # w2m+1 #,
such that
have abqab ⊢ abpbb. Thus, following M, P should have bbpba on its stack after a
move. P does it by following the rule:
If δ(q, a) = (p, b) and vaq is on its stack, then replace vaq by vbp.
Here, v is any string not having a state component. Now the stack contains bbpba,
read from top to bottom, as usual. The next move of M follows the transition
δ(p, b) = (r, L), after which M has the configuration brbba. This is followed by
P using the rule:
If δ(p, b) = (r, L), and vbp is on the stack, then replace vbp by vrb.
Now M has the configuration brbba, and the transition δ(r, b) = (t, R) brings it to
the configuration bbtba. P follows it by using the rule:
If δ(r, b) = (t, R), and vrb is on the stack, then replace vrb by vbt.
Notice that the input alphabet and the stack alphabet of the PDA include all state
symbols, all tape symbols of the Turing machine, and the special symbol #. We also
agree to use the same symbols on the stack as in the input.
Now, whenever u ⊢ w, the string w has been obtained from u by using a transition
in one of the forms (1–3). In that case, the stack of P has been changed from u^R to
w^R. The reading head is still on #. Once the operation on the stack is over, the reading
head moves to the next square on the right, and then P operates as follows:
4. If the symbol that P is scanning is the same as the topmost symbol on the stack,
then P's reading head goes to the next square on the right, popping the matched
symbol off the stack; else, P halts abruptly.
Now, if u#v^R is on the tape, then P reads the entire input and reaches the stage where
it has an empty stack only when u ⊢ v. We just declare all states of
P to be final states. Thus P accepts u#v^R, and only such strings, where u ⊢ v.
A similar result holds for valid computation histories when one of the yields is
violated.
Lemma 9.2. The language {# w1 # w2^R # · · · # wn # : wi ⊬ wi+1 for some i} is a CFL.
Proof. We construct a PDA P for accepting this language. Let all states of P be final
states. P nondeterministically selects some wi that is preceded by some number of
#s. This is the place where an error in the yield might occur. P pushes a new symbol,
say #, onto the stack initially. P then reads wi and stores y in the stack from bottom to
top such that wi ⊢ y, as in the PDA constructed in the proof of Lemma 9.1. Thus,
the rightmost symbol of y is on the top of the stack.
After reading # from the input, P compares the succeeding string wi+1 with y^R
on the stack, read from top to bottom now. While comparing, it goes on popping off
each matched symbol. If a discrepancy occurs at some symbol (which is not #), then P scans
the remaining symbols and empties the stack, accepting the string. If a perfect match
between wi+1 and y^R occurs, then P finds # on the stack, in which case it halts
abruptly. Thus P accepts exactly the inputs of the required form.
On the contrary, the set of valid computation histories is not, in general, a CFL.
We will not indulge in such details. We rather show that the All-CFG problem is
unsolvable.
Theorem 9.11. The problem of determining, for an arbitrary CFG G with terminal
alphabet Σ, whether L(G) = Σ*, is unsolvable.
Proof. We reduce the acceptance problem of Turing machines to the All-CFG prob-
lem by employing the valid computation histories of a TM M.
Let M be a Turing machine and w be any input to it. A valid computation history
# w1 # w2^R # · · · # of M is called a valid computation history of M on w whenever
w1 is an initial configuration of M with input w. Denote by V the set of all valid
computation histories of M on w. Then V ⊆ Δ*, for some alphabet Δ that has
all the symbols of the state set and the tape alphabet of M along with the special
symbols L, R, h, the rejecting state, and #. (If you have used encodings of the configurations
wi, then Δ can be taken as {0, 1, #}.) We write V̄ for the complement of V in Δ*. Now,
V̄ = V1 ∪ V2 ∪ V3 ∪ V4,
where each Vi is determined by the ith property given below.
Exercise 9.8. Use Lemma 9.2 to show that the intersection problem for CFGs, that is,
for arbitrary CFGs G, G′, whether L(G) ∩ L(G′) = ∅, is unsolvable.
Following similar lines, many more undecidability results about CFGs can be
proved. We mention some of them here. Let G, G1, G2 be arbitrary CFGs and R
be an arbitrary regular language. The following problems are unsolvable:
Similar results hold for PDAs in place of CFGs. We mention some similar results for
linear bounded automata (LBA).
(k) Acceptance problem for LBAs is solvable, that is, there is an algorithm for testing
whether an arbitrary LBA accepts a given input string.
(l) The emptiness problem for LBAs is unsolvable, that is, there is no algorithm for
deciding whether an arbitrary LBA accepts any string at all.
9.45. Show that the finiteness problem for CFGs, that is, whether an arbitrary CFG
generates only a finite number of strings, is solvable.
9.46. Either give an algorithm or show that no algorithm exists for solving each of
the following decision problems:
(a) Is there a nonterminal of a CFG that comes up repeatedly in some derivation?
(b) Is the language of a CFG infinite?
(c) Is the language of a CFG regular?
(d) Given a regular language L and a CFG G, is L ⊆ L(G)?
(e) Does a CFG generate a string of length less than some given number?
(f) Do a CFG and a regular grammar generate some common string?
9.6 Post’s Correspondence Problem 301
9.50. Which one of the following problems is solvable and which is not? Justify.
(a) Given a regular language L and a CFL L′, is L ⊆ L′?
(b) Given a regular language L and a CFL L′, is L′ ⊆ L?
where the concatenated string on the upper track matches with that on the lower
track, concatenated as they are placed. The matched string here is babbbaabb.
Exercise 9.9. Find a match with the dominoes (aab, a), (ab, abb), (ab, bab), (ba, aab), or
show that no match is possible.
It looks easy to play this game. But our interest is in getting an algorithmic solu-
tion. You are now asked to write a program (an algorithm) that would solve all such
domino games, and not just the one given in Example 9.2.
Remember that the same dominoes can be repeated as often as you require. That
gives you more freedom, and in turn, makes it difficult to play the game. Post’s
correspondence problem is to find a match, if possible, for all such games. Let us
formalize a bit.
Let Σ be an alphabet having at least two symbols. A Post’s correspondence
system P over Σ, or PCS, for short, is a finite ordered set of ordered pairs of strings
(u1 , v1 ), . . . , (un , vn ) from Σ + . We write such an ordered set with n elements as the
n-tuple ((u1 , v1 ), . . . , (un , vn )). A match in P is a sequence of indices i 1 , i 2 , . . . , i k ,
not necessarily distinct, such that ui1 ui2 · · · uik = vi1 vi2 · · · vik . Post’s Correspondence
Problem, or PCP, for short, is the problem of determining whether an arbitrary PCS
has a match. An instance of PCP is a particular PCS where we seek a match.
Example 9.3. The PCS of Example 9.2 is the triple P = ((u1, v1), (u2, v2), (u3, v3)),
where u1 = a, v1 = aba, u2 = ab, v2 = bb, u3 = baa, v3 = aa. The match found
there was the sequence of indices 1, 3, 2, 3.
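Checking a proposed match, and hunting for one by brute force up to a bounded length, is easy to sketch in Python. The indices below are 0-based, so the book's match 1, 3, 2, 3 appears as 0, 2, 1, 2; the cutoff `max_len` is our own addition, unavoidable since PCP itself is unsolvable:

```python
from itertools import product

# Verifying and (boundedly) searching for a match in a PCS.

def is_match(pcs, indices):
    top = "".join(pcs[i][0] for i in indices)
    bot = "".join(pcs[i][1] for i in indices)
    return top == bot

def find_match(pcs, max_len=6):
    # exhaustive search over index sequences up to length max_len;
    # can only report matches it happens to find within the bound
    for k in range(1, max_len + 1):
        for seq in product(range(len(pcs)), repeat=k):
            if is_match(pcs, seq):
                return list(seq)
    return None

# The PCS of Example 9.3
P = [("a", "aba"), ("ab", "bb"), ("baa", "aa")]
```

On P, the search recovers the match of Example 9.3: the concatenations on both tracks give the common string abaaabbaa.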
I leave out the details of how you encode a PCS as a binary string, so that L_PCP ⊆ {0, 1}*. All
that you require is a TM that can recover the individual ui's, vi's, and their sub-
scripts i, etc., from the encoding ψ(P).
We will be interested in a particular type of PCP, called a modified PCP,
which imposes certain restrictions on a match. Let P = ((u1 , v1 ), . . . , (un , vn ))
be a PCS. A modified match in P is a sequence of indices 1, i 1 , i 2 , . . . , i k , such
that u1 ui1 ui2 · · · uik = v1 vi1 vi2 · · · vik . That is, a modified match must start with the
first domino. The modified PCP, or MPCP for short, is the PCP where we seek a
modified match.
We try to reduce the acceptance problem of TMs to MPCP by using the technique
of valid computations. In case of PDAs, we required a twisted way of writing the
valid computations due to the presence of a stack. Here, we use the valid computa-
tions themselves.
string. And the only dominoes in which the first string outnumbers the second are given
in Stages 6 and 7. Thus, in such a match, Stage 6 and/or Stage 7 dominoes are bound
to occur. That means that the accepting state h must occur in the computation of M on
w. Hence the statement in Exercise 9.10 is an "iff" statement. With this, you have
proved the following result:
Lemma 9.3. There is an algorithm to construct a PCS P from any given TM M and
a given string w so that M accepts the input w iff P has a modified match.
Lemma 9.3 reduces the acceptance problem of TMs to MPCP. To complete the
argument, we require a reduction of MPCP to PCP. Recall that such a reduction
would mean utilizing an algorithm to decide “whether an arbitrary PCS has a match”
for determining “whether an arbitrary PCS has a modified match.” We give such a
reduction in the proof of our main theorem below.
Theorem 9.12. PCP is unsolvable. Moreover, the subproblem of PCP over an alpha-
bet containing at least two symbols is unsolvable.
Proof. Suppose PCP is solvable. That is, we have an algorithm, say A, that, given an
arbitrary PCS with at least two symbols in its alphabet, correctly reports whether this
PCS has a match. We first show how to use the algorithm A for solving MPCP.
Let P = ((u1, v1), . . . , (un, vn)) be a PCS over an alphabet Σ containing at least
two symbols. We give an algorithm, say B, to construct a PCS P̂ corresponding to
the PCS P.
Let ∗ and ⋄ be two new symbols not in Σ. For any string u = σ1σ2 · · · σm with
each σi ∈ Σ, define
∗u = ∗σ1∗σ2∗ · · · ∗σm,
u∗ = σ1∗σ2∗ · · · ∗σm∗,
∗u∗ = ∗σ1∗σ2∗ · · · ∗σm∗.
B constructs the new PCS P̂, corresponding to P, by taking
P̂ = ((∗u1, ∗v1∗), (∗u2, v2∗), (∗u3, v3∗), . . . , (∗un, vn∗), (∗⋄, ⋄)).
For solving MPCP, suppose that the PCS P is given. We apply the algorithm B on
P to obtain the PCS P̂. Then, we use the algorithm A to determine whether P̂ has
a match or not. If there is a match in P̂, then it must start with the first domino
(∗u1, ∗v1∗), as this is the only domino whose first symbols in the first component
(top track) and the second component (bottom track) are the same. That is, each match
in P̂ is a modified match. Conversely, each modified match in P̂ is, trivially, a match.
Thus, A determines whether there is a modified match in P̂.
Moreover, each match in P̂ gives rise to a modified match in P. Such
a match in P is obtained by just deleting all occurrences of ∗ and ⋄ from the match
in P̂. Thus it is decidable whether P has a modified match or not. That is, MPCP
is solvable. This conclusion, of course, depends on the solvability of PCP, which we have
assumed here.
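The construction of P̂ from P carried out by the algorithm B can be sketched directly. Since the two new symbols are not in Σ, the sketch below simply writes them as the characters '*' and '#' (an arbitrary choice of mine, standing for ∗ and the second new symbol):

```python
def mpcp_to_pcp(pcs, star="*", diamond="#"):
    """Sketch of B: interleave the fresh symbol star into every
    domino and append a closing domino. star and diamond stand
    for the two new symbols not in the alphabet of pcs."""
    def pre(u):   # *u : star before every symbol of u
        return "".join(star + c for c in u)
    def post(u):  # u* : star after every symbol of u
        return "".join(c + star for c in u)
    u1, v1 = pcs[0]
    hat_p = [(pre(u1), star + post(v1))]              # (*u1, *v1*)
    hat_p += [(pre(u), post(v)) for u, v in pcs[1:]]  # (*ui, vi*)
    hat_p.append((star + diamond, diamond))           # closing domino
    return hat_p

# A tiny MPCP instance with modified match 1, 2: ab·b = a·bb
print(mpcp_to_pcp([("ab", "a"), ("b", "bb")]))
```

On this instance the transformed system has the match 1, 2, 3, with both tracks concatenating to *a*b*b*#, mirroring the modified match 1, 2 of the original.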
To complete the reduction process, let M be a TM and w be an input to it. Using
Lemma 9.3, construct the PCS P. M accepts w iff P has a modified match. This
306 9 Algorithmic Solvability
reduces the acceptance problem of TMs to MPCP whenever the dominoes of the
PCS use at least two symbols. That is, if MPCP is solvable, then the acceptance
problem for TMs is also solvable.
But the acceptance problem for TMs is not solvable. Hence, MPCP is not solv-
able. Therefore, the subproblem of PCP where dominoes use at least two symbols is
unsolvable. It follows that PCP, in general, is also unsolvable.
What is the catch in "at least two symbols"? If only one symbol is used in the
dominoes, then write mi for the number of symbols in the upper track of the ith
domino, and ni for that in the lower track. If for some k, mk = nk, then a match
consists of a single copy of the kth domino. If mi > ni for each i, then there is
no match. Similarly, if mi < ni for each i, then there is no match. Else, suppose
mj > nj and mk < nk for some j, k. Then take nk − mk copies of the jth domino and
mj − nj copies of the kth domino for a match.
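The case analysis above is a complete decision procedure for the one-symbol case, and it is short enough to write down; a sketch (the function name is mine):

```python
def unary_pcp_has_match(pcs):
    """Decide PCP when every domino uses a single symbol.
    Only the lengths m_i (top) and n_i (bottom) matter, since
    over one symbol the order of concatenation is irrelevant."""
    diffs = [len(u) - len(v) for u, v in pcs]
    if 0 in diffs:
        return True   # a single copy of that domino is a match
    # otherwise a match exists iff some differences have opposite signs
    return any(d > 0 for d in diffs) and any(d < 0 for d in diffs)

print(unary_pcp_has_match([("aa", "a"), ("a", "aaa")]))  # True
print(unary_pcp_has_match([("aa", "a"), ("aaa", "a")]))  # False
```

In the first instance, taking n2 − m2 = 2 copies of the first domino and m1 − n1 = 1 copy of the second balances the lengths, exactly as in the argument above.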
9.51. Show that the PCS ((ab, abb), (b, ba), (b, bb)) has no match.
9.52. Does the PCS ((a, ba), (bba, aaa), (aab, ba)) have a match?
9.53. Find three matches in the PCS ((a, aaa), (ab, b), (abaaa, ab)).
9.54. Let P be the PCS ((001, 01), (0011, 111), (11, 111), (101, 010)). Does P have
a match? Does P have a modified match?
9.55. Consider two PCSs P = ((aa, ab), (a, bb), (bba, a), (b, a), (a, ab)) and P′ =
((a, abb), (ab, ba), (ab, b), (aba, ba)). Which one has a match and which one does
not? Prove your claim.
9.57. If Σ is a singleton, see where the reductions in the proof of Theorem 9.12
(possibly also in Lemma 9.3) go wrong.
9.58. A weak match in a PCS is a relaxed match in which the concatenated string w
can be obtained by concatenating n of the first components and also by concatenating
n of the second components, for some n, but the pairs used in the two cases need not
be the same. Show that the problem of determining whether a PCS has a weak
match is solvable.
9.61. Show that the following modifications to the PCP are unsolvable:
(a) Does there exist a first-at-last match to a given PCS, where a first-at-last match is
a match whose last domino is the first domino of the PCS? In this terminology, a
modified match is a first-at-first match.
(b) Does there exist a first-two-at-first match to a given PCS, where a first-two-at-first
match is a match whose first two dominoes are the first two of the PCS, kept in
the same order?
Next, we define truth and falsity in such a logical language by interpreting propositions
as true or false. An interpretation t is an assignment of the values 0 or 1
to the atomic propositions, say, 0 for falsity and 1 for truth. The interpretation t
is extended to all propositions by employing the rules: t(¬x) = 1 iff t(x) = 0;
t(x ∧ y) = 1 iff t(x) = 1 = t(y); t(x ∨ y) = 0 iff t(x) = 0 = t(y); t(x → y) = 0 iff
t(x) = 1 and t(y) = 0; and t(x ↔ y) = 1 iff t(x) = t(y). These rules are commonly
presented as truth tables.
Then we say that a proposition is valid if every interpretation assigns it the value 1.
For example, the proposition ( p5 ∨ ¬ p5 ) has only two possible interpretations,
namely, t1 , t2 , where t1 ( p5 ) = 1 and t2 ( p5 ) = 0. Now, t1 ( p5 ∨ ¬ p5 ) = 1 and also
t2 ( p5 ∨ ¬ p5 ) = 1. Therefore, ( p5 ∨ ¬ p5 ) is valid.
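The two-stage definition above, first evaluating under one interpretation and then quantifying over all interpretations, can be sketched in Python. The connective names "and", "or", "imp", "iff" and the function names are my own ASCII stand-ins for the symbols in the text:

```python
from itertools import product

# Propositions as nested tuples, e.g. ("or", "p5", ("not", "p5")).
def value(t, prop):
    """Evaluate a proposition under an interpretation t, a dict
    assigning 0 or 1 to the atomic propositions, by the rules above."""
    if isinstance(prop, str):
        return t[prop]
    if prop[0] == "not":
        return 1 - value(t, prop[1])
    x, y = value(t, prop[1]), value(t, prop[2])
    return {"and": min(x, y),                    # 1 iff both are 1
            "or": max(x, y),                     # 0 iff both are 0
            "imp": 0 if (x, y) == (1, 0) else 1,
            "iff": 1 if x == y else 0}[prop[0]]

def valid(prop, atoms):
    """A proposition is valid iff every interpretation assigns it 1."""
    return all(value(dict(zip(atoms, bits)), prop) == 1
               for bits in product((0, 1), repeat=len(atoms)))

print(valid(("or", "p5", ("not", "p5")), ["p5"]))  # True
```

The call checks both interpretations t1, t2 of p5, just as in the argument for (p5 ∨ ¬p5) above.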
In the second stage, we define the atomic formulas by declaring that each atomic
proposition is an atomic formula. Further, if t1 , . . . , t j are terms and Ri is a predicate
of arity j , then Ri (t1 , . . . , t j ) is an atomic formula. Similarly, for any terms s, t,
the expression (s = t) is also taken as an atomic formula. Notice that we agree to
use infix notation for the equality relation. Each atomic formula is declared as a
formula, and then other formulas are built up from atomic formulas by employing
the following rule:
You may first build up a vocabulary. Suppose you write R1(x) for "x is a rational
number," R2(x, y, z) for "y is between x and z," and f1(u) for "square of u." Then
you translate the sentence first to
If y and z are rational numbers, then there is a rational number x between f 1 (y)
and f 1 (z).
Next, you use the quantifiers properly, reading ∀ as “for all,” and ∃ as “there exists,”
so that the translation looks like:
∀x1∀x3((R1(x1) ∧ R1(x3)) → ∃x2 R2(f1(x1), x2, f1(x3))).
Let I be the interpretation with domain N, where R1^I is the unary relation of primeness,
R2^I is the equality relation "=," c1^I is the element 0, R3^I is the binary relation ">,"
and f1^I is the successor function. Then I interprets the sentence Y as
For each natural number x 1 , [(if x 1 is prime, then successor of x 1 is not equal to
0) and (there is a natural number x 2 such that x 2 is prime and x 2 > x 1 )].
Remember, you are not just asked to determine whether the given first order sentence
has a model or not. To solve the problem, you are supposed to construct an algorithm
(a TM or a program) that must determine the validity of each and every first order
sentence.
We connect the validity problem to Post's correspondence problem. Instead of writing
P as an n-tuple and then writing a match as a sequence of indices, we will write P
as a set and use the informal and intuitive notion of a match by concatenating the
dominoes in P.
Let P = {(u1 , v1 ), (u2 , v2 ), . . . , (un , vn )} be a PCS over {0, 1}. Then each ui , vi ∈
{0, 1}∗ , a binary string. Our goal is to construct a first-order language from P, with
certain properties so that reduction might be possible. We choose an individual con-
stant c0 , two unary function symbols f 0 , f 1 , and a binary predicate R1 . We think of c0
as the empty string, f 0 as the function that concatenates its argument with a 0, f1 as a
function symbol that concatenates its argument with a 1. Using this nomenclature,
we can write any binary string as a composition of f0 and f1 evaluated at c0.
For example, the binary string 0 is thought of as ε0, which then is written as
f0(c0). Similarly, f1(f0(c0)) represents the binary string (read the composition backward)
01. In general, the binary string b1b2 · · · bm of bits is represented by the term
fbm(fbm−1( · · · (fb2(fb1(c0))) · · · )), which we again abbreviate to fb1b2···bm(c0) for better
readability.
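The translation of a binary string into such a term can be sketched in a few lines (the function name is mine); note that b1 is applied first and so ends up innermost:

```python
def term(s, const="c0"):
    """Write the binary string s = b1 b2 ... bm as the term
    f_bm( ... (f_b1(c0)) ... ), read as a composition backward."""
    t = const
    for b in s:  # b1 is applied first, so it becomes the innermost call
        t = "f%s(%s)" % (b, t)
    return t

print(term("01"))  # f1(f0(c0))
```

For the empty string the sketch returns just c0, matching the reading of c0 as ε.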
This nomenclature is like translating a given argument in English to a first order
language by building an appropriate vocabulary. Here, we are translating the PCS P
into a first-order language. The predicate R1 below represents intuitively the initially
available dominoes, and then how the game of concatenating the dominoes is played
through successive moves.
We first express the fact that the (ui, vi) are the available dominoes. Next, we say that
if we have a domino (x, y) and the domino (ui, vi), then we can have a concatenated
domino (an extended domino, as it were) (xui, yvi). A match in this game is then an extended
domino with the same first and second components. The binary predicate R1 is, in
fact, this extended-domino relation. Now you can read through the formal construction of a
suitable formula, a sentence, as in the following:
Construct the sentence X, which is ((X 1 ∧ X 3 ) → X 4 ), where
X 1 = R1 ( f u1 (c0 ), f v1 (c0 )) ∧ · · · ∧ R1 ( f un (c0 ), f vn (c0 )),
X 2 = R1 ( f u1 (x 1 ), f v1 (x 2 )) ∧ · · · ∧ R1 ( f un (x 1 ), f vn (x 2 )),
X 3 = ∀x 1 ∀x 2 (R1 (x 1 , x 2 ) → X 2 ), and
X 4 = ∃x 3 R1 (x 3 , x 3 ).
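The sentence X can be generated mechanically from the dominoes. The sketch below emits it as a plain ASCII string, writing & for ∧, -> for →, and A, E for ∀, ∃; these renderings, and the function name, are my own choices:

```python
def sentence_x(pcs):
    """Build the sentence X = ((X1 & X3) -> X4) of the reduction,
    as an ASCII string; terms use the f-subscript shorthand f_s(c0)."""
    def f(s, arg):
        return "f_%s(%s)" % (s, arg)
    x1 = " & ".join("R1(%s, %s)" % (f(u, "c0"), f(v, "c0"))
                    for u, v in pcs)
    x2 = " & ".join("R1(%s, %s)" % (f(u, "x1"), f(v, "x2"))
                    for u, v in pcs)
    x3 = "Ax1 Ax2 (R1(x1, x2) -> (%s))" % x2
    x4 = "Ex3 R1(x3, x3)"
    return "(((%s) & %s) -> %s)" % (x1, x3, x4)

print(sentence_x([("0", "00")]))
```

For the one-domino system ((0, 00)), the output contains R1(f_0(c0), f_00(c0)) as its X1 part and Ex3 R1(x3, x3) as X4.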
Our goal is to show that the sentence X is valid iff P has a match. We break up the
proof into two parts.
Proof. This part is nearly trivial. If you have translated an English sentence to a first-order
sentence, and the first-order sentence is valid, then in particular, the English sentence
should be true. We give a similar argument.
Proof. Assume that P has a match. We have a sequence of indices i1, . . . , ik such that
ui1 · · · uik = vi1 · · · vik = w, say. Let I be any interpretation of X, with nonempty
domain D. Then c0^I ∈ D; f0^I, f1^I : D → D; and R1^I ⊆ D × D. As P has only
binary strings on its dominoes, we somehow assign these binary strings to elements
of D. To this end, define a map φ : {0, 1}∗ → D recursively by
φ(ε) = c0^I, φ(s0) = f0^I(φ(s)), φ(s1) = f1^I(φ(s)),
so that φ(b1b2 · · · bj) = f_{bj}^I(f_{bj−1}^I( · · · (f_{b1}^I(c0^I)) · · · )).
The last expression is again abbreviated to f_{b1b2···bj}^I(c0^I). Thus, with the abbreviation
at work, we have f_s^I(c0^I) = φ(s) for any binary string s. For example, φ(011) =
f1^I(f1^I(f0^I(c0^I))). We are simply coding the binary strings backward into elements of D, so to speak.
To show that I is a model of X, we assume that I is a model of X1 and of X3,
and then prove that I is a model of X4. As I is a model of X1, for each
i, 1 ≤ i ≤ n, (f_{ui}^I(c0^I), f_{vi}^I(c0^I)) ∈ R1^I, that is, (φ(ui), φ(vi)) ∈ R1^I.
As I is a model of X3, whenever (φ(x), φ(y)) ∈ R1^I, we also have (φ(xui), φ(yvi)) ∈ R1^I.
Starting with (φ(ui1), φ(vi1)) ∈ R1^I, and repeatedly using this statement, we
obtain:
(φ(ui1 ui2 · · · uik), φ(vi1 vi2 · · · vik)) ∈ R1^I.
That is, (φ(w), φ(w)) ∈ R1^I, so that I is a model of X4. This completes the proof.
Now, suppose we have an algorithm A such that given any sentence Y, A decides
whether Y is valid or not. Let P be any given PCS over the alphabet {0, 1}. Construct
the sentence X as in Lemma 9.4. By Lemmas 9.4 and 9.5, A decides
whether P has a match or not. However, PCP is not solvable. Therefore, there cannot
exist such an algorithm A. You have thus proved the following statement:
9.62. Show that validity in monadic first order logic is decidable. You do not have to
use the algorithm of Aristotle for showing this.
9.63. Let Th(N, +) denote the additive theory of natural numbers, that is, the set of
all true sentences in natural numbers having only addition and no multiplication. Is
the sentence ∃x∀y(x + y = y) in Th(N, +)? Is ∃x∀y(x + y = x) in Th(N, +)?
9.65. Denote by Th(N, <) the set of all true sentences in natural numbers where
we have only the relation of "less than." Show that Th(N, <) is a decidable theory.
Theorem 9.13 says that the set of valid sentences of first order logic is not decidable.
However, this set is computably enumerable, as we have proof procedures in first-order
logic that eventually prove all valid sentences. This is essentially the
content of Gödel's completeness proof of a Hilbert-style axiomatic system for first-order
logic. The undecidability of valid sentences of first-order logic also implies
that we cannot hope to write a program that would either give a proof, whenever
possible, of a mathematical conjecture, or would report to us that the conjecture is
false. In this section, we will discuss (not prove) some of the undecidability results
from other branches of mathematics.
To begin with, the theory of natural numbers with addition as the only operation,
commonly written as (N, +), has been shown to be a decidable theory. This means
that there is an algorithm which, given any sentence in this theory, decides whether
the sentence is true or false. Similarly, the field of reals (R, +, ×) is also a decid-
able theory. However, the theory of natural numbers with addition and multiplication
(N, +, ×) is not decidable. Had it been decidable, there would not have been unproved
conjectures such as Goldbach’s conjecture or the twin-primes conjecture.
There are also nontrivial problems that have been shown to be solvable. For
example, let p(x) be a polynomial in one variable, having rational coefficients. The
predicate "p(x) has a zero in the closed interval [a, b] for given rationals a, b" is
decidable. This means that there is an algorithm (use Sturm's theorem) that decides
whether an arbitrary polynomial with rational coefficients has a zero between two
given rational numbers. Moreover, there is an algorithm to compute the number of
real zeros of such a polynomial between two rationals.
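Sturm's theorem is itself algorithmic enough to sketch: form the chain p, p′, and successive negated remainders, then subtract the counts of sign changes at the two endpoints. The sketch below (function names mine) uses exact rational arithmetic and counts the distinct zeros in the half-open interval (a, b], assuming p does not vanish at either endpoint:

```python
from fractions import Fraction

def polyval(p, x):
    # Evaluate a polynomial given as coefficients, highest degree first.
    r = Fraction(0)
    for c in p:
        r = r * x + c
    return r

def polyrem(a, b):
    # Remainder of polynomial division a / b (highest-first coefficients).
    a = a[:]
    while len(a) >= len(b):
        q = a[0] / b[0]
        for i in range(len(b)):
            a[i] -= q * b[i]
        a.pop(0)  # the leading coefficient is now zero
    while len(a) > 1 and a[0] == 0:
        a.pop(0)
    return a or [Fraction(0)]

def sturm_count(p, a, b):
    """Count the distinct real zeros of p in (a, b], assuming p(a) != 0
    and p(b) != 0: build the Sturm chain p, p', -rem(...), ... and
    compare the numbers of sign changes at the two endpoints."""
    p = [Fraction(c) for c in p]
    d = len(p) - 1
    chain = [p, [c * (d - i) for i, c in enumerate(p[:-1])]]  # p and p'
    while any(chain[-1]):
        r = [-c for c in polyrem(chain[-2], chain[-1])]
        if not any(r):
            break
        chain.append(r)
    def sign_changes(x):
        vals = [polyval(q, Fraction(x)) for q in chain]
        vals = [v for v in vals if v != 0]
        return sum(1 for s, t in zip(vals, vals[1:]) if s * t < 0)
    return sign_changes(a) - sign_changes(b)

# p(x) = x^2 - 2 has one zero in (0, 2] and two in (-2, 2]
print(sturm_count([1, 0, -2], 0, 2), sturm_count([1, 0, -2], -2, 2))
```

Exact rationals matter here: floating point can flip the sign of a value that should be exactly zero and so miscount the sign changes.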
A related problem concerns Diophantine equations. Let p(x 1 , . . . , x n ) be a poly-
nomial in n variables having integer coefficients. The equation
p(x 1 , . . . , x n ) = 0
For example, the predicate "the integer x is a perfect cube" is Diophantine, as this
predicate can be expressed as ∃x1(x − x1^3 = 0), where p(x, x1) = x − x1^3 is a polynomial
with integer coefficients.
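As a toy illustration of this predicate, one can search for the witness x1 directly. The bound below is artificial, a genuine semi-decision procedure would enumerate all of Z without one, and the function name is mine:

```python
def diophantine_witness(p, x, bound):
    """Bounded search for an integer witness x1 with p(x, x1) = 0.
    The bound only keeps this sketch finite; a true semi-decision
    procedure would enumerate all integers."""
    for x1 in range(-bound, bound + 1):
        if p(x, x1) == 0:
            return x1
    return None

# "x is a perfect cube": p(x, x1) = x - x1**3
cube = lambda x, x1: x - x1 ** 3
print(diophantine_witness(cube, 27, 10))  # finds the witness 3
```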
Let R be a unary predicate. We say that R(x) is computably enumerable if the set
{x : x is an integer and R(x)} is a computably enumerable set, that is, if all those
integers satisfying the property R form a computably enumerable set. Matiyasevich
proved that any predicate R(x), where x ranges over integers, is computably enumer-
able iff it is Diophantine.
9.8 Other Interesting Problems 315
9.67. Let M be a TM and w be a string. Construct a formula φM,w(x) in the language
of Th(N, +, ×) that contains a single free (unquantified) variable x so that the
sentence ∃xφM,w(x) is true iff M accepts w.
9.68. Give a map reduction of the acceptance problem for TMs to Th(N, +, ×) by
constructing the formula φM,w(x) from ψ(M)#ψ(w) using the preceding problem.
Conclude that the theory Th(N, +, ×) is undecidable.
Our approach in this chapter has been greedy. We first proved Rice’s theorem and
then used it to show that many interesting problems about Turing machines are un-
solvable. The reduction technique has been quite helpful in arriving at the unsolv-
ability results. We then discussed solvability in the domains of finite automata and
PDAs. As many unsolvable problems have been shown to be so by a reduction of
Post’s correspondence problem (PCP), we have discussed the unsolvability of PCP.
9.9 Summary and Additional Problems 317
This has been demonstrated by the proof of unsolvability of the validity problem in first
order logic, obtained by reducing PCP to it. We then mentioned many interesting problems that
have been shown to be unsolvable.
Reducibility relations were discussed by Post; see [105, 111, 123]. Rice's
theorem was proved in [109, 110]. The idea of valid computation histories was called
T-predicates by Kleene [64, 65]. Undecidable properties of CFLs are from [8, 11,
18, 33, 40, 52]. PCP was introduced in [106].
To give a bit of history, the origin of Theory of Computation dates back to 1900.
It came in the form of a problem:
Find an algorithm to determine whether a given polynomial with integer coefficients has an
integral root.
9.73. Show that the halting problem for deterministic LBAs is decidable.
9.74. Let Σ = {0, 1}. A binary relation R on Σ ∗ is called decidable if the language
{v2w : (v, w) ∈ R} is decidable. Show that a language L ⊆ Σ ∗ is decidable iff
there exists a decidable binary relation R on Σ ∗ such that L = {v ∈ Σ ∗ : (v, w) ∈
R for some w ∈ Σ ∗ }.
9.75. Show that the language {ψ(D) : D is a DFA that accepts w^R whenever it
accepts w} is decidable. Express the last statement as the solvability of a problem
about DFAs.
9.76. Suppose L and L′ are regular languages over Σ. Show that the following problems
are solvable:
(a) Is shuffle(L, L′) = L?
(b) Is tail(L) = L?
9.79. A useless state in a PDA is a state that is never entered during any computation.
Show that the problem of determining whether an arbitrary PDA has a useless state
is solvable. Express the problem as a language.
9.80. Show that both the conditions spelled out below are necessary for proving, in
Rice's theorem, that L_P is undecidable:
(a) For any TMs M, M′ with L(M) = L(M′), if ψ(M) ∈ L_P, then ψ(M′) ∈ L_P.
(b) There are TMs M, M′ such that ψ(M) ∈ L_P but ψ(M′) ∉ L_P.
9.83. Show that the language {ψ(M)#ψ(M′)#0^{k+1} : M, M′ are TMs where L(M) ∩
L(M′) contains at least k strings} is computably enumerable but not decidable.
9.85. Prove that for any two languages A and B, there exists a language C such that
both A and B are Turing reducible to C.
9.86. Let P(·) be a property of computably enumerable languages such that the set of
all TMs whose languages satisfy the property P is computably enumerable. If A ⊆ B
are languages such that P(A) holds, then show that P(B) holds.
9.87. Let R be the set of all TMs that accept regular languages. Show that neither R
nor its complement in the set of all TMs is computably enumerable.
9.88. Let A be the set of all TMs that halt on all inputs. Show that neither A nor its
complement is computably enumerable.
9.93. Show that for an arbitrary unrestricted grammar G, the problem of deciding
whether L(G) = (L(G))∗ is unsolvable, by reducing the acceptance problem to this
problem.
9.94. Prove that the emptiness problem for LBAs, that is, whether a given LBA ac-
cepts the empty language, is unsolvable.
9.95. Let G be a regular grammar and G′ be any grammar. Is the problem L(G) =
L(G′) solvable when G′ is
(a) regular?
(b) context-free?
(c) linear?
(d) context-sensitive?
(e) unrestricted?
9.97. Show that determining whether the union of two DCFLs is a DCFL is an
unsolvable problem.
9.98. Prove that it is undecidable whether a given LBA halts on every input. Interpret
the result for context-sensitive grammars.
9.99. Prove that the finiteness problem for TMs reduces to that of LBAs. Use this
to show that the problem of determining whether a given LBA accepts a regular
language is unsolvable.
9.102. Let M be an arbitrary TM. Show that the problem of determining whether
there exists a TM of shorter description than M accepting the same language L(M)
is unsolvable. [See Problem 10.146.]
9.103. Consider only TMs with a left end (a dead end) beyond which the read–write
head can never move. Note that these machines can simulate the standard ones. Restrict
these TMs by requiring that they can never overwrite the input string; they can
write only on the blank squares to the right of the input. These restricted machines
accept only the regular languages (prove this). Show that, given such a restricted TM, it is
impossible to give an algorithm for constructing an equivalent DFA.
9.105. Informally describe a multitape TM that enumerates the set of all n such that
Mn accepts wn , where we have the sequence of all TMs and the sequence of all inputs
written as Mi ’s and wi ’s. Can a TM enumerate all these n in numerical order?
9.107. Unsolvability of most problems about CFLs can be seen via PCP instead of
using valid computation histories. Consider the PCS P = {(u1, v1), . . . , (un, vn)} over
an alphabet Σ. Write B = {u1, . . . , un} and C = {v1, . . . , vn}. Take new symbols
a1, a2, . . . , an; these are called the index symbols. Define two CFGs GB, GC having
the only nonterminals B, C, respectively, and the productions as in the following:
B → u1Ba1 | u2Ba2 | · · · | unBan, B → u1a1 | u2a2 | · · · | unan;
C → v1Ca1 | v2Ca2 | · · · | vnCan, C → v1a1 | v2a2 | · · · | vnan.
Let LB = L(GB) and LC = L(GC). Write L̄B and L̄C for the complements of
LB and of LC in the set (Σ ∪ {a1, . . . , an})∗. Finally, we construct another
CFG GBC having nonterminals S, B, C with S as the start symbol, and the productions
S → B | C along with all productions of GB and all of GC. Attempt the
following:
(a) Prove that the CFG GBC is ambiguous iff the PCS P has a match.
(b) L̄B and L̄C are CFLs.
(c) Taking L(G1) = LB and L(G2) = LC, show that the problem of determining
whether for arbitrary CFGs G1, G2, L(G1) ∩ L(G2) = ∅ is unsolvable.
(d) Taking L(G1) = L̄B ∪ L̄C and L(G2) = (Σ ∪ {a1, . . . , an})∗, show that it is
unsolvable whether the language of a given CFG G1 is regular.
(e) Show that the problem of determining for an arbitrary CFG G and a regular expression
E, whether L(E) ⊆ L(G), is unsolvable.
(f) Show that it is unsolvable whether a given CFG is ambiguous or not.
(g) Taking L(G1) = (Σ ∪ {a1, . . . , an})∗ and L(G2) = L̄B ∪ L̄C, show that it is
unsolvable whether for arbitrary CFGs G1, G2, L(G1) ⊆ L(G2).
(h) Prove that L̄B ∪ L̄C is regular iff it equals (Σ ∪ {a1, . . . , an})∗ iff the PCS P has
no match.
(i) Show that it is unsolvable whether or not a CFG generates a regular language.
(j) Prove that it is unsolvable whether the complement of a CFL is also a CFL.
9.108. Show that the set of all ψ(G) for CFGs G that generate at least one palindrome
is undecidable.
9.109. Prove that there exist two languages neither of which is Turing reducible to
the other.
9.110. A Thue system is a finite set of two-element sets of strings over some alphabet
Σ, say T = {{u1, v1}, . . . , {un, vn}}. Let x, y ∈ Σ∗ be strings. We say that x can
be obtained from y in T if for some k with 1 ≤ k ≤ n, there are z1, z2 ∈ Σ∗ such that
either x = z1ukz2 and y = z1vkz2, or x = z1vkz2 and y = z1ukz2. Notice
that if x can be obtained from y, then y can also be obtained from x. We then extend
this relation to an equivalence relation by taking its reflexive and transitive closure;
that is, we say that x is equivalent to y in T if there is a finite sequence of strings
x 1 , x 2 , . . . , x m such that x can be obtained from x 1 , each x i can be obtained from
x i+1 , and finally, x m can be obtained from y. The word problem for Thue systems is
the problem of determining whether given a Thue system T and given two strings
x, y, the string x is equivalent to y in T or not. Show that the word problem for Thue
systems is unsolvable.
9.111. A weak–strong match in a PCS is a match where the concatenated string w can
be obtained by concatenating the n first components and also by concatenating the
n second components; the pairs used in the two cases are required to be the same
but need not be used in the same order. That is, we allow the n second components
to be permuted before concatenation. Is the problem of determining whether a PCS
has a weak–strong match solvable?
9.112. A Post’s tag system is a set of pairs of strings over an alphabet Σ with a
designated start string. If (u, v) is a pair of strings and w is any string in Σ ∗ , we say
that uw wv; this defines a move in the system. Show that it is unsolvable, given a
Post’s tag system and a string x, whether x ε in zero or more moves.
9.113. There is a two-dimensional version of PCP, called the tiling problem. We have
an infinite number of tiles, each one square unit, with which we have to tile the first
quadrant. The only restrictions are that a special tile, called the origin tile, is to be put
at the lower left corner, that only certain tiles can touch each other horizontally,
and that only certain others can touch vertically. Tiles may not be rotated. Thus
the infinitely many tiles in a tiling system comprise an infinite supply of each of
finitely many types of tiles. The problem is to determine, given a tiling system,
whether the first quadrant can be tiled. A formal version of a tiling
system is as follows. A tiling system is a quadruple (T, t0, H, V ), where T is a finite
set of tiles (in fact, the types, having an infinite supply of each type), t0 ∈ T, and
H, V ⊆ T × T. A tiling by such a system is a function f : N × N → T satisfying
f (0, 0) = t0 , and
( f (m, n), f (m + 1, n)) ∈ H, ( f (m, n), f (m, n + 1)) ∈ V, for all m, n ∈ N.
Show that the problem “given a tiling system, whether there exists a tiling by that
system” is unsolvable.
9.114. Suppose we think of the tiles as being determined by the colors of their four
edges, and that two similarly colored edges are allowed to touch each other. Further
suppose that we are allowed to rotate the tiles and turn them over. Show that any
nonempty set of tiles, with or without a given origin tile, can be used to tile the first
quadrant.
9.115. Formulate precisely and then prove the following extension of Rice’s theorem:
every nontrivial property of pairs of computably enumerable sets is undecidable.
9.116. The set of propositions is defined over a countably infinite set of symbols
{¬, ∧, ∨, →, ↔, ), (}∪{ p0 , p1 , . . .}, using the rules that each pi is a proposition, and
if pi , p j are propositions, then so are ¬ pi , and ( pi βp j ), where β ∈ {∧, ∨, →, ↔}.
It defines a CFG over an infinite alphabet. Using the prefix lemma (Lemma 2.1),
show that this CFG is unambiguous.
9.117. Each atomic proposition pi in the last problem can be rewritten as p followed
by i + 1 occurrences of 0. That is, the alphabet {¬, ∧, ∨, →, ↔, ), (} ∪ {p0, p1, . . .} can
be generated as a language over Σ = {¬, ∧, ∨, →, ↔, ), (, p, 0}. Finally, the set of
propositions can be generated as a language over the alphabet Σ. Construct a CFG
with terminals in Σ that generates the set of all propositions.
9.118. Construct a CFG for generating the set of all first order formulas.
9.119. The numbers 34 and 10 are the ASCII codes for the double quote and the
newline, respectively. Understand, without executing, what the following C program
does:
char *s="char *s=%c%s%c;%cmain( ) {printf(s,34,s,34,10,10);}%c";
main( ) {printf(s,34,s,34,10,10);}
9.120. Turing’s proof of Gödel’s Incompleteness Theorem: Let T (N, +, ×) denote the
set of all statements in Peano’s arithmetic (N with addition and multiplication) which
have proofs, the theorems of N. Let T h(N, +, ×) denote the set of all true statements
of Peano’s arithmetic, the first order theory of N. Show that
(a) T (N, +, ×) is computably enumerable.
(b) T h(N, +, ×) is not computably enumerable.
(c) There exists at least one true statement in Peano’s arithmetic which cannot be
proved.
9.121. Show that there exists a computable total function from N to N that cannot be
proved to be so in Peano’s arithmetic.
9.122. Describe two TMs M and N such that when started on any input w, the TM
M outputs ψ(N) and the TM N outputs ψ(M).
9.123. Here I ask you to prove a fixed-point theorem, called the recursion theorem.
It is as follows. Use the notation Sx for the TM whose encoding, ψ(Sx), is the binary
string x. Also, for any TM T, let T(x) denote the contents of the tape of T when
T has halted on input x; if T does not halt on x, then let T(x) remain undefined.
Further, suppose f : {0, 1}∗ → {0, 1}∗ is a computable total function. Then there
exists a string u ∈ {0, 1}∗ such that L(Su) = L(S_{f(u)}). Prove it by showing the
following:
(a) Let f be computed by the TM M. Let N be the TM that on input x computes a
description (encoding) of a TM K, where K on input y does the following:
K constructs Sx and simulates Sx on input x; if it halts, then K simulates M on
Sx(x), interprets the result of the computation, that is, identifies M(Sx(x))
as the description of a TM, and then simulates that TM on the original input
y, accepting or rejecting as that machine accepts or rejects, respectively.
Show that N is a total TM and that L(S_{N(x)}) = L(S_{f(Sx(x))}).
(b) Let w be a description of the machine N, that is, N = Sw. Show that u = N(w)
satisfies L(Su) = L(S_{f(u)}). [Such a u is called a fixed point of f.]
9.124. Give a short proof of Rice’s theorem using the recursion theorem.
9.125. A TM M is called a minimal TM if, among all TMs accepting the same language
L(M), M has the fewest states. Prove that there does not exist an
infinite computably enumerable set of minimal TMs. Formulated another way: write
MIN_TM = {ψ(M) : M is a minimal TM}, let A ⊆ MIN_TM be any infinite set, and
show that A is not computably enumerable.
9.126. Let f : {0, 1}∗ → {0, 1}∗ be a computable function. Prove that there is a TM
M such that f(ψ(M)) = ψ(M′) for some TM M′ equivalent to M. (The TM M is a
fixed point of f.)
9.127. In the preceding problem, suppose f is the function that interchanges the two
halt states of the machine encoded as a binary string of the form ψ(N) for some TM
N. What would be a fixed point of f? Give an example of such a fixed point.
9.128. To construct the sentence "This sentence is not provable" in the theory
Th(N, +, ×), we use the following facts:
Fact 1: Let M be a TM and w be a string. A formula φM,w(x) in the language of
Th(N, +, ×) that contains a single free (unquantified) variable x can be algorithmically
constructed so that the sentence ∃xφM,w(x) is true iff M accepts w.
Fact 2: There exists an algorithm A that checks whether a suggested proof of a sentence
in Th(N, +, ×) is indeed a proof.
Now, construct a TM M that operates as follows:
On any input, M obtains its own description ψ(M) via the recursion theorem. It
then constructs the sentence X = ¬∃xφM,0(x) using the above-mentioned facts.
Next, it runs the algorithm A on X. Finally, if A reports success, then it accepts; if A
halts but reports failure, then it rejects.
Show that
(a) The sentence X is true iff M does not accept 0.
(b) M cannot find a proof of X.
(c) Gödel’s Incompleteness Theorem: X is true but not provable.
9.129. Matiyasevich’s Theorem states that all computably enumerable sets are Dio-
phantine. That is, if A ⊆ N is computably enumerable, then there is a polynomial
p(x, y1 , . . . , yn ) in n + 1 variables and with integer coefficients such that “x ∈ A iff
∃y1 · · · ∃yn ( p(x, y1, . . . , yn ) = 0)” holds.
Use Matiyasevich’s theorem to show that a set A ⊆ N is computably enumerable
iff A is the set of all nonnegative values taken by some polynomial p(x 1 , . . . , x n )
with integer coefficients; for values of x 1 , . . . , x n taken from N.
9.130. Prove that there is a computably enumerable set A ⊆ N such that the complement of A is infinite
but contains no infinite computably enumerable subset. Ironically, such subsets of N
are called simple sets.
9.131. An oracle for a language L is an external device that can report whether any
given string is or is not in L. For instance, total TMs serve as oracles for decidable
languages. An oracle TM is a TM that has an additional capability of querying an
oracle. For example, an oracle TM that decides the acceptance problem can be used
to decide the emptiness problem (how?). Show the following:
(a) If L is Turing reducible to L′, then an oracle TM that decides L′ can be used to
decide L. (What about the converse?)
(b) Oracle TMs decide more than just the decidable languages.
(c) There are languages that cannot be decided by oracle TMs.
9.9 Summary and Additional Problems 325
9.134. Prove that each level of the arithmetic hierarchy is strictly contained in the
next. That is, Σ_n^0 ∪ Π_n^0 ⊆ Δ_{n+1}^0 but Σ_n^0 ∪ Π_n^0 ≠ Δ_{n+1}^0.
9.135. From the previous problem it follows that Σ_1^0 and Π_1^0 are not comparable.
Prove that Σ_n^0 and Π_n^0 are not comparable for any n ∈ Z+, with respect to set
inclusion.
9.136. A set A is called RE-hard if every computably enumerable set is map reducible
to A. A set B is called RE-complete if B is both computably enumerable and RE-hard.
326 9 Algorithmic Solvability
10.1 Introduction
Now you know that certain problems can be solved by algorithms and certain others
cannot be. In discussing the issue of algorithmic solvability, you have used the
Church–Turing thesis, which asks you to believe in the dictum that algorithms are
nothing but total Turing machines (TTM) that use potentially infinite tapes, an ideal
which we will probably never realize. Even if we could realize this ideal, there is a
possibility that a TM may take an impracticably long time to give an answer. This
can happen even if the machine operates very fast.
For example, suppose you want to visit 50 tourist destinations spread out all
over the earth. You know the cost of traveling from each destination to every other.
Depending upon which place you visit first, and which one next, you have 50!
possibilities from which you want to select the one that is cheapest. The number
of possibilities to be explored is too large: 50! > 100^25. If computing the cost of one
itinerary visiting all 50 destinations takes a billionth of a second (too fast indeed),
then it will require no less than 10^25 human lifetimes to determine the cheapest
itinerary. Thus, algorithmic solvability alone does not suffice; we are interested in
practical solvability.
But what does it mean to say that a problem is practically solvable? Do we have
to run an algorithm for each instance of a solvable problem and then determine its
practical solvability? This proposal is useless, in fact an impossible job. The simple
reason is that, in all probability, a problem has infinitely many instances.
Again, how do we measure time in this case? Measuring real time is useless.
As technology progresses, the computing time for basic operations reduces; how then
do we ascertain that our estimate is still applicable? Moreover, time is not the only
factor in discussing efficiency or practicality of algorithms. You may be interested
in the working space an algorithm might demand to solve a problem. The issue of
practicality might also depend on the particular computational model we choose.
We fix Turing machines as our computational model. We will measure time in
terms of the number of steps a TM takes in getting a solution. Of course, we must
A. Singh, Elements of Computation Theory, Texts in Computer Science, 327
c Springer-Verlag London Limited 2009
328 10 Computational Complexity
decide whether our TM uses a single tape or multiple tapes, and whether the TMs
are deterministic or nondeterministic.
And what about inputs? The same TM may arrive at a solution in different numbers
of computational steps on different inputs. To keep matters simple, we
will discuss the time taken by a TM as a function of the lengths of inputs rather than
in absolute terms, or as a function of the inputs themselves. However, for different inputs
of the same length, a TM may use different numbers of computational steps. Thus, this
scheme of measuring time as a function of the lengths of inputs might fail. In that
case, we consider the maximum time taken by a TM as a function of the lengths of
inputs.
We could also decide to take the minimum time or the average time instead of
the maximum time. But minimum time corresponds to those inputs on which
the TM behaves at its best, and if our input is not of that type, we may
not get a solution in time; it would be unrealistically optimistic. The minimum time
analysis gives only an optimistic lower bound on the time required by a TM for
performing a job. Just as the minimum time corresponds to the best case, the maximum
time corresponds to the worst case; it is pessimistic.
Average time may be a better alternative, but it is too vague. As we cannot run the
algorithm on each of the possibly infinite inputs, we require a predefined distribution
on the set of all inputs (relevant to the problem) in order to assess efficiency of
an algorithm in the average case. Also, which average would we take: the arithmetic
mean, the geometric mean, or some (but which?) weighted average? Moreover, all these
averages depend on the presumed distribution on the space of inputs. But then which
distribution is the appropriate one?
On the other hand, the pessimistic method of analyzing the worst case would give
us a guarantee that a particular algorithm would never take more time than what is
estimated. It will give an upper bound for all the cases at once, with a possibility that
an algorithm might perform better than expected. We agree to play safe by sticking
to the worst case analysis. That is, we will take the maximum time taken by a TM as
a function of the lengths of inputs.
Look at Fig. 10.1. Here, we assume that σ is an input symbol while τ is not. How
much time does the TM M take to reverse a string? If you run the machine on the
input ab, a string of length 2, the computation of M proceeds as follows:
[Computation trace of M on input ab: in successive configurations, M writes the reversal to the left of the input and then erases the input, halting with b ba on the tape.]
Our interest is in finding out how many steps of computation are performed by M
on an input of length n. For each input symbol σ, the machine M moves its
read–write head to the right once, writes τ, performs L_b twice, and then writes
σ. The first L_b uses m movements if the mth symbol is currently being scanned.
(This was not shown correctly in the above trace of the computation of M on input ab.)
10.2 Rate of Growth of Functions 329
[Fig. 10.1. Transition diagram of the string-reversing machine M, composed of the moves R, L_b, and R_τ.]
The next L_b takes the same number of movements again. The simulation of R_τ involves
2m + 1 movements, including the initial b, which is in the middle now. Thus,
if the mth symbol is currently being scanned, then it is processed in altogether
1 + 1 + m + m + (2m + 1) + 1 = 4m + 4 steps. Here, we assume that M takes the same
amount of time in moving its read–write head one square to the left or right as in
writing or erasing a symbol on the scanned square.
After the reversal of the string has been written down to the left of the input string,
M erases the input and comes back to the b that is now to the left of the reversed
string. For an input of length n, we see that after the reversal of the string, 2n + 1
squares on the tape are occupied. M comes to the middle blank after erasing the
input. This involves n left movements and n writings of b. Next, it comes to
the left of the reversed string in n + 1 left movements, totaling 3n + 1 units of time.
That is, the total time taken by M is
Σ_{m=1}^{n} (4m + 4) + 3n + 1 = 4n(n + 1)/2 + 4n + 3n + 1 = 2n^2 + 9n + 1.
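The arithmetic in this closed form is easy to check numerically; here is a quick sanity check (a Python sketch of ours, not part of the text):

```python
# Sanity check of the step count for the string-reversing machine M:
# processing the mth symbol costs 4m + 4 steps, the final cleanup costs
# 3n + 1 steps, and the total should match 2n^2 + 9n + 1.

def total_steps(n: int) -> int:
    """Add the per-symbol costs term by term, as in the text."""
    return sum(4 * m + 4 for m in range(1, n + 1)) + 3 * n + 1

def closed_form(n: int) -> int:
    return 2 * n * n + 9 * n + 1

# The two expressions agree for every input length we try.
for n in range(200):
    assert total_steps(n) == closed_form(n)
```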
Exercise 10.1. Determine whether f (n) = O(g(n)) and/or g(n) = O( f (n)), where
(a) f (n) = n^3 and g(n) = 9n^2 + 5n − 1.
(b) f (n) = log n and g(n) = 1000.
(c) f (n) = n log n and g(n) = (log n)^3.
(d) f (n) = n^15 and g(n) = (1.5)^n.
Exercise 10.2. Determine whether f (n) = o(g(n)) and/or g(n) = o( f (n)), where the
functions f (n), g(n) are as in Exercise 10.1.
Other order symbols for denoting rates of growth are big-Ω, little-ω, and
big-Θ. Let f, g : N → N. We say that f (n) = Ω(g(n)) iff there exist constants
c > 0 and m ∈ N such that for each n ≥ m, f (n) ≥ cg(n); f (n) = ω(g(n))
iff lim_{n→∞} [ f (n)/g(n)] = ∞; and f (n) = Θ(g(n)) iff f (n) = O(g(n)) and g(n) =
O( f (n)).
These comparisons of numerical functions for large arguments are informally re-
ferred to as asymptotic comparisons. You would have, by now, guessed that there
are functions that are not asymptotically comparable to each other. For example, the
functions f, g defined by f (n) = n and g(n) = n^(1+sin n) are not asymptotically
comparable. Can you show it?
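A numerical illustration (ours, in Python; it suggests the claim but does not prove it): the ratio g(n)/f (n) = n^(sin n) oscillates between arbitrarily large and arbitrarily small values, so neither f (n) = O(g(n)) nor g(n) = O( f (n)) can hold.

```python
import math

# The ratio g(n)/f(n) = n^(sin n) for f(n) = n and g(n) = n^(1 + sin n).
# When sin n is close to 1 the ratio is huge; when sin n is close to -1
# it is tiny. Both happen for infinitely many n.

def ratio(n: int) -> float:
    return n ** math.sin(n)

big = max(ratio(n) for n in range(2, 10000))
small = min(ratio(n) for n in range(2, 10000))

assert big > 100     # g(n) overtakes any fixed multiple of f(n) ...
assert small < 0.01  # ... and also drops far below f(n)
```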
As 2n^2 + 9n + 1 = O(n^2), we see that the TM M reverses a string in O(n^2) time. We
now make further definitions that will help us in talking about complexity issues. We
make it a convention to write f (n), g(n), s(n), t(n) for functions f, g, s, t : N → N.
10.1. In each of the following cases, determine whether f (n) = O(g(n)), g(n) =
O( f (n)), f (n) = o(g(n)), and/or g(n) = o( f (n)):
(a) f (n) = n^2 + 2n + 7 and g(n) = n^5.
(b) f (n) = log n and g(n) = 2009.
(c) f (n) = n log n and g(n) = (log n)^3.
(d) f (n) = n log n and g(n) = n^2.
(e) f (n) = √n and g(n) = n.
(f) f (n) = n^2 and g(n) = n(log n)^2.
(g) f (n) = 3^n and g(n) = 2^{cn}, for some c > 1.
(h) f (n) = 2^n and g(n) = n!.
10.4. Show that if f (n) = O(n^2), then ( f (n))^2 = O(n^4). Generalize this result to one
involving the powers of n and the big-O notation.
10.5. Suppose for solving a problem we have two algorithms, of which one runs
in f (n) time and the other runs in g(n) time, where f (n) = O(g(n)) but g(n) ≠
O( f (n)). Is it possible that the algorithm whose running time is g(n) is preferable
for values of n up to 2009, but after that the other algorithm is better? If so, propose
a new algorithm that would be better than both of the earlier algorithms.
on n-time t(n). When NTMs are used to decide languages, we will use the terms
nondeterministic space complexity and nondeterministic time complexity for space
and time complexities, respectively. The corresponding space and time complexity
classes are defined as follows:
DS(s(n)) = {L : L is a language decided by an O(s(n)) space multitape DTM},
NS(s(n)) = {L : L is a language decided by an O(s(n)) space multitape NTM},
DT(t(n)) = {L : L is a language decided by an O(t(n)) time multitape DTM},
NT(t(n)) = {L : L is a language decided by an O(t(n)) time multitape NTM}.
The class DS is also called DSPACE; similarly, DT, NS, and NT are called
DTIME, NSPACE, and NTIME, respectively. If a language is decided by a deterministic
TM in 5n^2 + 3n − 2 space, then it is in the class DS(n^2), as 5n^2 + 3n − 2 = O(n^2).
However, the language is also in O(n^{m+2}) space, as 5n^2 + 3n − 2 = O(n^{m+2}) for each
m ∈ N. Possibly, the language is not in O(n) space, but it cannot be so asserted in the
absence of any further information. For example, if it so happens that we construct
in a clever way another DTM that decides the same language in O(n) space (linear
space), then we have this sharpened result. This does not contradict the earlier
statement that the language is in O(n^2) space; it only improves it. A similar remark holds
for time complexity.
Sometimes space complexity is defined in terms of the total space requirement
instead of the “working space,” as we have defined it. In that case, the minimum space
requirement is linear, as the input itself consumes (linear) space. For time complexity,
we have taken the input tape into account also. For all practical purposes, time complexity is at least
linear; this is so whenever the input is completely read.
Notice that there are trivial cases though; for example, the halting machine, which
never reads anything but simply halts. The following examples will clarify the
notions.
Example 10.1. Which complexity classes does a regular language belong to?
Solution. Let D = (Q, Σ, δ, s, F) be a DFA that accepts the given regular language
L. We assume that the blank b is not in Σ, that the new states S, h, and the rejecting
halt state are not in Q, and that δ is a total function. We construct a DTM M whose
states are those of Q together with S, h, and the rejecting halt state, with input alphabet
Σ, tape alphabet Σ ∪ { b }, and initial state S, where the transition function δ′ is
constructed from δ as in the following:
Initially, have the transition δ′(S, b) = (s, R) in M. Next, corresponding to each
transition δ(q, σ) = p of D, add a transition δ′(q, σ) = ( p, R) to M. Finally, corresponding
to each final state r ∈ F of D, add the transition δ′(r, σ) = (h, σ), and
corresponding to each nonfinal state t ∈ Q − F, add a transition taking t to the
rejecting halt state, for each σ ∈ Σ.
This means that M initially transfers control to the initial state of D, then simulates
D, and finally enters the accepting state h if D has entered a final state;
otherwise, M enters its rejecting state. It is clear that M decides L. In deciding L, it
has used n + 1 squares on the tape, but out of these the input itself occupies n squares.
Thus, it has used only one additional square, the initial b. That is, L ∈ DS(1). Notice
that had we defined space complexity by counting all squares including the input, the
space complexity would have been n + 1 instead of 1.
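The constant-extra-space character of this simulation is easy to see in code. Here is a sketch in Python (ours; the particular DFA chosen, accepting binary strings with an even number of 1s, is a hypothetical example): apart from the input itself, only the current state is stored.

```python
# Simulating a DFA, as the DTM M of Example 10.1 does: one left-to-right
# pass over the input, remembering only the current state. The DFA below
# accepts binary strings containing an even number of 1s.

def decide_even_ones(w: str) -> bool:
    delta = {("e", "0"): "e", ("e", "1"): "o",   # "e": even number of 1s so far
             ("o", "0"): "o", ("o", "1"): "e"}   # "o": odd number of 1s so far
    state = "e"                                  # initial state s
    for sym in w:
        state = delta[(state, sym)]              # one step per input square
    return state == "e"                          # "e" is the only final state

assert decide_even_ones("") and decide_even_ones("1010")
assert not decide_even_ones("10")
```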
10.3 Complexity Classes 335
Example 10.2. What are the space and time complexities of the language L =
{ucu^R : u ∈ {0, 1}^∗}?
Solution. We design a two-tape DTM M for deciding L. Input is given on the first
tape as usual. M first copies the string u from ucu^R to the second tape, symbol by
symbol. At the end of this step, it has ucu^R on the first tape, scanning the symbol c,
and u on the second tape, scanning the last symbol of u.
Next, it goes right leaving the c on the first tape and starts comparing the strings on
the two tapes, symbol by symbol, the head on the first tape moving to the right while
the head on the second tape moves to the left. If a successful matching is obtained,
then M enters its accepting halt state h. If at any stage there is a failure, then M
enters its rejecting halt state. For example, if there is no c in the input string, then
M would encounter a b (before a c) at the end of the input string, in which
case M enters the rejecting state. Similarly, if one of the heads does not find a symbol
matching the other’s, then M enters the rejecting state.
In this process, we see that M has used only m squares on the second tape when
the input has length 2m + 1. And this happens when it accepts the string. In the case
of nonacceptance, the input may not be symmetric about c, or at the worst, there may
not be any c at all in the input.
This is the worst case of an input for M. In such a case, M would have finished
copying the whole input to the second tape before going to the rejecting state.
Thus, the maximum space required for M to work (leaving aside the input) is n, where
n is the length of the input string. Hence L ∈ DS(n), that is, L ∈ DS(O(n)).
Similarly, each symbol can be copied in four steps: one for reading (first
head), one for writing (second head), and two for moving to the next square (both
heads). Matching a symbol is done in two steps (only moving the heads). Thus the
whole input requires 4n + 2n steps plus some constant number of steps for handling
the initial and final stages. That is, L ∈ DT(O(n)).
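The two-tape strategy can be mirrored in Python (our sketch; a string stands in for the second tape, and the reversed comparison stands in for the two heads moving in opposite directions):

```python
# Deciding L = {u c u^R : u in {0,1}*}: split at the first c, then compare
# the part after c against u read backwards. One pass to copy and one pass
# to compare give O(n) time; the copy of u is the O(n) "second tape".

def decide_ucuR(w: str) -> bool:
    u, sep, rest = w.partition("c")
    if sep != "c":                      # no c at all: reject at the blank
        return False
    if any(ch not in "01" for ch in u):
        return False
    return rest == u[::-1]              # rest must be exactly u reversed

assert decide_ucuR("01c10") and decide_ucuR("c")
assert not decide_ucuR("01c01") and not decide_ucuR("0110")
```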
In Example 10.2, if we use a single-tape DTM, then the complexity may increase.
For example, consider a DTM that matches the first symbol with the last symbol of
the input and erases them, continuing till it encounters a c, or a 0, or a 1, or a b,
and then decides accordingly. This DTM would not require any space beyond
that occupied by the input string, where it also writes symbols.
However, it uses O(n^2) steps of computation at the worst, because
it has to go back and forth for matching and erasing the matched symbols. Thus, we
may assert that L ∈ DS(n) and L ∈ DT(O(n^2)) when a single-tape machine is used.
Exercise 10.3. Find exactly how much time the DTM with a single tape (described
above) takes in deciding the language L of Example 10.2.
Example 10.3. Let L = {ψ(M) : M is an NFA and L(M) ≠ ∅}, where ψ(M) is the
binary encoding of M. What is the nondeterministic time complexity of L?
Clever design of algorithms can save space and time. For example, consider the
problem of determining whether a given string is generated at all by a given CFG.
In general, there can be an exponential number of derivations to try. However, there
is a cubic time algorithm that solves this problem; it is known as the CYK algorithm. The
algorithm uses a technique called dynamic programming. For each substring v of the input
string u, it computes the set of all nonterminals that can possibly generate v, proceeding
inductively on the length of v. See Problem 10.88.
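The dynamic programming idea can be sketched in Python (our sketch; the grammar shown, in Chomsky normal form and generating {0^n 1^n : n ≥ 1}, is a hypothetical example, and the rule tables `unit_rules` and `binary_rules` are our own encoding):

```python
# CYK by dynamic programming: T[(i, j)] is the set of nonterminals that
# generate the substring word[i:j]; substrings are processed in order of
# increasing length, giving O(n^3) time for a fixed grammar.

def cyk(word, unit_rules, binary_rules, start="S"):
    n = len(word)
    if n == 0:
        return False        # S -> eps would need a separate check
    T = {}
    for i in range(n):      # length-1 substrings: unit rules A -> a
        T[(i, i + 1)] = {A for (A, a) in unit_rules if a == word[i]}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            T[(i, j)] = set()
            for k in range(i + 1, j):               # split point
                for (A, B, C) in binary_rules:      # rules A -> BC
                    if B in T[(i, k)] and C in T[(k, j)]:
                        T[(i, j)].add(A)
    return start in T[(0, n)]

# A CNF grammar for {0^n 1^n : n >= 1}: S -> AB | AX, X -> SB, A -> 0, B -> 1.
units = [("A", "0"), ("B", "1")]
binaries = [("S", "A", "B"), ("S", "A", "X"), ("X", "S", "B")]

assert cyk("01", units, binaries) and cyk("0011", units, binaries)
assert not cyk("011", units, binaries)
```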
The language {0^n 1^n : n ∈ N}, as a subset of {0, 1}^∗, can be decided in O(n^2) time, as
all one has to do is delete a 0 from the left and a 1 from the right, iteratively.
However, there are O(n log n) time algorithms that decide the same language.
For example, here is one:
ALGORITHM 0n1n
Exercise 10.5. Show that Algorithm 0n1n decides the language {0^n 1^n : n ∈ N} in
O(n log n) time.
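The standard idea behind such an O(n log n) algorithm can be sketched in Python (our sketch; the parity bookkeeping is done arithmetically here rather than by crossing off symbols on a tape): after checking the shape 0*1*, repeatedly verify that the counts of surviving 0s and 1s have the same parity, and halve both. The counts halve in each of the O(log n) rounds, and on a tape each round is a single O(n) scan.

```python
# Deciding {0^n 1^n} the O(n log n) way: comparing parities and crossing
# off every other symbol reads the two counts off bit by bit in binary;
# the parities agree in every round iff the counts are equal.

def decide_0n1n(w: str) -> bool:
    if any(ch not in "01" for ch in w):
        return False
    if "10" in w:                  # a 0 to the right of a 1: wrong shape
        return False
    zeros, ones = w.count("0"), w.count("1")
    while zeros > 0 or ones > 0:
        if zeros % 2 != ones % 2:  # parities differ, so the counts do too
            return False
        zeros //= 2                # cross off every other 0 ...
        ones //= 2                 # ... and every other 1
    return True

assert decide_0n1n("") and decide_0n1n("000111")
assert not decide_0n1n("001")
```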
Notice that if we show that a certain decision problem has time complexity t(n), it
will remain so for all time to come. How sharp this estimate is depends on the
currently available algorithms. Tomorrow we may get a sharper result proving that the
same decision problem has time complexity τ(n), where τ(n) = o(t(n)). That will
not invalidate our result of today; it will only sharpen it.
with minimum weight from among the unselected ones, provided it does not form
a cycle with the selected ones. The process terminates when n − 1 edges have been
selected. How much time does Kruskal’s algorithm take in constructing a minimum
spanning tree? Express the time taken in big-O notation.
Recall that DS(s(n)) denotes the class of languages, or of the decision problems that
are represented by these languages, that are decided by multitape total deterministic
Turing machines with a read-only tape where the input is given, using a maximum
of s(n) number of squares on the other tapes taken together, on any input of length
n. Similar is the class NS(s(n)), where the machines are nondeterministic and s(n) is
the maximum number of working squares used in any branch of computation of the
NTM. We begin with the question I asked you in the solution of Example 10.3:
Is L = {ψ(M) : M is an NFA and L(M) ≠ ∅} ∈ DS(O(n^2))?
As L ∈ NS(O(n)), the answer to this question is in the affirmative, due to the following
theorem.
Theorem 10.1. (Savitch) Let s : N → N be such that s(n) ≥ log n for each n ≥ 1.
Then NS(s(n)) ⊆ DS((s(n))^2).
Proof. Let M be a two-tape NTM with a separate read-only tape on which inputs are
given, and a working tape. For an input w to M define a configuration of M on w as a
quintuple (q, σ, u, τ, v), where q is the current state of M, σ is the currently scanned
symbol on the input (on the read-only tape), τ is the currently scanned symbol on
the working tape, u is the string to the left of τ , and v is the string to the right of τ on
the working tape. The input w, as it is, is not a part of the configuration of M on w;
it is on the read-only tape. The configurations on w reflect only the working tape, the
current state, and the currently scanned symbol.
Suppose M decides a language L in space s(n). We construct a DTM D that
tests whether one configuration of M on w can yield another configuration
of M on w within a specified number of steps. Notice that by solving this problem
for C1 the initial configuration of M on w, C2 the final configuration of M
on w, and t the maximum number of steps that M can use, we can determine whether
M accepts the input w. The following algorithm, named yields, when
implemented as a DTM, does just this.
ALGORITHM yields(C1, C2, t)
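The heart of yields is a divide-and-conquer recursion: C1 yields C2 within t steps iff some middle configuration is reachable from C1 in about t/2 steps and itself yields C2 in about t/2 steps. Here is a Python sketch (ours; configurations are modeled abstractly as a finite set with a successor function, and the names `configs` and `succ` are our own). Only O(log t) recursion frames are alive at any moment, which is where the (s(n))^2 space bound comes from.

```python
def yields(c1, c2, t, configs, successors):
    """Can configuration c1 reach c2 in at most t steps (with a step of
    slack when t is odd)? Only the O(log t) active recursion frames are
    stored, mirroring the space analysis in Savitch's theorem."""
    if t == 0:
        return c1 == c2
    if t == 1:
        return c1 == c2 or c2 in successors(c1)
    half = (t + 1) // 2
    return any(yields(c1, cm, half, configs, successors) and
               yields(cm, c2, half, configs, successors)
               for cm in configs)

# A toy "machine" with six configurations arranged in a line: i -> i + 1.
configs = list(range(6))
succ = lambda c: {c + 1} if c < 5 else set()

assert yields(0, 5, 8, configs, succ)       # reachable in 5 <= 8 steps
assert not yields(3, 0, 8, configs, succ)   # edges only go forward
```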
Savitch’s theorem uses a multitape DTM to show that a language that is decided
in O(s(n)) nondeterministic space can be decided in O((s(n))^2) deterministic
space. It says nothing about how big the constant in the O((s(n))^2) is.
However, this constant can be made as small as you like, by suitably encoding the
tape alphabet of D.
Theorem 10.2. (Tape Compression) Let L ∈ DS(s(n)). Then for any constant c > 0,
there exists a DTM D such that D decides L having space complexity c · s(n).
Savitch’s theorem requires that the space complexity s(n) of a language be at least
log n. For large n, (log n)^2 ≠ O(log n), though log n ≤ (log n)^2. Hence, there is no
guarantee that the complexity classes NS(log n) and DS(log n) are equal. We make a
shorthand:
LS = DS(log n), NLS = NS(log n).
In other words, LS consists of languages that are decidable in deterministic
logarithmic space, and NLS is the class of languages that are decidable in
nondeterministic logarithmic space. LS and NLS are also written as LOGSPACE
and NLOGSPACE, respectively.
Define CoNLS as the class of languages whose complements are in NLS. It can
be shown that the complement of PATH is in NLS, that is, PATH ∈ CoNLS. Because
of NLS-completeness of PATH, it follows that CoNLS = NLS. It is also known that
CoNLS is a proper subclass of DPS though it is not yet known whether L S is a proper
subclass of NLS. The results in this regard may be summarized as
10.19. Design a DTM that accepts the language {wcw^R : w ∈ {a, b}^∗} in O(n) space.
10.21. Prove: Suppose the language L is accepted by an NTM with space complexity
s(n). Then for any c > 0, L is accepted by an NTM with space complexity c · s(n).
10.25. Which of the following are true, which false, and which ones you think cannot
be determined to be true or false on the basis of the material discussed so far?
(a) NS(log n) ⊆ DS((log n)^2).
(b) NS(n) DS(n^2 log n).
(c) NS(n^2) ⊆ DS(n^4).
(d) NS(2^n) ⊆ DS(4^n).
(e) DS(n^6) ⊆ DS(n^8).
10.26. Suppose that f and g are functions that can be computed in logarithmic space.
Show that the composition f ◦ g can be computed in logarithmic space. Deduce the
transitivity of log space reduction, that is, the composition of two log space reduc-
tions is a log space reduction.
10.29. Assume that there is an NLS-complete language L such that L ∈ NLS. Show
that CoNLS = NLS.
Exercise 10.6. In the proof of Theorem 10.4, where did we use the assumption that
t(n) ≥ n for each n ∈ N?
Theorem 10.5. Let t : N → N be a function with t(n) ≥ n for each n ∈ N. Then, for
each multitape NTM that decides a language in time t(n), there exists a single tape
NTM that decides the same language in time O((t(n))^2).
Proof. Earlier, we described how a three-tape DTM can simulate an NTM.
The same simulation serves our purpose here. Notice that the simulation uses a brute-force
approach; it explores the branches of computation of the NTM in a breadth-first
way.
Let M be a single tape NTM that decides a language L in time t(n) on inputs
of length n. Each branch of computation of M has depth at most t(n). As M is
nondeterministic, for a state–symbol pair there can be many choices of the next
move; nonetheless, they are finite in number. Suppose c is the maximum number of
such choices of next moves over the state–symbol pairs in the transition relation of
M. That means each node in the computation tree of M on a given input of length n
M. That means, each node in the computation tree of M on a given input of length n
has at the most c children. Thus the computation tree has a maximum of ct(n) number
of leaves.
The DTM D limits its search to the point when a success is met, that is, the
relevant portion of the tree, looked at breadth-first way, has depth at most t(n). The
number of nodes in this portion of the computation tree of M cannot exceed t(n) ct(n)
nodes. Hence, the running time of D is O(t(n)ct(n) ). However, D is a three-tape
DTM. By Theorem 10.5, there is a DTM that decides L in O((t(n)ct(n) )2 ) time, which
is same as 2 O(t(n)) .
Exercise 10.7. Show that if t : N → N satisfies t(n) ≥ n for each n ∈ N, and c > 1
is any real number, then O((t(n)c^{t(n)})^2) = 2^{O(t(n))}.
Because of the absence of any speedup result regarding the simulation of a non-
deterministic decider, by a deterministic decider, we may expect a gap between DPT
and NPT. This is unlike the scenario in space complexity, where Savitch’s theorem
does the trick. But what are DPT and NPT?
DPT is the class of languages that are decidable in polynomial time by single tape
DTMs. Similarly, NPT consists of all languages that are decidable in polynomial
time by single tape NTMs. From Theorem 10.5, it does not matter whether we use
multitape or single tape machines for deciding languages as long as our interest lies
in bigger classes such as DPT or NPT.
We make further shorthands, calling DPT as P and NPT as NP. Along with P and
NP, we also define EXP, the class of languages decidable in exponential time.
10.5 Time Complexity 345
14. An algorithm to decide whether a given string is at all generated by a given CFG.
(CYK Algorithm, not proved in this book; see Problem 10.88 in Sect. 10.10.)
15. An algorithm to decide whether a given string is at all accepted by a given PDA.
In view of the above results we do not yet have any answers to the following
questions:
1. Does there exist a polynomial time algorithm to solve any of the problems in
Theorem 10.8 above?
2. Does there exist a polynomial time algorithm to construct an NFA with least
number of states from a given NFA so that they accept the same language?
3. Does there exist a polynomial time algorithm to construct a DFA with least
number of states from a given NFA so that they accept the same language?
10.35. Construct a list of TMs that run in linear time such that every language decid-
able in linear time is accepted by some TM on the list. Build a TM that diagonalizes
over the list. Then conclude that there exists a decidable language that is not decid-
able in linear time.
10.36. Show that there is no computable total function f (n) such that every decidable
language is in DT( f (n)).
10.37. What are the space and time complexities of the language {a^n b^n : n ∈ N}?
Improve your algorithm to get the best space-complexity bound. Improve your first
algorithm again to obtain the best time-complexity bound. Can the algorithm be
improved so that the best bounds for both space and time complexity are achieved
simultaneously?
10.38. Show that there exist polynomial time algorithms for the following:
(a) Constructing a PDA equivalent to a given CFG.
(b) Constructing a CFG equivalent to a given PDA.
(c) Constructing a CFG in CNF equivalent to a given CFG.
(d) Converting a PDA accepting by final states to one accepting by empty stack.
(e) Converting a PDA accepting by empty stack to one accepting by final states.
10.39. Can you find a polynomial time algorithm for converting a CFG to one in
GNF? If yes, give such an algorithm and show why it is polynomial. If no, give
reasons.
10.40. Show that whether a CFG generates any string at all (emptiness check) can be
checked in linear time.
10.41. Show that the problem “whether a symbol is reachable in a CFG” can be
checked in linear time.
10.42. Show that the problem “whether a given nonterminal in a CFG is nullable
(derives ε)” can be checked in linear time.
10.45. Give an algorithm for deciding whether the language of a given CFG is infi-
nite. Can you find a polynomial time algorithm?
10.46. Show that the relative primality problem, that is, “Given two integers
m, n > 1, are m and n relatively prime?”, is in P.
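A sketch of why this problem is in P (ours, in Python): Euclid's algorithm computes gcd(m, n) in a number of division steps bounded by a polynomial in the number of digits of m and n, and m, n are relatively prime iff gcd(m, n) = 1.

```python
# Relative primality via Euclid's algorithm. Each iteration performs one
# division with remainder, and the number of iterations is O(log min(m, n)),
# i.e., polynomial in the length of the input.

def relatively_prime(m: int, n: int) -> bool:
    while n != 0:
        m, n = n, m % n   # gcd(m, n) = gcd(n, m mod n)
    return m == 1

assert relatively_prime(35, 64)
assert not relatively_prime(21, 14)   # their gcd is 7
```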
10.49. Prove that the graph connectedness problem, that is, “Is a given undirected
graph connected?”, is in P.
10.50. Show that the problem of determining whether a given undirected graph has
a triangle is in P. This is called the triangle problem for graphs.
10.52. Which of the following are true, which false, and which ones you think cannot
be determined to be true or false on the basis of the material discussed so far?
(a) DT(2^n) DT(n^2 2^n).
(b) NT(n) ⊆ DS(n^2).
(c) NT(n^6) ⊆ DS(n^8).
(d) EXP ⊆ DT(2^{n^{O(1)}}) = ∪_{k>0} DT(2^{n^k}).
A polynomial time verifier is a verifier that runs in time polynomial in the length
of w. A string such as c in the ordered pair (w, c) is called the certificate, or a proof
of membership. We say that L is a polynomial time verifiable language if there is a
polynomial time verifier for L.
Informally, the string ψ(w)#ψ(c) is replaced by the ordered pair (w, c), as the
former is simply an encoding of the latter. Though, in a verifier, the time spent on
manipulating the certificate c is not explicitly charged, c has to be accessed and used.
Thus the length of c must be bounded by a polynomial in the length of w. We will
illustrate this with an example from propositional logic.
Recall that a propositional language is built upon a finite number of symbols,
called atomic propositions, { p0, p1, . . . , pn }, using the connectives ¬, ∧, ∨, →, ↔.
When a proposition uses only the connectives ¬, ∨, and ∧, we say that it is a boolean
formula. In a propositional language, a literal is defined as any atomic proposition
or the negation of an atomic proposition.
10.6 The Class NP 349
( p2 ∨ ¬ p3 ∨ p5 ) ∧ ( p1 ∨ p2 ∨ p10 ∨ ¬ p5 ) ∧ ( p0 ∨ p3 ∨ p6 ∨ p8 )
is a cnf. It is easy to see that every proposition is logically equivalent to one in cnf.
For example, the proposition ( p2 → p3 ) ↔ p4 is equivalent to the cnf ( p2 ∨ p4 ) ∧
(¬ p3 ∨ p4 ) ∧ (¬ p2 ∨ p3 ∨ ¬ p4 ). Here, we say that two propositions x and y are
equivalent (x ≡ y) iff for every interpretation i (an assignment of the values 0 and
1 to the atomic propositions), i (x) = i (y). Please go through Sect. 9.7
if you have forgotten how connectives are interpreted. It can be easily shown that
x → y ≡ ¬x ∨ y, x ↔ y ≡ (¬x ∨ y) ∧ (x ∨ ¬y), ¬(x ∧ y) ≡ ¬x ∨ ¬y, ¬(x ∨
y) ≡ ¬x ∧ ¬y and ¬¬x ≡ x. Moreover, the connectives ∧ and ∨ distribute over each
other, that is, (x ∧ y) ∨ z ≡ (x ∨ z) ∧ (y ∨ z) and (x ∨ y) ∧ z ≡ (x ∧ z) ∨ (y ∧ z). The
connectives ∧ and ∨ are associative and commutative. Using these equivalences, any
proposition can be brought to an equivalent cnf. For example,
( p2 → p3 ) ↔ p4
≡ (¬ p2 ∨ p3 ) ↔ p4
≡ (¬(¬ p2 ∨ p3 ) ∨ p4 ) ∧ ((¬ p2 ∨ p3 ) ∨ ¬ p4 )
≡ ((¬¬ p2 ∧ ¬ p3 ) ∨ p4 ) ∧ (¬ p2 ∨ p3 ∨ ¬ p4 )
≡ (( p2 ∧ ¬ p3 ) ∨ p4 ) ∧ (¬ p2 ∨ p3 ∨ ¬ p4 )
≡ (( p2 ∨ p4 ) ∧ (¬ p3 ∨ p4 )) ∧ (¬ p2 ∨ p3 ∨ ¬ p4 )
≡ ( p2 ∨ p4 ) ∧ (¬ p3 ∨ p4 ) ∧ (¬ p2 ∨ p3 ∨ ¬ p4 ).
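The chain of equivalences above can be double-checked by brute force over all eight interpretations of p2, p3, p4; here is a small Python check (ours, not part of the text):

```python
from itertools import product

# Truth-table check that (p2 -> p3) <-> p4 is equivalent to the cnf
# (p2 v p4) & (~p3 v p4) & (~p2 v p3 v ~p4) derived above.

def original(p2, p3, p4):
    implies = (not p2) or p3
    return implies == p4              # x <-> y: x and y take the same value

def cnf(p2, p3, p4):
    return (p2 or p4) and ((not p3) or p4) and ((not p2) or p3 or (not p4))

# The two propositions agree on every interpretation.
assert all(original(*i) == cnf(*i) for i in product([False, True], repeat=3))
```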
A proposition is called satisfiable if it has a model, that is, if there is an
interpretation that assigns it the value 1. Of course, we follow the rules that i (x ∧ y) = 1
iff i (x) = i (y) = 1; i (x ∨ y) = 0 iff i (x) = i (y) = 0; and i (¬x) = 1 iff i (x) = 0.
For example, the cnf ( p2 ∨ p4 ) is satisfiable as the interpretation i with i ( p2) =
i ( p4) = 1 evaluates ( p2 ∨ p4 ) to 1. Similarly, (¬ p3 ∨ p4 ) is also satisfiable as the in-
terpretation i with i ( p3) = 0, i ( p4 ) = 1 satisfies it (assigns it to 1). Even the
interpretation i with i ( p3) = i ( p4 ) = 1 satisfies (¬ p3 ∨ p4 ). You can see that the dis-
junctive clause (¬ p2 ∨ p3 ∨ ¬ p4 ) is also satisfiable. But this does not make the cnf
x = ( p2 ∨ p4 ) ∧ (¬ p3 ∨ p4 ) ∧ (¬ p2 ∨ p3 ∨ ¬ p4 ) satisfiable. To show that it is
satisfiable, we must have at least one interpretation j that satisfies all the disjunc-
tive clauses in it simultaneously. This also happens here, as the interpretation j with
j ( p2) = 0, j ( p3) = 0, j ( p4) = 1 satisfies all these disjunctive clauses. Hence the cnf
x is satisfiable.
Moreover, there is a way to construct a cnf from a given boolean formula in poly-
nomial time so that the boolean formula is satisfiable iff the constructed cnf is sat-
isfiable; see Problem 10.68 at the end of this section. If you are mystified, do not
worry; we will be concerned with cnfs only, forgetting how we have arrived at that.
The satisfiability problem for cnfs is defined below.
for a string w. Notice that once this choice of a branch of computation is fixed, M
behaves deterministically. The verifier V uses the pair (w, c) and simulates M on w
with the nondeterministic choice specified by c. If this branch of computation of M
accepts w, then V goes to the accept state, else V enters the reject state. Clearly, V is
a polynomial time verifier for L, as M takes polynomial time to decide L.
Conversely, suppose V is a polynomial time verifier for L. Without loss of generality, assume that V, as a Turing machine, runs in time at most n^k on any input of length n. We construct an NTM M which, on an input w of length n, nondeterministically chooses a certificate c of length at most n^k. Then, M simulates V on the pair (w, c) and accepts w if V accepts (w, c); else M rejects w. Clearly, M decides L, as V does, in time O(n^k).
Theorem 10.10 formalizes the intuition that NP is the class of languages (decision problems) for which a suggested solution can be verified in polynomial time to determine whether it is indeed a solution.
Related to the class NP is the class CoNP, where L ∈ CoNP iff its complement L̄ ∈ NP. The question of whether NP is closed under complementation leads to this class; see the problems below.
10.54. Prove that P is closed under the operations of union, intersection, comple-
mentation, concatenation, reversal, inverse homomorphism, and Kleene star.
10.57. It is known that P ⊆ NP ⊆ DPS and P ⊆ CoNP ⊆ DPS. Which of the following are known and which are yet unknown?
(a) P ≠ ∅.
(b) NP − P ≠ ∅.
(c) CoNP − P ≠ ∅.
(d) (NP ∩ CoNP) − P ≠ ∅.
(e) DPS − (NP ∪ CoNP) ≠ ∅.
10.59. Show that SUBSUM, the subset sum problem, that is, “Given a set A of numbers m1, m2, . . . , mk and a target number t, does there exist a subset B of A such that the sum of all numbers in B is t?” is in NP.
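In the spirit of Theorem 10.10, membership of SUBSUM in NP can be seen by exhibiting a polynomial time verifier: the certificate is a candidate subset, given, say, by indices into A. A sketch (the function name and the index convention are ours):

```python
def verify_subsum(a, t, certificate):
    """a: list of numbers, t: target, certificate: list of indices into a.
    A polynomial time check that the indexed subset of a sums to t."""
    if len(set(certificate)) != len(certificate):
        return False                      # indices must name a genuine subset
    if any(i < 0 or i >= len(a) for i in certificate):
        return False
    return sum(a[i] for i in certificate) == t

print(verify_subsum([3, 7, 12, 5], 15, [0, 1, 3]))  # 3 + 7 + 5 = 15 -> True
print(verify_subsum([3, 7, 12, 5], 15, [1, 2]))     # 7 + 12 = 19 -> False
```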
10.7 NP-Completeness
Proof. You need not read this proof; it is easy to work it out. As L′ ∈ P, there is a polynomial time DTM M2 that decides whether a string w ∈ Σ∗ of length m is in L′ or not in time O(m^k) for some k ∈ N. As L ≤P L′, there is a function f : Σ∗ → Σ∗ and a DTM M1 such that M1 outputs f(u) for a given u ∈ Σ∗ of length m in O(m^r) time for some r ∈ N. Moreover, u ∈ L iff f(u) ∈ L′.
Now, the M1 in the combined DTM M1 → M2, upon input u ∈ Σ∗ of length n, computes f(u) in O(n^r) time. Then the M2 in the combined machine decides whether f(u) ∈ L′ (or equivalently, u ∈ L) in time O((n^r)^k). Thus, M1 → M2 decides whether u ∈ L in polynomial time.
If we think of P as the class of decision problems that are easily solvable, as many computer scientists think, then polynomial time reduction of L to L′ says that L is as easy as L′. Looked at another way, we may say that L′ is as hard as L. What about the problems in NP? If L is polynomial time reducible to L′ and L′ ∈ NP, then the polynomial time verifier for L′ combined with the polynomial time reduction (the machine that computes it) would provide a polynomial time verifier for L. Hence Theorem 10.11 can be read correctly replacing P by NP throughout. That is,
A little thought shows that even P can be replaced by EXP, in the same way. In fact, any class of decision problems that possibly take more than polynomial time remains so under a polynomial time reduction. An important class of problems, called NP-hard problems, is obtained by employing this idea. This class contains all those decision problems that are at least as hard to solve as any typical NP-problem.
A language L is called NP-hard if every language L′ ∈ NP is polynomial time reducible to L. A language L is called NP-complete if L ∈ NP and L is NP-hard.
NP-hardness looks like a big constraint, because to show that a language is NP-hard, one must have a polynomial time reduction of each and every language in NP to this language. Nonetheless, there are NP-hard languages (or NP-hard problems). The first such problem to be discovered was SAT.
The problems in NP can differ wildly from each other. The fact that they are in NP guarantees that each of them is decided in polynomial time by an NTM. The key idea in the construction of a polynomial time reduction of any problem in NP to SAT is to encode the computation of a polynomial time NTM as a cnf. This encoding is analogous to the symbolization of an argument in English into propositional logic.
Let M be an NTM with a single tape having Q = {q0, q1, . . . , qm} as the set of states and the tape alphabet Γ = {γ1, γ2, . . . , γp}, where γ1 is the blank symbol b. We take q0 as the initial state, q1 as the accepting state h, and q2 as the rejecting state. The transition relation Δ is a finite subset of (Q × Γ) × (Q × (Γ ∪ {L, R})). Let w be an input to M with length n. Assume that M is a polynomial time decider that decides the language L ⊆ Σ∗, where Σ is some subset of Γ − {γ1}. There is a polynomial f : N → N such that each w ∈ Σ∗ of length n is decided by M (w ∈ L when M accepts w, and w ∉ L when M rejects w) in time f(n).
The string w is given as an input to M the standard way, the read–write head of M
scanning the b just to the left of w. Number this square containing the b as 0, to the
left of it as −1, −2, −3 · · · , and to the right of it as 1, 2, 3, · · · in succession. As M
decides w in time f (n), only the tape squares numbered − f (n) to f (n) might, in the
worst case, be used by M during its computation on w. Without loss of generality,
assume that M accepts w in time f (n) exactly, as otherwise, when M accepts w in
less than f (n) steps, we can let it work till f (n)th step in a trivial manner such as
changing its states, or going left and right, etc.
Exercise 10.8. Let M be an NTM that accepts a string w ∈ L of length n in less than
f (n) steps, where L is any language. Show that there is an NTM N that accepts any
string w ∈ L of length n in exactly f (n) steps.
x^0_{01} : The 0th tape square contains γ1 (= b) at the 0th step of computation of M.
y_{00} : The read–write head of M is scanning the 0th tape square at the 0th step of computation of M.
z_{00} : M is in state q0 at the 0th step of its computation.
The initial configuration of M on input w = γ_{i1} γ_{i2} · · · γ_{in} with initial state q0, when the read–write head is scanning the 0th square containing the symbol γ1 (= b), is described as follows:
A : x^0_{(−f(n))1} ∧ x^0_{(−f(n)+1)1} ∧ · · · ∧ x^0_{(−1)1} ∧ z_{00} ∧ y_{00} ∧ x^0_{01} ∧ x^0_{1 i1} ∧ x^0_{2 i2} ∧ · · · ∧ x^0_{n in} ∧ x^0_{(n+1)1} ∧ · · · ∧ x^0_{f(n)1}.
This also says that each square beyond the nth one contains γ1 .
B_k : [⋁_{−f(n) ≤ i ≤ f(n)} y_{ik}] ∧ [⋀_{−f(n) ≤ i < l ≤ f(n)} (¬y_{ik} ∨ ¬y_{lk})],
B : B_1 ∧ B_2 ∧ · · · ∧ B_{f(n)}.
As ¬y_{ik} ∨ ¬y_{lk} ≡ y_{ik} → ¬y_{lk} ≡ y_{lk} → ¬y_{ik}, the cnf B_k says that at the kth step of computation of M, only one tape square is scanned. The cnf B says that at each step of computation of M, only one tape square is scanned.
C_{ik} : ⋀_{1 ≤ j < l ≤ p} (¬x^k_{ij} ∨ ¬x^k_{il}),   C : ⋀_{−f(n) ≤ i ≤ f(n)} ⋀_{1 ≤ k ≤ f(n)} C_{ik}.
As ¬x^k_{ij} ∨ ¬x^k_{il} ≡ x^k_{ij} → ¬x^k_{il} ≡ x^k_{il} → ¬x^k_{ij}, C_{ik} says that at the kth step of computation of M, the ith square contains only one symbol from Γ. Then, C asserts that at each step of computation of M, each square contains only one symbol from Γ.
D_k : ⋀_{0 ≤ r < l ≤ m} (¬z_{rk} ∨ ¬z_{lk}),   D : ⋀_{0 ≤ k ≤ f(n)} D_k.
The cnf D_k says that at the kth step of computation, M stays in only one state. Thus, D means that at any step of computation, M stays in only one state.
Next, using our vocabulary, we translate each transition of M into a cnf. There are three possible types of transitions: symbol writing, moving left, and moving right. We take them in turn.
For a symbol-writing transition t = (qα, γβ, qθ, γτ) ∈ Δ, we take the cnf
S^k_{αβθτ} : (¬x^k_{iβ} ∨ ¬y_{ik} ∨ ¬z_{αk} ∨ z_{θ(k+1)}) ∧ (¬x^k_{iβ} ∨ ¬y_{ik} ∨ ¬z_{αk} ∨ y_{i(k+1)}) ∧ (¬x^k_{iβ} ∨ ¬y_{ik} ∨ ¬z_{αk} ∨ x^{k+1}_{iτ}).
This says that if M is in state qα scanning the symbol γβ on the ith square at the kth step of computation, then it goes to state qθ, writing γτ on the ith square at the (k + 1)th step of its computation. This is so as the cnf S^k_{αβθτ} is equivalent to (z_{αk} ∧ y_{ik} ∧ x^k_{iβ}) → (z_{θ(k+1)} ∧ y_{i(k+1)} ∧ x^{k+1}_{iτ}).
For transitions of the form t = (qα, γβ, qθ, L) ∈ Δ, we take the cnf
L^k_{αβθ} : (¬x^k_{iβ} ∨ ¬y_{ik} ∨ ¬z_{αk} ∨ z_{θ(k+1)}) ∧ (¬x^k_{iβ} ∨ ¬y_{ik} ∨ ¬z_{αk} ∨ y_{(i−1)(k+1)}) ∧ (¬x^k_{iβ} ∨ ¬y_{ik} ∨ ¬z_{αk} ∨ x^{k+1}_{iβ}).
The cnf L^k_{αβθ} is equivalent to (z_{αk} ∧ y_{ik} ∧ x^k_{iβ}) → (z_{θ(k+1)} ∧ y_{(i−1)(k+1)} ∧ x^{k+1}_{iβ}). It thus says that if M is in state qα scanning the symbol γβ on the ith square at the kth step of computation, then it goes to state qθ and moves to the (i − 1)th square at the (k + 1)th step of its computation, the ith square retaining the symbol γβ.
The transitions t = (qα, γβ, qθ, R) ∈ Δ of the third type are translated to
R^k_{αβθ} : (¬x^k_{iβ} ∨ ¬y_{ik} ∨ ¬z_{αk} ∨ z_{θ(k+1)}) ∧ (¬x^k_{iβ} ∨ ¬y_{ik} ∨ ¬z_{αk} ∨ y_{(i+1)(k+1)}) ∧ (¬x^k_{iβ} ∨ ¬y_{ik} ∨ ¬z_{αk} ∨ x^{k+1}_{iβ}).
This says that if M is in state qα scanning the symbol γβ on the ith square at the kth step of computation, then it goes to state qθ and moves to the (i + 1)th square at the (k + 1)th step of its computation, the ith square retaining the symbol γβ. Notice that the proposition that captures this description is (z_{αk} ∧ y_{ik} ∧ x^k_{iβ}) → (z_{θ(k+1)} ∧ y_{(i+1)(k+1)} ∧ x^{k+1}_{iβ}), which is equivalent to the cnf R^k_{αβθ}.
Thus the transitions in Δ, and the computation of M following these transitions with correct updating of states, the position of the read–write head, and the currently scanned square, are encapsulated by the cnf
E : ⋀_{−f(n) ≤ i ≤ f(n)} ⋀_{1 ≤ j ≤ p} ⋀_{0 ≤ k ≤ f(n)} ⋀_{t ∈ Δ} E^t_{ijk},
where E^t_{ijk} is the cnf S^k_{αβθτ}, L^k_{αβθ}, or R^k_{αβθ} (with β = j) according to the type of the transition t.
The acceptance of w by M at the f(n)th step is expressed by the cnf F : z_{1 f(n)}. The subscript 1 in z_{1 f(n)} refers to the state q1, which is the accepting halt state. Notice that this statement takes a simpler form due to our assumption that M halts at exactly the f(n)th step.
Proof. By Theorem 10.9, SAT ∈ NP. To see that SAT is NP-hard, we use the above construction. Our aim is to give a polynomial time reduction of any language in NP to SAT. Any language in NP is decided by a polynomial time verifier. Thus, the membership problem for any such language is the same as the acceptance problem of such a verifier. It is enough to see how such verifiers can be reduced to SAT instances.
Let M be a polynomial time verifier for a language L ∈ NP. Suppose w ∈ Σ∗ is given as an input to the NTM M with a certificate c. The certificate supplies the choice of a branch of computation for this w. By following this branch, M decides whether w ∈ L or not. Now, M accepts w iff the cnf X = A ∧ B ∧ C ∧ D ∧ E ∧ F is satisfiable. Notice that the certificate c correctly constructs the cnf E. However, we must show that the construction above takes polynomial time. To this end, we count the time taken by each of the six cnfs.
The number of atomic propositions, that is, the x^k_{ij}, y_{ik}, z_{rk}, is altogether of the order O((f(n))^2). Thus A contains at most O((f(n))^2) literals. B_k similarly has O(f(n)) literals; so B has at most O((f(n))^2) literals. Similarly, C has O((f(n))^2), D has O(f(n)), E has O((f(n))^2), and F has O(f(n)) literals, at most. Thus X contains at most O((f(n))^2) literals. As O((f(n))^2) atomic propositions can be encoded as binary numbers of O(log f(n)) digits, the length of the cnf X is of the order O((log f(n) · f(n))^2), which is (bounded above by) of the order O((f(n))^4), a polynomial in n, the length of w. This completes the proof.
Proof. Suppose L is NP-hard. Let L̂ ∈ NP. Then there is a polynomial time reduction f from L̂ to L. As L ≤P L′, there is a function g, which is a polynomial time reduction of L to L′. Then the composition g ◦ f is a polynomial time reduction of L̂ to L′.
reduction, starting from SAT. However, what is the fun in knowing these difficult but next-to-easy problems? The reason is the following result.
I repeat. If you can show that any one of the NP-complete problems is in P, then all NP-problems will have to be in P. Constructively, if you can find a polynomial time algorithm for deciding any one of the NP-complete problems, then that algorithm can be used to solve each and every problem in NP. For example, if SAT ∈ P, then P = NP. As all approaches to prove SAT ∈ P have failed, people believe that probably P ≠ NP. However, to show this, one must prove that SAT ∉ P, or that some NP-problem is not in P.
10.60. Suppose that f and g are functions that can be computed in time ⋃_{i=0}^{∞} DT(n^i). Show that the composition f ◦ g can be computed in time n^k for some k ∈ N. Deduce the transitivity of polynomial time reduction.
10.61. Show that if every NP-hard language is DPS-hard, then DPS = NP.
10.62. Let L be any nonempty proper subset of Σ ∗ for an alphabet Σ. Show that if
P = NP and L ∈ P, then L is NP-complete.
10.63. Let L 1 , L 2 , L 3 be three languages over some alphabet Σ that does not contain
the symbol #. Assume that L 1 ∈ P, L 2 is NP-complete, and L 3 ∈ NP. Answer in
each case below (a–e) whether the given language is
(i) in P,
(ii) in NP but not NP-complete,
(iii) NP-complete,
(iv) not in NP, or
(v) none of (i-iv):
(a) L i ∪ L j , for 1 ≤ i, j ≤ 3.
(b) L i ∩ L j , for 1 ≤ i, j ≤ 3.
(c) L i L j , for 1 ≤ i, j ≤ 3.
(d) L i # L j , for 1 ≤ i, j ≤ 3.
(e) Σ ∗ − L i , for 1 ≤ i ≤ 3.
10.65. Show that the Euler cycle problem, that is, “Given an undirected graph G, does there exist a cycle in G that uses each edge exactly once?” is in P. What about the Euler cycle problem in directed graphs?
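By Euler's theorem, a polynomial time decision procedure needs only two checks: every vertex has even degree, and all edges lie in one connected component. A sketch (our own function, taking the graph as an edge list):

```python
from collections import defaultdict, deque

def has_euler_cycle(edges):
    """edges: list of pairs (u, v) of an undirected graph.
    True iff the graph has an Euler cycle (isolated vertices ignored)."""
    deg = defaultdict(int)
    adj = defaultdict(list)
    for u, v in edges:
        deg[u] += 1; deg[v] += 1
        adj[u].append(v); adj[v].append(u)
    if any(d % 2 for d in deg.values()):
        return False                       # some vertex has odd degree
    if not edges:
        return True
    # all edge-touching vertices must be in one component
    start = edges[0][0]
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w); queue.append(w)
    return all(u in seen for u in deg)

print(has_euler_cycle([(1, 2), (2, 3), (3, 1)]))          # triangle -> True
print(has_euler_cycle([(1, 2), (2, 3), (3, 1), (3, 4)]))  # odd degrees -> False
```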
10.66. The isomorphism problem in undirected graphs asks for the existence of
an isomorphism between two given undirected graphs. Show that the isomorphism
problem of graphs is in NP. Is it NP-complete?
It is generally believed that all the containments above are proper containments, though only the following are known to be so:
You are invited to try them and make your mark in the mathematical world by
proving any of the other containments to be proper or by showing that some of them
are not so.
It is also believed that NP and CoNP are different. Notice that when L is decided by an NTM in polynomial time, it does not follow that L̄ is decided by an NTM in polynomial time. This is due to the anomaly discussed in Sect. 10.5. However, you can easily see that P is closed under complementation, that is, P = CoP.
To give evidence for the conjecture that NP and CoNP are different, consider the problem UNSAT:
Exercise 10.9. Prove that NP = CoNP iff there exists an NP-complete problem whose complement is in NP.
There are many interesting problems that can be seen naturally to be in CoNP. A problem related to UNSAT is VAL, sometimes also written as TAUT:
Given a proposition, whether it is valid or not.
VAL asks whether each interpretation of the given proposition is its model. It is in CoNP because the problem UNSAT is in CoNP. The connection between VAL and UNSAT is that a proposition E is valid iff its negation ¬E is unsatisfiable.
You should not be misled here by the fact that validity of a cnf can be checked quite easily. (How?) This is because conversion of a proposition to a cnf that preserves validity is not a polynomial time process, unlike the conversion that preserves satisfiability; see the problems following this section. What about the validity problem when we have propositions in dnf, that is, when the proposition is given as a disjunction of conjunctive clauses? Is it in NP or in CoNP?
It is generally argued that P represents the easy or tractable problems. That is, if a decision problem or its corresponding language is known to be in P, then the problem can be solved for all practical purposes. It is thought that spending polynomial time is well within our reach, though polynomials of degree four or more become practically unmanageable for large inputs.
For example, determining whether an integer is prime has been proved to be solvable in time O(n^11). However, the cryptographic systems that keep their secrets under the risk of being broken by primality testing have not yet been broken. For such reasons, many believe that P is too huge a complexity class to represent the truly tractable problems.
On the other hand, the next best in our containment of complexity classes is NP, the class of decision problems that have polynomial time verifiers. It seems that most of the interesting problems are in NP, and many are, in fact, NP-complete. It would be a nice piece of magic to see that discovering a solution to a problem coincides with verifying a suggested solution (in polynomial time).
Human experience shows that the jobs of discovering a solution and verifying a suggested solution are a long way apart in terms of the time they would take. It is thus generally believed that P ≠ NP. Because of its aesthetic appeal, this problem stands out among all other open problems concerning the left-out containments of the complexity classes. We have seen that to settle it, it would be enough to show whether one (just one) of the NP-complete problems is in P or not. We do not know whether the solution to the P ≟ NP problem will be so straightforward. The NP-complete problems represent the hard problems next to the polynomial time solvable ones, and due to our ignorance of P ≟ NP, they are, in fact, at the boundary of tractable and intractable problems.
In this section, we discuss some other interesting NP-complete problems.
We begin with a SAT-related problem. A kcnf (read it as k cnf) is a cnf where each disjunctive clause has at most k literals. For example, (¬p1 ∨ p2) ∧ p3 ∧ ¬p5 is a 2cnf, whereas (p1 ∨ ¬p0) ∧ ¬p3 ∧ (p4 ∨ p6 ∨ p5) is a 3cnf. The problem kSAT is,
Given a kcnf, is it satisfiable?
It is the subproblem of SAT where each cnf has clauses with at most k literals.
First, we see that 1SAT ∈ P; how? A 1SAT instance is a 1cnf that looks like q1 ∧ q2 ∧ · · · ∧ qm, where each qi is a literal. To see whether it is satisfiable, you just assign 1 (i.e., true) to each literal, and the 1cnf is satisfiable. This procedure may fail when some qi is the negation of some qj. In that case, the 1cnf is unsatisfiable. This heuristic is directly implemented in the following algorithm:
ALGORITHM 1SAT(i, m)
Exercise 10.10. Show that 1SAT(1, m) correctly solves 1SAT. Also give an iterative algorithm to solve 1SAT.
In the worst case, the Algorithm 1SAT(1, m) will go through m recursive steps, dealing with m + (m − 1) + · · · + 1 steps in total. Thus, 1SAT solves 1SAT in O(m^2) steps, where m is the number of literals in the given 1cnf; 1SAT ∈ P.
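An iterative solution, as Exercise 10.10 asks for, can record the sign demanded for each atomic proposition and fail on the first conflict, in a single pass over the literals. A sketch (our encoding: the integer −v stands for the negation of variable v):

```python
def one_sat(literals):
    """literals: the 1cnf q1 & q2 & ... & qm, each literal an integer,
    -v meaning the negation of variable v. Returns a model or None."""
    required = {}
    for q in literals:
        v, sign = abs(q), q > 0
        if required.get(v, sign) != sign:   # some qi is the negation of some qj
            return None
        required[v] = sign
    return required

print(one_sat([1, -2, 3]))   # {1: True, 2: False, 3: True}
print(one_sat([1, -2, -1]))  # conflict on variable 1 -> None
```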
What about 2SAT? You can now try an algorithm for checking satisfiability of a 2cnf employing the same idea as in Algorithm 1SAT(1, m). Of course, it needs some modifications.
Suppose there is a single literal that is a clause in the given 2cnf. If its negation is
also a clause, then the 2cnf is unsatisfiable. So, this should be our first check. If you
get a clause, which is a single literal, but the negation of the literal is not a clause,
then in order that the cnf is satisfiable, this literal must be assigned to 1.
This amounts to omitting the literal from the cnf, as conjunction of the remaining
clauses of the cnf (omitting this single-literal clause, which has been assigned to 1)
is satisfiable iff the original 2cnf is satisfiable. Also, you can delete all those clauses
wherever this literal occurs, as all those clauses are also evaluated to 1.
Thus, the updated 2cnf is now smaller, and it contains fewer atomic propositions than the original. This is written below as Stage 1 of our procedure for 2SAT.
On the other hand, when each clause in the 2cnf is a two-literals clause, we try to
get a model of the 2cnf by assigning a chosen literal to 0, 1, alternately. Suppose we
assign a literal (Call it the chosen literal.) in a clause to 0, to begin with. Then, this
particular clause can be satisfied, provided the other literal in it is satisfied.
Notice that if the clause that contains this chosen literal has another literal, then
that other literal must also be satisfied. Further, if any clause contains the negation
of this chosen literal, then that clause is automatically satisfied as the negation of the
chosen literal is assigned 1.
It may quite well happen that the 2cnf has no model where the chosen literal is
assigned to 0. In that case, we will not succeed in satisfying the 2cnf by following
the method described in the above paragraph. We then go back and assign the chosen literal 1. In this case, the clauses that contain this chosen literal are satisfied; and any clause that contains the negation of this chosen literal is satisfied precisely when the other literal in it is satisfied.
We use these observations to describe a procedure for solving 2S AT. Let x be a
given 2cnf. We make it a point to write both the clauses p ∨ q and q ∨ p as p ∨ q,
taking commutativity of ∨ into consideration. The finer details of the procedure are
revealed in three stages as given below.
ALGORITHM 2SAT
Stage 1 : Scan the clauses in x for a single-literal clause. If x contains no single-
literal clause, then perform Stage 2. Otherwise, let p be a single-literal clause of x.
Scan further for the single-literal clause ¬ p. If ¬ p is also a clause of x, then report
that x is unsatisfiable. If ¬ p is not a clause of x, then scan x from the beginning.
Drop each clause that contains p and drop ¬ p from each clause that contains ¬ p.
That is, update x by deleting all clauses of the form p ∨ q and by replacing each
clause of the form ¬ p ∨ q by q.
Repeat Stage 1 as long as possible. If the result is empty, that is, every clause has
been deleted in the process, then report that x is satisfiable. Otherwise, the result is a
2cnf A, each clause of which has exactly two literals. Then perform Stage 2.
Stage 2 : Take the first literal of the first clause of A; say the literal p of the first clause p ∨ q of A. Scan A from the beginning. Drop the literal p from each clause that contains p, and drop from A each clause that contains ¬p. That is, update A by replacing each clause of the form p ∨ q by q, and by deleting each clause of the form ¬p ∨ q.
Call the updated 2cnf B. Execute Stage 1 repeatedly on the updated 2cnf. This
will result in one of the following three cases.
(a) Reporting that B is unsatisfiable
(b) An empty cnf
(c) A cnf C having only two-literals clauses
In the case (a), execute Stage 3 as given below. In the case (b), report that x is satis-
fiable. In the case (c), repeat Stage 2 on C.
Stage 3 : Go back to the 2cnf A. Let p ∨ q be the first clause of A. Scan each clause
of A. Update A to D by dropping each clause of the form p ∨ r . Scan D for an
occurrence of the literal ¬ p. If ¬ p does not occur in D, then execute Stage 2 on
D. Otherwise, update D to E by dropping the literal ¬ p from each clause of the
form ¬ p ∨ r . Now, E has at least one single-literal clause. Execute Stage 1 on E
repeatedly. This will halt, resulting in one of the following:
In the case (a), report that x is unsatisfiable. In the case (b), report that x is satisfiable.
In the case (c), execute Stage 2 on F.
Exercise 10.12. Show that ALGORITHM 2SAT correctly solves 2SAT. The catch is to see that in Stage 2, the 2cnf A is satisfiable if the 2cnf C is satisfiable.
Stage 1 of the procedure eliminates at least one atomic proposition, and Stages 2 and 3 together eliminate one. Moreover, unsatisfiability of the 2cnf x is reported while executing Stage 1. The worst case corresponds to the situation when all atomic propositions are eliminated one by one in Stage 2, each followed by an execution of Stage 3, and when finally x is found to be satisfiable. Initially, the 2cnf is scanned to check for a possible application of Stage 1, anyway.
Suppose n is the length of the 2cnf. Initial scanning of the 2cnf for a single-literal clause takes O(n) time. Repeated execution of Stage 1 can take, at most, O(n^2) time. Thus, executing Stage 3 for a literal, which turns out to be an unsuccessful attempt, takes O(n^2) time. It is followed by an execution of Stage 2, which again takes O(n^2) time. So, finally a literal is eliminated in O(n^2) time. In the worst case, each literal is eliminated; thus the maximum time amounts to O(n^3). Therefore, 2SAT ∈ P.
We have thus proved the following:
There is another procedure to solve 2SAT; see Problem 10.72 at the end of this section. In this respect, 3SAT is very different from 1SAT and 2SAT.
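The three stages above can be compressed into a small recursive solver: `assign` performs the clause updates of Stage 1, unit clauses are propagated first, and branching on a chosen literal (0 first, then 1) plays the roles of Stages 2 and 3. This is our sketch of the procedure, not the book's pseudocode; with full backtracking it is obviously correct, though the O(n^3) bound argued above relies on the sharper observation of Exercise 10.12.

```python
def assign(cnf, lit):
    """Set lit to 1: drop clauses containing lit, remove -lit from clauses.
    Returns the updated cnf, or None on an empty (contradictory) clause."""
    out = []
    for clause in cnf:
        if lit in clause:
            continue
        reduced = [l for l in clause if l != -lit]
        if not reduced:
            return None
        out.append(reduced)
    return out

def two_sat(cnf):
    """cnf: list of clauses with 1 or 2 integer literals. True iff satisfiable."""
    # Stage 1: propagate single-literal clauses
    while True:
        units = [c[0] for c in cnf if len(c) == 1]
        if not units:
            break
        cnf = assign(cnf, units[0])
        if cnf is None:
            return False
    if not cnf:
        return True
    p = cnf[0][0]
    # Stage 2: try p = 0; Stage 3: otherwise try p = 1
    return any(nxt is not None and two_sat(nxt)
               for nxt in (assign(cnf, -p), assign(cnf, p)))

print(two_sat([[1, 2], [-1, 2], [1, -2], [-1, -2]]))  # unsatisfiable -> False
print(two_sat([[1, 2], [-1, 3]]))                     # satisfiable -> True
```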
Proof. We use the technique of problem reduction. Let L ∈ NP. As there is already a polynomial time reduction of L to SAT, due to Theorems 10.15 and 10.16, it is enough to show that SAT ≤P 3SAT. To this end, consider a cnf x. We construct a 3cnf y corresponding to this x such that x is satisfiable iff y is satisfiable. For this construction, we consider any disjunctive clause C having more than three literals, and then construct a 3cnf C′ corresponding to C, so that C′ is satisfiable iff C is. And finally, we just take the conjunction of all these 3cnfs C′ to obtain y.
Consider a clause p ∨ q ∨ r ∨ s having four literals. We introduce a new symbol
z and take z ↔ (r ∨ s) to be valid. Then
p∨q ∨r ∨s
≡ (( p ∨ q) ∨ z) ∧ (z ↔ (r ∨ s))
≡ ( p ∨ q ∨ z) ∧ (¬z ∨ r ∨ s) ∧ (z ∨ ¬r ) ∧ (z ∨ ¬s).
By induction, each clause can now be written as a 3cnf. However, there is a more efficient way. We need only satisfiability to be preserved, not necessarily equivalence. Let C = p1 ∨ p2 ∨ · · · ∨ pm be a clause, where m > 3. Construct
C′ = (p1 ∨ p2 ∨ z1) ∧ (p3 ∨ ¬z1 ∨ z2) ∧ (p4 ∨ ¬z2 ∨ z3) ∧ · · · ∧ (pm−2 ∨ ¬zm−3 ∨ zm−2) ∧ (pm−1 ∨ pm ∨ ¬zm−2),
where z1, . . . , zm−2 are new propositional variables. We first show that C is satisfiable iff the cnf C′ is satisfiable. To show this, take an interpretation i such that i(C′) = 1. If i(C) = 0, then i(pj) = 0 for each j with 1 ≤ j ≤ m. Looking at individual clauses
of C′, we find that i(z1) = 1, and then i(z2) = 1. Continuing further, we see that i(z3) = i(z4) = · · · = i(zm−2) = 1. But then i evaluates the last clause (pm−1 ∨ pm ∨ ¬zm−2) in C′ to 0. This contradicts i(C′) = 1. Therefore, i(C) = 1.
Conversely, suppose that i(C) = 1. As C′ contains new literals (the zi's), we construct an extension of i, which we again write as i, by taking care of these zi's while the i(pj)'s remain the same. As i(C) = 1, not all pj's are 0 under i. Let k be the smallest index such that i(pk) = 1. Then, we set
It is easy to verify that under this (extended) interpretation i, each of the clauses in C′ is evaluated to 1. Therefore, i(C′) = 1 as desired.
The reduction of SAT to 3SAT first replaces each clause C having more than three literals in the given cnf by the corresponding 3cnf C′, thus obtaining a 3cnf y for the cnf x. Notice that if C has m occurrences of literals, then C′ has at most 3m occurrences of literals. (Exactly how many?) Hence the length of the corresponding 3SAT instance y is at most O(n) for a cnf of length n. Thus, the reduction is a polynomial time reduction. Therefore, SAT ≤P 3SAT, and as SAT is NP-complete, so is 3SAT.
Exercise 10.13. Show that the extended interpretation as defined in the proof of Theorem 10.18 does satisfy the 3cnf C′, if the original interpretation satisfies C.
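The construction of C′ is easy to program. The sketch below uses the standard chaining with fresh variables numbered from `next_var`; it introduces m − 3 new variables rather than the m − 2 displayed above, a common variant with the same satisfiability-preserving effect (integer literals with −v for negation are our convention):

```python
def split_clause(clause, next_var):
    """clause: a list of more than three integer literals (-v = negation).
    Returns (list of 3-literal clauses, next unused variable number)."""
    m = len(clause)
    z = list(range(next_var, next_var + m - 3))  # fresh variables z1, z2, ...
    out = [[clause[0], clause[1], z[0]]]
    for j in range(2, m - 2):                    # chain clauses (p_j v ~z v z')
        out.append([clause[j], -z[j - 2], z[j - 1]])
    out.append([clause[m - 2], clause[m - 1], -z[m - 4]])
    return out, next_var + m - 3

print(split_clause([1, 2, 3, 4, 5], 6)[0])
# [[1, 2, 6], [3, -6, 7], [4, 5, -7]]
```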
Given an undirected graph and an integer k ≥ 2, does there exist a k-clique in the
graph?
Constructing edges between the k selected vertices from G requires O(k^2) time. Checking whether each such constructed edge is really an edge in G takes linear time with respect to the number of edges in G. Hence, N is a polynomial time nondeterministic decider for CLIQUE. Therefore, CLIQUE ∈ NP.
Towards a polynomial time reduction of 3SAT to CLIQUE, we give an algorithm to construct an undirected graph G corresponding to a given 3cnf x having k disjunctive clauses. As a clause of the form p can be rewritten as p ∨ p ∨ p, and one of the form p ∨ q as p ∨ p ∨ q, assume that each clause in x has exactly three literals. That is, let
x = ( p1 ∨ q1 ∨ r1 ) ∧ ( p2 ∨ q2 ∨ r2 ) ∧ · · · ∧ ( pk ∨ qk ∨ rk ),
where pi , qi , ri are literals, not necessarily distinct. We construct the graph G having
3k vertices with labels as
p1 , q1 , r 1 , p2 , q2 , r 2 , . . . , pk , qk , r k ,
respectively. The labels are taken from their occurrences in the clauses directly.
We divide the 3k vertices into k groups, each group having three vertices labeled pi, qi, ri. The edges are constructed by joining vertices pairwise, subject to the following restrictions:
1. There is no edge between vertices of equal index, that is, no edge between pi , qi ;
between qi , ri ; and between ri , pi . That means, there is no edge between the ver-
tices in the same group. There can be only edges from vertices in one group to
vertices in another group.
2. There is no edge between two vertices (belonging to different groups) if their
labels are complementary. For example, if the literal p1 equals p and the literal
q2 is ¬ p, then even though the vertices labeled p1 and q2 are members of different
groups, there cannot be an edge between them.
For an illustration, see Example 10.4 below. Then, come back to the proof.
We must show that x is satisfiable iff G has a k-clique. To this end, let f be an interpretation such that f(x) = 1. Then in every clause, at least one literal is assigned to 1 by f. We select the corresponding vertex from G. If more than one literal from a clause is assigned to 1 by f, we choose one of them arbitrarily. The k selected vertices form a clique in G: they lie in different groups, and no two of their labels are complementary, as all are assigned to 1 by f; so, by our construction, each pair of them is joined by an edge.
Conversely, suppose G has a k-clique. Due to the first restriction, this clique does not contain two vertices labeled with literals from the same group, that is, from the same clause of x; hence it contains exactly one vertex from each group. Moreover, by the second restriction, no pair of vertices with complementary literals as labels is connected by an edge. Thus, assigning each literal that labels a vertex in the clique to 1 extends to an interpretation of x, which satisfies x.
It is easy to see that the construction of a graph from a 3cnf as described above
can be carried out in a polynomial time in the length of the 3cnf. This completes the
proof.
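The construction in the proof is mechanical to program. In the sketch below (our naming), a clause is a 3-tuple of integer literals with −v for ¬pv, and a vertex is a pair (clause index, position within the clause):

```python
from itertools import combinations

def clique_graph(clauses):
    """clauses: list of 3-tuples of integer literals (-v means negation).
    Returns (vertices, edges) of the graph G in the reduction."""
    vertices = [(i, j) for i in range(len(clauses)) for j in range(3)]
    edges = []
    for (i1, j1), (i2, j2) in combinations(vertices, 2):
        if i1 == i2:
            continue  # restriction 1: no edges within a group
        if clauses[i1][j1] == -clauses[i2][j2]:
            continue  # restriction 2: no edges between complementary labels
        edges.append(((i1, j1), (i2, j2)))
    return vertices, edges

# x = (p1 v p2 v p2) & (~p2 v ~p2 v p3) & (~p1 v ~p3 v p4), as in Example 10.4
V, E = clique_graph([(1, 2, 2), (-2, -2, 3), (-1, -3, 4)])
print(len(V))  # 9 vertices in 3 groups
```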
[Figure: the graph of Example 10.4, with three groups of three vertices: vertices 1, 2, 3 labeled p1, p2, p2; vertices 4, 5, 6 labeled ¬p2, ¬p2, p3; and vertices 7, 8, 9 labeled ¬p1, ¬p3, p4.]
Exercise 10.14. Show that the reduction of 3SAT to CLIQUE as given in the proof of Theorem 10.19 is a polynomial time reduction.
In addition to CLIQUE, many more graph theoretic problems have been shown to be NP-complete. Some of them are:
IS: In an undirected graph G, an independent set is a subset of the vertices such that
there is no edge between any pair of vertices in this subset. The independent subset
problem asks for the existence of an independent subset of size greater than a given
integer k > 1.
VC: In an undirected graph G, a vertex cover is a subset V of vertices of G, where
each edge of G is incident to some vertex in V. The vertex cover problem asks for a
vertex cover of size no more than a given integer k > 1.
HP: A Hamiltonian path in an undirected graph G is a path that goes through every
vertex exactly once. The Hamiltonian path problem asks for the existence of a Hamil-
tonian path joining two given vertices u, v in G.
The Hamiltonian cycle problem asks for the existence of a cycle (a path ending at
the starting vertex) that goes through each vertex exactly once in a given undirected
graph.
CN: Given an undirected graph G and an integer k ≥ 1, can G be colored with k colors so that no two adjacent vertices are of the same color? The acronym CN stands for the chromatic number problem; it is also called the vertex coloring problem.
Some other interesting NP-complete problems are:
EC: Given a finite set A and a family F of subsets of A, an exact cover is a subfamily
S of F such that the sets in S are pairwise disjoint and have A as their union. The
exact cover problem asks for the existence of an exact cover, given the set A and the
family F.
SUBSUM: Given a finite multiset A of natural numbers and a target number t, all represented in binary, the subset sum problem asks for the existence of a sub-multiset B of A such that the numbers in B add up to t.
KNAPSACK: For a finite set A, two functions s, p : A → Z+, and two positive integers
S and P, a knapsack is a subset B of A such that Σ_{x∈B} s(x) ≤ S and Σ_{x∈B} p(x) ≥ P.
The knapsack problem asks for the existence of a knapsack B, given A, s, p, S, and P.
PARTITION: Given a finite set A of natural numbers, all represented in binary, the
partition problem asks for the existence of a subset B of A such that the numbers in
B add up to the same as those in its complement A − B, that is, Σ_{ai∈B} ai = Σ_{ai∈A−B} ai.
TSP: Given a set {c1, c2, . . . , cm} of m ≥ 2 cities and an m × m matrix of natural
numbers dij of distances between cities ci and cj (thus dii = 0 and dij = dji > 0
for i ≠ j), the traveling salesperson problem asks for a shortest tour of the cities,
that is, a bijection π on {1, 2, . . . , m}, π(i) being the city visited ith in the tour,
so that the cost c(π) = dπ(1)π(2) + dπ(2)π(3) + · · · + dπ(m)π(1) of the tour is a minimum.
10.8 Some NP-Complete Problems 367
p to ¬p, and also there is a path from ¬p to p. Next, show that determining whether
such a vertex p exists in Gx takes polynomial time.
10.73. Consider the language PATH= {ψ(G)# ψ(u)# ψ(v) : G is an undirected graph
in which there is a path from the vertex u to the vertex v}. Show the following:
(a) The space complexity of PATH is O((log n)^2).
(b) PATH is in NS(log n).
(c) A DTM that enumerates all possible paths in G for deciding PATH has time
complexity O(2^n), where n is the number of vertices in G.
10.76. The problem 4TA is: Given a proposition, does it have at least four models?
(Models are the satisfying truth assignments.) Show that 4TA is NP-complete.
10.77. The complementary graph of an undirected graph is the one with the same
vertex set but having all and only those edges that are not in the original graph.
Using it, show that the problem of determining whether there is an independent set
can be reduced to the problem of determining whether there is a clique. With ρ(k)
as the binary representation of k, define
L = {ψ(G)# ρ(k) : the undirected graph G has an independent set of size at least k}.
L′ = {ψ(G)# ρ(k) : the undirected graph G has a clique of size at least k}.
Give a polynomial time and log space reduction of L to L′.
10.78. Here is the description of an algorithm to solve the partition problem: Let
k be the sum of all integers in the set A = {a1, . . . , an} divided by 2. If k is not
an integer, then there is no B. Otherwise, for each i, 0 ≤ i ≤ n, define the sets
B(i) = {b ≤ k : b is the sum of numbers in some subset of {a1, . . . , ai}}. If B(n)
is known, then testing whether k ∈ B(n) solves the problem. B(n) is computed as
follows:

Initialize B(0) := {0};
for i = 1 to n, do
  B(i) := B(i − 1);
  for j = ai to k, do
    if j − ai ∈ B(i − 1) then B(i) := B(i) ∪ {j}
  od
od
10.9 Dealing with NP-Complete Problems 369
Prove that this algorithm solves the partition problem correctly in time O(nk). Had
the numbers in A been given in unary notation, this algorithm would have solved
the partition problem in polynomial time. Argue why, as it stands, it is not a
polynomial time algorithm.
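The dynamic program of 10.78 transcribes almost literally into Python. In this sketch we keep a single growing set of reachable sums, which equals B(i) after the i-th pass:

```python
def partition(nums):
    """Decide the partition problem by the dynamic program of 10.78.

    Runs in O(n*k) time, where k is half the total sum; this is
    pseudo-polynomial, since k can be exponential in the length of
    the binary input.
    """
    total = sum(nums)
    if total % 2 != 0:
        return False          # k = total/2 is not an integer: no B exists
    k = total // 2
    reachable = {0}           # B(0): sums attainable using no numbers
    for a in nums:
        # B(i) = B(i-1) together with every b + a <= k, b in B(i-1)
        reachable |= {b + a for b in reachable if b + a <= k}
    return k in reachable
```

For example, partition([3, 1, 1, 2, 2, 1]) reports True (split {3, 2} against {1, 1, 2, 1}), while partition([2, 3, 7]) reports False.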
10.80. The longest path problem is: Given an undirected graph G and a positive
integer k, does G contain a simple path of length at least k? Show that there is a
polynomial time and log space reduction of the Hamiltonian path problem to the
longest path problem.
Read the TSP as formulated in the last section once again. It asks for a shortest tour
and not mere existence of a shortest tour. Existence of a shortest tour is anyway
guaranteed, be it unique or not. Associated with this problem are two others, where
we might ask for the cost of a shortest tour, or we ask for a tour whose cost is no more
than a given positive integer k. Such problems are called optimization problems. An
optimization problem always comes with a cost function. For example, MAXSAT is
an optimization problem where the cost is the number of clauses that are satisfiable,
and we want this cost to be maximized. That is, given an instance of the problem, a
set of disjunctive clauses, we want a subset of maximum size such that all clauses in
the subset can be satisfied by some interpretation simultaneously. In general, there
are three kinds of problems associated with an optimization problem: the decision
problem (NPOD), the evaluation problem (NPOE), which asks for the optimum value
of the cost, and the construction problem (NPOC), which asks for a solution achieving
the optimum cost.
Here, “no harder to solve than,” in fact, means “modulo polynomial time.” For
example, that NPOE is no harder to solve than NPOD means that if we have a solution
for NPOD, then with perhaps an additional polynomial time, we will be able to solve
NPOE.
It is an open problem whether NPOC −→ NPOE, that is, whether there exists
an optimization problem whose construction version is strictly harder to solve than
the evaluation version. A partial answer to this question is available. It says that if
NPOD is NP-complete, then NPOC −→ NPOE. As a consequence, we see that if
NPOD is NP-complete, then so are NPOE and NPOC. In the summary section to
this chapter, you will find pointers to references for similar results, and also for an
almost complete list of interesting optimization problems.
So you see, there are many interesting problems that are NP-hard. For these prob-
lems we do not yet have polynomial time algorithms. However, the problems are of
such practical importance that we need to solve them; the question is how? The obvious
choice is to use an algorithm anyway, even if it takes more than polynomial time.
The risk is that we may not get a solution during our lifetime!
One common approach is to look for special subclasses of such a problem; look
for patterns, etc. For example, if a graph happens to be a tree, then most NP-complete
problems in this special case can be solved in polynomial time. The vertex cover
problem for perfect graphs is in P. Depending on your intuition about the problem at
hand, you may look for special graphs such as bipartite graphs or regular graphs.
Another approach is to use a near-optimal algorithm. This approach is best suited
to optimization problems such as MAXSAT or TSP, where all that we require
is an approximate solution to the problem, and the time to get such an approximate
solution is comparatively affordable. When an NP-complete problem admits of such
an approximation, we would focus on a polynomial time approximation algorithm. In
view of this, the problems can be seen to be of three types.
The first kind consists of those problems that do not admit of a polynomial time
algorithm for an approximate solution.
The second type has those problems that admit of approximate solutions, but the
difference between an approximate solution and the optimum solution is always
bigger than some quantity. Instead of the difference between an optimal solution and
an approximate solution, one often considers their ratio. This approach is pursued for
optimization problems, where some sort of cost is involved. In such a case, we say
that an approximation algorithm (for a problem) is k-optimal if it always finds an
approximate solution that does not produce a cost more than k times that of the
optimal solution.
And the problems of the third type admit of approximate solutions with the
difference between an approximate solution and an optimum solution being controllable.
It is for the third type of problems that this approach of computing an approximate
solution is truly applicable.
Another approach for NP-complete problems is to follow good algorithmic practices
such as branch-and-bound, backtracking, and divide-and-conquer for constructing
algorithms that may be exponential in the worst case, but work well on many
practical instances. For example, the Davis–Putnam algorithm for SAT uses backtracking,
and there are many branch-and-bound algorithms for solving CLIQUE and divide-
and-conquer algorithms for TSP.
Especially for optimization problems, a common approach is to use the so-called
local improvements. In this strategy, we obtain an approximate solution by following
another method, or just guess it. Next, we change the values of some of the parameters
slightly. If the new solution is an improvement over the old, then we adopt
it and try to improve further by changing the same parameter. Else, we use the old
approximation, change the values of another parameter, and repeat the process. For
372 10 Computational Complexity
example, the sophisticated and best known heuristic algorithm of Lin and Kernighan
for TSP uses local improvement.
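As an illustration of local improvement, here is a sketch of 2-opt for TSP, the simplest member of the family that the Lin–Kernighan heuristic refines: repeatedly reverse a segment of the tour whenever doing so reduces the cost, until no reversal helps. The code is our own illustration, not an algorithm from the book.

```python
def tour_cost(tour, d):
    """Cost of a tour (a list of city indices) under distance matrix d."""
    return sum(d[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def two_opt(tour, d):
    """2-opt local search: adopt any segment reversal that shortens the tour.

    Returns a locally optimal tour -- no single reversal improves it --
    which need not be a globally shortest tour.
    """
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                # reverse the stretch between positions i+1 and j
                candidate = tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]
                if tour_cost(candidate, d) < tour_cost(tour, d):
                    tour, improved = candidate, True
    return tour
```

Each adopted reversal strictly decreases the cost, so the search terminates; the quality of the local optimum, however, depends on the starting tour.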
A twist on local improvements is the so-called method of simulated annealing.
Here, we allow occasional changes in the direction opposite to improvement, as
often happens in the physics of cooling solids. The genetic algorithms,
neural networks, and some other related methods follow this approach. It has been
found that for many practical problems, these biology-motivated heuristic algorithms
often perform better than other algorithms. On the other hand, these methods are
exponential in the worst case, and at that, they give only an approximate solution.
But they work! And why they work is perhaps worth pursuing.
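The acceptance rule that distinguishes simulated annealing from pure local improvement can be sketched generically as follows; the temperature schedule and all parameter values here are arbitrary choices of ours, made only for illustration.

```python
import math
import random

def anneal(cost, neighbor, state, t0=10.0, cooling=0.995, steps=10000):
    """Generic simulated annealing sketch.

    Unlike pure local improvement, a worse neighbor is accepted with
    probability exp(-increase / temperature), so the search can escape
    local optima; as the temperature cools, such uphill moves become rare.
    """
    best, best_cost = state, cost(state)
    t = t0
    for _ in range(steps):
        cand = neighbor(state)
        delta = cost(cand) - cost(state)
        if delta <= 0 or random.random() < math.exp(-delta / t):
            state = cand              # accept: downhill always, uphill sometimes
        if cost(state) < best_cost:
            best, best_cost = state, cost(state)
        t *= cooling                  # cool down
    return best
```

For instance, minimizing (x − 3)² over the integers with the neighbor function x ↦ x ± 1 drives the state toward 3 even from a distant start, because early high-temperature uphill moves keep the walk from getting stuck.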
10.81. Formulate the construction problem, the evaluation problem, and the decision
problem corresponding to MAXSAT.
10.83. The dominating set problem is stated as: Given a directed graph G and an
integer k, does there exist a subset S of k vertices from G such that for each vertex
u of G not in S, there is a vertex v ∈ S so that G has an edge from v to u? This
problem is NP-complete. Show that there is a polynomial time algorithm for solving
the dominating set problem when G is a tree.
In this chapter, we have tried to measure the performance of algorithms (total TMs)
in solving problems with respect to how much space or time they require. To be on
the safe side, we considered the worst-case performance only. Moreover, our concern
was further restricted to analyzing the performance with respect to the length of an
input string. We have introduced the order notations and then classified the solvable
problems according to the resources such as space and time that they need.
In discussing space complexity, we have seen that certain problems can be solved
in logarithmic space and certain others cannot be. Though the problem whether LS =
NLS remains open, we argued that if any one of certain problems can be shown to
be in LS, then LS = NLS would be proved. Such problems were named NLS-complete
problems.
Similar to space complexity, we have come across the NP-complete problems
in discussing time complexity. If we were able to show any one of the NP-complete
problems to be in P, then NP would become equal to P. The analogy
between space complexity and time complexity breaks down when we consider the
complementary problems. We have seen that CoNLS = NLS, but we do not know
10.10 Summary and Additional Problems 373
whether CoNP = NP. We showed that the satisfiability of boolean formulas is
an NP-complete problem, and then mentioned many other such problems. Similar to
map reductions in solvability, log space reductions and polynomial time reductions
were the main tools we used in proving NLS-hardness and NP-hardness.
Though Hopcroft and Ullman [58] attribute the original idea of the CYK algorithm
to J. Cocke, the algorithm first appeared in Kasami [63] and, independently, in
Younger [135]. Improvements over the CYK algorithm for context-free recognition
can be found in [28, 133].
For subrecursive concepts about hardness, completeness, and arithmetic hierarchy,
see [23, 62, 76, 128]. The texts [20, 36, 53, 72, 92, 98] provide a good introduction
to the theory of complexity of algorithms along with proofs that many interesting
problems are NP-complete. Many ways of tackling NP-complete problems are the
major themes addressed in [55, 99, 108]. You may like to see [5] and the references
there for approximate algorithms of optimization problems. You may also like to
see the forthcoming book by S. Arora and B. Barak on Computational Complexity, a
draft of which is available on the net.
In some of the exercises, you will find mention of information theoretic complexity
theory. This exciting notion resulted from the works of Shannon and Chaitin; see
[13] and the references therein. This also includes Chaitin’s theorem. For Parikh’s
sentence mentioned in Problem 10.146 below, see [101]. The fact that primality
testing can be done in polynomial time is in [3]. We have not discussed an important
issue, the complexity of boolean circuits. You can see [46, 122] for an introduction
to the topic.
10.84. Show that any computation that takes O(T (n)) time on a single-tape off-line
TM can be performed on a standard TM in time O(T (n)).
10.85. Show that any computation that can be performed on a standard TM in time
O(T (n)) can also be performed in time O(T (n)) on a TM with a left-end on its tape.
10.86. Write ψ(G) for an encoding of a CFG as a binary string, where we also denote
by ψ(X) the binary encoding of any symbol X involved in G. Define two languages
as in the following:
L = {ψ(G1)# ψ(G2) : G1, G2 are CFGs with L(G1) = L(G2)}.
L′ = {ψ(G)# ψ(A)# ψ(B) : G is a CFG and A, B are nonterminals of G, and
the terminal strings derived from A and B are the same}.
Answer the following with justification:
(a) Is L′ polynomial time reducible to L?
(b) Is L polynomial time reducible to L′?
(c) What about the NP-completeness of L and that of L′?
10.87. Prove that the state minimization algorithm for DFAs as given below runs in
polynomial time. (Is it the same as the one we have described in Sect. 4.5?)
10.88. The CYK algorithm for checking membership of a string in a CFL depends
upon the following considerations:
You have a CFG G in CNF. You also have a string of terminals w = a1a2 · · · an.
Define substrings wi,j = ai · · · aj and subsets Ni,j = {A : A is a nonterminal with
A ⇒ wi,j}. Show that
(a) A ∈ Ni,i iff A → ai is a production of G. Thus, Ni,i can be computed by looking
at w and the productions of G.
(b) For j > i, A ⇒ wi,j iff there is a production A → BC, and B ⇒ wi,k, C ⇒
wk+1,j, for some k with i ≤ k < j.
(c) Ni,j = ∪i≤k<j {A : A → BC, for some B ∈ Ni,k, C ∈ Nk+1,j}.
(d) The subsets N1,1, . . . , Nn,n, N1,2, . . . , Nn−1,n, N1,3, . . . , Nn−2,n, . . . , N1,n can
thus be computed.
(e) w ∈ L(G) iff S ∈ N1,n.
Write the CYK algorithm. It is so named after the three mathematicians who devised
it: J. Cocke, D. H. Younger, and T. Kasami. It is also called the CKY algorithm.
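Parts (a)-(e) translate directly into a table-filling program. In the following sketch (our own encoding, not the book's), a grammar in CNF is given by a dictionary mapping each terminal to the set of nonterminals that derive it, together with a list of triples (A, B, C) for the productions A → BC:

```python
def cyk(terminal_rules, binary_rules, start, w):
    """CYK membership test for a CFG in Chomsky normal form.

    terminal_rules: dict mapping a terminal a to the set of
        nonterminals A with a production A -> a.
    binary_rules: list of triples (A, B, C) for productions A -> BC.
    N[i][j] collects the nonterminals deriving w[i..j] (0-based).
    """
    n = len(w)
    if n == 0:
        return False  # a CNF grammar as used here derives no empty string
    N = [[set() for _ in range(n)] for _ in range(n)]
    for i, a in enumerate(w):
        N[i][i] = set(terminal_rules.get(a, ()))
    for span in range(2, n + 1):              # length of the substring
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):             # split point: w[i..k] | w[k+1..j]
                for A, B, C in binary_rules:
                    if B in N[i][k] and C in N[k + 1][j]:
                        N[i][j].add(A)
    return start in N[0][n - 1]
```

With the grammar of 10.89 (S → AB, A → a | BB, B → b | AB), one would call cyk({'a': {'A'}, 'b': {'B'}}, [('S','A','B'), ('A','B','B'), ('B','A','B')], 'S', 'aabbb'). The three nested loops over (span, i, k) give the O(n^3) bound of 10.90, treating the grammar size as a constant.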
10.89. Apply the CYK algorithm to determine whether the string aabbb is generated by
the CFG with productions S → AB, A → a|BB, B → b|AB. What are the intermediate
subsets Ni,j for 1 ≤ i ≤ j ≤ 5?
10.90. Show that the CYK algorithm takes O(n^3) time, where n = ℓ(w).
10.91. Show how the CYK algorithm can be converted to a parsing method by keeping
track of how the subsets Ni,j are computed. Then, find a parsing of aab using the
productions S → AB, A → a|BB, B → b|AB.
10.92. Modify the CYK algorithm to output the number of distinct parse trees for
the given input, rather than just reporting membership in the language.
derivation of the string. Show that if the CFG contains no ε-production, then this
is an O(n) time nondeterministic algorithm. What happens if the CFG contains an
ε-production?
10.94. It is obvious now that each regular language is in DT(n), each CFL is in
DT(n^3), and each CFL is in NT(n). Show that if L is a context-sensitive language,
then there is an m such that L ∈ DT(n^m). Does it follow that there is an m such
that each context-sensitive language is in DT(n^m)?
10.95. Show that there are languages that are not in NT(O(2^(n^k))), for any k ∈ N. Is
any such language decidable?
10.96. A disjunctive clause is called a Horn clause if at most one propositional
variable in it is unnegated. Show that SAT restricted to Horn clauses can be solved in
polynomial time.
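One way to solve Horn-clause satisfiability in polynomial time is the standard marking (unit propagation) procedure: start with every variable false, and repeatedly set to true any variable that some clause forces. A sketch, with clauses encoded as sets of signed integers (our own convention, −k for a negated variable k):

```python
def horn_sat(clauses):
    """Decide satisfiability of a set of Horn clauses in polynomial time.

    A clause is a set of nonzero integers, at most one of them positive.
    Marking algorithm: everything starts false; a clause whose negated
    variables are all true forces its positive literal to be true, and
    if it has none, the clause set is unsatisfiable.
    """
    true_vars = set()
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit > 0 and lit in true_vars for lit in clause):
                continue                  # clause already satisfied
            if all(-lit in true_vars for lit in clause if lit < 0):
                pos = [lit for lit in clause if lit > 0]
                if not pos:
                    return False          # all-negative clause violated
                true_vars.add(pos[0])     # forced to be true
                changed = True
    return True
```

Each pass either marks a new variable true or stops, so there are at most as many passes as variables; any clause left unsatisfied at the end has a negated variable that stays false, and so is satisfied by the final assignment.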
10.98. Given an undirected graph G, show that the problem of “determining whether
G is a tree” is in DS(log n).
10.101. Show that the problem of determining whether two regular expressions
represent the same language is in DPS.
10.102. An undirected graph is bipartite if its vertices can be partitioned into two
sets so that each edge joins some vertex in one set to some vertex in the other. Show
that the problem of “determining whether a given undirected graph is bipartite” is in
NLS.
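Bipartiteness itself is easy to test by 2-coloring with breadth-first search, as the sketch below shows; note that this easy algorithm uses linear space, whereas the exercise asks for the sharper fact that the problem is in NLS.

```python
from collections import deque

def is_bipartite(adj):
    """2-color an undirected graph by BFS; bipartite iff no conflict arises.

    adj: dict mapping each vertex to a list of its neighbors
    (both directions present, since the graph is undirected).
    """
    color = {}
    for source in adj:
        if source in color:
            continue                      # component already colored
        color[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False          # an odd cycle was found
    return True
```

A square (even cycle) passes the test; a triangle fails it, since some edge must join two vertices of the same color.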
10.103. A directed graph is strongly connected if from any vertex to any other vertex
there is a directed path in the graph. Show that the problem of determining whether
a directed graph is strongly connected is NLS-complete.
10.104. For a cnf with c clauses and m propositional variables, show that an NFA
with O(cm) states can be constructed in polynomial time that accepts all
nonsatisfying interpretations, represented as binary strings of length m.
Conclude that minimization of NFAs cannot be realized in polynomial time unless
P = NP.
10.105. Using diagonalization, show that the following languages are NP-complete:
(a) {ψ(M)# ψ(w)# 0^k : M is an NTM that accepts w in k or fewer steps}.
(b) {ψ(M)# ψ(w) : M is an NTM that accepts w in ℓ(w)^3 or fewer steps}.
10.110. In the 3COLOR problem, we are given an undirected graph and are asked to
determine whether its vertices can be colored with three colors such that no two
adjacent vertices have the same color. Show that 3COLOR is NP-complete.
10.111. Let USUBSUM be the subset sum problem when the numbers in the set A
are given in unary notation instead of binary. Show that USUBSUM is in P.
10.112. Define the directed Hamiltonian path problem as “Given a directed graph,
does there exist a directed Hamiltonian path in the graph?”. Show that there is a
log space reduction of the Hamiltonian path problem to the directed Hamiltonian
path problem.
10.113. Define the fixed vertices Hamiltonian path problem as “Given an undirected
graph and two vertices in it, does there exist a Hamiltonian path joining the two
vertices?”. Show that there is a polynomial time reduction of the Hamiltonian path
problem to the fixed vertices Hamiltonian path problem.
10.118. Let f (n) be a computable total function. Prove the following hierarchy the-
orems:
(a) There is a decidable language L not in DS( f (n)).
(b) There is a decidable language L not in DT( f (n)).
(c) There is a computable total function g(n) such that DS( f (n)) ⊊ DS(g(n)).
(d) There is a computable total function g(n) such that DT( f (n)) ⊊ DT(g(n)).
(e) If f (n) ≥ log n and L is accepted by an f (n) space bounded TM, then L is
accepted by an f (n) space bounded total TM.
(f) Suppose there is a TM that uses g(n) squares for some input of length n, whatever
be this n. If f (n) ≥ log n, g(n) ≥ log n, and limn→∞ ( f (n)/g(n)) = 0, then there
is a language in DS(g(n)) but not in DS( f (n)).
(g) Suppose there is a TM that takes exactly g(n) time on each input of length n,
whatever be this n. If limn→∞ [ f (n) log( f (n))/g(n)] = 0, then there is a language
in DT( f (n)) but not in DT(g(n)).
(h) If L ∈ DT( f (n)), then L ∈ DS( f (n)).
(i) Suppose there is a TM that takes f (n) time on each input of length n. Then
DT( f (n) log f (n)) ⊆ DS( f (n)).
(j) If L ∈ DS( f (n)) and f (n) ≥ log n, then there is a constant c > 0 (possibly
depending on L) such that L ∈ DT(c^f(n)).
(k) If L ∈ NT( f (n)), then there is a constant c > 0 (possibly depending on L) such
that L ∈ DT(c^f(n)).
(l) Suppose there are TMs M, M′ such that on any input of length n, M uses exactly
g(n) squares and M′ uses exactly h(n) squares. If f (n) ≥ n and h(n) ≥ n, then
DS(g(n)) ⊆ DS(h(n)) implies DS(g( f (n))) ⊆ DS(h( f (n))).
(m) Suppose there are TMs M, M′ such that on any input of length n, M uses exactly
g(n) squares and M′ uses exactly h(n) squares. If f (n) ≥ n and h(n) ≥ n, then
NS(g(n)) ⊆ NS(h(n)) implies NS(g( f (n))) ⊆ NS(h( f (n))).
(n) Suppose there are TMs M, M′ such that on any input of length n, M runs exactly
for g(n) time and M′ runs exactly for h(n) time. If f (n) ≥ n and h(n) ≥ n, then
DT(g(n)) ⊆ DT(h(n)) implies DT(g( f (n))) ⊆ DT(h( f (n))).
(o) Suppose there are TMs M, M′ such that on any input of length n, M runs exactly
for g(n) time and M′ runs exactly for h(n) time. If f (n) ≥ n and h(n) ≥ n, then
NT(g(n)) ⊆ NT(h(n)) implies NT(g( f (n))) ⊆ NT(h( f (n))).
(p) Let r ≥ 0 and c > 0. Then NS(n^r) ⊊ NS(n^(r+c)).
10.119. Let s, t : N → N be functions such that s(n) ≥ log n and t(n) ≥ n log n. We
say that s(n) is space constructible if the function that maps 0^(n+1) to the binary
representation of s(n) is computable in O(s(n)) space. Similarly, t(n) is time
constructible if the function that maps 0^(n+1) to the binary representation of t(n) is
computable in O(t(n)) time. Prove the hierarchy theorems as summarized in the following:
(a) Space hierarchy theorem: Let s(n) be a space constructible function. There is a
language L such that L is decidable in O(s(n)) space but not in o(s(n)) space.
(b) Time hierarchy theorem: Let t(n) be a time constructible function. There exists
a language L such that L is decidable in O(t(n)) time but not in o(t(n)/ log t(n))
time.
10.121. Borodin's gap theorem: For each computable total function f (n) ≥ n, there
is a computable function s(n) such that DS(s(n)) = DS( f (s(n))). Along with this,
formulate and prove the other three gap theorems, replacing DS by NS, DT, and NT.
10.123. Let f (n) be a computable total function. Show that there is a monotonically
nondecreasing function g(n) such that g(n) ≥ f (n), g(n) ≥ n^2, and there is a TM
that uses exactly g(n) squares for each input of length n.
10.124. Prove Blum's speed-up theorem: Let f (n) be a computable total function.
There exists a decidable language L such that for any TM accepting L in space s(n),
there is a TM that accepts L in space s′(n) so that f (s′(n)) ≤ s(n) for all but a finite
number of n's.
10.125. Prove the honesty theorem for space: There exists a computable total
function f (n) such that for any space complexity class C, there is a function s(n) with
DS(s(n)) = C, and s(n) is computable in f (s(n)) space.
10.126. The bounded tiling problem asks whether there exists a tiling of an s × s
square, given a bounded tiling system with a positive integer s. A bounded tiling
system prescribes the first row of tiles instead of the origin tile. It is formalized as
follows. A bounded tiling system is a quintuple B = (T, H, V, s, f0 ), where s ∈ Z+
and f0 : {0, 1, . . . , s −1} → T is a function, and T is a finite set of tiles. The problem
asks whether there exists a function f : {0, 1, . . . , s − 1} × {0, 1, . . . , s − 1} → T
such that
f (m, 0) = f0 (m) for all m < s;
( f (m, n), f (m + 1, n)) ∈ H for all m < s − 1, n < s;
( f (m, n), f (m, n + 1)) ∈ V for all m < s, n < s − 1.
Such a tiling extends the function f0 to f. Show that the bounded tiling problem is
NP-complete.
10.127. Consider the binary bounded tiling problem, where in the bounded tiling
problem we are not given the first row of tiles, but only an origin tile d0 , as in the
original tiling problem. Further, the size s of the square to be tiled is given in binary.
Prove the following:
(a) There is a reduction from the language {ψ(M) : the TM M halts on the empty
string within 2^ℓ(ψ(M)) steps} to the (language of the) binary bounded tiling problem.
(b) The binary bounded tiling problem is not in P.
(c) Let NEXP be the class of all languages decided by NTMs in time 2^(n^k) for some
k > 0. Then the binary bounded tiling problem is in NEXP.
(d) All languages in NEXP are polynomial time reducible to the binary bounded
tiling problem. That is, the binary bounded tiling problem is NEXP-complete.
10.128. Show that MAX2SAT, that is, “Given a set of disjunctive clauses with at most
two literals each, and an integer k, does there exist an interpretation that satisfies at
least k of the clauses?”, is NP-complete.
10.129. Recall that the growth rate of the function n^(log n) lies strictly between that of
the polynomials and the exponentials. Suppose that there is an NP-complete problem
that has a solution taking O(n^(log n)) time on a DTM. What could you say about
the running time of any problem in NP?
10.130. Describe the complements of each of the problems below. Decide whether the
problem is in NP or in CoNP. If the problem or its complement is NP-complete,
then supply a proof.
(a) TRUESAT: Given a proposition that is satisfied by the interpretation that assigns
each variable to 1, does there exist another model of the proposition?
(b) FALSESAT: Given a proposition that is falsified by the interpretation that assigns
each variable to 0, does there exist another interpretation that falsifies the
proposition?
(c) DOUBLESAT: Given a proposition, do there exist at least two interpretations that
satisfy it?
(d) NEARVAL: Given a proposition, is there at most one interpretation that
falsifies it?
10.131. Suppose that there is a bijection f on the set of n-bit integers such that f (x)
can be computed in polynomial time, while f⁻¹(x) cannot be computed in polynomial
time. Show that the language {(x, y) : f⁻¹(x) < y} is in (NP ∩ CoNP) − P.
10.135. Let L_CFE = {ψ(G) : G is a CFG and L(G) = ∅}. Prove that every language
in P is log space reducible to L_CFE. This says that the emptiness problem for CFGs
is complete for P with respect to log space reduction. Note that we do not yet have a
deterministic algorithm to recognize L_CFE in (log n)^k space.
10.136. Prove that the context-sensitive languages occupy the bottom of DPS. That
is, if L_CS = {ψ(G)# ψ(w) : G is a context-sensitive grammar and w ∈ L(G)},
then show that L_CS is in NS(O(n)) ∩ DS(O(n^2)).
10.138. Given an undirected graph G with two vertices s and t, each of degree 1,
two players called S and C (for Short and Cut, respectively) play the following game.
Alternately, each selects a vertex other than s and t that has not been selected earlier;
the selected vertex then belongs to that player for the rest of the game. S starts the
game, and wins if he is able to select a set of vertices that make up a path between
s and t. C wins if all the vertices have been selected but S could not win. The Shannon
switching game problem is: Given an undirected graph G, can S win no matter what
choices C makes? Show that this problem is DPS-complete.
10.139. Recall that propositions with ¬, ∨, and ∧ as the only connectives are called
boolean formulas. An expression of the form Q1x1 Q2x2 · · · Qnxn E is called a
quantified boolean formula when E is a boolean formula with the propositional variables
x1, . . . , xn, and each Qi is either ∀ or ∃. When Qi xi occurs in a quantified boolean
formula F, we say that in F, the variable xi has been quantified. Further, in a quantified
boolean formula each variable is quantified exactly once. (In fact, we are considering
a restricted class of quantified boolean formulas here.) The meanings of the
quantifiers ∀ and ∃ are as follows. The expression ∀x E means that E evaluates to 1
(true) both when all occurrences of x are assigned 1, and when all occurrences of
x are assigned 0 (false). Similarly, the expression ∃x E means that E is true either when
all occurrences of x are assigned 1, or when all occurrences of x are assigned 0.
Thus the meaning of a quantified boolean formula is a statement about when the
unquantified expression in it is true. We say that the quantified boolean formula
is true or false, according as the statement that expresses its meaning is true or
false. The quantified boolean formula problem, denoted by QBF, is: Given a
quantified boolean formula, is it true? Prove the following:
(a) QBF is in DPS.
(b) QBF is DPS-complete.
10.140. Prove that the following problems are NP-complete by using the suggested
reductions:
(a) Reduce SAT to MAXSAT.
(b) Reduce SAT to IS, the independent set problem.
(c) Reduce 3SAT to DSAT, also called double-sat, the problem of determining
whether a given cnf has at least two models.
(d) Reduce 3SAT to CN, the chromatic number problem.
(e) Reduce 3SAT to SUBSUM, the subset sum problem.
(f) Reduce IS to CLIQUE.
(g) Reduce IS to VC, the vertex cover problem.
(h) Reduce VC to VCWT, the vertex cover problem for graphs without triangles.
(i) Reduce VCWT to CC, the clique cover problem: Given an undirected graph G and
an integer k ≥ 1, do there exist k cliques in G such that each vertex of G is in
at least one of the k cliques?
(j) Reduce SAT to the problem of inequivalence of ∗-free regular expressions, which
asks for determining whether two given ∗-free regular expressions represent two
different languages.
(k) Reduce SAT to EC, the exact cover problem.
(l) Reduce EC to DHP, the Hamiltonian path problem in directed graphs.
(m) Reduce EC to KNAPSACK.
(n) Reduce KNAPSACK to PARTITION.
(o) Reduce DHP to HP, the Hamiltonian path problem in undirected graphs.
10.141. Prove that the Hamiltonian path problem (HP) is NP-complete even when
restricted to planar graphs. Note that NP-completeness of HP also holds for
four-connected graphs.
10.142. Consider the traveling salesperson problem where the distances satisfy the
triangle inequality, that is, dij + djk ≥ dik for all possible cities i, j, k. Give a
polynomial time algorithm to find a tour whose cost is within twice the minimum cost.
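A classical way to achieve the factor of two under the triangle inequality is the minimum-spanning-tree heuristic: build an MST, walk it in preorder, and shortcut repeated vertices. The sketch below is our own illustration (using a naive quadratic version of Prim's algorithm) and assumes a symmetric distance matrix obeying the triangle inequality.

```python
def tsp_twice_optimal(d):
    """Tour within twice the optimum for a metric TSP (MST heuristic).

    d: symmetric matrix of positive distances satisfying the triangle
    inequality.  Build a minimum spanning tree with a naive Prim step,
    then return the preorder walk of the tree; the walk visits every
    vertex exactly once, so it is a tour.
    """
    m = len(d)
    in_tree = {0}
    children = {i: [] for i in range(m)}
    while len(in_tree) < m:
        # cheapest edge leaving the current tree (naive Prim step)
        u, v = min(((u, v) for u in in_tree for v in range(m)
                    if v not in in_tree), key=lambda e: d[e[0]][e[1]])
        in_tree.add(v)
        children[u].append(v)
    tour, stack = [], [0]
    while stack:                       # preorder (depth-first) walk
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour
```

Since every tour contains a spanning tree, the MST weight is at most the optimal tour cost; walking the tree traverses each edge at most twice, and shortcutting repeated vertices never increases the cost by the triangle inequality, so the returned tour costs at most twice the optimum.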
10.143. Prove that if there exists a polynomial time algorithm to find a tour whose
cost is within twice the minimum cost in an arbitrary TSP, then P = NP.
10.144. Let DP = {L − L′ : L, L′ ∈ NP}. This is the first language class in the
so-called difference hierarchy. A DP-complete language is a language in DP to
which every other language in DP is polynomial time reducible. Let ρ(j) denote the
binary representation of j ∈ N. Prove that the following languages are DP-complete:
(a) {ψ(G)# ρ(m)# ψ(G′)# ρ(n) : G has an m-clique and G′ has no n-clique}.
(b) {ψ(G)# ρ(k) : the largest clique in G is of size k}.
Notice that the enumeration used in C is not done all at once, as there are infinitely
many algorithms. The enumeration gives one algorithm, the next steps in C are
performed, and then the enumeration gives the next algorithm, and so on. You can try to
give an informal proof of the first version of Chaitin's theorem using a theorem
generator instead of an algorithm enumerator. However, this informal proof is analogous
to proving Gödel’s incompleteness theorem by using the sentence “This sentence is
not provable,” or using the sentence “This sentence has no short proof ” to show that
there exists a sentence in the system of natural numbers having a long proof but the
fact that it is provable has a short proof (Parikh’s sentence).
Answers and Hints
to Selected Problems
Problems of Chapter 1
1.2 For each a ∈ A, there may not be one b ∈ A such that aRb. For example, the
relation {(b, b)} on the set {a, b}.
1.4 R ⊆ A^2. So, R ∈ 2^(A^2). The number of such R is 2^(n^2). Reflexive R's are
2^(n^2−n) in number. And there are 2^(n(n+1)/2) symmetric R's.
1.5 x R⁻¹ y iff y R x. Hence R and R⁻¹ have the same equivalence classes.
1.6 Take f (x) as the number of elements in x.
1.7 B = {[x] : x ∈ A}. Define f : A → B by f (x) = [x].
1.9 ∪A = ∅. Use De Morgan’s law to get ∩A = A.
1.12 f (x) = f (y) implies x = g( f (x)) = g( f (y)) = y. Hence f is injective. Now,
f (g(y)) = y shows that f (x) = y for x = g(y). Hence f is surjective.
1.13 (d) g ◦ f injective implies f is injective; g need not be injective.
(e) g ◦ f surjective implies g is surjective; f need not be surjective.
1.15 Use Problem 1.12.
1.24 Suppose the collection of all sets is a set; denote it by S. Each element of 2^S is
a set, and hence is in S. That is, 2^S ⊆ S. Then |2^S| ≤ |S|. This contradicts Cantor's
theorem.
1.28 Let Pn be the set of all polynomials of degree n with rational coefficients.
Define f : Pn → Q^(n+1) by f (a0 + a1 x + · · · + an x^n) = (a0, a1, . . . , an). This f is
injective. Hence Pn is countable. The set of all polynomials with rational coefficients
is ∪_{n∈Z+} Pn, and hence is countable. Each such polynomial is in some Pn and has
at most n roots. Therefore, the roots of all such polynomials are countable in number.
1.33 ∪r<1 {x ∈ R : a − r ≤ x ≤ a + r } = {x ∈ R : a − 1 < x < a + 1} and
∩r>0 {x ∈ R : a − r < x < a + r } = {a}.
1.36 Yes.
1.39 Let the n + 2 positive integers be x1 > x2 > · · · > xn+2. Consider the set
A = {x1 + xk, x1 − xk : k = 2, 3, . . . , n + 2}. By the pigeon hole principle, at least two
of them have the same remainder when divided by 2n. If they are yi, yj ∈ A, then
consider the difference yj − yi. This is of the form ±(xi ± xj) and is divisible by 2n.
1.41 Write the n + 1 numbers in the form xk = 2^(qk) yk, where yk is odd. How many
yk’s are there at most?
1.42 For x ∈ R, let ⌊x⌋ denote the largest integer less than or equal to x. Consider
the numbers √2 − ⌊√2⌋, 2√2 − ⌊2√2⌋, . . . , (n + 1)√2 − ⌊(n + 1)√2⌋. All these
are between 0 and 1. Divide the interval from 0 to 1 into n subintervals, each of
length 1/n. By the pigeon hole principle, at least two of these numbers lie in the
same subinterval. Their difference is of the form m + k√2 for integers m and k, and
it lies between 0 and 1/n.
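The pigeonhole argument is constructive, so it can be run directly; this small Python sketch (my illustration, not the book's) finds such integers m and k:

```python
import math

def close_multiple(n):
    """Find integers m, k with 0 < m + k*sqrt(2) < 1/n via the pigeonhole hint."""
    fracs = sorted((i * math.sqrt(2) - math.floor(i * math.sqrt(2)), i)
                   for i in range(1, n + 2))
    # n + 1 fractional parts in n subintervals: two adjacent ones differ by < 1/n.
    for (f1, i1), (f2, i2) in zip(fracs, fracs[1:]):
        if f2 - f1 < 1 / n:
            k = i2 - i1
            m = math.floor(i1 * math.sqrt(2)) - math.floor(i2 * math.sqrt(2))
            return m, k   # m + k*sqrt(2) equals the difference f2 - f1
    return None

m, k = close_multiple(100)
assert 0 < m + k * math.sqrt(2) < 1 / 100
```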
Problems of Chapter 2
Section 2.2
2.4 bbbbaa, baaaaabaaaab ∈ L ∗ .
2.7 (d) Yes. (i) Yes. (j) No.
2.13 (c) No; e.g., (L ∗ )∗ ≠ L ∗ L, in general. (d) Yes, {ε}. (e) Yes.
Section 2.3
2.17 (a) a(aa)∗. (b) (aa)∗ ∪ (aaa)∗ ∪ (aaaaa). (c) aa(aaa)∗.
2.18 a ∗ b(a ∪ b)∗ .
2.19 (c) (0 ∪ 1)∗ 00(0 ∪ 1)∗ . (d) (1 ∪ 01)∗ (0 ∪ ε). (e) 1∗ ∪ (1 ∪ 011)∗01∗ .
(g) 00000∗(ε ∪ 1 ∪ 11 ∪ 111). (i) (ε ∪ 0 ∪ 00 ∪ 03 )1∗ ∪ 0∗ 14 1∗ ∪ (0 ∪ 1)∗ 10(0 ∪ 1)∗ .
(l) Split into three cases: m ≥ 1, n ≥ 2; m ≥ 2, n ≥ 2; and m ≥ 3, n = 1.
(o) Enumerate all cases with ℓ(u) = 2. (s) (1∗ 01∗ 01∗ )∗ ∪ 1∗ .
2.20 (a) {a, aa, aaa, . . . , b, ab, aab, . . .}. (b) {a, bb, aa, abb, ba, bbb, . . .}.
(c) {u00v : u, v ∈ {0, 1}∗ }. (d) {a 2m b2n+1 : m, n ∈ N}.
(e) All strings having no pair of consecutive zeros.
2.21 (f) Generate all permutations of the three symbols, giving six terms that can be
∪-ed together; e.g., one of these six is (a ∪b∪c)∗ a(a ∪b∪c)∗ b(a ∪b∪c)∗ c(a ∪b∪c)∗ .
(g) ((a ∪ b ∪ c)3 )∗ .
2.22 (c) Yes.
2.24 E has a starred sub-expression.
Section 2.4
2.29 (a) S → abS|ε. (b) S → aa S|aa. (c) S → a|a S|bS. (d) S → b A, A →
a A|b A|ε.
Section 2.5
2.34 The DFA should accept Σ ∗ abcΣ ∗ .
2.36 Initial state s, F = {t}, δ(s, a) = p, δ(s, b) = u, δ( p, a) = q, δ( p, b) = p,
δ(q, a) = r, δ(q, b) = p, δ(t, a) = t, δ(t, b) = r, δ(u, a) = q5 , δ(u, b) = u.
2.37 (c) Initial state q0 , F = {q8 }, and Δ has the transitions:
(q0 , a, q1 ), (q0 , b, q9 ), (qi , b, qi+1 ), for i = 1 (1) 7; (qi , a, q9 ), for i = 1 (1) 5;
(q6 , a, q6 ), (q7 , a, q6 ), (q6 , b, q7 ), (q8 , b, q8 ), (q8 , a, q6 ), (q9 , a, q9 ), (q9 , b, q9 ).
(e) Initial and final state q0 , and Δ has the transitions:
(q0 , a, q1 ), (q0 , b, q1 ), (q0 , a, q2 ), (q0 , b, q2 ), (q2 , a, q0 ), (q2 , b, q0 ).
(h) Initial state A, F = {D, G, H } and the transitions are:
δ( A, a) = D, δ( A, b) = B, δ(B, a) = E, δ(B, b) = C, δ(C, a) = F, δ(C, b) = A,
δ(D, a) = G, δ(D, b) = E, δ(E, a) = H, δ(E, b) = F, δ(F, a) = I, δ(F, b) = D,
δ(G, a) = A, δ(G, b) = H, δ(H, a) = B, δ(H, b) = I, δ(I, a) = C, δ(I, b) = G.
2.38 (a) Initial state q0 , F = {q0 , q3 }, and the transitions are:
δ(q0 , 0) = q1 , δ(q0 , 1) = q0 , δ(q1 , 0) = q2 , δ(q1 , 1) = q0 , δ(q2 , 0) = q4 ,
δ(q2 , 1) = q3 , δ(q3 , 0) = q1 , δ(q3 , 1) = q0 , δ(q4 , 0) = q4 , δ(q4 , 1) = q4 .
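Transition tables like this one are easy to sanity-check mechanically. Below is a generic DFA runner in Python (my sketch; the sample strings are illustrations, not claims about the intended language):

```python
def run_dfa(delta, start, finals, w):
    """Run a DFA given as a dict mapping (state, symbol) to the next state."""
    q = start
    for a in w:
        q = delta[(q, a)]
    return q in finals

# The transition table hinted for Problem 2.38(a).
delta = {('q0', '0'): 'q1', ('q0', '1'): 'q0', ('q1', '0'): 'q2',
         ('q1', '1'): 'q0', ('q2', '0'): 'q4', ('q2', '1'): 'q3',
         ('q3', '0'): 'q1', ('q3', '1'): 'q0', ('q4', '0'): 'q4',
         ('q4', '1'): 'q4'}
print(run_dfa(delta, 'q0', {'q0', 'q3'}, '001'))  # True: q0 -> q1 -> q2 -> q3
print(run_dfa(delta, 'q0', {'q0', 'q3'}, '000'))  # False: the dead state q4
```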
Section 2.6
2.46 (c) Initial state q0 , F = {q0 , q1 , q2 , q3 }, and the transitions are:
δ(qi , a) = qi+1 , δ(qi , b) = qi , for i = 0, 1, 2, 3, and δ(q4 , a) = δ(q4 , b) = q4 .
(d) Initial state q0 , F = {q2 }, δ(q0 , a) = q1 , δ(q0 , b) = q3 , δ(q1 , a) = q3 ,
δ(q1 , b) = q2 , δ(q2 , a) = q2 , δ(q2 , b) = q2 , δ(q3 , a) = q3 , δ(q3 , b) = q3 .
(e) Initial state p, F = { p, q, r }, δ( p, a) = q, δ( p, b) = p, δ(q, a) = r,
δ(q, b) = p, δ(r, a) = r, δ(r, b) = t, δ(t, a) = t, δ(t, b) = t.
(m) Initial state s, F = {r }, δ(s, a) = r, δ(s, b) = p, δ( p, a) = q, δ( p, b) = s,
δ(q, a) = p, δ(q, b) = r, δ(r, a) = s, δ(r, b) = q.
(n) Initial and final state s, Δ = {(s, a, r ), (s, b, p), ( p, a, q), ( p, b, s), (q, a, p),
(q, b, r ), (r, a, s), (r, b, q)}.
2.47 Connect each old final state to a single new final state by an epsilon transition.
The old final states become nonfinal. This is not possible for DFAs; e.g., try it for {ε, a}.
2.71 The addition 0101 + 0110 = 1011 is written as the string abcd of triples:
a = (0, 0, 1), b = (1, 1, 0), c = (0, 1, 1), d = (1, 0, 1). Write 0110 below 0101 and
then 1011 as the third line. Now read the columns to get a, b, c, d.
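The column-reading in this hint is mechanical; here is a short Python sketch of it (the function name is mine):

```python
def addition_triples(x, y, z):
    """Encode the binary addition x + y = z as the string of column triples."""
    assert int(x, 2) + int(y, 2) == int(z, 2)
    # Stack the three rows and read each column from top to bottom.
    return [tuple(int(row[i]) for row in (x, y, z)) for i in range(len(x))]

print(addition_triples('0101', '0110', '1011'))
# [(0, 0, 1), (1, 1, 0), (0, 1, 1), (1, 0, 1)]
```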
Problems of Chapter 3
Section 3.2
3.2 The set of all strings whose nth symbol from the end is 1. All the 2^n states are
relevant.
3.7 Initial state p, F = {q, r }, Δ = {( p, a, q), ( p, ε, r ), (r, a, r )}.
3.8 (b) Initial state p, F = {q, r, s, t}, δ( p, 0) = q, δ( p, 1) = t, δ(q, 0) = r, δ(q, 1) = s,
δ(r, 0) = r, δ(r, 1) = s, δ(s, 0) = u, δ(s, 1) = u, δ(t, 0) = u, δ(t, 1) = u, δ(u, 1) = u,
δ(u, 0) = v, δ(v, 0) = v, δ(v, 1) = v. Notice the renaming of states.
(c) Initial state p, F = {q}, δ( p, 0) = q, δ( p, 1) = q, δ(q, 1) = q, δ(q, 0) = r,
δ(r, 0) = q, δ(r, 1) = q.
Section 3.3
3.11 Initial state p, F = {r }, δ( p, a) = q, δ(q, a) = s, δ(q, b) = r, δ(s, b) = p.
3.12 Initial state p, F = {r }, and transitions δ( p, 0) = δ( p, 1) = δ(q, 0) = q,
δ(q, 1) = δ(r, 0) = δ(r, 1) = r.
3.14 (a) Construct a DFA and then the grammar. One such has productions:
S → a A|b B|ε, A → bC|a S, B → aC|bS, C → a B|b A.
3.15 (a) Initial state p, F = {r } and transitions ( p, a, p), ( p, b, q), ( p, b, r ), (q, a, q),
(q, a, r ), (q, b, p).
(b) Initial state s, F = { p}, Δ = {(s, a, s), (s, b, s), (s, a, r ), (r, b, q), (q, a, p)}.
3.16 Initial state q0 , F = {q2 }, δ(q0 , a) = q1 , δ(q1 , b) = δ(q2 , a) = δ(q2 , b) = q2 ,
δ(q0 , b) = δ(q1 , a) = q3 , δ(q3 , a) = δ(q3 , b) = q4 , δ(q4 , a) = δ(q4 , b) = q5 ,
δ(q5 , a) = q5 , δ(q5 , b) = q5 .
Section 3.4
3.18 Initial and final state p, Δ = {( p, a, q), (q, b, r ), (r, c, p), (r, ε, p)}. No.
3.20 (f) Initial state p, F = {r, v}, Δ = {( p, ε, q),( p, ε, s),(q, a, r ), (r, a, r ), (s, a, t),
(t, b, u), (u, a, u), (u, ε, v), (v, b, v)}.
(g) Initial state p, F = {q}, Δ = {( p, a, p), ( p, b, p), ( p, b, q), (q, b, r ), (q, a, s),
(r, b, s), (s, ε, q)}.
Section 3.5
3.24 Initial state p, Final state r , Δ = {( p, b, q),(q, b, q),(q, b, r ),( p, a, r ),( p, b, r )}.
Problems of Chapter 4
Section 4.2
4.3 As earlier, with F = (Q 1 × F2 ) ∪ (F1 × Q 2 ).
4.5 (c) Initial state q0 , F = {q0 , q1 , q2 , q3 , q5 }, δ(qi , a) = qi+1 , for i = 0(1)4, and
δ(q5 , a) = q5 .
4.12 (b) Closure under concatenation and reversal.
4.14 Take L = Σ ∗ . Conclude that any L is regular.
Section 4.3
4.17 (a) Otherwise, every walk has a finite number of edges, and the accepted string
is of length at most some fixed number. The language is finite.
(b) If not, then such a cycle can be used to generate a string of arbitrary length that
can be accepted.
4.18 (f) Consider L ∩ a ∗ b∗ . (g) w = a^m b^n a^(2m) = x yz. Take x y³z.
(m) Consider its complement.
4.20 (n) No. (o) No. (p) No.
4.21 (b) Use h(a) = h(b) = a, h(c) = c. (d) No. (e) No. (i) No.
4.22 (a) Consider a nonregular language and its complement. (b) L = L ∩ (L ) R
(c) Take a nonregular infinite language. It is an infinite union of singletons.
4.24 (a) One case is: m = 0, k = 0, n > 6. Get a regular expression.
(b) Choose w = a 7 b j a j . One of m > 6 or k ≤ n is violated after suitable pumping.
4.26 (b) #u (E) = #d (E) = #l (E) = #r (E). (c) L is not regular. Prove it.
4.28 The converse of Theorem 4.6 does not hold. For example, consider the language
{a^n b^n : n ∈ N}.
Section 4.4
4.29 First, give a simple verbal description of the language of the DFA.
4.30 D = ({ p, q, r }, {0, 1}, δ, p, {r }) with δ( p, 0) = δ( p, 1) = δ(q, 0) = q,
δ(q, 1) = δ(r, 0) = δ(r, 1) = r.
Section 4.5
4.33 Prove by contradiction.
4.37 (d) Initial state p, F = { p, q, r, t}, δ( p, a) = q, δ(q, a) = r, δ(r, a) = s,
δ(s, a) = δ(s, b) = δ(t, a) = t.
4.38 Initial state p, F = {q}, δ( p, 0) = p, δ( p, 1) = δ(q, 0) = δ(q, 1) = q.
4.39 Prove it by contradiction.
Problems of Chapter 5
Section 5.2
5.10 Not easy to describe.
5.17 Yes.
5.8 The only possible derivation giving aab as a prefix ends with aba.
Section 5.4
5.36 (a) Derive aab. (c) S → ε|a S|aCbS, C → ε|aCbC disambiguates.
5.39 (b) S → a A, A → b|a AB, B → b.
5.40 (a) aab has two left-most derivations. (b) ab has two left-most derivations. An
equivalent unambiguous CFG has productions S → ε|A, A → ab|A A|a Ab.
5.42 From the DFA, get a regular grammar. Leaving aside rules of the form A → ε,
it is a simple CFG. Next, the rules of the kind A → ε never create any ambiguity.
5.45 (a) Yes. (b) Yes. (c) No. (d) No.
Section 5.5
5.46 S → a|aC|Aa|Ba|AaC|ABa|BaC|ABaC, A → B|C|BC, B → b, C → D,
D → c.
5.47 S → a|bc|bb|Aa, A → a|bb|bc, B → a|bb|bc.
5.48 S → a B|a B B, B → bb|b Bb.
5.50 (b) S → A|B|AB, A → b|b A|b A A, B → a|a B|a B B.
5.51 S → S + A|A × B|(S)|a|b|Ca|Cb|C0|C1, A → A × B|(S)|a|b|Ca|Cb|C0|C1,
B → (S)|a|b|Ca|Cb|C0|C1, C → a|b|Ca|Cb|C0|C1.
5.53 S → C A, A → a, C → b.
Section 5.6
5.60 (a) S → AS B|AB, A → b|b A|b AS, B → a|aa|A|a S|Sa|Sa S.
(b-c) S → AB|AS B, A → b|b A|b AS, B → a|b|aa|b A|a S|Sa|b AS|SaS.
(d) S → AB|AE, A → b|C A|C F, B → a|b|C A|C F|D D|DS|S D|SG, C → b,
D → a, E → S B, F → AS, G → DS.
5.63 (d) S → C A, A → D X, C → X B, B → XY, D → C Z , X → a, Y → b,
Z → c.
(f) S → AB|X B|Y Y |Z A, A → W Y, B → Z A|Y Y, Z → Y Y, W → XY, X → a,
Y → b.
(h) S0 → A A1 |U B|a|S A|AS, S → A A1 |U B|a|S A|AS, A → b|A A1 |U B|a|AS|S A,
A1 → S A, U → a, B → b.
5.64 (b) Label of an interior node is a nonterminal.
5.65 Use a CNF. Then, for A → a, use A → a XY, X → ε, Y → ε with new
nonterminals X, Y.
5.66 (c) S → b A|b B S B, A → b, B → a. (g) S → bS|b A|b B S, A → a, B → b.
(i) S → a B|a B B|b AB, A → a|a B|b A, B → a.
5.68 Conjecture which strings are derived from A, B and use induction on the length
of derivations.
Problems of Chapter 6
Section 6.2
6.3 (b) {a} ∪ ab+a. (e) aa ∗ ba ∗.
6.8 (a) Initial state p, F = {q}, Δ = {( p, a, A, p, A A), ( p, b, A, p, B A), ( p, a, B, p,
AB), ( p, b, B, p, B B), ( p, a, ε, p, A), ( p, b, ε, p, B), ( p, ε, A, q, A), ( p, ε, B, q, B),
(q, a, A, q, ε), (q, b, B, q, ε)}. (c) Initial state p, F = {q}, Δ = {( p, ε, ε, q, ε), (q, a,
ε, q, A A), (q, a, A, q, A A A), (q, b, A, q, ε)}. (f) Use nondeterminism to generate
one or two tokens by the transitions ( p, a, ε, q, A), ( p, a, ε, q, A A), etc.
6.10 For the bottom-of-the-stack symbol, see Sect. 6.6.
Section 6.3
6.12 (a) Initial state p, F = {r }, Δ = {( p, ε, ε, q, S), (q, a, S, q, S A), (q, a, S, q, ε),
(q, b, A, q, B), (q, b, B, q, ε), (q, ε, ε, r, ε)}.
(c) Initial and final state q, Δ = {(q, ε, S, q, AS B),(q, ε, S, q, C),(q, ε, C, q, BC A),
(q, ε, C, q, S), (q, ε, C, q, ε), (q, a, A, q, ε), (q, b, B, q, ε)}.
(e) Initial state p, F = {r }, Δ = {( p, ε, ε, q, S), (q, ε, ε, r, ε), (q, a, S, q, A),
(q, a, A, q, ABC), (q, a, A, q, ε), (q, b, A, q, B), (q, b, B, q, ε), (q, c, C, q, ε)}.
(f) Initial state p, F = {s}, Δ = {( p, a, ε, q, ε), (q, a, ε, r, ε), (r, a, ε, r, A A),
(r, a, A, r, A A A), (r, b, A, s, A), (s, b, A, s, ε)}.
(g) Initial state p, F = {q}, Δ = {( p, ε, ε, q, S), (q, a, S, q, SSS), (q, a, S, q, B),
(q, b, B, q, ε)}.
6.13 (d) What is the language?
6.14 (a) Initial state p, final state q, Δ = {( p, ε, ε, p, S A), ( p, a, S, p, S B),
( p, b, S, p, ε), ( p, b, B, p, ε), ( p, ε, A, q, ε)}.
6.15 Yes. From PDA construct a CFG; from this CFG construct a PDA.
Section 6.4
6.18 (g) Choose a^n b^(2n) b^(2n) a^n. (l) Choose a^n b^(n²).
Section 6.5
6.20 (a) Intersect it with a ∗ b∗ c∗ . (e) Closure under union. (g) Closure under union.
(i) Closure under union.
6.21 (b) Write it as {a^m b^m} ∩ (L) where L = {a^2009 b^2009}. (c) Is a CFL. (d) Is a CFL.
(o) Closure under union. (s) Construct a PDA. (t) Choose a^(pq), where both p, q are
primes bigger than n. (v) Yes. Show that {w ∈ {a, b, c}∗ : #a (w) < #b (w)} is a CFL
by breaking into eight cases.
6.22 Replace each production A → x in G by A → x R . Then use induction on the
lengths of all derived strings of terminals and nonterminals.
Section 6.6
6.25 (a) Initial and final states p, Δ = {( p, a, , q, A), (q, a, A, q, A A), (q, b, A,
r, ε), (r, b, A, r, ε), (r, b, , p, )}. (e) Initial state p, F = {r, s}, Δ = {( p, a, , s,),
(s, a, , q, A A), (q, a, A, q, A A), (q, b, A, r, ε), (r, b, A, r, ε), (r, b, , s, )}.
(g) Construct a PDA. How to replace a transition of the form ( p, ε, , ε, )?
(k) Once c is encountered, it matches the number of accumulated A’s by popping
one off at each step.
6.28 What is its complement?
Problems of Chapter 7
Section 7.2
7.1 Yes.
7.2 None is in L(G).
7.6 S → a S BC|a BC, C B → BC, a B → ab, b B → bb, bC → bc, cC → cc.
7.8 L(G) = {ab m cn d : m, n ≥ 1}, a regular language.
7.9 (c) S → aaa A, A → a Ab|ε. (d) S → Ab A, A → a Ab|ε, B → b B|ε.
7.10 (a) S → XY Z , XY → a X A, XY → b X B, Aa → a A, Ab →
b A, Ba → a B, Bb → b B, aY → Y a, bY → Y b, XY → ε, Z → ε.
(b) S → a Sb|a Ab, A → b Aa|ba.
(c) S → a B|b A, A → a|a S|b A A, B → b|bS|a B B.
7.11 (a) S → X AY, X → X D, DY → Y, D A → A AD, X → ε, Y → ε, A → a.
(d) S → A|Sc, A → ab|a Ab.
7.13 (a) ∅. (b) Only strings of length 2n are generated. (c) {a^m b^n c^n : m ≥ 0, n ≥ 1}.
(d) {a b^m c^n d : m, n ≥ 1}. (e) {a^(n²) : n ≥ 1}. (f) {a^(2^n) : n ≥ 1}. (g) {a^n b^n a^n : n ≥ 1}.
(h) {ww : w ∈ {a, b}∗ }. (i) {a^n b^n c^n : n ≥ 1}. (j) {a^(n+1) b^(n+k) : n ≥ 1, k ≥ −1}.
Section 7.3
7.15 δ(s, b̄) = (s, b̄).
7.16 δ(s, σ) = (s, R) if σ ≠ a; δ(s, a) = (h, a).
Section 7.4
7.22 (b) Replace the first a by c and the last b by d, come back to c, go right, continue.
(c) δ(s, b̄) = (s, R), δ(s, a) = δ(q, a) = δ(q, b) = (q, R), δ(q, b̄) = (h, b̄).
(d) δ(s, b̄) = ( p, R), δ( p, b) = (q, R), δ(q, a) = (r, R), δ(r, b) = (r, R),
δ(r, a) = (h, R).
(e) δ(s, b̄) = ( p, R), δ( p, b̄) = (h, b̄), δ( p, a) = (q, R),
δ( p, b) = (q, R), δ(q, a) = δ(q, b) = ( p, R), δ(q, b̄) = ( , b̄).
Section 7.5
7.28 Put left and right markers and move them cell by cell in each direction.
7.34 (a) 1. Go from left to right, crossing off every other a.
2. If in Stage 1 the tape contained a single a, accept.
3. If in Stage 1 the tape contained an odd (but > 1) number of a’s, reject.
4. Return to the original blank cell. Go to Stage 1.
(b) Remember and delete the first symbol, search for c on the right, match and delete
the next symbol. Remove the blank by shifting suitably. Go to the original blank cell,
and start over again.
(d) 1. Scan from left to right and check if the string is in a ∗ b∗ c∗ ; else reject.
2. Come back to the original blank cell.
3. Delete one a.
4. Replace one b by a d, delete a c; do this until all b are replaced.
Change all d’s to b’s.
5. Continue to execute Stages 3–4 if there is another a to be deleted.
6. If all a’s are deleted, check whether all c’s have been deleted.
If yes, accept, else, reject.
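Reading the stages as a membership test for {a^m b^n c^(mn)} (my reading of the staged hint, not stated in the book), the algorithm can be sketched in Python:

```python
import re

def stage_check(w):
    """Simulate the staged deletion algorithm of the hint on a list 'tape'."""
    if not re.fullmatch(r'a*b*c*', w):   # Stage 1: the shape must be a*b*c*.
        return False
    tape = list(w)
    while 'a' in tape:
        tape.remove('a')                 # Stage 3: delete one a.
        while 'b' in tape:               # Stage 4: replace one b by d, delete one c.
            tape[tape.index('b')] = 'd'
            if 'c' not in tape:
                return False
            tape.remove('c')
        tape = ['b' if s == 'd' else s for s in tape]   # d's back to b's
    return 'c' not in tape               # Stage 6: all c's must be deleted.

print(stage_check('aabbcccc'))  # True: 2 * 2 = 4 c's
print(stage_check('aabbccc'))   # False
```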
Section 7.6
7.37 Move all a’s from the beginning of the input to another tape, and then match
the contents of the two tapes.
7.38 Can it write β instead of b̄?
7.40 No. Represent each σ by a new symbol σ′. If needed, revert back to σ.
7.41 Replace δ( p, {a, b}) = (q, R) by δ( p, σ) = (q, R), for each σ ∈ {a, b}.
Section 7.7
7.44 Generate n on another tape, and then compare the nth symbol of u and v. If not
equal, accept; else, reject.
7.47 (c) Nondeterministically guess the middle of the input and then compare the
left and the right strings. (h) For n > 1, generate nondeterministically a positive
integer m and check whether n is a multiple of m.
7.49 The DTM may be difficult, while the NTM can guess where this 0 is.
Problems of Chapter 8
Section 8.2
8.3 (a) Find the first blank, write a 0, find next blank, move left, remove two 0’s.
8.8 Write a c on both the sides of the string; bring them closer symbol by symbol.
Replace the middle cc by c.
8.9 (f) ⌈x⌉ is the least integer greater than or equal to x. Thus, ⌈n/2⌉ = m or m + 1
according as n = 2m or n = 2m + 1.
(i) The Euclidean algorithm is given by:
if n = 0 or n = m, then return m, else return gcd(n, m mod n).
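The recursion in this hint translates directly into code; for instance, in Python:

```python
def gcd(m, n):
    """The Euclidean algorithm exactly as stated in the hint for 8.9(i)."""
    if n == 0 or n == m:
        return m
    return gcd(n, m % n)

print(gcd(48, 18))  # 6
print(gcd(7, 7))    # 7
```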
Section 8.3
8.24 Computably enumerable languages are not closed under complementation.
8.25 It computes the function f, where f (u, v) = 1 if u is a substring of v, else,
f (u, v) = 0.
8.26 For example, test whether an integer is prime.
8.28 L1 = ∪_{i=2}^{k} Li, and a finite union of computably enumerable languages is
computably enumerable.
Section 8.5
8.35 Yes, as there are only finitely many splits of any string from L + .
8.36 (e) Computably enumerable. (g) Not computably enumerable.
(k) Computably enumerable. (l) Not computably enumerable.
8.38 If L is computably enumerable, design an NTM M for h(L) as follows. Suppose
w is the input to M. On a second tape, M guesses some string v and then checks
whether h(v) = w. Next, it simulates the TM for L on v. If v is accepted, then M
accepts w.
For the decidable languages, consider the language L consisting of strings of the
form ψ(M)# w@ 2n+1 , where M has input alphabet {0, 1}, and w ∈ {0, 1}∗ . The
string is in L iff M accepts w after making at most n moves. L is decidable as we
may simulate M on w for n moves and then decide whether or not to accept. Take
the homomorphism h defined by h(0) = 0, h(1) = 1, h(#) = h(@) = h(2) = ε. Apply
h to L. We find that h(L) is the acceptance language, which is not decidable.
Section 8.6
8.40 (a) S → abc|a Abc, Ab → b A, Ac → Bbcc, b B → Bb, a B → aa|aa A.
(b) Create a^n B c^n D. The markers B and D assure the correct number of b’s and
d’s created. Then, B travels to the right to meet D. Finally, we create one d and a
return messenger that puts b in the right place. That is, S → a AcD|a BcD, A →
a Ac|a Bc, Bc → cB, Bb → b B, B D → Ed, cE → Ec, b E → Eb, a E → ab.
8.41 Right-linear: S → aa A, A → a A|B, B → bbbC, C → bC|ε.
Left-linear: S → Abbb, A → Ab|B, B → aaC, C → a A|ε.
8.42 Yes. Prove it.
8.45 (b) Choose a^n b^(2n) a^n. (c) Choose a^n b^(2n) a^n. (e) Choose (n a )n + (n a )n .
8.47 (b) Go on dividing n in a n successively by 2, 3, 4, . . . until the string is accepted
or rejected. You may have to rewrite the quotient on the input squares and delete
others.
(f) Use a three-tape machine. On the third tape, keep the current value of ℓ(w). On
the second tape, keep markers after each ℓ(w) squares. Compare the strings between
the markers.
8.51 L(G) is linear.
8.52 aab(ab)∗.
8.53 Both generate {a n bn cn : n ≥ 1}.
8.54 You can find a suitable grammar. Alternatively, an LBA can start at the two
opposite ends and stop when a match is found.
8.55 A nonregular CFG can generate a regular language.
Problems of Chapter 9
Section 9.2
9.1 Modify the transitions of a given machine D in such a way that q would behave
like a halt state.
9.3 Construct Dw from D that writes w on the tape, and then simulates D.
9.4 How many n-state TMs are there which halt? From a machine M and a number
m, construct D which always halts with one of two answers: M on the blank tape
halts in at most m moves, or M on the blank tape takes more than m moves. Use this
D and the TM that computes f to solve the blank-tape problem.
9.5 Modify the halting transitions of M so that it first writes σ and then halts.
9.7 (a) Decidable.
9.12 L A is not map-reducible to L ∅ .
9.13 Yes.
9.16 L A is not computably enumerable, while both L A and L A are undecidable.
Section 9.3
9.24 This shows that problems and their complements are essentially different. They
are similar only when either, and hence each, is decidable.
9.25 Reduce the acceptance problem to this problem by modifying M in such a way
that C leads to a halted configuration in one step.
9.26 Rice’s theorem.
9.27 No; a nontrivial property of RE.
9.29 Yes.
9.30 No.
9.31 (b) Yes. If M has n states, consider the first n moves of M. (c) Unsolvable.
9.34 (a) No. Choose L(N) = ∅.
9.36 Show that each TM can be simulated by one having only two states other than
h and ; you may have to enlarge the tape alphabet.
9.37 (c) Take L(G′) = Σ ∗ . (d) Unsolvable. (e) Unsolvable.
9.40 (a) Solvable. (b) Solvable. (c) Unsolvable.
Section 9.4
9.42 (g) L − L is regular. (h) L ∪ L is regular. Is L = L ∪ L ?
(k) Construct a DFA for L R from a DFA for L. Then, check equality.
(p) If L over {a, b} contains no even length string, then L ∩ (aa ∪ ab ∪ ba ∪ bb)∗ = ∅.
Section 9.5
9.46 (a) Solvable. (b) Solvable. (c) Unsolvable. (d) Solvable. (e) Solvable.
(g) Unsolvable. (h) Unsolvable. (i) Unsolvable. (j) Unsolvable. (k) Unsolvable.
(l) Unsolvable. (m) Solvable. (n) Unsolvable. (o) Solvable. Use Pumping Lemma.
(p) Unsolvable. (s) Unsolvable. (t) Solvable. Such a PDA accepts all strings iff it
accepts all strings of length 1.
Section 9.6
9.52 No.
9.54 Yes. No.
9.55 P has a match; P′ does not.
9.61 (a) Mimic the reduction of MPCP to PCP.
Section 9.7
9.64 Write a given sentence φ in prenex form, where all the quantifiers are in the
beginning. Suppose there are k quantifiers. Give an algorithm that constructs a DFA
Di for each i from 0 to k. The DFA Di recognizes the collection of all strings
representing i-tuples of numbers that make the formula φi true, where φi is obtained
from φ by chopping off the first i occurrences of quantifiers. Use Di to construct Di−1 .
Once D0 is constructed, test whether D0 accepts the empty string. If it does, φ is true.
Section 9.8
9.67 φ M,w (x) states that x is a suitably encoded accepting computation history of M
on w.
9.69 (b) This says that semi-decidability can be had by an unbounded search,
monitoring at each stage a decidable property. (c) Use part (b).
(e) The predicate Pr ( p, φ) that stands for “ p is a proof of φ,” is decidable. Encode p
and φ as numbers. (f) The Diophantine predicate p(· · · ) = 0 is decidable, by Sturm’s
theorem.
9.72 (c) This can be interpreted in two ways.
9.112 The proof follows similar lines as for PCP. Remember to modify the construc-
tion in such a way that when a TM M enters its accepting state, the current string can
eventually be erased!
9.113 Reduce the acceptance problem to tiling.
9.116 Using induction on the length of y, show that if x, y are propositions and x is
a prefix of y, then x = y.
9.117 S → p A|¬S|(S ∧ S)|(S ∨ S)|(S → S)|(S ↔ S), A → ε|A0.
9.119 (a) Some TM can enumerate all provable statements.
(b) Reduce the acceptance problem to Th(N); use valid computation histories.
9.121 Use diagonalization.
9.124 Let P be a nontrivial property of computably enumerable languages. If it were
decidable for TMs M whether L(M) ∈ P, then we could construct a computable map
f having no fixed point.
9.126 Construct M by “On input w, obtain, via the recursion theorem, ψ(M). Compute
f (ψ(M)) to get ψ(M′). Simulate M′ on w.” The crux is to show that f (ψ(M)) =
ψ(M′) for some M′.
9.128 (b) If so, then M accepts 0, and then X would be false.
9.129 Consider the polynomial p(x, y) = x − (x + 1)(q(x, y))². Then p(x, y) is
nonnegative iff q(x, y) = 0.
9.131 (c) How many languages are decided by oracle TMs?
9.133 (c) Empty = {ψ(M) : ∀x∀y(M does not accept x in y steps)}. Next, ∀x∀y can
be combined into one using the map that takes (i, j ) to i + (i + j )(i + j + 1)/2.
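The map in question is the Cantor pairing function; a quick Python check (my illustration) that it assigns distinct codes to distinct pairs, which is why the two quantifiers collapse into one:

```python
def pair(i, j):
    """Cantor pairing: combines two natural numbers into one."""
    return i + (i + j) * (i + j + 1) // 2

# Distinct pairs get distinct codes on a large initial square.
codes = {pair(i, j) for i in range(50) for j in range(50)}
assert len(codes) == 50 * 50
print(pair(0, 0), pair(0, 1), pair(1, 0), pair(1, 1))  # 0 1 2 4
```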
(d) Total = {ψ(M) : ∀x∃y(M halts on x in y steps)}.
(e) Finite = {ψ(M) : ∃n∀x∀y(ℓ(x) ≤ n or M does not accept x in y steps)}. The two
universal quantifiers can be combined into one.
(f) CoFinite = {ψ(M) : ∃n∀x∃y(ℓ(x) ≤ n or M accepts x in y steps)}.
Problems of Chapter 10
Section 10.2
10.1 (a) f (n) = O(g(n)), f (n) = o(g(n)), g(n) ≠ O( f (n)), g(n) ≠ o( f (n)).
(h) f (n) = o(g(n)), g(n) ≠ O( f (n)).
10.2 (a) False. (b) True. (c) True.
10.3 (a) True. (d) True.
Section 10.3
10.15 (a) Guess the middle of the string, copy the second part to second tape, then
compare the strings.
10.16 Where is {a n bn : n ∈ N}?
Section 10.4
10.20 Yes.
10.21 Use tape compression.
10.22 The map reduction here uses an O(1) space.
10.23 A TM that uses f (n) space runs in time 2^(O( f (n))). Moreover, PATH is in P.
10.25 (a) True. (c) True. (d) True.
10.28 Yes.
Section 10.5
10.36 Use diagonalization. Consider Σ = {0, 1}. Assume that strings in Σ + have
been arranged in lexicographic order, and that we have an enumeration of all TMs
as M1 , M2 , . . . . Next, define L = {wi : Mi does not accept wi in f (ℓ(wi)) or fewer
steps}. Show that L is decidable but is not in DT( f (n)).
10.38 (a) Linear. (b) O(n 3 ). (c) O(n 2 ). (d) Linear. (e) Linear.
10.40 An O(n²) algorithm for emptiness check is easy to construct. Modify it.
10.41 Use emptiness check.
10.42 Use emptiness check.
10.46 Use the Euclidean algorithm for computing the gcd, the greatest common
divisor, of two numbers. It uses gcd(m, n) = gcd(m, m mod n).
10.48 This result has been improved recently. See [3].
10.49 An undirected graph is connected iff there is a path between every pair of
vertices. A crude algorithm for checking connectedness is: mark some vertex; mark
a vertex (another) if there is an edge between it and some marked vertex. Finally,
check whether all vertices have been marked.
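The marking procedure can be written out in a few lines of Python (the edge representation is my choice, not the book's):

```python
def is_connected(vertices, edges):
    """The crude marking algorithm from the hint, on an undirected graph."""
    if not vertices:
        return True
    marked = {next(iter(vertices))}          # mark some vertex
    changed = True
    while changed:                           # mark across edges into marked set
        changed = False
        for u, v in edges:
            if (u in marked) != (v in marked):
                marked |= {u, v}
                changed = True
    return marked == set(vertices)           # finally, is every vertex marked?

print(is_connected({1, 2, 3}, {(1, 2), (2, 3)}))  # True
print(is_connected({1, 2, 3}, {(1, 2)}))          # False
```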
Section 10.6
10.54 For concatenation, suppose the input is of length n. For each i between 1 and
n − 1, test whether positions 1 through i hold a string in L 1 and positions i + 1 through
n hold a string in L 2 . If so, accept. If the test fails for all i, reject. The running time
of this test is at most n times the sum of the running times of the recognizers for L 1
and L 2 .
For Kleene star, let x ∈ L ∗ , where L ∈ P. Consider all substrings of x; one by one
with increasing length. Reviewing the CYK algorithm may help.
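For Kleene star, the substring survey the hint alludes to is a dynamic program over prefixes; here is a Python sketch with an arbitrary membership test for L (my illustration):

```python
def in_star(w, in_L):
    """Dynamic program for membership in L*, given a membership test for L."""
    n = len(w)
    ok = [False] * (n + 1)   # ok[i]: the prefix w[:i] lies in L*
    ok[0] = True             # the empty string is in L*
    for i in range(1, n + 1):
        # w[:i] is in L* iff some earlier good prefix extends by a block in L.
        ok[i] = any(ok[j] and in_L(w[j:i]) for j in range(i))
    return ok[n]

# Example with L = {ab, ba}:
print(in_star('abba', lambda u: u in ('ab', 'ba')))  # True
print(in_star('aba',  lambda u: u in ('ab', 'ba')))  # False
```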
Section 10.7
10.64 If L ∈ P, then the complement of {ψ(M) : M accepts ψ(M) after at most
2^(ℓ(ψ(M))) steps} is also in P.
10.65 Use Euler’s characterization that an undirected graph G is Eulerian iff G is
connected and each vertex of G has even degree. A similar result also holds for
directed graphs.
Section 10.8
10.68 (b) Part (a) provides one case. For the other case, take f to be a model of Y.
10.75 (b) As there are only three Hamiltonian cycles in G, the formula is:
(x 12 ∧ x 23 ∧ x 34 ∧ x 14 ) ∨ (x 13 ∧ x 23 ∧ x 24 ∧ x 14 ) ∨ (x 13 ∧ x 34 ∧ x 24 ∧ x 12 ).
10.77 Show: the reduction can be carried out in O(1) space, and in O(n) time.
10.78 As the numbers in A are given in binary, the length of k is not a polynomial in n.
10.109 (c) NP is a class of languages and factoring is a function. Thus, saying that
factoring is in NP will not do.
10.110 Prove that 3COLOR is in NP. Next, construct a polynomial time reduction
of 3COLOR to SAT.
10.115 b = 2k is worth trying.
10.116 Reduce HP to LPATH.
10.117 First, try k as a power of 2.
10.118 (c) Choose L not from DS( f (n)), and a TM accepting L whose running time
is h(n). Take g(n) = max( f (n), h(n)).
(f) Use diagonalization. The infimum condition says that there is an input w of length
n such that (log t) f (n) < g(n), where t is the number of tape symbols of any consid-
ered TM.
(o) There are positive integers m, k such that r ≤ m/k and r + c ≥ (m + 1)/k. Prove
that NS(n^(m/k)) ≠ NS(n^((m+1)/k)).
10.123 Use the space complexity of a TM that computes f (n).
10.128 Consider C = { p, q, r, t, ¬ p∨¬q, ¬q ∨¬r, ¬r ∨¬ p, p∨¬t, q ∨¬t, r ∨¬t}.
Show that if an interpretation is a model of p ∨ q ∨ r, then it can be extended to a
model of exactly seven of the clauses from C and no more. Moreover, exactly one
interpretation of p, q, r (the one falsifying all three) can be extended to a model of
at most six of the clauses from C. Use this gadget to reduce 3SAT to MAX2SAT.
10.129 There would be some constant c such that an NP problem could be solved
in time O(n^(c log n)).
10.130 (a) Reduce SAT to TRUESAT. Suppose we are given an expression E with
variables x 1 , x 2 , . . . , x n . Convert E to E′ as follows:
(i) First, test if E is true when all variables are true. If so, E is satisfiable; so convert
it to a specific expression x ∨ y that we know is in TRUESAT.
(ii) Otherwise, let E′ = E ∨ (x 1 ∧ x 2 ∧ · · · ∧ x n ). It is a polynomial time reduction.
E′ is true when all variables are true.
If E is in SAT, then it is satisfied by some interpretation, which does not assign
all variables to 1, because we tested such an interpretation and found E to be false.
Thus, E′ is in TRUESAT. Conversely, if E′ is in TRUESAT, then as x 1 ∧ x 2 ∧ · · · ∧ x n
is true only when all of them are 1, E must be satisfiable.
10.131 Guess z, compute f (z) deterministically in polynomial time, and test whether
f (z) = x. When the guess of z is correct, we have f −1 (x). Compare it with y, and
accept the pair (x, y) if z < y. This shows that L ∈ NP. To show that L ∈ CoNP, we
require that the set of inputs that are not of the form (x, y), with f −1 (x) < y, be in NP.
It is easy to check for ill-formed inputs. For the rest of the inputs, guess z, compute
f (z), test if f (z) = x, and then test if z ≥ y. If both tests succeed, then f −1 (x) ≥ y,
so (x, y) is in L.
10.134 (a) For any polynomially bounded TM M and an input w, construct, in poly-
nomial time, a regular expression R that generates all strings that are not sequences
of configurations of M leading to the acceptance of w.
10.139 (a) If F = F1 ∨ F2 , then (if F1 is true, F is true; else, the value of F is the
value of F2 ). If F = ∃x E, then (evaluate E by taking each x in it as 0; if the result
is 1, then F is true; else, evaluate E by taking each x as 1; the result is the value of F).
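The case analysis in (a) is a small recursive evaluator that reuses space across the two branches. A Python sketch over a hypothetical tuple encoding of quantified boolean formulas (the encoding is mine, not the book's):

```python
def qbf(F, env=None):
    """Recursive evaluator in the spirit of the hint for 10.139(a).
    Formulas are tuples: ('var', x), ('not', F), ('or'/'and', F1, F2),
    ('exists'/'forall', x, F)."""
    env = env or {}
    op = F[0]
    if op == 'var':
        return env[F[1]]
    if op == 'not':
        return not qbf(F[1], env)
    if op == 'or':
        return qbf(F[1], env) or qbf(F[2], env)
    if op == 'and':
        return qbf(F[1], env) and qbf(F[2], env)
    if op == 'exists':   # try x = 0, then x = 1, reusing the same space
        return any(qbf(F[2], {**env, F[1]: b}) for b in (False, True))
    if op == 'forall':
        return all(qbf(F[2], {**env, F[1]: b}) for b in (False, True))

# forall x exists y (x or y) is true:
F = ('forall', 'x', ('exists', 'y', ('or', ('var', 'x'), ('var', 'y'))))
print(qbf(F))  # True
```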
(b) For a DTM M that uses p(n) space, and an input w of length n, construct a quan-
tified boolean formula that is true iff M accepts w. You will have to consider trans-
lating the moves of M as quantified boolean formulas analogous to the construction
in Cook’s theorem.
10.146 (a) As C progresses, B finds a cute algorithm, say D, that satisfies ℓ(ψ(D)) >
ℓ(ψ(C)). Next, D is run. Thus, C does exactly what D does, eventually. Then D is
not cute; a contradiction.
References
16. Chomsky, N.: Context-free grammars and pushdown storage. Tech. Rep., MIT
Research Lab in Electronics. Cambridge, MA (1962)
17. Chomsky, N., Miller, G.A.: Finite state languages. Inform. Control. 1, 91–112
(1958)
18. Chomsky, N., Schützenberger, M.P.: The algebraic theory of context-free
languages. In: Braffort, P., Hirschberg, D. (eds.) Computer Programming and
Formal Systems, pp. 118–161. North-Holland, Amsterdam (1963)
19. Church, A.: An unsolvable problem of elementary number theory. Amer.
J. Math. 58, 345–363 (1936)
20. Cobham, A.: The intrinsic computational difficulty of functions. In: Proc. 1964
Congress for Logic, Math. Phil. of Sci., pp. 24–30. North-Holland, Amsterdam
(1964)
21. Cohn, P.M.: Algebra, Vol 2. Wiley, NY (1977)
22. Conway, J.H.: Regular Algebra and Finite Machines. Chapman and Hall,
London (1971)
23. Cook, S.A.: The complexity of theorem proving procedures. In: Proc. Third
Symp. Theory of Computing, pp. 151–158. Assoc. Comput. Mach., NY (1971)
24. Curry, H.B.: An analysis of logical substitutions. Am. J. Math. 51, 363–384
(1929)
25. Cutland, N.: Computability: An Introduction to Recursive Function Theory.
Cambridge University Press, NY (1980)
26. Davis, M.: Hilbert’s tenth problem is unsolvable. Amer. Math. Monthly. 80,
233–269 (1973)
27. Denning, P.J., Dennis, J.B., Qualitz, J.E.: Machines, Languages, and Computa-
tion. Prentice Hall, Englewood-Cliffs, NJ (1978)
28. Earley, J.: An efficient context-free parsing algorithm. Comm. of ACM. 13,
94–102 (1970)
29. Ehrenfeucht, A., Parikh, R., Rozenberg, G.: Pumping lemmas and regular sets.
SIAM J. Comput. 10, 536–541(1981)
30. Evey, J.: Application of pushdown store machines. In: Proc. Fall Joint Com-
puter Conf., pp. 215–227. AFIPS Press, Montvale, NJ (1963)
31. Fischer, P.C.: On computability by certain classes of restricted Turing ma-
chines. In: Proc. Fourth Symp. Switching Circuit Theory and Logical Design,
pp. 23–32 (1963)
32. Fischer, P.C.: Turing machines with restricted memory access. Inform. Control.
9, 364–379 (1966)
33. Floyd, R.W.: On ambiguity in phrase-structure languages. Comm. ACM. 5,
526–534 (1962)
34. Gallier, J., Hicks, A.: The Theory of Languages and Computation, web page:
http://www.cis.upenn.edu/~jean/gbooks/tc.html. Cited 15 Nov 2008
35. Gamow, G.: One, Two, Three, . . . Infinity: Facts and Speculations of Science,
The Viking Press, NY (1961)
36. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the The-
ory of NP-completeness. W.H. Freeman, NY (1979)
37. Ginsburg, S.: Examples of abstract machines. IEEE Trans. Elec. Comp. EC-11,
132–135 (1962)
61. Jaffe, J.: A necessary and sufficient pumping lemma for regular languages.
SIGACT News. 10, 48–49 (1978)
62. Karp, R.M.: Reducibility among combinatorial problems. In: Miller, R.E.,
Thatcher, J.W. (eds.) Complexity of Computer Computations, pp. 85–103.
Plenum Press, NY (1972)
63. Kasami, T.: An efficient recognition and syntax algorithm for context-free
languages. Tech. Rep. AFCRL-65–758, Air Force Cambridge Research Lab.,
Bedford, MA (1965)
64. Kleene, S.C.: General recursive functions of natural numbers. Math. Annalen.
112, 727–742 (1936)
65. Kleene, S.C.: Recursive predicates and recursive quantifiers. Trans. Am. Math.
Soc. 53, 41–74 (1943)
66. Kleene, S.C.: Introduction to Metamathematics. D. Van Nostrand, Princeton,
NJ (1974)
67. Kleene, S.C.: Representation of events in nerve nets and finite automata. In:
Shannon, C.E., McCarthy, J. (eds.) Automata Studies, pp. 3–41. Princeton Uni-
versity Press, Princeton, NJ (1956)
68. Knuth, D.E.: On the translation of languages from left to right. Inform. Control.
8, 607–639 (1965)
69. Knuth, D.E., Morris Jr., J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM
J. Comput. 6, 323–350 (1977)
70. Kozen, D.C.: A completeness theorem for Kleene algebras and the algebra of
regular events. Inform. Comput. 110, 366–390 (1994)
71. Kozen, D.C.: Automata and Computability. Springer-Verlag, NY (1997)
72. Kozen, D.C.: Theory of Computation. Springer-Verlag, NY (2006)
73. Kuich, W.: The Kleene and Parikh theorem in complete semirings. In:
Ottmann, T. (ed.) Proc. 14th Colloq. Aut. Lang. and Progr., Lect. Notes in
Comp. Sci. 257, pp. 212–225. EATCS, Springer-Verlag, NY (1987)
74. Kuroda, S.Y.: Classes of languages and linear bounded automata. Inform. Con-
trol. 7, 207–223 (1964)
75. Landweber, P.S.: Three theorems on phrase structure grammars of type 1.
Inform. Control. 6, 131–136 (1963)
76. Levin, L.A.: Universal sorting problems. Problemy Peredachi Informatsii. 9,
265–266 (1973)
77. Lewis, P.M., Rosenkrantz, D.J., Stearns, R.E.: Compiler Design Theory.
Addison-Wesley, Reading, MA (1976)
78. Lewis, H.R., Papadimitriou, C.H.: Elements of the Theory of Computation, 2nd
ed. Pearson Education, NJ (1998)
79. Linz, P.: An Introduction to Formal Languages and Automata, 4th ed. Jones
and Bartlett, MA (2006)
80. Manin, Y.I.: A Course in Mathematical Logic. Springer-Verlag, NY (1977)
81. Manna, Z.: Mathematical Theory of Computation. McGraw-Hill, NY (1974)
82. Markov, A.A.: The Theory of Algorithms. Trudy Math. Steklov Inst. (1954).
Eng. Trans.: National Science Foundation, Washington, DC (1961)
83. Matiyasevich, Y.: Enumerable sets are Diophantine. Dokl. Akad. Nauk SSSR.
191, 279–282 (1970). Eng. Trans.: Soviet Math. Dokl. 11, 354–357 (1970)
84. McCulloch, W.S., Pitts, W.: A logical calculus of the ideas immanent in nervous
activity. Bull. Math. Biophy. 5, 39–47 (1943)
85. McNaughton, R., Yamada, H.: Regular expressions and state graphs for au-
tomata. IEEE Trans. Elec. Computs. 9, 39–47 (1960)
86. Mealy, G.H.: A method for synthesizing sequential circuits. Bell Sys. Tech. J.
34, 1045–1079 (1955)
87. Meyer, A.R., Ritchie, D.M.: The complexity of loop programs. In: Proc. ACM
Natl. Meeting, pp. 465–469 (1967)
88. Milner, R.: Operational and algebraic semantics of concurrent processes. In:
van Leuwen, J. (ed.) Handbook of Theoretical Computer Science, Vol B,
pp. 1201–1242. North-Holland, Amsterdam (1990)
89. Minsky, M.L.: Recursive unsolvability of Post’s problem of tag and other topics
in the theory of Turing machines. Ann. Math. 74, 437–455 (1961)
90. Minsky, M.L.: Computation: Finite and Infinite Machines. Prentice Hall,
Englewood-Cliffs, NJ (1967)
91. Moore, E.F.: Gedanken-experiments on sequential machines. In: Shannon, C.E.,
McCarthy, J. (eds.) Automata Studies, pp. 129–153. Princeton University Press,
Princeton, NJ (1956)
92. Moret, B.M.E.: The Theory of Computation. Addison-Wesley, NY (1998)
93. Myhill, J.: Finite automata and the representation of events. WADD Technical
Note 57–624, Wright Patterson AFB, Dayton, Ohio (1960)
94. Naur, P.: Revised report on the algorithmic language Algol 60. Comm. of the
ACM. 6, 1–7 (1963). Reprinted in: Rosen, S. (ed.) Programming Systems and
Languages, pp. 79–118. McGraw-Hill, NY (1967)
95. Nerode, A.: Linear automaton transformations. Proc. Am. Math. Soc. 9, 541–
544 (1958)
96. Oettinger, A.G.: Automatic syntactic analysis and the pushdown store. In: Proc.
Symp. on Appl. Math. Vol 12., pp. 104–129. American Math. Soc., Providence,
RI (1961)
97. Ogden, W.G.: A helpful result for proving inherent ambiguity. Math. Sys.
Theory 2, 191–194 (1968)
98. Papadimitriou, C.H.: Computational Complexity. Addison-Wesley, Reading,
MA (1994)
99. Papadimitriou, C.H., Steiglitz, K.: Combinatorial Optimization: Algorithms and
Complexity, 2nd ed. Dover, NY (1997)
100. Parikh, R.: On context-free languages. J. Assoc. Comput. Mach. 13, 570–581
(1966)
101. Parikh, R.J.: Existence and Feasibility in Arithmetic. J. Symbolic Logic 36,
494–508 (1971).
102. Polya, G.: Mathematics and Plausible Reasoning, Vol I: Induction and Anal-
ogy in Mathematics. Princeton University Press, Princeton, NJ (1954)
103. Post, E.: Finite combinatory processes – formulation 1. J. Symb. Logic. 1, 103–
105 (1936)
104. Post, E.: Formal reductions of the general combinatorial decision problem. Am.
J. Math. 65, 197–215 (1943)
105. Post, E.: Recursively enumerable sets of positive natural numbers and their
decision problems. Bull. Am. Math. Soc. 50, 284–316 (1944)
106. Post, E.: A variant of a recursively unsolvable problem. Bull. Am. Math. Soc.
52, 264–268 (1946)
107. Rabin, M.O., Scott, D.S.: Finite automata and their decision problems. IBM
J. Res. Develop. 3, 115–125 (1959)
108. Reeves, C.R.: Modern Heuristic Techniques for Combinatorial Problems. John
Wiley, NY (1993)
109. Rice, H.G.: Classes of recursively enumerable sets and their decision problems.
Trans. Am. Math. Soc. 74, 358–366 (1953)
110. Rice, H.G.: On completely recursively enumerable classes and their key arrays.
J. Sym. Logic. 21, 301–341 (1956)
111. Rogers Jr., H.: Theory of Recursive Functions and Effective Computability.
McGraw-Hill, NY (1967)
112. Salomaa, A.: Two complete axiom systems for the algebra of regular events.
J. Assoc. Comput. Mach. 13, 158–169 (1966)
113. Salomaa, A.: Formal Languages. Academic Press, NY (1973)
114. Scheinberg, S.: Note on Boolean properties of context-free languages. Inform.
Control. 3, 372–375 (1960)
115. Schützenberger, M.P.: On context-free languages and pushdown automata.
Inform. Control. 6, 246–264 (1963)
116. Seiferas, J.I., McNaughton, R.: Regularity preserving relations. Theor. Comp.
Sci. 2, 147–154 (1976)
117. Shepherdson, J.C.: The reduction of two-way automata to one-way automata.
IBM J. Res. Develop. 3, 198–200 (1959)
118. Shepherdson, J.C., Sturgis, H.C.: Computability of recursive functions. J. As-
soc. Comput. Mach. 10, 217–255 (1963)
119. Shoenfield, J.R.: Degrees of Unsolvability. North-Holland, Amsterdam (1971)
120. Singh, A., Goswamy, C.: Fundamentals of Logic. Indian Council of Philosoph-
ical Research, New Delhi (1998)
121. Singh, A.: Logics for Computer Science. PHI, New Delhi (2003)
122. Sipser, M.: Introduction to the Theory of Computation. PWS Pub. Co., NY
(1997)
123. Soare, R.I.: Recursively Enumerable Sets and Degrees. Springer-Verlag, Berlin
(1987)
124. Soare, R.I.: Computability and recursion. Bull. Symb. Logic. 2, 284–321
(1996)
125. Soare, R.I.: Computability and incomputability. In: Proc. Third Conf. on Com-
putability in Europe, CIE 2007, Siena, Italy, June 18–23, 2007. Lect. Notes in
Comp. Sci. No. 4497, S.B. Cooper, B. Löwe, A. Sorbi (Eds.) Springer-Verlag,
Berlin, Heidelberg (2007)
126. Stanat, D., Weiss, S.: A pumping theorem for regular languages. SIGACT News
14, 36–37 (1982)
127. Stearns, R.E., Hartmanis, J.: Regularity preserving modifications of regular ex-
pressions. Inform. Control 6, 55–69 (1963)
128. Stockmeyer, L.J.: The polynomial-time hierarchy. Theor. Comp. Sci. 3, 1–22
(1976)
129. Thomas, W.: Languages, automata, and logic. In: Rozenberg, G., Salomaa, A.
(eds.) Handbook of Formal Languages, Vol III, pp. 389–455. Springer-Verlag,
NY (1997)
130. Thompson, K.: Regular expression search algorithms. Comm. ACM 11,
419–422 (1968)
131. Turing, A.M.: On computable numbers with an application to the Entschei-
dungsproblem. Proc. Lond. Math. Soc. 42, 230–265 (1936), Erratum: Ibid. 43,
544–546 (1937)
132. Turing, A.M.: Systems of logic based on ordinals. Proc. Lond. Math. Soc. 45,
161–228 (1939)
133. Valiant, L.G.: General context-free recognition in less than cubic time. J. Comp.
Sys. Sci. 10, 308–315 (1975)
134. Weil, P.: Algebraic recognizability of languages. In: Fiala, J., Koubek, V.,
Kratochvíl, J. (eds.) Lect. Notes in Comp. Sci. No. 3153, pp. 149–175.
Springer-Verlag (2004)
135. Younger, D.H.: Recognition and parsing of context-free languages in time n³.
Inform. Control. 10, 189–208 (1967)
Index
algorithm
  k-optimal, 371
  approximate, 371
  CKY, 374
  cute, 382
  CYK, 374
  Euclidean, 250
  Kruskal, 337
  node elimination, 82
  state elimination, 82
  State Minimization, 112
  state minimization, 373
  subset construction, 72
alphabet, 32
  input, 46, 206
  output, 207
  tape, 206
ambiguous, 139
analytic hierarchy, 326
argument by cases, 17
arithmetic hierarchy, 325
arithmetic progression, 122
arity, 308
asterate, 34
automaton
  2-tape, 67
  counter, 243
  deterministic linear bounded, 269
  linear bounded, 267
  product, 95
  pushdown, 160
  pushdown, deterministic, 186
  queue, 243
  quotient, 111
  ray, 319
  two-stack, 243
axiom of choice, 26
balanced parentheses, 129
basic pump, 195
big-Θ, 331
big-Ω, 331
big-oh of, 330
bisimulation, 119
boolean formula, 348
branch, 22
branch node, 6
Cantor’s theorem, 15
Cartesian product, 2
certificate, 348
CFG, 125
  complexity, 157
  minimal, 157
  simple, 141
CFL, 126
children, 6
Chomsky’s hierarchy, 268
Church-Turing thesis, 256
clique, 275
  size of, 275
closed downward under , 123
closure, 34
  reflexive and transitive, 6
closure properties
  regular languages, 94
CNF, 147
cnf, 349
  conversion, 367
co-domain, 9
commutative image, 195
compactness, 290
complement, 2, 34
computable
  grammatically, 273
  Markov System, 274
computably enumerable in, 325
computation, 47, 208
  PDA, 162
  relative, 325
  step of, 46
  valid, 297
computes, 246
concatenation, 32, 34
conditional hypothesis, 19
configuration, 46, 56, 161, 207, 228
  accepting, 211, 233
  combined machine, 221
  DPDA, 187
  halted, 210, 233
  non-halted, 211
  of M on w, 338
  rejecting, 211, 233
connected, 5
connects, 5
contraposition, 17
countable, 12
cycle, 5
DCFL, 189
decidable
decidable in, 325
decider, 252
  deterministic, 343
  nondeterministic, 343
decides, 252, 253
definition
  by induction, 24
denumerable, 12
depth, 6
  transitive, 6
  unary, 3
representation
  of a partial function, 246
  unary, 246
representation assumption, 260
representative, 7
restriction
  simplicity, 166
reversal, 33
rewriting systems, 201
Rice, 197
Rice’s Theorem, 289
root, 5, 135
rooted tree, 5
running time, 333
runs in space, 333
runs in time, 333
Russell’s paradox, 15
satisfiability problem, 350
satisfiable, 349
semilinear set, 122, 196
sentence, 309
sentential form, 126
sentential forms, 271
separates, 320
set, 1
  elements, 1
simple, 324
simulated annealing, 372
simulating a 3-tape machine, 230
sink, 83
source, 82
space complexity, 333
space constructable, 378
SPDA, 166
stack, 159
star height, 65
start symbol, 41
starting point, 5
state, 46
  accepting halt, 206
  accessible, 106
  diagram, 49
  equivalence, 111
  final, 46
  halt, 206
  inaccessible, 106
  initial, 45, 46, 206
  of no return, 51
  rejecting halt, 206
  start, 206
  useless, 318
states, 45, 206
string, 32
  square-free, 123
subset, 2
substitution, 121, 194
substring, 33
suffix, 33
symbol, 32
  nullable, 146
  useless, 144
terminals, 41
terms, 308
theorem
  Blum’s speed-up, 378
  Borodin’s gap, 378
  Cantor-Schröder-Bernstein, 10
  Chaitin’s, 382
  Chomsky-Schützenberger, 181, 195
  Cook-Levin, 356
  Gödel’s Incompleteness, 324
  hierarchy, 377, 378
  honesty, for space, 379
  Matiyasevich, 324
  Parikh, 181
  recursion, 323
  Rice’s, 318, 322
  Savitch’s, 338
  space hierarchy, 378
  substitution, 194
  tape compression, 339
  time hierarchy, 378
theory
  of N, 323
  propositional, 308
Thue system, 321
  word problem, 321
time complexity, 333, 342
time constructable, 378
TM, 206
  accepts, 233
  minimal, 323
  oracle, 324
  output language of, 251
  random access, 280
  rejects, 234
  semi-decides, 253
  two-dimensional, 243
  uses k cells, 292
total TM, 211
transcendental numbers, 11
transducers, 66
transition, 48, 56
  diagram, 49
  epsilon, 56
  generalized, 165
  matrix, 91
transitions
  NTM, 233
  TM, 206
tree, 5
  derivation, 133
  parse, 134
Turing machine, 206
  t(n) time, 333
  non-erasing, 242
  nondeterministic, 233
  offline, 276
  stay-put, 242
  write-once, 242
valid
  sentence, 310
valid computation, 242, 303
valid computation history, 297
valid proposition, 308
verifier, 348
  polynomial time, 348
vertex, 3
walk, 59
well-ordering principle, 26
while-program, 279
word, 32