CS701 Handouts PDF
Theory of Computation
Chapter  Subject                                    Lectures
1        The Church-Turing Thesis                   01 to 06
2        Decidability                               07 to 09
3        Reducibility                               10 to 14
4        Advanced Topics in Computability Theory    14 to 20
5        Time Complexity                            21 to 35
6        Space Complexity                           36 to 44
7        Computability                              44 to 45
Among the basic questions this course asks: can we identify computationally hard problems?
The abacus was the next device. The slide rule was another device, invented in the 1600s.
1. Tally sticks like the Lebombo bone are the earliest computing devices. The earliest known dates to about 35,000 BC.
2. Clay tablets were then used to store the count of livestock etc.
3. The abacus is known to have existed between 1000 BC and 500 BC in Mesopotamia and China.
4. Around 1620 John Napier introduced the concept of logarithms. The slide rule was invented shortly afterwards.
More than physical devices, we are interested in the history of concepts in computing. We are interested in very basic questions, so let us ask the following more important question: when was the first algorithm devised? This is a very difficult question, since algorithms were not studied the way we study them today.
Earliest Algorithms
Perhaps the first algorithm devised was unary addition, thousands of years ago in some cave. Lebombo-bone-type devices were used to execute that algorithm. The concept of an algorithm was very well developed in Euclid's school. Two striking examples are:
The Euclidean Algorithm to compute the GCD. It is a non-trivial algorithm, in the sense that it is not at all clear why it computes the GCD. The constructions by ruler and compass are examples of precise algorithms: the elementary steps are well defined and the tasks are well defined.
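As an aside, the Euclidean Algorithm is short enough to state as modern code. Here is a sketch in Python (the function name is mine):

    def gcd(a, b):
        """The Euclidean Algorithm: repeatedly replace the pair (a, b)
        by (b, a mod b) until the remainder is zero."""
        while b:
            a, b = b, a % b
        return a

    print(gcd(252, 105))  # 21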
Consider the following equations:
• x^2 + y^2 = 0
• x^2 + y^2 = z^2
• ax + by + z^3 + 17d^3 = 14567
These are called diophantine equations. What we want to do is to ask if there is a solution in positive integers. For example x^2 + y^2 = z^2 has the solution x = 3, y = 4 and z = 5, so this equation has a solution. On the other hand x^2 + x + 2y = 39 does not have a solution, since the left-hand side is always even. So the question is:
Given a diophantine equation does it have a solution?
Hilbert's Tenth Problem was, in modern terms, the following problem in computer science: can we write a program that will take a diophantine equation as an input and tell us if the equation has a solution in positive integers? He was asking for an algorithm.
A solution to Hilbert's tenth problem would mean that mathematicians could solve diophantine equations mechanically. It took almost 70 years to settle this question. Eventually Yuri Matiyasevich was able to solve Hilbert's Tenth Problem. What was Matiyasevich's solution? Did he invent an algorithm? No.
Matiyasevich proved that Hilbert's 10th problem is unsolvable: there is no algorithm that can tell if a diophantine equation has solutions in positive integers. Note that understanding what he did is not easy. Matiyasevich did not say that he was not smart enough to find a solution; he proved that no such algorithm exists. He was showing the impossibility of the existence of an algorithm.
Impossibility Proofs
Impossibility proofs are not uncommon in mathematics. Let us look at some of them:
• It is impossible to write √2 as a ratio of two integers.
• It is impossible to trisect an angle with a straight edge and compass.
• It is impossible to cut a regular tetrahedron and put the pieces together into a cube of equal volume. (Here the cuts are ones that can be given by a hyperplane.)
In order to prove that an algorithm cannot exist we must have an exact and mathematically precise definition of an algorithm. This is what is needed in order to show that an algorithm does not exist. Once a precise definition of an algorithm is given, we can argue mathematically about algorithms.
We will work with Turing's model. Turing came up with the idea of computing machines that we now call Turing machines. He proposed that every computation that we can perform mechanically can be performed by a Turing machine. The good thing about Turing machines is this:
Turing machines are not an intuitive notion like that of an algorithm; they are mathematically defined objects. Thus one can prove things about them. One can prove their existence and also their non-existence. One can study them with the tools of mathematics and make precise claims about them.
Halting Problem
Turing proved that no Turing machine can solve the Halting Problem: given a machine and an input, decide whether the machine halts on that input. This was a breakthrough. It showed that there are mathematically defined, precise problems that have no algorithmic solutions, thus showing us that some problems cannot be computed!
Computability Theory
Computability Theory will deal with the following questions:
• What is a computation? Can we come up with a precise definition of an algorithm or a computation? (The answer is yes.)
• Can everything be computed? (The answer is no.)
• Can we characterize problems that cannot be computed? (The answer is that it is hard to do in general, but in certain cases we can identify such problems and prove that they are not computable.)
We will study computable and uncomputable problems in various different areas. We will study computability and logic. We will prove Gödel's incompleteness theorem:
• In any (axiomatic) system that includes arithmetic, there are always statements that are true but not provable.
This has had a profound impact on mathematics. We will prove this theorem in our course using tools from computability theory. We will discuss it in detail when the time comes.
Complexity Theory
Complexity theory is much more practical than computability theory. The basic idea is that now we not only want an algorithm but an efficient algorithm to solve problems.
One of the most important concepts in Complexity Theory is that of NP-completeness. In 1971 Cook, and independently Levin, proved a very important theorem in the following papers:
• Stephen Cook (1971). "The Complexity of Theorem Proving Procedures". Proceedings of the Third Annual ACM Symposium on Theory of Computing, 151–158.
• Leonid Levin (1973). "Universal'nye perebornye zadachi". Problemy Peredachi Informatsii 9 (3): 65. English translation, "Universal Search Problems", in B. A. Trakhtenbrot (1984), "A Survey of Russian Approaches to Perebor (Brute-Force Searches) Algorithms". Annals of the History of Computing 6 (4): 384–400.
This theorem showed that if one given problem (called Satisfiability) could be solved efficiently then many others could also be solved efficiently. In fact, a whole infinite class of problems could then be solved efficiently. This is very strong evidence that Satisfiability is a hard problem to solve. We are still looking for a proof!
In complexity theory we will talk about the celebrated P vs NP question, which has a $1,000,000 award on it. We will see why the amount stated is not enough and why this question has far-reaching consequences in computer science.
The string that contains no symbols is denoted by ε. In some books it is also denoted by λ. Notice that the length of this string is 0. It is called the empty string. Let us look at some strings.
1. 01011 is a string of length 5 over ∑ = {0, 1}.
2. ε is a string of length 0 over ∑ = {0, 1}.
3. 10xx is a string of length 4 over ∑2 = {0, 1, x, y}.
4. abpqrsxy is a string of length 8 over ∑3 = {a, b, · · · , z}.
5. ε is a string of length 0 over ∑3 = {a, b, · · · , z}.
Remember that all strings have finite length.
Let x be a string. We let x^k denote the concatenation of x with itself k times.
1. (01011)^3 = 010110101101011.
2. ε^5 = ε.
3. 1^10 = 1111111111.
4. (abc)^4 = abcabcabcabc.
5. Note that |x^k| = k|x|.
Let ∑ be an alphabet. Then ∑^k is the set of all strings of length k over the alphabet ∑.
If ∑ = {a, b, c, . . . , z} then ∑^7 is the set of all possible strings of length 7 that I can make out of these characters. There are 26^7 of them. That is a lot of strings!
Let ∑* = ∑^0 ∪ ∑^1 ∪ ∑^2 ∪ ∑^3 ∪ · · ·; that is, ∑* is the union of ∑^k over all k ≥ 0.
Thus ∑* is the set of all the strings over the alphabet ∑. So, for example, if ∑ = {0, 1} then {0, 1}* denotes all the strings over the alphabet {0, 1}. Examples of strings in {0, 1}* are:
1. ε
2. 101000
3. 11
4. 101001001001001010011011010011101101001001000100001
The lexicographical ordering of strings in ∑* is the same as the dictionary ordering, except the
shorter strings precede the longer strings. Thus to write the strings of {0, 1}* in lexicographical
order we first write all strings of length 0 in dictionary order, then all strings of length 1 in
dictionary order and so on.....
Thus we get ε, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, 101, 110, 111, 0000, . . .
There is a very neat way to write the strings of {0, 1}* in lexicographical order. It is very easy for computer scientists to understand, as they know how to write numbers in binary.
First write 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, · · · in binary to get 1, 10, 11, 100, 101, 110, 111, 1000, 1001, 1010, 1011, · · · and now throw away the starting 1 from each string to get the lexicographical ordering as follows: ε, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, · · ·
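Here is a small Python sketch of this binary trick (the function name is mine): write n in binary and drop the leading 1.

    def lex_strings(count):
        """The first `count` strings of {0,1}* in lexicographic order,
        obtained by writing 1, 2, 3, ... in binary and dropping the leading 1."""
        return [bin(n)[3:] for n in range(1, count + 1)]  # bin(n) is '0b1...'

    print(lex_strings(8))  # ['', '0', '1', '00', '01', '10', '11', '000']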
A language L is a subset of ∑*.
1. L1 = {ε, 00, 10, 11, 011, 10001}. L1 is a finite language.
2. L2 = {x ∈ {0, 1}* : x starts with a 0}.
3. L3 = {x ∈ {0, 1}* : x ends with a 1}.
Note that
1. 10001 ∈ L1 but 000 ∉ L1.
2. 0, 001, 01010, 010101 ∈ L2 and 10, 11, 111, 100 ∉ L2.
3. 1, 111, 001 are strings in L3 and 010, 100 ∉ L3.
4. 011 is a string that is present in all three languages.
Let us look at a complicated language over {0, 1}*:
Lp = {x : x represents a prime number written in binary}.
1. 11 ∈ Lp (11 is 3 in binary).
2. 101 ∈ Lp (101 is 5).
3. 10001 ∈ Lp (10001 is 17).
4. 10 ∈ Lp (10 is 2).
It is not easy to tell if a string is in Lp. Given a string, you have to check whether it represents a prime; only then do you find out whether it is in Lp.
Consider, for example, the language Lp, the set of all strings representing primes in binary. It does not seem to be an easy task to tell if a number is prime. So, it would be nice to have a machine that can tell us if a number is prime. The machine can have some input device so that we can enter a string x and ask it if x is a prime number (written in binary).
For now we want to concentrate on machines that only make decisions for us. The machine will have an input device on which we will provide the input. The machine will perform a computation and then tell us if the input belongs to a language or not.
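To make the discussion concrete, here is a rough Python stand-in for such a machine; trial division is only one possible way to check primality, and the function name is mine.

    def in_Lp(x):
        """Does the 0/1 string x spell a prime in binary?"""
        n = int(x, 2)                  # decode the binary string
        if n < 2:
            return False
        return all(n % d for d in range(2, int(n ** 0.5) + 1))

    print(in_Lp('10001'))  # True: 10001 is 17 in binary, and 17 is prime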
Let us start informally and then come to the formal definition of a Turing machine. A Turing machine has an input tape which is infinite; hence, the machine has unlimited memory. It has a read/write tape head which can move left and right and change symbols while the machine is performing its computation. The tape initially is blank everywhere. We write an input on the tape and turn the machine on by putting it in a start state.
The machine starts a computation and (may) eventually produce an output. The output can be either accept or reject. The machine produces its output by going into one of two designated states.
Turing Machines:
Formally a Turing machine M is a 7-tuple, (Q, ∑, Г, δ, q0, qa, qr), where Q, ∑, Г are all finite sets.
1. Q is the set of states.
2. ∑ is the input alphabet, not containing the special blank symbol □.
3. Г is the tape or work alphabet, with ∑ ⊆ Г and □ ∈ Г.
4. δ : Q × Г → Q × Г × {L, R} is the transition function.
5. q0 is the start state.
6. qa is the accept state.
7. qr is the reject state.
How does a Turing machine work? We have to understand how it takes a single step. The transition function δ defines how it takes a single step. The Turing machine looks at the current symbol under its tape head and the state that it is in. Depending on these, it decides the following three things:
1. What will be its next state?
2. What will it replace the current symbol with?
3. Will it move its head left or right?
This is why the transition function has Q × Г as its domain. It tells what the machine should do in all possible situations. The range of the transition function is Q × Г × {L, R}. Therefore, it specifies all three decisions that the machine has to make.
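In modern terms, δ is just a finite lookup table. The following Python sketch (the names and representation are mine; the sample entries happen to match the simple machine M1 described next, with '_' standing for the blank) shows how one step is taken:

    # Domain Q x Gamma, range Q x Gamma x {L, R}, stored as a dictionary.
    delta = {
        ('q0', '0'): ('qa', '0', 'R'),
        ('q0', '1'): ('qr', '1', 'R'),
        ('q0', '_'): ('qr', '_', 'R'),
    }

    def step(state, tape, head):
        """One step: read the symbol under the head, look up delta,
        write, change state, and move (never off the left end)."""
        symbol = tape[head] if head < len(tape) else '_'
        new_state, write, move = delta[(state, symbol)]
        if head < len(tape):
            tape = tape[:head] + write + tape[head + 1:]
        else:
            tape = tape + write
        head = head + 1 if move == 'R' else max(head - 1, 0)
        return new_state, tape, head

    print(step('q0', '011', 0))  # ('qa', '011', 1)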
Let us look at a very simple Turing Machine M1. In order to describe it I will need to specify all seven things.
1. The machine has three states; thus, Q = {q0, qa, qr}.
2. The input alphabet is {0, 1}.
3. The work alphabet is {0, 1, □}.
4. The start state is q0.
5. The accept state is qa.
6. The reject state is qr.
7. The transition function δ1 is as follows:
δ1(q0, 0) = (qa, 0, R)
δ1(q0, 1) = (qr, 1, R)
δ1(q0, □) = (qr, □, R)
Why have we not specified many entries? That is because they are irrelevant. When the machine enters qa it accepts the input. When the machine enters qr it rejects the input. Hence from these two states no further computation happens.
This Turing machine accepts all the strings that start with a 0, since from the start state it goes directly to either the accept state or the reject state. Hence it only reads the first symbol of the input before accepting or rejecting.
Let us look at another Turing machine, M2.
1. The machine has five states; thus, Q = {q0, q1, q2, qa, qr}.
2. The input alphabet is {0, 1}.
3. The work alphabet is {0, 1, □}.
4. The start state is q0.
5. The accept state is qa.
6. The reject state is qr.
7. The transition function δ2 is as follows:
δ2(q0, 0) = (q1, 0, R)
δ2(q0, 1) = (q1, 1, R)
δ2(q0, □) = (qr, □, R)
δ2(q1, 0) = (q1, 0, R)
δ2(q1, 1) = (q1, 1, R)
δ2(q1, □) = (q2, □, L)
δ2(q2, 0) = (qr, 0, R)
δ2(q2, 1) = (qa, 1, R)
δ2(q2, □) = (qr, □, R)
1. Suppose the input is ε. The machine starts in q0 and reads the first symbol on the tape, which is a blank. Since δ2(q0, □) = (qr, □, R), it goes into state qr and rejects. So the machine rejects the input ε.
2. Suppose the input is 0. The machine starts in q0 and reads the first symbol, which is 0. Since δ2(q0, 0) = (q1, 0, R), it goes into state q1 and replaces the current symbol with 0 (so does not change it). Now the machine is in state q1 and next reads the blank symbol. Since δ2(q1, □) = (q2, □, L), it goes into state q2 and moves left. Now it is reading 0 again but is in state q2. Since δ2(q2, 0) = (qr, 0, R), the machine rejects this input.
3. Suppose the input is 1. The machine starts in q0 and reads the first symbol, which is 1. Since δ2(q0, 1) = (q1, 1, R), it goes into state q1 and replaces the current symbol with 1 (so does not change it). Now the machine is in state q1 and next reads the blank symbol. Since δ2(q1, □) = (q2, □, L), it goes into state q2 and moves left. Now it is reading 1 again but is in state q2. Since δ2(q2, 1) = (qa, 1, R), the machine accepts this input.
This is very tedious. So let us work out a better way of following a computation of a Turing machine.
Configurations:
What will happen if we give 00101 as an input to M2? We can represent its initial configuration by q000101. This means the machine is in the state q0, reading the symbol immediately to the right of the state symbol; the tape contents are 00101. Next it goes into state q1 and moves right; we can represent this by 0q10101.
This means it is in state q1 reading a 0. The tape contents can easily be obtained by ignoring the state in the configuration. Again, since δ2(q1, 0) = (q1, 0, R), the head moves to the right. In fact we realize that the tape head will keep moving to the right, without changing the contents, till it reaches a blank. This is because δ2(q1, 0) = (q1, 0, R) and δ2(q1, 1) = (q1, 1, R). Hence the machine will eventually reach the configuration 00101q1; at this point the tape head is reading a blank. Now, if we look up the transition function, we observe that δ2(q1, □) = (q2, □, L). So the head has to move to the left. Hence the machine goes into the configuration 0010q21, and since δ2(q2, 1) = (qa, 1, R) the machine eventually reaches 00101qa and accepts this string.
Can you tell which strings will be accepted by this Turing machine? Yes, it accepts all strings that end with a 1. We can see this by examining the transition function again.
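Since δ2 is now completely known, the whole computation can be checked mechanically. The following self-contained Python sketch (all names are mine; '_' stands for the blank symbol) prints the configurations of M2 on 00101:

    def run(delta, start, accept, reject, w, max_steps=100):
        """Simulate a single-tape TM, printing each configuration in uqv form."""
        tape = list(w) if w else ['_']
        head, state = 0, start
        for _ in range(max_steps):
            print(''.join(tape[:head]) + state + ''.join(tape[head:]))
            if state == accept:
                return True
            if state == reject:
                return False
            state, write, move = delta[(state, tape[head])]
            tape[head] = write
            head = head + 1 if move == 'R' else max(head - 1, 0)
            if head == len(tape):
                tape.append('_')        # the tape is blank beyond the input
        raise RuntimeError('did not halt within max_steps')

    delta2 = {('q0', '0'): ('q1', '0', 'R'), ('q0', '1'): ('q1', '1', 'R'),
              ('q0', '_'): ('qr', '_', 'R'),
              ('q1', '0'): ('q1', '0', 'R'), ('q1', '1'): ('q1', '1', 'R'),
              ('q1', '_'): ('q2', '_', 'L'),
              ('q2', '0'): ('qr', '0', 'R'), ('q2', '1'): ('qa', '1', 'R'),
              ('q2', '_'): ('qr', '_', 'R')}
    print(run(delta2, 'q0', 'qa', 'qr', '00101'))  # q000101 ... 00101qa, True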
Let us once again make sure we understand what a Turing machine is. Formally a Turing machine M is a 7-tuple, (Q, ∑, Г, δ, q0, qa, qr), where Q, ∑, Г are all finite sets.
1. Q is the set of states.
2. ∑ is the input alphabet, not containing the special blank symbol □.
3. Г is the tape or work alphabet. ∑ ⊆ Г and □ ∈ Г.
4. δ : Q × Г → Q × Г × {L, R} is the transition function.
5. q0 is the start state.
6. qa is the accept state.
7. qr is the reject state.
Now consider a machine M with two working states, q0 and q1, and the following transition function:
δ(q0, 0) = (q0, 0, R)
δ(q0, 1) = (q1, 1, R)
δ(q0, □) = (qr, □, L)
δ(q1, 0) = (q0, 0, R)
δ(q1, 1) = (q1, 1, R)
δ(q1, □) = (qa, □, L)
Once again we have not specified what the machine does when it reaches qa or qr, since the machine halts when it reaches the accept or the reject state. Let us see what this machine does.
Let us intuitively see what each state is doing. Look at q0. When the machine is in q0, that represents the fact that the last scanned symbol was not a 1. Note that δ(q0, 0) = (q0, 0, R), so if the machine reads a 0 it simply stays in the state q0 and moves right (without changing the input). However, when the machine sees a 1 it changes its state to q1, as δ(q0, 1) = (q1, 1, R).
When the machine is in the state q1, that represents the fact that the last scanned symbol was a 1. Thus if the machine is in the state q1 and sees a 1 it stays in q1, and if it sees a 0 it goes back to q0: δ(q1, 0) = (q0, 0, R) and δ(q1, 1) = (q1, 1, R).
To make a state diagram follow these rules:
1. Each state is represented by a node in the state diagram.
2. Clearly mark the initial state, the accept state and the reject state. In this course we will always mark the accept state with qa and the reject state with qr.
3. For each state (other than the accept and reject states) there should be as many arrows coming out of the state as there are symbols in the work alphabet.
4. Each arrow should be labeled by a replace symbol and a move.
A setting of all these things is called a configuration. Hence a configuration tells us what state the machine is in, what is currently written on the tape, and where exactly the tape head is.
We will represent a configuration in a very special way. Suppose Г is the tape alphabet. Let u, v ∈ Г* and let q be a state of the Turing machine. Then the configuration uqv denotes:
1. The current state is q.
2. The tape contents are uv.
3. The head is on the first symbol of v.
It is assumed that the tape contents are all blanks after the last symbol of v. Let us look at some examples. Suppose the machine is in state q7, the tape contents are 101101111, and the tape head is currently reading the fifth symbol, which is a 0. In this case the first four symbols constitute u and the last five symbols constitute v, so this configuration will be written as 1011q701111.
Can you draw a picture for the configuration 0abbaq3bba01? The tape contents are easily recovered by ignoring the state in the configuration. So the tape contents are 0abbabba01, the machine is in state q3, and it is reading the 6th symbol on the tape.
What if the configuration is uq, with v = ε? What does this mean? In this case the machine is reading a blank symbol, the one just after the last symbol of u.
What if the head is at one of the ends of the configuration? Let us discuss what happens if the head is at the left end.
• qibv yields qjcv if δ(qi, b) = (qj, c, L)
• qibv yields cqjv if δ(qi, b) = (qj, c, R)
In this case we are agreeing that if the machine is at the leftmost cell it will not move its head further to the left, even if the delta function requires it to do so.
Have we discussed all the possibilities? One annoying one is left. Consider a machine in a configuration qi, with both u and v empty. Is this a configuration? Yes: the machine is in state qi, the tape is completely blank, and the machine is reading the leftmost cell.
• qi yields qjc if δ(qi, □) = (qj, c, L)
• qi yields cqj if δ(qi, □) = (qj, c, R)
Summary:
• uaqibv yields uqjacv if δ(qi, b) = (qj, c, L)
• uaqibv yields uacqjv if δ(qi, b) = (qj, c, R)
Let us take a state diagram and see what the yields relation gives us if we start the machine with input 1001, looking at the state diagram of this TM again.
The start configuration of M on input w is q0w, where q0 is the start state. Notice that the head is reading the leftmost symbol.
Does a TM only accept or reject inputs? No; it may also loop. Let's consider M', obtained from M by replacing the rejecting move δ(q0, □) = (qr, □, L) with δ'(q0, □) = (q0, □, R), so that instead of rejecting it keeps moving right forever. On input 00:
q000 ⊢M' 0q00 ⊢M' 00q0 ⊢M' 00□q0 ⊢M' 00□□q0 ⊢M' · · ·
What is the language accepted by M'? By definition it is the set of all strings accepted by M'. Hence,
L(M') = {x ∈ {0, 1}* : x ends with a 1}.
We have
• L(M) = {x ∈ {0, 1}* : x ends with a 1}
• L(M') = {x ∈ {0, 1}* : x ends with a 1}
Both accept the same language. However, the first machine M will reject all inputs which do not end with a 1, whereas M' will go into an infinite loop on all such inputs. We prefer M to M'.
We say that a Turing machine is a decider if it halts on all inputs in ∑*. Hence M is a decider and M' is not a decider.
L0 is the given problem and M is the algorithm that solves or decides the problem. Let us look at an example of this.
Let ∑ = {0} and L2 = {0^(2^n) : n ≥ 0}. We want to design a TM M2 such that L(M2) = L2. Hence we want a TM M2 that accepts all the inputs in L2. In order to solve this problem we must first understand the language L2:
L2 = {0, 00, 0000, 00000000, 0000000000000000, . . .}
To design a TM we must have a plan in mind. How will we solve this problem? Note that a TM can only have a finite number of states and therefore it cannot remember much. It does not even know how to count (unless you teach it).
If I have some number of 0's and I cross off each alternate one, I cut the number of 0's in half. Why would this work? If the number of 0's is 16, it will be cut to 8, then 4, then 2, then 1, and then we will accept. On the other hand, if the number of 0's is, say, 20, it will be cut to 10, then 5, and then we will reject, since 5 is odd and bigger than 1.
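The plan can be tested before building the TM. Here is a Python sketch of the halving idea (the function name is mine):

    def is_power_of_two_zeros(w):
        """Decide {0^(2^n) : n >= 0} by the TM's idea: crossing off every
        other 0 halves the count, which must stay even until one 0 is left."""
        if w == '' or set(w) != {'0'}:
            return False
        n = len(w)
        while n > 1:
            if n % 2 == 1:   # an odd count bigger than one: reject
                return False
            n //= 2
        return True          # exactly one 0 left: the length was a power of 2

    print(is_power_of_two_zeros('0' * 16), is_power_of_two_zeros('0' * 20))
    # True False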
Consider the language L = {w1#w2 : w1, w2 ∈ {0, 1}* and w1 = w2}.
1. The machine will first scan the input to check that it contains exactly one #; if not, it rejects.
2. Then the machine will repeatedly scan the input and match each character of w1 with one character of w2.
We define our TM M:
1. Q = {q1, . . . , q14, qa, qr}
2. ∑ = {0, 1, #}
3. Г = {0, 1, #, x, □}
In the state diagram we have not shown qr; all transitions not defined there go to qr.
Let us look at the computation of this TM on the input 011#011. As you can see, in the first phase the machine has crossed off the leading 0 and converted the input into □11#011. Now it is looking for a 0 in the second string. When it finds the 0 it crosses it off (□11#x11) and starts all over again. Next it crosses off the 1 of the first string and looks for a 1 in the second string: □x1#x11. It finds the 1 after the # and crosses it off: □x1#xx1. Now it goes back and similarly matches the last 1 to finally reach □xx#xxx. Then it goes over the tape once, realizes that everything has been matched, and accepts this input.
Let us design a TM for the language {a^i b^j c^k : i × j = k and i, j, k ≥ 1}. First, all the strings in this language have a's, then b's, then c's. Some strings in the language are: abc, aabbcccc, aaabbcccccc. Some strings not in the language are: ab, aabb, aabbccc, ba, cab.
On input w:
1. Scan the input and make sure that w is of the form a*b*c*; if it is not, reject.
2. Cross off an a and scan to the right till a b occurs. Now shuttle between b's and c's, crossing off one b and one c at a time, till all b's are crossed off. If you run out of c's, reject.
3. If all a's are crossed off, check whether all c's are also crossed off. In that case accept.
4. If there are more a's left, restore all the b's and repeat.
On input aabbbcccccc we first check that the string is of the form a*b*c*, and it is. So in the next state we cross off an a: ẋabbbcccccc (the leftmost crossed symbol gets the special mark ẋ). Since there are three b's, we shuttle between b's and c's crossing them off, to get ẋaxxxxxxccc. There is still an a, so we restore all the b's to get ẋabbbxxxccc. Now we cross off another a to get ẋxbbbxxxccc, and shuttle between b's and c's crossing them off to get ẋxxxxxxxxxx. Now check if there are any a's left. No, so we accept.
One question: why is the leftmost crossed symbol ẋ rather than a plain x? The answer is that this way the machine will know when it has reached the left end of the tape.
This is a high-level description of a TM. It is your homework to convert this high-level description into a formal description. In your homework you have to specify the seven things: the set of states; the start, accept and reject states; the input alphabet; the work alphabet; and a transition function (for which you can draw a state diagram).
Consider the element distinctness language E = {#x1#x2 · · · #xl : each xi ∈ {0, 1}* and xi ≠ xj for each i ≠ j}.
On input w:
1. Place a mark on top of the leftmost tape symbol. If that symbol is not a #, reject.
2. Scan right to the next # and place a second mark on top of it. If no # is encountered before a blank, accept: the input contained at most one string.
3. By zig-zagging, compare the two strings to the right of the marked #'s. If they are equal, reject.
4. Move the rightmost of the two marks to the next # symbol. If no # is encountered, move the leftmost mark to the next # and the rightmost mark to the # after that. If no such # is available, all strings have been compared, so accept.
5. Go to step 3.
Suppose the input is #0#10#11#000. We start by marking the first # sign; this means the first string, 0, has to be compared with all the other strings: #˙0#˙10#11#000. It is first compared with the second string; then #˙0#10#˙11#000; then #˙0#10#11#˙000; and then there are no more # signs. So we move on to 10: #0#˙10#˙11#000, then #0#˙10#11#˙000, then #0#10#˙11#˙000, and after comparing these we are done.
For i = 1, 2, . . . , l
  For j = i + 1, . . . , l
    Compare xi and xj. If they are equal, reject.
Accept.
From now on we will only give informal descriptions of TMs. But you should be able to convert any informal description into a formal one. Most interesting TMs have several hundred or even thousands of states. However, it will always be clear that an informal description can be changed into a formal one.
We will show that multi-tape TMs are not more powerful than single tape TMs.
A single tape TM can actually “simulate” a multi-tape TM. Let us look at a 3-tape TM.
The configuration of this TM is described as follows: q3#aa˙aab#baa˙ab#a˙aaab
The main point is that this configuration can be written on a single tape.
The method for writing a configuration on a single tape is:
1. Write the state first followed by #.
2. Write the contents of the first tape followed by a #.
3. Write the contents of the second tape followed by a #.
4. Write the contents of the third tape followed by a #.
5. Place dots on all symbols that are currently being scanned by the TM.
Suppose that I have only a single tape. I want to know what a multi-tape TM will do on input w. I can write q0#w′#□˙#□˙, which denotes the initial configuration of M on w.
Here w′ is the same string as w except that the first character has a dot placed on top of it. Now I can keep scanning the entire tape to find out which symbols are being read by M. In a second pass I can make all the modifications that M would have made in one step. Given a k-tape TM M we describe a single tape TM S. S is going to simulate M by using only one tape.
Description of S:
On input w = w1 . . . wn:
1. Put the tape into the format that represents the initial configuration of M; that is,
#q0w1˙w2 . . . wn#□˙#□˙ . . . #□˙#
2. To simulate a single move of M, S scans its tape from left to right in order to determine the symbols that are under its "virtual heads". Then it makes a second pass to update the contents of the tapes according to the transition function of M.
Suppose we have a three-tape Turing machine and we give it the input aabb. The machine is in its initial configuration. This configuration will be written on the single tape of S as #q0a˙abb#□˙#□˙#. Now suppose δ(q0, a, □, □) = (q2, b, c, a, R, R, R). That means the next configuration will be written as #q2ba˙bb#c□˙#a□˙#, which represents the corresponding three-tape configuration.
Now, if δ(q2, a, □, □) = (q3, c, b, b, R, L, R) then #q3bcb˙b#c˙b#ab□˙# represents the next configuration.
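The encoding itself is easy to mechanize. Here is a Python sketch (names are mine; '_' stands for the blank, and the dot marker is written after the scanned symbol rather than above it):

    def encode(state, tapes, heads):
        """Write a k-tape configuration on one tape: '#', the state, then
        each tape with '.' after its scanned symbol, tapes separated by '#'."""
        parts = []
        for tape, head in zip(tapes, heads):
            tape = tape or '_'              # an empty tape shows one blank
            parts.append(tape[:head + 1] + '.' + tape[head + 1:])
        return '#' + state + '#'.join(parts) + '#'

    # The 3-tape machine on input aabb, all heads at the leftmost cell:
    print(encode('q0', ['aabb', '', ''], [0, 0, 0]))  # #q0a.abb#_.#_.#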
Theorem: Every k-head TM has an equivalent single tape TM.
The proof of this theorem is almost a copy of the proof of the previous theorem. However, there are some subtle differences. You should write the proof out in detail to understand these differences.
Can we use this theorem to make our lives simpler? Yes. Suppose someone asks you whether the language L = {w#w : w ∈ {0, 1}*} is decidable. It is much simpler to design a two-tape TM for this language.
Note that this is a much more natural solution than matching symbols by shuttling over the input. From now on we will mostly describe multi-tape TMs. By this theorem we can always find a one-tape TM that is equivalent to our multi-tape TM. This theorem is going to be extremely useful.
Enumerators:
Let us now look at another model of computation, called enumerators. Enumerators are devices that do not take any input and only produce output. L(E), the language enumerated by E, is the set of all strings that E outputs.
Given an enumerator E, can we make a TM M such that L(M) = L(E); that is, the language accepted by M is the same as the one enumerated by E? This is easy: M simply has E encoded in it. On input x it runs E and waits for x to appear among the outputs. If it appears, M accepts.
The other direction is the tricky one: given a TM M, build an enumerator for L(M). A naive enumerator that simulates M to completion on one string at a time does not work, since if M loops forever on 00 it will never output 01!
L(E), the language enumerated by E, is the set of all strings that it outputs. Last time we outlined a proof of the following theorem: a language is Turing recognizable if and only if some enumerator enumerates it.
One side was easy. Let L be a language enumerated by an enumerator E. Then L is Turing recognizable. To prove this we are given that an enumerator E enumerates L. We have to show that some Turing machine M accepts L. The informal description of the Turing machine is as follows:
1. Input x.
2. Run the enumerator E. Each time it outputs a string w, check if w = x. If w = x, go to the accept state. Otherwise, continue.
The other side is a bit tricky, but let us look at the details. If L is Turing recognizable then there is an enumerator E that enumerates L. Since L is Turing recognizable, some TM M accepts L. We can use M as a subroutine in our enumerator E. Here is our first attempt.
Recall that we know how to generate strings in lexicographical order: ε, 0, 1, 00, 01, 10, . . . Call them x1, x2, x3, . . .
1. For i = 1, 2, 3, . . .
2. Simulate the machine M on xi.
3. If M accepts xi, output xi.
Let's say:
1. M accepts ε.
2. M rejects 0.
3. M accepts 1.
4. M loops forever on 00.
5. M accepts 01.
Then the enumerator will never output 01, whereas 01 ∈ L. So we have to get around this problem.
Dovetailing:
We can use dovetailing to get over this problem. The idea is to use "time sharing" so that one bad string does not take up all the time. The new enumerator works as follows: for i = 1, 2, 3, . . ., simulate M for i steps on each of x1, . . . , xi, and output every string that M has accepted.
Suppose xt ∈ L. Then M accepts xt, and it must accept xt in some finite number of steps; say this number is k. Then we claim that when i = max(k, t) the string xt will be output. That is because M is simulated on x1, . . . , xi for i steps each. Since i ≥ t, we have xt ∈ {x1, . . . , xi}, and since i ≥ k, the simulation of M on xt runs long enough for M to accept.
Hence all strings that are in L are enumerated by this enumerator. To see that if x ∉ L then E will not enumerate it, we simply observe that E only enumerates strings that are accepted by M.
A minor point to notice is that E may enumerate a given string many, many times. If this bothers you, you can modify E so that it enumerates each string only once: E keeps a list of all the strings it has enumerated and checks the list before enumerating any string, to see whether it has been enumerated before. Exercise: show that if L is Turing recognizable then there is an enumerator E that enumerates L such that each string of L is enumerated by E exactly once.
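Here is a Python sketch of the dovetailing enumerator. The helper steps(x, i) is a stand-in for "simulate M on x for i steps and report whether it has accepted"; all names, and the toy recognizer, are mine.

    def lex_strings(limit):
        """The first `limit` strings of {0,1}* in lexicographic order."""
        return [bin(n)[3:] for n in range(1, limit + 1)]

    def enumerate_language(steps, max_i=8):
        """Round i simulates the recognizer for i steps on x1, ..., xi.
        A real enumerator runs forever; max_i only bounds this demo.
        Note that strings may be printed many times across rounds."""
        for i in range(1, max_i + 1):
            for x in lex_strings(i):
                if steps(x, i):
                    print(i, repr(x))

    # Toy stand-in: "M" accepts even-length strings, taking 2*len(x)+1 steps,
    # so short simulations miss longer strings in the early rounds.
    enumerate_language(lambda x, i: len(x) % 2 == 0 and i > 2 * len(x))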
Definition of an algorithm:
Church-Turing Thesis:
The Church-Turing thesis states that the intuitive notion of an algorithm is equivalent to the mathematical concept of a Turing machine. It gives the formal definition needed for answering questions like Hilbert's tenth problem.
Encodings:
Note that we always take inputs to be strings over some alphabet ∑.
Suppose ∑ = {a, b, c, d}. We can map a → 00, b → 01, c → 10, d → 11 and thus encode strings over ∑ as strings over {0, 1}.
Now consider a problem about pairs of numbers, say L = {(i, j) : i^2 = j}. Then (1, 1), (2, 4), (3, 9) are in L and (1, 3), (2, 5), (3, 1) are not in L. L is a subset of N × N.
What we can do is to encode the pairs into strings. Let us say we reserve 11 to represent a 1, 00 to represent a 0, and 01 will be the separator. Then we can encode (2, 4) as 110001110000.
We represent the encoding of a pair (i, j) by <(i, j)>. Then we can solve the following problem: L′ = {<(i, j)> : i^2 = j}. This is a language over {0, 1}*.
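This encoding is easy to implement. A Python sketch (function names are mine):

    def encode_number(n):
        """Write n in binary, then double each bit: 1 -> '11', 0 -> '00'."""
        return ''.join('11' if b == '1' else '00' for b in bin(n)[2:])

    def encode_pair(i, j):
        """Encode (i, j) over {0,1}, with '01' as the separator."""
        return encode_number(i) + '01' + encode_number(j)

    print(encode_pair(2, 4))  # 110001110000, as in the text: 2 = 10, 4 = 100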
How can we encode a more complicated object, say a matrix, so that it becomes a language over {0, 1}*? Let's say we have the matrix
2 5
3 6
First we can write it linearly as 2, 5, 3, 6. Now we write all the numbers in binary, where 00 represents a 0, 11 represents a 1, and 01 represents the separator. So the matrix encodes as
11000111001101111101111100
A graph can be written as a vertex list followed by an edge list, as follows:
(1, 2, 3, 4)((1, 2), (2, 3), (3, 1), (1, 4))
We can then encode the punctuation and digits over {0, 1}:
1. 101 represents ).
2. 110 represents (.
3. 011 represents ,.
4. 000 represents 0.
5. 111 represents 1.
Now it is easy to encode this whole graph into a 0/1 string. Writing each vertex number in binary, the encoding starts as
110111011111000011111111011111000000101110110111011111000101011110 . . .
which is very long but not conceptually difficult.
Thus, suppose we have L = {G : G is a connected graph}. We can instead look at L′ = {<G> : G is a connected graph}. This should not be surprising to you. As computer scientists you know that everything inside a computer is represented by 0's and 1's.
Hilbert's Tenth Problem:
Recall from the first lecture: Hilbert's tenth problem was about diophantine equations. Let us cast it in modern language: devise an algorithm that would decide if a (multi-variate) polynomial has integral roots. Hilbert, in effect, asked us to study the language H = {<p> : p is a polynomial with integral roots}. Once again <·> means an encoding. Is there a Turing machine M that halts on all inputs with L(M) = H?
Matiyasevich's Theorem: H is not decidable. There is no Turing machine M that halts on all inputs with L(M) = H.
Let us look at a simpler language: H1 = {<p> : p is a polynomial in the single variable x and has an integral root}. Let's see that H1 is Turing recognizable (in fact, using bounds on the size of the roots, it is even decidable). Here is the description of a TM:
1. On input <p>.
2. For i = 0, 1, 2, . . .
3. Evaluate the polynomial at x = i and x = −i. If it evaluates to 0, accept.
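A Python stand-in for this machine (names are mine; the polynomial is passed as a callable, and a bound keeps the demo from searching forever, which the real recognizer would happily do):

    def has_integral_root(p, bound=10**6):
        """Recognizer sketch for H1: try x = 0, 1, -1, 2, -2, ..."""
        for i in range(bound):
            if p(i) == 0 or p(-i) == 0:
                return True
        return None   # demo gave up; the genuine machine would keep going

    print(has_integral_root(lambda x: x**2 - 9))  # True, since p(3) = 0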
In a formal description you have to specify the whole TM using the seven things needed. That is, you have to specify (Q, ∑, Г, δ, q0, qa, qr).
Implementation description: in this you refer to the tapes of the Turing machine and explain how it moves and performs its task. We looked at an example, L = {w#w : w ∈ {0, 1}*}.
In a high-level description we do not refer to the tapes. We simply describe the Turing machine as an algorithm; we simply give pseudo-code for the algorithm.
Suppose we wish to show that L = {<j> : i^2 = j for some i} is decidable. We are assuming that we can build a machine (subroutine) that can multiply numbers, check for equality, etc.
When you give a high-level description you must be convinced that you could convert it into a formal one in principle, although it would take a very long time to do so.
Decidability
Decidable Languages:
Let us recall that a language A is called decidable if there exists a TM M that halts on all inputs and L(M) = A:
1. If x ∈ A then M accepts x.
2. If x ∉ A then M rejects x.
The Acceptance Problem for DFAs:
You must have studied DFAs in your previous automata course. DFAs are simple machines with a read-only head. Let us look at the following problem:
ADFA = {<B, w> : B is a DFA that accepts the input string w}.
Note that the input will not be a DFA B itself but a suitable encoding of B. Consider two particular DFAs B1 and B2 (shown in a picture in the lecture). In this case:
1. <B1, 1> ∈ ADFA.
We are asking if we can make an algorithm (Turing machine) that will take as input a description of a DFA B and a string w ∈ {0, 1}* and tell us if B accepts w.
The answer is yes. We can devise such a Turing machine, since a Turing machine can "simulate the behavior of a DFA". Here is a high-level description of such a Turing machine MD:
1. On input <B, w>.
2. Simulate B on w.
3. If B ends in an accepting state then accept; if it ends in a non-accepting state then reject.
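The simulation in step 2 is completely mechanical. Here is a Python sketch (names and the toy DFA are mine):

    def dfa_accepts(delta, start, finals, w):
        """Simulate a DFA on w: the core of the decider for A_DFA.
        delta maps (state, symbol) -> state."""
        state = start
        for ch in w:
            state = delta[(state, ch)]
        return state in finals

    # A toy DFA accepting strings over {0,1} that end with 1.
    delta = {('s', '0'): 's', ('s', '1'): 't', ('t', '0'): 's', ('t', '1'): 't'}
    print(dfa_accepts(delta, 's', {'t'}, '0011'))  # True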
ANFA = {<B, w> : B is an NFA that accepts the input string w}.
We are asking if we can make an algorithm (Turing machine) that will take as input a description of an NFA B and a string w ∈ {0, 1}* and tell us if B accepts w.
There are two ways we can approach this problem. The first one is very similar to the previous one. We can show that a Turing machine can simply "simulate" an NFA: all it has to do is to keep track of the set of states that the NFA could be in.
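A Python sketch of this set-tracking simulation (names and the toy NFA are mine; ε-transitions are omitted for brevity and would require closing the state set after each step):

    def nfa_accepts(delta, start, finals, w):
        """Simulate an NFA by tracking the set of states it could be in.
        delta maps (state, symbol) -> set of states."""
        current = {start}
        for ch in w:
            current = set().union(*(delta.get((q, ch), set()) for q in current))
        return bool(current & finals)

    # Toy NFA for strings containing '11': guess where the pair starts.
    delta = {('a', '0'): {'a'}, ('a', '1'): {'a', 'b'},
             ('b', '1'): {'c'}, ('c', '0'): {'c'}, ('c', '1'): {'c'}}
    print(nfa_accepts(delta, 'a', {'c'}, '0110'))  # True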
However, there is another approach. We know that every NFA can be converted to an equivalent DFA. In fact, there is an algorithm that can convert any (description of an) NFA to a (description of an) equivalent DFA. So we have the following solution as well:
1. On input <B, w>, where B is an NFA and w is a string.
2. Convert B to an equivalent DFA C.
3. Run the TM MD on input <C, w>.
4. If MD accepts, accept. If it rejects, reject.
Regular expressions:
AREX = {<R, w> : R is a regular expression that generates the string w}.
1. <0*1*, 0011> ∈ AREX.
2. <0*1*, 10011> ∉ AREX.
3. <1(0 + 1)*, 10> ∈ AREX.
4. <1(0 + 1)*, 0011> ∉ AREX.
Is AREX decidable? Once again, think of the question intuitively. We are asking if we can make an algorithm (Turing machine) that will take as input a description of a regular expression R and a string w ∈ {0, 1}* and tell us if R generates w.
Once again, there is another approach. We know that every regular expression can be converted to an equivalent DFA. In fact, there is an algorithm that can convert any (suitable description of a) regular expression to a (description of an) equivalent DFA. So we can convert R to a DFA and run MD.
The Emptiness Problem for DFAs: EDFA = {<A> : A is a DFA and L(A) = ∅}.
The problem on the surface looks very difficult. We have to make sure that A does not accept any string. However, the number of potential strings is infinite. Thus it would be futile to simulate A on all the strings; the process would never end.
So we may start suspecting that this is perhaps a difficult question, and that designing a TM that tells us in finite time whether L(A) = ∅ may not be possible. However, a little thought shows that it is possible to find a TM that decides this problem.
The idea is that if we want to tell whether the language of a DFA is non-empty, all we are trying to see is whether it is possible to reach some accepting state from the start state. So we can use the method given below. Notice that it avoids a potential infinite computation by a clever trick: if a problem looks difficult, think again, and think of ways to avoid potential infinite computations.
1. On input <A>.
2. Mark the start state of A.
3. For i = 1, 2, . . . , n, where n is the number of states:
4. Mark any state that has a transition coming into it from any state that is already marked.
5. If any accept state gets marked, reject. Otherwise, accept the input.
This method can be explained in one line as follows:
1. On input <A>.
2. Perform a DFS on the transition diagram of A, starting from the start state.
3. If any final state gets marked, reject. Otherwise accept.
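A Python sketch of the marking method (names and the example DFA are mine):

    def dfa_language_empty(delta, start, finals):
        """Decider sketch for E_DFA: mark the start state, then repeatedly
        mark every state one transition away from a marked state. The
        language is empty iff no accepting state ever gets marked."""
        marked = {start}
        changed = True
        while changed:
            changed = False
            for (q, _sym), r in delta.items():
                if q in marked and r not in marked:
                    marked.add(r)
                    changed = True
        return not (marked & set(finals))

    # The ends-with-1 DFA from before: its language is clearly non-empty.
    delta = {('s', '0'): 's', ('s', '1'): 't', ('t', '0'): 's', ('t', '1'): 't'}
    print(dfa_language_empty(delta, 's', {'t'}))  # False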
If the problem looks difficult think again. Think of ways to avoid potential infinite computations.
The Equivalence Problem for DFAs: EQDFA = {<A, B> : A and B are DFAs and L(A) = L(B)}.
The input to this problem is a description of two deterministic finite automata A and B. We have to decide if L(A) = L(B), that is, if they accept exactly the same language.
Note the problem on the surface again looks very difficult. We have to make sure that A and B accept exactly the same strings and reject exactly the same strings. However, the number of potential strings that we can give as input is infinite. Thus it would be futile to simulate A and B on all the strings; the process would never end.
So we may start suspecting that this is perhaps a difficult question. But we have seen an example of this kind before: regular languages are closed under union, intersection and complementation, and these constructions can all be carried out on DFAs algorithmically. Hence we can make a DFA C such that
L(C) = (L(A) ∩ L(B)^c) ∪ (L(B) ∩ L(A)^c),
so that L(A) = L(B) if and only if L(C) = ∅.
1. On input <A, B>.
2. Construct the DFA C described above.
3. Test whether L(C) = ∅ using the decider for EDFA. If it is empty, accept. Else reject.
The Acceptance Problem for CFGs: ACFG = {<G, w> : G is a CFG that generates the string w}.
The input to this problem consists of two parts. The first part is a CFG G; the second part is a string w. We have to decide if G generates w. Note that the input will not be a CFG G itself but a suitable encoding of G.
We are asking if we can make an algorithm (Turing machine) that will take as input a CFG G and a string w ∈ {0, 1}* and tell us if G generates w.
One idea is to try all possible derivations and check if we ever get the string w. However, we do not know how many derivations to try, and this can be potentially infinite.
But we know how to convert a grammar into Chomsky Normal Form, and in Chomsky normal form every string of length n has a derivation of exactly 2n − 1 steps. So:
1. On input <G, w>.
2. Convert G into Chomsky normal form G′.
3. List all derivations with 2n − 1 steps, where n is the length of w.
4. If one of them generates w, accept. Otherwise reject.
Note that the above method is not the most efficient. You have studied more efficient algorithms for this problem; recall the CYK algorithm.
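For reference, here is a minimal CYK sketch in Python, assuming the grammar is already in Chomsky normal form (all names, and the example grammar, are mine):

    def cyk(word, start, unit_rules, pair_rules):
        """CYK membership test for a CNF grammar.
        unit_rules: dict terminal -> set of variables A with A -> terminal.
        pair_rules: list of triples (A, B, C) for rules A -> BC."""
        n = len(word)
        if n == 0:
            return False   # epsilon needs a special S -> epsilon check, omitted
        # table[l][s] = variables deriving the length-l substring starting at s
        table = [[set() for _ in range(n)] for _ in range(n + 1)]
        for s, ch in enumerate(word):
            table[1][s] = set(unit_rules.get(ch, ()))
        for length in range(2, n + 1):
            for s in range(n - length + 1):
                for split in range(1, length):
                    for a, b, c in pair_rules:
                        if b in table[split][s] and c in table[length - split][s + split]:
                            table[length][s].add(a)
        return start in table[n][0]

    # CNF grammar for {0^n 1^n : n >= 1}: S -> AT | AB, T -> SB, A -> 0, B -> 1.
    unit = {'0': {'A'}, '1': {'B'}}
    pairs = [('S', 'A', 'T'), ('S', 'A', 'B'), ('T', 'S', 'B')]
    print(cyk('0011', 'S', unit, pairs), cyk('001', 'S', unit, pairs))  # True False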
The Emptiness Problem for CFGs: ECFG = {<G> : G is a CFG and L(G) = ∅}.
Note the problem on the surface looks very difficult. We have to make sure that G does not generate any string. In other words, we would like to show that starting from the start symbol of G we can never get to a string which consists only of terminals. Since there are potentially infinitely many derivations, the problem looks difficult.
However, a little thought shows that it is possible to find a TM that decides this problem. The idea is again a marking algorithm: to tell whether the language of a CFG is non-empty, all we are trying to see is whether it is possible to generate a string of terminals from the start symbol. So first mark every terminal symbol; then repeatedly mark any variable A for which G has a rule A → U1U2 · · · Uk with every Ui already marked; finally, L(G) is non-empty if and only if the start symbol gets marked.
The Equivalence Problem for CFGs, EQCFG = {<G1, G2> : G1 and G2 are CFGs and L(G1) = L(G2)}, again looks very difficult: we have to make sure that G1 and G2 generate exactly the same strings. This time the symmetric-difference trick is not available, since context-free languages are not closed under intersection and complementation. In fact, we will see that EQCFG is undecidable.
ATM = {<M, w> : M is a TM that accepts w}.
This is called the acceptance problem for TMs or the halting problem. There is a TM that recognizes ATM: on input <M, w>, simulate M on w and accept if M accepts. This machine, let's call it U, is a recognizer for ATM. Why is it not a decider? If M loops on w then U will also loop on <M, w>.
Suppose we have made the universal TM U. Then we do not need to make any other machines; we only need to describe them.
U is a general-purpose computer. M is the program and w is the data. That is what happens in a typical computer system: we make one kind of machine and run different programs on various data on it.
The universal TM may look simple to us now, but it is extremely important. Notice that there are no:
1. Universal DFAs
2. Universal PDAs
3. Universal CFGs
However, there is a universal TM. Turing machines are powerful enough to simulate DFAs, PDAs, and other Turing machines as well.
What happens to U if the input is a machine M and data w such that M does not halt on w? Then U does not halt on <M, w>.
We can ask the question: is ATM decidable? So we are asking if there is a decider H such that:
1. H accepts <M, w> if M accepts w.
2. H rejects <M, w> if M does not accept w.
Cantor's approach:
To show that two sets are of the same size we can take two approaches. Let us say we want to prove that the number of passengers standing at a bus stop is equal to the number of seats in the bus. One way is to count both. The other way is to ask everyone to sit down: the two numbers are equal exactly when all the passengers are seated and all the seats are occupied. The second method works even without counting, and it is the idea behind Cantor's definition.
A function f : A → B is called one-to-one if f(a) ≠ f(b) whenever a ≠ b. A function f : A → B is called onto if for every b in B there is an a such that f(a) = b. A correspondence (or a one-to-one correspondence) is a function that is both one-to-one and onto.
Let A = {a, b, c}, so |A| = 3, and B = {x, y, z}, so |B| = 3. Clearly there is a correspondence f : A → B.
Now consider the map i → i^2, which pairs 1, 2, 3, 4, 5, 6, . . . with 1, 4, 9, 16, 25, 36, . . .; it is a correspondence between N and the set of perfect squares.
Similarly, it may seem that a larger circle has more points than a smaller circle, yet the points of the two circles can be put into correspondence.
Let us look at another example:
N = {1, 2, 3, 4, · · ·}, B = {2, 3, 4, · · ·}, with f(i) = i + 1 and g(i) = i − 1.
Then f : N → B and g : B → N form a correspondence.
Hilbert Hotel:
The fact that N and B have the "same number of elements" can be explained as living in the Hilbert Hotel: the hotel has rooms 1, 2, 3, . . . and is full, yet a new guest can always be accommodated by shifting every guest one room up.
1. The set of positive integers is called a countable set.
2. Any set that has the same cardinality as the positive integers is also called a countable set.
Is the set of all ordered pairs of positive integers countable? Yes: list the pairs diagonal by diagonal, as the standard picture shows.
Is the set of all positive rational numbers countable? Yes: the same diagonal picture works, reading the pair (p, q) as the fraction p/q (and skipping repetitions).
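Since the picture is easy to mechanize, here is a Python sketch of the diagonal-by-diagonal enumeration (names are mine):

    def pairs():
        """All ordered pairs of positive integers, diagonal by diagonal:
        (1,1), (1,2), (2,1), (1,3), (2,2), (3,1), ..."""
        d = 2
        while True:
            for i in range(1, d):
                yield (i, d - i)
            d += 1

    gen = pairs()
    print([next(gen) for _ in range(6)])
    # [(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1)]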
Uncountable sets and Cantor Diagonalization:
The same diagonal idea shows that the set of real numbers is uncountable: given any list r1, r2, r3, . . . of reals, build x (for instance x = 0.4641 · · ·) whose i-th digit differs from the i-th digit of ri; then x appears nowhere in the list.
The Cantor set consists of all infinite 0/1 sequences. Suppose we are given a list of such sequences:
1 → A1 = 101011011 · · ·
2 → A2 = 0010011011 · · ·
3 → A3 = 1111000101 · · ·
4 → A4 = 1010101001 · · ·
5 → A5 = 1100110011 · · ·
6 → A6 = 1010110101 · · ·
. . .
Let us define a sequence x = x1x2x3 · · · where xi is the i-th diagonal bit flipped. Then xi is different from the i-th bit of Ai by definition. This x cannot be in the list: it differs from each Ai, in fact in the i-th bit.
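The diagonal construction in Python (names are mine; only finite prefixes are shown, which is all the argument needs for the first n sequences):

    def diagonal_flip(rows):
        """x's i-th bit is the flipped i-th bit of the i-th sequence, so x
        differs from every row (each row must be at least len(rows) long)."""
        return ''.join('1' if row[i] == '0' else '0' for i, row in enumerate(rows))

    rows = ['1010', '0010', '1111', '1010']
    print(diagonal_flip(rows))  # 0101: differs from row i in position i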
Back to ATM. This is called the acceptance problem for TMs, or the halting problem. The machine U described earlier is a recognizer for ATM. Why is it not a decider? If M loops on w then U will also loop on <M, w>. U is called a universal Turing machine.
Theorem: ATM is not decidable.
The proof is via contradiction. We will start by assuming that ATM is decidable and reach an absurdity. This method of proving mathematical claims is called reductio ad absurdum.
So let us assume that ATM is decidable. This means there is a decider H such that
H(<M, w>) = accept, if M accepts w
H(<M, w>) = reject, if M does not accept w
H is a spectacular TM, since it halts on all inputs. So it must be doing something very clever. Let's continue analyzing H. Note that if H exists then we can use it as a subroutine in any other TM that we design.
So let us look at the following TM, which we call D:
1. On input <M>.
2. Convert <M> into <M, <M>>.
3. Run H on <M, <M>>.
4. Output the opposite of what H does; that is, if H accepts, reject; if H rejects, accept.
Note that we can easily make this TM D provided H is given to us. So if H exists then so does D. The main step D has to do is to convert the input <M> to <M, <M>>. What will running H on <M, <M>> tell us? H accepts <M, <M>> if and only if M accepts <M>. So we are finding out whether M accepts its own description!
Can Turing machines accept/reject their own description? Can we run a program on itself? Well, why not!
Now comes the interesting twist in the proof. We simply ask: what does D do on its own description <D>?
D(<D>) = accept, if D does not accept <D>
D(<D>) = reject, if D accepts <D>
But this is absurd in both cases. Hence we conclude that D does not exist, and therefore H does not exist either.
To summarize: assume that H exists. We use H to construct D. This is easy: D simply converts its input <M> into <M, <M>> and calls H. It returns the answer opposite to the one given by H.
1. H accepts <M, w> exactly when M accepts w.
2. D rejects <M> exactly when M accepts <M>.
3. D rejects <D> exactly when D accepts <D>.
The last line is a contradiction.
Where is the diagonalization in this proof? Well, it is hidden, but we can find it. Let's make a matrix as follows:
1. Label the rows with TMs as M1, M2, M3, . . . , Mk, . . .
2. Label the columns with descriptions of TMs as <M1>, <M2>, <M3>, . . .
3. Entry (i, j) is accept if Mi accepts <Mj>, and reject otherwise.
For example:
1. The first three entries in the first row tell us that M1 accepts <M1> and <M3> but not <M2>, and so on.
2. M2 accepts all descriptions.
3. Similarly, the 4th row tells us that M4 accepts <M1> and <M2>.
Look at this matrix; note that all entries are filled.
The machine D, when given <Mi> as an input, looks at the entry (i, i) on the diagonal and outputs the opposite, just like Cantor's idea. Now, if D exists then there must be a row corresponding to D in this matrix. Let's look at this row and see why we have a contradiction. Every entry in D's row is the opposite of the corresponding diagonal entry; for instance, the first entry is reject, because M1 accepts <M1>.
The contradiction comes when we try to fill out the entry that corresponds to the column of D itself:
1. If we put an accept there, it should be reject.
2. Similarly, if we put a reject there, it should be accept.
      <M1>    <M2>    <M3>   · · ·   <D>   · · ·
D:    reject  reject  · · ·  · · ·   ???   · · ·
1. Assume H exists.
2. Construct D. We argue that D can be constructed from H.
3. We check what D will do if it is given its own description.
4. This leads to an absurdity. Therefore H cannot exist.
5. Quod erat demonstrandum.
Let's recall that a language A is Turing recognizable if there is a TM M such that L(M) = A:
M(x) = accept, if x ∈ A
M(x) = does not accept (rejects or loops), if x ∉ A
Is the complement of A then also Turing recognizable? Note that the trick of swapping states that we used in the previous theorem will not work here. If we switch the accept and reject states of M we get a new machine M′ that does the following:
M′(x) = reject, if x ∈ A
M′(x) = accept or loop, if x ∉ A
So M′ does not necessarily recognize the complement of A.
Still, it is sensible to talk about a language whose complement is Turing recognizable; such a language is called co-Turing-recognizable.
Theorem: If a language is Turing recognizable and co-Turing-recognizable then it is decidable.
Proof: Let M1 be a recognizer for A and M2 a recognizer for the complement of A. We build a new TM N:
1. On input x.
2. Run M1 and M2 in parallel on x, alternating one step of each.
3. If M1 accepts, accept; if M2 accepts, reject.
Now what is L(N)? If x ∈ A then N accepts x, and if x ∉ A then N rejects x. Therefore L(N) = A. Furthermore, N halts on all inputs. This is because A ∪ Ā = ∑*, so one of the two machines must accept x.
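The "run both machines in parallel" idea can be sketched in Python by modeling each recognizer as a generator that yields None while still working and True once it accepts (all names and the toy recognizers are mine):

    def parallel_decide(recognize_A, recognize_coA, x, max_rounds=10**6):
        """Decider sketch: one step of each recognizer per round.
        Exactly one of the two must eventually accept."""
        m1, m2 = recognize_A(x), recognize_coA(x)
        for _ in range(max_rounds):       # a real decider needs no bound
            if next(m1):
                return True               # x is in A
            if next(m2):
                return False              # x is in the complement of A
        raise RuntimeError('unreachable if the recognizers cover all inputs')

    # Toy pair: A = strings of even length. The "machines" stall a while first.
    def rec_even(x):
        for _ in range(len(x)):
            yield None
        while True:
            yield len(x) % 2 == 0

    def rec_odd(x):
        for _ in range(2 * len(x)):
            yield None
        while True:
            yield len(x) % 2 == 1

    print(parallel_decide(rec_even, rec_odd, 'abc'))  # False (odd length)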
Let's apply this theorem to the Halting Problem. We cannot apply it directly; we argue by contrapositive (recall that the contrapositive of P ⟹ Q is ¬Q ⟹ ¬P):
• We showed that ATM is undecidable.
• We know that ATM is Turing recognizable (U recognizes it).
• We showed that if A is TR and co-TR then A is decidable.
• We conclude that the complement of ATM is not Turing recognizable.
Reducibility
Linear Bounded Automata:
An LBA is a restricted type of TM: its tape head is not permitted to move off the portion of the tape containing the input. The input is the only portion of the tape the machine may move its head on.
Acceptance Problem for LBAs
Let us define the acceptance problem for LBAs: ALBA = {<M, w> : M is an LBA that accepts w}. Is ALBA decidable?
Theorem: Let M be an LBA with q states and g tape symbols. There are at most q · n · g^n distinct configurations of M for a tape of length n. Recall that a configuration is completely given by the state, the position of the head, and the contents of the tape.
Example: Suppose M has 7 states and 5 tape symbols. How many distinct configurations can we have for a tape of length 3? At most 7 · 3 · 5^3 = 2625.
Theorem: Let M be an LBA with q states and g tape symbols. If M takes more than q · n · g^n steps on an input x of length n, then M does not halt on x. Note that in that case one of the configurations of the LBA has repeated; hence M is stuck in an endless loop.
Theorem: ALBA is decidable. We simulate the LBA on the given input for at most q · n · g^n steps; this finite simulation suffices to decide whether it accepts the input.
AREX is decidable: we convert the regular expression into a DFA and then use the decider for ADFA.
APDA = {<P, w> : P is a PDA that accepts w}. This is decidable, since we can convert the PDA into an equivalent grammar and then use the decider for ACFG.
As we will see below, if M accepts w then there is exactly one accepting computation history, and B will accept it; on the other hand, if M does not accept w then L(B) = ∅.
Computation Histories:
Let M be a TM. We have the following concepts:
1. A state. This just tells us what the current state of the machine is.
2. A configuration. This tells us what the state is, the head position, and all the tape contents. So, given a configuration, we can figure out what the next configuration will be.
3. A computation history: a sequence of configurations C1, C2, . . . , Cl. It is an accepting computation history of M on w if C1 is the start configuration of M on w, each Ci yields Ci+1, and Cl is an accepting configuration.
Given a TM M and w, we can construct an LBA B such that B accepts exactly the accepting computation history of M on w. We give an informal description of B:
a. B checks that C1 is the start configuration of M on w and that Cl is an accepting configuration.
b. B zigzags between Ci and Ci+1 to make sure that Ci yields Ci+1.
Note that B never uses any extra memory. It only uses the portion of the tape that holds its input, so B is an LBA:
1. Note that the input to B is not w but a (claimed) computation history H of M.
2. In order to check whether H is an accepting computation history of M on w, B does not need more tape than the portion on which H is written.
Now suppose a TM R decides ELBA = {<B> : B is an LBA and L(B) = ∅}. We build S, which on input <M, w> constructs the LBA B above and runs R on <B>. If R accepts, reject; if R rejects, accept. Then S is a decider for ATM, a contradiction. Hence ELBA is undecidable.
ALLCFG = {<G> : G is a CFG and L(G) = ∑*}. On the surface the problem looks difficult: we would have to find at least one string that is not generated by the grammar G, and there are infinitely many potential strings. But our goal here is to PROVE that it is undecidable.
Computation Histories again: we will show that if ALLCFG is decidable then ATM is decidable. Given a TM M and w, we can construct a PDA D such that D rejects the accepting computation history of M on w (appropriately encoded) and accepts all other strings. We give an informal description of D.
We will use a trick here. Let C1, . . . , Cl be a computation history. We will encode it as
C1 # C2^R # C3 # C4^R # · · · # Cl
with every other configuration written in reverse (that is what the superscript R means). We will see in a minute why we are doing this.
Now consider the following PDA D. The PDA will take a (claimed) computation history H = C1, . . . , Cl and do the following:
1. If C1 is not the initial configuration of M on w, accept.
2. If Cl is not an accepting configuration, accept.
3. If there is an i such that Ci does not yield Ci+1, accept.
Note that D is a non-deterministic PDA and therefore it can guess which one of these conditions is violated. Since w is hard-coded in D, it can check whether the first configuration is not the initial configuration. Similarly, it is easy to check whether the last configuration is not an accepting configuration. How does D check that Ci does not yield Ci+1?
This is why we said we would use a special encoding. We had Ci # C(i+1)^R. Thus D can push Ci onto its stack and pop it while reading C(i+1)^R, checking whether Ci yields Ci+1 or not.
Theorem: Given M and w we can construct a PDA D such that D rejects the accepting
computation of M on w (properly encoded). It accepts all other strings.
Theorem: Let M and w be given and let D be constructed as earlier. Let G be a grammar which generates all the strings accepted by D. Then
L(G) = ∑*, if M does not accept w;
L(G) ≠ ∑*, if M accepts w.
Each domino has a string on its top and a string on its bottom. Consider the following collection:
The first domino has b on top and ca on the bottom.
The second domino has a on top and ab on the bottom.
The third domino has ca on top and a on the bottom.
The fourth domino has abc on top and c on the bottom.
A match is a sequence of dominos from the collection (repetitions allowed) such that the concatenation of the top strings equals the concatenation of the bottom strings. Given a collection P of dominos, does P have a match?
Recall: the language ATM is undecidable. There is no TM H such that on input <M, w>:
1. H accepts if M accepts w.
2. H rejects if M does not accept w.
Assume that H exists. We use H to construct D. This is easy: D simply converts its input <M> into <M, <M>>, calls H, and outputs the opposite of what H does; that is, if H accepts, reject; if H rejects, accept. Once we have constructed D we ask how it behaves on its own description:
1. H accepts <M, w> exactly when M accepts w.
2. D rejects <M> exactly when M accepts <M>.
3. D rejects <D> exactly when D accepts <D>.
Russell's Paradox:
In a town called Seville there lives a barber. A man in Seville is shaved by the barber if and only if the man does not shave himself. Question: does the barber shave himself?
Students are asked to make a list in which they can include the names of anyone (including themselves) in a classroom. One particular student wrote down the names of exactly those people in the class who did not include their own names in their own lists.
Question: Does that particular student write his/her own name on his/her list?
Is it possible to have an enumerator that enumerates the descriptions of all the enumerators that do not enumerate their own description? Suppose such an enumerator F exists; the same paradox appears. All of these are the same self-referential trick:
1. D, which accepts (the descriptions of) those TMs that do not accept their own description.
2. The barber, who shaves all those people who do not shave themselves.
3. The student, who lists all those students who do not list themselves.
4. The enumerator, which enumerates (the descriptions of) all those enumerators that do not enumerate themselves.
Reducibility:
Suppose you have a language B in your mind and you can prove the following: if I can make a decider for B then I can make a decider for the halting problem. Now suppose someone claims that they have a decider for the language B. Then I can claim that I have a decider for ATM, which is absurd. From this we can conclude that B is also undecidable, as we know that ATM is undecidable.
HALTTM = {<M, w> : M is a TM and M halts on input w}.
Suppose that HALTTM is decidable and R decides HALTTM. If we have R then TMs that loop endlessly cause us no problem: we can always check, using R, whether a TM will loop endlessly on a given input. We can use R to make a decider S for ATM: on input <M, w>, S first runs R on <M, w>; if R rejects, then M loops on w, so S rejects; if R accepts, S simulates M on w until it halts and answers accordingly. We know that ATM is undecidable. We have a contradiction, and that is why R cannot exist.
ETM = {<M> : M is a TM and L(M) = ∅}. Suppose that ETM is decidable and R decides ETM; R accepts <M> if the language of M is empty. We can use R to make a decider S for ATM. We know that ATM is undecidable, so we will get a contradiction, and that is why R cannot exist. How can we construct S?
S gets <M, w> as input. The first idea is to run R on <M> and check if it accepts. If it does, we know that L(M) is empty and therefore that M does not accept w. But if R rejects, we only know that the language of M is not empty; we do not know anything about what it does on w. So this idea does not work. We have to come up with another idea.
Given M and w, construct a machine M1 that on input x works as follows: if x ≠ w, reject; if x = w, run M on w and accept if M accepts. Then L(M1) is non-empty if and only if M accepts w. Given R we can now construct S, which decides ATM, and operates as follows: on input <M, w>, construct M1 and run R on <M1>; if R accepts, reject; if R rejects, accept.
S is a decider for ATM. Since S does not exist we conclude that R does not exist. So ETM is undecidable.
REGTM = {<M> : L(M) is regular}
Suppose that REGTM is decidable and R decides REGTM. R accepts all TMs that accept a regular language and rejects all TMs that do not accept a regular language. How can we use R to construct a decider S for ATM? Note that S has to get as input <M, w> and has to decide if M accepts w. S gets <M, w> as an input. Given M and w we will construct a machine M2 such that
L(M2) is regular if M accepts w, and
L(M2) is not regular if M does not accept w.
What is L(M2)?
Given R we can now construct S, which decides ATM, as follows: on input <M, w>, construct M2, run R on <M2>, accept if R accepts and reject if R rejects. S is a decider for ATM. Since S does not exist we conclude that R does not exist. So REGTM is undecidable.
ETM is a special case of EQTM.
Suppose R decides EQTM. We design S to decide ETM. S gets <M> as an input. Let M′ be a machine that rejects all inputs.
On input <M>
1. Run R on <M, M′>.
2. If R accepts, accept. If it rejects, reject.
It is easy to see that the language of S is ETM. As ETM is undecidable, so is EQTM.
1. The first domino has ca on top and a on the bottom.
2. The second domino has acc on top and ba on the bottom.
For this collection it is not possible to find a match. Why? The top of each domino has more letters than the bottom, so in any list the top string will be longer than the bottom string.
How will we prove this theorem? We will show that if PCP is decidable then we can use the decider for PCP to construct a decider for ATM. The main idea is that given a TM M and an input w we will describe a collection P of dominoes such that M accepts w if and only if P has a match.
How to do this? First let us deal with a little technicality. Let us define another problem which is closely related to PCP. This problem is called MPCP (M for modified): given a collection P of dominoes, does P have a match that starts with the first domino? We will show that MPCP is undecidable.
Once again the main idea is that given a TM M and an input w we will describe a collection P′ of dominoes such that M accepts w if and only if P′ has a match starting with the first domino.
The main idea is that we will make a collection so that the only way to find a match will be to
“simulate” a computation of M on w.
Part 1:
Put into P′, as the first domino, the domino with # on the top and #q0w1w2 · · · wn# on the bottom.
Since this is the first domino, the bottom string is the initial configuration of M on input w. To get a match one must choose dominoes that are consistent with the bottom. We will choose these dominoes in such a way that this forces the second configuration of M to appear at the bottom, and so on.
Part 2:
For every a, b ∈ Г and q, r ∈ Q where q ≠ qreject, if δ(q, a) = (r, b, R), put the domino with qa on the top and br on the bottom into P′.
Part 3:
For every a, b, c ∈ Г and q, r ∈ Q where q ≠ qreject, if δ(q, a) = (r, b, L), put the domino with cqa on the top and rcb on the bottom into P′.
Part 4:
For every a ∈ Г, put the domino with a on the top and a on the bottom into P′.
The purpose of these dominos is to “simulate” the computation of M. Let us look at an example.
Example:
Lets say Г = {0, 1, 2, □}. Say w = 0100 and the start state of M is q0. Then the initial configuration is q00100. Now, suppose δ(q0, 0) = (q7, 2, R). Part 1 places the first domino, which has # on the top and #q00100# on the bottom.
So we have the initial configuration on the bottom. This is how the match begins. In order for the match to continue we want to pick a domino that has q00 on the top. Part 2 had put the domino with q00 on the top and 2q7 on the bottom into the collection. In fact, that is the only domino that fits. So if we are to continue our match we are forced to put this domino next. The match now looks like:
Top:    # q00
Bottom: # q00100 # 2q7
This is quite amazing! The top string contains the initial configuration. The bottom has the initial configuration followed by the beginning of the configuration that comes after it. We want this to continue.
Part 5:
Put the domino with # on the top and # on the bottom, and the domino with # on the top and □# on the bottom, into P′.
1. The first one allows us to copy the # symbol that marks the separation of the configurations.
2. The second one allows us to add a blank symbol at the end of a configuration to simulate the blanks to the right that are suppressed when we write a configuration.
Lets continue with our example. Let us say we have δ(q7, 1) = (q5, 0, R). The match now reads:
Top:    # q00100 # 2q7100 #
Bottom: # q00100 # 2q7100 # 20q500 #
Once again note that the top has the first two configurations of M and the bottom has the first
three. Lets look at what happens when the machine moves to the left.
In order to construct a match we must simulate M on input w. However, there is a problem. The top string is always "behind" the bottom string! This is actually good for us. We will only allow the top string to catch up with the bottom one if we go into an accept state.
Part 6:
For every a ∈ Г, put the domino with a qaccept on the top and qaccept on the bottom, and the domino with qaccept a on the top and qaccept on the bottom, into P′.
These dominoes allow us to eat up the symbols till none are left. Lets look at an example. Suppose the bottom of the match has reached the configuration # 21qaccept02 #. Using the domino with qaccept0 on the top and qaccept on the bottom we get to # 21qaccept2 #, and with the other dominoes of part 6 we finally reach # qaccept #.
Part 7:
Finally we add to P′ the domino with qaccept## on the top and # on the bottom, which will allow us to complete the match.
Summary of this construction:
1. The first domino has the initial configuration on the bottom and only a # on the top.
2. To copy any configuration onto the top, we are forced to put the next configuration on the bottom.
3. By doing this we simulate M on w.
4. When we reach the accept state, we allow the top to catch up with the bottom string.
Note also:
1. If M rejects w, there is no match. A match can only occur if we reach an accept state.
2. If M loops forever on w then the top string will never catch up with the bottom string.
Assume that MPCP is decidable. Let us say we have a decider R for MPCP. Consider the following decider S:
1. On input <M, w>
2. Construct P′ as described in the seven parts.
3. Run R on P′.
4. If R accepts, accept.
5. If R rejects, reject.
Then S is a decider for ATM, which is a contradiction to the fact that ATM is undecidable.
However, P′ is an instance of MPCP, not of PCP. In fact, if we remove the restriction that the match should start with the first domino, P′ has a very short match: any domino from part 4, with equal top and bottom, is a match by itself. Now, we can convert this instance of MPCP to an instance of PCP as follows:
Let u = u1u2 · · · un be a string. Let us define *u, u* and *u* to be
*u = *u1*u2*u3 · · · *un
u* = u1*u2*u3* · · · un*
*u* = *u1*u2*u3 · · · *un*
Given the MPCP instance with dominoes t1/b1, t2/b2, . . . , tk/bk (tops ti, bottoms bi, with t1/b1 the starting domino), we build the PCP collection consisting of
the domino with *t1 on the top and *b1* on the bottom,
the dominoes with *ti on the top and bi* on the bottom, for every i = 1, . . . , k, and
the domino with *◊ on the top and ◊ on the bottom, where ◊ is a new symbol.
In this collection any match will start with the first domino, since it is the only one that has the same first character on the top and on the bottom. So, we don't have to insist that the match has to start with the first domino; it is built into the collection.
Other than forcing the first domino to start the match, the *'s do not put any restrictions. They simply interleave with the symbols. The last domino allows the top string to add the extra * (and the new symbol ◊) at the end of the match. One can easily prove that the new collection has a match if and only if the original collection has a match starting with the first domino.
Assume that PCP is decidable. Let us say we have a decider R for PCP. Consider the following decider S:
1. On input <P′>, an instance of MPCP.
2. Construct the PCP instance P as described above.
3. Run R on P.
4. If R accepts, accept.
5. If R rejects, reject.
Then S is a decider for MPCP, which is a contradiction to the fact that MPCP is undecidable.
Now we will consider TMs that compute functions: the TM has an input tape and an output tape.
Let us consider a function f : ∑* → ∑*. We say that f is computable if there is a TM M such that on every input w it halts with f(w) written on its tape. For example, suppose a string w encodes a description of a two-tape TM, and let f be the function that constructs the description of an equivalent one-tape TM. In this case, f is a computable function.
Note that f must be a computable function. It does not need to be one-to-one or onto. Usually we want to take an input w and produce an output f(w). An algorithm (or a TM M) that computes f is what we look for.
Suppose we have two problems A and B. Roughly, if the solution of B can be used to also solve A then we say A is reducible to B.
• The problem of making tea can be reduced to the problem of boiling water.
• The problem of solving a quadratic equation can be reduced to the problem of making friends with a mathematician.
Reducibility:
In this diagram:
• The left side shows ∑* and the shaded portion shows the set A.
• The right side again shows ∑* and the shaded portion shows the set B.
• The function must map all the strings inside A into B.
• It must also map all the strings in the complement of A into the complement of B.
• The function must be computable.
The function:
1. Need not be one-to-one. So, it is allowed to map several inputs to a single one.
2. Need not be onto.
A = {<n> : n is even}
B = {w ∈ {0, 1}* : w contains no 1's}. Let us define f(x1x2 · · · xn) = xn; then f is computable. f shows that A ≤m B, since f(<n>) = 0 if n is even and f(<n>) = 1 if n is odd (the binary encoding of n ends in 0 exactly when n is even).
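Here is a minimal sketch of this reduction in C++ (our own illustration, not from the handout; the names inA, inB and f are ours):

#include <cassert>
#include <string>

// Decider for B = { w in {0,1}* : w contains no 1's }.
bool inB(const std::string& w) {
    return w.find('1') == std::string::npos;
}

// The reduction f: map the binary encoding of n to its last bit.
// n is even exactly when its binary representation ends in 0.
std::string f(const std::string& binary_n) {
    return std::string(1, binary_n.back());
}

// Decider for A = { <n> : n is even }, obtained by composing f with the decider for B.
bool inA(const std::string& binary_n) {
    return inB(f(binary_n));
}

int main() {
    assert(inA("110"));   // 6 is even
    assert(!inA("101"));  // 5 is odd
}

The decider for A never inspects its input directly; it only computes f and consults the decider for B, which is exactly the pattern of a mapping reduction.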
Theorem: If A ≤m B and B is decidable, then A is decidable.
Proof: Since B is decidable there is a decider M that decides B. Let N be the machine that, on input x, computes y = f(x) and runs M on y. Since f is a computable function and M is a decider, N is a decider. Furthermore, x is accepted by N if and only if y is accepted by M. However, y is accepted by M if and only if y ∈ B. Since f is a reduction, x ∈ A if and only if y = f(x) ∈ B, which implies that N accepts x if and only if x ∈ A. This shows that N is a decider for A.
We are more interested in using the contrapositive form of this theorem, since we are more interested in undecidable questions: if A ≤m B and A is undecidable, then B is undecidable.
Suppose you have a language P that you do not know anything about. You are wondering if P is decidable or undecidable. There are two approaches you have now:
1. Find a language D which is known to be decidable and show that P ≤m D; now you can conclude that P is also decidable.
2. Find a language U which is known to be undecidable and show that U ≤m P; now you can conclude that P is also undecidable.
But given a language P how can you tell if it is decidable or not? How do you know which option to take? In general no one knows the answer.
We first try to find an algorithm and try to prove that P is decidable. If we fail and start
suspecting that the problem is undecidable then we try to reduce some known undecidable
problem to P.
In the case of Hilbert's tenth problem, it took almost 70 years to come up with the reduction and prove that the problem is undecidable.
Mapping reducibility is only one kind of reducibility. It is in some sense the simplest way to
reduce a problem to another one. We will only discuss mapping reducibility.
We have already used reducibility many times to show that problems are undecidable. Let us look at some of those reductions. We have the fundamental problem ATM.
To show that ATM ≤m HALTTM, we have to find a computable function f such that f(<M, w>) = <M′, w′>, where <M, w> ∈ ATM if and only if <M′, w′> ∈ HALTTM.
Let us show ETM ≤m EQTM. Let M1 be a TM that rejects all its inputs. Let us define f(<M>) = <M, M1>; then we have L(M) = Ф if and only if L(M) = L(M1). In order to show that f is a reducibility we only need to argue that it is computable, and it clearly is.
This reducibility shows that EQTM is undecidable. From this we can also conclude that the complement of EQTM is undecidable, since A is decidable if and only if the complement of A is decidable.
If A ≤m B and B is Turing recognizable then A is Turing recognizable.
Proof is almost identical to the earlier theorem. Let A ≤m B and let f be the reducibility from A to
B. Furthermore, since B is Turing recognizable there is a TM M such that L(M)=B.
Consider N:
1. On input x
2. Compute y = f (x)
3. Run M on y
4. If M accepts, accept.
Recall EQTM = {<M1, M2> : L(M1) = L(M2)}.
Let us show that ATM ≤m EQTM. Given <M, w>, construct the machines
M1:
1. On input x
2. Accept.
M2:
1. On input x
2. Run M on w; if M accepts w, accept x.
Note that L(M1) = ∑*, and L(M2) = ∑* if and only if M accepts w. Therefore, L(M1) = L(M2) if and only if M accepts w. Hence if we define f(<M, w>) = <M1, M2> then f is a reducibility from ATM to EQTM.
For the complement, construct instead
M1:
1. On input x
2. Reject.
M2:
1. On input x
2. Run M on w; if M accepts w, accept.
Note that L(M1) = Ф, and L(M2) = Ф if and only if M does not accept w. Therefore, L(M1) = L(M2) if and only if M does not accept w. Hence if we define f(<M, w>) = <M1, M2> then f is a reducibility from the complement of ATM to EQTM.
f can be computed as follows:
1. On input <M, w>
2. Construct M1 and M2 as described above.
3. Output <M1, M2>
This shows that the complement of ATM is mapping reducible to EQTM, and hence EQTM is not Turing recognizable.
Theorem:
Every Turing recognizable language is mapping reducible to ATM. This theorem says that ATM is in some sense the hardest problem out of all Turing recognizable problems. If one could devise an algorithm for ATM one could devise an algorithm for all Turing recognizable problems. Alas, ATM is not decidable.
The proof of this theorem is very easy. Let A be any Turing recognizable language. We have to show that A ≤m ATM. Let M be a TM such that L(M) = A. Note that M exists since A is Turing-recognizable.
Now, let us consider the function f given by f(x) = <M, x>. Clearly x ∈ A if and only if M accepts x, which is the same as: x ∈ A if and only if <M, x> ∈ ATM.
Is f computable?
Yes, if we hardwire M into the program that computes f. Here is the machine F that computes f:
1. On input x
2. Output <M, x>
This shows that ATM in some sense is the hardest Turing recognizable language. Therefore, it is the best candidate for being an undecidable language. No wonder Turing chose to work with this language and show that it is undecidable!
We will show that EQTM is also not Turing recognizable, by showing that the complement of ATM is mapping reducible to EQTM. Use the second pair of machines M1 and M2 above. What can we say about the languages of M1 and M2? The language of the first machine is the empty language; that is, L(M1) = Ф. When is the language L(M2) empty? L(M2) is empty when M does not accept w. Thus L(M1) = L(M2) if and only if M does not accept w.
Thus f(<M, w>) = <M1, M2> is a reducibility from the complement of ATM to EQTM. Together with the earlier reduction, this shows that EQTM is not Turing recognizable and not co-Turing recognizable.
Theorem: Any language that is Turing recognizable is reducible to ATM.
Let A be any Turing recognizable language. We have to show that A ≤m ATM. Let M be a TM such that L(M) = A. M exists as A is Turing recognizable.
Let us consider the following function f : ∑* → ∑* given by f(x) = <M, x>. All f does is append x to the description of M, making the pair <M, x>, and output it. Now,
1. If M accepts x then x ∈ A.
2. If M does not accept x then x ∉ A.
f is computed by the machine that, on input x, produces the output <M, x>. This shows that A ≤m ATM.
Let us try to write a program that prints itself. A naive attempt:
main() {
cout << "main() {";
cout << "cout << \"main() {\";";
· · · · · ·
This is not going to end. We have to be cleverer.
Let us consider the following instruction and follow it: Print the following sentence twice,
second time in quotes “To be or not to be that is the question”. If we follow the instruction
the result is: To be or not to be that is the question “To be or not to be that is the
question”.
Let's change this a bit. Print the following sentence twice, second time in quotes: "Hello how are you". If we follow the instruction the result is: Hello how are you "Hello how are you".
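Following this recipe mechanically, one can write a complete self-printing program. Here is one standard way to do it in C++ (a sketch of ours, not from the handout): the string s plays the role of the quoted sentence, and printf prints it twice, the second time in quotes.

#include <cstdio>
int main(){const char*s="#include <cstdio>%cint main(){const char*s=%c%s%c;printf(s,10,34,s,34,10);return 0;}%c";printf(s,10,34,s,34,10);return 0;}

Here 10 and 34 are the character codes of the newline and the double quote; compiling and running this two-line file prints exactly the file itself.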
Recursion Theorem:
Programming language version. Consider a program made of two functions, A and B:
A() { X="Hello";
cout << X;
B(); }
The output of B is the text of A. The output of A is Hello followed by the output of B. Thus the output of the whole program is: Hello followed by A. If we change "Hello" to "Goodbye", the output of the whole program is Goodbye followed by A. To make the program print itself, all we have to do is give the name main to the function A!
Recursion Theorem:
We want to make a machine that prints its own description. Let us start:
Theorem: There is a computable function q : ∑* → ∑* where q(w) = <Pw>, and Pw is the machine that prints w (on any input).
The machine consists of two parts, A and B, which we run one after the other. If we define A to be A = q′(<B>), that is, the machine that writes <B> on the tape, then the combination of them is the machine promised in the recursion theorem:
1. A will put <B> on the tape.
2. B will compute q′(<B>) = A.
3. It will combine A with B, which is its own description.
4. For the general recursion theorem, it will combine this description with the input to make Z, write Z, w on the tape, and call T.
T analyzes programs; R does the same analysis on itself. Let us look at an example. Let us define the following Turing machine T:
1. On input <M>, w
2. Count the number of states in M and print the count.
Thus T counts the number of states in a TM M and prints it. Hence t(<M>, w) = the number of states in M. The recursion theorem says that there is a TM R that computes r such that r(w) = the number of states in R. Thus there is a TM that prints a number which is the number of states in that very machine.
The recursion theorem says that there is a TM R such that: r(w) = <R>. Namely that
there is a machine that prints its own description. We have already seen how to do this.
The recursion theorem says that a machine can obtain its own description and can go on to compute with it. By using the recursion theorem we can add the following statement to the informal description of any TM M that we are designing:
Obtain own description <SELF>
This is the machine that prints its own description. It is easy to design if we have the recursion theorem at our disposal.
Lets prove that ATM is undecidable using the recursion theorem. Suppose ATM is decidable. Then there is a decider H that decides ATM.
Consider the following TM B:
1. On input w
2. Obtain, via the recursion theorem, own description <B>.
3. Run H on <B, w>. If H accepts, reject w; if H rejects, accept w.
Clearly, B on any input w does the opposite of what H declares it does. Therefore, H cannot be deciding ATM. Simple!
Lets look at this proof closely. Consider the following TM T:
1. On input <M>, w
2. Run H on <M, w>. If H accepts, reject; if H rejects, accept.
This TM exists as long as H exists. However, this TM is not running H on its own description but on <M>.
The recursion theorem now says that there is a TM, which we call B, such that it computes a function b with b(w) = T(<B>, w). But T(<B>, w) = accept ↔ H rejects <B, w>. Thus B accepts w ↔ H rejects <B, w>. Hence L(H) ≠ ATM, a contradiction.
The length of the description of <M> is the number of symbols in its description. We say M is minimal if there is no TM which accepts the same language as M and has a shorter description. Let MINTM = {<M> : M is minimal}. This is a very interesting set. We will prove that
Theorem
MINTM is not Turing-recognizable.
Suppose it were; then there would be an enumerator E that enumerates MINTM. Consider the following TM C:
1. On input w
2. Obtain own description <C>.
3. Run E until a machine D appears with a longer description than that of C.
4. Simulate D on input w.
All we have to note is that eventually such a D will appear, as MINTM contains TMs with arbitrarily large descriptions. Now, what does C accept? L(C) = L(D). However, C has a shorter description than D, contradicting the minimality of D.
Theorem
Let t : ∑* → ∑* be a computable function. There is a TM F such that t(<F>) describes a TM that accepts the same language as F.
We should think of t as a function that scrambles programs. This theorem says that any function that you write which scrambles programs will fail to scramble at least one program, in the sense that the scrambled program will continue to accept the same strings as the original one.
Consider F.
1. Input w.
2. Obtain own description <F>
3. Compute G = t (<F>)
4. Simulate G on w.
Clearly L(G) = L(F) and G = t (<F>). done!
The first statement is true and has been known for 2300 years. The second statement is true and took about 400 years to prove. The last statement is a famous open problem.
We want to make these notions very precise. So, we consider a fixed alphabet consisting of the boolean operators, the quantifiers ∀ and ∃, parentheses, variables, and relation symbols R1, . . . , Rk.
Note that if we need more than one variable we can use x, xx, xxx and so on; however, we will write them as x1, x2, . . . . Also, we will allow other boolean operators like →. Thus p → q is a short form for ¬p ∨ q.
Ri (x1, . . . , xj ) is called an atomic formula. The value j is the arity of the relation symbol
Ri . All appearances of Ri will have the same arity.
A quantifier can appear anywhere in the formula. The scope of a quantifier is the fragment of the formula that appears within the matched parentheses or brackets. We will assume that all formulas are in prenex normal form, where all the quantifiers appear in front of the formula. A variable that is not within the scope of a quantifier is called a free variable.
A sentence or a statement is a formula without any free variables. Lets look at some examples.
Let us take the model M1 = (N, ≤). Thus our universe is the set of natural numbers, and the relation ≤ is assigned to the relation symbol R1. The sentence ∀x1∀x2 [R1(x1, x2) ∨ R1(x2, x1)] is true, since for natural numbers either a ≤ b or b ≤ a.
Now, let us take another model M2 = (N, <). This model assigns < to R1. Now the sentence is not true. If Ri is assigned to a customary symbol we can switch to infix notation. Thus in M1 we can write the sentence as ∀x1∀x2 [x1 ≤ x2 ∨ x2 ≤ x1].
We considered formulas over this alphabet and looked at some examples of formulas.
To specify a model M we have to specify:
1. U, the universe where the variables take their values.
2. P1, . . . , Pk, the relations assigned to the symbols R1, . . . , Rk.
The language of a model M is the set of all formulas which use the relation symbols with the correct arity: a formula is in the language exactly when all its relation symbols appear with the right arity.
If M is a model, we let TH(M) denote the theory of M. It is by definition the set of all sentences in the language of M that are true in M.
A central point of investigation is to find out which theories are decidable and which ones
are not.
Note that when we are talking about a model in which relations have some usual
meaning then we shift to infix notation. We also saw examples of sentences that are true
in one model and false in another model.
1. M1 was the model with N as the universe. R1 was assigned PLUS, the relation with PLUS(a, b, c) true when a + b = c. The sentence ∀x1∃x2 [R1(x2, x2, x1)] can be rewritten as ∀x1∃x2 [x2 + x2 = x1], and it is not true.
2. M2 was the model with R as the universe. R1 was again assigned PLUS. The sentence can be rewritten in the same way, and it is true.
Given a model M we would like to know which sentences are true and which ones are not. What we want is a decision procedure, or an algorithm, which will take a sentence and tell us if the sentence is true or not. One of the most interesting models is number theory, where the universe is N and the relations +, ×, =, ≤ have their usual meaning. This theory is undecidable. This is a major theorem of Alonzo Church, building on the work of Gödel.
Let us start with a simpler theory, the theory of (N, +). This theory is called Presburger Arithmetic.
We want to prove that this theory is decidable. So, we want to design an algorithm (or a TM) which will take as input a sentence and decide if it is true or not.
Lets start with something simpler. Let us look at the simple atomic formula x + y = z. Can we design a DFA that accepts x, y, z if and only if they satisfy this equation? Let us be more clear: how will the input be given to the DFA? Let us define ∑3 to be the alphabet whose symbols are columns of three bits, one bit each for x, y and z; the ith input symbol carries the ith bits of x, y and z. For example, if x = 5, y = 3 and z = 6, the input is the three rows 101, 011 and 110, read column by column.
Can we design a DFA where the input alphabet is ∑3 and it accepts a string such that the top row of the string, when added to the second row of the string, gives us the last row of the string?
The answer is yes. But it is easier if the machine is given the strings in reverse. Lets see how we add in binary:
  010100
+ 001111
The way we add is: we remember the carry, compute the next digit, and keep moving left. If the numbers were given to us in reverse we would remember the carry and keep moving right. So, if these numbers were given to us in reverse, we would read
  001010
+ 111100
from left to right, producing the (reversed) answer digit by digit:
after 1 column: output 1, carry c = 0
after 2 columns: output 11, carry c = 0
after 3 columns: output 110, carry c = 1
after 4 columns: output 1100, carry c = 1
after 5 columns: output 11000, carry c = 1
after 6 columns: output 110001, carry c = 0
If the answer were given to us, we could check it as we go along. Actually this DFA is very, very simple. Let us look at it. It has only two states, q0 and q1, and a trap state qr. q0 is the accept state. When the machine is in q0 the carry is 0; when it is in q1 the carry is 1. On each column the machine checks that the z-bit equals the sum (mod 2) of the x-bit, the y-bit and the current carry, and moves to the state corresponding to the new carry; if the check fails it moves to qr.
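Here is a small C++ sketch of this carry DFA (our own illustration; the function name and the convention that the three rows are padded to equal length are assumptions): the state is the current carry, and a wrong sum digit sends the machine to the trap state.

#include <cassert>
#include <string>

// Simulate the two-state carry DFA. xr, yr, zr hold the bits of x, y, z
// in reverse binary, padded to the same length; accept exactly when x + y = z.
bool addsUp(const std::string& xr, const std::string& yr, const std::string& zr) {
    int carry = 0;                                    // state q0 <-> carry 0, q1 <-> carry 1
    for (size_t i = 0; i < zr.size(); ++i) {
        int a = xr[i] - '0', b = yr[i] - '0', s = zr[i] - '0';
        if (((a + b + carry) & 1) != s) return false; // trap state qr
        carry = (a + b + carry) >> 1;                 // move to the state for the new carry
    }
    return carry == 0;                                // q0 is the accepting state
}

int main() {
    // 20 + 15 = 35: in reverse binary, 001010 + 111100 = 110001
    assert(addsUp("001010", "111100", "110001"));
    assert(!addsUp("001010", "111100", "110000"));
}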
Similarly, consider the alphabet ∑4, where each string represents four numbers x, y, z and t written in reverse binary. We can make a DFA D such that D accepts the input if and only if x + y = z + z + t. The same works for atomic formulas such as:
1. x1 + x2 + x2 + x2 = x3 + x3 + x4 + x5.
2. x1 + x2 + x3 + x5 = x4 + x4 + x4.
We can make a DFA D1 which accepts the strings over ∑5 satisfying the first equation, and another DFA D2 which accepts the strings over ∑5 satisfying the second. So, in some sense, DFAs can capture the atomic formulas of Presburger Arithmetic.
(x1 + x2 + x2 + x2 = x3 + x3 + x4 + x5) ∧ (x1 + x2 + x3 + x5 = x4 + x4 + x4)
¬(x1 + x2 + x2 + x2 = x3 + x3 + x4 + x5) ∨ (x1 + x2 + x3 + x5 = x4 + x4 + x4)
In the first case we can make a DFA that accepts all the strings accepted by both D1 and D2; in the second case, a DFA for the strings rejected by D1 together with those accepted by D2. This works because regular languages are closed under union, intersection and complementation.
Given a quantifier-free formula F (in Presburger Arithmetic) in i variables x1, . . . , xi, we define ∑i to be an alphabet with 2^i characters. Each string S over ∑i defines i numbers a1, . . . , ai. We can make a DFA D such that D accepts S if and only if a1, . . . , ai satisfy F.
We would like a DFA that accepts strings over ∑2 that represent x1, x2 and satisfy ∃x3 [x1 + x2 + x2 = x3].
It is much easier to make an NFA given the DFA D. The NFA N simply guesses the
value of x3 as it goes along moving to the right. Thus in each move it makes two non-
deterministic moves. One pretending the current bit of x3 is 0 and the other pretending
that it is 1. We can then convert this NFA N to a DFA using a standard algorithm.
We can deal with quantifiers like this one by one. Let us say we have a formula ∃xi F(x1, . . . , xi). First we make a DFA D that accepts those strings over ∑i that represent x1, . . . , xi and satisfy F. Then we make an NFA that uses D and guesses the value of each bit of xi as it goes along, and we convert this NFA to a DFA D′, which now accepts strings over ∑i−1. For a universal quantifier we use the identity ∀xi F = ¬∃xi ¬F: we use the fact that regular languages are closed under complementation, together with the technique for handling ∃.
Let us again look at these very interesting statements.
1. There are infinitely many primes.
2. There are no positive integers a, b, c and n > 2 with a^n + b^n = c^n (Fermat's Last Theorem).
3. There are infinitely many primes p such that p + 2 is also prime (the twin prime conjecture).
These statements are not statements in Presburger Arithmetic.
The first two use ×, and the second one also uses exponentiation, which are not allowed in Presburger Arithmetic. Let us now study a different model, (N, ×, +). In this model:
1. The universe is the set of natural numbers.
2. We have × and + as relations with their usual meaning.
The first and the third statements are in the language of this model. To many mathematicians this is the most important and interesting model. It is what we call number theory or Peano Arithmetic.
Is the theory of this model decidable? Note that if this were the case, we would have an algorithm which could decide if Goldbach's conjecture is true. Such an algorithm would be amazing: it could tell us the truth or falsehood of many mathematical statements. From now on we will refer to this model as Peano arithmetic or number theory.
We will prove that the theory of this model is undecidable. This is both bad and good. It shows that no algorithm can be devised that can decide the truth and falsehood of general statements in number theory.
Theorem: Let M be a TM and w a string. We can construct from M and w a formula φM,w in the language of (N, +, ×) such that φM,w is true if and only if M accepts w.
The proof of this theorem is quite long and complicated. To make the proof a bit simpler we have provided the proof of the following theorem in the handout.
Theorem: Let M be a TM and w a string. We can construct from M and w a formula φM,w in the language of (N, +, ×, ↑) (where ↑ denotes exponentiation) such that φM,w is true if and only if M accepts w.
Notice that the statement of this theorem is the same as the previous one, except that in our model now we also allow exponentiation. Even this is quite technical to prove. So, it is included as a handout for the interested students.
Theorem: TH(N, +, ×) is undecidable.
Proof: We can reduce ATM to TH(N, +, ×). Recall ATM = {<M, w> : M accepts w}. We give a mapping reduction:
1. On input <M, w>
2. Construct φM,w and output it.
Then <M, w> ∈ ATM if and only if φM,w is true.
We are now in a position to prove an extremely celebrated theorem of Kurt Gödel. Gödel was an Austrian logician, perhaps the greatest in history.
Gödel's Theorem:
He proved a theorem that shocked and delighted the mathematical world. His theorem is called the incompleteness theorem. The story goes back to Hilbert, who outlined a program in 1920.
Gödel showed that there are true statements in TH(N, +, ×) which cannot be proved, and that number theory cannot prove its own consistency. This showed that Hilbert's program was doomed. Gödel's theorem is considered to be a cornerstone in the foundations of mathematics. It is an extremely important work with far-reaching consequences.
Let us prove:
There are true statements in TH(N, +, ×) which cannot be proved. In order to do this we have to define what a proof is. But, instead of going into the exact definitions, let us be a bit informal in our treatment. We start with a set of axioms; for example, the Peano axioms for the numbers. To this set of axioms we add the induction schema, better known as the principle of mathematical induction.
Once we have defined axioms we can now define a proof. A proof of a statement S is a sequence of statements S1, . . . , Sl such that Sl = S and
1. each Si is an axiom, or
2. each Si follows from the previous Sj's using rules of logic, which are precise and well defined.
An example of a rule of logic: given S and S → T we can conclude T. Note that this rule is purely syntactic.
Theorem: If every true statement in TH(N, +, ×) were provable, then TH(N, +, ×) would be decidable.
Proof. Consider the following algorithm:
1. On input a sentence φ
2. Enumerate all possible proofs; for each one, check if it is a proof of φ or a proof of ¬φ. In the first case accept; in the second case reject.
If all true statements are provable then, since each sentence is either true or false, either φ or ¬φ is provable. Thus the above algorithm always halts and is a decider for TH(N, +, ×). This is a contradiction, as we have already proved that TH(N, +, ×) is not decidable.
The previous theorem just says that there are true statements which are not provable. Can we actually construct such a statement? This is what Gödel actually did: he constructed a statement which was true but not provable. Let us also construct such a statement, using the recursion theorem. Consider the following TM S:
1. On input w
2. Obtain self description <S>.
3. Construct the statement Ψ = ¬φS,0.
4. If Ψ is provable, accept w.
The fourth step can be done as the set of all provable statements is Turing recognizable. Let us clearly see what Ψ is saying. Recall that φS,0 says that S accepts 0. Hence Ψ = ¬φS,0 says that S does not accept 0. This statement is true if and only if S does not accept 0.
1. If S finds a proof of Ψ it will accept 0 (and all its inputs).
2. Ψ says that S does not accept 0.
Thus if S finds a proof of Ψ then Ψ is false. But a false statement cannot have a proof. Therefore Ψ is true. But that means S does not find a proof of it, and hence it is unprovable.
Gödel also proved that TH(N, +, ×) cannot prove its own consistency. These two theorems were cornerstones of logic.
Oracles:
Recall the definition of mapping reducibility. A is mapping reducible to B, written as A ≤m B, if there is a computable function f : ∑* → ∑* such that for all x, x ∈ A if and only if f(x) ∈ B.
Mapping reducibility is not the most general way to reduce. Recall that we proved that the complement of ATM is not mapping reducible to ATM. But intuitively a language and its complement are almost the same problem, so they should be reducible to each other. We will now define a new concept that will capture this intuition. For this we will introduce the notion of oracles. What are oracles? Where does the word come from?
An oracle for a language B is an external device that solves the membership problem for B. An oracle TM has a special tape on which queries are written: at any point in its computation the machine may write a string w on its oracle tape and enter a special state q?. If w ∈ B then the machine next enters the state qy, otherwise it enters the state qn. Here is a picture of an oracle TM.
Having an oracle for B is like having access to a magical power that answers questions. A Turing machine M with oracle B will be denoted by MB.
Lets look at an example of an oracle TM. We know that ATM = {<M, w> : M accepts w} is undecidable, and that ETM = {<M> : L(M) = Ф} is undecidable.
We want to construct an oracle TM, with oracle ATM, that decides ETM. So we are asking the following question: if we assume someone can solve ATM, can we use that knowledge to solve ETM?
Let M be a TM. Let us consider another TM N:
1. On input x
2. Using dovetailing, run M on all strings.
3. If M accepts any string, then accept x.
We note that if L(M) ≠ Ф then N accepts all strings. On the other hand, if L(M) = Ф then N does not halt on any string. Our oracle TM, on input <M>, constructs N and asks its oracle whether <N, 0> ∈ ATM; if the answer is yes then L(M) ≠ Ф, so it rejects, and otherwise it accepts.
Note that we do not have an algorithm for solving ETM. All we have shown is that if ATM could be solved then we could solve ETM. So in this case, we say that ETM is decidable relative to ATM.
Turing reducibility:
A language A is Turing reducible to B, written as A ≤T B, if A is decidable relative to B. More precisely, there is a Turing machine MB with oracle B such that MB halts on all inputs and L(MB) = A.
Let us ask what would happen if we could solve the halting problem. How powerful are TMs which have access to an oracle for ATM?
The smallest positive integer that requires at least twenty words to describe.
1. Since there are infinitely many integers, we cannot describe all of them in under twenty words.
2. Hence there must be many integers that require at least twenty words to describe.
3. Out of them there is one which is the smallest.
This sentence is describing a number. However, it is describing a number which requires at least twenty words to describe.
The smallest positive integer that requires at least twenty words to describe.
But this sentence itself has only twelve words! This sentence is describing, in twelve words, a number that requires at least twenty words to describe! This is a paradox.
Actually there is no paradox, as we realize once we notice that the word "describe" does not have a precise meaning in English. When we try to agree on what a description of a number is, the paradox goes away. For example, if we say that we describe numbers by writing out their digits, then this sentence is no longer a valid description of a number.
A = 10101010101010101010101010101010101010
B = 10101000101001101000100100111101001011
Intuitively A seems to have less information than B. This is because I can describe A quite easily:
1. The string A is 10 repeated 19 times.
2. The string B seems to have no such short description.
There are two ways to describe the string A: one is to write its individual bits and the other is to simply say that it is 10 repeated 19 times. We will be interested in the shortest description.
The most general way to describe a string s is to give an algorithm A that produces that string. Thus instead of describing a string s we simply describe A. Let us make this idea more precise. Formally, an algorithm is just a TM, so all we have to do is to describe a TM. Lets start carefully.
Let us first consider a TM M and a binary input w to M. Suppose we want to describe the pair (M, w). How do we perform this encoding? We can take the description of M, that is <M>, and concatenate w after it. There is a problem with this: suppose we send this description to a friend. The friend would not know where the description of M ends and where w starts.
To avoid this problem we can have a convention: all the bits in the description of M are doubled, the description is followed by 01, and then we simply write down w. With this rule there is no ambiguity in finding out where w starts.
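Here is a small C++ sketch of this convention (ours, not from the handout): every bit of <M> is doubled, 01 marks the boundary, and decoding is therefore unambiguous, assuming a well-formed input.

#include <cassert>
#include <string>
#include <utility>

// Encode the pair (<M>, w): double every bit of <M>, then the delimiter 01, then w.
std::string encodePair(const std::string& descM, const std::string& w) {
    std::string out;
    for (char b : descM) { out += b; out += b; }        // e.g. 1011 -> 11001111
    return out + "01" + w;
}

// Decode: scan doubled bits until the 01 delimiter appears (assumes well-formed input).
std::pair<std::string, std::string> decodePair(const std::string& s) {
    std::string descM;
    size_t i = 0;
    while (s[i] == s[i + 1]) { descM += s[i]; i += 2; } // still inside the doubled bits
    return { descM, s.substr(i + 2) };                  // skip the 01 delimiter
}

int main() {
    auto p = decodePair(encodePair("1011", "0110"));
    assert(p.first == "1011" && p.second == "0110");
}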
The minimal description of x, written d(x), is the shortest string <M, w> such that
1. M halts on w, and
2. running M on w produces x.
If there are several such strings we select the lexicographically first amongst them. The descriptive complexity of x, written K(x), is K(x) = |d(x)|: the length of the smallest description of x.
Lets look at a practical example: Java. You can think of <M, w> as a Java program M and its data w. Your computer downloads the whole program and the data, and when you run it on your own computer with data w it produces a lot of other data. You can download a small program that produces the digits of π: an unbounded amount of data is communicated to you in a very small program.
Let M0 be the TM that halts as soon as it starts. So, it does not change its input at all. We can always describe x by <M0>x. This description has size 2|<M0>| + 2 + |x|. What is important here is that |<M0>| is independent of the length of x. Let c = 2|<M0>| + 2; then there is a description of x of size |x| + c. Since K(x) is the minimal length description, it cannot be larger than |x| + c.
Note that we are saying that we can prepend the description of the "NULL" machine to each string. Why don't we just leave M0 out? This is because anyone reading the description is expecting to see a description of some machine first, then 01. If we agree that M0 will be denoted by the empty string then we can simply send 01x. But even in this case c = 2.
Theorem: There is a constant c such that K(xx) ≤ K(x) + c.
Let M be a TM that takes <N>w as input and doubles the output of running N on w:
1. On input <N, w>
2. Run N on w.
3. Let s be the output of N on w.
4. Write ss.
A description of xx is given by <M>d(x). Recall that d(x) is the minimal length description of x. In effect we appended the message "double your output" to the smallest description of x, obtaining a description of xx of length 2|<M>| + 2 + K(x); so we may take c = 2|<M>| + 2.
Similarly, let M now be a machine that runs two descriptions and concatenates their outputs, and consider <M>double(d(x))01d(y). When M is run on double(d(x))01d(y) it produces xy; hence K(xy) ≤ |<M>| + 2|d(x)| + |d(y)| + 2, so we may take c = |<M>| + 2. One can show
Theorem: There is a constant c such that K(xy) ≤ 2 log K(x) + K(x) + K(y) + c,
and we can even improve this by using more sophisticated coding techniques.
Optimality of Definition:
We have a very important question we can ask: what happens if we use some other way of describing algorithms? Let's say algorithms are Java programs. Wouldn't that change the descriptive complexity?
Let p : ∑* → ∑* be any computable function. Let us define dp(x) to be the lexicographically shortest string s such that p(s) = x, and Kp(x) = |dp(x)|.
For example we can let Kjava(x) be the length of the shortest Java program that outputs x. The question is: what happens if we use Kjava instead of K? The answer is that the descriptive complexity only changes by an additive constant. Thus K(x) is in some sense universal.
Theorem: Let p be a computable function p : ∑* → ∑*. Then there is a constant c such that K(x) ≤ Kp(x) + c.
The idea is that we can prepend an interpreter (for Java, say) to our descriptions, and this changes the description length only by a constant.
Consider the following TM M:
1. On input w
2. Output p(w)
Let dp(x) be the minimal description of x (with respect to p). Then consider <M>dp(x). It is clear that this is a description of x. Its length is 2|<M>| + 2 + |dp(x)|, so we may take c = 2|<M>| + 2.
Incompressible strings:
We say a string x is compressible by c if K(x) ≤ |x| − c, and incompressible if it is not compressible by 1.
Theorem: Incompressible strings of every length exist.
The proof is by counting. Let us take strings of length n. There are 2^n strings of length n. On the other hand, each description is also a string. How many descriptions are there of length ≤ n − 1? There are 2^0 + 2^1 + · · · + 2^(n−1) = 2^n − 1 descriptions of length ≤ n − 1. Hence at least one string of length n has no shorter description; that is, it is incompressible.
Theorem: At least 2^n − 2^(n−c+1) + 1 strings of length n are incompressible by c.
The proof is again by counting. There are 2^n strings of length n, while the number of descriptions of length ≤ n − c is 2^0 + 2^1 + · · · + 2^(n−c) = 2^(n−c+1) − 1.
Many properties of random strings are also true for incompressible strings. For example
1. Incompressible strings have roughly the same number of 0's and 1's.
2. The longest run of 0's is of size O(log n).
Let us prove a theorem which is central in this theory.
Theorem: A computable property that holds for "almost all strings" also holds for incompressible strings.
Let us formalize this statement. A property p is a mapping from the set of all strings to {TRUE, FALSE}. Thus p : {0, 1}* → {TRUE, FALSE}. We say p holds for almost all strings if the fraction of strings of length ≤ n on which p is false tends to 0 as n grows.
Theorem: Let p be a computable property. If p holds for almost all strings then, for any b > 0, the property p is false on only finitely many strings that are incompressible by b.
Consider the following TM M:
1. On input i (in binary)
2. Find the ith string s such that p(s) = FALSE, considering the strings lexicographically.
3. Output s.
We can use M to obtain short descriptions of strings that do not have property p. Let x be a string that does not have property p. Let ix be the index of x in the list of all strings that do not have property p, ordered lexicographically. Then <M, ix> is a description of x. The length of this description is |ix| + c, where the constant c accounts for <M> and the encoding overhead.
Now we count. For any given b > 0, select n such that at most a 1/2^(b+c+1) fraction of the strings of length ≤ n fail to have property p. Note that since p holds for almost all strings, this is true for all sufficiently large n. Then, for a string x of length n that fails p, we have |ix| ≤ n − b − c. Hence the length of <M>ix is at most n − b − c + c = n − b. Thus K(x) ≤ n − b, which means x is b-compressible. Hence, only finitely many strings that fail p can be incompressible by b.
We can ask a very interesting question now: is it possible to compress a string that is already compressed? The answer to this question seems to be no! Let us prove something that is quite close to this.
Theorem: There is a constant b such that for every string x the minimal description d(x) is incompressible by b.
Roughly, the idea of the proof is that if d(x) were compressible we could use the compressed version of d(x) to obtain a smaller description of x than d(x).
Consider the following TM M:
1. On input <R, y>
2. Run R on y and reject if its output is not of the form <S, z>.
3. Run S on z and halt with its output on the tape.
This machine is "decoding twice". Let b = 2|<M>| + 3. We now show that b satisfies the statement of the theorem. Suppose on the contrary that d(x) is b-compressible for some string x. Then |d(d(x))| ≤ |d(x)| − b. But what happens if we run M on d(d(x))? Note that it outputs x. Hence <M>d(d(x)) is a description of x of length at most 2|<M>| + 2 + |d(x)| − b = |d(x)| − 1, shorter than the minimal description of x. This is a contradiction.
Complexity
Time
Resource Bounded Computations
Now we will look at resource bounded computations. Two of the most important resources are
going to be
1. Time 2. Space
Let A = {0^k 1^k : k ≥ 0}. Consider the following TM that accepts A:
1. Scan across the tape to check that all 0's come before the 1's; if not, reject.
2. Scan across the tape, crossing off one 0 and one 1 each time.
3. If 0's remain and the 1's are finished, reject.
4. If 1's remain and the 0's are finished, reject.
5. If both the 0's and the 1's are finished, accept.
6. Go to step 2.
We want to analyze this algorithm and talk about how many steps (how much time) it takes. We saw an algorithm that accepts A in time O(n^2).
A step of the Turing machine is assumed to take unit time. Lets start with a definition. Let M be a TM that halts on all inputs. The running time or the time complexity of M is a function f : N → N such that f(n) is the maximum number of steps that M uses on any input of length n.
1. We say that M runs in time f(n).
2. We say M is an f(n)-time TM.
Big O Notation:
We want to understand how much time a TM takes in the worst case. The exact running time t(n) is going to be a very complicated expression; in some cases we will not be able to compute it. But what we are interested in is getting a rough idea of how t(n) grows for large inputs. So we will use asymptotic notation.
Lets look at an example. Let f(n) = 5n^3 + 13n^2 − 6n + 48. This function has four terms. The most important term is 5n^3, which will be the bulk if n is large. In fact, in 5n^3 even the constant 5 does not contribute to the growth of this function. Thus we will say that f(n) is of order n^3. We will denote this by f(n) = O(n^3). Let us state this precisely:
Let f, g : N → R+. We say that f(n) = O(g(n)) if there are positive integers c, n0 such that for all n ≥ n0, f(n) ≤ c·g(n). When f(n) = O(g(n)) we say that g is an asymptotic upper bound on f. Sometimes we will just say that g is an upper bound on f.
Intuitively, if f(n) = O(g(n)) you can think that f is less than or equal to some multiple of g. For most functions the highest order term is an obvious candidate to be g.
Suppose that f(n) = logb n where b is a constant. Note that logb n = log2 n / log2 b. Hence we can say f(n) = O(log n). Let f2(n) = 5n log n + 200n log log n + 17; then f2(n) = O(n log n). We can use O notation in exponents also. Thus 2^O(n) stands for a function that is upper bounded by 2^(cn) for some constant c.
We have to be careful though. For example, consider 2^O(log n). Remember that O(log n) has a constant attached to it. Thus this function is upper bounded by 2^(c log n) for some constant c. Now, we get 2^(c log n) = (2^(log n))^c = n^c. Thus 2^O(log n) is polynomial.
Bounds of the form n^c are called polynomial bounds, whereas bounds of the form 2^(n^c) are called exponential bounds.
Little Oh Notation:
The Big-Oh notation roughly says that f is upper bounded by g. Now we want to say that f grows strictly slower than g. Let f, g : N → R+. We say that f(n) = o(g(n)) if lim n→∞ f(n)/g(n) = 0. Thus the ratio of f and g becomes closer and closer to 0. For example:
1. √n = o(n)
2. n = o(n log log n)
3. n log log n = o(n log n)
4. n log n = o(n^2)
5. n^2 = o(n^3)
Lets look at a new TM that recognizes A = {0^k 1^k : k ≥ 0}:
1. Scan the tape and make sure all the 0s come before the 1s; if not, reject.
2. Repeat while both 0s and 1s remain on the tape:
(a) Scan the tape; if the total number of remaining 0s and 1s is odd, reject.
(b) Scan again, crossing off every other 0 and every other 1.
3. If all 0s and 1s are crossed off, accept.
Thus if we have 13 0s, in the next stage there will be 6. Each stage takes O(n) time and there are O(log n) stages, so this algorithm shows that there is an O(n log n)-time TM that decides A. Since n log n = o(n^2), this points to a real algorithmic improvement.
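Here is a C++ sketch of this crossing-off algorithm (our own illustration; the character 'x' plays the role of a crossed-off symbol): every round roughly halves the number of surviving symbols, which is where the log n factor comes from.

#include <algorithm>
#include <cassert>
#include <initializer_list>
#include <string>

// Decide A = { 0^k 1^k : k >= 0 } by repeated crossing off.
bool inA(std::string tape) {
    size_t firstOne = tape.find('1');
    if (tape.find('0', firstOne) != std::string::npos) return false; // a 0 after a 1
    auto count = [&](char c) { return std::count(tape.begin(), tape.end(), c); };
    while (count('0') > 0 && count('1') > 0) {
        if ((count('0') + count('1')) % 2 != 0) return false; // equal counts are even
        for (char c : {'0', '1'}) {          // cross off every other 0 and every other 1
            bool cross = true;               // start by crossing the first occurrence
            for (char& t : tape)
                if (t == c) { if (cross) t = 'x'; cross = !cross; }
        }
    }
    return count('0') == 0 && count('1') == 0;
}

int main() {
    assert(inA("000111") && inA(""));
    assert(!inA("00111") && !inA("0101"));
}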
Time Complexity Classes:
Let t : N → N be a function. Define the time complexity class TIME(t(n)) to be: TIME(t(n)) = {L : L is decided by an O(t(n))-time Turing machine}.
So for example:
1. TIME(n) is the set of all languages, L, for which there is a linear time TM that accepts L.
2. TIME(n^2) is the set of all languages, L, for which there is a quadratic time TM that accepts L.
Recall A = {0^k 1^k : k ≥ 0}. Our first algorithm (or TM) showed that A ∈ TIME(n^2). The second one showed that A ∈ TIME(n log n). Notice that TIME(n log n) ⊆ TIME(n^2). What will happen if we allow ourselves a 2-tape TM?
Can we have a faster TM? Yes, if we have two tapes we can do the following:
1. Scan across to find out if all 0s are before 1s.
2. Copy all the 0s to the second tape.
3. Scan on the first tape in forward direction and the second tape in reverse direction
crossing off 0s and 1s.
4. If all 0s have been crossed off and all ones have been crossed off accept else reject.
In fact, one can design a 2-tape TM that accepts A while scanning the input only once! This leads to an important question: can we design a 1-tape TM that accepts A in o(n log n) time, preferably linear time? The answer is NO!
In fact, one can prove that there is a language L such that:
1. L is accepted by a 2-tape TM in time O(n).
2. L cannot be accepted by any 1-tape TM in time o(n^2).
Thus the definition of TIME(t(n)) changes if we talk about 1-tape TMs or 2-tape TM. Fortunately,
it does not change by too much as the following theorem shows.
Theorem: Let t(n) be a function where t(n) ≥ n. Then every t(n)-time k-tape TM has an equivalent O(t^2(n))-time single-tape TM.
Notice the difference from computability theory: previously we showed only that every multi-tape TM has an equivalent single-tape TM, without keeping track of the time. The proof of the theorem is just a more careful study of the previous proof. Remember, given a k-tape TM M, we made a 1-tape TM N. N works as follows:
1. On input x convert the input to #q0#x#· · ·# the start configuration of M. This
configuration says that x is on the first tape. The rest of the tapes are empty and the
machine is in q0.
2. In each pass over the tape change the current configuration to the next one.
3. If an accepting configuration is reached accept
4. If a rejecting configuration is reached reject.
For example we may have δ(q5, a) = {(q7, b, R), (q4, c, L), (q3, a, R)}. Thus the machine has three possible ways to proceed, and the computation branches. A non-deterministic TM accepts x if x is accepted on at least one branch.
Let us prove a relation between non-deterministic time and deterministic time.
Theorem: Let t(n) be a function, where t(n) ≥ n. Then every t(n)-time nondeterministic single-tape TM has an equivalent 2^O(t(n))-time deterministic single-tape TM.
Thus this theorem says that we can get rid of non-determinism at the cost of exponentiating the time!
Let N be a non-deterministic TM that runs in time t(n). Let us first estimate how big the computation tree of N can be:
1. Every branch of the tree has length at most t(n).
2. Every node has at most b children, where b is the maximum number of choices the transition function allows.
3. Hence the number of nodes at depth d is at most b^d.
4. The total number of nodes can be at most 1 + b + b^2 + · · · + b^(t(n)) ≤ 2b^(t(n)).
A deterministic TM D simulates the computation tree of N. There are several ways to do this. One is given in the book; here is a slightly different one. To begin with, D is going to be a multi-tape TM. Consider a string over the alphabet R = {1, . . . , b}, say 1231. It describes a path in the tree:
1. Starting from the root, go to the first child c1 of the root.
2. Then to the second child of c1; call it c2.
3. Then to the third child of c2; call it c3.
4. Then to the first child of c3.
Thus each such string corresponds to a branch in the tree.
1. On input x of length n.
2. For all strings s ∈ R^(t(n)):
3. Simulate N making the choices dictated by s.
4. If N accepts, then accept.
This method assumes that t(n) can be computed quickly. However, we can also eliminate this requirement. For each string s the simulation takes O(t(n)) time, and there are b^(t(n)) strings to simulate. Thus the total time is O(t(n)) · b^(t(n)) = 2^O(t(n)).
Our last part is to convert this multi-tape TM into a single-tape TM. If we use the previous theorem then the total time would be (2^O(t(n)))^2 = 2^O(t(n)). We will come back to NTMs again.
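The brute-force simulation can be sketched as follows in C++ (our own illustration; the step interface, which returns the successor configurations of a configuration, is an assumption): we enumerate all b^t(n) choice strings with an odometer and replay N along each branch.

#include <functional>
#include <string>
#include <vector>

using Config = std::string;

// Deterministic simulation of a t(n)-time NTM: try every choice string of
// length timeBound over {0, ..., b-1}; accept if some branch reaches "accept".
bool simulateNTM(const Config& start,
                 const std::function<std::vector<Config>(const Config&)>& step,
                 int timeBound, int b) {
    std::vector<int> choices(timeBound, 0);            // one choice string, used as an odometer
    while (true) {
        Config c = start;
        for (int i = 0; i < timeBound && c != "accept"; ++i) {
            std::vector<Config> next = step(c);        // successor configurations
            if ((int)next.size() <= choices[i]) break; // this branch offers fewer choices
            c = next[choices[i]];
        }
        if (c == "accept") return true;                // some branch accepts
        int i = 0;                                     // advance the odometer
        while (i < timeBound && ++choices[i] == b) choices[i++] = 0;
        if (i == timeBound) return false;              // all b^timeBound branches tried
    }
}

Each branch costs O(t(n)) steps and there are b^t(n) branches, matching the 2^O(t(n)) bound in the text.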
P Classes:
We want to know which problems can be solved in practice. Thus we define the class P as follows:
P = ∪k TIME(n^k)
or, more elaborately,
P = TIME(n) ∪ TIME(n^2) ∪ TIME(n^3) ∪ TIME(n^4) ∪ · · ·
1. P is invariant for all models of computation that are polynomially equivalent to deterministic single-tape machines.
2. P roughly corresponds to the problems that we can solve realistically on these models.
Thus in defining P we can talk about single tape TMs, Java programs, multi-tape TMs. They all
give the same definition of P.
The second point is that problems in P can be solved in reasonable time. Some people may object to this, pointing out that if a problem can only be solved in time O(n^5000), then such an algorithm is of no practical value. This objection is valid up to a point. However, taking P to be the threshold between practical and impractical problems has been extremely useful.
Question: Design a polynomial time algorithm that takes as input a graph G and two vertices s
and t (suitably encoded) and decides if there is a path from s to t.
How will we encode the graph G? There are two ways to do this:
1. Adjacency Matrix
2. Adjacency List
How do we show that PATH is in P? We have to give a polynomial time algorithm for this problem. The basic idea is: start BFS or DFS from s, and if t appears then there is a path from s to t. Lets look at this algorithm in detail.
To analyze the algorithm we first have to compute the size of the input. The input size is at least m, where m is the number of nodes in G. We show that the algorithm runs in time polynomial in m. The repeat loop runs at most m times. Each time, all the edges are scanned. Since the number of edges is at most m^2, each pass takes at most m^2 time. So the total time is at most m^3. Hence we have shown that PATH is in P.
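Here is a minimal C++ sketch of this idea using breadth-first search (ours; the function name and the adjacency-list representation are assumptions): every vertex is marked at most once and every edge is scanned a bounded number of times, so the running time is polynomial in the size of the graph.

#include <queue>
#include <vector>

// Decide PATH: is there a path from s to t in the digraph given by adj?
bool hasPath(const std::vector<std::vector<int>>& adj, int s, int t) {
    std::vector<bool> marked(adj.size(), false);
    std::queue<int> frontier;
    marked[s] = true;
    frontier.push(s);
    while (!frontier.empty()) {
        int u = frontier.front(); frontier.pop();
        for (int v : adj[u])                        // scan the edges out of u
            if (!marked[v]) { marked[v] = true; frontier.push(v); }
    }
    return marked[t];                               // was t reached from s?
}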
Let us take another problem. Let us say x and y are relatively prime if the largest integer dividing them both is 1. Let us define: RELPRIME = {<x, y> : x and y are relatively prime}. A first attempt at an algorithm:
1. On input <x, y>
2. For i = 2 to min(x, y):
3. Check if i divides both x and y. If it does, reject.
4. Accept.
Let us analyze this algorithm. First we have to compute the size of the input. We write numbers in binary. Let us say that x < y are both k-bit numbers, so the size of the input is about 2k.
If x and y are relatively prime then the for loop will run all the way up to x. So, how many times will it be executed? Answer: about x times. But how big is x? Being a k-bit number, x is at least 2^(k−1).
So this algorithm may run the for loop about 2^k times, which is exponential in the size of the input. So it is not a polynomial time algorithm. Lets see this point more clearly. Let
x = 10000000000000000000000000000000000000000000000000
y = 10000000000000000000000000000000000000000000000001
In this case x = 2^50 when read in binary. The size of the input counted in bits would be about 100. So the input is not too large, but the number of iterations will be very large.
We do not have a polynomial time algorithm yet! But you know of a polynomial time algorithm that computes the GCD: it is called the Euclidean algorithm. Lets try that:
1. On input <x, y>, exchange them if necessary so that x ≥ y.
2. Compute r = x mod y.
3. If r = 0 then GCD = y.
4. Else GCD = GCD(y, r).
We have to analyze this algorithm. Each step can be performed in time polynomial in the number of bits of x and y. A minor point first: if x < y then they are exchanged, and after that we always have x ≥ y. So lets assume x ≥ y. Note that in each recursive call x and y are replaced by y and r.
Theorem: y ≤ x/2 or r ≤ x/2.
Say x has m bits and y has t bits. Since the pair (x, y) is replaced by (y, r), by the theorem the next pair needs either:
1. m − 1 bits for y and at most t bits for r: a total of at most m + t − 1 bits, or
2. t bits for y and at most m − 1 bits for r: again a total of at most m + t − 1 bits.
Hence the total number of bits in the input is reduced by at least 1 each time, so the algorithm takes at most as many recursive calls as there are bits in the input.
To prove the theorem:
1. If y ≤ x/2 then we are done.
2. What if y > x/2? In this case, what is r?
3. r = x mod y = x − y < x − x/2 = x/2. Done again!
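Here is the Euclidean algorithm as a C++ sketch (our own illustration): by the halving argument above, the total bit-length of the pair shrinks with every recursive call, so the number of calls is at most the number of bits in the input. A decider for RELPRIME then simply checks whether the GCD is 1.

#include <cassert>
#include <cstdint>

// Euclidean algorithm: each call replaces (x, y) by (y, x mod y).
uint64_t gcd(uint64_t x, uint64_t y) {
    if (x < y) { uint64_t tmp = x; x = y; y = tmp; }  // ensure x >= y
    if (y == 0) return x;
    uint64_t r = x % y;
    return r == 0 ? y : gcd(y, r);
}

// RELPRIME = { <x, y> : gcd(x, y) = 1 }.
bool relPrime(uint64_t x, uint64_t y) { return gcd(x, y) == 1; }

int main() {
    assert(gcd(48, 36) == 12);
    assert(relPrime(25, 24));
    assert(!relPrime(10, 15));
}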
Theorem: Every context free language is a member of P. We will not prove this theorem here. What the theorem says is that for every context free language L there is a polynomial time algorithm that accepts L.
The Class NP
Lecture No 23
NP is the non-deterministic analog of P. So, formally,
NP = ∪k NTIME(n^k)
or
NP = NTIME(n) ∪ NTIME(n^2) ∪ NTIME(n^3) ∪ NTIME(n^4) ∪ · · ·
Let us start by looking at the following simple (and very important) non-deterministic TM: the guessing machine G, which non-deterministically writes down a string, one symbol at a time. Consider COMPOSITES, the set of composite numbers written in binary. To show that this problem is in NP we have to show a non-deterministic machine that accepts all composite numbers given in binary. Here is the TM:
1. On input <n> let k be the number of bits in n.
2. Use G to guess a k-bit number p.
3. Use G to guess a k-bit number q.
4. If pq = n and p, q > 1 then accept. Else reject.
Suppose we give this machine 15 in binary. When we run G it will branch off and each branch will guess a 4-bit number. One of those branches will guess 3, so p = 3. Similarly, when we run G again it will branch off, each branch will guess a number, and one of them will be 5. This particular branch will accept. Thus, by definition, 15 will be accepted.
Suppose we give this machine 13 in binary. When we run G it will branch off and each branch will guess a 4-bit number, so every branch will have a p. Similarly, when we run G again it will branch off, each branch will guess a number, and every branch will have a q. No branch will accept, because the condition 13 = pq with p, q > 1 cannot be satisfied for any p, q. Thus 13 will be rejected.
A moment's thought shows that every composite will be accepted by some branch, and every prime will be rejected by all branches. Thus this NTM accepts COMPOSITES. It is easy to see that the length of any branch is polynomial in the number of bits in <n>. Thus this is a non-deterministic polynomial time algorithm that accepts composites.
Let us look at another problem in NP. Let HAMPATH = {<G, s, t> : G has a hamiltonian path from s to t}. We will show that this problem is in NP. Let us recall that, given a digraph G, a hamiltonian path in G is a path that visits each vertex exactly once. (One of the pictured graphs has a hamiltonian path from s to t; the other does not.)
Here is a NTM that accepts HAMPATH:
1. On input <G, s, t>. Let n be the number of vertices in G.
2. Guess v1, . . . , vn a sequence of n vertices of G using the guessing machine.
3. Check if all vertices are distinct.
4. Check if v1 = s.
5. Check if vn = t.
6. For i = 1, . . . , n − 1 check if (vi , vi+1) is an edge in G.
7. If all conditions are satisfied accept else reject.
A moment's thought shows that if G has a hamiltonian path from s to t then at least one branch will accept, and if G does not have a hamiltonian path from s to t then no branch will accept. Thus this NTM accepts HAMPATH. It is easy to see that the length of any branch is polynomial in n. Thus this is a non-deterministic polynomial time algorithm that accepts HAMPATH.
Verifiers:
Let A be a language. A verifier for a language A is an algorithm (Turing machine) V such that
A = {w|V accepts <w, c> for some string c}.
Thus a verifier V
1. Takes two inputs <w, c>.
2. c is called the certificate or proof.
3. It accepts <w, c> if c is a correct proof of the fact that w ∈ A.
We measure the running time of a verifier only in terms of |w|. We say that V is a polynomial time verifier if it runs in time polynomial in |w|. If A has a polynomial time verifier then we say that it is polynomial time verifiable.
Let us define two classes and then show that they coincide.
NP1 is the class of languages that have polynomial time verifiers.
NP2 is the class of languages that are decided by polynomial time non-deterministic Turing machines.
Let us say L ∈ NP1; we will show that it is also in NP2.
Let L be a language in NP1 and V be a (polynomial time) verifier for L. Let us say V runs in time dn^k. Consider the following non-deterministic TM M:
1. On input w of length n.
2. Non-deterministically guess a certificate c of length at most dn^k.
3. Run V on <w, c>; if V accepts, accept.
This is a non-deterministic TM. It runs in time O(dn^k). Now, if w ∈ L then there is at least one certificate c such that V accepts <w, c>. Thus if w ∈ L at least one branch of M will accept w. If w ∉ L then no branch of M will accept w. This shows that L is in NP2.
Now, we will show that if L is in NP2 then L is in NP1. Let L be in NP2; then there is a non-deterministic TM M that accepts L. M runs in time dn^k. Let w ∈ L. What is the way to prove that w ∈ L? We can try to prove that M accepts w. But how?
Note that an NTM can make a choice at every step. There will be at most b possibilities for each choice. Suppose we take a string of the form 1311 · · ·. This string tells us to make choice number 1 in the first step, choice number 3 in the second step, and so on. The verifier V, on input <w, c>, simulates M on w making the choices specified by c, and accepts if this branch of M accepts.
This is a deterministic verifier for L and it runs in polynomial time. Now, we can remove the subscripts that we had put on these classes, since we have shown they are the same.
Let us define
CLIQUE = {<G, k> : G has a clique of size k}.
(Figure: a graph with the clique vertices shown in bold.)
Is CLIQUE in NP? You should think of making a verifier. What would be a good proof that a graph has a clique of size k?
Let us define
NTIME(t(n)) = {L : L is decided by an O(t(n))-time NTM N}.
NP is the non-deterministic analog of P. So, formally
NP = ∪_k NTIME(n^k).
Intuitively, the class NP consists of all the problems that we can solve in polynomial time if we had the power of guessing.
It is important to note that the time taken by V is measured in the length of w only. We say that V is a polynomial time verifier if on input <w, c> it takes time O(|w|^k) for some k. Here c is called the certificate or the proof of w's membership in A.
COMPOSITES Example:
Let us try to understand verifiers more closely by a simple example. Suppose I wish to become a verifier who verifies if a given number is composite. So, I am a verifier for COMPOSITES.
I can declare that I will accept only proofs or certificates of the form (p, q) where p × q = n and p, q > 1.
For example:
1. If the input is (45, (5, 9)) I will accept as 5 × 9 = 45 and 5, 9 > 1.
2. If the input is (45, (7, 9)) I will reject as 7 × 9 ≠ 45.
3. If the input is (45, (1, 45)) I will reject as 1 is not greater than 1.
Notice that in the above scheme if n is a composite number then there is always at least one
way to write it as a product of two numbers that are greater than 1. Thus, for every composite
number there is a proof that the number is composite.
On the other hand if n is a prime, there is no proof that n is composite. So, I will never accept any input of the form (n, (p, q)) where n is a prime.
Thus the language verified is exactly composites. Note that the verification algorithm runs in
polynomial time in the size of n.
Here is the detailed version:
1. On input <n, (p, q)>.
2. If p ≤ 1 or q ≤ 1, reject.
3. Compute p × q.
4. If p × q = n accept; else reject.
Note that if someone has found a factorization of a number n then it is easy for them to convince anyone that the number is composite. Thus the factorization of n is an easily verifiable proof that the number is composite.
Hamiltonian Path Example:
Let's look at another problem. Given a directed graph G, a Hamiltonian path from s to t is a path such that:
1. It starts at s.
2. It ends at t.
3. It visits all the vertices of the graph exactly once.
Let us consider the following question:
HAMPATH = {<G, s, t> : G has a hamiltonian path from s to t}. Let us devise a polynomial time verification algorithm for HAMPATH. Consider the problem. We have to verify that a graph has a hamiltonian path from s to t. What would be the simplest proof that a graph is hamiltonian? What would convince you that a given graph has a hamiltonian path? Think for a moment.
Well, if someone claims that a given graph has a hamiltonian path from s to t, I will ask them to show me the path. A hamiltonian path from s to t is a clear proof or certificate that the graph has a hamiltonian path from s to t.
Now, a verification algorithm is easy to design. The algorithm will accept inputs of the form <G, s, t, (v0, . . . , vn)>. It simply checks if v0, . . . , vn is indeed a hamiltonian path from s to t.
A language has a polynomial time verification algorithm if there is an easily (polynomial time)
verifiable proof of the membership of its elements.
So, (2377, 1735) is a proof that 4124095 is a composite. However, it takes a while to find this proof. You can verify that the proof is correct by multiplying these two numbers in about two minutes.
Try to find a proof that 13862249 is a composite to appreciate that proofs may be hard to find.
Similarly, it may be hard to find a hamiltonian path (from s to t) in G. But once found it can be easily verified that it is indeed a hamiltonian path in G.
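As a sketch, here is this verifier in Python (our own encoding: the graph is given as a set of directed edge pairs):

def verify_hampath(edges, s, t, path):
    """Accept <G, s, t, path> iff path is a hamiltonian path from s to t."""
    vertices = {u for e in edges for u in e}
    if len(path) != len(vertices) or len(set(path)) != len(path):
        return False                   # must visit each vertex exactly once
    if path[0] != s or path[-1] != t:
        return False                   # must start at s and end at t
    return all((path[i], path[i + 1]) in edges for i in range(len(path) - 1))

edges = {('s', 'a'), ('a', 'b'), ('b', 't')}
print(verify_hampath(edges, 's', 't', ['s', 'a', 'b', 't']))  # True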
Recall that a clique in a graph is a set, C, of nodes that are all connected to each other. Formally, C is a clique in G if for every x, y ∈ C with x ≠ y we have {x, y} ∈ E. The clique number of a graph G is the size of the largest clique in G.
For example, for the graph G0 shown in the figure (whose clique number is 6):
<G0, 6> is in CLIQUE.
<G0, 7> is not in CLIQUE.
Let us devise a polynomial time verification algorithm for CLIQUE. Consider the problem. We have to verify that a graph has clique number ≥ k. What would be the simplest proof? What would convince you that a given graph has clique number ≥ k? Think for a moment.
Well, if someone claims that a given graph has clique number ≥ k, I will ask them to show me a clique of size k. A clique of size k itself is a clear proof or certificate that a graph has clique number ≥ k.
Now, a verification algorithm is easy to design. The algorithm will accept inputs of the form <(G, k), (v0, . . . , vk)>. It simply checks if v0, . . . , vk is indeed a clique in G.
SUBSETSUM Example:
Let us consider the set of numbers {1, 3, 6}. This set has 2³ = 8 subsets. The elements of the subsets sum to different numbers.
In the subset sum problem, we are given a set S of numbers and a target value t. The question is whether there is T ⊆ S such that Σ_{a∈T} a = t. That is, we are asked: is there a subset of S whose elements sum to t? Formally,
SUBSETSUM = {<S, t> : there is a subset of S that sums to t}.
Given a set S of n elements, it has potentially 2^n subsets. Thus it does not seem to be easy to tell if a given number will appear as a subset sum.
Let us devise a polynomial time verification algorithm for SUBSETSUM. Consider the problem. We have to verify that S has a subset T that adds up to t.
What would be the simplest proof of this? What would convince you that there is such a subset? Think for a moment.
The subset itself will be a proof! Now, a verification algorithm is easy to design. The algorithm will accept inputs of the form <S, t, T>. It simply checks that T is a subset of S and adds up its elements:
1. On input <S, t, T>.
2. Check if T is a subset of S; if not, reject.
3. Add all the elements of T. If they sum to t accept; else reject.
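In Python the whole verifier is a couple of lines (a sketch; S and T are treated as multisets, represented as lists):

from collections import Counter

def verify_subsetsum(S, t, T):
    """Accept <S, t, T> iff T is a sub-multiset of S whose elements sum to t."""
    if Counter(T) - Counter(S):   # non-empty: T uses an element S does not have
        return False
    return sum(T) == t

print(verify_subsetsum([1, 3, 6], 9, [3, 6]))   # True
print(verify_subsetsum([1, 3, 6], 5, [5]))      # False: 5 is not in S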
Let us come back to the class NP, which was defined using non-deterministic Turing machines: machines that have the "magical" power of guessing.
Now, we are talking about "verification" where the proof is provided to us for free. These two notions are very close. In fact, a non-deterministic machine can guess a "proof".
Theorem:
If A is accepted by a polynomial time verifier then A is also accepted by a polynomial time non-
deterministic Turing machine.
To prove this theorem let us take a language A that is verified by a polynomial time verifier V. V runs in time n^k for some k. Let us make a non-deterministic TM for it. Consider the following non-deterministic TM N:
1. On input w of length n.
2. Non-deterministically guess a string c of length at most n^k.
3. Run V on <w, c>; if V accepts, accept.
Clearly if w ∈ A then there is a c such that V accepts <w, c>, so at least one branch of N will accept. On the other hand if w ∉ A then no branch of N will accept. Clearly, this Turing machine runs in non-deterministic polytime. Hence, A ∈ NP.
Another example:
As an example we can look at the set {3, 5, 7}. It has the following subsets and subset sums:
∅ : 0, {3} : 3, {5} : 5, {7} : 7, {3, 5} : 8, {3, 7} : 10, {5, 7} : 12, {3, 5, 7} : 15.
Formally the subset sum problem is the following:
SUBSETSUM = {<S, t> : there is a subset of S that sums to t}.
We note that the simplest proof that S has a subset that sums to t is the subset itself. Thus the
following is a polynomial time verification algorithm:
1. On input <S, t, T>.
2. Check if T is a subset of S.
3. If all the elements of T sum to t accept.
4. Else reject.
Once again notice that we are only saying that there is an easily verifiable certificate that an instance <S, t> is in SUBSETSUM. We are not saying anything about how hard it is to find such a proof.
Alternate Characterization of NP
Theorem:
A language, A, is in NP if and only if there exists a polynomial time verification algorithm for A.
Thus we get another characterization of NP. NP is the class of easily verifiable problems. Let us
start by showing:
Theorem:
If A has a polynomial time verification algorithm then A is in NP.
Proof. Since A has a polynomial time verification algorithm, let us call that algorithm V. We want to show that A is accepted by some non-deterministic Turing machine, N, that runs in non-deterministic polynomial time. N, on input w, non-deterministically guesses a certificate c and runs V on <w, c>, accepting if V accepts. If w ∈ A then some guess makes V accept, so some branch of N accepts; if w ∉ A then no guess does.
Hence L(N) = A. It is easy to see that N runs in polynomial time since V runs in polynomial time.
The whole proof can be summarized in one line: guess a certificate and verify it.
Theorem:
If A is in NP then it has a polynomial time verification algorithm.
Proof. Since A is in NP it is accepted by some non-deterministic Turing machine, N, that runs in non-deterministic polynomial time. We want to come up with a verification algorithm for A. We do not know much about A. All we know is that it is accepted by some non-deterministic TM.
What would be a good proof that w ∈ A? The only way to convince someone that w ∈ A is to convince them that N accepts w. But N is a non-deterministic TM. How can we convince them in polynomial time that N accepts w?
The main idea is that if we give the branch that accepts w then that is an easily verifiable proof that w is accepted by N, and therefore it is a proof that w ∈ A.
Note that if w ∈ A:
1. Then N accepts w. Therefore, there is at least one branch of N that accepts w.
2. Hence, we can give the specification of this particular branch to V as c. Then V will simulate N along this branch only and will accept.
If w ∉ A:
1. Then N does not accept w. Therefore, all branches of N reject w.
2. Hence, the specification of any branch will make V reject.
The P versus NP question asks if all easily verifiable languages can be solved in polynomial time.
The Clay Mathematics Institute has offered a prize of $1,000,000 for a solution to this problem. This is considered to be one of the most important open questions in computer science and logic.
Polynomial Time Reducibility: A is polynomial time mapping reducible to B if there is a function f : Σ* → Σ* such that
1. x ∈ A if and only if f(x) ∈ B.
2. f can be computed by a Turing machine that runs in polynomial time.
This notion of reducibility is also measuring time. Now, we are saying that we should be able to compute the reduction quickly. If A is polynomial time mapping reducible to B we write A ≤p B.
Theorem:
If B ∈ P and A ≤p B then A ∈ P.
Let us carefully prove this theorem. Since A ≤p B there is a function f : Σ* → Σ* such that
1. x ∈ A if and only if f(x) ∈ B.
2. f can be computed by a Turing machine, F, that runs in polynomial time. Let the running time of F be O(n^k).
Also, since B ∈ P, there is a deterministic TM M that decides B in time O(n^l). Consider the following TM:
1. On input x
2. Compute y = f (x) using F.
3. Run M on y. If M accepts, accept; if M rejects, reject.
Clearly, this TM accepts A. Let us find out how much time it takes. Computing y = f(x) takes time O(n^k), and hence |y| ≤ O(n^k). Running M on y takes time O(|y|^l).
Total time is O(n^k) + O(|y|^l) ≤ O(n^k) + O((n^k)^l) = O(n^(kl)).
Thus this algorithm runs in polynomial time and hence A ∈ P.
NP completeness
Definition:
A language A is called NP-complete if
1. A is NP-hard; that is, every language in NP is polynomial time reducible to A.
2. A is in NP.
It is not at all clear that there are NP-complete languages. In fact, if they exist it would be an
amazing fact. These languages will have “all the complexity” of the class NP. They will be
extremely important because of the following theorem.
Theorem:
If A is NP-complete and A is in P then P = NP.
So in order to attack the P versus NP problem we have two approaches. Suppose we find an NP-complete problem A. A is a hardest problem in NP. So, either we can
1. Try to prove that A is not in P. In fact, NP-complete problems are the best candidates for this. Or,
2. Try to find a polynomial time algorithm for A. If we succeed then we will show that P = NP.
No one has been able to do either of the above!
Satisfiability
A boolean formula is built from variables using ∧ (and), ∨ (or) and ¬ (not). For example
φ = (x1 ∨ x3) ∧ (x4 ∨ (x2 ∧ x1)).
Suppose the assignment is τ(x1) = τ(x2) = τ(x4) = 0 and τ(x3) = 1; then the formula becomes
(0 ∨ 1) ∧ (0 ∨ (0 ∧ 0))
which evaluates to 0. So τ does not satisfy φ.
Note that if there is a formula with n variables there are potentially 2^n assignments. A formula is satisfiable if there is at least one assignment that makes it 1 (or true).
Theorem: SAT is in NP.
We can give an easy verification algorithm. The algorithm takes a formula φ and an assignment τ and checks if τ satisfies φ.
1. On input <φ, τ>.
2. Substitute the values of the variables using τ.
3. Evaluate the resulting formula with constants.
4. If the formula evaluates to 1 accept; else reject.
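A sketch of this verifier in Python, representing a formula as a nested structure of ('and', ...), ('or', ...), ('not', ...) and variable names (our own encoding, chosen just for this example):

def evaluate(phi, tau):
    """Evaluate the boolean formula phi under the assignment tau."""
    if isinstance(phi, str):           # a variable: look up its value
        return tau[phi]
    op, *args = phi
    if op == 'not':
        return not evaluate(args[0], tau)
    vals = [evaluate(a, tau) for a in args]
    return all(vals) if op == 'and' else any(vals)

phi = ('and', ('or', 'x1', 'x3'), ('or', 'x4', ('and', 'x2', 'x1')))
tau = {'x1': 0, 'x2': 0, 'x3': 1, 'x4': 0}
print(evaluate(phi, tau))   # False: this assignment does not satisfy phi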
In order to understand the Cook-Levin theorem we have to understand what can be expressed in a boolean formula. We will do this by first reducing one problem in NP to Satisfiability. This will give us some feel for satisfiability, and then we can go on to the proof of the Cook-Levin theorem.
3-Color:
A graph is three colorable if we can color each vertex red, blue or green so that no edge has both of its endpoints the same color. (Figure: a three colorable graph.)
What we really want to do is to come up with a Translation Algorithm that translates the problem of 3-colorability to Satisfiability. In fact, given a graph G we want to be able to come up with a formula φG that is true if and only if the graph is 3-colorable. For that let us see what the formula φG will have to "say":
1. It is possible to assign each vertex exactly one color from red, blue and green.
2. Furthermore, if two vertices are connected then they get different colors.
Let us start with the formula (x1 ∨ x2). This formula is true if and only if x1 or x2 is true. Now, let us try to come up with a formula that is true if exactly one of the variables x1 and x2 is set to true.
A little thought shows that the formula is (x1 ∨ x2) ∧ (¬x1 ∨ ¬x2).
Now, let us go a bit further. Let us come up with a formula which is true if and only if exactly one of the three variables x1, x2, x3 is true. A little thought shows that that formula is
(x1 ∨ x2 ∨ x3) ∧ (¬x1 ∨ ¬x2) ∧ (¬x1 ∨ ¬x3) ∧ (¬x2 ∨ ¬x3)
Now, suppose we have a graph G = (V,E).
As an example let us take the following
graph
V = {a, b, c, d, e}.
E = {{a, b}, {a, c}, {a, d}, {b, d}, {d, e}}
Here is a picture of the graph.
Suppose I have three variables for the first vertex a. Let us say they are Ra,Ba,Ga.
They have the following interpretation:
1. Ra = true means a is colored red.
2. Ba = true means a is colored blue.
3. Ga = true means a is colored green.
How can we come up with a formula that is true if and only if the vertex a is assigned exactly one color from these three colors? Easy; remember
(x1 ∨ x2 ∨ x3) ∧ (¬x1 ∨ ¬x2) ∧ (¬x1 ∨ ¬x3) ∧ (¬x2 ∨ ¬x3)
So the formula is
(Ra ∨ Ba ∨ Ga) ∧ (¬Ra ∨ ¬Ba) ∧ (¬Ra ∨ ¬Ga) ∧ (¬Ba ∨ ¬Ga)
Similarly
(Rb ∨ Bb ∨ Gb) ∧ (¬Rb ∨ ¬Bb) ∧ (¬Rb ∨ ¬Gb) ∧ (¬Bb ∨ ¬Gb)
is true if and only if b is assigned exactly one color. So it is easy to find a formula that is true if and only if a given vertex is assigned exactly one of the three colors.
Now, let us consider an edge e = {a, b}. How can we find a formula that is true if a and b are given different colors? Well, that is easy:
(¬Ra ∨ ¬Rb) ∧ (¬Ba ∨ ¬Bb) ∧ (¬Ga ∨ ¬Gb)
To summarize:
1. Given a vertex v we can make three variables Rv, Bv, Gv and a formula that says v is assigned exactly one of the three colors, red, blue or green.
2. Given an edge {u, w} we can come up with a formula that is true (or "says") that u and w must have different colors.
φG = ∧_{v∈V} [(Rv ∨ Bv ∨ Gv) ∧ (¬Rv ∨ ¬Bv) ∧ (¬Rv ∨ ¬Gv) ∧ (¬Bv ∨ ¬Gv)]
∧ ∧_{{a,b}∈E} [(¬Ra ∨ ¬Rb) ∧ (¬Ba ∨ ¬Bb) ∧ (¬Ga ∨ ¬Gb)]
This formula "says" that G is three colorable. It is satisfiable if and only if we can find a 3-coloring of G.
Consider the algorithm:
1. On input G.
2. Compute φG and output it.
This is a polynomial time reduction from 3-colorability to SAT. Note that we have not:
1. Said anything about how hard it is to solve SAT.
2. Said anything about how hard it is to solve 3Color.
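To make the translation algorithm concrete, here is a minimal Python sketch (ours; a clause is a list of literal strings, and a leading '-' marks negation — an encoding chosen just for this illustration) that emits the clauses of φG:

def three_color_to_sat(V, E):
    """Emit CNF clauses that are satisfiable iff the graph is 3-colorable."""
    clauses = []
    for v in V:
        R, B, G = f'R{v}', f'B{v}', f'G{v}'
        clauses.append([R, B, G])                 # v gets at least one color
        clauses += [[f'-{R}', f'-{B}'],           # ... and at most one
                    [f'-{R}', f'-{G}'],
                    [f'-{B}', f'-{G}']]
    for a, b in E:
        for c in 'RBG':                           # endpoints differ in color
            clauses.append([f'-{c}{a}', f'-{c}{b}'])
    return clauses

V = ['a', 'b', 'c', 'd', 'e']
E = [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'd'), ('d', 'e')]
print(len(three_color_to_sat(V, E)))   # 4 clauses per vertex + 3 per edge = 35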
Theorem (Cook-Levin):
For every language A ∈ NP,
A ≤p SAT.
Thus the Cook-Levin theorem is much more general. It is talking about an infinite class of
problems that is reducible to SAT.
Our goal now is to prove the Cook-Levin Theorem in its complete general form. What do we have to show? We want to show that if L ∈ NP then L ≤p SAT.
What do we know about L? The only thing we know about L is that it is accepted by a non-
deterministic Turing machine M in non-deterministic polynomial time.
Let us record these facts. Let M be a non-deterministic Turing machine that accepts L in time n^k.
We want to come up with a reduction. This reduction will take an instance x and convert it into a formula φx such that x ∈ L if and only if φx is satisfiable.
Note that if x ∈ L then at least one computation branch of M accepts x. Let us write this computation in a table. The first row of this table consists of the initial configuration. The second row of this table consists of the second configuration, and so on. What does this table look like?
This tableau will be an n^k × n^k tableau. The tableau is a proof of the fact that M accepts x. Thus it is a proof of the fact that x ∈ L.
Let us look at a schematic diagram of this tableau. If anyone has this tableau they can confirm that x ∈ L by checking the following:
1. Check if the first configuration is an initial configuration.
2. Check if the last configuration is an accepting configuration.
3. Check that the i-th configuration yields the (i + 1)-st configuration.
Thus x ∈ L if and only if we can make a tableau with the above properties. Let us think a bit about the tableau. Each cell in the tableau has a symbol from C = Q ∪ Γ ∪ {#}.
Can we make a formula which expresses the fact that each cell has exactly one symbol from C in it? The tableau is made out of cells; let cell[i, j] denote the (i, j)-th cell. We will make variables xi,j,s for each cell, where s ∈ C. If cell[i, j] = s then we set the variable xi,j,s = true. For example, if the (i, j)-th cell has an a then xi,j,a = true. Consider the formula
(∨_{s∈C} xi,j,s) ∧ (∧_{s,t∈C, s≠t} (¬xi,j,s ∨ ¬xi,j,t))
The above formula says that the i, j-th cell has exactly one symbol from C in it. Let us review
why this is the case.
Let us now look at the following formula:
φcell = ∧_{1 ≤ i,j ≤ n^k} [ (∨_{s∈C} xi,j,s) ∧ (∧_{s,t∈C, s≠t} (¬xi,j,s ∨ ¬xi,j,t)) ]
Let x = x1x2 . . . xn be an input. How can we say that the first row corresponds to the initial configuration? This is not that difficult. Let us consider the formula
φstart = x1,1,# ∧ x1,2,q0 ∧ x1,3,x1 ∧ x1,4,x2 ∧ · · · ∧ x1,n+2,xn ∧ x1,n+3,⊔ ∧ x1,n+4,⊔ ∧ · · · ∧ x1,n^k−1,⊔ ∧ x1,n^k,#
(here ⊔ denotes the blank symbol). This formula says that the first row of the tableau is the initial configuration of M on the input x = x1 · · · xn.
Next, let φaccept = ∨_{1 ≤ i,j ≤ n^k} xi,j,qaccept. This formula says that the accepting state appears somewhere in the tableau.
Lastly, we want a formula φmove. This formula will say that each row comes from the previous row of the tableau by a legal move of the Turing machine M.
This is the most important part of the construction and the proof. How do we say that a move is legal? For this we have to go back to the tableau and come up with the concept of a legal window.
Consider a window of size 2 × 3 in the tableau. A window is called legal if it does not violate the actions specified by M's transition function.
Let us look at some examples of legal windows (each window is written as its top row over its bottom row). Suppose we have the following moves in the machine M:

a q1 b
q2 a c
This is legal because the machine in q1, reading b, can write a c and move left.

a q1 b
a a q2
This is legal because in q1 the machine reading a b can replace it with an a and move right.

a a q1
a a b
The only information we have is that the machine is in q1 and reading something. So, it can move right.

a b a
a b a
The head may be too far away to make a change in this part of the tape.

a b a
a b q2
The head may have moved to the left, entering the window from the right.

b b b
c b b
The head may be on the left and may have changed the b to a c and moved to the left.
Now let us look at some illegal windows to get some more intuition.
In the end we will combine this formula with φcell, φstart and φaccept to get a formula.
What do we want:
We want to show that there is a function f that we can compute in polynomial time such that x ∈ L if and only if f(x) ∈ SAT. We do this by creating a formula φx such that φx is satisfiable if and only if M accepts x. The formula φx must express that an accepting tableau for M on x exists, i.e. that:
1. Each cell of the tableau has exactly one symbol in it. We used the formula φcell to express this.
2. The first row is the initial configuration of M on x. We used the formula φstart to express this.
3. The tableau corresponds to an accepting computation. We used the formula φaccept to express this.
4. Each configuration row is obtained from the previous row through the yields relation. We will use the formula φmove.
Previously we looked closely at φcell, φstart and φaccept. Now we want to concentrate on φmove.
Next was the concept of legal windows. We said a 2 × 3 window is legal if it can appear in a possible computation of the machine M. As an example, let us take a machine with
δ(q0, a) = {(q2, b, R)} and
δ(q0, b) = {(q1, a, R), (q2, c, L)}.
Let us look at some legal windows now (top row over bottom row):

# a b      b a b      q0 a b      q0 b b      a q0 b
# a b      b a b      b q2 b      a q1 b      q2 a c

The first two are legal because the head is elsewhere; the third applies δ(q0, a) = (q2, b, R); the fourth applies δ(q0, b) = (q1, a, R); the fifth applies δ(q0, b) = (q2, c, L).
By observing the transition function of M we can compute the set of all legal windows.
This is not very hard to see. In your homework, I have given you an NTM and asked you to compute all the possible legal windows. Let us nevertheless outline a simple (but time consuming) method:
1. Note that these are windows of size 2 × 3. Thus they have 6 entries. Each entry can take, let us say, r possible values.
2. Thus the total number of possible windows is r^6. Hence there are only a finite number of them.
Thus we can go through every possible window and figure out if it is legal or not.
1. For each possible window W
2. If W is legal, output W.
The above algorithm will compute all legal windows, provided we can test whether a window is legal. That is not difficult to do!
Let the output of this algorithm be W = {W1, W2, . . . , Wt}, the set of all legal windows.
Note that this has to be done once and for all. It need not be recomputed for each x. (Although it is not hard to compute, we do not need to do that!)
Note that the legality of a window does not depend on the input to the machine M. It only depends on the transition table of M. Thus the set of all legal windows can be computed once and for all.
Lastly, note that if all windows in a tableau are legal then it corresponds to a computation of M. In fact, the definition of a legal window is designed to enforce this. This is because a 2 × 3 legal window ensures that the middle element in the next configuration comes from a legal move of the NTM M.
Now we are in a position to start constructing φmove. Let us go one step at a time. Let us take all the legal windows W = {W1, W2, . . . , Wt}. What we want to say is that the first window is legal. We can say that by saying:
1. The first window is W1, or
2. The first window is W2, or
3. The first window is W3, or
and so on.
Suppose, for example, that W1 is the window
# q0 a
# b q2
Then we can simply write: x1,1,# ∧ x1,2,q0 ∧ x1,3,a ∧ x2,1,# ∧ x2,2,b ∧ x2,3,q2.
Similarly, if I want to say that the (i, j)-th window is equal to W1, I can write
xi,j,# ∧ xi,j+1,q0 ∧ xi,j+2,a ∧ xi+1,j,# ∧ xi+1,j+1,b ∧ xi+1,j+2,q2.
Let us denote by µi,j,l the formula that says that the (i, j)-th window is equal to Wl. Then to say that the (i, j)-th window is legal all I have to do is to say
∨_{l=1,...,t} µi,j,l
This says that the (i, j)-th window is either W1 or W2 or ... and so on, which is equivalent to saying that the (i, j)-th window is legal.
So what would the following formula express:
∧_{i,j} ∨_{l=1,...,t} µi,j,l
This says that all windows are legal. This is our φmove.
Now, let us look at our grand boolean formula that we have created:
φx = φcell ∧ φstart ∧ φaccept ∧ φmove
This formula is satisfiable if and only if we can satisfy all four components simultaneously. Let us look at them one by one:
1. φcell is satisfied if and only if we have an assignment of the variables that corresponds to putting one symbol in each entry of the tableau. Thus the tableau is properly filled.
2. φstart is satisfied if and only if the variable assignment corresponds to filling the first row with the initial configuration of M on x.
3. φaccept is satisfied if the variable assignment corresponds to having the accept state somewhere in the tableau. That means the tableau encodes an accepting computation.
4. Lastly, and most importantly, φmove is satisfied if the assignment corresponds to filling rows in such a way that they follow from legal moves of the machine.
In summary, φx is satisfiable if and only if there exists a tableau with these properties.
To finish the proof of the Cook-Levin Theorem let us consider the following algorithm:
1. On input x.
2. Compute φx and output it.
This algorithm computes a function in polynomial time. Our discussion shows that this is a polynomial time reducibility from L to SAT. This shows that L ≤p SAT.
Recently we finished the proof of the celebrated Cook-Levin theorem which states that:
Theorem:
SAT is NP-complete.
This means
1. SAT is in NP.
2. Every language (problem) in NP is polynomial time reducible to SAT.
Question:
Is SAT the only NP-complete problem?
Let us think about this question for a little
while.
Now suppose that we have a language L such that L is in NP. We suspect that L is NP-complete. If that is the case then, in particular, SAT ≤p L. To establish NP-completeness we can:
1. Give a polynomial time verification algorithm for L, thereby showing that L is in NP.
2. Show that SAT ≤p L, thereby showing indirectly that every language L′ ∈ NP is polynomial time reducible to L.
Note that showing SAT ≤p L is much easier than showing that all languages L′ ∈ NP reduce to L in polynomial time.
Thus we will use the Cook-Levin theorem to establish the NP-completeness of L. For this plan to work we need to recall the following theorem:
Theorem:
If A ≤p B and B ≤p C then A ≤p C; that is, ≤p is a transitive relation.
The proof of this fact is so simple that we can revise it in a few minutes.
Suppose A ≤p B; then there is a function f : Σ* → Σ* such that
1. x ∈ A if and only if f(x) ∈ B.
2. f can be computed in time O(n^k) for some fixed k.
Similarly, since B ≤p C, there is a function g : Σ* → Σ* computable in time O(n^l) such that x ∈ B if and only if g(x) ∈ C. Consider the algorithm that, on input x, computes y = f(x) and then z = g(y).
This algorithm computes the function h(x) = g(f(x)) and it is clear that
1. x ∈ A if and only if f(x) ∈ B if and only if g(f(x)) ∈ C. Therefore, x ∈ A if and only if h(x) ∈ C.
2. Can we compute h in polynomial time?
How much time does it take to compute h? Let |x| = n. Then it takes O(n^k) time to compute y, and O(|y|^l) time to compute z. Thus the total time is O(n^k) + O(|y|^l).
Now, we realize |y| ≤ O(n^k); therefore, the total time is O(n^k) + O((n^k)^l) = O(n^(kl)).
This theorem gives us the following plan to prove NP-completeness. Suppose we have a
problem K that is known to be NP-complete.
Take SAT for example. Now, we have another problem W that we want to prove is NP-complete. We show that W is in NP, and we show that K ≤p W.
Note that we must show that K ≤p W and not the other way around. This will be our basic recipe for showing that other problems are NP-complete. Let us start with one example.
Let us start by defining a problem that we will prove is NP-complete. This problem is going to be
very closely related to SAT. We will call it 3SAT. Let us say we have boolean variables x1, . . . , xn. We call a variable or its negation a literal. Now suppose we have a number of literals that are being or-ed together; we will call that a clause. Here are examples of clauses:
(x1 ∨ x5 ∨ x7), (x1 ∨ x4 ∨ x6 ∨ x8), (x3 ∨ x5 ∨ x11)
A formula in conjunctive normal form (CNF) is just an and of clauses. So it looks like this:
(x1 ∨ x5 ∨ x7) ∧ (x1 ∨ x4 ∨ x6 ∨ x8) ∧ (x3 ∨ x5 ∨ x11)
A 3CNF formula is a CNF formula in which every clause has exactly three literals, and 3SAT = {<φ> : φ is a satisfiable 3CNF formula}.
Let φ be a formula. We will show how to construct a formula ψ such that:
1. ψ is in 3CNF.
2. φ is satisfiable if and only if ψ is satisfiable.
3. ψ can be computed from φ in polynomial time.
Let us illustrate the proof of this theorem by an example. I will leave the details to you. Let us say we have a boolean formula φ built from the variables x1, . . . , x7 using ∧, ∨ and ¬.
We will show how to make a formula that is satisfiable if and only if φ is satisfiable. First we can make a circuit corresponding to this formula. (Figure: the circuit for φ.)
3SAT is NP-complete
Now, we can introduce a variable for each wire of the circuit. So, we first label each wire with y's, as shown in the figure.
Now, let us consider a part of the circuit as shown below: Here we have two wires y1 and y2
feeding into an or gate and the output wire is y5.
ee
these are two clauses with two literals each.
So for each wire we can write a formula which is the and over all the wires. Finally we add the
clause yn where yn is the output wire.
( )
So for the example we find that the formula looks like y5 y5 y1 y2 .....
To satisfy this formula the output wire must be set to true. To satisfy the formula for the output wire, the wires feeding it must be set appropriately, and so on. Thus to make ψ true we must make an assignment of the x variables that makes φ true, and conversely.
It is your homework to show that you can now convert ψ into a formula in which each clause has exactly three literals. It is easy to see that this transformation can be done in polynomial time.
Theorem:
3SAT is NP-complete.
This is our second problem that we have shown to be NP-complete. Now, let us show that a
problem from graph theory is NP-complete.
Definition:
Let G = (V, E) be a graph. I ⊆ V is called an independent set if the vertices in I have no edges between them.
Independent Sets:
(Figure: a graph on 10 vertices in which a, c, d, g is an independent set, and a graph on 12 vertices in which b, c, f, j, k is an independent set.)
Let us define the following language:
IS = {<G, k> : G has an independent set of size k}.
We have to find out if G has an independent set of size k.
It is very easy to see that IS ∈ NP. We just have to give a verification algorithm. The details are your homework. Hint: it is easy to verify that a given set is an independent set in a graph.
We want to show that IS is NP-complete. Now, all we have to do is to show that 3SAT ≤p IS. Note that this is much easier than the proof of the Cook-Levin theorem. We only have to show that one problem is reducible to IS, as opposed to all the problems in NP. Let's do that.
Theorem: 3SAT ≤p IS
Let us take the formula: (x1 ∨ x2 ∨ x3) ∧ (¬x1 ∨ x4 ∨ x5) ∧ (x3 ∨ ¬x2 ∨ x5)
and make the following graph.
For each clause place a cluster of vertices and connect them all to each other. Label each vertex with the corresponding literal in the clause.
Between clusters, connect each vertex labeled xi to each vertex labeled ¬xi. Let us see how this graph is made.
Suppose φ has m clauses and n variables. Then we realize that Gφ will have m clusters.
First observation:
1. Any independent set can have at most one vertex from a cluster.
2. This means that the largest independent set has size ≤ m.
Second observation:
1. If xi in one cluster belongs to an independent set then no ¬xi belongs to the independent set.
2. If ¬xi in one cluster belongs to an independent set then no xi belongs to the independent set.
Thus we can think of the literals on the vertices in the independent set as literals that are made true.
This means if there is an independent set that contains m vertices, it must have one vertex from each cluster. Thus the corresponding assignment must make the formula φ true.
On the other hand, if φ is satisfiable then the satisfying assignment satisfies each clause. So, it satisfies at least one literal in each clause. We can pick one such literal's vertex from each cluster and get an independent set of size m.
Theorem:
φ is satisfiable if and only if Gφ contains an independent set of size m, where m is the number of clauses in φ.
It is readily seen that Gφ can be computed in polynomial time. So this shows that 3SAT ≤p IS. Thus IS is NP-complete.
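A minimal Python sketch of this reduction (our own encoding: a clause is a list of (variable, sign) pairs, where sign True means the positive literal) that outputs the edges of Gφ and the target m:

def threesat_to_is(clauses):
    """Build <G, m>: G has an independent set of size m iff phi is satisfiable."""
    nodes, edges = [], set()
    for j, clause in enumerate(clauses):
        cluster = [(j, lit) for lit in clause]
        nodes += cluster
        for u in cluster:                  # connect the cluster completely
            for v in cluster:
                if u < v:
                    edges.add((u, v))
    for u in nodes:                        # connect x_i with not-x_i across clusters
        for v in nodes:
            if u < v and u[1][0] == v[1][0] and u[1][1] != v[1][1]:
                edges.add((u, v))
    return edges, len(clauses)

phi = [[('x1', True), ('x2', True)], [('x1', False), ('x2', False)]]
edges, m = threesat_to_is(phi)
print(m, len(edges))   # target independent set size m = 2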
We have so far seen that SAT ≤p 3SAT ≤p IS. Is it true that IS ≤p SAT?
The answer is yes. This is a consequence of the Cook-Levin Theorem. Therefore, all these problems are polynomial time reducible to each other. Now, I want to show you two easy reducibilities. The first one is almost trivial. Let us define another problem in graph theory.
Definition:
Let G = (V, E) be a graph. C ⊆ V is called a clique if all the vertices in C are connected to each other. Thus for all x, y ∈ C where x ≠ y, {x, y} ∈ E.
(Figure: a graph on 12 vertices in which a, d, f is a clique, and a graph on 12 vertices in which b, c, d, j, k is a clique.)
Clique:
Let us define the following language:
CLIQUE = {<G, k> : G has a clique of size k}.
It is very easy to see that CLIQUE ∈ NP. Again this is your homework. We want to show that CLIQUE is NP-complete.
We can show any one of the following to establish the NP-completeness of CLIQUE.
1. SAT ≤p CLIQUE
2. 3SAT ≤p CLIQUE
3. IS ≤p CLIQUE
Which one should we choose? The answer is: the one that is the easiest. There is no need to do more work than required. In this case, we should choose the last one, because the two problems seem to be closely related.
So what is the relation between independent sets and cliques? The answer is that it is very simple. Let G = (V, E) be a graph. Let us define the complement of G as Ḡ = (V, E′) where E′ = {{x, y} : {x, y} ∉ E}. So Ḡ contains exactly the edges that are not in G.
Let us now revise the definitions of independent set and clique.
1. I is an independent set if for all x, y ∈ I with x ≠ y, {x, y} ∉ E.
2. C is a clique if for all x, y ∈ C with x ≠ y, {x, y} ∈ E.
Thus we observe that I is an independent set in G if and only if I is a clique in Ḡ!
Theorem:
IS ≤p CLIQUE. Consequently CLIQUE is also NP-complete.
How do we prove this? We have to give a reducibility. The reducibility takes a pair <G, k> and outputs <Ḡ, k>. Clearly, G has an independent set of size k if and only if Ḡ has a clique of size k.
Vertex Cover:
A set W ⊆ V is called a vertex cover of G = (V, E) if every edge of G has at least one endpoint in W.
(Figure: a graph on 10 vertices in which b, c, f is a vertex cover, and a graph on 12 vertices in which a, c, d, f is a vertex cover.)
Let us define the following language:
VC = {<G, r> : G has a vertex cover of size r}.
So computationally we have a problem in which we will be given:
1. A graph G.
2. An integer r.
We have to find out if G has a vertex cover of size r. It is very easy to see that VC ∈ NP. Again this is your homework. We want to show that VC is NP-complete. We have four problems that we know are NP-complete:
1. SAT
2. 3SAT
3. IS
4. CLIQUE
We can show any one of the following to establish the NP-completeness of VC:
1. SAT ≤p VC
2. 3SAT ≤p VC
3. IS ≤p VC
4. CLIQUE ≤p VC
Once again, we will choose the one where the reducibility is very easy to come up with!
If we think about vertex covers and independent sets there is a nice relationship between them. Let G = (V, E) be a graph and let W be a vertex cover. Let us think about the complement of the vertex cover, X = V \ W. For every edge {x, y} ∈ E we have {x, y} ∩ W ≠ ∅. But this is the same as saying that not both x and y are in X. Thus no edge has both of its endpoints in X. So, what is X then? It is an independent set.
Theorem:
W is a vertex cover of G if and only if X = V \W is an independent set of G.
Theorem:
G contains an independent set of size k if and only if it contains a vertex cover of size n − k
where n is the number of vertices in G.
Theorem:
IS ≤p VC. Consequently VC is also NP-complete.
The reducibility: on input <G, k>, output <G, n − k>, where n is the number of vertices in G. By the theorem above, G has an independent set of size k if and only if G has a vertex cover of size n − k.
Here is a list of the NP-complete problems we have so far.
1. SAT (Cook-Levin)
2. 3SAT (By reducing SAT to 3SAT)
3. IS (By reducing 3SAT to IS)
4. CLIQUE (By reducing IS to CLIQUE)
5. VC (By reducing IS to VC).
Hamiltonian Paths:
Now, we will look at another problem and
prove that it is NP-complete. This
reducibility is not going to be so easy. Let us
define the problem first.
Our goal now is to add one more problem to this picture. We want to show that the hamiltonian
path problem is also NP-Complete. The picture will then look like this.
Let G = (V, E) be a directed graph and s, t ∈ V be two vertices. A Hamiltonian path from s to t is a path that:
1. Starts at s.
2. Ends at t.
3. Visits all the vertices of the graph exactly once.
Let us define: HAMPATH = {<G, s, t> : G has a hamiltonian path from s to t}.
Thus the computational question is: given a graph G and two vertices s and t, does there exist a hamiltonian path from s to t? We can easily show that HAMPATH is in NP.
Theorem:
3SAT ≤p HAMPATH. Consequently HAMPATH is NP-complete.
Given a boolean formula φ we will define a graph Gφ and two vertices s and t such that Gφ has a hamiltonian path from s to t if and only if φ is satisfiable.
Let us take
φ = C1 ∧ C2 ∧ · · · ∧ Cm, and let x1, . . . , xn be the variables in φ. We describe how to construct Gφ.
For each variable xi we create a diamond shaped structure as shown in the figure.
For each clause Cj we create a vertex and label it Cj. (Show figure.) There are also two special vertices s and t.
Now, we show how to connect the diamond vertices to the clause vertices. Each diamond for xi has two vertices for each clause Cj. (Show figure.) As you can see, the vertices are paired up; two vertices correspond to each clause. We label them CLj and CRj, for left and right.
Now, let us discuss how to connect the vertices in the diamond to the clauses. Here is the rule. If xi appears in Cj then:
Connect the vertex in the diamond for xi that is labeled CLj with Cj. Also, connect Cj with CRj. Formally, we add the edges (CLj, Cj) and (Cj, CRj). (Show figure.)
This finishes the construction of the graph. Let us look at a complete picture for the following formula: φ = (x1 ∨ x2) ∧ (x2 ∨ x3).
What can we say about this graph? Well, I will convince you that:
1. If φ is satisfiable then this graph has a hamiltonian path from s to t.
2. If this graph has a hamiltonian path from s to t then φ must be satisfiable.
Actually, if we ignore the clause vertices then finding a hamiltonian path from s to t in this graph is easy. We can go through each diamond in one of two ways.
The idea is that if we traverse a diamond from left to right then xi is set to true, and if we traverse it from right to left then xi is set to false.
Thus the 2^n assignments correspond to traversing the diamonds in 2^n possible ways. Let us say we have the assignment x1 = true, x2 = false, x3 = true; then we will traverse the graph as shown. (Show figure.)
Note that if we are traversing the diamond for xi from left to right then we can "pick up" a clause Cj if xi appears in Cj. This is due to the construction: we have an edge from CLj to Cj and back to CRj. (Show figure.)
Note that if we are traversing the diamond for xi from right to left then we can "pick up" a clause Cj if ¬xi appears in Cj. This is again by construction. In this case we have an edge from CRj to Cj and back to CLj.
Suppose that we have a satisfying assignment for φ which satisfies all the clauses. Then we can pick up all the clauses by traversing the graph in the way specified by the satisfying assignment.
Lemma:
If φ is satisfiable then Gφ has a hamiltonian path from s to t.
We have to prove the converse of this lemma next time.
TSP Problems:
The traveling salesman problem is the following: a traveling salesman wants to start from his home, visit all the major cities in his district, and come back to his hometown. Given that he has found out the fares from each city to every other city, can we find the cheapest way for him to make his tour?
The TSP is not a yes/no question. It is an optimization question. It is not only asking us to find a tour but to find one with the least cost. So, intuitively it seems much harder to solve. We can make this intuition precise as follows:
Theorem:
If the TSP has a polynomial time algorithm then P = NP.
The Subset Sum Problem
Let X = {x1, x2, . . . , xn} be a (multi)set of positive integers. For any S ⊆ X let us define the subset sum as follows: sum(S) = Σ_{xi∈S} xi.
A set can have at most 2^n different subset sums. But typically there are fewer, as many sums add to the same number.
Let X = {2, 5, 6, 7}; then the subsets and subset sums are:

∅ : 0          {5, 6} : 11       {2, 5, 6} : 13
{2} : 2        {5, 7} : 12       {2, 5, 7} : 14
{5} : 5        {6, 7} : 13       {2, 6, 7} : 15
{6} : 6        {2, 5} : 7        {5, 6, 7} : 18
{7} : 7        {2, 6} : 8        {2, 5, 6, 7} : 20
               {2, 7} : 9

As another example, for X = {1, 2, 3, 4}:

∅ : 0          {1, 2} : 3        {1, 2, 3} : 6
{1} : 1        {1, 3} : 4        {1, 2, 4} : 7
{2} : 2        {1, 4} : 5        {1, 3, 4} : 8
{3} : 3        {2, 3} : 5        {2, 3, 4} : 9
{4} : 4        {2, 4} : 6        {1, 2, 3, 4} : 10
               {3, 4} : 7
Given a set X of positive integers and a target t, does there exist a subset of X that sums to t?
Let us say the input is <{2, 5, 6, 7}, 13>; then there is a subset. In fact, as we saw, there are two of them, {6, 7} and {2, 5, 6}, that sum to 13. So the answer is yes!
Let us say the input is <{2, 5, 6, 7}, 10>: you can check that no subset sums to 10.
Let us define
SS = {<X, t> : X has a subset that sums to t}.
Thus the computational question is: given a set X and a number t, does there exist a subset of X that sums to t? We can easily show that SS is in NP. This is again your homework.
Theorem:
3SAT ≤p SS. Consequently SS is NP-complete.
Given a boolean formula φ we will define an instance <X, t> such that X has a subset S summing to t if and only if φ is satisfiable.
The construction (shown in the figure) writes the numbers in decimal with one column per variable and one column per clause: for each variable xi there are two numbers, yi and zi, and for each clause there are two extra "slack" numbers. The target t has n ones and m threes. The ones correspond to the variable columns and the threes correspond to the clause columns.
Note that whatever subset is chosen there will be no carries. Let us see why: in a given column there are only a few entries that are one (always fewer than ten), thus there can be no carries.
Out of yi and zi exactly one must be chosen; otherwise the target is not reached in column i. That will correspond to setting xi true or false.
Now the most important part. Note that if yi is chosen then the columns corresponding to the clauses that contain xi have a 1 there. Similarly, if zi is chosen then all the columns corresponding to clauses that contain ¬xi have a 1 in them.
If the choices correspond to a satisfying assignment, we get a sum that gives us all 1's in the first n columns. Furthermore, each of the columns corresponding to a clause will have a sum of at least 1, which the slack numbers can top up to exactly 3.
Conversely, if someone says they have a subset that sums to t, you can rest assured of the following things:
1. They must have chosen exactly one out of yi and zi. Why?
2. This defines an assignment.
3. This assignment must satisfy all the clauses. Why?
Lemma:
φ is satisfiable if and only if X has a subset that sums to t.
This lemma shows that f(φ) = <X, t> is a polynomial time reducibility from 3SAT to SS. Thus SS is NP-complete.
Assume that TSP has a polynomial time algorithm A. Consider the following algorithm for HAMCYCLE:
1. On input <G>.
2. Make the complete graph Kn on the vertices of G; give an edge weight 1 if it is an edge of G and weight 2 otherwise.
3. Use A to find the cheapest tour in Kn.
4. If its weight is n accept; else reject.
We just notice that any HAMCYCLE in G becomes a weight-n tour in Kn, and conversely. Thus the algorithm is polynomial time if A runs in polynomial time. This proves our theorem.
These problems now become languages if we pose them appropriately. We want to study the complexity of these questions:
1. TSP is NP-complete.
2. The metric TSP is NP-complete.
3. The Euclidean TSP is NP-complete.
So it seems applying these restrictions does not simplify the problem.
Proof (Outline): Given a graph G we can construct a complete graph Kn on the same vertex set. If (a, b) is an edge in G then we give this edge weight 1; otherwise we give this edge weight 2. As we have seen, G has a hamcycle if and only if this Kn has a tour of weight n. All we have to do is observe that this weight function satisfies the triangle inequality. Why?
We just have to verify that w(a, c) ≤ w(a, b) + w(b, c). Note that the weights are 1 or 2, so this is always satisfied, since w(a, c) ≤ 2 and w(a, b) + w(b, c) ≥ 1 + 1 = 2.
Now let us talk about a different notion of algorithms. These are called approximation
algorithms. Let us first understand this from a practical point of view. Suppose that you have a
graph G and you want to find the cheapest Hamcycle in G. It is not good enough if someone
were to tell you that the problem is NP-complete.
You want the tour that is cheapest. Suppose someone were to propose to you that they will find a "cheap" tour for you, but they cannot guarantee that it is the cheapest. You would take that offer, since you have no other option.
since you have no other option.
Now, there is a new question. Can we devise a fast (polynomial time) algorithm that does not
find the cheapest tour but a cheap enough tour? Such an algorithm will be called an
approximation algorithm.
How will we say which algorithm is good or bad? How will we compare these algorithms? If we have two algorithms A1 and A2, you would like the one that finds a cheaper tour in your graph. But that is very subjective.
A more objective way to compare algorithms is to compare approximation ratios. Suppose that G is a graph with a cheapest tour of cost t. If an algorithm A guarantees to return a tour of cost at most α·t then we say A is an α-approximation algorithm.
1. Thus a 2-approximation algorithm will return an answer that is at most twice the
cheapest tour.
2. A 1.5 approximation algorithm will return a tour that has cost at most 50% more than the
optimal.
3. A 1.01-approximation algorithm will return a tour that has cost at most 1% more than the cheapest tour.
Question:
What can we say about polynomial time approximation algorithms for these problems?
These problems look very different:
1. If P ≠ NP then the general TSP cannot be approximated to any constant factor (much more is true).
2. The metric TSP can be approximated to a factor of 1.5.
3. The Euclidean TSP can be approximated to a factor of 1 + ε for any ε > 0.
So from this point of view applying restrictions does make the problems simpler.
Suppose someone claims that they can approximate the general TSP within a factor of α. Let us call such an algorithm A. We can use A to solve the HAMCYCLE problem as follows:
1. On input G = (V, E).
2. Make a complete graph on the vertex set V.
3. If (a, b) is in G give it weight 1; otherwise give it weight αn + 1.
4. Use A to find a cheap tour in the complete graph.
5. If the weight of this tour is less than (α + 1)n accept; else reject.
Any HAMCYCLE in G becomes a tour of weight n in the complete graph, so on such an input the α-approximation A returns a tour of weight at most αn < (α + 1)n. On the other hand, any tour that uses a non-edge of G has weight at least (n − 1) + (αn + 1) = (α + 1)n. Thus the algorithm will find a hamiltonian cycle if one exists, even though it is only an approximation algorithm.
Hardness results for approximation algorithms are not usually as easy to prove as for this problem. But now there is a unified theory of that as well. We will discuss it in later lectures. Now, let us prove a simpler theorem than the one we promised.
Theorem: There is a 2-approximation algorithm for the metric TSP problem.
The algorithm: compute a minimum spanning tree (MST) of the graph, walk around the tree, and shortcut the repeated vertices. The algorithm takes polynomial time. How can we show that this gives us a tour that is at most twice the optimal tour in the graph? We need to analyze this algorithm.
First observe: the cost of the MST is less than or equal to the cost of the cheapest tour.
Proof. Delete any edge from the cheapest tour. We get a path. The path is a spanning tree (of a special kind). This path must cost at least as much as the MST (by the definition of the MST).
The second thing to observe is that, in a graph that satisfies the triangle inequality, the cost of the tour obtained by walking around the tree is at most two times the cost of the tree.
We will see this using the following picture. Take the tree. Duplicate the edges. This gives us a closed walk which visits all vertices, but with repetitions. The cost of this walk is twice that of the tree, as each edge is doubled. Now, we shortcut: we replace a stretch v1, . . . , vk whose intermediate vertices have already been visited by the single edge from v1 to vk. Because of the triangle inequality the shortcutting does not increase the cost of going from v1 to vk.
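Here is a minimal Python sketch of this 2-approximation (ours; it assumes a symmetric metric weight function w, builds the MST with a simple Prim loop, and shortcuts via a preorder walk of the tree):

def tsp_2approx(vertices, w):
    """Return a tour of cost <= 2 * optimal, assuming w obeys the triangle inequality."""
    root = vertices[0]
    in_tree, children = {root}, {v: [] for v in vertices}
    while len(in_tree) < len(vertices):      # Prim: grow a minimum spanning tree
        a, b = min(((a, b) for a in in_tree for b in vertices if b not in in_tree),
                   key=lambda e: w(*e))
        children[a].append(b)
        in_tree.add(b)
    tour = []
    def preorder(v):                         # walk around the tree, shortcutting
        tour.append(v)                       # repeated vertices automatically
        for c in children[v]:
            preorder(c)
    preorder(root)
    return tour + [root]                     # close the tour

pts = {'a': (0, 0), 'b': (0, 1), 'c': (1, 0), 'd': (1, 1)}
dist = lambda u, v: ((pts[u][0] - pts[v][0]) ** 2 + (pts[u][1] - pts[v][1]) ** 2) ** 0.5
print(tsp_2approx(list(pts), dist))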
There is a more clever algorithm that gives an approximation ratio of 1.5. Improving this ratio, that is, finding an algorithm that has an approximation ratio of less than 1.5, is a very old open problem in computer science.
For the Euclidean TSP, Arora has developed a wonderful algorithm. It is quite complicated and approximates the optimum within a factor of 1 + ε. The polynomial running time of the algorithm depends on ε.
So we realize that even if a problem is proved to be NP-complete there is a lot that we can do about it:
1. Look for restricted versions of the problem that can be solved efficiently.
2. Look for approximation algorithms for such problems.
In the 1990's several computer scientists took an active interest in approximation algorithms. Usually, it is not difficult to show that problems are NP-complete. However, it is very difficult to show that good approximation algorithms for a problem may not exist.
This led to one of the deepest theorems in computer science, called the PCP theorem. Here PCP stands for probabilistically checkable proofs.
Using this theorem it is possible to prove several hardness results for approximation algorithms. It can be shown that if there are approximation algorithms with certain ratios for certain problems then P = NP. In some cases the best known algorithms match these hardness results, showing in some sense that we may have found the most promising approximation algorithms for these problems.
The original proof of the PCP theorem was very algebraic. Now, a combinatorial proof has also
been found.
Let us explain, informally, what a probabilistically checkable proof is, what the theorem says, and how spectacular it is.
Space Complexity
We would now like to study space complexity: the amount of space or memory that is required to solve a computational problem. The following model of computation is the most convenient for studying space complexity.
1. The machine has a read-only input tape.
2. It has a separate read/write work tape.
3. The machine is not allowed to write a blank symbol on the work tape.
The space used by this machine on input x is the number of non-blank symbols on the work
tape at the end of the computation. Note that in this case:
1. We do not count the input as space used by the machine.
2. The machine is allowed to read the input again and again.
3. However, since the input tape is read-only, the machine cannot use that space to store intermediate results.
A good realistic machine to keep in mind is a computer with a CD-Rom. The input is provided on
the CD-Rom and we are interested in figuring out how much memory would be required to
perform a computation.
Let M be a Turing machine that halts on all inputs. We say that M runs in space s(n) if for all
inputs of length n the machine uses at most s(n) space.
Let us look at a concrete example. Let us consider the language L = {x ∈ {0, 1}* : x = x^R}; this is the set of all palindromes over {0, 1}. Let's look at two TMs that accept this language and see how much space they require.
Let us describe a TM M1 that accepts L as follows we will give a high level description.
1. The machine copies the input to the work tape.
2. It moves one head to the left and the other to the rightmost character of the copy of the
input.
3. By moving one head left to right and the other right to left it makes sure that the
characters match.
4. If all of them match the machine accepts; otherwise it rejects.
It is not hard to see that the space required by M1 is linear. Thus M1 accepts L in space O(n).
The question is: can we accept this language in less space?
The problem is that we are not allowed to write on the input tape, since it is read-only. Thus it seems we must copy the contents of the input tape to the work tape, which gives a machine that takes space at least n. Can we somehow do better? Here is another idea.
Let us describe a TM M2 that accepts L.
1. The machine writes 1 on its work tape. We call this the counter.
2. It matches the first symbol with the last symbol.
3. In general, when it has i written on the counter, it matches the i-th symbol with the i-th last symbol.
4. In order to get to the i-th symbol the machine can use another counter that starts at i and decrements each time the read-only head moves forward.
5. Similarly, to get to the i-th last symbol it can use another counter.
This machine only uses three counters. Each counter needs to store a number in binary. The
number will reach the maximum value of n.
To represent n in binary we need only log2 n bits. Thus the machine will require only O(log n)
space. This is a real improvement over the previous machine M1 in terms of space.
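A Python sketch of M2's idea (ours; the string stands in for the read-only input tape, and the only working memory is the counters, each holding a number of at most log₂ n bits):

def is_palindrome_logspace(tape):
    """Check x = x^R using only counters as working memory."""
    n = 0
    for _ in tape:          # first, count the input length
        n += 1
    i = 0
    while i < n // 2:       # compare the i-th symbol with the i-th last symbol
        left, right = tape[i], tape[n - 1 - i]   # indexing stands in for
        if left != right:                        # walking the read-only head
            return False
        i += 1
    return True

print(is_palindrome_logspace("0110"), is_palindrome_logspace("0111"))  # True False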
A few points to remember that are made clear by this example are:
1. If you are claiming that you have a sublinear space TM then make sure that you are not
overwriting the input or using that space.
2. The machine M1 takes time O(n) and space O(n). On the other hand M2 takes time O(n²) and space O(log n). So M1 is better in terms of time and M2 is better in terms of space. This is quite typical.
3. If you are asked to design a machine that takes less space then no consideration should
be given to time.
If M is a non-deterministic TM, suppose that all branches of M halt on all possible inputs. We define the space complexity f(n) to be the maximum number of tape cells that M scans on any branch of its computation, for any input of length n.
Similarly, we define
NSPACE(f (n)) = {L : L is accepted by an O(f (n)) space NTM }.
Let us consider some facts which make space complexity very different from time complexity.
A sublinear time TM cannot completely read its input. Therefore the languages accepted in
sublinear time are not very interesting. This is not the case with sublinear space algorithms. In
fact, it is very interesting to study sublinear space algorithms.
Let us look at an NP-complete problem. We studied CNFSAT; let us see if we can come up with a space efficient algorithm for this problem. Let us say we are given a boolean formula φ on n variables in CNF. Let x1, x2, . . . , xn be the boolean variables that appear in φ and C1, . . . , Cm be the clauses of φ.
1. For each assignment of the variables x1, . . . , xn:
2. If the assignment satisfies φ, accept.
3. Reject.
The same space is reused to check one assignment after another, so this algorithm runs in linear space. This shows:
Theorem: SAT ∈ SPACE(n).
Let us recall that if A is an NFA with q states then there exists a DFA D such that L(A) = L(D) and the number of states in D is at most 2^q.
As a consequence we get:
Theorem: If A is an NFA with q states with L(A) ≠ ∅, then A accepts a string of length at most 2^q.
A naive NTM would guess a whole string x of length up to 2^q and check that A accepts it; that requires 2^q space. Now, we just notice that in order to simulate an NFA we do not need to know all of x but only the current symbol of x. Thus we can reuse space. We can be more space efficient by "guessing the symbols" of x as we go along. Let us consider a new machine.
This NTM requires a counter that can count up to 2^q. Such a counter requires q bits. Thus the whole simulation can be done in linear space, and we have a non-deterministic linear space algorithm that tests whether L(A) ≠ ∅.
It is interesting to note that ENFA, the emptiness problem for NFAs, is not known to be in NP or co-NP.
Let us now look at another very interesting problem. We call this reachability. Let G = (V, E) be a graph. We say that a vertex t is reachable from s if there is a path from s to t. How can we solve the reachability problem?
You learn two algorithms to solve this problem. One is called DFS and the other is called BFS.
Rough outline is the following:
1. Mark s
2. Repeat n − 1 times.
3. If there is an edge from a marked vertex a to an unmarked vertex b then mark b.
Usually the emphasis is on the time complexity of this algorithm and DFS and BFS are elegant
and efficient implementations of this idea (with very nice properties). However, now we want to
view these algorithms from the point of view of space complexity.
The point to notice is that both these algorithm require us to mark the vertices (which initially are
all unmarked). Therefore, we need to have one bit for each vertex and we set it when we mark
that particular vertex. Thus the whole algorithm takes linear space.
In fact, as we will see a little later, reachability is one of the most important problems in space complexity. Let us for now start with an ingenious space efficient algorithm for reachability.
We say that b is reachable from a via a path of length k if there exist a = v0, . . . , vk = b such that (vi, vi+1) ∈ E for all i = 0, . . . , k − 1. Let us begin with the following facts:
Fact: If b is reachable from a then it is reachable via a path of length at most n, where n is the number of vertices in the graph.
Fact: b is reachable from a via a path of length 1 if and only if (a, b) ∈ E.
Fact: If b is reachable from a via a path of length k ≥ 2 then there is a vertex c such that c is reachable from a via a path of length at most ⌈k/2⌉ and b is reachable from c via a path of length at most ⌈k/2⌉.
Now, let us write a function reach(a, b, k). This function will return true if b is reachable from a
via a path of length at most k.
reach(a, b, k)
1. If k = 0 then
2.   if a = b return true; else return false.
3. If k = 1 then
4.   if a = b or (a, b) ∈ E return true; else return false.
5. For each vertex c ∈ V
6.   if (reach(a, c, ⌈k/2⌉) and reach(c, b, ⌈k/2⌉))
7.     return true;
8. return false;
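Here is a minimal runnable Python version of reach; the graph representation (a set of vertices plus a set of directed edge pairs) is an assumption made for illustration.

    from math import ceil

    def reach(vertices, edges, a, b, k):
        # True iff b is reachable from a via a path of length at most k.
        if k == 0:
            return a == b
        if k == 1:
            return a == b or (a, b) in edges
        half = ceil(k / 2)
        # Only the midpoint c lives in each stack frame: O(log n) bits per
        # level, and the recursion depth is about log2(k).
        return any(reach(vertices, edges, a, c, half) and
                   reach(vertices, edges, c, b, half)
                   for c in vertices)

For example, with vertices = {1, 2, 3} and edges = {(1, 2), (2, 3)}, the call reach(vertices, edges, 1, 3, 3) returns True.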
This is a recursive program, and we need to analyze its space requirements. There are three
parameters that are passed and one local variable (c); these have to be put on the stack. How
much space do these variables take?
If the vertex set of the graph is V = {1, . . . , n} then each vertex can be stored in ⌈log₂ n⌉ bits.
Similarly, the parameter k is a number between 1 and n and requires ⌈log₂ n⌉ bits to store. Thus
each call requires us to store O(log n) bits on the stack. So the space requirement is
O(log n) × depth of the recursion.
The depth of the recursion is only ⌈log₂ n⌉, since we start with k = n and each recursive call
halves k. Hence the total space requirement is O(log² n).
Let us prove a very simple relationship between time and space complexity. Let us say that M is
a deterministic TM that uses space s(n). Can we give an upper bound on the time taken by the
TM M?
Let us count how many configurations of M are possible on an input of length n. Note that we
can specify a configuration by writing down the state, the tape contents of the work tape (at
most s(n) cells are used there), and the positions of the two heads.
Thus there are at most |Q| × n × s(n) × b^{s(n)} configurations, where b is the number of
symbols in the work alphabet of the machine and |Q| is the number of its states. For
s(n) ≥ log n this gives:
Theorem: If M runs in space s(n) then there are at most 2^{O(s(n))} distinct configurations of M on x.
This theorem is true for both deterministic and non-deterministic TMs. Let us define the
analogues of P and NP from the space complexity point of view:
PSPACE = ⋃_k SPACE(n^k) and NPSPACE = ⋃_k NSPACE(n^k).
The analogue of the P vs. NP question for space computation is the following: Is
PSPACE = NPSPACE? Unfortunately there is no prize on this problem: it has already been
solved, by Savitch, who proved that PSPACE = NPSPACE. We will prove this. It is a
consequence of the following theorem.
Theorem (Savitch): For f(n) ≥ n, NSPACE(f(n)) ⊆ SPACE(f²(n)).
Let us for a moment make the assumption that we can compute f(n) easily (we will come back
to this point). Let us now define a problem called yieldability: suppose we are given two
configurations c1 and c2 of the NTM N and a number t; can N get from c1 to c2 in at most t steps?
CANYIELD(c1, c2, t)
1. If t = 1 then
2.   if c1 = c2 or c2 follows from c1 via the rules of N return true; else return false.
3. For each configuration c′ of length at most f(n)
4.   if (CANYIELD(c1, c′, ⌈t/2⌉) and CANYIELD(c′, c2, ⌈t/2⌉))
5.     return true;
6. return false;
Once again this is a recursive program, so we have to see how much space it requires. Each
level of the recursion stores one configuration on the stack, which takes space O(f(n)).
Furthermore, if we call CANYIELD(c1, c2, t) then the depth of the recursion will be log₂ t.
Now, given a TM M and an input x, let us consider the following deterministic TM:
1. Input x.
2. Compute f(n).
3. Let c0 be the initial configuration of M on x.
4. For all accepting configurations cf of length at most f(n):
5.   if CANYIELD(c0, cf, 2^{O(f(n))}) accept.
6. Reject.
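The following is a minimal Python sketch of this simulation, under assumed helper names: configs enumerates all configurations of length at most f(n), yields(c1, c2) is the one-step relation of N, c0 is the initial configuration, and accepting lists the accepting configurations. All of these names are hypothetical.

    def canyield(c1, c2, t, configs, yields):
        # True iff c2 is reachable from c1 in at most t steps of N.
        if t <= 1:
            return c1 == c2 or yields(c1, c2)
        half = (t + 1) // 2
        # One midpoint configuration per recursion level: O(f(n)) space each,
        # with depth log2(t), for O(f(n) * log t) space in total.
        return any(canyield(c1, c, half, configs, yields) and
                   canyield(c, c2, half, configs, yields)
                   for c in configs)

    def savitch_accepts(c0, accepting, t, configs, yields):
        # The deterministic driver: try every accepting configuration.
        return any(canyield(c0, cf, t, configs, yields) for cf in accepting)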
It is clear that if M accepts x in space f(n) then the above algorithm will also accept x. On the
other hand, if M does not accept x in space f(n) then the above algorithm will reject. The space
required by the above algorithm is
f(n) × log(2^{O(f(n))}) = O(f(n) × f(n)) = O(f²(n)).
This completes the proof of our theorem. There is one technicality left: in the algorithm we said
we will compute f(n), but we may not know how to compute f(n). In that case, what can we do?
This is just a technicality, and we can add the assumption that f(n) is easily computable to our
theorem. Another way to get around it is as follows:
Note that we can use the CANYIELD function to check whether the initial configuration yields
some configuration of length f. We can start with f = n and increment its value until we find the
length of the largest configuration reachable from the initial configuration. We can then use that
value of f to check whether M reaches an accepting configuration of length at most f.
Let us review some relationships between space and time and non-determinism and
determinism.
1. DTIME(t(n)) ⊆ NTIME(t(n)).
2. DSPACE(s(n)) ⊆ NSPACE(s(n)).
3. DTIME(s(n)) ⊆ DSPACE(s(n)).
4. DSPACE(s(n)) ⊆ DTIME(2^{O(s(n))}) for s(n) ≥ n.
Time:
NP = ⋃_k NTIME(n^k)
EXPTIME = ⋃_k DTIME(2^{n^k})
Space:
NPSPACE = ⋃_k NSPACE(n^k)
Thus we have the following relationships: P ⊆ NP ⊆ PSPACE = NPSPACE ⊆ EXPTIME.
So far complexity theorists have only been able to show that P ≠ EXPTIME. Therefore at least
one of the above containments must be strict, but that is all we can say at present. (These
complexity classes are pictured in Sipser, Figure 8.1, page 282.)
Motivated by the theory of NP-completeness, we can ask whether there are any PSPACE-complete
problems, and what kind of problems these would be. Let us make a precise definition first.
A language B is PSPACE-complete if
1. B is in PSPACE, and
2. every A in PSPACE is polynomial time reducible to B.
Why is it that we require that A be polynomial time reducible to B? Why not require that it be
polynomial space reducible to B? Well, what we want to say is that if a PSPACE-complete
language is easy then all languages in PSPACE are easy! In other words, we claim that if some
PSPACE-complete language is in P, then P = PSPACE.
This is the critical claim (like its analogue in the theory of NP-completeness) that gives
evidence that PSPACE-complete languages are hard.
The proof of this claim reveals the answer. Let B be any PSPACE-complete language. If B is in
P then there is a polynomial time TM M that accepts B in time O(n^k). Let A ∈ PSPACE, and let
f be the polynomial time reduction from A to B, computable in time O(n^r). Consider:
1. On input x.
2. Compute y = f(x).
3. Run M on y. If M accepts, accept; else reject.
Clearly this algorithm recognizes A. Its running time is O(|x|^r) + O(|y|^k).
If |x| = n then |y| ≤ n^r, so the above algorithm runs in time O(n^{kr}). This shows that A is in P.
Now the critical point is that f can be computed in polynomial time. If f were only polynomial
space computable, we could only say that the above algorithm runs in polynomial space, and
we would end up proving the uselessly trivial statement that every language in PSPACE is in
PSPACE.
Recall what the Cook-Levin theorem did for NP: it gave us a vast understanding of NP by
identifying one hard problem in it. What we want is to identify such a problem for PSPACE.
Recall SAT; what we want to do now is consider its generalizations. SAT asks: given a boolean
formula φ(x1, . . . , xn) on the variables x1, . . . , xn, does there exist a satisfying assignment
for φ?
Another way to phrase SAT is to ask whether the formula ∃x1∃x2 · · · ∃xn φ(x1, . . . , xn) is true.
We notice that we are only using one quantifier here. What if we allow both quantifiers?
Consider the formula: ψ = ∀x∃y[(x ∨ y) ∧ (¬x ∨ ¬y)].
This is called a fully quantified boolean formula. A fully quantified boolean formula is also called
a sentence, and it is either true or false. Now we are ready to define
TQBF = {⟨φ⟩ : φ is a true fully quantified boolean formula}.
The first thing to notice is that TQBF is in PSPACE. To prove this we have to give a polynomial
space algorithm that solves this problem. This is almost trivial once we define when a QBF is
true.
If φ is a formula on 0 boolean variables then we can directly evaluate it to find out whether it is
true. For example, (0 ∨ 0) ∧ (1 ∨ 0) is false and (1 ∨ 0) ∧ (1 ∨ 1) is true.
For ψ = ∀x∃y[(x ∨ y) ∧ (¬x ∨ ¬y)] we have
ψ|x=0 = ∃y[(0 ∨ y) ∧ (¬0 ∨ ¬y)]
and
ψ|x=1 = ∃y[(1 ∨ y) ∧ (¬1 ∨ ¬y)].
To check ψ|x=0 we have to check
ψ|x=0,y=0 = (0 ∨ 0) ∧ (¬0 ∨ ¬0),
which evaluates to false, so we need to check the other branch:
ψ|x=0,y=1 = (0 ∨ 1) ∧ (¬0 ∨ ¬1).
This evaluates to true, which shows that ψ|x=0 = ∃y[(0 ∨ y) ∧ (¬0 ∨ ¬y)] is true.
Both ψ|x=0 and ψ|x=1 must be true for the formula to be true. We have checked the first one;
you can easily check the second one yourself and figure out whether the original formula is true.
Note that we have given a definition of when a QBF is true. This definition leads to a recursive
algorithm. Let us write down this algorithm:
Eval(φ, n)
1. If n = 0 then φ has no variables; directly evaluate it and return the value.
2. If n ≥ 1 then let x1 be the first quantified variable of φ.
3. If φ = ∃x1 ψ
4.   return (Eval(ψ|x1=0, n − 1) ∨ Eval(ψ|x1=1, n − 1))
5. If φ = ∀x1 ψ
6.   return (Eval(ψ|x1=0, n − 1) ∧ Eval(ψ|x1=1, n − 1))
Note that this is a recursive algorithm with n as its depth of recursion. It only stores a polynomial
amount on the stack each time. Thus this algorithm runs in polynomial space. So this problem is
in PSPACE.
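As a quick illustration, here is a minimal Python sketch of Eval. The representation (a string of quantifier letters plus a function for the quantifier-free matrix) is a hypothetical encoding chosen for brevity.

    def eval_qbf(quantifiers, matrix, assignment=()):
        # quantifiers: one letter per variable in order, 'E' = exists,
        # 'A' = forall; matrix: a function from a tuple of booleans to a boolean.
        if len(assignment) == len(quantifiers):
            return matrix(assignment)           # n = 0: evaluate directly
        q = quantifiers[len(assignment)]
        branches = [eval_qbf(quantifiers, matrix, assignment + (b,))
                    for b in (False, True)]
        return any(branches) if q == 'E' else all(branches)

For ψ = ∀x∃y[(x ∨ y) ∧ (¬x ∨ ¬y)], the call eval_qbf('AE', lambda a: (a[0] or a[1]) and (not a[0] or not a[1])) returns True, completing the hand evaluation above.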
Next our goal is to show that TQBF is PSPACE-hard. This is not going to be easy, as we have to
show that every language A ∈ PSPACE satisfies A ≤p TQBF.
Let us start. We do not know much about A; the only thing we know is that it belongs to
PSPACE. That means there is a TM M that accepts A in polynomial space. Let us take that TM M
and collect a few facts about it.
Firstly we observe that M runs in space O(n^k). Therefore, it accepts in time at most 2^{O(n^k)}.
Suppose x is an input and C0 is the initial configuration of M on x. We know that M accepts x if
and only if an accepting configuration of M is reachable from C0 in at most t = 2^{O(n^k)} steps.
Thus M accepts x if and only if there is an accepting computation history of M on x of length at
most 2^{O(n^k)}.
Can we use the same idea as in the Cook-Levin theorem?
In the Cook-Levin theorem each configuration was of polynomial length and the length of the
computation history was also polynomial. In this case, the length of the history is the problem.
Nevertheless those ideas are going to help us, so you should review the proof of the Cook-Levin
theorem in order to understand this proof completely.
A formula: ψ = ∀x∃y[(x ∨ y) ∧ (¬x ∨ ¬y)].
This is called a fully quantified boolean formula. A fully quantified boolean formula is also called
a sentence, and it is either true or false. Recall TQBF, the set of true fully quantified boolean
formulas. The first thing to notice is that TQBF is in PSPACE; we proved this last time.
Our goal now is to show that TQBF is PSPACE-hard. This is not going to be easy, as we have to
show that every language A ∈ PSPACE satisfies A ≤p TQBF. All we know about A is that it
belongs to PSPACE. That means there is a TM M that accepts A in polynomial space. Firstly we
observe that M runs in space O(n^k); therefore it accepts in time at most 2^{O(n^k)}.
Thus M accepts x if and only if there is an accepting computation history of M on x of length at
most 2^{O(n^k)}. We will now try to construct a QBF Ф_M(x) such that Ф_M(x) is true if and only
if M accepts x. Furthermore (and this is the bulk of the proof), the formula will have polynomial
length and will be computable in polynomial time.
Firstly, we know from the proof of the Cook-Levin theorem that we can encode configurations in
boolean variables. Let us use this as a black box: we can make a formula on variables X such
that Con(X) is true if and only if the assignment of the variables X corresponds to a
configuration. Thus we will refer to configurations directly (with occasional reminders that,
strictly speaking, when we talk about configurations we are talking about the underlying
variables).
Now, recall that given C1 and C2 with corresponding variable sets X1 and X2, we can make a
"yields" formula Y_M(X1, X2) which is true if and only if C1 yields C2 in one step. This is where
we used the windows-of-size-six idea.
We can also make a formula that checks whether a configuration is accepting: given C with
corresponding variable set X, we can make a formula accept(X) which is true if and only if C is
an accepting configuration.
Lastly, given x and a configuration C with corresponding variable set X, we can make a formula
initial_M(X) which is true if and only if C is the initial configuration of M on x.
We simply put these ideas together. The final formula in the Cook-Levin proof simply said: do
there exist C0, . . . , Cm such that
1. each Ci is a valid configuration,
2. C0 is the initial configuration of M on x,
3. Ci yields Ci+1 for each i, and
4. Cm is an accepting configuration?
In symbols, the formula is sketched below.
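Written out (a reconstruction of that Cook-Levin style formula from the four conditions above, with each Ci standing for its variable set Xi):

∃C0 ∃C1 · · · ∃Cm [ Con(X0) ∧ · · · ∧ Con(Xm) ∧ initial_M(X0) ∧ Y_M(X0, X1) ∧ · · · ∧ Y_M(Xm−1, Xm) ∧ accept(Xm) ]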
What if we do exactly the same here? What goes wrong? Well, m = 2^{O(n^k)}. That is very
large, so the formula would not be of polynomial size or computable in polynomial time. So we
will use what is called a folding trick.
Recall that we now have the quantifier ∀, which we did not have before. Let us use it. Let us
make a formula Y(C1, C2, k) which is true if C1 yields C2 in at most k steps. We know how to
make Y(C1, C2, 0) and Y(C1, C2, 1). We observe that
Y(C1, C2, k) ↔ ∃C′ [Y(C1, C′, k/2) ∧ Y(C′, C2, k/2)].
This is great! We can now make this formula recursively. But let us not rejoice yet; let us see
how long this formula will be.
Say the length of each of Y(C1, C2, 0) and Y(C1, C2, 1) is r.
Then what will the length of Y(C1, C2, k) ↔ ∃C′ [Y(C1, C′, k/2) ∧ Y(C′, C2, k/2)] be?
Let us compute the length l_k of Y(C1, C2, k). We have l_0 = r, l_1 = r, and l_k = 2 l_{k/2} + O(1).
Thus l_k = Ω(rk) at least! This is no good, because we want this to work for k = 2^{O(n^k)}. We
need a new idea!
Remember that we have the quantifier ∀, and let us look at the recursion again. If we can
somehow use ∀ to make this recursion less expensive we would be home.
Consider Y(C1, C2, k) ↔ ∃C′ ∀(C, D) ∈ {(C1, C′), (C′, C2)} Y(C, D, k/2); this is equivalent to
Y(C1, C2, k) ↔ ∃C′ [Y(C1, C′, k/2) ∧ Y(C′, C2, k/2)].
Let us again compute the length l_k of Y(C1, C2, k). We have l_0 = r, l_1 = r, and now
l_k = l_{k/2} + O(r). Thus l_k = O(r log k). For k = 2^{O(n^k)} this is polynomial in length. Once
we have done this, the proof becomes almost trivial.
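As a quick check of the two recurrences (a sketch of the arithmetic; the k counting steps is unrelated to the exponent k in n^k):
Without folding: l_0 = l_1 = r and l_k = 2 l_{k/2} + O(1), which unrolls over log₂ k levels into about k copies of r, so l_k = Θ(rk).
With folding: l_0 = l_1 = r and l_k = l_{k/2} + O(r), which unrolls into l_k = O(r log k).
For k = 2^{O(n^k)} steps, the folded recursion gives length O(r · n^k), a polynomial.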
A formula like ψ = ∀x∃y[(x ∨ y) ∧ (¬x ∨ ¬y)] is called a fully quantified boolean formula. A fully
quantified boolean formula is also called a sentence, and it is either true or false. Recall TQBF,
the set of true fully quantified boolean formulas.
We proved that TQBF is PSPACE-complete.
We now want to ask whether there are other PSPACE-complete problems.
The answer is yes. Let us look at a problem that is very closely related to TQBF. Suppose we
are given a QBF
∃x1 ∀x2 ∃x3 ∀x4 · · · Q xn ψ.
We can think of this as a game that we call FORMULA-GAME.
In this game there are two players, Player I and Player II. The game proceeds as follows:
1. Player I assigns a value to x1 ∈ {0, 1}.
2. Player II assigns a value to x2 ∈ {0, 1}.
3. Player I assigns a value to x3 ∈ {0, 1}.
4. Player II assigns a value to x4 ∈ {0, 1}.
5. and so on.
In general, Player I assigns the odd-numbered variables and Player II the even-numbered ones.
At the end of n rounds all the variables have been assigned. Player I wins the game if ψ is made
true by the assignment; otherwise Player II wins.
This kind of game is called a two player perfect information game. An interesting question is to
ask who will win such a game. We have to pose this question carefully: the question asks which
player will win if both players play perfectly.
Let us look at a simple example. Let us look at the formula ψ(x1, x2, x3). Let us say there are
three moves. Player I chooses x1 and then player II chooses x2 and Player I chooses x3. At the
end of the game we have an assignment of the variables. If ψ becomes true then Player I wins.
Otherwise Player II wins.
Now Player I wants to make the formula true; on the other hand, Player II is trying to make it
false. If Player I wants to win then he should find an x1 such that for every choice of x2 he can
find an x3 such that ψ is true. Thus determining whether Player I wins is equivalent to
determining whether the QBF ∃x1∀x2∃x3 ψ(x1, x2, x3) is true.
In general, suppose we have a QBF Q1x1 · · · Qnxn ψ(x1, . . . , xn), and the two players play n
rounds where in the i-th round, if Qi = ∃ then Player I chooses the value of xi, and otherwise
Player II chooses the value of xi. Then Player I has a winning strategy if and only if the QBF in
question is true. We call this game the game defined by Ф, or simply the game Ф.
Define FORMULA-GAME = {⟨Ф⟩ : Player I wins the game Ф}. The computational question is:
given a game, decide which player has a winning strategy. The observation that Player I wins if
and only if Ф is true gives us the following theorem.
Theorem: FORMULA-GAME is PSPACE-complete.
In general there are many two player perfect information games; most board games, for
example chess, checkers and Go, are like that. Most card games, however, are not perfect
information games.
Let us think a bit about chess. Suppose someone were to find a winning strategy for chess.
This means that no matter whom they play, they will win the game: they know a move such that
no matter what the opponent does, they have a next move such that no matter what the
opponent does, and so on, the master wins. No such strategy is known for chess.
Let us look at a game called generalized geography. First let us recall the game geography:
two players take turns naming places, where each name must begin with the letter that ended
the previous name. Repeats are not allowed. The first player to "get stuck" loses.
Suppose that we take all the names of geographical regions and make a graph. Each vertex of
the graph has a label, and there is an edge from a to b if and only if the last letter of a is the
same as the first letter of b. [Picture of part of the geography graph.]
Then the game can be thought of as being played on the graph. Player I chooses a vertex, and
the players then alternately select a neighbor of the current vertex (without repeating vertices)
until one of them cannot move and loses.
This game is called Generalized Geography (GG). Formally, GG = {⟨G⟩ : Player I wins on G}.
The computational question is to decide, given a directed graph G, whether Player I has a
winning strategy.
What would a winning strategy for Player I look like? Player I has to choose a vertex such that
for all choices of Player II there is a choice for Player I such that for all choices of Player II . . .
Player I wins. So the question looks just like an alternating quantified formula. But is it? We will
show:
Theorem: GG is in PSPACE.
Theorem: GG is PSPACE-complete.
We will prove this next time. Let us also mention some very interesting results about
combinatorial games: it is known that appropriately formalized versions of many games are
PSPACE-hard. These include
1. Chess
2. Go
3. Hex
Recall the game: repeats are not allowed, and the first player to "get stuck" loses. We take all
the names of geographical regions and make a graph: each vertex has a label, and there is an
edge from a to b if and only if the last letter of a is the same as the first letter of b. [Picture of
part of the geography graph.]
The game can then be thought of as being played on the graph: Player I chooses a vertex, and
the players alternately select a neighbor of the current vertex (without repeating vertices) until
one of them cannot move and loses.
To avoid trivialities we will require Player I to start at a designated node.
This game is called Generalized Geography (GG). Formally,
GG = {⟨G, b⟩ : Player I wins on G when the game is started at b}. The computational question
again is to decide, given a directed graph G and a start node b, whether Player I has a winning
strategy.
What would a winning strategy for Player I look like? Player I has to choose a vertex such that
for all choices of Player II there is a choice for Player I such that . . . Player I wins. So the
question looks just like an alternating quantified formula. But is it?
We would like to show the following theorems.
Theorem: GG is in PSPACE.
Theorem: GG is PSPACE-hard.
The first theorem rests on the following observation.
Theorem: Suppose G is a graph and Player I chooses the node b in the graph. Let G′ be the
graph with b deleted from G and let b1, . . . , bk be the neighbors of b. Player I has a winning
strategy on G with first move b if and only if, for every i, the player who moves to bi as the first
move in G′ does not have a winning strategy on G′.
This observation can be converted into a simple recursive algorithm for GG; a sketch follows
below. The algorithm requires polynomial space (although it is not a polynomial time algorithm).
This shows that GG is in PSPACE.
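Here is a minimal Python sketch of that recursive algorithm. The graph is assumed to be a dict mapping each vertex to the set of its out-neighbors (an illustrative representation); Player I wins ⟨G, b⟩ exactly when the call returns True.

    def player_to_move_loses(graph, b):
        # True iff the player who must move FROM b loses, i.e. the player who
        # has just moved to b wins.
        rest = {v: {w for w in ws if w != b}
                for v, ws in graph.items() if v != b}   # delete b from the graph
        # The mover at b wins iff some move leads to a position where the
        # opponent loses; all() over an empty set of moves means the mover is
        # stuck and loses.
        return all(not player_to_move_loses(rest, w)
                   for w in graph.get(b, ()) if w != b)

The recursion depth is at most n. This sketch copies the graph at every level for simplicity; the lecture's machine only needs to remember which vertices have been deleted, one bit per vertex per level, which gives the polynomial space bound.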
Next, GG is PSPACE-hard. To show this we will prove that given a QBF Ф we can make a
graph G_Ф such that Ф is true if and only if Player I wins G_Ф. We will assume that
1. the quantifiers of Ф alternate, and
2. the matrix ψ of Ф is in CNF.
The graph G_Ф is going to be in two parts. Let us look at the first part.
1. There are n diamond structures, one for each variable.
2. The start node is b.
[Picture of the chain of diamond structures.] Play proceeds as follows.
1. Player I can choose T1 or F1.
2. Player II must then choose z1.
3. Player I now must choose b2.
4. Player II can now choose T2 or F2.
5. Player I must now choose z2.
6. Player II now must choose b3.
7. Player I can now choose T3 or F3.
We observe that the game eventually ends at the node c. By adding an extra node before c we
can make sure that it is then Player II's turn. Let us look at the rest of the graph.
From c there are edges going to c1, . . . , cm, where each cj corresponds to a clause of ψ. From
each clause node there are edges to nodes labeled by the literals of the clause; each clause
"has its own copy" of its literals.
1. From a literal xi there is an edge to Ti.
2. From a literal ¬xi there is an edge to Fi.
Let us consider the game from Player II's point of view when the current vertex is c. Note that
the players' choices so far dictate an assignment.
1. Player II would like to choose a clause cj such that he can win from cj.
2. Player I will then move to some literal of that clause.
3. Player II can win from that literal only if the literal is made false by the assignment.
Thus Player II is trying to move to a clause all of whose literals are made false by the
assignment, and Player II is going to win exactly when such a false clause exists; otherwise
Player I wins. So, while choosing the assignment, Player I will try to make ψ true and Player II
will try to make ψ false. This shows:
Theorem: GG is PSPACE-complete.
We now define two more space complexity classes:
1. L = SPACE(log n).
2. NL = NSPACE(log n).
Notice that to count up to n we need a log n-bit counter. Thus you should think of L as the set of
problems that can be solved using a fixed number of counters. NL is of course the non-
deterministic analogue of this class.
We know that L ⊆ NL ⊆ P.
Furthermore, we know from Savitch's theorem that NL ⊆ SPACE(log² n).
However, Savitch's theorem does not show us that NL = L.
To study L and NL, polynomial time reductions are useless, so we define a new kind of
reduction. Let us make some definitions.
A LOGSPACE transducer is a TM with an input tape, a work tape and an output tape. The input
tape is read-only and the output tape is write-only. On an input x of length n, the machine may
use O(log n) cells of the work tape while producing its output.
Note that
1. The input is not counted as space used by the TM.
2. Only the space used on the work tape is counted as space used by the TM.
Let us design a LOGSPACE transducer that on input x of length n computes the output 1^{n²}.
This is very easy.
1. On input x.
2. Set counter = 0.
3. For each character of the input, increment the counter.
4. For i = 1 to counter:
5.   For j = 1 to counter:
6.     print 1.
This machine uses only three counters, each a log n-bit counter on inputs of length n. The
output is clearly n² bits long.
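In Python the same three-counter structure can be sketched with a generator, which models the write-only output tape (symbols are emitted and never re-read):

    def square_transducer(x):
        n = 0
        for _ in x:              # counter 1: compute n = |x|
            n += 1
        for i in range(n):       # counter 2
            for j in range(n):   # counter 3
                yield '1'        # emit one output symbol

For example, ''.join(square_transducer('abc')) is '1' * 9.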
It is easy to argue that if a log space transducer produces an output f(x) then |f(x)| ≤ |x|^k for
some k. We say A ≤L B if there is such a log space computable function f with x ∈ A if and only
if f(x) ∈ B. We would now like to show:
Theorem: If A ≤L B and B ∈ L then A ∈ L.
This is not trivial. Let us first try a simple solution and see why it does not work.
We know that A ≤L B; thus there is a log space transducer N computing f such that x ∈ A if and
only if f(x) ∈ B. Furthermore, we know that B ∈ L, so there is a TM M that uses O(log n) space
and accepts B. Consider the following TM T:
1. On input x.
2. Compute y = f(x).
3. Run M on y and accept if M accepts y.
It is clear that T accepts x if and only if x ∈ A. But how much space does T use?
The first step is to compute f(x) and write it down. Although it takes only O(log n) work space to
compute f(x), writing down f(x) can require a lot of space. Thus this is NOT a log space TM. We
need a new idea.
The new idea is to use what I call the man-in-the-middle trick: we stand between N and M and
never store the string that passes between them. We observe that a TM reads one symbol at a
time and then performs one step. Thus, in order to simulate the TM M, we only need to know
the current symbol under its tape head, not the entire y.
We can now simulate M as follows. To perform one step of the simulation, we find out which
symbol of y is currently being scanned by M, by rerunning N on x to compute just that one
symbol.
Thus, if M needs the i-th symbol of y, we run N on x and increment a counter every time N
produces an output symbol, erasing each symbol while the counter is less than i. This way the
output never occupies space. When the i-th symbol is produced, we use it to perform one step
of our simulation.
Thus, by using a few extra counters, the space requirement of this TM is O(log n) + O(log |y|).
The length of y is at most n^k, so the whole procedure runs in log space.
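A minimal sketch of this recomputation trick, reusing the generator-style transducer from above (the function names are illustrative):

    def nth_output_symbol(transducer, x, i):
        # Re-run the transducer from scratch; count the symbols it emits and
        # keep only the i-th one, discarding everything else.
        for count, sym in enumerate(transducer(x)):
            if count == i:
                return sym
        return None   # i is past the end of f(x)

To simulate M on y = f(x) we never materialize y: whenever M's head sits on position i, we call nth_output_symbol(N, x, i). We pay in time (recomputing f over and over) to save space.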
Notice that we would not be able to prove this theorem if we were using polynomial time
reducibility. Thus log-space reducibility is what is needed to study L and NL.
A central question in space complexity is whether L = NL. This is a major unresolved question;
a solution to this problem means instant fame and glory!
It is easy to see that if P = NP then NP = co-NP. On the other hand, if someone proved that
NP = co-NP, that would be a great insight but would not resolve the P vs. NP conjecture.
Similarly, one way to resolve the L vs. NL conjecture would be to show that NL ≠ co-NL.
However, there is a twist in the tale: we will prove that NL = co-NL. This is a major result in
space complexity; the L vs. NL question itself, however, remains open.
Theorem: NL = co-NL.
Let us first observe that PATH is in NL; we have to give a non-deterministic algorithm for it. We
will then show that PATH is NL-complete, which implies that its complement is co-NL-complete.
Since we will also show that the complement of PATH is in NL, this will tell us that NL = co-NL.
In PATH, given two vertices s and t in a graph, we have to decide whether there is a path from s
to t. This problem is also called s–t-connectivity.
We want to show that
1. PATH is in NL.
2. PATH is NL-complete.
To see that PATH is in NL we recall that non-determinism allows us to guess. Thus, given s and
t, we can try to guess a path from s to t and accept if we find one. However, we are only allowed
a small amount of space, so we have to be a bit more careful.
The algorithm "guesses" the path one vertex at a time, together with a step counter; it does not
bother to remember the previous vertices of the path, thereby saving space.
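One branch of this machine can be modeled in Python as follows; random.choice stands in for the nondeterministic guess, and the graph is assumed to map each vertex to a list of successors.

    import random

    def one_branch_path(graph, s, t, n):
        # Stores only the current vertex and a step counter: O(log n) bits.
        v = s
        for _ in range(n):
            if v == t:
                return True
            successors = list(graph.get(v, ()))
            if not successors:
                return False                  # dead end: this branch rejects
            v = random.choice(successors)     # guess the next vertex of the path
        return v == t

PATH is in NL because, whenever a path from s to t exists, at least one sequence of guesses (one branch) returns True.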
Now we would like to show that PATH is NL-complete. Let A ∈ NL; then there is a non-
deterministic LOGSPACE TM M that accepts A.
On an input x the NTM M performs a computation. Let us think of making a graph as follows:
1. The vertices of the graph are the configurations of M on the input x.
2. A configuration c is connected to c′ if and only if c yields c′ in one step.
To find out whether M accepts x, all we have to do is determine whether, from the vertex
corresponding to the initial configuration, we can reach a vertex corresponding to an accepting
configuration.
Note that each configuration c of the NTM is specified by
1. the contents of the work tape,
2. the position of the head reading the work tape, and
3. the position of the head reading the input tape
(together with the state of the machine). Thus configurations are given by c = (w, i, j), where w
is the contents of the work tape and the machine is currently reading the i-th symbol of the work
tape and the j-th symbol of the input.
Since M runs in LOGSPACE, all such configurations can be enumerated in LOGSPACE. It is
also easy to see that for any given configuration c we can compute in LOGSPACE the
configurations reachable from c in one step. Thus we can output the edges of this whole graph.
Let us denote by G_M(x) the graph described above, and consider the following algorithm:
1. On input x.
2. Compute G_M(x).
3. Output ⟨G_M(x), ci, cf⟩, where ci is the initial configuration and cf the accepting configuration
(we may assume M has a unique one).
This is a log space reduction from A to PATH, so PATH is NL-complete.
Next we will show that the complement of PATH is in NL. This is not a simple theorem; we have
to first understand the problem very carefully.
Our goal is to design a non-deterministic TM (algorithm) M that takes as input a graph G and
two vertices s and t, with the following properties:
1. If s is connected to t then all branches of M must reject.
2. If s is not connected to t then at least one branch of M must accept.
3. M must run in (non-deterministic) LOGSPACE.
What is the major difficulty? Non-deterministic TMs are very good at finding things; this time,
however, a non-deterministic TM has to accept when there is no path from s to t. So it is a
counter-intuitive problem from the point of view of non-determinism.
This is a difficult problem, so we start by solving a much simpler one. Let us say we are given
the following information:
1. We are given an input graph G = (V, E) and two vertices s and t.
2. We are told a number c: the number of vertices in G that are reachable from s.
3. We have to design a LOGSPACE NTM.
4. The TM will reject on all branches when s is connected to t.
5. At least one branch will accept when s is not connected to t.
This problem is easier to solve. Let us look at an overview of a non-deterministic TM that solves
it, and then we will improve it.
1. Input G, s, t and a number c (it is promised that exactly c vertices are reachable from s).
2. Guess c vertices v1, . . . , vc.
3. If t = vi for some i, reject.
4. For i = 1, . . . , c:
5.   Guess a path from s to vi.
6.   Verify the path by checking the edges of G.
7.   If the path is not valid, reject.
8. Accept.
This machine solves the problem. Let us see why! For it to reach the accept state, it must verify
that at least c vertices are reachable from s and that t is not one of them. Furthermore, if t is not
reachable from s and the machine makes all the right guesses, it will accept. Thus at least one
branch accepts if and only if t is not reachable from s. But how much space does this algorithm
take?
Storing c vertices and whole paths is too space-wasteful. So we use the trick we used for
PATH: we do not need to store the whole set {v1, . . . , vc}, and similarly we do not need to store
a whole path; we can guess them as we go along, thus saving space. Let us work these ideas
out clearly.
Let V = {v1, . . . , vn} be the set of vertices. What we want to do is choose a subset of these
vertices. A subset can be encoded as a 0/1 vector: if (g1, . . . , gn) is a 0/1 vector then
S = {vi : gi = 1} is a subset of V.
What we can do is guess the bits of this vector one at a time: if gi = 1 we are guessing that vi is
in S, and otherwise that it is not. Also, as we did previously, we can forget all the previous bits
and only remember how many of them were 1; this way we will know the size of S at the end of
the loop.
1. Input G, s, t and a number c (it is promised that exactly c vertices are reachable from s).
2. Set d = 0.
3. For each vertex x ∈ V:
4.   Guess a bit g.
5.   If g = 1 (the algorithm has guessed that x is reachable from s):
6.     Set the current vertex v = s (if x = s we may skip directly to step 12).
7.     For i = 1, . . . , n (if the loop ends without reaching x, reject):
8.       Guess a vertex w ∈ V;
9.       if (v, w) ∉ E, reject;
10.      else if w = x, exit the For i loop;
11.      v = w;
12.    If x = t, reject;
13.    d = d + 1;
14. If d = c then accept; else reject.
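One branch of this refined machine can be modeled in Python as follows (random.choice again stands in for the guesses; the graph maps vertices to successor lists, and the names are illustrative):

    import random

    def one_branch_unreachable(graph, vertices, s, t, c):
        # Accepts (returns True) only on branches that exhibit all c vertices
        # reachable from s and observe that t is not among them.
        n = len(vertices)
        d = 0
        for x in vertices:
            if random.choice((0, 1)) == 1:        # guess: x is reachable from s
                v = s
                if x != s:                         # x = s is trivially reachable
                    for _ in range(n):
                        w = random.choice(vertices)    # guess the next path vertex
                        if w not in graph.get(v, ()):
                            return False               # invalid step: reject
                        v = w
                        if v == x:
                            break
                    if v != x:
                        return False                   # never reached x: reject
                if x == t:
                    return False                       # t is reachable: reject
                d += 1
        return d == c

Only the loop indices, the current vertex and the counter d are stored, so the space used is logarithmic.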
This algorithm tells us that, given a graph G = (V, E) and a source s, if we can compute in
non-deterministic LOGSPACE the number of vertices reachable from s, then we can solve the
complement of PATH in non-deterministic LOGSPACE.
Now our task is a bit simpler. What we want is a non-deterministic algorithm that does the
following:
1. On input G and s.
2. At least one branch outputs the number c of vertices reachable from s.
3. All other branches reject.
If we can accomplish this task, we can run the previous algorithm on the branches that output c,
and we are done. However, solving this problem requires another, similar idea. We will study
that idea in the next lecture.
Theorem: NL = co-NL.
Thus, by showing that a single problem belongs to NL, we will have shown that two complexity
classes are equal.
In the previous lecture we studied a non-deterministic algorithm that almost did the job, but it
required an extra piece of information:
1. We are given an input graph G = (V, E) and two vertices s and t.
2. We are told a number c: the number of vertices in G that are reachable from s.
3. The TM rejects on all branches when s is connected to t.
4. At least one branch accepts when s is not connected to t.
That algorithm tells us that, given a graph G = (V, E) and a source s, if we can compute in
non-deterministic LOGSPACE the number of vertices reachable from s, then we can solve the
complement of PATH in non-deterministic LOGSPACE. So what we want is a non-deterministic
algorithm that does the following:
1. On input G and s.
2. At least one branch of M should output the number c of vertices reachable from s.
3. All other branches should reject.
Today we will design this algorithm. Let us start. We first break the problem into simpler pieces
(by now you should be used to this!). Given a graph G and a vertex s, let Ri denote the set of
vertices reachable from s via paths of length at most i, and let ci = |Ri|. Let us note that c0 = 1:
the only vertex reachable from s in 0 steps is s itself.
Now, suppose we can design an algorithm that, given ci as input, computes ci+1 correctly on at
least one branch while all other branches reject. Then we can chain this algorithm to eventually
compute cn−1 from c0.
A few remarks about the algorithm: the indices are only there for clarity, and we can forget the
previous ci's, so the space requirements stay logarithmic.
Now we show how to compute ci+1 from ci. For that, let us see when a vertex is in Ri+1. We
observe that a vertex x is in Ri+1 if and only if
1. it is in Ri, or
2. there is a vertex w ∈ Ri such that (w, x) is an edge of the graph.
Let us start with a simple idea and then refine it; we will use the previous ideas again and again.
What we want is to count the number of vertices in Ri+1, and there is a simple way to do this:
1. count = 0;
2. For each vertex v ∈ V:
3.   if v ∈ Ri+1, count = count + 1;
This idea does not work as stated! We do not know Ri (so we cannot test membership in Ri+1
directly); we only know its size ci. So we do something cleverer. We have non-determinism at
our disposal, and we can use it over and over again. Let us try the following for a fixed vertex v:
1. For each w ∈ V:
2.   Guess whether w ∈ Ri.
3.   If guessed yes, check that w ∈ Ri; if the check fails, reject.
4.   If w = v or (w, v) is an edge, then v ∈ Ri+1.
5. Otherwise v ∉ Ri+1.
The problem is: how do we check whether w ∈ Ri? Here we use our old trick.
Note that the subset Ri can be specified by n bits (r1, . . . , rn), where rj = 1 if and only if vj ∈ Ri.
Suppose we could guess such a bit vector (r1, . . . , rn).
A simplified version of this program is as follows.
1. count = 0;
2. For w = 1, . . . , n:
3.   Guess rw ∈ {0, 1}.
4.   If rw = 1 then:
5.     Verify (by guessing a path) that there is a path of length ≤ i from s to w.
6.     If the verification fails, reject.
7.     count = count + 1;
8. If count ≠ ci, reject (the guess of the rw's was wrong).
Crucially, we can extend this program to count the size of Ri+1, which is what we wanted to
show. Look at this program again: it is a highly non-deterministic program, but it only uses
counters! It is one of the most wonderful programs, even though its significance is purely
theoretical.
Let us now write the full version; we will not use indices anymore, to make it clear that only
counters are used.
1. C = 1; (we know that c0 = 1)
2. For i = 1, . . . , n − 1:
3.   (C holds ci−1 = |Ri−1| and we want to compute D = ci)
4.   D = 0;
5.   For v = 1, . . . , n: (for each vertex, let us check whether it is in Ri)
6.     b = 0;
7.     count = 0;
8.     For w = 1, . . . , n:
9.       Guess rw ∈ {0, 1}; if rw = 1:
10.        Verify (by guessing a path) that there is a path of length at most i − 1 from s to w.
11.        If the verification fails, reject;
12.        else count = count + 1;
13.        if v = w or (w, v) ∈ E then b = 1;
14.     if count ≠ C, reject.
15.     D = D + b;
16.   C = D;
At the end of the algorithm we have computed cn−1, the total number of vertices reachable from
s; its value is stored in D (and copied into C). Note that one of the loops is not shown explicitly in
this program: the loop that guesses a path of length at most i − 1. Thus this is really a program
with four nested loops! It is quite complicated.
Once we have computed cn−1 we know how many vertices are reachable from s, and we can
feed that number to the program of the previous lecture to obtain the non-deterministic machine
we require. This completes the proof.
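To make the loop structure concrete, here is a deterministic Python mirror of the program above (names and representation are illustrative). The helper reachable_within does deterministically, by BFS, what the NL machine does by guessing a path; the assert marks the spot where the NTM rejects branches whose guessed bits rw are wrong.

    from collections import deque

    def reachable_within(adj, s, w, i):
        # True iff w is reachable from s via a path of length at most i.
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            if dist[u] == i:
                continue
            for v in adj.get(u, ()):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return w in dist

    def inductive_count(adj, vertices, s):
        # Computes c_{n-1}, the number of vertices reachable from s, carrying
        # the count C = c_{i-1} from one round to the next.
        n = len(vertices)
        C = 1                                    # c_0 = 1: only s itself
        for i in range(1, n):
            D = 0                                # will become c_i
            for v in vertices:
                b = 0
                count = 0
                for w in vertices:
                    if reachable_within(adj, s, w, i - 1):   # NTM: guess r_w, verify
                        count += 1
                        if v == w or v in adj.get(w, ()):
                            b = 1                # v belongs to R_i
                assert count == C                # NTM: reject wrong guesses here
                D += b
            C = D
        return C

For adj = {1: [2], 2: [3]} with vertices = [1, 2, 3] and s = 1, inductive_count returns 3.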
Computability
The most important achievement of computability theory is a formal definition of an algorithm.
The Church-Turing thesis asserts that the intuitive notion of algorithm is captured by the
mathematically precise notion of a Turing machine. This allows us to formally talk about
algorithms and prove things about computations and algorithms.
A major part of this achievement is the realization that we can encode the description of a TM.
Another important idea is that we can build a TM that simulates another TM by reading its
description.
We say that a language L is decidable if there exists a TM that halts on all inputs and accepts
exactly L. This gives us a precise definition of the notion of an algorithm.
We note that ATM is the language accepted by the universal TM; however, the universal TM is
not a decider. Thus the natural question that arises is: is ATM decidable? The main technique
here was diagonalization, and we proved Turing's amazing theorem that ATM is undecidable.
The amazing thing about this theorem is that it points to something which probably would not
have crossed our minds: well defined computational questions can be undecidable.
Based on the undecidability of the halting problem, we were able to prove that several other
problems are undecidable. A generic result called Rice's theorem states that all non-trivial
semantic properties of TMs are undecidable.
We looked at a very natural question that turned out to be undecidable: the Post
Correspondence Problem. It seemed to be a mere puzzle, but it is undecidable. Thus problems
which "naturally pop up" can also be undecidable.
We then started with the following puzzle: can one write a program that prints itself? We
answered this in the affirmative, and moreover we developed a technique showing that "a
program can have access to its own description". This statement was formalized in the
recursion theorem, which turns out to be a very nice tool for proving undecidability results.
We also looked at problems that arise from mathematical logic; in particular, the decidability of
logical theories. We showed, using a very non-trivial algorithm, that Presburger arithmetic is
decidable. On the other hand, we were able to show that Peano arithmetic (the one that we are
used to) is undecidable. This was a shocking and wonderful result.
Lastly we showed that any computable property that holds for almost all strings can fail on only
finitely many incompressible strings. Thus incompressible strings look random from the point of
view of any computable property.
Complexity
Then we changed our focus and started studying complexity theory. Now we study computable
problems and ask what resources are required to solve them, that is, which problems are
solvable in practice.
We precisely defined time and space complexity. We also noted that k-tape TMs can be
simulated by a 1-tape TM with a quadratic slowdown.
We defined non-deterministic and deterministic time and space complexity. This led us to two
naturally defined classes: the class P and the class NP.
The class P captures the problems that can be solved in "practice", while the class NP contains
many practical problems that we would like to solve. We asked:
Question: Is P = NP?
The P vs. NP question is one of the most celebrated open questions in computer science; it is
one of the Millennium Prize problems.
The study of P and NP has led to the theory of NP-completeness. When we prove a problem
NP-complete we show that if it can be solved in polynomial time then all problems in NP can be
solved in polynomial time. Thus NP-completeness provides hard evidence that a problem is
hard; however, it does not provide a proof (at least not while the P vs. NP question remains open).
One of the most spectacular theorems in this area is the Cook-Levin theorem.
Theorem (Cook-Levin): SAT is NP-complete.
Using the Cook-Levin theorem we were able to prove that several other problems are NP-
complete. These problems come from graph theory (independent set, coloring, vertex cover),
arithmetic (subset sum), logic (SAT) and so on. Thus the NP-complete problems form a rich class.
We then defined the deterministic and non-deterministic space complexity classes PSPACE
and NPSPACE. The class PSPACE captures the problems that can be solved in reasonable
space, and NPSPACE is its natural non-deterministic analogue. We asked whether
PSPACE = NPSPACE, and this question has an answer:
Theorem (Savitch): PSPACE = NPSPACE.
The study of P and PSPACE led to the theory of PSPACE-completeness. When we prove a
problem PSPACE-complete we show that if it can be solved in polynomial time then all
problems in PSPACE can be solved in polynomial time. Thus it provides hard evidence that the
problem is hard; however, it does not provide a proof.
Theorem: TQBF is PSPACE-complete.
In the last part of the course we studied the classes L and NL, space complexity classes where
the space requirement is logarithmic. We proved that PATH is NL-complete.
We noticed that Savitch's technique does not settle the relationship between L and NL. The big
surprise, however, was the last theorem we proved, NL = co-NL, which gave us an example of a
non-deterministic complexity class that is closed under complementation.
What more to study? What have we missed? Theory of computation is a wonderful subject and
it is impossible to study it in one course. There are many exciting things that you can look at in
the future.
These include:
1. Randomized Complexity Classes.
2. The polynomial time hierarchy.
3. Hardness of Approximation.
4. Quantum computation and quantum complexity.
5. Cryptography.
6. and the list goes on...
You can read papers in the following conference proceedings:
1. STOC
2. FOCS
3. CCC
A few journals that you may want to look at, to see what is going on in complexity theory, are:
1. Journal of Computational Complexity
2. Theoretical Computer Science
3. Journal of the ACM