Chapter 2. Context-Free Languages and Pushdown Automata
1. Pushdown automata
See Sipser §2.2 for more information about this section.
The memory that we add to our machines will be in the form of a stack: we can put letters on top of the stack (called pushing) and read and remove letters from the top of the stack (called popping), but we can't read any more of the stack than the top letter. A good example to think of is a stack of plates in a cafeteria: we can put plates on the top, and take plates from the top, but trying to take plates from further down can lead to problems...
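The push and pop operations can be sketched in Python, using a list whose final entry plays the role of the top of the stack:

```python
stack = []            # the empty stack

stack.append("a")     # push: "a" is now the top letter
stack.append("b")     # push: "b" is now the top letter

top = stack.pop()     # pop: reads and removes the top letter
print(top)            # -> b
print(stack)          # -> ['a']  ("a" is the top again)
```

Only the last entry of the list is ever inspected, mirroring the rule that we cannot read below the top of the stack.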
3. Γ is a finite set, the stack alphabet: the word on the stack at every point is a word from Γ*.
This diagram is similar to the ones used to describe a DFA or NFA; the only difference is the labels on the arrows. We write a, b → c to indicate that when the machine reads an a from the input, it can pop a b from the stack and push a c onto the stack.
• If a = ε, then P1 can make this move without reading any letters from the input word. If a ≠ ε, but the next letter of the input word is not a, then this branch of computation terminates (just like an NFA).
• If b = ε, then P1 does not pop anything from the stack. If b ≠ ε, but the top letter on the stack is not b, then this branch of computation terminates.
• If c = ε, then P1 does not push anything onto the stack.
Since P1 is quite small, here is a formal description of it as well.
The transitions are given by the following table. An entry ∅ in row qj and column (b, c) means that P1 rejects that path of computation: remember that when P1 has finished reading the input word, it can stop using the transition function.

Input:            0                        1                        ε
Pop from stack:   0   $   ε                0   $   ε                0   $   ε
q0                ∅   ∅   ∅                ∅   ∅   ∅                ∅   ∅   {(q1, $)}
q1                ∅   ∅   {(q1, 0)}        {(q2, ε)}   ∅   ∅        ∅   ∅   ∅
q2                ∅   ∅   ∅                {(q2, ε)}   ∅   ∅        ∅   {(q3, ε)}   ∅
q3                ∅   ∅   ∅                ∅   ∅   ∅                ∅   ∅   ∅
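The table can be simulated directly. The Python sketch below explores every branch of P1's computation by breadth-first search over configurations; the choice of accept states {q0, q3} is an assumption (the accept states are not repeated in the table), matching the version of this machine in Sipser §2.2.

```python
from collections import deque

# Transition table of P1, as above:
# (state, input letter or "", popped letter or "") -> set of (new state, pushed letter or "")
delta = {
    ("q0", "", ""):   {("q1", "$")},
    ("q1", "0", ""):  {("q1", "0")},
    ("q1", "1", "0"): {("q2", "")},
    ("q2", "1", "0"): {("q2", "")},
    ("q2", "", "$"):  {("q3", "")},
}

ACCEPT = {"q0", "q3"}   # assumed accept states (cf. Sipser's Example 2.14)

def accepts(word):
    """Breadth-first search over configurations (state, input position, stack)."""
    seen, queue = set(), deque([("q0", 0, ())])    # stack as a tuple, top at the end
    while queue:
        state, pos, stack = queue.popleft()
        if (state, pos, stack) in seen:
            continue
        seen.add((state, pos, stack))
        if pos == len(word) and state in ACCEPT:
            return True                            # some branch accepts
        for (s, a, b), moves in delta.items():
            if s != state:
                continue
            if a and (pos == len(word) or word[pos] != a):
                continue                           # next input letter is not a
            if b and (not stack or stack[-1] != b):
                continue                           # top of the stack is not b
            rest = stack[:-1] if b else stack
            for (t, c) in moves:
                queue.append((t, pos + len(a), rest + ((c,) if c else ())))
    return False                                   # every branch terminated

print([w for w in ["", "01", "0011", "001", "10"] if accepts(w)])   # -> ['', '01', '0011']
```

With these accept states the simulator accepts exactly the words 0^n 1^n, as expected.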
The PDA P2 first pushes all of the as onto the stack. It then uses nondeterminism
to test two things at once. One branch of the computation tests whether the number
of as is equal to the number of bs, and then ignores the cs. The other branch ignores
the bs, and then tests whether the number of as is equal to the number of cs. The
PDA P2 tests equality of the numbers of letters by popping one a from the stack
for each letter (b or c) it reads, and checking whether it reaches the bottom of the
stack at the same time as it finishes reading the bs or cs, depending on the branch
of computation. If either branch matches, then the PDA accepts.
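The two branches can also be phrased as a direct membership test. A minimal Python sketch, assuming (as in Example 1.5) that the language is {a^i b^j c^k : i = j or i = k}:

```python
import re

def in_L(word):
    """Mirror P2's nondeterminism: the word must look like a^i b^j c^k, and
    then one branch checks i == j (ignoring the c's) while the other checks
    i == k (ignoring the b's)."""
    m = re.fullmatch(r"(a*)(b*)(c*)", word)
    if m is None:
        return False                   # letters out of order: both branches die
    i, j, k = (len(g) for g in m.groups())
    return i == j or i == k            # accept if either branch accepts

print([w for w in ["abc", "aabbc", "aabcc", "abb", "cba"] if in_L(w)])  # -> ['abc', 'aabbc', 'aabcc']
```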
Here is a diagram of P2 :
Exercise: Show that the language L from Example 1.5 is not regular.
In Example 1.5 we used the full power of nondeterminism to simultaneously test i = j or i = k. It can be shown that nondeterminism makes a difference to PDAs: there are languages (including the one from Example 1.5) that can be recognised by a (nondeterministic) PDA, but that cannot be recognised by a machine that is like a PDA but deterministic. We will not study such deterministic machines here.
2. Context-free languages
See Sipser §2.1 for more information on this section.
We saw in Chapter 1 that there are three equivalent ways to define regular
languages: as the languages recognised by DFAs, as the languages recognised by
NFAs, and as the languages described by a regular expression.
We now give a new way of defining languages, called context-free grammars (CFGs). We will see later that this is another way of defining the class of languages accepted by PDAs. Context-free grammars differ from regular expressions in that, rather than being static objects which either do or don't match a given word, they consist of sets of rules from which we make words.
Definition 2.1. A context-free grammar (CFG) is a 4-tuple (N, Σ, R, S), where
1. N is a finite set, the variables (or nonterminals);
2. Σ is a finite set, disjoint from N, the terminals;
3. R is a finite set of production rules, each of the form A → w, where A ∈ N and w is a word over N ∪ Σ; and
4. S ∈ N is the start variable.
Variables are normally written with upper case letters and terminals with lower case letters or numbers. If we have more than one rule with the same left side we usually group them together: instead of writing, for example, A → w1 and A → w2, we write A → w1 | w2. This means that either rule can be applied.
Example 2.2. Our first context-free grammar G1 is ({S}, {0, 1}, R, S), where R contains two production rules: R = {S → 0S1 | ε}.
A derivation of a word w from G1 is a sequence S ⇒ w1 ⇒ w2 ⇒ · · · ⇒ w, in which each word is obtained from the previous one by applying a production rule. We write S ⇒* w if some derivation from S ends in w.
We see that S ⇒* 0^n 1^n for all n ≥ 0. Furthermore, these are all of the words that can be derived from S, so L(G1) = {0^n 1^n : n ≥ 0}, and this language is context-free.
Example 2.4. Consider the CFG G2, which has variables {R, S, T, X}, terminals {a, b} and start symbol R. The production rules of G2 are
R → XRX | S
S → aTb | bTa
T → XTX | X | ε
X → a | b.
Let's try some derivations. If we want to make words in L(G2) then our derivations should start from R.
1. R ⇒ S ⇒ aTb ⇒ ab
2. R ⇒ S ⇒ bTa ⇒ bXa ⇒ baa
3. R ⇒ XRX ⇒ XXRXX ⇒ XXSXX ⇒ XXbTaXX ⇒* babaXX ⇒* babaaa
T ⇒* X^n T X^n for all n = 1, 2, . . ., and hence
X^n T X^n ⇒ X^{2n+1} (applying T → X), and
X^n T X^n ⇒ X^{2n} (applying T → ε).
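Leftmost replacement of a variable is simple to mechanise, which lets us replay derivations such as those above. A Python sketch using the rules of G2:

```python
RULES = {                       # the production rules of G2
    "R": ["XRX", "S"],
    "S": ["aTb", "bTa"],
    "T": ["XTX", "X", ""],
    "X": ["a", "b"],
}

def step(form, variable, replacement):
    """One derivation step: replace the leftmost occurrence of `variable`
    in the sentential form by one of its right-hand sides."""
    assert replacement in RULES[variable], "not a rule of G2"
    return form.replace(variable, replacement, 1)

# Replay derivation 2 above: R => S => bTa => bXa => baa
form = "R"
for v, r in [("R", "S"), ("S", "bTa"), ("T", "X"), ("X", "a")]:
    form = step(form, v, r)
print(form)   # -> baa
```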
Context-free grammars were originally used in the study of human (natural) languages. An important application is the compilation and specification of programming languages. Designers of compilers and interpreters often start by obtaining a grammar for the language, and most compilers and interpreters contain a parser, which extracts the meaning of the program before either generating the compiled code or running the program. A parser is constructed (either automatically or by hand) from the CFG that describes the language.
Corollary 3.2. The class of regular languages is a proper subset of the class of
context-free languages.
Proof. We showed in Theorem 1.4 that the regular languages are a proper subset of the languages recognised by PDAs. The result now follows from Theorem 3.1. ∎
We won't prove Theorem 3.1, but will give the method of going from a context-free grammar to a pushdown automaton, which gives an idea of how one might prove one direction.
Let the context-free grammar be G = (N, Σ, R, S). We will design a pushdown automaton P which mimics all possible leftmost derivations of G. (A leftmost derivation is one where at every step the leftmost remaining variable is replaced. One can show that if w ∈ L(G) then there is a leftmost derivation which produces w.) The PDA P uses nondeterminism to mimic all of them at once. It will accept an input word w if one of the derivations produces w.
The PDA P will use its stack to store each step of the derivations: it will
pop the variable that it wants to replace, and push on the right hand side of the
corresponding rule of the CFG. This is described in Step 3, below, where we show
how to push words of length greater than 1 onto the stack.
There is one further complication, due to the fact that P can only read the top letter of the stack: the next variable that P wishes to replace could be hidden under some terminals. The PDA deals with this by matching the terminals at the top of the stack to the letters of w, as they are produced: reading the stack from top to bottom, these will appear in the same order as they are in w. More specifically, if P is in state qloop (see Step 1, below), and there is a terminal a ∈ Σ on the top of the stack, P reads the next letter b of w. If a = b then P pops a off the stack: see Step 4, below. If a ≠ b, then this branch of computation rejects w.
1. Start with four states: q0, q1, qloop and qaccept. Make q0 the start state, and qaccept the only accept state.
Most of the work of P happens at qloop. Whenever P is in qloop the stack will contain the result of a derivation in G (except with some initial terminals missing, if they have already been matched with the input word w).
2. Add a transition ε, ε → $ from q0 to q1, and a transition ε, ε → S from q1 to qloop. So P begins by pushing the end-of-stack marker $ and then the start variable S onto the stack.
3. For each rule A → w of G, add transitions which pop A and push the word w onto the stack, starting from the last letter of w, so that the first letter of w ends up on top. If w = w1 w2 · · · wk with k > 1, this needs a chain of new states: the chain starts at qloop with a transition labelled ε, A → wk, continues with transitions ε, ε → wk−1, and so on, and ends back at qloop having pushed w1.
4. For each terminal a ∈ Σ, add a transition a, a → ε from qloop to itself: P reads a from the input and pops a matching a from the stack.
5. Finally, add a transition ε, $ → ε from qloop to qaccept: if P sees the end-of-stack marker $, then the derivation on the stack has been completely matched with the input word, so P can accept.
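For a concrete instance of this construction, here is a Python sketch that builds the transition set of P from the rules of a grammar. To keep the sketch short, each right-hand side is pushed in a single move rather than through the chain of states described in Step 3; the step numbers in the comments refer to the text above.

```python
def cfg_to_pda(rules, terminals, start="S"):
    """Build the transitions of the PDA P from a CFG given as a dict
    {variable: [right-hand sides]}.  A transition (state, input, pop) ->
    (state, push) pushes a whole word in one move, first letter on top."""
    delta = [
        (("q0", "", ""), ("q1", "$")),        # push the end-of-stack marker $
        (("q1", "", ""), ("qloop", start)),   # then push the start variable
    ]
    for variable, rhss in rules.items():
        for rhs in rhss:                      # Step 3: pop a variable, push a
            delta.append((("qloop", "", variable), ("qloop", rhs)))  # right-hand side
    for a in terminals:                       # Step 4: match a terminal on top
        delta.append((("qloop", a, a), ("qloop", "")))  # of the stack with the input
    delta.append((("qloop", "", "$"), ("qaccept", "")))  # marker reached: accept
    return delta

# The grammar G1 from Example 2.2:
for t in cfg_to_pda({"S": ["0S1", ""]}, "01"):
    print(t)
```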
Example 3.3. We show how to turn the CFG G1 from Example 2.2 into a PDA.
Here’s what the PDA looks like after Steps 1 and 2:
Here’s the final PDA. The chains of states for each rule that we added in Step 3
have been labelled:
Example 3.4. Consider the CFG ({S, T }, {a, b}, R, S), where R is
S → aTb | b
T → Ta | ε.
There are four rules in R here, so the PDA has four chains added in Step 3 (but
two of them are very short!).
Lemma 4.1 (Pumping Lemma for context-free languages). Let L be a context-free language. Then there is a number N, called the pumping length, such that every word w ∈ L with |w| ≥ N can be written as w = uvxyz, where:
1. |vxy| < N,
2. v and y are not both ε, and
3. uv^i xy^i z ∈ L for all i ≥ 0.
Condition 1 means that although we don't know how long the first part u of w is, we do know that vxy can't be too long. Condition 2 says that we're not allowed to choose v and y to both be ε (otherwise Condition 3 would trivially hold).
The proof of Lemma 4.1 is not terribly complicated, but it relies on the concept
of a parse tree, which has not been introduced in these notes. Hence we omit its
proof, and instead just give some examples of how to use it.
Example 4.2. Prove that the language L = {a^n b^n c^n : n ≥ 1} is not context-free.
We assume that L is context-free, and get a contradiction. Since L is context-free, the Pumping Lemma for context-free languages applies. Let N be the pumping length. Consider the word w = a^N b^N c^N. Then w ∈ L and |w| > N. We must show that no matter how we write w = uvxyz, if Conditions 1 and 2 hold then there exists an i such that uv^i xy^i z ∉ L.
By Condition 1, |vxy| < N, so vxy cannot be of the form a^i b^N c^j. Hence either vxy is a power of one of a, b or c; or vxy is of the form a^i b^j or b^i c^j (for some i, j > 0).
If either v or y is equal to a^i b^j or b^i c^j (with both i and j greater than 0), then uv^2 xy^2 z ∉ L(a* b* c*), and so uv^2 xy^2 z ∉ L.
Hence v and y are both powers of single letters. Then at most two of a, b, c appear in v and y. At least one of v and y is not ε, by Condition 2, so at least one of a, b, c appears in v and y. So in uv^2 xy^2 z the number of occurrences of at least one of a, b, c has increased, while the number of at least one other has not changed; hence uv^2 xy^2 z does not contain equal numbers of as, bs and cs, and so uv^2 xy^2 z ∉ L.
Thus Condition 3 of the Pumping Lemma is violated, contradicting our assumption that L is context-free. Hence L is not context-free.
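For small N the case analysis above can be checked exhaustively by machine. A Python sketch, pumping only with i = 0 and i = 2 as a finite stand-in for Condition 3:

```python
def decompositions(w, N):
    """All ways of writing w = uvxyz with |vxy| < N and v, y not both empty
    (Conditions 1 and 2 of the Pumping Lemma)."""
    n = len(w)
    for s in range(n + 1):                            # vxy = w[s:e]
        for e in range(s, min(n, s + N - 1) + 1):     # |vxy| = e - s < N
            for i in range(s, e + 1):                 # v = w[s:i]
                for j in range(i, e + 1):             # x = w[i:j], y = w[j:e]
                    if i > s or j < e:                # v and y not both empty
                        yield w[:s], w[s:i], w[i:j], w[j:e], w[e:]

def never_pumps(w, in_lang, N):
    """True if every decomposition of w already violates Condition 3
    at i = 0 or i = 2."""
    return not any(
        all(in_lang(u + v * i + x + y * i + z) for i in (0, 2))
        for u, v, x, y, z in decompositions(w, N)
    )

def in_L(w):                                          # w in {a^n b^n c^n : n >= 1}?
    n = len(w) // 3
    return n >= 1 and w == "a" * n + "b" * n + "c" * n

N = 4
print(never_pumps("a" * N + "b" * N + "c" * N, in_L, N))   # -> True
```

Running the same check on a word of the context-free language {0^n 1^n} instead finds decompositions that do pump, as the lemma predicts.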
Example 4.3. Prove that the language L2 = {ss : s ∈ {0, 1}*} is not context-free.
We assume that L2 is context-free, and get a contradiction. Since L2 is context-free, the Pumping Lemma for context-free languages applies. Let N be the pumping length.
The first issue is finding a good choice of w = ss, so that when we pump it we leave the language L2. For example, one possibility might be w0 = 0^N 1 0^N 1, but by writing w0 as uvxyz where v = 0^i in the first subword 0^N, and y = 0^i in the second subword 0^N, this choice of w can in fact be pumped.
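The claim that w0 can be pumped is easy to verify concretely. A Python sketch, taking v and y to be blocks 0^i on either side of the first 1 (the values N = 6 and i = 2 are arbitrary small choices):

```python
def in_L2(w):
    """Membership in L2 = {ss : s in {0,1}*}."""
    half = len(w) // 2
    return len(w) % 2 == 0 and w[:half] == w[half:]

N, i = 6, 2
# w0 = 0^N 1 0^N 1, split so that v = 0^i ends the first block of 0's, x is
# the middle 1, and y = 0^i starts the second block.  Then |vxy| = 2i + 1 and
# v, y are nonempty, so Conditions 1 and 2 hold whenever 2i + 1 < N.
u, v, x, y, z = "0" * (N - i), "0" * i, "1", "0" * i, "0" * (N - i) + "1"
assert u + v + x + y + z == "0" * N + "1" + "0" * N + "1"

# Pumping v and y together keeps the two halves equal, so we never leave L2:
print(all(in_L2(u + v * k + x + y * k + z) for k in range(6)))   # -> True
```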
A better choice of w is 0^N 1^N 0^N 1^N. Then w ∈ L2 and |w| > N. We must consider all ways of writing w = uvxyz so that Conditions 1 and 2 of the Pumping Lemma both hold, and show that Condition 3 always fails. Condition 3 implies that there exists an s1 ∈ {0, 1}* such that w1 = uv^2 xy^2 z = s1 s1 ∈ L2.
First suppose that uvxy is a subword of 0^N 1^N (the beginning of w). Notice that the first letter of w1 is 0, so s1 begins with 0. If v = 0^i and y = 0^j, then uv^2 xy^2 z = 0^{N+i+j} 1^N 0^N 1^N, with i + j > 0 by Condition 2, and i + j < N by Condition 1. Then counting back from the end of w1, we see that the first letter of s1 is 1, contradicting our earlier observation that the first letter of s1 is 0. If v = 1^i and y = 1^j then uv^2 xy^2 z = 0^N 1^{N+i+j} 0^N 1^N, and we get the same contradiction. Similarly, if v = 0^i and y = 1^j then uv^2 xy^2 z = 0^{N+i} 1^{N+j} 0^N 1^N, and we get the same contradiction. Finally, if either of v or y is of the form 0^i 1^j, then uv^2 xy^2 z must finish with 1^N 0^N 1^N, so again the first letter of s1 can be neither a 0 nor a 1.
Instead suppose that vxyz is a subword of 0^N 1^N (the end of w). Then reasoning as in the previous case, but considering now the last letter of s1 (which must be a 1, since the last letter of w is a 1), we get a similar contradiction.
Hence vxy must be contained in the middle subword 1^N 0^N of w, and vxy must contain at least one 1 and at least one 0. But then uv^0 xy^0 z = uxz = 0^N 1^i 0^j 1^N, where i < N or j < N or both, and so uxz ∉ L2.
Thus Condition 3 of the Pumping Lemma is violated in every case, contradicting our assumption that L2 is context-free. Hence L2 is not context-free.