Chapter 12

Uploaded by

Kevin May
Context-Free Grammars

This chapter corresponds to chapter 12 of the second edition of Cohen (1997).

Grammars
In COS2601 we met several ways in which languages can be defined, namely
regular expressions and machines like FAs and TGs. In this chapter we
introduce another way to define languages — context-free grammars (CFGs).
What is a grammar? A grammar is a language generator in the sense that
it produces words that belong to a certain language. It is actually a set of
rules by which the valid words of the language are constructed.
A context-free grammar consists of three things:

• an alphabet Σ of letters called terminals,

• a set of nonterminals, which always includes the symbol S, and

• a finite set of rules called productions.

We usually take the alphabet Σ to be {a, b}. Actually, we do it so often
that you may take it to be {a, b} unless stated otherwise. The set of
nonterminals is usually not mentioned either since it is easy to figure out
what it is from the productions.
The idea with a CFG is that we start with a production with the symbol
S on the left-hand side and then substitute strings for the occurrences of
nonterminals on the right-hand side. Each nonterminal can be replaced by
the right-hand side of any production that has this nonterminal on its
left-hand side. The set of words, i.e. the strings of terminals, that can
be obtained in this fashion is the language which is generated (also defined,
derived or produced) by this CFG. When we say that a language is generated
by a CFG, we mean that the CFG can generate all words in the language and
no other words (as in the case of languages generated by regular expressions
in COS2601). It is important to remember this when you are required to
give a CFG that generates a given language (again similar to the case where
a regular expression had to be given that would produce a given language).
Thus in the example dealing with the language EVEN-EVEN, Cohen first
shows that every word in the language can be generated by the given grammar
and then that every word generated by the grammar is in EVEN-EVEN.
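The substitution process described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the grammar
S -> aSb | ab is a hypothetical example (it is not taken from this chapter),
uppercase letters stand for nonterminals, and the names PRODUCTIONS and
generate are invented here. The sketch always rewrites the leftmost
nonterminal and discards working strings longer than max_len, which is safe
only because this grammar has no productions that shorten a working string.

```python
from collections import deque

# Hypothetical example grammar (an assumption, not from the text):
#   S -> aSb | ab
PRODUCTIONS = {"S": ["aSb", "ab"]}


def generate(productions, max_len, start="S"):
    """Return all words of length <= max_len derivable from `start`.

    Assumes no production has an empty right-hand side, so a working
    string never gets shorter and long strings can be pruned safely.
    """
    words = set()
    seen = {start}
    queue = deque([start])
    while queue:
        working = queue.popleft()
        # Position of the leftmost nonterminal, or None if there is none.
        pos = next((i for i, c in enumerate(working) if c in productions), None)
        if pos is None:
            words.add(working)  # only terminals left: this is a word
            continue
        for rhs in productions[working[pos]]:
            new = working[:pos] + rhs + working[pos + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(words, key=lambda w: (len(w), w))


print(generate(PRODUCTIONS, 6))  # ['ab', 'aabb', 'aaabbb']
```

Listing the words this way only shows what the grammar produces up to a
length bound. Showing that a CFG generates a given language still requires
an argument in both directions, as in the EVEN-EVEN example.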
The class of languages that can be generated by CFGs is called the class of
context-free languages. Context-free grammars are important because the
syntax of most programming languages, for example Pascal, can be defined by
a CFG. A compiler exploits this fact to determine the structure of programs
written in these languages, which then enables it to convert such a program
into assembly language.
The class of regular languages is a subset of the class of context-free
languages. Note that Pascal is not a regular language; one cannot build an
FA that will accept all and only the valid Pascal statements.

Trees
The use of trees can help us to clarify certain properties of CFGs. We first
look at syntax trees (also called parse, generation, derivation and production
trees). Such a tree is just a different way to represent the derivation of a word
from a given CFG. Its nodes represent terminals and nonterminals from a
grammar with the root node representing the start symbol S and the children
of each internal node representing the symbols that replace that nonterminal
in the derivation. The terminals are represented by the leaf nodes of the tree.
A useful application of syntax trees in Computer Science is the
representation of arithmetic expressions in operator prefix notation. This
should ring a bell, since you have done prefix, postfix and infix notation
in the second-year programming modules. Note also that what Cohen calls the
tree-walking method is a preorder traversal of the syntax tree.
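The link between traversal order and notation can be sketched as follows.
The tree below is a simplified expression tree with operators at the
internal nodes, which is an assumption made for brevity (a full syntax tree
would also carry the nonterminals of the grammar). The expression
(3 + 4) * 5 and the names TREE, preorder and postorder are hypothetical.

```python
# A node is either a leaf (a string) or a tuple (operator, left, right).
# Hypothetical example expression (not from the text): (3 + 4) * 5
TREE = ("*", ("+", "3", "4"), "5")


def preorder(node):
    """Visit the node first, then its subtrees: gives prefix notation."""
    if isinstance(node, str):
        return [node]
    op, left, right = node
    return [op] + preorder(left) + preorder(right)


def postorder(node):
    """Visit the subtrees first, then the node: gives postfix notation."""
    if isinstance(node, str):
        return [node]
    op, left, right = node
    return postorder(left) + postorder(right) + [op]


print(" ".join(preorder(TREE)))   # * + 3 4 5
print(" ".join(postorder(TREE)))  # 3 4 + 5 *
```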
Sometimes a word generated by a given CFG has two or more derivations
that correspond to different syntax trees. Such CFGs are called ambiguous
(see for example Figure 1). It is important to realise that it is a CFG
that is ambiguous, not a context-free language. Syntax trees help us to
distinguish between genuinely different derivations of the same word in an
ambiguous CFG: derivations that differ only in the order in which
productions are applied share a syntax tree, whereas ambiguity requires
two distinct syntax trees for the same word.
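One way to make this concrete is to count the syntax trees a grammar admits
for a given word. The sketch below is an illustration under stated
assumptions: the grammar S -> aS | Sa | a is a hypothetical example (it is
not the grammar of Figure 1), uppercase letters are nonterminals, and the
counting method terminates only because no production has an empty
right-hand side or consists of a single nonterminal.

```python
from functools import lru_cache

# Hypothetical example grammar (an assumption, not from the text):
#   S -> aS | Sa | a
GRAMMAR = {"S": ("aS", "Sa", "a")}


@lru_cache(maxsize=None)
def count_from_symbols(symbols, word):
    """Number of ways the symbol sequence `symbols` can derive `word`."""
    if not symbols:
        return 1 if not word else 0
    first, rest = symbols[0], symbols[1:]
    if first not in GRAMMAR:  # terminal: must match the next letter
        if word.startswith(first):
            return count_from_symbols(rest, word[1:])
        return 0
    total = 0
    # Every remaining symbol needs at least one letter of the word.
    for i in range(1, len(word) - len(rest) + 1):
        left = count_trees(first, word[:i])
        if left:
            total += left * count_from_symbols(rest, word[i:])
    return total


@lru_cache(maxsize=None)
def count_trees(nonterminal, word):
    """Number of distinct syntax trees for `word` rooted at `nonterminal`."""
    return sum(count_from_symbols(rhs, word) for rhs in GRAMMAR[nonterminal])


print(count_trees("S", "a"))   # 1
print(count_trees("S", "aa"))  # 2, so this grammar is ambiguous
```

A single word with a count above 1 is enough to show that a grammar is
ambiguous. The language here is just a+, which the unambiguous grammar
S -> aS | a also generates, so the ambiguity belongs to the grammar and not
to the language.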
Why are we interested in whether a grammar is ambiguous or not? The
reason is that the intended meaning of a derived string can be in doubt if
the grammar is ambiguous. Let us consider the context-free programming
language Pascal as an example. Pages 303-305 of Brookshear [1] discuss in
detail the problems we will encounter if we try to write a compiler for
Pascal that is based on an ambiguous grammar. If we write a compiler to
parse the words generated by a certain grammar, we have to ensure that
the grammar is unambiguous. We can use syntax trees to show that a given
grammar is ambiguous.
[1] Brookshear, J.G. Theory of Computation: Formal Languages, Automata,
and Complexity. The Benjamin/Cummings Publishing Company, 1989.

Another useful type of tree is a total language tree, which is a way to
represent (in a single tree) all the words in a language, along with the way in
which they are derived. A total language tree is always defined relative to a
given CFG. The internal nodes (nonterminal nodes) of a total language tree
are all the possible working strings obtained by applying the productions
of the CFG. The terminal nodes or leaves of the tree represent the words
generated by the relevant CFG. Total language trees are strange creatures.
Some branches of such a tree may be infinitely long and consequently the tree
may become arbitrarily wide. However, every node in a total language tree
will always have a finite number of branches coming out of it. (Why? Think
about the fact that a grammar always has a finite number of productions.)
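The first few levels of a total language tree can be built mechanically, as
the following sketch suggests. It rests on assumptions: the grammar
S -> aa | bX | aXX, X -> ab | b is a hypothetical example, uppercase letters
are nonterminals, and the children of a working string are taken to be all
results of applying one production to one nonterminal occurrence. The names
children, tree_levels and is_word are invented for this illustration.

```python
# Hypothetical example grammar (an assumption, not from this chapter):
#   S -> aa | bX | aXX,   X -> ab | b
PRODUCTIONS = {"S": ["aa", "bX", "aXX"], "X": ["ab", "b"]}


def children(working, productions):
    """All working strings obtained by applying one production once."""
    result = []
    for pos, symbol in enumerate(working):
        for rhs in productions.get(symbol, []):
            result.append(working[:pos] + rhs + working[pos + 1:])
    return result


def tree_levels(productions, depth, start="S"):
    """Nodes of the total language tree, level by level, down to `depth`."""
    levels = [[start]]
    for _ in range(depth):
        levels.append(
            [c for node in levels[-1] for c in children(node, productions)]
        )
    return levels


def is_word(node, productions):
    """A leaf of the tree: a string that contains terminals only."""
    return not any(symbol in productions for symbol in node)


levels = tree_levels(PRODUCTIONS, 3)
print(levels[1])  # ['aa', 'bX', 'aXX']
print(levels[2])  # ['bab', 'bb', 'aabX', 'abX', 'aXab', 'aXb']
words = {n for level in levels for n in level if is_word(n, PRODUCTIONS)}
print(sorted(words))
```

The children function also hints at why branching is finite: a working
string has finitely many symbols and the grammar has finitely many
productions, so the two nested loops always stop. For a grammar with a
recursive production such as S -> aS, tree_levels would keep producing new
nodes at every depth, which is why the sketch takes an explicit depth bound.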

[Figure 1: Two syntax trees]
