Chapter 12
Grammars
In COS2601 we met several ways in which languages can be defined, namely
regular expressions and machines like FAs and TGs. In this chapter we
introduce another way to define languages — context-free grammars (CFGs).
What is a grammar? A grammar is a language generator in the sense that
it produces words that belong to a certain language. It is actually a set of
rules by which the valid words of the language are constructed.
A context-free grammar consists of three things:
compiler exploits this fact to determine the structure of programs written
in these languages, which then enables it to convert such a program into
assembly language.
The class of regular languages is a subset of the class of context-free
languages. Note that Pascal is not a regular language; one cannot build an
FA that will accept all and only the valid Pascal statements.
Trees
The use of trees can help us to clarify certain properties of CFGs. We first
look at syntax trees (also called parse, generation, derivation and production
trees). Such a tree is just a different way to represent the derivation of a word
from a given CFG. Its nodes represent terminals and nonterminals from a
grammar with the root node representing the start symbol S and the children
of each internal node representing the symbols that replace that nonterminal
in the derivation. The terminals are represented by the leaf nodes of the tree.
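As a minimal sketch of this idea, the Python fragment below stores a syntax tree as nested (symbol, children) pairs. The grammar S -> aSb | ab and all helper names are assumptions chosen purely for illustration; they are not taken from the prescribed book. Reading the leaves from left to right gives the derived word, and every internal node must be expanded according to some production of the grammar.

```python
# Hypothetical grammar for illustration: S -> aSb | ab
GRAMMAR = {"S": [["a", "S", "b"], ["a", "b"]]}

def leaf(symbol):
    """A leaf node: a terminal with no children."""
    return (symbol, [])

def tree_yield(tree):
    """Concatenate the leaves from left to right to get the derived word."""
    symbol, children = tree
    if not children:
        return symbol
    return "".join(tree_yield(child) for child in children)

def is_syntax_tree(tree, grammar):
    """Check that every internal node is expanded by one of its productions."""
    symbol, children = tree
    if not children:
        return symbol not in grammar  # leaves must be terminals
    right_side = [child[0] for child in children]
    return right_side in grammar.get(symbol, []) and all(
        is_syntax_tree(child, grammar) for child in children
    )

# Syntax tree for the derivation S => aSb => aabb
tree = ("S", [leaf("a"), ("S", [leaf("a"), leaf("b")]), leaf("b")])
word = tree_yield(tree)
valid = is_syntax_tree(tree, GRAMMAR)
print(word, valid)
```

The root is the start symbol S, the inner S node records the second step of the derivation, and the four leaves spell the word aabb.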
A useful application of syntax trees in Computer Science is the
representation of arithmetic expressions in operator prefix notation. This
should ring a bell, since you have done prefix, postfix and infix notation in
the second-year programming modules. Note also that what Cohen calls the
tree-walking method is a preorder traversal of the syntax tree.
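The connection between preorder traversal and prefix notation can be sketched as follows. The Node class and the example expression (3 + 4) * 5 are assumptions made for illustration, and the tree is the simplified form in which operators sit at the internal nodes and operands at the leaves.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """A binary expression tree node (hypothetical helper type)."""
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def preorder(node):
    """Visit the node first, then its left subtree, then its right subtree."""
    if node is None:
        return []
    return [node.label] + preorder(node.left) + preorder(node.right)

# Tree for the infix expression (3 + 4) * 5
tree = Node("*", Node("+", Node("3"), Node("4")), Node("5"))
prefix = " ".join(preorder(tree))
print(prefix)  # * + 3 4 5
```

Because each operator is written before its operands, no parentheses are needed in the resulting prefix string.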
Sometimes a word generated by a given CFG has two or more derivations that
correspond to different syntax trees. Such CFGs are called ambiguous (see, for
example, Figure 1). It is important to realise that it is a CFG that is
ambiguous, not a context-free language. Syntax trees help us to distinguish
between genuinely different derivations of the same word in an ambiguous CFG:
derivations that differ only in the order in which the productions are applied
share one syntax tree, whereas genuinely different derivations have different
syntax trees.
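For a small grammar, one way to look for ambiguity is to count the syntax trees of a given word. The sketch below does this by counting leftmost derivations, since each syntax tree corresponds to exactly one leftmost derivation. The two grammars and the function name count_trees are assumptions for illustration only, and the pruning step assumes that no production has an empty right-hand side and that no nonterminal can be rewritten to itself through single-nonterminal productions.

```python
def count_trees(grammar, start, target):
    """Count the leftmost derivations (= syntax trees) of target."""
    def count(working):
        # Every symbol yields at least one terminal, so longer strings fail.
        if len(working) > len(target):
            return 0
        for i, symbol in enumerate(working):
            if symbol in grammar:
                break                  # leftmost nonterminal found at position i
            if symbol != target[i]:
                return 0               # terminal prefix disagrees with target
        else:
            # No nonterminals left: the working string is a word.
            return 1 if "".join(working) == target else 0
        return sum(
            count(working[:i] + tuple(right_side) + working[i + 1:])
            for right_side in grammar[symbol]
        )
    return count((start,))

# Hypothetical grammars that both generate the words a, aa, aaa, ...
AMBIGUOUS = {"S": ["aS", "Sa", "a"]}
UNAMBIGUOUS = {"S": ["aS", "a"]}

print(count_trees(AMBIGUOUS, "S", "aaa"))    # 4
print(count_trees(UNAMBIGUOUS, "S", "aaa"))  # 1
```

A count greater than 1 for some word shows that the grammar is ambiguous. The second grammar generates the same language with exactly one tree per word, which illustrates that ambiguity is a property of the grammar and not of the language.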
Why are we interested in whether a grammar is ambiguous or not? The
reason is that the intended meaning of a derived string can be in doubt if
the grammar is ambiguous. Let us consider the context-free programming
language Pascal as an example. Brookshear (pages 303-305) discusses in detail
the problems we will encounter if we try to write a compiler for Pascal that
is based on an ambiguous grammar [1]. If we write a compiler to
parse the words generated by a certain grammar, we have to ensure that
the grammar is unambiguous. We can use syntax trees to show that some
grammar is ambiguous.
[1] BROOKSHEAR, J.G. Theory of Computation: Formal Languages, Automata, and
Complexity. The Benjamin/Cummings Publishing Company, 1989.
Another useful type of tree is a total language tree, which is a way to
represent (in a single tree) all the words in a language, along with the way in
which they are derived. A total language tree is always defined relative to a
given CFG. The internal nodes (nonterminal nodes) of a total language tree
are all the possible working strings obtained by applying the productions
of the CFG. The terminal nodes or leaves of the tree represent the words
generated by the relevant CFG. Total language trees are strange creatures.
Some branches of such a tree may be infinitely long and consequently the tree
may become arbitrarily wide. However, every node in a total language tree
will always have a finite number of branches coming out of it. (Why? Think
about the fact that a grammar always has a finite number of productions.)
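The first few levels of a total language tree can be generated mechanically. The sketch below is only an illustration under stated assumptions: the grammar S -> aS | b is hypothetical, every symbol is a single character, and the children of a working string are taken to be all strings obtained by applying one production to one nonterminal occurrence.

```python
def children(working, grammar):
    """All working strings reachable by applying one production once."""
    result = []
    for i, symbol in enumerate(working):
        for right_side in grammar.get(symbol, []):
            result.append(working[:i] + right_side + working[i + 1:])
    return result

def tree_levels(grammar, start, depth):
    """Return the nodes of the total language tree, level by level."""
    level = [start]
    levels = [level]
    for _ in range(depth):
        level = [child for node in level for child in children(node, grammar)]
        levels.append(level)
    return levels

# Hypothetical grammar: S -> aS | b
GRAMMAR = {"S": ["aS", "b"]}

for depth, nodes in enumerate(tree_levels(GRAMMAR, "S", 3)):
    print(depth, nodes)
```

Here the branch S, aS, aaS, aaaS, ... never ends, while the leaves b, ab, aab, ... are the words of the language. A node has at most (length of the working string) x (number of productions) children, which is why the number of branches leaving any node is always finite.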
[Figure: two syntax trees drawn side by side, each with root S.]