Syntax Analysis: Role of Parsers
Role of Parsers
A parser for a grammar ‘G’ is a program that takes as input a string ‘w’ and
produces as output either a parse tree for ‘w’ if ‘w’ is a sentence of G or an
error message indicating that ‘w’ is not a sentence of G.
• It checks that the tokens appearing in its input, which is the output of
the lexical analyzer, occur in patterns that are permitted by the
specification for the source language.
• It also imposes on the tokens a tree-like structure that is used by the
subsequent phases of the compiler.
For example, suppose a PL/I program contains an expression such as A + / B.
After lexical analysis, this expression might appear to the syntax analyzer
as the token sequence:
id + / id
On seeing the / after +, the syntax analyzer should detect an error, because
two adjacent binary operators violate the rules for PL/I expressions.
[Figure: two parse trees for the expression A / B * C, labeled (a) and (b).
Each tree has Expression nodes over the same operands but groups them
differently: one as (A / B) * C and the other as A / (B * C).]
The language specification must state which of the interpretations (a) and
(b) is to be used and, in general, what hierarchical structure each source
program has. These rules form the syntactic specification of a programming
language.
Types of Parsers
There are two types of parsers:
1. Bottom-up parser, and
2. Top-down parser.
1 Bottom-up Parser
These build parse trees from the bottom (the leaves) to the top (the root). A
bottom-up parser is also called a shift-reduce parser because it shifts input
symbols onto a stack until the right side of a production appears on top of
the stack.
The right side may then be replaced by (reduced to) the symbol on the left
side of the production. Operator-precedence parsers are one kind of
shift-reduce parser; LR parsers are a more general kind.
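The shift and reduce steps can be illustrated with a deliberately naive
sketch. It assumes a small grammar chosen for illustration (E -> E + T | T,
T -> id) and always reduces as soon as some right side is on top of the
stack, trying longer right sides first. Real operator-precedence and LR
parsers consult tables to decide between shifting and reducing, so this is
not how they are implemented.

```python
# Assumed illustrative grammar: E -> E + T | T,  T -> id
# The longest right side is listed first so that the greedy reducer
# prefers E -> E + T over E -> T when both could apply.
PRODUCTIONS = [
    ("E", ["E", "+", "T"]),
    ("T", ["id"]),
    ("E", ["T"]),
]

def try_reduce(stack):
    """Reduce once if a right side is on top of the stack; report success."""
    for lhs, rhs in PRODUCTIONS:
        if len(stack) >= len(rhs) and stack[-len(rhs):] == rhs:
            del stack[-len(rhs):]   # pop the right side ...
            stack.append(lhs)       # ... and push the left side
            return True
    return False

def shift_reduce(tokens):
    """Return the list of actions taken, or raise SyntaxError."""
    stack, actions = [], []
    for tok in tokens:
        stack.append(tok)
        actions.append(f"shift {tok}")
        while try_reduce(stack):
            actions.append(f"reduce -> {stack[-1]}")
    if stack != ["E"]:
        raise SyntaxError(f"cannot reduce {stack} to E")
    return actions

for action in shift_reduce(["id", "+", "id"]):
    print(action)
```

For the erroneous token sequence id + / id, the stack can never be reduced
to the single symbol E, so `shift_reduce` raises `SyntaxError`.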
2 Top-down Parser
These start with the root and work down to the leaves. This method is called
recursive-descent parsing; its table-driven form is called a predictive
parser, and a special kind of predictive parser is the LL parser.
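A recursive-descent parser can be sketched with one function per
non-terminal. The example below assumes a small grammar without left
recursion, E -> T { + T } and T -> id | ( E ), and a hypothetical tuple
representation for the resulting tree; neither comes from the original text.

```python
def parse(tokens):
    """Parse a token list using the assumed grammar
    E -> T { '+' T },  T -> 'id' | '(' E ')'.
    Returns a nested tuple tree or raises SyntaxError."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {peek()!r}")
        pos += 1

    def parse_e():
        # E -> T { '+' T }: start at the root and work down to the leaves.
        node = parse_t()
        while peek() == "+":
            expect("+")
            node = ("+", node, parse_t())
        return node

    def parse_t():
        # T -> '(' E ')' | 'id'
        if peek() == "(":
            expect("(")
            node = parse_e()
            expect(")")
            return node
        expect("id")
        return "id"

    tree = parse_e()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse(["id", "+", "(", "id", "+", "id", ")"]))
# ('+', 'id', ('+', 'id', 'id'))
```

Given the token sequence id + / id, `parse_t` finds / where it expects id or
an opening parenthesis and raises `SyntaxError`.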
Ambiguous Grammar
A grammar that produces more than one parse tree for the same sentence is
said to be ambiguous. Equivalently, an ambiguous grammar is one that produces
more than one leftmost or more than one rightmost derivation for the same
sentence. For some types of parsers, it is desirable that the grammar be
made unambiguous; if it is not, the parser cannot uniquely determine which
parse tree to select for a sentence.
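Ambiguity can be made concrete by counting parse trees. The following sketch
assumes the small ambiguous grammar E -> E + E | E * E | id (chosen here for
illustration) and counts the distinct parse trees for a token list by trying
every operator as the top-level split point. The function name is
hypothetical.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees under the assumed grammar E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct ways tokens[i:j] can be derived from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                # tokens[k] is the operator at the root of this subtree.
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees(["id", "+", "id"]))             # 1
print(count_parse_trees(["id", "+", "id", "*", "id"]))  # 2
```

A count greater than one for any sentence shows that the grammar is
ambiguous; here id + id * id has two parse trees, one for each grouping.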
For example, consider the following grammar for arithmetic expressions
involving +, -, *, /, and ↑ (exponentiation):
E → E + E | E - E | E * E | E / E | E ↑ E | ( E ) | -E | id
The operators, in decreasing order of precedence, are:
- (unary minus)
↑
* /
+ -
Thus, using these precedence and associativity rules, the ambiguous grammar
given above can be simplified as follows:
Step I:   E → E | id
Step II:  E → -E | id
Step III: E → -E ↑ E | id
Step IV:  E → -E ↑ E * E | id
Step V:   E → -E ↑ E * E / E | id
Step VI:  E → -E ↑ E * E / E + E | id
Step VII: E → -E ↑ E * E / E + E - E | id
E → E + E | E * E | ( E ) | -E | id        (1)
In general, a grammar involves four quantities:
- Terminals
- Non-terminals
- Start symbol, and
- Productions
Terminals:
Terminals are the basic symbols from which strings in the language are
composed. The term “token name” is a synonym for “terminal”. For example: +,
id, *, (, ), a, b.
Non-terminals:
Non-terminals are syntactic variables that denote sets of strings. The sets
of strings denoted by non-terminals help define the language generated by
the grammar. For example, E, F, and T are the non-terminals in the following
productions:
E → F + T
F → T * E
T → ( E ) | id
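Productions:
The productions of a grammar specify how terminals and non-terminals can be
combined to form strings. Each production consists of a non-terminal (the
left side), an arrow, and a string of terminals and non-terminals (the right
side).
Start Symbol:
One non-terminal is designated as the start symbol. The set of strings it
denotes is the language generated by the grammar.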
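The four quantities can be held in a simple data structure. This is a
minimal sketch: the `Grammar` class and `make_grammar` helper are
hypothetical names, the productions are the example ones for E, F, and T,
and the choice of E as start symbol is an assumption, since the text does
not name one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    terminals: frozenset
    non_terminals: frozenset
    start: str
    productions: tuple  # pairs (left side, right side as a tuple of symbols)

def make_grammar(productions, start):
    """Derive the symbol sets from the productions themselves."""
    # Every symbol that appears on a left side is a non-terminal.
    non_terminals = frozenset(lhs for lhs, _ in productions)
    # Every other symbol used on a right side is a terminal.
    symbols = {sym for _, rhs in productions for sym in rhs}
    terminals = frozenset(symbols - non_terminals)
    return Grammar(terminals, non_terminals, start, tuple(productions))

g = make_grammar(
    [
        ("E", ("F", "+", "T")),
        ("F", ("T", "*", "E")),
        ("T", ("(", "E", ")")),
        ("T", ("id",)),
    ],
    start="E",  # assumed start symbol, not stated in the text
)

print(sorted(g.non_terminals))  # ['E', 'F', 'T']
print(sorted(g.terminals))      # ['(', ')', '*', '+', 'id']
```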
Notational Conventions
To avoid always having to state that “these are non-terminals”, or “these are
terminals”, and so on, some notational shorthands shall be employed. These
may be summarized as follows:
2) The following symbols are usually terminals:
a. Lower-case letters such as a, b, c.
b. Operator symbols such as +, -, etc.
c. Punctuation symbols such as parentheses, comma, etc.
d. The digits 0, 1, ..., 9.
e. Bold-face strings such as id or if.
Each interior node of the parse tree is labeled by some non-terminal A, and
the children of the node are labeled, from left to right, by the symbols in
the right side of the production by which this A is replaced in the
derivation. For example, if A → XYZ is a production used at some step of a
derivation, then the parse tree for that derivation will have the following
subtree:
      A
    / | \
   X  Y  Z
The leaves of the parse tree are labeled by non-terminals or terminals and,
read from left to right, they constitute a sentential form called the yield
or frontier of the tree.
For example, the parse tree for -(id + id) implied by the derivation is shown
as follows:
          E
         / \
        -   E
          / | \
         (  E  )
          / | \
         E  +  E
         |     |
         id    id
Parse tree for -(id + id)
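The yield (frontier) of this tree can be computed by collecting the leaf
labels from left to right. The sketch below assumes a hypothetical encoding
in which each node is a pair (label, list of children) and a leaf has an
empty child list.

```python
def frontier(node):
    """Return the leaf labels of a parse tree, read from left to right."""
    label, children = node
    if not children:
        return [label]
    leaves = []
    for child in children:
        leaves.extend(frontier(child))
    return leaves

def leaf(symbol):
    """Build a leaf node (hypothetical helper for this sketch)."""
    return (symbol, [])

# Parse tree for -(id + id): E -> -E, E -> (E), E -> E + E, E -> id
tree = ("E", [
    leaf("-"),
    ("E", [
        leaf("("),
        ("E", [("E", [leaf("id")]), leaf("+"), ("E", [leaf("id")])]),
        leaf(")"),
    ]),
])

print(" ".join(frontier(tree)))  # - ( id + id )
```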
The parse tree ignores variations in the order in which symbols are replaced.
These variations in the order in which productions are applied can also be
eliminated by considering only leftmost (or rightmost) derivations. It is not
hard to see that every parse tree has associated with it a unique leftmost
and a unique rightmost derivation. However, it should not be assumed that
every sentence necessarily has only one parse tree or only one leftmost or
rightmost derivation.
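The claim that a parse tree determines a unique leftmost and a unique
rightmost derivation can be illustrated by reading both off one tree. This
is a sketch under the same assumed encoding of a node as a pair (label, list
of children); the `derivation` function is a hypothetical helper.

```python
def derivation(tree, rightmost=False):
    """List the sentential forms of the leftmost (or rightmost)
    derivation that corresponds to the given parse tree."""
    form = [tree]  # current sentential form, kept as a list of nodes
    steps = [" ".join(node[0] for node in form)]
    while True:
        # Positions of nodes that still have children to expand.
        expandable = [i for i, node in enumerate(form) if node[1]]
        if not expandable:
            return steps
        idx = expandable[-1] if rightmost else expandable[0]
        form[idx:idx + 1] = form[idx][1]  # replace the node by its children
        steps.append(" ".join(node[0] for node in form))

def leaf(symbol):
    return (symbol, [])

# Parse tree for -(id + id)
tree = ("E", [
    leaf("-"),
    ("E", [
        leaf("("),
        ("E", [("E", [leaf("id")]), leaf("+"), ("E", [leaf("id")])]),
        leaf(")"),
    ]),
])

print(" => ".join(derivation(tree)))
print(" => ".join(derivation(tree, rightmost=True)))
```

The two derivations differ only in the step where E + E is expanded: the
leftmost derivation passes through - ( id + E ), the rightmost through
- ( E + id ). Both end in the same sentence.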