Compiler Design Lec-Three Syntax Analysis
Syntax Analysis
Outline
Parsing
Types of Parsing
Top-down Parsing
Bottom-Up Parsing
Stack Implementation of Shift Reduce Parsing
Error Recovery
Parser Generator
ROLE OF A PARSER (Syntax Analyzer)
Syntax Analyzer is also known as Parser.
Syntax analysis is the second phase of a compiler.
As we saw in chapter two, a lexical analyzer identifies tokens with
the help of regular expressions and pattern rules.
As the second phase, a syntax analyzer or parser takes the input from
a lexical analyzer in the form of token streams.
The parser analyzes the source code (token stream) against the
production rules to detect any errors in the code.
The parser obtains a string of tokens from the lexical analyzer and
reports a syntax error if there is one; otherwise it generates a syntax tree.
In this way, the parser accomplishes two tasks:
Parsing the code and looking for errors,
Generating a parse tree as the output of the phase.
Context-Free Grammar (CFG)
In compilation, the parser obtains a string of tokens from the lexical
analyzer and is expected to parse the whole code even if some errors
exist in the program.
A lexical analyzer cannot check the syntax of a given sentence because
of the limitations of regular expressions.
Regular expressions cannot check balanced tokens, such as
parentheses.
Therefore, the syntax analysis phase uses Context-Free Grammar
(CFG), which is recognized by push-down automata.
The syntax of a language is specified by a context-free grammar
(CFG).
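As a hedged illustration of why a stack is needed, the sketch below recognises balanced parentheses, i.e. the language of the illustrative grammar S → ( S ) S | ε (this grammar and the function name is_balanced are assumptions, not part of the lecture). The explicit list plays the role of the push-down automaton's stack.

```python
def is_balanced(tokens):
    """Recognise the language of S -> ( S ) S | epsilon with an explicit stack."""
    stack = []
    for tok in tokens:
        if tok == "(":
            stack.append(tok)        # remember an unmatched opening parenthesis
        elif tok == ")":
            if not stack:            # closing parenthesis with nothing to match
                return False
            stack.pop()
        else:
            return False             # symbol outside the alphabet { (, ) }
    return not stack                 # accept only if every "(" was matched


print(is_balanced("(()())"))  # True
print(is_balanced("(()"))     # False
```

The nesting depth is unbounded, so no fixed number of states can track it; that is the limitation of regular expressions referred to above.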
CFG is a helpful tool in describing the syntax of programming
languages.
The rules in a CFG are mostly recursive.
A syntax analyzer checks whether a given program satisfies the rules
implied by a CFG or not.
If it satisfies, the syntax analyzer creates a parse tree for the given
program.
A Context-Free Grammar is a quadruple G = (V, T, P, S) with four
components: a set of non-terminals V, a set of terminals T, a set of
productions P, and a start symbol S.
Strings are derived from the start symbol by repeatedly replacing
a non-terminal (initially the start symbol) by the right side of a
production for that non-terminal.
Example of context-free grammar: The following grammar defines
simple arithmetic expressions:
In this grammar,
Notational Conventions
To avoid always having to state that "these are the terminals," "these
are the non-terminals," and so on, the following notational conventions
for grammars will be used throughout the remainder of this lecture.
Syntax Tree versus Parse Tree
A syntax tree is a variant of a parse tree in which each leaf node
represents an operand and each interior node represents an operator.
A syntax tree contains only meaningful information.
A parse tree may be viewed as a graphical representation of a
derivation that filters out the choice regarding replacement order.
Each interior node of a parse tree is labeled by some non-terminal A,
and the children of that node are labeled, from left to right, by the
symbols in the right side of the production by which this A was
replaced in the derivation.
The leaves of the parse tree are terminal symbols.
A parse tree also contains information that later phases do not need.
Grammar: E → E * E | E + E | id
String: id + id * id or a + b * c
[Figure: syntax tree and parse tree for the string id + id * id]
Derivation
Types of Derivation
There are two kinds of derivation:
1. Leftmost Derivation
A derivation of a string in a grammar G is a leftmost
derivation if at every step the leftmost non-terminal is
replaced.
Here, the sentential form is scanned and replaced
from left to right.
A sentential form derived by a leftmost derivation is called a
left-sentential form.
2. Rightmost Derivation
A derivation of a string in a grammar G is a rightmost derivation
if at every step the rightmost non-terminal is replaced.
Here, the sentential form is scanned and replaced from
right to left.
A sentential form derived by a rightmost derivation is called a
right-sentential form.
N.B:
In a leftmost derivation, the leftmost non-terminal is always
processed first, whereas in a rightmost derivation, the rightmost
non-terminal is always processed first.
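The difference between the two orders can be sketched in a few lines, using the grammar E → E * E | E + E | id from the earlier slide. The helper name derive and the list-of-right-hand-sides input format are hypothetical choices made for this illustration.

```python
NONTERMINALS = {"E"}   # grammar: E -> E * E | E + E | id


def derive(start, steps, leftmost=True):
    """Apply each right-hand side in `steps` to the leftmost (or rightmost)
    non-terminal and return every sentential form that is produced."""
    form = [start]
    history = [" ".join(form)]
    for rhs in steps:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        i = positions[0] if leftmost else positions[-1]
        form[i:i + 1] = rhs.split()          # replace that one non-terminal
        history.append(" ".join(form))
    return history


left = derive("E", ["E + E", "id", "E * E", "id", "id"], leftmost=True)
right = derive("E", ["E + E", "E * E", "id", "id", "id"], leftmost=False)
print(" => ".join(left))
print(" => ".join(right))
```

Both runs end in id + id * id; only the order in which the non-terminals are replaced differs, which is why the intermediate sentential forms are not the same.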
Parse Tree
It is a graphical depiction of a derivation.
It is also convenient to see how strings are derived from the start
symbol.
The start symbol of the derivation becomes the root of the parse tree.
All leaf nodes are terminals and all interior nodes are non-terminals.
E ⇒ E * E
  ⇒ E + E * E
  ⇒ id + E * E
  ⇒ id + id * E
  ⇒ id + id * id
[Figure: Steps 1 to 5, the parse tree growing as each step of the derivation is applied]
Ambiguous Grammar
A grammar that produces more than one parse tree for some sentence
is said to be an ambiguous grammar.
Put another way, an ambiguous grammar is one that produces more
than one leftmost derivation or more than one rightmost derivation for
the same sentence.
Example: the production rules generate two parse trees for
the string id - id + id.
Here, two leftmost derivations are possible for the string a + a * a;
hence, the above grammar is ambiguous.
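One way to see ambiguity concretely is to count parse trees. The sketch below does this for the grammar E → E + E | E * E | id used earlier in the lecture; the function name count_parse_trees and the token-list input are assumptions made for the illustration, and the counting works only for this particular grammar shape.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count the parse trees of `tokens` under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways to derive tokens[lo:hi] from E.
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        for k in range(lo + 1, hi - 1):
            if tokens[k] in ("+", "*"):      # operator at the root of the tree
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))


print(count_parse_trees("id + id * id".split()))  # 2
```

A count greater than one for any sentence is enough to show that the grammar is ambiguous.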
A language for which every grammar is ambiguous is known as
inherently ambiguous.
No general algorithm exists to detect or remove ambiguity
automatically, so it has to be resolved by hand.
Ambiguity can be removed either by re-writing the whole
grammar without ambiguity, or by setting and following
associativity and precedence constraints.
Associativity
If an operand has operators on both sides, the side on which the
operator takes this operand is decided by the associativity of those
operators.
If the operation is left-associative, the operand is taken by
the left operator; if the operation is right-associative, the right
operator takes the operand.
Operations such as addition, multiplication, subtraction, and
division are left-associative.
For example, if the expression contains id op id op id, it will be
evaluated as (id op id) op id, e.g. (id + id) + id.
Operations such as exponentiation are right-associative, so
id ^ id ^ id is evaluated as id ^ (id ^ id).
Precedence
If two different operators share a common operand, the
precedence of the operators decides which one takes the
operand.
For example, 2 + 3 * 4 has two different parse trees,
corresponding to (2 + 3) * 4 and 2 + (3 * 4).
Setting precedence among the operators removes this problem:
mathematically, * (multiplication) has
precedence over + (addition), so the expression 2 + 3 * 4 is
always interpreted as 2 + (3 * 4).
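To make the consequence of the two parse trees concrete, the sketch below writes each tree as a nested tuple and evaluates it. The tuple encoding (operator, left, right) and the names evaluate, plus_first and times_first are hypothetical, chosen only for this illustration.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def evaluate(tree):
    """Evaluate a tree given as an int leaf or a tuple (op, left, right)."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))


plus_first = ("*", ("+", 2, 3), 4)     # tree for (2 + 3) * 4
times_first = ("+", 2, ("*", 3, 4))    # tree for 2 + (3 * 4)

print(evaluate(plus_first))   # 20
print(evaluate(times_first))  # 14
```

The two trees give different values, so the compiler must commit to one of them; the precedence rule selects the second.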
Eliminating Left Recursion
A grammar is said to be left recursive if it has a non-terminal A such
that there is a derivation A ⇒+ Aα for some string α.
Top-down parsing methods cannot handle left-recursive grammars.
Hence, left recursion must be eliminated: a pair of productions
A → Aα | β is replaced by A → βA' and A' → αA' | ε.
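A minimal sketch of the standard rewrite for immediate left recursion follows: productions A → Aα | β become A → βA' and A' → αA' | ε. The function name, the list-of-symbols encoding of each alternative, and the use of an empty list for ε are assumptions made for this illustration; indirect left recursion is not handled.

```python
def eliminate_immediate_left_recursion(nt, alternatives):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | epsilon.

    `alternatives` is a list of right-hand sides, each a list of symbols;
    the empty list stands for epsilon.
    """
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    others = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not recursive:
        return {nt: alternatives}            # nothing to do
    new_nt = nt + "'"
    return {
        nt: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],
    }


print(eliminate_immediate_left_recursion("A", [["A", "alpha"], ["beta"]]))
```

The new non-terminal A' is right recursive, so a top-down parser consumes input before it expands A' again.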
A grammar is left-recursive if it has any non-terminal ‘A’
whose derivation contains ‘A’ itself as the left-most symbol.
A left-recursive grammar is a problem for top-down
parsers.
Top-down parsers start parsing from the start symbol, which is itself
a non-terminal.
So, when the parser encounters the same non-terminal at the left of its
derivation without having consumed any input, it cannot judge when to
stop expanding the left non-terminal and goes into an infinite loop.
Example: A → Aσ | β is an example of immediate left
recursion, where A is any non-terminal symbol and σ
represents a string of grammar symbols.
S → Aσ | β, A → Sd is an example of indirect left
recursion.
A top-down parser will first expand A, which in turn
produces a string beginning with A itself, so the parser may
loop forever. This is called the left recursion problem.
[Figure: repeated expansion of A → Aσ]
Left Factoring
It is a grammar transformation that is useful for producing a grammar
suitable for predictive parsing.
When it is not clear which of two alternative productions to use to
expand a non-terminal A, we can rewrite the A-productions to defer
the decision until we have seen enough of the input to make the right
choice.
Left factoring isolates the common prefix of two productions into a
single production.
Process of Left Factoring
Any production of the form A → αβ1 | αβ2 (where α is common) can be
replaced by A → αA' and A' → β1 | β2.
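The usual left-factoring rewrite turns A → αβ1 | αβ2 into A → αA' and A' → β1 | β2. A minimal sketch follows; the function name left_factor and the list-of-symbols encoding are assumptions, and for simplicity it factors only a prefix shared by all alternatives of the non-terminal.

```python
def left_factor(nt, alternatives):
    """Factor out the longest prefix common to ALL alternatives of `nt`.

    Each alternative is a list of symbols; the empty list stands for epsilon.
    """
    prefix = []
    for column in zip(*alternatives):        # walk the alternatives in step
        if all(sym == column[0] for sym in column):
            prefix.append(column[0])
        else:
            break
    if not prefix or len(alternatives) < 2:
        return {nt: alternatives}            # nothing to factor
    new_nt = nt + "'"
    return {
        nt: [prefix + [new_nt]],
        new_nt: [alt[len(prefix):] for alt in alternatives],
    }


print(left_factor("A", [["alpha", "beta1"], ["alpha", "beta2"]]))
```

After factoring, the parser reads α once and chooses between β1 and β2 only when the next input symbol is visible.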
Parsing
Parsing is a technique that takes an input string and produces
either a parse tree, if the string is a valid sentence of the grammar, or
an error message indicating that the string is not valid.
It is the process of analyzing a continuous stream of input in order to
determine its grammatical structure with respect to a given formal
grammar.
The parser is the phase of the compiler that takes tokens as input and,
with the help of a CFG, converts them into the corresponding parse tree.
The parser is also called the syntax analyzer.
TYPES OF PARSING
Group Assignment: Group-1, Group-2, Group-3, Group-4, Group-5 (Error Recovery Strategies)
There are two main kinds of parsing in use, named for the way they
build the parse trees:
▲ Top-Down (Predictive Parsing)
Constructs the parse tree in a top-down manner
Finds the leftmost derivation
For every non-terminal and token, predicts the next production
Preorder tree traversal (Root-Left-Right)
▲ Bottom-Up
Constructs the parse tree in a bottom-up manner
Finds the rightmost derivation in reverse order
For every potential right-hand side and token, decides when a
production is found
Postorder tree traversal (Left-Right-Root)
Top down parsing
A top-down parser attempts to construct a parse tree from the root,
applying productions forward to expand non-terminals into strings of
symbols.
In top-down parsing, the parser builds the parse tree from top to bottom.
Example: LL Parsers
A top-down parser starts at the root of the parse tree and grows it
towards the leaves.
At each node, the parser picks a production and tries to match the
input.
However, the parser may pick the wrong production, in which case it
will need to backtrack. Some grammars are backtrack-free.
A top-down parser generates the parse tree for the given input string with
the help of grammar productions by expanding the non-terminals, i.e.,
it starts from the start symbol and ends at the terminals.
It uses leftmost derivation.
When the parse tree is constructed from root to leaves, it is said
to be top-down parsing.
Bottom up parsing
A bottom-up parser builds a parse tree starting with the leaves, using
productions in reverse to identify strings of symbols that can be
grouped together.
A bottom-up parser starts from the leaves and works up to the root.
Example: LR Parsers
A bottom-up parser generates the parse tree for the given input string
with the help of grammar productions by reducing strings of symbols to
non-terminals, i.e., it starts from the terminals and ends at the start symbol.
It uses the reverse of a rightmost derivation.
When the parse tree is constructed from leaves to root, it is said to
be bottom-up parsing.
TOP DOWN PARSING
Top-down parsing can be viewed as the problem of constructing a
parse tree for the given input string, starting from the root and
creating the nodes of the parse tree in preorder (depth-first, left to
right).
Equivalently, top-down parsing can be viewed as finding a leftmost
derivation for an input string.
Types of top-down parsing
It is classified into two variants: one that uses
backtracking and one that is non-backtracking.
1. Recursive Descent Parsing
2. Predictive Parsing
RECURSIVE DESCENT PARSING
Recursive descent parsing is a top-down parsing technique
that uses a set of recursive procedures, one for every non-terminal,
to scan its input.
It constructs the parse tree from the top, reading the input from left to
right.
This parsing method may involve backtracking, that is, making
repeated scans of the input.
A grammar that is not left-factored cannot avoid backtracking.
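Example of backtracking:
Consider the grammar G: S → cAd
A → ab | a
and the input string w = cad.
The parse tree can be constructed using the following top-down
approach:
Step 1: Expand S using S → cAd; the leaf c matches the first input symbol.
Step 2: Expand A using the first alternative A → ab; a matches, but b does
not match the next input symbol d, so the parser backtracks to A.
Step 3: Expand A using the second alternative A → a; a and then d match,
and the parse succeeds.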
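The backtracking search for this grammar can be sketched as a small generator-based recogniser. The names GRAMMAR, match and accepts and the trace list are hypothetical helpers for this illustration; the approach would loop forever on a left-recursive grammar.

```python
GRAMMAR = {"S": [["c", "A", "d"]], "A": [["a", "b"], ["a"]]}


def match(symbols, text, pos, trace):
    """Yield every input position at which `symbols` can finish matching
    when started at `pos`; trying the next alternative is the backtracking."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:                       # non-terminal: try each alternative
        for alt in GRAMMAR[head]:
            trace.append(f"try {head} -> {''.join(alt)}")
            for mid in match(alt, text, pos, trace):
                yield from match(rest, text, mid, trace)
    elif text.startswith(head, pos):          # terminal: must match the input
        yield from match(rest, text, pos + len(head), trace)


def accepts(text):
    trace = []
    ok = any(end == len(text) for end in match(["S"], text, 0, trace))
    return ok, trace


print(accepts("cad"))
```

For the input cad the trace lists S → cAd, then the failed attempt A → ab, then the successful alternative A → a, mirroring the three steps of the example.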
Example of recursive descent parsing:
A left-recursive grammar can cause a recursive-descent parser to go
into an infinite loop. Hence, elimination of left-recursion must be
done before parsing.
Consider the grammar for arithmetic expressions
After eliminating the left-recursion the
grammar becomes,
Now we can write the procedure for
grammar as follows:
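A minimal sketch of such procedures is given here under a loud assumption: the grammar is taken to be the usual left-recursion-free expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id, which is not spelled out in this text. The class name Parser, its method names and the helper parses are likewise hypothetical.

```python
class Parser:
    """One procedure per non-terminal of the ASSUMED grammar:
    E -> T E'    E' -> + T E' | epsilon
    T -> F T'    T' -> * F T' | epsilon
    F -> ( E ) | id
    """

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]    # $ marks the end of the input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def E(self):
        self.T()
        self.E_prime()

    def E_prime(self):
        if self.peek() == "+":                # otherwise take E' -> epsilon
            self.expect("+")
            self.T()
            self.E_prime()

    def T(self):
        self.F()
        self.T_prime()

    def T_prime(self):
        if self.peek() == "*":                # otherwise take T' -> epsilon
            self.expect("*")
            self.F()
            self.T_prime()

    def F(self):
        if self.peek() == "(":
            self.expect("(")
            self.E()
            self.expect(")")
        else:
            self.expect("id")


def parses(tokens):
    parser = Parser(tokens)
    try:
        parser.E()
        parser.expect("$")                    # the whole input must be consumed
        return True
    except SyntaxError:
        return False


print(parses("id + id * id".split()))  # True
```

Each procedure looks at one token to choose its alternative, so this particular sketch never backtracks.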
Stack implementation:
To recognize input: id + id * id:
PREDICTIVE PARSING
Predictive parsing is a special case of recursive descent parsing where
no backtracking is required.
It is possible to build a non-recursive predictive parser by maintaining
a stack explicitly, rather than implicitly via recursive calls.
The key problem during predictive parsing is that of determining the
production to be applied for a nonterminal in case of alternatives.
The non-recursive parser in the figure looks up the production to be
applied in a parsing table.
In what follows, we shall see how the table can be constructed
directly from certain grammars.
LL(1) grammar
Predictive parsers are those recursive descent parsers needing no
backtracking.
An LL(1) parser is a non-recursive top-down parser.
1) The first L indicates that the input is scanned from left to right.
2) The second L means that it uses a leftmost derivation for the input string.
3) The 1 means that it uses a look-ahead of only one input symbol to predict
the parsing process.
A grammar whose parsing table has no more than one entry in each
location is called an LL(1) grammar.
The table-driven predictive parser has
An input buffer,
Stack,
A parsing table and
An output stream.
Input buffer: It holds the string to be parsed, followed by $ to
indicate the end of the input string.
Stack: It contains a sequence of grammar symbols preceded by $ to
indicate the bottom of the stack. Initially, the stack contains the start
symbol on top of $.
Parsing table: It is a two-dimensional array M[A, a], where ‘A’ is a
non-terminal and ‘a’ is a terminal or $.
Predictive parsing program:
The parser is controlled by a program that considers X, the symbol
on top of the stack, and a, the current input symbol.
These two symbols determine the parser action.
There are three possibilities:
1) If X = a = $, the parser halts and announces successful completion
of parsing.
2) If X = a ≠ $, the parser pops X off the stack and advances the input
pointer to the next input symbol.
3) If X is a non-terminal , the program consults entry M[X, a] of the
parsing table M. This entry will either be an X-production of the
grammar or an error entry.
If M[X, a] = {X → UVW}, the parser replaces X on top of
the stack by WVU (with U on top).
If M[X, a] = error, the parser calls an error recovery
routine.
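The three cases above can be sketched as a short driver loop. The parsing table here is hard-coded for an assumed grammar (E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id), which is not given in this text; the names TABLE, NONTERMINALS and ll1_parse are hypothetical, and a missing table entry simply raises an error instead of calling a recovery routine.

```python
# M[X, a] for the ASSUMED expression grammar; a missing key is an error entry.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens, start="E"):
    """Return the productions applied, in order, or raise SyntaxError."""
    buffer = list(tokens) + ["$"]
    stack = ["$", start]                      # start symbol on top of $
    i, output = 0, []
    while True:
        X, a = stack[-1], buffer[i]
        if X == "$" and a == "$":             # case 1: accept
            return output
        if X not in NONTERMINALS:             # case 2: terminal on the stack
            if X != a:
                raise SyntaxError(f"expected {X!r}, found {a!r}")
            stack.pop()
            i += 1
        else:                                 # case 3: consult M[X, a]
            rhs = TABLE.get((X, a))
            if rhs is None:
                raise SyntaxError(f"no entry M[{X}, {a}]")
            stack.pop()
            stack.extend(reversed(rhs))       # push the RHS, leftmost on top
            output.append(f"{X} -> {' '.join(rhs) or 'epsilon'}")


for production in ll1_parse("id + id * id".split()):
    print(production)
```

The printed productions, read in order, form a leftmost derivation of the input.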
Implementation of Predictive Parser:-
1) Eliminate left recursion and ambiguity from the grammar and left-factor it.
2) Construct FIRST() and FOLLOW() for all non-terminals.
3) Construct the predictive parsing table.
4) Parse the given input string using the stack and the parsing table.
Predictive Parsing Table Construction
The construction of a predictive parser is aided by two functions
associated with a grammar G:
1) FIRST
2) FOLLOW
They help us fill in the entries of the parsing table M.
Benefits of First() and Follow()
Can be used to prove the LL(k) characteristic of a grammar
Can be used to aid in the construction of the predictive parsing table.
Provide selection information for recursive descent parsers.
FIRST
First() is a function that gives the set of terminals that begin the
strings derived from a grammar symbol or string of symbols.
For a production, it computes the set of terminal symbols with which the
RHS of the production can begin.
Rules for First( ):
1. If X is a terminal, then FIRST(X) is {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a non-terminal and X → aα is a production, then add a to FIRST(X).
4. If X is a non-terminal and X → Y1 Y2…Yk is a production, then place a in
FIRST(X) if, for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1),
…, FIRST(Yi-1); that is, Y1…Yi-1 ⇒* ε. If ε is in FIRST(Yj) for all j = 1, 2, …, k,
then add ε to FIRST(X).
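The rules can be applied repeatedly until no FIRST set changes. The sketch below does this for an assumed example grammar (E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id); the grammar, the names GRAMMAR, EPS, first_of_string and compute_first, and the use of the string "epsilon" for ε are all assumptions made for the illustration.

```python
EPS = "epsilon"

# ASSUMED example grammar; an empty list is an epsilon production.
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST of a sequence Y1 Y2 ... Yk, given the FIRST sets found so far."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})     # rule 1: FIRST(terminal) = {terminal}
        result |= sym_first - {EPS}
        if EPS not in sym_first:              # this symbol cannot vanish: stop
            return result
    result.add(EPS)                           # every Yi derives epsilon (or k = 0)
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                            # iterate to a fixed point
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_string(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


for nt, terminals in compute_first(GRAMMAR).items():
    print(nt, sorted(terminals))
```

Rules 2 and 3 need no special case here: an empty right-hand side contributes ε, and a right-hand side that starts with a terminal contributes that terminal.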
Follow
Follow() is a function that gives the set of terminals that can
appear immediately to the right of a given non-terminal in some
sentential form.
That is, FOLLOW(A) is the set of terminal symbols of the grammar that can
immediately follow the non-terminal A.
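A hedged sketch of the usual FOLLOW computation follows. It applies the standard textbook rules: $ is in FOLLOW of the start symbol; for a production A → αBβ, everything in FIRST(β) except ε is in FOLLOW(B); and if β is empty or can derive ε, everything in FOLLOW(A) is in FOLLOW(B). The example grammar and its hard-coded FIRST sets are assumptions, as are all names in the code.

```python
EPS = "epsilon"

# ASSUMED example grammar and its FIRST sets (non-terminals only).
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}


def first_of_string(symbols, first):
    """FIRST of a sequence of symbols; a terminal is its own FIRST set."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                    # end marker follows the start symbol
    changed = True
    while changed:                            # iterate to a fixed point
        changed = False
        for head, alternatives in grammar.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in grammar:    # FOLLOW is defined for non-terminals
                        continue
                    tail_first = first_of_string(alt[i + 1:], first)
                    new = tail_first - {EPS}
                    if EPS in tail_first:     # the tail can vanish
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


for nt, terminals in compute_follow(GRAMMAR, FIRST, "E").items():
    print(nt, sorted(terminals))
```

FOLLOW sets never contain ε; they are used to decide when an ε-production may be applied.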
Predictive parsing Table
Question & Answer
Thank You !!!