
CHAPTER THREE

Syntax Analysis

Wachemo University(Durame Campus)


College of Engineering and Technology
Department of Computer Science

Mr. Abraham Wolde(2025)

1
Outlines

 Parsing
 Types of Parsing
 Top-down Parsing
 Bottom-Up Parsing
 Stack Implementation of Shift Reduce Parsing
 Error Recovery
 Parser Generator

2
ROLE OF A PARSER (Syntax Analyzer)
 Syntax Analyzer is also known as Parser.
 Syntax analysis is the second phase of a compiler.
 As we saw in Chapter Two, a lexical analyzer can identify tokens with
the help of regular expressions and patterns or rules.
 As the second phase, a syntax analyzer or parser takes the input from
a lexical analyzer in the form of token streams.
 The parser analyzes the source code (token stream) against the
production rules to detect any errors in the code.

3
 The parser obtains a string of tokens from the lexical analyzer and
reports a syntax error if one is found; otherwise it generates a syntax tree.
 In this way, the parser accomplishes two tasks, i.e.,
 parsing the code, and
 looking for errors and generating a parse tree as the
output of the phase.
4
Context-Free Grammar (CFG)
 In compilation, the parser obtains a string of tokens from the lexical
analyzer and is expected to parse the whole code even if some errors
exist in the program.
 A lexical analyzer cannot check the syntax of the given sentence due
to the limitations of regular expressions.
 Regular expressions cannot check for balanced tokens, such as
parentheses.
 Therefore, the syntax analysis phase uses Context-Free Grammar
(CFG), which is recognized by push-down automata.
 The syntax of a language is specified by a context-free grammar
(CFG).
5
 CFG is a helpful tool in describing the syntax of programming
languages.
 The rules in a CFG are mostly recursive.
 A syntax analyzer checks whether a given program satisfies the rules
implied by a CFG or not.
 If it satisfies, the syntax analyzer creates a parse tree for the given
program.
 A Context-Free Grammar is a quadruple that consists of terminals,

non-terminals, start symbol and productions.

6
 A context-free grammar has four components: G = {V, T, P, S}, where V is
the set of non-terminals, T the set of terminals, P the set of productions,
and S the start symbol.

 The strings of the language are derived from the start symbol by repeatedly
replacing a non-terminal (initially the start symbol) by the right side of a
production for that non-terminal.
7
 Example of a context-free grammar: the following grammar defines
simple arithmetic expressions:
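(The slide shows the grammar as an image; a standard textbook version of such a
grammar, assumed here rather than taken from the slide, is:)

expression → expression + term | expression - term | term
term → term * factor | term / factor | factor
factor → ( expression ) | id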

In this grammar, id and the operator and parenthesis symbols are the terminals,
expression, term, and factor are the non-terminals, and expression is the start symbol.

8
Notational Conventions
 To avoid always having to state that "these are the terminals," "these
are the non-terminals," and so on, the following notational conventions for
grammars will be used throughout the remainder of this chapter.
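(The slide's list is not reproduced here; the usual conventions, assumed in what
follows, are: lowercase letters early in the alphabet (a, b, c), operator symbols,
digits, and names such as id denote terminals; uppercase letters early in the
alphabet (A, B, C) and S denote non-terminals, with S the start symbol; uppercase
letters late in the alphabet (X, Y, Z) denote grammar symbols that may be either;
and lowercase Greek letters (α, β, γ) denote strings of grammar symbols.)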

9
10
Syntax Tree versus Parse Tree
 A syntax tree is a variant of a parse tree in which each leaf node
represents an operand and each interior node represents an operator.
 A syntax tree contains only meaningful information.
 A parse tree may be viewed as a graphical representation of a
derivation that filters out the choice regarding replacement order.
 Each interior node of a parse tree is labeled by some non-terminal A,
and its children are labeled, from left to right, by the
symbols in the right side of the production by which this A was
replaced in the derivation.
 The leaves of the parse tree are terminal symbols.
 A parse tree also contains information that is not strictly needed.
11
Grammar: E → E * E | E + E | id
String: id + id * id (or a + b * c)
[The slide shows the syntax tree and the parse tree for this string.]

 A parse tree is the graphical representation of the structure of a
sentence according to its grammar.
 Parsing is performed in the syntax analysis phase, where a stream
of tokens is taken as input from the lexical analyzer and the parser
produces the parse tree for the tokens while checking for syntax errors.
 Parsing involves several related concepts, such as
derivation, parse tree, ambiguity, and left recursion.
12
Derivation
 A derivation is a sequence of production-rule applications used to
obtain the input string.
 During parsing, we make two decisions for a sentential form of the
input:
deciding which non-terminal is to be replaced, and
deciding the production rule by which that non-terminal will
be replaced.

13
Derivation

 Derivation is a process that generates a valid string with the help of the
grammar by replacing a non-terminal on the left with the string on
the right side of one of its productions.
 Derivation means to replace a non-terminal by the body of one of its
productions.
 It is used to determine whether a string belongs to the language of a given
grammar or not.

14
Types of Derivation
 There are two kinds of derivation:
1. Leftmost Derivation
 A derivation of a string in a grammar G is a leftmost
derivation if at every step the leftmost non-terminal is
replaced.
 Here, the sentential form of the input is scanned and replaced
from left to right.
 A sentential form derived by a leftmost derivation is called a
left-sentential form.

15
2. Rightmost Derivation
 A derivation of a string in a grammar G is a rightmost derivation
if at every step the rightmost non-terminal is replaced.
 Here, we scan and replace the input with production rules from
right to left.
 A sentential form derived by a rightmost derivation is called a
right-sentential form.
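For example, with the grammar E → E + E | E * E | id and the string id + id * id
(an illustration, not taken from the slides):

Leftmost:  E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
Rightmost: E ⇒ E + E ⇒ E + E * E ⇒ E + E * id ⇒ E + id * id ⇒ id + id * id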

16
N.B:
In a leftmost derivation, the leftmost non-terminal is always
processed first; whereas
in a rightmost derivation, the rightmost non-terminal is always
processed first.
17
Parse Tree
 It is a graphical depiction of a derivation.
 It is also convenient to see how strings are derived from the start
symbol.
 The start symbol of the derivation becomes the root of the parse tree.
 All leaf nodes are terminals and all interior nodes are non-terminals.

18
Derivation: E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
[Steps 1-5 on the slide show the parse tree being built node by node as each
production in this derivation is applied.]
19
Ambiguous Grammar
 A grammar that produces more than one parse tree for some sentence
is said to be an ambiguous grammar.
 Put another way, an ambiguous grammar is one that produces more
than one leftmost derivation or more than one rightmost derivation for
the same sentence.
 A grammar G is said to be ambiguous if it has more than one parse
tree (equivalently, more than one leftmost or rightmost derivation) for at
least one string.
 Example: the production rules shown on the slide generate two parse trees
for the string id - id + id.

20
 Here, two leftmost derivations are possible for the string a + a * a; hence the
above grammar is ambiguous.
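For instance, assuming the grammar E → E + E | E * E | a (the slide's exact rules
are not reproduced here), the string a + a * a has these two leftmost derivations:

1) E ⇒ E + E ⇒ a + E ⇒ a + E * E ⇒ a + a * E ⇒ a + a * a   (* nested under +)
2) E ⇒ E * E ⇒ E + E * E ⇒ a + E * E ⇒ a + a * E ⇒ a + a * a   (+ nested under *)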

21
 A language for which every grammar is ambiguous is
known as inherently ambiguous.
 There is no general method to detect and remove ambiguity
automatically, so ambiguity in a grammar is undesirable.
 Ambiguity can be removed either by re-writing the whole
grammar without ambiguity, or by setting and following
associativity and precedence constraints.

22
Associativity
 If an operand has operators on both sides, the side on which the
operator takes this operand is decided by the associativity of those
operators.
 If the operation is left-associative, the operand is taken by the
operator on its left; if the operation is right-associative, the operand is
taken by the operator on its right.
 Operations such as addition, multiplication, subtraction, and
division are left-associative.
 For example, if the expression contains id op id op id, it will be
evaluated as (id op id) op id, i.e., (id + id) + id. Operations like
exponentiation are right-associative.
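As an illustration (an assumed minimal grammar, not taken from the slides):
associativity can be encoded directly in the grammar. The left-recursive rule
E → E + id | id forces left association, so id + id + id parses as (id + id) + id,
while the right-recursive rule P → id ^ P | id forces right association, so
id ^ id ^ id parses as id ^ (id ^ id).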
23
Precedence
 If two different operators share a common operand, the
precedence of operators decides which will take the
operand.
 For example, 2 + 3 * 4 has two different parse trees,
corresponding to (2 + 3) * 4 and 2 + (3 * 4).
 By setting precedence among operators, this problem can
easily be removed: mathematically, * (multiplication) has
precedence over + (addition), so the expression 2 + 3 * 4 will
always be interpreted as 2 + (3 * 4).
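As a sketch (an assumed textbook grammar, not necessarily the one in the slides),
precedence is usually encoded by giving each precedence level its own non-terminal,
with the lower-precedence operator introduced higher in the grammar:

E → E + T | T      (lowest precedence: +)
T → T * F | F      (next precedence: *)
F → ( E ) | num

With this grammar, 2 + 3 * 4 has only one parse tree, corresponding to 2 + (3 * 4).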
24
Eliminating Left Recursion
 A grammar is said to be left recursive if it has a non-terminal A such
that there is a derivation A ⇒+ Aα for some string α.
 Top-down parsing methods cannot handle left-recursive grammars.
Hence, left recursion can be eliminated as follows:
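The standard transformation (shown on the slide as an image) replaces immediate
left recursion of the form

A → Aα | β

with

A → βA'
A' → αA' | ε

where A' is a new non-terminal and ε denotes the empty string.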

25
 A grammar becomes left-recursive if it has any non-terminal ‘A’
whose derivation contains ‘A’ itself as the left-most symbol.
 A left-recursive grammar is a problem for top-down
parsers.
 Top-down parsers start parsing from the start symbol, which is itself
a non-terminal.
 So, when the parser encounters the same non-terminal in its
derivation, it becomes hard for it to judge when to stop expanding the
left non-terminal, and it goes into an infinite loop.

26
Example: A => Aσ | β is an example of immediate left
recursion, where A is a non-terminal symbol and σ and β
represent strings of terminals and non-terminals.
S => Aσ | β, A => Sd is an example of indirect left
recursion.
 A top-down parser will first try to expand A; in turn, A
produces a string that again begins with A, so the parser may go
into a loop forever. This is called the left recursion problem.
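A tiny Python sketch (hypothetical procedure names) shows why: the procedure for A
tries the alternative A → Aσ and calls itself before consuming any input, so it never
makes progress (in Python the unbounded recursion eventually raises RecursionError).

def parse_A():
    # A -> A sigma: the procedure calls itself before reading a single token,
    # so the recursion never terminates.
    parse_A()
    parse_sigma()

def parse_sigma():
    pass  # never reached

# Calling parse_A() would recurse until Python raises RecursionError.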

27
[The slide illustrates the problem with a diagram: expanding A repeatedly by
A → Aσ grows the derivation A ⇒ Aσ ⇒ Aσσ ⇒ … without ever consuming input.]
28
Left Factoring
 It is a grammar transformation that is useful for producing a grammar
suitable for predictive parsing.
 When it is not clear which of two alternative productions to use to
expand a non-terminal A, we can rewrite the A-productions to defer
the decision until we have seen enough of the input to make the right
choice.
 Left factoring is a grammar transformation in which the common
prefix of two or more productions is isolated into a single production,
making the grammar suitable for predictive parsing.

29
Process of Eliminating Left Factoring
Any production of the form A → αβ1 | αβ2 (where α is the common prefix) can be
replaced by the following productions:

A → αA'
A' → β1 | β2

 Left factoring is required because, on seeing only the common prefix α,
it is difficult to decide whether to select αβ1 or αβ2 to expand A.

 In left factoring, we defer this decision by first expanding A to
αA', and choosing between β1 and β2 only after we have seen enough
input to make the right choice.
30
For example, consider CFG,
S → iEtS | iEtSeS
E→b
 In this grammar, the common prefix is iEtS. After elimination of
left factoring, the productions are
S → iEtS S'
S' → eS | ε
E→b

31
Parsing
 Parsing is a technique that takes an input string and produces as output
either a parse tree, if the string is a valid sentence of the grammar, or an
error message indicating that the string is not valid.
 It is the process of analyzing a continuous stream of input in order to
determine its grammatical structure with respect to a given formal
grammar.
 The parser is the phase of the compiler that takes tokens as input and,
with the help of the CFG, converts them into the corresponding parse tree.
 Parsers are also called syntax analyzers.

32
TYPES OF PARSING
 There are two main kinds of parsing in use, named for the way they
build the parse tree:

Group Assignment
Group-1
Group-2
Group-3
Group-4
Group-5: Error Recovery Strategies

33
▲ Top-Down (Predictive Parsing)
 Constructs the parse tree in a top-down manner
 Finds the leftmost derivation
 For every non-terminal and token, predicts the next production
 Preorder tree traversal (Root-Left-Right)
▲ Bottom-Up
 Constructs the parse tree in a bottom-up manner
 Finds the rightmost derivation in reverse order
 For every potential right-hand side and token, decides when a
production is found
 Postorder tree traversal (Left-Right-Root)

34
Top down parsing
 A top-down parser attempts to construct a parse tree from the root,
applying productions forward to expand non-terminals into strings of
symbols.
 In top-down parsing, the parser builds the parse tree from top to bottom.
Example: LL parsers
 A top-down parser starts at the root of the parse tree and grows it
toward the leaves.
 At each node, the parser picks a production and tries to match the
input.
 However, the parser may pick the wrong production, in which case it
will need to backtrack. Some grammars are backtrack-free.
35
 A top-down parser generates the parse tree for the given input string with
the help of grammar productions by expanding the non-terminals, i.e.,
it starts from the start symbol and ends at the terminals.
 It uses leftmost derivation.
 When the parse tree is constructed from the root to the leaves, it is said
to be top-down parsing.

36
Bottom up parsing
 A bottom-up parser builds a parse tree starting with the leaves, using
productions in reverse to identify strings of symbols that can be
grouped together.
 A bottom-up parser starts from the leaves and works up to the root.
Example: LR parsers
 A bottom-up parser generates the parse tree for the given input string
with the help of grammar productions by reducing strings of symbols to
non-terminals, i.e., it starts from the terminals and ends at the start symbol.
 It traces out a rightmost derivation in reverse.
 When the parse tree is constructed from the leaves to the root, it is said
to be bottom-up parsing.
37
TOP DOWN PARSING
 Top-down parsing can be viewed as the problem of constructing a
parse tree for the given input string, starting from the root and
creating the nodes of the parse tree in preorder (depth-first left to
right).
 Equivalently, top-down parsing can be viewed as finding a leftmost
derivation for an input string.
 It can be viewed as an attempt to find a left-most derivation for an
input string or an attempt to construct a parse tree for the input
starting from the root to the leaves.

38
Types of top-down parsing
 Top-down parsing is classified into two variants: one that uses
backtracking and one that is non-backtracking in nature.
1. Recursive Descent Parsing (may use backtracking)
2. Predictive Parsing (non-backtracking)

39
RECURSIVE DESCENT PARSING
 Recursive descent parsing is one of the top-down parsing techniques
that uses a set of recursive procedures to scan its input.
 This parsing method may involve backtracking, that is, making
repeated scans of the input.
 Recursive descent parsing is a top-down parsing technique that
constructs the parse tree from the top and the input is read from left to
right.
 It uses procedures for every non-terminal entity.
 This parsing technique recursively parses the input to make a parse
tree which may or may not require back-tracking.
 But the grammar associated with it (if not left factored) cannot avoid
back-tracking.
40
Example (with backtracking):
Consider the grammar G: S → cAd
A → ab | a
and the input string w=cad.
The parse tree can be constructed using the following top-down
approach:
Step 1:

Step 2:

Step 3:

Step 4: Now try the second alternative for A.


 Now we can halt and announce the successful
completion of parsing.
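A minimal Python sketch of this backtracking process (the procedure and variable
names are illustrative, not from the slides):

# Backtracking recursive-descent parser for S -> c A d, A -> a b | a.
def parse(w):
    pos = 0

    def match(ch):
        nonlocal pos
        if pos < len(w) and w[pos] == ch:
            pos += 1
            return True
        return False

    def A():
        nonlocal pos
        saved = pos                     # remember position for backtracking
        if match('a') and match('b'):   # try the first alternative: A -> a b
            return True
        pos = saved                     # backtrack to the saved position
        return match('a')               # try the second alternative: A -> a

    def S():
        return match('c') and A() and match('d')

    return S() and pos == len(w)

print(parse("cad"))   # True  (uses the second alternative after backtracking)
print(parse("cabd"))  # True  (uses the first alternative)
print(parse("cd"))    # False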

42
Example of recursive descent parsing:
 A left-recursive grammar can cause a recursive-descent parser to go
into an infinite loop. Hence, elimination of left recursion must be
done before parsing.
 Consider the grammar for arithmetic expressions
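(The slide shows the grammar as an image; a standard expression grammar, assumed
here, is E → E + T | T, T → T * F | F, F → ( E ) | id.)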

43
 After eliminating the left-recursion the
grammar becomes,
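(Assuming the standard grammar above, the left-recursion-free form shown on the
slide would be:)

E  → T E'
E' → + T E' | ε
T  → F T'
T' → * F T' | ε
F  → ( E ) | id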

44
 Now we can write the procedures for the grammar as follows:
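The slide's procedures appear as an image; a minimal Python sketch with one
procedure per non-terminal (the token handling and names are illustrative
assumptions, not the slide's code) is:

# Recursive-descent procedures for:
#   E -> T E'    E' -> + T E' | epsilon
#   T -> F T'    T' -> * F T' | epsilon
#   F -> ( E ) | id
tokens = []   # filled by the lexical analyzer, e.g. ['id', '+', 'id', '*', 'id', '$']
pos = 0

def lookahead():
    return tokens[pos]

def match(t):
    global pos
    if lookahead() == t:
        pos += 1
    else:
        raise SyntaxError(f"expected {t}, found {lookahead()}")

def E():
    T()
    E_prime()

def E_prime():
    if lookahead() == '+':      # E' -> + T E'
        match('+')
        T()
        E_prime()
    # else: E' -> epsilon (do nothing)

def T():
    F()
    T_prime()

def T_prime():
    if lookahead() == '*':      # T' -> * F T'
        match('*')
        F()
        T_prime()
    # else: T' -> epsilon

def F():
    if lookahead() == '(':      # F -> ( E )
        match('(')
        E()
        match(')')
    else:
        match('id')             # F -> id

# Example run: parse "id + id * id"
tokens = ['id', '+', 'id', '*', 'id', '$']
pos = 0
E()
match('$')
print("input accepted")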

45
Stack implementation:
 To recognize input: id + id * id:

46
PREDICTIVE PARSING
 Predictive parsing is a special case of recursive descent parsing where
no backtracking is required.
 It is possible to build a non-recursive predictive parser by maintaining
a stack explicitly, rather than implicitly via recursive calls.
 The key problem during predictive parsing is that of determining the
production to be applied for a nonterminal in case of alternatives.
 The non-recursive parser shown in the figure looks up the production to be
applied in a parsing table.
 In what follows, we shall see how the table can be constructed
directly from certain grammars.

47
LL(1) grammar
 Predictive parsers are recursive descent parsers that need no
backtracking.
 LL(1) is a non-recursive top-down parsing method.
1) The first L indicates that the input is scanned from Left to right.
2) The second L means it uses a Leftmost derivation for the input string.
3) The 1 means it uses a look-ahead of only one input symbol to predict
the parsing action.
 Each parsing-table location holds at most one entry. A grammar whose
table has this property is called an LL(1) grammar.

48
 The table-driven predictive parser has
 an input buffer,
 a stack,
 a parsing table, and
 an output stream.
49
 Input buffer: It contains the string to be parsed, followed by $ to
indicate the end of the input string.
 Stack: It contains a sequence of grammar symbols preceded by $ to
indicate the bottom of the stack. Initially, the stack contains the start
symbol on top of $.
 Parsing table: It is a two-dimensional array M[A, a], where ‘A’ is a
non-terminal and ‘a’ is a terminal.
 Predictive parsing program:
 The parser is controlled by a program that considers X, the symbol
on top of stack, and a, the current input symbol.
 These two symbols determine the parser action.

50
 There are three possibilities:
1) If X = a = $, the parser halts and announces successful completion
of parsing.
2) If X = a ≠ $, the parser pops X off the stack and advances the input
pointer to the next input symbol.
3) If X is a non-terminal, the program consults entry M[X, a] of the
parsing table M. This entry will either be an X-production of the
grammar or an error entry.
If M[X, a] = {X → UVW}, the parser replaces X on top of
the stack by WVU (so that U is on top).
If M[X, a] = error, the parser calls an error recovery
routine.
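A compact Python sketch of this driver loop; the table below is the standard LL(1)
table for the expression grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε,
F → (E) | id, used as an assumed example rather than the slide's own figure:

# Table-driven predictive (LL(1)) parser driver.
# Productions are stored as lists of grammar symbols; [] means epsilon.
TABLE = {
    ('E',  'id'): ['T', "E'"],    ('E',  '('): ['T', "E'"],
    ("E'", '+'):  ['+', 'T', "E'"],
    ("E'", ')'):  [],             ("E'", '$'): [],
    ('T',  'id'): ['F', "T'"],    ('T',  '('): ['F', "T'"],
    ("T'", '+'):  [],             ("T'", '*'): ['*', 'F', "T'"],
    ("T'", ')'):  [],             ("T'", '$'): [],
    ('F',  'id'): ['id'],         ('F',  '('): ['(', 'E', ')'],
}
NONTERMINALS = {'E', "E'", 'T', "T'", 'F'}

def ll1_parse(tokens):
    stack = ['$', 'E']                    # start symbol on top of $
    tokens = tokens + ['$']
    i = 0
    while stack:
        X, a = stack[-1], tokens[i]
        if X == a == '$':                 # case 1: successful completion
            return True
        if X not in NONTERMINALS:         # case 2: terminal on top of the stack
            if X == a:
                stack.pop()
                i += 1
            else:
                return False
        else:                             # case 3: consult M[X, a]
            prod = TABLE.get((X, a))
            if prod is None:
                return False              # error entry
            stack.pop()
            stack.extend(reversed(prod))  # push the right side in reverse order
    return False

print(ll1_parse(['id', '+', 'id', '*', 'id']))  # True
print(ll1_parse(['id', '+', '*', 'id']))        # False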
51
Implementation of Predictive Parser:-
1) Remove ambiguity and left recursion from the grammar, and apply left factoring.
2) Construct FIRST() and FOLLOW() for all non-terminals.
3) Construct the predictive parsing table.
4) Parse the given input string using the stack and the parsing table.

52
Predictive Parsing Table Construction
 The construction of a predictive parser is aided by two functions
associated with a grammar G:
1) FIRST
2) FOLLOW
 FIRST and FOLLOW are two functions associated with a grammar
that help us fill in the entries of the M-table.
Benefits of First() and Follow()
 They can be used to prove the LL(k) property of a grammar.
 They aid in the construction of the predictive parsing table.
 They provide selection information for recursive descent parsers.

53
FIRST
 FIRST(X) is the set of terminals that begin the strings derivable
from X.
 It computes the set of terminal symbols with which the right-hand sides
of the productions can begin.
Rules for First( ):
1. If X is a terminal, then FIRST(X) is {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a non-terminal and X → aα is a production (with a a terminal), then
add a to FIRST(X).
4. If X is a non-terminal and X → Y1 Y2 … Yk is a production, then place a in
FIRST(X) if, for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1),
…, FIRST(Yi-1); that is, Y1 … Yi-1 ⇒* ε. If ε is in FIRST(Yj) for all j = 1, 2, …, k,
then add ε to FIRST(X).
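A small Python sketch of the usual fixed-point computation of FIRST for all
non-terminals (the grammar encoding and names are illustrative assumptions):

# Grammar as a dict: non-terminal -> list of alternatives (each a list of symbols).
# 'eps' stands for the empty string; any symbol that is not a key is a terminal.
GRAMMAR = {
    'E':  [['T', "E'"]],
    "E'": [['+', 'T', "E'"], ['eps']],
    'T':  [['F', "T'"]],
    "T'": [['*', 'F', "T'"], ['eps']],
    'F':  [['(', 'E', ')'], ['id']],
}

def compute_first(grammar):
    first = {nt: set() for nt in grammar}

    def first_of(symbol):
        # Terminals (and 'eps') are their own FIRST set.
        return first[symbol] if symbol in grammar else {symbol}

    changed = True
    while changed:                        # iterate until no FIRST set grows
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                for sym in alt:
                    f = first_of(sym)
                    before = len(first[nt])
                    first[nt] |= f - {'eps'}
                    if len(first[nt]) != before:
                        changed = True
                    if 'eps' not in f:    # this symbol cannot vanish: stop here
                        break
                else:
                    # every symbol of the alternative can derive eps
                    if 'eps' not in first[nt]:
                        first[nt].add('eps')
                        changed = True
    return first

for nt, f in compute_first(GRAMMAR).items():
    print(nt, sorted(f))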
54
Follow
 FOLLOW(A) is the set of terminals that can appear immediately
to the right of the non-terminal A in some sentential form.
 In other words, it is the set of terminal symbols of the grammar that can
immediately follow the non-terminal A.

Rules for Follow( ):

1. $ is an element of FOLLOW(S), where S is the start symbol and $ indicates
the end of the input.
2. If A → αBβ is a production, then everything in FIRST(β) except ε is in
FOLLOW(B).
3. If A → αB is a production, or A → αBβ is a production where FIRST(β)
contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
55
Example: Consider the following grammar:

 After eliminating left-recursion the grammar is
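(Assuming the example is the same standard expression grammar used earlier, the
grammar after eliminating left recursion is E → T E', E' → + T E' | ε, T → F T',
T' → * F T' | ε, F → ( E ) | id, and its FIRST and FOLLOW sets are:

FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }    FIRST(T') = { *, ε }
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(T) = FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ } )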

56
57
Predictive parsing Table
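(The table itself appears on the slide as an image; assuming the expression grammar
and FIRST/FOLLOW sets above, the standard LL(1) parsing table, consistent with the
TABLE dictionary in the driver sketch earlier, is:)

M[E, id] = E → TE'        M[E, (] = E → TE'
M[E', +] = E' → +TE'      M[E', )] = E' → ε      M[E', $] = E' → ε
M[T, id] = T → FT'        M[T, (] = T → FT'
M[T', +] = T' → ε         M[T', *] = T' → *FT'   M[T', )] = T' → ε   M[T', $] = T' → ε
M[F, id] = F → id         M[F, (] = F → ( E )
(all other entries are error)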

58
Question & Answer

59
Thank You !!!

60
