Chapter Three
Syntax Analysis
By Melese Alemante
2025
1
Outline
Introduction
Context free grammar (CFG)
Derivation
Parse tree
Ambiguity
Top-down parsing
• Recursive Descent Parsing (RDP)
• Non-recursive predictive parsing
– First and follow sets
– Construction of a predictive parsing table
2
Outline
LL(1) grammars
Syntax error handling
Error recovery in predictive parsing
Panic mode error recovery strategy
Bottom-up parsing (LR(k) parsing)
Stack implementation of shift/reduce parsing
Conflict during shift/reduce parsing
LR parsers
Constructing SLR parsing tables
Canonical LR parsing
LALR (reading assignment)
Yacc
3
Introduction
Syntax: the way in which tokens are put together to
form expressions, statements, or blocks of statements.
❑ The rules governing the formation of statements in a
programming language.
Syntax analysis: the task concerned with fitting a
sequence of tokens into a specified syntax.
Parsing: To break a sentence down into its component
parts with an explanation of the form, function, and
syntactical relationship of each part.
The syntax of a programming language is usually given
by the grammar rules of a context free grammar (CFG).
4
Parser
[Figure: the source program is read character by character by the lexical analyzer, which supplies the next token to the syntax analyzer on request ("get next token"); the syntax analyzer produces the parse tree. Both phases consult the symbol table (which contains a record for each identifier) and report lexical and syntax errors respectively.]
5
Introduction…
The syntax analyzer (parser) checks whether a given
source program satisfies the rules implied by a CFG
or not.
If it satisfies, the parser creates the parse tree of that
program.
Otherwise, the parser gives the error messages.
A CFG: gives a precise syntactic specification of a
programming language.
A grammar can be directly converted into a parser by some tools (e.g., Yacc).
6
Introduction…
The parser can be categorized into two groups:
Top-down parser
The parse tree is created top to bottom, starting from
the root to leaves.
Bottom-up parser
The parse tree is created bottom to top, starting from
the leaves to root.
Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time).
Efficient top-down and bottom-up parsers can be implemented only for restricted subclasses of context-free grammars:
LL grammars for top-down parsing
LR grammars for bottom-up parsing
7
Introduction…
LL Parsing (Top-Down)
8
Context free grammar (CFG)
A context-free grammar is a specification for the
syntactic structure of a programming language.
Context-free grammar has 4-tuples:
G = (T, N, P, S) where
T is a finite set of terminals (a set of tokens)
N is a finite set of non-terminals (syntactic variables)
P is a finite set of productions of the form
A → α, where A is a non-terminal and
α is a string of terminals and non-terminals (possibly the empty string)
❑ S ∈ N is a designated start symbol (one of the non-
terminal symbols)
9
Example: grammar for simple arithmetic expressions
E → E + T | E - T | T
T → T * F | T / F | F
F → ( E ) | id
10
Notational Conventions Used
Terminals:
Lowercase letters early in the alphabet, such as a, b, c.
Operator symbols such as +, *, and so on.
Punctuation symbols such as parentheses, commas, and so on.
The digits 0,1,. . . ,9.
Boldface strings such as id or if, each of which represents
a single terminal symbol.
Non-terminals:
Uppercase letters early in the alphabet, such as A, B, C.
The letter S is usually the start symbol.
Lowercase, italic names such as expr or stmt.
Uppercase letters may be used to represent non-terminals
for the constructs.
• expr, term, and factor are represented by E, T, F
11
Notational Conventions Used…
❑ Grammar symbols
❑ Uppercase letters late in the alphabet, such as X, Y, Z, that
is, either non-terminals or terminals.
Strings of terminals.
Lowercase letters late in the alphabet, mainly u,v,x,y ∈T*
Strings of grammar symbols.
Lowercase Greek letters, α, β, γ ∈(N∪T)*
A set of productions A -> α1, A -> α2, . . . , A -> αk with a common
head A (call them A-productions), may be written
A -> α1 | α2 |…| αk
α1, α2, . . . , αk are called the alternatives for A.
The head of the first production is the start symbol.
E → E + T | E - T | T
T → T * F | T / F | F
F → ( E ) | id
Derivation
A derivation is a sequence of replacements of structure names by choices on the right-hand sides of grammar rules.
Example: given the grammar
E → E + E | E - E | E * E | E / E | - E
E → ( E )
E → id
one possible derivation is E ⇒ - E ⇒ - ( E ) ⇒ - ( id ), replacing one non-terminal at each step.
13
Derivation…
❑ We will see that a top-down parser tries to find the leftmost derivation of the given source program.
❑ We will see that a bottom-up parser tries to find the rightmost derivation of the given source program, in reverse order.
14
Parse tree
A parse tree is a graphical representation of a
derivation
It filters out the order in which productions are applied
to replace non-terminals.
[Figure: step-by-step construction of the parse tree for - ( id + id ): first the root E, then - E, then - ( E ), then - ( E + E ), and finally the id leaves. This is a top-down derivation because we start building the parse tree at the top (the root).]
16
Exercise
a) Using the grammar below, draw a parse tree for the
following string:
( ( id . id ) id ( id ) ( ( ) ) )
S→E
E → id
|(E.E)
|(L)
|()
L→LE
|E
b) Give a rightmost derivation for the string given in (a).
17
Ambiguity
A grammar that produces more than one parse tree for some sentence is called an ambiguous grammar.
• Equivalently, it produces more than one leftmost derivation, or
• more than one rightmost derivation, for the same sentence.
18
Ambiguity: Example
Example: The arithmetic expression grammar
E → E + E | E * E | ( E ) | id
permits two distinct leftmost derivations for the
sentence id + id * id:
(a) E => E + E => id + E => id + E * E => id + id * E => id + id * id
(b) E => E * E => E + E * E => id + E * E => id + id * E => id + id * id
19
Ambiguity: example…
E → E + E | E * E | ( E ) | - E | id
Find a derivation for the expression: id + id * id
[Figure: two different parse trees for id + id * id — one with + at the root (its right operand expanded to E * E) and one with * at the root (its left operand expanded to E + E). According to the grammar, both are correct.]
20
Elimination of ambiguity
Precedence/Association
❑ These two derivations point out a problem with the grammar:
❑ The grammar has no notion of precedence, or implied order of evaluation.
To add precedence:
Create a non-terminal for each level of precedence.
Isolate the corresponding part of the grammar.
Force the parser to recognize high-precedence subexpressions first.
For algebraic expressions:
Multiplication and division first (level one).
Subtraction and addition next (level two).
To add associativity:
Left-associative: the next-level (higher-precedence) non-terminal is placed at the end of the production.
Elimination of ambiguity
To disambiguate the grammar
E → E + E | E * E | ( E ) | id
rewrite it as:
E → E + T | T
T → T * F | F
F → ( E ) | id
and parse the sentence id + id * id with the new grammar.
22
Syntax analysis
Every language has rules that prescribe the syntactic structure of well-formed programs.
The syntax can be described using context-free grammar (CFG) notation.
23
Top-down parsing
Recursive Descent Parsing (RDP)
This method of top-down parsing can be considered as an attempt to find the leftmost derivation for an input string. It may involve backtracking.
To construct the parse tree using RDP:
we create a one-node tree consisting of the start symbol S.
two pointers, one for the tree and one for the input, will be used to indicate where the parsing process is.
initially, they point to S and to the first input symbol, respectively.
then we use the first S-production to expand the tree. The tree pointer is positioned on the leftmost symbol of the newly created subtree.
24
Recursive Descent Parsing (RDP)…
25
RDP…
Example: G: S → cAd
            A → ab | a
Draw the parse tree for the input string cad using the above method.
Let's construct the parse tree for cad with the grammar S → cAd, A → ab | a:
Start with S.
Apply S → cAd, so the tree becomes a root S with children c, A, d; then expand A.
27
RDP…
28
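A minimal sketch in C (not from the slides) of a backtracking recursive-descent recognizer for this grammar; the helper names match, S and A are assumptions made for illustration only:

#include <stdio.h>

static const char *input;            /* string being parsed           */
static int pos;                      /* current position in the input */

static int match(char c) {           /* consume c if it is next       */
    if (input[pos] == c) { pos++; return 1; }
    return 0;
}

static int A(void) {                 /* A -> a b | a                  */
    int save = pos;
    if (match('a') && match('b')) return 1;   /* try A -> a b          */
    pos = save;                               /* backtrack             */
    return match('a');                        /* try A -> a            */
}

static int S(void) {                 /* S -> c A d                    */
    return match('c') && A() && match('d');
}

int main(void) {
    input = "cad";
    pos = 0;
    if (S() && input[pos] == '\0') printf("accepted\n");
    else                           printf("rejected\n");
    return 0;
}

On the input cad, the first alternative A → ab matches a but then fails on d, so the input pointer is reset and the second alternative A → a succeeds — exactly the backtracking step described above.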
Exercise
❑ Using the grammar below, draw a parse tree for the
following string using RDP algorithm:
( ( id . id ) id ( id ) ( ( ) ) )
S→E
E → id
|(E.E)
|(L)
|()
L→LE
|E
29
Non-recursive predictive parsing
It is possible to build a non-recursive parser by explicitly
maintaining a stack.
This method uses a parsing table that determines the
next production to be applied.
[Figure: model of the table-driven predictive parser — an INPUT buffer holding id + id * id $, a STACK initialized with E on top of $, the predictive parsing program, its OUTPUT, and the parsing TABLE M — together with the three cases (X = a = $, X = a ≠ $, X a non-terminal) that drive each move.]
Non-recursive predictive parsing…
The input buffer contains the string to be parsed
followed by $ (the right end marker)
The stack contains a sequence of grammar symbols
with $ at the bottom.
Initially, the stack contains the start symbol of the
grammar followed by $.
The parsing table is a two dimensional array M[A, a]
where A is a non-terminal of the grammar and a is a
terminal or $.
The parser program behaves as follows.
The program always considers
X, the symbol on top of the stack and
a, the current input symbol.
31
Predictive Parsing…
There are three possibilities:
1. X = a = $ : the parser halts and announces successful completion of parsing.
2. X = a ≠ $ : the parser pops X off the stack and advances the input pointer to the next symbol.
3. X is a non-terminal : the program consults entry M[X, a], which can be an X-production or an error entry.
If M[X, a] = {X -> uvw}, X on top of the stack will be
replaced by uvw (u at the top of the stack).
As an output, any code associated with the X-production
can be executed.
If M[X, a] = error, the parser calls the error recovery
method.
32
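To make these three cases concrete, here is a minimal sketch in C (not from the slides) of the table-driven parsing loop. It hard-codes the parsing table for the grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id used in the following slides; the single characters A, B, i and e stand for E', T', id and ε, and all names are assumptions made for illustration only.

#include <stdio.h>
#include <string.h>

static char stack[100];
static int top = -1;
static void push(char c) { stack[++top] = c; }

/* M[X, a]: the production to apply, "" = error ("e" = the empty string) */
static const char *table(char X, char a) {
    switch (X) {
    case 'E': return (a == 'i' || a == '(') ? "TA" : "";
    case 'A': return a == '+' ? "+TA" : (a == ')' || a == '$') ? "e" : "";
    case 'T': return (a == 'i' || a == '(') ? "FB" : "";
    case 'B': return a == '*' ? "*FB" : (a == '+' || a == ')' || a == '$') ? "e" : "";
    case 'F': return a == 'i' ? "i" : a == '(' ? "(E)" : "";
    }
    return "";
}

int main(void) {
    const char *input = "i+i*i$";            /* id + id * id $                 */
    int ip = 0;
    push('$'); push('E');                    /* stack: start symbol on top of $ */
    while (stack[top] != '$') {
        char X = stack[top], a = input[ip];
        if (X == a) { top--; ip++; }         /* case 2: pop X, advance input    */
        else if (strchr("i+*()", X)) {       /* terminal mismatch               */
            printf("error: expected %c\n", X); return 1;
        } else {                             /* case 3: X is a non-terminal     */
            const char *rhs = table(X, a);
            if (!*rhs) { printf("error at %c\n", a); return 1; }
            printf("%c -> %s\n", X, rhs);    /* output the production used      */
            top--;
            if (strcmp(rhs, "e") != 0)       /* push the RHS in reverse         */
                for (int k = (int)strlen(rhs) - 1; k >= 0; k--) push(rhs[k]);
        }
    }
    puts(input[ip] == '$' ? "accepted" : "rejected");   /* case 1               */
    return 0;
}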
A Predictive Parser Table
Grammar:
E  → T E'
E' → + T E' | ε
T  → F T'
T' → * F T' | ε
F  → ( E ) | id

[Figure: the parser configuration — INPUT: id + id * id $, STACK: E $, OUTPUT: the partially built parse tree with root E — together with the parsing table below.]

Parsing table M (rows: non-terminals, columns: input symbols):
NON-TERMINAL   id        +           *            (         )        $
E              E → TE'                            E → TE'
E'                       E' → +TE'                          E' → ε   E' → ε
T              T → FT'                            T → FT'
T'                       T' → ε      T' → *FT'              T' → ε   T' → ε
F              F → id                             F → (E)
Predictive Parsing Simulation…
[Figure: successive snapshots of the stack, the remaining input, and the growing parse tree while parsing id + id * id with the table above. The moves are:]

STACK (top at left)   INPUT            ACTION
$ E                   id + id * id $   E → TE'
$ E' T                id + id * id $   T → FT'
$ E' T' F             id + id * id $   F → id
$ E' T' id            id + id * id $   match id
$ E' T'               + id * id $      T' → ε
$ E'                  + id * id $      E' → +TE'
$ E' T +              + id * id $      match +
$ E' T                id * id $        T → FT'
$ E' T' F             id * id $        F → id
$ E' T' id            id * id $        match id
$ E' T'               * id $           T' → *FT'
$ E' T' F *           * id $           match *
$ E' T' F             id $             F → id
$ E' T' id            id $             match id
$ E' T'               $                T' → ε
$ E'                  $                E' → ε
$                     $                accept

When top(stack) = input symbol = $, the parser halts and accepts the input string.
37
Non-recursive predictive parsing…
Example: G:
E → TR
R → +TR
R → -TR
R → ε
T → 0 | 1 | … | 9
Input: 1+2
The parsing table for this grammar (under each digit d, M[T, d] = T → d):
X \ a    0 … 9     +          -          $
E        E → TR
R                  R → +TR    R → -TR    R → ε
T        T → d
38
Non-recursive predictive parsing…
39
Non-recursive predictive parsing…
Thank you
40
FIRST and FOLLOW
41
FIRST and FOLLOW
FIRST
FIRST(α) = the set of terminals that begin the strings derived from α.
If α ⇒* ε (in zero or more steps), ε is also in FIRST(α).
FIRST(X), where X is a grammar symbol, can be computed using the following rules:
1. If X is a terminal, then FIRST(X) = {X}.
2. If X is a non-terminal, there are two cases:
   – If X → ε is a production, add ε to FIRST(X).
   – If X → Y1 Y2 … Yk is a production, add every non-ε symbol of FIRST(Y1) to FIRST(X); if ε is in FIRST(Y1), also add the non-ε symbols of FIRST(Y2), and so on; if ε is in FIRST(Yi) for all i, add ε to FIRST(X).
43
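These rules are applied repeatedly until no FIRST set changes. A minimal sketch in C (not from the slides) of this fixed-point computation for the expression grammar, writing E' as A, T' as B, id as i and ε as e (the encoding is an assumption made for illustration):

#include <stdio.h>
#include <string.h>

/* Productions written as "Head=Body"; A stands for E', B for T',
   i for id and e for the empty string.                           */
static const char *prods[] = {
    "E=TA", "A=+TA", "A=e", "T=FB", "B=*FB", "B=e", "F=(E)", "F=i"
};
static const char *nts = "EATBF";       /* the non-terminals           */
static char first[5][16];               /* FIRST set of each of them   */

static int add(char *set, char c) {     /* add c to a set; 1 if it was new */
    if (strchr(set, c)) return 0;
    set[strlen(set)] = c;
    return 1;
}

int main(void) {
    int changed = 1;
    while (changed) {                    /* repeat until no set grows   */
        changed = 0;
        for (unsigned p = 0; p < sizeof prods / sizeof prods[0]; p++) {
            char *F = first[strchr(nts, prods[p][0]) - nts];
            const char *body = prods[p] + 2;
            int nullable = 1;
            for (int i = 0; body[i] && nullable; i++) {
                const char *nt = strchr(nts, body[i]);
                nullable = 0;
                if (!nt) {               /* terminal (or e): add it and stop   */
                    changed |= add(F, body[i]);
                } else {                 /* non-terminal: copy its FIRST \ {e} */
                    const char *G = first[nt - nts];
                    for (size_t k = 0; k < strlen(G); k++)
                        if (G[k] != 'e') changed |= add(F, G[k]);
                    if (strchr(G, 'e')) nullable = 1;   /* keep scanning       */
                }
            }
            if (nullable) changed |= add(F, 'e');  /* whole body can vanish    */
        }
    }
    for (int n = 0; n < 5; n++)
        printf("FIRST(%c) = { %s }\n", nts[n], first[n]);
    return 0;
}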
Construction of a predictive parsing table…
FOLLOW
FOLLOW(A) = set of terminals that can appear immediately to the right of A in some sentential form.
FOLLOW can be computed using the following rules:
1. Place $ in FOLLOW(S), where S is the start symbol.
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
48
Rules to Create FOLLOW
Example — FIRST sets for the expression grammar (computed above):
FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST(E') = {+, ε}
FIRST(T') = {*, ε}
49
Exercise:
Find FIRST and FOLLOW sets for the following grammar G:
E → TR
R → +TR
R → -TR
R → ε
T → 0 | 1 | … | 9

FIRST(E) = FIRST(T) = {0, 1, …, 9}
FIRST(R) = {+, -, ε}
FOLLOW(E) = {$}
FOLLOW(T) = {+, -, $}
FOLLOW(R) = {$}
52
Exercise…
Consider the following grammar over the alphabet
{ g,h,i,b}
A -> BCD
B -> bB | ε
C -> Cg | g | Ch | i
D -> AB | ε
Fill in the table below with the FIRST and FOLLOW sets for
the non-terminals in this grammar:
FIRST FOLLOW
A
B
C
D
53
Construction of predictive parsing table
Input: grammar G
Output: parsing table M
For each production A → α of the grammar do:
• For each terminal a in FIRST(α), add A → α to M[A, a].
• If ε ∈ FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A).
• If ε ∈ FIRST(α) and $ ∈ FOLLOW(A), add A → α to M[A, $].
• Make each undefined entry of M an error.
In short:
54
All the ε-productions are placed under the FOLLOW sets; the remaining productions are placed under the FIRST sets.
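A minimal sketch in C (not from the slides) of how these rules fill the table M for the expression grammar, given precomputed FIRST sets of each right-hand side and FOLLOW sets of each non-terminal (encoded as strings; i stands for id, e for ε, and only the right-hand side of each production is stored; undefined entries are printed as "." to mark errors). All names are assumptions made for illustration only.

#include <stdio.h>
#include <string.h>

#define NT 5                                    /* E, E', T, T', F          */
static const char *ntname[NT] = {"E", "E'", "T", "T'", "F"};
static const char *terms = "i+*()$";            /* the table columns        */
static const char *M[NT][6];                    /* M[A, a]: RHS or NULL     */

struct prod { int head; const char *rhs, *first; };
static const struct prod P[] = {                /* FIRST of each RHS given  */
    {0, "TE'",  "(i"}, {1, "+TE'", "+"}, {1, "e", "e"},
    {2, "FT'",  "(i"}, {3, "*FT'", "*"}, {3, "e", "e"},
    {4, "(E)",  "("},  {4, "id",   "i"},
};
static const char *follow[NT] = {")$", ")$", "+)$", "+)$", "+*)$"};

int main(void) {
    for (unsigned p = 0; p < sizeof P / sizeof P[0]; p++) {
        const struct prod *pr = &P[p];
        /* Rule 1: for each terminal a in FIRST(alpha), add A -> alpha to M[A, a]  */
        for (int a = 0; a < 6; a++)
            if (strchr(pr->first, terms[a])) M[pr->head][a] = pr->rhs;
        /* Rules 2 and 3: if e (epsilon) is in FIRST(alpha), add A -> alpha to
           M[A, b] for each b in FOLLOW(A), including $                            */
        if (strchr(pr->first, 'e'))
            for (int a = 0; a < 6; a++)
                if (strchr(follow[pr->head], terms[a])) M[pr->head][a] = pr->rhs;
    }
    printf("%-4s", "");                          /* print the resulting table */
    for (int a = 0; a < 6; a++) printf("%-8c", terms[a]);
    printf("\n");
    for (int A = 0; A < NT; A++) {
        printf("%-4s", ntname[A]);
        for (int a = 0; a < 6; a++) printf("%-8s", M[A][a] ? M[A][a] : ".");
        printf("\n");
    }
    return 0;
}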
Rules to Build Parsing Table

GRAMMAR:                 FIRST SETS:                  FOLLOW SETS:
E  → T E'                FIRST(E)  = {(, id}          FOLLOW(E)  = {), $}
E' → + T E' | ε          FIRST(E') = {+, ε}           FOLLOW(E') = {), $}
T  → F T'                FIRST(T)  = {(, id}          FOLLOW(T)  = {+, ), $}
T' → * F T' | ε          FIRST(T') = {*, ε}           FOLLOW(T') = {+, ), $}
F  → ( E ) | id          FIRST(F)  = {(, id}          FOLLOW(F)  = {+, *, ), $}

1. If A → α: for each terminal a ∈ FIRST(α), add A → α to M[A, a].
2. If A → α: if ε ∈ FIRST(α), add A → α to M[A, b] for each terminal b ∈ FOLLOW(A).
3. If A → α: if ε ∈ FIRST(α) and $ ∈ FOLLOW(A), add A → α to M[A, $].

PARSING TABLE:
NON-TERMINAL   id        +           *            (         )        $
E              E → TE'                            E → TE'
E'                       E' → +TE'                          E' → ε   E' → ε
T              T → FT'                            T → FT'
T'                       T' → ε      T' → *FT'              T' → ε   T' → ε
F              F → id                             F → (E)
61
Example:
62
Non-recursive predictive parsing…
Exercise 1:
Consider the following grammar G. Construct the predictive parsing table and parse the input: id + id * id
E  → T E'
E' → + T E' | ε
T  → F T'
T' → * F T' | ε
F  → ( E ) | id
FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST(E') = {+, ε}
FIRST(T') = {*, ε}
FOLLOW(E) = FOLLOW(E') = {$, )}
FOLLOW(T) = FOLLOW(T') = {+, $, )}
FOLLOW(F) = {*, +, $, )}
64
LL(k) Parser
This parser scans the input from left to right (L) and produces a leftmost derivation (L), looking ahead 1 symbol to choose its next action. Therefore, it is known as an LL(1) parser.
66
Non- LL(1) Grammar: Examples
67
LL(1) Grammars…
Exercise: Consider the following grammar G:
A’ -> A
A ->xA | yA |y
a) Find FIRST and FOLLOW sets for G:
b) Construct the LL(1) parse table for this
grammar.
c) Explain why this grammar is not LL(1).
d) Transform the grammar into a grammar that
is LL(1).
e) Give the parse table for the grammar created
in (d).
68
Exercises
69
Exercises
70
Exercises
3. Given the following grammar:
program -> procedure STMT–LIST
STMT–LIST -> STMT STMT–LIST | STMT
STMT -> do VAR = CONST to CONST begin STMT–LIST end
| ASSN–STMT
Show the parse tree for the following code fragment:
procedure
do i=1 to 100 begin
ASSN-STMT
ASSN-STMT
end
ASSN-STMT
71
Exercises
72
Syntax error handling
Common programming errors can occur at many
different levels:
Lexical errors include misspellings of identifiers, keywords, or operators: e.g., ebigin instead of begin.
Syntactic errors include misplaced semicolons, extra or missing braces { }, a case without an enclosing switch, …
Semantic errors include type mismatches between operators and operands: e.g., a return statement in a Java method with result type void, or an operator applied to an incompatible operand.
Logical errors can be anything from incorrect reasoning: e.g., using the assignment operator = instead of the comparison operator ==.
73
Syntax error handling…
The error handler should be written with the following goals in mind:
• Report the presence of errors clearly and accurately.
• Recover from each error quickly enough to detect subsequent errors.
• Add minimal overhead to the processing of correct programs.
74
Syntax error handling…
75
Error recovery in predictive parsing
An error can be detected in predictive parsing:
when the terminal on top of the stack does not match the next input symbol, or
when there is a non-terminal A on top of the stack, a is the next input symbol, and M[A, a] = error.
Panic mode error recovery method
Synchronization (synch) tokens and scan
76
Panic mode error recovery strategy
77
Panic mode error recovery…
❑ Choose alternative 1 (synch) – if the current input token is $ or is in FOLLOW(A), pop A from the stack.
❑ Choose alternative 2 (scan) – if the current input token is not $ and is not in FIRST(A) ∪ FOLLOW(A), skip the input token.
❑ Example: using FOLLOW and FIRST symbols as synchronizing tokens, build the parse table for grammar G:
E  → T E'
E' → + T E' | ε
T  → F T'
T' → * F T' | ε
F  → ( E ) | id
FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST(E') = {+, ε}   FIRST(T') = {*, ε}
FOLLOW(E) = FOLLOW(E') = {$, )}
FOLLOW(T) = FOLLOW(T') = {+, $, )}
FOLLOW(F) = {*, +, $, )}
Consider the grammar:  S → aABe
                       A → Abc | b
                       B → d
79
Bottom-Up Parser: Simulation
Productions:  S → aABe
              A → Abc | b
              B → d
INPUT: a b b c d e $
[Figure: successive snapshots of the remaining input and of the parse tree being built from the leaves upward. The sentence is reduced step by step:]
a b b c d e $     reduce by A → b
a A b c d e $     reduce by A → Abc
a A d e $         reduce by B → d
a A B e $         reduce by S → aABe
S $               accept
Stack implementation of shift/reduce
parsing
In LR parsing, the two main issues are:
o Finding the part of the text to reduce (the handle)
o Choosing the production (rule) to reduce it by
91
Stack implementation of shift/reduce parsing…
92
Example: the operations of a shift/reduce parser for the grammar
G: E → E + E | E * E | ( E ) | id
93
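As a worked illustration (the input string id + id * id and the way the shift/reduce choices are resolved are assumptions, not taken from the slides), one possible sequence of moves is:

STACK          INPUT              ACTION
$              id + id * id $     shift
$ id           + id * id $        reduce by E → id
$ E            + id * id $        shift
$ E +          id * id $          shift
$ E + id       * id $             reduce by E → id
$ E + E        * id $             shift
$ E + E *      id $               shift
$ E + E * id   $                  reduce by E → id
$ E + E * E    $                  reduce by E → E * E
$ E + E        $                  reduce by E → E + E
$ E            $                  accept

At the configuration $ E + E with * as the next input symbol, the parser could either reduce by E → E + E or shift; shifting (as above) effectively gives * higher precedence than +.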
Conflict during shift/reduce parsing
Grammars for which we can construct an LR(k)
parsing table are called LR(k) grammars.
Most of the grammars that are used in practice are
LR(1).
There are two types of conflicts in shift/reduce
parsing:
shift/reduce conflict: the parser knows the entire stack contents and the next k input symbols but still cannot decide whether it should shift or reduce (often caused by ambiguity).
reduce/reduce conflict: the parser cannot decide which of several productions it should use for a reduction.
Example: with the productions E → T, E → id, and T → id, and an id on top of the stack, the parser cannot tell whether to reduce the id to E or to T.
LR parser
[Figure: model of an LR parser — an input buffer a1 … ai … an $, a stack holding states and grammar symbols (sm, Xm, sm-1, Xm-1, …, s0, $ from top to bottom), the LR parsing program, its output, and the parsing table with its ACTION and GOTO parts.]
95
LR parser…
The LR(k) parser stack stores strings of the form:
s0 X1 s1 X2 s2 … Xm sm, where
• si is a new symbol called a state, which summarizes the information contained in the stack below it
• sm is the state on top of the stack
• Xi is a grammar symbol
The parser program decides the next step by using:
• the top of the stack (Sm),
• the input symbol (ai), and
• the parsing table which has two parts: action and
goto.
• then consulting the entry ACTION[Sm , ai] in the
parsing action table
96
Structure of the LR Parsing Table
The parsing table consists of two parts:
• a parsing-action function ACTION and
• a goto function GOTO.
The ACTION function takes as arguments a state i and a
terminal a (or $, the input endmarker).
The value of ACTION[i, a] can have one of four forms:
Shift j, where j is a state: the parser effectively shifts the input symbol a onto the stack, but uses state j to represent it.
Reduce A → β: the parser reduces the β on top of the stack to the head A.
Accept: the parser accepts the input and finishes parsing.
Error: the parser discovers an error.
The GOTO function is defined on sets of items: if GOTO(Ii, A) = Ij, then GOTO maps state i and non-terminal A to state j.
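A minimal sketch in C (not from the slides) of the LR parsing program driven by ACTION and GOTO. It hard-codes the SLR table for grammar G1' constructed later in this chapter (productions numbered 1: E → E+T, 2: E → T, 3: T → T*F, 4: T → F, 5: F → (E), 6: F → id); the character i stands for id, and all names are assumptions made for illustration only.

#include <stdio.h>
#include <stdlib.h>

enum { ID, PLUS, TIMES, LP, RP, END };          /* terminal indices          */

/* ACTION[state][terminal]: "sN" = shift N, "rN" = reduce by rule N,
   "a" = accept, "" = error. Columns: id + * ( ) $                          */
static const char *ACTION[12][6] = {
    {"s5", "",   "",   "s4", "",    ""  },      /* 0  */
    {"",   "s6", "",   "",   "",    "a" },      /* 1  */
    {"",   "r2", "s7", "",   "r2",  "r2"},      /* 2  */
    {"",   "r4", "r4", "",   "r4",  "r4"},      /* 3  */
    {"s5", "",   "",   "s4", "",    ""  },      /* 4  */
    {"",   "r6", "r6", "",   "r6",  "r6"},      /* 5  */
    {"s5", "",   "",   "s4", "",    ""  },      /* 6  */
    {"s5", "",   "",   "s4", "",    ""  },      /* 7  */
    {"",   "s6", "",   "",   "s11", ""  },      /* 8  */
    {"",   "r1", "s7", "",   "r1",  "r1"},      /* 9  */
    {"",   "r3", "r3", "",   "r3",  "r3"},      /* 10 */
    {"",   "r5", "r5", "",   "r5",  "r5"},      /* 11 */
};
/* GOTO[state][non-terminal], non-terminals E=0, T=1, F=2 (0 = unused)       */
static const int GOTO[12][3] = {
    {1,2,3},{0,0,0},{0,0,0},{0,0,0},{8,2,3},{0,0,0},
    {0,9,3},{0,0,10},{0,0,0},{0,0,0},{0,0,0},{0,0,0}
};
static const int HEAD[7] = {0, 0, 0, 1, 1, 2, 2};   /* head of each rule     */
static const int LEN[7]  = {0, 3, 1, 3, 1, 3, 1};   /* length of each body   */

static int tok(char c) {                        /* map a character to a terminal */
    switch (c) { case 'i': return ID;   case '+': return PLUS;
                 case '*': return TIMES; case '(': return LP;
                 case ')': return RP;   default:  return END; }
}

int main(void) {
    const char *in = "i+i*i$";                  /* id + id * id $            */
    int ip = 0, stack[100], top = 0;
    stack[0] = 0;                               /* start in state 0          */
    for (;;) {
        int s = stack[top], a = tok(in[ip]);
        const char *act = ACTION[s][a];
        if (act[0] == 's') {                    /* shift: push state, advance */
            stack[++top] = atoi(act + 1); ip++;
        } else if (act[0] == 'r') {             /* reduce by rule r           */
            int r = atoi(act + 1);
            top -= LEN[r];                      /* pop |body| states          */
            stack[top + 1] = GOTO[stack[top]][HEAD[r]];
            top++;
            printf("reduce by rule %d\n", r);
        } else if (act[0] == 'a') { puts("accepted"); return 0; }
        else { puts("syntax error"); return 1; }
    }
}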
Constructing SLR parsing tables
This method is the simplest of the three methods
used to construct an LR parsing table.
It is called SLR (simple LR) because it is the
easiest to implement.
However, it is also the weakest in terms of the
number of grammars for which it succeeds.
A parsing table constructed by this method is
called SLR table.
A grammar for which an SLR table can be
constructed is said to be an SLR grammar.
98
Constructing SLR parsing tables…
LR (0) item
An LR (0) item (item for short) is a production of a
grammar G with a dot at some position of the right
side.
For example for the production A -> X Y Z we have
four items:
A -> . X Y Z
A ->X . Y Z
A ->X Y . Z
A ->X Y Z.
For the production A -> ε we only have one item:
A ->.
99
Constructing SLR parsing tables…
An item indicates how much of a production we have seen so far and what we hope to see next.
The central idea in the SLR method is to construct,
from the grammar, a deterministic finite automaton to
recognize viable prefixes.
A viable prefix is a prefix of a right sentential form
that can appear on the stack of a shift/reduce parser.
• If you have a viable prefix in the stack it is possible
to have inputs that will reduce to the start symbol.
• If you don’t have a viable prefix on top of the stack
you can never reach the start symbol; therefore you
have to call the error recovery procedure.
100
Constructing SLR parsing tables…
The closure operation
If I is a set of items for a grammar G, then Closure(I) is the set of items constructed from I by two rules:
1. Initially, every item in I is added to Closure(I).
2. If A → α·Bβ is in Closure(I) and B → γ is a production, add the item B → ·γ to Closure(I), if it is not already there. Apply this rule until no more new items can be added.
101
Constructing SLR parsing tables…
Example G1’:
E’ -> E
E -> E + T
E -> T
T -> T * F
T -> F
F -> (E)
F -> id
I = {[E’ -> .E]}
Closure (I) = {[E’ -> .E], [E -> .E + T], [E-> .T], [T
-> .T * F], [T -> .F], [F -> .(E)], [F -> .id]}
102
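A minimal sketch in C (not from the slides) of the Closure operation for G1', representing an item as a (production, dot-position) pair; X stands for E' and i for id, and this encoding is an assumption made for illustration only.

#include <stdio.h>
#include <string.h>

/* Grammar G1' with X standing for E' and i for id:
   0: X -> E   1: E -> E+T   2: E -> T   3: T -> T*F
   4: T -> F   5: F -> (E)   6: F -> i                                       */
static const char head[] = "XEETTFF";
static const char *body[] = {"E", "E+T", "T", "T*F", "F", "(E)", "i"};
#define NPROD 7

struct item { int prod, dot; };                 /* an LR(0) item             */

static int has(struct item *I, int n, int p, int d) {
    for (int k = 0; k < n; k++)
        if (I[k].prod == p && I[k].dot == d) return 1;
    return 0;
}

/* Closure: if A -> alpha . B beta is in I and B -> gamma is a production,
   add B -> . gamma; repeat until nothing new can be added.                  */
static int closure(struct item *I, int n) {
    for (int k = 0; k < n; k++) {
        char B = body[I[k].prod][I[k].dot];     /* symbol right after the dot */
        if (B == '\0' || !strchr("XETF", B)) continue;  /* not a non-terminal */
        for (int p = 0; p < NPROD; p++)
            if (head[p] == B && !has(I, n, p, 0))
                I[n++] = (struct item){p, 0};
    }
    return n;
}

int main(void) {
    struct item I[32] = {{0, 0}};               /* I = { [E' -> .E] }        */
    int n = closure(I, 1);
    for (int k = 0; k < n; k++)                 /* print the resulting items */
        printf("%c -> %.*s.%s\n", head[I[k].prod],
               I[k].dot, body[I[k].prod], body[I[k].prod] + I[k].dot);
    return 0;
}

Running this reproduces the seven items of Closure({[E' → .E]}) listed above.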
Constructing SLR parsing tables…
The Goto operation
The second useful function is Goto (I, X) where I is a
set of items and X is a grammar symbol.
Goto (I, X) is defined as the closure of all items
[A-> αX.β] such that [A -> α.Xβ] is in I.
Example:
I = {[E’ -> E.], [E -> E . + T]}
Then
goto (I, +) = {[E -> E +. T], [T -> .T * F], [T -> .F],
[F -> .(E)] [F -> .id]}
103
Constructing SLR parsing tables…
The set of Items construction
Below is given an algorithm to construct C, the
canonical collection of sets of LR (0) items for
augmented grammar G’.
Procedure Items(G'):
begin
  C := { Closure({ [S' → ·S] }) }
  repeat
    for each set of items I in C and each grammar symbol X
        such that Goto(I, X) is not empty and not in C do
      add Goto(I, X) to C
  until no more sets of items can be added to C
end
104
Constructing SLR parsing tables…
Example: Construction of the set of Items for the
augmented grammar above G1’.
I0 = {[E’ -> .E], [E-> .E + T], [E ->.T], [T ->.T * F],
[T ->.F], [F->.(E)], [F ->.id]}
I1 = Goto (I0, E) = {[E’ -> E.], [E -> E. + T]}
I2 = Goto (I0, T) = {[E -> T.], [T -> T. * F]}
I3 = Goto (I0, F) = {[T -> F.]}
I4 = Goto (I0, () = {[F -> (.E)], [E -> .E + T], [E-> .T],
[T -> .T * F], [T -> .F], [F -> . (E)], [F -> .id]}
I5 = Goto (I0, id) = {[F -> id.]}
I6 = Goto (I1, +) = {[E -> E + . T], [T-> .T * F], [T-> .F],
[F -> .(E)], [F -> .id]}
105
I7 = Goto (I2, *) = {[T ->T * . F], [F->.(E)],
[F -> .id]}
I8 = Goto (I4, E) = {[F ->(E.)], [E -> E . + T]}
Goto(I4,T)={[E->T.], [T->T.*F]}=I2;
Goto(I4,F)={[T->F.]}=I3;
Goto (I4, () = I4;
Goto (I4, id) = I5;
I9 = Goto (I6, T) = {[E -> E + T.], [T -> T . * F]}
Goto (I6, F) = I3;
Goto (I6, () = I4;
Goto (I6, id) = I5;
I10 = Goto (I7, F) = {[T-> T * F.]}
Goto (I7, () = I4;
Goto (I7, id) = I5;
I11= Goto (I8, )) = {[F-> (E).]}
Goto (I8, +) = I6;
Goto (I9, *) = I7;
106
107
SLR table construction algorithm
1. Construct C = {I0, I1, …, In}, the collection of sets of LR(0) items for G'.
2. State i is constructed from Ii, and:
   a) If [A → α·aβ] is in Ii and Goto(Ii, a) = Ij (a is a terminal), then ACTION[i, a] = shift j.
   b) If [A → α·] is in Ii, then ACTION[i, a] = reduce A → α for all a in FOLLOW(A), for A ≠ S'.
   c) If [S' → S·] is in Ii, then ACTION[i, $] = accept.
3. If Goto(Ii, A) = Ij for a non-terminal A, then GOTO[i, A] = j.
4. All entries not defined by rules 2 and 3 are errors.
5. The initial state is the one constructed from the set of items containing [S' → ·S].
109
SLR table construction method…
Example: Construct the SLR parsing table for the
grammar G1’
Follow (E) = {+, ), $} Follow (T) = {+, ), $, *}
Follow (F) = {+, ), $,*}
E’ -> E
1 E -> E + T
2 E ->T
3 T -> T * F
4 T -> F
5 F -> (E)
6 F -> id
By following the method we find the Parsing table
used earlier. 110
State    ACTION                                        GOTO
         id      +       *       (       )      $     E     T     F
0        s5                      s4                    1     2     3
1                s6                             acc
2                r2      s7              r2     r2
3                r4      r4              r4     r4
4        s5                      s4                    8     2     3
5                r6      r6              r6     r6
6        s5                      s4                          9     3
7        s5                      s4                                10
8                s6                      s11
9                r1      s7              r1     r1
10               r3      r3              r3     r3
11               r5      r5              r5     r5
Legend: si means shift and go to state i; rj means reduce by production number j.
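As a worked illustration (the input id * id + id is an assumption, not taken from the slides), the moves of the SLR parser using this table are:

STACK (states)   SYMBOLS       INPUT            ACTION
0                $             id * id + id $   shift 5
0 5              $ id          * id + id $      reduce by F → id      (GOTO[0, F] = 3)
0 3              $ F           * id + id $      reduce by T → F       (GOTO[0, T] = 2)
0 2              $ T           * id + id $      shift 7
0 2 7            $ T *         id + id $        shift 5
0 2 7 5          $ T * id      + id $           reduce by F → id      (GOTO[7, F] = 10)
0 2 7 10         $ T * F       + id $           reduce by T → T * F   (GOTO[0, T] = 2)
0 2              $ T           + id $           reduce by E → T       (GOTO[0, E] = 1)
0 1              $ E           + id $           shift 6
0 1 6            $ E +         id $             shift 5
0 1 6 5          $ E + id      $                reduce by F → id      (GOTO[6, F] = 3)
0 1 6 3          $ E + F       $                reduce by T → F       (GOTO[6, T] = 9)
0 1 6 9          $ E + T       $                reduce by E → E + T   (GOTO[0, E] = 1)
0 1              $ E           $                accept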
SLR parsing table
Exercise: Construct the SLR parsing table for
the following grammar:/* Grammar G2’ */
S’ -> S
S -> L = R
S -> R
L -> *R
L -> id
R -> L
113
Answer
C = {I0, I1, I2, I3, I4, I5, I6, I7, I8, I9}
I0 = {[S’ -> .S], [S ->.L = R], [S ->.R], [L-> .*R],
[L ->.id], [R ->.L]}
I1 = goto (I0, S) = {[S’ -> S.]}
I2 = goto (I0, L) = {[S -> L . = R], [R -> L . ]}
I3 = goto (I0, R) = {[S -> R . ]}
I4 = goto (I0, *) ={[L -> * . R] [L ->.*R], [L ->.id],
[R->.L]}
I5 = goto (I0, id) ={[L -> id . ]}
I6 = goto (I2, =) ={[S -> L = . R], [R ->. L ], [L ->.*R],
[L ->.id]}
❑ I7 = goto (I4, R) ={[L -> * R . ]}
114
I8 = goto (I4, L) ={[R -> L . ]}
goto (I4, *) = I4
goto (I4, id) = I5
I9 = goto (I6, R) ={[S -> L = R .]}
goto (I6, L) = I8
goto (I6, *) = I4
goto (I6, id) = I5
Follow (S) = {$} Follow (R) = {$, =} Follow (L) = {$, =}
We have shift/reduce conflict since = is in Follow
(R) and R -> L. is in I2 and Goto (I2, =) = I6
Every SLR(1) grammar is unambiguous, but there are many
unambiguous grammars that are not SLR(1).
G2’ is not an ambiguous grammar. However, it is not SLR. This
is because the SLR parser is not powerful enough to remember
enough left context to decide whether to shift or reduce when
it sees an =.
LR parsing: Exercise
❑ Given the following Grammar:
(1) S -> A
(2) S -> B
(3) A -> a A b
(4) A -> 0
(5) B -> a B b b
(6) B -> 1
❑ Construct the SLR parsing table.
❑ Write the action of an LR parse for the following string
aa1bbbb
116
Thank you
117
The Parser Generator: Yacc
Yacc stands for "yet another compiler-compiler".
Yacc: a tool for automatically generating a parser
given a grammar written in a yacc specification (.y
file)
Yacc parser – calls lexical analyzer to collect
tokens from input stream
Tokens are organized using grammar rules
When a rule is recognized, its action is executed
Note
❑ lex tokenizes the input and yacc parses the tokens,
taking the right actions, in context.
118
Scanner, Parser, Lex and Yacc
119
Yacc
[Figure: the Yacc specification (yacc.y) is run through the Yacc compiler to produce y.tab.c; y.tab.c is compiled by the C compiler into a.out; a.out then reads the input stream and produces the output stream.]
120
Yacc…
There are four steps involved in creating a compiler in
Yacc:
1. Generate a parser from Yacc by running Yacc over
the grammar file.
2. Specify the grammar:
– Write the grammar in a .y file (also specify the actions
here that are to be taken in C).
– Write a lexical analyzer to process input and pass tokens
to the parser. This can be done using Lex.
– Write a function that starts parsing by calling yyparse().
– Write error handling routines (like yyerror()).
3. Compile code produced by Yacc as well as any
other relevant source files.
4. Link the object files to appropriate libraries for
the executable parser.
Yacc Specification
As with Lex, a Yacc program is also divided into three
sections separated by double percent signs.
A Yacc specification consists of three parts:
Yacc declarations, and C declarations within %{ … %}
%%
translation rules
%%
user-defined auxiliary procedures
The translation rules are productions with actions:
production1   { semantic action1 }
production2   { semantic action2 }
…
123
Synthesized Attributes
Semantic actions may refer to values of the synthesized
attributes of terminals and non-terminals in a
production:
X : Y1 Y2 Y3 … Yn   { action }
$$ refers to the value of the attribute of X
$i refers to the value of the attribute of Yi
For example:
factor : '(' expr ')'   { $$ = $2; }
[Figure: in the annotated parse tree, factor.val = x is computed from expr.val = x by the action $$ = $2.]
124
Lex Yacc interaction
[Figure: the Yacc specification (yacc.y) is run through the Yacc compiler to produce y.tab.c and y.tab.h (the token definitions); the Lex specification (lex.l), which includes y.tab.h, is run through the Lex compiler to produce lex.yy.c; y.tab.c and lex.yy.c are then compiled together by the C compiler into a.out.]
125
Lex Yacc interaction…
[Figure: calc.y is processed by Yacc into y.tab.c (which contains yyparse()) and y.tab.h; calc.l is processed by Lex into lex.yy.c (which contains yylex()); gcc compiles them into a.out, which reads the input and produces the output.]
126
Lex Yacc interaction…
If lex is to return tokens that yacc will process, they
have to agree on what tokens there are. This is
done as follows:
The yacc file will have token definitions
%token INTEGER
in the definitions section.
When the yacc file is translated with yacc -d, a header file
y.tab.h is created that has definitions like
#define INTEGER 258
This file can then be included in both the lex and yacc
program.
The lex file can then call return INTEGER, and the yacc
program can match on this token.
127
Example : Simple calculator: yacc file

%{
#include <stdio.h>
void yyerror(char *);
#define YYSTYPE int            /* int type for the attributes and for yylval */
%}
%token INTEGER
%%
/* grammar rules and actions */
program:
        program expr '\n'      { printf("%d\n", $2); }
        |
        ;
expr:
        INTEGER                { $$ = $1; }      /* $$ = value of the LHS (expr);      */
        | expr '+' expr        { $$ = $1 + $3; } /* $1, $3 = values of the RHS symbols  */
        | expr '-' expr        { $$ = $1 - $3; } /* (token values come from yylval)     */
        ;
%%
void yyerror(char *s) {
        fprintf(stderr, "%s\n", s);
}
int main(void) {
        yyparse();             /* the parser invokes the lexical analyzer (yylex) for tokens */
        return 0;
}
Example : Simple calculator: lex file

%{
/* The lex program matches numbers and operators and returns them as tokens. */
#include <stdio.h>
#include <stdlib.h>
#include "y.tab.h"             /* generated by yacc -d; contains #define INTEGER ... */
void yyerror(char *);          /* defined in y.tab.c */
extern int yylval;
%}
%%
[0-9]+      { yylval = atoi(yytext);   /* place the integer value in yylval */
              return INTEGER;
            }
[-+*/\n]    return *yytext;            /* operators (and newline) are returned as themselves */
[ \t]       ;                          /* skip white space */
.           yyerror("invalid character");
%%
int yywrap(void) {
        return 1;
}
Lex and Yacc: compile and run
[compiler@localhost yacc]$ vi calc.l
[compiler@localhost yacc]$ vi calc.y
[compiler@localhost yacc]$ yacc -d calc.y
yacc: 4 shift/reduce conflicts.
[compiler@localhost yacc]$ lex calc.l
[compiler@localhost yacc]$ ls
a.out calc.l calc.y lex.yy.c typescript y.tab.c y.tab.h
[compiler@localhost yacc]$ gcc y.tab.c lex.yy.c
[compiler@localhost yacc]$ ls
a.out calc.l calc.y lex.yy.c typescript y.tab.c y.tab.h
[compiler@localhost yacc]$ ./a.out
2+3
5
23+8+
Invalid charachter
syntax error
130
Example : Simple calculator: yacc file– option2
%{
#include <stdlib.h>
#include <stdio.h>
%}
%token INTEGER
%%
program :
        program expr '\n'              { printf("%d\n", $2); }
        |
        ;
expr :    expr '+' mulexpr             { $$ = $1 + $3; }
        | expr '-' mulexpr             { $$ = $1 - $3; }
        | mulexpr                      { $$ = $1; }
        ;
mulexpr : mulexpr '*' term             { $$ = $1 * $3; }
        | mulexpr '/' term             { $$ = $1 / $3; }
        | term                         { $$ = $1; }
        ;
term :    '(' expr ')'                 { $$ = $2; }
        | INTEGER                      { $$ = $1; }
        ;
%%

Here precedence and associativity are built into the grammar itself: expr handles + and -, while mulexpr handles * and /.
131
Example : Simple calculator: yacc file– option2
132
Calculator 2: Example– yacc file
%{
#include <stdio.h>
int sym[26];                           /* sym holds the value of the associated variable */
%}
%token INTEGER VARIABLE
%left '+' '-'                          /* associativity and precedence rules */
%left '*' '/'
%%
program :
        program statement '\n'
        |
        ;
statement :
        expression                     { printf("%d\n", $1); }
        | VARIABLE '=' expression      { sym[$1] = $3; }
        ;
expression :
        INTEGER                        { $$ = $1; }
        | VARIABLE                     { $$ = sym[$1]; }
        | expression '+' expression    { $$ = $1 + $3; }
        | expression '-' expression    { $$ = $1 - $3; }
        | expression '*' expression    { $$ = $1 * $3; }
        | expression '/' expression    { $$ = $1 / $3; }
        | '(' expression ')'           { $$ = $2; }
        ;
%%

Sample session:
user: 3 * (4 + 5)        calc: 27
user: x = 3 * (4 + 5)
user: y = 5
user: x                  calc: 27
user: y                  calc: 5
user: x + 2*y            calc: 37
Calculator 2: Example– yacc file
185
Calculator 2: Example– lex file
%{
/* The lexical analyzer returns variables and integers. */
#include <stdio.h>
#include <stdlib.h>
#include "y.tab.h"
void yyerror(char *);
extern int yylval;
%}
%%
[a-z]       { yylval = *yytext - 'a';  /* for variables, yylval is an index (0-25) into the symbol table sym */
              return VARIABLE;
            }
[0-9]+      { yylval = atoi(yytext);
              return INTEGER;
            }
[-+*/()=\n] return *yytext;
[ \t]       ;                          /* skip white space */
.           yyerror("invalid character");
%%
int yywrap(void)
{
        return 1;
}
Conclusions
Yacc and Lex are very helpful for building
the compiler front-end
A lot of time is saved when compared to
hand-implementation of parser and scanner
They both work as a mixture of “rules” and
“C code”
C code is generated and is merged with the
rest of the compiler code
Exercise on Syntax analyzer
137
Calculator program
Expand the calculator program so that the new
calculator program is capable of processing:
user: 3 * (4 + 5)
user: x = 3 * (4 + 5)
user: y = 5
user: x + 2*y
2^3/6
sin(1) + cos(PI)
tan
log
factorial
138
CFG for MINI Language and LR(1)
parser
Write a CFG for the MINI language specifications.
Transform your CFG into:
Predictive parser (LL(1)).
- Compute FIRST, FOLLOW sets for the grammar and
create the Parsing table (manually).
❑ Bottom up parser (LR(1)).
- Construct either SLR (LR(0)), canonical (LR(1)), or
LALR parsing table (manually).
❑ Write a parsing program in yacc to parse tokens from the
MINI language.
139