Chapter 3 B Top-Down Parsing

The document discusses top-down parsing and recursive descent parsers. It begins by introducing top-down and bottom-up parsing strategies. It then discusses recursive descent parsers, which implement top-down parsing by using a set of mutually recursive functions. Recursive descent parsers are a type of LL(k) parser that uses leftmost derivations with k symbols of lookahead. The document explains how to build a recursive descent parser by first determining the first and follow sets of the grammar's nonterminals. It provides algorithms for calculating first and follow sets and gives an example of applying the algorithms to a sample grammar.

Uploaded by

Sola
Copyright
© © All Rights Reserved

Chapter Three (continued)

Top-Down Parsing
Objective
At the end of this session students will be able to:
• Understand the basics of parsing techniques (top-down vs. bottom-up parsing).
• Understand recursive descent parsers: First and Follow sets, how to find the First and Follow sets of a grammar, and LL(1) parse tables.
• Understand grammars that are not LL(1): removing ambiguity, removing left recursion, and left factoring.
• Be familiar with LL(k) grammars.

Parsing and Parsers
• Once we have described the syntax of our programming language using a context-free grammar, the next step is to determine whether a string of tokens returned from the lexical analyzer can be derived from that context-free grammar.
• Determining whether a sequence of tokens is syntactically correct is called parsing.
• Two main strategies:
A. Top-down parsing: start at the root of the parse tree and grow toward the leaves.
o Pick a production rule and try to match the input.
o A bad "pick" may require backtracking.
B. Bottom-up parsing: start at the leaves and grow toward the root (earlier parsers, e.g. yacc).
o As input is consumed, encode possible parse trees in an internal state.

Contd.
• Both top-down and bottom-up parsers scan the input from left to right, one symbol at a time.
• Efficient top-down and bottom-up parsers can be implemented by making use of context-free grammars:
o LL for top-down parsing
o LR for bottom-up parsing
• We will see that a top-down parser tries to find the leftmost derivation of the given source program.
• We will see that a bottom-up parser tries to find the rightmost derivation of the given source program, in reverse order.
Contd.

LL(1) parsers, recursive descent parsers
o Left-to-right input
o Leftmost derivation
o 1 symbol of look-ahead
Grammars that these can handle are called LL(1) grammars.

LR(1) parsers, operator precedence
o Left-to-right input
o Rightmost derivation
o 1 symbol of look-ahead
Grammars that these can handle are called LR(1) grammars.

Also: LL(k), LR(k), SLR, LALR, …

Recursive Descent Parsers
• Top-down parsers are usually implemented as a suite of mutually recursive functions that descend through a parse tree for the string, and as such are called "recursive descent parsers" (RDP).
• Recursive descent parsers fall into a class of parsers known as LL(k) parsers.
• LL(k) stands for Left-to-right, Leftmost-derivation, k-symbol lookahead parsers.
• We first examine LL(1) parsers, that is, LL parsers with one symbol of lookahead.
• To build the RDP, we first need to create the "First" and "Follow" sets of the non-terminals in the CFG.
Example: A CFG and a CFG in its RDP form

a)
Terminals = { id, num, while, print, >, {, }, ;, (, ) }
Non-Terminals = { S, E, B, L }
Rules = (1) S → print(E);
(2) S → while (B) S
(3) S → { L }
(4) E → id
(5) E → num
(6) B → E > E
(7) L → SL | ε
Start Symbol = S

b)
Terminals = { e, f, g, h, i }
Non-Terminals = { S', S, A, B, C, D }
Rules = (0) S' → S$
(1) S → AB
(2) S → Cf
(3) A → ef
(4) A → ε
(5) B → hg
(6) C → DD
(7) C → fi
(8) D → g
Start Symbol = S'

First Sets and Follow Sets
Goal: given productions A → α | β, the parser should be able to choose between α and β.
How can the next input token help us decide?
Solution: FIRST sets
Informally: FIRST(α) is the set of tokens that could appear as the first symbol in a string derived from α.
Def: x is in FIRST(α) iff α ⇒* xγ for some string γ.

First Sets
• The First set of a non-terminal A is the set of all terminals that can begin a string derived from A.
• If the empty string ε can be derived from A, then ε is also in the First set of A.
Contd.
For instance, given the CFG below ($ is an end-of-file marker, ε means the empty string):
Terminals = { e, f, g, h, i }
Non-Terminals = { S', S, A, B, C, D }
Rules = (0) S' → S$
(1) S → AB | Cf
(3) A → ef | ε
(5) B → hg
(6) C → DD | fi
(8) D → g

• In this grammar, the set of all strings derivable from the non-terminal S' is {efhg$, hg$, fif$, ggf$}.
• Thus First(S') = {e, f, g, h}, the set of first terminals of these strings.
• Similarly, we can derive the First sets of S, A, B, C and D, and of some strings of symbols, as follows:
First(S) = {e, f, g, h}
First(A) = {e, ε}
First(B) = {h}
First(C) = {f, g}
First(D) = {g}
First(DD) = {g}
First(AB) = {e, h}
First(efB) = {e}
First(AC) = {e, f, g}
Contd.

Follow Sets
• For each non-terminal in a grammar, we can also create a Follow set.
• The Follow set of a non-terminal A is the set of all terminals that can appear immediately after A at some step in the derivation of a valid sentence.
• Take another look at the CFG shown in slide no. 5 above. What terminals can follow A in a derivation?
o Consider the derivation S' ⇒ S$ ⇒ AB$ ⇒ Ahg$. Since h follows A in this derivation, h is in the Follow set of A. Note: $ is the end-of-file marker.
o What about the non-terminal D? Consider the partial derivation S' ⇒ S$ ⇒ Cf$ ⇒ DDf$ ⇒ Dgf$. Here f follows the second D (in DDf$) and g follows the first D (in Dgf$), so both f and g are in the Follow set of D.

Contd.
• The Follow sets for all non-terminals in the CFG are shown below:

Follow(S') = { }
Follow(S) = {$}
Follow(A) = {h}
Follow(B) = {$}
Follow(C) = {f}
Follow(D) = {f, g}

Finding First and Follow Sets
• To calculate the First set of a non-terminal A, we need to calculate the First set of a string of terminals and non-terminals, since the rules that define what a non-terminal can derive contain both terminals and non-terminals.
• The First set of a string α of terminals and non-terminals can be defined recursively using the following two algorithms:

Algorithm to calculate First(α), for a string α of Terminals and Non-Terminals
• If α = ε, then First(α) = {ε}
• If the first symbol in α is the terminal a, then First(α) = {a}
• If α = Aα' for some non-terminal A and (possibly empty) string α' of terminals and non-terminals:
o If First(A) does not contain ε, then First(α) = First(A)
o If First(A) does contain ε, then First(α) = (First(A) - {ε}) ∪ First(α')
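The string algorithm above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `first_of_string`, the use of Python lists for strings of grammar symbols, and the `FIRST` dictionary (filled with the First sets listed earlier for the example grammar) are assumptions, not part of the slides.

```python
EPS = "ε"

# First sets of the non-terminals of the example grammar, as listed earlier.
FIRST = {
    "S'": {"e", "f", "g", "h"},
    "S": {"e", "f", "g", "h"},
    "A": {"e", EPS},
    "B": {"h"},
    "C": {"f", "g"},
    "D": {"g"},
}

def first_of_string(symbols, first):
    """Return the First set of a list of grammar symbols."""
    if not symbols:                      # the string is epsilon
        return {EPS}
    head, rest = symbols[0], symbols[1:]
    if head not in first:                # head is a terminal
        return {head}
    if EPS not in first[head]:           # head cannot derive epsilon
        return set(first[head])
    # head can derive epsilon: drop epsilon and continue with the rest
    return (first[head] - {EPS}) | first_of_string(rest, first)

print(first_of_string(["A", "B"], FIRST))
print(first_of_string(["A", "C"], FIRST))
```

Any symbol that is not a key of `FIRST` is treated as a terminal, which keeps the sketch short but means a misspelled non-terminal would silently be read as a terminal.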
Contd.
Algorithm to calculate First sets for all Non-Terminals in CFG G
1. For each non-terminal A in G, set First(A) = { }
2. For each rule A → α (where α is a string of terminals and non-terminals), add all elements of First(α) to First(A). That is:
o If α = ε, add ε to First(A)
o If the first symbol in α is the terminal a, then add a to First(A)
o If α = A1α' for some non-terminal A1, and First(A1) does not contain ε, then add all elements of First(A1) to First(A)
o If α = A1α' for some non-terminal A1, and First(A1) does contain ε, then add all elements of First(A1) (other than ε) to First(A), and recursively add all elements of First(α') to First(A)
3. If any changes were made in step 2, go back to step 2 and repeat
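A minimal Python sketch of this fixed-point algorithm is given here, applied to grammar (b) with rule (2) taken as S → Cf. The representation of rules as `(lhs, rhs)` pairs and the names `compute_first` and `first_of_string` are assumptions made for illustration.

```python
EPS = "ε"

# Grammar (b): (left-hand side, right-hand side) pairs in rule order 0..8.
RULES = [
    ("S'", ["S", "$"]),
    ("S", ["A", "B"]),
    ("S", ["C", "f"]),
    ("A", ["e", "f"]),
    ("A", []),              # A -> epsilon
    ("B", ["h", "g"]),
    ("C", ["D", "D"]),
    ("C", ["f", "i"]),
    ("D", ["g"]),
]

def first_of_string(symbols, first):
    """First set of a list of symbols, using the current First sets."""
    out = set()
    for sym in symbols:
        if sym not in first:             # terminal: it starts the string
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:        # sym is not (yet) known to be nullable
            return out
    out.add(EPS)                         # empty string, or all symbols nullable
    return out

def compute_first(rules):
    first = {lhs: set() for lhs, _ in rules}      # step 1
    changed = True
    while changed:                                # step 3
        changed = False
        for lhs, rhs in rules:                    # step 2
            new = first_of_string(rhs, first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first

for nonterminal, symbols in compute_first(RULES).items():
    print(nonterminal, sorted(symbols))
```

Because sets only ever grow and there are finitely many terminals, the loop is guaranteed to stop.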


Example
Let G be the CFG given on slide no. 7 above.
o Initially, for each non-terminal A in G we set First(A) = { }, the empty set.
o We then go through one iteration of the algorithm, which modifies the First sets as follows:
(0) S' → S$. Add { } to First(S') = { } (no change)
(1) S → AB. Add { } to First(S) = { } (no change)
(2) S → Cf. Add { } to First(S) = { } (no change)
(3) A → ef. Add e to First(A) = {e}
(4) A → ε. Add ε to First(A) = {e, ε}
(5) B → hg. Add h to First(B) = {h}
(6) C → DD. Add { } to First(C) = { } (no change)
(7) C → fi. Add f to First(C) = {f}
(8) D → g. Add g to First(D) = {g}

Non-Terminal   First Set
S'             { }
S              { }
A              {e, ε}
B              {h}
C              {f}
D              {g}

Note: Since there were 5 changes, we need another iteration.

Contd.
(0) S' → S$. Add { } to First(S') = { } (no change)
(1) S → AB. Add e, h to First(S) = {e, h}
(2) S → Cf. Add f to First(S) = {e, f, h}
(3) A → ef. Add e to First(A) = {e, ε} (no change)
(4) A → ε. Add ε to First(A) = {e, ε} (no change)
(5) B → hg. Add h to First(B) = {h} (no change)
(6) C → DD. Add g to First(C) = {f, g}
(7) C → fi. Add f to First(C) = {f, g} (no change)
(8) D → g. Add g to First(D) = {g} (no change)

Non-Terminal   First Set
S'             { }
S              {e, f, h}
A              {e, ε}
B              {h}
C              {f, g}
D              {g}

Note: Since there were 3 changes, we need another iteration.

Contd.
(0) S' → S$. Add e, f, h to First(S') = {e, f, h}
(1) S → AB. Add e, h to First(S) = {e, f, h} (no change)
(2) S → Cf. Add f, g to First(S) = {e, f, g, h}
(3) A → ef. Add e to First(A) = {e, ε} (no change)
(4) A → ε. Add ε to First(A) = {e, ε} (no change)
(5) B → hg. Add h to First(B) = {h} (no change)
(6) C → DD. Add g to First(C) = {f, g} (no change)
(7) C → fi. Add f to First(C) = {f, g} (no change)
(8) D → g. Add g to First(D) = {g} (no change)

Non-Terminal   First Set
S'             {e, f, h}
S              {e, f, g, h}
A              {e, ε}
B              {h}
C              {f, g}
D              {g}

Note: Since there were 2 changes, we need another iteration.

Contd.
(0) S' → S$. Add e, f, g, h to First(S') = {e, f, g, h}
(1) S → AB. Add e, h to First(S) = {e, f, g, h} (no change)
(2) S → Cf. Add f, g to First(S) = {e, f, g, h} (no change)
(3) A → ef. Add e to First(A) = {e, ε} (no change)
(4) A → ε. Add ε to First(A) = {e, ε} (no change)
(5) B → hg. Add h to First(B) = {h} (no change)
(6) C → DD. Add g to First(C) = {f, g} (no change)
(7) C → fi. Add f to First(C) = {f, g} (no change)
(8) D → g. Add g to First(D) = {g} (no change)

Non-Terminal   First Set
S'             {e, f, g, h}
S              {e, f, g, h}
A              {e, ε}
B              {h}
C              {f, g}
D              {g}

Note: Since there was 1 change, we need another iteration.

Contd.
(0) S' → S$. Add e, f, g, h to First(S') (no change)
(1) S → AB. Add e, h to First(S) (no change)
(2) S → Cf. Add f, g to First(S) (no change)
(3) A → ef. Add e to First(A) (no change)
(4) A → ε. Add ε to First(A) (no change)
(5) B → hg. Add h to First(B) (no change)
(6) C → DD. Add g to First(C) (no change)
(7) C → fi. Add f to First(C) (no change)
(8) D → g. Add g to First(D) (no change)

Non-Terminal   First Set
S'             {e, f, g, h}
S              {e, f, g, h}
A              {e, ε}
B              {h}
C              {f, g}
D              {g}

Since there were no changes, we stop.

Note that within each iteration we can examine the rules in any order.
• If we examine the rules in a different order in each iteration, we will still reach the same result, but it may take a different number of iterations.
• Exercise: check whether examining the rules in the order 8, 7, 6, 5, 4, 3, 2, 1, 0 requires fewer iterations.
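One way to explore the effect of rule order is to count passes programmatically. The sketch below is an illustration only: `count_passes` and the rule encoding are assumed names, the grammar is grammar (b) with rule (2) taken as S → Cf, and the final pass that detects "no changes" is included in the count.

```python
EPS = "ε"

RULES = [
    ("S'", ["S", "$"]), ("S", ["A", "B"]), ("S", ["C", "f"]),
    ("A", ["e", "f"]), ("A", []), ("B", ["h", "g"]),
    ("C", ["D", "D"]), ("C", ["f", "i"]), ("D", ["g"]),
]

def first_of_string(symbols, first):
    out = set()
    for sym in symbols:
        if sym not in first:             # terminal
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return out
    out.add(EPS)
    return out

def count_passes(rules, order):
    """Return (passes, first_sets); passes includes the final no-change pass."""
    first = {lhs: set() for lhs, _ in rules}
    passes = 0
    while True:
        passes += 1
        changed = False
        for index in order:              # examine the rules in the given order
            lhs, rhs = rules[index]
            new = first_of_string(rhs, first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
        if not changed:
            return passes, first

forward_passes, forward_sets = count_passes(RULES, range(9))
reverse_passes, reverse_sets = count_passes(RULES, range(8, -1, -1))
print(forward_passes, reverse_passes)   # 5 passes forward, 2 in reverse order
```

In reverse order every rule is examined after the rules it depends on, so a single productive pass suffices for this grammar.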
Finding Follow Sets for Non-Terminals
• If the grammar contains the rule S → Aa, then a is in the Follow set of A, since a appears immediately after A.
• If the grammar contains the rules:
S → AB
B → a | b
then both a and b are in the Follow set of A. Why? Consider the following two partial derivations:
S ⇒ AB ⇒ Aa
S ⇒ AB ⇒ Ab
So both a and b are in the Follow set of A.
• If the grammar contains the rules:
S → ABC
B → a | b | ε
C → c | d
then a, b, c and d are all in the Follow set of A. Why? Consider:
S ⇒ ABC ⇒ AaC
S ⇒ ABC ⇒ AbC
S ⇒ ABC ⇒ AC ⇒ Ac
S ⇒ ABC ⇒ AC ⇒ Ad
Contd.
• Consider the grammar:
S → Ax
A → C
then x is in the Follow sets of A and C. Why?
S ⇒ Ax ⇒ Cx
• Consider another grammar:
S → Ax
A → CD
D → ε
then x is in the Follow sets of A, C and D. Why?
S ⇒ Ax ⇒ CDx ⇒ Cx
• The above examples lead us to the following method for finding the Follow sets for the non-terminals in a CFG.

Finding First and Follow Sets
• We can calculate the Follow sets from the First sets by using the iterative algorithm given below:
Algorithm to calculate Follow sets for all Non-Terminals in CFG G
1. Calculate First(A) for all non-terminals A in G
2. Set Follow(A) = { } for all non-terminals A in G
3. For each rule A → α in G (where A is a non-terminal and α is a string of terminals and non-terminals), and for each non-terminal A1 in α:
i. If the rule is of the form A → βA1γ, where β and γ are (possibly empty) strings of terminals and non-terminals, and First(γ) does not contain ε, then add all elements of First(γ) to Follow(A1)
ii. If the rule is of the form A → βA1γ, where β and γ are (possibly empty) strings of terminals and non-terminals, and First(γ) does contain ε, then add all elements of First(γ) except ε to Follow(A1) and add all elements of Follow(A) to Follow(A1)
4. If any changes were made in step 3, go back to step 3 and repeat
Example

Consider the following CFG, for which we calculate the First sets for all non-terminals:

Terminals = { a, b, c, d }
Non-Terminals = { S', S, T, U, V }
Rules = (0) S' → S$
(1) S → TU
(2) T → aVa
(3) T → ε
(4) U → bVT
(5) V → Ub
(6) V → d
Start Symbol = S'

The First sets of the non-terminals are:

Non-Terminal   First Set
S'             {a, b}
S              {a, b}
T              {a, ε}
U              {b}
V              {b, d}
Contd.
Initially, for each non-terminal A in G we set Follow(A) = { }, the empty set.
(0) S' → S$   Add {$} to Follow(S) = {$}
(1) S → TU    Add First(U), {b}, to Follow(T) = {b}
              Add Follow(S), {$}, to Follow(U) = {$}
(2) T → aVa   Add {a} to Follow(V) = {a}
(3) T → ε     (no change)
(4) U → bVT   Add First(T), {a}, to Follow(V) = {a}
              Add Follow(U), {$}, to Follow(T) = {b, $}
              Add Follow(U), {$}, to Follow(V) = {a, $} (for T → ε)
(5) V → Ub    Add {b} to Follow(U) = {b, $}
(6) V → d     (no change)

Non-Terminal   Follow Set
S'             { }
S              {$}
T              {b, $}
U              {b, $}
V              {a, $}

The Follow sets of the non-terminals are:
S' = { }; S = {$}; T = {b, $}; U = {b, $}; and V = {a, $}
Note: Since there were some changes, we need another iteration.
Contd.
(0) S' → S$   Add $ to Follow(S) = {$} (no change)
(1) S → TU    Add First(U), {b}, to Follow(T) = {b, $} (no change)
              Add Follow(S), {$}, to Follow(U) = {b, $} (no change)
(2) T → aVa   Add {a} to Follow(V) = {a, $} (no change)
(3) T → ε     (no change)
(4) U → bVT   Add First(T), {a}, to Follow(V) = {a, $} (no change)
              Add Follow(U), {b, $}, to Follow(T) = {b, $} (no change)
              Add Follow(U), {b, $}, to Follow(V) = {a, b, $} (for T → ε)
(5) V → Ub    Add b to Follow(U) = {b, $} (no change)
(6) V → d     (no change)

Non-Terminal   Follow Set
S'             { }
S              {$}
T              {b, $}
U              {b, $}
V              {a, b, $}

The Follow sets of the non-terminals are:
S' = { }; S = {$}; T = {b, $}; U = {b, $}; and V = {a, b, $}
Note: Since there was 1 change, we need another iteration.

Contd.
(0) S' → S$   Add $ to Follow(S) = {$} (no change)
(1) S → TU    Add First(U), {b}, to Follow(T) = {b, $} (no change)
              Add Follow(S), {$}, to Follow(U) = {b, $} (no change)
(2) T → aVa   Add a to Follow(V) = {a, b, $} (no change)
(3) T → ε     (no change)
(4) U → bVT   Add First(T), {a}, to Follow(V) = {a, b, $} (no change)
              Add Follow(U), {b, $}, to Follow(T) = {b, $} (no change)
              Add Follow(U), {b, $}, to Follow(V) = {a, b, $} (no change)
(5) V → Ub    Add b to Follow(U) = {b, $} (no change)
(6) V → d     (no change)

Non-Terminal   Follow Set
S'             { }
S              {$}
T              {b, $}
U              {b, $}
V              {a, b, $}

The Follow sets of the non-terminals are:
S' = { }; S = {$}; T = {b, $}; U = {b, $}; and V = {a, b, $}
Note: Since there were no changes, we stop.
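The Follow-set algorithm and the worked example can be cross-checked with a short Python sketch. The rule encoding, the names `compute_follow` and `first_of_string`, and the treatment of `$` as an ordinary terminal are assumptions made for this illustration; the First sets are copied from the example.

```python
EPS = "ε"

# Example grammar, rules (0) to (6), and its First sets.
RULES = [
    ("S'", ["S", "$"]),
    ("S", ["T", "U"]),
    ("T", ["a", "V", "a"]),
    ("T", []),                  # T -> epsilon
    ("U", ["b", "V", "T"]),
    ("V", ["U", "b"]),
    ("V", ["d"]),
]
FIRST = {"S'": {"a", "b"}, "S": {"a", "b"}, "T": {"a", EPS},
         "U": {"b"}, "V": {"b", "d"}}

def first_of_string(symbols, first):
    out = set()
    for sym in symbols:
        if sym not in first:             # terminal
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return out
    out.add(EPS)                         # empty or fully nullable string
    return out

def compute_follow(rules, first):
    follow = {lhs: set() for lhs, _ in rules}        # step 2
    changed = True
    while changed:                                   # repeat until stable
        changed = False
        for lhs, rhs in rules:                       # step 3
            for i, sym in enumerate(rhs):
                if sym not in follow:                # terminals have no Follow set
                    continue
                tail = first_of_string(rhs[i + 1:], first)
                new = tail - {EPS}
                if EPS in tail:                      # case ii: the tail can vanish
                    new |= follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow

for nonterminal, symbols in compute_follow(RULES, FIRST).items():
    print(nonterminal, sorted(symbols))
```

An empty tail has First set {ε}, so a non-terminal at the end of a rule falls under case ii automatically.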
LL(1) Parse Tables
• Once we have First and Follow sets for all non-terminals in the grammar, we can create a parse table.
• A parse table is a blueprint for the creation of a recursive descent parser (RDP).
o The rows in the parse table are labeled with non-terminals and the columns are labeled with terminals.
o Each entry in the parse table is either empty or contains a grammar rule.
• The rule located at row S, column a of a parse table tells us which rule to apply when we are trying to parse the non-terminal S and the next symbol in the input is an a.
• For instance, the parse table for the grammar in slide no. 5 (a), with columns id, num, while, print, >, {, }, ;, (, ), is:

Row   Column   Entry
S     while    S → while (B) S
S     print    S → print(E);
S     {        S → { L }
E     id       E → id
E     num      E → num
B     id       B → E > E
B     num      B → E > E
L     while    L → SL
L     print    L → SL
L     {        L → SL
L     }        L → ε
All other entries are empty.
Contd.
• Once we have the parse table for a CFG, creating a recursive descent parser is easy.
o We need to write a function for each non-terminal S in the grammar.
o The row labeled S in the parse table tells us exactly what the function that parses S needs to do.
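As a rough sketch of what "one function per non-terminal" can look like, here is a hand-written recognizer for grammar (a): S → print(E); | while (B) S | { L }, E → id | num, B → E > E, L → SL | ε. The class name `Parser`, the helper `accepts`, and the convention that the input is already a list of token strings are all assumptions made for illustration.

```python
class ParseError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # $ marks end of input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def expect(self, tok):
        if self.peek() != tok:
            raise ParseError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def parse_S(self):
        tok = self.peek()
        if tok == "print":                   # S -> print ( E ) ;
            self.expect("print")
            self.expect("(")
            self.parse_E()
            self.expect(")")
            self.expect(";")
        elif tok == "while":                 # S -> while ( B ) S
            self.expect("while")
            self.expect("(")
            self.parse_B()
            self.expect(")")
            self.parse_S()
        elif tok == "{":                     # S -> { L }
            self.expect("{")
            self.parse_L()
            self.expect("}")
        else:
            raise ParseError(f"unexpected {tok!r} while parsing S")

    def parse_E(self):
        if self.peek() in ("id", "num"):     # E -> id | num
            self.pos += 1
        else:
            raise ParseError(f"unexpected {self.peek()!r} while parsing E")

    def parse_B(self):                       # B -> E > E
        self.parse_E()
        self.expect(">")
        self.parse_E()

    def parse_L(self):
        if self.peek() in ("print", "while", "{"):   # First(SL): L -> S L
            self.parse_S()
            self.parse_L()
        elif self.peek() == "}":                     # Follow(L): L -> epsilon
            return
        else:
            raise ParseError(f"unexpected {self.peek()!r} while parsing L")

def accepts(tokens):
    parser = Parser(tokens)
    try:
        parser.parse_S()
        parser.expect("$")
        return True
    except ParseError:
        return False

print(accepts("while ( id > num ) { print ( id ) ; }".split()))
```

Each `if` branch corresponds to one non-empty entry in the row of the parse table for that non-terminal, and the `else` branch corresponds to the empty entries.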
Creating Parse Tables
A parse table is created as follows:
1. The rows of the parse table are labeled with the non-terminals of the grammar.
2. The columns of the parse table are labeled with the terminals of the grammar.
3. Each entry of the parse table is either empty or contains a grammar rule.
o Place each rule of the form S → γ in row S, in each column in First(γ).
o If First(γ) contains ε, also place the rule in row S, in each column in Follow(S).
Example

Consider again the CFG in slide no. 7 (b). We have the First and Follow sets of each non-terminal:

Non-terminal   First           Follow
S'             {e, f, g, h}    { }
S              {e, f, g, h}    {$}
A              {e, ε}          {h}
B              {h}             {$}
C              {f, g}          {f}
D              {g}             {f, g}

(0) S' → S$   First(S$) = {e, f, g, h}. S' → S$ goes in row S', columns e, f, g, h
(1) S → AB    First(AB) = {e, h}. S → AB goes in row S, columns e, h
(2) S → Cf    First(Cf) = {f, g}. S → Cf goes in row S, columns f, g
(3) A → ef    First(ef) = {e}. A → ef goes in row A, column e
(4) A → ε     First(ε) = {ε} and Follow(A) = {h}. A → ε goes in row A, column h
(5) B → hg    First(hg) = {h}. B → hg goes in row B, column h
(6) C → DD    First(DD) = {g}. C → DD goes in row C, column g
(7) C → fi    First(fi) = {f}. C → fi goes in row C, column f
(8) D → g     First(g) = {g}. D → g goes in row D, column g
The resulting parse table is shown on the next slide.
Contd.

      e          f          g          h          i
S'    S' → S$    S' → S$    S' → S$    S' → S$
S     S → AB     S → Cf     S → Cf     S → AB
A     A → ef                           A → ε
B                                      B → hg
C                C → fi     C → DD
D                           D → g

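The table-filling rule can also be written as a small Python sketch and checked against the table for grammar (b). The dictionary keyed by `(row, column)` pairs, the function name `build_table`, and the choice to keep a list of rules per entry (so that duplicate entries become visible) are assumptions made for illustration; the First and Follow sets are copied from the example, with rule (2) taken as S → Cf.

```python
EPS = "ε"

RULES = [
    ("S'", ["S", "$"]), ("S", ["A", "B"]), ("S", ["C", "f"]),
    ("A", ["e", "f"]), ("A", []), ("B", ["h", "g"]),
    ("C", ["D", "D"]), ("C", ["f", "i"]), ("D", ["g"]),
]
FIRST = {"S'": {"e", "f", "g", "h"}, "S": {"e", "f", "g", "h"},
         "A": {"e", EPS}, "B": {"h"}, "C": {"f", "g"}, "D": {"g"}}
FOLLOW = {"S'": set(), "S": {"$"}, "A": {"h"},
          "B": {"$"}, "C": {"f"}, "D": {"f", "g"}}

def first_of_string(symbols, first):
    out = set()
    for sym in symbols:
        if sym not in first:             # terminal
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return out
    out.add(EPS)
    return out

def build_table(rules, first, follow):
    """Map (non-terminal, terminal) to the list of right-hand sides placed there."""
    table = {}
    for lhs, rhs in rules:
        firsts = first_of_string(rhs, first)
        columns = firsts - {EPS}
        if EPS in firsts:                # rule can derive epsilon: use Follow(lhs)
            columns |= follow[lhs]
        for column in columns:
            table.setdefault((lhs, column), []).append(rhs)
    return table

table = build_table(RULES, FIRST, FOLLOW)
is_ll1 = all(len(entries) == 1 for entries in table.values())
print(is_ll1, table[("A", "h")])
```

A grammar is LL(1) exactly when no entry ends up holding more than one rule, which is what `is_ll1` checks.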
Example 2
Given the following CFG, create its LL(1) parse table:

Terminals = { id, num, (, ), ;, if, else, ",", $ }
Non-Terminals = { S', S, L, C, E }
Rules = (0) S' → S$
(1) S → id(L);
(2) S → if(E) S else S
(3) L → ε
(4) L → E C
(5) C → ε
(6) C → , E C
(7) E → id
(8) E → num
Start Symbol = S'

Find the First and Follow sets of each non-terminal:

Non-terminal   First            Follow
S'             {id, if}         { }
S              {id, if}         {$, else}
L              {id, num, ε}     { ) }
C              { ",", ε }       { ) }
E              {id, num}        { ), "," }
Contd.
Given the above First and Follow sets, the parse table for the CFG (columns id, num, (, ), ;, if, else, ",") is created as follows:

Row   Column   Entry
S'    id       S' → S$
S'    if       S' → S$
S     id       S → id(L);
S     if       S → if(E) S else S
L     id       L → EC
L     num      L → EC
L     )        L → ε
C     )        C → ε
C     ,        C → ,EC
E     id       E → id
E     num      E → num
All other entries are empty.

Note that we only need to compute Follow sets for an LL(1) parser if at least one First set contains ε.
o Follow sets are only used in creating the parse table for rules of the form S → γ, where First(γ) contains ε.
o Follow sets are not necessary if no such rule exists. However, if there exists at least one such rule, then we still need to create the Follow sets of all non-terminals in the grammar.
LL (1) Parser…
Exercise 2:
Let G be the following grammar:
S → [ SX ] | a
X → ε | +SY | Yb
Y → ε | -SXc
A – Find FIRST and FOLLOW sets for the non-terminals in this
grammar.
B – Construct predictive parsing table for the grammar above.
C – Show a top down parse of the string [a+a-ac]
Grammars That Are Not LL(1)
• If we can build an LL(1) parse table for a grammar that has no duplicate entries, then we say that grammar is LL(1).
o Unfortunately, not all grammars are LL(1). For instance, the following grammar is not an LL(1) grammar:
Terminals = { id, +, -, *, /, % }
Non-Terminals = { E }
Rules = (0) E → id
(1) E → E + E | E - E
(3) E → E * E | E / E | E % E
Start Symbol = E

Parse table for the grammar:
Row E, column id: E → id, E → E+E, E → E-E, E → E*E, E → E/E, E → E%E
Row E, columns +, -, *, /, %: empty

• The parse table includes only one non-terminal, E, but it has 6 entries in the id column.
o Because of these duplicate entries we cannot create an LL(1) parser for this grammar. The grammar is also ambiguous.
Ambiguity
• A grammar that produces more than one parse tree for some sentence is called an ambiguous grammar. Equivalently, it
o produces more than one leftmost derivation, or
o more than one rightmost derivation for the same sentence.
• We should eliminate the ambiguity in the grammar during the design phase of the compiler.
• An unambiguous grammar should be written to eliminate the ambiguity.
Removing Ambiguity
There are four ways in which ambiguity can creep into a CFG for a programming language:
1. Defining expressions: the straightforward definition of expressions will often lead to ambiguity, such as the one that we have seen in slide # 31 above.
2. Defining complex variables: complex variables, such as instance variables in classes, fields in records or structures, array subscripts and pointer references, can also lead to ambiguity. Example: V → id | V.V
3. Overlap between specific and general cases. For example, in the CFG
Terminals = { id, +, -, *, /, % }
Non-Terminals = { E, T, F }
Rules = (0) E → E+T | E-T | T | id
(1) T → T*F | T/F | T%F | F | id
the terminal id has several leftmost derivations (and hence several parse trees):
E ⇒ id, E ⇒ T ⇒ id, E ⇒ T ⇒ F ⇒ id
Contd.
4. Nesting statements: the most common instance of nesting statements causing ambiguity is the infamous "dangling else", whose CFG is shown below:
S → if e then S else S | if e then S
S → a | b
Under this CFG a sentence such as "if e then if e then a else b" has two parse trees, because the else can attach to either if. (Exercise: draw the parse trees.)
• It is not always possible to remove ambiguity from a context-free grammar.
o There are some languages that are inherently ambiguous. That is, there exists a language L such that all CFGs that generate L are ambiguous.
o Inherent ambiguity is not a problem that compiler designers usually need to face, i.e. no major programming language is inherently ambiguous.
o There is no algorithm that will always remove ambiguity from a context-free grammar.
Left recursion
• A grammar is left recursive if it has a non-terminal A such that there is a derivation A ⇒+ Aα for some string α.
• Top-down parsing methods cannot handle left-recursive grammars,
• so a transformation that eliminates left recursion is needed.
• To eliminate immediate left recursion, the productions
A → Aα | β
can be replaced by the non-left-recursive productions
A → βA'
A' → αA' | ε
Removing Left Recursion
• An unambiguous grammar may still not be LL(1). Consider the unambiguous expression grammar below:
Terminals = { id, +, -, *, /, % }
Non-Terminals = { E, T, F }
Rules = (1) E → E+T
(2) E → E - T
(3) E → T
(4) T → T*F
(5) T → T/F
(6) T → F
(7) F → (E)
(8) F → id
Start Symbol = E
• Though this CFG is unambiguous, it is not LL(1). In order for a CFG to be LL(1), it must be possible to decide which rule to apply after looking at only the leftmost symbol of a string.
• On seeing an id, we cannot tell if we should apply rule (1), (2), or (3).
• The problem with this CFG is that rules (1) and (2) are left-recursive.
• A rule S → α (where S is a non-terminal and α is a string of terminals and non-terminals) is left-recursive if the first symbol of α is S.
• No left-recursive grammar is LL(1).
Contd.
Consider the following CFG fragment:
(1) S → Sα
(2) S → β
What strings can be derived from S? Consider the following partial derivation:
S ⇒ Sα ⇒ Sαα ⇒ Sααα ⇒ βααα
Any string that can be derived from S will be a string that can be derived from β, followed by zero or more strings that can be derived from α. Using EBNF notation, we have:
S → β(α)*
Using CFG notation, we have:
S → βA
A → αA
A → ε
We have removed the left recursion in the above example!
Contd.
In general, the set of rules of the form:
S → Sα1; S → Sα2; S → Sα3; ..... ; S → Sαn
S → β1; S → β2; S → β3; ..... ; S → βm
can be rewritten as:
S → BA
B → β1 | β2 | β3 | ..... | βm
A → α1A | α2A | α3A | ..... | αnA | ε

Let's take a closer look at the expression grammar:
E → E+T
E → E-T
E → T
Using the above transformation, we get the following CFG, which has no left recursion:
E → TE'
E' → +TE'
E' → -TE'
E' → ε
Using EBNF notation, we have:
E → T((+T) | (-T))*
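The transformation for immediate left recursion can be sketched in Python as follows. This is an illustration under assumptions: right-hand sides are lists of symbols, the empty list stands for ε, the helper non-terminal is named by appending `'` to the original name, and the β alternatives are attached directly to the original non-terminal (as in the E/E' example) rather than through a separate non-terminal B.

```python
def remove_immediate_left_recursion(nt, alternatives):
    """Rewrite nt -> nt a1 | ... | nt an | b1 | ... | bm without left recursion.

    Returns a dict mapping each non-terminal to its list of right-hand sides.
    """
    # alpha parts: what follows nt in the left-recursive alternatives
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    # beta parts: alternatives that do not start with nt
    betas = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not alphas:                       # nothing to do
        return {nt: alternatives}
    new_nt = nt + "'"                    # assumed naming scheme for the helper
    return {
        nt: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],   # [] is epsilon
    }

expression_rules = [["E", "+", "T"], ["E", "-", "T"], ["T"]]
print(remove_immediate_left_recursion("E", expression_rules))
```

The sketch only handles immediate left recursion in a single non-terminal; indirect left recursion (A ⇒ Bα ⇒ Aβα) would need an additional substitution step that is not shown here.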
Left Factoring
• Even if a CFG is unambiguous and has no left recursion, it still may not be LL(1).
• When a non-terminal has two or more productions whose right-hand sides start with the same grammar symbols (a common prefix), the grammar is not LL(1) and cannot be used for predictive parsing.
• A predictive parser (a top-down parser without backtracking) insists that the grammar be left-factored.
In general: A → αβ1 | αβ2, where α is non-empty and the first symbols of β1 and β2 are different.

Contd.
• When processing α we do not know whether to expand A to αβ1 or to αβ2. But if we rewrite the grammar as follows:
A → αA'
A' → β1 | β2
then we can immediately expand A to αA'.
• Example: given the following grammar:
S → iEtS | iEtSeS | a
E → b
• Left factored, this grammar becomes:
S → iEtSS' | a
S' → eS | ε
E → b

Contd.
The following grammar:
stmt → if expr then stmt else stmt
     | if expr then stmt
cannot be parsed by a predictive parser that looks one element ahead.
But the grammar can be rewritten:
stmt → if expr then stmt stmt'
stmt' → else stmt | ε
where ε is the empty string.
Rewriting a grammar to eliminate multiple productions starting with the same token is called left factoring.
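A single round of left factoring can be sketched in Python as below. The function names `left_factor` and `common_prefix`, the list-of-symbols encoding, and the naming of new non-terminals by appending one or more `'` marks are assumptions for illustration; the sketch factors each group of alternatives that share a first symbol once and does not repeat the process on the new rules.

```python
def common_prefix(sequences):
    """Longest prefix shared by all the given symbol lists."""
    prefix = []
    for column in zip(*sequences):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix

def left_factor(nt, alternatives):
    """Factor out the common prefix of alternatives that start with the same symbol."""
    groups = {}
    for alt in alternatives:
        key = alt[0] if alt else None        # group by first symbol
        groups.setdefault(key, []).append(alt)
    rules = {nt: []}
    primes = 0
    for key, group in groups.items():
        if key is None or len(group) == 1:   # nothing to factor
            rules[nt].extend(group)
            continue
        primes += 1
        new_nt = nt + "'" * primes           # assumed naming scheme
        prefix = common_prefix(group)
        rules[nt].append(prefix + [new_nt])
        rules[new_nt] = [alt[len(prefix):] for alt in group]   # [] is epsilon
    return rules

# S -> iEtS | iEtSeS | a
print(left_factor("S", [["i", "E", "t", "S"],
                        ["i", "E", "t", "S", "e", "S"],
                        ["a"]]))
```

For the example grammar this yields S → iEtSS' | a and S' → ε | eS, matching the left-factored grammar shown earlier up to the order of alternatives.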
LL(k) Parsers
• An LL(1) parser needs to decide which rule to apply after looking at only one token.
• If more than a single token is required to determine which rule to apply, then the grammar is not LL(1).
For instance, consider the following simple CFG:
Terminals = { a, b, c }
Non-terminals = { S }
Rules = (1) S → abc
(2) S → acb
Start symbol = S
Using EBNF notation, we get: S → a(bc | cb)
• This grammar is not LL(1), since the LL(1) parse table has duplicate entries:

      a                      b    c
S     S → abc; S → acb
Contd.
• When trying to parse a string derived from S, we cannot tell which rule to apply by looking at a single symbol, since all strings derivable from S start with a.
• We could left-factor the grammar to make it LL(1).
• We could also modify our parser so that it examines the first two elements in the string to determine which rule to apply.
• The resulting parse table would be much larger, as follows:

      aa    ab         ac         ba    bb    bc    ca    cb    cc
S           S → abc    S → acb

Contd.
• An LL(k) parser examines the first k symbols in the input before determining which rule to apply.
• In order to create an LL(k) parser, we need to generalize the definitions of First and Follow sets.
• Our definitions of generalized First and Follow sets will use the concept of k-prefix.
Definition 1 (k-prefix): The k-prefix of a string of terminals w is the string consisting of the first k terminals in w. If |w| ≤ k, then the k-prefix of w is w.
• The k-prefix of a set of strings is the set of k-prefixes of all strings in the set.
Contd.
Definition 2 (First_k): The First_k set of a non-terminal S is the k-prefix of the set of all strings of terminals derivable from S.
The First_k set of a string γ of terminals and non-terminals is the k-prefix of the set of all strings of terminals derivable from γ.
Definition 3 (Follow_k): The Follow_k set of a non-terminal S is the k-prefix of the set of all strings of terminals that follow S in a partial derivation.

Algorithm to calculate First_k for non-terminals
1. For each non-terminal S in G, set First_k(S) = { }
2. For each rule S → γ in G, add all elements of k-prefix(First_k(γ)) to First_k(S)
3. If any changes were made in step 2, go back to step 2 and repeat

Algorithm to calculate First_k for a string of terminals and non-terminals:
1. For any terminal a, First_k(a) = {a}
2. For any string of terminals and non-terminals γ = γ1γ2γ3…γn, First_k(γ) = k-prefix(First_k(γ1) ○ First_k(γ2) ○ First_k(γ3) ○ … ○ First_k(γn)), where ○ denotes concatenation of sets of strings
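The two First_k algorithms can be combined into a short Python sketch. Here strings of terminals are represented as tuples, the empty tuple stands for ε, and the names `concat_k` and `first_k` are assumptions made for illustration. The grammar used is the small S → abc | acb example from the LL(k) discussion.

```python
def concat_k(sets, k):
    """k-prefix of the concatenation of several sets of terminal tuples."""
    result = {()}                        # the set containing only the empty string
    for strings in sets:
        # truncating after every step is safe: only the first k symbols matter
        result = {(x + y)[:k] for x in result for y in strings}
    return result

def first_k(rules, k):
    """Fixed-point computation of First_k for every non-terminal."""
    nonterminals = {lhs for lhs, _ in rules}
    first = {nt: set() for nt in nonterminals}           # step 1
    changed = True
    while changed:                                       # step 3
        changed = False
        for lhs, rhs in rules:                           # step 2
            parts = [first[s] if s in nonterminals else {(s,)} for s in rhs]
            new = concat_k(parts, k)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first

RULES = [("S", ["a", "b", "c"]), ("S", ["a", "c", "b"])]
print(first_k(RULES, 1)["S"])   # a single entry, so one symbol cannot choose a rule
print(first_k(RULES, 2)["S"])   # two distinct 2-prefixes, one per rule
```

With k = 1 both rules share the prefix a, while with k = 2 the prefixes ab and ac separate them, which mirrors the two-symbol parse table shown earlier.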
Algorithm to calculate Follow_k for non-terminals
1. Calculate First_k(S) for all non-terminals S in G
2. Set Follow_k(S) = { } for all non-terminals S in G
3. For each rule S → γ in G:
For each non-terminal S1 in γ, where γ = αS1β, add [k-prefix(First_k(β) ○ Follow_k(S))] to Follow_k(S1). If Follow_k(S) = { }, add [k-prefix(First_k(β))] to Follow_k(S1).
4. If any changes were made in step 3, go back to step 3 and repeat
End of slide!!
