Chapter 3 B Top-Down Parsing
Objective
At the end of this session students will be able to:
Understand the basics of parsing techniques (top-down vs. bottom-up parsing).
Understand recursive descent parsers: First and Follow sets and how to find them.
Parsing and Parsers
Once we have described the syntax of our programming language using a
context-free grammar, the next step is to determine if a string of tokens
returned from the lexical analyzer could be derived from that context-free
grammar
Determining if a sequence of tokens is syntactically correct is called parsing
A. Top-down parsing: Start at the root of the parse tree and grow toward the
leaves
o Pick a production rule and try to match the input
B. Bottom-up parsing: Start at the leaves and grow toward the root
Recursive Descent Parsers
Top-down parsers are usually implemented as a suite of mutually recursive functions
that descend through a parse tree for the string, and as such are called “recursive
descent parsers” (RDP).
Recursive descent parsers fall into a class of parsers known as LL(k) parsers
LL(k) stands for Left-to-right, Leftmost-derivation, k-symbol lookahead parsers
We first examine LL(1) parsers – LL parsers with one symbol lookahead
To build the RDP, we first need to create the “First” and “Follow” sets of the
non-terminals in the CFG.
Example: (a) a CFG and (b) a CFG in its RDP form
(a) Terminals = { id, num, while, print, >, {, }, ;, (, ) }
Non-Terminals = { S, E, B, L }
Rules = …
(3) S → { L }
(4) E → id
(5) E → num
(6) B → E > E
…
Start Symbol = S
(b) Terminals = { e, f, g, h, i }
Non-Terminals = { S', S, A, B, C, D }
Rules = (0) S' → S$
(1) S → AB
(2) S → C
(3) A → ef
(4) A → ε
(5) B → hg
(6) C → DD
(7) C → fi
(8) D → g
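As an illustration of "one function per non-terminal", here is a minimal sketch of a recursive descent recognizer for the e/f/g/h/i grammar. It assumes the rule set S' → S$, S → AB | C, A → ef | ε, B → hg, C → DD | fi, D → g (the rules S → C and D → g are taken from the later slides), and that every token is a single character. The names `Parser`, `ParseError` and `accepts` are made up for this sketch.

```python
class ParseError(Exception):
    """Raised when the input cannot be derived from the grammar."""


class Parser:
    def __init__(self, text):
        # One character per token; '$' is appended as the end-of-file marker.
        self.tokens = list(text) + ["$"]
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.peek() != expected:
            raise ParseError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1

    def parse_s_prime(self):            # (0) S' -> S $
        self.parse_s()
        self.match("$")

    def parse_s(self):
        if self.peek() in ("e", "h"):   # (1) S -> A B
            self.parse_a()
            self.parse_b()
        elif self.peek() in ("f", "g"):  # (2) S -> C
            self.parse_c()
        else:
            raise ParseError(f"unexpected {self.peek()!r} at start of S")

    def parse_a(self):
        if self.peek() == "e":          # (3) A -> e f
            self.match("e")
            self.match("f")
        # (4) A -> ε: consume nothing

    def parse_b(self):                  # (5) B -> h g
        self.match("h")
        self.match("g")

    def parse_c(self):
        if self.peek() == "f":          # (7) C -> f i
            self.match("f")
            self.match("i")
        else:                           # (6) C -> D D
            self.parse_d()
            self.parse_d()

    def parse_d(self):                  # (8) D -> g
        self.match("g")


def accepts(text):
    """Return True if `text` (without the trailing '$') is derivable from S."""
    parser = Parser(text)
    try:
        parser.parse_s_prime()
    except ParseError:
        return False
    return parser.pos == len(parser.tokens)
```

Each function looks at one token of lookahead to pick a rule, which is exactly the decision that the First and Follow sets are used to justify.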
First Sets and Follow Sets
First Sets
The First set of a non-terminal A is the set of all terminals that can
begin a string derived from A.
If the empty string ε can be derived from A, then ε is also in the First
set of A.
Contd.
For instance, given the CFG below ($ is an end-of-file marker, ε means empty
string):
Terminals = { e, f, g, h, i }
Non-Terminals = { S', S, A, B, C, D }
Rules = (0) S' → S$
In this grammar, the set of all strings derivable from the non-terminal S'
(leaving off the end marker $) is {efhg, fi, gg, hg}.
Thus, First(S') = {e, f, g, h}, where e, f, g and h are the first terminals of
the strings in the above set.
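Because this grammar derives only finitely many strings, the claim can be checked by brute force. The sketch below enumerates every string derivable from S' and collects the first terminal of each. It assumes the full rule set S' → S$, S → AB | C, A → ef | ε, B → hg, C → DD | fi, D → g; the names `GRAMMAR` and `derive_all` are illustrative, and the approach would not terminate on a recursive grammar.

```python
from collections import deque

# Assumed rule set; an empty list stands for the ε right-hand side.
GRAMMAR = {
    "S'": [["S", "$"]],
    "S": [["A", "B"], ["C"]],
    "A": [["e", "f"], []],
    "B": [["h", "g"]],
    "C": [["D", "D"], ["f", "i"]],
    "D": [["g"]],
}


def derive_all(start):
    """Return every terminal string derivable from `start`.

    Only safe for grammars that derive finitely many strings.
    """
    results = set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Index of the leftmost non-terminal in the sentential form, if any.
        idx = next((i for i, sym in enumerate(form) if sym in GRAMMAR), None)
        if idx is None:
            results.add("".join(form))
            continue
        for rhs in GRAMMAR[form[idx]]:
            queue.append(form[:idx] + tuple(rhs) + form[idx + 1:])
    return results


STRINGS = derive_all("S'")
FIRST_OF_S_PRIME = {s[0] for s in STRINGS}
```

Here every derived string ends in the `$` marker, so none of them is empty and taking `s[0]` is safe.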
Follow Sets
For each non-terminal in a grammar, we can also create a Follow set.
The Follow set for a non-terminal A in a grammar is the set of all terminals
that could appear right after A in a valid sentence while deriving it.
Take another look at the CFG shown in slide no. 5 above, what terminals
marker.
o What about the non-terminal D? Consider the partial derivation: S’
Follow(S') = { }
Follow(S) = {$}
Follow(A) = {h}
Follow(B) = {$}
Follow(C) = {$}
Follow(D) = {g, $}
Finding First and Follow Sets
To calculate the First set of a non-terminal A, we need to calculate the First set
of a string of terminals and non-terminals, since the rules that define what a
non-terminal can derive contain terminals and non-terminals.
The First set of a string of terminals and non-terminals can be defined
Contd.
(0) S' → S$. Add e, f, g, h to First(S') (no change)
(1) S → AB. Add e, h to First(S) (no change)
(2) S → C. Add f, g to First(S) (no change)
(3) A → ef. Add e to First(A) (no change)
(4) A → ε. Add ε to First(A) (no change)
Since there were no changes, we stop.
Non-Terminals   First Set
S'              {e, f, g, h}
S               {e, f, g, h}
A               {e, ε}
Note that within each iteration we can examine the rules in any order.
If we examine the rules in a different order in each iteration, we will still achieve
the same result, but may take a different number of iterations.
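The iteration described above can be sketched as a fixed-point loop: keep applying every rule until a full pass adds nothing. The code below is one possible rendering, not the slides' own algorithm text. It assumes the rule set S' → S$, S → AB | C, A → ef | ε, B → hg, C → DD | fi, D → g, and the helper names `first_of_string` and `compute_first` are made up here.

```python
EPS = "ε"

# Assumed rule set; an empty list stands for the ε right-hand side.
RULES = [
    ("S'", ["S", "$"]),
    ("S", ["A", "B"]),
    ("S", ["C"]),
    ("A", ["e", "f"]),
    ("A", []),
    ("B", ["h", "g"]),
    ("C", ["D", "D"]),
    ("C", ["f", "i"]),
    ("D", ["g"]),
]
NONTERMINALS = {lhs for lhs, _ in RULES}


def first_of_string(symbols, first):
    """First set of a string of symbols, using the current estimates in `first`."""
    result = set()
    for sym in symbols:
        if sym not in NONTERMINALS:      # a terminal starts the rest of the string
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:        # sym cannot vanish (as far as we know yet)
            return result
    result.add(EPS)                      # every symbol can derive ε
    return result


def compute_first(rules):
    """Iterate over the rules until no First set changes; also count the passes."""
    first = {nt: set() for nt in NONTERMINALS}
    iterations = 0
    changed = True
    while changed:
        changed = False
        iterations += 1
        for lhs, rhs in rules:
            new = first_of_string(rhs, first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first, iterations


FIRST_FWD, PASSES_FWD = compute_first(RULES)
FIRST_REV, PASSES_REV = compute_first(list(reversed(RULES)))
```

Running the rules in reverse order reaches the same First sets, only with a different number of passes, which matches the note about rule order.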
A → C
then x is in the Follow sets of A and C. Why?
Consider another grammar:
S → Ax
A → CD
D → ε
then x is in the Follow sets of A, C and D. Why?
S ⇒ Ax ⇒ CDx ⇒ Cx
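The second example can be checked mechanically. The sketch below uses the usual fixed-point formulation of Follow sets (offered here as an assumption, not as the slides' exact method): for a rule X → αYβ, add First(β) minus ε to Follow(Y), and if β can derive ε, also add Follow(X). C has no rules in the example, so it is treated as a non-terminal with an empty First set. All function names are illustrative.

```python
EPS = "ε"
NONTERMINALS = {"S", "A", "C", "D"}
# S -> A x, A -> C D, D -> ε (empty list). C has no rules in the example.
RULES = [("S", ["A", "x"]), ("A", ["C", "D"]), ("D", [])]


def first_of_string(symbols, first):
    """First set of a string of symbols under the current estimates."""
    result = set()
    for sym in symbols:
        if sym not in NONTERMINALS:
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)
    return result


def compute_first(rules):
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            new = first_of_string(rhs, first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def compute_follow(rules, first):
    follow = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                rest = first_of_string(rhs[i + 1:], first)
                new = rest - {EPS}
                if EPS in rest:          # everything after sym can vanish
                    new |= follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


FIRST = compute_first(RULES)
FOLLOW = compute_follow(RULES, FIRST)
```

Since D can derive ε, whatever follows A also follows both C and D, which is why x ends up in all three Follow sets.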
The above examples lead us to the following method for finding the
Consider the following CFG, for which we calculate the First sets for all non-
3. Each entry of the parse table is either empty or contains a grammar rule.
o Place each rule of the form S → γ in row S in each column in First(γ),
Example
Consider again the CFG in slide no 7 (b). We have the First and Follow sets of
each non-terminal:
Non-terminal   First   Follow
(0) S' → S$   First(S') = {e, f, g, h}. S' → S$ goes in row S', columns e, f, g, h
(1) S → AB    First(AB) = {e, h}. S → AB goes in row S, columns e, h
      e         f         g         h         i
S     S → AB    S → C     S → C     S → AB
A     A → ef                        A → ε
B                                   B → hg
C               C → fi    C → DD
D                         D → g
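The table can also be filled in programmatically. The sketch below places each rule X → γ in row X under every terminal in First(γ), and under Follow(X) when γ can derive ε. It assumes the rule set S' → S$, S → AB | C, A → ef | ε, B → hg, C → DD | fi, D → g; the First sets of B, C and D are derived from those rules, and only Follow(A) is supplied because A → ε is the only rule that needs a Follow set. The names `build_table` and `TABLE` are illustrative.

```python
EPS = "ε"

# Assumed rule set; an empty list stands for the ε right-hand side.
RULES = [
    ("S'", ["S", "$"]),
    ("S", ["A", "B"]),
    ("S", ["C"]),
    ("A", ["e", "f"]),
    ("A", []),
    ("B", ["h", "g"]),
    ("C", ["D", "D"]),
    ("C", ["f", "i"]),
    ("D", ["g"]),
]
FIRST = {
    "S'": {"e", "f", "g", "h"},
    "S": {"e", "f", "g", "h"},
    "A": {"e", EPS},
    "B": {"h"},
    "C": {"f", "g"},
    "D": {"g"},
}
FOLLOW = {"A": {"h"}}  # only consulted for the rule A -> ε


def first_of_string(symbols):
    """First set of a right-hand side, using the precomputed FIRST sets."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})   # a terminal's First set is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table():
    """Map (non-terminal, lookahead terminal) to the right-hand side to apply."""
    table = {}
    for lhs, rhs in RULES:
        columns = first_of_string(rhs)
        if EPS in columns:
            columns = (columns - {EPS}) | FOLLOW[lhs]
        for terminal in columns:
            if (lhs, terminal) in table:
                raise ValueError(f"duplicate entry in row {lhs}, column {terminal}")
            table[(lhs, terminal)] = rhs
    return table


TABLE = build_table()
```

No `ValueError` is raised for this grammar, so every cell holds at most one rule, and column i stays empty in every row.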
Example 2
Given the following CFG, create the LL(1) parse table of the CFG:
LL(1).
When a non-terminal has two or more productions whose right-
hand sides start with the same grammar symbols (a common
prefix), the grammar is not LL(1) and cannot be used for
predictive parsing.
A predictive parser (a top-down parser without backtracking)
In general: A → αβ1 | αβ2, where α is non-empty and the first
LL(k) Parsers
LL(1) parser needs to decide which rule to apply after looking at only one
token
If more than one single token is required to determine which rule to apply,
then the grammar is not LL(1)
For instance, consider the following simple CFG:
Terminals = {a, b, c}
Non-terminals = {S}
Rules = (1) S → abc
(2) S → acb
Start symbol = S
Using EBNF notation, we get: S → a(bc|cb)
This grammar is not LL(1), since the LL(1) parse table has duplicate entries:
      a                    b     c
S     S → abc; S → acb
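The duplicate entry can be detected mechanically. In this small sketch every right-hand side starts with a terminal, so First(γ) is simply the first character of γ; that shortcut is an assumption that holds for this grammar only. The names `ll1_entries` and `conflicts` are made up for the example.

```python
from collections import defaultdict

RULES = [("S", "abc"), ("S", "acb")]


def ll1_entries(rules):
    """Collect, per (row, column) cell, every rule that would be placed there."""
    table = defaultdict(list)
    for lhs, rhs in rules:
        # Every right-hand side here begins with a terminal, so that terminal
        # is the only member of its First set.
        table[(lhs, rhs[0])].append(f"{lhs} → {rhs}")
    return dict(table)


def conflicts(table):
    """Return only the cells that hold more than one rule."""
    return {cell: entries for cell, entries in table.items() if len(entries) > 1}


TABLE = ll1_entries(RULES)
CONFLICTS = conflicts(TABLE)
```

A non-empty `CONFLICTS` mapping means the grammar is not LL(1) as written.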
Contd.
When trying to parse a string derived from S, we cannot tell
which rule to apply by looking at a single symbol, since all strings
derivable from S start with a.
We could left-factor the grammar to make it LL(1)
We also could modify our parser so that it examines the first two
elements in the string to determine which rule to apply
The resulting parse table would be much larger, as follows:
      aa   ab        ac        ba   bb   bc   ca   cb   cc
S          S → abc   S → acb
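Both remedies can be sketched in a few lines. The first part builds the two-symbol-lookahead table for row S, keyed on the first two terminals of each right-hand side. The second part checks a left-factored version of the grammar, S → aT, T → bc | cb, where the non-terminal name T is invented for this sketch. The conflict check assumes every right-hand side is non-empty and that no two alternatives share a first symbol unless they truly conflict, which holds for these grammars only.

```python
from itertools import product

TERMINALS = "abc"
S_RULES = ["abc", "acb"]          # right-hand sides of S


def ll2_row(rules):
    """Row S of the LL(2) table: one column per pair of terminals."""
    row = {"".join(pair): None for pair in product(TERMINALS, repeat=2)}
    for rhs in rules:
        key = rhs[:2]             # the 2-symbol prefix decides the rule
        if row[key] is not None:
            raise ValueError(f"duplicate entry in column {key}")
        row[key] = rhs
    return row


def has_ll1_conflict(grammar):
    """True if two alternatives of one non-terminal start with the same symbol."""
    for alternatives in grammar.values():
        first_symbols = [rhs[0] for rhs in alternatives]
        if len(set(first_symbols)) < len(first_symbols):
            return True
    return False


LL2_ROW = ll2_row(S_RULES)
ORIGINAL = {"S": ["abc", "acb"]}
LEFT_FACTORED = {"S": ["aT"], "T": ["bc", "cb"]}   # T is a hypothetical name
```

The LL(2) row has nine columns but only two filled cells, while left factoring keeps a one-symbol table at the cost of one extra non-terminal.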
Contd.
An LL(k) parser examines the first k symbols in the input before
determining which rule to apply
In order to create an LL(k) parser, we need to generalize the definitions of
First and Follow sets
Our definitions of generalized First and Follow sets will use the concept of
k-prefix
Definition 1 k-prefix: The k-prefix of a string of terminals w is a string
consisting of the first k terminals in w. If |w| ≤ k, then the k-prefix of w
is w.
The k-prefix of a set of strings is the set of k-prefixes of all strings in the
set.
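Definition 1 translates almost directly into code. The sketch below assumes each terminal is a single character, so a string of terminals is an ordinary Python string; the function names are illustrative.

```python
def k_prefix(w, k):
    """First k terminals of w; if w has at most k terminals, w itself."""
    return w[:k]


def k_prefix_set(strings, k):
    """k-prefix of a set of strings: the set of k-prefixes of its members."""
    return {k_prefix(w, k) for w in strings}


# Two strings with the same k-prefix collapse into one element of the result.
EXAMPLE = k_prefix_set({"abc", "acb", "a"}, 1)
```

Slicing already handles the short-string case, since `w[:k]` returns all of `w` when `w` has fewer than k characters.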
Contd.
Definition 2 Firstk: The Firstk set of a non-terminal S is the k-prefix of the set
The Firstk set of a string of terminals and non-terminal γ is the k-prefix of all
Algorithm to calculate Firstk for non-terminals
1. For each non-terminal S in G, set Firstk(S) = { }
Firstk(S)
3. If any changes were made in step 2, go back to step 2 and repeat
Algorithm to calculate Firstk for a string of terminals and non-terminals:
Algorithm to calculate Followk for non-terminals
1. Calculate Firstk(S) for all non-terminals S in G