Parsing
In top-down parsing, the parse tree is constructed
– from the top (the root),
– from left to right.
Top-down parser
Recursive-Descent Parsing
– Backtracking is needed (if a choice of a production rule does not work, we backtrack to try other alternatives).
– It is a general parsing technique, but not widely used.
– It is not efficient.
Predictive Parsing
– No backtracking.
– Efficient.
– Needs a special form of grammars (LL(1) grammars).
– Non-Recursive (Table-Driven) Predictive Parser is also known as LL(1) parser.
– Recursive Predictive Parsing is a special form of Recursive-Descent parsing without backtracking.
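Backtracking is needed.
It tries to find the left-most derivation.
S → aBc
B → bc | b
Input: abc
– First try, B → bc: the parser matches a, then b and c for B, but the final c of S → aBc finds no input left. It fails and backtracks.
– Second try, B → b: the parser matches a, then b for B, then c. The whole input is consumed, so parsing succeeds.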
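The backtracking search above can be sketched in C, the language of the procedures later in these notes. This is a minimal sketch, assuming the input is a string of one-character terminals; the names `parse_S`, `parse_B` and `match` are hypothetical. Input positions are plain integers, so backtracking is simply retrying from a saved position.

```c
#include <string.h>

static const char *input;   /* string being parsed */
static int len;             /* its length */

/* Match terminal t at position pos: return the next position, or -1.
   A failed position (-1) stays -1, so calls can be chained. */
static int match(int pos, char t)
{
    return (pos >= 0 && pos < len && input[pos] == t) ? pos + 1 : -1;
}

/* B → bc | b ; alt selects which alternative to try. */
static int parse_B(int pos, int alt)
{
    if (alt == 0)
        return match(match(pos, 'b'), 'c');   /* B → bc */
    return match(pos, 'b');                   /* B → b  */
}

/* S → aBc with backtracking over the alternatives of B.
   Returns 1 if the whole string is derived from S. */
int parse_S(const char *s)
{
    input = s;
    len = (int)strlen(s);
    int p = match(0, 'a');
    if (p < 0)
        return 0;
    for (int alt = 0; alt < 2; alt++) {
        int q = match(parse_B(p, alt), 'c');  /* B, then the final c */
        if (q == len)
            return 1;                         /* all input consumed */
        /* failed: backtrack to position p and try the next alternative */
    }
    return 0;
}
```

For the input abc, `parse_S("abc")` fails with B → bc and succeeds on the retry with B → b. The retry happens whenever the rest of S fails, even if the earlier alternative of B matched on its own.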
Consider the following grammar:
S → aAb
A → c |cd
Let the input string be acdb.
Consider the following grammar:
S → BA | AB
A → a | SA
B → b | SB
w = abab
Unlike recursive descent with backtracking, a predictive parser can “predict” which production to use
– by looking at the next few tokens,
– with no backtracking.
stmt → if ...... |
        while ...... |
        begin ...... |
        for .....
When we are trying to expand the non-terminal stmt, if the current token is if, we have to choose the first production rule.
In general, when expanding stmt, we can uniquely choose the production rule just by looking at the current token.
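A minimal sketch of that choice in C, assuming hypothetical token codes (`TOK_IF` and so on) produced by the lexer. The production bodies are elided in the notes, so the hypothetical function `choose_stmt_rule` only reports which rule would be selected.

```c
/* Hypothetical token codes for the first token of each stmt alternative. */
enum token { TOK_IF, TOK_WHILE, TOK_BEGIN, TOK_FOR, TOK_OTHER };

/* Predictive choice for
       stmt → if ... | while ... | begin ... | for ...
   Returns the number (1-4) of the stmt production to use, or 0 when
   no production starts with the current token (a syntax error).
   One look at the current token decides; nothing is ever undone. */
int choose_stmt_rule(enum token current)
{
    switch (current) {
    case TOK_IF:    return 1;
    case TOK_WHILE: return 2;
    case TOK_BEGIN: return 3;
    case TOK_FOR:   return 4;
    default:        return 0;
    }
}
```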
A → BC
B → DE
D → FG
F → HI
H → xY
FIRST(A) = {x}, since A ⇒ BC ⇒ DEC ⇒ FGEC ⇒ HIGEC ⇒ xYIGEC.
Write the FIRST and FOLLOW sets of the following grammar (λ denotes the empty string):
S -> Ty
T -> AB
T -> sT
A -> aA
A -> λ
B -> bB
B -> λ
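One way to check answers to this exercise is the usual fixed-point computation of FIRST and FOLLOW, sketched below in C. It works under simplifying assumptions: non-terminals are single upper-case letters, terminals are single lower-case letters, an empty right-hand side "" stands for λ, and sets are bitmasks. The names `first_of`, `compute_first_follow`, `EPS` and `END` are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>

#define EPS    (1u << 26)             /* λ, the empty string */
#define END    (1u << 27)             /* $, end of input */
#define BIT(t) (1u << ((t) - 'a'))    /* lower-case terminal t */

typedef struct { char lhs; const char *rhs; } Prod;

/* The exercise grammar; "" stands for λ. */
static const Prod G[] = {
    {'S', "Ty"}, {'T', "AB"}, {'T', "sT"},
    {'A', "aA"}, {'A', ""},   {'B', "bB"}, {'B', ""},
};
static const int NPROD = sizeof G / sizeof G[0];

static unsigned first_nt[26];         /* FIRST(X) for non-terminal X */
static unsigned follow_nt[26];        /* FOLLOW(X) for non-terminal X */

/* FIRST of a string of symbols; contains EPS if the string can vanish. */
static unsigned first_of(const char *s)
{
    unsigned out = 0;
    for (; *s; s++) {
        unsigned f = islower((unsigned char)*s) ? BIT(*s) : first_nt[*s - 'A'];
        out |= f & ~EPS;
        if (!(f & EPS))
            return out;               /* this symbol cannot vanish */
    }
    return out | EPS;                 /* every symbol (or none) can vanish */
}

void compute_first_follow(char start)
{
    bool changed = true;
    while (changed) {                 /* grow FIRST sets until stable */
        changed = false;
        for (int i = 0; i < NPROD; i++) {
            unsigned *f = &first_nt[G[i].lhs - 'A'];
            unsigned add = first_of(G[i].rhs);
            if ((*f | add) != *f) { *f |= add; changed = true; }
        }
    }
    follow_nt[start - 'A'] |= END;    /* $ follows the start symbol */
    changed = true;
    while (changed) {                 /* grow FOLLOW sets until stable */
        changed = false;
        for (int i = 0; i < NPROD; i++)
            for (const char *p = G[i].rhs; *p; p++) {
                if (!isupper((unsigned char)*p))
                    continue;         /* only non-terminals have FOLLOW */
                unsigned rest = first_of(p + 1);   /* FIRST of what follows */
                unsigned add = rest & ~EPS;
                if (rest & EPS)       /* rest can vanish: add FOLLOW(lhs) */
                    add |= follow_nt[G[i].lhs - 'A'];
                unsigned *fo = &follow_nt[*p - 'A'];
                if ((*fo | add) != *fo) { *fo |= add; changed = true; }
            }
    }
}
```

For this grammar the computation gives FIRST(S) = {a, b, s, y}, FIRST(T) = {a, b, s, λ}, FOLLOW(S) = {$}, FOLLOW(A) = {b, y} and FOLLOW(T) = FOLLOW(B) = {y}.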
A non-recursive predictive parser is a table-driven parser.
It is a top-down parser.
It is also known as LL(1) Parser.
Figure: model of the table-driven parser (input buffer, parsing table).
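S → Bc | DB
B → ab | cS
D → d | ε
For this grammar, construct the FIRST and FOLLOW sets.
LL(1) parsing table:
       a                 b    c                 d         $
S      S → Bc, S → DB         S → Bc, S → DB    S → DB
B      B → ab                 B → cS
D      D → ε                  D → ε             D → d
M[S,a] and M[S,c] each hold two production rules, so this grammar is not LL(1).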
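The table-filling rule can be sketched in C: a production X → α goes into M[X,t] for every t in FIRST(α) and, when α derives ε, also into M[X,t] for every t in FOLLOW(X). This is a sketch under assumptions: the FIRST and FOLLOW sets for the grammar above are hard-coded after working them out by hand (FIRST(DB) = {a, c, d}, FOLLOW(S) = FOLLOW(B) = {c, $}, FOLLOW(D) = {a, c}), and all names (`build_table`, `entries`, `table`) are hypothetical.

```c
#include <string.h>

enum { T_a, T_b, T_c, T_d, T_END, NTERM };   /* columns; T_END is $ */
enum { N_S, N_B, N_D, NNONT };               /* rows */

#define BIT(t) (1u << (t))

typedef struct {
    int lhs;          /* row of the left-hand side */
    unsigned first;   /* FIRST(rhs) as a bitmask of columns */
    int nullable;     /* 1 if rhs derives ε */
} Prod;

/* 0: S → Bc   1: S → DB   2: B → ab   3: B → cS   4: D → d   5: D → ε */
static const Prod P[] = {
    {N_S, BIT(T_a) | BIT(T_c), 0},
    {N_S, BIT(T_d) | BIT(T_a) | BIT(T_c), 0},   /* D can vanish, so FIRST(B) too */
    {N_B, BIT(T_a), 0},
    {N_B, BIT(T_c), 0},
    {N_D, BIT(T_d), 0},
    {N_D, 0, 1},
};
static const unsigned FOLLOW[NNONT] = {
    [N_S] = BIT(T_c) | BIT(T_END),
    [N_B] = BIT(T_c) | BIT(T_END),
    [N_D] = BIT(T_a) | BIT(T_c),
};

int entries[NNONT][NTERM];   /* how many productions landed in M[X,t] */
int table[NNONT][NTERM];     /* last production placed in M[X,t], or -1 */

/* Fill M and return the number of multiply-defined entries
   (0 would mean the grammar is LL(1)). */
int build_table(void)
{
    memset(entries, 0, sizeof entries);
    memset(table, -1, sizeof table);           /* all bytes 0xFF gives -1 */
    for (int i = 0; i < (int)(sizeof P / sizeof P[0]); i++) {
        unsigned cols = P[i].first;
        if (P[i].nullable)
            cols |= FOLLOW[P[i].lhs];
        for (int t = 0; t < NTERM; t++)
            if (cols & BIT(t)) {
                entries[P[i].lhs][t]++;
                table[P[i].lhs][t] = i;
            }
    }
    int conflicts = 0;
    for (int x = 0; x < NNONT; x++)
        for (int t = 0; t < NTERM; t++)
            if (entries[x][t] > 1)
                conflicts++;
    return conflicts;
}
```

`build_table()` reports two multiply-defined entries, M[S,a] and M[S,c], matching the conflicts in the table above.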
Output: a production rule representing a step of the derivation.
The symbol at the top of the stack (say X) and the current symbol in the input string (say a) determine the parser action.
There are four possible parser actions.
1. If X and a are both $, the parser halts (successful completion).
2. If X and a are the same terminal symbol, the parser pops X from the stack and moves to the next symbol in the input buffer.
3. If X is a non-terminal, the parser looks at the parsing-table entry M[X,a]. If M[X,a] holds a production rule X → Y1Y2...Yk, the parser pops X and pushes Yk, Yk-1, ..., Y1 onto the stack (so Y1 ends up on top). The parser also outputs the production rule X → Y1Y2...Yk to represent a step of the derivation.
4. None of the above → error:
– all empty entries in the parsing table are errors;
– if X is a terminal symbol different from a, this is also an error case.
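The four actions translate almost directly into a driver loop. The C sketch below assumes the expression grammar E → TE’, E’ → +TE’ | ε, T → FT’, T’ → *FT’ | ε, F → (E) | id, which the trace that follows appears to use; the token codes, the hard-coded table `M` and the function `ll1_parse` are hypothetical. The parser's output is recorded as production numbers.

```c
/* Terminals, which are also the table columns; END is $. */
enum { ID, PLUS, STAR, LP, RP, END, NTERMINALS };
/* Non-terminals, numbered after the terminals: E, E', T, T', F. */
enum { NT_E = NTERMINALS, NT_EP, NT_T, NT_TP, NT_F, NSYMBOLS };

typedef struct { int lhs; int len; int rhs[3]; } Prod;

static const Prod prods[] = {
    {NT_E,  2, {NT_T, NT_EP}},        /* 0: E  -> T E'   */
    {NT_EP, 3, {PLUS, NT_T, NT_EP}},  /* 1: E' -> + T E' */
    {NT_EP, 0, {0}},                  /* 2: E' -> eps    */
    {NT_T,  2, {NT_F, NT_TP}},        /* 3: T  -> F T'   */
    {NT_TP, 3, {STAR, NT_F, NT_TP}},  /* 4: T' -> * F T' */
    {NT_TP, 0, {0}},                  /* 5: T' -> eps    */
    {NT_F,  3, {LP, NT_E, RP}},       /* 6: F  -> ( E )  */
    {NT_F,  1, {ID}},                 /* 7: F  -> id     */
};

/* M[X][a]: production to use, or -1 for an empty (error) entry.
   Columns:     id  +   *   (   )   $ */
static const int M[NSYMBOLS - NTERMINALS][NTERMINALS] = {
    /* E  */  { 0, -1, -1,  0, -1, -1},
    /* E' */  {-1,  1, -1, -1,  2,  2},
    /* T  */  { 3, -1, -1,  3, -1, -1},
    /* T' */  {-1,  5,  4, -1,  5,  5},
    /* F  */  { 7, -1, -1,  6, -1, -1},
};

/* Parse a token array ending in END. Production numbers written to
   out[] (at most cap of them) are the parser's output; *nout counts
   them. Returns 1 on acceptance, 0 on a syntax error. */
int ll1_parse(const int *input, int *out, int cap, int *nout)
{
    int stack[100];
    int top = 0, ip = 0;
    stack[top++] = END;               /* $ at the bottom */
    stack[top++] = NT_E;              /* start symbol on top */
    *nout = 0;
    for (;;) {
        int X = stack[top - 1], a = input[ip];
        if (X == END && a == END)
            return 1;                 /* action 1: accept */
        if (X < NTERMINALS) {         /* X is a terminal */
            if (X != a)
                return 0;             /* action 4: mismatch */
            top--;                    /* action 2: pop X ... */
            ip++;                     /* ... and advance the input */
        } else {
            int p = M[X - NTERMINALS][a];
            if (p < 0 || *nout == cap)
                return 0;             /* action 4: empty entry (or out[] full) */
            top--;                    /* action 3: pop X, push Yk ... Y1 */
            for (int k = prods[p].len - 1; k >= 0; k--) {
                if (top == 100)
                    return 0;         /* stack overflow */
                stack[top++] = prods[p].rhs[k];
            }
            out[(*nout)++] = p;       /* output X -> Y1...Yk */
        }
    }
}
```

On id+id this applies productions 0, 3, 7, 5, 1, 3, 7, 5, 2, the same sequence as the output column of the trace below.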
stack       input     output
$E          id+id$    E → TE’
$E’T        id+id$    T → FT’
$E’T’F      id+id$    F → id
$E’T’id     id+id$
$E’T’       +id$      T’ → ε
$E’         +id$      E’ → +TE’
$E’T+       +id$
$E’T        id$       T → FT’
$E’T’F      id$       F → id
$E’T’id     id$
$E’T’       $         T’ → ε
$E’         $         E’ → ε
$           $         accept
Parsing-table entries used (columns id, +, $):
       id          +            $
E      E → TE’
E’                 E’ → +TE’    E’ → ε
T      T → FT’
T’                 T’ → ε       T’ → ε
F      F → id
Figure: parse tree for id+id.
S → aBa
B → bB | ε
LL(1) parsing table:
       a          b          $
S      S → aBa
B      B → ε      B → bB
w = abba
stack    input    output
$S       abba$    S → aBa
$aBa     abba$
$aB      bba$     B → bB
$aBb     bba$
$aB      ba$      B → bB
$aBb     ba$
$aB      a$       B → ε
$a       a$
$        $        accept, successful completion
Outputs: S → aBa, B → bB, B → bB, B → ε
Parse tree:
S
   a
   B
      b
      B
         b
         B
            ε
   a
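Because the parser always expands the leftmost non-terminal, its outputs form a leftmost derivation: S ⇒ aBa ⇒ abBa ⇒ abbBa ⇒ abba. The C sketch below replays such an output list, assuming single-letter symbols with upper-case non-terminals and "" for ε; `replay_leftmost` is a hypothetical name.

```c
#include <ctype.h>
#include <string.h>

/* Replay a leftmost derivation. Production i is lhs[i] → rhs[i]
   ("" stands for ε); each one rewrites the leftmost non-terminal of the
   current sentential form, kept in form[] (capacity cap).
   Returns 1 on success, 0 if the outputs are not a leftmost derivation
   or the form would not fit. */
int replay_leftmost(char start, const char *lhs, const char *const *rhs,
                    int n, char *form, size_t cap)
{
    if (cap < 2)
        return 0;
    form[0] = start;
    form[1] = '\0';
    for (int i = 0; i < n; i++) {
        char *p = form;
        while (*p && !isupper((unsigned char)*p))
            p++;                           /* find the leftmost non-terminal */
        if (*p != lhs[i])
            return 0;                      /* not a leftmost step */
        size_t rl = strlen(rhs[i]), tail = strlen(p + 1);
        if ((size_t)(p - form) + rl + tail + 1 > cap)
            return 0;
        memmove(p + rl, p + 1, tail + 1);  /* shift the tail and its '\0' */
        memcpy(p, rhs[i], rl);             /* splice in the right-hand side */
    }
    return 1;
}
```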
PROGRAM → begin DECLIST comma STATELIST end
DECLIST → d semi DECLIST
DECLIST → d
STATELIST → s semi STATELIST
STATELIST → s
After left factoring, the grammar is changed to:
PROGRAM → begin DECLIST comma STATELIST end
DECLIST → dX
X → semi DECLIST | ε
STATELIST → sY
Y → semi STATELIST | ε

void Y()
{
    if (token == semi)          /* Y → semi STATELIST */
    {
        token = lexical();
        STATELIST();
    }
    else if (token == end)      /* Y → ε: end is the only token that can follow Y */
        ;                       /* do nothing */
    else
        error();
}
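Putting all the procedures together gives a complete recursive-descent parser for the factored grammar. This C sketch assumes a pre-tokenized input array ending in a hypothetical `T_EOF` token, a stub `lexical()` that walks that array, and an `error()` that only records failure instead of stopping; the token names and `parse_program` are hypothetical. The ε-alternatives of X and Y are chosen when the current token is in FOLLOW(X) = {comma} or FOLLOW(Y) = {end}.

```c
/* Hypothetical token codes for the terminals of the PROGRAM grammar. */
enum tok { T_BEGIN, T_END, T_COMMA, T_SEMI, T_D, T_S, T_EOF };

static const enum tok *toks;   /* pre-tokenized input, ending in T_EOF */
static int pos;
static enum tok token;         /* current token, as in the notes */
static int ok;                 /* set to 0 by error() */

/* Stub lexer: hands out the next token, never moving past T_EOF. */
static enum tok lexical(void)
{
    enum tok t = toks[pos];
    if (t != T_EOF)
        pos++;
    return t;
}

static void error(void) { ok = 0; }   /* record the error, keep going */

static void DECLIST(void);
static void STATELIST(void);

/* Consume the expected terminal t or report an error. */
static void expect(enum tok t)
{
    if (token == t)
        token = lexical();
    else
        error();
}

/* X → semi DECLIST | ε ; ε is chosen on comma. */
static void X(void)
{
    if (token == T_SEMI) {
        token = lexical();
        DECLIST();
    } else if (token != T_COMMA) {
        error();
    }
}

/* DECLIST → d X */
static void DECLIST(void) { expect(T_D); X(); }

/* Y → semi STATELIST | ε ; ε is chosen on end. */
static void Y(void)
{
    if (token == T_SEMI) {
        token = lexical();
        STATELIST();
    } else if (token != T_END) {
        error();
    }
}

/* STATELIST → s Y */
static void STATELIST(void) { expect(T_S); Y(); }

/* PROGRAM → begin DECLIST comma STATELIST end ; returns 1 if accepted. */
int parse_program(const enum tok *input)
{
    toks = input;
    pos = 0;
    ok = 1;
    token = lexical();
    expect(T_BEGIN);
    DECLIST();
    expect(T_COMMA);
    STATELIST();
    expect(T_END);
    if (token != T_EOF)
        error();                   /* trailing tokens after end */
    return ok;
}
```

Each recursive call is preceded by consuming a token, so the parser terminates even after an error.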
E→ E ‘+’ T
E→ T
T→ T ‘*’ F
T→ F
F→ ‘(’ E ‘)’
F→ ‘x’
Transforming the grammar into LL(1) form (eliminating left recursion):
E → TX
X → ‘+’ TX | ε
T → FY
Y → ‘*’ FY | ε
F → ‘(’ E ‘)’ | ‘x’
In EBNF form, the transformed grammar reads:
E → T ( ‘+’ T )*
T → F ( ‘*’ F )*
F → ‘(’ E ‘)’ | ‘x’

void T()
{
    F();
    while (token == Times)      /* ( '*' F )* */
    {
        token = lexical();
        F();
    }
}

void F()
{
    if (token == obracket)      /* F → '(' E ')' */
    {
        token = lexical();
        E();
        if (token == cbracket)
            token = lexical();
        else
            error();
    }
    else if (token == x)        /* F → 'x' */
        token = lexical();
    else
        error();
}

int main(void)
{
    token = lexical();          /* read the first token */
    E();
    return 0;
}
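For experimentation, these procedures can be completed into a self-contained recognizer. In this C sketch, E() is written from E → T ( ‘+’ T )* in the same style as T(); the token name `Plus`, the one-character lexer and `parse_expression` are hypothetical, and the end-of-input check after E() is an addition not present in the original main().

```c
/* Token codes; x, Times, obracket and cbracket follow the notes,
   Plus, eof and bad are hypothetical. */
enum tok { x, Plus, Times, obracket, cbracket, eof, bad };

static const char *src;     /* input text, e.g. "x+x*(x)" */
static enum tok token;      /* current token */
static int ok;              /* cleared by error() */

/* Hypothetical lexer: one character per token, no whitespace. */
static enum tok lexical(void)
{
    switch (*src) {
    case '\0': return eof;               /* stay at end of input */
    case 'x':  src++; return x;
    case '+':  src++; return Plus;
    case '*':  src++; return Times;
    case '(':  src++; return obracket;
    case ')':  src++; return cbracket;
    default:   src++; return bad;
    }
}

static void error(void) { ok = 0; }
static void E(void);

/* F → '(' E ')' | 'x' */
static void F(void)
{
    if (token == obracket) {
        token = lexical();
        E();
        if (token == cbracket)
            token = lexical();
        else
            error();
    } else if (token == x) {
        token = lexical();
    } else {
        error();
    }
}

/* T → F ('*' F)* */
static void T(void)
{
    F();
    while (token == Times) {
        token = lexical();
        F();
    }
}

/* E → T ('+' T)* */
static void E(void)
{
    T();
    while (token == Plus) {
        token = lexical();
        T();
    }
}

/* Returns 1 if text is a sentence of the grammar. */
int parse_expression(const char *text)
{
    src = text;
    ok = 1;
    token = lexical();
    E();
    if (token != eof)
        error();                         /* trailing input is an error */
    return ok;
}
```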