
Compiler Design: Top-Down Parsing with a Recursive Descent Parser

This document discusses top-down parsing with a recursive descent parser. It begins by explaining that a parser takes the sequence of tokens from a lexical analyzer and outputs intermediate code representations or error messages. It then describes how top-down and bottom-up parsing work and that recursive descent parsing is simplest if the grammar is not complicated. The document goes on to explain how recursive descent parsing works by defining functions for each grammar rule and uses lookahead, first sets, and follow sets to determine the parsing path. It also discusses techniques for error recovery.
Copyright
© Attribution Non-Commercial (BY-NC)

Compiler Design 5.

Top-Down Parsing with a Recursive Descent Parser


Kanat Bolazar February 2, 2010

Parsing
The lexical analyzer has translated the source program into a sequence of tokens.
The parser must translate that sequence of tokens into an intermediate representation.
Assume the interface is that the parser can call getNextToken to get the next token from the lexical analyzer.
The parser can also call a function named emit that outputs the intermediate representation, whose form is currently unspecified.

The parser outputs error messages if the syntax of the source program is wrong

Parsing: Top-Down, Bottom-Up


Given a grammar such as:
E -> 0 | E + E

And a string to parse such as "0 + 0"

A parser can parse top-down, from the start symbol (E above):
E -> E + E -> 0 + E -> 0 + 0

Or parse bottom-up, grouping terminals into the RHS of rules:
0 + 0 <- E + 0 <- E + E <- E

Usually, parsing is done as tokens are read in:
Top-down: After seeing 0, we don't yet know which rule to use; after seeing 0 +, we can expand E to E + E
Bottom-up: 0 can be reduced to E right away, without seeing +

Parsing: Top-Down, Bottom-Up


Generally:
top-down is easier to understand and to implement directly
bottom-up is more powerful, allowing more complicated grammars
top-down parsing may require changes to the grammar

Top-down parsing can be done:
programmatically (recursive descent)
by table lookup and transitions

Bottom-up parsing is normally table-driven.
If the grammar is not complicated, the simplest approach is to implement a recursive-descent parser.
With the grammar restrictions described below, a recursive descent parser does not require backtracking to take alternative paths along the parse (derivation) path.

Recursive Descent Parsing


For every BNF rule (production) of the form
<phrase1> -> E
the parser defines a function to parse phrase1 whose body is to parse the rule E
void parsePhrase1( ) { /* parse the rule E */ }

Where E consists of a sequence of non-terminal and terminal symbols
Requires no left recursion in the grammar.
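
To see why left recursion is excluded, here is a minimal Python sketch. The rule E -> E + 0 | 0, its iterative rewrite E -> 0 { + 0 }*, and all function names are illustrative assumptions, not part of the slides. Coded directly, the left-recursive rule calls itself before consuming any token, so it never makes progress.

```python
class ParseError(Exception):
    pass

def parse_e_left_recursive(tokens, pos=0):
    # E -> E + 0 | 0 coded rule-for-function: the first action is a call to
    # itself at the same position, so Python eventually raises RecursionError.
    pos = parse_e_left_recursive(tokens, pos)
    if tokens[pos:pos + 2] == ["+", "0"]:
        return pos + 2
    return pos

def parse_e_iterative(tokens, pos=0):
    # Same language, rewritten without left recursion: E -> 0 { + 0 }*
    def expect_zero(p):
        if p >= len(tokens) or tokens[p] != "0":
            raise ParseError(f"expected 0 at position {p}")
        return p + 1

    pos = expect_zero(pos)
    while pos < len(tokens) and tokens[pos] == "+":
        pos = expect_zero(pos + 1)   # consume + and the 0 after it
    return pos                       # index just past the parsed expression
```

The rewritten function consumes a token before it loops, which is what guarantees termination.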

Parsing a rule
A sequence of non-terminal and terminal symbols,
Y1 Y2 Y3 ... Yn
is recognized by parsing each symbol in turn
For each non-terminal symbol, Y, call the corresponding parse function parseY
For each terminal symbol, y, call a function
expect(y)
that will check if y is the next symbol in the source program

The terminal symbols are the token types from the lexical analyzer
If the variable currentsymbol always contains the next token:
expect(y):
    if (currentsymbol == y)
    then getNextToken()
    else SyntaxError()
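
The expect routine above might look as follows in Python. This is a sketch under assumptions: tokens are plain strings, the class name TokenStream and the exception ParseError are invented for illustration, and "EOT" is an assumed end-of-text marker.

```python
class ParseError(Exception):
    pass

class TokenStream:
    """Holds the token source and the one-token lookahead, currentsymbol."""

    def __init__(self, tokens):
        self._tokens = iter(tokens)
        self.currentsymbol = None
        self.get_next_token()          # prime the lookahead

    def get_next_token(self):
        # "EOT" (assumed marker) is returned once the input is exhausted.
        self.currentsymbol = next(self._tokens, "EOT")

    def expect(self, y):
        # Consume y if it is the next token, otherwise report a syntax error.
        if self.currentsymbol == y:
            self.get_next_token()
        else:
            raise ParseError(f"expected {y!r}, found {self.currentsymbol!r}")
```

Raising an exception stops at the first error; the error recovery slides later describe how a parser can keep going instead.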

Simple parse function example


Suppose that there was a grammar rule
<program> -> 'class' <classname> '{' <field-decl> <method-decl> '}'
Then:
parseProgram():
    expect('class');
    parseClassname();
    expect('{');
    parseFieldDecl();
    parseMethodDecl();
    expect('}');

Look-Ahead
In general, one non-terminal may have more than one production, so it might seem that more than one function must be written to parse that non-terminal.
Instead, we insist that we can decide which rule to parse just by looking ahead one symbol in the input
<sentence> -> 'if' '(' <expr> ')' <block> | 'while' '(' <expr> ')' <block> ...

Then parseSentence can have the form
if (currentsymbol == "if") ... // parse first rule
elsif (currentsymbol == "while") ... // parse second rule
...
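
A runnable sketch of this one-symbol look-ahead, in Python. The slides do not define <expr> or <block>, so the placeholders here (an expression is the single token x, a block is an empty pair of braces) are assumptions made only to keep the example self-contained, as are the class and method names.

```python
class ParseError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["EOT"]   # assumed end-of-text marker
        self.pos = 0

    @property
    def currentsymbol(self):
        return self.tokens[self.pos]

    def expect(self, y):
        if self.currentsymbol != y:
            raise ParseError(f"expected {y!r}, found {self.currentsymbol!r}")
        self.pos += 1

    def parse_expr(self):
        self.expect("x")       # placeholder for <expr>

    def parse_block(self):
        self.expect("{")       # placeholder for <block>
        self.expect("}")

    def parse_condition_and_block(self):
        self.expect("(")
        self.parse_expr()
        self.expect(")")
        self.parse_block()

    def parse_sentence(self):
        # The next token alone selects the rule; nothing is ever undone.
        if self.currentsymbol == "if":
            self.expect("if")
            self.parse_condition_and_block()
            return "if"
        elif self.currentsymbol == "while":
            self.expect("while")
            self.parse_condition_and_block()
            return "while"
        raise ParseError(f"no rule starts with {self.currentsymbol!r}")
```

For example, Parser(["while", "(", "x", ")", "{", "}"]).parse_sentence() takes the second branch and returns "while".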

First and Follow Sets


First(E) is the set of terminal symbols that may appear at the beginning of a sentence derived from E
It may also include ε if E can generate the empty string

Follow(<N>), where <N> is a non-terminal symbol in the grammar, is the set of terminal symbols that can follow immediately after any sentence derived from any rule of N

In this grammar:
E -> 0 | E + E

First(0) = {0}
First(E + E) = {0}
First(E) = {0}
Follow(E) = {+, EOF}

Grammar Restriction 1
Grammar Restriction 1 (for top-down parsing): The First sets of alternative rules for the same LHS must be disjoint (so we know which path to take upon seeing a first terminal symbol/token).
Notice: This is not true in the grammar above. Upon seeing 0 we don't know whether to take the 0 path or the E + E path.
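
Restriction 1 can be checked mechanically once the First sets of the alternatives are known. The Python sketch below assumes the First sets are supplied by hand as a dictionary keyed by the text of each right-hand side; the function name is hypothetical.

```python
from itertools import combinations

def first_set_conflicts(alternatives):
    """Return (rhs_a, rhs_b, shared_tokens) for every pair of alternatives
    whose First sets overlap. An empty list means Restriction 1 holds."""
    conflicts = []
    for (rhs_a, first_a), (rhs_b, first_b) in combinations(alternatives.items(), 2):
        overlap = first_a & first_b
        if overlap:
            conflicts.append((rhs_a, rhs_b, overlap))
    return conflicts

# E -> 0 | E + E : both alternatives can start with 0.
print(first_set_conflicts({"0": {"0"}, "E + E": {"0"}}))
# -> [('0', 'E + E', {'0'})]

# An assumed pair of alternatives starting with different tokens passes.
print(first_set_conflicts({"const": {"const"}, "( <expr> )": {"("}}))
# -> []
```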


Recognizing Possibly Empty Sentences


In a strict context free grammar, there may be rules in which the rhs is ε, the empty string
Or in an extended BNF grammar, there may be the specification that some part of the rhs of the rule occurs 0 or 1 times
<phrase1> -> [ <phrase2> ]
Then we recognize the possibly empty occurrence of phrase2 by
if (currentsymbol is in First(<phrase2>)) then parsePhrase2()


Recognizing Sequences
In a context free grammar, you often have rules that specify that any number of a phrase can occur
<arglist> -> <arg> <arglist> | ε
In extended BNF, we replace this with the * to indicate 0 or more occurrences
<arglist> -> <arg>*
We can recognize these sequences by using iteration. If there is a rule of the form
<phrase1> -> <phrase2>*
we can recognize the phrase2 occurrences by
while (currentsymbol is in First(<phrase2>)) do parsePhrase2()
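
Both the optional case (an if on the First set) and the repeated case (a while on the First set) can be sketched in Python. The rule <number> -> [ <sign> ] const, the First sets, and the simplification that an <arg> is a single token are assumptions chosen for illustration.

```python
class ParseError(Exception):
    pass

FIRST_SIGN = {"+", "-"}        # assumed First(<sign>)
FIRST_ARG = {"id", "const"}    # assumed First(<arg>)

def parse_number(tokens, pos):
    """<number> -> [ <sign> ] const : parse <sign> only if it can start here."""
    sign = None
    if pos < len(tokens) and tokens[pos] in FIRST_SIGN:
        sign = tokens[pos]
        pos += 1
    if pos >= len(tokens) or tokens[pos] != "const":
        raise ParseError(f"expected const at position {pos}")
    return sign, pos + 1

def parse_arglist(tokens, pos):
    """<arglist> -> <arg>* : keep parsing while the lookahead can start <arg>."""
    args = []
    while pos < len(tokens) and tokens[pos] in FIRST_ARG:
        args.append(tokens[pos])   # stands in for a call to parse_arg
        pos += 1
    return args, pos
```

Each function returns what it recognized together with the position of the next unread token, so an empty occurrence simply leaves the position unchanged.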

Grammar Restriction 2
In either of the previous cases, where the grammar symbol may generate sentences which are empty, the grammar must be restricted
Suppose that <phrase2> is the symbol that can occur 0 times
Require that the sets First(<phrase2>) and Follow(<phrase2>) be disjoint
Grammar Restriction 2: If a nonterminal may occur 0 times, its First and Follow sets must be disjoint (so we know whether to parse it or skip it on seeing a terminal symbol/token).
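
As a small illustration of Restriction 2, the check is a disjointness test between the two sets. The Python sketch below uses two invented rules and hand-written sets; the function name is hypothetical.

```python
def can_decide(first_of_optional, what_may_follow):
    """True when one token of lookahead settles 'parse it' versus 'skip it'."""
    return first_of_optional.isdisjoint(what_may_follow)

# <number> -> [ <sign> ] const : a sign starts with + or -, and const follows.
print(can_decide({"+", "-"}, {"const"}))   # True

# <s> -> [ a ] a : on seeing a, the parser cannot tell which a it is reading.
print(can_decide({"a"}, {"a"}))            # False
```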

Multiple Rules
Suppose that there is a nonterminal symbol with multiple rules where each rhs is nonempty
<phrase1> -> E1 | E2 | E3 | . . . | En
then we can write parsePhrase1 as follows:
if (currentsymbol is in First( E1 )) then parseE1
elsif (currentsymbol is in First( E2 )) then parseE2
...
elsif (currentsymbol is in First( En )) then parseEn
else SyntaxError()
If any rhs can be empty, then don't give the syntax error
Remember the first grammar restriction:
The sets First( E1 ), ..., First( En ) must be disjoint

Example Expression Grammar


Suppose that we have a grammar
<expr> -> <term> { <op> <term> }*
<term> -> const | ( <expr> )
<op> -> + | -
Parsing functions:
void parseTerm ( )
{ if (cursym == const)
  then getNextToken()
  else if (cursym == "(")
  then { getNextToken();
         parseExpr();
         expect( ")" ) }
  else SyntaxError()
}

void parseExpr ( )
{ parseTerm();
  while (cursym in First(<op>))
  { getNextToken();
    parseTerm();
  }
}
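
A runnable Python version of these two parsing functions might look like this. It is a sketch under assumptions: tokens are the strings "const", "+", "-", "(" and ")", "EOT" is an assumed end-of-text marker, and the parser only recognizes input (it emits nothing). A term that starts with neither const nor ( is reported as a syntax error.

```python
class ParseError(Exception):
    pass

FIRST_OP = {"+", "-"}          # First(<op>)

class ExprParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["EOT"]
        self.pos = 0

    @property
    def cursym(self):
        return self.tokens[self.pos]

    def get_next_token(self):
        if self.cursym != "EOT":           # stay on EOT once it is reached
            self.pos += 1

    def expect(self, y):
        if self.cursym != y:
            raise ParseError(f"expected {y!r}, found {self.cursym!r}")
        self.get_next_token()

    def parse_expr(self):
        # <expr> -> <term> { <op> <term> }*
        self.parse_term()
        while self.cursym in FIRST_OP:
            self.get_next_token()          # consume the operator
            self.parse_term()

    def parse_term(self):
        # <term> -> const | ( <expr> )
        if self.cursym == "const":
            self.get_next_token()
        elif self.cursym == "(":
            self.get_next_token()
            self.parse_expr()
            self.expect(")")
        else:
            raise ParseError(f"a term cannot start with {self.cursym!r}")

def recognizes(tokens):
    """True if the whole token list is one <expr>."""
    parser = ExprParser(tokens)
    try:
        parser.parse_expr()
        parser.expect("EOT")               # nothing may be left over
    except ParseError:
        return False
    return True
```

For instance, recognizes(["const", "+", "(", "const", "-", "const", ")"]) is True, while recognizes(["const", "+"]) is False because a term is missing after the operator.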

First Sets
Here we give a more formal, and more detailed, definition of a First set, starting with any non-terminal.
If we have a set of rules for a non-terminal <phrase1>,
<phrase1> -> E1 | E2 | E3 | . . . | En
then First(<phrase1>) = First(E1) + . . . + First(En) (set union)
For any right hand side, Y1 Y2 Y3 ... Yn, we make cases on the form of the rule:
First(a Y2 Y3 ... Yn) = {a}, for any terminal symbol a
First(N Y2 Y3 ... Yn) = First(N), for any non-terminal N that does not generate the empty string
First([N] M) = First(N) + First(M) (0 or 1 occurrence of N)
First({N}* M) = First(N) + First(M) (0 or more occurrences)
First(ε) = {ε}
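
For a grammar written in plain BNF (no [ ] or { }* forms), these cases can be applied repeatedly until no First set grows. The Python sketch below is one common way to do that, not necessarily the method intended by the slides; the grammar encoding (a dictionary from non-terminal to a list of right-hand sides, with an empty list standing for ε) and the function names are assumptions.

```python
EPS = "ε"

def first_of_sequence(symbols, first, grammar):
    """First set of Y1 Y2 ... Yn, using the current First sets of non-terminals."""
    result = set()
    for sym in symbols:
        if sym not in grammar:           # terminal: it begins the sentence
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:        # sym cannot vanish, so stop here
            return result
    result.add(EPS)                      # every symbol can be empty (or n == 0)
    return result

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                       # repeat until a full pass adds nothing
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_sequence(rhs, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

# E -> 0 | E + E
print(first_sets({"E": [["0"], ["E", "+", "E"]]}))   # {'E': {'0'}}
```

With the rule <arglist> -> <arg> <arglist> | ε and an assumed <arg> -> id, the same function gives First(<arglist>) = {id, ε}.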

Follow Sets
To define the set Follow(T), examine the cases of where the non-terminal T may appear on the rhs of a rule in the grammar.
N -> S T U or N -> S [T] U or N -> S {T}* U
If U never generates an empty string, then Follow(T) includes First(U)
If U can generate an empty string, then Follow(T) includes First(U) and Follow(N)
N -> S T or N -> S [ T ] or N -> S { T }*
Follow(T) includes Follow(N)
The Follow set of the start symbol should contain EOT, the end of text marker

Take the union of these contributions over all occurrences of T on the rhs of rules to make the set Follow(T)
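
For plain BNF rules, the Follow cases can likewise be iterated until nothing changes. In the Python sketch below the First sets are passed in already computed (by hand here), the grammar encoding matches a dictionary from non-terminal to a list of right-hand sides, and all names are assumptions rather than part of the slides.

```python
EPS, EOT = "ε", "EOT"

def follow_sets(grammar, start, first):
    def first_of(symbols):
        # First set of a sequence; a terminal's First set is just itself.
        out = set()
        for sym in symbols:
            sym_first = first.get(sym, {sym})
            out |= sym_first - {EPS}
            if EPS not in sym_first:
                return out
        out.add(EPS)
        return out

    follow = {nt: set() for nt in grammar}
    follow[start].add(EOT)               # the start symbol is followed by EOT
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue         # only non-terminals have Follow sets
                    rest = first_of(rhs[i + 1:])   # what can start U
                    new = rest - {EPS}
                    if EPS in rest:      # U is absent or can be empty
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

# E -> 0 | E + E, with First(E) = {0}
grammar = {"E": [["0"], ["E", "+", "E"]]}
print(follow_sets(grammar, "E", {"E": {"0"}}) == {"E": {"+", "EOT"}})   # True
```

The result matches the Follow(E) given on the earlier First and Follow Sets slide, with EOT playing the role of EOF.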

Simple Error Recovery


To enable the parser to keep parsing after a syntax error, the parser should be able to skip symbols until it finds a synchronizing symbol.
E.g. in parsing a sequence of declarations or statements, skipping to a ; should enable the parser to start parsing the next declaration or statement


General Error Recovery


A more general technique allows the syntax error routine to be given a list of symbols that it should skip to.
void syntaxError(String msg, Symbols StopSymbols)
{ give error with msg;
  while (! (currentsymbol in StopSymbols))
  { getNextToken(); }
}
assuming that there is a type called Symbols of terminal symbols
we may want to pass an error code instead of a message

Each recursive descent procedure should also take StopSymbols as a parameter, and may modify these to pass to any procedure that it calls
E.g. if there is a procedure to parse the parameter list of a method call, then it can have ) as a stop symbol

Stop Symbols
If the parser is trying to parse the rhs E of a non-terminal N -> E, then the stop symbols are those symbols which the parser is prepared to recognize after a sentence generated by E
Remove anything ambiguous from Follow(N)

The stop symbols should always also contain the end of text symbol, EOT, so that the syntax error routine never tries to skip over symbols past the end of the program.
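
The recovery scheme can be sketched in Python as follows. The statement rule <stmt> -> id '=' const ';', the class and method names, and the "EOT" marker string are all assumptions for illustration. The syntax error routine always adds EOT to the stop symbols, and the statement parser adds its own synchronizing symbol ; before passing the set on.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["EOT"]   # assumed end-of-text marker
        self.pos = 0
        self.errors = []

    @property
    def currentsymbol(self):
        return self.tokens[self.pos]

    def get_next_token(self):
        if self.currentsymbol != "EOT":        # never advance past end of text
            self.pos += 1

    def syntax_error(self, msg, stop_symbols):
        self.errors.append(msg)
        stop = set(stop_symbols) | {"EOT"}     # EOT is always a stop symbol
        while self.currentsymbol not in stop:  # skip to a stop symbol
            self.get_next_token()

    def parse_statement(self, stop_symbols):
        """Hypothetical rule: <stmt> -> id '=' const ';'"""
        stop = set(stop_symbols) | {";"}       # add this rule's sync symbol
        for expected in ("id", "=", "const", ";"):
            if self.currentsymbol != expected:
                self.syntax_error(
                    f"expected {expected!r}, found {self.currentsymbol!r}", stop)
                if self.currentsymbol == ";":
                    self.get_next_token()      # resume after the ;
                return False
            self.get_next_token()
        return True

    def parse_program(self):
        good = 0
        while self.currentsymbol != "EOT":
            if self.parse_statement({"EOT"}):
                good += 1
        return good                            # number of error-free statements
```

On the input id = const ; id const ; id = const ; this parser records one error for the second statement, skips to its ;, and still parses the third statement.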
