
Shift-Reduce Table Longest Sequence at The Top of Stack Matching The RHS of A Rule

Operator precedence parsing uses a bottom-up shift-reduce parsing method that follows weak precedence relations between grammar symbols. It differs from LR parsing in that it has no explicit parser states, makes shift/reduce decisions based only on the top of the stack and next input symbol using a shift-reduce table, and reduces using the longest match on the right-hand side. Weak precedence grammars that can be parsed this way have no empty right-hand sides or duplicate productions, at most one precedence relation between any two symbols, and no precedence conflicts between different productions.
Copyright © All Rights Reserved

Operator precedence parsing

Bottom-up parsing methods that follow the idea of shift-reduce parsers
Several flavors: operator, simple, and weak precedence.
In this course, only weak precedence

Main differences with respect to LR parsers:
- There is no explicit state associated with the parser (and thus no state pushed on the stack)
- The decision of whether to shift or reduce is taken based solely on the symbol on the top of the stack and the next input symbol (this decision is stored in a shift-reduce table)
- In case of reduction, the handle is the longest sequence at the top of the stack matching the RHS of a rule

Syntax analysis 194


Structure of the weak precedence parser
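[Figure: the weak precedence parser reads the input a1 … ai … an $, keeps the symbols X1 X2 … Xm−1 Xm on a stack (Xm on top), consults the shift-reduce table and produces the output. The table has one row per terminal, nonterminal and $, one column per terminal and $, and entries Shift/Reduce/Error.]
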
Syntax analysis 195
Weak precedence parsing algorithm
Create a stack containing only the special symbol $
a = getnexttoken()
while (True)
    if (Stack == $S and a == $)
        break // Parsing is over
    Xm = top(Stack)
    if (SRT[Xm, a] == shift)
        Push a onto the stack
        a = getnexttoken()
    elseif (SRT[Xm, a] == reduce)
        Search for the longest RHS that matches the top of the stack
        if no match found
            call error-recovery routine
        Denote this rule by Y → Xm−r+1 … Xm
        Pop r elements off the stack
        Push Y onto the stack
        Output Y → Xm−r+1 … Xm
    else call error-recovery routine
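
The algorithm above can be sketched in Python as follows. This is only an illustration, not the course's reference implementation: the names GRAMMAR, SRT and parse are assumptions, the table is written by hand for the expression grammar used in the next slides, and error recovery is reduced to raising SyntaxError.

```python
# Illustrative names: GRAMMAR, SRT and parse are assumptions, not from the slides.
GRAMMAR = [
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]

# Shift-reduce table keyed by (symbol on top of the stack, next input symbol).
# A missing key is an error entry.
SRT = {}
for top in ("*", "+", "(", "$"):
    for a in ("(", "id"):
        SRT[top, a] = "shift"
for top, a in (("E", "+"), ("E", ")"), ("T", "*")):
    SRT[top, a] = "shift"
for top, inputs in (("E", "$"), ("T", "+)$"), ("F", "*+)$"),
                    (")", "*+)$"), ("id", "*+)$")):
    for a in inputs:
        SRT[top, a] = "reduce"


def parse(tokens, start="E"):
    """Return the rules used, in order of reduction; raise SyntaxError otherwise."""
    stack = ["$"]
    tokens = list(tokens) + ["$"]
    pos = 0
    output = []
    while True:
        a = tokens[pos]
        if stack == ["$", start] and a == "$":
            return output  # parsing is over
        action = SRT.get((stack[-1], a))
        if action == "shift":
            stack.append(a)
            pos += 1
        elif action == "reduce":
            # The handle is the longest RHS matching the top of the stack.
            matches = [(lhs, rhs) for lhs, rhs in GRAMMAR
                       if tuple(stack[-len(rhs):]) == rhs]
            if not matches:
                raise SyntaxError(f"no rule matches the stack {stack}")
            lhs, rhs = max(matches, key=lambda rule: len(rule[1]))
            del stack[-len(rhs):]
            stack.append(lhs)
            output.append((lhs, rhs))
        else:
            raise SyntaxError(f"unexpected {a!r} after {stack[-1]!r}")


for lhs, rhs in parse(["id", "+", "id", "*", "id"]):
    print(lhs, "->", " ".join(rhs))
```

Storing the table as a dictionary makes the error entries implicit, which keeps the sketch short; a real parser would call an error-recovery routine instead of raising.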

Syntax analysis 196


Example for the expression grammar

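Example:

Grammar:
E → E + T
E → T
T → T * F
T → F
F → (E)
F → id

Shift/reduce table (rows: symbol on top of the stack; columns: next input symbol; S = shift, R = reduce, blank = error)

      *    +    (    )    id   $
E          S         S         R
T     S    R         R         R
F     R    R         R         R
*               S         S
+               S         S
(               S         S
)     R    R         R         R
id    R    R         R         R
$               S         S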

Syntax analysis 197


Example of parsing

Stack           Input               Action

$               id + id * id $      Shift
$ id            + id * id $         Reduce by F → id
$ F             + id * id $         Reduce by T → F
$ T             + id * id $         Reduce by E → T
$ E             + id * id $         Shift
$ E +           id * id $           Shift
$ E + id        * id $              Reduce by F → id
$ E + F         * id $              Reduce by T → F
$ E + T         * id $              Shift
$ E + T *       id $                Shift
$ E + T * id    $                   Reduce by F → id
$ E + T * F     $                   Reduce by T → T * F
$ E + T         $                   Reduce by E → E + T
$ E             $                   Accept

Syntax analysis 198


Precedence relation: principle

We define the (weak precedence) relations ⋖ and ⋗ between symbols of the grammar (terminals or nonterminals)
- X ⋖ Y if XY appears in the RHS of a rule or if X precedes a reducible word whose leftmost symbol is Y
- X ⋗ Y if X is the rightmost symbol of a reducible word and Y the symbol immediately following that word
Shift when Xm ⋖ a, reduce when Xm ⋗ a
Reducing changes the precedence relation only at the top of the stack (there is thus no need to shift backward)

Syntax analysis 199


Precedence relation: formal definition

Let G = (V, Σ, R, S) be a context-free grammar and $ a new symbol acting as left and right end-marker for the input word.
Define V′ = V ∪ {$}
The weak precedence relations ⋖ and ⋗ are defined respectively on V′ × V and V × V′ as follows:
1. X ⋖ Y if A → αXBβ is in R, and B ⇒⁺ Yγ
2. X ⋖ Y if A → αXYβ is in R
3. $ ⋖ X if S ⇒⁺ Xα
4. X ⋗ a if A → αBβ is in R, and B ⇒⁺ γX and β ⇒* aγ
5. X ⋗ $ if S ⇒⁺ αX
for some α, β, γ, and B

Syntax analysis 200


Construction of the SR table: shift
Shift relation, ⋖:

Initialize S to the empty set.

1. add $ ⋖ S to S (the first S is the start symbol, the second is the set being built)
2. for each production X → L1 L2 … Lk
       for i = 1 to k−1
           add Li ⋖ Li+1 to S
3. repeat
       for each* pair X ⋖ Y in S
           for each production Y → L1 L2 … Lk
               add X ⋖ L1 to S
   until S did not change in this iteration.

* We only need to consider the pairs X ⋖ Y with Y a nonterminal that were added in S at the previous iteration

Syntax analysis 201


Example of the expression grammar: shift

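Grammar:
E → E + T
E → T
T → T * F
T → F
F → (E)
F → id

Step 1:   $ ⋖ E (E is the start symbol)
Step 2:   E ⋖ +,  + ⋖ T,  T ⋖ *,  * ⋖ F,  ( ⋖ E,  E ⋖ )
Step 3.1: + ⋖ F,  * ⋖ id,  * ⋖ (,  ( ⋖ T
Step 3.2: + ⋖ id,  + ⋖ (,  ( ⋖ F
Step 3.3: ( ⋖ (,  ( ⋖ id

Step 3 also propagates $ ⋖ E to $ ⋖ T, $ ⋖ F, $ ⋖ ( and $ ⋖ id, which gives the two S entries in the $ row of the shift/reduce table.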
Syntax analysis 202


Construction of the SR table: reduce
Reduce relation, ⋗:

Initialize R to the empty set.

1. add S ⋗ $ to R
2. for each production X → L1 L2 … Lk
       for each pair X ⋖ Y in S
           add Lk ⋗ Y to R
3. repeat
       for each* pair X ⋗ Y in R
           for each production X → L1 L2 … Lk
               add Lk ⋗ Y to R
   until R did not change in this iteration.

* We only need to consider the pairs X ⋗ Y with X a nonterminal that were added in R at the previous iteration.

Syntax analysis 203


Example of the expression grammar: reduce

Grammar:
E → E + T
E → T
T → T * F
T → F
F → (E)
F → id

Step 1:   E ⋗ $
Step 2:   T ⋗ +,  F ⋗ *,  T ⋗ )
Step 3.1: T ⋗ $,  F ⋗ +,  id ⋗ *,  ) ⋗ *,  F ⋗ )
Step 3.2: F ⋗ $,  id ⋗ +,  ) ⋗ +,  id ⋗ ),  ) ⋗ )
Step 3.3: id ⋗ $,  ) ⋗ $

Syntax analysis 204


Weak precedence grammars

Weak precedence grammars are those that can be analysed by a weak precedence parser.
A grammar G = (V, Σ, R, S) is called a weak precedence grammar if it satisfies the following conditions:
1. There exists no pair of productions with the same right-hand side
2. There are no empty right-hand sides (A → ε)
3. There is at most one weak precedence relation between any two symbols
4. Whenever there are two syntactic rules of the form A → αXβ and B → β, we don't have X ⋖ B
Conditions 1 and 2 are easy to check
Conditions 3 and 4 can be checked by constructing the SR table.

Syntax analysis 205


Example of the expression grammar

Grammar:
E → E + T
E → T
T → T * F
T → F
F → (E)
F → id

Shift/reduce table

      *    +    (    )    id   $
E          S         S         R
T     S    R         R         R
F     R    R         R         R
*               S         S
+               S         S
(               S         S
)     R    R         R         R
id    R    R         R         R
$               S         S

Conditions 1-3 are satisfied (there is no conflict in the SR table)

Condition 4:
- E → E + T and E → T but we don’t have + ⋖ E (see slide 202)
- T → T * F and T → F but we don’t have * ⋖ T (see slide 202)

Syntax analysis 206


Removing ✏ rules
Removing rules of the form A → ε is not difficult
For each rule with A in the RHS, add a set of new rules consisting of the different combinations of A replaced or not with ε.
Example:

S → AbA | B
B → b | c
A → ε

is transformed into

S → AbA | Ab | bA | b | B
B → b | c
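
The transformation can be sketched as follows for a single nonterminal A with a rule A → ε. The function name remove_epsilon_rule and the grammar encoding are assumptions of this example. The sketch deliberately stops with an error if dropping A would create a new empty right-hand side, since that case needs the transformation to be repeated.

```python
from itertools import product


def remove_epsilon_rule(grammar, a):
    """Remove the rule a → ε; grammar is a list of (lhs, tuple of RHS symbols)."""
    result = []
    for lhs, rhs in grammar:
        if lhs == a and rhs == ():
            continue  # drop a → ε itself
        positions = [i for i, symbol in enumerate(rhs) if symbol == a]
        # Each occurrence of a is either kept (True) or replaced by ε (False).
        for keep in product([True, False], repeat=len(positions)):
            dropped = {p for p, kept in zip(positions, keep) if not kept}
            new_rhs = tuple(s for i, s in enumerate(rhs) if i not in dropped)
            if not new_rhs:
                raise NotImplementedError(
                    f"{lhs} becomes nullable; repeat the transformation for it")
            if (lhs, new_rhs) not in result:
                result.append((lhs, new_rhs))
    return result


grammar = [
    ("S", ("A", "b", "A")),
    ("S", ("B",)),
    ("B", ("b",)),
    ("B", ("c",)),
    ("A", ()),
]
for lhs, rhs in remove_epsilon_rule(grammar, "A"):
    print(lhs, "->", " ".join(rhs))
```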

Syntax analysis 207


Summary of weak precedence parsing

Construction of a weak precedence parser


Eliminate ambiguity (or not, see later)
Eliminate productions with ε and ensure that there are no two productions with identical RHS
Construct the shift/reduce table
Check that there are no conflicts during the construction
Check condition 4 of slide 205

Syntax analysis 208


Using ambiguous grammars with bottom-up parsers

All grammars used in the construction of shift/reduce parsing tables must be unambiguous
We can still create a parsing table for an ambiguous grammar but there will be conflicts
We can often resolve these conflicts in favor of one of the choices to disambiguate the grammar
Why use an ambiguous grammar?
- Because the ambiguous grammar is much more natural and the corresponding unambiguous one can be very complex
- Using an ambiguous grammar may eliminate unnecessary reductions
Example:

E → E + E | E * E | (E) | id     ⇒     E → E + T | T
                                       T → T * F | F
                                       F → (E) | id

Syntax analysis 209


Set of LR(0) items of the ambiguous expression grammar

E → E + E | E * E | (E) | id

[Figure: the sets of LR(0) items of this grammar (Dragonbook)]

Follow(E) = {$, +, *, )}
⇒ states 7 and 8 have shift/reduce conflicts for + and *.
Syntax analysis 210
Disambiguation
Example:
Parsing of id + id * id will give the configuration

(0 E 1 + 4 E 7, * id $)

We can choose:
- ACTION[7, *] = shift ⇒ precedence to *
- ACTION[7, *] = reduce E → E + E ⇒ precedence to +

Parsing of id + id + id will give the configuration

(0 E 1 + 4 E 7, + id $)

We can choose:
- ACTION[7, +] = shift ⇒ + is right-associative
- ACTION[7, +] = reduce E → E + E ⇒ + is left-associative
(same analysis for I8)
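
These choices can be made systematically from a precedence level and an associativity per operator. The sketch below assumes the usual convention (* binds tighter than +, both left-associative); the tables PRECEDENCE and ASSOCIATIVITY and the function resolve are hypothetical names introduced for this example.

```python
# Assumed convention: higher number = higher precedence; both operators left-associative.
PRECEDENCE = {"+": 1, "*": 2}
ASSOCIATIVITY = {"+": "left", "*": "left"}


def resolve(rule_operator, lookahead):
    """Pick the action when the parser may either reduce by
    E → E rule_operator E or shift the operator lookahead."""
    if PRECEDENCE[lookahead] > PRECEDENCE[rule_operator]:
        return "shift"    # the incoming operator binds tighter
    if PRECEDENCE[lookahead] < PRECEDENCE[rule_operator]:
        return "reduce"   # the operator already on the stack binds tighter
    # Same precedence: associativity decides.
    return "reduce" if ASSOCIATIVITY[rule_operator] == "left" else "shift"


# State 7 is reached after E + E, state 8 after E * E.
for state, operator in ((7, "+"), (8, "*")):
    for lookahead in ("+", "*"):
        print(f"ACTION[{state}, {lookahead}] = {resolve(operator, lookahead)}")
```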
Syntax analysis 211
Error detection and recovery

In table-driven parsers, there is an error as soon as the table contains no entry (or an error entry) for the current stack (state) and input symbols
The least one can do: report a syntax error and give information about the position in the input file and the tokens that were expected at that position
In practice, it is however desirable to continue parsing to report more errors
There are several ways to recover from an error:
- Panic mode
- Phrase-level recovery
- Introduce specific productions for errors
- Global error repair

Syntax analysis 212


Panic-mode recovery

In case of syntax error within a “phrase”, skip until the next synchronizing token is found (e.g., semicolon, right parenthesis) and then resume parsing
In LR parsing:
- Scan down the stack until a state s with a goto on a particular nonterminal A is found
- Discard zero or more input symbols until a symbol a is found that can follow A
- Stack the state GOTO(s, A) and resume normal parsing

Syntax analysis 213


Phrase-level recovery

Examine each error entry in the parsing table and decide on an appropriate recovery procedure based on the most likely programmer error.
Examples in LR parsing: E → E + E | E * E | (E) | id
- id + * id:
  * is unexpected after a +: report a “missing operand” error, push an arbitrary number on the stack and go to the appropriate next state
- id + id) + id:
  Report an “unbalanced right parenthesis” error and remove the right parenthesis from the input

Syntax analysis 214


Other error recovery approaches

Introduce specific productions for detecting errors:
Add rules in the grammar to detect common errors
Examples for a C compiler:
- I → if E I (parentheses are missing around the expression)
- I → if (E) then I (then is not needed in C)

Global error repair:
Try to find globally the smallest set of insertions and deletions that would turn the program into a syntactically correct string
Very costly and not always effective

Syntax analysis 215


Building the syntax tree

Parsing algorithms presented so far only check that the program is syntactically correct
In practice, the parser also needs to build the parse tree (also called concrete syntax tree)
Its construction is easily embedded into the parsing algorithm

Top-down parsing:
- Recursive descent: let each parsing function return the sub-trees for the parts of the input they parse
- Table-driven: each nonterminal on the stack points to its node in the partially built syntax tree. When the nonterminal is replaced by one of its RHS, nodes for the symbols on the RHS are added as children to the nonterminal node

Syntax analysis 216


Building the syntax tree

Bottom-up parsing:
- Each stack element points to a subtree of the syntax tree
- When performing a reduce, a new syntax tree is built with the nonterminal at the root and the popped-off stack elements as children

Note:
- In practice, the concrete syntax tree is not built but rather a simplified abstract syntax tree
- Depending on the complexity of the compiler, the syntax tree might not even be constructed

Syntax analysis 217
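
As an illustration of the bottom-up case, the weak precedence parser for the expression grammar can carry a subtree next to each stack symbol. This is a sketch under assumptions: the names RULES, SHIFT, REDUCE, build_tree and leaves are invented for the example, a tree is a pair (symbol, list of children), and errors simply raise SyntaxError.

```python
RULES = [
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
# Shift and reduce entries of the expression grammar table, as (stack top, input).
SHIFT = ({(x, a) for x in "*+($" for a in ("(", "id")}
         | {("E", "+"), ("E", ")"), ("T", "*")})
REDUCE = ({(x, a) for x in ("F", ")", "id") for a in "*+)$"}
          | {("T", a) for a in "+)$"} | {("E", "$")})


def build_tree(tokens):
    """Parse tokens and return the concrete syntax tree as (symbol, children)."""
    stack = [("$", None)]  # each element is (symbol, subtree)
    tokens = list(tokens) + ["$"]
    pos = 0
    while True:
        a = tokens[pos]
        symbols = [symbol for symbol, _ in stack]
        if symbols == ["$", "E"] and a == "$":
            return stack[-1][1]
        top = symbols[-1]
        if (top, a) in SHIFT:
            stack.append((a, (a, [])))  # a shifted token is a leaf
            pos += 1
        elif (top, a) in REDUCE:
            matches = [(lhs, rhs) for lhs, rhs in RULES
                       if tuple(symbols[-len(rhs):]) == rhs]
            if not matches:
                raise SyntaxError(f"no rule matches the stack {symbols}")
            lhs, rhs = max(matches, key=lambda rule: len(rule[1]))
            # The popped-off subtrees become the children of the new node.
            children = [tree for _, tree in stack[-len(rhs):]]
            del stack[-len(rhs):]
            stack.append((lhs, (lhs, children)))
        else:
            raise SyntaxError(f"unexpected {a!r} after {top!r}")


def leaves(tree):
    """Return the leaf symbols of a tree from left to right."""
    symbol, children = tree
    return [symbol] if not children else [x for c in children for x in leaves(c)]


tree = build_tree("id + id * id".split())
print(tree[0], [child[0] for child in tree[1]])
```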
Conclusion: top-down versus bottom-up parsing

Top-down:
- Easier to implement (recursively), enough for most standard programming languages
- The grammar sometimes needs to be heavily modified; less general than bottom-up parsers
- Used in most hand-written compilers
Bottom-up:
- More general, less strict rules on the grammar, SLR(1) powerful enough for most standard programming languages
- More difficult to implement, less easy to maintain (add new rules, etc.)
- Used in most parser generators like Yacc or Bison (but JavaCC is top-down)

Syntax analysis 218


For your project

The choice of a parsing technique is left open for the project but we ask you to implement the parser by yourself (Yacc, Bison or other parser generators are forbidden)
Weak precedence parsing was the recommended method in previous editions of this course
Motivate your choice in your report and explain any transformation you had to apply to your grammar to make it fit the parser’s constraints
To avoid mistakes, you should build the parsing tables programmatically

Syntax analysis 219
