LR Parsing
Reading: Hopcroft and Ullman, Intro. to Automata Theory, Lang. and Comp., Sections 10.6-10.7, pp. 248-256
Shift-Reduce Parsing
A class of parsers with the following principles:
- Parsing is done bottom-up, reducing the input to the grammar's start symbol
- The parser builds a rightmost derivation of the input in reverse
- The parsing algorithm simulates the operation of a PDA
- A prefix of the sentential form is kept on the stack
- Two types of operation:
  - Shift the next input symbol onto the stack
  - Reduce the stack by popping the RHS of a grammar rule and pushing the corresponding LHS non-terminal symbol
- The parser is usually deterministic, with no backtracking
- Extremely efficient, operating in time linear in the length of the input
LR Parsing
General Principles:
- Use sets of dotted grammar rules to reflect the state of the parser:
  - What constituents have we constructed so far
  - What constituents are we predicting next
- Pre-compile the grammar into a collection of finite sets of dotted rules
- Use these sets to capture the state of the parser during parsing
- The parser is a deterministic shift-reduce parser
- Developed by Knuth (1965) as a framework for compiling programming languages
11-711 Algorithms for NLP
LR Parsing Algorithm
- Performs shift and reduce parsing actions on the stack, and changes state with each operation
- Is driven by a pre-compiled parsing table that has two parts:
  - The action table specifies the next shift or reduce parsing operation
  - The goto table specifies which state to transfer to after a reduction
- The stack stores a string of the form s_0 X_1 s_1 X_2 s_2 ... X_m s_m, where the s_i are parser states and the X_i are grammar symbols
- At each step the parser does one of the following types of operations:
  - Shift(s): Push the current input symbol and the new state s onto the stack
  - Reduce(i): Reduce the stack according to rule i of the grammar: pop the RHS symbols together with their states, then push the LHS symbol and the state given by the goto table
  - Reject: Reject the input as ungrammatical and signal an error
  - Accept: Accept the input as grammatical and halt
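The driver loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule numbering, the state numbers, and the names `RULES`, `ACTION`, `GOTO` and `lr_parse` are choices made for this sketch, and the table is hard-coded for the small NP/VP example grammar rather than generated.

```python
# Assumed rule numbering: rule number -> (LHS, length of RHS)
RULES = {
    1: ("S", 2),   # S  -> NP VP
    2: ("NP", 3),  # NP -> art adj n
    3: ("NP", 2),  # NP -> art n
    4: ("NP", 2),  # NP -> adj n
    5: ("VP", 2),  # VP -> aux VP
    6: ("VP", 2),  # VP -> v NP
}
NP_FOLLOW = ("aux", "v", "$")  # terminals that may follow an NP

# Assumed state numbering: ACTION[state][terminal] -> ("sh", state) | ("re", rule) | ("acc",)
ACTION = {
    0: {"art": ("sh", 3), "adj": ("sh", 4)},
    1: {"$": ("acc",)},
    2: {"aux": ("sh", 6), "v": ("sh", 7)},
    3: {"adj": ("sh", 8), "n": ("sh", 9)},
    4: {"n": ("sh", 10)},
    5: {"$": ("re", 1)},
    6: {"aux": ("sh", 6), "v": ("sh", 7)},
    7: {"art": ("sh", 3), "adj": ("sh", 4)},
    8: {"n": ("sh", 13)},
    9: {t: ("re", 3) for t in NP_FOLLOW},
    10: {t: ("re", 4) for t in NP_FOLLOW},
    11: {"$": ("re", 5)},
    12: {"$": ("re", 6)},
    13: {t: ("re", 2) for t in NP_FOLLOW},
}
GOTO = {0: {"NP": 2, "S": 1}, 2: {"VP": 5}, 6: {"VP": 11}, 7: {"NP": 12}}


def lr_parse(tokens):
    """Return (accepted, actions) for a token list that ends with "$"."""
    stack = [0]          # s_0 X_1 s_1 ... X_m s_m as a flat list
    pos = 0
    actions = []
    while True:
        act = ACTION[stack[-1]].get(tokens[pos])
        if act is None:                       # empty table entry: error
            actions.append("reject")
            return False, actions
        if act[0] == "acc":
            actions.append("acc")
            return True, actions
        if act[0] == "sh":                    # push input symbol and new state
            stack += [tokens[pos], act[1]]
            pos += 1
        else:                                 # pop RHS, push LHS and goto state
            lhs, length = RULES[act[1]]
            del stack[-2 * length:]
            stack += [lhs, GOTO[stack[-1]][lhs]]
        actions.append(f"{act[0]}{act[1]}")


accepted, actions = lr_parse("art adj n aux v art n $".split())
print(accepted, " ".join(actions))
# True sh3 sh8 sh13 re2 sh6 sh7 sh3 sh9 re3 re6 re5 re1 acc
```

The stack is kept as one flat list so that it mirrors the alternating state/symbol form used above; a reduction by a rule with an RHS of length k therefore removes 2k entries.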
LR Parsing - Example
The Grammar:
(1) S → NP VP
(2) NP → art adj n
(3) NP → art n
(4) NP → adj n
(5) VP → aux VP
(6) VP → v NP

The original input: The large can can hold the water
POS assigned input: art adj n aux v art n
Parser input: art adj n aux v art n $
LR Parsing - Example
Constructed Parsing Table for the Grammar:
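State  Reduce             Shift                                Goto
                          art   adj   n     aux   v     $      NP   VP   S
0                         sh3   sh4                            2         1
1                                                       acc
2                                           sh6   sh7               5
3                               sh8   sh9
4                                     sh10
5      S → NP VP
6                                           sh6   sh7               11
7                         sh3   sh4                            12
8                                     sh13
9      NP → art n
10     NP → adj n
11     VP → aux VP
12     VP → v NP
13     NP → art adj n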
LR Parsing - Example
The input: art adj n aux v art n $
[Table: parser trace, steps 0-13, with the action taken at each step]
- We construct a deterministic FSA that recognizes prefixes of rightmost sentential forms of the grammar G
- The states of the FSA are sets of LR(0) items
- We augment the grammar with a new start rule S' → S
- We define the closure operation on a set I of LR(0) items:
  1. Every item in I is also in closure(I)
  2. If [A → α . B β] is in closure(I) and B → γ is a rule in G, then add [B → . γ] to closure(I)
- The closure operation adds predicted new items to the set (similar to Earley's Predictor operation)
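A minimal sketch of the closure operation in Python follows. It assumes an item is represented as a tuple `(lhs, rhs, dot)` and uses a small NP/VP grammar as sample data; the names `GRAMMAR` and `closure` are illustrative choices for this sketch, not part of any library.

```python
# Assumed sample grammar, already augmented with S' -> S.
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("NP", "VP")),
    ("NP", ("art", "adj", "n")),
    ("NP", ("art", "n")),
    ("NP", ("adj", "n")),
    ("VP", ("aux", "VP")),
    ("VP", ("v", "NP")),
]


def closure(items):
    """Close a set of LR(0) items, each item being (lhs, rhs, dot_position)."""
    result = set(items)          # step 1: every item in I is in closure(I)
    work = list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot == len(rhs):
            continue             # dot at the end: nothing is predicted
        predicted = rhs[dot]     # the symbol B right after the dot
        for lhs2, rhs2 in GRAMMAR:
            item = (lhs2, rhs2, 0)
            if lhs2 == predicted and item not in result:
                result.add(item)  # step 2: add [B -> . gamma]
                work.append(item)
    return frozenset(result)


start = closure({("S'", ("S",), 0)})
for lhs, rhs, dot in sorted(start):
    print(lhs, "->", " ".join(rhs[:dot] + (".",) + rhs[dot:]))
```

Returning a `frozenset` makes item sets hashable and directly comparable, which is convenient when they are later used as automaton states.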
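The goto operation: for a set of items I and a grammar symbol X, goto(I, X) is the closure of the set of all items [A → α X . β] such that [A → α . X β] is in I
Example: if I contains [NP → art . adj n] and [NP → art . n], then goto(I, n) = closure({[NP → art n .]}) = {[NP → art n .]}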
We construct the collection of sets of LR(0) items for an augmented grammar G'. We start with the item set S_0 = closure({[S' → . S]}). The algorithm:

procedure items(G');
begin
    C := {closure({[S' → . S]})};
    repeat
        for each set of items I in C and each grammar symbol X
                such that goto(I, X) is not empty and not in C do
            add goto(I, X) to C
    until no more sets of items can be added to C
end
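The procedure above can be sketched in Python as follows. The grammar, the symbol order in `SYMBOLS`, and the function names are assumptions of this sketch; items are represented as `(lhs, rhs, dot)` tuples and item sets as frozensets.

```python
# Assumed sample grammar, already augmented with S' -> S.
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("NP", "VP")),
    ("NP", ("art", "adj", "n")),
    ("NP", ("art", "n")),
    ("NP", ("adj", "n")),
    ("VP", ("aux", "VP")),
    ("VP", ("v", "NP")),
]
# The order in which symbols are tried only affects how states get numbered.
SYMBOLS = ["S", "NP", "VP", "art", "adj", "n", "aux", "v"]


def closure(items):
    result, work = set(items), list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs):
            for lhs2, rhs2 in GRAMMAR:
                item = (lhs2, rhs2, 0)
                if lhs2 == rhs[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(items, symbol):
    """Advance the dot over `symbol` wherever possible, then close the result."""
    moved = {(l, r, d + 1) for l, r, d in items if d < len(r) and r[d] == symbol}
    return closure(moved)


def lr0_item_sets():
    collection = [closure({("S'", ("S",), 0)})]
    changed = True
    while changed:                              # repeat ... until nothing is added
        changed = False
        for item_set in list(collection):       # scan a snapshot of C
            for symbol in SYMBOLS:
                target = goto(item_set, symbol)
                if target and target not in collection:
                    collection.append(target)
                    changed = True
    return collection


states = lr0_item_sets()
print(len(states))  # 14
```

With this symbol order the sets happen to be discovered in the order 0 to 13, so `states[i]` can be read as state i of the automaton; a different order would give the same sets under different numbers.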
The LR(0) item sets for the example grammar:
0:  S' → . S
    S → . NP VP
    NP → . art adj n
    NP → . art n
    NP → . adj n
1:  S' → S .
2:  S → NP . VP
    VP → . aux VP
    VP → . v NP
3:  NP → art . adj n
    NP → art . n
4:  NP → adj . n
5:  S → NP VP .
6:  VP → aux . VP
    VP → . aux VP
    VP → . v NP
7:  VP → v . NP
    NP → . art adj n
    NP → . art n
    NP → . adj n
8:  NP → art adj . n
9:  NP → art n .
10: NP → adj n .
11: VP → aux VP .
12: VP → v NP .
13: NP → art adj n .
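The SLR parsing table is constructed from the collection of item sets C = {I_0, ..., I_n}, where state i corresponds to item set I_i:
(a) If [A → α . a β] is in I_i, a is a terminal, and goto(I_i, a) = I_j, then set action[i, a] to shift j
(b) If [A → α .] is in I_i and A is not S', then set action[i, a] to reduce A → α for every a in FOLLOW(A)
(c) If [S' → S .] is in I_i, then set action[i, $] to accept
(d) If goto(I_i, A) = I_j for a non-terminal A, then set goto[i, A] to j
(e) All table entries not defined in (a)-(d) are set as error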
[Figure: transition diagram of the LR(0) automaton for the example grammar, states S1-S13, with edges labeled art, adj, n, aux, v, NP and VP]
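As a rough sketch, the SLR table for a small grammar can be built from its LR(0) item sets and FOLLOW sets as shown below. Several things here are assumptions of the sketch rather than anything stated in the slides: the grammar has no empty rules (which keeps the FIRST computation trivial), reduce entries are placed using FOLLOW sets, and all names (`build_slr_table`, `follow_sets`, and so on) are illustrative.

```python
GRAMMAR = [                      # index = rule number; rule 0 is the added start rule
    ("S'", ("S",)),
    ("S", ("NP", "VP")),
    ("NP", ("art", "adj", "n")),
    ("NP", ("art", "n")),
    ("NP", ("adj", "n")),
    ("VP", ("aux", "VP")),
    ("VP", ("v", "NP")),
]
NONTERMS = {lhs for lhs, _ in GRAMMAR}
SYMBOLS = ["S", "NP", "VP", "art", "adj", "n", "aux", "v"]


def closure(items):
    result, work = set(items), list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs):
            for lhs2, rhs2 in GRAMMAR:
                item = (lhs2, rhs2, 0)
                if lhs2 == rhs[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(items, symbol):
    return closure({(l, r, d + 1) for l, r, d in items if d < len(r) and r[d] == symbol})


def lr0_states():
    states = [closure({("S'", ("S",), 0)})]
    for state in states:                     # the list grows while it is scanned
        for symbol in SYMBOLS:
            target = goto(state, symbol)
            if target and target not in states:
                states.append(target)
    return states


def follow_sets():
    """FIRST and FOLLOW by fixpoint iteration; valid only without empty rules."""
    first = {a: set() for a in NONTERMS}
    follow = {a: set() for a in NONTERMS}
    follow["S'"].add("$")

    def first_of(symbol):
        return first[symbol] if symbol in NONTERMS else {symbol}

    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            updates = [(first[lhs], first_of(rhs[0]))]
            for i, sym in enumerate(rhs):
                if sym in NONTERMS:
                    src = first_of(rhs[i + 1]) if i + 1 < len(rhs) else follow[lhs]
                    updates.append((follow[sym], src))
            for target, source in updates:
                if not source <= target:
                    target |= source         # in-place update of the shared set
                    changed = True
    return follow


def build_slr_table():
    states, follow = lr0_states(), follow_sets()
    action, goto_table, conflicts = {}, {}, []

    def set_action(state, terminal, value):
        old = action.setdefault((state, terminal), value)
        if old != value:                     # two different actions in one cell
            conflicts.append((state, terminal, old, value))

    for i, state in enumerate(states):
        for lhs, rhs, dot in state:
            if dot < len(rhs):
                j = states.index(goto(state, rhs[dot]))
                if rhs[dot] in NONTERMS:
                    goto_table[(i, rhs[dot])] = j            # goto entry
                else:
                    set_action(i, rhs[dot], ("shift", j))    # shift entry
            elif lhs == "S'":
                set_action(i, "$", ("accept",))              # accept entry
            else:
                for terminal in follow[lhs]:                 # reduce entries
                    set_action(i, terminal, ("reduce", GRAMMAR.index((lhs, rhs))))
    return action, goto_table, conflicts     # any missing entry means error


action, goto_table, conflicts = build_slr_table()
print(len(action), len(goto_table), conflicts)  # 25 5 []
```

An empty `conflicts` list indicates that, under these assumptions, the sample grammar is SLR.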
LR(k) Parsing
How to handle conflicts in the SLR table:
- A table conflict: more than one action is specified in a single entry action[s, a] of the table
- Conflicts can be either shift-reduce or reduce-reduce
- The parser will not be able to parse deterministically
- A grammar for which this happens is not SLR
- More powerful techniques for building item sets can sometimes resolve the problem by making use of lookaheads into the input
  - Known techniques: Canonical LR(k), LALR(k)
  - A lookahead of one is sufficient (and optimal) in many cases
- Another option is to extend the LR parsing algorithm: GLR parsing
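Detecting and classifying table conflicts can be sketched as below. The entry format `(state, terminal, action)` and the function name `find_conflicts` are assumptions of this sketch, and the sample entries are made up purely to exercise both kinds of conflict.

```python
from collections import defaultdict


def find_conflicts(entries):
    """Group proposed actions by table cell and report cells with several actions.

    `entries` is an iterable of (state, terminal, action), where an action is a
    tuple such as ("shift", 5) or ("reduce", 2).
    """
    cells = defaultdict(set)
    for state, terminal, act in entries:
        cells[(state, terminal)].add(act)

    conflicts = {}
    for cell, acts in cells.items():
        if len(acts) > 1:
            kinds = {act[0] for act in acts}
            # A cell never holds two different shifts, since goto is a function,
            # so a conflict without a shift must be between reductions.
            conflicts[cell] = "shift-reduce" if "shift" in kinds else "reduce-reduce"
    return conflicts


# Hypothetical entries, not taken from any real grammar.
sample = [
    (0, "if", ("shift", 1)),
    (4, "else", ("shift", 5)),
    (4, "else", ("reduce", 2)),
    (7, "$", ("reduce", 3)),
    (7, "$", ("reduce", 4)),
]
print(find_conflicts(sample))
```

A deterministic parser needs this result to be empty; a GLR parser would instead keep every action in a conflicting cell and pursue them in parallel.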
LR Parsing - Example
The input: art adj n aux v art n $
[Table: parser trace, steps 0-13, with the action taken and the parse node created at each step]
Parse nodes:
1 art
2 adj
3 n
4 NP (1 2 3)
5 aux
6 v
7 art
8 n
9 NP (7 8)
10 VP (6 9)
11 VP (5 10)
12 S (4 11)
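The parse-node bookkeeping in this example can be sketched by extending an LR driver so that every shift creates a leaf node and every reduction creates a node over the nodes it pops. The table, the rule numbering, and the names below are assumptions of this sketch, hard-coded for the NP/VP example grammar.

```python
# Assumed rule numbering: rule number -> (LHS, length of RHS)
RULES = {1: ("S", 2), 2: ("NP", 3), 3: ("NP", 2), 4: ("NP", 2), 5: ("VP", 2), 6: ("VP", 2)}
NP_FOLLOW = ("aux", "v", "$")
ACTION = {   # assumed state numbering
    0: {"art": ("sh", 3), "adj": ("sh", 4)},
    1: {"$": ("acc",)},
    2: {"aux": ("sh", 6), "v": ("sh", 7)},
    3: {"adj": ("sh", 8), "n": ("sh", 9)},
    4: {"n": ("sh", 10)},
    5: {"$": ("re", 1)},
    6: {"aux": ("sh", 6), "v": ("sh", 7)},
    7: {"art": ("sh", 3), "adj": ("sh", 4)},
    8: {"n": ("sh", 13)},
    9: {t: ("re", 3) for t in NP_FOLLOW},
    10: {t: ("re", 4) for t in NP_FOLLOW},
    11: {"$": ("re", 5)},
    12: {"$": ("re", 6)},
    13: {t: ("re", 2) for t in NP_FOLLOW},
}
GOTO = {0: {"NP": 2, "S": 1}, 2: {"VP": 5}, 6: {"VP": 11}, 7: {"NP": 12}}


def lr_parse_nodes(tokens):
    """Parse tokens (ending in "$"); return {node id: (label, child ids)} or None."""
    states = [0]        # state stack
    node_stack = []     # ids of the parse nodes for the symbols on the stack
    nodes = {}
    pos = 0
    while True:
        act = ACTION[states[-1]].get(tokens[pos])
        if act is None:
            return None                      # reject
        if act[0] == "acc":
            return nodes
        if act[0] == "sh":                   # leaf node for the shifted word
            label, children = tokens[pos], ()
            states.append(act[1])
            pos += 1
        else:                                # node over the popped RHS nodes
            label, length = RULES[act[1]]    # length >= 1: no empty rules here
            children = tuple(node_stack[-length:])
            del node_stack[-length:]
            del states[-length:]
            states.append(GOTO[states[-1]][label])
        nodes[len(nodes) + 1] = (label, children)
        node_stack.append(len(nodes))


nodes = lr_parse_nodes("art adj n aux v art n $".split())
for node_id in (10, 11, 12):
    label, children = nodes[node_id]
    print(node_id, label, children)
# 10 VP (6, 9)
# 11 VP (5, 10)
# 12 S (4, 11)
```

Nodes are numbered in creation order, so the last node created before the accept action is the root of the parse tree.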