Parsing Intro 1

A guide to parsing the secrets of life and all its beautiful little grammars and the grammars of the little details that it offers us

Uploaded by

Edwin Ma
Copyright © All Rights Reserved

Parsing

1. Grammars and parsing


2. Top-down and bottom-up parsing
3. Chart parsers
4. Bottom-up chart parsing
5. The Earley Algorithm

Slide CS4741

Syntax
syntax: from the Greek syntaxis, meaning "setting out together or arrangement."
Refers to the way words are arranged together.
Why worry about syntax?
The boy ate the frog.
The frog was eaten by the boy.
The frog that the boy ate died.
The boy whom the frog was eaten by died.


Syntactic Analysis
Key ideas:
constituency: groups of words may behave as a single unit or phrase
grammatical relations: refer to the subject, object, indirect
object, etc.
subcategorization and dependencies: refer to certain kinds of
relations between words and phrases, e.g. want can be followed by an
infinitive, but find and work cannot.

All can be modeled by various kinds of grammars that are based on context-free grammars.


Grammars and Parsing


Need a grammar: a formal specification of the structures allowable in
the language.
Need a parser: algorithm for assigning syntactic structure to an input
sentence.
Sentence: Beavis ate the cat.

Parse Tree:

(S (NP (NAME Beavis))
   (VP (V ate)
       (NP (ART the)
           (N cat))))

CFG example
CFGs are also called phrase-structure grammars.
Equivalent to Backus-Naur Form (BNF).
1. S → NP VP        5. NAME → Beavis
2. VP → V NP        6. V → ate
3. NP → NAME        7. ART → the
4. NP → ART N       8. N → cat

CFGs are powerful enough to describe most of the structure in natural languages.
CFGs are restricted enough so that efficient parsers can be built.


CFGs
A context free grammar consists of:
1. a set of non-terminal symbols $N$
2. a set of terminal symbols $\Sigma$ (disjoint from $N$)
3. a set of productions, $P$, each of the form $A \rightarrow \beta$, where $A$ is a non-terminal and $\beta$ is a string of symbols from the infinite set of strings $(\Sigma \cup N)^*$
4. a designated start symbol $S$
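
The four components above map directly onto a small data structure. The following is a minimal sketch, not part of the original slides: the class name `CFG`, its field names, and the `BEAVIS` constant are illustrative assumptions. It encodes the eight-rule example grammar and checks the constraints stated in the definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """Hypothetical container for the four parts of a context-free grammar."""
    nonterminals: frozenset   # N
    terminals: frozenset      # Sigma, must be disjoint from N
    productions: tuple        # P, as (lhs, rhs) pairs; rhs is a tuple of symbols
    start: str                # S

    def __post_init__(self):
        if self.nonterminals & self.terminals:
            raise ValueError("terminals must be disjoint from non-terminals")
        if self.start not in self.nonterminals:
            raise ValueError("start symbol must be a non-terminal")
        symbols = self.nonterminals | self.terminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not a non-terminal")
            if any(sym not in symbols for sym in rhs):
                raise ValueError(f"unknown symbol on the right-hand side of {lhs!r}")


# The example grammar from the "CFG example" slide.
BEAVIS = CFG(
    nonterminals=frozenset({"S", "NP", "VP", "NAME", "V", "ART", "N"}),
    terminals=frozenset({"Beavis", "ate", "the", "cat"}),
    productions=(
        ("S", ("NP", "VP")),
        ("VP", ("V", "NP")),
        ("NP", ("NAME",)),
        ("NP", ("ART", "N")),
        ("NAME", ("Beavis",)),
        ("V", ("ate",)),
        ("ART", ("the",)),
        ("N", ("cat",)),
    ),
    start="S",
)
```

Storing each right-hand side as a tuple keeps productions hashable and makes the empty string representable as `()`.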


A Language, L, is defined via a derivation.

Informally, one string derives another one if the first can be rewritten as the second via some series of rule applications from the grammar.

E.g., NP VP can be rewritten as NAME VP via the NP → NAME rule.

Derivations
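If the rule $A \rightarrow \beta \in P$, and $\alpha$ and $\gamma$ are strings in the set $(\Sigma \cup N)^*$,
then we say that $\alpha A \gamma$ directly derives $\alpha \beta \gamma$, or $\alpha A \gamma \Rightarrow \alpha \beta \gamma$.
Let $\alpha_1, \alpha_2, \ldots, \alpha_m$ be strings in $(\Sigma \cup N)^*$, $m > 1$, such that
$\alpha_1 \Rightarrow \alpha_2, \alpha_2 \Rightarrow \alpha_3, \ldots, \alpha_{m-1} \Rightarrow \alpha_m$,
then we say that $\alpha_1$ derives $\alpha_m$, or $\alpha_1 \Rightarrow^* \alpha_m$.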
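
To make the two relations concrete, here is a rough sketch in Python, using the eight-rule Beavis grammar from the CFG example. The function names `direct_derivations` and `derives` and the `max_steps` cap are assumptions made for illustration. Strings of symbols are represented as tuples.

```python
# Productions of the example grammar as (lhs, rhs) pairs.
RULES = [
    ("S", ("NP", "VP")), ("VP", ("V", "NP")),
    ("NP", ("NAME",)), ("NP", ("ART", "N")),
    ("NAME", ("Beavis",)), ("V", ("ate",)),
    ("ART", ("the",)), ("N", ("cat",)),
]


def direct_derivations(string, rules=RULES):
    """Return every string that `string` directly derives (one rule application)."""
    results = []
    for i, symbol in enumerate(string):
        for lhs, rhs in rules:
            if symbol == lhs:
                # alpha A gamma  =>  alpha beta gamma
                results.append(string[:i] + rhs + string[i + 1:])
    return results


def derives(source, target, rules=RULES, max_steps=12):
    """Check whether source derives target in 1..max_steps rule applications.

    Breadth-first search; max_steps is an arbitrary cap chosen for this sketch.
    """
    frontier, seen = {source}, set()
    for _ in range(max_steps):
        frontier = {s for cur in frontier
                    for s in direct_derivations(cur, rules)} - seen
        if target in frontier:
            return True
        if not frontier:
            return False
        seen |= frontier
    return False
```

For example, `direct_derivations(("NP", "VP"))` includes `("NAME", "VP")`, which is the rewrite via the NP → NAME rule, and `derives(("S",), ("Beavis", "ate", "the", "cat"))` holds.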


$L_G$
The language $L_G$ generated by a grammar $G$ is the set of strings
composed of terminal symbols that can be derived from the designated
start symbol $S$.

$L_G = \{ w \mid w \in \Sigma^*,\ S \Rightarrow^* w \}$
Parsing: the problem of mapping from a string of words to its parse
tree according to a grammar G.
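
For a grammar with no recursion, the language it generates is finite and can simply be listed. The sketch below does this for the eight-rule Beavis grammar from the CFG example. The helper name `expand` is an assumption, and the approach only terminates for non-recursive grammars like this one.

```python
from itertools import product

# The example grammar, keyed by left-hand side.
RULES = {
    "S": [("NP", "VP")],
    "VP": [("V", "NP")],
    "NP": [("NAME",), ("ART", "N")],
    "NAME": [("Beavis",)],
    "V": [("ate",)],
    "ART": [("the",)],
    "N": [("cat",)],
}


def expand(symbol):
    """Yield every terminal string (a tuple of words) derivable from symbol."""
    if symbol not in RULES:          # terminal symbol: derives only itself
        yield (symbol,)
        return
    for rhs in RULES[symbol]:
        # Combine every expansion of each right-hand-side symbol.
        for parts in product(*(list(expand(s)) for s in rhs)):
            yield tuple(word for part in parts for word in part)


language = sorted(" ".join(words) for words in expand("S"))
# language == ['Beavis ate Beavis', 'Beavis ate the cat',
#              'the cat ate Beavis', 'the cat ate the cat']
```

Parsing asks the reverse question: given one of these strings, recover the tree of rule applications that produced it.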


General Parsing Strategies


Grammar              Top-Down                     Bottom-Up

                                                  Beavis ate the cat
1. S → NP VP         S → NP VP                    → NAME ate the cat
2. VP → V NP         → NAME VP                    → NAME V the cat
3. NP → NAME         → Beavis VP                  → NAME V ART cat
4. NP → ART N        → Beavis V NP                → NAME V ART N
5. NAME → Beavis     → Beavis ate NP              → NP V ART N
6. V → ate           → Beavis ate ART N           → NP V NP
7. ART → the         → Beavis ate the N           → NP VP
8. N → cat           → Beavis ate the cat         → S
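
The bottom-up direction can be sketched as a search over reductions: wherever the right-hand side of a rule appears in the current string, replace it with the left-hand side, and succeed if the string can be reduced to S. This is a naive illustration under assumed names (`reductions`, `bottom_up_recognize`), not an efficient parser and not the chart-based method covered later in the outline.

```python
# Productions of the example grammar as (lhs, rhs) pairs.
RULES = [
    ("S", ("NP", "VP")), ("VP", ("V", "NP")),
    ("NP", ("NAME",)), ("NP", ("ART", "N")),
    ("NAME", ("Beavis",)), ("V", ("ate",)),
    ("ART", ("the",)), ("N", ("cat",)),
]


def reductions(string):
    """Yield each string obtained by replacing one rule's rhs with its lhs."""
    for lhs, rhs in RULES:
        n = len(rhs)
        for i in range(len(string) - n + 1):
            if string[i:i + n] == rhs:
                yield string[:i] + (lhs,) + string[i + n:]


def bottom_up_recognize(words, start="S"):
    """Return True if the words can be reduced to the start symbol."""
    agenda = [tuple(words)]
    seen = set(agenda)            # avoid revisiting the same string
    while agenda:
        current = agenda.pop()
        if current == (start,):
            return True
        for nxt in reductions(current):
            if nxt not in seen:
                seen.add(nxt)
                agenda.append(nxt)
    return False
```

Calling `bottom_up_recognize("Beavis ate the cat".split())` succeeds, while `"ate the cat"` can be reduced only as far as VP and is rejected.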

A Top-Down Parser
Input: CFG grammar, lexicon, sentence to parse
Output: yes/no
State of the parse: (symbol list, position)
1 The 2 old 3 man 4 cried 5

start state: ((S) 1)


What is the final (success!) state?
A. ((S) 5)
B. (( ) 5)
C. (( ) )
D. none of the above

Grammar and Lexicon


Grammar:
1. S → NP VP          4. VP → v
2. NP → art n         5. VP → v NP
3. NP → art adj n
Lexicon:
the: art
old: adj, n
man: n, v
cried: v
1 The 2 old 3 man 4 cried 5

Algorithm for a Top-Down Parser


PSL ← (((S) 1))
1. Check for failure. If PSL is empty, return NO.
2. Select the current state, C. C ← pop(PSL).
3. Check for success. If C = (() <final-position>), return YES.
4. Otherwise, generate the next possible states.
(a) s1 ← first-symbol(C)
(b) If s1 is a lexical symbol (POS) and the next word can be in that class, create a
new state by removing s1, updating the word position, and adding it
to PSL. (I'll add to front.)
(c) If s1 is a non-terminal, generate a new state for each rule in the
grammar that can rewrite s1. Add all to PSL. (Add to front.)
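
The steps above can be sketched in Python roughly as follows, using the grammar and lexicon from the "Grammar and Lexicon" slide. The function name `top_down_recognize`, the dictionary layout, and the returned trace are assumptions for illustration. A state whose symbol list is empty before the final position, or whose first lexical symbol does not match the next word, is simply dropped.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["art", "n"], ["art", "adj", "n"]],
    "VP": [["v"], ["v", "NP"]],
}
LEXICON = {
    "the": {"art"},
    "old": {"adj", "n"},
    "man": {"n", "v"},
    "cried": {"v"},
}


def top_down_recognize(words, grammar=GRAMMAR, lexicon=LEXICON, start="S"):
    """Return (accepted, trace); trace lists each current state in order."""
    final_position = len(words) + 1
    psl = [((start,), 1)]              # possibilities list; front is index 0
    trace = []
    while psl:                         # step 1: empty list means failure
        symbols, pos = psl.pop(0)      # step 2: take the current state
        trace.append((symbols, pos))
        if not symbols:
            if pos == final_position:  # step 3: success
                return True, trace
            continue                   # ran out of symbols too early
        s1, rest = symbols[0], symbols[1:]
        if s1 in grammar:              # step 4(c): expand a non-terminal
            psl[:0] = [(tuple(rhs) + rest, pos) for rhs in grammar[s1]]
        elif (pos < final_position
              and s1 in lexicon.get(words[pos - 1].lower(), ())):
            psl.insert(0, (rest, pos + 1))   # step 4(b): consume one word
    return False, trace


accepted, trace = top_down_recognize("The old man cried".split())
```

Because new states always go on the front of the list, the most recently generated alternative is explored first, and the remaining entries act as the backup states.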


Will this parser produce a parse tree?


A. Yes
B. No
C. It depends
D. Dunno

What kind of search does the parser use?


A. depth-first
B. breadth-first
C. best-first
D. I don't know (forgot all I knew) about search algorithms

Example
     Current state            Backup states

1.   ((S) 1)
2.   ((NP VP) 1)
3.   ((art n VP) 1)           ((art adj n VP) 1)
4.   ((n VP) 2)               ((art adj n VP) 1)
5.   ((VP) 3)                 ((art adj n VP) 1)
6.   ((v) 3)                  ((v NP) 3) ((art adj n VP) 1)
7.   (() 4)                   ((v NP) 3) ((art adj n VP) 1)
     Backtrack
8.   ((v NP) 3)               ((art adj n VP) 1)
     ... (leads to backtracking)
9.   ((art adj n VP) 1)
10.  ((adj n VP) 2)
11.  ((n VP) 3)
12.  ((VP) 4)
13.  ((v) 4)                  ((v NP) 4)
14.  (() 5)                   ((v NP) 4)

YES
DONE!

Problems with the Top-Down Parser


1. Only judges grammaticality.
2. Stops when it finds a single derivation.
3. No semantic knowledge employed.
4. No way to rank the derivations.
5. Problems with left-recursive rules.
6. Problems with ungrammatical sentences.
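
The left-recursion problem (item 5) can be seen with a small experiment. The sketch below reuses the same depth-first strategy with a step cap. The rule NP → NP PP, the PP rule, the preposition entry in the lexicon, and the `max_steps` parameter are all hypothetical additions for illustration and do not come from the slides.

```python
def recognize(words, grammar, lexicon, start="S", max_steps=200):
    """Depth-first top-down recognizer; returns None if the step cap is hit."""
    final_position = len(words) + 1
    psl = [((start,), 1)]
    steps = 0
    while psl:
        steps += 1
        if steps > max_steps:
            return None                      # gave up: probably looping
        symbols, pos = psl.pop(0)
        if not symbols:
            if pos == final_position:
                return True
            continue
        s1, rest = symbols[0], symbols[1:]
        if s1 in grammar:
            psl[:0] = [(tuple(rhs) + rest, pos) for rhs in grammar[s1]]
        elif pos < final_position and s1 in lexicon.get(words[pos - 1], ()):
            psl.insert(0, (rest, pos + 1))
    return False


# Hypothetical grammar with a left-recursive NP rule listed first.
LEFT_RECURSIVE = {
    "S": [["NP", "VP"]],
    "NP": [["NP", "PP"], ["art", "n"]],
    "PP": [["p", "NP"]],
    "VP": [["v"]],
}
# Same rules, but the non-recursive NP rule is tried first.
REORDERED = dict(LEFT_RECURSIVE, NP=[["art", "n"], ["NP", "PP"]])
LEXICON = {"the": {"art"}, "man": {"n"}, "cried": {"v"}, "in": {"p"}}

words = ["the", "man", "cried"]
looping = recognize(words, LEFT_RECURSIVE, LEXICON)   # None
reordered = recognize(words, REORDERED, LEXICON)      # True
```

With the left-recursive rule first, NP keeps expanding to NP PP without consuming any input, so the search never reaches a word. Reordering the rules helps here only because the sentence is grammatical; once the other alternatives fail, the parser falls back into the same loop.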


Efficient Parsing
The top-down parser is terribly inefficient.
Have the first-year PhD students in the computer science
department take the Q-exam.
Have the first-year PhD students in the computer science
department taken the Q-exam?
