Parsing Intro 1
Slide CS4741
Syntax
syntax: from the Greek syntaxis, meaning setting out together or
arrangement.
Refers to the way words are arranged together.
Why worry about syntax?
The boy ate the frog.
The frog was eaten by the boy.
The frog that the boy ate died.
The boy whom the frog was eaten by died.
Syntactic Analysis
Key ideas:
constituency: groups of words may behave as a single unit or phrase
grammatical relations: refer to the subject, object, indirect
object, etc.
subcategorization and dependencies: refer to certain kinds of
relations between words and phrases, e.g. "want" can be followed by an
infinitive, but "find" and "work" cannot.
Parse Tree
[S [NP [NAME Beavis]] [VP [V ate] [NP [ART the] [N cat]]]]
CFG example
CFGs are also called phrase-structure grammars.
Equivalent to Backus-Naur Form (BNF).
1. S → NP VP
2. VP → V NP
3. NP → NAME
4. NP → ART N
5. NAME → Beavis
6. V → ate
7. ART → the
8. N → cat
CFGs
A context-free grammar consists of:
1. a set of non-terminal symbols $N$
2. a set of terminal symbols $\Sigma$ (disjoint from $N$)
3. a set of productions $P$, each of the form $A \rightarrow \alpha$, where $A$ is a
non-terminal and $\alpha$ is a string of symbols from the infinite set of
strings $(\Sigma \cup N)^*$
4. a designated start symbol $S$
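A minimal Python sketch of the four-part definition, assuming productions are stored as `(A, alpha)` pairs with `alpha` a tuple of symbols, and assuming (as is standard) that S is one of the non-terminals. `CFG`, `is_well_formed` and `BEAVIS` are hypothetical names; the check mirrors items 1-4: N and Σ are disjoint, every left-hand side is in N, and every right-hand side uses only symbols from Σ ∪ N.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CFG:
    nonterminals: frozenset  # N
    terminals: frozenset     # Sigma, must be disjoint from N
    productions: tuple       # P: pairs (A, alpha), alpha a tuple of symbols
    start: str               # designated start symbol S (assumed to be in N)

    def is_well_formed(self):
        symbols = self.nonterminals | self.terminals
        return (
            self.nonterminals.isdisjoint(self.terminals)
            and self.start in self.nonterminals
            and all(lhs in self.nonterminals and all(s in symbols for s in rhs)
                    for lhs, rhs in self.productions)
        )


# The grammar from the CFG example, written as a 4-tuple.
BEAVIS = CFG(
    nonterminals=frozenset({"S", "NP", "VP", "NAME", "V", "ART", "N"}),
    terminals=frozenset({"Beavis", "ate", "the", "cat"}),
    productions=(("S", ("NP", "VP")), ("VP", ("V", "NP")), ("NP", ("NAME",)),
                 ("NP", ("ART", "N")), ("NAME", ("Beavis",)), ("V", ("ate",)),
                 ("ART", ("the",)), ("N", ("cat",))),
    start="S",
)

print(BEAVIS.is_well_formed())  # True
bad = replace(BEAVIS, productions=BEAVIS.productions + (("cat", ("N",)),))
print(bad.is_well_formed())     # False: "cat" is a terminal, not in N
```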
Derivations
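If $A \rightarrow \beta$ is a rule in $P$, and $\alpha$ and $\gamma$ are strings in the set $(\Sigma \cup N)^*$,
then we say that $\alpha A \gamma$ directly derives $\alpha \beta \gamma$, or $\alpha A \gamma \Rightarrow \alpha \beta \gamma$.
Let $\alpha_1, \alpha_2, \ldots, \alpha_m$ be strings in $(\Sigma \cup N)^*$, $m > 1$, such that
$\alpha_1 \Rightarrow \alpha_2,\ \alpha_2 \Rightarrow \alpha_3,\ \ldots,\ \alpha_{m-1} \Rightarrow \alpha_m$.
Then we say that $\alpha_1$ derives $\alpha_m$, or $\alpha_1 \overset{*}{\Rightarrow} \alpha_m$.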
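A small Python sketch of one ⇒ step and of chaining steps, using rules 1-8 of the CFG example and a tuple of symbols for each string. The names are hypothetical, and `derive` always rewrites the leftmost non-terminal, a choice made here for definiteness; the definition itself allows rewriting any non-terminal.

```python
# Rules 1-8 of the CFG example, keyed by rule number.
RULES = {
    1: ("S", ("NP", "VP")), 2: ("VP", ("V", "NP")),
    3: ("NP", ("NAME",)), 4: ("NP", ("ART", "N")),
    5: ("NAME", ("Beavis",)), 6: ("V", ("ate",)),
    7: ("ART", ("the",)), 8: ("N", ("cat",)),
}
NONTERMINALS = {lhs for lhs, _ in RULES.values()}


def directly_derives(form, i, rule_no):
    """One step alpha A gamma => alpha beta gamma, rewriting the symbol at index i."""
    lhs, beta = RULES[rule_no]
    if form[i] != lhs:
        raise ValueError(f"rule {rule_no} rewrites {lhs}, not {form[i]}")
    return form[:i] + beta + form[i + 1:]


def derive(rule_numbers, start=("S",)):
    """Chain => steps, each on the leftmost non-terminal; return every form along the way."""
    forms = [start]
    for rule_no in rule_numbers:
        form = forms[-1]
        i = next((k for k, sym in enumerate(form) if sym in NONTERMINALS), None)
        if i is None:
            raise ValueError("no non-terminal left to rewrite")
        forms.append(directly_derives(form, i, rule_no))
    return forms


for form in derive([1, 3, 5, 2, 6, 4, 7, 8]):
    print(" ".join(form))
```

The printed forms run from S to "Beavis ate the cat" in eight steps, one per rule number, so S derives "Beavis ate the cat".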
L(G)
The language $L(G)$ generated by a grammar $G$ is the set of strings
composed of terminal symbols that can be derived from the designated
start symbol $S$:
$L(G) = \{ w \mid w \in \Sigma^*,\ S \overset{*}{\Rightarrow} w \}$
Parsing: the problem of mapping from a string of words to its parse
tree according to a grammar G.
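For this small grammar, L(G) can be listed by brute force: expand sentential forms breadth-first from S and keep every form that contains only terminals. The sketch assumes no rule has an empty right-hand side (true for rules 1-8), which is what makes the length bound `max_len` safe; `language` is a hypothetical helper, and this enumeration is not how a parser works.

```python
from collections import deque

RULES = [
    ("S", ("NP", "VP")), ("VP", ("V", "NP")), ("NP", ("NAME",)), ("NP", ("ART", "N")),
    ("NAME", ("Beavis",)), ("V", ("ate",)), ("ART", ("the",)), ("N", ("cat",)),
]
NONTERMINALS = {lhs for lhs, _ in RULES}


def language(start="S", max_len=10):
    """All w in L(G) with len(w) <= max_len, found by breadth-first leftmost derivations."""
    seen = {(start,)}
    queue = deque([(start,)])
    sentences = set()
    while queue:
        form = queue.popleft()
        i = next((k for k, sym in enumerate(form) if sym in NONTERMINALS), None)
        if i is None:  # only terminals left, so S =>* w
            sentences.add(" ".join(form))
            continue
        for lhs, rhs in RULES:
            if lhs == form[i]:
                new = form[:i] + rhs + form[i + 1:]
                # Assumes no rule has an empty right-hand side, so forms never get
                # shorter and anything longer than max_len can be dropped.
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
    return sentences


print(sorted(language()))
```

It prints the four sentences of the grammar: "Beavis ate Beavis", "Beavis ate the cat", "the cat ate Beavis" and "the cat ate the cat". Checking whether a string is in this set answers the yes/no (recognition) question; parsing must also produce the tree.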
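Top-Down
S ⇒ NP VP ⇒ NAME VP ⇒ Beavis VP ⇒ Beavis V NP ⇒ Beavis ate NP ⇒ …
Bottom-Up
NAME V ART N ⇐ NP V ART N ⇐ NP V NP ⇐ NP VP ⇐ …
(Read the bottom-up line left to right: each step replaces the right-hand side of a rule with its left-hand side, so each form derives the one before it.)
1. S → NP VP
2. VP → V NP
3. NP → NAME
4. NP → ART N
5. NAME → Beavis
6. V → ate
7. ART → the
8. N → cat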
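The bottom-up direction can be sketched as a search that undoes rules: starting from the words, replace any occurrence of a right-hand side with its left-hand side until only S remains. This Python version is a brute-force breadth-first search under the assumption that no rule has an empty right-hand side (so forms never grow); `reductions` and `bottom_up` are hypothetical names, and practical bottom-up parsers organize this search differently.

```python
from collections import deque

RULES = [
    ("S", ("NP", "VP")), ("VP", ("V", "NP")), ("NP", ("NAME",)), ("NP", ("ART", "N")),
    ("NAME", ("Beavis",)), ("V", ("ate",)), ("ART", ("the",)), ("N", ("cat",)),
]


def reductions(form):
    """Every form made by replacing one occurrence of a right-hand side with its left-hand side."""
    for lhs, rhs in RULES:
        k = len(rhs)
        for i in range(len(form) - k + 1):
            if form[i:i + k] == rhs:
                yield form[:i] + (lhs,) + form[i + k:]


def bottom_up(words, goal=("S",)):
    """Breadth-first search of reductions from the words to S; returns one path of forms, or None."""
    start = tuple(words)
    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == goal:
            path = []
            while form is not None:  # walk back to the words
                path.append(form)
                form = parent[form]
            return path[::-1]
        for nxt in reductions(form):
            if nxt not in parent:
                parent[nxt] = form
                queue.append(nxt)
    return None


for form in bottom_up("Beavis ate the cat".split()):
    print(" ".join(form))
```

The printed path is one valid sequence of reductions. It may reduce the words in a different order from the slide, but it always ends at S after eight reductions, one per internal node of the parse tree.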
A Top-Down Parser
Input: a CFG, a lexicon, and the sentence to parse
Output: yes/no
State of the parse: (symbol list, position)
Word positions: 1 The 2 old 3 man 4 cried 5
Grammar:
1. S → NP VP
2. NP → art n
3. NP → art adj n
4. VP → v
5. VP → v NP
Lexicon:
the: art
old: adj, n
man: n, v
cried: v
POS
(b) If s1 is a lexical symbol and the next word can be in that class, create
a new state by removing s1 and updating the word position, and add it
to PSL. (I'll add to the front.)
(c) If s1 is a non-terminal, generate a new state for each rule in the
grammar that can rewrite s1. Add all of them to PSL. (Add to the front.)
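These steps can be sketched in Python as follows. The slide's step (a) did not survive, so the sketch assumes the usual success test: a state with an empty symbol list at the position after the last word means YES, and an empty symbol list anywhere else is a dead end. The possible states list is a plain Python list whose front is the next state to try; `top_down_parse` and its returned trace are illustrative, not from the slides.

```python
# Grammar and lexicon from the slides (rule numbers in comments).
GRAMMAR = {
    "S": [("NP", "VP")],                         # 1
    "NP": [("art", "n"), ("art", "adj", "n")],   # 2, 3
    "VP": [("v",), ("v", "NP")],                 # 4, 5
}
LEXICON = {"the": {"art"}, "old": {"adj", "n"}, "man": {"n", "v"}, "cried": {"v"}}


def top_down_parse(sentence):
    """Return (accepted, trace); trace lists each current state (symbols, position) in order."""
    words = sentence.lower().split()
    end = len(words) + 1                 # positions run 1..n+1, as on the slides
    psl = [(("S",), 1)]                  # possible states list; front = next to try
    trace = []
    while psl:
        symbols, pos = psl.pop(0)        # take the current state off the front
        trace.append((symbols, pos))
        if not symbols:
            if pos == end:               # assumed step (a): no symbols and no words left
                return True, trace
            continue                     # symbols used up too early: this state fails
        s1, rest = symbols[0], symbols[1:]
        if s1 in GRAMMAR:                # (c) non-terminal: one state per rule, added to front
            psl = [(rhs + rest, pos) for rhs in GRAMMAR[s1]] + psl
        elif pos < end and s1 in LEXICON.get(words[pos - 1], set()):
            psl.insert(0, (rest, pos + 1))  # (b) next word fits the lexical class: consume it
        # otherwise the state has no successors, and the next iteration backtracks
    return False, trace


accepted, trace = top_down_parse("The old man cried")
print("YES" if accepted else "NO", len(trace))  # YES 17
```

The trace has 17 states. States 1-7 match the Example slide; on the Backtrack slide, the "..." after state 8 hides ((NP) 4) and its two expansions ((art n) 4) and ((art adj n) 4), which fail because "cried" is not an art, and the remaining six states match steps 9-14.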
Example
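Current state            Backup states
1. ((S) 1)
2. ((NP VP) 1)
3. ((art n VP) 1)
4. ((n VP) 2)
5. ((VP) 3)
6. ((v) 3)
7. (() 4)
State 7 fails: its symbol list is empty, but the word at position 4 ("cried") has not been consumed.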
Backtrack
8. ((v NP) 3)
...
9. ((art adj n VP) 1)
10. ((adj n VP) 2)
11. ((n VP) 3)
12. ((VP) 4)
13. ((v) 4)              ((v NP) 4)
14. (() 5)               ((v NP) 4)
YES
DONE!
leads to backtracking
Efficient Parsing
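The top-down parser is terribly inefficient: when it backtracks, it discards constituents it has already found and parses them again.
Have the first-year PhD students in the computer science
department take the Q-exam.
Have the first-year PhD students in the computer science
department taken the Q-exam?
"Have" is a main verb in the command but an auxiliary in the question, and the parser cannot tell which until it reaches "take" or "taken", after the long NP; if its first guess was wrong, it parses that NP a second time.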
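To see the repeated work concretely, here is a sketch under stated assumptions: a hypothetical toy grammar (the categories `aux` and `vpp`, for past participle, are invented for this example, and the NP is shortened to "the first-year students"), and a recursive-descent recognizer that performs the same depth-first, top-down, backtracking search as the parser in these slides, written with Python generators instead of an explicit possible states list. It counts how often an NP is expanded starting at word index 1 (0-based).

```python
from collections import Counter

# Hypothetical toy grammar for the "Have ..." pair (not from the slides).
GRAMMAR = {
    "S": [("v", "NP", "v", "NP"),        # command:  Have [NP] take [NP]
          ("aux", "NP", "vpp", "NP")],   # question: Have [NP] taken [NP]
    "NP": [("art", "n"), ("art", "adj", "n")],
}
LEXICON = {
    "have": {"v", "aux"}, "the": {"art"}, "first-year": {"adj"},
    "students": {"n"}, "take": {"v"}, "taken": {"vpp"}, "exam": {"n"},
}
expansions = Counter()  # how many times each (non-terminal, start index) is expanded


def parse(symbol, start, words):
    """Yield each end index of a `symbol` beginning at word index `start` (0-based)."""
    if symbol not in GRAMMAR:  # lexical category: match one word
        if start < len(words) and symbol in LEXICON.get(words[start], set()):
            yield start + 1
        return
    expansions[(symbol, start)] += 1
    for rhs in GRAMMAR[symbol]:  # try rules in order, backtracking on failure
        yield from parse_seq(rhs, start, words)


def parse_seq(symbols, start, words):
    """Yield each end index of the symbol sequence, matched left to right."""
    if not symbols:
        yield start
        return
    for mid in parse(symbols[0], start, words):
        yield from parse_seq(symbols[1:], mid, words)


def recognize(sentence):
    expansions.clear()
    words = sentence.lower().split()
    return any(end == len(words) for end in parse("S", 0, words))


for s in ["Have the first-year students take the exam",
          "Have the first-year students taken the exam"]:
    print(recognize(s), expansions[("NP", 1)])
```

It prints `True 1` for the command and `True 2` for the question: the command reading is tried first and succeeds at once, while the question is recognized only after the NP at index 1 has been parsed a second time.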