Grammars and Parsing
• Grammars consisting entirely of rules with a single symbol on the left-hand side (called the mother) are context-free grammars (CFGs).
• A derivation of the sentence John ate the cat:
  S => NP VP (rewriting S)
    => NAME VP (rewriting NP)
    => John VP (rewriting NAME)
    => John V NP (rewriting VP)
    => John ate NP (rewriting V)
    => John ate ART N (rewriting NP)
    => John ate the N (rewriting ART)
    => John ate the cat (rewriting N)
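As an illustration, here is a minimal Python sketch (not from the source) that replays this leftmost derivation; the rule sequence is simply the derivation above written out as data:

```python
# Replay the leftmost derivation of "John ate the cat".
# The rule sequence below is an assumption: it lists, in order,
# which symbol is rewritten at each step and what it becomes.
DERIVATION_RULES = [
    ("S", ["NP", "VP"]),
    ("NP", ["NAME"]),
    ("NAME", ["John"]),
    ("VP", ["V", "NP"]),
    ("V", ["ate"]),
    ("NP", ["ART", "N"]),
    ("ART", ["the"]),
    ("N", ["cat"]),
]

def replay(start="S"):
    form = [start]
    print(" ".join(form))
    for lhs, rhs in DERIVATION_RULES:
        i = form.index(lhs)      # leftmost occurrence of the symbol to rewrite
        form[i:i + 1] = rhs      # replace it with the rule's right-hand side
        print("=>", " ".join(form), f"(rewriting {lhs})")

replay()
```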
• A second process based on derivations is parsing: identifying the structure of sentences given a grammar.
• Given this evidence, the proposed constituent appears to behave just like other NPs.
• I looked up John's phone number and I looked up John's chimney.
• Should these sentences have identical structure?
• If so, you would presumably analyze both as subject-verb-complement
sentences with the complement in both cases being a PP.
Grammar:
1. S -> NP VP
2. NP -> ART N
3. NP -> ART ADJ N
4. VP -> V
5. VP -> V NP

Sentence, with word positions marked: 1 The 2 dogs 3 cried 4

• A typical parse state would be ((N VP) 2), indicating that the parser needs to find an N followed by a VP, starting at position 2.
• New states are generated from old states depending on whether the first symbol is a lexical symbol or not.
• If it is a lexical symbol, like N in the preceding example, and the next word can belong to that lexical category, then update the state by removing the first symbol and advancing the position counter.
• Since the word dogs is listed as an N in the lexicon, the next parser state would be ((VP) 3), which means the parser needs to find a VP starting at position 3.
• If the first symbol is a nonterminal, like VP, then it is rewritten using a rule from the grammar.
• Using rule 4 of the grammar, the new state would be ((V) 3), which means the parser needs to find a V starting at position 3.
• Using rule 5, on the other hand, the new state would be ((V NP) 3).
• A parsing algorithm that is guaranteed to find a parse if there is one must systematically explore every possible new state.
• One simple technique for this is called backtracking.
• Using this approach, rather than generating a single new state from the state ((VP) 3), we generate all possible new states.
• One of these is picked to be the next state and the rest are saved as backup states.
• If we ever reach a situation where the current state cannot lead to a solution, we simply pick a new current state from the list of backup states.
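As a concrete illustration, the following is a minimal Python sketch (not from the source) of how successor states might be generated; the dictionary encodings of the grammar and lexicon are assumptions made for the example:

```python
# A parse state is (symbol list, 1-based word position).
# The grammar and lexicon encodings below are assumed for illustration.
GRAMMAR = {
    "S":  [["NP", "VP"]],                       # rule 1
    "NP": [["ART", "N"], ["ART", "ADJ", "N"]],  # rules 2 and 3
    "VP": [["V"], ["V", "NP"]],                 # rules 4 and 5
}
LEXICON = {"the": {"ART"}, "dogs": {"N"}, "cried": {"V"}}

def successors(state, words):
    symbols, pos = state
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:                        # nonterminal: rewrite with each rule
        return [(rule + rest, pos) for rule in GRAMMAR[first]]
    # lexical symbol: consume the next word if it can belong to that category
    if pos <= len(words) and first in LEXICON.get(words[pos - 1], set()):
        return [(rest, pos + 1)]
    return []                                   # dead end: triggers backtracking

words = ["the", "dogs", "cried"]
print(successors((["N", "VP"], 2), words))      # -> [(['VP'], 3)]
print(successors((["VP"], 3), words))           # -> [(['V'], 3), (['V', 'NP'], 3)]
```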
Simple Top-Down Parsing Algorithm
• 1. Select the current state: take the first state off the possibilities list and call it C. If the possibilities list is empty, then the algorithm fails (that is, no successful parse is possible).
• 2. If C consists of an empty symbol list and the word position is at the end of the sentence, then the algorithm succeeds.
• 3. Otherwise, generate the next possible states.
• 3.1. If the first symbol on the symbol list of C is a lexical symbol, and the next word in the sentence can be in that class, then create a new state by removing the first symbol from the symbol list and updating the word position, and add it to the possibilities list.
• 3.2. Otherwise, if the first symbol on the symbol list of C is a nonterminal, generate a new state for each rule in the grammar that can rewrite that nonterminal symbol, and add them all to the possibilities list.
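These steps translate directly into a small search loop. The following is a hedged Python sketch, reusing the assumed grammar and lexicon encoding from the earlier snippet; placing newly generated states at the front of the possibilities list makes the search depth-first (appending them instead would make it breadth-first):

```python
# A sketch of the simple top-down parsing algorithm above.
# The grammar/lexicon encodings are assumptions made for illustration.
GRAMMAR = {
    "S":  [["NP", "VP"]],                       # rule 1
    "NP": [["ART", "N"], ["ART", "ADJ", "N"]],  # rules 2 and 3
    "VP": [["V"], ["V", "NP"]],                 # rules 4 and 5
}
LEXICON = {"the": {"ART"}, "dogs": {"N"}, "cried": {"V"}}

def parse(words):
    """A state is (symbol list, 1-based word position)."""
    possibilities = [(["S"], 1)]
    while possibilities:
        symbols, pos = possibilities.pop(0)          # step 1: current state C
        if not symbols:                              # step 2: success test
            if pos == len(words) + 1:
                return True
            continue                                 # symbols used up but words remain
        first, rest = symbols[0], symbols[1:]
        if first in GRAMMAR:                         # step 3.2: rewrite a nonterminal
            new_states = [(rule + rest, pos) for rule in GRAMMAR[first]]
            possibilities = new_states + possibilities   # front of list: depth-first
        elif pos <= len(words) and first in LEXICON.get(words[pos - 1], set()):
            possibilities.insert(0, (rest, pos + 1)) # step 3.1: match a word
        # otherwise C is a dead end; dropping it backtracks to a backup state
    return False                                     # possibilities exhausted: no parse

print(parse(["the", "dogs", "cried"]))               # True
```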
Top-down depth-first parse of 1 The 2 dogs 3 cried 4, using rules 1-5 of the grammar above:

Step  Current State      Backup States         Comment
1.    ((S) 1)                                  initial position
2.    ((NP VP) 1)                              rewriting S by rule 1
3.    ((ART N VP) 1)     ((ART ADJ N VP) 1)    rewriting NP by rules 2 and 3
4.    ((N VP) 2)         ((ART ADJ N VP) 1)    matching ART with the
5.    ((VP) 3)           ((ART ADJ N VP) 1)    matching N with dogs
6.    ((V) 3)            ((V NP) 3)            rewriting VP by rules 4 and 5
                         ((ART ADJ N VP) 1)
7.    (() 4)             ((V NP) 3)            matching V with cried; the parse succeeds
                         ((ART ADJ N VP) 1)    with an empty symbol list at position 4
Lexicon:
•the: ART
•old: ADJ, N (ambiguous)
•man: N, V (ambiguous)
•cried: V
Grammar:
1. S -> NP VP
2. NP -> ART N
3. NP -> ART ADJ N
4. VP -> V
5. VP -> V NP
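This lexicon can be written in the dictionary encoding assumed by the earlier sketches; the sentence the old man cried used in the comments below is an assumed example of how lexical ambiguity forces the depth-first search to backtrack:

```python
# The ambiguous lexicon in the dictionary encoding assumed earlier.
AMBIGUOUS_LEXICON = {
    "the":   {"ART"},
    "old":   {"ADJ", "N"},   # ambiguous between adjective and noun
    "man":   {"N", "V"},     # ambiguous between noun and verb
    "cried": {"V"},
}

# Each ambiguous word adds lexical choices that the depth-first
# search may have to try and abandon before finding a parse.
print("N" in AMBIGUOUS_LEXICON["old"])   # True: "old" can also be a noun
# With the parse() sketch above (swapping in this lexicon), the assumed
# example sentence parses via rule 3 (NP -> ART ADJ N):
# parse(["the", "old", "man", "cried"])  # True
```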
Bottom-Up Chart Parser
Example sentence: The large can can hold the water