
Grammars and Parsing

• To compute the syntactic structure of a sentence, we need two things:


• The grammar: a formal specification of the structures allowable in the language.
• The parsing technique: the method of analyzing a sentence to determine its
structure according to the grammar.
• Grammars and Sentence Structure:
• The tree representation for the sentence John ate the cat:
The sentence (S) consists of an initial noun phrase (NP)
and a verb phrase (VP). The initial noun phrase is made
of the simple NAME John. The verb phrase is composed of
a verb (V) ate and an NP, which consists of an article
(ART) the and a common noun (N) cat.

In list notation this same structure could be represented as:

(S (NP (NAME John))
   (VP (V ate)
       (NP (ART the)
           (N cat))))

• Node at the top: the root
• Nodes at the bottom: the leaves
• Other terminology: link, parent node, child node, ancestor
• A node is dominated by its ancestor nodes
• The root node dominates all other nodes in the tree
• A set of rewrite rules, such as S -> NP VP, describes what tree structures are allowable.

• Grammars consisting entirely of rules with a single symbol on the left-hand side
(called the mother) are known as:
• Context-free grammars (CFGs).

• CFGs are a very important class of grammars for two reasons:


• 1. The formalism is powerful enough to describe most of the structure in natural
languages,
• 2. Yet it is restricted enough so that efficient parsers can be built to analyze sentences.
• Symbols that cannot be further decomposed in a grammar, namely the words in the example,
are called terminal symbols.
• Other symbols, such as NP, VP, and S, are called nonterminal symbols.
• The grammatical symbols such as N and V that describe word categories are called lexical
symbols.
• Words will be listed under multiple categories.
• For example, can would be listed under V and N.
• Start Symbol: S
• The process of sentence generation uses derivations to construct legal sentences.
• A simple generator could be implemented by randomly choosing rewrite rules, starting from the S symbol, until
you have a sequence of words.

• S
• => NP VP (rewriting S)
• => NAME VP (rewriting NP)
• => John VP (rewriting NAME)
• => John V NP (rewriting VP)
• => John ate NP (rewriting V)
• => John ate ART N (rewriting NP)
• => John ate the N (rewriting ART)
• => John ate the cat (rewriting N)
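The random generator described above can be sketched in a few lines of Python. The rule table here is an assumption: it bundles the rewrite rules and the one-word lexicon of the John ate the cat example into a single dictionary.

```python
import random

# Assumed grammar for the example: nonterminal rules plus lexical rules
# (NAME -> John, etc.) collapsed into one rule table.
GRAMMAR = {
    "S":    [["NP", "VP"]],
    "NP":   [["NAME"], ["ART", "N"]],
    "VP":   [["V", "NP"]],
    "NAME": [["John"]],
    "V":    [["ate"]],
    "ART":  [["the"]],
    "N":    [["cat"]],
}

def generate():
    """Rewrite symbols at random, starting from S, until only words remain."""
    symbols = ["S"]
    i = 0
    while i < len(symbols):
        if symbols[i] in GRAMMAR:                   # nonterminal: rewrite it
            symbols[i:i + 1] = random.choice(GRAMMAR[symbols[i]])
        else:                                       # terminal word: keep it
            i += 1
    return " ".join(symbols)

print(generate())   # e.g. "John ate the cat" or "the cat ate John"
```

Each pass through the loop performs one derivation step, so printing `symbols` inside the loop would reproduce a derivation like the one above.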
• A second process based on derivations is parsing.
• Parsing identifies the structure of sentences given a grammar.

• There are two basic methods of searching.


• A top-down strategy starts with the S symbol and then searches through different ways
to rewrite the symbols until the input sentence is generated, or until all possibilities have
been explored.
• The preceding example demonstrates that John ate the cat is a legal sentence by
showing the derivation that could be found by this process.
• In a bottom-up strategy, we start with the words in the sentence and use the rewrite rules backward to
reduce the sequence of symbols until it consists solely of S.
• A sequence of symbols matching the right-hand side of a rule is rewritten as the symbol on its left-hand side.
• A possible bottom-up parse of the sentence John ate the cat is—

• => NAME ate the cat (rewriting John)


• => NAME V the cat (rewriting ate)
• => NAME V ART cat (rewriting the)
• => NAME V ART N (rewriting cat)
• => NP V ART N (rewriting NAME)
• => NP V NP (rewriting ART N)
• => NP VP (rewriting V NP)
• => S (rewriting NP VP)
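The bottom-up strategy above can be sketched as a depth-first search over reductions: any subsequence matching a rule's right-hand side may be rewritten as its left-hand side, until only S remains. The rule list, including the lexical rules for the four words, is assumed from the example.

```python
# Grammar and lexical rules assumed from the John ate the cat example.
RULES = [
    ("S",    ["NP", "VP"]),
    ("NP",   ["NAME"]),
    ("NP",   ["ART", "N"]),
    ("VP",   ["V", "NP"]),
    ("NAME", ["John"]),
    ("V",    ["ate"]),
    ("ART",  ["the"]),
    ("N",    ["cat"]),
]

def reduce_to_s(symbols, trace=()):
    """Return the list of reduction steps leading to (S), or None."""
    if symbols == ["S"]:
        return list(trace)
    for lhs, rhs in RULES:
        for i in range(len(symbols) - len(rhs) + 1):
            if symbols[i:i + len(rhs)] == rhs:       # right-hand side found
                reduced = symbols[:i] + [lhs] + symbols[i + len(rhs):]
                result = reduce_to_s(reduced, trace + (reduced,))
                if result is not None:
                    return result
    return None                                      # no reduction sequence reaches S

for step in reduce_to_s("John ate the cat".split()):
    print("=>", " ".join(step))
```

The trace it prints is one valid reduction sequence; the order of reductions may differ from the hand-worked parse above, since several orders lead to S.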
What Makes a Good Grammar
• Generality--the range of sentences the grammar analyzes correctly;
• Selectivity--the range of non-sentences it identifies as problematic;
• Understandability—the simplicity of the grammar

• When a group of words forms a particular constituent


• try to construct a new sentence that involves that group of words in a
conjunction with another group of words classified as the same type of
constituent.
• This is a good test because for the most part only constituents of the same
type can be conjoined.
The acceptable sentences:

• NP: I ate a hamburger and a hot dog.


• VP: I will eat the hamburger and throw away the hot dog.
• S: I ate a hamburger and John ate a hot dog.
• PP: I saw a hot dog in the bag and on the stove.
• ADJP: I ate a cold and well burned hot dog.
• ADVP: I ate the hot dog slowly and very carefully.
• N: I ate a hamburger and hot dog.
• V: I will cook and burn a hamburger.
• AUX: I can and will eat the hot dog.
• ADJ: I ate the very cold and burned hot dog

The unacceptable sentences:

• *I ate a hamburger and on the stove.


• *I ate a cold hot dog and well burned.
• *I ate the hot dog slowly and a hamburger.
• Another test involves inserting the proposed constituent into other
sentences that take the same category of constituent.

• If John’s hitting of Mary is an NP in John’s hitting of Mary alarmed Sue,


• then it should be usable as an NP in other sentences as well.
• In fact this is true—the NP can be the object of a verb,
• I cannot explain John’s hitting of Mary
• as well as in the passive form of the initial sentence
• Sue was alarmed by John’s hitting of Mary.

• Given this evidence, we can conclude that the proposed constituent appears to
behave just like other NPs.
• I looked up John’s phone number and I looked up John’s chimney.
• Should these sentences have the identical structure?
• If so, you would presumably analyze both as subject-verb-complement
sentences with the complement in both cases being a PP.

• That is, up John’s phone number would be a PP.

• Try the conjunction test:


• Conjoining up John’s phone number with another PP, as in
• *I looked up John’s phone number and in his cupboards, is certainly not acceptable, whereas
• I looked up John’s chimney and in his cupboards is perfectly acceptable.

• Thus the analysis of up John’s phone number as a PP is apparently incorrect.


• Thus a different analysis is needed for each of the two sentences.
• If up John’s phone number is not a PP,
• then two remaining analyses may be possible.
• The VP could be the complex verb looked up followed by an NP,
• or it could consist of three components: the V looked, a particle up, and an NP.
• Either of these is a better solution.
Top-Down Parser
• Parsing algorithm: a procedure that searches through various ways of combining
grammatical rules to find a combination that generates a tree that could be the
structure of the input sentence.
• the algorithm will say whether a certain sentence is accepted by the grammar or not
• A top-down parser: starts with the S symbol and attempts to rewrite it into a
sequence of terminal symbols that matches the classes of the words in the input
sentence.
• The state of the parse at any given time can be represented as a list of symbols that
are the result of operations applied so far, called the symbol list.
• the parser starts in the state (S) and after applying the rule S -> NP VP the symbol list will be (NP
VP).
• if it then applies the rule NP ->ART N, the symbol list will be (ART N VP), and so on...
• The parser could continue in this fashion—
• until the state consisted entirely of terminal symbols,
• and then it could check the input sentence to see if it matched
• But this would be quite wasteful, for a mistake made early on (say, in choosing the
rule that rewrites S) is not discovered until much later
• A better algorithm checks the input as soon as it can.
• In addition, a structure called the lexicon is used to efficiently store the possible categories for each word.
• A very small lexicon for use in the examples is:
• cried: V
• dogs: N, V
• the: ART

Grammar:
1. S -> NP VP
2. NP -> ART N
3. NP -> ART ADJ N
4. VP -> V
5. VP -> V NP

• A state of the parse is now defined by a pair:
• a symbol list similar to before and a number indicating the current position in the sentence.
• 1 The 2 dogs 3 cried 4

• A typical parse state would be ((N VP) 2) indicating that the parser needs to find an N followed by a VP, starting at position two.
• New states are generated from old states depending on whether the first symbol is a lexical symbol or not.
• If it is a lexical symbol, like N in the preceding example, and if the next word can belong to that lexical category,
• then update the state by removing the first symbol and updating the position counter.
• since the word dogs is listed as an N in the lexicon, the next parser state would be ((VP) 3) which means it needs to find a VP starting at
position 3.
• If the first symbol is a nonterminal, like VP, then it is rewritten using a rule from the grammar.
• using rule 4 in Grammar, the new state would be ((V) 3) which means it needs to find a V starting at position 3.
• On the other hand, using rule 5, the new state would be ((V NP) 3)
• A parsing algorithm that is guaranteed to find a parse if there is one must systematically explore every possible new state.
• One simple technique for this is called backtracking.
• Using this approach, rather than generating a single new state from the state ((VP) 3), we generate all possible new states.
• One of these is picked to be the next state and the rest are saved as backup states.
• If we ever reach a situation where the current state cannot lead to a solution,
• simply pick a new current state from the list of backup states.
Simple Top-Down Parsing Algorithm

• The algorithm manipulates a list of possible states, called the possibilities list.
• The first element of this list is the current state,
• which consists of a symbol list and a word position in the sentence.
• The remaining elements are the backup states, each an alternate pair of a
symbol list and a word position.
• For example, the possibilities list
• (((N) 2) ((NAME) 1) ((ADJ N) 1))
• indicates that the current state consists of the symbol list (N) at position 2,
• and that there are two backup states:
• one consisting of the symbol list (NAME) at position 1,
• and the other consisting of the symbol list (ADJ N) at position 1.
• The algorithm starts with the initial state ((S) 1) and no backup states.

• 1. Select the current state: Take the first state off the possibilities list and call
it C.
• If the possibilities list is empty, then the algorithm fails (that is, no successful parse is
possible).
• 2. If C consists of an empty symbol list and the word position is at the end of
the sentence, then the algorithm succeeds.
• 3. Otherwise, generate the next possible states.
• 3.1. If the first symbol on the symbol list of C is a lexical symbol, and the next word in the
sentence can be in that class,
• then create a new state by removing the first symbol from the symbol list and updating the
word position, and add it to the possibilities list.
• 3.2. Otherwise, if the first symbol on the symbol list of C is a non-terminal, generate a
new state for each rule in the grammar that can rewrite that nonterminal symbol and
add them all to the possibilities list.
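The three steps above can be sketched directly in Python, assuming the five-rule grammar and the small lexicon from the example (the, dogs, cried). States are (symbol list, position) pairs, with positions indexing words from 0 rather than 1.

```python
# Assumed grammar (rules 1-5) and lexicon from the example.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["ART", "N"], ["ART", "ADJ", "N"]],
    "VP": [["V"], ["V", "NP"]],
}
LEXICON = {"the": {"ART"}, "dogs": {"N", "V"}, "cried": {"V"}}

def parse(words):
    possibilities = [(("S",), 0)]                    # initial state ((S) 1), no backups
    while possibilities:
        symbols, pos = possibilities.pop(0)          # 1. select the current state C
        if not symbols and pos == len(words):        # 2. empty list at end: success
            return True
        if not symbols:
            continue                                 # empty list too early: dead end
        first, rest = symbols[0], symbols[1:]
        if first in GRAMMAR:                         # 3.2 rewrite a nonterminal
            # New states go on the front of the list, making the search depth first.
            possibilities = [(tuple(r) + rest, pos) for r in GRAMMAR[first]] + possibilities
        elif pos < len(words) and first in LEXICON.get(words[pos], set()):
            possibilities = [(rest, pos + 1)] + possibilities   # 3.1 match a word
    return False                                     # possibilities list exhausted

print(parse("the dogs cried".split()))   # True
```

Backup states are simply the tail of the possibilities list, so backtracking amounts to popping the next state after a dead end.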
Top-down depth-first parse of: 1 The 2 dogs 3 cried 4

Grammar:
1. S -> NP VP
2. NP -> ART N
3. NP -> ART ADJ N
4. VP -> V
5. VP -> V NP

Step  Current State        Backup States           Comment
1.    ((S) 1)                                      initial position
2.    ((NP VP) 1)                                  rewriting S by rule 1
3.    ((ART N VP) 1)       ((ART ADJ N VP) 1)      rewriting NP by rules 2 & 3
4.    ((N VP) 2)           ((ART ADJ N VP) 1)      matching ART with the
5.    ((VP) 3)             ((ART ADJ N VP) 1)      matching N with dogs
6.    ((V) 3)              ((V NP) 3)              rewriting VP by rules 4 & 5
                           ((ART ADJ N VP) 1)
7.    The parse succeeds: V is matched to cried, leaving an empty
      symbol list at the end of the sentence.
Now, let us consider the same algorithm and grammar
operating on the sentence:
1 The 2 old 3 man 4 cried 5

lexicon is:
•the: ART
•old: ADJ, N (ambiguous)
•man: N, V (ambiguous)
•cried: V

Grammar:
1. S -> NP VP
2. NP -> ART N
3. NP -> ART ADJ N
4. VP -> V
5. VP -> V NP
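The same depth-first sketch, restated here so the example is self-contained, shows how the ambiguous lexicon forces backtracking: the NP -> ART N reading of the old is tried first and fails later in the sentence, so the parser backs up to the NP -> ART ADJ N reading, which succeeds.

```python
# Assumed grammar (rules 1-5) and the ambiguous lexicon from this example.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["ART", "N"], ["ART", "ADJ", "N"]],
    "VP": [["V"], ["V", "NP"]],
}
LEXICON = {"the": {"ART"}, "old": {"ADJ", "N"}, "man": {"N", "V"}, "cried": {"V"}}

def parse(words):
    """Return the number of states examined on success, or None on failure."""
    possibilities = [(("S",), 0)]
    examined = 0
    while possibilities:
        examined += 1
        symbols, pos = possibilities.pop(0)          # current state C
        if not symbols:
            if pos == len(words):                    # empty list at end: success
                return examined
            continue                                 # empty list too early: dead end
        first, rest = symbols[0], symbols[1:]
        if first in GRAMMAR:                         # nonterminal: rewrite it
            possibilities = [(tuple(r) + rest, pos) for r in GRAMMAR[first]] + possibilities
        elif pos < len(words) and first in LEXICON.get(words[pos], set()):
            possibilities = [(rest, pos + 1)] + possibilities   # lexical match
    return None

print(parse("the old man cried".split()))   # a step count; None would mean failure
```

The step count is larger than for the dogs cried because the old as ART N, with man taken as a V, consumes the wrong words and has to be abandoned.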
Bottom-Up Chart Parser
The large can can hold the water
