
Chapter 4

A chart parser
We will apply the algorithm design technique of dynamic programming to the
parsing problem. Dynamic programming stores intermediate results and reuses
them when appropriate, achieving significant efficiency gains. Applied to
syntactic parsing, it lets us store partial solutions to the parsing task and
look them up when necessary, so that we arrive at a complete solution
efficiently. This approach to parsing is known as chart parsing.

For a better understanding of the parsers, you can go through the examples at
http://www.nltk.org/howto/parse.html.
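
As a minimal sketch of chart parsing with NLTK, we can hand a small
context-free grammar to nltk.ChartParser. The grammar and sentence here are
illustrative examples, not from the book:

```python
import nltk

# A toy grammar; any CFG will do for illustration.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N | Det Adj N
VP -> V NP
Det -> 'the'
Adj -> 'big'
N -> 'dog' | 'cat'
V -> 'chased'
""")

# ChartParser stores partial parses (edges) in a chart and reuses
# them, which is exactly the dynamic programming idea described above.
parser = nltk.ChartParser(grammar)
trees = list(parser.parse("the big dog chased the cat".split()))
print(trees[0])
```

With this unambiguous grammar the chart yields exactly one complete tree
rooted at S.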

A regex parser
A regex parser uses regular expressions, defined in the form of a grammar, on
top of a POS-tagged string. The parser uses these regular expressions to parse
the given sentences and generates a parse tree out of them. A working example
of the regex parser is given here:
# Regex parser
>>> import nltk
>>> from nltk.chunk.regexp import *
>>> reg_parser = RegexpParser('''
        NP: {<DT>? <JJ>* <NN>*}   # NP
        P: {<IN>}                 # Preposition
        V: {<V.*>}                # Verb
        PP: {<P> <NP>}            # PP -> P NP
        VP: {<V> <NP|PP>*}        # VP -> V (NP|PP)*
        ''')
>>> test_sent = "Mr. Obama played a big role in the Health insurance bill"
>>> test_sent_pos = nltk.pos_tag(nltk.word_tokenize(test_sent))
>>> parsed_out = reg_parser.parse(test_sent_pos)
>>> print(parsed_out)
Tree('S', [('Mr.', 'NNP'), ('Obama', 'NNP'), Tree('VP', [Tree('V',
[('played', 'VBD')]), Tree('NP', [('a', 'DT'), ('big', 'JJ'), ('role',
'NN')])]), Tree('P', [('in', 'IN')]), ('Health', 'NNP'), Tree('NP',
[('insurance', 'NN'), ('bill', 'NN')])])

Parsing Structure in Text

The following is a graphical representation of the tree for the preceding code:

(Root
  (NP (NNP Mr.) (NNP Obama))
  (VP (VBD played)
      (NP (DT a) (JJ big) (NN role))
      (PP (IN in)
          (NP (DT the) (NNP Health) (NN insurance) (NN bill)))))

In the current example, we define the kinds of patterns (regular expressions
over the POS tags) that we think will make a phrase. For example, anything
matching {<DT>? <JJ>* <NN>*} — an optional determiner, followed by any number
of adjectives, followed by nouns — is most likely a noun phrase. This is a
linguistic rule that we have defined to get a rule-based parse tree.

Dependency parsing
Dependency parsing (DP) is a modern parsing mechanism. The main concept of DP
is that each linguistic unit (word) is connected to another by a directed link.
These links are called dependencies in linguistics. There is a lot of ongoing
work in the parsing community. While phrase structure parsing is still widely
used, for free word order languages (such as Czech and Turkish) dependency
parsing has turned out to be more efficient.
A very clear distinction can be made by comparing the parse trees generated by
a phrase structure grammar and a dependency grammar for the example sentence
"The big dog chased the cat". The parse trees for this sentence are:


Phrase structure tree:

(S
  (NP (Art the) (Adj big) (N dog))
  (VP (V chased)
      (NP (Art the) (N cat))))

Dependency tree: chased -> {dog, cat}; dog -> {the, big}; cat -> {the}

If we look at both parse trees, the phrase structure tree tries to capture the
relationships between words and phrases, and eventually between phrases, while
the dependency tree just captures dependencies between words; for example, big
depends directly on dog.
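
NLTK's projective dependency parser can reproduce this analysis. The
hand-written dependency grammar below is a sketch covering only this one
sentence, with heads on the left and their dependents on the right:

```python
import nltk

# A toy dependency grammar for "the big dog chased the cat".
# 'chased' heads 'dog' and 'cat'; 'dog' heads 'the' and 'big'; and so on.
grammar = nltk.DependencyGrammar.fromstring("""
'chased' -> 'dog' | 'cat'
'dog' -> 'the' | 'big'
'cat' -> 'the'
""")

parser = nltk.ProjectiveDependencyParser(grammar)
for tree in parser.parse(['the', 'big', 'dog', 'chased', 'the', 'cat']):
    print(tree)
```

The printed tree is rooted at the head word chased, with dog and cat as its
dependents, mirroring the dependency tree in the figure above.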
NLTK provides a couple of ways to do dependency parsing. One of them is to use
a probabilistic, projective dependency parser, but it has the restriction of
being trained on a limited set of training data. One of the state-of-the-art
dependency parsers is the Stanford parser. Fortunately, NLTK has a wrapper
around it, and in the following example, I will talk about how to use the
Stanford parser with NLTK:
# Stanford Parser [Very useful]
>>> from nltk.parse.stanford import StanfordParser
>>> english_parser = StanfordParser('stanford-parser.jar', 'stanford-parser-3.4-models.jar')
>>> english_parser.raw_parse_sents(["this is the english parser test"])
Parse
(ROOT
(S
(NP (DT this))
(VP (VBZ is)
(NP (DT the) (JJ english) (NN parser) (NN test)))))
Universal dependencies
nsubj(test-6, this-1)
cop(test-6, is-2)
det(test-6, the-3)
amod(test-6, english-4)
compound(test-6, parser-5)
root(ROOT-0, test-6)


Universal dependencies, enhanced
nsubj(test-6, this-1)
cop(test-6, is-2)
det(test-6, the-3)
amod(test-6, english-4)
compound(test-6, parser-5)
root(ROOT-0, test-6)

The output looks quite complex but, in reality, it's not. The output consists
of three major parts. The first is the POS-tagged parse tree of the given
sentence; the same tree is plotted in a more elegant way in the following
figure. The second is the list of dependencies, with the positions of the
given words. The third is the enhanced version of the dependencies:

(Root
  (NP (DT this))
  (VP (VBZ is)
      (NP (DT the) (JJ english) (NN parser) (NN test))))

For a better understanding of how to use the Stanford parser, refer to
http://nlpviz.bpodgursky.com/home and
http://nlp.stanford.edu:8080/parser/index.jsp.

Chunking
Chunking is shallow parsing: instead of reaching down to the deep syntactic
structure of the sentence, we try to group together chunks of the sentence
that carry some meaning.


A chunk can be defined as the minimal unit that can be processed. For example,
the sentence "the President speaks about the health care reforms" can be
broken into two chunks. One is "the President", which is noun-dominated and
hence called a noun phrase (NP). The remaining part of the sentence is
dominated by a verb, hence it is called a verb phrase (VP). Looking closer,
there is one more sub-chunk in the part "speaks about the health care
reforms": another NP exists here, so this part can be broken down again into
"speaks about" and "the health care reforms", as shown in the following figure:

(S
  (NP The President)
  (VP speaks about
      (NP The Health Care Reforms)))

This is how we break the sentence into parts, and that's what we call
chunking. Formally, chunking can also be described as a process that
identifies non-overlapping groups in unrestricted text.
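
To make this concrete, here is a small sketch that chunks a pre-tagged version
of that sentence with nltk.RegexpParser. The tags are supplied by hand so no
tagger model is needed, and the NP pattern is illustrative:

```python
import nltk

# Hand-tagged tokens for "the President speaks about the health care reforms".
tagged = [('the', 'DT'), ('President', 'NNP'), ('speaks', 'VBZ'),
          ('about', 'IN'), ('the', 'DT'), ('health', 'NN'),
          ('care', 'NN'), ('reforms', 'NNS')]

# Group an optional determiner, any adjectives, and one or more nouns into an NP.
chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")
tree = chunker.parse(tagged)

# Collect the words under each NP chunk.
nps = [' '.join(word for word, tag in subtree.leaves())
       for subtree in tree.subtrees() if subtree.label() == 'NP']
print(nps)  # → ['the President', 'the health care reforms']
```

The two NP chunks that come back are exactly the noun phrases identified in
the discussion above.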

Now we understand the difference between shallow and deep parsing. With deep
parsing, we derive the full syntactic structure of the sentence with the help
of a CFG; in some cases we even need semantic parsing to understand the
meaning of the sentence. On the other hand, there are cases where we don't
need analysis this deep. Say, from a large body of unstructured text, we just
want to extract the key phrases, named entities, or specific patterns of
entities. For this, we go for shallow parsing instead of deep parsing, because
deep parsing involves processing the sentence against all the grammar rules
and generating a variety of syntactic trees until the parser finds the best
one by backtracking and reiterating. This entire process is time consuming and
cumbersome and, even after all the processing, you might not get the right
parse tree. Shallow parsing guarantees a shallow parse structure in terms of
chunks, and it is relatively fast.

So, let's write some code snippets to do some basic chunking:


# Chunking
>>> import nltk
>>> from nltk.chunk.regexp import *
>>> test_sent = "The prime minister announced he had asked the chief government whip, Philip Ruddock, to call a special party room meeting for 9am on Monday to consider the spill motion."
>>> test_sent_pos = nltk.pos_tag(nltk.word_tokenize(test_sent))
>>> rule_vp = ChunkRule(r'(<VB.*>)?(<VB.*>)+(<PRP>)?', 'Chunk VPs')
>>> chunk_parser = RegexpChunkParser([rule_vp], chunk_label='VP')
>>> print(chunk_parser.parse(test_sent_pos))
