
NATURAL LANGUAGE PROCESSING

UNIT-III
PART-A

Prepared by P V Komali, CSE(DS) Dept, PEC
Syntax Analysis:
• Syntax analysis in natural language processing (NLP) refers to the
process of identifying the structure of a sentence and its
component parts, such as phrases and clauses, based on the rules
of the language's syntax.

• There are several approaches to syntax analysis in NLP, including:

1. Part-of-speech (POS) tagging: This involves identifying the syntactic category of each word in a sentence, such as noun, verb, adjective, etc. This can be done using machine learning algorithms trained on annotated corpora of text.
2. Dependency parsing: This involves identifying the relationships
between words in a sentence, such as subject-verb or object-verb
relationships. This can be done using a dependency parser, which
generates a parse tree that represents the relationships between
words.

3. Constituency parsing: This involves identifying the constituent parts of a sentence, such as phrases and clauses. This can be done using a phrase-structure parser, which generates a parse tree that represents the structure of the sentence.
• Syntax analysis is important for many NLP tasks, such as named
entity recognition, sentiment analysis, and machine translation.
By understanding the syntactic structure of a sentence, NLP
systems can better identify the relationships between words and
the overall structure of the text, which can be used to extract
meaning and perform various downstream tasks.

Example :

• sentence 1 : Hyderabad is the capital of Telangana.

• sentence 2 : Is Telangana the Hyderabad of capital.


• Here the two sentences have similar words.

• Every word in sentence-1 is also present in sentence-2, but that does not mean the two sentences convey a similar meaning.

• Sentence-1 is correct in its structure: it is formed by following a particular structure and conveys some meaning, so it is considered a meaningful sentence.

• Sentence-2 does not follow a particular structure and does not convey any meaning.

• Even though it is formed from the same words, it does not follow any structure, so it is not meaningful.

• The structure is very important in the formation of a sentence, and that structure is termed the syntax.

• To analyse the whole structure of a particular sentence in a document, we use syntax analysis.

• Syntax analysis is the second phase of NLP.


• In a simple way, syntax analysis is also referred to as syntactic analysis or parsing.

• Parsing uncovers the hidden structure of the input text.

• Parsing is nothing but dividing the sentence into smaller units and tagging the POS.

• The syntactic analysis of language provides a means to explicitly discover the various predicate-argument dependencies that may exist in a sentence.
Example of predicate argument structure

• Predicate - It is a state or action described by a verb

• Argument - It indicates who or what is involved in the state or action.

• Eg: The elephant is sleeping

• In the above example the predicate is nothing but the verb in the sentence. This verb describes a state or action.

• In other words, the central action of a particular sentence is the verb, and that verb is also called the predicate.
• In the above example The elephant is considered as an Argument.

• Argument is also termed as object or entity or noun.

• Finally, syntax analysis is used to find out the predicate-argument dependencies in a particular sentence.

• Finding the predicate-argument dependencies in a sentence helps to find the relationships between the words in that sentence.
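The idea above can be made concrete with a minimal sketch. It assumes hand-written dependency triples of the form (head, relation, dependent) rather than output from a real parser, and the relation names are illustrative:

```python
# Minimal sketch: group arguments under the predicate (verb) that governs
# them. The triples below are hand-written for the example sentence
# "The elephant is sleeping", not produced by a real parser.

def predicate_arguments(triples):
    """Collect subject/object arguments under each predicate."""
    structure = {}
    for head, relation, dependent in triples:
        if relation in ("subject", "object"):
            structure.setdefault(head, {})[relation] = dependent
    return structure

triples = [
    ("sleeping", "subject", "elephant"),
    ("sleeping", "aux", "is"),
]

print(predicate_arguments(triples))
# {'sleeping': {'subject': 'elephant'}}
```

A real system would obtain the triples from a dependency parser; the grouping step stays the same.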
Parsing Natural Language
• In natural language processing (NLP), syntax analysis, also
known as parsing, refers to the process of analyzing the
grammatical structure of a sentence in order to determine its
constituent parts, their relationships to each other, and their
functions within the sentence.

• This involves breaking down the sentence into its individual


components, such as nouns, verbs, adjectives, and phrases, and
then analyzing how these components are related to each other.
• There are two main approaches to syntax analysis in NLP:

1. rule-based parsing and

2. statistical parsing.

• Rule-based parsing involves the use of a set of pre-defined rules


that dictate how the different parts of speech and phrases in a
sentence should be structured and related to each other.

• Statistical parsing, on the other hand, uses machine learning algorithms to learn patterns and relationships in large corpora of text in order to generate parse trees for new sentences.
• Here's an example of how syntax analysis works using a simple
sentence:

Sentence: "The cat sat on the mat."

Step 1: Tokenization

• The first step is to break the sentence down into its individual
words, or tokens:

• "The", "cat", "sat", "on", "the", "mat", "."

Step 2: Part of Speech Tagging

• Next, each token is assigned a part of speech tag, which


indicates its grammatical function in the sentence:
• "The" (determiner), "cat" (noun), "sat" (verb), "on" (preposition), "the" (determiner), "mat" (noun), "." (punctuation)

Step 3: Dependency Parsing

• Finally, the relationships between the words in the sentence are analyzed using a dependency parser to create a parse tree. In this example, the parse tree shows that "cat" is the subject of the verb "sat," and "mat" is the object of the preposition "on."
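Steps 1 and 2 above can be sketched in a few lines. This is a toy illustration, not a real tagger: the hard-coded lexicon and tag names are assumptions for this one sentence, and step 3 would require an actual dependency parser:

```python
# Toy sketch of tokenization and POS tagging for "The cat sat on the mat."
# The LEXICON is a hand-written assumption covering only this sentence.

def tokenize(sentence):
    # Split off final punctuation, then split on whitespace.
    if sentence and sentence[-1] in ".!?":
        sentence = sentence[:-1] + " " + sentence[-1]
    return sentence.split()

LEXICON = {
    "the": "determiner", "cat": "noun", "sat": "verb",
    "on": "preposition", "mat": "noun", ".": "punctuation",
}

def pos_tag(tokens):
    return [(t, LEXICON.get(t.lower(), "unknown")) for t in tokens]

tokens = tokenize("The cat sat on the mat.")
print(tokens)           # ['The', 'cat', 'sat', 'on', 'the', 'mat', '.']
print(pos_tag(tokens))  # [('The', 'determiner'), ('cat', 'noun'), ...]
```

Real systems replace the lookup table with a statistical model trained on annotated corpora, but the pipeline shape is the same.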

• Syntax analysis is a crucial component of many NLP tasks,


including machine translation, text-to-speech conversion, and
sentiment analysis.

• By understanding the grammatical structure of a sentence,


NLP models can more accurately interpret its meaning and
generate appropriate responses or translations.
• Before discussing parsing approaches, we have to know the motivations behind parsing.

• The motivations behind parsing natural language are:

1. Natural Intonation

2. Summarization

3. Paraphrasing of text
Natural Intonation:

• Parsing can provide a structural description that identifies breaks in sentences.

Example :

(i) ‘They are going to the college’.

– The pitch may generally fall towards the end of sentence.

(ii) ‘Are they going to the college?’

– The pitch may rise towards the end of the sentence.


Summarization:

• Parsing is done to convert a large text into a smaller text.

• It can reduce the number of words, but the meaning remains the same.

Example :

Original : Beyond the basic level, the operations of the products vary widely.

Summary : The operations of products vary.


Paraphrasing of text:

• Paraphrasing replaces words/phrases with other phrases without changing the meaning of the sentence.

Example :

sentence 1 : Open borders imply increasing racial fragmentation in European countries.

sentence 2 : Open borders imply increasing racial fragmentation in the countries of Europe / European states / Europe / European nations / the European countries.
Applications of Syntactic parsers :

• Machine Translation

• Error Correction in text

• Speech Recognition system etc.


Treebanks: A Data-Driven Approach to Syntax:
• Treebanks are a data-driven approach to syntax analysis in
natural language processing (NLP). They consist of a large
collection of sentences, each of which has been manually
annotated with a parse tree that shows the syntactic structure
of the sentence.

• Treebanks are used to train statistical parsers, which can then


automatically analyze new sentences and generate their own
parse trees.
• A treebank is a corpus of text annotated with syntactic and grammatical structures.

• They are represented in the form of trees.

• Each sentence is parsed and broken down into its constituent parts, creating a hierarchical tree that illustrates the relationship between words and phrases.

• Each node in the tree represents a word and the branches


between nodes indicate syntactic relationships such as
subject-verb connections or dependencies.
• A parse tree is a hierarchical structure that represents the
syntactic structure of a sentence.

• Each node in the tree represents a constituent of the sentence,


such as a noun phrase or a verb phrase.

• The edges of the tree represent the relationships between these


constituents, such as subject-verb or verb-object relationships.

• For example, a parse tree for the sentence "The cat sat on the mat" shows that the sentence is composed of a noun phrase ("The cat") and a verb phrase ("sat on the mat"), with the verb phrase consisting of a verb ("sat") and a prepositional phrase ("on the mat"). The noun phrase consists of a determiner ("The") and a noun ("cat"), and the prepositional phrase consists of a preposition ("on") and a noun phrase ("the mat").
• These parsers work by identifying patterns in the treebank data
and using these patterns to make predictions about the structure
of new sentences.

• For example, a statistical parser might learn that a noun phrase


is usually followed by a verb phrase and use this pattern to
generate a parse tree for a new sentence.

• Treebanks are an important resource in NLP, as they allow


researchers and developers to train and test statistical parsers
and other models that rely on syntactic analysis.
• Some well-known treebanks include the Penn Treebank and the
Universal Dependencies treebanks.

• These resources are publicly available and have been used in a


wide range of NLP research and applications.
Representation of Syntactic Structure:
• In natural language processing (NLP), the representation of
syntactic structure refers to how the structure of a sentence is
represented in a machine-readable form.

• There are several different ways to represent syntactic


structure, including constituency-based representations and
dependency-based representations.

• There are 2 main approaches

1. Phrase Structure Graph

2. Dependency Graph
Constituency-Based Representations (Phrase Structure):

• Constituency-based representations, also known as phrase structure trees, represent the structure of a sentence as a hierarchical tree structure, with each node in the tree representing a constituent of the sentence.

• The nodes are labeled with a grammatical category such as noun phrase (NP) or verb phrase (VP), and the branches represent the syntactic relationships between the nodes. Constituency-based representations are often used in rule-based approaches to parsing.
• Here's an example of a constituency-based representation of the sentence
"The cat sat on the mat":

(S
  (NP (DT The) (NN cat))
  (VP (VBD sat)
    (PP (IN on)
      (NP (DT the) (NN mat)))))

• This representation shows that the sentence is composed of a noun phrase ("The cat") and a verb phrase ("sat on the mat"), with the verb phrase consisting of a verb ("sat") and a prepositional phrase ("on the mat"), and the prepositional phrase consisting of a preposition ("on") and a noun phrase ("the mat").
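The bracketed notation above is machine-readable, which is exactly why treebanks use it. A minimal sketch of a reader that turns such a string into nested Python lists (constituent label first, then children):

```python
# Parse a Penn-Treebank-style bracketed string into nested lists.
# This is a simplified reader, not the format's full specification.

def read_tree(s):
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        node = [tokens[pos]]          # constituent label, e.g. 'NP'
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(parse())  # nested constituent
            else:
                node.append(tokens[pos])  # a word
                pos += 1
        pos += 1                      # skip ')'
        return node

    return parse()

tree = read_tree(
    "(S (NP (DT The) (NN cat)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))"
)
print(tree[0])   # 'S'
print(tree[1])   # ['NP', ['DT', 'The'], ['NN', 'cat']]
```

Libraries such as NLTK ship a full reader for this format; the sketch only shows the recursive idea behind it.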
Phrase Structure Graph

• A phrase structure graph is also known as a constituency graph.

• It was introduced by Noam Chomsky.

• Each node in the constituent tree represents a phrase, and the edges represent the hierarchical relationships between these phrases.

• Phrases can include constituents such as noun phrases (NP), verb phrases (VP), prepositional phrases (PP), etc.

• Nodes ------> phrases

• Edges ------> relationships

Example :

sentence : “The cat chased the mouse”

Nodes ------> The cat, chased, the mouse

How to construct?

1. Be aware of the grammar rules.

2. Tag the sentence with the phrases.

3. Build the hierarchical representation (top-down or bottom-up).

• S -----> NP VP

• NP -----> Det N

• VP -----> V NP
• Top-Down Representation of Phrase Structure Graph
• Bottom-Up Representation of Phrase Structure Graph
Dependency-Based Representations:
• Dependency-based representations represent the structure of a
sentence as a directed graph, with each word in the sentence
represented as a node in the graph, and the relationships
between the words represented as directed edges.

• The edges are labeled with a grammatical function such as


subject (SUBJ) or object (OBJ), and the nodes are labeled with
a part-of-speech tag such as noun (N) or verb (V).
Dependency-based representations are often used in statistical
approaches to parsing.
• It was introduced by Lucien Tesnière.

• In a dependency graph, words in a sentence are represented as nodes, and the syntactic relationships between these words are represented as directed edges.

• Each edge in the dependency graph indicates a dependency relationship between a governor word (head) and a dependent word (child).

• Dependency graphs are typically rooted at a single node representing the main verb of the sentence.
Example:

Sentence : The cat chased the mouse

Nodes : The, cat, chased, the, mouse

How to construct the dependency graph?

• Identify the main verb in the sentence.

• Identify the head nodes and dependent nodes and mark the syntactic relationship between them using directed edges.

chased
├── cat
│   └── The
└── mouse
    └── the
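The graph above can be stored as a plain dictionary mapping each dependent to its head, with the root verb marked specially. The relation labels below are illustrative assumptions:

```python
# Dependency graph for "The cat chased the mouse" as a dict:
# dependent -> (head, relation); the root verb has head None.

DEPENDENCIES = {
    "cat":    ("chased", "subject"),
    "mouse":  ("chased", "object"),
    "The":    ("cat", "determiner"),
    "the":    ("mouse", "determiner"),
    "chased": (None, "root"),
}

def path_to_root(word):
    """Follow head links from a word up to the root verb."""
    path = [word]
    head, _ = DEPENDENCIES[word]
    while head is not None:
        path.append(head)
        head, _ = DEPENDENCIES[head]
    return path

print(path_to_root("The"))   # ['The', 'cat', 'chased']
```

Because every word has exactly one head, walking head links always terminates at the root, which is what makes the structure a tree.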
• Here's another example, a dependency-based representation of the sentence "The cat sat on the mat":

• This representation shows that the subject "cat" depends on the verb "sat," and the object "mat" depends on the preposition "on."
Syntax Analysis Using Dependency Graphs:

• Syntax analysis using dependency graphs is a popular approach


in natural language processing (NLP). Dependency graphs
represent the syntactic structure of a sentence as a directed graph,
where each word is a node in the graph and the relationships
between words are represented as directed edges.

• The nodes in the graph are labeled with the part of speech of the
corresponding word, and the edges are labeled with the
grammatical relationship between the two words.
• For example, in a dependency graph for the sentence "The cat sat on the mat", the word "cat" depends on the word "sat" with a subject relationship, and the word "mat" depends on the word "on" with a prepositional relationship.
• Dependency graphs are useful for a variety of NLP tasks,
including named entity recognition, relation extraction, and
sentiment analysis.

• They can also be used for parsing and syntactic analysis, as


they provide a compact and expressive way to represent the
structure of a sentence.

• One advantage of dependency graphs is that they are simpler


and more efficient than phrase structure trees, which can be
computationally expensive to build and manipulate.
• Dependency graphs also provide a more flexible
representation of syntactic structure, as they can easily capture
non-projective dependencies and other complex relationships
between words.
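The non-projectivity mentioned above can be checked mechanically: an arc set is projective exactly when no two arcs cross. A sketch, with arcs given as (head position, dependent position) pairs over word indices:

```python
# Projectivity check: two dependency arcs cross when exactly one endpoint
# of one arc lies strictly inside the span of the other.

def is_projective(arcs):
    spans = [tuple(sorted(a)) for a in arcs]
    for i in range(len(spans)):
        for j in range(i + 1, len(spans)):
            (a, b), (c, d) = spans[i], spans[j]
            if a < c < b < d or c < a < d < b:   # crossing spans
                return False
    return True

# "The cat chased the mouse": 0=The 1=cat 2=chased 3=the 4=mouse
projective_arcs = [(2, 1), (2, 4), (1, 0), (4, 3)]
print(is_projective(projective_arcs))      # True
print(is_projective([(0, 2), (1, 3)]))     # False: the arcs cross
```

English is mostly projective; languages with freer word order produce crossing arcs, which is precisely what dependency graphs can represent and phrase structure trees cannot capture directly.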

• Here's another example of a dependency graph for the


sentence "I saw the man with the telescope":
• This graph shows that the subject "I" depends on the verb "saw," and that the noun phrase "the man" depends on the verb "saw" with an object relationship. The prepositional phrase "with the telescope" modifies the noun phrase "the man," with the word "telescope" being the object of the preposition "with."
• In summary, dependency graphs provide a flexible and
efficient way to represent the syntactic structure of a sentence
in NLP. They can be used for a variety of tasks and are a key
component of many state-of-the-art NLP models.

Syntax Analysis Using Phrase Structure Trees:

• Syntax analysis, also known as parsing, is the process of


analyzing the grammatical structure of a sentence to identify
its constituent parts and the relationships between them.

• In natural language processing (NLP), phrase structure trees


are often used to represent the syntactic structure of a sentence.
• A phrase structure tree, also known as a parse tree or a syntax tree,
is a graphical representation of the syntactic structure of a sentence.

• It consists of a hierarchical structure of nodes, where each node


represents a phrase or a constituent of the sentence.

• Here's an example of a phrase structure tree for the sentence "The


cat sat on the mat":
• In this tree, the top-level node represents the entire sentence (S),
which is divided into two subparts: the noun phrase (NP) "The
cat" and the verb phrase (VP) "sat on the mat". The NP is further
divided into a determiner (Det) "The" and a noun (N) "cat".

• The VP is composed of a verb (V) "sat" and a prepositional


phrase (PP) "on the mat",which itself consists of a preposition (P)
"on" and another noun phrase (NP) "the mat".

• Here's another example, a phrase structure tree for the sentence "John saw the man with the telescope":

• In this tree, the top-level node represents the entire sentence (S), which is divided into a noun phrase (NP) "John" and a verb phrase (VP) "saw the man with the telescope".

• The NP is simply a single noun (N) "John". The VP is composed of a verb (V) "saw" and a noun phrase (NP) "the man with the telescope".

• The latter is composed of a determiner (Det) "the" and a noun (N) "man", which is modified by a prepositional phrase (PP) "with the telescope", consisting of a preposition (P) "with" and a noun phrase (NP) "the telescope".
• Phrase structure trees can be used in NLP for a variety of tasks,
such as machine translation, text-to-speech synthesis, and
natural language understanding. By identifying the syntactic
structure of a sentence, computers can more accurately
understand its meaning and generate appropriate responses.
Parsing Algorithms:

• There are several algorithms used in natural language processing


(NLP) for syntax analysis or parsing, each with its own strengths
and weaknesses.

• Parsing algorithms help in analyzing the syntactic structure of natural language sentences.

• Parsing algorithms help in understanding and applying grammatical rules, allowing systems to generate grammatically correct sentences.
• Here are some common parsing algorithms:

• Recursive descent parsing

• Shift-reduce parsing

• Chart parsing

• Regex parsing
Recursive descent parsing

• This is a top-down parsing algorithm that starts with the top-level


symbol (usually the sentence) and recursively applies production
rules to derive the structure of the sentence.

• Each production rule corresponds to a non-terminal symbol in the


grammar, which can be expanded into a sequence of other symbols.

• The algorithm selects the first production rule that matches the
current input, and recursively applies it to its right-hand side
symbols.

• This process continues until a match is found for every terminal


symbol in the input.
• A recursive descent parser is a type of parsing technique commonly used to analyze the syntax of a language.

• It is a top-down parser and reads the input from left to right.

• It works by recursively descending through the input string, checking each token against a set of production rules defined by the grammar of the language.

• This process continues until the entire input string has been successfully parsed or until an error is encountered.
Example: Consider the following context-free grammar for arithmetic
expressions:

• E -> E + T | E - T | T

• T -> T * F | T / F | F

• F -> ( E ) | num

• Suppose we want to parse the expression "3 + 4 * (5 - 2)" using recursive descent parsing. The algorithm would start with the top-level symbol E and apply the production rule E -> E + T. It would then recursively apply the production rules for E, T, and F until it reaches the terminals "3", "+", "4", "*", "(", "5", "-", "2", and ")". The resulting parse tree would look like this:
E
├─ E
│  └─ T
│     └─ F
│        └─ num(3)
├─ +
└─ T
   ├─ T
   │  └─ F
   │     └─ num(4)
   ├─ *
   └─ F
      ├─ (
      ├─ E
      │  ├─ E
      │  │  └─ T
      │  │     └─ F
      │  │        └─ num(5)
      │  ├─ -
      │  └─ T
      │     └─ F
      │        └─ num(2)
      └─ )
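A runnable sketch of recursive descent for this grammar follows. One caveat the example glosses over: the rule E -> E + T is left-recursive, and a naive recursive-descent parser would recurse forever on it. The standard fix, used below, is to rewrite it as E -> T { (+|-) T } with a loop. The sketch evaluates the expression as it parses:

```python
# Recursive descent parser/evaluator for:
#   E -> T { (+|-) T }     (left-recursion of E -> E + T eliminated)
#   T -> F { (*|/) F }
#   F -> ( E ) | num
import re

def parse(expr):
    tokens = re.findall(r"\d+|[()+\-*/]", expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(tok):
        nonlocal pos
        assert peek() == tok, f"expected {tok!r}, got {peek()!r}"
        pos += 1

    def expr_():                      # E -> T { (+|-) T }
        value = term()
        while peek() in ("+", "-"):
            op = peek(); eat(op)
            value = value + term() if op == "+" else value - term()
        return value

    def term():                       # T -> F { (*|/) F }
        value = factor()
        while peek() in ("*", "/"):
            op = peek(); eat(op)
            value = value * factor() if op == "*" else value / factor()
        return value

    def factor():                     # F -> ( E ) | num
        if peek() == "(":
            eat("(")
            value = expr_()
            eat(")")
            return value
        value = int(peek())
        eat(peek())
        return value

    return expr_()

print(parse("3 + 4 * (5 - 2)"))   # 15
```

Each grammar nonterminal becomes one function, which is the defining trait of recursive descent; the call stack mirrors the parse tree shown above.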
Shift Reduce Parsing :
• This is a bottom-up parsing algorithm that starts with the input
tokens and constructs a parse tree by repeatedly shifting a
token onto a stack and reducing a group of symbols on the
stack to a single symbol based on the production rules.

• The algorithm maintains a parse table that specifies which


actions to take based on the current state and the next input
symbol.

• It works by incrementally reducing a stack of symbols to a


single non-terminal symbol that matches a production rule.
• A shift-reduce parser is used to reduce a string to the start symbol of a given grammar.

• It uses two components:

1. Stack

2. Input Tape

Stack : The shift-reduce parser uses a stack to hold/store the grammar symbols.

Input Tape : The shift-reduce parser uses an input tape to hold the string.

• This parser performs two actions, shift and reduce; that is why it is known as a shift-reduce parser.

• At the shift action, the current symbol in the input string is pushed onto the stack.
• At the reduce action, the symbols on top of the stack are replaced by a non-terminal.

• Non-terminals are on the left side of the production rules, and the symbols are on the right side of the production rules.

Example : Production rules

• T ----> T+T

• T ----> T-T

• T ----> (T)

• T ----> C

• Consider the string : C1-(C2+C3)

• Here is another example of how shift-reduce parsing can be used to parse the sentence "the cat chased the mouse" using a simple grammar:

• S ---> NP VP

• NP ---> Det N

• VP ---> V NP

• Det ---> the

• N ---> cat | mouse

• V ---> chased
1. Initialization: We start by initializing an empty stack and an
input buffer with the sentence tokens "the", "cat", "chased", "the",
and "mouse". We also initialize a parse chart, which is a table used
to keep track of all the possible partial parses of the sentence.

2. Shifting: We shift the first token "the" onto the stack and the next token "cat" into the lookahead buffer. The stack now contains only the symbol "the".

3. Shifting again: We shift the next token "cat" onto the stack and
the next token "chased" into the lookahead buffer. The stack now
contains the symbols "the" and "cat"
4. Reduction: We check if the top of the stack and the next
token in the lookahead buffer can be reduced to a non-terminal
symbol using a production rule from the grammar. In this case,
"the" and "cat" can be reduced to the non-terminal symbol NP
using the production rule NP -> Det N. We pop "the" and "cat"
from the stack and push the non-terminal symbol NP onto the
stack.

5. Shifting again: We shift the next token "chased" onto the stack and the next token "the" into the lookahead buffer. The stack now contains the symbols NP and "chased". No reduction applies yet, because the rule VP -> V NP needs an object NP to the right of the verb.

6. Shifting again: We shift the next token "the" onto the stack and the final token "mouse" into the lookahead buffer. The stack now contains the symbols NP, "chased", and "the".

7. Shifting again: We shift the final token "mouse" onto the stack. The stack now contains the symbols NP, "chased", "the", and "mouse".

8. Reduction: "the" and "mouse" on top of the stack can be reduced to the non-terminal symbol NP using the production rule NP -> Det N. We pop them from the stack and push NP. The stack now contains NP, "chased", and NP.

9. Reduction again: "chased" and NP can be reduced to the non-terminal symbol VP using the production rule VP -> V NP. We pop them from the stack and push VP. The stack now contains NP and VP.

10. Completion: NP and VP reduce to S using the production rule S -> NP VP. The stack now contains only the symbol S, which means the input sentence has been accepted. We can also look at the parse chart to see all the partial parses considered during the parsing process.
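The shift-reduce procedure can be run as code. This sketch tags each word as it is shifted (the lexicon is the grammar's terminal rules) and reduces greedily over the top of the stack, which is sufficient for this small unambiguous grammar:

```python
# Toy shift-reduce parser for:
#   S -> NP VP,  NP -> Det N,  VP -> V NP
# RULES maps a right-hand side (tuple of stack symbols) to its non-terminal.

LEXICON = {"the": "Det", "cat": "N", "chased": "V", "mouse": "N"}
RULES = {("Det", "N"): "NP", ("V", "NP"): "VP", ("NP", "VP"): "S"}

def shift_reduce(words):
    stack, trace = [], []
    for word in words:
        stack.append(LEXICON[word])            # shift (and tag) the next token
        trace.append(("shift", word, tuple(stack)))
        reduced = True
        while reduced:                         # reduce while any rule applies
            reduced = False
            for rhs, lhs in RULES.items():
                if tuple(stack[-len(rhs):]) == rhs:
                    del stack[-len(rhs):]
                    stack.append(lhs)          # replace RHS with its non-terminal
                    trace.append(("reduce", lhs, tuple(stack)))
                    reduced = True
    return stack, trace

stack, trace = shift_reduce("the cat chased the mouse".split())
print(stack)   # ['S']  -- the sentence is accepted
for action in trace:
    print(action)
```

Printing the trace reproduces the shift and reduce steps walked through above; real shift-reduce parsers add a parse table (or a trained classifier) to decide between shifting and reducing when both are possible.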
Hypergraphs and Chart Parsing:
• Hypergraphs and chart parsing are two related concepts used in
natural language processing (NLP) for syntactic parsing.

• Hypergraphs represent a generalization of traditional parse trees,


allowing for more complex structures and more efficient parsing
algorithms.

• A hypergraph consists of a set of nodes (representing words or


phrases in the input sentence) and a set of hyperedges, which
connect nodes and represent higher-level structures.

• A chart, on the other hand, is a data structure used in chart parsing


to efficiently store and manipulate all possible partial parses of a
sentence.
Chart Parser :

• A chart parser analyzes the syntactic structure of given sentences based on a given grammar.

• It is a dynamic programming parser: it stores partial parses in the chart so that no constituent is parsed twice.

• The parser incrementally builds parse trees for sentences by combining smaller constituents into larger ones according to the rules of the grammar.

• Here is an example of how chart parsing can be used to parse the


sentence "the cat chased the mouse" using a simple grammar:
Grammar Rules :

• S ----> NP VP

• NP ----> DET N

• NP ----> NP PP

• VP ----> V NP

• VP ----> V NP PP

• PP ----> P NP

Example : “The cat chased the mouse”

Tag with part of speech :

The ---> Det, cat ---> noun, chased ---> verb, the ---> Det, mouse ---> noun
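Chart parsing can be sketched with the CYK algorithm, which fills a chart of spans bottom-up. CYK requires binary rules, so this sketch uses only the binary rules from the grammar above plus an assumed lexicon for the example words:

```python
# CYK chart parsing: chart[i][j] holds the set of constituent labels that
# can span words[i:j]. Only binary rules are used, as CYK requires.

LEXICON = {"the": {"Det"}, "cat": {"N"}, "mouse": {"N"}, "chased": {"V"}}
BINARY = {("NP", "VP"): "S", ("Det", "N"): "NP",
          ("V", "NP"): "VP", ("P", "NP"): "PP"}

def cyk(words):
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(LEXICON[w])      # length-1 spans: POS tags
    for length in range(2, n + 1):             # longer spans, bottom-up
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):          # every split point
                for left in chart[i][k]:
                    for right in chart[k][j]:
                        if (left, right) in BINARY:
                            chart[i][j].add(BINARY[(left, right)])
    return chart

words = "the cat chased the mouse".split()
chart = cyk(words)
print(chart[0][len(words)])   # {'S'}  -- a full-sentence parse exists
```

The chart is exactly the table mentioned in the shift-reduce walkthrough: every cell records all constituents found for that span, so ambiguous sentences yield multiple labels per cell instead of forcing the parser to re-derive them.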
Minimum Spanning Tree :
• Dependency parsing is a type of syntactic parsing that represents
the grammatical structure of a sentence as a directed acyclic
graph (DAG). The nodes of the graph represent the words of the
sentence, and the edges represent the syntactic relationships
between the words.

• Minimum spanning tree (MST) algorithms are often used for


dependency parsing, as they provide an efficient way to find the
most likely parse for a sentence given a set of syntactic
dependencies.
• Here's an example of how a MST algorithm can be used for
dependency parsing:

• Consider the sentence "The cat chased the mouse". We can


represent this sentence as a graph with nodes for each word and
edges representing the syntactic dependencies between them:

• We can use an MST algorithm to find the most likely parse for this graph. One popular algorithm for this is the Chu-Liu/Edmonds algorithm, which selects the highest-scoring incoming edge for each word and then repairs any cycles that result.
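The first, greedy step of Chu-Liu/Edmonds can be sketched directly: every word simply picks its highest-scoring head. The full algorithm then contracts any cycle this produces and repeats; with the illustrative scores below (invented for the example, not from a trained model) no cycle appears, so the greedy choice is already the maximum spanning tree:

```python
# Greedy head selection, the first step of Chu-Liu/Edmonds.
# SCORES[(head, dependent)] = plausibility of the arc (higher is better);
# all score values here are made up for illustration.

SCORES = {
    ("ROOT", "chased"): 10,
    ("chased", "cat"): 8,   ("mouse", "cat"): 2,
    ("chased", "mouse"): 7, ("cat", "mouse"): 1,
    ("cat", "The"): 5,      ("chased", "The"): 1,
    ("mouse", "the"): 5,    ("chased", "the"): 1,
}

def greedy_heads(words):
    """For each word, pick the incoming arc with the highest score."""
    heads = {}
    for dep in words:
        candidates = {h: s for (h, d), s in SCORES.items() if d == dep}
        heads[dep] = max(candidates, key=candidates.get)
    return heads

print(greedy_heads(["chased", "cat", "mouse", "The", "the"]))
# {'chased': 'ROOT', 'cat': 'chased', 'mouse': 'chased',
#  'The': 'cat', 'the': 'mouse'}
```

When the greedy choices do form a cycle (e.g. two words each picking the other as head), the full algorithm contracts the cycle into a single node, rescores its incoming and outgoing arcs, and recurses, which is the part this sketch omits.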
