nlp unit 3 part A pdf
UNIT-III
PART-A
Prepared by P V Komali, CSE(DS) Dept, PEC
Syntax Analysis:
• Syntax analysis in natural language processing (NLP) refers to the
process of identifying the structure of a sentence and its
component parts, such as phrases and clauses, based on the rules
of the language's syntax.
• Statistical parsing
Step 1: Tokenization
• The first step is to break the sentence down into its individual
words, or tokens:
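The tokenization step can be sketched in a few lines; this minimal version uses a regular expression to pull out word tokens and lowercase them (punctuation is simply dropped for illustration):

```python
import re

# A minimal sketch of tokenization: split a sentence into word tokens.
def tokenize(sentence):
    # \w+ matches runs of letters/digits; punctuation is dropped for simplicity
    return re.findall(r"\w+", sentence.lower())

print(tokenize("The cat sat on the mat."))
# → ['the', 'cat', 'sat', 'on', 'the', 'mat']
```

Real NLP pipelines use more careful tokenizers (handling contractions, hyphens, and so on), but this captures the idea.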
1. Natural Intonation
2. Summarization
3. Paraphrasing of text
Natural Intonation:
• Machine Translation
• Here's an example of a parse tree for the sentence "The cat sat
on the mat":
• This parse tree shows that the sentence is composed of a noun
phrase ("the cat") and a verb phrase ("sat on the mat"). The noun
phrase consists of a determiner ("the") and a noun ("cat"); the
verb phrase consists of a verb ("sat") and a prepositional phrase
("on the mat"), which in turn consists of a preposition ("on")
and a noun phrase ("the mat").
• These parsers work by identifying patterns in the treebank data
and using these patterns to make predictions about the structure
of new sentences.
Two common representations of syntactic structure:
1. Phrase Structure (Constituency) Graph
2. Dependency Graph
Constituency-Based (Phrase Structure) Representations:
(S (NP (Det the) (N cat)) (VP (V chased) (NP (Det the) (N mouse))))
How to construct ?
• S ----- > NP VP
• NP -----> Det N
• VP -----> V NP
• Top-Down Representation of Phrase Structure Graph
• Bottom-Up Representation of Phrase Structure Graph
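As a sketch, the phrase-structure tree these rules produce for "the cat chased the mouse" can be encoded as nested tuples (this encoding is illustrative, not a standard format); reading the leaves left to right recovers the original sentence:

```python
# Nested-tuple encoding of the phrase-structure tree for
# "the cat chased the mouse", following S -> NP VP, NP -> Det N, VP -> V NP.
tree = ("S",
        ("NP", ("Det", "the"), ("N", "cat")),
        ("VP", ("V", "chased"),
               ("NP", ("Det", "the"), ("N", "mouse"))))

def leaves(node):
    """Collect the words at the fringe of the tree (left to right)."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]  # pre-terminal node: return its word
    return [w for child in children for w in leaves(child)]

print(" ".join(leaves(tree)))  # → the cat chased the mouse
```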
Dependency-Based Representations:
• Dependency-based representations represent the structure of a
sentence as a directed graph, with each word in the sentence
represented as a node in the graph, and the relationships
between the words represented as directed edges.
• Identify the head nodes and dependent nodes and mark the
syntactic relationship between them using directed edges.
chased
├── cat (subject)
│   └── the (determiner)
└── mouse (object)
    └── the (determiner)
• Here's another example of a dependency-based representation
of the sentence "The cat sat on the mat":
• The nodes in the graph are labeled with the part of speech of the
corresponding word, and the edges are labeled with the
grammatical relationship between the two words.
• Here's an example of a dependency graph for the sentence
"The cat sat on the mat":
• In this graph, the word "cat" depends on the word "sat" with a
subject relationship, and the word "mat" depends on the word
"on" with a prepositional relationship.
• Dependency graphs are useful for a variety of NLP tasks,
including named entity recognition, relation extraction, and
sentiment analysis.
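The dependency graph described above can be sketched as a list of directed edges; the relation labels here are illustrative, not drawn from a specific treebank scheme:

```python
# A sketch of the dependency graph for "The cat sat on the mat".
words = ["The", "cat", "sat", "on", "the", "mat"]

# (head_index, dependent_index, relation); "sat" (index 2) is the root.
edges = [
    (2, 1, "subject"),       # cat depends on sat
    (1, 0, "determiner"),    # The depends on cat
    (2, 3, "preposition"),   # on  depends on sat
    (3, 5, "object"),        # mat depends on on
    (5, 4, "determiner"),    # the depends on mat
]

for head, dep, rel in edges:
    print(f"{words[dep]} --{rel}--> {words[head]}")
```

Indexing words by position keeps the two occurrences of "the" distinct, which a plain word-keyed mapping could not do.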
• Shift-Reduce Parsing
• Chart parsing
• Regex parsing
Recursive descent parsing
• The algorithm selects the first production rule that matches the
current input, and recursively applies it to its right-hand side
symbols.
• It is a top down parser and reads the input from left to right.
• It works by recursively descending through the input
string, checking each token against a set of production rules
defined by the grammar of the language.
• This process continues until the entire input string has been
successfully parsed or until an error is encountered.
Example: Consider the following context-free grammar for arithmetic
expressions:
• E -> E + T | E - T | T
• T -> T * F | T / F | F
• F -> ( E ) | num
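A minimal recursive-descent sketch for this grammar follows. Note that E -> E + T and T -> T * F are left-recursive, which a top-down parser cannot apply directly, so each is rewritten here as an equivalent loop (E -> T (('+'|'-') T)*, T -> F (('*'|'/') F)*); the parser evaluates the expression to show that the parse succeeded:

```python
import re

def tokenize(s):
    # Numbers and the operators/parentheses of the grammar
    return re.findall(r"\d+|[-+*/()]", s)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, tok):
        assert self.peek() == tok, f"expected {tok}, got {self.peek()}"
        self.pos += 1

    def parse_E(self):           # E -> T (('+'|'-') T)*
        value = self.parse_T()
        while self.peek() in ("+", "-"):
            op = self.peek(); self.eat(op)
            rhs = self.parse_T()
            value = value + rhs if op == "+" else value - rhs
        return value

    def parse_T(self):           # T -> F (('*'|'/') F)*
        value = self.parse_F()
        while self.peek() in ("*", "/"):
            op = self.peek(); self.eat(op)
            rhs = self.parse_F()
            value = value * rhs if op == "*" else value / rhs
        return value

    def parse_F(self):           # F -> ( E ) | num
        if self.peek() == "(":
            self.eat("(")
            value = self.parse_E()
            self.eat(")")
            return value
        num = self.peek()
        assert num is not None and num.isdigit(), f"expected number, got {num}"
        self.pos += 1
        return int(num)

print(Parser(tokenize("2 + 3 * (4 - 1)")).parse_E())  # → 11
```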
1. Stack
2. Input Tape
Stack : The shift-reduce parser uses a stack to hold the grammar symbols (words and phrases) recognized so far.
Input Tape : The shift-reduce parser uses an input tape to hold the input string.
• This parser performs two actions, shift and reduce, which is why it is
known as a shift-reduce parser.
• Non-terminals appear on the left-hand side of the production rules,
and the right-hand side contains terminals and/or non-terminals.
• T ----> T+T
• T ----> T-T
• T ----> (T)
• T ----> C
• S ---> NP VP
• NP ---> Det N
• VP ---> V NP
• V ---> Chased
1. Initialization: We start by initializing an empty stack and an
input buffer with the sentence tokens "the", "cat", "chased", "the",
and "mouse".
2. Shifting: We shift the first token "the" onto the stack and the
first token "cat" into the lookahead buffer. The stack now contains
only the symbol "the".
3. Shifting again: We shift the next token "cat" onto the stack and
the next token "chased" into the lookahead buffer. The stack now
contains the symbols "the" and "cat"
4. Reduction: We check whether the symbols on top of the stack
match the right-hand side of a production rule from the grammar.
In this case, "the" and "cat" can be reduced to the non-terminal
symbol NP using the production rule NP -> Det N. We pop "the"
and "cat" from the stack and push the non-terminal symbol NP
onto the stack.
5. Shifting and reducing: We shift "chased" onto the stack and
reduce it to V, then shift the next token "the" onto the stack,
with "mouse" now in the lookahead buffer. The stack contains the
symbols NP, V, and "the".
6. Final reductions: We shift "mouse" and reduce it to N, reduce
"the" and N to NP (NP -> Det N), reduce V and NP to VP
(VP -> V NP), and finally reduce NP and VP to S using the
production rule S -> NP VP. The stack now contains only the start
symbol S, so the sentence has been successfully parsed.
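The walkthrough above can be sketched in code. This greedy version always reduces whenever the top of the stack matches a rule's right-hand side, which is enough for this small grammar; real shift-reduce parsers use lookahead to decide between shifting and reducing:

```python
# Shift-reduce sketch for "the cat chased the mouse" with the grammar
# S -> NP VP, NP -> Det N, VP -> V NP plus lexical rules.
rules = [
    ("Det", ("the",)),
    ("N", ("cat",)),
    ("N", ("mouse",)),
    ("V", ("chased",)),
    ("NP", ("Det", "N")),
    ("VP", ("V", "NP")),
    ("S", ("NP", "VP")),
]

def shift_reduce(tokens):
    stack = []
    buffer = list(tokens)
    while True:
        # Reduce: replace the topmost matching right-hand side with its LHS,
        # repeating until no rule applies.
        reduced = True
        while reduced:
            reduced = False
            for lhs, rhs in rules:
                n = len(rhs)
                if tuple(stack[-n:]) == rhs:
                    stack[-n:] = [lhs]
                    reduced = True
                    break
        if not buffer:
            break
        stack.append(buffer.pop(0))  # shift the next token onto the stack
    return stack

print(shift_reduce("the cat chased the mouse".split()))  # → ['S']
```

A final stack holding only the start symbol S means the parse succeeded.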
• It is a dynamic programming based parser.
• S ----> NP VP
• NP ----> DET N
• VP ----> NP PP
• VP ----> V NP
• VP ----> V NP PP
• PP ----> P NP
• The ---> Det
• cat ---> Noun
• chased ---> Verb
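The grammar and lexical assignments above can drive a minimal CKY-style chart-parsing sketch. Only the binary rules (S -> NP VP, NP -> DET N, VP -> V NP) are used here; each chart cell stores every symbol that can span a range of words and is filled exactly once, which is the dynamic-programming idea behind chart parsing:

```python
# CKY-style chart sketch for "the cat chased the mouse".
lexicon = {"the": "DET", "cat": "N", "chased": "V", "mouse": "N"}
binary = {("NP", "VP"): "S", ("DET", "N"): "NP", ("V", "NP"): "VP"}

def cky(words):
    n = len(words)
    chart = {}
    # Length-1 spans come straight from the lexicon.
    for i, w in enumerate(words):
        chart[(i, i + 1)] = {lexicon[w]}
    # Longer spans combine two already-filled sub-spans.
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            cell = set()
            for k in range(i + 1, j):
                for left in chart.get((i, k), ()):
                    for right in chart.get((k, j), ()):
                        if (left, right) in binary:
                            cell.add(binary[(left, right)])
            chart[(i, j)] = cell
    return chart

chart = cky("the cat chased the mouse".split())
print("S" in chart[(0, 5)])  # → True: the whole span parses as a sentence
```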
Minimum Spanning Tree :
• Dependency parsing is a type of syntactic parsing that represents
the grammatical structure of a sentence as a directed acyclic
graph (DAG). The nodes of the graph represent the words of the
sentence, and the edges represent the syntactic relationships
between the words.
• We can use a MST algorithm to find the most likely parse for
this graph. One popular algorithm for this is the Chu-
Liu/Edmonds algorithm:
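The greedy first step of the Chu-Liu/Edmonds algorithm can be sketched as follows: every word picks its highest-scoring incoming edge (its best head). If the resulting graph is acyclic, it is already the maximum spanning tree; otherwise the full algorithm contracts each cycle and repeats (cycle contraction is omitted in this sketch). The edge scores here are made up purely for illustration:

```python
words = ["The", "cat", "sat", "on", "the", "mat"]
scores = {  # (head, dependent): score — illustrative values only
    ("ROOT", "sat"): 10, ("sat", "cat"): 8, ("cat", "The"): 6,
    ("sat", "on"): 7, ("on", "mat"): 9, ("mat", "the"): 5,
    ("cat", "sat"): 2, ("mat", "on"): 3,  # lower-scoring competitors
}

# Greedy step: each word takes its highest-scoring incoming edge.
head = {}
for w in words:
    candidates = {h: s for (h, d), s in scores.items() if d == w}
    head[w] = max(candidates, key=candidates.get)

def is_acyclic(head):
    # Follow head pointers from every word; revisiting a node means a cycle.
    for start in head:
        seen, node = set(), start
        while node != "ROOT":
            if node in seen:
                return False
            seen.add(node)
            node = head[node]
    return True

print(head)
print(is_acyclic(head))  # → True
```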