Unit V Natural Language Processing
Natural language processing (NLP) involves processing written text using computer models at the lexical, syntactic, and semantic levels.
Introduction:
1. Understanding (of language) – map text/speech to immediately useful form.
2. Generation (of language)
<Noun_Obj>: {Det-} & Obj-
An object noun is defined as the last word in a sentence. It is connected through the label Obj to the verb on its left and, optionally, through the label Det to a determiner on its left.
<Verb>: Sub- & {Obj+}
A verb is defined as a word connected through the label Sub to the subject on its left and, optionally, through the label Obj to an object on its right.
The parsed sentence “The girl sings a song.” appears as follows:
                  +-------Obj-------+
+--Det--+---Sub---+       +---Det---+
|       |         |       |         |
The:d   girl:n    sings:v a:d       song:n
2. Chart Parser
3. Simple Transition Networks
4. Recursive Transition Networks
5. Augmented Transition Networks
Chart Parser
A chart parser stores the parses of intermediate constituents so that they can be reused while trying alternative parsing paths. A data structure called a chart is maintained to record the state of the bottom-up parse so far. It records the positions of the words and the new structures derived from the sentence. The chart also records rules that have been applied but are not yet complete; these are stored as active arcs on the chart. A chart parser provides an exponential reduction of the search space and makes it possible to cope with ambiguity in parsing. It is compatible with a high degree of flexibility in search and control strategies.
Parsing by Chart Parser
The sentence “The girls saw a man in the park with a cat” is ambiguous under the grammar defined in Table 1, which includes prepositional phrases. Two valid parse structures for it are shown in Figures 1 and 2 below.
Table 1. Sample Grammar
Grammar Rule                     Rule Number
<S>  → <NP> <VP>                 1
<NP> → <Det> <Noun>              2
<NP> → <Det> <Noun> <PP>         3
<VP> → <Verb> <NP>               4
<VP> → <Verb> <NP> <PP>          5
<PP> → <Prep> <NP>               6
[Figures 1 and 2: two parse trees, built from S, NP, VP, and PP nodes, for the sentence “The girls saw a man in the park with a cat”. In both trees the words carry the categories Det Noun Verb Det Noun Prep Det Noun Prep Det Noun.]
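The idea of recording intermediate constituents in a chart can be sketched in Python as follows. This is a simplified illustration, not a full chart parser: it stores only completed constituents, indexed by word span, and does not model active arcs. The grammar is the one in Table 1; the names LEXICON, build_chart, and covers are assumptions made for this sketch.

```python
# Grammar rules from Table 1 as (left-hand side, right-hand side) pairs.
GRAMMAR = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "Noun")),
    ("NP", ("Det", "Noun", "PP")),
    ("VP", ("Verb", "NP")),
    ("VP", ("Verb", "NP", "PP")),
    ("PP", ("Prep", "NP")),
]

# Hypothetical lexicon covering only the example sentence.
LEXICON = {
    "the": "Det", "a": "Det",
    "girls": "Noun", "man": "Noun", "park": "Noun", "cat": "Noun",
    "saw": "Verb", "in": "Prep", "with": "Prep",
}


def build_chart(words):
    """Return a dict mapping (start, end) spans to the categories found there."""
    n = len(words)
    chart = {}
    # Seed the chart with the category of each word.
    for i, word in enumerate(words):
        chart[(i, i + 1)] = {LEXICON[word.lower()]}

    def covers(rhs, start, end):
        # True if the symbols in rhs can be laid end to end over words[start:end],
        # using only constituents already recorded in the chart.
        if not rhs:
            return start == end
        first, rest = rhs[0], rhs[1:]
        return any(
            first in chart.get((start, mid), ()) and covers(rest, mid, end)
            for mid in range(start + 1, end + 1)
        )

    # Bottom-up: shorter spans are completed before longer ones reuse them.
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for lhs, rhs in GRAMMAR:
                if covers(rhs, start, end):
                    chart.setdefault((start, end), set()).add(lhs)
    return chart


sentence = "the girls saw a man in the park with a cat".split()
chart = build_chart(sentence)
print("S" in chart.get((0, len(sentence)), ()))
```

In the resulting chart, NP is recorded over "a man", over "a man in the park", and over "a man in the park with a cat". Each of these is computed once and then reused by the VP rules, which is where the ambiguity of the sentence shows up.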
Transition Network (TN): a convenient representation of a grammar as a network of nodes and labeled arcs.
Parsing:
Parsing starts at the start node and proceeds by traversing arcs. If the current word in the sentence belongs to the category on an arc, the parser moves to the next state in the TN, and the process continues until a Pop arc is reached. If no words are left in the sentence at the Pop arc, the sentence is considered to be correctly parsed.
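[Figure: transition network for S with states s0, s1, s2, and s3, connected in sequence by arcs labeled NP, Verb, and NP, with a Pop arc leaving s3 and a Jump arc.]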
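The traversal described above can be sketched in Python as follows. The network here is a hypothetical one whose arcs are labeled only with word categories (Det, Noun, Verb), so that no sub-network for NP is needed. NETWORK, LEXICON, and accepts are names chosen for this sketch.

```python
# Hypothetical simple transition network: state -> list of (arc label, next state).
# A "Pop" arc has no next state; it ends the traversal.
NETWORK = {
    "s0": [("Det", "s1")],
    "s1": [("Noun", "s2")],
    "s2": [("Verb", "s3")],
    "s3": [("Det", "s4"), ("Pop", None)],
    "s4": [("Noun", "s5")],
    "s5": [("Pop", None)],
}

# Hypothetical lexicon mapping each word to a single category.
LEXICON = {
    "the": "Det", "a": "Det",
    "girl": "Noun", "song": "Noun",
    "sings": "Verb",
}


def accepts(words, state="s0", pos=0):
    """Return True if the words can be consumed along arcs ending in a Pop arc."""
    for label, next_state in NETWORK[state]:
        if label == "Pop":
            # A Pop arc succeeds only when no words are left.
            if pos == len(words):
                return True
        elif pos < len(words) and LEXICON.get(words[pos].lower()) == label:
            # The current word is in the arc's category: move to the next state.
            if accepts(words, next_state, pos + 1):
                return True
    return False


print(accepts("The girl sings a song".split()))
print(accepts("girl the sings".split()))
```

The recursion tries each arc leaving the current state in turn, so a state with several outgoing arcs (such as s3 here) is handled by simple backtracking.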
Universal Words
A UW is made up of a string of Roman characters followed by a list of constraints. For example, dog is represented as dog(icl>animal), where animal is the upper UW. The general definition of a UW (in a BNF-like grammar) is given below:
Universal Word (UW) definition:
<UW> ::= <Head Word> [<Constraint List>]
<Head Word> ::= <character> ...
<Constraint List> ::= "(" <Constraint> ["," <Constraint>] ")"
<Constraint> ::= <Relation Label> {">"} <UW> [<Constraint List>] ...
<Relation Label> ::= "agt" | "and" | "adj" | "icl" | ...
<character> ::= "A" | ... | "Z" | "a" | ... | 0 | ... | 9 | "_" | " " | "#" | "!" | "$" | "%" | "=" | "^" | "~" | "|" | "@" | "+" | "-" | "<" | ">" | "?"

Meta symbol   Interpretation
< >           non-terminal symbol or variable
" "           the enclosed string is literal characters
::=           is defined as
|             disjunction ("or")
[ ]           optional element
{ }           alternative element
...           repetition more than one time
Types of UWs:
1. Basic UW: English word with no restrictions.
2. Restricted UW: a UW with a constraint list attached. For example, ‘state(icl>situation)’ is a sense of ‘state’ that denotes a kind of situation.
3. Extra UW: a word not found in English but used as a UW. For example, samba(icl>dance) is a kind of dance.
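To make the UW definition concrete, here is a minimal Python sketch that splits a UW string into its head word and constraint list. It makes a simplifying assumption that the definition above does not: the head word may not contain parentheses, commas, or ">". The names parse_uw and _uw are chosen for this sketch and are not part of any UNL tool.

```python
STOP = set("(),>")  # assumption: these characters never occur inside a head word


def parse_uw(text):
    """Parse a UW such as 'dog(icl>animal)' into (head_word, constraints).

    constraints is a list of (relation_label, uw) pairs, where each uw is
    itself a (head_word, constraints) tuple.
    """
    uw, pos = _uw(text, 0)
    if pos != len(text):
        raise ValueError(f"unexpected text at position {pos}")
    return uw


def _uw(text, pos):
    # Head word: characters up to the first stop character.
    start = pos
    while pos < len(text) and text[pos] not in STOP:
        pos += 1
    head = text[start:pos]
    if not head:
        raise ValueError(f"missing head word at position {start}")

    constraints = []
    if pos < len(text) and text[pos] == "(":
        pos += 1  # skip "("
        while True:
            # Relation label: characters up to ">".
            label_start = pos
            while pos < len(text) and text[pos] != ">":
                pos += 1
            if pos == len(text):
                raise ValueError("constraint is missing '>'")
            label = text[label_start:pos]
            target, pos = _uw(text, pos + 1)  # the UW after ">"
            constraints.append((label, target))
            if pos < len(text) and text[pos] == ",":
                pos += 1  # another constraint follows
                continue
            break
        if pos == len(text) or text[pos] != ")":
            raise ValueError("missing ')'")
        pos += 1  # skip ")"
    return (head, constraints), pos


print(parse_uw("dog(icl>animal)"))
print(parse_uw("samba(icl>dance)"))
```

A basic UW such as a bare English word parses to a head word with an empty constraint list, while a restricted UW carries one (relation label, upper UW) pair per constraint.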
Binary Relation
Definition:
<Binary Relation> ::= <Relation Label> [":" <Compound UW-ID>] "(" {<UW1> [":" <UW-ID1>]} [":" <Compound UW-ID1>] "," {<UW2> [":" <UW-ID2>]} [":" <Compound UW-ID2>] ")"

Interpretation:
Relation Label is a string of two or three lower-case alphabetic characters taken from the closed list.
Compound UW-ID is a string of two digits used to identify each compound UW.
Compound UW is a group of binary relations (called ‘Hyper-Nodes’).
The UNL System consists of language servers and basic tools. The language servers reside on the network. The UNL Dictionary stores the concepts represented by the words of a language, and its entries follow the grammar that defines the words of that language. The knowledge base of the UNL system is continually being expanded.