Natural Language Processing (NLP)
T. Muchabaiwa
Lecture Objectives
• Intro to NLP
• Components of NLP
• NLP terminology
• Steps in NLP
• Implementation of Semantic Analysis
• Text classification
• Somewhere around 100,000 years ago, humans learned how
to speak, and about 7,000 years ago they learned to write.
• There are two main reasons why we want our computer
agents to be able to process natural languages: first, to
communicate with humans, and second, to acquire
information from written language.
Intro to NLP
• Natural Language Processing (NLP) refers to the AI methods for
communicating with intelligent systems using a natural
language such as English.
• Processing natural language is required when you want an
intelligent system, such as a robot, to act on your
instructions, or when you want to hear a decision from a
dialogue-based clinical expert system.
• The field of NLP involves making computers perform useful
tasks with the natural languages humans use. The input and
output of an NLP system can be −
• Speech
• Written Text
Components of NLP
• Context-free grammar (CFG) is a grammar whose rewrite rules each
have a single symbol on the left-hand side. Let us create a grammar
to parse the sentence −
“The bird pecks the grains”
• Articles (DET) − a | an | the
• Nouns (N) − bird | birds | grain | grains
• Noun Phrase (NP) − Article + Noun | Article + Adjective + Noun
= DET N | DET ADJ N
• Verbs (V) − pecks | pecking | pecked
• Verb Phrase (VP) − NP V | V NP
• Adjectives (ADJ) − beautiful | small | chirping
• The parse tree breaks the sentence down into structured parts so that
the computer can easily understand and process it. For the parsing
algorithm to construct this parse tree, a set of rewrite rules describing
which tree structures are legal must be constructed.
Context-Free Grammar
• These rules say that a certain symbol may be expanded in the
tree by a sequence of other symbols. According to first-order
logic, if there are two strings, a Noun Phrase (NP) and a Verb
Phrase (VP), then the string formed by NP followed by VP is
a sentence. The rewrite rules for the sentence are as follows −
• S → NP VP
• NP → DET N | DET ADJ N
• VP → V NP
Lexicon −
• DET → a | the
• ADJ → beautiful | perching
• N → bird | birds | grain | grains
• V → peck | pecks | pecking
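Applying the rules and lexicon above one rewrite at a time, always expanding the leftmost non-terminal, derives the example sentence from S. This worked illustration uses only the rules listed on this slide; each ⇒ applies exactly one rule.

```latex
\begin{align*}
S &\Rightarrow NP\ VP \\
  &\Rightarrow DET\ N\ VP \\
  &\Rightarrow \text{the}\ N\ VP \\
  &\Rightarrow \text{the bird}\ VP \\
  &\Rightarrow \text{the bird}\ V\ NP \\
  &\Rightarrow \text{the bird pecks}\ NP \\
  &\Rightarrow \text{the bird pecks}\ DET\ N \\
  &\Rightarrow \text{the bird pecks the}\ N \\
  &\Rightarrow \text{the bird pecks the grains}
\end{align*}
```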
The parse tree can be created as shown −
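The same rules and lexicon can drive a small top-down (recursive-descent) parser that builds the tree automatically. The Python sketch below is one possible implementation, not the method the lecture prescribes. `GRAMMAR`, `LEXICON`, `parse_symbol`, `parse_sequence`, and `parse` are hypothetical names, and the tree is returned as nested tuples of the form (label, children...).

```python
# Rewrite rules from the slide: S -> NP VP, NP -> DET N | DET ADJ N, VP -> V NP
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DET", "N"], ["DET", "ADJ", "N"]],
    "VP": [["V", "NP"]],
}

# Lexicon from the slide: each pre-terminal maps to the words it may rewrite to
LEXICON = {
    "DET": {"a", "the"},
    "ADJ": {"beautiful", "perching"},
    "N": {"bird", "birds", "grain", "grains"},
    "V": {"peck", "pecks", "pecking"},
}


def parse_symbol(symbol, words, pos):
    """Yield (tree, next_pos) for every way `symbol` can cover words starting at pos."""
    if symbol in LEXICON:
        # Pre-terminal: match exactly one word from the lexicon
        if pos < len(words) and words[pos] in LEXICON[symbol]:
            yield (symbol, words[pos]), pos + 1
        return
    # Non-terminal: try each alternative right-hand side in order (backtracking)
    for rhs in GRAMMAR[symbol]:
        yield from parse_sequence(symbol, rhs, words, pos)


def parse_sequence(parent, rhs, words, pos):
    """Expand the right-hand side `rhs` left to right, top-down."""
    def expand(i, p, children):
        if i == len(rhs):
            yield (parent, *children), p
            return
        for child, q in parse_symbol(rhs[i], words, p):
            yield from expand(i + 1, q, children + [child])

    yield from expand(0, pos, [])


def parse(sentence):
    """Return the first parse tree that spans the whole sentence, or None."""
    words = sentence.lower().split()
    for tree, end in parse_symbol("S", words, 0):
        if end == len(words):
            return tree
    return None


print(parse("The bird pecks the grains"))
# ('S', ('NP', ('DET', 'the'), ('N', 'bird')), ('VP', ('V', 'pecks'), ('NP', ('DET', 'the'), ('N', 'grains'))))
```

Because this parser tries the alternatives of each rule in order and backtracks on failure, it only works for grammars without left recursion, such as this one.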
However…
• Now consider the rewrite rules above. Since V can be replaced
by either "peck" or "pecks", sentences such as "The bird peck the
grains" are wrongly permitted; that is, the subject-verb
agreement error is accepted as correct.
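The overgeneration can be reproduced, and one standard remedy sketched, by attaching a number feature to nouns and verbs and checking that the subject agrees with the verb. This is feature-based agreement, a technique not described on these slides. In the Python sketch below, the SG/PL annotations and the names `LEXICON`, `tag`, `match_np`, and `accepts` are assumptions. The recognizer hard-codes S → NP VP, NP → DET N | DET ADJ N and VP → V NP instead of reading them from a rule table.

```python
# Slide lexicon with a hypothetical number feature: "SG" (singular), "PL" (plural),
# or None when the word places no number restriction on the sentence.
LEXICON = {
    "a": ("DET", None), "the": ("DET", None),
    "beautiful": ("ADJ", None), "perching": ("ADJ", None),
    "bird": ("N", "SG"), "grain": ("N", "SG"),
    "birds": ("N", "PL"), "grains": ("N", "PL"),
    "pecks": ("V", "SG"), "peck": ("V", "PL"), "pecking": ("V", None),
}


def tag(word):
    """Return (category, number) for a word, or (None, None) if it is unknown."""
    return LEXICON.get(word, (None, None))


def match_np(words, i):
    """Match NP -> DET N | DET ADJ N at words[i]; return (next index, number) or None.

    Taking the optional ADJ greedily is safe here because no word is both ADJ and N."""
    if i < len(words) and tag(words[i])[0] == "DET":
        j = i + 1
        if j < len(words) and tag(words[j])[0] == "ADJ":
            j += 1
        if j < len(words) and tag(words[j])[0] == "N":
            return j + 1, tag(words[j])[1]
    return None


def accepts(sentence, check_agreement=False):
    """Recognize S -> NP VP, VP -> V NP; optionally require subject-verb agreement."""
    words = sentence.lower().split()
    subject = match_np(words, 0)
    if subject is None:
        return False
    i, subject_number = subject
    if i >= len(words) or tag(words[i])[0] != "V":
        return False
    verb_number = tag(words[i])[1]
    obj = match_np(words, i + 1)
    if obj is None or obj[0] != len(words):
        return False
    if check_agreement and verb_number is not None and verb_number != subject_number:
        return False
    return True


print(accepts("The bird peck the grains"))                        # True: plain CFG overgenerates
print(accepts("The bird peck the grains", check_agreement=True))  # False: SG subject, PL verb
```

With the check enabled, "The bird peck the grains" is rejected, yet "The grains peck the bird" is still accepted because its subject and verb agree in number. The check catches this syntactic error only; it says nothing about whether a sentence makes sense.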
Merit − It is the simplest style of grammar and is therefore widely used.
Demerits −
• They are not highly precise. For example, “The grains peck the
bird” is syntactically correct according to the parser; even though it
makes no sense, the parser accepts it as a correct sentence.
• To achieve high precision, multiple sets of grammar rules need to be
prepared. Completely different sets of rules may be required for
parsing singular and plural variations, passive sentences, etc.,
which can lead to a huge, unmanageable set of rules.
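The growth in rule count can be made concrete. The hypothetical function `split_by_feature` below mechanically splits the slide's base rules by an agreement feature. With k feature values it produces k S-rules, 2k NP-rules and k² VP-rules, i.e. 3k + k² rules in place of the original 4. The person-and-number split is an assumed extension for illustration only. Lexicon entries, which would also have to be relabeled (e.g. N_SG → bird | grain), are not counted.

```python
# Base rules from the slide: S -> NP VP, NP -> DET N | DET ADJ N, VP -> V NP
BASE_RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("DET", "N")),
    ("NP", ("DET", "ADJ", "N")),
    ("VP", ("V", "NP")),
]


def split_by_feature(values):
    """Rewrite BASE_RULES so NP, N, VP and V each carry one feature value.

    Agreement is enforced by letting S pair only an NP and a VP with the same value.
    The object NP inside a VP may take any value, so each VP_v needs one rule per value."""
    rules = []
    for v in values:
        rules.append(("S", (f"NP_{v}", f"VP_{v}")))
        rules.append((f"NP_{v}", ("DET", f"N_{v}")))
        rules.append((f"NP_{v}", ("DET", "ADJ", f"N_{v}")))
        for obj in values:
            rules.append((f"VP_{v}", (f"V_{v}", f"NP_{obj}")))
    return rules


number = ["SG", "PL"]
person_number = ["1SG", "2SG", "3SG", "1PL", "2PL", "3PL"]  # hypothetical extension
print(len(BASE_RULES))                       # 4
print(len(split_by_feature(number)))         # 10
print(len(split_by_feature(person_number)))  # 54
```

Adding further variations, such as passive sentences, would multiply the rule set again, which is the unmanageability the slide describes.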
2. Top-Down Parser
• A training set is readily available: the positive (spam) examples are in the spam folder, and the negative
(ham) examples are in the inbox. Here is an excerpt:
• Spam: Wholesale FashionWatches -57% today. Designer watches for cheap ...
• Spam: You can buy ViagraFr$1.85 All Medications at unbeatable prices! ...
• Spam: WE CAN TREAT ANYTHING YOU SUFFER FROM JUST TRUST US ...
• Spam: Sta.rt earn*ing the salary yo,u d-eserve by o’btaining the prope,r crede’ntials!
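Given such a labeled training set, one common approach, not necessarily the one this lecture goes on to use, is a multinomial naive Bayes classifier over a bag of words with add-one smoothing. The names `tokenize`, `train`, and `classify` are hypothetical. In the usage lines, the spam messages come from the excerpt above, while the two ham messages are invented placeholders, since the excerpt shows only spam.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(examples):
    """examples: list of (text, label) pairs, e.g. label "spam" or "ham"."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of training messages
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, doc_counts, vocab


def classify(model, text):
    """Return the label with the highest log posterior (add-one smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(doc_counts[label] / total_docs)  # log prior P(label)
        denom = sum(counts.values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Usage: spam lines from the excerpt; the ham lines are made-up placeholders.
examples = [
    ("Wholesale FashionWatches -57% today. Designer watches for cheap", "spam"),
    ("You can buy ViagraFr$1.85 All Medications at unbeatable prices!", "spam"),
    ("WE CAN TREAT ANYTHING YOU SUFFER FROM JUST TRUST US", "spam"),
    ("The lecture notes for next week are attached", "ham"),
    ("Can we move our meeting to Thursday afternoon", "ham"),
]
model = train(examples)
print(classify(model, "cheap designer watches today"))  # spam
```

Working in log probabilities avoids numeric underflow when many small word probabilities are multiplied together.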