The document provides an overview of Natural Language Processing (NLP), its components, and the challenges involved in understanding natural language. It discusses the steps in NLP, including lexical and syntactic analysis, semantic analysis, and text classification, highlighting the importance of algorithms like Context-Free Grammar and Top-Down Parser. Additionally, it covers the application of NLP in tasks such as spam detection and sentiment analysis.

Natural Language Processing
T. Muchabaiwa
Lecture Objectives
• Intro to NLP
• Components of NLP
• NLP terminology
• Steps in NLP
• Implementation of Semantic Analysis
• Text classification
• Somewhere around 100,000 years ago, humans learned how
to speak, and about 7,000 years ago they learned to write.
• There are two main reasons why we want our computer
agents to be able to process natural languages: first, to
communicate with humans, and second, to acquire
information from written language.
Intro to NLP
• Natural Language Processing (NLP) refers to the AI method of
communicating with an intelligent system using a natural
language such as English.
• Processing of natural language is required when you want an
intelligent system such as a robot to perform as per your
instructions, when you want to hear a decision from a dialogue-
based clinical expert system, etc.
• The field of NLP involves making computers perform useful
tasks with the natural languages humans use. The input and
output of an NLP system can be −
• Speech
• Written Text
Components of NLP

• There are two components of NLP −
Natural Language Understanding (NLU)
• Understanding involves the following tasks −
• Mapping the given input in natural language into useful representations.
• Analyzing different aspects of the language.
Natural Language Generation (NLG)
• It is the process of producing meaningful phrases and sentences in
natural language from some internal representation.
• It involves −
• Text planning − Retrieving the relevant content from the knowledge
base.
• Sentence planning − Choosing the required words, forming meaningful
phrases, and setting the tone of the sentence.
• Text realization − Mapping the sentence plan into sentence structure.

• NLU is harder than NLG.


Difficulties in NLU

• NL has an extremely rich form and structure.


• It is very ambiguous. There can be different levels of ambiguity −
• Lexical ambiguity − Ambiguity at a very primitive level, such as the
word level.
• For example, should the word “board” be treated as a noun or a verb?
• Syntax-level ambiguity − A sentence can be parsed in different
ways.
• For example, “He lifted the beetle with red cap.” − Did he use the cap
to lift the beetle, or did he lift a beetle that had a red cap?
• Referential ambiguity − Ambiguity in referring to something using
pronouns. For example: Rima went to Gauri. She said, “I am tired.” −
Exactly who is tired?
• One input can have different meanings.
• Many inputs can mean the same thing.
NLP Terminology

• Phonology − It is the study of organizing sounds systematically.

• Morphology − It is the study of the construction of words from primitive
meaningful units.
• Morpheme − It is a primitive unit of meaning in a language.
• Syntax − It refers to arranging words to make a sentence. It also involves
determining the structural role of words in the sentence and in phrases.
• Semantics − It is concerned with the meaning of words and how to
combine words into meaningful phrases and sentences.
• Pragmatics − It deals with using and understanding sentences in
different situations and how the interpretation of the sentence is
affected.
• Discourse − It deals with how the immediately preceding sentence can
affect the interpretation of the next sentence.
• World Knowledge − It includes general knowledge about the world.
Steps in NLP
• Lexical Analysis − It involves
identifying and analyzing the
structure of words. The lexicon of a
language is its collection of
words and phrases. Lexical
analysis divides the whole chunk
of text into paragraphs,
sentences, and words.
• Syntactic Analysis (Parsing) − It
involves analyzing the words in a
sentence for grammar and
arranging them in a manner that
shows the relationships among
them. A sentence such as “The
school goes to boy” is rejected by
an English syntactic analyzer.
Steps in NLP …. contd
• Semantic Analysis − It draws the exact
meaning, or the dictionary meaning,
from the text. The text is checked for
meaningfulness. This is done by mapping
syntactic structures to objects in the
task domain. The semantic analyzer
disregards sentences such as “hot ice-
cream”.
• Discourse Integration − The meaning of
any sentence depends upon the
meaning of the sentence just before it.
In addition, it also influences the
meaning of the immediately succeeding
sentence.
• Pragmatic Analysis − During this step,
what was said is re-interpreted based on
what it actually meant. It involves
deriving those aspects of language which
require real-world knowledge.
Implementation Aspects of Syntactic Analysis

• Researchers have developed a number of algorithms
for syntactic analysis, but we consider only the following
simple methods −
• Context-Free Grammar
• Top-Down Parser
Context-Free Grammar

• It is a grammar that consists of rules with a single symbol on the left-
hand side of the rewrite rules. Let us create a grammar to parse the
sentence −
“The bird pecks the grains”
• Articles (DET) − a | an | the
• Nouns (N) − bird | birds | grain | grains
• Noun Phrase (NP) − Article + Noun | Article + Adjective + Noun
• = DET N | DET ADJ N
• Verbs (V) − pecks | pecking | pecked
• Verb Phrase (VP) − Verb + Noun Phrase = V NP
• Adjectives (ADJ) − beautiful | small | chirping
• The parse tree breaks down the sentence into structured parts so that
the computer can easily understand and process it. In order for the
parsing algorithm to construct this parse tree, a set of rewrite rules,
which describe what tree structures are legal, needs to be constructed.
Context-Free Grammar …. contd
• These rules say that a certain symbol may be expanded in the
tree into a sequence of other symbols. For example, if there are
two strings, a Noun Phrase (NP) and a Verb
Phrase (VP), then the string formed by NP followed by VP is
a sentence. The rewrite rules for the sentence are as follows −
• S → NP VP
• NP → DET N | DET ADJ N
• VP → V NP
Lexicon −
• DET → a | the
• ADJ → beautiful | perching
• N → bird | birds | grain | grains
• V → peck | pecks | pecking
From these rules and the lexicon, a parse tree for the sentence can be constructed.
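Since the original slide's parse-tree figure is not reproduced here, the same tree can be written out as a nested structure. The tuple encoding and the `leaves` helper below are illustrative conventions of this sketch, not a standard representation:

```python
# Parse tree for "The bird pecks the grains", following S -> NP VP,
# NP -> DET N, VP -> V NP, written as nested (label, children...) tuples.
tree = ("S",
        ("NP", ("DET", "the"), ("N", "bird")),
        ("VP", ("V", "pecks"),
               ("NP", ("DET", "the"), ("N", "grains"))))

def leaves(tree):
    """Collect the words at the leaves of a parse tree, left to right."""
    if isinstance(tree, str):          # a bare word is a leaf
        return [tree]
    words = []
    for child in tree[1:]:             # tree[0] is the node label
        words.extend(leaves(child))
    return words

print(" ".join(leaves(tree)))  # the bird pecks the grains
```

Reading the leaves off left to right recovers the original sentence, which is exactly what makes the tree a legal parse of it.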
However……
• Now consider the above rewrite rules. Since V can be replaced
by either "peck" or "pecks", sentences such as "The bird peck the
grains" are wrongly permitted, i.e. a subject-verb
agreement error is accepted as correct.
Merit − It is the simplest style of grammar, and therefore a widely used one.
Demerits −
• They are not highly precise. For example, “The grains peck the
bird” is syntactically correct according to the parser; even though it
makes no sense, the parser takes it as a correct sentence.
• To achieve high precision, multiple sets of grammar need to be
prepared. It may require completely different sets of rules for
parsing singular and plural variations, passive sentences, etc.,
which can lead to the creation of a huge, unmanageable set of
rules.
Top-Down Parser

• Here, the parser starts with the S symbol and attempts to
rewrite it into a sequence of terminal symbols that matches
the classes of the words in the input sentence, until it consists
entirely of terminal symbols.
• These are then checked against the input sentence to see if they
match. If not, the process starts over again with a
different set of rules. This is repeated until a specific rule is
found which describes the structure of the sentence.
Merit − It is simple to implement.
Demerits −
• It is inefficient, as the search process has to be repeated if an
error occurs.
• It is slow.
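The top-down procedure can be sketched as a recursive-descent recognizer over the grammar and lexicon from the previous section. This is a minimal, deliberately inefficient illustration (as the demerits above suggest), and all names here are assumptions of this sketch:

```python
def parse(symbol, words, grammar, lexicon):
    """Try to rewrite `symbol` to cover a prefix of `words`, top-down.

    Returns the list of possible word counts consumed from the front of
    `words`; an empty list means no rewrite rule matched.
    """
    if symbol in lexicon:  # pre-terminal: must match exactly one word
        return [1] if words and words[0] in lexicon[symbol] else []
    spans = []
    for expansion in grammar[symbol]:      # try each rewrite rule in turn
        partial = [0]                      # word counts consumed so far
        for part in expansion:
            partial = [used + extra
                       for used in partial
                       for extra in parse(part, words[used:], grammar, lexicon)]
        spans.extend(partial)
    return spans

def accepts(sentence, grammar, lexicon):
    """A sentence is accepted if S can be rewritten to cover all its words."""
    words = sentence.lower().split()
    return len(words) in parse("S", words, grammar, lexicon)

GRAMMAR = {"S": [["NP", "VP"]],
           "NP": [["DET", "N"], ["DET", "ADJ", "N"]],
           "VP": [["V", "NP"]]}
LEXICON = {"DET": {"a", "the"},
           "ADJ": {"beautiful", "perching"},
           "N": {"bird", "birds", "grain", "grains"},
           "V": {"peck", "pecks", "pecking"}}

print(accepts("The bird pecks the grains", GRAMMAR, LEXICON))  # True
print(accepts("The bird peck the grains", GRAMMAR, LEXICON))   # True: agreement error accepted
print(accepts("The grains peck the bird", GRAMMAR, LEXICON))   # True: nonsense, but syntactically legal
print(accepts("Grains the bird pecks", GRAMMAR, LEXICON))      # False
```

The last three calls reproduce the demerits noted above: the recognizer happily accepts subject-verb agreement errors and nonsensical but well-formed sentences.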
Text Classification
• Also known as categorization: Given a text of some kind, decide which of a predefined set of
classes it belongs to.
• Language identification and genre classification are examples of text classification, as is
sentiment analysis.
• Sentiment analysis classifies a movie or product review as positive or negative; spam
detection classifies an email message as spam or not-spam.
• We can treat spam detection as a problem in supervised learning.

• A training set is readily available: the positive (spam) examples are in the spam folder, the negative
(ham) examples are in the inbox. Here is an excerpt:
• Spam: Wholesale FashionWatches -57% today. Designer watches for cheap ...
• Spam: You can buy ViagraFr$1.85 All Medications at unbeatable prices! ...
• Spam: WE CAN TREAT ANYTHING YOU SUFFER FROM JUST TRUST US ...
• Spam: Sta.rt earn*ing the salary yo,u d-eserve by o’btaining the prope,r crede’ntials!

• Inbox: The practical significance of hypertree width in identifying more ...
• Inbox: Abstract: We will motivate the problem of social identity clustering: ...
• Inbox: Good to see you my friend. Hey Peter, It was good to hear from you. ...
• Inbox: PDS implies convexity of the resulting optimization problem (Kernel Ridge ...
Text Classification …. contd
• From this excerpt we can start to get an idea of what might be
good features to include in the supervised learning model.
• Word combinations such as “for cheap” and “You can buy” seem
to be indicators of spam (although they would have a nonzero
probability in the inbox as well).
• Character-level features also seem important: spam is more likely
to be all uppercase and to have punctuation embedded in words.
• Apparently the spammers thought that the word bigram “you
deserve” would be too indicative of spam, and thus wrote “yo,u
d-eserve” instead.
• A character model should detect this. We could either create a
full character-level model of spam and non-spam, or we could
handcraft features such as “number of punctuation marks
embedded in words.”
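Treating spam detection as supervised learning, as described above, can be sketched with a small naive Bayes classifier over word features plus one handcrafted character-level feature. The tiny training set reuses (lightly cleaned) lines from the excerpt above; the tokenizer, the smoothing, and all names are assumptions of this sketch, not a production design:

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Word features plus a crude character-level feature for
    punctuation embedded inside words (e.g. "yo,u d-eserve")."""
    words = re.findall(r"[a-zA-Z$']+", text.lower())
    embedded = len(re.findall(r"[a-zA-Z][.,*'-][a-zA-Z]", text))
    return words + ["<EMBEDDED_PUNCT>"] * embedded

def train(examples):
    """examples: (text, label) pairs. Returns per-class token counts and doc totals."""
    counts = {"spam": Counter(), "ham": Counter()}
    totals = Counter()
    for text, label in examples:
        counts[label].update(tokenize(text))
        totals[label] += 1
    return counts, totals

def classify(text, counts, totals):
    """Pick the class maximizing log prior + log likelihoods (add-one smoothing)."""
    vocab = set(counts["spam"]) | set(counts["ham"])
    scores = {}
    for label in counts:
        score = math.log(totals[label] / sum(totals.values()))
        denom = sum(counts[label].values()) + len(vocab)
        for token in tokenize(text):
            score += math.log((counts[label][token] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

examples = [
    ("Wholesale Fashion Watches -57% today. Designer watches for cheap", "spam"),
    ("You can buy Viagra Fr$1.85 All Medications at unbeatable prices!", "spam"),
    ("Sta.rt earn*ing the salary yo,u d-eserve by o'btaining the prope,r crede'ntials!", "spam"),
    ("The practical significance of hypertree width in identifying more", "ham"),
    ("Abstract: We will motivate the problem of social identity clustering", "ham"),
    ("Good to see you my friend. Hey Peter, It was good to hear from you.", "ham"),
]
counts, totals = train(examples)
print(classify("Designer watches for cheap, buy today", counts, totals))  # spam
print(classify("Good to hear from you my friend", counts, totals))        # ham
```

With only six training examples this is obviously a toy, but it shows the point made above: word features such as "for cheap" and the embedded-punctuation feature both fire strongly on the spam lines.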
