Mod 1
CSE3188:
Natural Language Processing
Contents
• What is NLP?
• History of NLP
• Stages of Processing in NLP
2
What is NLP?
● Branch of Artificial Intelligence
3
History of NLP
4
The Imitation Game
5
Georgetown Experiment
6
Syntactic Structures and Conceptual Dependency Theory
7
ALPAC Report
8
From Rules to Data
● Starting from the 1980s, there has been a shift from rule-based NLP systems to statistical systems, driven by the increasing availability of data.
● With data, we can use probability theory to build reasonably robust systems for language modeling, machine translation, etc.
○ Example: Which one is correct in each pair, and why?
■ I saw an elephant. vs. I saw an equipment.
■ An European war is currently going on. vs. A European war is currently going on.
■ Tell me something. vs. Say me something.
● All of this is possible because probabilities can be estimated from data (see the sketch below).
9
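As a concrete illustration of the slide above, here is a minimal sketch (the bigram counts are hypothetical, not taken from any real corpus) of how corpus statistics settle "I saw an elephant" vs. "I saw an equipment": the second bigram is essentially unattested, so a statistical model assigns it a far lower probability.

```python
# Hypothetical bigram counts following the determiner "an" (illustrative only).
counts = {
    ("an", "elephant"): 1200,
    ("an", "equipment"): 3,
    ("an", "example"): 2500,
}
total_an = sum(c for (w1, _), c in counts.items() if w1 == "an")

# Estimate P(next word | "an") by relative frequency.
p_elephant = counts[("an", "elephant")] / total_an
p_equipment = counts[("an", "equipment")] / total_an
print(p_elephant, p_equipment)  # the data strongly prefers "an elephant"
```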
Example of Machine Translation
10
Example of Machine Translation
11
Stages of NLP
12
Stages of Processing
● Phonetics and phonology
● Morphology
● Lexical Analysis
● Syntactic Analysis
● Semantic Analysis
● Pragmatics and Discourse
13
Challenges Associated with
Phonetics / Speech
● Homophones - words that sound the same or similar.
○ After Mahatma Gandhi was killed by Godse, India was mourning. However, that did not stop some kids from playing in a park in the evening. Someone asked them, “Why are you playing? It is mourning time now.” One of the kids replied, “Sir, it is not morning time, it is evening, and we have just finished our homework!”
● Word boundaries - where to split the words in continuous speech
○ I got a plate.
○ I got up late.
● Disfluencies - fillers such as “ah”, “um”, etc.
14
Morphology
● Word formation from root words and morphemes.
○ E.g., plural (teacher + s = teachers), gender (lion + ess = lioness), tense (listen + ing = listening), etc.
● A first step in NLP is to extract the morphemes of a given word (see the sketch below).
● Languages rich in morphology - Dravidian languages (e.g., Kannada, Tamil, Telugu, etc.)
○ Example: Maadidhanu = Maadu (root verb) + past tense + male singular
● Languages poor in morphology - English
○ Example: did = do (root verb) + past tense
15
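As referenced in the slide above, here is a minimal sketch of the morpheme-extraction step, using a tiny set of English suffix rules assumed purely for illustration; real morphological analysers, especially for morphologically rich Dravidian languages, rely on much richer rule sets or finite-state transducers.

```python
# Toy inflectional suffix rules (illustrative only).
SUFFIXES = {"ing": "+progressive", "ess": "+feminine", "s": "+plural"}

def analyse(word: str) -> str:
    """Split a word into root + suffix feature using the toy rules above."""
    for suffix, feature in SUFFIXES.items():
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: -len(suffix)] + feature
    return word  # no known suffix; treat the word as a bare root

print(analyse("teachers"))   # teacher+plural
print(analyse("lioness"))    # lion+feminine
print(analyse("listening"))  # listen+progressive
```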
Lexical Analysis
● The same word can have different meanings.
● The same meaning can be expressed by different words.
Example:
● Where there’s a will…
● There are many relatives
16
Lexical Disambiguation
● Part of Speech disambiguation.
○ Love (is it a verb (I love to eat sushi) or a noun (God’s love is so wonderful)?)
● Sense disambiguation.
○ Bank (I went to the bank on the river to buy fish. vs. I went to the bank on
the river to withdraw Rs. 1000)
17
Syntactic Analysis
• Consider the sentence “I like mangoes”.
• Its parse tree: (S (NP (N I)) (VP (V like) (NP (N mangoes))))
18
Syntactic Analysis
● S -> NP VP
● NP -> N
● VP -> V NP
● N -> mangoes (noun) | I (pronoun)
● V -> like (verb)
19
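The toy grammar above can be run directly; this is a minimal sketch assuming the NLTK package is installed, using its chart parser to parse the sentence from the previous slide.

```python
import nltk

# The grammar from the slide, in NLTK's CFG notation.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> N
VP -> V NP
N -> 'I' | 'mangoes'
V -> 'like'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("I like mangoes".split()):
    tree.pretty_print()  # prints the parse tree (S (NP (N I)) (VP ...))
```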
Ambiguity in Parsing
● Natural Language Ambiguity:
I saw a boy with a telescope.
(Who has the telescope?)
● Design Ambiguity:
I saw a boy with a telescope which I dropped. vs. I saw a boy with a telescope which he dropped.
(Will a probabilistic parser generate the same parse tree for both?)
20
Semantic Analysis
● Semantic Analysis involves assigning semantic roles to entities in the
text.
Example: John gave the book to Mary.
Agent: John, Recipient: Mary, Object / Theme: the book, etc.
● Semantic ambiguity:
Example: Visiting people involves a lot of work. (Are we visiting the people, or are the people visiting us?)
21
Pragmatics and Discourse
● Study of contexts in which language is used.
○ Example: Coreference Resolution.
● A very hard problem, requiring successful (or at least satisfactory) solutions to the previous stages.
● Disambiguation clues need not be present within the same sentence; they can appear anywhere in the text!
22
Contents
• Text Classification
23
What is Text Classification?
• Classifying a text (or a part of it) into one of a set of distinct classes.
• Which of the following are examples of text classification?
• Machine Translation
• Sentiment Analysis
• Part-of-Speech Tagging
• Named Entity Recognition
• Automatic Essay Grading
• Natural Language Generation
• MCQ Comprehension and Question Answering
• ………
24
Text Classification: definition
• Input:
• a document d
• a fixed set of classes C = {c1, c2,…, cJ}
• A training set of m hand-labeled documents (d1, c1), …, (dm, cm)
• Output:
• a learned classifier γ: d → c
27
Classification Methods:
Supervised Machine Learning
29
Feature-based Representation
• Converts the text into a vector of features.
• Example features:
• Length-based features
• Average word length, average sentence length, no. of syllables per word, etc.
• PoS-based features
• Proportion of nouns, verbs, adverbs, adjectives, punctuation marks, etc.
• Syntactic-based features
• Average parse-tree depth, No. of SBARs, etc.
• Coherence-based features
• Entity grid features…
30
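A minimal sketch of the feature-based representation described above, computing a couple of length-based features (the feature names are assumed here for illustration); PoS-, syntax-, and coherence-based features would be added in the same way with the help of an NLP toolkit.

```python
def length_features(text: str) -> dict:
    """Turn a document into a small vector of length-based features."""
    words = text.split()
    sentences = [s for s in text.split(".") if s.strip()]
    return {
        "num_words": len(words),
        "avg_word_length": sum(len(w) for w in words) / max(len(words), 1),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
    }

print(length_features("The quick brown fox jumped over the lazy dog."))
```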
The Bag of Words Representation
31
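A minimal sketch, assuming scikit-learn is available, of a bag-of-words text classifier: each document is reduced to its word counts and fed to a Naive Bayes model (the documents and labels below are invented toy data, not from the slides).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy sentiment-style training data (invented for illustration).
train_docs = ["the pie was delicious", "terrible service and cold pie",
              "loved the apple pie", "the movie was boring"]
train_labels = ["pos", "neg", "pos", "neg"]

# Bag-of-words counts followed by a Naive Bayes classifier.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train_docs, train_labels)            # learn the classifier gamma
print(clf.predict(["the pie was terrible"])) # e.g. ['neg']
```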
Kernel Representation
• E.g., string kernels
• A string kernel is a similarity function between two strings.
• E.g., the Histogram Intersection String Kernel (HISK)
32
Embedding Representation
• Generate an embedding-based representation of the document.
• E.g., generating an embedding-based representation of an essay in order to grade it.
33
Evaluation
• Let's consider just binary text classification tasks
• Imagine you're the CEO of Delicious Pie Company
• You want to know what people are saying about your pies
• So you build a "Delicious Pie" tweet detector
• Positive class: tweets about Delicious Pie Co
• Negative class: all other tweets
The 2-by-2 confusion matrix
Evaluation: Accuracy
• Why don't we use accuracy as our metric?
• Imagine we saw 1 million tweets
• 100 of them talked about Delicious Pie Co.
• 999,900 talked about something else
• We could build a dumb classifier that just labels every tweet "not about
pie"
• It would get 99.99% accuracy!!! Wow!!!!
• But useless! Doesn't return the comments we are looking for!
• That's why we use precision and recall instead
Evaluation: Precision
• % of items the system detected (i.e., items the system labeled as positive) that are in fact positive (according to the human gold labels).
• Precision = TP / (TP + FP)
Evaluation: Recall
• % of items actually present in the input that were correctly identified by the system.
• Recall = TP / (TP + FN)
Why Precision and Recall?
• Our dumb pie-classifier
• just labels every tweet as "not about pie"
Accuracy=99.99%
but
Recall = 0
• (it doesn't get any of the 100 Pie tweets)
Precision and recall, unlike accuracy, emphasize true positives:
• finding the things that we are supposed to be looking for.
A combined measure: F
• F measure: a single number that combines P and R, most commonly their harmonic mean: F1 = 2PR / (P + R)
43
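A minimal sketch, with invented confusion-matrix counts for the pie-tweet detector, showing how precision, recall, and F1 are computed from true positives (TP), false positives (FP), and false negatives (FN).

```python
# Hypothetical counts: tp = pie tweets found, fp = non-pie tweets flagged,
# fn = pie tweets missed.
tp, fp, fn = 60, 40, 40

precision = tp / (tp + fp)  # fraction of flagged tweets that are really about pie
recall = tp / (tp + fn)     # fraction of all pie tweets that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R

print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```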
Part-of-Speech Tagging
• Involves tagging each token with a part of speech (e.g., noun).
• Let’s say that we have only 6 tags: noun (NN), verb (VB), adjective (JJ), adverb (RB), function word (FW) to represent all other words, and punctuation (.) to represent all punctuation marks.
• Consider the following sentence.
• The quick brown fox jumped over the lazy dog.
• The tagged sentence is
• The_FW quick_JJ brown_JJ fox_NN jumped_VB over_FW the_FW lazy_JJ
dog_NN ._.
50
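For comparison, here is a minimal sketch using NLTK's off-the-shelf tagger; note that it uses the Penn Treebank tag set rather than the simplified 6-tag set from the slide, so the exact tags differ.

```python
import nltk

# One-time downloads of the tokenizer and tagger models.
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

tokens = nltk.word_tokenize("The quick brown fox jumped over the lazy dog.")
print(nltk.pos_tag(tokens))
# e.g. [('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ'), ('fox', 'NN'), ...]
```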
Named Entities
• Named entities are anything that can be referred to with a proper
name.
• Multiple class problem
• 3 classes – PER (person), LOC (location), ORG (organization)
• 4 classes – PER (person), LOC (location), ORG (organization), GPE (geo-political
entity)
• More classes – PER (person), LOC (location), ORG (organization), GPE (geo-
political entity) + classes for dates, times, numbers, prices, etc.
• Named entities often span multiple words (multi-word phrases)
51
Examples of Named Entities
Person: Sandeep Mathias
Location: Bengaluru
Organization: Presidency University
Geo-political Entity: Prime Minister of India
52
Named Entity Tagging
• The task of Named Entity Recognition (NER):
• Find spans of text that constitute a named entity.
• Tag the entity with the proper NER class.
53
NER Input
• Citing high fuel prices, United Airlines said Friday it has increased
fares by $6 per round trip on flights to some cities also served by
lower-cost carriers.
• American Airlines, a unit of AMR Corp., immediately matched the
move, spokesman Tim Wagner said.
• United, a unit of UAL Corp., said the increase took effect Thursday and
applies to most routes where it competes against discount carriers,
such as Chicago to Dallas and Denver to San Francisco.
54
NER – Finding NER Spans
• Citing high fuel prices, [United Airlines] said [Friday] it has increased
fares by [$6] per round trip on flights to some cities also served by
lower-cost carriers.
• [American Airlines], a unit of [AMR Corp.], immediately matched the
move, spokesman [Tim Wagner] said.
• [United], a unit of [UAL Corp.], said the increase took effect
[Thursday] and applies to most routes where it competes against
discount carriers, such as [Chicago] to [Dallas] and [Denver] to [San
Francisco].
55
NER Output
• Citing high fuel prices, [ORG United Airlines] said [TIME Friday] it has
increased fares by [MONEY $6] per round trip on flights to some cities
also served by lower-cost carriers.
• [ORG American Airlines], a unit of [ORG AMR Corp.], immediately
matched the move, spokesman [PER Tim Wagner] said.
• [ORG United], a unit of [ORG UAL Corp.], said the increase took effect
[TIME Thursday] and applies to most routes where it competes against
discount carriers, such as [LOC Chicago] to [LOC Dallas] and [LOC Denver]
to [LOC San Francisco].
56
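A minimal sketch of off-the-shelf NER, assuming spaCy and its small English model are installed (pip install spacy, then python -m spacy download en_core_web_sm); the exact spans and labels depend on the model.

```python
import spacy

# Load spaCy's small English pipeline, which includes a pretrained NER component.
nlp = spacy.load("en_core_web_sm")
doc = nlp("American Airlines, a unit of AMR Corp., immediately matched "
          "the move, spokesman Tim Wagner said.")

for ent in doc.ents:
    print(ent.text, ent.label_)
# e.g. American Airlines ORG / AMR Corp. ORG / Tim Wagner PERSON
```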
Why NER?
• Sentiment analysis: consumer’s sentiment toward a particular
company or person?
• Question Answering: answer questions about an entity?
• Information Extraction: Extracting facts about entities from text
57
Why NER is not so easy
• Segmentation
• In PoS tagging, no segmentation, since each word gets 1 tag.
• In NER, we have to find the span before adding the tags!
• Type Ambiguity
• Multiple types can map to the same span.
• [Washington] was born into slavery on the farm of James Burroughs.
• [Washington] went up 2 games to 1 in the four-game series.
• Blair arrived in [Washington] for what may well be his last state visit.
• In June, [Washington] legislators passed a primary seatbelt law.
58
Why NER is not so easy
• Segmentation
• In PoS tagging, no segmentation, since each word gets 1 tag.
• In NER, we have to find the span before adding the tags!
• Type Ambiguity
• Multiple types can map to the same span.
• [PER Washington] was born into slavery on the farm of James Burroughs.
• [ORG Washington] went up 2 games to 1 in the four-game series.
• Blair arrived in [LOC Washington] for what may well be his last state visit.
• In June, [GPE Washington] legislators passed a primary seatbelt law.
59
Contents
• Statistical Machine Translation
60
Machine Translation
• Conversion of a text from one language to another using computers.
• Example: Translation from English to French.
• The input language is also known as the source language, and the output / translated language is also known as the target language.
61
Examples of MT
• Conversion from English to French
• SRC: The three rabbits of Grenoble
• TGT: Les trois lapins de Grenoble
• Conversion from English to Simple English
• SRC: Students should not procrastinate their assignment submissions.
• TGT: Students should not delay submitting their assignments.
• Conversion from Spoken English to English
• SRC: I bought a car for Rupees five lakhs.
• TGT: I bought a car for Rs. 5,00,000.
• Code mixing
62
Machine Translation Paradigms
• Rule-based MT: Using linguistic rules to perform translation
• Example: Plurals end with an “s” in English. Hence, “Hudugaru” (plural of
“Huduga” in Kannada) = “Boys” in English.
• Example-based MT: Translation by analogy
• Statistical MT: Using source–target sentence pairs (a parallel corpus) to learn alignments.
• Neural MT: Uses an encoder-decoder architecture to learn
representations of the source and map it to the target language
representations.
63
Challenges of MT
• Ambiguity
• Same word, multiple meanings
• Same meaning, multiple words
• Word Order
• SOV to SVO?
• Morphological Richness
• Challenging for SMT systems!
64
Problems with Rule-based MT
• Requires linguistic knowledge of both languages.
• Maintenance of the system is challenging
• Difficult to handle ambiguity
• Scaling is difficult!
65
Statistical MT
• Model translation using a probabilistic model.
• Measure of confidence in the translations
• Modeling uncertainty in translations
• Using argmax:
• e* = argmax_e P(e | f)
• e* = the best translation
• e = a candidate target-language text
• f = the source-language text
66
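A minimal sketch of the argmax step, using made-up scores that stand in for P(e | f) over a tiny candidate set; a real SMT decoder searches a vastly larger space of candidate translations.

```python
# Hypothetical candidate target-language translations e of a source sentence f,
# with made-up scores standing in for P(e | f).
candidates = {
    "Les trois lapins de Grenoble": 0.62,
    "Le trois lapin de Grenoble": 0.07,
    "Les trois lapins du Grenoble": 0.31,
}

e_star = max(candidates, key=candidates.get)  # e* = argmax_e P(e | f)
print(e_star)
```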
Word Alignment
• Given a parallel corpus, we find word-level alignments
• Example:
• English: Narendra Modi is the Prime Minister of India
• Hindi: Bharat ke Pradhan Mantri, Narendra Modi Hain.
• Alignments:
• Narendra Modi (English) -> Narendra Modi (Hindi)
• Prime Minister of India -> Bharat ke Pradhan Mantri
• Is (English) -> Hain
• Prime Minister (English) -> Pradhan Mantri
• of India (English) -> Bharat ke
• ………
• ………
67
Word Alignment
• There can be multiple possible alignments.
• Example: Prime Minister -> Bhaarat ke (?)
• Another example: Narendra Modi -> Bhaarat ke (?)
• With one sentence pair, we cannot find alignments properly!
• We need a parallel corpus to find alignments using co-occurrence of
words.
68
Example of Word Alignment
• Consider a parallel corpus with 2 sentences:
• S1: “Three rabbits” = “Trois lapins”
• S2: “The rabbits of Grenoble” = “Les lapins de Grenoble”
• What all words can be aligned?
• What about “The rabbits of Bengaluru”?
69
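A minimal sketch of the intuition above (raw co-occurrence counting, not IBM Model 1 or any full alignment model): across the two sentence pairs, "rabbits" co-occurs with "lapins" more often than with any other French word, so that is the best alignment guess.

```python
from collections import Counter

# The two sentence pairs from the slide.
parallel = [
    ("Three rabbits", "Trois lapins"),
    ("The rabbits of Grenoble", "Les lapins de Grenoble"),
]

counts = Counter()
for en, fr in parallel:
    for e in en.lower().split():
        for f in fr.lower().split():
            counts[(e, f)] += 1

# Find the French word that co-occurs most often with "rabbits".
best_count, best_fr = max((c, f) for (e, f), c in counts.items() if e == "rabbits")
print(best_fr, best_count)  # lapins 2
```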
Phrase Table
• A table of translation probabilities for phrase pairs
• The phrase table is learnt from word alignments
70
Challenges in PBSMT
• Divergent Word Order
• Rich morphology
• Named entities and OOV words
• To be covered in the last module…
71