
Module #1

CSE3188:
Natural Language Processing
Contents
• What is NLP?
• History of NLP
• Stages of Processing in NLP

2
What is NLP?
● Branch of Artificial Intelligence

● Combination of linguistics (understanding how languages work) and


computer science (building systems to solve natural language-
related problems).

3
History of NLP

4
The Imitation Game

● Identify the machine / human participant.

5
Georgetown Experiment

● Demonstration of machine translation performed jointly by IBM and
Georgetown University in 1954.
● Involved translation of 60+ sentences from Russian into English.
● Used only 6 grammar rules and 250 lexical items (stems + endings).
● Initially, it attracted substantial government funding for research in MT
and NLP. However, real progress was much slower than expected!

6
Syntactic Structures and Conceptual Dependency Theory

● In 1957, Noam Chomsky published Syntactic Structures, which
revolutionized linguistics and grammar.
○ Chomsky used phrase structure rules to generate new sentences.
○ Gave examples of grammatically correct sentences without any meaning.
Example: “Colourless green ideas sleep furiously”
○ Advocated a separation of syntax from semantics

7
ALPAC Report

● ALPAC (Automatic Language Processing Advisory Committee) was formed in
1964 to evaluate progress in NLP in general and MT in particular.
● It published the ALPAC report in 1966, which was sceptical of the research
done in the previous decade and led to large funding cuts for
computational linguistics.

8
From Rules to Data

● Starting from the 1980s, we have seen a movement from rule-based NLP
systems to statistical systems, driven by the increasing availability of data.
● With data, we can use probability theory to build reasonably robust
systems for language modeling, machine translation, etc.
○ Example: Which one is correct in each pair and why?
■ I saw an elephant. Vs. I saw an equipment.
■ An European war is currently going on. Vs. A European war is currently going on.
■ Tell me something. Vs. Say me something.
● All this is possible because of probability.

9
Example of Machine Translation

● Earlier approach - Rule-based Machine Translation


● Linguists would create rules for the source and target languages.
● People would use dictionaries to map to root words, morphemes, etc.
● Limited in scope. Could not account for many challenges in MT.

10
Example of Machine Translation

● More modern approach - Phrase-based Machine Translation


● Uses a parallel corpus, where sentences in the source language are
mapped to their equivalent in the target language.
● From this, phrases (or n-grams) are mapped from the source to the
target language.
● Example: India’s Prime Minister (EN) <-> Bhaarat ka Pradhaan Mantri (HI)
● Used data to maximize alignment probability, and language modeling to
get correct target language sentences.

11
Stages of NLP

12
Stages of Processing
● Phonetics and phonology
● Morphology
● Lexical Analysis
● Syntactic Analysis
● Semantic Analysis
● Pragmatics and Discourse

13
Challenges Associated with
Phonetics / Speech
● Homophones - Words that sound the same or similar.
○ After Mahatma Gandhi was killed by Godse, India was mourning. However,
that did not stop some kids from playing in a park in the evening. Someone
asked them, “Why are you playing? It is mourning time now.” To which one
of the kids replied, “Sir, it is not morning time; it is evening, and we have
just finished our homework!”
● Word boundary - Where to split the words in speech
○ I got a plate.
○ I got up late.
● Disfluency - ah, um, etc…

14
Morphology
● Word formation from root words and morphemes
○ Eg. singular - plural (teacher + s = teachers), gender (lion + ess = lioness),
tense (listen + ing = listening), etc.
● First step in NLP - extract the morphemes of the given word
● Languages rich in morphology - Dravidian languages (Eg. Kannada,
Tamil, Telugu, etc.)
○ Example: Maadidhanu - Maadu (root verb) + past tense + male singular
● Languages poor in morphology - English
○ Example: Did - Do (root verb) + past tense

15
Lexical Analysis
● Words have different meanings.
● Meanings have different words.
Example:
● Where there’s a will…
● There are many relatives

16
Lexical Disambiguation
● Part of Speech disambiguation.
○ Love (is it a verb (I love to eat sushi) or a noun (God’s love is so wonderful)?)

● Sense disambiguation.
○ Bank (I went to the bank on the river to buy fish. vs. I went to the bank on
the river to withdraw Rs. 1000)

17
Syntactic Analysis
• Consider the sentence “I like mangoes”.
• Its parse tree (using the grammar on the next slide):
  (S (NP (N I)) (VP (V like) (NP (N mangoes))))

18
Syntactic Analysis
● S -> NP VP
● NP -> N
● VP -> V NP
● N -> Noun (mangoes) / Pronoun (I)
● V -> Verb (like)

19
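The toy grammar above can be tried out directly. A minimal sketch, assuming NLTK is installed; the grammar string simply mirrors the rules on this slide.

```python
# Parse "I like mangoes" with the toy grammar from the slide (assumes NLTK).
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> N
VP -> V NP
N -> 'I' | 'mangoes'
V -> 'like'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("I like mangoes".split()):
    print(tree)   # (S (NP (N I)) (VP (V like) (NP (N mangoes))))
```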
Ambiguity in Parsing
● Natural Language Ambiguity:
I saw a boy with a telescope.
(Who has the telescope?)
● Design Ambiguity:
I saw a boy with a telescope which I dropped. Vs. I saw a boy with a
telescope which he dropped.
(Will the same parse tree be generated using probability?)

20
Semantic Analysis
● Semantic Analysis involves assigning semantic roles to entities in the
text.
Example: John gave the book to Mary.
Agent: John, Recipient: Mary, Object / Theme: the book, etc.
● Semantic ambiguity:
Example: Visiting people involves a lot of work. (Is it the act of visiting people, or the people who are visiting?)

21
Pragmatics and Discourse
● Study of contexts in which language is used.
○ Example: Coreference Resolution.
● Very hard problem. Requires successful (or satisfactory) solutions of
previous problems.
● Disambiguation clues need not be present within the same
sentence, but can be present anywhere in the text!

22
Contents
• Text Classification

23
What is Text Classification?
• Classifying the text (or a part of it) into distinct classes.
• Which of the following are examples of text classification?
• Machine Translation
• Sentiment Analysis
• Part-of-Speech Tagging
• Named Entity Recognition
• Automatic Essay Grading
• Natural Language Generation
• MCQ Comprehension and Question Answering
• ………
24
Text Classification: definition
• Input:
• a document d
• a fixed set of classes C = {c1, c2,…, cJ}

• Output: a predicted class c ∈ C


Classification Methods: Hand-coded rules
• Rules based on combinations of words or other features
• spam: black-list-address OR (“dollars” AND “you have been selected”)
• Accuracy can be high
• If rules carefully refined by expert
• But building and maintaining these rules is expensive
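As a hedged illustration, the spam rule above can be written directly in code; the black-list address and the helper name is_spam are assumptions for this sketch, not part of any real system.

```python
# Hand-coded rule classifier: a sketch of the spam rule from the slide above.
# BLACKLIST and the keyword patterns are purely illustrative assumptions.
BLACKLIST = {"offers@cheap-deals.example"}

def is_spam(sender: str, text: str) -> bool:
    text = text.lower()
    # spam: black-list-address OR ("dollars" AND "you have been selected")
    return (sender in BLACKLIST) or ("dollars" in text and "you have been selected" in text)

print(is_spam("offers@cheap-deals.example", "Hello"))                        # True
print(is_spam("a@b.example", "You have been selected to win 1000 dollars"))  # True
print(is_spam("a@b.example", "Lecture notes attached"))                      # False
```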
Classification Methods:
Supervised Machine Learning

• Input:
• a document d
• a fixed set of classes C = {c1, c2,…, cJ}
• A training set of m hand-labeled documents (d1,c1),....,
(dm,cm)
• Output:
• a learned classifier γ: d → c

27
Classification Methods:
Supervised Machine Learning

• Any kind of classifier


• Naïve Bayes
• Logistic regression
• Neural networks
• k-Nearest Neighbors
•…
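A minimal sketch of supervised text classification with one of the classifiers listed above (Naive Bayes), assuming scikit-learn is available; the tiny training set is purely illustrative.

```python
# Supervised text classification sketch: bag-of-words features + Naive Bayes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_docs = ["loved the pie, delicious crust", "great apple pie",
              "terrible service, cold food", "the movie was boring"]
train_labels = ["pie", "pie", "not_pie", "not_pie"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train_docs, train_labels)           # learn γ: d -> c from labelled data
print(clf.predict(["this pie is delicious"]))   # ['pie']
```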
Document Representations
• Feature-based representation
• Bag of words representation
• Kernel-based representation
• Embedding-based representation

29
Feature-based Representation
• Converts the text into a vector of features.
• Example features:
• Length-based features
• Average word length, average sentence length, no. of syllables per word, etc.
• PoS-based features
• Proportion of nouns, verbs, adverbs, adjectives, punctuations, etc.
• Syntactic-based features
• Average parse-tree depth, No. of SBARs, etc.
• Coherence-based features
• Entity grid features…

30
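A minimal sketch of computing a few of the length-based features listed above, in plain Python; whitespace tokenisation and splitting sentences on “.” are simplifying assumptions.

```python
# Length-based features for a document (simplified tokenisation).
def length_features(text: str) -> dict:
    sentences = [s for s in text.split(".") if s.strip()]
    words = text.replace(".", " ").split()
    return {
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "avg_sentence_length": len(words) / len(sentences),
        "num_words": len(words),
    }

print(length_features("I like mangoes. Mangoes are tasty fruits."))
```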
The Bag of Words Representation

31
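A minimal sketch of the bag-of-words representation, assuming scikit-learn; the two example documents are illustrative.

```python
# Bag of words: each document becomes a vector of word counts.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["I love this pie", "this pie is too sweet, too sweet for me"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)
print(vectorizer.get_feature_names_out())   # the learned vocabulary
print(X.toarray())                          # one count vector per document
```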
Kernel Representation
• Eg. String Kernel
• A string kernel is a similarity function between two strings.
• Eg. Histogram Intersection String Kernel (HISK), which sums, over all
character n-grams g, the smaller of the two occurrence counts:
  k(s, t) = Σ_g min(count_s(g), count_t(g))
• The kernel is then normalized as follows:
  k'(s, t) = k(s, t) / √( k(s, s) · k(t, t) )

32
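A minimal sketch of the Histogram Intersection String Kernel over character n-grams, with the normalisation shown above; the choice of n = 2 and the example strings are assumptions for illustration.

```python
# Histogram Intersection String Kernel (HISK) over character bigrams.
from collections import Counter
from math import sqrt

def ngram_counts(s: str, n: int = 2) -> Counter:
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def hisk(s: str, t: str, n: int = 2) -> float:
    cs, ct = ngram_counts(s, n), ngram_counts(t, n)
    return sum(min(cs[g], ct[g]) for g in cs)   # histogram intersection

def hisk_normalized(s: str, t: str, n: int = 2) -> float:
    return hisk(s, t, n) / sqrt(hisk(s, s, n) * hisk(t, t, n))

print(hisk_normalized("mangoes are tasty", "mangoes taste great"))
```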
Embedding Representation
• Generate an embedding-based representation of the document.
• Eg. Generating an embedding-based representation of an essay in order to
grade it.

33
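A minimal sketch of one embedding-based representation: averaging token embeddings. The 3-dimensional lookup table is purely illustrative; a real system would use pretrained word or sentence embeddings.

```python
# Document embedding as the average of its token embeddings (toy vectors).
import numpy as np

toy_embeddings = {
    "i":       np.array([0.1, 0.3, 0.0]),
    "like":    np.array([0.4, 0.1, 0.2]),
    "mangoes": np.array([0.2, 0.5, 0.7]),
}

def document_embedding(text: str) -> np.ndarray:
    vectors = [toy_embeddings[w] for w in text.lower().split() if w in toy_embeddings]
    return np.mean(vectors, axis=0)

print(document_embedding("I like mangoes"))
```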
Evaluation
• Let's consider just binary text classification tasks
• Imagine you're the CEO of Delicious Pie Company
• You want to know what people are saying about your pies
• So you build a "Delicious Pie" tweet detector
• Positive class: tweets about Delicious Pie Co
• Negative class: all other tweets
The 2-by-2 confusion matrix

                    gold positive      gold negative
system positive     true positives     false positives
system negative     false negatives    true negatives
Evaluation: Accuracy
• Why don't we use accuracy as our metric?
• Imagine we saw 1 million tweets
• 100 of them talked about Delicious Pie Co.
• 999,900 talked about something else
• We could build a dumb classifier that just labels every tweet "not about
pie"
• It would get 99.99% accuracy!!! Wow!!!!
• But useless! Doesn't return the comments we are looking for!
• That's why we use precision and recall instead
Evaluation: Precision
• % of items the system detected (i.e., items the system labeled as
positive) that are in fact positive (according to the human gold labels)
• Precision = true positives / (true positives + false positives)
Evaluation: Recall
• % of items actually present in the input that were correctly identified
by the system.
• Recall = true positives / (true positives + false negatives)
Why Precision and recall
• Our dumb pie-classifier
• Just label nothing as "about pie"
Accuracy=99.99%
but
Recall = 0
• (it doesn't get any of the 100 Pie tweets)
Precision and recall, unlike accuracy, emphasize true positives:
• finding the things that we are supposed to be looking for.
A combined measure: F
• F measure: a single number that combines P and R:
  F_β = (β² + 1) · P · R / (β² · P + R)
• We almost always use balanced F1 (i.e., β = 1):
  F1 = 2 · P · R / (P + R)
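A minimal sketch of computing precision, recall and F1 for the pie-detector example; the confusion-matrix counts below are assumed for illustration.

```python
# Precision, recall and F1 from confusion-matrix counts (illustrative values).
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)

def f1(p: float, r: float) -> float:
    return 2 * p * r / (p + r)

tp, fp, fn = 70, 10, 30     # e.g. 70 pie tweets found, 10 false alarms, 30 missed
p, r = precision(tp, fp), recall(tp, fn)
print(p, r, f1(p, r))       # 0.875 0.7 0.777...
```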


Development Test Sets ("Devsets") and
Cross-validation

Training set | Development (Dev) Set | Test Set

• Train on training set, tune on devset, report on testset


• This avoids overfitting (‘tuning to the test set’)
• More conservative estimate of performance
• But paradox: want as much data as possible for training, and as much for dev; how to
split?
Cross-validation: multiple splits
• Pool results over the splits and compute pooled dev performance
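A minimal sketch of k-fold cross-validation for the Naive Bayes text classifier from earlier, assuming scikit-learn; the eight labelled examples and the 4-way split are illustrative assumptions.

```python
# Cross-validation: pool scores over multiple train/dev splits.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = ["great pie", "delicious crust", "lovely apple pie", "best pie ever",
        "boring movie", "bad service", "terrible weather", "cold food"]
labels = ["pie", "pie", "pie", "pie", "not_pie", "not_pie", "not_pie", "not_pie"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
scores = cross_val_score(clf, docs, labels, cv=4)   # 4 splits
print(scores, scores.mean())                        # per-split and pooled accuracy
```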
Contents
• Introduction to PoS Tagging

43
Part-of-Speech Tagging
• Involves tagging each token with a part-of-speech (Eg. noun).
• Let’s say that we have only 6 tags – noun (NN), verb (VB), adjective (JJ),
adverb (RB), function word (FW) to represent all other words, and
punctuation (.) to represent all punctuation marks.
• Consider the following sentence.
• The quick brown fox jumped over the lazy dog.
• The tagged sentence is
• The_FW quick_JJ brown_JJ fox_NN jumped_VB over_FW the_FW lazy_JJ
dog_NN ._.

44


Calculation of Part-of-Speech Tags
• Find the best tag sequence T*, given the word sequence W:
  T* = argmax_T P(T | W)
• By Bayes’ theorem (dropping the constant P(W)):
  T* = argmax_T P(W | T) · P(T)
• We get (bigram assumption):
  P(T) ≈ P(t1) · Π P(ti | ti-1)   and   P(W | T) ≈ Π P(wi | ti)
• Here, P(t1) is the Initial Probability,
• P(ti | ti-1) is the Transition Probability,
• Similarly, P(wi | ti) is the Lexical Probability.

45


Training a Part-of-Speech Tagger
• 1. Use a part-of-speech tagged corpus (Eg. Brown Corpus)
• 2. For a set of T tags and a vocabulary of size V, learn the following
tables.
• Initial Probability table: |T| × 1
• Transition Probability table: |T| × |T|
• Lexical Probability table: |V| × |T|

46
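A minimal sketch of learning the three tables by relative-frequency counting over a toy tagged corpus; the two tagged sentences are illustrative (a real tagger would train on e.g. the Brown Corpus), and end-of-sentence handling is simplified.

```python
# Estimate initial, transition and lexical probability tables from tagged data.
from collections import Counter

corpus = [
    [("the", "FW"), ("dog", "NN"), ("barked", "VB"), (".", ".")],
    [("the", "FW"), ("lazy", "JJ"), ("dog", "NN"), ("slept", "VB"), (".", ".")],
]

initial, transition, lexical, tag_totals = Counter(), Counter(), Counter(), Counter()
for sentence in corpus:
    initial[sentence[0][1]] += 1
    for i, (word, tag) in enumerate(sentence):
        lexical[(word, tag)] += 1
        tag_totals[tag] += 1
        if i > 0:
            transition[(sentence[i - 1][1], tag)] += 1

# Relative-frequency estimates of entries in the three tables.
print(initial["FW"] / len(corpus))                   # P(t1 = FW)
print(transition[("JJ", "NN")] / tag_totals["JJ"])   # P(NN | JJ)
print(lexical[("dog", "NN")] / tag_totals["NN"])     # P(dog | NN)
```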


Common Tagsets
• Penn Treebank P.O.S. Tags (upenn.edu)
• BNC Tagset
• BNC: The BNC Basic (C5) Tagset (ox.ac.uk)
• BNC: List of Tags in the BNC Enriched Tagset (ox.ac.uk)
• Universal POS tags (universaldependencies.org)

47


Challenges of PoS Tagging
• Unknown Words are words which are not present at the time of
training.
• How to handle them?
• Solution:
• Consider a tag set of size |T|.
• Assign the unknown word the same lexical probability for every tag, i.e.
P(word | tag) = 1 / |T| (a uniform assumption over tags).

48


Evaluation of PoS Tagging
• Evaluation Method: Train – Test set / n-fold cross-validation
• Evaluation Metric: Accuracy / Precision, Recall, F-Score / Kappa
• Accuracy = No. of correctly tagged tokens / Total tokens
• Precision(T) = No. of times T is correctly tagged / No. of times a word
was tagged T
• Recall(T) = No. of times T is correctly tagged / No. of times the tag T is
present in the test set

49


Contents
• Named Entity Recognition

50
Named Entities
• Named entities are anything that can be referred to with a proper
name.
• Multiple class problem
• 3 classes – PER (person), LOC (location), ORG (organization)
• 4 classes – PER (person), LOC (location), ORG (organization), GPE (geo-political
entity)
• More classes – PER (person), LOC (location), ORG (organization), GPE (geo-
political entity) + classes for dates, times, numbers, prices, etc.
• Named entities are often multi-word phrases

51
Examples of Named Entities
Class Examples
Person Sandeep Mathias
Location Bengaluru
Organization Presidency University
Geo-political Entity Prime Minister of India

52
Named Entity Tagging
• The task of Named Entity Recognition (NER):
• Find spans of text that constitute a named entity.
• Tag the entity with the proper NER class.

53
NER Input
• Citing high fuel prices, United Airlines said Friday it has increased
fares by $6 per round trip on flights to some cities also served by
lower-cost carriers.
• American Airlines, a unit of AMR Corp., immediately matched the
move, spokesman Tim Wagner said.
• United, a unit of UAL Corp., said the increase took effect Thursday and
applies to most routes where it competes against discount carriers,
such as Chicago to Dallas and Denver to San Francisco.

54
NER – Finding NER Spans
• Citing high fuel prices, [United Airlines] said [Friday] it has increased
fares by [$6] per round trip on flights to some cities also served by
lower-cost carriers.
• [American Airlines], a unit of [AMR Corp.], immediately matched the
move, spokesman [Tim Wagner] said.
• [United], a unit of [UAL Corp.], said the increase took effect
[Thursday] and applies to most routes where it competes against
discount carriers, such as [Chicago] to [Dallas] and [Denver] to [San
Francisco].

55
NER Output
• Citing high fuel prices, [ORG United Airlines] said [TIME Friday] it has
increased fares by [MONEY $6] per round trip on flights to some cities
also served by lower-cost carriers.
• [ORG American Airlines], a unit of [ORG AMR Corp.], immediately
matched the move, spokesman [PER Tim Wagner] said.
• [ORG United], a unit of [ORG UAL Corp.], said the increase took effect
[TIME Thursday] and applies to most routes where it competes against
discount carriers, such as [LOC Chicago] to [LOC Dallas] and [LOC Denver]
to [LOC San Francisco].

56
Why NER?
• Sentiment analysis: consumer’s sentiment toward a particular
company or person?
• Question Answering: answer questions about an entity?
• Information Extraction: Extracting facts about entities from text

57
Why NER is not so easy
• Segmentation
• In PoS tagging, no segmentation, since each word gets 1 tag.
• In NER, we have to find the span before adding the tags!
• Type Ambiguity
• Multiple types can map to same span.
• [Washington] was born into slavery on the farm of James Burroughs.
• [Washington] went up 2 games to 1 in the four-game series.
• Blair arrived in [Washington] for what may well be his last state visit.
• In June, [Washington] legislators passed a primary seatbelt law.

58
Why NER is not so easy
• Segmentation
• In PoS tagging, no segmentation, since each word gets 1 tag.
• In NER, we have to find the span before adding the tags!
• Type Ambiguity
• Multiple types can map to same span.
• [PER Washington] was born into slavery on the farm of James Burroughs.
• [ORG Washington] went up 2 games to 1 in the four-game series.
• Blair arrived in [LOC Washington] for what may well be his last state visit.
• In June, [GPE Washington] legislators passed a primary seatbelt law.

59
Contents
• Statistical Machine Translation

60
Machine Translation
• Conversion of a text from one language to another using computers.
• Example: Translation from English to French.
• Input language is also known as source language, and output /
translated language is also known as target language.

61
Examples of MT
• Conversion from English to French
• SRC: The three rabbits of Grenoble
• TGT: Les trois lapins de Grenoble
• Conversion from English to Simple English
• SRC: Students should not procrastinate their assignment submissions.
• TGT: Students should not delay submitting their assignments.
• Conversion from Spoken English to English
• SRC: I bought a car for Rupees five lakhs.
• TGT: I bought a car for Rs. 5,00,000.
• Code mixing
62
Machine Translation Paradigms
• Rule-based MT: Using linguistic rules to perform translation
• Example: Plurals end with an “s” in English. Hence, “Hudugaru” (plural of
“Huduga” in Kannada) = “Boys” in English.
• Example-based MT: Translation by analogy
• Statistical-based MT: Using source – target language pairs / parallel
corpus to learn alignments.
• Neural MT: Uses an encoder-decoder architecture to learn
representations of the source and map it to the target language
representations.

63
Challenges of MT
• Ambiguity
• Same word, multiple meanings
• Same meaning, multiple words
• Word Order
• SOV to SVO?
• Morphological Richness
• Challenging for SMT systems!

64
Problems with Rule-based MT
• Requires linguistic knowledge of both languages.
• Maintenance of the system is challenging
• Difficult to handle ambiguity
• Scaling is difficult!

65
Statistical MT
• Model translation using a probabilistic model.
• Measure of confidence in the translations
• Modeling uncertainty in translations
• Using argmax:
• e* = argmax_e P(e | f)
• e* = best translation
• e = target language text
• f = source language text

66
Word Alignment
• Given a parallel corpus, we find word-level alignments
• Example:
• English: Narendra Modi is the Prime Minister of India
• Hindi: Bharat ke Pradhan Mantri, Narendra Modi Hain.
• Alignments:
• Narendra Modi (English) -> Narendra Modi (Hindi)
• Prime Minister of India -> Bharat ke Pradhan Mantri
• Is (English) -> Hain
• Prime Minister (English) -> Pradhan Mantri
• of India (English) -> Bharat ke
• ………
• ………
67
Word Alignment
• There can be multiple possible alignments.
• Example: Prime Minister -> Bhaarat ke (?)
• Another example: Narendra Modi -> Bhaarat ke (?)
• With one sentence pair, we cannot find alignments properly!
• We need a parallel corpus to find alignments using co-occurrence of
words.

68
Example of Word Alignment
• Consider a parallel corpus with 2 sentences:
• S1: “Three rabbits” = “Trois lapins”
• S2: “The rabbits of Grenoble” = “Les lapins de Grenoble”
• What all words can be aligned?
• What about “The rabbits of Bengaluru”?

69
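A minimal sketch of scoring candidate alignments by co-occurrence counts over the toy parallel corpus above; real SMT systems learn alignments with EM (e.g. IBM Model 1), so this is only an approximation.

```python
# Score candidate word alignments by relative co-occurrence in a parallel corpus.
from collections import Counter

parallel = [
    ("three rabbits", "trois lapins"),
    ("the rabbits of grenoble", "les lapins de grenoble"),
]

cooc, src_counts = Counter(), Counter()
for en, fr in parallel:
    for e in en.split():
        src_counts[e] += 1
        for f in fr.split():
            cooc[(e, f)] += 1

# "rabbits" co-occurs with "lapins" in both sentence pairs, so "lapins" gets
# the highest relative co-occurrence score among its candidates.
for f in ["trois", "lapins", "les", "de", "grenoble"]:
    print("rabbits ->", f, cooc[("rabbits", f)] / src_counts["rabbits"])
```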
Phrase Table
• Table of probabilities of phrases
• Phrase table is learnt with word alignments

English                   Hindi                                 Probability
Prime Minister of India   Bhaarat ke Pradhan Mantri             0.75
Prime Minister of India   Bhaarat ke Bhootpurv Pradhan Mantri   0.02
Prime Minister of India   Pradhan Mantri                        0.23

70
Challenges in PBSMT
• Divergent Word Order
• Rich morphology
• Named entities and OOV words
• To be covered in the last module…

71
