1. NLP-Intro
Shadakhtar:nlp:iiitd:2025:intro
Disclaimer
Course Objectives - NLP
● Understand the advantages and disadvantages of basic NLP techniques.
● Use and implement existing NLP models and analyze their performance on different datasets.
Logistics
Course format and evaluation policies - NLP 2024
● Project [20 points]
○ Details will be shared later
● Assignments - 4 [30 points]
● Quizzes - 2 [10 points]
○ Both quizzes will be **surprise**
○ No compensation/makeup quiz if you miss any.
Assignments
● Deadlines and extension
○ You’re encouraged to submit your assignments by the deadline.
○ You’re permitted to use 3 days of extension for one (and only one) assignment without any penalty.
○ Otherwise, 10% penalty for a delay of up to 7 days.
○ Otherwise, 25% penalty after 7 days, until the evaluation of the respective assignment.
● Bonus - 5 points
○ If you score >= 90% in the assignment evaluation without any extension.
High-level course outline
● Classical NLP
○ Words, Phrases, Sentences and their structures
○ Text processing, Morphology, Syntax and Dependency Parsing
● Neural NLP
○ Word representation
○ Neural Sequence Learning and Information Extraction
○ Neural Text classification
○ Neural Sequence Transformation and Text Generation
● NLP problems
○ PoS Tagging, NER, LM, Sentiment/Hate Speech Classification, NMT, Summarization, Dialog
Understanding and Generation, Question Answering
TAs
● Zeba Afroz
● Shivam Kumar
● Sourav Chakraborty
● Nalish Jain
● Sanmay Sood
● Ritesh Rajput
● Asees Khurana
● Manav Mittal
● Akash Kushwaha
● Mohammad Kaif
Office hours
● Office hours:
○ Tuesday, 5:00 to 6:00 PM
○ B-406, R&D Building
Course Management
• Google Classroom - Join using iiitd account only.
pzdkwuz
Useful resources/tools/libraries
● Natural Language Toolkit (NLTK)
● Stanford CoreNLP
● CMU ARK for Noisy Text
● Scikit-learn
● spaCy
● Stanza
● Shallow Parser - for Indian languages
● Universal Parser - Multi-lingual
● Hugging Face
Reading and Reference materials
● Books
○ Speech and Language Processing, Dan Jurafsky and James H. Martin
https://web.stanford.edu/~jurafsky/slp3/
○ Foundations of Statistical Natural Language Processing, Chris Manning and Hinrich Schütze
○ A Primer on Neural Network Models for Natural Language Processing, Yoav Goldberg
http://u.cs.biu.ac.il/~yogo/nnlp.pdf
● Journals
○ Computational Linguistics, Natural Language Engineering, TACL, KBS, ACM TALLIP, ....
● Conferences
○ ACL, EMNLP, NAACL, COLING, CoNLL, EACL, AACL, AAAI, IJCNLP, ICML, NeurIPS (formerly NIPS), WWW, KDD, SIGIR, ICON, ...
Research papers repository: https://aclanthology.org/
Research papers repository: https://arxiv.org/list/cs.CL/recent
Introduction
A few questions
● What is language?
● A means of communication
Natural Language Processing
● Natural Language Processing (NLP) is a subfield of Artificial Intelligence (AI) that deals with processing human language in textual form.
● The NLP domain is a collection of problems dealing with human languages.
● The prime objective is to provide a platform for human-machine interaction.
○ The machine should be capable enough to
■ understand the human language (e.g., English, Hindi, etc.),
■ process it, and
■ generate a response in human-understandable language.
Task | Input | Processing | Output
Sentiment Analysis | I adore this movie. | Lexicon lookup, Negation Handling, ... | Positive
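The "lexicon lookup, negation handling" pipeline in the row above can be sketched in a few lines. This is only a minimal illustration: the lexicon entries, the negator list, and the function name `sentiment` are assumptions made for this example, not part of any standard resource.

```python
import re

# Toy polarity lexicon; words and scores are illustrative assumptions.
LEXICON = {"adore": 1, "love": 1, "superb": 1, "good": 1,
           "hate": -1, "useless": -1, "bad": -1, "boring": -1}
NEGATORS = {"not", "no", "never", "n't"}


def sentiment(text):
    """Label text by summing lexicon polarities, flipping after a negator."""
    # Split "don't" into "do" + "n't" so the negator is its own token.
    tokens = re.findall(r"[a-z']+", text.lower().replace("n't", " n't"))
    score, negate = 0, False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True  # flip the polarity of the next sentiment word
            continue
        polarity = LEXICON.get(tok, 0)
        if polarity:
            score += -polarity if negate else polarity
            negate = False
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


print(sentiment("I adore this movie."))
print(sentiment("I do not adore this movie."))
```

A lookup like this has no notion of sarcasm or context, which is one reason learned models are preferred in practice.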
Turing Test [Alan Turing, 1950]
● Setup
○ Two rooms, two humans, and a computer.
■ Room 1: One human (C)
■ Room 2: One computer (A) and one human (B)
NLP perspectives
● Science perspective
○ Understand the phenomenon of the language
○ How do humans perceive a language?
● Engineering perspective
○ Build systems to understand, analyse, and generate language.
NLP = Linguistics + ML
What makes NLP challenging/interesting?
● What’s the difference between a programming language and natural language?
● Structure?
● Not really!
● Natural language can be structured, e.g., English follows the S+V+O rule
● Ram eats mangoes.
Ambiguity
● Is ambiguity present in language only?
● No, ambiguity is prevalent in every dimension!
Duck or Rabbit?
Ambiguity in language
● I saw a girl with a telescope.
○ Who has the telescope?
[Figure callout: "No ambiguity!"]
Ambiguity in language
● I saw a girl with a telescope.
● Mary had a little lamb.
Ambiguity in language
● I saw a girl with a telescope.
● Mary had a little lamb.
● Mujhe aapko mithai khilani padegi. ("I have to gift you some sweets.")
○ Who'll gift whom?
Ambiguity in language
● I saw a girl with a telescope.
● Mary had a little lamb.
● Mujhe aapko mithai khilani padegi.
● Public demand changes
[Figure: "Public demand: ABC" OR "Public demand: XYZ"]
(a) Public demand changes, but does anybody listen to them?
(b) Public demand changes, and we companies have to adapt to such changes.
(c) Public demand changes have pushed many companies out of business.
Ambiguity in language
● I saw a girl with a telescope.
● Mary had a little lamb.
● Mujhe aapko mithai khilani padegi.
● Public demand changes
● Baby changing room
[Figure: two readings of a "Baby changing room" sign, with IN and OUT labels]
Ambiguity in language
● I saw a girl with a telescope.
● Mary had a little lamb.
● Mujhe aapko mithai khilani padegi.
● Public demand changes
● Baby changing room
● I ate rice with a spoon.
● I ate rice with curd.
● I ate rice with Rahul.
Similar surface structures but different interpretations!
Ambiguity and Punctuation!
A woman without her man is nothing.
● A woman, without her man, is nothing.
● A woman: without her, man is nothing.
A very interesting case: Restrictive clause
Is it a valid sentence?
Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo
Buffalo buffalo, whom other Buffalo buffalo buffalo, buffalo Buffalo buffalo
I saw a boy wearing a red shirt. I saw a boy who was wearing a red shirt.
The sentence uses a restrictive clause, so there are no commas, nor is there the word "which," as in, "Buffalo buffalo, which Buffalo buffalo buffalo, buffalo Buffalo buffalo." This clause is also a reduced relative clause, so the word "that", which could appear between the second and third words of the sentence, is omitted.
Source: Dmitri Borgmann, Beyond Language: Adventures in Word and Thought, 1967.
Why is NLP difficult?
• Every language is unique and offers new challenges in terms of lexical, structural, and/or semantic properties.
• Languages evolve over time
◦ Social media language, Code-mixed language
◦ Addition of new words, new rules, etc.
• Some other challenges
◦ Lexical/Grammatical mistakes: “The sun rises in the east” vs “The son rises in the east.”
◦ Lack of context/World knowledge/Commonsense: “He put a turkey into the fridge.” vs “He put an elephant into the fridge.”
◦ Echo-formation: Khana-wana, Roti-shoti
◦ Phonetic reference: My refrigerator was making a khat khat khat khat khat sound the whole night.
Role of ML in NLP
● Ambiguity resolution through classification
● Choose among multiple classes
● Choose the one with the HIGHEST SCORE, a.k.a. probability
● For example, take the max over:
○ P(“The sun rises in the east”)
○ P(“The sun rise in the east”)
■ Less probable because of the grammatical mistake.
○ P(“The svn rises in the east”)
■ Less probable because of the lexical mistake.
○ P(“The sun rises in the west”)
■ Less probable because of the semantic mistake.
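The "choose the highest-probability candidate" idea can be sketched with a tiny bigram language model. Everything here is an assumption for illustration: the three-sentence corpus, add-one smoothing, and the names `log_prob` and `CANDIDATES`. A real model would be trained on far more text.

```python
import math
from collections import Counter

# Toy training corpus (an assumption for this sketch).
CORPUS = [
    "the sun rises in the east",
    "the sun sets in the west",
    "the moon rises in the east",
]


def bigrams(sentence):
    """Return the bigrams of a lowercased sentence with boundary markers."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return list(zip(tokens, tokens[1:]))


bigram_counts = Counter(bg for s in CORPUS for bg in bigrams(s))
context_counts = Counter(prev for s in CORPUS for prev, _ in bigrams(s))
# Vocabulary: corpus words plus the end marker and one unknown-word slot.
VOCAB_SIZE = len({w for s in CORPUS for w in s.split()} | {"</s>", "<unk>"})


def log_prob(sentence):
    """Log-probability under a bigram model with add-one smoothing."""
    return sum(
        math.log((bigram_counts[bg] + 1) / (context_counts[bg[0]] + VOCAB_SIZE))
        for bg in bigrams(sentence)
    )


CANDIDATES = [
    "The sun rises in the east",
    "The sun rise in the east",
    "The svn rises in the east",
    "The sun rises in the west",
]
best = max(CANDIDATES, key=log_prob)
print(best)
```

With this corpus the grammatical, lexical, and semantic variants all score lower than the first candidate, because each contains at least one bigram that is rarer or unseen.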
Standard NLP Tasks
NLP layers
● Understanding the semantics is a non-trivial task.
● A system needs to perform a series of incremental tasks to achieve this.
● NLP happens in layers
NLP trinity
Word and Token
● Word:
○ Smallest sequence of phonemes of a spoken language that can be uttered in isolation.
● Word Segmentation/Tokenization:
○ Breaking a string of characters into a sequence of words.
● Token:
○ Smallest sequence of graphemes delimited by some predefined characters (space, comma, full stop, etc.).
Ram, Shyam, and Mohan are playing. → [Ram] [,] [Shyam] [,] [and] [Mohan] [are] [playing] [.]
21,53,010 COVID cases in India. → [21] [,] [53] [,] [010] [COVID] [cases] [in] [India] [.]
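A delimiter-based tokenizer of the kind described above can be sketched with a single regular expression. The function name `tokenize` is an assumption; the pattern simply treats every punctuation mark as its own token, which is why the number 21,53,010 is split apart.

```python
import re


def tokenize(text):
    """Split text into runs of word characters and single punctuation marks."""
    # \w+ matches a run of letters/digits; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)


print(tokenize("Ram, Shyam, and Mohan are playing."))
print(tokenize("21,53,010 COVID cases in India."))
```

Splitting the number is usually undesirable, so practical tokenizers add special rules for numbers, dates, abbreviations, and URLs.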
Part-of-Speech (PoS)
● Grammatical class of the word.
He ate an apple .
PRP VBD DT NN .
● PoS disambiguation
○ A word can belong to different grammatical classes.
PRP VBD TO DT NN IN DT NN .
PRP VBD TO VB DT NN IN DT NN .
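PoS disambiguation can be illustrated with a toy rule-based tagger. The two sentences below are assumptions, chosen only because they produce the two tag sequences shown above; the lexicon and the single context rule (an ambiguous word right after TO is a verb) are likewise illustrative.

```python
# Toy lexicon: each word maps to its possible tags, most likely first.
# "park" is ambiguous between noun (NN) and verb (VB).
LEXICON = {
    "he": ["PRP"], "went": ["VBD"], "to": ["TO"], "the": ["DT"],
    "park": ["NN", "VB"], "car": ["NN"], "shed": ["NN"],
    "in": ["IN"], ".": ["."],
}


def tag(tokens):
    """Tag tokens left to right, using the previous tag to disambiguate."""
    tags = []
    for tok in tokens:
        options = LEXICON[tok.lower()]
        prev = tags[-1] if tags else None
        if "VB" in options and prev == "TO":
            tags.append("VB")        # "to park" -> infinitive verb
        else:
            tags.append(options[0])  # otherwise fall back to the default tag
    return tags


print(" ".join(tag("He went to the park in the car .".split())))
print(" ".join(tag("He went to park the car in the shed .".split())))
```

Statistical and neural taggers learn such context preferences from annotated data instead of hand-written rules.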
Chunking
● Identification of non-recursive phrases (noun, verb, etc.)
● E.g.,
○ He reckons the current account deficit will narrow to only # 1.8 billion in September.
[NP He] [VP reckons] [NP the current account deficit] [VP will narrow] [PP to] [NP only # 1.8
billion] [PP in] [NP September]
○ Mumbai green lights women icons on traffic signals earns global praise.
[NP Mumbai green lights women icons] [PP on] [NP traffic signals] [VP earns] [NP global
praise]
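A very rough chunker can be sketched by mapping each PoS tag to a phrase type and grouping consecutive tokens of the same type. The tag-to-chunk mapping and the PoS tags assigned to the first example sentence are assumptions for this sketch; real chunkers are trained on annotated data.

```python
from itertools import groupby

# Illustrative mapping from PoS tags to chunk types (an assumption).
CHUNK_OF = {}
for t in ("PRP", "DT", "JJ", "NN", "NNS", "NNP", "CD", "#", "RB"):
    CHUNK_OF[t] = "NP"
for t in ("MD", "VB", "VBD", "VBZ", "VBP"):
    CHUNK_OF[t] = "VP"
for t in ("IN", "TO"):
    CHUNK_OF[t] = "PP"


def chunk(tagged):
    """Group consecutive (word, tag) pairs that map to the same chunk type."""
    chunks = []
    for label, group in groupby(tagged, key=lambda wt: CHUNK_OF.get(wt[1])):
        words = " ".join(word for word, _ in group)
        if label is not None:  # skip tokens outside any chunk, e.g. "."
            chunks.append(f"[{label} {words}]")
    return " ".join(chunks)


TAGGED = [("He", "PRP"), ("reckons", "VBZ"), ("the", "DT"), ("current", "JJ"),
          ("account", "NN"), ("deficit", "NN"), ("will", "MD"), ("narrow", "VB"),
          ("to", "TO"), ("only", "RB"), ("#", "#"), ("1.8", "CD"),
          ("billion", "CD"), ("in", "IN"), ("September", "NNP"), (".", ".")]
print(chunk(TAGGED))
```

This grouping merges any two adjacent noun phrases into one, so it is only a starting point for understanding the task.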
Syntax Processing
● Validate the grammatical structure of the sentence.
● Let vocabulary = [the, mango, he, eats, ...]
○ He eats a mango. ✅
○ He mango eats a. ❌
[Figure: parse tree of "He eats a mango", with S → NP VP .]
Syntax Processing
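● Every language has a grammar G = <V, T, P, S>, where V is the set of non-terminals (variables), T the set of terminals (words), P the set of production rules, and S the start symbol.
[Figure: parse tree of "He eats a mango"]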
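To see how a grammar validates a sentence, here is a minimal CYK recognizer over a toy grammar in Chomsky normal form. The rules in `BINARY` and `LEXICAL` are assumptions written for this example (for instance, "he" is mapped directly to NP); they are not a grammar of English.

```python
# P: production rules of a toy grammar in Chomsky normal form.
BINARY = {("NP", "VP"): {"S"}, ("DT", "NN"): {"NP"}, ("VBZ", "NP"): {"VP"}}
LEXICAL = {"he": {"NP"}, "eats": {"VBZ"}, "a": {"DT"},
           "the": {"DT"}, "mango": {"NN"}}


def recognizes(sentence, start="S"):
    """Return True if the toy grammar derives the sentence (CYK algorithm)."""
    words = sentence.lower().rstrip(".").split()
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the non-terminals that derive words[i:j].
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(LEXICAL.get(word, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for left in table[i][k]:
                    for right in table[k][j]:
                        table[i][j] |= BINARY.get((left, right), set())
    return start in table[0][n]


print(recognizes("He eats a mango."))
print(recognizes("He mango eats a."))
```

Both sentences use the same vocabulary; only the first has a derivation from the start symbol S.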
Syntactic Ambiguity
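Two parse trees for "I saw a girl with a telescope.":
● PP attached to the noun phrase (the girl has the telescope):
(S (NP (PRP I)) (VP (VBD saw) (NP (DT a) (NN girl) (PP (IN with) (NP (DT a) (NN telescope))))) (. .))
● PP attached to the verb phrase (the telescope is used for seeing):
(S (NP (PRP I)) (VP (VBD saw) (NP (DT a) (NN girl)) (PP (IN with) (NP (DT a) (NN telescope)))) (. .))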
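Syntactic ambiguity can be made concrete by counting parse trees. The sketch below uses a CYK-style chart over a toy binarized grammar; the rules are assumptions (NP → NP PP and VP → VP PP stand in for the two attachment choices) and the function name `count_parses` is hypothetical.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY = [("S", "NP", "VP"), ("VP", "VBD", "NP"), ("VP", "VP", "PP"),
          ("NP", "DT", "NN"), ("NP", "NP", "PP"), ("PP", "IN", "NP")]
LEXICAL = {"i": ["NP"], "saw": ["VBD"], "a": ["DT"], "girl": ["NN"],
           "with": ["IN"], "telescope": ["NN"]}


def count_parses(sentence, start="S"):
    """Count the distinct parse trees the toy grammar assigns to a sentence."""
    words = sentence.lower().rstrip(".").split()
    n = len(words)
    counts = defaultdict(int)  # (i, j, symbol) -> number of trees over words[i:j]
    for i, word in enumerate(words):
        for symbol in LEXICAL.get(word, []):
            counts[(i, i + 1, symbol)] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right in BINARY:
                    counts[(i, j, parent)] += (
                        counts[(i, k, left)] * counts[(k, j, right)]
                    )
    return counts[(0, n, start)]


print(count_parses("I saw a girl with a telescope."))
print(count_parses("I saw a girl."))
```

The prepositional phrase can attach to either the noun phrase or the verb phrase, so the first sentence has more than one tree while the second has exactly one.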
Semantic Role Labelling (SRL)
● Identify the semantic role of each argument (noun phrase) w.r.t. the predicate (main
verb) of the sentence
Textual Entailment
● Determine whether one natural language sentence entails (implies) another under an
ordinary interpretation
(Ram hit Shyam with a hockey stick yesterday. → Shyam got hurt) Positive TE
(Ram hit Shyam with a hockey stick yesterday. → Shyam did not get hurt) Negative TE
(Ram hit Shyam with a hockey stick yesterday. → Shyam got hospitalized) non TE
(Eyeing the huge market potential, currently led by Google, Yahoo took over search company
Overture Services Inc last year. → Yahoo bought Overture.) Positive TE
Pragmatics
● Pragmatics considers [Thomas, 1995]:
○ the negotiation of meaning between speaker and listener.
○ the context of the utterance.
○ the intention of the user.
○ Intention:
■ Utterance: Can you pass the water bottle?
■ Literal meaning: Are you able to pass the water bottle? (Response: Yes, I can.)
■ Pragmatic meaning: Pass me the water bottle. (Response: Hand over the water bottle)
Discourse
● Processing of a sequence of sentences.
Mother said to John: Go to school. It is open today. Are you planning to bunk? Father
will be very angry.
Coreference Resolution
● Two referring expressions used to refer to the same entity are said to corefer.
● Determine which phrases in a document corefer.
John showed Bob his Toyota yesterday. It’s similar to the one I bought five years ago.
That was really nice, but he likes this one even better.
Victoria Chen, CFO of Megabucks Banking, saw her pay jump to $2.3 million, as the 38-year-old
became the company’s president. It is widely known that she came to Megabucks from rival
Lotsabucks.
[her, 38-year-old, she] → Referring to the same entity, i.e., Victoria Chen.
Information Extraction
● Extraction of relevant pieces of information
● Relation extraction:
○ Relation among entities
■ CEO(Sundar Pichai, Google), CEO(Sundar Pichai, Alphabet), Born-at(Sundar Pichai,
India), ParentOrg(Alphabet, Google)
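The output of relation extraction is usually stored as relation triples that can be queried later. A minimal sketch, with the `Relation` type and the `query` helper being hypothetical names introduced for this example, and the facts copied from the relations listed above:

```python
from collections import namedtuple

# A relation instance: Name(subject, object).
Relation = namedtuple("Relation", ["name", "subject", "object"])

FACTS = [
    Relation("CEO", "Sundar Pichai", "Google"),
    Relation("CEO", "Sundar Pichai", "Alphabet"),
    Relation("Born-at", "Sundar Pichai", "India"),
    Relation("ParentOrg", "Alphabet", "Google"),
]


def query(name=None, subject=None, obj=None):
    """Return all facts matching the given fields; None acts as a wildcard."""
    return [
        fact for fact in FACTS
        if (name is None or fact.name == name)
        and (subject is None or fact.subject == subject)
        and (obj is None or fact.object == obj)
    ]


# Which organisations is Sundar Pichai the CEO of?
print([fact.object for fact in query(name="CEO", subject="Sundar Pichai")])
```

The hard part of the task, finding these triples in raw text, is not shown here; this only illustrates the target representation.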
Word Sense Disambiguation (WSD)
● What does a word mean?
Sentiment Analysis
● Extract the polarity orientation of subjective text
○ Really superb pillow. Love to sleep on it.. very comfortable... Positive
○ It's a mass Chinese product. Too expensive. Thin and useless Negative
○ My neighbours are home and it’s good to wake up at 3am in the morning. Negative
Machine Translation
● Given a sentence in the source language L1, convert it to the target language L2, such that the semantics (adequacy and fluency) are preserved.
Summarization
● Given a document, summarize the semantics (extract relevant information) in shorter length text.
● Document
○ Sen. Barack Obama sealed the Democratic presidential nomination last night after a grueling
and history-making campaign against Sen. Hillary Rodham Clinton that will make him the first
African American to head a major-party ticket.
● Summary
○ Barack Obama is the Democratic presidential candidate.
Question Answering
● Answer natural language questions based on information presented in the repository.
● Factoid Questions
○ Question: Who is the author of the book Wings of Fire?
○ Answer: A. P. J. Abdul Kalam
● List Questions
○ Question: What are the islands in India?
○ Answer: Andaman Island, Nicobar Island, Labyrinth Island, Barren Island
● Descriptive Questions
○ Question: What is the greenhouse effect?
○ Answer: The analogy used to describe the ability of gases in the atmosphere to absorb heat
from the earth’s surface.
Dialog System and Chatbot
● Conversation between two or more parties.
Hate Speech
• Any post that targets a specific individual/group of people based on their ethnicity, religious beliefs, geographical belonging, race, etc., with malicious intentions of disseminating hate or emboldening violence.
• #BuildThatWall #BuildTheDamnWall I’m sorry my Lord #Jesus but people are
just deaf down here
• Women ... Can’t live with them...Can’t shoot them
• Related terms
• Insult, Abuse, Offensive, Provocative
Fake News
• A piece of information or an alleged claim that can be verified to be false.
• Intentionally created posts to spread malicious and false narratives
◦ Leverages the chaos/misinformation to quickly gain political, financial, or regional advantages
Thanks