1. NLP-Intro

The document outlines the CSE556 Natural Language Processing course, detailing its objectives, evaluation policies, and course logistics. It emphasizes understanding both basic and advanced NLP techniques, alongside practical implementation and analysis of existing models. The course also covers various NLP tasks, challenges, and the role of machine learning in resolving ambiguities in language.


CSE556

Natural Language Processing


Md Shad Akhtar
Shadakhtar:nlp:iiitd:2025:intro
Disclaimer

This course is not about ChatGPT!

But we will learn the techniques that underlie it.

Course Objectives - NLP
● Understand the advantages and disadvantages of basic NLP techniques.

● Use and implement existing NLP models and analyze their performance on different datasets.

● Understand the advantages and disadvantages of advanced NLP techniques.

● Design and implement models for advanced problems by leveraging real-world datasets.

Logistics

Course format and evaluation policies - NLP 2024
● Project [20 points]
○ Details will be shared later
● Assignments - 4 [30 points]
● Quizzes - 2 [10 points]
○ Both quizzes will be surprise quizzes.
○ No compensation or makeup quiz if you miss one.

● Mid-sem [20 points]

● End-sem [20 points]

Assignments
● Deadlines and extension
○ You’re encouraged to submit your assignments by the deadline.
○ You may use a 3-day extension, without penalty, for one (and only one) assignment.
○ Otherwise, a 10% penalty applies for a delay of up to 7 days.
○ Beyond 7 days, a 25% penalty applies until the respective assignment is evaluated.

● Bonus - 5 points
○ If you score >= 90% in the assignment evaluation without using any extension.

High-level course outline
● Classical NLP
○ Words, Phrases, Sentences and their structures
○ Text processing, Morphology, Syntax and Dependency Parsing
● Neural NLP
○ Word representation
○ Neural Sequence Learning and Information Extraction
○ Neural Text classification
○ Neural Sequence Transformation and Text Generation
● NLP problems
○ PoS Tagging, NER, LM, Sentiment/Hate Speech Classification, NMT, Summarization, Dialog Understanding and Generation, Question Answering

TAs
● Zeba Afroz
● Shivam Kumar
● Sourav Chakraborty
● Nalish Jain
● Sanmay Sood
● Ritesh Rajput
● Asees Khurana
● Manav Mittal
● Akash Kushwaha
● Mohammad Kaif

Office hours

● Office hours:
○ Tuesday, 5:00 to 6:00 PM
○ B-406, R&D Building

● TA office hours:
○ Will be shared later.

Course Management
• Google Classroom - Join using iiitd account only.

pzdkwuz
Useful resources/tools/libraries
● Natural Language Toolkit (NLTK)
● Stanford CoreNLP
● CMU ARK for Noisy Text
● Scikit-learn
● spaCy
● Stanza
● Shallow Parser - for Indian languages
● Universal Parser - Multi-lingual
● Hugging Face

Reading and Reference materials
● Books
○ Speech and Language Processing, Dan Jurafsky and James H. Martin
https://web.stanford.edu/~jurafsky/slp3/

○ Foundations of Statistical Natural Language Processing, Chris Manning and Hinrich Schütze

○ Natural Language Processing, Jacob Eisenstein
https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf

○ A Primer on Neural Network Models for Natural Language Processing, Yoav Goldberg
http://u.cs.biu.ac.il/~yogo/nnlp.pdf

● Journals
○ Computational Linguistics, Natural Language Engineering, TACL, KBS, ACM TALLIP, ....

● Conferences
○ ACL, EMNLP, NAACL, COLING, CoNLL, EACL, AACL, AAAI, IJCNLP, ICML, NIPS, WWW, KDD, SIGIR, ICON, ...

Research papers repository https://aclanthology.org/

Research papers repository https://arxiv.org/list/cs.CL/recent

Introduction

A few questions
● What is language?
○ A means of communication.

● What makes language so unique?
○ Language is what distinguishes humans from other animals; it is what makes a human, a human.

● What are the modes of language?
○ Visual, verbal, and written.
○ Language is very recent; vision has been around for millions of years.

Natural Language Processing
● Natural Language Processing (NLP) is a subfield of Artificial Intelligence (AI) that deals with processing human language in textual form.
● The NLP domain is a collection of problems dealing with human languages.
● The prime objective is to provide a platform for human-machine interaction.
○ The machine should be able to
■ understand the human language (e.g., English, Hindi, etc.),
■ process it, and
■ generate a response in human-understandable language.

Task | Input | Processing | Output
Sentiment Analysis | I adore this movie. | Lexicon lookup, negation handling, ... | Positive
Machine Translation | I have a cat. | Choosing an appropriate translation of each word, re-ordering the translated words as per the target language, ... | मेरे पास एक बिल्ली है। (mere paas ek billi hai.)
Question Answering | Who is the CEO of Google? | IR (searching for relevant documents in the repository), extracting the answer, ... | Sundar Pichai

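The sentiment-analysis row above names lexicon lookup and negation handling as processing steps. A minimal sketch of that idea follows; the word list, the scores, and the function name `sentiment` are illustrative assumptions, not part of the course material.

```python
# Toy sentiment lexicon; the words and scores are illustrative assumptions.
LEXICON = {"adore": 1, "love": 1, "superb": 1, "good": 1,
           "hate": -1, "useless": -1, "bad": -1, "boring": -1}
NEGATIONS = {"not", "no", "never"}


def sentiment(sentence):
    """Sum lexicon scores, flipping the first sentiment word after a negation."""
    score = 0
    negate = False
    for token in sentence.lower().replace(".", " ").replace(",", " ").split():
        if token in NEGATIONS:
            negate = True
            continue
        polarity = LEXICON.get(token, 0)
        if polarity:
            score += -polarity if negate else polarity
            negate = False  # the negation is consumed by this sentiment word
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


print(sentiment("I adore this movie."))
print(sentiment("I do not adore this movie."))
```

A lookup like this has no notion of context, so it cannot tell a "cheap" price from "cheap" service; that is one reason learned models are preferred in practice.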
Turing Test [Alan Turing, 1950]
● Setup
○ Two rooms, two humans, and a computer.
■ Room 1: One human (C)
■ Room 2: One computer (A) and one human (B)

● A response is generated from Room 2 (either by A or by B).
● C has to figure out the source of the response.
○ If C is successful → A failed the Turing test.
○ Else → A passed the Turing test.

NLP perspectives
● Science perspective
○ Understand the phenomenon of language.
○ How do humans perceive a language?

● Engineering perspective
○ Build systems to understand, analyse, and generate language.

NLP = Linguistics + ML

What makes NLP challenging/interesting?
● What’s the difference between a programming language and a natural language?
○ Structure?
○ Not really!
○ Natural language can be structured too, e.g., English follows the S+V+O (subject-verb-object) order:
■ Ram eats mangoes.

● The one-word answer is ambiguity!

Ambiguity
● Is ambiguity present in language only?
● No, ambiguity is prevalent in every dimension!

Duck or Rabbit?

Ambiguity in language
● I saw a girl with a telescope.
○ Who has the telescope: the speaker OR the girl?
● I saw a girl with a bicycle. / I saw a bus with a telescope.
○ No ambiguity!

Ambiguity in language
● Mary had a little lamb.
○ Two readings: had = owned OR had = ate.

Ambiguity in language
● Mujhe aapko mithai khilani padegi.
○ Who’ll gift whom?
○ I have to gift you some sweets. OR You have to gift me some sweets.

Ambiguity in language
● Public demand changes
[Figure: two readings, "Public demand: ABC" OR "Public demand: XYZ"]
(a) Public demand changes, but does anybody listen to them?
(b) Public demand changes, and we companies have to adapt to such changes.
(c) Public demand changes have pushed many companies out of business.

Ambiguity in language
● Baby changing room
[Figure: a "Baby changing room" sign with IN and OUT, illustrating the two readings]

Ambiguity in language
● I saw a girl with a telescope.
● Mary had a little lamb.
● Mujhe aapko mithai khilani padegi.
● Public demand changes
● Baby changing room
● I ate rice with a spoon.
● I ate rice with curd.
● I ate rice with Rahul.
○ Similar surface structures but different interpretations!

Ambiguity and Punctuation!
A woman without her man is nothing.
○ A woman, without her man, is nothing.
○ A woman: without her, man is nothing.

A very interesting case: Restrictive clause
Is it a valid sentence?
Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo

The word buffalo has three senses:
1. Noun: the animal (the plural is also buffalo)
2. Proper noun: an American city (Buffalo, New York)
3. Verb: to bully someone

Buffalo buffalo, whom other Buffalo buffalo buffalo, buffalo Buffalo buffalo

Compare: "I saw a boy wearing a red shirt." is the reduced form of "I saw a boy who was wearing a red shirt."

The sentence uses a restrictive clause, so there are no commas, nor is there the word "which," as in, "Buffalo buffalo, which Buffalo buffalo buffalo, buffalo Buffalo buffalo." This clause is also a reduced relative clause, so the word "that", which could appear between the second and third words of the sentence, is omitted.
Source: Dmitri Borgmann's Beyond Language: Adventures in Word and Thought. 1967.

Why is NLP difficult?
• Every language is unique, and each offers new challenges in terms of lexical, structural, and/or semantic properties.
• Languages evolve over time
◦ Social media language, code-mixed language
◦ Addition of new words, new rules, etc.
• Some other challenges
◦ Lexical/grammatical mistakes: “The sun rises in the east” vs “The son rises in the east.”
◦ Lack of context/world knowledge/commonsense: “He put a turkey into the fridge.” vs “He put an elephant into the fridge.”
◦ Echo-formation: Khana-wana, Roti-shoti
◦ Phonetic reference: My refrigerator was making a khat khat khat khat khat sound the whole night.

Role of ML in NLP
● Ambiguity resolution through classification
○ Choose among multiple classes
○ Choose the one with the HIGHEST SCORE, a.k.a. probability

● For example, take the max over:
○ P(“The sun rises in the east”)
○ P(“The sun rise in the east”)
■ Less probable because of the grammatical mistake.
○ P(“The svn rises in the east”)
■ Less probable because of the lexical mistake.
○ P(“The sun rises in the west”)
■ Less probable because of the semantic mistake.

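The "choose the highest-scoring candidate" idea above can be sketched with a tiny bigram language model. This is only an illustration under stated assumptions: the four-sentence training corpus is made up, add-one smoothing is one of several possible choices, and the names `CORPUS` and `log_prob` are hypothetical.

```python
import math
from collections import Counter

# Tiny made-up training corpus (an assumption for illustration only).
CORPUS = [
    "the sun rises in the east",
    "the sun sets in the west",
    "the moon rises in the east",
    "the sun rises every morning",
]

unigrams, bigrams = Counter(), Counter()
for line in CORPUS:
    tokens = ["<s>"] + line.split() + ["</s>"]
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))
VOCAB_SIZE = len(unigrams) + 1  # +1 reserves probability mass for unseen words


def log_prob(sentence):
    """Add-one smoothed bigram log-probability of a sentence."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + VOCAB_SIZE))
        for prev, cur in zip(tokens, tokens[1:])
    )


candidates = [
    "The sun rises in the east",
    "The sun rise in the east",
    "The svn rises in the east",
    "The sun rises in the west",
]
best = max(candidates, key=log_prob)
print(best)
```

With this corpus the first candidate scores highest. Note that the model prefers "east" over "west" only because the training data does; whatever grammatical, lexical, or semantic knowledge it appears to have comes from the corpus.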
Standard NLP Tasks

NLP layers
● Understanding the semantics is a non-trivial task.
● It requires performing a series of incremental tasks.
● NLP happens in layers.

NLP trinity

[Figure: NLP trinity diagram, annotated with DL]

Word and Token
● Word:
○ Smallest sequence of phonemes of a spoken language that can be uttered in isolation;
● Word Segmentation/Tokenization:
○ Breaking a string of characters into a sequence of words.
○ Smallest sequence of graphemes that are delimited with some predefined characters (space, comma, full stop, etc.);

Ram, Shyam, and Mohan are playing.
→ [Ram] [,] [Shyam] [,] [and] [Mohan] [are] [playing] [.]

21,53,010 COVID cases in India.
→ [21] [,] [53] [,] [010] [COVID] [cases] [in] [India] [.] ❌
→ [21,53,010] [COVID] [cases] [in] [India] [.] ✅

Check this out…https://www.abc.com
→ [Check] [this] [out] [.] [.] [.] [https] [:] [/] [/] [www] [.] [abc] [.] [com] ❌
→ [Check] [this] [out] [...] [https://www.abc.com] ✅

#GreatDayEver
→ [#] [Great] [Day] [Ever]

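The tokenization examples above can be approximated with an ordered regular expression, in which special cases (URLs, grouped numbers, ellipses) are tried before the generic word and punctuation rules. This is a rough sketch, not a production tokenizer; `TOKEN_PATTERN` and `tokenize` are hypothetical names, and the camel-case split of hashtags is an assumption made to mirror the last example.

```python
import re

# Ordered alternatives: earlier patterns win, so URLs and grouped numbers
# are matched whole before the generic word/punctuation rules apply.
TOKEN_PATTERN = re.compile(
    r"""
    https?://\S+                # URLs stay in one piece
    | \d+(?:,\d+)*(?:\.\d+)?    # numbers with digit grouping, e.g. 21,53,010
    | \#\w+                     # hashtags (split further in tokenize)
    | \.{3} | …                 # ellipsis: three dots or the single character
    | \w+                       # ordinary words
    | [^\w\s]                   # any other single punctuation mark
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Split text into tokens; hashtags become '#' plus camel-case pieces."""
    tokens = []
    for token in TOKEN_PATTERN.findall(text):
        if token.startswith("#") and len(token) > 1:
            tokens.append("#")
            tokens.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+|\d+", token[1:]))
        else:
            tokens.append(token)
    return tokens


print(tokenize("21,53,010 COVID cases in India."))
print(tokenize("Check this out…https://www.abc.com"))
print(tokenize("#GreatDayEver"))
```

One known limitation of this sketch: a URL followed directly by sentence punctuation (for example a trailing full stop) keeps that punctuation inside the URL token.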
Morphology
● Field of linguistics that studies the internal structure of words: how they are formed and their relationship to other words in the same language.
● It defines the rules for forming words from a root word.
● A morpheme is the smallest linguistic unit that has semantic meaning.
○ E.g.:
■ “Pre”, “ed”, “ing”, “s”, “es”, etc.
○ Dogs → dog + s (plural)
○ Going → go + ing (present participle)
○ Independently → independent + ly (adverb)
→ in + dependent + ly (negation)
→ in + depend + ent + ly (relying)
→ in + de + pend + ent + ly
■ Pend: (verb) to remain undecided or unsettled.

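The decompositions above can be imitated by recursively peeling affixes until a known root remains. The sketch below is a toy under clear assumptions: the affix lists and the root set are hand-picked for these examples, and `segment` is a hypothetical helper, not a real morphological analyzer.

```python
# Hand-picked affixes and roots (assumptions chosen to cover the examples).
PREFIXES = ("in", "un", "re", "pre")
SUFFIXES = ("ly", "ent", "ing", "ed", "es", "s")
ROOTS = {"dog", "go", "depend"}


def segment(word):
    """Return a list of morphemes for `word`, or None if no analysis is found."""
    if word in ROOTS:
        return [word]
    # Try peeling a suffix first, then a prefix, and recurse on the remainder.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            rest = segment(word[: -len(suffix)])
            if rest:
                return rest + [suffix]
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) > len(prefix):
            rest = segment(word[len(prefix):])
            if rest:
                return [prefix] + rest
    return None


for word in ("dogs", "going", "independently"):
    print(word, "->", segment(word))
```

Real analyzers also handle spelling changes at morpheme boundaries (for example "running" → run + ing), which plain string peeling cannot do.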
Morphology
● English, Chinese, etc. are commonly referred to as morphologically poor languages.
● Indian languages, Turkish, Hungarian, etc. are termed morphologically rich languages.

Part-of-Speech (PoS)
● Grammatical class of the word.

He   ate  an  apple  .
PRP  VBD  DT  NN     .

● PoS disambiguation
○ A word can belong to different grammatical classes.

He   went  to  the  park  in  a   car  .
PRP  VBD   TO  DT   NN    IN  DT  NN   .

They  went  to  park  the  car  in  the  shed  .
PRP   VBD   TO  VB    DT   NN   IN  DT   NN    .

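The "park" example above shows that the right tag depends on context. A toy sketch of this is a lexicon lookup plus a rule on the previous tag; the mini-lexicon, the two rules, and the name `tag` are assumptions for illustration, whereas real taggers learn such preferences from annotated data.

```python
# Hypothetical mini-lexicon: each word maps to its possible Penn Treebank tags.
LEXICON = {
    "he": ["PRP"], "they": ["PRP"], "went": ["VBD"], "to": ["TO"],
    "the": ["DT"], "a": ["DT"], "in": ["IN"], "car": ["NN"],
    "shed": ["NN"], "park": ["NN", "VB"], ".": ["."],
}


def tag(tokens):
    """Pick one tag per token, using the previous tag to settle NN/VB ambiguity."""
    tags = []
    for token in tokens:
        options = LEXICON.get(token.lower(), ["NN"])  # unknown words default to NN
        choice = options[0]
        if len(options) > 1 and tags:
            if tags[-1] == "TO" and "VB" in options:
                choice = "VB"  # "to park" -> verb
            elif tags[-1] == "DT" and "NN" in options:
                choice = "NN"  # "the park" -> noun
        tags.append(choice)
    return tags


print(tag("He went to the park in a car .".split()))
print(tag("They went to park the car in the shed .".split()))
```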
Chunking
● Identification of non-recursive phrases (noun, verb, etc.)
● E.g.,
○ He reckons the current account deficit will narrow to only # 1.8 billion in September.
[NP He] [VP reckons] [NP the current account deficit] [VP will narrow] [PP to] [NP only # 1.8 billion] [PP in] [NP September]

○ He went to the Indian city Mumbai.
[NP He] [VP went] [PP to] [NP the Indian city Mumbai]

○ Mumbai green lights women icons on traffic signals earns global praise.
[NP Mumbai green lights women icons] [PP on] [NP traffic signals] [VP earns] [NP global praise]

Syntax Processing
● Validate the grammatical structure of the sentence.
● Let vocabulary = [the, mango, he, eats, ...]
○ He eats a mango. ✅
○ He mango eats a. ❌

● The sequence of words must follow the grammatical structure of the language to form a valid sentence.
○ Construct a parse tree.

Parse tree: (S (NP (PRP He)) (VP (VBZ eats) (NP (DT a) (NN mango))) .)

Syntax Processing
● Every language has a grammar G = <V, T, P, S> (V: non-terminals, T: terminals, P: productions, S: start symbol).

Productions (P) or rules:
S → NP VP .
NP → PRP | NN | DT NP
VP → VBZ NP
PRP → He
VBZ → eats
DT → a
NN → mango

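The productions above are enough to check whether a word sequence is a valid sentence. The following sketch is a small top-down recognizer with backtracking, assuming lowercase terminals and a grammar without left recursion (true for these rules); `GRAMMAR`, `expand`, and `is_valid` are hypothetical names.

```python
# The slide's productions; terminals are lowercased for matching.
GRAMMAR = {
    "S":   [["NP", "VP", "."]],
    "NP":  [["PRP"], ["NN"], ["DT", "NP"]],
    "VP":  [["VBZ", "NP"]],
    "PRP": [["he"]],
    "VBZ": [["eats"]],
    "DT":  [["a"]],
    "NN":  [["mango"]],
}


def expand(symbol, tokens, start):
    """Yield every position where `symbol` can end when it starts at `start`."""
    if symbol not in GRAMMAR:  # terminal: must match the token at `start`
        if start < len(tokens) and tokens[start] == symbol:
            yield start + 1
        return
    for production in GRAMMAR[symbol]:
        ends = [start]
        for part in production:
            ends = [e2 for e in ends for e2 in expand(part, tokens, e)]
        yield from ends


def is_valid(sentence):
    """True if the whole sentence can be derived from the start symbol S."""
    tokens = sentence.lower().replace(".", " .").split()
    return len(tokens) in expand("S", tokens, 0)


print(is_valid("He eats a mango."))
print(is_valid("He mango eats a."))
```

The first call accepts the sentence and the second rejects it, matching the ✅/❌ judgments on the previous slide. A left-recursive rule would make this naive recognizer loop forever, which is why chart-based parsers are used in practice.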
Syntactic Ambiguity
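Parse 1 (PP attached to the noun phrase: the girl has the telescope):
(S (NP (PRP I)) (VP (VBZ saw) (NP (DT a) (NN girl) (PP (IN with) (NP (DT a) (NN telescope))))) .)

Parse 2 (PP attached to the verb phrase: the seeing is done with the telescope):
(S (NP (PRP I)) (VP (VBZ saw) (NP (DT a) (NN girl)) (PP (IN with) (NP (DT a) (NN telescope)))) .)
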
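The two attachments above can be counted mechanically with a CKY chart. This is a sketch under assumptions: the grammar is a binarized variant (VP → VP PP stands in for the ternary VP → VBZ NP PP, and NP → NP PP for the flat noun phrase), pronouns are mapped straight to NP, and `BINARY`, `LEXICAL`, and `count_parses` are hypothetical names.

```python
from collections import defaultdict

# Binarized toy grammar: (left child, right child) -> possible parents.
BINARY = {
    ("NP", "VP"): ["S"],
    ("VBZ", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],   # PP attaches to the verb phrase
    ("DT", "NN"): ["NP"],
    ("NP", "PP"): ["NP"],   # PP attaches to the noun phrase
    ("IN", "NP"): ["PP"],
}
LEXICAL = {
    "i": ["NP"], "saw": ["VBZ"], "a": ["DT"], "with": ["IN"],
    "girl": ["NN"], "telescope": ["NN"], "bicycle": ["NN"],
}


def count_parses(sentence):
    """Count the distinct parse trees rooted at S using a CKY chart."""
    tokens = sentence.lower().rstrip(".").split()
    n = len(tokens)
    # chart[i][j][X] = number of trees with root X spanning tokens[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, token in enumerate(tokens):
        for label in LEXICAL.get(token, []):
            chart[i][i + 1][label] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for left, left_count in list(chart[i][k].items()):
                    for right, right_count in list(chart[k][j].items()):
                        for parent in BINARY.get((left, right), []):
                            chart[i][j][parent] += left_count * right_count
    return chart[0][n]["S"]


print(count_parses("I saw a girl with a telescope."))
print(count_parses("I saw a girl with a bicycle."))
```

Both sentences get two parses from this purely syntactic grammar. People find the bicycle sentence unambiguous only because world knowledge rules out one attachment, which the grammar alone cannot express.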
Semantic Role Labelling (SRL)
● Identify the semantic role of each argument (noun phrase) w.r.t. the predicate (main verb) of the sentence.

[John]Agent drove [Mary]Patient from [Delhi]Source to [Pune]Destination in [his car]Instrument

[Ram]Agent hit [Shyam]Patient with [a hockey stick]Instrument [yesterday]Time

Textual Entailment
● Determine whether one natural language sentence entails (implies) another under an ordinary interpretation.

(Ram hit Shyam with a hockey stick yesterday. → Shyam got hurt.) Positive TE
(Ram hit Shyam with a hockey stick yesterday. → Shyam did not get hurt.) Negative TE
(Ram hit Shyam with a hockey stick yesterday. → Shyam got hospitalized.) Non-TE
(Eyeing the huge market potential, currently led by Google, Yahoo took over search company Overture Services Inc last year. → Yahoo bought Overture.) Positive TE

Pragmatics
● Pragmatics considers [Thomas, 1995]:
○ the negotiation of meaning between speaker and listener.
○ the context of the utterance.
○ the intention of the user.

○ Context/World knowledge: An employee coming late to the office.
■ Utterance: Do you know what time it is?
■ Literal meaning: Are you aware of the current time? (Response: Yes, it is 12:30 PM.)
■ Pragmatic meaning: Why are you coming so late? (Response: The reason for being late.)

○ Intention:
■ Utterance: Can you pass the water bottle?
■ Literal meaning: Are you able to pass the water bottle? (Response: Yes, I can.)
■ Pragmatic meaning: Pass me the water bottle. (Response: Hands over the water bottle.)

Discourse
● Processing of a sequence of sentences.

Mother said to John: Go to school. It is open today. Are you planning to bunk? Father will be very angry.

○ Discourse processing helps answer these questions:
■ What is open?
■ Bunk what?
■ Why will the father be angry?

Coreference Resolution
● Two referring expressions that refer to the same entity are said to corefer.
● Determine which phrases in a document corefer.

John showed Bob his Toyota yesterday. It’s similar to the one I bought five years ago. That was really nice, but he likes this one even better.

Victoria Chen, CFO of Megabucks Banking, saw her pay jump to $2.3 million, as the 38-year-old became the company’s president. It is widely known that she came to Megabucks from rival Lotsabucks.

[her, 38-year-old, she] → Referring to the same entity, i.e., Victoria Chen.

Information Extraction
● Extraction of relevant pieces of information.

● Named Entity Recognition (NER):
○ Identify names (proper nouns)
■ [India]Location-born [Sundar Pichai]Person is the CEO of [Google]Organization and its parent company [Alphabet]Organization.

● Relation extraction:
○ Relations among entities
■ CEO(Sundar Pichai, Google), CEO(Sundar Pichai, Alphabet), Born-at(Sundar Pichai, India), ParentOrg(Alphabet, Google)

Word Sense Disambiguation (WSD)
● What does a word mean?

○ The fisherman went to the bank.
■ Financial bank or river bank?

○ The fisherman went to the bank to withdraw money.

○ The fisherman went to the bank to fish.

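The "bank" examples above suggest that nearby words select the sense. One simple way to sketch this is an overlap count between the context and a signature word set per sense, in the spirit of the simplified Lesk approach. The signatures below are hand-written assumptions (a real system would draw them from a lexical resource), and `disambiguate` is a hypothetical helper.

```python
# Hand-written sense signatures; these word lists are illustrative assumptions.
SENSES = {
    "financial bank": {"money", "deposit", "withdraw", "loan", "account", "cash"},
    "river bank": {"river", "water", "fish", "shore", "boat", "stream"},
}


def disambiguate(sentence):
    """Return the sense with the largest context overlap, or None on a tie."""
    context = set(sentence.lower().replace(".", "").split())
    overlaps = {sense: len(words & context) for sense, words in SENSES.items()}
    best = max(overlaps.values())
    winners = [sense for sense, overlap in overlaps.items() if overlap == best]
    return winners[0] if len(winners) == 1 else None


print(disambiguate("The fisherman went to the bank."))
print(disambiguate("The fisherman went to the bank to withdraw money."))
print(disambiguate("The fisherman went to the bank to fish."))
```

The first sentence yields no decision because the context gives no evidence either way, which mirrors the ambiguity noted on the slide.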
Sentiment Analysis
● Extract the polarity orientation of subjective text.
○ Really superb pillow. Love to sleep on it.. very comfortable... (Positive)

○ It's a mass Chinese product. Too expensive. Thin and useless (Negative)

○ My neighbours are home and it’s good to wake up at 3am in the morning. (Negative)

○ Campus has deadly snakes. (Negative)

○ Shane Warne is a deadly spinner. (Positive)

○ The food was cheap. (Positive)

○ Not to mention the cheap service I got at the restaurant. (Negative)

○ Movie was 4 hrs long. (Neutral)

Machine Translation
● Given a sentence in the source language L1, convert it to the target language L2, such that the semantics (adequacy and fluency) are preserved.

[Figure: translation example. Source: Google Translate]

Machine Translation: Interesting examples
[Figures: translation screenshots from 2020, 2021, 2024, and 2025]

Summarization
● Given a document, summarize the semantics (extract relevant information) in a shorter text.

● Document
○ Sen. Barack Obama sealed the Democratic presidential nomination last night after a grueling and history-making campaign against Sen. Hillary Rodham Clinton that will make him the first African American to head a major-party ticket.

● Summary
○ Barack Obama is the Democratic presidential candidate.

Question Answering
● Answer natural language questions based on information presented in the repository.

● Factoid Questions
○ Question: Who is the author of the book Wings of Fire?
○ Answer: A. P. J. Abdul Kalam

● List Questions
○ Question: What are the islands in India?
○ Answer: Andaman Island, Nicobar Island, Labyrinth Island, Barren Island

● Descriptive Questions
○ Question: What is the greenhouse effect?
○ Answer: The analogy used to describe the ability of gases in the atmosphere to absorb heat from the earth’s surface.

Dialog System and Chatbot
● Conversation between two or more parties.

Hate Speech
• Any post that targets a specific individual/group of people based on their ethnicity, religious beliefs, geographical belonging, race, etc., with malicious intentions of disseminating hate or emboldening violence.
• #BuildThatWall #BuildTheDamnWall I’m sorry my Lord #Jesus but people are just deaf down here
• Women ... Can’t live with them...Can’t shoot them

• Related terms
• Insult, Abuse, Offensive, Provocative

Fake News
• A piece of information or an alleged claim that is verifiable to be false.
• Intentionally created posts to spread malicious and false narratives
◦ Leverages the chaos/misinformation to gain political, financial, or regional advantages in a short time

Thanks
