
CS772: Deep Learning for Natural Language Processing (DL-NLP)
Introduction
Pushpak Bhattacharyya
Computer Science and Engineering Department
IIT Bombay
Week 1 (week of 3rd Jan, 2022)
Nature of NLP
Natural Language Processing
Art, science and technique of making computers understand and generate language
NLP is layered processing, and multidimensional too

[Figure: The NLP Trinity — three dimensions: Problem (Morphology, POS tagging, Chunking, Parsing, Semantics, Discourse and Coreference; increasing complexity of processing), Language (Hindi, Marathi, English, French), and Algorithm (HMM, MEMM, CRF).]
Main Challenge: AMBIGUITY
An interesting WhatsApp conversation (English and Bengali)
Lady A: Yesterday you told me about a shop that sells artificial jewellery. <bn>ki naam jeno?</bn> (what did you say was the name?)
Lady B: nykaa
Lady A (offended): What do you mean Madam? Is this the way to talk?
Lady B: <bn>kena ki holo?</bn> (why, what happened?)
Root cause of the problem: Ambiguity!
• NE vs. non-NE ambiguity (proper noun vs. common noun)
• Aggravated by code mixing
• “Nykaa”: name of the shop
• Sounds similar to “ন্যাকা” (nyaakaa), meaning somebody “who feigns ignorance/innocence” in a derogatory sense
• An offensive word
[Image: NYKAA Fashion]
Ambiguity at every layer, for every
language, for every mode

[Figure: the NLP Trinity diagram repeated — the same Problem × Language × Algorithm axes as above.]
Multimodal is important

• Signals from other modes


• E.g., Sarcasm
Data + Classifier > Human decision maker!!

Case for ML-NLP
LEARN from data with probability-based scoring
• With LOTs of data, learn with
  – High precision (small possibility of errors of commission)
  – High recall (small possibility of errors of omission)
  (see the precision/recall sketch after this list)
• But depends on human-engineered features, i.e., capturing essential properties
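A minimal sketch (hypothetical tag labels and counts, standard definitions) of how precision and recall quantify errors of commission and omission:

# Precision/recall of one class from predicted vs. gold labels (illustrative values).
def precision_recall(gold, pred, positive="NN"):
    tp = sum(1 for g, p in zip(gold, pred) if p == positive and g == positive)
    fp = sum(1 for g, p in zip(gold, pred) if p == positive and g != positive)  # errors of commission
    fn = sum(1 for g, p in zip(gold, pred) if p != positive and g == positive)  # errors of omission
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

gold = ["NN", "VB", "NN", "JJ", "NN"]
pred = ["NN", "NN", "NN", "JJ", "VB"]
print(precision_recall(gold, pred))  # (0.666..., 0.666...)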
Modern Modus Operandi: End-to-End DL-NLP
[Figure: an example deep network for author identification.]
Problem Knowledge and Deep Learning
● Large number of parameters in DL-NLP: why?
● Fixing a large number of parameter values needs large amounts of data (text, for NLP).
● If we know the underlying distribution, then we can make predictions.
IMP: The number of needed parameters can be reduced by using knowledge.
NLP is Important
Cutting-edge applications
Large applications to reduce the problem of scale
• (A) Machine Translation (demo)
• (B) Information Extraction
• (C) Sentiment and Emotion Analysis
• Complexity and applicability increase with the requirement and introduction of multilinguality and multimodality
Dense Image Captioning
OCR-MT-TTS

• Input image: [image]
• English transcription: Take the risk or lose the chance
• Hindi translation: जोखिम लें या मौका गंवा दें।
• Hindi speech
Course: Basic Info
• Slot 1: Monday 8.30, Tuesday 9.30
and Thursday 10.30
• TA Team: Nihar Ranjan Sahoo,
Apoorva Nunna, Kunal Verma, Vishal
Pramanik, Harsh Peswani, Ankush
Agrawal
• http://www.cfilt.iitb.ac.in/~cs772-2022
• Channels of communication: MS
Teams, Moodle, Course Website
Evaluation Scheme (tentative)
• 50%: Reading, Thinking, Comprehending
  – Quizzes (25%) (at least 4)
  – Endsem (25%)
• 50%: Doing things, Hands-on
  – Assignments (25%)
  – Project (25%)
Course Content: Task vs. Technique Matrix
Techniques (columns):
• Rule-Based / Knowledge-Based
• Classical ML: Perceptron, Logistic Regression, SVM, Graphical Models (HMM, MEMM, CRF)
• Deep Learning: Dense FF with BP and softmax, RNN-LSTM, CNN
Tasks (rows):
• Morphology, POS, Chunking, Parsing, NER, MWE, Coref, WSD, Machine Translation, Semantic Role Labeling, Sentiment, Question Answering
Books
• 1. Dan Jurafsky and James Martin, Speech and Language Processing, 3rd Edition, 2019.
• 2. Ian Goodfellow, Yoshua Bengio and Aaron Courville, Deep Learning, MIT Press, 2016.
Books (2/2)
• 4. Christopher Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.
• 5. Pushpak Bhattacharyya, Machine Translation, CRC Press, 2017.
Journals and Conferences
• Journals: Computational Linguistics, Natural Language Engineering, Journal of Machine Learning Research (JMLR), Neural Computation, IEEE Transactions on Neural Networks
• Conferences: ACL, EMNLP, NAACL, EACL, AACL, NeurIPS, ICML
Useful NLP, ML, DL libraries

• NLTK
• scikit-learn
• PyTorch
• TensorFlow (Keras)
• Hugging Face
• spaCy
• Stanford CoreNLP

Nature of DL-NLP
The Trinity of NLP
[Figure: a triangle with vertices Linguistics, Probability, and Coding (DL).]
3 Generations of NLP
• Rule-based NLP, also called Model-Driven NLP
• Statistical ML-based NLP (Hidden Markov Model, Support Vector Machine)
• Neural (Deep Learning) based NLP
Illustration with POS tagging
Case of “present”

He gifted me the/a/this/that present.

They present innovative ideas.

He was present in the class.


Disambiguation of POS tag
• If there is no ambiguity, learn a table of words and their corresponding tags.
• If there is ambiguity, then look at the contextual information, i.e., look-back or look-ahead (a sketch of both cases follows).
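A minimal sketch (hypothetical toy lexicon) of the two cases — plain table look-up when a word has a single tag, and a fall-back to context when it does not:

# Toy tag lexicon: unambiguous words map to one tag, ambiguous words to several.
LEXICON = {"innovative": {"JJ"}, "ideas": {"NNS"}, "present": {"NN", "VB", "JJ"}}

def tag_word(word, prev_word=None, next_word=None):
    tags = LEXICON.get(word.lower(), {"NN"})      # unknown words default to NN
    if len(tags) == 1:                            # no ambiguity: table look-up suffices
        return next(iter(tags))
    # ambiguity: fall back to contextual information (look-back / look-ahead)
    if prev_word and prev_word.lower() in {"the", "a", "this", "that"}:
        return "NN"
    return sorted(tags)[0]                        # placeholder tie-break for this sketch

print(tag_word("ideas"))                          # NNS
print(tag_word("present", prev_word="the"))       # NN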

Table look-up will not do


best ADJ ADV NP V
better ADJ ADV V DET

close RB JJ VB NN (running close to the competitor, close escape, close the door, towards the close of the play)
cut V N VN VD
even ADV DET ADJ V
grant NP N V –
hit V VD VN N
lay ADJ V NP VD
left VD ADJ N VN
like CNJ V ADJ P –
near P ADV ADJ DET
open ADJ V N ADV
past N ADJ DET P
present ADJ ADV V N
read V VN VD NP
right ADJ N DET ADV
second NUM ADV DET N
set VN V VD N –
that CNJ V WH DET
Rule Based POS Tagging
• For present_NN (look-back)
  – If "present" is preceded by a determiner (the/a) or a demonstrative (this/that), then the POS tag will be noun.
• Does this rule guarantee 100% precision and 100% recall? (See the sketch after this list.)
  – False positive:
    • "The present_ADJ case is not convincing." Adjective, yet preceded by "the".
  – False negative:
    • "Present foretells the future." Noun, but not preceded by "the".
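A minimal sketch of this look-back rule (hypothetical helper name), showing both failure modes from the slide:

DETERMINERS = {"the", "a", "this", "that"}

def present_is_noun(tokens, i):
    # Look-back rule: tag "present" as NN if the previous token is a determiner/demonstrative.
    return i > 0 and tokens[i - 1].lower() in DETERMINERS

# Works: "He gifted me the present" -> NN
print(present_is_noun("He gifted me the present".split(), 4))             # True
# False positive: "The present case is not convincing" -> rule says NN, gold tag is ADJ
print(present_is_noun("The present case is not convincing".split(), 1))   # True (wrong)
# False negative: "Present foretells the future" -> rule says not-NN, gold tag is NN
print(present_is_noun("Present foretells the future".split(), 0))         # False (missed)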
Rule-based POS tagging is cumbersome: hence statistical POS tagging
ML-POS needs training data
(1) He gifted me the/a/this/that present_NN.
(2) They present_VB innovative ideas.
(3) He was present_JJ in the class.

POS options form a search graph


W: ^ Brown foxes jumped over the fence .
T: each word has several candidate tags (from the slide: JJ/NN for Brown, NNS/VBS for foxes, VBD for jumped, IN/NN/RB for over, DT for the, NN/VB for fence, with ^ and . as boundaries).
[Figure: the candidate tags of adjacent words are connected, forming a search graph (lattice) from ^ to .; every path through the lattice is one possible tag sequence for the sentence.]

Find the PATH with MAX Score.

What is the meaning of score?


Noisy Channel Model

W → [Noisy Channel] → T
(w_n, w_{n-1}, …, w_1) → (t_m, t_{m-1}, …, t_1)
Sequence W is transformed into sequence T.
T* = argmax_T P(T|W)
W* = argmax_W P(W|T)
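A standard intermediate step (not spelled out on the slide, added here to complete the reasoning) applies Bayes' rule so that the best tag sequence can be scored by a generative model, which is what the HMM on the next slide provides:

T* = argmax_T P(T|W) = argmax_T P(W|T) P(T) / P(W) = argmax_T P(W|T) P(T)

since P(W) is constant for a given input; P(W|T) supplies the lexical (emission) probabilities and P(T) the tag-bigram (transition) probabilities.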

HMM: Generative Model

^_^ People_N Jump_V High_R ._.
[Figure: the tag sequence ^ N V A . forms the states; adjacent tags are connected by bigram (transition) probabilities, and each tag emits its word (People, Jump, High) with a lexical probability.]
This model is called a generative model: here words are observed from the tags, which act as states. This is similar to an HMM.
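A minimal Viterbi sketch under these assumptions (toy, hand-set lexical and bigram probabilities, not the course's actual numbers), finding the max-score path through the tag lattice:

# Toy HMM: bigram (transition) and lexical (emission) probabilities, hand-set for illustration.
TRANS = {("^", "N"): 0.6, ("^", "V"): 0.4, ("N", "V"): 0.7, ("N", "N"): 0.3,
         ("V", "R"): 0.6, ("V", "N"): 0.4, ("R", "."): 0.9, ("N", "."): 0.5, ("V", "."): 0.5}
EMIT = {("N", "people"): 0.5, ("V", "people"): 0.1,
        ("N", "jump"): 0.2, ("V", "jump"): 0.6,
        ("R", "loudly"): 0.7, ("N", "loudly"): 0.05}
TAGS = ["N", "V", "R"]

def viterbi(words):
    best = {"^": (1.0, ["^"])}          # tag -> (score of best path ending in that tag, the path)
    for w in words:
        new = {}
        for t in TAGS:
            score, path = max(
                (s * TRANS.get((pt, t), 1e-6) * EMIT.get((t, w), 1e-6), p + [t])
                for pt, (s, p) in best.items()
            )
            new[t] = (score, path)
        best = new
    # close the path with the end-of-sentence marker
    score, path = max((s * TRANS.get((t, "."), 1e-6), p + ["."]) for t, (s, p) in best.items())
    return path, score

print(viterbi(["people", "jump", "loudly"]))   # (['^', 'N', 'V', 'R', '.'], ~0.048)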
CRF Based POS Tagging
Marathi
[Example: Marathi sentences glossed as "Man tried flying" (tags NN VG NN VBD) and "He started to walk" (tags PRP VINF NN VBD), each word also carrying a chunk label B (begin) or I (inside).]
Harshada Gune, Mugdha Bapat, Mitesh Khapra and Pushpak Bhattacharyya, Verbs are where all the Action Lies:
Experiences of Shallow Parsing of a Morphologically Rich Language, Computational Linguistics Conference
(COLING 2010), Beijing, China, August 2010.
Decoding for the best Sequence

(The decoding equation on this slide did not survive extraction; i ranges over the input positions. A standard reconstruction is sketched below.)
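A standard linear-chain CRF decoding objective of this form (my reconstruction, given as an illustration rather than the slide's exact formula) sums feature scores over the input positions i:

t* = argmax_t P(t | w) = argmax_t Σ_i Σ_k λ_k f_k(t_{i-1}, t_i, w, i)

where the f_k are feature functions, the λ_k their learned weights, and the argmax over tag sequences t is computed by Viterbi search over the lattice.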
DL based POS Tagging

[Figure: the input sentence "I love India" is fed to an Encoder, and a Decoder produces the tag sequence PRON VB NNP. A minimal code sketch follows.]
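A minimal sketch of such a neural tagger in PyTorch (simplified to an embedding + LSTM encoder with a per-position tag classifier; toy vocabulary and tag set, untrained):

import torch
import torch.nn as nn

# Toy vocabulary and tag set for illustration.
word2id = {"I": 0, "love": 1, "India": 2}
tag2id = {"PRON": 0, "VB": 1, "NNP": 2}

class NeuralTagger(nn.Module):
    def __init__(self, vocab_size, tagset_size, emb_dim=32, hidden_dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)           # word representations
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tagset_size)          # per-position tag scores

    def forward(self, word_ids):
        h, _ = self.encoder(self.emb(word_ids))                # encode the sentence
        return self.out(h)                                     # logits: (batch, seq_len, tags)

model = NeuralTagger(len(word2id), len(tag2id))
x = torch.tensor([[word2id[w] for w in ["I", "love", "India"]]])
logits = model(x)
print(logits.argmax(dim=-1))   # predicted tag ids (random until trained)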
How to input text to a neural net? The issue of REPRESENTATION
• Inputs have to be sets of numbers
  – We will soon see why
• These numbers form REPRESENTATIONS
• What is a good representation? At what granularity: words, n-grams, phrases, sentences?
Issues
• What is a good representation? At what granularity: words, n-grams, phrases, sentences?
• The sentence is important: (a) I bank with SBI; (b) I took a stroll on the river bank; (c) this bank sanctions loans quickly
• Each ‘bank’ should have a different representation
• We have to LEARN these representations
Principle behind representation
• Proverb: “A man is known by the company he keeps”
• Similarly: “A word is known/represented by the company it keeps”
• “Company” → Distributional Similarity


Representation: to learn or not to learn?
• 1-hot representation does not capture many nuances, e.g., semantic similarity (see the small sketch after this list)
  – But it is a good starting point
• Collocations also do not fully capture all the facets
  – But they are a good starting point
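A small sketch (toy three-word vocabulary) of why 1-hot vectors miss semantic similarity: every pair of distinct words is equally far apart.

import numpy as np

vocab = ["bank", "river", "loan"]                    # toy vocabulary
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# All distinct word pairs have cosine similarity 0, whatever their meanings.
print(cosine(one_hot["bank"], one_hot["loan"]))      # 0.0
print(cosine(one_hot["bank"], one_hot["river"]))     # 0.0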
So learn the representation…
• Learning objective: MAXIMIZE CONTEXT PROBABILITY (one common form of this objective is written out below)
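One common way to write this objective (a word2vec-style skip-gram formulation, given as an illustration rather than as the slide's exact formula): for a corpus of words w_1 … w_T and a context window of size c, maximize

J(θ) = Σ_{t=1..T} Σ_{-c ≤ j ≤ c, j ≠ 0} log P(w_{t+j} | w_t; θ)

where P(context | word) is typically a softmax over the vocabulary computed from the word and context embeddings.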
Neural LM
[Figure: a "Neural Probability Computer" takes the input "People laugh loudly" and outputs P(people laugh loudly).]

How does this happen?
We have to first get the representation in place:
• Word representation
• Phrase representation
• Sentence representation
• Long text representation
Feedforward Neural Language Model (FFNNLM): Bengio et al. 2003
FFNNLM
• V is the vocabulary size and m is the dimension of the feature vectors; word w_i is projected as the distributed feature vector C(w_i) ∈ R^m
• The input x of the FFNN is a concatenation of the feature vectors of the previous n - 1 words
• A softmax output layer guarantees that all the conditional probabilities of words are positive and sum to one
• The learning algorithm is Stochastic Gradient Descent (SGD) using the backpropagation (BP) algorithm (a PyTorch sketch follows this list)
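A minimal sketch of this architecture in PyTorch (toy sizes; the direct word-to-output connections of the original paper are omitted for brevity):

import torch
import torch.nn as nn

class FFNNLM(nn.Module):
    def __init__(self, V, m=50, n=4, hidden=100):
        super().__init__()
        self.n = n
        self.C = nn.Embedding(V, m)                    # projection C(w_i) in R^m
        self.hidden = nn.Linear((n - 1) * m, hidden)   # x = concatenation of n-1 feature vectors
        self.out = nn.Linear(hidden, V)                # scores over the vocabulary

    def forward(self, context_ids):                    # context_ids: (batch, n-1)
        x = self.C(context_ids).flatten(start_dim=1)   # concatenate the feature vectors
        h = torch.tanh(self.hidden(x))
        return torch.log_softmax(self.out(h), dim=-1)  # log P(w | previous n-1 words)

V = 10000
model = FFNNLM(V)
context = torch.randint(0, V, (2, 3))                  # 2 examples, n-1 = 3 context words
log_probs = model(context)                             # (2, V); train with NLL loss and SGD
print(log_probs.shape)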
Recurrent NN LM (RNNLM): Mikolov et al. 2010
RNNLM
• An RNN has an internal state that changes with the input at each time step, taking into account all previous contexts
• The state s_t can be derived from the input word vector w_t and the previous state s_{t-1} (written out below)
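A standard way to write this recurrence (the usual simple-RNN formulation; the slide's exact notation may differ):

s_t = f(U w_t + W s_{t-1})
y_t = g(V s_t)

where f is a non-linearity (e.g., sigmoid), g is a softmax producing the next-word distribution, and U, W, V are learned weight matrices (this V is the output weight matrix, not the vocabulary size).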
