CS772 Introduction: Week of 3 Jan 2022
[Figure: the NLP Trinity. Problems (Morphology, POS Tagging, Chunking, Parsing, Semantics, Discourse and Coreference) × Languages (Hindi, Marathi, English, French) × Algorithms (HMM, MEMM, CRF), with complexity of processing increasing from morphology up to discourse.]
Main Challenge: AMBIGUITY
An interesting WhatsApp conversation (English and Bengali)
Lady A: Yesterday you told me about a shop that sells artificial jewellery. <bn>ki naam jeno?</bn> (what did you say was the name?)
Lady B: nykaa
Lady A (offended): What do you mean, Madam? Is this the way to talk?
Lady B: <bn>kena ki holo?</bn> (why, what happened?)
Root cause of the problem: Ambiguity!
• NE vs. non-NE ambiguity (proper noun vs. common noun)
• Aggravated by code mixing
• “Nykaa”: name of the shop
• Sounds similar to “ন্যাকা” (nyaakaa),
meaning somebody “who feigns
ignorance/innocence” in a derogatory
sense
• An offensive word
[Image: NYKAA Fashion]
Ambiguity at every layer, for every
language, for every mode
Multimodal is important
• Input image: [figure omitted]
• Morphology
• POS
• Chunking
• Parsing
• NER, MWE
• Coref
• WSD
• Machine Translation
• Semantic Role Labeling
• Sentiment
• Question Answering
Books (1/2)
• 1. Dan Jurafsky and James Martin, Speech and Language Processing, 3rd Edition, 2019.
• 2. Ian Goodfellow, Yoshua Bengio and Aaron Courville, Deep Learning, MIT Press, 2016.
Books (2/2)
• 4. Christopher Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.
Tools
• NLTK
• Scikit-Learn
• Pytorch
• Tensorflow (Keras)
• Huggingface
• Spacy
• Stanford Core NLP
Nature of DL-NLP
The Trinity of NLP
Linguistics
– Example of a rule's false negative: "Present foretells the future." ("Present" is a noun, but it is not preceded by "the", so a rule such as "a noun is preceded by 'the'" misses it)
– Rule-based POS tagging is cumbersome; hence statistical POS tagging
– ML-based POS tagging needs training data:
(1) He gifted me the/a/this/that
present_NN.
(2) They present_VB innovative ideas.
(3) He was present_JJ in the class.
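As a quick, hedged illustration (not part of the original slides), the three-way ambiguity of "present" can be observed with NLTK's off-the-shelf English tagger, one of the tools listed earlier; a minimal sketch:

```python
# Minimal sketch: observing the POS ambiguity of "present" with NLTK's default
# English tagger (Penn Treebank tag set). One-time setup may be needed, e.g.
# nltk.download('punkt') and nltk.download('averaged_perceptron_tagger').
import nltk

sentences = [
    "He gifted me the present.",
    "They present innovative ideas.",
    "He was present in the class.",
]

for s in sentences:
    tagged = nltk.pos_tag(nltk.word_tokenize(s))
    # keep only the tag assigned to "present" in each context
    print([tag for word, tag in tagged if word.lower() == "present"])

# Expected: a noun tag (NN), a verb tag (VB/VBP), and an adjective tag (JJ),
# matching examples (1)-(3) above (exact tags may vary by NLTK version).
```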
T: ^ JJ NNS VBD NN DT NN .
[Figure: lattice of candidate tag sequences for the word sequence W: each word admits several possible tags (e.g., NN, NNS, VBD, JJ, RB, DT, IN, VB, .), and the tagger must choose the best path T through the lattice.]
Noisy Channel
The tag sequence T is transmitted through a noisy channel and observed as the word sequence W.
T* = argmax_T P(T | W) = argmax_T P(T) · P(W | T)   (Bayes' rule; P(W) is constant for the argmax)
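To make the argmax concrete, here is a minimal brute-force sketch that scores every candidate tag sequence by P(T) · P(W | T). The tag set and all probability tables are made-up toy values for illustration only, not course data; the bigram and lexical probabilities anticipate the lattice on the next slide.

```python
# Minimal sketch: brute-force noisy-channel decoding.
# Score each candidate tag sequence T for the observed words W by
# P(T) * P(W|T) under the usual HMM factorisation, and keep the best.
# All numbers below are toy values chosen only for illustration.
from itertools import product

TAGS = ["N", "V", "A"]  # toy tag set: noun, verb, adverb/adjective

# P(tag_i | tag_{i-1}) -- toy bigram (transition) probabilities; '^' starts the sentence
P_trans = {("^", "N"): 0.6, ("^", "V"): 0.2, ("^", "A"): 0.2,
           ("N", "N"): 0.2, ("N", "V"): 0.6, ("N", "A"): 0.2,
           ("V", "N"): 0.3, ("V", "V"): 0.1, ("V", "A"): 0.6,
           ("A", "N"): 0.5, ("A", "V"): 0.3, ("A", "A"): 0.2}

# P(word | tag) -- toy lexical (emission) probabilities; missing pairs count as 0
P_lex = {("people", "N"): 0.2, ("people", "V"): 0.01,
         ("laugh", "N"): 0.01, ("laugh", "V"): 0.2,
         ("loudly", "A"): 0.1}

def score(words, tags):
    """P(T) * P(W|T): product of bigram and lexical probabilities."""
    p, prev = 1.0, "^"
    for w, t in zip(words, tags):
        p *= P_trans[(prev, t)] * P_lex.get((w, t), 0.0)
        prev = t
    return p

W = ["people", "laugh", "loudly"]
T_star = max(product(TAGS, repeat=len(W)), key=lambda T: score(W, T))
print(T_star)   # -> ('N', 'V', 'A') with these toy numbers; 3^3 = 27 sequences tried
```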
[Figure: HMM lattice for a tag sequence such as ^ N V A . , with lexical probabilities P(word | tag) at the nodes and bigram tag-transition probabilities P(tag_i | tag_{i-1}) on the edges.]
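The exponential search in the previous sketch is what dynamic programming avoids; below is a minimal Viterbi sketch over such a lattice, reusing the same kind of toy P_trans / P_lex tables. It is an illustrative sketch under those assumptions, not the course's implementation.

```python
# Minimal Viterbi sketch: dynamic programming over the tag lattice, keeping for
# each position and tag only the best-scoring partial path. '^' is the start symbol.
def viterbi(words, tags, P_trans, P_lex):
    # best[i][t] = (probability of best path ending in tag t at word i, backpointer)
    best = [{t: (P_trans.get(("^", t), 0.0) * P_lex.get((words[0], t), 0.0), None)
             for t in tags}]
    for i in range(1, len(words)):
        col = {}
        for t in tags:
            prev, p = max(((pt, best[i - 1][pt][0] * P_trans.get((pt, t), 0.0))
                           for pt in tags), key=lambda x: x[1])
            col[t] = (p * P_lex.get((words[i], t), 0.0), prev)
        best.append(col)
    # backtrack from the highest-scoring final tag
    t = max(tags, key=lambda tag: best[-1][tag][0])
    path = [t]
    for i in range(len(words) - 1, 0, -1):
        t = best[i][t][1]
        path.append(t)
    return list(reversed(path))

# With the toy tables from the previous sketch:
# viterbi(["people", "laugh", "loudly"], ["N", "V", "A"], P_trans, P_lex) -> ['N', 'V', 'A']
```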
[Figure: chunking example: POS-tagged words (NN VG NN VBD) with B/I chunk labels (B B B I), for a sentence beginning "Man tried flying".]
[Figure: neural sequence labelling with an Encoder and a Decoder: input "I love India", output tags PRON VB NNP.]
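A minimal PyTorch sketch of that encoder-decoder picture for tagging "I love India". The vocabulary, tag set, layer sizes, and the simplification of the decoder to a per-token classifier are assumptions for illustration, and the weights are untrained, so this is a sketch of the idea rather than the course's model.

```python
# Minimal sketch: encode a sentence with an LSTM and decode one tag per token.
# Toy vocabulary, untrained weights; the "decoder" is simplified to a per-token
# classifier over the tag set (a full seq2seq decoder would generate tags step by step).
import torch
import torch.nn as nn

word2id = {"I": 0, "love": 1, "India": 2}
tag2id = {"PRON": 0, "VB": 1, "NNP": 2}
id2tag = {i: t for t, i in tag2id.items()}

class TaggerEncoderDecoder(nn.Module):
    def __init__(self, vocab_size, num_tags, emb=32, hid=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb)
        self.encoder = nn.LSTM(emb, hid, batch_first=True)
        self.decoder = nn.Linear(hid, num_tags)   # simplified decoder: tag scores per token

    def forward(self, token_ids):                 # token_ids: (batch, seq_len)
        h, _ = self.encoder(self.emb(token_ids))
        return self.decoder(h)                    # (batch, seq_len, num_tags)

model = TaggerEncoderDecoder(len(word2id), len(tag2id))
ids = torch.tensor([[word2id[w] for w in "I love India".split()]])
pred = model(ids).argmax(-1)[0]                   # untrained, so predictions are arbitrary
print([id2tag[int(i)] for i in pred])
```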
How to input text to a neural net? The issue of REPRESENTATION
• Inputs have to be sets of numbers
– We will soon see why
• Learning objective: MAXIMIZE CONTEXT PROBABILITY
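As a concrete version of "inputs have to be sets of numbers", here is a minimal sketch, with an assumed toy vocabulary and embedding size, of mapping words to integer ids and then to dense vectors:

```python
# Minimal sketch: text -> integer ids -> dense vectors (embeddings).
import torch
import torch.nn as nn

vocab = {"<unk>": 0, "people": 1, "laugh": 2, "loudly": 3}        # toy vocabulary
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)  # toy dimension

tokens = "people laugh loudly".split()
ids = torch.tensor([vocab.get(t, vocab["<unk>"]) for t in tokens])    # numbers, not strings
vectors = embedding(ids)   # shape (3, 8): one vector per token, learned during training
print(ids.tolist(), vectors.shape)
```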
Neural LM
[Figure: a neural probability computer takes the sentence "People laugh loudly" as input and outputs P(people laugh loudly).]
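One way to obtain such a sentence probability in practice is with a pretrained causal language model from Huggingface (listed among the tools above); a minimal sketch, with GPT-2 as an assumed model choice, where the sentence probability is the product of per-token conditional probabilities:

```python
# Minimal sketch: scoring P(People laugh loudly) with a pretrained causal LM.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tok("People laugh loudly", return_tensors="pt").input_ids
with torch.no_grad():
    # loss = mean negative log-likelihood of each token given its left context
    nll = model(ids, labels=ids).loss
# Labels are shifted internally, so only n-1 tokens are predicted; without a BOS
# token the first token's probability is not included in this estimate.
log_prob = -nll.item() * (ids.size(1) - 1)
print("log P(sentence) ≈", log_prob)
```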