1 - Intro - To - NLP 2
Introduction to NLP
Instructor: Moushmi Dasgupta
Some of the material is from Georgia Institute of Technology, Atlanta, GA, USA.
AGENDA
Introduction to NLP and Business Applications
1.1 What is Language?
1.2 Building Blocks of Language
1.3 Why is NLP Challenging?
1.4 Machine Learning, Deep Learning, and NLP: An Overview
1.5 Approaches to NLP in Business Analytics
Prerequisites:
1. Python programming
2. An understanding of Machine Learning
3. Invest in attending classroom sessions (1 or 2 classes per week, 3+ hours each)
4. Invest in yourself with 1 hour of self-study every day
Human Language
Google search reports that there are 7,151 living languages
The system of sounds and writing that human beings use to express their thoughts, ideas and feelings
Machine Translation
Natural Language Processing
▪ Email filters
▪ Smart assistants – Siri, Alexa, Google Assistant
▪ Search results
▪ Predictive text analytics
▪ Language translation
▪ Digital phone calls
▪ Data analysis
▪ Text analytics
Levels of Linguistic Knowledge
Phonetics, Phonology
▪ Pronunciation Modeling
Words
▪ Language Modeling
▪ Tokenization
▪ Spelling correction
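As a rough illustration of the tokenization task listed above, the sketch below splits a sentence into word and punctuation tokens with a regular expression. The pattern and the function name `simple_tokenize` are assumptions made for this example; practical tokenizers handle many more cases (contractions, URLs, emoji).

```python
import re

# Hypothetical pattern for illustration: a word (optionally with internal
# apostrophes or hyphens), or a single punctuation character.
TOKEN_PATTERN = re.compile(r"\w+(?:['-]\w+)*|[^\w\s]")

def simple_tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)

print(simple_tokenize("Tom lives in New York."))
# ['Tom', 'lives', 'in', 'New', 'York', '.']
```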
Morphology
▪ Morphological analysis
▪ Tokenization
▪ Lemmatization
Syntax
▪ Syntactic parsing
Semantics
Discourse
English Lexicon
meaning "of or for words."
Why Is NLP Hard?
1. Ambiguity
2. Scale
3. Sparsity
4. Variation
5. Expressivity
6. Unmodeled Variables
7. Unknown representations
Ambiguity
Part of Speech Tagging
Part-of-speech tagging is the process of converting a sentence into a list of words and then into a list of tuples, where each tuple has the form (word, tag). The tag is a part-of-speech tag and signifies whether the word is a noun, adjective, verb, and so on.
Sentences with all 8 Parts of Speech
1. Noun – Tom lives in New York.
2. Pronoun – Did she find the book she was looking for?
3. Verb – I reached home.
4. Adverb – The tea is too hot.
5. Adjective – The movie was amazing.
6. Preposition – The candle was kept under the table.
7. Conjunction – I was at home all day, but I am feeling very tired.
8. Interjection – Oh! I forgot to turn off the stove.
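The (word, tag) output format can be sketched with a toy lookup tagger. The tiny lexicon, the tag names, and the function `toy_pos_tag` are assumptions made purely for illustration; practical taggers are statistical or neural models trained on annotated text.

```python
# Toy lexicon mapping lowercase words to coarse part-of-speech tags.
# The entries and tag names are illustrative assumptions, not a standard tagset.
TOY_LEXICON = {
    "the": "DET", "tea": "NOUN", "is": "VERB", "too": "ADV", "hot": "ADJ",
    "movie": "NOUN", "was": "VERB", "amazing": "ADJ",
}

def toy_pos_tag(sentence):
    """Return a list of (word, tag) tuples; unknown words get the tag 'UNK'."""
    words = sentence.rstrip(".").split()
    return [(word, TOY_LEXICON.get(word.lower(), "UNK")) for word in words]

print(toy_pos_tag("The tea is too hot."))
# [('The', 'DET'), ('tea', 'NOUN'), ('is', 'VERB'), ('too', 'ADV'), ('hot', 'ADJ')]
```

A pure lookup like this cannot resolve words that take different tags in different contexts, which is one reason real taggers use the surrounding words.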
Syntax
Morphology + Syntax
A ship-shipping ship, shipping shipping-ships
Semantics
Syntax + Semantics
Dealing with Ambiguity
Corpora
• A corpus is a collection of text
• Often annotated in some way (sometimes just lots of text)
• Examples
  ▪ Penn Treebank: 1M words of parsed Wall Street Journal (WSJ) text
  ▪ Canadian Hansards: 10M+ words of French/English sentences
  ▪ Yelp reviews
  ▪ The Web!
Rosetta Stone
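To make the idea of an annotated corpus concrete, here is a minimal sketch of a corpus stored as a list of tagged sentences, together with a count of its tokens and word types. The sentences, tags, and variable names are made up for illustration and do not come from any of the corpora listed above.

```python
from collections import Counter

# A tiny hand-made "annotated corpus": each sentence is a list of (word, tag) pairs.
toy_corpus = [
    [("I", "PRON"), ("reached", "VERB"), ("home", "NOUN")],
    [("Tom", "NOUN"), ("reached", "VERB"), ("home", "NOUN")],
]

# Count how often each lowercase word form occurs across the whole corpus.
word_counts = Counter(word.lower() for sentence in toy_corpus for word, _ in sentence)

num_tokens = sum(word_counts.values())  # total running words
num_types = len(word_counts)            # distinct word forms
print(num_tokens, num_types)
# 6 4
```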
Sparsity
Variation
▪ Suppose we train a part-of-speech tagger or a parser on the Wall Street Journal
▪ What will happen if we try to use this tagger/parser for social media?
▪ “ikr smh he asked fir yo last name so he can add u on fb lololol”
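One way to see the problem is to measure how many tokens in the social media example would be unknown to a model that has only seen formal, newswire-style vocabulary. The small vocabulary below is a stand-in invented for this sketch, not the actual Wall Street Journal vocabulary, and `oov_rate` is a hypothetical helper.

```python
# Stand-in for a vocabulary learned from formal newswire text (illustrative only).
formal_vocab = {"he", "asked", "for", "your", "last", "name",
                "so", "can", "add", "you", "on"}

def oov_rate(tokens, vocab):
    """Return the fraction of tokens missing from vocab, plus those tokens."""
    unknown = [tok for tok in tokens if tok.lower() not in vocab]
    return len(unknown) / len(tokens), unknown

tweet = "ikr smh he asked fir yo last name so he can add u on fb lololol"
rate, unknown = oov_rate(tweet.split(), formal_vocab)
print(f"{len(unknown)} of {len(tweet.split())} tokens are out of vocabulary: {unknown}")
```

A tagger or parser has no training evidence for such unknown tokens, so its predictions on this kind of text tend to degrade.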
Expressivity
▪ Not only can one form have different meanings (ambiguity), but the same meaning can be expressed with different forms:
▪ She gave the book to Tom vs. She gave Tom the book
▪ Some kids popped by vs. A few children visited
▪ Is that window still open? vs. Please close the window
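The paraphrase pairs above can share many surface words or none at all, which a simple word-overlap measure makes visible. The sketch below uses Jaccard similarity over lowercase word sets; the function name `jaccard` is an assumption for this example, and word overlap is only a crude proxy for meaning.

```python
def jaccard(sentence_a, sentence_b):
    """Jaccard similarity between the lowercase word sets of two sentences."""
    words_a = set(sentence_a.lower().split())
    words_b = set(sentence_b.lower().split())
    return len(words_a & words_b) / len(words_a | words_b)

pairs = [
    ("She gave the book to Tom", "She gave Tom the book"),
    ("Some kids popped by", "A few children visited"),
]
for a, b in pairs:
    print(f"{jaccard(a, b):.2f}  {a!r} vs. {b!r}")
```

The first pair overlaps heavily and the second not at all, yet both pairs are paraphrases, so a model that relies on surface forms alone would miss the second.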
Unmodeled Variables
World knowledge
I dropped the glass on the floor and it broke
I dropped the hammer on the glass and it broke
Unknown Representations
It is very difficult to capture what the representation is, since we do not even know how to represent the knowledge a human has or needs:
▪ What is the “meaning” of a word or sentence?
▪ How to model context?
▪ Other general knowledge?
Desiderata for NLP Models
Symbolic and Probabilistic NLP
Probabilistic and Connectionist NLP
AI – ML – DL – NLP
NLP vs. Machine Learning
NLP vs. Linguistics
Fields with Connections to NLP
▪ Machine learning
▪ Linguistics (including psycho-, socio-, descriptive, and theoretical)
▪ Cognitive science
▪ Information theory
▪ Logic
▪ Data science
▪ Political science
▪ Psychology
▪ Economics
▪ Education
Today’s Applications
▪ Conversational agents
▪ Information extraction and question answering
▪ Machine translation
▪ Opinion and sentiment analysis
▪ Social media analysis
▪ Visual understanding
▪ Essay evaluation
▪ Mining legal, medical, or scholarly literature
Factors Changing NLP Landscape
Python Libraries for NLP
▪ NLP Pipeline