1
@graphific
Roelof Pieters
Deep Learning for Natural Language Processing: Word Embeddings
3 December 2015

KTH
www.csc.kth.se/~roelof/
roelof@kth.se
Language Understanding
2
Can we understand Language?
1. Language is ambiguous: every sentence has many possible interpretations.
2. Language is productive: we will always encounter new words or new constructions.
3. Language is culturally specific.
Some of the challenges in Language Understanding:
3
Can we understand Language?
1. Language is ambiguous: every sentence has many possible interpretations.
2. Language is productive: we will always encounter new words or new constructions.
• plays well with others
VB ADV P NN
NN NN P DT
• fruit flies like a banana
NN NN VB DT NN
NN VB P DT NN
NN NN P DT NN
NN VB VB DT NN
• the students went to class
DT NN VB P NN
4
Some of the challenges in Language Understanding:
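An off-the-shelf tagger will silently commit to just one of these readings. A minimal sketch (assuming NLTK plus its default English tokenizer and tagger data are installed; tags follow the full Penn Treebank set rather than the simplified labels above):

import nltk  # pip install nltk; then nltk.download('punkt') and nltk.download('averaged_perceptron_tagger')

for sentence in ["plays well with others",
                 "fruit flies like a banana",
                 "the students went to class"]:
    tokens = nltk.word_tokenize(sentence)
    print(nltk.pos_tag(tokens))  # one tag sequence per sentence; the ambiguity is hidden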
Can we understand Language?
1. Language is ambiguous: every sentence has many possible interpretations.
2. Language is productive: we will always encounter new words or new constructions.
5
Some of the challenges in Language Understanding:
[Karlgren 2014, NLP Sthlm Meetup]
6
Can we understand Language?
1. Language is ambiguous: every sentence has many possible interpretations.
2. Language is productive: we will always encounter new words or new constructions.
3. Language is culturally specific.
Some of the challenges in Language Understanding:
7
ML: Traditional Approach
1. Gather as much LABELED data as you can get
2. Throw some algorithms at it (mainly put in an SVM and keep it at that)
3. If you have actually tried more algorithms: pick the best
4. Spend hours hand-engineering some features / feature selection / dimensionality reduction (PCA, SVD, etc.)
5. Repeat…
For each new problem/question:
8
Machine Learning for NLP
Data
Classic Approach: Data is fed into a learning algorithm:
Learning Algorithm
9
Machine Learning for NLP
some of the (many) treebank datasets
source: http://www-nlp.stanford.edu/links/statnlp.html#Treebanks
10
Penn Treebank
That’s a lot of “manual” work:
11
• the students went to class
DT NN VB P NN
• plays well with others
VB ADV P NN
NN NN P DT
• fruit flies like a banana
NN NN VB DT NN
NN VB P DT NN
NN NN P DT NN
NN VB VB DT NN
With a lot of issues:
Penn Treebank
12
Machine Learning for NLP
Learning Algorithm
Data
“Features”
Prediction
Prediction/Classifier
train set
test set
13
Machine Learning for NLP
Learning Algorithm
“Features”
Prediction
Prediction/Classifier
train set
test set
14
One Model rules them all?
DL approaches have been successfully applied to:
Deep Learning: Why for NLP ?
Automatic summarization
Coreference resolution
Discourse analysis
Machine translation
Morphological segmentation
Named entity recognition (NER)
Natural language generation
Natural language understanding
Optical character recognition (OCR)
Part-of-speech tagging
Parsing
Question answering
Relationship extraction
Sentence boundary disambiguation
Sentiment analysis
Speech recognition
Speech segmentation
Topic segmentation and recognition
Word segmentation
Word sense disambiguation
Information retrieval (IR)
Information extraction (IE)
Speech processing
15
Deep Learning: Why for NLP ?
16
Deep Learning for Natural Language Processing: Word Embeddings
• What is the meaning of a word? (Lexical semantics)
• What is the meaning of a sentence? ([Compositional] semantics)
• What is the meaning of a longer piece of text? (Discourse semantics)
Semantics: Meaning
18
• NLP (rule-based/statistical approaches at least) treats words mainly as atomic symbols:
• or in vector space:
• also known as a “one hot” representation.
• Its problem? (see the sketch below)
Word Representation
Love Candy Store
[0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 …]
Candy [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 …] AND
Store [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 …] = 0!
19
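The problem in two lines of numpy: every pair of distinct words gets orthogonal vectors, so the representation says nothing about similarity (a toy sketch; vocabulary size and indices are made up to mirror the slide):

import numpy as np

vocab_size = 17
candy = np.zeros(vocab_size); candy[5] = 1    # one-hot vector for "Candy"
store = np.zeros(vocab_size); store[15] = 1   # one-hot vector for "Store"
print(np.dot(candy, store))                   # 0.0, however related the words are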
Word Representation
20
• Structure corresponds to meaning:
Structure and Meaning
21
• Semantics
• Syntax
22
NLP: what can we work with?
• Language models define probability distributions
over (natural language) strings or sentences
• Joint and Conditional Probability
Language Model
23
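A minimal sketch of both ideas: a bigram language model estimated with maximum-likelihood counts, scoring a sentence with the chain rule of probability (the two-sentence corpus is made up for illustration):

from collections import Counter

corpus = [["<s>", "the", "students", "went", "to", "class", "</s>"],
          ["<s>", "the", "students", "like", "the", "class", "</s>"]]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter(pair for sent in corpus for pair in zip(sent, sent[1:]))

def sentence_prob(sentence):
    # chain rule with a bigram (Markov) assumption: P(w1..wn) ≈ product of P(wi | wi-1)
    p = 1.0
    for w1, w2 in zip(sentence, sentence[1:]):
        p *= bigrams[(w1, w2)] / unigrams[w1]
    return p

print(sentence_prob(["<s>", "the", "students", "went", "to", "class", "</s>"]))  # 1/3 on this toy corpus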
• Language models define probability distributions
over (natural language) strings or sentences
Language Model
24
• Language models define probability distributions
over (natural language) strings or sentences
Language Model
25
Word senses
What is the meaning of words?
• Most words have many different senses: dog = animal or sausage?
How are the meanings of different words related?
• Specific relations between senses: Animal is more general than dog.
• Semantic fields: money is related to bank
26
Word senses
Polysemy:
• A lexeme is polysemous if it has different related
senses
• bank = financial institution or building
Homonyms:
• Two lexemes are homonyms if their senses are
unrelated, but they happen to have the same spelling
and pronunciation
• bank = (financial) bank or (river) bank
27
Word senses: relations
Symmetric relations:
• Synonyms: couch/sofa

Two lemmas with the same sense
• Antonyms: cold/hot, rise/fall, in/out

Two lemmas with the opposite sense
Hierarchical relations:
• Hypernyms and hyponyms: pet/dog

The hyponym (dog) is more specific than the hypernym
(pet)
• Holonyms and meronyms: car/wheel

The meronym (wheel) is a part of the holonym (car)
28
Distributional representations
“You shall know a word by the company it keeps”

(J. R. Firth 1957)
One of the most successful ideas of modern
statistical NLP!
these words represent banking
• Hard (class based) clustering models
• Soft clustering models
29
Distributional hypothesis
He filled the wampimuk, passed it around and we all drank some
We found a little, hairy wampimuk
sleeping behind the tree
(McDonald & Ramscar 2001)
30
Distributional semantics
Landauer and Dumais (1997), Turney and Pantel (2010), …
31
Distributional semantics
Distributional meaning as co-occurrence vector:
32
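A toy sketch of such a co-occurrence vector: for every word, count which other words show up within a small window around it (corpus and window size are made up for illustration):

from collections import Counter, defaultdict

corpus = "he filled the wampimuk and passed it around".split()
window = 2
cooc = defaultdict(Counter)
for i, word in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            cooc[word][corpus[j]] += 1

print(cooc["wampimuk"])   # this row of counts is the word's distributional vector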
Distributional representations
• Taking it further:
• Continuous word embeddings
• Combine vector space semantics with the
prediction of probabilistic models
• Words are represented as a dense vector:
Candy =
33
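Contrast with the one-hot case: as a dense vector, “Candy” is a short list of real numbers, and a related word can score high on cosine similarity (the numbers below are made up purely for illustration):

import numpy as np

candy = np.array([0.21, -0.43, 0.88, 0.10, -0.55])   # illustrative dense vector for "Candy"
store = np.array([0.18, -0.40, 0.75, 0.05, -0.60])   # illustrative dense vector for "Store"

cosine = np.dot(candy, store) / (np.linalg.norm(candy) * np.linalg.norm(store))
print(cosine)   # close to 1.0 here, instead of the hard 0 of one-hot vectors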
Word Embeddings: Socher
Vector Space Model
adapted from Bengio, “Representation Learning and Deep Learning”, July, 2012, UCLA
In a perfect world:
34
Word Embeddings: Socher
Vector Space Model
adapted from Bengio, “Representation Learning and Deep Learning”, July, 2012, UCLA
In a perfect world:
the country of my birth
the place where I was born
35
• Can theoretically (given enough units) approximate
“any” function
• and fit to “any” kind of data
• Efficient for NLP: hidden layers can be used as word
lookup tables
• Dense distributed word vectors + efficient NN
training algorithms:
• Can scale to billions of words!
Why Neural Networks for NLP?
36
• Representation of words as continuous vectors has a
long history (Hinton et al. 1986; Rumelhart et al. 1986;
Elman 1990)
• First neural network language model: NNLM (Bengio
et al. 2001; Bengio et al. 2003) based on earlier ideas of
distributed representations for symbols (Hinton 1986)
How?
37
Word Embeddings: Socher
Vector Space Model
Figure (edited) from Bengio, “Representation Learning and Deep Learning”, July, 2012, UCLA
In a perfect world:
the country of my birth
the place where I was born ?
…
38
Compositionality
Principle of compositionality: the “meaning (vector) of a complex expression (sentence) is determined by:
- the meanings of its constituent expressions (words) and
- the rules (grammar) used to combine them”
— Gottlob Frege (1848 - 1925)
39
• How do we handle the compositionality of language in
our models?
40
Compositionality
• How do we handle the compositionality of language in
our models?
• Recursion: the same operator (same parameters) is applied repeatedly on different components (a toy sketch follows below)
41
Compositionality
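The recursion idea in a few lines: one shared weight matrix combines any two child vectors into a parent vector, and the same matrix is reused at every node of the tree (a toy sketch with random, untrained vectors):

import numpy as np

d = 4
rng = np.random.default_rng(1)
W = rng.normal(scale=0.1, size=(d, 2 * d))   # one composition matrix, shared everywhere
b = np.zeros(d)

def compose(left, right):
    return np.tanh(W @ np.concatenate([left, right]) + b)

the, movie, was, good = (rng.normal(size=d) for _ in range(4))   # stand-in word vectors
phrase = compose(compose(the, movie), compose(was, good))        # same parameters at every node
print(phrase)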
• How do we handle the compositionality of language in
our models?
• Option 1: Recurrent Neural Networks (RNN)
42
RNN 1: Recurrent Neural Networks
• How do we handle the compositionality of language in
our models?
• Option 2: Recursive Neural Networks (also
sometimes called RNN)
43
RNN 2: Recursive Neural Networks
• achieved SOTA in 2011 on
Language Modeling (WSJ AR
task) (Mikolov et al.,
INTERSPEECH 2011):
• and again at ASRU 2011:
44
Recurrent Neural Networks
“Comparison to other LMs shows that RNN
LMs are state of the art by a large margin.
Improvements increase with more training data.”
“[ RNN LM trained on a] single core on 400M words in a few days,
with 1% absolute improvement in WER on state of the art setup”
Mikolov, T., Karafiat, M., Burget, L., Cernocky, J.H., Khudanpur, S. (2011). Recurrent neural network based language model
45
Recurrent Neural Networks
(simple recurrent neural network for LM)
input
hidden layer(s)
output layer
+ sigmoid activation function
+ softmax function:
Mikolov, T., Karafiat, M., Burget, L., Cernocky, J.H., Khudanpur, S. (2011). Recurrent neural network based language model
46
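One forward step of a simple (Elman-style) recurrent LM like the one sketched on the slide: one-hot input word, sigmoid hidden layer fed back into itself, softmax over the vocabulary. Sizes and weights below are toy placeholders, not Mikolov's actual setup:

import numpy as np

vocab_size, hidden_size = 10, 8
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(hidden_size, vocab_size))   # input  -> hidden
W = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (recurrence)
V = rng.normal(scale=0.1, size=(vocab_size, hidden_size))   # hidden -> output

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def step(word_id, h_prev):
    x = np.zeros(vocab_size); x[word_id] = 1    # one-hot input word
    h = sigmoid(U @ x + W @ h_prev)             # new hidden state
    y = softmax(V @ h)                          # P(next word | history so far)
    return h, y

h = np.zeros(hidden_size)
for w in [3, 1, 4]:                             # a toy word-id sequence
    h, probs = step(w, h)
print(probs.sum())                              # 1.0 — a distribution over the vocabulary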
Recurrent Neural Networks
backpropagation through time
47
Recurrent Neural Networks
backpropagation through time
class based recurrent NN
[code (Mikolov’s RNNLM Toolkit) and more info: http://rnnlm.org/]
• Recursive Neural
Network for LM (Socher
et al. 2011; Socher
2014)
• achieved SOTA on new
Stanford Sentiment
Treebank dataset (but
comparing it to many
other models):
Recursive Neural Network
48
Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C. (2013). Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
info & code: http://nlp.stanford.edu/sentiment/
Recursive Neural Tensor Network
49
code & info: http://www.socher.org/index.php/Main/ParsingNaturalScenesAndNaturalLanguageWithRecursiveNeuralNetworks
Socher, R., Liu, C.C., Ng, A.Y., Manning, C.D. (2011). Parsing Natural Scenes and Natural Language with Recursive Neural Networks
Recursive Neural Tensor Network
50
• RNN (Socher et al.
2011a)
Recursive Neural Network
51
Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C. (2013). Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
info & code: http://nlp.stanford.edu/sentiment/
• RNN (Socher et al.
2011a)
• Matrix-Vector RNN
(MV-RNN) (Socher et
al., 2012)
Recursive Neural Network
52
Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C. (2013). Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
info & code: http://nlp.stanford.edu/sentiment/
• RNN (Socher et al.
2011a)
• Matrix-Vector RNN
(MV-RNN) (Socher et
al., 2012)
• Recursive Neural
Tensor Network (RNTN)
(Socher et al. 2013)
Recursive Neural Network
53
• negation detection:
Recursive Neural Network
54
Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C. (2013). Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
info & code: http://nlp.stanford.edu/sentiment/
[parse-tree figure with labels: NP, PP/IN, NP, DT NN PRP$ NN]
Parse Tree
Recurrent NN for Vector Space
55
[parse-tree figure with labels: NP, PP/IN, NP, DT NN PRP$ NN; word-level tags IN DT NN PRP NN]
Parse Tree
Compositionality
56
Recurrent NN: Compositionality
Recurrent NN for Vector Space
[parse-tree figure with labels: NP, IN, NP, DT NN PRP NN]
Parse Tree
Compositionality
57
Recurrent NN: Compositionality
Recurrent NN for Vector Space
[parse-tree figure with labels: NP, IN, NP, DT NN PRP NN, PP, NP (S / ROOT), annotated with “rules” and “meanings”]
Compositionality
58
Recurrent NN: Compositionality
Recurrent NN for Vector Space
Vector Space + Word Embeddings: Socher
59
Recurrent NN: Compositionality
Recurrent NN for Vector Space
Vector Space + Word Embeddings: Socher
60
Recurrent NN for Vector Space
Word Embeddings: Turian (2010)
Turian, J., Ratinov, L., Bengio, Y. (2010). Word representations: A simple and general method for semi-supervised learning
code & info: http://metaoptimize.com/projects/wordreprs/
61
Word Embeddings: Turian (2010)
Turian, J., Ratinov, L., Bengio, Y. (2010). Word representations: A simple and general method for semi-supervised learning
code & info: http://metaoptimize.com/projects/wordreprs/
62
Word Embeddings: Collobert & Weston (2011)
Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P. (2011). Natural Language Processing (almost) from Scratch
63
Multi-embeddings: Stanford (2012)
Eric H. Huang, Richard Socher, Christopher D. Manning, Andrew Y. Ng (2012). Improving Word Representations via Global Context and Multiple Word Prototypes
64
Linguistic Regularities: Mikolov (2013)
code & info: https://code.google.com/p/word2vec/
Mikolov, T., Yih, W., & Zweig, G. (2013). Linguistic Regularities in Continuous Space Word Representations
65
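To reproduce these regularities yourself, one option is gensim's downloader with a pretrained model (a sketch; assumes gensim is installed, the "word2vec-google-news-300" package is available through its downloader, and a large ~1.5 GB download is acceptable):

import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")   # pretrained word2vec KeyedVectors
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))      # "queen" should rank near the top
print(vectors.most_similar(positive=["Paris", "Italy"], negative=["France"], topn=3))  # capital-of analogy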
Word Embeddings for MT: Mikolov (2013)
Mikolov, T., Le, Q.V., Sutskever, I. (2013). Exploiting Similarities among Languages for Machine Translation
66
Word Embeddings for MT: Kiros (2014)
67
Recursive Deep Models & Sentiment: Socher (2013)
Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C., Ng, A., Potts, C. (2013). Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank.
code & demo: http://nlp.stanford.edu/sentiment/index.html
68
Paragraph Vectors: Le & Mikolov (2014)
Le, Q., Mikolov, T. (2014). Distributed Representations of Sentences and Documents
69
• add context (sentence, paragraph, document) to word vectors during training
• results reported on the Stanford Sentiment Treebank dataset (a paragraph-vector sketch follows below)
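A hedged sketch of paragraph vectors in practice, using gensim's Doc2Vec (gensim >= 4 API assumed; the two tiny documents are made up, and real use needs a large corpus):

from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [TaggedDocument(words="the movie was wonderful and moving".split(), tags=["doc0"]),
        TaggedDocument(words="a dull and pointless film".split(), tags=["doc1"])]

model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=40)
print(model.dv["doc0"][:5])                                 # the learned paragraph vector
print(model.infer_vector("a wonderful film".split())[:5])   # vector inferred for unseen text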
Paragraph Vectors: Dai et al. (2014)
70
Paragraph Vectors: Dai et al. (2014)
71
Paragraph Vectors: Dai et al. (2014)
72
Global Vectors, GloVe: Stanford (2014)
Pennington, J., Socher, R., Manning, C.D. (2014). GloVe: Global Vectors for Word Representation
code & demo: http://nlp.stanford.edu/projects/glove/
results on the word analogy task vs. other embeddings: “similar accuracy”
73
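Pretrained GloVe vectors are easy to play with too, e.g. via gensim's downloader (a sketch; assumes gensim is installed and the "glove-wiki-gigaword-100" package is available):

import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-100")   # 100-dimensional GloVe vectors
print(glove.most_similar("frog", topn=5))
print(glove.similarity("ice", "steam"))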
Dependency-based Embeddings: Levy & Goldberg (2014)
Levy, O., Goldberg, Y. (2014). Dependency-Based Word Embeddings
code & demo: https://levyomer.wordpress.com/2014/04/25/dependency-based-word-embeddings/
- Syntactic Dependency Context
Australian scientist discovers star with telescope
- Bag of Words (BoW) Context
[precision/recall plot comparing dependency-based and bag-of-words contexts]
“Dependency-based embeddings have more functional similarities”
74
• LSTMs
• Attention
Wanna Play ?
Recent breakthroughs
75
• LSTMs
• Attention
Wanna Play ?
Recent breakthroughs
76
Wanna Play ?
LSTM
77
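Not the diagram on the slide, but the practical shape of an LSTM language model: an embedding lookup, a gated recurrent layer, and a softmax over the vocabulary. A generic Keras sketch (assumes keras/tensorflow is installed; vocab_size is a placeholder):

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

vocab_size = 10000                                # placeholder vocabulary size
model = Sequential([
    Embedding(vocab_size, 128),                   # word-id -> dense vector lookup table
    LSTM(256),                                    # gated recurrent layer (the LSTM cell)
    Dense(vocab_size, activation="softmax"),      # distribution over the next word
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
# model.fit(X, y) would train on (word-id sequence, next word-id) pairs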
• LSTMs
• Attention
Wanna Play ?
Recent breakthroughs
78
Attention
Gregor et al (2015) DRAW: A Recurrent Neural Network For Image Generation (arxiv) (code)
• Question-Answering Systems (&Memory)
• Summarization
• Text Generation
• Dialogue Systems
• Image Captioning & other multimodal tasks
Wanna Play ?
Recent breakthroughs
80
• Question-Answering Systems (&Memory)
• Summarization
• Text Generation
• Dialogue Systems
• Image Captioning & other multimodal tasks
Wanna Play ?
Recent breakthroughs
81
Wanna Play ?
QA & Memory
82
• Memory Networks (Weston et al 2015)
• Dynamic Memory Network (Kumar et al 2015)
• Neural Turing Machine (Graves et al 2014)
Facebook
Metamind
DeepMind
Weston et al (2015) Memory Networks (arxiv)
QA & Memory
83
Iyyer et al. (2014) A Neural Network for Factoid Question Answering over Paragraphs (paper)
Wanna Play ?
QA & Memory
84
• Memory Networks (Weston et al 2015)
• Dynamic Memory Network (Kumar et al 2015)
• Neural Turing Machine (Graves et al 2014)
Facebook
Metamind
DeepMind
Zaremba & Sutskever (2015) Learning to Execute (arxiv)
Wanna Play ?
QA & Memory
85
bAbI Dataset
• Question-Answering Systems (&Memory)
• Summarization
• Text Generation
• Dialogue Systems
• Image Captioning & other multimodal tasks
Wanna Play ?
Recent breakthroughs
86
Wanna Play ?
Text generation
87
Karpathy (2015), The Unreasonable Effectiveness of Recurrent Neural Networks (blog)
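The flavour of the char-rnn demo in a few lines: sample the next character from the network's softmax output, with a temperature knob controlling how adventurous the generated text gets (a sketch; the probability vector here is made up, in the real model it comes from the trained RNN):

import numpy as np

def sample(probs, temperature=1.0):
    logits = np.log(probs) / temperature          # rescale the distribution
    p = np.exp(logits) / np.exp(logits).sum()
    return np.random.choice(len(p), p=p)          # index of the sampled character

probs = np.array([0.5, 0.3, 0.15, 0.05])          # toy P(next char) over a 4-char alphabet
print(sample(probs, temperature=0.5))             # low temperature: conservative choices
print(sample(probs, temperature=1.5))             # high temperature: more surprising text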
Karpathy (2015), The Unreasonable Effectiveness of Recurrent Neural Networks (blog)
• Question-Answering Systems (&Memory)
• Summarization
• Text Generation
• Dialogue Systems
• Image Captioning & other multimodal tasks
Wanna Play ?
Recent breakthroughs
91
Image-Text Embeddings
92
Socher et al (2013) Zero Shot Learning Through Cross-Modal Transfer (info)
Image-Captioning
• Andrej Karpathy, Li Fei-Fei, 2015. Deep Visual-Semantic Alignments for Generating Image Descriptions (pdf) (info) (code)
• Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, 2015. Show and Tell: A Neural Image Caption Generator (arxiv)
• Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (arxiv) (info) (code)
“A person riding a motorcycle on a dirt road.”???
Image-Captioning
“Two hockey players are fighting over the puck.”???
Image-Captioning
“A stop sign is flying in blue skies.”
“A herd of elephants flying in the blue skies.”
Elman Mansimov, Emilio Parisotto, Jimmy Lei Ba, Ruslan Salakhutdinov, 2015. Generating Images from Captions with Attention (arxiv) (examples)
Image-Captioning
• TensorFlow: Recently released library by Google. http://tensorflow.org
• Theano - CPU/GPU symbolic expression compiler in Python (from LISA lab at University of Montreal). http://deeplearning.net/software/theano/
• Caffe - Computer Vision oriented Deep Learning framework: caffe.berkeleyvision.org
• Torch - Matlab-like environment for state-of-the-art machine learning algorithms in Lua (from Ronan Collobert, Clement Farabet and Koray Kavukcuoglu). http://torch.ch/
• more info: http://deeplearning.net/software_links/
Wanna Play ? General Deep Learning
97
• RNNLM (Mikolov): http://rnnlm.org
• NB-SVM: https://github.com/mesnilgr/nbsvm
• Word2Vec (skipgrams/cbow): https://code.google.com/p/word2vec/ (original), http://radimrehurek.com/gensim/models/word2vec.html (python)
• GloVe: http://nlp.stanford.edu/projects/glove/ (original), https://github.com/maciejkula/glove-python (python)
• Socher et al / Stanford RNN Sentiment code: http://nlp.stanford.edu/sentiment/code.html
• Deep Learning without Magic Tutorial: http://nlp.stanford.edu/courses/NAACL2013/
Wanna Play ? NLP
98
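And to train your own vectors rather than download them, a minimal gensim Word2Vec (skip-gram) sketch (parameter names follow gensim >= 4; the three-sentence corpus is made up, real use needs far more text):

from gensim.models import Word2Vec

sentences = [["the", "students", "went", "to", "class"],
             ["fruit", "flies", "like", "a", "banana"],
             ["the", "students", "like", "the", "class"]]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)
print(model.wv["students"][:5])                    # the learned dense vector
print(model.wv.most_similar("students", topn=2))   # nearest neighbours in this toy space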
Questions?
roelof@kth.se
www.csc.kth.se/~roelof/
99
Code & Papers:
Collaborative Open Computer Science
.com
@graphific
