Unsupervised Learning of Sentence Embeddings using
Compositional n-Gram Features
2018/07/12 M1 Hiroki Shimanaka
Matteo Pagliardini, Prakhar Gupta and Martin Jaggi
NAACL 2018
Abstract & Introduction (1)
• Unsupervised training of word representations, such as Word2Vec [Mikolov et al., 2013], is now routinely performed on very large amounts of raw text data, and such embeddings have become ubiquitous building blocks of a majority of current state-of-the-art NLP applications.
• While very useful semantic representations are available for words, it remains challenging to produce and learn such semantic embeddings for longer pieces of text.
Abstract & Introduction (2)
• A strong trend in deep learning for NLP leads towards increasingly powerful and complex models.
  ➢ While extremely strong in expressiveness, the increased model complexity makes such models much slower to train on larger datasets.
• On the other hand, simpler "shallow" models such as matrix factorizations (or bilinear models) can benefit from training on much larger sets of data, which can be a key advantage, especially in the unsupervised setting.
Abstract & Introduction (3)
• The authors present a simple but efficient unsupervised objective to train distributed representations of sentences.
• Their method outperforms the state-of-the-art unsupervised models on most benchmark tasks, highlighting the robustness of the produced general-purpose sentence embeddings.
Approach
Their proposed model (Sent2Vec) can be seen as an extension of the C-BOW [Mikolov et al., 2013] training objective to train sentence instead of word embeddings.
Sent2Vec is a simple unsupervised model allowing to compose sentence embeddings using word vectors along with n-gram embeddings, simultaneously training composition and the embedding vectors themselves.
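Concretely, the composition amounts to averaging the source vectors of a sentence's unigrams and n-grams. Below is a minimal sketch of that idea (hypothetical code, not the authors' implementation; the extract_ngrams helper and the toy embedding table are illustrative assumptions):

    # Sketch of Sent2Vec-style composition: a sentence embedding is the average
    # of the source embeddings of all unigrams and n-grams in the sentence.
    import numpy as np

    def extract_ngrams(tokens, n_max=2):
        """Return the list R(S) of n-grams (including unigrams) in the sentence."""
        ngrams = []
        for n in range(1, n_max + 1):
            for i in range(len(tokens) - n + 1):
                ngrams.append(" ".join(tokens[i:i + n]))
        return ngrams

    def sentence_embedding(tokens, source_vectors, dim=100, n_max=2):
        """Average the source vectors of all n-grams present in the sentence."""
        grams = [g for g in extract_ngrams(tokens, n_max) if g in source_vectors]
        if not grams:
            return np.zeros(dim)
        return np.mean([source_vectors[g] for g in grams], axis=0)

    # Toy usage with random vectors standing in for trained embeddings.
    rng = np.random.default_rng(0)
    vocab = ["the", "cat", "sat", "the cat", "cat sat"]
    source_vectors = {w: rng.normal(size=100) for w in vocab}
    v_s = sentence_embedding(["the", "cat", "sat"], source_vectors)
    print(v_s.shape)  # (100,)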
Model (1)
• Their model is inspired by simple matrix factorization (bilinear) models, such as those recently used with great success in unsupervised learning of word embeddings.
  U ∈ ℝ^{k×h}: target word vectors (k: output dimension)
  V ∈ ℝ^{h×|𝒱|}: learnt source word vectors
  𝒱: vocabulary
  h: hidden size
  ι_S ∈ {0, 1}^{|𝒱|}: binary vector encoding S
  S: sentence
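The slide's objective formula itself was lost in extraction; assuming it matches the general matrix factorization objective of the Sent2Vec paper (its Eq. 1), it can be reconstructed as:

    \min_{U, V} \; \sum_{S \in \mathcal{C}} f_S\big(U V \iota_S\big)

where 𝒞 is the training corpus of sentences and f_S is the per-sentence cost function (a soft-max or negative-sampling loss in the C-BOW case).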
Model (2)
• Conceptually, Sent2Vec can be interpreted as a natural extension of the word-contexts from C-BOW [Mikolov et al., 2013] to a larger sentence context.
  R(S): the list of n-grams (including unigrams) present in sentence S
  ι_{R(S)} ∈ {0, 1}^{|𝒱|}: binary vector encoding R(S) (here 𝒱 is the vocabulary extended with n-grams)
  S: sentence
  v_w: source (or context) embedding of w
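The sentence embedding this legend refers to is the average of the source embeddings of the sentence's n-grams; reconstructed from these definitions (and matching the paper):

    v_S := \frac{1}{|R(S)|} \, V \, \iota_{R(S)} \;=\; \frac{1}{|R(S)|} \sum_{w \in R(S)} v_w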
Model (3)
• Negative sampling
  N_{w_t}: the set of words sampled negatively for the word w_t ∈ S
  S: sentence
  v_w: source (or context) embedding
  u_w: target embedding
  f_w: the normalized frequency of w in the corpus
Model (4)
• Subsampling
  ➢ To select the possible target unigrams (positives), they use subsampling as in [Joulin et al., 2017; Bojanowski et al., 2017], each word w being discarded with probability 1 − q_p(w) (see the sketch below).
  N_{w_t}: the set of words sampled negatively for the word w_t ∈ S
  S: sentence
  v_w: source (or context) embedding
  u_w: target embedding
  q_n(w) := √f_w / ∑_{wᵢ∈𝒱} √f_{wᵢ}
  f_w: the normalized frequency of w in the corpus
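A minimal sketch of both sampling steps (hypothetical code; the threshold t and the word2vec-style keep probability √(t/f_w) + t/f_w are assumptions, since the slide does not give q_p explicitly):

    # Sketch of target subsampling and negative sampling.
    import numpy as np

    rng = np.random.default_rng(0)
    t = 1e-5  # subsampling threshold (assumed hyperparameter)

    def keep_prob(f_w, t=t):
        """word2vec-style keep probability q_p(w); a word is discarded with 1 - q_p(w)."""
        return min(1.0, np.sqrt(t / f_w) + t / f_w)

    def negative_distribution(freqs):
        """q_n(w) proportional to the square root of the normalized frequency f_w."""
        q = np.sqrt(freqs)
        return q / q.sum()

    freqs = np.array([0.5, 0.3, 0.15, 0.05])            # normalized frequencies f_w
    q_n = negative_distribution(freqs)
    negatives = rng.choice(len(freqs), size=10, p=q_n)  # sample N_{w_t}
    keep = [rng.random() < keep_prob(f) for f in freqs] # subsample positives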
Experimental Setting (1)
• Dataset:
  • the Toronto Book Corpus (70M sentences)
  • Wikipedia sentences and tweets
• Dropout:
  • For each sentence they use dropout on its list of n-grams R(S) ∖ U(S), where U(S) is the set of all unigrams contained in sentence S.
  • They find that dropping K n-grams (n > 1) for each sentence gives superior results compared to dropping each token with some fixed probability (see the sketch after this list).
• Regularization:
  • Applying L1 regularization to the word vectors.
• Optimizer:
  • SGD
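A sketch of the described n-gram dropout (hypothetical code, not the authors' implementation): drop K n-grams (n > 1) from each sentence's list, never dropping the unigrams U(S).

    # Drop K multi-word n-grams per sentence; unigrams are always kept.
    import random

    def dropout_ngrams(ngrams, K=2, seed=None):
        """Remove up to K multi-word n-grams from the list; unigrams are kept."""
        rnd = random.Random(seed)
        unigrams = [g for g in ngrams if " " not in g]
        higher = [g for g in ngrams if " " in g]
        rnd.shuffle(higher)
        return unigrams + higher[min(K, len(higher)):]

    print(dropout_ngrams(["the", "cat", "sat", "the cat", "cat sat"], K=1, seed=0))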
Experimental Setting (2)
Evaluation Tasks
• Transfer tasks:
  • Supervised
    • MR: movie review sentiment
    • CR: product reviews
    • SUBJ: subjectivity classification
    • MPQA: opinion polarity
    • TREC: question type classification
    • MSRP: paraphrase identification
  • Unsupervised
    • SICK: semantic relatedness task
    • STS: semantic textual similarity
Experimental Results (1)
Experimental Results (2)
Experimental Results (3)
Conclusion
• In this paper, they introduce a novel, computationally efficient, unsupervised, C-BOW-inspired method to train and infer sentence embeddings.
• On supervised evaluations, their method, on average, achieves better performance than all other unsupervised competitors with the exception of Skip-Thought [Kiros et al., 2015].
• However, their model is generalizable, extremely fast to train, simple to understand and easily interpretable, showing the relevance of simple and well-grounded representation models in contrast to the models using deep architectures.
• Future work could focus on augmenting the model to exploit data with ordered sentences.