Attention Mechanism in Language
Understanding and its Applications
by Rajarshee Mitra, Research Engineer, Artifacia
(@rajarshee_mitra)
March 25, 2017
AI Meet|
Agenda
1. What is Seq2Seq?
2. Challenges in vanilla Seq2Seq
3. Attention - introduction
4. Attention (contd.)
5. Attention - microscopic view
6. Visualizing attention
7. More applications of attention
8. References
What is Seq2Seq?
● Consists of two different RNNs: an encoder and a decoder.
● The encoder encodes the source sentence into a context vector.
● The decoder decodes that vector to generate language.
● Applications: NMT, summarization, conversations, etc.
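The encoder-decoder loop above can be sketched in a few lines. This is a minimal illustration with randomly initialised vanilla RNNs in numpy (in practice the weights are learned and the cells are LSTMs/GRUs); all names and sizes here are illustrative, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
H, E, V = 16, 8, 10          # hidden size, embedding size, vocab size

# Illustrative parameters; a real model learns these by backprop.
W_xh = rng.normal(0, 0.1, (H, E))
W_hh = rng.normal(0, 0.1, (H, H))
W_out = rng.normal(0, 0.1, (V, H))
embed = rng.normal(0, 0.1, (V, E))

def rnn_step(x, h):
    """One vanilla-RNN step: new hidden state from input and old state."""
    return np.tanh(W_xh @ x + W_hh @ h)

def encode(src_ids):
    """Run the encoder over the source sentence; the final hidden
    state is the single 'context vector' of vanilla seq2seq."""
    h = np.zeros(H)
    for t in src_ids:
        h = rnn_step(embed[t], h)
    return h

def decode(context, max_len=5, bos=0):
    """Greedy decoding: initialise the decoder with the context
    vector and emit one token id per step."""
    h, tok, out = context, bos, []
    for _ in range(max_len):
        h = rnn_step(embed[tok], h)
        tok = int(np.argmax(W_out @ h))   # greedy pick at each step
        out.append(tok)
    return out

context = encode([1, 4, 2, 7])   # toy source sentence as token ids
print(decode(context))           # a list of generated token ids
```

Note how the whole source sentence is funnelled through the single `context` vector handed to the decoder; this bottleneck is exactly what the next slide criticises.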
Challenges in vanilla Seq2Seq
● It is hard for the encoder to compress the whole source sentence into a
single vector.
● Performance deteriorates rapidly as sentence length increases.
● A single context vector for generating every word in the decoder does
not produce the best results.
Attention
● Does not squash the whole source sentence into a single vector.
● While generating each target word, weighs some source word vectors
more heavily than others.
● It is an intuitive process, comparable to how we read.
● While inferring something from a piece of text -- like answering
questions -- we pay more attention to some words than to others each
time.
● Eventually, the machine learns where to attend more and where to
attend less.
● E.g., while translating an English sentence into French, the fourth word
in the French sentence can be highly correlated with the first word in
the English sentence.
● Hence, it is not very useful to consider the whole English sentence
every time a French word is generated.
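The "pay more attention to some words" idea reduces to a softmax over similarity scores. A minimal sketch with dot-product scoring (one common choice; the names and sizes are illustrative):

```python
import numpy as np

def attention_weights(decoder_state, encoder_states):
    """Dot-product attention: score each source position against the
    current decoder state, then normalise with a softmax so the
    weights sum to 1.  High-weight positions are 'attended to'."""
    scores = encoder_states @ decoder_state     # one score per source word
    exp = np.exp(scores - scores.max())         # numerically stable softmax
    return exp / exp.sum()

rng = np.random.default_rng(1)
enc = rng.normal(size=(6, 16))   # 6 source words, hidden size 16
dec = rng.normal(size=16)        # current decoder hidden state
w = attention_weights(dec, enc)
print(w, w.sum())                # six non-negative weights summing to ~1
```

Because the weights are recomputed at every decoder step, each target word can focus on a different part of the source sentence.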
Attention (contd.)
● From a high-level point of view, the attention model differs from
traditional Seq2Seq in that vanilla seq2seq simply feeds h_t to the
softmax, whereas the attention model feeds the attentional state h̃_t.
● c_t differs for every time step in the decoder.
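One concrete way to form h̃_t is the global-attention recipe of Luong et al. (the first reference): softmax the scores, take the weighted sum of encoder states as c_t, and mix c_t with h_t. A hedged sketch, with an illustrative randomly initialised W_c:

```python
import numpy as np

rng = np.random.default_rng(2)
H = 16
W_c = rng.normal(0, 0.1, (H, 2 * H))   # illustrative; learned in practice

def attentional_state(h_t, encoder_states):
    """Luong-style step: softmaxed dot-product scores give the
    attention weights, the context c_t is the weighted sum of the
    encoder states, and h~_t = tanh(W_c [c_t; h_t]) mixes the two."""
    scores = encoder_states @ h_t
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()
    c_t = alpha @ encoder_states           # context: differs at every step
    h_tilde = np.tanh(W_c @ np.concatenate([c_t, h_t]))
    return h_tilde, c_t, alpha

enc = rng.normal(size=(5, H))    # 5 encoder hidden states
h_t = rng.normal(size=H)         # decoder hidden state at this step
h_tilde, c_t, alpha = attentional_state(h_t, enc)
print(h_tilde.shape, c_t.shape, alpha.sum())
```

It is h̃_t, not h_t, that then goes into the output softmax; recomputing alpha and c_t each step is what makes c_t time-step-dependent.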
Attention - microscopic view
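(The diagram from this slide did not survive extraction. The microscopic view it depicted can be written out; the notation below follows the global-attention formulation of Luong et al., the first reference, with h_t the decoder state and h̄_s the encoder states.)

```latex
\begin{aligned}
\text{score}(h_t, \bar{h}_s) &= h_t^\top \bar{h}_s
  && \text{(dot-product score)} \\
\alpha_t(s) &= \frac{\exp\!\big(\text{score}(h_t, \bar{h}_s)\big)}
                    {\sum_{s'} \exp\!\big(\text{score}(h_t, \bar{h}_{s'})\big)}
  && \text{(attention weights)} \\
c_t &= \sum_{s} \alpha_t(s)\, \bar{h}_s
  && \text{(context vector)} \\
\tilde{h}_t &= \tanh\!\big(W_c\, [\,c_t;\, h_t\,]\big)
  && \text{(attentional hidden state)}
\end{aligned}
```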
Visualizing attention
Top: translation; below: voice recognition.
More applications of attention
1. Neural machine translation.
2. Text summarization.
3. Voice recognition.
4. Generating parse trees of sentences.
5. Chatbots.
Attentional interfaces can be used whenever one wants to
interface with a neural network that has a repeating
structure in its output.
References
1. https://nlp.stanford.edu/pubs/emnlp15_attn.pdf
2. http://distill.pub/2016/augmented-rnns
3. https://arxiv.org/pdf/1409.0473.pdf
THANK YOU
Join
meetup.com/Artifacia-AI-Meet/
