

ML – Attention mechanism

Let’s take a look at hearing and a classic case study of selective attention: the
crowded cocktail party. Assume you’re at a social gathering with a large
number of people speaking at the same time. You’re talking with a friend, yet
the surrounding chatter barely registers: you pay attention only to your
friend’s voice, grasping their words while filtering out the background noise.
In this scenario, your auditory system employs selective attention to focus on
the relevant auditory information, and your brain improves its representation
of the speech by prioritizing relevant sounds and ignoring background noise.

In deep learning, the attention mechanism is a computational method for
prioritizing specific information in a given context. In natural language
processing, attention is used during translation or question-answering tasks
to align pertinent portions of the source phrase. Without necessarily relying on
reinforcement learning, attention mechanisms allow neural networks to assign
different weights to different input items, boosting their ability to capture
crucial information and improving performance on a variety of tasks. Google
Street View’s house number identification is an example of an attention
mechanism in computer vision that enables models to systematically identify
particular portions of an image for processing.

Attention Mechanism
An attention mechanism is a neural network component, typically used within
an Encoder-Decoder architecture, that allows the model to focus on specific
sections of the input while executing a task. It dynamically assigns weights to
different elements in the input, indicating their relative importance or
relevance. By incorporating
attention, the model can selectively attend to and process the most relevant
information, capturing dependencies and relationships within the data. This
mechanism is particularly valuable in tasks involving sequential or structured
data, such as natural language processing or computer vision, as it enables the
model to effectively handle long-range dependencies and improve
performance by selectively attending to important features or contexts.

Recurrent models of visual attention use reinforcement learning to focus
attention on key areas of the image. A recurrent neural network governs the
glimpse network, which dynamically selects particular locations for exploration
over time. In classification tasks, this method outperforms convolutional neural
networks. Additionally, this framework goes beyond image identification and
may be used for a variety of visual reinforcement learning applications, such as
helping robots choose behaviours to accomplish particular goals. Although the
most basic use of this strategy is supervised learning, the use of reinforcement
learning permits more adaptable and flexible decision-making based on
feedback from past glances and rewards earned throughout the learning
process.


The application of attention mechanisms to image captioning has substantially
enhanced the quality and accuracy of generated captions. By incorporating
attention, the model learns to focus on pertinent image regions while creating
each caption word. The model can synchronize the visual and textual
modalities by paying attention to various areas of the image at each time step
thanks to the attention mechanism. By focusing on important objects or areas
in the image, the model is able to produce captions that are more detailed and
contextually appropriate. Attention-based image captioning models have
proven to perform better at capturing minute details, managing complicated
scenes, and delivering cohesive and informative captions that closely match
the visual material.

The attention mechanism is a technique used in machine learning and natural
language processing to increase model accuracy by focusing on relevant data.
It enables the model to focus on certain areas of the input data, giving more
weight to crucial features and disregarding unimportant ones. To accomplish
this, each input attribute is given a weight based on how important it is to the
output. Performance has significantly improved in tasks that use the attention
mechanism, including speech recognition, image captioning, and machine
translation.

How Attention Mechanism Works

An attention mechanism in a neural network model typically consists of the
following steps (a minimal code sketch follows the list):

1. Input Encoding: The input sequence of data is represented or embedded
using a collection of representations. This step transforms the input into a
format that can be processed by the attention mechanism.
2. Query Generation: A query vector is generated based on the current state
or context of the model. This query vector represents the information the
model wants to focus on or retrieve from the input.
3. Key-Value Pair Creation: The input representations are split into key-value
pairs. The keys capture the information that will be used to determine the
importance or relevance, while the values contain the actual data or
information.
4. Similarity Computation: The similarity between the query vector and each
key is computed to measure their compatibility or relevance. Different
similarity metrics can be used, such as dot product, cosine similarity, or
scaled dot product.

For example, an additive (Bahdanau-style) scoring function can be written as:

e_is = v^T tanh(W [y_(i-1); h_s])

where,
h_s: Encoder source hidden state at position s
y_i: Decoder target hidden state at position i
W: Weight matrix
v: Weight vector
5. Attention Weights Calculation: The similarity scores are passed through a
softmax function to obtain attention weights. These weights indicate the
importance or relevance of each key-value pair.

6. Weighted Sum: The attention weights are applied to the corresponding
values, generating a weighted sum. This step aggregates the relevant
information from the input based on the importance determined by the
attention mechanism.

c_i = Σ_s α_is h_s, with the sum running over s = 1, …, T_s

Here,
T_s: Total number of key-value pairs (source hidden states) in the encoder
α_is: Attention weight for source position s at step i

7. Context Vector: The weighted sum serves as a context vector, representing
the attended or focused information from the input. It captures the relevant
context for the current step or task.
8. Integration with the Model: The context vector is combined with the
model’s current state or hidden representation, providing additional
information or context for subsequent steps or layers of the model.
9. Repeat: Steps 2 to 8 are repeated for each step or iteration of the model,
allowing the attention mechanism to dynamically focus on different parts of
the input sequence or data.
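To make these steps concrete, here is a minimal NumPy sketch of attention for a
single query, using the scaled dot-product similarity named in step 4. The
shapes, random weights, and variable names are illustrative assumptions, not
code from the article:

import numpy as np

def softmax(x):
    # Numerically stable softmax (step 5).
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()

rng = np.random.default_rng(0)

d = 8                        # embedding size
T = 4                        # number of input positions
X = rng.normal(size=(T, d))  # step 1: encoded input sequence

Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
query = X[-1] @ Wq           # step 2: query from the current state
keys = X @ Wk                # step 3: keys ...
values = X @ Wv              # ... and values

scores = keys @ query / np.sqrt(d)  # step 4: scaled dot-product similarity
weights = softmax(scores)           # step 5: attention weights
context = weights @ values          # steps 6-7: weighted sum -> context vector

print(weights.round(3), context.shape)  # weights sum to 1; context is one vector

In step 8, this context vector would then be combined with the model's current
hidden state before the next layer.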

By incorporating an attention mechanism, the model can effectively capture
dependencies, emphasize important information, and adaptively focus on
different elements of the input, leading to improved performance in tasks such
as machine translation, text summarization, or image recognition.

Attention Mechanism Architecture for Machine Translation

The attention mechanism architecture in machine translation involves three
main components: Encoder, Attention, and Decoder. The Encoder processes the
input sequence and generates hidden states. The Attention component
computes the relevance between the current target hidden state and the
encoder’s hidden states, generating attention weights. These weights are used
to compute a context vector that captures the relevant information from the
encoder’s hidden states. Finally, the Decoder takes the context vector and
generates the output sequence. This architecture allows the model to focus on
different parts of the input sequence during the translation process, improving
the alignment and quality of the translations. We can observe 3 sub-parts or
components of the Attention Mechanism architecture:

Encoder
Attention
Decoder

Consider the following Encoder-Decoder architecture with Attention.


Encoder-Decoder with Attention

Encoder:

The encoder applies recurrent neural networks (RNNs) or transformer-based
models to iteratively process the input sequence. The encoder creates a hidden
state at each step that contains the data from the previous hidden state and
the current input token. The complete input sequence is represented by these
hidden states taken together.

Encoder

Contains an RNN layer (can be LSTM or GRU):

1. Let’s say there is a 4-word sentence; the inputs will be x_1, x_2, x_3, x_4.
2. Each input goes through an Embedding Layer and is then processed by the
recurrent unit (RNN, LSTM, GRU, or a transformer block).
3. Each of the inputs generates a hidden representation.
4. This generates the outputs for the Encoder: the hidden states h_1, h_2, h_3,
h_4 (a toy sketch follows the list).
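As that toy sketch, the following NumPy code embeds a 4-word sentence and
collects one hidden state per input with a plain tanh RNN cell; the vocabulary,
sizes, and random weights are made-up assumptions for illustration:

import numpy as np

rng = np.random.default_rng(1)
vocab = {"I": 0, "love": 1, "machine": 2, "learning": 3}
sentence = ["I", "love", "machine", "learning"]

emb_dim, hid_dim = 6, 5
E = rng.normal(size=(len(vocab), emb_dim))  # embedding layer
Wx = rng.normal(size=(hid_dim, emb_dim))    # input-to-hidden weights
Wh = rng.normal(size=(hid_dim, hid_dim))    # hidden-to-hidden weights

h = np.zeros(hid_dim)
hidden_states = []
for word in sentence:
    x = E[vocab[word]]            # embed the current token
    h = np.tanh(Wx @ x + Wh @ h)  # plain RNN update (an LSTM/GRU in practice)
    hidden_states.append(h)

H = np.stack(hidden_states)       # encoder outputs h_1..h_4, shape (4, hid_dim)
print(H.shape)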

Attention:

The attention component computes the importance or relevance of each
encoder’s hidden state with respect to the current target hidden state. It
generates a context vector that captures the relevant information from the
encoder’s hidden states. The attention mechanism can be represented
mathematically as follows:


Our goal is to generate the context vectors. For example, context vector c_2
tells us how much importance/attention should be given to the inputs
h_1, h_2, h_3, h_4.
This layer in turn contains 3 subparts:

Feed Forward Network
Softmax Calculation
Context vector generation

Attention layer

Feed Forward Network:

The feed-forward network is responsible for transforming the target hidden
state into a representation that is compatible with the attention mechanism. It
takes the target hidden state h(t-1) and applies a linear transformation
followed by a non-linear activation function (e.g., ReLU) to obtain a new
representation.


Feed-Forward-Network

Each unit is a simple feed-forward neural network with one hidden layer. The
inputs for this feed-forward network are:

the previous Decoder state, and
the outputs of the Encoder states.

Each unit generates a score as output, i.e.:

e_is = v^T g(W [y_(i-1); h_s])

Here,
g can be any activation function such as sigmoid, tanh, or ReLU.
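As a rough illustration of this unit, the NumPy snippet below scores each
encoder state against the previous decoder state using one hidden layer with
g = tanh; all sizes and weights are assumed:

import numpy as np

rng = np.random.default_rng(2)
hid = 5
s_prev = rng.normal(size=hid)        # previous decoder state
H = rng.normal(size=(4, hid))        # encoder states h_1..h_4

W = rng.normal(size=(hid, 2 * hid))  # weight matrix of the hidden layer
v = rng.normal(size=hid)             # weight vector of the output layer

# One score per encoder state: e_is = v . g(W [s_prev; h_s]), with g = tanh
scores = np.array([v @ np.tanh(W @ np.concatenate([s_prev, h])) for h in H])
print(scores)                        # four unnormalized alignment scores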

Attention Weights or Softmax Calculation:


A softmax function is then used to convert the similarity scores into attention
weights. These weights govern the importance or attention given to each
encoder’s hidden state. Higher weights indicate higher relevance or
importance.

Softmax calculation: α_is = exp(e_is) / Σ_s' exp(e_is')

These α_is are called the attention weights. They decide how much
importance should be given to each input h_s.
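Continuing the illustration, the snippet below turns a handful of example
scores into attention weights with a numerically stable softmax (the score
values are made up):

import numpy as np

scores = np.array([1.2, -0.3, 0.8, 2.0])  # e.g. the four e_is values from above
weights = np.exp(scores - scores.max())
weights /= weights.sum()                  # attention weights, one per input
print(weights, weights.sum())             # non-negative and summing to 1.0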

Context Vector Generation:


Context Vector: The context vector is a weighted sum of the encoder’s hidden
states, where the attention weights serve as the weights for the summation. It
represents a specific arrangement of the encoder’s hidden states pertinent to
generating the current token.

context vector generation

We find each context vector in the same way and feed it to the corresponding
RNN unit of the Decoder layer. The final context vector is the product of the
attention weights (a probability distribution) and the Encoder’s outputs, which
is nothing but the attention paid to the input words.
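The weighted sum itself is a one-liner in NumPy; the encoder states and
attention weights here are assumed values:

import numpy as np

rng = np.random.default_rng(3)
H = rng.normal(size=(4, 5))               # encoder states h_1..h_4
weights = np.array([0.1, 0.2, 0.6, 0.1])  # attention weights for one decoder step
context = weights @ H                     # c_t = sum_s alpha_ts * h_s
print(context.shape)                      # (5,): one context vector per step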

Decoder:
The context vector is fed into the decoder along with the current hidden state
of the decoder in order to predict the next token in the output sequence. This
process is repeated until the decoder generates the entire output sequence.

We feed these Context Vectors to the RNNs of the Decoder layer. Each decoder
produces an output which is the translation for the input words.
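A hedged sketch of a single decoder step follows: the context vector is
concatenated with the previous decoder state to produce the next state and a
distribution over the vocabulary. The simple tanh cell and all names are
assumptions, not the article’s exact network:

import numpy as np

rng = np.random.default_rng(4)
hid, vocab_size = 5, 10
s_prev = rng.normal(size=hid)            # previous decoder hidden state
context = rng.normal(size=hid)           # context vector from the attention layer

Ws = rng.normal(size=(hid, 2 * hid))     # state-update weights
Wo = rng.normal(size=(vocab_size, hid))  # output projection

s_t = np.tanh(Ws @ np.concatenate([s_prev, context]))  # new decoder state
logits = Wo @ s_t
probs = np.exp(logits - logits.max())
probs /= probs.sum()                     # distribution over the output vocabulary
print(probs.argmax())                    # index of the predicted next token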

Conclusions

The attention mechanism allows the decoder to dynamically focus on different
segments of the input sequence based on their importance to the current
decoding step. As a result, the model can handle lengthy input sequences with
ease and capture the dependencies between various input and output
sequence components. The attention mechanism is a crucial component of
many cutting-edge sequence-to-sequence models since it significantly boosts
the quality and fluency of the generated sequences.

Frequently Asked Questions (FAQs)

1. What is self-attention?

Self-attention allows a model to weigh the importance of different parts
of its input sequence when making predictions. It enables the model to
focus selectively on relevant information, considering the context of each
element in relation to others. This mechanism enhances the ability to
capture long-range dependencies and improves performance in tasks like
machine translation and other natural language processing tasks.
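A minimal NumPy illustration of self-attention, where the queries, keys, and
values are all projections of the same sequence; the shapes and random
projections are assumptions:

import numpy as np

rng = np.random.default_rng(5)
T, d = 4, 8
X = rng.normal(size=(T, d))    # one sequence attends to itself
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

Q, K, V = X @ Wq, X @ Wk, X @ Wv
scores = Q @ K.T / np.sqrt(d)  # every position scores every other position
A = np.exp(scores - scores.max(axis=-1, keepdims=True))
A /= A.sum(axis=-1, keepdims=True)  # row-wise softmax over the sequence
out = A @ V                    # each output mixes information from all tokens
print(out.shape)               # (4, 8): one context-aware vector per token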

2. What are the applications of the attention mechanism?

The attention mechanism is used in various Natural Language Processing
and Computer Vision tasks:

Machine Translation: Attention mechanisms have significantly
improved the performance of machine translation models. They enable
the model to focus on different parts of the source sentence when
generating each word in the target sentence.
In tasks like sentiment analysis, question answering, and named entity
recognition, attention mechanisms help models to focus on critical
words contributing to sentiment expression.
In text summarization, attention aids in selecting key information for
concise summaries.
Image Captioning: Attention mechanisms in image captioning models
allow the model to focus on specific regions of an image while
generating captions.
Attention mechanisms have been applied to improve the accuracy of
automatic speech recognition systems.
In generative models like Generative Adversarial Networks (GANs)
and Variational Autoencoders (VAEs), attention mechanisms help the
model capture dependencies between different parts of the input data,
leading to more realistic and coherent generated samples.

Object detection models have been known to employ attention
processes in order to improve their localization and identification
accuracy by strengthening their focus on relevant portions of an image.

3. What are the different types of attention mechanisms?

There are two types of attention mechanism:

Additive Attention computes attention scores by applying a feed-
forward neural network to the concatenated query and key vectors.
Dot-Product attention measures attention scores using dot product
between the query and key vectors.
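The two scoring functions side by side in a small NumPy sketch, with assumed
dimensions and weights:

import numpy as np

rng = np.random.default_rng(6)
d = 5
q = rng.normal(size=d)  # query vector
k = rng.normal(size=d)  # key vector

# Dot-product attention: a single inner product
dot_score = q @ k

# Additive attention: a small feed-forward net over the concatenated vectors
W = rng.normal(size=(d, 2 * d))
v = rng.normal(size=d)
add_score = v @ np.tanh(W @ np.concatenate([q, k]))

print(dot_score, add_score)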

4. What are the two main steps of the attention mechanism?

The attention mechanism comprises two main steps:

Computing attention scores by measuring the relevance between a
query element and all other elements in the input sequence, often
using methods like dot-product or additive attention.
Weighted summation is computed based on these attention scores,
creating a context vector that emphasizes important input elements.

These steps enable the model to selectively focus on relevant
information.

5. How does the attention mechanism work?

Attention mechanisms operate by assigning weights to input elements
based on their relevance to a specific context or query. The process
involves calculating attention scores by comparing query and key vectors,
applying a softmax function for normalization, and obtaining a weighted
sum of input elements. This weighted sum, or context vector, captures
crucial information for the model’s decision-making. Attention
mechanisms enhance the model’s ability to selectively focus on pertinent
details, enabling it to capture long-range dependencies and improve
performance in various tasks, including natural language processing and
computer vision.


Last Updated: 28 Nov, 2023
