
LSTM Presentation

The document provides an overview of Long Short-Term Memory (LSTM) networks, which are a type of recurrent neural network designed to process sequential data while addressing issues like long-term dependencies and the vanishing gradient problem. It details the history of LSTM development, its structure including memory cells and gates, and contrasts it with traditional RNNs. Key features of LSTMs include the input, forget, and output gates that manage information flow, enabling effective learning from sequential data.


2024
Long Short-Term Memory (LSTM) Model
Group Members
Burhan Ahmed (bai222001)
Saif Ur Rehman (bcs222001)
Halima Sadia (bse222002)
CONTENT

1. Introduction to LSTM
2. Features of LSTM
1. Introduction to LSTM
Overview of Long Short-Term Memory Networks

What is LSTM?
• LSTM is a type of recurrent neural network (RNN) that can process and analyze sequential data, such as text, speech, and time series.
• LSTMs use a memory cell and gates to control the flow of information.
• The memory cell stores information from previous time steps and uses it to influence the output of the cell at the current time step.
• The output of each LSTM cell is passed to the next cell in the network, allowing the LSTM to process and analyze sequential data over multiple time steps (see the sketch below).
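As a minimal sketch of this step-by-step processing (an illustration added here, not code from the presentation; it assumes PyTorch is installed, and the layer sizes and random input are made up):

import torch
import torch.nn as nn

# Hypothetical sizes: 8 input features per time step, 16 hidden/memory units.
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

x = torch.randn(4, 10, 8)          # 4 sequences, 10 time steps, 8 features each
outputs, (h_n, c_n) = lstm(x)      # hidden and cell state are carried from step to step internally

print(outputs.shape)               # torch.Size([4, 10, 16]) -> one hidden state per time step
print(h_n.shape)                   # torch.Size([1, 4, 16])  -> final hidden state
print(c_n.shape)                   # torch.Size([1, 4, 16])  -> final memory-cell state

Each row of outputs is the hidden state the cell emits at one time step, i.e. the per-step output that is passed along the sequence as described above.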

LSTM
HISTORY
• Early 1990s: LSTM concept developed by Sepp Hochreiter and Jürgen Schmidhuber.
• 1997: LSTM paper published, introducing the design with input and output gates; the forget gate was added by Gers et al. in 1999.
• 2015: Rise of attention mechanisms (and, from 2017, Transformers) challenging LSTMs.
• 2020: New architectures and training algorithms.
• 2021: Introduction of Corrector LSTM for accurate predictions.
• 2024: NXAI introduced xLSTM (Extended LSTM), scaling to billions of parameters.
Recurrent Neural Network (RNN)
What is an RNN? Basic Structure
Definition: Recurrent Neural Networks (RNNs) are neural networks designed to process sequential data with temporal dependencies.
Key Characteristics:
• Analyze data with a temporal dimension (e.g., time series, speech, text).
• Use a hidden state passed from one timestep to the next.
• The hidden state updates based on the current input and the previous hidden state (see the update rule written out below).
Strengths: Excellent at capturing short-term dependencies.
Challenges: Struggle to handle long-term dependencies due to vanishing or exploding gradients.
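For reference, the hidden-state update described above is usually written as follows (standard vanilla-RNN notation, not shown on the slide itself):

h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)

where x_t is the input at timestep t, h_{t-1} is the previous hidden state, and W_{xh}, W_{hh}, and b_h are learned parameters shared across all timesteps.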
Vanilla RNN
PROBLEMS IN RNN
Long-Term Dependency Issue in RNN

Let us consider a sentence:

"I am a data science student and I love machine _______."

We know the blank has to be filled with 'learning'. But had there been many terms after "I am a data science student", such as "I am a data science student pursuing an MS from the University of ...... and I love machine _______", the RNN fails: the context it needs ("data science student") now lies far back in the sequence, separated from the blank by information that is not needed, like "pursuing an MS from the University of ......".
What LSTMs do is leverage their forget gate to eliminate the unnecessary information, which helps them handle long-term dependencies.
VANISHING GRADIENT PROBLEM
• RNNs use the tanh (hyperbolic tangent) activation function.
• Its output range is [-1, 1]; its derivative lies in (0, 1].
• As the input sequence gets longer, RNNs perform repeated matrix multiplications.
• Backpropagation through time applies the chain rule of differentiation across all timesteps.
• Multiplying many small numbers together makes the gradients exponentially smaller.
• Gradient values approach 0, effectively halting weight updates.
• The result is poor training and an inability to capture long-term dependencies (see the numeric sketch below).
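To make the "multiplying small numbers" point concrete, here is a minimal numeric sketch (added for illustration, not from the slides; it ignores the recurrent weight matrix and uses arbitrary pre-activation values):

import numpy as np

def tanh_derivative(z):
    # d/dz tanh(z) = 1 - tanh(z)^2, always in (0, 1]
    return 1.0 - np.tanh(z) ** 2

rng = np.random.default_rng(0)
pre_activations = rng.normal(0.0, 1.0, size=50)   # one arbitrary pre-activation per timestep

gradient = 1.0
for t, z in enumerate(pre_activations, start=1):
    gradient *= tanh_derivative(z)                # chain rule adds one tanh' factor per timestep
    if t in (5, 10, 25, 50):
        print(f"after {t:2d} steps: gradient factor is about {gradient:.2e}")

# The printed factor shrinks roughly exponentially with the number of timesteps,
# which is why gradients from distant timesteps barely update the weights.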
Vanishing Gradient Problem
RNN VS LSTM
Graphs of Sigmoid and Tanh Functions
2. Features of LSTM
Structure, Gates, Memory Cell

Structure of LSTM
Memory Cell
• Memory cells maintain information across time steps, enabling LSTMs to learn and utilize long-term dependencies in data.
• The content of the memory cell is updated or modified by the interaction of three gates (a combined sketch follows this list):
• Input Gate: Determines what new information to add to the memory.
• Forget Gate: Decides which information to erase.
• Output Gate: Controls what part of the memory is used for the current output.
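As a minimal sketch of how these three gates interact inside one memory cell (standard LSTM equations written in plain NumPy; the weight shapes and random inputs are illustrative assumptions, not material from the presentation):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    # One LSTM timestep. W maps [h_prev, x_t] to the four gate pre-activations.
    concat = np.concatenate([h_prev, x_t])
    z = W @ concat + b
    H = h_prev.shape[0]
    f = sigmoid(z[0:H])          # forget gate: what to erase from the memory
    i = sigmoid(z[H:2*H])        # input gate: what new information to add
    g = np.tanh(z[2*H:3*H])      # candidate values for the memory cell
    o = sigmoid(z[3*H:4*H])      # output gate: what part of the memory to expose
    c_t = f * c_prev + i * g     # updated memory-cell content
    h_t = o * np.tanh(c_t)       # hidden state / output at this timestep
    return h_t, c_t

# Hypothetical sizes: 3 input features, hidden size 4.
rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
W = rng.normal(0.0, 0.1, size=(4 * hidden_size, hidden_size + input_size))
b = np.zeros(4 * hidden_size)

h, c = np.zeros(hidden_size), np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):   # a short sequence of 5 timesteps
    h, c = lstm_cell_step(x_t, h, c, W, b)
print(h)   # final hidden state after the sequence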
Memory Cell
LSTM Memory Cell
Input Gate

As discussed earlier, the input gate selectively admits information that is relevant at the current timestep. It is the gate that determines, using the sigmoid activation function, which parts of the current input are necessary and which are not, and stores that selection for the current cell state. Next comes the tanh activation, which computes a vector of candidate values; scaled by the input-gate values, these candidates are added to the cell state (see the equations below).
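In standard notation (written out here for reference, not shown on the slide), with [h_{t-1}, x_t] denoting the previous hidden state concatenated with the current input:

i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)
\tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C)

The product i_t \odot \tilde{C}_t is the new information that gets added to the memory cell.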
Forget Gate

We already discussed, while introducing the gates, that the hidden state is responsible for producing the output. The output generated from the hidden state at timestep (t-1) is h(t-1). The forget gate receives the current input x(t) together with h(t-1), multiplies them by its weight matrix, and applies a sigmoid activation, which generates scores between 0 and 1. These scores help the cell determine which stored information is useful and which is irrelevant and should be erased (see the equation below).
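Again in standard notation (an added reference, not taken from the slide):

f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t

Entries of f_t near 0 erase the corresponding entries of the previous cell state C_{t-1}, while entries near 1 keep them.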
THANK YOU!