Long Short-Term Memory Networks (LSTM) - simply explained!

4 June 2022 / Machine Learning

The Long Short-Term Memory (LSTM) model is a subtype of Recurrent Neural Networks (RNN). It is used to recognize patterns in data sequences, such as those that appear in sensor data, stock prices, or natural language. RNNs can do this because, in addition to each value itself, they also take its position in the sequence into account when making a prediction.

What are Recurrent Neural Networks?

To understand how Recurrent Neural Networks work, we have to take another look at how regular feedforward neural networks are structured. In these, a neuron in a hidden layer is connected to the neurons of the previous layer and the neurons of the following layer. In such a network, the output of a neuron can only be passed forward, never to a neuron in the same layer or in a previous layer, hence the name “feedforward”.
Structure of a Feedforward Neural Network | Source: Author
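To make this concrete, here is a minimal sketch of such a forward pass in NumPy. The layer sizes and random weights are invented purely for illustration and are not from the article.

```python
import numpy as np

# Minimal feedforward pass (illustrative sizes): information only flows forward,
# each layer sees nothing but the output of the layer directly before it.
rng = np.random.default_rng(0)
x = rng.normal(size=3)        # input vector
W1 = rng.normal(size=(4, 3))  # input -> hidden weights
W2 = rng.normal(size=(2, 4))  # hidden -> output weights

hidden = np.tanh(W1 @ x)      # hidden layer: depends only on the input
output = W2 @ hidden          # output layer: depends only on the hidden layer
print(output)
```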

This is different for recurrent neural networks. Here, the output of a neuron can very well be used as input for a previous layer or for the current layer itself. This is much closer to how our brain works than the way feedforward neural networks are constructed. In many applications we also need exactly this: the overall result improves when the computation can draw on the steps computed immediately before.
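Sketched in the same style, the recurrent idea looks like this: the hidden state computed at one time step is fed back in as an input at the next. The update rule below is the standard simple-RNN recurrence, with shapes and scaling assumed for illustration.

```python
import numpy as np

# Simple recurrent step: h(t) depends on the current input x(t) AND on h(t-1),
# the hidden state from the previous step, which is fed back into the same layer.
rng = np.random.default_rng(0)
W_x = rng.normal(size=(4, 3)) * 0.1  # input -> hidden weights
W_h = rng.normal(size=(4, 4)) * 0.1  # hidden -> hidden (the recurrent connection)

h = np.zeros(4)                      # initial hidden state
sequence = rng.normal(size=(5, 3))   # a toy sequence of 5 time steps
for x_t in sequence:
    h = np.tanh(W_x @ x_t + W_h @ h)
print(h)                             # carries information from all 5 steps
```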

What Problems do RNNs face?

Recurrent Neural Networks were a real breakthrough in the field of Deep Learning, since for the first time the computations from the recent past were also included in the current computation, significantly improving the results in language processing. Nevertheless, they also introduce some problems during training that need to be taken into account.

As we have already explained in our article on the gradient method, when training neural networks with gradient descent, it can happen that the gradient either takes on very small values close to 0 or very large values approaching infinity. In both cases, the weights of the neurons can no longer be updated sensibly during backpropagation: a vanishing gradient barely changes the weights at all, while an exploding gradient makes the updates numerically unstable. Because of the many recurrent connections and the slightly modified form of the backpropagation algorithm used to train them (backpropagation through time), the probability that these problems occur is much higher than in normal feedforward networks.
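A toy calculation makes the effect visible. During backpropagation through time, roughly one multiplicative factor enters the gradient per time step, so a factor slightly below or above 1 shrinks or blows up the product geometrically; the numbers below are purely illustrative.

```python
# One multiplicative factor per time step: below 1 the gradient vanishes,
# above 1 it explodes, both roughly geometrically in the sequence length.
for factor in (0.5, 1.5):
    grad = 1.0
    for _ in range(50):  # 50 time steps
        grad *= factor
    print(f"factor {factor}: gradient after 50 steps = {grad:.3e}")
# factor 0.5 -> ~8.9e-16 (vanishing); factor 1.5 -> ~6.4e+08 (exploding)
```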

Regular RNNs are very good at remembering contexts and incorporating them into predictions. For example, this allows an RNN to recognize that in the sentence “The clouds are in the ___” the word “sky” is needed to complete the sentence correctly in that context. In a longer sentence, on the other hand, it becomes much more difficult to maintain the context. In the slightly modified sentence “The clouds, which partly flow into each other and hang low, are in the ___”, it becomes much more difficult for a Recurrent Neural Network to infer the word “sky”.
How do Long Short-Term Memory Models work?

The problem with Recurrent Neural Networks is that they only have a short-term memory for retaining previous information in the current neuron, and this ability decreases very quickly for longer sequences. As a remedy, LSTM models were introduced to retain past information for even longer.

An RNN simply stores the previous data in its “short-term memory”. Once that memory runs out, the oldest retained information is simply deleted and replaced with new data. The LSTM model escapes this problem by retaining selected information in a long-term memory. This long-term memory is stored in the so-called Cell State. In addition, there is the Hidden State, which we already know from normal neural networks and in which short-term information from the previous calculation steps is stored. The Hidden State is the short-term memory of the model. This also explains the name Long Short-Term Memory Networks.

In each computational step, three values are used: the current input x(t), the previous Cell State c(t-1) as the long-term memory, and the previous Hidden State h(t-1) as the short-term memory.

LSTM Architecture | Figure based on Towards Data Science

These three values pass through the following gates on their way to a new Cell State and Hidden State (a code sketch of one complete cell step follows the list):
1. In the so-called Forget Gate, it is decided which current and previous information is kept and which is thrown out. This involves the hidden state from the previous pass and the current input. These values are passed through a sigmoid function, which can only output values between 0 and 1. A value of 0 means that previous information can be forgotten, because possibly new, more important information has arrived; a value of 1 means that the previous information is preserved. The result is multiplied elementwise with the previous Cell State c(t-1), so that knowledge that is no longer needed is multiplied by 0 and thus dropped out.

2. In the Input Gate, it is decided how valuable the current input is for solving the task. For this, the current input and the hidden state of the last run are each weighted and passed through a sigmoid function, while a tanh function computes candidate values from the same inputs. The information that the Input Gate deems important is then added to the Cell State and forms the new Cell State c(t). This new Cell State is the current state of the long-term memory and will be used in the next run.

3. In the Output Gate, the output of the LSTM model is calculated and stored in the Hidden State. Depending on the application, it can be, for example, a word that complements the meaning of the sentence. To do this, a sigmoid function decides which information may pass through the output gate, and its result is multiplied with the Cell State after the latter has been passed through the tanh activation function.
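As announced above, here is a compact NumPy sketch that puts the three gates together into one cell step. The weight layout, sizes, and random initialization are assumptions made for illustration; real deep learning frameworks implement the same logic in fused, optimized form.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM cell step; W, U, b hold per-gate weights (assumed layout)."""
    # Forget gate: 0 = drop the old cell-state entry, 1 = keep it.
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])
    # Input gate: how much of the candidate information to write.
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])
    c_cand = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # candidate values
    # New cell state: forget part of the old memory, add the selected new part.
    c_t = f * c_prev + i * c_cand
    # Output gate: which parts of the tanh-squashed cell state become the output.
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])
    h_t = o * np.tanh(c_t)
    return h_t, c_t

# Tiny usage example with made-up sizes.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: rng.normal(size=(n_hid, n_in)) * 0.1 for k in "fico"}
U = {k: rng.normal(size=(n_hid, n_hid)) * 0.1 for k in "fico"}
b = {k: np.zeros(n_hid) for k in "fico"}
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):  # a 5-step input sequence
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h)
```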

Using our previous example, the whole thing becomes a bit more understandable. The goal of the LSTM is to fill the gap, so the model goes through the sentence word by word until it reaches it. In the Recurrent Neural Network, the problem was that the model had already forgotten that the text was about clouds by the time it arrived at the gap, so no correct prediction could be made.

Let us therefore consider how an LSTM would have behaved. The information “cloud” would very likely have ended up in the Cell State and thus been preserved throughout the entire computation. Arriving at the gap, the model would have recognized that the context word “cloud” is essential to fill the gap correctly.

Which applications rely on LSTM?

For many years, LSTM Networks were the best tool in natural language processing because they could
hold the context of a sentence “in memory” for a relatively long time. The following concrete programs rely
on this type of neural network:

Apple’s keyboard completion is based on an LSTM network. In addition, Siri, Apple’s voice assistant, was
also based on this type of neural network.

Google’s AlphaGo software also relied on long short-term memory and was thus able to beat real
humans in the game of Go.

Google’s translation service was based on LSTMs, among other things.

Nowadays, however, the importance of LSTMs in applications is declining somewhat, as so-called transformers are becoming more and more prevalent. These, however, are very computationally intensive and place high demands on the infrastructure used. Therefore, in many cases, the higher output quality must be weighed against the higher effort.
