
What is an RNN?

A recurrent neural network, or RNN, is a deep neural network trained on
sequential or time series data to create a machine learning model that can
make sequential predictions or conclusions based on sequential inputs.
An RNN might be used to predict daily flood levels based on past daily flood,
tide, and meteorological data. But RNNs can also be used to solve ordinal or
temporal problems such as language translation, natural language processing
(NLP), speech recognition, and image captioning. RNNs are incorporated into
popular applications such as Siri, voice search, and Google Translate.

How does a recurrent neural network work?



RNNs are made of neurons: data-processing nodes that work together to
perform complex tasks. The neurons are organized as input, output, and
hidden layers. The input layer receives the information to process, and the
output layer provides the result. Data processing, analysis, and prediction take
place in the hidden layer.
Hidden layer
RNNs work by passing the sequential data that they receive to the hidden
layers one step at a time. However, they also have a self-looping
or recurrent workflow: the hidden layer keeps previous inputs in a short-term
memory component and uses them for future predictions. It combines the current
input with the stored memory to predict the next element in the sequence.
For example, consider the sequence: Apple is red. You want the RNN to
predict red when it receives the input sequence Apple is. When the hidden
layer processes the word Apple, it stores a copy in its memory. Next, when it
sees the word is, it recalls Apple from its memory and now has the full
sequence Apple is as context. It can then predict red with improved accuracy.
This makes RNNs useful in speech recognition, machine translation, and other
language modeling tasks.
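
To make the recurrence concrete, here is a minimal sketch in plain NumPy. The
weight names (Wxh, Whh, bh) and sizes are illustrative assumptions, not from
the article: each step mixes the current input with the previous hidden state,
which acts as the short-term memory.

import numpy as np

def rnn_step(x, h_prev, Wxh, Whh, bh):
    """One recurrent step: combine the current input with the stored memory."""
    return np.tanh(Wxh @ x + Whh @ h_prev + bh)

# Toy sizes (illustrative): 4-dimensional inputs, 8-dimensional hidden state.
rng = np.random.default_rng(0)
input_size, hidden_size = 4, 8
Wxh = rng.normal(0, 0.1, (hidden_size, input_size))
Whh = rng.normal(0, 0.1, (hidden_size, hidden_size))
bh = np.zeros(hidden_size)

# Unroll over a short sequence, e.g. the three tokens of "Apple is red".
sequence = [rng.normal(size=input_size) for _ in range(3)]
h = np.zeros(hidden_size)             # empty memory before the first word
for x in sequence:
    h = rnn_step(x, h, Wxh, Whh, bh)  # h now summarizes everything seen so far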

Training
Machine learning (ML) engineers train deep neural networks like RNNs by
feeding the model with training data and refining its performance. In ML, a
neuron's weights determine how influential the information learned during
training is when predicting the output. Every time step in an unrolled RNN
shares the same weights.
ML engineers adjust weights to improve prediction accuracy. They use a
technique called backpropagation through time (BPTT) to calculate the model
error and adjust the weights accordingly. BPTT unrolls the network and
propagates the error backward from the output through each earlier time step.
This way, it can identify which hidden state in the sequence is causing a
significant error and readjust the weights to reduce the error margin.
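
The following NumPy sketch shows the idea of BPTT on a tiny next-token task.
All names and sizes here are illustrative assumptions: the forward pass stores
every hidden state, and the backward pass walks the sequence in reverse,
accumulating gradients for the shared weights.

import numpy as np

rng = np.random.default_rng(0)
vocab, hidden = 5, 16                      # toy vocabulary and hidden size
Wxh = rng.normal(0, 0.1, (hidden, vocab))  # input-to-hidden weights (shared)
Whh = rng.normal(0, 0.1, (hidden, hidden)) # hidden-to-hidden weights (shared)
Why = rng.normal(0, 0.1, (vocab, hidden))  # hidden-to-output weights
bh, by = np.zeros(hidden), np.zeros(vocab)

def one_hot(i):
    v = np.zeros(vocab); v[i] = 1.0
    return v

def bptt(inputs, targets):
    """Forward pass, then backpropagation through time; returns loss and grads."""
    hs = {-1: np.zeros(hidden)}
    xs, ps, loss = {}, {}, 0.0
    for t, (i, tgt) in enumerate(zip(inputs, targets)):    # forward through time
        xs[t] = one_hot(i)
        hs[t] = np.tanh(Wxh @ xs[t] + Whh @ hs[t - 1] + bh)
        y = Why @ hs[t] + by
        ps[t] = np.exp(y - y.max()); ps[t] /= ps[t].sum()  # softmax
        loss -= np.log(ps[t][tgt])                         # cross-entropy loss
    dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
    dbh, dby = np.zeros_like(bh), np.zeros_like(by)
    dh_next = np.zeros(hidden)
    for t in reversed(range(len(inputs))):                 # backward through time
        dy = ps[t].copy(); dy[targets[t]] -= 1             # softmax gradient
        dWhy += np.outer(dy, hs[t]); dby += dy
        dh = Why.T @ dy + dh_next                          # error from later steps
        dh_raw = (1 - hs[t] ** 2) * dh                     # through tanh
        dbh += dh_raw
        dWxh += np.outer(dh_raw, xs[t])
        dWhh += np.outer(dh_raw, hs[t - 1])
        dh_next = Whh.T @ dh_raw                           # pass error further back
    return loss, (dWxh, dWhh, dWhy, dbh, dby)

loss, grads = bptt([0, 1, 2, 3], [1, 2, 3, 4])  # predict each next token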

What are the types of recurrent neural networks?


RNNs are often characterized by a one-to-one architecture: one input is
associated with one output. However, you can flexibly adjust them into
various configurations for specific purposes. The following are several common
RNN types.
One-to-many
This RNN type channels one input to several outputs. It enables linguistic
applications like image captioning by generating a sentence from a single
keyword.

Many-to-many
The model uses multiple inputs to predict multiple outputs. For example, you
can create a language translator with an RNN, which analyzes a sentence and
correctly structures the words in a different language.

Many-to-one
Several inputs are mapped to an output. This is helpful in applications like
sentiment analysis, where the model predicts customers’ sentiments
like positive, negative, and neutral from input testimonials.
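Under the same recurrence, these configurations differ mainly in which inputs
are fed and which hidden states are read out. A rough NumPy sketch, with
illustrative readout weights that are assumptions for demonstration:

import numpy as np

def rnn_step(x, h, Wxh, Whh, bh):
    return np.tanh(Wxh @ x + Whh @ h + bh)

# Illustrative sizes and weights (assumptions, not from the article).
rng = np.random.default_rng(0)
d_in, d_h, d_out = 4, 8, 3
Wxh = rng.normal(0, 0.1, (d_h, d_in))
Whh = rng.normal(0, 0.1, (d_h, d_h))
Why = rng.normal(0, 0.1, (d_out, d_h))
bh = np.zeros(d_h)

xs = [rng.normal(size=d_in) for _ in range(5)]  # a 5-step input sequence

# Many-to-one (e.g. sentiment analysis): read out only the final hidden state.
h = np.zeros(d_h)
for x in xs:
    h = rnn_step(x, h, Wxh, Whh, bh)
sentiment_scores = Why @ h           # one output for the whole sequence

# Many-to-many (e.g. per-step tagging): read out at every step.
h, outputs = np.zeros(d_h), []
for x in xs:
    h = rnn_step(x, h, Wxh, Whh, bh)
    outputs.append(Why @ h)          # one output per input step

# One-to-many (e.g. captioning from one keyword): feed one input, then zeros.
h, caption = np.zeros(d_h), []
x = xs[0]                            # the single input
for _ in range(5):
    h = rnn_step(x, h, Wxh, Whh, bh)
    caption.append(Why @ h)
    x = np.zeros(d_in)               # subsequent steps have no new input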
How do recurrent neural networks compare to other
deep learning networks?
RNNs are one of several different neural network architectures.
Recurrent neural network vs. feed-forward neural network
Like RNNs, feed-forward neural networks are artificial neural networks that
pass information from one end to the other end of the architecture. A feed-
forward neural network can perform simple classification, regression, or
recognition tasks, but it can’t remember the previous input that it has
processed. For example, it forgets Apple by the time its neuron processes the
word is. The RNN overcomes this memory limitation by including a hidden
memory state in the neuron.
Recurrent neural network vs. convolutional neural network
Convolutional neural networks are artificial neural networks that are designed
to process spatial data. You can use convolutional neural networks to extract
spatial information from videos and images by passing them through a series of
convolutional and pooling layers in the neural network. RNNs are designed to
capture long-term dependencies in sequential data.

Common activation functions


As discussed in the Learn article on Neural Networks, an activation function
determines whether a neuron should be activated. These nonlinear functions
typically convert the output of a given neuron to a value between 0 and 1 or
between -1 and 1.
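As a quick illustration in NumPy (the values shown are approximate), the
sigmoid squashes outputs into (0, 1) and tanh into (-1, 1):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))    # output in (0, 1)

z = np.array([-5.0, 0.0, 5.0])
print(sigmoid(z))    # approx. [0.0067 0.5    0.9933]
print(np.tanh(z))    # approx. [-0.9999 0.    0.9999], output in (-1, 1)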
Variant RNN architectures
Popular RNN architecture variants include:
• Bidirectional recurrent neural networks (BRNNs)
• Long short-term memory (LSTM)
• Gated recurrent units (GRUs)
1. Bidirectional recurrent neural networks (BRNNs)

While unidirectional RNNs can only draw from previous inputs to make
predictions about the current state, bidirectional RNNs, or BRNNs, also pull in
future data to improve their accuracy. Consider the phrase "feeling under the
weather": a model based on a BRNN can better predict that the second word in
that phrase is "under" if it knows that the last word in the sequence is
"weather."
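
A minimal sketch of the bidirectional idea in NumPy (the separate forward and
backward weight sets are illustrative assumptions): one pass reads the sequence
left to right, another right to left, and each position concatenates both views.

import numpy as np

def rnn_step(x, h, Wxh, Whh, bh):
    return np.tanh(Wxh @ x + Whh @ h + bh)

rng = np.random.default_rng(0)
d_in, d_h = 4, 8
fwd = [rng.normal(0, 0.1, (d_h, d_in)), rng.normal(0, 0.1, (d_h, d_h)), np.zeros(d_h)]
bwd = [rng.normal(0, 0.1, (d_h, d_in)), rng.normal(0, 0.1, (d_h, d_h)), np.zeros(d_h)]

xs = [rng.normal(size=d_in) for _ in range(5)]

h, hs_fwd = np.zeros(d_h), []
for x in xs:                      # left-to-right pass: past context
    h = rnn_step(x, h, *fwd)
    hs_fwd.append(h)

h, hs_bwd = np.zeros(d_h), []
for x in reversed(xs):            # right-to-left pass: future context
    h = rnn_step(x, h, *bwd)
    hs_bwd.append(h)
hs_bwd.reverse()                  # realign with the original order

# Each position now sees both past and future context.
states = [np.concatenate([f, b]) for f, b in zip(hs_fwd, hs_bwd)]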
2. Long short-term memory (LSTM)

LSTM is a popular RNN architecture, introduced by Sepp Hochreiter and Juergen
Schmidhuber as a solution to the vanishing gradient problem. In their paper,
they address the problem of
long-term dependencies. That is, if the previous state that is influencing the
current prediction is not in the recent past, the RNN model may not be able to
accurately predict the current state.
As an example, let's say we wanted to predict the italicized words in the following:
“Alice is allergic to nuts. She can’t eat peanut butter.” The context of a nut
allergy can help us anticipate that the food that cannot be eaten contains nuts.
However, if that context was a few sentences prior, then it would make it
difficult, or even impossible, for the RNN to connect the information.
To remedy this, LSTMs have “cells” in the hidden layers of the neural network,
which have three gates: an input gate, an output gate, and a forget gate. These
gates control the flow of information that is needed to predict the output in
the network. For example, if a gender pronoun, such as "she," was repeated
multiple times in prior sentences, the network may exclude it from the cell state.
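
A minimal sketch of one LSTM cell step in NumPy (weight names and sizes are
illustrative assumptions): the forget gate decides what to drop from the cell
state, the input gate what to add, and the output gate what to expose.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM step. W maps [x; h] to the four stacked gate pre-activations."""
    z = W @ np.concatenate([x, h]) + b
    d = len(h)
    i = sigmoid(z[0 * d:1 * d])       # input gate: what new info to write
    f = sigmoid(z[1 * d:2 * d])       # forget gate: what to erase from memory
    o = sigmoid(z[2 * d:3 * d])       # output gate: what to expose
    g = np.tanh(z[3 * d:4 * d])       # candidate values to write
    c = f * c + i * g                 # updated cell state (long-term memory)
    h = o * np.tanh(c)                # updated hidden state (visible output)
    return h, c

rng = np.random.default_rng(0)
d_in, d_h = 4, 8
W = rng.normal(0, 0.1, (4 * d_h, d_in + d_h))
b = np.zeros(4 * d_h)

h, c = np.zeros(d_h), np.zeros(d_h)
for x in [rng.normal(size=d_in) for _ in range(5)]:
    h, c = lstm_step(x, h, c, W, b)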
3. Gated recurrent units (GRUs)

A GRU is similar to an LSTM as it also works to address the short-term memory
problem of RNN models. Instead of using a "cell state" to regulate information,
it uses hidden states, and instead of three gates, it has two: a reset gate and an
update gate. Similar to the gates within LSTMs, the reset and update gates
control how much and which information to retain.
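
A matching sketch of one GRU step (again with illustrative names and sizes),
following the convention where the update gate blends old memory with the
candidate state:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h, Wz, Wr, Wh, bz, br, bh):
    """One GRU step: two gates, and the hidden state itself is the memory."""
    z = sigmoid(Wz @ np.concatenate([x, h]) + bz)            # update gate
    r = sigmoid(Wr @ np.concatenate([x, h]) + br)            # reset gate
    h_cand = np.tanh(Wh @ np.concatenate([x, r * h]) + bh)   # candidate state
    return (1 - z) * h + z * h_cand   # blend old memory with the candidate

rng = np.random.default_rng(0)
d_in, d_h = 4, 8
Wz, Wr, Wh = (rng.normal(0, 0.1, (d_h, d_in + d_h)) for _ in range(3))
bz, br, bh = np.zeros(d_h), np.zeros(d_h), np.zeros(d_h)

h = np.zeros(d_h)
for x in [rng.normal(size=d_in) for _ in range(5)]:
    h = gru_step(x, h, Wz, Wr, Wh, bz, br, bh)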
