RNN Neural Network
A neural network consists of multiple layers connected to one another, modeled loosely on the structure and function of the human brain. It learns from large volumes of data and uses complex algorithms to train the network.
Here is an example of how neural networks can identify a dog’s breed based on its features.
• The image pixels of two different breeds of dogs are fed to the input layer of
the neural network.
• The image pixels are then processed in the hidden layers for feature
extraction.
• The output layer produces the result to identify if it’s a German Shepherd or a
Labrador.
• Such networks do not need to memorize past outputs.
Several neural networks can help solve different business problems. Let’s look at a few
of them.
• Feed-Forward Neural Network: Used for general Regression and
Classification problems.
• Convolutional Neural Network: Used for object detection and image
classification.
• Deep Belief Network: Used in healthcare sectors for cancer detection.
• RNN: Used for speech recognition, voice recognition, time series prediction,
and natural language processing.
What Is a Recurrent Neural Network (RNN)?
An RNN works on the principle of saving the output of a particular layer and feeding it back to the input, so that the network can use what it has already seen when predicting the next output.
Below is how you can convert a Feed-Forward Neural Network into a Recurrent Neural
Network:
Fig: Simple Recurrent Neural Network
The nodes in the different layers of the neural network are compressed to form a single layer of recurrent cells. Here, “x” is the input layer, “h” is the hidden layer, and “y” is the output layer, while A, B, and C are the network parameters used to improve the output of the model. At any given time t, the hidden state combines the current input x(t) with the previous hidden state h(t-1). The output at any given time is fed back into the network to improve the output at the next time step.
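To make the recurrence concrete, here is a minimal sketch of a single recurrent step in Python with NumPy, using the A, B, and C labels from the figure. The layer sizes and random weights are assumptions made purely for illustration:

import numpy as np

# Illustrative sizes; these are assumptions for the sketch, not values from the article
input_size, hidden_size, output_size = 3, 4, 2

rng = np.random.default_rng(0)
A = rng.normal(size=(hidden_size, input_size))   # input-to-hidden weights
B = rng.normal(size=(hidden_size, hidden_size))  # hidden-to-hidden (recurrent) weights
C = rng.normal(size=(output_size, hidden_size))  # hidden-to-output weights

def rnn_step(x_t, h_prev):
    """One time step: combine the current input with the previous hidden state."""
    h_t = np.tanh(A @ x_t + B @ h_prev)  # new hidden state h(t)
    y_t = C @ h_t                        # output y(t)
    return h_t, y_t

# The same cell (and the same A, B, C) is reused at every time step
h = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):
    h, y = rnn_step(x_t, h)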
Now, let’s discuss the most popular and efficient way to deal with gradient problems: the Long Short-Term Memory (LSTM) network.
First, let’s understand Long-Term Dependencies.
Suppose you want to predict the last word in the text: “The clouds are in the ______.”
The most obvious answer to this is the “sky.” We do not need any further context to
predict the last word in the above sentence.
Consider this sentence: “I have been staying in Spain for the last 10 years…I can speak
fluent ______.”
The word you predict will depend on the previous few words in context. Here, you need
the context of Spain to predict the last word in the text, and the most suitable answer to
this sentence is “Spanish.” The gap between the relevant information and the point where it is needed can become very large. LSTMs help you solve this problem.
Common Activation Functions
Recurrent Neural Networks (RNNs) use activation functions just like other neural
networks to introduce non-linearity to their models. Here are some common activation
functions used in RNNs:
Sigmoid Function:
The sigmoid function is commonly used in RNNs. It has a range between 0 and 1, which
makes it useful for binary classification tasks. The formula for the sigmoid function is:
σ(x) = 1 / (1 + e^(-x))
Hyperbolic Tangent (Tanh) Function:
The tanh function is also commonly used in RNNs. It has a range between -1 and 1,
which makes it useful for non-linear classification tasks. The formula for the tanh
function is:
tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
Rectified Linear Unit (ReLU) Function:
The ReLU function is a non-linear activation function that is widely used in deep neural
networks. It has a range between 0 and infinity, which makes it useful for models that
require positive outputs. The formula for the ReLU function is:
ReLU(x) = max(0, x)
Leaky ReLU Function:
The Leaky ReLU function is similar to the ReLU function, but it introduces a small slope
to negative values, which helps to prevent "dead neurons" in the model. The formula for
the Leaky ReLU function is:
Leaky ReLU(x) = max(0.01x, x)
Softmax Function:
The softmax function is often used in the output layer of RNNs for multi-class
classification tasks. It converts the network output into a probability distribution over
the possible classes. The formula for the softmax function is:
softmax(x_i) = e^(x_i) / ∑_j e^(x_j)
These are just a few examples of the activation functions used in RNNs. The choice of
activation function depends on the specific task and the model's architecture.
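As a quick reference, these activation functions can be sketched in Python with NumPy as follows. This is a minimal illustration rather than code from any particular RNN library:

import numpy as np

def sigmoid(x):
    # sigmoid(x) = 1 / (1 + e^(-x)); output lies in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)); output lies in (-1, 1)
    return np.tanh(x)

def relu(x):
    # ReLU(x) = max(0, x)
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    # Leaky ReLU(x) = max(slope * x, x); the small slope helps avoid "dead neurons"
    return np.maximum(slope * x, x)

def softmax(x):
    # softmax(x_i) = e^(x_i) / sum_j e^(x_j); shifting by max(x) keeps exp() numerically stable
    e = np.exp(x - np.max(x))
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))  # prints a probability distribution that sums to 1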
Step 2: Decide What New Information to Add to the Cell State
With the current input at x(t), the input gate analyzes the important information: John plays football, and the fact that he was the captain of his college team is important. “He told me yesterday over the phone” is less important, so it is forgotten. This process of adding some new information is done via the input gate.
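A minimal sketch of this step in Python with NumPy might look like the following. The weight names W_i and W_c, the sizes, and the forget-gate value f_t (computed in the previous step) are assumptions for illustration, not details from the article:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

hidden_size, input_size = 4, 3
rng = np.random.default_rng(1)

# Assumed, randomly initialized weights for the input gate and the candidate values
W_i = rng.normal(size=(hidden_size, hidden_size + input_size))
b_i = np.zeros(hidden_size)
W_c = rng.normal(size=(hidden_size, hidden_size + input_size))
b_c = np.zeros(hidden_size)

def add_new_information(x_t, h_prev, c_prev, f_t):
    """Update the cell state with new information; f_t is the forget gate from the earlier step."""
    z = np.concatenate([h_prev, x_t])
    i_t = sigmoid(W_i @ z + b_i)        # input gate: how much new information to let in
    c_tilde = np.tanh(W_c @ z + b_c)    # candidate values that could be added to the state
    c_t = f_t * c_prev + i_t * c_tilde  # keep part of the old state, add the gated new information
    return c_t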
Step 3: Decide What Part of the Current Cell State Makes It
to the Output
The third step is to decide what the output will be. First, we run a sigmoid layer, which
decides what parts of the cell state make it to the output. Then, we put the cell state
through tanh to push the values to be between -1 and 1 and multiply it by the output of
the sigmoid gate.
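In the same NumPy style, a hedged sketch of this output step could look like this; W_o, b_o, and the cell state c_t are assumed names used only for illustration:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

hidden_size, input_size = 4, 3
rng = np.random.default_rng(2)

# Assumed, randomly initialized output-gate weights
W_o = rng.normal(size=(hidden_size, hidden_size + input_size))
b_o = np.zeros(hidden_size)

def output_step(x_t, h_prev, c_t):
    """Decide what part of the cell state makes it to the output."""
    z = np.concatenate([h_prev, x_t])
    o_t = sigmoid(W_o @ z + b_o)   # sigmoid layer: which parts of the cell state to expose
    h_t = o_t * np.tanh(c_t)       # push the state to between -1 and 1, then scale by the gate
    return h_t                     # h_t is the output (and the next hidden state)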
Let’s consider this example to predict the next word in the sentence: “John played
tremendously well against the opponent and won for his team. For his contributions,
brave ____ was awarded player of the match.”
There could be many choices for the empty space. The current input, “brave,” is an adjective, and adjectives describe nouns. So, “John” could be the best output after “brave.”