
AD3501 – Deep Learning

UNIT-III
RECURRENT NEURAL NETWORKS

Unfolding Graphs – RNN Design Patterns: Acceptor – Encoder – Transducer;


Gradient Computation – Sequence Modeling Conditioned on Contexts –
Bidirectional RNN – Sequence to Sequence RNN – Deep Recurrent Networks –
Recursive Neural Networks – Long Term Dependencies; Leaky Units: Skip
connections and dropouts; Gated Architecture: LSTM.


3.1 UNFOLDING GRAPHS


A computational graph is a way to formalize the structure of a set of computations, such as
those involved in mapping inputs and parameters to outputs and loss.
 We can unfold a recursive or recurrent computation into a computational graph that
has a repetitive structure, corresponding to a chain of events
 Unfolding this graph results in sharing of parameters across a deep network structure
 The figure below shows the unfolding of a recurrent neuron in a Recurrent Neural
Network.

Recurrent Neural Networks
Recurrent Neural Networks (RNNs) are a class of neural networks designed for
processing sequences of data. A recurrent neural network (RNN) is a deep learning model
that is trained to process and convert a sequential data input into a specific sequential data
output.

Example of unfolding a recurrent equation:


The classical form of a dynamical system is
s(t) = f(s(t-1); θ)
• where s(t) is called the state of the system
• The equation is recurrent because the definition of s at time t refers back to the same
definition at time t-1
For a finite number of time steps τ, the graph can be unfolded by applying the definition τ-1
times
• E.g., for τ = 3 time steps we get
s(3) = f(s(2); θ)
     = f(f(s(1); θ); θ)
• Unfolding the equation by repeatedly applying the definition in this way yields an
expression without recurrence
• s(1) is the ground state, and s(2) is computed by applying f

• Such an expression can be represented by a traditional acyclic computational graph
(as shown next)

Unfolded dynamical system:


The classical dynamical system described by
s(t) = f(s(t-1); θ) and s(3) = f(f(s(1); θ); θ)
is illustrated as an unfolded computational graph

• Each node represents state at some time t


• Function f maps state at time t to the state at t+1
• The same parameters (the same value of θ used to parameterize f ) are used for all
time steps
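
A minimal Python sketch of this unfolding (the transition function f and the value of θ below
are illustrative placeholders, not prescribed by the text):

# Unfolding s(t) = f(s(t-1); θ) for τ = 3 steps yields an expression with no recurrence.
def f(s, theta):
    return theta * s          # toy transition function; any function of (s, θ) would do

theta, s1 = 0.9, 1.0          # parameters θ and the ground state s(1)
s2 = f(s1, theta)             # s(2) = f(s(1); θ)
s3 = f(s2, theta)             # s(3) = f(f(s(1); θ); θ) -- the same θ is reused at every step
print(s2, s3)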
Dynamical System Driven By External Signal
As another example, consider a dynamical system driven by an external (input) signal x(t):
s(t) = f(s(t-1), x(t); θ)
• The state now contains information about the whole past input sequence
• Note that the previous dynamical system was simply s(t) = f(s(t-1); θ)
• Recurrent neural networks can be built in many ways
• Much as almost any function can be considered a feedforward neural network, essentially
any function involving recurrence can be considered a recurrent neural network

Defining values of hidden units in RNNs
• Many recurrent neural nets use same equation (as dynamical system with external
input) to define values of hidden units
• To indicate that the state is hidden, rewrite using the variable h for the state:
h(t) = f(h(t-1), x(t); θ)
It is illustrated below:

• Typical RNNs have extra architectural features such as output layers that read
information out of state h to make predictions
A Recurrent Network with No Outputs
This network just processes information from input x by incorporating it into state h that is
passed forward through time.

Typical RNNs will add extra architectural features such as output layers to read
information out of the state h to make predictions.
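
A minimal numpy sketch of this update rule, h(t) = f(h(t-1), x(t); θ), unrolled over a short
sequence (the sizes, random inputs, and tanh nonlinearity are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, T = 3, 5, 4
W_xh = 0.1 * rng.standard_normal((hidden_dim, input_dim))   # part of θ: input-to-hidden weights
W_hh = 0.1 * rng.standard_normal((hidden_dim, hidden_dim))  # part of θ: hidden-to-hidden weights
b = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                    # initial hidden state h(0)
xs = rng.standard_normal((T, input_dim))    # input sequence x(1), ..., x(T)
for x in xs:                                # the same parameters θ are reused at every time step
    h = np.tanh(W_xh @ x + W_hh @ h + b)    # h(t) = f(h(t-1), x(t); θ)
print(h)                                    # the final state summarizes the whole input sequence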
Unfolding in Recurrent Neural Networks (RNNs)
In RNNs, the network can be thought of as a graph where nodes represent neurons and edges
represent connections with weights. These connections form cycles, representing the

temporal dependencies in sequential data. Unfolding an RNN over time helps visualize and
understand how the network processes input sequences.
1. Initial State: The RNN starts with an initial hidden state h_0.
2. Input Sequence: At each time step t, the RNN takes an input x_t and updates its
hidden state h_t.
3. Unfolding: Unfolding the RNN over T time steps results in a sequence of states
and operations.

Applications
 Sequence Prediction: In RNNs, unfolding helps in tasks like language modeling,
time series forecasting, and more.
 Node Classification: In GNNs, unfolding is useful for node classification tasks where
node embeddings are iteratively refined.
 Graph Classification: GNNs can be used to classify entire graphs by unfolding and
aggregating information from all nodes.
Disadvantages
1. Vanishing and Exploding Gradients:
 Vanishing Gradients: As the network is unfolded over many time steps, the
gradients can become very small, leading to very slow learning or the inability to
learn long-term dependencies.
 Exploding Gradients: Conversely, the gradients can also grow exponentially, causing the
model parameters to become unstable and leading to numerical overflow.

2. Computational Complexity:
 Time Complexity: Unfolding RNNs over many time steps can be computationally
intensive, especially for long sequences.
 Memory Usage: Storing the intermediate states for backpropagation through time
(BPTT) requires significant memory, which can be a bottleneck for training deep
RNNs on large datasets.
3. Training Difficulties:
 Non-Parallelizable: RNNs inherently process data sequentially, making it difficult to
parallelize training, unlike feedforward neural networks.
 Long Training Times: Due to sequential data processing and the need for
backpropagation through time, training RNNs can be slow.
4. Model Complexity:
 Overfitting: With the increased number of parameters and complex dependencies,
RNNs can easily overfit to the training data, especially if not regularized properly.

3.2 RNN DESIGN PATTERNS


RNN (Recurrent Neural Network) design patterns refer to the various architectural variations
and techniques developed to enhance the performance, training efficiency, and capabilities of
RNNs. These patterns address common challenges in sequential data processing, such as
long-term dependencies, vanishing/exploding gradients, and the need for context from both
past and future inputs.
Examples of RNN Design Patterns:
1. Acceptor
2. Encoder
3. Transducer
a. Acceptor RNN
An acceptor RNN is designed to take an input sequence and produce a single output,
typically used for classification tasks. The key idea is that the final hidden state of the RNN
captures the summary of the entire input sequence, which can then be used for the final
decision-making process.

Architecture:
 Input: Sequence of data points (e.g., words in a sentence).
 RNN Layer: Processes the input sequence and updates the hidden state at each time
step.
 Output: A single output, such as a classification label, derived from the final hidden
state.
Example Use Case: Sentiment analysis of a sentence, where the model outputs a sentiment
label (positive or negative) based on the entire input sentence.
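
A minimal PyTorch sketch of such an acceptor (the class name, dimensions, and choice of a
plain nn.RNN are illustrative assumptions, not a prescribed implementation):

import torch
import torch.nn as nn

class AcceptorRNN(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, tokens):                  # tokens: (batch, seq_len) integer ids
        x = self.embed(tokens)                  # (batch, seq_len, embed_dim)
        _, h_n = self.rnn(x)                    # h_n: final hidden state, (1, batch, hidden_dim)
        return self.classifier(h_n.squeeze(0))  # one label per sequence from the final state

For sentiment analysis, tokens would be the word ids of a sentence and num_classes would be 2
(positive/negative).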
b. Encoder RNN
An encoder RNN is part of the encoder-decoder architecture commonly used in sequence-to-
sequence tasks, such as machine translation. The encoder processes the input sequence and
encodes it into a fixed-size context vector (or a sequence of vectors).
Architecture:
 Input: Sequence of data points.
 Encoder RNN Layer: Processes the input sequence and produces a context vector
representing the entire sequence.
Example Use Case: In machine translation, the encoder RNN processes the input sentence
(source language) and converts it into a context vector that summarizes the sentence.

c. Transducer RNN

A transducer RNN refers to a model that converts an input sequence into an output
sequence, typically using both an encoder and a decoder. This term is often used in the
context of sequence transduction tasks, where the goal is to transform one sequence into
another.
Architecture:
 Input: Sequence of data points.
 Encoder RNN Layer: Encodes the input sequence into a context vector.
 Decoder RNN Layer: Takes the context vector and generates the output sequence.
Example Use Case: Speech recognition, where the input is a sequence of audio frames and
the output is a sequence of transcribed text.
Combined Architecture: Acceptor-Encoder-Transducer

Input Sequence (x) -> Acceptor -> Encoder -> Decoder -> Output Sequence (y)

a) Input Sequence (x): This is the sequence of input data that the model processes.
b) Output Sequence (y): The desired output sequence that the model aims to generate or
predict.
c) Acceptor: Initial processing step that prepares the input sequence for further
encoding.
d) Encoder: Converts the processed input sequence into a fixed-size context vector.
e) Decoder: Utilizes the context vector from the encoder to generate an output sequence
step-by-step.
f) Decoder hidden state: Maintains the state of the decoder RNN across time steps,
influencing subsequent outputs.
Applications
It possesses many applications such as
• Google’s Machine Translation
• Question answering chatbots
• Speech recognition
• Time Series Application etc.,

3.3 GRADIENT COMPUTATION
The gradient computation in an RNN involves recurrent multiplication by the weight matrix W;
multiplying by W at every time step can shrink or blow up the gradients. In a recurrent neural
network (RNN), gradient computation means calculating the gradients of the loss function with
respect to the model parameters, typically using backpropagation through time (BPTT): the
chain rule of calculus is applied to propagate errors backwards through time, respecting the
sequential nature of the RNN.
Gradients:
Gradients are the values that indicate how much a parameter in a neural network
should change to reduce the error. They are computed using a technique called
backpropagation, which involves applying the chain rule of calculus to propagate the error
from the output layer to the input layer.

How is gradient calculated in neural networks?


Given the training examples, we calculate the gradients by performing two passes through the
network: a forward pass and a backward pass. The forward pass takes the input values x and the
current weights W1 and W2, and calculates the error/loss E(z2, y) between the actual output
values z2 and the expected output values y.


General Backpropagation to compute gradient
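
A small illustrative sketch using PyTorch autograd (the sizes, toy loss, and random data are
assumptions): the forward pass is unrolled over T steps, and loss.backward() performs BPTT, so
the gradient of the recurrent weight matrix W accumulates a contribution from every time step.

import torch

T, hidden = 20, 4
W = (0.5 * torch.randn(hidden, hidden)).requires_grad_()  # recurrent weight matrix W
U = torch.randn(hidden, 1)                                # input weights (kept fixed here)
h = torch.zeros(hidden)

for t in range(T):                    # forward pass unrolled over T time steps
    x = torch.randn(1)
    h = torch.tanh(W @ h + U @ x)

loss = h.sum()                        # toy loss on the final hidden state
loss.backward()                       # BPTT: chain rule applied backwards through all T steps
print(W.grad.norm())                  # very small or very large norms signal vanishing/exploding gradients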

3.4 SEQUENCE MODELING CONDITIONED ON CONTEXTS


In the context of Recurrent Neural Networks (RNNs), conditioning sequence modeling on
contexts involves using additional information to influence the generation or prediction of
sequences over time.
1. Context Definition:
o The context can be defined as any external information that helps guide the
sequence generation process. In the case of RNNs, this often includes previous
elements of the sequence or additional features related to the sequence.
2. Conditioning Mechanism in RNNs:
o RNNs maintain a hidden state that evolves over time as new elements of the
sequence are processed. This hidden state can be seen as a form of internal
context that encapsulates information from previous time steps.
o External contexts can be incorporated into RNNs in several ways:
 Initial State: The initial hidden state of the RNN can be initialized
based on the context information before processing the sequence.

 Concatenation: Contextual information can be concatenated with each
input vector at every time step before feeding it into the RNN.
 Attention Mechanisms: Attention mechanisms can dynamically
weight the importance of context information at different time steps,
influencing the RNN's hidden state evolution.
 Conditional RNNs: RNNs can be conditioned on external variables or
signals, modifying their behavior based on these inputs.
3. Applications:
o Language Modeling: Predicting the next word in a sentence based on the
context provided by previous words.
o Time Series Prediction: Forecasting future values in a time series considering
historical data and external factors.
o Behavior Modeling: Modeling user behavior over time considering
contextual factors like previous actions or user demographics.
4. Training and Optimization:
o During training, RNNs learn to leverage context information to improve
sequence prediction accuracy.
o Gradient-based optimization techniques like backpropagation through time
(BPTT) are used to update the RNN parameters based on the error signal
propagated through time steps.
5. Challenges:
o Long-Term Dependencies: RNNs may struggle with capturing dependencies
over long sequences, especially if context information is sparse or noisy.
o Context Integration: Effectively integrating context information without
overwhelming the RNN's capacity or increasing computational complexity.
o Overfitting: Ensuring that the RNN learns to generalize from context
information rather than memorizing specific sequences.
In summary, conditioning sequence modeling on contexts within RNNs involves integrating
external information to enhance the model's ability to generate or predict sequences based on
relevant context cues over time. This approach leverages the RNN's recurrent nature to

maintain and update internal state representations influenced by both current inputs and
external contexts.
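
A minimal PyTorch sketch of the concatenation approach mentioned above (the GRU layer,
dimensions, and random context vector are illustrative assumptions): the same context vector c
is appended to the input at every time step before being fed to the recurrent layer.

import torch
import torch.nn as nn

input_dim, context_dim, hidden_dim, T = 8, 4, 16, 10
rnn = nn.GRU(input_dim + context_dim, hidden_dim, batch_first=True)

x = torch.randn(1, T, input_dim)           # input sequence (batch = 1)
c = torch.randn(1, context_dim)            # external context vector
c_rep = c.unsqueeze(1).expand(-1, T, -1)   # repeat the context at every time step
out, h_T = rnn(torch.cat([x, c_rep], dim=-1))   # hidden states now conditioned on the context

Initial-state conditioning would instead map c through a small layer to produce the RNN's
initial hidden state.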

Graphical models of RNNs without/with inputs


1. Directed graphical models of RNNs without inputs

2. RNNs do include a sequence of inputs x(1), x(2), ..., x(τ)

3.5 BIDIRECTIONAL RNN


Bi-directional recurrent neural networks (Bi-RNNs) are artificial neural networks that
process input data in both the forward and backward directions. They are often used in

natural language processing tasks, such as language translation, text classification, and named
entity recognition. In addition, they can capture contextual dependencies in the input data by
considering past and future contexts. Bi-RNNs consist of two separate RNNs that process the
input data in opposite directions, and the outputs of these RNNs are combined to produce the
final output.

After that, the results from these hidden layers are collected and input into a prediction-
making final layer. The goal of a Bi-RNN is to capture the contextual dependencies in the
input data by processing it in both directions, which can be useful in various natural
language processing (NLP) tasks.
In a Bi-RNN, the input data is passed through two separate RNNs: one processes the
data in the forward direction, while the other processes it in the reverse direction. The
outputs of these two RNNs are then combined in some way to produce the final output.


The network has two separate RNNs:


 One that processes the input sequence from left to right
 Another one that processes the input sequence from right to left.
These two RNNs are typically called forward and backward RNNs, respectively.
Need for Bi-directional RNNs
A uni-directional recurrent neural network (RNN) processes input sequences in a
single direction, either from left to right or right to left. This means the network can
only use information from earlier time steps when making predictions at later time
steps.
This can be limiting, as the network may not capture important contextual information
relevant to the output prediction.
Working of Bidirectional Recurrent Neural Network
1. Inputting a sequence: A sequence of data points, each represented as a vector with
the same dimensionality, are fed into a BRNN. The sequence might have different
lengths.
2. Dual Processing: Both the forward and backward directions are used to process the
data. On the basis of the input at that step and the hidden state at step t-1, the hidden
state at time step t is determined in the forward direction. The input at step t and the

hidden state at step t+1 are used to calculate the hidden state at step t in a reverse
way.
3. Computing the hidden state: A non-linear activation function on the weighted sum
of the input and previous hidden state is used to calculate the hidden state at each
step. This creates a memory mechanism that enables the network to remember data
from earlier steps in the process.
4. Determining the output: A non-linear activation function is used to determine the
output at each step from the weighted sum of the hidden state and a number of
output weights. This output has two options: it can be the final output or input for
another layer in the network.
5. Training: The network is trained through a supervised learning approach where the
goal is to minimize the discrepancy between the predicted output and the actual
output. The network adjusts its weights in the input-to-hidden and hidden-to-output
connections during training through backpropagation.
To calculate the output from an RNN unit, we use the following formula:

The hidden state at time t is given by a combination of H_t(Forward) and H_t(Backward).

The output at any given hidden state is:
Y_t = H_t · W_{AY} + b_y
The training of a BRNN is similar to the backpropagation through time (BPTT) algorithm.
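
A minimal PyTorch sketch of a bidirectional RNN (the dimensions are illustrative assumptions):
the forward and backward hidden states are concatenated at each time step before the output
layer.

import torch
import torch.nn as nn

input_dim, hidden_dim, num_classes, T = 10, 20, 2, 7
birnn = nn.RNN(input_dim, hidden_dim, batch_first=True, bidirectional=True)
out_layer = nn.Linear(2 * hidden_dim, num_classes)   # combines forward + backward states

x = torch.randn(1, T, input_dim)
H, _ = birnn(x)            # H: (1, T, 2*hidden_dim) = [forward; backward] state at each step
Y = out_layer(H)           # output Y_t computed from the combined hidden state at each step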
Applications of Bidirectional Recurrent Neural Network

Bi-RNNs have been applied to various natural language processing (NLP) tasks, including:
 Sentiment Analysis: By taking into account both the prior and subsequent
context, BRNNs can be utilized to categorize the sentiment of a particular
sentence.
 Named Entity Recognition: By considering the context both before and after
the stated thing, BRNNs can be utilized to identify those entities in a sentence.
 Part-of-Speech Tagging: The classification of words in a phrase into their
corresponding parts of speech, such as nouns, verbs, adjectives, etc., can be done
using BRNNs.
 Machine Translation: BRNNs can be used in encoder-decoder models for
machine translation, where the decoder creates the target sentence and the
encoder analyses the source sentence in both directions to capture its context.
 Speech Recognition: When the input voice signal is processed in both
directions to capture the contextual information, BRNNs can be used in
automatic speech recognition systems.

Advantages of Bidirectional RNN


 Context from both past and future: With the ability to process sequential
input both forward and backward, BRNNs provide a thorough grasp of the full
context of a sequence. Because of this, BRNNs are effective at tasks like
sentiment analysis and speech recognition.
 Enhanced accuracy: BRNNs frequently yield more precise answers since they
take both historical and upcoming data into account.
 Efficient handling of variable-length sequences: When compared to
conventional RNNs, which require padding to have a constant length, BRNNs
are better equipped to handle variable-length sequences.
 Resilience to noise and irrelevant information: BRNNs may be resistant to
noise and irrelevant data that are present in the data. This is so because both the
forward and backward paths offer useful information that supports the
predictions made by the network.

 Ability to handle sequential dependencies: BRNNs can capture long-term
links between sequence pieces, making them extremely adept at handling
complicated sequential dependencies.
Disadvantages of Bidirectional RNN
 Computational complexity: Given that they analyze data both forward and
backward, BRNNs can be computationally expensive due to the increased
amount of calculations needed.
 Long training time: BRNNs can also take a while to train because there are
many parameters to optimize, especially when using huge datasets.
 Difficulty in parallelization: Due to the requirement for sequential processing
in both the forward and backward directions, BRNNs can be challenging to
parallelize.
 Overfitting: BRNNs are prone to overfitting since they include many
parameters that might result in too complicated models, especially when trained
on short datasets.
 Interpretability: Due to the processing of data in both forward and backward
directions, BRNNs can be tricky to interpret since it can be difficult to
comprehend what the model is doing and how it is producing predictions.
3.6 SEQUENCE-TO-SEQUENCE RNN:
A Recurrent Neural Network, or RNN, is a network that operates on a sequence and
uses its own output as input for subsequent steps. A Sequence to Sequence network, or
seq2seq network, or Encoder Decoder network, is a model consisting of two RNNs called the
encoder and decoder.
Sequence-to-Sequence (Seq2Seq) models using Recurrent Neural Networks (RNNs)
are a popular architecture in machine learning for tasks involving sequences, such as machine
translation, summarization, and dialogue generation.
Components:
Seq2seq model has 3 components:
1. Encoder
2. Decoder

3. Intermediate step (the context vector)

Encoder-Decoder Structure:
 Encoder: Takes an input sequence (e.g., a sentence in one language) and processes it
into a fixed-size context vector, which represents the input sequence in a semantic
space.
 Decoder: Takes this context vector and generates an output sequence (e.g., a
translated sentence in another language) one token at a time.

Recurrent Neural Networks (RNNs):

 Traditionally, Seq2Seq models use RNNs, such as LSTM (Long Short-Term
Memory) or GRU (Gated Recurrent Unit), as building blocks for both the encoder and
decoder.
 RNNs are suited for processing sequential data because they maintain a hidden state
that captures information about the sequence seen so far.
Encoding Phase:
 The input sequence is fed into the encoder RNN token by token. At each time step,
the encoder RNN updates its hidden state based on the current input token and the
previous hidden state.
 Once the entire input sequence is processed, the final hidden state of the encoder
captures the semantic representation of the entire input sequence.
Decoding Phase:
 The decoder RNN initializes its hidden state using the final hidden state of the
encoder.
 During decoding, the decoder RNN generates output tokens one by one. At each time
step, it uses the current input token (or the previously generated token) and the
previous hidden state to predict the next token in the output sequence.
 This process continues until an end-of-sequence token is generated or a predefined
maximum length is reached.
Training:
 Seq2Seq models are trained using pairs of input-output sequences (e.g., pairs of
sentences in two languages for machine translation).
 The model is optimized to minimize the difference between the predicted sequence
(output) and the target sequence (ground truth) using techniques like teacher forcing
(where the true previous output is fed as input during training).
Attention Mechanism (Optional):
 To handle long sequences and improve performance, Seq2Seq models often
incorporate an attention mechanism.
 Attention allows the decoder to focus on different parts of the input sequence
dynamically, aligning the input sequence with the current decoding state.

Encoder and Decoder in the Seq2Seq Model
In the seq2seq model, the encoder and decoder architecture plays a vital role in
converting input sequences into output sequences. Let's explore each block:
RNN based Seq2Seq Model
The encoder and decoder architecture utilizes RNNs to generate the desired outputs. Let's
look at the simplest seq2seq model.
For a given sequence of inputs (x_1, ..., x_T), a standard RNN generates a sequence of outputs
(y_1, ..., y_T) by iterating the following recurrence (σ denotes the hidden-layer nonlinearity):
h_t = σ(W_hx · x_t + W_hh · h_{t-1})
y_t = W_yh · h_t

Recurrent Neural Networks can easily map sequences to sequences when the alignment
between the inputs and the outputs is known in advance. The vanilla version of the RNN is
rarely used; its more advanced variants, i.e. LSTM or GRU, are used instead, because the plain
RNN suffers from the vanishing gradient problem. An LSTM develops the context of a word
by taking two inputs at each point in time: one from the user (the current input) and one from
its own previous output, hence the name recurrent (the output goes back in as input).
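
A minimal encoder-decoder sketch in PyTorch (the class name, dimensions, and the use of LSTMs
with teacher forcing are illustrative assumptions, not the exact models described above):

import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, embed_dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src, tgt):
        _, state = self.encoder(self.src_embed(src))           # context: final (h, c) of the encoder
        dec_out, _ = self.decoder(self.tgt_embed(tgt), state)  # teacher forcing with target tokens
        return self.out(dec_out)                               # logits for each target position

Here src and tgt are integer token-id tensors of shape (batch, length); at inference time the
decoder would instead be run one step at a time, feeding back its own predictions.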


Advantages of seq2seq Models


 Flexibility: Seq2Seq models can handle a wide range of tasks such as machine
translation, text summarization, and image captioning, as well as variable-length input
and output sequences.
 Handling Sequential Data: Seq2Seq models are well-suited for tasks that involve
sequential data such as natural language, speech, and time series data.
 Handling Context: The encoder-decoder architecture of Seq2Seq models allows the
model to capture the context of the input sequence and use it to generate the output
sequence.
 Attention Mechanism: Using attention mechanisms allows the model to focus on
specific parts of the input sequence when generating the output, which can improve
performance for long input sequences.
Disadvantages of seq2seq Models
 Computationally Expensive: Seq2Seq models require significant computational
resources to train and can be difficult to optimize.
 Limited Interpretability: The internal workings of Seq2Seq models can be difficult to
interpret, which can make it challenging to understand why the model is making
certain decisions.

 Overfitting: Seq2Seq models can overfit the training data if they are not properly
regularized, which can lead to poor performance on new data.
 Handling Rare Words: Seq2Seq models can have difficulty handling rare words that
are not present in the training data.
 Handling Long input Sequences: Seq2Seq models can have difficulty handling input
sequences that are very long, as the context vector may not be able to capture all the
information in the input sequence.
Applications of Seq2Seq model
 Text Summarization: The seq2seq model effectively understands the input text, which
makes it suitable for news and document summarization.
 Speech Recognition: Seq2Seq models, especially those with attention mechanisms,
excel at processing audio waveforms for automatic speech recognition (ASR). They are
able to capture spoken language patterns effectively.
 Image Captioning: Seq2Seq models integrate image features from CNNs with text
generation capabilities for image captioning. They can describe images in a
human-readable format.

3.7 DEEP RECURRENT NETWORKS


A Deep RNN (Recurrent Neural Network) refers to a neural network architecture that
has multiple layers of recurrent units stacked on top of each other. It works by processing
information in layers, building up a more complete understanding of the data with each layer.
This helps it capture complex relationships between the different pieces of information and
make better predictions about what might come next. Deep RNNs are used in many real-life
applications, such as speech recognition systems like Siri or Alexa, language translation
software, and even self-driving cars.


Fig: Typical architecture of deep recurrent neural network (RNN).


The computation in most RNNs can be decomposed into three blocks of parameters
and associated transformations:
1. from the input to the hidden state, x(t) → h(t)
2. from the previous hidden state to the next hidden state, h(t-1) → h(t)
3. from the hidden state to the output, h(t) → o(t)
These transformations are represented as a single layer within a deep MLP in the previously
discussed models. However, we can use multiple layers for each of the above transformations,
which results in deep recurrent networks.
The figure below shows the resulting deep RNN if we
(a) break down the hidden-to-hidden transformation,
(b) introduce deeper architectures for all three transformations above, and
(c) add "skip connections" for RNNs that have deep hidden-to-hidden transformations.
Ways of making an RNN deep
a) Hidden recurrent state can be broken down into groups organized hierarchically
b) Deeper computation can be introduced in the input-hidden, hidden-hidden and hidden-
output parts. This may lengthen the shortest path linking different time steps

c) The path lengthening effect can be mitigated by introducing skip connections.
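
A minimal sketch of a deep (stacked) recurrent network in PyTorch, using the num_layers
argument to stack recurrent layers (the sizes are illustrative assumptions):

import torch
import torch.nn as nn

deep_rnn = nn.LSTM(input_size=16, hidden_size=32, num_layers=3, batch_first=True)
x = torch.randn(1, 10, 16)        # (batch, time, features)
out, (h_n, c_n) = deep_rnn(x)     # h_n: (num_layers, batch, hidden) -- one final state per layer
# Each layer receives the hidden-state sequence of the layer below as its input,
# so higher layers can learn increasingly abstract representations of the sequence.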

Training and Challenges:


 Gradient Vanishing/Exploding: Training deep RNNs can be challenging due to
issues with gradients becoming too small (vanishing gradients) or too large
(exploding gradients). Techniques such as gradient clipping, careful initialization of
weights, and using LSTM or GRU units help alleviate these issues.
 Computational Complexity: Deeper networks require more computation and
memory resources, which can increase training time and complexity.
Applications:
 Deep RNNs are widely used in tasks where capturing long-term dependencies and
complex patterns in sequential data is crucial. This includes natural language

processing tasks (machine translation, sentiment analysis), time series forecasting,
speech recognition, and more.
 They are also integrated into more complex architectures, such as seq2seq models
with attention mechanisms, to improve their ability to focus on relevant parts of input
sequences.

Advantages:
 Improved Representation Learning: Each layer in a deep RNN learns increasingly
abstract representations of the input sequence, potentially leading to better
performance on tasks that require understanding hierarchical structures.
 Enhanced Modeling Capabilities: Deeper architectures can capture more intricate
dependencies and patterns in sequential data compared to shallow RNNs or
feedforward networks.

3.8 RECURSIVE NEURAL NETWORKS


Recursive Neural Networks (RvNNs) are a class of deep neural networks that can learn
detailed and structured information. With an RvNN, you can get a structured prediction by
recursively applying the same set of weights to structured inputs. The word recursive
indicates that the neural network is applied recursively to its own output.
Recursive Neural Networks (RecNNs) are a type of neural network architecture
designed to operate on hierarchical structures, such as parse trees or other recursively
structured data. They differ from traditional feedforward and recurrent neural networks by
explicitly modeling the hierarchical relationships inherent in the data.


Due to their deep tree-like structure, Recursive Neural Networks can handle
hierarchical data. The tree structure means combining child nodes and producing parent nodes.
Each child-parent bond has a weight matrix, and similar children have the same weights. The
number of children for every node in the tree is fixed to enable it to perform recursive
operations and use the same weights. RvNNs are used when there's a need to parse an entire
sentence.
A recursive network represents yet another generalization of recurrent networks, with
a different kind of computational graph, which is structured as a deep tree rather than the
chain-like structure of RNNs. The typical computational graph for a recursive network is
illustrated in the figure below. Recursive neural networks were introduced by Pollack (1990).
Recursive networks have been successfully applied to processing data structures as input to
neural nets in natural language processing as well as in computer vision. One clear advantage
of recursive nets over recurrent nets is that for a sequence of the same length τ, the depth
(measured as the number of compositions of nonlinear operations) can be drastically reduced
from τ to O(log τ ), which might help deal with long-term dependencies. An open question is
how to best structure the tree. One option is to have a tree structure which does not depend on
the data, such as a balanced binary tree.


In some application domains, external methods can suggest the appropriate tree
structure. For example, when processing natural language sentences, the tree structure for the
recursive network can be fixed to the structure of the parse tree of the sentence provided by a
natural language parser. Ideally, one would like the learner itself to discover and infer the tree
structure that is appropriate for any given input, as suggested by Bottou (2011). Many
variants of the recursive net idea are possible.
Pros: Compared with an RNN, for a sequence of the same length τ, the depth (measured
as the number of compositions of nonlinear operations) can be drastically reduced from τ to
O(log τ).

Cons: How to best structure the tree? A balanced binary tree is an option, but it is not
optimal for much data. For natural sentences, one can use a parser to yield the tree structure,
but this is both expensive and inaccurate. Thus, recursive NNs are not widely popular.
Recursive Neural Network Implementation
A Recursive Neural Network is used for sentiment analysis in natural language
sentences. It is one of the most important tasks of Natural language Processing (NLP), which
identifies the writing tone and sentiments of the writer in a particular sentence. If a writer
expresses any sentiment, basic labels about the writing tone are recognized. We want to
identify the smaller components like nouns or verb phrases and order them in a syntactic
hierarchy. For example, it identifies whether the sentence showcases a constructive form of
writing or negative word choices.
A variable called 'score' is calculated at each traversal of nodes, telling us which pair of
phrases and words we must combine to form the perfect syntactic tree for a given sentence.
Let us consider the representation of the phrase "a lot of fun" in the following
sentence: "Programming is a lot of fun."
An RNN representation of this phrase would not be suitable because it considers only
sequential relations: each state varies with the preceding words' representation, so a
subsequence that doesn't occur at the beginning of the sentence can't be represented. With an
RNN, when processing the word 'fun', the hidden state will represent the whole sentence.
However, with a Recursive Neural Network (RvNN), the hierarchical architecture can
store the representation of the exact phrase: it lies in the hidden state of the node
R_{a lot of fun}. Thus, syntactic parsing can be implemented with the help of Recursive Neural
Networks.
Recursive Structure:
o RecNNs process structured data where each input example is represented as a
hierarchical tree or a nested structure.
o Each node in the tree represents a component of the input, and nodes are
recursively combined until a single representation (often at the root) is
generated.

Node Representation:
o Leaf Nodes: Represent individual input elements (e.g., words in a sentence,
tokens in a sequence).
o Internal Nodes: Represent combinations of their child nodes, which could be
simple concatenations, weighted sums, or more complex operations depending
on the architecture.
Recursive Computation:
o RecNNs propagate information up the tree structure by recursively applying a
neural network operation at each node.
o At each node, a function combines representations of its child nodes to
produce a new representation for itself.
o This recursive application continues until a single representation is computed
for the entire structure (often at the root node).
Neural Network Operation:
o Each node in a RecNN typically applies a neural network operation to
combine its child nodes' representations.
o Common operations include feedforward layers, convolutional operations, or
more specialized operations tailored to hierarchical data structures.
Output Prediction:
o After the hierarchical structure is processed, the final representation (usually at
the root node) is used for prediction or further processing.
o Depending on the task, this representation can be passed through additional
layers (e.g., fully connected layers) for classification, regression, or other
tasks.
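
A minimal sketch of recursive composition over a binary tree (PyTorch; the nested-tuple tree
format, dimensions, and tanh composition are illustrative assumptions): the same weights are
applied at every node, and leaves are token embeddings.

import torch
import torch.nn as nn

class TreeComposer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.compose = nn.Linear(2 * dim, dim)        # the same weight matrix at every node

    def forward(self, node):
        if isinstance(node, torch.Tensor):            # leaf node: a word/token embedding
            return node
        left, right = node                            # internal node: (left_subtree, right_subtree)
        h_l, h_r = self.forward(left), self.forward(right)
        return torch.tanh(self.compose(torch.cat([h_l, h_r], dim=-1)))

dim = 8
w1, w2, w3 = (torch.randn(dim) for _ in range(3))
root = TreeComposer(dim)(((w1, w2), w3))              # phrase ((w1 w2) w3) composed bottom-up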


Applications of Recursive Neural Networks:
 Natural Language Processing (NLP): RecNNs are used for tasks such as parsing,
sentiment analysis, and text classification where sentences or phrases can be
represented hierarchically.
 Image Processing: They can be adapted to process hierarchical representations in
images, such as object recognition in scenes.
 Bioinformatics: Analyzing biological data, such as protein structure prediction,
where molecular structures are naturally hierarchical.

 Robotics: Handling hierarchical data in robotics tasks, such as action planning or
perception.

Advantages:
o Hierarchical Representation: RecNNs naturally capture hierarchical
relationships in data.
o Flexibility: They can handle variable-sized input structures and adapt to
different domains with hierarchical data.
 Challenges:
o Complexity: Designing and training RecNNs can be more complex compared
to simpler neural network architectures.
o Scalability: Handling large and deep hierarchical structures may require
careful design to avoid computational bottlenecks.
Disadvantages of RvNNs
 The main disadvantage of recursive neural networks can be the tree structure. Using
the tree structure means introducing a particular inductive bias into our model: the bias
corresponds to the assumption that the data follow a tree hierarchy. But that is not
always true, so the network may not be able to learn the existing patterns.
 Another disadvantage of the Recursive Neural Network is that sentence parsing can be
slow and ambiguous. Interestingly, there can be many parse trees for a single sentence.
 Also, it is more time-consuming and labor-intensive to label the training data for
recursive neural networks than to construct recurrent neural networks. Manually
parsing a sentence into short components is more time-consuming and tedious than
assigning a label to a sentence.

3.9 LONG-TERM DEPENDENCIES


Long-term dependencies in the context of Recurrent Neural Networks (RNNs) refer to the
relationships or dependencies between inputs and outputs that are separated by a significant

number of time steps within a sequence. Handling long-term dependencies in traditional
Recurrent Neural Networks (RNNs) has been a significant challenge.
Issues with Long-Term Dependencies in RNNs:
1. Vanishing Gradient Problem:
o During backpropagation through time (BPTT), gradients can become very
small (vanish) as they propagate through many time steps.
o This occurs because gradients are multiplied at each time step, and if they are
less than one, they can diminish exponentially over time.
o As a result, the model struggles to learn dependencies that are spread across
many time steps.
2. Exploding Gradient Problem:
o Conversely, gradients can also become very large (explode), especially in deep
or unstable networks.
o This makes it difficult to train the network effectively as updates become too
large and lead to unstable learning.
3. Short-Term Memory:
o Traditional RNNs have a limited ability to retain information over long
sequences.
o The hidden state may only be capable of remembering information for a few
time steps before it starts to degrade or get overwritten by new input.
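
A small numeric illustration of these two failure modes (numpy; the matrix and scale values are
arbitrary assumptions): backpropagating a gradient through 50 time steps amounts to repeated
multiplication by the recurrent Jacobian, so the gradient norm collapses when its spectral
radius is below 1 and explodes when it is above 1.

import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
for scale in (0.5, 1.5):                             # target spectral radius of the Jacobian
    J = scale * W / np.abs(np.linalg.eigvals(W)).max()
    g = np.ones(8)                                   # stand-in for dL/dh at the last time step
    for _ in range(50):                              # backpropagate through 50 time steps
        g = J.T @ g
    print(scale, np.linalg.norm(g))                  # tiny norm for 0.5, huge norm for 1.5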

Approaches to Mitigate Long-Term Dependency Issues:
1. Long Short-Term Memory (LSTM):
o LSTMs are a type of RNN architecture designed to address the vanishing
gradient problem and improve the model's ability to capture long-term
dependencies.
o They achieve this by using a more complex memory cell structure with gates
(input gate, forget gate, output gate) that regulate the flow of information into
and out of the cell.
o This allows LSTMs to selectively remember or forget information over long
sequences, thus maintaining long-term dependencies more effectively.

2. Gated Recurrent Unit (GRU):


o GRUs are another variant of RNNs that simplify the LSTM architecture by
merging the cell state and hidden state into a single vector.
o They use reset and update gates to control the flow of information similarly to
LSTMs, offering a balance between performance and simplicity.
3. Bidirectional RNNs:
o Bidirectional RNNs process the input sequence in both forward and backward
directions, allowing each time step to access information from both past and
future contexts.
o This helps capture dependencies that may span across the entire sequence,
improving the model's ability to learn long-term relationships.
4. Attention Mechanisms:
o Attention mechanisms allow the model to selectively focus on relevant parts
of the input sequence.
o By dynamically weighting different parts of the input sequence, attention
mechanisms can effectively handle long-term dependencies by attending to the
most relevant information at each decoding step.

5. Deep RNNs:
o Stacking multiple layers of RNNs (deep RNNs) can also help in capturing
hierarchical representations and learning more abstract features over long
sequences.
o Each layer can learn different levels of abstraction, with higher layers
capturing longer-term dependencies.

3.10 LEAKY UNITS: SKIP CONNECTIONS AND DROPOUTS


Hidden units with linear self-connections behave similarly to a running average; they are
called leaky units. In practice, "leaky units" typically refer to using activation functions that
allow a small, non-zero gradient when the unit is not active. This is primarily aimed at
addressing the vanishing gradient problem, which can hinder the training of deep RNNs.

3.10.1 Leaky Units in RNNs


Leaky units, specifically the Leaky ReLU (Rectified Linear Unit) activation function,
are used to improve the learning and performance of neural networks, including
Recurrent Neural Networks (RNNs). The core idea behind leaky units is to address
the vanishing gradient problem, which is particularly challenging in RNNs due to
their sequential nature.

Leaky units, particularly Leaky ReLU activations, are used to address the vanishing gradient
problem by allowing a small, non-zero gradient for negative inputs:

 Functionality: Leaky ReLU modifies the standard ReLU activation function to
f(x) = x if x > 0, and f(x) = ax if x ≤ 0, where a is a small constant (e.g., 0.01).
 Advantages: Helps maintain gradient flow through the network, preventing neurons
from becoming inactive and improving the RNN's ability to capture long-term
dependencies.
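
A one-line numpy sketch of the Leaky ReLU described above (the slope a = 0.01 is the commonly
used default):

import numpy as np

def leaky_relu(x, a=0.01):
    # f(x) = x for x > 0 and f(x) = a*x otherwise, so the gradient is never exactly zero
    return np.where(x > 0, x, a * x)

print(leaky_relu(np.array([-2.0, 0.5])))   # [-0.02  0.5]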
3.10.2 Skip Connections in RNNs
Skip connections, or residual connections, involve adding the input of a layer directly to the
output of a deeper layer in the network:
 Purpose: Facilitates gradient propagation through the network by providing an
additional path for information and gradients to flow.
 Implementation: In RNNs, skip connections can be introduced by adding the input
x_t directly to the output h_t of a recurrent layer, allowing for easier learning
of complex dependencies over time.
Skip connections (or shortcut connections), as the name suggests, skip some of the
layers in the neural network and feed the output of one layer as the input to later
layers. Skip connections were introduced to solve different problems in different
architectures: in the case of ResNets, skip connections solved the degradation problem,
whereas in the case of DenseNets, they ensured feature reusability.


1. Basic RNN Architecture


In a standard RNN, the hidden state at time t (h_t) is computed using the hidden state at
the previous time step (h_{t-1}) and the input at the current time step (x_t):
h_t = f(W_hh · h_{t-1} + W_xh · x_t + b), where f is a nonlinearity such as tanh.

2. Adding Skip Connections


Skip connections allow the hidden state at time t to be directly influenced by hidden states
from earlier time steps, not just the immediately preceding one. For example, a skip
connection might connect a hidden state from several steps earlier directly to the computation
of the current hidden state.

3. Residual Connections

This form of skip connection can help to preserve information from earlier time steps and
make the training of deep RNNs more stable.
4. Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU)
Advanced RNN architectures like LSTM and GRU inherently incorporate mechanisms to
handle long-term dependencies better. Skip connections can still be added to these
architectures to further enhance their performance, as sketched below:
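
A minimal sketch of a residual (skip) connection wrapped around a recurrent cell in PyTorch
(the GRU cell, matching dimensions, and random data are illustrative assumptions): the previous
hidden state is added back to the cell's output, giving gradients an extra additive path
backwards through time.

import torch
import torch.nn as nn

hidden_dim, T = 16, 12
cell = nn.GRUCell(hidden_dim, hidden_dim)   # input and hidden sizes kept equal for the residual add

h = torch.zeros(1, hidden_dim)
xs = torch.randn(T, 1, hidden_dim)
for x in xs:
    h = cell(x, h) + h                      # residual connection: gradients also flow through the added h path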

Benefits of Skip Connections in RNNs


1. Improved Gradient Flow: Skip connections help mitigate the vanishing gradient
problem, allowing gradients to flow more easily through the network during
backpropagation.
2. Better Representation Learning: By allowing the network to access information
from earlier time steps, skip connections enable the learning of more complex
temporal patterns.
3. Faster Convergence: Networks with skip connections often converge faster during
training due to improved gradient flow.
3.10.3 Dropouts in RNNs
Dropout is a regularization technique where a fraction of neurons are randomly dropped out
during training:
 Purpose: Prevents overfitting by forcing the network to learn more robust features
that generalize better to unseen data.
 Integration: In RNNs, dropout can be applied to the input and/or output of each
recurrent unit at each time step, improving the network's generalization capability.

Dropout is a computationally inexpensive, yet powerful regularization technique. The
problem with bagging is that we can’t train an exponentially large number of models and store
them for prediction later.

Dropout makes bagging practical by making an inexpensive approximation. In a simplistic


view, dropout trains the ensemble of all sub-networks formed by randomly removing a few
non-output units by multiplying their outputs by 0. For every training sample, a mask is
computed for all the input and hidden units independently. For clarification, suppose we
have h hidden units in some layer. Then, a mask for that layer refers to an h-dimensional vector
with values either 0 (remove the unit) or 1 (keep the unit).
Bagging vs Dropout
In bagging, the models are independent of each other, whereas in dropout, the different
models share parameters, with each model taking as input, a sample of the total parameters.
In bagging, each model is trained till convergence, but in dropout, each model is trained
for just one step and the parameter sharing makes sure that subsequent updates ensure better
predictions in the future.
At test time, we combine the predictions of all the models. In the case of bagging with
K models, this was given by the arithmetic mean. In case of dropout, the probability that a
model is chosen is given by p(μ), with μ denoting the mask vector. The prediction then

becomes ∑ p(μ)p(y|x, μ). This is not computationally feasible, and there’s a better method to
compute this in one go, using the geometric mean instead of the arithmetic mean.
We need to take care of two main things when working with geometric mean:
 None of the probabilities should be zero.
 Re-normalization to make sure all the probabilities sum to 1.

The advantage of dropout is that this can be approximated in one pass of the complete
model by dividing the weight values by the keep probability (the weight scaling inference rule).
The motivation behind this is to capture the right expected values from the output of each unit,
i.e. the total expected input to a unit at train time is equal to the total expected input at test time.
A big advantage of dropout then is that it doesn’t place any restriction on the type of model or
training procedure to use.
Points to note:
 Reduces the representational capacity of the model and hence, the model should be
large enough to begin with.
 Works better with more data.
 Equivalent to L² regularization for linear regression, with a different weight decay coefficient
for each input feature.
Dropout in RNNs
In RNNs, dropout can be applied in several ways:
1. Input Dropout: Dropout is applied to the input features.
2. Output Dropout: Dropout is applied to the output of the RNN layer.
3. Recurrent Dropout: Dropout is applied to the connections between the recurrent
units.
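
A minimal sketch of input and output dropout around a stacked LSTM in PyTorch (the rates,
sizes, and class name are illustrative assumptions; note that nn.LSTM's own dropout argument
applies between stacked layers, so true recurrent dropout on the within-layer connections would
need a custom cell):

import torch
import torch.nn as nn

class RNNWithDropout(nn.Module):
    def __init__(self, input_dim=16, hidden_dim=32, p=0.3):
        super().__init__()
        self.in_drop = nn.Dropout(p)                       # input dropout
        self.rnn = nn.LSTM(input_dim, hidden_dim, num_layers=2,
                           dropout=p, batch_first=True)    # dropout between the stacked layers
        self.out_drop = nn.Dropout(p)                      # output dropout

    def forward(self, x):                                  # x: (batch, time, input_dim)
        out, _ = self.rnn(self.in_drop(x))
        return self.out_drop(out)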

3.11 GATED ARCHITECTURE: LSTM
Gated architectures in deep learning refer to neural network structures that include
mechanisms (gates) to regulate the flow of information. These gates control what information
should be retained, updated, or discarded at each step of the learning process, allowing the
network to manage long-term dependencies and solve issues like the vanishing gradient
problem.
Key Gated Architectures
1. Long Short-Term Memory (LSTM) Networks
2. Gated Recurrent Unit (GRU) Networks
3. Highway Networks
Gated architectures, such as the Long Short-Term Memory (LSTM) networks, are a type of
recurrent neural network (RNN) designed to better capture long-term dependencies in
sequence data. Here's an overview of LSTM architecture and its components:
LSTM Architecture
LSTM networks consist of a series of repeating modules (cells), each containing three gates
to regulate the flow of information:
1. Forget Gate: Determines which information from the previous cell state should be
discarded.
2. Input Gate: Decides which new information will be stored in the cell state.
3. Output Gate: Controls which part of the cell state will be output as the hidden state.


Key Components
 Cell State (C_t): Acts as a memory that carries information across different time
steps.
 Hidden State (h_t): The output at the current time step, used for predictions and fed
into the next cell.

Equations


Advantages of LSTM
 Long-Term Dependency Learning: Unlike traditional RNNs, LSTMs can learn
dependencies over long sequences, which is crucial for tasks like language modeling
and time-series forecasting.
 Avoids Vanishing Gradient Problem: The gating mechanism helps mitigate the
vanishing gradient problem, allowing the model to retain important information over
extended periods.
Applications
 Natural Language Processing (NLP): Language translation, sentiment analysis, and
speech recognition.
 Time-Series Analysis: Stock price prediction, weather forecasting, and anomaly
detection.
 Sequential Data Processing: Video analysis, music composition, and handwriting
recognition.
LSTMs have been widely adopted in various fields due to their ability to handle complex
sequential data effectively.
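
A minimal PyTorch sketch of applying an LSTM to a sequence (the dimensions and random data are
illustrative assumptions):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(1, 5, 8)            # (batch, time, features)
out, (h_n, c_n) = lstm(x)           # out: the hidden state h_t at every time step
# h_n: final hidden state (after the output gate); c_n: final cell state C_t (the long-term memory)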

PART-A
1. What is Recurrent Neural Network?
Traditional neural networks mainly have independent input and output layers, which
makes them inefficient when dealing with sequential data. Hence, a new type of neural network,
the Recurrent Neural Network, was introduced to store the results of previous outputs in an
internal memory. These results are then fed back into the network inputs in order to predict the
output of the layer. This allows it to be used in applications like pattern detection, speech and
voice recognition, natural language processing, and time series prediction.

2. Why Recurrent Neural Networks?


RNN were created because there were a few issues in the feed-forward neural network:
• Cannot handle sequential data
• Considers only the current input
• Cannot memorize previous inputs

3. How Does Recurrent Neural Networks Work?


 In Recurrent Neural networks, the information cycles through a loop to the middle
hidden layer.
 The input layer ‘x’ takes in the input to the neural network and processes it and passes
it onto the middle layer.
 The middle layer 'h' can consist of multiple hidden layers, each with its own activation
functions, weights, and biases. If we have a neural network where the various
parameters of different hidden layers are not affected by the previous layer, i.e., the
neural network does not have memory, then we can use a recurrent neural network.

4. Define Feed-Forward Neural Networks vs Recurrent Neural Networks


 Feed-forward neural networks transmit data in one direction—from input to output—
without feedback loops, making them suitable for tasks like pattern recognition and
classification.

 A recurrent neural network (RNN) is a deep learning model that is trained to process
and convert a sequential data input into a specific sequential data output.

5. Give the applications of Recurrent Neural Networks


Image Captioning: RNNs are used to caption an image by analysing the activities present.
Time Series Prediction: Any time series problem, like predicting the prices of stocks in a
particular month, can be solved using an RNN.
Natural Language Processing: Text mining and Sentiment analysis can be carried out using an
RNN for Natural Language Processing (NLP).

6. Advantages of Recurrent Neural Network


Recurrent Neural Networks (RNNs) have several advantages over other types of neural
networks, including:
 Ability to Handle Variable-Length Sequences: RNNs are designed to handle input
sequences of variable length, which makes them well-suited for tasks such as speech
recognition, natural language processing, and time series analysis.
 Memory of Past Inputs: RNNs have a memory of past inputs, which allows them to
capture information about the context of the input sequence. This makes them useful

for tasks such as language modelling, where the meaning of a word depends on the
context in which it appears.

 Parameter Sharing: RNNs share the same set of parameters across all time steps, which
reduces the number of parameters that need to be learned and can lead to better
generalization.
 Non-Linear Mapping: RNNs use non-linear activation functions, which allow them to
learn complex, non-linear mappings between inputs and outputs.
 Sequential Processing: RNNs process input sequences step by step, which naturally matches
the temporal structure of sequential data (though it limits parallelism across time steps).

7. Disadvantages of Recurrent Neural Network


Although Recurrent Neural Networks (RNNs) have several advantages, they also have some
disadvantages. Here are some of the main disadvantages of RNNs:
• Vanishing and Exploding Gradients
• Computational Complexity
• Difficulty in Capturing Long-Term Dependencies
• Lack of Parallelism
• Difficulty in Choosing the Right Architecture
• Difficulty in Interpreting the Output

8. What are the types of Recurrent Neural Networks?


1. One-to-One
2. One-to-many
3. Many-to-One
4. many-to-many

9. Define Unfolding Graphs


A computational graph is a way to formalize the structure of a set of computations, such as
those involved in mapping inputs and parameters to outputs and loss.

 We can unfold a recursive or recurrent computation into a computational graph that
has a repetitive structure - Corresponding to a chain of events
 Unfolding this graph results in sharing of parameters across a deep network structure
 The below figure shows unfolding the recurrent neuron in a Recurrent Neural
Network.

10. Define One-to-Many


One-to-Many is a type of RNN that gives multiple outputs when given a single input. It takes a
fixed input size and gives a sequence of data outputs. Its applications can be found in Music
Generation and Image Captioning.

11. Define Gradient computation


Gradient computation means calculating the gradients of the loss function with respect to the
model parameters. In an RNN this is typically done with backpropagation through time (BPTT),
which applies the chain rule of calculus to propagate the error backwards through the unrolled
time steps.

12. Define Many-to-Many


Many-to-Many are used to generate a sequence of output data from a sequence of input units.
This type of RNN is further divided into the following two subcategories:
1. Equal Unit Size: In this case, the number of both the input and output units is the same. A
common application can be found in Name-Entity Recognition.
2. Unequal Unit Size: In this case, inputs and outputs have different numbers of units. Its
application can be found in Machine Translation.

13. Give two Issues of Standard RNNs


a. Vanishing Gradient Problem
Recurrent Neural Networks enable us to model time-dependent and sequential data problems,
such as stock market prediction, machine translation, and text generation. We will find,
however, that RNNs are hard to train because of the gradient problem.

RNNs suffer from the problem of vanishing gradients. The gradients carry information used in
the RNN, and when the gradient becomes too small, the parameter updates become
insignificant. This makes the learning of long data sequences difficult.
b. Exploding Gradient Problem
While training a neural network, if the slope tends to grow exponentially instead of decaying,
this is called an Exploding Gradient. This problem arises when large error gradients
accumulate, resulting in very large updates to the neural network model weights during the
training process. Long training time, poor performance, and bad accuracy are the major issues
in gradient problems.

14. Define Long Short-Term Memory (LSTM) Networks


LSTM is a type of RNN that is designed to handle the vanishing gradient problem that can
occur in standard RNNs. It does this by introducing three gating mechanisms that control the
flow of information through the network: the input gate, the forget gate, and the output gate.
These gates allow the LSTM network to selectively remember or forget information from the
input sequence, which makes it more effective for long-term dependencies.

15. Define Gated Recurrent Unit (GRU) Networks


GRU is another type of RNN that is designed to address the vanishing gradient problem. It has
two gates: the reset gate and the update gate. The reset gate determines how much of the
previous state should be forgotten, while the update gate determines how much of the new
state should be remembered. This allows the GRU network to selectively update its internal
state based on the input sequence

16. Define Bidirectional RNNs


Bidirectional RNNs are designed to process input sequences in both forward and backward
directions. This allows the network to capture both past and future context, which can be
useful for speech recognition and natural language processing tasks.

17. Define Encoder-Decoder RNNs
Encoder-decoder RNNs consist of two RNNs: an encoder network that processes the input
sequence and produces a fixed-length vector representation of the input and a decoder network
that generates the output sequence based on the encoder's representation. This architecture is
commonly used for sequence-to-sequence tasks such as machine translation.

18. Define Encoder-Decoder Model


There are three main blocks in the encoder-decoder model,

a. Encoder
b. Hidden Vector
c. Decoder

19. What are the Application Sequence to Sequence Model works?


It possesses many applications such as
• Google’s Machine Translation
• Question answering chatbots
• Speech recognition
• Time Series Application etc.,

20. Define Back-propagation through time (BPTT)


Back-propagation through time is a widely used algorithm for training recurrent neural
networks (RNNs). It is a variant of the back-propagation algorithm specifically designed to
handle the temporal nature of RNNs, where the output at each time step depends on the inputs
and outputs at previous time steps.

PART-B
1. Explain Recursive Neural Networks with types.
2. Describe Unfolding Computational Graphs.
3. Explain Bidirectional RNNs.
4. Describe the following
i. Teacher Forcing in Recurrent Neural Networks.
ii. Networks with Output Recurrence.
5. Describe Echo State Networks.
6. Explain the challenge of Long-Term Dependencies.
7. Discuss Recurrent Neural Networks in detail.
8. Describe Deep Recurrent Networks in detail.
9. Illustrate Encoder-Decoder sequence-to-sequence Architecture.
10. Leaky Units and Other Strategies for Multiple Time Scales.
11.Point out various features of Echo state networks.

PART-C
1. Develop an example for Unfolding Computational Graphs and describe the major
advantages of the unfolding process.
2. Explain how to compute the gradient in a Recurrent Neural Network.
3. Explain modeling sequences Conditioned on Context with RNNs.
4. Prepare an example of Encoder-Decoder or sequence-to-sequence RNN architecture.
5. Explain various Gated RNNs.
6. Explain the steps to developing the necessary assumption structure in Deep learning?
7. Explain the difference between a Shallow Network and a Deep Network.
8. For the application of Face Detection, which deep learning algorithm would you use?
