Recurrent and Recursive Neural Networks

Module 5 covers Recurrent Neural Networks (RNNs) and their various architectures, including Bidirectional RNNs and Long Short-Term Memory (LSTM) networks, which are designed for processing sequential data and handling long-term dependencies. It discusses the unfolding of computational graphs, the unique capabilities of RNNs, and their applications in fields such as speech recognition and natural language processing. Additionally, it highlights the importance of deep learning infrastructure for large-scale applications and the evolution of techniques in computer vision and NLP.


Module 5

Recurrent and Recursive Neural Networks: Unfolding Computational Graphs, Recurrent Neural Networks,
Bidirectional RNNs, Deep Recurrent Networks, Recursive Neural Networks, The Long Short-Term Memory
and Other Gated RNNs. Applications: Large-Scale Deep Learning, Computer Vision, Speech Recognition, Natural
Language Processing and Other Applications.

Textbook 1: Chapter: 10.1-10.3, 10.5, 10.6, 10.10, 12


Definition of RNN:
Recurrent neural networks, or RNNs, are a family of neural networks for processing sequential
data. RNNs introduce a mechanism where the output from one step is fed back as input to the
next, allowing them to retain information from previous inputs. This design makes RNNs well
suited for tasks where context from earlier steps is essential, such as predicting the next word
in a sentence.

10.1 Unfolding Computational Graphs (important question)


Explain the concept of unfolding a Recurrent Neural Network.
A computational graph is a way to formalize the structure of a set of computations, such as those
involved in mapping inputs and parameters to outputs and loss.
Unfolding turns a recursive or recurrent computation into a computational graph that has a
repetitive structure, typically corresponding to a chain of events. Unfolding this graph results in
the sharing of parameters across a deep network structure.

Example 1: a computational graph without external inputs, where the state is defined recurrently in terms of its own previous value.

Example 2: a computational graph with external inputs x(t), where we see that the state now contains information about the whole past sequence.
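For reference, the recurrences behind these two examples can be sketched as follows (standard forms from section 10.1 of the textbook; s(t) is the state, x(t) the external input and θ the parameters):

s(t) = f(s(t-1); θ)                      (Example 1: no external input)
s(t) = f(s(t-1), x(t); θ)                (Example 2: driven by an external input)

Writing the state as the hidden units h of a network gives h(t) = f(h(t-1), x(t); θ), and unfolding this recurrence for t steps gives h(t) = g(t)(x(t), x(t-1), ..., x(2), x(1)), which makes explicit that the same parameters θ (the same function f) are reused at every time step.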
Recurrent neural networks can be built in many different ways. Much as almost any function
can be considered a feedforward neural network, essentially any function involving recurrence
can be considered a recurrent neural network.

10.2 Recurrent Neural Networks

Recurrent neural networks (RNNs) differ from other neural networks in their
unique capabilities:
 Internal Memory: This is the key feature of RNNs. It allows them to remember past
inputs and use that context when processing new information.
 Sequential Data Processing: Because of their memory, RNNs are exceptional at
handling sequential data where the order of elements matters. This makes them ideal
for speech recognition, machine translation, natural language processing
(NLP) and text generation.
 Contextual Understanding: RNNs can analyze the current input in relation to what
they’ve “seen” before. This contextual understanding is crucial for tasks where
meaning depends on prior information.
 Dynamic Processing: RNNs can continuously update their internal memory as they
process new data, allowing them to adapt to changing patterns within a sequence.

Important design patterns for recurrent neural networks include the following:

 Recurrent networks that produce an output at each time step and have recurrent
connections between hidden units, illustrated in figure 10.3.
 Recurrent networks that produce an output at each time step and have recurrent
connections only from the output at one time step to the hidden units at the next time
step, illustrated in figure 10.4
 Recurrent networks with recurrent connections between hidden units that read an
entire sequence and then produce a single output.

Figure 10.4: An RNN whose only recurrence is the feedback connection from the output to the hidden layer

The forward propagation equations for the RNN are as follows:

Forward propagation begins with a specification of the initial state. Then, for each time step
from t=1 to t= τ, we apply the following update equations:
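A sketch of the standard update equations (as in section 10.2 of the textbook, assuming a tanh hidden activation and a softmax output):

a(t) = b + W h(t-1) + U x(t)
h(t) = tanh(a(t))
o(t) = c + V h(t)
ŷ(t) = softmax(o(t))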

Where the parameters are the bias vectors b and c along with the weight matrices U, V and W,
respectively for input-to-hidden, hidden-to-output and hidden-to-hidden connections.
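A minimal NumPy sketch of this forward pass (illustrative only; the function and variable names, dimensions and random initialization below are hypothetical choices, not taken from the textbook):

import numpy as np

def softmax(z):
    z = z - z.max()                           # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def rnn_forward(xs, h0, U, W, V, b, c):
    """Apply the RNN update equations for each time step t = 1..tau."""
    h, outputs = h0, []
    for x in xs:                              # iterate over the input sequence
        a = b + W @ h + U @ x                 # a(t) = b + W h(t-1) + U x(t)
        h = np.tanh(a)                        # h(t) = tanh(a(t))
        o = c + V @ h                         # o(t) = c + V h(t)
        outputs.append(softmax(o))            # yhat(t) = softmax(o(t))
    return outputs, h

# Example usage with arbitrary sizes (3-dim inputs, 5 hidden units, 4 output classes)
rng = np.random.default_rng(0)
xs = [rng.standard_normal(3) for _ in range(6)]        # a sequence of 6 input vectors
U, W, V = rng.standard_normal((5, 3)), rng.standard_normal((5, 5)), rng.standard_normal((4, 5))
b, c, h0 = np.zeros(5), np.zeros(4), np.zeros(5)
yhats, h_final = rnn_forward(xs, h0, U, W, V, b, c)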

10.2.2 Computing the Gradient in a Recurrent Neural Network:

Computing the gradient through a recurrent neural network is straightforward. One simply
applies the generalized back-propagation algorithm to the unrolled computational graph.

c. Gradients for Parameters

The gradients for the shared parameters W, U, V, b and c are obtained by summing over all time
steps (a sketch of the resulting expressions follows this list):

 Output weight (V)


 Recurrent weight (W)
 Input (U)
 Biases (b, c)
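For reference, a sketch of these summed-over-time gradients (following section 10.2.2 of the textbook, under the same tanh/softmax assumptions as above; diag(.) denotes a diagonal matrix and ^T a transpose):

∇c L = Σ_t ∇o(t) L
∇b L = Σ_t diag(1 - (h(t))^2) ∇h(t) L
∇V L = Σ_t (∇o(t) L) (h(t))^T
∇W L = Σ_t diag(1 - (h(t))^2) (∇h(t) L) (h(t-1))^T
∇U L = Σ_t diag(1 - (h(t))^2) (∇h(t) L) (x(t))^T

Here ∇o(t) L and ∇h(t) L are the gradients of the loss with respect to the output and hidden units at time step t, obtained by back-propagation through time on the unrolled graph.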
10.2.3 Recurrent Networks as Directed Graphical Models

One way to interpret an RNN as a graphical model is to view the RNN as defining a graphical model
whose structure is the complete graph, able to represent direct dependencies between any pair of y
values. The complete graph structure is shown in figure 10.7.

This graph interpretation of the RNN is based on ignoring the hidden units h(t) by marginalizing them
out of the model.

It is more interesting to consider the graphical model structure of RNNs that results from regarding the
hidden units h(t) as random variables. Including the hidden units in the graphical model reveals that the
RNN provides a very efficient parametrization of the joint distribution over the observations.

The edges in a graphical model indicate which variables depend directly on other variables.

10.2.4 Modeling Sequences Conditioned on Context with RNNs (Model QP)

Explain how a Recurrent Neural Network (RNN) processes data sequences (Model QP QNo 9).

Modeling Sequences Conditioned on Context with RNNs focuses on how recurrent neural networks
(RNNs) can generate or predict sequential data while taking additional contextual information into
account. This approach is critical in tasks such as conditional language modeling, video captioning, or
time-series prediction conditioned on external variables.

Figure 10.9: An RNN that maps a fixed-length vector x into a distribution over sequences Y

The first and most common approach is illustrated in figure 10.9. The interaction between the input x
and each hidden unit vector h(t) is parametrized by a newly introduced weight matrix R that was absent
from the model of only the sequence of y values.
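One illustrative way to write the resulting hidden update (a sketch, not a formula quoted from the text; in figure 10.9 the previous output y(t-1) also serves as the input at step t):

h(t) = tanh(b + R x + W h(t-1) + U y(t-1))

so the fixed-length context vector x acts like an extra, time-independent bias that influences every hidden state.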

Method of Data processing



Bidirectional RNNs:

Discuss about Bidirectional RNNs. (Model QP NO 9b)

o Bidirectional RNNs process inputs in both forward and backward directions, capturing both
past and future context for each time step. This architecture is ideal for tasks where the entire
sequence is available, such as named entity recognition and question answering.
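A minimal NumPy sketch of the idea (illustrative; the helper rnn_pass and all parameter names are hypothetical): one RNN reads the sequence left-to-right, a second reads it right-to-left, and the two hidden states for each time step are concatenated.

import numpy as np

def rnn_pass(xs, W, U, b):
    """Plain tanh-RNN pass over a list of input vectors; returns the hidden state at each step."""
    h = np.zeros(W.shape[0])
    hs = []
    for x in xs:
        h = np.tanh(b + W @ h + U @ x)
        hs.append(h)
    return hs

def bidirectional_states(xs, fwd_params, bwd_params):
    """Concatenate forward states h(t) and backward states g(t) for each time step."""
    h_fwd = rnn_pass(xs, *fwd_params)                 # processes x(1) ... x(T)
    h_bwd = rnn_pass(xs[::-1], *bwd_params)[::-1]     # processes x(T) ... x(1), then realign
    return [np.concatenate([hf, hb]) for hf, hb in zip(h_fwd, h_bwd)]

# Example usage with arbitrary sizes (3-dim inputs, 4 hidden units per direction)
rng = np.random.default_rng(1)
xs = [rng.standard_normal(3) for _ in range(5)]
make = lambda: (rng.standard_normal((4, 4)), rng.standard_normal((4, 3)), np.zeros(4))
states = bidirectional_states(xs, make(), make())     # each state has 8 = 4 + 4 components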

Q. Encoder-Decoder Architecture

The encoder-decoder architecture is a type of RNN design that processes an input sequence of
variable length and generates an output sequence of potentially different length (a minimal sketch
follows the list below). This is particularly useful in tasks like:

 Machine Translation: Translating a sentence from one language to another.


 Speech Recognition: Converting speech audio to text.
 Question Answering: Mapping a question to a textual answer.
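A minimal NumPy sketch of the encoder-decoder idea (illustrative only; the names encode and decode, and the fixed max_len stopping rule, are hypothetical simplifications): the encoder compresses the variable-length input into a context vector, and the decoder unrolls that context into an output sequence of a possibly different length.

import numpy as np

def encode(xs, W, U, b):
    """Encoder RNN: the final hidden state summarizes the whole input sequence."""
    h = np.zeros(W.shape[0])
    for x in xs:
        h = np.tanh(b + W @ h + U @ x)
    return h                                   # the context vector

def decode(context, W, V, b, c_out, max_len=10):
    """Decoder RNN: starts from the context and emits one output vector per step."""
    h, outputs = context, []
    for _ in range(max_len):                   # a real decoder would stop at an end-of-sequence token
        h = np.tanh(b + W @ h)
        outputs.append(V @ h + c_out)
    return outputs

# Example: a 7-step input sequence mapped to a 10-step output sequence
rng = np.random.default_rng(2)
xs = [rng.standard_normal(3) for _ in range(7)]
enc = (rng.standard_normal((6, 6)), rng.standard_normal((6, 3)), np.zeros(6))
dec = (rng.standard_normal((6, 6)), rng.standard_normal((4, 6)), np.zeros(6), np.zeros(4))
ys = decode(encode(xs, *enc), *dec)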

10.5 Deep Recurrent Networks

The computation in most RNNs can be decomposed into three blocks of parameters and associated
transformations:

1. from the input to the hidden state,

2. from the previous hidden state to the next hidden state, and

3. from the hidden state to the output

Each of these three blocks is associated with a single weight matrix. In other words, when the network is
unfolded, each of them corresponds to a shallow transformation.

Deep operations: it would be advantageous to introduce depth in each of these operations.

A recurrent neural network can be made deep in many ways:

(a) The hidden recurrent state can be broken down into groups organized hierarchically (fig. a).

(b) Deeper computation (e.g., an MLP) can be introduced in the input-to-hidden, hidden-to-hidden and
hidden-to-output parts. This may lengthen the shortest path linking different time steps (fig. b).

(c) The path-lengthening effect can be mitigated by introducing skip connections (fig. c).

Fig: A deep recurrent neural network
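A minimal NumPy sketch in the spirit of options (a) and (b) (illustrative; layer sizes and names are hypothetical): hidden states are stacked in layers, and each layer's state at time t is fed as input to the layer above.

import numpy as np

def deep_rnn_forward(xs, layers):
    """layers: list of (W, U, b) per layer; returns the top layer's state at each step."""
    hs = [np.zeros(W.shape[0]) for (W, U, b) in layers]   # one hidden state per layer
    top_states = []
    for x in xs:
        inp = x
        for i, (W, U, b) in enumerate(layers):
            hs[i] = np.tanh(b + W @ hs[i] + U @ inp)      # recurrence within this layer
            inp = hs[i]                                    # output of this layer feeds the next
        top_states.append(hs[-1])
    return top_states

# Example usage: 3-dim inputs, two stacked recurrent layers of 5 and 4 units
rng = np.random.default_rng(3)
xs = [rng.standard_normal(3) for _ in range(6)]
layers = [(rng.standard_normal((5, 5)), rng.standard_normal((5, 3)), np.zeros(5)),
          (rng.standard_normal((4, 4)), rng.standard_normal((4, 5)), np.zeros(4))]
tops = deep_rnn_forward(xs, layers)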

10.6 Recursive Neural Networks



Recursive neural networks represent yet another generalization of recurrent networks, with a different
kind of computational graph, which is structured as a deep tree, rather than the chain-like structure of
RNNs. The typical computational graph for a recursive network is illustrated in figure 10.14.

Figure 10.14: A recursive network has a computational graph that generalizes that of the recurrent
network from a chain to a tree.

Recursive networks have been successfully applied to processing data structures as input to neural nets,
in natural language processing and in computer vision.
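A minimal NumPy sketch of a recursive (tree-structured) network (illustrative; the nested-tuple tree encoding and all names are hypothetical): the same composition function, with shared weights, is applied at every internal node of a tree instead of at every time step of a chain.

import numpy as np

def recursive_net(node, W_left, W_right, b, embed):
    """node: either a leaf token (str) or a pair (left_subtree, right_subtree)."""
    if isinstance(node, str):
        return embed[node]                                 # leaf: look up its vector
    left, right = node
    h_left = recursive_net(left, W_left, W_right, b, embed)
    h_right = recursive_net(right, W_left, W_right, b, embed)
    # the same shared weights combine the two child representations at every node
    return np.tanh(W_left @ h_left + W_right @ h_right + b)

# Example: representation of the parse-like tree ((the, cat), (sat, down))
rng = np.random.default_rng(4)
d = 4
embed = {w: rng.standard_normal(d) for w in ["the", "cat", "sat", "down"]}
W_l, W_r, b = rng.standard_normal((d, d)), rng.standard_normal((d, d)), np.zeros(d)
root_vector = recursive_net((("the", "cat"), ("sat", "down")), W_l, W_r, b, embed)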

10.10 The Long Short-Term Memory and Other Gated RNNs (Model QP imp)

The most effective sequence models used in practical applications are called gated RNNs. These
include the long short-term memory (LSTM) and networks based on the gated recurrent unit (GRU).

Long Short-Term Memory (LSTM) networks are a type of recurrent neural network (RNN)
specifically designed to handle long-term dependencies in sequential data. They address the
vanishing gradient problem of standard RNNs, enabling the effective modeling of longer
sequences.

Figure 10.16: Block diagram of the LSTM recurrent network.

An LSTM unit consists of a memory cell (ct) and three gating mechanisms—input, forget, and
output gates—that regulate the flow of information:
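A sketch of the standard LSTM update equations (notation varies between presentations; this is one common form, with σ the logistic sigmoid and ⊙ elementwise multiplication; the textbook's own formulation uses slightly different symbols):

f(t) = σ(W_f x(t) + U_f h(t-1) + b_f)          (forget gate)
i(t) = σ(W_i x(t) + U_i h(t-1) + b_i)          (input gate)
o(t) = σ(W_o x(t) + U_o h(t-1) + b_o)          (output gate)
c~(t) = tanh(W_c x(t) + U_c h(t-1) + b_c)      (candidate cell update)
c(t) = f(t) ⊙ c(t-1) + i(t) ⊙ c~(t)            (cell state: forget old, add new)
h(t) = o(t) ⊙ tanh(c(t))                        (hidden state / output)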

In an LSTM, old information is forgotten using the forget gate, new information is added using the
input gate, and the output is computed using the output gate.

Advantages of LSTMs

1. Handles Long-Term Dependencies:


o LSTMs excel at retaining important information over long sequences because of their
gating mechanisms.
2. Prevents Vanishing/Exploding Gradients:
o The cell state ensures gradients do not diminish over time, enabling effective training.
3. Flexible Memory Management:
o The three gates allow fine-grained control over what to forget, store, and output.

Chapter 12

Q. DL for Large Scale Applications

Deep learning is based on the philosophy of connectionism. Large-scale deep learning applications
require high-performance hardware and software infrastructure. The main requirements are:

1. Fast CPU Implementations: Traditionally, neural networks were trained using the CPU of a single
machine. Today, this approach is generally considered insufficient. Instead, we use GPU
computing or the CPUs of many machines networked together.
2. GPU Implementations:
 Origins in Graphics: GPUs were originally developed for rendering 3D graphics in
video games, requiring fast, parallel computation of simple operations.
 Parallelism: GPUs excel at handling parallel operations, such as matrix
multiplications and pixel color calculations, which are foundational to both graphics and
neural networks.
 High Memory Bandwidth: GPUs can efficiently handle large data buffers, making
them ideal for neural network training, where memory bandwidth is a bottleneck on
traditional CPUs.
 Hardware Flexibility: The evolution from specialized graphics hardware to general-
purpose GPUs (GP-GPUs) enabled broader scientific and machine learning applications.
3. Large-Scale Distributed Implementations:
 In many cases, the computational resources available on a single machine are insufficient, so the
workload of training and inference is distributed across many machines. Distributing inference
is simple, because each input example we want to process can be run by a separate machine.
This is known as data parallelism.
 Another type is model parallelism, where multiple machines work together on a single
datapoint, with each machine running a different part of the model. This is feasible for both
inference and training.
4. Model Compression: The process of replacing a large, resource-intensive model with a
smaller, more efficient model that approximates the original model's performance is
called model compression. It reduces memory and runtime costs during inference. It also
helps to deploy lightweight models that deliver high performance while operating within
the constraints of limited hardware. Efficiency gains: minimized runtime and memory
usage enable real-time applications and lower energy consumption.
5. Dynamic Structure: Data processing systems can dynamically determine which subset of many
neural networks should be run on a given input. Individual neural networks can also exhibit
dynamic structure internally by determining which subset of features (hidden units) to compute
given information from the input. This form of dynamic structure inside neural networks is
sometimes called conditional computation
6. Specialized Hardware Implementations of Deep Networks:
Over the years, Application-Specific Integrated Circuits (ASICs), Field Programmable Gate
Arrays (FPGAs), and hybrid hardware solutions combining both digital and analog components
have been developed to optimize the performance of deep networks.

 The progress in the raw speed of CPUs or GPUs has slowed down due to physical
limitations. As a result, most performance improvements have come from parallelization
rather than single-core speed increases.
 Specialized hardware, like ASICs and FPGAs, can continue to advance performance
by optimizing specific computations in neural networks, pushing the envelope beyond
what general-purpose hardware can achieve.

2. Applications of DL in Computer Vision

Computer vision is a very broad field encompassing a wide variety of ways of processing images, and an
amazing diversity of applications. Applications of computer vision range from reproducing human visual
abilities, such as recognizing faces, to creating entirely new categories of visual abilities.

1. Preprocessing: Many application areas require sophisticated preprocessing because the
original input comes in a form that is difficult for many deep learning architectures to
represent. Dataset augmentation may be seen as a way of preprocessing the training set
only.
2. Contrast Normalization: One of the most obvious sources of variation that can be safely
removed for many tasks is the amount of contrast in the image. Contrast simply refers
to the magnitude of the difference between the bright and the dark pixels in an image.
 Global contrast normalization aims to prevent images from having varying amounts of
contrast by subtracting the mean from each image, then rescaling it so that the standard
deviation across its pixels is equal to some constant (a small sketch follows at the end of this list).
 Local contrast normalization ensures that the contrast is normalized across each small
window, rather than over the image as a whole.

3. Dataset Augmentation: Dataset augmentation is a technique used to artificially
expand the size of a dataset by applying various transformations or modifications
to the original data. This approach helps improve the performance and
generalization of machine learning models, especially when the dataset is small.
In specialized computer vision applications, more advanced transformations are
commonly used for dataset augmentation.
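A small NumPy sketch of global contrast normalization and of two common augmentations (illustrative only; the scale constant s, the epsilon guard and the crop size are arbitrary choices, and real pipelines usually rely on an image-processing library):

import numpy as np

def global_contrast_normalization(image, s=1.0, eps=1e-8):
    """Subtract the image's mean, then rescale so the pixel standard deviation equals s."""
    image = image.astype(float)
    image = image - image.mean()                 # remove the mean intensity
    return s * image / max(image.std(), eps)     # guard against zero-contrast images

def horizontal_flip(image):
    """Mirror the image left-to-right (a simple augmentation)."""
    return image[:, ::-1]

def random_crop(image, crop_h, crop_w, rng):
    """Cut a random crop_h x crop_w window out of the image (another augmentation)."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return image[top:top + crop_h, left:left + crop_w]

# Example usage on a random 32x32 grayscale image
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32))
img_gcn = global_contrast_normalization(img)                       # zero mean, unit std
augmented = [horizontal_flip(img)] + [random_crop(img, 28, 28, rng) for _ in range(3)]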

12.3 Speech Recognition



The task of speech recognition is to map an acoustic signal containing a spoken natural language
utterance into the corresponding sequence of words intended by the speaker. Let X = (x(1), x(2), ..., x(T))
denote the sequence of acoustic input vectors.
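The task can be stated compactly (a sketch in the spirit of the textbook's formulation, with y ranging over candidate word sequences):

f*_ASR(X) = argmax_y P(y | X)

i.e. the recognizer returns the most probable sequence of words given the acoustic input.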

In the early era of speech recognition, HMMs (Hidden Markov Models) and GMMs (Gaussian Mixture
Models) were used; in the deep learning era, CNNs, RNNs and other deep learning models are used for
this purpose.

12.4 Natural Language Processing

Natural language processing (NLP) is the use of human languages, such as English or French, by a
computer. Natural language processing includes applications such as machine translation, in which the
learner must read a sentence in one human language and emit an equivalent sentence in another
human language. Many NLP applications are based on language models that define a probability
distribution over sequences of words, characters or bytes in a natural language.

i) n-grams: An n-gram language model defines a probability distribution over sequences of
tokens in a natural language. Depending on how the model is designed, a token may be a
word, a character, or even a byte. Tokens are always discrete entities. The earliest successful
language models were based on models of fixed-length sequences of n tokens, called
n-grams. An n-gram is a sequence of n tokens (the standard n-gram factorization is sketched
below, after this list).
ii) Neural language models or NLMs are a class of language model designed to overcome the
curse of dimensionality problem for modeling natural language sequences by using a
distributed representation of words. Neural language models are able to recognize that two
words are similar without losing the ability to encode each word as distinct from the other.
iii) High-Dimensional Outputs: In many natural language applications, we often want our
models to produce words (rather than characters) as the fundamental unit of the output.
For large vocabularies, it can be very computationally expensive to represent an output
distribution over the choice of a word, because the vocabulary size is large.
iv) Combining Neural Language Models with n-grams: A major advantage of n-gram models
over neural networks is that n-gram models achieve high model capacity (by storing the
frequencies of very many tuples) while requiring very little computation to process an
example. Typical neural network layers based on matrix multiplication use an amount of
computation proportional to the number of parameters. One easy way to add capacity is
thus to combine both approaches in an ensemble consisting of a neural language model and
an n-gram language model.
v) Machine translation: the task of reading a sentence in one natural language and emitting
a sentence with the equivalent meaning in another language.

Fig: Encoder-decoder for Machine Translation
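As referenced in item (i) above, the n-gram factorization can be sketched as follows (the standard chain-rule approximation in which each token is conditioned only on the previous n - 1 tokens):

P(x(1), ..., x(τ)) = P(x(1), ..., x(n-1)) Π_{t=n..τ} P(x(t) | x(t-n+1), ..., x(t-1))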

Q. Explain the Other Applications of Deep Learning:

Other types of applications of deep learning that differ from the standard object recognition,
speech recognition and natural language processing tasks are discussed below:

i) Recommender Systems: One of the major families of applications of machine learning in the
information technology sector is the ability to make recommendations of items to potential
users or customers. Two major types of applications can be distinguished: online advertising
and item recommendations. Companies including Amazon and eBay use machine learning,
including deep learning, for their product recommendations.
ii) Knowledge Representation and Reasoning: Deep learning approaches have been very
successful in language modeling, machine translation and natural language processing due
to the use of embeddings for symbols and words. These embeddings represent semantic
knowledge about individual words and concepts.
iii) Knowledge, Relations and Question Answering: One interesting research direction is
determining how distributed representations can be trained to capture the relations
between two entities. These relations allow us to formalize facts about objects and how
objects interact with each other.
