
Assignment - 14

Ameen Aazam
EE23BTECH11006

1 Recurrent Neural Networks (RNN)


Recurrent Neural Networks (RNNs) are neural networks that process sequential data using loops in their computational graph, allowing them to maintain a hidden state z_t that changes over time. The hidden state at time step t is computed as:

z_t = f_w(z_{t-1}, x_t) = g_z(W_{z,z} z_{t-1} + W_{x,z} x_t)    (1)

Here x_t is the input vector at time t, W_{z,z} is the weight matrix for connections from the previous hidden state, and W_{x,z} is the weight matrix for connections from the current input.
However, RNNs suffer from vanishing and exploding gradients during training, which prevent them from learning dependencies over long sequences.
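As a minimal illustration of equation (1), the NumPy sketch below applies the update for a few time steps; the dimensions, random weights, and the choice of tanh for g_z are illustrative assumptions rather than part of the assignment.

import numpy as np

def rnn_step(z_prev, x_t, W_zz, W_xz):
    # One RNN update: z_t = g_z(W_zz @ z_prev + W_xz @ x_t), with g_z = tanh (assumed).
    return np.tanh(W_zz @ z_prev + W_xz @ x_t)

# Illustrative sizes: hidden state of size 4, inputs of size 3.
rng = np.random.default_rng(0)
W_zz = rng.normal(scale=0.5, size=(4, 4))   # recurrent weights (previous state -> state)
W_xz = rng.normal(scale=0.5, size=(4, 3))   # input weights (current input -> state)

z = np.zeros(4)                             # initial hidden state
for x_t in rng.normal(size=(5, 3)):         # a toy sequence of 5 input vectors
    z = rnn_step(z, x_t, W_zz, W_xz)
print(z)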

1.1 Training a Basic RNN

Unrolled across time steps, an RNN can be trained like a feedforward network. The hidden layer updates at each step based on the current input and the previous hidden state:

z_t = g_z(W_{z,z} z_{t-1} + W_{x,z} x_t)    (2)
\hat{y}_t = g_y(W_{z,y} z_t)    (3)
Training uses backpropagation through time (BPTT), where gradients are computed across all time steps. For example, the gradient of the loss with respect to W_{z,z} is:

\frac{\partial L}{\partial W_{z,z}} = \sum_{t=1}^{T} -2 (y_t - \hat{y}_t)\, g_y'(in_{y,t})\, W_{z,y} \frac{\partial z_t}{\partial W_{z,z}}    (4)

One of the biggest challenges when training RNNs is the vanishing and exploding gradient problem: gradients shrink or grow exponentially as they are propagated back through many time steps, which makes long-term dependencies difficult to learn. This limitation makes basic RNNs less suitable for tasks that require retaining information over long sequences. Advanced architectures such as LSTMs were developed to overcome these issues.
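To see where the exponential behaviour comes from, note that the gradient reaching a state T steps in the past contains a product of T step Jacobians. The NumPy sketch below is an illustrative assumption, using a linear activation so that each step Jacobian is simply W_{z,z}; it shows how the norm of this product shrinks or grows geometrically with the spectral radius of the recurrent weight matrix.

import numpy as np

rng = np.random.default_rng(0)

def backprop_norm(spectral_radius, T=50, n=8):
    # Norm of the product of T step Jacobians for a linear RNN,
    # where each step's Jacobian is simply W_zz (activation g_z = identity, assumed).
    W_zz = rng.normal(size=(n, n))
    W_zz = spectral_radius * W_zz / np.max(np.abs(np.linalg.eigvals(W_zz)))  # set spectral radius
    J = np.eye(n)
    for _ in range(T):
        J = W_zz @ J          # chain rule: W_zz applied T times
    return np.linalg.norm(J)

print(backprop_norm(0.9))   # small: gradient contributions shrink roughly like 0.9^50 (vanishing)
print(backprop_norm(1.1))   # large: gradient contributions grow roughly like 1.1^50 (exploding)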

1.2 Long short-term memory RNNs

Long short-term memory (LSTM) networks overcome these RNN limitations by introducing memory cells and gating units. An LSTM controls the flow of information with three gates: a forget gate (f_t), an input gate (i_t), and an output gate (o_t). The key equations governing an LSTM are:

f_t = \sigma(W_{x,f} x_t + W_{z,f} z_{t-1})    (5)
i_t = \sigma(W_{x,i} x_t + W_{z,i} z_{t-1})    (6)
o_t = \sigma(W_{x,o} x_t + W_{z,o} z_{t-1})    (7)
c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{x,c} x_t + W_{z,c} z_{t-1})    (8)
z_t = \tanh(c_t) \odot o_t    (9)

Because the gating mechanism prevents the gradient from decaying through the memory cell, LSTMs can capture long-term dependencies, which is useful for tasks such as speech recognition and handwriting recognition.
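The following NumPy sketch implements one step of equations (5)-(9); the dimensions and random weights are illustrative assumptions, and bias terms are omitted, as in the equations above.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, z_prev, c_prev, W):
    # One LSTM update following equations (5)-(9); W is a dict of weight matrices.
    f = sigmoid(W["xf"] @ x_t + W["zf"] @ z_prev)                     # forget gate, eq. (5)
    i = sigmoid(W["xi"] @ x_t + W["zi"] @ z_prev)                     # input gate, eq. (6)
    o = sigmoid(W["xo"] @ x_t + W["zo"] @ z_prev)                     # output gate, eq. (7)
    c = f * c_prev + i * np.tanh(W["xc"] @ x_t + W["zc"] @ z_prev)    # memory cell, eq. (8)
    z = np.tanh(c) * o                                                # hidden state, eq. (9)
    return z, c

# Illustrative sizes: input of size 3, hidden and cell states of size 4.
rng = np.random.default_rng(0)
W = {k: rng.normal(scale=0.5, size=(4, 3 if k[0] == "x" else 4))
     for k in ["xf", "zf", "xi", "zi", "xo", "zo", "xc", "zc"]}
z, c = np.zeros(4), np.zeros(4)
for x_t in rng.normal(size=(5, 3)):      # toy sequence of 5 inputs
    z, c = lstm_step(x_t, z, c, W)
print(z)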

Preprint. Under review.


2 Unsupervised Learning and Transfer Learning
Supervised learning is the dominant paradigm in deep learning systems, which means they require large amounts of labeled data. This has motivated interest in approaches that need less supervision: unsupervised, transfer, and semi-supervised learning.

2.1 Unsupervised Learning

Unsupervised learning builds generative models of data such as text and images from unlabeled examples. These algorithms aim both to learn features useful for tasks like object recognition and to estimate the probability distribution of the data. Joint models P_W(x, z) relate the observed data x to latent variables z, which capture factors of variation such as stroke thickness or color.
• Probabilistic PCA: A simple generative model is probabilistic principal components analysis (PPCA), in which a latent variable z \sim N(0, I) generates the observed data via x \sim N(Wz, \sigma^2 I). The marginal likelihood is:

P_W(x) = \int P_W(x, z)\, dz = N(x; 0, W W^\top + \sigma^2 I)    (10)

W can be learned by gradient methods or with the EM algorithm. The learned model P_W(x) can be used to generate new samples, and observations with low probability can be flagged as anomalies (a small numerical sketch of this model appears after this list).
• Autoencoders: An autoencoder consists of two parts:

Encoder f : x \to \hat{z}    (11)
Decoder g : \hat{z} \to x    (12)

Training tries to minimise the reconstruction error \sum_j \| x_j - g(f(x_j)) \|^2. A linear autoencoder with a shared weight matrix W learns the top m principal components.
In more complex models the posterior distribution is approximated by variational methods. The variational lower bound L is defined as:

L(x, Q) = \log P(x) - D_{KL}(Q(z) \,\|\, P(z \mid x))    (13)

Variational learning maximizes L as a tractable approximation to \log P(x). The decoder g(z) models \log P(x \mid z), and the encoder f(x) outputs the parameters of Q.
• Deep Autoregressive Models: An autoregressive (AR) model predicts each element of the data vector x from the other elements of that vector, without using latent variables. Classical AR models are linear-Gaussian. Replacing the linear model with a deep network gives a deep autoregressive model, as used in DeepMind's WaveNet, which generates speech from raw acoustic signals with a nonlinear AR model.
• Generative Adversarial Networks (GANs): A Generative Adversarial Network (GAN) consists of two networks:
– Generator: maps random values drawn from a latent space z (typically a unit Gaussian) to samples x that resemble the data distribution P_W(x).
– Discriminator: a classifier that distinguishes real data (from the training set) from fake data (produced by the generator).
GANs are implicit generative models: they produce samples without assigning them explicit probabilities. The two networks are trained simultaneously; the generator tries to fool the discriminator, while the discriminator tries to classify its inputs correctly. At equilibrium the generator reproduces the training distribution, and the discriminator can no longer tell real samples from generated ones.
• Unsupervised Translation: Unsupervised translation maps structured inputs x to structured outputs y without any paired (x, y) data. Traditional supervised translation relies on paired examples (e.g., human-translated sentences), but such pairs are rarely available for tasks like converting a photo of a night scene into a daytime scene.
GANs are used to solve unsupervised translation problems by training one generator to produce y given x and another to produce x given y. This framework can generate plausible outputs without paired examples, enabling translation across a wide range of domains.
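As referenced in the probabilistic PCA item above, here is a minimal NumPy/SciPy sketch that samples from the PPCA model and evaluates the marginal likelihood of equation (10); the dimensions and parameter values are illustrative assumptions.

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

# Illustrative PPCA parameters: 2-dimensional latent z, 5-dimensional observation x.
d_z, d_x, sigma = 2, 5, 0.1
W = rng.normal(size=(d_x, d_z))

# Generative process: z ~ N(0, I), then x | z ~ N(W z, sigma^2 I).
z = rng.normal(size=d_z)
x = W @ z + sigma * rng.normal(size=d_x)

# Marginal likelihood of equation (10): P_W(x) = N(x; 0, W W^T + sigma^2 I).
cov = W @ W.T + sigma**2 * np.eye(d_x)
print(multivariate_normal(mean=np.zeros(d_x), cov=cov).pdf(x))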

2.2 Transfer Learning and Multitask Learning

Transfer learning is the practical problem of using knowledge gained on one task to speed up learning on another. A common recipe is to copy the weights of a pretrained model into a new model and then fine-tune it on data from the new task. High-quality pretrained models such as ResNet-50 or RoBERTa make much faster learning possible. Transfer learning is especially important when moving from simulation to real-world scenarios. Multitask learning trains a model on several objectives at the same time, which reduces training time and can improve performance on each task.
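A minimal sketch of the weight-copying idea is shown below, with a frozen random linear-ReLU feature extractor standing in for a pretrained backbone and only a small new head being trained; the model, data, and training loop are illustrative assumptions rather than a specific library API.

import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained model: a frozen feature extractor whose weights would,
# in practice, be copied from a high-quality model such as ResNet-50.
W_pretrained = rng.normal(scale=0.3, size=(16, 8))     # maps 8-dim inputs to 16-dim features

def features(x):
    return np.maximum(0.0, x @ W_pretrained.T)         # frozen: never updated below

# New task: fit only a small linear head on top of the frozen features.
X = rng.normal(size=(100, 8))                          # toy data standing in for the new task
y = rng.normal(size=100)
F = features(X)
w_head = np.zeros(16)

for _ in range(500):                                   # plain gradient descent on squared error
    grad = F.T @ (F @ w_head - y) / len(y)
    w_head -= 0.1 * grad                               # only the new head's weights change

print(np.mean((F @ w_head - y) ** 2))                  # training error of the fine-tuned head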

3 Applications
3.1 Computer Vision

AlexNet's breakthrough in the 2012 ImageNet competition was the milestone that began deep learning's transformation of computer vision. It used convolutional neural networks (CNNs), whose operations on 2D arrays naturally match the structure of 2D images. Today the best CNNs achieve top-5 error rates below 2% on ImageNet, and they are used in applications ranging from agricultural quality assessment to self-driving cars.

3.2 Natural Language Processing (NLP)

Deep learning has greatly improved NLP tasks such as machine translation and speech recognition by allowing entire systems to be trained end to end as a single function and by using techniques such as word embeddings, making them more efficient and accurate than ever before.

3.3 Reinforcement Learning (RL)

Deep reinforcement learning combines deep learning with reinforcement learning, in which an agent learns optimal behaviour from a reward signal. Despite its successes on Atari games and Go, achieving such results consistently is difficult, and it remains a challenging research area with limited commercial applications so far.
