Assignment_14_Modern_AI

The document discusses Recurrent Neural Networks (RNNs) and their limitations, particularly the vanishing and exploding gradient problems, which hinder their ability to learn long-term dependencies. It introduces Long Short-Term Memory (LSTM) networks as a solution, highlighting their architecture and advantages in tasks requiring memory retention. Additionally, it covers unsupervised learning, transfer learning, and applications in computer vision, natural language processing, and reinforcement learning.

Assignment - 14

Ameen Aazam
EE23BTECH11006

1 Recurrent Neural Networks (RNN)

Recurrent Neural Networks (RNNs) are neural networks that process sequential data using loops in
their computational graph, which lets them maintain a hidden state $z_t$ that changes over time. The
hidden state at time step t is computed using the equation:
$$ z_t = f_w(z_{t-1}, x_t) = g_z(W_{z,z} z_{t-1} + W_{x,z} x_t) \tag{1} $$
Here $f_w$ is the update function with parameters w, $g_z$ is the activation function of the hidden
layer, and $x_t$ is the input vector at time t. $W_{z,z}$ is the weight matrix for connections from the
previous hidden state, and $W_{x,z}$ is the weight matrix for connections from the current input.
However, RNNs are difficult to train because of the vanishing and exploding gradient problems,
which prevent them from learning dependencies over long sequences.
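
The recurrence in equation (1) can be sketched in a few lines of NumPy. This is only an illustrative sketch: the layer sizes, the random weights, the zero initial state, and the choice of tanh for $g_z$ are assumptions, not values taken from the assignment.

```python
import numpy as np

def rnn_forward(xs, W_zz, W_xz, z0=None, g_z=np.tanh):
    """Apply z_t = g_z(W_zz z_{t-1} + W_xz x_t) over a sequence; return all hidden states."""
    z = np.zeros(W_zz.shape[0]) if z0 is None else z0
    states = []
    for x in xs:
        z = g_z(W_zz @ z + W_xz @ x)   # the same weights are reused at every step
        states.append(z)
    return np.stack(states)

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
n_hidden, n_in, T = 4, 3, 5
W_zz = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
W_xz = rng.normal(scale=0.5, size=(n_hidden, n_in))
xs = rng.normal(size=(T, n_in))

states = rnn_forward(xs, W_zz, W_xz)
print(states.shape)  # (5, 4): one hidden state per time step
```

Because the initial state is zero, the first hidden state reduces to $g_z(W_{x,z} x_1)$, which is a quick sanity check on the loop.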

1.1 Training a Basic RNN

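Unrolled across time steps, an RNN can be trained like a feedforward network. The hidden layer updates
at each step based on the current input and the previous hidden state:
$$ z_t = g_z(W_{z,z} z_{t-1} + W_{x,z} x_t) \tag{2} $$
$$ \hat{y}_t = g_y(W_{z,y} z_t) \tag{3} $$
Training uses backpropagation through time (BPTT), where gradients are computed across all time
steps. For example, with a squared-error loss $L = \sum_{t=1}^{T} (y_t - \hat{y}_t)^2$, the gradient for $W_{z,z}$ is:
$$ \frac{\partial L}{\partial W_{z,z}} = \sum_{t=1}^{T} -2 (y_t - \hat{y}_t)\, g_y'(in_{y,t})\, W_{z,y}\, \frac{\partial z_t}{\partial W_{z,z}} \tag{4} $$
where $in_{y,t} = W_{z,y} z_t$ is the input to the output layer at time t.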
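
As a sanity check on equation (4), the sketch below evaluates the gradient for a scalar RNN and compares it with a finite-difference estimate. The setup is an assumption made for illustration: scalar weights, tanh for $g_z$, the identity for $g_y$ (so $g_y' = 1$), $z_0 = 0$, and made-up inputs and targets. The term $\partial z_t / \partial W_{z,z}$ is accumulated with the chain rule, $\partial z_t / \partial W_{z,z} = g_z'(in_{z,t}) (z_{t-1} + W_{z,z}\, \partial z_{t-1} / \partial W_{z,z})$.

```python
def bptt_grad_w_zz(xs, ys, w_zz, w_xz, w_zy):
    """Return (loss, dL/dw_zz) for a scalar RNN with tanh hidden unit and identity output."""
    import math
    z, dz_dw = 0.0, 0.0          # z_0 and dz_0/dw_zz
    loss, grad = 0.0, 0.0
    for x, y in zip(xs, ys):
        z_prev = z
        z = math.tanh(w_zz * z_prev + w_xz * x)
        # Chain rule through time; (1 - z^2) is the derivative of tanh.
        dz_dw = (1.0 - z * z) * (z_prev + w_zz * dz_dw)
        y_hat = w_zy * z          # identity g_y, so g_y' = 1
        loss += (y - y_hat) ** 2
        grad += -2.0 * (y - y_hat) * w_zy * dz_dw   # one term of the sum in (4)
    return loss, grad

# Hypothetical data and weights, for illustration only.
xs = [0.5, -1.0, 0.3, 0.8]
ys = [0.1, -0.2, 0.0, 0.4]
w_zz, w_xz, w_zy = 0.7, 0.4, 1.2

loss, grad = bptt_grad_w_zz(xs, ys, w_zz, w_xz, w_zy)
eps = 1e-6
loss_plus, _ = bptt_grad_w_zz(xs, ys, w_zz + eps, w_xz, w_zy)
loss_minus, _ = bptt_grad_w_zz(xs, ys, w_zz - eps, w_xz, w_zy)
numeric = (loss_plus - loss_minus) / (2 * eps)
print(grad, numeric)  # the two values should agree closely
```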

One of the biggest challenges when training RNNs is the vanishing and exploding gradient problem.
Because the gradient at each time step depends on the gradients at all earlier steps, gradients can
shrink or grow exponentially with sequence length, which makes long-term dependencies difficult to
learn. Basic RNNs are therefore poorly suited to tasks that require retaining information over long
sequences. Advanced architectures, such as LSTMs, were developed to overcome this limitation.
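
The exponential behaviour can be seen in a scalar toy recurrence $z_t = g(w z_{t-1})$ with no input, where $\partial z_T / \partial z_0$ is a product of T factors $w\, g'(\cdot)$. The following is a minimal sketch; the weights 0.5 and 1.5, the 50 steps, and the starting value 0.1 are arbitrary choices for illustration.

```python
import math

def state_gradient(w_zz, steps, z0=0.1, linear=False):
    """Return dz_T/dz_0 for the scalar recurrence z_t = g(w_zz * z_{t-1})."""
    z, grad = z0, 1.0
    for _ in range(steps):
        pre = w_zz * z
        z = pre if linear else math.tanh(pre)
        local = 1.0 if linear else 1.0 - z * z   # g'(pre)
        grad *= w_zz * local                     # one factor per time step
    return grad

small = state_gradient(0.5, 50, linear=True)     # 0.5**50, about 8.9e-16: vanishes
large = state_gradient(1.5, 50, linear=True)     # 1.5**50, about 6.4e8: explodes
squashed = state_gradient(1.5, 50)               # tanh saturates, so this vanishes too
print(small, large, squashed)
```

With a linear activation the gradient is exactly $w^T$, so it vanishes for $|w| < 1$ and explodes for $|w| > 1$. With tanh the state saturates and the local derivative becomes small, so the gradient can vanish even when $w > 1$.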

1.2 Long short-term memory RNNs

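LSTM networks overcome the limitations of basic RNNs by introducing memory cells and gates. An
LSTM controls information flow with three gates: forget ($f_t$), input ($i_t$), and output ($o_t$). The
key equations governing LSTMs are:
$$ f_t = \sigma(W_{x,f} x_t + W_{z,f} z_{t-1}) \tag{5} $$
$$ i_t = \sigma(W_{x,i} x_t + W_{z,i} z_{t-1}) \tag{6} $$
$$ o_t = \sigma(W_{x,o} x_t + W_{z,o} z_{t-1}) \tag{7} $$
$$ c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{x,c} x_t + W_{z,c} z_{t-1}) \tag{8} $$
$$ z_t = \tanh(c_t) \odot o_t \tag{9} $$
Here $c_t$ is the memory cell and $\odot$ denotes elementwise multiplication. Because the memory cell is
carried forward through gated, elementwise updates, gradients decay far more slowly, so LSTMs can
capture long-term dependencies. This is useful for tasks such as speech recognition and handwriting analysis.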
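
Equations (5) to (9) translate almost line for line into code. The sketch below is one possible rendering in NumPy; the dictionary layout for the weights, the layer sizes, the random initialisation, and the zero initial state are assumptions for illustration, and biases are omitted because the equations above do not include them.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, z_prev, c_prev, W):
    """One LSTM update following equations (5)-(9); W maps keys like 'xf' to matrices."""
    f = sigmoid(W["xf"] @ x + W["zf"] @ z_prev)                    # forget gate (5)
    i = sigmoid(W["xi"] @ x + W["zi"] @ z_prev)                    # input gate (6)
    o = sigmoid(W["xo"] @ x + W["zo"] @ z_prev)                    # output gate (7)
    c = f * c_prev + i * np.tanh(W["xc"] @ x + W["zc"] @ z_prev)   # memory cell (8)
    z = np.tanh(c) * o                                             # hidden state (9)
    return z, c

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
W = {}
for gate in "fioc":
    W["x" + gate] = rng.normal(scale=0.5, size=(n_hidden, n_in))
    W["z" + gate] = rng.normal(scale=0.5, size=(n_hidden, n_hidden))

z, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x in rng.normal(size=(6, n_in)):
    z, c = lstm_step(x, z, c, W)
print(z.shape, c.shape)  # (4,) (4,)
```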

Preprint. Under review.

2 Unsupervised Learning and Transfer Learning

Supervised learning is the main paradigm used by deep learning systems, which means they need a
lot of labeled data. This has motivated interest in unsupervised, transfer, and semi-supervised
learning, which reduce the dependence on labels.

2.1 Unsupervised Learning

Unsupervised learning uses unlabeled data, such as text and images, to build generative models.
Unsupervised algorithms aim to learn features that are useful for tasks such as object recognition and
to estimate the probability distribution from which the data samples are drawn. Joint models of the
form $P_W(x, z)$ represent the observed data x together with latent variables z that are related to x,
such as stroke thickness or color.
• Probabilistic PCA: A simple generative model is probabilistic principal components analysis
(PPCA), in which the latent variable is $z \sim N(0, I)$ and the observed data x is generated
by $x \sim N(Wz, \sigma^2 I)$. The overall likelihood is given by:
$$ P_W(x) = \int P_W(x, z)\, dz = N(x; 0, W W^\top + \sigma^2 I) \tag{10} $$
The weights W can be learned with either gradient methods or the EM algorithm. Once W is
learned, $P_W(x)$ can be used to generate new samples, and observations with low probability
can be flagged as anomalies.
• Autoencoders: An autoencoder consists of:
$$ \text{Encoder } f : x \to \hat{z} \tag{11} $$
$$ \text{Decoder } g : \hat{z} \to x \tag{12} $$
It is trained to minimise the reconstruction error $\sum_j \lVert x_j - g(f(x_j)) \rVert^2$. A linear
autoencoder that uses a shared weight matrix W learns to span the top m principal components.
In more complex models, the posterior distribution $P(z \mid x)$ is approximated by variational
methods using a distribution $Q(z)$. The variational lower bound L is defined as:
$$ L(x, Q) = \log P(x) - D_{KL}(Q(z) \,\|\, P(z \mid x)) \tag{13} $$
Because the KL divergence is non-negative, L is a lower bound on $\log P(x)$, so maximizing L
in variational learning pushes it toward $\log P(x)$. The decoder $g(z)$ models $\log P(x \mid z)$,
and the encoder $f(x)$ defines the parameters of Q.
• Deep Autoregressive Models: An autoregressive (AR) model predicts each element of the data
vector x from other elements of that vector, without latent variables. Classical AR models are
linear-Gaussian. A deep autoregressive model replaces the linear-Gaussian model with a deep
network. An example is DeepMind's WaveNet, a nonlinear AR model that is trained on raw
acoustic signals and generates speech.
• Generative Adversarial Networks (GANs): A GAN consists of two networks:
– Generator: Maps random values from a latent space z to samples x from the distribution
$P_W(x)$. It normally takes a unit Gaussian as input.
– Discriminator: A classifier that distinguishes real data (from the training set) from fake
data (produced by the generator).
GANs are implicit models: they produce samples without assigning them an explicit probability.
Both networks are trained at the same time. The generator tries to fool the discriminator, while
the discriminator tries to classify its inputs correctly. In the equilibrium state, the generator
replicates the training distribution perfectly, so the discriminator can do no better than random
guessing.
• Unsupervised Translation: Unsupervised translation takes structured inputs x and produces
structured outputs y without any paired (x, y) training data. Supervised translation
traditionally relies on paired examples (e.g., human-translated sentences). Such pairs are
seldom available, however, for tasks like converting a night-scene photo to daytime.
GANs can be used to solve unsupervised translation problems: one generator is trained to
learn the distribution of y given x, and another to learn the reverse, x given y. This
framework can generate plausible outputs without paired examples, which enables translation
in a wide range of domains.
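
Returning to the probabilistic PCA model from the list above, the generative process and the marginal in equation (10) can be checked numerically. The sketch below assumes a random matrix W standing in for learned weights, hypothetical dimensions and noise level, and an arbitrary rule that flags a point as anomalous when its log-likelihood falls below the 1st percentile of the sampled data.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, sigma = 3, 2, 0.1           # observed dim, latent dim, noise std (hypothetical)
W = rng.normal(size=(d, m))       # stands in for a learned weight matrix

def sample_ppca(n):
    """Draw n samples: z ~ N(0, I), then x ~ N(W z, sigma^2 I)."""
    z = rng.normal(size=(n, m))
    return z @ W.T + sigma * rng.normal(size=(n, d))

def log_likelihood(x, cov):
    """Log density of N(x; 0, cov) for each row of x."""
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ni,ij,nj->n", x, np.linalg.inv(cov), x)
    return -0.5 * (x.shape[1] * np.log(2 * np.pi) + logdet + maha)

cov = W @ W.T + sigma**2 * np.eye(d)    # marginal covariance from equation (10)
X = sample_ppca(50000)
empirical = X.T @ X / len(X)            # should be close to cov

# Flag low-probability observations as anomalies (the threshold is an arbitrary choice).
threshold = np.percentile(log_likelihood(X, cov), 1)
outlier = np.full((1, d), 25.0)
is_anomaly = log_likelihood(outlier, cov)[0] < threshold
print(np.round(empirical - cov, 2), is_anomaly)
```

The empirical covariance of the samples approaches $W W^\top + \sigma^2 I$ as the sample count grows, which is what equation (10) predicts.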

2.2 Transfer Learning and Multitask Learning

Transfer learning is a practical approach in which knowledge from one task is used to improve or
speed up learning on another. A common method is to copy the weights of a pretrained model into a
new model and then fine-tune the new model with data from the new task. High-quality pretrained
models such as ResNet-50 or RoBERTa make faster learning possible. Transfer learning is important
both in simulations and in real-world scenarios. Multitask learning trains a single model on multiple
objectives at the same time, which can reduce training time and help the model perform better.
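
The copy-then-fine-tune recipe can be illustrated with a toy NumPy model. Everything here is a stand-in: the "pretrained" layer is just a random matrix, the new task data is synthetic, and only a new output layer is trained while the copied layer stays frozen, which is one common variant of fine-tuning rather than the only one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "pretrained" feature layer; in practice it would come from a trained model.
pretrained_W = rng.normal(size=(8, 5))

def features(X, W):
    return np.tanh(X @ W.T)

# Small synthetic dataset for the new task (illustration only).
X_new = rng.normal(size=(40, 5))
y_new = rng.normal(size=40)

W_copy = pretrained_W.copy()      # copy the pretrained weights into the new model
head = np.zeros(8)                # new task-specific output layer

def mse(h):
    return np.mean((features(X_new, W_copy) @ h - y_new) ** 2)

loss_before = mse(head)
for _ in range(200):
    H = features(X_new, W_copy)
    grad = 2 * H.T @ (H @ head - y_new) / len(y_new)
    head -= 0.05 * grad           # train only the head; W_copy stays frozen
loss_after = mse(head)
print(loss_before, loss_after)    # the loss on the new task should go down
```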

3 Applications

3.1 Computer Vision

AlexNet's breakthrough in the 2012 ImageNet competition was a core milestone in the deep learning
revolution in computer vision. It employed convolutional neural networks (CNNs), whose 2D array
structure is a natural fit for operations on 2D images. Today the top-5 error rate of CNNs on
ImageNet is below 2%, and CNNs are used in applications ranging from agricultural quality assessment
to self-driving cars.

3.2 Natural Language Processing (NLP)

Deep learning has greatly improved NLP tasks such as machine translation and speech recognition.
It allows a system to be trained end to end as a single function, which makes such systems more
efficient and accurate than they were before, and it relies on techniques like word embeddings.

3.3 Reinforcement Learning (RL)

Deep reinforcement learning combines deep learning and reinforcement learning to find optimal
behaviour for an agent through a reward signal. Despite its successes in Atari games and Go,
obtaining good performance consistently is difficult, and it remains a challenging research area
with limited commercial applications.
