Lecture 6 Smaller Network: RNN: One X at a Time, Re-Use the Same Edge Weights
This is our fully connected network. If the inputs are x1, ..., xn and n is very large and growing,
this network becomes too large. Instead, we will feed in one xi at a time
and re-use the same edge weights.
Recurrent Neural Network
How does RNN reduce complexity?
(Diagram: the RNN unrolled in time, h0 → f → h1 → f → h2 → f → h3 → …, taking input xt and producing output yt at each step.)
No matter how long the input/output sequence is, we only need
one function f. If the f's were different, it would become a
feedforward NN. This may be treated as another form of compression
relative to the fully connected network.
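To make the weight re-use concrete, here is a minimal sketch (not from the lecture; the tanh cell and the weight shapes are illustrative assumptions) of a single function f, with one shared weight set, applied at every time step:

```python
import numpy as np

def init_params(input_dim, hidden_dim, seed=0):
    rng = np.random.default_rng(seed)
    return {
        "Wi": rng.normal(scale=0.1, size=(hidden_dim, input_dim)),   # input weights
        "Wh": rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)),  # recurrent weights
        "b":  np.zeros(hidden_dim),
    }

def f(params, h, x):
    # One step: the same weights are re-used at every time step.
    return np.tanh(params["Wh"] @ h + params["Wi"] @ x + params["b"])

def run_rnn(params, xs, hidden_dim):
    h = np.zeros(hidden_dim)          # h0
    hs = []
    for x in xs:                      # feed one x_t at a time
        h = f(params, h, x)           # same f, same weights, every step
        hs.append(h)
    return hs

# Usage: a sequence of 5 inputs of dimension 3, hidden size 4.
params = init_params(input_dim=3, hidden_dim=4)
xs = [np.ones(3) for _ in range(5)]
hs = run_rnn(params, xs, hidden_dim=4)
```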
Deep RNN: h’, y = f1(h, x); g’, z = f2(g, y)
(Diagram: a deep RNN unrolled in time. The first layer h0 → f1 → h1 → f1 → h2 → f1 → h3 → … reads x1, x2, x3, … and produces y1, y2, y3, …, which feed the second layer g0 → f2 → g1 → f2 → g2 → f2 → g3 → … producing z1, z2, z3, ….)
Bidirectional RNN: h’, y = f1(h, x); g’, z = f2(g, x); p = f3(y, z)
(Diagram: a forward chain h0 → f1 → h1 → f1 → h2 → f1 → h3 over x1, x2, x3 produces y1, y2, y3; a backward chain g0 → f2 → g1 → f2 → g2 → f2 → g3 reads the same inputs in the reverse direction and produces z1, z2, z3; f3 combines yt and zt into pt at each position.)
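A minimal sketch of the bidirectional idea, assuming simple tanh cells and a linear f3 (all names, shapes, and the choice of combiner are illustrative, not from the lecture):

```python
import numpy as np

def step(W, h, x):
    # One recurrent step: new hidden state from the previous state and the input.
    return np.tanh(W @ np.concatenate([h, x]))

def bidirectional_rnn(xs, W_fwd, W_bwd, W_out, hidden_dim):
    h = np.zeros(hidden_dim)
    ys = []
    for x in xs:                        # forward chain (f1): y_t
        h = step(W_fwd, h, x)
        ys.append(h)
    g = np.zeros(hidden_dim)
    zs = []
    for x in reversed(xs):              # backward chain (f2): z_t
        g = step(W_bwd, g, x)
        zs.append(g)
    zs.reverse()
    # p_t = f3(y_t, z_t): here simply a linear map of the concatenation.
    return [W_out @ np.concatenate([y, z]) for y, z in zip(ys, zs)]

# Usage: input dim 3, hidden dim 4, output dim 2.
rng = np.random.default_rng(0)
W_fwd = rng.normal(scale=0.1, size=(4, 7))   # acts on [h; x]
W_bwd = rng.normal(scale=0.1, size=(4, 7))
W_out = rng.normal(scale=0.1, size=(2, 8))   # acts on [y; z]
xs = [np.ones(3) for _ in range(5)]
ps = bidirectional_rnn(xs, W_fwd, W_bwd, W_out, hidden_dim=4)
```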
Pyramid RNN: reducing the number of time steps significantly speeds up training.
(Diagram: stacked bidirectional RNN layers forming a pyramid.)
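One common way to reduce the number of time steps, sketched below under the assumption that consecutive hidden states are concatenated between layers (the slide does not spell out the mechanism):

```python
import numpy as np

def halve_time_steps(hs):
    """hs: list of hidden-state vectors from one layer; returns a list half as long."""
    if len(hs) % 2 == 1:                 # pad with zeros if the length is odd
        hs = hs + [np.zeros_like(hs[0])]
    # Concatenate each pair of consecutive states into one input for the next layer.
    return [np.concatenate([hs[i], hs[i + 1]]) for i in range(0, len(hs), 2)]

# Usage: 8 hidden states of dimension 4 become 4 inputs of dimension 8.
hs = [np.ones(4) for _ in range(8)]
next_layer_inputs = halve_time_steps(hs)
```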
A single RNN cell f maps (h, x) to (h’, y):
h’ = σ(Wh h + Wi x)
y = softmax(Wo h’)
Note: y is computed from h’.
(Diagram: the LSTM cell; the cell state Ct-1 → Ct flows along the top, controlled by the forget gate and the input gate.)
The core idea is this cell state Ct: it is changed slowly, with only minor linear interactions, so it is very easy for information to flow along it unchanged.
Why sigmoid or tanh? Sigmoid outputs in (0, 1) and acts as a gating switch. The vanishing gradient problem is handled already in LSTM, so is it ok for ReLU to replace tanh?
The input gate decides which components are to be updated; C’t provides the change contents.
(Diagram: Naïve RNN vs LSTM. The naïve RNN maps (ht-1, xt) to (ht, yt); the LSTM additionally carries a cell state, mapping (ct-1, ht-1, xt) to (ct, ht, yt).)
The four gate signals are computed from ht-1 and xt:
zf = σ(Wf [ht-1, xt])    controls the forget gate
zi = σ(Wi [ht-1, xt])    controls the input gate
z = tanh(W [ht-1, xt])    the updating information
zo = σ(Wo [ht-1, xt])    controls the output gate
"Peephole" variant: ct-1 is appended to the gate inputs as well, with diagonal weight matrices; zo, zf, zi are obtained in the same way.
Information flow of LSTM (⊙ denotes element-wise multiplication):
ct = zf ⊙ ct-1 + zi ⊙ z
ht = zo ⊙ tanh(ct)
yt = σ(W’ ht)
(Diagram: the same LSTM cell unrolled over consecutive time steps t and t+1, with ct and ht passed from one step to the next.)
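A minimal sketch of one LSTM step following the equations above; the concatenated [ht-1, xt] gate input, the weight shapes, and the parameter names are assumptions for illustration:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(params, c_prev, h_prev, x):
    hx = np.concatenate([h_prev, x])      # all gates read [h_{t-1}, x_t]
    zf = sigmoid(params["Wf"] @ hx)       # forget gate
    zi = sigmoid(params["Wi"] @ hx)       # input gate
    zo = sigmoid(params["Wo"] @ hx)       # output gate
    z = np.tanh(params["W"] @ hx)         # updating information
    c = zf * c_prev + zi * z              # c_t = zf ⊙ c_{t-1} + zi ⊙ z
    h = zo * np.tanh(c)                   # h_t = zo ⊙ tanh(c_t)
    y = sigmoid(params["Wy"] @ h)         # y_t = σ(W' h_t)
    return c, h, y

# Usage: cell/hidden dim 4, input dim 3, output dim 2.
rng = np.random.default_rng(0)
params = {name: rng.normal(scale=0.1, size=(4, 7)) for name in ("Wf", "Wi", "Wo", "W")}
params["Wy"] = rng.normal(scale=0.1, size=(2, 4))
c, h = np.zeros(4), np.zeros(4)
for x in [np.ones(3)] * 5:
    c, h, y = lstm_step(params, c, h, x)
```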
Feedforward network: x → f1 → a1 → f2 → a2 → f3 → a3 → f4 → y, where t is the layer index:
at = ft(at-1) = σ(Wt at-1 + bt)
RNN: h0 → f → h1 → f → h2 → f → h3 → f, then g produces y4, over inputs x1, x2, x3, x4, where t is the time step.
(Diagram: a gated recurrent cell with reset gate r, update gate z, and candidate h’, computed from at-1 and xt.)
Turning this recurrent cell into a feedforward layer:
No input xt at each step.
No output yt at each step.
at-1 is the output of the (t-1)-th layer; at is the output of the t-th layer.
No reset gate.
Highway Network:
h’ = σ(W at-1)
z = σ(W’ at-1)
at = z ⊙ at-1 + (1 - z) ⊙ h’
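A minimal sketch of a highway layer following the update rule above (W and W’ are assumed square so that at-1 and h’ have the same dimension):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def highway_layer(a_prev, W, W_gate):
    h = sigmoid(W @ a_prev)              # h' = σ(W a_{t-1})
    z = sigmoid(W_gate @ a_prev)         # z  = σ(W' a_{t-1})
    return z * a_prev + (1.0 - z) * h    # a_t = z ⊙ a_{t-1} + (1 - z) ⊙ h'

# Usage: stacking several highway layers of width 4.
rng = np.random.default_rng(0)
a = np.ones(4)
for _ in range(3):
    W = rng.normal(scale=0.1, size=(4, 4))
    W_gate = rng.normal(scale=0.1, size=(4, 4))
    a = highway_layer(a, W, W_gate)
```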
Grid LSTM
(Diagram: a standard LSTM maps (c, h) to (c’, h’) given input x, moving along the time dimension; a Grid LSTM maps both (c, h) to (c’, h’) and (a, b) to (a’, b’), adding memory along the depth dimension, with the same gate structure zf, zi, z, zo inside.)
You can generalize this to 3D, and more.
Applications of LSTM / RNN
Neural machine translation with LSTM
Sequence to sequence chat model
Chat with context
M: Hi
M: Hello
U: Hi
M: Hi
Serban, Iulian V., Alessandro Sordoni, Yoshua Bengio, Aaron Courville, and Joelle Pineau. 2015. "Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models."
Baidu’s speech recognition using RNN
Attention
Image caption generation using attention
(From CY Lee lecture)
z0 is an initial parameter; it is also learned.
(Diagram: a CNN produces a vector for each image region; z0 is matched against each region vector, giving a score such as 0.7; the normalized scores, e.g. 0.0, 0.8, 0.2, 0.0, 0.0, 0.0, are used to form a weighted sum of the region vectors.)
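A minimal sketch of this attention step, assuming a dot-product match function and softmax normalization (both are assumptions; the slide only shows per-region scores and a weighted sum):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def attend(z, regions):
    """z: query vector of shape (dim,); regions: region vectors of shape (num_regions, dim)."""
    scores = regions @ z               # one match score per region (dot product assumed)
    alpha = softmax(scores)            # normalized weights, e.g. [0.0, 0.8, 0.2, ...]
    context = alpha @ regions          # weighted sum of the region vectors
    return context, alpha

# Usage: 6 region vectors of dimension 4 from a CNN, query z0.
regions = np.random.default_rng(0).normal(size=(6, 4))
z0 = np.zeros(4)
context, alpha = attend(z0, regions)
```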