
Deep Learning Book, by Ian Goodfellow, Yoshua Bengio and Aaron Courville

This document summarizes key aspects of deep feedforward neural networks discussed in Chapter 6 of the Deep Learning book. It first compares linear classifiers, shallow neural networks with one hidden layer, and deep neural networks, noting that deep networks can represent functions using relatively small hidden layers compared to shallow networks. It then discusses hyperparameters like depth, hidden layer sizes, and activation functions that define a network architecture. Finally, it discusses training parameters like weights and biases that must be optimized to learn functions from data.


Deep Learning book, by Ian Goodfellow, Yoshua Bengio and Aaron Courville


Chapter 6: Deep Feedforward Networks

Benoit Massé, Dionyssos Kounades-Bastian



Linear regression (and classification)

Input vector x
Output vector y
Parameters: weight W and bias b

Prediction: y = Wᵀx + b

[Diagram: the inputs x1, x2, x3 are connected to the outputs y1, y2 through the linear map u = Wᵀx + b, with weights W11, ..., W23 and biases b1, b2.]
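As a concrete illustration (not part of the slides), here is a minimal NumPy sketch of the prediction y = Wᵀx + b with three inputs and two outputs; the particular values are arbitrary.

```python
import numpy as np

# Linear prediction y = W^T x + b, with 3 inputs and 2 outputs.
W = np.array([[ 0.2, -0.1],
              [ 0.5,  0.3],
              [-0.4,  0.7]])   # shape (3, 2): one column per output
b = np.array([0.1, -0.2])      # one bias per output

x = np.array([1.0, 2.0, 3.0])
y = W.T @ x + b                # shape (2,)
print(y)
```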
Linear regression (and classification)

Advantages
Easy to use
Easy to train, low risk of overfitting

Drawbacks
Some problems are inherently non-linear


Solving XOR

Linear regressor

[Diagram: inputs x1, x2 connected to the output y through the linear map (W, b).]

There is no value of W and b such that, for all (x1, x2) ∈ {0, 1}²,

    Wᵀ(x1, x2)ᵀ + b = xor(x1, x2)


Solving XOR

What about...?

[Diagram: inputs x1, x2 connected to hidden units u1, u2 through (W, b), then to the output y through (V, c).]

Strictly equivalent: the composition of two linear operations is still a linear operation.


Solving XOR

And about...?

[Diagram: inputs x1, x2 connected to units u1, u2 through (W, b), then h1 = φ(u1), h2 = φ(u2), then to the output y through (V, c).]

in which φ(x) = max{0, x}

It is possible! With

    W = [[1, 1], [1, 1]],  b = [0, −1]ᵀ,  V = [1, −2],  c = 0,

we get

    Vφ(Wx + b) = xor(x1, x2)
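A minimal NumPy sketch (not from the slides) checking that these particular weights indeed compute XOR with a ReLU hidden layer:

```python
import numpy as np

W = np.array([[1, 1],
              [1, 1]])       # hidden-layer weights
b = np.array([0, -1])        # hidden-layer biases
V = np.array([1, -2])        # output weights
c = 0                        # output bias

def relu(u):
    return np.maximum(0, u)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    x = np.array([x1, x2])
    y = V @ relu(W @ x + b) + c
    print(f"xor({x1}, {x2}) = {y}")   # prints 0, 1, 1, 0
```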
Neural network with one hidden layer

Compact representation
[Diagram: x → h through (W, b, φ), then h → y through (V, c).]

Neural network
Hidden layer with non-linearity
→ can represent a broader class of functions


Universal approximation theorem

Theorem
A neural network with one hidden layer can approximate any continuous function.

More formally, given a continuous function f : Cn → Rᵐ, where Cn is a compact subset of Rⁿ,

    ∀ε > 0, ∃ f_NN^ε : x ↦ Σ_{i=1}^{K} v_i φ(w_iᵀx + b_i) + c

such that

    ∀x ∈ Cn, ‖f(x) − f_NN^ε(x)‖ < ε
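A minimal NumPy sketch (not from the slides) illustrating this form: one hidden ReLU layer approximating sin on [−π, π]. To keep it short, the hidden weights are drawn at random and only the output weights (v, c) are fitted by least squares; the unit count K = 200 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 200                                    # number of hidden units (arbitrary)
x = np.linspace(-np.pi, np.pi, 400)
f = np.sin(x)                              # target continuous function on a compact set

w = rng.normal(size=K)                     # random hidden weights w_i
b = rng.uniform(-np.pi, np.pi, size=K)     # random hidden biases b_i
H = np.maximum(0.0, np.outer(x, w) + b)    # hidden activations phi(w_i * x + b_i), shape (400, K)

# Fit only the output weights v and bias c by least squares.
A = np.hstack([H, np.ones((x.size, 1))])
coef, *_ = np.linalg.lstsq(A, f, rcond=None)
v, c = coef[:-1], coef[-1]

f_nn = H @ v + c
print("max |f - f_nn| on the grid:", np.max(np.abs(f - f_nn)))
```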




Problems

Obtaining the network
The universal approximation theorem gives no information about HOW to obtain such a network:
the size of the hidden layer h,
the values of W and b.

Using the network
Even if we find a way to obtain the network, the size of the hidden layer may be prohibitively large.


Deep neural network

Why deep?
Let's stack l hidden layers one after the other; l is called the depth of the network.

[Diagram: x → h1 through (W1, b1, φ), → ... → hl through (Wl, bl, φ), then → y through (V, c).]

Properties of DNNs
The universal approximation theorem also applies.
Some functions can be approximated by a DNN with N hidden units, but would require O(e^N) hidden units to be represented by a shallow network.
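A minimal NumPy sketch (not from the slides) of the forward pass of such a deep feedforward network; the layer sizes and random weights below are arbitrary.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, hidden_params, V, c, phi=relu):
    """hidden_params is a list [(W1, b1), ..., (Wl, bl)]; returns y = V @ h_l + c."""
    h = x
    for W, b in hidden_params:
        h = phi(W @ h + b)       # one hidden layer: affine map followed by phi
    return V @ h + c             # linear output layer

# Tiny example: 3 inputs, two hidden layers of sizes 4 and 5, 2 outputs.
rng = np.random.default_rng(0)
sizes = [3, 4, 5]
params = [(rng.normal(size=(m, n)), rng.normal(size=m))
          for n, m in zip(sizes[:-1], sizes[1:])]
V, c = rng.normal(size=(2, sizes[-1])), rng.normal(size=2)
print(forward(rng.normal(size=3), params, V, c))
```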




Summary

Comparison

Linear classifier
− Limited representational power
+ Simple

Shallow neural network (exactly one hidden layer)
+ Unlimited representational power
− Sometimes prohibitively wide

Deep neural network
+ Unlimited representational power
+ Relatively small number of hidden units needed

Remaining problem
How to get this DNN?




On the path to getting my own DNN

Hyperparameters
First, we need to define the architecture of the DNN:
the depth l,
the sizes of the hidden layers n1, ..., nl,
the activation function φ,
the output unit.

Parameters
Once the architecture is defined, we need to train the DNN:
W1, b1, ..., Wl, bl






Hyperparameters

The depth l and the sizes of the hidden layers n1, ..., nl
strongly depend on the problem to solve.

The activation function φ
ReLU: g : x ↦ max{0, x}
Sigmoid: σ : x ↦ 1 / (1 + e⁻ˣ)
Many others: tanh, RBF, softplus...

The output unit
Linear output E[y] = Vᵀhl + c,
for regression with a Gaussian distribution y ∼ N(E[y], I).
Sigmoid output ŷ = σ(wᵀhl + b),
for classification with a Bernoulli distribution P(y = 1 | x) = ŷ.
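A minimal NumPy sketch (not from the slides) of the activation functions and output units listed above, written as plain functions for illustration:

```python
import numpy as np

def relu(x):               # g(x) = max{0, x}
    return np.maximum(0.0, x)

def sigmoid(x):            # sigma(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

# Output units applied to the last hidden layer h_l:
def linear_output(V, h_l, c):      # E[y] = V^T h_l + c   (regression)
    return V @ h_l + c

def sigmoid_output(w, h_l, b):     # P(y = 1 | x) = sigmoid(w^T h_l + b)   (binary classification)
    return sigmoid(w @ h_l + b)
```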


Parameters Training

Objective
Let's define θ = (W1, b1, ..., Wl, bl).
We suppose we have a set of inputs X = (x1, ..., xN) and a set of expected outputs Y = (y1, ..., yN). The goal is to find a neural network fNN such that

    ∀i, fNN(xi, θ) ≈ yi.


Parameters Training

Cost function
To evaluate the error that our current network makes, let's define a cost function L(X, Y, θ). The goal becomes to find

    argmin_θ L(X, Y, θ)

Loss function
Should represent a combination of the distances between every yi and the corresponding fNN(xi, θ):
mean squared error (rare),
cross-entropy.
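A minimal NumPy sketch (not from the slides) of the two loss functions mentioned above, for a batch of predictions y_hat and targets y; the eps clipping is an added safeguard against log(0):

```python
import numpy as np

def mean_squared_error(y_hat, y):
    return np.mean((y_hat - y) ** 2)

def binary_cross_entropy(y_hat, y, eps=1e-12):
    # y in {0, 1}, y_hat = predicted P(y = 1 | x)
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
```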


Parameters Training

Find the minimum
The basic idea consists in computing θ̂ such that ∇θ L(X, Y, θ̂) = 0.
This is difficult to solve analytically, e.g. when θ has millions of degrees of freedom.


Parameters Training

Gradient descent
Let's use a numerical way to optimize θ, called gradient descent (section 4.3). The idea is that

    f(θ − εu) ≈ f(θ) − ε uᵀ∇f(θ).

So if we take u = ∇f(θ), we have uᵀu > 0 and then

    f(θ − εu) ≈ f(θ) − ε uᵀu < f(θ).

If f is a function to minimize, we have an update rule that improves our estimate.




Parameters Training

Gradient descent algorithm
1. Have an estimate θ̂ of the parameters.
2. Compute ∇θ L(X, Y, θ̂).
3. Update θ̂ ← θ̂ − ε ∇θ L.
4. Repeat steps 2-3 until ‖∇θ L‖ < threshold.

Problem
How to efficiently estimate ∇θ L(X, Y, θ̂)?
→ Back-propagation algorithm
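A minimal sketch (not from the slides) of this loop; grad_L stands in for whatever routine returns the gradient of the loss at θ, and the learning rate, threshold and iteration cap are arbitrary choices:

```python
import numpy as np

def gradient_descent(theta, grad_L, lr=1e-2, threshold=1e-6, max_iter=100_000):
    """Iterate theta <- theta - lr * gradient until the gradient norm is small."""
    for _ in range(max_iter):
        g = grad_L(theta)                   # step 2: compute the gradient
        theta = theta - lr * g              # step 3: update the estimate
        if np.linalg.norm(g) < threshold:   # step 4: stop when the gradient is small
            break
    return theta

# Toy usage: minimize L(theta) = ||theta||^2, whose gradient is 2 * theta.
print(gradient_descent(np.array([3.0, -2.0]), lambda t: 2.0 * t))
```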


Back-propagation for Parameter Learning

Consider the architecture:

[Diagram: a scalar input x passes through weight w1 and activation φ, then weight w2 and activation φ, producing the output y.]

with function

    y = φ(w2 φ(w1 x)),

some training pairs T = {(x̂n, ŷn)}, n = 1, ..., N, and an activation function φ(·).

Learn w1, w2 so that feeding x̂n yields ŷn.


Prerequisite: differentiable activation function

For learning to be possible, φ(·) has to be differentiable.
Let φ′(x) = ∂φ(x)/∂x denote the derivative of φ(x).
For example, when φ(x) = ReLU(x), we have φ′(x) = 0 for x < 0 and φ′(x) = 1 for x > 0.


Gradient-based Learning

Minimize the loss function L(w1, w2, T).

We will learn the weights by iterating:

    (w1, w2)_updated = (w1, w2) − γ (∂L/∂w1, ∂L/∂w2),        (1)

L is the loss function (must be differentiable): in detail it is L(w1, w2, T), and we want to compute the gradient(s) at w1, w2.

γ is the learning rate (a scalar, typically known).


Back-propagation

Calculate intermediate values on all units:
1. a = w1 x̂n
2. b = φ(w1 x̂n) = φ(a)
3. c = w2 φ(w1 x̂n) = w2 b
4. d = φ(w2 φ(w1 x̂n)) = φ(c)
5. L(d) = L(φ(w2 φ(w1 x̂n)))

The partial derivatives are:
6. ∂L(d)/∂d = L′(d)
7. ∂d/∂c = φ′(w2 φ(w1 x̂n))
8. ∂c/∂b = w2
9. ∂b/∂a = φ′(w1 x̂n)




Calculating the Gradients I

Apply the chain rule:

    ∂L/∂w1 = (∂L/∂d) (∂d/∂c) (∂c/∂b) (∂b/∂a) (∂a/∂w1),

    ∂L(d)/∂w2 = (∂L(d)/∂d) (∂d/∂c) (∂c/∂w2).

Start the calculation from left to right.

We propagate the gradients (partial products) from the last layer towards the input.


Calculating the Gradients

And because we have N training pairs:

    ∂L/∂w1 = Σ_{n=1}^{N} (∂L(dn)/∂dn) (∂dn/∂cn) (∂cn/∂bn) (∂cn/∂bn is read as the nth pair's ∂c/∂b) (∂bn/∂an) (∂an/∂w1),

    ∂L/∂w2 = Σ_{n=1}^{N} (∂L(dn)/∂dn) (∂dn/∂cn) (∂cn/∂w2).
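A minimal NumPy sketch (not from the slides) of the forward and backward passes for this two-weight scalar network, accumulating the gradients over the N training pairs. A squared loss L(d) = (d − ŷn)² and the toy data at the bottom are assumptions added for illustration:

```python
import numpy as np

def phi(z):                         # ReLU activation
    return np.maximum(0.0, z)

def dphi(z):                        # its derivative: 0 for z < 0, 1 for z > 0
    return np.where(z > 0, 1.0, 0.0)

def gradients(w1, w2, xs, ys):
    """Accumulate dL/dw1 and dL/dw2 over the N training pairs."""
    g1 = g2 = 0.0
    for x_n, y_n in zip(xs, ys):
        # Forward pass: intermediate values on all units.
        a = w1 * x_n
        b = phi(a)
        c = w2 * b
        d = phi(c)
        # Backward pass: chain rule, from the last layer towards the input.
        dL_dd = 2.0 * (d - y_n)     # L'(d) for the assumed squared loss
        dd_dc = dphi(c)
        dc_db = w2
        db_da = dphi(a)
        da_dw1 = x_n
        dc_dw2 = b
        g1 += dL_dd * dd_dc * dc_db * db_da * da_dw1
        g2 += dL_dd * dd_dc * dc_dw2
    return g1, g2

# Toy usage with made-up data.
xs, ys = np.array([0.5, 1.0, -0.5]), np.array([0.2, 0.4, 0.0])
print(gradients(0.3, 0.7, xs, ys))
```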


Thank you!
