Chapter 4 - Multilayer Perceptron
Multilayer Perceptrons
[Figure: a multilayer perceptron with weight matrices $\mathbf{W}^{(1)}, \mathbf{W}^{(2)}, \mathbf{W}^{(3)}$ connecting the input units, the hidden units, and the output units. The labels indicate the input and output of each unit, the weight matrix and bias vector of the $l$-th layer, the $j$-th unit, the vectors of input and output, and the activation function.]
$w_{ij}^{(l)}$ is the weight of the connection between the $i$-th unit of the $(l-1)$-th layer and the $j$-th unit of the $l$-th layer.
$b_i^{(l)}$ is the bias of the $i$-th unit of the $l$-th layer.
$\mathbf{W}$ is the matrix of weights; $\mathbf{b}$ is the vector of biases.
This MLP has 4 inputs, 3 outputs, and its hidden layer contains 5 hidden units.
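As a quick illustration, the parameter shapes of this example network can be written down directly (a minimal NumPy sketch; the convention $\mathbf{z}^{(l)} = \mathbf{W}^{(l)}\mathbf{a}^{(l-1)} + \mathbf{b}^{(l)}$ used in the forward-propagation example later in this chapter is assumed):

```python
import numpy as np

# Example MLP: 4 inputs, one hidden layer with 5 units, 3 outputs.
# Assumed convention: z^(l) = W^(l) a^(l-1) + b^(l), so W^(l) has shape
# (units in layer l, units in layer l-1).
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((5, 4)), np.zeros((5, 1))   # input  -> hidden
W2, b2 = rng.standard_normal((3, 5)), np.zeros((3, 1))   # hidden -> output

n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)   # 43 trainable parameters in total
```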
Activation function
The output of the $i$-th unit of the $l$-th layer is calculated by:
$$a_i^{(l)} = f\big(\mathbf{w}_i^{(l)}\,\mathbf{a}^{(l-1)} + b_i^{(l)}\big)$$
where $f(\cdot)$ is the nonlinear activation function.
In vector form, the output of all units of the $l$-th layer is calculated by:
$$\mathbf{a}^{(l)} = f\big(\mathbf{W}^{(l)}\,\mathbf{a}^{(l-1)} + \mathbf{b}^{(l)}\big)$$
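A minimal NumPy sketch of this per-layer computation (the sigmoid activation and the toy numbers are only assumptions for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def layer_forward(W, b, a_prev, f=sigmoid):
    """Compute a^(l) = f(W^(l) a^(l-1) + b^(l)) for one layer."""
    z = W @ a_prev + b        # pre-activation z^(l)
    return f(z)               # output a^(l)

# Toy layer: 3 inputs, 2 units
a_prev = np.array([[0.5], [-1.0], [2.0]])
W = np.array([[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6]])
b = np.array([[0.0], [0.1]])
print(layer_forward(W, b, a_prev))
```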
Sign function
Note that the sign function should not be used in an MLP: its derivative is zero almost everywhere, so no gradient can flow through it during training.
Sigmoid function
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
Sigmoids saturate and kill gradients: a very undesirable property of the sigmoid neuron is that when its activation saturates at either tail of 0 or 1, the gradient in these regions is almost zero.
Tanh function
$$\tanh(x) = \frac{1 - e^{-2x}}{1 + e^{-2x}} = 2\,\sigma(2x) - 1$$
Like the sigmoid neuron, its activations saturate, but unlike the sigmoid neuron its output is zero-centered.
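A short sketch of both activations and their derivatives, which makes the saturation problem easy to see numerically (the sample points are arbitrary):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # sigma'(x) = sigma(x) (1 - sigma(x))

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2  # tanh'(x) = 1 - tanh(x)^2

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(sigmoid(x))        # saturates towards 0 and 1 at the tails
print(sigmoid_grad(x))   # nearly zero at both tails: gradients are "killed"
print(np.tanh(x))        # saturates towards -1 and 1, but is zero-centered
print(tanh_grad(x))      # also nearly zero at the tails
```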
Chain rule
We will say $w$ is a dependent variable, $u$ and $v$ are independent variables, and $x$ and $y$ are intermediate variables.
Since $w$ is a function of $x$ and $y$, it has partial derivatives $\frac{\partial w}{\partial x}$ and $\frac{\partial w}{\partial y}$, and the chain rule gives:
$$\frac{\partial w}{\partial u} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial u}$$
$$\frac{\partial w}{\partial v} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial v}$$
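For instance (an illustrative example, not from the lecture), take $w = xy$ with intermediate variables $x = u + v$ and $y = u - v$, so that $w = u^2 - v^2$:
$$\frac{\partial w}{\partial u} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial u} = y \cdot 1 + x \cdot 1 = 2u, \qquad \frac{\partial w}{\partial v} = y \cdot 1 + x \cdot (-1) = -2v$$
which matches differentiating $w = u^2 - v^2$ directly.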
Loss function
$$J(\mathbf{W}, \mathbf{b}, \mathbf{x}, \mathbf{y}) = \frac{1}{N}\sum_{n=1}^{N}\left\lVert \mathbf{y}_n - \hat{\mathbf{y}}_n \right\rVert^2$$
It is difficult to take the gradient of the loss function with respect to the weight matrices directly.
Backpropagation
Backpropagation computes the gradients $\frac{\partial J}{\partial \mathbf{W}^{(1)}}, \frac{\partial J}{\partial \mathbf{W}^{(2)}}, \ldots, \frac{\partial J}{\partial \mathbf{W}^{(L)}}$ so that gradient descent can be applied.
Consider first the output layer $L$. By the chain rule:
$$\frac{\partial L}{\partial w_{ij}^{(L)}} = \frac{\partial L}{\partial z_j^{(L)}} \frac{\partial z_j^{(L)}}{\partial w_{ij}^{(L)}}$$
Recall $z_j^{(L)} = \sum_i w_{ij}^{(L)} a_i^{(L-1)} + b_j^{(L)}$; then it can be deduced that:
$$\frac{\partial z_j^{(L)}}{\partial w_{ij}^{(L)}} = a_i^{(L-1)}$$
Similarly, for the bias, since $\partial z_j^{(L)} / \partial b_j^{(L)} = 1$:
$$\frac{\partial L}{\partial b_j^{(L)}} = \frac{\partial L}{\partial z_j^{(L)}} \frac{\partial z_j^{(L)}}{\partial b_j^{(L)}} = \frac{\partial L}{\partial z_j^{(L)}} = e_j^{(L)}$$
where $e_j^{(L)}$ denotes $\partial L / \partial z_j^{(L)}$, as defined formally below.
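These output-layer gradient formulas can be checked numerically with finite differences; the tiny one-unit output layer, sigmoid activation, and squared-error loss below are assumptions made only for this sketch:

```python
import numpy as np

# One output unit with two inputs a^(L-1); loss L = (a^(L) - y)^2.
a_prev = np.array([0.3, -0.7])
w, b, y = np.array([0.5, -0.2]), 0.1, 1.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss(w, b):
    return (sigmoid(w @ a_prev + b) - y) ** 2

# Analytic gradients from the formulas above: dL/dw_i = e * a_i^(L-1), dL/db = e.
a = sigmoid(w @ a_prev + b)
e = 2.0 * (a - y) * a * (1.0 - a)     # e = dL/dz = dL/da * da/dz
grad_w, grad_b = e * a_prev, e

# Finite-difference check
eps = 1e-6
num_w = np.array([(loss(w + eps * np.eye(2)[i], b) - loss(w - eps * np.eye(2)[i], b)) / (2 * eps)
                  for i in range(2)])
num_b = (loss(w, b + eps) - loss(w, b - eps)) / (2 * eps)
print(grad_w, num_w)   # the two should agree to several decimal places
print(grad_b, num_b)
```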
For the $l$-th layer, the gradient at the $j$-th unit is calculated as below:
$$\frac{\partial L}{\partial w_{ij}^{(l)}} = \frac{\partial L}{\partial z_j^{(l)}} \frac{\partial z_j^{(l)}}{\partial w_{ij}^{(l)}}$$
Recall that $z_j^{(l)} = \sum_i w_{ij}^{(l)} a_i^{(l-1)} + b_j^{(l)}$; then it can be deduced that $\frac{\partial z_j^{(l)}}{\partial w_{ij}^{(l)}} = a_i^{(l-1)}$.
Define $e_j^{(l)} \triangleq \frac{\partial L}{\partial z_j^{(l)}}$; the gradient of $L$ with respect to the weight is then:
$$\frac{\partial L}{\partial w_{ij}^{(l)}} = e_j^{(l)}\, a_i^{(l-1)}$$
Now consider the term $\frac{\partial L}{\partial z_j^{(l)}}$; we can use the chain rule:
$$\frac{\partial L}{\partial z_j^{(l)}} = \frac{\partial L}{\partial a_j^{(l)}} \frac{\partial a_j^{(l)}}{\partial z_j^{(l)}}$$
where $\frac{\partial a_j^{(l)}}{\partial z_j^{(l)}} = f'\big(z_j^{(l)}\big)$, with $f$ the activation function.
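To make the connection to the vector form used in the summary below explicit, the remaining factor $\frac{\partial L}{\partial a_j^{(l)}}$ can be expanded over the units of layer $l+1$ that $a_j^{(l)}$ feeds into (a short step not spelled out on the slides, using the same indexing convention):
$$\frac{\partial L}{\partial a_j^{(l)}} = \sum_k \frac{\partial L}{\partial z_k^{(l+1)}} \frac{\partial z_k^{(l+1)}}{\partial a_j^{(l)}} = \sum_k e_k^{(l+1)} w_{jk}^{(l+1)}, \qquad \text{so} \qquad e_j^{(l)} = \Big(\sum_k w_{jk}^{(l+1)} e_k^{(l+1)}\Big) f'\big(z_j^{(l)}\big)$$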
Then we can rewrite as follows:
$$\frac{\partial L}{\partial b_j^{(l)}} = e_j^{(l)}$$
Summary (unit by unit)
1. Run forward propagation for each input and store the activation values $\mathbf{a}^{(l)}$ of every layer.
2. At each unit of the output layer, calculate $e_j^{(L)} = \frac{\partial L}{\partial z_j^{(L)}}$.
3. The gradients of the output layer are determined as $\frac{\partial L}{\partial \mathbf{W}^{(L)}} = \mathbf{a}^{(L-1)}\,\mathbf{e}^{(L)}$ and $\frac{\partial L}{\partial \mathbf{b}^{(L)}} = \mathbf{e}^{(L)}$.
4. At the $l$-th layer, calculate $\mathbf{e}^{(l)} = \big(\mathbf{W}^{(l+1)}\,\mathbf{e}^{(l+1)}\big) * f'\big(\mathbf{z}^{(l)}\big)$, where $*$ is the element-wise product.
5. The gradients of the $l$-th layer are determined as $\frac{\partial L}{\partial \mathbf{W}^{(l)}} = \mathbf{a}^{(l-1)}\,\mathbf{e}^{(l)}$ and $\frac{\partial L}{\partial \mathbf{b}^{(l)}} = \mathbf{e}^{(l)}$.
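A compact NumPy sketch of this procedure for a single training sample (a sketch only: sigmoid activations, a squared-error loss, and column-vector shapes are assumptions; with the convention $\mathbf{z}^{(l)} = \mathbf{W}^{(l)}\mathbf{a}^{(l-1)} + \mathbf{b}^{(l)}$, the matrix forms of the formulas above pick up transposes):

```python
import numpy as np

def f(z):                     # activation (sigmoid as an example)
    return 1.0 / (1.0 + np.exp(-z))

def f_prime(z):
    s = f(z)
    return s * (1.0 - s)

def forward(x, Ws, bs):
    """Step 1: forward propagation, storing every z^(l) and a^(l)."""
    a, zs, activations = x, [], [x]
    for W, b in zip(Ws, bs):
        z = W @ a + b
        a = f(z)
        zs.append(z)
        activations.append(a)
    return zs, activations

def backward(y, Ws, zs, activations):
    """Steps 2-5: propagate the error e^(l) backwards and collect gradients."""
    grads_W, grads_b = [None] * len(Ws), [None] * len(Ws)
    # Output layer: e^(L) = dL/dz^(L) for the squared-error loss L = ||a^(L) - y||^2
    e = 2.0 * (activations[-1] - y) * f_prime(zs[-1])
    for l in reversed(range(len(Ws))):
        grads_W[l] = e @ activations[l].T            # dL/dW^(l) = e^(l) (a^(l-1))^T
        grads_b[l] = e                               # dL/db^(l) = e^(l)
        if l > 0:
            e = (Ws[l].T @ e) * f_prime(zs[l - 1])   # e^(l) from e^(l+1)
    return grads_W, grads_b
```

Here `x` and `y` are column vectors and `Ws`, `bs` are lists of per-layer weight matrices and bias vectors; calling `forward` and then `backward` for each sample and averaging the gradients gives exactly the quantities needed for a gradient-descent step.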
Example: a two-layer MLP with a ReLU hidden layer and a softmax output layer.
Forward propagation
$$\mathbf{Z}^{(1)} = \mathbf{W}^{(1)} \mathbf{X} + \mathbf{b}^{(1)}$$
$$\mathbf{A}^{(1)} = \max\big(\mathbf{Z}^{(1)}, 0\big)$$
$$\mathbf{Z}^{(2)} = \mathbf{W}^{(2)} \mathbf{A}^{(1)} + \mathbf{b}^{(2)}$$
$$\hat{\mathbf{Y}} = \mathbf{A}^{(2)} = \mathrm{softmax}\big(\mathbf{Z}^{(2)}\big)$$
Loss function
$$J(\mathbf{W}, \mathbf{b}; \mathbf{X}, \mathbf{Y}) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{C} y_{ji}\,\log\big(\hat{y}_{ji}\big)$$
Backpropagation
$$\mathbf{e}^{(1)} = \big(\mathbf{W}^{(2)}\,\mathbf{e}^{(2)}\big) * f'\big(\mathbf{Z}^{(1)}\big), \qquad \frac{\partial J}{\partial \mathbf{W}^{(1)}} = \mathbf{a}^{(0)}\,\mathbf{e}^{(1)} = \mathbf{X}\,\mathbf{e}^{(1)}$$
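A minimal NumPy sketch of this two-layer example, with the columns of $\mathbf{X}$ and $\mathbf{Y}$ as individual samples (an assumption; the transposes in the code come from the $\mathbf{Z} = \mathbf{W}\mathbf{X} + \mathbf{b}$ convention, and the fact that $\mathbf{e}^{(2)} = (\hat{\mathbf{Y}} - \mathbf{Y})/N$ for softmax combined with cross-entropy is a standard result the slides do not derive):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=0, keepdims=True)      # subtract max for numerical stability
    ez = np.exp(z)
    return ez / ez.sum(axis=0, keepdims=True)

def forward(X, W1, b1, W2, b2):
    Z1 = W1 @ X + b1              # Z^(1) = W^(1) X + b^(1)
    A1 = relu(Z1)                 # A^(1) = max(Z^(1), 0)
    Z2 = W2 @ A1 + b2             # Z^(2) = W^(2) A^(1) + b^(2)
    Yhat = softmax(Z2)            # Y_hat = A^(2) = softmax(Z^(2))
    return Z1, A1, Yhat

def cross_entropy(Y, Yhat):
    N = Y.shape[1]                # columns are samples, rows are classes
    return -np.sum(Y * np.log(Yhat + 1e-12)) / N

def backward(X, Y, Z1, A1, Yhat, W2):
    N = Y.shape[1]
    E2 = (Yhat - Y) / N                        # e^(2) for softmax + cross-entropy
    dW2 = E2 @ A1.T                            # dJ/dW^(2)
    db2 = E2.sum(axis=1, keepdims=True)        # dJ/db^(2)
    E1 = (W2.T @ E2) * (Z1 > 0)                # e^(1) = (W^(2)^T e^(2)) * f'(Z^(1))
    dW1 = E1 @ X.T                             # dJ/dW^(1)
    db1 = E1.sum(axis=1, keepdims=True)
    return dW1, db1, dW2, db2
```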