Class 4: Backpropagation

The document discusses backpropagation, a method for training neural networks by adjusting weights based on the error rate from previous iterations. It covers the structure of artificial neurons, activation functions, and the application of gradient descent to minimize loss during training. Additionally, it explains the process of forward and backward passes in both single and multi-layer perceptrons.

Uploaded by

lina.lopez.garcia


Learning by Backpropagation

Deisy Chaves
Office 10, 4th Floor, Building B13
[email protected]
Contents

• Learning by backpropagation

• Perceptron (one neuron)
• Multilayer perceptron (Multi-Layer Perceptron)

• Accelerating learning

Artificial neuron
• Input features, $x_j$ ($j = 1, \dots, m$)
• Weights, $w_j$
• Bias, $b$
• Weighted sum (linear function): $z = w_1 x_1 + w_2 x_2 + \dots + w_m x_m + b$
• Activation function, $\sigma$: $\hat{y} = \sigma(w_1 x_1 + w_2 x_2 + \dots + w_m x_m + b)$
[Figure: artificial neuron. The inputs $x_1, \dots, x_m$ are multiplied by the weights $\omega_1, \dots, \omega_m$, added to the bias $b$ (weighted sum), and passed through the activation $\sigma$ to give the output $\hat{y}$.]
Activation function
• Sign (step) function: $\sigma(z) = 1$ if $z \geq 0$, and $\sigma(z) = 0$ if $z < 0$
• Sigmoid function: $\sigma(z) = \frac{1}{1 + e^{-z}}$
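The neuron described above can be sketched in a few lines of Python. This is only an illustration: the function names (`step`, `sigmoid`, `neuron`) and the input values are assumptions chosen for the example, not part of the slides.

```python
import math

def step(z):
    """Threshold activation from the slide: 1 if z >= 0, else 0."""
    return 1.0 if z >= 0 else 0.0

def sigmoid(z):
    """Sigmoid activation: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b, activation=sigmoid):
    """Weighted sum of the inputs plus the bias, passed through the activation."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return activation(z)

# Hypothetical inputs and parameters, chosen only for illustration.
x = [1.0, 2.0]
w = [0.5, -0.25]
b = 0.0

print(neuron(x, w, b))        # weighted sum is 0.0, so sigmoid gives 0.5
print(neuron(x, w, b, step))  # step(0.0) gives 1.0
```

The activation is passed as a parameter so that the same weighted sum can be reused with either function.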
Backpropagation
• Introduced in the 1970s
• Method for training a neural network (perceptron, MLP, …): the weights are fine-tuned according to the error obtained in the previous iteration or epoch
Backpropagation
• The error (loss function) of the model output compared to the true values is calculated to evaluate the model
• Gradient descent is used during the backward pass to update the model parameters (weights and biases) and reduce the loss
• The cost function is the average of the individual losses across the entire training dataset (it aggregates the error over all samples)

https://machinelearningknowledge.ai/wp-content/uploads/2019/10/Backpropagation.gif
Backpropagation: Gradient descent
• The gradient of a function at a specific point is the direction of the steepest ascent
• To find the minimum, we have to move in the opposite direction (the negative of the gradient), which is the direction of the steepest descent
[Figure: loss function (L) vs. weight (w)]

https://datamapu.com/posts/deep_learning/intro_dl/
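As a minimal sketch of moving against the gradient, the loop below minimises a one-dimensional loss. The quadratic loss $L(w) = (w - 2)^2$, the starting point, and the step count are assumptions made for this illustration; they do not come from the slides.

```python
def loss(w):
    """Hypothetical loss with its minimum at w = 2."""
    return (w - 2.0) ** 2

def grad(w):
    """Derivative of the loss: points towards the steepest ascent."""
    return 2.0 * (w - 2.0)

w = 0.0    # arbitrary starting weight
lr = 0.1   # learning rate (step size)

for _ in range(100):
    w -= lr * grad(w)  # step in the direction opposite to the gradient

print(round(w, 4), loss(w) < 1e-12)
```

Each step shrinks the distance to the minimum by a constant factor here, which is why the loop ends very close to w = 2.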
Backpropagation: example
• Training data: the dataset has a single one-dimensional sample with one input and one output, x = 0.5, and label y = 1.5
• Activation function: sigmoid function, $\sigma(z) = \frac{1}{1 + e^{-z}}$
• Learning rate = 0.1
• Loss or cost function: sum of squared errors, $L(y, \hat{y}) = \frac{1}{2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
• The training set only has one sample, n = 1, so $L(y, \hat{y}) = \frac{1}{2} (y - \hat{y})^2$
Backpropagation: one neuron
• Training sample (x = 0.5, y = 1.5), learning rate = 0.1
• Input feature, x
• Weight, w (initial w = 0.3)
• Bias, b (initial b = 0.1)
• Weighted sum (linear function): $z = w \cdot x + b$
• Activation function: $a = \sigma(z) = \frac{1}{1 + e^{-z}}$
• Output: $\hat{y} = a = \sigma(z) = \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{-(wx + b)}}$
[Figure: artificial neuron with input $x_1$, weight $\omega_1$, bias $b$, weighted sum $z$, activation $a = \sigma(z)$, and output $\hat{y}$]
Backpropagation: one neuron

• Forward pass
$\hat{y} = a = \sigma(z) = \sigma(w \cdot x + b) = \frac{1}{1 + e^{-(wx + b)}} = 0.56$
$L(y, \hat{y}) = \frac{1}{2} (y - \hat{y})^2 = \frac{1}{2} (1.5 - 0.56)^2 = 0.44$

Training sample (x = 0.5, y = 1.5)
Weight, w (initial w = 0.3)
Bias, b (initial b = 0.1)
Learning rate = 0.1
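The forward pass for the single neuron can be checked numerically. The sketch below uses the values stated on the slides (x = 0.5, y = 1.5, w = 0.3, b = 0.1); the variable names are chosen for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, y = 0.5, 1.5   # training sample
w, b = 0.3, 0.1   # initial weight and bias

z = w * x + b                    # weighted sum
y_hat = sigmoid(z)               # neuron output
loss = 0.5 * (y - y_hat) ** 2    # squared-error loss for one sample

print(round(z, 2), round(y_hat, 2), round(loss, 2))  # 0.25 0.56 0.44
```

The rounded values match the 0.56 and 0.44 reported for the forward pass.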
Backpropagation: one neuron

• Backward pass (gradient descent)
• Derivative rule

Training sample (x = 0.5, y = 1.5)
Weight, w (initial w = 0.3)
Bias, b (initial b = 0.1)
Learning rate = 0.1
Forward pass: $\hat{y} = 0.56$
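The equations of the backward pass are not legible in this copy of the slides, so the following is a reconstruction under the definitions given earlier, not the author's exact derivation. With $L = \frac{1}{2}(y - \hat{y})^2$, $\hat{y} = \sigma(z)$ and $z = wx + b$, the chain rule gives $\frac{\partial L}{\partial w} = -(y - \hat{y}) \cdot \hat{y}(1 - \hat{y}) \cdot x$ and $\frac{\partial L}{\partial b} = -(y - \hat{y}) \cdot \hat{y}(1 - \hat{y})$, and each parameter is moved against its gradient, scaled by the learning rate (named `lr` here as an assumption).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, y = 0.5, 1.5   # training sample from the slides
w, b = 0.3, 0.1   # initial parameters from the slides
lr = 0.1          # learning rate from the slides

# Forward pass
z = w * x + b
y_hat = sigmoid(z)

# Backward pass: chain rule, one factor per step of the forward pass
dL_dyhat = -(y - y_hat)            # derivative of 0.5 * (y - y_hat)^2
dyhat_dz = y_hat * (1.0 - y_hat)   # derivative of the sigmoid
dL_dz = dL_dyhat * dyhat_dz
dL_dw = dL_dz * x                  # dz/dw = x
dL_db = dL_dz                      # dz/db = 1

# Gradient descent update
w_new = w - lr * dL_dw
b_new = b - lr * dL_db

print(round(dL_dw, 4), round(dL_db, 4))   # -0.1154 -0.2308
print(round(w_new, 4), round(b_new, 4))   # 0.3115 0.1231
```

Both gradients are negative because the prediction (0.56) is below the target (1.5), so the update increases w and b.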
Multilayer Perceptron (MLP)
• A neural network that contains more than one computational layer
• The additional intermediate layers (between input and output) are called hidden layers because the computations they perform are not visible to the user

[Figure: fully-connected neural network]

Backpropagation: two neurons

• Training sample (x = 0.5, y = 1.5), learning rate = 0.1

• Input feature, x
• Weights, w (initial $w^{(1)} = 0.3$, $w^{(2)} = 0.2$)
• Biases, b (initial $b^{(1)} = 0.1$, $b^{(2)} = 0.4$)
• Weighted sum (linear function): $z = w \cdot x + b$
• Activation function: $a = \sigma(z) = \frac{1}{1 + e^{-z}}$
• Output: $\hat{y} = a = \sigma(z) = \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{-(wx + b)}}$
Backpropagation: two neurons

• Forward pass
$\hat{y} = a^{(3)} = \sigma(z^{(3)}) = 0.625$, with $\sigma(z) = \frac{1}{1 + e^{-z}}$
$L(y, \hat{y}) = \frac{1}{2} (y - \hat{y})^2 = \frac{1}{2} (1.5 - 0.625)^2 = 0.38$

Training sample (x = 0.5, y = 1.5)
Weights, w (initial $w^{(1)} = 0.3$, $w^{(2)} = 0.2$)
Biases, b (initial $b^{(1)} = 0.1$, $b^{(2)} = 0.4$)
Learning rate = 0.1
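The forward pass through the two chained neurons can be reproduced as follows. The intermediate steps are not legible in this copy of the slides, so the sketch assumes that the first neuron receives x and the second neuron receives the first neuron's activation; the names `a1`, `z1`, `z2` are chosen for this example and do not follow the slides' superscript numbering.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, y = 0.5, 1.5
w1, b1 = 0.3, 0.1   # first neuron (initial values from the slides)
w2, b2 = 0.2, 0.4   # second neuron (initial values from the slides)

z1 = w1 * x + b1       # weighted sum of the first neuron
a1 = sigmoid(z1)       # its activation feeds the second neuron
z2 = w2 * a1 + b2      # weighted sum of the second neuron
y_hat = sigmoid(z2)    # network output

loss = 0.5 * (y - y_hat) ** 2

print(round(y_hat, 3), round(loss, 2))  # 0.625 0.38
```

Under that assumption the result agrees with the values 0.625 and 0.38 stated on the slides.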
Backpropagation: two neurons

• Backward pass
• We need to update the model parameters
• The calculations for $w^{(2)}$ and $b^{(2)}$ are similar to the ones in the first example (derivative rule)
• Calculations for $w^{(1)}$ and $b^{(1)}$ (individual derivatives)
• Update $w^{(1)}$, $b^{(1)}$, $w^{(2)}$ and $b^{(2)}$

Training sample (x = 0.5, y = 1.5)
Weights, w (initial $w^{(1)} = 0.3$, $w^{(2)} = 0.2$)
Biases, b (initial $b^{(1)} = 0.1$, $b^{(2)} = 0.4$)
Learning rate = 0.1
Forward pass: $\hat{y} = 0.625$
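A possible implementation of this backward pass is sketched below. The derivative formulas are not legible in this copy of the slides, so the code is a reconstruction from the chain rule, assuming the second neuron takes the first neuron's activation as input. The helper names (`forward`, `backward`, `delta1`, `delta2`) are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, b1, w2, b2):
    a1 = sigmoid(w1 * x + b1)       # first neuron
    y_hat = sigmoid(w2 * a1 + b2)   # second neuron (network output)
    return a1, y_hat

def loss(x, y, w1, b1, w2, b2):
    return 0.5 * (y - forward(x, w1, b1, w2, b2)[1]) ** 2

def backward(x, y, w1, b1, w2, b2):
    a1, y_hat = forward(x, w1, b1, w2, b2)
    # Second neuron: same derivation as in the one-neuron example,
    # with a1 playing the role of the input.
    delta2 = -(y - y_hat) * y_hat * (1.0 - y_hat)
    # First neuron: the error is propagated back through w2 and the sigmoid.
    delta1 = delta2 * w2 * a1 * (1.0 - a1)
    return {"w1": delta1 * x, "b1": delta1, "w2": delta2 * a1, "b2": delta2}

x, y, lr = 0.5, 1.5, 0.1
params = {"w1": 0.3, "b1": 0.1, "w2": 0.2, "b2": 0.4}

grads = backward(x, y, **params)
updated = {name: value - lr * grads[name] for name, value in params.items()}

print({name: round(value, 4) for name, value in updated.items()})
print(round(loss(x, y, **params), 4), "->", round(loss(x, y, **updated), 4))
```

The gradients of the first neuron are much smaller than those of the second because they are multiplied by an extra $w^{(2)}$ and an extra sigmoid derivative on the way back.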

Backpropagation: Two Neurons in a layer

• Training sample (x = 0.5, y = 1.5)
• Learning rate = 0.1
• Input feature, x
• Weights, w (initial $w^{(1)} = 0.3$, $w^{(2)} = 0.2$, …)
• Biases, b (initial $b^{(1)} = [b_1^{(1)}, b_2^{(1)}]$, $b^{(2)} = [b_1^{(2)}, b_2^{(2)}]$)
• Weighted sum (linear function): $z = w \cdot x + b$
• Activation function: $a = \sigma(z) = \frac{1}{1 + e^{-z}}$
• Output: $\hat{y} = a = \sigma(z) = \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{-(wx + b)}}$
Backpropagation: Two Neurons in a layer

• Forward pass: in this case, consider the sum of the two neurons in the layer
• Backward pass: in this case, it is necessary to update the parameters of each neuron in the layer, including the parameters of the first layer
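The formulas for this case are not legible in this copy of the slides, so the sketch below shows only one plausible reading: a hidden layer with two sigmoid neurons whose activations are combined in a weighted sum by a single sigmoid output neuron. The architecture, the dictionary layout, and all initial parameter values are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, p):
    """x -> two hidden sigmoid neurons -> one sigmoid output neuron."""
    a = [sigmoid(p["w1"][j] * x + p["b1"][j]) for j in range(2)]
    z_out = sum(p["w2"][j] * a[j] for j in range(2)) + p["b2"]  # sum over both neurons
    return a, sigmoid(z_out)

def loss(x, y, p):
    return 0.5 * (y - forward(x, p)[1]) ** 2

def gradients(x, y, p):
    a, y_hat = forward(x, p)
    delta_out = -(y - y_hat) * y_hat * (1.0 - y_hat)
    # Each hidden neuron receives the output error through its own weight.
    delta_hid = [delta_out * p["w2"][j] * a[j] * (1.0 - a[j]) for j in range(2)]
    return {
        "w2": [delta_out * a[j] for j in range(2)],
        "b2": delta_out,
        "w1": [delta_hid[j] * x for j in range(2)],
        "b1": delta_hid,
    }

x, y, lr = 0.5, 1.5, 0.1
# Hypothetical initial values (the slides do not list all of them).
params = {"w1": [0.3, -0.2], "b1": [0.1, 0.2], "w2": [0.2, 0.5], "b2": 0.4}

g = gradients(x, y, params)
before = loss(x, y, params)
params = {
    "w1": [w - lr * d for w, d in zip(params["w1"], g["w1"])],
    "b1": [b - lr * d for b, d in zip(params["b1"], g["b1"])],
    "w2": [w - lr * d for w, d in zip(params["w2"], g["w2"])],
    "b2": params["b2"] - lr * g["b2"],
}
after = loss(x, y, params)
print(round(before, 4), "->", round(after, 4))
```

Compared with the two-neuron chain, the only structural change is that the output neuron sums over both hidden activations, so every hidden neuron gets its own gradient.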
Example: implementation of backpropagation
Example adapted from:
• https://github.com/CodigoMaquina/code/blob/main/machine_learning_python/backpropagation_paso_a_paso.ipynb

• https://www.youtube.com/watch?v=iOsR-EC9z6I
Backpropagation: limitations & challenges
• Data quality: poor data quality, including noise, incompleteness, or bias, can lead to inaccurate models, as backpropagation learns exactly what it is given
• Training duration: backpropagation often requires extensive training time, which can be impractical when dealing with large networks
• Matrix-based complexity: the matrix operations in backpropagation scale with the network size, which increases the computational demand
Accelerating learning: Gradient Descent
• Gradient descent updates the parameters by moving against the gradient of the loss function.
• All the training data is considered for a single step: the average of the gradients of all the training examples (the mean gradient) is used to update the parameters. One epoch therefore corresponds to exactly one gradient descent step.
• Use: small-scale models with well-behaved loss functions.
Accelerating learning: Stochastic Gradient Descent (SGD)
• SGD updates the model parameters based on a single data point, offering faster convergence but more variance in the updates.
• Parameters are updated after processing each data point.
• Use: large datasets, where noisy updates are acceptable.
Accelerating learning: Mini-Batch SGD
• Mini-batch gradient descent strikes a balance by updating the parameters using a small batch of data points at a time.
• This speeds up training compared to full gradient descent while reducing the variance compared to pure SGD.
• Use: large datasets where full gradient descent is too slow.

[Figure: SGD vs. Mini-Batch SGD]
Accelerating learning
[Figure: MSE vs. epochs for GD, SGD, and Mini-Batch SGD]
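The three variants differ only in how many samples contribute to each update, which a single training loop with a `batch_size` parameter can illustrate. The model (y = w·x), the data, and the hyperparameters below are assumptions chosen to keep the sketch small.

```python
import random

def train(data, batch_size, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w * x by minimising the mean squared error.

    batch_size = len(data) -> (batch) gradient descent, one step per epoch
    batch_size = 1         -> stochastic gradient descent
    anything in between    -> mini-batch SGD
    """
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        samples = data[:]
        rng.shuffle(samples)
        for start in range(0, len(samples), batch_size):
            batch = samples[start:start + batch_size]
            # Mean gradient of (w*x - y)^2 over the current batch.
            grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
    return w

# Hypothetical noise-free data generated from y = 3x.
data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]

for name, bs in (("GD", len(data)), ("SGD", 1), ("Mini-batch SGD", 2)):
    print(name, round(train(data, bs), 3))
```

On this tiny noise-free dataset all three settings recover w close to 3; the differences in variance and cost per step only become visible on larger, noisier data.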

Accelerating learning: SGD + Momentum
• Momentum is a method to accelerate learning with SGD
• It introduces a new variable v (velocity): the direction and speed at which the parameters move as learning progresses
• The velocity accumulates the previous gradients

• What is the role of the momentum coefficient?

• If it is larger than the learning rate, the current update is more affected by the previous gradients
• It is usually set to a high value, between 0.8 and 0.9
Accelerating learning: SGD + Momentum
• SGD with Momentum adds a momentum term (v) to the gradient to smooth the
update direction, helping the optimizer avoid local minima and speed up convergence.
It’s useful when the loss function has steep or shallow regions.
• Use: Tasks requiring fast convergence and stabilization of noisy gradients.
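The update equations are not legible in this copy of the slides; a commonly used form is $v \leftarrow \mu v - \eta \nabla L(w)$ followed by $w \leftarrow w + v$, where the symbols $\mu$ (momentum coefficient) and $\eta$ (learning rate) are assumptions of this sketch. The quadratic loss is also hypothetical.

```python
def grad(w):
    """Gradient of the hypothetical loss L(w) = (w - 2)^2."""
    return 2.0 * (w - 2.0)

def plain_gd(w, lr=0.01, steps=100):
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def sgd_momentum(w, lr=0.01, momentum=0.9, steps=100):
    v = 0.0  # velocity: running accumulation of past gradients
    for _ in range(steps):
        v = momentum * v - lr * grad(w)  # decay the old velocity, add the new gradient step
        w = w + v                        # move along the velocity
    return w

print(abs(plain_gd(0.0) - 2.0))      # distance to the minimum without momentum
print(abs(sgd_momentum(0.0) - 2.0))  # distance to the minimum with momentum
```

With the same small learning rate and number of steps, the momentum variant ends closer to the minimum because consecutive gradients point the same way and add up in the velocity.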
Accelerating learning: SGD + Momentum
[Figure: SGD vs. SGD + Momentum]
Accelerating learning: SGD + Better Momentum (Nesterov)
• Nesterov proposed another approach to compute momentum
• First take a step in the direction of the accumulated gradient
• Then calculate the gradient and make a correction
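The two steps listed above can be sketched as follows, assuming the common formulation in which the gradient is evaluated at the look-ahead point $w + \mu v$. The loss, the hyperparameter values, and the function names are hypothetical.

```python
def grad(w):
    """Gradient of the hypothetical loss L(w) = (w - 2)^2."""
    return 2.0 * (w - 2.0)

def nesterov(w, lr=0.01, momentum=0.9, steps=100):
    v = 0.0
    for _ in range(steps):
        lookahead = w + momentum * v                # 1) step along the accumulated velocity
        v = momentum * v - lr * grad(lookahead)     # 2) correct using the gradient found there
        w = w + v
    return w

print(abs(nesterov(0.0) - 2.0))  # distance to the minimum after 100 steps
```

The only difference from classical momentum is where the gradient is evaluated: at the look-ahead point rather than at the current parameters.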

[Figure: SGD + Momentum vs. SGD + Nesterov]

Readings
Martín del Brío, B. & Sanz Molina, A. Redes neuronales y sistemas borrosos,
• Chapter 2.5: El perceptrón multicapa (the multilayer perceptron)
Credits

Slides adapted from the post “Backpropagation Step by Step”, available at

https://datamapu.com/posts/deep_learning/backpropagation/

Slides adapted from Lecture 6, “Optimization for Deep Neural Networks”, CMSC 35246: Deep Learning
