
I, Deep Learning

Feedforward Neural Networks in Depth, Part 1: Forward and Backward Propagations
Dec 10, 2021

This post is the first of a three-part series in which we set out to derive the mathematics behind
feedforward neural networks. They have

an input and an output layer with at least one hidden layer in between,
fully-connected layers, which means that each node in one layer connects to every node in
the following layer, and
ways to introduce nonlinearity by means of activation functions.

We start with forward propagation, which involves computing predictions and the associated cost
of these predictions.

Forward Propagation
Settling on what notations to use is tricky since we only have so many letters in the Roman
alphabet. As you browse the Internet, you will likely find derivations that have used different
notations than the ones we are about to introduce. However, and fortunately, there is no right or
wrong here; it is just a matter of taste. In particular, the notations used in this series take
inspiration from Andrew Ng’s Standard notations for Deep Learning. If you make a comparison,
you will find that we only change a couple of the details.

Now, whatever we come up with, we have to support

multiple layers,
several nodes in each layer,
various activation functions,
various types of cost functions, and
mini-batches of training examples.

As a result, our definition of a node ends up introducing a fairly large number of notations:

$$
a_{j,i}^{[l]} = g^{[l]}\!\left(z_{j,i}^{[l]}\right),
\qquad
z_{j,i}^{[l]} = \sum_{k=1}^{n^{[l-1]}} w_{j,k}^{[l]} \, a_{k,i}^{[l-1]} + b_{j}^{[l]}
$$


Does the node definition look intimidating to you at first glance? Do not worry. Hopefully, it will
make more sense once we have explained the notations, which we shall do next:

Entity Description

$l$: The current layer, $1 \le l \le L$, where $L$ is the number of layers that have weights and biases. We use $l = 0$ and $l = L$ to denote the input and output layers.

$n^{[l]}$: The number of nodes in the current layer.

$n^{[l-1]}$: The number of nodes in the previous layer.

$j$: The $j$th node of the current layer, $1 \le j \le n^{[l]}$.

$k$: The $k$th node of the previous layer, $1 \le k \le n^{[l-1]}$.

$i$: The current training example, $1 \le i \le m$, where $m$ is the number of training examples.

$z_{j,i}^{[l]}$: A weighted sum of the activations of the previous layer, shifted by a bias.

$w_{j,k}^{[l]}$: A weight that scales the $k$th activation of the previous layer.

$b_{j}^{[l]}$: A bias in the current layer.

$a_{j,i}^{[l]}$: An activation in the current layer.

$a_{k,i}^{[l-1]}$: An activation in the previous layer.

$g^{[l]}$: An activation function used in the current layer.
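To make the notation concrete, here is a minimal NumPy sketch (not from the original post) that evaluates a single node $a_{j,i}^{[l]}$ for one training example; the layer size and the ReLU choice for $g^{[l]}$ are hypothetical, purely for illustration:

```python
import numpy as np

# Hypothetical sizes: 3 nodes in the previous layer, ReLU assumed as g^[l].
a_prev = np.array([0.2, -0.4, 0.7])   # a_{k,i}^[l-1] for k = 1..n^[l-1]
w_j = np.array([0.5, -1.0, 0.3])      # w_{j,k}^[l] for k = 1..n^[l-1]
b_j = 0.1                             # b_j^[l]

# z_{j,i}^[l] = sum_k w_{j,k}^[l] * a_{k,i}^[l-1] + b_j^[l]
z_j = np.dot(w_j, a_prev) + b_j

# a_{j,i}^[l] = g^[l](z_{j,i}^[l]); ReLU is just one possible activation.
a_j = np.maximum(0.0, z_j)
print(z_j, a_j)
```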

To put it concisely, a node in the current layer depends on every node in the previous layer, and
the following visualization can help us see that more clearly:


Figure 1: A node in the current layer (every previous-layer activation $a_{k,i}^{[l-1]}$ feeds into $a_{j,i}^{[l]}$ through the weights $w_{j,k}^{[l]}$).

Moreover, a node in the previous layer affects every node in the current layer, and with a change
in highlighting, we will also be able to see that more clearly:

Figure 2: A node in the previous layer (the activation $a_{k,i}^{[l-1]}$ feeds into every current-layer activation $a_{j,i}^{[l]}$ through the weights $w_{j,k}^{[l]}$).

In the future, we might want to write an implementation from scratch in, for example, Python. To take advantage of the heavily optimized vector and matrix operations that come bundled with libraries such as NumPy, we need to vectorize $z_{j,i}^{[l]}$ and $a_{j,i}^{[l]}$.

To begin with, we vectorize the nodes:

$$
\begin{bmatrix} z_{1,i}^{[l]} \\ \vdots \\ z_{n^{[l]},i}^{[l]} \end{bmatrix}
=
\begin{bmatrix}
w_{1,1}^{[l]} & \cdots & w_{1,n^{[l-1]}}^{[l]} \\
\vdots & \ddots & \vdots \\
w_{n^{[l]},1}^{[l]} & \cdots & w_{n^{[l]},n^{[l-1]}}^{[l]}
\end{bmatrix}
\begin{bmatrix} a_{1,i}^{[l-1]} \\ \vdots \\ a_{n^{[l-1]},i}^{[l-1]} \end{bmatrix}
+
\begin{bmatrix} b_{1}^{[l]} \\ \vdots \\ b_{n^{[l]}}^{[l]} \end{bmatrix},
\qquad
\begin{bmatrix} a_{1,i}^{[l]} \\ \vdots \\ a_{n^{[l]},i}^{[l]} \end{bmatrix}
=
g^{[l]}\!\left(\begin{bmatrix} z_{1,i}^{[l]} \\ \vdots \\ z_{n^{[l]},i}^{[l]} \end{bmatrix}\right)
$$

which we can write as

$$
z_{:,i}^{[l]} = W^{[l]} a_{:,i}^{[l-1]} + b^{[l]},
\qquad
a_{:,i}^{[l]} = g^{[l]}\!\left(z_{:,i}^{[l]}\right)
$$

where $z_{:,i}^{[l]} \in \mathbb{R}^{n^{[l]} \times 1}$, $W^{[l]} \in \mathbb{R}^{n^{[l]} \times n^{[l-1]}}$, $a_{:,i}^{[l-1]} \in \mathbb{R}^{n^{[l-1]} \times 1}$, $b^{[l]} \in \mathbb{R}^{n^{[l]} \times 1}$, $a_{:,i}^{[l]} \in \mathbb{R}^{n^{[l]} \times 1}$, and lastly, $g^{[l]}$ is applied element-wise. We have used a colon to clarify that $z_{:,i}^{[l]}$ is the $i$th column of $Z^{[l]}$, and so on.

Next, we vectorize the training examples:

$$
Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]},
\qquad
A^{[l]} = g^{[l]}\!\left(Z^{[l]}\right)
$$

where $Z^{[l]} \in \mathbb{R}^{n^{[l]} \times m}$, $A^{[l-1]} \in \mathbb{R}^{n^{[l-1]} \times m}$, and $A^{[l]} \in \mathbb{R}^{n^{[l]} \times m}$. Note that adding $b^{[l]} \in \mathbb{R}^{n^{[l]} \times 1}$ to a matrix relies on broadcasting; have a look at the NumPy documentation if you want to read a well-written explanation of broadcasting.
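As a concrete illustration, here is a minimal NumPy sketch (not from the original post) of the vectorized forward step for a single layer; the layer sizes and the ReLU activation are hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(0)

n_prev, n_curr, m = 4, 3, 5                  # n^[l-1], n^[l], number of examples

A_prev = rng.standard_normal((n_prev, m))    # A^[l-1], shape (n^[l-1], m)
W = rng.standard_normal((n_curr, n_prev))    # W^[l],   shape (n^[l],   n^[l-1])
b = rng.standard_normal((n_curr, 1))         # b^[l],   shape (n^[l],   1)

# Z^[l] = W^[l] A^[l-1] + b^[l]; b^[l] is broadcast across the m columns.
Z = W @ A_prev + b

# A^[l] = g^[l](Z^[l]); here g^[l] is ReLU, applied element-wise.
A = np.maximum(0.0, Z)

print(Z.shape, A.shape)                      # both (n^[l], m) = (3, 5)
```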

We would also like to establish two additional notations:

$$
X = A^{[0]},
\qquad
\hat{Y} = A^{[L]}
$$

where $X$ denotes the inputs and $\hat{Y}$ denotes the predictions/outputs.

Finally, we are ready to define the cost function:

$$
\mathcal{C} = \mathcal{C}\!\left(\hat{Y}, Y\right)
$$

where $Y$ denotes the targets and $\mathcal{C}$ can be tailored to our needs.
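For instance, one possible choice (used here only as an illustration; the series derives cost functions in detail in the third post) is the mean squared error, which in NumPy could look like:

```python
import numpy as np

def mse_cost(Y_hat: np.ndarray, Y: np.ndarray) -> float:
    """Mean squared error over m examples; just one possible choice of C."""
    m = Y.shape[1]
    return float(np.sum((Y_hat - Y) ** 2) / (2 * m))

Y_hat = np.array([[0.9, 0.2], [0.1, 0.8]])  # predictions, shape (n^[L], m)
Y = np.array([[1.0, 0.0], [0.0, 1.0]])      # targets,     shape (n^[L], m)
print(mse_cost(Y_hat, Y))
```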

We are done with forward propagation! Next up: backward propagation, also known as
backpropagation, which involves computing the gradient of the cost function with respect to the
weights and biases.

Backward Propagation
We will make heavy use of the chain rule in this section, and to understand better how it works, we first apply the chain rule to the following example:

$$
y = y\!\left(u_1, u_2, \ldots, u_n\right),
\qquad
u_k = u_k(x), \quad k = 1, \ldots, n
$$

Note that $x$ may affect every $u_k$, and $y$ may depend on every $u_k$; thus,

$$
\frac{\partial y}{\partial x}
= \sum_{k=1}^{n} \frac{\partial y}{\partial u_k} \frac{\partial u_k}{\partial x}
$$

Great! If we ever get stuck trying to compute or understand some partial derivative, we can always go back to these three relations. Hopefully, they will provide the clues necessary to move forward. However, be extra careful not to confuse the notation used for the chain rule example with the notation we use elsewhere in this series. The overlap is unintentional.
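As a quick sanity check (not part of the original post), the following sketch verifies the chain rule numerically for a made-up choice of $y$ and $u_k$, comparing the analytic sum against a finite-difference estimate:

```python
import numpy as np

# Hypothetical example: u_1(x) = x**2, u_2(x) = sin(x), y(u_1, u_2) = u_1 * u_2.
def y_of_x(x):
    u1, u2 = x**2, np.sin(x)
    return u1 * u2

x = 0.7

# Analytic chain rule: dy/dx = dy/du1 * du1/dx + dy/du2 * du2/dx.
u1, u2 = x**2, np.sin(x)
dy_du1, dy_du2 = u2, u1
du1_dx, du2_dx = 2 * x, np.cos(x)
analytic = dy_du1 * du1_dx + dy_du2 * du2_dx

# Finite-difference estimate of dy/dx.
eps = 1e-6
numeric = (y_of_x(x + eps) - y_of_x(x - eps)) / (2 * eps)

print(analytic, numeric)  # the two values should agree closely
```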

Now, let us concentrate on the task at hand:

$$
\frac{\partial \mathcal{C}}{\partial w_{j,k}^{[l]}}
= \sum_{i=1}^{m} \frac{\partial \mathcal{C}}{\partial z_{j,i}^{[l]}} \frac{\partial z_{j,i}^{[l]}}{\partial w_{j,k}^{[l]}}
= \sum_{i=1}^{m} \frac{\partial \mathcal{C}}{\partial z_{j,i}^{[l]}} \, a_{k,i}^{[l-1]},
\qquad
\frac{\partial \mathcal{C}}{\partial b_{j}^{[l]}}
= \sum_{i=1}^{m} \frac{\partial \mathcal{C}}{\partial z_{j,i}^{[l]}} \frac{\partial z_{j,i}^{[l]}}{\partial b_{j}^{[l]}}
= \sum_{i=1}^{m} \frac{\partial \mathcal{C}}{\partial z_{j,i}^{[l]}}
$$

Vectorization results in

$$
\frac{\partial \mathcal{C}}{\partial W^{[l]}}
= \begin{bmatrix}
\frac{\partial \mathcal{C}}{\partial w_{1,1}^{[l]}} & \cdots & \frac{\partial \mathcal{C}}{\partial w_{1,n^{[l-1]}}^{[l]}} \\
\vdots & \ddots & \vdots \\
\frac{\partial \mathcal{C}}{\partial w_{n^{[l]},1}^{[l]}} & \cdots & \frac{\partial \mathcal{C}}{\partial w_{n^{[l]},n^{[l-1]}}^{[l]}}
\end{bmatrix},
\qquad
\frac{\partial \mathcal{C}}{\partial b^{[l]}}
= \begin{bmatrix}
\frac{\partial \mathcal{C}}{\partial b_{1}^{[l]}} \\
\vdots \\
\frac{\partial \mathcal{C}}{\partial b_{n^{[l]}}^{[l]}}
\end{bmatrix}
$$

which we can write as

$$
dW^{[l]} = dZ^{[l]} \left(A^{[l-1]}\right)^{\mathsf{T}},
\qquad
db^{[l]} = \sum_{i=1}^{m} dz_{:,i}^{[l]}
$$

where $dW^{[l]} = \frac{\partial \mathcal{C}}{\partial W^{[l]}} \in \mathbb{R}^{n^{[l]} \times n^{[l-1]}}$, $db^{[l]} = \frac{\partial \mathcal{C}}{\partial b^{[l]}} \in \mathbb{R}^{n^{[l]} \times 1}$, $dZ^{[l]} = \frac{\partial \mathcal{C}}{\partial Z^{[l]}} \in \mathbb{R}^{n^{[l]} \times m}$, and $A^{[l-1]} \in \mathbb{R}^{n^{[l-1]} \times m}$.

Looking back at the expressions for $dW^{[l]}$ and $db^{[l]}$, we see that the only unknown entity is $\frac{\partial \mathcal{C}}{\partial z_{j,i}^{[l]}}$. By applying the chain rule once again, we get

$$
\frac{\partial \mathcal{C}}{\partial z_{j,i}^{[l]}}
= \frac{\partial \mathcal{C}}{\partial a_{j,i}^{[l]}} \frac{\partial a_{j,i}^{[l]}}{\partial z_{j,i}^{[l]}}
$$

where $a_{j,i}^{[l]} = g^{[l]}\!\left(z_{j,i}^{[l]}\right)$.

Next, we present the vectorized version:

$$
\frac{\partial \mathcal{C}}{\partial Z^{[l]}}
= \begin{bmatrix}
\frac{\partial \mathcal{C}}{\partial a_{1,1}^{[l]}} \frac{\partial a_{1,1}^{[l]}}{\partial z_{1,1}^{[l]}} & \cdots & \frac{\partial \mathcal{C}}{\partial a_{1,m}^{[l]}} \frac{\partial a_{1,m}^{[l]}}{\partial z_{1,m}^{[l]}} \\
\vdots & \ddots & \vdots \\
\frac{\partial \mathcal{C}}{\partial a_{n^{[l]},1}^{[l]}} \frac{\partial a_{n^{[l]},1}^{[l]}}{\partial z_{n^{[l]},1}^{[l]}} & \cdots & \frac{\partial \mathcal{C}}{\partial a_{n^{[l]},m}^{[l]}} \frac{\partial a_{n^{[l]},m}^{[l]}}{\partial z_{n^{[l]},m}^{[l]}}
\end{bmatrix}
$$

which compresses into

$$
dZ^{[l]} = dA^{[l]} \odot \frac{\partial A^{[l]}}{\partial Z^{[l]}}
$$

where $dA^{[l]} = \frac{\partial \mathcal{C}}{\partial A^{[l]}} \in \mathbb{R}^{n^{[l]} \times m}$ and $\odot$ denotes element-wise multiplication.

We have already encountered

$$
A^{[l]} = g^{[l]}\!\left(Z^{[l]}\right)
$$

and for the sake of completeness, we also clarify that

$$
\frac{\partial A^{[l]}}{\partial Z^{[l]}}
= \begin{bmatrix}
\frac{\partial a_{1,1}^{[l]}}{\partial z_{1,1}^{[l]}} & \cdots & \frac{\partial a_{1,m}^{[l]}}{\partial z_{1,m}^{[l]}} \\
\vdots & \ddots & \vdots \\
\frac{\partial a_{n^{[l]},1}^{[l]}}{\partial z_{n^{[l]},1}^{[l]}} & \cdots & \frac{\partial a_{n^{[l]},m}^{[l]}}{\partial z_{n^{[l]},m}^{[l]}}
\end{bmatrix}
$$

where $\frac{\partial A^{[l]}}{\partial Z^{[l]}} \in \mathbb{R}^{n^{[l]} \times m}$.

On purpose, we have omitted the details of $g^{[l]}$; consequently, we cannot derive an analytic expression for $\frac{\partial A^{[l]}}{\partial Z^{[l]}}$, which we depend on in the expression for $dZ^{[l]}$. However, since the second post of this series will be dedicated to activation functions, we will instead derive $\frac{\partial A^{[l]}}{\partial Z^{[l]}}$ there.

Furthermore, according to the expression for $dZ^{[l]}$, we see that $dZ^{[l]}$ also depends on $dA^{[l]}$. Now, it might come as a surprise, but $dA^{[l]}$ has already been computed when we reach the $l$th layer during backward propagation. How did that happen, you may ask. The answer is that every layer paves the way for the previous layer by also computing $dA^{[l-1]}$, which we shall do now:

$$
\frac{\partial \mathcal{C}}{\partial a_{k,i}^{[l-1]}}
= \sum_{j=1}^{n^{[l]}} \frac{\partial \mathcal{C}}{\partial z_{j,i}^{[l]}} \frac{\partial z_{j,i}^{[l]}}{\partial a_{k,i}^{[l-1]}}
= \sum_{j=1}^{n^{[l]}} \frac{\partial \mathcal{C}}{\partial z_{j,i}^{[l]}} \, w_{j,k}^{[l]}
$$

As usual, our next step is vectorization:

$$
\frac{\partial \mathcal{C}}{\partial A^{[l-1]}}
= \begin{bmatrix}
\sum_{j=1}^{n^{[l]}} \frac{\partial \mathcal{C}}{\partial z_{j,1}^{[l]}} w_{j,1}^{[l]} & \cdots & \sum_{j=1}^{n^{[l]}} \frac{\partial \mathcal{C}}{\partial z_{j,m}^{[l]}} w_{j,1}^{[l]} \\
\vdots & \ddots & \vdots \\
\sum_{j=1}^{n^{[l]}} \frac{\partial \mathcal{C}}{\partial z_{j,1}^{[l]}} w_{j,n^{[l-1]}}^{[l]} & \cdots & \sum_{j=1}^{n^{[l]}} \frac{\partial \mathcal{C}}{\partial z_{j,m}^{[l]}} w_{j,n^{[l-1]}}^{[l]}
\end{bmatrix}
$$

which we can write as

$$
dA^{[l-1]} = \left(W^{[l]}\right)^{\mathsf{T}} dZ^{[l]}
$$

where $dA^{[l-1]} = \frac{\partial \mathcal{C}}{\partial A^{[l-1]}} \in \mathbb{R}^{n^{[l-1]} \times m}$.
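Putting the backward equations together, here is a minimal NumPy sketch (not from the original post) of one backward step through a layer; the ReLU derivative stands in for $\partial A^{[l]}/\partial Z^{[l]}$ and is purely a hypothetical choice:

```python
import numpy as np

def layer_backward(dA, Z, A_prev, W):
    """One backward step: dA^[l] and the cache (Z^[l], A^[l-1], W^[l]) -> gradients."""
    # dZ^[l] = dA^[l] * dA^[l]/dZ^[l]; here the ReLU derivative is assumed.
    dZ = dA * (Z > 0).astype(Z.dtype)

    # dW^[l] = dZ^[l] (A^[l-1])^T and db^[l] = sum over the m columns of dZ^[l].
    dW = dZ @ A_prev.T
    db = np.sum(dZ, axis=1, keepdims=True)

    # dA^[l-1] = (W^[l])^T dZ^[l], which seeds the previous layer.
    dA_prev = W.T @ dZ
    return dA_prev, dW, db

# Hypothetical shapes: n^[l-1] = 4, n^[l] = 3, m = 5 examples.
rng = np.random.default_rng(0)
A_prev = rng.standard_normal((4, 5))
W, b = rng.standard_normal((3, 4)), rng.standard_normal((3, 1))
Z = W @ A_prev + b
dA = rng.standard_normal((3, 5))

dA_prev, dW, db = layer_backward(dA, Z, A_prev, W)
print(dA_prev.shape, dW.shape, db.shape)  # (4, 5) (3, 4) (3, 1)
```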

Summary
Forward propagation is seeded with $A^{[0]} = X$ and evaluates a set of recurrence relations to compute the predictions $\hat{Y} = A^{[L]}$. We also compute the cost $\mathcal{C}$.

Backward propagation, on the other hand, is seeded with $dA^{[L]} = \frac{\partial \mathcal{C}}{\partial A^{[L]}}$ and evaluates a different set of recurrence relations to compute $dW^{[l]}$ and $db^{[l]}$ for every layer. If not stopped prematurely, it eventually computes $dA^{[0]} = \frac{\partial \mathcal{C}}{\partial X}$, a partial derivative we usually ignore.

Moreover, let us visualize the inputs we use and the outputs we produce during the forward and
backward propagations:

Figure 3: An overview of inputs and outputs. During the forward pass, layer $l$ takes $A^{[l-1]}$, $W^{[l]}$, and $b^{[l]}$, produces $Z^{[l]}$ and $A^{[l]}$, and stores a $\text{cache}^{[l]}$ for later use. During the backward pass, it takes $dA^{[l]}$ and the cache, produces $dZ^{[l]}$, and outputs $dW^{[l]}$, $db^{[l]}$, and $dA^{[l-1]}$.
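To tie the recurrences together, here is a minimal end-to-end sketch in NumPy (not from the original post); the two-layer architecture, the ReLU activation, and the mean squared error seed are all hypothetical choices made only so the loop runs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: n^[0] = 4, n^[1] = 3, n^[2] = 2, with m = 5 examples.
sizes = [4, 3, 2]
m = 5
params = [(rng.standard_normal((sizes[l], sizes[l - 1])) * 0.1,   # W^[l]
           np.zeros((sizes[l], 1)))                               # b^[l]
          for l in range(1, len(sizes))]

X = rng.standard_normal((sizes[0], m))    # A^[0]
Y = rng.standard_normal((sizes[-1], m))   # targets

# Forward propagation: seed with A^[0] = X and apply Z^[l] = W^[l] A^[l-1] + b^[l].
A = X
caches = []
for W, b in params:
    Z = W @ A + b
    A_prev, A = A, np.maximum(0.0, Z)     # ReLU as g^[l] (assumed)
    caches.append((A_prev, Z, W))

Y_hat = A                                 # predictions A^[L]
cost = np.sum((Y_hat - Y) ** 2) / (2 * m) # one possible cost (assumed MSE)

# Backward propagation: seed with dA^[L] = dC/dA^[L] (here, the MSE seed).
dA = (Y_hat - Y) / m
grads = []
for A_prev, Z, W in reversed(caches):
    dZ = dA * (Z > 0).astype(Z.dtype)     # dZ^[l] = dA^[l] * dA^[l]/dZ^[l]
    dW = dZ @ A_prev.T                    # dW^[l] = dZ^[l] (A^[l-1])^T
    db = np.sum(dZ, axis=1, keepdims=True)
    dA = W.T @ dZ                         # dA^[l-1], the seed for the previous layer
    grads.append((dW, db))

print(cost, [g[0].shape for g in grads])  # gradients arrive from layer L down to 1
```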

Now, you might have noticed that we have yet to derive an analytic expression for the backpropagation seed $dA^{[L]} = \frac{\partial \mathcal{C}}{\partial A^{[L]}}$. To recap, we have deferred the derivations that concern activation functions to the second post of this series. Similarly, since the third post will be dedicated to cost functions, we will instead address the derivation of the backpropagation seed there.

Last but not least: congratulations! You have made it to the end (of the first post). 🏅


Jonas Lalin

Yet another blog about deep learning.
