
I, Deep Learning

Feedforward Neural Networks in Depth, Part 1: Forward and Backward Propagations
Dec 10, 2021

This post is the first of a three-part series in which we set out to derive the mathematics behind
feedforward neural networks. They have

an input and an output layer with at least one hidden layer in between,
fully-connected layers, which means that each node in one layer connects to every node in
the following layer, and
ways to introduce nonlinearity by means of activation functions.

We start with forward propagation, which involves computing predictions and the associated cost
of these predictions.

Forward Propagation
Settling on what notations to use is tricky since we only have so many letters in the Roman
alphabet. As you browse the Internet, you will likely find derivations that have used different
notations than the ones we are about to introduce. However, and fortunately, there is no right or
wrong here; it is just a matter of taste. In particular, the notations used in this series take
inspiration from Andrew Ng’s Standard notations for Deep Learning. If you make a comparison,
you will find that we only change a couple of the details.

Now, whatever we come up with, we have to support

multiple layers,
several nodes in each layer,
various activation functions,
various types of cost functions, and
mini-batches of training examples.

As a result, our definition of a node ends up introducing a fairly large number of notations:

$$
a_{j,i}^{[l]} = g^{[l]}\!\left(z_{j,i}^{[l]}\right),
\qquad
z_{j,i}^{[l]} = \sum_{k=1}^{n^{[l-1]}} w_{j,k}^{[l]} \, a_{k,i}^{[l-1]} + b_{j}^{[l]}
$$


Does the node definition look intimidating to you at first glance? Do not worry. Hopefully, it will
make more sense once we have explained the notations, which we shall do next:

Entity Description

$l$: The current layer, $1 \le l \le L$, where $L$ is the number of layers that have weights and biases. We use $l = 0$ and $l = L$ to denote the input and output layers.

$n^{[l]}$: The number of nodes in the current layer.

$n^{[l-1]}$: The number of nodes in the previous layer.

$j$: The $j$th node of the current layer, $1 \le j \le n^{[l]}$.

$k$: The $k$th node of the previous layer, $1 \le k \le n^{[l-1]}$.

$i$: The current training example, $1 \le i \le m$, where $m$ is the number of training examples.

$z_{j,i}^{[l]}$: A weighted sum of the activations of the previous layer, shifted by a bias.

$w_{j,k}^{[l]}$: A weight that scales the $k$th activation of the previous layer.

$b_{j}^{[l]}$: A bias in the current layer.

$a_{j,i}^{[l]}$: An activation in the current layer.

$a_{k,i}^{[l-1]}$: An activation in the previous layer.

$g^{[l]}$: An activation function used in the current layer.
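To make the notation concrete, here is a minimal NumPy sketch (not from the original post) that evaluates a single node $a_{j,i}^{[l]}$ for one training example; the layer size and the ReLU choice for $g^{[l]}$ are hypothetical, purely for illustration:

```python
import numpy as np

# Hypothetical sizes: 3 nodes in the previous layer, ReLU assumed as g^[l].
a_prev = np.array([0.2, -0.4, 0.7])   # a_{k,i}^[l-1] for k = 1..n^[l-1]
w_j = np.array([0.5, -1.0, 0.3])      # w_{j,k}^[l] for k = 1..n^[l-1]
b_j = 0.1                             # b_j^[l]

# z_{j,i}^[l] = sum_k w_{j,k}^[l] * a_{k,i}^[l-1] + b_j^[l]
z_j = np.dot(w_j, a_prev) + b_j

# a_{j,i}^[l] = g^[l](z_{j,i}^[l]); ReLU is just one possible activation.
a_j = np.maximum(0.0, z_j)
print(z_j, a_j)
```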

To put it concisely, a node in the current layer depends on every node in the previous layer, and
the following visualization can help us see that more clearly:


Figure 1: A node in the current layer (every previous-layer activation $a_{k,i}^{[l-1]}$ feeds into $a_{j,i}^{[l]}$ through the weights $w_{j,k}^{[l]}$).

Moreover, a node in the previous layer affects every node in the current layer, and with a change
in highlighting, we will also be able to see that more clearly:

Figure 2: A node in the previous layer (the activation $a_{k,i}^{[l-1]}$ feeds into every current-layer activation $a_{j,i}^{[l]}$ through the weights $w_{j,k}^{[l]}$).

In the future, we might want to write an implementation from scratch in, for example, Python. To take advantage of the heavily optimized vector and matrix operations that come bundled with libraries such as NumPy, we need to vectorize $z_{j,i}^{[l]}$ and $a_{j,i}^{[l]}$.

To begin with, we vectorize the nodes:

$$
\begin{bmatrix} z_{1,i}^{[l]} \\ \vdots \\ z_{n^{[l]},i}^{[l]} \end{bmatrix}
=
\begin{bmatrix}
w_{1,1}^{[l]} & \cdots & w_{1,n^{[l-1]}}^{[l]} \\
\vdots & \ddots & \vdots \\
w_{n^{[l]},1}^{[l]} & \cdots & w_{n^{[l]},n^{[l-1]}}^{[l]}
\end{bmatrix}
\begin{bmatrix} a_{1,i}^{[l-1]} \\ \vdots \\ a_{n^{[l-1]},i}^{[l-1]} \end{bmatrix}
+
\begin{bmatrix} b_{1}^{[l]} \\ \vdots \\ b_{n^{[l]}}^{[l]} \end{bmatrix},
\qquad
\begin{bmatrix} a_{1,i}^{[l]} \\ \vdots \\ a_{n^{[l]},i}^{[l]} \end{bmatrix}
=
g^{[l]}\!\left(\begin{bmatrix} z_{1,i}^{[l]} \\ \vdots \\ z_{n^{[l]},i}^{[l]} \end{bmatrix}\right)
$$

which we can write as

$$
z_{:,i}^{[l]} = W^{[l]} a_{:,i}^{[l-1]} + b^{[l]},
\qquad
a_{:,i}^{[l]} = g^{[l]}\!\left(z_{:,i}^{[l]}\right)
$$

where $z_{:,i}^{[l]} \in \mathbb{R}^{n^{[l]} \times 1}$, $W^{[l]} \in \mathbb{R}^{n^{[l]} \times n^{[l-1]}}$, $a_{:,i}^{[l-1]} \in \mathbb{R}^{n^{[l-1]} \times 1}$, $b^{[l]} \in \mathbb{R}^{n^{[l]} \times 1}$, $a_{:,i}^{[l]} \in \mathbb{R}^{n^{[l]} \times 1}$, and lastly, $g^{[l]}$ is applied element-wise. We have used a colon to clarify that $z_{:,i}^{[l]}$ is the $i$th column of $Z^{[l]}$, and so on.

Next, we vectorize the training examples:

$$
Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]},
\qquad
A^{[l]} = g^{[l]}\!\left(Z^{[l]}\right)
$$

where $Z^{[l]} \in \mathbb{R}^{n^{[l]} \times m}$, $A^{[l-1]} \in \mathbb{R}^{n^{[l-1]} \times m}$, and $A^{[l]} \in \mathbb{R}^{n^{[l]} \times m}$. Note that adding $b^{[l]} \in \mathbb{R}^{n^{[l]} \times 1}$ to a matrix relies on broadcasting; have a look at the NumPy documentation if you want to read a well-written explanation of broadcasting.
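As a concrete illustration, here is a minimal NumPy sketch (not from the original post) of the vectorized forward step for a single layer; the layer sizes and the ReLU activation are hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(0)

n_prev, n_curr, m = 4, 3, 5                  # n^[l-1], n^[l], number of examples

A_prev = rng.standard_normal((n_prev, m))    # A^[l-1], shape (n^[l-1], m)
W = rng.standard_normal((n_curr, n_prev))    # W^[l],   shape (n^[l],   n^[l-1])
b = rng.standard_normal((n_curr, 1))         # b^[l],   shape (n^[l],   1)

# Z^[l] = W^[l] A^[l-1] + b^[l]; b^[l] is broadcast across the m columns.
Z = W @ A_prev + b

# A^[l] = g^[l](Z^[l]); here g^[l] is ReLU, applied element-wise.
A = np.maximum(0.0, Z)

print(Z.shape, A.shape)                      # both (n^[l], m) = (3, 5)
```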

We would also like to establish two additional notations:

$$
X = A^{[0]},
\qquad
\hat{Y} = A^{[L]}
$$

where $X$ denotes the inputs and $\hat{Y}$ denotes the predictions/outputs.

Finally, we are ready to define the cost function:

$$
\mathcal{C} = \mathcal{C}\!\left(\hat{Y}, Y\right)
$$

where $Y$ denotes the targets and $\mathcal{C}$ can be tailored to our needs.
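For instance, one possible choice (used here only as an illustration; the series derives cost functions in detail in the third post) is the mean squared error, which in NumPy could look like:

```python
import numpy as np

def mse_cost(Y_hat: np.ndarray, Y: np.ndarray) -> float:
    """Mean squared error over m examples; just one possible choice of C."""
    m = Y.shape[1]
    return float(np.sum((Y_hat - Y) ** 2) / (2 * m))

Y_hat = np.array([[0.9, 0.2], [0.1, 0.8]])  # predictions, shape (n^[L], m)
Y = np.array([[1.0, 0.0], [0.0, 1.0]])      # targets,     shape (n^[L], m)
print(mse_cost(Y_hat, Y))
```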

We are done with forward propagation! Next up: backward propagation, also known as
backpropagation, which involves computing the gradient of the cost function with respect to the
weights and biases.

Backward Propagation
We will make heavy use of the chain rule in this section, and to understand better how it works, we first apply the chain rule to the following example:

$$
y = y\!\left(u_1, u_2, \ldots, u_n\right),
\qquad
u_k = u_k(x), \quad k = 1, \ldots, n
$$

Note that $x$ may affect every $u_k$, and $y$ may depend on every $u_k$; thus,

$$
\frac{\partial y}{\partial x}
= \sum_{k=1}^{n} \frac{\partial y}{\partial u_k} \frac{\partial u_k}{\partial x}
$$

Great! If we ever get stuck trying to compute or understand some partial derivative, we can always go back to these three relations. Hopefully, they will provide the clues necessary to move forward. However, be extra careful not to confuse the notation used for the chain rule example with the notation we use elsewhere in this series. The overlap is unintentional.
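As a quick sanity check (not part of the original post), the following sketch verifies the chain rule numerically for a made-up choice of $y$ and $u_k$, comparing the analytic sum against a finite-difference estimate:

```python
import numpy as np

# Hypothetical example: u_1(x) = x**2, u_2(x) = sin(x), y(u_1, u_2) = u_1 * u_2.
def y_of_x(x):
    u1, u2 = x**2, np.sin(x)
    return u1 * u2

x = 0.7

# Analytic chain rule: dy/dx = dy/du1 * du1/dx + dy/du2 * du2/dx.
u1, u2 = x**2, np.sin(x)
dy_du1, dy_du2 = u2, u1
du1_dx, du2_dx = 2 * x, np.cos(x)
analytic = dy_du1 * du1_dx + dy_du2 * du2_dx

# Finite-difference estimate of dy/dx.
eps = 1e-6
numeric = (y_of_x(x + eps) - y_of_x(x - eps)) / (2 * eps)

print(analytic, numeric)  # the two values should agree closely
```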

Now, let us concentrate on the task at hand:

$$
\frac{\partial \mathcal{C}}{\partial w_{j,k}^{[l]}}
= \sum_{i=1}^{m} \frac{\partial \mathcal{C}}{\partial z_{j,i}^{[l]}} \frac{\partial z_{j,i}^{[l]}}{\partial w_{j,k}^{[l]}}
= \sum_{i=1}^{m} \frac{\partial \mathcal{C}}{\partial z_{j,i}^{[l]}} \, a_{k,i}^{[l-1]},
\qquad
\frac{\partial \mathcal{C}}{\partial b_{j}^{[l]}}
= \sum_{i=1}^{m} \frac{\partial \mathcal{C}}{\partial z_{j,i}^{[l]}} \frac{\partial z_{j,i}^{[l]}}{\partial b_{j}^{[l]}}
= \sum_{i=1}^{m} \frac{\partial \mathcal{C}}{\partial z_{j,i}^{[l]}}
$$

Vectorization results in

$$
\frac{\partial \mathcal{C}}{\partial W^{[l]}}
= \begin{bmatrix}
\frac{\partial \mathcal{C}}{\partial w_{1,1}^{[l]}} & \cdots & \frac{\partial \mathcal{C}}{\partial w_{1,n^{[l-1]}}^{[l]}} \\
\vdots & \ddots & \vdots \\
\frac{\partial \mathcal{C}}{\partial w_{n^{[l]},1}^{[l]}} & \cdots & \frac{\partial \mathcal{C}}{\partial w_{n^{[l]},n^{[l-1]}}^{[l]}}
\end{bmatrix},
\qquad
\frac{\partial \mathcal{C}}{\partial b^{[l]}}
= \begin{bmatrix}
\frac{\partial \mathcal{C}}{\partial b_{1}^{[l]}} \\
\vdots \\
\frac{\partial \mathcal{C}}{\partial b_{n^{[l]}}^{[l]}}
\end{bmatrix}
$$

which we can write as

$$
dW^{[l]} = dZ^{[l]} \left(A^{[l-1]}\right)^{\mathsf{T}},
\qquad
db^{[l]} = \sum_{i=1}^{m} dz_{:,i}^{[l]}
$$

where $dW^{[l]} = \frac{\partial \mathcal{C}}{\partial W^{[l]}} \in \mathbb{R}^{n^{[l]} \times n^{[l-1]}}$, $db^{[l]} = \frac{\partial \mathcal{C}}{\partial b^{[l]}} \in \mathbb{R}^{n^{[l]} \times 1}$, $dZ^{[l]} = \frac{\partial \mathcal{C}}{\partial Z^{[l]}} \in \mathbb{R}^{n^{[l]} \times m}$, and $A^{[l-1]} \in \mathbb{R}^{n^{[l-1]} \times m}$.

Looking back at the expressions for $dW^{[l]}$ and $db^{[l]}$, we see that the only unknown entity is $\frac{\partial \mathcal{C}}{\partial z_{j,i}^{[l]}}$. By applying the chain rule once again, we get

$$
\frac{\partial \mathcal{C}}{\partial z_{j,i}^{[l]}}
= \frac{\partial \mathcal{C}}{\partial a_{j,i}^{[l]}} \frac{\partial a_{j,i}^{[l]}}{\partial z_{j,i}^{[l]}}
$$

where $a_{j,i}^{[l]} = g^{[l]}\!\left(z_{j,i}^{[l]}\right)$.

Next, we present the vectorized version:

$$
\frac{\partial \mathcal{C}}{\partial Z^{[l]}}
= \begin{bmatrix}
\frac{\partial \mathcal{C}}{\partial a_{1,1}^{[l]}} \frac{\partial a_{1,1}^{[l]}}{\partial z_{1,1}^{[l]}} & \cdots & \frac{\partial \mathcal{C}}{\partial a_{1,m}^{[l]}} \frac{\partial a_{1,m}^{[l]}}{\partial z_{1,m}^{[l]}} \\
\vdots & \ddots & \vdots \\
\frac{\partial \mathcal{C}}{\partial a_{n^{[l]},1}^{[l]}} \frac{\partial a_{n^{[l]},1}^{[l]}}{\partial z_{n^{[l]},1}^{[l]}} & \cdots & \frac{\partial \mathcal{C}}{\partial a_{n^{[l]},m}^{[l]}} \frac{\partial a_{n^{[l]},m}^{[l]}}{\partial z_{n^{[l]},m}^{[l]}}
\end{bmatrix}
$$

which compresses into

$$
dZ^{[l]} = dA^{[l]} \odot \frac{\partial A^{[l]}}{\partial Z^{[l]}}
$$

where $dA^{[l]} = \frac{\partial \mathcal{C}}{\partial A^{[l]}} \in \mathbb{R}^{n^{[l]} \times m}$ and $\odot$ denotes element-wise multiplication.

We have already encountered

$$
A^{[l]} = g^{[l]}\!\left(Z^{[l]}\right)
$$

and for the sake of completeness, we also clarify that

$$
\frac{\partial A^{[l]}}{\partial Z^{[l]}}
= \begin{bmatrix}
\frac{\partial a_{1,1}^{[l]}}{\partial z_{1,1}^{[l]}} & \cdots & \frac{\partial a_{1,m}^{[l]}}{\partial z_{1,m}^{[l]}} \\
\vdots & \ddots & \vdots \\
\frac{\partial a_{n^{[l]},1}^{[l]}}{\partial z_{n^{[l]},1}^{[l]}} & \cdots & \frac{\partial a_{n^{[l]},m}^{[l]}}{\partial z_{n^{[l]},m}^{[l]}}
\end{bmatrix}
$$

where $\frac{\partial A^{[l]}}{\partial Z^{[l]}} \in \mathbb{R}^{n^{[l]} \times m}$.

On purpose, we have omitted the details of $g^{[l]}$; consequently, we cannot derive an analytic expression for $\frac{\partial A^{[l]}}{\partial Z^{[l]}}$, which we depend on in the expression for $dZ^{[l]}$. However, since the second post of this series will be dedicated to activation functions, we will instead derive $\frac{\partial A^{[l]}}{\partial Z^{[l]}}$ there.

Furthermore, according to the expression for $dZ^{[l]}$, we see that $dZ^{[l]}$ also depends on $dA^{[l]}$. Now, it might come as a surprise, but $dA^{[l]}$ has already been computed when we reach the $l$th layer during backward propagation. How did that happen, you may ask. The answer is that every layer paves the way for the previous layer by also computing $dA^{[l-1]}$, which we shall do now:

$$
\frac{\partial \mathcal{C}}{\partial a_{k,i}^{[l-1]}}
= \sum_{j=1}^{n^{[l]}} \frac{\partial \mathcal{C}}{\partial z_{j,i}^{[l]}} \frac{\partial z_{j,i}^{[l]}}{\partial a_{k,i}^{[l-1]}}
= \sum_{j=1}^{n^{[l]}} \frac{\partial \mathcal{C}}{\partial z_{j,i}^{[l]}} \, w_{j,k}^{[l]}
$$

As usual, our next step is vectorization:

$$
\frac{\partial \mathcal{C}}{\partial A^{[l-1]}}
= \begin{bmatrix}
\sum_{j=1}^{n^{[l]}} \frac{\partial \mathcal{C}}{\partial z_{j,1}^{[l]}} w_{j,1}^{[l]} & \cdots & \sum_{j=1}^{n^{[l]}} \frac{\partial \mathcal{C}}{\partial z_{j,m}^{[l]}} w_{j,1}^{[l]} \\
\vdots & \ddots & \vdots \\
\sum_{j=1}^{n^{[l]}} \frac{\partial \mathcal{C}}{\partial z_{j,1}^{[l]}} w_{j,n^{[l-1]}}^{[l]} & \cdots & \sum_{j=1}^{n^{[l]}} \frac{\partial \mathcal{C}}{\partial z_{j,m}^{[l]}} w_{j,n^{[l-1]}}^{[l]}
\end{bmatrix}
$$

which we can write as

$$
dA^{[l-1]} = \left(W^{[l]}\right)^{\mathsf{T}} dZ^{[l]}
$$

where $dA^{[l-1]} = \frac{\partial \mathcal{C}}{\partial A^{[l-1]}} \in \mathbb{R}^{n^{[l-1]} \times m}$.
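Putting the backward equations together, here is a minimal NumPy sketch (not from the original post) of one backward step through a layer; the ReLU derivative stands in for $\partial A^{[l]}/\partial Z^{[l]}$ and is purely a hypothetical choice:

```python
import numpy as np

def layer_backward(dA, Z, A_prev, W):
    """One backward step: dA^[l] and the cache (Z^[l], A^[l-1], W^[l]) -> gradients."""
    # dZ^[l] = dA^[l] * dA^[l]/dZ^[l]; here the ReLU derivative is assumed.
    dZ = dA * (Z > 0).astype(Z.dtype)

    # dW^[l] = dZ^[l] (A^[l-1])^T and db^[l] = sum over the m columns of dZ^[l].
    dW = dZ @ A_prev.T
    db = np.sum(dZ, axis=1, keepdims=True)

    # dA^[l-1] = (W^[l])^T dZ^[l], which seeds the previous layer.
    dA_prev = W.T @ dZ
    return dA_prev, dW, db

# Hypothetical shapes: n^[l-1] = 4, n^[l] = 3, m = 5 examples.
rng = np.random.default_rng(0)
A_prev = rng.standard_normal((4, 5))
W, b = rng.standard_normal((3, 4)), rng.standard_normal((3, 1))
Z = W @ A_prev + b
dA = rng.standard_normal((3, 5))

dA_prev, dW, db = layer_backward(dA, Z, A_prev, W)
print(dA_prev.shape, dW.shape, db.shape)  # (4, 5) (3, 4) (3, 1)
```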

Summary
Forward propagation is seeded with $A^{[0]} = X$ and evaluates a set of recurrence relations to compute the predictions $\hat{Y} = A^{[L]}$. We also compute the cost $\mathcal{C}$.

Backward propagation, on the other hand, is seeded with $dA^{[L]} = \frac{\partial \mathcal{C}}{\partial A^{[L]}}$ and evaluates a different set of recurrence relations to compute $dW^{[l]}$ and $db^{[l]}$ for every layer. If not stopped prematurely, it eventually computes $dA^{[0]} = \frac{\partial \mathcal{C}}{\partial X}$, a partial derivative we usually ignore.

Moreover, let us visualize the inputs we use and the outputs we produce during the forward and
backward propagations:

Figure 3: An overview of inputs and outputs. During the forward pass, layer $l$ takes $A^{[l-1]}$, $W^{[l]}$, and $b^{[l]}$, produces $Z^{[l]}$ and $A^{[l]}$, and stores a $\text{cache}^{[l]}$ for later use. During the backward pass, it takes $dA^{[l]}$ and the cache, produces $dZ^{[l]}$, and outputs $dW^{[l]}$, $db^{[l]}$, and $dA^{[l-1]}$.
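To tie the recurrences together, here is a minimal end-to-end sketch in NumPy (not from the original post); the two-layer architecture, the ReLU activation, and the mean squared error seed are all hypothetical choices made only so the loop runs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: n^[0] = 4, n^[1] = 3, n^[2] = 2, with m = 5 examples.
sizes = [4, 3, 2]
m = 5
params = [(rng.standard_normal((sizes[l], sizes[l - 1])) * 0.1,   # W^[l]
           np.zeros((sizes[l], 1)))                               # b^[l]
          for l in range(1, len(sizes))]

X = rng.standard_normal((sizes[0], m))    # A^[0]
Y = rng.standard_normal((sizes[-1], m))   # targets

# Forward propagation: seed with A^[0] = X and apply Z^[l] = W^[l] A^[l-1] + b^[l].
A = X
caches = []
for W, b in params:
    Z = W @ A + b
    A_prev, A = A, np.maximum(0.0, Z)     # ReLU as g^[l] (assumed)
    caches.append((A_prev, Z, W))

Y_hat = A                                 # predictions A^[L]
cost = np.sum((Y_hat - Y) ** 2) / (2 * m) # one possible cost (assumed MSE)

# Backward propagation: seed with dA^[L] = dC/dA^[L] (here, the MSE seed).
dA = (Y_hat - Y) / m
grads = []
for A_prev, Z, W in reversed(caches):
    dZ = dA * (Z > 0).astype(Z.dtype)     # dZ^[l] = dA^[l] * dA^[l]/dZ^[l]
    dW = dZ @ A_prev.T                    # dW^[l] = dZ^[l] (A^[l-1])^T
    db = np.sum(dZ, axis=1, keepdims=True)
    dA = W.T @ dZ                         # dA^[l-1], the seed for the previous layer
    grads.append((dW, db))

print(cost, [g[0].shape for g in grads])  # gradients arrive from layer L down to 1
```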

Now, you might have noticed that we have yet to derive an analytic expression for the backpropagation seed $dA^{[L]} = \frac{\partial \mathcal{C}}{\partial A^{[L]}}$. To recap, we have deferred the derivations that concern activation functions to the second post of this series. Similarly, since the third post will be dedicated to cost functions, we will instead address the derivation of the backpropagation seed there.

Last but not least: congratulations! You have made it to the end (of the first post). 🏅


Jonas Lalin

Yet another blog about deep learning.
