Feedforward Neural Networks in Depth, Part 1: Forward and Backward Propagations
This post is the first of a three-part series in which we set out to derive the mathematics behind feedforward neural networks. They have

- an input and an output layer with at least one hidden layer in between,
- fully-connected layers, which means that each node in one layer connects to every node in the following layer, and
- ways to introduce nonlinearity by means of activation functions.
We start with forward propagation, which involves computing predictions and the associated cost
of these predictions.
Forward Propagation
Settling on what notations to use is tricky since we only have so many letters in the Roman
alphabet. As you browse the Internet, you will likely find derivations that have used different
notations than the ones we are about to introduce. However, and fortunately, there is no right or
wrong here; it is just a matter of taste. In particular, the notations used in this series take
inspiration from Andrew Ng’s Standard notations for Deep Learning. If you make a comparison,
you will find that we only change a couple of the details.
Whatever notations we choose, they must support

- multiple layers,
- several nodes in each layer,
- various activation functions,
- various types of cost functions, and
- mini-batches of training examples.
As a result, our definition of a node ends up introducing a fairly large number of notations:

$$a_j^{[l](i)} = g_j^{[l]}\!\left(z_j^{[l](i)}\right), \qquad z_j^{[l](i)} = \sum_{k=1}^{n^{[l-1]}} w_{j,k}^{[l]}\, a_k^{[l-1](i)} + b_j^{[l]}$$
Does the node definition look intimidating to you at first glance? Do not worry. Hopefully, it will
make more sense once we have explained the notations, which we shall do next:
| Entity | Description |
| --- | --- |
| $l$ | The current layer, $l \in \{1, 2, \ldots, L\}$, where $L$ is the number of layers that have weights and biases. We use $l = 0$ and $l = L$ to denote the input and output layers. |
| $j$ | The current node in layer $l$, where $j \in \{1, 2, \ldots, n^{[l]}\}$ and $n^{[l]}$ is the number of nodes in layer $l$. |
| $k$ | A node in the previous layer $l - 1$, where $k \in \{1, 2, \ldots, n^{[l-1]}\}$. |
| $i$ | The current training example, where $i \in \{1, 2, \ldots, m\}$ and $m$ is the number of training examples. |
| $w_{j,k}^{[l]}$ | The weight that connects node $k$ in layer $l - 1$ to node $j$ in layer $l$. |
| $b_j^{[l]}$ | The bias of node $j$ in layer $l$. |
| $z_j^{[l](i)}$ | The weighted sum, plus bias, of node $j$ in layer $l$ for training example $i$. |
| $g_j^{[l]}$ | The activation function of node $j$ in layer $l$. |
| $a_j^{[l](i)}$ | The activation of node $j$ in layer $l$ for training example $i$. |
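To ground these notations, here is a tiny numerical sketch of a single node's computation; the values and the choice of $\tanh$ as $g_j^{[l]}$ are arbitrary examples, not something fixed by this series:

```python
import numpy as np

# A single node j in layer l with three incoming connections (arbitrary example values).
a_prev = np.array([0.5, -1.2, 0.8])  # activations a_k^[l-1] of the previous layer
w_j = np.array([0.1, 0.4, -0.3])     # weights w_{j,k}^[l] into node j
b_j = 0.05                           # bias b_j^[l]

z_j = w_j @ a_prev + b_j             # z_j^[l] = sum_k w_{j,k}^[l] a_k^[l-1] + b_j^[l]
a_j = np.tanh(z_j)                   # a_j^[l] = g_j^[l](z_j^[l]), here with g = tanh
```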
To put it concisely, a node in the current layer depends on every node in the previous layer, and
the following visualization can help us see that more clearly:
[Figure: node $j$ in layer $l$ highlighted, with incoming connections from every node in layer $l - 1$, labeled with weights such as $w_{j,k-1}^{[l]}$ and $w_{j,k}^{[l]}$.]
Moreover, a node in the previous layer affects every node in the current layer, and with a change
in highlighting, we will also be able to see that more clearly:
[Figure: node $k$ in layer $l - 1$ highlighted, with outgoing connections to every node in layer $l$, labeled with weights such as $w_{j,k}^{[l]}$ and $w_{j+1,k}^{[l]}$.]
In the future, we might want to write an implementation from scratch in, for example, Python. To take advantage of the heavily optimized versions of vector and matrix operations that come bundled with libraries such as NumPy, we need to vectorize $z_j^{[l](i)}$ and $a_j^{[l](i)}$. Vectorization results in

$$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]} \qquad \text{and} \qquad A^{[l]} = g^{[l]}\!\left(Z^{[l]}\right),$$

where column $i$ of $Z^{[l]}$ and $A^{[l]}$ corresponds to training example $i$, $W^{[l]}$ collects the weights $w_{j,k}^{[l]}$, $b^{[l]}$ collects the biases $b_j^{[l]}$ and is broadcast across the columns, and $A^{[0]} = X$ seeds the recurrence.
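To make the vectorized recurrences concrete, here is a minimal NumPy sketch of the forward pass. The function name, the `parameters` dictionary layout, and the `activations` list are our own illustrative choices, not notation defined in this series:

```python
import numpy as np

def forward_propagation(X, parameters, activations):
    """Evaluate Z^[l] = W^[l] A^[l-1] + b^[l] and A^[l] = g^[l](Z^[l]) for l = 1..L.

    X: input matrix of shape (n^[0], m), one training example per column.
    parameters: dict with W1..WL of shape (n^[l], n^[l-1]) and b1..bL of shape (n^[l], 1).
    activations: list of L callables, activations[l-1] playing the role of g^[l].
    """
    A = X  # A^[0] = X seeds the recurrence
    caches = []
    for l in range(1, len(activations) + 1):
        A_prev = A
        Z = parameters[f"W{l}"] @ A_prev + parameters[f"b{l}"]  # b^[l] broadcasts over columns
        A = activations[l - 1](Z)
        caches.append((A_prev, Z))  # saved for the backward pass
    return A, caches  # A is now A^[L], i.e. the predictions Y_hat
```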
We are done with forward propagation! Next up: backward propagation, also known as
backpropagation, which involves computing the gradient of the cost function with respect to the
weights and biases.
Backward Propagation
We will make heavy use of the chain rule in this section, and to understand better how it works, we first apply the chain rule to a small example. Suppose that

$$z = f(u, v), \tag{1}$$
$$u = g(x), \tag{2}$$
$$v = h(x). \tag{3}$$

Then the chain rule tells us that

$$\frac{dz}{dx} = \frac{\partial z}{\partial u} \frac{du}{dx} + \frac{\partial z}{\partial v} \frac{dv}{dx}.$$

Great! If we ever get stuck trying to compute or understand some partial derivative, we can always go back to (1), (2), and (3). Hopefully, these equations will provide the clues necessary to move forward. However, be extra careful not to confuse the notation used for the chain rule example with the notation we use elsewhere in this series. The overlap is unintentional.
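As a quick sanity check, we can verify the chain rule numerically with a central difference. The concrete choices $f(u, v) = uv$, $g(x) = x^2$, and $h(x) = \sin x$ are arbitrary examples, not part of the derivation:

```python
import numpy as np

# Hypothetical choices: z = f(u, v) = u * v, u = g(x) = x**2, v = h(x) = sin(x).
x = 1.3
u, v = x**2, np.sin(x)

# Chain rule: dz/dx = (dz/du)(du/dx) + (dz/dv)(dv/dx) = v * 2x + u * cos(x).
analytic = v * 2 * x + u * np.cos(x)

# Central-difference approximation of dz/dx for comparison.
eps = 1e-6
z = lambda t: t**2 * np.sin(t)
numeric = (z(x + eps) - z(x - eps)) / (2 * eps)

print(analytic, numeric)  # the two values should agree to roughly 1e-9
```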
Vectorization results in

$$dW^{[l]} = \frac{1}{m}\, dZ^{[l]} \left(A^{[l-1]}\right)^{\top} \qquad \text{and} \qquad db^{[l]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)},$$

where $dW^{[l]} = \frac{\partial \mathcal{J}}{\partial W^{[l]}}$, $db^{[l]} = \frac{\partial \mathcal{J}}{\partial b^{[l]}}$, $dZ^{[l]} = \frac{\partial \mathcal{J}}{\partial Z^{[l]}}$, and $dZ^{[l](i)}$ denotes column $i$ of $dZ^{[l]}$.

Looking back at $dW^{[l]}$ and $db^{[l]}$, we see that the only unknown entity is $dZ^{[l]}$. By applying the chain rule once again, we get

$$dZ^{[l]} = dA^{[l]} \odot g^{[l]\prime}\!\left(Z^{[l]}\right),$$

where $dA^{[l]} = \frac{\partial \mathcal{J}}{\partial A^{[l]}}$ and $\odot$ denotes element-wise multiplication.

Applying the chain rule one last time propagates the gradient to the previous layer:

$$dA^{[l-1]} = \left(W^{[l]}\right)^{\top} dZ^{[l]},$$

where $dA^{[l-1]} = \frac{\partial \mathcal{J}}{\partial A^{[l-1]}}$.
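Mirroring the forward pass, here is a hedged NumPy sketch of the vectorized backward pass. The names `backward_propagation` and `activation_grads` are illustrative; `activation_grads[l-1](Z)` is assumed to evaluate $g^{[l]\prime}(Z^{[l]})$:

```python
import numpy as np

def backward_propagation(dA, caches, parameters, activation_grads):
    """Evaluate dZ^[l], dW^[l], db^[l], and dA^[l-1] for l = L..1.

    dA: the backpropagation seed dA^[L], of shape (n^[L], m).
    caches: the (A_prev, Z) pairs stored by the forward pass.
    """
    grads = {}
    m = dA.shape[1]  # number of training examples
    for l in range(len(caches), 0, -1):
        A_prev, Z = caches[l - 1]
        dZ = dA * activation_grads[l - 1](Z)                 # dZ^[l] = dA^[l] * g^[l]'(Z^[l])
        grads[f"dW{l}"] = (dZ @ A_prev.T) / m                # dW^[l] = (1/m) dZ^[l] (A^[l-1])^T
        grads[f"db{l}"] = dZ.sum(axis=1, keepdims=True) / m  # db^[l] = (1/m) sum_i dZ^[l](i)
        dA = parameters[f"W{l}"].T @ dZ                      # dA^[l-1] = (W^[l])^T dZ^[l]
    return grads
```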
Summary
Forward propagation is seeded with $A^{[0]} = X$ and evaluates a set of recurrence relations to compute the predictions $\hat{Y} = A^{[L]}$. We also compute the cost $\mathcal{J}$. Backward propagation is seeded with $dA^{[L]}$ and evaluates a second set of recurrence relations to compute the gradients $dW^{[l]}$ and $db^{[l]}$.
Moreover, let us visualize the inputs we use and the outputs we produce during the forward and
backward propagations:
[Figure: during forward propagation, layer $l$ consumes $A^{[l-1]}$, $W^{[l]}$, and $b^{[l]}$, produces $A^{[l]}$, and stores $\text{cache}^{[l]}$; during backward propagation, layer $l$ consumes $dA^{[l]}$ and $\text{cache}^{[l]}$ and produces $dA^{[l-1]}$, $dW^{[l]}$, and $db^{[l]}$.]
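The figure's per-layer interface translates naturally into a pair of functions. The following sketch is our own framing of that interface, with the cache contents chosen to match what the backward step needs:

```python
import numpy as np

def layer_forward(A_prev, W, b, g):
    """Forward step of one layer: consumes A^[l-1], W^[l], b^[l]; produces A^[l] and cache^[l]."""
    Z = W @ A_prev + b
    cache = (A_prev, W, Z)  # exactly the inputs the backward step will need
    return g(Z), cache

def layer_backward(dA, cache, g_prime):
    """Backward step of one layer: consumes dA^[l] and cache^[l]; produces dA^[l-1], dW^[l], db^[l]."""
    A_prev, W, Z = cache
    m = dA.shape[1]
    dZ = dA * g_prime(Z)
    dW = (dZ @ A_prev.T) / m
    db = dZ.sum(axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ
    return dA_prev, dW, db
```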
Now, you might have noticed that we have yet to derive an analytic expression for the backpropagation seed $dA^{[L]} = \frac{\partial \mathcal{J}}{\partial A^{[L]}}$. To recap, we have deferred the derivations that
concern activation functions to the second post of this series. Similarly, since the third post will
be dedicated to cost functions, we will instead address the derivation of the backpropagation
seed there.
Last but not least: congratulations! You have made it to the end (of the first post). 🏅
Jonas Lalin