Initial Weights
w1 = 0.15    w5 = 0.40
w2 = 0.20    w6 = 0.45
w3 = 0.25    w7 = 0.50
w4 = 0.30    w8 = 0.55
Bias Values
b1 = 0.35    b2 = 0.60
Input Values
x1 = 0.05    x2 = 0.10
Target Values
T1 = 0.01
T2 = 0.99
Now, we first calculate the values of H1 and H2 by a forward pass.
Part 1: Calculate Forward Propagation Error
Input layer → Hidden layer → Output layer
To find the value of H1 (In and Out) we first multiply the input values by the corresponding weights and add the bias:
H1 = x1×w1 + x2×w2 + b1
H1 = 0.05×0.15 + 0.10×0.20 + 0.35
H1 = 0.3775 (In)
Passing this net input through the sigmoid activation gives the neuron's output, H1 (Out) = 1 / (1 + e^(−0.3775)) = 0.5933. H2 and the two output neurons are computed in the same way.
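To make the arithmetic easy to check, here is a minimal Python sketch of the complete forward pass with the values above. The assignment of w3, w4 to H2 and of w5–w8 to the two output neurons, as well as the ½(target − output)² error, are not spelled out in the text, so they are assumed from the standard form of this example; with them, the sketch reproduces the total error quoted below.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# inputs, weights, biases and targets from the example above
x1, x2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60
t1, t2 = 0.01, 0.99

# hidden layer: net input, then sigmoid output
h1_in = x1*w1 + x2*w2 + b1          # 0.3775
h2_in = x1*w3 + x2*w4 + b1          # 0.3925
h1_out, h2_out = sigmoid(h1_in), sigmoid(h2_in)

# output layer
o1_out = sigmoid(h1_out*w5 + h2_out*w6 + b2)
o2_out = sigmoid(h1_out*w7 + h2_out*w8 + b2)

# total squared error: 1/2 * (target - output)^2 summed over both outputs
E_total = 0.5*(t1 - o1_out)**2 + 0.5*(t2 - o2_out)**2
print(h1_in, E_total)               # 0.3775, ~0.298371109
```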
Carrying out the full forward pass with the initial weights, we find an error of 0.298371109 on the network when we feed forward the 0.05 and 0.1 inputs. After the first round of backpropagation, with all the weights updated, the total error is down to 0.291027924. After repeating this process 10,000 times, the total error is down to 0.0000351085. At this point, the output neurons generate 0.015912196 and 0.984065734, i.e. very close to our target values of 0.01 and 0.99, when we feed forward 0.05 and 0.1.
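The whole procedure can be condensed into a short training loop. The learning rate (taken here to be 0.5) and the choice to leave the biases fixed are not stated above, so treat them as assumptions; with those choices the loop reproduces the error values quoted in this section.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x1, x2 = 0.05, 0.10           # inputs
t1, t2 = 0.01, 0.99           # targets
b1, b2 = 0.35, 0.60           # biases (kept fixed here)
eta = 0.5                     # learning rate -- an assumption, not given above
w = [0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55]   # w1..w8

def forward(w):
    h1 = sigmoid(x1*w[0] + x2*w[1] + b1)
    h2 = sigmoid(x1*w[2] + x2*w[3] + b1)
    o1 = sigmoid(h1*w[4] + h2*w[5] + b2)
    o2 = sigmoid(h1*w[6] + h2*w[7] + b2)
    return h1, h2, o1, o2

def error(o1, o2):
    return 0.5*(t1 - o1)**2 + 0.5*(t2 - o2)**2

for step in range(10000):
    h1, h2, o1, o2 = forward(w)
    # output-layer deltas: dE/dnet = (out - target) * out * (1 - out)
    d1 = (o1 - t1) * o1 * (1 - o1)
    d2 = (o2 - t2) * o2 * (1 - o2)
    # hidden-layer deltas, computed with the *old* output weights
    dh1 = (d1*w[4] + d2*w[6]) * h1 * (1 - h1)
    dh2 = (d1*w[5] + d2*w[7]) * h2 * (1 - h2)
    # gradient-descent step on every weight
    grads = [dh1*x1, dh1*x2, dh2*x1, dh2*x2, d1*h1, d1*h2, d2*h1, d2*h2]
    w = [wi - eta*g for wi, g in zip(w, grads)]

print(error(*forward(w)[2:]))   # ~0.000035; outputs end up near 0.0159 and 0.9841
```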
The MLP algorithm suggests that the weights are
initialised to small random numbers, both positive and
negative.
If the initial weight values are close to 1 or −1, then the summed input to the sigmoid is likely to be large in magnitude, so the neuron saturates and its output sits at either 0 or 1.
If the weights are very small (close to zero), then the summed input is still close to 0; the sigmoid is then operating in its nearly linear region, each neuron behaves like a linear unit, and we end up with a linear model.
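A quick numerical illustration of the two regimes (the specific net-input values below are illustrative, not taken from the example):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# large-magnitude net inputs saturate the sigmoid near 0 or 1,
# while small net inputs land on its almost-linear region around 0.5
for net in (-6.0, -0.05, 0.05, 6.0):
    print(f"net={net:+.2f}  sigmoid={sigmoid(net):.4f}")
```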
Choosing the size of the initial values therefore needs a little more thought. Each neuron gets input from n different places (either input nodes if the neuron is in the hidden layer, or hidden neurons if it is in the output layer).
If we view the values of these inputs as having uniform
variance, then the typical input to the neuron will be
w√n, where w is the initialization value of the weights.
So a common trick is to set the weights in the range −1/√n < w < 1/√n, where n is the number of nodes in the input layer to those weights.
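A minimal NumPy sketch of this initialisation rule (the helper name and the layer sizes are illustrative, not from the text):

```python
import numpy as np

def init_weights(n_in, n_out, rng=np.random.default_rng()):
    """Uniform weights in (-1/sqrt(n_in), 1/sqrt(n_in)),
    where n_in is the number of nodes feeding into this layer."""
    limit = 1.0 / np.sqrt(n_in)
    return rng.uniform(-limit, limit, size=(n_in, n_out))

# e.g. a 2-input, 2-hidden, 2-output network like the worked example above
W_input_to_hidden = init_weights(2, 2)
W_hidden_to_output = init_weights(2, 2)
```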
In neural networks, the activation function is a
mathematical “gate” in between the input feeding the
current neuron and its output going to the next layer.
The activation functions are at the very core of
Machine Learning. They determine the output of a
model, its accuracy, and computational efficiency. In
some cases, activation functions have a major effect on
the model’s ability to converge and the convergence
speed.
The following are the most popular activation functions in Machine Learning algorithms:
Sigmoid (Logistic)
Hyperbolic Tangent (Tanh)
Rectified Linear Unit (ReLU)
Leaky ReLU
Parametric Leaky ReLU (PReLU)
Exponential Linear Units (ELU)
Scaled Exponential Linear Unit (SELU)
Sigmoid (Logistic)
The Sigmoid function (also known as the Logistic function) is one of the most widely used activation functions. The function is defined as:
f(x) = 1 / (1 + e^(−x))
Hyperbolic Tangent (Tanh)
Another very popular and widely used activation function is the Hyperbolic Tangent, also known as Tanh. It is defined as:
f(x) = (e^x − e^(−x)) / (e^x + e^(−x))
Rectified Linear Unit (ReLU)
The Rectified Linear Unit (ReLU) is the most commonly used activation function in deep learning. The function returns 0 if the input is negative, and for any positive input it returns that value back. The function is defined as:
f(x) = max(0, x)
Leaky ReLU
Leaky ReLU is an improvement over the ReLU activation function. It has all the properties of ReLU, and in addition it avoids the dying ReLU problem, because negative inputs receive a small non-zero slope α (typically a small constant such as 0.01) instead of being set to 0. Leaky ReLU is defined as:
f(x) = max(αx, x)
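The four activations defined so far can be written in a few lines of NumPy. The value α = 0.01 used for Leaky ReLU is a common default, assumed here rather than taken from the text:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # f(x) = 1 / (1 + e^-x)

def tanh(x):
    return np.tanh(x)                       # f(x) = (e^x - e^-x) / (e^x + e^-x)

def relu(x):
    return np.maximum(0.0, x)               # f(x) = max(0, x)

def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)         # f(x) = max(alpha*x, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, f(x))
```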
Parametric Leaky ReLU (PReLU)
Parametric Leaky ReLU (PReLU) is a variation of Leaky ReLU in which α is allowed to be learned during training (instead of being a hyperparameter, it becomes a parameter that can be modified by backpropagation like any other parameter). PReLU was reported to strongly outperform ReLU on large image datasets, but on smaller datasets it runs the risk of overfitting the training set.
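As one possible illustration (PyTorch is not mentioned in the text, so treat the library choice as an assumption), torch.nn.PReLU stores α as a learnable parameter that receives a gradient during backpropagation:

```python
import torch
import torch.nn as nn

# a single shared learnable alpha, initialised to PyTorch's default of 0.25
prelu = nn.PReLU(num_parameters=1, init=0.25)

x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
y = prelu(x)            # negative inputs are scaled by alpha: [-0.5, -0.125, 0.0, 1.5]
y.sum().backward()      # alpha gets a gradient like any other parameter
print(prelu.weight, prelu.weight.grad)
```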
Exponential Linear Unit (ELU)