Mod 2.3 - Activation Function

• Why Do We Need Activation Functions?

• An activation function Φ(v) in the output layer can control the nature of the output (e.g., a probability value in [0, 1]).

• In multilayer neural networks, activation functions bring non-linearity into the hidden layers, which increases the complexity of the mappings the model can learn.

A neural network with any number of layers but only linear activations can be shown to be equivalent to a single-layer network.

Binary Step Function

The binary step function depends on a threshold value that decides whether a neuron should be activated or not.

The input fed to the activation function is compared to a certain threshold; if the input is greater than the threshold, the neuron is activated, otherwise it is deactivated, meaning that its output is not passed on to the next hidden layer.

Mathematically it can be represented as: f(x) = 1 if x ≥ 0, and f(x) = 0 if x < 0 (with the threshold taken as 0)


Here are some of the limitations of the binary step function:
• It cannot provide multi-valued outputs; for example, it cannot be used for multi-class classification problems.
• The gradient of the step function is zero (and undefined at the threshold), which hinders the backpropagation process.
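
As a minimal illustrative sketch (not part of the original notes; NumPy and a threshold of 0 are assumptions), the binary step function and its gradient problem look like this in Python:

import numpy as np

def binary_step(x, threshold=0.0):
    # Outputs 1 where the input is at or above the threshold, 0 otherwise
    return np.where(x >= threshold, 1.0, 0.0)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(binary_step(x))  # [0. 0. 1. 1.]
# The derivative is 0 everywhere (undefined exactly at the threshold),
# so gradient-based backpropagation gets no useful signal from it.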

Linear Activation Function


The linear activation function, also known as "no activation" or the "identity function," is one where the activation is proportional to the input.

The function does nothing to the weighted sum of the input; it simply passes on the value it was given.

Mathematically it can be represented as: f(x)=x

However, a linear activation function has two major problems:
• Backpropagation cannot be used effectively, as the derivative of the function is a constant and has no relation to the input x.
• All layers of the neural network will collapse into one if a linear activation function is used. No matter the number of layers in the neural network, the last layer will still be a linear function of the first layer, so, essentially, a linear activation function turns the neural network into just one layer, as the sketch below illustrates.
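
A small NumPy sketch (illustrative shapes and values, not from the original notes) makes this collapse concrete: two stacked linear layers are exactly equivalent to one linear layer.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(3,))                      # input vector
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=(4,))
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=(2,))

# Two linear layers stacked with no activation in between
h = W1 @ x + b1
y_two_layers = W2 @ h + b2

# A single equivalent linear layer
W = W2 @ W1
b = W2 @ b1 + b2
y_one_layer = W @ x + b

print(np.allclose(y_two_layers, y_one_layer))  # True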

Non-Linear Activation Functions


The linear activation function is simply a linear regression model. 

Because of its limited power, this does not allow the model to create
complex mappings between the network’s inputs and outputs. 

Non-linear activation functions solve the following limitations of linear activation functions:
• They allow backpropagation because the derivative now depends on the input, so it is possible to go back and understand which weights in the input neurons can provide a better prediction.
• They allow the stacking of multiple layers of neurons, as the output is now a non-linear combination of the input passed through multiple layers, so any output can be represented as a functional computation in the neural network.

Non-Linear Neural Network Activation Functions


Sigmoid / Logistic Activation Function 

This function takes any real value as input and outputs values in the range of 0 to 1.

The larger the input (more positive), the closer the output value will be to 1.0, whereas the smaller the input (more negative), the closer the output will be to 0.0.

Mathematically it can be represented as: f(x) = 1 / (1 + e^(-x))

The sigmoid/logistic activation function is one of the most widely used functions, for the following reasons:
• It is commonly used for models where we have to predict a probability as the output. Since a probability only exists in the range of 0 to 1, sigmoid is the right choice because of its range.
• The function is differentiable and provides a smooth gradient, i.e., it prevents jumps in output values. This is reflected in the S-shape of the sigmoid activation function.
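
As a minimal sketch (assuming NumPy; not part of the original notes), the sigmoid and its input-dependent gradient can be written as:

import numpy as np

def sigmoid(x):
    # Maps any real input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Smooth gradient that depends on the input: sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-5.0, 0.0, 5.0])
print(sigmoid(x))       # ~[0.007, 0.5, 0.993]
print(sigmoid_grad(x))  # ~[0.007, 0.25, 0.007] (gradient shrinks at the extremes)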

Tanh Function (Hyperbolic Tangent)


The tanh function is very similar to the sigmoid/logistic activation function and even has the same S-shape, the difference being its output range of -1 to 1. In tanh, the larger the input (more positive), the closer the output value will be to 1.0, whereas the smaller the input (more negative), the closer the output will be to -1.0.

Mathematically it can be represented as: f(x) = (e^x - e^(-x)) / (e^x + e^(-x))

Advantages of using this activation function are:
• The output of the tanh activation function is zero-centered; hence we can easily map the output values as strongly negative, neutral, or strongly positive.
• It is usually used in the hidden layers of a neural network, as its values lie between -1 and 1; therefore, the mean of the hidden-layer activations comes out to be 0 or very close to it. This helps in centering the data and makes learning for the next layer much easier.
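
A minimal sketch (assuming NumPy) of tanh, showing the zero-centered outputs described above:

import numpy as np

def tanh(x):
    # Same as np.tanh(x); outputs lie in the range (-1, 1)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

x = np.linspace(-3, 3, 7)
y = tanh(x)
print(y)         # symmetric values around 0
print(y.mean())  # essentially 0 for inputs centered on 0 (zero-centered output)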

ReLU Function
ReLU stands for Rectified Linear Unit. 

Although it gives an impression of a linear function, ReLU has a derivative and allows for backpropagation while simultaneously being computationally efficient.

The main catch here is that the ReLU function does not activate all the neurons at the same time.

A neuron will only be deactivated if the output of its linear transformation is less than 0.

Mathematically it can be represented as: f(x) = max(0, x)


The advantages of using ReLU as an activation function are as follows:
• Since only a certain number of neurons are activated, the ReLU function is far more computationally efficient when compared to the sigmoid and tanh functions.
• ReLU accelerates the convergence of gradient descent towards the minimum of the loss function due to its linear, non-saturating property.

The limitations faced by this function are:
• All negative input values become zero immediately, which decreases the model's ability to fit or train on the data properly.
• The dying ReLU problem (mitigated by an improved variant named Leaky ReLU, sketched below).
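
As an illustrative sketch (NumPy assumed; the 0.01 negative-side slope is a common but assumed choice), ReLU and the Leaky ReLU variant mentioned above:

import numpy as np

def relu(x):
    # f(x) = max(0, x): negative inputs are zeroed out
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Keeps a small slope for negative inputs so their gradient never dies
    return np.where(x > 0, x, alpha * x)

x = np.array([-3.0, -0.5, 0.0, 2.0])
print(relu(x))        # [0. 0. 0. 2.]
print(leaky_relu(x))  # [-0.03 -0.005 0. 2.]
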
Neural networks are a set of algorithms that are designed to
recognize trends/relationships in a given set of training data. These
algorithms are based on the way human neurons process
information.

This equation, of the form ŷ = Φ(wᵀx + b), represents how a neural network processes the input data at each layer and eventually produces a predicted output value.

To train (the process by which the model learns the relationship between the training data and the outputs), the neural network updates its parameters, the weights wᵀ and the biases b, to satisfy the equation above.

Each training input is loaded into the neural network in a process called forward propagation. Once the model has produced an output, this predicted output is compared against the given target output; the error is then propagated backwards in a process called backpropagation, and the parameters of the model are adjusted so that it now outputs a result closer to the target output.
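
A hedged sketch of these two steps (the single-layer size, sigmoid activation, and values are illustrative assumptions, not taken from the notes):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative parameters for one layer with 3 inputs and 1 output
W = np.array([[0.2, -0.4, 0.1]])   # weights (w transposed)
b = np.array([0.05])               # bias
x = np.array([1.0, 2.0, 3.0])      # one training input
target = np.array([1.0])           # its target output

# Forward propagation: the predicted output for this input
y_hat = sigmoid(W @ x + b)

# The prediction is compared against the target; backpropagation would then
# adjust W and b so the next prediction moves closer to the target.
print(y_hat, target - y_hat)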

This is where loss functions come in. Loss functions are one of the most important aspects of neural networks, as they (along with the optimization algorithm) are directly responsible for fitting the model to the given training data.
Loss Functions Overview

A loss function is a function that compares the target and predicted output values; it measures how well the neural network models the training data. When training, we aim to minimize this loss between the predicted and target outputs.

The parameters are adjusted to minimize the average loss: we find the weights wᵀ and biases b that minimize the value of J (the average loss).
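
As an illustrative example (assuming mean squared error as the loss; the notes do not fix a particular loss function), the average loss J over a batch of predictions could be computed as:

import numpy as np

def mse_loss(y_pred, y_true):
    # Average loss J over all training examples in the batch
    return np.mean((y_pred - y_true) ** 2)

y_true = np.array([1.0, 0.0, 1.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7, 0.95])
print(mse_loss(y_pred, y_true))  # 0.035625; smaller when predictions are closer to the targets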
