Activation Functions

This presentation covers activation functions used in artificial neural networks. It discusses linear and non-linear activation functions, including sigmoid, tanh, softmax and ReLU. Non-linear activation functions are important for neural networks to learn complex patterns from data and model non-linear relationships. The presentation compares various non-linear activation functions and discusses their properties such as differentiability, saturation points, and advantages/disadvantages for training deep neural networks.

Presentation on

Activation Functions

By
Mumtaz Khan
&
Aminullah
Ph.D. Scholars

Presented to
Dr Mohammad Naeem
Assistant Professor
Department of Computer Science
University of Peshawar
This presentation covers

• Activation Function
• Activation Function Types
• Linear
• Non-linear
• Differentiability and Continuity of the Activation Function
• Derivatives
• Saturation Points
Activation function
• In computational networks, the Activation Function of a node
defines the output of that node given an input or set of inputs.
• Also known as a Transfer Function.
• It is applied between the layers of a neural network, mapping a node's net input to its output.
• It is used to determine the output of the neural network, e.g., yes or no. It maps
the resulting values into a range such as 0 to 1 or -1 to 1 (depending on the
function).
• Activation functions are used to introduce non-linear properties into the network,
which is one of the important factors affecting the results and accuracy of the model.

• Activation functions are important for an Artificial Neural
Network to learn and understand complex patterns.
• The main purpose is to convert the input signal of a node in an ANN into an output
signal.
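As a minimal illustrative sketch (assuming NumPy; the weights, bias, and inputs below are made-up values, not from the slides), this shows a single node forming its net input and passing it through an activation function:

```python
import numpy as np

def sigmoid(z):
    # Maps the net input into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # inputs to the node (illustrative values)
w = np.array([0.4, 0.6, -0.1])   # weights (illustrative values)
b = 0.2                          # bias (illustrative value)

z = np.dot(w, x) + b             # net input: weighted sum of inputs plus bias
output = sigmoid(z)              # the activation function defines the node's output
print(output)
```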
The question arises: why can't we do it
without activating the input signal?
• If we do not apply an activation function, then the output signal would simply be a
linear function.

• A linear function is just a polynomial of degree one. A linear equation is easy to
solve, but it is limited in its complexity and has less power to learn complex
functional mappings from data.

• A Neural Network without an activation function would simply be a Linear Regression
model, which has limited power and does not perform well most of the time.

• Also, without an activation function our Neural Network would not be able to learn and
model other complicated kinds of data such as images, videos, audio, etc.

• It is used to determine the output of the neural network, e.g., yes or no. It maps the resulting
values into a range such as 0 to 1 or -1 to 1 (depending on the function).
Types of Activation function

• Linear Activation Function

• Non-linear Activation Functions

Linear or Identity Activation function

• The function is a line (linear). Therefore, the output of the function
is not confined to any range.

• Equation: f(x) = x
• Range: (-infinity to infinity)
• Note: It does not help with the complexity or various parameters of the usual data that
is fed to neural networks.
Step function
• The Step Function is one of the simplest kinds of activation functions. We consider a
threshold value, and if the net input (say y) is greater than or equal to the threshold, then
the neuron is activated.

• F(x) = 1 if x >= 0, and F(x) = 0 if x < 0
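A minimal sketch of the step function (assuming NumPy); the threshold parameter is an illustrative addition, with a default of 0 matching the definition above:

```python
import numpy as np

def step(x, threshold=0.0):
    # Returns 1 where the net input reaches the threshold, 0 otherwise.
    return np.where(x >= threshold, 1, 0)

print(step(np.array([-2.0, 0.0, 3.5])))  # -> [0 1 1]
```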
Why do we need non-Linearities
• Non-linear functions have degree greater than one, so their plots
have curvature. We need a Neural Network model to learn and
represent almost anything: any arbitrary complex function
which maps inputs to outputs.

• Hence, using a non-linear activation we are able to generate
non-linear mappings from inputs to outputs.
Non-Linear Activation function
• The Non-linear Activation Functions are the most used activation functions.
• Non-linearity gives the function's graph a curved shape rather than a straight line.

• It makes it easy for the model to generalize or adapt to a variety of data and to
differentiate between the outputs.
Non-Linear Activation function

• The main terms needed to understand non-linear functions are:

• Derivative or Differential: the change along the y-axis with respect to the change along the
x-axis. It is also known as the slope.

• Monotonic function: a function which is either entirely non-increasing
or entirely non-decreasing.

• The Non-linear Activation Functions are mainly divided on the
basis of their range or curves.
Sigmoid / Logistic function

• The Sigmoid Function curve looks like an S-shape.

• The main reason we use the sigmoid function is that its output
lies between 0 and 1.
• Therefore, it is especially used for models where we have to
predict a probability as an output. Since the probability of
anything exists only in the range of 0 to 1, sigmoid is
the right choice.
• The function is differentiable.
• That means we can find the slope of the sigmoid curve at
any point.
Sigmoid function

• The sigmoid is defined as: sigmoid(x) = 1 / (1 + e^(-x))
• The beauty of the exponent is that the value never reaches 0
nor exceeds 1 in this equation.
• Large negative numbers are scaled towards 0 and large
positive numbers are scaled towards 1.

• It is mostly used in the output
layer of the network.

• Used for binary classification.

Properties of Sigmoid function

• The sigmoid function returns a real-valued output.

• The first derivative of the sigmoid function will be non-negative
or non-positive (i.e., the function is monotonic).

• Non-Negative: a number greater than or equal to zero.

• Non-Positive: a number less than or equal to zero.
• The derivative of the sigmoid function is: sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))
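A small sketch (assuming NumPy) of the sigmoid and its derivative; note how the derivative shrinks towards 0 for large positive or negative inputs, which is the saturation issue discussed next:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))

x = np.array([-5.0, 0.0, 5.0])
print(sigmoid(x))             # large negatives near 0, large positives near 1
print(sigmoid_derivative(x))  # close to 0 at both tails: the saturation problem
```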
Disadvantages of Sigmoid function

• Major reasons which have made it fall out of popularity:

• Vanishing gradient problem.
• For very high or very low values of x, there is almost no change to the
prediction, causing a vanishing gradient problem.
• This can result in the network refusing to learn further, or being too slow to
reach an accurate prediction.

• Secondly, its output is not zero-centered. This makes the gradient
updates go too far in different directions (0 < output < 1), and it
makes optimization harder.

• Sigmoids saturate and kill gradients.

• Sigmoid has slow convergence.
Tanh or hyperbolic tangent Function
• tanh is also like the logistic sigmoid, but better.
• The range of the tanh function is (-1 to 1).
• tanh is also sigmoidal (S-shaped).

• The advantage is that negative inputs will be mapped
strongly negative and zero inputs will be mapped near zero
in the tanh graph.
Tanh or hyperbolic tangent Function
• The hyperbolic tangent (tanh) function (used for hidden layer neuron
output) is an alternative to the Sigmoid function.
• It is defined by the formula: tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

• The tanh function is similar to the Sigmoid function.

• It squashes a real-valued number to the range between -1 and 1, i.e., tanh(x) ∈ (-1, 1).
Tanh or hyperbolic tangent Function
• Therefore, in practice the tanh is always preferred to the
sigmoid.
• The derivative of the tanh function is defined as:

tanh'(x) = 1 - tanh^2(x)

• As the red dotted line in the previous graph shows, tanh
also saturates and kills gradients, since tanh's derivative has a
similar shape to the Sigmoid's derivative.

• tanh has stronger gradients: since the data is centered around 0,
the derivatives are higher, and tanh avoids bias in the gradients.
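A small sketch (assuming NumPy) of tanh and its derivative 1 - tanh^2(x):

```python
import numpy as np

def tanh_derivative(x):
    # tanh'(x) = 1 - tanh(x)^2
    return 1.0 - np.tanh(x) ** 2

x = np.array([-3.0, 0.0, 3.0])
print(np.tanh(x))          # squashed into (-1, 1) and zero-centered
print(tanh_derivative(x))  # still near 0 at the tails, so tanh also saturates
```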
Hard hyperbolic tangent Function
• It is also known as the Hardtanh function.
• It is another variant of the tanh function.
• It is a cheaper, more computationally efficient version of tanh.
• The range of the Hardtanh function is (-1 to 1).
• It is defined as: Hardtanh(x) = -1 if x < -1; x if -1 <= x <= 1; 1 if x > 1
• The derivative of Hardtanh is: 1 if -1 < x < 1, and 0 otherwise
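A minimal sketch of Hardtanh (assuming NumPy), implemented here by simply clipping to [-1, 1]:

```python
import numpy as np

def hardtanh(x):
    # Piecewise-linear version of tanh: -1 below -1, identity in [-1, 1], 1 above 1.
    return np.clip(x, -1.0, 1.0)

def hardtanh_derivative(x):
    # Gradient is 1 inside (-1, 1) and 0 outside.
    return np.where((x > -1.0) & (x < 1.0), 1.0, 0.0)

print(hardtanh(np.array([-2.0, 0.3, 1.5])))             # -> [-1.   0.3  1. ]
print(hardtanh_derivative(np.array([-2.0, 0.3, 1.5])))  # -> [0. 1. 0.]
```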
Softmax Activation Function
• Used to compute a probability distribution from a vector of real
numbers.
• It calculates the probability of each target class over all possible target
classes.
• The calculated probabilities help in determining the target class for the
given inputs.
• The output range of the Softmax function is (0 to 1).
• The Softmax function is used for multi-class classification models.
• The formula of the softmax function is defined as: softmax(x_i) = e^(x_i) / sum_j e^(x_j)

• It computes the exponential (e-power) of the given input value and
the sum of the exponential values of all the values in the input.
• Then the ratio of the exponential of the input value to the sum of the
exponential values is the output of the softmax function.
Properties and Usage of Softmax Function
• Properties of Softmax:
• The calculated probabilities will be in the range of 0 to 1.
• The sum of all the probabilities equals 1.
• Usage of Softmax:
• Used in multi-class (multinomial) logistic regression models.
• Used in different layers of neural networks as well.

Example of Softmax
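Since the original example slide is an image, here is a small worked sketch instead (assuming NumPy; the scores are made-up values), showing that the outputs lie in (0, 1) and sum to 1:

```python
import numpy as np

def softmax(x):
    # Subtracting the max keeps the exponentials numerically stable.
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

scores = np.array([2.0, 1.0, 0.1])  # made-up scores for three classes
probs = softmax(scores)
print(probs)        # roughly [0.659, 0.242, 0.099], each in (0, 1)
print(probs.sum())  # 1.0, so the outputs form a probability distribution
```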
ReLU(Rectified Linear Unit) Function
• Computationally efficient:
• it allows the network to converge very quickly.
• Non-linear: although it looks like a linear function, ReLU has a
derivative function and allows for backpropagation.
• It is similar to the identity function for x > 0.
• ReLU (used for hidden layer neuron output) is defined as:
• R(x) = max(0, x), i.e., if x < 0, R(x) = 0 and if x >= 0, R(x) = x.

• Note: It avoids and rectifies the vanishing gradient problem.
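A minimal sketch of ReLU and its derivative (assuming NumPy):

```python
import numpy as np

def relu(x):
    # R(x) = max(0, x)
    return np.maximum(0.0, x)

def relu_derivative(x):
    # Gradient is 1 for positive inputs and 0 otherwise.
    return np.where(x > 0, 1.0, 0.0)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))             # -> [0. 0. 3.]
print(relu_derivative(x))  # -> [0. 0. 1.]  negative inputs receive no gradient
```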


Problems in ReLU(Rectified Linear Unit) Function
• But its limitation is that it should only be used within the Hidden
Layers of a Neural Network model.

• Another problem with ReLU is that some gradients can be fragile during training and
can die.
• This can cause a weight update which makes the neuron never activate on any data
point again.
• Simply put, ReLU can result in Dead Neurons.
OR
• The Dying ReLU problem:
• when inputs approach zero, or are negative, the gradient of the function
becomes zero, so the network cannot perform backpropagation and cannot learn.
Leaky ReLU Function
• To fix the problem of dying neurons, another modification called Leaky ReLU was
introduced.
• It introduces a small slope to keep the updates alive.

• Leaky ReLU is an attempt to solve the dying problem of ReLU.

• This variation of ReLU has a small positive slope in the negative region.
• So it does enable backpropagation, even for negative input values.
• Otherwise it is like ReLU.
• The leak helps to increase the range of the ReLU function.
• Usually, the value of a is 0.01 or so.
DISADVANTAGES:
• Results are not consistent: Leaky ReLU does not provide consistent predictions for
negative input values.
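A minimal sketch of Leaky ReLU (assuming NumPy), with the small slope a = 0.01 mentioned above:

```python
import numpy as np

def leaky_relu(x, a=0.01):
    # A small positive slope `a` on the negative side keeps gradients alive.
    return np.where(x > 0, x, a * x)

print(leaky_relu(np.array([-10.0, 0.0, 5.0])))  # -> [-0.1  0.   5. ]
```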
Randomized ReLU Function
• When a is not fixed at 0.01, the function is called Randomized ReLU.
• The range of the Leaky ReLU is (-infinity to infinity).
• Both Leaky and Randomized ReLU functions are monotonic in
nature. Their derivatives are also monotonic in nature.
Advantages:
• Allows the negative slope to be learned: unlike Leaky ReLU,
this function provides the slope of the negative part of the function as an
argument.
• It is therefore possible to perform backpropagation and learn the most
appropriate value of α.
• Otherwise it is like ReLU.
• Disadvantages:
• May perform differently for different problems.

Formula of Randomized ReLU: f(x) = x if x > 0; a·x if x <= 0, where a is not fixed at 0.01


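A rough sketch of a randomized leaky ReLU (assuming NumPy); the sampling bounds below are illustrative, not taken from the slides:

```python
import numpy as np

def randomized_leaky_relu(x, low=0.1, high=0.3, rng=None):
    # The negative slope `a` is sampled from a uniform range instead of
    # being fixed at 0.01. The bounds are illustrative values.
    if rng is None:
        rng = np.random.default_rng(0)
    a = rng.uniform(low, high)
    return np.where(x > 0, x, a * x), a

out, a = randomized_leaky_relu(np.array([-4.0, 2.0]))
print(a, out)  # the sampled slope and the resulting outputs
```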
SoftPlus Function
• The SoftPlus function is a smooth version of the ReLU function, which has
smoothing and non-zero gradient properties.
• It was proposed by Dugas et al., 2001.
• The output range is (0, ∞).
• The SoftPlus function is defined as: softplus(x) = ln(1 + e^x)
• The derivative of the SoftPlus function is the logistic (sigmoid) function: 1 / (1 + e^(-x))

• It provides more stabilization and performance to deep neural networks than the ReLU
and Sigmoid functions.
• It has been used in speech recognition.
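A minimal sketch of SoftPlus and its derivative, the logistic sigmoid (assuming NumPy):

```python
import numpy as np

def softplus(x):
    # softplus(x) = ln(1 + e^x)
    return np.log1p(np.exp(x))

def softplus_derivative(x):
    # The derivative of SoftPlus is the logistic sigmoid.
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-5.0, 0.0, 5.0])
print(softplus(x))            # smooth, strictly positive alternative to ReLU
print(softplus_derivative(x))
```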
Activation Function and its Derivatives
TYPES AND POSITIONS OF AFS USED IN DL ARCHITECTURES
DL ACTIVATION FUNCTIONS AND THEIR CORRESPONDING EQUATIONS
FOR COMPUTATION
References
• https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6
• https://www.youtube.com/watch?v=9vB5nzrL4hY
• https://www.youtube.com/watch?v=3r65ZuFyi5Y
• http://dataaspirant.com/2017/03/07/difference-between-softmax-function-and-sigmoid-function/
• https://towardsdatascience.com/activation-functions-and-its-types-which-is-better-a9a5310cc8f
• https://missinglink.ai/guides/neural-network-concepts/7-types-neural-network-activation-functions-right/
• https://theclevermachine.wordpress.com/2014/09/08/derivation-derivatives-for-common-neural-network-activation-functions/
• https://en.wikipedia.org/wiki/Activation_function
• Nwankpa, Chigozie, Winifred Ijomah, Anthony Gachagan, and Stephen Marshall. "Activation
Functions: Comparison of trends in Practice and Research for Deep Learning." arXiv preprint
arXiv:1811.03378 (2018).
