Activation Functions
By
Mumtaz Khan
&
Aminullah
Ph.D. Scholars
Presented to
Dr Mohammad Naeem
Assistant Professor
Department of Computer Science
University of Peshawar
This Presentation Covers
• Activation Function
• Activation Function Types
• Linear
• Non-linear
• Differentiability and Continuity of the Activation Function
• Derivatives
• Saturation Points
Activation Function
• In computational networks, the activation function of a node
defines the output of that node given an input or set of inputs.
• Also known as a transfer function.
• It is applied between the layers of a neural network.
• It is used to determine the output of a neural network, such as yes or no. It maps
the resulting values into a range such as 0 to 1 or -1 to 1 (depending upon the
function).
• Activation functions are used to introduce non-linearity into the network,
• which is one of the important factors that affect the
results and accuracy of your model.
• A linear function is just a polynomial of degree one. A linear equation is easy to
solve, but it is limited in complexity and has less power to learn complex
functional mappings from data.
• Without an activation function, our neural network would not be able to learn and
model complicated kinds of data such as images, videos, audio, etc.
Types of Activation Functions
Linear or Identity Activation function
• The function is a straight line, so the output of the function is not confined to
any range.
• Equation: f(x) = x
• Range: (-infinity, +infinity)
• Note: It doesn't help with the complexity or the various parameters of the usual data that
is fed to neural networks.
Step function
• The step function is one of the simplest kinds of activation functions. We consider a
threshold value, and if the net input y reaches the threshold, then
the neuron is activated.
• f(x) = 1 if x >= 0, and f(x) = 0 if x < 0 (with a threshold of 0)
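As a rough illustration (not from the slides), here is a minimal Python sketch of the identity activation from the previous slide and the threshold step defined above; the threshold of 0 matches f(x) = 1 if x >= 0.

```python
def identity(x):
    # Linear / identity activation: output equals the input, unbounded range.
    return x

def step(x, threshold=0.0):
    # Binary step activation: fires (1) once the net input reaches the threshold.
    return 1 if x >= threshold else 0

# Quick check on a few sample net-input values.
for v in [-2.0, -0.5, 0.0, 1.5]:
    print(v, identity(v), step(v))
```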
Why do we need Non-Linearities?
• Non-linear functions are those with degree greater than one, and they
have a curvature when plotted. We need a neural network model
that can learn and represent almost any arbitrary complex function
mapping inputs to outputs.
Non-Linear Activation function
• The Non-linear Activation Functions are the most used activation functions.
• Non-linearity gives the activation a curved graph rather than a straight line.
• It makes it easy for the model to generalize or adapt to a variety of data and to
differentiate between the outputs.
Sigmoid / Logistic function
• Equation: σ(x) = 1 / (1 + e^(-x))
• Range: (0, 1)
Tanh function and its Derivative
• tanh'(x) = 1 - tanh²(x)
• The red dotted line in the previous graph indicates that tanh also
saturates and kills the gradient, since tanh's derivative has a
similar shape to the sigmoid's derivative.
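To make the saturation point concrete, here is a small sketch (assuming plain Python with the math module) that evaluates tanh and its derivative 1 - tanh²(x); for large |x| the derivative is close to zero, which is the "killed gradient" described above.

```python
import math

def tanh_derivative(x):
    # Derivative of tanh: 1 - tanh(x)^2, which shrinks toward 0 as |x| grows.
    return 1.0 - math.tanh(x) ** 2

for v in [0.0, 1.0, 3.0, 6.0]:
    print(f"x={v:4.1f}  tanh={math.tanh(v):+.4f}  tanh'={tanh_derivative(v):.6f}")
# At x=6 the derivative is about 2e-5: the neuron is saturated and barely updates.
```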
Softmax Activation Function
• Used to compute a probability distribution from a vector of real
numbers.
• Calculates the probability of each target class over all possible target
classes.
• The calculated probabilities help in determining the target class for the
given inputs.
• The output range of the softmax function is (0, 1), and the outputs sum to 1.
• The softmax function is used for multi-class classification models.
• The formula of the softmax function is defined as:
softmax(x_i) = e^(x_i) / Σ_j e^(x_j)
Example of Softmax
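Since the example slide itself is an image, here is a hedged stand-in: a minimal Python sketch that computes the softmax of a hypothetical score vector [2.0, 1.0, 0.1] (the vector is illustrative, not from the slides), using max-subtraction for numerical stability.

```python
import math

def softmax(scores):
    # Subtract the max score first so exp() cannot overflow for large inputs.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)        # ~[0.659, 0.242, 0.099]
print(sum(probs))   # 1.0 -- the outputs form a probability distribution
```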
ReLU (Rectified Linear Unit) Function
• Computationally efficient: allows the network to converge very quickly.
• Non-linear: although it looks like a linear function, ReLU has a
derivative and allows for backpropagation.
• Similar to the identity function for x > 0.
• ReLU (used for hidden-layer neuron outputs) is defined as:
• R(x) = max(0, x), i.e. if x < 0, R(x) = 0 and if x >= 0, R(x) = x.
• Another problem with ReLU is that some gradients can be fragile during training and
can die.
• This can cause a weight update that makes the neuron never activate on any data
point again.
• Simply put, ReLU can result in dead neurons.
OR
• The dying ReLU problem:
• when inputs approach zero or are negative, the gradient of the function
becomes zero, so the network cannot perform backpropagation and cannot learn.
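A minimal sketch (assuming plain Python, no framework) of R(x) = max(0, x) and its gradient, showing why a neuron whose pre-activations stay negative receives a zero gradient and stops updating.

```python
def relu(x):
    # R(x) = max(0, x): passes positive inputs through, clamps negatives to 0.
    return max(0.0, x)

def relu_grad(x):
    # Gradient is 1 for x > 0 and 0 for x <= 0; a neuron whose inputs stay
    # negative therefore gets no gradient signal (the "dying ReLU" problem).
    return 1.0 if x > 0 else 0.0

for v in [-3.0, -0.1, 0.5, 4.0]:
    print(v, relu(v), relu_grad(v))
```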
Leaky ReLU Function
• To fix this problem, another modification called Leaky ReLU was introduced to address the
problem of dying neurons.
• It introduces a small slope on the negative side to keep the updates alive.
• It provides more stable training and better performance in deep neural networks than the ReLU
and sigmoid functions.
• It has been used in speech recognition.
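A companion sketch under the same assumptions; the slope 0.01 for negative inputs is a common default, not a value taken from the slides. The small negative-side slope keeps the gradient non-zero, so the updates stay alive.

```python
def leaky_relu(x, alpha=0.01):
    # Like ReLU for x > 0, but lets a small fraction of negative inputs through.
    return x if x > 0 else alpha * x

def leaky_relu_grad(x, alpha=0.01):
    # Gradient is 1 for x > 0 and alpha (not 0) otherwise, avoiding dead neurons.
    return 1.0 if x > 0 else alpha

for v in [-3.0, -0.1, 0.5, 4.0]:
    print(v, leaky_relu(v), leaky_relu_grad(v))
```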
Activation Function and its Derivatives
TYPES AND POSITIONS OF AFS USED IN DL ARCHITECTURES
DL ACTIVATION FUNCTIONS AND THEIR CORRESPONDING EQUATIONS FOR COMPUTATION
References
• https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6
• https://www.youtube.com/watch?v=9vB5nzrL4hY
• https://www.youtube.com/watch?v=3r65ZuFyi5Y
• http://dataaspirant.com/2017/03/07/difference-between-softmax-function-and-sigmoid-function/
• https://towardsdatascience.com/activation-functions-and-its-types-which-is-better-a9a5310cc8f
• https://missinglink.ai/guides/neural-network-concepts/7-types-neural-network-activation-functions-right/
• https://theclevermachine.wordpress.com/2014/09/08/derivation-derivatives-for-common-neural-network-activation-functions/
• https://en.wikipedia.org/wiki/Activation_function
• Nwankpa, Chigozie, Winifred Ijomah, Anthony Gachagan, and Stephen Marshall. "Activation Functions: Comparison of Trends in Practice and Research for Deep Learning." arXiv preprint arXiv:1811.03378 (2018).