Need and Use of Activation Functions in ANN / Deep Learning
Artificial Neural Networks are a powerful tool that mimics the human brain.
Just as our brain has neurons that send and receive signals through axons and synapses,
machines are being developed to act like a human being. Although they cannot fully replicate the brain, they come pretty close.
In an ANN, we compute the sum of products of the inputs (X) and their corresponding weights (W), apply an
activation function f(x) to it to get the output of that layer, and feed that output as an input to the next layer.
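As a minimal sketch of this computation (the layer size, random weights, and the choice of sigmoid here are purely illustrative assumptions), a single layer's output can be computed like this in Python:

    import numpy as np

    def sigmoid(z):
        # squashes any real-valued input into the range (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    def layer_output(X, W, b, activation=sigmoid):
        # sum of products of inputs and weights, plus a bias,
        # passed through the activation function f
        z = np.dot(X, W) + b
        return activation(z)

    X = np.array([0.5, -1.0, 2.0])   # 3 inputs
    W = np.random.randn(3, 2)        # weights for 2 neurons
    b = np.zeros(2)
    print(layer_output(X, W, b))     # output fed to the next layer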
If we do not apply an activation function, then the output signal will simply be a linear function of the input. Although a linear
function is easy to solve, it has limited complexity and limited power to learn complex mappings from data.
We do not want our Neural Network just for linear functions; we want it for more complex functionalities. Also, without an
activation function, our Neural Network would not be able to learn and model complicated data such as
images, videos, audio, speech, etc. That is why we use Artificial Neural Network techniques (Deep Learning) to
make sense of complicated, high-dimensional, non-linear big datasets, where the model has many hidden layers in between
and a very complicated architecture which helps us make sense of and extract knowledge from such complicated big datasets.
You will be glad to know that Neural Networks are considered Universal Function Approximators, i.e. a neural
network can approximate virtually any function.
Hence, we need the Activation function to make the Neural networks more powerful and to learn something
complex.
Step Function
Now, if we talk of a range, the first thought that comes to mind is setting a threshold value: if the value is above
the threshold, the neuron is activated, and if it is below the threshold, it is not activated. Sounds easy? Yes! It
is. This function is called the Step Function.
See the figure below to understand more:
The output is 1, i.e. activated, when the value > 0, and 0, i.e. not activated, when the value < 0.
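A minimal sketch of the step function with a threshold of 0, as described above:

    import numpy as np

    def step(x):
        # fires (1) when the value is above the threshold of 0, stays off (0) otherwise
        return np.where(x > 0, 1, 0)

    print(step(np.array([-2.0, 0.5, 3.0])))  # -> [0 1 1]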
Things that seem easy often carry many drawbacks with them, and so is the case here. This function is applicable
only where we have just two classes. Hence, for multi-class problems, we cannot use it. Moreover, the gradient of the
step function is zero, i.e. during backpropagation, when we calculate the error and send it back to improve the model,
the zero gradient reduces the update to zero, which stops the improvement of the model. Hence, the best possible model
will not come out.
Linear Function
We saw the problem of the gradient being zero in the step function. This problem can be overcome by using a linear
function.
f(x) = ax
For example, if we take a = 4, then f(x) = 4x.
Here, the activation is proportional to the input. The problem of having only two classes is overcome here: for any
value of a, the output can take any value proportional to x, and many neurons can be activated at the same time.
Still, we have an issue here. If we take the derivative of this function, it comes out to be:
f'(x) = a
This is a constant: the gradient does not depend on x at all, so during backpropagation every update looks the same
regardless of the input. Moreover, a stack of linear layers is itself just another linear function, so adding layers
gains us nothing.
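A quick sketch of this collapse (the layer sizes and random values are arbitrary assumptions): two stacked layers with a linear activation are identical to a single layer whose weights are the product of the two weight matrices.

    import numpy as np

    rng = np.random.default_rng(0)
    x  = rng.standard_normal(3)
    W1 = rng.standard_normal((3, 4))
    W2 = rng.standard_normal((4, 2))

    # two "layers" with a linear activation f(x) = x ...
    two_layers = (x @ W1) @ W2
    # ... are the same mapping as one layer with merged weights W1 @ W2
    one_layer = x @ (W1 @ W2)

    print(np.allclose(two_layers, one_layer))  # -> True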
Sigmoid Function
It is one of the most widely used activation functions. It is a non-linear, ‘S’ shaped curve. It is of the form:
f(x) = 1/(1 + e^(-x))
For x values roughly between -2 and 2, notice that small changes in x produce large changes in y, i.e. the curve is
steep in this region. The range of the function is 0 to 1.
It is a smooth function and is continuously differentiable. The main advantage it possesses over the step function and
the linear function is that it is non-linear.
If we look at its derivative:
Looking at the graph above, we find that the derivative is smooth and depends on x, i.e. if we calculate the
error, we will get some non-zero value to use for improving the model. However, beyond about -3 to +3, the
function almost flattens out, i.e. the derivative approaches zero, which means the gradient becomes negligible and
the model effectively stops learning. This problem is termed the 'Vanishing Gradient'.
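A small numeric sketch of this behaviour (the sample points are arbitrary): the derivative of the sigmoid is f'(x) = f(x)(1 - f(x)), which peaks at 0.25 at x = 0 and shrinks rapidly as |x| grows.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_grad(x):
        # derivative of the sigmoid: f'(x) = f(x) * (1 - f(x))
        s = sigmoid(x)
        return s * (1.0 - s)

    for x in [0.0, 3.0, 6.0, 10.0]:
        print(x, sigmoid_grad(x))
    # the gradient falls from 0.25 at x = 0 towards ~0 as |x| grows,
    # which is the vanishing gradient described above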
One more problem with the sigmoid function is that its range is between 0 and 1, i.e. it is not zero-centred: it
is not symmetric about the origin and all the output values are positive.
It is usually used in the output layer of binary classification, where results correspond to either 0 or 1.
Tanh Function
The tanh function is quite similar to the sigmoid function; it is actually a scaled version of the sigmoid function.
tanh(x)=2sigmoid(2x)-1
or it can be written as:
tanh(x)=2/(1+e^(-2x)) -1
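A tiny sanity check of this identity (purely illustrative):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    x = np.linspace(-3, 3, 7)
    # tanh written as a scaled, shifted sigmoid matches NumPy's own tanh
    print(np.allclose(2 * sigmoid(2 * x) - 1, np.tanh(x)))  # -> True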
Its range is between -1 and 1. Hence, it solves the problem of all the values being of the same sign. Its zero-centred
output makes optimization easier, so it is generally preferred over the sigmoid function. But it also suffers from the
Vanishing Gradient Problem.
It is usually used in hidden layers of a neural network.
If we look at the derivative in the graph above, we will find that the tanh function is steeper than the sigmoid
function, so its gradient is larger. Which of the two we need therefore depends entirely on the gradient required.
ReLU Function
ReLU stands for Rectified Linear Unit. It is the most widely used activation function.
f(x)=max(0,x)
The graphical representation of ReLU is:
It may look like the same linear function on the positive axis, but no, ReLU is non-linear, i.e. errors can easily be
backpropagated through it and it can be used across multiple layers of activation. The range of ReLU is [0, inf).
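A minimal sketch of ReLU and its gradient (taking the gradient at exactly 0 to be 0 is a convention assumed here):

    import numpy as np

    def relu(x):
        # keeps positive values unchanged and maps negative values to 0
        return np.maximum(0, x)

    def relu_grad(x):
        # gradient is 1 for positive inputs, 0 for negative inputs
        return np.where(x > 0, 1.0, 0.0)

    x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
    print(relu(x))       # -> [0.  0.  0.  1.5 3. ]
    print(relu_grad(x))  # -> [0. 0. 0. 1. 1.]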
ReLU is less computationally expensive than the tanh and sigmoid functions, as it involves only simple mathematical
operations. It also mitigates the Vanishing Gradient Problem for positive inputs. Nowadays, almost all Deep Learning
models use ReLU.
It does not activate all the neurons at the same time, which is the biggest advantage of ReLU over other activation
functions. Not clear? Look at it this way: in the ReLU function, all negative values are mapped to zero, so those
neurons do not get activated. This means that at any particular time only some of the neurons are activated, making
the network sparse and more efficient.
However, the gradient for values below zero is zero, i.e. in this region the network is not learning and the error is
not backpropagated, which results in Dead Neurons that never get activated.
Leaky ReLU
Leaky ReLU is a modified version of ReLU. It was introduced to solve the problem of neurons dying during training with
the ReLU function: a small slope is given to negative inputs to keep the neuron updates alive.
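A small sketch of the idea (the slope of 0.01 is a common default, assumed here for illustration):

    import numpy as np

    def leaky_relu(x, alpha=0.01):
        # like ReLU, but negative inputs keep a small slope alpha,
        # so their gradient is never exactly zero
        return np.where(x > 0, x, alpha * x)

    x = np.array([-3.0, -1.0, 0.0, 2.0])
    print(leaky_relu(x))  # -> [-0.03 -0.01  0.    2.  ]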
A further variant, which generalizes both ReLU and Leaky ReLU, is the Maxout function.
Softmax
Softmax is a non-linear function. It is a generalization of the sigmoid function that is handy when handling
multi-class classification problems.
As we know, for the output layer of binary classification we use the sigmoid function, but for more than two classes
we should use the softmax function.
It assigns a probability to each class. The softmax function squeezes the output for each class between 0 and 1 and
divides each by the sum over all classes, so the results add up to 1. It can be written as:
softmax(x_i) = e^(x_i) / sum_j e^(x_j)
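A minimal sketch of this (subtracting the maximum before exponentiating is a standard numerical-stability trick, added here as an assumption; it does not change the result):

    import numpy as np

    def softmax(z):
        # subtract the max for numerical stability
        e = np.exp(z - np.max(z))
        # each exponentiated score is squeezed into (0, 1) and divided
        # by the sum, so the outputs form a probability distribution
        return e / e.sum()

    scores = np.array([2.0, 1.0, 0.1])
    probs = softmax(scores)
    print(probs, probs.sum())  # class probabilities that sum to 1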
Conclusion:
Which activation function should be used?
From the above, we can conclude that the ReLU activation function is the best choice among all of them for the hidden
layers; it also works well as a general-purpose activation function. If we run into dead neurons, Leaky ReLU should be
used instead.
We should avoid using the tanh function, as it suffers from the vanishing gradient problem, which degrades model
performance.
For a binary classification problem, the sigmoid function is a very natural choice for the output layer; if the
classification is not binary, the softmax function is suggested.