Activation Functions

This presentation covers activation functions used in artificial neural networks. It discusses linear and non-linear activation functions, including sigmoid, tanh, softmax and ReLU. Non-linear activation functions are important for neural networks to learn complex patterns from data and model non-linear relationships. The presentation compares various non-linear activation functions and discusses their properties such as differentiability, saturation points, and advantages/disadvantages for training deep neural networks.

Presentation on

Activation Functions

By
Mumtaz Khan
&
Aminullah
Ph.D. Scholars

Presented to
Dr Mohammad Naeem
Assistant Professor
Department of Computer Science
University of Peshawar
This presentation covers

• Activation Function
• Activation Function Types
• Linear
• Non-linear
• Differentiability and Continuity of the Activation Function
• Derivatives
• Saturation Points
Activation function
• In computational networks, the Activation Function of a node
defines the output of that node given an input or set of inputs.
• Also known as a Transfer Function.
• It is applied between the layers of a neural network, mapping a node's net input to its output.
• It is used to determine the output of the neural network, e.g., yes or no. It maps
the resulting values into a range such as 0 to 1 or -1 to 1 (depending on the
function).
• Activation functions are used to introduce non-linear properties into the network,
which is one of the important factors affecting the results and accuracy of the model.

• Activation functions are important for an Artificial Neural
Network to learn and understand complex patterns.
• The main purpose is to convert the input signal of a node in an ANN into an output
signal.
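As a minimal illustrative sketch (assuming NumPy; the weights, bias, and inputs below are made-up values, not from the slides), this shows a single node forming its net input and passing it through an activation function:

```python
import numpy as np

def sigmoid(z):
    # Maps the net input into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # inputs to the node (illustrative values)
w = np.array([0.4, 0.6, -0.1])   # weights (illustrative values)
b = 0.2                          # bias (illustrative value)

z = np.dot(w, x) + b             # net input: weighted sum of inputs plus bias
output = sigmoid(z)              # the activation function defines the node's output
print(output)
```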
The question arises: why can't we do it
without activating the input signal?
• If we do not apply an activation function, then the output signal would simply be a
linear function.

• A linear function is just a polynomial of degree one. A linear equation is easy to
solve, but it is limited in its complexity and has less power to learn complex
functional mappings from data.

• A Neural Network without an activation function would simply be a Linear Regression
model, which has limited power and does not perform well most of the time.

• Also, without an activation function our Neural Network would not be able to learn and
model other complicated kinds of data such as images, videos, audio, etc.

• It is used to determine the output of the neural network, e.g., yes or no. It maps the resulting
values into a range such as 0 to 1 or -1 to 1 (depending on the function).
Types of Activation function

• Linear Activation Function

• Non-linear Activation Functions

Linear or Identity Activation function

• The function is a line (linear). Therefore, the output of the function
is not confined to any range.

• Equation: f(x) = x
• Range: (-infinity to infinity)
• Note: It does not help with the complexity or various parameters of the usual data that
is fed to neural networks.
Step function
• The Step Function is one of the simplest kinds of activation functions. We consider a
threshold value, and if the net input (say y) is greater than or equal to the threshold, then
the neuron is activated.

• F(x) = 1 if x >= 0, and F(x) = 0 if x < 0
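A minimal sketch of the step function (assuming NumPy); the threshold parameter is an illustrative addition, with a default of 0 matching the definition above:

```python
import numpy as np

def step(x, threshold=0.0):
    # Returns 1 where the net input reaches the threshold, 0 otherwise.
    return np.where(x >= threshold, 1, 0)

print(step(np.array([-2.0, 0.0, 3.5])))  # -> [0 1 1]
```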
Why do we need non-Linearities
• Non-linear functions have degree greater than one, so their plots
have curvature. We need a Neural Network model to learn and
represent almost anything: any arbitrary complex function
which maps inputs to outputs.

• Hence, using a non-linear activation we are able to generate
non-linear mappings from inputs to outputs.
Non-Linear Activation function
• The Non-linear Activation Functions are the most used activation functions.
• Non-linearity gives the function's graph a curved shape rather than a straight line.

• It makes it easy for the model to generalize or adapt to a variety of data and to
differentiate between the outputs.
Non-Linear Activation function

• The main terms needed to understand non-linear functions are:

• Derivative or Differential: the change along the y-axis with respect to the change along the
x-axis. It is also known as the slope.

• Monotonic function: a function which is either entirely non-increasing
or entirely non-decreasing.

• The Non-linear Activation Functions are mainly divided on the
basis of their range or curves.
Sigmoid / Logistic function

• The Sigmoid Function curve looks like an S-shape.

• The main reason we use the sigmoid function is that its output
lies between 0 and 1.
• Therefore, it is especially used for models where we have to
predict a probability as an output. Since the probability of
anything exists only in the range of 0 to 1, sigmoid is
the right choice.
• The function is differentiable.
• That means we can find the slope of the sigmoid curve at
any point.
Sigmoid function

• The sigmoid is defined as: sigmoid(x) = 1 / (1 + e^(-x))
• The beauty of the exponent is that the value never reaches 0
nor exceeds 1 in this equation.
• Large negative numbers are scaled towards 0 and large
positive numbers are scaled towards 1.

• It is mostly used in the output
layer of the network.

• Used for binary classification.

Properties of Sigmoid function

• The sigmoid function returns a real-valued output.

• The first derivative of the sigmoid function will be non-negative
or non-positive (i.e., the function is monotonic).

• Non-Negative: a number greater than or equal to zero.

• Non-Positive: a number less than or equal to zero.
• The derivative of the sigmoid function is: sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))
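A small sketch (assuming NumPy) of the sigmoid and its derivative; note how the derivative shrinks towards 0 for large positive or negative inputs, which is the saturation issue discussed next:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))

x = np.array([-5.0, 0.0, 5.0])
print(sigmoid(x))             # large negatives near 0, large positives near 1
print(sigmoid_derivative(x))  # close to 0 at both tails: the saturation problem
```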
Disadvantages of Sigmoid function

• Major reasons which have made it fall out of popularity:

• Vanishing gradient problem.
• For very high or very low values of x, there is almost no change to the
prediction, causing a vanishing gradient problem.
• This can result in the network refusing to learn further, or being too slow to
reach an accurate prediction.

• Secondly, its output is not zero-centered. This makes the gradient
updates go too far in different directions (0 < output < 1), and it
makes optimization harder.

• Sigmoids saturate and kill gradients.

• Sigmoid has slow convergence.
Tanh or hyperbolic tangent Function
• tanh is also like the logistic sigmoid, but better.
• The range of the tanh function is (-1 to 1).
• tanh is also sigmoidal (S-shaped).

• The advantage is that negative inputs will be mapped
strongly negative and zero inputs will be mapped near zero
in the tanh graph.
Tanh or hyperbolic tangent Function
• The hyperbolic tangent (tanh) function (used for hidden layer neuron
output) is an alternative to the Sigmoid function.
• It is defined by the formula: tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

• The tanh function is similar to the Sigmoid function.

• It squashes a real-valued number to the range between -1 and 1, i.e., tanh(x) ∈ (-1, 1).
Tanh or hyperbolic tangent Function
• Therefore, in practice the tanh is always preferred to the
sigmoid.
• The derivative of the tanh function is defined as:

tanh'(x) = 1 - tanh^2(x)

• As the red dotted line in the previous graph shows, tanh
also saturates and kills gradients, since tanh's derivative has a
similar shape to the Sigmoid's derivative.

• tanh has stronger gradients: since the data is centered around 0,
the derivatives are higher, and tanh avoids bias in the gradients.
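A small sketch (assuming NumPy) of tanh and its derivative 1 - tanh^2(x):

```python
import numpy as np

def tanh_derivative(x):
    # tanh'(x) = 1 - tanh(x)^2
    return 1.0 - np.tanh(x) ** 2

x = np.array([-3.0, 0.0, 3.0])
print(np.tanh(x))          # squashed into (-1, 1) and zero-centered
print(tanh_derivative(x))  # still near 0 at the tails, so tanh also saturates
```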
Hard hyperbolic tangent Function
• It is also known as the Hardtanh function.
• It is another variant of the tanh function.
• It is a cheaper, more computationally efficient version of tanh.
• The range of the Hardtanh function is (-1 to 1).
• It is defined as: Hardtanh(x) = -1 if x < -1; x if -1 <= x <= 1; 1 if x > 1
• The derivative of Hardtanh is: 1 if -1 < x < 1, and 0 otherwise
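A minimal sketch of Hardtanh (assuming NumPy), implemented here by simply clipping to [-1, 1]:

```python
import numpy as np

def hardtanh(x):
    # Piecewise-linear version of tanh: -1 below -1, identity in [-1, 1], 1 above 1.
    return np.clip(x, -1.0, 1.0)

def hardtanh_derivative(x):
    # Gradient is 1 inside (-1, 1) and 0 outside.
    return np.where((x > -1.0) & (x < 1.0), 1.0, 0.0)

print(hardtanh(np.array([-2.0, 0.3, 1.5])))             # -> [-1.   0.3  1. ]
print(hardtanh_derivative(np.array([-2.0, 0.3, 1.5])))  # -> [0. 1. 0.]
```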
Softmax Activation Function
• Used to compute a probability distribution from a vector of real
numbers.
• It calculates the probability of each target class over all possible target
classes.
• The calculated probabilities help in determining the target class for the
given inputs.
• The output range of the Softmax function is (0 to 1).
• The Softmax function is used for multi-class classification models.
• The formula of the softmax function is defined as: softmax(x_i) = e^(x_i) / sum_j e^(x_j)

• It computes the exponential (e-power) of the given input value and
the sum of the exponential values of all the values in the input.
• Then the ratio of the exponential of the input value to the sum of the
exponential values is the output of the softmax function.
Properties and Usage of Softmax Function
• Properties of Softmax:
• The calculated probabilities will be in the range of 0 to 1.
• The sum of all the probabilities equals 1.
• Usage of Softmax:
• Used in multi-class (multinomial) logistic regression models.
• Used in different layers of neural networks as well.

Example of Softmax
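Since the original example slide is an image, here is a small worked sketch instead (assuming NumPy; the scores are made-up values), showing that the outputs lie in (0, 1) and sum to 1:

```python
import numpy as np

def softmax(x):
    # Subtracting the max keeps the exponentials numerically stable.
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

scores = np.array([2.0, 1.0, 0.1])  # made-up scores for three classes
probs = softmax(scores)
print(probs)        # roughly [0.659, 0.242, 0.099], each in (0, 1)
print(probs.sum())  # 1.0, so the outputs form a probability distribution
```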
ReLU(Rectified Linear Unit) Function
• Computationally efficient:
• it allows the network to converge very quickly.
• Non-linear: although it looks like a linear function, ReLU has a
derivative function and allows for backpropagation.
• It is similar to the identity function for x > 0.
• ReLU (used for hidden layer neuron output) is defined as:
• R(x) = max(0, x), i.e., if x < 0, R(x) = 0 and if x >= 0, R(x) = x.

• Note: It avoids and rectifies the vanishing gradient problem.
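A minimal sketch of ReLU and its derivative (assuming NumPy):

```python
import numpy as np

def relu(x):
    # R(x) = max(0, x)
    return np.maximum(0.0, x)

def relu_derivative(x):
    # Gradient is 1 for positive inputs and 0 otherwise.
    return np.where(x > 0, 1.0, 0.0)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))             # -> [0. 0. 3.]
print(relu_derivative(x))  # -> [0. 0. 1.]  negative inputs receive no gradient
```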


Problems in ReLU(Rectified Linear Unit) Function
• But its limitation is that it should only be used within the Hidden
Layers of a Neural Network model.

• Another problem with ReLU is that some gradients can be fragile during training and
can die.
• This can cause a weight update which makes the neuron never activate on any data
point again.
• Simply put, ReLU can result in Dead Neurons.
OR
• The Dying ReLU problem:
• when inputs approach zero, or are negative, the gradient of the function
becomes zero, so the network cannot perform backpropagation and cannot learn.
Leaky ReLU Function
• To fix the problem of dying neurons, another modification called Leaky ReLU was
introduced.
• It introduces a small slope to keep the updates alive.

• Leaky ReLU is an attempt to solve the dying problem of ReLU.

• This variation of ReLU has a small positive slope in the negative region.
• So it does enable backpropagation, even for negative input values.
• Otherwise it is like ReLU.
• The leak helps to increase the range of the ReLU function.
• Usually, the value of a is 0.01 or so.
DISADVANTAGES:
• Results are not consistent: Leaky ReLU does not provide consistent predictions for
negative input values.
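A minimal sketch of Leaky ReLU (assuming NumPy), with the small slope a = 0.01 mentioned above:

```python
import numpy as np

def leaky_relu(x, a=0.01):
    # A small positive slope `a` on the negative side keeps gradients alive.
    return np.where(x > 0, x, a * x)

print(leaky_relu(np.array([-10.0, 0.0, 5.0])))  # -> [-0.1  0.   5. ]
```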
Randomized ReLU Function
• When a is not fixed at 0.01, the function is called Randomized ReLU.
• The range of the Leaky ReLU is (-infinity to infinity).
• Both Leaky and Randomized ReLU functions are monotonic in
nature. Their derivatives are also monotonic in nature.
Advantages:
• Allows the negative slope to be learned: unlike Leaky ReLU,
this function provides the slope of the negative part of the function as an
argument.
• It is therefore possible to perform backpropagation and learn the most
appropriate value of α.
• Otherwise it is like ReLU.
• Disadvantages:
• May perform differently for different problems.

Formula of Randomized ReLU: f(x) = x if x > 0; a·x if x <= 0, where a is not fixed at 0.01


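A rough sketch of a randomized leaky ReLU (assuming NumPy); the sampling bounds below are illustrative, not taken from the slides:

```python
import numpy as np

def randomized_leaky_relu(x, low=0.1, high=0.3, rng=None):
    # The negative slope `a` is sampled from a uniform range instead of
    # being fixed at 0.01. The bounds are illustrative values.
    if rng is None:
        rng = np.random.default_rng(0)
    a = rng.uniform(low, high)
    return np.where(x > 0, x, a * x), a

out, a = randomized_leaky_relu(np.array([-4.0, 2.0]))
print(a, out)  # the sampled slope and the resulting outputs
```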
SoftPlus Function
• The SoftPlus function is a smooth version of the ReLU function, which has
smoothing and non-zero gradient properties.
• It was proposed by Dugas et al., 2001.
• The output range is (0, ∞).
• The SoftPlus function is defined as: softplus(x) = ln(1 + e^x)
• The derivative of the SoftPlus function is the logistic (sigmoid) function: 1 / (1 + e^(-x))

• It provides more stabilization and performance to deep neural networks than the ReLU
and Sigmoid functions.
• It has been used in speech recognition.
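A minimal sketch of SoftPlus and its derivative, the logistic sigmoid (assuming NumPy):

```python
import numpy as np

def softplus(x):
    # softplus(x) = ln(1 + e^x)
    return np.log1p(np.exp(x))

def softplus_derivative(x):
    # The derivative of SoftPlus is the logistic sigmoid.
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-5.0, 0.0, 5.0])
print(softplus(x))            # smooth, strictly positive alternative to ReLU
print(softplus_derivative(x))
```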
Activation Function and its Derivatives
TYPES AND POSITIONS OF AFS USED IN DL ARCHITECTURES
DL ACTIVATION FUNCTIONS AND THEIR CORRESPONDING EQUATIONS
FOR COMPUTATION
References
• https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6
• https://www.youtube.com/watch?v=9vB5nzrL4hY
• https://www.youtube.com/watch?v=3r65ZuFyi5Y
• http://dataaspirant.com/2017/03/07/difference-between-softmax-function-and-sigmoid-function/
• https://towardsdatascience.com/activation-functions-and-its-types-which-is-better-a9a5310cc8f
• https://missinglink.ai/guides/neural-network-concepts/7-types-neural-network-activation-functions-right/
• https://theclevermachine.wordpress.com/2014/09/08/derivation-derivatives-for-common-neural-network-activation-functions/
• https://en.wikipedia.org/wiki/Activation_function
• Nwankpa, Chigozie, Winifred Ijomah, Anthony Gachagan, and Stephen Marshall. "Activation
Functions: Comparison of trends in Practice and Research for Deep Learning." arXiv preprint
arXiv:1811.03378 (2018).
