Mod 2.3 - Activation Function

• Why Do We Need Activation Functions?

• An activation function Φ(v) in the output layer can control the nature of the output (e.g., a probability value in [0, 1]).

• In multilayer neural networks, activation functions bring non-linearity into the hidden layers, which increases the complexity of the mappings the model can learn.

A neural network with any number of layers but only linear activations can be shown to be equivalent to a single-layer network.

Binary Step Function

The binary step function depends on a threshold value that decides whether a neuron should be activated or not.

The input fed to the activation function is compared to a certain threshold; if the input is greater than the threshold, the neuron is activated, otherwise it is deactivated, meaning that its output is not passed on to the next hidden layer.

Mathematically it can be represented as: f(x) = 1 if x ≥ 0, and f(x) = 0 if x < 0 (with the threshold taken as 0)


Here are some of the limitations of the binary step function:
• It cannot provide multi-valued outputs; for example, it cannot be used for multi-class classification problems.
• The gradient of the step function is zero (and undefined at the threshold), which hinders the backpropagation process.
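
As a minimal illustrative sketch (not part of the original notes; NumPy and a threshold of 0 are assumptions), the binary step function and its gradient problem look like this in Python:

import numpy as np

def binary_step(x, threshold=0.0):
    # Outputs 1 where the input is at or above the threshold, 0 otherwise
    return np.where(x >= threshold, 1.0, 0.0)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(binary_step(x))  # [0. 0. 1. 1.]
# The derivative is 0 everywhere (undefined exactly at the threshold),
# so gradient-based backpropagation gets no useful signal from it.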

Linear Activation Function


The linear activation function, also known as "no activation" or the "identity function," is one where the activation is proportional to the input.

The function does nothing to the weighted sum of the input; it simply passes on the value it was given.

Mathematically it can be represented as: f(x)=x

However, a linear activation function has two major problems:
• Backpropagation cannot be used effectively, as the derivative of the function is a constant and has no relation to the input x.
• All layers of the neural network will collapse into one if a linear activation function is used. No matter the number of layers in the neural network, the last layer will still be a linear function of the first layer, so, essentially, a linear activation function turns the neural network into just one layer, as the sketch below illustrates.
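
A small NumPy sketch (illustrative shapes and values, not from the original notes) makes this collapse concrete: two stacked linear layers are exactly equivalent to one linear layer.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(3,))                      # input vector
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=(4,))
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=(2,))

# Two linear layers stacked with no activation in between
h = W1 @ x + b1
y_two_layers = W2 @ h + b2

# A single equivalent linear layer
W = W2 @ W1
b = W2 @ b1 + b2
y_one_layer = W @ x + b

print(np.allclose(y_two_layers, y_one_layer))  # True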

Non-Linear Activation Functions


The linear activation function is simply a linear regression model. 

Because of its limited power, this does not allow the model to create
complex mappings between the network’s inputs and outputs. 

Non-linear activation functions solve the following limitations of linear activation functions:
• They allow backpropagation because the derivative now depends on the input, so it is possible to go back and understand which weights in the input neurons can provide a better prediction.
• They allow the stacking of multiple layers of neurons, as the output is now a non-linear combination of the input passed through multiple layers, so any output can be represented as a functional computation in the neural network.

Non-Linear Neural Network Activation Functions


Sigmoid / Logistic Activation Function 

This function takes any real value as input and outputs values in the range of 0 to 1.

The larger the input (more positive), the closer the output value will be to 1.0, whereas the smaller the input (more negative), the closer the output will be to 0.0.

Mathematically it can be represented as: f(x) = 1 / (1 + e^(-x))

The sigmoid/logistic activation function is one of the most widely used functions, for the following reasons:
• It is commonly used for models where we have to predict a probability as the output. Since a probability only exists in the range of 0 to 1, sigmoid is the right choice because of its range.
• The function is differentiable and provides a smooth gradient, i.e., it prevents jumps in output values. This is reflected in the S-shape of the sigmoid activation function.
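
As a minimal sketch (assuming NumPy; not part of the original notes), the sigmoid and its input-dependent gradient can be written as:

import numpy as np

def sigmoid(x):
    # Maps any real input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Smooth gradient that depends on the input: sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-5.0, 0.0, 5.0])
print(sigmoid(x))       # ~[0.007, 0.5, 0.993]
print(sigmoid_grad(x))  # ~[0.007, 0.25, 0.007] (gradient shrinks at the extremes)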

Tanh Function (Hyperbolic Tangent)


The tanh function is very similar to the sigmoid/logistic activation function and even has the same S-shape, the difference being its output range of -1 to 1. In tanh, the larger the input (more positive), the closer the output value will be to 1.0, whereas the smaller the input (more negative), the closer the output will be to -1.0.

Mathematically it can be represented as: f(x) = (e^x - e^(-x)) / (e^x + e^(-x))

Advantages of using this activation function are:
• The output of the tanh activation function is zero-centered; hence we can easily map the output values as strongly negative, neutral, or strongly positive.
• It is usually used in the hidden layers of a neural network, as its values lie between -1 and 1; therefore, the mean of the hidden-layer activations comes out to be 0 or very close to it. This helps in centering the data and makes learning for the next layer much easier.
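
A minimal sketch (assuming NumPy) of tanh, showing the zero-centered outputs described above:

import numpy as np

def tanh(x):
    # Same as np.tanh(x); outputs lie in the range (-1, 1)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

x = np.linspace(-3, 3, 7)
y = tanh(x)
print(y)         # symmetric values around 0
print(y.mean())  # essentially 0 for inputs centered on 0 (zero-centered output)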

ReLU Function
ReLU stands for Rectified Linear Unit. 

Although it gives an impression of a linear function, ReLU has a derivative and allows for backpropagation while simultaneously being computationally efficient.

The main catch here is that the ReLU function does not activate all the neurons at the same time.

A neuron will only be deactivated if the output of its linear transformation is less than 0.

Mathematically it can be represented as: f(x) = max(0, x)


The advantages of using ReLU as an activation function are as follows:
• Since only a certain number of neurons are activated, the ReLU function is far more computationally efficient when compared to the sigmoid and tanh functions.
• ReLU accelerates the convergence of gradient descent towards the minimum of the loss function due to its linear, non-saturating property.

The limitations faced by this function are:
• All negative input values become zero immediately, which decreases the model's ability to fit or train on the data properly.
• The dying ReLU problem (mitigated by an improved variant named Leaky ReLU, sketched below).
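
As an illustrative sketch (NumPy assumed; the 0.01 negative-side slope is a common but assumed choice), ReLU and the Leaky ReLU variant mentioned above:

import numpy as np

def relu(x):
    # f(x) = max(0, x): negative inputs are zeroed out
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Keeps a small slope for negative inputs so their gradient never dies
    return np.where(x > 0, x, alpha * x)

x = np.array([-3.0, -0.5, 0.0, 2.0])
print(relu(x))        # [0. 0. 0. 2.]
print(leaky_relu(x))  # [-0.03 -0.005 0. 2.]
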
Neural networks are a set of algorithms that are designed to
recognize trends/relationships in a given set of training data. These
algorithms are based on the way human neurons process
information.

This equation, of the form ŷ = Φ(wᵀx + b), represents how a neural network processes the input data at each layer and eventually produces a predicted output value.

To train (the process by which the model learns the relationship between the training data and the outputs), the neural network updates its parameters, the weights wᵀ and the biases b, to satisfy the equation above.

Each training input is loaded into the neural network in a process called forward propagation. Once the model has produced an output, this predicted output is compared against the given target output; the error is then propagated backwards in a process called backpropagation, and the parameters of the model are adjusted so that it now outputs a result closer to the target output.
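
A hedged sketch of these two steps (the single-layer size, sigmoid activation, and values are illustrative assumptions, not taken from the notes):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative parameters for one layer with 3 inputs and 1 output
W = np.array([[0.2, -0.4, 0.1]])   # weights (w transposed)
b = np.array([0.05])               # bias
x = np.array([1.0, 2.0, 3.0])      # one training input
target = np.array([1.0])           # its target output

# Forward propagation: the predicted output for this input
y_hat = sigmoid(W @ x + b)

# The prediction is compared against the target; backpropagation would then
# adjust W and b so the next prediction moves closer to the target.
print(y_hat, target - y_hat)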

This is where loss functions come in. Loss functions are one of the most important aspects of neural networks, as they (along with the optimization algorithm) are directly responsible for fitting the model to the given training data.
Loss Functions Overview

A loss function is a function that compares the target and predicted output values; it measures how well the neural network models the training data. When training, we aim to minimize this loss between the predicted and target outputs.

The parameters are adjusted to minimize the average loss: we find the weights wᵀ and biases b that minimize the value of J (the average loss).
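
As an illustrative example (assuming mean squared error as the loss; the notes do not fix a particular loss function), the average loss J over a batch of predictions could be computed as:

import numpy as np

def mse_loss(y_pred, y_true):
    # Average loss J over all training examples in the batch
    return np.mean((y_pred - y_true) ** 2)

y_true = np.array([1.0, 0.0, 1.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7, 0.95])
print(mse_loss(y_pred, y_true))  # 0.035625; smaller when predictions are closer to the targets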
