
Deep Learning


WEEK 6

Session agenda
• Introduction to Deep Learning with a case study discussion
• Understanding nodes and layers
• Loss function
• Activation function
• Forward and Backward propagation
• Gradient descent
• Manipulating Deep Neural Networks


• Non-convex function
• Transfer learning
• Natural Language Processing

Why Deep Learning?


• Deep learning is the first class of algorithms that scales with data: its performance keeps getting better as you feed it more data.
• Almost all of the value of deep learning today comes from supervised learning, i.e. learning from labeled data.
Linear equation

This function is defined as a weighted sum of its inputs.


Simple Threshold Function (Step Function)
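A minimal sketch of these two pieces in Python (the inputs, weights, and bias below are made-up values):

```python
# A single neuron as a weighted sum of its inputs followed by a
# simple threshold (step) activation.
import numpy as np

def weighted_sum(x, w, b):
    # z = w1*x1 + w2*x2 + ... + wn*xn + b
    return np.dot(w, x) + b

def step(z, threshold=0.0):
    # Simple threshold (step) function: outputs 1 if z crosses the threshold.
    return 1 if z >= threshold else 0

x = np.array([2.0, 3.0])    # example inputs
w = np.array([0.5, -0.2])   # example weights
b = 0.1                     # bias
print(step(weighted_sum(x, w, b)))  # -> 1 (0.5*2 - 0.2*3 + 0.1 = 0.5 >= 0)
```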

Sigmoid Function


Loss Function

The loss function measures the error between the predicted output and the desired output; the goal of training is to minimize this error.
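As an illustration, here is a sigmoid output paired with the binary cross-entropy loss, one common choice for classification (the logits and labels are made up):

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into (0, 1), so it can be read as a probability.
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Loss is small when the predicted probability is close to the true label.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1])
y_pred = sigmoid(np.array([2.0, -1.5, 0.3]))
print(binary_cross_entropy(y_true, y_pred))
```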

Activation Function
Perceptron


Is ReLU faster than tanh?


The main idea is to keep the gradient non-zero so that training can eventually recover. ReLU is also less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations.
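A short sketch contrasting the three activations (illustrative only):

```python
# ReLU is just a comparison and a max, while tanh and sigmoid need exponentials.
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-3, 3, 7)
print(relu(z))      # zero for negative inputs, identity for positive
print(np.tanh(z))   # saturates at -1 and 1, gradient shrinks at the tails
print(sigmoid(z))   # saturates at 0 and 1, gradient shrinks at the tails
```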
Gradient Descent


Adjusted values of w and b


Gradient descent is the essence of the learning process: through it the machine learns what values of weights and biases minimize the cost function. It does this by iteratively comparing its predicted output for a set of data to the true output, in a process called training.

The alpha term in front of the partial derivative is called the learning rate and is a measure of how big a step to take at each iteration.
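A toy sketch of one such update for a single weight and bias on one squared-error example (the data point and learning rate are made up); the rule being applied is w ← w − alpha * ∂Loss/∂w, and similarly for b:

```python
# Minimal gradient descent sketch for a linear model y_pred = w*x + b.
def gradient_descent_step(w, b, x, y, alpha=0.05):
    y_pred = w * x + b                 # forward pass
    dw = 2 * (y_pred - y) * x          # partial derivative of (y_pred - y)^2 w.r.t. w
    db = 2 * (y_pred - y)              # partial derivative w.r.t. b
    return w - alpha * dw, b - alpha * db  # step against the gradient

w, b = 0.0, 0.0
for _ in range(100):
    w, b = gradient_descent_step(w, b, x=2.0, y=5.0)
print(w, b)  # approaches values where 2*w + b ≈ 5
```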
Training model
Forward Propagation

First, a linear combination of the inputs, weights, and biases is computed at each neuron in a layer. At each neuron/node, this linear combination is then passed through an activation function that introduces nonlinearity into the model. This process, by which the inputs are propagated through the weights and biases to the output, is called forward propagation. After arriving at the predicted output, the value of the loss function for the training example is calculated.
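A minimal sketch of a forward pass through one hidden layer; the layer sizes, random weights, and the choice of ReLU plus sigmoid are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    z1 = W1 @ x + b1        # linear combination at each hidden neuron
    a1 = np.maximum(0, z1)  # ReLU introduces nonlinearity
    z2 = W2 @ a1 + b2       # linear combination at the output neuron
    return sigmoid(z2)      # predicted probability

rng = np.random.default_rng(0)
x = rng.normal(size=3)                            # 3 input features
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)     # hidden layer of 4 neurons
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)     # single output neuron
print(forward(x, W1, b1, W2, b2))
```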

Backward Propagation
Backpropagation is the process of calculating the partial derivatives of the loss function back from the output to the inputs, and using them to update the values of w and b so that they move toward the minimum.
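A hedged sketch of backpropagation for a single sigmoid neuron with cross-entropy loss (the inputs and learning rate are made up; with this pairing the derivative of the loss with respect to the pre-activation simplifies to y_pred − y):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.5, -0.7])
y = 1.0
w, b, alpha = np.zeros(2), 0.0, 0.1

for _ in range(200):
    y_pred = sigmoid(np.dot(w, x) + b)   # forward pass
    error = y_pred - y                   # dLoss/dz for sigmoid + cross-entropy
    dw = error * x                       # chain rule back to the weights
    db = error                           # chain rule back to the bias
    w -= alpha * dw                      # update toward the minimum
    b -= alpha * db
print(y_pred)  # approaches 1.0, the true label
```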


Final Design


Accuracy
We will apply a threshold value to the output, 0.5 for instance, so that probability values of 0.5 or above result in a predicted output value of 1, whereas probability values below 0.5 result in a predicted output value of 0.
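A small illustrative snippet of this thresholding (the probabilities and labels are made up):

```python
import numpy as np

probs = np.array([0.92, 0.31, 0.56, 0.08])   # model outputs (probabilities)
y_true = np.array([1, 0, 1, 1])              # true labels

y_pred = (probs >= 0.5).astype(int)          # 0.5 or above -> 1, below 0.5 -> 0
accuracy = np.mean(y_pred == y_true)
print(y_pred, accuracy)                      # [1 0 1 0] 0.75
```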
Case Study
Predicting the Probability of Credit Card Approval

We are going to discuss a credit card approval decision using a neural network. The factors that will be used to determine the output are: age, salary, education, city, and company.

Manipulating Deep Neural Networks


Hidden Layers
The blue layers (the layers between the input and the final output neuron) are called the hidden layers. These are what make the model "deep". The hidden-layer neurons try to capture the latent patterns within the data that will help the model predict the creditworthiness of the individual.

Neural networks are also referred to as black-box models because, even though the model may predict with high accuracy, the results are not usually interpretable. The number of layers and the number of neurons in each layer are called hyperparameters; their desired values are found by the data scientist through trial and error, unlike the weights (also called parameters), whose optimal values are found by calculus or other mathematical methods.
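As an illustration of such an architecture, here is a hedged Keras sketch for the credit-approval case study; the layer sizes, activations, and placeholder data are assumptions, not the model used in the session:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features = 5  # age, salary, education, city, company (after preprocessing)
model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    layers.Dense(8, activation="relu"),    # hidden layer 1 (a hyperparameter)
    layers.Dense(4, activation="relu"),    # hidden layer 2 (a hyperparameter)
    layers.Dense(1, activation="sigmoid"), # probability of approval
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Random placeholder data, only so the snippet runs end to end.
X_train = np.random.rand(100, n_features)
y_train = np.random.randint(0, 2, size=100)
model.fit(X_train, y_train, epochs=5, batch_size=16, verbose=0)
```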

Convex vs Non-Convex Function
• A non-convex function "curves up and down": it is wavy, with some 'valleys' (local minima) that are not as deep as the overall deepest 'valley' (the global minimum).
• Optimization algorithms can get stuck in a local minimum.

Convergence

Non-Convex Optimization
• In deep learning, non-convex optimization may converge at a bad local minimum. In such a case, we can re-optimize the system with a different initialization and/or add extra noise to the gradient updates.
• We may face convergence to a saddle point, which can be tackled by finding the Hessian and computing a descent direction.
• Getting stuck in a region of low gradient magnitude can be addressed using batch normalization or by designing networks with the rectified linear unit (ReLU) activation function.
• We may take huge steps and diverge because of high curvature. In that case, we can use an adaptive step size or, more simply, limit the size of the gradient step (see the sketch after this list).

• If we end up with a wrong setting of hyperparameters, we can go for hyperparameter optimization methods.
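A hedged Keras sketch of two of the remedies above, batch normalization with ReLU and an adaptive optimizer (Adam) with the gradient norm clipped; the layer sizes are arbitrary and the library choice is an assumption, not from the slides:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(64),
    layers.BatchNormalization(),   # batchnorm helps in low-gradient regions
    layers.Activation("relu"),     # ReLU keeps gradients from vanishing
    layers.Dense(1, activation="sigmoid"),
])
optimizer = keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)  # adaptive step, limited size
model.compile(optimizer=optimizer, loss="binary_crossentropy")
```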

[Figures: saddle point; convergence in Adam optimization]
Transfer Learning
• Transfer learning generally refers to a process where a model trained on one problem is used in some
way on a second related problem.
• Transfer learning has the benefit of decreasing the training time for a neural network model and can
result in lower generalization error.


How Transfer Learning works
• A model can be downloaded and used as-is.
• Models can be downloaded and used as feature-extraction models. Here, the output of a layer prior to the model's output layer is used as the input to a new classifier model.
• The pre-trained model can be used as a separate feature-extraction program, in which case input can be pre-processed by the model, or by a portion of the model, to give an output (e.g. a vector of numbers) for each input image, which can then be used as input when training a new model.
• The pre-trained model, or a desired portion of it, can be integrated directly into a new neural network model. In this usage, the weights of the pre-trained model can be frozen so that they are not updated as the new model is trained. Alternately, the weights may be updated during the training of the new model, perhaps with a lower learning rate, allowing the pre-trained model to act like a weight-initialization scheme when training the new model.
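A hedged Keras sketch of the last option: a pre-trained model (MobileNetV2 here, chosen only as an example) integrated into a new network with its weights frozen; the dataset and classifier head are illustrative assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

base = keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False                       # freeze the pre-trained weights

model = keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),         # pooled features from the base model
    layers.Dense(1, activation="sigmoid"),   # new classifier head for the new task
])
model.compile(optimizer=keras.optimizers.Adam(1e-4),  # lower learning rate
              loss="binary_crossentropy", metrics=["accuracy"])
```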


ANY QUESTIONS
