DL Mentoring Session - Final
WEEK 6
• Gradient descent
Loss Function
The loss function quantifies the error between the predicted output and the desired (target) output; the goal of training is to minimize this error.
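As a minimal sketch (assuming binary labels, as in the credit card approval case study later in the session, and made-up predicted probabilities), the binary cross-entropy loss for a small batch could be computed as follows:

import numpy as np

# Hypothetical true labels and predicted probabilities for a small batch
y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.2, 0.7, 0.4])

# Binary cross-entropy: mean of -[y*log(p) + (1-y)*log(1-p)]
eps = 1e-7                                  # avoid log(0)
p = np.clip(y_pred, eps, 1 - eps)
bce = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
print(bce)

A smaller loss value means the predicted probabilities are closer to the true labels, which is what training tries to achieve.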
At each neuron in a layer, a linear combination of the inputs is first computed using the layer's weights and bias. This linear combination is then passed through an activation function, which introduces nonlinearity into the model. The process by which the inputs are propagated through the weights and biases to the output is called forward propagation. After arriving at the predicted output, the value of the loss function for the training example is calculated.
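A minimal NumPy sketch of one forward pass, assuming a hypothetical example with three input features, four hidden neurons, and sigmoid activations (the actual layer sizes and activations are illustrative choices, not from the slides):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical single training example with 3 input features
x = np.array([0.5, 1.2, -0.3])

# Hidden layer: linear combination (weights and bias), then activation
W1 = np.random.randn(4, 3) * 0.1    # 4 hidden neurons, 3 inputs
b1 = np.zeros(4)
h = sigmoid(W1 @ x + b1)            # forward propagation through the hidden layer

# Output neuron: another linear combination followed by sigmoid
W2 = np.random.randn(1, 4) * 0.1
b2 = np.zeros(1)
y_pred = sigmoid(W2 @ h + b2)       # predicted probability

# Loss for this training example (binary cross-entropy, true label assumed to be 1)
y_true = 1
loss = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
print(y_pred, loss)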
Accuracy
We will apply a threshold value to the output, 0.5 for instance, so that probability values of 0.5 or above result in a predicted output value of 1, whereas probability values below 0.5 result in a predicted output value of 0.
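A small sketch of this thresholding with made-up probabilities and labels (the values below are purely illustrative):

import numpy as np

# Hypothetical predicted probabilities and true labels
probs = np.array([0.80, 0.35, 0.55, 0.10, 0.62])
y_true = np.array([1, 0, 1, 0, 0])

# Apply the 0.5 threshold: probabilities >= 0.5 become class 1, otherwise class 0
y_pred = (probs >= 0.5).astype(int)

accuracy = np.mean(y_pred == y_true)
print(y_pred, accuracy)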
Case Study
Predicting the probability of credit card approval
Hidden Layers
The blue layers (the layers between the input layer and the final output neuron) are called the hidden layers. These are what make the model "deep". The hidden-layer neurons try to capture the latent patterns within the data that help the model predict the creditworthiness of an individual.
Neural networks are also referred to as black-box models because, even though the model may predict with high accuracy, the results are usually not interpretable. The number of layers and the number of neurons in each layer are called hyperparameters; their desired values are found by the data scientist through trial and error, unlike the weights (also known as parameters), whose optimal values are found by calculus or other mathematical methods.
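As an illustrative sketch only, a Keras model for the credit card case study might look like the following; the number of input features, the number of hidden layers, and the neurons per layer are assumptions here, i.e. exactly the hyperparameters a data scientist would tune by trial and error:

from tensorflow import keras
from tensorflow.keras import layers

n_features = 10   # assumed number of input features for the applicant data

model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    layers.Dense(16, activation="relu"),    # hidden layer 1
    layers.Dense(8, activation="relu"),     # hidden layer 2
    layers.Dense(1, activation="sigmoid"),  # probability of approval
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()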
Convergence
Saddle Point
Convergence in Adam Optimization
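As a hedged illustration (not taken from the slides), the Adam update rule can be sketched in NumPy on a toy quadratic loss to show how its momentum and adaptive-step terms drive convergence; the hyperparameter values below are the common defaults:

import numpy as np

# Toy loss f(w) = w^2, whose gradient is 2w; Adam should drive w toward 0
w = 5.0
lr, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8
m, v = 0.0, 0.0

for t in range(1, 201):
    g = 2 * w                               # gradient of the loss at w
    m = beta1 * m + (1 - beta1) * g         # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * g ** 2    # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)            # bias correction
    v_hat = v / (1 - beta2 ** t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)

print(w)   # close to the minimum at w = 0

The adaptive per-parameter step size is one reason Adam tends to escape flat regions such as saddle points faster than plain gradient descent.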
Transfer Learning
• Transfer learning generally refers to a process where a model trained on one problem is used in some
way on a second related problem.
• Transfer learning has the benefit of decreasing the training time for a neural network model and can
result in lower generalization error.
The pre-trained model, or a desired portion of it, can be integrated directly into a new neural network model. In this usage, the weights of the pre-trained model can be frozen so that they are not updated as the new model is trained. Alternatively, the weights may be updated during the training of the new model, perhaps with a lower learning rate, allowing the pre-trained model to act like a weight initialization scheme when training the new model.
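A hedged Keras sketch of this idea, assuming an image task with MobileNetV2 as the pre-trained model (the choice of base model, input size, and new output head are illustrative assumptions, not from the slides):

from tensorflow import keras
from tensorflow.keras import layers

# Reuse a pre-trained network as a frozen feature extractor
base = keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet")
base.trainable = False              # freeze the pre-trained weights

# Add a new task-specific head that will be trained from scratch
model = keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(1, activation="sigmoid"),
])

# Setting base.trainable = True and keeping a low learning rate would instead
# fine-tune the pre-trained weights rather than keeping them frozen.
model.compile(optimizer=keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])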
ANY QUESTIONS