Deep Learning
Vidyasagar Bhargava
Contents
1. Introduction
2. Why deep learning?
3. Fundamentals of deep learning
4. How deep learning works?
5. Activation Function
6. Train Neural Network
• How to minimize loss or cost?
• How to move in right direction?
• Stochastic Gradient Descent
• How to calculate gradient?
7. Adaptive learning
8. Over fitting
9. Regularization
10. H2O
• Introduction
• H2O’s Deep learning
• Features
• Parameters
• Demo
Introduction
• Deep learning is an enhanced and powerful form of neural network built
on several hidden layers (more than 2).
• Data comes in many forms, and it is often difficult for linear methods
to detect non-linearity in the data. In fact, even non-linear
algorithms such as GBM or decision trees sometimes fail to learn
from the data.
• In such cases, a multi-layered neural network, which creates non-
linear interactions among the features, gives a better solution!
Why Deep Learning?
• Neural networks have been around for quite a long time, but only in the
past few years have they become so popular.
• Deep learning is powerful because it can learn rich feature
representations in an unsupervised manner. This differs from traditional
machine learning algorithms, where we have to handcraft features
manually.
• Handcrafted features work in a lot of domains, but in some domains, such
as image classification, the data is very high dimensional, which makes
it difficult to craft features that are useful for prediction.
• So deep learning takes the approach of taking all the data and figuring
out the best features itself.
A Perceptron
Fundamentals of Deep learning
• At the core of any neural network is the perceptron.
• In a neural network diagram, a circle represents a neuron and a line
represents a synapse.
• A synapse has a very simple job: it takes a value from a neuron,
multiplies it by a specific weight, and outputs the result.
• Neurons are a little more complicated: their job is to add together the
outputs from all the synapses, add a bias term, and apply the
activation function.
• The activation function allows the neural net to model complex non-linear
patterns.
Perceptron Forward Pass
• Step 1 (Synapse): multiply weights and inputs.
• Step 2 (Neuron): sum all together and add the bias term.
• Step 3 (Activation function): apply non-linearity.
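To make the three steps concrete, here is a minimal sketch of a perceptron's forward pass in Python (the input, weight, and bias values are hypothetical, and sigmoid is just one possible activation):

```python
# Minimal perceptron forward pass (illustrative sketch using NumPy).
import numpy as np

def perceptron_forward(x, w, b):
    # Step 1 (synapse): multiply each input by its weight.
    # Step 2 (neuron): sum the weighted inputs and add the bias term.
    z = np.dot(w, x) + b
    # Step 3 (activation function): apply a non-linearity (sigmoid here).
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # example inputs (hypothetical values)
w = np.array([0.4, 0.6, -0.1])   # example weights
b = 0.2                          # bias term
print(perceptron_forward(x, w, b))
```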
Why is bias added ?
• Bias is similar to the intercept term in linear regression.
• It helps achieve better predictions by shifting the decision boundary.
Activation function
• At the core of every activation function is non-linearity, which
transforms the output from a linear feature into a non-linear one.
• There are many, many activation functions.
• Some common activation functions are: Sigmoid, TanH, and ReLU
(sketched below).
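A minimal sketch of these three common activation functions, assuming NumPy:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes input to (0, 1)

def tanh(z):
    return np.tanh(z)                 # squashes input to (-1, 1)

def relu(z):
    return np.maximum(0.0, z)         # passes positives, zeroes negatives

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z))
```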
Importance of Activation function
• Activation functions add non-linearity to our network's function.
• Non-linearity is important because most real-world data is non-linear.
How to build neural network with
perceptron?
• The perceptron is the most basic unit of a neural network. However, a
single perceptron isn't powerful enough to work on data that is not
linearly separable.
• Due to this, the Multi-Layer Perceptron came into existence.
• We can add a hidden layer between the input layer and the output
layer, which gives rise to the Multi-Layer Perceptron (MLP).
• To extend an MLP into a deep neural network, simply add more layers,
as the sketch below illustrates.
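A minimal sketch of an MLP forward pass: each (W, b) pair is one layer, so a deeper network is just a longer list of pairs. The weights here are random placeholders and ReLU is my choice of activation:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def mlp_forward(x, layers):
    # layers is a list of (W, b) pairs; deeper networks simply have more pairs.
    a = x
    for W, b in layers:
        a = relu(W @ a + b)
    return a

rng = np.random.default_rng(0)
x = rng.normal(size=4)                              # 4 input variables (hypothetical)
layers = [(rng.normal(size=(8, 4)), np.zeros(8)),   # hidden layer, 8 neurons
          (rng.normal(size=(1, 8)), np.zeros(1))]   # output layer
print(mlp_forward(x, layers))
```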
Deep Learning Model
• The input layer consists of a number of neurons equal to the number of
input variables in the data.
• The number of neurons in the hidden layers is chosen by the user.
• We can find the optimum number of neurons in the hidden layers using a
cross-validation strategy.
Applying Neural Network
• To quantify how good our neural network is, we calculate the loss, i.e.
an aggregate of the differences between the actual outputs and the
predicted outputs.
• There are lots of loss functions, such as cross-entropy loss, mean
squared error, etc.
• The loss is represented as J(Θ).
• Our goal is to minimize the loss so that the network can predict the
output more accurately.
• Note: Θ = W1, W2, ..., Wn
J(Θ) = (1/N) ∑_{i=1}^{N} loss(f(x^(i); Θ), y^(i))

argmin_Θ (1/N) ∑_{i=1}^{N} loss(f(x^(i); Θ), y^(i))
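A small sketch of computing J(Θ) as the mean loss over N examples, for the two loss functions named above (the predictions and labels are hypothetical):

```python
import numpy as np

def mse_loss(y_pred, y_true):
    return np.mean((y_pred - y_true) ** 2)

def cross_entropy_loss(y_pred, y_true, eps=1e-12):
    # y_pred are predicted probabilities for the positive class.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7])   # hypothetical network outputs
print(mse_loss(y_pred, y_true), cross_entropy_loss(y_pred, y_true))
```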
Train Neural Network
• Now that we have J(Θ) to express our loss, we will train our neural
network to minimize it.
• So the objective is to find the theta that minimizes the loss function.
• Theta is just the weights of our network.
• So the loss is a function of the model's parameters.
• To minimize the loss, we need to find its lowest point.
How to minimize loss or cost?
• Once the predicted value is computed, the error propagates back layer by
layer and the weights associated with each neuron are recalculated.
• This is known as back propagation.
• The back propagation algorithm optimizes the network's performance
using a cost function.
• This cost function is minimized through an iterative sequence of steps
called the gradient descent algorithm.
Gradient Descent: How to move in right direction?
• Start at a random point, with the goal of getting to the bottom.
• To reach the bottom, we calculate the gradient at this point, which
points in the direction of maximum ascent. But we want to go downhill, so
we just multiply by negative 1, move in the opposite, downward direction,
and form a new point based on that.
• This way we update our parameters and obtain a new loss.
• We can do this over and over again until we reach the minimum
loss (until we reach convergence); see the toy example below.
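The toy example below illustrates this loop on the one-parameter loss J(θ) = θ² (my choice of function, purely for illustration):

```python
# Start at a random point, compute the gradient (direction of steepest
# ascent), and step the opposite way until we converge on the minimum.
import numpy as np

theta = np.random.uniform(-10, 10)   # start at a random point
eta = 0.1                            # step size (learning rate)
for _ in range(100):
    grad = 2 * theta                 # dJ/dθ for J(θ) = θ²
    theta = theta - eta * grad       # move against the gradient (downhill)
print(theta)                         # close to the minimum at θ = 0
```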
Stochastic Gradient Descent algorithm
• Initialize Θ randomly
• For N epochs:
o For each training example (x, y):
• Compute the loss gradient ∂J(Θ)/∂Θ
• Update Θ with the update rule:
• Θ := Θ − η · ∂J(Θ)/∂Θ
Note: Θ = W1, W2, ..., Wn
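A direct Python sketch of this pseudocode; grad_loss is a stand-in for the per-example gradient ∂J(Θ)/∂Θ, and the linear model with squared error in the usage example is my own choice, not fixed by the slide:

```python
def sgd(theta, data, grad_loss, eta=0.01, epochs=10):
    for _ in range(epochs):                  # for N epochs
        for x, y in data:                    # for each training example (x, y)
            grad = grad_loss(theta, x, y)    # compute loss gradient ∂J(Θ)/∂Θ
            theta = theta - eta * grad       # Θ := Θ − η · ∂J(Θ)/∂Θ
    return theta

# Example: fit θ so that θ·x ≈ y on toy data generated with θ = 2.
data = [(x, 2.0 * x) for x in [1.0, 2.0, 3.0]]
grad_loss = lambda theta, x, y: 2 * (theta * x - y) * x   # d/dθ of (θx − y)²
print(sgd(theta=0.0, data=data, grad_loss=grad_loss, epochs=50))  # ≈ 2.0
```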
• Next: how do we calculate the gradient, i.e. ∂J(Θ)/∂Θ?
How to calculate Gradient?
• Let's say we have a simple neural network with just three nodes: an input
node x0, a hidden node h0, and an output node O0, with weight W1 on the
x0 → h0 connection and weight W2 on the h0 → O0 connection, and loss J(Θ).
• Let's look at W2. We want to see how the loss changes as W2 changes, so we
calculate the derivative of J(Θ) w.r.t. W2. To do this we apply the chain
rule: we find the derivative of J(Θ) w.r.t. O0 and multiply it by the
derivative of O0 w.r.t. W2:

∂J(Θ)/∂W2 = ∂J(Θ)/∂O0 * ∂O0/∂W2

• Similarly, we can look at W1, i.e. calculate the derivative of J(Θ) w.r.t.
W1: we find the derivative of J(Θ) w.r.t. O0, multiply by the derivative of
O0 w.r.t. h0, and multiply by the derivative of h0 w.r.t. W1:

∂J(Θ)/∂W1 = ∂J(Θ)/∂O0 * ∂O0/∂h0 * ∂h0/∂W1

• This is what is meant by back-propagating gradients: the gradient of one
parameter often depends on the parameters after it, so the derivatives form
a chain. This is the idea of back propagation.
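Here is a numeric sketch of these two chain-rule products for the x0 → h0 → O0 network, assuming sigmoid activations and squared-error loss (the slide does not fix these choices, and the values are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x0, y, w1, w2 = 0.5, 1.0, 0.3, -0.8    # hypothetical values

# Forward pass
h0 = sigmoid(w1 * x0)
o0 = sigmoid(w2 * h0)
J = 0.5 * (o0 - y) ** 2

# Backward pass: apply the chain rule, reusing shared factors.
dJ_do0 = o0 - y                        # ∂J/∂O0 for squared error
do0_dw2 = o0 * (1 - o0) * h0           # ∂O0/∂W2 (sigmoid derivative)
dJ_dw2 = dJ_do0 * do0_dw2              # ∂J/∂W2 = ∂J/∂O0 · ∂O0/∂W2

do0_dh0 = o0 * (1 - o0) * w2           # ∂O0/∂h0
dh0_dw1 = h0 * (1 - h0) * x0           # ∂h0/∂W1
dJ_dw1 = dJ_do0 * do0_dh0 * dh0_dw1    # ∂J/∂W1 = ∂J/∂O0 · ∂O0/∂h0 · ∂h0/∂W1

print(dJ_dw2, dJ_dw1)
```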
Recap
Now we have a good idea of:
• 1. How to calculate the gradient
• 2. How to move in the right direction
• 3. How to minimize our loss
Loss function can be difficult to optimize
• Update rule: Θ := Θ − η · ∂J(Θ)/∂Θ
• The learning rate (η) represents the step size, i.e. how large a step
we should take with each gradient update.
• Next: how do we choose the learning rate?
How to choose learning rate (ἠ)?
• A small learning rate takes a lot of time to reach the minimum and may
get stuck in a local minimum rather than the global minimum.
• A large learning rate can lead to divergence or increase the loss.
• We need to find the Goldilocks zone in the middle.
• There are a couple of approaches. One is guessing: try a whole bunch of
different values and see what gives the best result. This is very time
consuming and not the best use of our resources.
• A smarter option is an adaptive learning rate, which adapts and
changes as learning progresses.
• We can adapt and change the learning rate based on:
o How fast is learning happening?
o How large are the gradients?
o How large are the weights?
o We can even have different learning rates for different parameters.
Adaptive learning Rate Algorithms
• ADAM
• Momentum
• NAG
• Adagrad
• Adadelta (H2O)
• RMSProp
For more info on these, please check the URL below:
• http://ruder.io/optimizing-gradient-descent/
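As one illustration, here is a minimal sketch of the Adagrad update, which shrinks each parameter's step size as its squared gradients accumulate (just the core idea, not H2O's implementation):

```python
import numpy as np

def adagrad_update(theta, grad, cache, eta=0.01, eps=1e-8):
    cache = cache + grad ** 2                             # accumulate squared gradients
    theta = theta - eta * grad / (np.sqrt(cache) + eps)   # per-parameter step size
    return theta, cache

theta = np.array([1.0, -2.0])
cache = np.zeros_like(theta)
grad = np.array([0.5, -1.5])         # hypothetical gradient
theta, cache = adagrad_update(theta, grad, cache)
print(theta)
```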
Over fitting
• Neural networks are really powerful models, capable of learning all
sorts of features and functions.
• Sometimes they can be too powerful, i.e. they can over fit, or
memorize, the training examples.
• Over fitting means the model performs very well on the training set,
but what it has learnt is so specific to the training set that it does
not apply outside it, to real-world examples or to the test set.
Regularization
• Regularization is how we prevent over fitting in machine learning and
neural networks.
• Regularization techniques:
• Dropout (sketched below)
• Early stopping
• Weight regularization
• ...others
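As an illustration of the first technique, a minimal sketch of inverted dropout, which randomly zeroes a fraction of activations during training and rescales the rest:

```python
import numpy as np

def dropout(a, keep_prob=0.8, training=True):
    if not training:
        return a                      # no dropout at test time
    mask = (np.random.rand(*a.shape) < keep_prob) / keep_prob
    return a * mask                   # scale so expected activation is unchanged

a = np.array([1.0, 2.0, 3.0, 4.0])
print(dropout(a, keep_prob=0.5))      # roughly half the activations zeroed
```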
Intro to H2O
• H2O is a fast, scalable, open-source machine learning and deep
learning platform for smarter applications.
• Using in-memory compression, H2O handles billions of data
rows in-memory, even with a small cluster.
• H2O includes many common machine learning algorithms,
such as generalized linear modeling (linear regression, logistic
regression, etc.), Naive Bayes, principal components analysis,
time series, k-means clustering, and others.
H2O’s Deep Learning
• H2O's Deep Learning is based on a multi-layer feed-forward artificial
neural network trained with stochastic gradient descent using
back propagation.
• A feed-forward artificial neural network (ANN), also known as a deep
neural network (DNN) or multi-layer perceptron (MLP), is the most
common type of deep neural network and the only type that is
supported natively in H2O-3.
• Other types of DNN, such as Convolutional Neural Networks (CNNs) and
Recurrent Neural Networks (RNNs), are popular as well.
• MLPs work well on transactional (tabular) data; CNNs are a great choice
particularly for image classification; and RNNs suit sequential data
(e.g. text, audio, time series).
• The H2O Deep Water project supports CNNs and RNNs through third-party
integrations of deep learning libraries such as TensorFlow, Caffe, and
MXNet.
Features
Features of H2O's deep learning include:
• Multi-threaded, distributed, parallel computation
• Adaptive learning rate for convergence
• Regularization options such as L1 and L2
• Automatic missing-value imputation
• Hyper-parameter optimization using grid/random search
• For optimization it uses the Hogwild! method, a parallelized
version of SGD.
Parameters
• Hidden – specifies the number of hidden layers and the number of
neurons in each layer.
• Epochs – specifies the number of passes to make over the training data.
• Rate – specifies the learning rate.
• Activation – specifies the type of activation function to use.
(In H2O, the major activation functions are TanH, Rectifier, and Maxout.)
H2O Deep Learning Demo
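In place of the live demo, here is a minimal sketch of training H2O's deep learning model from Python, wiring up the parameters listed above; the dataset path and column choices are placeholders:

```python
import h2o
from h2o.estimators import H2ODeepLearningEstimator

h2o.init()
df = h2o.import_file("your_data.csv")             # placeholder dataset
train, valid = df.split_frame(ratios=[0.8], seed=42)

model = H2ODeepLearningEstimator(
    hidden=[32, 32],          # two hidden layers, 32 neurons each
    epochs=10,                # passes over the training data
    activation="Rectifier",   # TanH, Rectifier, or Maxout
    adaptive_rate=False,      # disable the adaptive rate so `rate` is used
    rate=0.005,               # fixed learning rate
)
model.train(x=df.columns[:-1], y=df.columns[-1],  # last column as the target
            training_frame=train, validation_frame=valid)
print(model.model_performance(valid))
```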
Thank You!