Understanding Deep Learning & Parameter Tuning with MXNet, H2O Package in R
Machine Learning, R
January 30, 2017
Introduction
Deep Learning isn't a recent discovery. The seeds were sown back in the 1950s when the first
artificial neural network was created. Since then, progress has been rapid, with the structure of
the neuron being "re-invented" artificially.
Computers and mobiles have now become powerful enough to identify objects from images. Not just images; they can chat with you as well! Haven't you tried Google's Allo app? That's not all: they can drive, make supersonic calculations, and help businesses solve the most complicated problems (more users, revenue, etc.).
But, what is driving all these inventions? It's Deep Learning!
With increasing open source contributions, the R language now provides a fantastic interface for building predictive models based on neural networks and deep learning. However, learning to build models isn't enough. You ought to understand the interesting story behind them.
In this tutorial, I'll start with the basics of neural networks and deep learning (from scratch). Along with the theory, we'll also learn to build deep learning models in R using the MXNet and H2O packages. We'll also learn to tune the parameters of a deep learning model for better performance.
Note: This article is meant for beginners and expects no prior understanding of deep learning
(or neural networks).
Table of Contents
1. What is Deep Learning? How is it different from a Neural Network?
2. How does Deep Learning work?
Why is bias added to the network?
What are activation functions and their types?
3. Multi Layered Neural Networks
What is the Backpropagation Algorithm? How does it work?
Gradient Descent
4. Practical Deep Learning with H2O & MXNet
What is Deep Learning? How is it different from a Neural Network?
Deep Learning is the new name for multilayered neural networks. You can say that deep learning is an enhanced and powerful form of a neural network, though the difference between the two is subtle.
The difference lies in the fact that deep learning models are built on several hidden layers (say, more than 2), whereas a typical neural network is built with up to 2 hidden layers.
Since data comes in many forms (tables, images, sound, web, etc.), it becomes extremely difficult for linear methods to learn and detect the non-linearity in the data. In fact, quite often even non-linear algorithms such as tree-based methods (GBM, decision trees) fail to learn from the data.
In such cases, a multilayered neural network, which creates non-linear interactions among the features (i.e., goes deep into the features), gives a better solution.
You might ask, 'Neural networks emerged in the 1950s, but deep learning took off only a few years ago. What happened all of a sudden in the last few years?'
In the last few years, there has been tremendous advancement in computational devices (especially GPUs). The high performance of deep learning models comes at a cost, namely computation: they require large amounts of memory and compute.
The world is continually progressing from the CPU to the GPU (Graphics Processing Unit). Why? Because a CPU tops out at a couple of dozen cores, whereas a GPU can contain thousands of cores, making it vastly more powerful for this kind of parallel computation.
How does Deep Learning work?
To understand deep learning, let's start with the basic form of neural network architecture, i.e., the perceptron.
A Neural Network draws its structure from a human neuron. A human neuron looks like this:
Yes, you have it too. And not just one, but billions. We have billions of neurons and trillions of synapses connecting them, through which electrical signals pass. Watch this short video (~2 mins) to understand your brain better.
It works like this:
1. The dendrites receive the input signal (message).
2. These dendrites apply a weight to the input signal. Think of the weight as an "importance factor": the higher the weight, the higher the importance of the signal.
3. The soma (cell body) acts on the input signal and does the necessary computation (decision
making).
4. Then, the signal passes through the axon via a threshold function. This function decides
whether the signal needs to be passed further.
5. If the input signal exceeds the threshold, the signal is fired through the axon terminals on to other neurons.
This is a simplistic explanation of human neurons. The idea is to make you understand the analogy
between human and artificial neurons.
Now, let's understand the working of an artificial neuron. The process is quite similar to the explanation above. Make sure you understand it well because it's the fundamental concept of neural networks. A simplistic artificial neuron looks like this:
Here x1, x2, ... xn are the input variables (or independent variables). As the input variables are fed
into the network, they get assigned some random weights (w1,w2...wn). Alongside, a bias (wo) is
added to the network (explained below). The adder sums all the weighted input variables, and the output (y) is obtained by passing this sum through the activation function:

y = g(wo + w1*x1 + w2*x2 + ... + wn*xn)

where wo = bias, wi = weights, xi = input variables, and g() is the activation function. In this case, the activation function works like this: if the weighted sum of input variables exceeds a certain threshold, it outputs 1, else 0.
This simple neuron model is also known as the McCulloch-Pitts model or Perceptron. In simple words, a perceptron takes several input variables and returns a binary output. Why a binary output? Because it uses a sigmoid function as the activation function (explained below) and thresholds the result at a cut-off (say, 0.5).
If you remove the activation function, what you get is a simple regression model. After adding the
sigmoid activation function, it performs the same task as logistic regression.
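As a minimal illustration (not the article's code; the weights below are arbitrary), the artificial neuron described above can be sketched in a few lines of R:

# Illustrative perceptron: weighted sum of inputs plus bias, then a threshold activation
perceptron <- function(x, w, wo, threshold = 0) {
  z <- wo + sum(w * x)          # weighted sum plus bias
  ifelse(z > threshold, 1, 0)   # fire (1) only if the sum exceeds the threshold
}

perceptron(x = c(0.2, 0.1), w = c(0.4, -0.3), wo = 0.1)   # returns 1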
However, a perceptron isn't powerful enough to handle data that isn't linearly separable. Due to this limitation, the Multilayer Perceptron (MLP) came into existence. If the perceptron is one neuron, think of the MLP as a complete brain comprising several neurons.
Why is bias added in the neural network?
Bias (wo) is similar to the intercept term in linear regression. It helps improve the accuracy of prediction by shifting the decision boundary along the Y axis. For example, in the image shown below, had the fitted slope emerged from the origin, the error would have been higher than the error after adding the intercept to the slope.
Similarly, in a neural network, the bias helps in shifting the decision boundary to achieve better
predictions.
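To see the effect of the intercept concretely, here is a small hedged sketch with made-up data, comparing a regression forced through the origin with one that includes an intercept:

# Illustrative only: data that clearly does not pass through the origin
set.seed(1)
x <- 1:20
y <- 5 + 2 * x + rnorm(20)

fit_no_intercept <- lm(y ~ x - 1)   # slope forced through the origin
fit_intercept    <- lm(y ~ x)       # slope plus intercept (bias)

sum(residuals(fit_no_intercept)^2)  # larger squared error
sum(residuals(fit_intercept)^2)     # smaller squared error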
What are activation functions and their types?
The perceptron classifies instances by processing a linear combination of input variables through
the activation function. We also learned above that the perceptron algorithm returns binary
output by using a sigmoid function (shown below).
A sigmoid function (or logistic neuron) is used in logistic regression. This function caps the max and min values at 1 and 0, such that any large positive number gets squashed towards 1 and any large negative number towards 0.
It is used in neural networks because it has nice mathematical properties (derivative is easier to
compute), which help calculate gradient in the backpropagation method (explained below).
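For reference, here is a two-line R sketch of the sigmoid and its conveniently simple derivative (illustrative, not part of the article's code):

sigmoid <- function(x) 1 / (1 + exp(-x))

# The derivative reuses the sigmoid itself, which is what makes gradient
# computation in backpropagation cheap
sigmoid_grad <- function(x) sigmoid(x) * (1 - sigmoid(x))

sigmoid(c(-10, 0, 10))   # ~0, 0.5, ~1: large negatives squash to 0, large positives to 1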
In general, activation functions govern the type of decision boundary produced for a given non-linear combination of input variables. Also, due to their mathematical properties, activation functions play a significant role in optimizing prediction accuracy. Here you can find a complete list of activation functions.
Multi Layer Neural Network (or Deep Learning Model)
A multilayered neural network comprises a chain of interconnected neurons which creates the
neural architecture. As shown below, along with input and output layers, it consists of multiple
hidden layers also. Don't worry about the word "hidden;" it's how middle layers are named.
The input layer consists of neurons equal to the number of input variables in the data. The number
of neurons in the hidden layer depends on the user. In R, we can find the optimum number of
neurons in the hidden layer using a cross-validation strategy. Multilayered neural networks are
preferred when the given data set has a large number of features. That's why this model is being
widely used to work on images, text data, etc.
There are several types of neural networks; two of which are most commonly used:
1. Feedforward Neural Network: In this network, the information flows in one direction, i.e.,
from the input node to the output node.
2. Recurrent (or Feedback) Neural Network: In this network, the information flows from the
output neuron back to the previous layer as well. It uses the backpropagation algorithm.
What is the Backpropagation Algorithm? How does it work?
The goal of the backpropagation algorithm is to optimize the weights associated with neurons so
that the network can learn to predict the output more accurately. Once the predicted value is
computed, it propagates back layer by layer and re-calculates weights associated with each
neuron.
In simple words, it tries to bring the predicted value as close to the actual value. It's quite
interesting!
The backpropagation algorithm optimizes the network performance using a cost function. This
cost function is minimized using an iterative sequence of steps called the gradient descent
algorithm. Let's first understand it mathematically. Then, we'll look at an example.
We'll take the cost function as the squared error. It can be written as

J = (1/2n) * Σi (yi - ŷi)²

where n = the number of observations in the data, yi = the actual value, and ŷi ("y-hat") = the predicted value. Let's call it equation 1.
The constant 1/2 is added in front purely for computational convenience. You'll understand why in a moment.
This cost function is convex in nature. A convex function can be identified by its U-shaped curve (shown below). A great property of a convex function is that its lowest value is found where the derivative equals zero.
Think of a ball rolling down the curve. It will take a few rounds of rolling (up and down) before it slows down and settles at the bottom. That bottom point is the minimum, and that's where we want to go!
If we assume that the data is fixed and the resultant cost function is a function of the weights, we can re-write equation 1 as

J(w) = (1/2n) * Σi (yi - Σj wj*xij)²

where w is the vector of weights. Note that we have only substituted ŷi with its functional form (Σj wj*xij); the rest of the equation is the same. Now, this equation is ready for differentiation.
After partially differentiating the equation with respect to a weight wj, we get the general equation

∂J/∂wj = -(1/n) * Σi (yi - ŷi)*xij

As you can see, the constant 1/2 gets cancelled. (In partial differentiation, we differentiate the entire equation with respect to one variable, keeping the other variables constant.)
We also learn that the partial derivative of this cost function is just the difference between the actual and predicted values, multiplied by the corresponding input value and averaged over all observations (n).
The weight vector w contains one weight per input variable. To compute these weights effectively, gradient descent comes into the picture. For a particular weight, gradient descent works like this:
1. First, it calculates the partial derivative of the cost with respect to that weight.
2. If the derivative is positive, it decreases the weight value.
3. If the derivative is negative, it increases the weight value.
4. The motive is to reach the lowest point of the convex curve, where the derivative is zero.
5. It progresses iteratively using a step size (η), which is defined by the user. Make sure that the step size isn't too large or too small: too small a step size will take longer to converge, while too large a step size may never reach the optimum.
Remember that the motive of gradient descent is to get to the bottom of the curve. The gradient descent update can be written as

wj := wj - η * ∂J/∂wj
Let's understand it using an example.
Suppose, we have a data set with 2 variables (inputs are scaled between 0 and 1):
Age CGPA Target
0.2 0.1 1
0.4 0.6 0
0.5 0.2 1
Let's run a simple single-layer network on this data with 2 input neurons and an output neuron. The activation function is a sigmoid function. If you understand this, calculations with hidden neurons are similar: the output from one layer becomes the input for the next layer.
Iteration 1:
Initial weights (randomly chosen (wo, w1, w2)): 0.1, 0.1, 0.1
Bias: 1
Input value: 0.2, 0.1 [1st Row]
y = 1*0.1 + 0.1*(0.2) + 0.1*(0.1) = 0.13
y = 1 / (1 + e ^ (0.13)) = 0.467
ycap = 0 [prediction is incorrect]
Now, we'll re-calculate the weights using the equation above:
wo = 0.1 - 1*1/2[(0 - 1)*1] = 0.6
w1 = 0.1 - 1*1/2[(0 - 1)*0.2] = 0.2
w2 = 0.1 - 1*1/2[(0 - 1)*0.1] = 0.15
New weights: 0.6,0.2,0.15
Input value: 0.4, 0.6 [2nd Row]
y = 1*0.6 + 0.2*(0.4) + 0.15*(0.6) = 0.77
y = 1 / (1 + e ^ (0.77)) = 0.316
ycap = 0 [prediction is correct]
Since the prediction is correct, we'll continue with the same weights:
Weights: 0.6,0.2,0.15
Input value: 0.5, 0.2
y =1*0.6 + 0.2*(0.5) + 0.15*(0.2) = 0.73
y = 1 / (1 + e ^ (0.73)) = 0.323
ycap = 0 [prediction is incorrect]
Again, the algorithm will recompute the weights for Iteration 2, and so on. In practice, these iterations continue until the user-defined stopping criterion is reached or the algorithm converges.
Since the algorithm uses every row in the data to compute the weight updates, what if your data set has 10 million rows? You are lucky if you have a powerful computational machine. But for the unlucky ones? Don't get upset; you can use the stochastic gradient descent algorithm.
The only difference between gradient descent and stochastic gradient descent (SGD) is that SGD
takes one observation (or a batch) at a time instead of all the observations. It assumes that the
gradient for a cost function computed for a particular row of observations will be approximately
equal to the gradient computed across all rows.
It updates the parameters (bias and weights) for each training example. SGD is also widely used in online learning algorithms.
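Here is a short, hedged R sketch of these per-example updates on the toy data above. It mirrors the update rule used in the hand calculation (step size 1 together with the 1/2 factor from the cost function), but it uses the standard sigmoid 1/(1 + e^(-x)), so the intermediate probabilities, and hence the updates, differ slightly from the numbers worked out by hand:

# Toy data: Age, CGPA, Target (the same three rows as above)
X <- matrix(c(0.2, 0.1,
              0.4, 0.6,
              0.5, 0.2), ncol = 2, byrow = TRUE)
y <- c(1, 0, 1)

w   <- c(0.1, 0.1, 0.1)   # (wo, w1, w2): bias and two weights
eta <- 1                  # step size

for (i in 1:nrow(X)) {
  xi    <- c(1, X[i, ])                    # prepend 1 for the bias term
  p     <- 1 / (1 + exp(-sum(w * xi)))     # sigmoid of the weighted sum
  y_hat <- as.numeric(p > 0.5)             # threshold the probability at 0.5
  if (y_hat != y[i]) {
    w <- w - eta * 0.5 * (y_hat - y[i]) * xi   # update only when the prediction is wrong
  }
}
w   # weights after one pass (epoch) over the data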
Practical Deep Learning (+ Tuning) with H2O and MXNet
So far, we have focused on the conceptual part of deep learning. Now, we'll get some hands-on experience in building deep learning models. R offers a fantastic bouquet of packages for deep learning.
Here, we'll look at two of the most powerful packages built for this purpose.
For this tutorial, I've used the adult data set from the UC Irvine ML repository. Let's start with H2O. This data set isn't the most ideal one for neural networks; however, the motive of this hands-on section is to make you familiar with the model-building process.
H2O Package
The H2O package provides the h2o.deeplearning function for model building. It is built on Java. Primarily, this function is useful for building multilayer feedforward neural networks. It comes with several features, such as the following:
Multi-threaded distributed parallel computation
Adaptive learning rate (or step size) for faster convergence
Regularization options such as L1 and L2 which help prevent overfitting
Automatic missing value imputation
Hyperparameter optimization using grid/random search
There are many more!
For optimization, this package uses the Hogwild! method rather than plain stochastic gradient descent; Hogwild! is essentially a lock-free, parallelized version of SGD.
Let's understand the parameters involved in model building with h2o. Both the packages have
different nomenclatures, so make sure you don't get confused. Since most of the parameters are
easy to understand by their names, I'll mention the important ones:
1. hidden - It specifies the number of hidden layers and the number of neurons in each layer of the architecture.
2. epochs - It specifies the number of iterations to be done on the data set.
3. rate - It specifies the learning rate.
4. activation - It specifies the type of activation function to use. In h2o, the major activation
functions are Tanh, Rectifier, and Maxout.
Let's quickly load the data and get the routine data pre-processing steps out of the way:
path = "~/mydata/deeplearning"
setwd(path)

#load libraries
library(data.table)
library(mlr)

#set variable names
setcol <- c("age",
            "workclass",
            "fnlwgt",
            "education",
            "education-num",
            "marital-status",
            "occupation",
            "relationship",
            "race",
            "sex",
            "capital-gain",
            "capital-loss",
            "hours-per-week",
            "native-country",
            "target")

#load data
train <- read.table("adultdata.txt", header = F, sep = ",", col.names = setcol, na.strings = c(" ?"), stringsAsFactors = F)
test <- read.table("adulttest.txt", header = F, sep = ",", col.names = setcol, skip = 1, na.strings = c(" ?"), stringsAsFactors = F)
setDT(train)
setDT(test)

#data sanity
dim(train) #32561 X 15
dim(test) #16281 X 15
str(train)
str(test)

#check missing values
table(is.na(train))
sapply(train, function(x) sum(is.na(x))/length(x))*100
table(is.na(test))
sapply(test, function(x) sum(is.na(x))/length(x))*100

#check target variable
#binary in nature, check if data is imbalanced
train[,.N/nrow(train),target]
test[,.N/nrow(test),target]

#remove extra characters
test[,target := substr(target,start = 1,stop = nchar(target)-1)]

#remove leading whitespace
library(stringr)
char_col <- colnames(train)[sapply(test,is.character)]
for(i in char_col)
  set(train,j=i,value = str_trim(train[[i]],side = "left"))

#set all character variables as factor
fact_col <- colnames(train)[sapply(train,is.character)]
for(i in fact_col)
  set(train,j=i,value = factor(train[[i]]))
for(i in fact_col)
  set(test,j=i,value = factor(test[[i]]))

#impute missing values
imp1 <- impute(data = train,target = "target",classes = list(integer = imputeMedian(), factor = imputeMode()))
imp2 <- impute(data = test,target = "target",classes = list(integer = imputeMedian(), factor = imputeMode()))
train <- setDT(imp1$data)
test <- setDT(imp2$data)

Now, let's build a simple deep learning model. Generally, computing variable importance from a trained deep learning model is quite painstaking. But the h2o package provides an effortless function to compute variable importance from a deep learning model.
#load the package
require(h2o)

#start h2o
localH2o <- h2o.init(nthreads = -1, max_mem_size = "20G")

#load data on H2o
trainh2o <- as.h2o(train)
testh2o <- as.h2o(test)

#set variables
y <- "target"
x <- setdiff(colnames(trainh2o),y)

#train the model with the default hidden layers
deepmodel <- h2o.deeplearning(x = x
                              ,y = y
                              ,training_frame = trainh2o
                              ,standardize = T
                              ,model_id = "deep_model"
                              ,activation = "Rectifier"
                              ,epochs = 100
                              ,seed = 1
                              ,nfolds = 5
                              ,variable_importances = T)

#compute variable importance and performance
h2o.varimp_plot(deepmodel,num_of_features = 20)
h2o.performance(deepmodel,xval = T) #84.5 % CV accuracy
Now, let's train a deep learning model with one hidden layer comprising five neurons. This time
instead of checking the cross-validation accuracy, we'll validate the model on test data.
deepmodel <- h2o.deeplearning(x = x
                              ,y = y
                              ,training_frame = trainh2o
                              ,validation_frame = testh2o
                              ,standardize = T
                              ,model_id = "deep_model"
                              ,activation = "Rectifier"
                              ,epochs = 100
                              ,seed = 1
                              ,hidden = 5
                              ,variable_importances = T)
h2o.performance(deepmodel,valid = T) #85.6%
For hyperparameter tuning, we'll perform a random grid search over the parameters and choose the model that returns the highest accuracy.
#set parameter space
activation_opt <- c("Rectifier","RectifierWithDropout", "Maxout","MaxoutWithDropout")
hidden_opt <- list(c(10,10),c(20,15),c(50,50,50))
l1_opt <- c(0,1e-3,1e-5)
l2_opt <- c(0,1e-3,1e-5)

hyper_params <- list(activation = activation_opt,
                     hidden = hidden_opt,
                     l1 = l1_opt,
                     l2 = l2_opt)

#set search criteria
search_criteria <- list(strategy = "RandomDiscrete", max_models = 10)

#train model
dl_grid <- h2o.grid("deeplearning"
                    ,grid_id = "deep_learn"
                    ,hyper_params = hyper_params
                    ,search_criteria = search_criteria
                    ,training_frame = trainh2o
                    ,x = x
                    ,y = y
                    ,nfolds = 5
                    ,epochs = 100)

#get best model
d_grid <- h2o.getGrid("deep_learn",sort_by = "accuracy")
best_dl_model <- h2o.getModel(d_grid@model_ids[[1]])
h2o.performance(best_dl_model,xval = T) #CV Accuracy - 84.7%
MXNetR Package
The mxnet package provides an incredible interface for building feedforward NNs, recurrent NNs, and convolutional neural networks (CNNs). CNNs are widely used for detecting objects in images. The team that created xgboost also created this package. Currently, mxnet is popular in Kaggle competitions for image classification problems.
This package can be easily connected with GPUs as well. The process of building model
architecture is quite intuitive. It gives greater control to configure the neural network manually.
Let's get some hands-on experience using this package.
Follow the commands below to install this package on your OS. Installation commands for Windows and Linux are given below; Mac users can follow the linked installation procedure.
# Installation - Windows
install.packages("drat", repos = "https://cran.rstudio.com")
drat:::addRepo("dmlc")
install.packages("mxnet")
library(mxnet)

# Installation - Linux
# Press Ctrl + Alt + T and run the following commands
sudo apt-get update
sudo apt-get -y install git
git clone https://github.com/dmlc/mxnet.git ~/mxnet --recursive
cd ~/mxnet/setup-utils
bash install-mxnet-ubuntu-r.sh
In R, mxnet accepts the target variable as a numeric class rather than a factor. Also, it accepts the data as a matrix rather than a data frame. Now, we'll make the required changes:
#load package
require(mxnet)

#convert target variables into numeric
train[,target := as.numeric(target)-1]
test[,target := as.numeric(target)-1]

#convert train data to matrix
train.x <- data.matrix(train[,-c("target"),with=F])
train.y <- train$target

#convert test data to matrix
test.x <- data.matrix(test[,-c("target"),with=F])
test.y <- test$target
Now, we'll train the multilayered perceptron model using the mx.mlp function.
#set seed to reproduce results
mx.set.seed(1)

mlpmodel <- mx.mlp(data = train.x
                   ,label = train.y
                   ,hidden_node = 3 #one hidden layer with 3 nodes
                   ,out_node = 2
                   ,out_activation = "softmax" #softmax returns class probabilities
                   ,num.round = 100 #number of iterations over training data
                   ,array.batch.size = 20 #weights are updated after every batch
                   ,learning.rate = 0.03 #same as step size
                   ,eval.metric = mx.metric.accuracy
                   ,eval.data = list(data = test.x, label = test.y))
The softmax function is used for binary and multi-class classification problems. Alternatively, you can also craft the model structure manually:
#create NN structure
data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, num_hidden = 3) #3 neurons in one layer
lrm <- mx.symbol.SoftmaxOutput(fc1)
We have configured the network above with one hidden layer carrying three neurons and chosen softmax as the output function. The network optimizes squared loss for regression problems and classification accuracy for classification problems. Now, we'll train the network:
nnmodel <- mx.model.FeedForward.create(symbol = lrm
                                       ,X = train.x
                                       ,y = train.y
                                       ,ctx = mx.cpu()
                                       ,num.round = 100
                                       ,eval.metric = mx.metric.accuracy
                                       ,array.batch.size = 50
                                       ,learning.rate = 0.01)
Similarly, we can configure a more complex network with additional hidden layers.
#configure another network
data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, name = "fc1", num_hidden = 10) #1st hidden layer with 10 neurons
act1 <- mx.symbol.Activation(fc1, name = "relu1", act_type = "relu")
fc2 <- mx.symbol.FullyConnected(act1, name = "fc2", num_hidden = 2) #2nd layer (the output layer)
out <- mx.symbol.SoftmaxOutput(fc2, name = "soft")
Understand it carefully: after the input data is fed in, the first hidden layer consists of 10 neurons. The output of each neuron passes through a relu (rectified linear) activation function, which we have used in place of the sigmoid because relu converges faster than a sigmoid function. You can read more about relu here.
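As a side note, relu itself is a one-liner in R (an illustrative sketch):

# Rectified linear unit: zero for negative inputs, identity for positive ones
relu <- function(x) pmax(0, x)

relu(c(-2, -0.5, 0, 1.5))   # 0.0 0.0 0.0 1.5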
Then, the output is fed into the second layer, which is the output layer. Since our target variable has two classes, we've set num_hidden to 2 in the second layer. Finally, the output from the second layer is passed through the softmax output function.
#train the network
dp_model <- mx.model.FeedForward.create(symbol = out
                                        ,X = train.x
                                        ,y = train.y
                                        ,ctx = mx.cpu()
                                        ,num.round = 100
                                        ,eval.metric = mx.metric.accuracy
                                        ,array.batch.size = 50
                                        ,learning.rate = 0.005)
As mentioned above, this trained model predicts output probability, which can be easily
transformed into a label using a threshold value (say, 0.5). To make predictions on the test set, we
do this:
#predict on test
pred_dp <- predict(dp_model,test.x)
str(pred_dp) #contains 2 rows and 16281 columns

#transpose the pred matrix
pred.val <- max.col(t(pred_dp))-1
The prediction matrix has two rows and 16,281 columns, with each column carrying the class probabilities for one test observation. Using the max.col function, we can extract the index of the maximum value in each row of the transposed matrix, i.e., the predicted class. If you check the model's accuracy, you'll find that this network performs terribly on this data; it gives no better result than the training accuracy. On this data set, a tuned xgboost model gave 87% accuracy!
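As a quick sanity check (a hedged sketch, assuming pred.val and test.y from the code above are still in your session), the accuracy and confusion matrix can be computed like this:

# Accuracy of the predicted labels against the true test labels
mean(pred.val == test.y)

# Confusion matrix: predicted vs actual classes
table(predicted = pred.val, actual = test.y)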
If you are familiar with the model-building process, I'd suggest you try working on the popular MNIST data set. You can find tons of tutorials on this data set to get you going!
Summary
Deep Learning is getting increasingly popular for solving the most complex problems such as image recognition and natural language processing. If you are aspiring to a career in machine learning, this is the best time to get into this subject. The motive of this article was to introduce you to the fundamental concepts of deep learning.
In this article, we learned the basics of deep learning (perceptrons, neural networks, and multilayered neural networks). We learned that deep learning as a technique relies on algorithms such as backpropagation and gradient descent to optimize the networks. In the end, we gained some hands-on experience in developing deep learning models.
Do let me know if you have any feedback, suggestions, or thoughts on this article in the comments
below!
About the author
Manish Saraswat
Making an effort to help people understand machine learning. I believe your educational background doesn't stop you from pursuing ML and data science. I earned a Masters in F/M and am a self-taught data science professional. Previously I worked at Analytics Vidhya; now I'm solving ML and growth challenges at HackerEarth!
A U T H O R P O S T
Y O U M A Y A L S O L I K E
NEVER MISS A POST AGAIN
Subscribe to get latest updates delivered fresh to your inbox
Top 17 Competitive Data Scientists From India on Kaggle
Practical Guide to Clustering Algorithms & Evaluation in R
13 Free Training Courses on Machine Learning and Artificial Intelligence
Explaining The Basics of Machine Learning, Algorithms and Applications
How can R Users Learn Python for Data Science ?
Search... 
Email
S u b s c r i b e N o w
0
Comments
HackerEarth Login1
Share⤤ Sort by Best
Start the discussion…
Subscribe✉ Add Disqus to your sited Privacy�
Recommend
A B O U T U S
Blog
Engineering Blog
Updates & Releases
Team
Careers
In the Press
T O P C A T E G O R I E S
Hiring
Placements
Hackathons
Community
Competitive Programming
Culture
R E S O U R C E S
Webinars
Podcasts
CodeTable
Hackathon Handbook
Complete Reference to Competitive Programming
How to get started with Open Source
F O R C O M P A N I E S
Recruit
Assessment
Sourcing
Host Hackathons
Interview
© 2017
HackerEarth
Ad

More Related Content

What's hot (20)

Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...
Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...
Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...
Simplilearn
 
Deep Learning With Python | Deep Learning And Neural Networks | Deep Learning...
Deep Learning With Python | Deep Learning And Neural Networks | Deep Learning...Deep Learning With Python | Deep Learning And Neural Networks | Deep Learning...
Deep Learning With Python | Deep Learning And Neural Networks | Deep Learning...
Simplilearn
 
Artificial Neural Network (draft)
Artificial Neural Network (draft)Artificial Neural Network (draft)
Artificial Neural Network (draft)
James Boulie
 
Neural network and fuzzy logic
Neural network and fuzzy logicNeural network and fuzzy logic
Neural network and fuzzy logic
Lakshmi Sarveypalli
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
Mohd Arafat Shaikh
 
Ppt on artifishail intelligence
Ppt on artifishail intelligencePpt on artifishail intelligence
Ppt on artifishail intelligence
snehal_gongle
 
Neural networks introduction
Neural networks introductionNeural networks introduction
Neural networks introduction
آيةالله عبدالحكيم
 
Artificial Neural Networks Lect1: Introduction & neural computation
Artificial Neural Networks Lect1: Introduction & neural computationArtificial Neural Networks Lect1: Introduction & neural computation
Artificial Neural Networks Lect1: Introduction & neural computation
Mohammed Bennamoun
 
Neural networks
Neural networksNeural networks
Neural networks
Basil John
 
Artificial neural network paper
Artificial neural network paperArtificial neural network paper
Artificial neural network paper
AkashRanjandas1
 
Intro to Neural Networks
Intro to Neural NetworksIntro to Neural Networks
Intro to Neural Networks
Dean Wyatte
 
Ann
Ann Ann
Ann
vini89
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
Imtiaz Siddique
 
Artificial Neural Network seminar presentation using ppt.
Artificial Neural Network seminar presentation using ppt.Artificial Neural Network seminar presentation using ppt.
Artificial Neural Network seminar presentation using ppt.
Mohd Faiz
 
Deep Learning: Application & Opportunity
Deep Learning: Application & OpportunityDeep Learning: Application & Opportunity
Deep Learning: Application & Opportunity
iTrain
 
Artificial neural network for machine learning
Artificial neural network for machine learningArtificial neural network for machine learning
Artificial neural network for machine learning
grinu
 
Neural network & its applications
Neural network & its applications Neural network & its applications
Neural network & its applications
Ahmed_hashmi
 
Artificial intelligence NEURAL NETWORKS
Artificial intelligence NEURAL NETWORKSArtificial intelligence NEURAL NETWORKS
Artificial intelligence NEURAL NETWORKS
REHMAT ULLAH
 
Perceptron & Neural Networks
Perceptron & Neural NetworksPerceptron & Neural Networks
Perceptron & Neural Networks
NAGUR SHAREEF SHAIK
 
Neural network and artificial intelligent
Neural network and artificial intelligentNeural network and artificial intelligent
Neural network and artificial intelligent
HapPy SumOn
 
Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...
Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...
Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...
Simplilearn
 
Deep Learning With Python | Deep Learning And Neural Networks | Deep Learning...
Deep Learning With Python | Deep Learning And Neural Networks | Deep Learning...Deep Learning With Python | Deep Learning And Neural Networks | Deep Learning...
Deep Learning With Python | Deep Learning And Neural Networks | Deep Learning...
Simplilearn
 
Artificial Neural Network (draft)
Artificial Neural Network (draft)Artificial Neural Network (draft)
Artificial Neural Network (draft)
James Boulie
 
Ppt on artifishail intelligence
Ppt on artifishail intelligencePpt on artifishail intelligence
Ppt on artifishail intelligence
snehal_gongle
 
Artificial Neural Networks Lect1: Introduction & neural computation
Artificial Neural Networks Lect1: Introduction & neural computationArtificial Neural Networks Lect1: Introduction & neural computation
Artificial Neural Networks Lect1: Introduction & neural computation
Mohammed Bennamoun
 
Neural networks
Neural networksNeural networks
Neural networks
Basil John
 
Artificial neural network paper
Artificial neural network paperArtificial neural network paper
Artificial neural network paper
AkashRanjandas1
 
Intro to Neural Networks
Intro to Neural NetworksIntro to Neural Networks
Intro to Neural Networks
Dean Wyatte
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
Imtiaz Siddique
 
Artificial Neural Network seminar presentation using ppt.
Artificial Neural Network seminar presentation using ppt.Artificial Neural Network seminar presentation using ppt.
Artificial Neural Network seminar presentation using ppt.
Mohd Faiz
 
Deep Learning: Application & Opportunity
Deep Learning: Application & OpportunityDeep Learning: Application & Opportunity
Deep Learning: Application & Opportunity
iTrain
 
Artificial neural network for machine learning
Artificial neural network for machine learningArtificial neural network for machine learning
Artificial neural network for machine learning
grinu
 
Neural network & its applications
Neural network & its applications Neural network & its applications
Neural network & its applications
Ahmed_hashmi
 
Artificial intelligence NEURAL NETWORKS
Artificial intelligence NEURAL NETWORKSArtificial intelligence NEURAL NETWORKS
Artificial intelligence NEURAL NETWORKS
REHMAT ULLAH
 
Neural network and artificial intelligent
Neural network and artificial intelligentNeural network and artificial intelligent
Neural network and artificial intelligent
HapPy SumOn
 

Similar to Understanding Deep Learning & Parameter Tuning with MXnet, H2o Package in R (20)

Neural Networks Basic Concepts and Deep Learning
Neural Networks Basic Concepts and Deep LearningNeural Networks Basic Concepts and Deep Learning
Neural Networks Basic Concepts and Deep Learning
rahuljain582793
 
Cnn
CnnCnn
Cnn
rimshailyas1
 
Data Science - Part VIII - Artifical Neural Network
Data Science - Part VIII -  Artifical Neural NetworkData Science - Part VIII -  Artifical Neural Network
Data Science - Part VIII - Artifical Neural Network
Derek Kane
 
Neural Network
Neural NetworkNeural Network
Neural Network
Ashish Kumar
 
Deep Learning Study _ FInalwithCNN_RNN_LSTM_GRU.pdf
Deep Learning Study _ FInalwithCNN_RNN_LSTM_GRU.pdfDeep Learning Study _ FInalwithCNN_RNN_LSTM_GRU.pdf
Deep Learning Study _ FInalwithCNN_RNN_LSTM_GRU.pdf
naveenraghavendran10
 
Fuzzy Logic Final Report
Fuzzy Logic Final ReportFuzzy Logic Final Report
Fuzzy Logic Final Report
Shikhar Agarwal
 
introduction to deeplearning
introduction to deeplearningintroduction to deeplearning
introduction to deeplearning
Eyad Alshami
 
Artificial Neural networks
Artificial Neural networksArtificial Neural networks
Artificial Neural networks
Learnbay Datascience
 
Neural Networks
Neural NetworksNeural Networks
Neural Networks
Shahid Rajaee
 
INTRODUCTION TO NEURAL NETWORKS FINAL YEAR
INTRODUCTION TO NEURAL NETWORKS FINAL YEARINTRODUCTION TO NEURAL NETWORKS FINAL YEAR
INTRODUCTION TO NEURAL NETWORKS FINAL YEAR
ARVIND SARDAR
 
Lecture7_Neural Networks_and_analysis2024.pdf
Lecture7_Neural Networks_and_analysis2024.pdfLecture7_Neural Networks_and_analysis2024.pdf
Lecture7_Neural Networks_and_analysis2024.pdf
adiworks30
 
BASIC CONCEPT OF DEEP LEARNING.pptx
BASIC CONCEPT OF DEEP LEARNING.pptxBASIC CONCEPT OF DEEP LEARNING.pptx
BASIC CONCEPT OF DEEP LEARNING.pptx
RiteshPandey184067
 
Intro to Deep learning - Autoencoders
Intro to Deep learning - Autoencoders Intro to Deep learning - Autoencoders
Intro to Deep learning - Autoencoders
Akash Goel
 
Artificial Neural Networks ppt.pptx for final sem cse
Artificial Neural Networks  ppt.pptx for final sem cseArtificial Neural Networks  ppt.pptx for final sem cse
Artificial Neural Networks ppt.pptx for final sem cse
NaveenBhajantri1
 
Dr. Syed Muhammad Ali Tirmizi - Special topics in finance lec 13
Dr. Syed Muhammad Ali Tirmizi - Special topics in finance   lec 13Dr. Syed Muhammad Ali Tirmizi - Special topics in finance   lec 13
Dr. Syed Muhammad Ali Tirmizi - Special topics in finance lec 13
Dr. Muhammad Ali Tirmizi., Ph.D.
 
Deep learning vxcvbfsdfaegsr gsgfgsdg sd gdgd gdgd gse
Deep learning vxcvbfsdfaegsr gsgfgsdg  sd gdgd gdgd gseDeep learning vxcvbfsdfaegsr gsgfgsdg  sd gdgd gdgd gse
Deep learning vxcvbfsdfaegsr gsgfgsdg sd gdgd gdgd gse
NeelaGargeya
 
Deep learning - A Visual Introduction
Deep learning - A Visual IntroductionDeep learning - A Visual Introduction
Deep learning - A Visual Introduction
Lukas Masuch
 
Deep Learning Demystified
Deep Learning DemystifiedDeep Learning Demystified
Deep Learning Demystified
Affine Analytics
 
Artificial Neural Networks: Applications In Management
Artificial Neural Networks: Applications In ManagementArtificial Neural Networks: Applications In Management
Artificial Neural Networks: Applications In Management
IOSR Journals
 
deep learning
deep learningdeep learning
deep learning
Hassanein Alwan
 
Neural Networks Basic Concepts and Deep Learning
Neural Networks Basic Concepts and Deep LearningNeural Networks Basic Concepts and Deep Learning
Neural Networks Basic Concepts and Deep Learning
rahuljain582793
 
Data Science - Part VIII - Artifical Neural Network
Data Science - Part VIII -  Artifical Neural NetworkData Science - Part VIII -  Artifical Neural Network
Data Science - Part VIII - Artifical Neural Network
Derek Kane
 
Deep Learning Study _ FInalwithCNN_RNN_LSTM_GRU.pdf
Deep Learning Study _ FInalwithCNN_RNN_LSTM_GRU.pdfDeep Learning Study _ FInalwithCNN_RNN_LSTM_GRU.pdf
Deep Learning Study _ FInalwithCNN_RNN_LSTM_GRU.pdf
naveenraghavendran10
 
Fuzzy Logic Final Report
Fuzzy Logic Final ReportFuzzy Logic Final Report
Fuzzy Logic Final Report
Shikhar Agarwal
 
introduction to deeplearning
introduction to deeplearningintroduction to deeplearning
introduction to deeplearning
Eyad Alshami
 
INTRODUCTION TO NEURAL NETWORKS FINAL YEAR
INTRODUCTION TO NEURAL NETWORKS FINAL YEARINTRODUCTION TO NEURAL NETWORKS FINAL YEAR
INTRODUCTION TO NEURAL NETWORKS FINAL YEAR
ARVIND SARDAR
 
Lecture7_Neural Networks_and_analysis2024.pdf
Lecture7_Neural Networks_and_analysis2024.pdfLecture7_Neural Networks_and_analysis2024.pdf
Lecture7_Neural Networks_and_analysis2024.pdf
adiworks30
 
BASIC CONCEPT OF DEEP LEARNING.pptx
BASIC CONCEPT OF DEEP LEARNING.pptxBASIC CONCEPT OF DEEP LEARNING.pptx
BASIC CONCEPT OF DEEP LEARNING.pptx
RiteshPandey184067
 
Intro to Deep learning - Autoencoders
Intro to Deep learning - Autoencoders Intro to Deep learning - Autoencoders
Intro to Deep learning - Autoencoders
Akash Goel
 
Artificial Neural Networks ppt.pptx for final sem cse
Artificial Neural Networks  ppt.pptx for final sem cseArtificial Neural Networks  ppt.pptx for final sem cse
Artificial Neural Networks ppt.pptx for final sem cse
NaveenBhajantri1
 
Dr. Syed Muhammad Ali Tirmizi - Special topics in finance lec 13
Dr. Syed Muhammad Ali Tirmizi - Special topics in finance   lec 13Dr. Syed Muhammad Ali Tirmizi - Special topics in finance   lec 13
Dr. Syed Muhammad Ali Tirmizi - Special topics in finance lec 13
Dr. Muhammad Ali Tirmizi., Ph.D.
 
Deep learning vxcvbfsdfaegsr gsgfgsdg sd gdgd gdgd gse
Deep learning vxcvbfsdfaegsr gsgfgsdg  sd gdgd gdgd gseDeep learning vxcvbfsdfaegsr gsgfgsdg  sd gdgd gdgd gse
Deep learning vxcvbfsdfaegsr gsgfgsdg sd gdgd gdgd gse
NeelaGargeya
 
Deep learning - A Visual Introduction
Deep learning - A Visual IntroductionDeep learning - A Visual Introduction
Deep learning - A Visual Introduction
Lukas Masuch
 
Artificial Neural Networks: Applications In Management
Artificial Neural Networks: Applications In ManagementArtificial Neural Networks: Applications In Management
Artificial Neural Networks: Applications In Management
IOSR Journals
 
Ad

Recently uploaded (20)

Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdfIAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
mcgardenlevi9
 
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptxmd-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
fatimalazaar2004
 
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTemplate_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
cegiver630
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
LLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bertLLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bert
ChadapornK
 
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Abodahab
 
DPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdfDPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdf
inmishra17121973
 
183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag
fardin123rahman07
 
chapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.pptchapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.ppt
justinebandajbn
 
Stack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptxStack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptx
binduraniha86
 
FPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptxFPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptx
ssuser4ef83d
 
Flip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptxFlip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptx
mubashirkhan45461
 
Medical Dataset including visualizations
Medical Dataset including visualizationsMedical Dataset including visualizations
Medical Dataset including visualizations
vishrut8750588758
 
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
gmuir1066
 
Ch3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendencyCh3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendency
ayeleasefa2
 
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.pptJust-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
ssuser5f8f49
 
03 Daniel 2-notes.ppt seminario escatologia
03 Daniel 2-notes.ppt seminario escatologia03 Daniel 2-notes.ppt seminario escatologia
03 Daniel 2-notes.ppt seminario escatologia
Alexander Romero Arosquipa
 
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjks
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjksPpt. Nikhil.pptxnshwuudgcudisisshvehsjks
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjks
panchariyasahil
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdfIAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
mcgardenlevi9
 
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptxmd-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
fatimalazaar2004
 
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTemplate_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
cegiver630
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
LLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bertLLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bert
ChadapornK
 
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Abodahab
 
DPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdfDPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdf
inmishra17121973
 
183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag
fardin123rahman07
 
chapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.pptchapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.ppt
justinebandajbn
 
Stack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptxStack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptx
binduraniha86
 
FPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptxFPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptx
ssuser4ef83d
 
Flip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptxFlip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptx
mubashirkhan45461
 
Medical Dataset including visualizations
Medical Dataset including visualizationsMedical Dataset including visualizations
Medical Dataset including visualizations
vishrut8750588758
 
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
gmuir1066
 
Ch3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendencyCh3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendency
ayeleasefa2
 
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.pptJust-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
ssuser5f8f49
 
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjks
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjksPpt. Nikhil.pptxnshwuudgcudisisshvehsjks
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjks
panchariyasahil
 
Ad

Understanding Deep Learning & Parameter Tuning with MXnet, H2o Package in R

  • 1.    Share  2 Understanding Deep Learning & Parameter Tuning with MXnet, H2o Package in R M A C H I N E L E A R N I N GR  January 30, 2017 Introduction Deep Learning isn't a recent discovery. The seeds were sown back in the 1950s when the first artificial neural network was created. Since then, progress has been rapid, with the structure of the neuron being "re-invented" artificially. Computers and mobiles have now become powerful enough to identify objects from images. Not just images, they can chat with you as well! Haven't you tried Google's Allo app ? That's not all —they can drive, make supersonic calculations, and help businesses solve the most complicated problems (more users, revenue, etc). But, what is driving all these inventions? It's Deep Learning! With increasing open source contributions, R language now provides a fantastic interface for building predictive models based on neural networks and deep learning. However, learning to CATEGORY
  • 2. build models isn't enough. You ought to understand the interesting story behind them. In this tutorial, I'll start with the basics of neural networks and deep learning (from scratch). Along with theory, we'll also learn to build deep learning models in R using MXNet and H2O package. Also, we'll learn to tune parameters of a deep learning model for better model performance. Note: This article is meant for beginners and expects no prior understanding of deep learning (or neural networks). Table of Contents 1. What is Deep Learning ? How is it different from a Neural Network? 2. How does Deep Learning work ? Why is bias added to the network ? What are activation functions and their types ? 3. Multi Layered Neural Networks What is Backpropagation Algorithm ? How does it work ? Gradient Descent 4. Practical Deep Learning with H2O & MXnet What is Deep Learning ? How is it different from a Neural Network? Deep Learning is the new name for multilayered neural networks. You can say, deep learning is an enhanced and powerful form of a neural network. The difference between the two is subtle. The difference lies in the fact that, deep learning models are build on several hidden layers (say, more than 2) as compared to a neural network (built on up to 2 layers). Since data comes in many forms (tables, images, sound, web etc), it becomes extremely difficult for linear methods to learn and detect the non - linearity in the data. In fact, many a times even non-linear algorithms such as tree based (GBM, decision tree) fails to learn from data. In such cases, a multi layered neural network which creates non - linear interactions among the features (i.e. goes deep into features) gives a better solution. You might ask this question, 'Neural networks emerged in 1950s. But, deep learning emerged just few years back. What happened all of a sudden in last few years?' In the last few years, there has been tremendous advancement in computational devices (specially GPUs). High performance of deep learning models come with a cost i.e. computation. They require large memory for computation. The world is continually progressing from CPU to GPU (Graphics Processing Unit). Why ? Because, a CPU can be enabled with max. 22 cores, but a GPU can contain thousands of cores, thereby making it exponentially powerful than a CPU.
  • 3. Upcoming Webinar: How to Become a Data Scientist ? How does Deep Learning work ? To understand deep learning, let's start with basic form of neural network architecture i.e. perceptron. A Neural Network draws its structure from a human neuron. A human neuron looks like this: Yes, you have it too. And, not just one, but billions. We have billions of neurons and trillions of synapses (electric signals) which pass through them. Watch this short video (~2mins) to understand your brain better. It works like this: 1. The dendrites receive the input signal (message). 2. These dendrites apply a weight to the input signal. Think of weight as "importance factor" i.e. higher the weight, higher the importance of signal. 3. The soma (cell body) acts on the input signal and does the necessary computation (decision making). 4. Then, the signal passes through the axon via a threshold function. This function decides whether the signal needs to be passed further. 5. If the input signal exceeds the threshold, the signal gets fired though the axon to terminals to other neuron. This is a simplistic explanation of human neurons. The idea is to make you understand the analogy between human and artificial neurons. Now, let's understand the working of an artificial neuron. The process is quite similar to the explanation above. Make sure you understand it well because it's the fundamental concept of neural network. A simplistic artificial neuron looks like this:
  • 4. Here x1, x2, ... xn are the input variables (or independent variables). As the input variables are fed into the network, they get assigned some random weights (w1,w2...wn). Alongside, a bias (wo) is added to the network (explained below). The adder adds all the weighted input variable. The output (y) is passed through the activation function and calculated using the equation: where wo = bias, wi = weights, xi = input variables. The function g() is the activation function. In this case, the activation function works like this: if the weighted sum of input variables exceeds a certain threshold, it will output 1, else 0. This simple neuron model is also known as McCulloch-Pitts model or Perceptron. In simple words, a perceptron takes several input variables and returns a binary output. Why binary output ? Because, it uses a sigmoid function as the activation function (explained below). If you remove the activation function, what you get is a simple regression model. After adding the sigmoid activation function, it performs the same task as logistic regression. However, perceptron isn't powerful enough to work on linearly inseparable data. Due to its limitations, Multilayer Perceptron (MLP) came into existence. If the perceptron is one neuron, think of MLP as a complete brain which comprises several neurons. Why is bias added in the neural network ? Bias (wo) is similar to the intercept term in linear regression. It helps improve the accuracy of prediction by shifting the decision boundary along Y axis. For example, in the image shown below, had the slope emerged from the origin, the error would have been higher than the error after adding the intercept to the slope.
  • 5. Similarly, in a neural network, the bias helps in shifting the decision boundary to achieve better predictions. What are activation functions and their types ? The perceptron classifies instances by processing a linear combination of input variables through the activation function. We also learned above that the perceptron algorithm returns binary output by using a sigmoid function (shown below). A sigmoid function (or logistic neuron ) is used in logistic regression. This function caps the max and min values at 1 and 0 such that any large positive number becomes 1 and large negative number becomes 0. It is used in neural networks because it has nice mathematical properties (derivative is easier to compute), which help calculate gradient in the backpropagation method (explained below). In general, activation functions govern the type of decision boundary to produce given a non- linear combination of input variables. Also, due to their mathematical properties, activation functions play a significant role in optimizing prediction accuracy. Here is a complete list of activation functions you can find. Multi Layer Neural Network (or Deep Learning Model) A multilayered neural network comprises a chain of interconnected neurons which creates the neural architecture. As shown below, along with input and output layers, it consists of multiple hidden layers also. Don't worry about the word "hidden;" it's how middle layers are named.
Multi Layer Neural Network (or Deep Learning Model)

A multilayered neural network comprises a chain of interconnected neurons that forms the network architecture. As shown below, along with the input and output layers, it also consists of multiple hidden layers. Don't worry about the word "hidden"; it's simply how the middle layers are named.

The input layer consists of as many neurons as there are input variables in the data. The number of neurons in each hidden layer is chosen by the user; in R, we can find a good number of hidden neurons using a cross-validation strategy. Multilayered neural networks are preferred when the data set has a large number of features, which is why this model is widely used for images, text data, etc.

There are several types of neural networks; two of them are most commonly used:

1. Feedforward Neural Network: In this network, the information flows in one direction, i.e., from the input nodes to the output nodes.
2. Recurrent (or Feedback) Neural Network: In this network, the information can also flow from the output of a layer back to earlier layers. It is trained with the backpropagation algorithm as well.

What is the Backpropagation Algorithm ? How does it work ?

The goal of the backpropagation algorithm is to optimize the weights associated with the neurons so that the network learns to predict the output more accurately. Once the predicted value is computed, the error propagates back layer by layer and the weights associated with each neuron are re-calculated. In simple words, it tries to bring the predicted value as close as possible to the actual value. It's quite interesting!

The backpropagation algorithm optimizes the network using a cost function, which is minimized through an iterative sequence of steps called the gradient descent algorithm. Let's first understand it mathematically; then we'll look at an example. We'll take the cost function to be the squared error:

J = (1/(2n)) * Σ (yi - ycap_i)^2

where n = the number of observations in the data, yi = the actual value, and ycap_i = the predicted value. Let's call this equation 1. The constant 1/2 in front is added for computational convenience; you'll see why in a moment.
This cost function is convex in nature. A convex function can be identified by its U-shaped curve (shown below). A great property of a convex function is that its minimum lies where the derivative is zero, and any such minimum is the global one. Think of a ball rolling down the curve: it takes a few rolls up and down to slow down, and it finally settles at the bottom. That bottom point is the minimum, and that's where we want to go!

If we assume that the data is fixed and treat the cost as a function of the weights, we can re-write equation 1 as:

J(w) = (1/(2n)) * Σi (yi - Σj wj*xij)^2

where w is the vector of weights and J(w) is the cost as a function of those weights. Note that we have only substituted ycap_i with its functional form (Σj wj*xij); the rest of the equation is the same. Now, this equation is ready for differentiation. Partially differentiating it with respect to a weight wj gives the general equation:

∂J/∂wj = -(1/n) * Σi (yi - ycap_i) * xij

As you can see, the constant 1/2 cancels against the 2 that comes from differentiating the square. In partial differentiation, we differentiate the equation with respect to one variable while keeping the other variables constant. We also learn that the partial derivative of this cost function is just the difference between the actual and predicted values, multiplied by the corresponding input value, averaged over all n observations. A small R sketch of this cost and its gradient follows.
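Here is a small R sketch of this cost and its gradient under the same linear-output assumption; the variable names and the toy numbers are illustrative only.

# Squared-error cost: J(w) = (1/(2n)) * sum((y - X %*% w)^2)
cost <- function(w, X, y) {
  n <- length(y)
  sum((y - X %*% w)^2) / (2 * n)
}

# Partial derivatives: dJ/dw_j = -(1/n) * sum((y - X %*% w) * x_j)
cost_grad <- function(w, X, y) {
  n <- length(y)
  -t(X) %*% (y - X %*% w) / n
}

X <- cbind(1, c(0.2, 0.4, 0.5), c(0.1, 0.6, 0.2))  # bias column plus two inputs
y <- c(1, 0, 1)
w <- rep(0.1, 3)
cost(w, X, y)       # value of J at the current weights
cost_grad(w, X, y)  # one gradient entry per weight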
The gradient of J(w) involves a sum over every row in the data. To compute the weights efficiently, gradient descent comes into the picture. For a particular weight value, gradient descent works like this:

1. First, it calculates the partial derivative of the cost with respect to that weight.
2. If the derivative is positive, it decreases the weight value.
3. If the derivative is negative, it increases the weight value.
4. The aim is to reach the lowest point of the convex curve, where the derivative is zero.
5. It progresses iteratively using a step size (η), which is defined by the user. Make sure the step size isn't too large or too small: too small a step size takes longer to converge, while too large a step size may never reach the optimum. Remember, the motive of gradient descent is to get to the bottom of the curve.

The gradient descent update can be written as:

wj := wj - η * ∂J/∂wj

Let's understand it using an example. Suppose we have a data set with 2 input variables (inputs are scaled between 0 and 1):

Age    CGPA    Target
0.2    0.1     1
0.4    0.6     0
0.5    0.2     1

Let's run a simple network on this data with 2 input neurons and an output neuron. The activation function is a sigmoid. If you understand this, the calculations with hidden neurons are similar: the output from one layer becomes the input for the next.
Iteration 1:

Initial weights (randomly chosen, (w0, w1, w2)): 0.1, 0.1, 0.1
Bias input: 1

Input values: 0.2, 0.1 [1st row]
z = 1*0.1 + 0.1*(0.2) + 0.1*(0.1) = 0.13
y = 1 / (1 + e^(0.13)) = 0.467
ycap = 0 [prediction is incorrect]

Now, we re-calculate the weights using the update equation above (with η = 1):
w0 = 0.1 - 1 * 1/2 * [(0 - 1) * 1]   = 0.6
w1 = 0.1 - 1 * 1/2 * [(0 - 1) * 0.2] = 0.2
w2 = 0.1 - 1 * 1/2 * [(0 - 1) * 0.1] = 0.15
New weights: 0.6, 0.2, 0.15

Input values: 0.4, 0.6 [2nd row]
z = 1*0.6 + 0.2*(0.4) + 0.15*(0.6) = 0.77
y = 1 / (1 + e^(0.77)) = 0.316
ycap = 0 [prediction is correct]

Since the prediction is correct, we continue with the same weights: 0.6, 0.2, 0.15

Input values: 0.5, 0.2 [3rd row]
z = 1*0.6 + 0.2*(0.5) + 0.15*(0.2) = 0.73
y = 1 / (1 + e^(0.73)) = 0.323
ycap = 0 [prediction is incorrect]

Again, the algorithm recomputes the weights for Iteration 2, and so on. In practice, these iterations continue until a user-defined stopping criterion is reached or the algorithm converges.

Since the algorithm updates the weights using every row in the data, what if your data set has 10 million rows? You are lucky if you have a powerful computational machine. But for the unlucky ones? Don't get upset: you can use the stochastic gradient descent algorithm.

The only difference between gradient descent and stochastic gradient descent (SGD) is that SGD takes one observation (or a small batch) at a time instead of all the observations. It assumes that the gradient computed on a particular row will be approximately equal to the gradient computed across all rows, and it updates the parameters (bias and weights) after each training example. SGD is also widely used in online learning algorithms. A minimal sketch of such an update loop is shown below.
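Below is a minimal sketch of a stochastic gradient descent loop for a single sigmoid unit on the toy Age/CGPA data above. It uses the standard sigmoid sign convention and the logistic-loss gradient (p - y) * x, so its intermediate numbers will differ from the hand calculation; treat it as an illustration of the update rule, not a reproduction of the table.

sigmoid <- function(z) 1 / (1 + exp(-z))

X <- cbind(1, c(0.2, 0.4, 0.5), c(0.1, 0.6, 0.2))  # bias term + Age + CGPA
y <- c(1, 0, 1)
w <- c(0.1, 0.1, 0.1)    # initial weights (w0, w1, w2)
eta <- 0.5               # step size

# SGD: update the weights after every single observation
for (epoch in 1:100) {
  for (i in sample(nrow(X))) {
    p <- sigmoid(sum(w * X[i, ]))        # predicted probability for row i
    w <- w - eta * (p - y[i]) * X[i, ]   # gradient step using this row only
  }
}
round(w, 3)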
Practical Deep Learning (+ Tuning) with H2O and MXNet

Until here, we focused on the conceptual part of deep learning. Now, we'll get some hands-on experience in building deep learning models. R offers a fantastic bouquet of packages for deep learning. Here, we'll look at two of the most powerful packages built for this purpose.

For this tutorial, I've used the adult data set from the UC Irvine ML repository. Let's start with H2O. This data set isn't the most ideal one for neural networks; however, the motive of this hands-on section is to make you familiar with the model-building process.

H2O Package

The H2O package provides the h2o.deeplearning function for model building. It is built on Java. Primarily, this function is useful for building multilayer feedforward neural networks. It comes with several features, such as:

Multi-threaded distributed parallel computation
Adaptive learning rate (or step size) for faster convergence
Regularization options such as L1 and L2, which help prevent overfitting
Automatic missing value imputation
Hyperparameter optimization using grid/random search

and many more. For optimization, this package uses the Hogwild! method instead of plain stochastic gradient descent; Hogwild! is essentially a parallelized version of SGD.

Let's understand the parameters involved in model building with h2o. The two packages use different nomenclature, so make sure you don't get confused. Since most of the parameters are easy to understand from their names, I'll mention only the important ones:

1. hidden - specifies the number of hidden layers and the number of neurons in each layer of the architecture.
2. epochs - specifies the number of passes to make over the data set.
3. rate - specifies the learning rate.
4. activation - specifies the type of activation function to use. In h2o, the major activation functions are Tanh, Rectifier, and Maxout.

Let's quickly load the data and get through the standard sanity checks and pre-processing steps:
path <- "~/mydata/deeplearning"
setwd(path)

#load libraries
library(data.table)
library(mlr)

#set variable names
setcol <- c("age", "workclass", "fnlwgt", "education", "education-num",
            "marital-status", "occupation", "relationship", "race", "sex",
            "capital-gain", "capital-loss", "hours-per-week",
            "native-country", "target")

#load data
train <- read.table("adultdata.txt", header = F, sep = ",",
                    col.names = setcol, na.strings = c(" ?"), stringsAsFactors = F)
test <- read.table("adulttest.txt", header = F, sep = ",",
                   col.names = setcol, skip = 1, na.strings = c(" ?"), stringsAsFactors = F)
setDT(train)
setDT(test)

#data sanity checks
dim(train) #32561 X 15
dim(test)  #16281 X 15
str(train)
str(test)

#check missing values
table(is.na(train))
sapply(train, function(x) sum(is.na(x))/length(x))*100
table(is.na(test))
sapply(test, function(x) sum(is.na(x))/length(x))*100

#check target variable - binary in nature, check if the data is imbalanced
train[,.N/nrow(train),target]
test[,.N/nrow(test),target]

#remove the trailing "." from the test target labels
test[,target := substr(target, start = 1, stop = nchar(target)-1)]

#remove leading whitespace
library(stringr)
char_col <- colnames(train)[sapply(test, is.character)]
for(i in char_col) set(train, j = i, value = str_trim(train[[i]], side = "left"))
for(i in char_col) set(test, j = i, value = str_trim(test[[i]], side = "left"))

#set all character variables as factors
fact_col <- colnames(train)[sapply(train, is.character)]
for(i in fact_col) set(train, j = i, value = factor(train[[i]]))
for(i in fact_col) set(test, j = i, value = factor(test[[i]]))

#impute missing values
imp1 <- impute(data = train, target = "target",
               classes = list(integer = imputeMedian(), factor = imputeMode()))
imp2 <- impute(data = test, target = "target",
               classes = list(integer = imputeMedian(), factor = imputeMode()))
train <- setDT(imp1$data)
test <- setDT(imp2$data)

Now, let's build a simple deep learning model. Generally, computing variable importance from a trained deep learning model is quite painstaking.
But the h2o package provides an effortless function to compute variable importance from a deep learning model.

#load the package
require(h2o)

#start h2o
localH2o <- h2o.init(nthreads = -1, max_mem_size = "20G")

#load data on H2o
trainh2o <- as.h2o(train)
testh2o <- as.h2o(test)

#set variables
y <- "target"
x <- setdiff(colnames(trainh2o), y)

#train the model - no hidden layer specified, so the default architecture is used
deepmodel <- h2o.deeplearning(x = x
                              ,y = y
                              ,training_frame = trainh2o
                              ,standardize = T
                              ,model_id = "deep_model"
                              ,activation = "Rectifier"
                              ,epochs = 100
                              ,seed = 1
                              ,nfolds = 5
                              ,variable_importances = T)

#compute variable importance and performance
h2o.varimp_plot(deepmodel, num_of_features = 20)
h2o.performance(deepmodel, xval = T) #84.5% CV accuracy

Now, let's train a deep learning model with one hidden layer comprising five neurons. This time, instead of checking the cross-validation accuracy, we'll validate the model on the test data.
deepmodel <- h2o.deeplearning(x = x
                              ,y = y
                              ,training_frame = trainh2o
                              ,validation_frame = testh2o
                              ,standardize = T
                              ,model_id = "deep_model"
                              ,activation = "Rectifier"
                              ,epochs = 100
                              ,seed = 1
                              ,hidden = 5
                              ,variable_importances = T)

h2o.performance(deepmodel, valid = T) #85.6% accuracy on the validation (test) frame

For hyperparameter tuning, we'll perform a random grid search over the parameters and choose the model which returns the highest accuracy.

#set parameter space
activation_opt <- c("Rectifier", "RectifierWithDropout", "Maxout", "MaxoutWithDropout")
hidden_opt <- list(c(10,10), c(20,15), c(50,50,50))
l1_opt <- c(0, 1e-3, 1e-5)
l2_opt <- c(0, 1e-3, 1e-5)

hyper_params <- list(activation = activation_opt,
                     hidden = hidden_opt,
                     l1 = l1_opt,
                     l2 = l2_opt)

#set search criteria
search_criteria <- list(strategy = "RandomDiscrete", max_models = 10)

#train the grid of models
dl_grid <- h2o.grid("deeplearning"
                    ,grid_id = "deep_learn"
                    ,hyper_params = hyper_params
                    ,search_criteria = search_criteria
                    ,training_frame = trainh2o
                    ,x = x
                    ,y = y
                    ,nfolds = 5
                    ,epochs = 100)

#get the best model
d_grid <- h2o.getGrid("deep_learn", sort_by = "accuracy")
best_dl_model <- h2o.getModel(d_grid@model_ids[[1]])
h2o.performance(best_dl_model, xval = T) #CV accuracy - 84.7%
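If you also want to see how every model in the grid performed, something along these lines should work; the summary_table slot is how recent h2o releases expose the grid summary, so treat this as a sketch rather than a guaranteed API.

#inspect all grid models ranked by the chosen metric, then check the winner on the test set
d_grid@summary_table
h2o.performance(best_dl_model, newdata = testh2o)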
MXNetR Package

The mxnet package provides an incredible interface to build feedforward NNs, recurrent NNs, and convolutional neural networks (CNNs). CNNs are widely used for detecting objects in images. The team that created xgboost also created this package, and mxnet is currently popular in Kaggle competitions for image classification problems.

This package can easily use GPUs as well. The process of building the model architecture is quite intuitive, and it gives you greater control to configure the neural network manually. Let's get some hands-on experience with this package. For Windows and Linux users, the installation commands are given below; for Mac users, follow the installation procedure described on the mxnet site.

#Installation - Windows
install.packages("drat", repos = "https://cran.rstudio.com")
drat:::addRepo("dmlc")
install.packages("mxnet")
library(mxnet)

#Installation - Linux
#Press Ctrl + Alt + T and run the following commands in a terminal
sudo apt-get update
sudo apt-get -y install git
git clone https://github.com/dmlc/mxnet.git ~/mxnet --recursive
cd ~/mxnet/setup-utils
bash install-mxnet-ubuntu-r.sh

In R, mxnet requires the target variable to be numeric (class labels starting at 0) rather than a factor. Also, it accepts the feature data as a matrix instead of a data frame. Now, we'll make the required changes:

#load package
require(mxnet)

#convert target variables into numeric (0/1)
train[,target := as.numeric(target)-1]
test[,target := as.numeric(target)-1]

#convert train data to a matrix
train.x <- data.matrix(train[,-c("target"),with=F])
train.y <- train$target

#convert test data to a matrix
test.x <- data.matrix(test[,-c("target"),with=F])
test.y <- test$target
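Before training, it's worth confirming that the conversion produced what mxnet expects: a numeric matrix of features and a 0/1 numeric label vector. A quick sanity check in base R (purely illustrative):

#train.x should now be a numeric matrix, train.y a 0/1 numeric vector
str(train.x)
class(train.x)   # should include "matrix"
table(train.y)   # counts of the two classes
range(train.y)   # 0 1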
Now, we'll train the multilayer perceptron model using the mx.mlp function:

#set seed to reproduce results
mx.set.seed(1)

mlpmodel <- mx.mlp(data = train.x
                   ,label = train.y
                   ,hidden_node = 3             #one hidden layer with 3 neurons
                   ,out_node = 2
                   ,out_activation = "softmax"  #softmax returns class probabilities
                   ,num.round = 100             #number of passes over the training data
                   ,array.batch.size = 20       #weights get updated after every batch
                   ,learning.rate = 0.03        #same as step size
                   ,eval.metric = mx.metric.accuracy
                   ,eval.data = list(data = test.x, label = test.y))

The softmax function is used for binary and multi-class classification problems. Alternatively, you can also craft the model structure manually:

#create the network structure
data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, num_hidden = 3)  #3 neurons in one layer
lrm <- mx.symbol.SoftmaxOutput(fc1)

We have configured the network above with one hidden layer carrying three neurons and chosen softmax as the output function, so the network optimizes the softmax (log) loss for classification; with a regression output it would optimize squared loss instead. Now, we'll train the network:

nnmodel <- mx.model.FeedForward.create(symbol = lrm
                                       ,X = train.x
                                       ,y = train.y
                                       ,ctx = mx.cpu()
                                       ,num.round = 100
                                       ,eval.metric = mx.metric.accuracy
                                       ,array.batch.size = 50
                                       ,learning.rate = 0.01)

Similarly, we can configure a more complex network with additional hidden layers:

#configure another network
data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, name = "fc1", num_hidden = 10)  #1st hidden layer
act1 <- mx.symbol.Activation(fc1, name = "relu1", act_type = "relu")
fc2 <- mx.symbol.FullyConnected(act1, name = "fc2", num_hidden = 2)   #output layer (2 classes)
out <- mx.symbol.SoftmaxOutput(fc2, name = "soft")

Understand it carefully: after feeding in the input data, the first hidden layer consists of 10 neurons. The output of each neuron passes through a ReLU (rectified linear unit) activation function, which we have used in place of the sigmoid because ReLU tends to converge faster. The output is then fed into the second layer, which is the output layer. Since our target variable has two classes, we've set num_hidden to 2 in that layer. Finally, the output of the second layer passes through the softmax output function.
#train the network
dp_model <- mx.model.FeedForward.create(symbol = out
                                        ,X = train.x
                                        ,y = train.y
                                        ,ctx = mx.cpu()
                                        ,num.round = 100
                                        ,eval.metric = mx.metric.accuracy
                                        ,array.batch.size = 50
                                        ,learning.rate = 0.005)

As mentioned above, this trained model predicts output probabilities, which can easily be transformed into labels using a threshold value (say, 0.5). To make predictions on the test set, we do this:

#predict on the test set
pred_dp <- predict(dp_model, test.x)
str(pred_dp)  #a matrix with 2 rows and 16281 columns

#transpose the prediction matrix and pick the more probable class (0 or 1)
pred.val <- max.col(t(pred_dp)) - 1

The prediction matrix has two rows and 16281 columns, one column per test observation, with each column carrying the class probabilities. After transposing it, the max.col function returns the index of the larger probability in each row, which we convert into a 0/1 label. One way to check the resulting accuracy is sketched below.
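One way to check that accuracy, assuming test.y still holds the true 0/1 labels created earlier, is a quick base R comparison:

#compare predicted labels with the true test labels
table(pred.val, test.y)   # confusion matrix
mean(pred.val == test.y)  # overall accuracy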
If you check the model's accuracy this way, you'll find that this network performs terribly on this data; in fact, it gives no better a result than the training accuracy. On this data set, a tuned xgboost model gave 87% accuracy! If you are familiar with the model-building process, I'd suggest you try working on the popular MNIST data set. You can find tons of tutorials on it to get you going.

Summary

Deep learning is becoming increasingly popular for solving complex problems such as image recognition, natural language processing, etc. If you are aspiring to a career in machine learning, this is the best time to get into this subject. The motive of this article was to introduce you to the fundamental concepts of deep learning. In this article, we learned the basics of deep learning (perceptrons, neural networks, and multilayered neural networks). We saw that deep learning as a technique relies on algorithms such as backpropagation and gradient descent to optimize the networks. In the end, we gained some hands-on experience in developing deep learning models.

Do let me know if you have any feedback, suggestions, or thoughts on this article in the comments below!