Machine Learning 4th Unit

Deep neural network

A deep neural network (DNN) is an ANN with multiple hidden layers between the input and
output layers. Similar to shallow ANNs, DNNs can model complex non-linear relationships.
The main purpose of a neural network is to receive a set of inputs, perform progressively
complex calculations on them, and give output to solve real world problems like
classification. We restrict ourselves to feed forward neural networks.
We have an input, an output, and a flow of sequential data in a deep network.

Neural networks are widely used in supervised learning and reinforcement learning problems.
These networks are based on a set of layers connected to each other.
In deep learning, the number of hidden layers, mostly non-linear, can be large; say about
1000 layers.
DL models generally produce much better results than conventional shallow ML models.
We mostly use the gradient descent method for optimizing the network and minimising the loss function.
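As a rough illustration of this update rule (the toy loss, starting weight, and learning rate below are made up for illustration, not taken from the text):

Python
w = 0.0                          # a single weight, starting far from the optimum
learning_rate = 0.1
for step in range(100):
    grad = 2.0 * (w - 3.0)       # gradient of the toy loss (w - 3)^2 with respect to w
    w -= learning_rate * grad    # step against the gradient to reduce the loss
print(w)                         # converges close to the minimum at w = 3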
We can use ImageNet, a repository of millions of digital images, to classify a dataset into categories like cats and dogs. DL nets are increasingly used for dynamic images, apart from static ones, and for time series and text analysis.
Training on data sets forms an important part of Deep Learning models. In addition, Backpropagation is the main algorithm used in training DL models.
DL deals with training large neural networks with complex input-output transformations.
One example of DL is the mapping of a photo to the name of the person(s) in the photo, as done on social networks; describing a picture with a phrase is another recent application of DL.
Neural networks are functions that have inputs like x1, x2, x3, … that are transformed to outputs like z1, z2, z3, and so on through two (shallow networks) or several (deep networks) intermediate operations, also called layers.
The weights and biases change from layer to layer; 'w' and 'v' are the weights, or synapses, of the layers of the neural network.
The best use case of deep learning is the supervised learning problem. Here, we have a large set of data inputs with a desired set of outputs.

Here we apply the backpropagation algorithm to obtain correct output predictions.


The most basic data set of deep learning is the MNIST, a dataset of handwritten digits.
We can train a deep Convolutional Neural Network with Keras to classify images of handwritten digits from this dataset.
The firing or activation of a neural net classifier produces a score. For example, to classify patients as sick or healthy, we consider parameters such as height, weight, body temperature, blood pressure, etc.
A high score means the patient is sick and a low score means the patient is healthy.
Each node in output and hidden layers has its own classifiers. The input layer takes inputs
and passes on its scores to the next hidden layer for further activation and this goes on till the
output is reached.
This progress from input to output from left to right in the forward direction is
called forward propagation.
Credit assignment path (CAP) in a neural network is the series of transformations starting
from the input to the output. CAPs elaborate probable causal connections between the input
and the output.
For a given feed forward neural network, the CAP depth is the number of hidden layers plus one, as the output layer is included. For recurrent neural networks, where a signal may propagate through a layer several times, the CAP depth can be potentially limitless.

Deep Nets and Shallow Nets


There is no clear threshold of depth that divides shallow learning from deep learning; but it is
mostly agreed that for deep learning which has multiple non-linear layers, CAP must be
greater than two.
The basic node in a neural net is a perceptron, mimicking a neuron in a biological neural network. Then we have the multi-layer perceptron, or MLP. Each set of inputs is modified by a set of weights and biases; each edge has a unique weight and each node has a unique bias.
The prediction accuracy of a neural net depends on its weights and biases.
The process of improving the accuracy of a neural network is called training. The output from a forward-prop net is compared to the value that is known to be correct.
The cost function, or loss function, is the difference between the generated output and the actual output.
The point of training is to make the cost as small as possible across millions of training examples. To do this, the network tweaks the weights and biases until the prediction matches the correct output.
Once trained well, a neural net has the potential to make an accurate prediction every time.
When patterns get complex and you want your computer to recognise them, you have to go for neural networks. In such complex pattern scenarios, neural networks outperform all other competing algorithms.
There are now GPUs that can train them faster than ever before. Deep neural networks are already revolutionizing the field of AI.
Computers have proved to be good at performing repetitive calculations and following detailed instructions, but have not been so good at recognising complex patterns.
For the problem of recognising simple patterns, a support vector machine (SVM) or a logistic regression classifier can do the job well, but as the complexity of the pattern increases, there is no way but to go for deep neural networks.
Therefore, for complex patterns like a human face, shallow neural networks fail and we have no alternative but to go for deep neural networks with more layers. The deep nets are able to do their job by breaking down the complex patterns into simpler ones. For a human face, for example, a deep net would use edges to detect parts like lips, nose, eyes, ears and so on and then re-combine these together to form a human face.
Prediction has become so accurate that recently, at a Google Pattern Recognition Challenge, a deep net beat a human.
This idea of a web of layered perceptrons has been around for some time; in this area, deep nets mimic the human brain. One downside is that they take a long time to train, a hardware constraint.
However, recent high-performance GPUs have been able to train such deep nets in under a week, while fast CPUs could have taken weeks or perhaps months to do the same.

Choosing a Deep Net


How to choose a deep net? We have to decide if we are building a classifier or if we are trying to find patterns in the data, and whether we are going to use unsupervised learning. To extract patterns from a set of unlabelled data, we use a Restricted Boltzmann machine or an Autoencoder.
Consider the following points while choosing a deep net −
 For text processing, sentiment analysis, parsing and named entity recognition, we use a recurrent net or a recursive neural tensor network (RNTN).
 For any language model that operates at character level, we use a recurrent net.
 For image recognition, we use a deep belief network (DBN) or a convolutional network.
 For object recognition, we use an RNTN or a convolutional network.
 For speech recognition, we use a recurrent net.
In general, deep belief networks and multilayer perceptrons with rectified linear units or
RELU are both good choices for classification.
For time series analysis, it is always recommended to use a recurrent net.
Neural nets have been around for more than 50 years, but only now have they risen to prominence. The reason is that they are hard to train; when we try to train them with a method called back propagation, we run into a problem called vanishing or exploding gradients. When that happens, training takes a longer time and accuracy takes a back seat.
When training on a data set, we are constantly calculating the cost function, which is the difference between the predicted output and the actual output from a set of labelled training data. The cost function is then minimized by adjusting the weight and bias values until the lowest value is obtained. The training process uses a gradient, which is the rate at which the cost will change with respect to changes in the weight or bias values.

Restricted Boltzmann Machines or Autoencoders - RBMs


In 2006, a breakthrough was achieved in tackling the issue of vanishing gradients. Geoff Hinton devised a novel strategy that led to the development of the Restricted Boltzmann Machine (RBM), a shallow two-layer net.
The first layer is the visible layer and the second layer is the hidden layer. Each node in the visible layer is connected to every node in the hidden layer. The network is known as restricted because no two nodes within the same layer are allowed to share a connection.
Autoencoders are networks that encode input data as vectors. They create a hidden, or compressed, representation of the raw data. The vectors are useful in dimensionality reduction; the vector compresses the raw data into a smaller number of essential dimensions. Autoencoders are paired with decoders, which allow the reconstruction of input data based on its hidden representation.
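As a rough sketch of this encoder/decoder pairing (assuming TensorFlow/Keras is available; the flattened 28 x 28 input and layer sizes are illustrative assumptions, not from the text):

Python
from tensorflow.keras import layers, models

autoencoder = models.Sequential([
    layers.Input(shape=(784,)),                # e.g. a flattened 28 x 28 image
    layers.Dense(32, activation="relu"),       # encoder: compressed hidden representation
    layers.Dense(784, activation="sigmoid"),   # decoder: reconstruction of the input
])
autoencoder.compile(optimizer="adam", loss="mse")
# Trained to reproduce its own (unlabelled) input:
# autoencoder.fit(x_train, x_train, epochs=10)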
RBM is the mathematical equivalent of a two-way translator. A forward pass takes inputs and
translates them into a set of numbers that encodes the inputs. A backward pass meanwhile
takes this set of numbers and translates them back into reconstructed inputs. A well-trained
net performs back prop with a high degree of accuracy.
In either step, the weights and the biases have a critical role; they help the RBM in decoding the interrelationships between the inputs and in deciding which inputs are essential in detecting patterns. Through forward and backward passes, the RBM is trained to reconstruct the input with different weights and biases until the input and the reconstruction are as close as possible. An interesting aspect of RBMs is that the data need not be labelled. This turns out to be very important for real-world data sets like photos, videos, voices and sensor data, all of which tend to be unlabelled. Instead of humans manually labelling the data, the RBM automatically sorts through it; by properly adjusting the weights and biases, an RBM is able to extract important features and reconstruct the input. RBMs are part of a family of feature-extractor neural nets, which are designed to recognize inherent patterns in data. These are also called autoencoders because they have to encode their own structure.

Deep Belief Networks - DBNs


Deep belief networks (DBNs) are formed by combining RBMs and introducing a clever
training method. We have a new model that finally solves the problem of vanishing gradient.
Geoff Hinton invented RBMs and also Deep Belief Nets as an alternative to back propagation.
A DBN is similar in structure to an MLP (multi-layer perceptron), but very different when it comes to training. It is the training that enables DBNs to outperform their shallow counterparts.
A DBN can be visualized as a stack of RBMs where the hidden layer of one RBM is the
visible layer of the RBM above it. The first RBM is trained to reconstruct its input as
accurately as possible.
The hidden layer of the first RBM is taken as the visible layer of the second RBM and the
second RBM is trained using the outputs from the first RBM. This process is iterated till
every layer in the network is trained.
In a DBN, each RBM learns the entire input. A DBN works globally by fine-tuning the entire input in succession as the model slowly improves, like a camera lens slowly focusing on a picture. A stack of RBMs outperforms a single RBM just as a multi-layer perceptron (MLP) outperforms a single perceptron.
At this stage, the RBMs have detected inherent patterns in the data but without any names or labels. To finish training the DBN, we have to introduce labels to the patterns and fine-tune the net with supervised learning.
We need a very small set of labelled samples so that the features and patterns can be
associated with a name. This small-labelled set of data is used for training. This set of
labelled data can be very small when compared to the original data set.
The weights and biases are altered slightly, resulting in a small change in the net's perception
of the patterns and often a small increase in the total accuracy.
The training can also be completed in a reasonable amount of time by using GPUs giving
very accurate results as compared to shallow nets and we see a solution to vanishing gradient
problem too.

Generative Adversarial Networks - GANs


Generative adversarial networks are deep neural nets comprising two nets, pitted one against
the other, thus the “adversarial” name.
GANs were introduced in a paper published by researchers at the University of Montreal in
2014. Facebook’s AI expert Yann LeCun, referring to GANs, called adversarial training “the
most interesting idea in the last 10 years in ML.”
GANs' potential is huge, as the networks can learn to mimic any distribution of data. GANs can be taught to create parallel worlds strikingly similar to our own in any domain: images, music, speech, prose. They are robot artists in a way, and their output is quite impressive.
In a GAN, one neural network, known as the generator, generates new data instances, while
the other, the discriminator, evaluates them for authenticity.
Let us say we are trying to generate hand-written numerals like those found in the MNIST
dataset, which is taken from the real world. The work of the discriminator, when shown an
instance from the true MNIST dataset, is to recognize them as authentic.
Now consider the following steps of the GAN −
 The generator network takes input in the form of random numbers and returns
an image.
 This generated image is given as input to the discriminator network along with
a stream of images taken from the actual dataset.
 The discriminator takes in both real and fake images and returns probabilities,
a number between 0 and 1, with 1 representing a prediction of authenticity and
0 representing fake.
 So you have a double feedback loop −
o The discriminator is in a feedback loop with the ground truth of
the images, which we know.
o The generator is in a feedback loop with the discriminator.
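As a rough sketch of this double feedback loop in code (assuming TensorFlow/Keras is available; the network sizes, batch size, and the random stand-in for real MNIST images are made up for illustration):

Python
import numpy as np
from tensorflow.keras import layers, models

# Generator: random numbers in, a flattened 28 x 28 "image" out.
generator = models.Sequential([
    layers.Input(shape=(100,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(28 * 28, activation="sigmoid"),
])
# Discriminator: an image in, a probability out (1 = authentic, 0 = fake).
discriminator = models.Sequential([
    layers.Input(shape=(28 * 28,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Combined model: the generator is trained through the discriminator's feedback.
discriminator.trainable = False
gan = models.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")

real_images = np.random.rand(32, 28 * 28)                 # stand-in for a batch of real digits
fake_images = generator.predict(np.random.randn(32, 100))

# Feedback loop 1: the discriminator learns from the ground truth (real = 1, fake = 0).
discriminator.train_on_batch(np.concatenate([real_images, fake_images]),
                             np.concatenate([np.ones(32), np.zeros(32)]))
# Feedback loop 2: the generator tries to make the discriminator call its fakes authentic.
gan.train_on_batch(np.random.randn(32, 100), np.ones(32))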

RNNs are neural networks in which data can flow in any direction. These networks are used for applications such as language modelling or Natural Language Processing (NLP).
The basic concept underlying RNNs is to utilize sequential information. In a normal neural network it is assumed that all inputs and outputs are independent of each other. If we want to predict the next word in a sentence, we need to know which words came before it.

Convolutional Deep Neural Networks – CNNs


If we increase the number of layers in a neural network to make it deeper, it increases the
complexity of the network and allows us to model functions that are more complicated.
However, the number of weights and biases will exponentially increase. As a matter of fact,
learning such difficult problems can become impossible for normal neural networks. This
leads to a solution, the convolutional neural networks.

CNNs are extensively used in computer vision and have also been applied in acoustic modelling for automatic speech recognition.
The idea behind convolutional neural networks is that of a "moving filter" which passes through the image. This moving filter, or convolution, applies to a certain neighbourhood of nodes, which for example may be pixels, where the filter applied is 0.5 x the node value.
Noted researcher Yann LeCun pioneered convolutional neural networks. Facebook uses these nets in its facial recognition software. CNNs have been the go-to solution for machine vision projects. There are many layers in a convolutional network. In the ImageNet challenge, a machine was able to beat a human at object recognition in 2015.
In a nutshell, Convolutional Neural Networks (CNNs) are multi-layer neural networks. The layers sometimes number 17 or more, and the input data is assumed to be images.

CNNs drastically reduce the number of parameters that need to be tuned. So, CNNs
efficiently handle the high dimensionality of raw images.
It is assumed that the reader knows the concept of Neural networks.
When it comes to Machine Learning, Artificial Neural Networks perform really well. Artificial Neural Networks are used in various classification tasks involving images, audio and words. Different types of Neural Networks are used for different purposes; for example, for predicting a sequence of words we use Recurrent Neural Networks, more precisely an LSTM, and similarly for image classification we use Convolutional Neural Networks. In this blog, we are going to build a basic building block for CNN.
Before diving into the Convolution Neural Network, let us first revisit some concepts
of Neural Network. In a regular Neural Network there are three types of layers:
 
1. Input Layer: It's the layer in which we give input to our model. The number of neurons in this layer is equal to the total number of features in our data (the number of pixels in the case of an image).
2. Hidden Layer: The input from the Input layer is then fed into the hidden layer. There can be many hidden layers depending upon our model and data size. Each hidden layer can have a different number of neurons, which is generally greater than the number of features. The output from each layer is computed by matrix multiplication of the output of the previous layer with the learnable weights of that layer, followed by the addition of learnable biases and then an activation function, which makes the network nonlinear.
3. Output Layer: The output from the hidden layer is then fed into a logistic function like sigmoid or softmax, which converts the output for each class into a probability score for each class.
The data is fed into the model and the output from each layer is obtained; this step is called feedforward. We then calculate the error using an error function; some common error functions are cross-entropy, squared loss error, etc. After that, we backpropagate through the model by calculating the derivatives. This step is called Backpropagation and is basically used to minimize the loss.
Here's the basic Python code for a neural network with random inputs and two hidden layers.
 

 Python
import numpy as np

activation = lambda x: 1.0/(1.0 + np.exp(-x))  # sigmoid function

# learnable parameters of the model, randomly initialised here for illustration
W1, b1 = np.random.randn(4, 3), np.random.randn(4, 1)
W2, b2 = np.random.randn(4, 4), np.random.randn(4, 1)
W3, b3 = np.random.randn(1, 4), np.random.randn(1, 1)

input = np.random.randn(3, 1)                     # 3 random input features

hidden_1 = activation(np.dot(W1, input) + b1)     # first hidden layer

hidden_2 = activation(np.dot(W2, hidden_1) + b2)  # second hidden layer

output = np.dot(W3, hidden_2) + b3                # output layer

W1, W2, W3, b1, b2, b3 are the learnable parameters of the model.


 
Image source: cs231n.stanford.edu

 
Convolution Neural Network
Convolutional Neural Networks, or convnets, are neural networks that share their parameters. Imagine you have an image. It can be represented as a cuboid having a length and width (the dimensions of the image) and a depth (as images generally have red, green, and blue channels).

Now imagine taking a small patch of this image and running a small neural network on it, with, say, k outputs, and representing them vertically. Now slide that neural network across the whole image; as a result, we will get another image with a different width, height, and depth. Instead of just the R, G, and B channels, we now have more channels but smaller width and height. This operation is called Convolution. If the patch size is the same as that of the image, it will be a regular neural network. Because of this small patch, we have fewer weights.
 

Image source: Deep Learning Udacity


Now let’s talk about a bit of mathematics that is involved in the whole convolution
process. 
 
 Convolution layers consist of a set of learnable filters (a patch in the above image). Every filter has a small width and height and the same depth as that of the input volume (3 if the input layer is an image input).
 For example, if we have to run a convolution on an image with dimensions 34x34x3, the possible size of the filters can be a x a x 3, where 'a' can be 3, 5, 7, etc., but small compared to the image dimensions.
 During the forward pass, we slide each filter across the whole input volume step by step, where each step is called a stride (which can have a value of 2, 3 or even 4 for high-dimensional images), and compute the dot product between the weights of the filter and the patch from the input volume.
 As we slide our filters we get a 2-D output for each filter; we stack them together and, as a result, we get an output volume with a depth equal to the number of filters. The network will learn all the filters.
 
Layers used to build ConvNets
A convnet is a sequence of layers, and every layer transforms one volume to another through a differentiable function.
Types of layers:
Let's take an example by running a convnet on an image of dimension 32 x 32 x 3.
 
1. Input Layer: This layer holds the raw input of the image with width 32,
height 32, and depth 3.
2. Convolution Layer: This layer computes the output volume by computing
the dot product between all filters and image patches. Suppose we use a
total of 12 filters for this layer we’ll get output volume of dimension 32 x
32 x 12.
3. Activation Function Layer: This layer will apply an element-wise activation
function to the output of the convolution layer. Some common activation
functions are RELU: max(0, x), Sigmoid: 1/(1+e^-x), Tanh, Leaky RELU, etc.
The volume remains unchanged hence output volume will have dimension
32 x 32 x 12.
4. Pool Layer: This layer is periodically inserted in the convnet and its main function is to reduce the size of the volume, which makes the computation faster, reduces memory usage and also prevents overfitting. Two common types of pooling layers are max pooling and average pooling. If we use a max pool with 2 x 2 filters and stride 2, the resultant volume will be of dimension 16 x 16 x 12.
 
Image source: cs231n.stanford.edu

5. Fully-Connected Layer: This layer is a regular neural network layer that takes input from the previous layer, computes the class scores and outputs a 1-D array of size equal to the number of classes.
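As a rough sketch of this layer stack (assuming TensorFlow/Keras is installed; the filter counts and class count simply mirror the 32 x 32 x 3 example above):

Python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),                   # input layer: 32 x 32 x 3 image
    layers.Conv2D(12, (3, 3), padding="same"),         # convolution layer: 12 filters -> 32 x 32 x 12
    layers.Activation("relu"),                         # element-wise RELU, volume unchanged
    layers.MaxPooling2D(pool_size=(2, 2), strides=2),  # 2 x 2 max pool, stride 2 -> 16 x 16 x 12
    layers.Flatten(),                                  # flatten to a 1-D vector
    layers.Dense(10, activation="softmax"),            # fully-connected layer: 10 class scores
])
model.summary()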
 
Artificial Neural Network

The term "Artificial neural network" refers to a biologically inspired sub-field of artificial intelligence modeled after the brain. An Artificial neural network is usually a computational network based on the biological neural networks that construct the structure of the human brain. Just as the human brain has neurons interconnected with each other, artificial neural networks also have neurons that are linked to each other in the various layers of the network. These neurons are known as nodes.

Artificial neural network tutorial covers all the aspects related to the artificial neural
network. In this tutorial, we will discuss ANNs, Adaptive resonance theory, Kohonen
self-organizing map, Building blocks, unsupervised learning, Genetic algorithm, etc.

What is Artificial Neural Network?


The term "Artificial Neural Network" is derived from Biological neural networks, which develop the structure of the human brain. Similar to the human brain, which has neurons interconnected to one another, artificial neural networks also have neurons that are interconnected to one another in the various layers of the network. These neurons are known as nodes.
The given figure illustrates the typical diagram of Biological Neural Network.

The typical Artificial Neural Network looks something like the given figure.

Dendrites from the Biological Neural Network represent inputs in Artificial Neural Networks, the cell nucleus represents Nodes, synapses represent Weights, and the Axon represents the Output.

Relationship between Biological neural network and Artificial neural network:

Biological Neural Network    Artificial Neural Network
Dendrites                    Inputs
Cell nucleus                 Nodes
Synapse                      Weights
Axon                         Output

An Artificial Neural Network, in the field of Artificial Intelligence, attempts to mimic the network of neurons that makes up the human brain, so that computers have an option to understand things and make decisions in a human-like manner. The artificial neural network is designed by programming computers to behave simply like interconnected brain cells.

There are around 100 billion neurons in the human brain. Each neuron has somewhere between 1,000 and 100,000 connection points. In the human brain, data is stored in a distributed manner, and we can extract more than one piece of this data from our memory in parallel when necessary. We can say that the human brain is made up of incredibly amazing parallel processors.

We can understand the artificial neural network with an example. Consider a digital logic gate that takes an input and gives an output, such as an "OR" gate, which takes two inputs. If one or both of the inputs are "On," then we get "On" as the output. If both inputs are "Off," then we get "Off" as the output. Here the output depends only on the input. Our brain does not perform the same task: the output-to-input relationship keeps changing because the neurons in our brain are "learning."

The architecture of an artificial neural network:


To understand the architecture of an artificial neural network, we have to understand what a neural network consists of. A neural network consists of a large number of artificial neurons, termed units, arranged in a sequence of layers. Let us look at the various types of layers available in an artificial neural network.

Artificial Neural Network primarily consists of three layers:


Input Layer:

As the name suggests, it accepts inputs in several different formats provided by the
programmer.

Hidden Layer:

The hidden layer lies in between the input and output layers. It performs all the calculations to find hidden features and patterns.

Output Layer:

The input goes through a series of transformations using the hidden layer, which
finally results in output that is conveyed using this layer.

The artificial neural network takes the inputs, computes the weighted sum of the inputs, and includes a bias. This computation is represented in the form of a transfer function.

The determined weighted total is passed as input to an activation function to produce the output. Activation functions choose whether a node should fire or not. Only those nodes that fire make it to the output layer. There are distinctive activation functions available that can be applied depending upon the sort of task we are performing.
Advantages of Artificial Neural Network (ANN)
Parallel processing capability:

Because their computations are distributed across many nodes, artificial neural networks can perform more than one task simultaneously.

Storing data on the entire network:

Unlike traditional programming, where data is stored in a database, an ANN stores information across the entire network. The disappearance of a couple of pieces of data in one place doesn't prevent the network from working.

Capability to work with incomplete knowledge:

After training, an ANN may produce output even with inadequate data. The loss of performance here relies upon the significance of the missing data.

Having a memory distribution:

For an ANN to be able to adapt, it is important to determine the examples and to train the network according to the desired output by demonstrating these examples to the network. The success of the network is directly proportional to the chosen instances, and if the event can't be shown to the network in all its aspects, it can produce false output.

Having fault tolerance:

Corruption of one or more cells of an ANN does not prevent it from generating output, and this feature makes the network fault-tolerant.

Disadvantages of Artificial Neural Network:


Assurance of proper network structure:

There is no particular guideline for determining the structure of artificial neural networks. The appropriate network structure is accomplished through experience and trial and error.

Unrecognized behavior of the network:

This is the most significant issue with ANNs. When an ANN produces a solution, it does not provide insight concerning why and how, which decreases trust in the network.

Hardware dependence:
Artificial neural networks need processors with parallel processing power, in accordance with their structure. Therefore, realizing them in practice depends on suitable hardware.

Difficulty of showing the issue to the network:

ANNs can only work with numerical data. Problems must be converted into numerical values before being introduced to the ANN. The representation mechanism chosen here will directly impact the performance of the network; it relies on the user's abilities.

The duration of the network is unknown:

The network is trained down to a specific value of the error, and this value does not necessarily give us the optimum result.

Artificial neural networks, a science that stepped into the world in the mid-20th century, are developing exponentially. Above, we have reviewed the advantages of artificial neural networks and the issues encountered in the course of their use. It should not be overlooked that the drawbacks of ANNs, a flourishing branch of science, are being eliminated one by one, while their advantages increase day by day. This means that artificial neural networks will progressively become an irreplaceable part of our lives.

How do artificial neural networks work?


An Artificial Neural Network can be best represented as a weighted directed graph, where the artificial neurons form the nodes. The associations between neuron outputs and neuron inputs can be viewed as directed edges with weights. The Artificial Neural Network receives the input signal from an external source in the form of a pattern or image, represented as a vector. These inputs are then mathematically denoted by the notation x(n) for each of the n inputs. Afterward, each input is multiplied by its corresponding weight (these weights are the details utilized by the artificial neural network to solve a specific problem). In general terms, these weights represent the strength of the interconnections between neurons inside the artificial neural network. All the weighted inputs are then summed inside the computing unit.

If the weighted sum is equal to zero, then a bias is added to make the output non-zero, or otherwise to scale up the system's response. The bias has its own input, with a weight equal to 1. Here the total of the weighted inputs can be in the range of 0 to positive infinity. To keep the response within the limits of the desired value, a certain maximum value is benchmarked, and the total of the weighted inputs is passed through the activation function.

The activation function refers to the set of transfer functions used to achieve the desired output. There are different kinds of activation functions, but they are primarily either linear or non-linear sets of functions. Some of the commonly used activation functions are the binary, linear, and tan hyperbolic sigmoidal activation functions. Let us take a look at each of them in detail:

Binary:
In the binary activation function, the output is either 1 or 0. To accomplish this, a threshold value is set up. If the net weighted input of the neuron is greater than 1 (the threshold), then the final output of the activation function is returned as 1; otherwise the output is returned as 0.

Sigmoidal Hyperbolic:
The sigmoidal hyperbolic function is generally seen as an "S"-shaped curve. Here the tan hyperbolic function is used to approximate output from the actual net input. The function is defined as:

F(x) = 1 / (1 + exp(-σx))

where σ is considered the steepness parameter.
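As a small numerical illustration of these two activation functions (the inputs, weights, bias, threshold and steepness below are made-up values, not from the text):

Python
import numpy as np

def binary_activation(net_input, threshold=1.0):
    # binary (threshold) activation: fire (1) only when the net input exceeds the threshold
    return 1 if net_input > threshold else 0

def sigmoidal_activation(net_input, steepness=1.0):
    # sigmoidal activation: F(x) = 1 / (1 + exp(-steepness * x))
    return 1.0 / (1.0 + np.exp(-steepness * net_input))

x = np.array([0.5, -1.2, 3.0])   # inputs
w = np.array([0.4, 0.3, 0.9])    # weights
b = 0.1                          # bias
net = np.dot(w, x) + b           # weighted sum of the inputs plus the bias
print(binary_activation(net), sigmoidal_activation(net, steepness=2.0))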

Types of Artificial Neural Network:


There are various types of Artificial Neural Networks (ANNs), modelled on the neurons and network functions of the human brain, and each performs its tasks in a similar way. The majority of artificial neural networks have some similarities with their more complex biological counterparts and are very effective at their intended tasks, for example segmentation or classification.

Feedback ANN:
In this type of ANN, the output returns into the network to achieve the best-evolved results internally. As per the University of Massachusetts Lowell Centre for Atmospheric Research, feedback networks feed information back into themselves and are well suited to solving optimization problems. Internal system error corrections utilize feedback ANNs.

Feed-Forward ANN:
A feed-forward network is a basic neural network comprising an input layer, an output layer, and at least one layer of neurons. Through assessment of its output by reviewing its input, the strength of the network can be judged based on the group behavior of the associated neurons, and the output is decided. The primary advantage of this network is that it learns to evaluate and recognize input patterns.
Convolutional Neural Network
Convolutional Neural Network is one of the main categories used for image classification and image recognition in neural networks. Scene labeling, object detection, face recognition, etc., are some of the areas where convolutional neural networks are widely used.

CNN takes an image as input, which is processed and classified under a certain category such as dog, cat, lion, tiger, etc. The computer sees an image as an array of pixels, whose size depends on the resolution of the image. Based on the image resolution, it will see h * w * d, where h = height, w = width and d = dimension (depth). For example, an RGB image is a 6 * 6 * 3 array, and a grayscale image is a 4 * 4 * 1 array.

In CNN, each input image passes through a sequence of convolution layers along with pooling and fully connected layers, using filters (also known as kernels). After that, we apply the Soft-max function to classify the object with probabilistic values between 0 and 1.

Convolution Layer
The convolution layer is the first layer used to extract features from an input image. By learning image features using small squares of input data, the convolutional layer preserves the relationship between pixels. It is a mathematical operation which takes two inputs: an image matrix and a kernel or filter.
o The dimension of the image matrix is h×w×d.
o The dimension of the filter is fh×fw×d.
o The dimension of the output is (h-fh+1)×(w-fw+1)×1.

Let's start by considering a 5*5 image whose pixel values are 0 or 1, and a 3*3 filter matrix:

The convolution of the 5*5 image matrix with the 3*3 filter matrix is called the "Feature Map" and is shown as the output.

Convolution of an image with different filters can perform operations such as blur, sharpen, and edge detection.
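As a small sketch of this computation (the pixel and filter values below are made up purely for illustration):

Python
import numpy as np

image = np.array([[1, 1, 1, 0, 0],      # a made-up 5*5 binary image
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],           # a made-up 3*3 filter
                   [0, 1, 0],
                   [1, 0, 1]])

h, w = image.shape
fh, fw = kernel.shape
feature_map = np.zeros((h - fh + 1, w - fw + 1))     # (5-3+1) x (5-3+1) = 3 x 3
for i in range(h - fh + 1):
    for j in range(w - fw + 1):
        # dot product of the filter with the image patch it currently covers
        feature_map[i, j] = np.sum(image[i:i+fh, j:j+fw] * kernel)
print(feature_map)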

Strides
Stride is the number of pixels by which the filter is shifted over the input matrix. When the stride is equal to 1, we move the filter by 1 pixel at a time; similarly, when the stride is equal to 2, we move the filter by 2 pixels at a time. The following figure shows how the convolution would work with a stride of 2.

Padding
Padding plays a crucial role in building the convolutional neural network. Each convolution shrinks the image slightly, so if we take a neural network with hundreds of layers, we will be left with a very small image at the end after all the filtering.

If we take a three by three filter on top of a grayscale image and do the convolving, then what will happen?

It is clear from the above picture that the pixel in the corner is only covered once, but the middle pixel is covered more than once. This means that we have more information from the middle pixels, so there are two downsides:
o Shrinking outputs
o Losing information at the corners of the image.

To overcome this, we introduce padding to the image. "Padding is an additional layer which we can add to the border of an image."
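A quick sketch of the standard output-size arithmetic that ties stride and padding together (the helper name below is just illustrative): the output side length is floor((n - f + 2p) / s) + 1 for input size n, filter size f, padding p and stride s.

Python
def conv_output_size(n, f, padding=0, stride=1):
    # output side length = floor((n - f + 2 * padding) / stride) + 1
    return (n - f + 2 * padding) // stride + 1

print(conv_output_size(5, 3))             # 3 -> 5*5 image, 3*3 filter, no padding
print(conv_output_size(5, 3, padding=1))  # 5 -> padding of 1 preserves the image size
print(conv_output_size(6, 3, stride=2))   # 2 -> stride 2 roughly halves the output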

Pooling Layer
The pooling layer plays an important role in reducing the size of the representation coming from the previous layers. The pooling layer reduces the number of parameters when the images are too large. Pooling is "downscaling" of the image obtained from the previous layers; it can be compared to shrinking an image to reduce its pixel density. Spatial pooling is also called downsampling or subsampling, which reduces the dimensionality of each map but retains the important information. There are the following types of spatial pooling:

Max Pooling
Max pooling is a sample-based discretization process. Its main objective is to downscale an input representation, reducing its dimensionality and allowing assumptions to be made about the features contained in the binned sub-regions.

Max pooling is done by applying a max filter to non-overlapping sub-regions of the initial representation.
Average Pooling
Down-scaling is performed through average pooling by dividing the input into rectangular pooling regions and computing the average value of each region.

Syntax (MATLAB Deep Learning Toolbox):

layer = averagePooling2dLayer(poolSize)
layer = averagePooling2dLayer(poolSize,Name,Value)

Sum Pooling
The sub-regions for sum pooling or mean pooling are set exactly the same as for max pooling, but instead of using the max function we use the sum or the mean.
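As a quick sketch of 2 x 2 max pooling with a stride of 2 (the feature-map values below are made up for illustration):

Python
import numpy as np

fmap = np.array([[1, 3, 2, 1],
                 [4, 6, 5, 2],
                 [7, 2, 1, 0],
                 [3, 4, 2, 8]])

# Keep the largest value in each non-overlapping 2 x 2 block,
# halving the width and height of the feature map.
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)   # [[6 5]
                #  [7 8]]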

Fully Connected Layer


The fully connected layer is a layer in which the input from the other layers is flattened into a vector and fed in. The layer then transforms the output into the number of classes desired by the network.

In the above diagram, the feature map matrix is converted into a vector (x1, x2, x3, ..., xn) with the help of the fully connected layers. We combine these features to create a model and apply an activation function such as softmax or sigmoid to classify the outputs as car, dog, truck, etc.
RECURRENT NEURAL NETWORK

Recurrent neural networks (RNN) are the state-of-the-art algorithm for sequential data and are used by Apple's Siri and Google's voice search. It is the first algorithm that remembers its input, due to an internal memory, which makes it perfectly suited for machine learning problems that involve sequential data. It is one of the algorithms behind the scenes of the amazing achievements seen in deep learning over the past few years. In this post, we'll cover the basic concepts of how recurrent neural networks work, what the biggest issues are and how to solve them.
Table of Contents

 Introduction
 How it works: RNN vs. Feed-forward neural network
 Backpropagation through time
 Two issues of standard RNNs: Exploding gradients & vanishing gradients
 LSTM: Long short-term memory
 Summary

Introduction to Recurrent Neural Networks (RNN)

RNNs are a powerful and robust type of neural network, and belong to the most promising algorithms in use because they are the only ones with an internal memory.

Like many other deep learning algorithms, recurrent neural networks are relatively old. They were initially created in the 1980s, but only in recent years have we seen their true potential. An increase in computational power, along with the massive amounts of data that we now have to work with, and the invention of long short-term memory (LSTM) in the 1990s, has really brought RNNs to the foreground.

Because of their internal memory, RNNs can remember important things about the input they received, which allows them to be very precise in predicting what's coming next. This is why they're the preferred algorithm for sequential data like time series, speech, text, financial data, audio, video, weather and much more. Recurrent neural networks can form a much deeper understanding of a sequence and its context compared to other algorithms.
WHAT IS A RECURRENT NEURAL NETWORK (RNN)?
Recurrent neural networks (RNN) are a class of neural networks that are helpful in modeling
sequence data. Derived from feedforward networks, RNNs exhibit similar behavior to how
human brains function. Simply put: recurrent neural networks produce predictive results in
sequential data that other algorithms can’t.

But when do you need to use a RNN?

“Whenever there is a sequence of data and that temporal dynamics that connects the data is more important than the spatial content of each individual frame.” – Lex Fridman (MIT)

Since RNNs are being used in the software behind Siri and Google
Translate, recurrent neural networks show up a lot in everyday life.

How Recurrent Neural Networks Work


To understand RNNs properly, you'll need a working knowledge of "normal" feed-forward neural networks and sequential data.

Sequential data is basically just ordered data in which related things follow
each other. Examples are financial data or the DNA sequence. The most
popular type of sequential data is perhaps time series data, which is just a
series of data points that are listed in time order.

RNN VS. FEED-FORWARD NEURAL NETWORKS


RNN’s and feed-forward neural networks get their names from the way
they channel information.
In a feed-forward neural network, the information only moves in one
direction — from the input layer, through the hidden layers, to the output
layer. The information moves straight through the network and never
touches a node twice.

Feed-forward neural networks have no memory of the input they receive and are bad at predicting what's coming next. Because a feed-forward network only considers the current input, it has no notion of order in time. It simply can't remember anything about what happened in the past except its training.

In a RNN the information cycles through a loop. When it makes a decision, it considers the current input and also what it has learned from the inputs it received previously.

The two images below illustrate the difference in information flow between a RNN and a feed-forward neural network.
A usual RNN has a short-term memory. In combination with a LSTM they also have a long-term memory (more on that later).

Another good way to illustrate the concept of a recurrent neural network's memory is to explain it with an example:

Imagine you have a normal feed-forward neural network and give it the
word "neuron" as an input and it processes the word character by
character. By the time it reaches the character "r," it has already forgotten
about "n," "e" and "u," which makes it almost impossible for this type of
neural network to predict which character would come next.

A recurrent neural network, however, is able to remember those characters because of its internal memory. It produces output, copies that output and loops it back into the network.

Simply put: recurrent neural networks add the immediate past to the
present.
Therefore, a RNN has two inputs: the present and the recent past. This is
important because the sequence of data contains crucial information about
what is coming next, which is why a RNN can do things other algorithms
can’t.
A feed-forward neural network assigns, like all other deep learning
algorithms, a weight matrix to its inputs and then produces the output.
Note that RNNs apply weights to the current and also to the previous
input. Furthermore, a recurrent neural network will also tweak the weights
for both through gradient descent and backpropagation through time
(BPTT).
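As a minimal sketch of that idea (the sizes and random parameters below are made up; this is a bare vanilla RNN cell, not a production implementation):

Python
import numpy as np

rng = np.random.default_rng(0)
W_xh = rng.standard_normal((3, 4))   # weights applied to the current input
W_hh = rng.standard_normal((3, 3))   # weights applied to the previous hidden state
b_h = np.zeros(3)

h = np.zeros(3)                      # the internal memory carried between time steps
for x in rng.standard_normal((5, 4)):        # a sequence of 5 inputs, 4 features each
    h = np.tanh(W_xh @ x + W_hh @ h + b_h)   # combine the present input with the recent past
print(h)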
TYPES OF RNNS
 One to One
 One to Many
 Many to One
 Many to Many

Also note that while feed-forward neural networks map one input to one
output, RNNs can map one to many, many to many (translation) and many
to one (classifying a voice).

Backpropagation Through Time
To understand the concept of backpropagation through time you'll need to
understand the concepts of forward and backpropagation first. We could
spend an entire article discussing these concepts, so I will attempt to
provide as simple a definition as possible.

WHAT IS BACKPROPAGATION?
Backpropagation (BP or backprop, for short) is known as a workhorse algorithm in machine learning. Backpropagation is used for calculating the gradient of an error function with respect to a neural network's weights. The algorithm works its way backwards through the various layers of gradients to find the partial derivative of the errors with respect to the weights. Backprop then uses these gradients to update the weights and decrease error margins during training.

In neural networks, you basically do forward-propagation to get the output of your model and check if this output is correct or incorrect, to get the error. Backpropagation is nothing but going backwards through your neural network to find the partial derivatives of the error with respect to the weights, which enables you to subtract this value from the weights.

Those derivatives are then used by gradient descent, an algorithm that can
iteratively minimize a given function. Then it adjusts the weights up or
down, depending on which decreases the error. That is exactly how a
neural network learns during the training process.

So, with backpropagation you basically try to tweak the weights of your
model while training.

The image below illustrates the concept of forward propagation and backpropagation in a feed-forward neural network:
BPTT is basically just a fancy buzz word for doing backpropagation on an
unrolled RNN. Unrolling is a visualization and conceptual tool, which
helps you understand what’s going on within the network. Most of the
time when implementing a recurrent neural network in the common
programming frameworks, backpropagation is automatically taken care of,
but you need to understand how it works to troubleshoot problems that
may arise during the development process.

You can view a RNN as a sequence of neural networks that you train one
after another with backpropagation.

The image below illustrates an unrolled RNN. On the left, the RNN is
unrolled after the equal sign. Note there is no cycle after the equal sign
since the different time steps are visualized and information is passed from
one time step to the next. This illustration also shows why a RNN can be
seen as a sequence of neural networks.

An unrolled version of RNN


If you do BPTT, the conceptualization of unrolling is required since the
error of a given timestep depends on the previous time step.

Within BPTT the error is backpropagated from the last to the first
timestep, while unrolling all the timesteps. This allows calculating the
error for each timestep, which allows updating the weights. Note that
BPTT can be computationally expensive when you have a high number of
timesteps.
Two issues of standard RNN’s
There are two major obstacles RNN’s have had to deal with, but to
understand them, you first need to know what a gradient is.

A gradient is a partial derivative with respect to its inputs. If you don't know what that means, just think of it like this: a gradient measures how much the output of a function changes if you change the inputs a little bit.

You can also think of a gradient as the slope of a function. The higher the
gradient, the steeper the slope and the faster a model can learn. But if the
slope is zero, the model stops learning. A gradient simply measures the
change in all weights with regard to the change in error.

EXPLODING GRADIENTS
Exploding gradients are when the algorithm, without much reason, assigns
a stupidly high importance to the weights. Fortunately, this problem can be
easily solved by truncating or squashing the gradients.

VANISHING GRADIENTS
Vanishing gradients occur when the values of a gradient are too small and
the model stops learning or takes way too long as a result. This was a
major problem in the 1990s and much harder to solve than the exploding
gradients. Fortunately, it was solved through the concept of LSTM by
Sepp Hochreiter and Juergen Schmidhuber.

Long Short-Term Memory (LSTM)


Long short-term memory networks (LSTMs) are an extension for recurrent
neural networks, which basically extends the memory. Therefore it is well
suited to learn from important experiences that have very long time lags in
between.

WHAT IS LONG SHORT-TERM MEMORY (LSTM)?


Long Short-Term Memory (LSTM) networks are an extension of RNN that extend the
memory. LSTM are used as the building blocks for the layers of a RNN. LSTMs assign data
“weights” which helps RNNs to either let new information in, forget information or give it
importance enough to impact the output.

The units of an LSTM are used as building units for the layers of a RNN,
often called an LSTM network.

LSTMs enable RNNs to remember inputs over a long period of time. This
is because LSTMs contain information in a memory, much like the
memory of a computer. The LSTM can read, write and delete information
from its memory.

This memory can be seen as a gated cell, with gated meaning the cell
decides whether or not to store or delete information (i.e., if it opens the
gates or not), based on the importance it assigns to the information. The
assigning of importance happens through weights, which are also learned
by the algorithm. This simply means that it learns over time
what information is important and what is not.

In an LSTM you have three gates: input, forget and output gate. These
gates determine whether or not to let new input in (input gate), delete the
information because it isn’t important (forget gate), or let it impact the
output at the current timestep (output gate). Below is an illustration of a
RNN with its three gates:
The gates in an LSTM are analog in the form of sigmoids, meaning they
range from zero to one. The fact that they are analog enables them to do
backpropagation.
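As a minimal sketch of one LSTM time step with these three gates (standard gate equations; the sizes and random parameters below are made up for illustration):

Python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])   # input gate: let new information in
    f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])   # forget gate: drop unimportant memory
    o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])   # output gate: expose memory at this step
    g = np.tanh(W["g"] @ x + U["g"] @ h_prev + b["g"])   # candidate values to write to memory
    c = f * c_prev + i * g                               # updated cell state (the memory)
    h = o * np.tanh(c)                                   # hidden state / output at this time step
    return h, c

rng = np.random.default_rng(0)                           # hypothetical sizes: 4 inputs, 3 hidden units
W = {k: rng.standard_normal((3, 4)) for k in "ifog"}
U = {k: rng.standard_normal((3, 3)) for k in "ifog"}
b = {k: np.zeros(3) for k in "ifog"}
h, c = np.zeros(3), np.zeros(3)
for x in rng.standard_normal((5, 4)):                    # a sequence of 5 time steps
    h, c = lstm_step(x, h, c, W, U, b)
print(h)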

The problematic issue of vanishing gradients is solved through LSTM because it keeps the gradients steep enough, which keeps the training relatively short and the accuracy high.

Graph neural network


Graphs are everywhere around us. Your social network is a graph
of people and relations. So is your family. The roads you take to
go from point A to point B constitute a graph. The links that
connect this webpage to others form a graph. When your
employer pays you, your payment goes through a graph of
financial institutions.

Basically, anything that is composed of linked entities can be represented as a graph. Graphs are excellent tools to visualize relations between people, objects, and concepts. Beyond visualizing information, however, graphs can also be good sources of data to train machine learning models for complicated tasks.

Graph neural networks (GNN) are a type of machine learning algorithm that can extract important information from graphs and make useful predictions. With graphs becoming more pervasive and richer with information, and artificial neural networks becoming more popular and capable, GNNs have become a powerful tool for many important applications.

Transforming graphs for neural network processing
Every graph is composed of nodes and edges. For example, in a social network, nodes can represent users and their characteristics (e.g., name, gender, age, city), while edges can represent the relations between the users. A more complex social graph can include other types of nodes, such as cities, sports teams, news outlets, as well as edges that describe the relations between the users and those nodes.

Unfortunately, the graph structure is not well suited for machine learning. Neural networks expect to receive their data in a uniform format. Multi-layer perceptrons expect a fixed number of input features. Convolutional neural networks expect a grid that represents the different dimensions of the data they process (e.g., width, height, and color channels of images).

Graphs can come in different structures and sizes, which does not
conform to the rectangular arrays that neural networks expect.
Graphs also have other characteristics that make them different
from the type of information that classic neural networks are
designed for. For instance, graphs are “permutation invariant,”
which means changing the order and position of nodes doesn’t
make a difference as long as their relations remain the same. In
contrast, changing the order of pixels results in a different image
and will cause the neural network that processes them to behave
differently.
To make graphs useful to deep learning algorithms, their data
must be transformed into a format that can be processed by a
neural network. The type of formatting used to represent graph
data can vary depending on the type of graph and the intended
application, but in general, the key is to represent the information
as a series of matrices.

For example, consider a social network graph. The nodes can be represented as a table of user characteristics. The node table, where each row contains information about one entity (e.g., user, customer, bank transaction), is the type of information that you would provide a normal neural network.

But graph neural networks can also learn from other information
that the graph contains. The edges, the lines that connect the
nodes, can be represented in the same way, with each row
containing the IDs of the users and additional information such as
date of friendship, type of relationship, etc. Finally, the general
connectivity of the graph can be represented as an adjacency
matrix that shows which nodes are connected to each other.

When all of this information is provided to the neural network, it can extract patterns and insights that go beyond the simple information contained in the individual components of the graph.
Graph embeddings

Graph neural networks can be created like any other neural network, using fully connected layers, convolutional layers, pooling layers, etc. The type and number of layers depend on the type and complexity of the graph data and the desired output.

The GNN receives the formatted graph data as input and produces a vector of numerical values that represent relevant information about nodes and their relations.

This vector representation is called “graph embedding.” Embeddings are often used in machine learning to transform complicated information into a structure that can be differentiated and learned. For example, natural language processing systems use word embeddings to create numerical representations of words and their relations.

How does the GNN create the graph embedding? When the graph
data is passed to the GNN, the features of each node are
combined with those of its neighboring nodes. This is called
“message passing.” If the GNN is composed of more than one
layer, then subsequent layers repeat the message-passing
operation, gathering data from neighbors of neighbors and
aggregating them with the values obtained from the previous
layer. For example, in a social network, the first layer of the GNN
would combine the data of the user with those of their friends,
and the next layer would add data from the friends of friends and
so on. Finally, the output layer of the GNN produces the
embedding, which is a vector representation of the node’s data
and its knowledge of other nodes in the graph.

Interestingly, this process is very similar to how convolutional neural networks extract features from pixel data. Accordingly, one very popular GNN architecture is the graph convolutional neural network (GCN), which uses convolution layers to create graph embeddings.
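As a minimal sketch of one such message-passing / graph-convolution step on a tiny made-up graph (the adjacency matrix, node features, and weights below are illustrative only, not a full GCN implementation):

Python
import numpy as np

A = np.array([[0, 1, 1, 0],          # adjacency matrix of a 4-node graph:
              [1, 0, 0, 1],          # which nodes are connected to which
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
X = np.random.randn(4, 3)            # node table: one row of 3 features per node
W = np.random.randn(3, 2)            # learnable weights producing 2-dimensional embeddings

A_hat = A + np.eye(4)                        # add self-loops so each node keeps its own features
D_inv = np.diag(1.0 / A_hat.sum(axis=1))     # normalise by the number of neighbours
H = np.tanh(D_inv @ A_hat @ X @ W)           # every node aggregates its neighbours' features
print(H)                                     # 4 x 2 matrix: one embedding per node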

Applications of graph neural networks

Once you have a neural network that can learn the embeddings
of a graph, you can use it to accomplish different tasks.

Here are a few applications for graph neural networks:

Node classification: One of the powerful applications of GNNs is adding new information to nodes or filling gaps where information is missing. For example, say you are running a social network and you have spotted a few bot accounts. Now you want to find out if there are other bot accounts in your network. You can train a GNN to classify other users in the social network as “bot” or “not bot” based on how close their graph embeddings are to those of the known bots.

Edge prediction: Another way to put GNNs to use is to find new edges that can add value to the graph. Going back to our social network, a GNN can find users (nodes) who are close to you in embedding space but who aren’t your friends yet (i.e., there isn’t an edge connecting you to each other). These users can then be introduced to you as friend suggestions.

Clustering: GNNs can glean new structural information from graphs. For example, in a social network where everyone is in one way or another related to others (through friends, or friends of friends, etc.), the GNN can find nodes that form clusters in the embedding space. These clusters can point to groups of users who share similar interests, activities, or other inconspicuous characteristics, regardless of how close their relations are. Clustering is one of the main tools used in machine learning–based marketing.

Graph neural networks are very powerful tools. They have already found powerful applications in domains such as route planning, fraud detection, network optimization, and drug research. Wherever there is a graph of related entities, GNNs can help get the most value from the existing data.
Ensemble learning
Ensemble learning helps improve machine learning results by combining several models. This approach allows the production of better predictive performance compared to a single model. The basic idea is to learn a set of classifiers (experts) and to allow them to vote.
Advantage: Improvement in predictive accuracy.
Disadvantage: An ensemble of classifiers is difficult to interpret.

Why do ensembles work?

Dietterich (2002) showed that ensembles overcome three problems –
 Statistical Problem –
The Statistical Problem arises when the hypothesis space is too
large for the amount of available data. Hence, there are many
hypotheses with the same accuracy on the data, and the learning
algorithm chooses only one of them. There is a risk that the
accuracy of the chosen hypothesis is low on unseen data.
 Computational Problem –
The Computational Problem arises when the learning algorithm
cannot guarantee finding the best hypothesis.
 Representational Problem –
The Representational Problem arises when the hypothesis space
does not contain any good approximation of the target class(es).
What is the main challenge in developing ensemble models?
The main challenge is not to obtain highly accurate base models, but rather
to obtain base models which make different kinds of errors. For example, if
ensembles are used for classification, high accuracies can be accomplished
if different base models misclassify different training examples, even if the
base classifier accuracy is low.
Methods for Independently Constructing Ensembles –
 Majority Vote
 Bagging and Random Forest
 Randomness Injection
 Feature-Selection Ensembles
 Error-Correcting Output Coding
Methods for Coordinated Construction of Ensembles –
 Boosting
 Stacking
 Reliable Classification: Meta-Classifier Approach
 Co-Training and Self-Training
Types of Ensemble Classifier –
Bagging:
Bagging (Bootstrap Aggregation) is used to reduce the variance of a decision
tree. Given a set D of d tuples, at each iteration i a training set Di of d
tuples is sampled with replacement from D (i.e., a bootstrap sample). A classifier
model Mi is then learned from each training set Di. Each classifier Mi returns its
class prediction. The bagged classifier M* counts the votes and assigns the
class with the most votes to X (an unknown sample).
Implementation steps of Bagging –
1. Multiple subsets are created from the original data set, each with the same
number of tuples, selecting observations with replacement.
2. A base model is created on each of these subsets.
3. Each model is learned in parallel from its training set,
independently of the others.
4. The final predictions are determined by combining the predictions
from all the models.
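A minimal sketch of these steps using scikit-learn's BaggingClassifier (the Iris dataset and hyperparameters are illustrative placeholders, not part of the original text):

from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 10 base models (decision trees by default), each fit on a bootstrap
# sample drawn with replacement; predictions are combined by voting.
bagging = BaggingClassifier(n_estimators=10, bootstrap=True, random_state=0)
bagging.fit(X_train, y_train)
print(accuracy_score(y_test, bagging.predict(X_test)))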
Random Forest:
Random Forest is an extension over bagging. Each classifier in the
ensemble is a decision tree classifier and is generated using a random
selection of attributes at each node to determine the split. During
classification, each tree votes and the most popular class is returned.
Implementation steps of Random Forest –
1. Multiple subsets are created from the original data set,
selecting observations with replacement.
2. A subset of features is selected randomly, and whichever
feature gives the best split is used to split the node iteratively.
3. Each tree is grown to its largest extent.
4. Repeat the above steps; the prediction is given based on the
aggregation of predictions from n trees.
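A corresponding sketch with scikit-learn's RandomForestClassifier (again, the dataset and hyperparameters are placeholders rather than part of the original text):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 trees; each split considers a random subset of features ("sqrt" of the total)
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X_train, y_train)
print(accuracy_score(y_test, forest.predict(X_test)))  # majority vote across trees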
What is reinforcement learning?

Reinforcement learning is a machine learning training method based on rewarding
desired behaviors and/or punishing undesired ones. In general, a reinforcement
learning agent is able to perceive and interpret its environment, take actions and learn
through trial and error.

How does reinforcement learning work?

In reinforcement learning, developers devise a method of rewarding desired behaviors
and punishing undesired ones. This method assigns positive values to the desired
actions to encourage the agent and negative values to undesired behaviors. This
trains the agent to seek the maximum overall long-term reward in order to reach an
optimal solution.

These long-term goals help prevent the agent from stalling on lesser goals. With time,
the agent learns to avoid the negative and seek the positive. This learning method has
been adopted in artificial intelligence (AI) as a way of directing unsupervised machine
learning through rewards and penalties.

Applications and examples of reinforcement learning

While reinforcement learning has been a topic of much interest in the field of AI, its
widespread, real-world adoption and application remain limited. That said,
research papers abound on theoretical applications, and there have been some
successful use cases.

Current use cases include, but are not limited to, the following:

 gaming

 resource management

 personalized recommendations

 robotics

Gaming is likely the most common usage field for reinforcement learning. It is
capable of achieving superhuman performance in numerous games. A common
example involves the game Pac-Man.
A learning algorithm playing Pac-Man might have the ability to move in one of four
possible directions, barring obstruction. From pixel data, an agent might be given a
numeric reward for the result of a unit of travel: 0 for empty space, 1 for pellets, 2 for
fruit, 3 for power pellets, 4 for eating a ghost after a power pellet, 5 for collecting all pellets and
completing a level, and a 5-point deduction for collision with a ghost. The agent starts
from randomized play and moves to more sophisticated play, learning the goal of
collecting all pellets to complete the level. Given time, an agent might even learn tactics
like conserving power pellets until needed for self-defense.
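As a toy illustration of the reward scheme described above (the event names are invented for this sketch), the reward signal could be encoded as a simple lookup:

# Hypothetical reward function mirroring the Pac-Man scheme above
REWARDS = {
    "empty_space": 0,
    "pellet": 1,
    "fruit": 2,
    "power_pellet": 3,
    "ghost_after_power_pellet": 4,
    "level_cleared": 5,
    "hit_by_ghost": -5,
}

def reward(event):
    # Unknown events default to no reward
    return REWARDS.get(event, 0)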

Reinforcement learning can operate in a situation as long as a clear reward can be
applied. In enterprise resource management (ERM), reinforcement learning algorithms
can allocate limited resources to different tasks as long as there is an overall goal it is
trying to achieve. A goal in this circumstance would be to save time or conserve
resources.

In robotics, reinforcement learning has found its way into limited tests. This type of
machine learning can provide robots with the ability to learn tasks a human teacher
cannot demonstrate, to adapt a learned skill to a new task, or to achieve optimization
even when no analytic formulation is available.

Reinforcement learning is also used in operations research, information theory, game
theory, control theory, simulation-based optimization, multiagent systems, swarm
intelligence, statistics and genetic algorithms.

Challenges of applying reinforcement learning

Reinforcement learning, while high in potential, can be difficult to deploy and remains
limited in its application. One of the barriers for deployment of this type of machine
learning is its reliance on exploration of the environment.

For example, if you were to deploy a robot that relied on reinforcement learning
to navigate a complex physical environment, it would seek new states and take different
actions as it moved. It is difficult to consistently take the best actions in a real-world
environment, however, because of how frequently the environment changes.
The time required to ensure the learning is done properly through this method can
limit its usefulness and be intensive on computing resources. As the training
environment grows more complex, so too do demands on time and compute resources.

Supervised learning can deliver faster, more efficient results than reinforcement
learning to companies if the proper amount of data is available, as it can be employed
with fewer resources.

Common reinforcement learning algorithms

Rather than referring to a specific algorithm, the field of reinforcement learning is
made up of several algorithms that take somewhat different approaches. The
differences are mainly due to their strategies for exploring their environments.

 State-action-reward-state-action (SARSA). This reinforcement learning
algorithm starts by giving the agent what's known as a policy. The policy is
essentially a probability distribution that tells it the odds of certain actions resulting in
rewards, or beneficial states.

 Q-learning. This approach to reinforcement learning takes the opposite
approach. The agent receives no policy, meaning its exploration of its
environment is more self-directed. (A minimal tabular sketch of the Q-learning
update rule follows this list.)

 Deep Q-Networks. These algorithms utilize neural networks in addition to
reinforcement learning techniques. They utilize the self-directed
environment exploration of reinforcement learning. Future actions are
based on a random sample of past beneficial actions learned by the neural
network.
A neural network is a set of algorithms modeled loosely on the human brain and
designed to recognize patterns.
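As a minimal sketch of the tabular Q-learning idea mentioned above (the state and action representation and all hyperparameters are placeholders rather than a specific library's API):

import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 0.99, 0.1   # learning rate, discount factor, exploration rate
Q = defaultdict(float)                   # Q[(state, action)] -> estimated long-term reward

def choose_action(state, actions):
    # Epsilon-greedy exploration: usually exploit the best-known action, sometimes explore
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, actions):
    # Q-learning update: nudge Q toward the reward plus the discounted best next value
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])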
How is reinforcement learning different from supervised and
unsupervised learning?
Reinforcement learning is considered its own branch of machine learning, though it
does have some similarities to other types of machine learning, which break down into
the following four domains:

1. Supervised learning. In supervised learning, algorithms train on a body of
labeled data. Supervised learning algorithms can only learn attributes that
are specified in the data set. Common applications of supervised learning
are image recognition models. These models receive a set of labeled
images and learn to distinguish common attributes of predefined forms.

2. Unsupervised learning. In unsupervised learning, developers turn
algorithms loose on fully unlabeled data. The algorithm learns by
cataloging its own observations about data features without being told
what to look for.

3. Semisupervised learning. This method takes a middle-ground approach.
Developers enter a relatively small set of labeled training data, as well as a
larger corpus of unlabeled data. The algorithm is then instructed to
extrapolate what it learns from the labeled data to the unlabeled data and
draw conclusions from the set as a whole.

4. Reinforcement learning. This takes a different approach altogether. It
situates an agent in an environment with clear parameters defining
beneficial activity and nonbeneficial activity and an overarching endgame
to reach. It is similar in some ways to supervised learning in that
developers must give algorithms clearly specified goals and define rewards
and punishments. This means the level of explicit programming required is
greater than in unsupervised learning. But, once these parameters are set,
the algorithm operates on its own, making it much more self-directed than
supervised learning algorithms. For this reason, people sometimes refer to
reinforcement learning as a branch of semisupervised learning, but in
truth, it is most often acknowledged as its own type of machine learning.
DEEP LEARNING
Deep learning is a branch of machine learning, which is itself a subset of
artificial intelligence. Just as neural networks imitate the human brain, so does deep
learning. In deep learning, nothing is programmed explicitly. Basically, it is a
class of machine learning that makes use of numerous nonlinear processing units to
perform feature extraction as well as transformation. The output from each
preceding layer is taken as input by each of the successive layers.

Deep learning models are capable of identifying the relevant features
themselves, requiring only a little guidance from the programmer, and are very helpful
in dealing with the problem of dimensionality. Deep learning algorithms are used
especially when we have a huge number of inputs and outputs.

Since deep learning has evolved from machine learning, which itself is a
subset of artificial intelligence, and since the idea behind artificial intelligence is to
mimic human behavior, the idea of deep learning is likewise to build
algorithms that can mimic the brain.

Deep learning is implemented with the help of neural networks, and the motivation
behind neural networks is the biological neuron, which is nothing but a
brain cell.

Deep learning is a collection of statistical machine learning techniques for learning feature
hierarchies, based on artificial neural networks.

So basically, deep learning is implemented with the help of deep networks, which are
nothing but neural networks with multiple hidden layers.

Example of Deep Learning

Consider a face-recognition example in which raw image data is provided to
the input layer. The input layer determines the patterns of local
contrast, that is, it differentiates on the basis of colors, luminosity, etc. Then
the 1st hidden layer determines the facial features, i.e., it fixates on eyes, nose,
lips, etc., and matches those facial features to the correct face template.
The 2nd hidden layer then determines the correct face, which is
passed on to the output layer. Likewise, more hidden layers can be added to
solve more complex problems, for example, if you want to find a particular
kind of face having a dark or light complexion. So, as the hidden layers
increase, we are able to solve more complex problems.

Architectures
o Deep Neural Networks
A deep neural network is a neural network with a certain level of complexity, meaning
several hidden layers are encompassed between the input and output
layers. Deep neural networks are highly proficient at modeling and processing non-linear associations.
o Deep Belief Networks
A deep belief network (DBN) is a class of deep neural network that comprises multiple
layers of belief networks (stacked restricted Boltzmann machines).
Steps to perform DBN:
1. With the help of the Contrastive Divergence algorithm, a layer of features is
learned from the visible units.
2. Next, the previously trained features are treated as visible units, and a new
layer of features is learned from them.
3. Lastly, when the learning of the final hidden layer is accomplished, the
whole DBN is trained.
o Recurrent Neural Networks
Recurrent neural networks permit parallel as well as sequential computation, similar in
this respect to the human brain (a large feedback network of connected neurons). Because they are
capable of remembering the important information about the inputs they
have received, they can be more precise.

Types of Deep Learning Networks

1. Feed Forward Neural Network

A feed-forward neural network is none other than an artificial neural network which
ensures that the nodes do not form a cycle. In this kind of neural network, all the
perceptrons are organized within layers, such that the input layer takes the input and
the output layer generates the output. Since the hidden layers do not link with the
outside world, they are called hidden layers. Each of the perceptrons contained in one
single layer is associated with each node in the subsequent layer. It can be concluded
that all of the nodes are fully connected. There are no connections between the nodes
in the same layer, and there are no back-loops in the feed-forward network. To
minimize the prediction error, the backpropagation algorithm can be used to update
the weight values.
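A small illustrative sketch of such a fully connected feed-forward network in Keras (the layer sizes and the 784-dimensional input are assumptions, e.g. flattened 28x28 images, not taken from the original text):

from tensorflow import keras
from tensorflow.keras import layers

# Fully connected feed-forward network: every node in one layer connects
# to every node in the next, and there are no cycles or back-loops.
model = keras.Sequential([
    keras.Input(shape=(784,)),               # e.g. flattened 28x28 images
    layers.Dense(128, activation="relu"),    # hidden layer 1
    layers.Dense(64, activation="relu"),     # hidden layer 2
    layers.Dense(10, activation="softmax"),  # output layer with 10 classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=5)      # backpropagation updates the weights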

Applications:

o Data Compression
o Pattern Recognition
o Computer Vision
o Sonar Target Recognition
o Speech Recognition
o Handwritten Characters Recognition

2. Recurrent Neural Network

Recurrent neural networks are yet another variation of feed-forward networks. Here
each of the neurons present in the hidden layers receives an input with a specific
delay in time. The recurrent neural network mainly accesses the preceding information of
existing iterations. For example, to guess the succeeding word in any sentence, one
must have knowledge about the words that were previously used. It not only
processes the inputs but also shares weights across time steps, so the size of the
model does not increase with the size of the input. The problems with recurrent
neural networks, however, are that they are computationally slow, they do not take
any future input into account for the current state, and they have difficulty
remembering information from far in the past.
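An illustrative Keras sketch of a simple recurrent network for next-word prediction (the vocabulary size, sequence length and layer sizes are assumptions):

from tensorflow import keras
from tensorflow.keras import layers

vocab_size, seq_len = 10000, 20             # hypothetical vocabulary size and sequence length

model = keras.Sequential([
    keras.Input(shape=(seq_len,)),
    layers.Embedding(vocab_size, 64),       # map word indices to dense vectors
    layers.SimpleRNN(128),                  # hidden state carries information from earlier words
    layers.Dense(vocab_size, activation="softmax"),  # probability distribution over the next word
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")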

Applications:

o Machine Translation
o Robot Control
o Time Series Prediction
o Speech Recognition
o Speech Synthesis
o Time Series Anomaly Detection
o Rhythm Learning
o Music Composition

3. Convolutional Neural Network

Convolutional neural networks are a special kind of neural network mainly used for
image classification, clustering of images and object recognition. Deep CNNs enable the
construction of hierarchical image representations. To achieve the best
accuracy, deep convolutional neural networks are preferred over other types of
neural network.
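An illustrative Keras sketch of a small convolutional network for image classification (the input shape and layer sizes are assumptions, e.g. 28x28 grayscale images):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, kernel_size=3, activation="relu"),  # learn local image features
    layers.MaxPooling2D(pool_size=2),                     # downsample the feature maps
    layers.Conv2D(64, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),               # class probabilities
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])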

Applications:

o Identify Faces, Street Signs, Tumors.
o Image Recognition.
o Video Analysis.
o NLP.
o Anomaly Detection.
o Drug Discovery.
o Checkers Game.
o Time Series Forecasting.

4. Restricted Boltzmann Machine

RBMs are yet another variant of Boltzmann machines. Here the neurons present in
the input (visible) layer and the hidden layer have symmetric connections between them.
However, there are no connections within a layer. In contrast to RBMs, unrestricted
Boltzmann machines do have internal connections inside the hidden layer. These
restrictions in RBMs help the model to train efficiently.
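As an illustrative sketch (the hyperparameters are placeholders), scikit-learn's BernoulliRBM can be used for unsupervised feature learning, for example in a pipeline ahead of a simple classifier:

from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# The RBM learns hidden features from input data scaled to [0, 1];
# a logistic regression then classifies using those learned features.
rbm = BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=10, random_state=0)
model = Pipeline([("rbm", rbm), ("clf", LogisticRegression(max_iter=1000))])
# model.fit(X_train, y_train)   # assuming X_train holds values in [0, 1]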

Applications:

o Filtering.
o Feature Learning.
o Classification.
o Risk Detection.
o Business and Economic analysis.

5. Autoencoders
An autoencoder neural network is another kind of unsupervised machine learning
algorithm. Here the number of hidden units is smaller than the number of input
units, but the number of input units is equal to the number of output units. An
autoencoder network is trained to reproduce the fed input at the output, which forces
it to find common patterns and generalize the data. Autoencoders are mainly used to
produce a smaller representation of the input, and they help in reconstructing the
original data from the compressed data. The algorithm is comparatively simple as it
only requires the output to be identical to the input.

o Encoder: Converts the input data into a lower-dimensional representation.
o Decoder: Reconstructs the original data from the compressed representation.
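A minimal Keras sketch of such an autoencoder (the input dimension and code size are assumptions):

from tensorflow import keras
from tensorflow.keras import layers

input_dim, code_dim = 784, 32                # e.g. flattened 28x28 images, 32-dimensional code

inputs = keras.Input(shape=(input_dim,))
encoded = layers.Dense(code_dim, activation="relu")(inputs)        # encoder: compress the input
decoded = layers.Dense(input_dim, activation="sigmoid")(encoded)   # decoder: reconstruct it

autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_train, x_train, epochs=10)   # note: the targets are the inputs themselves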

Applications:

o Classification.
o Clustering.
o Feature Compression.
Deep learning applications
o Self-Driving Cars
Self-driving cars capture images of their surroundings and process a huge
amount of data in order to decide which action to take: turn left, turn
right, or stop. Based on this, the car decides what action to take, which
helps reduce the accidents that happen every year.
o Voice Controlled Assistance
When we talk about voice-controlled assistance, Siri is the first thing that comes
to mind. You can tell Siri whatever you want it to do for you, and it will
search for it and display the result for you.
o Automatic Image Caption Generation
For whatever image you upload, the algorithm generates a caption accordingly.
For example, for an image of a blue-colored eye, it will display the blue-colored
eye with a matching caption at the bottom of the image.
o Automatic Machine Translation
With the help of deep learning, automatic machine translation can convert text from
one language into another.

Limitations
o It learns only from observations.
o It can suffer from bias issues.

Advantages
o It lessens the need for feature engineering.
o It eliminates unnecessary costs.
o It can identify defects that are difficult to detect.
o It delivers best-in-class performance on many problems.

Disadvantages
o It requires an ample amount of data.
o It is quite expensive to train.
o It lacks a strong theoretical foundation.
