SEMINAR REPORT
ON
“Deep Learning”
Bachelor of Engineering
in
Electronics and Communication Engineering
Guided By: Kapil Parihar (Assistant Professor, ECE Dept.)
Submitted By: Manan Mehta (16R/07699)
Department of Electronics and Communication Engineering
M.B.M. Engineering College
Jai Narain Vyas University,
Jodhpur-342011
2019-20
CERTIFICATE
This is to certify that this seminar report titled "Deep Learning" has been submitted by
Manan Mehta (16R/07699) in partial fulfillment of the requirements for the degree of
Bachelor of Engineering in Electronics and Communication Engineering at
MBM Engineering College, JNVU, Jodhpur during the academic year 2019-20, and is a
record of the study carried out by him under my guidance and supervision.
______________________
ACKNOWLEDGEMENT
Foremost, I would like to express my sincere gratitude towards my mentor Kapil Parihar
sir (Assistant Professor, ECE dept.) for his continuous support and guidance during my
study on Deep Learning. I could not have expected a better mentor than him during this
study, his immense knowledge, patience has helped me in going through the topic in
detail.
Beside him, I would like to thank Professor Dr. Rajesh Bhadada (Head of Department,
Electronics and Communication Engineering) for giving me such a wonderful
opportunity to enhance my knowledge in the area of deep learning and giving me
guidelines to present a seminar report on the same.
I would also like to thank Pulkit Arora sir for his continuous guidance and support in
selecting state of art area for study and presentation. Apart from him, I would also like
to thank my fellow classmates for their help and motivation during this study.
Last but not the least, I would like to thank my family members and friends for their
continuous support throughout my whole engineering life.
ABSTRACT:
TABLE OF CONTENTS:
1. Certificate
2. Acknowledgement
3. Abstract
4. Chapter-1 Introduction
5. Chapter-2 Literature Review
6. Chapter-3 Modern Deep Learning
7. Chapter-4 Overview
   4.3 Advantages
   4.4 Disadvantages
   4.5 Conclusion
References
CHAPTER-1: INTRODUCTION:
Inventors have long dreamed of creating machines that think. This desire dates back to
at least the time of ancient Greece. The mythical figures Pygmalion, Daedalus, and
Hephaestus may all be interpreted as legendary inventors, and Galatea, Talos, and
Pandora may all be regarded as artificial life (Ovid and Martin, 2004; Sparkes, 1996;
Tandy, 1997). When programmable computers were first conceived, people wondered
whether such machines might become intelligent, over a hundred years before one
was built (Lovelace, 1842). Today, artificial intelligence (AI) is a thriving field with many
practical applications and active research topics. We look to intelligent software to
automate routine labour, understand speech or images, make diagnoses in medicine
and support basic scientific research. In the early days of artificial intelligence, the field
rapidly tackled and solved problems that are intellectually difficult for human beings
but relatively straightforward for computers: problems that can be described by a list
of formal, mathematical rules. The true challenge to artificial intelligence proved to be
solving the tasks that are easy for people to perform but hard for people to describe
formally—problems that we solve intuitively, that feel automatic, like recognizing
spoken words or faces in images.
Many of the early successes of AI took place in relatively sterile and formal
environments and did not require computers to have much knowledge about the
world. For example, IBM’s Deep Blue chess-playing system defeated world champion
Garry Kasparov in 1997 (Hsu, 2002). Chess is of course a very simple world, containing
only sixty-four locations and thirty-two pieces that can move in only rigidly
circumscribed ways. Devising a successful chess strategy is a tremendous
accomplishment, but the challenge is not due to the difficulty of describing the set of
chess pieces and allowable moves to the computer. Chess can be completely described
by a very brief list of completely formal rules, easily provided ahead of time by the
programmer.
Ironically, abstract and formal tasks that are among the most difficult mental
undertakings for a human being are among the easiest for a computer. Computers have
long been able to defeat even the best human chess player but only recently have begun
matching some of the abilities of average human beings to recognize objects or speech.
A person’s everyday life requires an immense amount of knowledge about the world.
Much of this knowledge is subjective and intuitive, and therefore difficult to articulate
in a formal way. Computers need to capture this same knowledge in order to behave in
an intelligent way. One of the key challenges in artificial intelligence is how to get this
informal knowledge into a computer.
The solution is to allow computers to acquire their own knowledge by extracting
patterns from raw data; this capability is known as machine learning. The performance
of simple machine learning algorithms depends heavily on the representation of the
data they are given. For example, when logistic regression is used to recommend a
medical treatment, the system does not examine the patient directly; instead, the doctor
gives the system several pieces of relevant information. Each piece of information
included in the representation of the patient is known as a feature. Logistic regression
learns how each of these features of the patient correlates with various outcomes.
However, it cannot influence how the features are defined in any way. Many artificial
intelligence tasks can be solved by designing the right set of features to extract for that
task, then providing these features to a simple machine learning algorithm.
The quintessential example of a deep learning model is the feedforward deep network,
or multilayer perceptron (MLP). A multilayer perceptron is just a mathematical function
mapping some set of input values to output values. The function is formed by
composing many simpler functions. We can think of each application of a different
mathematical function as providing a new representation of the input.
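To make this idea of composition concrete, here is a minimal NumPy sketch of a
two-layer network; the sizes and random weights are illustrative choices, not taken
from any model discussed here:

import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, b):
    # one simple function: an affine map followed by a ReLU nonlinearity
    return np.maximum(0.0, W @ x + b)

x = rng.normal(size=3)                          # input values
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

h = layer(x, W1, b1)   # first function: a new representation of the input
y = W2 @ h + b2        # second function, applied to that representation
print(y)               # the composed mapping f2(f1(x))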
The idea of learning the right representation for the data provides one perspective on
deep learning. Another perspective on deep learning is that depth enables the computer
to learn a multistep computer program. Each layer of the representation can be thought
of as the state of the computer’s memory after executing another set of instructions in
parallel. Networks with greater depth can execute more instructions in sequence.
Sequential instructions offer great power because later instructions can refer back to the
results of earlier instructions. According to this view of deep learning, not all the
information in a layer’s activations necessarily encodes factors of variation that explain
the input. The representation also stores state information that helps to execute a
program that can make sense of the input. This state information could be analogous to
a counter or pointer in a traditional computer program. It has nothing to do with the
content of the input specifically, but it helps the model to organize its processing.
There are two main ways of measuring the depth of a model. The first view is based on
the number of sequential instructions that must be executed to evaluate the architecture.
We can think of this as the length of the longest path through a flow chart that describes
how to compute each of the model’s outputs given its inputs. Just as two equivalent
computer programs will have different lengths depending on which language the
program is written in, the same function may be drawn as a flowchart with different
depths depending on which functions we allow to be used as individual steps in the
flowchart.
Another approach, used by deep probabilistic models, regards the depth of a model as
being not the depth of the computational graph but the depth of the graph describing
how concepts are related to each other. In this case, the depth of the flowchart of the
computations needed to compute the representation of each concept may be much
deeper than the graph of the concepts themselves. This is because the system’s
understanding of the simpler concepts can be refined given information about the more
complex concepts. For example, an AI system observing an image of a face with one
eye in shadow may initially see only one eye. After detecting that a face is present, the
system can then infer that a second eye is probably present as well. In this case, the
graph of concepts includes only two layers—a layer for eyes and a layer for faces—but
the graph of computations includes 2N layers if we refine our estimate of each concept
given the other N times.
Figure 1.2: Illustration of computational graphs mapping an input to an output where each node
performs an operation.
Because it is not always clear which of these two views—the depth of the
computational graph, or the depth of the probabilistic modelling graph—is most
relevant, and because different people choose different sets of smallest elements from
which to construct their graphs, there is no single correct value for the depth of an
architecture, just as there is no single correct value for the length of a computer
program. Nor is there a consensus about how much depth a model requires to qualify
as “deep.” However, deep learning can be safely regarded as the study of models that
involve a greater amount of composition of either learned functions or learned
concepts than traditional machine learning does.
To summarize, deep learning, the subject of this report, is an approach to AI.
Specifically, it is a type of machine learning, a technique that enables computer
systems to improve with experience and data. We contend that machine learning is
the only viable approach to building AI systems that can operate in complicated real-
world environments. Deep learning is a particular kind of machine learning that
achieves great power and flexibility by representing the world as a nested hierarchy of
concepts, with each concept defined in relation to simpler concepts, and more abstract
representations computed in terms of less abstract ones.
9
1.2 HISTORICAL TRENDS IN DEEP LEARNING:
It is easiest to understand deep learning with some historical context. Rather than
providing a detailed history of deep learning, we identify a few key trends:
- Deep learning has had a long and rich history, but has gone by many names,
  reflecting different philosophical viewpoints, and has waxed and waned in
  popularity.
- Deep learning has become more useful as the amount of available training data
  has increased.
- Deep learning models have grown in size over time as computer infrastructure
  (both hardware and software) for deep learning has improved.
- Deep learning has solved increasingly complicated applications with increasing
  accuracy over time.
The main reason for the diminished role of neuroscience in deep learning research
today is that we simply do not have enough information about the brain to use it as a
guide. To obtain a deep understanding of the actual algorithms used by the brain, we
would need to be able to monitor the activity of (at the very least) thousands of
interconnected neurons simultaneously. Because we are not able to do this, we are far
from understanding even some of the simplest and most well-studied parts of the brain.
Media accounts often emphasize the similarity of deep learning to the brain. While it is
true that deep learning researchers are more likely to cite the brain as an influence
than researchers working in other machine learning fields, such as kernel machines or
Bayesian statistics, one should not view deep learning as an attempt to simulate the
brain. Modern deep learning draws inspiration from many fields, especially applied
math fundamentals like linear algebra, probability, information theory, and numerical
optimization. While some deep learning researchers cite neuroscience as an important
source of inspiration, others are not concerned with neuroscience at all.
It is worth noting that the effort to understand how the brain works on an algorithmic
level is alive and well. This endeavor is primarily known as “computational
neuroscience” and is a separate field of study from deep learning. It is common for
researchers to move back and forth between both fields. The field of deep learning is
primarily concerned with how to build computer systems that are able to successfully
solve tasks requiring intelligence, while the field of computational neuroscience is
primarily concerned with building more accurate models of how the brain actually
works.
The central idea in connectionism is that a large number of simple computational units
can achieve intelligent behaviour when networked together. This insight applies
equally to neurons in biological nervous systems as it does to hidden units in
computational models.
CHAPTER-2: LITERATURE REVIEW:
Deep learning is a very rapidly evolving field, with new research papers being published
every day in this state-of-the-art technology. These research papers have led to some
important breakthroughs that are used by billions of people, from voice assistants to
image-recognition apps such as Google Lens. I have tried to include some of the most
influential papers in the history of deep learning, ones that have brought major
breakthroughs in the technology.
"Deep Residual Learning for Image Recognition" by He, K., Zhang, X., Ren, S., & Sun, J.
(2016) presents a residual learning framework to ease the training of neural networks
that are substantially deeper than those used previously. The authors explicitly
reformulate the layers as learning residual functions with reference to the layer inputs,
instead of learning unreferenced functions. They provide comprehensive empirical
evidence showing that these residual networks are easier to optimize, and can gain
accuracy from considerably increased depth.
The neural networks generally used for image and video classification and recognition
are known as convolutional neural networks. "Large-Scale Video Classification with
Convolutional Neural Networks" by Karpathy, A., Toderici, G., Shetty, S., Leung, T.,
Sukthankar, R., & Fei-Fei, L. (2014) shows that convolutional neural networks (CNNs)
have been established as a powerful class of models for image recognition problems.
Encouraged by these results, the authors provide an extensive empirical evaluation of
CNNs on large-scale video classification using a new dataset of 1 million YouTube
videos belonging to 487 classes.
"Generative Adversarial Nets" by Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B.,
Warde-Farley, D., Ozair, S., Courville, A.C., & Bengio, Y. (2014) proposes a new
framework for estimating generative models via an adversarial process, in which two
models are trained simultaneously: a generative model G that captures the data
distribution, and a discriminative model D that estimates the probability that a sample
came from the training data rather than from G. It is a very influential paper in the
history of deep learning, leading to developments in GANs, which in turn are
influencing various industries.
"High-Speed Tracking with Kernelized Correlation Filters" by Henriques, J.F., Caseiro,
R., Martins, P., & Batista, J. (2015) observes that in most modern trackers, to cope with
natural image changes, a classifier is typically trained with translated and scaled sample
patches. The authors propose an analytic model for datasets of thousands of translated
patches. By showing that the resulting data matrix is circulant, they diagonalize it with
the discrete Fourier transform, reducing both storage and computation by several orders
of magnitude.
"Dropout: A Simple Way to Prevent Neural Networks from Overfitting" by Srivastava,
N., Hinton, G.E., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014) introduces a
technique very frequently used when developing deep learning models. The key idea is
to randomly drop units (along with their connections) from the neural network during
training. This prevents units from co-adapting too much, significantly reduces
overfitting, and gives major improvements over other regularization methods.
Figure 2.3: Densely connected neurons can lead to overfitting; dropout mitigates this by
randomly disconnecting some units in the network during training.
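As a rough illustration of the idea, here is a hand-rolled "inverted dropout" mask in
NumPy; the layer values and drop probability below are made up for the example:

import numpy as np

def dropout(activations, p_drop=0.5, rng=np.random.default_rng(0)):
    # zero out each unit with probability p_drop during training,
    # and rescale the survivors so the expected activation is unchanged
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

h = np.array([0.3, 1.2, 0.7, 0.9])   # activations of one hidden layer
print(dropout(h))                    # roughly half the units are dropped

At test time, dropout is switched off and the full network is used.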
Human-Level Control Through Deep Reinforcement Learning by Volodymyr Mnih,
Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex
Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles
Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan
Wierstra, Shane Legg & Demis Hassabis (2015) used recent advances in training deep
neural networks to develop a novel artificial agent, termed a deep Q-network (DQN),
which can learn
successful policies directly from high-dimensional sensory inputs using end-to-end
reinforcement learning. They tested this agent on the challenging domain of classic
Atari 2600 games. Further, they demonstrated that the deep Q-network agent,
receiving only the pixels and the game score as inputs, was able to surpass the
performance of all previous algorithms and achieve a level comparable to that of a
professional human games tester across a set of 49 games, using the same algorithm,
network architecture, and hyperparameters.
outperform state-of-the-art (SOTA) on both unsupervised ImageNet synthesis, as well
as in the conditional setting.
CHAPTER-3: MODERN DEEP LEARNING:
This part of the report focuses on the modern-day deep neural networks which are
heavily used to solve problems faced in the industry. Modern deep learning provides a
powerful framework for supervised learning. By adding more layers and more units
within a layer, a deep network can represent functions of increasing complexity. Most
tasks that consist of mapping an input vector to an output vector, and that are easy for a
person to do rapidly, can be accomplished via deep learning, given sufficiently large
models and sufficiently large datasets of labelled training examples. Other tasks, that
cannot be described as associating one vector to another, or that are difficult enough that
a person would require time to think and reflect in order to accomplish the task, remain
beyond the scope of deep learning for now.
This part of the report describes the core parametric function approximation technology
that is behind nearly all modern practical applications of deep learning. We begin by
describing the feedforward deep network model that is used to represent these functions.
Next, we present advanced techniques for regularization and optimization of such
models. Scaling these models to large inputs such as high-resolution images or long
temporal sequences requires specialization; we therefore introduce the convolutional
network for scaling to large images and the recurrent neural network for processing
temporal sequences. Finally, we present general guidelines for the practical methodology
involved in designing, building, and configuring an application involving deep learning,
and review some of its applications.
3.1 LOGISTIC REGRESSION:
Figure 3.1 Graph of the sigmoid function used in logistic regression
Logistic regression is used for binary classification. It is based on a probability model:
it estimates the probability that an input belongs to each of the two classes. The
function used for binary classification is the sigmoid, which is helpful because its
output lies between 0 and 1 and can therefore be read as a probability. The classifier
will still make some wrong classifications, and this error is measured by a quantity
called the loss. The goal of a good logistic regression algorithm is to reduce the loss by
adjusting the weights (w) to improve the correctness of the output, and this is achieved
by a procedure called gradient descent. A good way to evaluate the performance of the
logistic regression algorithm is by achieving a minimal cost function, where the cost
function quantifies the error between the predicted values and the expected values.
Therefore, a logistic regression model must contain all these functions.
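In standard notation (the conventional formulation, given here for reference), the model,
the cross-entropy cost over m training examples, and the gradient descent update with
learning rate alpha are:

y_hat = sigmoid(w^T x + b),  where sigmoid(z) = 1 / (1 + e^(-z))

J(w, b) = -(1/m) * sum_i [ y_i * log(y_hat_i) + (1 - y_i) * log(1 - y_hat_i) ]

w := w - alpha * dJ/dw,   b := b - alpha * dJ/db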
Figure 3.2 Logistic regression deep neural network architecture
Logistic regression with a neural network mindset simply means that we code the
algorithm with a forward propagation and a backward propagation pass, as is usually the
case with neural network algorithms. For logistic regression, forward propagation is used
to calculate the output, y, and the cost function, while backward propagation is used to
calculate the gradients for gradient descent. This algorithm can be used to classify
images, unlike the classical ML form of logistic regression, and that is what makes it
stand out. The main steps for building the logistic regression neural network are:
1. Initialize the weights w and the bias b.
2. Forward propagation: compute the predicted output and the cost.
3. Backward propagation: compute the gradients of the cost with respect to w and b.
4. Update the parameters by gradient descent.
5. Repeat until the cost converges, then use the learned parameters to predict.
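A minimal NumPy sketch of these steps, assuming a toy dataset X of shape
(features, examples) and labels y in {0, 1}; this is illustrative code, not code from the
report:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, lr=0.1, iters=1000):
    n, m = X.shape
    w, b = np.zeros((n, 1)), 0.0       # step 1: initialize parameters
    for _ in range(iters):
        # step 2: forward propagation - predictions and cross-entropy cost
        a = sigmoid(w.T @ X + b)
        cost = -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))
        # step 3: backward propagation - gradients of the cost
        dz = a - y
        dw = (X @ dz.T) / m
        db = np.mean(dz)
        # step 4: gradient descent update
        w -= lr * dw
        b -= lr * db
    return w, b

# toy usage: two features, four examples
X = np.array([[0., 1., 2., 3.], [3., 2., 1., 0.]])
y = np.array([[0, 0, 1, 1]])
w, b = train_logistic_regression(X, y)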
The architecture of a ConvNet is analogous to that of the connectivity pattern of
Neurons in the Human Brain and was inspired by the organization of the Visual Cortex.
Individual neurons respond to stimuli only in a restricted region of the visual field
known as the Receptive Field. A collection of such fields overlaps to cover the entire
visual area.
Figure 3.3 A convolutional neural network for classifying hand written digits
An image is nothing but a matrix of pixel values, right? So why not just flatten the
image (e.g. a 3x3 image matrix into a 9x1 vector) and feed it to a Multi-Layer Perceptron
for classification purposes? Uh, not really.
A ConvNet is able to successfully capture the Spatial and Temporal dependencies in
an image through the application of relevant filters. The architecture performs a better
fitting to the image dataset due to the reduction in the number of parameters involved
and reusability of weights. In other words, the network can be trained to understand the
sophistication of the image better.
In the figure, we have an RGB image which has been separated into its three colour
planes: Red, Green, and Blue. There are a number of such colour spaces in which
images exist: Grayscale, RGB, HSV, CMYK, etc.
You can imagine how computationally intensive things would get once the images reach
dimensions, say 8K (7680×4320). The role of the ConvNet is to reduce the images into
a form which is easier to process, without losing features which are critical for getting a
good prediction. This is important when we are to design an architecture which is not
only good at learning features but also is scalable to massive datasets.
3.2.2 Convolution Layer — The Kernel
Figure 3.6 Convoluting a 5x5x1 image with a 3x3x1 kernel to get a 3x3x1 convolved feature
In the above demonstration, the green section resembles our 5x5x1 input image, I. The
element involved in carrying out the convolution operation in the first part of a
Convolutional Layer is called the Kernel/Filter, K, represented in the colour yellow.
We have selected K as a 3x3x1 matrix.
The Kernel shifts 9 times because of Stride Length = 1 (Non-Strided), every time
performing a matrix multiplication operation between K and the portion P of the
image over which the kernel is hovering.
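A small NumPy sketch of this operation; the image values and the kernel below are
arbitrary, and each step multiplies K elementwise with the patch it hovers over and
sums the result:

import numpy as np

def convolve2d_valid(image, kernel):
    # slide the kernel over the image with stride 1 and no padding ("valid")
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # elementwise product of the kernel and the patch it covers
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # the 5x5x1 input I
kernel = np.array([[1., 0., -1.]] * 3)             # a 3x3x1 filter K
print(convolve2d_valid(image, kernel).shape)       # (3, 3): the kernel shifts 9 times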
Figure 3.7 Movement of Kernel
The filter moves to the right with a certain Stride Value till it parses the complete width.
Moving on, it hops down to the beginning (left) of the image with the same Stride Value
and repeats the process until the entire image is traversed.
Figure 3.8 Convolution operation on a MxNx3 image matrix with a 3x3x3 Kernel
In the case of images with multiple channels (e.g. RGB), the Kernel has the same depth
as that of the input image. The convolution is carried out between each kernel channel
Kn and the corresponding image channel In ((K1, I1); (K2, I2); (K3, I3)), and all the
results are summed with the bias to give us a squashed one-depth-channel Convoluted
Feature Output.
The objective of the Convolution Operation is to extract high-level features, such as
edges, from the input image. ConvNets need not be limited to only one Convolutional
Layer. Conventionally, the first ConvLayer is responsible for capturing low-level
features such as edges, colour, and gradient orientation. With added layers, the
architecture adapts to the high-level features as well, giving us a network which has a
wholesome understanding of images in the dataset, similar to how we would.
There are two types of results to the operation — one in which the convolved feature is
reduced in dimensionality as compared to the input, and the other in which the
dimensionality is either increased or remains the same. This is done by applying Valid
Padding in case of the former, or Same Padding in the case of the latter.
When we pad the 5x5x1 image into a 7x7x1 image and then apply the 3x3x1 kernel
over it, we find that the convolved matrix turns out to be of dimensions 5x5x1, the same
as the original input. Hence the name: Same Padding.
On the other hand, if we perform the same operation without padding, we are presented
with a 3x3x1 matrix: Valid Padding.
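The output size follows the standard formula: for an n x n input, an f x f kernel,
padding p, and stride s,

o = floor((n + 2p - f) / s) + 1

With n = 5, f = 3, and s = 1, valid padding (p = 0) gives o = 3, while same padding
(p = 1, i.e. the 7x7 padded input) gives o = 5.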
Similar to the Convolutional Layer, the Pooling layer is responsible for reducing the
spatial size of the Convolved Feature. This decreases the computational power required
to process the data, through dimensionality reduction. Furthermore, it is useful for
extracting dominant features which are rotationally and positionally invariant, thus
maintaining the process of effectively training the model.
There are two types of Pooling: Max Pooling and Average Pooling. Max Pooling
returns the maximum value from the portion of the image covered by the Kernel. On
the other hand, Average Pooling returns the average of all the values from the portion
of the image covered by the Kernel.
Max Pooling also performs as a noise suppressant: it discards the noisy activations
altogether, performing de-noising along with dimensionality reduction. Average
Pooling, on the other hand, simply performs dimensionality reduction as a noise-
suppressing mechanism. Hence, we can say that Max Pooling usually performs a lot
better than Average Pooling.
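The two pooling types can be sketched in a few lines of NumPy; the window size and
the example feature map below are arbitrary choices for illustration:

import numpy as np

def pool2d(feature_map, size=2, stride=2, mode="max"):
    # downsample the feature map by taking the max (or mean) of each window
    h = (feature_map.shape[0] - size) // stride + 1
    w = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((h, w))
    reduce_fn = np.max if mode == "max" else np.mean
    for i in range(h):
        for j in range(w):
            window = feature_map[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = reduce_fn(window)
    return out

fm = np.array([[1., 3., 2., 4.],
               [5., 6., 1., 2.],
               [7., 2., 9., 0.],
               [4., 8., 3., 1.]])
print(pool2d(fm, mode="max"))   # [[6. 4.] [8. 9.]]
print(pool2d(fm, mode="avg"))   # [[3.75 2.25] [5.25 3.25]]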
The Convolutional Layer and the Pooling Layer together form the i-th layer of a
Convolutional Neural Network. Depending on the complexity of the images, the
number of such layers may be increased to capture low-level details even further, but at
the cost of more computational power.
After going through the above process, we have successfully enabled the model to
understand the features. Moving on, we are going to flatten the final output and feed it
to a regular Neural Network for classification purposes.
3.2.3 Classification — Fully Connected Layer (FC Layer)
Now that we have converted our input image into a suitable form for our Multi-Layer
Perceptron, we shall flatten the image into a column vector. The flattened output is fed
to a feed-forward neural network, and backpropagation is applied at every iteration of
training. Over a series of epochs, the model becomes able to distinguish between
dominating and certain low-level features in images, and to classify them using the
Softmax classification technique.
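Putting the pieces of this chapter together, here is a minimal tf.keras sketch of such a
network for 28x28 grayscale digits, as in Figure 3.3; the layer sizes are illustrative
choices, not an architecture prescribed by the text:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),                 # grayscale digit image
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),    # convolution layer, 3x3 kernels
    tf.keras.layers.MaxPooling2D((2, 2)),                     # max pooling layer
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),                                # flatten into a column vector
    tf.keras.layers.Dense(128, activation="relu"),            # fully connected (FC) layer
    tf.keras.layers.Dense(10, activation="softmax"),          # softmax over the 10 digits
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])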
There are various architectures of CNNs available which have been key in building
algorithms which power and shall power AI as a whole in the foreseeable future. Some
of them have been listed below:
1. LeNet
2. AlexNet
3. VGGNet
4. GoogLeNet
5. ResNet
6. ZFNet
3.3 RECURRENT NEURAL NETWORK:
Recurrent Neural Networks (RNNs) are a type of neural network where the output from
the previous step is fed as input to the current step. In traditional neural networks, all the
inputs and outputs are independent of each other; but in cases such as predicting the
next word of a sentence, the previous words are required, and hence there is a need to
remember them. Thus the RNN came into existence, which solved this issue with the
help of a hidden layer. The main and most important feature of an RNN is its hidden
state, which remembers some information about a sequence.
An RNN has a "memory" which remembers information about what has been calculated
so far. It uses the same parameters for each input, as it performs the same task on all the
inputs or hidden layers to produce the output. This reduces the number of parameters,
unlike in other neural networks.
Figure 3.12 A recurrent neural network and the unfolding in time of the computation involved in its
forward computation
The idea behind RNNs is to make use of sequential information. In a traditional neural
network we assume that all inputs (and outputs) are independent of each other. But for
many tasks that’s a very bad idea. If you want to predict the next word in a sentence you
better know which words came before it. RNNs are called recurrent because they
perform the same task for every element of a sequence, with the output being dependent
on the previous computations. Another way to think about RNNs is that they have a
“memory” which captures information about what has been calculated so far. In theory
RNNs can make use of information in arbitrarily long sequences, but in practice they
are limited to looking back only a few steps.
The above diagram shows an RNN being unrolled (or unfolded) into a full network. By
unrolling we simply mean that we write out the network for the complete sequence. For
example, if the sequence we care about is a sentence of 5 words, the network would be
unrolled into a 5-layer neural network, one layer for each word. The formulas that
govern the computation happening in an RNN are as follows:
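In the standard formulation of a simple RNN, with input x_t, hidden state s_t, and
output o_t at time step t:

s_t = tanh(U x_t + W s_{t-1})
o_t = softmax(V s_t)

where U, W, and V are the input-to-hidden, hidden-to-hidden, and hidden-to-output
weight matrices, shared across every time step.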
3.3.1 Uses of a Recurrent Neural Network:
Image Captioning: here, let's say we have an image for which we need a textual
description. So we have a single input, the image, and a series or sequence of words as
output. The image might be of a fixed size, but the output is a description of varying
length.
So RNNs can be used for mapping inputs to outputs of varying types and lengths, and
are fairly generalized in their application. Having looked at their applications, let's see
what the architecture of an RNN looks like.
3.4 LONG SHORT TERM MEMORY (LSTM):
Long Short-Term Memory (LSTM) networks are a modified version of recurrent neural
networks which makes it easier to remember past data in memory. The vanishing
gradient problem of the RNN is resolved here. The LSTM is well suited to classify,
process and predict time series given time lags of unknown duration. It trains the model
using back-propagation. In an LSTM network, three gates are present:
1. Input gate: discovers which values from the input should be used to modify the
memory. A sigmoid function decides which values to let through (0 or 1), and a tanh
function gives weightage to the values which are passed, deciding their level of
importance ranging from -1 to 1.
2. Forget gate: decides which details should be discarded from the block. A sigmoid
function looks at the previous state and the current input and outputs a number between
0 (omit) and 1 (keep) for each value in the memory.
Figure 3.15 Forget gate of a LSTM network
3. Output gate: the input and the memory of the block are used to decide the output. A
sigmoid function decides which values to let through (0 or 1), and a tanh function gives
weightage to the values which are passed, deciding their level of importance ranging
from -1 to 1, which is then multiplied with the output of the sigmoid.
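Written out in the standard notation (sigma for the sigmoid function, [h_{t-1}, x_t] for
the concatenation of the previous output and the current input, and * for elementwise
multiplication), the three gates and the memory update are:

f_t = sigma(W_f [h_{t-1}, x_t] + b_f)                                   (forget gate)
i_t = sigma(W_i [h_{t-1}, x_t] + b_i),  C~_t = tanh(W_C [h_{t-1}, x_t] + b_C)   (input gate)
C_t = f_t * C_{t-1} + i_t * C~_t                                        (memory update: the gated self-loop)
o_t = sigma(W_o [h_{t-1}, x_t] + b_o),  h_t = o_t * tanh(C_t)           (output gate)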
The clever idea of introducing self-loops to produce paths where the gradient can flow
for long durations is a core contribution of the initial long short-term memory (LSTM)
model (Hochreiter and Schmidhuber, 1997). A crucial addition has been to make the
weight on this self-loop conditioned on the context, rather than fixed (Gers et al., 2000).
By making the weight of this self-loop gated (controlled by another hidden unit), the
time scale of integration can be changed dynamically. In this case, we mean that even
for an LSTM with fixed parameters, the time scale of integration can change based on
the input sequence, because the time constants are output by the model itself. The
LSTM has been found extremely successful in many applications, such as unconstrained
handwriting recognition (Graves et al., 2009), speech recognition (Graves et al., 2013;
Graves and Jaitly, 2014), handwriting generation (Graves, 2013), machine translation
(Sutskever et al., 2014), image captioning (Kiros et al., 2014b; Vinyals et al., 2014b; Xu
et al., 2015), and parsing (Vinyals et al., 2014a).
CHAPTER-4: OVERVIEW:
In this report we have discovered that deep learning consists of very big neural networks
trained on a lot more data, requiring bigger computers.
Although early approaches published by Hinton and collaborators focused on greedy
layer-wise training and unsupervised methods like autoencoders, modern state-of-the-art
deep learning is focused on training deep (many-layered) neural network models using
the backpropagation algorithm. The most popular techniques are:
- Multilayer Perceptron Networks.
- Convolutional Neural Networks.
- Long Short-Term Memory Recurrent Neural Networks.
Some researchers believe that the major changes to be seen in the deep learning
industry will be:
- Deep learning networks will demystify computer memory.
- Neural architecture search (NAS) will play a key role in building datasets for DL
  models.
- NAS will continue to use reinforcement learning to search convolutional
  architectures.
- The current growth of DL research and industry applications demonstrates its
  "ubiquitous" presence in every facet of AI, be it NLP or computer vision
  applications.
- With time and research opportunities, unsupervised learning methods may
  deliver models that closely mimic human behaviour.
- The apparent conflict between consumer data protection laws and the research
  need for high volumes of consumer data will continue.
- Deep learning technology's limitations in being able to "reason" are a hindrance
  to automated decision-support tools.
- Google's acquisition of DeepMind Technologies holds promise for global
  marketers.
- Future ML and DL technologies must demonstrate learning from limited
  training material, transfer learning between contexts, continuous learning, and
  adaptive capabilities to remain useful.
- Though globally popular, deep learning may not be the only saviour of AI
  solutions.
- If deep learning research progresses at the current pace, developers may soon
  find themselves outpaced and will be forced to take intensive training.
4.2 APPLICATIONS OF DEEP LEARNING:
4.2.1 Self-driving cars: Deep Learning is the force that is bringing autonomous driving
to life. A million sets of data are fed to a system to build a model, to train the machines
to learn, and then to test the results in a safe environment. The Uber Artificial
Intelligence Labs at Pittsburgh are not only working on making driverless cars
commonplace but also on integrating several smart features, such as food delivery
options, with the use of driverless cars. The major concern for autonomous car
developers is handling unprecedented scenarios. A regular cycle of testing and
implementation, typical of deep learning algorithms, ensures safe driving with more and
more exposure to millions of scenarios. Data from cameras, sensors, and geo-mapping
help create succinct and sophisticated models to navigate through traffic, identify paths,
signage, pedestrian-only routes, and real-time elements like traffic volume and road
blockages. According to
Forbes, MIT is developing a new system that will allow autonomous cars to navigate
without a map as 3-D mapping is still limited to prime areas in the world and not as
effective in avoiding mishaps. CSAIL graduate student Teddy Ort said, “The reason this
kind of ‘map-less’ approach hasn’t really been done before is because it is generally
much harder to reach the same accuracy and reliability as with detailed maps. A system
like this that can navigate just with on-board sensors shows the potential of self-driving
cars being able to actually handle roads beyond the small number that tech companies
have mapped.”
4.2.2 News Aggregation and Fraud News Detection: There is now a way to filter out
all the bad and ugly news from your news feed. Extensive use of deep learning in news
aggregation is bolstering efforts to customize news as per readers. While this may not
seem new, newer levels of sophistication to define reader personas are being reached to
filter out news as per geographical, social, and economic parameters along with the
individual preferences of a reader. Fraud news detection, on the other hand, is an
important asset in today’s world where the internet has become the primary source of all
genuine and fake information. It becomes extremely hard to distinguish fake news, as
bots replicate it across channels automatically. The Cambridge Analytica episode is a
classic example of how fake news, personal information, and statistics can influence
reader perception (Bharatiya Janata Party vs Indian National Congress), sway elections
(Donald Trump's digital campaigns), and exploit personal data (Facebook data for
approximately
87 million people was compromised). Deep Learning helps develop classifiers that can
detect fake or biased news and remove it from your feed and warn you of possible
privacy breaches. Training and validating a deep learning neural network for news
detection is really hard as the data is plagued with opinions and no one party can ever
decide if the news is neutral or biased.
4.2.3 Natural language processing: Human communication brings a personalized form
of expression to every scenario. Natural Language Processing
through Deep Learning is trying to achieve the same thing by training machines to catch
linguistic nuances and frame appropriate responses. Document summarization is widely
being used and tested in the legal sphere, making paralegals obsolete. Answering
questions, language modelling, classifying text, Twitter analysis, and sentiment analysis
at a broader level are all subsets of natural language processing where deep learning is
gaining momentum. Earlier, logistic regression or SVMs were used to build time-
consuming, complex models, but now distributed representations, convolutional neural
networks, recurrent and recursive neural networks, reinforcement learning, and memory-
augmenting strategies are helping achieve greater maturity in NLP. Distributed
representations are particularly effective in producing linear semantic relationships used
to build phrases and sentences and capturing local word semantics with word
embedding (word embedding entails the meaning of a word being defined in the context
of its neighbouring words).
4.2.4 Virtual assistants: The most popular application of deep learning is virtual
assistants ranging from Alexa to Siri to Google Assistant. Each interaction with these
assistants provides them with an opportunity to learn more about your voice and accent,
thereby providing you a secondary human interaction experience. Virtual assistants use
deep learning to know more about their subjects ranging from your dine-out preferences
to your most visited spots or your favourite songs. They learn to understand your
commands by evaluating natural human language to execute them. Another capability
virtual assistants are endowed with is to translate your speech to text, make notes for
you, and book appointments. Virtual assistants are literally at your beck-and-call as they
can do everything from running errands to auto-responding to your specific calls to
coordinating tasks between you and your team members. With deep learning
applications such as text generation and document summarizations, virtual assistants
can assist you in creating or sending appropriate email copy as well.
4.2.5 Visual recognition: Imagine yourself going through a plethora of old images
taking you down the nostalgia lane. You decide to get a few of them framed but first,
you would like to sort them out. In the absence of metadata, putting in manual effort
was the only way to accomplish this. The most you could do was sort them by date, but
downloaded images sometimes lack even that metadata. In comes Deep Learning: now
images can be sorted based on locations detected in photographs, faces, a combination
of people, or according to events, dates, etc.
Searching for a particular photo from a library (let’s say a dataset as large as Google’s
picture library) requires state-of-the-art visual recognition systems consisting of several
layers, from basic to advanced, to recognize elements. Large-scale visual recognition
through deep neural networks is boosting growth in this segment of digital media
management by using convolutional neural networks, TensorFlow, and Python
extensively.
4.2.6 Healthcare: Deep Learning projects are picking up speed in the healthcare
domain. Readmissions are a huge problem for the healthcare sector, costing tens of
millions of dollars. But with the use of deep learning and neural networks, healthcare
giants are mitigating the health risks associated with readmissions while bringing down
the costs. AI is also increasingly being used in clinical research by regulatory agencies
to find cures for untreatable diseases, but physicians' scepticism and the lack of a
humongous dataset are still posing challenges to the use of deep learning in medicine.
4.2.7 Colourisation of black and white images: Image colourisation is the process of
taking grayscale images (as input) and producing colourised images (as output) that
represent the semantic colours and tones of the input. Given the difficulty of the task,
this process was conventionally done by hand, with human effort. However, with Deep
Learning technology today, colour can be applied to objects and their context within the
photograph, just as a human operator would approach it. Essentially, this approach
involves the use of high-quality convolutional neural networks in supervised layers that
recreate the image with the addition of colour.
Figure 4.4 Convolutional neural network used for colourisation of black and white images
4.2.8 Pixel restoration: The concept of zooming into videos beyond their actual
resolution was unrealistic until Deep Learning came into play. In 2017, Google Brain
researchers trained a Deep Learning network to take very low-resolution images of
faces and predict the person's face from them. This method is known as Pixel Recursive
Super Resolution. It enhances the resolution of photos significantly, pinpointing
prominent features just well enough for identification.
Figure 4.5 A group of pictures: the original 8x8 photos on the right, the ground truth
(the real face in the photos) on the left, and in the middle column the guess made by the
computer.
4.4 DISADVANTAGES OF DEEP LEARNING:
1. It is very difficult to assess its performance in real-world applications; performance
can vary greatly from application to application, and testing techniques for analysis,
validation, and scaling vary widely.
2. It needs to be trained on very large amounts of data (think thousands of images or
videos).
3. It is computationally very expensive, requiring a large amount of memory and
computational resources, and a trained model is not easy to transfer to other problems.
4.5 CONCLUSION:
In this report, we have discussed how deep learning has emerged as a state-of-the-art
technology bringing transformations to every industry, whether or not the common
people are able to notice it. From self-driving cars to virtual assistants, deep learning is
making its impact on every technology in the industry. Research in the field of deep
learning has taken off at a great pace in recent years due to the greater availability of
data and of high-computational-power GPUs and TPUs. Every day, new research papers
are published, leading to new developments in the industry. The modern deep learning
chapter covers the neural network architectures that are currently being used in the
industry. Convolutional neural networks are used for the tasks of computer vision and
face recognition, and those related to images and video. Sequence-related tasks such as
natural language processing are performed using recurrent neural networks, while some
sample-generation tasks are performed by relatively newer architectures such as
generative adversarial networks. Their various advantages and disadvantages are
discussed in this report. In closing, deep learning looks like a very promising technology
that is successfully making its way into our everyday lives.
REFERENCES:
1. Ian Goodfellow, Yoshua Bengio, and Aaron Courville, "Deep Learning" (2016).
2. Andrej Karpathy's blog and Medium articles.
3. Christopher Bishop, "Pattern Recognition and Machine Learning".
4. Analytics India Magazine, analyticsindiamag.com.
5. Ethem Alpaydin, "Introduction to Machine Learning" (2014).
6. Francois Chollet, "Deep Learning with Python" (2017).