Deep Learning Unit-1 Finals
UNIT-1
Topics: Introduction to Deep Learning, Historical Trends in Deep Learning, Deep Feed-forward Networks, Gradient-Based Learning, Hidden Units, Architecture Design, Back-propagation Networks.
Artificial intelligence is the ability of a machine to imitate intelligent human behaviour. Machine learning allows
a system to learn and improve from experience automatically.
Deep learning is an application of machine learning that uses complex algorithms and deep neural nets to train a
model.
Artificial intelligence
"It is a branch of computer science by which we can create intelligent machines which can behave like a
human, think like humans, and able to make decisions."
Artificial Intelligence exists when a machine can have human based skills such as learning, reasoning,
and solving problems
o With the help of AI, you can create software or devices which can solve real-world problems easily and with accuracy, such as health issues, marketing, traffic issues, etc.
o With the help of AI, you can create your own personal virtual assistant, such as Cortana, Google Assistant, Siri, etc.
o With the help of AI, you can build robots which can work in environments where human survival can be at risk.
o AI opens a path for other new technologies, new devices, and new opportunities.
Following are some sectors which have the application of Artificial Intelligence:
1. AI in Astronomy
2. AI in Healthcare
3. AI in Gaming
4. AI in Finance
5. AI in Data Security
6. AI in Social Media
7. AI in Travel & Transport
8. AI in Automotive Industry
9. AI in Robotics
10. AI in Entertainment
11. AI in Agriculture
12. AI in E-commerce
13. AI in Education
Advantages of Artificial Intelligence
o High accuracy with fewer errors: AI machines or systems are less prone to errors and give high accuracy, as they take decisions based on prior experience or information.
o High speed: AI systems can be very fast in decision-making; because of this, an AI system can beat a chess champion at the game of chess.
o High reliability: AI machines are highly reliable and can perform the same action multiple times with high accuracy.
o Useful for risky areas: AI machines can be helpful in situations such as defusing a bomb or exploring the ocean floor, where employing a human can be risky.
o Digital assistant: AI can provide a digital assistant to users; for example, AI technology is currently used by various e-commerce websites to show products as per customer requirements.
o Useful as a public utility: AI can be very useful for public utilities, such as self-driving cars which can make our journeys safer and hassle-free, facial recognition for security purposes, and natural language processing to communicate with humans in human language, etc.
Disadvantages of Artificial Intelligence
o High cost: The hardware and software requirements of AI are very costly, as it requires a lot of maintenance to meet current world requirements.
o Can't think out of the box: Even though we are making smarter machines with AI, they still cannot work out of the box; a robot will only do the work for which it is trained or programmed.
o No feelings and emotions: An AI machine can be an outstanding performer, but it does not have feelings, so it cannot form any kind of emotional attachment with humans, and may sometimes be harmful to users if proper care is not taken.
o Increased dependency on machines: With the growth of technology, people are becoming more dependent on devices and hence are losing some of their mental capabilities.
o No original creativity: Humans are creative and can imagine new ideas, but AI machines cannot match this power of human intelligence and cannot be creative and imaginative.
Machine learning
Machine learning enables a machine to automatically learn from data, improve performance
from experiences, and predict things without being explicitly programmed.
A Machine Learning system learns from historical data, builds the prediction models, and
whenever it receives new data, predicts the output for it. The accuracy of predicted output depends
upon the amount of data, as the huge amount of data helps to build a better model which predicts the
output more accurately.
Machine learning has changed our way of thinking about problems. Some common applications of Machine Learning include:
1. Image Recognition
2. Speech Recognition
3. Traffic Prediction
4. Product Recommendations
5. Self-driving Cars
6. Email Spam and Malware Filtering
7. Virtual Personal Assistants
8. Online Fraud Detection
9. Stock Market Trading
10. Medical Diagnosis
Deep Learning
Applications of Deep Learning
Deep learning is widely used to make predictions about rain, earthquakes, and tsunamis, which helps in taking the necessary precautions.
With deep learning, machines can comprehend speech and provide the required output. It enables applications such as voice-controlled assistants.
Deep learning models also help advertisers leverage data to perform real-time bidding and targeted display advertising.
Machine learning works only with structured and semi-structured data, while deep learning works with both structured and unstructured data.
Deep learning algorithms can perform complex operations efficiently, while machine learning algorithms cannot.
Machine learning algorithms use labelled sample data to extract patterns, while deep learning accepts large volumes of data as input and analyses the input data to extract features of an object.
An artificial neural network’s input layer, which is the first layer, receives input from external
sources and passes it on to the hidden layer, which is the second layer.
Each neuron in the hidden layer gets information from the neurons in the previous layer, computes the
weighted total, and then transfers it to the neurons in the next layer.
These connections are weighted, which means that the impacts of the inputs from the preceding layer
are more or less optimized by giving each input a distinct weight. These weights are then adjusted
during the training process to enhance the performance of the model.
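As a concrete illustration of the weighted-sum computation described above, here is a minimal NumPy sketch; the layer sizes, weights, and input values are made-up assumptions used only for illustration:

    import numpy as np

    # Toy sizes: 3 inputs feeding 4 hidden neurons (illustrative values only)
    x = np.array([0.5, -1.2, 0.8])        # input received from external sources
    W = np.array([[ 0.2, -0.1,  0.4],     # one weight per connection into each hidden neuron
                  [-0.3,  0.5,  0.1],
                  [ 0.0,  0.2, -0.2],
                  [ 0.1, -0.4,  0.3]])
    b = np.zeros(4)                        # one bias per hidden neuron

    z = W @ x + b                          # weighted total computed by each hidden neuron
    h = 1.0 / (1.0 + np.exp(-z))           # activation transferred to the next layer
    print(h)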
Machine learning and deep learning are both subsets of artificial intelligence, but there are many similarities and differences between them.
Machine Learning vs Deep Learning:
o Machine Learning applies statistical algorithms to learn the hidden patterns and relationships in the dataset; Deep Learning uses artificial neural network architectures to learn them.
o Machine Learning can work on a smaller amount of data; Deep Learning requires a larger volume of data compared to machine learning.
o Machine Learning is better for low-label tasks; Deep Learning is better for complex tasks like image processing, natural language processing, etc.
o Machine Learning takes less time to train the model; Deep Learning takes more time to train the model.
o In Machine Learning, a model is created from relevant features which are manually extracted from images to detect an object in the image; in Deep Learning, relevant features are automatically extracted from images, making it an end-to-end learning process.
o Machine Learning is less complex and it is easy to interpret the results; Deep Learning is more complex, works like a black box, and interpretations of the results are not easy.
o Machine Learning can work on a CPU or requires less computing power compared to deep learning; Deep Learning requires a high-performance computer with a GPU.
Advantages of Deep Learning:
1. High accuracy:
Deep Learning algorithms can achieve state-of-the-art performance in various tasks, such as
image recognition and natural language processing.
2. Automated feature engineering:
Deep Learning algorithms can automatically discover and learn relevant features from data
without the need for manual feature engineering.
3. Scalability:
Deep Learning models can scale to handle large and complex datasets, and can learn
from massive amounts of data.
4. Flexibility:
Deep Learning models can be applied to a wide range of tasks and can handle
various types of data, such as images, text, and speech.
5. Continual improvement:
Deep Learning models can continually improve their performance as more data becomes
available.
Disadvantages of Deep Learning:
In summary, while Deep Learning offers many advantages, including high accuracy and
scalability, it also has some disadvantages, such as high computational requirements, the need for large
amounts of labeled data, and interpretability challenges. These limitations need to be carefully
considered when deciding whether to use Deep Learning for a specific task.
The history of deep learning can be traced back to 1943, when Walter Pitts and Warren
McCulloch created a computer model based on the neural networks of the human brain.
They used a combination of algorithms and mathematics they called “threshold logic” to
mimic the thought process. Since that time, Deep Learning has evolved steadily, with only
two significant breaks in its development. Both were tied to the infamous Artificial Intelligence
winters.
The 1960s
Henry J. Kelley is given credit for developing the basics of a continuous Back Propagation Model in 1960.
The 1970s
During the 1970’s the first AI winter kicked in, the result of promises that couldn’t be kept.
The impact of this lack of funding limited both DL and AI research. Fortunately, there were
individuals who carried on the research without funding.
The 1980s and 90s
In 1989, Yann LeCun provided the first practical demonstration of backpropagation at Bell Labs. He combined convolutional neural networks with backpropagation to read “handwritten” digits. This system was eventually used to read the numbers on handwritten checks.
This was also when the second AI winter (1985–1990s) kicked in, which also affected research on neural networks and deep learning.
2000-2010
Around the year 2000, the Vanishing Gradient Problem appeared. It was discovered that “features” (lessons) formed in the lower layers were not being learned by the upper layers, because no learning signal reached these layers.
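The problem can be illustrated with a short NumPy sketch: the derivative of the sigmoid is at most 0.25, so multiplying many of these derivatives together (one per layer) leaves almost no learning signal for the lowest layers. The depth of 20 layers is an illustrative assumption:

    import numpy as np

    def sigmoid_grad(z):
        s = 1.0 / (1.0 + np.exp(-z))       # sigmoid
        return s * (1.0 - s)               # its derivative, never larger than 0.25

    signal = 1.0
    for layer in range(20):                # 20 stacked sigmoid layers (illustrative)
        signal *= sigmoid_grad(0.0)        # best case: derivative is exactly 0.25
    print(signal)                          # about 9e-13 -- effectively no gradient left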
2011-2020
By 2011, the speed of GPUs had increased significantly, making it possible to train
convolutional neural networks “without” the layer-by-layer pre-training. With the increased
computing speed, it became obvious that deep learning had significant advantages in terms of efficiency and speed.
Semantics technology is being used with deep learning to take artificial intelligence to the next level,
providing more natural sounding, human-like conversations.
Banks and financial services are using deep learning to automate trading, reduce risk, detect fraud, and
provide AI/chatbot advice to investors. A report from the EIU (Economist Intelligence Unit) suggests 86%
of financial services are planning to increase their artificial intelligence investments by 2025.
Deep learning and artificial intelligence are influencing the creation of new business models. These businesses are creating new corporate cultures that embrace deep learning, artificial intelligence, and modern technology.
Deep Feed-forward Networks
A Feed Forward Neural Network is an artificial neural network in which the connections between nodes do not form a cycle. The opposite of a feed forward neural network is a recurrent neural network, in which certain pathways are cycled. The feed forward model is the simplest form of neural network, as information is only processed in one direction. While the data may pass through multiple hidden nodes, it always moves in one direction and never backwards.
A Feed Forward Neural Network is commonly seen in its simplest form as a single layer perceptron. In this model, a series of inputs enter the layer and are multiplied by the weights. The values are then added together to get the sum of the weighted inputs. If this sum is above a specific threshold, usually set at zero, the value produced is often 1, whereas if the sum falls below the threshold, the output value is -1. The single layer perceptron is an important model of feed forward neural networks and is often used in classification tasks.
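A minimal sketch of this forward pass, assuming a threshold of zero and example weights and inputs chosen only for illustration:

    import numpy as np

    def perceptron_forward(x, w, threshold=0.0):
        # Multiply the inputs by the weights, sum them, and threshold the result
        total = np.dot(w, x)
        return 1 if total > threshold else -1

    x = np.array([1.0, -2.0, 0.5])         # example inputs
    w = np.array([0.4, 0.3, -0.9])         # example weights
    print(perceptron_forward(x, w))        # -1, since the weighted sum (-0.65) is below the threshold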
Furthermore, single layer perceptrons can incorporate aspects of machine learning. Using a property known as the delta rule, the neural network can compare the outputs of its nodes with the intended values, allowing the network to adjust its weights through training in order to produce more accurate output values. This process of training and learning produces a form of gradient descent. In multi-layered perceptrons, the process of updating weights is nearly analogous; however, the process is defined more specifically as back-propagation. In such cases, each hidden layer within the network is adjusted according to the output values produced by the final layer.
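A minimal sketch of the delta rule on a single linear unit, assuming a tiny made-up dataset (the target is y = 2*x1 - x2) and a learning rate of 0.1:

    import numpy as np

    X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])   # illustrative inputs
    y = np.array([2.0, -1.0, 1.0, 3.0])                              # intended values

    w = np.zeros(2)
    lr = 0.1                                 # learning rate (assumed)

    for epoch in range(100):
        for x_i, target in zip(X, y):
            output = np.dot(w, x_i)          # output of the node
            error = target - output          # intended value minus produced value
            w += lr * error * x_i            # delta rule: nudge the weights toward the target

    print(w)                                 # approaches [2, -1]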
Applications of Feed Forward Neural Networks
While Feed Forward Neural Networks are fairly straightforward, their simplified architecture can
be used as an advantage in particular machine learning applications.
For example, one may set up a series of feed forward neural networks with the intention of running
them independently from each other, but with a mild intermediary for moderation. Like the human
brain, this process relies on many individual neurons in order to handle and process larger tasks. As
the individual networks perform their tasks independently, the results can be combined at the end to produce a synthesized and cohesive output.
Gradient-Based Learning
Gradient-based learning is a type of machine learning in which the optimization algorithm uses gradients to
update the model parameters during training.
This approach is commonly used in deep learning and neural networks because it allows the
model to learn complex representations of the input data.
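A minimal sketch of gradient-based learning on a simple quadratic loss (the loss, learning rate, and starting point are illustrative assumptions); each step moves the parameters a small distance against the gradient:

    import numpy as np

    def grad(w):
        # Gradient of the illustrative loss L(w) = (w1 - 3)^2 + (w2 + 1)^2
        return np.array([2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)])

    w = np.array([0.0, 0.0])                # initial parameters
    lr = 0.1                                # learning rate (assumed)
    for step in range(100):
        w = w - lr * grad(w)                # update the parameters using the gradient

    print(w)                                # close to the minimum at [3, -1]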
HIDDEN UNITS
A hidden unit refers to the components comprising the layers of processors between the input and output units in a connectionist system.
A lot of the objects we have studied so far appear in both Machine Learning and Deep Learning, but hidden units and output units are often additional objects in Deep Learning. These hidden units can be one of many types.
Since this is an area of active research, there are many more being studied, and some have probably yet to be discovered. Because this area is still in its infancy, the principles and definitions are not set in stone. The closest thing to a formal definition is: a hidden unit takes in a vector/tensor x, computes an affine transformation z = Wᵀx + b, and then applies an element-wise non-linear function g(z).
The way hidden units are differentiated from each other is based on their activation function, g(z):
ReLU
ELU
GELU
Maxout
PReLU
Here we explore the different types of hidden units so that when it's time to choose one for an application you're developing, you have some intuition about which one to use. When you're in the initial stages of development, don't be afraid to experiment through trial and error.
What’s ReLU?
ReLU stands for Rectified Linear Unit. Rectified Linear Units are pretty much the standard that everyone defaults to, but it's only one out of many options. Its activation function is:
g(z) = max{0, z}
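A one-line NumPy version of the rectifier, for reference:

    import numpy as np

    def relu(z):
        # Rectified Linear Unit: passes positive values through, zeroes out the rest
        return np.maximum(0.0, z)

    print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))   # [0.  0.  0.  1.5 3. ]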
What’s Maxout?
The Maxout unit divides the pre-activation vector z into groups of k values. Each Maxout unit then outputs the maximum element of one of these groups:
g(z)_i = max over j in G(i) of z_j
where G(i) is the set of indices of the inputs for group i.
With a large enough k, a Maxout unit can learn to approximate any convex function with arbitrary fidelity. In particular, a Maxout layer with two pieces can learn to implement the same function of its inputs as ReLU, PReLU, absolute value rectification, or Leaky ReLU.
The caveat is that a Maxout unit is parametrized by k weight vectors instead of 1, so it requires more regularization, unless the training set is large enough. In general, although there is no limit on k, lower is better as it requires less regularization.
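A minimal NumPy sketch of a single Maxout unit with k = 3 pieces; the weight vectors and input are illustrative assumptions:

    import numpy as np

    def maxout(x, W, b):
        # W holds k weight vectors (one per piece), b holds k biases;
        # the unit returns the maximum of the k affine pieces.
        z = W @ x + b
        return np.max(z)

    x = np.array([1.0, -1.0])
    W = np.array([[1.0, 0.0],      # piece 1
                  [0.0, 1.0],      # piece 2
                  [0.5, 0.5]])     # piece 3
    b = np.zeros(3)
    print(maxout(x, W, b))         # 1.0, the largest of the three pieces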
What’s Logistic Sigmoid?
If the ReLU is the reigning queen of activation functions, then the logistic sigmoid is the former queen, denoted:
σ(z) = 1 / (1 + e^(−z))
A close relative of the logistic sigmoid is the hyperbolic tangent, related to the logistic sigmoid by:
tanh(z) = 2σ(2z) − 1
See the relation? They both saturate: for really extreme input values they flatten out to a constant value (more on this later). The difference between them is that the sigmoid is 1/2 at 0, whereas tanh is 0 at 0. In that sense, tanh is more like the identity function, at least around 0.
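The relation can be checked numerically; the short NumPy sketch below also shows the values of both functions at 0:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    z = np.linspace(-3.0, 3.0, 7)
    print(np.allclose(np.tanh(z), 2.0 * sigmoid(2.0 * z) - 1.0))   # True: tanh(z) = 2*sigmoid(2z) - 1
    print(sigmoid(0.0), np.tanh(0.0))                              # 0.5 versus 0.0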
What’s RBF?
The Radial Basis Function becomes more active as x approaches a certain template vector; it saturates to 0 almost everywhere else, which can make it difficult for gradient descent to optimize:
h_i = exp(−‖W:,i − x‖² / σ_i²)
What’s Softplus?
Softplus, ζ(z) = log(1 + e^z), is a smooth approximation to the rectifier, meant to improve on ReLU by being differentiable everywhere. Counter-intuitively, its use is discouraged on empirical grounds: in practice it tends to perform worse than the ReLU.
What's Hard tanh?
The hard tanh, g(z) = max(−1, min(1, z)), looks like the tanh or the rectifier, but unlike the rectifier it is bounded. It is computationally cheaper than many of the alternatives: its output is basically either −1, the line z itself, or 1.
What’s Identity?
Having the identity function as the activation function is exactly like having no activation function. A linear unit can be a useful output unit, but it can also be a decent hidden unit.
If every layer of the network is a linear transformation, the whole network is also a linear transformation, by composition.
Generally, multiplying and adding vectors and matrices acts as a linear transformation that stretches, combines, rotates, or compresses the input vector or matrix.
We just learned that neural networks consist entirely of tensor operations, and all of these tensor operations are just geometric transformations of the input data. It follows that neural networks are geometric transformations of the input data.
Suppose a layer of our network has n inputs and p outputs, with a single weight matrix W. With this approach we replace W with two layers: the first uses weight matrix U and the second uses weight matrix V. If the first layer, U, produces q outputs, then together these layers contain (n + p)q parameters, whereas W alone would contain np parameters. For small q, linear hidden units therefore offer an effective way to reduce the number of parameters in a network.
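A quick check of the parameter counts with illustrative sizes (n = 1000 inputs, p = 500 outputs, and a linear bottleneck of q = 10 units):

    n, p, q = 1000, 500, 10           # illustrative layer sizes
    direct = n * p                    # a single weight matrix W
    factored = (n + p) * q            # U (n x q) followed by V (q x p)
    print(direct, factored)           # 500000 versus 15000 parameters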
What’s Softmax?
These units are often used in architectures where the goal is to learn to manipulate memory. Whenever there is a classification problem where you need to pick one of multiple categories, this is the one to use, since it always boosts the maximum category and drags the other categories down:
softmax(z)_i = exp(z_i) / Σ_j exp(z_j)
This will be studied in more detail later.
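A numerically stable NumPy sketch of the softmax; note how the largest score is boosted and the others are dragged down while the outputs still sum to 1:

    import numpy as np

    def softmax(z):
        shifted = z - np.max(z)        # subtract the max for numerical stability
        exp_z = np.exp(shifted)
        return exp_z / np.sum(exp_z)   # normalize so the outputs sum to 1

    scores = np.array([2.0, 1.0, 0.1])
    print(softmax(scores))             # roughly [0.66, 0.24, 0.10]
    print(softmax(scores).sum())       # 1.0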
Deep Learning architectures
RNN: Recurrent Neural Network
RNNs are very useful in fields where the sequence of the presented information is key. They are commonly used in NLP (e.g., chatbots), speech synthesis, and machine translation.
Bidirectional RNN: They work two ways; the output layer can get information from past and future states simultaneously [2].
Deep RNN: Multiple layers are present. As a result, the DL model can extract more hierarchical information.
LSTM: Long Short-Term Memory
It’s also a type of RNN. However, LSTM has feedback connections. This means that it can
process not only single data points (such as images) but also entire sequences of data (such as
audio or video files)[3].
LSTM derives from neural network architectures and is based on the concept of a memory cell.
The memory cell can retain its value for a short or long time as a function of its inputs, which
allows the cell to remember what’s essential and not just its last computed value.
A typical LSTM architecture is composed of a cell, an input gate, an output gate, and a forget
gate. The cell remembers values over arbitrary time intervals, and these three gates regulate the
flow of information into and out of the cell.
The input gate controls when new information can flow into the memory.
The output gate controls when the information that is contained in the cell is used in the output.
The forget gate controls when a piece of information can be forgotten, allowing the cell to process
new data.
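A minimal NumPy sketch of a single LSTM cell step, showing only how the three gates regulate the cell state; the sizes and the random weights are illustrative assumptions, not a trained model:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    n_in, n_hidden = 3, 4                                   # illustrative sizes

    # One weight matrix per gate plus one for the candidate update (random, illustrative)
    Wf, Wi, Wo, Wc = (rng.standard_normal((n_hidden, n_in + n_hidden)) * 0.1 for _ in range(4))
    bf = np.zeros(n_hidden); bi = np.zeros(n_hidden); bo = np.zeros(n_hidden); bc = np.zeros(n_hidden)

    def lstm_step(x, h_prev, c_prev):
        concat = np.concatenate([x, h_prev])
        f = sigmoid(Wf @ concat + bf)          # forget gate: what to drop from the cell
        i = sigmoid(Wi @ concat + bi)          # input gate: what new information to store
        o = sigmoid(Wo @ concat + bo)          # output gate: what part of the cell to expose
        c_tilde = np.tanh(Wc @ concat + bc)    # candidate values
        c = f * c_prev + i * c_tilde           # updated cell state (the memory)
        h = o * np.tanh(c)                     # hidden state passed to the next time step
        return h, c

    h, c = np.zeros(n_hidden), np.zeros(n_hidden)
    for x in rng.standard_normal((5, n_in)):   # a sequence of 5 inputs
        h, c = lstm_step(x, h, c)
    print(h)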
Today, LSTMs are commonly used in such fields as text compression, handwriting recognition,
speech recognition, gesture recognition, and image captioning[4].
GRU
This abbreviation stands for Gated Recurrent Unit. It’s a type of LSTM. The major difference is
that GRU has fewer parameters than LSTM, as it lacks an output gate[5]. GRUs are used for
smaller and less frequent datasets, where they show better performance.
CNN: Convolutional Neural Network
A CNN can take in an input image, assign importance to various aspects/objects in the image, and differentiate one from the others [6]. The name 'convolutional' derives from a mathematical operation involving the convolution of different functions. CNNs consist of an input and an output layer, as well as multiple hidden layers. The CNN's hidden layers typically consist of a series of convolutional layers.
Here's how CNNs work: first, the input is received by the network. Each input (for instance, an image) passes through a series of convolution layers with various filters. The control layer controls how the signal flows from one layer to the other. Next, the output is flattened and fed into the fully connected layer, where every neuron from a preceding layer is connected to the neurons in the subsequent layer. As a result, the output can be classified.
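A minimal PyTorch sketch of that pipeline (convolution layers with filters, flattening, then a fully connected classifier); the layer sizes and the 28x28 single-channel input are illustrative assumptions rather than any specific published architecture:

    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Conv2d(1, 8, kernel_size=3, padding=1),    # 1 input channel -> 8 filters
        nn.ReLU(),
        nn.MaxPool2d(2),                              # 28x28 -> 14x14
        nn.Conv2d(8, 16, kernel_size=3, padding=1),   # 8 channels -> 16 filters
        nn.ReLU(),
        nn.MaxPool2d(2),                              # 14x14 -> 7x7
        nn.Flatten(),                                 # flatten the feature maps
        nn.Linear(16 * 7 * 7, 10),                    # fully connected layer -> 10 classes
    )

    x = torch.randn(1, 1, 28, 28)                     # one illustrative grayscale image
    print(model(x).shape)                             # torch.Size([1, 10])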
DSN: Deep Stacking Network
According to a paper, "An Evaluation of Deep Learning Miniature Concerning in Soft Computing" [8], published in 2015, "the central idea of the DSN design relates to the concept of stacking, as proposed originally, where simple modules of functions or classifiers are composed first and then they are stacked on top of each other in order to learn complex functions or classifiers."
Typically, DSNs consist of three or more modules. Each module consists of an input layer, a hidden
layer, and an output layer. These modules are stacked one on top of another, which means that the input
of a given module is based on the output of prior modules/layers. This construction enables DSNs to learn a more complex classification than would be possible with just one module.
Deep Learning Architecture – Autoencoders
Autoencoders are a specific type of feedforward neural network. The general idea is that the input and the output are pretty much the same. What does that mean? Simply put, Autoencoders condense the input into a lower-dimensional code. Based on this, the outcome is produced. In this model, the code is a compact version of the input. One of Autoencoders' main tasks is to identify and determine what constitutes regular data and then identify the anomalies or aberrations.
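A minimal PyTorch sketch of that idea, assuming a 784-dimensional input condensed into a 32-dimensional code and then reconstructed (all sizes are illustrative):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))   # condense to a code
    decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784))   # reconstruct the input

    x = torch.randn(4, 784)                          # a small illustrative batch
    code = encoder(x)                                # compact, lower-dimensional version of the input
    reconstruction = decoder(code)
    loss = F.mse_loss(reconstruction, x)             # unusual inputs tend to reconstruct poorly
    print(code.shape, loss.item())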
Back-propagation
The algorithm gets its name because the weights are updated backward, from output to input (a minimal sketch follows the list below).
Advantages of back-propagation:
It does not have any parameters to tune except for the number of inputs.
It is highly adaptable and efficient and does not require any prior knowledge about the network.
It is a standard process that usually works well. It is user-friendly, fast and easy to program.
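A minimal NumPy sketch of back-propagation on a tiny two-layer network learning XOR; the layer sizes, learning rate, and number of epochs are illustrative assumptions. The error is computed at the output and propagated backward to update each layer's weights:

    import numpy as np

    rng = np.random.default_rng(0)

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # illustrative task: XOR
    y = np.array([[0], [1], [1], [0]], dtype=float)

    W1 = rng.standard_normal((2, 4)); b1 = np.zeros(4)            # input -> hidden weights
    W2 = rng.standard_normal((4, 1)); b2 = np.zeros(1)            # hidden -> output weights
    lr = 0.5

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for epoch in range(10000):
        # Forward pass
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)

        # Backward pass: propagate the error from the output layer back to the input weights
        d_out = (out - y) * out * (1 - out)       # output-layer error signal
        d_h = (d_out @ W2.T) * h * (1 - h)        # hidden-layer error signal

        W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

    print(out.round(2).ravel())                   # typically approaches [0, 1, 1, 0]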