UNIT-4
[Figure : A two-layer network with inputs x1, ..., xn, first-layer weights w, intermediate outputs y1, ..., yk, second-layer weights v, outputs z1, ..., zm, and feedback connections]
4. Single-layer recurrent network :
a. This network is a single-layer network with feedback connections, in which a processing
element's output can be directed back to itself, to other processing elements, or to both.
b. Recurrent neural network is a class of artificial neural network where connections
between nodes form a directed graph along a sequence.
c. This allows it to exhibit dynamic temporal behaviour for a time sequence. Unlike feed
forward neural networks, RNNs can use their internal state (memory) to process
sequences of inputs.
5. Multilayer recurrent network :
a. In this type of network, processing element output can be directed to the processing
element in the same layer and in the preceding layer forming a multilayer recurrent
network.
b. They perform the same task for every element of a sequence, with the output depending
on the previous computations. Inputs are not needed at each time step.
c. The main feature of a multilayer recurrent neural network is its hidden state, which
captures information about a sequence.
Gradient Descent:
Gradient descent is an optimization algorithm used to minimize some function by
iteratively moving in the direction of steepest descent as defined by the negative of the
gradient. A gradient measures how much a function's output changes with a change in one of its
parameters; mathematically, it is the set of partial derivatives of the function with respect to its
parameters. The larger the gradient, the steeper the slope. Gradient descent works best when the
cost function is convex.
Gradient descent can be described as an iterative method used to find the values of the
parameters of a function that minimize the cost function as much as possible. The parameters
are initially set to particular values, and from there gradient descent runs in an iterative
fashion, using calculus, to find the parameter values that give the minimum possible value of
the given cost function.
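As a minimal sketch of this idea, the loop below minimizes a hypothetical quadratic cost f(w) = (w - 3)^2; the cost function, starting point and learning rate are illustrative choices, not values from the text:

```python
# Gradient descent on a simple convex cost, f(w) = (w - 3)^2.
# The cost function, starting point and learning rate are illustrative.

def cost(w):
    return (w - 3.0) ** 2

def gradient(w):
    # Analytical derivative of the cost with respect to w.
    return 2.0 * (w - 3.0)

w = 0.0               # initial parameter value
learning_rate = 0.1   # step size

for step in range(100):
    w = w - learning_rate * gradient(w)   # move against the gradient

print(w, cost(w))     # w approaches 3, the minimizer of the cost
```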
Delta rule :
1. The delta rule is a specialized version of the backpropagation learning rule for single-layer
neural networks.
2. It calculates the error between calculated output and sample output data, and uses this to
create a modification to the weights, thus implementing a form of gradient descent.
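A minimal sketch of the delta rule for a single linear unit; the inputs, targets and learning rate are illustrative assumptions for the example, not data from the text:

```python
import numpy as np

# Delta-rule sketch for a single linear unit: the weight change is
# proportional to the error between the target and the computed output.
# The inputs, targets and learning rate are illustrative assumptions.

X = np.array([[0., 1.], [1., 0.], [1., 1.]])   # hypothetical inputs
y = np.array([1., 1., 2.])                     # hypothetical target outputs

w = np.zeros(2)
eta = 0.1

for epoch in range(50):
    for x_i, y_i in zip(X, y):
        o = w @ x_i                   # computed output of the single layer
        w += eta * (y_i - o) * x_i    # delta-rule weight modification

print(w)   # approaches [1, 1], since the targets follow y = x1 + x2
```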
Generalized delta learning rule (Error backpropagation learning) :
In the generalized delta learning rule (error backpropagation learning), we are given the training
set
{(x1, y1), ..., (xK, yK)}
Step 1 : η > 0 and Emax > 0 are chosen.
Step 2 : The weights are initialized at small random values; k := 1, E := 0.
Step 3 : Training starts here. The input xk is presented, x := xk, y := yk, and the hidden-unit
outputs Ol and the network output O are computed :
Ol = 1 / (1 + exp(– wlT x)), l = 1, ..., L
O = 1 / (1 + exp(– WT O)), where O on the right-hand side is the vector (O1, ..., OL)T of hidden-unit outputs
Step 4 : Weights of the output unit are updated
W := W + η δ O, where δ = (y – O)O(1 – O)
Step 5 : Weights of the hidden units are updated
wl := wl + η δ Wl Ol(1 – Ol) x, l = 1, ..., L
Step 6 : Cumulative cycle error is computed by adding the present error to E
E := E + 1/2 (y – O)^2
Step 7 : If k < K then k := k + 1 and we continue the training by going back to step 2,
otherwise we go to step 8.
Step 8 : The training cycle is completed. For E < Emax terminate the training session. If E
> Emax then E := 0, k := 1 and we initiate a new training cycle by going back to step 3.
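A minimal sketch of this training cycle (Steps 3-8) for a network with L sigmoid hidden units and one sigmoid output unit; the data set, η, Emax and the bias handling (a fixed +1 input) are illustrative assumptions rather than values from the text:

```python
import numpy as np

# Sketch of the training cycle in Steps 3-8 above: L sigmoid hidden units
# feeding one sigmoid output unit. The data set (logical OR), eta, E_max
# and the bias handling (a fixed +1 input) are illustrative assumptions.

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

X = np.array([[1., 0., 0.], [1., 0., 1.], [1., 1., 0.], [1., 1., 1.]])  # +1 bias input
Y = np.array([0., 1., 1., 1.])           # K = 4 target outputs

L = 4                                    # number of hidden units
rng = np.random.default_rng(1)
w = rng.normal(scale=1.0, size=(L, 3))   # hidden-unit weight vectors w_l
W = rng.normal(scale=1.0, size=L)        # output-unit weights
eta, E_max = 0.5, 0.05

for cycle in range(20000):               # repeated training cycles
    E = 0.0
    for x, y in zip(X, Y):               # Steps 3-6 for each pattern
        O_h = sigmoid(w @ x)             # hidden outputs O_l
        O = sigmoid(W @ O_h)             # network output O
        delta = (y - O) * O * (1 - O)
        W += eta * delta * O_h                                # Step 4
        w += eta * delta * np.outer(W * O_h * (1 - O_h), x)   # Step 5
        E += 0.5 * (y - O) ** 2                               # Step 6
    if E < E_max:                        # Step 8: stop once E < E_max
        break
```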
Backpropagation Algorithm:
1. Backpropagation is an algorithm used in the training of feedforward neural networks for
supervised learning.
2. Backpropagation efficiently computes the gradient of the loss function with respect to the
weights of the network for a single input-output example.
3. This makes it feasible to use gradient methods for training multi-layer networks. To update
the weights so as to minimize the loss, we use gradient descent or variants such as stochastic
gradient descent.
4. The backpropagation algorithm works by computing the gradient of the loss function with
respect to each weight by the chain rule, iterating backwards one layer at a time from the
last layer to avoid redundant calculations of intermediate terms in the chain rule; this is an
example of dynamic programming.
5. The term backpropagation refers only to the algorithm for computing the gradient, but it is
often used loosely to refer to the entire learning algorithm, also including how the gradient is
used, such as by stochastic gradient descent.
6. Backpropagation generalizes the gradient computation in the delta rule, which is the single-
layer version of backpropagation, and is in turn generalized by automatic differentiation,
where backpropagation is a special case of reverse accumulation (reverse mode).
[ΔW]^(n+1) = – η ∂E/∂W + α [ΔW]^n
where η is the learning rate and α is the momentum coefficient.
c. The momentum also overcomes the effect of local minima.
d. The use of a momentum term will carry a weight-change process through
one or more local minima and get it into the global minimum (a minimal sketch of this
update appears after this list).
3. Sigmoidal gain :
a. When the weights become large and force the neuron to operate in a region
where the sigmoidal function is very flat, a better method of coping with
network paralysis is to adjust the sigmoidal gain.
b. By decreasing this scaling factor, we effectively spread the sigmoidal
function out over a wider range, so that training proceeds faster.
4. Local minima :
a. One of the most practical solutions involves the introduction of a shock
which changes all weights by specific or random amounts.
b. If this fails, then the most practical solution is to rerandomize the weights
and start the training all over.
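A minimal sketch of the momentum update [ΔW]^(n+1) = –η ∂E/∂W + α[ΔW]^n from the momentum item above, applied to a hypothetical quadratic error surface; the cost and the values of η and α are illustrative:

```python
import numpy as np

# Minimal sketch of the momentum weight update: the new weight change
# combines the gradient step with a fraction (alpha) of the previous
# change. The quadratic error surface and the constants are illustrative.

def grad_E(w):
    # Hypothetical gradient of the error E with respect to the weights.
    return 2.0 * (w - np.array([1.0, -2.0]))

w = np.zeros(2)
delta_w = np.zeros(2)      # previous weight change [delta W]^n
eta, alpha = 0.1, 0.9      # learning rate and momentum coefficient

for step in range(200):
    delta_w = -eta * grad_E(w) + alpha * delta_w   # [delta W]^(n+1)
    w += delta_w

print(w)   # approaches the minimizer [1, -2] of this error surface
```

With α close to 1, a consistent downhill direction accumulates speed, which is what allows the weight-change process to pass through shallow local minima.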
Perceptron:
1. The perceptron is the simplest form of a neural network used for classification of patterns
said to be linearly separable.
2. It consists of a single neuron with adjustable synaptic weights and bias.
3. The perceptron built around a single neuron is limited to performing pattern classification
with only two classes.
4. By expanding the output layer of the perceptron to include more than one neuron, more than
two classes can be classified.
5. Suppose a perceptron has synaptic weights denoted by w1, w2, w3, ..., wm.
6. The inputs applied to the perceptron are denoted by x1, x2, ..., xm.
7. The externally applied bias is denoted by b.
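A minimal sketch of such a perceptron with weights w1, ..., wm and bias b, trained with the classical perceptron weight update; the two-class data set and the learning rate are illustrative assumptions:

```python
import numpy as np

# Minimal perceptron sketch: weights w1..wm, bias b and a hard-limiter
# output. The two-class data set and the learning rate are illustrative.

X = np.array([[2., 1.], [1., 3.], [-1., -2.], [-2., -1.]])  # inputs x1, x2
d = np.array([1, 1, -1, -1])                                # desired classes

w = np.zeros(2)    # synaptic weights
b = 0.0            # externally applied bias
eta = 0.1

for epoch in range(20):
    for x_i, d_i in zip(X, d):
        y = 1 if (w @ x_i + b) >= 0 else -1   # hard-limiter output
        w += eta * (d_i - y) * x_i            # perceptron weight update
        b += eta * (d_i - y)

print(w, b)   # a separating line for the two classes
```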
Multilayer perceptron :
1. Perceptrons which are arranged in layers are called a multilayer perceptron. This model
has three layers : an input layer, a hidden layer and an output layer.
2. For the perceptrons in the input layer, a linear transfer function is used, and for the perceptrons
in the hidden layer and output layer, the sigmoidal or squashed-S function is used.
3. The input signal propagates through the network in a forward direction, on a layer-by-layer
basis.
4. In the multilayer perceptron, the bias b(n) is treated as a synaptic weight driven by a fixed
input equal to +1 :
x(n) = [+1, x1(n), x2(n), ..., xm(n)]T
where n denotes the iteration step in applying the algorithm. Correspondingly, we define the
weight vector as :
w(n) = [b(n), w1(n), w2(n), ..., wm(n)]T
5. Accordingly, the linear combiner output is written in the compact form :
v(n) = Σ (i = 0 to m) wi(n) xi(n) = wT(n) x(n)
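A small illustration of this compact form, with the bias b(n) folded in as a weight driven by the fixed +1 input; the numerical values are arbitrary:

```python
import numpy as np

# Illustration of the compact form v(n) = w(n)^T x(n) above, with the bias
# b(n) folded in as a weight driven by the fixed +1 input. Values are arbitrary.

x = np.array([1.0, 0.5, -1.2, 2.0])    # x(n) = [+1, x1(n), x2(n), x3(n)]^T
w = np.array([0.3, 0.7, -0.1, 0.05])   # w(n) = [b(n), w1(n), w2(n), w3(n)]^T

v = w @ x                              # linear combiner output v(n)
print(v)                               # equals b + w1*x1 + w2*x2 + w3*x3
```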
Architecture of multilayer perceptron :
1. Fig. 4.15.1 shows the architectural graph of a multilayer perceptron with two hidden layers and an
output layer.
2. Signal flow through the network progresses in a forward direction, from left to right
and on a layer-by-layer basis.
3. Two kinds of signals are identified in this network :
Functional signals : A functional signal is an input signal that propagates forward through the
network and emerges at the output end of the network as an output signal.
Error signals : An error signal originates at an output neuron and propagates backward through
the network.
[Fig. 4.15.1 : Architectural graph of a multilayer perceptron with two hidden layers and an output layer]
Multilayer perceptrons have been applied successfully to solve some difficult and diverse
problems by training them in a supervised manner with a highly popular algorithm known as
the error backpropagation algorithm.
Characteristics of multilayer perceptron :
1. In this model, each neuron in the network includes a non-linear activation function (the non-
linearity is smooth). The most commonly used non-linear function is defined by :
yj = 1 / (1 + exp(– vj))
where vj is the induced local field (i.e., the weighted sum of all synaptic inputs plus the bias)
and yj is the output of neuron j.
2. The network contains hidden neurons that are not part of the input or output of the network.
These hidden neurons enable the network to learn complex tasks.
3. The network exhibits a high degree of connectivity.
Processes involved in the formation of a self-organizing map (SOM) :
1. Competition : For each input pattern, the neurons compute their respective values of a
discriminant function, which provides the basis for competition. The particular neuron with
the smallest value of the discriminant function is declared the winner.
2. Cooperation : The winning neuron determines the spatial location of a topological
neighbourhood of excited neurons, thereby providing the basis for cooperation among
neighbouring neurons.
3. Adaptation : The excited neurons decrease their individual values of the discriminant
function in relation to the input pattern through suitable adjustment of the associated connection
weights, such that the response of the winning neuron to the subsequent application of a
similar input pattern is enhanced.
Stages of SOM algorithm are :
1. Initialization : Choose random values for the initial weight vectors wj.
2. Sampling : Draw a sample training input vector x from the input space.
3. Matching : Find the winning neuron I(x) whose weight vector is closest to the input vector,
i.e., the neuron that minimizes dj(x) = Σi (xi – wji)^2.
4. Updating : Apply the weight update equation Δwji = η(t) Tj,I(x)(t) (xi – wji),
where Tj,I(x)(t) is a Gaussian neighbourhood function and η(t) is the learning rate.
5. Continuation : Keep returning to step 2 until the feature map stops changing.
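The sketch below runs these five stages on a small one-dimensional map of 10 neurons over two-dimensional inputs; the data, neighbourhood width and learning-rate schedule are illustrative assumptions:

```python
import numpy as np

# SOM sketch following the stages above: a 1-D map of 10 neurons over 2-D
# inputs. The data, neighbourhood width and learning-rate schedule are
# illustrative assumptions.

rng = np.random.default_rng(0)
data = rng.random((500, 2))             # hypothetical input space samples
n_neurons = 10
W = rng.random((n_neurons, 2))          # 1. Initialization: random weights w_j
positions = np.arange(n_neurons)        # neuron coordinates on the map

n_steps = 2000
for t in range(n_steps):
    x = data[rng.integers(len(data))]                  # 2. Sampling
    winner = np.argmin(np.sum((W - x) ** 2, axis=1))   # 3. Matching
    eta = 0.5 * np.exp(-t / n_steps)                   # decaying learning rate
    sigma = 3.0 * np.exp(-t / n_steps)                 # decaying neighbourhood width
    T = np.exp(-((positions - winner) ** 2) / (2 * sigma ** 2))  # Gaussian T_j,I(x)
    W += eta * T[:, None] * (x - W)                    # 4. Updating
# 5. Continuation: in practice, keep sampling until the map stops changing.
```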
Deep learning:
Deep learning is the subfield of artificial intelligence that focuses on creating large neural
network models that are capable of making accurate data-driven decisions.
1. Deep learning is used where the data is complex and has large datasets.
2. Facebook uses deep learning to analyze text in online conversations. Google and Microsoft
both use deep learning for image search and machine translation.
3. All modern smart phones have deep learning systems running on them. For example, deep
learning is the standard technology for speech recognition, and also for face detection on
digital cameras.
4. In the healthcare sector, deep learning is used to process medical images (X-rays, CT, and
MRI scans) and diagnose health conditions.
5. Deep learning is also at the core of self-driving cars, where it is used for localization and
mapping, motion planning and steering, and environment perception, as well as tracking driver
state.
Different architecture of deep learning are :
1. Deep Neural Network : It is a neural network with a certain level of complexity (having
multiple hidden layers in between input and output layers). They are capable of modeling
and processing non-linear relationships.
2. Deep Belief Network (DBN) : It is a class of Deep Neural Network; it is a multi-layer belief
network. Steps for performing DBN are :
a. Learn a layer of features from visible units using Contrastive Divergence algorithm.
b. Treat activations of previously trained features as visible units and then learn features of
features.
c. Finally, the whole DBN is trained when the learning for the final hidden layer is
achieved.
3. Recurrent Neural Network (performs the same task for every element of a sequence) :
Allows for parallel and sequential computation. Similar to the human brain (a large feedback
network of connected neurons). They are able to remember important things about the input
they received, which enables them to be more precise.
Advantages of deep learning:
Convolutional layers:
1. Convolutional layers are the major building blocks used in convolutional neural networks.
2. A convolution is the simple application of a filter to an input that results in an activation.
3. Repeated application of the same filter to an input results in a map of activations called a
feature map, indicating the locations and strength of a detected feature in an input, such as an
image.
4. The innovation of convolutional neural networks is the ability to automatically learn a
large number of filters in parallel specific to a training dataset under the constraints of a
specific predictive modeling problem, such as image classification.
5. The result is highly specific features that can be detected anywhere on input images.
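As a minimal sketch, the loop below slides a single 3x3 filter over a small input and records each activation in a feature map; the image and the vertical-edge filter are illustrative:

```python
import numpy as np

# Sliding a single 3x3 filter over a small input and recording each
# activation in a feature map (a "valid" cross-correlation). The image
# and the vertical-edge filter are illustrative.

image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)

kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])    # responds to vertical edges

h = image.shape[0] - kernel.shape[0] + 1
w = image.shape[1] - kernel.shape[1] + 1
feature_map = np.zeros((h, w))
for i in range(h):
    for j in range(w):
        patch = image[i:i + 3, j:j + 3]
        feature_map[i, j] = np.sum(patch * kernel)   # filter activation

print(feature_map)   # strongest responses where the vertical edge lies
```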
Activation function :
1. An activation function is a function that is added into an artificial neural network in order to
help the network learn complex patterns in the data.
2. Comparing with the neuron-based model in our brains, the activation function ultimately
decides what is to be fired to the next neuron.
3. That is exactly what an activation function does in an ANN as well.
4. It takes in the output signal from the previous cell and converts it into some form that can be
taken as input to the next cell.
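A minimal sketch of two common activation functions, showing how a cell's summed input is converted into the signal passed on to the next cell; the input values are illustrative:

```python
import numpy as np

# Two common activation functions: each converts the summed input of a
# cell into the signal passed on to the next cell. Input values are illustrative.

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))   # squashes the output into (0, 1)

def relu(v):
    return np.maximum(0.0, v)         # passes positive inputs, zeroes the rest

v = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])   # example summed inputs
print(sigmoid(v))
print(relu(v))
```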
Pooling layer :
1. A pooling layer is a new layer added after the convolutional layer. Specifically, after a non-
linearity (for example, ReLU) has been applied to the feature maps output by a
convolutional layer, the layers in a model may look as follows :
a. Input image
b. Convolutional layer
c. Non-linearity
d. Pooling layer
2. The addition of a pooling layer after the convolutional layer is a common pattern used for
ordering layers within a convolutional neural network that may be repeated one or more
times in a given model.
3. The pooling layer operates upon each feature map separately to create a new set of the same
number of pooled feature maps.
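A minimal sketch of 2x2 max pooling applied to one feature map, following the convolution, non-linearity, pooling ordering above; the feature-map values are illustrative:

```python
import numpy as np

# 2x2 max pooling applied to a single 4x4 feature map, following the
# convolution -> non-linearity -> pooling ordering above. Values are illustrative.

feature_map = np.array([[1., 0., 2., 3.],
                        [4., 6., 6., 8.],
                        [3., 1., 1., 0.],
                        [1., 2., 2., 4.]])

# Group the map into non-overlapping 2x2 blocks and keep each block's maximum.
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)   # [[6. 8.]
                #  [3. 4.]]
```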
Fully connected layer :
1. Fully connected layers are an essential component of Convolutional Neural Networks
(CNNs), which have been proven very successful in recognizing and classifying images for
computer vision.
2. The CNN process begins with convolution and pooling, breaking down the image into
features, and analyzing them independently.
3. The result of this process feeds into a fully connected neural network structure that drives
the final classification decision.
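A minimal sketch of this final fully connected stage: pooled feature maps are flattened and fed through a dense layer whose scores drive the classification decision; the shapes, random weights and the softmax step are illustrative assumptions:

```python
import numpy as np

# Final fully connected stage: pooled feature maps are flattened and fed
# through a dense layer whose scores drive the classification decision.
# Shapes, random weights and the softmax step are illustrative assumptions.

rng = np.random.default_rng(0)
pooled = rng.random((8, 2, 2))         # 8 pooled feature maps of size 2x2

flat = pooled.reshape(-1)              # flatten to a 32-dimensional vector
W = rng.normal(scale=0.1, size=(3, flat.size))   # weights for 3 classes
b = np.zeros(3)

scores = W @ flat + b                  # fully connected layer output
probs = np.exp(scores) / np.sum(np.exp(scores))  # softmax class probabilities
print(probs.argmax())                  # predicted class index
```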
Training Network:
1. Once a network has been structured for a particular application, that network is ready to be
trained.
2. To start this process, the initial weights are chosen randomly. Then the training, or learning,
begins.
3. There are two approaches to training :
• In supervised training, both the inputs and the outputs are provided. The network then
processes the inputs and compares its resulting outputs against the desired outputs.
• Errors are then propagated back through the system, causing the system to adjust the
weights which control the network. This process occurs over and over as the
weights are continually tweaked.
• The set of data which enables the training is called the "training set." During the
training of a network, the same set of data is processed many times as the connection
weights are continually refined.
• The other type of training is called unsupervised training. In unsupervised training, the
network is provided with inputs but not with desired outputs.
• The system itself must then decide what features it will use to group the input data. This
is often referred to as self-organization or adaptation.
Case Study:
1. Diabetic Retinopathy (DR) is one of the major causes of blindness in the western world.
Increasing life expectancy, indulgent lifestyles and other contributing factors mean the number
of people with diabetes is projected to continue rising.
2. Regular screening of diabetic patients for DR has been shown to be a cost-effective and
important aspect of their care.
3. The accuracy and timing of this care is of significant importance to both the cost and
effectiveness of treatment.
4. If detected early enough, effective treatment of DR is available; making this a vital process.
5. Classification of DR involves the weighting of numerous features and the location of such
features. This is highly time consuming for clinicians.
6. Computers are able to obtain much quicker classifications once trained, giving the ability to
aid clinicians in real-time classification.
7. The efficacy of automated grading for DR has been an active area of research in computer
imaging with encouraging conclusions.
8. Significant work has been done on detecting the features of DR using automated methods
such as support vector machines and k-NN classifiers.
9. The majority of these classification techniques are based on two-class classification: DR or
no DR.
Case Study:
1. The rapid development of the Internet economy and Artificial Intelligence (AI) has promoted
the progress of self-driving cars.
2. The market demand and economic value of self-driving cars are increasingly prominent.
At present, more and more enterprises and scientific research institutions have invested in
this field. Google, Tesla, Apple, Nissan, Audi, General Motors, BMW, Ford, Honda,
Toyota, Mercedes, and Volkswagen have participated in the research and development of
self-driving cars.
3. Google is an Internet company and one of the leaders in self-driving cars, based on its
solid foundation in artificial intelligence.
4. In June 2015, two Google self-driving cars were tested on the road. So far, Google vehicles
have accumulated more than 3.2 million km of tests, making them the closest to actual use.
5. Another company that has made great progress in the field of self-driving cars is Tesla.
Tesla was the first company to devote self-driving technology to production.
6. With the Tesla Model series that followed, its "Autopilot" technology has made major
breakthroughs in recent years.
7. Although Tesla's Autopilot technology is only regarded as Level 2 by the National
Highway Traffic Safety Administration (NHTSA), Tesla shows that the car has basically
realized automatic driving under certain conditions.