
CCS355 NEURAL NETWORKS & DEEP LEARNING

CCS355 NEURAL NETWORKS AND DEEP LEARNING          L T P C
                                                  2 0 2 3

COURSE OBJECTIVES:

➢ To understand the basics of deep neural networks


➢ To understand the basics of associative memory and unsupervised
learning networks
➢ To apply CNN architectures of deep neural networks
➢ To analyze the key computations underlying deep learning, then use
them to build and train deep neural networks for various tasks.
➢ To apply autoencoders and generative models for suitable
applications.

UNIT I INTRODUCTION

Neural Networks-Application Scope of Neural Networks-Artificial Neural


Network: An Introduction- Evolution of Neural Networks-Basic Models of
Artificial Neural Network- Important Terminologies of ANNs-Supervised
Learning Network.

UNIT II ASSOCIATIVE MEMORY AND UNSUPERVISED LEARNING NETWORKS

Training Algorithms for Pattern Association- Auto associative Memory Network-


Hetero associative Memory Network-Bidirectional Associative Memory (BAM)-
Hopfield Networks-Iterative Auto associative Memory Networks-Temporal
Associative Memory Network-Fixed Weight Competitive Nets-Kohonen Self-
Organizing Feature Maps-Learning Vector Quantization-Counter propagation
Networks-Adaptive Resonance Theory Network.

UNIT III THIRD-GENERATION NEURAL NETWORKS

Spiking Neural Networks-Convolutional Neural Networks-Deep Learning


Neural Networks-Extreme Learning Machine Model-Convolutional Neural
Networks: The Convolution Operation – Motivation – Pooling – Variants of the
basic Convolution Function – Structured Outputs – Data Types – Efficient
Convolution Algorithms – Neuroscientific Basis – Applications: Computer
Vision, Image Generation, Image Compression.

UNIT IV DEEP FEEDFORWARD NETWORKS

History of Deep Learning- A Probabilistic Theory of Deep Learning- Gradient


Learning – Chain Rule and Backpropagation - Regularization: Dataset
Augmentation – Noise Robustness -Early Stopping, Bagging and Dropout - batch


normalization- VC Dimension and Neural Nets.

UNIT V RECURRENT NEURAL NETWORKS

Recurrent Neural Networks: Introduction – Recursive Neural Networks –


Bidirectional RNNs – Deep Recurrent Networks – Applications: Image
Generation, Image Compression, Natural Language Processing. Complete Auto
encoder, Regularized Autoencoder, Stochastic Encoders and Decoders,
Contractive Encoders.

CCS355 NEURAL NETWORKS AND DEEP LEARNING


UNIT I INTRODUCTION
Neural Networks-Application Scope of Neural Networks-Artificial Neural
Network: An Introduction- Evolution of Neural Networks-Basic Models of
Artificial Neural Network- Important Terminologies of ANNs-Supervised
Learning Network.

Neural Networks: Introduction


Neural networks mimic the basic functioning of the human brain and are
inspired by how the human brain interprets information. They solve various real-
time tasks because of their ability to perform computations quickly and respond
fast.

Artificial neural networks are popular machine learning techniques that


simulate the mechanism of learning in biological organisms. The human nervous
system contains cells, which are referred to as neurons. The neurons are
connected to one another with the use of axons and dendrites, and the connecting
regions between axons and dendrites are referred to as synapses. The strengths of
synaptic connections often change in response to external stimuli. This change is
how learning takes place in living organisms. This biological mechanism is
simulated in artificial neural networks, which contain computation units that are
referred to as neurons. Throughout this book, we will use the term “neural
networks” to refer to artificial neural networks rather than biological ones. The
computational units are connected to one another through weights, which serve
the same role as the strengths of synaptic connections in biological organisms.

Each input to a neuron is scaled with a weight, which affects the function
computed at that unit.

An artificial neural network computes a function of the inputs by


propagating the computed values from the input neurons to the output neuron(s)
and using the weights as intermediate parameters. Learning occurs by changing
the weights connecting the neurons. Just as external stimuli are needed for
learning in biological organisms, the external stimulus in artificial neural
networks is provided by the training data containing examples of input-output
pairs of the function to be learned.

Dendrites-These receive information or signals from other neurons that get


connected to it.
Cell Body-Information processing happens in a cell body. These take in all the
information coming from the different dendrites and process that information.
Axon-It sends the output signal to another neuron for the flow of information.
Here, each of the flanges connects to the dendrite or the hairs on the next one.
The network starts with an input layer that receives input in data form.
The lines connected to the hidden layers are called weights, and they add up on
the hidden layers. Each dot in the hidden layer processes the inputs, and it puts
an output into the next hidden layer and, lastly, into the output layer.
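The flow described above, inputs scaled by weights, summed at each hidden node, and passed on until the output layer, can be sketched in Python. This is an illustrative sketch only; the layer sizes, weights, and logistic activation below are made-up values, not part of the original text:

```python
import math

def layer(inputs, weights, biases):
    """One layer of the network: each node computes the weighted sum of
    its inputs plus a bias, then applies a logistic activation."""
    outputs = []
    for w_row, b in zip(weights, biases):
        net = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(1.0 / (1.0 + math.exp(-net)))
    return outputs

# Hypothetical weights for a 2-input net with one 2-node hidden layer.
x = [0.5, -0.2]
hidden = layer(x, weights=[[0.4, 0.9], [-0.7, 0.1]], biases=[0.0, 0.1])
output = layer(hidden, weights=[[1.2, -0.8]], biases=[0.05])
```

Each call to `layer` plays the role of one set of connecting lines in the picture: the weights scale the incoming signals, and the result feeds the next layer.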

Application Scope of Neural Networks:


With an enormous number of applications implemented every day, now
is the most appropriate time to know about the applications of neural networks,
machine learning, and artificial intelligence. Some of them are discussed below:
Handwriting Recognition
Neural networks are used to convert handwritten characters into digital
characters that a machine can recognize.
Stock-Exchange prediction
The stock exchange is affected by many different factors, making it
difficult to track and difficult to understand. However, a neural network can
examine many of these factors and predict the prices daily, which would help
stockbrokers.
Currently, this operation is still in its initial phases. However, you should
know that over three terabytes of data a day are generated from the United States
stock exchange alone. That's a lot of data to dig through, and you must sort it out
before you start focusing on even a single stock.
Traveling Issues of sales professionals
This application refers to finding an optimal path to travel between cities
in a given area. Neural networks help solve the problem of providing higher
revenue at minimal costs. However, the logistical considerations are enormous,
and we must find optimal travel paths for sales professionals moving from town
to town.
Image compression
The idea behind neural network data compression is to store, encode, and
recreate the actual image. Therefore, we can optimize the size of our data
using image compression neural networks. It is the ideal application to save
memory and optimize it.
Speech Recognition
Speech occupies a prominent role in human-human interaction. Therefore,
it is natural for people to expect speech interfaces with computers. In the present
era, for communication with machines, humans still need sophisticated languages
which are difficult to learn and use. To ease this communication barrier, a simple
solution could be communication in a spoken language that the machine is able
to understand.

Banking:
Credit card attrition, credit and loan application evaluation, fraud and risk
evaluation, and loan delinquencies
Business Analytics:
Customer behaviour modelling, customer segmentation, fraud propensity,
market research, market mix, market structure, and models for attrition, default,
purchase, and renewals
Defence:
Counterterrorism, facial recognition, feature extraction, noise suppression,
object discrimination, sensors, sonar, radar and image signal processing,
signal/image identification, target tracking, and weapon steering
Education:
Adaptive learning software, dynamic forecasting, education system
analysis and forecasting, student performance modelling, and personality
profiling
Financial:
Corporate bond ratings, corporate financial analysis, credit line use
analysis, currency price prediction, loan advising, mortgage screening, real estate
appraisal, and portfolio trading
Medical:
Cancer cell analysis, ECG and EEG analysis, emergency room test
advisement, expense reduction and quality improvement for hospital systems,
transplant process optimization, and prosthesis design
Securities:
Automatic bond rating, market analysis, and stock trading advisory
systems
Transportation:
Routing systems, truck brake diagnosis systems, and vehicle scheduling.

ARTIFICIAL NEURAL NETWORK: An Introduction


An artificial neural network is a computational network based on
biological neural networks, which construct the structure of the human brain.
Just as the human brain has neurons interconnected with one another, artificial
neural networks also have neurons that are linked to one another in the various
layers of the network. These neurons are known as nodes.

Input Layer:
As the name suggests, it accepts inputs in several different formats provided by
the programmer.
Hidden Layer:
The hidden layer lies between the input and output layers. It performs all the
calculations to find hidden features and patterns.

Output Layer:
The input goes through a series of transformations using the hidden layer, which
finally results in output that is conveyed using this layer.
The artificial neural network takes input and computes the weighted sum
of the inputs and includes a bias. This computation is represented in the form of
a transfer function.
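As a minimal sketch (not part of the original text), the computation just described, weighted sum of the inputs, plus a bias, passed through a transfer function, looks like this in Python; the inputs, weights, and choice of the logistic function here are illustrative assumptions:

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a
    transfer (activation) function -- here the logistic function."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-net))

# Hypothetical inputs and weights for illustration only.
y = neuron([1.0, 0.5], weights=[0.8, -0.4], bias=0.2)
```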

EVOLUTION OF NEURAL NETWORKS

WHO IS DEVELOPING NEURAL NETWORKS?


This section presents a very brief summary of the history of neural
networks, in terms of the development of architectures and algorithms that are
widely used today. Results of a primarily biological nature are not included, due
to space constraints. They have, however, served as the inspiration for a number
of networks that are applicable to problems beyond the original ones studied.
The history of neural networks shows the interplay among biological
experimentation, modelling, and computer simulation/hardware implementation.
Thus, the field is strongly interdisciplinary.

The 1940s: The Beginning of Neural Nets

McCulloch-Pitts neurons


Warren McCulloch and Walter Pitts designed what are generally regarded
as the first neural networks [McCulloch & Pitts, 1943]. These researchers
recognized that combining many simple neurons into neural systems was the
source of increased computational power. The weights on a McCulloch-Pitts
neuron are set so that the neuron performs a particular simple logic function, with
different neurons performing different functions. The neurons can be arranged
into a net to produce any output that can be represented as a combination of logic
functions. The flow of information through the net assumes a unit time step for a
signal to travel from one neuron to the next. This time delay allows the net to
model some physiological processes, such as the perception of hot and cold.
The idea of a threshold such that if the net input to a neuron is greater than
the threshold then the unit fires is one feature of a McCulloch-Pitts neuron that is
used in many artificial neurons today. However, McCulloch-Pitts neurons are
used most widely as logic circuits [Anderson & Rosenfeld, 1988].
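The threshold behaviour of a McCulloch-Pitts neuron, and its use as a logic circuit, can be sketched as follows. The weight and threshold choices below are the standard textbook settings for AND and OR, shown here as an illustration:

```python
def mp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts neuron: fires (outputs 1) only if the net input
    reaches the fixed threshold; otherwise outputs 0."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= threshold else 0

# AND: two excitatory inputs of weight 1, threshold 2.
AND = [mp_neuron([a, b], [1, 1], 2) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
# OR: same weights, but threshold 1.
OR = [mp_neuron([a, b], [1, 1], 1) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

Different weight and threshold settings give different logic functions, and nets of such units can combine them.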

McCulloch and Pitts's subsequent work [Pitts & McCulloch, 1947] addressed
issues that are still important research areas today, such as translation and rotation
invariant pattern recognition.

Hebb learning
Donald Hebb, a psychologist at McGill University, designed the first
learning law for artificial neural networks [Hebb, 1949]. His premise was that
if two neurons were active simultaneously, then the strength of the connection
between them should be increased. Refinements were subsequently made to this
rather general statement to allow computer simulations [Rochester, Holland,
Haibt & Duda, 1956]. The idea is closely related to the correlation matrix
learning developed by Kohonen (1972) and Anderson (1972), among others. An
expanded form of Hebb learning [McClelland & Rumelhart, 1988] allows units
that are simultaneously off to also reinforce the weight on the connection
between them.
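Hebb's premise translates directly into an update rule: each weight grows by the product of the two activations it connects. The sketch below (an illustration, not from the original text) trains the AND function with bipolar inputs and targets, a standard textbook exercise:

```python
def hebb_update(weights, bias, x, y):
    """Hebb rule: each weight grows by the product of the two activations
    it connects (w_i += x_i * y); the bias grows by y."""
    return [w + xi * y for w, xi in zip(weights, x)], bias + y

# Training the AND function with bipolar inputs and targets.
w, b = [0, 0], 0
for x, t in [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]:
    w, b = hebb_update(w, b, x, t)
print(w, b)  # final weights [2, 2] and bias -2
```

With bipolar representation, units that are simultaneously "off" (-1) also reinforce their connection, which is exactly the expanded form mentioned above.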

The 1950s and 1960s: The First Golden Age of Neural Networks
Although today neural networks are often viewed as an alternative to (or
complement of) traditional computing, it is interesting to note that John von
Neumann, the "father of modern computing," was keenly interested in modeling
the brain [von Neumann, 1958]. Johnson and Brown (1988) and Anderson and
Rosenfeld (1988) discuss the interaction between von Neumann and early neural
network researchers such as Warren McCulloch, and present further indication of
von Neumann's views of the directions in which computers would develop.
Perceptrons
Together with several other researchers [Block, 1962; Minsky & Papert,
1988 (originally published 1969)], Frank Rosenblatt (1958, 1959, 1962)
introduced and developed a large class of artificial neural networks called
perceptrons. The most typical perceptron consisted of an input layer (the retina)
connected by paths with fixed weights to associator neurons; the weights on the
connection paths were adjustable. The perceptron learning rule uses an iterative
weight adjustment that is more powerful than the Hebb rule. Perceptron learning
can be proved to converge to the correct weights if there are weights that will
solve the problem at hand (i.e., allow the net to reproduce correctly all of the
training input and target output pairs). Rosenblatt's 1962 work describes many
types of perceptrons. Like the neurons developed by McCulloch and Pitts and by
Hebb, perceptrons use a threshold output function.
The early successes with perceptrons led to enthusiastic claims. However,
the mathematical proof of the convergence of iterative learning under suitable
assumptions was followed by a demonstration of the limitations regarding what
the perceptron type of net can learn [Minsky & Papert, 1969].
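The iterative weight adjustment of the perceptron rule can be sketched in Python. This is an illustrative implementation under stated assumptions: bipolar inputs and targets, a zero threshold, and the AND function as the (linearly separable) training set; none of these specifics come from the original text:

```python
def step(net, theta=0.0):
    """Bipolar threshold output used by the perceptron."""
    if net > theta:
        return 1
    if net < -theta:
        return -1
    return 0

def train_perceptron(samples, lr=1.0, epochs=20):
    """Perceptron rule: weights change only when the response is wrong,
    by w_i += lr * t * x_i (bipolar target t)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        changed = False
        for x, t in samples:
            net = sum(wi * xi for wi, xi in zip(w, x)) + b
            if step(net) != t:
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
                changed = True
        if not changed:  # converged: every pattern classified correctly
            break
    return w, b

# AND function with bipolar inputs and targets (linearly separable).
data = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
w, b = train_perceptron(data)
```

Because AND is linearly separable, the convergence theorem guarantees this loop terminates with weights that classify all four patterns; for XOR it would not.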

ADALINE
Bernard Widrow and his student, Marcian (Ted) Hoff [Widrow & Hoff,
1960], developed a learning rule (which usually either bears their names, or is
designated the least mean squares or delta rule) that is closely related to the
perceptron learning rule. The perceptron rule adjusts the connection weights to a
unit whenever the response of the unit is incorrect. (The response indicates a
classification of the input pattern.) The delta rule adjusts the weights to reduce the
difference between the net input to the output unit and the desired output. This
results in the smallest mean squared error. The similarity of models developed in
psychology by Rosenblatt to those developed in electrical engineering by Widrow
and Hoff is evidence of the interdisciplinary nature of neural networks. The
difference in learning rules, although slight, leads to an improved ability of the
net to generalize (i.e., respond to input that is similar, but not identical, to that on
which it was trained). The Widrow-Hoff learning rule for a single-layer network
is a precursor of the backpropagation rule for multilayer nets.
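The contrast with the perceptron rule can be shown in code: the delta rule updates on every presentation, in proportion to the difference between net input and target, not only when the classification is wrong. The training data, learning rate, and epoch count below are illustrative assumptions:

```python
def train_lms(samples, lr=0.1, epochs=100):
    """Widrow-Hoff (delta / least-mean-squares) rule: weights move to
    reduce the difference between the net input and the target,
    w_i += lr * (t - net) * x_i, whether or not the response is wrong."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, t in samples:
            net = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = t - net
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

data = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
w, b = train_lms(data)
```

On this data the weights settle near the least-squares solution (about 0.5, 0.5 with bias -0.5), which is what "smallest mean squared error" means in practice.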

Work by Widrow and his students is sometimes reported as neural network


research, sometimes as adaptive linear systems. The name ADALINE, interpreted
as either Adaptive Linear Neuron or Adaptive Linear system, is often given to
these nets. There have been many interesting applications of ADALINES, from
neural networks for adaptive antenna systems [Widrow, Mantey, Griffiths, &
Goode, 1967] to rotation-invariant pattern recognition to a variety of control
problems, such as broom balancing and backing up a truck [Widrow, 1987; Tolat
& Widrow, 1988; Nguyen & Widrow, 1989]. MADALINEs are multilayer
extensions of ADALINEs [Widrow & Hoff, 1960; Widrow & Lehr, 1990].

The 1970s: The Quiet Years


In spite of Minsky and Papert's demonstration of the limitations of
perceptrons (i.e., single-layer nets), research on neural networks continued. Many
of the current leaders in the field began to publish their work during the 1970s.
(Widrow, of course, had started somewhat earlier and is still active.)

Kohonen
The early work of Teuvo Kohonen (1972), of Helsinki University of
Technology, dealt with associative memory neural nets. His more recent work
[Kohonen, 1982] has been the development of self-organizing feature maps that
use a topological structure for the cluster units. These nets have been applied to
speech recognition (for Finnish and Japanese words) [Kohonen, Torkkola,
Shozakai, Kangas, & Venta, 1987; Kohonen, 1988], the solution of the "Traveling
Salesman Problem" [Angeniol, Vaubois, & Le Texier, 1988], and musical


composition [Kohonen, 1989b].

Anderson
James Anderson, of Brown University, also started his research in neural
networks with associative memory nets [Anderson, 1968, 1972]. He developed
these ideas into his "Brain-State-in-a-Box" [Anderson, Silverstein, Ritz, & Jones,
1977], which truncates the linear output of earlier models to prevent the output
from becoming too large as the net iterates to find a stable solution (or memory).
Among the areas of application for these nets are medical diagnosis and learning
multiplication tables. Anderson and Rosenfeld (1988) and Anderson, Pellionisz,
and Rosenfeld (1990) are collections of fundamental papers on neural network
research. The introductions to each are especially useful.

Grossberg
Stephen Grossberg, together with his many colleagues and coauthors, has
had an extremely prolific and productive career. Klimasauskas (1989) lists 146
publications by Grossberg from 1967 to 1988. His work, which is very
mathematical and very biological, is widely known [Grossberg, 1976, 1980, 1982,
1987, 1988]. Grossberg is director of the Center for Adaptive Systems at Boston
University.

Carpenter
Together with Stephen Grossberg, Gail Carpenter has developed a theory of self-
organizing neural networks called adaptive resonance theory [Carpenter &
Grossberg, 1985, 1987a, 1987b, 1990]. Adaptive resonance theory nets for binary
input patterns (ART1) and for continuously valued inputs (ART2) will be
examined in Chapter 5.

The 1980s: Renewed Enthusiasm


Backpropagation
Two of the reasons for the "quiet years" of the 1970s were the failure of
single-layer perceptrons to be able to solve such simple problems (mappings) as
the XOR function and the lack of a general method of training a multilayer net. A
method for propagating information about errors at the output units back to the
hidden units had been discovered in the previous decade [Werbos, 1974], but had
not gained wide publicity. This method was also discovered independently by
David Parker (1985) and by LeCun (1986) before it became widely known. It is
very similar to yet an earlier algorithm in optimal control theory [Bryson & Ho,
1969]. Parker's work came to the attention of the Parallel Distributed Processing
Group led by psychologists David Rumelhart, of the University of California at
San Diego, and James McClelland, of Carnegie-Mellon University, who refined
and publicized it [Rumelhart, Hinton, & Williams, 1986a, 1986b; McClelland &
Rumelhart, 1988].
Hopfield nets
Another key player in the increased visibility of and respect for neural nets
is prominent physicist John Hopfield, of the California Institute of Technology.
Together with David Tank, a researcher at AT&T, Hopfield has developed a
number of neural networks based on fixed weights and adaptive activations
[Hopfield, 1982, 1984; Hopfield & Tank, 1985, 1986; Tank & Hopfield, 1987].
These nets can serve as associative memory nets and can be used to solve
constraint satisfaction problems such as the "Traveling Salesman Problem." An
article in Scientific American [Tank & Hopfield, 1987] helped to draw popular
attention to neural nets, as did the message of a Nobel prize-winning physicist that,
in order to make machines that can do what humans do, we need to study human
cognition.
Neocognitron
Kunihiko Fukushima and his colleagues at NHK Laboratories in Tokyo
have developed a series of specialized neural nets for character recognition. One
example of such a net, called a neocognitron, is described in Chapter 7. An
earlier self-organizing network, called the cognitron [Fukushima, 1975], failed to
recognize position- or rotation-distorted characters. This deficiency was
corrected in the neocognitron [Fukushima, 1988; Fukushima, Miyake, & Ito,
1983].
Boltzmann machine
A number of researchers have been involved in the development of
nondeterministic neural nets, that is, nets in which weights or activations are
changed on the basis of a probability density function [Kirkpatrick, Gelatt, &
Vecchi, 1983; Geman & Geman, 1984; Ackley, Hinton, & Sejnowski, 1985; Szu
& Hartley, 1987]. These nets incorporate such classical ideas as simulated
annealing and Bayesian decision theory.
Hardware implementation
Another reason for renewed interest in neural networks (in addition to
solving the problem of how to train a multilayer net) is improved computational
capabilities. Optical neural nets [Farhat, Psaltis, Prata, & Paek, 1985] and VLSI
implementations [Sivilotti, Mahowald, & Mead, 1987] are being developed.

Carver Mead, of California Institute of Technology, who also studies


motion detection, is the coinventor of software to design microchips. He is also
cofounder of Synaptics, Inc., a leader in the study of neural circuitry.
Nobel laureate Leon Cooper, of Brown University, introduced one of the
first multilayer nets, the reduced coulomb energy network. Cooper is chairman of
Nestor, the first public neural network company [Johnson & Brown, 1988], and
the holder of several patents for information-processing systems [Klimasauskas,
1989].
Robert Hecht-Nielsen and Todd Gutschow developed several digital
neurocomputers at TRW, Inc., during 1983-85. Funding was provided by the Defense
Advanced Research Projects Agency (DARPA) [Hecht-Nielsen, 1990]. DARPA
(1988) is a valuable summary of the state of the art in artificial neural networks
(especially with regard to successful applications). To quote from the preface to
his book, Neurocomputing, Hecht-Nielsen is "an industrialist, an adjunct
academic, and a philanthropist without financial portfolio" [Hecht-Nielsen, 1990].
The founder of HNC, Inc., he is also a professor at the University of California,
San Diego, and the developer of the counter propagation network.
Deep Learning Era (2000s - Present):
• Increased Data and Computing Power: Large datasets and faster
computers fueled the resurgence of deeper architectures.
• Convolutional Neural Networks (CNNs): Inspired by visual cortex,
excelled in image recognition and dominated computer vision tasks.
• Recurrent Neural Networks (RNNs): Processed sequential data like text
and speech, enabling tasks like machine translation and speech recognition.
• Generative Adversarial Networks (GANs): Two competing networks,
one generating data and the other discriminating real from fake, driving
advances in image generation and creative applications.
The Future:
• Neuromorphic Computing: Hardware inspired by the brain, promising
more energy-efficient and biologically realistic implementations.
• Explainable AI (XAI): Increasing transparency and understanding of
neural network decisions, fostering trust and interpretability.

BASIC MODELS OF ARTIFICIAL NEURAL NETWORK


The model of an artificial neural network can be specified by three
factors:

1. Interconnections
2. Learning rules
3. Activation functions

1. Interconnections: An ANN consists of a set of highly interconnected


processing elements such that each processing element's output is found to be
connected through weights to the other processing elements or to itself; delay
leads and lag-free connections are allowed. Hence, the arrangement of these
processing elements and the geometry of their interconnections are essential for
an ANN. The points where the connections originate and terminate should be
noted, and the function of each processing element in an ANN should be specified.
The arrangement of neurons to form layers and connection pattern formed within
and between layers is called the network architecture.
There are five basic types of neuron connection architectures:

1. Single layer feed forward network.


2. Multilayer feedforward network
3. Single node with its own feedback
4. Single layer recurrent network
5. Multilayer recurrent network

1. Single layer feed forward network



A layer is formed by taking a processing element and combining it with


other processing elements. When a layer of the processing nodes is formed, the
inputs can be connected to these nodes with various weights, resulting in a series
of outputs, one per node. Thus, a single-layer feedforward network is formed.

2. Multilayer feedforward network

A multilayer feedforward network is formed by the interconnection of


several layers. The input layer is that which receives the input and this layer has
no function except buffering the input signal. The output layer generates the
output of the network. Any layer that is formed between the input layer and the
output layer is called the hidden layer.
3. Single node with its own feedback

If the feedback of the output of the processing elements is directed back


as an input to the processing elements in the same layer, then it is called lateral
feedback.

4. Single-layer recurrent network


Recurrent networks are the feedback networks with a closed loop. The
above network is a single-layer network with a feedback connection in which the
processing element’s output can be directed back to itself or to another processing
element or both. A recurrent neural network is a class of artificial neural networks
where connections between nodes form a directed graph along a sequence. This
allows it to exhibit dynamic temporal behavior for a time sequence. Unlike
feedforward neural networks, RNNs can use their internal state (memory) to
process sequences of inputs.
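The internal state (memory) of a recurrent unit can be sketched in a few lines. The single-unit setup, tanh activation, and weight values below are illustrative assumptions, not from the original text:

```python
import math

def rnn_step(x_t, h_prev, w_in, w_rec, b):
    """One step of a single-unit recurrent network: the new hidden state
    depends on the current input and on the previous state (its memory)."""
    return math.tanh(w_in * x_t + w_rec * h_prev + b)

# Hypothetical weights; the hidden state carries information forward in time.
h = 0.0
for x_t in [1.0, 0.5, -1.0]:
    h = rnn_step(x_t, h, w_in=0.8, w_rec=0.5, b=0.0)
```

The recurrent weight `w_rec` is what feeds the output back as an input, giving the closed loop described above.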

5. Multilayer recurrent network


In this type of network, processing element output can be directed to the
processing element in the same layer and in the preceding layer forming a
multilayer recurrent network. They perform the same task for every element of a
sequence, with the output being dependent on the previous computations. Inputs
are not needed at each time step. The main feature of a Recurrent Neural Network
is its hidden state, which captures some information about a sequence.

2. Learning rule: The main property of an ANN is its capability to learn.


Learning or training is a process by means of which a neural network adapts itself
to a stimulus, resulting in the production of the desired response. Broadly,


there are two kinds of learning in ANNs:
1. Parameter learning: It updates the connecting weights in a neural net.
2. Structure learning: It focuses on the change in network structure.
Apart from these two categories of learning, learning in an ANN can be
generally classified into three categories:
i. Supervised learning
ii. Unsupervised learning
iii. Reinforcement learning.
I. SUPERVISED LEARNING
• In supervised learning in ANNs, each input vector requires a
corresponding target vector, which represents the desired output. The input
vector together with its target vector is called a training pair.
• In this learning it is assumed that the correct "target" output value is
known for each input pattern.

II. UNSUPERVISED LEARNING


• In ANNs following unsupervised learning, the input vectors of similar type
are grouped without the use of training data to specify how a member of
each group looks or to which group a member belongs.

III. REINFORCEMENT LEARNING


• This learning process is similar to supervised learning. In supervised
learning, the correct target output values are known for each input pattern.
But in some cases less information is available: only critic information, an
indication of whether the computed output is right or wrong, may be provided.
• Learning based on this critic information is called reinforcement
learning, and the feedback sent is called the reinforcement signal.

3. Activation Function
• A person is performing some work. To make the work more efficient and
to obtain the exact output, some force or activation may be given. This
activation helps in achieving the exact output. In a similar way, the
activation function is applied over the net input to calculate the output
of an ANN.
• In the process of building a neural network, one of the choices you get to
make is what activation function to use in the hidden layer as well as at the
output layer of the network.

• The activation function decides whether a neuron should be activated or
not by calculating the weighted sum and further adding the bias to it. The
purpose of the activation function is to introduce non-linearity into the
output of a neuron.

The basic operation of an artificial neuron involves summing its weighted


input signal and applying an output, or activation, function. For the input
units, this function is the identity function (see Figure 1.7). Typically, the
same activation function is used for all neurons in any particular layer of a
neural net, although this is not required. In most cases, a nonlinear activation
function is used. In order to achieve the advantages of multilayer nets,
compared with the limited capabilities of single-layer nets, nonlinear
functions are required (since the results of feeding a signal through two or
more layers of linear processing elements-i.e., elements with linear
activation functions-are no different from what can be obtained using a
single layer).

(i) Identity function:

f(x) = x for all x.
Single-layer nets often use a step function to convert the net input, which
is a continuously valued variable, to an output signal that is binary (1 or 0)
or bipolar (1 or -1) (see Figure 1.8). The use of a threshold in this
regard is discussed in Section 2.1.2. The binary step function is also known as
the threshold function or Heaviside function.

(ii) Binary step function (with threshold 0):

f(x) = 1 if x >= 0; f(x) = 0 if x < 0.
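As a small illustration (not from the original text), the binary step function and its bipolar variant can be written as:

```python
def binary_step(x, theta=0.0):
    """Binary step (Heaviside / threshold) activation: 1 if the net input
    reaches the threshold theta, else 0."""
    return 1 if x >= theta else 0

def bipolar_step(x, theta=0.0):
    """Bipolar variant: outputs +1 or -1 instead of 1 or 0."""
    return 1 if x >= theta else -1
```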

Sigmoid functions (S-shaped curves) are useful activation functions. The


logistic function and the hyperbolic tangent functions are the most common.
They are especially advantageous for use in neural nets trained by
backpropagation because the simple relationship between the value of the
function at a point and the value of the derivative at that point reduces the
computational burden during training.

The logistic function, a sigmoid function with range from 0 to 1, is often
used as the activation function for neural nets in which the desired output values
either are binary or are in the interval between 0 and 1. To emphasize the range
of the function, we will call it the binary sigmoid; it is also called the logistic
sigmoid. This function is illustrated in Figure 1.9 for two values of the steepness
parameter σ.

( iii ) Binary sigmoid:

f(x) = 1 / (1 + exp(-σx)),   with derivative f'(x) = σ f(x)[1 - f(x)].

The logistic sigmoid function can be scaled to have any range of values
that is appropriate for a given problem. The most common range is from -1
to 1; we call this sigmoid the bipolar sigmoid.
Figure 1.9: Binary sigmoid, steepness parameters σ = 1 and σ = 3.

( iv ) Bipolar sigmoid:

The bipolar sigmoid is closely related to the hyperbolic tangent function,


which is also often used as the activation function when the desired range
of output values is between -1 and 1.

We illustrated the correspondence between the two for σ = 1. We have

g(x) = 2 f(x) - 1 = (1 - exp(-x)) / (1 + exp(-x)).

The hyperbolic tangent is

h(x) = (e^x - e^-x) / (e^x + e^-x) = (1 - exp(-2x)) / (1 + exp(-2x)).

The derivative of the hyperbolic tangent is

h'(x) = [1 + h(x)][1 - h(x)].
For binary data (rather than continuously valued data in the range from
0 to 1), it is usually preferable to convert to bipolar form and use the bipolar
sigmoid or hyperbolic tangent. A more extensive discussion of the choice of
activation functions and of the different forms of sigmoid functions can be
found in the literature.
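These relationships can be checked numerically; the following Python/NumPy sketch (function names are our own) verifies the standard derivative identities and the correspondence between the bipolar sigmoid (σ = 1) and tanh:

```python
import numpy as np

def binary_sigmoid(x, sigma=1.0):
    """Logistic (binary) sigmoid with steepness parameter sigma; range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-sigma * x))

def bipolar_sigmoid(x, sigma=1.0):
    """Scaled logistic sigmoid; range (-1, 1)."""
    return 2.0 * binary_sigmoid(x, sigma) - 1.0

x = np.linspace(-4, 4, 9)

# Derivative identity for the binary sigmoid: f'(x) = sigma * f(x) * [1 - f(x)]
f = binary_sigmoid(x, sigma=2.0)
numeric = (binary_sigmoid(x + 1e-6, 2.0) - binary_sigmoid(x - 1e-6, 2.0)) / 2e-6
assert np.allclose(numeric, 2.0 * f * (1 - f), atol=1e-5)

# Derivative identity for tanh: h'(x) = [1 + h(x)][1 - h(x)]
h = np.tanh(x)
numeric = (np.tanh(x + 1e-6) - np.tanh(x - 1e-6)) / 2e-6
assert np.allclose(numeric, (1 + h) * (1 - h), atol=1e-5)

# Correspondence between the bipolar sigmoid (sigma = 1) and tanh: g(x) = tanh(x / 2)
assert np.allclose(bipolar_sigmoid(x), np.tanh(x / 2))
```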

(v) Tanh:

The tanh function is very similar to the sigmoid/logistic activation function and


even has the same S-shape, the difference being its output range of -1 to 1. The
advantages of using this activation function are:
● The output of the tanh activation function is zero-centered; hence we can
easily map the output values as strongly negative, neutral, or strongly positive.
● It is usually used in the hidden layers of a neural network, as its values lie
between -1 and 1; therefore, the mean for the hidden layer comes out to be 0 or
very close to it.
This helps in centering the data and makes learning for the next layer much
easier.
Drawback: it also faces the problem of vanishing gradients, similar to the
sigmoid activation function.

(vi) ReLU:

• ReLU stands for Rectified Linear Unit.


• Although it gives an impression of a linear function, ReLU has a
derivative and allows for backpropagation while simultaneously being
computationally efficient.
• The main catch here is that the ReLU function does not activate all the
neurons at the same time.
• A neuron is deactivated only if the output of the linear transformation
is less than 0.
• The advantages of using ReLU as an activation function are as follows:
o Since only a certain number of neurons are activated, the ReLU
function is far more computationally efficient when compared
to the sigmoid and tanh functions.
o ReLU accelerates the convergence of gradient descent towards
the global minimum of the loss function due to its linear, non-
saturating property.
• The limitations faced by this function are:
o The dying ReLU problem: a neuron whose pre-activation is always
negative outputs 0 and receives zero gradient, so it stops learning.
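A minimal ReLU sketch (helper names are our own) showing the sparse activations and the zero gradient behind the dying-ReLU problem:

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: passes positive inputs through, zeroes the rest."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """(Sub)gradient of ReLU: 1 for positive inputs, 0 otherwise."""
    return (x > 0).astype(float)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])   # pre-activations of five neurons

# Only neurons with positive pre-activation are active (sparse activation).
assert relu(z).tolist() == [0.0, 0.0, 0.0, 0.5, 2.0]

# Inactive neurons get zero gradient; a neuron that is always inactive
# never updates its weights -- the dying ReLU problem.
assert relu_grad(z).tolist() == [0.0, 0.0, 0.0, 1.0, 1.0]
```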

REGULARIZATION:
Need for Regularization
• Overfitting refers to the phenomenon where a neural network models the
training data very well but fails when it sees new data from the same
problem domain.
• Overfitting is caused by noise in the training data that the neural network
picks up during training and learns as an underlying concept of the data.

• This learned noise, however, is unique to each training set. As soon as the
model sees new data from the same problem domain, but that does not
contain this noise, the performance of the neural network gets much worse.
• The reason for this is that the complexity of this network is too high.
• The model with a higher complexity is able to pick up and learn patterns
(noise) in the data that are just caused by some random fluctuation or error.
• Less complex neural networks are less susceptible to overfitting. To
prevent overfitting or a high variance we must use something that is called
regularization.
What Is Regularization?
Regularization means restricting a model to avoid overfitting by shrinking
the coefficient estimates to zero. When a model suffers from overfitting, we
should control the model's complexity. Technically, regularizations avoid
overfitting by adding a penalty to the model's loss function:
Regularization = Loss Function + Penalty
Commonly used regularization techniques to control the complexity of
machine learning models include:
• L2 regularization
• L1 regularization
• Elastic Net regularization
• Early stopping
• Dropout

L2 Regularization
A linear regression that uses the L2 regularization technique is called ridge
regression. In other words, in ridge regression, a regularization term is added to
the cost function of the linear regression, which keeps the magnitude of the
model's weights (coefficients) as small as possible. The L2 regularization
technique tries to keep the model's weights close to zero, but not zero, which
means each feature should have a low impact on the output while the model's
accuracy should be as high as possible.

Ridge cost function: Loss = Σ(y - ŷ)² + λ Σ wj², where λ controls the strength
of regularization and the wj are the model's weights (coefficients).
By increasing λ, the model becomes flatter and underfits. On the other hand, by
decreasing λ, the model becomes more prone to overfitting, and with λ = 0 the
regularization term is eliminated.
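The shrinking effect of λ can be seen with the closed-form ridge solution; a small NumPy sketch on synthetic data (all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                       # 50 samples, 3 features
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)

def ridge_weights(X, y, lam):
    """Closed-form ridge solution: w = (X^T X + lam * I)^(-1) X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Increasing lambda shrinks the weights toward (but not exactly to) zero.
w_small_lam = ridge_weights(X, y, lam=0.01)
w_large_lam = ridge_weights(X, y, lam=100.0)
assert np.linalg.norm(w_large_lam) < np.linalg.norm(w_small_lam)

# With lam = 0 the penalty vanishes and we recover plain least squares.
assert np.allclose(ridge_weights(X, y, lam=0.0),
                   np.linalg.lstsq(X, y, rcond=None)[0])
```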

L1 Regularization
Least Absolute Shrinkage and Selection Operator (lasso) regression is an
alternative to ridge for regularizing linear regression. Lasso regression also adds
a penalty term to the cost function, but slightly different, called L1 regularization.
L1 regularization makes some coefficients zero, meaning the model will ignore
those features. Ignoring the least important features helps emphasize the model's
essential features.

Lasso cost function: Loss = Σ(y - ŷ)² + λ Σ |wj|, where λ controls the strength
of regularization and the wj are the model's weights (coefficients).
Lasso regression automatically performs feature selection by eliminating the least
important features.

Elastic Net Regularization


The third type of regularization (you may have guessed by now) uses both
the L1 and L2 penalties to produce the most optimized output. In addition to
choosing a lambda value, elastic net also allows us to tune an alpha parameter,
where α = 0 corresponds to ridge and α = 1 to lasso. Simply put, if you plug in 0
for alpha, the penalty reduces to the L2 (ridge) term, and if we set alpha to 1
we get the L1 (lasso) term.

Therefore we can choose an alpha value between 0 and 1 to optimize the


elastic net (here we can adjust the weightage of each regularization, thus giving
the name "elastic"). Effectively this will shrink some coefficients and set some to 0
for sparse selection.
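A toy sketch of how the alpha parameter blends the two penalty terms (variable names and the exact blending convention are our own simplification; libraries may weight the terms slightly differently):

```python
import numpy as np

w = np.array([0.5, -2.0, 0.0, 1.5])   # toy model weights
lam = 0.1                             # overall regularization strength

l1 = np.sum(np.abs(w))                # lasso penalty term
l2 = np.sum(w ** 2)                   # ridge penalty term

def elastic_penalty(alpha):
    """Elastic net penalty: alpha blends L1 (alpha = 1) and L2 (alpha = 0)."""
    return lam * (alpha * l1 + (1 - alpha) * l2)

assert elastic_penalty(0.0) == lam * l2   # pure ridge
assert elastic_penalty(1.0) == lam * l1   # pure lasso
```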

Early stopping
Early stopping is a kind of cross-validation strategy where we keep one part
of the training set as the validation set. When we see that the performance on the
validation set is getting worse, we immediately stop the training on the model.
This is known as early stopping.
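The patience-based variant of this strategy can be sketched as follows (the function name and the toy loss sequence are ours; real training would compute the validation loss once per epoch):

```python
def train_with_early_stopping(val_losses, patience=2):
    """Stop when the validation loss has not improved for `patience` epochs.

    `val_losses` stands in for one validation evaluation per training epoch.
    Returns the index of the best epoch seen before stopping.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break   # validation performance is getting worse: stop training
    return best_epoch

# Validation loss improves, then starts rising: epoch 2 (loss 0.6) is best,
# and training stops after two consecutive worse epochs.
assert train_with_early_stopping([0.9, 0.7, 0.6, 0.65, 0.7, 0.8]) == 2
```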

Dropout:

1. Dropout is a regularization technique used in machine learning to reduce


overfitting by preventing complex co-adaptations on training data.

2. Dropout randomly turns off a fraction of the neurons in a neural network


during training, which forces the network to learn more robust and generalizable
features.
3. Dropout is a simple and effective technique that can improve the
performance of deep learning models without requiring extensive hyperparameter
tuning.
4. Dropout can be applied to different types of neural networks, including
convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
5. The dropout rate is a hyperparameter that controls the fraction of neurons
that are randomly dropped out during training. A dropout rate of 0.5 is commonly
used.
6. Dropout can be applied at different layers in a neural network but is
typically applied after the activation function.
7. Dropout can also be combined with other regularization techniques, such
as weight decay and early stopping, to further improve model performance.
8. Dropout can increase the training time of a model but can be easily
parallelized and scaled to large datasets.
9. Dropout has been shown to be effective in a wide range of machine
learning applications, including image classification, speech recognition, and
natural language processing.
10. Dropout is a widely used technique in modern deep learning
architectures and is implemented in popular deep learning libraries such as
TensorFlow and PyTorch.
Example:
Suppose we have a neural network with 3 layers: an input layer with 100
neurons, a hidden layer with 50 neurons, and an output layer with 10 neurons. We
apply dropout with a dropout rate of 0.5 after the hidden layer.
During training, the dropout layer randomly drops out 50% of the neurons
in the hidden layer on each forward pass. This means that on each forward pass,
a different set of neurons in the hidden layer will be dropped out, so the network
has to learn to be robust to different subsets of neurons being dropped out.
For example, during the first forward pass, neurons 1, 5, 10, 15, 20, and so
on may be dropped out. During the second forward pass, neurons 2, 6, 11, 16, 21,
and so on may be dropped out. And so on for each subsequent forward pass. The
dropout layer does not drop out any neurons during inference (i.e., when making
predictions on new data), since we want the full power of the network to be used
for making accurate predictions.
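The behaviour described above is commonly implemented as "inverted dropout", which rescales the surviving activations during training so that inference needs no change; a NumPy sketch (the function name is our own):

```python
import numpy as np

rng = np.random.default_rng(42)

def dropout(activations, rate=0.5, training=True):
    """Inverted dropout: during training, zero a random fraction `rate` of
    the units and rescale the survivors by 1 / (1 - rate); at inference,
    return the activations unchanged."""
    if not training:
        return activations            # full network used for predictions
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

hidden = np.ones(50)                  # stand-in for 50 hidden-layer activations
dropped = dropout(hidden, rate=0.5)

# Roughly half the units are zeroed; survivors are scaled from 1.0 up to 2.0.
assert all(v in (0.0, 2.0) for v in dropped)

# No neurons are dropped at inference time.
assert np.allclose(dropout(hidden, training=False), hidden)
```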
Advantages of Dropout:
1. Improved Generalization: Dropout is an effective technique for reducing
overfitting and improving generalization. By randomly dropping out neurons
during training, the network learns to be more robust to different subsets of
neurons being dropped out and therefore can generalize better to unseen data.
2. Simplicity: Dropout is a simple and easy-to-implement regularization technique
that does not require extensive hyperparameter tuning. It can be easily integrated
into existing neural network architectures.
3. Computationally efficient: Dropout is computationally efficient, and can be
easily parallelized, allowing for faster training times and the ability to scale to
large datasets.
4. Reduces co-adaptation: Dropout encourages neurons to be more independent
and reduces the co-adaptation between neurons. This can help prevent overfitting
and improve model performance.
5. No additional data required: Unlike some other regularization approaches,
dropout doesn't need any additional data for training.

Drawbacks of Dropout:
1. Increased Training Time: Dropout increases the training time of the neural
network, since each update effectively trains a different thinned subnetwork,
so more epochs are usually needed to converge. However, this can be mitigated
by parallelizing the training process.
2. Reduced learning rate: The use of dropout can reduce the effective learning rate
of the network, which can slow down the learning process.
3. Can cause instability: In some cases, dropout can cause instability during
training, particularly if the dropout rate is too high. This can be addressed by
tuning the dropout rate and adjusting other hyperparameters.
4. Cannot be used by all types of networks.

Difference between Supervised, Unsupervised, and Reinforcement Learning

| Criteria | Supervised Learning | Unsupervised Learning | Reinforcement Learning |
| --- | --- | --- | --- |
| Input data | Input data is labelled. | Input data is not labelled. | Input data is not predefined. |
| Problem | Learn the pattern of inputs and their labels. | Divide data into classes. | Find the best reward between a start and an end state. |
| Solution | Finds a mapping equation on input data and its labels. | Finds similar features in input data to classify it into classes. | Maximizes reward by assessing the results of state-action pairs. |
| Model building | Model is built and trained prior to testing. | Model is built and trained prior to testing. | The model is trained and tested simultaneously. |
| Applications | Deals with regression and classification problems. | Deals with clustering and association rule mining problems. | Deals with exploration and exploitation problems. |
| Algorithms used | Decision trees, linear regression, k-nearest neighbours | K-means clustering, k-medoids clustering, agglomerative clustering | Q-learning, SARSA, Deep Q-Network |
| Examples | Image detection, population growth prediction | Customer segmentation, feature elicitation, targeted marketing, etc. | Driverless cars, self-navigating vacuum cleaners, etc. |
