ARTIFICIAL NEURAL NETWORK
B.YOGAPRABHU
CSE ‘B’ III
REG.NO:10108104088
AALIM MUHAMMED SALEGH
COLLEGE OF ENGINEERING
Artificial neural network
An artificial neural network (ANN), usually called a neural network (NN), is a mathematical or computational model inspired by the structure and/or functional aspects of biological
neural networks. A neural network consists of an interconnected group of artificial neurons, and it
processes information using a connectionist approach to computation. In most cases an ANN is an
adaptive system that changes its structure based on external or internal information that flows through the
network during the learning phase. Modern neural networks are non-linear statistical data modeling tools.
They are usually used to model complex relationships between inputs and outputs or to find patterns in
data.
An artificial neural network is an interconnected group of nodes, akin to the vast network of
neurons in the human brain.
Background
The original inspiration for the term Artificial Neural Network came from examination of central
nervous systems and their neurons, axons, dendrites and synapses which constitute the processing
elements of biological neural networks investigated by neuroscience. In an artificial neural network,
simple artificial nodes, variously called "neurons", "neurodes", "processing elements" (PEs) or "units",
are connected together to form a network of nodes mimicking the biological neural networks — hence
the term "artificial neural network".
Because neuroscience is still full of unanswered questions and since there are many levels of
abstraction and therefore, many ways to take inspiration from the brain, there is no single formal
definition of what an artificial neural network is. Most would agree that it involves a network of simple
processing elements which can exhibit complex global behavior determined by the connections between
the processing elements and element parameters. While an artificial neural network does not have to be
adaptive per se, its practical use comes with algorithms designed to alter the strength (weights) of the
connections in the network to produce a desired signal flow.
These networks are also similar to the biological neural networks in the sense that functions are
performed collectively and in parallel by the units, rather than there being a clear delineation of subtasks
to which various units are assigned (see also connectionism). Currently, the term Artificial Neural
Network (ANN) tends to refer mostly to neural network models employed in statistics, cognitive
psychology and artificial intelligence. Neural network models designed with emulation of the central
nervous system (CNS) in mind are a subject of theoretical neuroscience and computational neuroscience.
In modern software implementations of artificial neural networks, the approach inspired by biology has
been largely abandoned for a more practical approach based on statistics and signal processing. In some
of these systems, neural networks or parts of neural networks (such as artificial neurons) are used as
components in larger systems that combine both adaptive and non-adaptive elements. While the more
general approach of such adaptive systems is more suitable for real-world problem solving, it has far less
to do with the traditional artificial intelligence connectionist models. What they do have in common,
however, is the principle of non-linear, distributed, parallel and local processing and adaptation.
Models
Neural network models in artificial intelligence are usually referred to as artificial neural
networks (ANNs); these are essentially simple mathematical models defining a function f : X → Y or a distribution over X or both X and Y, but sometimes models are also intimately associated with a particular learning algorithm or learning rule. A common use of the phrase ANN model really means the definition
of a class of such functions (where members of the class are obtained by varying parameters, connection
weights, or specifics of the architecture such as the number of neurons or their connectivity).
Network function
The word network in the term 'artificial neural network' refers to the inter–connections between
the neurons in the different layers of each system. The most basic system has three layers. The first layer
has input neurons which send data via synapses to the second layer of neurons and then via more
synapses to the third layer of output neurons. More complex systems have more layers of neurons, some with additional layers of input neurons and output neurons. The synapses store parameters called "weights" which are used to manipulate the data in the calculations.
The layers are connected through the mathematics of the network's algorithms. The network function f(x) is defined as a composition of other functions g_i(x), which can further be defined as compositions of other functions. This can be conveniently represented as a network structure, with arrows depicting the dependencies between variables. A widely used type of composition is the nonlinear weighted sum, f(x) = K(Σ_i w_i g_i(x)), where K (commonly referred to as the activation function[1]) is some predefined function, such as the hyperbolic tangent. It will be convenient in what follows to refer to a collection of functions g_i as simply a vector g = (g_1, g_2, ..., g_n).
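To make the weighted-sum composition concrete, here is a minimal Python sketch (added for illustration; the function and variable names are this write-up's own), computing a single unit's output with tanh as the activation K:

    import math

    def neuron_output(weights, inputs, activation=math.tanh):
        # f(x) = K(sum_i w_i * g_i(x)): nonlinear weighted sum passed through K.
        weighted_sum = sum(w * g for w, g in zip(weights, inputs))
        return activation(weighted_sum)

    # A unit with three inputs: tanh(0.5*1.0 - 0.2*2.0 + 0.1*3.0) = tanh(0.4)
    print(neuron_output([0.5, -0.2, 0.1], [1.0, 2.0, 3.0]))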
Learning
What has attracted the most interest in neural networks is the possibility of learning. Given a specific task to solve, and a class of functions F, learning means using a set of observations to find f* ∈ F which solves the task in some optimal sense.
This entails defining a cost function C : F → ℝ such that, for the optimal solution f*, C(f*) ≤ C(f) for all f ∈ F (i.e., no solution has a cost less than the cost of the optimal solution).
The cost function is an important concept in learning, as it is a measure of how far away a
particular solution is from an optimal solution to the problem to be solved. Learning algorithms search
through the solution space to find a function that has the smallest possible cost.
For applications where the solution is dependent on some data, the cost must necessarily be a
function of the observations, otherwise we would not be modelling anything related to the data. It is
frequently defined as a statistic to which only approximations can be made. As a simple example,
consider the problem of finding the model f which minimizes C = E[(f(x) − y)²], for data pairs (x, y) drawn from some distribution D. In practical situations we would only have N samples from D and thus, for the above example, we would only minimize Ĉ = (1/N) Σ_i (f(x_i) − y_i)². Thus, the cost is minimized over a sample of the data rather than the entire data set.
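For illustration, the sampled cost above is the familiar mean squared error; a minimal Python sketch, assuming the model f is any callable:

    def sample_cost(f, pairs):
        # Empirical cost: (1/N) * sum over the N observed pairs of (f(x_i) - y_i)^2.
        return sum((f(x) - y) ** 2 for x, y in pairs) / len(pairs)

    # A linear model f(x) = 2x evaluated on three noisy observations.
    print(sample_cost(lambda x: 2 * x, [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]))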
When N → ∞, some form of online machine learning must be used, where the cost is partially minimized as each new example is seen. While online machine learning is often used when D is fixed, it is most useful in the case where the distribution changes slowly over time. In neural network methods, some form of online machine learning is frequently used for finite datasets.
While it is possible to define some arbitrary, ad hoc cost function, frequently a particular cost will
be used, either because it has desirable properties (such as convexity) or because it arises naturally from a
particular formulation of the problem (e.g., in a probabilistic formulation the posterior probability of the
model can be used as an inverse cost). Ultimately, the cost function will depend on the desired task. An
overview of the three main categories of learning tasks is provided below.
Learning paradigms
There are three major learning paradigms, each corresponding to a particular abstract learning
task. These are supervised learning, unsupervised learning and reinforcement learning.
Supervised learning
In supervised learning, we are given a set of example pairs (x, y), x ∈ X, y ∈ Y, and the aim is to find a function f : X → Y in the allowed class of functions that matches the examples. In other words, we wish to
infer the mapping implied by the data; the cost function is related to the mismatch between our mapping
and the data and it implicitly contains prior knowledge about the problem domain.
A commonly used cost is the mean-squared error, which tries to minimize the average squared
error between the network's output, f(x), and the target value y over all the example pairs. When one tries
to minimize this cost using gradient descent for the class of neural networks called multilayer
perceptrons, one obtains the common and well-known backpropagation algorithm for training neural
networks.
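As a hedged sketch of this idea (the one-hidden-layer architecture, tanh activation and scalar input/output are choices made here for brevity, not a prescribed implementation), a multilayer perceptron trained by gradient descent on the mean-squared error:

    import math, random

    def train_mlp(pairs, hidden=4, lr=0.1, epochs=1000):
        # One-hidden-layer perceptron for scalar x -> scalar y.
        # Hidden weights/biases (w1, b1) and output weights/bias (w2, b2).
        w1 = [random.uniform(-1, 1) for _ in range(hidden)]
        b1 = [0.0] * hidden
        w2 = [random.uniform(-1, 1) for _ in range(hidden)]
        b2 = 0.0
        for _ in range(epochs):
            for x, y in pairs:
                # Forward pass: tanh hidden layer, linear output.
                h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
                out = sum(w2[j] * h[j] for j in range(hidden)) + b2
                # Backward pass: derivative of (out - y)^2 w.r.t. each parameter.
                d_out = 2 * (out - y)
                for j in range(hidden):
                    d_hidden = d_out * w2[j] * (1 - h[j] ** 2)  # tanh' = 1 - tanh^2
                    w2[j] -= lr * d_out * h[j]
                    w1[j] -= lr * d_hidden * x
                    b1[j] -= lr * d_hidden
                b2 -= lr * d_out
        return lambda x: sum(w2[j] * math.tanh(w1[j] * x + b1[j])
                             for j in range(hidden)) + b2

    # Fit y = x^2 on three points, then query the trained network.
    f = train_mlp([(0.0, 0.0), (0.5, 0.25), (1.0, 1.0)])
    print(f(0.5))

Each update moves every weight a small step against its contribution to the squared error, which is the backpropagation rule for this architecture.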
Tasks that fall within the paradigm of supervised learning are pattern recognition (also known as
classification) and regression (also known as function approximation). The supervised learning paradigm
is also applicable to sequential data (e.g., for speech and gesture recognition). This can be thought of as
learning with a "teacher," in the form of a function that provides continuous feedback on the quality of
solutions obtained thus far.
Unsupervised learning
In unsupervised learning, some data x is given together with a cost function to be minimized, which can be any function of the data x and the network's output f.
The cost function is dependent on the task (what we are trying to model) and our a priori
assumptions (the implicit properties of our model, its parameters and the observed variables).
As a trivial example, consider the model f(x) = a, where a is a constant, and the cost C = E[(x − f(x))²]. Minimizing this cost will give us a value of a that is equal to the mean of the data. The
cost function can be much more complicated. Its form depends on the application: for example, in
compression it could be related to the mutual information between x and y, whereas in statistical
modeling, it could be related to the posterior probability of the model given the data. (Note that in both of
those examples those quantities would be maximized rather than minimized).
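A minimal numerical sketch of the trivial example above (illustrative only): gradient descent on C drives the constant a toward the sample mean:

    data = [1.0, 2.0, 3.0, 6.0]
    a, lr = 0.0, 0.1
    for _ in range(200):
        # dC/da for C = (1/N) * sum (x - a)^2 is -(2/N) * sum (x - a).
        grad = -2.0 * sum(x - a for x in data) / len(data)
        a -= lr * grad
    print(a)  # converges to mean(data) = 3.0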
Tasks that fall within the paradigm of unsupervised learning are in general estimation problems;
the applications include clustering, the estimation of statistical distributions, compression and filtering.
Reinforcement learning
In reinforcement learning, data are usually not given, but generated by an agent's interactions
with the environment. At each point in time t, the agent performs an action y_t and the environment generates an observation x_t and an instantaneous cost c_t, according to some (usually unknown) dynamics.
The aim is to discover a policy for selecting actions that minimizes some measure of a long-term cost;
i.e., the expected cumulative cost. The environment's dynamics and the long-term cost for each policy are
usually unknown, but can be estimated.
More formally, the environment is modeled as a Markov decision process (MDP) with states s_1, ..., s_n ∈ S and actions a_1, ..., a_m ∈ A and with the following probability distributions: the instantaneous cost distribution P(c_t | s_t), the observation distribution P(x_t | s_t) and the transition P(s_{t+1} | s_t, a_t), while a policy is defined as the conditional distribution over actions given the observations. Taken together, the two define a Markov
chain (MC). The aim is to discover the policy that minimizes the cost; i.e., the MC for which the cost is
minimal.
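As one concrete instance of this paradigm (the original text does not name a specific algorithm; tabular Q-learning is used here purely for illustration, with a hypothetical two-state MDP), the cost-minimizing policy can be estimated from interaction alone; in larger problems an ANN would replace the table as the function approximator:

    import random

    # Hypothetical 2-state, 2-action MDP (costs and transitions invented
    # here purely for illustration).
    cost = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}
    nxt = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 1}

    Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}  # expected long-term cost
    lr, gamma, s = 0.1, 0.9, 0
    for _ in range(5000):
        a = random.choice((0, 1))  # explore actions uniformly
        c, s2 = cost[(s, a)], nxt[(s, a)]
        # Q-learning update: move toward c + gamma * (minimal) next-state cost.
        Q[(s, a)] += lr * (c + gamma * min(Q[(s2, 0)], Q[(s2, 1)]) - Q[(s, a)])
        s = s2
    # Greedy policy: in each state, choose the action with the lowest cost.
    print({st: min((0, 1), key=lambda a: Q[(st, a)]) for st in (0, 1)})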
ANNs are frequently used in reinforcement learning as part of the overall algorithm.
Tasks that fall within the paradigm of reinforcement learning are control problems, games and other
sequential decision making tasks.
Learning algorithms
Training a neural network model essentially means selecting one model from the set of allowed
models (or, in a Bayesian framework, determining a distribution over the set of allowed models) that
minimizes the cost criterion. There are numerous algorithms available for training neural network
models; most of them can be viewed as a straightforward application of optimization theory and
statistical estimation. Recent developments in this field use particle swarm optimization and other swarm
intelligence techniques.
Most of the algorithms used in training artificial neural networks employ some form of gradient
descent. This is done by simply taking the derivative of the cost function with respect to the network
parameters and then changing those parameters in a gradient-related direction.
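A minimal sketch of this procedure (illustrative only; the finite-difference gradient here stands in for the exact derivatives that backpropagation would compute):

    def gradient_descent(cost, params, lr=0.01, steps=1000, eps=1e-6):
        # Minimize cost(params) by stepping against a numerical gradient estimate.
        params = list(params)
        for _ in range(steps):
            for i in range(len(params)):
                # Central-difference estimate of the partial derivative.
                params[i] += eps
                up = cost(params)
                params[i] -= 2 * eps
                down = cost(params)
                params[i] += eps  # restore the parameter
                params[i] -= lr * (up - down) / (2 * eps)
        return params

    # Example: minimize (p0 - 3)^2 + (p1 + 1)^2; the minimum is at [3, -1].
    print(gradient_descent(lambda p: (p[0] - 3) ** 2 + (p[1] + 1) ** 2, [0.0, 0.0]))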
Temporal perceptual learning relies on finding temporal relationships in sensory signal streams.
In an environment, statistically salient temporal correlations can be found by monitoring the arrival times
of sensory signals. This is done by the perceptual network.
Employing artificial neural networks in practice requires attention to three broad considerations:
Choice of model: This will depend on the data representation and the application. Overly complex models tend to lead to problems with learning.
Learning algorithm: There are numerous trade-offs between learning algorithms. Almost any
algorithm will work well with the correct hyperparameters for training on a particular fixed data
set. However, selecting and tuning an algorithm for training on unseen data requires a significant
amount of experimentation.
Robustness: If the model, cost function and learning algorithm are selected appropriately, the resulting ANN can be extremely robust.
With the correct implementation, ANNs can be used naturally in online learning and large-data-set applications. Their simple implementation and the mostly local dependencies exhibited in their structure allow for fast, parallel implementations in hardware.
Applications
The utility of artificial neural network models lies in the fact that they can be used to infer a
function from observations. This is particularly useful in applications where the complexity of the data or
task makes the design of such a function by hand impractical.
Real-life applications
The tasks to which artificial neural networks are applied tend to fall within the following broad
categories:
Function approximation, or regression analysis, including time series prediction, fitness
approximation and modeling.
Classification, including pattern and sequence recognition, novelty detection and sequential
decision making.
Data processing, including filtering, clustering, blind source separation and compression.
Robotics, including directing manipulators and computer numerical control.
Application areas include system identification and control (vehicle control, process control), quantum
chemistry,[2] game-playing and decision making (backgammon, chess, racing), pattern recognition (radar
systems, face identification, object recognition and more), sequence recognition (gesture, speech,
handwritten text recognition), medical diagnosis, financial applications (automated trading systems), data
mining (or knowledge discovery in databases, "KDD"), visualization and e-mail spam filtering.
Neural networks and neuroscience
Theoretical and computational neuroscience is the field concerned with the theoretical analysis and computational modeling of biological neural systems. Since neural systems are intimately related to cognitive processes and behavior, the field is closely related to cognitive and behavioral modeling.
The aim of the field is to create models of biological neural systems in order to understand how
biological systems work. To gain this understanding, neuroscientists strive to make a link between
observed biological processes (data), biologically plausible mechanisms for neural processing and
learning (biological neural network models) and theory (statistical learning theory and information
theory).
Types of models
Many models are used in the field, defined at different levels of abstraction and modeling different aspects of neural systems. They range from models of the short-term behavior of individual neurons, through models of how the dynamics of neural circuitry arise from interactions between individual neurons, to models of how behavior can arise from abstract neural modules that represent complete subsystems.
These include models of the long-term and short-term plasticity of neural systems and their relation to learning and memory, from the individual neuron to the system level.
Current research
While initial research had been concerned mostly with the electrical characteristics of neurons, a
particularly important part of the investigation in recent years has been the exploration of the role of
neuromodulators such as dopamine, acetylcholine, and serotonin on behavior and learning.
Biophysical models, such as BCM theory, have been important in understanding mechanisms for
synaptic plasticity, and have had applications in both computer science and neuroscience. Research is
ongoing in understanding the computational algorithms used in the brain, with some recent biological
evidence for radial basis networks and neural backpropagation as mechanisms for processing data.
Computational devices have been created in CMOS for both biophysical simulation and neuromorphic
computing. More recent efforts show promise for creating nanodevices for very large scale principal
components analyses and convolution. If successful, these efforts could usher in a new era of neural computing that is a step beyond digital computing, because it depends on learning rather than programming and because it is fundamentally analog rather than digital, even though the first instantiations may in fact be with CMOS digital devices.