
Supervised Learning Neural Networks

(Mapping Networks)

Presentation by:
C. Vinoth Kumar
SSN College of Engineering
Perceptrons (Architecture & learning rule)

 The perceptron was derived from the biological neuron model introduced by McCulloch & Pitts in 1943

 Rosenblatt designed the perceptron with a view toward explaining & modeling the pattern recognition abilities of biological visual systems

 The following figure illustrates a two-class problem that consists of determining whether the input pattern is a “P” or not
Perceptrons
 A signal xi is active (or excitatory) if its value is 1, inactive if its value is 0, and inhibitory if its value is -1

 The output unit is a linear threshold element with a threshold value θ

Output = f\left(\sum_{i=1}^{n} w_i x_i - \theta\right)
       = f\left(\sum_{i=1}^{n} w_i x_i + w_0\right), \quad w_0 \equiv -\theta
       = f\left(\sum_{i=0}^{n} w_i x_i\right), \quad x_0 \equiv 1
Perceptrons
 wi is a modifiable weight associated with an incoming signal xi

 The threshold θ can be viewed, through w0 ≡ −θ, as the connection weight between the output unit & a dummy incoming signal x0 = 1

 f(.) is the activation function of the perceptron & is either a signum function sgn(x) or a step function step(x)
sgn(x) = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{otherwise} \end{cases}

step(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{otherwise} \end{cases}
Perceptrons
 Algorithm for Single-layer perceptron

1. Start with a set of random connection weights.


2. Select an input vector x from the training data set.
3. If the perceptron provides a wrong response, then modify all connection weights wi according to
   Δwi = η ti xi
   where: ti is the target output
          η is the learning rate
4. Test whether the weights have converged: if they have converged, stop; else go to step 2 with the new weights.

 This learning algorithm is based on gradient descent
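A minimal Python sketch of the single-layer perceptron algorithm above, assuming a signum output unit with bipolar targets t ∈ {−1, +1} so that the update Δwi = η ti xi (applied only when the response is wrong) drives the weights toward a separating hyperplane; the function name and the AND-gate data are illustrative, not part of the original slides.

```python
import numpy as np

def train_perceptron(X, t, eta=0.1, max_epochs=100):
    """Single-layer perceptron training (sketch of steps 1-4 above)."""
    X = np.hstack([np.ones((len(X), 1)), X])     # dummy input x0 = 1 (bias w0 = -theta)
    w = 0.01 * np.random.randn(X.shape[1])       # step 1: random connection weights
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(X, t):              # step 2: select an input vector
            o = 1 if np.dot(w, x) > 0 else -1    # signum output unit
            if o != target:                      # step 3: wrong response
                w += eta * target * x            #         delta w_i = eta * t * x_i
                errors += 1
        if errors == 0:                          # step 4: weights have converged
            break
    return w

# Example on a linearly separable problem (logical AND, bipolar targets)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([-1, -1, -1, 1])
print(train_perceptron(X, t))
```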


Perceptrons
 Rosenblatt proved that there exists a method for tuning the weights that is guaranteed to converge to the required output if and only if such a set of weights exists. This is called the perceptron convergence theorem.

 Single-layer perceptrons can only be used for solving simple (toy) problems.
Perceptrons: Exclusive-OR problem
 Goal: Classify a binary input vector to class 0 if the
vector has an even number of 1’s, otherwise assign it to
class 1

                     X   Y   Class
Desired i/o pair 1   0   0   0
Desired i/o pair 2   0   1   1
Desired i/o pair 3   1   0   1
Desired i/o pair 4   1   1   0
Perceptrons: Exclusive-OR problem

 From the following figure, it can be seen that the XOR problem is not linearly separable.

 A single-layer perceptron cannot be used to construct a straight line that partitions the two-dimensional input space into two regions, each containing only data points of the same class.
Perceptrons: Exclusive-OR problem

 Using a single-layer perceptron and the step function to solve this problem requires satisfying the following inequalities

0 * w1 + 0 * w2 + w0 ≤ 0 ⇔ w0 ≤ 0
0 * w1 + 1 * w2 + w0 > 0 ⇔ w0 > -w2
1 * w1 + 0 * w2 + w0 > 0 ⇔ w0 > -w1
1 * w1 + 1 * w2 + w0 ≤ 0 ⇔ w0 ≤ -w1 - w2

 This set of inequalities is self-contradictory as a whole: adding the second and third gives w1 + w2 + 2w0 > 0, while the first and fourth together give w1 + w2 + 2w0 ≤ 0.
Perceptrons: Exclusive-OR problem

 The XOR problem can be solved using the two-layer perceptron illustrated in the following figure.
Perceptrons
(x1 = 0, x2 = 0 ⇒ 0)
results at the hidden layer
0 * (+1) + 0 * (+1) = 0 < 1.5 ⇒ x3 = 0
0 * (+1) + 0 * (+1) = 0 < 0.5 ⇒ x4 = 0
results at the output layer
0 * (-1) + 0 * (+1) = 0 < 0.5 ⇒ x5 = output = 0

(x1 = 0, x2 = 1 ⇒ 1)
results at the hidden layer
0 * (+1) + 1 * (+1) = 1 < 1.5 ⇒ x3 = 0
0 * (+1) + 1 * (+1) = 1 > 0.5 ⇒ x4 = 1
results at the output layer
0 * (-1) + 1 * (+1) = +1 > 0.5 ⇒ x5 = output = 1

In summary, multilayer perceptrons can solve nonlinearly separable problems and are thus more powerful than single-layer perceptrons.
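A short Python sketch that hard-codes the two-layer perceptron worked through above — hidden thresholds 1.5 (AND unit) and 0.5 (OR unit), output weights −1 and +1 with threshold 0.5, as read from the computations — and checks it on all four XOR patterns.

```python
def step(net, theta):
    """Linear threshold unit: fires (1) when the net input exceeds the threshold."""
    return 1 if net > theta else 0

def two_layer_xor(x1, x2):
    x3 = step(x1 * (+1) + x2 * (+1), 1.5)   # hidden unit: fires only for (1, 1)
    x4 = step(x1 * (+1) + x2 * (+1), 0.5)   # hidden unit: fires if any input is 1
    x5 = step(x3 * (-1) + x4 * (+1), 0.5)   # output unit
    return x5

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", two_layer_xor(a, b))  # prints 0, 1, 1, 0
```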
Adaline (Adaptive linear element)

 Developed by Widrow & Hoff, this model represents a classical example of the simplest intelligent self-learning system that can adapt itself to achieve a given modeling task.
Adaline (Adaptive linear element)
output = \sum_{i=1}^{n} w_i x_i + w_0

 One possible implementation of ADALINE is the following:
– The input signals are voltages
– The weights wi are the conductances of controllable resistors
– The network output is the sum of the currents caused by the input voltages
– The problem consists of finding a suitable set of conductances such that the input-output behavior of ADALINE is close to a set of desired input-output data points
Adaline (Adaptive linear element)

 The ADALINE model can be solved using a linear least-squares method on its (n + 1) linear parameters, in order to minimize the error

E = \sum_{p=1}^{m} (t_p - o_p)^2 \qquad (m \text{ training data pairs})

 However, since this method can be slow (requires too many calculations) when n is large, Widrow & Hoff fell back on gradient descent:

if E_p = (t_p - o_p)^2, then \frac{\partial E_p}{\partial w_i} = -2 (t_p - o_p) x_i \qquad (LMS learning procedure)

which provides: w_i^{next} = w_i^{now} + \eta (t_p - o_p) x_i
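A minimal Python sketch of the LMS (Widrow-Hoff) procedure above: a linear unit trained by per-pattern gradient descent on Ep = (tp − op)^2. The data set fitting t = 2·x1 − x2 + 1 is hypothetical and only illustrates the update.

```python
import numpy as np

def train_adaline(X, t, eta=0.01, epochs=50):
    """LMS learning: w <- w + eta * (t_p - o_p) * x for each training pair."""
    X = np.hstack([np.ones((len(X), 1)), X])   # bias input x0 = 1
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, t):
            o = np.dot(w, x)                   # linear (ADALINE) output
            w += eta * (target - o) * x        # gradient-descent step on (t_p - o_p)^2
    return w

# Hypothetical example: recover the linear mapping t = 2*x1 - x2 + 1 from noisy data
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))
t = 2 * X[:, 0] - X[:, 1] + 1 + 0.01 * rng.normal(size=100)
print(train_adaline(X, t))                     # approximately [1, 2, -1]
```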
Backpropagation Multilayer Perceptron

 The backpropagation training method was reformulated in 1985 by Rumelhart and his colleagues, reviving interest in multilayer perceptrons

 The signum and the step functions are not differentiable; the use of the logistic and hyperbolic tangent functions contributes to a better learning scheme
– Logistic: f(x) = 1 / (1 + e^{-x})
– Hyperbolic tangent: f(x) = tanh(x/2) = (1 - e^{-x}) / (1 + e^{-x})
– Identity: f(x) = x

 The signum function is approximated by the hyperbolic tangent function & the step function is approximated by the logistic function
Activation functions for
Backpropagation MLPs
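A small Python sketch of the three activation functions listed above, with their derivatives expressed in terms of the unit's output value (the form that is convenient during backpropagation); the helper names are illustrative.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def logistic_deriv(y):
    return y * (1.0 - y)            # derivative in terms of the output y = logistic(x)

def tanh_half(x):
    return np.tanh(x / 2.0)         # equals (1 - exp(-x)) / (1 + exp(-x))

def tanh_half_deriv(y):
    return (1.0 - y ** 2) / 2.0     # derivative in terms of the output y = tanh(x/2)

def identity(x):
    return x

def identity_deriv(y):
    return np.ones_like(y)
```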

 The applications of backpropagation multilayer perceptrons are pattern recognition (optical character recognition), signal processing, data compression, automatic control, etc.
Backpropagation Multilayer Perceptrons
- Learning rule

Principle: The net input x̄j of a node j is defined as the weighted sum of its incoming signals plus a bias term, and its output is obtained by applying the logistic function:

\bar{x}_j = \sum_i w_{ij} x_i + w_j

x_j = f(\bar{x}_j) = \frac{1}{1 + \exp(-\bar{x}_j)} \qquad (logistic function)

where: xi = output of node i at any of the previous layers
       wij = weight associated with the link connecting nodes i & j
       wj = bias of node j
Backpropagation Multilayer Perceptrons

Figure: Node j of a backpropagation MLP


Backpropagation Multilayer Perceptrons

 The following figure shows a two-layer backpropagation MLP with 3 inputs in the input layer, 3 neurons in the hidden layer & 2 neurons in the output layer

Figure: A 3-3-2 backpropagation MLP
Backpropagation Multilayer Perceptrons
 The squared error measure for the pth input-output pair is defined as

E_p = \sum_k (d_k - x_k)^2

where: dk = desired output for node k
       xk = actual output for node k when the pth data pair is presented

 Since backpropagation is an error-propagation scheme, an error term ε_i is needed for each node i:

\varepsilon_i = \frac{\partial^{+} E_p}{\partial x_i}

 Using a chain-rule derivation, we obtain the steepest-descent update

w_{k,i}^{next} = w_{k,i}^{now} - \eta \nabla_{w_{k,i}} E = w_{k,i}^{now} - \eta \varepsilon_i x_k

where E = \sum_p E_p is the overall error measure
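A compact Python sketch of backpropagation for a 3-3-2 style MLP with logistic units in both layers, following the per-pattern error Ep and the steepest-descent update above (the constant factor 2 from the squared error is absorbed into the learning rate). The layer sizes, learning rate, and toy data are illustrative assumptions.

```python
import numpy as np

def train_mlp(X, D, hidden=3, eta=0.5, epochs=5000, seed=0):
    """Backpropagation MLP sketch: forward pass, error terms, weight updates."""
    rng = np.random.default_rng(seed)
    f = lambda a: 1.0 / (1.0 + np.exp(-a))                # logistic activation
    W1 = rng.normal(0, 0.5, (X.shape[1] + 1, hidden))     # +1 row for the bias w_j
    W2 = rng.normal(0, 0.5, (hidden + 1, D.shape[1]))
    for _ in range(epochs):
        for x, d in zip(X, D):
            # forward pass
            x0 = np.append(x, 1.0)                        # bias input
            h = f(x0 @ W1)                                # hidden-layer outputs
            h0 = np.append(h, 1.0)
            o = f(h0 @ W2)                                # output-layer outputs
            # backward pass: error terms for output and hidden nodes
            delta_o = (o - d) * o * (1.0 - o)
            delta_h = (W2[:-1] @ delta_o) * h * (1.0 - h)
            # steepest-descent weight updates
            W2 -= eta * np.outer(h0, delta_o)
            W1 -= eta * np.outer(x0, delta_h)
    return W1, W2

# Hypothetical usage: 4 patterns with 3 inputs and 2 target outputs
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
D = np.array([[0, 1], [1, 0], [1, 0], [0, 1]], dtype=float)
W1, W2 = train_mlp(X, D)
```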
Radial Basis Function Networks
- Architecture and Learning Methods

 Inspired by research on regions of the cerebral cortex & the visual cortex, RBFNs were proposed by Moody & Darken in 1988 as supervised learning neural networks
 The activation level of the ith receptive field unit is:
wi = Ri(x) = Ri (||x – ui|| / σi), i = 1, 2, …, H

• x is a multidimensional input vector
• ui is a vector with the same dimension as x
• H is the number of radial basis functions (also called receptive field units)
• Ri(.) is the ith radial basis function, with a single maximum at the origin
Radial Basis Function Networks
- Single Output RBFN
Radial Basis Function Networks

 Ri(.) is either a Gaussian function

R_i^G(x) = \exp\left( -\frac{\lVert x - u_i \rVert^2}{2 \sigma_i^2} \right)

or a logistic function

R_i^L(x) = \frac{1}{1 + \exp\left( \lVert x - u_i \rVert^2 / \sigma_i^2 \right)}

If x = u_i, then R_i^G = 1 (maximum) and R_i^L = 1/2 (maximum)
Radial Basis Function Networks

 The final output of an RBFN is the weighted sum of the output values associated with the receptive fields:

d(x) = \sum_{i=1}^{H} c_i w_i = \sum_{i=1}^{H} c_i R_i(x)

where ci = output value associated with the ith receptive field

 The final output can also be calculated using a weighted average:

d(x) = \frac{\sum_{i=1}^{H} c_i w_i}{\sum_{i=1}^{H} w_i} = \frac{\sum_{i=1}^{H} c_i R_i(x)}{\sum_{i=1}^{H} R_i(x)}
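A minimal Python sketch of a single-output RBFN forward pass with Gaussian receptive fields, computing both the weighted-sum and weighted-average outputs defined above; the centers, widths, and output values in the example are hypothetical.

```python
import numpy as np

def rbfn_output(x, U, sigmas, c, weighted_average=False):
    """Single-output RBFN: U holds the H centers u_i (one per row),
    sigmas the widths, and c the output values c_i of the receptive fields."""
    w = np.exp(-np.sum((x - U) ** 2, axis=1) / (2.0 * sigmas ** 2))  # w_i = R_i(x)
    if weighted_average:
        return np.dot(c, w) / np.sum(w)     # weighted average
    return np.dot(c, w)                     # weighted sum

# Hypothetical example: H = 3 receptive fields in a 2-D input space
U = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
sigmas = np.array([0.5, 0.5, 0.5])
c = np.array([1.0, 2.0, -1.0])
x = np.array([0.2, 0.1])
print(rbfn_output(x, U, sigmas, c), rbfn_output(x, U, sigmas, c, weighted_average=True))
```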
Radial Basis Function Networks
 Moody-Darken’s RBFN may be extended by assigning a linear function as the output function of each receptive field (ai is a parameter vector & bi is a scalar parameter):

c_i = a_i^T x + b_i
 Supervised adjustments of the center & shape of the receptive field (or radial basis) functions may improve the approximation capacity of RBFNs.

 Several learning algorithms have been proposed to identify the parameters (ui, σi & ci) of an RBFN.
Radial Basis Function Networks
- Functional Equivalence to FIS

 The extended RBFN response given by the weighted sum or the weighted average is identical to the response produced by a first-order Sugeno fuzzy inference system, provided that the membership functions and the radial basis functions are chosen appropriately.
c_i = a_i^T x + b_i

d(x) = \sum_{i=1}^{H} (a_i^T x + b_i)\, w_i(x) = \sum_{i=1}^{H} (u_i^T x + v_i)

where: u_i = [u_i^1, u_i^2, \ldots, u_i^m]^T, \quad x = [x_1, x_2, \ldots, x_m]^T
Radial Basis Function Networks
- Functional Equivalence to FIS

 While the RBFN consists of radial basis functions, the FIS comprises a certain number of membership functions
 The FIS & the RBFN were developed on different bases, but they are rooted in the same soil
 The RBFN & the FIS both use the same aggregation method (weighted average or weighted sum)
 The number of receptive field units in RBFN is equal to the
number of fuzzy if-then rules in the FIS
 Each radial basis function of the RBFN is equal to a
multidimensional composite MF of the premise part of a fuzzy
rule in the FIS
Radial Basis Function Networks
- Interpolation and Approximation

 The interpolation case: one RBF (neuron) is assigned to each training input pattern
 Goal: Estimate a function d(.) that yields the exact desired outputs for all training data
 Our goal consists of finding weight coefficients ci such that d(xi) = oi = desired output
 Since wi = Ri(||x - ui||) = exp[-(x - ui)^2 / (2σi^2)], we can write, with the xi as centers for the RBFs,

d(x) = \sum_{i=1}^{n} c_i \exp\left( -\frac{(x - x_i)^2}{2 \sigma_i^2} \right)
Radial Basis Function Networks
- Interpolation and Approximation

 For given σi (i = 1, 2, …, n), we obtain the following n simultaneous linear equations with respect to the unknown weights ci (i = 1, 2, …, n):

First pattern:   d(x_1) = \sum_{i=1}^{n} c_i \exp\left( -\frac{(x_1 - x_i)^2}{2 \sigma_i^2} \right) = d_1

Second pattern:  d(x_2) = \sum_{i=1}^{n} c_i \exp\left( -\frac{(x_2 - x_i)^2}{2 \sigma_i^2} \right) = d_2

⋮

nth pattern:     d(x_n) = \sum_{i=1}^{n} c_i \exp\left( -\frac{(x_n - x_i)^2}{2 \sigma_i^2} \right) = d_n
Radial Basis Function Networks
- Interpolation and Approximation

 Rewriting the preceding in a compact form:

D = GC

where D = [d_1, d_2, \ldots, d_n]^T, C = [c_1, c_2, \ldots, c_n]^T, and G is the n × n matrix whose (p, i) entry is \exp\left( -\frac{(x_p - x_i)^2}{2 \sigma_i^2} \right)

 When the matrix G is non-singular, we have the unique solution

C = G^{-1} D
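A Python sketch of the interpolation case for one-dimensional inputs: a Gaussian RBF is centered on each training input, the matrix G is built from the equations above, and C = G⁻¹D is obtained with a linear solve. The sine-sample data are hypothetical.

```python
import numpy as np

def rbfn_interpolation_weights(x, d, sigma):
    """Exact interpolation: one Gaussian RBF per training input, C = G^{-1} D."""
    x = np.asarray(x, dtype=float)
    G = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * sigma ** 2))  # G[p, i]
    return np.linalg.solve(G, d)        # assumes G is non-singular

# Hypothetical 1-D example: interpolate 5 samples of sin(x)
x = np.linspace(0.0, np.pi, 5)
d = np.sin(x)
C = rbfn_interpolation_weights(x, d, sigma=0.5)
```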
Radial Basis Function Networks
- Interpolation and Approximation

 This corresponds to the case when there are fewer basis functions than available training patterns.

 In this case, the matrix G is rectangular & least-squares methods are commonly used in order to find the vector C.
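A Python sketch of this approximation case: with fewer centers than training patterns the matrix G is rectangular, and C is found by linear least squares (minimizing ||GC − D||²). The centers, width, and noisy sine data are hypothetical.

```python
import numpy as np

def rbfn_lsq_weights(x, d, centers, sigma):
    """Least-squares fit of the output values C when G is rectangular."""
    x = np.asarray(x, dtype=float)
    centers = np.asarray(centers, dtype=float)
    G = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2.0 * sigma ** 2))
    C, *_ = np.linalg.lstsq(G, d, rcond=None)   # minimizes ||G C - D||^2
    return C

# Hypothetical example: 50 noisy samples approximated with 7 receptive fields
rng = np.random.default_rng(1)
x = np.linspace(0.0, np.pi, 50)
d = np.sin(x) + 0.05 * rng.normal(size=50)
C = rbfn_lsq_weights(x, d, centers=np.linspace(0.0, np.pi, 7), sigma=0.4)
```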
