Neural Networks
Jerzy Stefanowski
Institute of Computing Science
Lecture 13 in Data Mining
for M.Sc. Course of SE
version for 2010
Acknowledgments
• Slides are also based on ideas coming from presentations such as:
– Rosaria Silipo: Lecture on ANN. IDA Spring School 2001
– Prévotet Jean-Christophe (Paris VI): Tutorial on Neural
Networks
– Włodzisław Duch: Lectures on Computational Intelligence
– Few others
• and many of my notes for a course on Machine
Learning and Neural Networks (Polish Language
ISWD – see my personal web page for more slides)
Outline
• Introduction
– Inspirations
– The biological and artificial neurons
– Architecture of networks and basic learning rules
• Single Linear and Non-linear Perceptrons
– Delta learning rule
• MultiLayer Perceptrons
– MLPs and Back-Propagation
– Tuning parameters of BP
• Radial Basis Functions
– Architectures and learning algorithms
• Competitive Learning
– Competitive Learning, LVQ, Kohonen self-organizing maps.
• Applications and Software Tools
• Final Remarks
Introduction
• Some definitions
– “… a system composed of many simple processing
elements operating in parallel whose function is
determined by network structure, connection strengths,
and the processing performed at computing elements or
nodes.” - DARPA (1988)
– A neural network: A set of connected input/output units
where each connection has a weight associated with it
• During the learning phase, the network learns by
adjusting the weights so as to be able to predict the
correct class output of the input signals
Some properties
• Some points from definitions
– Many neuron-like threshold switching units
– Many weighted interconnections among units
– Highly parallel, distributed process
– Emphasis on tuning weights automatically
– …
When to Consider Neural Networks
• Input: High-Dimensional and Discrete or Real-Valued
– e.g., raw sensor input
– Conversion of symbolic data to quantitative (numerical) representations possible
• Output: Discrete or Real Vector-Valued
– e.g., low-level control policy for a robot actuator
– Similar qualitative/quantitative (symbolic/numerical) conversions may apply
• Data: Possibly Noisy
• Target Function: Unknown Form
• Result: Human Readability Less Important Than Performance
– Performance measured purely in terms of accuracy and efficiency
– Readability: ability to explain inferences made using model; similar criteria
• Examples
– Speech phoneme recognition
– Image classification
– Time signal prediction, Robotics, and many others
Autonomous Learning Vehicle in a Neural Net (ALVINN)
• Pomerleau et al.
– http://www.cs.cmu.edu/afs/cs/project/alv/member/www/projects/ALVINN.html
– Drives 70 mph on highways
[Figure: ALVINN weight maps — hidden-to-output unit weight map and input-to-hidden unit weight map (for one hidden unit), 30 x 32 inputs]
Image Recognition and Classification of Postal Codes
[Figure: biological neuron — dendrites, cell body, nucleus, axon, synapses]
• A neuron has
– A branching input (dendrites)
– A branching output (the axon)
• The information circulates from the dendrites to the axon via the
cell body
• Axon connects to dendrites via synapses
– Synapses vary in strength
– Synapses may be excitatory or inhibitory
The Action Potential
Human Brain
• The brain is a highly complex, non-linear, and parallel computer,
composed of some 10^11 neurons that are densely connected (~10^4
connections per neuron). We have just begun to understand how
the brain works...
• A neuron is much slower (10^-3 s) than a silicon logic
gate (10^-9 s); however, the massive interconnection between
neurons makes up for the comparatively slow rate.
– Complex perceptual decisions are arrived at quickly (within a
few hundred milliseconds)
• 100-Steps rule: Since individual neurons operate in a few
milliseconds, calculations do not involve more than about 100
serial steps and the information sent from one neuron to another is
very small (a few bits)
• Plasticity: Some of the neural structure of the brain is present at
birth, while other parts are developed through learning, especially
in early stages of life, to adapt to the environment (new inputs).
The Artificial Neuron
(McCulloch and Pitts, 1943)
[Figure: artificial neuron — inputs x_1 ... x_n with weights w_1 ... w_n, summation Σ producing activation a, threshold u, output y]

y(t+1) = f( \sum_{k=1}^{n} w_k x_k(t) - u ) = f( \sum_{k=0}^{n} w_k x_k(t) )

(the threshold u is absorbed as weight w_0 on a constant input x_0 = 1)
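As a minimal sketch of this threshold unit in Python (the function name and the AND example below are illustrative assumptions, not part of the lecture):

```python
import numpy as np

# A minimal sketch of a McCulloch-Pitts style threshold unit; the function
# name and the AND example are illustrative assumptions, not from the lecture.

def neuron_output(x, w, u):
    """Fire (+1) if the weighted sum of inputs reaches the threshold u, else -1."""
    a = np.dot(w, x)              # activation a = sum_k w_k * x_k
    return 1 if a >= u else -1

# Example: two inputs with unit weights and threshold 1.5 realize logical AND
x = np.array([1.0, 1.0])
w = np.array([1.0, 1.0])
print(neuron_output(x, w, u=1.5))  # +1 only when both inputs are active
```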
Activation Functions
• Step function:  f(a) = +1 if a ≥ u,  f(a) = -1 if a < u
• Linear (piecewise) function:  f(a) = +1 if a ≥ u;  f(a) = a if -u ≤ a < u;  f(a) = -1 if a < -u
• Logistic sigmoid:  f(a) = 1 / (1 + e^{-ha})
• Gaussian:  f(a) = e^{-(a - u)^2 / (2σ^2)}
Activation functions
[Plots of the activation functions]
• Linear:  y = x
• Step function
• Sigmoidal (logistic):  y = 1 / (1 + exp(-βx))
• Hyperbolic tangent:  y = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
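A small hedged sketch of these activation functions in Python (parameter names such as `beta`, `u` and `sigma` are assumptions for illustration):

```python
import numpy as np

# Hedged sketches of the activation functions above; the parameter names
# `beta`, `u` and `sigma` are assumptions for illustration.

def step(a, u=0.0):
    return np.where(a >= u, 1.0, -1.0)

def logistic(a, beta=1.0):
    return 1.0 / (1.0 + np.exp(-beta * a))      # y = 1 / (1 + exp(-βx))

def tanh_act(a):
    return (np.exp(a) - np.exp(-a)) / (np.exp(a) + np.exp(-a))

def gaussian(a, u=0.0, sigma=1.0):
    return np.exp(-(a - u) ** 2 / (2.0 * sigma ** 2))

a = np.linspace(-5.0, 5.0, 5)
print(step(a), logistic(a), tanh_act(a), gaussian(a), sep="\n")
```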
Network topologies
Feed Forward Neural Networks
[Figure: feed-forward network — input layer x_1, x_2, ..., x_n, 1st hidden layer, 2nd hidden layer, output layer]
• The information is propagated from the inputs to the outputs
• Computation of N_o non-linear functions of the n input variables by composition of N_c algebraic functions
• Time plays no role (NO cycles between outputs and inputs)
Network topologies
Recurrent Neural Networks
[Figure: recurrent network over inputs x_1, x_2 with delayed feedback connections]
• Can have arbitrary topologies
• Can model systems with internal states (dynamic ones)
• Delays are associated to a specific weight
• Training is more difficult
• Performance may be problematic
– Stable outputs may be more difficult to evaluate
– Unexpected behavior (oscillation, chaos, ...)
Learning neural networks
• The procedure that consists in estimating the parameters of
neurons (usually weights) so that the whole network can
perform a specific task
[Figure: classical conditioning — unconditioned stimulus (US), unconditioned response (UR), conditioned stimulus (CS)]
Supervised Learning of Neurons
Let us suppose that a sufficiently large set of examples (training
set) is available.
Supervised learning:
– The network answer to each input pattern is directly
compared with the desired answer and a feedback is given to
the network to correct possible errors
[Figure: supervised learning loop — input x passes through the weight matrix to produce output y; the error y - d against the required output d is fed back to adjust the weights]
Perceptron
[Figure: single-layer perceptron — inputs x_1 ... x_n fully connected by weights w_ik to output units 1 ... p with outputs y_1 ... y_p]

y_i(t+1) = f( \sum_{k=0}^{n} w_{ik} x_k(t) ),   i = 1, 2, ..., p
What a Single Perceptron Does
• Regression:  y = w x + w_0
• Classification:  y = 1 if (w x + w_0 > 0)
[Figure: a linear unit for regression and a thresholded unit for classification; the bias w_0 enters through the constant input x_0 = +1]

y = sigmoid(o) = 1 / (1 + exp(-w^T x))
Perceptron
• Rosenblatt (1962)
• Linear separation
• Inputs: vector of real values
• Outputs: 1 or -1
[Figure: two classes of + points in the (x_1, x_2) plane separated by the line w_0 + w_1 x_1 + w_2 x_2 = 0, with y = +1 on one side and y = -1 on the other]

o = w_0 + w_1 x_1 + w_2 x_2,   y = f(o)
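The following is a hedged Python sketch of the classic perceptron learning rule on linearly separable data (the function name, learning rate, and toy data are illustrative assumptions):

```python
import numpy as np

# A sketch of the classic perceptron learning rule under the usual
# assumptions (labels in {-1, +1}, bias folded in as weight w_0 on x_0 = 1).

def train_perceptron(X, d, eta=0.1, epochs=100):
    X = np.hstack([np.ones((len(X), 1)), X])    # prepend x_0 = 1 for the bias
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        errors = 0
        for x, target in zip(X, d):
            y = 1 if w @ x > 0 else -1
            if y != target:                      # update only on mistakes
                w += eta * target * x
                errors += 1
        if errors == 0:                          # converged (linearly separable)
            break
    return w

X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
d = np.array([1, 1, -1, -1])
print(train_perceptron(X, d))
```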
Error Function
• Training set:  T = { (x^q, d^q) | q = 1, 2, ..., m }
• Error measure:  E(W) = f(o_i^q - d_i^q)
[Figure: error surface E(W) with minimum E(W*) at the optimal weights W*]
Gradient Descent algorithm
• Simple Gradient Descent Algorithm
– Applicable to different types of learning (with proper representation)
• Definition: Gradient
∇E[w] ≡ [ ∂E/∂w_0, ∂E/∂w_1, ..., ∂E/∂w_n ]
Gradient Descent algorithm
The RMS error function:

E(W) = (1/2) \sum_{q=1}^{m} \sum_{i=1}^{p} (o_i^q - d_i^q)^2 = \sum_{q=1}^{m} E^q(W)

∆w_{ik}(t) = -η ∂E(t)/∂w_{ik} = -η \sum_{q=1}^{m} ∂E^q(t)/∂w_{ik} = \sum_{q=1}^{m} ∆w_{ik}^q(t)
Delta Learning Rule (Widrow, Hoff)

∂E^q/∂w_{ik} = ∂[ (1/2) \sum_{i=1}^{p} (o_i^q - d_i^q)^2 ] / ∂w_{ik} = δ_i^q x_k^q

∆w_{ik}^q = -η ∂E^q/∂w_{ik} = -η δ_i^q x_k^q,   with δ_i^q = o_i^q - d_i^q
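A minimal Python sketch of the delta rule for one linear unit, assuming per-pattern updates and δ_i^q = o_i^q - d_i^q as above (names and the toy regression data are illustrative):

```python
import numpy as np

# Sketch of the Widrow-Hoff (delta) rule for a single linear unit;
# per-pattern (stochastic) updates, bias folded into the weights.

def train_delta(X, d, eta=0.01, epochs=1000):
    X = np.hstack([np.ones((len(X), 1)), X])     # x_0 = 1 carries the bias
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, d):
            o = w @ x                             # linear output
            delta = o - target                    # δ_i^q = o_i^q - d_i^q
            w -= eta * delta * x                  # Δw = -η δ x
    return w

X = np.array([[0.0], [1.0], [2.0], [3.0]])
d = np.array([1.0, 3.0, 5.0, 7.0])                # target line y = 2x + 1
print(train_delta(X, d))                          # ≈ [1.0, 2.0]
```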
Learning Rate (η)
∆w_1 = η_1 δ x  with η_1 too small
∆w_2 = η_2 δ x  with η_2 the right size
∆w_3 = η_3 δ x  with η_3 too big
[Figure: the three step sizes ∆W_1, ∆W_2, ∆W_3 on the error surface E(W), descending from W(q) toward the minimum E(W*)]
• The standard perceptron learning algorithm converges if the examples are linearly separable → see the figure
[Figure: a linearly separable set of + and - points in the (x_1, x_2) plane, split by the line w_1 x_1 + w_2 x_2 - w_0 = 0]
• Consider an example of a simple non-separable problem: no single line can split the two classes
[Figure: a single unit y with weights w_1, w_2 over inputs x_1, x_2, and a 0/1 class configuration that no line can separate]
Need for constructing MLP
[Figure: multi-layer network — inputs x_1 ... x_n, hidden units 1 ... H, output units 1 ... p]

o_i(t) = 1 / (1 + e^{-(net_i(t) - θ)/τ})
Non-linear regression mapping
Output of a generic MLP neuron in layer l:

y_i = f_i(a_i) = f_i( \sum_{k=0}^{n(l-1)} w_{ik} o_k ),   i = 1, ..., n(l),   k = 0, ..., n(l-1)

A one-hidden-layer MLP realizes the non-linear regression mapping

o(x) = \sum_{k=1}^{n(1)} w_{1k} f_k( v_k^T x + v_{k0} ) + w_0
Back propagation (I)
(The Generalized Delta Rule )
Gradient descent formula for a weight w_{ik} connecting units from two generic layers l and l-1 (i ∈ layer l, k ∈ layer l-1), after presentation of training pattern q:

∆w_{ik}^q = -η ∂E^q/∂w_{ik}

∂E^q/∂w_{ik} = (∂E^q/∂a_i^q) (∂a_i^q/∂w_{ik})
Back Propagation (II)
∂a_i^q/∂w_{ik} = o_k^q,    δ_i^q = ∂E^q/∂a_i^q

∆w_{ik}^q = -η ∂E^q/∂w_{ik} = -η δ_i^q o_k^q

For an output unit:

δ_i^q = ∂E^q/∂a_i^q = f'(a_i^q) (o_i^q - d_i^q)
Multi-Layer Perceptron
• One or more hidden layers
[Figure: input data feeding the 1st and 2nd hidden layers, then the output layer]
• Where can we use the generalized delta rule?
• Where can we compute the error?
We do not know the desired answers of the hidden layer and therefore we cannot estimate the error function directly.
Back Propagation (III)
We do not know the desired answers of the hidden layer and therefore we cannot estimate its error directly; instead, the error of a hidden unit in layer l is obtained from the errors of the units in layer l+1:
δ_i^q = ∂E^q/∂a_i^q = \sum_{j=1}^{n(l+1)} (∂E^q/∂a_j^q) (∂a_j^q/∂a_i^q) =
     = \sum_{j=1}^{n(l+1)} δ_j^q (∂a_j^q/∂a_i^q) =
     = f'(a_i^q) \sum_{j=1}^{n(l+1)} δ_j^q w_{ji}
Back Propagation
(forward phase)
[Figure: forward phase — inputs x_1 ... x_n propagate through hidden units 1 ... H to outputs y_1 ... y_p]
Back Propagation
(backward phase)
[Figure: backward phase — output errors δ_1 ... δ_p propagate back from units 1 ... p through hidden units 1 ... H toward the inputs x_1 ... x_n]
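Putting the forward and backward phases together, here is a compact, hedged NumPy sketch of back-propagation for a one-hidden-layer MLP with sigmoid units (shapes, names, and the XOR toy task are illustrative assumptions):

```python
import numpy as np

# A compact, hedged sketch of back-propagation for a one-hidden-layer MLP
# with sigmoid units and squared error; shapes, names and the XOR task are
# illustrative assumptions (real runs may need several random restarts).

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_mlp(X, D, n_hidden=4, eta=0.5, epochs=5000, seed=0):
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (n_hidden, X.shape[1] + 1))  # hidden weights (+bias)
    W2 = rng.normal(0.0, 0.5, (D.shape[1], n_hidden + 1))  # output weights (+bias)
    for _ in range(epochs):
        for x, d in zip(X, D):
            x1 = np.append(1.0, x)                   # forward phase
            h = sigmoid(W1 @ x1)
            h1 = np.append(1.0, h)
            o = sigmoid(W2 @ h1)
            delta_o = (o - d) * o * (1.0 - o)        # output deltas: f'(a)(o - d)
            delta_h = h * (1.0 - h) * (W2[:, 1:].T @ delta_o)  # backward phase
            W2 -= eta * np.outer(delta_o, h1)        # Δw = -η δ o
            W1 -= eta * np.outer(delta_h, x1)
    return W1, W2

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # XOR inputs
D = np.array([[0], [1], [1], [0]], dtype=float)
W1, W2 = train_mlp(X, D)
for x in X:
    print(x, sigmoid(W2 @ np.append(1.0, sigmoid(W1 @ np.append(1.0, x)))))
```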
Elements of Backpropagation
• The set of learning examples is usually shown to
the algorithm several times (iterations → epochs),
sometimes thousands of times
• The order in which examples are shown is randomly
shuffled
• Stopping conditions
– Threshold for RMS (should be smaller than …)
– Max no. of iterations
– Classification evaluations
Tuning learning rate
• Too small – gets stuck in a local minimum of the error
• Too large – oscillations, unable to settle into the global minimum
• Some solutions
– Slowly decreasing the rate with epochs (time)
Learning Rate and Momentum Term
[Figure: with a momentum term, the successive steps ∆W_1, ∆W_2, ∆W_3 on the error surface E(W) are smoothed on the way from W(q) toward the minimum]

∆w_{ik}^q = -η ∂E^q/∂w_{ik} + α ∆w_{ik}^{q-1}
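A one-line version of this update in Python (a sketch; `grad` stands for ∂E^q/∂w and is assumed to be supplied by the caller):

```python
import numpy as np

# Minimal sketch of a gradient-descent step with a momentum term;
# `grad` stands for ∂E/∂w and is assumed to be supplied by the caller.

def momentum_step(w, grad, velocity, eta=0.1, alpha=0.9):
    velocity = -eta * grad + alpha * velocity   # Δw^q = -η ∂E/∂w + α Δw^{q-1}
    return w + velocity, velocity

w = np.zeros(3)
velocity = np.zeros(3)
grad = np.array([0.2, -0.1, 0.05])
w, velocity = momentum_step(w, grad, velocity)
```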
Different non-linearly separable problems and the number of layers

Structure | Types of Decision Regions | Exclusive-OR Problem / Classes with Meshed Regions / Most General Region Shapes
Single-Layer | Half plane bounded by a hyperplane | [figures: regions A and B]
Three-Layer | Arbitrary (complexity limited by the number of nodes) | [figures: regions A and B]

Source: Neural Networks – An Introduction, Dr. Andrew Hunter
Over-fitting
Too large a number of parameters can memorize all the examples
of the training set together with the associated noise, errors, and
inconsistencies.
[Figure: error E during training — the test-set error eventually rises while the training error keeps decreasing]
• Pruning algorithms
Start with a large network and gradually remove weights or
complete units that do not seem to be necessary
– Sensitivity methods
– Penalty-term methods
• Growing algorithms
Start from a small architecture and allow new units to be added
when necessary.
Neural Network as a Classifier
• Weakness
– Long training time
– Require a number of parameters typically best determined empirically,
e.g., the network topology or "structure"
– Poor interpretability: difficult to interpret the symbolic meaning behind
the learned weights and the "hidden units" in the network
• Strength
– High tolerance to noisy data
– Ability to classify untrained patterns
– Well-suited for continuous-valued inputs and outputs
– Successful on a wide array of real-world data
– Algorithms are inherently parallel
– Techniques have recently been developed for the extraction of rules from
trained neural networks
Knowledge Extraction
– Global approach
A tree of symbolic rules is built to represent the whole network. Each
rule is then tested against the network behavior until most of training
space is covered.
Disadvantage: huge trees.
– Local approach
The original MLP is decomposed into a series of smaller, usually
single-layered, sub-networks. The incoming weights form the
antecedent of a symbolic rule for each unit. Those rules are gradually
combined to define a more general set of rules that describes
the network as a whole.
Disadvantage: because of the distributed knowledge in an ANN,
hidden units do not typically represent clear logic entities.
RBF networks
• This is becoming an increasingly popular neural network
with diverse applications and is probably the main rival to the
multi-layered perceptron
• Much of the inspiration for RBF networks has come from
traditional statistics and pattern classification techniques
(mainly local methods for non-parametric regression)
• These include function approximation, regularization
theory, density estimation and interpolation in the presence
of noise [Bishop, 1995]
• Cover's theorem: a non-linear projection into a new feature
space where difficult decision boundaries may become linearly
separable
Numerical approximation of functions
• Consider N data points characterized by m features
{ x_i ∈ R^m | i = 1, ..., N }
• and corresponding N outputs (real values)
{ d_i ∈ R | i = 1, ..., N }
• The aim is to find an unknown function (mapping) such that
f(x_i) = d_i   ∀ i = 1, ..., N
• Complicated functions are constructed from simple
building blocks (local approximations)
Function Approximation with
Radial Basis Functions
RBF Networks approximate functions using (radial) basis functions
as the building blocks.
On Exact Interpolation
• RBFs have their origins in techniques for performing
exact function interpolation [Bishop, 1995]:
– Find a function h(x) such that h(x^n) = t^n, ∀ n = 1, ..., N
[Figure: RBF network — inputs x_1 ... x_n feed basis units Φ_1 ... Φ_H, whose outputs are linearly combined into y_1 ... y_p]

y_i(x) = \sum_{k=1}^{H} w_{ik} Φ_k(x) + w_{i0}

Φ_k(x) = Φ_k( ||x - x_k|| )

Φ_k(x) = exp( - ||x - μ_k||^2 / (2σ_k^2) )
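A hedged NumPy sketch of the RBF forward pass with Gaussian basis functions (centers, widths, and output weights are assumed to be given; all names are illustrative):

```python
import numpy as np

# Sketch of an RBF network forward pass with Gaussian basis functions;
# centers `mu`, widths `sigma` and output weights `W` are assumed given
# (e.g. from clustering and least squares, as discussed below).

def rbf_forward(x, mu, sigma, W, w0):
    # one Gaussian response per basis unit: Φ_k(x) = exp(-||x-μ_k||²/(2σ_k²))
    phi = np.exp(-np.sum((x - mu) ** 2, axis=1) / (2.0 * sigma ** 2))
    return W @ phi + w0                      # linear combination of the Φ_k

mu = np.array([[0.0, 0.0], [1.0, 1.0]])      # H = 2 centers in 2-D
sigma = np.array([0.5, 0.5])
W = np.array([[1.0, -1.0]])                  # one output unit
print(rbf_forward(np.array([0.9, 1.1]), mu, sigma, W, w0=0.1))
```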
Typical radial functions
Examples:
• Simple radial:  h(r) = r = ||X - X_i||
• Inverse multiquadratic:  h(r) = (σ^2 + r^2)^{-α},   α > 0
• Multiquadratic:  h(r) = (σ^2 + r^2)^{β},   0 < β < 1
• Gaussian:  h(r) = e^{-(r/σ)^2}
• Thin-plate splines:  h(r) = (σr)^2 ln(σr)
RBFNs and MLPs
• Hidden-layer centers μ_k can be placed by clustering, minimizing

J = \sum_{k=1}^{H} \sum_{q ∈ S_k} ||x^q - μ_k||^2

• Mixtures of Gaussians: maximize the likelihood  l = \prod_{q=1}^{m} p(x^q)

• The network output is a weighted combination of the basis functions:

o(x) = \sum_{j=1}^{H} α_j(x) Φ_j(x)
RBFNs Training Algorithms (I)
• Modified Back-Propagation.
The corresponding expressions of the partial
derivatives of the error function have to be
evaluated and included into the gradient descent
procedure.
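Besides modified back-propagation, a common two-stage scheme trains the centers by clustering and the output weights by least squares; the following NumPy sketch illustrates that alternative under stated assumptions (it is not the lecture's specific algorithm, and all names are illustrative):

```python
import numpy as np

# Hedged sketch of a two-stage RBFN training scheme: pick centers μ_k by
# simple k-means (minimizing the clustering criterion J above), then solve
# the linear output weights by least squares.

def kmeans(X, H, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), H, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((X[:, None] - mu[None]) ** 2).sum(-1), axis=1)
        for k in range(H):
            if np.any(assign == k):
                mu[k] = X[assign == k].mean(axis=0)
    return mu

def train_rbfn(X, d, H=5, sigma=1.0):
    mu = kmeans(X, H)
    Phi = np.exp(-((X[:, None] - mu[None]) ** 2).sum(-1) / (2 * sigma ** 2))
    Phi = np.hstack([np.ones((len(X), 1)), Phi])     # bias column w_0
    w, *_ = np.linalg.lstsq(Phi, d, rcond=None)       # least-squares weights
    return mu, w

X = np.random.default_rng(1).uniform(-2, 2, (50, 1))
mu, w = train_rbfn(X, np.sin(X[:, 0]))
```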
• Redundancy
This can happen only if there is redundancy in the training data.

Winner-take-all output of a competitive unit (assuming normalized weights, ||w_i|| = 1):

y_i = 1  if  ||w_i - x|| = min_{k=1,...,p} ||w_k - x||,   y_i = 0 otherwise
SCL: Training algorithm
Goal:

w_i(t)^T x^q ≤ w_i(t+1)^T x^q = ( w_i(t) + ∆w_i(t) )^T x^q

∆w_i^q(t) = η x^q - η w_i(t)   if  w_i(t)^T x^q = max_k ( w_k(t)^T x^q )
∆w_i^q(t) = 0   otherwise

[Figure: weight vectors w_1, w_2, w_3 of the competitive units over inputs x_1, x_2, x_3, before and after training — the winning weight vectors move toward the inputs]
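A minimal Python sketch of this winner-take-all update (the normalization, unit count, and data are illustrative assumptions):

```python
import numpy as np

# Sketch of simple competitive learning (winner-take-all): only the unit
# whose weight vector responds most strongly to the pattern is updated.

def scl_epoch(W, X, eta=0.1):
    for x in X:
        winner = np.argmax(W @ x)                # unit with the largest response
        W[winner] += eta * (x - W[winner])       # Δw_i = η x - η w_i
    return W

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (100, 2))
X /= np.linalg.norm(X, axis=1, keepdims=True)    # normalized inputs
W = rng.normal(0, 1, (3, 2))                     # p = 3 competitive units
W = scl_epoch(W, X)
```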
Learning Vector Quantization (LVQ)
∆w_i^q(t) = +η(t) ( x^q - w_i(t) )   if the class of winning unit i^q is correct
∆w_i^q(t) = -η(t) ( x^q - w_i(t) )   if the class of winning unit i^q is incorrect
∆w_i^q(t) = 0   if i^q is not the winner
Improved LVQ
The class of the input vector q is different from the class represented by the winning unit i, but it is the same as that of the close unit j:

∆w_i^q(t) = -η(t) ( x^q - w_i(t) )
∆w_j^q(t) = +η(t) ( x^q - w_j(t) )
∆w_k(t) = 0,   k ≠ i, j

The class of the input vector q is the same as that of the winning unit i and the close unit j:

∆w_h^q(t) = +ε η(t) ( x^q - w_h(t) ),   h = i, j
∆w_k(t) = 0,   k ≠ i, j
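A hedged sketch of the basic LVQ update in Python (prototype initialization and toy data are illustrative; the improved two-prototype variant above would extend the same loop):

```python
import numpy as np

# Sketch of the basic LVQ update: the winning prototype is attracted to
# the pattern when its class label matches, repelled otherwise.

def lvq_epoch(W, labels, X, y, eta=0.05):
    for x, cls in zip(X, y):
        winner = np.argmin(np.linalg.norm(W - x, axis=1))  # nearest prototype
        sign = 1.0 if labels[winner] == cls else -1.0
        W[winner] += sign * eta * (x - W[winner])          # Δw = ±η (x - w)
    return W

W = np.array([[0.0, 0.0], [1.0, 1.0]])     # one prototype per class
labels = np.array([0, 1])
X = np.array([[0.1, -0.2], [0.9, 1.2], [0.2, 0.1]])
y = np.array([0, 1, 0])
W = lvq_epoch(W, labels, X, y)
```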
Kohonen Self-Organizing Maps
• Architecture:
– Kohonen maps consist of a two-dimensional array of
neurons, fully connected to the input, with no lateral connections,
arranged on a square or hexagonal lattice
• Learning algorithm:
– follows the winner-take-all strategy
– forces close neurons to fire for similar inputs (Self-
Organizing Maps)
• Properties:
– The topology of the input space is preserved
Self organizing maps
• The purpose of SOM is to map a
multidimensional input space onto a
topology-preserving map of neurons
– Preserve the topology so that
neighboring neurons respond to
« similar » input patterns
– The topological structure is often a 2- or
3-dimensional space
– The distance and proximity relationships
(i.e., topology) are preserved as much as
possible
• Similar to a specific kind of clustering: cluster
centers tend to lie in a low-dimensional
manifold in the feature space
[Figure: x = data points in the N-dimensional data space; o = positions of the neuron weights — the weights point to points in N-D while the grid of neurons is 2-D]
• The activation of the neuron is spread in its
direct neighborhood => neighbors become
sensitive to the same input patterns
• Block distance
• The size of the neighborhood is initially
large but is reduced over time
=> specialization of the network
[Figure: 1st and 2nd neighborhoods of a unit on the map grid]

Neighborhood function:   Λ(k, i^q, t) = exp( - ||r_k - r_i||^2 / (2σ(t)^2) )
Quantization error:

Q = (1/m) \sum_{q=1}^{m} ||x^q - w_{i^q}||^2

Average distortion:

D = (1/m) \sum_{q=1}^{m} Λ(i, i^q, t) ||x^q - w_{i^q}||^2
SOM algorithm
X^T = (X_1, X_2, ..., X_d), samples from the feature space.
Create a grid with nodes i = 1 ... K in 1D, 2D or 3D,
each node with a d-dimensional weight vector W^(i)T = (W_1^(i), W_2^(i), ..., W_d^(i)),
W^(i) = W^(i)(t), changing with t – discrete time.
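A hedged NumPy sketch of this procedure, assuming the usual SOM update w_k ← w_k + η(t) Λ(k, i^q, t)(x - w_k) with shrinking η and σ (the grid size and schedules are illustrative assumptions):

```python
import numpy as np

# Sketch of SOM training on a 2-D grid, assuming the usual update
# w_k(t+1) = w_k(t) + η(t) Λ(k, winner, t) (x - w_k(t)); the grid size
# and the η/σ schedules below are illustrative assumptions.

def som_epoch(W, grid, X, eta=0.3, sigma=1.0):
    for x in X:
        winner = np.argmin(np.linalg.norm(W - x, axis=1))  # best-matching unit
        d2 = np.sum((grid - grid[winner]) ** 2, axis=1)    # grid distances
        h = np.exp(-d2 / (2.0 * sigma ** 2))               # neighborhood Λ
        W += eta * h[:, None] * (x - W)                    # pull weights toward x
    return W

side = 5                                                   # 5 x 5 map
grid = np.array([[i, j] for i in range(side) for j in range(side)], float)
rng = np.random.default_rng(0)
W = rng.random((side * side, 3))                           # d = 3 features
X = rng.random((200, 3))
for t in range(20):                                        # shrink η and σ over time
    W = som_epoch(W, grid, X,
                  eta=0.3 * (1 - t / 20),
                  sigma=2.0 * (1 - t / 20) + 0.5)
```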