Multi Layer Perceptron (Haykin)
[Figure: MLP architecture with an input layer, hidden layers, and an output layer]
A solution for the XOR problem

  x1   x2   x1 XOR x2
  -1   -1      -1
  -1   +1      +1
  +1   -1      +1
  +1   +1      -1

[Figure: two-layer network of sign neurons with inputs x1, x2 and bias inputs +1 implementing XOR]

  \varphi(v) = \begin{cases} +1 & \text{if } v > 0 \\ -1 & \text{if } v \le 0 \end{cases}

\varphi is the sign function.
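The weights in the slide's network figure did not survive extraction, so the sketch below uses one hand-chosen set of weights (an assumption, not the slide's values) for a 2-2-1 network of sign units that reproduces the truth table above.

```python
import numpy as np

def sign(v):
    # Sign activation from the slide: +1 if v > 0, -1 otherwise
    return np.where(v > 0, 1, -1)

def xor_net(x1, x2):
    # 2-2-1 network of sign units on inputs in {-1, +1}
    h1 = sign(x1 + x2 - 1.5)        # hidden unit 1: acts as a logical AND
    h2 = sign(x1 + x2 + 1.5)        # hidden unit 2: acts as a logical OR
    return sign(-h1 + h2 - 1.5)     # output: OR but not AND  ->  XOR

for x1, x2 in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
    print(x1, x2, int(xor_net(x1, x2)))   # reproduces the truth table
```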
NEURON MODEL
• Sigmoidal function

  \varphi(v_j) = \frac{1}{1 + e^{-a v_j}}

  (increasing a makes the transition steeper)

[Figure: plot of \varphi(v_j) for v_j in [-10, 10]]

• Induced local field: v_j = \sum_{i=0,\dots,m} w_{ji} y_i
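A minimal sketch of this neuron model: the logistic activation with slope a applied to the induced local field v_j = \sum_i w_{ji} y_i, where y_0 = +1 plays the role of the bias input. The example weights and inputs are made up for illustration.

```python
import numpy as np

def phi(v, a=1.0):
    # Logistic (sigmoidal) activation 1 / (1 + exp(-a*v)); larger a gives a steeper transition
    return 1.0 / (1.0 + np.exp(-a * v))

def neuron_output(w, y, a=1.0):
    # Induced local field v_j = sum_i w_ji * y_i, then the activation
    v = np.dot(w, y)
    return phi(v, a)

y = np.array([1.0, 0.3, -0.8])      # y[0] = +1 is the fixed bias input
w = np.array([0.1, 0.5, -0.2])      # example weights w_j0, w_j1, w_j2
print(neuron_output(w, y, a=2.0))
```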
• Back-propagation algorithm
  – Forward step: function signals propagate forward
  – Backward step: error signals propagate backward
• It adjusts the weights of the NN in order to
minimize the average squared error.
Average Squared Error
• Error signal of output neuron j at presentation of the n-th
  training example:

  e_j(n) = d_j(n) - y_j(n)

• Total error energy at time n:

  E(n) = \frac{1}{2} \sum_{j \in C} e_j^2(n)        (C: set of neurons in the output layer)

• Average squared error, a measure of learning performance:

  E_{AV} = \frac{1}{N} \sum_{n=1}^{N} E(n)          (N: size of the training set)
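A small sketch of these error measures; the desired and actual outputs below are toy values used only for illustration.

```python
import numpy as np

def instantaneous_error(d, y):
    # E(n) = 1/2 * sum_{j in C} e_j(n)^2, with e_j(n) = d_j(n) - y_j(n)
    e = d - y
    return 0.5 * np.sum(e ** 2)

def average_squared_error(D, Y):
    # E_AV = (1/N) * sum_{n=1}^{N} E(n) over the N training examples
    return np.mean([instantaneous_error(d, y) for d, y in zip(D, Y)])

D = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # desired outputs, N = 3 examples
Y = np.array([[0.8, 0.1], [0.2, 0.7], [0.9, 0.6]])   # network outputs
print(average_squared_error(D, Y))
```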
Notation
Weight Update Rule
Update rule is based on the gradient descent method:
take a step in the direction yielding the maximum
decrease of E

  \Delta w_{ji} = -\eta \frac{\partial E}{\partial w_{ji}}        (step in the direction opposite to the gradient)
Update Rule
• We obtain

  \Delta w_{ji} = \eta \, \delta_j \, y_i

because

  \frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial v_j} \frac{\partial v_j}{\partial w_{ji}}

with  \delta_j = -\frac{\partial E}{\partial v_j}  and  \frac{\partial v_j}{\partial w_{ji}} = y_i
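A one-line sketch of the resulting correction for all weights feeding neuron j; eta, delta_j and the inputs y are placeholder values.

```python
import numpy as np

def weight_correction(eta, delta_j, y):
    # Delta rule for one neuron: delta_w_ji = eta * delta_j * y_i for every input i
    return eta * delta_j * np.asarray(y)

y = np.array([1.0, 0.3, -0.8])                         # inputs to neuron j (y[0] = bias)
print(weight_correction(eta=0.1, delta_j=0.25, y=y))   # correction vector for w_j
```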
Compute local gradient of neuron j
The local gradient is defined as  \delta_j = -\frac{\partial E}{\partial v_j}.
Error e_j of output neuron
• Case 1: j is an output neuron

  e_j = d_j - y_j

Then

  \delta_j = (d_j - y_j) \, \varphi'(v_j)
Local gradient of hidden neuron
• Case 2: j is a hidden neuron
Use the Chain Rule

  \delta_j = -\frac{\partial E}{\partial y_j} \frac{\partial y_j}{\partial v_j} = -\frac{\partial E}{\partial y_j} \, \varphi'(v_j)

With  E(n) = \frac{1}{2} \sum_{k \in C} e_k^2(n):

  \frac{\partial E}{\partial y_j} = \sum_{k \in C} e_k \frac{\partial e_k}{\partial y_j}
                                  = \sum_{k \in C} e_k \frac{\partial e_k}{\partial v_k} \frac{\partial v_k}{\partial y_j}

From  \frac{\partial e_k}{\partial v_k} = -\varphi'(v_k)  and  \frac{\partial v_k}{\partial y_j} = w_{kj}

we obtain

  \frac{\partial E}{\partial y_j} = -\sum_{k \in C} \delta_k w_{kj}
Local Gradient of hidden neuron j

Hence   \delta_j = \varphi'(v_j) \sum_{k \in C} \delta_k w_{kj}

[Figure: signal-flow graph of back-propagation error signals: the output-neuron errors e_1, ..., e_k, ..., e_m, multiplied by \varphi'(v_1), ..., \varphi'(v_m), flow back through the weights w_{1j}, ..., w_{kj}, ..., w_{mj} to neuron j]
Delta Rule
• Delta rule:  \Delta w_{ji} = \eta \, \delta_j \, y_i
Local Gradient of neurons
For the sigmoidal activation function:

  \delta_j = a y_j [1 - y_j] \sum_k \delta_k w_{kj}        if j is a hidden node

  \delta_j = a y_j [1 - y_j] [d_j - y_j]                   if j is an output node
Backpropagation algorithm
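A minimal sketch of the algorithm assembled from the preceding slides, for an assumed 2-2-1 network of logistic neurons trained in sequential mode; the slope a, the learning rate eta, the weight initialization and the XOR example data are assumptions, not values prescribed by the slides.

```python
import numpy as np

a, eta = 1.0, 0.5                       # assumed sigmoid slope and learning rate
rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.5, (2, 3))       # hidden layer: 2 neurons, 2 inputs + bias
W2 = rng.normal(0.0, 0.5, (1, 3))       # output layer: 1 neuron, 2 hidden outputs + bias

def phi(v):
    return 1.0 / (1.0 + np.exp(-a * v))   # logistic activation

def train_step(x, d):
    """One sequential (pattern-by-pattern) back-propagation update on example (x, d)."""
    global W1, W2
    y0 = np.append(1.0, x)              # forward step: input with bias component y_0 = +1
    y1 = phi(W1 @ y0)                   # hidden outputs
    y1b = np.append(1.0, y1)
    y2 = phi(W2 @ y1b)                  # network outputs

    # Backward step: local gradients; a*y*(1-y) is phi'(v) written in terms of y
    delta2 = a * y2 * (1 - y2) * (d - y2)                  # output neurons
    delta1 = a * y1 * (1 - y1) * (W2[:, 1:].T @ delta2)    # hidden neurons

    # Delta rule: w_ji <- w_ji + eta * delta_j * y_i
    W2 += eta * np.outer(delta2, y1b)
    W1 += eta * np.outer(delta1, y0)
    return 0.5 * np.sum((d - y2) ** 2)  # instantaneous error E(n)

# Example usage: a few epochs over the XOR data (targets in {0, 1} to match the logistic range)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([[0.0], [1.0], [1.0], [0.0]])
for epoch in range(2000):
    for x, d in zip(X, D):
        train_step(x, d)
```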
Summary
Training
Stopping criteria
• Sensible stopping criteria:
– Average squared error change:
Back-prop is considered to have converged
when the absolute rate of change in the
average squared error per epoch is
sufficiently small (typically in the range [0.01, 0.1]).
– Generalization based criterion:
After each epoch the NN is tested for
generalization. If the generalization
performance is adequate then stop.
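A sketch of how the two criteria above might be coded; the error histories are plain Python lists, and the tolerance and patience parameters are assumed values.

```python
def error_change_stop(E_av_history, tol=0.01):
    # Stop when the absolute change in the average squared error per epoch is small enough
    return len(E_av_history) >= 2 and abs(E_av_history[-1] - E_av_history[-2]) < tol

def generalization_stop(val_errors, patience=5):
    # Stop when the validation (generalization) error has not improved for `patience` epochs
    best_epoch = val_errors.index(min(val_errors))
    return len(val_errors) - best_epoch - 1 >= patience
```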
Early stopping
Generalization
• Generalization: NN generalizes well if the I/O
mapping computed by the network is nearly
correct for new data (test set).
• Factors that influence generalization:
– the size of the training set.
– the architecture of the NN.
– the complexity of the problem at hand.
• Overfitting (overtraining): when the NN learns
too many I/O examples it may end up
memorizing the training data.
Expressive capabilities of NN
Boolean functions:
• Every boolean function can be represented by a
  network with a single hidden layer
• but it might require an exponential number of hidden units
Continuous functions:
• Every bounded continuous function can be
  approximated with arbitrarily small error by a
  network with one hidden layer
• Any function can be approximated to arbitrary
  accuracy by a network with two hidden layers
Generalized Delta Rule
• If \eta is small: slow rate of learning
• If \eta is large: large changes of the weights;
  the NN can become unstable (oscillatory)
• Method to overcome the above drawback: include
  a momentum term in the delta rule (generalized
  delta rule):

  \Delta w_{ji}(n) = \alpha \, \Delta w_{ji}(n-1) + \eta \, \delta_j(n) \, y_i(n)

  (\alpha: momentum constant)
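A sketch of the generalized delta rule; eta = 0.1 and alpha = 0.9 are assumed values, and the previous correction is carried along between iterations.

```python
import numpy as np

def momentum_update(prev_dw, delta, y, eta=0.1, alpha=0.9):
    # delta_w(n) = alpha * delta_w(n-1) + eta * delta_j(n) * y_i(n)
    return alpha * prev_dw + eta * np.outer(delta, y)

prev_dw = np.zeros((1, 3))                       # correction from step n-1
dw = momentum_update(prev_dw, delta=np.array([0.2]), y=np.array([1.0, 0.4, -0.6]))
print(dw)
```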
Generalized delta rule
Setting the parameters
• How are the weights initialised?
• How is the learning rate chosen?
• How many hidden layers and how many neurons?
• Which activation function?
• How to preprocess the data?
• How many examples in the training data set?
Some heuristics (1)
• Sequential vs. batch algorithms: the
  sequential mode (pattern by pattern) is
  computationally faster than the batch
  mode (epoch by epoch)
• The sigmoidal function  \varphi(v) = \frac{1}{1 + e^{-a v}}  is nonsymmetric
Some heuristics (3)
Some heuristics (4)
• Target values: target values must be
chosen within the range of the sigmoidal
activation function.
• Otherwise, hidden neurons can be
driven into saturation which slows down
learning
Some heuristics (4)
• For the antisymmetric activation function
  (with limiting values \pm a) it is necessary to
  choose an offset \epsilon for the target values
• For +a:  d_j = a - \epsilon
• For -a:  d_j = -a + \epsilon
• If a = 1.7159 we can set \epsilon = 0.7159; then d_j = \pm 1
Some heuristics (5)
• Inputs normalisation:
  – Each input variable should be preprocessed
    so that its mean value is zero or small
    compared to its standard deviation.
– Input variables should be uncorrelated.
– Decorrelated input variables should be
scaled so their covariances are
approximately equal.
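One way to realize the three steps above is mean removal followed by PCA-based decorrelation and rescaling (whitening); this is a sketch of that choice, not the only possible preprocessing.

```python
import numpy as np

def preprocess_inputs(X):
    """Mean removal, decorrelation, covariance equalization (rows = examples)."""
    Xc = X - X.mean(axis=0)                    # 1) make each variable zero mean
    eigval, eigvec = np.linalg.eigh(np.cov(Xc, rowvar=False))
    Xd = Xc @ eigvec                           # 2) decorrelate (project on covariance eigenvectors)
    return Xd / np.sqrt(eigval + 1e-12)        # 3) scale to approximately equal covariances

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.1]])
print(np.cov(preprocess_inputs(X), rowvar=False).round(2))   # close to the identity matrix
```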
Some heuristics (5)
Some heuristics (6)
• Initialisation of weights:
– If synaptic weights are assigned large
initial values neurons are driven into
    saturation. Local gradients then become small
    and learning slows down.
  – If synaptic weights are assigned small
    initial values, the algorithm operates
    around the origin. For the hyperbolic
    tangent activation function the origin is a saddle point.
Some heuristics (6)
• Weights should be initialised so that the
  standard deviation of the induced local
  field v lies in the transition between the
  linear and saturated parts of the activation:

  \sigma_v = 1,  obtained with  \sigma_w = m^{-1/2}

  (m: number of synaptic connections of the neuron)
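A sketch of this heuristic: draw zero-mean initial weights with standard deviation sigma_w = m^{-1/2}.

```python
import numpy as np

def init_weights(n_neurons, m, seed=None):
    # Zero-mean weights with standard deviation m**-0.5,
    # where m is the number of synaptic connections (inputs) per neuron
    rng = np.random.default_rng(seed)
    return rng.normal(loc=0.0, scale=m ** -0.5, size=(n_neurons, m))

W = init_weights(n_neurons=10, m=25, seed=0)
print(W.std())   # roughly 1 / sqrt(25) = 0.2
```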
Some heuristics (7)
• Learning rate:
  – The right value of \eta depends on the application.
    Values between 0.1 and 0.9 have been used in
    many applications.
  – Other heuristics adapt \eta during the training as
    described in previous slides.
Some heuristics (8)
• How many layers and neurons
  – The number of layers and of neurons depends
on the specific task. In practice this issue is
solved by trial and error.
– Two types of adaptive algorithms can be used:
• start from a large network and successively
remove some neurons and links until network
performance degrades.
• begin with a small network and introduce new
neurons until performance is satisfactory.
Some heuristics (9)
Output representation and decision rule
• M-class classification problem:

  Y_{k,j}(x_j) = F_k(x_j),   k = 1, ..., M

[Figure: MLP mapping input x_j to outputs Y_{1,j}, Y_{2,j}, ..., Y_{M,j}]
Data representation
  d_{k,j} = \begin{cases} 1, & x_j \in C_k \\ 0, & x_j \notin C_k \end{cases}

(the desired output vector has a 1 in the k-th element and 0 elsewhere)
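A sketch of this one-of-M target encoding; the class index k is taken as 0-based here.

```python
import numpy as np

def one_hot(k, M):
    # Desired output for an example of class C_k: 1 in the k-th element, 0 elsewhere
    d = np.zeros(M)
    d[k] = 1.0
    return d

print(one_hot(2, 5))   # -> [0. 0. 1. 0. 0.]
```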
MLP and the a posteriori class probability
• A multilayer perceptron classifier
  (using the logistic function)
  approximates the a posteriori class
  probabilities, provided that the size
  of the training set is large enough.
The Bayes rule
• An appropriate output decision rule is
the (approximate) Bayes rule generated
by the a posteriori probability
estimates:
  x \in C_k   if   F_k(x) > F_j(x)   for all   j \neq k

[Figure: vector of network outputs F_1(x), F_2(x), ..., F_M(x)]
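A sketch of this decision rule: assign x to the class whose output F_k(x) is largest.

```python
import numpy as np

def classify(F_x):
    # x belongs to C_k if F_k(x) > F_j(x) for all j != k, i.e. take the argmax of the outputs
    return int(np.argmax(F_x))

print(classify([0.1, 0.7, 0.2]))   # -> 1 (second class)
```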