Multilayer Perceptron (MLP) : The Backpropagation (BP) Algorithm
Guest Speaker: Edmondo Trentin, Dipartimento di Ingegneria dell'Informazione, Università di Siena, V. Roma, 56 - Siena (Italy), {trentin}@dii.unisi.it
October 7, 2008
Representation of Inputs (Patterns)
In order to carry out the learning task, we need to extract a digital representation x of any given object (/event) that has to be fed into the MLP:
- x is called a pattern
- x is real-valued (i.e. $x \in \mathbb{R}^d$): x is also known as a feature vector
- The components of x are known as the features
- d is the dimensionality of the feature space $\mathbb{R}^d$
- The (problem-specific) process of extracting representative features $x_1, \ldots, x_d$ is known as feature extraction. It should satisfy two requirements: 1. x contains (most of) the information needed for the learning task; 2. d is as small as possible
Further processing steps, if needed:
- Feature selection/reduction (e.g. Principal Component Analysis) may reduce the dimensionality, preserving only the relevant information
- Normalization (/standardization) transforms the feature values into homogeneous and well-behaved values that yield numerical stability
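As a concrete illustration of the normalization step, the sketch below standardizes (z-scores) a hypothetical feature matrix X of N patterns by d features; the function name and the toy data are assumptions, not part of the lecture:

```python
import numpy as np

def standardize(X, eps=1e-12):
    """Transform each feature to zero mean and unit variance."""
    mu = X.mean(axis=0)             # per-feature mean
    sigma = X.std(axis=0) + eps     # per-feature std (eps avoids division by zero)
    return (X - mu) / sigma

# hypothetical features on very different scales
X = np.random.rand(100, 4) * np.array([1.0, 10.0, 1e3, 1e5])
Xn = standardize(X)                 # homogeneous, well-behaved values
```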
MLP: Architecture
- Feedforward (and full) connections between pairs of adjacent layers
- Continuous and differentiable activation functions
- Realizes a multidimensional function $y = \varphi(x)$ between input $x \in \mathbb{R}^{d_i}$ and output $y \in \mathbb{R}^{d_o}$
MLP: Dynamics (forward propagation)
Each unit realizes a transformation of the signal via application of its activation function $f(\cdot)$ to its argument $a$. The argument $a$ is obtained as a weighted sum of the signals that feed the neuron through the incoming connections, i.e. $a = \sum_k w_k z_k$, where $w_k$ is the weight associated with the $k$-th connection, and $z_k$ is the $k$-th component of the signal (either the input signal, or the output yielded by other neurons in the MLP).
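A minimal sketch of this forward step for a single unit (the sigmoid activation and the numeric values are assumptions made for illustration):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def unit_forward(w, z, f=sigmoid):
    """One unit: weighted sum of the incoming signals, then the activation function."""
    a = np.dot(w, z)                 # a = sum_k w_k z_k
    return f(a)

# hypothetical unit with 3 incoming connections
w = np.array([0.5, -1.2, 0.3])
z = np.array([1.0, 0.4, -0.7])
out = unit_forward(w, z)
```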
MLP: Learning
A learning rule is applied in order to improve the value of the MLP weights over a training set T according to a given criterion function.
MLP: Generalization
The MLP must infer a general law from T (a raw memorization of the training examples is not sought!) that, in turn, can be applied to novel data that are distributed according to the same probability laws. Regularization techniques help improve the generalization capabilities.
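One common regularization technique (mentioned here only as an example, not necessarily the one the lecture has in mind) is L2 weight decay, which penalizes large weights during the gradient-based update; a minimal sketch:

```python
import numpy as np

def weight_decay_step(w, grad_C, eta=0.1, lam=1e-3):
    """One gradient step on C(w) + (lam/2)*||w||^2: the extra lam*w term shrinks
    the weights toward zero, discouraging raw memorization of the training set."""
    return w - eta * (grad_C + lam * w)

w = np.array([2.0, -3.0])
w = weight_decay_step(w, grad_C=np.array([0.1, -0.2]))  # hypothetical gradient values
```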
MLP Training: the Idea
Given an example $(x, y)$, modify the weights $w$ s.t. the output $\hat{y}$ yielded by the MLP (when fed with input $x$) gets closer to the target $y$. Criterion function $C(\cdot)$: minimum squared error $(y - \hat{y})^2$
[Figure: plot of the quadratic criterion $C(x) = x^2$ for $x \in [-10, 10]$.]
Advantages:
- Convex & non-negative (search for a minimum)
- Penalizes large errors
- Differentiable (gradient descent is viable)
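As a minimal illustration (the function names are mine, not from the lecture), the criterion and its derivative with respect to the MLP output:

```python
def squared_error(y_target, y_out):
    """C = 1/2 (y - yhat)^2; the 1/2 factor (also used in the BP derivation below)
    just simplifies the derivative and does not move the minimum."""
    return 0.5 * (y_target - y_out) ** 2

def squared_error_grad(y_target, y_out):
    """dC/dyhat = -(y - yhat): non-zero whenever output and target disagree."""
    return -(y_target - y_out)
```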
Gradient Descent
The criterion $C(\cdot)$ is a function of the MLP weights $w$. Method: iterate slight modifications of the weights in order to move in the opposite way w.r.t. the gradient (the steepest descent direction), i.e. $w \leftarrow w - \eta \nabla_w C(w)$ for a small learning rate $\eta > 0$.
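A minimal sketch of this iteration on a toy convex criterion (the criterion, step size and number of steps are assumptions for illustration only):

```python
import numpy as np

def gradient_descent(grad_C, w0, eta=0.1, n_steps=100):
    """Iterate w <- w - eta * grad C(w): small steps opposite to the gradient."""
    w = np.array(w0, dtype=float)
    for _ in range(n_steps):
        w = w - eta * grad_C(w)
    return w

# toy criterion C(w) = ||w||^2 has gradient 2w and its minimum at the origin
w_min = gradient_descent(lambda w: 2.0 * w, w0=[3.0, -4.0])
```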
Backpropagation Algorithm (BP)
Labeled (supervised) training set: $T = \{(x_k, y_k) \mid k = 1, \ldots, N\}$
Online criterion function: $C = \frac{1}{2} \sum_{n=1}^{d_o} (y_n - \hat{y}_n)^2$, where $\hat{y}_n$ is the $n$-th MLP output and $y_n$ the corresponding target
Weight-update rule: $\Delta w_{ij} = -\eta \frac{\partial C}{\partial w_{ij}}$ (Note: $w_{ij}$ is the connection weight between the $j$-th unit in a given layer and the $i$-th unit in the following layer)
Activation function for the $i$-th unit: $f_i(a_i)$, where $f_i : \mathbb{R} \to \mathbb{R}$ and $a_i = \sum_j w_{ij} f_j(a_j)$ is the input to the $i$-th unit (Note: the sum is extended to all the units in the previous layer)
BP Case 1: unit $i$ in the output layer
$\frac{\partial C}{\partial w_{ij}} = \frac{\partial}{\partial w_{ij}} \frac{1}{2} (y_i - \hat{y}_i)^2 = -(y_i - \hat{y}_i) \frac{\partial \hat{y}_i}{\partial w_{ij}}$   (1)
$\frac{\partial \hat{y}_i}{\partial w_{ij}} = \frac{\partial f_i(a_i)}{\partial w_{ij}} = f_i'(a_i) \frac{\partial a_i}{\partial w_{ij}} = f_i'(a_i) \frac{\partial \sum_l w_{il} \hat{y}_l}{\partial w_{ij}} = f_i'(a_i)\, \hat{y}_j$   (2)
where the sum over $l$ is extended to all the units in the (topmost) hidden layer. From Eqs. (1) and (2) we have:
$\frac{\partial C}{\partial w_{ij}} = -(y_i - \hat{y}_i) f_i'(a_i)\, \hat{y}_j$   (3)
We define:
$\delta_i = (y_i - \hat{y}_i) f_i'(a_i)$   (4)
We substitute it into Eq. (3), and we can (finally) write:
$\Delta w_{ij} = \eta\, \delta_i \hat{y}_j$   (5)
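A small sketch of Eqs. (4)-(5) for a sigmoid output layer (the function and argument names, the learning rate and the use of an outer product to update all $w_{ij}$ at once are my assumptions):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def output_layer_update(W, z_hid, y_target, eta=0.5):
    """Eq. (4): delta_i = (y_i - yhat_i) f'(a_i); Eq. (5): Delta w_ij = eta * delta_i * yhat_j.
    W[i, j] connects hidden unit j to output unit i; z_hid are the hidden-layer outputs."""
    a = W @ z_hid                      # a_i = sum_j w_ij yhat_j
    y_hat = sigmoid(a)                 # yhat_i = f_i(a_i)
    f_prime = y_hat * (1.0 - y_hat)    # sigmoid derivative f'(a) = f(a)(1 - f(a))
    delta = (y_target - y_hat) * f_prime            # Eq. (4)
    return W + eta * np.outer(delta, z_hid)         # Eq. (5) applied to every w_ij

W = np.array([[0.2, -0.5, 0.1]])       # 1 output unit, 3 hidden units
W = output_layer_update(W, z_hid=np.array([0.3, 0.7, 0.1]), y_target=np.array([1.0]))
```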
BP Case 2: unit $j$ in the (topmost) hidden layer
Let $w_{jk}$ be the weight between the $k$-th unit in the previous layer (either hidden, or input layer) and the $j$-th unit in the topmost hidden layer:
$\Delta w_{jk} = -\eta \frac{\partial C}{\partial w_{jk}}$   (6)
Again:
$\frac{\partial C}{\partial w_{jk}} = \frac{\partial}{\partial w_{jk}} \frac{1}{2} \sum_{n=1}^{d_o} (y_n - \hat{y}_n)^2 = -\sum_{n=1}^{d_o} (y_n - \hat{y}_n) \frac{\partial \hat{y}_n}{\partial w_{jk}}$   (7)
where:
$\frac{\partial \hat{y}_n}{\partial w_{jk}} = \frac{\partial f_n(a_n)}{\partial w_{jk}} = f_n'(a_n) \frac{\partial a_n}{\partial w_{jk}}$   (8)
and
$\frac{\partial a_n}{\partial w_{jk}} = \frac{\partial \sum_l w_{nl} \hat{y}_l}{\partial w_{jk}} = \sum_l w_{nl} \frac{\partial \hat{y}_l}{\partial w_{jk}} = w_{nj} \frac{\partial \hat{y}_j}{\partial w_{jk}}$   (9)
(where, again, the sum over $l$ is extended to all the units in the topmost hidden layer). In turn:
$\frac{\partial \hat{y}_j}{\partial w_{jk}} = \frac{\partial f_j(a_j)}{\partial w_{jk}} = f_j'(a_j) \frac{\partial a_j}{\partial w_{jk}} = f_j'(a_j) \frac{\partial \sum_m w_{jm} x_m}{\partial w_{jk}} = f_j'(a_j)\, x_k.$   (10)
(of course the sum over m extends over all the units in the previous layer w.r.t. j).
Substituting Eqs. (7), (8), (9) and (10) into Eq. (6) we obtain:
$\Delta w_{jk} = \eta \sum_{n=1}^{d_o} \left[ (y_n - \hat{y}_n) f_n'(a_n) w_{nj} \right] f_j'(a_j)\, x_k$   (11)
$\phantom{\Delta w_{jk}} = \eta \left\{ \sum_{n=1}^{d_o} \left[ w_{nj} (y_n - \hat{y}_n) f_n'(a_n) \right] \right\} f_j'(a_j)\, x_k$
$\phantom{\Delta w_{jk}} = \eta \left( \sum_{n=1}^{d_o} w_{nj} \delta_n \right) f_j'(a_j)\, x_k$   (12)
that is, $\Delta w_{jk} = \eta\, \delta_j x_k$ with $\delta_j = f_j'(a_j) \sum_{n=1}^{d_o} w_{nj} \delta_n$,
which is known as the BP delta rule, i.e. a compact expression of the BP algorithm itself which captures the idea of top-down backpropagation of the deltas throughout the MLP. The delta rule also holds for the other layers in the ANN (the proof is easy, by induction on the number of layers). The rule is applied one example at a time, over the whole training set. A complete cycle is known as an epoch; many epochs are required in order to accomplish the ANN training. Popular choices for the activation functions: linear ($f(a) = a$) and sigmoid ($f(a) = \frac{1}{1+e^{-a}}$). The technique suffers from local minima.
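To make the whole procedure concrete, here is a minimal online-BP sketch for a single-hidden-layer MLP with sigmoid units, following Eqs. (4)-(5) and (12). Bias terms are omitted to stay close to the slides, and the toy XOR-like data, layer sizes, learning rate and epoch count are all assumptions made for illustration:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 2, 4, 1
W1 = rng.normal(scale=0.5, size=(d_hid, d_in))    # input  -> hidden weights w_jk
W2 = rng.normal(scale=0.5, size=(d_out, d_hid))   # hidden -> output weights w_ij

# toy training set T = {(x_k, y_k)}
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[0.], [1.], [1.], [0.]])

eta, n_epochs = 0.5, 5000
for epoch in range(n_epochs):            # one epoch = one pass over the whole set
    for x, y in zip(X, Y):               # online: one example at a time
        # forward propagation
        a_hid = W1 @ x                   # a_j = sum_k w_jk x_k
        z_hid = sigmoid(a_hid)           # yhat_j = f_j(a_j)
        a_out = W2 @ z_hid               # a_i = sum_j w_ij yhat_j
        z_out = sigmoid(a_out)           # yhat_i = f_i(a_i)

        # deltas (the sigmoid derivative is f(a) * (1 - f(a)))
        delta_out = (y - z_out) * z_out * (1.0 - z_out)          # Eq. (4)
        delta_hid = (W2.T @ delta_out) * z_hid * (1.0 - z_hid)   # delta_j, cf. Eq. (12)

        # delta rule: Delta w = eta * delta * (signal entering the connection)
        W2 += eta * np.outer(delta_out, z_hid)                   # Eq. (5)
        W1 += eta * np.outer(delta_hid, x)
```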
Universal property of MLPs
Theorems (independently proved by Lippmann, Cybenko and others) state that, for any given continuous and bounded function $\varphi: \mathbb{R}^{d_i} \to \mathbb{R}^{d_o}$, an MLP with a single hidden layer of sigmoid units exists which approximates $\varphi(\cdot)$ arbitrarily well. These are existence theorems, that is to say they stress the flexibility of MLPs, but:
1. they do not tell us which architecture is the right one for a given $\varphi(\cdot)$ (i.e., for any given task)
2. even if the right topology were known, they do not tell us anything about the practical convergence of the BP algorithm to the right weight values
MLP for Pattern Classification
What is the relation (if any) between MLPs and Bayesian pattern classification (e.g., speech recognition, OCR)? The answer comes from theorems independently proved by Bourlard, Cybenko and others:
Let us consider a classification problem involving $c$ classes $\omega_1, \ldots, \omega_c$, and a supervised training sample $T = \{(x_i, \omega(x_i)) \mid i = 1, \ldots, N\}$ (where $\omega(x_i)$ denotes the class which pattern $x_i$ belongs to)
Let us create an MLP-oriented training set $T'$ from $T$ as follows: $T' = \{(x_i, y_i) \mid i = 1, \ldots, N\}$, where $y_i = (y_{i,1}, \ldots, y_{i,c}) \in \mathbb{R}^c$ and
$y_{i,j} = 1.0$ if $\omega_j = \omega(x_i)$, $0.0$ otherwise   (14)
(i.e., $y_i$ has null components, except for the one which corresponds to the correct class). Then (theorem), training an MLP over $T'$ is equivalent to training it over the training set $\{(x_i, (P(\omega_1 \mid x_i), P(\omega_2 \mid x_i), \ldots, P(\omega_c \mid x_i))) \mid i = 1, \ldots, N\}$, although, in general, we do not know $P(\omega_1 \mid x_i), P(\omega_2 \mid x_i), \ldots, P(\omega_c \mid x_i)$ in advance.
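A minimal sketch of the 1-of-c target coding of Eq. (14); the class indices and the toy labels are assumptions:

```python
import numpy as np

def one_hot_targets(labels, c):
    """labels[i] in {0, ..., c-1} is the index of the class omega_j that x_i belongs to."""
    Y = np.zeros((len(labels), c))
    Y[np.arange(len(labels)), labels] = 1.0   # 1.0 on the correct class, 0.0 elsewhere
    return Y

Y = one_hot_targets([0, 2, 1, 2], c=3)        # targets y_i in R^3, as in Eq. (14)
```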
In so doing, we can train an MLP to estimate Bayesian posterior probabilities without even knowing them on the training sample. Due to the universal property, the nonparametric estimate that we obtain may be optimal. Practical issues: on real-world data, the following problems usually prevent the MLP from reaching the optimal solution:
1. choice of the architecture (i.e., number of hidden units)
2. choice of $\eta$ and of the number of training epochs
Cysteines (C or Cys) are α-amino acids. (Standard) α-amino acids are molecules which differ in their residue: via condensation, chains of residues form proteins. The linear sequence of residues is known as the primary structure of the protein. Cysteines play a major role in the structural and functional properties of proteins, due to the high reactivity of their side-chain.
Oxidation of a pair of cysteines forms a new molecule called cystine via a (-S-S-) disulfide bond. The disulfide bond has an impact on protein folding: (a) it holds two portions of the protein together; (b) it stabilizes the secondary structure. Prediction of the binding state of Cys within the primary structure of a protein would therefore provide information on the secondary and tertiary structures.
Classification task: predict the binding state ($\omega_1$ = bond, $\omega_2$ = no bond) of any given cysteine within the protein primary structure. We use a dataset of sequences, e.g. the Protein Data Bank (PDB), which consists of more than 1,000 sequences, and we apply a supervised approach:
QNFITSKHNIDKIMTCNIRLNECHDNIFEICGSGK...
GHFTLELVCQRNFVTAIEIDHKLKTTENKLVDHCDN...
LNKDILQFKFPNSYKIFGNCIPYNISCTDIRVFDS...
Part of the dataset is used for training; another (non-overlapping) part is used for validation (i.e., tuning of the model parameters) and test (i.e., evaluation of the generalization performance in terms of the estimated probability of error).
We are faced with 2 problems:
1. We cannot classify on the basis of an individual cysteine only, since $P(\omega_i \mid C)$ is just the prior $P(\omega_i)$. Information from the sequence is needed, but the sequence is long and may have variable length, while statistical models and MLPs require a fixed-dimensionality feature space. Solution: we take fixed-size windows (i.e., subsequences) centered in the cysteine at hand:
QNFHNIDKIMTCNIRSKLNECHDNIFEICGSGK...
The window might contain from 11 to 31 amino acids. An overlap between adjacent windows is allowed, i.e. a cysteine may become part of the window of another cysteine.
2. We cannot feed the MLP with symbols (namely, the literals of the amino acids): a coding procedure is required. Solution: profiles of multiple alignment among homologous (i.e., similar) proteins.
In so doing, a sequence of 20-dimensional real vectors $x_1, \ldots, x_T$ is obtained, where $x_{t,i}$ is the probability (relative frequency) of observing the $i$-th amino acid in the $t$-th position within the sequence.
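A sketch of how such a profile can be computed from a set of aligned homologous sequences; the ordering of the 20 residues, the gap handling and the toy alignment are assumptions:

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"          # the 20 standard residues (assumed order)

def profile(aligned_seqs):
    """Return a T x 20 matrix: X[t, i] = relative frequency of amino acid i at position t."""
    T = len(aligned_seqs[0])
    X = np.zeros((T, len(AMINO_ACIDS)))
    for seq in aligned_seqs:
        for t, res in enumerate(seq):
            if res in AMINO_ACIDS:            # skip gaps / unknown symbols
                X[t, AMINO_ACIDS.index(res)] += 1.0
    return X / np.maximum(X.sum(axis=1, keepdims=True), 1.0)

X = profile(["QNFCK", "QNYCK", "HNFCK"])      # toy alignment with T = 5 positions
```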
MLP solution to the classification problem:
- Let us assume that the amino acid in the $t$-th position is a cysteine. The window centered in this Cys is now defined as $W = (x_{t-k}, \ldots, x_t, \ldots, x_{t+k})$ for a certain $k$, that is a $20(2k+1)$-dimensional real-valued vector
- A training set $T = \{(W, \omega(W))\}$ is created, where $\omega(W)$ is either 0 or 1 according to the binding state (i.e., no bond, bond) of the corresponding cysteine
- A 1-output MLP (6 hidden units) is trained on $T$ via BP (6 epochs) to estimate the posterior $P(\omega_1 \mid W)$ of the bond state
- Results: 19.36% error rate (which is an estimate of the probability of error)
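A sketch of how such a window vector could be built from the $T \times 20$ profile; the zero-padding at the sequence boundaries, the function name and the choice $k = 7$ (a 15-residue window, within the 11-31 range mentioned above) are assumptions:

```python
import numpy as np

def cysteine_window(X, t, k):
    """Flatten the rows t-k .. t+k of the T x 20 profile X into a 20*(2k+1)-dim vector;
    positions falling outside the sequence are zero-padded."""
    T, d = X.shape
    padded = np.vstack([np.zeros((k, d)), X, np.zeros((k, d))])
    return padded[t : t + 2 * k + 1].ravel()   # shape: (20 * (2k + 1),)

# e.g. k = 7 gives a window of 15 positions, i.e. a 300-dimensional input vector
w_vec = cysteine_window(np.random.rand(120, 20), t=42, k=7)
```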