Contents
• Introduction
Motivation, Biological Background
• Threshold Logic Units
Definition, Geometric Interpretation, Limitations, Networks of TLUs, Training
• General Neural Networks
Structure, Operation, Training
• Multi-layer Perceptrons
Definition, Function Approximation, Gradient Descent, Backpropagation, Variants, Sensitivity Analysis
• Deep Learning
Many-layered Perceptrons, Rectified Linear Units, Auto-Encoders, Feature Construction, Image Analysis
• Radial Basis Function Networks
Definition, Function Approximation, Initialization, Training, Generalized Version
• Self-Organizing Maps
Definition, Learning Vector Quantization, Neighborhood of Output Neurons
Multi-layer Perceptrons (MLPs)
Multi-layer Perceptrons

The connections of an $r$-layer perceptron run only between consecutive layers, that is,

(iii) $C \subseteq \bigl(U_{\mathrm{in}} \times U_{\mathrm{hidden}}^{(1)}\bigr) \;\cup\; \Bigl(\bigcup_{i=1}^{r-3} U_{\mathrm{hidden}}^{(i)} \times U_{\mathrm{hidden}}^{(i+1)}\Bigr) \;\cup\; \bigl(U_{\mathrm{hidden}}^{(r-2)} \times U_{\mathrm{out}}\bigr)$
Multi-layer Perceptrons

Figure: general structure of a multi-layer perceptron with input layer $U_{\mathrm{in}}$, hidden layers $U_{\mathrm{hidden}}^{(1)}, \ldots, U_{\mathrm{hidden}}^{(r-2)}$, and output layer $U_{\mathrm{out}}$; inputs $x_1, \ldots, x_n$, outputs $y_1, \ldots, y_m$.
Multi-layer Perceptrons
• The network input function of each hidden neuron and of each output neuron is
the weighted sum of its inputs, that is,
$\forall u \in U_{\mathrm{hidden}} \cup U_{\mathrm{out}}:\qquad f_{\mathrm{net}}^{(u)}\bigl(\vec{w}_u, \vec{\mathrm{in}}_u\bigr) = \vec{w}_u^{\,\top} \vec{\mathrm{in}}_u = \sum_{v \in \mathrm{pred}(u)} w_{uv}\, \mathrm{out}_v$
• The activation function of each hidden neuron is a so-called sigmoid function, that is, a monotonically non-decreasing function $f : \mathbb{R} \to [0, 1]$ with $\lim_{x \to -\infty} f(x) = 0$ and $\lim_{x \to \infty} f(x) = 1$.
• The activation function of each output neuron is either also a sigmoid function or a linear function, that is, $f_{\mathrm{act}}(\mathrm{net}, \theta) = \alpha\, \mathrm{net} - \theta$.
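To make these two definitions concrete, here is a minimal Python sketch for a single neuron; the function names, weights, and inputs are illustrative choices, not from the slides:

```python
import math

def net_input(weights, inputs):
    # f_net: weighted sum of the predecessor outputs
    return sum(w * x for w, x in zip(weights, inputs))

def logistic(net, theta=0.0):
    # one common sigmoid activation: 1 / (1 + e^-(net - theta))
    return 1.0 / (1.0 + math.exp(-(net - theta)))

# hypothetical neuron with two predecessors
weights = [0.5, -1.0]
inputs  = [1.0, 0.2]
print(logistic(net_input(weights, inputs), theta=0.1))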
Sigmoid Activation Functions

Figure: plots of four common sigmoid activation functions over the network input net, each rising from 0 to 1 with value 1/2 at the bias value θ: a step function jumping at θ, a semi-linear ramp between θ − 1/2 and θ + 1/2, a sine-based ramp between θ − π/2 and θ + π/2, and a logistic curve (axis ticks at θ ± 2 and θ ± 4).
Sigmoid Activation Functions

hyperbolic tangent:

$f_{\mathrm{act}}(\mathrm{net}, \theta) = \tanh(\mathrm{net} - \theta) = \frac{e^{(\mathrm{net}-\theta)} - e^{-(\mathrm{net}-\theta)}}{e^{(\mathrm{net}-\theta)} + e^{-(\mathrm{net}-\theta)}} = \frac{1 - e^{-2(\mathrm{net}-\theta)}}{1 + e^{-2(\mathrm{net}-\theta)}} = \frac{2}{1 + e^{-2(\mathrm{net}-\theta)}} - 1$

Figure: plot of tanh(net − θ), rising from −1 to 1 around θ (axis ticks at θ ± 1 and θ ± 2).
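The last line states that tanh is a logistic function rescaled from (0, 1) to (−1, 1); a quick numeric check (illustrative Python, not part of the slides):

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

# tanh(x) = 2 / (1 + e^(-2x)) - 1 = 2 * logistic(2x) - 1
for x in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    lhs = math.tanh(x)
    rhs = 2.0 * logistic(2.0 * x) - 1.0
    assert abs(lhs - rhs) < 1e-12
    print(f"{x:+.1f}: tanh = {lhs:+.6f}, 2*logistic(2x)-1 = {rhs:+.6f}")
```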
Multi-layer Perceptrons: Weight Matrices

With the connection weight matrix $\mathbf{W}$ between two consecutive layers $U_1$ and $U_2$, the network input vector of $U_2$ is

$\vec{\mathrm{net}}_{U_2} = \mathbf{W} \cdot \vec{\mathrm{in}}_{U_2} = \mathbf{W} \cdot \vec{\mathrm{out}}_{U_1}$
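As a sketch of the matrix form (hypothetical layer sizes and weights, using NumPy):

```python
import numpy as np

# hypothetical layer sizes: |U1| = 3, |U2| = 2
W = np.array([[0.5, -1.0,  0.3],
              [2.0,  0.1, -0.7]])   # W[u, v] = weight from v in U1 to u in U2
out_U1 = np.array([1.0, 0.0, 0.5])

net_U2 = W @ out_U1                 # net inputs of the whole layer at once
print(net_U2)                       # [0.65, 1.65]
```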
Multi-layer Perceptrons: Biimplication

Figure: a three-layer perceptron computing the biimplication $x_1 \leftrightarrow x_2$: two hidden threshold units with threshold $-1$ (one with weights $-2$ from $x_1$ and $2$ from $x_2$, the other with weights $2$ from $x_1$ and $-2$ from $x_2$) feed an output neuron $y$ with threshold $3$ over weights $2$ and $2$; layers $U_{\mathrm{in}}$, $U_{\mathrm{hidden}}$, $U_{\mathrm{out}}$.
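The figure's weights can be verified directly; a minimal sketch with threshold (step-function) units, taking the weights and thresholds from the figure:

```python
def tlu(weights, theta, inputs):
    # threshold logic unit: output 1 iff the weighted sum reaches the threshold
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= theta else 0

def biimplication(x1, x2):
    h1 = tlu([-2, 2], -1, [x1, x2])   # computes x1 -> x2
    h2 = tlu([2, -2], -1, [x1, x2])   # computes x2 -> x1
    return tlu([2, 2], 3, [h1, h2])   # conjunction of both implications

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, biimplication(x1, x2))  # 1 exactly when x1 == x2
```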
Multi-layer Perceptrons: Fredkin Gate

A Fredkin gate passes its control input $s$ through unchanged and either passes or swaps its data inputs: for $s = 0$ it outputs $y_1 = x_1$ and $y_2 = x_2$, for $s = 1$ it outputs $y_1 = x_2$ and $y_2 = x_1$.

s  x1 x2 | y1 y2
---------+------
0  0  0  | 0  0
0  0  1  | 0  1
0  1  0  | 1  0
0  1  1  | 1  1
1  0  0  | 0  0
1  0  1  | 1  0
1  1  0  | 0  1
1  1  1  | 1  1

Figure: the unit cubes of $y_1$ and $y_2$ as Boolean functions of the three inputs.
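A one-line functional model (illustrative Python) reproduces this truth table:

```python
def fredkin(s, x1, x2):
    # control s passes through; s = 1 swaps the two data inputs
    return (s, x2, x1) if s else (s, x1, x2)

for s in (0, 1):
    for x1 in (0, 1):
        for x2 in (0, 1):
            _, y1, y2 = fredkin(s, x1, x2)
            print(s, x1, x2, "->", y1, y2)
```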
M u l t i - layer Perceptrons: Fr e d k i n G a t e
1
2
2
x1
2
y1
−2 3 1
2
2
s
2
−2 2
3 1 y2
2
x2
2
2
1
Uin U hidden U out
Why Non-linear Activation Functions?

If only linear activation and output functions are used, the computation of a layer can be written as

$\vec{\mathrm{out}}_{U_2} = \mathbf{A}_{12} \cdot \vec{\mathrm{out}}_{U_1} + \vec{b}_{12}$

with an $n \times m$ matrix $\mathbf{A}_{12}$ and an $n$-dimensional vector $\vec{b}_{12}$.
Why Non-linear Activation Functions?

Therefore we have

$\vec{\mathrm{out}}_{U_2} = \mathbf{A}_{12} \cdot \vec{\mathrm{out}}_{U_1} + \vec{b}_{12}$  and  $\vec{\mathrm{out}}_{U_3} = \mathbf{A}_{23} \cdot \vec{\mathrm{out}}_{U_2} + \vec{b}_{23}$

for the computations of two consecutive layers $U_2$ and $U_3$. These combine into the single affine mapping

$\vec{\mathrm{out}}_{U_3} = \mathbf{A}_{13} \cdot \vec{\mathrm{out}}_{U_1} + \vec{b}_{13}$ with $\mathbf{A}_{13} = \mathbf{A}_{23} \cdot \mathbf{A}_{12}$ and $\vec{b}_{13} = \mathbf{A}_{23} \cdot \vec{b}_{12} + \vec{b}_{23}$.

Result: With linear activation and output functions any multi-layer perceptron can be reduced to a two-layer perceptron.
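A quick NumPy check of this collapse, with arbitrarily chosen matrices and vectors for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A12, b12 = rng.normal(size=(4, 3)), rng.normal(size=4)   # layer U1 -> U2
A23, b23 = rng.normal(size=(2, 4)), rng.normal(size=2)   # layer U2 -> U3
x = rng.normal(size=3)                                   # out_U1

# two linear layers applied in sequence ...
two_layers = A23 @ (A12 @ x + b12) + b23
# ... equal one combined affine layer
A13, b13 = A23 @ A12, A23 @ b12 + b23
one_layer = A13 @ x + b13

print(np.allclose(two_layers, one_layer))  # True
```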
Multi-layer Perceptrons: Function Approximation

General idea: approximate a given function by a step function that a neural network can compute exactly.

Figure: a continuous function of x approximated by a step function with borders $x_1, \ldots, x_4$ and step values $y_0, \ldots, y_4$.
Multi-layer Perceptrons: Function Approximation

Figure: a four-layer perceptron computing the step function: the input x feeds four threshold units with thresholds $x_1, \ldots, x_4$ (weight 1 each); pairs of consecutive units feed threshold units with weights $2$ and $-2$ and threshold $1$, each of which detects one interval $[x_i, x_{i+1})$; an identity output neuron sums the interval detectors weighted with the step values $y_1, y_2, y_3$.
A neural network that computes the step function shown on the preceding slide.
Depending on the input value, only one step is active at any time.
The output neuron has the identity as its activation and output functions.
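A minimal Python sketch of this construction; the borders and step values are made up, and inputs outside $[x_1, x_4)$ map to 0, a boundary case the figure does not spell out:

```python
def tlu(weights, theta, inputs):
    return 1 if sum(w * v for w, v in zip(weights, inputs)) >= theta else 0

xs = [1.0, 2.0, 3.0, 4.0]   # hypothetical step borders x1..x4
ys = [0.3, 1.1, 0.8]        # hypothetical step values y1..y3

def step_network(x):
    h = [tlu([1], xi, [x]) for xi in xs]                       # layer 1: x >= x_i
    g = [tlu([2, -2], 1, [h[i], h[i + 1]]) for i in range(3)]  # layer 2: x in [x_i, x_(i+1))
    return sum(y * gi for y, gi in zip(ys, g))                 # identity output neuron

for x in [0.5, 1.5, 2.5, 3.5]:
    print(x, step_network(x))
```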
Multi-layer Perceptrons: Function Approximation

By using relative step heights, one layer can be saved.

Figure: the step function from before (left) decomposed into a sum of unit steps (right): at each border $x_i$ a step of relative height $\Delta y_i = y_i - y_{i-1}$ is added.
Multi-layer Perceptrons: Function Approximation

Figure: a three-layer perceptron computing the step function: the input x feeds four threshold units with thresholds $x_1, \ldots, x_4$ (weight 1 each); an identity output neuron sums their outputs weighted with the relative step heights $\Delta y_1, \ldots, \Delta y_4$.
A neural network that computes the step function shown on the preceding slide.
The output neuron has the identity as its activation and output functions.
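The saved layer shows in code: with relative heights, one comparison per border suffices (again with made-up borders and heights, and assuming a base level $y_0 = 0$):

```python
def step_network(x, xs, dys):
    # one threshold unit per border; identity output weighted with dy_i
    return sum(dy for xi, dy in zip(xs, dys) if x >= xi)

xs  = [1.0, 2.0, 3.0, 4.0]     # hypothetical borders x1..x4
dys = [0.3, 0.8, -0.3, 0.2]    # hypothetical relative heights dy_i = y_i - y_(i-1)

for x in [0.5, 1.5, 2.5, 3.5, 4.5]:
    print(x, step_network(x, xs, dys))
```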
Multi-layer Perceptrons: Function Approximation

By using semi-linear activation functions, the approximation can be improved.

Figure: the target function approximated by a piecewise linear function (left) decomposed into ramp components (right): each component of relative height $\Delta y_i$ now rises linearly around its border instead of jumping.
M u l t i - layer Perceptrons: Function A p p r o x i ma t i o n
θ1
1
∆x ∆y 1
1
∆x
θ2 ∆y 2
x id y
1
∆x θ3 ∆ y 3
1
∆x ∆y 4
xi
θi =
θ4 ∆x
∆ x = x i + 1 −x i
A neural network that computes the piecewise linear function shown on the preceding slide.
The output neuron has the identity as its activation and output functions.
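A sketch of the semi-linear variant (Python; the ramp activation follows the semi-linear definition from the sigmoid slide, with made-up, equally spaced borders and the same heights as above):

```python
def semi_linear(net, theta):
    # ramp from 0 to 1 between theta - 1/2 and theta + 1/2
    return min(1.0, max(0.0, (net - theta) + 0.5))

xs  = [1.0, 2.0, 3.0, 4.0]     # hypothetical borders, equally spaced
dys = [0.3, 0.8, -0.3, 0.2]    # hypothetical relative heights
dx  = xs[1] - xs[0]            # delta_x = x_(i+1) - x_i

def pwl_network(x):
    # hidden units: weight 1/dx, thresholds theta_i = x_i / dx; identity output
    return sum(dy * semi_linear(x / dx, xi / dx) for xi, dy in zip(xs, dys))

for x in [0.5, 1.5, 2.5, 3.5, 4.5]:
    print(x, pwl_network(x))
```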