
Artificial Neural Networks and Deep Learning
Contents

• Introduction
Motivation, Biological Background

• Threshold Logic Units
Definition, Geometric Interpretation, Limitations, Networks of TLUs, Training

• General Neural Networks
Structure, Operation, Training

• Multi-layer Perceptrons
Definition, Function Approximation, Gradient Descent, Backpropagation, Variants, Sensitivity Analysis

• Deep Learning
Many-layered Perceptrons, Rectified Linear Units, Auto-Encoders, Feature Construction, Image Analysis

• Radial Basis Function Networks
Definition, Function Approximation, Initialization, Training, Generalized Version

• Self-Organizing Maps
Definition, Learning Vector Quantization, Neighborhood of Output Neurons

• Hopfield Networks and Boltzmann Machines
Definition, Convergence, Associative Memory, Solving Optimization Problems, Probabilistic Models

• Recurrent Neural Networks
Differential Equations, Vector Networks, Backpropagation through Time

Multi-layer Perceptrons (MLPs)

Multi-layer Perceptrons

An r-layer perceptron is a neural network with a graph G = (U, C) that satisfies the following conditions:

(i) $U_{\mathrm{in}} \cap U_{\mathrm{out}} = \emptyset$,

(ii) $U_{\mathrm{hidden}} = U_{\mathrm{hidden}}^{(1)} \cup \cdots \cup U_{\mathrm{hidden}}^{(r-2)}$,
$\forall\, 1 \le i < j \le r-2:\ U_{\mathrm{hidden}}^{(i)} \cap U_{\mathrm{hidden}}^{(j)} = \emptyset$,

(iii) $C \subseteq \bigl(U_{\mathrm{in}} \times U_{\mathrm{hidden}}^{(1)}\bigr) \cup \Bigl(\bigcup_{i=1}^{r-3} U_{\mathrm{hidden}}^{(i)} \times U_{\mathrm{hidden}}^{(i+1)}\Bigr) \cup \bigl(U_{\mathrm{hidden}}^{(r-2)} \times U_{\mathrm{out}}\bigr)$

Multi-layer Perceptrons

Figure: general structure of a multi-layer perceptron — the inputs $x_1, \dots, x_n$ feed $U_{\mathrm{in}}$, which connects through the hidden layers $U_{\mathrm{hidden}}^{(1)}, \dots, U_{\mathrm{hidden}}^{(r-2)}$ to $U_{\mathrm{out}}$ with outputs $y_1, \dots, y_m$.

Multi-layer Perceptrons

• The network input function of each hidden neuron and of each output neuron is the weighted sum of its inputs, that is,

$\forall u \in U_{\mathrm{hidden}} \cup U_{\mathrm{out}}:\quad f_{\mathrm{net}}^{(u)}(\vec{w}_u, \vec{\mathrm{in}}_u) = \vec{w}_u^{\top}\vec{\mathrm{in}}_u = \sum_{v \in \mathrm{pred}(u)} w_{uv}\, \mathrm{out}_v.$

• The activation function of each hidden neuron is a so-called sigmoid function, that is, a monotonically increasing function

$f: \mathbb{R} \to [0, 1]$ with $\lim_{x \to -\infty} f(x) = 0$ and $\lim_{x \to \infty} f(x) = 1$.

• The activation function of each output neuron is either also a sigmoid function or a linear function, that is,

$f_{\mathrm{act}}(\mathrm{net}, \theta) = \alpha\,\mathrm{net} - \theta.$

Only the step function is a neurobiologically plausible activation function.
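
Read together, these two definitions describe one forward step per layer. The sketch below (Python/NumPy, with made-up layer sizes, weights, and thresholds) computes the weighted-sum network input and applies the logistic activation for a single hidden layer; all names are illustrative, not from the slides.

```python
import numpy as np

def logistic(net, theta):
    """Logistic activation: f_act(net, theta) = 1 / (1 + exp(-(net - theta)))."""
    return 1.0 / (1.0 + np.exp(-(net - theta)))

# Hypothetical layer: 3 predecessor neurons feeding 2 hidden neurons.
out_pred = np.array([0.2, 0.7, 1.0])      # outputs out_v of the predecessors pred(u)
W = np.array([[0.5, -1.0,  2.0],          # w_uv: one row per hidden neuron u
              [1.5,  0.3, -0.7]])
theta = np.array([0.1, -0.2])             # threshold of each hidden neuron

net = W @ out_pred                        # f_net: weighted sum over pred(u)
act = logistic(net, theta)                # f_act: sigmoid activation
print(act)
```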

Sigmoid Activation Functions

Figure: four unipolar sigmoid activation functions, each rising from 0 over 1/2 to 1 around the threshold θ:

• step function
• semi-linear function (linear ramp between θ − 1/2 and θ + 1/2)
• sine until saturation (rising between θ − π/2 and θ + π/2)
• logistic function:

$f_{\mathrm{act}}(\mathrm{net}, \theta) = \dfrac{1}{1 + e^{-(\mathrm{net} - \theta)}}$
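
Only the logistic formula survives in the slide text, so the sketch below (Python/NumPy) spells out plausible forms of all four functions; the step, semi-linear, and sine-until-saturation definitions are assumptions read off the plot titles and the tick marks θ ± 1/2 and θ ± π/2, not formulas given in the document.

```python
import numpy as np

def step(net, theta):
    # assumed form: jumps from 0 to 1 at net = theta
    return np.where(net >= theta, 1.0, 0.0)

def semi_linear(net, theta):
    # assumed form: linear ramp of width 1 centred at theta, clipped to [0, 1]
    return np.clip((net - theta) + 0.5, 0.0, 1.0)

def sine_until_saturation(net, theta):
    # assumed form: half sine wave between theta - pi/2 and theta + pi/2
    x = np.clip(net - theta, -np.pi / 2, np.pi / 2)
    return (np.sin(x) + 1.0) / 2.0

def logistic(net, theta):
    # given on the slide: 1 / (1 + exp(-(net - theta)))
    return 1.0 / (1.0 + np.exp(-(net - theta)))
```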

Sigmoid Activation Functions

• All sigmoid functions on the previous slide are unipolar, that is, they range from 0 to 1.

• Sometimes bipolar sigmoid functions are used (ranging from −1 to +1), like the hyperbolic tangent (tangens hyperbolicus).

hyperbolic tangent:

$f_{\mathrm{act}}(\mathrm{net}, \theta) = \tanh(\mathrm{net} - \theta) = \dfrac{e^{(\mathrm{net}-\theta)} - e^{-(\mathrm{net}-\theta)}}{e^{(\mathrm{net}-\theta)} + e^{-(\mathrm{net}-\theta)}} = \dfrac{1 - e^{-2(\mathrm{net}-\theta)}}{1 + e^{-2(\mathrm{net}-\theta)}} = \dfrac{2}{1 + e^{-2(\mathrm{net}-\theta)}} - 1$
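
As a quick sanity check (not part of the slides), the four expressions for the hyperbolic tangent above agree numerically:

```python
import numpy as np

net, theta = np.linspace(-3, 3, 7), 0.5
d = net - theta
f1 = np.tanh(d)
f2 = (np.exp(d) - np.exp(-d)) / (np.exp(d) + np.exp(-d))
f3 = (1 - np.exp(-2 * d)) / (1 + np.exp(-2 * d))
f4 = 2 / (1 + np.exp(-2 * d)) - 1
assert np.allclose(f1, f2) and np.allclose(f1, f3) and np.allclose(f1, f4)
```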

Multi-layer Perceptrons: Weight Matrices

Let $U_1 = \{v_1, \dots, v_m\}$ and $U_2 = \{u_1, \dots, u_n\}$ be the neurons of two consecutive layers of a multi-layer perceptron.

Their connection weights are represented by an $n \times m$ matrix

$W = \begin{pmatrix} w_{u_1 v_1} & \cdots & w_{u_1 v_m} \\ \vdots & & \vdots \\ w_{u_n v_1} & \cdots & w_{u_n v_m} \end{pmatrix},$

where $w_{u_i v_j} = 0$ if there is no connection from neuron $v_j$ to neuron $u_i$.

Advantage: The computation of the network input can be written as

$\vec{\mathrm{net}}_{U_2} = W \cdot \vec{\mathrm{in}}_{U_2} = W \cdot \vec{\mathrm{out}}_{U_1}$

where $\vec{\mathrm{net}}_{U_2} = (\mathrm{net}_{u_1}, \dots, \mathrm{net}_{u_n})^{\top}$ and $\vec{\mathrm{in}}_{U_2} = \vec{\mathrm{out}}_{U_1} = (\mathrm{out}_{v_1}, \dots, \mathrm{out}_{v_m})^{\top}$.
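
In code, this matrix form is a single matrix-vector product. A minimal sketch with arbitrary example numbers (m = 3 neurons in U1, n = 2 neurons in U2):

```python
import numpy as np

# W[i, j] holds the weight from neuron v_j (layer U1) to neuron u_i (layer U2).
W = np.array([[0.5, 0.0, -1.0],      # zero entry: no connection from v_2 to u_1
              [2.0, 1.0,  0.3]])
out_U1 = np.array([1.0, 0.5, -0.2])  # output vector of layer U1

net_U2 = W @ out_U1                  # net_U2 = W · out_U1
print(net_U2)
```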

Multi-layer Perceptrons: Biimplication

Solving the biimplication problem with a multi-layer perceptron.

Figure: the inputs x1 and x2 (layer U_in) feed two hidden neurons with threshold −1, via weights −2/2 for x1 and 2/−2 for x2; both hidden neurons connect with weight 2 to the output neuron y with threshold 3 (layer U_out).

Note the additional input neurons compared to the TLU solution.
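
The weights can be checked directly. A small sketch that reads the values from the figure above (hidden thresholds −1, input weights ±2, output threshold 3, output weights 2) and, for simplicity, uses step-function activations instead of sigmoids:

```python
import numpy as np

def tlu(weights, theta, inputs):
    """Threshold unit: fires (1) iff the weighted input sum reaches theta."""
    return int(np.dot(weights, inputs) >= theta)

def biimplication_mlp(x1, x2):
    h1 = tlu([-2, 2], -1, [x1, x2])   # computes x1 -> x2
    h2 = tlu([ 2, -2], -1, [x1, x2])  # computes x2 -> x1
    return tlu([2, 2], 3, [h1, h2])   # conjunction of the two implications

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, biimplication_mlp(x1, x2))  # 1 exactly when x1 == x2
```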

Multi-layer Perceptrons: Fredkin Gate

Figure: the Fredkin gate symbol — the control line s is passed through unchanged, while the data lines x1 and x2 produce the outputs y1 and y2.

Truth table of the Fredkin gate:

s  : 0 0 0 0 1 1 1 1
x1 : 0 0 1 1 0 0 1 1
x2 : 0 1 0 1 0 1 0 1
y1 : 0 0 1 1 0 1 0 1
y2 : 0 1 0 1 0 0 1 1

Figure: the functions y1 and y2 drawn as unit cubes over the three inputs.
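
Reading the truth table, s = 0 passes the data lines through while s = 1 swaps them. A short check of that reading against the table (a sketch, not from the slides):

```python
def fredkin(s, x1, x2):
    """Fredkin (controlled swap) gate as read from the truth table above."""
    if s == 0:
        return x1, x2          # pass through
    return x2, x1              # swap

# columns of the truth table on this slide: (s, x1, x2, y1, y2)
table = [
    (0, 0, 0, 0, 0), (0, 0, 1, 0, 1), (0, 1, 0, 1, 0), (0, 1, 1, 1, 1),
    (1, 0, 0, 0, 0), (1, 0, 1, 1, 0), (1, 1, 0, 0, 1), (1, 1, 1, 1, 1),
]
assert all(fredkin(s, x1, x2) == (y1, y2) for s, x1, x2, y1, y2 in table)
```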

Multi-layer Perceptrons: Fredkin Gate

Figure: a multi-layer perceptron computing the Fredkin gate — inputs s, x1, x2 in U_in, a hidden layer of threshold units in U_hidden, and the output neurons y1 and y2 in U_out; the visible connection weights are 2 and −2, the visible thresholds are 1 and 3.

Why Non-linear Activation Functions?

With weight matrices we have for two consecutive layers $U_1$ and $U_2$

$\vec{\mathrm{net}}_{U_2} = W \cdot \vec{\mathrm{in}}_{U_2} = W \cdot \vec{\mathrm{out}}_{U_1}.$

If the activation functions are linear, that is,

$f_{\mathrm{act}}(\mathrm{net}, \theta) = \alpha\,\mathrm{net} - \theta,$

the activations of the neurons in the layer $U_2$ can be computed as

$\vec{\mathrm{act}}_{U_2} = D_{\mathrm{act}} \cdot \vec{\mathrm{net}}_{U_2} - \vec{\theta},$

where

• $\vec{\mathrm{act}}_{U_2} = (\mathrm{act}_{u_1}, \dots, \mathrm{act}_{u_n})^{\top}$ is the activation vector,
• $D_{\mathrm{act}}$ is an $n \times n$ diagonal matrix of the factors $\alpha_{u_i}$, $i = 1, \dots, n$, and
• $\vec{\theta} = (\theta_{u_1}, \dots, \theta_{u_n})^{\top}$ is a bias vector.

Why Non-linear Activation Functions?

If the output function is also linear, it is analogously

$\vec{\mathrm{out}}_{U_2} = D_{\mathrm{out}} \cdot \vec{\mathrm{act}}_{U_2} - \vec{\xi},$

where

• $\vec{\mathrm{out}}_{U_2} = (\mathrm{out}_{u_1}, \dots, \mathrm{out}_{u_n})^{\top}$ is the output vector,
• $D_{\mathrm{out}}$ is again an $n \times n$ diagonal matrix of factors, and
• $\vec{\xi} = (\xi_{u_1}, \dots, \xi_{u_n})^{\top}$ is a bias vector.

Combining these computations we get

$\vec{\mathrm{out}}_{U_2} = D_{\mathrm{out}} \cdot D_{\mathrm{act}} \cdot W \cdot \vec{\mathrm{out}}_{U_1} - D_{\mathrm{out}} \cdot \vec{\theta} - \vec{\xi}$

and thus

$\vec{\mathrm{out}}_{U_2} = A_{12} \cdot \vec{\mathrm{out}}_{U_1} + \vec{b}_{12}$

with an $n \times m$ matrix $A_{12}$ and an $n$-dimensional vector $\vec{b}_{12}$.

Why Non-linear Activation Functions?

Therefore we have

$\vec{\mathrm{out}}_{U_2} = A_{12} \cdot \vec{\mathrm{out}}_{U_1} + \vec{b}_{12}$

and

$\vec{\mathrm{out}}_{U_3} = A_{23} \cdot \vec{\mathrm{out}}_{U_2} + \vec{b}_{23}$

for the computations of two consecutive layers $U_2$ and $U_3$.

These two computations can be combined into

$\vec{\mathrm{out}}_{U_3} = A_{13} \cdot \vec{\mathrm{out}}_{U_1} + \vec{b}_{13},$

where $A_{13} = A_{23} \cdot A_{12}$ and $\vec{b}_{13} = A_{23} \cdot \vec{b}_{12} + \vec{b}_{23}$.

Result: With linear activation and output functions any multi-layer perceptron can be reduced to a two-layer perceptron.
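
A small numerical sketch of this reduction (arbitrary random matrices, NumPy): composing two affine layers is exactly one affine layer with A13 = A23 · A12 and b13 = A23 · b12 + b23.

```python
import numpy as np

rng = np.random.default_rng(0)
A12, b12 = rng.normal(size=(4, 3)), rng.normal(size=4)   # layer U1 -> U2
A23, b23 = rng.normal(size=(2, 4)), rng.normal(size=2)   # layer U2 -> U3

out_U1 = rng.normal(size=3)
out_U3_two_layers = A23 @ (A12 @ out_U1 + b12) + b23

A13, b13 = A23 @ A12, A23 @ b12 + b23                    # collapsed single layer
out_U3_one_layer = A13 @ out_U1 + b13

assert np.allclose(out_U3_two_layers, out_U3_one_layer)
```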

Multi-layer Perceptrons: Function Approximation

• Up to now: representing and learning Boolean functions $f: \{0,1\}^n \to \{0,1\}$.

• Now: representing and learning real-valued functions $f: \mathbb{R}^n \to \mathbb{R}$.

General idea of function approximation:


• Approximate a given function by a step function.
• Construct a neural network that computes the step function.

Figure: a step function over x with borders $x_1, \dots, x_4$ and levels $y_0, \dots, y_4$.

Multi-layer Perceptrons: Function Approximation

Figure: a four-layer perceptron for the step function — the input x feeds four threshold units with thresholds $x_1, \dots, x_4$ (input weight 1 each); these feed a second hidden layer via weights 2 and −2 (thresholds 1), whose units indicate the interval the input falls into; the output neuron (identity activation, "id") weights them with the step heights $y_1, y_2, y_3$. The step function with levels $y_0, \dots, y_4$ is shown alongside.

A neural network that computes the step function shown on the preceding slide.
According to the input value only one step is active at any time.
The output neuron has the identity as its activation and output functions.
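
Below is a sketch of such a network in code, under the assumption (consistent with the figure and caption) that the first hidden layer holds one threshold unit per border x_i, the second hidden layer detects the interval [x_i, x_{i+1}) via weights +2/−2 and threshold 1, and the output neuron sums the step heights with the identity activation; all function and variable names are made up.

```python
import numpy as np

def heaviside(z):
    return (z >= 0).astype(float)

def step_approx(x, borders, heights):
    """Four-layer MLP sketch: borders x_1..x_k, heights y_i on [x_i, x_{i+1})."""
    x = np.asarray(x, dtype=float)
    # hidden layer 1: one threshold unit per border, fires iff x >= x_i
    h1 = heaviside(x[:, None] - np.asarray(borders)[None, :])
    # hidden layer 2: unit i fires iff x lies in [x_i, x_{i+1}) (weights +2/-2, threshold 1)
    h2 = heaviside(2 * h1[:, :-1] - 2 * h1[:, 1:] - 1)
    # output neuron: identity activation, weighted sum of the step heights
    return h2 @ np.asarray(heights)

xs = np.linspace(0, 5, 11)
print(step_approx(xs, borders=[1, 2, 3, 4], heights=[0.5, 2.0, 1.0]))
```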

Multi-layer Perceptrons: Function Approximation

Theorem: Any Riemann-integrable function can be approximated with arbitrary accuracy by a four-layer perceptron.

• But: Error is measured as the area between the functions.

• More sophisticated mathematical examination allows a stronger assertion:
With a three-layer perceptron any continuous function can be approximated with arbitrary accuracy (error: maximum function value difference).

Multi-layer Perceptrons: Function Approximation

Figure: left, the step function with absolute levels $y_0, \dots, y_4$ at the borders $x_1, \dots, x_4$; right, the same function decomposed into relative step heights $\Delta y_1, \dots, \Delta y_4$, each realized by a single step from 0 to $\Delta y_i$.

By using relative step heights one layer can be saved.
Multi-layer Perceptrons: Function Approximation

Figure: a three-layer perceptron computing the step function — one threshold unit per border $x_1, \dots, x_4$ (input weight 1 each) and an output neuron with the identity activation ("id") whose weights are the relative step heights $\Delta y_1, \dots, \Delta y_4$.

A neural network that computes the step function shown on the preceding slide.
The output neuron has the identity as its activation and output functions.
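
The corresponding sketch for this three-layer version: each border x_i gets one threshold unit, and the identity output neuron weights it with the relative height Δy_i, so all units that fire for a given x add up to the absolute level (names are illustrative, not from the slides).

```python
import numpy as np

def step_approx_relative(x, borders, deltas):
    """Three-layer sketch: one threshold unit per border x_i, output weight Δy_i."""
    x = np.asarray(x, dtype=float)
    h = (x[:, None] >= np.asarray(borders)[None, :]).astype(float)  # step units
    return h @ np.asarray(deltas)   # identity output neuron: sum of fired heights

# cumulative levels y_i are recovered because all units with x_j <= x fire at once
print(step_approx_relative(np.linspace(0, 5, 11),
                           borders=[1, 2, 3, 4], deltas=[0.5, 1.5, -1.0, 0.8]))
```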

Multi-layer Perceptrons: Function Approximation

Figure: the same target function and its decomposition into relative heights $\Delta y_1, \dots, \Delta y_4$ at the borders $x_1, \dots, x_4$, now realized by ramps from 0 to $\Delta y_i$ instead of jumps.

By using semi-linear functions the approximation can be improved.

Multi-layer Perceptrons: Function Approximation

Figure: a three-layer perceptron with semi-linear hidden units — the input x is connected to each hidden unit with weight $1/\Delta x$, the hidden thresholds are $\theta_i = x_i/\Delta x$ with $\Delta x = x_{i+1} - x_i$, and the output neuron (identity activation, "id") weights the hidden outputs with $\Delta y_1, \dots, \Delta y_4$.

A neural network that computes the step function shown on the preceding slide.
The output neuron has the identity as its activation and output functions.
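
A sketch of this last variant, assuming the semi-linear activation from the earlier slide (a ramp of width 1 around θ); the input weight 1/Δx and the thresholds θ_i = x_i/Δx are taken from the figure, everything else is illustrative.

```python
import numpy as np

def semi_linear(net, theta):
    # assumed ramp of width 1 around theta, clipped to [0, 1]
    return np.clip((net - theta) + 0.5, 0.0, 1.0)

def piecewise_linear_approx(x, borders, deltas, dx):
    """Sketch of the slide's network: input weight 1/Δx, θ_i = x_i/Δx, output weights Δy_i."""
    x = np.asarray(x, dtype=float)
    net = x[:, None] / dx                                  # weighted network input (1/Δx)·x
    thetas = np.asarray(borders)[None, :] / dx             # θ_i = x_i / Δx
    return semi_linear(net, thetas) @ np.asarray(deltas)   # identity output neuron

print(piecewise_linear_approx(np.linspace(0, 5, 11),
                              borders=[1, 2, 3, 4], deltas=[0.5, 1.5, -1.0, 0.8], dx=1.0))
```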
