What Is Neural Network Technology?
• Introduction
  - What is a neural network?
  - What is learning?
  - Symbolic learning vs. neural-net learning
• Training a Perceptron
  - Gradient Descent Method
  - Widrow-Hoff Procedure
  - Generalized Delta Procedure
  - Error Correction Procedure
What Makes Up A Neural Network?
<Neuron Model> Dendrites (input) carry the signals x_i through synapses with weights w_i to the soma, which sums Σ_{i=1}^{N} w_i x_i and applies a threshold T; the axon carries the output.
Neural network: memory and processing elements are the same; seeks an answer by finding minima in a solution space.
Symbolic (conventional) computer: memory and processing are separate; seeks an answer by following a logical tree structure.
Inductive learning: learning from examples
• Representation of functions
  - expressiveness: a single perceptron cannot learn XOR
  - efficiency: the number of examples needed for good generalization
  - 'a good set of sentences'
Learning procedure
1. Collect a large set of examples.
2. Divide it into two disjoint sets: a training set and a test set.
3. Use the learning algorithm, with the training set as examples, to generate a hypothesis H.
4. Measure the percentage of examples in the test set that are correctly classified by H.
5. Repeat steps 1-4 for different sizes of training sets and for different randomly selected training sets of each size (a code sketch of this procedure follows below).
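A minimal Python sketch of steps 1-5, assuming examples are (input, desired-output) pairs and that learn() is some hypothetical learning algorithm returning a callable hypothesis H; none of these names come from the slides.

import random

def evaluate(learn, examples, train_fraction=0.7, seed=0):
    """Steps 1-4: split examples into disjoint training and test sets,
    learn a hypothesis H from the training set, and measure test accuracy."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(train_fraction * len(shuffled))
    train, test = shuffled[:cut], shuffled[cut:]    # two disjoint sets
    H = learn(train)                                # step 3: generate hypothesis H
    correct = sum(1 for x, d in test if H(x) == d)  # step 4: correct classifications
    return correct / len(test)

def learning_curve(learn, examples, sizes, trials=5):
    """Step 5: repeat for several training-set sizes (each smaller than
    len(examples)) and several random splits of each size."""
    return {
        n: sum(evaluate(learn, examples, n / len(examples), seed=t)
               for t in range(trials)) / trials
        for n in sizes
    }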
Introduction:
• Knowledge
  - a function f : X → a from sensory data to proper action
• Representation
  - 1. a single TLU with adjustable weights
Training Single TLU
• TLU geometry
  - TLU definition: an abstract computation tool used as an internal knowledge representation
    • input: X,  weight: W,  threshold: θ,  transfer function: f_s(s)
    • output: f(X) = f_s(W ⋅ X − θ)
  - <Figure: the transfer function f_s(s) steps from 0 to 1; the hyperplane W ⋅ X − θ = 0 separates the inputs that produce output 1 from those that produce output 0, with Y = W ⋅ X − θ.>
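A minimal sketch of the TLU just defined, with a hard 0/1 step as the transfer function f_s; the class name and the AND example are illustrative choices, not taken from the slides.

import numpy as np

class TLU:
    """Threshold Logic Unit: output f(X) = f_s(W·X − θ) with a 0/1 step transfer."""
    def __init__(self, weights, threshold):
        self.W = np.asarray(weights, dtype=float)   # weight vector W
        self.theta = float(threshold)               # threshold θ

    def __call__(self, X):
        s = self.W @ np.asarray(X, dtype=float) - self.theta  # s = W·X − θ
        return 1 if s >= 0 else 0                              # step transfer f_s

# Example: a TLU computing logical AND of two inputs.
and_tlu = TLU(weights=[1.0, 1.0], threshold=1.5)
print([and_tlu(x) for x in ([0, 0], [0, 1], [1, 0], [1, 1])])  # [0, 0, 0, 1]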
Training Single TLU:
– 3.2.4 The Widrow-Hoff Procedure
  • Using the linear transfer function f_s(s) ≡ s:
      ∂ε/∂W = −2 (d − f) X
      W' ← W − ½ c ∂ε/∂W,  i.e.  W' ← W + c (d − f) X
– The Generalized Delta Procedure
  • Using the sigmoid transfer function f_s(s) ≡ 1 / (1 + e^(−s)), whose derivative is f′ = f (1 − f):
      ∂ε/∂W = −2 (d − f) f (1 − f) X
      W' ← W − ½ c ∂ε/∂W,  i.e.  W' ← W + c (d − f) f (1 − f) X
– The Error Correction Procedure
  • With a threshold (0/1) output, the same form of update is used:  W' ← W + c (d − f) X
  • Known theorem: if the training set is linearly separable, the error-correction procedure converges to a separating weight vector in a finite number of steps.
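A sketch of the delta-rule updates above for a single sigmoid TLU, using the augmented-vector trick from the next slide (W' = (w_1, …, w_n, −θ), X' = (x_1, …, x_n, 1)); the learning rate c, epoch count, and function names are illustrative choices.

import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def train_delta_rule(examples, n_inputs, c=0.1, epochs=100, seed=0):
    """Generalized delta procedure for a single sigmoid TLU.
    Applies the incremental update  W ← W + c (d − f) f (1 − f) X."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=n_inputs + 1)             # last component is −θ
    for _ in range(epochs):
        for X, d in examples:
            Xa = np.append(np.asarray(X, dtype=float), 1.0)  # augmented input (…, 1)
            f = sigmoid(W @ Xa)                               # TLU output
            W = W + c * (d - f) * f * (1.0 - f) * Xa          # delta-rule step
    return W

# For the Widrow-Hoff (linear) procedure, drop the f(1 − f) factor:
#   W = W + c * (d - f) * Xa   with   f = W @ Xa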
Training Single TLU:
– Gradient computation
  • Augmented weight and input vectors absorb the threshold:
      s = W ⋅ X − θ = (w_1, w_2, …, w_n) ⋅ (x_1, x_2, …, x_n) − θ
        = W' ⋅ X' = (w_1, w_2, …, w_n, −θ) ⋅ (x_1, x_2, …, x_n, 1)
  • Squared error over the training set Ξ, with TLU output f_i = f_s(W ⋅ X_i) and desired output d_i:
      ε = Σ_{X_i ∈ Ξ} (d_i − f_i)²
  • Gradient with respect to the (augmented) weight vector:
      ∂ε/∂W ≝ (∂ε/∂w_1, …, ∂ε/∂w_i, …, ∂ε/∂w_{n+1})
  • For a single example, ε = (d − f)² = (d − f_s(W ⋅ X))² with s = W ⋅ X, and by the chain rule
      ∂ε/∂W = (∂ε/∂s)(∂s/∂W),   ∂s/∂W = ∂(W ⋅ X)/∂W = X
      ⇒ ∂ε/∂W = (∂ε/∂s) X
      ∂ε/∂s = ∂(d − f_s)²/∂s = −2 (d − f_s) ∂f_s/∂s
      ⇒ ∂ε/∂W = −2 (d − f) (∂f/∂s) X
  • Case 1) f_s(s) ≡ s:  ∂f_s/∂s = 1,  so  ∂ε/∂W = −2 (d − f) X
  • Case 2) f_s(s) ≡ 1 / (1 + e^(−s)):  ∂f_s/∂s = f_s (1 − f_s),  so  ∂ε/∂W = −2 (d − f) f (1 − f) X
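To sanity-check the case-2 result, the analytic gradient −2 (d − f) f (1 − f) X can be compared against a finite-difference estimate; this check is illustrative and not part of the slides, and the particular W and X values are arbitrary.

import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def error(W, X, d):
    return (d - sigmoid(W @ X)) ** 2               # ε = (d − f_s(W·X))²

def analytic_grad(W, X, d):
    f = sigmoid(W @ X)
    return -2.0 * (d - f) * f * (1.0 - f) * X      # −2 (d − f) f (1 − f) X

def numeric_grad(W, X, d, h=1e-6):
    """Central-difference estimate of ∂ε/∂W, component by component."""
    g = np.zeros_like(W)
    for i in range(len(W)):
        dW = np.zeros_like(W)
        dW[i] = h
        g[i] = (error(W + dW, X, d) - error(W - dW, X, d)) / (2 * h)
    return g

W = np.array([0.4, -0.2, 0.7])
X = np.array([1.0, 2.0, -1.0])
print(np.allclose(analytic_grad(W, X, 1.0), numeric_grad(W, X, 1.0), atol=1e-6))  # True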
Training Single TLU:
– Example problem: homework!
• Networking TLUs
  – Feedforward net: there is no cycle in the net; the output value depends only on the input values.
  – Recurrent net: there are cycles in the net; the output value depends on the input and on its history.
  * Layers of a net: groups of TLUs that take input from, and send output to, TLUs in other groups.
  * Example function: f = x1·x̄2 + x̄1·x2 (XOR), realized by a layered net of TLUs; such a net is sometimes called a two-layer network (a sketch follows below).
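A minimal sketch of a feedforward two-layer net of hard-threshold TLUs computing f = x1·x̄2 + x̄1·x2; the particular weights and thresholds are one illustrative choice, not taken from the slides.

def tlu(weights, theta, x):
    """Single TLU: step(W·x − θ) with 0/1 output."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) - theta >= 0 else 0

def xor_net(x1, x2):
    # Hidden layer: h1 fires for (x1 AND NOT x2), h2 fires for (NOT x1 AND x2).
    h1 = tlu([1.0, -1.0], 0.5, [x1, x2])
    h2 = tlu([-1.0, 1.0], 0.5, [x1, x2])
    # Output layer: OR of the two hidden TLUs.
    return tlu([1.0, 1.0], 0.5, [h1, h2])

print([xor_net(a, b) for a, b in ((0, 0), (0, 1), (1, 0), (1, 1))])  # [0, 1, 1, 0]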
Training Single TLU:
– 3.3.2 Notation
  • j-th layer output vector: X^(j)
  • Input vector: X^(0) = input
  • Final layer output vector: X^(k) = f
  • Gradient with respect to the weight vector of the i-th TLU in layer j:
      ∂ε/∂W_i^(j) ≝ (∂ε/∂w_{1,i}^(j), …, ∂ε/∂w_{l,i}^(j), …, ∂ε/∂w_{m_{j−1}+1,i}^(j))
• Using the activation variable s_i^(j) = X^(j−1) ⋅ W_i^(j) and the chain rule:
      ∂ε/∂W_i^(j) = (∂ε/∂s_i^(j)) (∂s_i^(j)/∂W_i^(j)),   ∂s_i^(j)/∂W_i^(j) = X^(j−1)
      ⇒ ∂ε/∂W_i^(j) = (∂ε/∂s_i^(j)) X^(j−1)
• Using the derivative of the squared error (and, for the sigmoid, of the transfer function):
      ∂ε/∂s_i^(j) = ∂(d − f)²/∂s_i^(j) = −2 (d − f) ∂f/∂s_i^(j)
      ⇒ ∂ε/∂W_i^(j) = −2 (d − f) (∂f/∂s_i^(j)) X^(j−1)
• Using a new variable δ_i^(j) (activation-error influence):
      δ_i^(j) = (d − f) ∂f/∂s_i^(j)
      ∂ε/∂W_i^(j) = −2 δ_i^(j) ⋅ X^(j−1),   ∂ε/∂s_i^(j) = −2 δ_i^(j)
• A new weight update rule (gradient descent), from W' ← W − ½ c ∂ε/∂W:
      W_i^(j) ← W_i^(j) + c_i^(j) δ_i^(j) X^(j−1)
Training Single TLU:
• By definition,
      δ_i^(j) = (d − f) ∂f/∂s_i^(j)
  and for the final (k-th) layer,
      δ^(k) = (d − f) ∂f/∂s^(k)
• Since f is the sigmoid function of s^(k), f = sigmoid(s^(k)) and ∂f/∂s^(k) = f (1 − f), so
      δ^(k) = (d − f) f (1 − f)
• So the backpropagation weight adjustment rule for the single TLU in the final layer is
      W^(k) ← W^(k) + c^(k) (d − f) f (1 − f) X^(k−1)
• For a hidden layer, δ_i^(j) = −½ ∂ε/∂s_i^(j), and applying the chain rule over the m_{j+1} TLUs of the next layer:
      ∂f/∂s_i^(j) = Σ_{l=1}^{m_{j+1}} (∂f/∂s_l^(j+1)) (∂s_l^(j+1)/∂s_i^(j))
      ⇒ δ_i^(j) = Σ_{l=1}^{m_{j+1}} δ_l^(j+1) (∂s_l^(j+1)/∂s_i^(j))
• Using the relationship between activations, s_l^(j+1) = X^(j) ⋅ W_l^(j+1) = Σ_v f_v^(j) ⋅ w_{v,l}^(j+1), so
      ∂s_l^(j+1)/∂s_i^(j) = Σ_v w_{v,l}^(j+1) ⋅ ∂f_v^(j)/∂s_i^(j) = w_{i,l}^(j+1) ⋅ ∂f_i^(j)/∂s_i^(j)
  (since ∂f_v^(j)/∂s_i^(j) = 0 if i ≠ v), and ∂f_i^(j)/∂s_i^(j) = f_i^(j) (1 − f_i^(j)) for the sigmoid, giving
      δ_i^(j) = f_i^(j) (1 − f_i^(j)) Σ_{l=1}^{m_{j+1}} δ_l^(j+1) w_{i,l}^(j+1)
• The same weight update rule applies in every layer (from W' ← W − ½ c ∂ε/∂W):
      W_i^(j) ← W_i^(j) + c_i^(j) δ_i^(j) X^(j−1)
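A compact sketch of these update rules for a two-layer sigmoid network trained one example at a time; the network shape, learning rate, and function names are illustrative assumptions, not from the slides. Thresholds are omitted here; as on the earlier slide, they can be folded in by augmenting each input vector with a constant 1.

import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def backprop_step(W1, W2, X, d, c=0.5):
    """One gradient step for a 2-layer sigmoid net with a single output TLU.
    W1: (n_hidden, n_in) weights of layer 1;  W2: (n_hidden,) weights of layer 2."""
    # Forward pass: layer activations and outputs.
    f1 = sigmoid(W1 @ X)             # X^(1), hidden-layer outputs
    f = sigmoid(W2 @ f1)             # X^(2) = f, final output
    # Output-layer delta:  δ^(k) = (d − f) f (1 − f)
    delta2 = (d - f) * f * (1.0 - f)
    # Hidden-layer deltas:  δ_i^(j) = f_i (1 − f_i) Σ_l δ_l^(j+1) w_{i,l}^(j+1)
    delta1 = f1 * (1.0 - f1) * (delta2 * W2)
    # Weight updates:  W_i^(j) ← W_i^(j) + c δ_i^(j) X^(j−1)
    W2 = W2 + c * delta2 * f1
    W1 = W1 + c * np.outer(delta1, X)
    return W1, W2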
Recursive equation for ∂ε/∂W_i^(j): dynamic programming to calculate ∂ε/∂W_i^(j)
<Figure: layers j−1, j, and j+1. The error signal d − f enters at the output; each δ_l^(j+1) in layer j+1 is propagated back through the weights w_{i,l}^(j+1) and activations s_l^(j+1) to give δ_i^(j) in layer j, and the gradient for the weights feeding TLU i of layer j is ∂ε/∂W_i^(j) = −2 δ_i^(j) ⋅ X^(j−1), used in the update W' ← W − ½ c ∂ε/∂W.>
Hopfield Net
• Appropriate when exact binary representations are possible.
• Can be used as an associative memory or to solve optimization problems.
• As an associative memory, the Hopfield net has a limitation:
  the number of stored classes (M) must be kept smaller than
  about 0.15 times the number of nodes (N).
Hopfield Neural Net
<Figure: N nodes; inputs x_0 … x_{N−1} applied at time zero, outputs x'_0 … x'_{N−1} valid after convergence.>
A Hopfield neural net that can be used as a content-addressable memory. An
unknown binary input pattern is applied at time zero, and the net then iterates
until convergence, when the node outputs remain unchanged. The output is the
pattern produced by the node outputs after convergence.
Connection weights:
      T_ij = Σ_{s=0}^{M−1} x_i^s x_j^s   for i ≠ j,      T_ij = 0   for i = j
where T_ij is the connection weight from node i to node j, and
x_i^s ∈ {+1, −1} is the i-th element of the exemplar for class s.
Hopfield Net Algorithm
• Iterate until convergence:
      m_i(t + 1) = F_h( Σ_{j=0}^{N−1} T_ij m_j(t) )
  where F_h is the hard-limiting nonlinearity (output +1 or −1) and m_i(0) is set to the unknown input pattern x_i at time zero.
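A small sketch of the content-addressable memory just described, following T_ij = Σ_s x_i^s x_j^s and the hard-limited update; the exemplar pattern, the synchronous update, and the convergence loop are illustrative choices.

import numpy as np

def hopfield_weights(exemplars):
    """T_ij = Σ_s x_i^s x_j^s for i ≠ j, and T_ii = 0; exemplars use ±1 elements."""
    X = np.asarray(exemplars, dtype=float)          # shape (M, N)
    T = X.T @ X
    np.fill_diagonal(T, 0.0)
    return T

def hopfield_recall(T, x, max_iters=100):
    """Apply the unknown ±1 pattern at time zero and iterate
    m_i(t+1) = F_h(Σ_j T_ij m_j(t)) until the outputs stop changing."""
    m = np.asarray(x, dtype=float)
    for _ in range(max_iters):
        m_next = np.where(T @ m >= 0, 1.0, -1.0)    # hard limiter F_h
        if np.array_equal(m_next, m):
            break
        m = m_next
    return m

# Store one 8-bit exemplar and recall it from a corrupted copy (one flipped bit).
stored = [[1, -1, 1, -1, 1, -1, 1, -1]]
T = hopfield_weights(stored)
noisy = [1, -1, 1, -1, 1, -1, 1, 1]
print(hopfield_recall(T, noisy))                    # recovers the stored exemplar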
Hamming Net
• Optimum minimum-error classifier:
  calculates the Hamming distance to the exemplar for each class and
  selects the class with the minimum Hamming distance.
• Connection count: N² for the Hopfield net vs. NM + M² = M(N + M) for the Hamming net.
  For N = 100, M = 10: 10,000 vs. 1,100 ≈ NM (1,000), since N ≫ M.
Hamming Net
• Network structure
<Figure: the lower subnet, with weights w_ij, calculates the matching scores from the inputs x_0 … x_{N−1} (data, applied at time zero); the upper subnet (MAXNET, weights t_kl) picks the maximum and yields the outputs y_0 … y_{M−1} (class), valid after MAXNET converges.>
Hamming Net Algorithm
• Step 1. Assign connection weights and offsets.
  In the lower subnet:
      w_ij = x_i^j / 2,   θ_j = N / 2,   0 ≤ i ≤ N − 1,  0 ≤ j ≤ M − 1
  In the upper subnet:
      t_kl = 1  if k = l;   t_kl = −ε  if k ≠ l,  with ε < 1/M,   0 ≤ k, l ≤ M − 1
  w_ij : connection weight from input i to node j in the lower subnet
  t_kl : connection weight from node k to node l in the upper subnet (MAXNET)
• Step 2. Initialize with the unknown input pattern: the lower subnet computes each node's matching score through the threshold-logic nonlinearity f_t.
• Step 3. Iterate the MAXNET until convergence; only the output node for the class with the maximum matching score remains on.
• Step 4. Go to step 2 (a code sketch follows below).
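A sketch of the lower-subnet matching-score computation and a MAXNET-style winner selection, assuming ±1 inputs; the score formula used here (counting agreements directly) and the mutual-inhibition constant are illustrative choices consistent with the weights of step 1.

import numpy as np

def hamming_scores(exemplars, x):
    """Lower subnet: matching score for each class = number of bits of the ±1 input
    x that agree with that class exemplar (i.e. N minus the Hamming distance),
    which equals Σ_i (x_i^j / 2) x_i + N/2."""
    X = np.asarray(exemplars, dtype=float)          # shape (M, N)
    x = np.asarray(x, dtype=float)
    return (X @ x + X.shape[1]) / 2.0

def maxnet(scores, eps=None, max_iters=1000):
    """Upper subnet (MAXNET): each node inhibits the others by eps < 1/M until
    only the node with the maximum matching score stays positive."""
    mu = np.asarray(scores, dtype=float)
    M = len(mu)
    eps = eps if eps is not None else 1.0 / (M + 1)
    for _ in range(max_iters):
        mu_next = np.maximum(mu - eps * (mu.sum() - mu), 0.0)  # threshold logic f_t
        if np.array_equal(mu_next, mu):
            break
        mu = mu_next
    return int(np.argmax(mu))                       # index of the winning class

exemplars = [[1, 1, 1, -1, -1, -1], [-1, -1, -1, 1, 1, 1]]
print(maxnet(hamming_scores(exemplars, [1, 1, -1, -1, -1, -1])))  # class 0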