
ASSOCIATIVE MEMORY NETWORKS
PATTERN ASSOCIATION
 Associating patterns which are
• similar,
• contrary,
• in close proximity (spatial),
• in close succession (temporal).

 Associative recall
• evoke associated patterns,
• recall a pattern by part of it,
• evoke/recall with incomplete/noisy patterns.
ASSOCIATIVE MEMORY (AM) NETWORK
 Two types of associations exist. For two patterns s and t:
• hetero-association (s ≠ t): relating two different patterns (s – input, t – target).
• auto-association (s = t): relating parts of a pattern with other parts.

 Architectures of NN associative memory:
• single layer (with or without an input layer),
• two layers (for bidirectional association).

 Learning algorithms for AM:
• Hebbian learning rule and its variations,
• gradient descent.
ASSOCIATIVE MEMORY NETWORK
 WORKING PROCESS
• Recall a stored pattern from a noisy input pattern.
• Use the weights that capture the association.
• Stored patterns are viewed as “attractors”, each with its own “attraction basin”.
• This type of NN is often called “associative memory” (recall is by association, not by explicit indexing/addressing).
TRAINING ALGORITHM FOR ASSOCIATIVE MEMORY NETWORK
 Network structure: single layer
• one output layer of non-linear units and one input layer.

[Figure: single-layer network mapping inputs x_1 … x_n (pattern s_1 … s_n) through weights w_11 … w_nm to outputs y_1 … y_m (targets t_1 … t_m).]
 Goal of learning:
• to obtain a set of weights w_ij from a set of training pattern pairs {s:t} such that when s is applied to the input layer, t is computed at the output layer,
• i.e., for all training pairs s:t, t_j = f(s^T w_j) for all j.
HEBB RULE FOR PATTERN ASSOCIATION
 Algorithm (bipolar or binary patterns):
• For each training sample s:t:  Δw_ij = s_i · t_j
• Δw_ij increases if both s_i and t_j are ON (binary) or have the same sign (bipolar).
• Starting from w_ij = 0, after the updates for all P training patterns,
      w_ij = Σ_{p=1}^{P} s_i(p) t_j(p),   W = {w_ij}
• Instead of obtaining W by iterative updates, it can be computed from the training set by calculating the outer product of s and t.
OUTER PRODUCT FOR PATTERN ASSOCIATION
 Let s and t be row vectors.
 Then for a particular training pair s:t:

     ΔW(p) = s^T(p) · t(p) = [s_1, s_2, …, s_n]^T [t_1, …, t_m]

            [s_1 t_1 … s_1 t_m]   [Δw_11 … Δw_1m]
          = [s_2 t_1 … s_2 t_m] = [Δw_21 … Δw_2m]
            [   ⋮           ⋮  ]   [  ⋮        ⋮  ]
            [s_n t_1 … s_n t_m]   [Δw_n1 … Δw_nm]

 and, summing over all P training pairs,

     W = Σ_{p=1}^{P} s^T(p) · t(p)
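As a quick illustration of the outer-product rule, here is a minimal NumPy sketch (added for illustration, not part of the original slides; function names are illustrative):

```python
import numpy as np

# Outer-product (Hebb) contribution of one training pair s:t.
def delta_w(s, t):
    return np.outer(s, t)          # n x m matrix with entries s_i * t_j

# W is the sum of the outer products over all P training pairs.
def hebb_weights(pairs):
    return sum(np.outer(s, t) for s, t in pairs)
```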
HETERO-ASSOCIATIVE MEMORY NETWORK
 Binary pattern pairs s:t with |s| = 4 and |t| = 2.
 Total weighted input to output units:  y_in_j = Σ_i x_i w_ij
 Activation function: threshold
     y_j = 1  if y_in_j > 0
     y_j = 0  if y_in_j ≤ 0
 Weights are computed by the Hebbian rule (sum of outer products of all training pairs):
     W = Σ_{p=1}^{P} s^T(p) t(p)
 Training samples:
          s(p)         t(p)
     p=1  (1 0 0 0)    (1, 0)
     p=2  (1 1 0 0)    (1, 0)
     p=3  (0 0 0 1)    (0, 1)
     p=4  (0 0 1 1)    (0, 1)
COMPUTING THE WEIGHTS

 s^T(1)·t(1) = (1 0 0 0)^T (1 0) = [1 0; 0 0; 0 0; 0 0]
 s^T(2)·t(2) = (1 1 0 0)^T (1 0) = [1 0; 1 0; 0 0; 0 0]
 s^T(3)·t(3) = (0 0 0 1)^T (0 1) = [0 0; 0 0; 0 0; 0 1]
 s^T(4)·t(4) = (0 0 1 1)^T (0 1) = [0 0; 0 0; 0 1; 0 1]

 Summing the four outer products:

     [2 0]
 W = [1 0]
     [0 1]
     [0 2]

TEST / RECALL THE NETWORK

 x = [1 0 0 0]:
     (1 0 0 0) W = (2 0)  →  y_1 = 1, y_2 = 0
     Recalled as class (1, 0): s(1) = (1 0 0 0) and s(2) = (1 1 0 0) belong to class (1, 0).

 x = [0 1 1 0]:
     (0 1 1 0) W = (1 1)  →  y_1 = 1, y_2 = 1
     (0 1 1 0) is not sufficiently similar to either class;
     (0 0 0 1) and (0 0 1 1) belong to class (0, 1).

 x = [0 1 0 0] (similar to s(1) and s(2)):
     (0 1 0 0) W = (1 0)  →  y_1 = 1, y_2 = 0
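The whole worked example above can be reproduced in a few lines of NumPy (a sketch added here for illustration, not from the original slides):

```python
import numpy as np

# Training pairs from the example above.
pairs = [([1, 0, 0, 0], [1, 0]),
         ([1, 1, 0, 0], [1, 0]),
         ([0, 0, 0, 1], [0, 1]),
         ([0, 0, 1, 1], [0, 1])]

# Hebbian (outer-product) weights: W = sum_p s^T(p) t(p).
W = sum(np.outer(s, t) for s, t in pairs)      # [[2,0],[1,0],[0,1],[0,2]]

def recall(x, W):
    # Threshold activation: y_j = 1 if y_in_j > 0, else 0.
    return (np.asarray(x) @ W > 0).astype(int)

print(recall([1, 0, 0, 0], W))   # [1 0] -> class (1, 0)
print(recall([0, 1, 1, 0], W))   # [1 1] -> not a stored class
print(recall([0, 1, 0, 0], W))   # [1 0] -> closest to s(1), s(2)
```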
AUTO-ASSOCIATIVE MEMORY NETWORK
 Same as hetero-associative nets, except t(p) = s(p).
 Used to recall a pattern from its noisy or incomplete version (pattern completion / pattern recovery).
 A single pattern s = (1, 1, 1, -1) is stored (weights computed by the Hebbian rule or outer-product rule):

     [ 1  1  1 -1]
 W = [ 1  1  1 -1]
     [ 1  1  1 -1]
     [-1 -1 -1  1]

 training pattern   (1 1 1 -1)·W   = (4 4 4 -4) → (1 1 1 -1)
 noisy pattern      (-1 1 1 -1)·W  = (2 2 2 -2) → (1 1 1 -1)
 missing info       (0 0 1 -1)·W   = (2 2 2 -2) → (1 1 1 -1)
 more noisy         (-1 -1 1 -1)·W = (0 0 0 0)    not recognized
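A minimal NumPy sketch of this auto-associative recall (added for illustration; sign(0) = 0 stands in for "not recognized"):

```python
import numpy as np

s = np.array([1, 1, 1, -1])
W = np.outer(s, s)                          # outer-product (Hebbian) weights

def recall(x, W):
    # Bipolar threshold via sign; a net input of 0 is left as 0 here.
    return np.sign(np.asarray(x) @ W)

print(recall([ 1,  1, 1, -1], W))   # (4, 4, 4, -4) -> ( 1,  1,  1, -1)
print(recall([-1,  1, 1, -1], W))   # (2, 2, 2, -2) -> ( 1,  1,  1, -1)
print(recall([ 0,  0, 1, -1], W))   # (2, 2, 2, -2) -> ( 1,  1,  1, -1)
print(recall([-1, -1, 1, -1], W))   # (0, 0, 0, 0)  -> not recognized
```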
AUTO-ASSOCIATIVE MEMORY NETWORK – DIAGONAL ELEMENTS
 Diagonal elements will dominate the computation when multiple patterns are stored (for bipolar patterns each w_ii grows to P).
 When P is large, W is dominated by its diagonal (close to a scaled identity matrix). This makes output ≈ input, which may not be any stored pattern; the pattern-correction power is lost.
 Remedy: replace the diagonal elements by zero.

      [ 0  1  1 -1]
 W0 = [ 1  0  1 -1]
      [ 1  1  0 -1]
      [-1 -1 -1  0]

 (1 1 1 -1)·W0   = (3 3 3 -3) → (1 1 1 -1)
 (-1 1 1 -1)·W0  = (3 1 1 -1) → (1 1 1 -1)
 (0 0 1 -1)·W0   = (2 2 1 -1) → (1 1 1 -1)
 (-1 -1 1 -1)·W0 = (1 1 -1 1) → wrong
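Zeroing the diagonal is a one-liner in NumPy (sketch added for illustration, not from the slides):

```python
import numpy as np

s = np.array([1, 1, 1, -1])
W0 = np.outer(s, s).astype(float)
np.fill_diagonal(W0, 0)                          # remove the self-connections

print(np.sign(np.array([-1, 1, 1, -1]) @ W0))    # (3, 1, 1, -1) -> (1, 1, 1, -1)
```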
STORAGE CAPACITY
 Number of patterns that can be correctly stored & recalled by a network.
 More patterns can be stored if they are not similar to each other (e.g., orthogonal).

 Non-orthogonal: storing (1 -1 -1 1) and (1 1 -1 1) gives

      [ 0  0 -2  2]
 W0 = [ 0  0  0  0]        (1 -1 -1 1)·W0 → (1 0 -1 1)
      [-2  0  0 -2]        The pattern is not stored correctly.
      [ 2  0 -2  0]

 Orthogonal: storing (1 1 -1 -1), (-1 1 1 -1), (-1 1 -1 1) gives

      [ 0 -1 -1 -1]
 W0 = [-1  0 -1 -1]        All three patterns can be correctly recalled.
      [-1 -1  0 -1]
      [-1 -1 -1  0]
BIDIRECTIONAL ASSOCIATIVE MEMORY (BAM) NETWORK
 Architecture:
• Two layers of non-linear units: X-layer, Y-layer.
• Units: discrete threshold or continuous sigmoid (patterns can be either binary or bipolar).
 Weights (Hebbian / outer product):
     W_{n×m} = Σ_{p=1}^{P} s^T(p) · t(p)
• Symmetric: w_ij = w_ji.
• Convert binary patterns to bipolar when constructing W.
RECALL OF BAM NETWORK
 Bidirectional: recall can be driven either by X (to recall Y) or by Y (to recall X).
 Recurrent:
     y(t) = [f(y_in_1(t)), …, f(y_in_m(t))],    where  y_in_j(t) = Σ_{i=1}^{n} w_ij · x_i(t-1)
     x(t+1) = [f(x_in_1(t+1)), …, f(x_in_n(t+1))],    where  x_in_i(t+1) = Σ_{j=1}^{m} w_ij · y_j(t)
 Update can be either asynchronous (as in hetero-associative memory) or synchronous (change all Y units at one time, then all X units the next time).
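A minimal synchronous-recall sketch in NumPy (assumes bipolar patterns and a sign activation; leaving a tie with net input 0 at 0 is a simplification of the usual "keep the previous value" rule):

```python
import numpy as np

def bam_weights(pairs):
    # W (n x m) as the sum of outer products s^T(p) t(p); bipolar pairs assumed.
    return sum(np.outer(s, t) for s, t in pairs)

def bam_recall(x, W, steps=5):
    # Synchronous X -> Y -> X updates with a bipolar sign activation.
    x = np.sign(np.asarray(x, dtype=float))
    for _ in range(steps):
        y = np.sign(x @ W)        # y_in_j = sum_i w_ij x_i
        x = np.sign(y @ W.T)      # x_in_i = sum_j w_ij y_j
    return x, y
```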
ACTIVATION FUNCTIONS IN BAM
 The activation function depends on whether the input/target vector pairs used are binary or bipolar.
 Activation function for the Y-layer: one form for binary input vectors, another for bipolar input vectors.
 Activation function for the X-layer: likewise, one form for binary and one for bipolar input vectors.
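The formulas themselves did not survive extraction; a commonly used formulation (an assumption here, not taken from the slides) is a step function that keeps the unit's previous value when the net input equals the threshold:

```python
import numpy as np

def binary_step(net, prev, theta=0.0):
    # Binary case (assumed form): 1 above theta, 0 below, previous value on a tie.
    return np.where(net > theta, 1, np.where(net < theta, 0, prev))

def bipolar_step(net, prev, theta=0.0):
    # Bipolar case (assumed form): +1 above theta, -1 below, previous value on a tie.
    return np.where(net > theta, 1, np.where(net < theta, -1, prev))
```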


DISCRETE HOPFIELD NETWORK (DHN)
 A single-layer network
• each node serves as both an input and an output unit.
 More than an associative memory, the Discrete Hopfield Network can be used in several applications,
• e.g., combinatorial optimization.
 Different forms: discrete & continuous.
 Major contributions of John Hopfield to NN:
• treating a network as a dynamic system,
• introduction of the energy function into neural network research.
ARCHITECTURE OF DHN
 Architecture
• Single layer (units serve as both input and output):
   nodes are threshold units (binary or bipolar),
   weights: fully connected, symmetric, with zero diagonal:
      w_ij = w_ji,   w_ii = 0
• x_i are external inputs, which may be transient or permanent.
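The update rule itself is not spelled out on this slide; a common asynchronous recall loop for bipolar units looks like the following sketch (an assumption, with illustrative names):

```python
import numpy as np

def dhn_recall(x, W, theta=None, sweeps=10, rng=None):
    # Asynchronous recall: update one randomly chosen unit at a time.
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float).copy()
    theta = np.zeros(len(x)) if theta is None else np.asarray(theta, dtype=float)
    for _ in range(sweeps):
        for i in rng.permutation(len(x)):
            net = W[i] @ x + theta[i]        # w_ii = 0, so no self-input
            if net != 0:                     # keep the previous value on a tie
                x[i] = 1.0 if net > 0 else -1.0
    return x
```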
STORAGE CAPACITY OF DHN
 P: the maximum number of random patterns of dimension n that can be stored in a DHN of n nodes.

 Hopfield's observation:   P ≈ 0.15 n,          P/n ≈ 0.15
 Theoretical analysis:     P ≈ n / (2 log₂ n),  P/n ≈ 1 / (2 log₂ n)

 P/n decreases because larger n leads to more interference between stored patterns.
 Some work modifies the Hopfield memory to increase its capacity to close to n; there W is trained (not computed by the Hebbian rule).
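For a feel of the numbers, both estimates for n = 100 nodes (a quick arithmetic check, values rounded):

```python
import math

n = 100
print(0.15 * n)                 # Hopfield's empirical estimate: ~15 patterns
print(n / (2 * math.log2(n)))   # theoretical estimate: ~7.5 patterns
```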
CONTINUOUS HOPFIELD NET
 Architecture
• Continuous node output, and continuous time.
• Fully connected with symmetric weights: w_ij = w_ji, w_ii = 0.
• Internal activation u_i, with
     du_i(t)/dt = Σ_{j=1}^{n} w_ij x_j(t) + θ_i = net_i(t)
• Output (state): x_i(t) = f(u_i(t)),
  where f is a sigmoid function to ensure binary/bipolar output; e.g. for bipolar, use the hyperbolic tangent:
     f(x) = tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
CONTINUOUS HOPFIELD NET
 Computation: all units change their output (states) at the same time, based on the states of all others.
• Compute the net input:
     net_i(t) = Σ_{j=1}^{n} w_ij x_j(t) + θ_i
• Compute the internal activation u_i(t) by a first-order Taylor expansion:
     u_i(t+δ) = ∫ net_i(t) dt ≈ u_i(t) + (du_i(t)/dt)·δ = u_i(t) + net_i·δ
• Compute the output:
     x_i(t) = f(u_i(t))
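These three steps amount to one Euler integration step; a minimal sketch (variable names are illustrative):

```python
import numpy as np

def chn_step(u, W, theta, delta=0.01):
    # One synchronous Euler step of the continuous Hopfield dynamics.
    x = np.tanh(u)               # outputs x_i = f(u_i)
    net = W @ x + theta          # net_i = sum_j w_ij x_j + theta_i
    return u + delta * net       # u_i(t + delta) ~ u_i(t) + net_i * delta
```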
ITERATIVE ASSOCIATIVE MEMORY NETWORK
 Example: x = (1, 1, 1, -1), output units are threshold units, and

     [ 0  1  1 -1]
 W = [ 1  0  1 -1]
     [ 1  1  0 -1]
     [-1 -1 -1  0]

 An incomplete recall input: x' = (1, 0, 0, 0)
     W x'  = (0, 1, 1, -1) = x''
     W x'' = (3, 2, 2, -2) → (1, 1, 1, -1) = x

 In general: use the current output as the input of the next iteration
     x(0) = initial recall input
     x(I) = S(W x(I-1)),  I = 1, 2, …
     until x(N) = x(K) for some K < N
 Dynamic system: state vector x(I)
 If K = N-1, x(N) is a stable state (fixed point):
     f(W x(N)) = f(W x(N-1)) = x(N)
• If x(K) is one of the stored patterns, then x(K) is called a genuine memory.
• Otherwise, x(K) is a spurious memory (caused by cross-talk/interference between genuine memories).
• Each fixed point (genuine or spurious memory) is an attractor (with its own attraction basin).
 If K ≠ N-1, the network is in a limit cycle:
• it will repeat x(K), x(K+1), …, x(N) = x(K) as the iteration continues.
 Iteration will eventually stop (a repeated state will be found) because the total number of distinct states is finite (3^n if threshold units are used). If patterns are continuous, the system may continue to evolve forever (chaos) if no such K exists.
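A small sketch of this recall loop with repeat detection (illustrative, not from the slides; S is taken to be the sign function):

```python
import numpy as np

def iterative_recall(x0, W, max_iters=100):
    # Iterate x(I) = S(W x(I-1)) until a previously seen state repeats.
    seen = []
    x = np.sign(np.asarray(x0, dtype=float))
    for _ in range(max_iters):
        seen.append(tuple(x))
        x = np.sign(W @ x)
        if tuple(x) in seen:
            k = seen.index(tuple(x))
            kind = "fixed point" if k == len(seen) - 1 else "limit cycle"
            return x, kind
    return x, "no repeat found within max_iters"
```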
