

Chapter 6

Synthesis of Associative Memory


Abstract - We present a new approach to enlarging the basins of attraction of
associative memory, including auto-associative memory and temporal associative
memory. A memory trained by means of this method can tolerate and recover from
severely noisy patterns. Simulations show that this approach greatly reduces the
number of limit cycles.

1 Introduction
Associative memory (AM) is a mechanism used to store patterns: when a
reasonable subset of a certain pattern is received with the other part corrupted, it has
the ability to recover the original pattern. The fully connected network is a common
architecture for the AM. The interconnections between processing neurons provide
feedback, which enables the whole network to recurrently evolve into equilibrium.
Many well-developed models (Gardner 1987, 1988; Kanter and Sompolinsky 1987)
and algorithms have been devised to train the fully connected AM to improve its
accuracy, efficiency, and capacity. The Hopfield model (HM) (Hopfield, 1982) applies
Hebb's (1949) postulate of learning to generate the interconnection weights. It has the
advantage of easy training and guaranteed convergence in asynchronous mode, but it
also has the serious drawback of numerous limit cycles in synchronous mode, due to
its zero-diagonal, symmetric weight matrix and zero thresholds. These unwanted limit
cycles restrict its ability to operate in synchronous mode, yet there is no reason to
exclude synchronous evolution among neurons.
According to the relation between input patterns and output patterns, AMs can be
divided into two classes: an auto-AM recalls a pattern that is the same as the input
pattern, while a hetero-AM presents an output pattern that differs from the input
pattern. The HM is an effective method for implementing the auto-AM. This model is
also extensively utilized to implement the hetero-AM, such as the bidirectional AM
(Kosko 1987, 1988), the multidirectional AM (Hagiwara, 1990), and the temporal AM
(Amari, 1972).

One of the features of the fully connected structure is that it can be viewed as a
hypercube; therefore, the learning problem of the AM can be transformed into a
geometric problem. In this work, we will present a new learning algorithm called error
tolerant associative memory (ETAM), which enlarges the basins of attraction,
centered at the stored patterns, to the greatest extent possible to improve error
tolerance. Simulations show that this algorithm also reduces the number of limit
cycles. We will focus on the auto-AM and the temporal AM in this work. First, we
briefly present the model used in this paper together with its geometric interpretation
and the ETAM algorithm; computer experiments, comparisons, and discussions follow.

2 Geometric Interpretation of the Hopfield Network
AM is a fully connected network with N neurons. Each neuron i has N weights
{wij, j=1,…,N} connecting all neurons j, a threshold θi, and a state value vi ∈ {1,−1} .
The state value is updated according to the rule

$$v_i(t+1) = \operatorname{sgn}\Bigl[\sum_{j=1}^{N} w_{ij}\, v_j(t) - \theta_i\Bigr], \qquad (1)$$

or in matrix form,
$$V(t+1) = \operatorname{sgn}[W V(t) - \theta], \qquad (2)$$
where W is an N × N weight matrix, θ is an N × 1 threshold vector, V(t) is an N × 1
vector representing the global state at time t, and sgn() is the sign function, which
returns 1 for input greater than or equal to zero and -1 for negative input. In the
learning phase, the network is trained on P patterns Xk, k = 1, …, P, according to
various learning algorithms (Widrow and Hoff, 1960; Hopfield, 1982). In the
retrieving phase, the input is presented to the network as V(0). If Eq. (2) is applied to
all the neurons in each iteration, the network is said to operate in synchronous mode;
if Eq. (1) is applied to only one neuron at a time, the network operates in
asynchronous mode. Then, the network operates repeatedly according to Eq. (1) or Eq.
(2) until it converges to a stable state or enters a limit cycle. A stable state meets the
following requirement:
$$V(t) = \operatorname{sgn}[W V(t) - \theta], \qquad (3)$$
regardless of whether it operates in synchronous or asynchronous mode. Each neuron
has a bipolar state value (a bit), and there are 2^N global states in total. Therefore, we
can view the whole network as an N-dimensional (N-D) hypercube with each global
state located at a corner. Any two neighboring corners differ in only one neuron state
or one Hamming distance. For example, in Fig. 1, there is a 3-D cube corresponding
to a network with three neurons. The current global state is located at one corner and
serves as the next input to the network. After updating according to Eq. (1) or Eq. (2),
the current global state either moves to another corner or stays at the original corner.
Corners that always remain unchanged are stable states. The patterns we intend to
save are located at certain stable corners. The goal of AM is to move the initial global
state to a nearby stable corner where a pattern is stored.
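Equations (1)-(3) are straightforward to simulate. The following is a minimal sketch in Python with NumPy; the function names, the iteration cap, and the cycle detection by repeated global states are our own illustrative choices, while the convention sgn(0) = 1 follows the definition above.

```python
import numpy as np

def sgn(x):
    # Sign function as defined above: +1 for x >= 0, -1 otherwise.
    return np.where(x >= 0, 1, -1)

def retrieve_sync(W, theta, v0, max_iters=100):
    """Synchronous retrieval: Eq. (2) updates all neurons at once.

    Returns the final state and the trajectory; a repeated global state
    signals convergence to a stable state or entry into a limit cycle."""
    v = v0.copy()
    history = [v.copy()]
    for _ in range(max_iters):
        v = sgn(W @ v - theta)                       # Eq. (2)
        if any(np.array_equal(v, h) for h in history):
            break
        history.append(v.copy())
    return v, history

def retrieve_async(W, theta, v0, sweeps=100, rng=None):
    """Asynchronous retrieval: Eq. (1) updates one neuron at a time."""
    rng = rng or np.random.default_rng()
    v = v0.copy()
    for _ in range(sweeps):
        for i in rng.permutation(len(v)):
            v[i] = 1 if W[i] @ v - theta[i] >= 0 else -1   # Eq. (1)
    return v
```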

Fig. 1. A fully connected associative memory (AM) with three neurons can be viewed
as a 3-D cube. There are eight corners representing the eight global states. A neuron
represents a plane (shaded area) dividing the cube into positive and negative divisions.
In this figure (1,1,1), (1,-1,1), (1,1,-1), and (-1,1,1) are in the positive division while
the other corners are in the negative division

In this hypercube, each neuron i describes an (N−1)-D hyperplane through the
equation
$$w_{i1} v_1 + w_{i2} v_2 + w_{i3} v_3 + \cdots + w_{iN} v_N - \theta_i = 0, \quad i = 1, \dots, N. \qquad (4)$$
The N × 1 weight vector $W_i = (w_{i1}, w_{i2}, \dots, w_{iN})^T$ of the ith neuron is the normal
vector of the corresponding hyperplane, and the hyperplane divides the hypercube
into a positive division, to which the normal vector points, and a negative division. We
then follow the geometrical point of view developed by Cover (1965). We will require
that the length of the vector Wi, |Wi|, be normalized to one, and that θi be divided by
this length accordingly. The learning process is used to adjust the hyperplane to make
all the patterns stable. That is, when a pattern has an ith bit which is equal to 1, it
should be located in the positive division of the ith hyperplane; if it has an ith bit
equal to -1, it should be located in the negative division. Furthermore, if we wish to
achieve good error tolerance, which means that a reasonably noisy pattern can
converge to its original pattern, the neighbors of one pattern should be in the same
division in which the pattern is located. This can be achieved by rotating and shifting
the hyperplane so that it will face the pattern corner and include as many neighboring
corners in the same division as possible. These rotations and shifts can be performed
by adjusting the weights and the thresholds.
Each hyperplane can be adjusted separately to stabilize the ith bits of patterns. In
the following, unless otherwise stated, we will only discuss neuron i. First, we will
illustrate a 3-D cube to observe the mechanism by which the AM saves patterns, then
we will expand this mechanism into a higher dimensional hypercube; that is, we will
present a general idea for training an AM with more neurons.
An example is shown in Fig. 2. It is an AM with three neurons, in which two
patterns, (1,1,1) and (-1,-1,-1), are stored. For the first neuron, we require that the
weights satisfy the following conditions:

$$\begin{cases} w_{11}\cdot 1 + w_{12}\cdot 1 + w_{13}\cdot 1 - \theta_1 > 0 \\ w_{11}\cdot(-1) + w_{12}\cdot(-1) + w_{13}\cdot(-1) - \theta_1 < 0 \end{cases} \qquad (5)$$

The other two neurons also have to satisfy similar conditions, so we do not show their
corresponding planes. The black dots are patterns to be stored. The dark parts are the
positive division of the plane.

Fig. 2a-c. An associative memory (AM) saving (1, 1, 1) and (-1, -1, -1) by means of
the error correction rule (ECR). c The ideal hyperplane

Figure 2a shows randomly chosen initial weights and the corresponding
hyperplane. We apply the error-correction rule (ECR) (Widrow and Hoff, 1960) to
train these weights, and in only one step we get the updated weights shown in Fig. 2b.
The weights satisfy the conditions in Eq. (5). However, in order to increase the error
tolerance, we wish all the one-bit neighbors of (1,1,1), {(1,1,-1), (1,-1,1), and (-1,1,1)},
to be in the positive division, and all one-bit neighbors of (-1,-1,-1), {(-1,-1,1),
(-1,1,-1), and (1,-1,-1)}, to be in the negative division, just like the ideal hyperplane
shown in Fig. 2c. The normal vector of the plane, (w11, w12, w13), is adjusted to point
straight at the corner (1,1,1), so that the plane is perpendicular to the cube diagonal
and lies midway between (1,1,1) and (-1,-1,-1). In this case, all three planes will coincide.
In Fig. 3, only one pattern, (1,-1,1), is to be stored, and the hyperplane of the
first neuron is shown. In Fig. 3a, there is no error tolerance; in Fig. 3b, a one-bit
error can be recovered; in Fig. 3c, even a two-bit error can be recovered. In all three
cases, the normal vector of the plane, (w11, w12, w13), points straight at the pattern
corner to be stored. The only difference is the threshold. When we modify the
threshold to move the pattern farther from the plane, we get better error tolerance. If
there is more than one pattern, such as in the example shown in Fig. 4a, b, the normal
vector points at the midpoint of the stored patterns; that is, the normal vector is the
sum of all the vectors which point from the origin to the patterns. In Fig. 4c, for the
second neuron, the two patterns have to lie separately in the two divisions, and the
best way to locate the plane is to make the distances from the two patterns to the plane
equal. Therefore the two patterns will have the same error tolerance.
As a result, we obtain the general idea that the normal vector of a certain
hyperplane representing neuron i is the sum of all patterns in the positive division and
all the patterns in the negative division multiplied by -1. In terms of a neural network,
the weights (wi1, wi2, wi3, …, wiN) are the sum of all patterns in which the ith bit is equal
to 1 and all the patterns multiplied by -1 in which the ith bit is equal to -1. This results
in the equation:
$$w_{ij} = \sum_{k=1}^{P} X_j^k X_i^k, \quad i, j = 1, \dots, N. \qquad (6)$$

We find that Eq. (6) coincides with the Hopfield model except for the diagonal
weights wii. In the HM, all the self-feedback weights wii are set to zero to achieve
convergence. This self-feedback has been discussed by Kanter and Sompolinsky
(1987), Araki and Saito (1995), and DeWilde (1997). In asynchronous mode, zero
self-feedback and symmetry ensure that the AM will finally converge to a stable state,
but in synchronous mode, there will be some limit cycles of length two (Bruck, 1990;
Goles and Martinez 1990). Following Xu et al. (1996), we know that sufficient
conditions for the AM to converge in synchronous mode are that the weight matrix W
is symmetric and nonnegative definite, or that W with certain values subtracted from
its diagonal is nonnegative definite. Equation (6) makes every wii equal to P, the
largest among all weights, and large self-feedback benefits nonnegative
definiteness and convergence.
Besides the diagonal elements, the HM has null thresholds, which means that all
hyperplanes pass through the origin. Even though the patterns may be stored without
thresholds, error tolerance can be greatly improved by tuning them, as in the
example in Fig. 3. The problem of threshold tuning strategies has been addressed by
Schwenker et al. (1996) in the 0,1 model. By tuning thresholds, we can move the
planes as far away from the patterns as possible to increase error tolerance. If there
is more than one pattern, we can adjust the threshold to maximize the minimal
distance among all distances from the patterns to the plane, as shown in Figs. 2c and
4c.

Fig. 3a-c. An associative memory (AM) saving (1, -1, 1) with different degrees of
error tolerance

Fig. 4a-c. An associative memory (AM) saving (1, -1, 1) and (1, 1, 1) with different
degrees of error tolerance

3 Error Tolerant Associative Memory


With the above general idea in mind, we have developed a method called the
ETAM, which trains an AM to achieve the best possible error tolerance. According to
the ideas presented in the previous section, we adjust the weights so as to rotate each
hyperplane to face its pattern corners as squarely as possible, and adjust the
threshold to maximize the minimal distance from the patterns to the hyperplane.
We propose the following retraining algorithm:

1. Initialize the weights according to the following equations:

$$w_{ij}(0) = \sum_{k=1}^{P} X_j^k X_i^k, \quad i, j = 1, \dots, N, \qquad (7)$$
$$\theta_i(0) = 0, \quad i = 1, \dots, N.$$

Then normalize Wi, i = 1, …, N.

2. Set i to 1.

3. For the ith neuron, calculate the distances from all patterns to the hyperplane, that is,

$$d_i^k = w_{i1} X_1^k + w_{i2} X_2^k + \cdots + w_{iN} X_N^k - \theta_i, \quad k = 1, \dots, P. \qquad (8)$$

Find the positive minimal distance $d_i^p$ and the negative minimal distance $d_i^n$:

$$d_i^p = \min\{d_i^k \mid X_i^k = 1,\ k = 1, \dots, P\}, \qquad d_i^n = \max\{d_i^k \mid X_i^k = -1,\ k = 1, \dots, P\}. \qquad (9)$$

4. If all the patterns have $X_i^k = 1$, set θi to a large negative value less than $-\sqrt{N}$, increase i by one, and go to step 3. If all patterns have $X_i^k = -1$, set θi to a large positive value greater than $\sqrt{N}$, increase i by one, and go to step 3. Note that we move the hyperplane outside the range of the hypercube, where $\sqrt{N}$ is the distance between each corner and the origin.

5. We shift the hyperplane to the middle between pattern p and pattern n to maximize the minimal distance as follows:

$$\theta_i(t+1) = \theta_i(t) + (d_i^p + d_i^n)/2. \qquad (10)$$

Now the minimal distance $d_i^m$ lies in the middle between $d_i^p$ and $d_i^n$:

$$d_i^m = (d_i^p - d_i^n)/2. \qquad (11)$$

6. We rotate the hyperplane to increase the distances from both pattern p and pattern n to the hyperplane:

$$w_{ij}(t+1) = w_{ij}(t) + \alpha\,(X_i^p X_j^p + X_i^n X_j^n), \quad j = 1, \dots, N, \qquad (12)$$

where α is the learning rate. Normalize Wi.

7. Repeat Eq. (8) and Eq. (9) and compute the new $d_i^p$ and $d_i^n$. If the new $(d_i^p - d_i^n)/2$ is larger than the previous $d_i^m$, go to step 3 and continue. If not, undo Eq. (12) and go to step 8.

8. Increase i by one. If i is less than or equal to N, go to step 3. If not, stop.

The normalization procedure used in learning step 6 after Eq. (12) is necessary to
limit the range of the normal vector Wi. We may obtain an asymmetric matrix W using
this normalization. This asymmetric matrix is not excluded by any existing
physiological evidence (e.g. Hertz et al. 1986, 1991, Porat 1989). The diagonal
elements of this matrix are neither equal to any others nor equal to zero. These
elements will have large positive values after learning. To our knowledge, this kind of
matrix has not been derived using any existing method.
The training time is case-dependent, and the learning process can be accelerated
by increasing the learning rate. Since the weights are normalized in the ETAM, they
cannot grow without bound, and neither can the distances. Therefore, the learning
process is guaranteed to terminate.
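To make the procedure concrete, the following is a sketch of steps 1-8 in Python with NumPy. It reflects our reading of the algorithm above; the learning rate alpha, the round cap, and the use of ±2√N as the "large" threshold values in step 4 are illustrative assumptions.

```python
import numpy as np

def train_etam(X, alpha=0.1, max_rounds=1000):
    """ETAM retraining for auto-associative memory.

    X: P x N array of bipolar (+1/-1) patterns.
    Returns the weight matrix W (N x N) and the thresholds theta (N,)."""
    P, N = X.shape
    W = (X.T @ X).astype(float)                     # step 1: Hebbian init, Eq. (7)
    theta = np.zeros(N)
    W /= np.linalg.norm(W, axis=1, keepdims=True)   # normalize each W_i

    for i in range(N):                              # steps 2 and 8
        pos = X[:, i] == 1
        if pos.all():                               # step 4: every ith bit is +1
            theta[i] = -2 * np.sqrt(N)
            continue
        if not pos.any():                           # step 4: every ith bit is -1
            theta[i] = 2 * np.sqrt(N)
            continue
        for _ in range(max_rounds):
            d = X @ W[i] - theta[i]                 # step 3: distances, Eq. (8)
            p = np.flatnonzero(pos)[np.argmin(d[pos])]    # closest + pattern
            n = np.flatnonzero(~pos)[np.argmax(d[~pos])]  # closest - pattern
            theta[i] += (d[p] + d[n]) / 2           # step 5: shift, Eq. (10)
            d_m = (d[p] - d[n]) / 2                 # Eq. (11)
            w_old, t_old = W[i].copy(), theta[i]
            W[i] += alpha * (X[p, i] * X[p] + X[n, i] * X[n])   # step 6, Eq. (12)
            norm = np.linalg.norm(W[i])
            W[i] /= norm
            theta[i] /= norm                        # theta scales with |W_i| (Sect. 2)
            d = X @ W[i] - theta[i]                 # step 7: recheck the margin
            if (d[pos].min() - d[~pos].max()) / 2 <= d_m:
                W[i], theta[i] = w_old, t_old       # undo Eq. (12), stop this neuron
                break
    return W, theta
```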

4 Error Tolerant Associative Memory with Temporal Patterns
The temporal AM is used to store one sequence or several sequences of patterns
in the AM's dynamic state transitions. Given an initial input pattern, it will converge
to the next pattern in a memorized sequence. All the patterns in this sequence will be
recalled sequentially. Due to its dynamic property, it can be used to recognize or
generate temporal patterns, such as speech, images, or musical notation.
The temporal AM is trained to remember all patterns in the following dynamics:

$$X_i^{k+1} = \operatorname{sgn}\Bigl(\sum_{j=1}^{N} w_{ij}\, X_j^k - \theta_i\Bigr). \qquad (13)$$

The superscript of the pattern Xk is computed modulo P, so that XP+1 = X1. With an
initial input state V(0) close to a stored pattern Xk, the pattern Xk+1 will be the first
pattern recalled, and the remaining patterns will be recalled sequentially.
This temporal AM can store various kinds of pattern sequences, such as a single
chain (dotted line in Fig. 5), a cycle of patterns (dashed line in Fig. 5), or a tree (the
upper two patterns in Fig. 5). Generally, the temporal AM is able to save all
one-to-one or many-to-one patterns, but not one-to-many patterns like those shown in
Fig. 6. We will not discuss the patterns in Fig. 6 in this paper.

Fig. 5. Various kinds of sequential patterns

Fig. 6. One-to-many patterns

The original temporal AM proposed by Amari (1972) is implemented according
to Hebb's postulate of learning, in a manner similar to the HM, as follows:

$$w_{ij} = \sum_{k=1}^{P} X_i^{k+1} X_j^k. \qquad (14)$$

This temporal associative memory has asymmetric weights and no thresholds.
However, it has the same drawback as the HM: low capacity. Amari's method usually
cannot memorize all the patterns completely, as we will see in the simulations
described later.
The idea presented in Sect. 2 is also applicable to the temporal AM or any other
hetero-AM. The difference is that in the auto-AM, all the states in the basin of a
pattern will converge to this pattern while in the temporal AM, all the states in the
basin of a pattern will evolve to its next pattern. To train weights is to shift and rotate
the hyperplanes to separate the patterns into two divisions according to the ith bits of
their next patterns: if $X_i^{k+1} = 1$, Xk should be in the positive division of the
hyperplane, and in the negative division if $X_i^{k+1} = -1$.

Therefore, the whole algorithm in the previous section can be used to train the
temporal AM with slight modification. The modified algorithm is listed below.


1. Initialize the weights according to the following equations:

$$w_{ij}(0) = \sum_{k=1}^{P} X_i^{k+1} X_j^k, \quad i, j = 1, \dots, N, \qquad (15)$$
$$\theta_i(0) = 0, \quad i = 1, \dots, N.$$

Then normalize Wi, i = 1, …, N.

2. Set i to 1.

3. For neuron i, calculate the distances from all patterns to the hyperplane, that is,

$$d_i^k = w_{i1} X_1^k + w_{i2} X_2^k + \cdots + w_{iN} X_N^k - \theta_i, \quad k = 1, \dots, P. \qquad (16)$$

Find the positive minimal distance $d_i^p$ and the negative minimal distance $d_i^n$:

$$d_i^p = \min\{d_i^k \mid X_i^{k+1} = 1,\ k = 1, \dots, P\}, \qquad d_i^n = \max\{d_i^k \mid X_i^{k+1} = -1,\ k = 1, \dots, P\}. \qquad (17)$$

4. If all patterns have $X_i^{k+1} = 1$, set θi to a large negative value less than $-\sqrt{N}$, increase i by one, and go to step 3. If all patterns have $X_i^{k+1} = -1$, set θi to a large positive value greater than $\sqrt{N}$, increase i by one, and go to step 3. Note that we move the hyperplane outside the range of the hypercube, where $\sqrt{N}$ is the distance between each corner and the origin.

5. We shift the hyperplane to the midpoint between pattern p and pattern n to maximize the minimal distance as follows:

$$\theta_i(t+1) = \theta_i(t) + (d_i^p + d_i^n)/2. \qquad (18)$$

Now the minimal distance $d_i^m$ lies in the middle between $d_i^p$ and $d_i^n$:

$$d_i^m = (d_i^p - d_i^n)/2. \qquad (19)$$

6. We rotate the hyperplane to increase the distances from both pattern p and pattern n to the hyperplane:

$$w_{ij}(t+1) = w_{ij}(t) + \alpha\,(X_i^{p+1} X_j^p + X_i^{n+1} X_j^n), \quad j = 1, \dots, N, \qquad (20)$$

where α is the learning rate. Normalize Wi.

7. Repeat Eq. (16) and Eq. (17) and compute the new $d_i^p$ and $d_i^n$. If the new $(d_i^p - d_i^n)/2$ is larger than the previous $d_i^m$, go to step 3 and continue. If not, undo Eq. (20) and go to step 8.

8. Increase i by one. If i is less than or equal to N, go to step 3. If not, stop.
The algorithm above is the same as that used to train the auto-AM in the previous
section except for Eqs. (15), (17), and (20). We initialize the weights using Amari's
method in Eq. (15) and update the weights according to Hebb's postulate in Eq. (20).
Then, we calculate all the distances from all the patterns to the hyperplane. Among the
patterns located in the positive division, the one with minimal distance is chosen in
step 3, and so is the corresponding pattern in the negative division. Then, we move the
hyperplane to the midpoint between these two chosen patterns according to step 5 and
rotate the hyperplane to face these two patterns more squarely in step 6. These steps
are repeated until the minimal distances cease to increase. If all the patterns are
located in a single division, we simply move the hyperplane outside the hypercube, as
described in step 4. A sketch of the required changes is given below.
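As a sketch of how little changes relative to the auto-AM trainer (assuming an implementation like the `train_etam` sketch in Sect. 3, whose names are ours), the temporal version only replaces each pattern's own ith bit with the ith bit of its successor:

```python
import numpy as np

def successors(X):
    """Successor targets for a stored cycle of patterns: Y[k] = X[k+1],
    with the superscript wrapping around so that the last pattern maps
    to the first."""
    return np.roll(X, -1, axis=0)

# Relative to train_etam in Sect. 3, with Y = successors(X):
#   Eq. (7)  -> Eq. (15):  W = Y.T @ X               (instead of X.T @ X)
#   Eq. (9)  -> Eq. (17):  pos = Y[:, i] == 1        (split by the successor bit)
#   Eq. (12) -> Eq. (20):  W[i] += alpha * (Y[p, i] * X[p] + Y[n, i] * X[n])
```

For a chain or a tree rather than a cycle, Y is simply built by listing each pattern's designated next pattern instead of rolling the array.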

5 Experiments
Next, we present some simulations. We first give experimental results comparing
the Little model (LM) (Little 1974; Little and Shaw 1975), the ECR model,
and the ETAM algorithm. Several issues, such as the number of stable states, the
number of limit cycles, and fault tolerance, will be discussed. These three approaches
were applied to a fully connected network. The difference among these three
approaches lies in the learning phase. We will give examples of utilizing the ETAM in
a pattern recognition problem. Below, we will first briefly review these algorithms.

5.1 Little Model


The HM is constructed by using the outer product rule to compute the weights as
follows:
$$w_{ij} = \begin{cases} \dfrac{1}{N}\displaystyle\sum_{k=1}^{P} X_i^k X_j^k, & \text{if } i \ne j; \\ 0, & \text{if } i = j. \end{cases} \qquad (21)$$

Several characteristics are worth noting. Elements on the diagonal of the weight
matrix, wii, are zero. This means that the neurons have no self-feedback, which has
the effect of reducing the number of spurious stable states, because overly large
self-feedback causes neurons to tend to retain their previous states. The
zero-diagonal, symmetric weight matrix forces the HM to always converge to a
stable state in asynchronous dynamics. However, the model we are interested in here
is called the Little model (LM) and is similar to the HM. It differs from the HM only
in that it uses synchronous dynamics. This forces the network to always converge
either to a stable state or to a limit cycle of length two.
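For reference, Eq. (21) is a one-line construction; the sketch below assumes the patterns are stored as the rows of a bipolar P × N array.

```python
import numpy as np

def train_little(X):
    """Hopfield/Little weights via the outer-product rule, Eq. (21)."""
    P, N = X.shape
    W = (X.T @ X) / N
    np.fill_diagonal(W, 0.0)   # w_ii = 0: no self-feedback
    return W                   # thresholds are all zero in the HM/LM
```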

5.2 Error-Correction Rule

The ECR is used to adjust the weights in proportion to the error term $[X_i^k - v_i(t)]$. At
the beginning, we randomly assign initial values to all the weights; then we adjust all
the weights according to the following equations:

$$w_{ij}(t+1) = w_{ij}(t) + \eta\,[X_i^k - v_i(t)]\,X_j^k, \qquad (22)$$
$$\theta_i(t+1) = \theta_i(t) - \eta\,[X_i^k - v_i(t)], \qquad (23)$$
$$v_i(t) = \operatorname{sgn}\Bigl[\sum_{j=1}^{N} w_{ij}(t)\, X_j^k - \theta_i(t)\Bigr], \qquad (24)$$

where η is a positive constant which determines the rate of learning. The pattern Xk
used to train the network is chosen randomly from among all the patterns. Adjustment
will continue until there are no more errors.
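A sketch of this training loop, with our own choice of stopping logic (one complete error-free epoch) standing in for "until there are no more errors":

```python
import numpy as np

def train_ecr(X, eta=0.1, max_epochs=10000, rng=None):
    """Error-correction rule, Eqs. (22)-(24), for bipolar patterns X (P x N)."""
    rng = rng or np.random.default_rng()
    P, N = X.shape
    W = rng.normal(size=(N, N))          # random initial weights
    theta = np.zeros(N)
    for _ in range(max_epochs):
        errors = 0
        for k in rng.permutation(P):     # patterns chosen in random order
            v = np.where(W @ X[k] - theta >= 0, 1, -1)   # Eq. (24)
            err = X[k] - v               # the error term
            W += eta * np.outer(err, X[k])               # Eq. (22)
            theta -= eta * err                           # Eq. (23)
            errors += np.count_nonzero(err)
        if errors == 0:                  # no more errors: stop
            break
    return W, theta
```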

5.3 Comparisons for Auto-Associative Memory
Table 1 lists experimental results for (N = 5, P = 5), (N = 5, P = 3), (N = 10, P =
5), and (N = 10, P = 3). In each experiment, we presented ten sets of randomly produced
patterns to the three methods, obtained information about the number of stable
states, limit cycles, and so on, and then averaged the results. For each row item, the
following explanation is offered:
SP [No. of stored patterns (/P)]: given P patterns, the number of patterns successfully stored.
SS [No. of stable states (/2^N)]: the number of stable states.
TS [No. of states to stable (/2^N)]: the number of states converging to stable states (≥ SS).
C [No. of cycles]: the number of limit cycles.
IC [No. of states in cycles (/2^N)]: the number of states involved in all limit cycles.
TC [No. of states to cycles (/2^N)]: the number of transient states falling into limit cycles.
R [recovery (/NP)]: given NP 1-bit error patterns, the number of patterns converging to the original stored patterns.

Table 1. Comparisons among the Little model (LM), error correction rule (ECR), and
error tolerant associative memory (ETAM)

N=5, P=5     LM    ECR   ETAM      N=10, P=5     LM     ECR    ETAM
SP (/5)      1.8   5     5         SP (/5)       1.9    5      5
SS (/32)     4.2   15.2  17.6      SS (/1024)    5.0    43.9   60.0
TS (/32)     14.6  16.8  14.4      TS (/1024)    744.8  978.4  964.0
C            2.7   0     0         C             44.3   0.2    0
IC (/32)     5.4   0     0         IC (/1024)    88.6   0.4    0
TC (/32)     7.8   0     0         TC (/1024)    185.6  1.3    0
R (/25)      3.2   3.9   5.5       R (/50)       12.1   13.5   38.9

N=5, P=3     LM    ECR   ETAM      N=10, P=3     LM     ECR    ETAM
SP (/3)      2.4   3     3         SP (/3)       2.6    3      3
SS (/32)     5.2   7.1   4.8       SS (/1024)    6.8    20.2   6.2
TS (/32)     13.0  24.9  27.2      TS (/1024)    553.4  1003.2 1017.8
C            5.5   0     0         C             84.9   0.2    0
IC (/32)     11    0     0         IC (/1024)    169.8  0.4    0
TC (/32)     2.8   0     0         TC (/1024)    294.0  0.2    0
R (/15)      4.0   4.0   10.9      R (/30)       21.3   8.6    29.0

The LM has a rather low capacity, so it cannot store all the patterns in all
situations. The maximum number of patterns stored in the LM is similar to that of the
HM, which is N/(4 ln N) (Weisbuch and Fogelman-Soulié 1985; McEliece et al. 1987).
This bound has been improved by Mazza (1997). Also, the LM produces many limit
cycles. For these two reasons, even though it produces the fewest stable states, this
advantage is of little use. When the number of patterns is small enough
compared to the number of neurons, recovery from distorted patterns using the LM has
acceptable performance, as Table 1 shows with ten neurons and three patterns.
However, when there are more patterns, the LM performs poorly in terms of error
tolerance, and so does the ECR. Note that in this experiment, the patterns were
randomly generated, and some differed by only one or two bits. In this situation, it is
reasonable that not all 1-bit errors can be corrected.
The patterns given could be successfully memorized using the ECR. Regarding
other issues, the ECR produced very few limit cycles. However, it produced spurious
stable states, and the error tolerance of the ECR was not very good either. This is
because the criterion on which training terminates is accuracy, not error
tolerance.
For the ETAM, these patterns are guaranteed to be saved because if errors occur,
Eqs. (9) and (10) will correct them immediately. Comparing the ECR and the ETAM,
the ECR produces more spurious stable states than the ETAM does when there are
few patterns and produces fewer spurious stable states when there are more patterns.
This is because the ETAM continues training until the minimal distance cannot be
increased further. A good minimal distance is easy to achieve when there are fewer
patterns but difficult to obtain when there are more patterns. Therefore, Eq. (12) is
repeated, the self-feedback weights wii continue increasing, and more spurious stable
states are generated. Overlarge self-feedback will cause a neuron to tend to stay in its
previous state and will produce more stable states. The extreme situation is that in
which all the weights and thresholds are zero except wii, which are all positive. In this
case, all the neurons remain unchanged, and all the global states are stable. We expect
the converged matrix to have weights of this kind for a large set of difficult patterns.
The ETAM produces no limit cycles. Again, this is because of the larger
self-feedback. When self-feedback is large, all neurons tend to stay in their previous
states; hence, the number of limit cycles can be effectively reduced. With the ETAM,
the performance in terms of recovery from noisy patterns is much better than that of
the other two methods, because we enlarge the neighborhood of each stored
pattern until no more improvement is possible. The larger the distance is, the more
errors the AM can tolerate.
Although the number of spurious stable states is increased by the ETAM because
of the nonzero diagonal elements in the weight matrix, this has the effect of reducing
the number of states in limit cycles and improving performance. This is a trade-off.
Since we do not want a slightly noisy pattern to converge to an incorrect stable state,
better error tolerance has priority. Also, detecting whether a pattern falls into a limit
cycle is much harder than detecting whether a pattern converges to an incorrect stable
state. Therefore, we would rather avoid limit cycles. Note that the computation cost of
the ETAM is about five to ten times that of the ECR in all of our simulations.

5.4 Example I for Auto-Associative Memory


This example is taken from the work of Tank and Hopfield (1987). The original
example is like a memo with six attributes for each person. The goal is to store three
patterns, (1, 1, 1, -1, -1, -1), (1, -1, 1, 1, -1, 1), and (1, 1, -1, 1, -1, -1), in an AM with six
neurons representing the six attributes. Using the LM, we obtain six stable states, which
is a fairly good result. However, in synchronous mode there are 15 2-cycles, with 24
more transient states converging to these limit cycles. This means that the AM
will seldom converge to the expected stable states and usually will not stop. Using the
ECR, no cycles exist, but eight more spurious stable states appear. Also, its error
tolerance is poor because almost none of the one-bit neighbors of these three patterns
converge to the expected patterns. Using the ETAM, the performance is better:
the number of stable states increases slightly to seven, but no limit cycles appear,
and about two-thirds of the one-bit neighbors converge to the patterns to
which they belong.

5.5 Example II for Auto-Associative Memory


Here, we will present another example of pattern recognition. There are ten
template patterns, the characters A to J, as shown in Fig. 7. Each pattern is an 8 × 12
character, so 96 neurons are required. In Table 2, we list the Hamming distances
between any two patterns. Ideally, the minimal Hamming distance for a pattern
divided by two should be the ideal error tolerance for that pattern. However, the result
is not as perfect as imagined, owing to the limits of linear separability. Consider Fig. 8:
the ideal radius is illustrated by the circle around each dot, but it is hopeless to linearly
separate these circles. The actually achievable radius is the distance from the pattern
to the straight line.
Fig. 7. The ten patterns

Fig. 8. The ideal radius is illustrated by the circle around each point. The actually
achievable radius is the distance from the point to the straight line

To determine the actually achievable radius of each pattern, we calculate $d_i^k$ for
all the neurons according to Eq. (8) for each template pattern after the learning
converges. If a bit j of pattern k is flipped, $d_i^k$ will increase or decrease by $2w_{ij}$,
depending on whether the signs of $X_j^k$ and $w_{ij}$ are identical or not, respectively. If
$|d_i^k|$ is larger than twice the sum of the $r_i^k$ largest weights, then $r_i^k$ is the
actual radius of neuron i for pattern k. That is,

$$2\sum_{j=1}^{r_i^k+1} w'_{ij} X_j^k X_i^k \;>\; d_i^k \;>\; 2\sum_{j=1}^{r_i^k} w'_{ij} X_j^k X_i^k, \qquad (25)$$

where the $w'_{ij}$ are the weights sorted in descending order of their corresponding
$w_{ij} X_j^k X_i^k$. Note that there is at least one positive term (the largest one being
$w_{ii} X_i^k X_i^k$), so Eq. (25) will be satisfied at least when $r_i^k = 0$. The radius of
pattern k is the minimal radius among those of all the neurons.
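A sketch of this radius computation; the helper name and loop structure are ours, while the sorting convention follows Eq. (25).

```python
import numpy as np

def actual_radius(W, theta, X):
    """Actual error-correction radius of each stored pattern, per Eq. (25).

    Flipping bit j of pattern k changes d_i^k by 2 * w_ij * X_j^k in the
    direction opposite to X_i^k at worst, so r_i^k is the largest number
    of flips whose total worst-case change stays below |d_i^k|."""
    P, N = X.shape
    radii = np.zeros(P, dtype=int)
    for k in range(P):
        r_k = N
        for i in range(N):
            d = W[i] @ X[k] - theta[i]
            # per-bit damage terms 2 * w'_ij * X_j^k * X_i^k, sorted descending
            damage = np.sort(2 * W[i] * X[k] * X[k, i])[::-1]
            r = 0
            while r < N and damage[: r + 1].sum() < abs(d):
                r += 1
            r_k = min(r_k, r)            # pattern radius = min over neurons
        radii[k] = r_k
    return radii
```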
Table 3 presents the results. We find that the HM memorizes none of these
patterns, and neither does the LM; there are always several incorrect bits in the
recalled pattern.
Therefore, we did not include the HM or the LM in the following simulation. By
applying the ECR, all patterns are successfully saved. However, the radii are all zero,
because training stops as soon as no further errors occur and error tolerance is not
considered. With the ETAM, all the patterns are stored, with radii of two or three.
Apparently, the actual radius is much smaller than the ideal radius, but the worst
case rarely occurs, as we can see in Table 4. For each pattern, we randomly
generated 1000 noisy patterns with 10, 20, 30, and 40 error bits and fed these noisy
patterns into the trained AM. The numbers of successfully recovered patterns are
listed. Almost all noisy patterns with 10 errors were recovered, and more than 80% of
the patterns with 20 errors were recovered. The number of recovered patterns is
roughly proportional to the ideal radius of each pattern. One thing worth noticing is
that no pattern fell into a limit cycle in any of the 40,000 experiments. The ETAM
performs excellently in reducing the number of limit cycles and improving error
tolerance.
There is no efficient way to measure the exact sizes of the basins of attraction in
such a large network, which has 2^96 states in total. We did try to measure the basin
sizes on a modern personal computer using brute force; it took us three weeks, and we
still failed to reach the boundaries of the basins of the three patterns A, B, and C.
According to these records, their basins must be huge. Instead, we use a practical
compromise, testing the error tolerance, to approximately reveal the huge basins of the
stored patterns. Better error tolerance must result from larger basins of attraction.
Also, as shown in Table 4, we tried to retrieve patterns in asynchronous mode.
Although the ETAM is designed for synchronous mode, it is also suitable for
asynchronous mode, in which neurons are processed sequentially. An error recovered
earlier may aid the correction of later errors, so restoring a noisy pattern benefits
greatly from such local corrections. This kind of correction is different from that in
synchronous mode, where all neurons are processed in a global sense. However, we
still cannot predict which of the two modes will be better in every situation.

Table 2. Hamming distances between all the patterns

     A    B    C    D    E    F    G    H    I    J
A    -    44   36   36   56   57   24   46   68   56
B    44   -    34   10*  20   27   32   24*  58   50
C    36   34   -    28   42   45   16*  54   48   46
D    36   10*  28   -    26   33   30   34   60   56
E    56   20   42   26   -    13*  46   24*  50   48
F    57   27   45   33   13*  -    53   27   49   43
G    24*  32   16*  30   46   53   -    48   56   50
H    46   24   54   34   24   27   48   -    66   52
I    68   58   48   60   50   49   56   66   -    28*
J    56   50   46   56   48   43   50   52   28*  -

Table 3. Actual radius of each pattern for the error tolerant associative memory
(ETAM) and error correction rule (ECR)

       A  B  C  D  E  F  G  H  I  J
ETAM   3  2  2  2  2  2  2  3  3  3
ECR    0  0  0  0  0  0  0  0  0  0

Table 4. Error tolerance for the error tolerant associative memory (ETAM) and error
correction rule (ECR)

Errors   A     B     C     D     E     F     G     H     I     J
ETAM in synchronous mode
10       1000  980   1000  989   993   989   998   1000  1000  1000
20       996   840   975   868   908   874   966   995   979   999
30       935   381   697   456   602   612   610   866   746   951
40       450   37    150   47    117   163   57    255   97    461
ETAM in asynchronous mode
10       1000  998   1000  1000  999   999   1000  1000  1000  1000
20       997   950   983   969   957   965   996   988   997   1000
30       938   673   833   811   671   707   820   701   854   945
40       474   190   400   289   115   104   312   78    126   432
ECR in synchronous mode
10       350   289   50    73    123   25    219   31    92    32
20       189   163   11    19    44    11    75    4     32    8
30       54    40    3     1     4     5     12    3     5     0
40       4     1     0     8     0     0     0     0     0     0
5.6 Example I for Temporal Associative Memory
In this section, we present simulations for the temporal AM. There are ten
patterns, from characters A to J. We require the temporal-AM to store these patterns in
different orders. In Fig. 9a, they are saved as a chain, in Fig. 9b, they are saved as a
cycle, and in Fig. 9c, they are saved as a tree. In general, the patterns can be saved in
a manner similar to a water system: the basins of the patterns form the river valleys in
the system. All the patterns are designed to transfer to a certain next pattern and
finally converge to a final stable state or a limit cycle.
In these three simulations, we use both Amari's method, Eq. (14), and the
algorithm presented in the previous section to train the memory. We found that
Amari's method cannot store all the patterns in our simulation: only character J can
be correctly recalled in the chain of patterns, only A and B in the cycle of patterns,
and only B in the tree of patterns.
The tree AM is extremely useful in speech recognition, though space does not
permit a full elaboration of such an application here. We will briefly introduce the
techniques developed in our laboratory for recognizing Chinese speech. We build a
tree for each Chinese character pronunciation. There are 1309 character
pronunciations in total. Fifty sequences of binary coded speech templates in mel scale
are collected for each character and used to train its tree AM. Each tree has a different
form, which is obtained by means of training with these 50 sequences. This tree
replaces the linear neural array (Liou et al. 1996). A sequence is composed of
time-warped temporal patterns for a character pronunciation. These 50 sequences are
used to train the ETAM repeatedly using the provided algorithm. The partial matches
among the training sequences determine the tree form. The time warping will not
affect the form of the tree, but it will introduce noise into the templates; the ETAM
can restore such noisy patterns. During the retrieving phase, a sequence of unknown
speech templates is applied to the trained 1309 trees in parallel, and credit is
accumulated at each tree root. This is similar to a river system, where we can collect
all the water at the river mouth: water that flows into any branch or its valley will
follow the river to the mouth. Likewise, any unknown pattern that falls in the tree
valley will evolve to the tree root. Recognition is achieved by finding the best credit
among the 1309 trees. This technique is particularly useful in recognizing Chinese
voice commands. We have obtained much better performance using these trees than
the arrays given by Liou et al. (1996). The ETAM has also been designed to learn
rhymes from musical notes; after training, it can generate varying sequences of notes
with similar melodies.
With the algorithm presented in the last section, we can successfully store these
patterns in varying orders. To test error tolerance, we randomly generate noisy
patterns and then check how many converge to the template patterns. Due to its
dynamic property, the temporal AM shifts to a different pattern after every pass
and will not converge to a stable pattern in the long run as the auto-associative
memory does. Therefore, each simulation has two parts. The first operates the
network for one single pass to see whether the correct next pattern is recalled;
we call this the recall rate. The second operates the network until it converges to a
stable state or a limit cycle, to see whether the final state or cycle belongs to the
trained sequence; we call this the convergence rate. The results of these two parts
are shown in Fig. 10.
In Fig. 10a, the recall rate is more than 90% when five errors occur, and it
decreases rapidly with the increase in the number of errors. However, as shown in Fig.
10b, the convergence rate is much higher than the recall rate and remains at 90%,
even when there are 30 errors. Sometimes, a noisy pattern cannot shift to the correct
next pattern in a single pass but falls into the basin of attraction of that pattern.
Therefore, one more pass will make the pattern converge to the stored sequence. This
figure shows that the basins of all the patterns combine into a huge valley system for
the tree.
The data shown in Fig. 10 were generated in the same way as those shown in
Table 4. The data were averaged over patterns to plot the figure.

Fig. 9a-c. The ten patterns in varying order. a A chain; b a cycle; c a tree

Fig. 10a-b. The recall rates and convergence rates

5.7 Example II for Temporal Associative Memory
In this section, we will present one more application. We can use our method to
implement a deterministic finite automaton (DFA) in a temporal AM. Figure 11
shows a DFA that accepts all strings containing 01011 as a substring. There are six
states, a two-symbol input alphabet {a, b} (with a playing the role of 0 and b of 1),
and 6 × 2 = 12 transition rules. We encode each transition rule into a pattern, as
shown in Table 5. Each of the first six neurons represents a state: the neuron
representing the current state is on, and the others are off. The other two neurons are
input neurons: when the input is a (or b), the seventh (or eighth) neuron is on, and the
other one is off. First, we set the state to the initial state S0. When an input is
received, we clamp it into the two input neurons and let the six state neurons operate
in synchronous mode. After a single pass, we present the next input, and so on. When
the inputs have all been presented, we check whether the current state is the final
state S5. After training the AM using our method, we find that this DFA can be
encoded into the temporal AM successfully. When we increase the number of state
neurons, this DFA can tolerate noise in the state neurons.

Fig. 11. The deterministic finite automaton (DFA) recognizing 01011
Table 5. The transition rules and corresponding patterns
State Input Next Pattern
S0 a S1 100000 10 → 010000
S0 b S0 100000 01 → 100000
S1 a S1 010000 10 → 010000
S1 b S2 010000 01 → 001000
S2 a S3 001000 10 → 000100
S2 b S0 001000 01 → 100000
S3 a S1 000100 10 → 010000
S3 b S4 000100 01 → 000010
S4 a S3 000010 10 → 000100
S4 b S5 000010 01 → 000001
S5 a S5 000001 10 → 000001
S5 b S5 000001 01 → 000001
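A sketch of this encoding and of the clamped retrieval loop. The pattern construction follows Table 5; the helper names are ours, and the trainer is assumed to be the temporal ETAM of Sect. 4 applied to the listed pairs.

```python
import numpy as np

# Transition rules of Table 5 as (state, input, next_state) triples.
RULES = [(0,'a',1),(0,'b',0),(1,'a',1),(1,'b',2),(2,'a',3),(2,'b',0),
         (3,'a',1),(3,'b',4),(4,'a',3),(4,'b',5),(5,'a',5),(5,'b',5)]

def code(state, sym, n_states=6):
    """Bipolar pattern of Table 5: one-hot state plus two input neurons."""
    v = -np.ones(n_states + 2)
    v[state] = 1.0
    v[n_states + (0 if sym == 'a' else 1)] = 1.0
    return v

def training_pairs():
    """Current patterns and their successors for the temporal ETAM.
    Table 5 does not constrain the input bits of each successor; here we
    arbitrarily reuse the current input symbol."""
    X = np.array([code(s, a) for s, a, _ in RULES])
    Y = np.array([code(ns, a) for s, a, ns in RULES])
    return X, Y

def run_dfa(W, theta, string):
    """Clamp the two input neurons, update the six state neurons
    synchronously, and accept iff the final state is S5."""
    v = code(0, string[0])                   # start in the initial state S0
    for sym in string:
        v[6] = 1.0 if sym == 'a' else -1.0   # clamp the input neurons
        v[7] = -v[6]
        v[:6] = np.where(W[:6] @ v - theta[:6] >= 0, 1.0, -1.0)
    return v[5] == 1.0                       # accepting state S5
```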

6 Discussions

6.1 About the Error-Correction Rule


In Eq. (22), the ECR trains the AM according to the error term, and the iteration
stops when there are no more errors. This terminating criterion definitely achieves
correctness, but an AM without error tolerance is useless, since the patterns presented
will usually be distorted. Therefore, we may modify the terminating criterion of the
ECR to postpone the end of training and enlarge the summation result in Eq. (24),
which is the distance in the ETAM. Then, Eq. (24) becomes

$$v_i(t) = \operatorname{sgn}\Bigl[\sum_{j=1}^{N} w_{ij}(t)\, X_j^k - \theta_i(t) - \gamma X_i^k\Bigr], \qquad (26)$$

where γ, the terminating criterion, is a positive constant. This equation was given by
Gardner (1987, 1988). With γ equal to zero, we obtain the original error correction
rule. Since γ is positive, it is harder for the learning to reach a correct result. If
$X_i^k$ is positive, we need a larger $\sum_{j=1}^{N} w_{ij}(t) X_j^k - \theta_i(t)$, which is equal to the
distance in Eq. (8), to make $v_i(t)$ positive. Therefore, even if a few bits are wrong,
the summation can still retain the same sign. This has the same effect that the ETAM
has in pushing the plane as far away from the patterns as possible. However, Eq. (22)
together with Eq. (26) will not rotate the planes the same way Eq. (12) does.
How to select γ is a problem. If γ is too small, good error tolerance cannot be
achieved, while if γ is too large, the iteration time will be long and the learning may
never stop. We can gradually increase the value of γ, raising it whenever the learning
in Eq. (26) succeeds at the current value.
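Relative to the ECR sketch in Sect. 5.2, the change is a single term inside the sign function; the scheduling of γ is the open issue just described. A sketch:

```python
import numpy as np

def train_ecr_margin(X, eta=0.1, gamma=0.5, max_epochs=10000, rng=None):
    """Error-correction rule with Gardner's margin gamma, Eq. (26).

    A bit only counts as correct when its distance to the hyperplane
    exceeds gamma, which pushes the planes away from the patterns much
    as the ETAM does."""
    rng = rng or np.random.default_rng()
    P, N = X.shape
    W = rng.normal(size=(N, N))
    theta = np.zeros(N)
    for _ in range(max_epochs):
        errors = 0
        for k in rng.permutation(P):
            # Eq. (26): subtract gamma * X_i^k before taking the sign
            v = np.where(W @ X[k] - theta - gamma * X[k] >= 0, 1, -1)
            err = X[k] - v
            W += eta * np.outer(err, X[k])   # Eq. (22)
            theta -= eta * err               # Eq. (23)
            errors += np.count_nonzero(err)
        if errors == 0:
            break
    return W, theta
```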

6.2 About Error Tolerant Associative Memory


Pushing the planes forces each pattern to occupy more state space and hence
enlarges the basin of attraction of the corresponding pattern, to which all nearby
global states will finally converge. These patterns are thus harder to forget and easier
to recover.

In the ETAM algorithm, the terminating criterion is that the minimal distance no
longer increases. In order to avoid becoming trapped, we can relax the terminating
criterion (in learning step 7 of the retraining algorithm) by tolerating a decrease in
the minimal distance two or more times in succession. We find that doing this
effectively improves the result. However, it may lengthen the iteration time, whereas
the original ETAM is guaranteed to stop.
When the ith bits of all the patterns are the same, say 1, we simply stop the
training procedure for this neuron and set the threshold θi to a large negative
number. This ensures that this neuron will eventually converge to 1, no matter what
the initial state is. This also saves much time and reserves more space for newly
arriving memories.
Finally, in the ETAM, the weights are updated according to Eq. (12) or Eq. (20).
Although the ETAM is designed based on the geometric viewpoint, it meets Hebb's
postulate of learning. When two neurons fire at the same time, the weight between
them is increased; otherwise, the weight is decreased. This is quite similar to many
other learning methods; the difference lies in how the patterns used to adjust the
weights are chosen. A neuron's weights are tuned only for the most critical patterns,
selected by Eq. (9) or Eq. (17). These critical patterns hold the attention of the
neurons. This is similar to support vectors (Boser 1992).
As for biological modeling, the self-feedback synapse and the threshold of each
neuron are active in screening out the vast number of noncritical patterns, which
reduces the adaptation activity on the neuron's synapses. Its threshold adapts to
represent the critical patterns, and its synapses adapt to keep the critical patterns as
discriminable as possible. A simple thresholding unit and Hebbian synapses are all
that a neuron needs to carry out this modeling.
Note that when one uses {0, 1} as the state values instead of {1, -1}, a coordinate
transformation, as for the Hopfield network, converts between the two kinds of state
value representation. This transformation scales and shifts the hypercube {-1, 1}^N to
match the hypercube {0, 1}^N. Substituting Xi = 2Yi − 1, i = 1, …, N, into the
retraining algorithm gives the formulas in {0, 1}^N space. The diagonal length of the
hypercube should be scaled accordingly.
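Carrying the substitution through the update rule makes the transformed parameters explicit (a sketch of the algebra; the primed symbols are our notation):

$$\sum_{j=1}^{N} w_{ij} X_j - \theta_i = \sum_{j=1}^{N} w_{ij}(2Y_j - 1) - \theta_i = 2\sum_{j=1}^{N} w_{ij} Y_j - \Bigl(\theta_i + \sum_{j=1}^{N} w_{ij}\Bigr),$$

so a network trained in {-1, 1}^N space runs in {0, 1}^N space with weights $w'_{ij} = 2w_{ij}$ and thresholds $\theta'_i = \theta_i + \sum_j w_{ij}$.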

6.3 Nonnegative Definiteness


We noted earlier that large self-feedback (wii) is of benefit to nonnegative
definiteness and, hence, to convergence. This is obvious in the following equation:
$$x^T W x = \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij}\, x_i x_j \ge 0. \qquad (27)$$

Nonnegative definiteness requires that $x^T W x$ be greater than or equal to zero for all
vectors x. On the right-hand side of Eq. (27), a positive wii increases $x^T W x$.
Therefore, large self-feedback has the effect of improving convergence and reducing
limit cycles. It also increases the number of spurious stable states.
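This condition is easy to probe numerically. The following sketch uses the fact that $x^T W x$ depends only on the symmetric part of W; the function name is ours.

```python
import numpy as np

def diagonal_shift_to_psd(W):
    """Smallest uniform self-feedback that makes W nonnegative definite.

    x^T W x equals x^T S x for S = (W + W^T) / 2; if the smallest
    eigenvalue of S is lam_min < 0, adding -lam_min to every w_ii makes
    the quadratic form of Eq. (27) nonnegative for all x."""
    S = (W + W.T) / 2
    lam_min = np.linalg.eigvalsh(S).min()
    return max(0.0, -lam_min)
```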
In the geometric sense, large wii means that the ith hyperplane is nearly
perpendicular to its corresponding axis. If the thresholds are small or zero, then most
of the corners in which the ith bit is equal to 1 will lie in the positive division while
most of the corners in which the ith bit is equal to -1 will lie in the negative division.
Again, this shows that large self-feedback is good for stability.
