ETAM
Chapter 6
1 Introduction
Associative memory (AM) is a mechanism for storing patterns: when a
reasonable subset of a pattern is received with the remainder corrupted, the AM
can recover the original pattern. The fully connected network is a common
architecture for the AM. The interconnections between processing neurons provide
feedback, which enables the whole network to evolve recurrently into equilibrium.
Many well-developed models (Gardner 1987, 1988; Kanter and Sompolinsky 1987)
and algorithms have been devised to train the fully connected AM to improve its
accuracy, efficiency, and capacity. The Hopfield model (HM) (Hopfield, 1982) applies
Hebb's (1949) postulate of learning to generate the interconnection weights. It has the
advantage of easy training and guarantees convergence when operating in
asynchronous mode, but it also has a serious drawback: numerous limit
cycles in synchronous mode, due to its zero-diagonal, symmetric weight matrix and
zero thresholds. These unexpected limit cycles restrict its ability to operate in
synchronous mode, yet there is no reason to exclude synchronous evolution among
neurons.
According to the relation between input patterns and output patterns, AMs can be
divided into two classes: the auto-AM recalls a pattern that is the same as
the input pattern, while the hetero-AM presents an output pattern that is
different from the input pattern. The HM is an effective method for implementing the
auto-AM. This model is also extensively utilized to implement the hetero-AM, such
as the bidirectional AM (Kosko 1987, 1988), the multidirectional AM (Hagiwara, 1990),
and the temporal AM (Amari, 1972).
82 526 U1180 neural networks
One of the features of the fully connected structure is that it can be viewed as a
hypercube; therefore, the learning problem of the AM can be transformed into a
geometric problem. In this work, we will present a new learning algorithm called error
tolerant associative memory (ETAM), which enlarges the basins of attraction,
centered at the stored patterns, to the greatest extent possible to improve error
tolerance. Simulations show that this algorithm also reduces the number of limit
cycles. We will focus on the auto-AM and the temporal AM in this work. We will
first briefly describe the model used in this paper, then present the geometric
interpretation and the ETAM algorithm, and finally give computer experiments,
comparisons, and discussions.
v_i(t+1) = sgn[ Σ_{j=1}^{N} w_{ij} v_j(t) − θ_i ] , (1)
or in matrix form,
V (t + 1) = sgn[WV (t ) − θ ] , (2)
where W is an N × N weight matrix, θ is an N × 1 threshold vector, V(t) is an N × 1
vector representing the global state at time t, and sgn(·) is the sign function,
returning 1 for input greater than or equal to zero and -1 for negative input. In the
learning phase, the network is trained by P patterns Xk, k=1, …,P, according to
various learning algorithms (Widrow and Hoff, 1960; Hopfield, 1982). In the
retrieving phase, the input is presented to the network as V(0). Equation (2) is applied
to all the neurons in each iteration, so this network is said to operate in synchronous
mode; Eq. (1) is applied to only one neuron, and this network operates in
asynchronous mode. Then, the network operates repeatedly according to Eq. (1) or Eq.
(2) until it converges to a stable state or enters a limit cycle. A stable state meets the
following requirement:
V (t ) = sgn[WV (t ) − θ ] , (3)
regardless of whether it operates in synchronous or asynchronous mode. Each neuron
has a bipolar state value (a bit), and there are 2N global states in total. Therefore we
can view the whole network as an N-dimensional (N-D) hypercube with each global
state located at a corner. Any two neighboring corners differ in only one neuron state,
that is, by a Hamming distance of one. For example, Fig. 1 shows a 3-D cube corresponding
to a network with three neurons. The current global state is located at one corner and
serves as the next input to the network. After updating according to Eq. (1) or Eq. (2),
the current global state either moves to another corner or stays at the original corner.
Corners that always remain unchanged are stable states. The patterns we intend to
save are located at certain stable corners. The goal of AM is to move the initial global
state to a nearby stable corner where a pattern is stored.
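The retrieval dynamics of Eqs. (1)-(3) can be sketched as follows; this is a minimal NumPy illustration, with the weight matrix W and threshold vector θ assumed to be supplied by some learning rule.

```python
import numpy as np

def sgn(x):
    """Sign function: +1 for x >= 0, -1 otherwise (as defined for Eq. (1))."""
    return np.where(x >= 0, 1, -1)

def retrieve_sync(W, theta, v0, max_iter=100):
    """Synchronous mode: update all neurons at once (Eq. (2))."""
    v = v0.copy()
    for _ in range(max_iter):
        v_next = sgn(W @ v - theta)
        if np.array_equal(v_next, v):   # stable state reached, Eq. (3)
            break
        v = v_next
    return v

def retrieve_async(W, theta, v0, max_iter=100, rng=None):
    """Asynchronous mode: update one randomly chosen neuron at a time (Eq. (1))."""
    rng = rng or np.random.default_rng(0)
    v = v0.copy()
    for _ in range(max_iter * len(v)):
        i = rng.integers(len(v))
        v[i] = 1 if W[i] @ v - theta[i] >= 0 else -1
    return v
```

With Hebbian weights for a single pattern, a one-bit-corrupted input moves back to the stored corner in one synchronous pass.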
Fig. 1. A fully connected associative memory (AM) with three neurons can be viewed
as a 3-D cube. There are eight corners representing the eight global states. A neuron
represents a plane (shaded area) dividing the cube into positive and negative divisions.
In this figure, (1,1,1), (1,-1,1), (1,1,-1), and (-1,1,1) are in the positive division,
while the other corners are in the negative division.
For a corrupted input to converge to its original pattern, the neighbors of a pattern
should be in the same division in which the pattern is located. This can be achieved by rotating and shifting
the hyperplane so that it will face the pattern corner and include as many neighboring
corners in the same division as possible. These rotations and shifts can be performed
by adjusting the weights and the thresholds.
Each hyperplane can be adjusted separately to stabilize the ith bits of patterns. In
the following, unless otherwise stated, we will only discuss neuron i. First, we will
illustrate a 3-D cube to observe the mechanism by which the AM saves patterns, then
we will expand this mechanism into a higher dimensional hypercube; that is, we will
present a general idea for training an AM with more neurons.
An example is shown in Fig. 2. It is an AM with three neurons, in which two
patterns, (1,1,1) and (-1,-1,-1), are stored. For the first neuron, we require that the
weights satisfy the following conditions, obtained by applying Eq. (3) to the two stored patterns:
w_{11} + w_{12} + w_{13} − θ_1 ≥ 0 , (4)
−w_{11} − w_{12} − w_{13} − θ_1 < 0 . (5)
The other two neurons also have to satisfy similar conditions, so we do not show their
corresponding planes. The black dots are patterns to be stored. The dark parts are the
positive division of the plane.
The best configuration is shown in Fig. 2c. The normal vector of the plane, (w11, w12, w13), is adjusted to point
at (1,1,1) with a right angle to the corner, and so that the plane lies in the middle
between (1,1,1) and (-1,-1,-1). In this case, all three planes will coincide.
In Fig. 3, we see only one pattern (1,-1,1) to be stored, and the first neuron
hyperplane is shown. In Fig. 3a, there is no error tolerance; in Fig. 3b, the one-bit
error can be recovered; in Fig. 3c, even a two-bit error can be recovered. In all three
cases, the normal vector of the plane, (w11, w12, w13), points at the pattern corner to be
stored with a right angle. The only difference is the threshold. When we modify the
threshold to move the pattern farther from the plane, we get better error tolerance. If
there is more than one pattern, such as in the example shown in Fig. 4a, b, the normal
vector points at the midpoint of the stored patterns; that is, the normal vector is the
sum of all the vectors which point from the origin to the patterns. In Fig. 4c, for the
second neuron, the two patterns have to lie separately in the two divisions, and the
best way to locate the plane is to make the distances from the two patterns to the plane
equal. Therefore the two patterns will have the same error tolerance.
As a result, we obtain the general idea that the normal vector of a certain
hyperplane representing neuron i is the sum of all patterns in the positive division and
all the patterns in the negative division multiplied by -1. In terms of a neural network,
the weights (wi1, wi2, wi3, …, wiN) are the sum of all patterns in which the ith bit is equal
to 1 and all the patterns multiplied by -1 in which the ith bit is equal to -1. This results
in the equation:
w_{ij} = Σ_{k=1}^{P} X_j^k X_i^k , i, j = 1, …, N. (6)
We find that Eq. (6) coincides with the Hopfield model except for the diagonal
weights wii. In the HM, all self-feedback weights wii are set to zero to achieve convergence.
This self-feedback has been discussed in Kanter and Sompolinsky (1987), Araki and
Saito (1995), and DeWilde (1997). In asynchronous mode, zero self-feedback and
symmetry ensure that the AM will finally converge to a stable state, but in
synchronous mode, there will be some limit cycles with length two (Bruck, 1990;
Goles and Martinez 1990). Following Xu et al. (1996), we know that sufficient
conditions for the AM to converge in synchronous mode are that the weight matrix W
is symmetric and nonnegative definite, or that W with certain values subtracted from
its diagonal is nonnegative definite. Equation (6) will make all wii equal to P, the
largest among all weights, and large self-feedback will benefit nonnegative
definiteness and convergence.
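The outer-product rule of Eq. (6) reduces to a single matrix product when the P patterns are stacked as the rows of a P × N array (an assumed layout for illustration):

```python
import numpy as np

def hebbian_weights(X, zero_diagonal=False):
    """Eq. (6): w_ij = sum_k X_j^k X_i^k, for bipolar patterns in the rows of X."""
    W = X.T @ X                  # sum of outer products over all patterns
    if zero_diagonal:
        np.fill_diagonal(W, 0)   # the Hopfield-model variant: w_ii = 0
    return W                     # otherwise w_ii = P, the largest weight
```

For P stored patterns, every diagonal entry of the un-zeroed matrix equals P, matching the observation above.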
Besides the diagonal elements, the HM has null thresholds, which means that all
hyperplanes pass through the origin. Even though the patterns may be stored without
thresholds, error tolerance can be greatly improved by tuning the thresholds, as in the
example in Fig. 3. The problem of threshold tuning strategies has been addressed in
Schwenker et al. (1996) for the {0,1} model. By tuning thresholds, we can move the
planes as far away from the patterns as possible to increase error tolerance.
If there is more than one pattern, we can adjust the threshold to maximize the minimal
distance among all distances from the patterns to the plane, as shown in Figs. 2c and
4c.
Fig. 4a-c. An associative memory (AM) saving (1, -1, 1) and (1, 1, 1) with different
degrees of error tolerance
We shift the threshold to maximize the minimal distance from the patterns to the
hyperplane. We propose the following retraining algorithm:
1. Initialize the weights according to Eq. (6) and set θ_i(0) = 0, i = 1, …, N.
Then normalize W_i, i = 1, …, N.
2. Set i to 1.
3. For the ith neuron, calculate the distances from all patterns to the
hyperplane, that is,
d_i^p = min{ d_i^k | X_i^k = 1, k = 1, …, P },
d_i^n = max{ d_i^k | X_i^k = −1, k = 1, …, P }. (9)
4. Shift the hyperplane to the midpoint between the two critical patterns by
updating the threshold:
θ_i(t+1) = θ_i(t) + (d_i^p + d_i^n)/2 . (10)
5. Compute the minimal distance:
d_i^m = (d_i^p − d_i^n)/2 . (11)
6. We rotate the hyperplane to increase the distances from both pattern p and
pattern n to the hyperplane:
w_{ij}(t+1) = w_{ij}(t) + α( X_i^p X_j^p + X_i^n X_j^n ) ,
j = 1, …, N. (12)
where α is the learning rate. Normalize Wi.
7. Repeat Eq. (8) and Eq. (9) and compute the new d_i^p and d_i^n. If the new
minimal distance d_i^m is larger than before, repeat from step 4; otherwise, stop
training this neuron and proceed to the next one.
The normalization procedure used in learning step 6 after Eq. (12) is necessary to
limit the range of the normal vector Wi. This normalization may produce an asymmetric
matrix W. Such an asymmetric matrix is not excluded by any existing
physiological evidence (e.g., Hertz et al. 1986, 1991; Porat 1989). The diagonal
elements of this matrix are neither equal to one another nor equal to zero; these
elements take large positive values after learning. To our knowledge, this kind of
matrix has not been derived by any existing method.
The training time is case-dependent, and the learning process can be accelerated
by increasing the learning rate. Since the weights are normalized in the ETAM, they
cannot increase without limit, and neither can the distances. Therefore, the learning
process is guaranteed to terminate.
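The retraining loop above can be sketched in NumPy. Equation (8), which defines the distance d_i^k, falls outside this excerpt; the sketch assumes it is the signed field d_i^k = W_i · X^k − θ_i with W_i kept at unit norm, the form consistent with Eqs. (9)-(11).

```python
import numpy as np

def etam_train(X, alpha=0.1, max_iter=500):
    """ETAM retraining sketch.  X: P x N array of bipolar patterns (assumed layout)."""
    P, N = X.shape
    W = (X.T @ X).astype(float)                    # step 1: start from Eq. (6)
    W /= np.linalg.norm(W, axis=1, keepdims=True)  # normalize each W_i
    theta = np.zeros(N)
    for i in range(N):                             # one hyperplane per neuron
        pos, neg = X[:, i] == 1, X[:, i] == -1
        if pos.all() or neg.all():                 # all ith bits identical:
            theta[i] = -2.0 * N * X[0, i]          # pin this neuron's output (Sect. 6)
            continue
        best = -np.inf
        for _ in range(max_iter):
            d = X @ W[i] - theta[i]                # Eq. (8), assumed form
            dp, dn = d[pos].min(), d[neg].max()    # Eq. (9): critical distances
            theta[i] += (dp + dn) / 2              # Eq. (10): shift the plane
            margin = (dp - dn) / 2                 # Eq. (11)
            if margin <= best:                     # step 7: no more improvement
                break
            best = margin
            d = X @ W[i] - theta[i]
            p = np.argmin(np.where(pos, d, np.inf))    # critical pattern p
            n = np.argmax(np.where(neg, d, -np.inf))   # critical pattern n
            W[i] += alpha * (X[p, i] * X[p] + X[n, i] * X[n])  # Eq. (12): rotate
            W[i] /= np.linalg.norm(W[i])
    return W, theta
```

On a small pattern set the trained network makes every stored pattern a stable state of Eq. (3), with the margin pushed as far as the rotations allow.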
X_i^{k+1} = sgn( Σ_{j=1}^{N} w_{ij} X_j^k − θ_i ). (13)
The superscript of the pattern Xk is computed modulo P + 1. When an initial
input state V(0) close to a stored pattern Xk is presented, the pattern Xk+1 will be the first pattern
recalled, and the remaining patterns will be recalled sequentially.
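Sequence recall with Eq. (13) is a plain synchronous iteration; a sketch:

```python
import numpy as np

def recall_sequence(W, theta, v0, steps):
    """Iterate Eq. (13) for `steps` passes, returning the visited states."""
    v, out = v0.copy(), []
    for _ in range(steps):
        v = np.where(W @ v - theta >= 0, 1, -1)   # pushed toward the *next* pattern
        out.append(v.copy())
    return out
```

As an illustrative assumption (an Amari-style choice, not the ETAM weights), hetero outer products W = Σ_k X^{k+1}(X^k)^T over mutually orthogonal patterns make successive passes walk through the stored cycle.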
This temporal AM can store various kinds of pattern sequences, such as a single
chain (dotted line in Fig. 5), a cycle of patterns (dashed line in Fig. 5), or a tree (the
upper two patterns in Fig. 5). Generally, the temporal AM is able to save all
one-to-one or many-to-one patterns, but not one-to-many patterns like that shown in
Fig. 6. We will not discuss the patterns in Fig. 6 in this paper.
This temporal associative memory has asymmetric weights and no thresholds. However,
it has the same drawback as the HM: low capacity. Amari's method usually cannot
memorize complete patterns, as we will see in the simulations described later.
The idea presented in Sect. 2 is also applicable to the temporal AM or any other
hetero-AM. The difference is that in the auto-AM, all the states in the basin of a
pattern will converge to this pattern while in the temporal AM, all the states in the
basin of a pattern will evolve to its next pattern. To train the weights is to shift and rotate
the hyperplanes to separate the patterns into two divisions according to the ith bits of their
next patterns. Therefore, the whole algorithm in the previous section can be used to train the
temporal AM with the following modifications:
1. Initialize the weights, set θ_i(0) = 0, i = 1, …, N, and then normalize
W_i, i = 1, …, N.
2. Set i to 1.
3. For neuron i, calculate the distances from all patterns to the hyperplane, that
is,
d_i^p = min{ d_i^k | X_i^{k+1} = 1, k = 1, …, P },
d_i^n = max{ d_i^k | X_i^{k+1} = −1, k = 1, …, P }. (17)
4. Shift the hyperplane by updating the threshold:
θ_i(t+1) = θ_i(t) + (d_i^p + d_i^n)/2 . (18)
5. Compute the minimal distance:
d_i^m = (d_i^p − d_i^n)/2 . (19)
6. We rotate the hyperplane to increase the distances from both pattern p and
pattern n to the hyperplane:
w_{ij}(t+1) = w_{ij}(t) + α( X_i^{p+1} X_j^p + X_i^{n+1} X_j^n ) ,
j = 1, …, N. (20)
where α is the learning rate. Normalize Wi.
7. Repeat Eq. (16) and Eq. (17) and compute the new d_i^p and d_i^n. If the new
minimal distance d_i^m is larger than before, repeat from step 4; otherwise, stop
training this neuron and proceed to the next one.
5 Experiments
Next we present some simulations. We first give experimental results
comparing the Little model (LM) (Little 1974; Little and Shaw 1975), the error
correction rule (ECR), and the ETAM algorithm. Several issues, such as the number of
stable states, the number of limit cycles, and fault tolerance, will be discussed. All
three approaches were applied to a fully connected network; they differ only in the
learning phase. We will also give examples of utilizing the ETAM in a pattern
recognition problem. Below, we first briefly review these algorithms.
In the HM, all the diagonal elements of the weight matrix, wii, are zero. This means
that the neurons have no self-feedback, which has the effect of reducing the number
of spurious stable states, since overlarge self-feedback causes neurons to tend to
retain their previous states. The zero-diagonal, symmetric weight matrix forces the
HM to always converge to a stable state under asynchronous dynamics. The model we
are interested in here, the Little model (LM), is similar to the HM; it differs from the
HM only in that it uses synchronous dynamics, under which the network always
converges either to a stable state or to a limit cycle of length two.
The ECR adjusts the weights in proportion to the error term [X_i^k − v_i(t)]. At
the beginning, we randomly assign initial values to all the weights; then we adjust all
the weights according to the following equations:
w_{ij}(t+1) = w_{ij}(t) + η [X_i^k − v_i(t)] X_j^k , (22)
θ_i(t+1) = θ_i(t) − η [X_i^k − v_i(t)] , (23)
v_i(t) = sgn[ Σ_{j=1}^{N} w_{ij}(t) X_j^k − θ_i(t) ] , (24)
where η is a positive constant which determines the rate of learning. The pattern Xk
used to train the network is chosen randomly from among all the patterns. Adjustment
continues until there are no more errors.
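The ECR can be sketched as below. The exact form of the weight update does not survive intact in this excerpt, so the standard perceptron-style rule Δw_ij = η[X_i^k − v_i(t)]X_j^k, matching the phrase "in proportion to the error term", is assumed.

```python
import numpy as np

def ecr_train(X, eta=0.1, max_epochs=2000, seed=0):
    """Error correction rule sketch.  X: P x N array of bipolar patterns."""
    P, N = X.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((N, N))            # random initial weights
    theta = np.zeros(N)
    for _ in range(max_epochs):
        x = X[rng.integers(P)]                 # pick a random training pattern
        v = np.where(W @ x - theta >= 0, 1, -1)        # Eq. (24)
        err = x - v                            # error term [X_i^k - v_i(t)]
        W += eta * np.outer(err, x)            # weight update, assumed form
        theta -= eta * err                     # Eq. (23)
        # terminate when every pattern is reproduced without error
        if all(np.array_equal(np.where(W @ p - theta >= 0, 1, -1), p) for p in X):
            break
    return W, theta
```

Note that the loop stops on accuracy alone, which is exactly the terminating criterion criticized later in the comparison with the ETAM.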
Table 1. Comparisons among the Little model (LM), error correction rule (ECR), and
error tolerant associative memory (ETAM)
The LM has a rather low capacity, so it cannot store all the patterns in all
situations. The maximum number of patterns stored in the LM is similar to that of the
HM, which is N/(4 ln N) (Weisbuch and Fogelman-Soulié 1985; McEliece et al. 1987).
This number has been improved by Mazza (1997). Also, the LM produces many limit
cycles.
For these two reasons, even though the LM produces the fewest stable states, this
advantage is of little use. When the number of patterns is small enough
compared to the number of neurons, recovery from distorted patterns using the LM has
acceptable performance, as Table 1 shows with ten neurons and three patterns.
However, when there are more patterns, the LM performs poorly in terms of error
tolerance, and so does the ECR. Note that in this experiment, the patterns were
randomly generated, and some differed by only one or two bits. In this situation, it is
reasonable that not all 1-bit errors can be corrected.
The given patterns could be successfully memorized using the ECR. Regarding
other issues, the ECR produced very few limit cycles. However, it produced spurious
stable states, and the error tolerance of the ECR was not very good either. This is
because the criterion by which training terminates is accuracy, not error
tolerance.
For the ETAM, these patterns are guaranteed to be saved because if errors occur,
Eqs. (9) and (10) will correct them immediately. Comparing the ECR and the ETAM,
the ECR produces more spurious stable states than the ETAM does when there are
few patterns and produces fewer spurious stable states when there are more patterns.
This is because the ETAM continues training until the minimal distance cannot be
increased further. A good minimal distance is easy to achieve when there are fewer
patterns but difficult to obtain when there are more patterns. Therefore, Eq. (12) is
repeated, the self-feedback weights wii continue increasing, and more spurious stable
states are generated. Overlarge self-feedback will cause a neuron to tend to stay in its
previous state and will produce more stable states. The extreme situation is that in
which all the weights and thresholds are zero except wii, which are all positive. In this
case, all the neurons remain unchanged, and all the global states are stable. We expect
the converged matrix to have weights of this kind for a large set of difficult patterns.
The ETAM produces no limit cycles. Again, this is because of the larger
self-feedback. When self-feedback is large, all neurons tend to stay in their previous
states; hence, the number of limit cycles can be effectively reduced. With the ETAM,
the performance in terms of recovery from noisy patterns is much better than that of
the other two methods, because we enlarge the neighborhood of each stored
pattern until no more improvement is possible. The larger the distance is, the more
errors the AM can tolerate.
Although the number of spurious stable states is increased by the ETAM because
of the nonzero diagonal elements in the weight matrix, this has the effect of reducing
the number of states in limit cycles and improving performance. This is a trade-off.
Since we do not want a slightly noisy pattern to converge to an incorrect stable state,
better error tolerance has priority. Also, detecting whether a pattern falls into a limit
cycle is much harder than detecting whether a pattern converges to an incorrect stable
state. Therefore, we would rather avoid limit cycles. Note that the computation cost
of the ETAM is about five to ten times that of the ECR in all of our simulations.
After learning, we compute the distance d_i^k of all the neurons according to Eq. (8)
for each template pattern. Each term w_{ij} X_j^k X_i^k is positive or negative
depending on whether the signs of X_j^k and w_{ij} are identical or not, respectively. If
|d_i^k| is larger than the sum of the r_i^k largest of these terms multiplied by two,
|d_i^k| > 2 Σ_{j=1}^{r_i^k} w'_{ij} , (25)
where w'_{ij} are the weights sorted in descending order according to their corresponding
w_{ij} X_j^k X_i^k, then r_i^k is a radius of errors that neuron i can tolerate. Note that
there is at least one term with a positive value (the largest one being w_{ii} X_i^k X_i^k),
so Equation (25) will be satisfied at least when r_i^k = 0. The
radius of pattern k is the minimal one among the radii of all the neurons. Table 3
presents the result. First, we find that the HM memorizes none of these patterns, and
neither does the LM. There are always several incorrect bits in the recalled pattern.
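The radius computation can be sketched as follows, under a reconstruction of Eq. (25): flipping bit j changes the field of neuron i by twice the term w_ij X_j^k X_i^k, so the radius is the largest r for which the margin survives the r worst flips. The form of d_i^k is an assumption, since Eq. (8) is outside this excerpt.

```python
import numpy as np

def neuron_radius(W, theta, x, i):
    """Largest r such that d_i > 2 * (sum of the r largest contributions)."""
    d = x[i] * (W[i] @ x - theta[i])            # margin of neuron i on pattern x
    contrib = np.sort(W[i] * x * x[i])[::-1]    # terms w_ij X_j X_i, descending
    r = 0
    while r < len(x) and d > 2 * contrib[:r + 1].sum():
        r += 1
    return r

def pattern_radius(W, theta, x):
    """Radius of a pattern: the minimum over the radii of all neurons."""
    return min(neuron_radius(W, theta, x, i) for i in range(len(x)))
```

For a single Hebbian-stored pattern of length 5, every contribution equals 1 and the margin is 5, so two flipped bits can be tolerated.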
Table 2. Distances between each pair of the ten patterns

     A    B    C    D    E    F    G    H    I    J
A    -   44   36   36   56   57   24   46   68   56
B   44    -   34   10*  20   27   32   24*  58   50
C   36   34    -   28   42   45   16*  54   48   46
D   36   10*  28    -   26   33   30   34   60   56
E   56   20   42   26    -   13*  46   24*  50   48
F   57   27   45   33   13*   -   53   27   49   43
G   24*  32   16*  30   46   53    -   48   56   50
H   46   24   54   34   24   27   48    -   66   52
I   68   58   48   60   50   49   56   66    -   28*
J   56   50   46   56   48   43   50   52   28*   -
Table 3. Actual radius of each pattern for the error tolerant associative memory
(ETAM) and error correction rule (ECR)

       A  B  C  D  E  F  G  H  I  J
ETAM   3  2  2  2  2  2  2  3  3  3
ECR    0  0  0  0  0  0  0  0  0  0
Table 4. Error tolerance for the error tolerant associative memory (ETAM) and error
correction rule (ECR)

Errors    A     B     C     D     E     F     G     H     I     J
ETAM in synchronous mode
  10    1000   980  1000   989   993   989   998  1000  1000  1000
  20     996   840   975   868   908   874   966   995   979   999
  30     935   381   697   456   602   612   610   866   746   951
  40     450    37   150    47   117   163    57   255    97   461
ETAM in asynchronous mode
  10    1000   998  1000  1000   999   999  1000  1000  1000  1000
  20     997   950   983   969   957   965   996   988   997  1000
  30     938   673   833   811   671   707   820   701   854   945
  40     474   190   400   289   115   104   312    78   126   432
ECR in synchronous mode
  10     350   289    50    73   123    25   219    31    92    32
  20     189   163    11    19    44    11    75     4    32     8
  30      54    40     3     1     4     5    12     3     5     0
  40       4     1     0     8     0     0     0     0     0     0
The temporal AM was trained to store the ten patterns in varying order (Fig. 9). To
test its error tolerance, we randomly generated noisy patterns and checked how many
converge to the template patterns. Because of its dynamic property, the temporal AM
shifts to a different pattern after every pass and, unlike the auto-associative memory,
does not converge to a stable pattern in the long run. Therefore, each simulation has
two parts. In the first part, the network is operated for one single pass to see whether
the correct next pattern is recalled; we call this measure the recall rate. In the second
part, the network is operated until it converges to a stable state or a limit cycle, to see
whether the final state or cycle belongs to the trained sequence; we call this measure
the convergence rate. The results of these two parts are shown in Fig. 10.
In Fig. 10a, the recall rate is more than 90% when five errors occur, and it
decreases rapidly as the number of errors increases. However, as shown in Fig.
10b, the convergence rate is much higher than the recall rate and remains at 90%,
even when there are 30 errors. Sometimes a noisy pattern cannot shift to the
correct next pattern but falls into the basin of attraction of that next pattern;
therefore, one more pass will make the pattern converge to the stored sequence. This
figure shows that the basins of all the patterns combine into a huge valley system for
the tree.
The data shown in Fig. 10 were generated in the same way as those shown in
Table 4. The data were averaged over patterns to plot the figure.
Fig. 9a-c. The ten patterns in varying order. a A chain; b a cycle; c a tree
Fig. 10a-b. The recall rates and convergence rates
Fig. 11. The deterministic finite state automata (DFA) recognizing 01011
Table 5. The transition rules and corresponding patterns
State  Input  Next  Pattern
S0     a      S1    100000 10 → 010000
S0     b      S0    100000 01 → 100000
S1     a      S1    010000 10 → 010000
S1     b      S2    010000 01 → 001000
S2     a      S3    001000 10 → 000100
S2     b      S0    001000 01 → 100000
S3     a      S1    000100 10 → 010000
S3     b      S4    000100 01 → 000010
S4     a      S3    000010 10 → 000100
S4     b      S5    000010 01 → 000001
S5     a      S5    000001 10 → 000001
S5     b      S5    000001 01 → 000001
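The transition table can be turned into bipolar training pairs for the temporal AM: the current pattern is the one-hot state code concatenated with the 2-bit input code (a → 10, b → 01), and the target is the one-hot code of the next state, all mapped to ±1. The encoding below mirrors Table 5; the helper names are illustrative.

```python
STATES = ["S0", "S1", "S2", "S3", "S4", "S5"]
INPUTS = {"a": [1, 0], "b": [0, 1]}          # input symbols coded as two bits
TRANSITIONS = [("S0","a","S1"), ("S0","b","S0"), ("S1","a","S1"), ("S1","b","S2"),
               ("S2","a","S3"), ("S2","b","S0"), ("S3","a","S1"), ("S3","b","S4"),
               ("S4","a","S3"), ("S4","b","S5"), ("S5","a","S5"), ("S5","b","S5")]

def bipolar(bits):
    """Map {0,1} bits to {-1,1} states."""
    return [2 * b - 1 for b in bits]

def onehot(state):
    return [1 if s == state else 0 for s in STATES]

def training_pairs():
    """(current pattern, next pattern) pairs for the temporal AM, from Table 5."""
    pairs = []
    for s, a, t in TRANSITIONS:
        x = bipolar(onehot(s) + INPUTS[a])   # 8-bit current pattern
        y = bipolar(onehot(t))               # 6-bit next-state pattern
        pairs.append((x, y))
    return pairs
```

Each of the twelve DFA transitions thus becomes one many-to-one association of the kind the temporal AM can store.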
6 Discussions
v_i(t) = sgn[ Σ_{j=1}^{N} w_{ij}(t) X_j^k − θ_i(t) − γ X_i^k ] , (26)
where γ, the terminating criterion, is a positive constant. This equation was given by
Gardner (1987, 1988). With γ equal to zero, we obtain the original error correction rule.
Since γ is positive, it is more difficult for the learning to achieve the correct result. If
X_i^k is positive, we need a larger Σ_{j=1}^{N} w_{ij}(t) X_j^k − θ_i(t), which is equal to the
distance in Eq. (8), to make v_i(t) positive. Therefore, even though a few bits are wrong,
the summation can still retain the same sign. Doing this has the same effect that the
ETAM has in pushing the plane as far as possible away from the patterns. However,
Eq. (22) together with Eq. (26) will not rotate the planes the same way as Eq. (12).
How to select γ is a problem. If γ is too small, good error tolerance cannot be
achieved, while if γ is too large, the iteration time will be too long and the learning
may never stop. One remedy is to increase the value of γ gradually, whenever the
learning in Eq. (26) succeeds with the current lower value.
In the ETAM algorithm, the terminating criterion is that the minimal distance no
longer increases. In order to avoid the states being trapped, we can improve the
terminating criterion (in learning step 7 in Sect. 2) by monitoring the decrease in the
minimal distance two or more times successively. We find that doing this effectively
improves the result. However, this may cause the iteration time to be long while the
original ETAM is guaranteed to stop.
When it is found that the ith bits of all the patterns are the same, say 1, we simply stop
the training procedure for this neuron and set the threshold θi to a large negative
number. This ensures that this neuron will eventually converge to 1, no matter what
the initial state is. This also saves much time and reserves more capacity for newly
arriving memories.
Finally, in the ETAM, the weights are updated according to Eq. (12) or Eq. (20).
Although the ETAM is designed based on the geometric viewpoint, it meets Hebb's
postulate of learning. When two neurons fire at the same time, the weight between
them is increased; otherwise, the weight is decreased. This is quite similar to many
other learning methods. The difference is in the way the pattern used to adjust the
weights is chosen. Neurons’ weights will be tuned only for those most critical patterns
selected by Eq. (9) or Eq. (17). These critical patterns have the attention of the
neurons. This is similar to support vectors (Boser 1992).
As for biological modeling, the self-feedback synapse and the threshold of each
neuron are active in screening out the vast number of noncritical patterns, which
reduces the adaptation activity on the neuron's synapses. The threshold adapts to
represent the critical patterns, and the synapses adapt to keep the critical patterns as
discriminable as possible. A simple thresholding unit and Hebbian synapses are all a
neuron needs to carry out this modeling.
Note that when one uses {0, 1} as the state values instead of {1, -1}, a coordinate
transformation, as for the HM, converts between the two kinds of state-value
representations. This transformation scales and shifts the hypercube {-1, 1}^N to match
the hypercube {0, 1}^N. Substituting X_i = 2Y_i − 1, i = 1, …, N, into the retraining
algorithm gives the formulas in {0, 1}^N space. The diagonal length of the
hypercube should be scaled accordingly.
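The substitution X_i = 2Y_i − 1 gives the {0,1}-space formulas directly: since Wv − θ = 2Wu − (θ + W·1) for v = 2u − 1, the same dynamics run in {0,1} space with doubled weights and shifted thresholds. A sketch:

```python
import numpy as np

def to_bipolar(y):
    """{0,1}^N -> {-1,1}^N via X_i = 2 Y_i - 1."""
    return 2 * y - 1

def to_binary(x):
    """{-1,1}^N -> {0,1}^N, the inverse map."""
    return (x + 1) // 2

def transform_params(W, theta):
    """Parameters (W', theta') so that sgn(W'u - theta') over binary u
    matches sgn(Wv - theta) over bipolar v = 2u - 1."""
    return 2 * W, theta + W.sum(axis=1)
```

The sign outputs still land in {-1, 1}; mapping them back with `to_binary` completes the transformation.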
x^T W x = Σ_{i=1}^{N} Σ_{j=1}^{N} w_{ij} x_i x_j ≥ 0 . (27)
Nonnegative definiteness requires that x^T W x be greater than or equal to zero for all
vectors x. On the right-hand side of Eq. (27), a positive wii increases x^T W x.
Therefore, large self-feedback has the effect of improving convergence and reducing
limit cycles; it also increases the number of spurious stable states.
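The condition of Eq. (27) can be checked numerically: x^T W x depends only on the symmetric part of W, whose smallest eigenvalue decides nonnegative definiteness. A sketch:

```python
import numpy as np

def is_nonneg_definite(W, tol=1e-9):
    """True if x^T W x >= 0 for all x, i.e. Eq. (27) holds."""
    S = (W + W.T) / 2                     # x^T W x depends only on this part
    return np.linalg.eigvalsh(S).min() >= -tol
```

Adding a constant c to every diagonal entry shifts each eigenvalue of the symmetric part up by c, which is why large self-feedback w_ii favors convergence in synchronous mode.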
In the geometric sense, large wii means that the ith hyperplane is nearly
perpendicular to its corresponding axis. If the thresholds are small or zero, then most
of the corners in which the ith bit is equal to 1 will lie in the positive division while
most of the corners in which the ith bit is equal to -1 will lie in the negative division.
Again, this shows that large self-feedback is good for stability.