
A semi-holographic hyperdimensional representation system for hardware-friendly cognitive computing

A. Serb∗, I. Kobyzev†, J. Wang∗, T. Prodromakis∗
∗ Zepler Institute, University of Southampton, SO17 1BJ, UK
† David R. Cheriton School of Computer Science, University of Waterloo, N2L 3G1, Canada
Corresponding author email: [email protected]

Abstract—One of the main, long-term objectives of artificial intelligence is the creation of thinking machines. To that end, substantial effort has been placed into designing cognitive systems, i.e. systems that can manipulate semantic-level information. A substantial part of that effort is oriented towards designing the mathematical machinery underlying cognition in a way that is very efficiently implementable in hardware. In this work we propose a 'semi-holographic' representation system that can be implemented in hardware using only multiplexing and addition operations, thus avoiding the need for expensive multiplication. The resulting architecture can be readily constructed by recycling standard microprocessor elements and <something about hardware performance>. Our proposed 'cognitive processing unit' (CoPU) is intended as just one (albeit crucial) part of much larger cognitive systems, where artificial neural networks of all kinds and associative memories work in concert to give rise to intelligence.

I. INTRODUCTION

The explosive scale of research output and investment in the field of artificial intelligence (AI) and machine learning (ML) testifies to the tremendous impact of the field on the world. Thus far this has manifested itself as a mass-scale proliferation of artificial neural network-based (ANN) algorithms for data classification. This covers multiple data modalities, most prominently images [1] and speech/sound [2], and relies on a number of standard, popular ANN architectures, most notably multi-layer perceptrons [3], recurrent NNs (in particular LSTM [4] and GRU [5]) and convolutional NNs [6], amongst many others [7], [8].

Thus far the vast majority of market-relevant ANN-based systems belong to the domain of statistical learning, i.e. they perform tasks which can generally be reduced to some sort of pattern recognition and interpolation (in time, space, etc.). This, though demonstrably useful, is akin to memorising every answer to every question, plus some ability to cope with uncertainty. In contrast, higher-level intelligence must be able to support fluid reasoning and syntactic generalisation, i.e. applying previous knowledge/experience to solve novel problems. This requires the packaging of classified information generated by traditional ANNs into higher-level variables (which we may call 'semantic objects'), which can then be fluently manipulated at that higher level of abstraction. A number of cognitive architectures have been proposed to perform such post-processing, most notably the ACT-R architecture [9] and the semantic pointer architecture (SPA) [10], which is an effort to manipulate symbols using neuron-based implementations.

Handling the complex interactions/operations between semantic objects requires both orderly semantic object representations and machinery to carry out useful object manipulation operations. Hyperdimensional vector-based representation systems [11] have emerged as the de facto standard approach and are employed in both the SPA and ACT-R. Their mathematical machinery typically includes generalised vector addition (combine two vectors in such a way that the result is as similar to both operands as possible), vector binding (combine two vectors in such a way that the result is as dissimilar to both operands as possible) and normalisation (scale vector elements so that overall vector magnitude remains constant). These operations may be instantiated in holographic (all operands and results have a fixed, common length) or non-holographic manners. Non-holographic systems have employed convolution [12] or tensor products [13] as binding. Holographic approaches have used circular convolution [11] and element-wise XOR [14]. Meanwhile, element-wise addition tends to remain the vector addition operation of choice across the board.

Finally, whichever computational methodology is adopted for cognitive computing must be implementable in hardware with extremely high power efficiency in order to realise its full potential for practical impact. This is the objective pursued by a number of accelerator architectures, spanning from limited-precision analogue neuron-based circuits [15], through analogue/digital mixtures [16], to fully analogue chips seeking to emulate the diffusive kinetics of real synapses [17]. More recently, memristor-based architectures have also emerged [18].

In this work, we summarise an existing, abstract mathematical structure for carrying out semantic object manipulation computations and propose an alternative, hardware-friendly instantiation. Our approach uses vector concatenation and modular addition as its fundamental operations (in contrast to the more typical element-wise vector addition and matrix-vector multiplication respectively). Crucially, the chosen set of operations no longer forms a holographic representation system. This trades away some 'expressivity' (the ability to form semantic object expressions within limited resources) in exchange for compression: unlike in holographic representations, a semantic object's vector length depends on its information content. Furthermore, the proposed system avoids the use of multiplication completely, thus allowing for both fast and efficient processing in hardware (avoiding both expensive multipliers and relatively slow spiking systems). Finally, we illustrate how the proposed system can be easily mapped onto a simple vector processing unit and provide some preliminary, expected performance metrics based on simulations in Cadence on a 65nm technology.
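To make the holographic scheme described above concrete, the following sketch (an illustration, not the system proposed in this work) binds with circular convolution [11], superposes with element-wise addition, and recalls by nearest-neighbour search over a vocabulary. The vector names and parameters (n = 1024, the involution-based approximate inverse) are illustrative assumptions, and numpy is assumed:

```python
import numpy as np

# Toy holographic reduced representation: superposition = element-wise addition,
# binding = circular convolution, recall = nearest vocabulary vector.
rng = np.random.default_rng(0)
n = 1024

def rand_vec():
    # entries sampled i.i.d. from N(0, 1/n), as is conventional for such systems
    return rng.normal(0.0, 1.0 / np.sqrt(n), n)

def bind(a, b):
    # circular convolution, computed via the FFT
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real

def inv(a):
    # involution: the standard approximate inverse under circular convolution
    return np.concatenate(([a[0]], a[:0:-1]))

vocab = {name: rand_vec() for name in ["obj", "col", "red", "green", "car"]}
s = bind(vocab["obj"], vocab["car"]) + bind(vocab["col"], vocab["red"])  # "red car"

# Query "what colour?": unbind with col's inverse, then find the nearest item
probe = bind(inv(vocab["col"]), s)
answer = min(vocab, key=lambda k: np.linalg.norm(vocab[k] - probe))
print(answer)
```

The unbinding leaves the target vector plus a pseudo-random noise term; with n large enough the nearest stored vector is, with high probability, the correct filler.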

II. MATHEMATICAL FOUNDATIONS AND MOTIVATION

Generalising the series of work on models of associative memory, many of them inspired from the world of optics [11], [13], [14], [19]–[24], one may inspect its most abstract algebraic formulation: all we need is a commutative ring R with a distance metric dist.

In order to give this mathematical machinery sufficient power to describe cognitive tasks, one must initially specify the ring operations and impose some restrictions on them. The primary operation (addition, denoted by +) enables superposition¹, that is, the combination of two elements in such a way that the result is equidistant from its operands under the metric dist (i.e. for a, b ∈ R, one has dist(a + b, a) ≈ dist(a + b, b)). The secondary operation (multiplication, denoted by ∗) enables binding, that is, the combination of two elements in such a manner that the result is ideally completely different from both operands. Next, one needs to store a (finite) set of elements of R including both invertible elements, which we call 'pointers' (or 'roles'), and not necessarily invertible ones, which we call 'fillers'.

Let us now give an example of how such mathematical machinery may give rise to simple cognition. Assume that we have a ring R with the distance dist and the operations satisfying the desired properties. Also assume that we have fixed five elements of R: "obj" and "col" are invertible; "red", "green" and "car" are any elements. We can now construct a new element s = obj ∗ car + col ∗ red, which can be interpreted as a semantic object "red car". Now one can ask: what colour is this car? The answer can be accessed by performing an algebraic operation: col⁻¹ ∗ s = red + col⁻¹ ∗ obj ∗ car. Then, if the term col⁻¹ ∗ obj ∗ car is either close to zero or in some other way does not interfere with the computation of dist, the stored memory element closest to the result of the query is red. Mathematically, the query is argmin_{r∈R}(dist(col⁻¹ ∗ s, r)) = red. Thus, we observe that the mathematical foundation of AI is underpinned by a solid computational/information processing foundation whose functionality must be preserved in any proposed alternative representation system, even if not necessarily via a distance-equipped commutative ring.

The classical realisation of the commutative ring-based cognition principle is the holographic-like memory [11]. In this case R is defined as follows: the set is a collection of n-dimensional real vectors (n-vectors) Rⁿ. The ring operations are element-wise addition and circular convolution. The distance metric is the simple Euclidean one. To define a pointer or a filler, one just needs to independently sample each entry of the vector from the normal distribution N(0, 1/n).

¹ In the literature this is typically called 'chunking', but this term by itself does not allude strongly enough to the desired simultaneous similarity between operands and result.

[Figure: a chain |a1 a2 a3 ...| b1 b2 b3 ...| ... | x1 x2 ...| annotated with the terms 'element' (e.g. a1 = 5 being one state), 'item' and 'chain'.]
Fig. 1. Summary of key terms used throughout this text.

Finally, the operations of the system must ideally be implementable in hardware in a way that minimises power and area requirements. In practice this means that the fundamental superposition and binding operations must rely on energetically cheap building-block operations such as thresholding (an inverter), shifts (a flip-flop chain), addition (a sum of currents on a wire, or a digital adder) or possibly analogue multiplication (memristor + switch) [25]. Implementation details will ultimately determine the actual cost of each operation. The main approaches so far either use too many multiply-accumulate (MAC) operations (circular convolution-based binding from [11] requires ≈ n² MACs/binding), or are applicable only to binary vectors (radix k = 2) [14].

III. PROPOSED SEMI-HOLOGRAPHIC REPRESENTATION SYSTEM

In this section we provide an intuitive overview followed by a rigorous mathematical explanation of the proposed architecture, interwoven with pointers on how our design decisions aim towards hardware efficiency. Overall, in order to achieve a more hardware-friendly cognitive algebra realisation, we trade away some of the mathematical simplicity of the previous section for implementability. The algebraic structure we are using for cognition is no longer a ring, but a rather exotic construction. It consists of an underlying set and two binary operations (superposition and binding).

A. Building a set of semantic objects

In our proposed system, the set of semantic objects is perhaps best understood in terms of two subsets: i) fixed-length 'base items', each consisting of y integer elements in the range [0, p − 1] (the choices of p and y link to the desired memory capacity, i.e. the number of semantic objects the system is capable of representing reliably - see section IV); ii) variable-length 'item chains' consisting of multiple concatenated base items. The maximum length for chains is d base items, for a total of n numerical elements, where d is determined by the hardware design² and affects the capacity of the system to hold/express multiple basic items at the same time. The number of base items in a chain is defined as the rank of the chain. The terminology is summarised in figure 1.

Some observations about our implementation: i) Base items are generally intended for encoding the fundamental vocabulary items of the system (e.g. 'red', 'apple', 'colour') and possible bindings, including the classical 'pointer-filler' pairings (e.g. 'colour' ∗ 'red': the value of the colour 'attribute' is 'red'). In contrast, chains are intended for simultaneously holding (superpositions of) multiple base items in memory (e.g. composite descriptions of objects such as colour ∗ red + object ∗ apple ('a red apple'), or collections of unrelated items such as shape ∗ circle + shape ∗ square ('a circle and a square')). The order in which the superposed items are kept in memory does not bear any functional significance; for the purposes of our system, items are either present in or absent from a chain. Cognitive systems that are order- or even position-dependent can, of course, be conceived; all that is necessary is for each item to have some mechanism (e.g. a position indicator) for marking its location within a chain. ii) Setting p, y, d, n as powers of 2 offers naturally advantageous implementation in digital hardware. This is the approach we choose in this work, as shown in table I. The choice of p is not necessarily obvious, as what constitutes a 'good' choice of p will depend on the specific implementations of superposition and binding. iii) Any chain can be zero-padded until it forms a maximum-length chain.

TABLE I
SUMMARY OF RELATIONS BETWEEN BASIC MATHEMATICAL OBJECTS USED IN THIS WORK. AS AN EXAMPLE OF HOW TO READ THE TABLE, THE TOP LEFT ENTRY STATES THAT: "IN EVERY ELEMENT THERE ARE p = 2^l STATES". THE TERM 'CHAIN' REFERS TO A MAXIMUM-SIZE CHAIN. THE TERM 'ITEM' REFERS TO BASE ITEMS. ALL PARAMETERS ARE INTEGERS.

                          In every...
  There are this many...  Element    Item       Chain
  States                  p = 2^l    2^(l+z)    2^(l+z+m)
  Elements                1          y = 2^z    n = 2^(z+m)
  Items                   N/A        1          d = 2^m

Mathematically, the above can be described as follows: fix natural numbers p and y as above. Then the set of base items is a group B = (Z/p)^y (under element-wise mod-p summation). The way to form item chains is by executing a direct product of copies of B. We then say that any element of B^r = Π_{i=1..r} B has rank r. The chain of maximal length will be an element of B^d, with n = d · y.

² However, note that, much akin to standard computers being able to process numbers of more than 32 or 64 bits, there is no reason why chains longer than d base items cannot be processed using similar techniques.

B. Superposition and binding

Next, we define our set of basic operations. The superposition operation '+' is defined as follows: if a and b are semantic objects, then:

a + b = (a, b)   (1)

which is a standard direct sum. The result contains both the a and b operands, preserved completely intact. This can be contrasted with superposition implemented as regular element-wise summation, where each operand is 'blurred' and merged into the result. Superpositions of semantic objects whose combined ranks exceed d are not allowed³.

Formally speaking, given a ∈ B^d1, b ∈ B^d2, the superposition a + b is just the element (a, b) of the direct product B^(d1+d2). If d1 + d2 > d, the operation is not defined.

Next, the binding operation '∗' is defined as a variant of a tensor product between semantic objects, where the individual pairings are subsequently subjected to element-wise addition modulo p. Mathematically, for given natural numbers d1 and d2 such that d1 · d2 ≤ d, one can define the binding operation ∗ : B^d1 × B^d2 → B^(d1·d2) by the formula:

(a1, a2, ..., a_d1) ∗ (b1, b2, ..., b_d2) = (a1 + b1, a2 + b1, ..., a_d1 + b1, a1 + b2, ..., a_d1 + b_d2)   (2)

where + is the group operation in B = (Z/p)^y. One can see that any element of B (base item) is invertible under the binding⁴.

One should notice that modular addition is losslessly reversible: we may indefinitely add and subtract n-vectors, and can therefore perfectly extract any individual term from any multi-term binding combination if we bind with the modulo-p summation inverses of all other terms. We also remark that, within the context of the order-independence property, any binding of chains with length greater than 1 item is effectively a convenient shorthand for describing multiple base item bindings and adds no further computational (or indeed semantic) value.

We conclude this section by highlighting that our superposition operation is not length-preserving, but our binding is when one of the operands consists of 1 basic item. Thus we describe our system as semi-holographic. Interestingly, this is the opposite of the classical convolution-based system from [12], where the binding operation is not length-preserving but superposition (element-wise average) is.

³ In a practical hardware implementation we would either: i) raise an exception and forbid the operation; ii) truncate the result to size d and raise a warning flag; or iii) raise a flag and trigger a software sequence (program) designed to handle overlength chains - equivalent to branching to a different subroutine in assembly language.

⁴ This is where the consequences of our choice of p become apparent: consider the item consisting of all elements equal to p/2. Binding this item to itself twice results in the original item. This becomes problematic if we wish to define a sequence of items as a succession of bindings, e.g. if we define the semantic object '2' as 1 ∗ 1, '3' as 2 ∗ 1, etc. If, on the other hand, p is prime, then for any integer x ≠ 0 there is a guarantee that if (k · x) mod p = x, the next greatest solution after k = 1 is k = p + 1; this may allow the construction of longer, non-tautological self-bindings vs. non-prime-p systems. Moral: the choice of p is not always obvious.

C. Similarity metric

Let us define a distance. First, we use a "circular distance" on Z/p: for a ∈ Z/p, one has dist◦(a, 0) = min(|a|, |p − a|), where we denote by a = a + 0 · p the corresponding representative in Z. For example, for 4 ∈ Z/5, dist◦(4, 0) = 1. Analogously, one defines a distance dist◦(a, b) for any a, b ∈ Z/p as min(|b − a|, p − |b − a|). For two vectors a, b ∈ B, one defines the distance as:

dist_B(a, b) = Σ_{i=1..y} dist◦(a_i, b_i)   (3)
For a ∈ B and b = (b1, ..., b_r) ∈ B^r we define:

dist(a, b) = min_i dist_B(a, b_i)   (4)

One can note that dist(a + b, a) = 0 for any a ∈ B.

D. Basic properties

In terms of fundamental mathematical properties: the superposition operation is not closed in general, but it acts as closed when our restriction on the sum of the ranks of the operands is met. It is associative but not commutative. It has an identity element (the empty string), but no inverse operation as such.

The binding operation is not closed, but acts as closed when the restriction on the product of the ranks of the operands is met. This is always the case when one of the operands is a basic item, i.e. a ∈ B. If a is a basic item, then for any b ∈ B^d we have commutativity: a ∗ b = b ∗ a. If a ∈ B^d1, b ∈ B^d2, c ∈ B^d3, and at least one of the d_i = 1, then we have associativity: (a ∗ b) ∗ c = a ∗ (b ∗ c). In general it is neither associative nor commutative; however, modulo the action of the permutation group on basic item components, it has those properties.

Finally, one has distributivity in the case of a basic item: for a ∈ B, b ∈ B^d1, c ∈ B^d2, one has a ∗ (b + c) = a ∗ b + a ∗ c. In general, as above, this property no longer holds (unless we do not care about the order of terms and factorise by the action of the permutation group).

The identity element is the zero element of B. All basic elements are invertible under binding.

These properties form a good start for building a cognitive system.

IV. CAPACITY

In terms of higher-level properties, a key metric is memory capacity: the maximum number of basic elements storable given some minimum upper bound for memory recall reliability. Each rank-1 semantic object (base item), the smallest type of independent semantic object, must be uniquely identifiable. As a result, there can be no more than Q = p^y basic memories in total without guaranteeing at least one ambiguous recall, i.e. Q is the maximum memory capacity⁵. However, an additional sparsity requirement is necessary in order to guarantee that the system is capable of unambiguously answering queries. Returning to the example from section II, in order for the term col⁻¹ ∗ obj ∗ car to be culled from any semantic pointer or filler from our vocabulary, it should not coincide with a valid object from the fixed fundamental vocabulary. In order to achieve that, we may impose that our memory safely stores only up to Q_s vocabulary objects, where s ∈ R is the desired sparsity factor, and the following formula holds:

Q_s = Q / s   (5)

⁵ It is very expedient if any semantic object that needs to be stored for quick recall is constructed as a basic object, not least because binding any operand with a basic object does not lengthen the operand. For that reason we only consider basic elements when computing memory capacity.

A lower bound for s is given by calculating the number of basic items J that the system can generate given a set of Q_s vocabulary items and an allowed complexity. These will all need to be accommodated unambiguously to guarantee reliable recall. In our proposed system, the only operation that can generate basic items from combinations of vocabulary items is the binding operation. Therefore, for Q_s vocabulary items we obtain Q_s²/2 derived items arising from all the possible unordered (to account for commutativity) pairwise bindings. This rises to Q_s^γ/γ! for exactly γ allowed bindings, and in general the system can generate:

J = Σ_{i=0..Γ} Q_s^i / i! ≈ Q_s^Γ / Γ!,  for Q_s/Γ ≫ 1   (6)

basic items, if we allow anything between 0 and Γ bindings in total. Ideally, we want to account for all possible basic items from the fundamental vocabulary via bindings, so J = Q (= p^y), and therefore we can transform equation 6 into:

Q_s ≈ (Γ!)^(1/Γ) · p^(y/Γ)   (7)

revealing how expressivity is traded against capacity, at least in the absence of any further allowances to combat possible uncertainty in the encoding, decoding or recall of semantic objects. Whether this boundary can be reached in practice requires further study, as the particular encodings of each basic item will determine whether specific bindings coincide with pre-learnt vocabulary or other bindings. Let us observe that the more binding is allowed in the system, the less fundamental vocabulary it can memorise (hint: (x!)^(1/x) ≈ x/e for large x). This is an example of a trade-off between capacity and complexity.

Example: if we choose p = 16, y = 128 and we allow the system to have at most Γ = 20 bindings, then the upper bound on the length of the core dictionary we can encode is 422 million items.

V. ADDITIONAL SEMANTIC OBJECT MANIPULATIONS

In order to complete the description of the proposed system we need to cover two further issues: i) how does the system cope with uncertainty? ii) since the system is semi-holographic, how does it map multi-item chains to single base items when necessary? In this work we provide some cursory answers, as these questions merit substantially deeper study in their own right.

Dealing with uncertainty: the implementation of de-noising will strongly depend on the form of the uncertainty present in the system. We may define uncertainty as a probability distribution that encodes how likely it is to obtain semantic object x′ when in fact the ground truth is x. For example, if the probability density only depends on the 'circular distance' (eq. 3) between the x and x′ objects⁶, we may use an adaptation of the element-wise average for de-noising. The average is computed as the mid-point along the geodesic. In particular, for a ∈ Z/p

⁶ This alludes to the radial basis functions (RBFs) used in radial basis neurons [26].
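The 422-million figure quoted in the example can be checked directly against eq. (7). A quick sketch, computed in log space to avoid materialising p^y (variable names here are ours, with G standing in for Γ):

```python
import math

p, y, G = 16, 128, 20   # parameters from the example in the text (G = Γ)

# eq. (7): Qs ≈ (Γ!)^(1/Γ) · p^(y/Γ), evaluated via logarithms:
# log Qs = log(Γ!)/Γ + (y/Γ)·log p, with log(Γ!) = lgamma(Γ + 1)
log_Qs = math.lgamma(G + 1) / G + (y / G) * math.log(p)
Qs = math.exp(log_Qs)
print(f"{Qs:.3g}")      # on the order of 4.2e8, i.e. the ~422 million items quoted
```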
let us also denote by the same symbol a = a + 0 · p its representative in Z, and let ∆ = ceil(dist◦(a, b)/2). For a, b ∈ Z/p, if |b − a| ≤ p − |b − a|, pick the smaller representative with respect to the standard ordering on Z (say it is a); then avg(a, b) = a + ∆. If the alternative inequality holds, pick the greater representative (say it is b); then avg(a, b) = (b + ∆) mod p. In general, for items a, b ∈ B, we define the average as the element-wise average.

To this we add the following observations: i) The purpose of the de-noising average is to reconcile multiple corrupted versions of a single semantic object vector, not to combine different vectors into new semantic objects (i.e. a_i is expected to be reasonably close to b_i most of the time). Nevertheless, when used with radically different semantic objects as inputs, the operation acts very similarly to binding. The effects of using a binding-like operation for de-noising (a task usually handled by superposition) are an interesting subject for further study. ii) Different uncertainty descriptors (probability distribution functions) may lend themselves to different de-noising strategies; so will different metrics. iii) Even with fixed underlying probability distribution assumptions, de-noising may be carried out using multiple alternative strategies. Examples applicable to our assumptions would be majority voting (select the element-wise mode instead of the mean - works best for a large number of input sample terms) or median selection.

Compressing long chains into basic items: ideally, any cognitive system should be able to take any expression and collapse it into a new memory that can be stored, recalled and used with the facility that basic items enjoy. In our case this requires compressing chains into the size of a basic item. In principle, any compression algorithm will suffice. Examples could be applying genetic algorithm-like methods [27] to the items of a chain, or combining said items using any multiplication (e.g. circular convolution, etc.).

We conclude by remarking that the operation of creating a new semantic object can reasonably be expected to be executed orders of magnitude less frequently than any of the other operations. As such, it is possible to dedicate hardware to it that is both more complex (the luxury of using relatively heavy computation) and more remotely located from the core of the semantic object processor (the luxury of preventing the layout footprint of the semantic object generator from impacting the layout efficiency of the processor core).

VI. HARDWARE IMPLEMENTATION

In this section we examine how the mathematical machinery can be mapped onto a hardware module which we call the 'Cognitive Processing Unit' (CoPU). The system receives chains as input operands and generates new chains at its output after executing the requested superposition and/or binding operations. The CoPU is based on a common block-level design blueprint which can then be instantiated as specific CoPU designs. It is at the point of instantiating a particular CoPU design that the values of the key parameters p, y, d are decided upon.

A. Hardware system design strategy

The proposed semi-holographic representation machinery can be implemented as a fully digital system in a very straightforward manner. The underlying set will be implicitly determined by the bit-width used. The inverses of each n-vector element under element-wise modular addition are simply their 2's complements. The full representation of any semantic object can therefore consist of d log₂(p)-bit words, plus x flag bits for tracking the number of items in any given chain.

The superposition operation can be handled by the hardware as 'APPEND' operations (akin to linked lists); the system need only know the operands and the state of their flag bits. In practice this would be implemented as d 'SELECT' operations, which directly map onto a simple l · n-width⁷ multiplexer/demultiplexer (MUX/DEMUX) pair. A small digital circuit determines the appropriate, successive configurations of the MUX/DEMUX structure depending on the flag bits of the operands. Finally, the circuit also sets the flag bits of the resulting chain. The block diagram of this system is shown in Figure [FIG]. The hardware-level complexity of our proposed system can be contrasted with the standard element-wise addition approach, which requires n z-bit 'ADD' operations, where z = ceil(log₂ p) (cost: n z-bit adders, one time-shared z-bit adder, or trade-off solutions in between).

⁷ n 'bundles' of l binary lines.

The binding operation can be carried out by n element-wise additions/subtractions (ADD/SUB), implementable as n z-bit ADD/SUB modules. Because of the modular arithmetic rules, overflow bits are simply ignored. The ADDSUB terminal of each module can directly convert one of the operands into its 2's complement inverse, as is standard. This is illustrated in Figure [FIG]. The complexity of (a maximum of) n z-bit additions can be contrasted with the computational cost of circular convolution, which would involve n² multiplications and n · (n − 1) additions (= n · (n − 1) MACs + n multiplications). On top of this, the additional hardware cost of shifting a chosen operand of the circular convolution n times in its entirety must also be considered.

Naturally, alternative hardware implementations are also possible. These might include fully analogue ones, e.g. using analogue multiplexers for superposition and current-steering-based binding [28]. Alternatively, they might include 'packet'-based ones, where chains are packaged into e.g. TCP-like (Transmission Control Protocol) packets and communicated across an internet-like router structure. Each packet could contain a header detailing the number of items within the packet, plus a payload - a technique similar to the protocol used for neuromorphic system communications over the internet [29]. The proposed implementation is chosen because it naturally maps onto easily synthesisable digital hardware. The most efficient implementation technique in any given system, however, will naturally depend on the rest of the system, e.g. on whether the broader environment operates in a mainly analogue or digital fashion.

A block diagram of a full CoPU is shown in Figure [FIG].
B. Example instantiation of CoPU and performance evaluation
-Concept of introducing hardware-relevant performance metrics, such as the number and type of operations needed to carry out bindings.

VII. DISCUSSION

The starting point of this work is the observation that any system consisting of a length-n vector with p states per element (corresponding to some fixed number of digital signal lines) can only represent p^n uniquely identifiable vectors. This is effectively a hardware resource constraint and imposes a number of trade-offs warranting design decisions.

Trade-off 1 - expressivity vs. capacity: In the classical holographic representation systems all semantic object vectors are of equal length no matter how many times semantic objects are combined together through superposition or binding. By contrast, in our proposed system some objects will be base items and others will be chains of various lengths. This introduces some constraints on which combinations of semantic objects are allowable, yet the system retains the capability of representing p^n states overall. This seems to be a manifestation of a fundamental trade-off. Cognitive systems may either:
1) Operate on relatively few basic semantic objects (objects stored in memory as meaningful/significant) but allow many possible combinations between them, i.e. be expressive but low capacity.
2) Operate on relatively many basic semantic objects but only accommodate certain possible combinations between them. This is the regime in which our proposed system operates.
We note that the question of the optimum balance between expressivity and capacity is highly complex and requires further study in its own right. In our proposed system capacity and expressivity are to some extent decoupled: p and y affect capacity and expressivity in a trade-off manner whilst d affects only capacity.

Trade-off 2 - 'holographicity' vs. compression: Cognitive systems can be conceived at different levels of 'holographicity', as determined by the percentage of operations that are operand-length-preserving. For a fixed maximum semantic object length the choice lies between the extreme of always utilising the full length of n elements in order to represent every possible semantic object (full-holographic), or allowing some semantic objects to be shorter (non-holographic). This significantly impacts the amount of information each numerical element carries. In a fully holographic representation, transmitting or processing even a single-item-equivalent semantic object requires handling n elements; the same as transmitting/processing the equivalent of a long chain. The semantic information per element may therefore differ dramatically in each situation. In our proposed system, however, superpositions of fewer items are represented by shorter chains. This illustrates how less holographic systems generally offer the option of operating on more compressed information, i.e. closer to the signal-to-noise ratio (SNR) limit.

Naturally there is a price to pay for compression: when creating new semantic objects for storage it is extremely useful if these new objects can be mapped onto minimum-length units (the semantic object basis of any cognitive system). Mechanisms for mapping any arbitrary chain onto such units need to be supported, adding to system complexity. Furthermore, in a non-holographic system, circuitry designed to support the last items of a chain may be utilised only infrequently. This is expected to strongly affect hardware design decisions.

Trade-off 3 - long vectors with few states per element vs. short vectors with many states per element: If we have a fixed number of binary lines (i.e. l · y = C), we have a choice of treating C as either: i) one single, large identifier number, ii) a collection of binary bits independent of one another, or iii) certain possibilities in between. For example, for C = 16 we can have {l, y} ∈ {(1, 16), (2, 8), (4, 4), (8, 2), (16, 1)}. The number of states we can represent remains fixed at 2^(l·y), but:
• The distance relationships between semantic objects will be different in each case. In the case (1, 16) our item consists of a vector of 16x 1-bit elements, and therefore there are 16 nearest neighbours for each item (all items that differ from the base object at exactly one position). In the case (16, 1) our item is a single 16-bit number which has exactly two nearest neighbours (the elements/items differing from the base object by one unit of distance). Note that the case (1, 16) corresponds tightly to the spatter code system proposed by Kanerva [14] since modular addition now reduces to a simple XOR.
• The degree of modularity achievable in hardware may be impacted in each case. The (1, 16) case requires 16x XOR gates in order to perform one item-item binding whilst the (16, 1) case requires a single 16-bit adder. For large values of C there may be an additional impact on speed (how viable is it to make a 512-bit adder that computes an answer in one clock cycle/step? 512x XOR gates, on the other hand, will compute 512 outputs in one step). This subject requires further, dedicated study.

Trade-off 4 - operation complexity vs. property attractiveness: As a rule of thumb, operations with more attractive mathematical properties tend to introduce computational and implementational difficulties. This is perhaps best exemplified by examining different binding operations:
• Convolution commutes, 'scrambles' the information well (the result bears, in general, very little resemblance to either of the operands) and preserves information. However, it lengthens the vectors that it processes and it is computationally heavy (many MACs).
• Circular convolution commutes and scrambles. Lengthening no longer occurs, but information is lost and the operation is still heavy on MACs.
• Modular arithmetic commutes. Lengthening does not occur and the operation is MAC-lightweight, but information is lost and the scrambling properties are similar to those of superposition by element-wise addition, so the similarity requirements for defining two semantic
objects as corrupted versions of each other have to be substantially tightened.

Ultimately, a complex mix of factors/specs in all trade-off directions will determine the best cognitive system implementation. This may depend on the overall cognitive capabilities required of the system. In this work we have focussed on a partially holographic system based on, effectively, multiplexing and addition as the system operations. The advantage of this implementation vs. the holographic approach that we have used as standard and inspiration is that both operations have been simplified in hardware: superposition became a multiplexing operation instead of addition whilst binding became element-wise addition instead of circular convolution. The balance of these advantages vs. the attributes that had to be traded away (mathematical elegance, full holographicity, etc.) needs to be considered very carefully. In general, however, the system is designed for occasions where we can tolerate partially restricted expressivity (a notable cap on chain length, i.e. the effective number of successive superpositions allowed) in exchange for extreme implementational simplicity and high energy efficiency.

Finally, we envision that our proposed CoPU will form a core component of larger systems with cognitive capability. Much like in a traditional computer, our CPU-equivalent will need a memory with which it can communicate, as well as peripheral structures. Work in that general direction has very recently begun to gain traction [18], [30]. Relating this back to biological brains, we see the closest analogue of our CoPU in the putative attentional systems of the brain; the contents of the input buffers at any given time could be interpreted as the semantic objects in the machine's 'conscious attention'. In conclusion, we envisage that future thinking machines will be complex systems consisting of multiple, heterogeneous modules including ANNs, memories (bio-inspired or standard digital look-up tables), sensors, possibly even classical microprocessors and more; all working together to give rise to cognitive intelligence. We hope that our CoPU will play a central role in this 'hyperarchitecture' by acting as the equivalent of the CPU in a classical computer, and that it will do so with the energy efficiency required for enabling widespread adoption of cognitive computers.

ACKNOWLEDGEMENTS

The authors would like to thank Prof. Chris Eliasmith, whose work provided much of the inspiration for this work. We also thank Prof. Jesse Hoey for his support and fruitful discussions.

REFERENCES

[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Advances in Neural Information Processing Systems, pp. 1-9, 2012.
[2] A. Hannun, C. Case, J. Casper, B. Catanzaro, G. Diamos, E. Elsen, R. Prenger, S. Satheesh, S. Sengupta, A. Coates, and A. Y. Ng, "Deep Speech: Scaling up end-to-end speech recognition," dec 2014. [Online]. Available: http://arxiv.org/abs/1412.5567
[3] Y. Lecun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436-444, 2015.
[4] K. Greff, R. K. Srivastava, J. Koutnik, B. R. Steunebrink, and J. Schmidhuber, "LSTM: A Search Space Odyssey," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 10, pp. 2222-2232, oct 2017.
[5] C. W. Wang, P. C. Guo, X. Wang, Paerhati, L. B. Li, and J. P. Bai, "Autologous peroneus brevis and allogeneic tendon to reconstruct lateral collateral ligament of the ankle joint," Chinese Journal of Tissue Engineering Research, vol. 19, no. 30, pp. 4908-4914, sep 2015.
[6] C. J. Spoerer, P. McClure, and N. Kriegeskorte, "Recurrent Convolutional Neural Networks: A Better Model of Biological Object Recognition," Frontiers in Psychology, vol. 8, p. 1551, sep 2017.
[7] G. B. Kaplan and C. Güzelis, "Hopfield networks for solving Tower of Hanoi problems," Ari, vol. 52, no. 1, pp. 23-29, 2001.
[8] F. Schurmann, K. Meier, and J. Schemmel, "Edge of Chaos Computation in Mixed-Mode VLSI - A Hard Liquid," Proc. of NIPS, 2005.
[9] J. R. Anderson, M. Matessa, and C. Lebiere, "ACT-R: A Theory of Higher Level Cognition and Its Relation to Visual Attention," Human-Computer Interaction, vol. 12, no. 4, pp. 439-462, dec 1997.
[10] C. Eliasmith, How to Build a Brain: A Neural Architecture for Biological Cognition. Oxford University Press, 2013.
[11] T. A. Plate, "Holographic Reduced Representations," IEEE Transactions on Neural Networks, vol. 6, no. 3, pp. 623-641, may 1995.
[12] P. H. Schönemann, "Some algebraic relations between involutions, convolutions, and correlations, with applications to holographic memories," Biological Cybernetics, vol. 56, no. 5-6, pp. 367-374, jul 1987.
[13] P. Smolensky, "Tensor product variable binding and the representation of symbolic structures in connectionist systems," Artificial Intelligence, vol. 46, no. 1, pp. 159-216, 1990.
[14] P. Kanerva, "Fully Distributed Representation," Proceedings of 1997 Real World Computing Symposium, pp. 358-365, 1997.
[15] F. Akopyan, J. Sawada, A. Cassidy, R. Alvarez-Icaza, J. Arthur, P. Merolla, N. Imam, Y. Nakamura, P. Datta, G. J. Nam, B. Taba, M. Beakes, B. Brezzo, J. B. Kuang, R. Manohar, W. P. Risk, B. Jackson, and D. S. Modha, "TrueNorth: Design and Tool Flow of a 65 mW 1 Million Neuron Programmable Neurosynaptic Chip," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 34, no. 10, pp. 1537-1557, oct 2015.
[16] A. Neckar, S. Fok, B. V. Benjamin, T. C. Stewart, N. N. Oza, A. R. Voelker, C. Eliasmith, R. Manohar, and K. Boahen, "Braindrop: A Mixed-Signal Neuromorphic Architecture With a Dynamical Systems-Based Programming Model," Proceedings of the IEEE, vol. 107, no. 1, pp. 144-164, jan 2019.
[17] N. Qiao, H. Mostafa, F. Corradi, M. Osswald, F. Stefanini, D. Sumislawska, and G. Indiveri, "A reconfigurable on-line learning spiking neuromorphic processor comprising 256 neurons and 128K synapses," Frontiers in Neuroscience, vol. 9, p. 141, 2015.
[18] A. Rahimi, T. F. Wu, H. Li, J. M. Rabaey, H. S. P. Wong, M. M. Shulaker, and S. Mitra, "Hyperdimensional Computing Nanosystem," nov 2018. [Online]. Available: http://arxiv.org/abs/1811.09557
[19] D. Casasent and B. Telfer, "Key and recollection vector effects on heteroassociative memory performance," Applied Optics, vol. 28, no. 2, pp. 272-283, jan 1989.
[20] A. D. Fisher, W. L. Lippincott, and J. N. Lee, "Optical implementations of associative networks with versatile adaptive learning capabilities," Applied Optics, vol. 26, no. 23, p. 5039, dec 1987.
[21] E. G. Paek and D. Psaltis, "Optical Associative Memory Using Fourier Transform Holograms," Optical Engineering, vol. 26, no. 5, p. 265428, may 1987.
[22] D. Willshaw and P. Dayan, "Optimal Plasticity from Matrix Memories: What Goes Up Must Come Down," Neural Computation, vol. 2, no. 1, pp. 85-93, mar 1990.
[23] D. J. Willshaw, O. P. Buneman, and H. C. Longuet-Higgins, "Non-holographic associative memory," Nature, vol. 222, no. 5197, pp. 960-962, 1969.
[24] D. Aerts, M. Czachor, and B. De Moor, "On Geometric Algebra representation of Binary Spatter Codes," oct 2006. [Online]. Available: http://arxiv.org/abs/cs/0610075
[25] M. Hu, J. P. Strachan, Z. Li, E. M. Grafals, N. Davila, C. Graves, S. Lam, N. Ge, R. S. Williams, and J. Yang, "Dot-Product Engine for Neuromorphic Computing: Programming 1T1M Crossbar to Accelerate Matrix-Vector Multiplication," IEEE Design Automation Conference, pp. 1-6, 2016.
[26] J. Park and I. W. Sandberg, "Universal Approximation Using Radial-Basis-Function Networks," Neural Computation, vol. 3, no. 2, pp. 246-257, 1991.
[27] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, "A fast and elitist multiobjective genetic algorithm: NSGA-II," IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182-197, apr 2002.
[28] J. Deveugele and M. Steyaert, "A 10-bit 250-MS/s Binary-Weighted Current-Steering DAC," IEEE Journal of Solid-State Circuits, vol. 41, no. 2, pp. 320-329, feb 2006.
[29] K. Boahen, "Point-to-point connectivity between neuromorphic chips using address events," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 47, no. 5, pp. 416-434, may 2000.
[30] A. Graves, G. Wayne, and I. Danihelka, "Neural Turing Machines," oct 2014. [Online]. Available: http://arxiv.org/abs/1410.5401