Synthesis of Reconfigurable Multiplier B
WestminsterResearch
http://eprints.wmin.ac.uk
http://www.wmin.ac.uk/westminsterresearch
Suleyman Demirsoy (1), Izzet Kale (1), Andrew Dempster (2)
(1) Cavendish School of Computer Science, University of Westminster
(2) School of Surveying and Spatial Information Systems, University of New South Wales, Sydney, Australia
This material is posted here with permission of the IEEE. Such permission of the
IEEE does not in any way imply IEEE endorsement of any of the University of
Westminster's products or services. Internal or personal use of this material is
permitted. However, permission to reprint/republish this material for advertising or
promotional purposes or for creating new collective works for resale or redistribution
must be obtained from the IEEE by writing to [email protected]. By
choosing to view this document, you agree to all provisions of the copyright laws
protecting it.
The WestminsterResearch online digital archive at the University of Westminster aims to make the research output of the University available to a wider audience. Copyright and Moral Rights remain with the authors and/or copyright owners. Users are permitted to download and/or print one copy for non-commercial private study or research. Further distribution and any use of material from within this archive for profit-making enterprises or for commercial gain is strictly forbidden.

Whilst further distribution of specific materials from within this archive is forbidden, you may freely distribute the URL of WestminsterResearch (http://www.wmin.ac.uk/westminsterresearch).
Abstract: Reconfigurable Multiplier Blocks (ReMB) offer significant area, delay and possibly power reduction in time-multiplexed implementation of multiple constant multiplications. This paper and its companion paper (subtitled Part II- Algorithm) together present a systematic synthesis method for Single Input Single Output (SISO) and Single Input Multiple Output (SIMO) ReMB designs. This paper presents the necessary foundation and terminology needed for developing a systematic synthesis technique. The companion paper illustrates the synthesis method through examples. The method proposed achieves reduced logic-depth and area over standard multipliers / multiplier blocks.

I. INTRODUCTION

Primitive Operator Filters [1] and multiplier blocks [2] are especially beneficial for the fully parallel implementation of digital filters and filter banks. They reduce the complexity of the implementation effectively by exploiting the redundancy of the multiple constant multiplications. Multiplications by coefficients are realized by successive shift and add operations. The intermediate values that are formed during the generation of one coefficient are re-used for other coefficients, thus reducing the computational redundancy. This topic has been studied extensively in the literature, and many algorithms were developed to design [...] problem: [...]

The savings that can be achieved in implementing fully parallel digital filters as a result of these techniques are impressive in terms of area, complexity and power reduction [1]-[11].

In recent years, the application of multiplier blocks to time-multiplexed digital filter designs was studied in [12]-[14]. The coefficient store and the general-purpose multiplier in Fig. 1(a) were replaced by a reconfigurable multiplier block (Fig. 1(b)), which can generate the required coefficient products with its different configurations. For the example in Fig. 1(b) the ReMB is a Single Input Single Output (SISO) block. A Single Input Multiple Output (SIMO) ReMB can replace all of the fixed multipliers in a bank of filters as shown in Fig. 1(c).

It has been shown that the redundancy can be reduced and the resulting specialized multiplier design can be much more efficient in terms of area and computational complexity compared to the general-purpose multiplier with its associated coefficient store [12]-[14]. Guidelines for efficient realization were presented in [12], and an efficient, automated design algorithm based on the graphical approach was developed and reported in [13]. This algorithm was suitable for SIMO systems such as filter banks.

Figure 1 (a) Time-multiplexed Tapped Delay Line (TDL) (direct-form) FIR filter, (b) Conceptual SISO ReMB that would replace the coefficient store and the general purpose multiplier. (c) A SIMO ReMB system can replace the dashed box in a transpose direct form filter bank.
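The shift-and-add realization with re-used intermediate values described in the introduction can be illustrated with a minimal Python sketch. The coefficient pair {9, 45} and the function name `multiplier_block` are hypothetical choices made for this illustration and are not taken from the paper.

```python
def multiplier_block(x: int) -> dict[int, int]:
    """Return {coefficient: coefficient * x} using only shifts and adds.

    Hypothetical two-adder block for the coefficients 9 and 45.
    """
    x9 = (x << 3) + x      # 9x  = 8x + x    (adder 1)
    x45 = (x9 << 2) + x9   # 45x = 36x + 9x  (adder 2, re-uses the 9x node)
    return {9: x9, 45: x45}


products = multiplier_block(7)
```

Realizing 45x on its own also takes two adders, so in this sketch sharing the 9x node yields the second product at no additional adder cost.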
Figure 3 A ReMB design with a basic-structure-depth of three, which can produce 128 different coefficients at the output of layer 3.

$n_i = 2n_{i-1} + 1$    (1)

It should be noted that, for a different basic structure, the maximum number of outputs per node would be different.

On the other hand, the individual coefficient costs place a separate restriction on the number of basic structures interconnected for building a particular output node. For example, a cost-3 coefficient needs at least three basic structures to be generated. They can either be in a chain form (Fig. 2c) or in a tree form (Fig. 2d). However, it has been shown that the tree-form interconnection of n adders/subtractors cannot produce all cost-n numbers in a multiplier block [10]. Therefore, a basic-structure-depth of n would ensure that a cost-n coefficient can be generated.

The basic-structure-depth is important when deciding the layer of an output node. To explain this, consider a node with the fundamental set {39, 45, 41, 47, 61, 11, 27, 57, 119}. All of the fundamentals are cost-2, i.e. each of them requires a cascade of two adders to be generated. However, since there are nine different numbers, the basic-structure-depth of the node would be at least three if the basic structures shown in Fig. 2(a) were to be employed, since the maximum number of outputs at depth-2 is eight. Here, the basic-structure-depth is dictated by the output set size. In a different example, the coefficient set {473, 181, 49} has three different numbers. The coefficient ‘49’ is a cost-2 number whereas 473 and 181 are both cost-3. Again, assuming the simplest basic structure is used, the output set size only requires a minimum of two interconnected basic structures. This time, the basic-structure-depth is dictated not by the output set size but by the cost of the coefficients, which is three. However, it should be kept in mind that some cost-3 coefficients can be generated at depth-2. Choosing depth-3 guarantees coverage of all the different topologies that generate cost-3 coefficients.

The basic-structure-depth obtained by inspecting the coefficient set cannot always indicate the correct layer of the output node. For example, the set {9, 15} includes two cost-1 numbers. The coefficient ‘9’ can be realized as (8+1), whereas ‘15’ is generated as (16-1). For an FPGA implementation with a restricted set of basic structures as explained in Section 2, (8+1) and (16-1) cannot be combined on a basic structure. Therefore, the set {9, 15} needs to be designed in layer two.

In summary, the lower bound on the basic-structure-depth of an output node is the maximum of two values. The first one is the minimum depth that can generate the required output set size. This value depends on the type of the basic structure employed in the design. The second one is the maximum of the adder-costs of the coefficients.

C. Graphs

The realization of any coefficient from a set of fundamentals can be represented on a graph as shown in Fig. 4. For a coefficient x, the ‘graph’ consists of a set of numbers {a, b, c, d} satisfying the equation:

$x = ac \pm bd$    (2)

where c and d are in the form of $\pm 2^r$, r being a natural number for integer x.

Figure 4 An example graph

Equation (2) results in more than one graph for a coefficient x when {a, b, c, d} change in a pre-defined interval. Collecting all such graphs of a coefficient in a table, graph-tables are formed. Graph-tables can be employed in generating efficient ReMB designs.

D. Node-definition

A node-definition is a combination of graphs using a particular basic structure to produce a given coefficient set. Fig. 5 shows a node-definition for the coefficient set {K, L, M} on a basic structure. A, B1 and B2 are the inputs of the basic structure. c, d1, and d2 are the edge values. [t0 t1 t2] are the different configuration states of the resulting ReMB design. [aK, aL, aM], [b1K, X, b1M] and [X, b2L, X] are the fundamental vectors holding the inputs of the basic structure for different configurations. The ‘X’ (don’t care) in [b1K, X, b1M] means the multiplexer does not use B1 for configuration t1 but rather uses b2L from B2 to produce the coefficient L. At configuration t0, this node generates K as K = aK×c + b1K×d1.

Figure 5 A generalized node-definition includes all the details about the node: edge values, and the fundamentals required to build the coefficient set for a given basic structure.
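The lower bound on the basic-structure-depth discussed in the text can be sketched in Python. The sketch assumes that the simplest basic structure offers two outputs at depth one (n_1 = 1) and that a node at depth i offers at most 2^(n_i) outputs, with n_i following (1). This reading is an assumption, chosen because it reproduces the limits of eight outputs at depth two and 128 at depth three quoted in the text. Adder costs are entered by hand from the examples in the text rather than computed, and all function names are hypothetical.

```python
def max_outputs(depth: int) -> int:
    """Maximum outputs per node at a given depth (assumed reading of (1))."""
    n = 1  # assumption: n_1 = 1, i.e. two outputs at depth one
    for _ in range(depth - 1):
        n = 2 * n + 1  # recurrence (1)
    return 2 ** n


def min_depth_for_set_size(size: int) -> int:
    """Smallest depth whose output limit can hold `size` coefficients."""
    depth = 1
    while max_outputs(depth) < size:
        depth += 1
    return depth


def depth_lower_bound(costs: dict[int, int]) -> int:
    """Maximum of the set-size depth and the largest adder cost."""
    return max(min_depth_for_set_size(len(costs)), max(costs.values()))


# Adder costs as stated in the text (not computed here).
nine_cost2 = {c: 2 for c in (39, 45, 41, 47, 61, 11, 27, 57, 119)}
mixed = {473: 3, 181: 3, 49: 2}

bound_nine = depth_lower_bound(nine_cost2)  # limited by the set size
bound_mixed = depth_lower_bound(mixed)      # limited by the coefficient cost
```

For the set {9, 15} with both costs equal to one, the same function returns one, although the text places that set in layer two. The value is only a lower bound and does not model the restriction on which graphs a basic structure can combine.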
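A graph-table of the kind described under "C. Graphs" can be sketched by brute-force enumeration of (2). This is a minimal illustration, not the construction used by the authors. It assumes that the sign in (2) is folded into the signed edge values c and d, and that the candidate fundamentals and the largest shift are supplied by the caller. The names `graph_table` and `max_shift` are hypothetical.

```python
from itertools import product


def graph_table(x: int, fundamentals: list[int], max_shift: int = 6):
    """List every (a, c, b, d) with x == a*c + b*d.

    a and b are drawn from `fundamentals`; c and d are signed powers of
    two, +/-2**r with 0 <= r <= max_shift, which covers the +/- in (2).
    """
    edges = [sign * (1 << r) for r in range(max_shift + 1) for sign in (1, -1)]
    table = []
    for a, b in product(fundamentals, repeat=2):
        for c, d in product(edges, repeat=2):
            if a * c + b * d == x:
                table.append((a, c, b, d))
    return table


# Graphs of 9 built directly from the input signal '1': 8 + 1 and 1 + 8.
graphs_of_9 = graph_table(9, [1], max_shift=4)

# Graphs of 45 from a hypothetical candidate set of fundamentals.
graphs_of_45 = graph_table(45, [1, 3, 5, 9])
```

Each row of the returned list is one graph in the sense of (2). For example, `(9, 4, 9, 1)` in `graphs_of_45` states that 45 = 9×4 + 9×1.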
The graphs of the coefficients should be combined in such a way that the basic-structure-depth of the resulting fundamental sets at A, B1 and B2 is kept less than the basic-structure-depth of the coefficient set; otherwise the design would not converge back to the graph input. This implies that two parameters have to be decreased while choosing the graphs: the number of different fundamentals (fundamental set size) at an input, and the cost of the fundamentals. Assuming the basic-structure-depth of the coefficient set is three, the fundamentals at the input sets should be at most cost-2, and the fundamental set sizes can be at most eight (i.e. the maximum number of outputs allowed at that particular depth, see (1)).

The node-definitions satisfying the two requirements mentioned above could be found by processing the combinations of graphs that exist in the graph-tables. This method is explained further in the companion paper [17].

E. Algorithm Approach

Fig. 6 shows a typical symbolic SIMO ReMB example that can be generated by the algorithm. There are three output nodes, y1, y2, and y3. As observed from the figure, all output nodes have a basic-structure-depth of three.

[Diagram: Layer 0 to Layer 3; input x(k); outputs y1(k), y2(k), y3(k)]
Figure 6 A symbolic diagram for SIMO ReMB

The layers partition the design into smaller units that can systematically be handled by the algorithm. Each layer has output nodes and fundamental sets that feed the basic structures. For an intermediate layer, the fundamental sets are the output nodes generated in the preceding layers. For layer 1, the fundamental set is always the input signal, which is represented as ‘1’. Starting from the last layer of the design, the algorithm recursively calls itself for each layer until it reaches the input signal. At each call, a number of coefficient sets or output nodes are processed by the algorithm to create node-definitions that generate the required coefficient sets. The fundamental sets that are required by these node-definitions are then designed by recursive calls of the algorithm.

IV. CONCLUSION

As a new design technique, ReMB needs new concepts to be developed for its efficient application. This paper presented new concepts to synthesize SISO and SIMO ReMB circuits. They form the foundation for the algorithm that is presented in the companion paper entitled “Part II: Algorithm” [17].

The proposed technique divides the whole ReMB design into layers depending on the basic-structure-depth and deals with each layer recursively, starting from the output towards the input.

REFERENCES

[1] Bull D.R. and D.H. Horrocks, “Primitive operator digital filters”, IEE Proceedings-G, vol. 138, no. 3, pp. 401-412, June 1991.
[2] Dempster A.G. and M.D. Macleod, “Use of minimum-adder multiplier-blocks in FIR digital filters”, IEEE Trans. CAS-II, vol. 42, no. 9, pp. 569-577, November 1995.
[3] Bernstein R., “Multiplication by integer constants”, Software-Practice and Experience, vol. 16, no. 7, pp. 641-652, Academic Press, New York, July 1986.
[4] Hartley R., “Subexpression sharing in filters using canonic signed digit multipliers”, IEEE CAS-II, vol. 43, no. 10, pp. 677-688, 1996.
[5] Pasko R., et al, “A new algorithm for elimination of common sub-expressions”, IEEE Trans. CAD ICS, vol. 18, pp. 58-68, January 1999.
[6] Martinez-Peiro M., E.I. Boemo and L. Wanhammar, “Design of high-speed multiplierless filters using a nonrecursive signed common subexpression algorithm”, IEEE Trans. CAS-II, vol. 49, no. 3, pp. 196-203, March 2002.
[7] Potkonjak M., M.B. Srivastava and A.P. Chandrakasan, “Multiple constant multiplications: Efficient and versatile framework and algorithms for exploring common subexpression elimination”, IEEE Trans. on CAD of ICS, vol. 15, no. 2, pp. 151-165, February 1996.
[8] Li D., “Minimum number of adders for implementing a multiplier and its application to the design of multiplierless digital filters”, IEEE Trans. CAS-II, vol. 42, no. 7, pp. 453-460, July 1995.
[9] Dempster A.G. and M.D. Macleod, “General algorithms for reduced-adder integer multiplier design”, Elec. Letters, vol. 31, no. 21, pp. 1800-1802, October 1995.
[10] Gustafsson O., A. Dempster and L. Wanhammar, “Extended results for minimum-adder constant integer multipliers”, IEEE ISCAS’2002, vol. 1, pp. 73-76, May 2002.
[11] Kang H.J. and I.C. Park, “Multiplier-less IIR filter Synthesis algorithms to trade-off the delay and the number of adders”, Proceedings of IEEE ISCAS’01, vol. 2, pp. 693-696, Australia, 2001.
[12] Demirsoy S.S., A.G. Dempster and I. Kale, “Design Guidelines for Reconfigurable Multiplier Blocks”, IEEE ISCAS’03, vol. 4, pp. 293-296, Thailand, May 2003.
[13] Demirsoy S.S., “Complexity Reduction in Digital Filters and Filter Banks”, Ph.D. Thesis, University of Westminster, October 2003.
[14] Demirsoy S.S., R. Beck, A.G. Dempster and I. Kale, “Reconfigurable implementation of recursive DCT kernels with reduced quantization noise”, IEEE ISCAS’2003, vol. 4, pp. 289-292, Thailand, May 2003.
[15] Turner R.H., T. Courtney and R. Woods, “Implementation of fixed DSP functions using the reduced coefficient multiplier”, IEEE Proc. of ICASSP’2001, vol. 2, pp. 881-884, USA, May 2001.
[16] Turner R.H., “Functionally diverse programmable logic implementations of digital signal processing algorithms”, PhD Thesis, Queen’s University of Belfast, August 2002.
[17] Demirsoy S.S., I. Kale and A.G. Dempster, “Synthesis of Reconfigurable Multiplier Blocks: Part II- Details of the Algorithm”, to be published in IEEE ISCAS’05.