
University of Westminster Eprints

WestminsterResearch
http://eprints.wmin.ac.uk
http://www.wmin.ac.uk/westminsterresearch

Synthesis of reconfigurable multiplier blocks: part I: fundamentals.

Suleyman Demirsoy (1)
Izzet Kale (1)
Andrew Dempster (2)

(1) Cavendish School of Computer Science, University of Westminster
(2) School of Surveying and Spatial Information Systems, University of New South Wales, Sydney, Australia

Copyright © [2005] IEEE. Reprinted from IEEE International Symposium on Circuits and Systems 2005, pp. 536-539.

This material is posted here with permission of the IEEE. Such permission of the
IEEE does not in any way imply IEEE endorsement of any of the University of
Westminster's products or services. Internal or personal use of this material is
permitted. However, permission to reprint/republish this material for advertising or
promotional purposes or for creating new collective works for resale or redistribution
must be obtained from the IEEE by writing to [email protected]. By
choosing to view this document, you agree to all provisions of the copyright laws
protecting it.

The WestminsterResearch online digital archive at the University of Westminster aims to make the research output of the University available to a wider audience. Copyright and Moral Rights remain with the authors and/or copyright owners. Users are permitted to download and/or print one copy for non-commercial private study or research. Further distribution and any use of material from within this archive for profit-making enterprises or for commercial gain is strictly forbidden.

Whilst further distribution of specific materials from within this archive is forbidden, you may freely distribute the URL of WestminsterResearch (http://www.wmin.ac.uk/westminsterresearch).

In case of abuse or copyright appearing without permission e-mail [email protected].


Synthesis of Reconfigurable Multiplier Blocks: Part I - Fundamentals

Süleyman Sırrı Demirsoy, Izzet Kale
Applied DSP and VLSI Research Group
University of Westminster
London, W1W 6UW, UK
{demirss, kalei}@wmin.ac.uk

Andrew G. Dempster
School of Surveying and Spatial Information Systems
University of New South Wales
Sydney, Australia
[email protected]

Abstract: Reconfigurable Multiplier Blocks (ReMB) offer significant area, delay and possibly power reduction in time-multiplexed implementation of multiple constant multiplications. This paper and its companion paper (subtitled Part II - Algorithm) together present a systematic synthesis method for Single Input Single Output (SISO) and Single Input Multiple Output (SIMO) ReMB designs. This paper presents the necessary foundation and terminology needed for developing a systematic synthesis technique. The companion paper illustrates the synthesis method through examples. The method proposed achieves reduced logic-depth and area over standard multipliers / multiplier blocks.

I. INTRODUCTION

Primitive Operator Filters [1] and multiplier blocks [2] are especially beneficial for the fully parallel implementation of digital filters and filter banks. They reduce the complexity of the implementation effectively by exploiting the redundancy of the multiple constant multiplications. Multiplications by coefficients are realized by successive shift and add operations. The intermediate values that are formed during the generation of one coefficient are re-used for other coefficients, thus reducing the computational redundancy. This topic has been studied extensively in the literature, and many algorithms were developed to design multiplier blocks or, in other words, multiple constant multiplications for different applications. These algorithms can be grouped into two, depending on their approach to the problem:

• Sub-expression sharing method, which works on the Signed Digit (SD) representations of a group of coefficients [3]-[7],

• Numerical (graphical) approach, where a group of coefficient products are generated using common intermediate products [1],[2],[8]-[11].

The savings that can be achieved in implementing fully parallel digital filters as a result of these techniques are impressive in terms of area, complexity and power reduction [1]-[11].

In recent years, the application of the multiplier blocks to time-multiplexed digital filter designs was studied in [12]-[14]. The coefficient store and the general-purpose multiplier in Fig. 1(a) were replaced by a reconfigurable multiplier block (Fig. 1(b)), which can generate the required coefficient products with its different configurations. For the example in Fig. 1(b) the ReMB is a Single Input Single Output (SISO) block. A Single Input Multiple Output (SIMO) ReMB can replace all of the fixed multipliers in a bank of filters as shown in Fig. 1(c).

It has been shown that the redundancy can be reduced and the resulting specialized multiplier design can be much more efficient in terms of area and computational complexity compared to the general-purpose multiplier with its associated coefficient store [12]-[14]. Guidelines for efficient realization were presented in [12], and an efficient, automated design algorithm based on the graphical approach was developed and reported in [13]. This algorithm was suitable for SIMO systems such as filter banks.

Figure 1 (a) Time-multiplexed Tapped Delay Line (TDL) (direct-form) FIR filter, (b) Conceptual SISO ReMB that would replace the coefficient store and the general purpose multiplier. (c) A SIMO ReMB system can replace the dashed box in a transpose direct form filter bank.
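
The shift-and-add realization with shared intermediate values described in the Introduction can be illustrated with a minimal sketch. The function name `multiplier_block` and the coefficient pair {27, 45} are illustrative assumptions, not taken from the paper's designs. The intermediate product 9x is formed once and reused for both outputs, so three adders are used where building the two cost-2 coefficients separately would need four.

```python
def multiplier_block(x: int) -> dict:
    """Return {coefficient: coefficient * x} using only shifts and adds.

    Hypothetical example: both outputs reuse the shared intermediate 9x.
    """
    f9 = (x << 3) + x      # 9x  = 8x + x    (adder 1, shared intermediate)
    p27 = (f9 << 1) + f9   # 27x = 18x + 9x  (adder 2)
    p45 = (f9 << 2) + f9   # 45x = 36x + 9x  (adder 3)
    return {27: p27, 45: p45}


if __name__ == "__main__":
    for coeff, product in multiplier_block(7).items():
        print(coeff, product)
```

In a fixed multiplier block both products are available at once. A ReMB would instead select between such products with a multiplexer, one configuration at a time.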

0-7803-8834-8/05/$20.00 ©2005 IEEE.


Efficient use of the resources on FPGA structures was studied in [15]. In this study, Turner reported significant savings in the area and delay of some DSP blocks by using the Reduced Coefficient Multiplier (RCM) that uses the configurable resources of a Field Programmable Gate Array (FPGA). His design method [16], which is based on common sub-expression sharing, combines the SD-encoded coefficients on to the Look-Up Tables (LUT) that exist in FPGAs and can be used for SISO and Multiple Input Single Output (MISO) blocks.

In this paper, we will present the fundamentals required and developed in [13] for a systematic synthesis of SISO and SIMO ReMB. Section II will focus on the basic structure topology. Section III will give the details of the developed foundation for SIMO and low logic-depth, with conclusions in Section IV.

II. BASIC STRUCTURE TOPOLOGY

All the examples in the paper are based on the simplest basic structure topology as shown in Fig. 2(a). In general, all ReMB designs are presented as directed-acyclic graphs where each line represents a connection. The (•) represents an adder or a subtractor or an adder/subtractor. One of its inputs is connected to a multiplexer. This basic structure can be configured to operate either on the (A, B) or (A, C) inputs by means of the select line of the adder, resulting in two configuration states. Some of the possible variants of this topology are (A+B, A+C), (A+B, A-C), (A-B, A-C). These sets of operations are particularly important since they can be efficiently implemented in the Virtex FPGA, with no extra hardware cost for multiplexers [16].

Although the algorithm is structured to employ variants of this basic structure, the idea of how to design ReMB for a given coefficient set is applicable to any basic structure (see [12], [13] for other forms of basic structures and more detailed information on ReMB).

Fig. 2(b) shows two interconnected basic structures. The output produced by the first basic structure is fed to the input of the second one. The total number of outputs that can be produced by this structure is four. When three of the basic structures are interconnected as shown in Fig. 2(c) and (d), the number of coefficients that can be produced at the output (rightmost node) is eight.

Figure 2 (a) Topology of the simplest basic structure, (b) An example for a cascade of two basic structures. Three basic structures interconnected in the (c) chain form, (d) tree form.

III. FOUNDATION FOR SYNTHESIS

A. Efficient Handling of Multiple Outputs

The multiplier block algorithm RAG-n presented in [2] built the coefficients in a given set one by one, in an order generally defined by their costs (minimum number of interconnected adders to generate the coefficient) or magnitudes. The coefficients having the same costs still needed to be built in order, by making use of all the previously generated numbers (both the fundamentals and the coefficients) in the multiplier block. The multiple-output requirement of the multiplier block to be used in the transposed direct form filters (the multiplier block in Fig. 1(b) without the multiplexer) was realized by connecting the generated partial products or coefficients to the corresponding filter taps.

The efficient realization of multiple outputs in a ReMB design has to differ from that in multiplier blocks. Let us consider a typical time-multiplexed filter bank application as shown in Fig. 1(c), with output nodes y1, y2, … yk. Typically each output node of the ReMB has the same set size, i.e. the number of coefficients per output node is the same, which we shall assume to be M. The upper bound of the output set size of a ReMB design grows exponentially as the number of cascaded basic structures increases [13]. We further assume that an output node y1 is built using several interconnected basic structures of the type shown in Fig. 2(a) and has M different outputs. Any other output node, say y2, built with the same type of basic structure cascaded to y1 would typically have the capacity of 2M outputs. Since y1 and y2 both have the same output set size, the basic structure of y2 becomes under-utilized.

One way to make sure that the output nodes are treated independently is to start designing from the output nodes and build the whole design step by step back to the input, as each output node would be a different starting point without any dependence on one another.

B. Basic Structure Depth

To avoid under-utilization, all output nodes in the design should have a similar number of interconnected basic structures when traced back to the input. The number of coefficients per output node would put a restriction on both the minimum number of basic structures required and the minimum depth of the ReMB design. For example, it was shown in the previous section that, by using the simplest basic structure, a maximum of eight different numbers can be generated at depth 2. In the same way, the maximum number of outputs that can be achieved at a depth of three basic structures is 128 for a possible ReMB design shown in Fig. 3. The basic structures in the diagram are placed in layers to indicate the “basic-structure-depth” of that node.

The maximum number of outputs from a node is $2^{n_i}$, where i is the basic-structure-depth and $n_i$ can be formulated recursively for ReMB designs comprising the simplest basic structure as follows:

$$n_i = 2n_{i-1} + 1 \qquad (1)$$

It should be noted that, for a different basic structure, the maximum number of outputs per node would be different.

Figure 3 A ReMB design with a basic-structure-depth of three, which can produce 128 different coefficients at the output of layer 3.

On the other hand, the individual coefficient costs place a separate restriction on the number of basic structures interconnected for building a particular output node. For example, a cost-3 coefficient needs at least three basic structures to be generated. They can either be in a chain form (Fig. 2(c)) or in a tree form (Fig. 2(d)). However, it has been shown that the tree-form interconnection of n adders/subtractors cannot produce all cost-n numbers in a multiplier block [10]. Therefore, a basic-structure-depth of n would ensure that a cost-n coefficient can be generated.

The basic-structure-depth is important when deciding the layer of an output node. To explain this, consider a node with the fundamental set {39, 45, 41, 47, 61, 11, 27, 57, 119}. All of the fundamentals are cost-2, i.e. each of them requires a cascade of two adders to be generated. However, since there are nine different numbers, the basic-structure-depth of the node would be at least three if the basic structures shown in Fig. 2(a) were to be employed, since the maximum number of outputs at depth-2 is eight. Here, the basic-structure-depth is dictated by the output set size. In a different example, the coefficient set {473, 181, 49} has three different numbers. The coefficient ‘49’ is a cost-2 number whereas 473 and 181 are both cost-3. Again, assuming the simplest basic structure is used, the output set size only requires a minimum of two interconnected basic structures. This time, the basic-structure-depth is dictated not by the output set size but by the cost of the coefficients, which is three. However, it should be kept in mind that some cost-3 coefficients can be generated at depth-2. Choosing depth-3 guarantees coverage of all the different topologies that generate cost-3 coefficients.

Checking the coefficient set does not always give the correct layer of the output node. For example, the set {9, 15} includes two cost-1 numbers. The coefficient ‘9’ can be realized as (8+1), whereas ‘15’ is generated as (16-1). For an FPGA implementation with a restricted set of basic structures as explained in Section II, (8+1) and (16-1) cannot be combined on a basic structure. Therefore, the set {9, 15} needs to be designed in layer two.

In summary, the lower bound to the basic-structure-depth of an output node is the maximum of two values. The first one is the minimum depth that can generate the required output set size. This value depends on the type of the basic structure employed in the design. The second one is the maximum of the adder-costs of the coefficients.

C. Graphs

The realization of any coefficient from a set of fundamentals can be represented on a graph as shown in Fig. 4. For a coefficient x, the ‘graph’ consists of a set of numbers {a, b, c, d} satisfying the equation:

$$x = ac \pm bd \qquad (2)$$

where c and d are in the form of $\pm 2^r$, r being a natural number for integer x.

Figure 4 An example graph

Equation (2) results in more than one graph for a coefficient x when {a, b, c, d} change in a pre-defined interval. Collecting all such graphs of a coefficient in a table, graph-tables are formed. Graph-tables can be employed in generating efficient ReMB designs.

D. Node-definition

A node-definition is a combination of graphs using a particular basic structure to produce a given coefficient set. Fig. 5 shows a node-definition for the coefficient set {K, L, M} on a basic structure. A, B1 and B2 are the inputs of the basic structure. c, d1, and d2 are the edge values. [t0 t1 t2] are the different configuration states of the resulting ReMB design. [aK, aL, aM], [b1K, X, b1M] and [X, b2L, X] are the fundamental vectors holding the inputs of the basic structure for different configurations. The ‘X’ (don’t care) in [b1K, X, b1M] means the multiplexer does not use B1 for configuration t1 but rather uses b2L from B2 to produce the coefficient L. At configuration t0, this node generates K as K = aK×c + b1K×d1.

Figure 5 A generalized node-definition includes all the details about the node: edge values, and the fundamentals required to build the coefficient set for a given basic structure.
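
The depth bound of Section III-B and the node-definition of Section III-D can be sketched together as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the base case n_0 = 0 of recursion (1) is inferred from the figures quoted in the text (8 outputs at depth 2, 128 at depth 3), the adder-costs are supplied by the caller rather than computed, and the numeric node-definition in the usage lines is invented for illustration.

```python
from typing import List, Optional, Sequence


def max_outputs(depth: int) -> int:
    """Upper bound 2**n_i on the outputs of a node at basic-structure-depth i.

    Uses n_i = 2*n_(i-1) + 1 from equation (1); n_0 = 0 is an assumption.
    """
    n = 0
    for _ in range(depth):
        n = 2 * n + 1
    return 2 ** n


def depth_lower_bound(set_size: int, adder_costs: Sequence[int]) -> int:
    """Maximum of (smallest depth that fits set_size) and (largest adder-cost)."""
    depth = 1
    while max_outputs(depth) < set_size:
        depth += 1
    return max(depth, max(adder_costs))


def evaluate_node(a: Sequence[int],
                  b1: Sequence[Optional[int]],
                  b2: Sequence[Optional[int]],
                  c: int, d1: int, d2: int) -> List[int]:
    """Coefficient produced in each configuration t of a node-definition.

    None plays the role of the don't-care 'X': when b1[t] is None the
    multiplexer is taken to select input B2 for that configuration.
    """
    outputs = []
    for a_t, b1_t, b2_t in zip(a, b1, b2):
        second = b1_t * d1 if b1_t is not None else b2_t * d2
        outputs.append(a_t * c + second)
    return outputs


if __name__ == "__main__":
    # Nine cost-2 fundamentals: the set size forces depth three.
    print(depth_lower_bound(9, [2] * 9))
    # {473, 181, 49}: the cost-3 coefficients force depth three.
    print(depth_lower_bound(3, [3, 3, 2]))
    # Hypothetical node-definition with edge values c=8, d1=1, d2=-2.
    print(evaluate_node([3, 5, 3], [1, None, 7], [None, 3, None], 8, 1, -2))
```

The value returned by `depth_lower_bound` is only a lower bound. As the {9, 15} example shows, restrictions of the chosen basic structure can push a set to a deeper layer than this check suggests.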

The graphs of the coefficients should be combined in such a way that the basic-structure-depth of the resulting fundamental sets at A, B1 and B2 is kept less than the basic-structure-depth of the coefficient set; otherwise the design would not converge back to the graph input. This implies that two parameters have to be decreased while choosing the graphs: the number of different fundamentals (fundamental set size) at an input, and the cost of the fundamentals. Assuming the basic-structure-depth of the coefficient set is three, the fundamentals at the input sets should be at most cost-2, and the fundamental set sizes can be at most eight (i.e. the maximum number of outputs allowed at that particular depth, see (1)).

The node-definitions satisfying the two requirements mentioned above could be found by processing the combinations of graphs that exist in the graph-tables. This method is explained further in the companion paper [17].

E. Algorithm Approach

Fig. 6 shows a typical symbolic SIMO ReMB example that can be generated by the algorithm. There are three output nodes, y1, y2, and y3. As observed from the figure, all output nodes have a basic-structure-depth of three.

Figure 6 A symbolic diagram for SIMO ReMB

The layers partition the design into smaller units that can systematically be handled by the algorithm. Each layer has output nodes and fundamental sets that feed the basic structures. For an intermediate layer, the fundamental sets are the output nodes generated in the preceding layers. For layer 1, the fundamental set is always the input signal, which is represented as ‘1’. Starting from the last layer of the design, the algorithm recursively calls itself for each layer until it reaches the input signal. At each call, a number of coefficient sets or output nodes are processed by the algorithm to create node-definitions that generate the required coefficient sets. The fundamental sets that are required by these node-definitions are then designed by recursive calls of the algorithm.

IV. CONCLUSION

As a new design technique, ReMB needs new concepts to be developed for its efficient application. This paper presented new concepts to synthesize SISO and SIMO ReMB circuits. They form the foundation for the algorithm that is presented in the companion paper entitled “Part II: Algorithm” [17].

The proposed technique divides the whole ReMB design into layers depending on the basic-structure-depth and deals with each layer recursively, starting from the output towards the input.

REFERENCES

[1] Bull D.R. and D.H. Horrocks, “Primitive operator digital filters”, IEE Proceedings-G, vol. 138, no. 3, pp. 401-412, June 1991.
[2] Dempster A.G. and Macleod M.D., “Use of minimum-adder multiplier-blocks in FIR digital filters”, IEEE Trans. CAS-II, vol. 42, no. 9, pp. 569-577, November 1995.
[3] Bernstein R., “Multiplication by integer constants”, Software-Practice and Experience, vol. 16, no. 7, pp. 641-652, Academic Press, New York, July 1986.
[4] Hartley R., “Subexpression sharing in filters using canonic signed digit multipliers”, IEEE CAS-II, vol. 43, no. 10, pp. 677-688, 1996.
[5] Pasko R., et al, “A new algorithm for elimination of common sub-expressions”, IEEE Trans. CAD ICS, vol. 18, pp. 58-68, January 1999.
[6] Martinez-Peiro M., E.I. Boemo and L. Wanhammar, “Design of high-speed multiplierless filters using a nonrecursive signed common subexpression algorithm”, IEEE Trans. CAS-II, vol. 49, no. 3, pp. 196-203, March 2002.
[7] Potkonjak M., M.B. Srivastava and A. P. Chandrakasan, “Multiple constant multiplications: Efficient and versatile framework and algorithms for exploring common subexpression elimination”, IEEE Trans. on CAD of ICS, vol. 15, no. 2, pp. 151-165, February 1996.
[8] Li D., “Minimum number of adders for implementing a multiplier and its application to the design of multiplierless digital filters”, IEEE Trans. CAS-II, vol. 42, no. 7, pp. 453-460, July 1995.
[9] Dempster A. G. and M.D. Macleod, “General algorithms for reduced-adder integer multiplier design”, Elec. Letters, vol. 31, no. 21, pp. 1800-1802, October 1995.
[10] Gustafsson O., A. Dempster and L. Wanhammar, “Extended results for minimum-adder constant integer multipliers”, IEEE ISCAS’2002, vol. 1, pp. 73-76, May 2002.
[11] Kang H.J. and I.C. Park, “Multiplier-less IIR filter Synthesis algorithms to trade-off the delay and the number of adders”, Proceedings of IEEE ISCAS’01, vol. 2, pp. 693-696, Australia 2001.
[12] Demirsoy S. S., A.G. Dempster and I. Kale, “Design Guidelines for Reconfigurable Multiplier Blocks”, IEEE ISCAS’03, vol. 4, pp. 293-296, Thailand, May 2003.
[13] Demirsoy S. S., “Complexity Reduction in Digital Filters and Filter Banks”, Ph.D. Thesis, University of Westminster, October 2003.
[14] Demirsoy S. S., R. Beck, A.G. Dempster and I. Kale, “Reconfigurable implementation of recursive DCT kernels with reduced quantization noise”, IEEE ISCAS’2003, vol. 4, pp. 289-292, Thailand, May 2003.
[15] Turner R. H., T. Courtney and R. Woods, “Implementation of fixed DSP functions using the reduced coefficient multiplier”, IEEE Proc. of ICASSP’2001, vol. 2, pp. 881-884, May 2001, USA.
[16] Turner R. H., “Functionally diverse programmable logic implementations of digital signal processing algorithms”, PhD Thesis, Queen’s University of Belfast, August 2002.
[17] Demirsoy S. S., I. Kale, A. G. Dempster, “Synthesis of Reconfigurable Multiplier Blocks: Part II- Details of the Algorithm”, to be published in IEEE ISCAS’05.
