Processors: Mops Integer Divider IC
A. Roberto Criado
The divide function in signal processing systems is, in many instances, unavoidable, particularly when implementing range scaling, matrix operations, or perspective transforms in system applications such as workstations, radar systems, and image processors. Traditionally this need has been fulfilled at the expense of reduced system speed and efficiency by relying on converging recursive techniques to carry out the division operation. This paper reports the design and implementation of a 32-bit fixed-point integer divider. The OK6 chip performs two's-complement integer division of 32-bit dividends and 16-bit divisors without prenormalization, to produce a 32-bit output quotient.

The device is architected as a one-dimensional systolic array of sixteen identical two-block arithmetic cells. The pipelined systolic architecture operates at a throughput of 25 million operations per second (25 Mops), with a latency of 19 clock cycles. The algorithm chosen for this design is a continuous non-restoring procedure [1], implemented as a series of conditional adder/subtractor modules. These arithmetic modules operate on the previous module's output (remainder), along with the original divisor. Each arithmetic module accepts the next two bits of the dividend and generates the next two bits of the quotient. This process is repeated N times until the remainder is zero, or a positive number. A slight complication occurs at times when the division of two positive numbers yields a negative remainder. In these cases one extra processing step is needed; the divisor is added to the emanating negative remainder to correct it.

The chip's architecture parallels the algorithmic flow (Fig. 1). The device comprises 16 identical systolic functional cells separated by synchronous pipeline registers (Fig. 2). The number of systolic functional cells (one-bit quotient generators) per inter-register pipeline stage sets the balance between throughput and number of cycles of pipeline latency. The core of each block consists of two N-bit wide adder/subtractor circuits, which can increase or decrease the incoming remainder by the divisor (Fig. 3). Subtraction is performed if the sign bits of the divisor and the incoming remainder match; addition is done if they differ. With two cells per stage, the TMC3211 is able to meet the design goals of throughput and chip size.

FUNCTIONAL DESCRIPTION

Both the 32-bit dividend and the 16-bit divisor inputs are accepted by the chip on separate 25 MHz busses. Each input port is enabled individually to allow continuous division by a constant. The integer divider supports all fixed-point input formats, although the user must keep track of the binary point to interpret the quotient properly.

For integer data the formats are as follows:

Dividend Input and Output Bus

     D31    D30   ...   D18   ...   D01   D00
    -2^31   2^30  ...   2^18  ...   2^1   2^0

Divisor Input Bus

     X15    X14   ...   X01   X00
    -2^15   2^14  ...   2^1   2^0

For fractional data the formats are as follows:

Dividend Input and Output Bus

     D31    D30    D29    D28   ...   D18    ...   D01     D00
    -2^0    2^-1   2^-2   2^-3  ...   2^-13  ...   2^-30   2^-31

Divisor Input Bus

     X15    X14    X13   ...   X01     X00
    -2^0    2^-1   2^-2  ...   2^-14   2^-15

(note: the minus sign represents the sign bit).

Having latched the incoming operands, the divisor in full and a 16-bit sign-extension of the dividend sign bit are operated on by the first of the 16 arithmetic modules which comprise the chip's core. This procedure is repeated throughout the core, with each module accepting the next two bits of the dividend, appropriately delayed to compensate for the pipeline delay within the core. The carries emanating at each stage of the arithmetic core are collected and assembled into a provisional quotient. Before outputting the final result, conditioning of the quotient is performed: to correct the result for the possibility of having reached a zero remainder prior to the last arithmetic process in the core; as stated above, to correct a possible remainder anomaly; and lastly, to complement the quotient in case of a negative divisor or dividend.

The design offers two error flags, DZ (divisor equals zero) and REM (inexact result, non-zero remainder), which accompany each quotient output. When DZ is high, indicating a divide-by-zero operation, the quotient is meaningless. Due to a finite data word width, a two's-complement overflow error occurs under the following unique conditions:

    Dividend : Y = 80000000 [hex] (neg. full scale)
    Divisor  : X = FFFF [hex] (-1)

Result

    Quotient : Q = 80000000 [hex] (neg. full scale)
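The flag and overflow behavior described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the chip's internal logic: the names `to_signed` and `divide_model` are hypothetical, truncation of the quotient toward zero is assumed (the text does not state the rounding rule), and the `overflow` entry is an illustrative extra rather than a documented output.

```python
def to_signed(word: int, bits: int) -> int:
    """Interpret the low `bits` bits of `word` as a two's-complement value."""
    word &= (1 << bits) - 1
    return word - (1 << bits) if word >> (bits - 1) else word


def divide_model(y_word: int, x_word: int) -> dict:
    """Hypothetical behavioral model of a 32-bit / 16-bit signed divide.

    Returns the 32-bit quotient word with the DZ and REM flags. The
    `overflow` entry is illustrative only (not a documented chip output).
    """
    y = to_signed(y_word, 32)   # 32-bit dividend
    x = to_signed(x_word, 16)   # 16-bit divisor
    if x == 0:
        # Divide by zero: DZ is raised and the quotient is meaningless.
        return {"q": None, "dz": True, "rem": False, "overflow": False}
    magnitude, remainder = divmod(abs(y), abs(x))
    # Assumption: the quotient is truncated toward zero.
    q = -magnitude if (y < 0) != (x < 0) else magnitude
    # The quotient must fit in a 32-bit two's-complement word.
    overflow = not (-(1 << 31) <= q < (1 << 31))
    return {"q": q & 0xFFFFFFFF, "dz": False,
            "rem": remainder != 0, "overflow": overflow}


if __name__ == "__main__":
    # Negative full scale divided by -1: the single overflow case.
    print(divide_model(0x80000000, 0xFFFF))
    # An inexact division raises the remainder flag.
    print(divide_model(7, 2))
```

In this model the overflow case wraps back to the word 80000000 [hex], matching the result quoted above. Under the same assumptions, a dividend carrying a fractional bits divided by a divisor carrying b fractional bits gives a quotient word with a - b fractional bits, which is the binary-point bookkeeping left to the user.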
CHIP DESIGN AND IMPLEMENTATION

One of the main goals in this effort was to achieve the desired throughput of 25 Mops, and also to maximize the effective use of active area real estate. The systolic architecture implemented as a direct result of the chosen division algorithm pointed to a "long and narrow" aspect ratio. Furthermore, the dividend and

    IF $X_{15} \neq R_{15}$:   $2^{16} \times Q_M + R' = 2 \times R + X + Y_M$
    IF $X_{15} = R_{15}$:   $2^{16} \times Q_M + R' = 2 \times R - X + Y_M$

    ($Q_M$: CARRY-OUT; $Y_M$: CARRY-IN)
FIGURE 3. ARITHMETIC CELL FUNCTIONAL BLOCK DIAGRAM
[Diagram not recoverable; legible labels: DIVISOR (X15, X14:0), REMAINDER (R15, R14:0), Y31, 16-bit X and R' outputs.]
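The conditional add/subtract step performed by the arithmetic cell can be sketched in software. The following is a minimal illustration of textbook non-restoring division, not the chip's implementation: it assumes a non-negative dividend and a positive divisor, processes one dividend bit per iteration (the chip handles two's-complement operands and two bits per pipeline stage), and takes the quotient bit from the sign of the new partial remainder in place of the adder carry-out. The names `nonrestoring_divide` and `n_bits` are hypothetical.

```python
def nonrestoring_divide(dividend: int, divisor: int, n_bits: int = 32):
    """Non-restoring division sketch for dividend >= 0 and divisor > 0.

    Returns (quotient, remainder). One dividend bit is consumed per step.
    """
    if dividend < 0 or divisor <= 0:
        raise ValueError("sketch assumes dividend >= 0 and divisor > 0")
    remainder = 0
    quotient = 0
    for i in reversed(range(n_bits)):
        next_bit = (dividend >> i) & 1
        if remainder >= 0:
            # Sign of remainder matches sign of divisor: subtract.
            remainder = 2 * remainder + next_bit - divisor
        else:
            # Signs differ: add the divisor instead of restoring.
            remainder = 2 * remainder + next_bit + divisor
        # Quotient bit is 1 when the new partial remainder is non-negative.
        quotient = (quotient << 1) | (1 if remainder >= 0 else 0)
    if remainder < 0:
        # Extra correction step: add the divisor back to a negative remainder.
        remainder += divisor
    return quotient, remainder


if __name__ == "__main__":
    print(nonrestoring_divide(7, 2, n_bits=4))
    print(nonrestoring_divide(8, 3, n_bits=4))
```

The final correction mirrors the extra processing step described in the text, where the divisor is added to a negative remainder left over from dividing two positive numbers.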
FIGURE 4
[Diagram not recoverable; legible labels: DIVISOR, HOLDING, MSB, DIVIDER CORE, QUOTIENT, FLAGS.]

FIGURE 6. CHIP MICROPHOTOGRAPH