
1. The document describes a 32-bit integer divider chip that performs division at a throughput of 25 million operations per second without needing to prenormalize operands. 2. The chip uses a systolic array architecture with 16 identical two-block arithmetic cells arranged in a pipeline. 3. The division algorithm is a continuous non-restoring procedure that generates the quotient bits in successive pipeline stages through conditional addition/subtraction operations.


A 25 MOPS SYSTOLIC INTEGER DIVIDER IC

A. Roberto Criado

TRW LSI Products Inc.

P.O. Box 2412
La Jolla, California 92038

ABSTRACT

The divide function in signal processing systems is, in many instances, unavoidable, particularly when implementing range scaling, matrix operations, or perspective transforms in system applications such as workstations, radar systems, and image processors. Traditionally this need has been fulfilled at the expense of reduced system speed and efficiency by relying on converging recursive techniques to carry out the division operation. This paper reports the design and implementation of a 32-bit fixed point integer divider. The CMOS chip performs two's-complement integer division of 32-bit dividends and 16-bit divisors without prenormalization, to produce a 32-bit output quotient.

ALGORITHM AND DEVICE ARCHITECTURE

The device is architected as a one-dimensional systolic array of sixteen identical two-block arithmetic cells. The pipelined systolic architecture operates at a throughput of 25 million operations per second (25 MOPS), with a latency of 19 clock cycles. The algorithm chosen for this design is a continuous non-restoring procedure [1], implemented as a series of conditional adder/subtractor modules. These arithmetic modules operate on the previous module's output (remainder), along with the original divisor. Each arithmetic module accepts the next two bits of the dividend and generates the next two bits of the quotient. This process is repeated N times until the remainder is zero, or a positive number. A slight complication occurs at times when the division of two positive numbers yields a negative remainder. In these cases one extra processing step is needed; the divisor is added to the emanating negative remainder to correct it. The chip's architecture parallels the algorithmic flow (Fig. 1). The device comprises 16 identical systolic functional cells separated by synchronous pipeline registers (Fig. 2). The number of systolic functional cells (one-bit quotient generators) per inter-register pipeline stage sets the balance between throughput and number of cycles of pipeline latency. The core of each block consists of two N-bit wide adder/subtractor circuits, which can increase or decrease the incoming remainder by the divisor (Fig. 3). Subtraction is performed if the sign bits of the divisor and the incoming remainder match; addition is done if they differ. With two cells per stage, the TMC3211 is able to meet the design goals of throughput and chip size.

FUNCTIONAL DESCRIPTION

Both the 32-bit dividend and the 16-bit divisor inputs are accepted by the chip on separate 25 MHz buses. Each input port is enabled individually to allow continuous division by a constant. The integer divider supports all fixed-point input formats, although the user must keep track of the binary point to interpret the quotient properly. The integer data formats are as follows:

Dividend Input and Output Bus

    D31     D30    ...   D18    ...   D01    D00
   -2^31    2^30   ...   2^18   ...   2^1    2^0

Divisor Input Bus

    X15     X14    ...   X01    X00
   -2^15    2^14   ...   2^1    2^0

For fractional data the formats are as follows:

Dividend Input and Output Bus

    D31     D30     D29     D28    ...   D18     ...   D01     D00
   -2^0     2^-1    2^-2    2^-3   ...   2^-13   ...   2^-30   2^-31

Divisor Input Bus

    X15     X14     X13    ...   X01     X00
   -2^0     2^-1    2^-2   ...   2^-14   2^-15

(note: the minus sign represents the sign bit).

Having latched the incoming operands, the divisor in full and a 16-bit sign-extension of the dividend sign bit are operated on by the first of the 16 arithmetic modules which comprise the chip's core. This procedure is repeated throughout the core, with each module accepting the next two bits of the dividend, appropriately delayed to compensate for the pipeline delay within the core. The carries emanating at each stage of the arithmetic core are collected and assembled into a provisional quotient. Before outputting the final result, conditioning of the quotient is performed to correct the result for the possibility of having reached a zero remainder prior to the last arithmetic process in the core, or as stated above, to correct a possible remainder anomaly; and lastly, to complement the quotient in case of a negative divisor, or dividend. The design offers two error flags, DZ (divisor equals zero), and REM (inexact result, non-zero remainder), which accompany each quotient output. When DZ is high, indicating a divide by zero operation, the quotient is meaningless. Due to a finite data word width, a two's complement overflow error occurs under the following unique conditions:

Dividend : Y = 80000000 [hex] (neg. full scale)

Divisor  : X = FFFF [hex] (-1)

Result

Quotient : Q = 80000000 [hex] (neg. full scale)

CH1234-5/89/0000-P7-4.1 $01.00 © 1989 IEEE

As stated earlier, this condition occurs due to a limitation in the number of bits available to indicate a positive full scale quotient.
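
The non-restoring procedure and the quotient conditioning described above can be illustrated with a small behavioral model. The following Python sketch is not the TMC3211 logic: the function name `nonrestoring_divide`, the one-bit-per-iteration loop (the chip retires two bits per module), the accumulation of +1/-1 quotient digits in place of collected carries, and truncation toward zero are all assumptions made for illustration. It applies the sign-compare rule (subtract when the divisor and remainder signs match, add when they differ), corrects the final remainder, and reports the DZ and REM flags.

```python
def nonrestoring_divide(y, x, n=32, m=16):
    """Behavioral sketch: divide n-bit signed y by m-bit signed x.

    Returns (quotient, rem_flag, dz_flag); the quotient is wrapped to an
    n-bit two's-complement value, as a fixed-width output bus would be.
    """
    assert -(1 << (n - 1)) <= y < (1 << (n - 1)), "dividend out of range"
    assert -(1 << (m - 1)) <= x < (1 << (m - 1)), "divisor out of range"
    if x == 0:
        return 0, False, True  # DZ set: the quotient value is meaningless

    y_bits = y & ((1 << n) - 1)   # dividend as an n-bit pattern
    r = -1 if y < 0 else 0        # sign extension of the dividend
    q = 0
    for i in range(n - 1, -1, -1):          # most significant bit first
        r = 2 * r + ((y_bits >> i) & 1)     # shift in the next dividend bit
        if (r < 0) == (x < 0):              # signs match: subtract
            r -= x
            q = 2 * q + 1
        else:                               # signs differ: add
            r += x
            q = 2 * q - 1

    # Conditioning (assumed rounding: truncate toward zero).
    step = 1 if x > 0 else -1
    if r == -abs(x):              # remainder is really zero
        r, q = 0, q - step
    elif r < 0 and y >= 0:        # negative remainder, non-negative dividend
        r, q = r + abs(x), q - step
    elif r > 0 and y < 0:         # positive remainder, negative dividend
        r, q = r - abs(x), q + step

    q &= (1 << n) - 1             # wrap to n bits
    if q >= 1 << (n - 1):
        q -= 1 << n
    return q, r != 0, False
```

Under these assumptions, `nonrestoring_divide(-2**31, -1)` returns a quotient of `-2**31` (80000000 hex) with both flags clear, reproducing the overflow case listed above: the true quotient, +2^31, does not fit in a 32-bit two's-complement word.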

CHIP DESIGN AND IMPLEMENTATION

One of the main goals in this effort was to achieve the desired throughput of 25 MOPS, and also to maximize the effective use of active area real estate. The systolic architecture implemented as a direct result of the chosen division algorithm pointed to a "long and narrow" aspect ratio. Furthermore, the dividend and quotient holding circuits required shift registers that in the former case increase in length by one stage as the process flows towards the final result, and in the latter case decrease in length by one stage (Fig. 4). These requirements on the holding circuits, as well as on the arithmetic core, imposed special care on the chip's floor plan. The device floor plan follows a U-shaped flow. The data flow is from upper right to upper left along the U-path. The dividend holding registers supply the delayed Y-bits to the modules from the module's outer boundary (away from the center), while the quotient bits are received by the quotient holding circuits located in the center of the chip (Fig. 5). The increasing and decreasing registers of the dividend and quotient holding circuits, respectively, were implemented as S-shape clusters of D-type flip-flops in an attempt to maximize active area usage. A very desirable chip aspect ratio has been accomplished; the die size is 7.00 x 7.18 mm (Fig. 6). The transistor count is over 50 K. The logic implementation of the chip was done exclusively with cells from our CMOS DSP-cell library, and the chip was fabricated in TRW's OMICRON-C one-micron CMOS process.

FIGURE 1
DIVIDER ALGORITHMIC FLOW
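
The growing and shrinking holding registers can be pictured with a short delay calculation. The sketch below assumes one pipeline register per arithmetic module and two bits handled per module, which matches the sixteen-module description in the text but not necessarily the exact silicon; the names `MODULES`, `BITS_PER_MODULE`, and `holding_delays` are illustrative only.

```python
MODULES = 16          # arithmetic modules in the core (from the text)
BITS_PER_MODULE = 2   # dividend bits in, quotient bits out, per module

def holding_delays(modules=MODULES):
    """Return per-module delays, in clock cycles, for each holding circuit.

    Assumes module k (0 = first) sits k pipeline stages into the core.
    """
    # Dividend bits for module k wait k cycles before they are consumed.
    dividend = list(range(modules))
    # Quotient bits from module k wait until the last module has finished.
    quotient = [modules - 1 - k for k in range(modules)]
    return dividend, quotient

dividend, quotient = holding_delays()
# Total flip-flops implied by these delays, counting both holding circuits.
flip_flops = BITS_PER_MODULE * (sum(dividend) + sum(quotient))
print(dividend[:4], quotient[:4], flip_flops)
```

In this model the dividend delay grows by one stage per module while the quotient delay shrinks by one, so every quotient bit pair reaches the output in the same clock cycle.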

The entire project, from architecture definition to first prototypes, was accomplished in 8 calendar months, and the design proved successful on the first iteration. Although somewhat more cumbersome than multiplication, division can be executed cost-effectively in an algorithm-specific integrated circuit. The size of the chip is roughly proportional to the precision of the divisor, which determines the width of each adder/subtractor stage. Likewise, it is also proportional to the desired precision of the quotient, which requires one adder/subtractor per bit. The TMC3211 offers a cost effective, low power, high speed solution for performing fixed point division in a variety of digital signal processing applications.
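
The headline figures (25 MOPS throughput, 19 clock cycles of latency, 25 MHz buses) translate into simple timing arithmetic. The sketch below assumes that one operand pair is accepted and one quotient delivered on every clock once the pipeline is full; `CLOCK_HZ`, `LATENCY_CYCLES`, `cycles_for_divisions`, and `seconds_for_divisions` are names chosen here for illustration.

```python
CLOCK_HZ = 25_000_000   # 25 MHz bus rate, one division started per clock
LATENCY_CYCLES = 19     # pipeline latency quoted for the device

def cycles_for_divisions(count):
    """Clock cycles until the last of `count` back-to-back quotients appears."""
    if count <= 0:
        return 0
    return LATENCY_CYCLES + (count - 1)

def seconds_for_divisions(count):
    """Same interval expressed in seconds at the assumed clock rate."""
    return cycles_for_divisions(count) / CLOCK_HZ

print(cycles_for_divisions(1), cycles_for_divisions(1000))
```

Under this assumption a single division takes 19 cycles (760 ns at 40 ns per cycle), while 1000 consecutive divisions take 1018 cycles, or about 40.7 microseconds, which is why the pipelined throughput matters more than the latency for streaming workloads.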

REFERENCES

[1] Oberman, R.M.M., Digital Circuits for Binary Arithmetic, McGraw-Hill, New York (1979)

FIGURE 2
TMC3211 BLOCK DIAGRAM

FIGURE 3
ARITHMETIC CELL FUNCTIONAL BLOCK DIAGRAM

FIGURE 4
DIVIDER PRELIMINARY FLOOR PLAN

FIGURE 5
DIVIDER ACTUAL FLOOR PLAN

FIGURE 6
CHIP MICROPHOTOGRAPH
