
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 21, NO. 1, JANUARY 2002

Short Papers
An Efficient and Regular Routing Methodology for
Datapath Designs Using Net Regularity Extraction
Sabyasachi Das and Sunil P. Khatri

Abstract: We present a new detailed routing methodology specifically designed for datapath layouts. In typical state-of-the-art microprocessor designs, datapaths comprise about 70% of the logic (excluding caches). However, most logic and layout synthesis research has targeted random-logic portions of the design. In general, techniques for random-logic placement and routing do not produce good results for datapath layouts. Although research on datapath placement and global routing has been reported, very little research has been reported in the area of detailed routing for datapaths. Our datapath routing methodology exploits the unique feature of datapaths, namely, their regularity. Datapaths typically comprise regular structures (or bit slices), which are replicated. The interconnections between these replicated bit slices are also typically very regular. Our datapath routing methodology utilizes new techniques to extract interconnect regularity among bit slices. We define a net cluster, which is a collection of similarly structured nets present across different bit slices. We introduce two clustering schemes (footprint-driven clustering and instance-driven clustering) to extract such net clusters. Using these net clusters, we select one representative bit slice to perform a strap-based routing (which optimally finds the shortest path between two points if that path is available) on a member net of each net cluster. Then for each such net, we propagate its route to all other nets in its net cluster. Our algorithm is unique in that it performs the detailed routing on a single bit slice and infers the routing for all bit slices using the notion of net clusters. Since we only route a small fraction of nets present in the design, significant speedup is obtained. We demonstrate at least six times speedup for industrial 32- and 64-bit datapath designs. The regularity of the routes across the bit slices results in more predictable timing characteristics for the resulting layout.
Index Terms: Datapath, detailed routing, footprint-driven clustering, instance-driven clustering, net clusters.

Manuscript received April 10, 2001; revised August 2, 2001. This paper was recommended by Guest Editor S. S. Sapatnekar.
S. Das is with Cadence Design Systems, San Jose, CA 95134 USA (e-mail: [email protected]).
S. P. Khatri is with the Department of Electrical and Computer Engineering, University of Colorado, Boulder, CO 80309 USA (e-mail: [email protected]).
Publisher Item Identifier S 0278-0070(02)00100-8.

I. INTRODUCTION
As we migrate toward ultra deep-submicrometer feature sizes, designs are becoming increasingly complex with very aggressive goals. Datapaths are one of the more critical parts of the design. It is well understood that traditional design automation methodologies are not well suited for the design of high-performance datapaths. As a result, datapath blocks are usually manually designed, resulting in significantly greater design time and cost.
To solve this problem, researchers are actively trying to develop design automation methodologies which are suitable for the design of datapath circuits. For example, several datapath placement [1], [2] and synthesis [3] techniques have been reported. In [4], the authors introduce a datapath routing methodology. Their work differs from ours in that it uses probabilistic measures of congestion to guide the routing, which is performed simultaneously for all nets. Results are reported on small designs, while our goal is to tackle very large industrial datapaths. To the best of our knowledge, there has been no other research on detailed routing for datapaths.
In this paper, we propose a new detailed routing methodology that exploits the regularity of connections in a datapath circuit. In our scheme, we route all the regular nets in a similar fashion so as to ensure good quality, regular routes. This results in highly predictable timing characteristics for the resulting design, and the routing process is much faster than with conventional routers.
We have organized the rest of the paper as follows: Section II presents general characteristics and some definitions of a datapath. In Section III, we discuss our proposed flow. Section IV presents the advantages of our approach. Experimental results are provided in Section V and conclusions are drawn in Section VI.

II. CHARACTERISTICS OF DATAPATHS
Datapaths are commonly found in microprocessors, digital signal processors, and graphics integrated circuits. In datapaths, the same logic is repeated multiple times. We define a bit slice as the logic corresponding to a particular bit. In practice, N bit slices are abutted to obtain the design of an N-bit datapath. The layout width of all bit slices is identical and we call this the bit pitch or pitch. The convention we follow for this paper is that the data flows vertically and control flows horizontally. In most standard-cell-based datapath styles, each bit slice is composed of multiple instances of standard cells (or larger master cells).

III. OUR APPROACH
Fig. 1 describes our overall flow. In the following sections, we discuss each step in detail.
A. Reading the Schematic Netlist
First, we read the schematic (logic) netlist of the whole block, which consists of several instances of library cells. Currently, our tool can handle only two levels of hierarchy. In the top level of the hierarchy, all the connections between the instances are specified. In the lower level, logical details of the library cells are specified.
B. Generating the Placement
Next, we place instances of the master cells of the datapath block in
a structured manner. In this paper, we do not focus on placement, since
several datapath placement algorithms are already available. Rather, we
use an industrial datapath placement tool to produce a regular placement.
C. Reading the Layout Information of Cells
In this step, we read the layout information of the library cells that
make up the datapath. In particular, we obtain details about the blockages present in the datapath block.
D. Extracting Net Clusters
In a datapath block, several regular structures are present across multiple bit slices. Techniques to extract regular instance structures have
been proposed by Arikati et al. [5] and Hassoun et al. [6].
0278-0070/02$17.00 © 2002 IEEE

Fig. 1. Overall datapath routing flow.

Fig. 2. Footprint-driven clustering.

In this paper, we extract regular net structures present in different bit slices. We define a net cluster as a collection of nets (spread over different bit slices) in which all nets have similar connections. In particular, if two nets net1 and net2 belong to the same net cluster, then net1 and net2 contain the same number of pins and for each pin p of net1 with coordinates $(x_p, y_p)$, there exists a pin q of net2 with coordinates $(x_q, y_q)$ such that
1) $y_p = y_q$;
2) $|x_p - x_q| = k \cdot \text{bit pitch}$ $(1 \le k \le N - 1)$.
To denote a net cluster NC1 with nets N0, N1, N2, N3, N4, we use the following notation: NC1 = {N0, N1, N2, N3, N4}. We have developed different algorithms to identify net clusters. The footprint-driven clustering (FDC) algorithm creates net clusters based on the names of pins, master cells, and nets in the datapath. This is supplemented by a more powerful instance-driven clustering (IDC) algorithm, which extracts clusters based on position information of the pins of nets. In the detailed description of these techniques, we illustrate them via examples containing two-pin nets. However, all our clustering techniques work for multipin nets as well.
1) Footprint-Driven Clustering: In general, datapath designers follow a very regular naming style in order to effectively manage and debug the datapath design. The FDC exploits this naming regularity. Fig. 2 shows a 4-bit datapath which follows a regular naming scheme.
We define the global footprint of a net as a string which is created by lexicographically concatenating the names of the net pins (of the connecting instances) and the names of the master cells of those instances. The detailed footprint of a net is defined as a string that is created by lexicographically concatenating the names of all the connecting instances and the name of the net. FDC is described in Algorithm 1. Detailed comments are provided below.
Algorithm 1: Footprint-Driven Clustering
NetNames ← findAllNetNames(designName)
GFs ← findGlobalFootPrints(NetNames)
AllGroupsOfNets ← NULL
for each unique global footprint (ugf) do
  NewGroups ← getGroupsOfNets(ugf)
  AllGroupsOfNets ← AllGroupsOfNets ∪ NewGroups
end for
AllNetClusters ← NULL
for each Group in AllGroupsOfNets do
  NetsInGroup ← findNetsInGroup(Group)
  DFs ← findDetailedFootPrints(NetsInGroup)
  NewNC ← CreateNetCluster(DFs, NetsInGroup)
  AllNetClusters ← AllNetClusters ∪ NewNC
end for
Return AllNetClusters

1) First, we calculate global footprints for all nets. We form a number of groups of nets, such that all nets in a single group have the same global footprint and no two nets belonging to different groups have the same global footprint.
2) In the next part, our target is to create one or more net clusters from each group. We generate detailed footprints for each member net in a group. If the indexes of the names in the detailed footprints of two nets differ by a constant k, then these two nets belong to a single net cluster. Otherwise, these two nets belong to different net clusters.
We illustrate FDC by applying Algorithm 1 to the design shown in Fig. 2. We assume that A3, A2, A1, A0 belong to the same master cell (say, M1). B3, B2, B1, B0 also belong to the master cell M1. Let C3, C2, C1, C0 and D3, D2, D1, D0 all belong to master cell M2. The nets in bit slice 3 are as follows. Net LB[3] connects pin s of A3 to pin p of D3, net SB[3] connects pin s of B3 to pin p of C3, and net SM[3] connects pin x of C3 to pin x of D3.
1) After running the first step of the FDC algorithm on the design of Fig. 2, we get two groups. Group 1 contains nets LB[3], LB[2], LB[1], LB[0], SB[3], SB[2], SB[1], SB[0]. Group 2 contains nets SM[3], SM[2], SM[1], SM[0].
2) In the second step, we consider one group at a time to create net clusters. At the end of this step, we get a total of three net clusters (two from Group 1 and one from Group 2). These are
NC1 = {LB[3], LB[2], LB[1], LB[0]}
NC2 = {SB[3], SB[2], SB[1], SB[0]}
NC3 = {SM[3], SM[2], SM[1], SM[0]}.
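The two FDC steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the design dictionary (net name mapped to a list of instance, master cell, and pin name), the `|` separator, the `#` placeholder, and all function names are hypothetical. The rule "indexes differ by a constant k" is interpreted here as "identical names once the digits are stripped, and identical index offsets".

```python
import re
from collections import defaultdict

def make_design(bits=4):
    # Hypothetical encoding of the Fig. 2 example:
    # net name -> list of (instance, master cell, pin name).
    design = {}
    for i in range(bits):
        design[f"LB[{i}]"] = [(f"A{i}", "M1", "s"), (f"D{i}", "M2", "p")]
        design[f"SB[{i}]"] = [(f"B{i}", "M1", "s"), (f"C{i}", "M2", "p")]
        design[f"SM[{i}]"] = [(f"C{i}", "M2", "x"), (f"D{i}", "M2", "x")]
    return design

def global_footprint(conns):
    # Sorted pin names followed by sorted master-cell names.
    pins = sorted(pin for _, _, pin in conns)
    masters = sorted(master for _, master, _ in conns)
    return "|".join(pins + masters)

def detailed_key(net, conns):
    # Detailed footprint: sorted instance names plus the net name.
    names = sorted(inst for inst, _, _ in conns) + [net]
    template = tuple(re.sub(r"\d+", "#", n) for n in names)
    indexes = [int(d) for n in names for d in re.findall(r"\d+", n)]
    # Nets whose indexes all shift by one constant k share these offsets.
    offsets = tuple(i - indexes[0] for i in indexes)
    return template, offsets

def footprint_driven_clusters(design):
    groups = defaultdict(list)            # step 1: group by global footprint
    for net, conns in design.items():
        groups[global_footprint(conns)].append(net)
    clusters = defaultdict(list)          # step 2: split groups by detailed footprint
    for gf, nets in groups.items():
        for net in nets:
            clusters[(gf,) + detailed_key(net, design[net])].append(net)
    return list(groups.values()), list(clusters.values())

groups, clusters = footprint_driven_clusters(make_design())
```

On this toy encoding the sketch yields two groups (eight LB/SB nets and four SM nets) and three clusters, matching the example above.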

2) Instance-Driven Clustering: While designing the datapath, if some nets were not named using a uniform naming scheme, then the
FDC algorithm would not identify their corresponding net clusters.
This problem often occurs when a logic synthesis tool is utilized to
create the datapath schematic. Synthesis tools usually assign randomly
generated names for unnamed nets. To identify such nets and create
the appropriate net clusters, we apply IDC. This is also a two-step
technique. The IDC algorithm is as follows.
1) For each unclustered net, we first compute the global footprints to form candidate groups, just as in the FDC algorithm.
2) Next, we consider one group at a time and create net clusters from that group by using the following definition.
Let us assume that we have two nets netA and netB, which are connected to P pins each. After sorting the pin coordinates by Y-coordinate value, assume that the pins of netA are at locations $(x_a^1, y_a^1), (x_a^2, y_a^2), \ldots, (x_a^P, y_a^P)$ and the pins of netB are at $(x_b^1, y_b^1), (x_b^2, y_b^2), \ldots, (x_b^P, y_b^P)$. Then, the two nets belong to the same net cluster if the following $2 \cdot P$ conditions are satisfied:
1) $y_a^j = y_b^j$ (for $j = 1, 2, \ldots, P$);
2) $|x_a^j - x_b^j| = k \cdot \text{bit pitch}$ (for $j = 1, 2, \ldots, P$ and $1 \le k \le N - 1$).
The above algorithm can be illustrated by using a slightly modified version of the design shown in Fig. 2. Let us assume that in Fig. 2, the logic synthesis tool specified names AB, CD, EF, GH, KL, MN, RS, and TV for nets LB[3], LB[2], LB[1], LB[0], SB[3], SB[2], SB[1], and SB[0], respectively.
After running the IDC algorithm, we get two net clusters. These are
NC1 = {AB, CD, EF, GH}
NC2 = {KL, MN, RS, TV}.
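The position test at the heart of IDC can be sketched as follows. The pin coordinates, the bit pitch of 10, and the function names are invented for illustration. The sketch also assumes integer coordinates and that all pins of a net pair shift by the same multiple k of the bit pitch, and it skips the preliminary grouping by global footprint.

```python
def same_cluster(pins_a, pins_b, bit_pitch, n_bits):
    """Check the 2*P conditions for two nets given as lists of (x, y) pins."""
    if len(pins_a) != len(pins_b):
        return False
    # Sort by Y (then X) so that the j-th pins of both nets are compared.
    a = sorted(pins_a, key=lambda p: (p[1], p[0]))
    b = sorted(pins_b, key=lambda p: (p[1], p[0]))
    if any(pa[1] != pb[1] for pa, pb in zip(a, b)):
        return False                      # condition 1: equal Y coordinates
    shifts = {pa[0] - pb[0] for pa, pb in zip(a, b)}
    if len(shifts) != 1:
        return False                      # assumption: one common shift for all pins
    k, rem = divmod(abs(shifts.pop()), bit_pitch)
    return rem == 0 and 1 <= k <= n_bits - 1   # condition 2

def instance_driven_clusters(nets, bit_pitch, n_bits):
    clusters = []
    for name, pins in nets.items():
        for cluster in clusters:
            # Compare against the first member of each existing cluster.
            if same_cluster(nets[cluster[0]], pins, bit_pitch, n_bits):
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

# Hypothetical coordinates: bit slice i starts at x = 10 * i.
nets = {}
shapes = [(("AB", "CD", "EF", "GH"), [(2, 50), (7, 10)]),
          (("KL", "MN", "RS", "TV"), [(4, 40), (5, 20)])]
for names, rel in shapes:
    for i, name in enumerate(names):
        bit = 3 - i
        nets[name] = [(10 * bit + dx, y) for dx, y in rel]

idc = instance_driven_clusters(nets, bit_pitch=10, n_bits=4)
```

Each new net is compared only against one representative per existing cluster, which is consistent with a quadratic worst case in the number of nets.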

We note that the worst case complexity of the clustering process (which comprises FDC followed by IDC) is quadratic in the number
of nets m in the design. However, for datapath designs that follow a
regular naming convention for similar signals, the complexity reduces
to O(m log m).
3) Cluster Merging: After running both FDC and IDC, we attempt
to merge clusters. This is useful in designs where some nets in a cluster
have regular names while others do not. We illustrate this algorithm
with the help of a slightly modified version of Fig. 2. Let us assume that net SM[3] was named ABC and net SM[2] was named DEF.
First, we invoke the FDC algorithm to get the following three net clusters:
NC1 = {LB[3], LB[2], LB[1], LB[0]}
NC2 = {SM[1], SM[0]}
NC3 = {SB[3], SB[2], SB[1], SB[0]}.
IDC identifies an additional cluster
NC4 = {ABC, DEF}.

Fig. 3. Control net-clustering.

Now, we apply our cluster merging technique. We only consider those clusters which have fewer than N member nets, where N is the number of bits in the datapath. After selecting one representative net from each net cluster, we check whether the $2 \cdot P$ conditions described in the IDC algorithm are satisfied. If they are satisfied, then we merge the two net clusters. After each merging operation, if the number of nets in a merged cluster becomes equal to N, then it is not considered as a candidate for further merging. We continue this process on all candidate clusters until no further merging is possible. In our example, clusters NC2 and NC4 get merged to form a new cluster
NCNew = {ABC, DEF, SM[1], SM[0]}.
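A minimal sketch of the merging loop is given below. The function name `merge_clusters` and the `similar` predicate are hypothetical. In a real flow `similar` would apply the IDC position conditions to two representative nets, while here a lookup table stands in for it.

```python
def merge_clusters(clusters, n_bits, similar):
    """Merge clusters with fewer than n_bits nets whose representatives match."""
    full = [list(c) for c in clusters if len(c) >= n_bits]
    candidates = [list(c) for c in clusters if len(c) < n_bits]
    merged = True
    while merged:
        merged = False
        for i in range(len(candidates)):
            for j in range(i + 1, len(candidates)):
                # Compare one representative net from each candidate cluster.
                if similar(candidates[i][0], candidates[j][0]):
                    candidates[i].extend(candidates.pop(j))
                    if len(candidates[i]) >= n_bits:
                        # A cluster that reaches N nets stops being a candidate.
                        full.append(candidates.pop(i))
                    merged = True
                    break
            if merged:
                break
    return full + candidates

# Stand-in for the IDC conditions: nets with the same tag are "similar".
shape = {"SM[1]": "sm", "SM[0]": "sm", "ABC": "sm", "DEF": "sm"}
similar = lambda a, b: shape.get(a) is not None and shape.get(a) == shape.get(b)

start = [
    ["LB[3]", "LB[2]", "LB[1]", "LB[0]"],   # NC1 (full)
    ["SM[1]", "SM[0]"],                     # NC2
    ["SB[3]", "SB[2]", "SB[1]", "SB[0]"],   # NC3 (full)
    ["ABC", "DEF"],                         # NC4
]
result = merge_clusters(start, n_bits=4, similar=similar)
```

With this input the two-member clusters NC2 and NC4 combine into one four-member cluster, as in the example above, and the full clusters are left untouched.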

4) Control-Net Clustering: In datapath blocks, there are typically some control signals which are connected to one instance in each bit slice and also to two boundary pins on the two sides of the block. The select signal in a multiplexer belongs to this category. Fig. 3 illustrates two such signals (RST[0] and SEL[1]). Since these are long single nets, they cannot be part of any net cluster described so far. In our example datapaths, we have seen that such nets constitute between 2% and 5% of all nets. We handle these nets using our control-net clustering (CNC) technique.
This technique consists of the following two steps.
1) If a net is connected to N instances (belonging to bit slices 0, 1, ..., N - 1) and two side pins, we can model that net as a combination of:
a) a net between the left boundary pin and the pin in the (N - 1)th bit slice's instance;
b) (N - 1) nets between the pins in the ith and (i + 1)th bit slices' instances (for i = 0, 1, ..., (N - 2));
c) a net between the pin in the zeroth bit slice's instance and the right boundary pin.
2) Once we split that single long net into N + 1 smaller subnets, we create a single net cluster out of those subnets.
Control-net clusters have two special properties.
1) All the members in a single net cluster actually belong to the same net.
2) Since the locations of the connecting instance pins and the two boundary pins may not be the same, the leftmost and rightmost segments of a control-net cluster may not be exactly similar to the other (N - 1) members in that net cluster.

We illustrate this algorithm by using the net RST[0] in Fig. 3. The net is connected to pin s of instances B3, B2, B1, B0 and to two side pins (U and V). We can model this long net (spread over four bit slices) as a collection of five subnets. Then, we create a net cluster out of these five segments. To handle the special properties of control-net clusters, the routing strategy is modified slightly, as described in Section III-F2.
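The splitting step of CNC amounts to chaining the pins from one boundary to the other. The sketch below uses hypothetical pin labels and assumes that U is the left boundary pin and V the right one, which the text does not state.

```python
def split_control_net(left_pin, slice_pins, right_pin):
    """slice_pins[i] is the pin in bit slice i (slice 0 is the rightmost)."""
    n = len(slice_pins)
    # Walk from the left boundary through slices N-1 ... 0 to the right boundary.
    chain = [left_pin] + [slice_pins[i] for i in range(n - 1, -1, -1)] + [right_pin]
    # Consecutive pairs form the N + 1 subnets of the control-net cluster.
    return list(zip(chain, chain[1:]))

# RST[0] of Fig. 3: pin s of B0..B3 plus the two side pins.
subnets = split_control_net("U", ["B0.s", "B1.s", "B2.s", "B3.s"], "V")
```

For the 4-bit example this gives five subnets: one from each boundary pin and three between neighbouring bit slices.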
The overall net cluster determination flow is described in Algorithm 2.
Algorithm 2: Net-Cluster Determination Flow
fpdClusters ← getFootprintDrivenClusters(allNets)
remainingNets ← getUnClusteredNets(allNets, fpdClusters)
idClusters ← getInstanceDrivenClusters(remainingNets)
dataClusters ← getMergedClusters(fpdClusters, idClusters)
controlNets ← getUnClusteredNets(allNets, dataClusters)
controlClusters ← getControlNetClusters(controlNets)
allClusters ← dataClusters ∪ controlClusters
return allClusters

E. Selecting Representative Bit Slices
After identifying all the net clusters, we route the nets. One of the powerful features of our router is that we do not explicitly perform the routing for all nets in the design. Instead, we explicitly route one member net in each net cluster and then propagate the routes to the other nets in that net cluster. Therefore, all other nets in a net cluster are routed implicitly. This results in highly regular routes and a significant reduction in router runtime. In contrast, traditional routing approaches perform explicit routing for each net in the design.
To use this routing approach, we need to select representative bit slices on which to perform explicit routing.
A net is said to have same-bit connections if all the pins connected to that net belong to a single bit slice. Such a net is called a same-bit net. On the other hand, if the pins of a net do not all belong to a single bit slice, we define the connection as a cross-bit connection and call the net a cross-bit net. If a net X has a cross-bit connection from bit slice S (source) to bit slice D (destination), we denote that connection as $\{X : S_X, D_X\}$, where $S_X \neq D_X$.
There are two types of cross-bit connections:
1) forward cross-bit connection of degree l: $\{X : S_X, D_X\}$, if $(D_X - S_X) = l$ (where $l > 0$);
2) backward cross-bit connection of degree l: $\{X : S_X, D_X\}$, if $(S_X - D_X) = l$ (where $l > 0$).
If all the member nets of a net cluster are same-bit nets, we define the net cluster as a same-bit net cluster. On the other hand, if all the member nets of a net cluster are cross-bit nets, then that net cluster is called a cross-bit net cluster.
In a typical datapath, most nets have same-bit connections. We observe that cross-bit net clusters always contain fewer than N nets.
In an N-bit datapath, we define a full net cluster as a net cluster that satisfies one of the following two properties.
1) This net cluster has N member nets (if it is a same-bit net cluster).
2) This net cluster has N - l member nets (if it is a backward or forward cross-bit net cluster of degree l).
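These definitions translate directly into code. The sketch below uses hypothetical function names. It also reflects why a full cross-bit cluster of degree l has N - l members: a forward connection of degree l that starts in slice S requires S + l <= N - 1, so only N - l slices can act as sources (the backward case is symmetric).

```python
def connection_type(src_slice, dst_slice):
    """Classify a connection by its source and destination bit slices."""
    if src_slice == dst_slice:
        return ("same-bit", 0)
    if dst_slice > src_slice:
        return ("forward", dst_slice - src_slice)
    return ("backward", src_slice - dst_slice)

def is_full(cluster_size, n_bits, degree=0):
    """A same-bit cluster (degree 0) is full with N nets, a cross-bit one with N - l."""
    return cluster_size == n_bits - degree

kind, degree = connection_type(5, 7)
```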

We first discuss the bit-slice selection strategy for the case when all
the net clusters are full and then we consider the case when some net
clusters are not full.
1) Selecting Representative Bit Slice When All Net Clusters Are Full: To determine the representative bit slice, we conceptually consider the datapath to have an infinite number of bit slices on the left of the Nth bit slice and on the right of the zeroth bit slice. In this way, each of the N bit slices has an identical number of nets. In such a structure, any of the N bit slices can be used as the representative bit slice for explicit routing.
When we perform route propagation, all routes to or from bit slices to the left of the Nth bit slice or the right of the zeroth slice are disregarded, resulting in a correctly routed datapath.
2) Selecting Representative Bit Slices When Some Net Clusters Are Not Full: If not all the net clusters are full, we need to select multiple bit slices as the representative slices. The following strategy is used in this selection process. For simplicity, we limit our discussion to same-bit net clusters, but the same strategy can be used for cross-bit net clusters also.
1) Calculate the number of member nets in each net cluster and then
find the number of nets present in each bit slice.
2) Find the bit slice with the maximum number of nets and then
select that slice as the representative bit slice. In case of a tie,
we select the bit slice with the largest net cluster (other than full
net clusters). If we encounter a tie in this comparison as well,
then any one of those slices may be selected. Let us consider two
bit slices which have the same number of nets. Let us assume that
the largest net cluster (other than full net clusters) belonging to
the first bit slice has i member nets and the largest net cluster
(other than full net clusters) belonging to the second bit slice has
j member nets. If i > j , then we select the first bit slice as the
representative one. On the other hand, if i = j , then any one
of those bit slices can be selected (in our implementation, we
choose the bit slice having higher index).
3) After obtaining the routes for nets in the selected bit slice, we
propagate these routes to other bit slices with nets in the same net
cluster (and mark the net cluster as routed). Then, we repeat steps
2 and 3 for the unrouted nonfull net clusters. When we route the
next representative slice, we do not disturb routes that have been
generated or propagated earlier. This process continues until we
mark all net clusters as routed.
We illustrate the above technique using the design shown in Fig. 4. There are four net clusters present in that design. Those are:
1) NC1 = {LB[3], LB[2], LB[1], LB[0]};
2) NC2 = {SM[3], SM[2], SM[1]};
3) NC3 = {RF[1], RF[0]};
4) NC4 = {LF[3], LF[0]}.

Notice that bit slices 3, 2, 1, and 0 have three, two, three, and three
nets, respectively. Since bit slices 3, 1, and 0 have three nets each, we
break the tie by choosing the bit slice with the largest net cluster. We
observe that bit slices 3 and 1 have one three-member net cluster and
one two-member net cluster. On the other hand, bit slice 0 has two
two-member net clusters. Thus, either bit slice 1 or bit slice 3 can be
chosen as the first representative bit slice. In our implementation, we choose bit slice 3. After obtaining the routes for nets in bit slice 3, we propagate them to other bit slices with nets in the same net cluster. Now we mark net clusters NC1, NC2, and NC4 as routed. After this, only one two-member net cluster is left unrouted in both bit slices 1 and 0. We select bit slice 1 as our next representative bit slice. Once routing and route propagation are completed, we mark NC3 as routed, after which no more net clusters remain to be routed.
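The selection strategy can be sketched as a greedy loop. The data layout (cluster name mapped to the bit slices that hold a member net) and the function name are assumptions, and the sketch covers same-bit clusters only, so "full" simply means N members. Every cluster is assumed to have at least one member in a valid slice.

```python
def select_representative_slices(clusters, n_bits):
    """clusters: dict name -> list of bit-slice indexes holding a member net."""
    unrouted = dict(clusters)
    order = []
    while unrouted:
        def rank(s):
            sizes = [len(m) for m in unrouted.values() if s in m]
            nonfull = [z for z in sizes if z < n_bits]
            # Most unrouted nets first, then largest non-full cluster,
            # then the higher slice index as the final tie-breaker.
            return (len(sizes), max(nonfull, default=0), s)
        best = max(range(n_bits), key=rank)
        order.append(best)
        # Routing `best` and propagating marks every cluster it touches as routed.
        unrouted = {c: m for c, m in unrouted.items() if best not in m}
    return order

# The four clusters of Fig. 4.
fig4 = {
    "NC1": [3, 2, 1, 0],   # LB[3..0]
    "NC2": [3, 2, 1],      # SM[3..1]
    "NC3": [1, 0],         # RF[1], RF[0]
    "NC4": [3, 0],         # LF[3], LF[0]
}
slices = select_representative_slices(fig4, n_bits=4)
```

On the Fig. 4 data the loop picks bit slice 3 first and bit slice 1 second, in agreement with the walkthrough above.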

Fig. 4. Selecting representative bit slices when all net clusters are not full.

Fig. 5. Routing in single bit slice.

F. Routing the Nets in the Selected Bit Slice
Our next task is to route the nets present in the representative bit slice. Our routing approach is a combination of pattern-based routing and maze routing. Lee et al. [7] originally proposed the maze routing algorithm. Pattern routing was introduced by Pugh et al. [8] and subsequently modified by Soukup et al. [9] and Asano [10].
We call our main approach strap-based routing. We define a strap as a straight segment, which can be either vertical or horizontal. We denote a strap from point $(x_i, y_j)$ to point $(x_k, y_j)$ as $(x_i, y_j) \to (x_k, y_j)$.
Strap-based routing is a gridless routing approach. In our router, if strap-based routing fails, a maze router is invoked as a fallback. The maze router is not gridless, but has a flexible grid size based on the route being computed.
We first discuss the routing strategy for same-bit nets and then consider cross-bit nets.
As we can see in Fig. 2, the datapath cells are placed in a row-like fashion. After placement, most of the connections are found to be confined within nearby rows. Based on this observation, we have fine-tuned the memory requirement for our router by only loading the information pertaining to the required rows of the representative bit slice (instead of loading the entire bit slice).
1) Routing Same-Bit Nets: Initially, we have the list of nets that
need to be routed. For each net, we have the list of endpoints (which
will be some pins). We first sort the nets in decreasing order of the
largest Y coordinate value of their pins. In case of a tie, we select the
longest net first, with the intuition that longer nets are expected to be
harder to route later. We also have a mechanism by which the user can
assign precedence to a particular net by assigning a large weight to that
net. Algorithm 3 describes our net-ordering scheme.

Algorithm 3: Net-Ordering Process
if (special weights present) then
  WtNets ← findWeightedNets(allNets)
  SrtWtNets ← SortNetsByWeightDescend(WtNets)
  OtherNets ← allNets - WtNets
else
  OtherNets ← allNets
end if
SrtOtherNets ← SortByPinYDescend(OtherNets)
if (same Y for multiple nets among SrtOtherNets) then
  SameYNets ← GetSameYNets(SrtOtherNets)
  ConflictNets ← SortByLength(SameYNets)
  SrtOtherNets ← ModifyNets(ConflictNets, SrtOtherNets)
end if
SrtAllNets ← SrtWtNets ∪ SrtOtherNets
Return SrtAllNets
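The ordering policy can be condensed into two sorts. In the sketch below the net representation (name mapped to a list of (x, y) pins) and the function name are hypothetical. Net length is not defined in the text, so the half-perimeter of the pin bounding box is used as a stand-in.

```python
def order_nets(nets, weights=None):
    """nets: dict name -> list of (x, y) pins; weights: optional dict name -> weight."""
    weights = weights or {}

    def top_y(name):
        return max(y for _, y in nets[name])

    def length(name):
        # Assumption: half-perimeter of the bounding box approximates net length.
        xs = [x for x, _ in nets[name]]
        ys = [y for _, y in nets[name]]
        return (max(xs) - min(xs)) + (max(ys) - min(ys))

    # User-weighted nets go first, heaviest weight first.
    weighted = sorted((n for n in nets if n in weights), key=lambda n: -weights[n])
    # Remaining nets: largest pin Y first, ties broken by longest net first.
    others = sorted((n for n in nets if n not in weights),
                    key=lambda n: (-top_y(n), -length(n)))
    return weighted + others

example = {
    "n1": [(0, 10), (5, 2)],
    "n2": [(1, 10), (9, 0)],
    "n3": [(0, 6), (0, 4)],
    "clk": [(3, 1), (4, 1)],
}
ordered = order_nets(example, weights={"clk": 100})
```

Here `clk` is routed first because of its weight, and `n2` precedes `n1` because both reach Y = 10 but `n2` is longer.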

If two end points of a net are $(x_1, y_1)$ and $(x_2, y_2)$, then we define a direct route as a path which has one of the following strap patterns.
Case 1) If $(x_1 = x_2)$ AND $(y_1 \neq y_2)$, only strap (vertical) $(x_1, y_1) \to (x_1, y_2)$.
Case 2) If $(x_1 \neq x_2)$ AND $(y_1 = y_2)$, only strap (horizontal) $(x_1, y_1) \to (x_2, y_1)$.
Case 3) If $(x_1 \neq x_2)$ AND $(y_1 \neq y_2)$:
a) vertical-then-horizontal (VTH): first strap (vertical) $(x_1, y_1) \to (x_1, y_2)$, second strap (horizontal) $(x_1, y_2) \to (x_2, y_2)$.
b) horizontal-then-vertical (HTV): first strap (horizontal) $(x_1, y_1) \to (x_2, y_1)$, second strap (vertical) $(x_2, y_1) \to (x_2, y_2)$.
We illustrate our routing algorithm using the representative bit slice of Fig. 5. Case 1 is illustrated between pins D and C. Case 2 is shown between pins E and F (this case occurs only for cross-bit nets). Case 3 is shown between pins A and B. Examples of case 3 are shown in path P5 (HTV) and path P6 (VTH).
After sorting all nets with respect to the largest Y coordinate of their pins, we note that the topmost two pins are pins V and S. We select the net associated with pin V as our first routing candidate, since it would have a longer vertical strap (assuming we can find a direct route for the net associated with pin S). We first try to find a direct route (this minimizes the via count). We attempt both VTH and HTV direct routes and check whether either of these routes intersects any other pin/blockage. In this example, path P2 (VTH direct route) intersects pin G. So we choose path P1 as the route between pins V and U. If there were no blockage in path P2, then we could have taken either of those two paths as the final route.
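The direct-route cases and the blockage check can be sketched as follows. For simplicity, blockages are modeled as zero-width points that exclude the net's own pins. That modeling, the coordinates, and the function names are assumptions for illustration only.

```python
def direct_routes(p1, p2):
    """Return candidate direct routes, each a list of straps (start, end)."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and y1 != y2:      # Case 1: single vertical strap
        return [[(p1, p2)]]
    if y1 == y2 and x1 != x2:      # Case 2: single horizontal strap
        return [[(p1, p2)]]
    if x1 != x2 and y1 != y2:      # Case 3: VTH and HTV candidates
        vth = [((x1, y1), (x1, y2)), ((x1, y2), (x2, y2))]
        htv = [((x1, y1), (x2, y1)), ((x2, y1), (x2, y2))]
        return [vth, htv]
    return []                      # coincident end points

def strap_hits(strap, blockages):
    # A strap is axis-aligned, so a bounding-box test is exact for point blockages.
    (xa, ya), (xb, yb) = strap
    lo_x, hi_x = sorted((xa, xb))
    lo_y, hi_y = sorted((ya, yb))
    return any(lo_x <= bx <= hi_x and lo_y <= by <= hi_y for bx, by in blockages)

def first_legal_direct_route(p1, p2, blockages):
    for route in direct_routes(p1, p2):
        if not any(strap_hits(s, blockages) for s in route):
            return route
    return None   # caller would fall back to three-strap routes

# A blockage at (0, 4) sits on the vertical strap of the VTH candidate.
route = first_legal_direct_route((0, 10), (6, 0), [(0, 4)])
```

In this toy case the VTH candidate is rejected and the HTV candidate is returned. When both candidates are blocked the function returns `None`, which is the point at which a multi-strap search would take over.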


Another scenario that often occurs is that both the VTH and HTV direct routes intersect some pin/blockage. In Fig. 5, we note that the connection between pins S and T illustrates this case. Pins J and H block both the direct paths (P3 and P4, respectively). Therefore, we try to form a three-strap path from both the direct paths. To denote the coordinates of any pin, we use the following convention: pin A is located at $(x_A, y_A)$, pin B is at $(x_B, y_B)$, and so on. To tackle the problem of forming a three-strap path from two direct paths (when we have only two pin blockages), we do the following.
1) If $(y_H < y_J)$ and there exist two legal straps $(x_S, y_S) \to (x_H, y_H)$ and $(x_J, y_J) \to (x_T, y_T)$, then we check if there exists a legal strap from $(x_H, y_H) \to (x_J, y_H)$. If it exists, we are done. Otherwise, we search for the existence of a strap $(x_H, y_3) \to (x_J, y_3)$, for $y_H \le y_3 \le y_J$. If such a strap is not found, we search for five-strap routes.
2) If $(y_H > y_J)$ and there exist two legal straps $(x_S, y_S) \to (x_J, y_S)$ and $(x_T, y_T) \to (x_H, y_T)$, then we try to find a vertical strap $(x_3, y_T) \to (x_3, y_S)$, where $x_T \le x_3 \le x_S$. If we are successful, then we get a three-strap route. Otherwise, we extend the bounding box of our vertical strap finder to $(x_T - \text{width}) < x_3 < (x_T + 2 \cdot \text{width})$, where $\text{width} = x_S - x_T$.
Both the above cases are based on the assumption that the vertical straps of direct routes exhibit an intersection with some pin/blockage. Such conflicts can also occur for horizontal straps (or for both vertical and horizontal straps). These problems are handled using a technique analogous to that described above.
In all cases, we ensure that there is no minimum spacing rule violation between two straps placed next to each other.
Once we are done routing the nets originating from the topmost row (i.e., the nets between V-U, S-T, and A-B), we route the nets originating from the next row from the top. So we next select the net connecting pin L and pin M. By routing nets one row at a time, we minimize the possibility of conflict with routes in lower rows. Unfortunately, we cannot guarantee that there will be no conflicts. To handle conflicts between an existing route and a new route, we sometimes need to rip up existing routes. We minimize the ripup effort by allowing only one degree of ripup (in other words, in trying to fix a problem for a particular route, we rip up only those nets that directly block its path). When this ripped-up net is rerouted, it preferentially uses routing regions that have been utilized already in order to minimize the possibility of ripup/reroute being invoked on this net again.
Our strap-based router is gridless. As a result, it can handle variable width and spacing requirements. By default, all route widths and spacings are minimum. The user may specify the desired width of any net via a control file. In this case, the router attempts to find a strap with the desired width. Similarly, the user may specify the desired spacing for any net as well.
As a fallback mechanism when we cannot find a strap-based route, we have implemented a maze routing algorithm. On the designs we have encountered so far, this fallback routine has never been invoked.
2) Routing Cross-Bit Nets: Consider a design in which the maximum degree of forward or backward cross connectivity is k. One method to route this design is to route k + 1 consecutive bit slices as a single unit. Such an approach would result in longer runtimes and may generate irregular routes for different nets in a net cluster. In our method, we route a single representative bit slice and infer cross-bit connections while performing the route.
In our approach, we model a single net, spread over multiple bit slices, as a combination of multiple subnets, each confined within a single bit slice. As a result, only one such subnet explicitly belongs to the representative bit slice. Other subnets belong to other bit slices. We virtually instantiate the subnets of other bit slices into the representative bit slice. Virtual instantiation of a subnet implies treating the subnet as a part of the representative bit slice while routing. After routing, we reinstate the actual subnet route back to its original bit slice. This method of modeling cross-bit nets saves runtime and memory and ensures regularity of the resulting design.

Fig. 6. Routing of cross-bit nets.
Fig. 6 shows four bit slices of a larger design with a forward cross
connectivity of degree two. Let us assume that our representative bit
slice is bit slice k. Net 1 connects pin S of instance A5 (in bit slice
k) to pin X of instance D7 (in bit slice k + 2). We assume that Net 1
belongs to a net cluster that has other member nets as well. To keep
Fig. 6 readable, only one other net (Net 2) of this net
cluster is shown.
Our aim is to route all cross-bit nets with only the data of the representative bit slice (bit slice k) loaded in memory. The core algorithm
for finding routes is the same as that for same-bit nets, with a few modifications to handle cross-bit nets. To obtain the route for Net 1, we
actually need to traverse three bit slices (because the degree
of cross-bit connectivity is two). Therefore, whenever we try to extend
a horizontal strap to some location in the adjacent bit slice, we split the strap
into two straps such that each strap is confined to a single bit slice. In
the case of Net 1, we first obtain a strap from point S (in instance A5)
to point G (in A5). Next, we attempt to create a horizontal strap from
point G (in A5) to point J (in A6). To illustrate the mechanism
by which we model this strap using a single representative bit slice, we
perform the following split:
$$(x_G, y_G) \to (x_J, y_J) = (x_G, y_G) \to (x_H, y_H) + (x_H, y_H) \to (x_J, y_J)$$

where H is a point on the bit-slice boundary and $y_H = y_G = y_J$.
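
The split above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `Strap` tuple and the `split_at_boundaries` helper are hypothetical names, and bit-slice boundaries are assumed to lie at integer multiples of a fixed `bit_pitch`.

```python
from typing import List, NamedTuple


class Strap(NamedTuple):
    """A horizontal strap from (x0, y) to (x1, y); hypothetical representation."""
    x0: float
    x1: float
    y: float


def split_at_boundaries(strap: Strap, bit_pitch: float) -> List[Strap]:
    """Split a horizontal strap so that each piece stays inside one bit slice.

    Assumption: bit-slice boundaries lie at x = m * bit_pitch for integer m.
    """
    lo, hi = sorted((strap.x0, strap.x1))
    # Index of the first boundary strictly to the right of lo.
    m = int(lo // bit_pitch) + 1
    cuts = [lo]
    while m * bit_pitch < hi:
        cuts.append(m * bit_pitch)  # this is a point H on a slice boundary
        m += 1
    cuts.append(hi)
    pieces = [Strap(a, b, strap.y) for a, b in zip(cuts, cuts[1:])]
    # Preserve the original direction of travel for right-to-left straps.
    if strap.x0 > strap.x1:
        pieces = [Strap(p.x1, p.x0, p.y) for p in reversed(pieces)]
    return pieces


# G at x = 30 and J at x = 130 with a bit pitch of 100: H falls at x = 100.
pieces = split_at_boundaries(Strap(30.0, 130.0, 5.0), 100.0)
```

Each returned piece shares the y coordinate of the original strap, which mirrors the condition $y_H = y_G = y_J$.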


Now, we virtually instantiate the pins of other bit slices into the representative bit slice. Fig. 7 shows the representative bit slice for the
datapath of Fig. 6 with all virtual instantiations performed. First, we
virtually instantiate the point J within the representative bit slice k by
incrementing the X coordinate of point J by a bit pitch.
In Fig. 7, we denote the virtual point as $J'$. Similarly, we virtually instantiate the point H as the point $H'$ in bit slice k. Once
these points have been instantiated, we instantiate the $(x_H, y_H) \to (x_J, y_J)$ strap of bit slice k + 1 within the kth bit slice as a strap
$(x'_H, y'_H) \to (x'_J, y'_J)$. Depending on the location of points J and G,
we may get an overlap between the straps $(x_G, y_G) \to (x_H, y_H)$ and
$(x'_H, y'_H) \to (x'_J, y'_J)$. This virtual overlap is not a problem because
the overlapping straps belong to the same net. After routing, the virtual
straps are reinstated to their actual bit slices, which removes the overlap.
Once we reach the virtual destination pin, we simply
perform the reverse mapping of all virtual locations to their original locations to get the legal straps. We utilize the same virtual instantiation
approach for vertical straps such as $(x_J, y_J) \to (x_P, y_P)$.

Fig. 7. Representative bit slice with virtual instantiations of subnets.

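
The forward and reverse mappings used for virtual instantiation amount to a translation along x. The following Python sketch is an assumption-laden illustration rather than the paper's code: `virtualize` and `reinstate` are hypothetical names, and the sign follows the text's convention that a point from bit slice k + 1 is brought into slice k by adding one bit pitch to its x coordinate.

```python
from typing import Tuple

Point = Tuple[float, float]


def virtualize(point: Point, slice_offset: int, bit_pitch: float) -> Point:
    """Map a point of bit slice k + slice_offset into representative slice k."""
    x, y = point
    return (x + slice_offset * bit_pitch, y)


def reinstate(virtual_point: Point, slice_offset: int, bit_pitch: float) -> Point:
    """Reverse mapping: return a virtual point to its original bit slice."""
    x, y = virtual_point
    return (x - slice_offset * bit_pitch, y)


# A point J in slice k + 1 and its virtual image J' in slice k.
j = (-70.0, 5.0)
j_virtual = virtualize(j, 1, 100.0)
```

Because only x changes, the same two helpers apply unchanged to the end points of vertical straps.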
If we follow the above approach, then we may get overlaps in the
horizontal straps of cross-bit nets belonging to the same net cluster. To
avoid this problem, we use the following approach: in any net cluster,
if the horizontal span between the source and destination pins is p bit slices, then
we generate p different routes for that net cluster.
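
How a net is matched to one of the p routes can be sketched as follows, using the modulus that the cross-bit route propagation algorithm (Algorithm 5) computes as ModValue. The function name `route_index` is hypothetical, and the sketch relies on Python's `%` operator returning a nonnegative result for a positive modulus, so no offset is added to make the bit difference positive; that is a choice of this sketch, not something the paper states.

```python
def route_index(source_bit: int, representative_bit: int, span: int) -> int:
    """Pick which of the `span` master routes a sister net should reuse."""
    if span <= 0:
        raise ValueError("span must be a positive number of bit slices")
    return (source_bit - representative_bit) % span


# With representative bit slice k = 3 and span p = 2, nets whose source bits
# are two slices apart share a route index, and neighbors alternate.
indices = {bit: route_index(bit, 3, 2) for bit in range(8)}
```

Under this assignment, two nets of the same cluster reuse the same master route only when their source bits differ by a multiple of p.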
Control nets are also routed using this cross-bit routing strategy.
Fig. 3 illustrates control nets that span the entire datapath block. Control-net clusters for such nets are created using the CNC technique of
Section III-D2. As a result of applying the CNC technique, (N − 1)
intermediate cross-bit nets (with cross-connectivity of degree one) are
created in a special net cluster. The destination pins and the source pins
of these nets have the same y coordinate and their x coordinates differ
by the bit pitch. We route one of these segments (in the representative bit slice) by using the cross-bit routing technique. Since all these
(N − 1) segments belong to the same net, we do not need to generate
multiple routes to guarantee an electrically correct design.
3) Routing Multipin Nets: Multipin nets are defined as nets that
are connected to more than two pins. In typical datapath blocks, about
30% to 40% of nets are multipin nets. One approach to routing a k-pin net is
to split the net into (k − 1) two-pin subnets (using a minimum spanning
tree) and then route each subnet individually. The problem with this
technique is that the routes are often nonoptimal.
To address this issue, we utilize a shifted-pin approach. This
technique can be used for both same-bit and cross-bit nets. Consider a k-pin net, which is connected to pins present at
$(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_k, y_k)$, where $y_1 \ge y_2 \ge y_3 \ge \cdots \ge y_k$.
In the first step, we split this net into (k − 1) two-pin subnets
$S_1, S_2, \ldots, S_{k-1}$, where $S_i$ is connected to pins located at $(x_i, y_i)$
and $(x_{i+1}, y_{i+1})$. Since our strap-based router starts routing from the
topmost row, the subnet $S_1$ is routed first. After routing this subnet,
the router considers $S_2$, which is the subnet between the pins present
at $(x_2, y_2)$ and $(x_3, y_3)$. Instead of treating the pin located at $(x_2, y_2)$
as a fixed pin, we virtually shift it to $(\bar{x}_3, \bar{y}_3)$, such that:
1) $(\bar{x}_3, \bar{y}_3)$ is a point on the route for $S_1$;
2) the Manhattan distance between $(\bar{x}_3, \bar{y}_3)$ and $(x_3, y_3)$ is the minimum over all the points on the route for $S_1$.
After obtaining the new pin location $(\bar{x}_3, \bar{y}_3)$, we perform strap-based routing between this new pin and the pin at $(x_3, y_3)$. This strategy
is applied for all the (k − 1) subnets. This technique is quite useful
because, instead of routing pin to pin, it essentially utilizes the already
routed portion of the net.

Fig. 8. Routing multipin nets. (a) Example of routing without the shifted-pin approach. (b) Routing with the shifted-pin approach for example (a). (c) Example of routing without the shifted-pin approach. (d) Routing with the shifted-pin approach for example (c).

To quantify the improvement due to the shifted-pin strategy, we compared it with the alternative approach mentioned above. We observed
that the wire-length of multipin nets was 6% to 9% lower (without any
significant change in two-pin net wire-length) with the shifted-pin approach.
In Fig. 8, we illustrate the routing of two three-pin nets using both
approaches referred to in this section. In Fig. 8(a) and (c), routing is
performed without the shifted-pin approach. Fig. 8(b) and (d) show
the corresponding routes with the shifted-pin technique. In
both examples, the routes produced by our approach are better.
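
The pin-shifting step of this section can be sketched in Python as a nearest-point search over an already routed subnet. This is a minimal illustration under assumptions, not the authors' router: a route is taken to be a list of axis-parallel segments, and `shifted_pin` and `clamp` are hypothetical helper names.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]  # axis-parallel strap; hypothetical representation


def clamp(value: float, lo: float, hi: float) -> float:
    """Restrict value to the closed interval [lo, hi]."""
    return max(lo, min(value, hi))


def shifted_pin(route: List[Segment], target: Point) -> Point:
    """Return the point on `route` closest to `target` in Manhattan distance."""
    best, best_dist = None, float("inf")
    for (x0, y0), (x1, y1) in route:
        # For an axis-parallel segment, clamping each coordinate separately
        # yields the nearest point under the Manhattan metric.
        cx = clamp(target[0], min(x0, x1), max(x0, x1))
        cy = clamp(target[1], min(y0, y1), max(y0, y1))
        dist = abs(target[0] - cx) + abs(target[1] - cy)
        if dist < best_dist:
            best, best_dist = (cx, cy), dist
    if best is None:
        raise ValueError("route has no segments")
    return best


# Route of the first subnet: from pin 1 at (0, 10) across to (8, 10), then
# down to pin 2 at (8, 6). Pin 3 sits at (3, 2).
s1_route = [((0, 10), (8, 10)), ((8, 10), (8, 6))]
new_pin = shifted_pin(s1_route, (3, 2))
```

In this made-up example the second subnet starts from a point on the horizontal strap of the first subnet, at Manhattan distance 8 from pin 3, instead of from pin 2, which lies at distance 9.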
G. Propagating the Routes and Completing
Having performed the routing of the representative bit slice, we propagate the virtual routes to other bit slices. For this, we use the previously
formed net cluster information. Algorithm 4 describes our route propagation scheme for same-bit connections.
Algorithm 4: Route Propagation (for Same-Bit nets)
AllNets ← getAllNets(k)
for each net (MasterNet) in AllNets do
    MasterRoute ← getRouteForNet(MasterNet)
    NetCluster ← getNetClusterForNet(MasterNet)
    OtherSisterNets ← getSisterNets(NetCluster, MasterNet)
    for each net (SisterNet) in OtherSisterNets do
        SisterRoute ← ModifyRoute(MasterRoute, SisterNet, MasterNet)
        AssignRoute(SisterNet, SisterRoute)
    end for
end for
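
As a rough illustration of Algorithm 4, the Python sketch below propagates master routes to sister nets. Everything in it is an assumption made for illustration: routes are lists of (x, y) corner points, a net cluster is a dictionary from bit index to net name, and `modify_route` applies only the x translation by (MasterNetBit − SisterNetBit) bit pitches that the text gives for ModifyRoute(). It is not the authors' C++ implementation.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Route = List[Point]             # hypothetical: ordered corner points of a route
NetCluster = Dict[int, str]     # hypothetical: bit index -> net name


def modify_route(master_route: Route, master_bit: int, sister_bit: int,
                 bit_pitch: float) -> Route:
    """Translate the master route in x; y coordinates are left unchanged."""
    dx = (master_bit - sister_bit) * bit_pitch
    return [(x + dx, y) for x, y in master_route]


def propagate_same_bit(master_routes: Dict[str, Route],
                       clusters: List[NetCluster],
                       k: int, bit_pitch: float) -> Dict[str, Route]:
    """Derive routes for every sister net from the routes of bit slice k."""
    routes = dict(master_routes)
    for cluster in clusters:
        master_net = cluster.get(k)
        if master_net is None or master_net not in master_routes:
            continue  # no routed master for this cluster in slice k
        for bit, sister_net in cluster.items():
            if bit != k:
                routes[sister_net] = modify_route(
                    master_routes[master_net], k, bit, bit_pitch)
    return routes


# One cluster spanning three bit slices; slice 1 is the representative.
routes = propagate_same_bit(
    {"n1": [(10.0, 5.0), (10.0, 20.0)]},
    [{0: "n0", 1: "n1", 2: "n2"}],
    k=1, bit_pitch=100.0)
```

Only the master net of each cluster is routed explicitly; every other route is a translated copy, which is where the regularity across bit slices comes from.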

Algorithm 5: Route Propagation (for Cross-Bit nets)
AllCrossBitNets ← getAllCrossBitNetsWithSourceInBit(k)
for each net (MasterNet) in AllCrossBitNets do
    MasterRoutes ← getMasterRoutes(MasterNet, p)    /* MasterRoutes is an array of p routes */
    NetCluster ← getNetClusterForNet(MasterNet)
    OtherSisterNets ← getSisterNets(NetCluster, MasterNet)
    for each net (SisterNet) in OtherSisterNets do
        NewSourceBit ← getSourceBit(SisterNet)
        PositiveBitDiff ← NewSourceBit − k + N
        ModValue ← (PositiveBitDiff modulus p)
        SisterRoute ← ModifyRoute(MasterRoutes[ModValue], SisterNet, MasterNet)
        AssignRoute(SisterNet, SisterRoute)
    end for
end for

Algorithm 4 is summarized as follows. We first get all the nets in
representative bit slice k. Now, for each master net, we find the other
(sister) nets in the corresponding net cluster. Finally, by modifying the
route of the master net, we create distinct routes for each sister net.
Within the ModifyRoute() function, we create the SisterRoute by doing
the following calculations:
1) Y co-ord of SisterRoute pins = Y co-ord of MasterRoute pins;
2) X co-ord of SisterRoute pins = X co-ord of MasterRoute pins + (MasterNetBit − SisterNetBit) · Bit-Pitch.
Once we complete route propagation for all the nets present in the
representative bit slice, we obtain a design-rule correct route for all the
bit slices and the routing task is completed.
Route propagation changes slightly for cross-bit connections. As
mentioned in the previous section, if the horizontal span between the
source and destination pins of a cross-bit net is p bit slices, then we
construct p different routes for that net. Now, we need to propagate the
correct route to individual bit slices. Algorithm 5 describes our technique for route propagation of cross-bit nets.
For control nets, we propagate the routes obtained in the representative bit slice by using the cross-bit route propagation technique. Also,
since the rightmost and the leftmost segments in a control-net cluster
are topologically different, we perform explicit routing for these two
segments. In Fig. 9, we have shown the routing results for the two control-net clusters. The figure assumes that the horizontal route between
the s pins of net RST[0] of adjacent bit slices is blocked.

Fig. 9. Routing of nets in control-net clusters.

At this stage, we check whether there are any unrouted nets present in
the design. If some nets have not been routed, we invoke the strap-based
routing scheme to route these nets.

TABLE I
CHARACTERISTICS OF EXAMPLE CIRCUITS

IV. ADVANTAGES OF OUR APPROACH
Our routing approach has several advantages over traditional routing
schemes utilized in the datapath context. Some of these are the following.
1) Speed of Routing: By exploiting datapath regularity, our routing
technique is able to route an entire datapath while explicitly
routing only a small subset of the nets. This approach makes
it possible to route large industrial datapaths with significantly
shorter runtimes compared to a traditional router.
2) Predictable Routes: The routes obtained by our router are highly
regular across bit slices. As a result, the wiring parasitics for different nets in a net cluster are very similar, resulting in a predictable design.
3) Better Debuggability and Timing: If a datapath, routed using conventional routers, does not meet timing requirements, the designer usually spends a significant amount of time trying to find
the badly routed nets and then re-routes those nets. This can
cause a ripple effect, making some other nets critical. In our flow,
all nets in a net cluster have substantially similar delays. This
allows better and more predictable timing characteristics of the
design and eases the debugging task.
4) Easy Incremental Routing: Before a design is taped out, several iterations of routing and timing checks are performed. In
these iterations, designers often modify the design slightly. In
such a scenario, the router needs to perform efficient incremental
routing. Because of the inherent speed and the regular nature of
our router, it is well suited to incremental routing.
V. EXPERIMENTAL RESULTS
We implemented our router in the C++ programming language. The
code for our datapath router consists of about 7000 lines of C++. Experiments were run on a 440-MHz Hewlett-Packard (HP) machine with
512-MB memory, running the HP-UX 10.20 operating system.
To compare our results, we used datapath blocks from state-of-the-art 32-bit or 64-bit microprocessors. Table I describes the characteristics of our benchmark circuits. We compared our algorithm against
a commercially available router. In Table II, we report runtime, wire-length, and via-count results from both our router and the industrial router.
On average, our router is about six times faster for 32-bit datapaths
and eight times faster for 64-bit designs. This is expected since we only
route a single representative bit slice in our approach. Once our current
prototype router is optimized for speed, we expect our runtime numbers
to improve further.
We note that the average wire-length gain of our method is minimal.
Finally, we notice that our technique reduces the total number of vias by
about 8%. We conjecture that our router utilizes fewer vias because
of its strap-based nature.

TABLE II
RUNTIME, WIRE-LENGTH, AND VIA-COUNT COMPARISON BETWEEN AN INDUSTRIAL ROUTER AND OUR ROUTER

VI. CONCLUSION
In this paper, we have presented a new method for performing
detailed routing for datapath designs, which fully utilizes the regular
structures present in a datapath. In our technique, we first extract
interconnection regularity within the datapath by creating net clusters. Next, we route the nets in a single representative bit slice of the
datapath and, from the routes thus obtained, infer routes for the
rest of the nets in the corresponding net cluster. Experimental results
demonstrate a significant improvement in runtime over a commercial
router. Also, our router produces highly predictable timing results.
REFERENCES
[1] N. Buddi, M. Chrzanowska-Jeske, and C. Saxe, Layout synthesis for
datapath designs, in Proc. Eur. Design Automation Conf., Sept. 1995,
pp. 8690.

[2] T. Ye and G. D. Micheli, Datapath placement with regularity, in Proc.


IEEE/ACM Int. Conf. Computer-Aided Design, Nov. 2000, pp. 264270.
[3] J. King and S. M. Kang, A timing-driven datapath layout synthesis with
integer programming, in Proc. IEEE/ACM Int. Conf. Computer-Aided
Design, Nov. 1995, pp. 716719.
[4] S. Raman, S. Sapatnekar, and C. Alpert, Datapath routing based on a
decongestion metric, in Proc. ACM Int. Symp. Physical Design, Apr.
2000, pp. 122127.
[5] S. Arikati and R. Varadarajan, A signature based approach to regularity extraction, in Proc. IEEE/ACM Int. Conf. Computer-Aided Design, Nov. 1997, pp. 542545.
[6] S. Hassoun and C. McCreary, Regularity extraction via clan-based
structural circuit decomposition, in Proc. IEEE/ACM Int. Conf.
Computer-Aided Design, Nov. 1999, pp. 414418.
[7] C. Lee, An algorithm for path connections and its applications, IRE
Trans. Electron. Comput., vol. EC-10, pp. 346365, 1961.
[8] L. Pugh, An improvement in printed circuit board routability using a
maze-running algorithm, Electron. Lett., vol. 14, no. 1, Jan. 1991.
[9] J. Soukup and S. Fournier, Pattern router, in Proc. Int. Symp. Circuits
and Systems, 1979, pp. 486489.
[10] T. Asano, Parametric pattern router, in Proc. ACM/IEEE 19th Design
Automation Conf., 1982, pp. 411417.
