
placement and routing

The document discusses the process of placement in circuit design, focusing on arranging components on a layout surface while considering constraints such as wirelength and routability. It outlines placement goals, objectives, and algorithms, including min-cut and eigenvalue methods, to optimize the arrangement of logic cells. Additionally, it addresses the importance of minimizing interconnect congestion and delay in achieving efficient circuit layouts.

Uploaded by

22256

Placement

Content
⚫ Placement Definitions
⚫ Placement Goals and Objectives
⚫ Measurement of placement Goals and Objectives
⚫ Placement Algorithms
⚫ Simple placement Example
⚫ Physical Design Flow

Placement
⚫ The process of arranging circuit components on a layout surface under certain constraints.
⚫ Inputs: set of fixed modules, netlist
⚫ Output: best position for each module based on various cost functions
⚫ Cost functions include wirelength, wire routability, hotspots, performance, and I/O pads.
⚫ Placement is much more suited to automation than floorplanning.
⚫ After we complete floorplanning and placement, we can predict both intrablock and interblock capacitances.
Good placement vs Bad placement

⚫ Good placement: no congestion, shorter wires, fewer metal levels, smaller delay, lower power dissipation
⚫ Bad placement: congestion, longer wire lengths, more metal levels, longer delay, higher power dissipation
Placement Terms and Definitions
⚫ CBIC, MGA, and FPGA architectures all have rows of logic cells separated by the interconnect; these are row-based ASICs.

• FIGURE 16.18 INTERCONNECT STRUCTURE. (a) The two-level metal CBIC floorplan shown in Figure 16.11 b. (b) A channel from the flexible block A. This channel has a channel height equal to the maximum channel density of 7 (there is room for seven interconnects to run horizontally in m1). (c) A channel that uses OTC (over-the-cell)
• FIGURE 16.19 GATE-ARRAY INTERCONNECT. (a) A small two-level metal gate array (about 4.6 k-gate). (b) Routing in a block. (c) Channel routing showing channel density and channel capacity. The channel height on a gate array may only be increased in increments of a row. If the interconnect does not use up all of the channel, the rest of the space is wasted. The interconnect in the channel runs in m1 in the horizontal direction with m2 in the vertical direction.
Vertical interconnect uses feedthroughs to cross the logic cells. Here are some
commonly used terms with explanations (there are no generally accepted
definitions):

⚫ An unused vertical track (or just track) in a logic cell is called an uncommitted feedthrough (also built-in feedthrough, implicit feedthrough, or jumper).
⚫ A vertical strip of metal that runs from the top to bottom of a cell (for double-entry cells), but has no connections inside the cell, is also called a feedthrough or jumper.
⚫ Two connectors for the same physical net are electrically equivalent connectors (or equipotential connectors). For double-entry cells these are usually at the top and bottom of the logic cell.
⚫ A dedicated feedthrough cell (or crosser cell) is an empty cell (with no logic) that can hold one or more vertical interconnects. These are used if there are no other feedthroughs available.
⚫ A feedthrough pin or feedthrough terminal is an input or output that has connections at both the top and bottom of the standard cell.
⚫ A spacer cell (usually the same as a feedthrough cell) is used to fill space in rows so that the ends of all rows in a flexible block may be aligned to connect to power buses, for example.
⚫ There are also logically equivalent connectors (or functionally equivalent connectors, sometimes also just called equivalent connectors, which is very confusing).
⚫ Example: The two inputs of a two-input NAND gate may be logically equivalent connectors. The placement tool can swap these without altering the logic (but the two inputs may have different delay properties, so it is not always a good idea to swap them).

⚫ There can also be logically equivalent connector groups. For example, in an OAI22 (OR-AND-INVERT) gate there are four inputs: A1, A2 are inputs to one OR gate (gate A), and B1, B2 are inputs to the second OR gate (gate B). Then group A = (A1, A2) is logically equivalent to group B = (B1, B2): if we swap one input (A1 or A2) from gate A to gate B, we must swap the other input in the group (A2 or A1).

Interconnect Area for CBIC, MGA, and FPGA
HORIZONTAL INTERCONNECT

⚫ In the case of channeled gate arrays and FPGAs, the horizontal interconnect
areas—the channels, usually on m1—have a fixed capacity.

⚫ The channel capacity of CBICs and channelless MGAs can be expanded to


hold as many interconnects as are needed. Normally we choose, as an objective,
to minimize the number of interconnects that use each channel.

VERTICAL INTERCONNECT

⚫ In the vertical interconnect direction, usually m2, FPGAs still have fixed
resources.

⚫ In contrast, the placement tool can always add vertical feedthroughs to a channeled MGA, channelless MGA, or CBIC. These problems become less important as we move to three and more levels of interconnect.

Placement Goals and Objectives
The goal of a placement tool is to arrange all the logic cells within the flexible
blocks on a chip.
Ideally, the objectives of the placement step are to
⚫ Guarantee the router can complete the routing step
⚫ Minimize all the critical net delays
⚫ Make the chip as dense as possible

We may also have the following additional objectives:


⚫ Minimize power dissipation
⚫ Minimize cross talk between signals

Current placement tools use more specific and achievable criteria. The most
commonly used placement objectives are one or more of the following:
⚫ Minimize the total estimated interconnect length
⚫ Meet the timing requirements for critical nets
⚫ Minimize the interconnect congestion

Measurement of Placement Goals and Objectives

⚫ The graph structures that correspond to making all the connections for a net
are known as trees on graphs (or just trees ).

⚫ Special classes of trees, Steiner trees, minimize the total length of interconnect and they are central to ASIC routing algorithms.

⚫ Minimum Steiner tree: this type of tree uses diagonal connections. We want to solve a restricted version of this problem, using interconnects on a rectangular grid. This is called rectilinear routing or Manhattan routing.

⚫ Euclidean distance between two points is the straight-line distance.

⚫ The Manhattan distance (or rectangular distance) between two points is the
distance we would have to walk in New York.
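
The two distance measures can be compared with a short Python sketch. The function names and the sample points are illustrative assumptions, not part of the original slides.

```python
import math

def euclidean(p, q):
    """Straight-line distance between points p and q."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def manhattan(p, q):
    """Rectilinear distance: horizontal run plus vertical run."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

a, b = (0, 0), (3, 4)          # hypothetical terminal locations
print(euclidean(a, b))         # 5.0
print(manhattan(a, b))         # 7
```

The Manhattan distance is never shorter than the Euclidean distance, which is why rectilinear routing estimates are the ones a grid-based router can actually achieve.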

• FIGURE 16.20 Placement using trees on graphs. (a) The floorplan from Figure 16.11 b. (b) An expanded view of the flexible block A showing four rows of standard cells for placement (typical blocks may contain thousands or tens of thousands of logic cells). We want to find the length of the net shown with four terminals, W through Z, given the placement of four logic cells (labeled: A.211, A.19, A.43, A.25). (c) The problem for net (W, X, Y, Z) drawn as a graph. The shortest connection is the minimum Steiner tree. (d) The minimum
Measurement of Placement (contd.,)
⚫ The minimum rectilinear Steiner tree (MRST) is the shortest interconnect using a rectangular grid. The determination of the MRST is in general an NP-complete problem, which means it is hard to solve.

⚫ The complete graph has connections from each terminal to every other terminal. For n terminals it has n(n – 1)/2 connections.

⚫ The complete-graph measure adds all the interconnect lengths of the complete-graph connection together and then divides by n/2, where n is the number of terminals. The divisor n/2 is the ratio of the n(n – 1)/2 connections in the complete graph to the (n – 1) connections needed for a tree.

⚫ The bounding box is the smallest rectangle that encloses all the terminals.

⚫ The half-perimeter measure (or bounding-box measure) is one-half the perimeter of the bounding box. The corresponding cost function is

$$ f = \frac{1}{2} \sum_{i=1}^{m} h_i $$

where m is the number of nets and h_i is the half-perimeter measure for net i.
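
As a rough illustration of the two estimates for a single net, here is a minimal Python sketch. The helper names and the four-terminal example net are assumptions made for this sketch, and Manhattan distance is assumed for the complete-graph connection lengths.

```python
from itertools import combinations

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def complete_graph_measure(terminals):
    """Sum of all terminal-to-terminal lengths, divided by n/2."""
    n = len(terminals)
    total = sum(manhattan(p, q) for p, q in combinations(terminals, 2))
    return total / (n / 2)

def half_perimeter(terminals):
    """Half the perimeter of the bounding box enclosing all terminals."""
    xs = [x for x, _ in terminals]
    ys = [y for _, y in terminals]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

# Hypothetical net with terminals at the corners of a 4 x 3 rectangle.
net = [(0, 0), (4, 0), (4, 3), (0, 3)]
print(complete_graph_measure(net))  # 14.0
print(half_perimeter(net))          # 7
```

Both functions run in time that depends only on the terminal count, which is what makes them cheap enough to evaluate for every trial placement.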
FIGURE 16.21 Interconnect-length measures. (a) Complete-graph measure. (b) Half-perimeter measure.

Correlation between total length of chip interconnect and the half-perimeter and complete-graph measures.

⚫ The meander factor specifies, on average, the ratio of the interconnect created by the routing tool to the interconnect-length estimate used by the placement tool.
⚫ Another problem is that we have concentrated on finding estimates to the MRST, but the MRST that minimizes total net length may not minimize net delay.
Interconnect congestion
⚫ There is no point in minimizing the interconnect length if we create a placement that is too congested to route.

⚫ If we use minimum interconnect congestion as an additional placement objective, we need some way of measuring it.

⚫ What we are trying to measure is interconnect density.

⚫ One measure of interconnect congestion uses the maximum cut line.

⚫ Maximum cut line: Imagine a horizontal or vertical line drawn anywhere across a chip or block. The number of interconnects that must cross this line is the cut size (the number of interconnects we cut). The maximum cut line has the highest cut size.
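
The cut-size idea can be sketched in Python for vertical cut lines. This is a simplified illustration: it assumes each net crosses a cut line at most once (a net is counted if it has terminals on both sides), and the net coordinates and candidate cut positions are made up for the example.

```python
def cut_size(nets, x_cut):
    """Count nets with terminals on both sides of the vertical line x = x_cut."""
    count = 0
    for terminals in nets:
        xs = [x for x, _ in terminals]
        if min(xs) < x_cut < max(xs):
            count += 1
    return count

def maximum_cut_line(nets, candidate_xs):
    """Return the candidate cut position with the highest cut size."""
    return max(candidate_xs, key=lambda x: cut_size(nets, x))

# Hypothetical two-terminal nets given as lists of (x, y) terminals.
nets = [
    [(0, 0), (3, 1)],
    [(1, 2), (2, 0)],
    [(1, 1), (3, 3)],
    [(0, 3), (1, 0)],
]
cuts = [0.5, 1.5, 2.5]
print([cut_size(nets, x) for x in cuts])   # [2, 3, 2]
print(maximum_cut_line(nets, cuts))        # 1.5
```

Horizontal cut lines can be handled the same way by testing the y-coordinates instead.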
• FIGURE 16.23 Interconnect congestion for the cell-based ASIC from Figure 16.11 (b). (a) Measurement of congestion. (b) An expanded view of flexible block A shows a maximum cut line.
Interconnect Delay
⚫ Many placement tools minimize estimated interconnect length or interconnect congestion

as objectives.

⚫ The problem with this approach is that a logic cell may be placed a long way from
another logic cell to which it has just one connection. This logic cell with one connection is less
important as far as the total wire length is concerned than other logic cells, to which there are many
connections. However, the one long connection may be critical as far as timing delay is
concerned.

⚫ As technology is scaled, interconnection delays become larger relative to circuit delays and

this problem gets worse.

Interconnect Delay
⚫ In timing-driven placement we must estimate delay for every net for every trial placement, possibly for hundreds of thousands of gates.

⚫ Unfortunately, the minimum-length Steiner tree does not necessarily correspond to


the interconnect path that minimizes delay. To construct a minimum-delay path we may
have to route with non-Steiner trees.

⚫ In the placement phase typically we take a simple interconnect length approximation to this
minimum-delay path (typically the half-perimeter measure).

⚫ Even when we can estimate the length of the interconnect, we do not yet have information
on which layers and how many vias the interconnect will use or how wide it will be. Some
tools allow us to include estimates for these parameters.
⚫ Often we can specify metal usage , the percentage of routing on the different layers to
expect from the router. This allows the placement tool to estimate RC values and delays—
and thus minimize delay.
Placement Algorithms
There are two classes of placement algorithms commonly used in commercial CAD tools:
⚫ Constructive placement uses a set of rules to arrive at a constructed placement. Examples: the min-cut algorithm and the eigenvalue method.
⚫ Iterative placement improvement.

As in system partitioning, placement usually starts with a constructed solution and then improves it using an iterative algorithm.

The min-cut placement method uses successive application of partitioning. The steps are as follows:
1. Cut the placement area into two pieces.
2. Swap the logic cells to minimize the cut cost.
3. Repeat the process from step 1, cutting smaller pieces until all the logic cells are placed.

Usually we divide the placement area into bins. The size of a bin can vary, from a bin size equal to the base cell (for a gate array) to a bin size that would hold several logic cells. We can start with a large bin size, to get a rough placement, and then reduce the bin size to get a final placement.
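
The min-cut steps can be sketched for a single row of cells as follows. This is only a toy illustration under stated assumptions: the cut cost is taken as the number of nets with cells on both sides, the swap step is a simple greedy pairwise exchange (not the partitioning algorithm a production tool would use), and the cell and net names are hypothetical.

```python
def cut_cost(left, right, nets):
    """Number of nets that have cells on both sides of the cut."""
    left, right = set(left), set(right)
    return sum(1 for net in nets if left & set(net) and right & set(net))

def improve_cut(left, right, nets):
    """Greedily swap one cell pair at a time while the cut cost drops."""
    left, right = list(left), list(right)
    improved = True
    while improved:
        improved = False
        best = cut_cost(left, right, nets)
        for i in range(len(left)):
            for j in range(len(right)):
                left[i], right[j] = right[j], left[i]
                cost = cut_cost(left, right, nets)
                if cost < best:
                    best, improved = cost, True
                else:
                    left[i], right[j] = right[j], left[i]  # undo the swap
    return left, right

def min_cut_place(cells, nets):
    """Order cells along a row by recursive bisection (steps 1 to 3)."""
    if len(cells) <= 1:
        return list(cells)
    half = len(cells) // 2
    left, right = improve_cut(cells[:half], cells[half:], nets)
    return min_cut_place(left, nets) + min_cut_place(right, nets)

cells = ["a", "b", "c", "d"]
nets = [("a", "c"), ("b", "d")]
order = min_cut_place(cells, nets)
print(order, cut_cost(order[:2], order[2:], nets))
```

Inside each recursive call, connections to cells outside the current piece are ignored because `cut_cost` only looks at the cells passed to it.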

• FIGURE 16.24 Min-cut placement. (a) Divide the chip into bins using a grid. (b) Merge all connections to
the center of each bin. (c) Make a cut and swap logic cells between bins to minimize the cost of the cut.
(d) Take the cut pieces and throw out all the edges that are not inside the piece. (e) Repeat the process with
a new cut and continue until we reach the individual bins.

Eigenvalue Placement Algorithm
The eigenvalue placement algorithm uses the cost matrix or weighted connectivity matrix (eigenvalue methods are also known as spectral methods) [Hall, 1970]. The measure we use is a cost function f that we shall minimize, given by

$$ f = \frac{1}{2} \sum_{i,j=1}^{n} c_{ij} d_{ij}^{2} \qquad (1) $$

where C = [c_ij] is the (possibly weighted) connectivity matrix, and d_ij is the Euclidean distance between the centers of logic cell i and logic cell j. Since we are going to minimize a cost function that is the square of the distance between logic cells, these methods are also known as quadratic placement methods. This type of cost function leads to a simple mathematical solution. We can rewrite the cost function f in matrix form:

$$ f = \frac{1}{2} \sum_{i,j=1}^{n} c_{ij} \left[ (x_i - x_j)^2 + (y_i - y_j)^2 \right] = x^{T} B x + y^{T} B y $$

B is a symmetric matrix, the disconnection matrix (also called the Laplacian):

$$ B = D - C $$

where C is the connectivity matrix and D is the diagonal matrix (or degree matrix) with elements

$$ d_{ii} = \sum_{j=1}^{n} c_{ij}, \qquad d_{ij} = 0 \quad (i \neq j) $$
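
A small numerical check can make the matrix form concrete. The sketch below, in plain Python, builds B = D – C for a hypothetical three-cell connectivity matrix and confirms that the quadratic form xᵀBx equals the pairwise sum ½ Σ c_ij (x_i – x_j)² along one axis. The matrix and coordinates are invented for illustration.

```python
def laplacian(C):
    """Disconnection matrix B = D - C, with d_ii equal to the row sum of C."""
    n = len(C)
    return [[(sum(C[i]) if i == j else 0) - C[i][j] for j in range(n)]
            for i in range(n)]

def quad_form(B, x):
    """Evaluate x^T B x."""
    n = len(x)
    return sum(x[i] * B[i][j] * x[j] for i in range(n) for j in range(n))

def pairwise_cost(C, x):
    """Evaluate (1/2) * sum over i, j of c_ij * (x_i - x_j)^2."""
    n = len(x)
    return 0.5 * sum(C[i][j] * (x[i] - x[j]) ** 2
                     for i in range(n) for j in range(n))

C = [[0, 1, 1],      # hypothetical symmetric connectivity matrix
     [1, 0, 2],
     [1, 2, 0]]
x = [0, 1, 3]        # hypothetical x-coordinates of the three cells

B = laplacian(C)
print(B)                      # [[2, -1, -1], [-1, 3, -2], [-1, -2, 3]]
print(quad_form(B, x))        # 18
print(pairwise_cost(C, x))    # 18.0
```

Each row of B sums to zero, which is why placing every cell at the same coordinate gives zero cost and has to be excluded by a constraint.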
We can simplify the problem by noticing that it is symmetric in the x- and y-coordinates.

Let us solve the simpler problem of minimizing the cost function for the placement of logic cells along just the x-axis first. We can then apply this solution to the more general two-dimensional placement problem.

Before we solve this simpler problem, we introduce a constraint that the coordinates of the logic cells must correspond to valid positions (the cells do not overlap and they are placed on-grid). We make another simplifying assumption that all logic cells are the same size and we must place them in fixed positions. We can define a vector p consisting of the valid positions:

$$ p = [p_1, p_2, \ldots, p_n] \qquad (4) $$

For a valid placement the x-coordinates of the logic cells,

$$ x = [x_1, x_2, \ldots, x_n] \qquad (5) $$

must be a permutation of the fixed positions, p. We can show that requiring the logic cells to be in fixed positions in this way leads to a series of n equations restricting the values of the logic cell coordinates. If we impose all of these constraint equations the problem becomes very complex. Instead we choose just one of the equations:

$$ \sum_{i=1}^{n} x_i^{2} = \sum_{i=1}^{n} p_i^{2} \qquad (6) $$
Simplifying the problem in this way will lead to an approximate solution to the placement problem. We can write this single constraint on the x-coordinates in matrix form:

$$ x^{T} x = P, \qquad P = \sum_{i=1}^{n} p_i^{2} $$

where P is a constant.

We can now summarize the formulation of the problem, with the simplifications that we have made, for a one-dimensional solution. We must minimize a cost function, g, where

$$ g = x^{T} B x \qquad (8) $$

subject to the constraint:

$$ x^{T} x = P \qquad (9) $$

This is a standard problem that we can solve using a Lagrangian multiplier:

$$ \Lambda = x^{T} B x - \lambda \left[ x^{T} x - P \right] \qquad (10) $$

To find the value of x that minimizes g we differentiate Λ partially with respect to x and set the result equal to zero. We get the following equation:

$$ [B - \lambda I] x = 0 \qquad (11) $$

This last equation is called the characteristic equation for the disconnection matrix B and occurs frequently in matrix algebra (this λ has nothing to do with scaling). The solutions to this equation are the eigenvectors and eigenvalues of B. Multiplying Eq. (11) by x^T we get:

$$ \lambda = \frac{x^{T} B x}{x^{T} x} $$

However, since we imposed the constraint x^T x = P and x^T B x = g, then

$$ \lambda = \frac{g}{P} $$

The eigenvectors of the disconnection matrix B are the solutions to our placement problem.
Iterative Placement Improvement

An iterative placement improvement algorithm takes an existing placement and tries to improve it by moving the logic cells. There are two parts to the algorithm:
⚫ The selection criteria that decide which logic cells to try moving.
⚫ The measurement criteria that decide whether to move the selected cells.

There are several interchange or iterative exchange methods that differ in their selection and measurement criteria:
⚫ pairwise interchange,
⚫ force-directed interchange,
⚫ force-directed relaxation, and
⚫ force-directed pairwise relaxation.

All of these methods usually consider only pairs of logic cells to be exchanged. A source logic cell is picked for trial exchange with a destination logic cell.
Iterative Placement Improvement (contd.,)
The pairwise-interchange algorithm is similar to the interchange algorithm used for iterative improvement in the system partitioning step:

1. Select the source logic cell at random.
2. Try all the other logic cells in turn as the destination logic cell.
3. Use any of the measurement methods we have discussed to decide on whether to accept the interchange.
4. The process repeats from step 1, selecting each logic cell in turn as a source logic cell.

The neighborhood exchange algorithm is a modification to pairwise interchange that considers only destination logic cells in a neighborhood: cells within a certain distance, e, of the source logic cell. Limiting the search area for the destination logic cell to the e-neighborhood reduces the search time.
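
A minimal Python sketch of neighborhood exchange is given below. It makes several assumptions that are not in the slides: the measurement is the total half-perimeter wirelength, a swap is accepted only if it strictly reduces that total, source cells are visited in sorted order rather than at random, and the neighborhood uses Manhattan distance. The cell names and nets are hypothetical.

```python
def hpwl(placement, nets):
    """Total half-perimeter wirelength over all nets."""
    total = 0
    for net in nets:
        xs = [placement[c][0] for c in net]
        ys = [placement[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def neighborhood_exchange(placement, nets, e):
    """One pass of pairwise interchange restricted to e-neighborhoods."""
    placement = dict(placement)
    cells = sorted(placement)
    for src in cells:
        for dst in cells:
            if dst == src:
                continue
            (sx, sy), (dx, dy) = placement[src], placement[dst]
            if abs(sx - dx) + abs(sy - dy) > e:
                continue  # destination lies outside the e-neighborhood
            before = hpwl(placement, nets)
            placement[src], placement[dst] = placement[dst], placement[src]
            if hpwl(placement, nets) >= before:
                # no improvement: undo the trial interchange
                placement[src], placement[dst] = placement[dst], placement[src]
    return placement

start = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (3, 0)}
nets = [("a", "c"), ("b", "d")]
result = neighborhood_exchange(start, nets, e=1)
print(hpwl(start, nets), hpwl(result, nets))   # 4 2
```

Setting `e` large enough to cover the whole block turns this back into plain pairwise interchange, at the cost of trying every destination cell.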
• FIGURE 16.26 Interchange.
• (a) Swapping the source logic cell with a destination logic cell in pairwise interchange.
• (b) Sometimes we have to swap more than two logic cells at a time to reach an optimum placement, but this is expensive in computation time. Limiting the search to neighborhoods reduces the search time. Logic cells within a distance e of a logic cell form an e-neighborhood.
• (c) A one-neighborhood.
• (d) A two-neighborhood.
Iterative Placement Improvement (contd.,)
Force-directed placement methods:

Imagine identical springs connecting all the logic cells we wish to place. The number of springs is equal to the number of connections between logic cells. The effect of the springs is to pull connected logic cells together. The more highly connected the logic cells, the stronger the pull of the springs. The force on a logic cell i due to logic cell j is given by Hooke's law, which says the force of a spring is proportional to its extension:
F ij = – c ij x ij .
⚫ The vector component x ij is directed from the center of logic cell i to the center of logic cell j.
⚫ The vector magnitude is calculated as either the Euclidean or Manhattan distance between the logic cell centers.
⚫ The c ij form the connectivity or cost matrix (the matrix element c ij is the number of connections between logic cell i and logic cell j).
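
The spring model can be illustrated with a short Python sketch. It computes the net pull on one cell as the sum of c_ij times the displacement from cell i toward cell j (the sign convention here, with the pull pointing toward connected cells, is an assumption of this sketch), and the zero-force location where that pull vanishes. The positions and connectivity matrix are hypothetical.

```python
def net_force(i, positions, C):
    """Sum of spring pulls on cell i toward every connected cell j."""
    xi, yi = positions[i]
    fx = fy = 0.0
    for j, (xj, yj) in enumerate(positions):
        fx += C[i][j] * (xj - xi)
        fy += C[i][j] * (yj - yi)
    return fx, fy

def zero_force_target(i, positions, C):
    """Connectivity-weighted centroid of the cells connected to cell i."""
    others = [j for j in range(len(positions)) if j != i]
    weight = sum(C[i][j] for j in others)
    x = sum(C[i][j] * positions[j][0] for j in others) / weight
    y = sum(C[i][j] * positions[j][1] for j in others) / weight
    return x, y

positions = [(0, 0), (3, 0), (0, 3)]   # cell centers
C = [[0, 1, 2],                        # cell 0 has 1 connection to cell 1
     [1, 0, 0],                        # and 2 connections to cell 2
     [2, 0, 0]]

print(net_force(0, positions, C))          # (3.0, 6.0)
print(zero_force_target(0, positions, C))  # (1.0, 2.0)
```

Moving cell 0 to its zero-force target balances the springs, which is the location a force-directed method would try to reach (subject to the slot being free).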

• FIGURE 16.27 Force-directed placement.
• (a) A network with nine logic cells.
• (b) We make a grid (one logic cell per bin).
• (c) Forces are calculated as if springs were attached to the centers of each logic cell for each connection. The two nets connecting logic cells A and I correspond to two springs.
• (d) The forces are proportional to the spring extensions.
Iterative Placement Improvement (contd.,)
Force-directed placement algorithms:

⚫ The force-directed interchange algorithm uses the force vector to select a pair of logic cells to swap.
⚫ In force-directed relaxation, a chain of logic cells is moved.
⚫ The force-directed pairwise relaxation algorithm swaps one pair of logic cells at a time.

We reach a force-directed solution when we minimize the energy of the system, corresponding to minimizing the sum of the squares of the distances separating logic cells. Force-directed placement algorithms thus also use a quadratic cost function.
• FIGURE 16.28 Force-directed iterative placement improvement.
• (a) Force-directed interchange.
• (b) Force-directed relaxation.
• (c) Force-directed pairwise relaxation.
Placement Using Simulated Annealing
Applying simulated annealing to placement, the algorithm is as follows:

1. Select logic cells for a trial interchange, usually at random.
2. Evaluate the objective function E for the new placement.
3. If ΔE is negative or zero, then exchange the logic cells. If ΔE is positive, then exchange the logic cells with a probability of exp(–ΔE/T).
4. Go back to step 1 for a fixed number of times, and then lower the temperature T according to a cooling schedule: T n+1 = 0.9 T n, for example.
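
The four steps map onto a short Python sketch for cells placed in a single row. Everything specific here is an assumption for illustration: the objective E is the sum of net spans along the row, the starting temperature, stopping temperature, and moves per temperature are arbitrary, and remembering the best placement seen is an addition that the steps above do not require.

```python
import math
import random

def wirelength(order, nets):
    """Objective E: sum over nets of the span between leftmost and rightmost cell."""
    pos = {cell: i for i, cell in enumerate(order)}
    return sum(max(pos[c] for c in net) - min(pos[c] for c in net) for net in nets)

def anneal(order, nets, t=10.0, cooling=0.9, moves_per_t=50, t_min=0.01, seed=0):
    rng = random.Random(seed)
    order = list(order)
    energy = wirelength(order, nets)
    best, best_energy = list(order), energy
    while t > t_min:
        for _ in range(moves_per_t):
            i, j = rng.sample(range(len(order)), 2)        # step 1: pick a pair
            order[i], order[j] = order[j], order[i]
            delta = wirelength(order, nets) - energy       # step 2: evaluate
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                energy += delta                            # step 3: accept
                if energy < best_energy:
                    best, best_energy = list(order), energy
            else:
                order[i], order[j] = order[j], order[i]    # reject: undo
        t *= cooling                                       # step 4: T(n+1) = 0.9 T(n)
    return best, best_energy

cells = list("abcdef")
nets = [("a", "f"), ("b", "e"), ("c", "d"), ("a", "b")]
best, best_energy = anneal(cells, nets)
print(wirelength(cells, nets), best_energy)
```

Accepting some uphill moves at high temperature is what lets the search escape local minima that a purely greedy interchange would get stuck in.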

Experiments show that simple min-cut based constructive placement is faster than simulated annealing but that simulated annealing is capable of giving better results at the expense of long computer run times. The iterative improvement methods that we described earlier are capable of giving results as good as simulated annealing, but they use more complex algorithms.
Timing-Driven Placement Methods
⚫ Minimizing delay is becoming more and more important as a
placement objective.
⚫ There are two main approaches:
– net based

– path based

⚫ We can use net weights in our algorithms.


⚫ The problem is to calculate the weights.
⚫ One method finds the n most critical paths (using a timing-analysis engine,
possibly in the synthesis tool).
⚫ The net weights might then be the number of times each net appears in this list.
⚫ Another method to find the net weights uses the zero-slack algorithm.
Timing-Driven Placement Methods

⚫ Figure 16.29 (a) shows a circuit with primary inputs at which we know the arrival times (actual times) of each signal.
⚫ We also know the required times for the primary outputs (the points in time at which we want the signals to be valid).
⚫ We can work forward from the primary inputs and backward from the primary outputs to determine arrival and required times at each input pin for each net.
⚫ The difference between the required and arrival times at each input pin is the slack time (the time we have to spare).
⚫ The zero-slack algorithm adds delay to each net until the slacks are zero, as shown in Figure 16.29 (b).
⚫ The net delays can then be converted to weights or constraints in the placement.
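
The forward and backward passes that produce the slacks can be sketched in Python. This is a simplification: it computes one slack per signal (gate output or primary input) rather than per input pin, it assumes the gates are listed in topological order, and the circuit, gate delays, and times are invented for the example rather than taken from Figure 16.29.

```python
def slacks(gates, arrival_in, required_out):
    """gates: {name: (delay, [fanin names])}, listed in topological order."""
    # Forward pass: arrival time = latest fanin arrival + gate delay.
    arrival = dict(arrival_in)
    for name, (delay, fanins) in gates.items():
        arrival[name] = max(arrival[f] for f in fanins) + delay
    # Backward pass: required time = earliest (fanout required - fanout delay).
    required = dict(required_out)
    for name in reversed(list(gates)):
        delay, fanins = gates[name]
        for f in fanins:
            required[f] = min(required.get(f, float("inf")), required[name] - delay)
    return {n: required[n] - arrival[n] for n in arrival}

gates = {
    "g1": (2, ["i1", "i2"]),
    "g2": (1, ["g1"]),
    "g3": (3, ["i2"]),
    "g4": (1, ["g2", "g3"]),
}
result = slacks(gates, arrival_in={"i1": 0, "i2": 1}, required_out={"g4": 7})
print(result)
# {'i1': 3, 'i2': 2, 'g1': 2, 'g2': 2, 'g3': 2, 'g4': 2}
```

A zero-slack style method would then distribute these positive slacks as extra allowable net delay until every slack reaches zero.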

• FIGURE 16.29 The zero-slack algorithm. (a) The circuit with no net delays. (b) The zero-slack algorithm adds net delays.

Physical design flow
Routing
Introduction
⚫ Once the designer has floorplanned a chip and the logic cells within the flexible blocks have been placed, it is time to make the connections by routing the chip.
⚫ This is still a hard problem that is made easier by dividing it into smaller problems.
⚫ Routing is usually split into global routing followed by detailed routing.
The starting point of the floorplanning and placement steps for the Viterbi decoder
• A collection of standard cells with no room set aside yet for routing.

• Small boxes that look like bricks: outlines of the standard cells.
• Largest standard cells, at the bottom of the display (labeled dfctnb): 188 D flip-flops.
• '+' symbols: drawing origins of the standard cells. For the D flip-flops they are shifted to the left and below the logic cell bottom left-hand corner.
• Large box surrounding all the logic cells: estimated chip size.
• (This is a screen shot from Cadence Cell Ensemble.)
The Viterbi decoder after floorplanning
• FIGURE 17.1 The core of the Viterbi decoder chip after placement (a screen shot from Cadence Cell Ensemble).
• FIGURE 17.2 The core of the Viterbi decoder chip after the completion of global and detailed routing (a screen shot from Cadence Cell Ensemble). This chip uses two-level metal. Although you cannot see the difference, m1 runs in the horizontal direction and m2 in the vertical direction.
Global Routing
• The details of global routing differ slightly between cell-based ASICs, gate arrays, and FPGAs, but the principles are the same.

• A global router does not make any connections, it just plans them.

• We global route the whole chip (or large pieces if it is a large chip) before detail routing the whole chip (or the pieces).
Goals and Objectives
• Input to routing

– Floorplan that includes the locations of all the fixed and flexible blocks;
– Placement information for flexible blocks;
• Locations of all the logic cells.

• Goal of global routing
– To provide complete instructions to the detailed router on where to route every net.

• Objectives of global routing
– Minimize the total interconnect length.
– Maximize the probability that the detailed router can complete the routing.
– Minimize the critical path delay.
Measurement of Interconnect Delay
• After placement, the logic cell positions are fixed and the global router can afford to use better estimates of the interconnect delay.
• To illustrate one method, we shall use the Elmore constant to estimate the interconnect delay for the circuit shown in Figure 17.3.

• FIGURE 17.3 Measuring the delay of a net. (a) A simple circuit with an inverter A driving a net with a fanout of two. Voltages V1, V2, V3, and V4 are the voltages at intermediate points along the net. (b) The layout showing the net segments (pieces of interconnect). (c) The RC model with each segment replaced by a capacitance and resistance. The ideal switch and pull-down resistance Rpd model the inverter A.

The problem is to find the voltages at the inputs to logic cells B and C taking into account the parasitic resistance and capacitance of the metal interconnect. Figure 17.3 (c) models logic cell A as an ideal switch with a pull-down resistance equal to Rpd and models the metal interconnect using resistors and capacitors for each segment of the interconnect.
• The Elmore constant for node 4 (labeled V4) in the network shown in Figure 17.3 (c) is

$$ \zeta_{D4} = \sum_{k=1}^{4} R_{k4} C_{k} = R_{14} C_{1} + R_{24} C_{2} + R_{34} C_{3} + R_{44} C_{4} \qquad (17.1) $$

where

$$ R_{14} = R_{pd} + R_{1}, \quad R_{24} = R_{pd} + R_{1}, \quad R_{34} = R_{pd} + R_{1} + R_{3}, \quad R_{44} = R_{pd} + R_{1} + R_{3} + R_{4} \qquad (17.2) $$

Here R14 is the resistance to V0 shared by node 1 and node 4. In Eq. 17.2 notice that R24 = Rpd + R1 (and not Rpd + R1 + R2) because Rpd + R1 is the resistance to V0 (ground) shared by node 2 and node 4.
Suppose we have the following parameters (from the generic 0.5 µm CMOS process, G5) for the layout shown in Figure 17.3 (b):

• m2 resistance is 50 mΩ/square.
• m2 capacitance (for a minimum-width line) is 0.2 pF mm⁻¹.
• 4X inverter delay is 0.02 ns + 0.5 CL ns (CL is in picofarads).
• Delay is measured using 0.35/0.65 output trip points.
• m2 minimum width is 3λ = 0.9 µm.
• 1X inverter input capacitance is 0.02 pF (a standard load).

First we need to find the pull-down resistance, Rpd, of the 4X inverter. If we model the gate with a linear pull-down resistor, Rpd, driving a load CL, the output waveform is exp(–t/(CL Rpd)) (normalized to 1 V).

The output reaches 63 percent of its final value when t = CL Rpd, because 1 – exp(–1) = 0.63. Then, because the delay is measured with a 0.65 trip point, the constant 0.5 ns pF⁻¹ = 0.5 kΩ is very close to the equivalent pull-down resistance. Thus, Rpd = 500 Ω.
• R1= R2 = 6 Ω
• R3=56 Ω
• R4=112 Ω
• C 1=0.02 pF
• C 2 =0.04 pF
• C 3=0.2 pF
• C 4=0.42 pF

Now we can calculate the path resistance values, Rki (notice that Rki = Rik):
R14 = 500 Ω + 6 Ω = 506 Ω
R24 = 500 Ω + 6 Ω = 506 Ω
R34 = 500 Ω + 6 Ω + 56 Ω = 562 Ω
R44 = 500 Ω + 6 Ω + 56 Ω + 112 Ω = 674 Ω (17.5)
Finally, we can calculate Elmore's constants for node 4 and node 2 as follows:

$$ \zeta_{D4} = R_{14} C_{1} + R_{24} C_{2} + R_{34} C_{3} + R_{44} C_{4} \qquad (17.6) $$
$$ = (506)(0.02) + (506)(0.04) + (562)(0.2) + (674)(0.42) = 425 \ \text{ps} $$

$$ \zeta_{D2} = R_{12} C_{1} + R_{22} C_{2} + R_{32} C_{3} + R_{42} C_{4} \qquad (17.7) $$
$$ = (R_{pd} + R_{1})(C_{1} + C_{3} + C_{4}) + (R_{pd} + R_{1} + R_{2}) C_{2} $$
$$ = (500 + 6)(0.02 + 0.2 + 0.42) + (500 + 6 + 6)(0.04) = 344 \ \text{ps} $$

and ζD4 – ζD2 = (425 – 344) = 81 ps.

• A lumped-delay model neglects the effects of interconnect resistance and simply sums all the node capacitances (the lumped capacitance) as follows:

$$ \zeta_{D} = R_{pd} (C_{1} + C_{2} + C_{3} + C_{4}) = (500)(0.02 + 0.04 + 0.2 + 0.42) = 340 \ \text{ps} \qquad (17.8) $$
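
The arithmetic can be reproduced with a short Python sketch. The tree topology encoded in `path` (node 1 after R1, node 2 reached through R2, node 3 through R3, node 4 through R3 and R4) is inferred from the shared-resistance expressions in Eq. 17.2 and is an assumption of this sketch; the variable and function names are illustrative. Resistances are in ohms and capacitances in picofarads, so products are in picoseconds.

```python
res = {"pd": 500.0, "r1": 6.0, "r2": 6.0, "r3": 56.0, "r4": 112.0}   # ohms
cap = {1: 0.02, 2: 0.04, 3: 0.2, 4: 0.42}                            # pF

# Resistive segments on the path from V0 (through the driver) to each node.
path = {
    1: {"pd", "r1"},
    2: {"pd", "r1", "r2"},
    3: {"pd", "r1", "r3"},
    4: {"pd", "r1", "r3", "r4"},
}

def shared_resistance(k, i):
    """R_ki: resistance common to the paths from V0 to node k and to node i."""
    return sum(res[s] for s in path[k] & path[i])

def elmore(i):
    """Elmore constant for node i: sum over k of R_ki * C_k (ohm * pF = ps)."""
    return sum(shared_resistance(k, i) * cap[k] for k in cap)

def lumped():
    """Lumped-delay model: pull-down resistance times total capacitance."""
    return res["pd"] * sum(cap.values())

print(round(elmore(4), 2))              # 425.84
print(round(elmore(2), 2))              # 344.32
print(round(lumped(), 2))               # 340.0
print(round(elmore(4) - lumped(), 2))   # 85.84
print(round(elmore(2) - lumped(), 2))   # 4.32
```

Truncated to whole picoseconds these values match the 425 ps, 344 ps, and 340 ps figures above, and the last two lines show the extra delay contributed by the interconnect to each receiver.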
Measurement of delay

The delay of the inverter can be assigned as follows:

– 20 ps (the intrinsic delay, 0.02 ns, due to the cell output capacitance),
– 340 ps (due to the pull-down resistance and the output capacitance),
– 4 ps (due to the interconnect from A to B), (ζD2 – ζD)
– 85 ps (due to the interconnect from A to C), (ζD4 – ζD).
Measurement of Interconnect Delay (contd.,)
• Even using the Elmore constant we still made the following assumptions in
estimating the path delays:
• A step-function waveform drives the net.
• The delay is measured from when the gate input changes.
• The delay is equal to the time constant of an exponential waveform
that approximates the actual output waveform.
• The interconnect is modeled by discrete resistance and capacitance
elements.

• The global router could use more sophisticated estimates that remove some of these assumptions, but there is a limit to the accuracy with which delay can be estimated during global routing.

• When the global router attempts to minimize interconnect delay, there is an important difference between a path and a net.

• The path that minimizes the delay between two terminals on a net is not necessarily the same as the path that minimizes the total path length of the net.
Global Routing Methods
• Many of the methods used in global routing are based on the solutions to the
tree on a graph problem.

• Sequential routing: One approach to global routing takes each net in turn and calculates the shortest path using tree on graph algorithms, with the added restriction of using the available channels.

Disadvantage:
• As a sequential routing algorithm proceeds, some channels will become congested since they hold more interconnects than others.
• In the case of FPGAs and channeled gate arrays, the channels have a fixed channel capacity and can only hold a certain number of interconnects.
Global Routing Methods (contd.,)
• There are two different ways that a global router normally handles this problem.
1. Order-independent routing
2. Order-dependent routing

• In order-independent routing, a global router proceeds by routing each net, ignoring
how crowded the channels are. Whether a particular net is processed first or last
does not matter; the channel assignment will be the same.

• In order-independent routing, after all the interconnects are assigned to channels, the
global router returns to those channels that are the most crowded and reassigns
some interconnects to other, less crowded, channels.

• In order-dependent routing, a global router considers the number of interconnects already
placed in various channels as it proceeds. The routing is still sequential, but now
the order of processing the nets affects the results.

• Iterative improvement or simulated annealing may be applied to the solutions found
from both order-dependent and order-independent algorithms.
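
A minimal sketch of order-dependent sequential routing follows. It assumes two-terminal nets and a graph whose edges are channels with a length and a fixed capacity; the names `shortest_path`, `route_sequentially`, `edges`, and `nets` are hypothetical, and a real global router would build a tree per net rather than a single path. Each net takes the shortest path through channels that still have room, so the order in which nets are processed changes the result.

```python
import heapq


def shortest_path(edges, usage, src, dst):
    """Dijkstra search that only uses channels with spare capacity."""
    adjacent = {}
    for (a, b), (length, capacity) in edges.items():
        if usage[(a, b)] < capacity:
            adjacent.setdefault(a, []).append((b, length, (a, b)))
            adjacent.setdefault(b, []).append((a, length, (a, b)))
    best = {src: 0}
    heap = [(0, src, [])]
    while heap:
        dist, node, path = heapq.heappop(heap)
        if node == dst:
            return dist, path
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, length, edge in adjacent.get(node, []):
            new_dist = dist + length
            if new_dist < best.get(nxt, float("inf")):
                best[nxt] = new_dist
                heapq.heappush(heap, (new_dist, nxt, path + [edge]))
    return None  # no route left with spare capacity


def route_sequentially(edges, nets):
    """Route nets one at a time; returns {net name: path length or None}."""
    usage = {edge: 0 for edge in edges}
    lengths = {}
    for name, (src, dst) in nets:
        found = shortest_path(edges, usage, src, dst)
        if found is None:
            lengths[name] = None
            continue
        dist, path = found
        for edge in path:
            usage[edge] += 1  # this channel now holds one more interconnect
        lengths[name] = dist
    return lengths


# (length, capacity) for each channel; channel A-B can hold only one interconnect.
edges = {("A", "B"): (1, 1), ("B", "C"): (1, 5), ("A", "C"): (3, 5)}
print(route_sequentially(edges, [("x", ("A", "B")), ("y", ("A", "C"))]))
print(route_sequentially(edges, [("y", ("A", "C")), ("x", ("A", "B"))]))
```

Routing net x first leaves the short channel for the net that needs it most; routing net y first uses up channel A-B and forces x onto a longer detour. This is the kind of result that iterative improvement or simulated annealing can later repair.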
•58
Global Routing Methods (contd.,)

• Hierarchical routing handles all nets at a particular level at once.

• Rather than handling all of the nets on the chip at the same time, the global-
routing problem is made more tractable by dividing the chip area into levels of
hierarchy.

• By considering only one level of hierarchy at a time the size of the problem is
reduced at each level.

• There are two ways to traverse the levels of hierarchy.


• The top-down approach starts at the whole chip, or highest level, and
proceeds down to the logic cells.
• The bottom-up approach starts at the lowest level of hierarchy and globally
routes the smallest areas first.

•59
Global Routing

• There are two types of areas to global route:

– between blocks

– inside the flexible blocks

•60
Global Routing Between Blocks

• FIGURE 17.4 Global routing for a cell-based ASIC formulated
as a graph problem. (a) A cell-based ASIC with numbered
channels. (b) The channels form the edges of a graph. (c) The
channel-intersection graph. Each channel corresponds to an
edge on a graph whose weight corresponds to the channel
length.
•61
Global Routing Between Blocks (contd.,)

• FIGURE 17.5 Finding paths in global routing. (a) A cell-based ASIC showing a single net
with a fanout of four (five terminals). We have to order the numbered channels to complete
the interconnect path for terminals A1 through F1. (b) The terminals are projected to the
center of the nearest channel, forming a graph. A minimum-length tree for the net that uses
the channels and takes into account the channel capacities. (c) The minimum-length tree
does not necessarily correspond to minimum delay. If we wish to minimize the delay
from terminal A1 to D1, a different tree might be better.
•62
Global Routing Between Blocks (contd.,)
• Global routing is very similar for cell-based ASICs and gate arrays, but
there is a very important difference between the types of channels in
these ASICs.

• In channeled gate arrays and FPGAs the size, number, and location of
channels are fixed.

• Advantage - the global router can allocate as many interconnects to each
channel as it likes, since that space is committed anyway.

• Disadvantage - there is a maximum number of interconnects that each
channel can hold.

• If the global router needs more room, even in just one channel on the
whole chip, the designer has to repeat the placement-and-routing steps and try
again (or use a bigger chip).

•63
Global Routing Inside Flexible Blocks

• FIGURE 17.6 Gate-array global routing. (a) A small gate array. (b) An enlarged view of the routing. The
top channel uses three rows of gate-array base cells; the other channels use only one. (c) A further
enlarged view showing how the routing in the channels connects to the logic cells. (d) One of the logic
cells, an inverter. (e) There are seven horizontal wiring tracks available in one row of gate-array base
cells, so the channel capacity is 7.
•64
Global Routing Inside Flexible Blocks (contd.,)

• FIGURE 17.7 The gate-array inverter from Figure 17.6d.
(a) An oxide-isolated gate-array base cell, showing
the diffusion and polysilicon layers. (b) The metal and
contact layers for the inverter in a 2LM (two-level
metal) process. (c) The router’s view of the cell in a 3LM process.
•65
Global Routing Inside Flexible Blocks

FIGURE 17.8 Global routing a gate array. (a) A single global-routing cell (GRC or routing bin)
containing 2-by-4 gate-array base cells. For this choice of routing bin the maximum horizontal
track capacity is 14, the maximum vertical track capacity is 12. The routing bin labeled C3
contains three logic cells, two of which have feedthroughs marked 'f'. This results in the edge
capacities shown. (b) A view of the top left-hand corner of the gate array showing 28 routing bins.
The global router uses the edge capacities to find a sequence of routing bins to connect
the nets.
•66
Timing-Driven Methods
• As in timing-driven placement, there are two main approaches to timing-driven routing:
– net-based and path-based.

• Path-based methods are more sophisticated.
For example, if there is a critical path from logic cell A to B to C, the global router
may increase the delay due to the interconnect between logic cells A and B if it
can reduce the delay between logic cells B and C.

• Placement and global routing tools may or may not use the same algorithm to
estimate net delay. If these tools are from different companies, the algorithms
are probably different.

• The algorithms must be compatible, however. There is no use performing placement to
minimize predicted delay if the global router uses completely
different measurement methods.

• Companies that produce floorplanning and placement tools make sure that the
output is compatible with different routing tools—often to the extent of using
different algorithms to target different routers.

•67
Back-annotation
• The global router can give not just an estimate of the total net
length (which was all we knew at the placement stage), but
the resistance and capacitance of each path in each net.
This RC information is used to calculate net delays.

• We can back-annotate this net delay information
– to the synthesis tool for in-place optimization or
– to a timing verifier to make sure there are no timing surprises.

• Differences in timing predictions at this point arise because the
placement algorithms estimate the paths one way and the global
router actually builds the paths another way.

•68
Detailed Routing
Goal:
• The goal of detailed routing is to complete all the connections between logic
cells.

Objectives:

• The most common objective is to minimize one or more of the following:
– The total interconnect length and area
– The number of layer changes that the connections have to make
– The delay of critical paths

• Minimizing the number of layer changes corresponds to minimizing the
number of vias that add parasitic resistance and capacitance to a connection.

•69
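Measurement of Channel Density
Definition of local and global channel density

• The local density at a point along a channel is the number of interconnects
that cross a vertical line drawn at that point.

• The global density of a channel is the maximum local density along the channel.

• For a channel to be routable, the channel density must be less than or equal
to the channel capacity.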
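
As an illustration of these definitions, the sketch below computes local and global density from the horizontal extent of each net in a channel. The net names, column numbers, and the functions `local_density` and `global_density` are hypothetical examples, not taken from the slides.

```python
def local_density(nets, column):
    """Number of nets whose horizontal extent crosses the given column."""
    return sum(1 for left, right in nets.values() if left <= column <= right)


def global_density(nets):
    """Maximum local density over every column spanned by the channel."""
    first = min(left for left, _ in nets.values())
    last = max(right for _, right in nets.values())
    return max(local_density(nets, col) for col in range(first, last + 1))


# Each net is described by the leftmost and rightmost column it must reach.
nets = {"a": (1, 4), "b": (2, 6), "c": (5, 8), "d": (7, 9), "e": (3, 5)}
print(local_density(nets, 1))
print(global_density(nets))
```

A channel with capacity 3 can hold these nets; a channel with capacity 2 cannot, whatever the detailed router does.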

•70
Left-edge algorithm
The left-edge algorithm (LEA) is the basis for several routing algorithms
[Hashimoto and Stevens, 1971].
The LEA applies to two-layer channel routing, using one layer for the trunks and the
other layer for the branches.
For example, m1 may be used in the horizontal direction and m2 in the vertical
direction.

The LEA proceeds as follows:
1. Sort the nets according to the leftmost edges of the net’s
horizontal segment.
2. Assign the first net on the list to the first free track.
3. Assign the next net on the list, which will fit, to the track.
4. Repeat this process from step 3 until no more nets will fit in the current
track.
5. Repeat steps 2–4 until all nets have been assigned to tracks.
6. Connect the net segments to the top and bottom of the channel.
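
Steps 1 to 5 can be sketched as follows. This is a simplified illustration that assumes each net has a single trunk described by its leftmost and rightmost column and that ignores vertical constraints; the function name `left_edge` and the example nets are hypothetical. Step 6 (adding the branches) is not shown.

```python
def left_edge(nets):
    """Assign net trunks to tracks with the left-edge algorithm.

    nets: dict mapping a net name to (left, right), the extent of its trunk.
    Returns a list of tracks; each track is a list of net names, left to right.
    """
    # Step 1: sort the nets by the left edge of their trunks.
    remaining = sorted(nets, key=lambda name: nets[name][0])
    tracks = []
    while remaining:                        # Step 5: repeat until all nets are placed
        track, last_right, leftover = [], None, []
        for name in remaining:
            left, right = nets[name]
            # Steps 2-4: take the next net that fits to the right of the
            # trunks already assigned to the current track.
            if last_right is None or left > last_right:
                track.append(name)
                last_right = right
            else:
                leftover.append(name)
        tracks.append(track)
        remaining = leftover
    return tracks


nets = {"a": (1, 4), "b": (2, 6), "c": (5, 8), "d": (7, 9), "e": (3, 5)}
print(left_edge(nets))
```

For this example the number of tracks used equals the global density of the channel (three), which is the best any router can do when there are no vertical constraints.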

•71
Left-edge algorithm

•72
Left-edge algorithm

•73
Constraints and Routing Graphs
• Two terminals that are in the same column in a channel
create a vertical constraint.
• An overlap between the trunks of two nets creates a horizontal
constraint.
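
Both kinds of constraint can be extracted directly from the terminal lists on the two edges of a channel. The sketch below is an illustration under stated assumptions: `top` and `bottom` list the net at each column (or `None` for an empty position), a vertical constraint `(t, b)` means the trunk of net `t` must lie above the trunk of net `b`, and the function names are hypothetical.

```python
def vertical_constraints(top, bottom):
    """Pairs (t, b): net t has a top terminal and net b a bottom terminal
    in the same column, so t's trunk must be placed above b's trunk."""
    return {(t, b) for t, b in zip(top, bottom)
            if t is not None and b is not None and t != b}


def horizontal_constraints(top, bottom):
    """Pairs of nets whose trunks overlap and so cannot share a track."""
    span = {}
    for col, pair in enumerate(zip(top, bottom)):
        for net in pair:
            if net is not None:
                lo, hi = span.get(net, (col, col))
                span[net] = (min(lo, col), max(hi, col))
    names = sorted(span)
    return {(a, b)
            for i, a in enumerate(names) for b in names[i + 1:]
            if span[a][0] <= span[b][1] and span[b][0] <= span[a][1]}


top = ["a", "b", None, "c"]
bottom = [None, "a", "b", "c"]
print(vertical_constraints(top, bottom))
print(horizontal_constraints(top, bottom))
```

The vertical constraints form a directed graph and the horizontal constraints an undirected one. A cycle in the vertical-constraint graph cannot be resolved when every net is limited to a single trunk.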

•74
Dog-Leg router

• A dogleg router removes the restriction that each net can use only one
track or trunk.

•75
Area Routing Algorithm- Lee-Maze algorithm
[For general shaped areas]

• Finds a path from source (X) to target (Y) by emitting a wave from both
the source and the target at the same time.

• Successive outward moves are marked in each bin.

• Once the target is reached, the path is found by backtracking (if there
is a choice of bins with equal labeled values, choose the bin that avoids
changing direction). (The original form of the Lee algorithm uses a single
wave.)
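
A minimal sketch of the single-wave form of the Lee algorithm is shown below. It assumes the routing area is a grid of bins given as a list of strings in which `#` marks a blocked bin; the name `lee_route` is hypothetical. For brevity it does not implement the preference for bins that avoid a change of direction during backtracking, and it expands from the source only rather than from both ends.

```python
from collections import deque


def lee_route(grid, source, target):
    """Return a shortest list of (row, col) bins from source to target, or None."""
    rows, cols = len(grid), len(grid[0])

    def neighbours(r, c):
        return ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))

    # Wave expansion: label each reachable bin with its distance from the source.
    label = {source: 0}
    frontier = deque([source])
    while frontier and target not in label:
        r, c = frontier.popleft()
        for nr, nc in neighbours(r, c):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and (nr, nc) not in label):
                label[(nr, nc)] = label[(r, c)] + 1
                frontier.append((nr, nc))
    if target not in label:
        return None  # the target is walled off

    # Backtracking: step from the target to any neighbour labeled one less.
    path = [target]
    while path[-1] != source:
        r, c = path[-1]
        path.append(next(p for p in neighbours(r, c)
                         if label.get(p) == label[(r, c)] - 1))
    return path[::-1]


grid = ["....",
        ".##.",
        "...."]
print(lee_route(grid, (1, 0), (1, 3)))
```

Because every bin within reach may be labeled before the target is found, the memory and run time grow with the routing area. Emitting a wave from both ends, as described above, reduces the number of bins that have to be labeled.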

•76
Hightower or line search-Area routing algorithm
[For general shaped areas]

• 1. Extend lines from both the source and target toward each
other.
• 2. When an extended line, known as an escape line, meets
an obstacle, choose a point on the escape line from which to
project another escape line at right angles to the old one. This

•77
Special routing- CLK routing
• Gate arrays normally use a clock spine (a regular grid), eliminating the need
for special routing.
• The clock distribution grid is designed at the same time as the gate-array base to ensure
minimum clock skew and minimum clock latency, given power dissipation and clock-buffer
limitations.

• Cell-based ASICs may use either a clock spine, a clock tree, or a hybrid
approach.
• The figure shows how a clock router may minimize clock skew in a clock spine by making the path
lengths, and thus net delays, to every leaf node equal, using jogs in the interconnect paths if
necessary.

• More sophisticated clock routers perform clock-tree synthesis (automatically
choosing the depth and structure of the clock tree) and clock-buffer insertion (equalizing the
delay to the leaf nodes by balancing interconnect delays and buffer delays).

•78
Special routing- CLK routing

FIGURE: Clock routing. (a) A clock network for the cell-based ASIC.
(b) Equalizing the interconnect segments between CLK and all
destinations (by including jogs if necessary) minimizes clock
skew.
•79
Special routing- Power routing
• Power bus width
• Each of the power buses has to be sized according to the current it
will carry.

• Too much current in a power bus can lead to a failure through a
mechanism known as electromigration.

• The required power-bus widths can be estimated automatically
from library information, from a separate power simulation tool,
or by entering the power-bus widths to the routing software
by hand.

• Many routers use a default power-bus width so that it is quite easy
to complete routing of an ASIC without even knowing about this
problem.
•80
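
The sizing rule can be illustrated with simple arithmetic. The sketch below assumes an electromigration limit expressed as a maximum current per unit of bus width and a layout grid to which widths are rounded up; the function name `min_bus_width_um` and all of the numbers are hypothetical and do not come from any particular process or library.

```python
import math


def min_bus_width_um(cell_currents_ua, limit_ua_per_um, grid_um):
    """Smallest on-grid bus width that carries the summed cell currents.

    cell_currents_ua : current drawn by each cell fed by the bus, in microamps
    limit_ua_per_um  : assumed electromigration limit, microamps per micron of width
    grid_um          : layout grid; the width is rounded up to a multiple of it
    """
    total_ua = sum(cell_currents_ua)
    exact_um = total_ua / limit_ua_per_um
    return math.ceil(exact_um / grid_um) * grid_um


# Three cells drawing 200, 350, and 450 uA on a bus limited to 800 uA per micron.
print(min_bus_width_um([200, 350, 450], limit_ua_per_um=800, grid_um=0.5))
```

A router that silently applies a default width skips this calculation, so a bus feeding many cells may be narrower than the current it carries requires.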
Special routing- Power routing
• Gate-Array ASIC

• Gate arrays normally use a regular power grid as part of
the gate-array base.

• The gate-array logic cells contain two fixed-width power
buses inside the cell, running horizontally on m1.

• The horizontal m1 power buses are then strapped in a
vertical direction by m2 buses, which run
vertically across the chip.

•81
Special routing- Power routing
• Cell-based ASIC
• Standard cells are constructed in a similar fashion to gate-array cells,
with power buses running horizontally in m1 at the top and bottom of
each cell.

• A row of standard cells uses end-cap cells that connect to the VDD and VSS
power buses placed by the power router.

• Power routing of cell-based ASICs may include the option to include
vertical m2 straps at specified intervals.

• In a three-level metal process, power routing is similar to two-level
metal ASICs. Power buses inside the logic cells are still normally run
on m1. Using HVH routing it would be possible to run the power buses on
m3 and drop vias all the way down to m1 when power is required in the cells.
•82
Circuit Extraction
• After detailed routing is complete, the exact length and position of each
interconnect for every net is known.
• Now the parasitic capacitance and resistance associated with each
interconnect, via, and contact can be calculated.
• This data is generated by a circuit-extraction tool in one of the
following formats.

• Standard parasitic format (SPF)
• The standard parasitic format (SPF) describes interconnect delay
and loading due to parasitic resistance and capacitance.

• There are three different forms of SPF:
– Two of them (regular SPF and reduced SPF) contain the same
information, but in different formats, and model the behavior of
interconnect;
– The third form of SPF (detailed SPF) describes the actual parasitic
resistance and capacitance of a net.
•83
Circuit Extraction
• The load at the output of gate A is represented by one of three models: lumped-C,
lumped-RC, or PI segment.

Figure: The regular and reduced standard parasitic format (SPF) models for
interconnect. (a) An example of an interconnect network with fanout. The driving-point
admittance of the interconnect network is Y(s). (b) The SPF model of the interconnect.
(c) The lumped-capacitance interconnect model. (d) The lumped-RC interconnect
model. (e) The PI segment interconnect model.
The values of C, R, C1, and C2 are calculated so that Y1(s), Y2(s), and Y3(s) are the
first-, second-, and third-order Taylor-series approximations to Y(s).
•84
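
The matching of the PI segment to a third-order Taylor series can be sketched numerically. The sketch assumes the PI segment consists of a capacitor at the driver (`c_near`), a series resistor `r`, and a capacitor at the far end (`c_far`); which of these the figure labels C1 and which C2 is not stated here, so the names are deliberately neutral and, like the function names, are hypothetical. Expanding the admittance of that network, Y(s) = s*c_near + s*c_far / (1 + s*r*c_far), in powers of s gives the three coefficients computed by `admittance_moments`, and inverting those expressions recovers the element values.

```python
def admittance_moments(c_near, r, c_far):
    """Coefficients y1, y2, y3 of Y(s) = y1*s + y2*s**2 + y3*s**3 + ...
    for a PI segment: c_near at the driver, series r, c_far at the far end."""
    y1 = c_near + c_far
    y2 = -r * c_far ** 2
    y3 = r ** 2 * c_far ** 3
    return y1, y2, y3


def pi_from_moments(y1, y2, y3):
    """Element values of the PI segment whose expansion matches y1, y2, y3."""
    c_far = y2 ** 2 / y3
    r = -(y3 ** 2) / y2 ** 3
    c_near = y1 - c_far
    return c_near, r, c_far


y1, y2, y3 = admittance_moments(c_near=1.0, r=2.0, c_far=3.0)
print(y1, y2, y3)
print(pi_from_moments(y1, y2, y3))
```

The first coefficient alone is the total capacitance, which is all the lumped-C model keeps; the higher-order models add the information needed to represent resistive shielding of the far-end capacitance.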
Circuit Extraction
The key features of regular and reduced SPF are as follows:
• The loading effect of a net as seen by the driving gate is represented
by choosing one of three different RC networks: lumped-C, lumped-RC, or
PI segment (selected when generating the SPF) [O’Brien and Savarino, 1989].
• The pin-to-pin delays of each path in the net are modeled by a simple
RC delay (one for each path). This can be the Elmore constant for each path,
but it need not be.
• The reduced SPF (RSPF) contains the same information as regular
SPF, but uses the SPICE format.
• Detailed SPF:
• The detailed SPF (DSPF) shows the resistance and capacitance of
each segment in a net, again in a SPICE format. There are no
models or assumptions on calculating the net delays in this format.

•285
Design-Rule Check ( DRC )
• ASIC designers perform two major checks before fabrication.
• DRC:
• The first check is a design-rule check ( DRC ) to ensure that nothing
has gone wrong in the process of assembling the logic cells
and routing.

• The DRC may be performed at two levels.
• Phantom-Level DRC:
• The first level of DRC is a phantom-level DRC, which checks for
shorts, spacing violations, or other design-rule problems
between logic cells.
• This is principally a check of the detailed router.

• If the real library-cell layouts (sometimes called hard layout) can be
accessed, we can instantiate the phantom cells and perform a
second-level DRC at the transistor level.
•86
Design-Rule Check ( DRC )
• Dracula check:
• This is principally a check of the correctness of the library cells.

• Normally the ASIC vendor will perform this check using its own
software as a type of incoming inspection.

• The Cadence Dracula software is one de facto standard in this area,
and you will often hear reference to a Dracula deck that consists
of the Dracula code describing an ASIC vendor’s design rules.

• Sometimes ASIC vendors will give their Dracula decks to customers so
that the customers can perform the DRCs themselves.

•287
Design-Rule Check ( DRC )
• Layout Vs Schematic check:
• The layout versus schematic (LVS) check ensures that what is about to be
committed to silicon is what is really wanted.

• An electrical schematic is extracted from the
physical layout and compared to the netlist.

• This closes a loop between the logical and physical design
processes and ensures that both are the same.

• The LVS check is not as straightforward as it may sound,
however.
•88
Design-Rule Check ( DRC )
• Problems in LVS check:
• The first problem is that the transistor-level netlist for a large ASIC forms an
enormous graph.

• LVS software essentially has to match this graph against a reference
graph that describes the design.

• Ensuring that every node corresponds exactly to a corresponding
element in the schematic (or HDL code) is a very difficult task.

• The first step is normally to match certain key nodes (such as the
power supplies, inputs, and outputs), but the process can very
quickly become bogged down in the thousands of mismatch errors
that are inevitably generated initially.
•89
Design-Rule Check ( DRC )
• Problems in LVS check:
• The second problem with an LVS check is creating a true reference.

• The starting point may be HDL code or a schematic.

• Logic synthesis, test insertion, clock-tree synthesis, logical-to-physical
pad mapping, and several other design steps each modify
the netlist.

• The reference netlist may not be what we wish to fabricate.

• In this case designers increasingly resort to formal verification that
extracts a Boolean description of the function of the layout
and compares that to a known good HDL description.
•90
Thank you

•91
