
Module 2- Floorplanning, Placement & Routing Advanced VLSI (21EC71)

MODULE 2

Floorplanning, Placement & Routing

Floorplanning: Goals and objectives, Measurement of delay in Floorplanning, Floorplanning tools,
Channel definition, I/O and Power planning, and clock planning.
Placement: Goals and Objectives, Min-cut Placement algorithm, Iterative Placement Improvement, Timing-driven placement methods, Physical Design Flow.
Routing: Global Routing: Goals and objectives. Global Routing methods, Global routing between blocks,
Back annotation.

2.1 FLOORPLANNING GOALS & OBJECTIVES
➢ The input to a floorplanning tool is a hierarchical netlist that describes the interconnection
of the blocks (RAM, ROM, ALU, cache controller, and so on); the logic cells (NAND, NOR, D
flip-flop, and so on) within the blocks; and the logic cell connectors (the terms terminals, pins,
and ports all mean the same thing as connectors).
➢ The netlist is a logical description of the ASIC; the floorplan is a physical description of an
ASIC. Floorplanning is thus a mapping between the logical description (the netlist) and the
physical description (the floorplan).
➢ The goals of floorplanning are to:
• Arrange the blocks on a chip,
• Decide the location of the I/O pads,
• Decide the location and number of the power pads,
• Decide the type of power distribution, and
• Decide the location and type of clock distribution.


➢ The objectives of floorplanning are to

• Minimize the chip area

• Minimize delay.

2.2 MEASUREMENT OF DELAY IN FLOORPLANNING


➢ In floorplanning, the interconnect delay must be predicted before completion of any routing.
To predict delay we need to know the parasitics associated with interconnect which are:
• the interconnect capacitance (wiring capacitance or routing capacitance) and
• the interconnect resistance
➢ At the floorplanning stage, only the fanout (FO) of a net (the number of gates driven by a net)
and the size of the block that the net belongs to are known.
➢ The resistance of the various pieces of the interconnect path cannot be predicted since the
shape of the interconnect for a net is not known.


➢ However, the total length of the interconnect and the total capacitance can be estimated.
➢ The interconnect length can be estimated by collecting statistics from previously routed chips
and analyzing the results. From these statistics, tables that predict the interconnect
capacitance as a function of net fanout and block size are created.
➢ A floorplanning tool can then use these predicted-capacitance tables (also known as
interconnect-load tables or wire-load tables).
➢ Figure 2.1 shows how to derive and use wire-load tables and illustrates the following facts:

FIGURE 2.1 Predicted capacitance. (a) Interconnect lengths as a function of fanout (FO) and circuit-block size. (b) Wire-load table; there is only one capacitance value for each fanout (the average value). (c) The wire-load table predicts the capacitance and delay of a net.

➢ Typically between 60 and 70 percent of nets have a FO = 1.


➢ The distribution for a FO= 1 has a very long tail, stretching to interconnects that run from
corner to corner of the chip.
➢ The distribution for a FO = 1 often has two peaks, corresponding to a distribution for close
neighbours in subgroups within a block, superimposed on a distribution corresponding to
routing between subgroups.


➢ The distributions for FO > 1 are more symmetrical and flatter than for FO = 1.
➢ The wire-load tables can only contain one number, for example the average net capacitance,
for any one distribution. A tool may use a predicted capacitance for which we know 90 percent
of the nets will have less than the estimated capacitance.
➢ The statistical analysis for blocks with different sizes needs to be repeated. For example, a
net with a FO = 1 in a 25 k-gate block will have a different (larger) average length than if the
net were in a 5 k-gate block.
➢ The statistics depend on the shape (aspect ratio) of the block (usually the statistics are only
calculated for square blocks).
➢ The statistics will also depend on the type of netlist. For example, the distributions will be
different for a netlist generated by setting a constraint for minimum logic delay during
synthesis (which tends to generate large numbers of two-input NAND gates) than for netlists
generated using minimum-area constraints.
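As a small illustration of how such a table might be used (the block sizes, fanout buckets, capacitance values, and the simple delay model below are hypothetical, not taken from any real library), a floorplanning tool could estimate net capacitance and delay roughly as follows:

```python
# Sketch of a predicted-capacitance (wire-load) lookup; all numbers are made up.
# Keys are block sizes in gates; values map fanout to average net capacitance in pF.
WIRE_LOAD_TABLE = {
    5000:  {1: 0.02, 2: 0.04, 3: 0.06, 4: 0.08},   # 5 k-gate block
    25000: {1: 0.05, 2: 0.09, 3: 0.13, 4: 0.17},   # 25 k-gate block (longer average nets)
}

def predicted_net_capacitance(block_gates, fanout):
    """Pick the table for the nearest block size and clamp fanout to the table range."""
    block_size = min(WIRE_LOAD_TABLE, key=lambda g: abs(g - block_gates))
    table = WIRE_LOAD_TABLE[block_size]
    return table[min(max(fanout, 1), max(table))]

def predicted_net_delay(block_gates, fanout, drive_resistance_kohm, pin_cap_pf):
    """Lumped estimate: driver resistance times (predicted wire + pin) capacitance."""
    total_cap_pf = predicted_net_capacitance(block_gates, fanout) + fanout * pin_cap_pf
    return drive_resistance_kohm * total_cap_pf    # kohm * pF = ns

# A FO = 2 net in a 25 k-gate block driven by a 2 kohm driver into 0.01 pF pins:
print(predicted_net_delay(25000, 2, drive_resistance_kohm=2.0, pin_cap_pf=0.01))
```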
➢ Figure 2.2 shows that the worst-case interconnect delay increases because we do not
decrease chip size as we scale down feature size. One way to measure the worst-case delay
uses an interconnect that completely crosses the chip, a coast-to-coast interconnect.
FIGURE 2.2 Worst-case interconnect delay.

2.3 FLOORPLANNING TOOLS


➢ An initial random floorplan generated by a floorplanning tool is shown in Figure 2.3 (a).
➢ Two of the blocks, A and C in this example, are standard-cell areas. These are flexible blocks
(or variable blocks) because, although their total area is fixed, their shape (aspect ratio) and
connector locations may be adjusted during the placement step.
➢ The dimensions and connector locations of the other fixed blocks (such as RAM, ROM,
compiled cells, or megacells) can only be modified when they are created.
➢ We may force logic cells to be in selected flexible blocks by seeding. We choose seed cells by
name. For example, ram_control* would select all logic cells whose names started with

ram_control to be placed in one flexible block. The special symbol, usually '*', is a wildcard
symbol.
➢ Seeding may be hard or soft. A hard seed is fixed and not allowed to move during the
remaining floorplanning and placement steps.
➢ A soft seed is an initial suggestion only and can be altered if necessary by the floorplanner.
➢ We may also use seed connectors within flexible blocks, forcing certain nets to appear in a
specified order or location at the boundary of a flexible block.


FIGURE 2.3 Floorplanning a cell-based ASIC

➢ The floorplanner can complete an estimated placement to determine the positions of
connectors at the boundaries of the flexible blocks.


➢ The initial floorplan generated by the floorplanning tool is depicted in Figure 2.3(a). Two of the
blocks are flexible (A and C) and contain rows of standard cells (unplaced). A pop-up window
shows the status of block A.
➢ Figure 2.3(b) shows an estimated placement for flexible blocks A and C. It illustrates a rat's
nest display of the connections between blocks. Connections are shown as bundles between
the centers of blocks or as flight lines between connectors. The rat's nest display shows the
heavy congestion below block B.


➢ Figure 2.3 (c) shows how we can move the blocks in a floorplanning tool to minimize routing
congestion and improve the floorplan.
➢ Figure 2.3(d) shows the updated display with the reduced congestion after the changes.
➢ We need to control the aspect ratio of our floorplan because we have to fit the chip into the
die cavity (a fixed-size hole, usually square) inside a package.
➢ Figure 2.4(a) shows the initial floorplan with a 2:1.5 die aspect ratio.
➢ (b) shows altering the floorplan to give a 1:1 chip aspect ratio.
➢ (c) A trial floorplan with a congestion map. Blocks A and C have been placed so that we know
the terminal positions in the channels. Shading indicates the ratio of channel density to the
channel capacity. Dark areas show regions that cannot be routed because the channel

congestion exceeds the estimated capacity.
➢ (d) Resizing flexible blocks A and C alleviates congestion.
➢ Figure 2.4 (a) - (c) show how we can rearrange our chip to achieve a square aspect ratio.
➢ Figure 2.4 (c) also shows a congestion map, another form of routability display. There is no
standard measure of routability. Generally the interconnect channels (or wiring channels)
have a certain channel capacity; that is, they can handle only a fixed number of
interconnects.
➢ One measure of congestion is the difference between the number of interconnects that we
actually need, called the channel density, and the channel capacity.
➢ Another measure, shown in Figure 2.4(c), uses the ratio of channel density to the channel
capacity. With practice, we can create a good initial placement using floorplanning and a
pictorial display.

FIGURE 2.4 Congestion analysis.


➢ Figure 2.5 shows the routing of a T-junction between two channels in two-level metal. The dots
represent logic cell pins.
➢ Routing channel A (the stem of the T) first allows us to adjust the width of channel B (Fig 2.5a).
➢ If we route channel B first (the top of the T), this fixes the width of channel A. We have to
route the stem of a T-junction before we route the top (Fig 2.5b).

Fig 2.5 Routing a T junction between channels

2.4 CHANNEL DEFINITION


➢ During the floorplanning step we assign the areas between blocks that are to be used for
interconnect. This process is known as channel definition or channel allocation.
➢ Figure 2.5 shows a T-shaped junction between two rectangular channels and illustrates why
we must route the stem (vertical) of the T before the bar.
➢ The general problem of choosing the order of rectangular channels to route is channel
ordering. Defining the channel routing order for a slicing floorplan using a slicing tree is
shown in Fig. 2.6.


➢ Make a cut all the way across the chip between circuit blocks. Continue slicing until each
piece contains just one circuit block. Each cut divides a piece into two without cutting
through a circuit block.
➢ A sequence of cuts (1, 2, 3, and 4) successively slices the chip until only circuit blocks
are left. The slicing tree corresponding to the sequence of cuts gives the order in which to
route the channels: 4, 3, 2, and finally 1.

Fig 2.6 Defining the channel routing order for a slicing floorplan using a slicing tree


➢ Figure 2.6 shows a floorplan of a chip containing several blocks. Suppose we cut along the
block boundaries slicing the chip into two pieces (Figure 2.6 a). Then suppose we can slice
each of these pieces into two. If we can continue in this fashion until all the blocks are
separated, then we have a slicing floorplan ( Figure 2.6b).
➢ Figure 2.6(c) shows how the sequence we use to slice the chip defines a hierarchy of the
blocks. Reversing the slicing order ensures that we route the stems of all the channel
T-junctions first.
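In other words, routing the channels in the reverse of the slicing order corresponds to a post-order (children-first) walk of the slicing tree. A minimal sketch, using a hypothetical nested-tuple encoding of the tree:

```python
# A slicing tree encoded as nested tuples: (cut_label, left_piece, right_piece);
# leaves are circuit-block names. This particular tree is only an example.
slicing_tree = ("cut1",
                ("cut2", "A", "B"),
                ("cut3", "C", ("cut4", "D", "E")))

def channel_routing_order(node, order=None):
    """Post-order traversal: the deepest (last) cuts are routed first, the first cut last."""
    if order is None:
        order = []
    if isinstance(node, tuple):
        cut, left, right = node
        channel_routing_order(left, order)
        channel_routing_order(right, order)
        order.append(cut)      # the channel created by this cut is routed at this point
    return order

print(channel_routing_order(slicing_tree))   # ['cut2', 'cut4', 'cut3', 'cut1']
```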
➢ Figure 2.7 shows a floorplan that is not a slicing structure with a cyclic constraint that
prevents channel routing. This means we cannot route any of the channels in this floorplan
without routing all of the other channels first. There is a cyclic constraint in this floorplan.

➢ There are two solutions to this problem. One solution is to move the blocks until we obtain a
slicing floorplan. The other solution is to allow the use of L-shaped, rather than rectangular,
channels (or areas with fixed connectors on all sides, called switch boxes). We need an area-based
router rather than a channel router to route L-shaped regions or switch boxes.
➢ Fig 2.7(c) shows a floorplan that may be sliced (with initial cuts 1 or 2) and has no cyclic
constraints, but it is inefficient in area use and will be very difficult to route.
Figure 2.7 Cyclic constraints.

➢ Figure 2.8 (a) displays the floorplan of the ASIC. We can remove the cyclic constraint by
moving the blocks again, but this increases the chip size. Figure 2.8 (b) shows an alternative
solution. We merge the flexible standard cell areas A and C. We can do this by selective
flattening of the netlist. Sometimes flattening can reduce the routing area because routing
between blocks is usually less efficient than routing inside the row-based blocks. Figure 2.8(b)
shows the channel definition and routing order for our chip.

Figure 2.8 Channel definition and ordering


2.5 I/O & POWER PLANNING


➢ Every chip communicates with the outside world. Signals flow onto and off the chip and we
need to supply power. We need to consider the I/O and power constraints early in the
floorplanning process.
➢ A silicon chip or die is mounted on a chip carrier inside a chip package. Connections are
made by bonding the chip pads to fingers on a metal lead frame that is part of the package.
The metal lead-frame fingers connect to the package pins. A die consists of a logic core inside
a pad ring. Figure 2.9(a) shows a pad-limited die and Figure 2.9(b) shows a core-limited die.
➢ On a pad-limited die we use tall, thin pad-limited pads, which maximize the number of pads
we can fit around the outside of the chip. The number of pads determines the die size.

➢ On a core-limited die we use short, wide core-limited pads. The core logic determines the die
size.
➢ Figure 2.9(c) shows how we can use both types of pad to change the aspect ratio of a die to
be different from that of the core. Using both pad-limited pads and core-limited pads for a
square die.

Figure 2.9 Pad-limited and core-limited die.

➢ Special power pads are used for the positive supply, or VDD, power buses (or power rails)
and the ground or negative supply, VSS or GND.


➢ Usually one set of VDD/VSS pads supplies one power ring that runs around the pad ring and
supplies power to the I/O pads only.
➢ Another set of VDD/VSS pads connects to a second power ring that supplies the logic core.
➢ The I/O power is sometimes called dirty power since it has to supply large transient currents
to the output transistors. We keep dirty power separate to avoid injecting noise into the
internal-logic power (the clean power).
➢ I/O pads also contain special circuits to protect against electrostatic discharge (ESD). These
circuits can withstand very short high-voltage (several kilovolt) pulses that can be generated
during human or machine handling.


➢ Depending on the type of package and how the foundry attaches the silicon die to the chip
cavity in the chip carrier, there may be an electrical connection between the chip carrier and
the die substrate.
➢ Usually, the die is cemented in the chip cavity with a conductive epoxy, making an electrical
connection between substrate and the package cavity in the chip carrier. If we make an
electrical connection between the substrate and a chip pad, or to a package pin, it must be
to VDD (n-type substrate) or VSS (p-type substrate). This substrate connection (for the whole
chip) employs a down bond (or drop bond) to the carrier. We have several options:
• We can dedicate one (or more) chip pad(s) to down bond to the chip carrier.
• We can make a connection from a chip pad to the lead frame and down bond from the
chip pad to the chip carrier.
• We can make a connection from a chip pad to the lead frame and down bond from the
lead frame.
• We can down bond from the lead frame without using a chip pad.
• We can leave the substrate and/or chip carrier unconnected.
➢ Depending on the package design, the type and positioning of down bonds may be fixed. This
means we need to fix the position of the chip pad for down bonding using a pad seed.
➢ A double bond connects two pads to one chip-carrier finger and one package pin. A multiple-
signal pad or pad group is a set of pads. For example, an oscillator pad usually comprises a
set of two adjacent pads that we connect to an external crystal. The oscillator circuit and the
C
two signal pads form a single logic cell. Another common example is a clock pad.
➢ Some foundries allow a special form of corner pad (normal pads are edge pads) that squeezes
two pads into the area at the corners of a chip using a special two-pad corner cell, to help
meet bond-wire angle design rules (Figure 2.10 b and c).

➢ To reduce the series resistive and inductive impedance of power supply networks, it is normal
to use multiple VDD and VSS pads. This is particularly important with the simultaneously
switching outputs (SSOs) that occur when driving buses off-chip. Depending on the
technology it may be necessary to provide dedicated VDD and VSS pads for every few SSOs.
➢ These dedicated VDD/VSS pads must follow groups of output pads as they are seeded or
planned on the floorplan. With some chip packages this can become difficult because design
rules limit the location of package pins that may be used for supplies.
➢ Using a pad mapping we translate the logical pad in a netlist to a physical pad from a pad
library. The handling of I/O pads can become quite complex; there are several nonobvious
factors that must be considered when generating a pad ring:
• Ideally we would only need to design library pad cells for one orientation. For example,
an edge pad for the south side of the chip, and a corner pad for the southeast corner. We
could then generate other orientations by rotation and flipping (mirroring). Some ASIC
vendors will not allow rotation or mirroring of logic cells in the mask file. To avoid these

problems, we may need to have separate horizontal, vertical, left-handed, and right-
handed pad cells in the library with appropriate logical to physical pad mappings.
• If we mix pad-limited and core-limited edge pads in the same pad ring, this complicates
the design of corner pads. Usually, the two types of edge pad cannot abut. In this case a
corner pad also becomes a pad-format changer, or hybrid corner pad.
• In single-supply chips we have one VDD net and one VSS net, both global power nets. It
is also possible to use mixed power supplies (for example, 3.3 V and 5 V) or multiple
power supplies (digital VDD, analog VDD).
➢ Figure 2.10 (a) and (b) are magnified views of the southeast corner of our example chip and
show the different types of I/O cells. Figure 2.10 (c) shows a stagger-bond arrangement using

two rows of I/O pads.


Figure 2.10 Bonding pads.


➢ Figure 2.10 (d) shows an area-bump bonding arrangement (also known as flip-chip, solder-
bump or C4). Here the chip is turned upside down and solder bumps connect the pads to the
lead frame. Even though the bonding pads are located in the center of the chip, the I/O
circuits are still often located at the edges of the chip because of difficulties in power supply
distribution and integrating I/O circuits together with logic in the center of the die.


➢ In an MGA (masked gate array) the pad spacing and I/O-cell spacing is fixed—each pad
occupies a fixed pad slot (or pad site). This means that the properties of the pad I/O are also
fixed but, if we need to, we can parallel adjacent output cells to increase the drive.
➢ To increase flexibility further the I/O cells can use a separation, the I/O-cell pitch, that is
smaller than the pad pitch.
➢ For example, three 4 mA driver cells can occupy two pad slots. Then we can use two 4 mA
output cells in parallel to drive one pad, forming an 8 mA output pad as shown in Figure 2.11.
This arrangement also means the I/O pad cells can be changed without changing the base
array. This is useful as bonding techniques improve and the pads can be moved closer
together.

➢ Fig 2.11(a) Cell-based ASICs may contain pad cells of different sizes and widths. (b) A corner
of a gate-array base. (c) A gate-array base with different I/O cell and pad pitches.


Figure 2.11 Gate-array I/O pads

➢ Figure 2.12 shows two possible power distribution schemes. The long direction of a
rectangular channel is the channel spine. Some automatic routers may require that metal
lines parallel to a channel spine use a preferred layer (either m1, m2, or m3).

➢ Since we can have both horizontal and vertical channels, we may have the situation shown
in Figure 2.12, where we have to decide whether to use a preferred layer or the preferred
direction for some channels.
➢ Fig 2.12 (a) Power distributed using m1 for VSS and m2 for VDD. This helps minimize the
number of vias and layer crossings needed but causes problems in the routing channels.
➢ (b) In this floorplan m1 is run parallel to the longest side of all channels, the channel spine.
This can make automatic routing easier but may increase the number of vias and layer
crossings. (c) An expanded view of part of a channel. If power runs on different layers along
the spine of a channel, this forces signals to change layers. (d) A close-up of VDD and VSS buses
as they cross. Changing layers requires a large number of via contacts to reduce resistance.


Figure 2.12 Power distribution
2.6 CLOCK PLANNING
➢ Figure 2.13 (a) shows a clock spine routing scheme with all clock pins driven directly from
the clock driver.

➢ MGAs and FPGAs often use this fish bone type of clock distribution scheme. Figure 2.13(b)
shows a clock spine for a cell-based ASIC.
➢ Figure 2.13(c) shows the clock-driver cell, often part of a special clock-pad cell.
➢ Figure 2.13(d) illustrates clock skew and clock latency. Since all clocked elements are driven
from one net with a clock spine, skew is caused by differing interconnect lengths and loads.
➢ If the clock-driver delay is much larger than the interconnect delays, a clock spine achieves
minimum skew but with long latency.


Figure 2.13 Clock distribution.
(a) A clock spine for a gate array. (b) A clock spine for a cell-based ASIC (typical chips have
thousands of clock nets). (c) A clock spine is usually driven from one or more clock-driver
cells. Delay in the driver cell is a function of the number of stages and the ratio of output to
input capacitance for each stage (taper). (d) Clock latency and clock skew.

➢ Clock skew represents a fraction of the clock period that we cannot use for computation. A
clock skew of 500ps with a 200 MHz clock means that we waste 500ps of every 5ns clock
cycle, or 10 percent of performance.
➢ Latency can cause a similar loss of performance at the system level when we need to
resynchronize our output signals with a master system clock.


➢ Figure 2.13(c) illustrates the construction of a clock-driver cell. The delay through a chain of
CMOS gates is minimized when the ratio between the input capacitance C1 and the output
(load) capacitance C2 is about 3 (close to e ≈ 2.7). This means that the fastest way to drive a large load
is to use a chain of buffers with their input and output loads chosen to maintain this ratio,
or taper. This is not necessarily the smallest or lowest-power method.
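As a rough sketch of the taper argument (the per-stage delay constant and the load values below are invented for illustration, not taken from any cell library), the number of buffer stages and the resulting delay for a target taper of about 3 could be estimated like this:

```python
import math

def buffer_chain(c_in_pf, c_load_pf, taper=3.0, t_per_taper_ns=0.05):
    """Estimate stages and delay for a buffer chain driving c_load from c_in.

    Assumes, purely for illustration, that each stage's delay is proportional
    to its taper (output/input capacitance ratio); real delays come from the
    cell library.
    """
    ratio = c_load_pf / c_in_pf
    stages = max(1, round(math.log(ratio, taper)))   # choose N so taper**N ~ ratio
    per_stage_taper = ratio ** (1.0 / stages)        # actual ratio used per stage
    delay_ns = stages * t_per_taper_ns * per_stage_taper
    return stages, per_stage_taper, delay_ns

# Driving a 20 pF clock load from a 0.02 pF gate input:
print(buffer_chain(c_in_pf=0.02, c_load_pf=20.0))    # about 6 stages, taper near 3.2
```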


➢ We can design a tree of clock buffers so that the taper of each stage is about e ≈ 2.7 by using a
fanout of three at each node, as shown in Figure 2.14(a) and (b).
➢ The clock tree, shown in Figure 2.14 (c), uses the same number of stages as a clock spine,
but with a lower peak current for the inverter buffers. Figure 2.14 (c) illustrates that we now
have another problem—we need to balance the delays through the tree carefully to minimize
clock skew .

Figure 2.14 A clock tree.
(a) Minimum delay is achieved when the taper of successive stages is about 3. (b) Using a fanout
of three at successive nodes. (c) A clock tree for the cell-based ASIC of Figure 2.3(b). We have to
balance the clock arrival times at all of the leaf nodes to minimize clock skew.

➢ Designing a clock tree that balances the rise and fall times at the leaf nodes has the beneficial
side-effect of minimizing the effect of hot-electron wearout . This problem occurs when an
electron gains enough energy to become “hot” and jump out of the channel into the gate oxide.
➢ The trapped electrons change the threshold voltage of the device and this alters the delay of
the buffers. As the buffer delays change with time, this introduces unpredictable skew. The
problem is worst when the n -channel device is carrying maximum current with a high voltage
across the channel—this occurs during the rise-and fall-time transitions. Balancing the rise
and fall times in each buffer means that they all wear out at the same rate, minimizing any
additional skew.
➢ A phase-locked loop (PLL) is an electronic flywheel that locks in frequency to an input clock
signal. The input and output frequencies may differ in phase, however. This means that we
can, for example, drive a clock network with a PLL in such a way that the output of the clock
network is locked in phase to the incoming clock, thus eliminating the latency of the clock
network.


2b: PLACEMENT

➢ After completing a floorplan we can begin placement of the logic cells within the flexible
blocks. Placement is much more suited to automation than floorplanning.
➢ After we complete floorplanning and placement, we can predict both intrablock and interblock
capacitances. This allows us to return to logic synthesis with more accurate estimates of the
capacitive loads that each logic cell must drive.

2.1 PLACEMENT GOALS AND OBJECTIVES

The goal of a placement tool is to arrange all the logic cells within the flexible blocks on a chip.
Ideally, the objectives of the placement step are to

• Guarantee the router can complete the routing step
• Minimize all the critical net delays
• Make the chip as dense as possible
We may also have the following additional objectives:
• Minimize power dissipation
• Minimize cross talk between signals
Current placement tools use more specific and achievable criteria. The most commonly used
placement objectives are one or more of the following:
• Minimize the total estimated interconnect length
• Meet the timing requirements for critical nets
• Minimize the interconnect congestion
Each of these objectives in some way represents a compromise.

2.2 PLACEMENT ALGORITHMS



➢ There are two classes of placement algorithms commonly used in commercial CAD tools:
constructive placement and iterative placement improvement.

2.2.1 Min-cut Algorithm


➢ A constructive placement method uses a set of rules to arrive at a constructed placement.
The most commonly used methods are variations on the min-cut algorithm. The other
commonly used constructive placement algorithm is the eigenvalue method. As in system
partitioning, placement usually starts with a constructed solution and then improves it using
an iterative algorithm. In most tools we can specify the locations and relative placements of
certain critical logic cells as seed placements.
➢ The min-cut placement method uses successive application of partitioning. The following
steps are shown in Figure 2.1:
1. Cut the placement area into two pieces.
2. Swap the logic cells to minimize the cut cost.
3. Repeat the process from step 1, cutting smaller pieces until all the logic cells are placed.


➢ Figure 2.1(a) Divide the chip into bins using a grid.


➢ (b) Merge all connections to the center of each bin.
➢ (c) Make a cut and swap logic cells between bins to minimize the cost of the cut.
➢ (d) Take the cut pieces and throw out all the edges that are not inside the piece.
➢ (e) Repeat the process with a new cut and continue until we reach the individual bins.

Figure 2.1 Min-cut placement.

➢ Usually we divide the placement area into bins. The size of a bin can vary, from a bin size
equal to the base cell (for a gate array) to a bin size that would hold several logic cells. We
can start with a large bin size, to get a rough placement, and then reduce the bin size to get
a final placement.
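A highly simplified sketch of min-cut placement by recursive bisection follows. The partitioner here is a naive greedy swap (not a production Kernighan–Lin implementation), the cut direction is always vertical for brevity, and the cells and nets are hypothetical:

```python
def cut_cost(left, nets):
    """Number of nets with at least one cell on each side of the cut."""
    left_set = set(left)
    return sum(1 for net in nets
               if any(c in left_set for c in net) and any(c not in left_set for c in net))

def bisect(cells, nets):
    """Split the cells in half, then greedily swap pairs across the cut to reduce its cost."""
    half = len(cells) // 2
    left, right = list(cells[:half]), list(cells[half:])
    improved = True
    while improved:
        improved = False
        for i in range(len(left)):
            for j in range(len(right)):
                before = cut_cost(left, nets)
                left[i], right[j] = right[j], left[i]           # trial swap
                if cut_cost(left, nets) < before:
                    improved = True                             # keep the better swap
                else:
                    left[i], right[j] = right[j], left[i]       # undo
    return left, right

def min_cut_place(cells, nets, region=(0.0, 0.0, 1.0, 1.0), placement=None):
    """Recursively cut the region and its cells in two until each piece is one bin."""
    if placement is None:
        placement = {}
    x0, y0, x1, y1 = region
    if len(cells) == 1:
        placement[cells[0]] = ((x0 + x1) / 2, (y0 + y1) / 2)    # bin center
        return placement
    left, right = bisect(cells, nets)
    # "Throw out" nets that are not entirely inside each piece before recursing.
    left_nets = [n for n in nets if all(c in left for c in n)]
    right_nets = [n for n in nets if all(c in right for c in n)]
    xm = (x0 + x1) / 2
    min_cut_place(left, left_nets, (x0, y0, xm, y1), placement)
    min_cut_place(right, right_nets, (xm, y0, x1, y1), placement)
    return placement

nets = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d"), ("b", "d")]
print(min_cut_place(["a", "b", "c", "d"], nets))
```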

2.2.2 Iterative Placement Improvement


An iterative placement improvement algorithm takes an existing placement and tries to improve
it by moving the logic cells. There are two parts to the algorithm:
• The selection criteria that decides which logic cells to try moving.
• The measurement criteria that decides whether to move the selected cells.
There are several interchange or iterative exchange methods that differ in their selection and
measurement criteria:
• pairwise interchange,
• force-directed interchange,
• force-directed relaxation, and
• force-directed pairwise relaxation.


➢ All of these methods usually consider only pairs of logic cells to be exchanged. A source logic
cell is picked for trial exchange with a destination logic cell.
➢ The most widely used methods use group migration, especially the Kernighan–Lin algorithm.
The pairwise-interchange algorithm is similar to the interchange algorithm used for iterative
improvement in the system partitioning step:
1. Select the source logic cell at random.
2. Try all the other logic cells in turn as the destination logic cell.
3. Use any of the measurement methods we have discussed to decide on whether to accept
the interchange.
4. The process repeats from step 1, selecting each logic cell in turn as a source logic cell.
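A bare-bones sketch of this pairwise-interchange loop, using total estimated (Manhattan) wirelength as the measurement criterion; the placement and nets below are hypothetical:

```python
import random

def total_wirelength(placement, nets):
    """Sum of Manhattan distances between the two cells of each two-terminal net."""
    return sum(abs(placement[a][0] - placement[b][0]) +
               abs(placement[a][1] - placement[b][1]) for a, b in nets)

def pairwise_interchange(placement, nets, passes=20):
    cells = list(placement)
    for _ in range(passes):
        src = random.choice(cells)                  # 1. pick a source cell at random
        for dst in cells:                           # 2. try every other cell as destination
            if dst == src:
                continue
            before = total_wirelength(placement, nets)
            placement[src], placement[dst] = placement[dst], placement[src]
            if total_wirelength(placement, nets) >= before:     # 3. keep only improvements
                placement[src], placement[dst] = placement[dst], placement[src]   # undo
    return placement

# Hypothetical 2 x 2 placement (grid coordinates) and three two-terminal nets:
placement = {"a": (0, 0), "b": (1, 0), "c": (0, 1), "d": (1, 1)}
nets = [("a", "d"), ("b", "c"), ("a", "c")]
print(pairwise_interchange(placement, nets))
```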

➢ Figure 2.2 (a) and (b) show how we can extend pairwise interchange to swap more than two
logic cells at a time. If we swap l logic cells at a time and find a locally optimum solution, we
say that solution is l-optimum.
➢ The neighborhood exchange algorithm is a modification to pairwise interchange that
considers only destination logic cells in a neighborhood —cells within a certain distance, e,
of the source logic cell. Limiting the search area for the destination logic cell to the e-
neighborhood reduces the search time.
➢ Figure 2.2 (c) and (d) show the one- and two-neighborhoods (based on Manhattan distance)
for a logic cell.

Figure 2.2 Interchange.



➢ (a) Swapping the source logic cell with a destination logic cell in pairwise interchange.
➢ (b) Sometimes we have to swap more than two logic cells at a time to reach an optimum
placement, but this is expensive in computation time.
➢ Limiting the search to neighborhoods reduces the search time. Logic cells within a distance
e of a logic cell form an e-neighborhood. (c) A one-neighborhood. (d) A two-neighborhood.
➢ Neighborhoods are also used in some of the force-directed placement methods.
➢ Imagine identical springs connecting all the logic cells we wish to place. The number of
springs is equal to the number of connections between logic cells. The effect of the springs is
to pull connected logic cells together. The more highly connected the logic cells, the stronger


the pull of the springs. The force on a logic cell i due to logic cell j is given by Hooke's law,
which says the force of a spring is proportional to its extension:

F_ij = c_ij x_ij

➢ The vector component x_ij is directed from the center of logic cell i to the center of logic cell j.
The vector magnitude is calculated as either the Euclidean or Manhattan distance between
the logic cell centers. The c_ij form the connectivity or cost matrix (the matrix element c_ij is
the number of connections between logic cell i and logic cell j). Fig 2.3 illustrates the force-
directed placement algorithm.

Figure 2.3 Force-directed placement.
(a) A network with nine logic cells. (b) We make a grid (one logic cell per bin). (c) Forces
are calculated as if springs were attached to the centers of each logic cell for each connection.
The two nets connecting logic cells A and I correspond to two springs.
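A small sketch of this force calculation for one cell (the positions and connectivity entries below are invented; the two connections between A and I give c_AI = 2, as in the figure):

```python
# Sketch: net spring force on a logic cell, F_ij = c_ij * x_ij, summed over neighbors.
# positions are (x, y) cell centers; connectivity holds c_ij, the number of
# connections between cells i and j (a symmetric cost matrix, stored sparsely).
positions = {"A": (0.0, 0.0), "B": (3.0, 0.0), "I": (0.0, 4.0)}
connectivity = {("A", "B"): 1, ("A", "I"): 2}    # two nets between A and I

def force_on(cell):
    fx = fy = 0.0
    for (i, j), c_ij in connectivity.items():
        if cell not in (i, j):
            continue
        other = j if cell == i else i
        dx = positions[other][0] - positions[cell][0]   # x_ij points from this cell
        dy = positions[other][1] - positions[cell][1]   # toward the connected cell
        fx += c_ij * dx
        fy += c_ij * dy
    return fx, fy

print(force_on("A"))   # pulled toward B and, twice as strongly, toward I: (3.0, 8.0)
```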
➢ In the definition of connectivity it was pointed out that the network graph does not accurately
model connections for nets with more than two terminals. Nets such as clock nets, power
nets, and global reset lines have a huge number of terminals.
➢ Without external forces to counteract the pull of the springs between logic cells, the network
will collapse to a single point as it settles. An important part of force-directed placement is
fixing some of the logic cells in position.


➢ Figure 2.4 illustrates the different kinds of force-directed placement algorithms. The force-
directed interchange algorithm uses the force vector to select a pair of logic cells to swap.
➢ In force-directed relaxation a chain of logic cells is moved. The force-directed pairwise
relaxation algorithm swaps one pair of logic cells at a time.
Figure 2.4 Force-directed iterative placement improvement. (a) Force-directed interchange. (b) Force-directed relaxation. (c) Force-directed pairwise relaxation.


2.2.3 Timing-Driven Placement Methods


➢ Minimizing delay is becoming more and more important as a placement objective. There are
two main approaches: net based and path based. We know that we can use net weights in
our algorithms. The problem is to calculate the weights. One method finds the ‘n’ most critical
paths (using a timing-analysis engine, in the synthesis tool). The net weights might then be
the number of times each net appears in this list. The problem with this approach is that as
soon as we fix (for example) the first 100 critical nets, suddenly another 200 become critical.
➢ Another method to find the net weights uses the zero-slack algorithm. Figure 2.5 shows
how this works (all times are in nanoseconds).
➢ Figure 2.5 (a) shows a circuit with primary inputs at which we know the arrival times of

each signal. We also know the required times for the primary outputs —the points in time
at which we want the signals to be valid.
➢ We can work forward from the primary inputs and backward from the primary outputs to
determine arrival and required times at each input pin for each net. The difference between
the required and arrival times at each input pin is the slack time (the time we have to spare).
➢ The zero-slack algorithm adds delay to each net until the slacks are zero, as shown
in Figure 2.5 (b). The net delays can then be converted to weights or constraints in the
placement. Notice that we have assumed that all the gates on a net switch at the same time
so that the net delay can be placed at the output of the gate driving the net.
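A minimal sketch of the arrival/required/slack bookkeeping that the zero-slack algorithm starts from (the tiny network, delays, and arrival/required times below are hypothetical; the algorithm would then add net delay until every slack is zero):

```python
# Edges (u, v, delay_ns) mean a signal leaving u arrives at v after 'delay_ns'.
# The edge list is given in topological order of the (hypothetical) network.
edges = [("in1", "g1", 1.0), ("in2", "g1", 1.0), ("g1", "g2", 2.0), ("g2", "out", 1.0)]
arrival = {"in1": 0.0, "in2": 0.5}        # known arrival times at the primary inputs
required = {"out": 5.0}                   # required time at the primary output

for u, v, d in edges:                     # forward pass: latest arrival time at each pin
    arrival[v] = max(arrival.get(v, 0.0), arrival[u] + d)

for u, v, d in reversed(edges):           # backward pass: earliest required time
    required[u] = min(required.get(u, float("inf")), required[v] - d)

slack = {node: required[node] - arrival[node] for node in arrival}
print(slack)   # e.g. {'in1': 1.0, 'in2': 0.5, 'g1': 0.5, 'g2': 0.5, 'out': 0.5}
```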
➢ An important point to remember is that adjusting the net weight, even for every net on a chip,
does not theoretically make the placement algorithms any more complex—we have to deal
with the numbers anyway.
➢ The practical problem, however, is getting the weight information for each net (usually in the
form of timing constraints) from a synthesis tool or timing verifier. These files can easily be
hundreds of megabytes in size.


➢ With the zero-slack algorithm we simplify but overconstrain the problem. For example, we
might be able to do a better job by making some nets a little longer than the slack indicates
if we can tighten up other nets.
➢ There is still the question of how to predict path delays between gates with only placement
information. Usually we still do not compute a routing tree but use simple approximations to
the total net length (such as the half-perimeter measure) and then use this to estimate a net
delay (the same to each pin on a net). It is not until the routing step that we can make
accurate estimates of the actual interconnect delays.
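For reference, the half-perimeter measure mentioned above is simply half the perimeter of the bounding box of a net's pin positions; a minimal sketch:

```python
def half_perimeter_wirelength(pins):
    """Bounding-box (half-perimeter) estimate of net length; pins are (x, y) points."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

# A three-pin net spanning a 4 x 2 bounding box gives an estimated length of 6:
print(half_perimeter_wirelength([(0, 0), (4, 1), (2, 2)]))
```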



Figure 2.5 The zero-slack algorithm.



(a) The circuit with no net delays. (b) The zero-slack algorithm adds net delays (at the
outputs of each gate, equivalent to increasing the gate delay) to reduce the slack times to
zero.


2.3 Physical Design Flow


➢ Historically placement was included with routing as a single tool (the term P&R is often used
for place and route). Because interconnect delay now dominates gate delay, the trend is to
include placement within a floorplanning tool and use a separate router.
➢ Figure 2.6 shows a design flow using synthesis and a floorplanning tool that includes
placement. This flow consists of the following steps:
1. Design entry. The input is a logical description with no physical information.
2. Synthesis. The initial synthesis contains little or no information on any interconnect loading.
The output of the synthesis tool (typically an EDIF netlist) is the input to the floorplanner.
3. Initial floorplan. From the initial floorplan interblock capacitances are input to the synthesis

tool as load constraints and intrablock capacitances are input as wire-load tables.
4. Synthesis with load constraints. At this point the synthesis tool is able to resynthesize the
logic based on estimates of the interconnect capacitance each gate is driving. The synthesis
tool produces a forward annotation file to constrain path delays in the placement step.
5. Timing-driven placement. After placement using constraints from the synthesis tool, the
location of every logic cell on the chip is fixed and accurate estimates of interconnect delay
can be passed back to the synthesis tool.
6. Synthesis with in-place optimization (IPO). The synthesis tool changes the drive strength
of gates based on the accurate interconnect delay estimates from the floorplanner without
altering the netlist structure.
7. Detailed placement. The placement information is ready to be input to the routing step.

Figure 2.6 Timing-driven floorplanning and placement design flow.


2c: ROUTING
➢ Once the designer has floorplanned a chip and the logic cells within the flexible blocks have
been placed, it is time to make the connections by routing the chip. This is still a hard problem
that is made easier by dividing it into smaller problems. Routing is usually split into global
routing followed by detailed routing.
➢ The details of global routing differ slightly between cell-based ASICs, gate arrays, and FPGAs,
but the principles are the same in each case. A global router does not make any connections,
it just plans them. We typically global route the whole chip (or large pieces if it is a large chip)
before detail routing the whole chip (or the pieces). There are two types of areas to global
route: inside the flexible blocks and between blocks (the Viterbi decoder, although a cell-

based ASIC, only involved the global routing of one large flexible block).

2.1 GOALS AND OBJECTIVES

➢ The input to the global router is a floorplan that includes the locations of all the fixed and
flexible blocks; the placement information for flexible blocks; and the locations of all the logic
cells. The goal of global routing is to provide complete instructions to the detailed router on
where to route every net. The objectives of global routing are one or more of the following:

• Minimize the total interconnect length.

• Maximize the probability that the detailed router can complete the routing.

• Minimize the critical path delay.


➢ In both floorplanning and placement, with minimum interconnect length as an objective, it
is necessary to find the shortest total path length connecting a set of terminals .
➢ Floorplanning and placement both assume that interconnect may be put anywhere on a
rectangular grid, since at this point nets have not been assigned to the channels, but the
global router must use the wiring channels and find the actual path. Often the global router
needs to find a path that minimizes the delay between two terminals.

2.2 GLOBAL ROUTING METHODS


➢ Global routing cannot use the interconnect length approximations such as the half perimeter
measure that were used in placement. One approach to global routing takes each net in turn
and calculates the shortest path using tree on graph algorithms with the added restriction of
using the available channels. This process is known as sequential routing.
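For a two-terminal net, the "tree on graph" step reduces to a shortest path on the channel-intersection graph, with edge weights equal to channel lengths. A compact sketch (the graph below is hypothetical) using Dijkstra's algorithm:

```python
import heapq

# Hypothetical channel-intersection graph: node -> {neighbor: channel_length}.
channel_graph = {
    "n1": {"n2": 4.0, "n3": 2.0},
    "n2": {"n1": 4.0, "n4": 3.0},
    "n3": {"n1": 2.0, "n4": 6.0},
    "n4": {"n2": 3.0, "n3": 6.0},
}

def shortest_channel_path(graph, source, target):
    """Dijkstra's algorithm; returns (total channel length, list of graph nodes)."""
    queue = [(0.0, source, [source])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == target:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, length in graph[node].items():
            if neighbor not in visited:
                heapq.heappush(queue, (dist + length, neighbor, path + [neighbor]))
    return float("inf"), []

print(shortest_channel_path(channel_graph, "n1", "n4"))   # (7.0, ['n1', 'n2', 'n4'])
```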
➢ As a sequential routing algorithm proceeds, some channels will become more congested since
they hold more interconnects than others. In the case of FPGAs and channelled Gate
Arrays, the channels have a fixed channel capacity and can only hold a certain number of
interconnects.
➢ There are two different ways that a global router handles this problem. Using order-
independent routing, a global router proceeds by routing each net, ignoring how crowded the

channels are. Whether a particular net is processed first or last does not matter; the channel
assignment will be the same.
➢ In order-independent routing, after all the interconnects are assigned to channels, the global
router returns to those channels that are the most crowded and reassigns some interconnects
to other, less crowded channels.
➢ Alternatively, a global router can consider the number of interconnects already placed in
various channels as it proceeds. In this case the global routing is order dependent: the
routing is still sequential, but now the order of processing the nets will affect the results.
➢ In contrast to sequential global routing methods, which handle nets one at a time,
hierarchical routing handles all nets at a particular level at once. Rather than handling all of
the nets on the chip at the same time, the global routing problem is made more tractable by
dividing the chip area into levels of hierarchy.
➢ By considering only one level of hierarchy at a time, the size of the problem is reduced at each
level. There are two ways to traverse the levels of hierarchy: starting at the whole chip (the
highest level) and proceeding down to the logic cells is the top-down approach; the bottom-up
approach starts at the lowest level of hierarchy and globally routes the smallest areas first.
2.2.1 GLOBAL ROUTING BETWEEN BLOCKS

➢ Figure 2.1 illustrates the global routing problem for a cell-based ASIC. Each edge in the
channel-intersection graph in Figure 2.1(c) represents a channel. The global router is
restricted to using these channels. The weight of each edge in the graph corresponds to the
length of the channel. The global router plans a path for each interconnect using this graph.

Figure 2.1 Global routing for a cell-based ASIC formulated as a graph problem.

(a) A cell-based ASIC with numbered channels. (b) The channels form the edges of a graph.
(c) The channel-intersection graph. Each channel corresponds to an edge on a graph whose
weight corresponds to the channel length.


➢ Figure 2.2 shows an example of global routing for a net with five terminals, labelled A1
through F1, for the cell-based ASIC shown in Figure 2.1. If a designer wishes to use minimum
total interconnect path length as an objective, the global router finds a minimum-length tree
for the net that uses the channels and takes into account the channel capacities (Figure 2.2 b).

Figure 2.2 Finding paths in global routing
(a) A cell-based ASIC showing a single net with a fanout of 4 (five terminals). We have to order
the numbered channels to complete the interconnect path for terminals A1 through F1.
(b) The terminals are projected to the centre of the nearest channel, forming a graph. A
minimum-length tree for the net uses the channels and takes into account the channel
capacities.
(c) The minimum-length tree does not necessarily correspond to minimum delay. If we wish
to minimise the delay from terminal A1 to D1, a different tree might be better.

➢ Global routing is very similar for cell-based ASICs and gate arrays, but there is a very
important difference between the types of channels in these ASICs.
➢ The size of the channels in sea-of-gates arrays, channelless gate arrays, and cell-based ASICs
can be varied to make sure there is enough space to complete the wiring.
➢ In channelled gate arrays and FPGAs the size, number, and location of channels are fixed.
➢ The global router can allocate as many interconnects to each channel as it likes, since that
space is committed anyway, but there is a maximum number of interconnects that each
channel can hold.
➢ If the global router needs more room, even in just one channel on the whole chip, the designer
has to repeat the placement and routing steps and try again (or use a bigger chip).


2.3 BACK ANNOTATION

➢ After global routing is complete it is possible to accurately predict what the length of each
interconnect in every net will be after detailed routing. The global router can give us not just
an estimate of the total net length but also the resistance and capacitance of each path in
each net. This RC information is used to calculate net delays.
➢ We can back annotate this net delay information to the synthesis tool for in-place
optimisation or to a timing verifier to make sure there are no timing surprises. Differences
in timing predictions at this point arise due to the different ways in which the placement
algorithms estimate the paths and the way the global router actually builds the paths.
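As a sketch of how such RC information might be turned into a back-annotated net delay (the per-unit resistance and capacitance and the 0.69*RC lumped model below are illustrative assumptions, not values from any real process):

```python
# Sketch: lumped-RC net delay from the length of the path planned by the global router.
R_PER_MM_OHM = 50.0     # interconnect resistance per mm (illustrative value)
C_PER_MM_PF = 0.2       # interconnect capacitance per mm (illustrative value)

def back_annotated_net_delay_ps(path_length_mm, driver_resistance_ohm, load_cap_pf):
    """0.69 * R * C estimate using total path R and C plus the driver and pin load."""
    r_wire = R_PER_MM_OHM * path_length_mm
    c_wire = C_PER_MM_PF * path_length_mm
    tau_ps = (driver_resistance_ohm + r_wire) * (c_wire + load_cap_pf)   # ohm * pF = ps
    return 0.69 * tau_ps

# A 3 mm planned path driven by a 1 kohm driver into 0.05 pF of pin load:
print(back_annotated_net_delay_ps(3.0, 1000.0, 0.05))   # delay in picoseconds
```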
