
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 16, NO. 4, APRIL 1997 333

Minimal Buffer Insertion in Clock Trees with Skew and Slew Rate Constraints
Gustavo E. Téllez, Associate Member, IEEE, and Majid Sarrafzadeh, Fellow, IEEE

Manuscript received October 4, 1994; revised September 22, 1995 and February 19, 1997. This work was supported in part by NSF under Grant MIP 9207267 and by the IBM Ph.D. Resident Study Program. This paper was presented in part at the Proceedings of ICCAD-94. This paper was recommended by Associate Editor C.-K. Cheng.
G. E. Téllez is with IBM Corporation, East Fishkill, NY 12533-0999 USA.
M. Sarrafzadeh is with the Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL 60208 USA.
Publisher Item Identifier S 0278-0070(97)05483-3.
0278–0070/97$10.00 © 1997 IEEE

Abstract: In this paper, we investigate the problem of computing a lower bound on the number of buffers required when given a maximum clock slew rate (or rise time) constraint and a predefined clock tree. Using generalized properties of published CMOS timing models, we formulate a novel nonlinear buffer insertion problem. Next, we derive an algorithm that bounds the capacitance for each buffer stage without sacrificing the generality of the timing models. With this capacitance bound we formulate a second linear buffer insertion problem, which we solve optimally in O(n) time. The basic formulation and algorithm are extended to include a skew upper bound constraint. Using these algorithms we propose further algorithmic extensions that allow area and phase delay tradeoffs. Our results are verified using SPICE3e2 simulations with MCNC MOSIS 2.0-μm models and parameters. Experiments with these test cases show that the buffer insertion algorithms proposed herein can be used effectively for designs with high clock speeds and small skews.

Index Terms: Buffered clock tree, clock phase delay, clock slew rate, clock skew.

I. INTRODUCTION

MODERN high-speed digital systems are designed with a target clock period (or clock frequency), which determines the rate of data processing. A clock network distributes the clock signal from the clock generator, or source, to the clock inputs of the synchronizing components, or sinks. This must be done while maintaining the integrity of the signal, and minimizing (or at least upper bounding) the following clock parameters:
• the clock skew, which is defined as the maximum difference of the delays from the clock source to the clock pins;
• the clock slew rate (or rise time) of the signals at the clock pins, which is defined as the time it takes the waveform to change from a specified low voltage to a specified high voltage;
• the clock phase delay (or latency), which is defined as the maximum delay from the clock source to any clock pin;
• the sensitivity to parametric variations of the clock skew.

Additionally, these objectives must be attained while minimizing the use of system resources such as power and area. In high performance systems these constraints can be quite severe; consequently the layout of a good clock distribution network is difficult and time consuming. In this paper we will study some of these problems in buffered clock trees.

Work on clock trees has focused on zero or near zero-skew routing [2], [6], [13]. In addition to zero-skew, further work concentrates on routing clock trees with minimal total wire length [7], [4], [10]. The construction of clock trees that minimize phase delay of the clock signal has been studied in [9] and [11].

Consider a buffered clock tree or clock power-up tree, which is a tree that contains buffers in its source to sink paths (see Fig. 1). The clock period is limited in a clock tree by the longest rise time in the clocking network [1]. The rise time of a clock signal depends on several factors: the output impedance of the clock driver, the parasitic loads of the clock wiring (wiring load), and the input loads of the driven gates (gate load). The addition of a buffer to a clock path reduces the wiring load (and may also reduce the gate load) to the clock driver. Hence, one may reduce the clock slew rates by adding buffers in a clock tree. The previous discussion implies that a minimum number of clock buffers is needed to maintain the clock signal integrity. The following facts motivate minimizing the number of buffers used in a clock tree.
1) The total capacitance, and therefore the total power consumed in driving the clock signal, is also minimized.
2) In high-speed designs, the number of buffers may be large enough to be of significant impact in the total chip area.

Fig. 1. A buffered clock tree with synchronizing elements A, B, and C, source $s_0$, sinks $\{s_1, s_2, s_3\}$, buffers $\{v_0, v_2, v_3\}$, and Steiner nodes $\{v_1, v_4\}$.
In this paper we analyze the problem of computing the minimum number of buffers needed to satisfy a given upper bound clock slew rate in an arbitrary clock tree. As far as we know, the problem of constructing a clock tree with minimal number of buffers, subject to a maximum clock slew rate, has not been posed or formalized. Work on buffered clock trees has focused on minimizing the phase delay of the clock tree [16], [18], on the assumption that reduced clock tree delays also reduce skew and skew sensitivity to process variation. In these works the delay model usually consists of a fixed on-resistance and delay for each buffer. However, it is well known that the buffer on-resistance and delay are at least functions of the input wave shape (rise time) and the output load capacitance [5] (see Fig. 2). Hence, the results obtained from the above algorithms can be inaccurate and may result in wasteful clock tree designs. More recently, work in [15] considers the minimization of skew and delay in the presence of process variations in buffered clock trees. In that work, placement of buffers is limited to tree nodes at a given level in a tree. For a survey on clock network construction issues, see [3] and [12].

Fig. 2. (a) Buffer design used in experiments and test cases. The channel widths of the buffer's transistors are parameterized by width $W$. (b) Illustration of a rise time computation. (c) Rise time function obtained by SPICE simulations using MCNC MOSIS 2.0-μm models on a buffer with $W = 4$ μm.

Consider the early design of a clock distribution network. A designer would like to estimate and obtain tradeoffs on several clock design choices. Furthermore, the effectiveness of these choices depends on the precise clock design objectives, such as bounds on power consumption and acceptable clock skews. The algorithms proposed in this paper are intended to be used in the early design of buffered clock trees. The methodology that one may use with these algorithms is as follows. Assume that an initial placement of storage elements is available. Then generate a clock routing by applying an algorithm such as those proposed in [2], [6], and [13]. Given this clock tree, a slew rate objective, and a skew objective, the algorithms proposed in this paper return the minimum number of buffers required to satisfy the objective and estimates of the locations of these buffers in the clock tree, and thus on the chip.

Since these algorithms do not consider module placement constraints, it is very likely that the resulting buffer locations can only be used as an initial guess for the final buffer locations. If the designer is mostly concerned about the correctness of the clock tree, then the algorithm proposed in [8] is sufficient to produce a final layout of the clock tree. In the more general case where, for example, buffers must be inserted in a layout with timing constraints, this algorithm is not sufficient, and more sophisticated detailed clock design algorithms are needed.

For this work, we make the following assumptions.
• A routing of the clock tree is given. A single wire width is used in the clock tree routing.
• The clock slew rate, denoted $\tau_{\max}$, is a constraint on the maximum allowable rise time on any buffer input signal. This constraint guarantees the correct operation of the clock tree and will be shown to be useful in manipulating the clock tree design.
• The clock tree will be buffered with a single type of CMOS buffer. This assumption is not unusual since using a single buffer type is an accepted strategy used in reducing skew and skew process-variation sensitivity.

We also consider several extensions to this problem.
• Compute the minimum number of buffers for a buffered clock tree with bounded buffer skew. The buffer skew is the largest difference in the number of buffers in any source to sink path.
• Compute a zero-buffer-skew solution which yields the minimum number of buffer levels needed to satisfy a given clock slew rate upper bound.
• Compute the minimum number of buffers given the objective of minimizing the longest buffered path in a clock tree. The longest buffered path is the source to sink path with the largest number of buffers.
• Compute any of the above objectives given that the slew rate constraint only applies to the clock sinks.
• Given a library of buffers of different sizes, obtain area and delay tradeoffs. Furthermore, we show that we can obtain these tradeoffs by varying the clock slew rate constraint.
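The two path-based measures defined in the list above (buffer skew and longest buffered path) can be illustrated with a minimal Python sketch. The `ClockNode` class, its fields, and the small example tree are hypothetical names introduced only for this illustration; they are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ClockNode:
    """Hypothetical clock-tree node: a buffer, a Steiner node, or a sink (leaf)."""
    name: str
    is_buffer: bool = False
    children: List["ClockNode"] = field(default_factory=list)


def buffered_path_range(node: ClockNode) -> Tuple[int, int]:
    """Return (fewest, most) buffers met on any path from `node` down to a sink."""
    own = 1 if node.is_buffer else 0
    if not node.children:  # a sink ends the path
        return own, own
    ranges = [buffered_path_range(child) for child in node.children]
    fewest = min(lo for lo, _ in ranges)
    most = max(hi for _, hi in ranges)
    return own + fewest, own + most


def buffer_skew(root: ClockNode) -> int:
    """Largest difference in buffer count between any two source-to-sink paths."""
    fewest, most = buffered_path_range(root)
    return most - fewest


def longest_buffered_path(root: ClockNode) -> int:
    """Largest number of buffers on any source-to-sink path."""
    return buffered_path_range(root)[1]


# Illustrative tree (an assumption, not Fig. 1): buffer b0 drives Steiner node v1,
# which feeds sink s1 directly and sinks s2, s3 through a second buffer b1.
s1, s2, s3 = ClockNode("s1"), ClockNode("s2"), ClockNode("s3")
b1 = ClockNode("b1", is_buffer=True, children=[s2, s3])
v1 = ClockNode("v1", children=[s1, b1])
b0 = ClockNode("b0", is_buffer=True, children=[v1])
```

In this example the path to `s1` crosses one buffer and the paths to `s2` and `s3` cross two, so the tree is not a zero-buffer-skew solution.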
The outline of this paper is as follows. In Section II we introduce some definitions and terminology to be used in the remainder of the paper. In Section III we describe the assumptions made about the timing models to solve the problems posed in this paper. In Section IV we formulate the nonlinear buffer insertion problem with no assumptions, and we proceed to apply the assumptions made in the previous section to simplify the problem. In Section V we formulate a linear buffer insertion problem based on the experimental evidence from the previous section, and in Section VI we present an algorithm that solves the buffer insertion problems and include proofs of the correctness of the algorithm. We discuss algorithm extensions in Section VI. Finally, in Section VII we present experimental results and conclusions.

II. DEFINITIONS AND TERMINOLOGY

A clock net $N$ consists of a set of terminals $N = \{s_0, s_1, \ldots, s_n\} \subset \mathbb{R}^2$, where $\mathbb{R}$ is the set of real numbers, terminal $s_0$ is the clock source, and terminals $\{s_1, \ldots, s_n\}$ are the sinks of the clock net, which represent the locations of the synchronizing components. A clock tree $T$ is a rooted tree over $N$ with root $s_0$ and leaves $\{s_1, \ldots, s_n\}$. The internal nodes of the clock tree are $\{v_0, v_1, \ldots, v_m\}$. An edge $e_{ij}$ connects a parent node $v_i$ and a child node $v_j$. We denote the number of children (or out-degree) of $v_i$ as $d(v_i)$, and the set of children of $v_i$ as $\mathrm{Child}(v_i)$. If node $v_i$ lies in the path from node $v_j$ to the root (leaf), then node $v_i$ is said to lie above (below) node $v_j$. The length of the simple path from node $v_i$ to node $v_j$ below it is given by $l(v_i, v_j)$. A node $v_i$ is said to be at level $k$ if there are $k$ edges on the path from $v_i$ to the root. The height of a tree is the largest level of any node of that tree.

In a buffered clock tree, the internal nodes of the clock tree may represent buffers. Internal nodes which are not buffered are called Steiner nodes. A buffer tree section (or buffer stage) $B_i$ for a given buffer $v_i$ is a subtree of $T$ rooted at $v_i$ whose leaf nodes, or fan-outs of $v_i$, are either buffers or sinks of $T$ below $v_i$. Internal nodes of the buffer section are Steiner nodes below $v_i$ in $T$. The fan-outs of buffer stage $B_i$ are denoted by $F_i$, and the total wire length in this buffer stage is denoted by $L_i$.

A wire of length $l$ has wire capacitance $\alpha l$ and resistance $\beta l$, where $\beta$ and $\alpha$ are the resistance and capacitance per unit length, respectively. The input capacitance of a buffer or sink is denoted by $C_g$. The total capacitance seen under a node $v_i$ will be denoted by the function $C(v_i)$. The capacitance $C(v_i)$ will have two components, the wire capacitance and the load capacitances of buffers and terminals.

III. BUFFER AND WIRING TIMING MODEL ASSUMPTIONS

Timing models for buffered clock trees present a special challenge. On one hand, for analysis and design purposes, one would like a realistic but simple timing model. On the other hand, advances in technology require that the buffers and interconnects be accurately modeled. Furthermore, these same advances cause the accuracy of timing model assumptions to be short-lived. For this reason, we would like to obtain buffer insertion algorithms that do not depend on the use of a particular timing model, but rather depend on the inherent delay properties of the buffers. With these ideas in mind, we outline our assumptions about the buffer and wiring timing models. These assumptions will be stated referring to a single input, single output buffer, as follows.
• The shape of the input waveform of a buffer is parameterized by the rise time $\tau$. The rise and fall time of the input waveform are assumed for convenience to be the same.
• The input rise time $\tau_j$ of a fan-out $v_j$ in buffer section $B_i$ can be computed using a function $R_j$ of the input rise time $\tau_i$ of buffer $v_i$ and the wiring lengths, tree topology, and gate loads of the buffer section $B_i$:

$$\tau_j = R_j(\tau_i, B_i). \tag{1}$$

We make two important assumptions about the function $R_j$, which we formalize in the next two properties.
Property 1: The functions $R_j$ are strictly increasing in the range of practical values of the wire lengths and rise times.
Property 2: The structure of $R_j$ implies that the capacitive coupling (Miller capacitance) between stages is negligible.

Most CMOS timing models satisfy these properties [5], [14], [17]. Furthermore, the functions can be implemented by any simulator, including SPICE. These properties do not seem overly restrictive and will be used as follows. Let there be an upper bound $\tau_i \le \tau_{\max}$ on the input rise time of a buffer $v_i$; then, by Property 1, the rise time at the input of a buffer $v_j$ driven by $v_i$ can be upper bounded using solely the wiring of the buffer stage: $\tau_j \le R_j(\tau_{\max}, B_i)$. The algorithms proposed in this paper are based on this observation.

IV. PROBLEM FORMULATION

In this section, we formulate the so-called nonlinear constrained buffer insertion problem. A rooted clock tree $T$ is given with an assigned input rise time $\tau_0$. The stage rise time of a buffer stage $B_i$ is defined as the largest rise time at a fan-out and is denoted by $\tau(B_i)$.

Nonlinear Constrained Buffer Insertion Problem (NLCBI): Minimize the number of stages in a given rooted clock tree $T$, such that for every stage $B_i$, the stage rise time does not exceed the upper bound: $\tau(B_i) \le \tau_{\max}$.

Since the input rise time of each buffer stage is bounded, the following bottom-up algorithm for buffer insertion is suggested. Traverse the clock tree starting at the leaves, and going toward the root. As you traverse the clock tree, imagine sliding a buffer from each leaf toward the root along the wiring of the tree. This process causes the buffer stage rise time to increase. Find the location where the buffer stage rise time reaches the upper bound $\tau_{\max}$, and insert a buffer at this location. Reset the buffer stage rise time and restart the process from this point with another buffer. When a Steiner node of the tree is reached, compute the buffer stage rise time, allow only one buffer to continue further up the tree, and insert buffers at this point as needed. Continue until the root of the clock tree is reached.

The above algorithm requires an exact computation of the function $R_j$, which makes the analysis of the algorithm difficult. In the next section we show a method that allows a further simplification of the problem that leads to a linearized bottom-up buffer insertion algorithm.
V. LINEARIZED BUFFER INSERTION PROBLEM

We have investigated the properties of (1) with an experiment which we describe next. Consider a chain of two buffers, where the input rise time of the first buffer is $\tau_{\max}$. Suppose these two buffers are connected by a wire of length $l$. Now compute the value of $l$ such that the input rise time at the second buffer is $\tau_{\max}$.

We use the length $l$ to define the quantity $C_{\max} = \alpha l + C_g$. We conjecture that any buffer stage $B_i$ with stage capacitance $C(B_i) \le C_{\max}$ will have $\tau(B_i) \le \tau_{\max}$. While we cannot prove this claim in general, we have found excellent experimental evidence to this effect. Fig. 3 shows the rise times of a set of ten buffer stages with total wire length chosen so that each stage capacitance equals $C_{\max}$. The figure shows the longest rise times obtained over a random sample generated as follows. The ten stages are connected in series, and for each stage, a clock tree topology with $f$ leaves is chosen, then the wire length is distributed randomly over the selected tree topology. The wiring parasitics are modeled using a distributed RC model. The resulting tree is simulated using SPICE3e2. The results show that the rise time remains bounded from stage to stage, and that the worst case is the rise time of a stage with fan-out $f = 1$.

Fig. 3. Worst-case rise time for buffer stages 0–9 from a set of randomly generated tree topologies. The graph shows the rise times for buffer stages with fan-outs $f \in \{1, 2, 3, 4\}$. Rise times obtained from SPICE simulations. All buffer chains have $\tau_0 = 1$ ns. The capacitance bound is $C_{\max} = 690$ fF and all buffer stages have this stage capacitance.

We therefore argue that we can use $C_{\max}$ as an effective upper bound on the capacitance of any buffer stage in the tree. We can now formalize the linear constrained buffer insertion problem.

Linear Constrained Buffer Insertion Problem (LCBI): Minimize the number of stages in a given rooted clock tree $T$, such that for every stage $B_i$, the stage capacitance does not exceed the upper bound capacitance: $C(B_i) \le C_{\max}$.

VI. AN EXACT ALGORITHM

As shown in previous sections, the solution obtained to the LCBI is a feasible solution for the NLCBI problem. In this section we propose an algorithm, so-called buBufferInsert, that solves the LCBI problem as follows: given $C_{\max}$ compute the corresponding effective length bound $\ell_{\max}$, then traverse the tree in bottom-up order. Buffers are inserted by two procedures, packEdge and packNode, during the tree traversal. Next we proceed to extend this algorithm to obtain buffered trees with prescribed buffer skew, minimum phase delay, and a prescribed rise time at the sinks. In this section we will use the example in Figs. 4(a) and 5(a) to illustrate the algorithms. We will use $\ell_{\max} = 4$ and $\delta = 1$, where $\delta$ represents the minimum distance between buffers.

TABLE I
PSEUDO-CODE FOR ALGORITHM PACKEDGE

A. buBufferInsert

The algorithm packEdge (see Table I) packs a given edge $e_{ij}$ with a chain of buffers at a distance $\ell_{\max} - C_g/\alpha$ from each other. The buffer parent of node $v_j$ is inserted such that the effective length under it is $\ell_{\max}$. The effective length is a convenient way of representing the stage capacitance: a capacitance $C$ is said to have an effective length $C/\alpha$. A wire of length $l$ connected to this capacitance will have an effective length of $l + C/\alpha$. The effective length of a simple path from a node $v_i$ to a
node $v_j$ below it is defined as $\ell(v_i, v_j) = l(v_i, v_j) + C(v_j)/\alpha$. Furthermore, the total effective length below a node $v_i$ is $\ell(v_i) = \sum_{v_j \in \mathrm{Child}(v_i)} \ell(v_i, v_j)$.

The effective length below a node is stored with that node. Once the packEdge algorithm has completed, the new child of $v_i$ (let us say it is $v_k$) must store the remaining effective length above it, that is, between $v_k$ and $v_i$. This value is used later. The results of the algorithm on the example are illustrated in Figs. 4(b) and 5(b).

Algorithm packNode (see Table II) adds at most one buffer per fan-out edge of $v_i$. Each inserted buffer is placed at the minimum distance $\delta$ from $v_i$. The algorithm first selects a maximum number of fan-out edges that will not require additional buffering. Buffers are inserted in the remaining fan-out edges. As will be seen in the proof of correctness, the order in which the unbuffered edges are selected guarantees optimality. In this algorithm, a running total holds the effective length accumulated by the children of $v_i$. The results of this step are illustrated in Fig. 4(c).

Fig. 4. (a) Example for buffer insertion on a node $v_i$. Fan-outs of $v_i$ are sinks $s_A$, $s_B$, $s_C$. The distance between grid points is one. The examples will use $\ell_{\max} = 4$, $\delta = 1$, and $C_g = 0$. (b) Example of buffer insertion after the packEdge algorithm has finished with edges $e_{iA}$, $e_{iB}$, $e_{iC}$. Buffers inserted by this algorithm, $\{v_1, v_2, v_3\}$, are represented by triangles. (c) Example of buffer insertion after the packNode algorithm has finished with vertex $v_i$. Inserted buffers $\{v_5, v_6\}$ are highlighted.

TABLE II
PSEUDO-CODE FOR ALGORITHM PACKNODE

The algorithm buBufferInsert visits the nodes in depth first order. After child $v_j$ of a node $v_i$ is visited, the algorithm packs buffers into the edge $e_{ij}$. After visiting the children of a node $v_i$, the algorithm caps the node by visiting its children and ensuring that the remaining effective length is less than $\ell_{\max}$. Upon completion, buBufferInsert returns the number of buffers inserted, and, as a byproduct, has inserted these buffers into the proper locations in the clock tree. We now show that the number of buffers inserted is optimal.

Theorem 6.1: The algorithm buBufferInsert solves LCBI for any given capacitance $C_{\max}$.

Proof: We prove this by induction on the height $h$ of the tree $T$. The basis step of this proof is trivial so we start with the induction.

Inductive Hypothesis: The algorithm inserts a minimum number of buffers in a tree of height at most $h$, such that the capacitance of any buffer stage will satisfy $C(B_i) \le C_{\max}$.

Induction Step: We must prove that the algorithm will insert a minimum number of buffers for a tree of height $h + 1$, rooted at node $v_i$. First observe that $C(B_i) \le C_{\max}$ implies
an effective length of at most $\ell_{\max} = C_{\max}/\alpha$ below any buffer. Now consider a root $v_i$ with a single child $v_j$. The edge $e_{ij}$ is packed with sections of effective length $\ell_{\max}$, since any larger section would violate the capacitance constraint and any shorter section could result in a larger number of inserted buffers. Since the effective length below $v_j$ does not exceed $\ell_{\max}$, the first buffer added to edge $e_{ij}$, let us say $v_k$, will be packed such that the effective length below $v_k$ is exactly $\ell_{\max}$. The packing continues until the effective length from $v_i$ down to $v_m$ does not exceed $\ell_{\max}$, where $v_m$ is the last buffer inserted in the edge. This scheme is implemented by the packEdge algorithm.

Now consider a root with more than one child. We first apply the packEdge algorithm to each edge of each child. At this point, two cases can occur. Case I: if the total effective length below $v_i$ satisfies $\ell(v_i) \le \ell_{\max}$, then we are done. Case II corresponds to packNode. When we pack the buffers in this case we would like to remove as much effective length as possible from the vertex $v_i$. We limit the distance to the root node to $\delta$, which we use to represent the minimum distance between buffers. A feasible packing of the vertex, with $k$ unbuffered edges, must satisfy

$$\sum_{v_j \in U} \ell(v_i, v_j) + \bigl(d(v_i) - k\bigr)\left(\delta + \frac{C_g}{\alpha}\right) \le \ell_{\max},$$

where $U \subseteq \mathrm{Child}(v_i)$ is the set of $k$ children reached through unbuffered edges; with some manipulations, we obtain

$$\sum_{v_j \in U} \left(\ell(v_i, v_j) - \delta - \frac{C_g}{\alpha}\right) \le \ell_{\max} - d(v_i)\left(\delta + \frac{C_g}{\alpha}\right).$$

We now formulate this problem as a packing problem. We limit the total effective length to $\ell_{\max} - d(v_i)(\delta + C_g/\alpha)$. We add an unbuffered edge $e_{ij}$ which contributes an effective length $\ell(v_i, v_j) - \delta - C_g/\alpha$ to the packing. Now we would like to pack as many of these edges as we can fit in the space of size $\ell_{\max} - d(v_i)(\delta + C_g/\alpha)$. No additional buffers will be added to these edges, and the remaining edges will be buffered. We do this by selecting the edges in increasing order of $\ell(v_i, v_j)$, thereby minimizing the number of inserted buffers. Note that this section of the proof parallels algorithm packNode. Finally, note that the induction follows the order in which buffers are inserted by the buBufferInsert algorithm. With this we have completed the proof.

Fig. 5. (a) Original example. (b) Example of buffer insertion after packEdge. (c) Example of buffer insertion after packNode-BS Step 1 has finished with edges $e_{iA}$, $e_{iC}$. The example uses $S = 0$. Note that at the end of this step all paths have equal buffer length. (d) Example of buffer insertion after packNode-BS Step 3 is completed. Note that a buffer was added to all edges.

The bottom-up algorithm traverses the tree once. For each child, the work done is proportional to the number of buffers inserted in its edge. Given that the algorithm inserts a total of buffers, the total work done by the algorithm is . This accounts for the priority queue, and the work necessary to insert the buffers. Simplifying, and assuming that the tree has degree , the time complexity of this algorithm is . For trees of interest , hence the time complexity becomes . Since the tree and buffer data are stored as part of a tree data structure, the space complexity of this algorithm is also . We note that the complexity of the algorithm can be improved as follows: let each edge $e_{ij}$ contain the buffers inserted in the edge as a triple $(k, \mathrm{dist}_1, \mathrm{dist}_2)$, where $k$ is the number of buffers. Insert buffers as follows: the first buffer is inserted at a distance $\mathrm{dist}_1$ from $v_j$ and the remaining buffers at a distance $\mathrm{dist}_2$ from each other. By this method, the
explicit storage of the buffers in the tree is avoided, reducing the time and space complexity of the algorithm to $O(n)$.

B. Skew

In this section we investigate the problem of inserting a minimal number of buffers with an upper bound on the skew. Skew can have several causes as follows:
• the number of buffers is not the same between two paths;
• the loads in the buffer stages are not matched properly, causing differences in the buffer delays and in the contribution to the delays of the stages;
• the wiring delays may also be mismatched.

In this section we only consider the first source of skew. The other sources of skew can be handled using the methods proposed in [15]. The lengths of the longest and shortest buffered paths in a tree rooted at node $v_i$ are denoted as $p_{\max}(v_i)$ and $p_{\min}(v_i)$, respectively. For a leaf node, $p_{\max} = p_{\min} = 0$. The buffer skew of a tree rooted at node $v_i$ is defined as $s(v_i) = p_{\max}(v_i) - p_{\min}(v_i)$.

The buBufferInsert algorithm unfortunately produces solutions of arbitrary buffer skew. We propose a simple modification to buBufferInsert such that $s(v_i) \le S$, where $S$ is a design parameter. The examples of the previous section are used to illustrate the algorithm. $S = 0$ is used in the example. The formulation of this problem is as follows.

Linear Constrained Buffer Insertion Problem with Bounded Skew (LCBI-BS): Minimize the number of stages in a given rooted clock tree $T$, such that for every stage $B_i$, the stage capacitance does not exceed the upper bound capacitance $C_{\max}$ and the buffer skew satisfies $s \le S$.

Note that during the packEdge step in the algorithm, the skew cannot be changed. Skew problems may only be addressed during the packNode step. We now propose a solution to the LCBI-BS problem, wherein we modify the packNode step in the buBufferInsert algorithm. We assume we are working with a node $v_i$. The modified procedure, so called packNode-BS, consists of four steps.
1) First, the procedure inserts buffers such that the skew constraint is satisfied at node $v_i$. For each edge $e_{ij}$ whose subtree violates the buffer skew bound, buffers are added until the bound is met. Note that the effective length at $v_i$ is decreased by this step and every subsequent step where buffers are added. We proceed to the next step only if the capacitance constraint at $v_i$ is still violated; otherwise we are done. This step is shown in Fig. 5(c).
2) Next, apply the packNode algorithm considering only fan-out edges $e_{ij}$ that are not in a longest buffered path. After this step the capacitance constraint may not yet be satisfied. In this case proceed to the next step; otherwise we are done.
3) By this step we can only add buffers to edges in the longest paths. Due to the skew constraint, if a buffer is added to any of these paths, then a buffer must be added to all of the edges in the shortest paths. Hence we add a buffer to every edge $e_{ij}$ that lies in a shortest buffered path. If there are edges in both a shortest and a longest path, then these edges will not have been handled in previous steps. Therefore, a buffer addition to any such edge will decrease the effective length at $v_i$. So again, if the capacitance constraint is satisfied we are done. This step is shown in Fig. 5(d).
4) Finally, apply the packNode algorithm considering only fan-out edges $e_{ij}$ that are in a longest buffered path.

We name the complete algorithm buBufferInsert-BS. In Theorem 6.2 we show the correctness and optimality of this procedure.

Theorem 6.2: The algorithm buBufferInsert-BS solves LCBI-BS for any given skew $S$ and capacitance $C_{\max}$.

Proof: We prove this by induction on the height $h$ of the tree $T$. The basis step of the proof is trivial so we will omit it. Since the procedure packEdge cannot change the skew, it need not be changed, and it remains optimal. The remainder of the proof shows the correctness and optimality of the packNode-BS algorithm.

Inductive Hypothesis: The algorithm inserts a minimum number of buffers in a tree of height at most $h$, such that the capacitance of any buffer stage will satisfy $C(B_i) \le C_{\max}$ and the buffer skew will satisfy $s \le S$.

Induction Step: Again we must show that the algorithm will insert a minimum number of buffers for a tree of height $h + 1$, rooted at node $v_i$. After Step 1 all children of node $v_i$ satisfy the buffer skew constraint. Since Step 2 does not increase (decrease) the longest (shortest) path of $v_i$, the skew constraint remains satisfied. Step 3 adds one buffer to those edges in the shortest buffered paths; hence, the buffer skew remains satisfied. Finally, if Step 4 adds a buffer to an edge, then this edge will be in a longest buffered path, but will not violate the skew constraint. We must also show that all edges are covered by Steps 1–4. Step 1 handles all edges that violate the buffer skew constraint, Step 2 handles edges which are not in the longest path, and Steps 3 and 4 handle edges that are in the longest path. Thus all edges are covered. This completes the correctness portion of the proof.

The optimality of the packEdge procedure was established by Theorem 6.1; hence only the optimality of the packNode-BS algorithm has to be shown. Trivially, the number of buffers inserted by Step 1 must be optimal. After this step the algorithm will stop when the capacitance constraint at $v_i$ is satisfied. By Theorem 6.1, Step 2 should also be optimal. Finally, the only remaining edges where buffer addition can decrease the effective length are in the longest path. Therefore, any buffer addition to these edges will violate the skew constraint. Step 3 ensures that this is not the case by adding a buffer to every edge in a shortest path. Finally, and again by Theorem 6.1, Step 4 must be optimal. With this we have completed the proof.

We now analyze the space and time complexity of the buBufferInsert-BS algorithm. We assume in this analysis that the buffers are not stored explicitly in the tree, as shown at the end of the previous section. Steps 1 and 3 take time linear in the number of fan-outs of $v_i$, since the fan-outs of $v_i$ are visited at most once. Steps 2 and 4 have the same complexity as the packNode algorithm. Since for every node we execute Steps 1–4 only once, the time complexity of the buBufferInsert-BS algorithm is the same as that of the buBufferInsert algorithm.
Furthermore, since we can still use the same schemes to store the buffers, the space complexity also remains unchanged.

TABLE III
COMPARISONS BETWEEN BUFFER SKEW, LONGEST BUFFER PATH, REAL SKEW AND REAL PHASE DELAY, FOR TREES WITH DIFFERENT NUMBER OF SINKS. REAL SKEW AND DELAYS ARE OBTAINED BY TIMING SIMULATION OF THE BUFFERED CLOCK TREES

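
Step 1 of packNode-BS, the skew-balancing step, can be illustrated with a short sketch. It assumes that the buffer skew at a node is the difference between the largest and smallest number of buffers on any path from that node to a sink, and that every child subtree already meets the bound. The names `balance_skew` and `node_skew`, the `(shortest, longest)` summaries, and the rule for how many buffers to add are illustrative assumptions, not the paper's notation.

```python
def balance_skew(children, max_skew):
    """Return how many buffers to add on each fan-out edge of a node.

    children: list of (shortest, longest) buffer counts on the paths below
              each fan-out edge (a hypothetical summary of each subtree).
    max_skew: allowed buffer skew at this node (assumed to be an integer).
    """
    longest = max(hi for _, hi in children)
    added = []
    for lo, hi in children:
        if hi - lo > max_skew:
            raise ValueError("child subtree already violates the skew bound")
        # Lift the shortest path of this branch to within max_skew of the
        # longest path at the node; branches already close enough get 0.
        added.append(max(0, longest - max_skew - lo))
    return added


def node_skew(children, added):
    """Buffer skew at the node after adding the given buffers per edge."""
    lows = [lo + k for (lo, _), k in zip(children, added)]
    highs = [hi + k for (_, hi), k in zip(children, added)]
    return max(highs) - min(lows)


children = [(0, 0), (2, 3), (1, 1)]
added = balance_skew(children, max_skew=1)
print(added, node_skew(children, added))
```

Under these assumptions the added buffers never push a branch past the current longest path, since each child already satisfies the bound, so the longest path at the node is left unchanged by this step.
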
C. Rise Time and Phase Delay
As discussed in Section I, an objective in clock design
is to obtain clock signals at the clock terminals with given
rise times. The buffer insertion algorithm can be modified
to handle this requirement as follows: the input rise time of
the buffer is known to be bounded by . With the input
capacitance load of the storage elements it is possible to
compute a new capacitance upper bound such that the rise
time requirement at the sinks will be met. This upper bound
is then used as the initial upper bound in the buffer insertion
algorithm. Subsequently the algorithm uses for further
buffer insertions.
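
As a rough illustration of how a rise-time requirement turns into a capacitance bound, the sketch below uses a single-pole RC model, in which the 10–90% rise time of a driver with output resistance R loading a capacitance C is ln(9)·R·C. This is a textbook approximation swapped in for illustration, not the delay model used in this paper; the function names and the numeric values are assumptions.

```python
import math

LN9 = math.log(9.0)  # 10-90% rise time of a single-pole RC stage is ln(9)*R*C


def capacitance_bound(rise_time, driver_resistance):
    """Largest load capacitance (F) meeting a rise time (s) for a driver (ohm)."""
    return rise_time / (LN9 * driver_resistance)


def max_sinks(rise_time, driver_resistance, sink_capacitance):
    """Number of identical sink loads under the bound (wire load ignored)."""
    bound = capacitance_bound(rise_time, driver_resistance)
    return int(bound // sink_capacitance)


# Hypothetical numbers: 1 ns rise time, 1 kOhm driver, 50 fF per storage element.
c_max = capacitance_bound(1e-9, 1e3)
print(f"{c_max * 1e15:.0f} fF", max_sinks(1e-9, 1e3, 50e-15))
```

A tighter rise-time target lowers the bound proportionally, which in turn forces buffers to be inserted closer to the sinks.
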
Another important objective in clock design is to minimize
the phase delay (or total delay) of the clock tree. As with
the buffer skew, we consider only the buffer contribution to
delays. Hence, we would like to minimize the length of longest buffered path , which is our measure of the phase delay. Next we will show that the buBufferInsert-BS algorithm is also very effective in reducing the longest buffered path. Note that the packNode-BS algorithm avoids increasing the longest path of a node, and leaves this as a last resort measure.
Theorem 6.3: For a solution with skew , the algorithm buBufferInsert-BS minimizes .
Proof: We prove this by induction on the height of the tree . The basis step of the proof is trivial so we will omit it.
Inductive Hypothesis: The algorithm inserts buffers in a tree of height , such that the longest path of any node is minimum.
Induction Step: Again, since the capacitance constraint must be satisfied, the procedure does not have any choice during the edge packing step. Hence this proof must be shown for the packNode-BS step of the algorithm. We consider again the four steps during node packing. The first step does not increase the longest path. The second step may increase the longest path if the skew is less than . The third and fourth steps will increase the length of the longest path by one. Hence, we have two cases. Case I: suppose the algorithm increases the longest path during Step 3 or 4. Since the algorithm considers all possibilities of satisfying the capacitance constraint before increasing the longest path (Steps 1 and 2), then there must be no other choice and therefore the resulting longest path must be minimal. Case II: suppose the longest path is increased by the application of Step 2. In this case any path may be arbitrarily lengthened, but by the theorem assumptions, the resulting skew must be . We can envision a case where it would be possible to reduce the length of the longest path. In Step 2, we could instead choose for buffer insertion only those paths with length shorter than the longest path. However, this action would produce a solution with , a contradiction. With this we have completed the proof.
We have observed experimentally (see Table III) that as we decrease the buffer skew we also decrease the longest buffered path, the real skew, and the phase delay. Furthermore, we have seen that this behavior is consistent across values of and for different buffer sizes (see Fig. 6). We now prove that this is a property of the buBufferInsert-BS algorithm.
Theorem 6.4: For a given tree, if two solutions from the buBufferInsert-BS algorithm and have skews and , respectively, such that , then .
Proof: For purposes of contradiction, suppose that there exist two solutions and with , and . We can use solution to construct another solution, let’s say , as follows: let , add buffers to the short paths such that . By construction this solution has and , which is a contradiction by Theorem 6.3. With this we have completed the proof.
As a consequence of Theorem 6.4 we have shown that the proposed buffer insertion algorithm also minimizes the phase delay of the clock tree. In fact, the objectives of minimizing skew and minimizing phase delay are satisfied simultaneously.
Corollary 6.1: The solution from the buBufferInsert-BS algorithm with zero buffer skew has a minimum buffered longest path.
Corollary 6.2: For a given tree, if two solutions from the buBufferInsert-BS algorithm and have skews and , respectively, such that , then .
Corollaries 6.1 and 6.2 have two valuable consequences.
• Buffered clock trees generated using buBufferInsert-BS with zero skew will also have minimal phase delays.
• Worst case delays of trees where skew is not of concern can be minimized by buffering with zero skew. The number of buffers in these trees can be minimized without affecting the phase delay by finding the solution with largest buffer skew and longest buffered path length equal to the longest buffered path length in the zero skew solution. Since the longest path length is nondecreasing with increasing skew, this search can be done in time, where is the skew of the solution obtained from running buBufferInsert.

VII. RESULTS AND CONCLUSION
We implemented the buBufferInsert-BS algorithm in C++ on a SPARCstation 10. The results of the program were tested
Fig. 6. Graph shows TR versus phase delay for a sample clock tree. Each graph represents different values of S for two buffer sizes. Note that, for a given buffer, the S = 0 curve bounds the other curves from below.

Fig. 7. Graphs of n (number of sinks) versus B for values of TR with unlimited and zero skew. Note that even for very small clock rise times, the
number of buffers inserted is smaller than the number of sinks.

and verified on randomly generated test cases. Each test case was generated by randomly selecting cells on a square grid. The square grid was designed to simulate possible locations for circuit storage elements. The test cases were routed using the clock routing algorithm proposed in [10] producing clock topologies with balanced wirelengths. The test cases were generated in two square chip sizes: 15 mm × 15 mm and 5 cm × 5 cm. The first case simulates the larger modern CMOS chips, and is intended to create clock trees where the capacitive load dominates rise time and delay calculations. The second case simulates a future WSI wafer and is intended to create clock trees where the resistive load of the interconnect is of significance. MOSIS MCNC 2.0 metal-1 was used for the wiring electrical parameters and minimum dimensions. SPICE3e2 was used for simulations, experiments, and timing data. The MCNC MOSIS 2.0 technology parameters and models were used to design the buffers and for the simulations.
Fig. 7 shows results obtained from running the algorithm on trees of a range of sizes (64–16 386 pins), with a range of rise times (5–100 ns) and skew constraints (100 and 0 skew).
The trees used in this example are generated as described above. The quality of the algorithm is evidenced by the ratio of buffers to tree edges. In the worst cases ns, this ratio does not exceed 0.6. The results shown in the previous sections show that this algorithm is practical, since it can be used to produce solutions with small phase delays, predictable rise times, and small skews. By adding load balancing algorithms proposed in [15] this algorithm can be used to design buffered clock trees with near-zero nominal skews and small sensitivities to parametric variations.
In conclusion, we have proposed an algorithm for computing a lower bound on the number of buffers needed in a buffered clock tree, given a target clock frequency. Our approach also constructs the buffered clock networks so it can be used for their construction. We have shown that although this approach is grounded on a complicated formulation, the problem can be simplified to yield an efficient solution. Furthermore, we have shown that the solution to this buffer insertion problem is a practical starting point to the more general problem of constructing buffered clock distribution networks.

REFERENCES
[1] M. Afghahi and C. Svensson, “Performance of synchronous and asynchronous schemes for VLSI systems,” IEEE Trans. Comput., vol. 41, no. 7, pp. 858–872, 1992.
[2] H. Bakoglu, J. T. Walker, and J. D. Meindl, “A symmetric clock-distribution tree and optimized high-speed interconnections for reduced clock skew in ULSI and WSI circuits,” in Proc. Int. Conf. Computer Design, Oct. 1986, pp. 118–122.
[3] H. B. Bakoglu, Circuits, Interconnections, and Packaging for VLSI. Reading, MA: Addison-Wesley, 1990, pp. 81–112.
[4] K. D. Boese and A. B. Kahng, “Zero-skew clock routing with minimum wirelength,” in Proc. 5th Ann. IEEE Int. ASIC Conf. Exhibit, Sept. 1992, pp. 17–21.
[5] L. M. Brocco, S. P. McCormick, and J. Allen, “Macromodeling CMOS circuits for timing simulation,” IEEE Trans. Computer-Aided Design, vol. 7, pp. 1237–1249, 1988.
[6] T. H. Chao, Y. C. Hsu, and J. M. Ho, “Zero skew clock net routing,” in Proc. Design Automation Conf., IEEE/ACM, 1992, pp. 518–523.
[7] T. H. Chao, Y. C. Hsu, J. M. Ho, K. D. Boese, and A. B. Kahng, “Zero skew clock routing with minimum wirelength,” IEEE Trans. Circuits Syst., vol. 39, pp. 799–814, Nov. 1992.
[8] J. D. Cho and M. Sarrafzadeh, “A buffer distribution algorithm for high-speed clock routing,” in Proc. Design Automation Conf., 1993, pp. 537–543.
[9] N.-C. Chou and C.-K. Cheng, “Wire length and delay minimization in general clock net routing,” in Proc. Int. Conf. Computer-Aided Design, Nov. 1993, pp. 552–555.
[10] M. Edahiro, “A clustering-based optimization algorithm in zero-skew routings,” in Proc. Design Automation Conf., IEEE/ACM, 1993, pp. 612–616.
[11] M. Edahiro, “Delay minimization for zero-skew routing,” in Proc. Int. Conf. Computer-Aided Design, IEEE/ACM, 1993, pp. 563–566.
[12] E. G. Friedman, “Clock distribution design in VLSI circuits—An overview,” in Proc. Int. Symp. Circuits Syst., May 1993, pp. 1475–1478.
[13] M. A. B. Jackson, A. Srinivasan, and E. S. Kuh, “Clock routing for high-performance IC’s,” in Proc. Design Automation Conf., IEEE/ACM, 1990, pp. 573–579.
[14] E. V. Meerch, L. Claesen, and H. D. Man, “SLOCOP: A timing verification tool for synchronous CMOS logic,” in Proc. European Design Automation Conf., IEEE/ACM, 1986, pp. 205–207.
[15] S. Pullela, N. Menezes, J. Omar, and L. T. Pillage, “Skew and delay optimization for reliable buffered clock trees,” in Proc. Int. Conf. Computer-Aided Design, ACM/IEEE, Nov. 1993, pp. 556–562.
[16] N. A. Sherwani and B. Wu, “Effective buffer insertion of clock tree for high speed VLSI circuits,” Microelectronics J., vol. 23, pp. 291–300, July 1992.
[17] Y. H. Shih and S. M. Kang, “Analytic transient solution of general MOS circuit primitives,” IEEE Trans. Computer-Aided Design, vol. 11, pp. 719–731, June 1992.
[18] L. P. P. P. van Ginneken, “Buffer placement in distributed RC-tree networks for minimal Elmore delay,” in Int. Symp. Circuits Syst., 1990, pp. 865–868.

Gustavo E. Téllez (A’96) was born in Bogotá, Colombia. He received the B.S. and M.S. degrees in electrical engineering from Rensselaer Polytechnic Institute, Troy, NY, in 1984 and 1985, respectively. In 1996, he received the Ph.D. degree in computer science at Northwestern University, Evanston, IL, under the IBM Ph.D. Resident Study Program.
From 1986 to 1992, he worked for IBM Corporation in their EDA Development facility in East Fishkill, NY. He returned to his position at IBM EDA in 1996. His research interests include design and analysis of algorithms, computer architectures, timing and power-driven VLSI design, and clock network design.

Majid Sarrafzadeh (S’82–M’82–SM’91–F’96) received the B.S., M.S., and Ph.D. degrees from the University of Illinois at Urbana-Champaign in the Electrical and Computer Engineering Department in 1982, 1984, and 1987, respectively.
He joined Northwestern University, Evanston, IL, as an Assistant Professor in 1987. Since 1991, he has been Associate Professor of Electrical Engineering and Computer Science at Northwestern University. He became a Full Professor in 1997. His research interests lie in the area of VLSI CAD, design and analysis of algorithms and VLSI architecture.
Dr. Sarrafzadeh received an NSF Engineering Initiation award in 1987, two distinguished paper awards in ICCAD-91, and the best paper award for physical design in DAC-93. He has served on the technical program committee of various conferences, for example, ICCAD, EDAC, and ISCAS. He is a co-editor of the book Algorithmic Aspects of VLSI Layout, co-author of the book An Introduction to VLSI Physical Design, on the Editorial Board of the VLSI Design Journal, and is co-Editor-in-Chief of the International Journal of High-Speed Electronics. He is an Associate Editor of IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN.
