ASIC Timing
Timing in ASICs
3.1 INTRODUCTION
The number of ASICs designed increases every year. Advances in technology allow more transistors to be packed onto a single die, which expands the applications where ASICs can be used and accelerates development. Successful development of an ASIC depends on accurate modeling of its operation. Designing a circuit to be logically correct is relatively simple; producing an accurate timing model is what is critical to successful development. Current methodologies for generating accurate timing models for ASIC designs are described here.
Integrated circuits start as computer representations of a physical device. The designer's goal is to model the device characteristics with sufficient accuracy that actual silicon behaves as the model predicts, assuming the computer simulations exercise the model in the same way the device is expected to operate in the real world. Modeling a device's logical operation is relatively simple, and the translation from the model to the physical device would be easy if it were not for the major difference introduced during fabrication: timing delays. The conversion of a logic statement to a model of its physical implementation is shown in Figure 3.1.
[Fig. 3.1: a logic statement with inputs a and b, and the corresponding physical device with parasitic R and C on the output Out]
The operation of the circuit
in Figure 3.1 is affected by the charging and discharging of the parasitic capacitor through resistors, both of which are inherent to a physical implementation in silicon. The stray capacitance and resistance can have such a great and deleterious effect that the physical operation is nothing like the simulated logical model. A circuit's correct operation can be assured only if the timing of the simulated model is a close approximation of the final device. The accurate modeling of delay is of major importance.
As process geometry shrinks and the number of transistors per die increases, the effects of parasitic capacitance and resistance make it more challenging to correlate prelayout to postfabrication timing. Fortunately, CAD tools exist to estimate delays before layout and to extract the capacitance and resistance once layout is complete. Modeling estimated and extracted delays plays an important part in guaranteeing the timing and operation. Any delay value used before the device is fabricated is merely an estimate.
The four sources of delay are shown in Figure 3.2. Gate delay is determined by input slew rate and the inherent RC loading of the gate. Delay through a line depends on the RC load the gate drives. The fanout load simply increases the capacitance the driver must charge and discharge. Methodologies for predicting delay are well established.
Fig. 3.2 Components of Circuit Delay: Input Slew Rate, Inherent Gate Delay, Line Propagation Delay, Fanout Load
Gate delay is measured from fabricated test structures tested at specific operating points. A transistor's speed, and therefore the inherent delay in a gate, is affected by its dimensions, the supply voltage, doping levels, input slew rate, operating temperature, and fanout load. The data measured from the test structure provides a device model that extrapolates to estimate delay under all operating and fabrication conditions.
The delay due to signal lines may be modeled in two stages: prelayout and postlayout. In either case, the physical characteristics of fabricated traces are known, having been measured from test structures. In the absence of layout, the unknown elements that affect timing are the trace's length, width, and surrounding signals. Figure 3.3 shows the parasitic capacitors seen by a metal trace. Parasitic capacitance is explored in detail in section 3.3.
[Fig. 3.3: parasitic capacitors seen by a metal trace above the substrate]
Before the layout is completed, any delay attributed to a signal line is an estimate based on probable length and width of the trace. Since the actual path is not known, the length is simply a guess based on the size of the overall circuit and the probability of placing the output of one gate close to the input terminals of the gates it drives. Another unknown aspect of the trace is the topology over which it passes. Once the layout is finished, the trace lines, and therefore their delay, can be accurately modeled. The layout fixes their length and reveals what lies under the trace, whether it is substrate, transistors, or other layers. Delay estimations are made in all stages of design: prelayout, synthesis, and postlayout. The most common methods used to estimate delays at all stages of the design cycle are explored.
Example 3.1
RTL Code
out = ((a & b) | !a);
Gate-Level Code
and(s2, a, b);
not(s1, a);
or(out, s1, s2);
Modeling delay at the gate level is straightforward. The delay of each gate is found in the technology library. The appropriate delay can be assigned to every gate in the code and the propagation delay of signals estimated to provide a fairly accurate representation. However, manually implementing HDL code with delays for each gate is time consuming. At the prelayout stage, most design methodologies use synthesis to provide a gate model with delays while RTL code is used to model the circuit's behavior.
[Figure: gate-level schematic of Example 3.1 with inputs a and b, internal nets s1 and s2, and output out]
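To make the idea of assigning a delay to every gate concrete, the sketch below attaches a delay to each primitive of the gate-level code in Example 3.1 and drives it from a small testbench. The delay values (2, 1, and 2 time units), the module names, and the instance names are illustrative assumptions, not figures from any technology library.

```verilog
`timescale 1ns/1ns

// Gate-level form of out = (a & b) | !a with a delay on every gate.
// The delay values are assumed for illustration only.
module example_3_1_gates (out, a, b);
  output out;
  input  a, b;
  wire   s1, s2;

  and #2 u_and (s2, a, b);     // the output terminal is listed first
  not #1 u_not (s1, a);
  or  #2 u_or  (out, s1, s2);
endmodule

module tb_example_3_1;
  reg  a, b;
  wire out;

  example_3_1_gates dut (out, a, b);

  initial begin
    $monitor("t=%0t a=%b b=%b out=%b", $time, a, b, out);
    a = 0; b = 0;
    #10 a = 1;    // out falls after the not delay plus the or delay
    #10 b = 1;    // out rises after the and delay plus the or delay
    #10 $finish;
  end
endmodule
```

With these assumed values, out should fall 3 time units after a rises and rise again 4 time units after b rises, which is the sum of the gate delays along each path.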
A clear method of accounting for delay is to determine the delay through each gate. The technology library already has delay information for every gate. Accurate modeling requires the assignment of the appropriate delay to each gate as described below. Estimating the delay of the RTL code is more difficult because of its level of abstraction. Until synthesis is complete, there is no straightforward way to correlate RTL code to actual gate delays. The level of coding used affects the delays that can be modeled. Generally, RTL code is used to determine correct logical operation without regard for delays. A design at the gate level not only checks for correct operation, it also ensures that delays meet the required timing. Most designs start with RTL code, then use synthesis to generate the gates needed to verify timing. Few designs start at the gate level because the simulations, especially when timing is included, are very slow. Design at the RTL level offers a fast method to ensure that the logic is properly implemented. Synthesis then converts the design to gates that include delays from gates, estimated routing, and fanout.
A high-level system is shown in Figure 3.5. Each block is implemented as RTL. The RAM and the EPROM will not be synthesized. They are both modeled as an array of memory indexed by the address. The processor comes from a vendor's library. Its model reflects only the bus transactions that take place. The address decode and low-speed I/O port will be synthesized and include any logic and flip-flops needed to perform their functions.
[Fig. 3.5: high-level system with Processor Bus Model, Program Code, RAM, EPROM, Address Decoder, Low Speed I/O Port, and Peripherals on the Address and Data buses]
Timing is important in the system simulation. At the RTL level, it is possible to see if the processor bus timing matches the RAM and EPROM timing. It can be determined if the decoder has too much delay or if the read/write timing of the I/O port meshes with the processor's requirements. The timing response of each block can be added to the model. The read timing of the RAM is given in Figure 3.6. When the RTL model detects a read cycle, it can instantaneously get the data from its memory array and present it on the bus, but such a fast response does not correspond to reality. The delay, shown in Figure 3.6 as Tvavd, must be implemented in the model to reflect the time actually needed for the RAM to access and present valid data.
[Fig. 3.6: RAM read timing showing Address, R/Wb, and Data (Valid)]
The response time of the address decode cannot be instantaneous, but should reflect a delay based on the maximum delay it can have and still work in the system. The I/O port also needs bus timing to match the processor's characteristics. The processor model comes from the vendor with timing that matches the processor's real operation. The processor cycle time provides a check of the timing of all the other blocks. If a block meets the bus cycle time in simulation, it is expected to work when fabricated. A snippet of Verilog code, shown in Example 3.2, demonstrates how to implement the Tvavd and Ts2z delays in the memory model.
Example 3.2
`define Tvavd 10 // data delay out of memory
`define Ts2z 5 // delay of deselect to tristate

module RAM (addr, data, sel, rw);
input [15:0] addr;
inout [15:0] data;
input sel, rw;
reg [15:0] mem_array [0:65535], data_internal;
// data bus tristate. Bi-directional.
assign #`Ts2z data = (sel) ? data_internal : 16'bz;
Note that when the memory is read, the assignment of the data from the array to the bus is delayed by the time Tvavd. The data bus response to the sel signal is also delayed by Ts2z. Whenever the timing of a module is known, it should be implemented in the RTL model; however, HDLs offer different types of delays. It is important to understand how the delay is applied to ensure the model mirrors the real world. In Verilog, the two main types of delay are regular and intra-assignment. The effects of both types on continuous, blocking, and nonblocking assignments are discussed below.
Example 3.3
assign #5 sel = address15 | address16 | address17;
The output, sel, is simply the OR of the inputs address15, address16, and address17. Logically, whenever one of the address signals goes high, sel goes high; however, delay changes that fundamental assumption slightly. The relationship between the input and the output signals is shown in Figure 3.7.
[Fig. 3.7: waveforms of Address15, Address16, Address17, and Sel from 10ns to 105ns]
At 20ns, each input sequentially goes high for 3ns. Each input stays high for less time than the specified delay of 5. The output does not change because the delay is inertial and no input is high longer than the delay. At 50ns, address15 goes high for 6ns. After the input signal has been high for 5ns, the output responds and produces a pulse 6ns wide. At 70ns, both address16 and address17 go high for 3ns, but they are coincident and do not satisfy the inertial delay requirement, so the output does not change. At 90ns, a 3ns-wide pulse on address16 overlaps a 4ns-wide pulse from address17. The simulator interprets the overlap as meeting the delay requirement and a 7ns pulse occurs on the output. Together, the continuous assignment statement and the regular delay operate like combinatorial logic. Just as the delay through a gate suppresses glitches, so does the regular delay when used with a continuous assignment statement.
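The inertial behavior described above can be tried with a small testbench. The sketch below reuses the continuous assignment from Example 3.3 and applies only the 3ns and 6ns pulses on address15; the module name, the pulses counter, and the stimulus times after the first pulse are assumptions made for this illustration.

```verilog
`timescale 1ns/1ns

module inertial_demo;
  reg  address15, address16, address17;
  wire sel;
  integer pulses;                 // counts rising edges seen on sel

  // Continuous assignment with a regular (inertial) delay, as in Example 3.3
  assign #5 sel = address15 | address16 | address17;

  always @(posedge sel) pulses = pulses + 1;

  initial begin
    pulses = 0;
    address15 = 0; address16 = 0; address17 = 0;
    #20 address15 = 1;    // t = 20
    #3  address15 = 0;    // t = 23: 3ns pulse, shorter than the delay
    #27 address15 = 1;    // t = 50
    #6  address15 = 0;    // t = 56: 6ns pulse, longer than the delay
    #20 $display("pulses seen on sel: %0d", pulses);
    $finish;
  end

  initial $monitor("t=%0t address15=%b sel=%b", $time, address15, sel);
endmodule
```

Only the 6ns pulse should reach sel, rising at 55ns and falling at 61ns, so the counter should report a single pulse.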
Regular and intra-assignment delays on both types of statements are shown below. A regular delay with a blocking assignment is given in Example 3.4.
Example 3.4
always @(posedge clk)
begin
  #2 q1 = d;
  #2 q2 = d;
  #3 q3 = d;
end
A blocking statement means the simulator is blocked from moving on to any subsequent statement until the present one is complete. A regular delay delays evaluation of the inputs. The output signals that correspond to the process in Example 3.4 reveal exactly how a regular delay in a blocking statement works. Refer to Figure 3.8.
Fig. 3.8 Signals from the Blocking Assignment Statements with Regular Delays from Example 3.4
At 15ns, when the clock goes high, the simulator begins to execute the first statement, #2 q1 = d. It interprets it to mean: after a delay of 2 time units, assign the current value of d to q1. At 17ns, d is zero, so q1 becomes zero. The simulator waits at the first statement until it is completely finished; then it moves to the second statement. The second statement means the same as the first: wait 2 time units, then assign the present value of d to q2. Waiting an additional 2 time units means the value of d at 19ns is assigned to q2. At 19ns, d is zero, so q2 is assigned a zero. The simulator stays at the second line until the assignment to q2 takes place; then it moves to execute the third statement. The last statement has a delay of 3 time units. Like the previous regular delays, the simulator waits the specified time, 3 time units, then assigns d to q3. In this case, d changed to a one at 21ns, so when the simulator evaluates d at 22ns, it assigns a one to q3. The important concepts to remember about regular delays and blocking assignments are:
Blocking Assignments: Finish executing the current statement, including the delay, before moving to the next line.
Regular Delays: Wait the specified delay before evaluating the input signals and determining the output signal.
Intra-Assignment Delay: Upon execution, immediately evaluate the input signals and determine the value of the output signal. Wait the specified delay before assigning the value to the output.
The regular delay waits, evaluates, then assigns. The intra-assignment delay evaluates, waits, then assigns. An intra-assignment delay with blocking assignment statements is given in Example 3.5 along with the process statement that generates the input signal d.
Example 3.5
The waveforms in Figure 3.9 show how the input is evaluated immediately upon execution. At 15ns, when the clock goes high, the first statement immediately grabs the value of d. The positive edge of clk triggers both the evaluation of d and its transition. At clk's positive edge, d has not yet changed and does not change until after it is grabbed by the q3 = #2 d assignment statement. As a result, the value assigned to q3 is d's value just before the clock's rising edge. At 15ns, d's value is one, so a one is grabbed and 2 time units later, at 17ns, a one is assigned to q3. The execution of the first statement is done, so the execution of the second assignment statement begins. At 17ns, the value of d is zero, so a zero value is grabbed by the second assignment statement and is assigned 2 time units later to q4.
always @(posedge clk)
begin
  q3 = #2 d;
  q4 = #2 d;
end

always @(posedge clk)
begin
  d <= ~d;
end
Fig. 3.9 Signals from Blocking Assignment Statements with Intra-Assignment Delays from Example 3.5
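The difference between "wait, evaluate, assign" and "evaluate, wait, assign" can be isolated in a few lines. The sketch below is a hypothetical test, not part of the original examples: the names delay_kinds, q_regular, and q_intra are assumptions, and a fork/join is used only so that both assignments start at the same instant while d changes in between.

```verilog
`timescale 1ns/1ns

module delay_kinds;
  reg d, q_regular, q_intra;

  initial begin
    d = 1;
    #5;
    fork
      #1 d = 0;            // d changes 1ns after the assignments start
      #2 q_regular = d;    // regular delay: wait 2ns, then sample d (now 0)
      q_intra = #2 d;      // intra-assignment: sample d now (1), assign 2ns later
    join
    // prints: q_regular=0 q_intra=1
    $display("q_regular=%b q_intra=%b", q_regular, q_intra);
  end
endmodule
```

Both targets are written at the same time, 2ns after the fork starts; they differ only in when d was sampled.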
The operation of the intra-assignment delay is the same with nonblocking assignment statements, but the operation of a nonblocking statement does affect the output.
Nonblocking assignment: Start all nonblocking statements in the block at the same time. No statement waits for the one before it to finish.
Nonblocking assignment statements with intra-assignment delays are given in Example 3.6. The waveforms in Figure 3.10 show the value of d at the rising edge at 15ns to be a one. Two nanoseconds later, the value of one is assigned to both q7 and q8. Since the delay is the same in both statements, both outputs change at the same time.
Of the delay and assignment types described above, continuous assignment statements with regular delays closely model combinatorial logic. However, nonblocking assignment statements with intra-assignment delays in an always block, controlled by the clock, exactly model a flip-flop or sequential logic. The regular and intra-assignment delays with continuous, blocking, and nonblocking assignment statements allow the designer to put delays anyplace in the circuit; however, assigning delays to possibly every line of RTL code takes a lot of time. As discussed in the section on synthesis, detailed timing should wait until synthesis or layout is complete. At the RTL level, it is sufficient to describe delays at the module boundaries and not lower. A coarser level of granularity saves time developing code and still provides enough timing information to do meaningful analysis until the synthesis is complete. All HDLs can express delays between module inputs and outputs. The approach taken in Verilog is presented below.
Example 3.6
At the rising edge of clk, both assignment statements start execution. As shown in Figure 3.10, the positive edge of clk at 15ns causes both assignment statements to grab d's value. With blocking statements, the second statement was not executed until the first was completed, but with nonblocking statements both immediately start execution. Since the delay is intra-assignment, the input signal is immediately evaluated; then both statements wait 2 time units before assigning the evaluated result to the outputs.
always @(posedge clk)
begin
  q7 <= #2 d;
  q8 <= #2 d;
end
Fig. 3.10 Signals from Nonblocking Assignment Statements with Intra-Assignment Delays from Example 3.6
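One way to see why this style models sequential logic is to chain the two outputs. The sketch below is a variation on Example 3.6, not the original code: q8 samples q7 instead of d, and the module name, clock timing, and initial values are assumptions. Because both right-hand sides are sampled at the clock edge, q8 picks up the old value of q7, as a second flip-flop would.

```verilog
`timescale 1ns/1ns

module nb_demo;
  reg clk, d, q7, q8;

  // Both right-hand sides are sampled at the clock edge; both updates
  // land 2ns later, so neither statement waits for the other.
  always @(posedge clk) begin
    q7 <= #2 d;
    q8 <= #2 q7;     // samples the value q7 held before this edge
  end

  initial begin
    clk = 0; d = 1; q7 = 0; q8 = 0;
    #5 clk = 1;      // edge at 5ns: q7 becomes 1 at 7ns, q8 stays 0
    #5 clk = 0;
    $display("after edge 1: q7=%b q8=%b", q7, q8);   // q7=1 q8=0
    #5 clk = 1;      // edge at 15ns: q8 becomes 1 at 17ns
    #5 $display("after edge 2: q7=%b q8=%b", q7, q8); // q7=1 q8=1
    $finish;
  end
endmodule
```

The 2ns intra-assignment delay plays the role of a clock-to-output delay, and the value of d needs two clock edges to reach q8.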
A RAM module is again used to show how timing delays and verification are easily implemented in the specify block. A synchronous memory easily displays what types of checks can be done. The memory has the timing requirements shown in Table 3.1. Although numerous other parameters are needed to specify correct operation, these are sufficient to show how timing checks are defined. A diagram of the timing given in Table 3.1 is shown in Figure 3.11.
Table 3.1 Synchronous Memory Timing Parameters

Parameter          Time (ns)    Parameter          Time (ns)
Tclk_period        20           Tclk_data_valid    9
Tclk_high_min      9            T0_to_z            0.1
Tclk_low_min       7            Tz_to_1            0.3
Taddr_clk_setup    4            T1_to_z            0.1
Taddr_clk_hold     3            Tz_to_0            0.2
Tsel_clk_setup     4            Trise              0.5
Tsel_clk_hold      12           Tfall              0.3
[Fig. 3.11: timing diagram of clock, addr, and sel showing Tclk_period, Tclk_high_min, Tclk_low_min, Taddr_clk_setup, Taddr_clk_hold, Tsel_clk_setup, and Tsel_clk_hold]
Example 3.7
All the timing parameters listed in Table 3.1 are codified in the specify section. Each parameter is listed as a specparam. The parameter names of Table 3.1 directly correspond to the specparam names for easy correlation. The specparam statements span lines 30 through 43. The paths through the module are declared and described in terms of the timing parameters. The memory model has only two paths with defined delays: clock to data and sel to data. The statement that defines the delay from the rising edge of the clock to valid data out is on lines 46 and 47 of Example 3.7. If the input signal, sel, is active, the delay from the rising edge of the clock to valid data out is defined in the parentheses following the equal sign. The delay from clock to valid data is Tclk_data_valid and the rise and fall times of internal signals are Trise and Tfall. The value for Tclk_data_valid is combined with the rise and fall times to provide more accurate delays.
module RAM (clk, addr, data, sel, rw);

input clk;
input [15:0] addr;
inout [15:0] data;
input sel, rw;
reg [15:0] mem_array [0:65535], data_internal;
reg tprob;

// data bus tri-state. Bi-directional.
assign data = (sel) ? data_internal : 16'bz;

// Always statement that does the actual read and write
always @ (posedge clk)
begin
  // read memory
  if ((rw === 1'b1) && (sel === 1'b1))
    data_internal = mem_array [addr[15:0]];
  // write memory
  if ((rw === 1'b0) && (sel === 1'b1))
    mem_array [addr[15:0]] = data;
end

// The specify block where all the timing and verification is placed.
/*****************************************************/
specify
The meaning of the numbers in parentheses, lines 46 through 50, is summarized in Example 3.8. The first term defines the time it takes for a signal to transition from zero to one, the second is the time to transition from one to zero, and the third through sixth are zero to high impedance, high impedance to one, one to high impedance, and high impedance to zero.
Example 3.8
(0 -> 1, 1 -> 0, 0 -> z, z -> 1, 1 -> z, z -> 0)
For the memory module, the delay from the positive edge of the clock to valid data only needs to have the zero to one and one to zero transitions defined because clock transitions do not cause the data bus to tristate. The sel signal does cause the data bus to tristate, so the delay statements that define the relationship between the sel input and the data bus, lines 49 and 50 in Example 3.7, do not specify 0->1 or 1->0 delays. The statement on line 49 defines the time it takes to tristate the bus when sel goes inactive. Line 50 defines the time for the bus to leave tristate when sel becomes active. Periods, pulse widths, and setup and hold times are also checked to see if they are within the specification. Lines 53 through 55 check the clock period, the time it is high, and the time it is low. The setup and hold times of the address with respect to the clock's rising edge are checked in lines 57 and 58. The setup and hold times of sel to the rising edge of clock are checked in line 60. The formats of the verification statements are explained in Example 3.9. The notifier toggles every time a violation is found. The always statement after the specify block, lines 65 through 68, is activated when the notifier toggles to report the time of the violation.
Example 3.9
$period (ref_event, limit, notifier);
$width (ref_event, limit, threshold, notifier);
$setup (data_event, ref_event, limit, notifier);
$hold (ref_event, data_event, limit, notifier);
$setuphold (ref_event, data_event, s_limit, h_limit, notifier);
The specify block in Verilog HDL provides a convenient and powerful way to add timing to modules. It offers the right level of timing for RTL code. More specific and involved timing is available after synthesis or layout, automatically through the use of CAD tools. Do not spend time at the RTL level adding too much detail. Simply take advantage of any model-level timing offered by the simulator.
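A compact sketch of a specify block using the Table 3.1 values is given here for orientation. It is not the specify section of Example 3.7: the module sync_ram_sketch, its narrowed bus widths, the single clock-to-data path, and the testbench are all assumptions made for illustration, and the tristate paths from sel are left out.

```verilog
`timescale 1ns/100ps

// Hypothetical cut-down synchronous memory; widths are narrowed for brevity.
module sync_ram_sketch (clk, addr, sel, data_out);
  input        clk, sel;
  input  [3:0] addr;
  output [7:0] data_out;

  reg [7:0] mem_array [0:15];
  reg [7:0] data_internal;
  reg       notifier;            // toggled by the timing checks below

  assign data_out = data_internal;

  always @(posedge clk)
    if (sel) data_internal = mem_array[addr];

  specify
    // Values copied from Table 3.1 (ns)
    specparam Tclk_period     = 20,
              Tclk_high_min   = 9,
              Tclk_low_min    = 7,
              Taddr_clk_setup = 4,
              Taddr_clk_hold  = 3,
              Tsel_clk_setup  = 4,
              Tsel_clk_hold   = 12,
              Tclk_data_valid = 9,
              Trise           = 0.5,
              Tfall           = 0.3;

    // Module path: clock to data out, given as (rise, fall) delays
    (clk *> data_out) = (Tclk_data_valid + Trise, Tclk_data_valid + Tfall);

    // Timing checks in the formats of Example 3.9
    $period    (posedge clk, Tclk_period, notifier);
    $width     (posedge clk, Tclk_high_min, 0, notifier);
    $width     (negedge clk, Tclk_low_min, 0, notifier);
    $setup     (addr, posedge clk, Taddr_clk_setup, notifier);
    $hold      (posedge clk, addr, Taddr_clk_hold, notifier);
    $setuphold (posedge clk, sel, Tsel_clk_setup, Tsel_clk_hold, notifier);
  endspecify

  // Report the time of any violation
  always @(notifier)
    $display("timing violation reported at %0t", $time);
endmodule

module tb_sync_ram_sketch;
  reg        clk, sel;
  reg  [3:0] addr;
  wire [7:0] data_out;

  sync_ram_sketch dut (clk, addr, sel, data_out);

  always #10 clk = ~clk;         // 20ns period, rising edges at 10, 30, 50

  initial begin
    clk = 0; sel = 1; addr = 4'h0;
    #29 addr = 4'h1;             // 1ns before the rising edge at 30ns
    #31 $finish;
  end
endmodule
```

The address change at 29ns leaves only 1ns of setup before the clock edge, so it should trip the $setup check in a simulator that evaluates specify timing checks. Support for specify blocks and the options needed to enable them vary between simulators.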
1. Maximum fanout per gate
2. Maximum transition time of a signal
3. Maximum allowable capacitance per net
The designer specifies optimization constraints to control these elements:
1. Speed
2. Area
set_max_fanout: Every input pin of every gate of the library has a fanout-load attribute. The sum of all fanout loads connected to an output cannot exceed the max_fanout limit. The command limits only the number of gates driven by any given output. Loading from wire capacitance is not controlled with this command.
set_max_transition: The transition time is the amount of time it takes to charge or discharge a node. It is a product of the signal-line capacitance and resistance. The command set_max_transition watches the RC delay on a wire. In an effort to stay below the max_transition limit, the synthesis tool may increase the drive capacity of a gate to better swing the load or limit the capacitance and resistance by setting constraints that can be passed on to the floorplanner. The characteristics of the wire, such as area, capacitance, and resistance, are found in the wire-load model.
set_max_capacitance: There are two components to a load on a net: fanout (other gates) and interconnect capacitance. The command set_max_capacitance checks to see that no gate drives more capacitance than the limit, whether the source be interconnect or gate capacitance. There is no direct correlation between the command and net delay, simply between the command and capacitance. The wire-load model details the capacitance of a wire.
The three constraints mentioned above must be used in conjunction to ensure that the limits of the library are not exceeded. In the case where the library constraints do not match the limits set by the designer, the synthesis tool will meet the more restrictive value.
Synthesis and STA work best on synchronous designs. There are techniques to deal with asynchronous circuits; however, if it is possible to design the circuit to be synchronous, it will fit into the modern ASIC flow with fewer exceptions that need to be manually checked. There is another design practice, in addition to synchronous design, that enhances the use of synthesis and STA. The static timing analyzer in the synthesis tool considers the clock tree to be ideal, which means there is no delay between the clock source and the input of any gate. In a design where the clock signal goes directly from the clock tree to the gates, its operation is nearly ideal. Any design technique, such as gated clocks, that places delays in the clock's path will not work unless the amount of delay in the clock is quantified. It is possible to use the clock skew parameter to account for the delay in the clock, but it must include both the skew of the tree and the delay through gates. The clock delay through gates is not automatically measured, so it may be a difficult figure to arrive at. It is a good design practice to not gate the clock. The designer can control the synthesized speed of the circuit with commands explained below.
create_clock: At a minimum, the synthesis tool must know the clock's period and duty cycle. The clock sets the time allowed for signals to propagate between sequential elements. The create_clock command also specifies clock skew.
set_input_delay: The delay of the input of a module is assumed to be zero. The circuit, shown in Figure 3.12, has four inputs and is considered a module. Two inputs go directly to flip-flops while the other two inputs go through gates before they reach a flip-flop. The delay time for input a to reach the flip-flop is input_delay. If the set_input_delay command specifies the input_delay as 2ns, then the synthesis tool measures the delay of a and b as 2ns and, if necessary, modifies the design appropriately to still work at speed. The delay of input c or d is: input_delay + and-gate delay + or-gate delay. The value of input_delay is added to the gate delays to arrive at the final speed of the path. If there is a lot of input_delay, the synthesis tool chooses faster gates to maintain the overall speed specified by the designer.
Fig. 3.12 The set_input_delay Command Adds Additional Delay to Module Input Times
set_output_delay: The delay out of a module can be increased by the amount specified by set_output_delay. The module shown in Figure 3.13 has two output signals. The delay of out1 is: flip-flop propagation delay + output_delay. If the set_output_delay command sets output_delay to 5ns, the delay of out1 is 5ns longer than the propagation delay of a flip-flop. The delay of out2 is: flip-flop propagation delay + [maximum of (and-gate or or-gate delay)] + and-gate delay + output_delay. Once again, the output_delay adds to the circuit's inherent delays.
Fig. 3.13 The set_output_delay Command Adds Additional Delay to Module Output Times
set_max_delay: Timing constraints can be placed on asynchronous paths with set_max_delay and set_min_delay. The values set by these two commands determine the time allowed to propagate through a path not controlled by a clock.
set_min_delay: Refer to set_max_delay.
set_max_area: The area constraint is set by a single command. If an area is specified, the synthesis tool will try to keep the area of both the gates and the wires under the max_area limit. The area of the wires can only be estimated if it is specified in the wire-load model.
Once the design rule and optimization constraints are specified, the synthesis tool works to find the correct gates to implement the logic functions specified in the RTL code with the timing specified by the designer. Timing in ASIC standard cell circuits cannot be fully understood without knowing the source of gate and wire delays.
Every gate available to the synthesis tool must be described as a library model. A sample of a library that contains only an AND-gate is given in Example 3.10. The description of the AND2 cell provides all the information the synthesis tool needs to determine if it can meet timing and area requirements. The area of the cell is given on line 5. Each input pin, lines 6 and 11, is described with its associated capacitance, lines 8 and 13, so the synthesis tool can calculate total fanout loads for the driving gates. The output is described in terms of the logic function it performs, line 18, in addition to the response of the output with respect to each input. The timing response of the output with respect to input A is given in lines 20 through 27, and with respect to input B in lines 28 through 36. The most important timing figure is the propagation delay for rising and falling transitions as controlled by each input, which is given in lines 20 and 21 and 29 and 30. The output rise and fall times and slopes are given along with the output resistance.
Example 3.10
1. library (proc_35) {
2. date: "September 29, 2001"
3. revision: 1.9
4. cell(AND2) {
5. area: 3
6. pin(A) {
7. Direction: input
8. Capacitance: 1.2
9. fanout_load: 1.0
10. }
11. pin(B) {
12. Direction: input
13. Capacitance: 1.2
14. fanout_load: 1.0
15. }
16. pin(Z) {
17. Direction: output
18. Function: "AB"
19. Timing(): {
20. intrinsic_rise: 1.38
21. intrinsic_fall: 0.97
22. rise_resistance: 1.00
23. fall_resistance: 1.00
24. slope_rise: 1.00
25. slope_fall: 1.00
26. related_pin: "A"
27. }
28. Timing(): {
29. intrinsic_rise: 1.38
30. intrinsic_fall: 0.97
31. rise_resistance: 1.00
32. fall_resistance: 1.00
33. slope_rise: 1.00
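As a rough illustration of how these attributes can turn into a delay number, the worked example below assumes a first-order linear model in which the rising delay is the intrinsic delay plus the output resistance times the load capacitance. The slope term and the wire capacitance are ignored, the load of two AND2 input pins is a hypothetical choice, and the units are whatever the library declares. The actual calculation performed by a given synthesis tool may include further terms.

```latex
\begin{align*}
t_{\text{rise}} &\approx \text{intrinsic\_rise} + \text{rise\_resistance} \times C_{\text{load}} \\
C_{\text{load}} &= 2 \times 1.2 = 2.4 \quad \text{(two AND2 input pins, wire capacitance ignored)} \\
t_{\text{rise}} &\approx 1.38 + 1.00 \times 2.4 = 3.78
\end{align*}
```

The same arithmetic with intrinsic_fall and fall_resistance gives the falling delay, 0.97 + 1.00 x 2.4 = 3.37 under the same assumptions.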
The synthesis tool estimates wire delays using a wire-load model that relates a net's estimated length to estimated capacitance and resistance. The manual calculation of the characteristics of a line is fully described in section 3.3. The synthesis tool uses the same techniques to find the RC delay of each net. There is a statistical aspect of the wire delay calculation. The actual length of each net is unknown to the synthesis tool; however, it makes a guess using statistics of routing from the reference design. Based on the statistical estimate of length, it calculates area, capacitance, and resistance. The delays determined using vendor wire-load models are inaccurate because the model is design dependent. If your design is not similar to the reference design used to make the wire-load model, there is significant error; however, the estimated delay decreases the number of synthesis iterations because an estimated delay is better than ignoring wire delay altogether. Fortunately, more accurate wire-load models can be generated specifically for a given design. As soon as the RTL code is complete, the design can be synthesized and given to a floorplanner, then a place-and-route tool. The information from the preliminary route is fed back into the synthesis tool to make custom wire-load models that are much more accurate than the vendor-supplied models because they are design specific. The most accurate wire-load models are available after the place-and-route procedure once each wire's exact dimensions are known. A wire-load model is shown in Example 3.11.
Example 3.11
Wire-load ("16x16") {
Resistance : 0.1 ;
Capacitance : 1.85;
Another approach to compensating for inaccurate wire-load models is to synthesize to a faster clock than the design will actually use. The synthesis tool chooses gates capable of driving larger loads, so when the accurate delays are fed back to the simulator after layout, the extra speed is used up in driving the lines. Another technique is to overestimate the capacitance values of the gates in the library so the synthesis tool chooses gates with extra drive capacity. The problem with any approach based on deliberate overdesign is that the area is larger than it may have to be and the amount of overcompensation, whether it be in time or capacitance, is merely a guess. The best approach is to have the most accurate models possible, which for wire-load models means that the data from an early floorplan should be used to develop accurate wire-load models.
4. Place and route the design.
5. Extract cluster values (PDEF), delay values (SDF), and parasitic estimated values (RC). Feed the information to the floorplanner.
6. Create wire-load models using the information from place and route.
7. Back annotate the wire-load models into the synthesis tool (or a stand-alone static timing analyzer). Analyze the design to see if it meets timing requirements.
8. If the timing is close, use the reoptimize_design command to fix the few problems that exist. Generate new constraint information; then go to step 3 when done.
9. If the timing is not close, use the new wire-load models to synthesize again. Generate new constraints and return to step 3.
10. If the timing has plenty of slack, do the final place and route. Go to step 9.
11. If the timing is perfect after the final floorplan place-and-route iteration, the design is done. Otherwise fix the few minor problems that exist with the in-place optimize option of the synthesis tool.
12. Write out final delay and parasitic values for use in a static timing analyzer or in RTL-gate simulations as a final verification that the correct timing was achieved.
There is a tremendous amount of communication between the synthesis tool, the floorplanner, the place-and-route tool, static timing analyzers, and even the RTL simulator. The information sent from each tool helps the next tool in the process do its job better. Each iteration brings the design closer to the correct timing, which is verified with either a static timing analyzer or RTL-gate simulations with full-timing back annotation. Three common file formats pass the information between the tools: physical data exchange format (PDEF), standard delay format (SDF), and resistance/load scripts. Each is described below.
Physical Data Exchange Format (PDEF): The PDEF file contains information about the clustering of cells. The synthesis tool determines which cells should be close to each other (in a cluster) based on how the RTL file is organized. Since most designers partition their designs based on logic functions, the synthesis tool also groups logically. Once the floorplanner gets the netlist, it places the cells together based on timing or routing considerations. It generates a PDEF file based on physical placement that may not be anything like the logical groupings generated by the synthesis tool. A typical PDEF file is shown in Example 3.12.
Example 3.12
(CLUSTERFILE
  (PDEFVERSION "2.0")
  (DESIGN "top")
  (DATE "October 29, 2001")
  (VENDOR "Intrinsix")
  (DIVIDER /)
  (CLUSTER (NAME "MultB1")
    (X_BOUNDS 0.0 150.0)
    (Y_BOUNDS 0.0 163.0)
    (NAME "MultSub1")
      (X_BOUNDS 0.0 25.6)
      (Y_BOUNDS 0.0 89.3)
      (CELL (NAME U24/U78) (LOC 3.8 16)
      (CELL (NAME U24/U45) (LOC 16.0 46.5)
    (NAME "MultSub2")
      (X_BOUNDS 25.6 78.8)
      (Y_BOUNDS 0.0 89.3)
      (CELL (NAME U55/U14) (LOC 30.0 42.7)
      (CELL (NAME U55/U83) (LOC 51.0 46.5)
  )
  (NAME "MultB2")
    (X_BOUNDS 150 204.0)
    (Y_BOUNDS 0.0 79.0)
    (NAME "Nts1")
      (X_BOUNDS 154.2 36.0)
      (Y_BOUNDS 185.0 0.0)
      (CELL (NAME U47/U64) (LOC 153.8 24.0)
      (CELL (NAME U86/U37) (LOC 167.0 28.2)
  )
)
Standard Delay Format (SDF): The standard delay format file specifies delays. It is a case-sensitive format. The synthesis tool uses the SDF file to pass timing constraints to the floorplanner, an action known as forward-annotation. It uses the PATHCONSTRAINT parameter to tell the floorplanner the amount of propagation delay allowed for critical paths. The format of the PATHCONSTRAINT statement is given in Example 3.13.
Example 3.13
(PATHCONSTRAINT port_start [intermediate_node, ...] port_end (rise time) (fall time))
A simple SDF constraint file for the circuit shown in Figure 3.14 is shown in Example 3.14. The three highlighted paths are described.
Fig. 3.14 Circuit Corresponding to the SDF Constraint File of Example 3.14
Example 3.14
(DELAYFILE
// Start of the sdf header. This file contains all typical data
(SDFVERSION "1.0")
(DESIGN "test")
The same file format passes delay information from the synthesis tool to the RTL simulator and from the floorplanner/router to synthesis or RTL. The format can define the delays across a module, gates, or interconnect. Timing for setup, hold, setuphold, skew, width, and period are also valid parameters. Delays can also be specified to be absolute or incremental. Most HDL simulators use a subset of the SDF parameters. The designer does not need to do anything with the SDF file. The simulator accepts and assigns the delays using built-in system tasks. For Verilog, the command to read an SDF file is $sdf_annotate. The user can specify if minimum, typical, or maximum timing values are extracted from the SDF file and can set a scale factor if desirable.
[Figure 3.15: circuit with pads P1 through P5, block s1 containing cells C1 and C2, and block s2 containing cells C3 through C6]
The SDF file for the circuit shown in Figure 3.15 is given in Example 3.15. It includes the most common parameters used by RTL simulators. Each construct is also described.
Example 3.15
1. (DELAYFILE
2. // Start of the sdf header.
3. // This file contains all typical data.
4. (SDFVERSION "1.0")
5. (DESIGN "test")
6. (DATE "Monday January 30 08:30:33 PST 1999")
7. (VENDOR "Intrinsix Corp.")
8. (PROGRAM "delay_find")
9. (VERSION "3.6")
10. (DIVIDER /)
11. (VOLTAGE 5.0:5.0:5.0)
12. (PROCESS "typical")
13. (TEMPERATURE 85:85:85)
14. (TIMESCALE 1ns)
15. // description of interconnect delays.
21. (INTERCONNECT P1/z s1/c1/i (.163:.163:.163) (.147:.147:.147))
22. (INTERCONNECT P1/z s2/c4/i2 (.152:.152:.152) (.139:.139:.139))
23. (INTERCONNECT P2/z s1/c2/clk (.102:.102:.102) (.099:.099:.099))
24. (INTERCONNECT P2/z s2/c6/clk (.109:.109:.109) (.101:.101:.101))
25. (INTERCONNECT P3/z s2/c3/i2 (.178:.178:.178) (.165:.165:.165))
26. (INTERCONNECT P3/z s2/c4/i1 (.176:.176:.176) (.163:.163:.163))
27. (INTERCONNECT s1/c1/z s1/c2/d (.184:.184:.184) (.175:.175:.175))
28. (INTERCONNECT s1/c2/q s2/c3/i1 (.171:.171:.171) (.163:.163:.163))
29. (INTERCONNECT s2/c3/z s2/c5/i1 (.185:.185:.185) (.173:.173:.173))
30. (INTERCONNECT s2/c4/z s2/c5/i2 (.146:.146:.146) (.137:.137:.137))
31. (INTERCONNECT s2/c5/z s2/c6/d (.189:.189:.189) (.176:.176:.176))
32. (INTERCONNECT s2/c6/q P4/i (.169:.169:.169) (.155:.155:.155))
33. (INTERCONNECT s2/c6/qn P5/i (.187:.187:.187) (.174:.174:.174))
34. )))
35. // The intrinsic delays of each cell used in the design. Equivalent to gate delays.
36. (CELL
37. (CELLTYPE "INV")
38. (INSTANCE s1/c1)
39. (DELAY
40. (ABSOLUTE
41. (IOPATH i z (.323:.323:.323) (.311:.311:.311))
42. )))
43. (CELL
44. (CELLTYPE "DFF")
45. (INSTANCE s1/c2)
46. (DELAY
47. (ABSOLUTE
48. (IOPATH clk q (.417:.417:.417) (.404:.404:.404))
49. ))
50. (TIMINGCHECK
51. (SETUP D (posedge clk) (.260))
52. (HOLD D (posedge clk) (.000))
53. (WIDTH (negedge clk) (1.60))
54. (WIDTH (posedge clk) (1.73))
55. ))
56. (CELL
The delay times are specified as triplets. The first group of three numbers is the rise time. The second is the fall time. The three numbers, separated by colons, represent (minimum:typical:maximum) delays; however, many tools write only one case at a time. In the SDF file example above, all three numbers in the parentheses are the same. The header specifies that they are the typical case. Completely new files would need to be written for the minimum and maximum cases. The time scale, line 14, is also given in the header as nanoseconds. For this process, most of the gates have a propagation delay of between 0.250 and 0.45ns. In all cases for this example, the delay times are declared as absolute. If the delays had been incremental, the final delay would be the sum of a base time and the increment specified in the SDF file. In this case, there are no base times and no increments. The timing number states how long it takes for a signal to propagate through a line or gate.
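The triplet convention can be illustrated with a short Python sketch. It is only an illustration of the (minimum:typical:maximum) idea, not a full SDF reader; the function names `parse_triplet`, `select_delay`, and `resolve_delay` are hypothetical, and the scale factor mimics the kind of scaling a simulator may apply when reading the file.

```python
def parse_triplet(text):
    """Parse an SDF delay value such as '(.163:.163:.163)' or '(.260)'.

    Returns a (minimum, typical, maximum) tuple in the file's time units.
    A single value is applied to all three cases. Triplets with empty
    fields, which SDF also permits, are not handled in this sketch.
    """
    fields = text.strip().strip("()").split(":")
    values = [float(field) for field in fields]
    if len(values) == 1:
        values = values * 3
    if len(values) != 3:
        raise ValueError(f"expected one or three values, got {text!r}")
    return tuple(values)


def select_delay(triplet, case="typ", scale=1.0):
    """Pick the min, typ, or max entry of a triplet and apply a scale factor."""
    index = {"min": 0, "typ": 1, "max": 2}[case]
    return triplet[index] * scale


def resolve_delay(value, mode="absolute", base=0.0):
    """An absolute delay replaces the base; an incremental delay adds to it."""
    return value if mode == "absolute" else base + value


# Rise and fall delays copied from the inverter IOPATH in Example 3.15.
rise = parse_triplet("(.323:.323:.323)")
fall = parse_triplet("(.311:.311:.311)")
print(select_delay(rise, "typ"), select_delay(fall, "max", scale=1.1))
```

Because the example file carries the same number in all three positions, selecting the minimum or maximum case returns the typical value; separate files would be needed for true corner data.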
3.2.12.1 INTERCONNECT
The interconnect command is used to specify interconnect delay. After synthesis, but before routing, the interconnect delays are estimates based on wire-load models. After place and route, the exact dimensions of each line are extracted and the RC delay calculated. The interconnect statement on line 21 describes the delay from the pad input P1 to the input of the inverter C1. The signal propagation delay is 0.163ns for a rising edge and 0.147ns for a falling edge. The format for the statement is:
(INTERCONNECT source_port destination_port (rise times) (fall times))
3.2.12.2 IO PATH
The IOPATH statement specifies input to output cell delays. The path can be from any input/inout port to any legal output/inout port. Lines 36 through 42 describe the propagation delay through the inverter C1. The propagation delay of a rising edge is 0.323ns and a falling edge is 0.311ns. The format for the statement is:
(IOPATH input_port output_port (rise times) (fall times))
3.2.12.3 SETUP, HOLD, SETUPHOLD, WIDTH, PERIOD
These statements are timing checks of sequential devices like flip-flops or latches. The timing checks must be specified in the TIMINGCHECK part of the cell definition. The purpose of each timing check matches its name. The definition of the D flip-flop (DFF), lines 43 through 55, describes the clk to q propagation delay, on line 48, as 0.417ns for a rising edge and 0.404ns for a falling edge. It also provides timing checks for the data setup and hold times on lines 51 and 52. Minimum pulse widths are also specified for the clock on lines 53 and 54. Line 53 requires the clock to be low for at least 1.6ns. The minimum clock high time of 1.73ns is given on line 54. The formats for the timing check statements are:
(TIMINGCHECK
  (SETUP data_signal reference_signal (time))
  (HOLD data_signal reference_signal (time))
  (WIDTH (posedge/negedge signal) (time))
  (SETUPHOLD data_signal reference_signal (setup time) (hold time))
  (PERIOD (posedge/negedge signal) (period))
)
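What a simulator does with these limits can be sketched in a few lines of Python. This is a simplified illustration, not how any particular simulator implements its checks; the function names and the event times are hypothetical, while the limits are the DFF values from Example 3.15 in nanoseconds.

```python
def setup_ok(data_change, clock_edge, setup):
    """Data must be stable for at least `setup` before the clock edge."""
    return clock_edge - data_change >= setup


def hold_ok(clock_edge, data_change, hold):
    """Data must not change until at least `hold` after the clock edge."""
    return data_change - clock_edge >= hold


def width_ok(pulse_start, pulse_end, width):
    """A clock pulse must last at least `width`."""
    return pulse_end - pulse_start >= width


# Limits from the DFF cell in Example 3.15 (ns).
SETUP, HOLD = 0.260, 0.000
WIDTH_LOW, WIDTH_HIGH = 1.60, 1.73

# Hypothetical events: data changes at 9.8 ns, clock rises at 10.0 ns.
print("setup met:", setup_ok(9.8, 10.0, SETUP))
# Hypothetical clock high pulse from 10.0 ns to 11.7 ns.
print("high width met:", width_ok(10.0, 11.7, WIDTH_HIGH))
```

A SETUPHOLD check would simply apply both `setup_ok` and `hold_ok` to the same data and reference signals.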
The floorplanner sends line resistance and load scripts to the synthesis tool to make new wire-load models. The format is shown in Example 3.16.
Example 3.16
set_resistance 0.1569 "aout"
set_load 0.673 "aout"
3. Have minimum and maximum delays been set for all asynchronous paths? As mentioned earlier, synchronous designs are better for the synthesis design methodology. If a path simply cannot be made synchronous, be sure that both minimum and maximum delays are specified to set boundaries on its operation.
4. Do the timing violations require a change in architecture? Are the timing violations so egregious they cannot be solved with minor fixes or more iterations through the design cycle? Look at timing violations with an eye on architecture. Always ask if an architectural change would make many timing problems go away.
5. Can minor timing problems be solved with incremental compiles? At times, the synthesis, floorplan, place, and route cycle can be like the gopher game at the arcade: after hitting one gopher over the head with the bat, another three pop up. Sometimes doing another synthesis run to solve a minor problem can result in problems in several other paths. When the timing is close, use the incremental compile option to iron out any remaining problems.
Such cases will be rare; however, if they exist, it is worth the effort to manually verify the timing of a few lines. If the delay from the manual calculation is no more accurate than the delay already reported, there is no need to individually model any more lines. There are extraction tools capable of determining the width and length of any line. If such tools are available, use them to get the information necessary to calculate the line resistance and capacitance as shown below. Except at ultrahigh frequencies, internal routing lines do not exhibit inductive characteristics, so inductance can safely be ignored in the model.
where $\varepsilon$ is the dielectric constant of the insulator (SiO2) between the trace and the substrate. A value for permittivity of free space, $\varepsilon_0$, is 8.85e-6 pf/um. The permittivity of SiO2 is 3.9*$\varepsilon_0$ or 3.45e-5 pf/um.
[Figure: cross section of metal 1 and metal 2 traces over the substrate, showing capacitors C1 through C6]
C3 - Plate capacitor between M2 and substrate
C4 - Fringe capacitor between M2 and substrate
C5 - Plate capacitor between M1 and substrate
C6 - Fringe capacitor between M1 and substrate
The fringe capacitance is more difficult to model, but a close approximation is to convert each edge of the trace to a half cylinder then combine them into a separate, complete cylinder. Figure 3.18 diagrams the conversion. The cylinder represents the capacitance contributed by both edges and replaces the 2Cfringe term in equation 3.1. The formula to calculate the capacitance of a cylindrical conductor is:
$$C_{cylinder} = \frac{2\pi\varepsilon}{\ln\left(1 + \frac{2D}{d}\left[1 + \sqrt{1 + \frac{d}{D}}\right]\right)} \quad \text{(Eq. 3.3)}$$

where
d = diameter of cylinder
D = distance from the substrate
$\varepsilon$ = dielectric constant of SiO2
[Figure: end view and side view of a trace, showing dimensions W, T, and L]
Notice in Figure 3.18, the width of the rectangle is decreased by T/2 for the calculation of the plate capacitance. It would seem that its width should decrease by T since a T/2 slice was taken off each side; however, the cylinder represents all the edge capacitance plus a little bit of plate capacitance, so the rectangle width is reduced by a lesser amount. Refer to [GD85] for more in-depth information. Many design rule specifications include values for both plate and fringe capacitance which are expressed as pf/um^2 and pf/um. If the values are not already available, calculate the plate capacitance as pf/um^2 and fringe capacitance as pf/um as follows:
$$C_p = \frac{\varepsilon}{D} \quad \text{(Eq. 3.4)}$$

$$C_c = \frac{2\pi\varepsilon}{\ln\left(1 + \frac{2D}{T}\left[1 + \sqrt{1 + \frac{T}{D}}\right]\right)} \quad \text{(Eq. 3.5)}$$
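As a sanity check, the two per-unit capacitances can be evaluated numerically. The sketch below assumes the plate term is the permittivity divided by D and the cylinder term is 2*pi*permittivity over the logarithm shown in the formula, with T as the cylinder diameter; the function names are hypothetical. With T = 0.3um and D = 0.47um, the polysilicon-to-substrate case, the results agree with the tabulated 0.734e-4 pf/um^2 and 1.03e-4 pf/um to within rounding.

```python
import math

EPS_0 = 8.85e-6          # permittivity of free space, pf/um
EPS_SIO2 = 3.9 * EPS_0   # permittivity of SiO2, about 3.45e-5 pf/um


def plate_capacitance(d_um):
    """Plate capacitance per unit area (pf/um^2) at distance D (um)."""
    return EPS_SIO2 / d_um


def fringe_capacitance(t_um, d_um):
    """Cylinder (edge) capacitance per unit length (pf/um).

    t_um is the trace thickness T, used as the cylinder diameter;
    d_um is the distance D to the layer below.
    """
    ratio = 2.0 * d_um / t_um
    log_term = math.log(1.0 + ratio * (1.0 + math.sqrt(1.0 + t_um / d_um)))
    return 2.0 * math.pi * EPS_SIO2 / log_term


# Polysilicon over substrate: T = 0.3 um, D = 0.47 um.
print(plate_capacitance(0.47), fringe_capacitance(0.3, 0.47))
```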
[Figure 3.18: conversion of the two trace edges, each T/2 wide, into a cylinder, leaving a plate of width W - T/2 over the substrate]
Minimum line widths and typical trace thickness are listed in Table 3.2 for a 0.8um process. Typical capacitance values for a triple-layer metal process are given in Table 3.3. The dielectric constant for SiO2 is also given in Equation 3.7:

$$\varepsilon_{SiO_2} = 3.45 \times 10^{-5}\ \text{pf/um} \quad \text{(Eq. 3.7)}$$
The capacitances of several long lines for all of the above capacitance combinations are given in Table 3.4 below. The width is assumed to be the minimum width for the material listed.
Table 3.2 Typical Metal Widths for 0.8um Process

Layer      T (um)   Minimum Width (um)
Poly       0.3      0.8
Metal 1    0.4      1.4
Metal 2    0.8      1.4
Metal 3    1.0      1.6
Table 3.3 Typical Capacitance Values for 0.8um Process

Layers                  D (um)   Cp (e-4 pf/um^2)   Cc (e-4 pf/um)
Poly to Substrate       0.47     0.734              1.03
Metal 1 to Poly         0.52     0.664              1.11
Metal 1 to Substrate    0.97     0.356              0.884
Metal 1 to Diffusion    0.52     0.664              1.11
Metal 2 to Metal 1      0.45     0.767              1.56
Metal 2 to Poly         1.05     0.329              1.11
Metal 2 to Substrate    1.37     0.252              1.0
Metal 2 to Diffusion    1.05     0.329              1.11
Metal 3 to Metal 2      0.45     0.767              1.72
Metal 3 to Metal 1      1.3      0.265              1.11
Metal 3 to Poly         1.9      0.182              0.963
Metal 3 to Substrate    2.22     0.155              0.911
Metal 3 to Diffusion    1.9      0.182              0.963
Table 3.4 Capacitance of Long Lines

                        Total Line Capacitance (pf) at Line Length (um)
Layers                  500      1500     3000     6000
Poly to Substrate       0.076    0.227    0.453    0.906
Metal 1 to Poly         0.095    0.286    0.572    1.14
Metal 1 to Substrate    0.066    0.197    0.393    0.787
Metal 1 to Diffusion    0.095    0.286    0.572    1.14
Metal 2 to Metal 1      0.117    0.35     0.699    1.4
Metal 2 to Poly         0.072    0.215    0.43     0.86
Metal 2 to Substrate    0.063    0.188    0.376    0.751
Metal 2 to Diffusion    0.072    0.215    0.43     0.86
Metal 3 to Metal 2      0.128    0.385    0.77     1.54
Metal 3 to Metal 1      0.07     0.21     0.42     0.841
Metal 3 to Poly         0.058    0.174    0.349    0.698
Metal 3 to Substrate    0.054    0.162    0.325    0.649
Metal 3 to Diffusion    0.058    0.174    0.349    0.698
Table 3.4 shows the capacitance of a line over a single material. For example, in the metal-2-to-substrate capacitance, it is assumed that there are no poly or metal-1 lines between the substrate and the metal 2. The amount of capacitance depends on the amount of area overlap and which layers are involved. Assuming that a line runs over a single material is not an accurate assumption. Routing between cells leaves myriad lines crossing one another, over transistors and the substrate. A good extraction tool can determine exactly what lies under a line; however, if the line crosses lots of other layers, making a capacitance model can be difficult. If the tool can report only the width and length of a trace, but not the layers underneath, the capacitance value for the trace to the substrate may have to be used. If a quick visual check reveals the line is over a different layer some part of the time, the model accuracy can be improved.
For example, if the line is a very long metal-2 line with half of it over metal 1, use the metal-2-to-metal-1 capacitance value for 50% of the trace's area and the metal-2-to-substrate value for the rest. Also note that the capacitance scales linearly with the line length. A table, like Table 3.4, which contains relatively few entries, can be used to determine the capacitance of a line of any length by extrapolation.
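The mixed-layer estimate can be sketched in Python as follows. The sketch assumes the total capacitance of a line is the plate value times a plate width of W - T/2 plus the cylinder (edge) value, all multiplied by the length, which reproduces the metal-2 entries of Table 3.4 to within rounding. The function name, the dictionary keys, and the 50/50 split are illustrative assumptions.

```python
# Per-unit values for two metal-2 cases from Table 3.3.
CP = {"m2_to_m1": 0.767e-4, "m2_to_substrate": 0.252e-4}  # pf/um^2
CC = {"m2_to_m1": 1.56e-4, "m2_to_substrate": 1.0e-4}     # pf/um


def line_capacitance(length_um, width_um, thickness_um, fractions):
    """Total capacitance (pf) of a line that runs over several layers.

    `fractions` maps a layer-pair name to the fraction of the line's
    length that lies over that layer; the fractions must sum to 1.
    """
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    plate_width = width_um - thickness_um / 2.0
    total = 0.0
    for pair, fraction in fractions.items():
        per_um = CP[pair] * plate_width + CC[pair]
        total += per_um * length_um * fraction
    return total


# Minimum-width metal 2 (W = 1.4 um, T = 0.8 um), 3000 um long.
all_substrate = line_capacitance(3000, 1.4, 0.8, {"m2_to_substrate": 1.0})
half_and_half = line_capacitance(
    3000, 1.4, 0.8, {"m2_to_m1": 0.5, "m2_to_substrate": 0.5}
)
print(all_substrate, half_and_half)
```

With these numbers, running half of the line over metal 1 raises the estimate noticeably above the substrate-only value, which is why the quick visual check is worth making.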
[Figure 3.19: squares of side W marked off along lines of length L carrying current I: (a) a line of 10 squares, (b) a line of 0.33 squares]
marked off along the direction of the flow of current. The line shown in 3.19b has 0.33 squares. Each type of material used in semiconductors has a characteristic resistance per square. The resistance of a line is calculated by finding the total number of squares and multiplying by the ohms per square value. Resistance values for each layer are given in the process design rules. Typical resistance values for a 0.8um process are given in Table 3.5. The resistances of the lines drawn in Figure 3.19 for different materials are given in Table 3.6.

Table 3.5 Typical Material Resistance in a 0.8um Process

Material       Ohms/square
Polysilicon    40
Metal 1        0.25
Metal 2        0.1
Metal 3        0.1

Table 3.6 Line Resistances for Figure 3.19

Material       3.19a (ohms)   3.19b (ohms)
Polysilicon    400            13.3
Metal 1        2.5            0.083
Metal 2        1.0            0.033
Metal 3        1.0            0.033

A look at the resistance of the different materials reveals why lines laid out in polysilicon need additional attention. The capacitance of poly to substrate is about the same as the capacitance between metal 2 and metal 3 or metal 1 and metal 2; however, its resistance is 160 times that of metal 1 and 400 times that of metal 2 and metal 3. Lines of polysilicon have a high RC delay. If the CAD tools do not account for its RC characteristics, the simulation would be highly inaccurate. Fortunately, all routing in a standard cell process is done in metal. If any hand edits are made or if there are any deviations from a previously proven flow, check for poly routing. Special cells that are more custom in nature, such as memory decoders, select lines in memories, or decoded buses, may use poly as a routing layer to achieve higher density. If the timing of drop-in cells is not provided to include in the RTL model as described in section 3.2, do not neglect the RC delay of lines when characterizing the block's timing.
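The counting-squares method described above amounts to one division and one multiplication. A minimal Python sketch, using the ohms-per-square values of Table 3.5 (the function name and dictionary keys are hypothetical):

```python
# Sheet resistance in ohms per square, from Table 3.5.
SHEET_RESISTANCE = {
    "polysilicon": 40.0,
    "metal1": 0.25,
    "metal2": 0.1,
    "metal3": 0.1,
}


def line_resistance(material, length, width):
    """Resistance (ohms) of a line; length is measured along the current flow."""
    squares = length / width
    return squares * SHEET_RESISTANCE[material]


# The line of Figure 3.19a has 10 squares; the line of 3.19b has one third.
for material in SHEET_RESISTANCE:
    print(material, line_resistance(material, 10.0, 1.0),
          line_resistance(material, 1.0, 3.0))
```

Only the ratio of length to width matters, so the same function works for any unit of length as long as both dimensions use it.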
Fig. 3.20 Distributed RC Model Provides Greater Accuracy than Lumped Model
accurate to within 3% of a transmission line model. Simulations show that the 10%-90% delay of the distributed line approaches the RC time constant as the number of RC segments in the simulation increases [Wilnai71] and [Chiprout98]. The RC delays for lines of minimum width and various lengths are listed in Table 3.7. The capacitance in each case is that of the line to the substrate. Polysilicon is the highest resistance routing material, so it results in the highest delays. Very few lines, if any, will be run in polysilicon, but those that are need a closer look. The delays are easily reduced by increasing the line width, which in turn decreases the resistance. Although the capacitance also increases, the significant decrease in resistance results in lower propagation delays. Note that the delays in metal are minimal. They may be a bit higher if the line runs over closer layers like poly or other metal lines; however, for the most part the delay of a metal trace will not affect performance significantly. If the circuit is designed right on the edge of the process capabilities, long metal lines may need closer scrutiny. If there are several long lines on the device, simulate the distributed RC model of one of them to see if all need to be modeled more accurately.
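The effect of splitting a line into more RC segments can be explored without a circuit simulator by using the Elmore delay, a first-order estimate. This is a different metric from the 10%-90% figure quoted above and is not necessarily what the cited simulations measured; it is used here only because it can be computed in a few lines. The sketch below takes a 1500um minimum-width polysilicon line: 1500/0.8 squares at 40 ohms per square gives 75,000 ohms, and the tabulated poly-to-substrate capacitance for that length is 0.227 pf. The function name is hypothetical.

```python
def elmore_delay(r_total, c_total, segments):
    """Elmore delay of a line modeled as `segments` equal RC sections.

    Each capacitor is charged through all the resistance upstream of it.
    With ohms and pf as inputs, the result is in picoseconds.
    """
    r_seg = r_total / segments
    c_seg = c_total / segments
    delay = 0.0
    upstream_r = 0.0
    for _ in range(segments):
        upstream_r += r_seg
        delay += upstream_r * c_seg
    return delay


R_POLY = 1500 / 0.8 * 40.0   # ohms, 1500 um line of 0.8 um width
C_POLY = 0.227               # pf, poly to substrate at 1500 um

for n in (1, 3, 10, 100):
    print(n, "segments:", elmore_delay(R_POLY, C_POLY, n), "ps")
```

In this estimate the single-segment (lumped) model gives the full RC product, three segments give two thirds of it, and the value settles toward half of RC as the segment count grows, which shows why a lumped model overstates the delay of a long resistive line.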
takes care of what is known as the front-end part of the design, which consists of the functional specification through to postsynthesis gate-level simulations. The vendor's responsibilities are to supply the technology library and to perform the layout. Once the layout is done, the vendor provides the designer with an extracted SDF file for final verification. If the extracted design passes final verification, there is a high probability that the design will work when fabricated. A checklist details which tasks are to be done by whom and how to ensure, or verify, that the task has been done properly. Most checklists cover at least the following items.
lines that help the synthesis tool better utilize the library, they should be followed.
nology library and are accurate. The delays are provided from the synthesis tool via a standard delay format (SDF) file.
3.4.8 Floorplanning
Some designers have the tools and capabilities to perform their own floorplanning. Most do not and the task falls to the vendor. Floorplanning takes information from synthesis to group the cells to meet the timing performance. It feeds back more accurate wire-load models to the synthesis tool and it provides the framework for place and route.
3.4.12 Testing
Once the device is fabricated, the vendor does some limited tests to ensure the process met specifications. Production uses the vectors developed by the designer.