Checking The Timing Between Asynchronous Clock Group Paths
Introduction
In static timing analysis, slack is the ultimate metric for determining whether a
timing path meets the timing constraints placed on the design. Slack is the result
of combining several other pieces of timing information such as required time, arrival
time, data-path delay, setup or hold time, timing margins, multicycle path
adjustment, and cycle adjustment. While some of these elements are self-
explanatory, others are frequently a source of confusion among those trying to
understand static timing analysis.
This application note explains the elements that contribute to slack in the Magma
central timing engine. In particular, this document focuses on the less intuitive
ideas, like multicycle path adjustment and cycle adjustment. This application note
also addresses time borrowing issues in latch-based designs.
Timing Paths
The Magma timer abstracts a gate-level netlist into a series of timing nodes.
These timing nodes are connected by timing arcs that describe how the nodes
relate. The timing nodes and the arcs that connect them are the building blocks
that compose a larger construction called a timing path. Clock nodes make up
clock paths, and data nodes make up data paths. This document deals primarily
with data paths, because these are the paths where slack is calculated.
Every timing path is defined by a startpoint, an endpoint, and the timing nodes
between them. A startpoint is normally either a primary input for the design or the
output of a register cell. An endpoint is normally either a primary output for the
design or the input to a register cell. Combinational logic makes up the rest of the
timing path between the data startpoint and endpoint.
Multicycle Paths
A dynamic timing analyzer feeds a design a vector test set of many possible
input combinations. These vectors propagate through the design over several
clock cycles. The outputs that result from these input vectors are tested for
correctness.
Unlike a dynamic timing analyzer, a static timing analyzer does not require an
input set of test vectors. Even the clock is abstracted down to a single cycle. As a
result, the default behavior of the Magma timer, like any other static timing
analyzer, is to assume that data takes one clock cycle to propagate through a
timing path.
Harmonic Clocks
Figure 1 introduces an example in which two clocks exist in a design—one with a
period of 6 nanoseconds and the other with a period of 8 nanoseconds. Because
these clocks have different frequencies, their timing relationship with each other
changes. So, for example, the relationship from the rising edge of the 6-
nanosecond clock to the rising edge of the 8-nanosecond clock changes as the
clocks go in and out of sync with each other.
The Magma timer calculates the worst-case possibility resulting from this
phenomenon using the least common multiple (LCM) of the periods of the two
clock cycles. The LCM represents the point when the cyclic relationship between
the two clocks repeats. In this example, the LCM is 24 nanoseconds, because
the clocks have matching rising edges at 0 nanoseconds, and again at 24
nanoseconds.
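The LCM arithmetic above can be sketched in a few lines of Python. This is only an illustration, not the Magma timer's implementation; the function name and the use of integer picoseconds are assumptions made for the example.

```python
from math import gcd


def clock_lcm_ps(period_a_ps: int, period_b_ps: int) -> int:
    """Return the least common multiple of two clock periods, in picoseconds."""
    return period_a_ps * period_b_ps // gcd(period_a_ps, period_b_ps)


# The 6-nanosecond and 8-nanosecond clocks from the example above.
print(clock_lcm_ps(6000, 8000))  # 24000 ps, that is, 24 ns
```

Working in integer picoseconds avoids floating-point rounding when the periods are not whole nanoseconds.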
[Figure 1: Timing diagram of the 6-nanosecond and 8-nanosecond clocks, 0 to 24 nanoseconds]
If two clocks do not have a harmonic integral relationship with each other, the
calculated LCM can grow very large, because it takes the clocks many cycles to
harmonize again. The LCM of two clocks is calculated within a certain tolerance.
If you are unable to harmonize your clocks, one solution is to use the config
timing clock tolerance <time> command to change the tolerance, which
defaults to 10 picoseconds.
Note: Increasing the tolerance also introduces inaccuracies into the
calculation, so exercise caution.
The setup relationship is the shortest time, greater than zero, from the launching
clock edge to the next receiving clock edge. In Figure 2, four checks must be
made to determine the setup relationship. The results of these checks, which you
can verify by examining Figure 2, are 8 – 3 = 5 nanoseconds, 16 – 9 = 7
nanoseconds, 16 – 15 = 1 nanosecond, and 24 – 21 = 3 nanoseconds. The
shortest check is 1 nanosecond, so this is the setup relationship between these
two clocks—from the falling edge of the first to the rising edge of the second.
The setup relationship shows you how much budgeted time the data has to travel
from the launching to the receiving register. Factors, such as data-path delay and
setup time for the receiving register, must be subtracted from this value to
determine whether timing is met on that path. These topics are discussed later in
this document. Also note that the setup relationship is the shortest time from the
launching clock edge to the next receiving clock edge, because this is the worst-
case possibility.
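The four checks described above can be reproduced with a short Python sketch. It assumes 50 percent duty-cycle clocks and integer picoseconds, and it enumerates edges over one LCM window; the function and parameter names are hypothetical, and this is not a description of how the Magma timer performs the search internally.

```python
from math import gcd


def setup_relationship_ps(launch_period, launch_offset,
                          capture_period, capture_offset):
    """Smallest positive launch-to-capture edge distance over one LCM window.

    Each offset is the time of the relevant edge within its own clock
    cycle (for example, half a period for a falling edge).
    """
    window = launch_period * capture_period // gcd(launch_period, capture_period)
    launch_edges = range(launch_offset, window, launch_period)
    # One extra capture cycle so the last launch edge still has a next edge.
    capture_edges = range(capture_offset, window + capture_period + 1,
                          capture_period)
    return min(
        min(c - l for c in capture_edges if c > l)
        for l in launch_edges
    )


# Falling edge of the 6 ns clock (3 ns into its cycle) to the rising
# edge of the 8 ns clock (0 ns into its cycle).
print(setup_relationship_ps(6000, 3000, 8000, 0))  # 1000 ps
```

The per-edge distances are 5, 7, 1, and 3 nanoseconds, so the minimum is the 1-nanosecond setup relationship stated above.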
[Figure 2: Setup relationship between the 6-nanosecond and 8-nanosecond clocks, 0 to 24 nanoseconds]
Understanding Slack
You use the report timing path command to view timing paths. By default, this
report returns the path with the worst slack in the design. The report gives
information about the timing paths for the launching clock, receiving clock, and
the data path. It also gives a breakdown of how slack is calculated for that
particular data path. Various options allow you to look at multiple paths with a
variety of starting, ending, and intermediate timing nodes in late mode, early
mode, or both.
Start reg2/L2
End scan3/D
Reference scan3/E
Path slack 1445p
The output gives details about the clocks that launch and receive the data signal,
referred to as the starting clock and the reference clock, respectively. This
assumes that the path is register bound, meaning that it begins and ends at a
register. If the path is not register bound, either the startpoint or the endpoint of
the path is a primary I/O. In such a case, the path report does not contain
information for the starting clock and reference clock, because the information
does not exist.
Dissect the slack calculation portion of the report, given the following timing
constraints:
You have a two-clock design—one with a period of 2 nanoseconds and the other
with a period of 3 nanoseconds. A multicycle relationship exists between these
two clock domains, so any data launched by one clock and captured by the other
has two cycles to traverse the data path. Jitter of 150 picoseconds is specified for the
PLL generating clk2. Also, 200 picoseconds of network latency and 300
picoseconds of source latency are specified for both clocks. I/O timing
constraints are omitted from this list, because you are dealing only with register
bound paths.
Note: This discussion assumes that you are dealing with late mode timing
analysis. Find a brief discussion of early mode timing analysis later in
this application note.
Slack
In static timing analysis, slack indicates whether timing is met along a timing
path. A positive slack means that the signal can get from the startpoint to the
endpoint of the timing path fast enough for the circuit to operate correctly. A
negative slack means that the data signal is unable to traverse the combinational
logic between the startpoint and the endpoint of the timing path fast enough to
ensure correct circuit operation.
In late mode analysis, slack is the difference between the required time and the
arrival time for the timing path. The time that a signal needs to arrive at the
endpoint of the path to ensure that timing is met is called the required time. The
time that the signal actually arrives at the endpoint is called the arrival time.
In the path report, required time is called end-of-path required time and arrival
time is called end-of-path arrival time. This is to differentiate them from the
starting arrival time and reference arrival time, which are discussed in the next
section.
Because slack is the required time minus the arrival time, a negative slack
indicates that the signal arrives at the endpoint later than the time it needs to be
there, and vice-versa for positive slack. Next, consider the timing components
that contribute to the required time and the arrival time.
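The sign convention can be shown with a minimal Python sketch. The function name and the numeric values are hypothetical and chosen only to show a passing and a failing case.

```python
def late_mode_slack_ps(required_time_ps: int, arrival_time_ps: int) -> int:
    """Late-mode slack: required time minus arrival time, in picoseconds."""
    return required_time_ps - arrival_time_ps


# Hypothetical values, chosen only to show the sign convention.
print(late_mode_slack_ps(4000, 3500))  # 500: data arrives early, timing is met
print(late_mode_slack_ps(4000, 4200))  # -200: data arrives late, timing fails
```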
Clock source latency indicates the clock signal delay before arriving on the chip.
Clock network latency indicates the delay in the clock path between the boundary
of the chip and the clock pin on the register. Added together, these two
components give an indication of the delay in the clock tree from the PLL to the
capturing register’s clock pin, assuming no clock jitter.
Clock source and network latency, when you specify them as timing constraints,
are your best guess at what the actual source and network latency will be in the
circuit. The timer uses these values only in ideal clock mode. When the Magma software
establishes some information about the physical layout of the clock tree, these
values are ignored as the Magma timer switches to computed clock mode. In
computed clock mode, the analyzer uses placement and routing information to
estimate the delays in the clock trees.
Clock phase is the third contributor to reference arrival time. In static timing
analysis, all clock information is abstracted down to a single cycle. A clock
constraint is declared in relation to an absolute time zero, and the clock phase is
relative to this time origin. For example:
This clock constraint defines a clock named clk with a 6-nanosecond period. By
default, clocks are noninverted, so this clock has its rising edge event at zero and
its falling edge event at 3 nanoseconds. Because the static timing analyzer
considers only a single cycle of the clock, the event associated with the second
rising edge is not at 6 nanoseconds, but rather at zero, like the first one.
Likewise, the second falling edge event is at 3 nanoseconds. This is the case for
all rising and falling edge events. This can seem confusing at first, but it is an
efficient method of handling the clocks without much overhead. This method
requires some compensation in the form of a cycle adjustment, which is
discussed later in this document.
This clock constraint defines a clock named clksh that is identical to clk in the
previous example, but shifted forward in time by one nanosecond. For this clock,
all rising edge events occur at 1 nanosecond and all falling edge events occur at
4 nanoseconds.
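The single-cycle view of clock edges can be illustrated with a small Python helper. It assumes a noninverted clock with a 50 percent duty cycle; the function name and the shift parameter are hypothetical and do not correspond to any Magma command option.

```python
def edge_event_ps(period_ps: int, edge: str, shift_ps: int = 0) -> int:
    """Time of the rising ('R') or falling ('F') edge event within the
    single analyzed cycle of a noninverted, 50 percent duty-cycle clock."""
    if edge not in ("R", "F"):
        raise ValueError("edge must be 'R' or 'F'")
    rising = shift_ps
    falling = shift_ps + period_ps // 2
    return rising if edge == "R" else falling


# clk: 6 ns period, no shift.
print(edge_event_ps(6000, "R"), edge_event_ps(6000, "F"))              # 0 3000
# clksh: same period, shifted forward by 1 ns.
print(edge_event_ps(6000, "R", 1000), edge_event_ps(6000, "F", 1000))  # 1000 4000
```

Every rising edge of a clock maps to the same event time, which is why a later compensation step is needed.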
As a third example, consider the line directly below the reference arrival time in
Example 1:
The cycle adjustment is discussed later. For now, consider the information in the
parentheses. clk2:F#1 means that the launching clock is clk2 and that the edge
of the clock that triggers the flow of data down the timing path is the first falling
edge. Likewise, clk1:R#2 means that the reference clock is clk1 and that the
second rising edge of that clock captures the data at the end of the timing path.
So, clk1 is the reference clock for this timing path. Because clk1 is noninverted
and not shifted in time, the rising edge events occur at time zero. There is no
modifier added to the reference arrival time resulting from the clock phase, but
the reference arrival time in the report timing path is listed as 500 picoseconds.
This is because 200 picoseconds of network latency and 300 picoseconds of
source latency are specified for clk1, giving a total of 500 picoseconds for the
reference arrival time.
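The 500-picosecond figure is a simple sum, sketched here in Python. The function name is hypothetical; the numbers are the ones given for clk1 in this application note.

```python
def reference_arrival_ps(phase_ps: int, source_latency_ps: int,
                         network_latency_ps: int) -> int:
    """Reference arrival time in ideal clock mode: the capturing edge's
    phase plus the source and network latency of the reference clock."""
    return phase_ps + source_latency_ps + network_latency_ps


# clk1 rising edge: phase 0 ps, 300 ps source latency, 200 ps network latency.
print(reference_arrival_ps(0, 300, 200))  # 500
```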
The starting arrival time, like the reference arrival time, contains information
about the source delay and phase for the launching clock of the timing path.
Unlike reference arrival time, clock network latency is not included in the figure.
Instead, clock network latency for the launching clock is specified separately on
the next line in the path report:
This has the same overall result as reference arrival time, because the value for
clock path delay (network latency) is added to the starting arrival time during the
process of computing end-of-path arrival time. The starting arrival time, because it
does not include network latency, can be considered the time it takes for the
launching clock edge to travel from the PLL to the edge of the chip or block. The
delay from the edge of the chip or block to the clock pin of the starting register is
then added to that value as clock path delay.
Using Example 1, you can see that the total starting arrival time is 1800
picoseconds. Here is how this number is generated. First, find out which clock
launches the data. Looking in the parentheses after the cycle adjustment, you see that
the falling edge of clk2 launches the data. There is a source latency of 300
picoseconds specified for both clocks, which accounts for some of the starting
arrival time. The other 1500 picoseconds is a clock phase modifier. The clk2 clock is
noninverted and not shifted in time, so its rising edge occurs at zero. In this case,
though, the falling edge of the clock triggers the register. Because clk2 has a 50
percent duty cycle and a period of 3 nanoseconds, its falling edge must occur at
1.5 nanoseconds or 1500 picoseconds. This accounts for the remainder of the
starting arrival time.
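The 1800-picosecond starting arrival time can be reproduced as follows. This is a sketch that assumes a noninverted, unshifted clock with a 50 percent duty cycle; the function name is hypothetical.

```python
def starting_arrival_ps(period_ps: int, edge: str,
                        source_latency_ps: int) -> int:
    """Starting arrival time: source latency plus the phase of the
    launching edge. Network latency is deliberately excluded because
    the path report lists it separately as clock path delay."""
    if edge not in ("R", "F"):
        raise ValueError("edge must be 'R' or 'F'")
    phase_ps = 0 if edge == "R" else period_ps // 2
    return source_latency_ps + phase_ps


# clk2: 3 ns period, falling edge launches, 300 ps source latency.
print(starting_arrival_ps(3000, "F", 300))  # 1800
```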
Likewise, you must modify the starting arrival time to calculate the end-of-path
arrival time, because the end-of-path arrival time is when the data actually gets to
the end of the timing path. You are concerned with what time it gets launched
down the path (the starting arrival time) and how long it takes to get there. You
modify the starting arrival time using information, such as data-path delay, to
determine the actual end-of-path arrival time.
Clock source latency and the phase information have already been taken into
account as starting arrival time. To calculate end-of-path arrival time, add the
clock network latency and the delay along the timing path to the starting arrival
time.
The clock network latency is added to the starting arrival time as clock path
delay. Any delay that is incurred by traversing combinational logic between the
launching and capturing registers is combined and added to the starting arrival
time as data path delay.
Example 1 shows that the end-of-path arrival time is determined by adding the
starting arrival time, the clock path delay, and the data path delay. This gives the
Magma timer an estimate for when the data actually arrives at the endpoint of the
timing path. The last step is to subtract this number from the required time to
determine whether slack is positive or negative and, thus, whether the path
meets timing.
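The addition described above can be sketched in Python. The starting arrival time and clock path delay are the values discussed in this application note; the data path delay is not given in the text, so the value used here is a hypothetical placeholder, as is the function name.

```python
def end_of_path_arrival_ps(starting_arrival_ps: int,
                           clock_path_delay_ps: int,
                           data_path_delay_ps: int) -> int:
    """End-of-path arrival time: when the data reaches the path endpoint."""
    return starting_arrival_ps + clock_path_delay_ps + data_path_delay_ps


DATA_PATH_DELAY_PS = 1000  # hypothetical value, for illustration only

# 1800 ps starting arrival time, 200 ps clock path delay (network latency).
print(end_of_path_arrival_ps(1800, 200, DATA_PATH_DELAY_PS))  # 3000
```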
As discussed earlier, the clock latency and phase information is combined and
called reference arrival time. Jitter, skew, and setup time are subtracted from this
value, making the final required time more stringent than the time the clock
arrives at the capturing register’s clock pin (reference arrival time). The required
time is also made longer by adding a multicycle path adjustment to it, because
multicycle path constraints allow the data extra time to arrive at the path
endpoint. The cycle adjustment is a bookkeeping factor added to reference
arrival time to account for the way clock phase information is handled.
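The way these terms combine can be sketched in Python. The reference arrival time, cycle adjustment, and multicycle adjustment are the values used in this application note's example. The jitter, skew, and setup values are placeholders chosen for illustration, and the function name is hypothetical.

```python
def end_of_path_required_ps(reference_arrival_ps: int,
                            cycle_adjustment_ps: int,
                            multicycle_adjustment_ps: int,
                            jitter_ps: int,
                            skew_ps: int,
                            setup_time_ps: int) -> int:
    """End-of-path required time in late mode: the adjustments are added
    to the reference arrival time, and the margins are subtracted."""
    return (reference_arrival_ps
            + cycle_adjustment_ps
            + multicycle_adjustment_ps
            - jitter_ps
            - skew_ps
            - setup_time_ps)


required = end_of_path_required_ps(
    reference_arrival_ps=500,       # from the example in this note
    cycle_adjustment_ps=2000,       # from the example in this note
    multicycle_adjustment_ps=2000,  # from the example in this note
    jitter_ps=150,                  # placeholder value
    skew_ps=100,                    # placeholder value
    setup_time_ps=50,               # placeholder value
)
print(required)  # 4200
```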
You specify jitter using the force timing margin setup <time> -from <clock>
-jitter command. Always specify clock jitter as a setup check, because setup
checks are performed from one clock cycle to the next. Hold checks are
performed on the same clock edge, so jitter has no effect. When a jitter constraint
is defined as a hold check, this further constrains the hold check requirements,
which might be desirable if you want to ensure that all hold checks are met with a
nonzero time as a safety margin.
• Interclock skew
Interclock skew is the largest difference in time for two different clocks
arriving at their respective registers. Specify interclock skew using two
commands: force timing margin setup <time> -from <clock1> -to
<clock2> and force timing margin setup <time> -from <clock2> -to
<clock1>. You can also apply both of these forces for hold checks.
You specify clock source latency, network latency, and skew as an estimate
when the timer is in ideal mode. After the initial routing of the clock tree
performed by run route clock in fix clock, the timer has enough physical data
about the clock tree to switch into computed mode. At this point, the user-
specified forces for the clocks are discarded and replaced with more accurate
estimates. The two exceptions to this are the actual clock definition with force
timing clock and the clock jitter defined with force timing margin setup -jitter.
Both exceptions are because the timer has no information about the PLL external
to the chip or block that generates the clock.
The third value subtracted from the reference arrival time is the setup time for the
capturing register. This value is obtained from the library definitions of the
register cells, so it is technology dependent.
In Example 1, the timing constraints indicate a two-cycle path for all data
traveling between the two clock domains. The multicycle path adjustment for this
report is 2000 picoseconds, which is the period of the receiving clock, clk1.
Because this timing path is allotted two cycles (or one extra cycle), one full cycle
of time is added to the reference arrival time in the form of a multicycle
adjustment.
The Magma timer uses clk1 instead of clk2 to determine the period added in the
multicycle adjustment because, by default, the period used for multicycle
adjustment is taken from the reference clock. You can control this behavior. If you know
that the multicycle adjustment for a path should be based on the launching clock,
you can use the force timing multicycle -reference command. The default value
for the -reference option is end.
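The multicycle adjustment described above can be sketched as follows. The function and its parameters are hypothetical. The "end" value mirrors the default stated above, while "start" is an assumed name for the launching-clock alternative and is not taken from the Magma documentation.

```python
def multicycle_adjustment_ps(cycles: int, launch_period_ps: int,
                             capture_period_ps: int,
                             reference: str = "end") -> int:
    """Extra time granted by an N-cycle path: (N - 1) periods of the
    clock selected by `reference`."""
    if reference not in ("start", "end"):
        raise ValueError("reference must be 'start' or 'end'")
    period_ps = capture_period_ps if reference == "end" else launch_period_ps
    return (cycles - 1) * period_ps


# Two-cycle path from clk2 (3 ns) to clk1 (2 ns).
print(multicycle_adjustment_ps(2, 3000, 2000))                     # 2000
print(multicycle_adjustment_ps(2, 3000, 2000, reference="start"))  # 3000
```

With the default, the one extra cycle is measured with the reference clock's period, which gives the 2000 picoseconds reported in Example 1.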
Cycle Adjustment
Probably the most confusing aspect of slack calculation is the cycle adjustment
added to reference arrival time in order to calculate end-of-path required time.
Cycle adjustment does not have any physical or logical counterpart in terms of
design. Instead, it is a bookkeeping measure that results from the way in which
static timing is done. Static timing analysis considers only one cycle of a clock.
This being the case, each successive rising edge for a single clock has the same
time associated with it, even though each edge occurs one full clock period after
the previous one. Complete dynamic timing information is lost as a result of this
abstraction, but easily regained using cycle adjustment. Again, consider the
same path report in Example 1 to see where the cycle adjustment figure comes
from.
The cycle adjustment line in the path report also contains information about the
launching and receiving clock edges for the currently reported timing path. For
example:
This line means that the timing path is launched by the first falling edge of clk2
and captured by the second rising edge of clk1. Why are edge numbers reported
if the static timer only really considers a single cycle of the clock? These edge
numbers are the results of the setup or hold relationships between the two
clocks. In this case, the timer must examine enough cycles of both clocks to
determine the worst-case time for setup and hold relationships. Otherwise, the
timer is not concerned with multiple clock periods.
The edge numbers are determined by calculating the LCM for these two clock
periods and then finding the setup relationship from clk2 falling to clk1 rising. As
illustrated in Figure 3, the LCM for these two clocks is 2 x 3 = 6 nanoseconds.
The setup relationship from clk2 falling to clk1 rising is 500 picoseconds,
between the first falling edge of clk2 and the second rising edge of clk1. These
edge numbers correspond to the ones reported in the cycle adjustment line of the
report in Example 1.
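The edge numbers can be recovered with a short search over one LCM window. The Python below is a sketch under the assumption of 50 percent duty-cycle clocks and integer picoseconds; the function name is hypothetical, and the search strategy is not claimed to match the Magma timer's internals.

```python
from math import gcd


def worst_setup_edges(launch_period, launch_offset,
                      capture_period, capture_offset):
    """Return (setup_relationship_ps, launch_edge_no, capture_edge_no)
    for the tightest launch-to-next-capture pair in one LCM window.
    Edge numbers count from 1."""
    window = launch_period * capture_period // gcd(launch_period, capture_period)
    capture_edges = range(capture_offset, window + capture_period + 1,
                          capture_period)
    best = None
    for i, launch in enumerate(range(launch_offset, window, launch_period),
                               start=1):
        for j, capture in enumerate(capture_edges, start=1):
            if capture > launch:  # first capture edge after this launch edge
                candidate = (capture - launch, i, j)
                if best is None or candidate < best:
                    best = candidate
                break
    return best


# clk2 falling (3 ns period, edge at 1.5 ns) to clk1 rising (2 ns period).
print(worst_setup_edges(3000, 1500, 2000, 0))  # (500, 1, 2)
```

The result corresponds to the first falling edge of clk2 and the second rising edge of clk1, with a 500-picosecond setup relationship.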
Figure 3 shows that in the worst case, data has 500 picoseconds to get from the
startpoint to the endpoint of the timing path. But, consider the phase of each
clock. The clk2 clock launches the data with its falling edge, which occurs at 1500
picoseconds. The clk1 clock receives the data with its rising edge, which occurs at 0
picoseconds. This means that ideally (with no clock delay, data delay, setup time,
multicycle adjustment, and so on), data has 0 – 1500 = -1500 picoseconds to
traverse the timing path. This does not make sense and is a result of abstracting
away the dynamic clock information.
[Figure 3: Setup relationship between clk2 and clk1, 0 to 6 nanoseconds]
By examining the setup relationship in Figure 3, you know that the data should
ideally have 500 picoseconds to traverse the timing path. This is where cycle
adjustment occurs. If you add 2000 picoseconds to the reference arrival time,
that number minus 1500 picoseconds becomes 500 picoseconds. Therefore,
cycle adjustment is assigned the value of 2000 picoseconds.
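The arithmetic in this paragraph can be written as a small Python function. This sketch is derived from the worked numbers above, not taken from the Magma documentation, and the function name is hypothetical.

```python
def cycle_adjustment_ps(setup_relationship_ps: int,
                        launch_phase_ps: int,
                        capture_phase_ps: int) -> int:
    """Amount to add to the reference arrival time so that the
    single-cycle phase difference equals the true setup relationship."""
    phase_difference_ps = capture_phase_ps - launch_phase_ps
    return setup_relationship_ps - phase_difference_ps


# Setup relationship 500 ps, clk2 falling at 1500 ps, clk1 rising at 0 ps.
print(cycle_adjustment_ps(500, 1500, 0))  # 2000
```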
A basic formula for calculating cycle adjustment (in late mode) is: