Troubleshooting Simulation Failures
If you need to regenerate patterns while debugging them, adjust the tester cycle period to a value that is easy to work with and to troubleshoot. For example, a tester cycle period of 1000ns is easier to work with than one of 240ns. If you have multiple timing sets defined, try to make them all use identical tester cycle periods.
The parallel Verilog testbench does not present the same simulation times for the capture procedures as the serial testbench. If you want the parallel scan load to have the identical timing that the serial scan load would take, define tmax_serial_timing during the Verilog compilation/simulation. This can be done by placing a `define statement within the Verilog testbench, or by adding a +define+tmax_serial_timing argument to the Verilog command-line options.
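For example, either of the following enables the serial-equivalent timing; the VCS invocation is only a sketch and the file names are hypothetical:
`define tmax_serial_timing // placed within the Verilog testbench
vcs design.v tmax_testbench.v +define+tmax_serial_timing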
Most users simulate 5 to 10 patterns using serial scan loading with full timing, and the balance of the patterns using parallel scan loading in zero-delay or typical timing mode. If you are using unit-delay timing, try zero delay. If zero delay is failing, then try typical timing. If that fails, try full annotated timing. To make this process more efficient, see the later section on isolating a failing pattern.
Unless you have explicitly suppressed the creation of the chain test by use of set_atpg -chain_test off, the first pattern (pattern 0) in the pattern block is a chain test pattern. By default, this pattern shifts a repeating value of 0011... into each scan chain; there is no capture clock in this pattern, so the values are not disturbed, and the same values (adjusted for scan-in to scan-out inversions) are expected to shift out again during the scan unload. If you experience simulation mismatches on pattern 0 (which occurs during pattern 1 scan loading), then you have a fundamental problem with your design because it cannot successfully shift a bit from scan input to scan output. Look for a clock timing problem in the vicinity of the scan cell that fails. You might find one of the alternative scan chain patterns, such as "1000", easier to debug; these are selected by the -chain_test option of the set_atpg command before generating patterns. See the later section on interpreting failure messages and translating them into scan cell instance names.
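For example, a sketch of selecting the alternative chain-test sequence before generating patterns:
set_atpg -chain_test 1000
run_atpg -auto_compression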
Another possibility when the scan chain tests fail is that you are using a pattern translator on one of the native pattern formats created by TestMAX ATPG (STIL, WGL, VHDL) and the translator is introducing an error. This can happen when the pattern translator was created for a different ATPG tool and then used with TestMAX ATPG output. If you are passing WGL through a translator to other formats, carefully study the inversion controls as well as the bidirectional port mapping controls of the set_wgl command. Also check the FAQ section of this online help for vendor-specific WGL setup and configuration advice.
Stick to basics and work on getting the chain tests to pass before moving on to failures in other
patterns.
Does your simulator indicate any setup, hold, or other timing violation before the ATPG pattern mismatch? If so, you might have timing problems to correct that are outside the scope of ATPG patterns. Investigate the areas experiencing the timing problems and correct them. The add_capture_masks command can be helpful for disabling the capture of expected values at state elements with setup/hold timing problems. If the state element is also part of a scan chain, then the add_cell_constraints command can be used to mask observed values and control loaded values into that state element.
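For example, a sketch assuming a hypothetical instance path identifies the cell:
add_capture_masks u_core/meta_reg
add_cell_constraints OX u_core/meta_reg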
If you are experiencing simulation timing problems, you might benefit from switching to zero-delay simulation mode. This is not always successful, though, and having ATPG pattern mismatches in zero-delay mode is not conclusive proof of bad patterns.
If you feel comfortable doing so, you can also temporarily edit the SDF timing annotation file to change the setup or hold limit to zero or a very small number and then re-simulate. If these simulations now pass, this is an indication that the ATPG simulation mismatch is directly linked to a timing problem. The ATPG patterns will continue to show mismatches until the timing problems are corrected.
Unless you have successfully used your current library before, you should suspect that the ATPG models in use do not match the simulation library models. Even if you have used the library before, you might be using different library cells with the current design than with the previous design. When an ATPG model produces a different expected answer (other than X) than the simulation model, the result is ATPG patterns that fail in simulation.
One sanity check that you can perform on TestMAX ATPG patterns is to run a good machine simulation of the patterns both with and without the -sequential option:
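run_simulation
run_simulation -sequential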
The second form of the command, with the -sequential option, uses a different simulation engine whose primary intended function is the simulation of non-ATPG functional patterns. This simulation engine has many limitations, and it is not uncommon for it to report mismatches when it is used on ATPG patterns. However, if use of the -sequential option does show a difference, then it is best to have Synopsys evaluate whether that difference is expected or a bug.
In summary, mismatches using run_simulation are not expected and are considered a high-probability indicator of a bug. In contrast, mismatches using the -sequential option are often normal and only occasionally an indication of a problem; however, it is best to submit a testcase to Synopsys for review.
If there are no mismatches reported by either form of the run_simulation command, this does not
rule out the possibility that there is a TestMAX ATPG bug. If you have exhausted all reasonable
possibilities then it is time to send in a testcase and the referenced simulation libraries, timing files,
control files, and so forth.
Important Note: Do not add the run_simulation step to your generic command scripts, as it is not
a necessary step in the standard ATPG flow. Performing a run_simulation command at the end of
an ATPG run wastes CPU time because the simulation has already been done during the vector
generation process; there is no need to repeat this simulation. Use of the run_simulation command
is only recommended when debugging patterns is necessary or when working with functional patterns.
If just a few patterns are failing, you can gain some useful clues about those patterns by running the report_patterns -all -types command. Check whether the patterns that fail are associated with the same clock, or whether they are of a particular type, such as Basic-Scan with clock_on_measures (COM).
Finding a pattern in the failures can give you some potential workarounds. If you find the clock-on-measures patterns are failing, you could return to DRC mode, disable them, and then regenerate patterns. For some clocks, such as asynchronous resets, you could try constraining them to an off value, as shown below.
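For example, a sketch of constraining a hypothetical asynchronous reset port to its inactive value:
add_pi_constraints 0 async_rst_n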
If there are just a few patterns failing, you could extract and eliminate them from the pattern set by making use of the -reorder option of the write_patterns command. This option takes as an argument a file containing a numeric list which defines both the order and the pattern numbers of the patterns to be written. It is a convenient method for dropping selected patterns from the pattern output. Note, however, that you cannot drop or reorder patterns from within a range of Full-Sequential patterns. This is because Full-Sequential patterns assume the design is left at the simulation state caused by the prior Full-Sequential pattern; any dropping or reordering of Full-Sequential patterns would lead to simulation failures and so is not allowed.
7. Are you using parallel patterns? What was the shift count?
Using the -parallel n_shifts option with a value of 1 or more when writing patterns assists in loading the nonscan devices to known states by serially simulating the last N shifts of every parallel scan chain load. This value is automatically calculated and included in the STIL pattern file. You can override this value using the write_testbench command, or the stil2verilog command using a configuration file, or on the VCS compilation command line using the predefined option +tmax_parallel=N, as documented in the Test Pattern Validation User Guide.
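For example, a sketch of a VCS compilation command line overriding the shift count (file names hypothetical):
vcs design.v tmax_testbench.v +tmax_parallel=3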
Are you getting the same few scan cell locations failing over and over again? If so, you might wish to go back and use the add_cell_constraints OX command to mask off the observe value at those cells and generate new patterns. There is a drop in test coverage, but the new ATPG patterns can then pass in simulation.
If you are using Fast-Sequential or Full-Sequential ATPG along with a cell constraint of X or XX, you might wish to consider using the add_capture_masks command as well. The cell constraint causes the cell to be loaded with X, but the capture mask is necessary to ensure the cell remains at X for a pattern where multiple capture clocks might be applied. A sketch of the combination appears below.
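For example, with a hypothetical instance path:
add_cell_constraints X u_core/cnt_reg
add_capture_masks u_core/cnt_reg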
9. Did you have DRC violation warnings that would indicate that patterns might fail?
Did you have any N20 violations when reading the library or building the design? If so, there is a risk that the Verilog simulation model predicts an X when the ATPG model predicts a non-X. This rarely causes a simulation mismatch, but you might be in the unlucky 2% of N20 violations that are the root cause of the simulation mismatch. It might be worthwhile to perform a library validation of your library cells. A mismatch during library validation can identify a potential cause of simulation mismatches in your design.
The presence of certain DRC rule violations, such as C1, C5-C14, and S29 for the Basic-Scan or Fast-Sequential ATPG algorithms, and C22 and C25 for the Full-Sequential algorithm, can cause TestMAX ATPG to create patterns that fail in simulation. These warnings should be carefully investigated and corrected before starting ATPG. If you have not corrected them, an additional warning is issued at the start of ATPG pattern generation.
The set_rules command has a -mask option that attempts to increase the chances of patterns simulating successfully in exchange for potentially lower test coverage. Generally, this masking is unnecessary for the Basic-Scan and Fast-Sequential ATPG algorithms.
Have you ignored any V18 or V20 violations? If so, there is a risk that this has caused simulation
failures. Consult the online help for the full text of V18 and V20 violations and the risks involved.
Interpreting the Simulation Failure Messages
When a parallel compare fails, the bit number listed is the bit position in the chain relative to the scan chain output port, and the numbering starts from zero. So, bit 0 is the bit connected directly to the scan output port, bit 1 is the bit one shift clock away from the scan output, and so forth. The scan cell bit position can thus also be thought of as the number of shifts required to move the scan cell data to the scan output.
The inversion information is important to understand if you intend to correctly relate the pattern data to the values observed in simulation.
The two-character inversion code of IN for the master cell indicates the inversion to the internal sequential modeling element that has been identified as the master by TestMAX ATPG. The first character is an I if there is an inversion from the scan chain input to the sequential device identified as the master, and an N if there is no inversion. The second character conveys similar information about the inversion from the master's stored value to the scan chain output. The master is most often inside the library cell that the user normally sees, so the cell's pin inversion information is also necessary.
The single character I or N listed for the input pin conveys the inversion from the scan input port to the library cell's "DI" pin. In the previous example, the N indicates there is no inversion. Likewise, the single character I for the output pin "Q" indicates there is an inversion between the value seen on "Q" and the scan output port.
The previous diagram shows in simplified form how a relationship of IN for the master cell, and N and I for the cell's input and output pins, affects how data needs to be interpreted. In this diagram, the SDI pin represents the top-level design scan-in port and SDO represents the top-level design's scan-out port. There is no inversion in the scan path between the SDI port and the library cell's "DI" input pin. However, within the library cell there is inversion both going into and coming out of the sequential element. An inverter exists between the library cell's output and the top-level SDO scan output port, causing an additional data inversion to be considered.
XTB: Starting parallel simulation of 4 patterns
XTB: Using 0 serial shifts
XTB: Begin parallel scan load for pattern 0 (T=100.00 ns, V=2)
>>> Error during scan pattern 2 (detected during parallel unload of pattern 1)
>>> At T=840.00 ns, V=9, exp=1, got=0, chain c0, pin SO, scan cell 10
>>> At T=840.00 ns, V=9, exp=0, got=1, chain c0, pin SO, scan cell 11
XTB: Simulation of 4 patterns completed with 2 errors (time: 1700.00 ns, cycles: 17)
Since this is a scan chain unload mismatch, the error message indicates that the value at the scan output was expected to be 1, but 0 was observed. You need to translate this into "expected 0, got 1" at the Q pin because of the inversion indicated by the I on the output pin in the report_scan_cells output.
If you find yourself in the middle of a simulator debug session, keep in mind the four different possible inversion arcs shown in the previous diagram. If you are investigating the stored value within the sequential model of the DFF in the simulator, then the top-left arc tells you the relation between the scan-in data and the data to be found in the state element. The top-right arc tells you whether that value is inverted by the time it appears at the SDO output. The lower-left and lower-right arcs are used when you are looking at the "DI" and "Q" pins of the library cell in the simulator.
read_netlist <library_files> -library
read_netlist <design_netlist>
run_build_model <top_module>
# add_clocks, PI constraints, PI equivalences, as in your original flow
run_drc <original.spf> # with original STIL procedure file
You can essentially rerun any command file you had, with the exception that instead of:
add_faults -all
run_atpg -auto_compression
You'll do:
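set_patterns -external <file_you_saved_patterns_in>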
Use binary formats whenever possible to read patterns into TestMAX ATPG. Other pattern formats, such as WGL and STIL, cannot store all of the data about the patterns. For example, when you read a STIL or WGL pattern file back, a Fast-Sequential pattern might be interpreted as a Full-Sequential pattern.
Step #2: Now that the patterns have been read into TestMAX ATPG again, you can write them out. For example, say that your failing pattern is pattern 412 and you want to write out that pattern, plus one on either side, in STIL format, as sketched below.
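A sketch of such a command, assuming the -first and -last pattern-range options of write_patterns are available in your version:
write_patterns pat_411_413.stil -format stil -external -first 411 -last 413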
Your output file includes a test_setup procedure, if one was defined in your procedures file, along with patterns 411, 412, and 413.
Step #1: Get the patterns back in. If you have TestMAX ATPG up and running with the original patterns, then proceed to Step #2. If you don't, you'll need to reestablish the same environment as was used to create the patterns. You should essentially rerun any command file you had, with the exception that instead of:
add_faults -all
run_atpg # or some other variant
You'll do:
set_patterns -external <file_you_saved_patterns_in>
Step #2: Generate a list of patterns vs. pattern type:
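A sketch of generating this list, assuming the report output is redirected to a file named reorder.dat:
report_patterns -all -types > reorder.dat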
Step #3: Edit the 'reorder.dat' file and delete or comment out the lines corresponding to patterns 103,
412, and 720 as well as the table headings in this report.
Step #4: Write out new pattern file using the -reorder option. The edited file causes the undesired
patterns to be dropped as the patterns are written out.
write_patterns pat.wgl -external -format wgl -reorder reorder.dat
Step #1: Translate the mismatch messages from the simulator into the failure data file format needed
for the run_diagnosis command.
The first column is the failing pattern number. The second column is either the name of the scan output pin or, in the case of a design with multiple scan groups, the name of the scan chain. The third column is the scan cell position. The remaining columns are not required and are treated as comment text.
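For example, the first XTB failure from the earlier excerpt might translate into a line like this (a sketch; everything after the third column is comment text):
2 SO 10 exp=1 got=0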
Step #2: Read in the original pattern file into TestMAX ATPG and use the run_diagnosis command:
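A sketch, with hypothetical file names:
set_patterns -external original_pats.bin
run_diagnosis failures.dat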
So this has identified three different physical but logically identical fault sites that are being tested by
one of the failing patterns. This might be helpful information.
However, a word of caution -- this technique requires that all failures be provided starting from the first
failure encountered. You can't just randomly pick a failure from your simulation data and present it to
TestMAX ATPG via a failure file. You must present all of the reported failures in sequence up to and
including the failure you are interested in.
remove_faults -all
set_patterns -external pat_46.bin
read_faults faults_left.dat
run_atpg
write_faults faults_detect_46.dat -class dt -class pt -collapsed
Now you've got a fault list file that contains exactly the faults detected by pattern 46. We wrote the "collapsed" fault list, but you could have also saved the "uncollapsed" list, which would include all primary faults and equivalent fault sites detected.
There are a couple of problems with using this information. The first is that your design probably has more than 18 input pins and 18 outputs, and a scan chain slightly longer than 3 bits. But we needed to fit this onto a page and so had to pick a small example.
The second problem is that you need a reference for which bit in the "force_all_pis" and "measure_all_pos" events is which. This can be determined by using the report_primitives -pis -pios command for the inputs, and the report_primitives -pos -pios command for the outputs. The force_all_pis/measure_all_pos data corresponds left-to-right with the corresponding report_primitives command output from top to bottom.
The scan data is presented left-to-right in the order in which bits are shifted into the scan chain for
"loads" (111), or shifted out for "unloads" (011).
The previous example shows a six-event sequence for a basic-scan pattern. Time 0 loads scan chain c4, shifting three bits into the chain. Time 1 applies values to all top-level inputs. Time 2 measures all top-level outputs. Time 3 applies a pulse to four different clocks (which must have been declared as PI equivalent). Time 4 applies a master observe procedure. Finally, at Time 5, the expected data from the prior events is unloaded from chain c4, again shifting three bits out of the scan chain.
To begin, display the appropriate gates in the GSV window. Next, either issue the analyze_simulation_data 5 -fast command or use the SETUP button to select a pin data type of "good machine" with a pattern number of 5. The following example shows the good machine pindata for pattern 5 on chain c4, bit 1 (instance u1/q1_sig_reg_0), as well as bit 2 (instance u1/q1_sig_reg2_2).
An important thing to remember about the "good machine" data is that there are five forms of the data display for the time in the cycle: time=clock (the default), time=preclock, time=postclock, time=all, and time=LETE. The various forms are discussed below.
For time=clock, the values shown on the schematic represent the simulation results with the capture clock active and the state elements at their previous states. This is the default display for Good Sim Data until you adjust it. Think of this as a simulation snapshot at the instant the clocks go active, but before any data-in to data-out changes of DFFs or DLATs have occurred.
For time=preclock, the values shown represent the simulation results with the capture clock OFF and the state elements at their states from the last scan load. When the pattern is a Basic-Scan pattern, many gates/pins in the design do not have a calculated preclock value, and the value is shown as 'x', '-', or '?'.
For time=postclock, the values shown represent the simulation results with the capture clock OFF and the state elements at their newly captured states.
For time=LETE, the values shown on the schematic represent the simulation results with the capture clock on, and with the leading-edge and level-sensitive state elements at the new values that will be used by a trailing-edge state element.
For time=all, the values shown on the schematic represent the pre-, active-, and post-clock times as three characters. A question mark "?" represents data not available. This form is probably the most natural representation of data for those familiar with logic simulators. The command to display this type of time value is set_primitive_report -time all. Do not trust the pre-clock time value: TestMAX ATPG does not simulate the pre-clock value for all gates in the design, only for those gates where there is a need. Generally, an X value is used as the pre-clock value for all other gates. For clocks and reset values, a non-X value is used. This makes ATPG much faster but debugging somewhat harder and potentially more confusing.
The analyze_simulation_data command can also be used to read in a Value Change Dump
(VCD) data file collected during Verilog simulation and to display the values graphically, or side-by-side
with Fast-Sequential or Full-Sequential expected values.
See Also
set_primitive_report
report_settings primitive