Streaming Scan Network

This document proposes a Streaming Scan Network (SSN) architecture to efficiently test increasingly complex System-on-Chip (SoC) designs with many cores. SSN allows simultaneous testing of any number of cores using few chip input/output pins. It facilitates short test times by enabling high-speed data distribution across cores and handling imbalances between cores. SSN provides a plug-and-play interface in each core well-suited for tightly integrated designs. The paper compares SSN's test cost and productivity against Intel's Structural Test Fabric.

Streaming Scan Network (SSN):

An Efficient Packetized Data Network for Testing of Complex SoCs

Jean-François Côté, Mark Kassab, Wojciech Janiszewski, Ricardo Rodrigues, Reinhard Meier, Bartosz Kaczmarek,
Peter Orlando, Geir Eide, Janusz Rajski, Glenn Colon-Bonet*, Naveen Mysore*, Ya Yin*, Pankaj Pant**

Mentor, A Siemens Business, 8005 SW Boeckman Road, Wilsonville, OR 97070
* Intel Corporation, 4701 Technology Parkway, Fort Collins, CO 80528
** Intel Corporation, 75 Reed Road, Hudson, MA 01749
2020 IEEE International Test Conference (ITC) | 978-1-7281-9113-3/20/$31.00 ©2020 IEEE | DOI: 10.1109/ITC44778.2020.9325233

Abstract—System-on-Chip (SoC) designs are increasingly difficult to test using traditional scan access methods without incurring inefficient test time, high planning effort, and physical design/timing closure challenges. The number of cores keeps growing while chip pin counts available for scan remain constant or decline, limiting the ability to drive cores concurrently. With increasingly commonplace tiling and abutment, the scan distribution hardware must be placed inside the cores, making balanced pipelining when broadcasting to identical cores difficult. Optimizing test time requires analyzing all the cores and subsequently changing the test hardware in the cores. Internal shift speed constraints may limit the ability to shift data in and out of the chip at high rates. Differences in pattern counts or scan chain lengths between cores tested in parallel can result in padding and increased test time. SSN is a bus-based scan data distribution architecture designed to address all these challenges. It enables simultaneous testing of any number of cores even with few chip I/Os. It facilitates short test time by enabling high-speed data distribution, by efficiently handling imbalances between cores, and by supporting testing of any number of identical cores with a constant cost. It provides a plug-and-play interface in each core that is well suited for abutted tiles, and simplifies scan timing closure. This paper also compares the test cost and implementation productivity of SSN with those of Intel’s Structural Test Fabric.

Keywords—Design For Test, DFT, SoC Test, Hierarchical Test, Multiple Identical Cores, Known-Good-Die Testing, Test Time Reduction, Low Pin Count Test, Scan Distribution Architecture, Scan Fabric

I. INTRODUCTION

With some Integrated Circuits (ICs) growing to billions of transistors, it is virtually impossible to design, implement, and test them flat. A System-on-a-Chip (SoC) is an IC that is comprised of multiple components, referred to as cores. Each core is typically designed, implemented, and validated independently before being integrated with others. As design complexity has grown, so have the levels of core hierarchy. It is not uncommon to have lower-level cores integrated into subsystems, which are integrated into chiplets that are then assembled into a chip.

As design is done hierarchically to manage complexity, so is DFT. In hierarchical test methodologies [1][2][3], scan chains and compression logic [4][5][6] are inserted into every core. The cores are wrapped with scan and interface control logic. Test patterns targeting most faults in a core are generated and validated at the core level. Subsequently, the patterns from multiple wrapped cores are retargeted or mapped to the top level. They are often merged with patterns retargeted from other cores that are tested at the same time if scan access and design constraints permit. In addition to retargeting patterns generated for testing the wrapped logic within each core, test pattern generation is also run at the next level up to test peripheral logic outside wrapper chains as well as logic at that higher level of hierarchy. If this parent level is not the chip level, then those patterns will also have to be retargeted to the chip level. The same test pattern generation and retargeting methodology is applied recursively regardless of the levels of hierarchy, but the planning and implementation of DFT get more complex with additional levels of hierarchy, especially when using conventional scan access methods.

The following subsections explain key SoC test challenges inherent with pin-mux scan access, which is commonly used in the industry and explained in the referenced papers.

A. SoC Test Challenges: Planning and Layout

Traditionally, for a group of cores to be tested concurrently, one of the requirements is that their channel inputs and outputs must be directly connected to chip-level pins. As the number of cores in SoCs grows and the number of chip-level pins available for scan test remains the same or is reduced, additional groups of cores and scan access configurations must be created. This has negative implications on DFT implementation effort, silicon area, pattern retargeting complexity, and test time.

Part of hierarchical test planning is to identify early in the design flow the number of scan channels used in every core, and the groups of cores which will be tested concurrently in every scan access configuration. This can result in sub-optimal results since it creates fixed core groupings and forces premature decisions on channel counts per core before the cores are completed and before their compression configurations can be optimized and their pattern counts estimated. Chip-level design decisions depend on the cores. The cores are finalized too late in the design cycle, and their compression configurations are influenced by the chip-level core groupings and pin availability. This mutual dependency makes it virtually impossible to optimize compression for the SoC. As the number of levels of

Regular Paper INTERNATIONAL TEST CONFERENCE 1

Authorized licensed use limited to: University of Prince Edward Island. Downloaded on May 16,2021 at 18:36:16 UTC from IEEE Xplore. Restrictions apply.
core hierarchy increases, the planning complexity and test inefficiency also grow.

Connecting chip pins to the cores can have physical design implications. Connecting each pin to different cores in different test configurations can lead to routing congestion. The pads may be embedded inside cores in some packaging technologies such that the connections for one core impact the design of other cores to which the signals have to be routed, or through which the scan connections flow. Those connections are also often pipelined, so timing between those pipeline stages and compression logic must be carefully designed to achieve high shift speeds and avoid timing violations.

Tile-based layout is a relatively recent trend in SoC design that is adding further complexity and constraints to DFT architectures. In pure tiling layouts, virtually all logic and routing is within the cores and not at the top level. The cores are designed to abut one another when integrated into the chip such that connections flow from one core to the next. Any connectivity between cores has to flow through cores that are between them. Logic that is at the top level has to be pushed into the cores and designed as part of the cores.

B. SoC Test Challenges: Limited Chip-Level Pins

When retargeting core-level patterns, limited chip-level pin counts can be dealt with by increasing the number of core groups and test sessions, as long as there are enough chip pins to drive at least each core individually. However, there are cases where simultaneous access to multiple or all cores is necessary, and grouping cores into smaller groups is not an option. One example is Iddq test, where scan data is loaded across the entire chip before a relatively lengthy current measurement is taken. When using scan compression such as Embedded Deterministic Test (EDT) [4], this means there must be enough pins available to drive all the EDT channels of the cores concurrently.

C. SoC Test Challenges: Identical Core Instances

Pattern retargeting in the presence of identical core instances can benefit from generating patterns once, and from the ability to broadcast the scan inputs from the same top-level pins, reducing both ATPG runtime and pin requirements. There are, however, still multiple challenges to be resolved.

Although broadcast of scan inputs keeps the number of input pins constant for any number of identical cores, the outputs are often observed independently to guarantee the same test coverage achieved at the core level and to ensure enough observability for diagnosing failing cores. Since at least 1 output channel is needed per core instance, this can limit the number of identical core instances that can be tested concurrently just as there are similar limitations on heterogeneous core instances.

The second issue is that after scan loading, the capture clocking is usually applied concurrently to all core instances. Combined with the broadcast of input scan data, the number of pipeline stages must be equal between a scan input pin and all the identical core instances it drives. This can be difficult to achieve in the presence of tiling where no routing or logic may exist outside the cores. Signals, including scan inputs, may propagate across multiple instances of the same core, accumulating pipelining delay. Routing of individual output channels from each core instance through the other core instances can also be complicated due to the fact that all cores are copies of each other. A solution exists where every core instance is programmed with a different number of pipeline stages and different routing for scan output paths, but this introduces complexity and limits the reuse of cores. Designing a new chip with more core instances requires redesigning the cores to account for differences in pipelining and routing channels.

II. PRIOR WORK

To address some of the challenges explained, a few companies have developed and published scan access technologies beyond the traditional pin-mux topologies. They vary in the scope of the challenges they address and the trade-offs they make.

A packetized bus-based architecture specifically tailored at providing a scalable solution for testing of multiple identical core instances was introduced in [7]. It is not a general scan access mechanism that can simultaneously test heterogeneous cores. It supports shifting in the expected data, in addition to input stimuli, such that on-chip comparison can be done and pass/fail data accumulated and observed. It also allows some trade-offs between efficiency and diagnostic information. Getting full failure data for diagnosis may require the application of a different pattern set; one that uses a different configuration than the full-rate mode used for high-volume manufacturing. This architecture also has data overhead because every parallel word includes a command opcode in addition to the scan data payload. The fact that each parallel word has to include both payload and a command imposes limits on how narrow the bus may be, and imposes additional constraints on the bus width and its relation to the core scan channel counts.

The authors subsequently introduced a new architecture [8] that has a different focus: while it maintains a solution for testing of multiple identical cores, its primary new design objective is to enable better bin packing for retargeted core-level patterns. It does so by providing flexibility in mapping chip-level pins to core-level scan pins such that there is flexibility in controlling which cores are tested concurrently. Instead of a bus architecture as in [7], it uses a flexible mux-based switching network. The architecture succeeds in enabling effective dynamic bandwidth management [9] and late-binding core grouping to minimize padding caused by test length differences across cores. However, this architecture incurs some costs. Given the network provides flexibility in connecting any top-level pin to any core-level scan channel (although there are restrictions on combinations of connections), the network can result in significant routing cost especially in the presence of a large number of cores. Using a mux-based star network is also less amenable to connection-by-abutment in tile-based designs compared to bus-based architectures.

The Structural Test Fabric (STF) solution [10][11], published by co-authors of this paper, provides a general packet-based core access mechanism that works for heterogeneous cores, and has a scalable solution for multiple identical cores. It is flexible in that every parallel word is self-contained, but incurs
overhead per parallel bus word. A detailed comparison of this architecture to SSN is presented in Section VIII.

To allow simultaneously driving more internal scan channels than the number of chip-level scan pins, some architectures such as [12] employ serializers/deserializers. This additionally allows running chip-level scan pins at higher frequencies than internal scan chains support, improving overall bandwidth. A subsequent version of this technology [13] added flexibility to allow varying the number of scan pins per core. The number of external scan pins per core and the related serialization/deserialization ratio are programmable. The purpose is to enable reuse of the test data for a given core across SoCs with different scan pin configurations. It also enables varying shift frequencies in different cores within the SoC. Those methods facilitate IP reuse and access to cores in the presence of limited chip-level scan pins. However, they do not address routing challenges in tile-based designs nor provide an efficient and scalable solution for multiple identical cores.

Some scan compression methods have extensions to facilitate test across an SoC. For example, the architecture in [14] can distribute test data to compression logic in cores, and uses serializers/deserializers to manage pin count limitations. However, as with the preceding method, it is not an abutment-friendly architecture nor does it efficiently test many identical cores as SSN will be shown to do.

In the next sections, we describe how SSN aims to solve the challenges presented in Section I, while improving on efficiency, flexibility, and capabilities of previously published access mechanisms.

III. SSN TECHNOLOGY FUNDAMENTALS

A. Architecture Overview

Fig. 1 shows a simplified example of a 6-core design that uses SSN. Each core typically contains one Streaming Scan Host (SSH) node (yellow box). The SSH drives local scan resources to load/unload scan chains/channels with data delivered on the SSN bus. In the figure, an EDT scan compression controller is shown for simplicity as a representative of the scan logic within the core. In reality, the SSH node can interface with EDT controller(s), uncompressed/legacy scan chains, or a combination of the two.

Fig. 1: SSN Architecture

Each SSH has two external interfaces: An IEEE 1687 [15] IJTAG interface predominantly used for setup, and a parallel data bus that subsequently transports the payload scan data and connects one SSH node to the next. The IJTAG network, shown as a 1-bit bus, is used to configure all nodes in the SSN network prior to the application of a test pattern set. Each node is loaded with information related to the protocol such as the active bus width, its location in the series of nodes driven, the number of shift cycles per scan pattern, scan_enable transition timing information, etc. Following this setup, the entire test pattern set is applied as packetized scan data that is streamed on the parallel bus shown as an N-bit bus. Because the protocol of alternating shift/capture operations is very regular and repeatable, each SSH is pre-loaded with the information needed for its counters and finite state machine to track the streaming operation. There is no need to send opcode or address information with each packet. Only the scan payload is streamed, as shown in the next section. As data streams through the SSH nodes, each node can identify when it needs to read scan_in data from the bus, when it needs to place scan_out data on the bus, and when it needs to pass along data that is destined for other nodes. Each SSH controls the local scan operations for the core, including transitions between load/unload and capture stages, as well as performing individual shift operations. All scan signals and EDT controls are generated by the SSN local to the core and the only test signals that cross core boundaries are the SSN parallel bus (N-bit data bus + clock) and the IJTAG signals. This allows scan timing closure to be completed at the core level.

SSN supports the abutment of cores in tile-based designs with no routing outside the cores. The outputs of one core connect to the inputs of the next adjacent core. A chip with SSN usually has a single datapath (parallel bus) that goes through all cores. Depending on the floorplan and pad locations, it may be preferable for physical design to implement multiple, physically independent datapaths (for example, one datapath per chiplet [16][17]). Each datapath is also configurable and can include muxes that can be programmed to include or exclude segments of the network similar to the Segment Insertion Bit (SIB) in IJTAG networks.

As will be demonstrated in the upcoming sections, the SSN bus width is selected based on chip-level pin availability and is independent of the number and logic size of the scanned cores, and the number of channels needed by the EDT controller(s) in each core. This enables each core to have the same plug-and-play interface and bus width for scan test, allowing SSN to scale efficiently as the design floorplan, number of cores, or the content of the cores change.

The ability to route the bus carrying the data from one core to the next while dynamically controlling which cores are active/inactive/bypassed means one has flexibility in accessing any combination of cores without changing the hardware. Unlike pin-mux architectures, this flexibility does not come at the expense of routing congestion. Additionally, there is no need to try and predict at design time how to group cores that are to be tested concurrently. Whether performing ATPG on groups of cores or retargeting patterns from different cores, the same SSN network can provide access to one core at a time, all cores simultaneously, or anything in-between.
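
The claim in this section that each SSH can tell, from pre-loaded configuration alone and without opcodes or addresses, which bus bits it should read, replace, or pass along can be illustrated with a small simulation. This is a minimal sketch, not the SSH hardware: it assumes that each node is configured with a hypothetical (offset, width) slot inside a repeating packet, that the packet stream is folded contiguously into bus words, and that bit 0 of each bus word comes first in the stream. The numbers follow the two-block example of Section III.B (block A with 5 bits, block B with 4 bits, an 8-bit bus); the names `owner_of_bit`, `bus_schedule`, and `shifts_per_cycle` are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical configuration: node name -> (offset in packet, width in bits)
Slots = Dict[str, Tuple[int, int]]


def owner_of_bit(stream_pos: int, slots: Slots, packet_size: int) -> str:
    """Return the node that owns bit number `stream_pos` of the packet stream."""
    pos_in_packet = stream_pos % packet_size
    for name, (offset, width) in slots.items():
        if offset <= pos_in_packet < offset + width:
            return name
    raise ValueError(f"packet bit {pos_in_packet} is not allocated to any node")


def bus_schedule(slots: Slots, bus_width: int, cycles: int) -> List[List[str]]:
    """For each bus cycle, list the owner of every bit of the bus word."""
    # Assumption: the slots tile the packet exactly, so its size is the sum of widths.
    packet_size = sum(width for _, width in slots.values())
    return [
        [owner_of_bit(cycle * bus_width + lane, slots, packet_size)
         for lane in range(bus_width)]
        for cycle in range(cycles)
    ]


def shifts_per_cycle(slots: Slots, bus_width: int, cycles: int, node: str) -> List[int]:
    """Count the internal shifts `node` completes in each bus cycle.

    A node shifts once each time it has collected `width` bits of its own.
    """
    width = slots[node][1]
    collected = 0
    shifts = []
    for word in bus_schedule(slots, bus_width, cycles):
        collected += word.count(node)
        shifts.append(collected // width)
        collected %= width
    return shifts


if __name__ == "__main__":
    slots = {"A": (0, 5), "B": (5, 4)}
    for word in bus_schedule(slots, bus_width=8, cycles=3):
        print("".join(word))
    # prints:
    # AAAAABBB
    # BAAAAABB
    # BBAAAAAB
    print(shifts_per_cycle(slots, bus_width=8, cycles=9, node="A"))
    # prints: [1, 1, 1, 1, 0, 1, 1, 1, 1]
```

Under these assumptions, 9 bus cycles carry exactly 8 packets, so block A shifts in 8 of the 9 cycles and skips one while it waits for its remaining bit. With `bus_width=1` the same function reports a single shift per 9 cycles.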

Fig. 2: Streaming scan packets

B. Packets

In SSN terminology, a “packet” usually consists of all the scan data needed for all the active SSH nodes to perform a single internal scan shift operation. A packet should not be confused with the actual SSN physical bus width which could be narrower or wider than a packet. The SSN payload delivered from the tester may be viewed as a continuous stream of packets that may wrap across SSN bus boundaries. To illustrate this concept, consider the example shown in Fig. 2 where two blocks are being tested concurrently. Block A loads/unloads 5 bits per shift cycle of the block (has 5 EDT channels). Block B has 4 channels. For both blocks to perform one shift cycle, 9 bits have to be loaded/unloaded. In conventional scan access methods, this would have required 9 chip-level scan input pins and 9 scan output pins. With SSN, the packet size in this example gets set to 9 bits independent of the SSN 8-bit bus width. 9 bits have to be delivered for each of the 2 blocks to shift once. The first 5 bits of every 9-bit packet are programmed to belong to block A, and the next 4 bits of every packet are programmed to belong to block B. This is all determined and programmed at pattern generation time – it is not hard-coded in the SSN logic. After programming all the SSN nodes using IJTAG, SSN delivers a continuous, repeating stream of 9-bit packets. The allocation of packet bit positions to SSH nodes is the same for all packets and is programmed at setup. As soon as block A extracts 5 bits from the bus, it performs one internal shift operation. Likewise for block B, every time it accumulates 4 bits. The SSH is programmed with the shift count per scan load, so it can identify when to perform shift, and when to perform capture. Capture involves events generated by the SSH such as de-asserting scan_enable, applying capture clocks through an On-chip Clock Controller (OCC) [18], and re-asserting scan_enable in preparation for the next scan operation.

In this example, we have decided to use 9-bit packets although the bus width is 8 bits. The stream of 9-bit packets is simply folded into the 8-bit bus with no bits wasted. The first 9-bit packet occupies the first 8-bit parallel word of the bus, and the first bit of the second word (second tester cycle). The second packet starts immediately after that, occupying the remaining 7 bits of the second parallel word, and the first 2 bits of the following parallel word. While the allocation of bits within a packet to an SSH is invariant, there is no static mapping between a bit of the bus and an EDT channel input/output. The locations of the 9-bit packets within each 8-bit bus word rotate with each packet. Each SSH node keeps track of the location of its data in each packet, including accounting for rotation of the data. The size of each packet must be equal to or greater than the bus width. In exceptional cases where the packet size is less than the physical bus width, the bus is re-programmed to reduce its active width such that it does not exceed the number of bits in a packet.

Typically, the same time slots of the packet that carry scan-in data to an SSH node also carry scan-out data from that node. (Multiple identical cores may be handled differently as explained later.) As block A reads the first 5 bits of every packet, it replaces them with 5 bits scanned out (with slight latency).

Any number of internal cores and their channels can be controlled with an SSN bus that is as narrow as one bit. This is because the packets can be as wide as they need to be, and can occupy as many bus words as needed. The internal channel requirements (9 bits in this example) are decoupled from the available scan pins at the chip level (8 × 2 pins for scan in this case). If the packet is wider than the bus and occupies multiple bus words, the cores shift less often than once every bus shift cycle but it will be possible to drive all the cores needed. In this example with 9-bit packets and an 8-bit bus, the blocks shift approximately every bus/tester clock cycle. Occasionally, a block may omit shifting in a given cycle because it has to wait to acquire all the bits it needs for one shift cycle. If the bus is 1 bit wide instead of 8 bits wide, it takes 9 tester cycles to scan in each packet. So the internal shift rate is 1/9th of the external shift rate, but it is still possible to drive all 9 internal channels from the 1-bit bus. In fact, the bus width can be scaled down dynamically at pattern generation time. When driving multiple cores concurrently such that the packet spans multiple bus widths, and the internal shift frequency is slower than the external frequency as a result, this presents an opportunity to deliver the data more quickly without exceeding the constraints on the internal core shift frequencies. It is common in SSN
implementations to cap the core-internal shift frequency at 100 MHz yet run a faster/narrow bus at 400 MHz.

C. The Streaming Scan Host (SSH) Node

Fig. 3 shows a high-level view of the SSH. In addition to its aforementioned functionality, other characteristics to highlight are:
1. If a core with an SSH is not under test in a given mode, the SSH may have to continue passing data through, being part of the network, but does not have to deliver scan data to its EDT. In this case, the SSH is said to be disabled. The data passes from the bus input register directly to the bus output register, such that the SSH acts as two pipeline stages within the network.
2. If a core is to be powered off when not under test such that the data cannot flow through the SSN segment within it, the datapath can be designed such that the segment going through the powered-off region is muxed out.
3. Because the packet data may rotate within the bus and span multiple parallel words, the SSH has shifters and registers to re-align and collect the data.
4. To test the SSH and the rest of the SSN network before they are used for scan test, the SSH can be placed into loopback mode. In this mode, the scan data normally going to EDT is directly fed back to the scan data normally unloaded from EDT, as shown in the figure.
5. The node is small in size. It is usually smaller than an EDT controller.

Fig. 3: Streaming Scan Host (SSH) node

IV. MANAGING CLOCK SKEW & BUS WIDTH

To maximize SSN’s throughput, it is desired to run the bus at higher frequencies than shift frequencies of the cores. It is possible to implement a 400 MHz SSN bus. It is, however, often unrealistic to balance the SSN clock throughout a large SoC. The SSN clock may be balanced within each core or groups of cores, but there may be clock skew between those regions that must not be allowed to degrade the shift frequency. This is addressed using a Bus Frequency Divider (BFD)/Bus Frequency Multiplier (BFM) pair, as shown in Fig. 4.

Fig. 4: Managing clock skew across CTS regions using BFD/BFM

The pair acts as a deskew FIFO. By temporarily converting a fast narrow bus into a slow wide bus when crossing Clock Tree Synthesis (CTS) regions, a larger amount of clock skew can be tolerated without impacting the shift speed or throughput. The FIFO logically acts like pipeline stages in the SSN datapath. Splitting the FIFO into 2 discrete components allows the BFD to be placed in the transmitting region and the BFM in the receiving region, with each component driven by the local SSN clock in its region.

The BFD and BFM nodes may additionally be used to reduce the bus width distributed around the chip and reduce the SSN area. Although an SSN bus that operates at 400 MHz can be easily implemented, it is often not possible to shift data through the chip-level pins at more than 200 MHz. Assume that the SoC has enough pins to implement 64 scan inputs and 64 scan outputs. One option would be to implement a 64-bit bus throughout the chip and operate it at 200 MHz. Alternatively, the data can be scanned into the chip through 64 pins at 200 MHz and a BFM added between the scan inputs and the first SSH to convert this input stream to a 32-bit, 400 MHz bus. This 32-bit bus is then used across the chip, connecting SSH nodes with 32-bit buses. Then before exiting the chip, a BFD node is added to convert the SSN output bus back to a 200 MHz 64-bit bus driving the output pins.
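
The bandwidth bookkeeping behind the BFD/BFM pair and the capped internal shift frequency can be checked with a few lines of arithmetic. The sketch below is only an illustration under simple assumptions that are not spelled out in the paper: raw bus throughput is taken as bus width times bus frequency, and every active core is assumed to shift once per packet, so the internal shift frequency is the throughput divided by the packet size. All function names are hypothetical.

```python
import math


def bus_throughput_mbps(width_bits: int, freq_mhz: float) -> float:
    """Raw bus throughput in Mbit/s, taken as width x frequency (assumption)."""
    return width_bits * freq_mhz


def internal_shift_mhz(width_bits: int, freq_mhz: float, packet_bits: int) -> float:
    """Internal shift frequency, assuming each core shifts once per packet."""
    return bus_throughput_mbps(width_bits, freq_mhz) / packet_bits


def min_packet_bits(width_bits: int, freq_mhz: float, max_shift_mhz: float) -> int:
    """Smallest packet size that keeps the internal shift rate at or below the cap."""
    return math.ceil(bus_throughput_mbps(width_bits, freq_mhz) / max_shift_mhz)


if __name__ == "__main__":
    # A 64-bit bus at 200 MHz and a 32-bit bus at 400 MHz carry the same data rate.
    print(bus_throughput_mbps(64, 200), bus_throughput_mbps(32, 400))
    # A 1-bit bus with 9-bit packets shifts internally at 1/9 of the bus rate.
    print(internal_shift_mhz(1, 400, 9))
    # Packet size needed to respect a 100 MHz internal cap on a 32-bit, 400 MHz bus.
    print(min_packet_bits(32, 400, 100))
```

Under these assumptions, a 32-bit bus at 400 MHz can run at full rate with a 100 MHz internal cap only once the packet is at least 128 bits wide; narrower packets would require slowing the bus or reducing its active width.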

V. OPTIMIZING TEST TIME AND DATA VOLUME

It is important to differentiate when the capture cycles of all cores must be aligned and performed concurrently versus when each core (or group of cores) operates independently and can capture regardless of whether other cores are shifting or capturing. The latter enables more efficiency when it can be used.

When running ATPG on a group of interacting cores, such as during external test, it is always necessary to align capture events because of the interactions between the cores during capture. In this case, the SSH in each of those cores can shift independently, but all those SSH nodes are programmed to capture concurrently once they complete a scan load/unload.

However, consider when pattern generation is performed on wrapped cores (or groups of cores) that are isolated from one another and have their own OCCs. At the top level, those pattern sets are independent and can be merged and applied concurrently. In most other retargeting solutions, the capture events are aligned as shown in Fig. 5. While this is necessary for ATPG, it can be unnecessary and inefficient for the case of retargeted patterns. Imbalances in shift lengths per scan load may result in unnecessary padding. A core with short scan chains should not need to wait for other cores to complete shifting before it can capture. Furthermore, there are often significant imbalances in the pattern counts of different cores. Traditional retargeting methods pad the cores with fewer patterns such that there is a waste of data and test time.

Fig. 5: Retargeting with aligned vs. independent capture

SSN has two features to reduce test time and test data volume in such cases. First, it supports independent shift/capture for different retargeted cores. This is possible because signals such as scan_enable and the shift clock are generated locally by each SSH. Second, it reduces the shift length/pattern count imbalances between cores by programmatically varying the bandwidth used for each core. If a core requires many fewer overall shift cycles across a pattern set than other cores, it can be sent fewer bits per packet. For example, a core with 4 channels does not need to be allocated 4 bits per packet. It can be throttled down and sent only 1 bit per packet such that it shifts internally every four packets instead of every packet. The result is that the total number of packets remains the same, but the size of the packets is reduced, speeding up the overall test time. The next section introduces further test optimization possible in the presence of multiple identical core instances.

Note that an additional benefit of independent capture is [...] shift and capture at the same time. In addition to scan access, this may further facilitate testing a large number of cores concurrently.

VI. TESTING OF MULTIPLE IDENTICAL CORES

Many SoCs that achieve high throughput by parallelizing processing contain a number of cores that are replicated multiple times. CPU chips often include multiple processor cores. AI and GPU chips in particular can have some cores replicated well over 100 times. As previously explained, in pin-mux scan access architectures, the scan inputs may be broadcast to identical core instances, but the scan outputs are usually observed independently to ensure lossless mapping and observability for diagnosis. This results in a non-scalable solution where increasing the number of core instances requires additional chip pins for concurrent test.

A. On-Chip Compare

SSN provides a scalable method for testing any number of identical core instances in near constant test time, independent of the number of available chip-level pins, even in the presence of tile-based design constraints explained earlier. Instead of shifting in the stimuli only and unloading the expected response for comparison on the tester, the stimuli, expected responses, and compare/nocompare mask data are scanned in within each packet so that each core can perform its own on-chip comparison. Note that the data arrives at each core instance at a slightly different time since the SSN bus data streams through the nodes. With each internal shift cycle, the channel data transferred from EDT to the SSH is compared, and a pass/fail status bit per channel per shift cycle is computed. What is ultimately observed on the tester is the following:
1. Per-shift status bits: This is the aforementioned pass/fail bit for a given channel in a given internal shift cycle. This status bit is allocated a timeslot in the packet for unloading. To provide a scalable solution for any number of identical core instances, the same status bit in the packet usually accumulates the pass/fail status from a given channel/shift cycle across all identical core instances (or a subset of them). If this bit indicates a fail, one can identify which core-level bit had a failure but not necessarily which core instance(s) this failure originated from. It is still possible to identify failing cores and per-core fail information for diagnosis as explained later.
2. Sticky status bits: One sticky bit per SSH indicates if there was a failure in scan observed by this SSH in any cycle/channel of the pattern set. This bit per SSH is unloaded through IJTAG at the end of a pattern set to quickly identify failing cores (for designs with redundant cores), and to aid in diagnosis. Note that where finer granularity than 1 fail bit per SSH is needed, it is possible to generate a sticky bit per channel output connected to the SSH.

Fig. 6 shows an example of data encoding into packets when using on-chip compare. Six identical core instances are used in this example, each driving an EDT controller that has 7 input
power. It can mitigate IR drop since cores under test do not all channels and 2 output channels. Each packet has enough scan
data for the cores to perform one internal shift operation. First,

Regular Paper INTERNATIONAL TEST CONFERENCE 6

Authorized licensed use limited to: University of Prince Edward Island. Downloaded on May 16,2021 at 18:36:16 UTC from IEEE Xplore. Restrictions apply.
7 bits per packet corresponding to the 7 input channels (shown in blue) are allocated. Those stimuli are broadcast (in time) to all identical core instances. The expected responses (2 output channels = 2 bits) and mask information (2 output channels = 2 bits) are also shifted in and broadcast (red). Last are the status bits that accumulate the pass/fail information per channel per shift cycle (green). Typically, we would allocate 2 bits corresponding to the 2 output channels. A failure in one of those bits would indicate that the corresponding channel of one of the 6 core instances failed, but we would not know which one. When we accumulate the status information of all 6 cores together, they are considered to be placed into 1 status group. In this example, we chose to partition the 6 cores into group “a” and group “b”. We only accumulate the fail information within each group. That is why we have 4 green bits: 2 output channels × 2 groups. The number of groups is programmable at pattern retargeting time. Increasing the number of groups beyond 1 sacrifices test efficiency for improved observability, as will be explained in the diagnosis section.

Fig. 6: Packets when using on-chip compare to test multiple identical cores

When using on-chip compare, the response data cannot replace the stimuli in the packet because the stimuli have to travel to all other core instances. Separate time slots have to be allocated for the stimuli, the expected responses and the masks shifted in, as well as the status bits unloaded. In the common case of 1 status group, the number of bits per packet is usually #input_channels + 3 × #output_channels. Because each output channel requires at least 3 bits of data in the packet (expected value, mask, and pass/fail status), using an asymmetric EDT with fewer output channels than input channels improves test time and test data volume in conjunction with on-chip compare.

B. Diagnosis Flow

Failure data is needed even during high-volume manufacturing for on-tester identification of failing cores to support partial good die strategies (redundant logic cores), and for diagnosis-driven yield analysis. When not using on-chip compare, every channel output bit in a core maps to a single bit on the top-level SSN bus outputs that are unloaded and compared on the tester. Logic diagnosis is straightforward in that case: perform reverse mapping of chip-level failures through the SSN network to the EDT channel outputs, then perform conventional compressed pattern diagnosis (at the core level in case of retargeted patterns).

Diagnosis in the presence of on-chip compare is more involved and may require re-application of the pattern set to collect all the data needed. Consider the case where all identical core instances are placed in a single status group such that their per-cycle pass/fail information is aggregated into the same packet timeslots. If any of those bits indicate failures, we have the cumulative per-pin per-cycle fail data but may not know which core(s) the failures came from. The sticky status bits unloaded at the end of the test set via IJTAG indicate which core(s) failed at least once. If only one core in this group fails, then we know the per-cycle pass/fail data came from this core alone and therefore we have all the information needed for diagnosis. However, if multiple cores fail, we have to separately test and observe each of those failing cores to get their individual fail data. If two cores fail, for example, then the same test set is re-applied twice, with minor patching applied. In each case, static bits in the setup of the cores are patched to control which cores are allowed to contribute to the cumulative pass/fail results. Note there is no need to store separate patterns for diagnosis on the tester.

If identical core instances are split into multiple groups, this slightly increases the test time, but decreases the probability of resorting to multiple test applications for collecting diagnosis data. In the example shown in Fig. 6, the six cores are split into two groups. If cores A1 and A4 are found to have failed, there is no need for test re-application because cores A1-A3 accumulate their status bits separately from cores A4-A6. However, if cores A1 and A3 fail, test re-application with patching is needed to acquire the individual fail data. In the extreme case, you may choose to assign each core instance to its own group so that each core is observed individually. This mode of operation may be better suited for silicon debug than high-volume manufacturing.

VII. ALTERNATE INTERFACES

A. Streaming Tests through JTAG/IJTAG Interfaces

It is possible not to use the SSH parallel bus at all, and instead use the JTAG(chip)/IJTAG(core) interface for both setup and subsequent streaming of the test data. There are two cases where this may be desirable:

1. As a survivability option. If during silicon bring-up, the bus is inaccessible due to a silicon defect, this provides an alternate method of accessing any SSH or group of SSHs.

2. If a low pin count device only has a JTAG interface and
no other digital pins, it is possible to implement SSN
without the parallel bus and rely on the JTAG/IJTAG
interfaces for streaming the test data.
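
Looking back at the on-chip compare packet layout of Fig. 6 and at the diagnosis flow, the bookkeeping can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function names, the core labels and the grouping dictionary are hypothetical and do not belong to any SSN tool, and the generalization of the packet size to several status groups is inferred from the Fig. 6 description (7 stimulus bits, 2 expected bits, 2 mask bits, and 2 status bits per group).

```python
from collections import Counter
from typing import Dict, List, Set


def packet_bits(input_channels: int, output_channels: int, status_groups: int = 1) -> int:
    """Bits per packet with on-chip compare (illustrative model only).

    Stimuli, expected values and masks are broadcast once to all identical
    cores; every status group gets its own pass/fail bit per output channel.
    """
    if status_groups < 1:
        raise ValueError("at least one status group is required")
    stimuli = input_channels
    expected = output_channels
    mask = output_channels
    status = output_channels * status_groups
    return stimuli + expected + mask + status


def groups_needing_reapplication(group_of: Dict[str, str], failing: Set[str]) -> List[str]:
    """Return the status groups in which more than one core failed.

    With a single failing core in a group, the accumulated per-cycle status
    bits can only have come from that core. With several failing cores in
    the same group, the pattern set has to be re-applied to separate them.
    """
    fails_per_group = Counter(group_of[core] for core in failing)
    return sorted(group for group, count in fails_per_group.items() if count > 1)


# Hypothetical encoding of the Fig. 6 example: cores A1-A3 in group "a",
# cores A4-A6 in group "b".
FIG6_GROUPS = {"A1": "a", "A2": "a", "A3": "a", "A4": "b", "A5": "b", "A6": "b"}

if __name__ == "__main__":
    print("1 status group :", packet_bits(7, 2), "bits per packet")
    print("2 status groups:", packet_bits(7, 2, status_groups=2), "bits per packet")
    print("A1 and A4 fail :", groups_needing_reapplication(FIG6_GROUPS, {"A1", "A4"}))
    print("A1 and A3 fail :", groups_needing_reapplication(FIG6_GROUPS, {"A1", "A3"}))
```

With one status group the model reduces to #input_channels + 3 × #output_channels, and an empty result from the second function corresponds to the case where the sticky bits alone resolve which core produced the fail data.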
B. Compatibility with Test Using SerDes (IEEE 1149.10)

IEEE 1149.10 [19] provides for re-using high-speed I/O (HSIO) SerDes lanes to enable very high bandwidth transfer of test data to/from a chip. The Packet Encoder/Decoder and Distribution Architecture (PEDDA) IP described in the standard results in deserialized data presented on a parallel bus. SSN’s synchronous parallel bus is ideally suited to interface with the PEDDA. SSN can handle on-chip distribution of test data and internal generation of test signals. As the SSN network can operate internally at high frequencies (at least 400 MHz), it is capable of testing many cores concurrently and quickly when coupled with this high-bandwidth chip-level interface.

VIII. PRACTICAL EXPERIENCE USING SSN

In collaboration with Mentor, Intel has been evaluating the use of SSN. SSN is capable of scaling to large SoCs and server-class designs that require support for large partition counts and identical core testing. Previous generations of Intel SoCs have utilized an internally developed high bandwidth packetized fabric, STF [10][11], to address these needs. STF was developed to allow this scalability at much lower overhead than the traditional pin muxed scan solutions. In evaluating SSN, the goals were to assess whether moving to SSN could further improve test time and bandwidth utilization over STF, as well as reduce design effort through the use of a vendor supported platform.

A. Comparison of Packet Encoding Overhead

Both STF and SSN can scale to support any number of partitions; however, the approach to accomplish this differs between the two systems. The STF network relies on explicit addressing information stored within each packet. This is accomplished by having a short address ID tag contained within each packet, typically 4 bits in size. In addition, STF requires an opcode field, 4 bits in size, as well as input and output valid bits. This results in an overhead of 10 bits being added to each data packet. In contrast, under SSN the destinations and interleave settings are statically programmed during the test setup, allowing the entire bus bandwidth to be used for data. For a typical bus size of 32 bits, STF has a 31% higher overhead than SSN. This is depicted in Fig. 7.

Fig. 7: STF packet overhead vs. SSN

B. Comparison of Data Field Utilization

STF utilizes a fixed data field size of 32 bits. To accommodate EDTs with a smaller number of channels, the STF data word is divided up into fields, and the data for multiple shift cycles is packed into the 32-bit word to achieve better utilization. However, when the EDT channel size does not divide evenly into the 32-bit word, this reduces efficiency as illustrated in Fig. 8. In this example, with 9-bit EDTs, we can pack 3 shift cycles of data into the data word with 5 bits of unused data, resulting in an overhead of nearly 16%. In the worst case of a 17-bit EDT, 47% of the data bandwidth is wasted. Thus, STF data field utilization can range from 53% to 100% depending on how the EDT data packs into the 32-bit word. Because SSN utilizes data rotation, any leftover bits within the bus become part of the next packet, always achieving 100% utilization of the bus data word.

Fig. 8: STF packing of narrow EDT data

C. Interleaving, Vector Count and Chain Length Mismatch Handling

Both STF and SSN scale to any number of partitions; however, their approaches differ in how they handle the interleaving of partitions. In the example shown in Fig. 9, a set of partition patterns that have differing numbers of vectors are to be merged. Typically, STF will have a specified interleave factor, in this case 4, and the patterns are repacked optimally into these 4 groups. These groups are then round-robin interleaved to create the final pattern set, as shown in the figure.

Fig. 9: STF pattern interleaving

Fig. 10: Chain length mismatch padding in STF

SSN’s handling of interleaving achieves similar efficiency for vector count mismatch as STF, but SSN can also partially

mitigate chain length mismatch between partitions, which STF cannot. STF requires all partitions in the pattern set to be padded to the same shift length, resulting in overhead. This is depicted in Fig. 10. In our current designs, we allow up to 20% chain length mismatch between partitions, so it is theoretically possible SSN could have up to 20% better packing efficiency in the final pattern.

D. Fabric Test Setup

Since the STF fabric is configured in-band using packets, the pattern overhead for network setup is very small, approximately 10 cycles per active endpoint. SSN utilizes IJTAG to program the network with approximately 160 bits of state per active endpoint, plus IJTAG network overhead. Though this could result in substantially higher setup overhead for SSN, the cost of the setup is amortized across the entire scan vector set. For large pattern sets, network setup should not amount to more than 1% overhead for SSN.

E. On-Die Compare

STF and SSN provide comparable functionality for identical core testing using on-die compare. Both systems require the input data stream to include the input data, mask data and expected response, causing a 3X growth of the data volume, but allow testing of any number of cores in constant time. SSN has a possible advantage in the handling of an asymmetric number of input and output channels. In this case, SSN can more tightly pack the expect and mask fields to match the smaller output channel count, possibly realizing less than 3X data growth. STF, however, allocates bandwidth assuming symmetric usage and always incurs 3X data volume. For the purpose of this analysis, we assumed that on-die compare would be neutral between the two systems.

F. Total Estimated Overhead Comparison

In summary, STF pays a high overhead in packet encoding, data field utilization and handling of chain length mismatch. Network setup overhead is higher in SSN, but it is amortized across the number of scan vectors, resulting in a negligible difference. Overall, this can lead to over 2X reduction in data volume under SSN vs. STF, as summarized in Table I.

Table I. Theoretical Comparison of STF vs. SSN Data Volume

                  Packet      Data Field    Vector/Chain
                  Encoding    Utilization   Length Mismatch   Fabric Setup
                  Overhead    Overhead      Overhead          Overhead
SSN (baseline)    1.0         1.0           1.0               1
STF               1.31        1.0 - 1.47    1.0 - 1.2         0.99

STF data volume vs. SSN (theoretical): 1.30 - 2.29

G. SSN Pilot Study

SSN offers a compelling theoretical advantage over the current STF fabric in use. However, we wanted to measure results on actual partition data to verify this. Further, the study looked at other aspects, such as design effort and run times. To perform the study, a simple test design was created consisting of a single interface partition, partition1, and four identical copies of a partition, partitions 2-5, as shown in Fig. 11.

Fig. 11: Pilot network topology

An SSN bus data width of 32 bits was chosen to match STF to allow direct comparison. ATPG patterns were created targeting partitions 2-5, each having 9 EDT channels for a total of 36 bits of channel data. By having a total channel data set size of >32 bits, SSN will perform data rotation and create a more meaningful comparison. The 9-bit EDT channel size represents a typical data field packing inefficiency for STF. Multiple ATPG runs were conducted to analyze the overhead at 10, 500, and 10,000 vectors. The results from these runs are summarized in Table II, comparing STF, SSN, and a legacy pin mux solution.

Table II. Pilot test time and data volume results

SSN (32b bus size)
Patterns   Setup cycles (IJTAG)   Scan test cycles   Total test cycles   Total data (Mb)
10         34,703                 3,280              37,983              0.28
500        34,703                 136,120            170,823             4.53
10000      34,703                 2,711,740          2,746,443           86.95

STF (42b packet size / 32b data size)
Patterns   Setup cycles (STF)     Scan test cycles   Total test cycles   Total data (Mb)
10         64                     3,800              3,864               0.16
500        64                     161,820            161,884             6.80
10000      64                     3,257,200          3,257,264           136.81

Pin-muxed GPIO (estimated)
Patterns   Setup cycles (IJTAG)   Scan test cycles   Total test cycles   Total data (Mb)
10         20,446                 5,740              26,186              0.29
500        20,446                 241,900            262,346             7.74
10000      20,446                 4,820,780          4,841,226           154.26

Ratios at 10K patterns   Total test cycles   Total data
STF vs. SSN              1.19                1.57
Pinmux vs. SSN           1.76                1.77

For this testcase, SSN shows a clear advantage over STF, with STF having 19% higher test time and 57% more data

volume than SSN. SSN test setup has higher overhead than STF; however, when amortized across the 10,000 vectors in the run set, this impact is in the expected range of 1.2%. This testcase used identical partitions and hence exercised neither vector count mismatch between partitions nor chain length mismatch, either of which would further favor SSN. For comparison purposes, a legacy pin muxed solution is included, showing a large overhead relative to SSN. Since the pin muxed solution cannot transport 36 bits of channel data in a single run, it must be split into 2 runs, nearly doubling test time and data volume.

In addition to data volume and test time metrics, we also collected information on design efficiency between the internal STF toolset and the Mentor Tessent™ tool flows for SSN. This comparison is summarized in Table III.

As the table shows, SSN and the Tessent flows provide significant productivity improvement over our previous flow built from multiple tools, enabling rapid integration into the design and fast turnaround ATPG runs. The SSN flows do not require ATPG cut points and custom setups to generate and retarget patterns, resulting in significant savings in pattern retargeting. Though not in the scope of this analysis, further benefits are expected in gate level simulation debug productivity.

Table III. Design efficiency comparison between STF and SSN

Metric                                            STF Flow     Tessent SSN Flow
Tools Count                                       7            3
RTL Completion to ATPG Start                      ~10 Hours    ~1 Hour
ATPG Completion to Gate Level Simulation start    ~1 Day       ~2 Hours
ATPG pattern retargeting of a partition           ~4 Hours     ~12 Minutes

H. SSN Pilot Study Summary

Analysis of a small test network verified that the theoretical advantages of SSN over our previous internal STF fabric are achievable and yield a significant improvement in both test time (16% reduction) and test data volume (36% reduction). The data shows that the approach of static network configuration during test setup is more efficient for large scan data sets than allocating addressing and opcode information within each packet. In addition, further benefits were seen in design efficiency for insertion, ATPG setup and pattern retargeting relative to our previous flows.

IX. CONCLUSION

The SSN technology introduced in this paper solves many of the scan distribution challenges in complex SoCs. It enables simultaneous testing of any number of cores with few chip-level pins, and it has multiple features to reduce test time and test data volume. It can test any number of identical core instances in near constant time, minimizes padding in the presence of cores with mismatched pattern counts and/or scan chain lengths, and enables fast streaming of data to/from and throughout the chip. It simplifies design planning and implementation, and is especially well suited for tile-based designs. Intel evaluated SSN and compared it to STF as well as to conventional pin-muxed access. SSN was found to reduce the test data volume by 36% and 43%, respectively. It reduced test cycles by 16% and 43%, respectively. Steps in the design and retargeting flow were between 10x and 20x faster with SSN compared to STF.

ACKNOWLEDGMENT

The authors wish to thank other contributors to the development of the SSN technology: Yahya Zaidan, Pawel Galas, Szymon Walkowiak, Paul Reuter, and Tony Fryars. We would also like to thank the contributors to the SSN pilot study: Sirish Chittoor, Yonsang Cho, Luis Briceño Guerrero, Kavita Bansal, Kelsey Byers, and Ian Nuber. Finally, many thanks to all our other partners who also provided invaluable feedback during the development, validation, and deployment of SSN.

REFERENCES

[1] Standard Testability Method for Embedded Core-based Integrated Circuits, IEEE Standard 1500, 2005.
[2] J. Remmers et al., "Hierarchical DFT methodology - a case study," IEEE International Test Conference, 2004.
[3] D. Trock et al., "Recursive Hierarchical DFT Methodology with Multi-level Clock Control and Scan Pattern Retargeting," IEEE Design, Automation & Test in Europe Conference & Exhibition (DATE), 2016.
[4] J. Rajski et al., "Embedded Deterministic Test," IEEE Trans. on CAD, vol. 23, May 2004, pp. 776-792.
[5] P. Wohl, J.A. Waicukauski, J.E. Colburn, M. Sonawane, "Achieving extreme scan compression for SoC Designs," IEEE International Test Conference, 2014.
[6] C. Barnhart et al., "OPMISR: The foundation for compressed ATPG vectors," IEEE International Test Conference, 2001.
[7] G. Giles et al., "Test Access Mechanism for Multiple Identical Cores," IEEE International Test Conference, 2008.
[8] Y. Dong et al., "Maximizing Scan Pin and Bandwidth Utilization with a Scan Routing Fabric," IEEE International Test Conference, 2017.
[9] J. Janicki et al., "EDT bandwidth management - Practical scenarios for large SoC designs," IEEE International Test Conference, 2013.
[10] G. Colon-Bonet, "High Bandwidth DFT Fabric Requirements for Server and Microserver SoCs," IEEE International Test Conference, 2015.
[11] G. Colon-Bonet, "High Bandwidth Packetized DFT Fabric for Server SoCs," IEEE International System-on-Chip Conference, 2016.
[12] A. Sanghani et al., "Design and Implementation of A Time-Division Multiplexing Scan Architecture Using Serializer and Deserializer in GPU Chips," IEEE VLSI Test Symposium, 2011.
[13] M. Sonawane et al., "Flexible Scan Interface Architecture for Complex SoCs," IEEE VLSI Test Symposium, 2016.
[14] P. Wohl et al., "Achieving Extreme Scan Compression for SoC Designs," IEEE International Test Conference, 2014.
[15] Standard for Access and Control of Instrumentation Embedded within a Semiconductor Device, IEEE Standard 1687, 2014.
[16] J. Durupt et al., "IJTAG supported 3D DFT using chiplet-footprints for testing multi-chips active interposer system," IEEE European Test Symposium, 2016.
[17] M. Lin et al., "A 7nm 4GHz Arm®-core-based CoWoS® Chiplet Design for High Performance Computing," Symposium on VLSI Circuits Digest of Technical Papers, 2019.
[18] T. Waayers et al., "Clock control architecture and ATPG for reducing pattern count in SoC designs with multiple clock domains," IEEE International Test Conference, 2010.
[19] Standard for High-Speed Test Access Port and On-Chip Distribution Architecture, IEEE Standard 1149.10, 2017.
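
As a cross-check on the arithmetic quoted in Section VIII and in the conclusion, the figures can be re-derived with a few lines of Python. This is a sketch rather than part of the pilot study: every number is copied from the text or from Tables I and II, while the function and variable names are hypothetical, and the utilization model simply assumes that STF packs as many whole shift cycles as fit into the 32-bit data word.

```python
def stf_field_utilization(edt_bits: int, word_bits: int = 32) -> float:
    """Fraction of the STF data word that carries EDT data.

    Assumes only whole shift cycles are packed into one word; the
    remaining bits of the word are unused.
    """
    if not 0 < edt_bits <= word_bits:
        raise ValueError("EDT width must be between 1 and the word size")
    cycles_per_word = word_bits // edt_bits
    return cycles_per_word * edt_bits / word_bits


def reduction_from_ratio(ratio: float) -> float:
    """Turn an 'other fabric vs. SSN' ratio into SSN's fractional reduction."""
    return 1.0 - 1.0 / ratio


# Section VIII-A: 4-bit address tag + 4-bit opcode + input and output valid bits,
# relative to the 32-bit data field.
packet_overhead = (4 + 4 + 2) / 32

# Table I: product of the STF factors, best case and worst case.
table1_low = 1.31 * 1.0 * 1.0 * 0.99
table1_high = 1.31 * 1.47 * 1.2 * 0.99

# Table II, 10,000-pattern row: (total test cycles, total data in Mb).
SSN = (2_746_443, 86.95)
STF = (3_257_264, 136.81)
PINMUX = (4_841_226, 154.26)


def ratios_vs_ssn(other):
    """Return (test cycle ratio, data volume ratio) of a fabric relative to SSN."""
    return other[0] / SSN[0], other[1] / SSN[1]


if __name__ == "__main__":
    print(f"STF packet overhead: {packet_overhead:.1%}")
    for width in (9, 17):
        unused = 1.0 - stf_field_utilization(width)
        print(f"{width}-bit EDT: {unused:.1%} of the data word unused")
    print(f"Table I range: {table1_low:.2f} to {table1_high:.2f}")
    for name, fabric in (("STF", STF), ("Pinmux", PINMUX)):
        cycles, data = ratios_vs_ssn(fabric)
        print(f"{name} vs. SSN: cycles x{cycles:.2f}, data x{data:.2f}; "
              f"SSN reduction {reduction_from_ratio(cycles):.1%} cycles, "
              f"{reduction_from_ratio(data):.1%} data")
```

The ratios reproduce the summary row of Table II, and the corresponding reductions land within about one percentage point of the rounded values quoted in the conclusion.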

