Copyright © 2008 by Sudarshan Bahukudumbi
All rights reserved
ABSTRACT
WAFER-LEVEL TESTING AND TEST PLANNING FOR INTEGRATED CIRCUITS
by
Sudarshan Bahukudumbi
The relentless scaling of semiconductor devices and high integration levels have led
to a steady increase in the cost of manufacturing test for integrated circuits (ICs).
The higher test cost leads to an increase in the product cost of ICs. Product cost
is a major driver in the consumer electronics market, which is characterized by low
profit margins and the use of a variety of core-based system-on-chip (SoC) designs.
Packaging has also been recognized as a significant contributor to the product cost for
SoCs. Packaging cost and the test cost for packaged chips can be reduced significantly
by the use of effective test methods at the wafer level, also referred to as wafer sort.
Test application time is a major practical constraint for wafer sort, even more than
for package test. Therefore, not all the scan-based digital test patterns can be applied
to the die under test. This thesis first presents a test-length selection technique for
wafer-level testing of core-based SoCs. This optimization technique, which is based
on a combination of statistical yield modeling and integer linear programming (ILP),
provides the pattern count for each embedded core during wafer sort such that the
probability of screening defective dies is maximized for a given upper limit on the SoC
test time. A large number of wafer-probe contacts can potentially lead to higher yield
loss during wafer sort. An optimization framework is therefore presented to address
test access mechanism (TAM) optimization and test-length selection for wafer-level
testing, when constraints are placed on the number of chip pins that can
be contacted.
and digital cores in a mixed-signal SoC, and to study its impact on test escapes,
yield loss and packaging cost. Results are presented for a typical mixed-signal “big-
D/small-A” SoC from industry, which contains a large section of flattened digital
logic and several large mixed-signal cores.
Finally, this thesis presents a test-pattern ordering technique for WLTBI. The
objective here is to minimize the variation in power consumption during test applica-
tion. The test-pattern ordering problem for WLTBI is solved using ILP and efficient
heuristic techniques. The thesis also demonstrates how test-pattern manipulation
and pattern-ordering can be combined for WLTBI. Test-pattern manipulation is car-
ried out by carefully filling the don’t-care (X) bits in test cubes. The X-fill problem
is formulated and solved using an efficient polynomial-time algorithm.
Acknowledgements
There are several people who significantly influenced this dissertation - in ways direct
and indirect - and I would like to thank them here. My advisor, Dr. Krishnendu
Chakrabarty, provided me with the academic freedom to pursue research problems that
truly interested me, and for that I am very grateful. His genuine interest in my
progress, technical insights and pursuit of perfection have largely been responsible
for making me a better researcher.
I thank Dr. Sule Ozev for providing valuable counsel and feedback on the mixed-
signal project, and for educating me on the practical aspects of mixed-signal testing.
I would also like to thank Vikram Iyengar of IBM Corporation for providing industry-
insights on the mixed-signal project and for his help in preparing the mixed-signal
manuscript. I thank Rick Kacprowicz of Intel Corporation for being our industrial
mentor in the burn-in project, and for providing valuable insights on the implemen-
tation aspects of our work.
I would like to thank my committee members Dr. Kishor Trivedi, Dr. John
Board, Dr. Montek Singh, and Dr. Romit Roy Choudhury for taking time to serve
on my dissertation committee, and for providing constructive technical feedback on
my work. I would also like to thank Dr. Chris Dwyer for serving on my preliminary
examination committee.
I would like to thank the people in my research group: Zhanglei Wang, Lara Oliver,
Mahmut Yilmaz, and Yang Zhao. I have benefited greatly from numerous discussions
with Zhanglei and Mahmut on a wide range of topics, from testing to politics. I am
also indebted to Mahmut and Hongxia Fang for all their help with data generation
for my research projects.
Many people on the secretarial and support staff in electrical engineering, specifically
Autumn Wenner and Ellen Currin, have helped me on numerous occasions with
travel reimbursements, departmental letters and administrative support, making my
life around here a lot easier.
I am grateful for the financial support I received for my graduate studies from the
Semiconductor Research Corporation and the National Science Foundation.
Finally, I would like to thank my mom, dad and brother for being a constant
source of support and comfort in times of need. This dissertation would not be complete
without the excellent support system they have provided over the years.
Contents
Abstract
Acknowledgements
1 Introduction
1.1 Background
2.1.2 Procedure to determine core defect probabilities
2.5 Summary
3.3.2 Cost model: Results considering failures due to both digital and mixed-signal cores
3.4 Summary
6.1 Minimum-variation X-fill problem: PMVF
Bibliography
Biography
List of Tables
2.1 Core defect probabilities for four ITC’02 SoC test benchmark circuits.
3.2 Experimental Results for Cost Savings Considering Failure Type Distributions for Mixed-Signal Cores.
5.2 Percentage reduction in the variance of test power consumption obtained using the Pattern Order heuristic for selected ISCAS’89 benchmark circuits.
List of Figures
1.1 Trend in test cost versus manufacturing cost per transistor (adapted from [1]).
1.4 Test and burn-in flow using: (a) PLBI; (b) WLTBI.
2.10 (a) Accessing a wrapped core for package test only; (b) TAM design that allows RPCT-based wafer sort using a pre-designed wrapper/TAM architecture.
2.11 Integer linear programming model for PTLTWS.
2.14 Percentage of test patterns applied to each core in p34392 when W∗ = 16 and W = 32.
3.1 Flowchart depicting the mixed-signal test process for wafer-level fault detection.
3.2 The variation of the fault coverage and correction factor versus the number of test vectors applied to the digital portion of Chip K.
3.3 Distribution of cost savings for a small die with packaging costs of (a) $1, (b) $3, (c) $5.
3.4 Distribution of cost savings for a medium die with packaging costs of (a) $3, (b) $5, (c) $7.
3.5 Distribution of cost savings for a large die with packaging costs of (a) $5, (b) $7, (c) $9.
3.6 Distribution of cost savings for a large die with packaging costs of (a) $5, (b) $7, (c) $9, when test escapes between digital and analog parts are correlated.
4.1 (a) TAM architecture for the d695 SoC with W = 32; (b) Corresponding B-partite (B = 3) graph, also referred to as a tripartite graph, for the d695 SoC with W = 32. The nodes correspond to cores.
4.2 (a) Test schedule for the d695 SoC with W = 32 and Pmax = 1800; (b) Matched tripartite graph for the d695 SoC with W = 32. Dotted lines represent matching.
4.4 Power profile for d695 obtained using baseline approach 1 and Core Order (W = 32 and Pmax = 1800).
5.4 Impact of TCth on test power variation for s5378: (a) Pmax = 145 and (b) Pmax = 150.
6.6 (a) Test cubes for the s208 benchmark circuit; (b) Equations describing the per-cycle change in transition counts; (c) Test set after minimum-variation X-fill.
Chapter 1
Introduction
Figure 1.1: Trend in test cost versus manufacturing cost per transistor (adapted
from [1]).
The manufacturing test of SoCs is a process where test stimuli are applied to
the fabricated SoC by means of a test-access mechanism (TAM). The TAM provides
test access to the embedded cores in the SoC from the input/output (I/O) terminals
of the chip. The steps involved in testing of SoCs, and semiconductor devices in
general, can be classified into three categories: wafer sort or probe test, post-package
manufacturing test, and burn-in.
Wafer sort is the first step in the manufacturing test process, where the chip
in bare wafer form is tested for manufacturing defects. The devices are subjected
to standardized parametric and functional tests; devices that pass these tests are
subjected to further assembly and test processes, and the ones that fail these tests
are marked with an ink dot on the wafer to indicate that they are faulty.
Once the devices that pass the test at the wafer-level are packaged, they are
Figure 1.2: The steps involved in the testing of a semiconductor device.
subjected to package test. The package test process is often carried out in two
stages. The first stage of testing a packaged device takes place before the burn-in
process, and the second stage, i.e., the final step in testing the device, is carried out
after burn-in. Complete parametric, functional, and structural testing are performed
during package testing of these devices.
Some devices that pass all the manufacturing tests may fail early during their
lifetime. The burn-in test process accelerates failure mechanisms that cause early
field failures (“infant mortality”) by stressing packaged chips under high operating
temperatures and voltages. The burn-in process is therefore an important component
in the test and manufacturing flow of a semiconductor device, and it is necessary to
ensure reliable field operation. Figure 1.2 illustrates the conventional test flow for
semiconductor devices.
Techniques and solutions employed for probe testing of SoCs can also be used
exclusively during the manufacture of known good dies (KGDs); KGDs are fully
functional devices that are sold as bare dies and used in the manufacture of complex
system-in-package (SiP) devices and multi-chip packages (MCPs). Until recently, a
major concern was the electrical integrity of the bare die. There are several challenges
to performing full electrical testing of a bare die to verify conformance to specifica-
tions. Also, until recently, bare die were not subjected to burn-in. Thus latent defects
went undetected with the bare die. With recent advances in the manufacture of semi-
conductor test equipment, and increased awareness of the importance of the KGD,
complex test and burn-in functions can be carried out at the wafer level [5, 6].
Market and functionality segments exist for both SoCs and KGD integration in
SiPs and three-dimensional (3-D) ICs, and these design approaches are complemen-
tary rather than competitive [3]. SoCs find applications in standardized processes for
digital-centric functions, thereby enabling easy and seamless integration of additional
functions when necessary. SiPs and stacked 3-D ICs provide an approach where a
mix of devices, components, and technologies are used to maximize performance and
cost. Designers are thus able to drastically reduce the time-to-market with the choice
of such design technologies. It is therefore important to address the test challenges of
SoCs as well as reduce the test cost for the manufacture of KGDs at the wafer level.
1.1 Background
In this section, we review some key testing methods and concepts that are referred
to in the rest of the thesis.
1.1.1 System-level design-for-test and test scheduling for core-based SoCs
The testing of core-based SoCs requires the availability of a suitable on-chip test
infrastructure, which typically includes test wrappers and TAMs. A core test wrapper
is the logic circuitry that is added around the embedded core to provide suitable test
access to the core, while at the same time isolating the core from its surrounding
logic during test [7, 8, 9]. The test wrapper provides each core with a normal mode
of operation, an external-test mode, and an internal-test mode. When the core is
in the normal mode of operation, it maintains the functionality that is desired for
proper device operation; the wrapper is transparent to the surrounding logic in this
mode of operation. In the external-test mode, the wrapper elements are used
for interconnect test, and when the core is in the internal-test mode, the wrapper
elements control the state of the core input terminals for testing the core's internal
logic. The TAM transports test stimuli and responses between the SoC pins and the
core terminals. Careful design of test wrappers and TAM can lead to significant cost
savings by minimizing the overall test time [8, 9, 10, 11, 12, 13, 14].
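As a simple illustration of why this co-design matters, the following sketch (Python; all core names and cycle counts are invented for illustration) assigns cores to a fixed number of TAM partitions with a largest-first greedy rule, so that the longest partition time, which determines the SoC test time, stays small. Real TAM optimization, as in the work cited above, also selects partition widths and per-core wrapper designs.

# Illustrative sketch: balance core test times across TAM partitions so that the
# SoC test time (the time of the longest partition) is kept small. Data are made up.

def assign_cores_to_partitions(core_times, num_partitions):
    """Largest-first greedy assignment of cores to TAM partitions."""
    partitions = [{"cores": [], "time": 0} for _ in range(num_partitions)]
    # Place the longest core tests first, each on the currently least-loaded partition.
    for core, t in sorted(core_times.items(), key=lambda kv: -kv[1]):
        target = min(partitions, key=lambda p: p["time"])
        target["cores"].append(core)
        target["time"] += t
    return partitions

if __name__ == "__main__":
    core_times = {"cpu": 52000, "dsp": 34000, "mem_bist": 8000,
                  "usb": 12000, "periph": 5000}        # hypothetical cycle counts
    for k, part in enumerate(assign_cores_to_partitions(core_times, 2)):
        print(f"partition {k}: {part['cores']}, time = {part['time']} cycles")
    # The SoC test time is the maximum of the partition times printed above.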
Figure 1.3 illustrates the use of generic core test wrappers and TAMs for a design
with N embedded cores [2]. The test source provides test vectors to the embedded
cores via on-chip linear feedback shift registers (LFSRs), a counter, ROM, or off-chip
automatic test equipment. A test sink, by means of on-chip signature analyzers, or off-
chip automatic test equipment (ATE), provides verification of the output responses.
The TAM is user-defined; the system integrator must design these structures for the
SoC by optimally allocating TAM wires to the embedded cores in the SoC with the
objective of minimizing the overall test time. The TAM is not only used to transport
test stimuli and responses to and from the cores, but it is also used for interconnect
test between the embedded cores in the SoC. The test access port (TAP) receives
Figure 1.3: Test architecture based on wrappers and a TAM [2].
control signals from outside to control the mode of operation of test wrappers; the
TAP enables the loading of test instructions serially onto the test wrappers.
however, these methods are aimed at reducing the test time for package test only.
They do not address the problems that are specific to wafer-level testing.
In addition to the need for effective test techniques for defect screening and speed
binning for ICs, there is an ever-increasing demand for high device reliability and low
defective-parts-per-million levels. Semiconductor manufacturers routinely perform
reliability screening on all devices before shipping them to customers [20]. Accelerated
test techniques shorten time-to-failure for defective parts without altering the device
failure characteristics [21]. Burn-in is one such technique that is widely used in the
semiconductor industry [6, 21].
The long time intervals associated with burn-in result in high cost [1, 22, 23].
It is however unlikely that burn-in will be completely eliminated in the near future
for high-performance chips and microprocessors [1]. Wafer level burn-in (WLBI) has
recently emerged as an enabling technology to lower the cost of burn-in [6]. In this
approach, devices are subjected to burn-in and electrical testing while in the bare
wafer form. By moving the burn-in process to the wafer-level, significant cost savings
can be achieved in the form of lower packaging costs, as well as reduced burn-in and
test time.
Test during burn-in at the wafer-level enhances the benefits that are derived from
the burn-in process. The monitoring of device responses while applying suitable test
stimuli during WLBI leads to the easier identification of faulty devices. We refer to
this process as “wafer-level test-during-burn-in” (WLTBI); it is also referred to in the
literature as “test in burn-in” (TIBI) [21], “wafer-level burn-in test” (WLBT) [24],
etc.
Figure 1.4 illustrates and compares the test and burn-in flow in a semiconductor
manufacturing process. The manufacturing flow for package-level burn-in (PLBI) is
shown in Figure 1.4(a); Figure 1.4(b) highlights the manufacturing flow when WLTBI
is employed for test and burn-in at the wafer-level. Test and burn-in of devices in the
bare wafer form can potentially reduce the need for post-packaging test and burn-in
for packaged chips and KGDs. In the manufacture of KGDs, WLTBI eliminates the
need for a die-carrier and carrier burn-in, thereby resulting in significant cost savings.
The basic techniques used for the testing and burn-in of individual chips are the
same as those used in WLTBI. Test and burn-in require the availability of suitable
electrical excitation of the device/die under test (DUT), irrespective of whether it is
done on a packaged chip or a bare die. The only difference lies in the mode of delivery
of the electrical excitation. Mechanically contacting the leads provides electrical bias
and excitation during conventional testing and burn-in. In the case of WLTBI, this
excitation can be provided in any of the following three ways: the probe-per-pad
method, the sacrificial metal method and the built-in test/burn-in method [25].
The built-in test/burn-in method involves the use of on-chip design-for-test (DfT)
infrastructure to achieve WLTBI. This technique allows wafers to undergo full-wafer
contact using far fewer probe contacts. The presence of sophisticated built-in DfT
features on modern day ICs makes “monitored burn-in” possible. Monitored burn-in
is a process where a DUT is provided with input test patterns; the output responses
of the DUT are monitored on-line, thereby leading to the identification of failing
devices. It is therefore clear that WLTBI has a significant potential to lower the
overall product cost by breaking the barrier between burn-in and test processes. As
a result, ATE manufacturers have recently introduced WLBI and test equipment
that provides full-wafer contact during burn-in and also offers test-monitoring
capabilities [6, 24, 26].
Figure 1.4: Test and burn-in flow using: (a) PLBI; (b) WLTBI.
Figure 1.5: Flip-flops in a circuit connected as a scan chain.
Scan design is a widely used DfT technique that provides controllability and observ-
ability for flip-flops by adding a scan mode to the circuit. When the circuit is in scan
mode, all the flip-flops form one or more shift registers, also known as scan chains.
Using separate scan access I/O pins, test patterns are serially shifted into the scan
chains and test responses are serially shifted out. This process significantly reduces
the cost of test by transforming the sequential circuit into a combinational circuit for
test purposes. For circuits with scan designs, the test process involves test pattern
application from external ATE to the primary inputs and scan chains of the DUT.
To make a pass/fail decision on the DUT, the states of the primary outputs and the
flip-flops are fed back to the ATE for analysis. Figure 1.5 illustrates how flip-flops
are connected to form a scan chain.
The ATE is first used in the semiconductor manufacturing process during wafer
sort, when the chip is still in the bare wafer form. Effective defect screening at the
wafer level leads to significant cost savings by eliminating the assembly and further
testing of faulty die. Data generated during the sort process quickly provides valuable
feedback to the wafer fab. This information is time-sensitive, and the timely reporting
of this information to the fab can facilitate changes to the manufacturing process that
can increase the yield.
Wafer-level testing leads to early defect screening, thereby reducing packaging and
production cost [2, 27, 28]. As highlighted in [1, 29], packaging cost accounts for a
significant part of the overall production cost. Current packaging costs for a cost-
sensitive, yet performance-driven, IC can vary between $3.60 and $20.50, depending
on the number of pins in the IC [1]. These costs are further increased for high-
performance ICs. It has also been reported that the packaging cost per pin exceeds
the cost of silicon per square millimeter, and the number of pins per die can easily
exceed the number of square millimeters per die [1, 29].
Several challenges are associated with testing at the wafer-level. These challenges
need to be addressed in order to reduce the cost associated with the complete test
process of a semiconductor chip.
Semiconductor companies often resort to the use of low-cost testers at the wafer-
level to reduce the overall capital investment on ATE. These testers are constrained
by the limited amount of memory available to store test patterns and responses,
the number of available tester channels, and the maximum frequency at which they
can be operated. Reduced memory and the limited number of available tester
channels reduce the number of devices that can be tested simultaneously. This is
an especially severe limitation at the wafer level, since there are multiple dies on a
single wafer; a decrease in parallelism due to tester limitations results in a significant
increase in the overall test time.
Measurement inaccuracies are common when analog cores are tested in a mixed-
signal test environment based on digital signal processing. This problem is exacerbated
by noisy DC power supply lines, improper grounding of the wafer probe, and
lack of proper noise shielding of the wafer probe station [30]. The above problems
make test and characterization at the wafer-level especially difficult, and they can
lead to high yield loss during wafer sort.
The scaling of test costs for semiconductor devices highlights the need for new
techniques to minimize the overall test cost. Several techniques to minimize the
overall test time for SoCs during package testing have been proposed in [8, 10, 11,
12, 9]. In contrast, test planning for effective utilization of hardware resources for
wafer-level testing has not been studied. There is a need for basic research in two focus
areas related to wafer-level testing of core-based digital SoCs. It is common practice
in industry to partially test these devices at the wafer level in order to reduce test
cost. The first focus area addresses wafer-level test planning of these devices under
constraints of test application time. The ATE is also constrained by the number of
available tester channels because of the use of low-cost digital testers. The second
focus area develops test techniques to test these devices at the wafer-level under such
limitations.
increasing importance for such SoCs. Wafer-level defect screening techniques to test
these devices are essential in order to minimize the overall test cost.
Wafer sort testing was once considered a method to save packaging costs by elim-
inating bad dies. Today, wafer sort is an important step in process control, yield
enhancement, and yield management [34]. The emerging trend of selling bare dies
(KGDs) instead of packaged parts further emphasizes the importance of wafer sort.
KGDs are handled in the following ways: (a) packaged by the customer in a custom
package; (b) mounted directly on a substrate; or (c) combined with other dies in an MCP
or a SiP [34].
into a high-density product at the package level.
With the emergence of MCPs and SiPs, the yields of the individual die making up
the package determine the overall yield of the product. Full functional and structural
testing of these devices at wafer sort is therefore important. In addition to testing
these devices, there is a need to burn-in these devices in their bare wafer form to
weed out all latent defects and ensure reliable operation.
The test flow for a typical SiP, shown in Figure 1.6, highlights the need for cost-
effective wafer-scale test and burn-in solutions.
WLTBI technology has recently made rapid advances with the advent of the KGD
[35]. The growing demand for KGDs in complex SoC/SiP architectures, multi-chip
modules, and stacked memories, highlights the importance of cost-effective and vi-
able WLTBI solutions [6]. WLTBI will also facilitate advances in the manufacture of
3-D ICs, where bare dies or wafers must be tested before they are vertically stacked.
Figure 1.6: System-in-package test flow [3].
Recently, Motorola teamed up with Tokyo Electron Ltd. and W.L. Gore & Asso-
ciates Inc. to develop a WLTBI system for commercial use. These systems provided
a full-wafer direct-contact solution for bumped die used in flip-chip assembly applica-
tions [25]. Aehr Test Systems recently announced that it shipped full wafer contact
burn-in and test systems to a leading automotive IC manufacturer [36]. These test
systems have the ability to contact 14 wafers simultaneously by providing 30,000
contact-point capability per wafer. The test features of the system include a full al-
gorithmic test for memories, and vector pattern generator for devices using BIST [36].
A study was presented in [5] to compare the cost of wafer level burn-in and test with
the burn-in and test of a singulated die. It was shown that in a high-volume manu-
facturing environment, wafer-level burn-in and test was more cost-effective compared
to equivalent “die-pack” and chip-scale packaging technologies [5]. Similar WLBI
and TDBI equipment with response monitoring capabilities at elevated temperatures
have been successfully manufactured and deployed by other leading test equipment
manufacturing firms such as Advantest [37] and Delta-V instruments [6].
1. Burn-in for KGDs at the wafer level requires fewer test insertions and also reduces
the burn-in cycle time when compared with die-level burn-in [25].
3. It has been shown in [5] that WLTBI is a cost-efficient technique for the man-
ufacture of reliable and fully functional KGDs.
4. Commercial WLTBI test equipment are currently being deployed by some lead-
ing semiconductor companies to lower manufacturing cost [5].
Thermal Challenges
the junction temperatures of the DUT need to be maintained within a small window
such that burn-in predictions are accurate.
Scan-based testing is now widely used in the semiconductor industry [39]. How-
ever, scan testing leads to complex power profiles during test application; in partic-
ular, there is a significant variation in the power consumption of a device under test
on a cycle-by-cycle basis. In a burn-in environment, the high variance in scan power
adversely affects predictions on burn-in time, resulting in a device being subjected
to excessive or insufficient burn-in. Incorrect predictions may also result in thermal
runaway.
The challenges that are encountered during WLTBI are a combination of the
problems faced during the sort process and during burn-in. Wafer-sort is used to
identify defective dies at wafer level before they are assembled in a package. It is also
the first test-related step in the manufacturing process where thermal management
plays an important role. Current wafer probers use a thermal chuck to control the
device temperature during the sort process. The chuck is an actively regulated metal
device controlled by external chillers and heaters embedded under the device [38].
The junction temperature of the DUT is determined by the following relationship
[38, 40, 41]:

Tj = Ta + P · θja    (1.1)

where Ta is the ambient temperature, P is the power dissipated by the device, and θja is the junction-to-ambient thermal resistance.
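A minimal sketch of how Equation (1.1) links a cycle-by-cycle scan power profile to the junction temperature is given below (Python). The transition-count power proxy, the ambient temperature, θja, and the energy constant are all assumptions chosen for illustration; the point is only that a strong cycle-to-cycle variation in P translates directly into a spread in Tj.

# Illustrative use of Equation (1.1): Tj = Ta + P * theta_ja.
# Per-cycle power is approximated by the number of scan-chain transitions in that
# shift cycle, scaled by an assumed energy constant; all values are made up.

T_AMBIENT = 125.0                 # burn-in ambient temperature (deg C), assumed
THETA_JA = 1.8                    # junction-to-ambient thermal resistance (deg C/W), assumed
WATTS_PER_TRANSITION = 0.002      # crude power proxy, assumed

def shift_transition_counts(scan_state, pattern):
    """Count scan-cell value changes in each shift cycle while loading `pattern`."""
    counts, state = [], list(scan_state)
    for bit in pattern:
        nxt = [bit] + state[:-1]                 # serial shift by one scan cell
        counts.append(sum(a != b for a, b in zip(state, nxt)))
        state = nxt
    return counts

def junction_temperature(power_watts):
    return T_AMBIENT + power_watts * THETA_JA    # Equation (1.1)

if __name__ == "__main__":
    chain = "0101010101010101"                   # initial scan-chain contents, assumed
    pattern = "1111000010110010"                 # hypothetical test pattern
    for cycle, n in enumerate(shift_transition_counts(chain, pattern)):
        tj = junction_temperature(n * WATTS_PER_TRANSITION)
        print(f"cycle {cycle:2d}: {n:2d} transitions -> Tj = {tj:.3f} C")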
One of the important goals of the burn-in process is to keep the burn-in time to a
minimum, thereby increasing throughput, and minimizing equipment and processing
costs. It is also important to have a tight spread in temperature distribution of
the device, to increase yield and at the same time minimize burn-in time [38]. The
parameter Tj cannot exceed a pre-determined threshold due to concerns of thermal
runaway and the need to maintain proper circuit functionality. It is this issue of
controlling the spread in Tj over the period of test application that we address in this
thesis.
The objective of this thesis research is to reduce the overall cost of the product
by efficient test planning and test resource optimization at the wafer level. Four key
research problems are identified and solved in this thesis.
in the test cube such that the variation in power consumption during WLTBI
is minimized.
A recent SoC test scheduling method attempted to minimize the average test time
for a packaged SoC, assuming an abort-on-first fail strategy [42, 43]. The key idea
in this work is to use defect probabilities for the embedded cores to guide the test
scheduling procedure. These defect probabilities are used to determine the order in
which the embedded cores in the SoC are tested, as well as to identify the subsets of
cores that are tested concurrently. The defect probabilities for the cores were assumed
in [42] to be either known a priori or obtained by binning the failure information
for each individual core over the product cycle [43]. In practice, however, short
product cycles make defect estimation based on failure binning difficult. Moreover,
defect probabilities for a given technology node are not necessarily the same for the
next (smaller) technology node. Therefore, a yield modeling technique is needed to
accurately estimate these defect probabilities.
Test time is a major practical constraint for wafer sort, even more so than for
package test, because not all the scan-based digital tests can be applied to the die
under test. It is therefore important to determine the number of test patterns for
each core that must be used for the given upper limit on SoC test time for wafer
sort, such that the probability of successfully screening a defective die is maximized.
The number of patterns needs to be determined on the basis of a yield model that
can estimate the defect probabilities of the embedded cores, as well as a “test escape
model” that provides information on how the fault coverage for a core depends on
its test length.
Reduced pin-count testing (RPCT) has been advocated as a design-for-test tech-
nique, especially for use at wafer sort, to reduce the number of IC pins that need
to be contacted by the tester [44, 45, 46, 47, 48]. RPCT reduces the cost of test by
enabling the reuse of old testers with limited channel availability. It also reduces the
number of probe points required during wafer test; this translates to lower test cost,
as well as fewer yield-loss issues arising from contact problems with the wafer probe.
We have developed an optimization framework for wafer sort that addresses TAM
optimization and test-length selection for wafer-level testing of core-based digital
SoCs.
The test cost for a mixed-signal SoC is significantly higher than that for a digital SoC
[49]. This is due to the capital cost associated with expensive mixed-signal ATE, as
well as the high test times for analog cores. Test methods for analog circuits that
rely on low-cost digital testers are therefore especially desirable; a number of such
methods have recently been developed [50, 51, 52, 53, 54, 55].
Despite the numerous benefits of testing at the wafer level, industry practitioners
have reported that mixed-signal test is seldom carried out at the wafer level [33, 56].
In our work, we present a new correlation-based signature analysis technique for
mixed-signal cores, which facilitates defect screening at the wafer-level. The proposed
technique is inspired by popular outlier analysis techniques for IDDQ testing [57, 58].
Outlier identification using IDDQ during wafer sort is difficult for deep-submicron
processes [59]. This problem has been addressed using statistical post-processing
techniques that utilize the test response data from the ATE [57]. We have developed
a similar classification technique that allows us to make a pass/fail decision under
non-ideal ambient conditions and using imprecise measurements. We present a wafer-
scale analog test method based on the use of low-cost digital testers, and with reduced
dependence on mixed-signal testers.
Several test scheduling techniques target reduction in overall test application time
while considering power consumption constraints [16], precedence constraints during
test [18], and conflicts between cores arising from the use of shared TAM wires.
However, test scheduling for WLTBI has not thus far been addressed in research
literature. In this thesis, we present a test scheduling technique that reduces the
variation in power consumption during WLTBI.
The higher power consumption of ICs during scan-based testing is a serious concern in
the semiconductor industry; scan power is often several times higher than the device
power dissipation during normal circuit operation [60]. Excessive power consumption
during scan testing can lead to yield loss. As a result, power minimization during
test-pattern application has recently received much attention [61, 62, 63, 64, 65, 66].
Research has focused on pattern ordering to reduce test power [61, 67, 68]. The
pattern-ordering problem has been mapped to the well-known Traveling Salesman
Problem (TSP) [67, 68]. Testing semiconductor devices during burn-in at wafer-
level requires low variation in power consumption during test [38]. A test-pattern
reordering method that minimizes the dynamic power consumption does not address
the needs of WLTBI. Specific techniques need to be developed to address this aspect
of low-power testing, i.e., the ordering of test patterns to minimize the overall
variation in power consumption.
In this thesis we address three important practical problems: (i) wafer-level modular
testing of core-based digital SoCs, (ii) wafer-level defect screening for “big-D/small-
A” SoCs, and (iii) power management for WLTBI. These problems are solved with
the underlying objective of lowering product cost, either by reducing the cost of
packaging, or the cost of testing and the associated test infrastructure. The remainder
of the thesis is organized as follows.
make decisions on the test-lengths for the cores under constraints of test application
time. Similar techniques for wafer-level RPCT are also developed. Simulation results
on the defect-screening probabilities are presented for five of the ITC’02 SoC Test
benchmarks.
proposed methods.
Chapter 2
We also present an optimization framework for wafer sort that addresses TAM op-
timization and test-length selection for wafer-level testing of core-based digital SoCs
when the tester has limited channel availability [74]. The objective here is to design
a TAM architecture that utilizes a pre-designed underlying TAM architecture for
package test, and determine test-lengths for the embedded cores such that the overall
SoC defect-screening probability at wafer sort is maximized. The proposed method
reduces packaging cost and the subsequent test time for the IC lot, while efficiently
utilizing available tester bandwidth at wafer sort.
The key contributions of this chapter are as follows:
• We show how statistical yield modeling for defect-tolerant circuits can be used
to estimate defect probabilities for embedded cores in an SoC.
• We develop an ILP model to obtain optimal solutions for the test-length se-
lection problem. The optimal approach is applied to five ITC’02 SoC test
benchmarks, including three from industry.
• We present two techniques for test-length selection and TAM optimization. The
first technique is based on the formulation of a non-linear integer programming
model, which can be subsequently linearized and solved using standard ILP
tools. While this approach leads to a thorough understanding of the optimiza-
tion problem, it does not appear to be scalable for large SoCs. We therefore
describe a second method that enumerates all possible valid TAM partitions,
and then uses the ILP model presented in Section 2.2.1 to derive test-lengths
that maximize defect screening at wafer sort. This enumerative procedure allows
an efficient search of a large solution space. It results in significantly lower
computation time than that needed for the first method. Simulation results on
TAM optimization and test-length selection are presented for five of the ITC’02
SoC Test benchmarks.
2.1 Defect probability estimation for embedded cores
In this section, we show how defect probabilities for embedded cores in an SoC can
be estimated using statistical yield modeling techniques.
We adapt the yield model presented in [75, 76, 77] to model the yield of the indi-
vidual cores in a generic core-based SoC. The model presented in [75] unifies the
“small-area clustering” and “large-area clustering” models presented in [76] and [77],
respectively. It is assumed in [75, 76] that the number of defects in a given area A is a
random variable that follows a negative-binomial distribution. The negative binomial
distribution is a two-parameter distribution characterized by the parameters λA and
αA . The parameter λA denotes the average number of defects in an area A. The
clustering parameter αA is a measure of the amount of defect clustering on the wafer.
It can take values that range from 0.5 to 5 depending on the fabrication process, with
lower values of αA denoting increased defect clustering. The probability P(x, A) that
x faults occur in area A is given by

P(x, A) = [Γ(αA + x) / (x! Γ(αA))] · (λA/αA)^x / (1 + λA/αA)^(αA + x)    (2.1)
The above yield model was validated using industrial data in [78], and it has
recently been used in [79, 80, 81]. An additional parameter incorporated in [75] is
the block size, defined as the smallest value B such that the wafer can be divided
into disjoint regions, each of size B, and these regions are statistically independent
with respect to manufacturing defects. As in [75], we assume that the blocks are
rectangular and can be represented by a tuple (B1, B2), corresponding to the dimensions
of the rectangle. The goal of the yield model in [75] was to determine the
effect of redundancy on yield in a fault-tolerant VLSI system. The basic redundant
block is called a module, and the block is considered to be made up of an integer
number of modules. Since our objective here is to model the yield of embedded (non-
overlapping) cores in an SoC, we redefine the module to be an imaginary chip area
denoted by (a1 , a2 ). The size of the imaginary chip area, i.e., the values of a1 and a2
can be fixed depending on the resolution of the measurement system, e.g., an optical
defect inspection setup. In this chapter we assume the dimensions of the imaginary
chip area, a1 and a2 , to be unity.
We use the following steps to estimate the defect probabilities for the embedded cores:
(1) Determine the block size: Empirical data obtained on wafer maps and tech-
niques described in [75] can be used to determine the block size. The block size helps
us to determine the model parameters αB and λB , where λB refers to the average
number of defects within a block B of size (B1 , B2 ), and αB is the clustering param-
eter for the block. The size of the block plays an important role in our procedure
to determine core defect probabilities. We next describe the procedure to determine
the block size.
Efficient techniques for determining the block size have been presented in [75],
and these techniques have been validated using empirical data. The block size can
be determined using a simple iterative procedure, in which the wafer is divided into
rectangular sub-areas (blocks), whose sizes are increased at every step. Starting with
blocks of size I = 1, J = 1, we alternately increase I and J. For each fixed value of
block size I × J, we then calculate the corresponding parameter αB (I, J) and arrange
these values in a matrix. The value of (I, J), for which the difference between αB (I, J)
and αB(1, 1) is minimum, is chosen as the block size. The value of αB(I, J) can be
determined using standard estimation techniques such as the moment method, the
maximum likelihood method, or curve fitting [76]. The clustering parameter remains
constant within a block and increases when the area consists of multiple blocks [75, 76];
this property forms the basis for determining the block size.

Figure 2.1: Defect estimation: Placement of a core with respect to blocks.
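A minimal sketch of this block-size search is shown below (Python with NumPy). The synthetic wafer map, the candidate block sizes, and the use of a method-of-moments estimator for the clustering parameter (α = m²/(v − m) for a negative-binomial count with mean m and variance v) are assumptions made for illustration; they are not the estimation procedure of [75].

import numpy as np

def alpha_moment_estimate(defect_counts):
    """Method-of-moments estimate of the clustering parameter for
    negative-binomially distributed defect counts."""
    m, v = defect_counts.mean(), defect_counts.var()
    if v <= m:                       # no over-dispersion: effectively no clustering
        return np.inf
    return m * m / (v - m)

def block_counts(defect_map, bi, bj):
    """Total defects in each disjoint (bi x bj) block of the wafer defect map."""
    rows, cols = defect_map.shape
    rows, cols = rows - rows % bi, cols - cols % bj     # drop incomplete edge blocks
    trimmed = defect_map[:rows, :cols]
    return trimmed.reshape(rows // bi, bi, cols // bj, bj).sum(axis=(1, 3))

def choose_block_size(defect_map, max_i=6, max_j=6):
    """Pick the block size (I, J) whose alpha estimate stays closest to alpha(1, 1)."""
    alpha_ref = alpha_moment_estimate(block_counts(defect_map, 1, 1).ravel())
    best, best_diff = (1, 1), float("inf")
    for i in range(1, max_i + 1):
        for j in range(1, max_j + 1):
            if (i, j) == (1, 1):
                continue
            alpha_ij = alpha_moment_estimate(block_counts(defect_map, i, j).ravel())
            if abs(alpha_ij - alpha_ref) < best_diff:
                best, best_diff = (i, j), abs(alpha_ij - alpha_ref)
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic 60 x 60 map of per-chip defect counts, over-dispersed by construction.
    wafer = rng.poisson(rng.gamma(shape=0.5, scale=0.2, size=(60, 60)))
    print("chosen block size:", choose_block_size(wafer))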
In our work, we make the following assumptions: (a) as in [75], we assume that
the area of the block consists of an integer number of imaginary chip areas; (b) the
block size and its negative binomial parameters are pre-determined using rigorous
statistical information processing of wafer defect maps. The illustration in Figure 2.1
represents a cross section of a wafer and its division into blocks. The dimensions
of the block in Figure 2.1 are (2, 4), and each block contains 8 imaginary chips of area
(1, 1).
(2) We consider each core in the SoC to be an “independent chip”. Let us consider
a core represented by (C1 , C2 ), block size (B1 , B2 ), and imaginary chip (a1 , a2 ). The
imaginary chip is a sub-area in a block. For a fault in a block, the distribution of
the fault within the area of the block is uniform; the imaginary chip area parameters
λm and αm take on values λB /B and αB respectively. The relationship between
the imaginary chip area parameters and the block parameters can be established
using techniques proposed in [75]. The purpose of dividing a wafer into blocks is to
facilitate the division of a wafer into sub-areas, such that distinct fault clusters are
contained in distinct blocks (each block is statistically independent with respect to
manufacturing defects). We now determine the probability that the core is defective
using the following steps:
(a) In a statistical sample of multiple wafers, a core can be oriented in different
configurations with respect to the block. The number of possible orientations of the
core with respect to the block in the wafer is given by min{B1 , C1 } × min{B2 , C2 }.
The dimensions of the block in Figure 2.1 are smaller than that of the core. The
number of possible orientations for the core in Figure 2.1 is therefore 2 × 4, i.e., there
are 8 possible core orientations with respect to the block in Figure 2.1. The list of
possible values that (R1, R2) can take in Figure 2.1, namely (1,1), (1,2), (1,3), (1,4),
(2,1), (2,2), (2,3), and (2,4), intuitively illustrates the 8 possible core orientations with respect to a
block of size (2,4).
(b) For each orientation, determine the distance from the top-left corner of the core to
the closest block boundaries. This is represented as (R1 , R2 ), the two values denoting
distances in the Y and X directions, respectively; the placement of the core with
respect to the block determines the way the core is divided into complete and partial
blocks. In Figure 2.1, we have R1 = R2 = 1.
(c) The dimensions of the core can now be represented as C1 = R1 + n1 · B1 + m1 ,
and C2 = R2 + n2 · B2 + m2 , where n1 and m1 are defined as:
n1 = ⌊(C1 − R1)/B1⌋
m1 = (C1 − R1) mod B1

The parameters n2 and m2 are defined analogously for the second dimension of the core. The values of n1, m1, n2, and
m2 for the illustrated orientation in Figure 2.1 are all 1.
(d) The core can be divided into a maximum of nine disjoint sub-areas for the orienta-
tion illustrated in Figure 2.1, with each sub-area placed in a different block. Dividing
the core into independent sub-areas allows for the convolution of the probability of
failure of each individual sub-area. Let us assume that there are a total of D sub-
areas; the probability that the core is defect-free is given by P^(R1,R2) = ∏_{i=1}^{D} a(Ni).
The superscript (R1 , R2 ) indicates the dependence of this probability on the place-
ment. Here a(Ni ) denotes the probability that all the Ni imaginary chip areas in
the sub-area i are defect-free. This probability can be obtained from Equation (2.2)
shown below, where a(k, N) denotes the probability of k defect-free modules in a
sub-area with N modules. By substituting N instead of k in Equation (2.2), we
obtain Equation (2.3). This is done in order to estimate the probability that a block
is fault-free.
a(k, N) = C(N, k) Σ_{i=0}^{N−k} (−1)^i C(N − k, i) [1 + (i + k)λm/αm]^(−αm)    (2.2)

a(N, N) = a(N) = (1 + Nλm/αm)^(−αm)    (2.3)

Here C(n, r) denotes the binomial coefficient.
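The two expressions above translate directly into code. The short Python sketch below evaluates a(k, N) and a(N); the values of λm and αm are illustrative (λm = λB/8 for a block of eight imaginary chips, as in the example of Figure 2.1), not results from the thesis.

from math import comb

def a_defect_free(k, N, lam_m, alpha_m):
    """Probability of exactly k defect-free modules out of N (Equation (2.2))."""
    total = 0.0
    for i in range(N - k + 1):
        total += ((-1) ** i) * comb(N - k, i) * \
                 (1.0 + (i + k) * lam_m / alpha_m) ** (-alpha_m)
    return comb(N, k) * total

def a_all_defect_free(N, lam_m, alpha_m):
    """Probability that all N modules are defect-free (Equation (2.3))."""
    return (1.0 + N * lam_m / alpha_m) ** (-alpha_m)

if __name__ == "__main__":
    lam_m, alpha_m = 0.1 / 8, 0.25                   # assumed imaginary-chip parameters
    print(a_all_defect_free(8, lam_m, alpha_m))      # one full block of 8 chips
    print(a_defect_free(8, 8, lam_m, alpha_m))       # same value via Equation (2.2)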
The process of dividing the area of a core into multiple sub-areas facilitates the
application of large-area clustering conditions on the individual sub-areas. It is im-
portant to distinguish between sub-areas i = 1, 3, 7, 9 and i = 2, 4, 5, 6, 8 in Figure
2.1. In the latter case, the sub-area i is divided into several parts, each contained in a
different block. The derivation of the probability density function for these sub-areas
is now a trivial extension of the base case represented by Equation (2.3).
(e) The final step is the estimation of the defect probability for the core. We first
estimate the probability that the core is defect-free for all possible values of R1 and
R2. The overall defect-free probability P of the core is then obtained by averaging over all possible orientations:

P = [1 / (min(B1, C1) · min(B2, C2))] Σ_{R1=1}^{min(B1,C1)} Σ_{R2=1}^{min(B2,C2)} P^(R1,R2)    (2.4)

Figure 2.2: Flowchart depicting the sequence of procedures used to estimate core defect probabilities.
We use Figure 2.1 to illustrate the calculation of the defect probability for an
embedded core. The figure represents the relative placement of a core with respect
to the blocks. We have a block size of (4, 2), a core size of (6, 4) and imaginary chip
area of size (1, 1). The core is divided into nine distinct sub-areas, numbered 1 through 9.
For values of αB = 0.25 and λB = 0.1, we now determine the probability that the
core is defect-free for this orientation using Equation (2.3):

P^(1,1) = 0.76206    (2.5)
The above procedure is repeated until the defect-free probability for all min(B1 , C1 )
× min(B2 , C2 ) combinations of R1 and R2 are determined. The final core defect-free
probability is then calculated using Equation (2.4). The probability that the core
has a defect is simply P̄ = 1 − P. For a given SoC, this procedure can be repeated
for every embedded core until all defect probabilities are obtained. The flowchart in
Figure 2.2 summarizes the sequence of procedures that lead to the estimation of core
defect probabilities. The procedure begins by accumulating wafer defect information
and information on the individual core dimensions. This information is then used to
determine the size of the block and the block parameters, λB and αB. These are then
used to calculate the parameters for the imaginary chip area. The defect-free probability of
the core is then calculated for all possible core orientations with respect to a block
in the wafer; the defect probability of the core is then calculated using Equation (2.4).
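The flowchart of Figure 2.2 can be condensed into a few lines of code. The sketch below (Python) enumerates the orientations, splits the core into per-block sub-areas, applies Equation (2.3) to each block, and averages over orientations as in Equation (2.4). The helper names are not from the thesis; the dimensions and parameters are those of the Figure 2.1 example, so the (1, 1) orientation reproduces the 0.76206 value of Equation (2.5).

def segment_lengths(core_len, offset, block_len):
    """Lengths of the pieces of one core dimension falling in successive blocks."""
    first = min(offset, core_len)
    pieces, remaining = [first], core_len - first
    while remaining > 0:
        pieces.append(min(block_len, remaining))
        remaining -= pieces[-1]
    return pieces

def defect_free_prob(core, block, lam_m, alpha_m, r1, r2):
    """P^(R1,R2): product of Equation (2.3) over the blocks the core touches."""
    prob = 1.0
    for h in segment_lengths(core[0], r1, block[0]):
        for w in segment_lengths(core[1], r2, block[1]):
            n_chips = h * w                            # imaginary chips in this block
            prob *= (1.0 + n_chips * lam_m / alpha_m) ** (-alpha_m)
    return prob

def core_defect_probability(core, block, lam_b, alpha_b):
    lam_m, alpha_m = lam_b / (block[0] * block[1]), alpha_b
    r1_max, r2_max = min(block[0], core[0]), min(block[1], core[1])
    total = sum(defect_free_prob(core, block, lam_m, alpha_m, r1, r2)
                for r1 in range(1, r1_max + 1) for r2 in range(1, r2_max + 1))
    return 1.0 - total / (r1_max * r2_max)             # P_bar = 1 - P, Equation (2.4)

if __name__ == "__main__":
    print(defect_free_prob((6, 4), (4, 2), 0.1 / 8, 0.25, 1, 1))   # ~0.76206, Eq. (2.5)
    print(core_defect_probability(core=(6, 4), block=(4, 2), lam_b=0.1, alpha_b=0.25))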
dimensions as given in [82] to derive information pertaining to the size of the indi-
vidual modules in the ITC’02 SOC Test benchmarks. Since these benchmarks do not
provide information about the sizes of the embedded cores, we use the total number
of patterns for each core as an indicator of size. This assumption helps us extract the
relative size of a core by normalizing it with respect to the overall SoC dimensions.
We use layout information in the form of X − Y coordinates for the SoC as described
in [82]; the bottom-left corner of the SoC has X − Y coordinates of (0, 0), and the
layout information also provides the X − Y coordinates of the top-right
corner of the SoC. The sequence of procedures in Figure 2.2 is then performed to
determine the core defect probabilities. Table 2.1 shows the defect probabilities for
each core in four of the ITC’02 SoC test benchmark circuits [83], estimated using the
parameters αB = 0.25 and λB = 0.035.
In this section, we formulate the problem of determining the test-length for each em-
bedded core, such that for a given upper limit on the SoC test time (expressed as a
percentage of the total SoC test time), the defect-screening probability is maximized.
We present a framework that incorporates the defect probabilities of the embedded
cores in the SoC, the upper bound on SoC test time at wafer sort, the test lengths
for the cores, and the probability that a defective SoC is screened. The defect prob-
abilities for the cores are obtained using the yield model presented in Section 2. Let
us now define the following statistical events for Core i:
Ai : the event that the core has a fault; the probability associated with this event is
determined from the statistical yield model.
Bi : the event that the tests applied to Core i do not produce an incorrect response.
Āi and B̄i represent events that are complementary to events Ai and Bi , respectively.
Table 2.1: Core defect probabilities for four ITC’02 SoC test benchmark circuits.
Two important conditional probabilities associated with the above events are yield
loss and test escape, denoted by P(B̄i | Āi ) and P(Bi | Ai ), respectively. Using a ba-
sic identity of probability theory, we can derive the probability that the test applied
to Core i detects a defect:

P(B̄i) = P(B̄i | Ai) · P(Ai) + P(B̄i | Āi) · P(Āi)    (2.6)
Due to SoC test time constraints during wafer-level testing, only a subset of the
pattern set can be applied to any Core i, i.e., if the complete test suite for the SoC
contains pi scan patterns for Core i, only p∗i ≤ pi patterns can be actually applied to
it during wafer sort. Let us suppose the difference between the SoC package test time
and the upper limit on wafer sort test time is ΔT clock cycles. The test time for each
TAM partition therefore needs to be reduced by ΔT clock cycles, if we assume that
the package test times on the TAM partitions are equal. The value of p∗i adopted for
Core i depends on its wrapper design. The larger the difference between the external
TAM width and internal test bitwidth (number of scan chains plus the number of
I/Os), the greater the impact of (pi − p∗i ) on ΔT . In fact, given two cores (Core i and
Core j) with different wrapper designs, the reduction in the number of patterns by
the same amount, i.e., pi − p∗i = pj − p∗j , can lead to different amounts of reduction in
core test time (measured in clock cycles). Let f ci (p∗i ) be the fault coverage for Core
i with p∗i test patterns.
We next develop the objective function for the test-length selection problem.
This objective function must satisfy two goals: (1) maximize the
probability that Core i fails the test; (2) minimize the overall test-escape probability.
The ideal problem formulation is one that leads to an objective function satisfying
both the above objectives.
Let us now assume that the yield loss is γi , the test escape is βi , and the probability
that Core i has a defect is θi . Using these variables, we can rewrite Equation (2.6)
as:
P(B̄i ) = f ci (p∗i ) · θi + γi · (1 − θi ) (2.7)
Similarly, we can rewrite P(Bi) as follows:

P(Bi) = βi · θi + (1 − γi) · (1 − θi)    (2.8)

We therefore conclude that, for a given value of θi and γi, the objective function
that maximizes the probability P(B̄i) that Core i fails the test also minimizes the test
escape βi. Therefore, it is sufficient to maximize P(B̄i) to ensure that the test escape
rate is minimized. In our study, we assume that the yield loss γi is negligible for each
core. Assuming that the cores fail independently with the probabilities derived in
Section 2.1, the defect-screening probability PS for an SoC with N embedded cores is
given by PS = 1 − ∏_{i=1}^{N} P(Bi).
Let the upper limit on the test time for an SoC at wafer sort be Tmax (clock
cycles). This upper limit on the scan test time at wafer sort is expected to be a
fraction of the scan test time TSoC (clock cycles) for package test, as determined by
the TAM architecture and test schedule. The fixed-width TAM architecture requires
that the total test time on each TAM partition must not exceed Tmax .
If the internal details of the embedded cores are available to the system integrator,
fault simulation can be used to determine the fault coverage for various values of p∗i ,
i.e., the number of patterns applied to the cores during wafer sort. Otherwise, we
model the relationship between fault coverage and the number of patterns with an
exponential function. It is well known in the testing literature that the fault coverage
for stuck-at faults increases rapidly initially as the pattern count increases, but it
flattens out when more patterns are applied to the circuit under test [2, 86]. In our
work, without loss of generality, we use the normalized function fci(p∗i) = log10(p∗i + 1)/log10(pi).
Let εi(p∗i) be the defect-escape probability for Core i when p∗i patterns are applied
to it. This probability can be obtained using Equation (2.8) as a function of the test
escape βi and the probability θi that the core is faulty. The value of θi for each core
in the SoC is obtained using the procedure described in Section 2.2.
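To make these quantities concrete, the short sketch below (Python, with made-up per-core data) evaluates the fault-coverage model, the per-core probability of Equation (2.7) with γi = 0, and the SoC-level defect-screening probability PS defined earlier; it is only an illustration, not the experimental setup used later in this chapter.

from math import log10, prod

def fault_coverage(p_star, p_total):
    """Saturating coverage model used in the text: fc = log10(p* + 1) / log10(p)."""
    return log10(p_star + 1) / log10(p_total)

def prob_test_fails_core(p_star, p_total, theta, gamma=0.0):
    """Equation (2.7): P(B_bar_i) = fc_i(p*_i) * theta_i + gamma_i * (1 - theta_i)."""
    return fault_coverage(p_star, p_total) * theta + gamma * (1.0 - theta)

def soc_screening_probability(cores):
    """P_S = 1 - prod_i P(B_i), where P(B_i) = 1 - P(B_bar_i)."""
    return 1.0 - prod(1.0 - prob_test_fails_core(p_star, p_total, theta)
                      for (p_star, p_total, theta) in cores)

if __name__ == "__main__":
    # (patterns applied at wafer sort, total patterns, core defect probability)
    cores = [(120, 1000, 0.42), (300, 2500, 0.17), (50, 800, 0.65)]
    print(f"P_S = {soc_screening_probability(cores):.4f}")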
PTLS: Given a TAM architecture for a core-based SoC and an upper limit on the
SoC test time, determine the total number of test patterns to be applied to each
core such that: (i) the overall testing time on each TAM partition does not exceed
the upper bound Tmax and (ii) the defect-screening probability P(B̄i ) for the SoC is
maximized. The objective function for the optimization problem is as follows:
Maximize Y = 1 − ∏_{i=1}^{N} P(Bi)
where the number of cores in the SoC is N. We next introduce the indicator binary
variable δij, 1 ≤ i ≤ N, 0 ≤ j ≤ pi, which ensures that exactly one test-length is
selected for each core. It is defined as follows:

δij = 1 if p∗i = j, and δij = 0 otherwise,
where Σ_{j=0}^{pi} δij = 1 for each Core i. The defect-escape probability ε∗i for Core i is given by
ε∗i = Σ_{j=0}^{pi} δij εi(j). We next reformulate the objective function to make it more amenable
for further analysis. Let F = ln(Y). We therefore get:
F = ln(Y)
  = ln(1 − ∏_{i=1}^{N} P(Bi))
  = ln(1 − ∏_{i=1}^{N} ε∗i)
  = ln(1 − ∏_{i=1}^{N} Σ_{j=0}^{pi} δij εi(j))
We next use the Taylor series expansion ln(1 − x) = −(x + x^2/2 + x^3/3 + · · · )
and ignore the second- and higher-order terms [87]. This approximation is justified
and ignore the second- and higher-order terms [87]. This approximation is justified
if the defect-escape probability for Core i is much smaller than one. While this is
usually the case, occasionally the defect-escape probability is large; in such cases, the
optimality claim is valid only in a limited sense. The impact that this approximation
has on the overall defect-screening probability of the SoC is examined in Section 2.3.
The simplified objective function is given by:
Maximize F = −Σ_{i=1}^{N} Σ_{j=0}^{pi} δij εi(j)    (2.9)

Minimize F = Σ_{i=1}^{N} Σ_{j=0}^{pi} δij εi(j)    (2.10)
Next we determine the constraints imposed by the upper limit on the SoC test
time. Suppose the SoC-level TAM architecture consists of B TAM partitions. Let
Ti (j) be the test time for Core i when j patterns are applied to it. For a given Core
i on a TAM partition of width wB , we use the design-wrapper technique from [8]
to determine the longest scan in (out) chains of length si (so ) of the core on that
TAM partition. The value of Ti(j) can be determined using the formula Ti(j) =
(1 + max{si, so}) · j + min{si, so} [8]. The test time Ti∗ for Core i is therefore given
by Ti∗ = Σ_{j=0}^{pi} δij Ti(j). Let Aj denote the set of cores that are assigned to TAM
partition j. We must ensure that Σ_{Corei∈Aj} Ti∗ ≤ Tmax, 1 ≤ j ≤ B.
The number of variables and constraints for a given ILP model determines the
complexity of the problem. The number of variables in the ILP model is only N +
Σ_{i=1}^{N} pi, and the number of constraints is only N + Σ_{i=1}^{N} pi + B; thus this exact
approach is scalable for large problem instances. The complete ILP model is shown
as Figure 2.3.
Minimize F = Σ_{i=1}^{N} Σ_{j=0}^{pi} δij εi(j), subject to:
1. Σ_{j=0}^{pi} δij = 1, 1 ≤ i ≤ N
2. Σ_{Corei∈Aj} Ti∗ ≤ Tmax, 1 ≤ j ≤ B
3. δij = 0 or 1, 1 ≤ i ≤ N, 0 ≤ j ≤ pi
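A compact rendering of this model in code is sketched below, using the open-source PuLP modeling library and an invented three-core, two-partition example. The defect-escape values εi(j), the test-time model, and the partition assignment are placeholders for illustration only; with γi = 0, εi(j) is taken here as θi · (1 − fci(j)).

import pulp
from math import log10

# Hypothetical example: total patterns p, defect probability theta, longest scan-in/out
# chain lengths (si, so), and the TAM partition to which each core is assigned.
cores = {
    "c1": {"p": 60, "theta": 0.40, "si": 20, "so": 18, "part": 0},
    "c2": {"p": 90, "theta": 0.25, "si": 35, "so": 30, "part": 0},
    "c3": {"p": 50, "theta": 0.55, "si": 12, "so": 15, "part": 1},
}
T_MAX = 2500        # upper limit on wafer-sort test time per TAM partition (cycles)

def escape(c, j):   # epsilon_i(j): core is defective and its j-pattern test passes
    fc = log10(j + 1) / log10(cores[c]["p"])
    return cores[c]["theta"] * (1.0 - fc)

def test_time(c, j):   # T_i(j) = (1 + max(si, so)) * j + min(si, so), 0 if j = 0
    si, so = cores[c]["si"], cores[c]["so"]
    return (1 + max(si, so)) * j + min(si, so) if j > 0 else 0

model = pulp.LpProblem("P_TLS", pulp.LpMinimize)
delta = {(c, j): pulp.LpVariable(f"delta_{c}_{j}", cat="Binary")
         for c in cores for j in range(cores[c]["p"] + 1)}

# Objective: minimize the sum of per-core defect-escape probabilities.
model += pulp.lpSum(delta[c, j] * escape(c, j)
                    for c in cores for j in range(cores[c]["p"] + 1))
# Exactly one test-length is selected for each core.
for c in cores:
    model += pulp.lpSum(delta[c, j] for j in range(cores[c]["p"] + 1)) == 1
# The test time on each TAM partition must not exceed T_MAX.
for part in {info["part"] for info in cores.values()}:
    model += pulp.lpSum(delta[c, j] * test_time(c, j)
                        for c in cores if cores[c]["part"] == part
                        for j in range(cores[c]["p"] + 1)) <= T_MAX

model.solve(pulp.PULP_CBC_CMD(msg=False))
for c in cores:
    chosen = next(j for j in range(cores[c]["p"] + 1) if delta[c, j].value() > 0.5)
    print(f"{c}: apply {chosen} of {cores[c]['p']} patterns")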
2.2.2 Efficient heuristic procedure
The exact optimization method based on ILP is feasible for the largest benchmarks
(contributed by Philips) in the ITC’02 SoC benchmark set. While these three bench-
marks are representative of industrial designs in 2002, current core-based SoCs are
larger in size. To handle such SoCs, we present a heuristic approach to determine
the test-length p∗i for each core, given the upper limit on maximum SoC test time.
The heuristic method consists of a sequence of five procedures. The objective of the
heuristic method is similar to that for the ILP technique, i.e., to maximize the over-
all defect-screening probability. The heuristic method performs an iterative search
over the TAM partitions. In each step, we identify a core for which a reduction in
the number of applied patterns results in a minimal decrease in the overall defect-
screening probability. This procedure is repeated until the time constraint on all TAM
partitions is satisfied. We next describe the procedures that make up the heuristic
method.
1. We begin our heuristic procedure by assuming that all patterns are applied to
each core. This assumption implies that $\sum_{\mathrm{Core}_i \in A_j} T_i^* = T_{SoC}$, $1 \le j \le B$.
2. Procedure Tpat Reduce then selects, on each TAM partition, a core whose pattern
count can be reduced by Δp∗i with minimal impact on the overall defect-screening
probability, and reduces the number of patterns applied to that core accordingly.
3. We use the design-wrapper technique in our next procedure step, Ttime Update,
to determine the test-time reduction for Core i (obtained using Tpat Reduce),
corresponding to the reduction Δp∗i in the number of test patterns. We denote by
ΔTmaxij the reduction in test time obtained by reducing the number of test
patterns for Core i on TAM partition j by Δp∗i; this can be obtained from
the equation ΔTmaxij = (1 + max(si, so)ij) · Δp∗i + min(si, so)ij.
The core test time Ti∗ is then updated as Ti∗ − ΔTmaxij.
4. The Tmax Check procedure checks whether $\sum_{\mathrm{Core}_i \in A_j} T_i^* \le T_{max}$, $1 \le j \le B$.
This procedure is performed each time after procedure Tpat Reduce is executed.
5. If the check in procedure Tmax Check returns true for all TAM partitions, we
then compute the overall defect-screening probability for the SoC.
A sort operation is performed each time procedure Tpat Reduce is executed. Hence
the worst-case computational complexity of the heuristic procedure is $O(p_T \cdot N \log N)$,
where N is the number of cores in the SoC and $p_T = \sum_{i=1}^{N} p_i$ is the total number
of test patterns for package test for all the cores. The pseudocode for the heuristic
procedure, as shown in Algorithm 1, calculates the test-lengths and the defect-escape
probabilities for each core in the SoC.
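As a complement to the pseudocode in Algorithm 1, the following sketch illustrates the iterative reduction loop in Python. All inputs (the fault-coverage parameters, defect probabilities θi, wrapper chain lengths, step size Δp∗i, and Tmax) are assumed placeholders; the core-selection rule follows line 7 of Algorithm 1 as written, not the thesis implementation.

# Illustrative sketch (assumed data, not the author's implementation) of the
# iterative test-length reduction heuristic summarized in Algorithm 1.
import math

def fc(i, n, alpha, beta):
    """Fault-coverage model fc_n = 1 - alpha * exp(-beta * n) for core i."""
    return 1.0 - alpha[i] * math.exp(-beta[i] * n)

def reduce_test_lengths(cores_on_tam, p, theta, smax, smin, alpha, beta,
                        Tmax, dp=10):
    """cores_on_tam maps TAM partition j -> list of core indices."""
    # Start by assuming all package-test patterns are applied at wafer sort.
    pstar = dict(p)
    time_of = lambda i, n: (1 + smax[i]) * n + smin[i]
    for j, cores in cores_on_tam.items():
        while sum(time_of(i, pstar[i]) for i in cores) > Tmax:
            cand = [i for i in cores if pstar[i] >= dp]
            if not cand:
                break
            # Selection rule of Algorithm 1, line 7.
            i = max(cand, key=lambda i: theta[i] *
                    (fc(i, pstar[i], alpha, beta) -
                     fc(i, pstar[i] - dp, alpha, beta)))
            pstar[i] -= dp
    return pstar

# Tiny example with two cores on one TAM partition (all values assumed).
print(reduce_test_lengths({0: [0, 1]}, {0: 200, 1: 150},
                          theta={0: 0.02, 1: 0.05}, smax={0: 30, 1: 20},
                          smin={0: 10, 1: 8}, alpha={0: 1.0, 1: 1.0},
                          beta={0: 0.01, 1: 0.02}, Tmax=4000))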
We now present a greedy heuristic procedure to solve the test-length selection prob-
lem. This procedure was developed to demonstrate the need for an iterative heuristic
procedure that reduces the core test-lengths with minimal impact on defect-screening.
The heuristic approach in this section determines the test length p∗i for each core,
given the upper limit on maximum SoC test time as a constraint. Let us suppose
there are B TAM partitions in the SoC test access architecture. It is obvious that
Algorithm 1 Test-Length Selection
1: Let Tmax be the constraint on wafer test time for the SoC, Tmax = k · TSoC, 0 ≤ k ≤ 1;
2: Let B = total number of TAM partitions;
3: Let k = fraction of TSoC permissible for wafer test;
4: $\sum_{\mathrm{Core}_i \in A_j} T_i^* = T_{SoC}$, 1 ≤ j ≤ B;
5: while time constraint for the SoC is not satisfied for TAM partition j, 1 ≤ j ≤ B do
6:   for all cores in Aj do
7:     Find i such that θi · (fci(p∗i) − fci(p∗i − Δp∗i)) is maximum;
8:   end for
9:   ΔTmaxij = (1 + max(si, so)ij) · Δp∗i + min(si, so)ij;
10:  Ti∗ = Ti∗ − ΔTmaxij;
11:  for all TAM partitions, 1 ≤ j ≤ B do
12:    if $\sum_{\mathrm{Core}_i \in A_j} T_i^* \le T_{max}$ then
13:      Compute relative defect-screening probability for the SoC;
14:    end if
15:  end for
16: end while
17: return relative defect-screening probability PSr for the SoC;
we can satisfy the constraint on Tmax if we reduce the test time for all the cores in
each TAM partition to a fraction of the original test time.
Let us denote the maximum wafer-test time for Core i on TAM partition j as Tmaxij.
The test-length for the core corresponding to the test time Tmaxij is given by
$p_i^* = \dfrac{T_{max_{ij}} - \min(s_i, s_o)_{ij}}{1 + \max(s_i, s_o)_{ij}}$.
With the knowledge of the test-length p∗i for each core in the SoC, we can then proceed
to determine the corresponding defect-escape probabilities εi(p∗i), and then the overall
defect-escape probability of the SoC, given by $\epsilon_{SoC} = \prod_{i=1}^{N} \bigl(1 - \epsilon_i(p_i^*)\bigr)$.
The heuristic procedure is simple and has a computational complexity of only O(N).
The above procedure is reasonable if the test times on the TAM partitions are fairly
close to one another. This, however, is not the case in most industrial designs because
of the heterogeneous nature of the cores in the SoC. The pseudocode for the heuristic
procedure, which calculates the test-lengths and the defect-escape probabilities for
each core in the SoC, is shown in Algorithm 2.
Algorithm 2 Test-length selection
1: Let Tmax = k · TSoC, 0 ≤ k ≤ 1; /*Constraint on wafer-test time for the SoC*/
2: Let B = total number of TAM partitions;
3: Let k = fraction of TSoC permissible for wafer test;
4: Let εSoC = overall SoC defect-escape probability during wafer test;
5: while all core test-lengths have not been determined do
6:   for TAMj ← 1 to B do
7:     Calculate max(si, so)ij and min(si, so)ij, ∀i on TAMj;
8:     $p_i^* = \dfrac{T_{max_{ij}} - \min(s_i, s_o)_{ij}}{1 + \max(s_i, s_o)_{ij}}$, ∀i on TAMj;
9:     Calculate εi(p∗i), ∀i on TAMj;
10:  end for
11: end while
12: $\epsilon_{SoC} = \prod_{i=1}^{N} \bigl(1 - \epsilon_i(p_i^*)\bigr)$;
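A minimal sketch of the greedy computation in Algorithm 2 is shown below. The wrapper chain lengths, per-core time budgets, and escape-probability model are assumed placeholder values chosen only to make the example runnable.

# Illustrative sketch of the O(N) greedy test-length computation of Algorithm 2.
# All inputs (chain lengths, time budgets, escape model) are assumed.
import math

def greedy_test_lengths(cores, Tmax_ij, smax, smin, eps_model):
    """cores: list of core ids; Tmax_ij: wafer-test time budget per core."""
    escapes, p_star = {}, {}
    for i in cores:
        # Test-length corresponding to the per-core wafer-test time budget.
        p_star[i] = int((Tmax_ij[i] - smin[i]) / (1 + smax[i]))
        escapes[i] = eps_model(i, p_star[i])
    # Overall quantity computed in line 12 of Algorithm 2.
    eps_soc = math.prod(1 - escapes[i] for i in cores)
    return p_star, eps_soc

# Example: two cores with an assumed exponential escape model.
eps_model = lambda i, n: math.exp(-0.01 * n)
print(greedy_test_lengths([0, 1], {0: 1500, 1: 900},
                          smax={0: 30, 1: 20}, smin={0: 10, 1: 8},
                          eps_model=eps_model))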
In this section, we present experimental results for five SoCs from the ITC’02 SoC
test benchmark suite [83]. We use the public domain ILP solver lpsolve for our
experiments [88]. Since the objectives of our experiment are to select the number
of test patterns in a time-constrained wafer sort test environment, and at the same
time maximize the defect-screening probability for the SoC, we present the following
results:
• Given values of W and Tmax relative to TSoC , the percentage of test patterns
for each individual core that must be applied at wafer sort to maximize the
defect-screening probability for the SoC.
• The relative defect-screening probability PSr for each core in an SoC, where
PSr = PS /PS100 and PS100 is the defect-screening probability if all 100% of the
patterns are applied per core.
• The relative defect-screening probability for each SoC obtained using the ILP
model and the proposed heuristic methods.
Table 2.2: Defect screening probabilities: ILP-based approach versus proposed heuristic approaches.
SoC W Tmax = 0.75 TSoC Tmax = 0.5 TSoC Tmax = 0.25 TSoC
Optimal Heuristic Greedy Optimal Heuristic Greedy Optimal Heuristic Greedy
Method Method Method Method Method Method Method Method Method
d695 8 0.9229 0.7316 0.4111 0.6487 0.5343 0.1091 0.4095 0.3834 0.0039
16 0.9229 0.7316 0.4113 0.6487 0.5759 0.1091 0.4308 0.3706 0.0039
24 0.9047 0.6400 0.4110 0.5985 0.3106 0.1091 0.3604 0.1779 0.0039
32 0.8765 0.4627 0.4110 0.5245 0.4024 0.1091 0.1666 0.1088 0.0039
p22810 8 0.7693 0.7473 0.1563 0.5947 0.4763 0.0053 0.0969 0.0302 ∼0
16 0.8137 0.6994 0.1553 0.5996 0.3302 0.0047 0.1699 0.0524 ∼0
24 0.7871 0.7079 0.0966 0.3340 0.3190 0.0032 0.0143 0.0012 ∼0
32 0.7656 0.5736 0.1553 0.3435 0.1706 0.0032 0.0414 0.0005 ∼0
p34392 8 0.8661 0.6576 0.0042 0.6869 0.3513 ∼0 0.3521 0.1036 ∼0
16 0.8807 0.6965 0.0041 0.7118 0.4400 ∼0 0.2157 0.0780 ∼0
24 0.8990 0.7010 0.0042 0.7207 0.4315 ∼0 0.2569 0.1835 ∼0
32 0.9161 0.5783 0.0042 0.6715 0.3007 ∼0 0.2278 0.0833 ∼0
p93791 8 0.4883 0.4406 0.1539 0.2299 0.0716 0.0097 0.0097 0.0048 ∼0
16 0.5341 0.4420 0.1539 0.2438 0.1161 0.0097 0.0168 0.0088 ∼0
24 0.7234 0.5547 0.1539 0.2535 0.1354 0.0096 0.0826 0.0015 ∼0
32 0.7098 0.6317 0.1539 0.3335 0.1351 0.0097 0.0548 0.0037 ∼0
We first present results on the number of patterns determined for the cores. The
results are presented in Figures 2.4-2.6 for three values of Tmax : 0.75TSoC , 0.50TSoC ,
and 0.25TSoC . For the three large “p” SoCs from Philips, we select the value of B
that minimizes the SoC package test time. The results show that the fraction of
patterns applied per core, while close to 100% in many cases, varies significantly in
order to maximize the SoC defect-screening probability. The maximum value of TAM
width W (in bits) is set to 32 and we repeat the optimization procedure for all TAM
widths ranging from 8 to 32 in steps of eight. Results are reported only for W = 8;
similar results are obtained for other values of W . The CPU time for lpsolve for the
largest SoC benchmark was less than a second.
We next present the defect-screening probabilities for all the individual cores in
the benchmark SoCs (Figures 2.7-2.9). The cores that are more likely to lead to
fails during wafer sort exhibit higher defect-screening probabilities, and vice versa.
A core with small defect probability ends up having more patterns removed from the
initial test suite during wafer sort. This is because a manufacturing defect is unlikely
to cause a failure in that core. The second reason for a low relative defect-screening
probability is that certain cores are left with very few patterns when test-lengths
are reduced for these cores. As a result, we obtain significantly lower
relative defect-screening probabilities for these cores. Even though the large SoCs
have low relative defect-screening probabilities, these are the optimal values under
the given test time constraints at wafer sort.
Finally, we compare our ILP-based optimization technique with the two heuris-
tic procedures on the basis of relative SoC defect-screening probabilities obtained
using the two methods. The values of the defect-screening probabilities PS of the
benchmark SoCs obtained using both the ILP-based model and the heuristic method
for varying TAM widths, as well as overall test time are summarized in Table 2.2.
The results show that, as expected, the ILP-based method leads to higher defect-
screening probabilities when compared with the heuristic procedure. Nevertheless,
the heuristic procedure is efficient for defect screening when Tmax = 0.75TSoC and
0.5TSoC . The greedy heuristic method on the other hand yields poor defect-screening
probabilities compared to the ILP method and the heuristic method. This shows
that the proposed heuristic method is effective for screening dies at wafer sort testing
of large SoCs. A significant percentage of the faulty dies can be screened at wafer
sort using our proposed techniques.
Figure 2.4: Percentage of test patterns applied to each core in p22810 for W = 8.
A Taylor series expansion of $\ln\bigl(1 - \sum_{j} \delta_{ij}\,\epsilon_i(j)\bigr)$, without the higher-order terms, was used
in Section 2.2 to obtain a linear objective function for PTLS. If the defect-escape
probability for Core i is much smaller than unity, this approximation can be justified.
Figure 2.5: Percentage of test patterns applied to each core in p34392 for W = 8.
To study the effect of this approximation, we evaluated the approximation error for
industrial designs. We used a commercial nonlinear programming (NLP) solver [89] to
incorporate higher order terms in our objective function. The nonlinear programming
solver [89] uses the generalized reduced gradient (GRG) method to solve large-scale
nonlinear problems [90].
We use this solver to compute the defect-screening
probability of the SoC using a nonlinear objective function. The nonlinear objective
Figure 2.6: Percentage of test patterns applied to each core in p93791 for W = 8.
Figure 2.7: Relative defect-screening probabilities for the individual cores in p22810
for W = 8.
Figure 2.8: Relative defect-screening probabilities for the individual cores in p34392
for W = 8.
Figure 2.9: Relative defect-screening probabilities for the individual cores in p93791
for W = 8.
function that we use in our experiments is shown in Equation (2.11).

Minimize $F = \sum_{i=1}^{N} \sum_{j=1}^{p_i} \Bigl[ \delta_{ij}\,\epsilon_i(j) + \frac{(\delta_{ij}\,\epsilon_i(j))^2}{2} + \frac{(\delta_{ij}\,\epsilon_i(j))^3}{3} \Bigr]$   (2.11)
The relative magnitudes of the quadratic and cubic terms are negligible compared
to the leading order term when the defect-escape probability of the core is negligible.
We determine the approximation error as a measure to quantify the effect of these
higher-order terms on PSr. The approximation error is given by
$\dfrac{P^r_{S-ILP} - P^r_{S-NLP}}{P^r_{S-ILP}} \times 100\%$.
As in the case of any nonlinear optimization package, the commercial solver used
[89] cannot guarantee finding a globally optimal solution in cases where there are
distinct local optima and CPU time is limited. Knowledge of the convexity of the
objective function and the constraints is essential to determine whether the nonlinear
test-length selection problem will yield globally optimal solutions. In other words, if
a function f(x) has a second derivative in the interval [a, b], a necessary and sufficient
condition for it to be convex in that interval is that the second derivative f''(x) ≥ 0,
∀x in [a, b] [91]. It is evident that a second derivative exists for the objective
function in Equation (2.11), and the function is convex; the solver therefore yields
globally optimal solutions for the nonlinear test-length selection problem.
The approximation errors for the d695 SoC and two “p” SoCs from Philips are
shown in Table 2.3. The experimental results show that the relative
defect-screening probabilities for the SoC are consistently higher when a linear ob-
jective function is used. The error in predicting the defect-screening probability,
however, is less than 10% in most cases; our approximation is therefore reasonable
for the benchmark circuits used in this work. The CPU time for lpsolve to solve the
ILP version of PTLS for the largest SoC benchmark was less than a second. The time
on the NLP solver [89] to solve PTLS with the nonlinear objective function ranges
from 4 minutes for the d695 SoC to 26 minutes for the “p” SoCs from Philips. This
clearly indicates that the nonlinear version of PTLS is not scalable for large SoCs.
Suppose Core i is accessed from the SoC pins for package test using a TAM of width
wi (bits). Let us assume that for RPCT-based wafer sort, the TAM width for Core
i is constrained to be wi∗ bits, where wi∗ < wi . In order to access Core i using only
wi∗ bits for wafer sort, the pre-designed TAM architecture for package test needs to
be appropriately modified.
Figure 2.10(a) shows a wrapped core that is connected to a 4-bit-wide TAM
(wi = 4). For the same wrapped core, Figure 2.10(b) outlines a modified test
access design that allows RPCT-based wafer-level test with wi∗ = 2. For wafer sort
in this example, the lines TAMout[0] and TAMout[2] are not used. In order to ensure
an efficient test access architecture for wafer sort, serial-to-parallel conversion of the test
data stream is necessary at the wrapper inputs of the core. A similar parallel-to-
serial conversion is necessary at the wrapper outputs of the cores. Boundary input
cells BIC[0], . . . , BIC[3], and boundary output cells BOC[0], . . . , BOC[3], which can
operate in both a parallel load and a serial shift mode, are added at the I/Os of the
wrapped core. Multiplexers are added on the input side of the core to enable the
use of a smaller number of TAM lines for wafer sort. A global select signal PT/WS
is used to choose either the package test mode (PT/WS = 0) or the wafer sort
mode (PT/WS = 1). For the output side, the multiplexers are not needed; the test
response can be serially shifted out to the TAM while the next pattern is serially
shifted in to the boundary input cells. Note that the above design is fully compliant
with the IEEE 1500 standard [7] because no modifications are made to the standard
wrapper cells.
We next explain how the test time for Core i is affected by the serialization
process. Let Ti(j) be the total testing time (in clock cycles) for Core i if it is placed
on TAM partition j of the SoC. Let wi(j) be the width of TAM partition j in the
pre-designed TAM architecture. At the wafer level, if only wi∗ bits are available for
TAM partition j, we assume, as in [92] for hierarchical SoC testing, that the wi lines
are distributed equally into wi∗ parts. Thus the wafer-level testing time for Core i on
TAM partition j becomes Ti∗(j) = (wi(j)/wi∗(j)) · Ti(j); in the example of Figure 2.10,
the test time for Core i increases due to serialization to Ti∗(j) = Ti(j) · (4/2) = 2Ti(j).
Note that other TAM serialization methods can also be used for wafer sort. While
TAM serialization can be integrated in an overall optimization problem, it is not
considered here for the sake of simplicity.
Figure 2.10: (a) Accessing a wrapped core for package test only; (b) TAM design that allows RPCT-based wafer sort using a pre-designed wrapper/TAM architecture.
2.4.1 Test-length and TAM optimization problem: PTLTWS
Let us now consider an SoC with a top-level TAM width of W bits and suppose it
has B TAM partitions of widths w1, w2, . . . , wB, respectively. For a given value of
the maximum wafer-level TAM width W∗, we need to determine appropriate TAM
sub-partitions of widths w1∗, w2∗, . . . , wB∗ such that wi∗ ≤ wi, 1 ≤ i ≤ B, and
w1∗ + w2∗ + · · · + wB∗ = W∗. The optimization problem PTLTWS can now be formally
stated as follows:
Problem PTLTWS: Given a pre-designed TAM architecture for a core-based SoC,
the defect probabilities for each core in the SoC, the maximum available test bandwidth
at wafer sort W∗, and the upper limit Tmax on the test time for the SoC at wafer sort,
determine (i) the total number of test patterns to be applied to each core, and (ii)
the (reduced) TAM width for each partition, such that: (a) the overall testing time
on each TAM partition does not exceed the upper bound Tmax, and (b) the overall
defect-screening probability for the SoC is maximized.
The objective function for the optimization problem is the same as that developed
in Section 2.2.1 and is given by Equation (2.10). Due to serialization, the testing
time for Core i on TAM partition j is given by (wi(j)/wi∗(j)) · Ti(j) [92]. Therefore
the test time of Core i when it is tested with a reduced bitwidth of wi∗ is given by
Equation (2.12).

$T_i^* = \sum_{j=1}^{p_i} \delta_{ij}\,\frac{w_i(j)}{w_i^*(j)}\,T_i(j)$   (2.12)
Let us now define a second binary indicator variable λik to ensure that every core
in the SoC is tested using a single TAM width; this variable can be defined as follows:

$\lambda_{ik} = \begin{cases} 1 & \text{if } w_i^* = k \\ 0 & \text{otherwise} \end{cases}$

It can be inferred from the above definition that $\sum_{k=1}^{w_i} \lambda_{ik} = 1$, and Equation (2.12)
can now be represented as $T_i^* = \sum_{j=1}^{p_i} \sum_{k=1}^{w_i} \delta_{ij}\,T_i(j)\,\lambda_{ik}\,(w_i/k)$. The nonlinear
term $\delta_{ij} \cdot \lambda_{ik}$ in the constraint can be replaced with a new binary variable $u_{ijk}$ by
introducing two additional constraints:

$\delta_{ij} + \lambda_{ik} \le u_{ijk} + 1$   (2.13)

$\delta_{ij} + \lambda_{ik} \ge 2\,u_{ijk}$   (2.14)
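As a quick sanity check on this standard product linearization, the short sketch below enumerates all binary assignments and verifies that constraints (2.13) and (2.14), as given above, force u_ijk to equal the product δij · λik; the variable names are purely illustrative.

# Enumerate all binary combinations and check that the only feasible u equals
# the product delta * lam under constraints (2.13) and (2.14).
from itertools import product

for delta, lam in product((0, 1), repeat=2):
    feasible_u = [u for u in (0, 1)
                  if delta + lam <= u + 1       # constraint (2.13)
                  and delta + lam >= 2 * u]     # constraint (2.14)
    assert feasible_u == [delta * lam], (delta, lam, feasible_u)
print("u_ijk = delta_ij * lambda_ik for every binary assignment")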
A constraint to ensure that every core in a TAM partition is tested with the same
TAM width Wx∗ is also necessary and can be represented as shown in Equation (2.15).
The variable Aj denotes the set of cores that are assigned to TAM partition j. The
constraint must be satisfied for every core in Aj.

$\sum_{k=1}^{w_i} k \cdot \lambda_{ik} = W_x^*$   (2.15)
The complete ILP model is shown in Figure 2.11. The number of variables and
constraints in the ILP model determines the complexity of the problem. The number
of variables in our ILP model is $\sum_{i=1}^{N} (p_i + w_i + p_i \cdot w_i)$, and the number of constraints
is $2N + 2\sum_{i=1}^{N} (p_i \cdot w_i) + B + 1$.
We now present the experimental results for two SoCs from the ITC’02 SoC test
benchmark suite [83]. We use the public domain ILP solver lpsolve for our experi-
ments [88]. Since the objectives of our experiment are to select the number of test
patterns in a time- and bitwidth-constrained wafer-sort environment, and at the same
time maximize the defect-screening probability, we present the following results:
• Given values of W∗ and Tmax relative to TSoC, the percentage of test patterns
that must be applied for each individual core to maximize the defect-screening
probability for the SoC.
Minimize $F = \sum_{i=1}^{N} \sum_{j=1}^{p_i} \delta_{ij}\,\epsilon_i(j)$, subject to:

1) $\sum_{i=1}^{n_x} \sum_{j=1}^{p_i} \sum_{k=1}^{w_i} \delta_{ij}\,T_i(j)\,\lambda_{ik}\,(w_i/k) \le T_{max}$;  ∀x, 1 ≤ x ≤ B

2) $\sum_{j=1}^{p_i} \delta_{ij} = 1$;  ∀i, 1 ≤ i ≤ N

3) $\sum_{k=1}^{w_i} \lambda_{ik} = 1$;  ∀i, 1 ≤ i ≤ N

4) $\sum_{k=1}^{w_i} k \cdot \lambda_{ik} = W_x^*$;  ∀ Corei ∈ Aj

5) $\sum_{x=1}^{B} W_x^* \le W^*$

Figure 2.11: ILP model for the test-length and TAM width selection problem PTLTWS.
• The relative defect-screening probability PSr for each core in an SoC, where
PSr = PS /PS100 and PS100 is the defect-screening probability if all 100% of the
patterns are applied per core.
• The relative defect-screening probability for the SoC obtained using the ILP
model.
We first present results on the number of patterns determined for the cores. The
results for the d695 benchmark SoC are presented in Figure 2.12 for three values of
Tmax : TSoC , 0.75TSoC and 0.5TSoC . The fraction of test patterns applied per core
is found to be different in each case to maximize the defect-screening probability.
Table 2.4: Relative Defect-Screening Probabilities Obtained Using PTLTWS (W = 32).
SoC   W∗      Tmax = TSoC                  Tmax = 0.75TSoC              Tmax = 0.5TSoC
Optimal Optimal Optimal
Distribution Defect-Screening Distribution Defect-Screening Distribution Defect-Screening
(w1 , w2 , w3 ) Probability (w1 , w2 , w3 ) Probability (w1 , w2 , w3 ) Probability
d695 8 (5,1,2) 0.3982 (5,1,2) 0.2907 (4,1,3) 0.1058
12 (5,1,6) 0.4426 (5,1,6) 0.3272 (5,3,4) 0.2631
16 (10,3,3) 0.9064 (10,3,3) 0.7279 (10,3,3) 0.4306
a586710 8 (1,4,3) 0.7294 (1,4,3) 0.6142 (1,4,3) 0.4623
12 (1,7,4) 0.7519 (1,7,4) 0.6682 (1,7,4) 0.5191
16 (1,8,7) 0.7621 (1,8,7) 0.6682 (1,8,7) 0.5191
Figure 2.12: Percentage of test patterns applied to each core in d695 when W ∗ = 16
and W = 32.
Results are reported only for W ∗ = 16 and W = 32; similar plots are obtained for
different values of W ∗ and W . Figure 2.13 illustrates the defect-screening probabilities
for the cores in the d695 benchmark for the above-mentioned test case.
We summarize the results for two benchmark SoCs in Table 2.4 for three different
values of W ∗ and W = 32. The relative defect-screening probabilities PS and TAM
partition widths to be used at wafer sort, obtained using PTLTWS, are enumerated
for both benchmark SoCs. The ILP-based technique takes up to 3 hours of CPU
time on a 2.4 GHz AMD Opteron processor with 4 GB of memory for d695, when
W ∗ = 16 and W = 32. The results show that a significant portion of the faulty dies
can be screened at wafer sort using the proposed technique.
Figure 2.13: Relative defect-screening probabilities for the individual cores in d695
when W ∗ = 16 and W = 32.
The ILP-based approach described in Section 2.4.1 is practical only for small SoCs;
due to the large size of the ILP model, it does not scale well for SoCs with a large
number of cores. It is therefore necessary to develop an alternative technique that
can handle larger SoC designs. We next propose an efficient heuristic approach
Pe−TLTWS based on a combination of TAM partition-width enumeration and ILP.
The heuristic approach Pe−TLTWS consists of the following sequence of procedures:
(i) Given the number of TAM partitions B and an upper limit on the maximum
TAM width W∗, we first enumerate all possible TAM partition combinations. This
enumeration can be done following the principle of a B-bit odometer, where each
bit corresponds to the width of one TAM partition. The odometer resets to one as
opposed to zero in the case of a conventional odometer (the maximum value that the
ith bit can take before a reset is wi). At every increment of the odometer, we check
whether $\sum_{i=1}^{B} w_i^* = W^*$ (a sketch of this enumeration appears after this list).
All possible TAM partitions that meet the above condition are recorded as valid
partitions. We illustrate the above enumeration procedure with a small example. Let
us consider an SoC whose TAM architecture is fixed and designed for 5 bits, and parti-
tioned into three TAM partitions of widths 2, 3, and 1, respectively. The possible TAM
enumerations for the above partitions are {(1, 1, 1), (1, 2, 1), (1, 3, 1), (2, 1, 1), (2, 2, 1),
(2, 3, 1)}. If we consider W∗ to be 4, then the valid TAM partitions are {(1, 2, 1), (2, 1, 1)}.
(ii) For each valid TAM partition calculated in Step (i), we apply the test-length se-
lection procedure PT LS . We calculate the defect-screening probability for the SoC
from the results obtained using PT LS .
(iii) If the defect-screening probability of the new partition is greater than the pre-
vious partition, we store it as the new defect-screening probability, and store this
partition as the current optimal partition.
(iv) We repeat this procedure until all possible TAM partitions are enumerated.
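The following sketch, referred to in step (i) above, illustrates the reset-to-one odometer enumeration of TAM sub-partition widths. The partition widths and W∗ reproduce the small example in the text, while the test-length selection call is only a placeholder standing in for PTLS.

# Enumerate all TAM sub-partition width combinations (reset-to-one odometer)
# and keep those whose widths sum to W_star, as in step (i).
from itertools import product

def enumerate_partitions(widths, w_star):
    """widths: pre-designed partition widths (w_1, ..., w_B)."""
    all_combos = list(product(*[range(1, w + 1) for w in widths]))
    return [c for c in all_combos if sum(c) == w_star]

# Example from the text: partitions of widths 2, 3, 1 and W* = 4.
valid = enumerate_partitions((2, 3, 1), w_star=4)
print(valid)   # -> [(1, 2, 1), (2, 1, 1)]

# Steps (ii)-(iv): evaluate each valid sub-partition and keep the best one.
def solve_ptls(sub_widths):
    """Placeholder for the PTLS test-length selection of Section 2.2;
    returns an assumed defect-screening probability for illustration only."""
    return sum(sub_widths) / (10.0 + max(sub_widths))

best = max(valid, key=solve_ptls)
print("best sub-partition:", best)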
The percentage of test patterns applied to each core in p34392 using the heuristic approach is shown in Fig. 2.14. The results are shown for three values of Tmax: TSoC, 0.75TSoC and 0.5TSoC.
Results are reported only for W ∗ = 16 and W = 32; similar plots are obtained for
a range of values of W ∗ and W . Fig. 2.15 illustrates the relative defect-screening
probabilities for the cores in the p34392 benchmark for the above-mentioned test
case. The heuristic method results in lower defect-screening probability for most
cases compared with the ILP-based method; for higher values of W ∗ , the difference
in defect screening probability between the two methods decreases. The computation
time for the largest benchmark SoC p93791 was only 4 minutes, hence this approach
is suitable for large designs.
Figure 2.14: Percentage of test patterns applied to each core in p34392 when
W∗ = 16 and W = 32.
Table 2.5: Relative Defect-Screening Probabilities Obtained Using Pe−TLTWS.
SoC W∗ Tmax = TSoC Tmax = 0.75TSoC Tmax = 0.5TSoC
Optimal Optimal Optimal
Distribution Defect-Screening Distribution Defect-Screening Distribution Defect-Screening
(w1 , w2 , w3 ) Probability (w1 , w2 , w3 ) Probability (w1 , w2 , w3 ) Probability
a586710 8 (1,5,2) 0.5732 (1,5,2) 0.5341 (1,5,2) 0.3319
12 (2,6,4) 0.7014 (2,6,4) 0.5789 (2,6,4) 0.4449
16 (2,9,5) 0.7118 (2,9,5) 0.5837 (2,9,5) 0.4580
d695 8 (5,1,2) 0.5392 (4,2,2) 0.5471 (5,1,2) 0.3102
12 (7,2,3) 0.8139 (7,3,2) 0.5542 (7,2,3) 0.4116
16 (9,2,5) 0.8543 (9,3,4) 0.7022 (8,2,6) 0.5231
p34392 8 (3,3,2) 0.3385 (3,2,3) 0.2275 (3,2,3) 0.1110
12 (4,5,3) 0.6382 (4,4,4) 0.4360 (4,4,4) 0.2180
16 (6,7,3) 0.8010 (5,7,4) 0.5968 (4,7,5) 0.2948
p22810 8 (3,3,2) 0.1331 (3,3,2) 0.0580 (3,3,2) 0.0098
12 (4,6,2) 0.1891 (4,5,3) 0.1800 (3,6,3) 0.0333
16 (5,6,5) 0.6186 (6,6,4) 0.3841 (6,6,4) 0.1495
p93791 8 (2,4,2) 0.0606 (2,4,2) 0.0165 (2,4,2) 0.0050
12 (3,6,3) 0.2228 (3,7,2) 0.0949 (3,7,2) 0.0189
16 (4,8,4) 0.5018 (4,8,4) 0.2201 (4,8,4) 0.0615
Figure 2.15: Relative defect-screening probabilities for the individual cores in
p34392 when W ∗ = 16 and W = 32.
Geometric programming (GP) problems are convex optimization problems that are
similar to linear programming problems [93]. A GP is a mathematical problem of
the form
Minimize f0 (x)
subject to fi (x) ≤ 1, i = 1, · · · , m
gi (x) = 1, i = 1, · · · , p
where fi are posynomial functions, gi are monomials, and xi are the optimization
variables; it is implicitly assumed that the optimization variables are positive, i.e.,
xi > 0 [93]. Mixed-integer GPs are a class of problems that are hard to solve [93].
The problem PTLTWS can be modeled as a mixed-integer GP (MIGP) problem. In
this chapter, we employ a heuristic method to solve the MIGP problem Pgp−TLTWS
for test-length and TAM width selection. Using heuristic methods, approximate
solutions can be found in a reasonable amount of time; however, the optimality
of the solution cannot be guaranteed. Before we describe the GP-based heuristic
method, we need to modify the objective function to make it amenable for further
analysis. The objective of PTLTWS is to maximize the defect-screening probability
$P_S = \prod_{i=1}^{N} \bigl(1 - P(B_i)\bigr)$. This is equivalent to the following minimization-based objective
function:

Minimize $G = \sum_{i=1}^{N} \sum_{j=1}^{p_i} \delta_{ij}\,\epsilon_i(j)$
Minimize $G = \sum_{i=1}^{N} \sum_{j=1}^{p_i} \delta_{ij}\,\epsilon_i(j)$, subject to:

1) $\dfrac{\sum_{i=1}^{n_x} \sum_{j=1}^{p_i} \sum_{k=1}^{w_i} \delta_{ij}\,T_i(j)\,\lambda_{ik}\,(w_i/k)}{T_{max}} \le 1$;  ∀x, 1 ≤ x ≤ B

2) $\sum_{j=1}^{p_i} \delta_{ij} = 1$;  ∀i, 1 ≤ i ≤ N

3) $\sum_{k=1}^{w_i} \lambda_{ik} = 1$;  ∀i, 1 ≤ i ≤ N

4) $\dfrac{\sum_{k=1}^{w_i} k \cdot \lambda_{ik}}{W_x^*} = 1$;  ∀ Corei ∈ Aj

5) $\dfrac{\sum_{x=1}^{B} W_x^*}{W^*} = 1$

/* Constants: εi(j), Tmax, W∗ */
/* Variables: δij, λik; 1 ≤ i ≤ N, 0 ≤ j ≤ pi */

Figure 2.16: MIGP model for the PTLTWS problem.
The constraints for the optimization problem described in Section 2.4.1 can be
easily modified for use in the MIGP problem. The complete MIGP problem for
PTLTWS is shown in Figure 2.16. We use GP relaxation to transform the MIGP
problem to a general GP problem that can be solved using commercial tools [94]. To
obtain an approximate solution of the MIGP problem, the MIGP is relaxed to a GP
and solved using [94]; the result obtained in this way is an upper bound on the optimal
value of the objective function for the MIGP. The values of the variables obtained after
relaxation are then simply rounded towards the nearest integer. The heuristic then
iteratively reassigns the values of the variables such that the constraints are satisfied
while maximizing the defect-screening probability for the SoC. The heuristic used to
solve Pgp−TLTWS consists of the following steps:
1. We first relax the MIGP problem to a GP by allowing the binary indicator
variables δij and λik to take continuous values between 0 and 1.
2. We then use [94] to solve the relaxed MIGP problem. The resulting values of
the indicator variables δij are sorted for each core i. The highest value of δij for
each core is rounded to unity, while the remaining variables are rounded down
to zero.
3. The procedure then assigns the smallest value of TAM width to each core in
the SoC; i.e., λi1 = 1, ∀i. For the smallest value of TAM widths assigned to the
cores, the test time for each TAM partition is calculated.
4. The procedure then iteratively assigns additional TAM width to the TAM par-
tition with the maximum test time. This is repeated until $\sum_{i=1}^{B} w_i^* = W^*$.
5. Once the TAM widths for RPCT are determined, we check whether the test-time
constraint is satisfied on each TAM partition. The value of Δp∗i is varied in our
experiments, and we choose the value that results in maximum defect screening PSr
for the SoC. This procedure searches for a core in each TAM partition that
yields a maximum value of θi · (fci(p∗i) − fci(p∗i − Δp∗i)). This is repeated until
the time constraint on all TAM partitions is satisfied.
6. The relative defect-screening probability PSr = PS /PS100 for each core in the
SoC is then calculated; PS100 is the defect-screening probability if all 100% of
the patterns are applied per core. This information is used to determine the
relative defect-screening probability for the SoC.
Experimental results obtained using the GP-based heuristic procedure are sum-
marized in Table 2.6. The results are presented in a similar fashion as in Tables 2.4
and 2.5. The relative defect-screening probability obtained using the GP-based heuristic is
greater than that obtained using the enumerative heuristic technique and less than
that obtained using the ILP method. The computation time ranges from 6 minutes
for the a586710 SoC, to 51 minutes for the p93791 SoC.
We present experimental results on the approximation error in PSr when ILP and
heuristic methods are used to solve PTLTWS versus when NLP and GP-based meth-
ods are used. We use a commercial solver [94] for the GP-based heuristic method. The
relative defect-screening probability was determined for a nonlinear objective func-
tion (Equation (2.11)) using [89], where the quadratic and cubic terms are considered
in addition to the leading order term; this procedure is similar to the procedure
described in Section 2.3.
Let $P^r_{S-ILP}$ denote the relative defect-screening probability of the SoC obtained
using a linear objective function, $P^r_{S-e-TLTWS}$ the defect-screening probability obtained
using the enumerative heuristic method, $P^r_{S-NLP}$ the relative defect-screening prob-
Table 2.6: Relative Defect Screening Probabilities Obtained Using the GP-based Heuristic Method.
Tmax = TSoC Tmax = 0.75TSoC Tmax = 0.5TSoC
Optimal Defect- Optimal Defect- Optimal Defect-
Distribution Screening Distribution Screening Distribution Screening
SoC W ∗ (w1 , w2 , w3 ) Probability (w1 , w2 , w3 ) Probability (w1 , w2 , w3 ) Probability
a586710 8 (4,2,2) 0.7226 (4,1,3) 0.5833 (4,1,3) 0.4594
12 (6,3,3) 0.7446 (6,3,3) 0.6391 (6,3,3) 0.5138
16 (9,3,4) 0.7582 (8,3,5) 0.6435 (8,3,5) 0.5120
d695 8 (4,1,3) 0.5027 (4,1,3) 0.5288 (4,1,3) 0.3014
12 (6,1,5) 0.7962 (6,2,4) 0.5532 (6,2,4) 0.3961
16 (8,2,6) 0.8420 (8,2,6) 0.6931 (8,2,6) 0.5090
p34392 8 (3,4,1) 0.3440 (3,3,2) 0.2330 (3,3,2) 0.1150
12 (3,4,4) 0.6473 (3,4,4) 0.4455 (3,4,4) 0.2251
16 (6,6,4) 0.8072 (6,5,5) 0.6081 (6,5,5) 0.3002
p22810 8 (3,2,3) 0.1346 (3,1,4) 0.0598 (3,1,4) 0.0100
12 (4,5,3) 0.1911 (4,6,2) 0.1848 (4,6,2) 0.0322
16 (4,5,7) 0.6246 (5,4,7) 0.3892 (5,4,7) 0.1508
p93791 8 (4,3,1) 0.0620 (4,3,1) 0.0170 (4,2,2) 0.0051
12 (2,6,4) 0.2285 (2,6,4) 0.0977 (2,6,4) 0.0193
16 (3,7,6) 0.5119 (3,8,5) 0.2216 (3,8,5) 0.0619
ability of the SoC using a nonlinear objective function, and $P^r_{S-GP}$ the relative
defect-screening probability using the GP-based heuristic method. We determine
the approximation error as a measure to quantify the effect of these higher-order
terms on PSr. The approximation error obtained using the ILP method is determined
as $\delta_{ILP} = \dfrac{P^r_{S-ILP} - P^r_{S-NLP}}{P^r_{S-NLP}} \times 100\%$. The approximation errors obtained using
the heuristic and GP-based methods are similarly determined as
$\delta_{Heur} = \dfrac{P^r_{S-Heur} - P^r_{S-NLP}}{P^r_{S-NLP}} \times 100\%$ and $\delta_{GP} = \dfrac{P^r_{S-GP} - P^r_{S-NLP}}{P^r_{S-NLP}} \times 100\%$, respectively.
As is evident from the above equations, the results obtained using the non-
linear programming solver are used as the baseline case. This is because the results
obtained using the GP-based heuristic are only bounds (upper bounds on the relative
defect-screening probability), and the results obtained using ILP and the enumera-
tive heuristic method are not optimal. For the “p” benchmarks, we do not consider
solutions obtained using ILP because of the lack of a suitable solver for problems of
this size. The approximation errors for the benchmark circuits are presented in Tables
2.7 and 2.8. The time needed by the NLP solver [89] to solve PTLTWS with the nonlinear
objective function ranges from 6 minutes for the d695 SoC to 4 hours for the “p”
SoCs from Philips. This clearly indicates that the nonlinear version of PTLTWS is
not scalable for large SoCs. The time needed by the GP-based heuristic ranges from 2
minutes for the d695 SoC to 45 minutes for the “p” SoCs. The GP-based heuristic
can therefore be used to quickly determine bounds on PSr.
2.5 Summary
Table 2.7: Approximation error in relative defect-screening probability for d695 and
a586710.
Tmax = TSoC Tmax = 0.75TSoC Tmax = 0.5TSoC
SoC   W∗   δHeur  δGP  δILP    δHeur  δGP  δILP    δHeur  δGP  δILP
d695 8 4.36 1.59 0.17 7.77 4.82 0.37 15.09 11.81 3.31
12 3.83 1.58 0.15 7.48 7.48 0.58 12.73 10.45 2.94
16 3.23 1.74 0.25 7.10 5.72 1.04 11.31 9.61 2.36
a586710 8 2.47 1.52 0.12 3.77 1.46 0.54 5.40 4.76 0.94
12 2.18 1.46 0.32 3.27 1.22 0.41 4.87 3.79 0.79
16 2.05 1.53 0.28 3.03 0.78 0.46 4.36 2.93 1.03
Table 2.8: Approximation error in relative defect-screening probability for the “p”
SoCs.
Tmax = TSoC Tmax = 0.75TSoC Tmax = 0.25TSoC
SoC   W∗   δHeur  δGP    δHeur  δGP    δHeur  δGP
p22810 8 2.91 4.09 4.36 7.68 5.40 7.18
12 1.82 2.92 4.29 7.09 5.31 8.19
16 1.28 2.26 3.06 4.43 3.62 4.54
p34392 8 0.97 2.62 3.48 5.99 6.54 10.33
12 1.12 2.56 3.25 5.49 7.55 11.07
16 1.85 2.64 2.84 4.78 5.67 7.62
p93791 8 4.30 6.62 5.75 8.97 6.48 8.92
12 6.16 8.87 7.10 10.28 8.21 10.72
16 4.90 7.01 6.53 7.23 8.35 9.15
In this chapter, we have shown how the defect
probabilities for the individual cores in an SoC can be obtained using statistical
modeling techniques. The defect probabilities were then used in an ILP model to
solve the test-length selection problem. The ILP approach takes less than a second
for the largest SoC test benchmarks from Philips. Experimental results for the ITC’02
SoC test benchmarks show that the ILP-based method can contribute significantly
to defect-screening at wafer sort. A heuristic method that scales well for larger SoCs
has also been presented.
We have also formulated a test-length and a TAM width selection problem for
wafer-level testing of core-based digital SoCs. To the best of our knowledge, this
is the first attempt to incorporate TAM-width-selection in the wafer-level SoC test
flow. Experimental results for the ITC’02 SoC test benchmarks using the optimal
method and the enumeration based approach show that the proposed approach can
contribute to effective defect screening at wafer sort.
Chapter 3
Conventional test techniques for mixed-signal circuits require the use of a dedicated
analog test bus and an expensive mixed-signal ATE [95, 96]. In this chapter, we
present a correlation-based signature analysis technique for mixed-signal cores in a
SoC [97]. This method is specifically developed for defect screening at the wafer level
using low-cost digital testers, and with minimal dependence on mixed-signal testers.
The remainder of the chapter is organized as follows. Section 3.1 describes the
proposed signature analysis method for wafer-level test of analog cores. Simulation
results are presented to evaluate the signature analysis method. Section 3.2 describes
the cost model for a generic mixed-signal SoC. Section 3.3 details the reduction in
product cost that can be obtained using wafer-level testing for an industrial mixed-
signal SoC. Finally, Section 3.4 concludes the chapter.
3.1 Wafer-level defect screening: Mixed-signal cores
Test procedures for data converters can be classified as being based on either spectral-
based tests or code density tests. Spectral-based test methods [96] usually involve
the use of a suitable transform, such as the Fourier Transform, to analyze the out-
put. These methods are used to determine the dynamic test parameters of the data
converter. On the other hand, code density tests are based on the construction of
histograms of the individual code counts [98]. The code counts of the data converter-
under-test are then analyzed and compared with the expected code counts to de-
termine its static parameters. Recent work in mixed-signal testing has focused on
spectral-based frequency domain tests, due to the inherent advantage of test time
over the code density tests. In [96], a test flow process is described that uses only
the dynamic tests. A case study on sample data converters presented in [96] claims
that 96% of faults involving both static and dynamic specifications can be detected
without using the code density test technique. It is important to note that the pro-
cedure described in [96] is aimed at production testing. In [99], it has been shown
that frequency-domain-based signature analysis helps in suppressing non-idealities
associated with the test data, and it serves as a robust mechanism for enhancing
fault coverage and reducing false alarms.
• Time-domain signature analysis techniques have extremely low tolerance to
noise, since the measured signature can be incorrect even for single bit errors
[100].
• Noisy signals and imprecise test clocks at wafer sort lead to distortion in the
value of dynamic parameters such as the signal-to-noise ratio (SNR), which
directly affects the effective number of bits for the data converter. The lower-
order bits of the data converter, in the presence of noise, convert noise rather
than the signal itself. In such circumstances, the comparison of the data con-
verter response with a pre-specified signature inevitably leads to increased yield loss.
• Test signals that are more linear than the device under test
(DUT) are prescribed as a requirement for successful testing of data convert-
ers [101]. This cannot be guaranteed in “big-D/small-A” SoC designs, as the
digital-to-analog converters (DACs) are used to provide test stimuli to the
analog-to-digital converters (ADCs), when configured in a loop-back mode.
Test procedures examine the output response of the circuit and compare it to
a pre-determined “acceptable” signature. In light of all the possible error sources
during wafer sort, a reliable acceptable signature is hard to derive because it requires
the modeling of all possible errors. To address similar problems, outlier analysis
has been used extensively in the IDDQ testing of digital circuits [57, 58]. We employ
a similar pass/fail criterion in the proposed wafer-level testing approach. To perform
such an analysis, we first require a measurable parameter for each core. In IDDQ
testing, this data comes in the form of supply current information. However, in spec-
tral analysis, the information obtained as a signature is spread over multiple data
points, where each data point represents the power associated with the corresponding
frequency bin. It is therefore necessary to encode this information as a single param-
eter corresponding to each individual core. We propose two correlation-based test
methods to achieve this goal. These methods are referred to as the mean-signature-
and golden-signature-based correlation techniques.
In [99], the authors use the correlation between a reference spectrum and the spectrum
of the circuit under test as a pass/fail criterion. The reference spectrum serves
as an acceptable signature, and is used for comparison with the spectrum of the
circuit under test. Such a reference signature is called an Eigen signature [99]. The
sensitivities to changes in the shape of the spectrum of the device-under-test from
the Eigen signature can be quantified by means of a correlation parameter. The
correlation is a fraction that lies between 0 and 1, and it serves as a single measurable
parameter for each individual die.
Assuming that the number of good dies is appreciably larger than the number of defective ones, the
Eigen signature contains the information needed to separate the good dies from the
defective ones. Since both Xi and E are random variables, let $\bar{X}_i$ and $\bar{E}$ represent
the means of Xi and E, respectively. The correlation between the Eigen spectrum and
that of the circuit under test can now be defined using Equation (3.1) as:

$\mathrm{corr}(X_i, E) = \dfrac{\sum_{j=1}^{P} (x_{ij} - \bar{X}_i)\left(\dfrac{\sum_{i=1}^{m} x_{ij}}{m} - \bar{E}\right)}{\left[\sum_{j=1}^{P} (x_{ij} - \bar{X}_i)^2 \; \sum_{j=1}^{P} \left(\dfrac{\sum_{i=1}^{m} x_{ij}}{m} - \bar{E}\right)^2\right]^{1/2}}$   (3.1)

Here P is the number of points in the spectral signature and m is the number of dies used to compute the Eigen signature.
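A small numerical sketch of this correlation computation is given below. The spectra are randomly generated placeholders, the Eigen signature is taken as the mean spectrum over the batch (as in the MSBC technique), and the threshold rule is only an illustrative stand-in for the yield-based rule described later in this section.

# Illustrative sketch of the correlation parameter of Equation (3.1): each die's
# spectral signature is correlated against the Eigen (mean) spectrum of the batch.
import numpy as np

rng = np.random.default_rng(0)
m, P = 200, 64                                   # dies, points per spectrum (assumed)
spectra = rng.normal(1.0, 0.05, size=(m, P))     # placeholder spectral signatures
spectra[:5] += rng.normal(0.0, 0.5, size=(5, P)) # a few outlier (faulty) dies

eigen = spectra.mean(axis=0)                     # Eigen signature: mean over all dies

def corr(x, e):
    """Correlation between one die's spectrum and the Eigen spectrum."""
    xc, ec = x - x.mean(), e - e.mean()
    return float(np.sum(xc * ec) / np.sqrt(np.sum(xc**2) * np.sum(ec**2)))

scores = np.array([corr(x, eigen) for x in spectra])
# Pass/fail threshold chosen so that a fixed fraction of dies passes (outlier analysis).
threshold = np.quantile(scores, 0.05)
print("suspected outliers:", np.where(scores < threshold)[0])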
For the MSBC technique, the collection of spectral signatures requires the storage
of spectral information of a number of dies before a pass/fail decision can be made.
While this information does not have to reside in the main memory of the tester,
storing and handling such a large amount of data may be inconvenient. It may be
desirable to use a pre-defined golden-signature for correlation during wafer sort. It is
important to note that the use of a pre-defined spectrum as the golden signature does
not hamper outlier analysis. The golden-signature spectrum is obtained a priori, by
assuming ideal and fault-free operating conditions for the circuit under test. The
correlation parameter can still be used to identify the possible faulty dies. The
correlation parameters are estimated in the same way as in Section 3.1.1. The only
difference here lies in the use of a golden signature as the Eigen signature. The test
flow for both methods is described in Figure 3.1.
The next step in signature analysis is to set a threshold to determine the pass/fail
criterion for each die. As explained previously, due to all the non-idealities in the
Figure 3.1: Flowchart depicting the mixed-signal test process for wafer-level fault
detection.
wafer-sort test environment, the pass/fail threshold cannot be fixed in advance; it can
be estimated by using the expected yield (Y%) information from the characterization
data. We set the fraction of the number of dies passing the test to be
$Y_{\%} + \dfrac{100 - Y_{\%}}{k}$.
The constant k can be chosen based on the type of signature analysis technique used.
The percentages of marginal,
moderate and grossly faulty data converters in the overall population are 44%, 37%,
and 19%, respectively.
We present experimental results for the 8-bit flash ADC model in Table 3.1. It is
clear that the MSBC technique outperforms the GSBC technique in most cases, both
in terms yield loss (YL) and overall test escapes (OTE). Table 3.1 lists the percentage
of test escapes for marginal (T EM aF ), moderate (T EM oF ), and grossly (T EGF ) faulty
dies. The percentages are given in terms of the number of faulty dies in each group).
Columns 5-7 list the relevant data separately for each fail type. As a result, the
rows of the table for these three columns do not add up to 100%. This analysis is
performed in order to evaluate the effectiveness of our proposed signature analysis
techniques over different failure regions. A significant percentage of marginal failures
result in test escapes. This shows that the proposed signature analysis technique is
not effective for screening marginal failures. On the other hand, 33%–92% and 26%–
92% of the moderately faulty dies are screened in the case of the MSBC technique and
GSBC technique, respectively. Thus our technique is effective for screening moderate
and gross failures, which is typically the objective in wafer-level testing. Marginal
failures are best detected at package test, where the chip can be tested in a more
comprehensive manner.
In this section, we present a cost model to evaluate wafer-level testing for a generic
mixed-signal SoC. A cost model for an entire electronic assembly process is described
in [103], using the concept of “yielded cost”. However, it cannot be readily adapted
for wafer-level testing. In [27], a cost modeling framework for analog circuits was
proposed, but it did not explicitly model the precise relationship between yield loss,
test escape and the overall product cost. The effects of yield loss and test escape for
Table 3.1: Wafer-level defect screening: experimental results for an 8-bit flash ADC.
Correlation    FFT: No. of Sample
Technique      Points – Yield Type    YL (%)   OTE (%)   TEMaF (%)   TEMoF (%)   TEGF (%)   k
Mean 1024-LY 0.8176 46.66 89.25 33.21 3.53 5
Signature 1024-MY 0.25 67.7 97.11 66.19 0 7
1024-HY 0.9 49 95.23 54 7.4 7
4096-LY 0.06 47 77.1 7.95 0 10
4096-MY 0.08 27.43 58.65 12.67 0 10
4096-HY 0 25 95.23 10 0 10
Golden 1024-LY 1.006 75.71 98.59 73.7 42.47 5
Signature 1024-MY 0.0375 68.75 96.15 67.6 5 7
1024-HY 1.1 74 100 76 55.55 5
4096-LY 0.18 29.78 88.31 7.95 0.88 8
4096-MY 0.16 43.36 96.15 15.49 0 10
4096-HY 0.1 2.5 100 8 0 10
YL → Yield loss; OTE → Overall test escapes; TEMaF, TEMoF and TEGF → Test escapes for
marginal, moderate and grossly faulty dies, respectively.
both the digital and mixed-signal cores in an SoC are modeled in our unified analytical
framework. The proposed model also considers the cost of silicon corresponding to
the die area.
Testing at the wafer level leads to yield loss and test escapes. Yield loss occurs when
testing results in the misclassification of good dies as being defective, and the dies
are not sent for packaging. We use the term Wafer-Test-Yield Loss (WYL), to refer
to the yield loss resulting from wafer-level testing, and the associated non-idealities.
Clearly, WYL must be minimized to reduce product cost.
The test escape component is also undesirable, due in large part to the mandated
levels of shipped-product quality-level (SPQL), also known as defects per million,
which is a major driver in the semiconductor industry. SPQL is defined as the
fraction of faulty chips in a batch that is shipped to the customer. Test escapes at
the wafer-level are undesirable because they add to packaging cost, but they do not
increase SPQL if these defects are detected during module tests.
In order to make the cost model robust, we introduce correction factors to account
for the test escapes and WYL. The correction factor for test escapes is obtained from
the “fault coverage curve”, which shows the variation of the fault coverage versus the
number of test vectors. It has been shown in [2], and more recently in [104], that the
fault coverage curve can be modeled by a function of the form $fc_n = 1 - \alpha e^{-\beta n}$,
where n is the number of test patterns applied, $fc_n$ is the fault coverage for n test
patterns, and α and β are constants specific to the circuit under test and the fault
model used.
Typically, in wafer-level testing for digital cores, only a subset of the patterns is
applied to the circuit, i.e., if the complete test suite contains n patterns, only n∗ ≤ n
patterns are actually applied to the core under test. The correction factor $\theta_{n^*}$, defined
as $\theta_{n^*} = \dfrac{fc_n - fc_{n^*}}{fc_n}$, $0 \le n^* \le n$, is used in the model to account for test escapes during
wafer-level testing.
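As a small illustration, the snippet below evaluates an assumed fault-coverage curve of this form and the corresponding correction factor θn∗. The values of α, β, and the full pattern count are placeholders and are not the Chip K data.

# Evaluate an assumed fault-coverage curve fc_n = 1 - alpha*exp(-beta*n) and the
# test-escape correction factor theta_{n*} = (fc_n - fc_{n*}) / fc_n.
import math

def fault_coverage(n, alpha=0.95, beta=0.002):
    return 1.0 - alpha * math.exp(-beta * n)

def correction_factor(n_total, n_applied, alpha=0.95, beta=0.002):
    fc_n = fault_coverage(n_total, alpha, beta)
    fc_ns = fault_coverage(n_applied, alpha, beta)
    return (fc_n - fc_ns) / fc_n

# Example: an assumed 10000-pattern suite, 4046 patterns applied at wafer sort.
print(f"theta_n* = {correction_factor(10000, 4046):.4f}")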
Figure 3.2 shows how the fault coverage varies as a function of the number of
applied test vectors for the digital portion of a large industrial ASIC, which we
call Chip K.¹ The digital logic in this chip contains 2,821,647 blocks (including
approximately 334,000 flip-flops), where a block represents a cell in the library. The
figure also shows the correction factor as a function of the number of test vectors
applied to the same circuit. Section 3.1 showed how we can evaluate the test escapes
for analog cores. Let us assume that the test escape for analog cores is β. Assuming
that test escapes for the analog cores are independent from the test escapes for digital
cores (a reasonable assumption due to the different types of tests applied for the two
cores), the SoC test escape can be estimated to be 1 − (1 − θn∗ ) · (1 − β).
Let us now consider the correction factor due to WYL. If the WYL for the digital
part of the SoC is WYLd and that for the analog part of the SoC is WYLa, the
effective WYL for the SoC is simply given by WYLeff = 1 − (1 − WYLd) · (1 − WYLa).
¹ ASICs Test Methodology, IBM Microelectronics, Essex Jct, VT 05452.
Figure 3.2: The variation of the fault coverage and correction factor versus the
number of test vectors applied to the digital portion of Chip K.
We now present our generic cost model. The cost model treats the outcomes of a
test as random variables and assigns probabilities to the different possible outcomes.
Appropriate conditional probabilities are used to ensure that the model takes all
possible scenarios into account. Let us first define the following events: T+: the
event that the test passes, i.e., the circuit is deemed to be fault-free; T−: the event
that the test fails, i.e., the circuit is deemed to be faulty; D+: the event that the die is
fault-free; D − : the event that the die is faulty.
Using the above conditional probabilities, we can derive the following expressions
for P (T + ) and P (T − ):
P (T + ) = P (T + | D + )P (D + ) + P (T + | D − )P (D − ) (3.2)
P (T − ) = P (T − | D + )P (D + ) + P (T − | D − )P (D − ) (3.3)
where P(T+) = 1 − P(T−). From Equation (3.2), we obtain:

$P(T^+ \mid D^+) = \dfrac{1 - P(T^-) - P(T^+ \mid D^-)\,P(D^-)}{P(D^+)}$   (3.4)
The probability P (T + ) represents the fraction of the total number of dies that
need to be packaged. The conditional probability P (T + | D + ) represents the number
of good dies that are packaged i.e., it represents the fraction of dies for which the test
passes when the die is fault-free. This conditional probability, which can be easily
calculated using Equation (3.4), is used to calculate the effective cost per unit die
from the overall test and manufacturing costs.
3.2.3 Overall cost components
The overall production cost depends on whether only after-package testing is carried
out, or if wafer-level testing is done in addition to production testing. We first
determine the cost when only after-package testing is carried out. Let the total
number of dies being produced be N, let tap represent the total test application time
at the production level and cap represent the cost of test application (in $) per unit
time during after-package testing. Let CP denote the cost of packaging per unit die,
Adie be the area of the die under consideration, and Csil be the cost of silicon (in $)
per unit area of the die. The overall production cost Cocap (that includes test time
cost and silicon area cost, but ignores other cost components not affected by the
decision to do wafer-level testing) associated with manufacturing a batch of N dies
can now be determined using Equation (3.5):

$C_{ocap} = (N \cdot t_{ap} \cdot c_{ap}) + (N \cdot C_P) + (N \cdot A_{die} \cdot C_{sil})$   (3.5)

Similarly, the overall cost ($C_{ocwap}$) associated with the manufacture of a batch of
N dies for which both wafer-level and after-package testing are performed can be
determined using Equation (3.6).

$C_{ocwap} = (N \cdot t_w \cdot c_w) + P(T^+) \cdot N \cdot C_P + (P(T^+) \cdot N \cdot t_{ap} \cdot c_{ap}) + (N \cdot A_{die} \cdot C_{sil})$   (3.6)
In Equation (3.6), tw and cw represent the overall test time at the wafer-level and
the tester cost per unit time, respectively. Recall that P (T + ) represents the fraction
of dies that pass the test at the wafer-level. This is an indicator of the number of dies
to be packaged and tested at the production level. The cost per unit die by performing
wafer and production level tests (Cdiewap ) can be calculated from Equations (3.5) and
(3.6) as Cocwap /(N · Y · P (T + | D + )). When only production level tests are performed
the cost per unit die can be estimated to be Cocap /(N ·Y ). This estimate of the cost per
unit die is overly optimistic because we assume that there is no yield loss or test escape
associated with after-package testing. This is usually not the case in practice. We
can now define the cost savings as $\delta C = C_{ocap}/(N \cdot Y) - C_{ocwap}/(N \cdot Y \cdot P(T^+ \mid D^+))$,
which indicates the reduction in production cost per die due to the use of wafer-level
testing.
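To make the bookkeeping concrete, the sketch below evaluates these cost expressions for one assumed set of parameters; all numerical values (costs, times, yield, and probabilities) are placeholders rather than the Chip U data reported later in this chapter.

# Illustrative evaluation of the cost model: cost per die with and without
# wafer-level testing, and the resulting savings delta_C. All inputs are assumed.
def cost_savings(N, Y, t_ap, c_ap, t_w, c_w, C_P, A_die, C_sil,
                 p_t_plus, p_tplus_given_dplus):
    # Equation (3.5): after-package testing only (every die is packaged and tested).
    C_ocap = N * t_ap * c_ap + N * C_P + N * A_die * C_sil
    # Equation (3.6): wafer-level test, then package and test only passing dies.
    C_ocwap = (N * t_w * c_w + p_t_plus * N * C_P
               + p_t_plus * N * t_ap * c_ap + N * A_die * C_sil)
    cost_per_die_ap = C_ocap / (N * Y)
    cost_per_die_wap = C_ocwap / (N * Y * p_tplus_given_dplus)
    return cost_per_die_ap - cost_per_die_wap   # delta_C

# Assumed example: 10k dies, 75% yield, $3 package, 40 mm^2 die at $0.1/mm^2.
print(cost_savings(N=10_000, Y=0.75, t_ap=5.0, c_ap=0.30, t_w=2.0, c_w=0.30,
                   C_P=3.0, A_die=40.0, C_sil=0.10,
                   p_t_plus=0.78, p_tplus_given_dplus=0.99))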
In this section, we use the model to validate the importance of wafer-testing from a
cost perspective. In order to use the cost model, we need realistic values of the cost
components used in the model. For this purpose, we model the section of flattened
digital logic (as explained in Section 3.2) as a single core, and use relevant information
from a commercial mixed-signal SoC, Chip U.² The mixed-signal SoC includes a pair
of complementary data converters of identical bit-resolution. The data converters can
be configured in such a way that each DAC is routed through the ADC for purposes
of test (as explained in Section 3.1). It is appropriate to assume that the ADC and
the DACs are tested as pairs because a single point of failure is a sufficient criterion
to reject the IC as being faulty.
We choose lower values for the package cost to ensure that there is no bias in the results.
Packaging costs for a high-end IC can be as high as $100 per die [29, 106, 1]. The cost
of silicon from [29] is estimated to be $0.1 per mm². We consider three typical die
sizes from industry (10 mm², 40 mm², and 120 mm²), corresponding to small, medium,
and large dies, for purposes of simulation. We use a typical industry “yield curve”,¹
shown in Figure 3.3, to illustrate the spread in cost savings that is achieved by testing
mixed-signal SoCs at the wafer level. The points on the yield curve correspond to the
probability that the yield matches the corresponding point on the x-axis. The yield
curve is appropriately adjusted to reflect distributions corresponding to die sizes,
because higher yield numbers are optimistic for large dies, and vice versa [107].
Test costs typically range from $0.07 per second for an analog tester to $0.03 per
second for a digital tester1 . The cost is further reduced dramatically for an old tester,
which has depreciated from long use to a fraction of a cent per second. The proposed
wafer-level test method benefits from lower test time costs, hence to eliminate any
favorable bias in our cost evaluation, we assume that the test time cost is an order
of magnitude higher, i.e., $0.30 per second.
We model the test escapes by assuming that the digital portion of ASIC Chip
K is tested with 4046 test patterns; the corresponding test-escape correction factor
is calculated from Figure 3.2. The analog test time is modeled by assuming that
the data converter pair is tested with a 4096-point FFT. The test escape of the
mixed-signal portion of the chip is assumed to be 50%.
Figures 3.3–3.5 illustrate the effect of varying packaging costs on δC for small, medium,
and large dies, respectively. The cost savings per die are analyzed for each point in the
discretized yield curve. This is done in order to illustrate the spread in cost saving
that can be achieved in a realistic production environment. It is evident that the
savings that can be achieved by performing wafer-level tests are significant, and that
they decrease as the yield increases.
Figure 3.3: Distribution of cost savings for a small die with packaging costs of (a)
$1 (b) $3 (c) $5.
Until now, we have only considered chip failures that can be attributed to either
the digital logic or the analog components, but not both. We next evaluate the cost
savings when the digital and the analog fails are correlated. Let A denote the event
of a mixed-signal test escape and B denote the event of a digital test escape. A test
escape in either the mixed-signal portion of the die, or in the digital portion of the die
will result in the part being packaged. The probability that the test process results in
at least one test escape can be given as P (A ∪ B); this probability can be represented
Figure 3.4: Distribution of cost savings for a medium die with packaging costs of
(a) $3 (b) $5 (c) $7.
Figure 3.5: Distribution of cost savings for a large die with packaging costs of (a)
$5 (b) $7 (c) $9.
using the following equation:

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$   (3.8)
Our initial experiment considered a scenario where we assumed the test escapes
occurring in the different sections of the die to be independent. We therefore took
the product of the individual test escape probabilities to determine the resultant
test escape. We now consider an additional scenario where test escapes occur in
both parts of the die simultaneously, i.e., when a test results in a test escape in the
digital portion of the die, the mixed-signal test also results in a test escape. This is
given by the probability P (A ∩ B). In our experiments we consider test escape values
by varying P (A ∩ B) between 0 and min{P (A), P (B)} to determine the test escape
probability from Equation (3.8). The values of P (A), P (B), and the various costs
associated with the test and packaging process remain the same from our experiments
in Section 3.3.1. The purpose of this experiment is to determine the impact, on the
overall cost savings, of test escapes that occur in both the digital and mixed-signal
portions of the die.
We now present experimental results for a large die under three different yield
scenarios: high yield (Figure 3.6(a)) where the yield is 90%, medium yield (Figure
3.6(b)) where the yield is 75%, and low yield (Figure 3.6(c)), where the yield is 60%.
The x-axis denotes the probability of test escape overlap; the overlap in test escape
is varied from 0 to the test escape probability of digital cores (0.05 for Chip U). It is
observed from Figure 3.6 that our defect screening technique results in cost savings despite the overlap in test escapes in the digital and mixed-signal cores. The cost savings are lowest when there is no overlap in test escapes between the digital and mixed-signal cores, and highest when the overlap is largest. Similar results are obtained for small and medium dies.
The results in Figures 3.3–3.5 do not consider the breakdown between the various
mixed-signal fail types. The percentage of marginal, moderate and gross failures
can be determined via statistical binning of failure information for a given batch of
dies being manufactured. Unfortunately, such failure data is not easily available in
the literature; companies are reluctant to disclose this information. Therefore, we
consider different scenarios and a range of values for the percentages corresponding
to the different failure types. Let x_1, x_2, and x_3 represent the percentages of failures corresponding to the marginal, moderate, and gross fail types, and let TE_1, TE_2, and TE_3 be their corresponding test escape rates. The test escape (β) for the analog cores can now be calculated as β = (TE_1 · x_1 + TE_2 · x_2 + TE_3 · x_3)/(x_1 + x_2 + x_3).
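A minimal sketch of this calculation is shown below; the fail-type split and per-type escape rates are illustrative placeholders, not values taken from Table 3.1.

def analog_test_escape(x, te):
    # beta = (TE1*x1 + TE2*x2 + TE3*x3) / (x1 + x2 + x3)
    return sum(te_i * x_i for te_i, x_i in zip(te, x)) / sum(x)

# Example: marginal failures dominate (70/15/15 split); the escape rates are assumed.
print(analog_test_escape([70, 15, 15], [0.9, 0.4, 0.05]))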
We first consider the following cases: 1) all the fail types are equally distributed; 2)
the marginal fail type dominates the sample fail population; 3) the moderate fail type
dominates the sample fail population; 4) the gross fail type dominates the sample fail
population. In the case of a particular fail type dominating the sample fail population,
we assume that the other two fail types make equal contributions to the number of
failing dies. Table 3.2 illustrates the above four cases; it is assumed here that the digital core in the SoC is tested with 4046 digital test patterns. The packaging costs are chosen according to the yield type considered. We consider a packaging cost of $5 for the low-yield case, since the low-yield case nominally corresponds to large dies. Similarly, we consider packaging costs of $3 and $1 for the medium- and high-yield cases, respectively. The die areas considered are 10 mm², 40 mm², and 120 mm²,
Figure 3.6: Distribution of cost savings for a large die with packaging costs of (a)
$5, (b) $7 (c) $9, when test escapes between digital and analog parts are correlated.
corresponding to low, medium and high yield. A constant yield loss of 1% for all test
cases is considered. The test escape percentages corresponding to each failure type are determined from Table 3.1 for all yield cases. The choices of packaging costs reflect
the lower bounds from the values considered in Section 3.3.1. We assume here that
the digital and mixed-signal fails are uncorrelated, due to the lack of representative
information. In practice, as discussed in Section 3.3.2, the correlation information
can be easily incorporated in the cost model if it is available for failing dies.
Table 3.2 presents results obtained using the cost model for the different cases
described above. We present results for both the MSBC- and the GSBC-based tech-
niques. The purpose of this experiment is to relate the importance of the proposed
wafer-level defect screening techniques to the dominance of a particular fail type. It is
obvious that a sample population with a high proportion of marginal fails will result in a high overall test escape rate for the SoC (TE_MSBC and TE_GSBC). On the other hand, the test escape rate will be low when the gross fail type dominates. Table 3.2 shows that irrespective of the distribution of fail types, wafer-level testing reduces cost in most cases. The use of the MSBC-based technique results in greater cost savings (CS_MSBC) compared to the GSBC-based technique (CS_GSBC). For a process known to have high yield, wafer-level
testing does not always reduce test and packaging costs. The negative entries in
Table 3.2 provide a reality check on the extent to which wafer-level tests should be
applied. These results help us to judiciously determine the extent of wafer testing
for different scenarios. The GSBC technique is inefficient for testing in a high-yield
production environment, which typically corresponds to the manufacture of small
dies. It is more suitable for low- and medium-yield dies.
We next vary x1 , x2 , and x3 , each between 0 and 100, under the constraint that
x1 + x2 + x3 = 100. The resulting cost savings are shown in Figure 3.7. The three
axes in Figure 3.7 denote the percentage of marginal failures (x1 ), the percentage
Table 3.2: Experimental results for cost savings considering failure-type distributions for mixed-signal cores.
Yield Type | Distribution {Marginal, Moderate, Gross} (%) | FFT: No. of Sample Points | TE_MSBC (%) | CS_MSBC (in $) | TE_GSBC (%) | CS_GSBC (in $)
Low {33.33,33.33,33.33} 1024 42.79 1.6867 72.01 0.702
Yield 4096 29.3 2.1333 32.99 2.009
(60%) {70,15,15} 1024 68.42 0.8227 86.65 0.2084
4096 55.78 1.24 63.65 0.9744
{15,70,15} 1024 38.02 1.8472 73.26 0.6597
4096 18.27 2.5057 20.45 2.4454
{15,15,70} 1024 21.92 2.3898 56.21 1.2343
4096 13.95 2.6513 16.26 2.5733
Medium {33.33,33.33,33.33} 1024 55 0.3867 56.79 0.3686
Yield 4096 24.82 0.685 38.04 0.5509
(75%) {70,15,15} 1024 78.21 0.1519 78.49 0.149
4096 43.74 0.4932 70.04 0.2265
{15,70,15} 1024 61.44 0.3216 63.01 0.3051
4096 18.79 0.746 26.29 0.67
{15,15,70} 1024 25.53 0.6848 29.05 0.6492
4096 11.92 0.8157 17.89 0.7552
High {33.33,33.33,33.33} 1024 52.81 0.0258 76.73 -0.0011
Yield 4096 35.93 0.0381 59.96 0.018
(90%) {70,15,15} 1024 76.2 -0.0005 89.87 -0.0159
4096 68.6 0.001 82.25 -0.0145
{15,70,15} 1024 53.84 0.0247 76.85 -0.0013
4096 22.36 0.0535 71.4 -0.0021
{15,15,70} 1024 28.56 0.0532 65.76 0.0113
4096 16.94 0.0596 28 0.0471
TE_MSBC and TE_GSBC → overall test escape rate for the SoC using MSBC and GSBC, respectively;
CS_MSBC and CS_GSBC → cost savings per die using MSBC and GSBC, respectively.
Figure 3.7: Variation in cost savings considering the impact of mixed-signal fail
types.
of gross failures (x3 ), and the cost savings per die, respectively. (The percentage of
moderate failures, x2 , is derived from x1 and x3 .) Results are presented for three
different yield scenarios: low, medium, and high yield; MSBC is used as the defect-
screening technique for the results presented in Figure 3.7. It is observed that the cost
savings are the least when marginal failures dominate the fail population. Conversely, a fail population with significant gross failures results in high cost savings per die.
As expected, the cost savings for moderate failures lies between the cost savings for
marginal and gross failures. Similar results are observed when GSBC is used instead
of MSBC.
3.4 Summary
The next chapter presents a test scheduling technique for WLTBI of core-based
SoCs. The objective of the proposed test-scheduling technique is to minimize the
variation in power consumption during WLTBI, while maintaining a reasonable test
application time for the SoC.
Chapter 4
1. Scheduling the cores serially during WLTBI does not satisfy the objectives of dynamic burn-in. The objective of dynamic burn-in is to maximize switching activity so that all latent defects can be screened efficiently. Testing a single core at a time does not contribute significantly towards stressing the device.
2. Even though the burn-in time is long, not all of it is allocated for test purposes. Burn-in involves multiple rounds of temperature and voltage cycling [40], and it also subjects the device to a period of static burn-in during which no patterns are applied. Any reduction in test time that can be achieved through a test-scheduling technique will help minimize the overall time required for WLTBI, while at the same time satisfying the twin objectives of burn-in and test.
3. Not all the dies on a wafer can be contacted during WLTBI [6]. This can be due to the lack of sufficient probe pins and/or the limited ability of WLTBI equipment to remove heat. When only a fraction of the dies can be tested during burn-in, it is important to have low test times for the SoCs in order to test all dies during WLTBI.
4.1 Test scheduling for WLTBI
Efficient test-scheduling methods target increased test concurrency to reduce the test application time; this, however, leads to increased power consumption during test. Recent test-scheduling techniques for core-based SoCs have included the additional dimension of test power consumption [109, 17]; this ensures that a pre-determined limit on power consumption is not exceeded during test. These techniques, however, do not address the variations in power that occur during test application. We develop a
power-conscious test scheduling approach in this chapter, tailored for WLTBI of
core-based SoCs. The primary objective of our work is to minimize variations in
power consumption such that predictions on burn-in time are accurate. A secondary
objective is to minimize the test application time.
Figure 4.1: (a) TAM architecture for the d695 SoC with W = 32 (b) Corresponding
B-partite (B = 3) graph, also referred to as a tripartite graph for the d695 SoC with
W = 32. The nodes correspond to cores.
In any given clock cycle, any set of cores on the three different TAM partitions can be tested concurrently.
We assume a fixed-width TAM architecture and test buses [111], where the divi-
sion of W wires into B TAM partitions has been determined a priori using methods
described in [111, 84]. We now have to determine an optimal ordering of cores such
that the overall variation in power consumption for the SoC is minimized while sat-
isfying the constraint on peak power consumption Pmax. We refer to this problem as P_Core_Order. We use the following two measures as metrics to analyze the variation in
power consumption.
1. The first measure is the statistical variance in test power consumption. Let T_SoC represent the test time for the SoC in clock cycles, and P_mean the mean value of power consumption per clock cycle during test. The variance in test power consumption for the SoC is defined as (1/T_SoC) · Σ_{i=1}^{T_SoC} (P_i − P_mean)². Low variance indicates low (aggregated) deviation in test power from the mean value of power consumption during test. Successful WLTBI requires the minimization of this metric.
2. The second measure, denoted γ, captures the cycle-to-cycle variation in test power, i.e., the change |P_{i+1} − P_i| in power consumption between the ith and (i + 1)th clock cycles. Low values of γ are desirable for WLTBI.
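For concreteness, the small sketch below shows how such metrics can be computed from a per-cycle power trace; the trace values are arbitrary, and the cycle-to-cycle measure is interpreted here as the mean of |P_{i+1} − P_i|, which is an assumed reading rather than the exact definition of γ.

def power_variation_metrics(power):
    # power: per-cycle power consumption over the SoC test schedule (illustrative input)
    t = len(power)
    p_mean = sum(power) / t
    variance = sum((p - p_mean) ** 2 for p in power) / t
    cycle_to_cycle = sum(abs(power[i + 1] - power[i]) for i in range(t - 1)) / (t - 1)
    return variance, cycle_to_cycle

print(power_variation_metrics([1200, 1250, 1180, 1210, 1190]))  # illustrative trace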
Problem P_Core_Order: Let T1, T2, and T3 be the sets of cores on TAM partitions 1, 2, and 3, respectively. Determine the sets of cores that can be tested simultaneously,
and the ordering of the cores on the TAM partitions, such that the overall variation
in power consumption for the SoC is minimized and the peak power constraint Pmax
is satisfied.
To solve P_Core_Order, we need to determine sets of cores that are tested (concurrently) during a test session, and the ordering of these sets of cores in the test schedule such that the variation in power consumption during test is minimized. The tripartite graph, such as the one shown in Figure 4.1(b), is used to represent the
assignment of cores to TAMs in an SoC. A matched triple in the tripartite graph
represents a set of three cores that are concurrently tested during a test-session with-
out violating the peak power constraint Pmax . The numbers of cores on each TAM
partition in an SoC are not necessarily equal. It is therefore necessary to determine
matched sets of triples, matched edges, and unmatched vertices (in that order)
iteratively for the tripartite graph, to ensure that all the cores are assigned to the test
schedule. A graph-theoretic matching procedure can be used to determine and order
the matched triples, the resulting matched edges, and unmatched vertices. We next
describe how edge weights are added to the tripartite graph. These weights indicate
the power variation during test.
We next determine a matching in the tripartite graph that results in the least "cost". The cost of a matching here corresponds to the aggregate variation in power consumption when the cores corresponding to the matched groups of vertices and matched edges are assigned to test sessions in the test schedule for WLTBI. The matching problem uses the weighted tripartite graph to determine matched sets of three vertices and matched edges in order of increasing weight, so as to obtain a matching with the lowest cost. Selecting matched triples and edges in order of increasing weight reduces both the variance in test power and the mean cycle-to-cycle variation in test power. We refer to the resulting least-cost weighted tripartite graph-matching problem as P_GMP.
Figure 4.2(a) illustrates an example test schedule optimized for WLTBI. The first two test sessions, TS1 and TS2, in the test schedule correspond to the matched triples {3, 1, 4} and {7, 2, 6} in the weighted tripartite graph shown in Figure 4.2(b). The dotted lines in Figure 4.2(b) represent the matching in the tripartite graph. Cores 3, 1, and 4, when tested concurrently, result in the least power variation among all valid core combinations. The power data for this example is taken from the cycle-accurate test modeling approach presented in [109]. The test session TS3 is represented by the matched edge {5, 8} in Figure 4.2(b). Cores 9 and 10 are unmatched vertices in the tripartite graph, and they are tested individually in the test schedule. The solution to P_GMP therefore corresponds to a solution for P_Core_Order.
We next use the method of restriction to prove that P_Core_Order is NP-hard. A special case of P_Core_Order, where |T1| = |T2| = |T3| = n and all the edge weights are equal, is equivalent to the well-known perfect tripartite matching problem [112]. The perfect tripartite matching problem is stated as follows: given three disjoint sets X, Y, and Z, each of cardinality n, and a set of triples M ⊆ X × Y × Z, determine whether there exists a subset M′ ⊆ M of n triples such that every element of X ∪ Y ∪ Z appears in exactly one triple of M′.
Figure 4.2: (a) Test schedule for the d695 SoC with W = 32 and Pmax = 1800.
(b) Matched tripartite graph for the d695 SoC with W = 32. Dotted lines represent
matching.
Since this special case of P_Core_Order is equivalent to the perfect tripartite matching problem, it follows from the method of restriction that P_Core_Order is NP-hard.
We next describe the heuristic algorithm that we use to solve P_Core_Order. The algo-
rithm starts with an initial assignment of cores to TAM partitions, and then itera-
tively (re)assigns cores to the three TAM partitions such that the variation in test
power is minimized. The main steps, as shown in Figure 4.3, are outlined below:
1. In procedure Initial_Assign, we schedule the cores that are tested first on each TAM partition, i.e., their test start-times are zero. The assignment of cores is obtained by determining the set of cores that yields the least variation in power consumption when tested simultaneously.
2. In procedure Assign_Cores, we determine the next sets of cores that are assigned to the test schedule. Cores are iteratively scheduled in sets of three until no matched triple remains.
Algorithm Core_Order
1: Initial_Assign();
2: Determine ρ(i, j, k)_init = μ(i, j, k)_init + σ(i, j, k)_init;
3: while there is a matched triple do
4:    Assign_Cores();
5:    delete the vertices corresponding to the triple chosen by Assign_Cores();
6: end while
7: while T1, T2, and T3 are not all empty do
8:    Unmatched_Assign();
9: end while
10: return the test schedule for the cores in the SoC;
The proposed solution can be easily extended to SoCs with more than three TAM partitions: instead of a tripartite graph, we need a B-partite graph for B TAM partitions. The Initial_Assign and Assign_Cores procedures both require searching through N³ candidate solutions in the worst case; hence the time complexity is O(N³), where N is the number of cores in the SoC. In terms of the number of TAM partitions B, the worst-case time complexity of the heuristic procedure is O(N^B). The heuristic procedure is therefore exponential in the number of TAM partitions, but B is a constant at the wafer level since the TAM architecture is optimized at design time for package test.
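The sketch below illustrates the candidate-search step of such a heuristic for B = 3. It assumes a hypothetical power_profile(core) function returning per-cycle test power for a core, and uses ρ = μ + σ of the combined profile as the selection criterion, which is one plausible reading of the algorithm rather than the exact implementation.

from itertools import product
from statistics import mean, pstdev

def combined_profile(cores, power_profile):
    # Sum the per-cycle power of the selected cores, padding shorter profiles with 0.
    profiles = [power_profile(c) for c in cores]
    length = max(len(p) for p in profiles)
    return [sum(p[i] if i < len(p) else 0 for p in profiles) for i in range(length)]

def pick_next_triple(t1, t2, t3, power_profile, p_max):
    best, best_rho = None, float("inf")
    for triple in product(t1, t2, t3):          # O(N^3) candidate search over the partitions
        prof = combined_profile(triple, power_profile)
        if max(prof) > p_max:                   # respect the peak-power constraint Pmax
            continue
        rho = mean(prof) + pstdev(prof)         # rho(i, j, k) = mu + sigma of the combined profile
        if rho < best_rho:
            best, best_rho = triple, rho
    return best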
4.3 Baseline methods
We next describe two baseline methods. The first baseline method solves a power-
constrained test-scheduling problem for core-based SoCs. This approach considers
a single power-limit value for the entire SoC [109]. We determine the variation in
power consumption over time, when only a peak power limit is considered for test
scheduling. We use the same TAM architecture used by the Core Order heuristic.
The baseline scheduling algorithm keeps a record of the per-cycle values of power consumption and ensures that the total power is less than Pmax in every cycle. When a new core is added to the test schedule, the test power for the core is accumulated to reflect the overall power consumption profile of the SoC. The algorithm iteratively schedules the cores in the SoC to minimize the SoC test time, while satisfying the power limit Pmax.
4.4 Experimental results
In this section, we present experimental results for three SoCs from the ITC'02 SoC test benchmarks. We use cycle-accurate power data from [109]. Since the objective of P_Core_Order is to minimize the variation in test power consumption (represented by the two metrics presented in Section 4.1) during WLTBI, we present the following results:
Figure 4.4: Power profile for d695 obtained using baseline approach 1 and Core Order (W = 32 and Pmax = 1800). Panels (a)–(c) show the power profile, power distribution, and flatness profile for baseline 1 (SD = 143.04, γ = 13.08, Mean = 202.39); panels (d)–(f) show the corresponding results for Core Order (SD = 90.76, γ = 5.35, Mean = 115.42).
• The percentage difference in variance between baseline method 1 and Core Order. This difference is denoted by δV_Baseline1, and it is computed as δV_Baseline1 = (V_Baseline1 − V_Core_Order)/V_Baseline1 × 100%; V_Core_Order represents the variance in test power consumption obtained using the Core Order heuristic, and V_Baseline1 represents the variance in power consumption obtained using the first baseline method.
• The percentage difference in variance between baseline method 2 and Core Order.
This is calculated in a similar fashion as δVBaseline1 , and is denoted as δVBaseline2 .
• We also present the WLTBI test time for the SoC obtained using Core Order and
the baseline test methods.
We first present power profiles and the corresponding distribution in power con-
sumption values during test. Figure 4.4 illustrates the power profile for the d695 SoC
when tested with a TAM width of 32; the maximum value of power consumption,
Pmax , is set to 1800 units in this case. (The units are derived from [109].) Figures
4.4(a) and 4.4(b) represent the power profile during test for the baseline approach
and the distribution in power consumption values corresponding to the power profile,
respectively; Figures 4.4(d) and 4.4(e) represent the same information obtained us-
ing the Core Order heuristic. Figures 4.4(c) and 4.4(f) illustrate the flatness profiles
obtained for the baseline scenario and using Core Order respectively. We can make
the following observations from Figure 4.4:
• The standard deviation SD, and hence the variance in power during test, is sig-
nificantly lower when Core Order is used to determine the ordering of cores.
• The mean value of power consumption (Mean) during test is also significantly
lower when the cores are ordered using Core Order. This is because Core Order
reduces the variation in power consumption at the cost of increased test time.
• The lower variance in power consumption obtained using the Core Order heuristic results in a distribution where the power consumption values are packed
into fewer bins in the power distribution profile as compared to the baseline ap-
proach.
• The power profile obtained using Core Order, for the case illustrated in Figure
4.4, is 59% flatter than the baseline scenario. This is an indicator of the low
cycle-to-cycle power variation during test.
The results for the three benchmark SoCs, d695, p22810 and p93791 are sum-
marized in Tables 4.1-4.3 respectively; eight different values of W are considered in
each case. The values of Pmax for each circuit are chosen carefully after analyzing the
per-cycle test-power data provided in [109]. The minimum value of Pmax is chosen
such that a feasible schedule can be formulated using the given value of Pmax . The
SoC test time, T TCore Order , obtained using Core Order, and the SoC test time using
the baseline cases, T TBaseline1 and T TBaseline2 are reported in addition to δVBaseline1 ,
δVBaseline2 , and δγ. The results show that significant reduction in test power variation
can be obtained using our heuristic procedure, which ideally is the goal for WLTBI.
Significant reduction in cycle-to-cycle power variation is observed for all scenarios
when Core Order is used to order the cores.
The test times for the proposed approach are higher than those for baseline method
1. Recall that test-time minimization is a secondary objective for WLTBI. The
primary objective here is to minimize the test-power variance. Note that a limited
increase in the test time is not a serious drawback because the wafer is subjected to
relatively long intervals of burn-in.
The second baseline approach results in low values of variance for power consumption. This is because the cores are tested sequentially in this case, thereby resulting in much higher test times as compared to the first baseline approach and Core Order. Higher test times result in higher memory requirements; this limits the number of dies that can be tested in parallel during WLTBI. Temperature and voltage cycling during
burn-in result in the die being tested at different operating temperatures and voltages
[37]. A reasonable test time is therefore necessary to support test repetitions under
such a scenario. The tester scan clock frequency for the burn-in ATE is lower than
that for a conventional ATE [37]. The significantly higher test time for the second
baseline method renders the method unsuitable for WLTBI.
The CPU time for test scheduling for d695 is less than a minute for all cases. The
Core Order procedure takes up to 2 hours for the p22810 SoC and up to 4 hours of
CPU time for the p93791 SoC on a 2.4 GHz AMD Opteron processor, with 4 GB of
memory.
4.5 Summary
Table 4.1: Reduction in test-power variance for d695. (Each group of eight rows, W = 8 to 64, corresponds to the Pmax value shown in its last row.)
Pmax | W | δV_Baseline1 | δV_Baseline2 | δγ | TT_Core_Order (cycles) | TT_Baseline1 (cycles) | TT_Baseline2 (cycles)
8 27.77 0.16 35.19 247730 180799 290754
16 65.49 −13.37 49.61 124402 60482 147568
24 31.36 −24.61 13.60 71517 59329 96472
32 59.74 −7.71 40.95 65870 53833 77113
40 26.55 −23.29 20.37 61589 47442 75283
48 6.74 −1.89 8.58 51274 35940 61868
56 20.81 −25.49 25.02 42350 22569 49620
1600 64 12.57 −25.34 26.66 41882 21595 48740
8 27.77 −2.56 35.19 239727 180799 290754
16 56.52 −21.38 48.12 120468 60481 147568
24 45.65 −24.61 51.47 71517 40383 96472
32 59.74 −7.71 59.09 65870 53833 77113
40 26.55 −23.29 5.76 61589 47442 75283
48 6.74 −1.89 8.58 51274 35940 61868
56 8.39 −21.36 40.85 40690 22569 49620
1800 64 11.71 −24.14 26.66 32499 21595 48740
8 15.60 −10.86 31.10 191668 180798 290754
16 56.52 −21.38 48.12 120468 60481 147568
24 40.49 −24.61 55.03 71517 37370 96472
32 59.74 −7.71 40.95 65870 53833 77113
40 14.11 −21.01 5.76 61589 35124 75283
48 5.46 −21.05 8.13 44167 30830 61868
56 7.46 −16.38 43.10 34860 22423 49620
2000 64 4.17 −15.74 43.32 32499 18726 48740
Table 4.2: Reduction in test-power variance for p22810. (Each group of eight rows, W = 8 to 64, corresponds to the Pmax value shown in its last row.)
Pmax | W | δV_Baseline1 | δV_Baseline2 | δγ | TT_Core_Order (cycles) | TT_Baseline1 (cycles) | TT_Baseline2 (cycles)
8 45.73 −7.52 47.21 1974010 879724 2600613
16 2.00 −10.52 43.13 1122870 550139 1375168
24 18.41 −15.33 40.70 808702 420351 995305
32 6.66 −12.09 24.15 675010 343927 834102
40 28.81 −6.64 47.40 560079 318426 688086
48 13.15 −4.65 41.68 547496 230457 661847
56 38.92 −2.45 45.00 514440 210138 629568
6000 64 38.61 −4.99 47.51 483679 202185 605656
8 44.86 −7.52 47.83 1974010 869465 2600613
16 1.02 −11.07 53.52 1206435 463658 1375168
24 4.54 −17.31 50.42 798816 338348 995305
32 6.85 −4.89 39.59 681721 263638 834102
40 26.16 −8.91 47.40 560079 250457 688086
48 13.15 −4.65 49.74 547496 230457 661847
56 38.92 −2.45 45.00 514440 210138 629568
8000 64 12.31 −13.02 45.08 475127 202185 605656
8 44.86 −7.52 47.83 1974010 869465 2600613
16 1.02 −11.07 53.52 1206435 463658 1375168
24 15.59 −17.31 50.42 798816 338348 995305
32 6.85 −4.89 35.85 681721 263638 834102
40 6.85 −4.89 47.40 560079 250457 688086
48 12.08 −3.13 55.36 547496 221241 661847
56 31.74 −2.45 45.36 514440 208476 629568
10000 64 12.31 −13.02 45.08 475127 202185 605656
Table 4.3: Reduction in test-power variance for p93791. (Each group of eight rows, W = 8 to 64, corresponds to the Pmax value shown in its last row.)
Pmax | W | δV_Baseline1 | δV_Baseline2 | δγ | TT_Core_Order (cycles) | TT_Baseline1 (cycles) | TT_Baseline2 (cycles)
8 55.26 −0.02 29.68 8777550 4303557 10828293
16 43.38 −0.05 6.27 4937767 1890881 5851966
24 71.05 −0.19 32.83 2849621 1727943 3621107
32 31.28 −0.02 27.99 2272156 1427138 2860859
40 25.64 11.24 28.55 1600355 1152953 2017488
48 18.27 −0.28 26.28 1494115 1028976 1720545
56 35.10 −0.84 39.95 1185860 736604 1613826
15000 64 17.49 −2.49 40.01 1045983 694142 1478334
8 64.82 −0.02 32.45 8777550 4103621 10828293
16 43.38 −0.05 10.02 4937767 1890881 5851966
24 79.58 −0.19 31.94 2849621 1727943 3621107
32 31.28 −0.02 27.99 2272156 1427138 2860859
40 25.64 11.24 28.55 1600355 1152953 2017488
48 21.88 −1.21 29.43 1342911 991466 1720545
56 35.10 −0.84 39.95 1185860 736604 1613826
20000 64 17.49 −2.49 40.01 1045983 694142 1478334
8 64.82 −0.02 31.53 8777550 4103621 10828293
16 43.38 −0.05 8.79 4937767 1890881 5851966
24 79.58 −0.19 31.94 2849621 1727943 3621107
32 31.28 −0.02 27.99 2272156 1427138 2860859
40 31.25 6.19 28.55 1400129 1061723 2017488
48 21.88 −1.21 29.43 1342911 991466 1720545
56 33.39 −0.46 39.23 1124549 713561 3621107
25000 64 8.56 −4.59 41.62 1012164 662198 1478334
Therefore, we have presented a heuristic technique to solve P_Core_Order. Results for the
ITC’02 SoC test benchmarks show that a significant reduction in power variation is
obtained using the proposed method.
Chapter 5
Test application during burn-in at the wafer level requires low variation in power consumption during test pattern application. The issue of controlling this variation is addressed in this chapter. We present two solution methods that allow us to determine an ordering of test patterns for WLTBI. Reduced variance in test power results in smaller fluctuations in the junction temperature of the device; the ordering methods presented here help control the variation in power consumption during test and thereby significantly lower the fluctuations in junction temperature.
• We also develop heuristic techniques to solve the test pattern ordering problem
for large circuits.
In Section 5.3, a heuristic method to solve the problem efficiently is presented. The
baseline methods used to evaluate the test pattern ordering techniques are presented
in Section 5.4. Section 5.5 presents simulation results for several ISCAS’89 and
IWLS’05 benchmark circuits [113]. Finally, Section 5.6 summarizes the chapter.
A significant percentage of scan cells change values in every scan-shift and scan-
capture cycle. The toggling of scan flip-flops can result in excessive switching activity
during test, resulting in high power consumption. It has been shown in [63] that the
number of transitions of the DUT is proportional to the number of transitions in the
device scan-chains. Therefore, a reduction in the number of transitions in the scan
cells during test application leads to lower test power. A number of techniques have
been developed to reduce the peak power and average power consumption during test
by reducing the number of transitions in the scan chain [62, 114]. These techniques
rely on test-pattern ordering [115, 116], scan-chain ordering [67, 117], and the use of
multiple capture cycles during test application [115] to reduce the toggling of scan
cells during shift/capture cycles. Segmented scan approaches [118, 65, 64] have also
been used to address test power issues for industrial designs.
In [63], a metric known as the weighted transition count (WTC) was presented to
calculate the number of transitions in the scan chain during scan shifting. It was
also shown in [63] that the WTC has a strong correlation with the total device power
consumption. The WTC metric can be extended easily to determine the cycle-by-
cycle transition counts while applying test patterns. The knowledge of the length
of the scan chains, the test pattern to be scanned in, and the initial state of the
scan cells (response from previously applied test stimulus), can be used to generate
cycle-accurate test power data.
Let us consider a scan chain of length n that has an initial value t_i = (t_{i1}, ..., t_{in}), and a test pattern t_p = (t_{p1}, ..., t_{pn}) that is shifted into the scan chain. The transitions that occur during the shifting of the test pattern (and the shifting out of the previous test response) can be represented as an n × n matrix T [109]. An element t_ij of T is 1 if there is a transition in scan cell j during clock cycle i; otherwise t_ij = 0. T can be used to calculate the total number of scan-cell transitions (a measure of the power consumption during test) during every clock cycle. During any given clock cycle i, the total number of transitions tr(i) can be calculated by summing the values of all n elements in row i of T; this can be expressed as tr(i) = Σ_{j=1}^{n} t_ij.
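A compact sketch of this computation is shown below; it derives the same per-cycle counts tr(i) directly from the shifting process instead of building the matrix T explicitly, and the bit ordering (scan-in side first) is an assumption of the sketch.

def shift_transition_counts(initial, pattern):
    # initial: current scan-chain contents (previous response), scan-in side first
    # pattern: test pattern to be shifted in; both are lists of 0/1 bits
    n = len(initial)
    state = list(initial)
    counts = []
    for i in range(n):                                    # one shift cycle per scan cell
        new_state = [pattern[n - 1 - i]] + state[:-1]     # next pattern bit enters the chain
        counts.append(sum(a != b for a, b in zip(state, new_state)))
        state = new_state
    return counts                                         # counts[i] corresponds to tr(i + 1)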
For the example shown in Figure 5.1, the cycle-by-cycle number of scan-cell tran-
sitions is given by the set {4, 4, 4, 4, 5, 4}. For the test response (111100), the number
of transitions that occur during the capture cycle for this example is 2. For multiple
scan chains, the above calculation can simply be carried out independently for each scan chain.
Figure 5.1: Example to illustrate scan shift operation.
In this section we present the test pattern ordering problem PT P O . The goal is
to determine an optimal ordering of test patterns for scan-based testing, such that
the overall variation in power consumption during test is minimized. For simplicity
of discussion, we assume a single scan chain for test application and N patterns
T1 , T2 , · · · , TN . The extension of PT P O to a circuit with multiple scan chains is
trivial. The test application for the CUT is carried out as follows:
2. The test-application procedure is initiated by shifting the first test pattern into the circuit.
3. The scan-out of the first test response and the scan-in of the next pattern are then
carried out simultaneously. This process is repeated until all the test patterns are
applied to the CUT, and all test responses are shifted out of the circuit.
4. The scan-out of the final test response terminates the test application process for
the CUT.
We next compute the cycle-by-cycle power when response R_i is shifted out and test pattern T_j is shifted in, for a scan chain of length n. Let TC_k(R_i, T_j), 1 ≤ k ≤ n, denote the power (number of transitions) for shift cycle k. The overall test power can be represented by the set TC(R_i, T_j) = {TC_1(R_i, T_j), ..., TC_n(R_i, T_j), TC_{n+1}(R_i, T_j)}, where TC_{n+1}(R_i, T_j) denotes the number of transitions during the capture cycle. The average power consumption for TC(R_i, T_j), μ(R_i, T_j), is given by

    μ(R_i, T_j) = (1/(n + 1)) · Σ_{k=1}^{n+1} TC_k(R_i, T_j).

The unbiased estimate of the statistical variance in test power, σ²(R_i, T_j), is given by

    σ²(R_i, T_j) = (1/n) · Σ_{k=1}^{n+1} (TC_k(R_i, T_j) − μ(R_i, T_j))².
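A short sketch of these per-pair statistics is given below; applying it to the transition counts of the Figure 5.1 example (shift cycles {4, 4, 4, 4, 5, 4} plus a capture-cycle count of 2) yields values consistent with those quoted just below.

def pair_power_stats(tc):
    # tc = [TC_1, ..., TC_n, TC_{n+1}] for one (response, pattern) pair
    n_plus_1 = len(tc)
    mu = sum(tc) / n_plus_1
    sigma2 = sum((c - mu) ** 2 for c in tc) / (n_plus_1 - 1)   # unbiased estimate
    return mu, sigma2

print(pair_power_stats([4, 4, 4, 4, 5, 4, 2]))   # Figure 5.1 example; about (3.86, 0.81)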
For the example of Figure 5.1, the average power consumption and the statistical
variance in test power are 3.85 and 0.80, respectively. We use the following two
measures as metrics to analyze the variation in power consumption.
1. The first measure is the statistical variance in test power consumption. Let T_tot be the test time (in clock cycles) needed to apply all the test patterns for the CUT, and let P_mean be the mean value of power consumption per clock cycle during test. The variance in test power consumption for the CUT is defined as (1/(T_tot − 1)) · Σ_{i=1}^{T_tot} (P_i − P_mean)². Low variance indicates low (aggregated) deviation in test power from the mean value of power consumption during test. Successful WLTBI requires the minimization of this metric.
117
are undesirable. We therefore quantify the “flatness” of the power profile using a
|Pi −Pi+1 |
measure Tth , obtained by counting the number of clock cycles i for which Pi
1 if Tj immediately follows Ti
xij =
0 otherwise
The objective function for the optimization problem can be written as follows:

    Minimize F = max_i { Σ_{j=1}^{N} x_ij · σ²(R_i, T_j) }.

Since the max function is nonlinear, the objective is restated in linear form as:

    Minimize C, subject to
    C ≥ Σ_{j=1}^{N} x_ij · σ²(R_i, T_j),   1 ≤ i ≤ N.
Next, we formulate constraints to ensure that a test pattern is followed (and preceded) by exactly one pattern. This requirement can be represented by the following two sets of equations:

    Σ_{j=1}^{N} x_ij = 1,   i = S, 1, 2, ..., N
    Σ_{i=1}^{N} x_ij = 1,   j = 1, 2, ..., N, E
We next formulate the constraint imposed by the upper limit on peak power consumption during any given clock cycle. Let P_max be the maximum allowed power consumption in any given clock cycle; the constraint ensuring that this limit is never violated can be written as:

    x_ij · TC_k(R_i, T_j) ≤ P_max,   1 ≤ k ≤ n + 1, for all i, j.
Thus far, the model does not consider the change in power consumption when
three test patterns Ti , Tj , Tk are applied consecutively. It is important during WLTBI
to ensure that the power consumption between any two consecutive test patterns does
not change dramatically. We therefore need to maintain the change in test power
between two consecutive patterns within a reasonable threshold T Cth . This value is
chosen starting with the lowest value of T Cth necessary to formulate a valid ordering.
We model this constraint as follows:
    |TC_n(R_i, T_j) − TC_1(R_j, T_k)| / TC_n(R_i, T_j) > TC_th  ⟹  x_ij · x_jk = 0.
Figure 5.2: Integer linear programming model for PT P O .
The x_ij · x_jk product term is nonlinear; it can be replaced with a new binary variable u_ijk and two additional linear constraints [85]. In the worst case, the number of variables in the above ILP model is O(N³) and the number of constraints is also O(N³). The complete ILP model is shown in Figure 5.2.
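For readers who want to experiment with such a model, a hedged sketch of its core using the open-source PuLP modeler is given below; the dictionary sigma2, the node labels 'S' and 'E', and the omission of the peak-power, TC_th, and any subtour-elimination constraints are all simplifications, so this is not the exact model of Figure 5.2.

import pulp

def build_ptpo_model(sigma2, N):
    # sigma2[(i, j)] is assumed to hold sigma^2(R_i, T_j) for patterns i, j in 1..N
    sources = ['S'] + list(range(1, N + 1))
    sinks = list(range(1, N + 1)) + ['E']
    arcs = [(i, j) for i in sources for j in sinks if i != j]
    prob = pulp.LpProblem("P_TPO", pulp.LpMinimize)
    x = pulp.LpVariable.dicts("x", arcs, cat="Binary")
    C = pulp.LpVariable("C", lowBound=0)
    prob += C                                        # minimize the bottleneck variance C
    for i in range(1, N + 1):                        # C >= sum_j x_ij * sigma^2(R_i, T_j)
        prob += C >= pulp.lpSum(x[(i, j)] * sigma2[(i, j)]
                                for j in range(1, N + 1) if j != i)
    for i in sources:                                # exactly one successor per node
        prob += pulp.lpSum(x[(i, j)] for j in sinks if j != i) == 1
    for j in sinks:                                  # exactly one predecessor per node
        prob += pulp.lpSum(x[(i, j)] for i in sources if i != j) == 1
    return prob, x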
It can be easily shown that the pattern-ordering problem for WLTBI is NP-Complete. The objective of P_TPO is to determine an ordering O_1, O_2, ..., O_N of the N test patterns that minimizes max{σ²(R_{O_1}, T_{O_2}), σ²(R_{O_2}, T_{O_3}), ..., σ²(R_{O_{N−1}}, T_{O_N})}. Before
we prove that the pattern-ordering problem for WLTBI is NP-Complete, we in-
troduce the bottleneck traveling salesman problem (BTSP) [119]. Consider a set
{C1 , C2 , · · · , Cn } of n cities. The problem of finding a tour that visits each city ex-
actly once and minimizes the total distance traveled is known as TSP. In BTSP, we
attempt to find a tour that minimizes the maximum distance traveled between any
two adjacent cities in the tour. It has been shown in [119] that BTSP is NP-Complete.
The same notation, for a graph G = (V, E), can be interpreted in the context of the pattern-ordering problem: a vertex corresponds to a test pattern, and the edge weight w(i, j) represents σ²(R_i, T_j), i.e., the variation in test power when test response i is scanned out while test pattern j is scanned in.
An optimal ordering of test patterns is one that minimizes the maximum value of σ²(R_i, T_j). This is an exact instance of BTSP: an optimal ordering of test patterns that minimizes the maximum variation in test power consumption can be found in polynomial time if and only if a tour that minimizes the maximum distance between any two adjacent cities in the tour can be found in polynomial time. This proves that P_TPO is NP-hard. Since P_TPO is in NP, we conclude that it is NP-Complete. We next present a heuristic technique to solve P_TPO for large problem instances.
The exact optimization procedure based on ILP is feasible only when the number of
patterns is less than an upper limit, which depends on the CPU and the amount of
available memory. To handle large problem instances, we present a heuristic approach
to determine an ordering of test patterns for WLTBI, given the upper limit Pmax
on peak power consumption. The heuristic method consists of a sequence of four
procedures. Its objective is similar to that of the ILP technique, i.e., to minimize
the overall variation in power consumption during test. We start by determining cycle-accurate test power information for all pairs of test patterns in O(nN²) time. We next determine the first pattern to be shifted in, and then iteratively determine the ordering of patterns such that the variation in test power is minimized. The main steps used in the Pattern_Order heuristic, as shown in Figure 5.3, are outlined
below:
2. In procedure Initial_Assign, the first test pattern to be shifted into the circuit is determined.
σ(S, Ti ), is chosen as the first test pattern to be applied. We ensure that the
constraint on peak power consumption Pmax is not violated when Ti is applied to
the CUT. The first pattern Ti that is added to the ordered list of test patterns is
referred to as Initpat .
3. In procedure Pat_Order, the ordering of the remaining test patterns is determined. Once Initpat is determined, the subsequent ordering of patterns is iteratively determined by choosing, at each step, the test pattern that results in the lowest test-power variance σ(Initpat, Ti) without violating Pmax.
4. In procedure Final_Assign, the lone unassigned test pattern is added last to the test ordering. A final list of ordered patterns for WLTBI can now be constructed using information from the Initial_Assign and the Pat_Order procedures.
A search operation is performed each time the procedures Initial_Assign and Pat_Order are executed to determine the test pattern to be ordered. Hence, the worst-case computational complexity of the heuristic procedure, not including the O(nN²) initialization step, is O(N log₂ N).
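A compact sketch of the greedy selection at the heart of this procedure is shown below; sigma2(prev, cand) and peak(prev, cand) are assumed helpers built from the cycle-accurate power data, and the fallback used when no candidate meets Pmax is an assumption of the sketch, not part of the thesis procedure.

def pattern_order(patterns, start_state, sigma2, peak, p_max):
    # Greedy ordering: always append the pattern with the lowest pairwise variance.
    remaining = set(patterns)
    order, prev = [], start_state
    while remaining:
        feasible = [t for t in remaining if peak(prev, t) <= p_max]
        candidates = feasible if feasible else list(remaining)   # assumed fallback
        nxt = min(candidates, key=lambda t: sigma2(prev, t))
        order.append(nxt)
        remaining.remove(nxt)
        prev = nxt
    return order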
A second heuristic method, based on the ILP model for P_TPO, can also be used to determine an ordering of patterns for WLTBI. The computational complexity
associated with the ordering of a large number of test patterns limits the use of the
ILP model for large circuits. Using a divide-and-conquer approach, the ILP model
can recursively be applied to two or more subsets of test patterns for a circuit with
large N. The ordered subsets of patterns can then be combined, by placing subsets
that result in minimum cycle-to-cycle variation in power consumption adjacent to
each other.
Figure 5.3: Pseudocode for the Pattern_Order heuristic.
The first baseline method determines an ordering of test patterns to minimize the
average power consumption during test. The problem of reordering test sets to min-
imize average power has been addressed using the well-known TSP [67, 68]. Starting
with the initial state S, consecutive test patterns are selected at each instance to minimize the average power consumption.
The above problem can be easily shown to be NP-hard [112]. Efficient heuristics
are therefore necessary to determine an ordering of test patterns to minimize the
average power consumption in a reasonable amount of CPU time. We use a heuristic
technique based on the cross-entropy method [120]. The average power values are
collected in a matrix of size N × N. Each element in the matrix corresponds to
an average power value for an ordered pair of patterns; for example element (1, 2)
in the matrix corresponds to the average power consumption when test pattern 2 is
shifted-in after test pattern 1. The heuristic technique takes the complete N × N
matrix as an input to determine an ordering of test patterns.
The second baseline approach determines an ordering of test patterns such that the
peak power consumption is minimized during test. The objective function for this
baseline method is as follows:
    Minimize F = max_i { Σ_{j=1}^{N} x_ij · P(R_i, T_j) },
where P(Ri , Tj ) denotes the peak power consumption when response Ri is shifted
out while simultaneously shifting in Tj . This optimization problem can be easily
solved to obtain a test-pattern ordering that reduces the peak power consumption.
As in the case of P_TPO, an ILP method can be used for this baseline for small problem instances. For large problem sizes, the procedures Initial_Assign and Pat_Order can be modified to select a test-pattern ordering that results in the lowest peak power
consumption.
5.5 Experimental results
In this section, we present experimental results for eight circuits from the ISCAS’89
test benchmarks, and five IWLS’05 circuits. Since the objective of the test pattern
ordering problem is to minimize the variation in test power consumption during
WLTBI, we present the following results:
• The percentage difference in variance between baseline method 1 and the Pattern_Order heuristic. This difference is denoted by δV_B1, and it is computed as δV_B1 = (V_Baseline1 − V_Pattern_Order)/V_Baseline1 × 100%; V_Pattern_Order represents the variance in test power consumption obtained using the Pattern_Order heuristic, and V_Baseline1 represents the variance in power consumption obtained using the first baseline method.
• The percentage difference in variance between baseline method 2 and the Pattern_Order heuristic. This is calculated in a similar fashion as δV_B1, and is denoted as δV_B2.
• The percentage difference in variance obtained using a random ordering of test patterns and the Pattern_Order heuristic. This is calculated in a similar fashion as δV_B1, and is denoted as δV_B3.
• We highlight the difference in the total number of clock cycles i during which |P_i − P_{i+1}|/P_i exceeds γ for baseline method 1 and Pattern_Order. We characterize this difference as δTth_B1 = (Tth_Baseline1 − Tth_Pattern_Order)/Tth_Baseline1 × 100%; Tth_Baseline1 and Tth_Pattern_Order are the measures (defined in Section 5.1) obtained using the first baseline method and the Pattern_Order heuristic, respectively. The value of γ is chosen to be 0.05 (i.e., 5%) to highlight the flatness of the power profiles obtained using the different techniques.
• The indicators δTth_B2 and δTth_B3 are determined in a similar fashion as δTth_B1.
Table 5.1: Percentage reduction in the variance of test power consumption obtained using ILP and the Pattern_Order heuristic.
Circuit | N | n | Pmax | ILP: δV_B1, δTth_B1, δV_B2, δTth_B2, δV_B3, δTth_B3 | Pattern_Order: δV_B1, δTth_B1, δV_B2, δTth_B2, δV_B3, δTth_B3
s1423 94 98 60 13.80 13.73 12.39 17.84 12.86 12.13 10.57 11.60 10.03 14.11 11.12 10.64
65 11.12 11.39 10.21 14.10 10.79 10.48 8.73 9.08 7.91 11.62 7.53 7.06
s5378 155 313 145 12.07 9.02 10.47 9.02 13.14 12.60 8.63 8.91 7.42 6.03 8.15 7.87
150 11.08 5.04 8.66 6.51 11.91 11.28 7.01 5.00 5.72 5.86 7.56 7.01
s35932 66 2083 1080 10.53 11.57 7.13 6.42 10.88 10.12 6.57 7.36 5.40 6.91 6.07 5.74
1120 8.64 7.40 6.94 9.15 7.23 6.86 5.32 4.06 4.19 5.11 5.57 5.13
ac97 ctrl 106 2252 1210 9.13 10.32 6.88 7.21 11.12 11.64 7.91 8.10 6.49 7.33 9.87 10.23
1220 6.93 6.97 6.08 6.37 8.04 8.63 6.15 6.58 5.79 5.90 7.61 8.11
1230 6.91 6.94 6.07 6.33 8.00 8.58 6.09 6.52 5.71 5.88 7.55 7.97
Figure 5.4: Impact of TC_th on test power variation for s5378: (a) Pmax = 145 and (b) Pmax = 150.
• For three ISCAS'89 and one IWLS'05 benchmark circuits, the above results are reported for both the ILP method and the Pattern_Order heuristic.
• For three benchmark circuits, the above results are reported for the ILP-based
heuristic technique.
• The reduction in the variance of test power are reported for three ISCAS’89 bench-
mark circuits with a single scan chain, using t-detect test sets.
The ordering of test patterns using the ILP-based technique yields lower variation in test power than the heuristic method. The Pattern_Order heuristic, however, is an efficient method for circuits with a large number of test patterns. The results show that a significant reduction in test power variation can be obtained using the proposed ordering technique. The test-pattern-ordering technique also results in low cycle-to-cycle variation in test power consumption. The ILP-based heuristic can likewise be used effectively to determine the ordering of test patterns for WLTBI; the reduction in test power variation that it achieves is comparable to that of the Pattern_Order heuristic.
Even small reductions in the variations in test power can contribute significantly
towards reducing yield loss and test escape during WLTBI. We know from Equation
(1.1) that the junction temperature of the device varies directly with the power
consumption. This indicates that a 10% variation in device power consumption will
lead to a 10% variation in junction temperatures; this can potentially result in thermal
runaway (yield loss), or under burn-in (test escape) of the device. The importance
of controlling the junction temperature for the device to minimize post-burn-in yield
loss is highlighted in [40].
All experiments were performed on a 2.4 GHz AMD Opteron processor, with 4
GB of memory. The CPU times for optimal ordering of test patterns using ILP
range from 16 minutes for s1423 to 6 hours for s5378. The CPU times for ordering test patterns using the Pattern_Order heuristic, when the cycle-accurate power information is given, are on the order of minutes (the maximum being 120 minutes for s13207). The CPU time to construct the cycle-accurate power information is on the order of hours for the benchmark circuits.
5.6 Summary
Table 5.2: Percentage reduction in the variance of test power consumption obtained using the Pattern_Order heuristic for selected ISCAS'89 benchmark circuits.
Circuit | N | n | No. of scan chains | Pmax | Baseline 1: δV_B1, δTth_B1 | Baseline 2: δV_B2, δTth_B2 | Baseline 3: δV_B3, δTth_B3
s9234 231 290 1 155 18.50 18.61 12.49 14.13 18.51 19.43
165 14.16 16.42 10.46 11.21 11.13 12.62
175 9.54 18.83 6.68 7.57 5.42 5.91
4 155 16.98 14.39 10.97 9.68 8.62 9.51
165 7.66 11.13 7.58 8.72 6.33 7.14
175 5.14 8.92 4.41 5.19 4.02 4.39
8 155 10.92 13.33 2.60 2.93 7.90 8.17
165 4.83 9.69 0.91 1.44 4.52 4.75
175 3.49 6.87 0.41 1.02 3.66 4.13
s13207 311 723 1 460 4.43 5.11 1.12 4.00 4.59 4.72
470 2.90 3.58 0.97 1.88 3.12 3.37
480 2.89 3.58 0.97 1.88 3.12 3.37
4 460 3.56 3.94 0.78 3.24 3.73 4.02
470 2.19 2.74 0.41 0.61 2.53 2.81
480 2.19 2.73 0.41 0.61 2.53 2.81
8 470 1.81 1.62 0.26 1.31 1.99 2.11
480 1.81 1.62 0.26 1.31 1.99 2.11
s15850 210 761 1 400 16.66 25.71 10.57 14.33 14.71 17.54
410 11.19 19.19 6.96 8.05 9.42 11.17
420 8.42 16.11 3.93 3.96 7.11 8.30
4 410 8.22 14.34 4.95 5.19 6.31 7.02
420 4.94 9.13 0.14 0.09 5.16 5.93
8 410 6.33 10.23 3.22 3.17 4.88 5.26
420 3.75 7.81 ≈0 0.03 3.64 4.00
s38417 198 764 1 390 4.08 6.39 3.39 3.62 2.59 2.82
405 3.48 3.56 2.44 2.11 1.67 1.79
415 0.77 0.81 0.25 0.40 0.08 0.15
4 405 3.06 3.42 2.16 3.53 1.80 1.94
415 0.54 0.67 0.09 0.28 ≈0 0.06
s38584 162 1372 1 695 7.11 4.13 5.94 3.26 8.39 7.44
710 5.76 3.32 4.19 3.55 6.08 5.64
720 3.51 2.86 2.84 1.92 4.70 3.98
4 695 5.83 5.01 4.64 3.91 6.14 5.72
710 4.46 3.59 3.63 3.04 5.22 4.60
720 3.49 2.88 2.21 1.64 3.98 3.52
8 695 3.21 2.61 2.52 2.03 3.68 3.17
710 2.20 1.75 1.43 1.08 2.76 2.21
Table 5.3: Percentage reduction in the variance of test power consumption obtained using the Pattern_Order heuristic for selected IWLS'05 benchmark circuits.
Circuit | N | n | No. of scan chains | Pmax | Baseline 1: δV_B1, δTth_B1 | Baseline 2: δV_B2, δTth_B2 | Baseline 3: δV_B3, δTth_B3
systemcaes 294 1008 1 570 9.55 8.93 7.31 7.08 9.94 9.62
580 6.67 6.48 4.70 4.58 7.29 7.66
590 6.53 2.81 4.64 1.90 7.19 7.61
usb funct 237 1918 1 1030 7.14 6.83 5.91 5.77 7.62 7.48
1040 4.52 4.04 2.23 1.91 5.05 4.96
1060 4.27 3.95 1.69 1.34 4.87 4.54
ac97 ctrl 230 2302 1 1055 12.32 11.98 10.61 10.49 12.73 12.24
1065 12.18 11.43 9.87 9.14 12.45 12.09
1075 11.66 10.93 9.21 8.87 12.14 11.58
wb conmax 413 3316 1 1520 5.12 5.08 4.37 4.11 5.91 5.75
1530 4.63 4.42 4.04 3.98 4.80 4.66
des perf 346 9105 1 5660 8.39 7.87 6.71 6.63 8.58 8.33
5670 8.12 7.94 6.27 6.01 7.93 7.67
5680 7.48 7.20 5.51 5.63 7.62 7.56
Table 5.4: Percentage reduction in the variance of test power consumption obtained using the ILP-based heuristic.
Circuit | N | n | No. of scan chains | Pmax | Baseline 1: δV_B1, δTth_B1 | Baseline 2: δV_B2, δTth_B2 | Baseline 3: δV_B3, δTth_B3
s9234 231 290 1 155 13.94 14.06 8.19 11.76 15.61 17.12
165 10.39 14.66 7.82 9.24 8.68 9.62
175 9.54 18.83 6.68 7.57 5.42 5.91
4 155 14.53 12.14 9.28 10.12 7.43 8.15
165 8.12 12.41 8.92 9.48 7.74 9.02
175 4.83 8.24 4.19 4.54 3.67 3.98
8 155 11.44 14.16 4.43 3.08 8.41 8.93
165 4.21 8.86 0.63 0.72 3.67 4.06
175 3.75 7.21 0.92 1.36 4.88 5.61
s13207 311 723 1 460 5.39 6.41 1.46 4.84 5.26 5.97
470 2.74 3.18 0.93 1.76 2.98 3.05
480 2.73 3.16 0.92 1.76 2.97 3.03
4 460 2.90 3.43 0.50 2.52 3.17 3.83
470 3.18 3.66 0.84 0.92 3.72 4.04
480 3.14 3.53 0.80 0.89 3.41 3.63
systemcaes 294 1008 1 570 7.82 7.43 5.92 5.84 8.39 8.11
580 6.33 6.21 4.52 4.17 7.04 6.90
590 5.86 5.94 4.91 5.14 6.28 6.63
Table 5.5: Percentage reduction in the variance of test power consumption obtained using the Pattern_Order heuristic for three ISCAS'89 benchmark circuits using t-detect test patterns.
Circuit | n | N | t-detect | Pmax | Baseline 1: δV_B1, δTth_B1 | Baseline 2: δV_B2, δTth_B2 | Baseline 3: δV_B3, δTth_B3
s5378 313 363 t=3 150 8.19 7.63 5.48 5.06 9.26 9.45
155 6.92 7.14 4.03 3.70 7.57 7.72
586 t=5 150 7.31 7.65 4.98 5.13 8.10 8.22
155 7.16 7.49 4.41 4.64 7.73 7.62
s9234 290 349 t=3 150 17.12 19.46 7.03 9.64 13.39 15.94
160 7.40 9.14 4.32 7.26 9.13 10.29
170 4.95 5.43 3.89 3.96 7.14 8.41
539 t=5 150 13.81 15.02 10.48 11.37 16.17 17.26
160 7.08 7.93 6.42 7.21 9.15 9.87
170 4.11 5.03 3.74 4.36 5.69 6.56
s38417 764 436 t=3 390 5.14 5.89 3.96 4.08 5.42 5.63
405 4.78 4.92 3.12 3.33 5.31 5.74
415 2.01 2.26 0.79 1.07 2.45 2.63
679 t=5 390 6.38 6.72 5.15 5.41 6.93 6.86
405 5.85 6.11 4.32 4.47 6.74 6.88
415 3.53 3.38 2.06 2.23 3.87 4.10
In this chapter, we have presented ILP-based and heuristic techniques to solve the pattern-ordering problem. We have compared the proposed reordering
techniques to baseline methods that minimize peak power and average power, as well
as a random-ordering method. In addition to computing the statistical variance of
the test power, we have also quantified the flatness of the power profile during test
application. Experimental results for the ISCAS’89 and the IWLS’05 benchmark
circuits show that there is a moderate reduction in power variation if patterns are
carefully ordered using the proposed techniques. Since the junction temperatures in
the device under test are directly proportional to the power consumption, even small
reductions in the power variance offer significant benefits for WLTBI.
Chapter 6
Dynamic burn-in using a full-scan circuit ATPG was proposed in [121] with the
objective of maximizing the number of transitions in the scan chains. We focus on a
WLTBI-specific X-fill framework that can control the variation in power consump-
tion during scan shift/capture. The test-pattern-ordering technique developed in
Chapter 5 is integrated into this framework to further reduce the variation in power
consumption during WLTBI.
We show how test-data manipulation and pattern ordering can be used to al-
leviate thermal problems during WLTBI. We present a unified framework for test-
pattern-manipulation and test-pattern-ordering for scan-based WLTBI. Our goal is
to minimize the variation in test power during test application. In order to fully
realize the benefits of WLTBI, it is necessary to address the challenges of test during
burn-in at the wafer level. We attempt to reduce the variation in power consumption
during test by manipulating test cubes. Improving power-management for WLTBI
can result in reduced yield loss at the wafer level [122].
The remainder of this chapter is organized as follows. Section 6.1 provides a de-
scription of the metrics used along with a description of the problem. Section 6.2
presents the “minimum variance” framework to control power variation for WLTBI.
The baseline methods used to evaluate the proposed technique are presented in Sec-
tion 6.3. Section 6.4 presents simulation results for several ISCAS’89 and IWLS’05
benchmark circuits [113]. Finally, Section 6.5 summarizes the chapter.
In this section, we present an outline of a procedure that can be used to manage test
power efficiently for WLTBI. Test application for a DUT is carried out by simultane-
ous scan-out of the test response and scan-in of the next test pattern; this is repeated
until all the test patterns are applied to the DUT. Every time a shift operation is
performed there is significant switching activity in the scan chains. This leads to
constantly varying device power during test. It is therefore important to minimize
the cycle-by-cycle variation in the number of transitions during the course of pattern
application. In addition, it is also important to minimize the power variance for scan
capture. The capturing of output responses in the scan chains can result in excessive
flip-flop transitions, resulting in the violation of peak-power constraints [71]. In [63],
a metric known as the weighted transition count (WTC) was presented to calculate
the number of transitions in the scan chain during scan shifting. It was also shown
in [63] that the WTC has a strong correlation with the total device power consumption.
The WTC metric can be easily extended to determine the cycle-by-cycle transition
counts during pattern application.
As in Chapter 5, we use the following two measures as metrics to analyze the variation
in power consumption.
2. The total cycle-to-cycle variation in test power consumption used to assess the
“flatness” of the power profile during test.
Detailed descriptions of the above two metrics can be found in Chapter 5 (Section
5.1) of this thesis.
The goal of problem P_MVF is to first determine optimal X-fill values for the test cubes used for scan-based testing, and then an ordering of the fully specified test vectors such that the overall variation in power consumption during test is minimized. For simplicity of discussion, we assume a single scan chain for test application and N test cubes T_1, T_2, ..., T_N. The extension of P_MVF to a circuit with multiple scan chains is trivial. The optimization problem P_MVF can now be formally stated as follows:
P_MVF: Given a CUT with a set T of N test cubes, i.e., T = {T_1, T_2, ..., T_N},
determine appropriate X-fill values for the unspecified bits in the test cubes, and
subsequently determine an optimal ordering of the fully specified test patterns such
that: 1) the overall variation in power consumption during test is minimized, and 2)
the constraint on per-cycle peak-power consumption Pmax during test is satisfied.
The steps involved in the proposed Min V ar procedure to minimize power vari-
ation are as follows:
Step 1: The first step involves generation of test cubes for the DUT for any targeted
fault set. In our work we consider test patterns for stuck-at faults. An a priori ran-
dom ordering of test cubes for the DUT is first considered.
Step 2: The second step involves the elimination of power violations that occur
during scan shifting. The objective during this step is to fill the unspecified bits in
the test cube (X-fill) such that the cycle-by-cycle variation in power consumption is
minimized. There is significant variation in power consumption when a test response is shifted out and a test pattern is shifted in simultaneously. This procedure mini-
mizes the cycle-by-cycle variation in test power for the pattern ordering determined
in the first step.
Step 3: In this step, peak-power violations due to scan capture are eliminated. If
capture-power violations are observed after the X-fill procedure, the previously as-
signed values of Xs are reassigned to new values to control the capture power during
test.
Step 4: The penultimate step in the power-management procedure for WLTBI in-
volves test-pattern ordering. After Steps 1, 2 and 3 are completed, the test-pattern
ordering approach from [123] is used to further reduce the variation in power con-
sumption during WLTBI.
Step 5: The final procedure checks for any power violations introduced by the test-
pattern ordering procedure. Power violations, if any, are resolved in a similar fashion
as done in Step 3.
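The five steps can be read as the following pipeline; every function name here is a hypothetical placeholder for the corresponding step, so this skeleton only illustrates the flow of the Min_Var procedure rather than its implementation.

def min_var_flow(test_cubes, generate_responses, x_fill, fix_capture_violations,
                 order_patterns, p_max):
    cubes = list(test_cubes)                                          # Step 1: test cubes, initial ordering
    responses = generate_responses(cubes)                             # assumed fault-simulation helper
    patterns = [x_fill(c, r) for c, r in zip(cubes, responses)]       # Step 2: X-fill for shift power
    patterns = fix_capture_violations(patterns, p_max)                # Step 3: capture-power violations
    patterns = order_patterns(patterns, p_max)                        # Step 4: test-pattern ordering
    return fix_capture_violations(patterns, p_max)                    # Step 5: final violation check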
In this section, we describe Min_Var, a procedural framework to control the power variation during test for WLTBI. It consists of the sequence of steps described in Section 6.1.
The switching activity in the scan flip-flops during shift in/out results in significant power consumption during test. Let us consider a scan chain of length n that has an initial value r = (r1 r2 · · · rn); the initial value corresponds to the test response from the previous pattern application. Let us consider a test pattern t = (t1 t2 · · · tn)
that is shifted into the scan chain. Figure 6.1 represents the cycle-by-cycle change in
states of the scan cells when the test pattern t is scanned in and the test response
r = (r1 r2 · · · rn ) is scanned out. The total number of transitions in the scan
chain, i.e., the transition count for the various clock cycles, can be represented by the equations described in Figure 6.2. The transition count during any clock cycle j is represented as TC(j); e.g., TC(1) represents the total number of transitions in the scan chain during the first clock cycle.
Clock cycle   FF1     FF2     FF3     FF4     ...   FFn-1   FFn
0             r1      r2      r3      r4      ...   rn-1    rn
1             tn      r1      r2      r3      ...   rn-2    rn-1
2             tn-1    tn      r1      r2      ...   rn-3    rn-2
3             tn-2    tn-1    tn      r1      ...   rn-4    rn-3
...
n-1           t2      t3      t4      t5      ...   tn      r1
n             t1      t2      t3      t4      ...   tn-1    tn

Figure 6.1: Cycle-by-cycle change in the states of the scan cells when the test pattern t is scanned in and the test response r is scanned out.
The cycle-by-cycle change in transition counts can now be represented using the equations shown in Figure 6.3. The objective during WLTBI is to minimize the cycle-by-cycle change in power consumption during test. This can be accomplished by minimizing the change in transition counts between any two consecutive clock cycles. In other words, our goal is to minimize ΔTC(j) = |TC(j) − TC(j − 1)| for all j, 2 ≤ j ≤ n, by making it as close to 0 as possible.
E0: ΔTC(1) = TC(1)
E1: ΔTC(2) = |TC(2) − TC(1)| = |(tn ⊕ tn−1) − (rn−1 ⊕ rn)|
E2: ΔTC(3) = |TC(3) − TC(2)| = |(tn−1 ⊕ tn−2) − (rn−1 ⊕ rn−2)|
...
En−1: ΔTC(n) = |(t1 ⊕ t2) − (r1 ⊕ r2)|

Figure 6.3: Equations representing the per-cycle change in transition counts.
Theorem 1. Given the set of equations E = {E1, E2, ..., En−1} from Figure 6.3, denoting the per-cycle change in transition counts, there exists at least one equation in E that has only a single unknown variable.
Proof. We use the method of contradiction. Every equation in the set E has at most two unknown variables on its right-hand side. Suppose none of these equations has exactly one unknown variable; then every equation has either zero or two unknowns. Since consecutive equations share a test-pattern bit, this is possible only if every bit of the test pattern t1, t2, ..., tn is unspecified (if all bits were specified, there would be no X-fill problem to solve). This is a contradiction, since the test pattern must have at least one specified bit.
Once an equation Ei with exactly one unknown is solved to minimize ΔTC(i + 1), at least one other equation is left with exactly one unknown variable. This process is continued until all the variables are assigned values to minimize each ΔTC(j), 1 ≤ j ≤ n.
Let us consider Equation (6.1), which represents the change in transition count for clock cycle j:

ΔTC(j) = |(tn−j+2 ⊕ tn−j+1) − (rn−j+2 ⊕ rn−j+1)|          (6.1)

Without loss of generality, let us suppose that tn−j+1 is a care bit and tn−j+2 is an unspecified bit. Since our objective is to minimize ΔTC(j), we can determine tn−j+2 as follows:

tn−j+2 = (rn−j+2 ⊕ rn−j+1) ⊕ tn−j+1          (6.2)
Once tn−j+2 is determined, we delete the equation for ΔTC(j) from the set of equations and proceed in a similar fashion until all the unspecified bits in the test cube are filled. As a final step, we solve for ΔTC(1). It is important to note that we cannot guarantee the least possible value for ΔTC(1). However, the above O(n) algorithm is optimal for minimizing ΔTC(2), ΔTC(3), ..., ΔTC(n), i.e., for minimizing the variation in power consumption during scan shift. The algorithm solves one equation at a time, and there can be at most n equations; the complexity of the algorithm is therefore O(n).
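A minimal Python sketch of this X-fill is given below (our own rendering, with None representing an X; the order in which single-unknown equations are selected is an implementation choice, so the resulting fill may differ from the worked example later in this chapter while obeying the same rule of Equation (6.2)):

    # Minimum-variance X-fill sketch for one test cube: every X adjacent to a
    # known bit is set so that t[k] XOR t[k+1] = r[k] XOR r[k+1], driving each
    # Delta TC(j), 2 <= j <= n, towards zero. Assumes at least one care bit.
    def min_variance_xfill(cube, response):
        t, r = list(cube), list(response)    # cube uses None for X; response is 0/1
        n = len(t)
        for k in range(1, n):                # forward propagation from care bits
            if t[k] is None and t[k - 1] is not None:
                t[k] = t[k - 1] ^ r[k - 1] ^ r[k]
        for k in range(n - 2, -1, -1):       # backward propagation for leading X's
            if t[k] is None and t[k + 1] is not None:
                t[k] = t[k + 1] ^ r[k] ^ r[k + 1]
        return t

The two passes together visit each bit a constant number of times, consistent with the O(n) complexity stated above.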
tion is also shown in the figure. The minimum-variance X-fill method results in an
X-filling of the test cube that yields 22.47% less cycle-by-cycle variance in test power
when compared with the baseline adjacent-fill method.
increment step in maximum power consumption during each iteration. The value of
ΔP can be chosen based on the size of the benchmark circuit and constraints on CPU time.
If we assume that the maximum number of unspecified bits in a test cube is p, the worst-case complexity of this procedure is O(Np). It is important to note that the above procedure does not explore the exponential number of input assignments (2^p assignments in the worst case) for each test cube; it uses a greedy algorithm to save CPU time. Once the capture-power violation is resolved, fault-free simulation is performed to verify that the bit-reversals have not created new shift-power violations. If power violations remain after the completion of the bit-reversal procedure, the power constraint Pmax is relaxed and the procedure is repeated until all power violations are eliminated.
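The sketch below illustrates one way the greedy bit-reversal can be organized (an editorial illustration only; capture_count stands in for the fault-free simulation that returns the number of capture transitions, and the order in which candidate bits are examined is an assumption):

    # Greedy bit-reversal sketch: previously X-filled positions are flipped one
    # at a time, keeping a flip only if it lowers the capture transition count,
    # until the per-cycle peak-power limit p_max is satisfied.
    def fix_capture_violation(pattern, filled_positions, capture_count, p_max):
        bits = list(pattern)
        for pos in filled_positions:         # at most p candidate bits per cube
            if capture_count(bits) <= p_max:
                break                        # violation resolved
            trial = bits[:]
            trial[pos] ^= 1                  # reverse one previously filled bit
            if capture_count(trial) < capture_count(bits):
                bits = trial
        return bits

Applied to all N test cubes, such a procedure examines at most p positions per cube, consistent with the O(Np) bound given above.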
In Chapter 5, a heuristic method (Pattern_Order) was presented to order test patterns for WLTBI. We use the same test-pattern ordering method here to further reduce the variation in power consumption during WLTBI. Section 5.5 describes the heuristic method in detail. The heuristic approach determines an ordering of test patterns for WLTBI, given an upper limit Pmax on peak power consumption. It consists of a sequence of four procedures. The main steps used in the Pattern_Order heuristic, as described in Chapter 5, are outlined below for the sake of completeness (a simplified greedy sketch follows the list):
2. In procedure Initial_Assign, the first test pattern to be shifted in to the circuit is determined. The pattern Ti that yields the lowest value of test power variance, σ(S, Ti), is chosen as the first test pattern to be applied; S denotes a (dummy) start pattern. We ensure that the constraint on peak power consumption Pmax is not violated when Ti is applied to the CUT. The first pattern Ti added to the ordered list of test patterns is referred to as Init_pat.
4. In procedure Final_Assign, the lone unassigned test pattern is added last to the test ordering. A final list of ordered patterns for WLTBI can now be constructed using information from the Initial_Assign and Pat_Order procedures.
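A simplified greedy sketch of the ordering, in Python, is given below (our own rendering rather than the thesis implementation; sigma(p, q) is assumed to return the power-variance measure incurred when pattern q follows pattern p, with None playing the role of the dummy start pattern S, and peak-power feasibility is assumed to have been established by the preceding X-fill steps):

    # Greedy ordering in the spirit of Pattern_Order: repeatedly append the
    # remaining pattern with the lowest variance measure relative to the
    # previously applied pattern; the last leftover pattern is appended at the
    # end, playing the role of Final_Assign.
    def greedy_pattern_order(patterns, sigma):
        remaining = list(patterns)
        ordered, prev = [], None             # prev = dummy start pattern S
        while remaining:
            best = min(remaining, key=lambda q: sigma(prev, q))
            remaining.remove(best)
            ordered.append(best)
            prev = best
        return ordered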
The complete framework for reducing the variation in power consumption during WLTBI is described in Figure 6.5. The process begins by determining the test cubes for the DUT. Starting with a randomly ordered test set, the procedures described in Sections 6.2.1-6.2.3 are performed in the order shown in Figure 6.5 to obtain an ordered set of fully specified test patterns. This pattern set is specifically determined for WLTBI in order to keep the fluctuations in junction temperature under control while applying test patterns during burn-in. The experimental results for the proposed framework are described in Section 6.4 and are compared with appropriate baseline scenarios.
We next present an example using the full-scan version of the s208 ISCAS’89
benchmark circuit to illustrate the complete procedure. This circuit has eight flip-
flops in the scan chain. We use a commercial ATPG tool to generate test cubes for
s208; we consider six of the test cubes for this example, as shown in Figure 6.6(a).
Figure 6.5: Flowchart depicting the Min_Var framework for WLTBI.
The eight equations for the X-fill procedure for the first pattern are shown in Figure
6.6(b). We first target equation E7, which has one unknown variable t2. For the first test cube, we set t2 to 0 to minimize ΔTC(8). We next consider equation E6 to minimize ΔTC(7). This procedure is continued until all X’s are assigned values. The same procedure is repeated for the remaining test cubes. The completely specified test patterns are shown in Figure 6.6(c). The next step is to check for power violations during capture. We arbitrarily set Pmax = 6 for this example. For this assignment of don’t-care values, a power violation occurs for test pattern 3. We use the procedure in Section 6.2.2 to reverse-map don’t-care bits to resolve the power violation. Modifying the third test pattern to t3 = 11101010 removes the power violation. The complete set of previous-state test responses is shown in Figure 6.6(c). Finally, we use the procedure in Section
6.2.3 to determine an ordering of test patterns. The ordering of test patterns for this
example is {3, 2, 6, 1, 4, 5}.
Pattern t1 t2 t3 t4 t5 t6 t7 t8
1 1 X X X X 0 X 1
2 X 1 1 X X X 1 X
3 X X 1 0 X X X 0
4 0 X X X X 0 1 X
5 1 X X 0 1 0 X 0
6 X 1 X X X X 0 X
(a)
(b) Equations for the X-fill of the first test cube:
E0: ΔTC(1) = TC(1)
E1: ΔTC(2) = (1 ⊕ t7) − (0 ⊕ 0)
E2: ΔTC(3) = (t7 ⊕ 0) − (0 ⊕ 0)
E3: ΔTC(4) = (0 ⊕ t5) − (0 ⊕ 0)
E4: ΔTC(5) = (t5 ⊕ t4) − (0 ⊕ 0)
E5: ΔTC(6) = (t4 ⊕ t3) − (0 ⊕ 0)
E6: ΔTC(7) = (t3 ⊕ t2) − (0 ⊕ 0)
E7: ΔTC(8) = (t2 ⊕ 1) − (0 ⊕ 0)

(c) Test set after minimum-variation X-fill:
Pattern  t1 t2 t3 t4 t5 t6 t7 t8
1        1  0  0  0  0  0  1  1
2        0  1  1  0  0  1  1  0
3        0  0  1  0  1  0  1  0
4        0  1  1  1  0  0  1  0
5        1  1  0  0  1  0  1  0
6        1  1  1  0  0  1  0  1

Figure 6.6: (a) Test cubes for the s208 benchmark circuit; (b) equations describing the per-cycle change in transition counts; (c) test set after minimum-variation X-fill.
The first baseline method involves filling strings of X’s in the test cube with the same
value [124]; this minimizes the number of transitions during scan-in. For example,
if we consider a test cube 0X1XXX10, an assignment of X’s that minimizes the
number of transitions results in a fully specified test vector 00111110. If a peak-power
violation is observed, a reverse bit-stripping process [124] is employed to introduce a
different assignment of values to the unspecified bits in the vector. This process is
repeated until all peak-power violations are eliminated.
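One common way to realize such a fill is sketched below in Python (an editorial illustration of an adjacent-fill variant: each X takes the value of the nearest preceding care bit, and leading X's take the first care bit that follows):

    # Adjacent-fill sketch: fill runs of X's with a neighbouring care-bit value
    # so that no additional transitions are introduced during scan-in.
    def adjacent_fill(cube):
        bits = list(cube)                    # e.g. "0X1XXX10"
        last = None
        for i, b in enumerate(bits):         # forward pass: copy previous care bit
            if b in "01":
                last = b
            elif last is not None:
                bits[i] = last
        nxt = None
        for i in range(len(bits) - 1, -1, -1):   # backward pass for leading X's
            if bits[i] in "01":
                nxt = bits[i]
            elif nxt is not None:
                bits[i] = nxt
        return "".join(bits)

For the cube 0X1XXX10 this returns 00111110, matching the example above.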
The second baseline method employs an X-fill methodology that assigns logic value
0 to all unspecified bits in the test cube. This method was employed in [64] for power
minimization during scan testing. In [64], the authors do not consider a specific value
of peak power; the objective of the work in [64] is to simply minimize the power
consumption during test. For this baseline scenario, we check for power violations
during scan-in and capture. If power violations occur after filling the test cubes, we
perform reverse bit-stripping to eliminate all peak power violations.
The third baseline method is similar to baseline method 2; in this method we assign a logic value 1 to all unspecified bits in the test cube. Such a fill technique can still result in a peak-power violation during scan-in and capture. Once X-fill is complete, it is therefore necessary to check whether the peak-power limit Pmax is violated; power violations are resolved using reverse bit-stripping, as in baseline method 2.
The final baseline method considers an ATPG-compacted test set, i.e., fully specified
test vectors are used. The pattern counts for these test sets are significantly lower than for the case where test cubes are used. While the proposed method uses more
patterns, it is not a serious concern because burn-in times are relatively high in
practice.
We report the following quantities in the experimental results:
• The percentage difference in variance between baseline method 1 and the Min_Var procedure. This difference is denoted by δV_B1, and it is computed as δV_B1 = (V_Baseline1 − V_Min_Var)/V_Baseline1 × 100%.
• The percentage difference in variance between baseline method 2 and the Min_Var procedure. This is calculated in a similar fashion as δV_B1, and is denoted by δV_B2.
• The percentage difference in variance obtained using 1-fill of test cubes and the Min_Var procedure. This is calculated in a similar fashion as δV_B1, and is denoted by δV_B3.
• The percentage difference in variance obtained using a fully compacted test set and the Min_Var procedure. This is calculated in a similar fashion as δV_B1, and is denoted by δV_B4.
• We highlight the difference in the total number of clock cycles i during which |P_i − P_{i+1}|/P_i exceeds γ for baseline method 1 and Min_Var. We characterize this difference as δTth_B1 = (Tth_Baseline1 − Tth_Min_Var)/Tth_Baseline1 × 100%, where Tth_Baseline1 and Tth_Min_Var are the measures obtained using the first baseline method and the Min_Var procedure, respectively. The value of γ is chosen to be 0.05 (i.e., 5%) to highlight the flatness of the power profiles obtained using the different techniques.
• The indicators δTth_B2, δTth_B3, and δTth_B4 are determined in a similar fashion as δTth_B1.
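As an arithmetic illustration with made-up (not measured) values: if baseline method 1 yields a test-power variance of 2.5 and the Min_Var procedure yields 2.2, then δV_B1 = (2.5 − 2.2)/2.5 × 100% = 12%.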
In this section, we present experimental results (Tables 6.1-6.5) for circuits from the ISCAS’89 and IWLS’05 benchmark suites. Since the objective of the test-pattern ordering problem is to minimize the variation in test power consumption during WLTBI, we report results in terms of the metrics defined above.
A commercial tool was used to perform scan insertion for the IWLS benchmark
circuits, which are available in Verilog format. We used a commercial ATPG tool to
generate stuck-at patterns (and responses) for the full-scan ISCAS’89 and IWLS’05
benchmarks. The results for the five large ISCAS’89 benchmark circuits are listed in
Table 6.1. The values of Pmax (measured in terms of the number of flip-flop transitions
per cycle) for each circuit are chosen carefully after analyzing the per-cycle test-power
data. We also present experimental results for the IWLS’05 benchmark circuits in
Table 6.2. A description of the IWLS’05 benchmarks in terms of the number of
scan flip-flops and total number of cells is shown in Table 6.2. Tables 6.3 and 6.4
describe the percentage reduction in the variance of test power consumption using
Table 6.1: Percentage reduction in the variance of test power consumption obtained using the Min_Var procedure for the ISCAS’89 benchmark circuits.
Circuit   No. of patterns   No. of flip-flops   No. of scan chains   Pmax   δV_B1   δTth_B1   δV_B2   δTth_B2   δV_B3   δTth_B3
(δV_Bi and δTth_Bi are measured relative to baseline method i.)
s9234 2005 286 1 150 6.12 5.77 10.16 12.69 9.77 12.14
160 5.43 5.31 10.02 12.31 8.93 10.21
170 5.17 5.26 9.67 11.04 7.09 5.91
4 150 5.94 5.47 9.68 11.41 8.62 11.21
160 5.16 4.82 8.67 11.13 8.09 10.24
170 4.33 4.29 7.74 8.45 6.23 9.56
8 150 3.67 3.18 7.46 8.19 4.92 6.31
160 3.53 2.84 6.02 6.93 3.56 4.87
170 1.98 1.77 4.13 5.01 1.45 1.62
s15850 3944 761 1 310 8.17 9.49 14.32 15.94 12.17 12.46
320 7.73 9.18 13.64 14.86 11.25 11.60
4 310 6.42 6.79 12.16 12.68 11.92 12.23
320 6.18 6.41 11.72 11.56 10.98 11.63
8 310 5.28 5.13 10.62 11.17 10.86 10.43
320 4.73 5.02 8.66 9.41 9.84 9.10
s35932 9760 2083 1 895 7.43 7.17 9.84 10.63 8.16 8.49
910 7.02 6.81 8.92 10.19 8.03 7.61
930 6.37 6.24 8.65 9.42 7.56 7.33
4 895 6.29 6.67 9.14 9.76 7.62 8.11
910 5.83 5.69 7.80 8.23 6.49 7.72
930 5.58 5.61 7.64 7.92 6.13 7.36
8 895 4.08 5.40 7.41 7.97 6.86 7.04
910 2.17 2.62 6.18 7.25 6.51 6.74
930 0.89 1.20 4.53 5.04 5.19 5.37
s38417 10081 1770 1 770 2.63 3.16 4.41 4.24 5.04 6.19
780 2.48 2.31 3.72 3.91 4.30 4.46
790 1.83 1.88 3.15 3.29 3.54 3.61
4 770 1.17 1.92 2.74 2.32 3.18 3.41
780 0.64 0.83 2.49 1.87 2.71 2.94
790 −0.29 −0.07 0.72 1.14 1.53 1.44
8 770 0.89 1.68 2.13 1.94 2.76 3.13
780 0.56 1.21 1.63 1.88 2.31 2.58
790 −0.12 0.05 0.60 0.72 1.79 1.85
s38584 14161 1768 1 735 3.23 4.75 5.14 5.97 5.33 6.19
745 3.18 4.26 4.84 5.30 4.91 5.47
755 2.91 2.99 4.62 5.01 4.69 5.05
4 735 2.70 4.43 4.94 5.12 5.29 5.78
745 2.17 2.32 3.64 4.48 4.16 4.72
755 1.67 2.08 3.22 4.58 3.97 4.31
8 735 1.94 2.67 3.13 4.39 4.61 4.78
745 0.89 1.23 1.96 2.78 3.21 3.60
755 −0.22 0.34 1.42 1.73 2.29 2.51
Table 6.2: Percentage reduction in the variance of test power consumption obtained using the Min_Var procedure for the IWLS’05 benchmark circuits.
Circuit   No. of patterns   No. of flip-flops   No. of cells   No. of scan chains   Pmax   δV_B1   δTth_B1   δV_B2   δTth_B2   δV_B3   δTth_B3
systemcaes 10454 1058 17817 1 530 8.06 7.92 11.35 11.91 10.62 11.56
540 7.48 8.16 11.27 11.47 10.26 11.10
550 6.82 7.33 10.61 11.14 10.83 11.59
4 530 7.44 7.13 10.87 10.69 10.18 10.96
540 6.61 7.46 10.54 10.22 9.68 9.70
550 5.03 5.72 8.61 8.59 7.32 7.93
8 530 6.17 5.95 8.24 7.98 9.13 9.46
540 5.82 5.51 8.06 7.62 6.97 7.01
550 5.37 4.95 7.38 7.14 6.60 6.74
usb funct 18399 1968 25510 1 960 2.34 3.61 4.46 5.68 4.17 5.36
970 2.19 3.28 3.67 4.56 3.33 4.18
980 1.96 2.49 3.12 3.63 3.27 3.78
4 960 2.17 2.94 4.12 5.51 4.04 4.43
970 1.89 2.23 3.47 3.82 3.27 3.91
980 1.56 1.70 2.85 3.29 2.86 3.47
8 960 1.05 1.46 3.78 4.23 3.59 3.70
970 0.68 0.87 3.26 3.31 3.04 3.12
980 0.17 0.23 2.63 2.58 2.52 2.65
ac97 ctrl 20146 2302 28049 1 1025 11.56 12.14 13.12 13.67 12.83 13.58
1040 11.42 11.53 12.69 12.51 11.40 12.82
1050 10.93 11.21 11.43 11.60 11.16 11.51
4 1025 11.13 11.39 12.86 13.23 11.77 12.21
1040 10.78 10.47 11.94 12.11 10.72 11.63
1050 10.14 10.39 11.19 11.83 10.38 10.92
8 1025 10.46 10.24 10.61 11.45 10.18 10.86
1040 9.80 9.62 10.27 10.95 9.27 10.17
1050 9.06 8.77 9.83 10.31 8.80 9.51
wb conmax 57681 3316 59483 1 1460 14.17 15.68 16.33 16.12 17.25 16.98
1500 13.42 15.03 15.58 15.43 17.02 16.71
4 1460 13.76 14.47 14.90 15.61 16.36 15.80
1500 13.19 13.71 14.16 14.87 15.93 15.26
8 1460 12.93 13.09 14.58 14.12 15.49 15.68
1500 11.24 10.45 13.31 13.14 14.67 15.11
des perf 76001 9105 146224 1 5600 7.39 8.12 9.24 10.33 8.75 8.89
5650 7.14 7.61 9.07 9.96 8.49 8.52
4 5600 6.73 7.38 8.91 10.14 8.53 8.46
5650 5.96 7.16 8.12 9.57 7.83 8.13
8 5600 6.51 6.91 8.43 9.28 7.60 7.71
5650 5.64 6.83 7.68 8.85 7.27 7.04
ethernet 119636 10752 153945 1 6380 5.17 6.89 8.06 8.17 8.59 8.92
6400 4.92 5.41 7.73 7.91 8.11 8.44
4 6380 4.80 4.97 6.76 6.59 7.05 7.14
6400 4.36 4.48 6.19 6.05 6.52 6.77
8 6380 4.23 4.47 6.19 6.31 5.95 5.86
6400 3.95 4.16 5.74 5.86 5.38 5.23
Table 6.3: Percentage reduction in the variance of test power consumption using the Min_Var procedure over Baseline 4 for the ISCAS’89 benchmark circuits.
Circuit   No. of compacted test patterns   No. of scan chains   Pmax   δV_B4   δTth_B4
s9234 349 1 150 17.46 18.21
160 15.48 17.65
170 14.74 15.83
4 150 16.95 16.37
160 14.72 15.97
170 12.35 12.12
8 150 10.47 11.15
160 10.08 9.95
170 5.63 7.18
s15850 210 1 310 19.46 20.03
320 18.15 18.63
4 310 15.48 19.67
320 14.10 18.71
8 310 13.64 16.76
320 11.48 12.59
s35932 20146 1 895 13.32 12.53
910 12.07 11.23
930 11.70 10.81
4 895 12.38 11.92
910 10.51 11.69
930 10.35 11.38
8 895 10.03 10.39
910 8.31 9.96
930 6.13 7.42
s38417 436 1 770 9.18 8.87
780 6.53 5.21
790 5.70 5.38
4 770 5.18 6.32
780 3.49 3.67
790 3.14 3.29
8 770 4.43 4.86
780 3.39 4.12
790 3.24 3.90
s38584 313 1 735 12.01 10.24
745 11.30 9.93
755 10.79 9.45
4 735 11.54 10.82
745 8.50 8.83
755 7.51 8.07
8 735 7.38 8.93
745 6.70 6.43
755 3.31 4.39
the Min_Var procedure over Baseline 4 for the ISCAS’89 and the IWLS’05 benchmark circuits, respectively. Table 6.5 describes the individual contributions of the X-fill and pattern-ordering procedures in reducing the variation in power consumption during test for five benchmarks.
The Min_Var procedure is an efficient method for circuits with a large number
Table 6.4: Percentage reduction in the variance of test power consumption using the Min_Var procedure over Baseline 4 for the IWLS’05 benchmark circuits.
Circuit   No. of compacted test patterns   No. of scan chains   Pmax   δV_B4   δTth_B4
systemcaes 294 1 530 23.24 25.19
540 21.57 24.43
550 19.66 21.94
4 530 17.88 21.34
540 15.88 20.36
550 12.09 15.61
8 530 9.35 14.98
540 8.82 13.87
550 8.14 12.46
usb funct 237 1 960 16.87 15.43
970 15.79 14.02
980 14.13 10.64
4 960 12.62 8.72
970 10.99 6.61
980 9.07 5.04
8 960 6.10 4.33
970 3.95 2.58
980 0.99 0.68
ac97 ctrl 230 1 1025 18.68 19.13
1040 18.45 18.17
1050 17.66 17.66
4 1025 17.34 17.38
1040 16.79 15.98
1050 15.80 15.85
8 1025 15.30 15.63
1040 14.33 14.68
1050 13.25 13.38
wb conmax 413 1 1460 21.34 19.87
1500 20.21 19.05
4 1460 19.70 18.34
1500 18.88 17.37
8 1460 18.51 16.59
1500 16.09 13.24
des perf 346 1 5600 12.14 14.86
5650 11.73 13.93
4 5600 11.06 13.51
5650 9.79 13.10
8 5600 8.89 12.65
5650 7.70 12.50
ethernet 2110 1 6380 14.96 14.28
6400 14.24 13.80
4 6380 13.07 13.24
6400 12.49 12.76
8 6380 10.91 11.35
6400 10.58 10.80
of test patterns. The results show that a significant reduction in test power variation can be obtained using the proposed framework for test-data manipulation and test-pattern ordering. The technique also results in low cycle-to-cycle variation in test power consumption. The “negative” reductions in Table 6.1 in a few cases can be
Table 6.5: Contribution of pattern-ordering in reducing the variation in test power consumption.
Circuit   No. of scan chains   Pmax   δPattern_Order (B1)   δPO_Tth (B1)   δPattern_Order (B2)   δPO_Tth (B2)   δPattern_Order (B3)   δPO_Tth (B3)
s35932 1 895 24.32 25.89 26.37 26.53 16.26 19.84
910 28.94 24.49 21.39 16.12 16.95 19.66
930 23.71 28.09 18.87 23.54 25.65 26.71
4 895 19.52 17.37 20.07 14.60 26.43 21.04
910 18.40 27.91 18.44 25.98 24.38 22.33
930 26.57 14.02 18.35 25.92 22.54 21.94
8 895 15.90 23.52 21.76 18.13 22.66 27.69
910 22.68 21.23 21.84 18.90 20.07 23.18
930 24.86 25.30 20.44 18.68 25.66 28.91
s38584 1 735 19.15 2.43 9.30 −4.09 12.26 12.18
745 6.56 7.38 7.09 7.46 8.43 10.04
755 6.03 15.07 9.61 1.67 −4.05 14.33
4 735 10.68 3.20 4.17 12.25 −0.93 3.53
745 9.78 13.12 14.90 11.26 6.94 18.15
755 15.53 −0.54 2.95 9.34 14.07 10.58
8 735 13.57 4.87 16.42 7.51 11.89 15.41
745 16.66 16.90 3.82 12.16 10.25 10.11
755 15.17 14.38 6.43 12.66 18.71 −4.25
ac97 ctrl 1 1025 11.17 9.69 12.68 13.27 11.97 18.93
1040 13.38 17.22 8.76 16.49 13.25 16.96
1050 8.09 4.00 9.21 11.48 9.77 6.24
4 1025 16.37 13.26 14.27 10.10 11.88 9.36
1040 8.56 8.35 16.09 13.50 12.18 5.42
1050 5.80 17.06 17.84 4.80 9.80 8.54
8 1025 11.55 16.54 6.68 14.19 8.72 16.10
1040 5.35 4.84 13.69 17.93 4.68 6.79
1050 8.71 6.83 5.01 10.79 16.17 12.67
wb conmax 1 1460 4.00 2.69 −8.82 −3.12 1.50 −2.77
1500 −1.48 6.59 −7.74 3.82 2.61 −1.10
4 1460 −8.08 −4.46 −2.32 −2.92 −9.74 3.04
1500 3.14 −10.43 4.71 −2.41 −10.84 6.94
8 1460 −3.50 −6.76 6.80 −5.71 1.59 −8.45
1500 2.01 −4.57 −8.02 6.97 −9.41 4.16
des perf 1 5600 −5.42 −2.07 −1.85 1.19 −0.35 −3.91
5650 −4.90 −3.19 −1.54 1.12 −4.90 2.84
4 5600 −6.26 −0.91 −2.48 −4.31 −2.31 1.60
5650 0.11 −3.30 −1.07 −4.74 2.17 1.23
8 5600 1.90 −3.80 −4.07 0.10 −6.73 −6.83
5650 0.13 −5.09 −2.18 −2.30 −3.47 −1.12
attributed to the heuristic nature of the pattern-ordering procedure. The negative entries in Table 6.5, which imply that pattern reordering is counter-productive, can be explained as follows. The test-pattern set is split into multiple subsets to reorder patterns for large benchmark circuits; this is done to save CPU time during reordering. The Pattern_Order heuristic appears to be ineffective for these instances. However, X-fill alone results in significant variance reduction in each case.
Even small reductions in the variations in test power can contribute significantly
towards reducing yield loss and test escape during WLTBI. We know from Equation
(1.1) that the junction temperature of the device varies directly with the power
consumption. This indicates that a 10% variation in device power consumption will
lead to a 10% variation in junction temperature; this can potentially result in thermal runaway (yield loss) or under-burn-in (test escape) of the device. The importance
of controlling the junction temperature for the device to minimize post-burn-in yield
loss is highlighted in [40].
All experiments were performed on a 2.4 GHz AMD Opteron processor with 4 GB of memory. The CPU time for the Min_Var procedure (including X-fill and pattern reordering) is on the order of hours for the large benchmark circuits; the CPU time for X-fill alone is on the order of minutes.
6.5 Summary
We have developed a new X-fill method to minimize power variation during WLTBI.
This approach is based on cycle-accurate power information for the device under
test. For N test patterns, an O(N) procedure has been presented to solve the X-fill
problem for scan-shift and capture. We have further reduced the variation in power
consumption by reordering the test-pattern set after minimum-variation X-fill. We
have compared the proposed reordering techniques to baseline methods that fill unspecified bits in a test cube with the objective of reducing power consumption during
scan testing. In addition to computing the statistical variance of the test power, we
have also quantified the flatness of the power profile during test application. Ex-
perimental results for the ISCAS’89 and the IWLS’05 benchmark circuits show that
there is a moderate to significant reduction in power variation if patterns are carefully
manipulated and ordered using the proposed framework. Since the junction temper-
atures in the device under test are directly proportional to the power consumption,
even small reductions in the power variance offer significant benefits for WLTBI.
Chapter 7
A prerequisite to assembling 3-D ICs and SiP devices is the ability to manufacture and test KGDs in a cost-effective manner. The research reported in this
dissertation explores multiple solutions for wafer-level manufacturing test of SoCs
and KGDs. The goal of this research is to provide robust and scalable engineering
solutions for wafer-level test and optimization techniques for test planning. According
to the ITRS [1], each device in the future can be considered to be a SoC or a 3-D IC
(or SiP). The need for flexible test solutions to accommodate increasing integration
trends has also been emphasized [1]. The high test cost associated with the testing of
these devices motivates the need for effective test techniques and test planning at the
wafer level. Significant yield improvements early in the product/process development
cycle can be achieved by efficient wafer-level test techniques [125].
In this thesis, we have addressed the design of a test infrastructure at the wafer
level. Efficient test techniques at wafer level under resource constraints have also
been developed. We have developed test-planning approaches for wafer-level test
that address test-resource optimization, defect screening, test scheduling, and test-
data manipulation for digital and mixed-signal SoCs, as well as for KGDs. This thesis
research also focuses on reducing the capital expenditure on ATE at the wafer level
by combining the burn-in and test processes.
In Chapter 2, techniques were presented to determine defect probabilities for the individual cores in an SoC. An ILP model that incorporates defect probabilities to determine the test-lengths for each core in the SoC was also developed, with the objective of maximizing defect screening at wafer sort. The ILP approach presented is computationally efficient and takes only a fraction of a second even for the largest SoC test benchmarks
from Philips. Experimental results for the ITC’02 SoC test benchmarks have shown
that the test-length selection procedure can lead to significant defect-screening at
wafer sort. An efficient heuristic method that scales well for larger SoCs was also
presented. A test-length selection problem for RPCT of core-based SoCs was also
formulated and solved using ILP and heuristic-based techniques.
with appropriate baseline methods. The relevance of the pattern-ordering problem
in the context of WLTBI was further emphasized by quantifying the “flatness” in
power profile during test application. Experimental results were presented for several
ISCAS’89 and IWLS’05 benchmark circuits to show the reduction in power variation
obtained using the proposed pattern-ordering techniques.
This thesis has explored a number of wafer-level test solutions that reduce product
cost. The focus has been on new test planning methods for digital SoCs, as well as
for mixed-signal SoCs and KGDs. As next-generation semiconductor devices become
more integrated with multiple functionalities, a number of new test challenges will
continue to emerge. We next summarize future research directions. The topics dis-
cussed below are aligned with the theme of intelligently performing test at the wafer
level to achieve maximum cost benefits.
7.2.1 Integrated test-length and test-pattern selection for
core-based SoCs
In this thesis, we have limited ourselves to test-length selection for core-based SoCs
under resource constraints of test time and TAM width. The ultimate objective of
wafer sort testing is to maximize the detection of faulty die; this maximizes profit
margins by lowering packaging costs. The test-length selection framework proposed
in this thesis does not address the issue of pattern grading to choose the reduced test
pattern set. In [126], “output deviation” was proposed as a coverage metric and a test-pattern grading method for pattern reordering. It was also shown in [126] that test sets that are carefully reordered using this metric can potentially yield “steep” fault coverage curves.
This research direction will provide an intelligent framework for test-pattern se-
lection. The number of test patterns for wafer sort under resource constraints can be
determined using the framework presented in Chapter 2; the choice of test patterns
however, can be determined using output deviations. This process of test-pattern
selection can potentially lead to improved defect screening at the wafer level under
resource constraints.
7.2.2 Multiple scan-chain design for WLTBI
Multiple scan chains have been primarily used in DfT architectures to lower test
application times. With increasing emphasis on power consumption, several design
techniques for multiple scan designs have been developed [127, 128]. In [127] a single
scan chain is partitioned into multiple smaller scan chains to minimize the number
of transitions in the scan chains. Thus far, the layout information, information on
clock domains, and other geometric constraints have not been incorporated in such
design techniques.
framework.
One of the primary benefits of a test scheduling method, specifically suited for
WLTBI, is the reduction in power variation during test application. In this thesis,
we presented an efficient test scheduling approach to minimize the power variation
during test. However, the proposed test scheduling method does not consider the
placement of the cores in the SoC while formulating the test schedule. The simultaneous
testing of cores that are placed close to one another can lead to hot-spots during test
application. It is also beneficial to stress the DUT uniformly during WLTBI.
It is therefore important to study the activity patterns of the cores during WLTBI,
and construct test schedules accordingly, to flatten the temperature profile of the SoC.
Modifying existing academic tools such as HOTSPOT [131], or commercial tools such as FLOTHERM [129], to incorporate die-cooling capabilities under burn-in conditions and continuously varying device power will lead to more accurate thermal predictions for WLTBI. This can potentially minimize yield loss
and test escapes during WLTBI.
Bibliography
[2] M. Bushnell and V. Agrawal, Essentials of Electronic Testing for Digital, Mem-
ory and Mixed-Signal VLSI Circuits. Kluwer, 2000.
[6] “A comparison of wafer level burn-in & test platforms for device qual-
ification and known good die (KGD) production”, http://www.delta-
v.com/images/White Paper - Comparing WLBT Platforms.pdf.
[7] “IEEE standard testability method for embedded core-based integrated cir-
cuits,” IEEE Std 1500-2005, pp. 1–117, 2005.
[8] V. Iyengar, K. Chakrabarty, and E. Marinissen, “Test wrapper and test access
mechanism co-optimization for system-on-chip,” Journal of Electronic Testing:
Theory and Applications, vol. 18, pp. 213–230, Apr. 2002.
[11] S. K. Goel and E. J. Marinissen, “Effective and efficient test architecture design
for SOCs,” in Proceedings of International Test Conference, 2002, pp. 529–538.
[12] Q. Xu and N. Nicolici, “Modular SoC testing with reduced wrapper count,”
IEEE Transactions on Computer-Aided Design, vol. 24, pp. 1894–1908, Dec.
2005.
[13] K. Chakrabarty, “Optimal test access architectures for system-on-a-chip,” ACM
Transactions on Design Automation of Electronic Systems, vol. 6, pp. 26–49,
Jan. 2001.
[18] E. Larsson, K. Arvidsson, H. Fujiwara and Z. Peng, “Efficient test solutions for
core-based designs,” IEEE Transactions on Computer-Aided Design, vol. 23,
pp. 758–775, May 2004.
[19] W. Zou, S. M. Reddy, and I. Pomeranz, “SoC test scheduling using simulated
annealing,” in Proceedings of VLSI Test Symposium, 2003, pp. 325–330.
[21] P. C. Maxwell, “Wafer-package test mix for optimal defect detection and test
time savings,” IEEE Design & Test of Computers, vol. 20, pp. 84–89, Sep. 2003.
[23] T. J. Powell, J. Pair, M. John, and D. Counce, “Delta IDDQ for testing relia-
bility,” in Proceedings of VLSI Test Symposium, 2000, pp. 439–443.
[26] P. Pochmuller, Configuration for carrying out burn-in processing operations of
semiconductor devices at wafer level. U. S. Patent Office, Mar 2003, Patent
number 6,535,009.
[28] S. Ozev and C. Olgaard, “Wafer-level RF test and DfT for VCO modulating
transceiver architectures,” in Proceedings of IEEE VLSI Test Symposium, 2004,
pp. 217–222.
[29] A. B. Kahng, “The road ahead: The significance of packaging,” IEEE Design
and Test, pp. 104–105, Nov. 2002.
[32] G. Bao, “Challenges in low cost test approach for ARM core based mixed-
signal SoC DragonBalltm -MX1,” in Proceedings of International Test Confer-
ence, 2003, pp. 512–519.
[33] J. Sweeney and A. Tsefrekas, “Reducing test cost through the use of digital
testers for analog tests,” in Proceedings of International Test Conference, 2005,
pp. 1–9.
[34] M. Allison, “Wafer probe acquires a new importance in testing,” IEEE Design
& Test of Computers, vol. 5, pp. 45–49, May. 2005.
[35] A. Singh, P. Nigh, and C. M. Krishna, “Screening for known good die (KGD)
based on defect clustering: an experimental study,” in Proceedings of Interna-
tional Test Conference, 1997, pp. 362–371.
[37] “Innovative burn-in testing for SoC devices with high power dissipation”,
http://www.advantest.de/dasat/index.php?cid=100363&conid=101096&
sid=17d2c133fab7783a035471392fd60862.
[39] P. Nigh, “Scan-based testing: The only practical solution for testing ASIC/consumer products,” in Proceedings of International Test Conference,
2002.
[40] A. Vassighi, O. Semenov, and M. Sachdev, “Thermal runaway avoidance during
burn-in,” in Proceedings of International Reliability Physics Symposium, 2004,
pp. 655–656.
[41] K. Kanda, K. Nose, H. Kawaguchi, and T. Sakurai, “Design impact of positive
temperature dependence on drain current in Sub-1V CMOS VLSIs,” IEEE
Journal of Solid-State Circuits, vol. 36, pp. 1559–1564, Oct. 2001.
[42] E. Larsson, J. Pouget and Z. Peng, “Defect-aware SoC test scheduling,” in
Proceedings of VLSI Test Symposium, 2004, pp. 228–233.
[43] U. Ingelsson, S. K. Goel, E. Larsson and E. J. Marinissen, “Test scheduling for
modular SOCs in an abort-on-fail environment,” in Proceedings of European
Test Symposium, 2005, pp. 8–13.
[44] R. W. Bassett, B. J. Butkus, S. L. Dingle, M. R. Faucher, P. S. Gillis, J. H.
Panner, J. G. Petrovick, and D. L. Wheater, “Low-cost testing of high-density
logic components,” in Proceedings of International Test Conference, 1989, pp.
550–758.
[45] J. Darringer, E. Davidson, D. J. Hathaway, B. Koenemann, M. Lavin, J. K.
Morrell, K. Rahmat, W. Roesner, E. Schanzenbach, G. Tellez, and L. Tre-
villyan, “EDA in IBM: Past, present, and future,” IEEE Transactions on
Computer-Aided Design, vol. 19, pp. 1476–1497, Dec. 2000.
[46] H. F. H. Vranken, T. Waayers and D. Lelouvier, “Enhanced reduced pin-count
test for full scan design,” in Proceedings of International Test Conference, 2001,
pp. 738–747.
[47] J. Jahangiri, N. Mukherjee, W. T. Cheng, S. Mahadevan and R. Press, “Achiev-
ing high test quality with reduced pin count testing,” in Proceedings of Asian
Test Symposium, 2005, pp. 312–317.
[48] T. G. Foote, D. E. Hoffman, W. V. Huott, T. J. Koprowski, M. P. Kusko,
and B. J. Robbins, “Testing the 500-MHz IBM S/390 Microprocessor,” IEEE
Design & Test of Computers, vol. 15, no. 3, pp. 83–89, 1998.
[49] B. Koupal and T. Lee and B. Gravens. Bluetooth Single Chip Radios: Holy
Grail or White Elephant, http://www.signiatech.com/pdf/paper two chip.pdf.
[50] C. Pan and K. Cheng, “Pseudo-random testing and signature analysis for
mixed-signal circuits,” in Proceedings of International Conference on Computer
Aided Design, 1995, pp. 102–107.
[51] N. A. M. Hafed and G. W. Roberts, “A stand-alone integrated test core for
time and frequency domain measurements,” in Proceedings of International
Test Conference, 2001, pp. 1190–1199.
[52] A. Sehgal, F. Liu, S. Ozev, and K. Chakrabarty, “Test planning for mixed-
signal SOCs with wrapped analog cores,” in Proceedings of Design Automation
and Test in Europe Conference, 2005, pp. 50–55.
[54] S. Bahukudumbi and K. Bharath, “A low overhead high speed histogram based
test methodology for analog circuits and IP cores,” in Proceedings of Interna-
tional Conference on VLSI Design, 2005, pp. 804–807.
[55] A. Sehgal, S. Ozev, and K. Chakrabarty, “Test infrastructure design for mixed-
signal SOCs with wrapped analog cores,” IEEE Transactions on VLSI Systems,
vol. 14, pp. 292–304, Mar. 2006.
[56] M. d’Abreu, “Noise – its sources, and impact on design and test of mixed signal
circuits,” in Proceedings of International Workshop on Electronic Design, Test
and Applications, 1997, pp. 370–374.
[60] Y. Zorian, “A distributed BIST control scheme for complex VLSI devices,” in
Proceedings of VLSI Test Symposium, 1993, pp. 4–9.
[61] S. Wang and S. K. Gupta, “An automatic test pattern generator for mini-
mizing switching activity during scan testing activity,” IEEE Transactions on
Computer-Aided Design, vol. 21, pp. 954–968, Aug. 2002.
[62] P. Girard, “Survey of low-power testing of VLSI circuits,” IEEE Design & Test
of Computers, vol. 19, pp. 80–90, May 2002.
[63] R. Sankaralingam, R. R. Oruganti, and N. A. Touba, “Static compaction tech-
niques to control scan vector power dissipation,” in Proceedings of VLSI Test
Symposium, 2000, pp. 35–40.
[70] W. Li, S. M. Reddy, and I. Pomeranz, “On reducing peak current and power
during test,” in Proceedings of ISVLSI, 2005, pp. 156–161.
[74] ——, “Test-length selection, reduced pin-count testing, and TAM optimization
for wafer-level testing of core-based digital SoCs,” in Proceedings of Interna-
tional Conference on VLSI Design, 2007, pp. 459–464.
[75] I. Koren, Z. Koren, and C. H. Stapper, “A unified negative-binomial distribution for yield analysis of defect-tolerant circuits,” IEEE Transactions on Computers, vol. 42, pp. 724–734, Jun. 1993.
[76] I. Koren and C. H. Stapper, Yield Models for Defect Tolerant VLSI Circuits: A Review. Plenum, 1989.
[78] J. A. Cunningham, “The use and evaluation of yield models in integrated circuit
manufacturing,” IEEE Transactions on Semiconductor Manufacturing, vol. 3,
pp. 60–71, May 1990.
[82] S. K. Goel and E. J. Marinissen, “Layout-driven SoC test architecture design for
test time and wire length minimization,” in Proceedings of Design Automation
and Test in Europe Conference, 2003, pp. 10 738–10 743.
[84] E. Larsson and Z. Peng, “An integrated framework for the design and opti-
mization of SoC test solutions,” Journal of Electronic Testing: Theory and
Applications, vol. 18, pp. 385–400, Feb. 2002.
[85] V. Iyengar and K. Chakrabarty, “Test bus sizing for system-on-a-chip,” IEEE
Transactions on Computers, vol. 51, pp. 449–459, May 2005.
[87] E. Kreyszig, Advanced Engineering Mathematics, 8th ed. John Wiley & Sons
Inc., 1998.
[88] M. Berkelaar et al., “lpsolve: Open source (mixed-integer) linear programming
system”. Version 5.5 dated May 16, 2005
URL: http://www.geocities.com/lpsolve.
[98] A. Frisch and T. Almy, “HABIST: histogram based analog built in self test,”
in Proceedings of International Test Conference, 1997, pp. 760–767.
[99] E. Acar and S. Ozev, “Delayed-RF based test development for fm transceivers
using signature analysis,” in Proceedings of International Test Conference,
2004, pp. 783–792.
[102] U. Ingelsson, S. K. Goel, E. Larsson, and E. J. Marinissen, “Test scheduling for
modular SOCs in an abort-on-fail environment,” in Proceedings of European
Test Symposium, 2005, pp. 8–13.
[103] D. E. Becker and A. Sandborn, “On the use of yielded cost in modeling elec-
tronic assembly processes,” IEEE Transactions on Electronics Packaging Man-
ufacturing, vol. 24, pp. 195–202, Jul. 2001.
[104] S. Edbom and E. Larsson, “An integrated technique for test vector selection
and test scheduling under test time constraint,” in Proceedings of Thirteenth
Asian Test Symposium, 2004, pp. 254–257.
[106] http://www.mosis.org.
[107] M. Shen, Z. Li-Rong, and H. Tenhunen, “Cost and performance analysis for
mixed-signal system implementation: System-on-chip or system-on-package,”
IEEE Transactions on Electronics Packaging Manufacturing, vol. 25, pp. 522–
545, Oct. 2002.
[112] M. Garey and D. Johnson, Computers and Intractability; A Guide to the Theory
of NP-Completeness. W. H. Freeman, 1979.
[115] P. M. Rosinger, B. M. Al-Hashimi, and N. Nicolici, “Power profile manipulation:
A new approach for reducing test application time under power constraints,”
IEEE Transactions on Computer-Aided Design, vol. 21, pp. 1217–1225, May
2002.
[117] S. Ghosh, S. Basu, and N. A. Touba, “Joint minimization of power and area in
scan testing by scan cell reordering,” in Proceedings of Annual Symposium on
VLSI, 2003, pp. 246–249.
[124] R. Sankaralingam and N. A. Touba, “Controlling peak power during scan test-
ing,” in Proceedings of VLSI Test Symposium, 2002, pp. 153–159.
[127] D. Ghosh, S. Bhunia, and K. Roy, “Multiple scan chain design technique for
power reduction during test application in BIST,” in Proceedings of Interna-
tional Symposium on Defect and Fault Tolerance in VLSI Systems, 2003, pp.
191–198.
[128] N. Nicolici and B. M. Al-Hashimi, “Multiple scan chains for power minimization
during test application in sequential circuits,” IEEE Transactions on Comput-
ers, vol. 51, pp. 721–733, Jun. 2002.
Biography
PERSONAL DATA
Date of birth: April 4, 1982.
Place of birth: Chennai, Tamil Nadu, India.
EDUCATION
Doctor of Philosophy, Duke University, USA, expected 2008.
Master of Science, New Mexico State University, USA, 2005.
Bachelor of Engineering, University of Madras, India, 2003.
PUBLICATIONS
• Journal Articles
4. S. Bahukudumbi, K. Chakrabarty and R. Kacprowicz, “Test scheduling for
wafer-level test-during-burn-in of core-based SoCs”, Proc. Design Automation
and Test in Europe (DATE) Conference, pp. 1103-1106, 2008.
5. S. Bahukudumbi and K. Chakrabarty, “Test-pattern ordering for wafer-level
test-during-burn-in”, accepted for publication in Proc. IEEE VLSI Test Sym-
posium, 2008.
6.∗ S. Bahukudumbi and K. Bharath, “A low overhead high speed histogram based
test methodology for analog circuits and IP Cores”, Proc. IEEE International
Conference on VLSI Design, pp. 804-807, 2005.
7.∗ P. Srivatsan, S. Bahukudumbi and P. P. Bhaskaran, “DYNORA: A new caching
technique”, Proc. IEEE Euromicro Symposium on Digital Systems Design, pp.
70-75, 2003.
• Submitted Papers
Professional Activities
• IEEE student member
∗ Not related to Ph.D. thesis work.