Data Engineering for the Analysis of Semiconductor Manufacturing Data
Peter Turney
Knowledge Systems Laboratory
Institute for Information Technology
National Research Council Canada
Ottawa, Ontario, Canada
K1A 0R6
613-993-8564 (voice)
613-952-7151 (fax)
[email protected]
Abstract
We have analyzed manufacturing data from several different semiconductor manufacturing plants, using decision tree induction software called Q-YIELD. The software generates rules for predicting when a given product should be rejected. The rules are intended to help the process engineers improve the yield of the product, by helping them to discover the causes of rejection. Experience with Q-YIELD has taught us the importance of data engineering — preprocessing the data to enable or facilitate decision tree induction. This paper discusses some of the data engineering problems we have encountered with semiconductor manufacturing data. The paper deals with two broad classes of problems: engineering the features in a feature vector representation and engineering the definition of the target concept (the classes). Manufacturing process data present special problems for feature engineering, since the data have multiple levels of granularity (detail, resolution). Engineering the target concept is important, due to our focus on understanding the past, as opposed to the more common focus in machine learning on predicting the future.
1. Introduction
We define data engineering as the transformation of raw data into a form useful as input to algorithms for inductive learning.[1] This paper is concerned with the transformation of semiconductor manufacturing data for input to a decision tree induction algorithm. We have been using decision tree induction for process optimization. The optimization task that we address is to discover what aspects of a manufacturing process are responsible for a given class of rejected products. A product in semiconductor manufacturing may be an integrated circuit, a wafer (a disk of silicon, usually holding about 100 to 1,000 integrated circuits), or a batch of wafers (usually about 20 to 30 wafers). A product is usually accepted or rejected on the basis of electrical measurements. (See Van Zant (1986) for a good introduction to semiconductor manufacturing.)
We have analyzed data from several different semiconductor manufacturing plants. The data were analyzed using Q-YIELD, which generates rules for predicting when a given product will be rejected, given certain process measurements (Famili & Turney, 1991; 1992).[2] The rules are intended to help process engineers improve the yield of the product, by helping them discover the causes of rejection.
In general, there are two types of applications for inductive learning algorithms:
they may be used to predict the future or to understand the past. Our emphasis has
been on understanding the past. This places certain constraints on the software that
we use. For example, it is very important that the induced model should be readily
understandable by process engineers, which excludes neural network models.
This real-world application of machine learning has presented us with some interesting technical problems, which do not seem to have been considered in the machine learning literature. Section 2 discusses the problems of engineering the features. Manufacturing process data have multiple levels of granularity (levels of detail; levels of resolution), which makes the data difficult to represent in the standard feature vector notation. Section 3 discusses the problems of engineering the classes. Most papers on inductive learning assume that the definition of the target concept is given. In our work, determining the best definition of the target concept is a large part of the task. For each problem we describe, we outline the solutions we have adopted and the open questions. In general, we have not had the resources to validate our solutions by comparing them with alternative approaches.
The conclusion is that data engineering is essential to the successful application of decision tree induction to semiconductor manufacturing. Data engineering is currently much more an art than a science. We present here a list of recipes for data engineering with semiconductor manufacturing data, but what we need is a unifying scientific foundation for this diverse collection of recipes.
2. Engineering the Features

A batch-level measurement, such as the temperature of a furnace, applies to every wafer in the batch; there is no reason to record this information at the site-level (for example), since all the site-level temperature measurements would be (approximately) identical within a given batch. Site-level and IC-level measurements are typically electrical measurements that are used for quality control. A wafer may be rejected if certain site-level measurements are not within bounds. An IC may be rejected if certain IC-level measurements are not within bounds. Part of the reason that granularity is important is that decisions (accept or reject) are made at different levels of granularity (reject a whole wafer or reject a single IC).
The data at these different levels of granularity are frequently stored in separate databases. Data from the manufacturing process are recorded at the batch-level in one database, while data from electrical testing are recorded at the IC-level in a second database. This introduces the mundane difficulty of extracting data from two or more databases, but there is also a challenging data engineering problem: the data have a structure that is not naturally represented in the feature vector format required by most decision tree induction algorithms.
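To make the merge step concrete, here is a minimal sketch in Python (using pandas), assuming two hypothetical tables: a batch-level process database and an IC-level test database. The column names (batch_id, furnace_temp, ic_passed) are illustrative, not from the original data, and this is one way to realize the transformation, not the Q-YIELD implementation.

    import pandas as pd

    # Hypothetical batch-level process database: one row per batch.
    process_df = pd.DataFrame({
        "batch_id": ["B01", "B02"],
        "furnace_temp": [987.0, 1012.0],
    })

    # Hypothetical IC-level test database: one row per integrated circuit.
    test_df = pd.DataFrame({
        "batch_id": ["B01", "B01", "B02", "B02"],
        "ic_passed": [1, 0, 1, 1],
    })

    # Aggregate the IC-level data up to the batch-level (here, the yield),
    # then merge, giving one feature vector per batch.
    ic_yield = (
        test_df.groupby("batch_id")["ic_passed"]
        .mean()
        .rename("yield")
        .reset_index()
    )
    batch_df = process_df.merge(ic_yield, on="batch_id")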
So far, our practice has been to convert the data to a feature vector format by
moving all measurements up to the highest level of granularity that is relevant for
the given manufacturing problem (often the batch-level). We have experimented
with two methods for transforming lower-level data to higher-level data:
Method A: Suppose that we wish to move site-level data up to the batch-level. Let X be an electrical parameter (the voltage drop across a diode, for example) that is measured at five test sites on a wafer. In a batch of 24 wafers, there will be 120 (24 × 5 = 120) measurements of X. To bring X up from the site-level to the batch-level, we can introduce five new batch-level features (a code sketch follows the list):
1. the average of X in the 120 measurements in the given batch
2. the standard deviation of X in the 120 measurements in the given batch
3. the median of X in the 120 measurements in the given batch
4. the minimum of X in the 120 measurements in the given batch
5. the maximum of X in the 120 measurements in the given batch
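A minimal sketch of Method A in Python (pandas), assuming a hypothetical table site_df with one row per site-level measurement and columns batch_id and X; the names are illustrative.

    import pandas as pd

    def site_to_batch(site_df, param="X"):
        """Summarize one site-level parameter as the five batch-level
        features listed above (average, standard deviation, median,
        minimum, maximum)."""
        stats = site_df.groupby("batch_id")[param].agg(
            ["mean", "std", "median", "min", "max"]
        )
        stats.columns = [param + "_" + s for s in stats.columns]
        return stats.reset_index()

For a batch of 24 wafers with five test sites each, the 120 rows for that batch collapse to a single row with five features.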
This can result in a large feature vector, since every lower-level measurement results in five higher-level features. It can also result in a shortage of cases. A database with 1,200 records (cases, examples), where each field (attribute, feature, measurement) is measured at the site-level, yields 10 batch-level feature vectors (1,200 / 120 = 10). Thus an abundance of data is transformed into a shortage of data. However, if the manufacturing problem is due to fluctuations in the process at the batch-level, then the apparent abundance of data was an illusion, since the database only has a small amount of information about batch-level fluctuations.
When absolute time stamps are not available, we can often extract sequential order from the batch ID. Most plants assign an ID to each batch, and the IDs are often a combination of digits that are assigned sequentially. This ordering information is sufficient to detect unique events — it may be unnecessary to know the absolute time of an event.
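A hedged sketch of this idea, assuming a hypothetical ID format in which a sequentially assigned run of digits is embedded in the batch ID:

    import re

    def sequence_key(batch_id):
        """Extract the first run of digits in a batch ID as an ordering key."""
        match = re.search(r"\d+", batch_id)
        return int(match.group()) if match else -1

    # Sort batches into (approximate) processing order.
    batch_ids = ["LOT0459A", "LOT0457A", "LOT0458B"]
    batch_ids.sort(key=sequence_key)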
3. Engineering the Classes

Suppose that the yield of a process is usually above 90% but sometimes dips below 90%, and the process engineer wants to understand what is causing the dip. In the simplest case, there is a batch-level measurement called yield and the target class is “yield is less than 90%”. We can define the target class as a symbolic variable with the value 1 when the yield is below 90% and the value 0 when the yield is above 90%.
We convert the continuous yield variable to a discrete variable using a threshold, such as 90%. There are (at least) three ways to set a threshold for the yield. (1) We may use external factors (economic factors, management decisions, pressure from competition) to determine the desired minimum yield for the process; (2) we can choose the median yield, so that we have a balance of examples and counter-examples; or (3) we can look at the data to see whether there is a natural threshold, based on clusters in the data. We find that we tend to get better results with approaches (2) and (3), rather than (1). We often experiment with several different thresholds. We use visual aids to suggest possible thresholds. One aid is a histogram of the yield (the x axis is the yield and the y axis is the number of batches with the given yield). Sometimes there will be a valley in the histogram that suggests a natural place for a threshold. Another aid is a plot of the yield over time (the x axis is time and the y axis is yield). Sometimes there are recurrent dips in the plot that can readily be isolated with the right threshold.
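A minimal sketch of the thresholding step in Python (pandas), assuming a hypothetical Series of batch yields (here expressed as fractions); the default uses the median, as in approach (2), while a fixed threshold such as 90% corresponds to approach (1).

    import pandas as pd

    def target_class(yields, threshold=None):
        """Value 1 when the yield is below the threshold (the target
        class), value 0 otherwise."""
        if threshold is None:
            # Approach (2): the median balances examples and counter-examples.
            threshold = yields.median()
        return (yields < threshold).astype(int)

    yields = pd.Series([0.95, 0.88, 0.93, 0.86],
                       index=["B01", "B02", "B03", "B04"])
    labels = target_class(yields, threshold=0.90)  # B02 and B04 are in the class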
The yield of a process is a composite variable, since there are many different reasons for rejection of a wafer or IC. In a process with a yield of 90%, there may be 30 different classes of problems in the 10% of parts that are rejected. Treating each problem separately can make the task simpler for the induction algorithm. Suppose that electrical measurements are made at five test sites on a wafer and a wafer is rejected when two or more of the five measurements of electrical parameter X are above a threshold T. This electrical measurement X is one way that a wafer can be rejected and we can focus on X instead of examining yield. To bring X up from the wafer-level to the batch-level, we can introduce a new variable Y, defined as the percentage of the wafers for which two or more of the five measurements of electrical parameter X are above the threshold T (as we discussed in Section 2.1). We can then define two classes of batches, those for which Y is below some threshold U and those for which Y is above U. The target class is “Y is below U”. The same issues arise in setting the threshold U as arose in setting the threshold on the yield.
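A sketch of this wafer-to-batch transformation in Python (pandas), assuming a hypothetical table site_df with columns batch_id, wafer_id, and X, with five rows (test sites) per wafer; the column names and thresholds are illustrative.

    import pandas as pd

    def reject_rate_Y(site_df, T):
        """Y: the percentage of wafers in each batch for which two or
        more of the five site measurements of X are above T."""
        flagged = site_df.assign(over=site_df["X"] > T)
        per_wafer = (
            flagged.groupby(["batch_id", "wafer_id"])["over"].sum() >= 2
        )
        return per_wafer.groupby(level="batch_id").mean() * 100

    def target_class_Y(Y, U):
        """The target class from the text: 1 when Y is below U, else 0."""
        return (Y < U).astype(int)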
Some open questions are:
1. Can we automate the selection of a threshold in the definition of a target class?
2. Should we use regression trees instead of classification trees (Breiman et al., 1984)? Is there a way to make regression trees easier to understand? (A sketch appears below.)
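As an illustration of open question 2, here is a hedged sketch using scikit-learn (an assumption of this example; it was not part of the original work) that fits a shallow regression tree to the continuous yield and prints it in a readable form:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor, export_text

    # Hypothetical process measurements (rows = batches) and continuous yields.
    X = np.array([[987.0, 5.1], [1012.0, 4.8], [995.0, 5.3], [1020.0, 4.6]])
    y = np.array([0.95, 0.88, 0.93, 0.86])

    # A shallow tree keeps the model understandable to a process engineer.
    tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
    print(export_text(tree, feature_names=["furnace_temp", "etch_time"]))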
4. Conclusions
The above examples show that a significant amount of data engineering is involved
in the application of decision tree induction to semiconductor manufacturing data.
There are many open questions raised by our data engineering methods and many
assumptions that have not yet been investigated. We believe that it is possible and
worthwhile to build a firm theoretical foundation for data engineering. We are
hopeful that the recipes and open questions raised here can contribute to such a
foundation.
Notes
1. This definition suggests that data engineering is always done by hand. We do
not mean to exclude the possibility of automatic data engineering, but we have
not been able to invent a more satisfying definition of data engineering.
2. Q-YIELD is a commercial product, available from Quadrillion Corporation,
380 Pinhey Point Road, Dunrobin, Ontario, Canada, K0A 1T0. The software
is based on a prototype that was developed at the NRC.
3. These issues were raised by Joel Martin, in conversation.
4. For some tasks, it is reasonable to consider a lower level of granularity, such
as the components (flip flops, transistors, gates) within an IC. The four levels
listed here are not meant to be exhaustive.
5. There are higher levels of granularity, such as a production run, but we do not
usually analyze the data at this level of granularity.
Acknowledgments
Thanks to Michael Weider and Joel Martin for their very helpful comments on earlier versions of this paper.
References
Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression trees. California: Wadsworth.
Famili, A., & Turney, P.D. (1991). Intelligently helping the human planner in industrial process planning. Artificial Intelligence for Engineering Design, Analysis, and Manufacturing, 5(2), 109-124.
Famili, A., & Turney, P.D. (1992). Application of machine learning to industrial planning and decision making. In A. Famili, S. Kim, & D. Nau (Eds.), Artificial Intelligence Applications in Manufacturing (pp. 1-16). Cambridge, MA: MIT Press.
Lavrac, N., & Dzeroski, S. (1994). Inductive logic programming: Techniques and applications. New York: Ellis Horwood.
Van Zant, P. (1986). Microchip fabrication: A practical guide to semiconductor processing. California: Semiconductor Services.