Explanation - Unit 2 Collecting Engineering Data
An effective data-collection procedure can greatly simplify the analysis and lead to improved understanding
of the population or process that is being studied. We now consider some examples of these data-collection
methods.
The reflux rate should be held constant for this process. Consequently, production personnel change it
very infrequently.
A retrospective study would use either all or a sample of the historical process data archived over some
period of time. The study objective might be to discover how the two temperatures and the reflux rate
affect the acetone concentration in the output product stream.
However, this type of study presents some problems:
1. We may not be able to see the relationship between the reflux rate and acetone concentration,
because the reflux rate didn’t change much over the historical period.
2. The archived data on the two temperatures (which are recorded almost continuously) do not
correspond perfectly to the acetone concentration measurements (which are made hourly). It may
not be obvious how to construct an approximate correspondence.
3. Production maintains the two temperatures as closely as possible to desired targets or set points.
Because the temperatures change so little, it may be difficult to assess their real impact on acetone
concentration.
4. In the narrow ranges within which they do vary, the condensate temperature tends to increase with
the reboil temperature. Consequently, the effects of these two process variables on acetone
concentration may be difficult to separate.
As you can see, a retrospective study may involve a lot of data, but those data may contain relatively little
useful information about the problem. Furthermore, some of the relevant data may be missing, there may
be transcription or recording errors resulting in outliers (or unusual values), or data on other important
factors may not have been collected and archived. In the distillation column, for example, the specific
concentrations of butyl alcohol and acetone in the input feed stream are very important factors, but they
are not archived because the concentrations are too hard to obtain on a routine basis. As a result of these
types of issues, statistical analysis of historical data sometimes identifies interesting phenomena, but solid
and reliable explanations of these phenomena are often difficult to obtain.
Generally, an observational study tends to solve problems 1 and 2 above and goes a long way toward
obtaining accurate and reliable data. However, observational studies may not help resolve problems 3 and
4.
In this simple comparative experiment, the engineer is interested in determining whether there is any
difference between the 3/32- and 1/8-inch designs. An approach that could be used in analyzing the
data from this experiment is to compare the mean pull-off force for the 3/32-inch design to the mean
pull-off force for the 1/8-inch design using statistical hypothesis testing, which is discussed in detail
in future modules.
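As a rough illustration of the kind of comparison involved, the sketch below runs a two-sample t-test on two small samples of pull-off force. The numbers are hypothetical, invented only for illustration, and are not measurements from the actual experiment.

# A minimal sketch of the mean comparison, assuming two small hypothetical
# samples of pull-off force (in pounds) for the 3/32- and 1/8-inch designs.
from scipy import stats

force_3_32 = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]  # hypothetical data
force_1_8 = [12.9, 13.7, 12.8, 13.9, 14.2, 13.2, 13.5, 13.1]   # hypothetical data

# Two-sample t-test for a difference in mean pull-off force
t_stat, p_value = stats.ttest_ind(force_3_32, force_1_8)
print(f"t = {t_stat:.2f}, p-value = {p_value:.3f}")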
2.5 Observing Processes Over Time
Often data are collected over time. In this case, it is usually very helpful to plot the data versus time in a time
series plot. Phenomena that might affect the system or process often become more visible in a time-oriented
plot, and the stability of the process can be judged more easily.
Figure 1.8 is a dot diagram of acetone concentration readings taken hourly from the distillation column
described in Section 2.2. The wide scatter of points on the dot diagram indicates considerable variability
in the concentration, but the diagram does not help explain the reason for that variability.
The time series plot is shown in Figure 1.9. A shift in the process mean level is visible in the plot and an
estimate of the time of the shift can be obtained.
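To show what such a plot looks like in practice, the sketch below plots a short sequence of hourly concentration readings against time. The values are hypothetical, invented for illustration, and include a downward shift in the mean level partway through.

# A sketch of a time series plot of hourly acetone concentration readings.
# The values below are hypothetical and include a shift in the mean level.
import matplotlib.pyplot as plt

concentration = [91.2, 90.8, 91.5, 92.0, 91.1, 90.6, 91.8, 91.4,
                 90.9, 91.6, 88.7, 88.2, 89.0, 88.5, 87.9, 88.3]  # hypothetical readings (g/l)

plt.plot(range(1, len(concentration) + 1), concentration, marker="o")
plt.xlabel("Hour")
plt.ylabel("Acetone concentration (g/l)")
plt.title("Time series plot of acetone concentration")
plt.show()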
W. Edwards Deming, a very influential industrial statistician, stressed that it is important to understand the
nature of variability in processes and systems over time.
In a famous demonstration known as the funnel experiment, Deming dropped marbles through a funnel toward a
target. The funnel was aligned as closely as possible with the center of the target, and he then used two
different strategies to operate the process.
(1) He never moved the funnel. He just dropped one marble after another and recorded the distance
from the target.
(2) He dropped the first marble and recorded its location relative to the target. He then moved the
funnel an equal and opposite distance in an attempt to compensate for the error. He continued to
make this type of adjustment after each marble was dropped.
After both strategies were completed, he noticed that the variability of the distance from the target for
strategy 2 was approximately twice as large as for strategy 1. The adjustments to the funnel increased the
deviations from the target. The explanation is that the error (the deviation of the marble’s position from the
target) for one marble provides no information about the error that will occur for the next marble.
Consequently, adjustments to the funnel do not decrease future errors. Instead, they tend to move the
funnel farther from the target.
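A short simulation, assuming independent, normally distributed drop errors in one dimension, illustrates why the compensating adjustments of strategy 2 roughly double the variance of the deviations from the target.

# A simulation sketch of the funnel experiment (simplified to one dimension),
# assuming each drop's error is independent and normally distributed.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
errors = rng.normal(0.0, 1.0, n)  # random deviation of each marble from the funnel position

# Strategy 1: never move the funnel; it stays centered on the target.
dist1 = errors

# Strategy 2: after each drop, move the funnel an equal and opposite distance.
funnel = 0.0
dist2 = np.empty(n)
for i in range(n):
    dist2[i] = funnel + errors[i]  # marble lands at the funnel position plus its random error
    funnel -= dist2[i]             # compensate by moving opposite to the observed deviation

print("Variance, strategy 1:", round(dist1.var(), 2))  # close to 1
print("Variance, strategy 2:", round(dist2.var(), 2))  # close to 2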
This interesting experiment points out that adjustments to a process based on random disturbances can
actually increase the variation of the process. This is referred to as overcontrol or tampering.
Adjustments should be applied only to compensate for a nonrandom shift in the process—then they
can help.
The question of when to apply adjustments (and by what amounts) begins with an understanding of the
types of variation that affect a process. The use of a control chart is an invaluable way to examine the
variability in time-oriented data. Figure 1.13 presents a control chart for the concentration data from Figure
1.9.
The center line on the control chart is just the average of the concentration measurements for the first 20
samples (x̄ = 91.5 g/l) when the process is stable. The upper control limit and the lower control limit are a
pair of statistically derived limits that reflect the inherent or natural variability in the process. These limits
are located 3 standard deviations of the concentration values above and below the center line. If the process
is operating as it should without any external sources of variability present in the system, the concentration
measurements should fluctuate randomly around the center line, and almost all of them should fall between
the control limits.
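As a simplified sketch of how such limits could be computed, the code below takes 20 hypothetical baseline concentration readings, uses their average as the center line, and places the limits three sample standard deviations above and below it. Control-chart practice often estimates the standard deviation from moving ranges instead, so this is only an approximation of the idea described above.

# A simplified sketch of computing a center line and 3-sigma control limits,
# assuming 20 hypothetical concentration readings (g/l) from the stable period.
import numpy as np

baseline = np.array([91.8, 91.2, 91.4, 91.6, 91.5, 91.1, 92.0, 91.4, 91.3, 91.7,
                     91.3, 91.9, 91.5, 91.6, 91.2, 91.8, 91.4, 91.5, 91.1, 91.7])

center_line = baseline.mean()
sigma = baseline.std(ddof=1)     # sample standard deviation of the readings
ucl = center_line + 3 * sigma    # upper control limit
lcl = center_line - 3 * sigma    # lower control limit
print(f"Center line = {center_line:.1f} g/l, UCL = {ucl:.2f}, LCL = {lcl:.2f}")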
In the control chart of Figure 1.13, the visual frame of reference provided by the center line and the control
limits indicates that some upset or disturbance has affected the process around sample 20 because all of the
following observations are below the center line, and two of them actually fall below the lower control limit.
This is a very strong signal that corrective action is required in this process. If we can find and eliminate the
underlying cause of this upset, we can improve process performance considerably. Thus, control limits serve
as decision rules about actions that could be taken to improve the process.
Furthermore, Deming pointed out that data from a process are used for different types of conclusions.
Sometimes we collect data from a process to evaluate current production.
For example, we might sample and measure resistivity on three semiconductor wafers selected from
a lot and use this information to evaluate the lot.
This is called an enumerative study. However, in many cases, we use data from current production to
evaluate future production. We apply conclusions to a conceptual, future population. Deming called this an
analytic study. Clearly this requires an assumption of a stable process, and Deming emphasized that control
charts were needed to justify this assumption. See Figure 1.14 as an illustration.
The use of control charts is a very important application of statistics for monitoring, controlling, and
improving a process. The branch of statistics that makes use of control charts is called statistical process
control, or SPC.