Applying Artificial Neural Networks and Virtual Experimental Design to Quality Improvement of Two Industrial Processes
1, 101–118
Artificial neural networks (ANNs) are powerful tools to model the non-linear
cause-and-effect relationships inherent in complex production processes, usually
for process and quality control. This paper substantiates the concurrent application
of ANNs and virtual design of experiments to quality improvement. For a
chemical manufacturing process and a printed circuit board machining process,
respectively, empirical ANN models were constructed and validated using historical
data; the models were then used to predict the outputs of well-designed process
Downloaded By: [Montana State University] At: 19:04 1 September 2009
settings. The predicted results were then used to perform statistical tests and
identify the significant factors and interactions that affect the quality-related
output variables. For the production of a resin intermediate, it was revealed
that the combination of low water concentration and an appropriate ratio of
raw materials increases both the yield and product quality in a synergistic
manner. For the machining of a printed circuit board slot by a milling cutter, it
was concluded that a high forwarding speed was preferred for better quality of
the milled surface. For both cases, the preliminary conclusions point to directions
for further real-world experiments for quality improvement. The data mining
approach integrating ANNs and virtual design of experiments showed great
potential to achieve a better understanding of process behaviour and to improve
the process quality efficiently.
1. Introduction
To improve the quality of any complex manufacturing process, it is important
first to achieve quantitative understanding of process behaviour, such as possible
cause-and-effect relationships inherent in the process. In addition to statistically
designed experiments, great importance is attached to full utilization of historical
data, which were gathered either during the course of production or from ill-designed
experiments. Such data, existing in the data repository of companies, are referred to as
happenstance data, so called because they were not collected for a well-designed
experiment under controlled conditions. However, these happenstance data possess
potential value to quality improvement efforts if they contain information on process
behaviour. Although happenstance data would be better used to ask questions rather
than to reach firm conclusions (Pyzdek 1999), if coupled with appropriate data-mining
approaches, they can be used to model the functional relationships in the process
and thus draw preliminary conclusions indicating the direction for improvement.
International Journal of Production Research ISSN 0020–7543 print/ISSN 1366–588X online © 2004 Taylor & Francis Ltd
http://www.tandf.co.uk/journals
DOI: 10.1080/00207540310001602937
102 X. Shi et al.
real world are usually implicit or complicated, need not be made. Third, ANNs
provide universal approximation functions flexible in modelling linear and non-
linear relationships.
The ANN paradigm adopted in this study was the multilayer feedforward
neural network, of which a typical architecture is shown in figure 1. The nodes in
the input and output layers consist of independent variables and response
variable(s), respectively. One or two hidden layers are included to model the dependency
based on the complexity of relationship(s). For a feedforward network, signals are
propagated from the input layer through the hidden layer(s) to the output layer, and
each node in a layer is connected in the forward direction to every node in the next
layer. Every node simulates the function of an artificial neuron. The inputs are
linearly summed using connection weights and bias terms and then transformed
via a non-linear transfer function.
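The node computation described above can be sketched as follows. This is a minimal illustration, not the authors' program: a logistic transfer function is assumed, and the function names (`neuron_output`, `layer_output`, `feedforward`) are hypothetical.

```python
import math

def sigmoid(x):
    """Logistic transfer function, 1 / (1 + e^(-x)) (assumed here)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias term, then the transfer function."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(net)

def layer_output(inputs, weight_rows, biases):
    """Fully connected layer: every input feeds every node in the layer."""
    return [neuron_output(inputs, w, b) for w, b in zip(weight_rows, biases)]

def feedforward(inputs, layers):
    """Propagate a signal through a list of (weight_rows, biases) layers."""
    signal = inputs
    for weight_rows, biases in layers:
        signal = layer_output(signal, weight_rows, biases)
    return signal
```

With all weights and the bias set to zero, a node returns 0.5 regardless of its inputs, which is a convenient sanity check for the summation and transfer steps.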
For the training of the networks, we adopted the error back-propagation (BP)
algorithm. All the connection weights and bias terms for nodes in different layers are
initially randomized and then iteratively adjusted based on certain learning rules.
For each given sample, the inputs are forwarded through the network until they
reach the output layer producing output values, which are then compared with the
target values. Errors are computed for the output nodes and propagated back to the
connections stemming from the input layer. The weights are systematically modified
to reduce the error at the nodes, first in the output layer and then in the hidden
layer(s). The changes in weights involve a learning rate and a momentum factor and
are usually in proportion to the negative derivative of the error term. It may take a
few thousand rounds of feedforward and error back-propagation
before the predicted output gets very close to the target value. The learning process
is continued with multiple samples until the prediction error across all samples in
the training data is minimized to a reasonable range or stabilized (convergence).
Thereafter, knowledge of the network remains encoded in the refined connection
weights and bias terms, which can be used to recall any trained sample or, if
well generalized, to predict outputs for unseen inputs. For more details, see
Rumelhart et al. (1986) and McAuley (1997).
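The training loop described above can be sketched for a network with one hidden layer and a single output node. This is an illustrative sketch under stated assumptions, not the authors' C program: weights are updated after each sample, the learning rate (0.5), momentum factor (0.7) and epoch count are placeholder values, and all function names are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_network(n_in, n_hidden, rng):
    """Random initial weights; the last entry of each row is the bias term."""
    hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
              for _ in range(n_hidden)]
    output = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return hidden, output

def forward(net, x):
    """Return the hidden activations and the single output value."""
    hidden, output = net
    h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in hidden]
    y = sigmoid(sum(w * v for w, v in zip(output, h + [1.0])))
    return h, y

def train(net, samples, rate=0.5, momentum=0.7, epochs=2000):
    hidden, output = net
    prev_out = [0.0] * len(output)
    prev_hid = [[0.0] * len(row) for row in hidden]
    for _ in range(epochs):
        for x, target in samples:
            h, y = forward(net, x)
            # Error term at the output node (squared error through the sigmoid).
            delta_out = (target - y) * y * (1.0 - y)
            # Error terms at the hidden nodes, propagated back via output weights.
            delta_hid = [delta_out * output[j] * h[j] * (1.0 - h[j])
                         for j in range(len(hidden))]
            # Change = rate * negative gradient + momentum * previous change.
            for j, v in enumerate(h + [1.0]):
                prev_out[j] = rate * delta_out * v + momentum * prev_out[j]
                output[j] += prev_out[j]
            for j, row in enumerate(hidden):
                for i, v in enumerate(x + [1.0]):
                    prev_hid[j][i] = rate * delta_hid[j] * v + momentum * prev_hid[j][i]
                    row[i] += prev_hid[j][i]
    return net

def mean_squared_error(net, samples):
    return sum((t - forward(net, x)[1]) ** 2 for x, t in samples) / len(samples)
```

Inputs are passed as lists of floats, for example `([0.0, 1.0], 1.0)` for one sample. Comparing `mean_squared_error` before and after `train` shows whether the weights have moved towards the targets.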
Design of experiments (DOE) (Hicks and Turner 1999) is a well-established
methodology that enables the analyst to draw inferences or test hypotheses about
an entire population based on a few sampled observations. Usually the advantage of
DOE is to minimize the number of experiments required to determine the effects of
various factors on the response of a system. The data are obtained via a statistical
randomly chose two small sets of data as test data and validation data, respectively,
and used the rest as training data. The test data were used to monitor the performance
of the model during training. Then, the validation data were used to measure
the performance of the trained model. Once the empirical ANN model was validated,
it was used to predict the outputs of well-designed process settings. The predicted
data were further used to perform statistical tests and to identify the significant
factors and interactions that affect the quality-associated output. For the significant
factors, the established mathematical models were also used to construct the three-
dimensional response surfaces. The data mining process is illustrated in figure 2.
With the better understanding of the process behaviour, relevant cause-and-effect
relationships were quantified and the direction for quality improvement was
discussed.
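The random partition of the historical data into training, test and validation sets can be sketched as below. The set sizes and the seed are hypothetical parameters chosen for illustration; `split_samples` is not a name from the original program.

```python
import random

def split_samples(samples, n_test, n_validation, seed=0):
    """Randomly hold out small test and validation sets; the rest is training data."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_validation]
    training = shuffled[n_test + n_validation:]
    return training, test, validation
```

In the terminology used here, the test set monitors performance during training and the validation set measures the trained model afterwards, so the two held-out sets must not overlap with each other or with the training data.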
transfer function and the sum of the mean squared error in the output layer (SMSE)
as the convergence criterion. The training of the networks was performed in batch
mode. The values of learning rate and momentum factor were initialized at 0.9 and
0.7, respectively, and they were automatically adjusted during the training process to
avoid the trap of local minima while maintaining the features of the BP algorithm.
All the data for input and output were normalized based on equation (2), where Xi
and NXi are the i-th value of factor X before and after the normalization, and Xmin
and Xmax are the minimum and maximum of factor X, respectively. The program
was written in C language:
$$f(x) = \left(1 + e^{-x}\right)^{-1} \qquad (1)$$
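Equation (2) itself does not appear in this text. A common min-max form consistent with the stated definitions of Xi, NXi, Xmin and Xmax would be NXi = (Xi − Xmin)/(Xmax − Xmin); the sketch below assumes that form, which may differ from the authors' exact scaling. The function names are hypothetical.

```python
def normalize(values):
    """Min-max scaling to [0, 1] (assumed form of the normalization)."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        raise ValueError("factor has no variation; cannot normalize")
    return [(x - x_min) / span for x in values]

def denormalize(nx, x_min, x_max):
    """Map a normalized value back to the original units of the factor."""
    return x_min + nx * (x_max - x_min)
```

Scaling inputs and outputs to a common range keeps them within the working range of the logistic transfer function; predictions are mapped back with `denormalize`.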
Run no.; Solids (%); Temperature (°C); Water added (ppm); Stoichiometry (%); Reaction time (min); Catalyst (mol%); Analytic yield (%); YI
Table 2. Parameters and performance of the ANN models for the chemical process.
ANNs, virtual experimental design to quality improvement 107
degrees of freedom (d.f.) left for us to investigate the two- and three-way interactions.
While other choices of fractional factorial design such as $3^{6-1}$, $3^{6-2}$ and $3^{6-3}$
were available, we chose the $3^6$ full factorial design because the virtual experiments were
not limited by real resources other than computer time. Owing to the nature of
virtual experiments, there are no uncontrolled factors (noise) contributing to the
responses, and therefore it was not necessary to randomize the order of conducting
the 729 experiments. For the same reason, every combination was unreplicated.
Levels and controllability of reaction parameters used for prediction of the chemical
process are shown in table 3. For the designed (virtual experiment) data, the
responses were predicted using the established ANN models and the resulting virtual
data were used to determine the effects of the investigated factors on analytic yield
and YI.
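The virtual design can be enumerated directly. The sketch below builds a full factorial design for six three-level factors; the coded levels (−1, 0, 1) and the factor identifiers are assumptions for illustration, since the actual settings are those listed in table 3.

```python
from itertools import product

FACTORS = ["solids", "temperature", "water", "stoichiometry",
           "reaction_time", "catalyst"]
LEVELS = (-1, 0, 1)  # coded levels, assumed for illustration

def full_factorial(factors, levels):
    """Every combination of the given levels across all factors."""
    return [dict(zip(factors, combo))
            for combo in product(levels, repeat=len(factors))]

design = full_factorial(FACTORS, LEVELS)
print(len(design))  # 3**6 = 729 unreplicated virtual runs
```

Each dictionary in `design` is one process setting to be fed to the trained ANN model; because the model is deterministic, the run order is immaterial.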
We used SAS to perform the ANOVA procedure, which was applied to test the
statistical hypotheses about the significance of each factor and their two- and three-
way interactions to the dependent variable, analytic yield or YI. ANOVA tables for
the dependent variables, analytic yield and YI, are summarized in table 4. Of key
importance in the tables are the two columns, F ratio and p value, which test the reliability
and efficiency of the models. The F ratio works as a surrogate of the signal-to-noise
ratio, where the noise is the residual error indicating contribution to the
response by higher order terms not included in the three-way ANOVA model. We
may consider the p value as the smallest significance level at which the data are significant; in
other words, it indicates the risk of incorrectly not including the terms affecting the
dependent variable. From table 4, the large F ratios and the small p values suggested
that the models were highly reliable and included the terms significantly affecting
analytic yield and YI.
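To make the signal-to-noise reading of the F ratio concrete, the sketch below computes it for the simplest case of a single factor, as the between-level mean square divided by the residual mean square. This is a deliberate simplification: the models in table 4 include all main effects and two- and three-way interactions and were fitted in SAS, and the function name is hypothetical.

```python
def one_way_f_ratio(groups):
    """F = (between-level mean square) / (residual mean square)."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Signal: variation of the level means about the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Noise: variation of observations about their own level mean.
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

Each inner list holds the responses observed at one level of the factor. Converting the F ratio to a p value additionally requires the F distribution with the two degrees-of-freedom values, which is not implemented here.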
Solids (%); Temperature (°C); Water added (ppm); Stoichiometry (%); Reaction time (min); Catalyst (mol%)
Source; d.f.; Sum of squares; Mean square; F; p
Table 4. Summary of ANOVA tables for the dependent variable analytic yield and YI.
We also obtained the components of the ANOVA tables for analytic yield and
YI, and the detailed results for them are shown in tables 5 and 6, respectively. Any
factor or factorial interaction was considered significant if it had p < 0.05. From
table 5, we can see that all six factors (per cent solids, temperature, water concentration,
stoichiometry of B to A, reaction time and catalyst concentration) were
significant to the analytic yield. This conclusion differs from the results of Taylor
et al. (1997), in which they concluded that only water, stoichiometry and catalyst
were significant. The difference can be partly explained by the fact that we used the
residual error to calculate the F ratios, instead of the ‘pure error’ calculated from the
five replicated trials set to centrepoint settings. In addition, we used the data generated
by the ANN model, instead of the raw data themselves, for the statistical tests,
which enabled us to test all 20 three-way interactions between the six factors,
instead of only the 10 used by Taylor et al. The virtual DOE allowed us to
explore more of the solution space and to reveal a ‘bigger picture’ of the functional
relationships. Nonetheless, if not applied with caution, the modelling process may also
distort the information contained in the data used to construct the model.
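The counts of interaction terms, and the degrees of freedom they consume in an unreplicated design with six three-level factors, can be checked with a short calculation. The factor identifiers below are illustrative labels, not names from the original analysis.

```python
from itertools import combinations

FACTORS = ["solids", "temperature", "water", "stoichiometry",
           "reaction_time", "catalyst"]
N_LEVELS = 3

two_way = list(combinations(FACTORS, 2))
three_way = list(combinations(FACTORS, 3))

# Each term of order k uses (levels - 1)**k degrees of freedom.
df_main = len(FACTORS) * (N_LEVELS - 1)
df_two_way = len(two_way) * (N_LEVELS - 1) ** 2
df_three_way = len(three_way) * (N_LEVELS - 1) ** 3
df_total = N_LEVELS ** len(FACTORS) - 1
df_residual = df_total - df_main - df_two_way - df_three_way

print(len(two_way), len(three_way))  # 15 20
print(df_total, df_residual)         # 728 496
```

The residual degrees of freedom belong to the four-way and higher interactions, which is why the residual error in the three-way model reflects omitted higher order terms rather than experimental noise.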
                 Solids  Temperature  Water added  Stoichiometry  Reaction time  Catalyst
Solids             –          –            –             +              –            –
Temperature                   –            +             +              –            +
Water added                                –             +              –            –
Stoichiometry                                            –              –            –
Reaction time                                                           –            +
Catalyst                                                                             –
+, interaction was significant for both responses at the 1% level; –, otherwise.
Table 7. Symmetric significance matrix of two-way interactions for both analytic
yield and YI.
Because our goal was to produce the intermediate E with a low yellowness index (YI) in
high yield, we also studied the effects of the six factors and their interactions on YI.
From table 6, we concluded that all factors except the per cent solids were significant
to YI. Since the per cent solids factor was involved in some of the significant
interactions, its effect could not be ignored. To achieve the goals simultaneously,
we focused on the interactions that were significant for both analytic yield and YI.
Table 7 shows the symmetric significance matrix of two-way interactions for both
responses at the 1% level. In addition to the six two-way interactions that were significant
for both responses, the three-way interactions of water concentration and
stoichiometry with either temperature or catalyst concentration were found significant for
both responses at the 1% level.
Figure 4. ANN predicted response surface of (a) analytic yield and (b) YI over reaction time
and catalyst concentration, with the per cent solids, temperature, water concentration and
stoichiometry set to 25%, 120 °C, 500 ppm and 100%, respectively.
improves its quality by reducing its YI in a synergistic manner. Using the established
model, we predicted that a zero water concentration and a stoichiometry of 100%
would give a ‘next-to-optimal’ yield of 97.27% and a yellowness index of 1.77.
Better results were expected by introducing the variation of the other four factors
(per cent solids, temperature, reaction time and catalyst concentration), even
though these four factors were found not to affect the responses significantly
when both the water concentration and stoichiometry were set to low levels. With
the knowledge of process behaviour gained, we should be able to design a few
real-world experiments within the neighbourhood of low water concentration and
low stoichiometry to search further for the reaction conditions that optimize both
analytic yield and YI.
Figure 5. ANN predicted response surface of (a) analytic yield and (b) YI over water
concentration and stoichiometry, with the per cent solids, temperature, reaction time and catalyst
concentration set to 25%, 120 °C, 30 min and 10 mol%, respectively.
‘white spots’ appeared on the slot surface, and the white spots were glistening spots
reflected by the roughness of the surface. The quality target was high surface finish of
the PCB slot, which was indicated by the size of burr and white spot at a specific
depth. The smaller the burr and white spot, the better the machining quality of
the slot.
We used unpublished data provided by a company specializing in manufacturing
hard alloy tools, located in Guangdong, P. R. China. The company performed
32 experiments covering different combinations of two state variables, the number of
PCB layers (NL) and depth of the checkpoint on the drilled slot (DP), and three
process variables, spinning speed (V), forwarding speed (F) and milling cutter
diameter (D). The state variables differed from the process variables in that they were
not controllable for a specific PCB. As shown in table 8, the state variables varied
at four different levels and the process variables varied at two different levels.
The responses of interest were the height of the burr (HB), and length (LS) and
width (WS) of the white spot at the checkpoint on the drilled slot. The order of
conducting the 32 experiments was not randomized and the historical data thus were
happenstance data.
For the training of feedforward neural networks, we used the same algorithm as
described in the previous case study and all the data for input and output were
normalized based on equation (2). With the 30 training samples in table 8, we
established three mathematical models, NN-3, NN-4 and NN-5, to quantify the
relationships between the five investigated factors and the response, HB, LS or
WS, respectively. The models were also tested and validated. Table 9 lists the
parameters and performance of the ANN models for the machining process.
From the learning results, we can see that the established ANN models have
good ‘memory’ and the trained matrices of interconnection weights and biases reflect
the hidden functional relationships very well. Because the test and validation errors
of the models were small, the models are reliable for the prediction of burr and white
spot sizes for a PCB slot machined under any other combination of the five state
and process variables, as long as they are within the range investigated.
Table 9. Parameters and performance of the ANN models for the machining process.
process variable at its extreme and centrepoint values, which was a virtual $4^2 \times 3^3$
factorial design yielding 432 different combinations for the five factors. Owing to the
nature of virtual experiments, there are no uncontrolled factors contributing to the
responses, and therefore it was not necessary to randomize the order of conducting
the 432 experiments. That is also the reason why every combination was unreplicated.
For the designed data, the responses were predicted using the established ANN
models and the resulting virtual data were used to determine the effects of the
investigated factors on height of the burr (HB), and length (LS) and width (WS)
of the white spot.
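A mixed-level design of this kind, and the step of predicting a response for every setting, can be sketched as follows. Level indices stand in for the real settings of table 8, and the `models` argument is a hypothetical mapping from response name to a prediction function; neither is taken from the original program.

```python
from itertools import product

# Four levels for each state variable, three for each process variable
# (level indices only; the physical values are not reproduced here).
LEVELS = {"NL": range(4), "DP": range(4),
          "V": range(3), "F": range(3), "D": range(3)}

def mixed_level_design(levels):
    """Every combination of levels, allowing a different count per factor."""
    names = list(levels)
    return [dict(zip(names, combo))
            for combo in product(*(levels[n] for n in names))]

def run_virtual_experiments(design, models):
    """Attach each model's predicted response to every design setting."""
    return [{**setting, **{name: model(setting) for name, model in models.items()}}
            for setting in design]

design = mixed_level_design(LEVELS)
print(len(design))  # 4 * 4 * 3 * 3 * 3 = 432
```

Passing the trained networks for HB, LS and WS as `models` would yield the virtual data set used for the ANOVA; any callable that accepts a setting dictionary can be substituted for testing.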
We used SAS to perform the ANOVA procedure, which was applied to test the
statistical hypotheses about the significance of each factor and their two- and three-
way interactions to the dependent variable, HB, LS or WS. Compared with the
previous case, the F ratios are smaller and the p values larger in the ANOVA tables,
which suggests that the functional relationships between the independent variables
and responses are much more complex in the machining process and that higher order
terms ignored by the three-way ANOVA models may contribute to the responses.
Nevertheless, the three-way ANOVA models can be used to identify the significant factors and
interactions since the p values are under the 5% level. Analysis of the components of
the ANOVA tables indicated that all five factors, number of PCB layers (NL),
depth of the checkpoint (DP), spinning speed (V), forwarding speed (F) and milling
cutter diameter (D), were significant to the three responses. Since our goal was to
produce the PCB slot with low values of not only HB but also LS and WS, we
focused on the interactions that were significant for the three responses simultaneously.
At the 5% level, two two-way interactions were significant for all
the responses: one between the number of PCB layers (NL) and depth
of the checkpoint (DP), and the other between the forwarding speed (F) and milling
cutter diameter (D). In addition, only the three-way interaction between depth of
the checkpoint (DP), forwarding speed (F) and milling cutter diameter (D) was
significant for all the responses at the 5% level.
Figure 7. ANN predicted response surface of (a) burr height (HB), (b) white spot length (LS)
and (c) white spot width (WS) over forwarding speed (F) and milling cutter diameter (D)
at a slot depth of 0 mm in a four-layer PCB, with spinning speed (V) set to 40 krpm.
Figure 8. ANN predicted response surface of (a) burr height (HB), (b) white spot length (LS)
and (c) white spot width (WS) over forwarding speed (F) and milling cutter diameter (D)
at a slot depth of 7.4 mm in a four-layer PCB, with spinning speed (V) set to 40 krpm.
the white spot length and the tendencies vanished when the cutter had a large
diameter and/or was operated at high forwarding speed. While the effects of forwarding
speed and cutter diameter on the white spot width were more complex,
the smallest width could be achieved with a combination of a forwarding speed of
0.48 m/min and a diameter of 1.6 mm. Such a combination gave very low values of
HB and WS simultaneously, whereas the white spot length was still more than
0.09 mm.
Therefore, we concluded that a high forwarding speed was generally preferred to
decrease the sizes of burrs and white spots, namely, to achieve better surface quality
for the machined slot. Although better results were expected by introducing the
variation of the other factors, it was impossible to achieve low values of HB, LS
and WS simultaneously within the investigated ranges. With the knowledge of process
behaviour gained, however, we can design a few real-world experiments with
reasonably high forwarding speed to search further for the process parameters that
minimize the size of both the burr and white spot for a PCB slot under certain state
conditions.
4. Conclusions
For the production of intermediate E, the combination of low water concentration
and low stoichiometry of B to A not only increases the yield of intermediate E
dramatically, but also improves its quality by reducing its yellowness in a synergistic
manner. These conclusions differ from those of the earlier study (Taylor et al. 1997) and give a more
comprehensive picture of the functional relationships inherent in the chemical
manufacturing process.
For the machining of a PCB slot by a milling cutter, a high forwarding speed was
generally preferred to decrease the sizes of burrs and white spots, namely, to achieve
better surface quality of the slot. It was found impossible to achieve low burr heights,
white spot lengths and white spot widths simultaneously within the investigated
ranges, and more real-world experiments are necessary to ensure better quality.
ANNs were successfully applied to model the complex functional relationships
inherent in two industrial processes using existing data, and the established models
were used to predict the quality-associated responses, to identify the significant
factors and interactions, and to achieve better understanding of the process behaviour.
The preliminary conclusions derived from historical data point to directions for
further real-world experiments for quality improvement.
References
ALAUDDIN, M., EL BARADIE, M. A. and HASHMI, M. S. J., 1997, Prediction of tool life in end
milling by response surface methodology. Journal of Materials Processing Technology,
71, 456–465.
BODE, J., 1998, Decision support with neural networks in the management of research
and development: Concepts and application to cost estimation. Information and
Management, 34, 33–40.
CHEN, M. C., 2001, Tolerance synthesis by neural learning and non-linear programming.
International Journal of Production Economics, 70, 55–5.
COIT, D. W., JACKSON, B. T. and SMITH, A. E., 1998, Static neural network process models:
Considerations and case studies. International Journal of Production Research, 36,
2953–1968.
COIT, D. W. and SMITH, A. E., 1995, Using designed experiments to produce robust neural
network models of manufacturing processes. Proceedings of the Fourth Industrial
Engineering Research Conference (IERC), Nashville, TN, USA.
GOODACRE, R. and KELL, D. B., 1993, Rapid and quantitative analysis of bioprocesses using
pyrolysis mass spectrometry and neural networks: application to indole production.
Analytica Chimica Acta, 279, 17–26.
HICKS, C. R. and TURNER, K. V., 1999, Fundamental Concepts in the Design of Experiments,
5th edn (Oxford: Oxford University Press).
HUSSAIN, M. A., 1999, Review of the applications of neural networks in chemical process
control — simulation and online implementation. Artificial Intelligence in Engineering,
13, 55–68.
INAMDAR, M., DATE, P. P., NARASIMHAN, K., MAITI, S. K. and SINGH, U. P., 2000,
Development of an artificial neural network to predict springback in air vee bending.
International Journal of Advanced Manufacturing Technology, 16, 376–381.
KIM, D. J. and KIM, B. M., 2000, Application of neural network and FEM for metal forming
processes. International Journal of Machine Tools and Manufacture, 40, 911–925.
KO, D. C., KIM, D. H. and KIM, B. M., 1999, Application of artificial neural network
and Taguchi method to preform design in metal forming considering workability.
International Journal of Machine Tools and Manufacture, 39, 771–785.
KUSTRIN, S. A., ZECEVIC, M., ZIVANOVIC, L. J. and TUCKER, I. G., 1998, Application of neural
networks for response surface modeling in HPLC optimization. Analytica Chimica Acta,
364, 265–273.
LOU, W. and NAKAI, S., 2001, Application of artificial neural networks for predicting the
thermal inactivation of bacteria: a combined effect of temperature, pH, and water
activity. Food Research International, 34, 573–579.
MCAULEY, D., 1997, The back-propagation network: learning by example
[http://psy.uq.oz.au/brainwav/Manual/BackProp.html].
MYERS, R. H. and MONTGOMERY, D. C., 1995, Response Surface Methodology: Process and
Product Optimization Using Designed Experiments (New York: Wiley).
NESIC, S. and VRHOVAC, M., 1999, A neural network model for CO2 corrosion of carbon steel.
Journal of Corrosion Science Engineering, 1, Paper 6 [http://www.cp.umist.ac.uk/JCSE/].
PYZDEK, T., 1999, Virtual-DOE, data mining and artificial neural networks
[http://www.QualityAmerica.com/knowledgecente/articles/PYZDEKneural.htm].
RUMELHART, D. E., HINTON, G. E. and WILLIAMS, R. J., 1986, Learning internal repre-
sentations by error propagation. In D. E. Rumelhart, J. L. McClelland and the PDP
Research Group (eds), Parallel Distributed Processing (Cambridge, MA: MIT Press),
pp. 318–362.
SCHMITT, J., UDELHOVEN, T., FLEMMING, H. C. and NAUMANN, D., 1998, Stacked spectral data
processing and artificial neural networks applied to FTIR and FT-Raman spectra in
biomedical applications. Proceedings of SPIE 3257, pp. 45–58.
SPEDDING, T. A. and WANG, Z. Q., 1997, Study on modeling of wire EDM process. Journal of
Materials Processing Technology, 69, 18–28.
TAKAHARA, J., TAKAYAMA, K., ISOWA, K. and NAGAI, T., 1997, Multi-objective simultaneous
optimization based on artificial neural network in a ketoprofen hydrogel formula con-
taining O-ethylmenthol as a percutaneous absorption enhancer. International Journal of
Pharmaceutics, 158, 203–210.
TAYLOR, R. W., BARREN, J. P., NICK, R. J. and CAWSE, J. N., 1997, Non-linear effects on yield
and colour for an intermediate in an industrial process. Polymer Testing, 16, 75–89.
WIENKE, D. and BUYDENS, L., 1996, Adaptive resonance theory based neural network for
supervised chemical pattern recognition (FuzzyARTMAP), Part 1: Theory and basic
properties. Chemometrics and Intelligent Laboratory Systems, 32, 151–164.
ZHANG, G., PATUWO, B. E. and HU, M. Y., 1998, Forecasting with artificial neural networks:
the state of the art. International Journal of Forecasting, 14, 35–62.
ZHANG, G. P., PATUWO, B. E. and HU, M. Y., 2001, A simulation study of artificial neural
networks for non-linear time-series forecasting. Computers and Operations Research, 28,
381–396.