Process Modeling and Optimization Using Focused Attention Neural Networks
Abstract
Neural networks have been shown to be very useful for modeling and optimization of nonlinear and even chaotic processes. However, in using standard neural network approaches to modeling and optimization of processes in the presence of unmeasured disturbances, a dilemma arises between achieving the accurate predictions needed for modeling and computing the correct gains required for optimization. As shown in this paper, the Focused Attention Neural Network (FANN) provides a solution to this dilemma. Unmeasured disturbances are prevalent in process industry plants and frequently have significant effects on process outputs. In such cases, process outputs often cannot be accurately predicted from the independent process input variables alone. To enhance prediction accuracy, a common neural network modeling practice is to include other dependent process output variables as model inputs. The inclusion of such variables almost invariably benefits prediction accuracy, and is benign if the model is used for prediction alone. However, the process gains, necessary for optimization, sensitivity analysis and other process characterizations, are almost always incorrect in such models. We describe a neural network architecture, the FANN, which obtains accuracy in both predictions and gains in the presence of unmeasured disturbances. The FANN architecture uses dependent process variables to perform feed-forward estimation of unmeasured disturbances, and uses these estimates together with the independent variables as model inputs. Process gains are then calculated correctly as a function of the estimated disturbances and the independent variables. Steady-state optimization solutions thus include compensation for unmeasured disturbances. The effectiveness of the FANN architecture is illustrated using a model of a process with two unmeasured disturbances and using a model of the chaotic Belousov–Zhabotinski chemical reaction. © 1998 Elsevier Science Ltd. All rights reserved.

Keywords: Neural networks; Steady-state optimization; Disturbance rejection; Process modeling
purposes such as process understanding, sensitivity analysis, and optimization, accuracy in the process gains (derivatives of outputs with respect to inputs) is also essential. With standard neural network approaches to modeling processes subject to unmeasured disturbances, obtaining accuracy in both the predictions and gains is often not possible. To understand why this is so, consider the following example of a distillation column. Typical process variables in a distillation column are:

- Manipulated variables: reboil steam, reflux.
- Measured disturbance variables: feed flow, column pressure.
- Output (controlled) variables: top and bottom compositions.

In addition, distillation columns are typically instrumented to monitor a number of other dependent variables (which the operators are not interested in controlling directly):

- Dependent (not controlled) variables: overhead temperature, bottom section temperature, reflux temperature.

These process variable categories are shown in Fig. 1.

Unmeasured disturbances such as feed composition, weather, catalyst degradation in reactors, and plant wear can cause outputs to vary despite fixed independent variable settings [14]. If the effects of such disturbances are at all significant, a neural network model containing only the independent process variables as inputs (the manipulated and measured disturbance variables, which contain no information about the disturbances) will be unable to accurately predict the output values. For instance, in the distillation column example, significant unmeasured disturbances would make it impossible to accurately predict the top and bottom compositions as a function of reboil steam, reflux, feed flow, and column pressure alone.

To improve prediction accuracy in such situations, a common neural network modeling strategy is to include dependent variables along with the independent variables as model inputs. In the distillation column example, this would mean adding the temperatures as model inputs in addition to the manipulated variables in order to aid prediction of the top and bottom compositions. Because dependent variables (e.g., the temperatures) reflect the effects of unmeasured disturbances (e.g., feed composition), including dependent variables as inputs to a model ordinarily does improve prediction accuracy of the output. However, because the functional relationship of the independent to the dependent variables is not represented in the model, and because the independent and dependent variables are usually highly correlated, the gains in such models are almost certain to be incorrect. In our example, this means that adding the temperatures to the model inputs will cause the gains of the top and bottom compositions with respect to the manipulated variables to be wrong. Consequently, while models containing dependent variables as inputs may have good ability to predict the output variables, optimization settings and sensitivity analysis are typically inaccurate due to incorrect gains. The Focused Attention Neural Network (FANN) allows steady-state neural network models to obtain accurate predictions and gains in the presence of unmeasured disturbances.
2. The problem

In short, the problem addressed by the FANN architecture is the following dilemma:

1. Information about unmeasured influences which is reflected in dependent process output variables is frequently necessary for neural network process models to have the required prediction accuracy. However,
2. adding dependent variables as model inputs causes the gains for the manipulated variables to be inaccurate.

This dilemma is illustrated in the following example, which is considered in three cases.

2.1. Case 1: no unmeasured disturbances

Assume a noiseless, linear plant¹ with manipulated input u, dependent variable s, and output variable y:

    y = a1 u + s
    s = a2 u
    y = a1 u + a2 u = (a1 + a2) u                    (1)

from which we can compute the process gain of y with respect to u:

    ∂y/∂u = a1 + a2                                  (2)

Since there are no unmeasured disturbances affecting the process, y can be modeled as a function of u only, as shown in Fig. 2. Given sufficient data, the trained neural network model of Fig. 2 will match the process Eq. (1) and yield correct predictions:

    ŷ = (a1 + a2) u                                  (3)

The gain of the model is also correct, matching the process gain Eq. (2):

    ∂ŷ/∂u = a1 + a2                                  (4)

2.2. Case 2: unmeasured disturbances, independent model inputs only

We now alter the process equations of Case 1 by adding a disturbance d to the state s:

    y = a1 u + s
    s = a2 u + d

Eliminating s we have:

    y = a1 u + a2 u + d = (a1 + a2) u + d

which differs from the Case 1 process Eq. (1) by the addition of the disturbance d. The gain equation for this process, on the other hand, is the same as the Case 1 gain Eq. (2):

    ∂y/∂u = a1 + a2                                  (5)

We here consider the same model structure as in Case 1 (Fig. 3). The presence of disturbances makes it impossible to accurately model y as a function of u only. Assuming the disturbance d is zero-mean, the trained neural network model will be identical to the Case 1 model Eq. (3).

¹ A linear plant is chosen for simplicity to illustrate the dilemma. It is shown later how a nonlinear system is modeled.
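Cases 1 and 2 can be checked numerically. The sketch below is not from the paper: an ordinary least-squares fit stands in for the trained neural network (the plant is linear, so this is a fair proxy), and a1 = 2, a2 = 3 are arbitrary illustrative values.

```python
# Numerical check of Cases 1 and 2 on the linear example plant.
# A least-squares fit stands in for the trained neural network.
import numpy as np

rng = np.random.default_rng(0)
a1, a2 = 2.0, 3.0                      # illustrative coefficients
u = rng.uniform(-1.0, 1.0, 1000)      # manipulated input

# Case 1: no disturbance.  y = a1*u + s with s = a2*u, i.e. y = (a1 + a2)*u.
y1 = a1 * u + a2 * u
c1 = np.linalg.lstsq(u[:, None], y1, rcond=None)[0][0]

# Case 2: zero-mean disturbance d added to the state s.
d = rng.normal(0.0, 0.5, u.size)
s = a2 * u + d
y2 = a1 * u + s                       # y = (a1 + a2)*u + d
c2 = np.linalg.lstsq(u[:, None], y2, rcond=None)[0][0]

print(c1)  # ~5.0 = a1 + a2: predictions and gain both correct
print(c2)  # ~5.0: the gain is still correct, but each prediction errs by d
```

Because d is independent of u, the single-input fit still recovers the true slope; only the residuals grow, matching the behavior described for Fig. 3.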
44 J.D. Keeler et al./ISA Transactions 37 (1998) 41–52
Fig. 3. Case 2: unmeasured disturbances exist, and y is modeled as a function of u only. The predictions for y are inaccurate by the amount d. However, the model gain is correct.

Fig. 4. Case 3: unmeasured disturbances exist, and y is modeled as a function of u and s. With this model structure, the predictions for y are accurate, but the gain for u is incorrect.
Table 1
Summary of standard neural network approaches

Summary of the models considered in Section 2, illustrating the dilemma faced by standard neural network approaches when used to model processes subject to unmeasured disturbances.
(*) This case was not considered because a correct model can be obtained with only independent inputs.
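The wrong-gain half of the dilemma (the Case 3 row of Table 1) is easy to reproduce. In this sketch (illustrative, not from the paper; a1 = 2 and a2 = 3 are arbitrary values, and least squares again stands in for the network), adding s as a model input makes the fit essentially exact but reports a gain of a1 for u instead of the true steady-state gain a1 + a2:

```python
# Case 3 sketch: including the dependent variable s as an input yields
# accurate predictions but an incorrect gain for the manipulated input u.
import numpy as np

rng = np.random.default_rng(1)
a1, a2 = 2.0, 3.0
u = rng.uniform(-1.0, 1.0, 1000)
d = rng.normal(0.0, 0.5, u.size)
s = a2 * u + d                 # dependent variable reflects the disturbance
y = a1 * u + s                 # true process: y = (a1 + a2)*u + d

X = np.column_stack([u, s])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coef, 6))       # ~[2, 1]: near-perfect fit, but the apparent
                               # gain of y w.r.t. u is a1, not a1 + a2
```

The model attributes the a2*u contribution entirely to s, because u and s are highly correlated; this is precisely the correlation problem described in Section 1.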
process variables, estimating unmeasured disturbances, and thereby providing accurate predictions and gains. This structure is now described.

3.1. The FANN structure

First, we assume that the vector of outputs y is given by a function, in general nonlinear, of the vector of manipulated variables u and the vector of unmeasured disturbances d. Since d is not measured and cannot serve as a model input, a model of y instead takes the dependent variables s together with u as inputs:

    ŷ = J(u, ŝ)                                      (9)

A model of this form was considered in Case 3 of Section 2, which illustrated the fact that such a neural network model results (given sufficient data) in accurate predictions of y, but in incorrect gains. We now describe how the gains may be made correct without sacrificing prediction accuracy.

We first assume that the dependent process variables s are, like the outputs y, functions of u and d:

    s = f(u, d)

As in the case of y, because disturbances d are present, s cannot be predicted from u alone. We proceed by expanding f as follows:

    s = f(u, d) = g(u) + h(d) + l(u, d)              (10)

Writing D for the combined disturbance terms h(d) + l(u, d), Eq. (10) becomes:

    s = g(u) + D                                     (11)

Substituting Eq. (11) into the output relation y = J(u, s) gives:

    y = J(u, s) = J(u, g(u) + D) = Z(u, D)           (12)

Eqs. (11) and (12) now have the desired functional dependencies: the dependent process variables s and y are expressed as functions of the independent variables u and D. We denote a model representing Eq. (11) as:

    ŝ = ĝ(u) + D̂                                     (13)
Fig. 5. The FANN architecture, described by Eqs. (13) and (14). Model 1 and Model 2 are neural network models. This structure properly models the dependent variables s and outputs y as functions of the two types of independent variables, u (manipulated) and D̂ (estimated effects of unmeasured disturbances).
Thus, estimating s simply requires summing the estimates ĝ(u) and D̂. As is discussed below, ĝ(u) is obtained by training a model from u to s. The computation of D̂ is discussed in the next section.

Equations (9) and (13) together define the FANN model structure. Combining Eqs. (9) and (13) in the manner of Eq. (12), the FANN architecture can be expressed as:

    ŷ = Z(u, D̂)                                      (14)

The FANN model is shown in Fig. 5, where both Model 1 and Model 2 are neural network models.

Model 1 is trained to represent ĝ(u) by training a model with s as outputs and u as inputs. This is analogous to the Case 2 model of Section 2, with s as outputs instead of y. Model 1 will be inaccurate in its predictions of s to the degree that unmeasured disturbances are present, but (given sufficient data) its gains for s with respect to u will be correct. This is precisely what is desired. Model 2 is trained with y as outputs and u and s as inputs (in the training mode, ŝ is identical to s, by definition of D̂, described in the next section).

Thus, the FANN modeling structure correctly treats both u and D̂ as independent variables, and represents the functional dependence of s and y upon those independent variables. During optimization, D̂ is held fixed, and u is adjusted to achieve the desired setpoint for y. (This assumes that the unmeasured disturbances are varying slowly.) By fixing D̂, the optimization takes into account the effects of the unmeasured disturbances.

3.2. Computing D̂

The question remains as to how to compute D̂. Rearranging Eq. (11), we see that D can be given by

    D = s - g(u)                                     (15)

To estimate D with a model D̂, we note that the dependent variable s is measured and that g(u) may be modeled by ĝ(u) as described above. Therefore, D̂ may be computed simply as:

    D̂ = s - ĝ(u)                                     (16)

This relationship is shown in Fig. 6.

Estimation of the effects of unmeasured disturbances allows for accurate predictions as well as correct calculation of process gains as a function of the independent variables and the unmeasured disturbances. These factors enable compensation for unmeasured disturbances during optimization.
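On the linear example plant of Section 2, the whole pipeline of Eqs. (13)-(16) can be sketched in a few lines. This is illustrative only: least-squares fits stand in for Models 1 and 2, and a1 = 2, a2 = 3 are arbitrary values.

```python
# FANN pipeline sketch on the linear example plant: train Model 1 (u -> s),
# form D_hat via Eq. (16), train Model 2 ((u, s_hat) -> y), then read off the
# steady-state gain of y with respect to u at fixed D_hat.
import numpy as np

rng = np.random.default_rng(2)
a1, a2 = 2.0, 3.0
u = rng.uniform(-1.0, 1.0, 2000)
d = rng.normal(0.0, 0.5, u.size)
s = a2 * u + d
y = a1 * u + s

# Model 1: g_hat(u).  Its predictions of s are off by d, but its slope is correct.
g = np.linalg.lstsq(u[:, None], s, rcond=None)[0][0]

D_hat = s - g * u                       # Eq. (16): estimated disturbance effect
s_hat = g * u + D_hat                   # Eq. (13): equals s in training mode

# Model 2: y as a function of u and s_hat.
w, *_ = np.linalg.lstsq(np.column_stack([u, s_hat]), y, rcond=None)

# With D_hat held fixed, u acts both directly and through Model 1, so the
# FANN steady-state gain is dy/du = w[0] + w[1] * g.
gain = w[0] + w[1] * g
print(round(gain, 1))                   # ~5.0 = a1 + a2: the correct gain
```

Holding D_hat fixed in the gain calculation is exactly the optimization-time convention described above; a Case 3 fit on the same data would instead report a gain of a1 = 2 for u.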
Table 2
Summary of neural network approaches including FANN

The FANN architecture compared to the models considered in Section 2. Only the FANN architecture obtains both correct predictions and correct gains in the presence of unmeasured disturbances.
(*) These cases were not considered because a correct model can be obtained with only independent inputs.
Fig. 8. Actual versus FANN model estimates of the unmeasured disturbance d1.

Fig. 10. Actual versus FANN model estimates of the optimized values of the dependent variable s1.
Fig. 12. A portion of the time series of the dependent variable, s, and the output variable, y, from the modulated BZ reaction map.
The system displays bursts of chaotic activity as s increases above 3.4.
and y(t), as shown in Fig. 13. First, we trained a Case 2 neural network (Fig. 3) to model the BZ reaction. As expected, due to the unmeasured disturbance d(t), the model was unable to accurately predict the output y(t), and the accuracy of the model was only r² = 0.90. This indicates that the unmeasured disturbance is significant and needs to be estimated.

Next, we trained a Case 3 neural network (Fig. 4) to model the process, which included the dependent variable s(t) as a model input. As expected, the prediction accuracy improved to a high r² = 0.99, but the gain for u(t) was inaccurate (0.0025 the size of the gain for s(t)), and hence the model was ineffective when used to control the BZ reaction. Fig. 14 shows the extremely poor results of one-step-ahead control to a setpoint of p = 0.63 using the Case 3 model.

Lastly, we trained a FANN model on the BZ reaction. The prediction accuracy was identical to that of the Case 3 model (r² = 0.99). As Fig. 14 shows, control by the FANN model is dramatically better than with the Case 3 model (65% smaller RMS error overall). Because the FANN model correctly represents the behavior of the dependent variable s(t) and the output y(t) in response to changes to the manipulated variable, the gain function for u(t) is correct, and the optimized settings maintain the output y(t) at its setpoint despite the presence of fluctuating unmeasured disturbances.
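The one-step-ahead control loop of Fig. 13(b) can be sketched as follows. This is not the BZ map itself: the plant is replaced by the linear example of Section 2 (a1 = 2, a2 = 3 illustrative), and a simple bisection stands in for the nonlinear programming code. The point is only that, with D̂ held fixed, the optimizer moves u so that the FANN model output lands on the setpoint regardless of the current disturbance estimate.

```python
# One-step-ahead setpoint search in the spirit of Fig. 13(b), on the linear
# example plant rather than the BZ reaction (illustrative coefficients).
a1, a2 = 2.0, 3.0
P = 0.63                          # setpoint, as in the BZ experiment

def fann_predict(u, D_hat):
    """FANN model of the linear plant: y_hat = Z(u, D_hat) = (a1 + a2)*u + D_hat."""
    return (a1 + a2) * u + D_hat

def solve_u(D_hat, lo=-10.0, hi=10.0, iters=60):
    """Bisection stand-in for the nonlinear programming step (the model is
    monotonically increasing in u, so bisection suffices here)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if fann_predict(mid, D_hat) < P:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# A changing disturbance estimate shifts the chosen u, but the model output
# stays on the setpoint:
for D_hat in (-0.4, 0.0, 0.4):
    u_star = solve_u(D_hat)
    print(round(u_star, 3), round(fann_predict(u_star, D_hat), 2))
```

This mirrors the compensation mechanism of Section 3: the disturbance enters only through D̂, so refreshing D̂ from Eq. (16) at each step and re-solving for u rejects slowly varying disturbances.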
Fig. 13. (a) The mathematical structure of the simulation of the BZ plant. The dependent variable s(t) is a function of the manipulated variable u(t) and the unmeasured disturbance d(t). The output at the next time step, y(t+1), depends explicitly only on s(t) and y(t). (b) The FANN model does not receive the unmeasured disturbance d(t) as an input, and only u(t) is modifiable. The system computes optimal values for u(t) using a nonlinear programming algorithm which maintains the output y(t+1) at the setpoint P.
Fig. 14. BZ process dynamics under the one-step-ahead control of the Case 3 neural network model (Fig. 4) and the FANN model (Figs. 5 and 6). For both cases, the setpoint is 0.63, and the control method is a standard non-linear programming code. The process controlled by the Case 3 model differs only slightly from the original uncontrolled process (not shown), even though the manipulated variable (also not shown) takes on its extreme values. This failure is due to the wrong gain for the manipulated variable resulting from the inappropriate model structure. Using the FANN model, in which the gains are correct, results in the controlled output deviating from the setpoint only slightly in the most chaotic region, a vast improvement over both the original dynamics and the Case 3 model control.