Symbolic Incorporation of External Procedures Into Process Modeling Environments
Abstract
Despite the widespread availability of sophisticated, user-friendly process modeling environments, the
use of external procedures within these software packages will be necessary for quite some time. This
paper illustrates the importance of properly handling these external procedures and describes an
automated, symbolic approach for incorporating them correctly into an equation-oriented modeling
environment.
Keywords: ABACUSS II, open systems, physical property and chemical kinetics libraries.
∗ Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
1 Introduction
The value of process modeling within the chemical and biological processing industries today is indis-
putable. There are an increasing number and variety of applications where it is used. For example,
modeling is used in activities such as experimental design, process feasibility studies, process synthesis [1]
and design [2], de-bottlenecking and optimization [3, 4], operator training [5, 6], and real-time
optimization. The benefits gained from proper modeling include improved understanding of the process, safer,
cleaner, and more profitable designs and operations, and reduction in time-to-market for new products.
For these reasons alone, process modeling, simulation, and optimization play a crucial role within the
process industries.
Two main approaches exist for process modeling: the sequential (and simultaneous) modular approach
and the equation-oriented approach. In the modular approach, the user constructs the process flowsheet
model, typically with a user-friendly graphical interface, by connecting blocks corresponding to members
of a library of unit operation model subroutines. The simulator then analyzes the flowsheet structure and
solves the problem with an appropriate algorithm. Substantial research has been performed over the past
several decades in flowsheet analysis and solution strategies. The unit operation model libraries typically
contain collections of subroutines implemented in lower level programming languages such as C or Fortran.
These subroutines take as arguments a relatively small number of parameters (e.g., flowrates, composition,
temperature, and pressure of input streams, unit operation parameters) and return the variables charac-
terizing a relatively small number of output streams. The implementation of these subroutines typically
contains sophisticated tailored solution strategies to ensure convergence to correct results. Information
moves through the flowsheet model in a similar manner to the material flow through the actual process,
from the output of one unit operation to the input of the next, and so on. This unidirectional flow of
information from inputs to outputs makes the modular simulator ideally suited for steady-state solution
of flowsheets with relatively few recycle streams. Flowsheets containing many recycle streams or design
constraints (e.g., specifications on outputs) are often much more difficult to converge.
In the equation-oriented approach, by contrast, the simulator attempts to
solve the entire system of equations involved in the process flowsheet model simultaneously. The overall
system model is typically very large (tens to hundreds of thousands of equations and variables), but
extremely sparse. For example, each equation typically involves only five to ten variables. Having the
entire system of equations simultaneously allows very sophisticated large-scale numerical integration and
optimization codes to be applied to the process flowsheet model. Thus, the equation-oriented approach
is better suited to dynamic calculations, such as dynamic simulation, parametric sensitivity analysis, and
dynamic optimization, as well as to steady-state optimization. When the entire system of equations is in this
open form, that is, all equations and variables are visible to the numerical algorithms, it is not typically
possible to combine subsets of the model equations with custom solution algorithms as is the case in the
modular approach. Consequently, equation-oriented process modeling environments tend to be far less
Many equation-oriented process modeling environments provide high-level declarative input languages
which allow the user to develop their own models in a very natural and intuitive way, much as the modeler
would with a pencil and paper using standard mathematical notation. This model prototyping capability
is extremely important from the standpoint of dynamic calculations. In order to capture properly the
dynamic behavior of a process, very detailed models must be constructed which include descriptions of
vessel geometry and internal arrangement, possible flow reversals and transitions, phase changes, etc.
Since it is obviously not possible to provide a library of unit operation models with sufficient detail for all
processes, the availability of a flexible input language is crucial for dynamic calculations. Furthermore,
most input languages support the ability to describe complex hybrid phenomena such as discrete control
actions, safety interlock systems, and process disturbances. This capability enables activities such as
the development of startup and shutdown procedures [7], disturbance studies, operator training, and
pling between model description and model solution. For example, the same model, written in the input
language of the process simulator, may be used for a variety of computations such as simulation, param-
eter estimation, sensitivity analysis, and optimization during the life of the process, from initial process
development to process decommissioning [8]. As experience with the process grows, additional insight
and detail may be incorporated into the model. Hence, the model becomes a repository of knowledge for
the process.
The ability to prototype efficiently detailed models is important within many emerging fields where
new and innovative processes require custom models. Some of these fields are biotechnological processes,
biomedical devices, specialty polymer and other advanced materials design, and microprocess systems.
The model development involved in these emerging areas must be performed by experts in the field, who
are often not experts in process modeling and the subsequent calculations. Consequently, the availability
of sophisticated, user-friendly model prototyping tools offers several benefits, such as rapid model
development, model analysis tools for debugging, automated application of advanced numerical algorithms, and
visualization of results.
2 Motivation
Although the input languages of modern equation-oriented modeling environments enable quite general
models to be constructed, there are limitations. For example, the input languages of these modeling envi-
ronments are ideal for describing large scale systems of differential-algebraic equations (DAEs) or partial
differential-algebraic equations (PDAEs) where every equation and variable should be made accessible
symbolically to the numerical engine. As alluded to above, this eliminates the possibility of combining so-
phisticated solution algorithms for specific portions of the model that are difficult to converge, expensive
to compute, or exhibit multiple solutions. For example, a subset of the model equations may compute
vapor phase partial molar volumes from a cubic equation of state. A specific numerical algorithm should
be applied for this subtask, ensuring the calculation always converges to the correct vapor phase root.
Also, a user may wish to apply an inside-out flash calculation to compute vapor-liquid equilibrium (VLE) robustly. For these
reasons and others, most modern equation-oriented input languages provide the user with the ability to
incorporate external procedures coded in programming languages such as C or Fortran into the overall
process model.
There are a number of reasons why a user would want to use external procedures within a process
model in addition to being able to incorporate custom numerical algorithms. One reason is to make use of
third party libraries such as physical property or chemical kinetics packages, which are usually available
as subroutine or procedure libraries. Another is the fact that many organizations have vast amounts of
legacy codes (typically written in Fortran) that contain proprietary or classified information. Requiring
the user to recode the equations in the existing code in the form of the input language is tedious, error
prone, and often not possible. Furthermore, many of these existing codes have been well-tested, validated,
and are trusted. Consequently, they should be used “as is” whenever possible.
When model equations are written in the input language of the modeling environment, the simulator
is able to analyze these equations and construct analytical derivatives, sparsity patterns, and essentially
any other symbolic information that may be exploited during the subsequent numerical calculation.
However, with current technology external procedures are generally evaluated as “black-boxes” within
the simulator. That is, given a set of independent variables, the external procedure is called by the
simulator merely to calculate values for the corresponding dependent variables. The implication of this
is that, in general, necessary partial derivatives are approximated using finite differences, sparsity is not
exploited, and any discontinuities in the equations are not handled explicitly. All of these issues may
significantly impact the performance or even success of the numerical calculation. For example, it is well-
known that computing partial derivatives using finite differences is both inaccurate and expensive (if the
full Jacobian matrix is desired and no structural information is available). This is particularly important
during parametric sensitivity analysis and chemical kinetics simulations, where derivative calculation can
contribute significantly to the overall cost of the calculation. Similarly, without knowledge of how the
independent variables influence the dependent variables, the simulator must assume that every dependent
variable is a function of every independent variable. Consequently, the blocks of the overall Jacobian
matrix (i.e., Jacobian matrix of the entire process model) corresponding to external procedures will be
completely dense. This has two implications. First, sparsity is not exploited during LU factorization.
This can substantially affect performance since it is the presence of sparsity that enables equation-oriented
process simulators to solve large scale systems of equations efficiently. Second, even if the problem is
relatively small or matrix-free algorithms are employed during numerical solution, not having the sparsity
pattern for the subsets of equations corresponding to external procedures substantially reduces the utility
algorithms are applied to the process model by equation-oriented simulators, providing information such
as whether or not the model is well-posed, high index, etc. Equations contained in external procedures
may introduce these problems into the overall process model and by not using the exact sparsity pattern
during the structural analysis, the user may not get appropriate diagnosis of errors, other than failure
of the subsequent numerical calculation. Finally, when external procedures are treated as “black-boxes”
any discontinuities contained in the external code will not be revealed to the numerical algorithm. These
discontinuities are quite common and are the result of nonsmooth intrinsic functions such as MIN, MAX,
and ABS appearing in the code, in addition to the more obvious IF statements. The consequences of
not explicitly handling these discontinuities during numerical integration are well-known [10, 11] and
typical results are inefficient calculations and/or integration failures. The situation is much worse for
parametric sensitivity analysis where failure to handle the discontinuities properly will often lead to
quantitatively and qualitatively incorrect results being computed without any warning to the user [12, 13].
The importance of proper handling of external procedures is probably most compelling with the use of
physical property libraries. Calls to external physical property subroutines can account for as much as
90% of the overall computational cost of a process flowsheet calculation and the discontinuities embedded
in these subroutines are the cause of as much as 90% of the numerical integration failures during dynamic
simulation.
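The cost and accuracy argument made above for finite differences can be illustrated with a small sketch. The Python code below treats a model as a black box and approximates its Jacobian by forward differences. The test function f, the step sizes, and all names are illustrative assumptions and are not part of ABACUSS II or any physical property package. Each Jacobian costs one call per independent variable plus one base call, every entry has to be treated as potentially nonzero, and the error first shrinks and then grows again as the step size is reduced.

```python
import math

def f(x):
    """Hypothetical black-box model with 3 independents and 2 dependents."""
    return [x[0] * x[1], math.exp(x[2])]

def exact_jacobian(x):
    """Hand-coded derivatives of f, used only as a reference."""
    return [[x[1], x[0], 0.0],
            [0.0, 0.0, math.exp(x[2])]]

def fd_jacobian(func, x, h):
    """Forward-difference Jacobian; returns (J, number of calls to func)."""
    calls = 1
    y0 = func(x)
    J = [[0.0] * len(x) for _ in y0]
    for j in range(len(x)):
        xp = list(x)
        xp[j] += h          # perturb one independent variable at a time
        yp = func(xp)
        calls += 1
        for i in range(len(y0)):
            J[i][j] = (yp[i] - y0[i]) / h
    return J, calls

def max_error(J, Jref):
    """Largest absolute entrywise difference between two Jacobians."""
    return max(abs(a - b) for ra, rb in zip(J, Jref) for a, b in zip(ra, rb))

x = [1.5, -2.0, 0.5]
Jref = exact_jacobian(x)
errors = {}
for h in (1e-2, 1e-8, 1e-15):
    J, calls = fd_jacobian(f, x, h)
    errors[h] = max_error(J, Jref)
    print(f"h = {h:g}: calls = {calls}, max error = {errors[h]:.2e}")
```

In this sketch the large step suffers from truncation error and the very small step from roundoff, so no single step size recovers the exact derivative values; the black box also gives no indication that half of the Jacobian entries are structurally zero.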
The difficulty of combining third-party codes properly into process modeling environments has been
recognized for many years. One approach is to incorporate external codes as software components via
technologies such as OMG’s CORBA [14] and Microsoft COM [15]. For example, the CAPE-OPEN
committee [16] has developed interface standards that will enable integration between a wide variety of
process modeling components, provided they share the standardized interfaces and the process modeling
environment implements the necessary component architecture. Unfortunately, an end user desiring to
integrate their external code using this approach will have to implement the necessary interfaces. For
example, the standard interface for the model components used within equation-oriented process modeling
environments, the Equation Set Object (ESO) [17, 16], requires the user to provide numerical values of
the partial derivatives, sparsity patterns, and an explicit form of the state transition network [18, 19]
corresponding to the discontinuous equations in the model. This is not only a burden (e.g., there can
be a combinatorial number of modes in the state transition network), but requires the end user to be
familiar with component-oriented programming and all of its pitfalls [20]. Lastly, although component-
based technologies enable integration between different components, there are communication overheads
This paper describes an alternative approach where automated code analysis and code transformation
techniques are used to incorporate properly external procedures automatically into an equation-oriented
modeling environment. In this approach, the user’s code is analyzed automatically and new code is
generated which is compiled and linked into the process simulator, providing all of the necessary symbolic
information that would otherwise be neglected. In fact, all of the symbolic information that is available
to the numerical algorithms for the subset of equations that are written in the input language of the
simulator also becomes readily available for the equations corresponding to external procedures, provided
source code is present. This enables the modeler to use external procedures with the confidence that
the subsequent calculation will be performed efficiently, robustly, and correctly. Furthermore, these
techniques, which will be described in detail below, enable the user to apply a much broader class of
numerical algorithms to models embedding external codes. For example, the automated code analysis
and code generation techniques can also be used to generate code for evaluating convex relaxations of
nonlinear functions [21] and interval extensions of the process model [22], allowing activities such as
robust solution of nonlinear systems of equations, global optimization, and nonconvex MINLP. Another
advantage of the correct incorporation of external procedures is in applications where speed is crucial (e.g.,
online applications). Most modern process modeling environments employ an interpretive architecture
for evaluating model equations and derivatives. That is, the model equations and some symbolic form
of the partial derivatives are held in computer memory as data structures which are “interpreted” to
provide values. It is well-known that interpreted evaluation can be as much as an order of magnitude
slower than compiled evaluation. Although this does not typically imply the overall calculation is an
order of magnitude slower, the speedup can be significant by using optimized, compiled external code,
particularly in applications such as parametric sensitivity analysis, dynamic optimization, and stochastic
optimization.
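To give a feel for what an interval extension of a model provides, the following Python sketch evaluates the same model code on point values and on intervals. It is only an illustration under stated assumptions: the Interval class and the function model are hypothetical, outward rounding is ignored, and operator overloading is used here for brevity, whereas the approach described in this paper generates new source code from the original procedure.

```python
class Interval:
    """Closed interval [lo, hi]; outward rounding is ignored in this sketch."""

    def __init__(self, lo, hi=None):
        self.lo = lo
        self.hi = lo if hi is None else hi

    @staticmethod
    def _wrap(v):
        # Promote plain numbers to degenerate intervals.
        return v if isinstance(v, Interval) else Interval(v)

    def __add__(self, other):
        o = Interval._wrap(other)
        return Interval(self.lo + o.lo, self.hi + o.hi)
    __radd__ = __add__

    def __sub__(self, other):
        o = Interval._wrap(other)
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def __mul__(self, other):
        o = Interval._wrap(other)
        p = [self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi]
        return Interval(min(p), max(p))
    __rmul__ = __mul__

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

def model(u, v):
    """Hypothetical model code; works on floats and on Intervals alike."""
    return u * v - 2.0 * u

point = model(1.5, 3.0)
box = model(Interval(1.0, 2.0), Interval(2.0, 4.0))
print("point value:", point)
print("interval enclosure:", box)
```

The enclosure is guaranteed to contain every point value of the model over the box of inputs, although it may overestimate the true range (here u appears twice, so the result is wider than the exact range). Enclosures of this kind are what interval Newton and global optimization algorithms require.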
The ideas developed in this paper are implemented in two software packages, ABACUSS II and
DAEPACK. The following two sections elaborate on the features of these software packages. This is
followed by example problems illustrating the importance of proper incorporation of external procedures
3 ABACUSS II
The equation-oriented modeling environment employed to demonstrate the ideas presented in this paper
is ABACUSS II [23]. Similar to other equation-oriented modeling environments, such as gPROMS [9],
Aspen Custom Modeler, DIVA [24], and ASCEND [25], ABACUSS II provides an intuitive, high level
declarative input language with which the user may describe the process model (see syntax manual in
[23] for details). ABACUSS II also allows the user to formulate some (or all) of the process model using
The algorithmic (or automatic) differentiation (AD) literature (e.g., [26]) assumes that the code to
be differentiated operates in the following manner. A collection of independent variable values are taken
as arguments to the code, and the code evaluates the values of a collection of dependent variables as
a function of these independent variables. This very general model for the operation of a code is also
adopted for external procedures interfaced to ABACUSS II. If we denote the set of independent variables
by $x \in \mathbb{R}^{n_x}$ and the set of dependent variables by $y \in \mathbb{R}^{n_y}$, a reference to an external procedure within
an ABACUSS II model amounts to inserting the following $n_y$ equations in the overall process model:
$$y = f(x) \qquad (1)$$
where $f : \mathbb{R}^{n_x} \to \mathbb{R}^{n_y}$ is the function evaluated by the external code. Note that the number of equations
We will use the following simple Fortran subroutine that just multiplies two numbers and returns the
result:
      SUBROUTINE MULT(A,B,C)
      C = A * B
      RETURN
      END
In this example, A and B are the independent variables, and C is the dependent variable. It is first
necessary for the user to communicate this information in the ABACUSS II input language, which is
done as follows:
EXTERNAL MULT(INDEPENDENT,
              INDEPENDENT,
              DEPENDENT) ;
END
This declares that there is an external procedure identified by MULT that has three arguments, the first
two of which are independent variables, and the last being a dependent variable. Since any argument
may be an open dimension array, the actual number of equations implied by the external procedure is
Once the EXTERNAL block has been introduced, it is possible to employ it to define equations within
a MODEL. For example, the following ABACUSS II input:
MODEL Example1
  VARIABLE
    U,V,W AS NOTYPE
  EQUATION
    MULT(U,V,W) ;
END
inserts the equation
$$w = uv \qquad (2)$$
into the process model. On the other hand, the following ABACUSS II input:
MODEL Example2
VARIABLE
U,V AS NOTYPE
EQUATION
MULT(U^2,V,0) ;
END
inserts the equation
$$u^2 v = 0 \qquad (3)$$
into the process model. It should be noted that any of the arguments in the reference to the external
procedure may be expressed as a function of the MODEL variables, including constants as illustrated in
Often, an external procedure will correspond to the residual evaluator for a system of equations, the
residual being zero at a root of the equations. For example the subroutine:
SUBROUTINE RES(NZ,T,Z,ZPRIME,DELTA)
INTEGER NZ
DELTA(1) = ...
etc.
RETURN
END
EXTERNAL RES(INTEGER,
INDEPENDENT,
INDEPENDENT,
INDEPENDENT,
DEPENDENT) ;
END
MODEL Example3
PARAMETER
NC AS INTEGER
VARIABLE
X AS ARRAY(NC) OF NOTYPE
EQUATION
RES(NC,TIME,X,$X,0(1:NC)) ;
END
into the overall process model where $f : \mathbb{R} \times \mathbb{R}^{n_x} \times \mathbb{R}^{n_x} \to \mathbb{R}^{n_x}$. Note that the number of equations
introduced is inferred from the number of dependent variables. In this case, the dependent variables are
the zero vector, so it is necessary to associate a dimensionality with this constant in order to infer the
As stated above, a common situation where external procedures are employed is for computing physical
properties. In the example below, a call is made to an external physical property routine to compute the
equilibrium K-values. In the MODEL block in Figure 1, the array equation Y=K*X corresponds to NC
DO I := 1 TO NC DO
END.
The next equation is the reference to the external procedure. This line in the input file defines NC
SIGMA sums the entries of an array. That is, the equation SIGMA(X) = 1 corresponds to the equation:
$$\sum_{i=1}^{N_c} x_i = 1$$
EXTERNAL KVAL(integer          # Number of components
              dependent,       # Array of K-values computed in this routine
              independent,     # Temperature
              independent,     # Pressure
              independent,     # Array of liquid mole fractions
              independent,     # Array of vapor mole fractions
              workspace double # Real workspace required by this routine
              ) ;
MODEL VLE
  # ABACUSS II model computing vapor-liquid equilibrium
  PARAMETER
    NC AS INTEGER
  VARIABLE
    Temp AS Temperature
    Pres AS Pressure
    X AS ARRAY(NC) OF MoleFraction
    Y AS ARRAY(NC) OF MoleFraction
    K AS ARRAY(NC) OF PositiveValues
  EQUATION
    # VLE
    Y = K * X ;
    SIGMA(X) = 1 ;
    SIGMA(Y) = 1 ;
END
Figure 1: External declaration and ABACUSS II input file excerpt for the computation of VLE using an
external physical property subroutine.
The external procedure KVAL in Figure 1 has seven arguments. The first is an integer parameter
corresponding to the number of components present (the integer keyword indicates this argument is
simply a parameter and not a model variable). The second argument, an array of K-values, is designated
as the dependent variables, that is, these values are computed from the independent variables by the
external procedure. The third through sixth arguments are independent variables corresponding to
temperature (a scalar), pressure (a scalar), and liquid and vapor composition (both arrays). The final
argument enables the user to pass workspace to an external procedure; the user simply specifies the size
of the workspace required by the routine. The external subroutine called has the interface shown below.
SUBROUTINE KVAL(N,K,T,P,X,Y,W)
INTEGER N
END
The subroutine computing the K-values may be arbitrarily complex and call any number of additional
subroutines and/or functions. What is important is that given values for the independent variables
temperature, pressure, and the vapor and liquid mole fractions, the code return the corresponding values
The interface to external procedures used in ABACUSS II models is completely arbitrary provided
all arguments are native types (e.g., integers, reals, and characters) and the external procedure may be
implemented in any compiled programming language (e.g., C/C++, Fortran, and Pascal). The imple-
The key novelty in this paper is that, if source code is available for this subroutine, ABACUSS II will
use DAEPACK to automatically construct all of the additional symbolic information it needs in order to
perform the subsequent numerical calculation. DAEPACK is described in detail in the following section.
The remainder of this section outlines the architecture of the ABACUSS II software.
The architecture of ABACUSS II has been designed to enable high levels of flexibility in several
areas, including how the software is interfaced, how the software is deployed and in what environment,
how the process model is described, and what numerical algorithms are applied to the process model.
[Figure 2: Schematic of the ABACUSS II architecture, showing the Input Layer, the Middle Layer (Embeddable Process Simulator), and the Bottom Layer.]
Figure 2 contains a schematic of the three-layer architecture of ABACUSS II. The top layer is the
interface level, where the user has access to the functionality of ABACUSS II to perform the desired analyses
and calculations. The ABACUSS II distribution provides both a command-line interface (CLI) and a
graphical user interface (GUI). In addition, ABACUSS II exports a set of documented interfaces that
may be accessed in any number of third-party applications including Microsoft Excel and Matlab. The
exported interfaces also allow the user to readily construct custom GUIs using Microsoft Visual Basic
or Java, for example. Lastly, the full functionality of ABACUSS II is available within programs written
by the user in programming languages such as C, C++, or Fortran. The middle layer of ABACUSS II
is the input translator and calculation executive. This layer implements the interfaces exported to the
top layer. This level includes features such as input file translation, model symbolic analysis, and the
calculation executive. This level also provides a number of visualization tools for examining information
such as model structure and numerical results. This layer is available in several formats which allows
(.so in UNIX/Linux and .dll in Windows) and C++ object for using the software locally. The software
may also be executed remotely as a CORBA or DCOM component or using XML-RPC. The bottom layer
provides numerical algorithms and the ability to incorporate external procedures. The external procedures
incorporated may be either portions of an overall model or user-supplied numerical components. The
ability to incorporate properly external procedures, the focus of this paper, is provided by the software
4 DAEPACK
The symbolic techniques used to incorporate external procedures into an equation-oriented modeling en-
vironment have been implemented in the software library DAEPACK, a general purpose software library
for numerical calculations [27, 28]. DAEPACK is divided into two main sections, one containing a collec-
tion of numerical components for performing calculations such as numerical integration and parametric
sensitivity analysis, and the other containing a set of symbolic components which construct automatically
all of the symbolic information required by the numerical components. DAEPACK provides both the
standard numerical functionality of ABACUSS II and the ability to incorporate properly external codes
into an ABACUSS II model. Currently, the symbolic components of DAEPACK support only Fortran
source code; however, the ideas may be readily extended to any procedural programming language.
The symbolic components of DAEPACK have emerged from ideas originally developed in the algo-
rithmic, or automatic, differentiation (AD) community. AD is a technique for computing exact derivative
values (to within roundoff error) for functions implemented in the form of codes written in some program-
ming language. Derivative values are obtained by simply decomposing the computer code into sequences
of elementary operations for which partial derivatives are known symbolically and applying the chain-
rule. It is important to note that AD does not furnish symbolic expressions for the derivatives; rather, it
furnishes values for the partial derivatives of the dependent variables with respect to any desired values
for the independent variables. How the chain-rule is applied gives rise to several variants of AD, each
of which is appropriate under different circumstances. A full description of AD is beyond the scope of
this paper; excellent descriptions may be found in [26, 29, 30, 31], and a description of chemical
engineering applications can be found in [32]. What is important to emphasize, however, is that AD
can be quite efficient, general, and automated. For example, using the appropriate variant of AD, the
cost of evaluating gradients and general vector-Jacobian and Jacobian-vector products is only a small
multiple of the cost of evaluating the underlying function evaluation code, independent of the number of
variables or functions involved. If the full Jacobian is desired, then sparsity may be exploited to reduce
the cost of Jacobian evaluation in a number of ways [33, 34, 27, 35]. AD is also quite general. The
underlying functions need not simply be implemented as a sequence of assignments but can be arbitrarily
complex code including common blocks, loops, IF statements, and complex hierarchies of subroutines
and functions. Moreover, AD is more properly termed algorithmic differentiation (see [26]) because the
code may contain complex iterative solution algorithms (e.g., computing partial molar volumes from an
equation of state using an iterative algorithm). Finally, the application of AD can be completely auto-
mated given relatively little information about the original code. Of course improved performance (both
in terms of memory usage and computational complexity) can be achieved through careful application of
Two main approaches exist for applying AD to computer codes: the operator overloading approach and
the source-to-source translation approach. The former uses the operator overloading capabilities of many
modern programming languages (e.g., C++, Fortran-90, and Pascal-SC) to have the compiler generate
the additional instructions required for computing the derivative values. Some AD implementations
applying this technique are [36, 37]. The latter approach relies on the use of compiler technologies to
generate new code for computing the derivative values. That is, given a code for evaluating a function,
source-to-source AD tools will generate new code for evaluating the derivative values. This new code
is compiled and linked with the application requiring the derivative values. Some AD implementations
described above to generate from a function evaluator code a wide variety of additional codes required
when applying state-of-the-art numerical algorithms to the model. DAEPACK provides a component that
constructs derivative code for evaluating gradients of scalar functions and vector-Jacobian and Jacobian-
vector products of vector functions. In addition, if the model is sparse, DAEPACK can be used to
generate code exploiting this sparsity to compute sparse Jacobian matrices with surprising efficiency (see
Example section below). DAEPACK may also be used to generate code that determines the sparsity
pattern of the code for use by sparse linear solvers, in block triangularization, and structural diagnosis. If
the original code contains nonsmooth intrinsic functions or IF statements, then DAEPACK can generate
new code which allows these discontinuities to be handled properly during numerical integration [41] and
parametric sensitivity analysis [13]. DAEPACK can also be used to generate new codes that evaluate the
interval extension of the original code and convex relaxations of nonlinear functions in the original code.
This enables the user to apply algorithms such as interval Newton/Generalized bisection [22] and global
[Figure 3: DAEPACK automatic code generation. Fortran source code for external procedure evaluation (File1.for, with routines EXT, SUB1, SUB2, FUNC1, and FUNC2 shown) passes through translation and code generation to produce new Fortran source code for hybrid calculations. Labels in the figure include "locked" model evaluation, discontinuity function evaluation, and sparsity pattern determination; generated routine names shown include EXTDL and EXTDL_SP.]
Figure 3 contains a schematic showing the steps performed by DAEPACK to transform the user’s
original code into a set of new codes providing a wide variety of information. In this example, the user
must simply provide the collection of Fortran source codes defining the external model and a DAEPACK
specification file. The specification file contains information such as what code is to be generated, which
are the independent and dependent variables, etc. ABACUSS II is able to generate automatically this
specification file from the ABACUSS II input file containing the declaration of the external code. In this
figure, six new Fortran codes are generated from the original code, providing all of the information for
performing hybrid parametric sensitivity analysis [13].
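The role of two of the generated codes, the "locked" model evaluation and the discontinuity function evaluation, can be illustrated with a minimal Python sketch. It is not the code DAEPACK generates: the functions rhs, rhs_locked, and disc, the two-mode model, and the use of explicit Euler with bisection in place of a production integrator with event location are all assumptions made for illustration. The idea shown is that the integrator evaluates the model with its branch held fixed, monitors the discontinuity function for a sign change, locates the event, and only then switches branch.

```python
def rhs(z):
    """Original model code: the IF statement hides a discontinuity."""
    if z <= 1.0:
        return 1.0
    return 0.25

def rhs_locked(z, mode):
    """'Locked' model: the branch is selected by mode, not by the value of z."""
    return 1.0 if mode == 0 else 0.25

def disc(z):
    """Discontinuity function: its sign change marks the event."""
    return z - 1.0

def mode_of(z):
    return 0 if disc(z) <= 0.0 else 1

def euler_black_box(z, t_end, h):
    """Explicit Euler on the original code, unaware of the switch."""
    t = 0.0
    while t < t_end - 1e-12:
        step = min(h, t_end - t)
        z += step * rhs(z)
        t += step
    return z

def euler_with_events(z, t_end, h, tol=1e-12):
    """Explicit Euler on the locked model with event location by bisection."""
    t = 0.0
    mode = mode_of(z)
    while t < t_end - 1e-12:
        step = min(h, t_end - t)
        z_new = z + step * rhs_locked(z, mode)
        new_mode = mode_of(z_new)
        if new_mode != mode:
            # Shrink the step until the sign change is tightly bracketed.
            lo, hi = 0.0, step
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if mode_of(z + mid * rhs_locked(z, mode)) == mode:
                    lo = mid
                else:
                    hi = mid
            step = hi
            z_new = z + step * rhs_locked(z, mode)
            mode = new_mode
        z, t = z_new, t + step
    return z

z_black_box = euler_black_box(0.0, 3.0, 0.3)
z_events = euler_with_events(0.0, 3.0, 0.3)
print(f"black box: {z_black_box:.6f}, with event location: {z_events:.6f}, exact: 1.5")
```

In this sketch the black-box integration steps over the switch at t = 1 and carries the wrong rate for part of a step, while the version using the locked model and the discontinuity function stops at the event and recovers the exact answer. With an adaptive integrator the same oversight typically shows up as repeated error test failures and step size reductions instead of a silent error.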
Figure 4 shows how DAEPACK is used in conjunction with ABACUSS II to incorporate
external codes properly into an overall process model. In this example, the ABACUSS II input file contains a
reference to an external subroutine EXT, for which the user has source code. Given the location of this
source code and a description of the independent and dependent variables (from the declaration of EXT
in the ABACUSS II input file), ABACUSS II calls DAEPACK to generate all of the code it needs to
perform the subsequent calculation robustly, efficiently, and correctly. The code generated by DAEPACK
is compiled and placed into a shared library, which ABACUSS II loads dynamically before
executing the calculation.
To summarize this section, all of the additional symbolic information required by ABACUSS II for
proper numerical calculation (e.g., sparsity pattern, analytical derivative values, and discontinuity
information) can be obtained for external code with DAEPACK (provided the source code is available
and written in Fortran). Thus, all equations, regardless of whether they are written in the ABACUSS
II input language or available as external code, are treated in a consistent and correct manner within
ABACUSS II. The following section contains several example problems illustrating the importance of
5 Examples
This section contains several example problems illustrating the importance of proper incorporation of
The first example consists of three (very) small DAEs containing discontinuities. Each of these DAE
systems was coded into a Fortran subroutine and incorporated into ABACUSS II in two ways: 1) as
a “black-box” and 2) with additional symbolic information provided by DAEPACK. The first example,
case A, is:
\begin{equation}
\dot{x} = 1, \qquad
y = \begin{cases} x^{k_1} & \text{if } x \le 0 \\ x^{k_2} & \text{otherwise} \end{cases}
\tag{5}
\end{equation}
See top diagram in Figure 5. The integration statistics are shown in the first two columns of Table 1. As
might be expected from the continuity at the event, there is no significant difference between the two
cases, although explicit handling of the event does reduce the number of error test failures. The second
example, case B, is:
\begin{equation}
\dot{x} = 1, \qquad
y = \begin{cases} x^{k_1} & \text{if } x \le 0 \\ \alpha x & \text{otherwise} \end{cases}
\tag{6}
\end{equation}
See center diagram in Figure 5. The integration statistics for this example are shown in the second two
columns of Table 1. In this example, there is a significant difference between the case where the external
procedure is incorporated as a “black-box” and the case where it is not. The third example, case C, is:
\begin{equation}
\dot{x} = 1, \qquad
y = \begin{cases} x^{k_1} & \text{if } x \le 0 \\ \alpha + x^{k_2} & \text{otherwise} \end{cases}
\tag{7}
\end{equation}
See bottom diagram in Figure 5. In this example, the numerical integration fails at the event when the
external procedure is treated as a “black-box”. Although these three simple DAEs are small, they illustrate the
importance of properly handling external procedures containing discontinuities. In the second example,
the “black-box” mode required nearly twice as many residual evaluations, Jacobian evaluations, and
LU factorizations. If these equations were part of a much larger overall model, this additional work
could be quite significant. In the third example, if these equations were part of a much larger overall
model, they would probably cause a numerical integration failure and it would be very difficult to isolate
Table 1: Integration statistics for three simple discontinuity examples. Steps = number of integration
steps performed. RES = number of model residual evaluations required. JAC = number of Jacobian
evaluations required. ETF = number of integration error test failures. CTF = number of integration
              Case A                   Case B                   Case C
       “black-box”  DAEPACK     “black-box”  DAEPACK     “black-box”  DAEPACK
JAC        52         50            89         48           FAILS        38
ETF        13          8            25          4           FAILS         2
CTF         0          0             0          0           FAILS         0
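
The difference between the three cases at the event can be made concrete with a short script. The following is a minimal Python sketch, not the Fortran code used in this study: the exponents k1 = 2 and k2 = 3, the value α = 1, and all function names are assumptions chosen for illustration. Each case is evaluated with the IF branch held fixed (a "locked" evaluation), the event is located from the sign change of the discontinuity function g = x, and the jump in y and in dy/dx across the event is reported.

```python
# Illustrative parameters (assumptions, not values from the paper).
K1, K2, ALPHA = 2, 3, 1.0


def y_locked(case, x, left_branch):
    """Evaluate y for case "A", "B", or "C" with the IF branch held fixed."""
    if left_branch:
        return x ** K1              # branch used while x <= 0
    if case == "A":
        return x ** K2
    if case == "B":
        return ALPHA * x
    return ALPHA + x ** K2          # case "C"


def dy_dx_locked(case, x, left_branch):
    """Analytical derivative of the locked branch with respect to x."""
    if left_branch:
        return K1 * x ** (K1 - 1)
    if case == "B":
        return ALPHA
    return K2 * x ** (K2 - 1)       # cases "A" and "C" share this slope


def locate_event(t_lo, t_hi, x0, tol=1e-12):
    """Bisect on the discontinuity function g = x, where x(t) = x0 + t since dx/dt = 1."""
    def g(t):
        return x0 + t

    assert g(t_lo) <= 0.0 < g(t_hi), "no sign change in this step"
    while t_hi - t_lo > tol:
        t_mid = 0.5 * (t_lo + t_hi)
        if g(t_mid) <= 0.0:
            t_lo = t_mid
        else:
            t_hi = t_mid
    return 0.5 * (t_lo + t_hi)


def jumps_at_event(case):
    """Jump in y and in dy/dx when the branch switches at x = 0."""
    dy = y_locked(case, 0.0, False) - y_locked(case, 0.0, True)
    dslope = dy_dx_locked(case, 0.0, False) - dy_dx_locked(case, 0.0, True)
    return dy, dslope


if __name__ == "__main__":
    t_event = locate_event(0.0, 2.0, x0=-1.0)
    print(f"event located at t = {t_event:.6f}")
    for case in "ABC":
        print(case, jumps_at_event(case))
```

With these illustrative parameters, case A is smooth at the event, case B has a jump in slope only, and case C has a jump in y itself, which is consistent with the increasing difficulty reported for the "black-box" runs.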
5.2 Correct Use of Chemical Kinetics Libraries
The remaining two examples illustrate the importance of properly handling external procedures for physical
property and chemical kinetics calculations. These computations are typically very complex, costly,
and difficult to code correctly and efficiently. In addition, there are many existing high quality codes
available that have been extensively validated and are trusted. Consequently, they should be used with
The first example is a model of an adiabatic, constant pressure problem for a perfectly stirred, batch
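\begin{align}
\rho \frac{dy_i}{dt} &= W_i w_i, \qquad i = 1, \ldots, N_c \tag{8} \\
\rho C_p \frac{dT}{dt} &= \sum_{k=1}^{N_c} W_k h_k w_k \tag{9} \\
\rho &= \rho(T, y) \tag{10}
\end{align}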
where ρ is the mass density, yi is the mass fraction of component i, Nc is the number of chemical
species, Wi is the molecular weight of species i, T is temperature, Cp is the constant pressure heat
capacity, wi is the molar production rate of species i per unit volume, and hi is the enthalpy of species
i. The molar production rates, heat capacity, mass density, and enthalpies were computed with external
Fortran subroutines from the CHEMKIN-II library [42]. The chemical mechanism for the reaction of
oxygen, nitrogen, and n-heptane involves 544 chemical species (i.e., Nc = 544) and 2446 reactions.
This mechanism was obtained from Curran et al. [43]. Two simulations were performed with this
model, one where the external procedures were treated as “black-boxes” and the other where they were
incorporated properly using DAEPACK. The numerical integration was performed for a simulated time of
5.0 seconds on a 1.4 GHz PC with 512 MB RAM. The initial conditions were: yO2 = 0.0252, yN2 = 0.9734,
yC7 = 0.0014, yO = yH = 10^-16, and T = 800 K. Table 2 contains the timing information for this example.
Table 2: Timing information for constant pressure batch reactor example. Calculations performed on 1.4
Clearly, significant improvements are realized by exploiting the additional symbolic information available
when the external procedures are incorporated properly. This benefit is examined in more detail in the
following example.
The next example is a reacting flow simulation. A gaseous mixture of oxygen, nitrogen, and n-heptane
is injected into a tubular reactor. In this model, the reactor is assumed to be isothermal and isobaric
and the gas is assumed to be ideal. Also, it is assumed that variations occur only in time and the
axial direction and that diffusion is negligible. These assumptions result in the following system of equations:
\begin{align}
\frac{\partial x_i}{\partial t} + \frac{\partial (x_i v_z)}{\partial z} - \frac{RT}{P} w_i &= 0, \qquad i = 1, \ldots, N_c - 1 \tag{11} \\
\frac{\partial v_z}{\partial z} - \frac{RT}{P} \sum_{k=1}^{N_c} w_k &= 0 \tag{12} \\
\sum_{k=1}^{N_c} x_k &= 1 \tag{13}
\end{align}
where xi (t, z) is the mole fraction of species i, Nc is the number of species, t is time, z is the axial
coordinate, R is the gas constant, T is temperature, P is pressure, vz (t, z) is the gas velocity in the
z-direction (all other velocity components are assumed to be negligible), and wi (T, P, x(t, z)) is the molar
production rate of species i per unit volume. The PDAE above was discretized using upwind finite
differences and coded into an ABACUSS II input file. As in the previous example, the molar production
rates, $\{w_k\}_{k=1}^{N_c}$, were computed with external Fortran subroutines from the CHEMKIN-II library and the
same n-heptane mechanism was used.
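
The upwind discretization can be sketched for a single species as follows. This is a minimal Python illustration under stated assumptions, not the ABACUSS II input file used for this example: the function names, the uniform grid spacing, the use of the inlet values as the upstream neighbor of the first grid point, and the placeholder values for RT/P and the production rates are all hypothetical. With v_z > 0, the spatial derivatives in equations (11) and (12) are replaced by backward differences.

```python
def velocity_profile(v_in, sum_w, rt_over_p, dz):
    """March equation (12) downstream: (v[j] - v[j-1]) / dz = (RT/P) * sum_k w_k[j]."""
    v, v_up = [], v_in
    for s in sum_w:
        v_up = v_up + dz * rt_over_p * s
        v.append(v_up)
    return v


def species_rhs(x, v, w, x_in, v_in, rt_over_p, dz):
    """dx_i/dt at each grid point from equation (11) with a backward (upwind) difference."""
    dxdt = []
    flux_up = x_in * v_in                      # x_i * v_z at the inlet boundary
    for xj, vj, wj in zip(x, v, w):
        flux = xj * vj
        dxdt.append(-(flux - flux_up) / dz + rt_over_p * wj)
        flux_up = flux
    return dxdt


if __name__ == "__main__":
    n, length = 10, 5.0                        # ten grid points, 5 cm reactor
    dz = length / n                            # uniform spacing (assumption)
    rt_over_p = 1.0                            # placeholder value of RT/P
    w = [0.0] * n                              # placeholder rates: no reaction
    v = velocity_profile(1.0, w, rt_over_p, dz)
    x = [0.0] * n                              # species initially absent from the reactor
    print(species_rhs(x, v, w, x_in=0.0252, v_in=1.0, rt_over_p=rt_over_p, dz=dz))
```

In the actual model the production rates at each grid point come from the external CHEMKIN-II call, so every entry of `w` depends on all mole fractions at that grid point; this coupling is what produces the 544 by 544 blocks in the Jacobian matrix.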
Using 10 grid points in the discretization the overall model consisted of 10,890 variables (Nc mole
fractions, Nc molar production rates, and one velocity on each of the ten grid points) and equations.
Figure 6 contains a diagram of the exact sparsity pattern for the unsteady PFR model. The ordering of
variables in this sparsity pattern is the molar production rates on grid point 1, followed by the molar
production rates on grid point 2, and so on. The molar production rates are followed by the mole fractions
on grid point 1, followed by the mole fractions on grid point 2, and so on. The velocities on each grid
point are the last 10 columns of the sparsity pattern. The order of the equations (rows of the sparsity
pattern) is the calls to the external Fortran code to compute the molar production rates on each grid
point, followed by the discretization of the species balance equations, followed by the discretization of
the velocity relationships, and lastly the summation of mole fraction constraints on each grid point. The
sparsity pattern for the portions of the overall model corresponding to external procedures was obtained
with DAEPACK and these blocks can be seen in the upper right corner of the sparsity pattern in Figure
6. Note that each of these sub-blocks of the Jacobian matrix is 544 rows by 544 columns; however,
each contains only 12,518 nonzero entries. Although not clearly evident due to the limited resolution
of the figure, these blocks are actually approximately 96% sparse. When these external procedures are
treated as “black-boxes”, each of these 10 blocks of the overall Jacobian matrix contains 295,936
entries.
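
The counts quoted above can be checked with a few lines of arithmetic. The Python sketch below uses only numbers stated in the text; the variable names are illustrative.

```python
n_species = 544          # N_c for the n-heptane mechanism
n_grid = 10              # grid points in the discretization
nnz_per_block = 12_518   # structural nonzeros per external block

# Mole fractions, molar production rates, and one velocity on each grid point.
n_variables = n_grid * (2 * n_species + 1)

# A "black-box" block is treated as fully dense.
dense_entries_per_block = n_species * n_species
sparsity = 1.0 - nnz_per_block / dense_entries_per_block

print(n_variables)                   # 10890
print(dense_entries_per_block)       # 295936
print(f"{100 * sparsity:.1f}% of each block is structurally zero")
```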
Two numerical integrations were performed with this model using ABACUSS II, one where the external
procedure was treated as a “black-box” and another where symbolic information was obtained
with DAEPACK. Both simulations were performed at an absolute pressure of 12.5 atmospheres and
a temperature of 900 Kelvin. The reactor length was five centimeters. The initial condition was the
tubular reactor initially filled with pure nitrogen. The boundary condition was a fixed composition and
velocity at the inlet of the reactor (mole fractions of O2, N2, and n-heptane equal to 0.0252, 0.9734, and
0.0014, respectively, with trace quantities of oxygen and hydrogen free radicals, mole fractions of 10^-16
each). The inlet velocity was fixed at 1 cm/s.

Table 3: Timing information for unsteady PFR example. Calculations performed on 1.4 GHz PC with
512 MB RAM.

The numerical integration was performed for a simulated
time of 0.1 seconds on a 1.4 GHz PC with 512 MB RAM. Table 3 contains the timing information for
this example. This example clearly illustrates the benefit of properly incorporating external procedures
during numerical calculations. Using this additional symbolic information, obtained automatically by
ABACUSS II with DAEPACK, the simulation time was reduced from approximately 2.3 hours to 4.7 minutes,
a roughly 30-fold speed improvement. The performance improvement can be attributed to essentially two
factors. First, by efficient accumulation of (structurally) nonzero derivative values in the sparse blocks
of the Jacobian matrix corresponding to the external code, the overall Jacobian matrix evaluation time
was reduced from 160 seconds to 3.3 seconds. In both scenarios, all derivative values other than those
associated with the external code were computed in the same manner. Second, by exploiting sparsity in
the linear solver, the time for a single LU factorization was reduced from 2 seconds to 0.1 seconds and
the time for a single backsubstitution was reduced from 0.12 seconds to 0.01 seconds. Again note that
in the “black-box” example, it is only the sub-blocks associated with the external code that are dense,
and sparsity is exploited for all other portions of the Jacobian matrix in both scenarios. This example
highlights an interesting observation when performing simulations involving complex physical properties
or kinetic mechanism calculations. In many numerical integration calculations, it is the cost of the LU
factorization that tends to dominate the cost of the overall calculation (which is why most modern
numerical integration codes attempt to reduce the number of LU factorizations performed). However, if
the residual evaluation is costly, as is the case in the example above, the Jacobian evaluation time can
significantly exceed the cost of the linear algebra. Thus, proper incorporation of external procedures
can substantially reduce the cost of the calculation even if matrix-free linear solvers are applied. Also of
significance is the amount of memory saved by exploiting sparsity. For example, when the external
procedures were treated as “black-boxes”, each dense block of the overall Jacobian matrix corresponding to
these equations contained 295,936 entries. Since the matrix is stored in sparse triplet form (i.e., a double
precision array containing the values of the Jacobian matrix and two integer arrays containing the row
and column indices), the external blocks alone require approximately 47 megabytes to store the derivative
information. This is compared to only 2 megabytes of storage required for the external procedure blocks
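
The storage figures follow directly from the triplet format. As a rough check in Python, assuming 8-byte double precision values and 4-byte integer indices (the index width is an assumption; the text does not state it):

```python
BYTES_PER_ENTRY = 8 + 4 + 4    # one double value plus two integer indices (assumed 4 bytes each)
N_BLOCKS = 10                  # one external block per grid point


def triplet_megabytes(entries_per_block):
    """Storage for all external blocks in sparse triplet form, in units of 10^6 bytes."""
    return N_BLOCKS * entries_per_block * BYTES_PER_ENTRY / 1e6


dense_mb = triplet_megabytes(544 * 544)   # "black-box": every entry of each block stored
sparse_mb = triplet_megabytes(12_518)     # structural nonzeros only

print(f"dense blocks:  {dense_mb:.1f} MB")    # 47.3 MB
print(f"sparse blocks: {sparse_mb:.1f} MB")   # 2.0 MB
```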
6 Conclusion
This paper describes how source-to-source code transformation techniques can be used to incorporate
external code properly into an equation-oriented process modeling environment. This enables the user
to write complex models described partly in the input language of the process simulator and partly with
new or legacy external codes. By properly handling these external procedures, the user can be confident
that the subsequent calculation will be performed robustly, efficiently, and correctly.
The ideas described in this paper have been implemented with the equation-oriented process simulator
ABACUSS II and the numerical and symbolic software library DAEPACK. DAEPACK currently works with
Fortran but can be readily extended to other procedural programming languages. Comparable capabilities
can be achieved with object-oriented languages like C++ using operator overloading features.
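
To illustrate the operator overloading alternative, the following Python sketch propagates derivative values together with the set of variables each intermediate quantity depends on, so that a single evaluation yields both a Jacobian row and its sparsity pattern. It is a toy under stated assumptions, not DAEPACK's algorithm or interface: the class name `SparseDual`, the helper `jacobian_row`, and the example function are hypothetical, and only addition, subtraction, negation, and multiplication are overloaded.

```python
class SparseDual:
    """A value plus its nonzero partial derivatives, stored as {variable index: partial}."""

    def __init__(self, value, grad=None):
        self.value = value
        self.grad = dict(grad or {})

    @staticmethod
    def _lift(other):
        """Wrap plain numbers as constants with an empty derivative set."""
        return other if isinstance(other, SparseDual) else SparseDual(other)

    def __add__(self, other):
        other = self._lift(other)
        grad = dict(self.grad)
        for j, d in other.grad.items():
            grad[j] = grad.get(j, 0.0) + d
        return SparseDual(self.value + other.value, grad)

    __radd__ = __add__

    def __neg__(self):
        return SparseDual(-self.value, {j: -d for j, d in self.grad.items()})

    def __sub__(self, other):
        return self + (-self._lift(other))

    def __rsub__(self, other):
        return self._lift(other) + (-self)

    def __mul__(self, other):
        other = self._lift(other)
        grad = {j: d * other.value for j, d in self.grad.items()}   # product rule
        for j, d in other.grad.items():
            grad[j] = grad.get(j, 0.0) + self.value * d
        return SparseDual(self.value * other.value, grad)

    __rmul__ = __mul__


def jacobian_row(f, x):
    """Evaluate f at x; return its value and the nonzero partials keyed by variable index."""
    seeded = [SparseDual(xi, {i: 1.0}) for i, xi in enumerate(x)]
    result = f(seeded)
    return result.value, result.grad


if __name__ == "__main__":
    # Hypothetical external function of four variables that touches only x[0] and x[2].
    def rate(x):
        return 3.0 * x[0] * x[2] - x[2]

    value, grad = jacobian_row(rate, [2.0, 5.0, 4.0, 7.0])
    print(value, grad, sorted(grad))
```

The keys of the returned dictionary give the structural sparsity pattern of the row: variables that never enter the computation, here x[1] and x[3], never appear, while an entry whose value happens to cancel to zero is still recorded.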
Although the focus of this paper is on incorporating external procedures into equation-oriented modeling
environments, the techniques described are also useful for incorporating external procedures into
modular simulators for steady-state simulation and optimization. In particular, the ability to generate
fast and accurate analytical derivative values can often substantially improve the performance of
Acknowledgments
The authors would like to acknowledge support from the EPA Center for Airborne Organics at MIT and
References
[1] J. M. Douglas. Conceptual Design of Chemical Processes. McGraw-Hill, New York, 1988.
[2] J. D. Seader. Computer modeling of chemical processes. In AIChE Symposium Series, volume 81.
[4] L. T. Biegler and R. R. Hughes. Process optimization: A comparative case study. Computers and
[5] S. C. Kassianides. An Integrated System for Computer Based Training of Process Operators. PhD
[6] S. Mani, S. K. Shoor, and H. S. Pederson. Experience with simulator training for ammonia plant
[7] F. A. Perris. The growing importance of dynamic simulation for process engineers. In Dynamic
[8] J. D. Perkins and G. W. Barton. Modelling and simulation in process operation. In G. V. Reklaitis and
1987.
[9] Paul Inigo Barton. The Modeling and Simulation of Combined Discrete/Continuous Processes. PhD
[10] M. B. Carver. Efficient integration over discontinuities in ordinary differential equation simulations.
G. Savastano, and G. C. Vansteenkiste, editors, Proceedings of the 9th IMACS Conference on Sim-
[12] Santos Galán, William F. Feehery, and Paul I. Barton. Parametric sensitivity functions for hybrid
[13] John E. Tolsma and Paul I. Barton. Hidden discontinuities and parametric sensitivity calculations.
[14] OMG. The Common Object Request Broker: Architecture and specifications. Technical Report
Release 2.0 July 1995, Update July 1996, Object Management Group, 1997. Formal document
97-02-25, (http://www.omg.org).
[16] CAPE-OPEN Consortium. Conceptual Design Document, December 1997. Adobe Acrobat PDF
[17] B. L. Braunschweig, C. C. Pantelides, H. I. Britt, and S. Sama. Open software architectures for
process modeling: Current status and future perspectives. AIChE Symposium Series, presented at
[18] J. G. Pearce. Computer simulation of multi-state systems. In Proc UKSC Conference on Computer
[19] M. P. Avraam, N. Shah, and C. C. Pantelides. Modelling and optimisation of general hybrid systems
in the continuous time domain. Computers and Chemical Engineering, 22(Suppl.):S221–S228, 1998.
[20] Clemens Szyperski. Component Software: Beyond Object-Oriented Programming. ACM Press, New
[21] Edward P. Gatzke, John E. Tolsma, and Paul I. Barton. Construction of convex function relaxations
using automated code generation techniques. submitted to Optimization and Engineering, 2001.
[22] R. E. Moore. Methods and Applications of Interval Analysis. SIAM, Philadelphia, 1979.
[23] John E. Tolsma, Jerry Clabaugh, and Paul I. Barton. ABACUSS II: Advanced modeling environment
and embedded process simulator. Technical Report ABACUSS II Web Page, Massachusetts Institute
[24] P. W. Holl, W. Marquardt, and E. D. Gilles. DIVA – A powerful tool for dynamic process simulation.
[25] Peter C. Piela. ASCEND: An Object-oriented Computer Environment for Modeling and Analysis.
[26] A. Griewank. Evaluating Derivatives: Principles and techniques of algorithmic differentiation. SIAM,
[27] John E. Tolsma and Paul I. Barton. DAEPACK: A combined symbolic and numeric library for
general numerical calculations. Technical Report DAEPACK Web Page, Massachusetts Institute of
[28] John E. Tolsma and Paul I. Barton. DAEPACK: An open modeling environment for legacy models.
[29] Andreas Griewank. On automatic differentiation. In M. Iri and K. Tanabe, editors, Mathematical
Programming: Recent Developments and Applications, pages 83–108. Kluwer Academic Publishers,
Dordrecht, 1989.
[30] Masao Iri, T. Tsuchiya, and M. Hoshi. Automatic computation of partial derivatives and rounding
and Applied Mathematics, 24:365–392, 1988. Original Japanese version appeared in J. Information
[31] Masao Iri. History of automatic differentiation and rounding estimation. In Andreas Griewank and
[32] John E. Tolsma and Paul I. Barton. On computational differentiation. Computers and Chemical
[33] B. M. Averick, J. J. Moré, C. H. Bischof, A. Carle, and A. Griewank. Computing large sparse
Jacobian matrices using automatic differentiation. SIAM J. Sci. Stat. Comput., 15:285–294, 1994.
[34] C. H. Bischof, P. Khademi, A. Bouaricha, and A. Carle. Efficient computation of gradients and
[35] John E. Tolsma and Paul I. Barton. Efficient calculation of sparse Jacobians. SIAM Journal on
differentiation in Ada: Some practical experience. Optimization Methods and Software, 4:47–73, 1994.
[37] A. Griewank, D. Juedes, and J. Utke. ADOL–C: A package for the automatic differentiation of
[38] Christian Bischof, Alan Carle, George Corliss, Andreas Griewank, and Paul Hovland. ADIFOR –
Generating derivative codes from Fortran programs. Scientific Programming, 1(1):11–29, 1992.
[39] N. Rostaing, S. Dalmas, and A. Galligo. Automatic differentiation in Odyssee. Tellus, 45A:558–568,
1993.
[40] R. Giering and T. Kaminski. Recipes for adjoint construction. ACM Transactions on Mathematical
[41] Taeshin Park and Paul I. Barton. State event location in differential algebraic models. ACM Trans-
[42] R. J. Kee, F. M. Rupley, and J. A. Miller. CHEMKIN-II: A FORTRAN chemical kinetics package
for the analysis of gas-phase chemical kinetics. Technical Report SAND89-8009,
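[Figure 5: three diagrams for the discontinuity examples. Top (case A): branches x^k1 and x^k2, with k2 > k1. Center (case B): branches x^k1 and αx, with α > 0 and k1 > 1. Bottom (case C): branches x^k1 and α + x^k2, with α > 0.]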
Figure 6: Sparsity pattern of the unsteady PFR model.