Scalable Parallel Nonlinear Optimization with PyNumero and Parapint
Jose S. Rodriguez¹, Robert Parker², Carl D. Laird², Bethany Nicholson³, John D. Siirola³, and Michael Bynum³

¹ Davidson School of Chemical Engineering, Purdue University, West Lafayette, IN 47907, Email: [email protected]
² Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, Email: [email protected], [email protected]
³ Center for Computing Research, Sandia National Laboratories, Albuquerque, NM 87185, Email: [email protected], [email protected], [email protected]
Abstract
We describe PyNumero, an open-source, object-oriented programming framework
in Python that supports rapid development of performant parallel algorithms for struc-
tured nonlinear programming problems (NLPs) using the Message Passing Interface
(MPI). PyNumero provides three fundamental building blocks for developing NLP al-
gorithms: a fast interface for calculating first and second derivatives with the AMPL
Solver Library (ASL), a number of interfaces to efficient linear solvers, and block-
structured vectors and matrices based on NumPy, SciPy, and MPI that support dis-
tributed parallel storage and computation. PyNumero’s design enables efficient, par-
allel algorithm development using high-level Python syntax while keeping expensive
numerical calculations in fast, compiled implementations based on languages like C
and Fortran. To demonstrate the utility of PyNumero, we also present Parapint, a
Python package built on PyNumero for parallel solution of dynamic optimization prob-
lems. Parapint includes a parallel interior-point solver based on Schur-Complement
decomposition. We illustrate the effectiveness of PyNumero for developing parallel al-
gorithms with both code examples and scalability analyses for parallel matrix-vector
dot products, parallel solution of structured systems of linear equations using Schur-
Complement decomposition, and the parallel solution of a 2-dimensional PDE optimal
control problem. Our numerical results show nearly perfect scaling to over 1000 cores
for large matrix-vector dot products and structured linear systems. Moreover, we ob-
tain over 360 times speedup for the optimal control example.
Contents

1 Introduction
2 PyNumero Overview
  2.1 Block Vectors and Matrices
  2.2 Performance of MPI-based block matrices and vectors
  2.3 Linear Solver Interfaces
  2.4 NLP Interfaces
  2.5 An Equality-Constrained SQP Example
3 Parapint
  3.1 Parapint Composite NLPs
  3.2 Schur-Complement Decomposition
  3.3 Interior-Point Algorithm
  3.4 Parallel solution of dynamic optimization problems
4 Distribution
5 Conclusions
6 Acknowledgements
1 Introduction
Recent needs for efficient solution of large-scale, structured nonlinear optimization prob-
lems have led to the development of tailored solution algorithms for exploiting problem
structure. Special structure arises in many applications, including dynamic optimization,
stochastic programming, infrastructure applications with natural network structure (e.g.,
power transmission systems), parameter estimation, and many others. All of these appli-
cations have characteristics that result in large-scale optimization problems. For example,
dynamic optimization problems can become quite large due to discretization of time and
space in differential equations. Adequate sampling of large uncertainty spaces can also result
in large-scale stochastic programming problems with many scenarios.
Although these applications often involve the solution of very large nonlinear programs
(NLPs), many of the problems have an inherent structure that can be exploited using
decomposition and even parallel solution algorithms. This paper discusses the packages
PyNumero and Parapint, which are designed to support rapid development of performant
parallel algorithms for structured NLP problems. To frame the discussion of these tools, we
will focus on an example formulation addressed by algorithms within Parapint.
Many of the optimization problems discussed above can be structurally partitioned as
in Problem (1).
\[
\begin{aligned}
\min_{x_s, d} \quad & \sum_{s \in S} f_s(x_s) && \text{(1a)} \\
\text{s.t.} \quad & c_s(x_s) = 0 && \forall s \in S \qquad \text{(1b)} \\
& \underline{x}_s \le x_s \le \overline{x}_s && \forall s \in S \qquad \text{(1c)} \\
& P_s x_s - P_s^d d = 0 && \forall s \in S \qquad \text{(1d)}
\end{aligned}
\]
Here, the set S denotes a set of partitions. These partitions could be formed by finite
elements in dynamic optimization, scenarios in stochastic programming, or data sets in
parameter estimation. The variables $x_s \in \mathbb{R}^{N_s}$ are only involved in the constraints associated
with partition $s$ ($c_s \in \mathbb{R}^{M_s}$). However, there is often a set of coupling variables, $d \in \mathbb{R}^D$,
that link two or more of the partitions, enforced by Equation (1d). Here, $P_s \in \mathbb{R}^{C_s \times N_s}$,
$P_s^d \in \mathbb{R}^{C_s \times D}$, and $C_s$ is the number of coupling variables in partition $s$. For example, the
coupling variables in two-stage stochastic programming are the first stage variables, whereas
the coupling variables in dynamic optimization are the differential variables at finite element
boundaries.
Many NLP algorithms have been developed to exploit the structure of Problem (1)
(Gondzio and Grothey 2009, Chiang et al. 2014, Shin et al. 2020a). Nonlinear interior-
point methods have been a popular choice with algorithms that exploit the structure in
Problem (1) by parallelizing the solution of the KKT system at every iteration of the NLP
algorithm. Notably, the solution of the KKT system comprises the majority of the
computational effort of each iteration of an interior-point algorithm, so a significant
amount of research has focused on accelerating this step. Schur-
Complement decomposition has been used for parallel solution of parameter estimation
problems, stochastic programming problems, and dynamic optimization problems (Zavala
et al. 2008, Kang et al. 2014, Petra et al. 2014, Word et al. 2014). Zavala et al. (2008)
and Word et al. (2014) form the Schur-Complement explicitly using repeated backsolves
with a single factorization of each diagonal block in the KKT system. Petra et al. (2014)
also forms the Schur-Complement explicitly, but with a custom factorization routine that
produces the Schur-Complement as a by-product of the factorization of each of the diagonal
blocks in the KKT system. On the other hand, Kang et al. (2014) avoids formation of
the Schur-Complement entirely with a preconditioned conjugate gradient method. Other
algorithms have been utilized for exploiting the structure of interior-point KKT systems,
including Cyclic Reduction (Wan et al. 2019) and overlapping Schwarz (Shin et al. 2020a).
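To sketch the idea behind these approaches, consider a generic block-bordered linear system with block-diagonal partition blocks $K_s$ coupled through border blocks $B_s$ and a small coupling block $K_d$ (a simplified illustration, not the exact KKT blocks of any one algorithm above):
\[
\begin{bmatrix}
K_1 & & & B_1 \\
& \ddots & & \vdots \\
& & K_{|S|} & B_{|S|} \\
B_1^T & \cdots & B_{|S|}^T & K_d
\end{bmatrix}
\begin{bmatrix} \Delta z_1 \\ \vdots \\ \Delta z_{|S|} \\ \Delta z_d \end{bmatrix}
=
\begin{bmatrix} r_1 \\ \vdots \\ r_{|S|} \\ r_d \end{bmatrix}.
\]
Eliminating the diagonal blocks yields the Schur-Complement system
\[
\Big( K_d - \sum_{s \in S} B_s^T K_s^{-1} B_s \Big) \Delta z_d = r_d - \sum_{s \in S} B_s^T K_s^{-1} r_s,
\]
after which each $K_s \Delta z_s = r_s - B_s \Delta z_d$ is an independent solve; the factorizations of the $K_s$ blocks and the terms in the sums can be computed in parallel across partitions.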
Alternatives also include decomposition approaches that solve sequences of NLP subprob-
lems to exploit the structure of Problem (1). For example, the alternating-direction method
of multipliers (ADMM) and Progressive Hedging (PH) (Eckstein and Bertsekas 1992, Ro-
driguez et al. 2018, Rockafellar and Wets 1991, Word et al. 2012) have both been used for
structured NLPs. More recently, Rodriguez et al. (2020) proposed a hybrid approach with
an ADMM-based preconditioner for iterative solution of structured KKT systems.
Several of these algorithms have been implemented in existing software. OOPS (the
Object-Oriented Parallel Solver) and PIPS-NLP are both parallel interior-point solvers writ-
ten in C++ that utilize Schur-Complement decomposition for parallel solution of the KKT
system (Gondzio and Grothey 2009, Chiang et al. 2014). MadNLP is a Julia package that
contains a parallel interior-point solver and has been utilized with an iterative method which
uses a Restrictive-Additive Schwarz (RAS) preconditioner (Shin et al. 2020b). The RAS
method was tested using a multi-threaded implementation. ParNMPC is a Matlab-based
package for parallel nonlinear model-predictive control and uses C and C++ code generation
for parallelization with OpenMP (Deng and Ohtsuka 2019).
While the above solvers all demonstrate impressive performance improvements over se-
rial algorithms, developing new, distributed memory algorithms is a challenging and time-
consuming task. The vast majority of both serial and parallel NLP solvers are implemented
in low-level languages such as C, and they are difficult to modify or extend. Significant
software engineering expertise is required to prototype even minor modifications to exist-
ing solvers. This impedes the exploration, development, and testing of new ideas, slowing
research progress in this area.
To address and mitigate these challenges, we present PyNumero, a Python package for
numerical optimization that provides a high-level programming framework for rapidly devel-
oping efficient, parallel, and scalable solution algorithms for structured NLP’s. PyNumero
has been designed with computational performance in mind and utilizes Python-C interfaces
internally to ensure that expensive numeric calculations are all performed using compiled
kernels. While computations performed directly in Python can be slow, efficient algorithms
can be developed in Python by utilizing interfaces to low-level languages for computationally
intensive tasks, as demonstrated by NumPy, a widely used package for scientific computing (Harris et al. 2020).
PyNumero supports a variety of algorithmic building blocks that allow researchers to
quickly explore new parallel algorithms. In particular, PyNumero provides an interface to
the AMPL Solver Library (ASL) for automatic differentiation, interfaces to several linear
solvers, and parallel implementations of block-structured matrices and vectors. Parallel
linear algebra routines are based on the Message Passing Interface (MPI) and can be used
on shared or distributed memory machines. The intent is to enable more practitioners and
researchers in nonlinear optimization to write numerical algorithms and rapidly implement
new ideas in a high-level language with little to no sacrifice in computational performance.
Furthermore, PyNumero can be used alongside Pyomo to provide a unified Python platform
for both modeling and solving optimization problems. This platform allows PyNumero to
directly exploit Pyomo model structure and facilitates rapid implementation of structure-exploiting optimization algorithms.
We demonstrate the effectiveness and computational performance of PyNumero with
code examples and scalability analyses for parallel matrix-vector dot products and parallel
solution of structured systems of linear equations using Schur-Complement decomposition.
Our results show nearly perfect scaling to over 1000 cores. Moreover, we present Parap-
int, a Python package built on top of PyNumero for parallel solution of both stochastic
and dynamic optimization problems. We present numerical results for a 2-dimensional par-
tial differential equation (PDE)-constrained optimal control problem with over 360 times
speedup over a serial interior-point algorithm.
The remainder of this paper is organized as follows. In Section 2, we present an overview
of PyNumero, describing the MPI-based block-structured matrices and vectors, interfaces
for linear solvers, and interactions with the NLP problem definition. This section closes
with a short example implementation of an equality-constrained sequential quadratic pro-
gramming (SQP) algorithm. In Section 3, we present the Parapint package, describe the
Schur-complement decomposition implementation, and illustrate parallel performance on a
2-D PDE optimal control case study. We provide distribution details for PyNumero and
Parapint in Section 4. Finally, we summarize our results and provide conclusions and future
research directions in Section 5.
2 PyNumero Overview
As shown in Figure 1, PyNumero provides three fundamental components for developing
parallel NLP algorithms. First, PyNumero implements block-based vector matrix classes
with both serial and parallel distributed implementations. These classes conveniently facil-
itate development of optimization algorithms and decomposition strategies for structured
problems. Second, PyNumero provides interfaces to several linear solvers, including the HSL
solvers MA27 and MA57, MUMPS, and any SciPy solver (Duff and Reid 1983, Amestoy
et al. 2000, Virtanen et al. 2020b). These linear solvers form the core computational ker-
nel for many nonlinear algorithms. Third, PyNumero provides a set of NLP interfaces for
function and derivative evaluations. The current interfaces, including the Pyomo interface
(PyomoNLP), perform all derivative calculations in C by calling the ASL (Gay 1997) from
Python. As Figure 1 illustrates, PyNumero provides a high-level Python API but performs
all computationally intensive operations via interfaces to efficient, compiled kernels. In this
section, we provide an overview of these components, illustrate the parallel performance of
the block-based matrix and vector classes, and show how these building blocks can be used
and integrated for NLP algorithm development. More detailed documentation can be found
in the online documentation at https://pyomo.readthedocs.io/.
[Figure 1: PyNumero components — NLP interfaces (AslNLP, PyomoNLP), vectors and matrices (serial and parallel) built on NumPy/SciPy, and linear algebra interfaces (MUMPS, HSL); computationally intensive operations are external calls to compiled code (e.g., C++, TensorFlow).]

2.1 Block Vectors and Matrices

PyNumero's vector and matrix classes are built
on top of NumPy arrays and SciPy sparse matrices. PyNumero therefore benefits from the
fast compiled implementations within NumPy (e.g. vectorization and broadcasting), makes
all subroutines in NumPy/SciPy available for implementing algorithms, and minimizes the
burden on users to learn additional syntax besides what is offered in NumPy/SciPy.
PyNumero extends these implementations and provides classes for working with block-
structured matrices and vectors. These classes facilitate optimization algorithm development
and support distributed parallel storage, computation, and interrogation. In a KKT system
for an equality-constrained NLP, for example, the KKT matrix is composed of the Hessian of
the Lagrangian and the Jacobian of the constraints. Additionally, the KKT right-hand-side
(rhs) is composed of the gradient of the Lagrangian and the residuals of the constraints.
PyNumero supports construction of these composite matrices efficiently without copying the
underlying numerical data. These implementations are designed to work seamlessly with
NumPy arrays and SciPy sparse matrices.
Figure 2 shows a class diagram for PyNumero’s BlockVector and MPIBlockVector, the
serial and parallel implementations of block structured vectors, respectively. As the figure
shows, both of these classes inherit from NumPy’s ndarray class, and their application
programming interfaces (APIs) mirror that of NumPy’s ndarray. This allows algorithm
developers to write intuitive code while still exploiting structure for parallel computing.
Most parallel NLP algorithms work by exploiting problem structure. For example, cer-
tain classes of problems, such as stochastic programming and dynamic optimization, impose
certain structures on the KKT system. An example of the KKT system of a stochastic
programming problem is presented in Figure 3 where the plotted points represent non-zero
entries in the matrix. To this end, the PyNumero implementations of BlockVector and
BlockMatrix support arbitrary hierarchical representations of numerical data for structured
linear algebra operations. The MPIBlockVector and MPIBlockMatrix extend this function-
ality to support parallel, distributed data structures where individual blocks of numerical
data can be owned by different processes. While the BlockVector and MPIBlockVector
classes inherit from numpy.ndarray, they represent an ordered list of sub-vectors or blocks
that are either additional BlockVector objects or NumPy ndarray objects. The leaves
within these structures are all NumPy ndarray objects that support efficient elementary
[Figure 2 diagram: numpy.ndarray (attributes size, shape; methods all(), any(), min(), max(), compress(), dot(), add(), ...) with pynumero.BlockVector and pynumero.MPIBlockVector inheriting from it; both add a blocks attribute (List[numpy.ndarray]) and set_block()/get_block() methods.]
Figure 2: Class diagram for PyNumero’s block vectors. The list of attributes and methods
is incomplete.
linear algebra operations. This design allows us to utilize an API similar to that of NumPy,
while supporting composition through block structures and also utilizing NumPy for efficient
computation. The BlockMatrix and MPIBlockMatrix are similarly structured, containing
a set of sub-matrices (i.e., blocks) indexed by block row and column. Each of these blocks
can either be additional BlockMatrix objects or SciPy sparse matrices. As with their vector
counterparts, the leaves within these structures are SciPy objects that support efficient com-
putation. This design is important to represent parallel, distributed matrices with arbitrary
block structures. In addition to the benefits of distributed data ownership of the parallel
versions, simply being able to represent and manipulate the KKT system using its block
sub-matrices greatly simplifies the implementation of both general optimization algorithms
and tailored decomposition strategies.
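As a small illustration, the sketch below assembles a two-by-two block matrix from SciPy blocks using the BlockMatrix API that also appears later in Listing 6. It assumes, consistent with the description in Section 2.5, that an empty block is treated as containing no non-zeros.

import scipy.sparse as sps
from pyomo.contrib.pynumero.sparse import BlockMatrix

H = sps.identity(4, format='coo')                # e.g., a Hessian block
J = sps.random(2, 4, density=0.5, format='coo')  # e.g., a Jacobian block

kkt = BlockMatrix(2, 2)
kkt.set_block(0, 0, H)
kkt.set_block(1, 0, J)
kkt.set_block(0, 1, J.transpose())
# block (1, 1) is left empty, i.e., it contains no non-zeros

flat = kkt.tocoo()  # flatten to a single SciPy COO matrix when needed
print(flat.shape)   # (6, 6)

Note that set_block stores a reference to each sub-matrix, so the composite matrix is assembled without copying the underlying numerical data.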
We now illustrate the use of block-based vector classes in PyNumero. Listing 1 demon-
strates how to perform vector-vector addition with BlockVector and how NumPy arrays
are utilized. Lines 5-11 create two instances of BlockVector: x and y, each with two blocks.
Line 14 performs vector-vector addition with these two BlockVectors:
z1 = x + y (2)
Many standard operations and NumPy methods, in addition to vector-vector addition, are
supported, and PyNumero provides the underlying block-based implementations.

1 from pyomo.contrib.pynumero.sparse import BlockVector
2 import numpy as np
3
4
5 x = BlockVector(2)
6 x.set_block(0, np.random.normal(size=3))
7 x.set_block(1, np.random.normal(size=3))
8
9 y = BlockVector(2)
10 y.set_block(0, np.random.normal(size=3))
11 y.set_block(1, np.random.normal(size=3))
12
13 # add x and y
14 z1 = x + y

Listing 1: Vector-vector addition with BlockVector
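For instance, the following small sketch applies a few NumPy-style operations to vectors constructed as in Listing 1; the methods shown mirror their numpy.ndarray counterparts from the class diagram in Figure 2.

import numpy as np
from pyomo.contrib.pynumero.sparse import BlockVector

x = BlockVector(2)
x.set_block(0, np.random.normal(size=3))
x.set_block(1, np.random.normal(size=3))
y = BlockVector(2)
y.set_block(0, np.random.normal(size=3))
y.set_block(1, np.random.normal(size=3))

z2 = 2.0 * x - y        # elementwise arithmetic preserves the block structure
d = x.dot(y)            # inner product over all blocks
m = np.abs(z2).max()    # NumPy reductions work as expected
flat = z2.flatten()     # collapse to a plain numpy.ndarray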
The MPIBlockVector automatically takes care of parallelization for all of the basic op-
erations needed for algorithm development. The only major difference in the API is in
construction. When constructing instances of MPIBlockVector, users must specify which
blocks are owned by which processes (i.e., the MPI rank). The sketch below demonstrates how to construct an MPIBlockVector with explicit block ownership.
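This is a minimal sketch; it assumes the MPIBlockVector constructor takes the number of blocks, a list assigning an owning rank to each block, and an MPI communicator, and that each rank sets only the blocks it owns.

from pyomo.contrib.pynumero.sparse.mpi_block_vector import MPIBlockVector
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# two blocks: block 0 owned by rank 0, block 1 owned by rank 1 (assumed signature)
v = MPIBlockVector(2, [0, 1], comm)
if rank == 0:
    v.set_block(0, np.ones(3))
if rank == 1:
    v.set_block(1, 2 * np.ones(3))

d = v.dot(v)  # parallel dot product; communication is handled internally

Run under MPI with two processes, each rank stores only the blocks it owns, while operations such as dot behave as if the vector were global.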
[Figure 3: Non-zero structure of the KKT matrix of a stochastic programming problem; the plotted points represent non-zero entries.]
[Figure 4 plot: dot product time normalized to the 8-core case vs. number of blocks/processors (8 to 1024).]
Figure 4: Weak scaling for PyNumero's parallel matrix-vector dot product.

2.2 Performance of MPI-based block matrices and vectors

To evaluate the parallel performance of the MPI-based block matrices and vectors, we measured weak scaling of a distributed matrix-vector dot product. The test matrix has the block structure of the KKT system of a structured
optimization problem with the constraints and variables ordered for Schur-Complement
decomposition (Word et al. 2014). Each nonzero block contains a square 100,000 by 100,000
matrix with a sparsity of 0.1% (i.e., each nonzero block contains 10 million non-zeros). Figure
4 shows the weak scaling results where the problem size is increased from 8 to 1024 blocks
while the cores utilized are simultaneously increased from 8 to 1024. At 1024 cores, this
represents a matrix with over 10 billion non-zeros. This is a challenging parallel problem to
test scalability since the computational effort required by each process is low and the overall
performance on many cores is dominated by communication. The dot product performed
with 1024 cores is 128 times larger than that performed with 8 cores. Nevertheless, this
distributed matrix-vector product is performed in less than 0.3 seconds and takes only 1.4
times longer than the smaller problem over 8 cores. Furthermore, if the work per processor
were larger, we would expect improved scaling results (as illustrated in other examples later).
As the figure shows, PyNumero’s parallel scalability is very good.
2.3 Linear Solver Interfaces

1 from scipy.sparse import tril
2 from pyomo.contrib.pynumero.linalg.ma27 import MA27Interface
3
4
5 A = get_coo_matrix()
6 rhs = get_rhs()
7 solver = MA27Interface()
8 solver.set_cntl(1, 1e-6)  # set the pivot tolerance
9 A_tril = tril(A)  # extract lower triangular portion of A
10 status = solver.do_symbolic_factorization(dim=5,
11                                           irn=A_tril.row,
12                                           icn=A_tril.col)
13 status = solver.do_numeric_factorization(dim=5,
14                                          irn=A_tril.row,
15                                          icn=A_tril.col,
16                                          entries=A_tril.data)
17 x = solver.do_backsolve(rhs)

Listing 3: Solving a symmetric linear system with PyNumero's MA27 interface

Because PyNumero's vectors and matrices are built on
NumPy/SciPy objects, subroutines available in these packages can be used when writing
algorithms in PyNumero. This includes the SciPy direct and iterative solvers as well as any
other Python package based on NumPy such as PyTrilinos (Sala et al. 2008), Petsc4py (Dal-
cin et al. 2011), Cysparse, Krypy, and PyMumps. PyNumero also provides interfaces for the
HSL linear solvers MA27 and MA57 to solve sparse, symmetric linear systems. These latter
solvers are important in interior-point algorithms as they also provide the inertia (number
of positive and negative eigenvalues) of the factorized matrix.
Listing 3 illustrates how to use PyNumero’s interface to MA27 to solve a symmetric
linear system of equations. Line 2 imports the MA27Interface from PyNumero. Lines
5–6 call functions to construct the matrix and right-hand-side. Lines 7–8 construct an
instance of MA27Interface and set the pivot tolerance to 10−6 . Line 9 extracts the lower
triangular portion of the matrix (the matrix is symmetric). Lines 10–17 perform the symbolic
factorization, numeric factorization, and back-solve. These methods enable the use of a
single symbolic factorization for multiple matrices of the same nonzero structure or a single
factorization for multiple back-solves. This example highlights the ease of using an efficient
linear solver through a Python interface.
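As a further sketch building on Listing 3, a single symbolic factorization can be reused across matrices that share a nonzero pattern, and a single numeric factorization can be reused across many right-hand sides (get_matrices and get_rhs_list are hypothetical helpers):

from scipy.sparse import tril
from pyomo.contrib.pynumero.linalg.ma27 import MA27Interface

matrices = get_matrices()  # hypothetical: COO matrices with one shared sparsity pattern
rhs_list = get_rhs_list()  # hypothetical: several right-hand sides

solver = MA27Interface()
first = tril(matrices[0])
solver.do_symbolic_factorization(dim=first.shape[0], irn=first.row, icn=first.col)

for A in matrices:
    A_tril = tril(A)
    # reuse the symbolic factorization; only the numeric values have changed
    solver.do_numeric_factorization(dim=A_tril.shape[0], irn=A_tril.row,
                                    icn=A_tril.col, entries=A_tril.data)
    for rhs in rhs_list:
        x = solver.do_backsolve(rhs)  # one numeric factorization, many back-solves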
2.4 NLP Interfaces

PyNumero's NLP interfaces provide function and derivative evaluations for problems of the general form
\[
\begin{aligned}
\min_x \quad & f(x) \\
\text{s.t.} \quad & g_L \le g(x) \le g_U \\
& x_L \le x \le x_U
\end{aligned} \tag{3}
\]
where $x \in \mathbb{R}^n$ are the primal variables with lower and upper bounds $x_L \in \mathbb{R}^n$ and $x_U \in \mathbb{R}^n$, respectively. The inequality constraints $g : \mathbb{R}^n \to \mathbb{R}^m$ are bounded by $g_L \in \mathbb{R}^m$ and $g_U \in \mathbb{R}^m$. PyNumero also provides an interface with an explicit distinction between the equality constraints (where $g_L = g_U$) and the inequality constraints (where $g_L < g_U$) to facilitate the implementation of algorithms that require such a distinction:
\[
\begin{aligned}
\min_x \quad & f(x) \\
\text{s.t.} \quad & c(x) = 0 \\
& d_L \le d(x) \le d_U \\
& x_L \le x \le x_U
\end{aligned} \tag{4}
\]
The equality constraints are represented by $c : \mathbb{R}^n \to \mathbb{R}^{m_c}$, and $d : \mathbb{R}^n \to \mathbb{R}^{m_d}$ denotes the inequality constraints with bounds $d_L \in \mathbb{R}^{m_d}$ and $d_U \in \mathbb{R}^{m_d}$, where $m = m_c + m_d$.
Gradient-based optimization algorithms have been proven to be among the most effective
algorithms for solving nonlinear optimization problems. The development of fast automatic
differentiation tools (Andersson 2013, Griewank et al. 1996, Fourer et al. 1993) enables
efficient computation of both first- and second-order derivatives. PyNumero uses the AMPL Solver Library (ASL) to compute derivative information and the ctypes Python library to call the underlying ASL subroutines from Python.
The PyNumero AslNLP class takes the problem definition in the form of an .nl file (Gay
2005), maps this to the form of Equations (3) or (4), and provides an API for evaluating the
model and its derivatives. The PyomoNLP class inherits from AslNLP, providing the same
API for evaluating the model, while also giving access to the associated Pyomo components
for the constraints and variables. These interfaces return derivative values from the ASL
using NumPy arrays and SciPy sparse matrices (Harris et al. 2020, Virtanen et al. 2020a).
This leverages the capabilities within the NumPy ecosystem to avoid marshalling of data
between the C and Python environments and enables performant Python implementations
of gradient-based nonlinear optimization algorithms. Listings 4 and 5 show a small example
of how a PyomoNLP instance can be used for function and derivative evaluations.
from pyomo.contrib.pynumero.interfaces.pyomo_nlp import PyomoNLP
import pyomo.environ as pyo
import numpy as np

# define optimization model
m = pyo.ConcreteModel()
m.x = pyo.Var([1, 2, 3], bounds=(0.0, None), initialize=3.0)
m.c = pyo.Constraint(expr=m.x[3]**2 + m.x[1] == 25)
m.d = pyo.Constraint(expr=m.x[2]**2 + m.x[1] <= 18.0)
m.o = pyo.Objective(expr=m.x[1]**4 - 3*m.x[1]*m.x[2]**3 + m.x[3]**2 - 8.0)

# create NLP
nlp = PyomoNLP(m)

# set values of variables
nlp.set_primals(np.array([4, -1, 3]))

# accessing variable values
primals = nlp.get_primals()
print("Values of primal variables:\n", primals)
duals = nlp.get_duals()
print("Values of dual variables:\n", duals)

# variable bounds
primals_lb = nlp.primals_lb()
primals_ub = nlp.primals_ub()
print("Variable lower bounds:\n", primals_lb)
print("Variable upper bounds:\n", primals_ub)

# NLP function evaluations
f = nlp.evaluate_objective()
print("Objective Function\n", f)
g = nlp.evaluate_constraints()
print("Constraints\n", g)
c = nlp.evaluate_eq_constraints()
print("Equality Constraints\n", c)
d = nlp.evaluate_ineq_constraints()
print("Inequality Constraints\n", d)

# NLP first and second-order derivatives
df = nlp.evaluate_grad_objective()
print("Gradient of Objective Function:\n", df)
jac_g = nlp.evaluate_jacobian()
print("Jacobian of Constraints:\n", jac_g)
jac_c = nlp.evaluate_jacobian_eq()
print("Jacobian of Equality Constraints:\n", jac_c)
jac_d = nlp.evaluate_jacobian_ineq()
print("Jacobian of Inequality Constraints:\n", jac_d)
hess_lag = nlp.evaluate_hessian_lag()
print("Hessian of Lagrangian\n", hess_lag)

Listing 4: Function and derivative evaluations with PyomoNLP
Values of primal variables:
 [ 4. -1.  3.]
Values of dual variables:
 [0. 0.]
Variable lower bounds:
 [0. 0. 0.]
Variable upper bounds:
 [inf inf inf]
Objective Function
 -502.0
Constraints
 [-21.  19.]
Equality Constraints
 [-21.]
Inequality Constraints
 [19.]
Gradient of Objective Function:
 [-432.   -2.  -84.]
Jacobian of Constraints:
  (1, 0)  8.0
  (0, 1)  -2.0
  (0, 2)  1.0
  (1, 2)  1.0
Jacobian of Equality Constraints:
  (0, 1)  -2.0
  (0, 2)  1.0
Jacobian of Inequality Constraints:
  (0, 0)  8.0
  (0, 2)  1.0
Hessian of Lagrangian
  (0, 0)  -216.0
  (1, 1)  2.0
  (2, 0)  -144.0
  (2, 2)  108.0
  (0, 2)  -144.0

Listing 5: Output from Listing 4
2.5 An Equality-Constrained SQP Example

Code Listing 6 shows an implementation of a basic sequential quadratic programming (SQP) algorithm for equality-constrained NLPs built from the components described above: the NLP interface, block vectors and matrices, and the linear solver interfaces.

 1 from pyomo.contrib.pynumero.interfaces.nlp import NLP
 2 from pyomo.contrib.pynumero.sparse import BlockVector, BlockMatrix
 3 from pyomo.contrib.pynumero.linalg.ma27 import MA27Interface
 4 import numpy as np
 5 from scipy.sparse import tril
 6
 7
 8 def sqp(nlp: NLP, max_iter=100, tol=1e-8):
 9     # setup KKT matrix
10     kkt = BlockMatrix(2, 2)
11     rhs = BlockVector(2)
12
13     # create and initialize the iteration vector
14     z = BlockVector(2)
15     z.set_block(0, nlp.get_primals())
16     z.set_block(1, nlp.get_duals())
17
18     # create the linear solver
19     linear_solver = MA27Interface()
20     linear_solver.set_cntl(1, 1e-6)  # pivot tolerance
21
22     # main iteration loop
23     for _iter in range(max_iter):
24         nlp.set_primals(z.get_block(0))
25         nlp.set_duals(z.get_block(1))
26
27         grad_lag = (nlp.evaluate_grad_objective() +
28                     nlp.evaluate_jacobian_eq().transpose() * z.get_block(1))
29         residuals = nlp.evaluate_eq_constraints()
30
31         if (np.abs(grad_lag).max() <= tol and
32                 np.abs(residuals).max() <= tol):
33             break
34
35         kkt.set_block(0, 0, nlp.evaluate_hessian_lag())
36         kkt.set_block(1, 0, nlp.evaluate_jacobian_eq())
37         kkt.set_block(0, 1, nlp.evaluate_jacobian_eq().transpose())
38
39         rhs.set_block(0, grad_lag)
40         rhs.set_block(1, residuals)
41
42         _kkt = tril(kkt.tocoo())
43         linear_solver.do_symbolic_factorization(_kkt.shape[0], _kkt.row,
44                                                 _kkt.col)
45         linear_solver.do_numeric_factorization(_kkt.row, _kkt.col,
46                                                _kkt.shape[0], _kkt.data)
47         delta = linear_solver.do_backsolve(-rhs.flatten())
48         z += delta

Listing 6: A basic SQP algorithm for equality-constrained NLPs implemented with PyNumero
 1 from pyomo.contrib.pynumero.interfaces.pyomo_nlp import PyomoNLP
 2
 3 # Create a Pyomo model
 4 m = build_burgers_model()
 5
 6 # Create a PyNumero PyomoNLP
 7 nlp = PyomoNLP(m)
 8
 9 # Solve the problem
10 sqp(nlp)

Listing 7: Solving a Pyomo model with the sqp function from Listing 6
The sqp function takes an instance of an NLP object as an argument along with
optional termination criteria. On line 10, we define the KKT matrix as a BlockMatrix with 2
block-rows and 2 block-columns (4 blocks total). On line 11, we define the KKT right-hand-
side as a BlockVector with 2 blocks. Each block in a BlockMatrix can contain a SciPy
sparse matrix or another BlockMatrix. Alternatively, a block can be empty, indicating
that there are no non-zeros in that block. Each block in a BlockVector can contain a
NumPy array or another BlockVector. Line 14 constructs a BlockVector for the iteration
variables that includes the primal and dual variables. The loop for computing steps begins
on line 23. On lines 24-25, we update the values of the primals and duals within the NLP
object. On lines 27-33, we compute the gradient of the Lagrangian and the residuals of the
constraints, check the norms for convergence, and terminate the algorithm if the tolerance
has been satisfied. If not converged, we need to compute a step in the iteration variables.
On lines 35-40, we build the KKT matrix and right-hand-side (rhs) with the Hessian of the
Lagrangian, the Jacobian of the constraints, the gradient of the Lagrangian and the residuals
of the constraints. On lines 42-47, we factorize and solve the KKT system. Finally, on line
48, we update the primals and duals using the computed step.
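For reference, each pass through the loop solves the symmetric KKT system (writing $W$ for the Hessian of the Lagrangian, $J$ for the Jacobian of the equality constraints $c$, and $\lambda$ for the duals):
\[
\begin{bmatrix} W & J^T \\ J & 0 \end{bmatrix}
\begin{bmatrix} \Delta x \\ \Delta \lambda \end{bmatrix}
= -\begin{bmatrix} \nabla f(x) + J^T \lambda \\ c(x) \end{bmatrix},
\]
which is exactly the block system assembled on lines 35-40 of Listing 6.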
Code Listing 7 demonstrates the use of the sqp function in Code Listing 6 to solve a
Pyomo model for a 2D PDE-constrained optimal control problem with Burgers’ Equation:
\[
\begin{aligned}
\min \quad & \int_0^1 \!\! \int_0^1 (y - \hat{y})^2 + \alpha u^2 \, dx \, dt && \text{(7a)} \\
\text{s.t.} \quad & \frac{\partial y}{\partial t} - v \frac{\partial^2 y}{\partial x^2} + \frac{\partial y}{\partial x} y = u && \text{(7b)} \\
& y(x=0) = 0 && \text{(7c)} \\
& y(x=1) = 0 && \text{(7d)} \\
& u(x=0) = 0 && \text{(7e)} \\
& u(x=1) = 0 && \text{(7f)} \\
& y(0 < x < 1, t=0) = \hat{y} && \text{(7g)} \\
& u(0 < x < 1, t=0) = 0 && \text{(7h)} \\
& \hat{y} = \begin{cases} 1 & x \le 0.5 \\ 0 & \text{otherwise} \end{cases} && \text{(7i)}
\end{aligned}
\]
The PDE-constrained optimal control problem is discretized using central finite differences in space and backward finite differences in time (Biegler 2010) with Pyomo.DAE (Nicholson et al. 2018). The resulting Pyomo model returned from build_burgers_model
(not shown here, but available online at https://github.com/Parapint/parapint) has
100,600 variables and 80,800 constraints. We construct a PyomoNLP instance on line 7 and
pass it to the sqp function on line 10. On a 2.9 GHz Quad-Core Intel Core i7 MacBook
Pro, Ipopt solves this problem in 1.27 seconds. The sqp function in Code Listing 6 solves
the problem in 1.4 seconds, only 11% slower than Ipopt. This demonstrates that there is
very little overhead introduced by writing the algorithm in Python.
Because PyNumero provides fast automatic differentiation capabilities for Pyomo ex-
pressions using ASL, and since all data is stored in NumPy arrays, users can write efficient
implementations like this SQP algorithm in very few lines of code using standard functions
within the NumPy/SciPy ecosystem.
3 Parapint
The primary goal of PyNumero is to facilitate research on decomposition algorithms for
nonlinear optimization. Here we utilize the MPIBlockVector and MPIBlockMatrix classes
described in Section 2.1 to develop a Schur-Complement based interior-point algorithm
for parallel solution of structured NLPs. The algorithm is available in Parapint (https:
//github.com/Parapint/parapint), an open-source Python package built on PyNumero.
In this section, we present Parapint both as an example of how PyNumero can be utilized
for parallel NLP algorithm development and as a framework for future research.
[Figure 5: Parapint architecture — Pyomo models for the individual partitions are combined through a composite NLP interface into structured vectors, Jacobians, and Hessians (MPIBlockVector/MPIBlockMatrix), which are used by the optimization algorithm and structure-aware linear solvers.]

As shown in Figure 5, Parapint builds on PyNumero in three ways. First, Parapint implements interior-point optimization algorithms from PyNumero's building blocks. Second, Parapint provides composite NLP interfaces, which extend PyNumero's NLP
interfaces since they construct a large NLP from a number of smaller partitions. Third,
Parapint implements linear solvers that can recognize the structure imposed by the com-
posite interfaces and exploit this structure for parallel solution of the linear system in the
step computation. The composite NLP interfaces and the Schur-Complement approach for
solving structured linear systems are discussed in Sections 3.1 and 3.2, respectively, includ-
ing a scalability analysis of Parapint’s Schur-Complement implementation. In Section 3.3,
we briefly describe Parapint’s interior-point algorithm. In Section 3.4, we show an example
of how a dynamic optimization problem can be solved in parallel using Parapint, along with
computational results.
3.2 Schur-Complement Decomposition

[Figure 6 plot: Schur-Complement solve time normalized to the 8-core case vs. number of blocks/processors (8 to 1024).]
Figure 6: Weak scaling for Schur-Complement decomposition.
The weak-scaling test problem is a structured parameter estimation problem partitioned into blocks. Here, $N$ is the set of blocks or partitions, $y_l^*$ are vectors of data, $q_l$ are the parameters being estimated, and $\theta$ are parameters common to all blocks. The relevant dimensions are $q_l \in \mathbb{R}^{5{,}000}$, $y_l \in \mathbb{R}^{600{,}000}$, $A \in \mathbb{R}^{600{,}000 \times 5{,}000}$, $\theta \in \mathbb{R}^{10}$, $P_l \in \mathbb{R}^{10 \times 5{,}000}$, and $P_l^d \in \mathbb{R}^{10 \times 10}$.
Weak scaling results are presented in Figure 6. As the figure shows, when the number of
coupling variables is small, the algorithm scales nearly perfectly to over 1,000 cores. The
largest problem solved has over 600,000,000 variables and constraints and is solved in under
5 seconds. As shown by Kang et al. (2014), the parallel efficiency of the (explicit) Schur-
Complement method degrades as the number of coupling variables increases due to the time
required to factorize a large, dense Schur-Complement. However, Figure 6 demonstrates that
PyNumero and Parapint provide a viable framework for parallel NLP algorithm development
and that the Python overhead is not significant for large problems.
3.4 Parallel solution of dynamic optimization problems

The test problem, Problem (12), is a variant of Problem (7) with a time-varying target profile, $\hat{y} = \lfloor \cos(2 \pi t) \rceil$ for $x \le 0.5$ and $\hat{y} = 0$ otherwise, where $\lfloor w \rceil$ rounds $w$ to the nearest integer. We model the problem with Pyomo.DAE (Nicholson et al. 2018, Bynum et al. 2021) and discretize the problem with central finite differences with respect to $x$ and backward finite differences with respect to $t$. We used 30
[Figure 7 plot: solution time (s) vs. time horizon / number of processes (2 to 1024, log scale) for the full-space serial method (2 TB shared memory, with linear extrapolation) and the parallel Schur-Complement method (distributed memory).]
Figure 7: Scaling results for Parapint's interior-point algorithm applied to Problem (12).
finite elements in x and 1600 finite elements per unit time. For scalability studies, the
problem size was increased by increasing tf .
Scaling results are presented in Figure 7. We solve Problem (12) with Parapint’s interior-
point algorithm both in serial with a full-space method (no decomposition) and in parallel
with Schur-Complement decomposition. In the full-space method, a single, large KKT
system is solved with a sparse symmetric linear solver (MA27) at each iteration. The full-
space method is performed on a 2-TB, shared memory machine with 40 2.8 GHz Intel
Xeon CPU E7-8891 v3 cores (2 threads per core) while the Schur-Complement method is
performed on a distributed memory machine with 64 GB and 8 2.6 GHz Intel Xeon CPU E5-
2670 cores per node (2 threads per core). We utilize up to 8 processes per node. The x-axis
shows the time horizon of the problem (tf ), which is equal to the number of processes used
for the Schur-Complement method, on a logarithmic scale. The y-axis shows the solution
time. The full-space method is able to solve instances up to tf = 128. However, the full-space
method scales very closely to linearly with tf , and we projected the full-space solution time
to tf =1024 using linear extrapolation. As the figure shows, Parapint’s Schur-Complement
based interior-point algorithm scales well to over 1024 cores, achieving a projected speedup
factor of approximately 360 on 1024 cores.
The largest problem solved with the Schur-Complement method has approximately
250,000,000 variables and converges in under 30 seconds. The Schur-Complement for this
problem is 59,334 × 59,334 with 3,439,690 non-zeros. In order to achieve good scalability,
Parapint’s Schur-Complement linear solver exploits sparsity of the Schur-Complement ma-
trix for efficient communication and factorization. The script used to generate these results
is presented in Listings 8 – 11. Listing 8 shows the required import statements along with
a few statements setting up the MPI communicator and the logger. Listing 9 shows the
function used to build a Pyomo model for Problem (12) given a time window. Listing 10
shows how to set up the composite NLP interface used by the interior-point algorithm. Note that the build_model_for_time_block method returns the state variables at the start and end of the time window, and the base class automatically introduces the coupling variables and sets up the linking constraints. Finally, Listing 11 shows how to solve the problem and record the results.

import pyomo.environ as pe
from pyomo import dae
import parapint
import numpy as np
from mpi4py import MPI
import logging
import argparse
import math
from pyomo.common.timing import HierarchicalTimer
import csv
import psutil
from pyomo.contrib.pynumero.sparse.mpi_block_matrix import MPIBlockMatrix


comm: MPI.Comm = MPI.COMM_WORLD  # MPI communicator
rank = comm.Get_rank()
size = comm.Get_size()  # number of processes

logger = logging.getLogger(__name__)

Listing 8: Import statements and MPI setup

def build_burgers_model(nfe_x=50, nfe_t=50, start_t=0, end_t=1, add_init_conditions=True):
    dt = (end_t - start_t) / float(nfe_t)  # finite element size (time)
    start_x = 0
    end_x = 1
    dx = (end_x - start_x) / float(nfe_x)  # finite element size (space)

    m = pe.Block(concrete=True)
    m.omega = pe.Param(initialize=0.02)
    m.v = pe.Param(initialize=0.01)
    m.r = pe.Param(initialize=0)
    m.x = dae.ContinuousSet(bounds=(start_x, end_x))
    m.t = dae.ContinuousSet(bounds=(start_t, end_t))
    m.y = pe.Var(m.x, m.t)
    m.dydt = dae.DerivativeVar(m.y, wrt=m.t)
    m.dydx = dae.DerivativeVar(m.y, wrt=m.x)
    m.dydx2 = dae.DerivativeVar(m.y, wrt=(m.x, m.x))
    m.u = pe.Var(m.x, m.t)

    def y_init_rule(m, x, t):  # desired state profile
        if x <= 0.5 * end_x:
            return 1 * round(math.cos(2 * math.pi * t))
        return 0
    m.y0 = pe.Param(m.x, m.t, default=y_init_rule)

    def upper_x_bound(m, t):  # boundary conditions
        return m.y[end_x, t] == 0
    m.upper_x_bound = pe.Constraint(m.t, rule=upper_x_bound)

    def lower_x_bound(m, t):  # boundary conditions
        return m.y[start_x, t] == 0
    m.lower_x_bound = pe.Constraint(m.t, rule=lower_x_bound)

    def upper_x_ubound(m, t):  # no control at boundary
        return m.u[end_x, t] == 0
    m.upper_x_ubound = pe.Constraint(m.t, rule=upper_x_ubound)

    def lower_x_ubound(m, t):  # no control at boundary
        return m.u[start_x, t] == 0
    m.lower_x_ubound = pe.Constraint(m.t, rule=lower_x_ubound)

    def lower_t_bound(m, x):  # initial conditions
        if x == start_x or x == end_x:
            return pe.Constraint.Skip
        return m.y[x, start_t] == m.y0[x, start_t]

    def lower_t_ubound(m, x):  # initial control
        if x == start_x or x == end_x:
            return pe.Constraint.Skip
        return m.u[x, start_t] == 0

    if add_init_conditions:
        m.lower_t_bound = pe.Constraint(m.x, rule=lower_t_bound)
        m.lower_t_ubound = pe.Constraint(m.x, rule=lower_t_ubound)

    def pde(m, x, t):  # the governing PDE
        if t == start_t or x == end_x or x == start_x:
            e = pe.Constraint.Skip
        else:
            e = (m.dydt[x, t] - m.v * m.dydx2[x, t] + m.dydx[x, t] * m.y[x, t]
                 == m.r + m.u[x, m.t.prev(t)])
        return e
    m.pde = pe.Constraint(m.x, m.t, rule=pde)

    disc = pe.TransformationFactory('dae.finite_difference')
    disc.apply_to(m, nfe=nfe_t, wrt=m.t, scheme='BACKWARD')  # discretize time
    disc.apply_to(m, nfe=nfe_x, wrt=m.x, scheme='CENTRAL')   # discretize space

    def intX(m, x, t):  # objective integral (space)
        return (m.y[x, t] - m.y0[x, t])**2 + m.omega * m.u[x, t]**2
    m.intX = dae.Integral(m.x, m.t, wrt=m.x, rule=intX)

    def intT(m, t):  # objective integral (time)
        return m.intX[t]
    m.intT = dae.Integral(m.t, wrt=m.t, rule=intT)

    def obj(m):  # minor correction to integral at block boundaries
        e = 0.5 * m.intT
        for x in sorted(m.x):
            if x != start_x and x != end_x:
                e += 0.5 * 0.5 * dx * dt * m.omega * m.u[x, start_t]**2
        return e
    m.obj = pe.Objective(rule=obj)

    return m

Listing 9: Building a Pyomo model for Problem (12) for a given time window

class BurgersInterface(parapint.interfaces.MPIDynamicSchurComplementInteriorPointInterface):
    def __init__(self, start_t, end_t, num_time_blocks, nfe_t, nfe_x):
        self.nfe_x = nfe_x
        self.dt = (end_t - start_t) / float(nfe_t)
        super(BurgersInterface, self).__init__(start_t=start_t, end_t=end_t,
                                               num_time_blocks=num_time_blocks, comm=comm)

    def build_model_for_time_block(self, ndx, start_t, end_t, add_init_conditions):
        nfe_t = math.ceil((end_t - start_t) / self.dt)
        m = build_burgers_model(nfe_x=self.nfe_x, nfe_t=nfe_t, start_t=start_t, end_t=end_t,
                                add_init_conditions=add_init_conditions)

        return (m,
                [m.y[x, start_t] for x in sorted(m.x) if x not in {0, 1}],
                [m.y[x, end_t] for x in sorted(m.x) if x not in {0, 1}])

Listing 10: Setting up the composite NLP interface for Problem (12)

def setup_logging(args):
    if rank == 1:
        logging.basicConfig(level=logging.INFO)


class Args(object):
    def __init__(self):
        self.nfe_x = 50    # number of finite elements in space
        self.nfe_t = 200   # number of finite elements in time per unit time
        self.end_t = 1     # time horizon
        self.nblocks = 4   # number of blocks for decomposition

    def parse_arguments(self):
        parser = argparse.ArgumentParser()
        parser.add_argument('--nfe_x', type=int, required=True,
                            help='number of finite elements for x')
        parser.add_argument('--end_t', type=int, required=True, help='end time')
        parser.add_argument('--nfe_t_per_t', type=int, required=False, default=100,
                            help='number of finite elements for t per unit time')
        parser.add_argument('--nblocks', type=int, required=True,
                            help='number of time blocks for schur complement')
        args = parser.parse_args()
        self.nfe_x = args.nfe_x
        self.end_t = args.end_t
        self.nfe_t = args.nfe_t_per_t * args.end_t
        self.nblocks = args.nblocks


def main(args, subproblem_solver_class, subproblem_solver_options):
    # construct the composite NLP interface
    interface = BurgersInterface(start_t=0,
                                 end_t=args.end_t,
                                 num_time_blocks=args.nblocks,
                                 nfe_t=args.nfe_t,
                                 nfe_x=args.nfe_x)
    # construct the Schur-Complement linear solver
    linear_solver = parapint.linalg.MPISchurComplementLinearSolver(
        subproblem_solvers={ndx: subproblem_solver_class(**subproblem_solver_options)
                            for ndx in range(args.nblocks)},
        schur_complement_solver=subproblem_solver_class(**subproblem_solver_options))
    # specify options for the interior point algorithm
    options = parapint.algorithms.IPOptions()
    options.linalg.solver = linear_solver
    # construct a timer for reporting stats on computational performance
    timer = HierarchicalTimer()
    comm.Barrier()
    # solve the problem with the interior point algorithm
    status = parapint.algorithms.ip_solve(interface=interface, options=options, timer=timer)
    assert status == parapint.algorithms.InteriorPointStatus.optimal
    # store the results in a csv file
    n_primals = interface.n_primals()
    logger.info('\n' + str(timer))
    if rank == 1:
        f = open('burgers_' + str(args.end_t) + '_' + str(args.nfe_x) + '_'
                 + str(args.nfe_t) + '_' + str(size) + '.csv', 'w')
        fieldnames = ['end_t', 'nfe_x', 'nfe_t', 'size', 'n_blocks', 'n_primals',
                      'sc_nnz', 'sc_dim', 'virt_mem', 'cpu_percent']
        timer_identifiers = timer.get_timers()
        fieldnames.extend(timer_identifiers)
        writer = csv.writer(f)
        writer.writerow(fieldnames)
        row = [args.end_t, args.nfe_x, args.nfe_t, size, args.nblocks, n_primals,
               linear_solver.schur_complement.data.size,
               linear_solver.schur_complement.shape[0],
               psutil.virtual_memory().percent, psutil.cpu_percent()]
        row.extend(timer.get_total_time(name) for name in timer_identifiers)
        writer.writerow(row)
        f.close()


if __name__ == '__main__':
    args = Args()
    args.parse_arguments()
    setup_logging(args)
    # cntl[1] is the MA27 pivot tolerance
    main(args=args,
         subproblem_solver_class=parapint.linalg.InteriorPointMA27Interface,
         subproblem_solver_options={'cntl_options': {1: 1e-6}})

Listing 11: Solving Problem (12) and recording the results
4 Distribution
The PyNumero package can be obtained with Pyomo. All Python files are distributed under
the Pyomo umbrella available at https://github.com/Pyomo. Instructions for compiling
the extensions can be found at https://pyomo.readthedocs.io/en/stable/contributed_
packages/pynumero/installation.html.
Parapint can be installed with pip through PyPI (https://pypi.org/project/parapint/).
5 Conclusions

We presented PyNumero, an open-source framework that enables rapid development of performant NLP algorithms in Python. With the SQP example in Section 2.5, we showed that such an algorithm can solve an optimization problem with 100K variables and 80K constraints, where the overhead from the Python interface to ASL and HSL only increases the solution time by 11% for large-scale instances.
PyNumero makes comprehensive use of object-oriented principles, applying polymorphism and inheritance to algorithms and problem formulations that exploit block structure. Because block structure arises in many real-world optimization problems, we expect this design to promote research on decomposition algorithms. Stochastic programming problems and dynamic optimization problems are of special interest. Current developments in Pyomo to model dynamics and uncertainty in optimization problems (Watson et al. 2012, Nicholson et al. 2018) can be combined with the features offered in PyNumero to prototype and explore new decomposition approaches.
PyNumero’s parallel, block-based linear algebra tools make it possible to write efficient
and scalable parallel NLP algorithms in Python. As an example, we presented Parapint,
a Python package built on PyNumero for parallel solution of stochastic and dynamic opti-
mization problems. Parapint currently includes a Schur-Complement based interior-point
algorithm, and computational results were presented illustrating excellent performance to
at least 1024 cores.
As part of future work we plan to include interfaces to different automatic differentiation
packages. Currently, PyNumero relies on ASL to compute first and second derivatives. Effi-
cient packages available in Python like CasADi and PyAdolC would be excellent extensions
to PyNumero. Additionally, Parapint will be extended to include additional methods for
parallel solution of stochastic and dynamic optimization problems, including cyclic reduc-
tion (Wan et al. 2019), overlapping Schwarz (Shin et al. 2020a), and implicit methods (Kang
et al. 2014).
6 Acknowledgements
The authors would like to thank V. Zavala and D. Ridzal for their valuable inputs.
Sandia National Laboratories is a multimission laboratory managed and operated by
National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary
of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear
Security Administration under contract DE-NA-0003525. This paper describes objective
technical results and analysis. Any subjective views or opinions that might be expressed
in the paper do not necessarily represent the views of the U.S. Department of Energy
or the United States Government. This work was funded in part by the Institute for the
Design of Advanced Energy Systems (IDAES) with funding from the Office of Fossil Energy,
Cross-Cutting Research, U.S. Department of Energy. This work was also funded by Sandia
National Laboratories Laboratory Directed Research and Development (LDRD) program.
References
Patrick R Amestoy, Iain S Duff, Jean-Yves L'Excellent, and Jacko Koster. MUMPS: a general
purpose distributed memory sparse solver. In International Workshop on Applied Parallel
Computing, pages 121–130. Springer, 2000.
J. Andersson. A General-Purpose Software Framework for Dynamic Optimization. PhD thesis,
Arenberg Doctoral School, KU Leuven, Department of Electrical Engineering (ESAT/SCD)
and Optimization in Engineering Center, Kasteelpark Arenberg 10, 3001-Heverlee, Belgium,
October 2013.
L T. Biegler. Nonlinear Programming: Concepts, Algorithms, and Applications to Chemical Pro-
cesses. SIAM, 2010.
Michael L Bynum, Gabriel A Hackebeil, William E Hart, Carl D Laird, Bethany L Nicholson,
John D Siirola, Jean-Paul Watson, and David L Woodruff. Pyomo—Optimization Modeling
in Python, volume 67. Springer Nature, 2021.
Naiyuan Chiang, Cosmin G Petra, and Victor M Zavala. Structured nonconvex optimization of
large-scale energy systems using PIPS-NLP. In Proc. of the 18th Power Systems Computation
Conference (PSCC), Wroclaw, Poland, 2014.
Lisandro D. Dalcin, Rodrigo R. Paz, Pablo A. Kler, and Alejandro Cosimo. Parallel distributed
computing using Python. Advances in Water Resources, 34(9):1124–1139, September 2011.
Haoyang Deng and Toshiyuki Ohtsuka. A parallel Newton-type method for nonlinear model pre-
dictive control. Automatica, 109:108560, 2019.
Iain S Duff and John K Reid. The multifrontal solution of indefinite sparse symmetric linear
equations. ACM Transactions on Mathematical Software (TOMS), 9(3):302–325, 1983.
Jonathan Eckstein and Dimitri P Bertsekas. On the Douglas-Rachford splitting method and the
proximal point algorithm for maximal monotone operators. Mathematical Programming, 55
(1-3):293–318, 1992.
R. Fourer, D.M. Gay, and B.W. Kernighan. AMPL: A Modeling Language for Mathematical Pro-
gramming. Scientific Press, 1993. ISBN 9780894262333. URL https://books.google.com/
books?id=8vJQAAAAMAAJ.
David M Gay. Hooking your solver to AMPL. Technical report, Bell Laboratories, 1997.
David M Gay. Writing .nl files. Technical report, Sandia National Laboratories, 2005.
J. Gondzio and A. Grothey. Exploiting structure in parallel implementation of interior point meth-
ods for optimization. Computational Management Science, 6(2):135–160, May 2009.
Andreas Griewank, David Juedes, and Jean Utke. Algorithm 755: ADOL-C: A package for the
automatic differentiation of algorithms written in C/C++. ACM Trans. Math. Softw., 22(2):
131–167, June 1996. ISSN 0098-3500.
Charles R. Harris, K. Jarrod Millman, Stéfan J. van der Walt, Ralf Gommers, Pauli Virtanen,
David Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg, Nathaniel J. Smith, Robert
Kern, Matti Picus, Stephan Hoyer, Marten H. van Kerkwijk, Matthew Brett, Allan Hal-
dane, Jaime Fernández del Rı́o, Mark Wiebe, Pearu Peterson, Pierre Gérard-Marchant, Kevin
Sheppard, Tyler Reddy, Warren Weckesser, Hameer Abbasi, Christoph Gohlke, and Travis E.
Oliphant. Array programming with NumPy. Nature, 585(7825):357–362, September 2020.
doi: 10.1038/s41586-020-2649-2. URL https://doi.org/10.1038/s41586-020-2649-2.
Jia Kang, Yankai Cao, Daniel P. Word, and C. D. Laird. An interior-point method for efficient so-
lution of block-structured NLP problems using an implicit Schur-complement decomposition.
Computers and Chemical Engineering, 71:563–573, 2014.
Bethany Nicholson, John D Siirola, Jean-Paul Watson, Victor M Zavala, and Lorenz T Biegler.
pyomo.dae: a modeling and automatic discretization framework for optimization with dif-
ferential and algebraic equations. Mathematical Programming Computation, 10(2):187–223,
2018.
Cosmin G. Petra, Olaf Schenk, Miles Lubin, and Klaus Gäertner. An augmented incomplete
factorization approach for computing the Schur complement in stochastic optimization. SIAM
Journal on Scientific Computing, 36(2):C139–C162, 2014. doi: 10.1137/130908737. URL
https://doi.org/10.1137/130908737.
R Tyrrell Rockafellar and Roger J-B Wets. Scenarios and policy aggregation in optimization under
uncertainty. Mathematics of operations research, 16(1):119–147, 1991.
Jose S Rodriguez, Bethany Nicholson, Carl Laird, and Victor M Zavala. Benchmarking ADMM in
nonconvex NLPs. Computers & Chemical Engineering, 119:315–325, 2018.
Jose S Rodriguez, Carl D Laird, and Victor M Zavala. Scalable preconditioning of block-structured
linear algebra systems using ADMM. Computers & Chemical Engineering, 133:106478, 2020.
M. Sala, W. Spotz, and M. Heroux. PyTrilinos: High-performance distributed-memory solvers for
Python. ACM Transactions on Mathematical Software (TOMS), 34, March 2008.
Sungho Shin, Mihai Anitescu, and Victor M Zavala. Overlapping Schwarz decomposition for con-
strained quadratic programs. In 2020 59th IEEE Conference on Decision and Control (CDC),
pages 3004–3009. IEEE, 2020a.
Sungho Shin, Carleton Coffrin, Kaarthik Sundar, and Victor M Zavala. Graph-based modeling and
decomposition of energy infrastructures. arXiv preprint arXiv:2010.02404, 2020b.
Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cour-
napeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, Stéfan J.
van der Walt, Matthew Brett, Joshua Wilson, K. Jarrod Millman, Nikolay Mayorov, An-
drew R. J. Nelson, Eric Jones, Robert Kern, Eric Larson, C J Carey, İlhan Polat, Yu Feng,
Eric W. Moore, Jake VanderPlas, Denis Laxalde, Josef Perktold, Robert Cimrman, Ian Hen-
riksen, E. A. Quintero, Charles R. Harris, Anne M. Archibald, Antônio H. Ribeiro, Fabian
Pedregosa, Paul van Mulbregt, and SciPy 1.0 Contributors. SciPy 1.0: Fundamental Al-
gorithms for Scientific Computing in Python. Nature Methods, 17:261–272, 2020a. doi:
10.1038/s41592-019-0686-2.
Pauli Virtanen, Ralf Gommers, Travis E Oliphant, Matt Haberland, Tyler Reddy, David Courna-
peau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods, 17(3):261–272,
2020b.
Andreas Wächter and Lorenz T Biegler. On the implementation of a primal-dual interior point filter
line search algorithm for large-scale nonlinear programming. Mathematical Programming, 106:
25–57, 2006.
Wei Wan, John P Eason, Bethany Nicholson, and Lorenz T Biegler. Parallel cyclic reduction
decomposition for dynamic optimization problems. Computers & Chemical Engineering, 120:
54–69, 2019.
Jean-Paul Watson, David L Woodruff, and William E Hart. PySP: modeling and solving stochastic
programs in Python. Mathematical Programming Computation, 4(2):109–149, 2012.
Daniel P Word, Jean-Paul Watson, David L Woodruff, and Carl D Laird. A progressive hedging
approach for parameter estimation via stochastic nonlinear programming. In Computer Aided
Chemical Engineering, volume 31, pages 1507–1511. Elsevier, 2012.
Daniel P Word, Jia Kang, Johan Akesson, and Carl D Laird. Efficient parallel solution of large-scale
nonlinear dynamic optimization problems. Computational Optimization and Applications, 59
(3):667–688, 2014.
Victor M Zavala, Carl D Laird, and Lorenz T Biegler. Interior-point decomposition approaches for
parallel solution of large-scale nonlinear parameter estimation problems. Chemical Engineer-
ing Science, 63(19):4834–4845, 2008.