Decision Diagrams for Optimization
David Bergman
Andre A. Cire
Willem-Jan van Hoeve
John Hooker
Artificial Intelligence: Foundations, Theory, and Algorithms
Series editors: Barry O'Sullivan, Cork, Ireland; Michael Wooldridge, Oxford, UK
More information about this series at http://www.springer.com/series/13900
David Bergman, Department of Operations and Information Management, School of Business, University of Connecticut, Storrs, CT, USA
Andre A. Cire, Department of Management, UTSC, University of Toronto, Toronto, ON, Canada
Willem-Jan van Hoeve, Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA, USA
John Hooker, Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA, USA
Contents

1 Introduction
  1.1 Motivation for the Book
  1.2 A New Solution Technology
  1.3 An Example
  1.4 Plan of the Book
2 Historical Overview
  2.1 Introduction
  2.2 Origins of Decision Diagrams
  2.3 Decision Diagrams in Optimization
    2.3.1 Early Applications
    2.3.2 A Discrete Optimization Method
    2.3.3 Decision Diagrams in Constraint Programming
    2.3.4 Relaxed Decision Diagrams
    2.3.5 A General-Purpose Solver
    2.3.6 Markov Decision Processes
References
Index
Foreword
This book provides an excellent demonstration of how the concepts and tools of one
research community can cross into another, yielding powerful insights and ideas.
Early work on decision diagrams focused on modeling and verifying properties
of digital systems, including digital circuits and abstract protocols. Decision dia-
grams (DDs) provided a compact representation of these systems and a useful data
structure for algorithms to construct these representations and to answer queries
about them. Fundamentally, though, they were used to solve problems having yes/no
answers, such as: “Is it possible for the system to reach a deadlocked state?”, or “Do
these two circuits compute the same function?”
Using DDs for optimization introduces an entirely new set of possibilities and
challenges. Rather than just finding some satisfying solution, the program must
find a “best” solution, based on some objective function. Researchers in the digital
systems and verification communities recognized that, given a DD representation of
a solution space, it is easy to count the number of solutions and to find an optimal
solution based on very general classes of objective functions. But it took the skill of
leading experts in optimization, including the authors of this book, to fully expand
DDs into a general-purpose framework for solving optimization problems.
The authors show how the main strategies used in discrete optimization, includ-
ing problem relaxation, branching search, constraint propagation, primal solving,
and problem-specific modeling, can be adapted and cast into a DD framework. DDs
become a data structure for managing the entire optimization process: finding upper
bounds and feasible solutions, storing solutions to subproblems, applying global
constraints, and guiding further search. They are especially effective for solving
problems that fare poorly with traditional optimization techniques, including linear
Chapter 1
Introduction
Abstract This introductory chapter explains the motivation for developing decision
diagrams as a new discrete optimization technology. It shows how decision diagrams
implement the five main solution strategies of general-purpose optimization and
constraint programming methods: relaxation, branching search, constraint propaga-
tion, primal heuristics, and intelligent modeling. It presents a simple example to
illustrate how decision diagrams can be used to solve an optimization problem. It
concludes with a brief outline of the book.
1.3 An Example
A small example will illustrate some of the key concepts of optimization with
decision diagrams. Consider the integer programming problem
The example is chosen for its simplicity, and not because decision diagram tech-
nology is primarily directed at integer programming problems. This is only one of
many classes of problems that can be formulated recursively for solution by decision
diagrams.
A decision diagram for this problem instance represents possible assignments to
the variables x1 , . . . , x5 and is depicted in Fig. 1.1. It is a directed acyclic graph in
which the nodes are partitioned into six layers so that an arc leaving a node at layer
i corresponds to a value assignment for variable xi . Since all variables are binary in
this problem, the diagram is a binary decision diagram and has two types of arcs:
dashed arcs in layer i represent the assignment xi = 0, and solid arcs represent xi = 1.
Any path from the root node r to the terminal node t represents a complete value
assignment to the variables xi . One can verify that the diagram in Fig. 1.1 exactly
represents the seven feasible solutions of problem (1.1).
To capture the objective function, we associate with each arc a weight that
represents the contribution of that value assignment to the objective function.
Dashed arcs have a weight of zero in this instance, while solid arcs have weight
equal to the objective function coefficient of that variable. It follows that the value
assignment that maximizes the objective function corresponds to the longest path
from r to t with respect to these arc weights. For the diagram in Fig. 1.1, the
longest path has value −8 and indicates the assignment (x1 , . . . , x5 ) = (1, 1, 1, 1, 0),
which is the optimal solution of (1.1). In general, any linear function (or, more
generally, any separable function) can be optimized in polynomial time in the size
of the diagram. This fact, together with the potential to represent feasible solutions in a
compact fashion, was an early motivation for the application of decision diagrams to
optimization.
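Concretely, optimizing a separable objective over a weighted decision diagram amounts to a single longest-path pass over the layers. The sketch below uses a small illustrative diagram (not the diagram of Fig. 1.1, which is not reproduced here); all node names and arc weights are made up for the example.

```python
# Longest path in a layered decision diagram, one pass in layer order.
# The diagram below is illustrative only, not the diagram of Fig. 1.1.
arcs = [
    ("r", "a", 0), ("r", "b", 5),     # layer 1: the two choices for x1
    ("a", "c", 3), ("a", "d", 0),     # layer 2: choices for x2
    ("b", "d", 3),
    ("c", "t", 0), ("d", "t", -1),    # last layer, into the terminal t
]
order = ["r", "a", "b", "c", "d", "t"]   # the layer order is topological

# One relaxation per arc gives O(|A|) time overall, which is why a
# separable objective can be optimized in time linear in the diagram size.
dist = {v: float("-inf") for v in order}
dist["r"] = 0
for tail, head, w in arcs:            # arcs are already listed in layer order
    dist[head] = max(dist[head], dist[tail] + w)
print(dist["t"])  # -> 7, the value of the best r-to-t path in this diagram
```

Recording, for each node, which arc attained the maximum would recover the optimal variable assignment as well.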
A formidable obstacle to this approach, however, is that a decision diagram
that exactly represents the feasible solutions of a problem can grow exponentially
Fig. 1.1 A decision diagram for problem (1.1). Arcs are partitioned into layers, one for each
problem variable. Dashed and solid arcs in layer i represent the assignments xi = 0 and xi = 1,
respectively.
with the problem size. This is true in particular of integer programming, because
a shortest- or longest-path computation in the associated diagram is a linear pro-
gramming problem, and there are integer programming problems that cannot be
reformulated as linear programming problems of polynomial size [65]. In fact, for most
practical problem classes the decision diagrams grow exponentially, so that only very
small instances can be solved in this way.
To circumvent this issue, the authors in [4] introduced the concept of a relaxed
decision diagram, which is a diagram of limited size that represents an overapprox-
imation of the solution set. That is, all feasible solutions are associated with some
path in the diagram, but not all paths in the diagram correspond to a feasible solution
of the problem. The size of the diagram is controlled by limiting its width, which is
the maximum number of nodes in any layer. A key property of relaxed diagrams is
that a longest path now yields an upper bound on the maximum value of an objective
function (and a shortest path yields a lower bound for minimization problems).
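The width-limiting idea can be sketched in a few lines. The knapsack-style instance and the merge rule below are illustrative assumptions (they are not problem (1.1), and Chapter 4 develops much better merge heuristics): when a layer exceeds the width limit, the most-constrained state is merged into the next one, which permits strictly more completions, so no feasible path is lost and the resulting value is a valid upper bound.

```python
# Exact vs. relaxed top-down compilation for an illustrative knapsack-style
# constraint sum(w_i x_i) <= cap. States are the capacity used so far.
weights, profits, cap = [4, 3, 5, 2], [6, 4, 7, 3], 7
max_width = 2

def best_value(relax=False):
    layer = {0: 0}                        # state (capacity used) -> best value
    for w, p in zip(weights, profits):
        nxt = {}
        for used, best in layer.items():
            for take in (0, 1):
                u = used + take * w
                if u <= cap:
                    nxt[u] = max(nxt.get(u, float("-inf")), best + take * p)
        # Relaxation: merge states until the layer fits the width limit.
        while relax and len(nxt) > max_width:
            a = max(nxt); va = nxt.pop(a)   # most-constrained state...
            b = max(nxt)                    # ...merged into the next one,
            nxt[b] = max(nxt[b], va)        # which permits more completions
        layer = nxt
    return max(layer.values())

exact, relaxed = best_value(), best_value(relax=True)
print(exact, relaxed)  # -> 10 13: the relaxed diagram yields a valid upper bound
```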
For example, Fig. 1.2 depicts a relaxed decision diagram for problem (1.1) with
a limited width of 2. Of the ten r–t paths, seven represent the feasible solutions
of (1.1), and the remaining three represent infeasible solutions. In particular, the
Fig. 1.2 A relaxed decision diagram for problem (1.1), with a limited width of at most two nodes
in any layer.
1.4 Plan of the Book
After a brief literature review in Chapter 2, the book develops methods for construct-
ing exact, relaxed, and restricted decision diagrams for optimization problems. It
then presents a general-purpose method for solving discrete optimization problems,
followed by discussions of variable order, recursive modeling, constraint program-
ming, and two special classes of problems.
Chapter 3 formally develops methods for constructing decision diagrams for
discrete optimization, based on a recursive (dynamic programming) model of the
problem that associates states with nodes of the diagram. It presents recursive
models of three classical optimization problems that reappear in later chapters: the
maximum independent set problem, the maximum cut problem on a graph, and the
maximum 2-satisfiability problem.
Chapter 4 modifies the compilation procedure of the previous chapter to create
a relaxed decision diagram by merging states as the diagram is constructed. It
investigates how various parameters affect the quality of the resulting bound. It
reports computational tests showing that, for the independent set problem, relaxed
decision diagrams can deliver tighter bounds, in less computation time, than those
obtained by linear programming and cutting planes at the root node in a state-of-
the-art integer programming solver.
Chapter 5 presents an algorithm for top-down construction of restricted decision
diagrams that provide a primal heuristic for finding feasible solutions. Computa-
tional results show that restricted diagrams can deliver better solutions than integer
programming technology for large set covering and set packing problems.
Chapter 6 combines the ideas developed in previous chapters to devise a general-
purpose solution method for discrete optimization, based entirely on decision dia-
grams. It introduces a novel search algorithm that branches on nodes of a relaxed
or restricted decision diagram. It reports computational tests showing that a solver
based on decision diagrams is competitive with or superior to state-of-the-art in-
teger programming technology on the three classes of problems described earlier,
even though integer programming benefits from decades of development, and even
though these problems have natural integer programming models. Further compu-
tational tests indicate that branching in a decision diagram can utilize massively
parallel computation much more effectively than integer programming methods.
Chapter 7 examines more deeply the effect of variable ordering on the size of
exact decision diagrams and the quality of bounds provided by relaxed diagrams,
Chapter 2
Historical Overview
Abstract This chapter provides a brief review of the literature on decision diagrams,
primarily as it relates to their use in optimization and constraint programming. It
begins with an early history of decision diagrams and their relation to switching
circuits. It then surveys some of the key articles that brought decision diagrams into
optimization and constraint solving. In particular it describes the development of
relaxed and restricted decision diagrams, the use of relaxed decision diagrams for
enhanced constraint propagation and optimization bounding, and the elements of a
general-purpose solver. It concludes with a brief description of the role of decision
diagrams in solving some Markov decision problems in artificial intelligence.
2.1 Introduction
Research on decision diagrams spans more than five decades, resulting in a large
literature and a wide range of applications. This chapter provides a brief review of
this literature, primarily as it relates to the use of decision diagrams in optimization
and constraint programming. It begins with an early history of decision diagrams,
showing how they originated from representations of switching circuits and evolved
to the ordered decision diagrams now widely used for circuit design, product
configuration, and other purposes.
The chapter then surveys some of the key articles that brought decision diagrams
into optimization and constraint programming. It relates how decision diagrams
initially played an auxiliary role in the solution of some optimization problems
and were subsequently proposed as a stand-alone optimization method, as well as
2.2 Origins of Decision Diagrams
The basic idea behind decision diagrams was introduced by Lee [110] in the form
of a binary-decision program, which is a particular type of computer program that
represents a switching circuit. Shannon had shown in his famous master’s thesis
[144] that switching circuits can be represented in Boolean algebra, thus bringing
Boole’s ideas into the computer age. Lee’s objective was to devise an alternative
representation that is more conducive to the actual computation of the outputs of
switching circuits.
Figure 2.1, taken from Lee’s article, presents a simple switching circuit. The
switches are controlled by binary variables x, y and z. The symbol x in the circuit
indicates a switch that is open when x = 0, while x̄ indicates a switch that is open
when x = 1, and similarly for the other variables. The output of the circuit is 1 if
there is an open path from left to right, and otherwise the output is 0. For instance,
(x, y, z) = (1, 1, 0) leads to an output of 1, while (x, y) = (0, 0) leads to an output of
0, irrespective of the value of z.
A binary-decision program consists of a single type of instruction that Lee calls
T , which has the form

T : x; A, B.

The instruction tests the variable x and transfers control to instruction A if x = 0
and to instruction B if x = 1.
Fig. 2.1 Example of a switching circuit from [110].

For example, the circuit of Fig. 2.1 corresponds to the binary-decision program
1. T : x; 2, 4
2. T : y; θ , 3
3. T : z; θ , I (2.1)
4. T : y; 3, 5
5. T : z; I, θ
where θ is Lee’s symbol for an output of 0, and I for an output of 1. The five
instructions correspond conceptually to the nodes of a decision diagram, because
at each node there is a choice to move to one of two other nodes, and the choice
depends on the value of an associated variable. However, the nodes need not be
organized into layers that correspond to the variables, and a given assignment to the
variables need not correspond to a path in the diagram. A BDD representation of
(2.1) appears in Fig. 2.2. In this case, the nodes can be arranged in layers, but there
is no path corresponding to (x, y, z) = (0, 0, 1).1
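Program (2.1) can be executed directly. The interpreter below is a sketch: it assumes the branching semantics of "T : x; A, B" described above (continue at A when x = 0 and at B when x = 1), which is consistent with the circuit outputs noted earlier; θ and I are encoded as the strings "theta" and "I".

```python
# An interpreter for binary-decision programs, with program (2.1) encoded
# as a table mapping instruction numbers to (variable, target on 0, target on 1).
PROGRAM = {
    1: ("x", 2, 4),
    2: ("y", "theta", 3),     # "theta" encodes Lee's output 0
    3: ("z", "theta", "I"),   # "I" encodes the output 1
    4: ("y", 3, 5),
    5: ("z", "I", "theta"),
}

def run(program, assignment, start=1):
    """Follow instructions until an output symbol is reached."""
    target = start
    while target not in ("theta", "I"):
        var, if_zero, if_one = program[target]
        target = if_one if assignment[var] else if_zero
    return 1 if target == "I" else 0

print(run(PROGRAM, {"x": 1, "y": 1, "z": 0}))  # -> 1, as in the text
print(run(PROGRAM, {"x": 0, "y": 0, "z": 1}))  # -> 0, and z is never consulted
```

Note that for (x, y) = (0, 0) the program halts at instruction 2 without ever testing z, which is exactly why the diagram has no path for (x, y, z) = (0, 0, 1) unless long arcs are allowed.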
Lee formulated rules for constructing a switching circuit from a binary-decision
program. He also provided bounds on the minimum number of instructions that
are necessary to represent a given Boolean function. In particular, he showed that
computing the output of a switching circuit with a binary-decision program is in
general faster than computing it through Boolean operations and, or, and sum, often
by orders of magnitude.
The graphical structure we call a binary decision diagram, as well as the term,
were introduced by Akers [3]. Binary-decision programs and BDDs are equivalent
in some sense, but there are advantages to working with a graphical representation.
It is easier to manipulate and provides an implementation-free description of a
Boolean function, in the sense that it can be used as the basis for different algorithms
1 There is such a path, in this case, if one treats the arc from y to 0 as a “long arc,” meaning that z
can take either value on this arc.
Fig. 2.2 Binary decision diagram corresponding to the binary-decision program (2.1).
for computing outputs. Akers used BDDs to analyze certain types of Boolean
functions and as a tool for test generation; that is, for finding a set of inputs which
can be used to confirm that a given implementation performs correctly. He also
showed that a BDD can often be simplified by superimposing isomorphic portions
of the BDD.
The advance that led to the widespread application of BDDs was due to Bryant
[37]. He adopted a data structure in which the decision variables are restricted
to a particular ordering, forcing all nodes in a layer of the BDD to correspond
to the same decision variable. The result is an ordered decision diagram (which
we refer to simply as a decision diagram in this book). For any given ordering
of the variables, all Boolean functions can be represented by ordered BDDs, and
many ordered BDDs can be simplified by superimposing isomorphic portions of
the BDD. A BDD that can be simplified no further in this fashion is known as a
reduced ordered binary decision diagram (RO-BDD). A fundamental result is that
RO-BDDs provide a canonical representation of Boolean functions. That is, for any
given variable ordering, every Boolean function has a unique representation as an
RO-BDD. This allows one to check whether a logic circuit implements a desired
Boolean function, for example, by constructing an RO-BDD for each and noting
whether they are identical.
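The canonicity result can be seen in miniature with a "unique table" construction. The sketch below builds RO-BDDs by Shannon expansion under a fixed variable order, applying both reduction rules (skip redundant tests, share isomorphic subgraphs); the two formulas are made-up examples of the same Boolean function.

```python
# Reduced ordered BDDs via a unique table: with a fixed variable order,
# two different formulas for the same function yield the same root node.
unique = {}              # (var, low, high) -> node id; shares isomorphic nodes
nodes = [None, None]     # ids 0 and 1 are the terminal nodes

def mk(var, low, high):
    """Create (or reuse) the node testing var, applying both reduction rules."""
    if low == high:      # redundant test: the node is skipped entirely
        return low
    key = (var, low, high)
    if key not in unique:
        unique[key] = len(nodes)
        nodes.append(key)
    return unique[key]

def build(f, n, var=0, env=()):
    """Build the RO-BDD of an n-variable 0/1 function by Shannon expansion."""
    if var == n:
        return f(*env)   # returns terminal 0 or 1
    low = build(f, n, var + 1, env + (0,))
    high = build(f, n, var + 1, env + (1,))
    return mk(var, low, high)

# Majority of three bits, written two syntactically different ways.
f1 = lambda x, y, z: int((x and y) or (x and z) or (y and z))
f2 = lambda x, y, z: int((x and (y or z)) or (y and z))
print(build(f1, 3) == build(f2, 3))  # -> True: identical root, hence equal functions
```

This exhaustive expansion takes exponential time and serves only to illustrate canonicity; practical packages instead combine existing diagrams with the apply operation mentioned next.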
Another advantage of ordered BDDs is that operations on Boolean functions,
such as disjunction and conjunction, can be performed efficiently by an appropriate
operation on the corresponding diagrams. The time complexity for an operation is
bounded by the product of the sizes of the BDDs. Unfortunately, the BDDs for some
popular circuits can grow exponentially even when they are reduced. For example,
the RO-BDD grows linearly for an adder circuit but exponentially for a multiplier
circuit. Furthermore, the size of a reduced BDD can depend dramatically on the
variable ordering. Computing the ordering that yields the smallest BDD is a co-
NP-complete problem [71]. Ordering heuristics that take into account the problem
domain may therefore be crucial in obtaining small BDDs for practical applications.
The canonical representation and efficient operations introduced by Bryant led
to a stream of BDD-related research in computer science. Several variants of the
basic BDD data structure were proposed for different theoretical and practical
purposes. A monograph by Wegener [157] provides a comprehensive survey of
different BDD types and their uses in practice. Applications of BDDs include formal
verification [99], model checking [50], product configuration [5]—and, as we will
see, optimization.
feasible sets of the relatively small subproblems at leaf nodes. The optimal solutions
of the subproblems are then extracted from the BDDs, so that no more branching is
necessary. Computational experiments were limited to a small number of instances
but showed a significant improvement over the IP methods of the time. We remark
in passing that Wegener’s monograph [157], mentioned earlier, proposes alternative
methods for formulating 0/1 programming problems with BDDs, although they have
not been tested experimentally. It also studies the growth of BDD representations for
various types of Boolean functions.
Hachtel and Somenzi [81] showed how BDDs can help solve maximum flow
problems in large-scale 0/1 networks, specifically by enumerating augmenting
paths. Starting with a flow of 0, a corresponding flow-augmenting BDD is compiled
and analyzed to compute the next flow. The process is repeated until there are no
more augmenting paths, as indicated by an empty BDD. Hachtel and Somenzi were
able to compute maximum flows for graphs having more than 10^27 vertices and 10^36
edges. However, this was only possible for graphs with short augmenting paths,
because otherwise the resulting BDDs would be too large.
Behle [19] showed how BDDs can help generate valid inequalities (cutting
planes) for general 0/1 programming. He first studied the reduced BDD that encodes
the threshold function represented by a 0/1 linear inequality, which he called a
threshold BDD. He also showed how to compute a variable ordering that minimizes
the size of the BDD. To obtain a BDD for a 0/1 programming problem, he conjoined
the BDDs representing the individual inequalities in the problem, using an algorithm
based on parallel computation. The resulting BDD can, of course, grow quite large
and is practical only for small problem instances. He observed that when the BDD
is regarded as a flow network, the polytope representing its feasible set is the convex
hull of the feasible set of the original 0/1 problem. Based on this, he showed how
to generate valid inequalities for the 0/1 problem by analyzing the polar of the flow
polytope, a method that can be effective for small but hard problem instances.
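The conjoining step can be pictured as a product construction on the state representations of the threshold diagrams: a node of the combined diagram corresponds to a pair of states, one per inequality, so each layer's width is at most the product of the individual widths. The inequalities below are made-up data for illustration.

```python
# Conjoining threshold diagrams by a product construction. Each inequality
# A[i] . x <= b[i] has states equal to its partial left-hand side; the
# conjoined diagram's states are tuples with one component per inequality.
A = [[2, 3, 4], [3, 1, 2]]       # two 0/1 inequalities over x1, x2, x3
b = [5, 4]

def reachable_states(rows, bounds):
    """Layer-by-layer state sets of the (joint) threshold diagram."""
    layers = [{tuple(0 for _ in rows)}]
    for j in range(len(rows[0])):            # one layer per variable
        nxt = set()
        for s in layers[-1]:
            for x in (0, 1):                 # arc labels x_j = 0 / 1
                t = tuple(s[i] + x * rows[i][j] for i in range(len(rows)))
                if all(t[i] <= bounds[i] for i in range(len(rows))):
                    nxt.add(t)
        layers.append(nxt)
    return layers

joint = reachable_states(A, b)               # both inequalities at once
w1 = reachable_states(A[:1], b[:1])          # each inequality on its own
w2 = reachable_states(A[1:], b[1:])
for Lj, L1, L2 in zip(joint, w1, w2):        # width <= product of widths
    print(len(Lj), len(L1) * len(L2))
```

The gap between the two columns printed per layer shows why the product bound is pessimistic in practice, and also why the conjoined diagram can still grow quickly.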
not enjoyed by other optimization methods: (a) they are insensitive to whether
the constraint and objective function are linear or convex, which makes them
appropriate for global optimization, and (b) they are well suited to comprehensive
postoptimality analysis.
Postoptimality analysis is arguably important because simply finding an optimal
solution misses much of the information and insight encoded in an optimization
model. Decision diagrams provide a transparent data structure from which one can
quickly extract answers to a wide range of queries, such as how the optimal solution
would change if certain variables were fixed to certain values, or what alternative
solutions are available if one tolerates a small increase in cost. The power of this
analysis is illustrated in [82, 86] for capital budgeting, network reliability, and
portfolio design problems.
In subsequent work [83], Hadžić and Hooker proposed a cost-bounding method
for reducing the size of the decision diagram used for postoptimality analysis.
Assuming that the optimal value is given, they built a BDD that represents all
solutions whose cost is within a given tolerance of the optimum. Since nearly all
postoptimality analysis of interest is concerned with solutions near the optimum,
such a cost-bounded BDD is adequate. They also showed how to reduce the size of
the BDD significantly by creating a sound cost-bounded BDD rather than an exact
one. This is a BDD that introduces some infeasible solutions, but only when their
cost is outside the tolerance. When conducting sensitivity analysis, the spurious
solutions can be quickly discarded by checking their cost. Curiously, a sound BDD
can be substantially smaller than an exact one even though it represents more
solutions, provided it is properly constructed. This is accomplished by pruning and
contraction methods that remove certain nodes and arcs from the BDD. A number
of experiments illustrated the space-saving advantages of sound BDDs.
Due to the tendency of BDDs to grow exponentially, a truly scalable solution
algorithm for discrete optimization became available only with the introduction of
relaxed decision diagrams. These are discussed in Section 2.3.4 below.
ALLDIFFERENT(X), which requires that the set X of variables take distinct values.
Each global constraint represents a specific combinatorial structure that can be
exploited in the solution process. In particular, an associated filtering algorithm
removes infeasible values from variable domains. The reduced domains are then
propagated to other constraints, whose filtering mechanisms reduce them further.2
Decision diagrams have been proposed as a data structure for certain filtering
algorithms. For example, they are used in [70, 90, 107] for constraints defined on
set variables, whose domains are sets of sets. They have also been used in [44, 45] to
help filter “table” constraints, which are defined by an explicit list of allowed tuples
for a set of variables.
It is important to note that in this research, decision diagrams help to filter
domains for one constraint at a time, while information is conveyed to other
constraints in the standard manner through individual variable domains (i.e., through
a domain store). However, decision diagrams can be used for propagation as well,
as initially pointed out by Andersen, Hadžić, Hooker and Tiedemann [4]. Their
approach, and the one emphasized in this book, is to transmit information through a
“relaxed” decision diagram rather than through a domain store, as discussed in the
next section. Another approach is to conjoin MDDs associated with constraints that
contain only a few variables in common, as later proposed by Hadžić, O’Mahony,
O’Sullivan and Sellmann [87] for the market split problem. Either mechanism prop-
agates information about inter-variable relationships, as well as about individual
variables, and can therefore reduce the search significantly.
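The core filtering step behind such diagram-based propagation can be sketched as a reachability computation: an arc is removed when it lies on no r-to-t path, and the surviving arc labels in each layer give the filtered domains. The tiny layered diagram below is illustrative only, not an example from the text.

```python
# MDD-style filtering as forward/backward reachability. arcs[i] holds the
# arcs of layer i as (tail, head, label) triples; layer i decides x_{i+1}.
arcs = {
    0: [("r", "u", 0), ("r", "v", 1)],
    1: [("u", "t", 1)],                 # node v cannot reach the terminal
}

fwd = {"r"}                             # nodes reachable from the root
for i in sorted(arcs):
    fwd |= {h for (tl, h, _) in arcs[i] if tl in fwd}

bwd = {"t"}                             # nodes that can still reach t
for i in sorted(arcs, reverse=True):
    bwd |= {tl for (tl, h, _) in arcs[i] if h in bwd}

# Keep only arcs on some r-to-t path, then read off the filtered domains.
domains = {}
for i in sorted(arcs):
    arcs[i] = [a for a in arcs[i] if a[0] in fwd and a[1] in bwd]
    domains[i] = {lab for (_, _, lab) in arcs[i]}
print(domains)  # -> {0: {0}, 1: {1}}: the value 1 for x1 has been filtered out
```

Because the pruning acts on paths rather than on each variable separately, it can remove values that no single-constraint domain filter would catch, which is precisely the advantage over a plain domain store.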
one that represents a proper subset of feasible solutions. Bergman, Ciré, van Hoeve
and Yunes [27] showed that restricted diagrams are competitive with the primal
heuristics in state-of-the-art solvers when applied to set covering and set packing
problems.
A third element is the connection between decision diagrams and dynamic pro-
gramming, studied by Hooker in [97]. A weighted decision diagram, which is one in
which costs are associated with the arcs, can be viewed as the state transition graph
for a dynamic programming model. This means that problems are most naturally
formulated for an MDD-based solver as dynamic programming models. The state
variables in the model are those used in the top-down compilation of relaxed and
restricted diagrams.
One advantage of dynamic programming models is that they allow for state-
dependent costs, affording a great deal of flexibility in the choice of objective
function. A given state-dependent cost function can be represented in multiple
ways by assigning costs to arcs of an MDD, but it is shown in [97] that if the
cost assignment is “canonical,” there is a unique reduced weighted diagram for the
problem. This generalizes the uniqueness theorem for classical reduced decision
diagrams. A similar result is proved by Sanner and McAllester [138] for affine
algebraic decision diagrams. The use of canonical costs can reduce the size of a
weighted decision diagram dramatically, as is shown in [97] for a textbook inventory
management problem.
A solver based on these elements is described in [26]. It uses a branch-and-
bound algorithm in which decision diagrams play the role of the linear programming
relaxation in traditional integer programming methods. The solver also uses a novel
search scheme that branches on nodes of a relaxed decision diagram rather than on
variables. It proved to be competitive with or superior to a state-of-the-art integer
programming solver on stable set, maximum cut, and maximum 2-SAT problems,
even though integer programming technology has improved by orders of magnitude
over decades of solver development.
The use of relaxed decision diagrams in the solver has a superficial resem-
blance to state space relaxation in dynamic programming, an idea introduced by
Christofides, Mingozzi and Toth [47]. However, there are fundamental differences.
Most importantly, the problem is solved exactly by a branch-and-bound search
rather than approximately by enumerating states. In addition, the relaxation is
created by splitting or merging nodes in a decision diagram (state transition graph)
rather than mapping the state space into a smaller space. It is tightened by filtering
2.3.6 Markov Decision Processes
Decision diagrams have also played an auxiliary role in the solution of planning
problems that arise in the artificial intelligence (AI) literature. These problems
are often modeled as stochastic dynamic programming problems, because a given
action or control can result in any one of several state transitions, each with a given
probability. Nearly all the attention in AI has been focused on Markov decision
processes, a special case of stochastic dynamic programming in which the state
space and choice of actions are the same in each period or stage. A Markov decision
process can also be partially observable, meaning that one cannot observe the
current state directly but can observe only a noisy signal that indicates that the
system could be in one of several possible states, each with a known probability.
The solution of stochastic dynamic programming models is complicated not only
by the large state spaces that characterize deterministic models, but by the added
burden of calculating expected immediate costs and costs-to-go that depend on
probabilistic outcomes.3 A natural strategy is to simplify and/or approximate the
cost functions, an option that has been explored for many years in the optimization
world under the name approximate dynamic programming (see [129] for a survey).
The AI community has devised a similar strategy. The most obvious approximation
technique is state aggregation, which groups states into sets and lets a single state
represent each set. A popular form of aggregation in AI is “abstraction,” in which
states are implicitly grouped by ignoring some of the problem variables.
This is where decision diagrams enter the picture. The cost functions are sim-
plified or approximated by representing them with weighted decision diagrams, or
rather algebraic decision diagrams (ADDs), which are a special case of weighted
decision diagrams in which costs are attached to terminal nodes. One well-known
3 The expected immediate cost of an action in a given state is the expected cost of taking that action
in that state. The expected cost-to-go is the expected total cost of taking that action and following
an optimal policy thereafter.
approach [95] uses ADDs as an abstraction technique to simplify the immediate cost
functions in fully observable Markov decision processes. Some related techniques
are developed in [63, 143].
Relaxation is occasionally used in these methods, but it is very different from the
type of relaxation described above. Perhaps the closest analog appears in [146],
which uses ADDs to represent a relaxation of the cost-to-go function, thereby
providing a valid bound on the cost. Specifically, it attaches cost intervals to leaf
nodes of an ADD that represents the cost function. The ADD is reduced by merging
some leaf nodes and taking the union of the associated intervals. This does not create
a relaxation of the entire recursion, as does node merger as employed in this book,
but only relaxes the cost-to-go in an individual stage of the recursion. The result is
a relaxation that embodies less information about the interaction of stages.
On the other hand, the methods we present here do not accommodate stochastic
dynamic programming. All state transitions are assumed to be deterministic. It
is straightforward to define a stochastic decision diagram, in analogy with the
transition graph in stochastic dynamic programming, but it is less obvious how to
relax a stochastic decision diagram by node merger or other techniques. This poses
an interesting research issue that is currently under study.
Chapter 3
Exact Decision Diagrams
3.1 Introduction
Section 3.11 presents the constraint by separation method. Finally, Section 3.12
shows the validity of some key DP formulations used throughout this chapter.
max f(x)
subject to Ci(x), i = 1, . . . , m          (P)
x ∈ D,
In the formulation above, we define a variable x j for each item j with binary
domain D(x j ) = {0, 1} indicating whether item j is selected (x j = 1) or not (x j = 0).
The objective function is the total profit of the selected items, and there is a single
linear constraint enforcing the weight capacity. The set of feasible solutions is
Sol(P) = {(0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1), (1, 1, 0, 0)}. The
optimal solution is x∗ = (1, 1, 0, 0) and has a value of z∗ = 15.
For the purposes of this book, a decision diagram (DD) is a graphical structure that
encodes a set of solutions to a discrete optimization problem P. Formally, B =
(U, A, d) is a layered directed acyclic multigraph with node set U, arc set A, and arc
labels d. The node set U is partitioned into layers L1 , . . . , Ln+1 , where layers L1 and
Ln+1 consist of single nodes, the root node r and the terminal node t, respectively.
Each arc a ∈ A is directed from a node in some L j to a node in L j+1 and has a
label d(a) ∈ D(x j ) that represents the assignment of value d(a) to variable x j . Thus,
every arc-specified path p = (a(1) , . . . , a(n) ) from r to t encodes an assignment to the
variables x1 , . . . , xn , namely x j = d(a( j) ) for j = 1, . . . , n. We denote this assignment
by x p . The set of r to t paths of B represents the set of assignments Sol(B).
Figure 3.1 depicts a decision diagram B for the knapsack problem (3.1). The
diagram is composed of five layers, where the first four layers correspond to
variables x1 , . . . , x4 , respectively. Every arc a in B represents either a value of 0,
depicted as a dashed arc in the figure, or a value of 1, depicted as a solid arc; e.g.,
d((u1 , v1 )) = 0. In particular, this DD encodes exactly the set of feasible solutions
to the knapsack problem (3.1). For example, the path p = (r, u1 , v2 , w2 , t) represents
the assignment x p = (0, 1, 0, 0).
The width |L j | of layer L j is the number of nodes in the layer, and the width of
a DD is max j {|L j |}. The size |B| of a DD B is given by its number of nodes. For
instance, the width of B in Fig. 3.1 is 2 and |B| = 8. No two arcs leaving the same
node have the same label, which means that every node has a maximum out-degree
of |D(x j )|. If all variables are binary, then the DD is a binary decision diagram
Fig. 3.1 Exact BDD for the knapsack instance of Table 3.1. Dashed and solid arcs represent arc
labels 0 and 1, respectively. The numbers on the arcs indicate their length.
(BDD), which has been the subject of the majority of studies in the area due to its applications in Boolean logic [110, 99, 37]. On the other hand, a multivalued
decision diagram (MDD) allows out-degrees higher than 2 and therefore encodes
values of general finite-domain variables.
Because we are interested in optimization, we focus on weighted DDs, in which each arc a has an associated length v(a). The length of a directed path p = (a^{(1)}, . . . , a^{(k)}) rooted at r is v(p) = ∑_{j=1}^k v(a^{(j)}). A weighted DD
B represents an optimization problem P in a straightforward way. Namely, B is an
exact decision diagram representation of P if the r–t paths in B encode precisely
the feasible solutions of P, and the length of a path is the objective function value
of the corresponding solution. More formally, we say that B is exact for P when
In Fig. 3.1 the length v(a) is represented by a number on each arc a. One can
verify that the BDD B depicted in this figure satisfies both conditions (3.2) and (3.3)
3.4 Compiling Exact Decision Diagrams 27
for the knapsack problem (3.1). For example, the path p = (r, u1 , v2 , w2 , t) has length v(p) = 7, which coincides with f (x^p ) = f ((0, 1, 0, 0)) = 7.
An exact DD reduces discrete optimization to a longest-path problem on a
directed acyclic graph. If p is a longest path in a DD B that is exact for P, then x p is
an optimal solution of P, and its length v(p) is the optimal value z∗ (P) = f (x p ) of
P. For Fig. 3.1, the longest path is the path p∗ with length v(p∗ ) = 15 that crosses nodes (r, u2 , v2 , w2 , t), representing the optimal solution x^{p∗} = (1, 1, 0, 0).
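Because the diagram is layered and acyclic, the longest path can be computed with a single forward pass over the layers. A minimal sketch (the two-variable diagram below is a hypothetical example, not the diagram of Fig. 3.1):

```python
def longest_path(layers, root, terminal):
    """Longest r-t path in a layered DAG.  layers is a list of arc
    lists, one per variable; each arc is (tail, head, label, length).
    Returns (path length, label assignment along the path)."""
    best = {root: (0, [])}                 # node -> (length from r, labels)
    for arcs in layers:
        nxt = {}
        for tail, head, label, length in arcs:
            if tail not in best:
                continue
            val, labs = best[tail]
            cand = (val + length, labs + [label])
            if head not in nxt or cand[0] > nxt[head][0]:
                nxt[head] = cand
        best = nxt
    return best[terminal]

# Hypothetical 2-variable diagram: both paths through a single middle node
layers = [
    [('r', 'u', 0, 0), ('r', 'u', 1, 8)],  # layer for x1
    [('u', 't', 0, 0), ('u', 't', 1, 7)],  # layer for x2
]
value, assignment = longest_path(layers, 'r', 't')
```

Each layer is processed once, so the pass runs in time linear in the number of arcs.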
It is common in the DD literature to allow various types of long arcs that skip one
or more layers [37, 116]. Long arcs can improve efficiency because they represent
multiple partial assignments with a single arc, but to simplify exposition, we will
suppose with minimal loss of generality that there are no long arcs throughout this
book. DDs also typically have two terminal nodes, corresponding to true and false,
but for our purposes only a true node is required as the terminal for feasible paths.
Given a DD B, two nodes belonging to the same layer L j are equivalent when the
paths from each to the terminal t are the same; i.e., they correspond to the same set of
assignments to (x j , . . . , xn ), which implies they are redundant in the representation.
A reduced DD is such that no two nodes of a layer are equivalent, as in the case
of the DD in Fig. 3.1. For a given ordering of the variables over the diagram layers,
there exists a unique (canonical) reduced DD, which has the smallest width across DDs with that ordering. A DD can be reduced in time linear in the number of arcs and nodes of the graph [37, 157].
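The reduction itself can be sketched as a single bottom-up pass that propagates equivalence classes and merges nodes whose outgoing arcs agree; here two nodes are treated as equivalent only when labels, arc lengths, and child classes all coincide (a conservative choice suitable for weighted DDs; the small input diagram is hypothetical):

```python
def reduce_dd(layers, terminal='t'):
    """Bottom-up reduction: nodes with identical (label, length,
    child-class) arc sets are merged.  layers[j] maps each node of
    layer j to {label: (length, child)}."""
    cls = {terminal: terminal}            # node -> equivalence class id
    reduced = []
    for layer in reversed(layers):
        key_to_id, new_layer = {}, {}
        for node, arcs in layer.items():
            key = frozenset((lab, ln, cls[ch]) for lab, (ln, ch) in arcs.items())
            if key not in key_to_id:
                key_to_id[key] = node     # first node with this key represents all
                new_layer[node] = {lab: (ln, cls[ch])
                                   for lab, (ln, ch) in arcs.items()}
            cls[node] = key_to_id[key]
        reduced.append(new_layer)
    reduced.reverse()
    return reduced

# Hypothetical diagram in which u1 and u2 are equivalent
layers = [
    {'r': {0: (0, 'u1'), 1: (5, 'u2')}},
    {'u1': {0: (0, 't'), 1: (3, 't')}, 'u2': {0: (0, 't'), 1: (3, 't')}},
]
reduced = reduce_dd(layers)
width = max(len(layer) for layer in reduced)
```

After the pass, u2 has been folded into u1 and both arcs out of r point to the surviving node.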
We now present a generic framework for compiling an exact decision diagram en-
coding the solutions of a discrete optimization problem P. The framework requires
P to be written as a dynamic programming (DP) model and extracts a decision
diagram from the resulting state transition graph. We first describe the elements of a
dynamic programming model, then outline the details of our framework, and finally
show DD examples on different problem classes.
s j+1 = s j + w j x j , j = 1, . . . , n (3.4)
s1 = 0, sn+1 ≤ U
x j ∈ {0, 1}, j = 1, . . . , n.
One can verify that any valid solution (x, s) to the DP model above leads to an
assignment x that is feasible to the knapsack model (3.1). Conversely, any feasible
assignment x to model (3.1) has a unique completion (s, x) that is feasible to the
DP model above, thus both models are equivalent. Notice also that the states are
Markovian; i.e., the state s j+1 only depends on the control x j and the previous state
s j , which is a fundamental property of DP models.
The main components of a DP model are the states, the way in which the
controls govern the transitions, and finally the costs of each transition. To specify
this formally, a DP model for a given problem P with n variables having domains
D(x1 ), . . . , D(xn ) must, in general, consist of the following four elements:
1. A state space S with a root state r̂ and a countable set of terminal states
tˆ1 , tˆ2 , . . . , tˆk . To facilitate notation, we also consider an infeasible state 0̂ that
leads to infeasible solutions to P. The state space is partitioned into sets for
each of the n + 1 stages; i.e., S is the union of the sets S1 , . . . , Sn+1 , where
S1 = {r̂}, Sn+1 = {tˆ1 , . . . , tˆk , 0̂}, and 0̂ ∈ S j for j = 2, . . . , n.
2. Transition functions t j representing how the controls govern the transition
between states; i.e., t j : S j × D j → S j+1 for j = 1, . . . , n. Also, a transition from
an infeasible state always leads to an infeasible state as well, regardless of the
control value: t j (0̂, d) = 0̂ for any d ∈ D j .
3. Transition cost functions h j : S × D j → R for j = 1, . . . , n.
4. To account for objective function constants, we also consider a root value vr ,
which is a constant that will be added to the transition costs directed out of the
root state.
max f̂(s, x) = ∑_{j=1}^n h_j(s^j, x_j)
s^{j+1} = t_j(s^j, x_j), x_j ∈ D_j , j = 1, . . . , n        (3.5)
s^j ∈ S_j , j = 1, . . . , n + 1.
The formulation (3.5) is valid for P if, for every x ∈ D, there is an s ∈ S such
that (s, x) is feasible in (3.5) and
(a) L1 and L2 . (b) L3 . (c) L4 .
Fig. 3.2 Three consecutive iterations of Algorithm 1 for the 0/1 knapsack problem (3.1). Grey
boxes correspond to the DP states in model (3.4), and black-filled nodes indicate infeasible nodes.
In the next section we exemplify the DP formulation and the diagram con-
struction for different problem classes in optimization. To facilitate reading, proofs
certifying the validity of the formulations are shown in Section 3.12.
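The generic top-down compilation can be sketched as follows: starting from the root state, each layer is built by applying the transition function to every state and every domain value, merging equal states (this merging is what produces a diagram rather than a tree). The knapsack data below is hypothetical, chosen only so that the feasible set matches Sol(P) and z∗ = 15:

```python
def compile_exact_dd(n, root_state, domains, transition, cost):
    """Top-down DD compilation from a DP model.  `transition` returns
    None for the infeasible state 0-hat, which simply drops the arc."""
    layers, states = [], {root_state}
    for j in range(n):
        arcs, nxt = [], set()
        for s in states:
            for d in domains[j]:
                t = transition(j, s, d)
                if t is None:
                    continue
                arcs.append((s, d, t, cost(j, s, d)))
                nxt.add(t)
        layers.append(arcs)
        states = nxt
    return layers, states

# Knapsack DP (3.4) with hypothetical profits, weights, and capacity
profit, weight, capacity = [8, 7, 6, 14], [3, 3, 4, 6], 6

def trans(j, s, d):
    t = s + weight[j] * d
    return t if t <= capacity else None    # capacity violation = infeasible

layers, _ = compile_exact_dd(4, 0, [[0, 1]] * 4, trans,
                             lambda j, s, d: profit[j] * d)

# Longest path over the compiled layers = optimal profit
best = {0: 0}
for arcs in layers:
    nxt = {}
    for s, d, t, c in arcs:
        nxt[t] = max(nxt.get(t, float('-inf')), best[s] + c)
    best = nxt
opt = max(best.values())
```

Note how states 3 (items {1} or {2}) and 6 (items {1, 2} or {4}) each merge two partial assignments, exactly the Markovian merging described above.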
Given a graph G = (V, E) with an arbitrarily ordered vertex set V = {1, 2, . . . , n}, an
independent set I is a subset I ⊆ V such that no two vertices in I are connected by
an edge in E. If we associate weights w j ≥ 0 with each vertex j ∈ V , the maximum
independent set problem (MISP) asks for a maximum-weight independent set of G.
For example, in the graph depicted in Fig. 3.3, the maximum-weight independent set is I = {2, 5} and has a value of 11. The MISP (which is equivalent to the maximum clique problem on the complement graph) has found applications in many areas, including data mining [61], bioinformatics [59], and social network analysis [16].
The MISP can be formulated as the following discrete optimization problem:
max ∑_{j=1}^n w_j x_j
Fig. 3.3 Example of a graph with vertex weights for the MISP. Vertices are assumed to be labeled
arbitrarily, and the number alongside each circle indicates the vertex weight.
Fig. 3.4 Exact BDD for the MISP on the graph in Fig. 3.3. Dashed and solid arcs represent labels
0 and 1, respectively.
As an illustration, consider the MISP for the graph in Fig. 3.3. The states associated with the nodes of the BDD are shown in Fig. 3.5. For example, node u1 has state {2, 3, 4, 5}, representing the vertex set V \ {1}. The state space described above yields a reduced DD [23], and thus the smallest possible DD for a fixed ordering of the variables over the layers.
min cT x
Ax ≥ e
x j ∈ {0, 1}, j = 1, . . . , n,
Fig. 3.5 Exact BDD with states for the MISP on the graph in Fig. 3.3.
cost subset V ⊆ {1, . . . , n} of the sets A j such that, for all i, ai, j = 1 for some j ∈ V ,
i.e., V covers {1, . . . , m}. It is widely applied in practice, and it was one of the first
combinatorial problems to be proved NP-complete [71].
We now formulate the SCP as a DP model. The state in a particular stage of our
model indicates the set of constraints that still need to be covered. Namely, let Ci be
the set of indices of the variables that participate in constraint i, Ci = { j | ai, j = 1},
and let last(Ci ) = max{ j | j ∈ Ci } be the largest index of Ci . The components of
the DP model are as follows:
• State spaces: In any stage, a state contains the set of constraints that still need to
be covered: S j = 2{1,...,m} ∪ {0̂} for j = 2, . . . , n. Initially, all constraints need to
be satisfied, hence r̂ = {1, . . ., m}. There is a single terminal state which indicates
that all constraints are covered: tˆ = 0./
• Transition functions: Consider a state s j in stage j. If the control satisfies x j = 1
then all constraints that variable x j covers, A j = {i | ai, j = 1} = {i : j ∈ Ci }, can be
removed from s j . However, if x j = 0, then the transition will lead to an infeasible
state if there exists some i such that last(Ci ) = j, since then constraint i will
never be covered. Otherwise, the state remains the same. Thus:
t_j(s^j, 1) = s^j \ A_j
t_j(s^j, 0) = s^j if last(C_i) > j for all i ∈ s^j, and 0̂ otherwise.
• Cost functions: h j (s j , x j ) = −c j x j .
• A root value of 0.
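A sketch of these transitions, applied to the three-variable set covering example discussed below (min x1 + x2 + x3 subject to x1 + x3 ≥ 1 and x2 + x3 ≥ 1); merging equal states mirrors the DD compilation, and the optimal value is read off the terminal state ∅:

```python
def scp_opt(n, cover, cost):
    """Set covering via the DP of Section 3.6.  cover[j] is the set
    A_j of constraints containing x_j; a state is the set of
    constraints still to be covered."""
    m_all = frozenset(i for A in cover for i in A)
    last = {i: max(j for j in range(n) if i in cover[j]) for i in m_all}
    best = {m_all: 0}
    for j in range(n):
        nxt = {}
        for state, val in best.items():
            succ = [(state - cover[j], val + cost[j])]        # x_j = 1
            if all(last[i] > j for i in state):               # x_j = 0 feasible?
                succ.append((state, val))
            for t, v in succ:
                nxt[t] = min(nxt.get(t, float('inf')), v)
        best = nxt
    return best[frozenset()]                 # terminal state: all covered

# x1 covers constraint 0, x2 covers constraint 1, x3 covers both
opt = scp_opt(3, [frozenset({0}), frozenset({1}), frozenset({0, 1})],
              [1, 1, 1])
```

Setting x3 = 1 covers both constraints, so the optimal value is 1.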
Figure 3.6 shows an exact reduced BDD for this SCP instance where the nodes
are labeled with their corresponding states. If outgoing 1-arcs (0-arcs) of nodes in
layer j are assigned a cost of c j (zero), a shortest r–t path corresponds to the solution
(1, 1, 0, 0, 0, 0) with an optimal value of 3.
Unlike for the MISP, this particular DP model does not yield reduced DDs in general. An example is the set covering problem
minimize x1 + x2 + x3
subject to x1 + x3 ≥ 1
x2 + x3 ≥ 1
x1 , x2 , x3 ∈ {0, 1}
and the two partial solutions x1 = (1, 0), x2 = (0, 1). We have that the state reached
by applying the first and second set of controls is {2} and {1}, respectively. Thus,
they would lead to different nodes in the resulting DD. However, both have the
single feasible completion x̃ = (1).
There are several ways to modify the state function so that the resulting DD is reduced, as presented in [28]. The modified state can still be computed in polynomial time per partial solution, but at an additional computational cost.
A problem closely related to the SCP, the set packing problem (SPP), is the binary
program
max cT x
Ax ≤ e
x j ∈ {0, 1}, j = 1, . . . , n,
• State spaces: In any stage, a state contains the set of constraints for which no
variables have been assigned a 1: S j = 2{1,...,m} ∪ {0̂} for j = 2, . . . , n. Initially,
t_j(s^j, 0) = s^j \ {i | last(C_i) = j}
t_j(s^j, 1) = s^j \ {i | j ∈ C_i} if A_j ⊆ s^j, and 0̂ otherwise.
• Cost functions: h j (s j , x j ) = −c j x j .
• A root value of 0.
subject to x1 + x2 + x3 ≤ 1
x1 + x4 + x5 ≤ 1        (3.10)
x2 + x4 + x6 ≤ 1
xi ∈ {0, 1}, i = 1, . . . , 6.
Figure 3.7 shows an exact reduced BDD for this SPP instance. The nodes are labeled with their corresponding states, and we assign cost 1 to each 1-arc and cost 0 to each 0-arc. A longest r–t path, which can be computed as a shortest path on the negated arc weights because the BDD is acyclic, corresponds to the solution (0, 0, 1, 0, 1, 1) and proves an optimal value of 3.
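The SPP transitions can be sketched the same way; running them on instance (3.10), and assuming unit profits (consistent with the optimal value of 3 reported above), reproduces that value:

```python
def spp_opt(n, cover, profit):
    """Set packing via the DP of Section 3.7.  A state holds the
    constraints in which no variable has yet been set to 1."""
    last = {}
    for j in range(n):
        for i in cover[j]:
            last[i] = j                    # ends as max index per constraint
    best = {frozenset(last): 0}
    for j in range(n):
        nxt = {}
        for state, val in best.items():
            done = {i for i in state if last[i] == j}
            succ = [(state - done, val)]                      # x_j = 0
            if cover[j] <= state:                             # x_j = 1 feasible?
                succ.append((state - cover[j] - done, val + profit[j]))
            for t, v in succ:
                nxt[t] = max(nxt.get(t, float('-inf')), v)
        best = nxt
    return max(best.values())

# Instance (3.10): A_j for x1..x6, with unit profits
cover = [frozenset(s) for s in ({0, 1}, {0, 2}, {0}, {1, 2}, {1}, {2})]
opt = spp_opt(6, cover, [1] * 6)
```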
As in the case of the SCP, the above state function does not yield reduced DDs.
The problem
max x1 + x2 + x3
x1 + x3 ≤ 1
x2 + x3 ≤ 1
x1 , x2 , x3 ∈ {0, 1}
Fig. 3.7 Exact reduced BDD for the SPP instance (3.10).
has two partial solutions x1 = (1, 0), x2 = (0, 1). We have distinct states {2} and
{1} reached by the controls x1 and x2 , respectively, but both have the single feasible
completion, x̃ = (0).
There are several ways to modify the state function above so that the DD
construction algorithm outputs reduced decision diagrams. For example, one can
reduce the SPP to an independent set problem and apply the state function defined
in Section 3.5, which we demonstrate to have this property in Section 7.
Table 3.2 Processing times of a single-machine makespan minimization problem. Rows and
columns represent the job index and the position in the schedule, respectively.
Jobs    Position 1    Position 2    Position 3
1            4             5             9
2            3             7             8
3            1             2            10
Table 3.2 depicts an instance of the MMP with three jobs. According to the
given table, performing jobs 3, 2, and 1 in that order would result in a makespan
of 1 + 7 + 9 = 17. The minimum makespan is achieved by the permutation (2, 3, 1)
and has a value of 2 + 3 + 9 = 14. Notice that the MMP presented here can be solved
as a classical matching problem [122]. More complex position-dependent problems
usually represent machine deterioration, and the literature on this topic is relatively
recent [2].
To formulate the MMP as an optimization problem, we let xi represent the i-th
job to be processed on the machine. The MMP can be written as
min ∑_{i=1}^n p_{i,x_i}
x_i ≠ x_j , i, j = 1, . . . , n, i < j        (3.11)
x_i ∈ {1, . . . , n}, i = 1, . . . , n.
• State spaces: In stage j, a state contains the j − 1 jobs that were performed previously on the machine: S_j = 2^{{1,...,n}} ∪ {0̂} for j = 2, . . . , n. Initially, no jobs have been performed, hence r̂ = ∅. There is a single terminal state tˆ = {1, . . . , n}, reached when all jobs have been completed.
• Cost functions: The transition cost corresponds to the processing time of the
machine at that stage: h j (s j , d) = −p j,d .
• A root value of 0.
Figure 3.8 depicts the MDD with node states for the MMP instance defined in
Table 3.2. In particular, the path traversing nodes r, u3 , u5 , and t corresponds to
processing jobs 3, 2, 1, in that order. This path has a length of 14, which is the
optimal makespan of that instance.
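A sketch of this DP with the state as the set of already-scheduled jobs; for simplicity it minimizes the sum of position-dependent processing times directly (positive costs and a min, rather than negated costs and a longest path):

```python
def mmp_opt(p):
    """Makespan DP of Section 3.8.  p[i][k] is the processing time of
    job i when placed in position k (both 0-indexed); the state is
    the frozenset of jobs already scheduled."""
    n = len(p)
    best = {frozenset(): 0}
    for k in range(n):                      # position k in the schedule
        nxt = {}
        for state, val in best.items():
            for job in range(n):
                if job in state:
                    continue                # each job is scheduled once
                t = state | {job}
                v = val + p[job][k]
                if t not in nxt or v < nxt[t]:
                    nxt[t] = v
        best = nxt
    return best[frozenset(range(n))]

# Processing times from Table 3.2 (rows = jobs, columns = positions)
p = [[4, 5, 9],
     [3, 7, 8],
     [1, 2, 10]]
opt = mmp_opt(p)
```

On the Table 3.2 data this recovers the minimum makespan of 14 achieved by the permutation (2, 3, 1).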
Fig. 3.8 Example of an MDD for the minimum makespan problem in Table 3.2. Solid, dashed,
and dotted arcs represent labels 1, 2, and 3, respectively.
Given a graph G = (V, E) with vertex set V = {1, . . . , n}, a cut (S, T ) is a partition
of the vertices in V . We say that an edge crosses the cut if its endpoints are on
opposite sides of the cut. Given edge weights, the value v(S, T ) of a cut is the sum
of the weights of the edges crossing the cut. The maximum cut problem (MCP)
is the problem of finding a cut of maximum value. The MCP has been applied to
very-large-scale integration design, statistical physics, and other problems [88, 64].
To formulate the MCP as a binary optimization problem, let x j indicate the set
(S or T ) in which vertex j is placed, so that D j = {S, T}. Using the notation S(x) =
{ j | x j = S} and T (x) = { j | x j = T}, the objective function is f (x) = v(S(x), T (x)).
Since any partition is feasible, C = 0. / Thus the MCP can be written as
Consider the graph G depicted in Fig. 3.9. The optimal solution of the maximum cut problem defined over G is the cut (S, T ) = ({1, 2, 4}, {3}), with value 4, the sum of the weights of the edges (1, 3), (2, 3), and (3, 4). In our model this corresponds to the solution x∗ = (S, S, T, S) with v(S(x∗ ), T (x∗ )) = 4.
We now formulate a DP model for the MCP. Let G = (V, E) be an edge-weighted
graph, which we can assume (without loss of generality) to be complete, because
missing edges can be included with weight 0. A natural state variable s j would be
the set of vertices already placed in S, as this is sufficient to determine the transition
cost of the next choice. However, we will be interested in merging nodes that lead
to similar objective function values. We therefore let the state indicate, for vertex
j, . . . , n, the net marginal benefit of placing that vertex in T , given previous choices.
We will show that this is sufficient information to construct a DP recursion.
Fig. 3.9 An edge-weighted graph used in the MCP example.
s^{k+1}_ℓ = s^k_ℓ + w_{kℓ} if x_k = S, and s^{k+1}_ℓ = s^k_ℓ − w_{kℓ} if x_k = T, for ℓ = k + 1, . . . , n

• Root value: v_r = ∑_{1 ≤ j < j′ ≤ n} (w_{j j′})^−
Note that the root value is the sum of the negative edge weights. The state transition is based on the fact that, if vertex k is added to S, then the marginal benefit of placing a vertex ℓ > k in T (given the choices already made for vertices 1, . . . , k − 1) is increased by w_{kℓ}. If k is added to T , the marginal benefit is reduced by w_{kℓ}. Figure 3.10 shows the resulting weighted BDD for the example discussed earlier.
Consider again the graph G in Fig. 3.9. Figure 3.10 depicts an exact BDD for the
MCP on G with the node states as described before. A 0-arc leaving L j indicates that
x j = S, and a 1-arc indicates x j = T. Notice that the longest path p corresponds to
the optimal solution x p = (S, S, T, S), and its length 4 is the weight of the maximum
cut (S, T ) = ({1, 2, 4}, {3}).
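To make the recursion concrete, the sketch below evaluates a single assignment using the root value and state transitions above, together with the transition-cost expression that appears in the correctness proof of Section 3.12, and checks it against a direct computation of the cut value. The weight matrix is hypothetical, not the graph of Fig. 3.9:

```python
from itertools import product

def mcp_dp_value(w, x):
    """Evaluate one assignment x in {'S','T'}^n with the MCP DP:
    root value = sum of negative weights; s[l] is the net benefit of
    placing vertex l in T given the choices made so far."""
    n = len(x)
    total = sum(min(w[j][l], 0) for j in range(n) for l in range(j + 1, n))
    s = [0] * n
    for k in range(n):
        sigma = 1 if x[k] == 'T' else -1
        cost = max(sigma * s[k], 0)        # transition cost from the proof
        for l in range(k + 1, n):
            if sigma * s[l] * w[k][l] >= 0:
                cost += min(abs(s[l]), abs(w[k][l]))
        total += cost
        for l in range(k + 1, n):          # state transition
            s[l] += w[k][l] if x[k] == 'S' else -w[k][l]
    return total

def cut_value(w, x):
    n = len(x)
    return sum(w[j][l] for j in range(n) for l in range(j + 1, n)
               if x[j] != x[l])

# Hypothetical weights (upper triangle used)
w = [[0, 1, 2],
     [0, 0, -2],
     [0, 0, 0]]
ok = all(mcp_dp_value(w, x) == cut_value(w, x)
         for x in product('ST', repeat=3))
```

The check confirms that the DP length of every path equals the value of the corresponding cut, which is condition (3.6) for this model.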
Let x = (x1 , . . . , xn ) be a tuple of Boolean variables, where each x j can take value T
or F (corresponding to true or false). A literal is a variable x j or its negation ¬ x j .
A clause ci is a disjunction of literals, which is satisfied if at least one literal in ci
is true. If C = {c1 , . . . , cm } is a set of clauses, each with exactly two literals, and if
each ci has weight wi ≥ 0, the maximum 2-satisfiability problem (MAX-2SAT) is
the problem of finding an assignment of truth values to x1 , . . . , xn that maximizes
the sum of the weights of the satisfied clauses in C. MAX-2SAT has applications
in scheduling, electronic design automation, computer architecture design, pattern
recognition, inference in Bayesian networks, and many other areas [100, 105, 53].
To formulate MAX-2SAT as a binary optimization problem, we use the Boolean variables x_j with domain D_j = {F, T}. The constraint set C is empty, and the objective is to maximize the total weight of the satisfied clauses:
r (0, 0, 0, 0)
x1 −4
u1 (0, 1, 2, −2)
x2 0 4
x3 0 6 1 1
Fig. 3.10 Exact BDD with states for the MCP on the graph in Fig. 3.9.
max ∑_{i=1}^m w_i c_i(x)        (3.13)
x_j ∈ {F, T}, j = 1, . . . , n.
Table 3.3 shows an example for an instance of MAX-2SAT with three Boolean
variables x1 , x2 , and x3 . The optimal solution consists of setting x = (F, T, T). It has
length 19 since it satisfies all clauses but c5 .
To formulate MAX-2SAT as a DP model, we suppose without loss of generality that the problem contains all 4 · n(n − 1)/2 possible clauses, because missing clauses can be given zero weight. Thus C contains x_j ∨ x_k , x_j ∨ ¬x_k , ¬x_j ∨ x_k , and ¬x_j ∨ ¬x_k for each pair j, k ∈ {1, . . . , n} with j ≠ k. Let w^{TT}_{jk} be the weight assigned to x_j ∨ x_k , w^{TF}_{jk} the weight assigned to x_j ∨ ¬x_k , and so forth.
We let each state variable s^k be an array (s^k_1 , . . . , s^k_n ) in which each s^k_j is the net benefit of setting x_j to true, given previous settings. The net benefit is the advantage of setting x_j = T over setting x_j = F. Suppose, for example, that n = 2 and we have
fixed x1 = T. Then x1 ∨ x2 and x1 ∨ ¬ x2 are already satisfied. The value of x2 makes
no difference for them, but setting x2 = T newly satisfies ¬ x1 ∨ x2 , while x2 = F
newly satisfies ¬x1 ∨ ¬x2 . Setting x2 = T therefore obtains net benefit w^{FT}_{12} − w^{FF}_{12} .
If x1 has not yet been assigned a truth value, then we do not compute a net benefit
for setting x2 = T. Formally, the DP formulation is as follows:
• State spaces: S_k = { s^k ∈ R^n | s^k_j = 0, j = 1, . . . , k − 1 }, with root state and terminal state equal to (0, . . . , 0)
• Transition functions: t_k(s^k, x_k) = (0, . . . , 0, s^{k+1}_{k+1}, . . . , s^{k+1}_n), where, for ℓ = k + 1, . . . , n,

s^{k+1}_ℓ = s^k_ℓ + w^{TT}_{kℓ} − w^{TF}_{kℓ} if x_k = F, and s^{k+1}_ℓ = s^k_ℓ + w^{FT}_{kℓ} − w^{FF}_{kℓ} if x_k = T

• Root value: v_r = 0
Figure 3.11 shows the resulting states and transition costs for the MAX-2SAT
instance of Table 3.3. Notice that the longest path p yields the solution x p = (F, T, T)
with length 14.
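The same kind of check can be sketched for MAX-2SAT: evaluate an assignment by accumulating the transition costs implied by the recursion in the proof of Theorem 3.2, and compare against the total weight of satisfied clauses. The clause weights below are hypothetical, not those of Table 3.3:

```python
from itertools import product

def max2sat_dp_value(n, w, x):
    """Evaluate x in {'F','T'}^n with the MAX-2SAT DP of Section 3.10.
    w[(j, k, a, b)] is the weight of clause (x_j = a) or (x_k = b) for
    j < k; missing clauses have weight 0."""
    def wt(j, k, a, b):
        return w.get((j, k, a, b), 0)
    s = [0] * n                            # net benefit of x_l = T
    total = 0
    for k in range(n):
        alpha = x[k]
        beta = 'F' if alpha == 'T' else 'T'
        sigma = 1 if alpha == 'T' else -1
        cost = max(sigma * s[k], 0)
        for l in range(k + 1, n):
            cost += wt(k, l, alpha, 'F') + wt(k, l, alpha, 'T')
            cost += min(max(s[l], 0) + wt(k, l, beta, 'T'),
                        max(-s[l], 0) + wt(k, l, beta, 'F'))
        total += cost
        for l in range(k + 1, n):          # state transition
            if alpha == 'F':
                s[l] += wt(k, l, 'T', 'T') - wt(k, l, 'T', 'F')
            else:
                s[l] += wt(k, l, 'F', 'T') - wt(k, l, 'F', 'F')
    return total

def sat_weight(w, x):
    return sum(v for (j, k, a, b), v in w.items() if x[j] == a or x[k] == b)

# Hypothetical clause weights on 3 Boolean variables
w = {(0, 1, 'T', 'T'): 3, (0, 1, 'F', 'F'): 2, (0, 2, 'T', 'F'): 4,
     (1, 2, 'F', 'T'): 5, (1, 2, 'T', 'T'): 1}
ok = all(max2sat_dp_value(3, w, x) == sat_weight(w, x)
         for x in product('FT', repeat=3))
```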
Fig. 3.11 Exact BDD with states for the MAX-2SAT problem of Table 3.3.
adding a new node to the layer, replicating the outgoing arcs from the original node
(so that no solutions are lost), and resetting the endpoint of the arc to this new node.
By performing these operations for all arcs of the DD, we ensure that constraint
C is not violated by any solution encoded by B . Transition costs could also be
incorporated at any stage of the algorithm to represent an objective function of P.
To illustrate the constraint separation procedure, consider the following optimiza-
tion problem P:
max ∑_{i=1}^3 x_i
x1 + x2 ≤ 1        (3.14)
x2 + x3 ≤ 1
x1 + x3 ≤ 1
x1 + 2x2 − 3x3 ≥ 2
x1 , x2 , x3 ∈ {0, 1}.
(a) Initial relaxed DD. (b) First iteration. (c) DD after separating C1 .
Fig. 3.12 First three iterations of the separation method for the problem (3.14).
We will compile the exact BDD for P by separating the constraint classes C1 and C2 in that order. The separation procedure requires as input a BDD encoding a relaxation of P. This can be trivially obtained by creating a width-1 BDD that represents the Cartesian product of the variable domains, as depicted in Fig. 3.12(a). The arc lengths have already been set to represent the transition costs.
We now separate the constraint set C1 . Notice that the inequalities in C1 define
the constraints of an independent set problem. Thus, we can directly use the state
definition and transition function from Section 3.5 to separate C1 . Recall that the state in this case represents the variable indices that can still be added to the independent set. The state of the root node r is set to s(r) = {1, 2, 3}. We now process layer L1 . The 0-arc and the 1-arc leaving the root node lead to two distinct states, {2, 3} and ∅, respectively. Hence, we split node u1 into nodes u4 and u5 as depicted in Fig. 3.12(b), partitioning the incoming arcs and replicating the outgoing arcs so that no solutions are lost. Notice that, according to the independent set transition function, the 1-arc leaving node u5 leads to an infeasible state (shaded in Fig. 3.12(b)); it will therefore be removed when processing layer L2 .
The resulting DD after separating constraint C1 is presented in Fig. 3.12(c).
Notice that no solution violating C1 is encoded in the DD. The separation procedure
now repeats the same steps to separate constraint C2 , defining a suitable state and
modifying the DD as necessary.
3.12 Correctness of the DP Formulations
In this section we show the correctness of the MCP and the MAX-2SAT formula-
tions. The proof of correctness of the MISP formulation can be found in [23], the
proof of correctness of the SCP and the SPP formulations in [27], and finally the
proof of correctness of the MMP formulation in [49].
Theorem 3.1. The specifications in Section 3.9 yield a valid DP formulation of the
MCP.
Proof. Note that any solution x ∈ {S, T}^n is feasible, so we need only show that condition (3.6) holds. The state transitions clearly imply that s^{n+1} is the terminal state tˆ = (0, . . . , 0), and thus s^{n+1} ∈ {tˆ, 0̂}. If we let (s̄, x̄) be an arbitrary solution of (3.5), it remains to show that f̂(s̄, x̄) = f (x̄). Let H_k be the sum of the first k
transition costs for the solution (s̄, x̄), so that H_k = ∑_{j=1}^k h_j(s̄^j, x̄_j) and H_n + v_r = f̂(s̄, x̄). It suffices to show that

H_n + v_r = ∑_{j, j′} { w_{j j′} | 1 ≤ j < j′ ≤ n, x̄_j ≠ x̄_{j′} },        (3.15)
where

N_k = ∑_{j < j′ ≤ k} (w_{j j′})^− + ∑_{j ≤ k < ℓ} (w_{jℓ})^− ,

so that, in particular, N_n = v_r . This proves the theorem, because (3.17) implies (3.15) when k = n.
We first note that (3.17) holds for k = 1, because in this case both sides vanish. We now suppose (3.17) holds for k − 1 and show that it holds for k. The definition of transition cost implies

H_k = H_{k−1} + (σ_k s^k_k)^+ + ∑_{ℓ > k: σ_k s^k_ℓ w_{kℓ} ≥ 0} min{ |s^k_ℓ|, |w_{kℓ}| },

where σ_k is 1 if x̄_k = T and −1 otherwise. This and the inductive hypothesis imply

H_k = ∑_{j < j′ ≤ k−1: x̄_j ≠ x̄_{j′}} w_{j j′} + ∑_{ℓ ≥ k} min{ L^{k−1}_ℓ, R^{k−1}_ℓ } − N_{k−1} + (σ_k s^k_k)^+ + ∑_{ℓ > k: σ_k s^k_ℓ w_{kℓ} ≥ 0} min{ |s^k_ℓ|, |w_{kℓ}| }.
We wish to show that this is equal to the right-hand side of (3.17) minus N_k . Making the substitution (3.16) for the state variables, we can establish this equality by showing

∑_{ℓ ≥ k} min{ L^{k−1}_ℓ, R^{k−1}_ℓ } − N_{k−1} + (σ_k (L^{k−1}_k − R^{k−1}_k))^+ + ∑_{ℓ > k: σ_k (L^{k−1}_ℓ − R^{k−1}_ℓ) w_{kℓ} ≥ 0} min{ |L^{k−1}_ℓ − R^{k−1}_ℓ|, |w_{kℓ}| }
= ∑_{j < k: x̄_j ≠ x̄_k} w_{jk} + ∑_{ℓ > k} min{ L^k_ℓ, R^k_ℓ } − N_k .        (3.18)
We will show that (3.18) holds when x̄_k = T. The proof for x̄_k = S is analogous. Using the fact that R^k_ℓ = R^{k−1}_ℓ + w_{kℓ}, (3.18) can be written

min{ L^{k−1}_k, R^{k−1}_k } + ∑_{ℓ > k} min{ L^{k−1}_ℓ, R^{k−1}_ℓ } + (L^{k−1}_k − R^{k−1}_k)^+ + ∑_{ℓ > k: (L^{k−1}_ℓ − R^{k−1}_ℓ) w_{kℓ} ≥ 0} min{ |L^{k−1}_ℓ − R^{k−1}_ℓ|, |w_{kℓ}| }
= L^{k−1}_k + ∑_{ℓ > k} min{ L^{k−1}_ℓ, R^{k−1}_ℓ + w_{kℓ} } − (N_k − N_{k−1}).        (3.19)
The first and third terms of the left-hand side of (3.19) sum to L^{k−1}_k . We can therefore establish (3.19) by showing that, for each ℓ ∈ {k + 1, . . . , n}, we have

min{ L^{k−1}_ℓ, R^{k−1}_ℓ } + δ min{ R^{k−1}_ℓ − L^{k−1}_ℓ, −w_{kℓ} } = min{ L^{k−1}_ℓ, R^{k−1}_ℓ + w_{kℓ} } − w_{kℓ} , if w_{kℓ} < 0,
min{ L^{k−1}_ℓ, R^{k−1}_ℓ } + (1 − δ) min{ L^{k−1}_ℓ − R^{k−1}_ℓ, w_{kℓ} } = min{ L^{k−1}_ℓ, R^{k−1}_ℓ + w_{kℓ} } , if w_{kℓ} ≥ 0.
Theorem 3.2. The specifications in Section 3.10 yield a valid DP formulation of the
MAX-2SAT problem.
Proof. Since any solution x ∈ {F, T}^n is feasible, we need only show that the costs are correctly computed. Thus, if (s̄, x̄) is an arbitrary solution of (3.5), we wish to show that f̂(s̄, x̄) = f (x̄). If H_k is as before, we wish to show that H_n = SAT_n(x̄), where SAT_k(x̄) is the total weight of the clauses satisfied by the settings x̄_1 , . . . , x̄_k . Thus
SAT_k(x̄) = ∑_{j, j′, α, β} { w^{αβ}_{j j′} | 1 ≤ j < j′ ≤ k; α, β ∈ {F, T}; x̄_j = α or x̄_{j′} = β }.
Note first that the state transitions imply (3.16) as in the previous proof, where

L^k_ℓ = ∑_{1 ≤ j ≤ k: x̄_j = T} w^{FT}_{jℓ} + ∑_{1 ≤ j ≤ k: x̄_j = F} w^{TT}_{jℓ} ,   R^k_ℓ = ∑_{1 ≤ j ≤ k: x̄_j = T} w^{FF}_{jℓ} + ∑_{1 ≤ j ≤ k: x̄_j = F} w^{TF}_{jℓ} ,  for ℓ > k,
where σ_k is 1 if x̄_k = T and −1 otherwise. Also, α is the truth value x̄_k and β is the value opposite to x̄_k . This and the inductive hypothesis imply

H_k = SAT_{k−1}(x̄) + ∑_{ℓ ≥ k} min{ L^{k−1}_ℓ, R^{k−1}_ℓ } + (σ_k s^k_k)^+ + ∑_{ℓ > k} [ w^{αF}_{kℓ} + w^{αT}_{kℓ} + min{ (s^k_ℓ)^+ + w^{βT}_{kℓ}, (−s^k_ℓ)^+ + w^{βF}_{kℓ} } ].
We wish to show that this is equal to the right-hand side of (3.20). We will establish this equality on the assumption that x̄_k = T, as the proof is analogous when x̄_k = F. Making the substitution (3.16) for the state variables, and using the facts that L^k_ℓ = L^{k−1}_ℓ + w^{FT}_{kℓ} and R^k_ℓ = R^{k−1}_ℓ + w^{FF}_{kℓ}, it suffices to show
∑_{ℓ > k} min{ L^{k−1}_ℓ, R^{k−1}_ℓ } + min{ L^{k−1}_k, R^{k−1}_k } + (L^{k−1}_k − R^{k−1}_k)^+
  + ∑_{ℓ > k} [ w^{TF}_{kℓ} + w^{TT}_{kℓ} + min{ (L^{k−1}_ℓ − R^{k−1}_ℓ)^+ + w^{FT}_{kℓ}, (R^{k−1}_ℓ − L^{k−1}_ℓ)^+ + w^{FF}_{kℓ} } ]
= ∑_{ℓ > k} min{ L^{k−1}_ℓ + w^{FT}_{kℓ}, R^{k−1}_ℓ + w^{FF}_{kℓ} } + SAT_k(x̄) − SAT_{k−1}(x̄).        (3.21)
The second and third terms of the left-hand side of (3.21) sum to L^{k−1}_k . Also,

SAT_k(x̄) − SAT_{k−1}(x̄) = L^{k−1}_k + ∑_{ℓ > k} ( w^{TF}_{kℓ} + w^{TT}_{kℓ} ).
Chapter 4
Relaxed Decision Diagrams

Abstract Bounds on the optimal value are often indispensable for the practical
solution of discrete optimization problems, as for example in branch-and-bound pro-
cedures. This chapter explores an alternative strategy of obtaining bounds through
relaxed decision diagrams, which overapproximate both the feasible set and the
objective function of the problem. We first show how to modify the top-down com-
pilation from the previous chapter to generate relaxed decision diagrams. Next, we
present three modeling examples for classical combinatorial optimization problems,
and provide a thorough computational analysis of relaxed diagrams for the maxi-
mum independent set problem. The chapter concludes by describing an alternative method for generating relaxed diagrams, the incremental refinement procedure, and exemplifies its application to a single-machine makespan problem.
4.1 Introduction
Fig. 4.1 Example of a graph with vertex weights for the MISP.
a BDD B that is exact for P, then x p is an optimal solution of P, and its length v(p)
is the optimal value z∗ (P) = f (x p ) of P. When B is relaxed for P, a longest path p
provides an upper bound on the optimal value. The corresponding solution x p may
not be feasible, but v(p) ≥ z∗ (P). We will show that the width of a relaxed DD can be restricted by an input parameter, which may be adjusted according to the size of the problem and the available computing resources.
Consider the graph and vertex weights depicted in Fig. 4.1. Figure 4.2(a) repre-
sents an exact BDD in which each path corresponds to an independent set encoded
by the arc labels along the path, and each independent set corresponds to some path.
Fig. 4.2 (a) Exact BDD and (b) relaxed BDD for the MISP on the graph in Fig. 4.1.
A 1-arc leaving layer L j (solid) indicates that vertex j is in the independent set, and
a 0-arc (dashed) indicates that it is not. The longest r–t path in the BDD has value
11, corresponding to solution x = (0, 1, 0, 0, 1) and to the independent set {2, 5}, the
maximum-weight independent set in the graph.
Figure 4.2(b) shows a relaxed BDD. Each independent set corresponds to a path,
but there are paths p for which x p is infeasible (i.e., not an independent set). For
example, the path p̄ encoding x p̄ = (0, 1, 1, 0, 1) does not represent an independent
set because both endpoints of edge (2, 3) are selected. The length of each path that
represents an independent set is the weight of that set, making this a relaxed BDD.
The longest path in the BDD is p̄, providing an upper bound of 13.
Relaxed DDs were introduced by [4] for the purpose of replacing the domain
store used in constraint programming by a richer data structure. Similar methods
were applied to other types of constraints [84, 85, 94], all of which use an alternative
method of generating relaxations, known as incremental refinement, described
in Chapter 9. In this chapter we derive relaxed DDs directly from a DP formulation
of the problem. Weighted DD relaxations were used to obtain optimization bounds
in [28, 24], the former of which applied them to set covering and the latter to the
maximum independent set problem.
4.2 Top-Down Compilation of Relaxed DDs
Relaxed DDs of limited width can be built by considering an additional step in the
modeling framework for exact DDs described in Section 3.4. Recall that such a
framework relies on a DP model composed of a state space, transition functions,
transition cost functions, and a root value. For relaxed DDs, the model should also
have an additional rule describing how to merge nodes in a layer to ensure that the
output DD will satisfy conditions (4.1) and (4.2), perhaps with an adjustment in
the transition costs. The underlying goal of this rule is to create a relaxed DD that
provides a tight bound given the maximum available width.
This rule is applied in the following way. When a layer L j in the DD grows
too large during a top-down construction procedure, we heuristically select a subset
M ⊆ L j of nodes in the layer to be merged, perhaps by choosing nodes with similar
states. The state of the merged nodes is defined by an operator ⊕(M), and the length
v of every arc coming into a node u ∈ M is modified to ΓM (v, u). The process is
repeated until |L j | no longer exceeds the maximum width W .
The maximum independent set problem (MISP), first presented in Section 3.5, can
be summarized as follows: Given a graph G = (V, E) with an arbitrarily ordered
vertex set V = {1, 2, . . . , n} and weight w j ≥ 0 for each vertex j ∈ V , we wish to
find a maximum-weight set I ⊆ V such that no two vertices in I are connected by an
edge in E. It is formulated as the following discrete optimization problem:
    max  ∑_{j=1}^{n} w_j x_j
    s.t. x_i + x_j ≤ 1,   for all (i, j) ∈ E
         x_j ∈ {0, 1},    for all j ∈ V.
In the DP model for the MISP, recall from Section 3.5 that the state associated
with a node is the set of vertices that can still be added to the independent set. That
is, the root state is V, and the transition functions for a state s are t_j(s, 0) = s \ {j}
and t_j(s, 1) = s \ ({j} ∪ N(j)), where N(j) is the set of neighbors of j and the
1-transition is feasible only when j ∈ s.
To create a relaxed DD for the MISP, we introduce a merging rule into the
DP model above in which states are merged simply by taking their union. Hence,
if M = {u_i | i ∈ I}, the merged state is ⊕(M) = ∪_{i∈I} u_i. The transition cost is not
changed, so that Γ_M(v, u) = v for all v, u. Correctness follows from the fact that
merging can only enlarge states: a transition is infeasible at a state s^j only if vertex
j is not in s^j, and taking unions never removes a vertex from a state, therefore
no solutions are lost.
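To make the merging step concrete, the following sketch compiles an exact or relaxed BDD for the MISP by this top-down procedure. The edge set and weights are our reading of Figs. 4.1–4.3 (they reproduce the optimal value 11), and the choice of which nodes to merge (the two smallest states) is an illustrative heuristic, not necessarily the one used in the examples above.

```python
def compile_misp_dd(n, edges, weights, max_width=None):
    """Top-down DD compilation for the MISP. States are the sets of
    vertices that can still be added; when a layer exceeds max_width,
    surplus nodes are merged by taking the union of their states, so
    the longest path becomes an upper bound on the optimum."""
    neighbors = {j: set() for j in range(1, n + 1)}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    # each layer maps a state to the longest path length reaching it
    layer = {frozenset(range(1, n + 1)): 0}
    for j in range(1, n + 1):
        nxt = {}
        for state, length in layer.items():
            s0 = state - {j}                       # x_j = 0: skip vertex j
            nxt[s0] = max(nxt.get(s0, float('-inf')), length)
            if j in state:                         # x_j = 1: j must be eligible
                s1 = state - {j} - neighbors[j]
                nxt[s1] = max(nxt.get(s1, float('-inf')), length + weights[j])
        # relaxation step: merge nodes until the width limit is respected
        while max_width is not None and len(nxt) > max_width:
            a, b = sorted(nxt, key=len)[:2]        # heuristic: two smallest states
            val = max(nxt.pop(a), nxt.pop(b))
            merged = a | b
            nxt[merged] = max(nxt.get(merged, float('-inf')), val)
        layer = nxt
    return max(layer.values())                     # longest r-t path
```

With no width limit the procedure is exact and returns the optimal value 11 on this instance; with any width limit the union-based merge only adds paths, so the returned value is always an upper bound.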
As an example, Fig. 4.3(a) presents an exact DD for the MISP instance defined
on the graph in Fig. 4.1. Figure 4.3(b) depicts a relaxed BDD for the same graph
with a maximum width of 2, where nodes u2 and u3 are merged during the top-down
procedure to obtain u = u2 ∪ u3 = {3, 4, 5}. This reduces the BDD width to 2.
In the exact DD of Fig. 4.3(a), the longest path p corresponds to the optimal
solution x p = (0, 1, 0, 0, 1) and its length 11 is the weight of the maximum indepen-
dent set {2, 5}. In the relaxed DD of Fig. 4.3(b), the longest path corresponds to the
solution (0, 1, 1, 0, 1) and has length 13, which provides an upper bound of 13 on the
Fig. 4.3 (a) Exact BDD with states for the MISP on the graph in Fig. 4.1. (b) Relaxed BDD for
the same problem instance.
objective function. Note that the longest path in the relaxed DD corresponds to an
infeasible solution to that instance.
4.4 Maximum Cut Problem
The maximum cut problem (MCP) was first presented in Section 3.9. Given a graph
G = (V, E) with vertex set V = {1, . . . , n}, a cut (S, T ) is a partition of the vertices
in V . We say that an edge crosses the cut if its endpoints are on opposite sides of the
cut. Given edge weights, the value v(S, T ) of a cut is the sum of the weights of the
edges crossing the cut. The MCP is the problem of finding a cut of maximum value.
The DP model for the MCP in Section 3.9 uses a state that represents, for each
undecided vertex, the net benefit of placing that vertex in the set T. That is, using
the notation (α)^+ = max{α, 0} and (α)^- = min{α, 0}, the DP model was:
• State spaces: S^k = { s^k ∈ R^n | s^k_j = 0, j = 1, ..., k−1 }, with root state and
  terminal state equal to (0, ..., 0)
• Transition functions: t_k(s^k, x_k) = (0, ..., 0, s^{k+1}_{k+1}, ..., s^{k+1}_n), where

      s^{k+1}_ℓ = s^k_ℓ + w_{kℓ}  if x_k = S,
      s^{k+1}_ℓ = s^k_ℓ − w_{kℓ}  if x_k = T,      ℓ = k+1, ..., n

• Root value: v_r = ∑_{1 ≤ j < j′ ≤ n} (w_{jj′})^−
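As a sketch, the transition function above can be written directly in code; the 0-indexing and the symmetric weight matrix w (with w[k][l] = 0 for missing edges) are our own conventions.

```python
def mcp_transition(state, k, x, w):
    """MCP transition t_k: fix vertex k to side x ('S' or 'T') and update
    the net benefit s_l of later placing each vertex l in T.
    w is a symmetric matrix of edge weights; vertices are 0-indexed."""
    n = len(state)
    nxt = [0] * n                       # components up to k are zeroed out
    for l in range(k + 1, n):
        nxt[l] = state[l] + w[k][l] if x == 'S' else state[l] - w[k][l]
    return nxt
```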
Recall that we identify each node u ∈ L_k with its associated state vector s^k. When
we merge two nodes u and u′ in a layer L_k, we would like the resulting node
u_new = ⊕({u, u′}) to reflect the values in u and u′ as closely as possible, while
resulting in a valid relaxation. In particular, path lengths should not decrease.
Intuitively, it may seem that u_new,ℓ = max{u_ℓ, u′_ℓ} for each ℓ is a valid relaxation
operator, because increasing state values could only increase path lengths. However,
this can reduce path lengths as well. It turns out that we can offset any reduction in
path lengths by adding the absolute value of the state change to the length of
incoming arcs.
This yields the following procedure for merging the nodes in M. If, for a given ℓ,
the states u_ℓ have the same sign for all nodes u ∈ M, we change each u_ℓ to the value
with smallest absolute value, and add the absolute value of each change to the length
of the arcs entering u. When the states u_ℓ differ in sign, we change each u_ℓ to zero
and again add the absolute value of the changes to the incoming arcs. More precisely,
when M ⊂ L_k we let

    ⊕(M)_ℓ =  min_{u∈M} {u_ℓ},      if u_ℓ ≥ 0 for all u ∈ M
             −min_{u∈M} {|u_ℓ|},    if u_ℓ ≤ 0 for all u ∈ M     (MCP-relax)
              0,                    otherwise

for ℓ = k, ..., n.
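The merging rule can be sketched as follows; the function returns both the merged state (MCP-relax) and, for each input node, the total offset that Γ_M adds to the lengths of its incoming arcs.

```python
def merge_mcp_states(states):
    """Componentwise merge for the MCP: if all values of a component agree
    in sign, keep the one of smallest magnitude; otherwise set it to 0.
    Each node's incoming arcs are lengthened by the total absolute change."""
    n = len(states[0])
    merged = []
    for j in range(n):
        vals = [s[j] for s in states]
        if all(v >= 0 for v in vals):
            merged.append(min(vals))
        elif all(v <= 0 for v in vals):
            merged.append(-min(abs(v) for v in vals))
        else:
            merged.append(0)
    # offset added to the incoming arcs of each original node
    offsets = [sum(abs(s[j] - merged[j]) for j in range(n)) for s in states]
    return merged, offsets
```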
Figure 4.5 shows an exact DD and a relaxed DD for the MCP instance defined
over the graph in Fig. 4.4. The relaxed DD is obtained by merging nodes u2 and
[Fig. 4.4: A four-vertex graph with positive and negative edge weights for the MCP example.]
u3 during top-down construction. In the exact DD of Fig. 4.5(a), the longest path p
corresponds to the optimal solution x p = (S, S, T, S), and its length 4 is the weight
of the maximum cut (S, T ) = ({1, 2, 4}, {3}). In the relaxed DD of Fig. 4.5(b), the
longest path corresponds to the solution (S, S, S, S) and provides an upper bound of
5 on the objective function. Note that the actual weight of this cut is 0.
To show that ⊕ and Γ are valid relaxation operators, we rely on Lemma 4.1:
Fig. 4.5 (a) Exact BDD with states for the MCP on the graph in Fig. 4.4. (b) Relaxed BDD for the
same problem instance.
Lemma 4.1. Let u be a node in layer L_k of a BDD B for the MCP, and suppose we
add Δ to the state component s_ℓ (ℓ ≥ k) at u while adding |Δ| to the length of every
arc entering u. If B′ is the BDD obtained by recomputing the transition costs of the
arcs below u, then no path in B′ is shorter than the corresponding path in B.
Proof. Let B′ be the result of recomputing the BDD, and take any x̄ ∈ {S, T}^n. It
suffices to show that the path p corresponding to x̄ is no shorter in B′ than in B. We
may suppose p contains u, because otherwise p has the same length in B and B′.
Only arcs of p that leave layers L_{k−1}, ..., L_n can have different lengths in B′. The
length v(a) of the arc a of p leaving L_{k−1} becomes v(a) + |Δ|. The state components
s_ℓ along p in B for layers j = k, ..., n become s_ℓ + Δ in B′, and all other states
along p are unchanged. Thus, from the formula for the transition cost, the length
v(a′) of the arc a′ of p leaving L_ℓ becomes at least

    v(a′) + min{(−(s_ℓ + Δ))^+, (s_ℓ + Δ)^+} − min{(−s_ℓ)^+, (s_ℓ)^+}
      ≥ v(a′) + min{(−s_ℓ)^+ − Δ, (s_ℓ)^+ + Δ} − min{(−s_ℓ)^+, (s_ℓ)^+}
      ≥ v(a′) − |Δ|.

From the same formula, the lengths of the arcs leaving L_j for j > k and j ≠ ℓ cannot
decrease. As a result, the length v(p) of p in B′ becomes at least v(p) + |Δ| − |Δ| =
v(p).
Theorem 4.1. The operator ⊕(M) defined by (MCP-relax), together with the
corresponding arc-length corrections Γ_M, yields a valid relaxed BDD for the MCP.
Proof. We can achieve the effect of Algorithm 3 if we begin with the exact BDD,
successively alter only one state component s^k_ℓ and the associated incoming arc
lengths as prescribed by (MCP-relax), and compute the resulting exact BDD after
each alteration. We begin with the states in L_2 and work down to L_n. In each step of
this procedure, we increase or decrease s^k_ℓ = u_ℓ by δ = |u_ℓ| − |⊕(M)_ℓ| for some
M ⊂ L_k, where ⊕(M) is computed using the states that were in L_k immediately after
all the states in L_{k−1} were updated. We also increase the length of the arcs into u
by δ. This means we can let Δ = ±δ in Lemma 4.1 and conclude that each step of
the procedure yields a relaxed BDD.
4.5 Maximum 2-Satisfiability Problem
The maximum 2-satisfiability problem was described in Section 3.10. The interpre-
tation of states is very similar for the MCP and MAX-2SAT. We therefore use the
same relaxation operators (MCP-relax). The proof of their validity for MAX-2SAT
is analogous to the proof of Theorem 4.1.
4.6 Computational Study
The selection of nodes to merge in a layer that exceeds the maximum allotted width
W is critical for the construction of relaxed BDDs. Different selections may yield
dramatic differences in the obtained upper bounds on the optimal value, since the
merging procedure adds paths corresponding to infeasible solutions to the BDD.
We now present a number of possible heuristics for selecting nodes. This refers to
how the subsets M are chosen according to the function node select in Algorithm 3.
The heuristics we test are described below.
Fig. 4.6 Bound quality vs. graph density for each merging heuristic, using the random instance
set with MPD ordering and maximum BDD width 10. Each data point represents an average over
20 problem instances. The vertical line segments indicate the range obtained in five trials of the
random heuristic.
The purpose of this experiment is to analyze the impact of the maximum BDD width
on the resulting bound. Figure 4.7 presents the results for instance p-hat 300-1
in the dimacs set. The results are similar for other instances. The maximum width
ranges from W = 5 to the value necessary to obtain the optimal value of 8. The
bound approaches the optimal value almost monotonically as W increases, but the
convergence is superexponential in W .
Fig. 4.7 Relaxation bound vs. maximum BDD width for the dimacs instance p-hat 300-1.
We now address the key question of how BDD bounds compare with bounds
produced by traditional LP relaxation and cutting planes. To obtain a tight initial
LP relaxation, we used a clique cover model [78] of the maximum independent
set problem, which requires computing a clique cover before the model can be
formulated. We then augmented the LP relaxation with cutting planes generated
at the root node by the CPLEX MILP solver.
Given a collection C ⊆ 2^V of cliques whose union covers all the edges of the
graph G, the clique cover formulation is

    max  ∑_{v∈V} w_v x_v
    s.t. ∑_{v∈S} x_v ≤ 1,   for all S ∈ C
         x_v ∈ {0, 1},      for all v ∈ V.
The clique cover C was computed using a greedy procedure. Starting with C = ∅,
let clique S consist of a single vertex v with the highest positive degree in G. Add
to S the vertex with highest degree in G \ S that is adjacent to all vertices in S, and
repeat until no more additions are possible. At this point, add S to C, remove from
G all the edges of the clique induced by S, update the vertex degrees, and repeat the
overall procedure until G has no more edges.
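The greedy procedure above can be sketched directly in code (our own transcription, with 1-indexed vertices):

```python
def greedy_clique_cover(n, edges):
    """Greedy clique cover: grow a clique from a highest-degree vertex,
    remove its edges from the graph, and repeat until no edges remain."""
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    cover = []
    while any(adj[v] for v in adj):
        # start from a vertex of highest positive degree
        start = max(adj, key=lambda v: len(adj[v]))
        clique = {start}
        while True:
            # candidates: vertices adjacent to every vertex already in the clique
            cand = [v for v in adj if v not in clique
                    and all(u in adj[v] for u in clique)]
            if not cand:
                break
            clique.add(max(cand, key=lambda v: len(adj[v])))
        cover.append(clique)
        # remove the edges of the clique induced by S
        for u in clique:
            for v in clique:
                adj[u].discard(v)
    return cover
```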
We solved the LP relaxation with ILOG CPLEX 12.4. We used the interior point
(barrier) option because we found it to be up to 10 times faster than simplex on the
larger LP instances. To generate cutting planes, we ran the CPLEX MILP solver
with instructions to process the root node only. We turned off presolve, because no
presolve is used for the BDD method, and it had only a marginal effect on the results
in any case. Default settings were used for cutting plane generation.
The results for the random instance set appear in Table 4.1 and are plotted
in Fig. 4.8. The table displays geometric means, rather than averages, to reduce
the effect of outliers. It uses shifted geometric means1 for computation times. The
computation times for LP include the time necessary to compute the clique cover,
which is much less than the time required to solve the initial LP for random
instances, and about the same as the LP solution time for dimacs instances.
The results show that BDDs with width as small as 100 provide bounds that,
after taking means, are superior to LP bounds for all graph densities except 0.1. The
computation time required is about the same overall—more for sparse instances,
less for dense instances. The scatter plot in Fig. 4.9 (top) shows how the bounds
compare on individual instances. The fact that almost all points lie below the
diagonal indicates the superior quality of BDD bounds.
Table 4.1 Bound quality and computation times for LP and BDD relaxations, using random
instances. The bound quality is the ratio of the bound to the optimal value. The BDD bounds
correspond to maximum BDD widths of 100, 1000, and 10,000. Each graph density setting is
represented by 20 problem instances.
1 The shifted geometric mean of the quantities v1 , . . ., vn is g − α , where g is the geometric mean
of v1 + α , . . ., vn + α . We used α = 1 second.
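The shifted geometric mean of the footnote can be computed as:

```python
import math

def shifted_geometric_mean(values, shift=1.0):
    """Return g - shift, where g is the geometric mean of v_i + shift."""
    g = math.exp(sum(math.log(v + shift) for v in values) / len(values))
    return g - shift
```

For example, with the shift α = 1 used here, the values 0 and 3 have shifted geometric mean sqrt(1 · 4) − 1 = 1.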
Fig. 4.8 Bound quality vs. graph density for random instances, showing results for LP only, LP
plus cutting planes, and BDDs with maximum width 100, 1,000, and 10,000. Each data point is
the geometric mean of 20 instances.
More important, however, is the comparison with the tighter bounds obtained
by an LP with cutting planes, because this is the approach used in practice. BDDs
of width 100 yield better bounds overall than even an LP with cuts, and they do
so in less than 1% of the time. However, the mean bounds are worse for the two
sparsest instance classes. By increasing the BDD width to 1000, the mean BDD
bounds become superior for all densities, and they are still obtained in 5% as much
time overall. See also the scatter plot in Fig. 4.9 (middle). Increasing the width to
10,000 yields bounds that are superior for every instance, as revealed by the scatter
plot in Fig. 4.9 (bottom). The time required is about a third as much as LP overall,
but somewhat more for sparse instances.
The results for the dimacs instance set appear in Table 4.2 and Fig. 4.10, with
scatter plots in Fig. 4.11. The instances are grouped into five density classes, with
the first class corresponding to densities in the interval [0, 0.2), the second class to
the interval [0.2, 0.4), and so forth. The table shows the average density of each
class. Table 4.3 shows detailed results for each instance.
BDDs of width 100 provide somewhat better bounds than the LP without cuts,
except for the sparsest instances, and the computation time is somewhat less overall.
Again, however, the more important comparison is with LP augmented by cutting
planes. BDDs of width 100 are no longer superior, but increasing the width to
Fig. 4.9 Bound quality for an LP relaxation vs. limited-width BDDs for random instances. Each data
point represents one instance. The three plots show results for BDDs of maximum width 100 (top),
1000 (middle), and 10,000 (bottom). The LP bound in the last two plots benefits from cutting
planes.
Table 4.2 Bound quality and computation times for LP and BDD relaxations, using dimacs
instances. The bound quality is the ratio of the bound to the optimal value. The BDD bounds
correspond to maximum BDD widths of 100, 1,000, and 10,000.
1000 yields better mean bounds than LP for all but the sparsest class of instances.
The mean time required is about 15% that required by LP. Increasing the width
to 10,000 yields still better bounds and requires less time for all but the sparsest
instances. However, the mean BDD bound remains worse for instances with density
less than 0.2. We conclude that BDDs are generally faster when they provide better
bounds, and they provide better bounds, in the mean, for all but the sparsest dimacs
instances.
Fig. 4.10 Bound quality vs. graph density for dimacs instances, showing results for LP only, LP
plus cutting planes, and BDDs with maximum width 100, 1,000, and 10,000. Each data point is
the geometric mean of instances in a density interval of width 0.2.
Fig. 4.11 Bound quality for an LP relaxation vs. BDDs for dimacs instances. The three plots
show results for BDDs of maximum width 100 (top), 1000 (middle), and 10,000 (bottom). The LP
bound in the last two plots benefits from cutting planes.
Table 4.3 Bound comparison for the dimacs instance set, showing the optimal value (Opt),
the number of vertices (Size), and the edge density (Den). LP times correspond to clique cover
generation (Clique), processing at the root node (CPLEX), and total time. The bound (Bnd) and
computation time are shown for each BDD width. The best bounds are shown in boldface (either
LP bound or one or more BDD bounds).
variable domains. Nevertheless, since not all nodes are split, their associated state
may possibly represent an aggregation of several states from the exact DD. Extra
care must be taken when defining the transition and cost functions to ensure the
resulting DD is indeed a relaxation. This will be exemplified in Section 4.7.1.
A general outline of the relaxed DD compilation procedure is presented in
Algorithm 4. The algorithm also requires a relaxed DD B as input, which can
be trivially obtained, e.g., using a 1-width DD as depicted in Fig. 4.12(a). The
algorithm traverses the relaxed DD B in a top-down fashion. For each layer j, the
algorithm first performs filtering, i.e., it removes the infeasible arcs by checking
whether the state transition function evaluates to an infeasible state 0̂. Next, the
algorithm splits the nodes when the maximum width has not been met. If that is not
the case, the procedure updates the state s associated with a node to ensure that the
resulting DD is indeed a relaxation. Notice that the compilation algorithm is similar
to Algorithm 2, except for the width limit, the order in which the infeasible state 0̂
(a) Initial relaxed DD. (b) Job 1 is exact. (c) Exact DD.
Fig. 4.12 Three phases of refinement for a set of jobs {1, 2, 3}. Jobs are ranked lexicographically.
and the equivalence of states are checked, and the state update procedure. Filtering
and refinement details (such as their order) can also be modified if appropriate.
We now present an example of the incremental refinement procedure for the single-
machine makespan minimization problem (MMP) presented in Section 3.8. Given a
positive integer n, let J = {1, . . . , n} be a set of jobs that we wish to schedule on
a machine that can process at most one job at a time. With each job we associate a
processing time pi j , indicating the time that job j requires on the machine if it is the
i-th job to be processed. We wish to minimize the makespan of the schedule, i.e.,
the total completion time. As discussed in Section 3.8, the MMP can be written as
the following optimization problem:
    min  ∑_{i=1}^{n} p_{i, x_i}
    s.t. ALLDIFFERENT(x_1, ..., x_n)                    (4.3)
         x_i ∈ {1, ..., n},   i = 1, ..., n.
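For small n this objective can be checked by brute force over all permutations; the following reference implementation is only a sketch for validating DD-based bounds, with 0-indexed positions and jobs.

```python
from itertools import permutations

def mmp_makespan(p):
    """Minimize sum_i p[i][x_i] over all job permutations x,
    where p[i][j] is the processing time of job j in position i."""
    n = len(p)
    return min(sum(p[i][perm[i]] for i in range(n))
               for perm in permutations(range(n)))
```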
We will now show how to define the filtering and refinement operations for the
constraint (4.3). The feasible solutions are defined by all vectors x that satisfy the
A LLDIFFERENT constraint in (4.3); that is, they are the permutations of J without
repetition. The states used for filtering and refinement for A LLDIFFERENT were
initially introduced by [4] and [94].
4.7.1.1 Filtering
In the filtering operation we wish to identify conditions that indicate when all order-
ings identified by paths crossing an arc a always assign some job more than once.
Let an arc a be infeasible if such a condition holds. We can directly use the state
and transition function defined in Section 3.8; i.e., the state at a stage j represents
the jobs already performed up to j. However, to strengthen the infeasibility test, we
will also introduce an additional redundant state that provides a sufficient condition
to remove arcs. This state represents the jobs that might have been performed up to
a stage. To facilitate notation, we associate a different state label s(u) with each
one of these states, as they can be computed simultaneously during the top-down
procedure of Algorithm 4.
Namely, let us associate two states All↓u ⊆ J and Some↓u ⊆ J to each node u
of the DD. The state All↓u is the set of arc labels that appear in all paths from the
root node r to u (i.e., the same as in Section 3.8), while the state Some↓u is the set
of arc labels that appear in some path from the root node r to u. We trivially have
All↓_r = Some↓_r = ∅.
Instead of defining the transitions in functional form, we equivalently write them
with respect to the graphical structure of the DD. To this end, let in(v) be the set
of incoming arcs at a node v. It follows from the definitions that All↓_v and Some↓_v
for a node v ≠ r can be recursively computed through the relations

    All↓_v  = ∩_{a=(u,v) ∈ in(v)} (All↓_u ∪ {d(a)}),      (4.4)
    Some↓_v = ∪_{a=(u,v) ∈ in(v)} (Some↓_u ∪ {d(a)}).     (4.5)
For example, in Fig. 4.12(b) we have All↓v1 = {1} and Some↓u = {1, 2, 3}.
Lemma 4.2. An arc a = (u, v) is infeasible if any of the following conditions holds:
    d(a) ∈ All↓_u,                                        (4.6)
    d(a) ∈ Some↓_u and |Some↓_u| = ℓ(a),                  (4.7)

where ℓ(a) is the number of arcs on each path from r to the tail u of a.
Proof. The proof argument follows from [4]. Let π be any partial ordering identi-
fied by a path from r to u that does not assign any job more than once. In condition
(4.6), d(a) ∈ All↓u indicates that d(a) is already assigned to some position in π ,
therefore appending the arc label d(a) to π will necessarily induce a repetition. For
condition (4.7), notice first that the paths from r to u are composed of ℓ(a) arcs, and
therefore π represents an ordering with ℓ(a) positions. If |Some↓_u| = ℓ(a), then every
j ∈ Some↓_u is already assigned to some position in π, hence appending d(a) to π
also induces a repetition.
Thus, the tests (4.6) and (4.7) can be applied in lines 6 to 10 in Algorithm 4
to remove infeasible arcs. For example, in Fig. 4.12(b) the two shaded arcs are
infeasible. The arc (u1 , v1 ) with label 1 is infeasible due to condition (4.6) since
All↓u1 = {1}. The arc (vA ,t) with label 2 is infeasible due to condition (4.7) since
2 ∈ Some↓vA = {2, 3} and |Some↓vA | = 2.
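The relations (4.4)–(4.5) and the tests (4.6)–(4.7) can be sketched as follows; the arc-list input format and the depth bookkeeping (ℓ(a) equals the depth of the arc's tail) are our own conventions.

```python
def compute_down_states(root, arcs):
    """All-down, Some-down, and node depths via relations (4.4)-(4.5).
    `arcs` lists (u, v, label) triples with parents before children."""
    all_d, some_d, depth = {root: frozenset()}, {root: frozenset()}, {root: 0}
    for u, v, d in arcs:
        ca, cs = all_d[u] | {d}, some_d[u] | {d}
        all_d[v] = ca if v not in all_d else all_d[v] & ca      # intersection (4.4)
        some_d[v] = cs if v not in some_d else some_d[v] | cs   # union (4.5)
        depth[v] = depth[u] + 1
    return all_d, some_d, depth

def infeasible_arc(u, label, all_d, some_d, depth):
    """Tests (4.6) and (4.7) for an arc with this label leaving node u."""
    return (label in all_d[u]                                        # (4.6)
            or (label in some_d[u] and len(some_d[u]) == depth[u]))  # (4.7)
```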
We are also able to obtain stronger tests by equipping the nodes with additional
states that can be derived from a bottom-up perspective of the DD. Namely, as
in [94], we define two new states All↑_u ⊆ J and Some↑_u ⊆ J for each node u of
M. They are equivalent to the states All↓_u and Some↓_u, but now they are computed
with respect to the paths from t to u instead of the paths from r to u. As before, they
are recursively obtained through the relations

    All↑_u  = ∩_{a=(u,v) ∈ out(u)} (All↑_v ∪ {d(a)}),     (4.8)
    Some↑_u = ∪_{a=(u,v) ∈ out(u)} (Some↑_v ∪ {d(a)}),    (4.9)

where out(u) is the set of outgoing arcs at u.
Lemma 4.3. An arc a = (u, v) is infeasible if any of the following conditions holds:
    d(a) ∈ All↑_v,                                                            (4.10)
    d(a) ∈ Some↑_v and |Some↑_v| equals the number of arcs on each path
    from v to t,                                                              (4.11)
    |Some↓_u ∪ {d(a)} ∪ Some↑_v| < n.                                         (4.12)
Proof. The proofs for conditions (4.10) and (4.11) follow from an argument in [94]
and are analogous to the proof of Lemma 4.2. Condition (4.12) implies that any
ordering identified by a path containing a will never assign all the jobs in J.
4.7.1.2 Refinement
Refinement consists of splitting nodes to remove paths that encode infeasible solu-
tions, therefore strengthening the relaxed DD. Ideally, refinement should modify a
layer so that each of its nodes exactly represents a particular state of each constraint.
However, as it may be necessary to create an exponential number of nodes to
represent all such states, some heuristic decision must be made as to which nodes
to split in order to observe the maximum allotted width.
In this section we present a heuristic refinement procedure that exploits the
structure of the A LLDIFFERENT constraint. Our goal is to be as precise as possible
with respect to the jobs with a higher priority, where the priority of a job is defined
according to the problem data. More specifically, we will develop a refinement
heuristic that, when combined with the infeasibility conditions for the permutation
structure, yields a relaxed MDD where the jobs with a high priority are represented
exactly with respect to that structure; that is, these jobs are assigned to exactly one
position in all orderings encoded by the relaxed MDD.
Thus, if higher priority is given to jobs that play a greater role in the feasibility or
optimality of the problem at hand, the relaxed MDD may represent more accurately
the feasible orderings of the problem, providing, e.g., better bounds on the objective
function value. For example, if we give priority to jobs with a larger processing
time, the bound on the makespan would be potentially tighter with respect to the
ones obtained from other possible relaxed MDDs for this same instance. We will
exploit this property for a number of scheduling problems in Chapter 11.
To achieve this property, the refinement heuristic we develop is based on the
following theorem, which we will prove constructively later:
Theorem 4.3. Let W > 0 be the maximum MDD width. There exists a relaxed MDD
M of width at most W in which at least ⌊log₂ W⌋ jobs are assigned to exactly one
position in all orderings identified by M.
Let us represent the job priorities by defining a ranking of jobs J ∗ = { j1∗ , . . . , jn∗ },
where jobs with smaller index in J ∗ have a higher priority. We can thus achieve
the desired property of our heuristic refinement by constructing the relaxed MDD
M based on Theorem 4.3, where we ensure that the jobs exactly represented in M
are those with a higher ranking.
Before proving Theorem 4.3, we first identify conditions on when a node violates
the desired refinement property and needs to be modified. To this end, let M be any
relaxed MDD. Assume the states All↓u and Some↓u as described before are computed
for all nodes u in M , and no arcs satisfy the infeasibility conditions (4.6) to (4.12).
We have the following lemma:
Lemma 4.4. A job j is assigned to exactly one position in all orderings identified
by M if and only if j ∉ Some↓_u \ All↓_u for all nodes u ∈ M.

Proof. Suppose first that a job j is assigned to exactly one position in all orderings
identified by M, and take a node u in M such that j ∈ Some↓_u. From the definition
of Some↓_u, there exists a path from r to u with an arc labeled j. This implies by
hypothesis that no path from u to t has any arc labeled j, since otherwise we would
have a path that identifies an ordering where j is assigned more than once.
But then, also by hypothesis, all paths from r to u must necessarily have some arc
labeled j, thus j ∈ All↓_u, which implies j ∉ Some↓_u \ All↓_u.

Conversely, suppose j ∉ Some↓_u \ All↓_u for all nodes u in M. Then a node u can
only have an outgoing arc a with d(a) = j if j ∉ All↓_u, by the filtering rule (4.6);
by hypothesis this means j ∉ Some↓_u, so no path from r to u contains an arc
labeled j. Thus, no job is assigned more than once in any ordering encoded by M.
Finally, rule (4.12) ensures that j is assigned at least once, and hence exactly once,
in all paths.
Proof (of Theorem 4.3). Let M be a 1-width relaxation. We can obtain the
desired MDD by applying filtering and refinement on M in a top-down approach
as described in Section 4.7. For filtering, remove all arcs satisfying the infeasi-
bility rules (4.6) and (4.7). For refining a particular layer Li , apply the following
procedure: For each job j = j1 , . . . , jn in this order, select a node u ∈ Li such that
j ∈ Some↓u \ All↓u . Create two new nodes u1 and u2 , and redirect the incoming arcs
at u to u1 and u2 as follows: if the arc a = (v, u) is such that j ∈ (All↓v ∪ {d(a)}),
redirect it to u1 ; otherwise, redirect it to u2 . Replicate all the outgoing arcs of u to
u1 and u2 , remove u, and repeat this until the maximum width W is met, there are
no nodes satisfying this for j, or all jobs were considered.
We now show that this refinement procedure suffices to produce a relaxed
MDD satisfying the conditions of the theorem. Observe first that the conditions of
Lemma 4.4 are satisfied by any job at the root node r, since Some↓_r = ∅. Suppose, by
induction hypothesis, that the conditions of Lemma 4.4 are satisfied for some job j
at all nodes in layers L_1, ..., L_{i′}, i′ < i, and consider we created nodes u1 and u2 from
some node u ∈ L_i such that j ∈ Some↓_u \ All↓_u as described above. By construction, any
incoming arc a = (v, u2) at u2 satisfies j ∉ (All↓_v ∪ {d(a)}); by induction hypothesis,
We can utilize Theorem 4.3 to guide our top-down approach for filtering and
refinement, following the refinement heuristic based on the job ranking J ∗ de-
scribed in the proof of Theorem 4.3. Namely, we apply the following refinement
at a layer Li : For each job j∗ = j1∗ , . . . , jn∗ in the order defined by J ∗ , identify the
nodes u such that j∗ ∈ Some↓u \ All↓u and split them into two nodes u1 and u2 , where
an incoming arc a = (v, u) is redirected to u1 if j∗ ∈ (All↓v ∪ {d(a)}) or u2 otherwise,
and replicate all outgoing arcs for both nodes. Moreover, if the initial relaxed MDD
is a 1-width relaxation, then we obtain the guarantee of Theorem 4.3 on the number
of jobs that are exactly represented.
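The split step of this heuristic can be sketched as a small function; the arc representation (lists of (node, label) pairs) is our own assumption.

```python
def split_node(incoming, outgoing, job, all_d):
    """Refinement split of a node u for a given job: an incoming arc (v, d)
    is redirected to u1 if job is in All-down(v) union {d}, and to u2
    otherwise; the outgoing arcs are replicated on both copies."""
    u1_in = [(v, d) for v, d in incoming if job in (all_d[v] | {d})]
    u2_in = [(v, d) for v, d in incoming if job not in (all_d[v] | {d})]
    return (u1_in, list(outgoing)), (u2_in, list(outgoing))
```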
This procedure also yields a reduced MDD M for certain structured problems,
which we will show in Section 11.7. It provides sufficient conditions to split nodes
for any problem where an A LLDIFFERENT constraint is stated on the variables.
Lastly, recall that equivalence classes corresponding to constraints other than the
permutation structure may also be taken into account during refinement. Therefore,
if the maximum width W is not met in the refinement procedure above, we assume
that we will further split nodes by arbitrarily partitioning their incoming arcs. Even
though this may yield false equivalence classes, the resulting M is still a valid
relaxation and may provide a stronger representation.
As an illustration, let J = {1, 2, 3} and assume jobs are ranked lexicographi-
cally. Given the relaxed DD in Fig. 4.12(a), Fig. 4.12(b) without the shaded arcs
depicts the result of the refinement heuristics for a maximum width of 2. Notice that
job 1 appears exactly once in all solutions encoded by the DD. Figure 4.12(c) depicts
the result of the refinement for a maximum width of 3. It is also exact and reduced
(which is always the case if we start with a 1-width relaxation and the constraint set
is composed of only one A LLDIFFERENT).
Chapter 5
Restricted Decision Diagrams
5.1 Introduction
programming technology, such as the feasibility pump [66] and the pivot, cut, and
dive heuristic [60]. Surveys of heuristics for integer programming are given in
[73, 74] and [30]. Local search methods for binary problems can also be found in
[1] and [31].
In this chapter we present a general-purpose heuristic based on restricted deci-
sion diagrams (DDs). A weighted DD B is restricted for an optimization problem
P if B represents a subset of the feasible solutions of P, and path lengths are lower
bounds on the value of feasible solutions. That is, B is restricted for P if

    Sol(B) ⊆ Sol(P)                                                  (5.1)

and the length of the r–t path in B corresponding to each solution x ∈ Sol(B)
equals f (x).                                                        (5.2)
Fig. 5.2 (a) Exact BDD and (b) restricted BDD for the MISP on the graph in Fig. 5.1.
Restricted BDDs can be constructed in a much simpler way than relaxed DDs. We
need only eliminate nodes from a layer when the layer becomes too large. Given a
valid DP formulation of a discrete optimization problem P and a maximum width
W , Algorithm 5 outputs a restricted DD for P. Note that it is similar to Algorithm 1
except for lines 3 to 5. Condition (5.1) for a restricted BDD is satisfied because
the algorithm only deletes solutions, and furthermore, since the algorithm never
modifies the states of any nodes that remain, condition (5.2) must also be satisfied.
Finally, nodes to be eliminated are also selected heuristically according to a pre-
defined function node select.
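The width-limited compilation just described can be sketched as follows. This is a minimal Python illustration, not the book's implementation: it assumes a generic DP model given by a transition function, an arc-cost function, and a rank function implementing the node_select heuristic, and it treats decisions as binary for simplicity.

```python
def restricted_dd(root, n, transition, cost, rank, W):
    """Top-down compilation of a width-restricted DD (maximization).

    transition(state, j, d) -> next state, or None if infeasible
    cost(state, j, d)       -> length of the arc labeled d out of layer j
    rank(state, value)      -> heuristic rank; when a layer exceeds the
                               maximum width W, the lowest-ranked nodes
                               are deleted
    Returns the best value among the solutions that survive, a primal
    bound on the optimum.
    """
    layer = {root: 0}                       # state -> best path value so far
    for j in range(n):
        nxt = {}
        for s, v in layer.items():
            for d in (0, 1):                # binary decisions for simplicity
                s2 = transition(s, j, d)
                if s2 is None:
                    continue
                val = v + cost(s, j, d)
                if s2 not in nxt or val > nxt[s2]:
                    nxt[s2] = val
        if len(nxt) > W:                    # restriction step: delete nodes
            kept = sorted(nxt.items(), key=lambda sv: rank(sv[0], sv[1]),
                          reverse=True)[:W]
            nxt = dict(kept)
        layer = nxt
    return max(layer.values()) if layer else None
```

Because the restriction step only deletes nodes and never alters the states of the nodes that remain, every surviving r–t path still encodes a feasible solution of the correct length.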
We remark in passing that it is also possible to apply Algorithm 3 to obtain
restricted DDs. To this end, we modify the operator ⊕(M) so that the algorithm
IP solver. In particular, we took the bound obtained from the root node relaxation.
We set the solver parameters to balance the quality of the bound value and the CPU
time to process the root node. The CPLEX parameters that are distinct from the
default settings are presented in Table 5.1. We note that all cuts were disabled,
since we observed that the root node would be processed orders of magnitude faster
without adding cuts, which did not have a significant effect on the quality of the
heuristic solution obtained for the instances tested.
Table 5.1 CPLEX parameters that differ from the default settings

    Parameter (CPLEX name)                  Value
    Version                                 12.4
    Number of explored nodes (NodeLim)      0 (only root)
    Parallel processes (Threads)            1
    Cuts (Cuts, Covers, DisjCuts, . . . )   −1 (off)
    Emphasis (MIPEmphasis)                  4 (find hidden feasible solutions)
    Time limit (TiLim)                      3600
The bandwidth represents the largest distance, in the variable ordering given by
the constraint matrix, between any two variables that share a constraint. The smaller
the bandwidth, the more structured the problem, in that the variables participating in
common constraints are close to each other in the ordering. The minimum bandwidth
problem seeks to find a variable ordering that minimizes the bandwidth [114, 51, 62,
80, 115, 127, 140]. This underlying structure, when present in A, can be captured by
BDDs, resulting in good computational performance.
Our random matrices are generated according to three parameters: the number of
variables n, the number of ones per row k, and the bandwidth bw . For a fixed n,
k, and bw , a random matrix A is constructed as follows: We first initialize A as a
zero matrix. For each row i, we assign the ones by selecting k columns uniformly
at random from the index set corresponding to the variables {xi , xi+1 , . . . , xi+bw }. As
an example, a constraint matrix with n = 9, k = 3, and bw = 4 may look like
      ⎛ 1 1 0 1 0 0 0 0 0 ⎞
      ⎜ 0 1 1 1 0 0 0 0 0 ⎟
      ⎜ 0 0 1 0 1 1 0 0 0 ⎟
A =   ⎜ 0 0 0 1 0 1 1 0 0 ⎟ .
      ⎜ 0 0 0 0 1 0 1 1 0 ⎟
      ⎝ 0 0 0 0 0 0 1 1 1 ⎠
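A generator for these instances can be sketched as follows. This is our reading of the construction, not the book's code: the row count m = n − bw + 1 matches the 6-row example above, and clipping the candidate window at column n is an assumption.

```python
import random

def banded_matrix(n, k, bw, seed=0):
    """Random 0-1 matrix with n columns, k ones per row, and bandwidth bw.

    Row i draws its k ones uniformly from the window of columns
    {i, ..., i + bw} (0-indexed, clipped at n).  The row count
    m = n - bw + 1 reproduces the six-row example for n = 9, bw = 4.
    """
    rng = random.Random(seed)
    m = n - bw + 1
    A = [[0] * n for _ in range(m)]
    for i in range(m):
        window = range(i, min(i + bw + 1, n))
        for j in rng.sample(list(window), k):
            A[i][j] = 1
    return A
```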
Consider the case when bw = k. The matrix A has the consecutive ones property
and is totally unimodular [69], and IP is able to find the optimal solution for the
set packing and set covering instances at the root node. Similarly, we argue that an
(m + 1)-width restricted BDD is an exact BDD for both classes of problems, hence
also yielding an optimal solution when this structure is present. Indeed, we show
that A having the consecutive ones property implies that the state of a BDD node u is
always of the form { j, j + 1, . . . , m} for some j ≥ L(u) during top-down compilation.
To see this, consider the set covering problem. For a partial solution x′ identified
by a path from r to a certain node u in the BDD, let s(x′) be the set covering state
associated with u. We claim that, for any partial solution x′ that can be completed to
a feasible solution, s(x′) = {i(x′), i(x′) + 1, . . . , m} for some constraint index i(x′),
or s(x′) = ∅ if x′ satisfies all of the constraints when completed with 0’s. Let j′ ≤ j
be the largest index in x′ with x′j′ = 1. Because x′ can be completed to a feasible
solution, for each i ≤ bw + j′ − 1 there is a variable x ji with ai, ji = 1 and x′ji = 1,
while every remaining constraint i ≥ bw + j′ has ai, j′′ = 0 for all j′′ ≤ j′. Therefore
s(x′) = {bw + j′, bw + j′ + 1, . . . , m}, as desired. Hence, the state of every partial
solution must be of the form {i, i + 1, . . . , m} or ∅. Because there are at most m + 1
such states, the size of any layer cannot exceed m + 1. A similar argument works
for the SPP.
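This state-counting argument is easy to check computationally. The sketch below is our own verification aid, not the book's code: it enumerates the reachable set covering states layer by layer, pruning partial solutions that can no longer be completed, and reports the width of each layer.

```python
def covering_layer_widths(A):
    """Enumerate set covering DP states layer by layer for matrix A.

    A state is the set of constraints not yet covered; a partial solution
    is pruned when some uncovered constraint has no remaining variable.
    Returns the width of each layer.
    """
    m, n = len(A), len(A[0])
    last = [max(j for j in range(n) if A[i][j]) for i in range(m)]
    layer = {frozenset(range(m))}
    widths = []
    for j in range(n):
        nxt = set()
        for s in layer:
            for d in (0, 1):
                s2 = set(s)
                if d == 1:
                    s2 -= {i for i in range(m) if A[i][j]}
                # prune: some uncovered constraint has no variable left
                if any(last[i] <= j for i in s2):
                    continue
                nxt.add(frozenset(s2))
        widths.append(len(nxt))
        layer = nxt
    return widths
```

For a consecutive-ones matrix, every reachable state is a suffix {i, . . . , m} or ∅, so the widths never exceed m + 1.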
Increasing the bandwidth bw , however, destroys the totally unimodular property
of A and the bounded width of B. Hence, by changing bw , we can test how sensitive
IP and the BDD-based heuristics are to the staircase structure dissolving.
We note here that generating instances of this sort is not restrictive. Once the
bandwidth is large, the underlying structure dissolves and each element of the matrix
becomes randomly generated. In addition, as mentioned above, algorithms to solve
the minimum bandwidth problem exactly or approximately have been investigated.
To any SCP or SPP one can therefore apply these methods to reorder the matrix and
then apply the BDD-based algorithm.
We first analyze the impact of the maximum width W on the solution quality
provided by a restricted BDD. To this end, we report the generated bound versus the
maximum width W obtained for a set covering instance with n = 1, 000, k = 100,
bw = 140, and a cost vector c where each c j was chosen uniformly at random from
the set {1, . . . , nc j }, where nc j is the number of constraints in which variable j
participates. We observe that the reported results are representative of all instances
tested.
Figure 5.3(a) depicts the resulting bounds, where the width axis is on a logarith-
mic scale, and Fig. 5.3(b) presents the total time to generate the W -restricted BDD
and extract its best solution. We tested all W in the set {1, 2, 3, . . . , 1, 000}. We see
that, as the width increases, the bound approaches the optimal value, with a super-
exponential-like convergence in W . The time to generate the BDD grows linearly
in W , which can be shown to be consistent with the complexity of the construction
algorithm.
(a) Upper bound. (b) Time.
Fig. 5.3 Restricted BDD performance versus the maximum allotted width for a set covering
instance with n = 1000, k = 100, bw = 140, and random cost vector.
First, we report the results for two representative classes of instances for the set
covering problem. In the first class, we studied the effect of bw on the quality of
the bound. To this end, we fixed n = 500, k = 75, and considered bw as a multiple
of k, namely bw ∈ {1.1k, 1.2k, . . . , 2.6k}. In the second class, we analyzed whether
k, which is proportional to the density of A, also has an influence on the resulting
bound. For this class we fixed n = 500, k ∈ {25, 50, . . ., 250}, and bw = 1.6k. In
all classes we generated 30 instances for each triple (n, k, bw ) and fixed 500 as the
restricted BDD maximum width.
It is well known that the objective function coefficients play an important role
in the bound provided by IP solvers for the set covering problem. We considered
two types of cost vectors c in our experiments. The first is c = 1, which yields
the combinatorial set covering problem. For the second cost function, let nc j be
the number of constraints that include variable x j , j = 1, . . . , n. We chose the cost
of variable x j uniformly at random from the range [0.75nc j , 1.25nc j ]. As a result,
variables that participate in more constraints have a higher cost, thereby yielding
harder set covering problems to solve. This cost vector yields the weighted set
covering problem.
The feasible solutions are compared with respect to their optimality gap. The op-
timality gap of a feasible solution is obtained by first taking the absolute difference
between its objective value and a lower bound to the problem, and then dividing this
by the solution’s objective value. In both BDD and IP cases, we used the dual value
obtained at the root node of CPLEX as the lower bound for a particular problem
instance.
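As a one-line sanity check, the gap computation just described reads:

```python
def optimality_gap(obj_value, bound):
    """Optimality gap as defined above: |obj - bound| / obj."""
    return abs(obj_value - bound) / obj_value
```

For example, a feasible solution of value 120 against a dual bound of 100 has a gap of 1/6, roughly 16.7%.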
The results for the first instance class are presented in Fig. 5.4. Each data point
in the figure represents the average optimality gap, over the instances with that con-
figuration. We observe that the restricted BDD yields a significantly better solution
for small bandwidths in the combinatorial set covering version. As the bandwidth
increases, the staircase structure is lost and the BDD gap becomes progressively
worse in comparison with the IP gap. This is a result of the increasing width of
the exact reduced BDD for instances with larger bandwidth matrices. Thus, more
information is lost when we restrict the BDD size. The same behavior is observed
for the weighted set covering problem, although the gap provided by the restricted
BDD is generally better than the IP gap even for larger bandwidths. Finally, we note
that the restricted BDD time is also comparable to the IP time, which is on average
less than 1 second for this configuration. This time takes into account both BDD
construction and extraction of the best solution it encodes by means of a shortest
path algorithm.
The results for the second instance class are presented in Fig. 5.5. We note that
restricted BDDs provide better solutions when k is smaller. One possible explanation
for this behavior is that a sparser matrix causes variables to participate in fewer
constraints, thereby decreasing the possible number of BDD node states. Again,
less information is lost by restricting the BDD width. Moreover, we note once again
that the BDD performance, when compared with CPLEX, is better for the weighted
instances tested. Finally, we observe that the restricted BDD time is similar to the
IP time, always below one second for instances with 500 variables.
Next, we compare solution quality and time as the number of variables n increases.
To this end, we generated random instances with n ∈ {250, 500, 750, . . . , 4,000},
k = 75, and bw = 2.2k = 165. The choice of k and bw was motivated
by Fig. 5.4(b), corresponding to the configuration where IP outperforms BDD with
respect to solution quality when n = 500. As before, we generated 30 instances for
each n. Moreover, only weighted set covering instances are considered in this case.
The average optimality gap and time are presented in Figs. 5.6(a) and 5.6(b),
respectively. The y axis in Fig. 5.6(b) is on logarithmic scale. For n > 500, we
observe that the restricted BDDs yield better-quality solutions than the IP method,
and as n increases this gap remains constant. However, the IP times grow at a much
faster rate than the restricted BDD times. In particular, with n = 4, 000, the BDD
times are approximately two orders of magnitude faster than the corresponding IP
times.
(a) Combinatorial. (b) Weighted.
Fig. 5.4 Average optimality gaps for combinatorial and weighted set covering instances with
n = 500, k = 75, and varying bandwidth.
(a) Combinatorial. (b) Weighted.
Fig. 5.5 Average optimality gaps for combinatorial and weighted set covering instances with
n = 500, varying k, and bw = 1.6k.
(a) Average Optimality Gap (in %). (b) Time (in seconds).
Fig. 5.6 Average optimality gaps and times for weighted set covering instances with varying n,
k = 75, and bw = 2.2k = 165. The y axis in the time plot is on logarithmic scale.
We extend the same experimental analysis of the previous section to set packing
instances. Namely, we initially compare the quality of the solutions by means of
two classes of instances. In the first class we analyze variations of the bandwidth
by generating random instances with n = 500, k = 75, and setting bw in the
range {1.1k, 1.2k, . . ., 2.5k}. In the second class, we analyze variations in the
density of the constraint matrix A by generating random instances with n = 500,
k ∈ {25, 50, . . ., 250}, and with a fixed bw = 1.6k. In all classes, we created 30
instances for each triple (n, k, bw ) and set 500 as the restricted BDD maximum
width.
The quality is also compared with respect to the optimality gap of the feasible so-
lutions, which is obtained by dividing the absolute difference between the solution’s
(a) Combinatorial. (b) Weighted.
Fig. 5.7 Average optimality gaps for combinatorial and weighted set packing instances with
n = 500, k = 75, and varying bandwidth.
objective value and an upper bound to the problem by the solution’s objective value.
We use the dual value at CPLEX’s root node as the upper bound for each instance.
Similarly to the set covering problem, experiments were performed with two
types of objective function coefficients. The first, c = 1, yields the combinatorial set
packing problem. For the second cost function, let nc j again denote the number of
constraints that include variable x j , j = 1, . . . , n. We chose the objective coefficient
of variable x j uniformly at random from the range [0.75nc j , 1.25nc j ]. As a result,
variables that participate in more constraints have a higher cost, thereby yielding
harder set packing problems since this is a maximization problem. This cost vector
yields the weighted set packing problem.
The results for the first class of instances are presented in Fig. 5.7. For all tested
instances, the solution obtained from the BDD restriction was at least as good as
the IP solution for all cost functions. As the bandwidth increases, the gap also
increases for both techniques, as the upper bound obtained from CPLEX’s root
node deteriorates for larger bandwidths. However, the BDD gap does not increase
as much as the IP gap, which is especially noticeable for the weighted case. We note
that the difference in times between the BDD and IP approaches is negligible; both
lie below one second.
The results for the second class of instances are presented in Fig. 5.8. For all
instances tested, the BDD bound was at least as good as the bound obtained with IP,
though the solution quality from restricted BDDs was particularly superior for the
weighted case. Intuitively, since A is sparser, fewer BDD node states are possible
in each layer, implying that less information is lost by restricting the BDD width.
Fig. 5.8 Average optimality gaps for combinatorial and weighted set packing instances with
n = 500, varying k, and bw = 1.6k.
Finally, we observe that times were also comparable for both IP and BDD cases, all
below one second.
Next, we proceed analogously to the set covering case and compare solution
quality and time as the number of variables n increases (Fig. 5.9). As before, we
generate 30 random instances per configuration, with n ∈ {250, 500, 750, . . ., 4000},
k = 75, and bw = 2.2k = 165. Only weighted set packing instances are considered.
The average optimality gap and solution times are presented in Figs. 5.9(a)
and 5.9(b), respectively. Similar to the set covering case, we observe that the BDD
restrictions outperform the IP heuristics with respect to both gap and time for this
particular configuration. The difference in gaps between restricted BDDs and IP
remains approximately the same as n increases, while the time to generate restricted
BDDs is orders of magnitude less than the IP times for the largest values of n tested.
(a) Average Optimality Gap (in %). (b) Time (in seconds).
Fig. 5.9 Average optimality gaps and times for weighted set packing instances with varying n,
k = 75, and bw = 2.2k = 165. The y axis in the time plot is on logarithmic scale.
Chapter 6
Branch-and-Bound Based on Decision Diagrams
6.1 Introduction
Some of the most effective methods for discrete optimization are branch-and-bound
algorithms applied to an integer programming formulation of the problem. Linear
programming (LP) relaxation plays a central role in these methods, primarily by
providing bounds and feasible solutions as well as guidance for branching.
As we analyzed in Chapters 4 and 5, limited-size decision diagrams (DDs)
can be used to provide useful relaxations and restrictions of the feasible set of
an optimization problem in the form of relaxed and restricted DDs, respectively.
We will use them in a novel branch-and-bound scheme that operates within a DD
relaxation of the problem. Rather than branch on values of a variable, the scheme
branches on a suitably chosen subset of nodes in the relaxed DD. Each node gives
Fig. 6.1 (a) Relaxed BDD for the MISP on the graph Fig. 3.3 with nodes labeled as exact (E) or
relaxed (R); (b) exact BDD for subproblem corresponding to u¯1 ; (c) exact BDD for subproblem
corresponding to u¯4 .
Proof. z∗ (P|u ) is the length of a longest r–t path of B that contains u, and any such
path has length v∗ (Bru ) + v∗ (But ).
Proof. Let B be the exact BDD for P created using the same DP model. Because
each node u¯ ∈ S is exact, it has a corresponding node u in B (i.e., a node associated
with the same state), and S is a cutset of B. Thus
while open nodes remain, we select a node u from Q. We first obtain a lower bound
on z∗ (P|u ) by creating a restricted BDD But as described above, and we update the
incumbent solution zopt . If But is exact (i.e., |L j | never exceeds W in Algorithm 5),
there is no need for further branching at node u. This is analogous to obtaining
an integer solution in traditional branch-and-bound. Otherwise we obtain an upper
bound on z∗ (P|u ) by building a relaxed BDD B¯ut as described above. If we cannot
prune the search using this bound, we identify an exact cutset S of B¯ut and add the
nodes in S to Q. Because S is exact, for each u ∈ S we know that v∗ (u ) = v∗ (u) +
v∗ (B¯uu ). The search terminates when Q is empty, at which point the incumbent
solution is optimal by Theorem 6.1.
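The loop just described can be condensed into a generic skeleton. The interface below (restrict and relax callables, a maximization objective) is our own abstraction, not the book's code, and the accumulated node values v∗(u) are assumed to be folded into the callables.

```python
def dd_branch_and_bound(root, restrict, relax):
    """DD-based branch-and-bound skeleton (maximization).

    restrict(u) -> (primal_value, is_exact): bound from a restricted DD
                   rooted at u; is_exact means no further branching needed
    relax(u)    -> (dual_bound, exact_cutset): bound from a relaxed DD
                   and the exact cutset nodes to branch on
    """
    best = float("-inf")
    queue = [root]
    while queue:
        u = queue.pop()
        lb, is_exact = restrict(u)
        best = max(best, lb)            # update the incumbent
        if is_exact:
            continue                    # analogous to an integer solution
        ub, cutset = relax(u)
        if ub <= best:
            continue                    # prune by bound
        queue.extend(cutset)            # branch on the exact cutset nodes
    return best
```

As a toy run, if a subproblem is a tuple of candidate solution values, a restrict that returns the first value and a relax that returns the true maximum together with a two-way split drives the search to the overall optimum.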
As an example, consider again the relaxed BDD B¯ in Fig. 6.1(a). The longest path
¯ = 13, an upper bound on the optimal value. Suppose
length in this graph is v∗ (B)
that we initially branch on the exact cutset {u¯1 , u¯4 }, for which we have v(u¯1 ) = 0 and
v(u¯4 ) = 3. We wish to generate restricted and relaxed BDDs of maximum width 2
for the subproblems. Figure 6.1(b) shows a restricted BDD B¯u¯1t for the subproblem
at u¯1 , and Fig. 6.1(c) shows a restricted BDD B¯u¯4t for the other subproblem. As
it happens, both BDDs are exact, and so no further branching is necessary. The
two BDDs yield bounds v∗ (B¯u¯1t ) = 11 and v∗ (B¯u¯4t ) = 10, respectively, and so the
optimal value is 11.
Given a relaxed BDD, there are many exact cutsets. Here we present three such
cutsets and experimentally evaluate them in Section 6.5.
Proof. By the definition of a frontier cutset, each node in the cutset is exact. We
need only show that each solution x ∈ Sol(B̄) contains some node in FC(B̄). But
the path p corresponding to x ends at t, which is relaxed because B̄ is not exact.
Since the root r is exact, there must be a first relaxed node u in p. The node
immediately preceding this node in p is in FC(B̄), as desired.
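Under the reading this proof suggests, FC(B̄) consists of the exact nodes having at least one relaxed child; this is an assumption on our part, since the formal definition appears earlier in the section. With node labels in hand, the cutset can be collected directly:

```python
def frontier_cutset(children, is_exact):
    """Collect the frontier cutset of a relaxed DD.

    children : dict mapping each node to its list of child nodes
    is_exact : dict mapping each node to True (exact) or False (relaxed)
    Assumes FC consists of the exact nodes with a relaxed child, which is
    the property the proof above relies on.
    """
    return {u for u, chs in children.items()
            if is_exact[u] and any(not is_exact[c] for c in chs)}
```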
• Maximum width: Wider relaxed BDDs provide tighter bounds but require more
time to build. For each subproblem in the branch-and-bound procedure, we set
the maximum width W equal to the number of variables whose value has not yet
been fixed.
• Node selection for merger: The selection of the subset M of nodes to merge
during the construction of a relaxed BDD (line 4 of Algorithm 1) likewise affects
the quality of the bound, as discussed in Chapters 4 and 5. We use the following
heuristic: After constructing each layer L j of the relaxed BDD, we rank the nodes
in L j according to a rank function rank(u) that is specified in the DP model with
the state merging operator ⊕. We then let M contain the lowest-ranked |L j | − W
nodes in L j .
• Variable ordering: Much as branching order has a significant impact on IP
performance, the variable ordering chosen for the layers of the BDD can affect
branching efficiency and the tightness of the BDD relaxation. We describe below
the variable ordering heuristics we used for the three problem classes.
• Search node selection: We must also specify the next node in the set Q of open
nodes to be selected during branch-and-bound (Algorithm 6). We select the node
u with the minimum value v∗ (u).
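The node-selection rule for merging in the second bullet amounts to one line of sorting; a sketch, with the rank callable standing in for the model's rank(u):

```python
def nodes_to_merge(layer, rank, W):
    """Return the |L_j| - W lowest-ranked nodes of a layer: the set M to
    be merged when the layer exceeds the maximum width W."""
    if len(layer) <= W:
        return []
    return sorted(layer, key=rank)[:len(layer) - W]
```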
The tests were run on an Intel Xeon E5345 with 8 GB RAM. The BDD-based
algorithm was implemented in C++. The commercial IP solver ILOG CPLEX
12.4 was used for comparison. Default settings, including presolve, were used for
CPLEX unless otherwise noted. No presolve routines were used for the BDD-based
method.
We first specify the key elements of the algorithm that we used for the MISP. Node
selection for merger is based on the rank function rank(u) = v∗ (u). For variable
ordering, we considered the heuristic minState first described in Section 4.6.2: after
selecting the first j − 1 variables and forming layer L j , we choose vertex j as the
vertex that belongs to the fewest states in L j . Finally, we used FC cutsets
for all MISP tests.
We refer to this as the tight MISP formulation. The clique cover C is computed
using a greedy procedure: Starting with C = ∅, let clique S consist of a single vertex
v with the highest positive degree in G. Add to S the vertex with highest degree
in G \ S that is adjacent to all vertices in S, and repeat until no more additions are
possible. At this point, add S to C , remove from G all the edges of the clique induced
by S, update the vertex degrees, and repeat the overall procedure until G has no more
edges.
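The greedy procedure translates directly into code. This is a sketch, not the book's implementation; the final singleton step (covering vertices left without any clique) is our assumption so that every vertex appears in some clique.

```python
def greedy_clique_cover(n, edges):
    """Greedy clique cover as described above.

    Repeatedly grow a clique S from a highest-degree vertex, adding the
    highest-degree vertex adjacent to all of S, then remove the edges of
    S from the graph; repeat until no edges remain.
    """
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    cover = []
    while any(adj[v] for v in adj):
        v = max(adj, key=lambda u: len(adj[u]))      # highest degree
        S = {v}
        while True:
            cand = [u for u in adj
                    if u not in S and all(w in adj[u] for w in S)]
            if not cand:
                break
            S.add(max(cand, key=lambda u: len(adj[u])))
        cover.append(S)
        for a in S:                                   # remove clique edges
            for b in S:
                if a != b:
                    adj[a].discard(b)
    covered = set().union(*cover) if cover else set()
    for v in range(n):            # assumed final step: isolated vertices
        if v not in covered:      # become singleton cliques
            cover.append({v})
    return cover
```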
We begin by reporting results on randomly generated graphs. We generated
random graphs with n ∈ {250, 500, . . ., 1, 750} and density p ∈ {0.1, 0.2, . . ., 1} (10
graphs per n, p configuration) according to the Erdös–Rényi model G(n, p) (where
each edge appears independently with probability p).
Figure 6.2 depicts the results. The solid lines represent the average percent gap
for the BDD-based technique after 1,800 seconds, one line per value of n, and the
dashed lines depict the same statistics for the integer programming solver using the
tighter, clique model, only. It is clear that the BDD-based algorithm outperforms
CPLEX on dense graphs, solving all instances tested with density 80% or higher,
and solving almost all instances, except for the largest, with density equal to
70%, whereas the integer programming solver could not close any but the smallest
instances (with n = 250) at these densities.
CPLEX outperformed the BDD technique for the sparsest graphs (with p = 0.1),
but only for the small values of n. As n grows, we see that the BDD-based algorithm
starts to outperform CPLEX, even on the sparsest graphs, and that the degree to
which the ending percent gaps increase as n grows is more substantial for CPLEX
than it is for the BDD-based algorithm.
Fig. 6.2 Average percent gap after 1,800 seconds versus graph density for BDDs (solid lines) and
CPLEX with the tight IP model (dashed lines), one curve per n ∈ {250, 500, . . . , 1,750}.
We also tested on the 87 instances of the maximum clique problem in the well-
known DIMACS benchmark (http://cs.hbg.psu.edu/txn131/clique.html). The MISP
is equivalent to the maximum clique problem on the complement of the graph.
Figure 6.3 shows a time profile comparing BDD-based optimization with CPLEX
performance for the standard and tight IP formulations. The BDD-based algorithm
is superior to the standard IP formulation but solved four fewer instances than the
tight IP formulation after 30 minutes. However, fewer than half the instances were
solved by any method. The relative gap (upper bound divided by lower bound) for
the remaining instances therefore becomes an important factor. A comparison of the
Fig. 6.3 Results on 87 MISP instances for BDDs and CPLEX. Number of instances solved versus
time for the tight IP model (top line), BDDs (middle), standard IP model (bottom).
Fig. 6.4 Results on 87 MISP instances for BDDs and CPLEX. End gap comparison after 1800
seconds.
relative gap for BDDs and the tight IP model appears in Fig. 6.4, where the relative
gap for CPLEX is shown as 10 when it found no feasible solution. Points above the
diagonal are favorable to BDDs. It is evident that BDDs tend to provide significantly
tighter bounds. There are several instances for which the CPLEX relative gap is
twice the BDD gap, but no instances for which the reverse is true. In addition,
CPLEX was unable to find a lower bound for three of the largest instances, while
BDDs provided bounds for all instances.
We evaluated our approach on random instances for the MCP. For n ∈ {30, 40, 50}
and p ∈ {0.1, 0.2, . . ., 1}, we again generated random graphs (10 per n, p configura-
tion). The weights of the edges generated were drawn uniformly from [−1, 1].
We let the rank of a node u ∈ L j associated with state s j be

    rank(u) = v∗ (u) + ∑ℓ= j,...,n |sℓj | ,

where sℓj denotes the ℓth component of s j .
We order the variables x j according to the sum of the lengths of the edges incident
to vertex j. Variables with the largest sum are first in the ordering.
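This ordering rule is a one-liner. Whether "length" means the signed or the absolute edge weight is ambiguous here, so the sketch below assumes the signed sum:

```python
def mcp_variable_order(n, weighted_edges):
    """Order vertices by decreasing total weight of incident edges, as
    described above (signed weights assumed)."""
    total = [0.0] * n
    for a, b, w in weighted_edges:
        total[a] += w
        total[b] += w
    return sorted(range(n), key=lambda j: -total[j])
```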
Fig. 6.5 Average solution time for MCP instances (n = 30 vertices) using BDDs (with LEL and FC
cutsets) and CPLEX (with and without presolve). Each point is the average of 10 random instances.
Fig. 6.6 Number of MCP instances with n = 40 vertices solved after 60 seconds (left) and 1800
seconds (right), versus graph density, using BDDs (with LEL and FC cutsets) and CPLEX (with
and without presolve). The legend is the same for the two plots.
Figure 6.7 (left) shows time profiles for 100 instances with n = 50 vertices. The
profiles for CPLEX (with presolve) and BDDs (with LEL) are roughly competitive,
with CPLEX marginally better for larger time periods. However, none of the
methods could solve even a third of the instances, and so the gap for the remaining
instances becomes important. Figure 6.7 (right) shows that the average percent gap
(i.e., 100(UB − LB)/LB) is much smaller for BDDs on denser instances, and com-
parable on sparser instances, again suggesting greater robustness for a BDD-based
method relative to CPLEX. In view of the fact that CPLEX benefits enormously
from presolve, it is conceivable that BDDs could likewise profit from a presolve
routine.
We also tested the algorithm on the g-set, a classical benchmark set, created by
the authors in [91], which has since been used extensively for computational testing
on algorithms designed to solve the MCP. The 54 instances in the benchmark set are
Fig. 6.7 Time profile (left) for 100 MCP instances with n = 50 vertices, comparing BDDs (with
LEL and FC cutsets) and CPLEX (with and without presolve). Percent gap (right) versus density
after 1800 seconds, where each point is the average over 10 random instances.
large, each having at least 800 vertices. The results appear in Table 6.1 only for those
instances for which the BDD-based algorithm was able to improve upon the best
known integrality gaps. For instances of 1% density or more, the integrality
gap provided by the BDD-based algorithm is about an order of magnitude worse
than the best known gaps; but for the instances reported here (which are among the
sparsest), we are able to improve on the best known gaps by proving tighter
relaxation bounds and identifying better solutions than any previously found.
The first column provides the name of the instance. The instances are ordered
by density, with the sparsest instances reported appearing at the top of the table.
We then present the upper bound (UB) and lower bound (LB), after one hour of
computation time, for the BDD-based algorithm, followed by the best known upper
bound and lower bound that we could find in the literature. In the final columns,
we record the previously best known percent gap and the new percent gap obtained
from the BDD-based algorithm. Finally, we present the reduction in percent gap
obtained.
For three instances (g32, g33, and g34), better solutions were identified by
the BDD-based algorithm than have ever been found by any technique, with an
improvement in objective function value of 12, 4, and 4, respectively. In addition,
for four instances (g50, g33, g11, and g12) better upper bounds were proven than
were previously known, reducing the best known upper bound by 89.18, 1, 60, and
5, respectively. For these instances, the reduction in the percent gap is shown in the
last column. Most notably, for g50 and g11, the integrality gap was significantly
tightened (82.44 and 95.24 percent reduction, respectively). As the density grows,
however, the BDD-based algorithm is not able to compete with other state-of-the-
art techniques, yielding substantially worse solutions and relaxation bounds than the
best known values.
We note here that the BDD-based technique is a general branch-and-bound
procedure, whose application to the MCP is only specialized through the DP
model that is used to calculate states and determine transition costs. This general
technique was able to improve upon best known solutions obtained by heuristics
and exact techniques specifically designed to solve the MCP. And so, although
the technique is unable to match the best known objective function bounds for all
instances, identifying the best known solution via this general-purpose technique is
an indication of the power of the algorithm.
For the MAX-2SAT problem, we created random instances with n ∈ {30, 40}
variables and density d ∈ {0.1, 0.2, . . . , 1}. We generated 10 instances for each pair
(n, d), with each of the 4 · (n choose 2) possible clauses selected with probability d
and, if selected, assigned a weight drawn uniformly from [1, 10].
We used the same rank function as for the MCP, and we ordered the variables in
ascending order according to the total weight of the clauses in which the variables
appear.
We formulated the IP using a standard model. Let clause i contain variables $x_{j(i)}$
and $x_{k(i)}$. Let $x^i_j$ be $x_j$ if $x_j$ is posited in clause i, and $1 - x_j$ if $x_j$ is negated. Let $\delta_i$ be
a 0–1 variable that will be forced to 0 if clause i is unsatisfied. Then if there are m
clauses and $w_i$ is the weight of clause i, the IP model is
$$\max \left\{ \sum_{i=1}^{m} w_i \delta_i \;\middle|\; x^i_{j(i)} + x^i_{k(i)} + (1 - \delta_i) \ge 1, \text{ all } i;\; x_j, \delta_i \in \{0,1\}, \text{ all } i, j \right\}.$$
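The instance-generation scheme described above can be sketched as follows. This is our own illustration, not the authors' code: `gen_max2sat` and the clause encoding (a pair of signed literals plus a weight) are hypothetical names chosen for this sketch.

```python
import itertools
import random

def gen_max2sat(n, d, seed=0):
    """Sample a random weighted MAX-2SAT instance: each of the 4*C(n,2)
    sign patterns over variable pairs is kept with probability d and,
    if kept, given an integer weight drawn uniformly from [1, 10].
    A clause is ((var1, sign1), (var2, sign2), weight); sign True = positive."""
    rng = random.Random(seed)
    clauses = []
    for j, k in itertools.combinations(range(n), 2):
        for sj, sk in itertools.product([True, False], repeat=2):
            if rng.random() < d:
                clauses.append(((j, sj), (k, sk), rng.randint(1, 10)))
    return clauses

def weight_satisfied(clauses, assignment):
    """Total weight of clauses satisfied by a 0/1 assignment (list of bools)."""
    return sum(w for (j, sj), (k, sk), w in clauses
               if assignment[j] == sj or assignment[k] == sk)
```

With density d = 1 every clause is present, so an instance on n variables has exactly $4\binom{n}{2}$ clauses.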
Figure 6.8 shows the time profiles for the two size classes. BDDs with LEL are
clearly superior to CPLEX for n = 30. When n = 40, BDDs prevail over CPLEX
as the available solving time grows. In fact, BDDs solve all but 2 of the instances
within 30 minutes, while CPLEX leaves 17 unsolved using no presolve, and 22
unsolved using presolve.
Fig. 6.8 Time profile for 100 MAX-2SAT instances with n = 30 variables (top) and n = 40
variables (bottom), comparing BDDs (with LEL and FC cutsets) and CPLEX (with and without
presolve).

6.6 Parallel Branch-and-Bound
In recent years, hardware design has increasingly focused on multicore systems and
parallelized computing. In order to take advantage of these systems, it is crucial that
solution methods for combinatorial optimization be effectively parallelized and built
to run not only on one machine but also on a large cluster.
Different combinatorial search methods have been developed for specific prob-
lem classes, including mixed-integer programming (MIP), Boolean satisfiability
(SAT), and constraint programming (CP). These methods represent (implicitly or
explicitly) a complete enumeration of the solution space, usually in the form of a
branching tree where the branches out of each node reflect variable assignments. The
recursive nature of branching trees suggests that combinatorial search methods are
amenable to efficient parallelization, since we may distribute subtrees to different
compute cores spread across multiple machines of a compute cluster.
Yet, in practice this task has proved to be very challenging. For example, Gurobi,
one of the leading commercial MIP solvers, achieves an average speedup factor
of 1.7 on 5 machines (and only 1.8 on 25 machines) when compared with using
only 1 machine [79]. Furthermore, during the 2011 SAT Competition, the best
parallel SAT solvers obtained an average speedup factor of about 3 on 32 cores,
which was achieved by employing an algorithm portfolio rather than a parallelized
search [102]. The winner of the parallel category of the 2013 SAT Competition also
achieved a speedup of only about 3 on 32 cores.
Constraint programming search appears to be more suitable for parallelization
than search for MIP or SAT. Different strategies, including a recursive application
of search goals [125], work stealing [48], problem decomposition [135], and a
dedicated parallel scheme based on limited discrepancy search [118] all exhibit
good speedups (sometimes near-linear). This is especially true in scenarios involving
infeasible instances or where evaluating the search tree leaves is costlier than evalu-
ating internal nodes. Nonetheless, recent developments in CP and SAT have moved
towards more constraint learning during search (such as lazy clause generation
[120]) for which efficient parallelization becomes increasingly more difficult.
Our goal in this section is to investigate whether branch-and-bound based on
decision diagrams can be effectively parallelized. The key observation is that relaxed
decision diagrams can be used to partition the search space, since for a given layer
in the diagram each path from the root to the terminal passes through a node in
that layer. We can therefore branch on nodes in the decision diagram instead of
branching on variable–value pairs, as is done in conventional search methods. Each
of the subproblems induced by a node in the diagram is processed recursively, and
the process continues until all nodes have been solved by an exact decision diagram
or pruned due to reasoning based on bounds on the objective function.
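As a concrete illustration, the following is a minimal sequential sketch of this scheme for the maximum independent set problem, the running example of this book. All names and width-limiting heuristics are our own simplifications, not the implementation evaluated below: a state is the frozenset of still-eligible vertices, a restricted diagram drops the worst nodes of an over-wide layer, and a relaxed diagram merges them by taking the union of their states.

```python
def compile_dd(graph, weights, root_state, root_value, max_width, relaxed):
    """Compile a restricted (relaxed=False) or relaxed (relaxed=True) decision
    diagram layer by layer for the maximum independent set problem.
    graph: dict vertex -> set of neighbors.  Returns (bound, exact, cutset),
    where cutset holds the (state, value) nodes of the last exactly compiled
    layer.  Requires max_width >= 2 so that subproblems strictly shrink."""
    layer = {root_state: root_value}          # state -> best path value
    exact, cutset = True, list(layer.items())
    for v in sorted(root_state):              # fixed vertex order
        nxt = {}
        for state, val in layer.items():
            if v not in state:                # v already ineligible
                nxt[state] = max(nxt.get(state, float('-inf')), val)
                continue
            s0 = state - {v}                  # exclude v
            nxt[s0] = max(nxt.get(s0, float('-inf')), val)
            s1 = state - {v} - graph[v]       # include v, drop its neighbors
            nxt[s1] = max(nxt.get(s1, float('-inf')), val + weights[v])
        if len(nxt) > max_width:
            exact = False
            ranked = sorted(nxt.items(), key=lambda kv: -kv[1])
            if relaxed:                       # merge surplus states (union)
                keep, surplus = ranked[:max_width - 1], ranked[max_width - 1:]
                merged = frozenset().union(*(s for s, _ in surplus))
                nxt = dict(keep)
                nxt[merged] = max(nxt.get(merged, float('-inf')),
                                  max(val for _, val in surplus))
            else:                             # restricted: drop worst states
                nxt = dict(ranked[:max_width])
        elif exact:
            cutset = list(nxt.items())        # still an exact layer
        layer = nxt
    return max(layer.values()), exact, cutset

def branch_and_bound(graph, weights, max_width):
    """Branch on the nodes of an exact cutset of the relaxed diagram
    rather than on variable-value pairs, as described in the text."""
    queue = [(frozenset(graph), 0)]           # root subproblem
    best = float('-inf')
    while queue:
        state, value = queue.pop()
        lb, exact, _ = compile_dd(graph, weights, state, value,
                                  max_width, relaxed=False)
        best = max(best, lb)                  # restricted DD: feasible solution
        if exact:
            continue                          # subproblem solved to optimality
        ub, exact, cutset = compile_dd(graph, weights, state, value,
                                       max_width, relaxed=True)
        if exact:
            best = max(best, ub)              # relaxed DD happened to be exact
        elif ub > best:
            queue.extend(cutset)              # branch on the exact cutset
    return best
```

On a 5-vertex path with unit weights and max_width = 2, `branch_and_bound` returns the optimum 3; on a 5-cycle it returns 2, branching once on the cutset of the relaxed diagram.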
When designing parallel algorithms geared towards dozens or perhaps hundreds
of workers operating in parallel, the two major challenges are (1) balancing the
workload across the workers, and (2) limiting the communication cost between
workers. In the context of combinatorial search and optimization, most of the
current methods are based on either parallelizing the traditional tree search or
using portfolio techniques that make each worker operate on the entire problem.
The former approach makes load balancing difficult as the computational cost of
solving similarly sized subproblems can be orders of magnitude different. The latter
approach typically relies on extensive communication in order to avoid duplication
of effort across workers.
In contrast, using decision diagrams as a starting point for parallelization offers
several notable advantages. For instance, the associated branch-and-bound method
applies relaxed and restricted diagrams that are obtained by limiting the size of
the diagrams to a certain maximum value. The size can be controlled, for example,
simply by limiting the maximum width of the diagrams. As the computation time for
a (sub)problem is roughly proportional to the size of the diagram, by controlling the
size we are able to control the computation time. In combination with the recursive
nature of the framework, this makes it easier to obtain a balanced workload. Further,
the communication between workers can be limited in a natural way by using both
global and local pools of currently open subproblems and employing pruning based
on shared bounds. Upon processing a subproblem, each worker generates several
new ones. Instead of communicating all of these back to the global pool, the worker
keeps several of them to itself and continues to process them. In addition, whenever
a worker finds a new feasible solution, the corresponding bound is communicated
immediately to the global pool as well as to other workers, enabling them to prune
subproblems that cannot provide a better solution. This helps avoid unnecessary
computational effort, especially in the presence of local pools.
Our scheme is implemented in X10 [42, 139, 158], which is a modern pro-
gramming language designed specifically for building applications for multicore
and clustered systems. For example, [34] recently introduced SatX10 as an efficient
and generic framework for parallel SAT solving using X10. We refer to our proposed
framework for parallel decision diagrams as DDX10. The use of X10 allows us
to program parallelization and communication constructs using a high-level, type-
checked language, leaving the details of an efficient backend implementation for a
variety of systems and communication hardware to the language compiler and run-
time. Furthermore, X10 also provides a convenient parallel execution framework,
allowing a single compiled executable to run as easily on one core as on a cluster of
networked machines.
Our main contributions are as follows: First, we describe, at a conceptual level,
a scheme for parallelization of a sequential branch-and-bound search based on
approximate decision diagrams and discuss how this can be efficiently implemented
in the X10 framework. Second, we provide an empirical evaluation on the maximum
independent set problem, showing the potential of the proposed method. Third, we
compare the performance of DDX10 with a state-of-the-art parallel MIP solver.
Experimental results indicate that DDX10 can obtain much better speedups than
parallel MIP, especially when more workers are available. The results also demon-
strate that the parallelization scheme provides near-linear speedups up to 256 cores,
even in a distributed setting where the cores are split across multiple machines.
The limited amount of information required for each BDD node makes the
branch-and-bound algorithm naturally suitable for parallel processing. Once an
exact cut C is computed for a relaxed BDD, the nodes u ∈ C are independent and
can each be processed in parallel. The information required to process a node u ∈ C
is its corresponding state, which is bounded by the number of vertices of G, |V |.
After processing a node u, only the lower bound v∗ (G[E(u)]) is needed to compute
the optimal value.
There are many possible parallel strategies that can exploit this natural characteristic
of the branch-and-bound algorithm for approximate decision diagrams. We propose
here a centralized strategy. Specifically, a master process keeps a pool of BDD
nodes to be processed, first initialized with a single node associated with the root
state V . The master distributes the BDD nodes to a set of workers. Each worker
receives a number of nodes, processes them by creating the corresponding relaxed
and restricted BDDs, and either sends back to the master new nodes to explore (from
an exact cut of their relaxed BDD) or sends to the master as well as all workers an
improved lower bound from a restricted BDD.
The workers also send the upper bound obtained from the relaxed BDD from
which the nodes were extracted, which is then used by the master for potentially
pruning the nodes according to the current best lower bound at the time these nodes
are brought out from the global pool to be processed.
Even though conceptually simple, our centralized parallelization strategy in-
volves communication between all workers and many choices that have a significant
impact on performance. After discussing the challenge of effective parallelization,
we explore some of these choices in the rest of this section.
We refer to the pool of nodes kept by the master as the global pool. Each node in
the global pool has two pieces of information: a state, which is necessary to build
the relaxed and restricted BDDs, and the longest path value in the relaxed BDD that
created that node, from the root to the node. All nodes sent to the master are first
stored in the global pool and then redistributed to the workers. Nodes with an upper
bound that is no more than the best found lower bound at the time are pruned from
the pool, as these can never provide a solution better than one already found.
In order to select which nodes to send to workers first, the global pool is
implemented here using a data structure that mixes a priority queue and a stack.
Initially, the global pool gives priority to nodes that have a larger upper bound,
which intuitively are nodes with higher potential to yield better solutions. However,
this search strategy simulates a best-first search and may result in an exponential
number of nodes in the global queue that still need to be explored. To remedy this,
the global pool switches to a last-in, first-out node selection strategy when its size
reaches a particular value (denoted maxPQueueLength), adjusted according to the
available memory on the machine where the master runs. This strategy resembles a
stack-based depth-first search and limits the total amount of memory necessary to
perform search.
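A sketch of such a hybrid pool follows. The class and parameter names are ours (only maxPQueueLength comes from the text), and the exact switching rule in DDX10 is not spelled out here, so this is an assumption: nodes go to a max-priority queue keyed on the upper bound until it fills, after which overflow is served LIFO, and stale nodes are pruned against the incumbent on the way out.

```python
import heapq

class GlobalPool:
    """Hybrid pool: a max-priority queue (by upper bound) up to
    max_pqueue_length entries, then LIFO overflow to bound memory."""
    def __init__(self, max_pqueue_length):
        self.max_len = max_pqueue_length
        self.heap = []        # entries (-upper_bound, tiebreak, node)
        self.stack = []       # overflow entries (upper_bound, node)
        self.tick = 0         # tiebreaker so nodes never get compared

    def push(self, node, upper_bound, best_lower_bound):
        if upper_bound <= best_lower_bound:
            return            # prune: cannot beat the incumbent
        if len(self.heap) < self.max_len and not self.stack:
            heapq.heappush(self.heap, (-upper_bound, self.tick, node))
            self.tick += 1
        else:                 # pool is full: fall back to stack (LIFO)
            self.stack.append((upper_bound, node))

    def pop(self, best_lower_bound):
        while self.stack:     # depth-first regime while overflowing
            ub, node = self.stack.pop()
            if ub > best_lower_bound:
                return node   # otherwise pruned as stale
        while self.heap:      # best-bound regime
            neg_ub, _, node = heapq.heappop(self.heap)
            if -neg_ub > best_lower_bound:
                return node
        return None           # pool exhausted
```

The stack-first pop order mirrors the depth-first behavior described above, which keeps memory bounded once the best-first frontier has grown too large.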
Besides the global pool, workers also keep a local pool of nodes. The subprob-
lems represented by the nodes are usually small, making it advantageous for workers
to keep their own pool so as to reduce the overall communication to the master. The
local pool is represented by a priority queue, selecting nodes with a larger upper
bound first. After a relaxed BDD is created, a certain fraction of the nodes (with
preference for those with a larger upper bound) in the exact cut are sent to the master,
while the remaining fraction (denoted fracToKeep) of nodes are added to the local
pool. The local pool size is also limited; when the pool reaches this maximum size
(denoted maxLocalPoolSize), we stop adding more nodes to the local queue and start
sending any newly created nodes directly to the master. When a worker’s local pool
becomes empty, it notifies the master that it is ready to receive new nodes.
The global queue starts off with a single node corresponding to the root state V. The
root is assigned to an arbitrary worker, which then applies a cut to produce more states
and sends a fraction of them, as discussed above, back to the global queue. The
size of the global pool thus starts to grow rapidly, and one must choose how many
nodes to send subsequently to other workers. Sending one node (the one with the
highest priority) to a worker at a time would mimic the sequential case most closely.
However, it would also result in the largest number of communications between the
master and the workers, which often results in a prohibitively large system overhead.
On the other hand, sending too many nodes at once to a single worker runs the risk
of starvation, i.e., the global queue becoming empty and other workers sitting idle
waiting to receive new work.
Based on experimentation with representative instances, we propose the follow-
ing parameterized scheme to dynamically decide how many nodes the master should
send to a worker at any time. Here, we use the notation $[x]_\ell^u$ as a shorthand for
$\min\{u, \max\{\ell, x\}\}$, that is, $x$ capped to lie in the interval $[\ell, u]$:
$$\mathrm{nNodesToSend}_{\underline{c},\bar{c},c^*}(s, q, w) \;=\; \left[\, \min\left\{ \bar{c}\, s,\ c^* \frac{q}{w} \right\} \right]_{\underline{c}}^{\infty}, \qquad (6.3)$$
where $s$ is a decaying running average of the number of nodes added to the global
pool by workers after processing a node,1 $q$ is the current size of the global pool, $w$
is the number of workers, and $\underline{c}$, $\bar{c}$, and $c^*$ are parametrization constants.
The intuition behind this choice is as follows: $\underline{c}$ is a flat lower limit (a relatively
small number) on how many nodes are sent at a time irrespective of other factors.
The inner minimum expression upper bounds the number of nodes to send to be no
more than both (a constant times) the number of nodes the worker is in turn expected
to return to the global queue upon processing each node and (a constant times) an
even division of all current nodes in the queue into the number of workers. The
first influences how fast the global queue grows, while the second relates to fairness
among workers and the possibility of starvation. Larger values of $\underline{c}$, $\bar{c}$, and $c^*$ reduce
the number of times communication occurs between the master and workers, at the
expense of moving further away from mimicking the sequential case.
Load balancing also involves appropriately setting the fracToKeep value dis-
cussed earlier. We use the following scheme, parameterized by $\underline{d}$ and $d^*$:
$$\mathrm{fracToKeep}_{\underline{d},d^*}(t) \;=\; \left[\, \frac{t}{d^*} \right]_{\underline{d}}^{1},$$
where $t$ is the number of states received by the worker. In other words, the fraction
of nodes to keep for the local queue is $1/d^*$ times the number of states received by
the worker, capped to lie in the range $[\underline{d}, 1]$.
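The two load-balancing rules can be written down directly. The default parameter values in this sketch are arbitrary placeholders of our own, not the tuned constants used in the experiments, and the formulas follow the capped-interval notation defined above.

```python
def cap(x, lo, hi):
    """The [x]_lo^hi notation: clamp x to the interval [lo, hi]."""
    return min(hi, max(lo, x))

def n_nodes_to_send(s, q, w, c_lo=2.0, c_bar=1.0, c_star=0.5):
    """Eq. (6.3): how many nodes the master sends to a requesting worker.
    s: decaying average of nodes returned per processed node; q: global
    pool size; w: number of workers.  Defaults are placeholders."""
    return int(cap(min(c_bar * s, c_star * q / w), c_lo, float('inf')))

def frac_to_keep(t, d_lo=0.1, d_star=20.0):
    """Fraction of newly created nodes a worker keeps in its local pool:
    t / d_star capped to [d_lo, 1], where t is the number of states received."""
    return cap(t / d_star, d_lo, 1.0)

def update_s(s_old, cut_size, r=0.5):
    """Decaying running average of nodes produced per processed node,
    as in the footnote below: s_new = r * s_old + (1 - r) * |C|."""
    return r * s_old + (1 - r) * cut_size
```

For example, with s = 10, q = 1000, and w = 100 workers, the master sends min(10, 5) = 5 nodes; with s = 1 the flat lower limit c_lo = 2 takes over.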
1 When a cut C is applied upon processing a node, the value of s is updated as $s_{\mathrm{new}} = r\, s_{\mathrm{old}} + (1-r)\,|C|$, with r = 0.5 in the current implementation.
The MISP problem can be formulated and solved using several existing general-
purpose discrete optimization techniques. A MIP formulation is considered to be
very effective and has been used previously to evaluate the sequential BDD approach
in Section 6.5. Given the availability of parallel MIP solvers as a comparison point,
we present two sets of experiments on the MISP problem: (1) we compare DDX10
with a MIP formulation solved using IBM ILOG CPLEX 12.5.1 on up to 32 cores,
and (2) we show how DDX10 scales when going beyond 32 cores and employing up
to 256 cores distributed across a cluster. We borrow the MIP encoding from Section
6.5 and employ the built-in parallel branch-and-bound MIP search mechanism of
CPLEX. The comparison with CPLEX is limited to 32 cores because this is the
largest number of cores we have available on a single machine (note that CPLEX
12.5.1 does not support distributed execution). Since the current version of DDX10
The comparison between DDX10 and IBM ILOG CPLEX 12.5.1 was conducted on
2.3 GHz AMD Opteron 6134 machines with 32 cores, 64 GB RAM, 512 KB L2
cache, and 12 MB L3 cache.
To draw meaningful conclusions about the scaling behavior of CPLEX vs.
DDX10 as the number w of workers is increased, we start by selecting problem
instances where both approaches exhibit comparable performance in the sequential
setting. To this end, we report an empirical evaluation on random instances with 170
vertices and six graph densities ρ = 0.19, 0.21, 0.23, 0.25, 0.27, and 0.29. For each
ρ , we generated five random graphs, obtaining a total of 30 problem instances. For
each pair (ρ , w) with w being the number of workers, we aggregate the runtime over
the five random graphs using the geometric mean.
Figure 6.9 summarizes the result of this comparison for w = 1, 2, 4, 16, and
32. As we see, CPLEX and DDX10 display comparable performance for w = 1
(the leftmost data points). While the performance of CPLEX varies relatively little
as a function of the graph density ρ , that of DDX10 varies more widely. As
observed earlier in this section for the sequential case, BDD-based branch-and-
bound performs better on higher-density graphs than sparse graphs. Nevertheless,
2 The current version of DDX10 may be downloaded from http://www.andrew.cmu.edu/user/vanhoeve/mdd.
Fig. 6.9 Performance of CPLEX (above) and DDX10 (below), with one curve for each graph
density ρ shown in the legend as a percentage. Both runtime (y-axis) and number of cores (x-axis)
are on logarithmic scale.
The two experiments reported in this section were conducted on a larger cluster,
with thirteen 3.8 GHz Power7 machines (CHRP IBM 9125-F2C), each with 32 cores (4-way
SMT for 128 hardware threads) and 128 GB of RAM. The machines are connected
via a network that supports the PAMI message passing interface [106], although
DDX10 can also be easily compiled to run using the usual network communication
with TCP sockets. We used 24 workers on each machine, using as many machines
as necessary to operate w workers in parallel.
Random Instances
The first experiment reuses the random MISP instances introduced in the previous
section, with the addition of similar but harder instances on graphs with 190 vertices,
resulting in 60 instances in total.
Fig. 6.10 Scaling behavior of DDX10 on MISP instances with 170 (above) and 190 (below)
vertices, with one curve for each graph density ρ shown in the legend as a percentage. Both runtime
(y-axis) and number of cores (x-axis) are on logarithmic scale.
As Fig. 6.10 shows, DDX10 scales near-linearly up to 64 cores and still very well
up to 256 cores. The slight degradation in performance when going to 256 cores is
more apparent for the higher-density instances (lower curves in the plots), which do
not have much room left for linear speedups as they need only a couple of seconds
to be solved with 64 cores. For the harder instances (upper curves), the scaling is
still satisfactory even if not linear. As noted earlier, coming anywhere close to near-
linear speedups for complex combinatorial search and optimization methods has
been remarkably hard for SAT and MIP. These results show that parallelization of
BDD-based branch-and-bound can be much more effective.
Table 6.3 Number of nodes, in multiples of 1,000, processed (#No) and pruned (#Pr) by DDX10
as a function of the number of cores. Same setup as in Table 6.2.
DIMACS Instances
The second experiment is on the DIMACS instances used by [22], where it was
demonstrated that sequential BDD-based branch-and-bound has complementary
strengths compared with sequential CPLEX and outperforms the latter on several
instances, often the ones with higher graph density ρ . We consider here the subset
of instances that take at least 10 seconds (on our machines) to solve using sequential
BDDs and omit any that cannot be solved within the time limit of 1800 seconds
(even with 256 cores). The performance of DDX10 with w = 1, 4, 16, 64, and 256 is
reported in Table 6.2, with rows sorted by hardness of instances.
These instances represent a wide range of graph size, density, and structure. As
we see from the table, DDX10 is able to scale very well to 256 cores. Except for
three instances, it is significantly faster on 256 cores than on 64 cores, despite the
substantially larger communication overhead for workload distribution and bound
sharing.
Table 6.3 reports the total number of nodes processed through the global queue,
as well as the number of nodes pruned due to bounds communicated by the workers.3
Somewhat surprisingly, the number of nodes processed does not increase by
much compared with the sequential case, despite the fact that hundreds of workers
start processing nodes in parallel without waiting for potentially improved bounds
which might have been obtained by processing nodes sequentially. Furthermore,
the number of pruned nodes also stays steady as w grows, indicating that bound
communication is working effectively. This provides insight into the favorable scaling
behavior of DDX10 and shows that it is able to retain sufficient global knowledge
even when executed in a distributed fashion.
3 Here we do not take into account the number of nodes added to local pools, which is usually a
small fraction of the number of nodes processed by the global pool.
Chapter 7
Variable Ordering
Abstract One of the most important parameters that determines the size of a
decision diagram is the variable ordering. In this chapter we formally study the
impact of variable ordering on the size of exact decision diagrams for the maximum
independent set problem. We provide worst-case bounds on the size of the exact
decision diagram for particular classes of graphs. For general graphs, we show that
the size is bounded by the Fibonacci numbers. Lastly, we demonstrate experimentally
that variable orderings that produce small exact decision diagrams also produce
better bounds from relaxed decision diagrams.
7.1 Introduction
The ordering of the vertices plays an important role not only in the size of exact
decision diagrams, but also in the bound obtained by DD-based relaxations and
restrictions. It is well known that finding orderings that minimize the size of DDs
(or even improving on a given ordering) is NP-hard [58, 35]. We found that the
ordering of the vertices is the single most important parameter in creating small-
width exact DDs and in proving tight bounds via relaxed DDs.
In this chapter we analyze how the combinatorial structure of a problem can be
exploited to develop variable orderings that bound the size of the DD representing its
solution space. We will particularly focus on the maximum independent set problem
(MISP) for our analysis, first described in Section 3.5. Given a graph G = (V, E)
with vertex set V, an independent set is a subset $I \subseteq V$ such that no two vertices
in I are connected by an edge in E, i.e., $(u, v) \notin E$ for any distinct $u, v \in I$. If we
associate a weight with each vertex $j \in V$, the MISP asks for a maximum-weight
independent set of G. Since the variables are binary, the resulting diagram is a binary
decision diagram (BDD).
Different orderings can yield exact BDDs with dramatically different widths. For
example, Fig. 7.1 shows a path on six vertices with two different orderings given
by x1 , . . . , x6 and y1 , . . . , y6 . In Fig. 7.2(a) we see that the vertex ordering x1 , . . . , x6
yields an exact BDD with width 1, while in Fig. 7.2(b) the vertex ordering y1 , . . . , y6
yields an exact BDD with width 4. This last example can be extended to a path
with 2n vertices: the analogous interleaved ordering yields a BDD with a width of $2^{n-1}$,
while ordering the vertices according to the order in which they lie on the path yields a BDD of width 1.
x1 x2 x3 x4 x5 x6
y1 y4 y2 y5 y3 y6
Fig. 7.1 Path graph.
Fig. 7.2 (a) Variable order that results in a small reduced BDD. (b) Variable order that results in a
larger reduced BDD.
Our study focuses on variable orderings for the layers of a BDD
representing the set of feasible solutions to a MISP instance. For particular classes of
graphs, variable orderings are given that can be used to provide worst-case bounds
on the width of exact BDDs [24, 89]. This is followed by the description of a family
of variable orderings for which the width of the exact BDD is bounded by the Fibonacci numbers.
Based on this analysis, various heuristic orderings for relaxed BDDs are suggested,
which operate on the assumption that an ordering that results in a small-width exact
reduced BDD also results in a relaxed BDD with a strong optimization bound.

Even though these orderings are specific to the maximum independent set prob-
lem, they indicate novel look-ahead ordering heuristics that are applicable to any
combinatorial optimization problem. Recent work has also extended the results
to more general independent systems [89], relating the size of a BDD with the
bandwidth of the constraint matrix.

7.2 Exact BDD Orderings
Let E(u) be the state associated with a node u, and let $S(L_j)$ be the set of states on
nodes in $L_j$, i.e., $S(L_j) = \bigcup_{u \in L_j} E(u)$. To bound the width of a given layer j, we need only
count the number of states that may arise from independent sets on $\{v_1, \dots, v_{j-1}\}$.
This is because each layer will have one and only one node for each possible state,
and so there is a one-to-one correspondence between the number of states and the
size of a layer.
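This state-counting argument is easy to mechanize. The following sketch is our own illustration (the name `layer_widths` is hypothetical): it enumerates the eligible-vertex states layer by layer for a given vertex ordering.

```python
def layer_widths(graph, order):
    """Number of distinct eligible-vertex states in each layer of the exact
    BDD for the independent sets of `graph` (dict vertex -> neighbor set),
    built with the given vertex order."""
    states = {frozenset(order)}
    widths = []
    for v in order:
        nxt = set()
        for s in states:
            if v in s:
                nxt.add(s - {v})              # exclude v
                nxt.add(s - {v} - graph[v])   # include v, drop its neighbors
            else:
                nxt.add(s)                    # v was already ineligible
        states = nxt
        widths.append(len(states))
    return widths
```

On the six-vertex path of Fig. 7.1, the path ordering keeps every layer at width at most 2, while the interleaved ordering $y_1, \dots, y_6$ blows up. Note that these raw state counts can slightly exceed the widths of the zero-compressed diagrams drawn in the figures, for the reason discussed next.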
It is assumed for the remainder of this chapter that the decision diagrams used are
not zero-compressed decision diagrams, which is the form shown in the figures
above. The bounds can be slightly improved should zero-compressed decision
diagrams be employed, but for ease of exposition and clarity the use of ordinary
BDDs is assumed.
Theorem 7.1. Let G = (V, E) be a clique. Then, for any ordering of the vertices, the
width of the exact reduced BDD will be 2.
Proof. Consider any layer j. The only possible independent sets on $\{v_1, \dots, v_{j-1}\}$
are $\emptyset$ and $\{v_i\}$, $i = 1, \dots, j-1$. For the former, $E(\emptyset \mid \{v_j, \dots, v_n\}) = \{v_j, \dots, v_n\}$, and
for the latter, $E(\{v_i\} \mid \{v_j, \dots, v_n\}) = \emptyset$, establishing the bound.
Theorem 7.2. Let G = (V, E) be a path. Then, there exists an ordering of the vertices
for which the width of the exact reduced BDD will be 2.
Proof. Let the ordering of the vertices be given by the positions in which they
appear in the path. Consider any layer j. Of the remaining vertices in G, namely
$\{v_j, \dots, v_n\}$, the only vertex with any adjacencies to $\{v_1, \dots, v_{j-1}\}$ is $v_j$. Therefore,
for any independent set $I \subseteq \{v_1, \dots, v_{j-1}\}$, $E(I \mid V_{j-1})$ will be either $\{v_j, \dots, v_n\}$
(when $v_{j-1} \notin I$) or $\{v_{j+1}, \dots, v_n\}$ (when $v_{j-1} \in I$). Therefore there can be at most
two states in any given layer.
Theorem 7.3. Let G = (V, E) be a cycle. Then, there exists an ordering of the vertices
for which the width of the exact reduced BDD will be at most 4.

Proof. Let the vertices be ordered $v_1, \dots, v_n$ in the order in which they appear along
the cycle. Let u be any node in B in layer j and I the independent set it corresponds to.
Four cases are possible and will define E(u), implying the width of 4:
$$E(u) = \begin{cases} \{v_j, v_{j+1}, \dots, v_n\} & v_1, v_{j-1} \notin I \\ \{v_j, v_{j+1}, \dots, v_{n-1}\} & v_1 \in I,\ v_{j-1} \notin I \\ \{v_{j+1}, \dots, v_n\} & v_1 \notin I,\ v_{j-1} \in I \\ \{v_{j+1}, \dots, v_{n-1}\} & v_1, v_{j-1} \in I. \end{cases}$$
Theorem 7.4. Let G = (V, E) be a complete bipartite graph, that is, a graph for which
the vertex set V can be partitioned into two sets $V_1, V_2$ so that
$$E = \{\{v_1, v_2\} : v_1 \in V_1,\ v_2 \in V_2\}.$$
There exist orderings of the vertices for which the width of the exact reduced BDD
will be 2.
Proof. Let $V_1, V_2$ be the two partitions of V providing the necessary conditions for
the graph being complete bipartite, and let the variables be ordered so that $V_1 = \{v_1, \dots, v_{|V_1|}\}$
and $V_2 = \{v_{|V_1|+1}, \dots, v_n\}$. Let B be the exact reduced BDD in this ordering, u any
node in B, and I the independent set induced by u.

Let j be the layer of u. If I does not contain any vertex in $V_1$, then $E(u) = \{v_j, \dots, v_n\}$.
Otherwise, I contains only vertices in the first shore. Therefore, if
$j \le |V_1|$, $E(u) = \{v_j, \dots, v_{|V_1|}\}$, and if $j > |V_1|$, $E(u) = \emptyset$. As these are the only
possibilities, the width of any of these layers is at most 2.
The above also implies, for example, that star graphs have the same width of 2.
We now consider interval graphs, that is, graphs that are isomorphic to the
intersection graph of a multiset of intervals on the real line. Such graphs have
vertex orderings $v_1, \dots, v_n$ for which each vertex $v_i$ is adjacent to the set of vertices
$v_{a_i}, v_{a_i+1}, \dots, v_{i-1}, v_{i+1}, \dots, v_{b_i}$ for some $a_i, b_i$. We call such an ordering an interval
ordering for G. Note that paths and cliques, for example, are contained in this class
of graphs. In addition, note that determining whether or not an interval ordering
exists (and finding such an ordering) can be done in time linear in n.
Theorem 7.5. For any interval graph, any interval ordering v1 , . . . , vn yields an
exact reduced BDD for which the width will be no larger than n, the number of
vertices in G.
Proof. Let $T_k = \{v_k, \dots, v_n\}$. It is shown here that, for any node u in the exact BDD
created using any interval ordering, $E(u) = T_k$ for some k; since there are at most n
such sets, $\omega(B) \le n$.

Each node u in layer j corresponds to some independent set in G. Fix u and let
$\tilde{V} \subseteq \{v_1, \dots, v_{j-1}\}$ be the independent set in G that it corresponds to. Let $\tilde{b}$ be the
maximum right limit of the intervals corresponding to the vertices in $\tilde{V}$. Let $\tilde{a}$ be the
minimum left limit, among the intervals corresponding to the vertices in $\{v_j, \dots, v_n\}$,
that is larger than $\tilde{b}$, and let k be the index of the vertex with this limit (i.e.,
for which $a_k = \tilde{a}$). Then $E(u) = T_k$, and since u was arbitrary, the theorem follows, and
$\omega(B) \le n$.
Theorem 7.6. Let G = (V, E) be a tree. Then, there exists an ordering of the vertices
for which the width of the exact reduced BDD will be no larger than n, the number
of vertices in G.
Proof. We proceed by induction on n. For the base case, a tree with 2 vertices is a
path, which we already know has width 2. Now let T be a tree on n vertices. Any
tree on n vertices contains a vertex v for which the connected components $C_1, \dots, C_k$
created upon deleting v from T have sizes $|C_i| \le n/2$ [103]. Each of these connected
components is a tree with at most n/2 vertices, so by induction, there exists an
ordering of the vertices on each component $C_i$ for which the resulting BDD $B_i$ has
width $\omega(B_i) \le n/2$. For component $C_i$, let $v^i_1, \dots, v^i_{|C_i|}$ be an ordering achieving
this width.

Let the final ordering of the vertices in T be $v^1_1, \dots, v^1_{|C_1|}, v^2_1, \dots, v^k_{|C_k|}, v$, which we
use to create the BDD B for the set of independent sets in T. Consider layer $\ell \le n-1$
of B corresponding to vertex $v^i_j$. We claim that the only possible states in $S(\ell)$
are $s \cup C_{i+1} \cup \dots \cup C_k$ and $s \cup C_{i+1} \cup \dots \cup C_k \cup \{v\}$, for $s \in S_i(j)$, where $S_i(j)$ is
the set of states in BDD $B_i$ in layer j. Take any independent set
$I \subseteq \{v^1_1, \dots, v^1_{|C_1|}, v^2_1, \dots, v^i_{j-1}\}$. All vertices in I are independent of the vertices in
$C_{i+1}, \dots, C_k$, and so $E(I \mid \{v^i_j, \dots, v^i_{|C_i|}\} \cup C_{i+1} \cup \dots \cup C_k) \supseteq C_{i+1} \cup \dots \cup C_k$. Now,
Theorem 7.7. Let G = (V, E) be any graph. There exists an ordering of the vertices
for which ω j ≤ Fj+1 , where Fk is the kth Fibonacci number.
Theorem 7.7 provides a bound on the width of the exact BDD for any graph.
The importance of this theorem goes further than the actual bound provided on
the width of the exact BDD for any graph. First, it illuminates another connection
between the Fibonacci numbers and the family of independent sets of a graph, as
investigated throughout the graph theory literature (see, for example, [38, 67, 57,
159]). In addition to this theoretical consideration, the underlying principles in the
proof provide insight into what heuristic ordering for the vertices in a graph could
lead to BDDs with small width. The ordering inspired by the underlying principle
in the proof yields strong relaxation BDDs.
Case 1: y j+1 is the last vertex in the path that it belongs to. Take any node
u ∈ L j+1 and its associated state E(u). Including or not including y j+1 results in
state E(u)\{y j+1} since y j+1 is independent of all vertices yi , i ≥ j + 2. Therefore,
ω j+2 ≤ ω j+1 since each arc directed out of u will be directed at the same node, even
if the zero-arc and the one-arc are present. And, since in any BDD ωk ≤ 2 · ωk−1 ,
we have ω j+3 ≤ 2 · ω j+2 ≤ 2 · ω j+1 < ω j + 2 · ω j+1.
Case 2: y j+1 is the first vertex in the path that it belongs to. In this case, y j must
be the last vertex in the path that it belongs to. By the reasoning in case 1, it follows
that ω j+1 ≤ ω j . In addition, we can assume that y j+1 is not the last vertex in the path
that it belongs to because then we are in case 1. Therefore, y j+2 is in the same path
as y j+1 in P. Consider L j+2 . In the worst case, each node in L j+1 has y j+1 in its state
so that ω j+2 = 2 · ω j+1 . But, any node arising from a one-arc will not have y j+2 in
its state. Therefore, there are at most ω j+1 nodes in L j+2 with y j+2 in their states and
at most ω j+1 nodes in L j+2 without y j+2 in their states. For the set of nodes without
y j+2 in their states, we cannot make a one-arc, showing that ω j+3 ≤ ω j+2 + ω j+1 .
Therefore, we have ω j+3 ≤ ω j+1 + ω j+2 ≤ 3 · ω j+1 ≤ ω j + 2 · ω j+1.
Case 3: y j+1 is neither first nor last in the path that it belongs to. As in case 2,
ω j+1 ≤ 2 · ω j , with at most ω j nodes on layer L j+1 having y j+1 in their corresponding
state labels. Therefore, L j+2 will have at most ω j more nodes in it than layer L j+1 .
As the same holds for layer L j+3 , in that it will have at most ω j+1 more nodes in it
than layer L j+2 , we have ω j+3 ≤ ω j+2 + ω j+1 ≤ ω j+1 + ω j + ω j+1 = ω j + 2 · ω j+1 ,
as desired, finishing the proof.
We note here that, using instance C2000.9 from the DIMACS benchmark set,1
a maximal path decomposition ordering of the vertices yields widths approximately
equal to the Fibonacci numbers, as seen in Table 7.1.
Table 7.1 Layer widths for C2000.9 under a maximal path decomposition ordering, compared
with the Fibonacci numbers.

j            1  2  3  4  5   6   7   8   9  10  11  12  13  14  15 · · ·
ω j          1  2  3  5  8  13  21  31  52  65 117 182 299 481 624 · · ·
Fib( j + 1)  1  2  3  5  8  13  21  34  55  89 144 233 377 610 987 · · ·
In this section we provide heuristic orderings for the vertices to be used during the
top-down compilation of relaxation BDDs. The orderings are suggested based on the
theorems proved in the previous sections, with the idea that, by examining simple
structured problems, we can gain intuition as to what is controlling the width of the
exact BDD for general graphs, hopefully yielding tighter upper bounds.
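Throughout this section it helps to keep the compilation procedure itself in mind. Below is a minimal sketch of top-down compilation for the maximum independent set problem, in which a node's state is the set of vertices still eligible to join the independent set; the function and variable names are our own, not the book's:

```python
def compile_misp_bdd(n, edges, order):
    """Top-down compile the exact BDD for the independent sets of a graph
    on vertices 0..n-1, branching on vertices in the given order.
    A node's state is the frozenset of still-eligible vertices.
    Returns the list of layer widths."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    layer = {frozenset(range(n))}        # root: every vertex still eligible
    widths = []
    for i, v in enumerate(order):
        widths.append(len(layer))
        rest = set(order[i + 1:])         # vertices not yet branched on
        nxt = set()
        for state in layer:
            nxt.add(frozenset((state - {v}) & rest))             # zero-arc
            if v in state:                                        # one-arc
                nxt.add(frozenset((state - {v} - adj[v]) & rest))
        layer = nxt        # identical states merge, giving a reduced layer
    return widths
```

On a path, for instance, an end-to-end ordering keeps every layer at width at most 2, matching the width bound for paths quoted in the proof of Theorem 7.6.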
Maximal Path Decomposition (MPD). As shown in Theorem 7.7, such an ordering
yields an exact BDD with width bounded by the Fibonacci numbers, yielding a
theoretical worst-case bound on the width for any instance. This ordering can be
precomputed in worst-case time complexity O(|V | + |E|). We note that different
maximal path decompositions may yield different-sized BDDs.
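A maximal path decomposition ordering can be computed greedily: repeatedly start a path at an unvisited vertex and extend it through unvisited neighbors until stuck. A sketch under our own naming, not the book's code:

```python
def mpd_order(n, edges):
    """Greedy maximal path decomposition ordering of vertices 0..n-1.
    Each vertex becomes the current path endpoint once and each
    adjacency list is scanned once, giving O(|V| + |E|) overall."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    visited, order = set(), []
    for start in range(n):
        if start in visited:
            continue
        v = start
        while v is not None:              # extend the current maximal path
            visited.add(v)
            order.append(v)
            v = next((u for u in adj[v] if u not in visited), None)
    return order
```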
Minimum Number of States (MIN). In this ordering, we select the next vertex in
the BDD to be the vertex that appears in the fewest states of the layer we are
currently building. The driving force behind the proof of Theorem 7.7 is that when
constructing a layer, if a vertex does not belong to the state of a node on a previous
layer, we cannot include this vertex, i.e., we cannot add a one-arc, only the zero-arc.
This suggests that selecting a variable appearing the fewest number of times in the
1 http://dimacs.rutgers.edu/Challenges/
7.4 Experimental Results 131
states on a layer will yield a small-width BDD. The worst-case time complexity to
perform this selection is O(W |V |) per layer.
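The MIN selection step can be sketched as follows, given the states of the layer under construction (a hypothetical helper, our naming):

```python
def min_ordering_step(layer_states, unchosen):
    """Pick the not-yet-chosen vertex appearing in the fewest node
    states of the current layer; O(W * |V|) work per layer."""
    counts = {v: 0 for v in unchosen}
    for state in layer_states:
        for v in state:
            if v in counts:
                counts[v] += 1
    # fewest occurrences first, ties broken by vertex index
    return min(unchosen, key=lambda v: (counts[v], v))
```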
k-Look Ahead Ordering (kLA). This ordering can be employed for any binary
optimization problem. In 1LA, after selecting the first j vertices and constructing the
top j + 1 layers, the next chosen vertex is the one that yields the smallest width for
layer j + 2 if it were selected next. This procedure can be generalized to arbitrary
k < n by considering subsets of the yet-unselected vertices. The worst-case running
time for selecting a vertex can be shown to be O(n^k · W · |V|^2 · log |W|) per layer.
For general k we can proceed as follows. Let X = {x1 , . . . , xn }. We begin by
selecting every possible set of k variables from X . For each set S, we build the
exact BDD using any ordering of the variables in S, yielding an exact BDD up
to layer k + 1. We note that it suffices to just consider sets of variables as opposed
to permutations of variables because constructing a partial exact BDD using any
ordering in S would yield the same number of states (and hence the same width) for
layer k + 1. We then select the set which yields the fewest number of nodes in the
resulting partially constructed BDD, using the variable in S that yields the smallest
width of layer 2 as the first variable in the final ordering.
Continuing in this fashion, for j ≥ 2, we select every set of the unselected
variables of size min{k, n − j} and construct the exact BDD as if these were the next
selected vertices. For the set S that achieves the minimum width of layer L j+k+1 ,
we choose the variable in S that minimizes the width if it were to be selected as the
next vertex, and continue until all layers are built. The worst-case running time for
selecting a vertex can be shown to be O(n^k · W · |V|^2 · log |W|) per layer.
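For k = 1 the look-ahead amounts to tentatively branching on each candidate vertex and keeping the one that gives the smallest next layer. A sketch using the independent-set states from before (our naming, not the book's code):

```python
def next_layer_width(layer_states, v, rest, adj):
    """Width of the next layer if vertex v were branched on next."""
    nxt = set()
    for state in layer_states:
        nxt.add(frozenset((state - {v}) & rest))             # zero-arc
        if v in state:                                        # one-arc
            nxt.add(frozenset((state - {v} - adj[v]) & rest))
    return len(nxt)

def one_look_ahead(layer_states, unchosen, adj):
    """1LA: choose the vertex whose selection minimizes the width of
    the next layer, ties broken by vertex index."""
    return min(unchosen,
               key=lambda v: (next_layer_width(layer_states, v,
                                               unchosen - {v}, adj), v))
```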
7.4 Experimental Results

The purpose of the first set of experiments is to demonstrate empirically that variable
orderings potentially play a key role in the width of exact BDDs representing com-
binatorial optimization problems. To this end, we have selected a particular graph
structure, namely trees, for which we can define an ordering yielding a polynomial
bound on its width (Theorem 7.6). We then compare the ordering that provides this
bound with a set of randomly generated orderings. We also compare with the MPD
heuristic, which has a known bound for general graphs according to Theorem 7.7.
The trees were generated from the benchmark problems C125.9, keller4,
c-fat100-1, p_hat300-1, brock200_1, and san200_0.7_1 by selecting
5 random trees, each on 50 vertices, from these graphs. The tree-specific ordering
discussed in Theorem 7.6 is referred to as the CV ordering (due to the computation
of cut vertices in the corresponding proof). We generated exact BDDs using 100
uniform-random orderings for each instance, and report the minimum, average, and
maximum obtained widths.
The results are shown in Table 7.2. In all cases, none of the 100 random orderings
yielded exact BDDs with width smaller than the ones generated from the CV
or MPD orderings. Moreover, the average was consistently more than an order
of magnitude worse than either of the structured orderings. This confirms that
investigating variable orderings can have a substantial effect on the width of the
exact BDDs produced for independent set problems. In addition, we see that,
across all instances, the CV ordering, which is specific to trees, outperforms the
MPD ordering that can be applied to general graphs, suggesting that investigating
orderings specific to particular classes of instances can also have a positive impact
on the width of exact BDDs.
The second set of experiments aims at providing empirical evidence for the
hypothesis that a problem instance with a smaller exact BDD results in a relaxation BDD
that yields a tighter bound. The instances in this test were generated as follows:
We first selected five instances from the DIMACS benchmark: brock200_1,
gen200_p.0.9_55, keller4, p_hat300-2, and san200_0.7_1. Then, we
uniformly at random extracted 5 connected induced subgraphs with 50 vertices for
each instance, which is approximately the largest graph size for which the exact
BDD can be built within our memory limits.
The tests are described next. For each instance and all orderings MPD, MIN,
random, and 1LA, we collected the width of the exact BDD and the bound obtained
by a relaxation BDD with a maximum width of 10 (the average over 100 orderings
for the random procedure). This corresponds to sampling different exact BDD
widths and analyzing their respective bounds, since distinct variable orderings may
yield BDDs with very different exact widths.
Figure 7.3 presents a scatter plot of the derived upper bound as a function of
the exact widths in log-scale, also separated by the problem class from which the
instance was generated. Analyzing each class separately, we observe that the bounds
and width increase proportionally, reinforcing our hypothesis. In particular, this
proportion tends to be somewhat constant, that is, the points tend to a linear curve for
each class. We notice that this shape has different slopes according to the problem
class, hence indicating that the effect of the width might be more significant for
certain instances.
In Fig. 7.4 we plot the bound as a function of the exact width for a single
random instance extracted from san200 0.7 1. In this particular case, we applied
a procedure that generated 1000 exact BDDs with a large range of widths: the
minimum observed BDD width was 151 and the maximum was 27,684, and the
widths were approximately uniformly distributed in this interval. We then computed
the corresponding upper bounds for a relaxed BDD, constructed using the orderings
described above, with width 10. The width is given in a log-scale. The figure also
shows a strong correlation between the width and the obtained bound, analogous to
Fig. 7.3 Upper bound of the relaxation BDD (maximum width 10) vs. exact BDD width, in log
scale, for instances derived from brock200_1, gen200_p.0.9_55, keller4, p_hat300-2, and
san200_0.7_1.
Fig. 7.4 Bound of relaxation BDD vs. exact BDD width for san200_0.7_1.
the previous set of experiments. A similar behavior is obtained if the same chart is
plotted for other instances.
We now report the upper bound provided by the relaxation BDD for the original
benchmark set, considering all suggested heuristic orderings, for maximum widths
100, 500, and 1000. In addition, we generated 100 orderings uniformly at random,
denoted here by RAND, for which the reported bound is the average over the 100
generated orderings. The average compilation
time for maximum width 100, 500, and 1000 was 0.21, 1.49, and 3.01 seconds,
respectively, for the MIN ordering (which was similar to RAND and MPD), while
the average time for maximum width 100, 500, and 1000 was 65.01, 318.68, and
659.02 seconds, respectively, for the 1LA ordering. For comparison purposes, we have also
included the upper bound obtained by considering the IP formulation of the MISP,
since this corresponds to a well-known bounding technique for general domains. We
ran these instances with CPLEX 12.2 with default settings and took the resulting
bound obtained after the root node was computed. We imposed a time limit of 60
seconds so that the results would be comparable to the MIN ordering with width 1000,
since the longest time to create any relaxation BDD with these parameters was for
C4000.5, which took 50.42 seconds.
The results are presented in Table 7.3. We report for each instance the optimal
or the best known feasible solution and the bounds, where CPLEX is the bound
obtained by the root node relaxation using CPLEX (the notation 1.00E+75 indicates
that a bound was not obtained in the 60-second time limit). By first comparing the
results obtained between orderings, we see that the MIN ordering and the general-
purpose 1LA heuristic provide the best bounds for most instances. We highlight here
that MIN and 1LA were the heuristics that provided the smallest BDD widths for
the instances tested in Section 7.4.2. We note that MIN generates BDDs on average
an order of magnitude faster than 1LA.
To compare the obtained bounds with CPLEX, we consider the relative bound
measure, which is given by (upper bound/optimum). The average relative bound
for CPLEX (omitting the instances for which CPLEX was unable to provide a
bound) is given by 3.85, while for MIN and 1LA it is given by 2.34 and 2.32,
respectively, for a width of 100, and 1.92 and 1.90, respectively, for a width of
1000 (the averages are not significantly different at the 5% level between MIN
and 1LA). The average relative bound for RAND was 5.51 and 4.25 for widths
of 100 and 1000, respectively. This indicates that variable orderings are crucial to
obtain tighter and relevant bounds, being particularly significant for larger instances
when comparing with CPLEX, explaining the smaller average relative bound. We
further observe that, since times were very small for the structured heuristics, the
bounds obtained here can be improved using the general-purpose bound-improving
procedures in [28].
Table 7.3 Upper bounds for the benchmark set: the MIN, MAX, RAND, and 1LA orderings under
maximum widths 100, 500, and 1000, followed by the CPLEX root-node bound (1-minute limit)
and the MIN bound.

Instance OPT | MIN MAX RAND 1LA (width 100) | MIN MAX RAND 1LA (width 500) | MIN MAX RAND 1LA (width 1000) | CPLEX(1 min.) | MIN
C1000.9.clq 68 261 419 585.42 259 244 394 528.25 241 240 384 506.63 238 221.78 240
C125.9.clq 34 46 55 71.68 44 45 52 64.51 42 43 50 61.78 41 41.2846 43
C2000.5.clq 16 153 353 368.34 152 121 249 252.27 120 110 218 218 110 1.00E+75 110
C2000.9.clq 77 480 829 1170.91 479 447 788 1055.26 447 436 767 1012.4 433 1.00E+75 436
C250.9.clq 44 80 107 144.84 78 74 99 130.46 73 72 98 125.21 72 70.9322 72
C4000.5.clq 18 281 708 736.31 280 223 497 504.46 223 202 429 435.31 203 1.00E+75 202
C500.9.clq 57 142 215 291.48 142 134 203 262.57 133 132 198 251.8 131 123.956 132
gen200_p0.9_44.clq 44 62 84 115.69 62 61 79 103.98 59 59 78 99.78 56 44 59
gen200_p0.9_55.clq 55 67 88 116.39 65 63 84 104.88 62 61 81 100.57 59 55 61
gen400_p0.9_55.clq 55 100 168 233.15 100 99 161 210.21 96 94 156 201.84 94 55 94
gen400_p0.9_65.clq 65 112 168 233.63 110 105 161 210.55 105 103 159 202.11 101 65 103
gen400_p0.9_75.clq 75 118 170 234.23 118 109 164 211.2 109 108 158 202.73 105 75 108
brock200 1.clq 21 42 64 72.12 41 36 54 58.61 36 34 50 54.01 35 38.9817 34
brock200 2.clq 12 22 35 35.6 22 17 24 24.68 18 16 22 21.69 16 22.3764 16
brock200 3.clq 15 28 48 48.87 29 24 36 36.22 25 23 33 32.39 23 28.3765 23
brock200 4.clq 17 32 53 56.61 32 29 42 43.32 27 26 37 39.12 25 31.5437 26
brock400 1.clq 27 72 127 145.81 71 63 108 118.75 63 60 102 109.32 61 67.2201 60
brock400 2.clq 29 75 128 147.35 72 63 107 119.47 61 61 101 110.16 60 67.9351 61
brock400 3.clq 31 72 127 146.19 73 64 109 118.63 64 60 102 109.12 60 67.4939 60
brock400 4.clq 33 70 129 146.43 71 63 110 119.54 63 63 106 109.59 61 67.3132 63
brock800 1.clq 23 99 204 222.01 100 85 160 168.39 86 79 145 151.21 78 136.103 79
brock800 2.clq 24 101 201 224.38 100 86 162 170.65 85 79 145 153.29 79 136.538 79
brock800 3.clq 25 101 203 222.61 100 84 164 169.05 84 81 149 151.31 79 130.832 81
brock800 4.clq 26 101 205 223.41 100 84 161 169.81 84 80 145 152.66 78 132.696 80
c-fat200-1.clq 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
c-fat200-2.clq 24 24 24 24 24 24 24 24 24 24 24 24 24 24 24
c-fat200-5.clq 58 58 58 58 58 58 58 58 58 58 58 58 58 61.6953 58
c-fat500-1.clq 14 14 15 16.62 14 14 14 14 14 14 14 14 14 230.513 14
c-fat500-10.clq 126 126 126 126 126 126 126 126 126 126 126 126 126 246 126
c-fat500-2.clq 26 26 26 26 26 26 26 26 26 26 26 26 26 240 26
c-fat500-5.clq 64 64 64 64 64 64 64 64 64 64 64 64 64 244.5 64
hamming10-2.clq 512 512 512 892.69 515 512 512 871.68 512 512 512 862.99 512 512 512
hamming10-4.clq 40 106 91 456.63 105 96 76 385.13 93 79 72 359.76 79 206.047 79
hamming6-2.clq 32 32 32 37.01 32 32 32 34.03 32 32 32 33.28 32 32 32
hamming6-4.clq 4 4 4 5.98 4 4 4 4 4 4 4 4 4 5.33333 4
hamming8-2.clq 128 128 128 194.42 128 128 128 184.51 128 128 128 180.71 128 128 128
hamming8-4.clq 16 20 21 62.23 19 18 18 45.66 18 17 17 40.56 17 16 17
johnson16-2-4.clq 8 11 11 38.75 11 9 9 29.24 9 8 8 25.64 8 8 8
johnson32-2-4.clq 16 40 35 250.07 42 38 29 215.06 39 35 25 202.36 40 16 35
johnson8-2-4.clq 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
johnson8-4-4.clq 14 14 15 24.57 14 14 14 19.82 14 14 14 18.54 14 14 14
keller4.clq 11 19 22 43.38 18 16 17 31.24 16 15 16 27.54 15 14.75 15
keller5.clq 27 58 98 280.74 59 56 77 225.75 55 48 72 207.08 49 32.875 48
keller6.clq 59 171 417 1503.26 174 142 332 1277.98 144 123 307 1197.76 125 1.00E+75 123
MANN_a27.clq 126 142 138 327.2 135 140 137 318.93 137 139 137 315.25 136 133.331 139
MANN_a45.clq 345 371 365 954.51 366 368 362 942.45 363 368 362 937.06 365 357.162 368
MANN_a81.clq 1100 1154 1143 3186.21 1141 1150 1143 3166.06 1143 1148 1143 3158.78 1141 1131.82 1148
MANN_a9.clq 16 18 18 27.21 17 16 16 23.9 16 16 16 22.88 16 17 16
p_hat1000-1.clq 10 47 86 88.73 48 35 52 52.71 36 31 43 43.37 31 413.5 31
p_hat1000-2.clq 46 130 210 225.57 129 116 171 178.1 112 112 159 163.47 108 376.5 112
p_hat1000-3.clq 68 202 324 383.76 197 187 286 322.62 179 179 272 302.07 175 245.674 179
p_hat1500-1.clq 12 68 136 139.02 68 51 83 83.08 51 46 69 68.33 45 1.00E+75 46
p_hat1500-2.clq 65 199 344 357.01 193 176 285 286.03 174 168 267 263.95 163 1.00E+75 168
p_hat1500-3.clq 94 298 511 594.04 296 277 452 502.22 270 272 433 470.91 266 1.00E+75 272
p_hat300-1.clq 8 17 27 26.05 18 14 16 15.89 14 12 13 13.39 12 18.2278 12
p_hat300-2.clq 25 48 64 66.46 45 42 51 52.29 40 40 48 47.83 39 35.2878 40
p_hat300-3.clq 36 70 99 114.66 67 65 89 95.93 61 62 84 89.86 60 55.2598 62
p_hat500-1.clq 9 28 45 45.33 27 21 28 27.3 21 18 23 22.7 19 158 18
p_hat500-2.clq 36 77 112 116.55 72 69 92 92.8 64 66 84 85.54 63 160.25 66
p_hat500-3.clq 50 111 172 195.67 109 106 155 165.35 102 104 147 154.88 99 90.7331 104
p_hat700-1.clq 11 36 62 63.27 36 27 39 37.83 27 24 31 31.33 24 272.5 24
p_hat700-2.clq 44 101 155 163.03 99 90 128 130.39 88 85 118 120.19 83 272.5 85
p_hat700-3.clq 62 153 234 272.83 147 142 208 230.14 141 137 198 215.93 134 160.333 137
san1000.clq 15 28 184 202.02 26 21 101 104.09 19 19 78 79.84 19 462.5 19
san200_0.7_1.clq 30 32 66 73.67 31 30 57 60.3 30 30 52 55.37 30 30 30
san200_0.7_2.clq 18 23 58 71.76 21 20 48 56.2 20 19 46 50.23 18 18 19
san200_0.9_1.clq 70 71 86 118.89 70 70 82 108.56 70 70 81 105.13 70 70 70
san200_0.9_2.clq 60 68 86 116.48 64 64 83 105.39 60 60 81 101.05 60 60 60
san200_0.9_3.clq 44 57 84 115 54 55 78 103.23 53 51 77 99 52 44 51
san400_0.5_1.clq 13 17 66 69.02 18 14 35 35.6 14 13 28 28.31 13 13 13
san400_0.7_1.clq 40 50 142 160.35 51 46 127 136.08 43 42 119 126.86 41 40 42
san400_0.7_2.clq 30 44 129 147.55 45 38 108 119.96 39 37 103 109.84 35 30 37
san400_0.7_3.clq 22 36 118 137.72 38 29 98 108.29 31 29 91 97.98 29 22 29
san400_0.9_1.clq 100 117 175 236.22 118 109 169 214.05 108 108 164 205.73 108 100 108
sanr200_0.7.clq 18 34 58 63 36 31 46 49.56 32 30 44 45.18 29 34.5339 30
sanr200_0.9.clq 42 67 86 114.78 66 63 83 103.25 60 61 80 98.89 61 59.5252 61
sanr400_0.5.clq 13 40 70 73.32 39 33 50 50.5 31 29 45 43.73 29 43.1544 29
sanr400_0.7.clq 21 64 115 128.44 64 55 96 101.06 54 52 89 91.69 52 62.078 52
Chapter 8
Recursive Modeling
Abstract This chapter focuses on the type of recursive modeling that is required
for solution by decision diagrams. It presents a formal development that highlights
how solution by decision diagrams differs from traditional enumeration of the state
space. It illustrates the versatility of recursive modeling with examples: single-
facility scheduling, scheduling with sequence-dependent setup times, and minimum
bandwidth problems. It shows how to represent state-dependent costs with canonical
arc costs in a decision diagram, a technique that can sometimes greatly simplify the
recursion, as illustrated by a textbook inventory management problem. It concludes
with an extension to nonserial recursive modeling and nonserial decision diagrams.
8.1 Introduction
The optimization and constraint solving communities have developed two primary
modeling styles, one based on constraints and one on recursive formulations.
Constraint-based modeling is the norm in mathematical programming, where con-
straints almost invariably take the form of inequalities or equations, as well as
in constraint programming, which draws from a collection of high-level global
constraints. Recursive modeling, on the other hand, characterizes dynamic program-
ming and Markov decision processes.
Both modeling paradigms have seen countless successful applications, but it
is hard to deny that constraint-based modeling is the dominant one. Recursive
modeling is hampered by two perennial weaknesses: it frequently results in state
spaces that grow exponentially (the “curse of dimensionality”), and there are no
general-purpose solvers for recursive models, as there are for mathematical pro-
gramming and constraint programming models. An extensive literature shows how
to overcome the curse of dimensionality in many applications, using state space
relaxation, approximate dynamic programming, and the like. Yet these require
highly tailored solution algorithms, and many other recursive formulations remain
intractable. As a result, the major inherent advantage of recursive modeling too often
goes unexploited: its ability to model a vast range of feasible sets and objective
functions, with no need for linear, convex, or closed-form expressions.
Solution methods based on decision diagrams can accommodate both types of
modeling, as illustrated throughout this book. However, decision diagrams have
a special affinity to recursive modeling due to their close relationship with state
transition graphs in deterministic dynamic programming. Furthermore, they offer
the prospect of addressing the two weaknesses of recursive modeling in a novel
fashion. The use of relaxed decision diagrams allows recursive models to be solved
by branch-and-bound methods rather than by enumerating the state space, as de-
scribed in Chapter 6. This, in turn, may allow the development of general-purpose
branch-and-bound solvers that are analogous to mixed integer programming solvers.
Decision diagrams may therefore help to unlock the unrealized potential of recursive
modeling.
Recursive modeling may seem restrictive at first, because it requires that the
entire problem be formulated in a sequential, Markovian fashion. Each stage of the
recursion can depend only on the previous stage. This contrasts with a constraint-
based formulation, in which constraints can be added at will, with no need for a
particular overall structure. The only requirement is that the individual constraints
be in recognizable form, as for example linear inequalities.
However, once an overall recursive structure is identified, recursive modeling
provides enormous flexibility. Any objective function or feasible set that can be
expressed in terms of the current state and control can be modeled, either in closed
form or by subroutine call. A wide range of problems naturally have this structure,
and these frequently have no convenient mathematical programming formulation.
Many other problems can, with a little ingenuity, be put in the required form.
A particular strength of recursive models is that any possible objective function
over finite domains can be modeled with state-dependent costs, which correspond to
arc costs on the corresponding decision diagram. In fact, a single objective function
can be realized with several different sets of arc costs, one of which can be described
as canonical. A straightforward conversion of arc costs to canonical costs can result
8.2 General Form of a Recursive Model
We first recall the general form of a recursive model for decision diagrams that was
set out in Chapters 3 and 4. The model consists of an exact formulation and a node
merger rule for creating a relaxed decision diagram.
The exact formulation consists of control variables (or controls, for short), a state
space, and transition functions. The controls are x1 , . . . , xn and have finite domains
D(x1 ), . . . , D(xn ), respectively. The state space S is the union of sets S1 , . . . , Sn+1
corresponding to stages of the recursion. The initial set S1 contains only an initial
state r̂, and the final set Sn+1 contains one or more terminal states. In addition, each
Si contains an infeasible state 0̂. For each stage i = 1, . . . , n there is a state transition
function ti : Si × D(xi ) → Si+1 , where ti (si , xi ) specifies the result of applying control
xi in state si . The infeasible state always transitions to itself, so that ti (0̂, xi ) = 0̂
for all xi ∈ D(xi ). There are also immediate cost functions hi : Si × D(xi ) → R for
i = 1, . . . , n, where hi (si , xi ) is the immediate cost of applying control xi in state si .
The optimization problem is to identify controls that minimize

F(x1 , . . . , xn ) = ∑_{i=1}^{n} hi (si , xi )                      (8.1)

subject to

si+1 = ti (si , xi ), xi ∈ D(xi ), i = 1, . . . , n,
si ∈ Si , i = 1, . . . , n + 1.                                      (8.2)
The summation in the objective function (8.1) can be replaced by another operator,
such as a product, maximum, or minimum.
The model is solved by backward induction: for i = n, . . . , 1 and each state si ∈ Si ,

gi (si ) = min { hi (si , xi ) + gi+1 (ti (si , xi )) : xi ∈ D(xi ) },      (8.3)

with gn+1 equal to zero at the terminal states, where gi (si ) is the cost-to-go at state si .
The optimal value is g1 (r̂). The sum in (8.3)
can again be replaced by another operator. Backward induction requires, of course,
that the elements of the state space S be enumerated, frequently a task of exponential
complexity. Optimal solutions are recovered in a forward pass by letting Xi (si ) be
the set of controls xi ∈ D(xi ) that achieve the minimum in (8.3) for state si . Then
an optimal solution is any sequence of controls (x¯1 , . . . , x¯n ) such that x¯i ∈ Xi (s¯i ) for
i = 1, . . . , n, s¯1 = r̂, and s¯i+1 = ti (s¯i , x¯i ) for i = 1, . . . , n − 1.
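A minimal sketch of this backward induction follows; the interface (stage-indexed domains with transition and cost callables) is our own, and the infeasible state is simply a state that the terminal test rejects:

```python
def solve_recursion(n, domains, transition, cost, root, is_terminal):
    """Backward induction g_i(s) = min_x [ h_i(s,x) + g_{i+1}(t_i(s,x)) ].
    Explicitly enumerates every reachable state, which is precisely the
    potentially exponential step that relaxed decision diagrams avoid.
    Stages are 0-indexed here."""
    INF = float('inf')
    # forward pass: reachable states of each stage
    stages = [{root}]
    for i in range(n):
        stages.append({transition(i, s, x)
                       for s in stages[i] for x in domains[i]})
    # backward pass: cost-to-go, zero at terminal states of the last stage
    g = {s: (0.0 if is_terminal(s) else INF) for s in stages[n]}
    for i in range(n - 1, -1, -1):
        g = {s: min(cost(i, s, x) + g[transition(i, s, x)]
                    for x in domains[i])
             for s in stages[i]}
    return g[root]
```

For example, with the state as a running total, negated unit profits as costs, and a capacity-1 terminal test, the recursion selects exactly one of two items.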
The state transition graph for a recursive model is defined recursively. The initial
state r̂ corresponds to a node of the graph, and for every state si that corresponds
to a node of the graph and every xi ∈ D(xi ), there is an arc from that node to a
node corresponding to state ti (si , xi ). The arc has length equal to the immediate cost
hi (si , xi ), and a shortest path from r̂ to a terminal state corresponds to an optimal
solution.
The state transition graph can be regarded as a decision diagram, after some
minor adjustments: remove nodes corresponding to the infeasible state, and add an
arc from each terminal state to a terminal node. The result is a decision diagram in
which each layer (except the terminal layer) corresponds to a stage of the recursion.
A relaxed decision diagram is created by a relaxation scheme (⊕, Γ ) that merges
states associated with nodes of the original decision diagram. The operator ⊕ maps
a set M of states corresponding to nodes on layer i to a single state ⊕(M). The
function Γ maps the immediate cost v = hi−1 (si−1 , xi−1 ) of any control xi−1 that
leads to state si ∈ M to a possibly altered cost ΓM (si , v). This results in modified
transition and cost functions t′i−1 , h′i−1 that are identical to the original functions
except that they reflect the redirected arcs and altered costs:

t′i−1 (si−1 , xi−1 ) = ⊕(M)   whenever ti−1 (si−1 , xi−1 ) ∈ M,
h′i−1 (si−1 , xi−1 ) = ΓM (ti−1 (si−1 , xi−1 ), hi−1 (si−1 , xi−1 )).
The relaxation scheme is valid if every feasible solution of the original recursion is
feasible in the modified recursion and has no greater cost. For this it suffices that
8.3 Examples
where fi is now interpreted as the earliest possible finish time of the most recent job.
The immediate costs are unchanged.
This recursive model readily accommodates any side constraint or objective
function that can be defined in terms of (Ji , fi ) and x j . For example, release times r j
for the jobs are accommodated by the slightly different transition function
ti ((Ji , fi ), j) =  (Ji ∪ { j}, max{r j , fi } + p j ),   if j ∉ Ji
                      0̂,                                   if j ∈ Ji
and immediate cost hi ((Ji , fi ), j) = (max{r j , fi } + p j − d j )+ . One can shut down the
machine for maintenance in the interval [a, b] by using the transition function
ti ((Ji , fi ), j) =  (Ji ∪ { j}, fi + p j + b − a),   if j ∉ Ji and fi ∈ [a − p j , b)
                      (Ji ∪ { j}, fi + p j ),          if j ∉ Ji and fi ∉ [a − p j , b)
                      0̂,                              if j ∈ Ji
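Both variants of the transition function can be written down directly; in the sketch below (our naming), the infeasible state 0̂ is represented by None:

```python
INFEASIBLE = None   # stands for the infeasible state 0-hat

def transition_release(state, j, p, r):
    """Transition with release times r_j: state = (set of scheduled
    jobs, finish time of the most recent job)."""
    J, f = state
    if j in J:
        return INFEASIBLE
    return (J | {j}, max(r[j], f) + p[j])

def transition_maintenance(state, j, p, a, b):
    """Transition with the machine down during [a, b): a job that would
    overlap the maintenance window finishes b - a later."""
    J, f = state
    if j in J:
        return INFEASIBLE
    if a - p[j] <= f < b:              # job j would collide with [a, b)
        return (J | {j}, f + p[j] + b - a)
    return (J | {j}, f + p[j])
```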
In addition, processing job j may require that certain components have already been
fabricated in the processing of previous jobs. We simply set ti (Ji , j) = 0̂ when the
jobs in Ji do not yield the necessary components. Such side constraints actually
make the problem easier by simplifying the decision diagram.
A wide variety of objective functions are also possible. For example, the cost of
processing job j may depend on which jobs have already been processed, perhaps
again due to common components. We can let the cost associated with control x j = j
be any desired function c j (Ji ), perhaps evaluated by a table lookup. Or the cost could
be an arbitrary function c j (( fi + p j − d j )+ ) of tardiness, such as a step function, or
a function of both Ji and fi .
Note that the finish time of job j is based on the previous job i that results in the
earliest finish time for job j. This ensures a valid relaxation. States are merged by
taking the intersection of the sets Ji , as before, and the union of the sets Li :
⊕(M) = ( ∩_{(Ji ,Li )∈M} Ji , ∪_{(Ji ,Li )∈M} Li ).                (8.6)
As in the previous section, the model accommodates any side constraint that can be
defined in terms of the current state and control.
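The merger rule (8.6) is a one-liner per component (a sketch, our naming):

```python
def merge(states):
    """Merge node states (J, L) as in (8.6): intersect the sets of
    scheduled jobs and union the sets of possible last jobs."""
    Js, Ls = zip(*states)
    return (frozenset.intersection(*Js), frozenset.union(*Ls))
```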
The famous traveling salesman problem results when deadlines are removed and
the objective is to minimize total travel time. The transition function (8.4) for the
exact recursion simplifies to
ti ((Ji , ℓi ), j) =  (Ji ∪ { j}, j),   if j ∉ Ji
                      0̂,               if j ∈ Ji

with immediate cost hi ((Ji , ℓi ), j) = pℓi j , the travel time from ℓi to j. This is the
classical dynamic programming
model for the problem, which is generally impractical to solve by backward induc-
tion because of the exponential state space. However, the relaxation scheme given
above allows a recursive model to be solved by branch and bound. The set Li of
pairs (ℓ, f ) becomes a set of jobs ℓ that could be the last job processed, and the
transition function (8.5) simplifies to
ti ((Ji , Li ), j) =  (Ji ∪ { j}, { j}),   if j ∉ Ji
                      0̂,                  if j ∈ Ji
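Solving the exact recursion over states (J_i, ℓ_i) by enumeration is the classical Held-Karp dynamic program. A memoized sketch (our own code), with the tour starting and ending at city 0:

```python
from functools import lru_cache

def tsp_recursion(n, dist):
    """Exact TSP recursion over states (J, last): the set of visited
    cities and the last city. The state space is exponential in n,
    which is why the relaxed-BDD branch and bound is of interest."""
    @lru_cache(maxsize=None)
    def g(J, last):                    # J: frozenset of visited cities
        if len(J) == n:
            return dist[last][0]       # close the tour back at city 0
        return min(dist[last][j] + g(J | {j}, j)
                   for j in range(n) if j not in J)
    return g(frozenset([0]), 0)
```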
∑_{(xi ,x j )∈E} |i − j| .
It is not obvious how to write integer programming models for these problems, and
[40] observes that no useful integer programming models are known.
A recursive model, however, can be formulated as follows. A state at stage i is a
tuple si = (si1 , . . . , sii ), where siℓ is interpreted as the vertex assigned to position ℓ.
This is a brute-force model in the sense that there is only one feasible path from the
root to each state, namely the path that sets (x1 , . . . , xi ) = (si1 , . . . , sii ). However, the
model will become more interesting when relaxed. The state transition function is
ti (si , j) =  (si1 , . . . , sii , j),   if j ∉ {si1 , . . . , sii }
               0̂,                       otherwise.
hi (si , j) = max { i + 1 − k : k = 1, . . . , i, (sik , j) ∈ E }

for the bandwidth objective, and

hi (si , j) = ∑ { i + 1 − k : k = 1, . . . , i, (sik , j) ∈ E }

for the sum objective above.
To compute the immediate cost, we let the lower-bound length LB jk (Si ) of an edge
( j, k) be the minimum length of ( j, k) over all positions that can be assigned to
vertex k given the current state Si . Thus the lower-bound length of ( j, k) is
1 The concepts of domain filtering and domain consistency in constraint programming are dis-
cussed in Chapter 9.
The immediate cost for the minimum bandwidth problem is now the maximum
lower-bound length of the edges connecting vertex j to a vertex in
Si1 ∪ · · · ∪ Sii:

    hi(Si, j) =  max_{k ∈ Si1 ∪ ··· ∪ Sii} LBjk(Si).
Table 8.1 (a) A small set covering problem. The dots indicate which elements belong to each set
i. (b) A nonseparable cost function for the problem. Values are shown only for feasible x.
 (a)                          (b)
         Set i                     x        F(x)
      1   2   3   4           (0,1,0,1)      6
  A   •   •                   (0,1,1,0)      7
  B   •       •   •           (0,1,1,1)      8
  C       •   •               (1,0,1,1)      5
  D       •       •           (1,1,0,0)      6
                              (1,1,0,1)      8
                              (1,1,1,0)      7
                              (1,1,1,1)      9
to arcs in a certain canonical fashion. In fact, a canonical cost assignment may allow
one to simplify the diagram substantially, resulting in a simpler recursive model
that can be solved more rapidly. This will be illustrated for a textbook inventory
management problem.
While separable cost functions are the easiest to model, a weighted decision
diagram (and by implication, a recursive model) can accommodate an arbitrary
objective function, separable or nonseparable. Consider, for example, the nonsep-
arable cost function shown in Table 8.1(b). This and any other cost function can be
represented in a branching tree by placing the cost of every solution on the arc that
leads to the corresponding leaf node, and a cost of zero on all other arcs, as shown
in Fig. 8.2.
This search tree becomes a weighted decision diagram if the leaf nodes are
superimposed to form a terminal node. Furthermore, the decision diagram can be
reduced. The reduction proceeds as for an unweighted decision diagram, namely
by superimposing isomorphic subdiagrams rooted at nodes in the same layer. A
subdiagram rooted at a given node is the portion of the diagram that contains all
paths between that node and the terminal node. In a weighted decision diagram,
subdiagrams are isomorphic only when corresponding arcs reflect the same cost as
well as the same control. This limits the amount of reduction that is possible.
However, it is frequently possible to achieve greater reduction when the arc costs
are canonical. An assignment of arc costs to a tree or decision diagram is canonical
if, for every layer Li with i ≥ 2 and for every node in that layer, the smallest arc cost
Fig. 8.1 (a) Decision diagram for the set covering problem in Table 8.1(a). Dashed arcs correspond
to setting xi = 0, and solid arcs to setting xi = 1. (b) Decision diagram showing arc costs for a
separable objective function. Unlabeled arcs have zero cost.
8.4 State-Dependent Costs 149
Fig. 8.2 Branching tree for the set covering problem in Table 8.1(a). Only feasible leaf nodes are
shown.
Fig. 8.3 Branching tree with canonical arc costs. Unlabeled arcs have zero cost.
leaving that node is a predefined value αi . In the simplest case αi = 0, but in some
applications it is convenient to allow other values.
A simple algorithm converts any set of arc costs to canonical costs. For each layer
Li , i = n, n − 1, . . ., 2, do the following: for each node u in layer Li , add αi − cmin to
the cost on each arc leaving u, where cmin is the minimum cost on arcs leaving u, and
add cmin − αi to each arc entering u. For example, the costs on the tree of Fig. 8.2
become the canonical costs shown in Fig. 8.3 if each αi = 0. This tree can, in turn,
be reduced to the decision diagram in Fig. 8.4(a). Note that this reduced diagram is
slightly larger than the reduced unweighted diagram in Fig. 8.1(a), which is to be
expected since costs must be matched before subdiagrams are superimposed.
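The conversion algorithm can be sketched directly in code. The arc-dictionary representation below is an assumption chosen for illustration, not the book's data structure; the procedure works for branching trees and, more generally, layered diagrams.

```python
def canonical_costs(arcs, layer_of, n, alpha=None):
    """Convert arbitrary arc costs to canonical costs.  arcs maps
    (tail, head) to a cost; layer_of maps each node to its layer index
    (root in layer 1).  For i = n, n-1, ..., 2 and each node u in layer
    L_i, the minimum cost c_min over arcs leaving u is shifted onto the
    arcs entering u, so the cheapest arc leaving u ends up with cost
    alpha_i.  Total path costs are preserved."""
    if alpha is None:
        alpha = {i: 0 for i in range(2, n + 1)}
    arcs = dict(arcs)
    for i in range(n, 1, -1):
        for u in {t for (t, _) in arcs if layer_of[t] == i}:
            out = [a for a in arcs if a[0] == u]
            c_min = min(arcs[a] for a in out)
            for a in out:                          # shift out of u ...
                arcs[a] += alpha[i] - c_min
            for a in [a for a in arcs if a[1] == u]:
                arcs[a] += c_min - alpha[i]        # ... onto arcs into u
    return arcs
```

Each node's shift conserves the cost of every root-to-terminal path through it, which is why the represented objective function is unchanged.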
The arc costs of Fig. 8.4(a) represent a state-dependent objective function (8.1).
For example, the state at the leftmost node in layer 4 is S4 = {2, 3}, and the
Fig. 8.4 (a) Weighted decision diagram with canonical arc costs for a nonseparable objective
function. (b) Canonical arc costs for a separable objective function. Unlabeled arcs have zero cost.
immediate costs for this state are the costs on the arcs leaving the node, namely
h4 (S4 , 1) = 1 and h4 (S4 , 0) = 0. In general, any possible objective function can be
represented by state-dependent costs because it can be represented by a weighted
decision diagram.
Furthermore, two uniqueness results can be proved [97]. There is a unique
canonical assignment of arc costs representing a given objective function, once
the offsets αi are fixed. In addition, the classical uniqueness theorem for reduced,
ordered diagrams [37] can be extended to weighted diagrams.
Theorem 8.2. A weighted decision diagram that is reduced when costs are ignored
remains reduced when its arc costs are converted to canonical costs.
To transform the costs to canonical costs, subtract hi si + (m − si )ci from the cost
on each arc (si , si+1 ), and add this amount to each arc coming into si . Then for any
period i, the arcs leaving any given node si have the same set of costs. Specifically,
realizing that xi represents si+1 , arc (si , si+1 ) has cost
Fig. 8.5 (a) State transition graph for a production and inventory management problem. (b)
Reduced state transition graph after converting costs to canonical costs.
and so depends only on the next state si+1 . These costs are canonical for the offsets
In any layer, the subdiagrams rooted at the nodes are isomorphic, and the decision
diagram can be reduced as in Fig. 8.5(b). There is now one state in each period
rather than m. If we call this state 1, the transition function is
    gi(1, xi) =  1,  if 0 ≤ xi ≤ m
                 0̂,  otherwise.
If x̄1, . . . , x̄n are the optimal controls, the resulting stock levels are given by si+1 = x̄i
and the production levels by x̄i − si + di.
The decision diagram is therefore reduced in size by a factor of m, and solution
of the problem becomes trivial. The optimal control xi is the one that minimizes the
immediate cost (8.7), which is
    xi =  0,  if ci + hi+1 ≥ ci+1
          m,  otherwise.
The optimal solution is therefore a “bang-bang” inventory policy that either empties
or fills the warehouse in the next period. The optimal production schedule is
    xi =  di − si,      if ci + hi+1 ≥ ci+1
          m + di − si,  otherwise.
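Under the stated assumptions (linear costs, saleable excess stock), the policy can be simulated as follows. The function and its interface are hypothetical, and it adds one further assumption: the warehouse ends empty in the final period.

```python
def bang_bang(c, h, d, m, s1=0):
    """Simulate the bang-bang policy.  c[i] and h[i] are unit production
    and holding costs, d[i] the demand in period i, m the warehouse
    capacity.  A period ends with stock 0 when c_i + h_{i+1} >= c_{i+1},
    i.e., producing now and holding is no cheaper than producing in the
    next period; otherwise it ends with stock m.  Negative production
    means selling excess stock.  The last period is assumed to end
    empty (an added assumption)."""
    n = len(d)
    stock = [s1]
    prod = []
    for i in range(n):
        empty = i + 1 >= n or c[i] + h[i + 1] >= c[i + 1]
        prod.append((0 if empty else m) + d[i] - stock[i])
        stock.append(0 if empty else m)
    return prod, stock
```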
This result relies on the fact that the unit production and holding costs are
linear, and excess inventory can be sold (xi < 0). If excess inventory cannot be
sold (or if the salvage value is unequal to the production cost), some reduction
of the decision diagram is still possible, because subdiagrams rooted at states
corresponding to lower inventory levels will be identical. If production and holding
costs are nonlinear, the decision diagram does not simplify in general.
Up to this point, only serial recursive models have been considered. That is, the
stages form a directed path in which each stage depends on the previous stage in a
Markovian fashion. Nonserial recursions can allow one to formulate a wider variety
of problems in recursive form, or to formulate a given problem using simpler states.
In a nonserial recursion, the “stages” form a tree rather than a path.
Nonserial dynamic programming was introduced into operations research more
than 40 years ago [29], even if it seems to have been largely forgotten in the field.
Essentially the same idea has surfaced in other contexts, including Bayesian net-
works [109], belief logics [142, 145], pseudo-Boolean optimization [52], location
theory [46], k-trees [8, 9], and bucket elimination [54].
The idea is best explained by example. Figure 8.6(a) shows a small set parti-
tioning problem. The goal is to select a minimum subcollection of the six sets that
         Set i
      1   2   3   4   5   6
  A   •       •       •
  B           •           •
  C   •   •       •
  D       •               •

 (a)                         (b)  [Dependency graph on x1, . . . , x6, with
                                  edges (x1, x2), (x1, x3), (x1, x4),
                                  (x1, x5), (x2, x4), (x2, x6), (x3, x5),
                                  (x3, x6), and the dashed induced edge
                                  (x2, x3).]
Fig. 8.6 (a) A small set partitioning problem. The dots indicate which elements belong to each
set i. (b) Dependency graph for the problem. The dashed edge is an induced edge.
partitions the set {A, B, C, D}, where the ith set is Si . The control is a binary variable
xi that indicates whether Si is selected. The feasible solutions are (x1 , . . . , x6 ) =
(0, 0, 0, 1, 1, 1), (0, 1, 1, 0, 0, 0), (1, 0, 0, 0, 0, 1), where the last two solutions are opti-
mal.
A nonserial recursion can be constructed by reference to the dependency graph
for the problem, shown in Fig. 8.6(b). The graph connects two variables with an
edge when the corresponding sets have an element in common. For example, x1 and
x2 are connected because S1 and S2 have element C in common. Arbitrarily adopting
a variable ordering x1 , . . . , x6 , vertices are removed from the graph in reverse order.
Each time a vertex is removed, the vertices adjacent to it are connected, if they are
not already connected. Edges added in this way are induced edges. For example,
removing x6 in the figure induces the edge (x2 , x3 ).
The feasible values of xi depend on the set of variables to which xi was adjacent
when removed. Thus x5 depends on (x1 , x3 ), and similarly for the other control
variables. Let Si (xi ) be Si if xi = 1 and the empty set otherwise. The state s5 on
which x5 depends can be regarded as the multiset that results from taking the union
S1 (x1 ) ∪ S3 (x3 ). Thus if (x1 , x3 ) = (1, 1), the state is {A, A, B, C}. Note that A is
repeated because it occurs in both S1 and S3 . A state is feasible if and only if it
contains no repeated elements.
The “stages” of a nonserial recursion correspond to the control variables as in
a serial recursion. However, applying a control can result in a transition to states
in multiple stages. The transition function is therefore written as gik (si , xi ), which
indicates the state in stage k that results from applying control xi in state si . There
is also a terminal stage, which can be denoted stage τ and allows two states, the
infeasible state 0̂ and a feasible state 1̂. Only one state is possible in stage 1, namely
s1 = ∅. The transition functions for the example are
8.5 Nonserial Recursive Modeling 155
Fig. 8.7 Nonserial state transition graph for a set partitioning problem. Only nodes and arcs that
are part of feasible solutions are shown. Each feasible solution corresponds to a tree incident to
the root and the three terminal states, which are not encircled. The boldface tree corresponds to
optimal solution (x1 , . . ., x6 ) = (0, 1, 1, 0, 0, 0).
    g12(∅, x1) = S1(x1)
    g23(s2, x2) = g24(s2, x2) = s2 ∪ S2(x2)
    g35(s3, x3) = g36(s3, x3) = s3 ∪ S3(x3)
    giτ(si, xi) = si ∪ Si(xi),  i = 4, 5, 6.
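The feasibility condition of this recursion can be checked by brute force. The sketch below encodes the instance of Fig. 8.6(a); the set contents are inferred from the feasible solutions stated above and are therefore an assumption. It recovers exactly the three feasible solutions.

```python
from itertools import product

# Reconstructed instance of Fig. 8.6(a) (inferred, not copied verbatim)
SETS = {1: {"A", "C"}, 2: {"C", "D"}, 3: {"A", "B"},
        4: {"C"}, 5: {"A"}, 6: {"B", "D"}}

def feasible(x, sets=SETS, universe=frozenset("ABCD")):
    """A state is the multiset union of the selected sets; x is feasible
    iff no element repeats (so the selection is pairwise disjoint) and
    every element of the universe is covered at the terminal stage."""
    chosen = [e for i, xi in enumerate(x, start=1) if xi for e in sets[i]]
    return len(chosen) == len(set(chosen)) and set(chosen) == set(universe)

solutions = [x for x in product((0, 1), repeat=6) if feasible(x)]
```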
The state transition graph of Fig. 8.7 can be regarded as a nonserial decision
diagram. The ovals correspond to “layers,” and the three terminal states belong to
the terminal layer. In general, the concepts of reduction and canonical costs can be
carried over from serial to nonserial decision diagrams.
Chapter 9
MDD-Based Constraint Programming
Abstract This is the first of three chapters that apply decision diagrams in the
context of constraint programming. This chapter starts by providing a background of
the solving process of constraint programming, focusing on consistency notions and
constraint propagation. We then extend this methodology to MDD-consistency and
MDD-based constraint propagation. We present MDD propagation algorithms for
specific constraint types, including linear inequalities, A LLDIFFERENT, A MONG,
and E LEMENT constraints, and experimentally demonstrate how MDD propagation
can improve conventional domain propagation.
9.1 Introduction
The previous chapters have focused on developing decision diagrams for optimiza-
tion. For example, in Chapter 6 we saw how decision diagrams can form the basis for
a stand-alone solver for discrete optimization problems. In the following chapters
we take a different perspective, and integrate decision diagrams within a constraint
programming framework. In particular, we will discuss how multivalued decision
diagrams (MDDs) can be used to improve constraint propagation, which is the
central inference process of constraint programming.
x1 > x2 (c1 )
x1 + x2 = x3 (c2 )
A LLDIFFERENT(x1 , x2 , x3 , x4 ) (c3 )
x1 ∈ {1, 2}, x2 ∈ {0, 1, 2, 3}, x3 ∈ {2, 3}, x4 ∈ {0, 1}
This model is a constraint satisfaction problem (CSP) as it does not have an objective
function to be optimized. We apply constraint propagation by considering each
9.2 Constraint Programming Preliminaries 159
constraint in turn. From constraint (c1 ) we deduce that x2 ∈ {0, 1}. Then we consider
constraint (c2 ), but find that each domain value for each variable participates in a
solution to the constraint. When we consider constraint (c3 ), we observe that the
values {0, 1} will be assigned (in any order) to variables x2 and x4 , and hence we
reduce the domain of x1 to {2}, and consequently x3 ∈ {3}. We continue revisiting
the constraints whose variables have updated domains. Constraint (c1 ) does not
reduce any more domains, but by constraint (c2 ) we deduce x2 ∈ {1}. Constraint
(c3 ) then updates x4 ∈ {0}. No additional domain filtering is possible, and we
finish the constraint propagation process. In this case, we arrive at a solution to
the problem, (x1 , x2 , x3 , x4 ) = (2, 1, 3, 0).
Domain Consistency
1 Note that, even if a set of constraints is domain consistent, the conjunction of the constraints in
the set may not be domain consistent.
160 9 MDD-Based Constraint Programming
This function returns ‘true’ if the constraint is domain consistent and
‘false’ otherwise. In the latter case one of the domains is empty, hence no solution
exists, and the CP solver can backtrack from the current search state. The time
complexity of algorithm D OMAIN C ONSISTENCY is polynomial in the number of
variables and the size of the variable domains, but relies on the time complexity for
determining whether a solution exists to the constraint. For some constraints this
can be done in polynomial or even constant time, while for others the check for
feasibility is an NP-complete problem itself (see Example 9.4).
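A brute-force sketch of such a generic filter is given below; the interface is an assumption, and the feasibility check is done by naive enumeration, which is exactly the step whose cost varies by constraint type.

```python
from itertools import product

def domain_consistency(constraint, domains):
    """Generic domain-consistency filter.  constraint is a predicate on
    an assignment dict; domains maps variable names to sets of values.
    A value v for x is kept iff some solution of the constraint extends
    x = v using values from the current domains.  Returns False iff
    some domain becomes empty."""
    variables = list(domains)
    for x in variables:
        supported = set()
        for v in domains[x]:
            choices = [domains[y] if y != x else {v} for y in variables]
            if any(constraint(dict(zip(variables, t)))
                   for t in product(*choices)):
                supported.add(v)
        domains[x] = supported
        if not supported:
            return False
    return True
```

On the constraint x1 > x2 of Example 9.1, this reproduces the deduction x2 ∈ {0, 1}.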
Here, the Boolean parameters b1 (v) and b2 (v) represent whether a domain
value v in D(x1 ), resp. D(x2 ), participates in a solution to C or not. Algorithm
D OMAIN C ONSISTENCY B INARY TABLE considers all tuples in C and therefore runs
in time and space O(|C(x1 , x2 )|).
We note that much more refined variants of this algorithm exist in the literature;
see for example [132].
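A minimal sketch of the single-pass idea, with the support flags b1(v), b2(v) realized as dictionaries (the routine's name and interface paraphrase the text):

```python
def binary_table_dc(tuples, d1, d2):
    """One pass over the tuples of an extensionally given binary
    constraint C(x1, x2) sets the support flags b1(v), b2(v); values
    without support are then dropped.  O(|C|) time and space."""
    b1 = {v: False for v in d1}
    b2 = {v: False for v in d2}
    for v1, v2 in tuples:
        if v1 in b1 and v2 in b2:    # tuple lies within the current domains
            b1[v1] = True
            b2[v2] = True
    return {v for v in d1 if b1[v]}, {v for v in d2 if b2[v]}
```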
In practice, this rule can be implemented by updating the maximum value in D(xi ),
which may be done in constant time with the appropriate data structure for storing
the domains.
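As an illustration, for a linear inequality a1x1 + · · · + anxn ≤ b with positive coefficients (a hypothetical stand-in for Example 9.3, which is not reproduced in this excerpt), the bound-update rule amounts to:

```python
def filter_linear_leq(a, b, lo, hi):
    """Tighten upper bounds for sum_i a_i * x_i <= b with all a_i > 0:
    with every other variable at its minimum, x_i can exceed its own
    minimum by at most slack // a_i.  Returns the new upper bounds."""
    slack = b - sum(ai * l for ai, l in zip(a, lo))
    return [min(h, l + slack // ai) for ai, l, h in zip(a, lo, hi)]
```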
Bounds Consistency
Domain consistency is one of the stronger notions of consistency one can apply
to an individual constraint. However, as mentioned above, for some constraints
establishing domain consistency may be NP-hard. For others, domain consistency
may be too time-consuming to establish, relative to the associated domain reduction.
Therefore, other consistency notions have been introduced that are weaker, but
are more efficient to establish. The most important alternative is that of bounds
consistency.
Let C(x1 , x2 , . . . , xn ) be a constraint, and let each variable xi have an associated
domain D(xi ) that is a subset of a totally ordered universe U. We say that C is
bounds consistent if for all xi and vi ∈ {min D(xi ), max D(xi )}, there exists a solution
(v1 , v2 , . . . , vn ) such that vj ∈ [min D(xj), max D(xj)] for all j ≠ i. In other words, we
consider a convex relaxation of the variable domains, and require that all bounds
participate in a solution to C.
Similar to domain consistency, bounds consistency can be established via a
generic algorithm, by adapting algorithm D OMAIN C ONSISTENCY to consider do-
main bounds. But oftentimes, more efficient algorithms exist for specific constraint
types.
Example 9.5. Consider again the linear inequality constraint from Example 9.3. For
this constraint, establishing bounds consistency is equivalent to establishing domain
consistency.
D(x) is updated by this process, each constraint that has variable x in its scope is
added again to Q, to be revisited (lines 7–8). If algorithm D OMAIN C ONSISTENCY
detects an empty domain, it returns ‘false’, as will C ONSTRAINT P ROPAGATION
(lines 5–6). Otherwise, C ONSTRAINT P ROPAGATION returns ‘true’ (line 9). Practi-
cal implementations of C ONSTRAINT P ROPAGATION can be made more efficient,
for example by choosing the order in which constraints are processed. We already
saw this algorithm in action in Example 9.1, where we repeatedly established
domain consistency on the individual constraints until no more domain reductions
were possible.
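The queue-based loop just described can be sketched as follows. This is a paraphrase of the CONSTRAINTPROPAGATION routine; the filter interface is an assumption.

```python
from collections import deque

def constraint_propagation(constraints, domains, scope, filters):
    """Queue-based propagation to a fixed point.  filters[c] plays the
    role of DOMAINCONSISTENCY for constraint c: it shrinks domains in
    place and returns False on a wipe-out.  Whenever the domain of a
    variable x changes, every other constraint with x in its scope
    re-enters the queue Q."""
    Q = deque(constraints)
    queued = set(constraints)
    while Q:
        c = Q.popleft()
        queued.discard(c)
        before = {x: set(domains[x]) for x in scope[c]}
        if not filters[c](domains):
            return False
        for x in scope[c]:
            if domains[x] != before[x]:
                for c2 in constraints:
                    if c2 != c and x in scope[c2] and c2 not in queued:
                        Q.append(c2)
                        queued.add(c2)
    return True
```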
The constraint propagation process can be adapted to include bounds consistency
algorithms instead of domain consistency algorithms for specific constraints. Re-
gardless, when the variables have finite domains, the propagation cycle is guaranteed
to terminate and reach a fixed point. Moreover, under certain conditions of the
propagators, one can show that the fixed point is unique, irrespective of the order in
which the propagators are applied [7].
Systematic Search
A LLDIFFERENT(x1 , x2 , x3 , x4 ) (c1 )
x1 + x 2 + x 3 ≥ 9 (c2 )
xi ∈ {1, 2, 3, 4}, i = 1, . . . , 4
[Fig. 9.1: (a) the exact MDD for the ALLDIFFERENT constraint (c1) over
x1, . . . , x4; (b) the MDD remaining after the arcs inconsistent with
constraint (c2) are removed.]
Example 9.7. We continue Example 9.6. In the ideal case, we would create an exact
MDD to represent constraint (c1 ), as in Fig. 9.1(a). We then communicate the MDD,
which represents relationships between the variables, to constraint (c2 ). We can
inspect that all paths in the MDD that use an arc with label 1 for x1 , x2 , x3 do not
satisfy constraint (c2 ), and hence these arcs can be removed. We can also remove
nodes that are no longer connected to either the root or the terminal. The resulting
MDD is depicted in Fig. 9.1(b). If we project this information to the variable
domains, we obtain x1 , x2 , x3 ∈ {2, 3, 4} and x4 ∈ {1}, which can be communicated
to the domain store.
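The arc-removal reasoning of this example generalizes: for a constraint requiring the labels on a path to sum to at least b, an arc can be deleted whenever even its best completion falls short. A sketch under an assumed layered-arc representation follows (here the sum ranges over all layers, whereas Example 9.7 restricts it to x1, x2, x3):

```python
from collections import defaultdict

def filter_mdd_sum_geq(arcs, root, terminal, b):
    """Remove MDD arcs that cannot lie on any path whose labels sum to
    at least b.  arcs is a list of (tail, head, label) in top-down
    topological order.  down[u] / up[u] are the best label sums from
    the root to u and from u to the terminal; an arc survives iff
    down[tail] + label + up[head] >= b."""
    down = defaultdict(lambda: float("-inf"))
    down[root] = 0
    for u, v, lbl in arcs:
        down[v] = max(down[v], down[u] + lbl)
    up = defaultdict(lambda: float("-inf"))
    up[terminal] = 0
    for u, v, lbl in reversed(arcs):
        up[u] = max(up[u], lbl + up[v])
    return [(u, v, lbl) for u, v, lbl in arcs
            if down[u] + lbl + up[v] >= b]
```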
Observe that in Example 9.7 the exact MDD for the A LLDIFFERENT constraint
has exponential size. In practice one therefore needs to use relaxed MDDs, of
limited width, to make this approach scalable.
In the remainder of this chapter, we will assume that the MDD defined on a set
of variables {x1 , x2 , . . . , xn } will follow the lexicographic ordering of the variables,
unless noted otherwise. Also, we assume that the MDDs do not contain long arcs in
this chapter.
166 9 MDD-Based Constraint Programming
In principle, the MDD store can replace the domain store completely, and we could
have one single MDD representing the entire problem. This may not be efficient in
practice, however, as the resulting relaxed MDD may not be sufficiently strong.
Instead, we recommend maintaining both the domain store and an MDD store,
where the MDD store represents a suitable substructure and not necessarily all
aspects of the problem. Moreover, we may choose to introduce multiple MDDs,
each representing a substructure or a group of constraints to be propagated on it.
Consider, for example, a workforce scheduling problem in which we need
to meet given workforce levels over time, while meeting individual employee
workshift rules. We may choose to introduce one MDD per employee, representing
the employee’s work schedule over time, while the workforce-level constraints are
enforced using a conventional CP model with domain propagation.
The purpose of the MDD store is to represent a more refined relationship among
a set of variables than the domain store’s Cartesian product of variable domains.
This is accomplished by MDD filtering and refinement. MDD filtering generalizes
the concept of domain filtering by removing infeasible arcs from an MDD, while
refinement attempts to strengthen the MDD representation by splitting nodes,
within the allowed maximum width. We note that a principled approach to node
refinement in MDDs was introduced by Hadzic et al. [84]. A generic top-down
filtering and refinement compilation scheme for a given set of constraints was
presented in Section 4.7, as Algorithm 4. This is precisely the procedure we will
apply in the context of MDD-based constraint programming. Note that we can adjust
the strength of the MDD store by setting the maximum allowed MDD width from
one (the domain store) to infinity (exact MDD).
Example 9.8. Consider a CSP with variables x1 ∈ {0, 1}, x2 ∈ {0, 1, 2}, x3 ∈ {1, 2},
and constraints x1 ≠ x2, x2 ≠ x3, x1 ≠ x3. All domain values are domain consistent
(even if we were to apply the ALLDIFFERENT propagator on the conjunction of
the constraints), and the conventional domain store defines the relaxation {0, 1} ×
{0, 1, 2} × {1, 2}, which includes infeasible solutions such as (1, 1, 1).
The MDD-based approach starts with the MDD of width one in Fig. 9.2(a), in
which parallel arcs are represented by a set of corresponding domain values for
clarity. We refine and propagate each constraint separately. Starting with x1 ≠ x2,
we refine the MDD by splitting the node at layer 2, resulting in Fig. 9.2(b). This
allows us to delete two domain values, based on x1 ≠ x2, as indicated in the figure. In
9.3 MDD-Based Constraint Programming 167
Fig. 9.2 Refining and propagating an MDD of width one (a) for x1 ≠ x2 (b), x2 ≠ x3 (c), and
x1 ≠ x3 (d), yielding the MDD in (e). Dashed lines mark removed inconsistent values.
Fig. 9.2(c) and (d) we refine and propagate the MDD for the constraints x2 ≠ x3 and
x1 ≠ x3, respectively, until we reach the MDD in Fig. 9.2(e). This MDD represents
all three solutions to the problem, and provides a much tighter relaxation than the
Cartesian product of variable domains.
We will next discuss how the outcome of MDD propagation can be characterized.
Let C(X) be a constraint with variables X as its scope, and let MDD M be defined
on a set of variables X′ ⊇ X. A path with arc labels v1, . . . , vn from the root r to the
terminal t in M is feasible for C if setting xi = vi for all xi ∈ X satisfies C. Following
[4], we say that C is MDD consistent with respect to M if every arc in M belongs to
a path that is feasible for C.
Note that domain consistency is equivalent to MDD consistency on an MDD of
width one, in which the nodes of subsequent layers Li and Li+1 are connected with
parallel arcs with labels D(xi ).
As seen in Section 4.7, the feasibility of an MDD arc a = (u, v) in layer Lj with
respect to constraint C can be determined by the transition function t_j^C(s(u), d(a)).
Recall that s(u) represents the (top-down) state information of node u and d(a)
represents the label of a. Note that parallel arcs are distinguished uniquely by
their labels. In addition, MDD propagation algorithms may utilize bottom-up state
information, as we have seen in Section 4.7.1. For clarity, we will describe the
MDD propagation rules for an arc a = (u, v) in terms of s↓ (u), s↑ (v), and d(a),
where s↓ (·) and s↑ (·) represent the state information computed from the root r and
terminal t, respectively. For each constraint type, the MDD propagator is based on
an appropriate definition of state information, and a recursive procedure to compute
this information during the top-down and bottom-up pass.
For some constraints we can establish MDD consistency, with respect to a given
MDD, in polynomial time. We will list some of these constraints in later sections.
In general, if we can determine in polynomial time whether any particular variable–
value assignment is consistent with the MDD and a constraint C, then we can
achieve MDD consistency in polynomial time due to the following theorem:
Theorem 9.1. Given a constraint C and an MDD M, suppose that we can decide in
O( f (M)) time whether a variable–value assignment x j = v is consistent with C and
M. Then MDD consistency of C with respect to M can be established in O(|M| · f (M))
time, where |M| denotes the number of arcs of M.
Proof. The proof is based on a shaving argument. For each arc a of M we consider
the MDD Ma that consists of all the r–to–t paths in M containing a. Then a can be
removed from M if and only if x j = d(a) is inconsistent with C and Ma , where j is
the layer from which a originates. This can be determined in time and space at most
O( f (Ma )) ≤ O( f (M)). By repeating this operation for each arc of M we obtain the
theorem.
Corollary 9.1. For binary table constraints, MDD consistency can be established
in polynomial time (in the size of the constraint and the MDD).
Proof. Let C(xi , x j ) be a binary table constraint, where i < j without loss of
generality. We let state s↓ (u) contain all the values assigned to xi in some path from
r to u, and s↑ (u) contain all the values assigned to x j in some path from u to t. We
initialize s↓(r) = ∅, and recursively compute

    s↓(v) = ∪_{a′=(u,v)∈in(v)} d(a′)    if u ∈ Li,
    s↓(v) = ∪_{a′=(u,v)∈in(v)} s↓(u)    otherwise,

for all nodes v in layers L2, . . . , Ln+1. Similarly, we initialize s↑(t) = ∅, and recur-
sively compute

    s↑(u) = ∪_{a′∈out(u)} d(a′)    if u ∈ Lj,
    s↑(u) = ∪_{a′=(u,v)∈out(u)} s↑(v)    otherwise,
for all nodes u in layers Ln , Ln−1 , . . . , L1 . Then we can delete arc a = (u, v) from M
if and only if
(a) u ∈ Li and tuple (d(a), w) does not satisfy C for any w ∈ s↑ (v), or
(b) v ∈ L j and tuple (w, d(a)) does not satisfy C for any w ∈ s↓ (u).
The top-down and bottom-up passes to compute the states and perform the checks
for feasibility require time that is polynomial in the size of M. The corollary then
follows from Theorem 9.1.
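Under the assumptions of the corollary, the two passes and the arc checks can be sketched as follows; the arc-list MDD encoding and all function names are ours, and the final removal of nodes disconnected from r or t is omitted:

```python
def filter_binary_table(arcs, i, j, allowed):
    """MDD-consistency filter for a binary table constraint C(x_i, x_j), i < j.

    arcs[k] is the list of (tail, head, label) arcs in the layer of x_k;
    node names are assumed unique across layers."""
    # Top-down pass: s_down[v] = values taken by x_i on some root-to-v path.
    s_down = {}
    for k in range(i, j):
        for u, v, lab in arcs[k]:
            s_down.setdefault(v, set()).update(
                {lab} if k == i else s_down.get(u, set()))
    # Bottom-up pass: s_up[u] = values taken by x_j on some u-to-terminal path.
    s_up = {}
    for k in range(j, i, -1):
        for u, v, lab in arcs[k]:
            s_up.setdefault(u, set()).update(
                {lab} if k == j else s_up.get(v, set()))
    # Delete unsupported arcs in layers i and j.
    arcs = list(arcs)
    arcs[i] = [(u, v, l) for u, v, l in arcs[i]
               if any((l, w) in allowed for w in s_up.get(v, set()))]
    arcs[j] = [(u, v, l) for u, v, l in arcs[j]
               if any((w, l) in allowed for w in s_down.get(u, set()))]
    return arcs
```

For instance, on a width-one MDD over two binary variables with the single allowed tuple (0, 1), the filter keeps only the label-0 arc in layer 0 and the label-1 arc in layer 1.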
In the CP literature exact MDDs have been used to represent the solution set for
specific constraints. In fact, there is a close relationship between MDDs, table
constraints, and the so-called REGULAR constraint that represents fixed-length
strings that belong to a regular language [126, 20]. Namely, they provide different
data structures for storing explicit lists of tuples that satisfy the constraint; see [43]
for a computational comparison, and [123, 124] for efficient domain consistency
algorithms based on MDDs. In such cases, we want to establish MDD consistency
of one MDD with respect to another. We will next see that this can be done in a
generic fashion. Moreover, it will provide us with another tool to test whether MDD
consistency can be established in polynomial time for certain constraint types.
Formally, our goal in this section is to establish MDD consistency on a given
MDD M with respect to another MDD M′ on the same set of variables. That is, M
is MDD consistent with respect to M′ if every arc in M belongs to a path (solution)
Algorithm 7 Intersection(M, M′)
Input: MDD M with root r, MDD M′ with root r′. M and M′ are defined on the
same ordered sequence of n variables.
Output: MDD I with layers LI1, . . . , LIn+1 and arc set AI. Each node u in I has an
associated state s(u).
1: create node rI with state s(rI) := (r, r′)
2: LI1 := {rI}
3: for i = 1 to n do
4:   LIi+1 := {}
5:   for all u ∈ LIi with s(u) = (v, v′) do
6:     for all a = (v, ṽ) ∈ M and a′ = (v′, ṽ′) ∈ M′ such that d(a) = d(a′) do
7:       create node ũ with state s(ũ) := (ṽ, ṽ′)
8:       if ∃ w̃ ∈ LIi+1 with s(w̃) = s(ũ) then ũ := w̃
9:       else LIi+1 += ũ end if
10:      add arc (u, ũ) with label d(a) to arc set AI
11: remove all arcs and nodes from I that are not on a path from rI to tI ∈ LIn+1
12: return I
Algorithm 8 MDD-Consistency(M, M′)
Input: MDD M with root r, MDD M′ with root r′. M and M′ are defined on the
same ordered sequence of n variables.
Output: M that is MDD consistent with respect to M′
1: create I := Intersection(M, M′)
2: for i = 1 to n do
3:   create array Support[u, l] := 0 for all u ∈ LMi and arcs out of u with label l
4:   for all arcs a = (v, ṽ) in AI with s(v) = (u, u′) such that v ∈ LIi do
5:     Support[u, d(a)] := 1
6:   for all arcs a = (u, ũ) in M such that u ∈ LMi do
7:     if Support[u, d(a)] = 0 then remove a from M end if
8: remove all arcs and nodes from M that are not on a path from r to t ∈ LMn+1
9: return M
of M that also exists in M′. For our purposes, we assume that M and M′ follow the
same variable ordering.
We can achieve MDD consistency by first taking the intersection of M and M′,
and then removing all arcs from M that are not compatible with the intersection.
Computing the intersection of two MDDs is well studied, and we present a top-down
intersection algorithm that follows our definitions in Algorithm 7. This description
is adapted from the ‘melding’ procedure in [104].
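A minimal sketch of this top-down intersection, under an illustrative adjacency-list encoding of the MDDs (the encoding and names are ours, and the cleanup step of line 11 is omitted):

```python
def intersect(M1, r1, M2, r2, n):
    """Top-down intersection of two layered MDDs over the same n variables.

    Each MDD maps a node to its list of (label, child) arcs; nodes of the
    intersection are the state pairs (v, v'), and only arc pairs with equal
    labels survive (the core of Algorithm 7)."""
    layer, arcs = {(r1, r2)}, {}
    for _ in range(n):
        nxt = set()
        for v1, v2 in layer:
            for l1, c1 in M1.get(v1, []):
                for l2, c2 in M2.get(v2, []):
                    if l1 == l2:
                        arcs.setdefault((v1, v2), []).append((l1, (c1, c2)))
                        nxt.add((c1, c2))
        layer = nxt
    return arcs

def paths(arcs, node, n):
    """Enumerate the label strings of all length-n paths from a node."""
    if n == 0:
        return [""]
    return [str(l) + p for l, c in arcs.get(node, [])
            for p in paths(arcs, c, n - 1)]

# M1 encodes the solution set {00, 01, 11}; M2 encodes {01, 10, 11}.
M1 = {"r1": [(0, "a"), (1, "b")], "a": [(0, "t1"), (1, "t1")], "b": [(1, "t1")]}
M2 = {"r2": [(0, "c"), (1, "d")], "c": [(1, "t2")], "d": [(0, "t2"), (1, "t2")]}
I = intersect(M1, "r1", M2, "r2", 2)
# sorted(paths(I, ("r1", "r2"), 2)) -> ['01', '11']
```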
The intersection MDD, denoted by I, represents all possible paths (solutions)
that are present in both M and M′. Each partial path in I from the root rI to a node
the size of the reduced ordered BDD for this inequality is bounded from below by
Ω(2^(√n/2)).
We next present MDD propagation algorithms for several constraint types. In some
cases, the propagation may not be as strong as for the conventional domain store,
in the sense that, when specialized to an MDD of width one, it may not remove as
many values as a conventional propagator would. However, a ‘weak’ propagation
algorithm can be very effective when applied to the richer information content of
the MDD store.
If one prefers not to design a propagator specifically for MDDs, there is also the
option of using a conventional domain store propagator by adapting it to MDDs.
This can be done in a generic fashion, as will be explained in Section 9.4.7.
Lastly, our description will mainly focus on MDD filtering. The refinement
process can be based on the same state information as is used for filtering.
and

    sp↑u = 0,   if u = t,
    sp↑u = min{ la + sp↑v : a = (u, v) ∈ out(u) },   otherwise.
This state information can be computed in linear time (in the size of the given
MDD).
Then we delete an arc a = (u, v) with u ∈ Lj and j ∈ J whenever every path
through a is longer than b; that is, whenever sp↓u + la + sp↑v > b.
The inequality propagator of Section 9.4.2 can be extended to equalities [84] and
two-sided inequalities [94], by storing all path lengths instead of only the shortest
and/or longest paths.
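The shortest-path filter of Section 9.4.2 can be sketched as follows, under an illustrative arc-list encoding of the MDD (function names are ours); note that on a relaxed MDD this deletes only arcs through which every completion provably exceeds the bound, so surviving paths may still violate it:

```python
def filter_inequality(arcs, n, root, terminal, length, b):
    """Delete every arc a = (u, v) with sp_down(u) + l_a + sp_up(v) > b,
    for a separable inequality whose arc lengths are length(k, label).

    arcs[k] lists the (tail, head, label) arcs of layer k."""
    INF = float("inf")
    sp_down = {root: 0}                      # shortest path length from root
    for k in range(n):
        for u, v, lab in arcs[k]:
            sp_down[v] = min(sp_down.get(v, INF),
                             sp_down.get(u, INF) + length(k, lab))
    sp_up = {terminal: 0}                    # shortest path length to terminal
    for k in range(n - 1, -1, -1):
        for u, v, lab in arcs[k]:
            sp_up[u] = min(sp_up.get(u, INF),
                           length(k, lab) + sp_up.get(v, INF))
    return [[(u, v, lab) for u, v, lab in arcs[k]
             if sp_down.get(u, INF) + length(k, lab) + sp_up.get(v, INF) <= b]
            for k in range(n)]
```

For example, on a width-one MDD over three binary variables with the constraint x1 + x2 + x3 ≤ 0, every arc with label 1 is removed.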
9.4 Specialized Propagators 175
    w + d(a) + w′ ∉ [L,U], for all w ∈ s↓(u), w′ ∈ s↑(v).   (9.2)
Because rule (9.2) for deleting an arc is both a necessary and sufficient condition
for the arc to be supported by a feasible solution, one top-down and bottom-up pass
suffices to achieve MDD consistency. Observing that the states can be computed in
pseudo-polynomial time, by Theorem 9.1 this algorithm runs in pseudo-polynomial
time (see also [84]).
Theorem 9.3 ([4]). Establishing MDD consistency for the ALLDIFFERENT
constraint is NP-hard.
1: for i = 1, . . . , n do
2:   Establish MDD consistency for ALLDIFFERENT on M
3:   Let Li+1 = {vi+1,j} for some arc out of Li with label j
Since MDD consistency guarantees that all arcs in M belong to a solution to
ALLDIFFERENT, this procedure greedily constructs a single r–t path which is
Hamiltonian in G.
The AMONG constraint counts the number of variables that are assigned to a value
in a given set S, and ensures that this number is between a given lower and upper
bound [21]:
Definition 9.1. Let X be a set of variables, let L,U be integer numbers such that
0 ≤ L ≤ U ≤ |X|, and let S ⊂ ∪x∈X D(x) be a subset of domain values. Then we
define AMONG(X, S, L,U) as

    L ≤ ∑_{x∈X} (x ∈ S) ≤ U.
The AMONG constraint can thus be written as the two-sided inequality

    L ≤ ∑_{xi∈X} fi(xi) ≤ U,

where fi(v) = 1 if v ∈ S, and fi(v) = 0 otherwise.
Because each fi(·) ∈ {0, 1}, the number of distinct path lengths is bounded by
U − L + 1, and we can compute all states in polynomial time. Therefore, by
Theorem 9.1, MDD consistency can be achieved in polynomial time for AMONG
constraints. Observe that tractability also follows from Theorem 9.2, since the size
of an exact MDD for AMONG can be bounded by O(n(U − L + 1)).
To improve the efficiency of this propagator in practice, one may choose to only
propagate bounds information. That is, we can use the inequality propagator for the
pair of inequalities separately, and reason on the shortest and longest path lengths,
as in Section 9.4.2.
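This bounds-based AMONG propagation can be sketched by tracking the minimum and maximum number of S-labels on paths from the root and to the terminal (the arc-list encoding and function names are our own illustrative choices):

```python
def filter_among(arcs, n, root, terminal, S, L, U):
    """Bounds filtering for AMONG(X, S, L, U): keep an arc only if some
    root-terminal path through it can have its number of S-labels in [L, U].
    Assumes every node lies on some root-terminal path (a 'clean' MDD)."""
    lo_d, hi_d = {root: 0}, {root: 0}          # min/max S-count from the root
    for k in range(n):
        for u, v, lab in arcs[k]:
            w = 1 if lab in S else 0
            lo_d[v] = min(lo_d.get(v, n), lo_d[u] + w)
            hi_d[v] = max(hi_d.get(v, -1), hi_d[u] + w)
    lo_u, hi_u = {terminal: 0}, {terminal: 0}  # min/max S-count to terminal
    for k in range(n - 1, -1, -1):
        for u, v, lab in arcs[k]:
            w = 1 if lab in S else 0
            lo_u[u] = min(lo_u.get(u, n), w + lo_u[v])
            hi_u[u] = max(hi_u.get(u, -1), w + hi_u[v])
    return [[(u, v, lab) for u, v, lab in arcs[k]
             if lo_d[u] + (1 if lab in S else 0) + lo_u[v] <= U
             and hi_d[u] + (1 if lab in S else 0) + hi_u[v] >= L]
            for k in range(n)]
```

On a width-one MDD over three binary variables, AMONG with S = {1} and L = U = 3 leaves only the arcs with label 1.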
    D×i(M) = {d(a) | a = (u, v), u ∈ Li}.
As before, we denote by M|u the subgraph of M defined by all r–t paths through u.
We then apply, for node u ∈ Li, a conventional domain propagator to the domains

    D×1(M|u), . . . , D×i−1(M|u), {d(a) | a ∈ out(u)}, D×i+1(M|u), . . . , D×n(M|u).   (9.3)
We remove values only from the domain of xi , that is from {d(a) | a ∈ out(u)}, and
delete the corresponding arcs from M. This can be done for each node in layer Li
and for each layer in turn. Note that the induced domain relaxation D× (M|u ) can be
computed recursively for all nodes u of a given MDD M in polynomial time (in the
size of the MDD).
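For the MDD as a whole (rather than the conditioned MDDs M|u of the text), the induced domain relaxation is simply the set of labels per layer; a minimal sketch under the same illustrative arc-list encoding:

```python
def induced_domains(arcs):
    """D_i(M): the labels appearing on the layer-i arcs of the MDD."""
    return [sorted({lab for _, _, lab in layer}) for layer in arcs]

# induced_domains([[("r", "a", 0), ("r", "a", 1)], [("a", "t", 1)]])
# -> [[0, 1], [1]]
```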
We can strengthen the propagation by recording which values can be deleted
from the domains D×j(M|u) for j ≠ i when (9.3) is propagated [4]. If v can be
deleted from D×j(M|u), we place the ‘nogood’ xj = v on each arc in out(u). Then
we recursively move the nogoods on level i toward level j. If j > i, for example,
we propagate (9.3) for each node on level i and then note which nodes on level
i + 1 have the property that all incoming arcs have the nogood xj = v. These nodes
propagate the nogood to all their outgoing arcs in turn, and so forth until level j is
reached, where all arcs with nogood xj = v and label v are deleted.
We next provide detailed experimental evidence to support the claim that MDD-
based constraint programming can be a viable alternative to constraint programming
based on the domain store. The results in this section are obtained with the MDD-
based CP solver described in [93], which is implemented in C++. The solver does
not propagate the constraints until a fixed point is reached. Instead, by default we
9.5 Experimental Results 179
allocate one bottom-up and top-down pass to each constraint. The bottom-up pass
is used to compute the state information s↑ . The top-down pass processes the MDD
a layer at a time, in which we first compute s↓ , then refine the nodes in the layer,
and finally apply the propagator conditions based on s↑ and s↓ . Our outer search
procedure is implemented using a priority queue, in which the search nodes are
inserted with a specific weight. This allows us to easily encode depth-first search or
best-first search procedures. Each search tree node contains a copy of the MDD of
its parent, together with the associated branching decision.
All the experiments are performed using a 2.33 GHz Intel Xeon machine with
8 GB memory. For a fair comparison, the solver applies a depth-first search, using
a static lexicographic-first variable selection heuristic and a minimum-value-first
value selection heuristic. We vary the maximum width of the MDD, while keeping
all other settings the same.
Fig. 9.3 Scatter plots comparing width 1 versus width 4, 8, and 16 (from top to bottom) in terms
of backtracks (left) and computation time in seconds (right) on multiple AMONG problems.
We can observe that width 4 already consistently outperforms the domain
propagation, in some cases by up to six orders of magnitude in terms of search tree size
(backtracks), and up to four orders of magnitude in terms of computation time. For
width 8, this behavior is even more consistent, and for width 16, all instances can
be solved in under 10 seconds, while the domain propagation needs hundreds or
thousands of seconds for several of these instances.
Table 9.1 The effect of the MDD width on time in seconds (CPU) and backtracks (BT) when
finding one feasible solution on nurse rostering instances.
10 MDD Propagation for SEQUENCE Constraints
Abstract In this chapter we present a detailed study of MDD propagation for the
SEQUENCE constraint. This constraint can be applied to combinatorial problems
such as employee rostering and car manufacturing. It will serve to illustrate the main
challenges when studying MDD propagation for a new constraint type: tractability,
design of the propagation algorithm, and practical efficiency. In particular, we show
in this chapter that establishing MDD consistency for SEQUENCE is NP-hard, but
fixed-parameter tractable. Furthermore, we present an MDD propagation algorithm
that may not establish MDD consistency, but is highly effective in practice when
compared to conventional domain propagation.
10.1 Introduction
Table 10.1 Nurse rostering problem specification. Variable set X represents the shifts to be
assigned over a sequence of days. The possible shifts are day (D), evening (E), night (N), and
day off (O).
Definition 10.1. Let X be an ordered set of n variables, q, L,U integer numbers such
that 0 ≤ q ≤ n, 0 ≤ L ≤ U ≤ q, and S ⊂ ∪x∈X D(x) a subset of domain values. Then

    SEQUENCE(X, S, q, L,U) = ⋀_{i=1}^{n−q+1} AMONG(si, S, L,U),

where si represents the subsequence xi, xi+1, . . . , xi+q−1.
When modeling a given problem, we have the choice of using separate AMONG
constraints or a single SEQUENCE constraint. It can be shown that achieving do-
main consistency on the SEQUENCE constraint is stronger than achieving domain
consistency on the decomposition into AMONG constraints. For many practical
applications the additional strength of the SEQUENCE constraint is crucial to find
a solution in reasonable time [134, 153]. It is known that conventional domain con-
sistency can be achieved for SEQUENCE in polynomial time [152, 153, 36, 113, 55].
We also know from Section 9.4.5 that MDD consistency can be achieved for
the AMONG constraint in polynomial time [94]. The question we address in this
chapter is how we can handle SEQUENCE constraints in the context of MDD-based
constraint programming.
We first show that establishing MDD consistency on the SEQUENCE constraint is
NP-hard. This is an interesting result from the perspective of MDD-based constraint
programming. Namely, of all global constraints, the SEQUENCE constraint has
perhaps the most suitable combinatorial structure for an MDD approach; it has a
prescribed variable ordering, it combines subconstraints on contiguous variables,
and existing approaches can handle this constraint fully by using bounds reasoning
only.
We then show that establishing MDD consistency on the SEQUENCE constraint
is fixed-parameter tractable with respect to the length of the subsequences (the
AMONG constraints), provided that the MDD follows the variable order of the
SEQUENCE constraint. The proof is constructive, and follows from the generic
intersection-based algorithm that filters one MDD with another.
The third contribution is a partial MDD propagation algorithm for SEQUENCE
that does not necessarily establish MDD consistency. It relies on the decomposition
of SEQUENCE into ‘cumulative sums’, and an extension of MDD filtering to the
information that is stored at its nodes.
Lastly, we provide an experimental evaluation of our partial MDD propagation
algorithm. We evaluate the strength of the algorithm for MDDs of various maxi-
mum widths, and compare the performance with existing domain propagators for
SEQUENCE. We also compare our algorithm with the known MDD approach that
uses the natural decomposition of SEQUENCE into AMONG constraints [94]. Our
experiments demonstrate that MDD propagation can outperform domain propaga-
tion for SEQUENCE by reducing the search tree size, and solving time, by several
orders of magnitude. Similar results are observed with respect to MDD propagation
of AMONG constraints. These results thus provide further evidence for the power of
MDD propagation in the context of constraint programming.
186 10 MDD Propagation for SEQUENCE Constraints
representing that clause, thus ensuring that the variable can take any assignment
with respect to this clause. For the variables that do appear in the clause, we will
explicitly list out all allowed combinations.
More precisely, for clause ci, we first define a local root node ri representing
layer (yi,1), and we set tag(ri) = ‘unsat’. For each node u in layer (yi,j) (for
j = 1, . . . , n), we do the following: If variable xj does not appear in ci, or if tag(u)
is ‘sat’, we create two nodes v, v′ in (ȳi,j), one single node w in (yi,j+1), and arcs
(u, v) with label 1, (u, v′) with label 0, (v, w) with label 0, and (v′, w) with label
1. This corresponds to the ‘diamond’ structure. We set tag(w) = tag(u). Otherwise
(i.e., tag(u) is ‘unsat’ and yi,j appears in ci), we create two nodes v, v′ in (ȳi,j),
two nodes w, w′ in (yi,j+1), and arcs (u, v) with label 1, (u, v′) with label 0, (v, w)
with label 0, and (v′, w′) with label 1. If ci contains the literal yi,j, we set tag(w) =
‘sat’ and tag(w′) = ‘unsat’. Otherwise (ci contains ȳi,j), we set tag(w) = ‘unsat’ and
tag(w′) = ‘sat’.
This procedure is initialized by a single root node r representing (y1,1). We
iteratively append the MDDs of two consecutive clauses ci and ci+1 by merging the
nodes in the last layer of ci that are marked ‘sat’ into a single node, and let this
node be the local root for ci+1. We finalize the procedure by merging all nodes in
the last layer that are marked ‘sat’ into the single terminal node t. By construction,
we ensure that only one of yi,j and ȳi,j can be set to 1. Furthermore, the variable
assignment corresponding to each path between layers (yi,1) and (yi+1,1) will
satisfy clause ci, and exactly n literals are chosen accordingly on each such path.
We next need to ensure that, for a feasible path in the MDD, each variable xj will
correspond to the same literal yi,j or ȳi,j in each clause ci. To this end, we impose
the constraint

    SEQUENCE(Y, S = {1}, q = 2n, L = n, U = n).   (10.1)
[Fig. 10.1: the MDD layers for clause c1 (y1,1, ȳ1,1, . . . , y1,4, ȳ1,4), followed by those for clause c2 (y2,1, ȳ2,1, . . . , y2,4, ȳ2,4).]
The MDD M contains 2mn + 1 layers, while each layer contains at most six
nodes. Therefore, it is of polynomial size (in the size of the 3-SAT instance), and
the overall construction needs polynomial time.
Fig. 10.2 The exact MDD for the SEQUENCE constraint of Example 10.3.
In many practical situations the value of q will lead to prohibitively large exact
MDDs for establishing MDD consistency, which limits the applicability of
Corollary 10.1. Therefore, we next explore a more practical partial filtering algorithm
that is also polynomial in q.
One immediate approach is to propagate the SEQUENCE constraint in MDDs
through its natural decomposition into AMONG constraints, and apply the MDD
filtering algorithms for AMONG proposed by [94]. However, it is well known that,
for classical constraint propagation based on variable domains, the AMONG
decomposition can be substantially improved by a dedicated domain filtering algorithm
for SEQUENCE [152, 153, 36, 113]. Therefore, our goal in this section is to provide
MDD filtering for SEQUENCE that can be stronger in practice than MDD filtering
for the AMONG decomposition, and stronger than domain filtering for SEQUENCE.
In what follows, we assume that the MDD at hand respects the ordering of the
variables in the S EQUENCE constraint.
10.4 Partial MDD Filtering for SEQUENCE 191
Our proposed algorithm extends the original domain consistency filtering algorithm
for SEQUENCE by [152] to MDDs, following the ‘cumulative sums’ encoding
proposed by [36]. This representation takes the following form: For a sequence
of variables X = x1, x2, . . . , xn and a constraint SEQUENCE(X, S, q, L,U), we first
introduce variables y0, y1, . . . , yn, with respective initial domains D(yi) = [0, i] for
i = 0, 1, . . . , n. These variables represent the cumulative sums of X, i.e., yi represents
∑_{j=1}^{i} (xj ∈ S) for i = 1, . . . , n. We now rewrite the SEQUENCE constraint as the
following system of constraints:

    yi+1 = yi + δS(xi+1)   for i = 0, . . . , n − 1,   (10.2)
    yi+q − yi ≥ L          for i = 0, . . . , n − q,   (10.3)
    yi+q − yi ≤ U          for i = 0, . . . , n − q,   (10.4)

where δS : X → {0, 1} is the indicator function for the set S, i.e., δS(x) = 1 if x ∈ S
and δS(x) = 0 if x ∉ S. [36] show that establishing singleton bounds consistency
on this system suffices to establish domain consistency for the original SEQUENCE
constraint.
In order to apply similar reasoning in the context of MDDs, the crucial obser-
vation is that the domains of the variables y0 , . . . , yn can be naturally represented at
the nodes of the MDD. In other words, a node v in layer Li represents the domain
of yi−1 , restricted to the solution space formed by all r–t paths containing v. Let us
denote this information for each node v explicitly as the interval [lb(v), ub(v)], and
we will refer to it as the ‘node domain’ of v. Following the approach of [94], we can
compute this information in linear time by one top-down pass, by using equation
(10.2), as follows:
    lb(v) = min_{(u,v)∈in(v)} { lb(u) + δS(d(u, v)) },
    ub(v) = max_{(u,v)∈in(v)} { ub(u) + δS(d(u, v)) },   (10.5)

with lb(r) = ub(r) = 0 for the root r.
The resulting top-down pass itself takes linear time (in the size of the MDD), while
a direct implementation of the recursive step for each node takes O(q · (ω (M))2 )
operations for an MDD M. Now, the relevant ancestor nodes for a node v in layer
Li+q are stored in Av [q], a subset of layer Li . We similarly compute all descendant
nodes of v in a vector Dv of length q + 1, such that Dv [i] contains all descendants of
v in the i-th layer below v, for i = 0, 1, . . . , q. We initialize Dt = [{t}, ∅, . . . , ∅].
However, for our purposes we only need to maintain the minimum and maximum
value of the union of the domains of Av , resp. Dv , because constraints (10.3) and
(10.4) are inequalities; see the application of Av and Dv in rules (10.8) below. This
makes the recursive step more efficient, now taking O(qω (M)) operations per node.
Alternatively, we can approximate this information by maintaining only a
minimum and maximum node domain value for each layer, instead of a list of ancestor
layers. This will compromise the filtering, but may be more efficient in practice, as
it only requires maintaining two integers per layer.
We next process each of the constraints (10.2), (10.3), and (10.4) in turn to remove
provably inconsistent arcs, while at the same time we filter the node information.
Starting with the ternary constraints of type (10.2), we remove an arc (u, v) if
lb(u) + δS(d(u, v)) > ub(v). Updating [lb(v), ub(v)] for a node v is done similarly to
the rules (10.5) above:

    lb(v) = max( lb(v), min_{(u,v)∈in(v)} { lb(u) + δS(d(u, v)) } ),
    ub(v) = min( ub(v), max_{(u,v)∈in(v)} { ub(u) + δS(d(u, v)) } ).   (10.6)
In fact, the resulting algorithm is a special case of the MDD consistency equality
propagator of [84], and we thus inherit the MDD consistency for our ternary
constraints.
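A sketch of the top-down node-domain computation under our illustrative arc-list encoding (the bottom-up pass and the subsequent arc deletions are omitted):

```python
def sequence_node_domains(arcs, n, root, S):
    """Compute the cumulative-sums node domains [lb(v), ub(v)] used by the
    SEQUENCE filter; an arc (u, v) with lb(u) + delta_S(label) > ub(v)
    could subsequently be deleted.  arcs[k] lists (tail, head, label)."""
    lb, ub = {root: 0}, {root: 0}
    for k in range(n):
        for u, v, lab in arcs[k]:
            d = 1 if lab in S else 0         # delta_S(label)
            lb[v] = min(lb.get(v, n + 1), lb[u] + d)
            ub[v] = max(ub.get(v, -1), ub[u] + d)
    return lb, ub
```

On a width-one MDD over three binary variables with S = {1}, the terminal receives the node domain [0, 3], matching the trivial cumulative-sum bounds.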
Next, we process the constraints (10.3) and (10.4) for a node v in layer Li+1
(i = 0, . . . , n). Recall that the relevant ancestors from Li+1−q are Av [q], while its
relevant descendants from Li+1+q are Dv [q]. The variable corresponding to node v
is yi , and it participates in four constraints:
yi ≥ l + yi−q ,
yi ≤ u + yi−q,
(10.7)
yi ≤ yi+q − l,
yi ≥ yi+q − u.
Note that we can apply these constraints to filter only the node domain [lb(v), ub(v)]
corresponding to yi . Namely, the node domains corresponding to the other variables
yi−q and yi+q may find support from nodes in layer Li+1 other than v. We update
lb(v) and ub(v) according to equations (10.7):
[Fig. 10.3 panels: (a) Initial MDD, (b) Node domains, (c) MDD after filtering.]
Fig. 10.3 MDD propagation for the constraint SEQUENCE(X, S = {1}, q = 3, L = 1, U = 2) of
Example 10.4.
The authors in [153] defined the generalized SEQUENCE constraint, which ex-
tends the SEQUENCE constraint by allowing the AMONG constraints to be specified
with different lower and upper bounds, and subsequence lengths:

    GEN-SEQUENCE(X, S, s, l, u) = ⋀_{i=1}^{k} AMONG(si, S, li, ui).
We next investigate the strength of our partial MDD filtering algorithm with respect
to other consistency notions for the SEQUENCE constraint. In particular, we formally
compare the outcome of our partial MDD filtering algorithm with MDD propagation
for the AMONG encoding and with domain propagation for SEQUENCE. We say
that two notions of consistency for a given constraint are incomparable if neither is
always at least as strong as the other.
First, we recall Theorem 4 from [36].
Theorem 10.2 ([36]). Bounds consistency on the cumulative sums encoding is
incomparable to bounds consistency on the AMONG encoding of SEQUENCE.
Note that, since all variable domains in the A MONG and cumulative sums
encoding are ranges (intervals of integer values), bounds consistency is equivalent
to domain consistency for these encodings.
Proof. We apply the examples from the proof of Theorem 4 in [36]. Consider the
constraint SEQUENCE(X, S = {1}, q = 2, L = 1, U = 2) with the ordered sequence
of binary variables X = {x1, x2, x3, x4} and domains D(xi) = {0, 1} for i = 1, 2, 4,
and D(x3) = {0}. We apply the ‘trivial’ MDD of width 1 representing the Cartesian
product of the variable domains. Establishing MDD consistency on the cumulative
sums encoding yields

    y0 ∈ [0, 0], y1 ∈ [0, 1], y2 ∈ [1, 2], y3 ∈ [1, 2], y4 ∈ [2, 3],
    x1 ∈ {0, 1}, x2 ∈ {0, 1}, x3 ∈ {0}, x4 ∈ {0, 1}.
    y0 ∈ [0, 0], y1 ∈ [0, 0], y2 ∈ [0, 1], y3 ∈ [1, 1], y4 ∈ [1, 1],
    x1 ∈ {0}, x2 ∈ {0, 1}, x3 ∈ {0, 1}, x4 ∈ {0},

while establishing MDD consistency on the AMONG encoding does not prune any
value.
Proof. The first example in the proof of Corollary 10.2 also shows that domain
consistency on SEQUENCE can be stronger than MDD consistency on the cumulative
sums encoding.
To show the opposite, consider a constraint SEQUENCE(X, S = {1}, q, L, U) with
a set of binary variables of arbitrary size, arbitrary values q, L, and U = |X| − 1.
Let M be the MDD defined over X consisting of two disjoint paths from r to t:
the arcs on one path all have label 0, while the arcs on the other all have label 1.
Since the projection onto the variable domains gives x ∈ {0, 1} for all x ∈ X, domain
consistency will not deduce infeasibility. However, establishing MDD consistency
with respect to M on the cumulative sums encoding will detect this.
Even though formally our MDD propagation based on cumulative sums is incomparable to domain propagation of Sequence and MDD propagation of Among
constraints, in the next section we will show that in practice our algorithm can
reduce the search space by orders of magnitude compared with these other methods.
memory issues. We therefore excluded this algorithm from the comparisons in the
sections below.
Because single Sequence constraints can be solved in polynomial time, we
consider instances with multiple Sequence constraints in our experiments. We
assume that these are defined on the same ordered set of variables. To measure
the impact of the different propagation methods correctly, all approaches apply the
same fixed search strategy, i.e., following the given ordering of the variables, with
a lexicographic value ordering heuristic. For each method, we measure the number
of backtracks from a failed search state as well as the solving time. All experiments
are performed using a 2.33 GHz Intel Xeon machine.
q = (rand()%((n/2) − 5)) + 5.
Here, rand() refers to the standard C++ random number generator, i.e., rand()%k
selects a number in the range [0, k − 1]. Without the minimum length of 5, many
of the instances would be very easy to solve by either method. We next define the
difference between l and u as Δ := (rand()%q), and set
l := (rand()%(q − Δ )),
u := l + Δ .
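In Python, this sampling scheme can be sketched as follows (a hypothetical helper, not from the original implementation; `randrange(k)` plays the role of `rand()%k`):

```python
import random

def sample_sequence_parameters(n, seed=None):
    """Sample (q, l, u) for a random Sequence constraint over n variables,
    mirroring q = (rand()%((n/2)-5)) + 5, delta = rand()%q,
    l = rand()%(q - delta), u = l + delta."""
    rng = random.Random(seed)
    q = rng.randrange((n // 2) - 5) + 5   # window length, at least 5
    delta = rng.randrange(q)              # the difference u - l, less than q
    l = rng.randrange(q - delta)          # lower bound on the window count
    u = l + delta                         # upper bound on the window count
    return q, l, u

# For n = 100 this produces 5 <= q <= 49 and 0 <= l <= u < q.
```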
[Plot: number of instances solved (0-250) versus time limit, for MDD width 32, MDD width 2, Domain (Cumulative Sums), Domain (Sequence - HPRS), and Domain (Sequence - Flow).]
Fig. 10.4 Performance comparison of domain and MDD propagators for the Sequence constraint. Each data point reflects the total number of instances that are solved by a particular method
within the corresponding time limit.
[Scatter plots comparing the MDD propagator (width 32) with the domain propagator (cumulative sums): (a) backtracks, (b) time in seconds; TO marks timeouts.]
Fig. 10.5 Comparing domain and MDD propagation for Sequence constraints. Each data point
reflects the number of backtracks (a) resp. solving time in seconds (b) for a specific instance, when
solved with the best domain propagator (cumulative sums encoding) and the MDD propagator with
maximum width 32. Instances for which either method needed 0 backtracks (a) or less than 0.01
seconds (b) are excluded. Here, TO stands for 'timeout' and represents that the specific instance
could not be solved within 1,800 s (b). In (a), these instances are labeled separately by TO (at tickmark 10^8); note that the reported number of backtracks after 1,800 seconds may be much less than
10^8 for these instances. All reported instances with fewer than 10^8 backtracks were solved within
the time limit.
sums) with MDD propagation (maximum width 32). This comparison is particularly
meaningful because both propagation methods rely on the cumulative sums repre-
sentation. For each instance, Fig. 10.5(a) depicts the number of backtracks while
Fig. 10.5(b) depicts the solving time of both methods. The instances that were not
solved within the time limit are collected under ‘TO’ (time out) for that method.
Figure 10.5(a) demonstrates that MDD propagation can lead to dramatic search tree
reductions, by several orders of magnitude. Naturally, the MDD propagation comes
with a computational cost, but Fig. 10.5(b) shows that, for almost all instances
(especially the harder ones), the search tree reductions correspond to faster solving
times, again often by several orders of magnitude.
We next evaluate the impact of increasing maximum widths of the MDD propa-
gator. In Fig. 10.6, we present for each method the ‘survival function’ with respect
to the number of backtracks (a) and solving time (b). Formally, when applied
to combinatorial backtrack search algorithms, the survival function represents the
probability of a run taking more than x backtracks [75]. In our case, we approximate
this function by taking the proportion of instances that need at least x backtracks
[Survival function plots for domain consistency and MDD widths 2, 4, 8, 16, 32, 64, and 128: (a) with respect to backtracks (fails), (b) with respect to time in seconds.]
Fig. 10.6 Evaluating the impact of increased width for MDD propagation via survival function
plots with respect to search backtracks (a) and solving time (b). Both plots are in log–log scale.
Each data point reflects the percentage of instances that require at least that many backtracks (a)
resp. seconds (b) to be solved by a particular method.
(Fig. 10.6(a)), respectively seconds (Fig. 10.6(b)). Observe that these are log-log
plots. With respect to the search tree size, Fig. 10.6(a) clearly shows the strengthen-
ing of the MDD propagation when the maximum width is increased. In particular,
the domain propagation reflects the linear behavior over several orders of magnitude
that is typical for heavy-tailed runtime distributions. Naturally, similar behavior
is present for the MDD propagation, but in a much weaker form for increasing
maximum MDD widths. The associated solving times are presented in Fig. 10.6(b),
which reflects similar behavior but also includes the initial computational overhead
of MDD propagation.
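The empirical survival function used in these plots can be computed directly from the per-instance measurements; a minimal sketch (function name ours):

```python
def survival_function(samples):
    """Empirical survival function of a list of measurements.

    Returns (xs, probs), where probs[i] is the fraction of samples
    that are at least xs[i] (e.g., instances needing >= x backtracks).
    """
    n = len(samples)
    xs = sorted(set(samples))
    probs = [sum(1 for s in samples if s >= x) / n for x in xs]
    return xs, probs

xs, probs = survival_function([1, 10, 10, 1000])
print(xs, probs)   # [1, 10, 1000] [1.0, 0.75, 0.25]
```

Plotting `probs` against `xs` on log-log axes yields curves of the kind shown in Fig. 10.6.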
We next consider the nurse rostering problem defined in Example 10.1, which
represents a more structured problem class. That is, we define a constraint satisfaction problem on variables xi (i = 1, . . . , n), with domains D(xi) = {O, D, E, N}
representing the shift assigned to the nurse on day i. We impose the eight Sequence
constraints modeling the requirements listed in Table 10.1. Owing to the combinatorial nature of this problem, the size of the CP search tree turns out to be largely
independent of the
Table 10.2 Comparing domain propagation and the MDD propagation for Sequence on nurse
rostering instances. Here, n stands for the number of variables, BT for the number of backtracks,
and CPU for solving time in seconds.
length of the time horizon, when a lexicographic search (by increasing day i) is
applied. We do, however, consider instances with various time horizons (n = 40, 60,
80, 100) to address potential scaling issues.
The results are presented in Table 10.2. The columns for 'Domain Sequence'
show the total number of backtracks (BT) and solving time in seconds (CPU) for
the domain-consistent Sequence propagator. Similarly, the columns for 'Domain
Cumul. Sums' show this information for the cumulative sums domain propagation.
The subsequent columns show these numbers for the MDD propagator, for MDDs
of maximum width 1, 2, 4, and 8. Note that propagating an MDD of width 1 corresponds to domain propagation, and indeed the associated number of backtracks is
equivalent to that of the domain propagator of the cumulative sums. As a first observation,
a maximum width of 2 already reduces the number of backtracks by a factor of 8.3.
For a maximum width of 8, the MDD propagation even allows the problem to be solved
without search. The computation times are correspondingly reduced, e.g., from
157 s (resp. 96 s) for the domain propagators to 0.10 s for the MDD propagator
(width 8) for the instance with n = 100. Lastly, we can observe that in this case
MDD propagation does not suffer from scaling issues when compared with domain
propagation.
As a final remark, we also attempted to solve these nurse rostering instances using
the Sequence domain propagator of CP Optimizer (IloSequence). It was able
to solve the instance with n = 40 in 1,150 seconds, but none of the other instances
were solved within the time limit of 1,800 seconds.
[Plot: number of instances solved (0-250) versus time limit, for Sequence with width 2 and Among with widths 128, 32, 8, and 2.]
Fig. 10.7 Performance comparison of MDD propagation for Sequence and Among for various
maximum widths. Each data point reflects the total number of instances that are solved by a
particular method within the corresponding time limit.
In our last experiment, we compare our Sequence MDD propagator with the
MDD propagator for Among constraints by [94]. Our main goal is to determine
whether a large MDD is by itself sufficient to solve these problems (irrespective
of propagating Among or a cumulative sums decomposition), or whether the
additional information obtained by our Sequence propagator makes the difference.
We apply both methods, i.e., MDD propagation for Sequence and MDD
propagation for Among, to the dataset of Section 10.5.1 containing 250 instances.
The time limit is again 1,800 seconds, and we run the propagators with maximum
MDD widths 2, 8, 32, and 128.
We first compare the performance of the MDD propagators for Among and
Sequence in Fig. 10.7. The figure depicts the number of instances that can be
solved within a given time limit for the various methods. The plot indicates that
the Among propagators are much weaker than the Sequence propagator, and
moreover that larger maximum widths alone do not suffice: using the Sequence
[Scatter plots comparing the Sequence MDD propagator with the Among MDD propagator for widths 2, 8, and 32: (a) backtracks, (b) time in seconds; TO marks timeouts.]
Fig. 10.8 Evaluating MDD propagation for Sequence and Among for various maximum widths
via scatter plots with respect to search backtracks (a) and solving time (b). Both plots are in log–log
scale and follow the same format as Fig. 10.5.
propagator with maximum width 2 outperforms the Among propagators for all
maximum widths up to 128.
The scatter plot in Fig. 10.8 compares the MDD propagators for Among and
Sequence in more detail, for widths 2, 8, 32, and 128 (instances that take 0
backtracks, resp. less than 0.01 seconds, for either method are discarded from
Fig. 10.8(a), resp. 10.8(b)). For smaller widths, there are several instances that the
Among propagator can solve faster, but the relative strength of the Sequence
propagator increases with larger widths. For width 128, the Sequence propagator
can achieve search trees and solving times that are orders of magnitude smaller than
those of the Among propagators, which again demonstrates the advantage of MDD
propagation for Sequence when compared with the Among decomposition.
Chapter 11
Sequencing and Single-Machine Scheduling
11.1 Introduction
Sequencing problems are among the most widely studied problems in operations research. They ask for the best order in which to perform a set of tasks, which in
many cases leads to an NP-hard problem [71, Section A5]. Specific variations
include single-machine scheduling, the traveling salesman problem with time windows,
and precedence-constrained machine scheduling. Sequencing problems are prevalent
in manufacturing and routing applications, including production plants where jobs
should be processed one at a time on an assembly line, and mail services where
packages must be scheduled for delivery on a vehicle. Industrial problems that
involve multiple facilities may also be viewed as sequencing problems in certain
scenarios, e.g., when
Job parameters:

    Job   Release (r_j)   Deadline (d_j)   Processing (p_j)
    j1         2               20                 3
    j2         0               14                 4
    j3         1               14                 2

Setup times:

          j1   j2   j3
    j1     -    3    2
    j2     3    -    1
    j3     1    2    -

[The MDD in (b) has root r, terminal t, and nodes u1, . . . , u5, with arcs r-u1 (j2) and r-u2 (j3) on layer π1; u1-u3 (j1), u1-u4 (j3), u2-u4 (j2), and u2-u5 (j1) on layer π2; and u3-t (j3), u4-t (j1), and u5-t (j2) on layer π3.]

Fig. 11.1 (a) Instance data. (b) MDD.
solution where jobs j3 , j2 , and j1 are performed in this order. The completion times
for this solution are c j1 = 15, c j2 = 9, and c j3 = 3. Note that we can never have a
solution where j1 is first on the machine, otherwise either the deadline of j2 or j3
would be violated. Hence, there is no arc a with d(a) = j1 directed out of r.
We next show how to compute the orderings that yield the optimal makespan and
the optimal sum of setup times in polynomial time in the size of M. For the case of
total tardiness and other similar objective functions, we are able to provide a lower
bound on the optimal value, also in polynomial time in the size of M.
• Makespan. For each arc a in M, define the earliest completion time of a, or ect_a,
as the minimum completion time of the job d(a) among all orderings that are
identified by the paths in M containing a. If the arc a is directed out of r, then a
assigns the first job that is processed in such orderings, thus ect_a = r_{d(a)} + p_{d(a)}.
For the remaining arcs, recall that the completion time c_{πi} of a job πi depends
only on the completion time of the previous job πi−1, the setup time t_{πi−1,πi}, and
on the specific job parameters; namely, c_{πi} = max{r_{πi}, c_{πi−1} + t_{πi−1,πi}} + p_{πi}. It
follows that the earliest completion time of an arc a = (u, v) can be computed by
the relation

    ect_a = max{ r_{d(a)}, min{ ect_{a′} + t_{d(a′),d(a)} : a′ ∈ in(u) } } + p_{d(a)}.    (11.1)
• Total tardiness. Since the tardiness function for a job is nondecreasing in its
completion time, we can utilize the earliest completion time as follows: for any
arc a = (u, v), the value max{0, ect_a − δ_{d(a)}} yields a lower bound on the tardiness
of the job d(a) among all orderings that are represented by the paths in M
containing a. Hence, a lower bound on the total tardiness is given by the length
of the shortest path from r to t, where the length of an arc a is set to
max{0, ect_a − δ_{d(a)}}. Observe that this bound is tight if the MDD is composed of
a single path.
We remark that valid bounds for many other types of objective in the scheduling
literature can be computed in an analogous way as above. For example, suppose the
objective is to minimize ∑ j∈J f j (c j ), where f j is a function defined for each job j
and which is nondecreasing in the completion time c j . Then, as in total tardiness, the
value fd(a) (ecta ) for an arc a = (u, v) yields a lower bound on the minimum value
of fd(a) (cd(a) ) among all orderings that are identified by the paths in M containing
a. Using such bounds as arc lengths, the shortest path from r to t represents a lower
bound on ∑ j∈J f j (c j ). This bound is tight if f j (c j ) = c j , or if M is composed of
a single path. Examples of such objectives include weighted total tardiness, total
squared tardiness, sum of (weighted) completion times, and number of late jobs.
Example 11.2. In the instance depicted in Fig. 11.1, we can apply the recurrence re-
lation (11.1) to obtain ectr,u1 = 4, ectr,u2 = 3, ectu1 ,u3 = 10, ectu1 ,u4 = 7, ectu2 ,u4 = 9,
ectu2 ,u5 = 7, ectu3 ,t = 14, ectu4 ,t = 11, and ectu5 ,t = 14. The optimal makespan is
min{ectu3 ,t , ectu4 ,t , ectu5 ,t } = ectu4 ,t = 11; it corresponds to the path (r, u1 , u4 , t),
which identifies the optimal ordering ( j2 , j3 , j1 ). The same ordering also yields the
optimal sum of setup times with a value of 2.
Suppose now that we are given due dates δ j1 = 13, δ j2 = 8, and δ j3 = 3. The
length of an arc a is given by la = max{0, ecta − δd(a) }, as described earlier. We
have lu1 ,u4 = 4, lu2 ,u4 = 1, lu3 ,t = 11, and lu5 ,t = 6; all remaining arcs a are such
that la = 0. The shortest path in this case is (r, u2 , u4 , t) and has a value of 1. The
minimum tardiness, even though it is given by the ordering identified by this same
path, ( j3 , j2 , j1 ), has a value of 3.
The reason for this gap is that the ordering with minimum tardiness does not
necessarily coincide with the schedule corresponding to the earliest completion
time. Namely, we computed lu4 ,t = 0 considering ectu4 ,t = 11, since the completion
time of the job d(u4 , t) = j1 is 11 in ( j2 , j3 , j1 ). However, in the optimal ordering
( j3 , j2 , j1 ) for total tardiness, the completion time of j1 would be 15; this solution
yields a better cost than ( j2 , j3 , j1 ) due to the reduction in the tardiness of j3 .
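The computations of Example 11.2 can be reproduced with a short script (the data and node names follow Fig. 11.1; the data structures themselves are ours):

```python
# Instance data of Fig. 11.1(a) and the due dates of Example 11.2.
r = {'j1': 2, 'j2': 0, 'j3': 1}           # release times
p = {'j1': 3, 'j2': 4, 'j3': 2}           # processing times
due = {'j1': 13, 'j2': 8, 'j3': 3}        # due dates
t = {('j1','j2'): 3, ('j1','j3'): 2, ('j2','j1'): 3,
     ('j2','j3'): 1, ('j3','j1'): 1, ('j3','j2'): 2}

# Arcs (tail, head, label) of the MDD in Fig. 11.1(b), top-down layer order.
arcs = [('r','u1','j2'), ('r','u2','j3'),
        ('u1','u3','j1'), ('u1','u4','j3'), ('u2','u4','j2'), ('u2','u5','j1'),
        ('u3','t','j3'), ('u4','t','j1'), ('u5','t','j2')]

# Earliest completion time per arc via recurrence (11.1), computed top-down.
ect = {}
for a in arcs:
    u, v, j = a
    if u == 'r':
        ect[a] = r[j] + p[j]
    else:
        ect[a] = max(r[j], min(ect[b] + t[(b[2], j)]
                               for b in arcs if b[1] == u)) + p[j]

makespan = min(ect[a] for a in arcs if a[1] == 't')

# Lower bound on total tardiness: shortest r-t path with lengths max{0, ect - due}.
length = {a: max(0, ect[a] - due[a[2]]) for a in arcs}
sp = {'r': 0}
for a in arcs:                            # top-down order: tails already labeled
    u, v, j = a
    sp[v] = min(sp.get(v, float('inf')), sp[u] + length[a])

print(makespan, sp['t'])                  # 11 1
```

The printed values match Example 11.2: an optimal makespan of 11 and a tardiness lower bound of 1 (against the true minimum tardiness of 3).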
11.4 Relaxed MDDs
[Two relaxed MDDs between r and t with layers π1, π2, π3: (a) the 1-width relaxation, with a single node per layer and one arc for each of j1, j2, j3 between consecutive layers; (b) a relaxation of maximum width 2.]
Fig. 11.2 Two relaxed MDDs for the sequencing problem in Fig. 11.1.
We next consider the compilation of relaxed MDDs for sequencing problems, which
represent a superset of the feasible orderings of J . As an illustration, Fig. 11.2(a)
and 11.2(b) present two examples of a relaxed MDD with maximum width W = 1
and W = 2, respectively, for the problem depicted in Fig. 11.1. In particular, the
MDD in Fig. 11.2(a) encodes all the orderings represented by permutations of J
with repetition, hence it trivially contains the feasible orderings of any sequencing
problem. It can be generally constructed as follows: we create one node ui for each
layer Li and connect each consecutive pair of nodes ui and ui+1, i = 1, . . . , n, with
n arcs a1, . . . , an such that d(ak) = jk for each job jk ∈ J.
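A sketch of this construction (function and node names ours):

```python
def one_width_relaxation(jobs):
    """Build the trivial 1-width relaxed MDD: one node per layer,
    and one arc per job between consecutive layers."""
    n = len(jobs)
    nodes = ['r'] + ['u%d' % i for i in range(1, n)] + ['t']
    arcs = [(nodes[i], nodes[i + 1], j) for i in range(n) for j in jobs]
    return nodes, arcs

nodes, arcs = one_width_relaxation(['j1', 'j2', 'j3'])

# The diagram encodes every permutation with repetition: the number of r-t
# paths is the product of the out-degrees, here 3^3 = 27, a superset of the
# 3! = 6 true permutations.
paths = 1
for i in range(len(nodes) - 1):
    paths *= sum(1 for a in arcs if a[0] == nodes[i])
print(paths)   # 27
```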
It can also be verified that the MDD in Fig. 11.2(b) contains all the feasible
orderings of the instance in Fig. 11.1. However, the rightmost path going through
nodes r, u2 , u4 , and t identifies an ordering π = ( j3 , j1 , j1 ), which is infeasible as
job j1 is assigned twice in π .
The procedures in Section 11.3 for computing the optimal makespan and the
optimal sum of setup times now yield a lower bound on such values when applied
to a relaxed MDD, since all feasible orderings of J are encoded in the diagram.
Moreover, the lower bounding technique for total tardiness remains valid.
Considering that a relaxed MDD M can be easily constructed for any sequenc-
ing problem (e.g., the 1-width relaxation of Fig. 11.2(a)), we can now apply the
techniques presented in Section 4.7 and Chapter 9 to incrementally modify M in
order to strengthen the relaxation it provides, while observing the maximum width
W . Under certain conditions, we obtain the reduced MDD representing exactly the
feasible orderings of J , provided that W is sufficiently large.
Recall that we modify a relaxed MDD M by applying the operations of filtering
and refinement, which aim at bringing M closer to an exact MDD, i.e., one that
exactly represents the feasible orderings of J. We revisit these concepts below and
describe them in the context of sequencing problems.
Observe that, if a relaxed MDD M does not have any infeasible arcs and
no nodes require splitting, then by definition M is exact. However, it may not
necessarily be reduced.
As mentioned in Chapter 9, filtering and refinement are independent operations
that can be applied to M in any order that is suitable for the problem at hand. In
this chapter we assume a top-down approach: We traverse layers L2 , . . . , Ln+1 one
at a time in this order. At each layer Li , we first apply filtering to remove infeasible
arcs that are directed to the nodes in Li . After the filtering is complete, we perform
refinement to split the nodes in layer Li as necessary, while observing the maximum
width W .
Example 11.3. Figure 11.3 illustrates the top-down application of filtering and re-
finement for layers L2 and L3 . Assume a scheduling problem with three jobs
J = { j1 , j2 , j3 } and subject to a single precedence constraint stating that job j2
[Three diagrams over layers L1-L4: (a) the initial 1-width relaxation with nodes r, u, v, t and arcs labeled j1, j2, j3 between consecutive layers; (b) the diagram after processing L2, with u split into u1 and u2; (c) the diagram after processing L3, with v split into v1 and v2.]
Fig. 11.3 Example of filtering and refinement. The scheduling problem is such that job j2 must
precede j1 in all feasible orderings. Shaded arrows represent infeasible arcs detected by the
filtering.
must precede job j1 . The initial relaxed MDD is a 1-width relaxation, depicted in
Fig. 11.3(a). Our maximum width is set to W = 2.
We start by processing the incoming arcs at layer L2 . The filtering operation
detects that the arc a ∈ in(u) with d(a) = j1 is infeasible, otherwise we will have
an ordering starting with job j1 , violating the precedence relation. Refinement will
split node u into nodes u1 and u2 , since for any feasible ordering starting with job
j2 , i.e., ( j2 , π ) for some π , the ordering ( j3 , π ) is infeasible as it will necessarily
assign job j3 twice. The resulting MDD is depicted in Fig. 11.3(b). Note that, when
a node is split, we replicate its outgoing arcs to each of the new nodes.
We now process the incoming arcs at layer L3 . The filtering operation detects that
the arc with label j2 directed out of u1 and the arc with label j3 directed out of u2
are infeasible, since the corresponding paths from r to v would yield orderings that
assign some job twice. The arc with label j1 leaving node u2 is also infeasible, since
we cannot have any ordering with prefix ( j3 , j1 ). Finally, refinement will split node v
into nodes v1 and v2 ; note in particular that the feasible orderings prefixed by ( j2 , j3 )
and ( j3 , j2 ) have the same completions, namely ( j1 ), therefore the corresponding
paths end at the same node v1 . The resulting MDD is depicted in Fig. 11.3(c). We
can next process the incoming arcs at layer L4 , and remove arcs with labels j1 and
j2 out of v1 , and arcs with labels j2 and j3 out of v2 .
11.5 Filtering
Lemma 11.1. An arc a = (u, v) with label d(a) is infeasible if either of the following
conditions holds:

    there exists a job j ∈ J \ Some↓u with j ≺ d(a),    (11.3)
    there exists a job j ∈ J \ Some↑v with d(a) ≺ j.    (11.4)
Proof. Let π be any partial ordering identified by a path from r to u, and consider
(11.3). By definition of Some↓u , we have that any job j in the set J \ Some↓u is not
assigned to any position in π . Thus, if any such job j must precede d(a), then all
orderings prefixed by (π , d(a)) will violate this precedence constraint, and the arc
is infeasible. The condition (11.4) is the symmetrical version of (11.3).
Consider now that a deadline d j is imposed for each job j ∈ J. With each arc a we
associate the state ect_a as defined in Section 11.3: it corresponds to the minimum
completion time of the job in the position given by the layer of a, among all
orderings that are identified by paths in M containing the arc a. As in relation
(11.1), the state ect_a
for an arc a = (u, v) is given by the recurrence

    ect_a = r_{d(a)} + p_{d(a)}   if a ∈ out(r),
    ect_a = max{ r_{d(a)}, min{ ect_{a′} + t_{d(a′),d(a)} : a′ ∈ in(u), d(a′) ≠ d(a) } } + p_{d(a)}   otherwise.
Here we added the trivial condition d(a′) ≠ d(a) to strengthen the bound on ect_a
in the relaxed MDD M. We could also include the condition d(a) ⊀ d(a′) if
precedence constraints are imposed over d(a).
Proof. The value lsta + pd(a) represents an upper bound on the maximum time
the job d(a) can be completed so that no deadlines are violated in the orderings
identified by paths in M containing a. Since ecta is the minimum time that job d(a)
will be completed among all such orderings, no feasible ordering identified by a
path traversing a exists if rule (11.5) holds.
three types of objectives: minimize makespan, minimize the sum of setup times,
and minimize total (weighted) tardiness.
Minimize Makespan
Now, for each arc a = (u, v) let st↑a be the minimum possible sum of setup times
incurred by the partial orderings represented by paths from u to t that contain a.
The state st↑a can be recursively computed through a bottom-up traversal of M , as
follows:
    st↑_a = 0   if a ∈ in(t),
    st↑_a = min{ t_{d(a),d(a′)} + st↑_{a′} : a′ ∈ out(v), d(a′) ≠ d(a) }   otherwise.
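On the MDD of Fig. 11.1, this bottom-up recursion can be sketched as follows (names ours):

```python
# Setup times and the arcs of the MDD in Fig. 11.1(b), top-down layer order.
t = {('j1','j2'): 3, ('j1','j3'): 2, ('j2','j1'): 3,
     ('j2','j3'): 1, ('j3','j1'): 1, ('j3','j2'): 2}
arcs = [('r','u1','j2'), ('r','u2','j3'),
        ('u1','u3','j1'), ('u1','u4','j3'), ('u2','u4','j2'), ('u2','u5','j1'),
        ('u3','t','j3'), ('u4','t','j1'), ('u5','t','j2')]

# st_up[a]: minimum setup-time sum over the completions through arc a.
st_up = {}
for a in reversed(arcs):                  # bottom-up traversal
    u, v, j = a
    if v == 't':
        st_up[a] = 0
    else:
        st_up[a] = min(t[(j, b[2])] + st_up[b]
                       for b in arcs if b[0] == v and b[2] != j)

best = min(st_up[a] for a in arcs if a[0] == 'r')
print(best)   # 2, attained by (j2, j3, j1): t(j2,j3) + t(j3,j1) = 1 + 1
```

The result agrees with Example 11.2, where the ordering (j2, j3, j1) yields the optimal sum of setup times, 2.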
To impose an upper bound z∗ on the total tardiness, assume ect_a is computed for
each arc a. We define the length of an arc a as l_a = max{0, ect_a − δ_{d(a)}}. For a node
u, let sp↓u and sp↑u be the lengths of the shortest paths from r to u and from u to t,
respectively, with respect to the lengths l_a. That is,
    sp↓u = 0   if u = r,
    sp↓u = min{ l_a + sp↓v : a = (v, u) ∈ in(u) }   otherwise,

and

    sp↑u = 0   if u = t,
    sp↑u = min{ l_a + sp↑v : a = (u, v) ∈ out(u) }   otherwise.
Lemma 11.4. A node u should be removed from M if sp↓u + sp↑u > z∗.
Proof. The length l_a represents a lower bound on the tardiness of job d(a) with respect
to solutions identified by r–t paths that contain a. Thus, sp↓u and sp↑u are lower
bounds on the total tardiness for the partial orderings identified by paths from r to u
and from u to t, respectively, since the tardiness of a job is nondecreasing in its
completion time. Hence, if sp↓u + sp↑u > z∗, every ordering identified by a path
through u exceeds the imposed bound z∗.
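Continuing Example 11.2, with the arc lengths computed there and a hypothetical upper bound z∗ = 2, the rule of Lemma 11.4 can be applied as follows (names ours):

```python
# Arc lengths l_a = max{0, ect_a - due} from Example 11.2, top-down order.
length = {('r','u1','j2'): 0, ('r','u2','j3'): 0,
          ('u1','u3','j1'): 0, ('u1','u4','j3'): 4,
          ('u2','u4','j2'): 1, ('u2','u5','j1'): 0,
          ('u3','t','j3'): 11, ('u4','t','j1'): 0, ('u5','t','j2'): 6}
arcs = list(length)

sp_down, sp_up = {'r': 0}, {'t': 0}
for (u, v, j) in arcs:            # top-down: tails are labeled first
    sp_down[v] = min(sp_down.get(v, float('inf')), sp_down[u] + length[(u, v, j)])
for (u, v, j) in reversed(arcs):  # bottom-up: heads are labeled first
    sp_up[u] = min(sp_up.get(u, float('inf')), sp_up[v] + length[(u, v, j)])

z_star = 2                        # hypothetical upper bound on total tardiness
removable = sorted(u for u in sp_down if u not in ('r', 't')
                   and sp_down[u] + sp_up[u] > z_star)
print(removable)                  # ['u1', 'u3', 'u5']
```

Since the minimum tardiness of this instance is 3, every node eventually becomes removable under z∗ = 2; the shortest-path bound detects three of them immediately.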
Given a set of precedence relations for a problem (e.g., that were possibly derived
from other relaxations), we can use the filtering rules (11.3) and (11.4) from
Section 11.5.2 to strengthen a relaxed MDD. In this section, we show that a converse
relation is also possible. Namely, given a relaxed MDD M , we can deduce all
precedence relations that are satisfied by the partial orderings represented by M
in polynomial time in the size of M . To this end, assume that the states All↓u , All↑u ,
Some↓u , and Some↑u as described in Section 11.5.1 are computed for all nodes u in
M . We have the following results:
Theorem 11.1. Let M be an MDD that exactly identifies all the feasible orderings
of J. A job j must precede job j′ in any feasible ordering if and only if either
j ∈ All↓u or j′ ∈ All↑u for all nodes u in M.
Proof. Suppose there exists a node u in layer Li, i ∈ {1, . . . , n+1}, such that j ∉ All↓u
and j′ ∉ All↑u. By definition, there exists a path (r, . . . , u, . . . , t) that identifies an
ordering where job j′ starts before job j. Such a node u exists if and only if it is not
the case that j precedes j′ in every feasible ordering.
Corollary 11.1. The set of all precedence relations that must hold in any feasible
ordering can be extracted from M in O(n^2 |M|).
Proof. It follows from the state definitions that All↓u ⊆ Some↓u and All↑u ⊆ Some↑u.
Hence, if the conditions for the relation j ≺ j′ from Theorem 11.1 are satisfied by
Some↓u and Some↑u, they must be also satisfied by any MDD which only identifies
feasible orderings.
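As an illustration, consider the exact MDD of the precedence-constrained problem from Example 11.3 (node names ours); computing the All↓ and All↑ states recovers the imposed relation j2 ≺ j1:

```python
# Exact MDD whose r-t paths are (j2,j1,j3), (j2,j3,j1), (j3,j2,j1).
arcs = [('r','a','j2'), ('r','b','j3'),
        ('a','c','j1'), ('a','d','j3'), ('b','d','j2'),
        ('c','t','j3'), ('d','t','j1')]
jobs = ['j1', 'j2', 'j3']

# All_down[u]: jobs on every r-u path; All_up[u]: jobs on every u-t path.
all_down = {'r': frozenset()}
for (u, v, j) in arcs:                    # arcs are in top-down layer order
    s = all_down[u] | {j}
    all_down[v] = s if v not in all_down else all_down[v] & s
all_up = {'t': frozenset()}
for (u, v, j) in reversed(arcs):
    s = all_up[v] | {j}
    all_up[u] = s if u not in all_up else all_up[u] & s

# Theorem 11.1: j precedes j' in every ordering iff, at every node u,
# j is in All_down[u] or j' is in All_up[u].
prec = [(j, k) for j in jobs for k in jobs if j != k
        and all(j in all_down[u] or k in all_up[u] for u in all_down)]
print(prec)                               # [('j2', 'j1')]
```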
11.7 Refinement
job follows from the problem data. More specifically, we will develop a heuristic for
refinement that, when combined with the infeasibility conditions for the permutation
structure described in Section 11.5.1, yields a relaxed MDD where the jobs with a
high priority are represented exactly with respect to that structure; that is, these jobs
are assigned to exactly one position in all orderings encoded by the relaxed MDD.
We also take care that a given maximum width W is observed when creating new
nodes in a layer.
Thus, if higher priority is given to jobs that play a greater role in the feasibility
or optimality of the sequencing problem at hand, the relaxed MDD may represent
more accurately the feasible orderings of the problem, providing, e.g., better bounds
on the objective function value. For example, suppose we wish to minimize the
makespan on an instance where certain jobs have very large release dates and
processing times in comparison with other jobs. If we construct a relaxed MDD
where these longer jobs are assigned exactly once in all orderings encoded by the
MDD, the bound on the makespan would be potentially tighter with respect to the
ones obtained from other possible relaxed MDDs for this same instance. Examples
of job priorities for other objective functions are presented in Section 11.9. Recall
from Section 4.7.1.2 that the refinement heuristic requires a ranking of jobs J ∗ =
{ j1∗ , . . . , jn∗ }, where jobs with smaller index in J ∗ have higher priority.
We note that the refinement heuristic also yields a reduced MDD M for certain
structured problems, given a sufficiently large width. The following corollary, stated
without proof, is directly derived from Lemma 4.4 and Theorem 4.3.
Corollary 11.3. Assume W = +∞. For a sequencing problem having only prece-
dence constraints, the relaxed MDD M that results from the constructive proof of
Theorem 4.3 is a reduced MDD that exactly represents the feasible orderings of this
problem.
Lastly, recall that equivalence classes corresponding to constraints other than the
permutation structure may also be taken into account during refinement. Therefore,
if the maximum width W is not met in the refinement procedure above, we assume
that we will further split nodes by arbitrarily partitioning their incoming arcs. Even
though this may yield false equivalence classes, the resulting M is still a valid
relaxation and may provide a stronger representation.
11.8 Encoding Size for Structured Precedence Relations
The actual constraints that define a problem instance greatly impact the size of an
MDD. If these constraints carry a particular structure, we may be able to compactly
represent that structure in an MDD, perhaps enabling us to bound its width.
In this section we present one such case for a problem class introduced by [14], in
which jobs are subject to discrepancy precedence constraints: for a fixed parameter
k ∈ {1, . . . , n}, the relation j_p ≺ j_q must be satisfied for any two jobs j_p, j_q ∈ J
if q ≥ p + k. This precedence structure was motivated by a real-world application
in steel rolling mill scheduling. The work by [15] also demonstrates how solution
methods for this class of problems can serve as auxiliary techniques in other cases,
for example, as heuristics for the TSP and vehicle routing with time windows.
We stated in Corollary 11.3 that we are able to construct the reduced MDD
M when only precedence constraints are imposed and a sufficiently large W is
given. We have the following results for M if the precedence relations satisfy the
discrepancy structure for a given k:
Lemma 11.5. We have All↓v ⊆ { j1 , . . . , jmin{m+k−1, n} } for any given node v ∈ Lm+1 ,
m = 1, . . . , n.
Proof. Let us first assume n ≥ k + 2 and restrict our attention to layer Lm+1 for some
m ∈ {k, . . . , n − k + 1}. Also, let F := {All↓u : u ∈ Lm+1 }. It can be shown that, if M
is reduced, no two nodes u, v ∈ Lm+1 are such that All↓u = All↓v . Thus, |F | = |Lm+1 |.
We derive the cardinality of F as follows: Take All↓v ∈ F for some v ∈ Lm+1 .
Since |All↓v | = m, there exists at least one job ji ∈ All↓v such that i ≥ m. According
to Lemma 11.5, the maximum index of a job in All↓v is m + k − 1. So consider the
jobs indexed by m + k − 1 − l for l = 0, . . . , k − 1; at least one of them is necessarily
contained in All↓v . Due to the discrepancy precedence constraints, jm+k−1−l ∈ All↓v
implies that any ji with i ≤ m − l − 1 is also contained in All↓v (if m − l − 1 > 0).
We can use an analogous argument for the layers Lm+1 such that m < k or
m > n − k + 1, or when k = n − 1. The main technical difference is that we have
fewer than k − 1 possibilities for the new combinations, and so the maximum number
of nodes is strictly less than 2^(k−1) for these cases. Thus the width of M is 2^(k−1).
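For small instances, this width bound can be checked by brute force over the feasible permutations (a verification sketch; names ours):

```python
from itertools import permutations

def max_layer_width(n, k):
    """Number of distinct prefix job-sets (the All_down states of the reduced
    MDD) in the widest layer, under the discrepancy precedences
    j_p before j_q whenever q >= p + k."""
    feasible = [perm for perm in permutations(range(1, n + 1))
                if all(perm.index(p) < perm.index(q)
                       for p in range(1, n + 1) for q in range(p + k, n + 1))]
    return max(len({frozenset(perm[:m]) for perm in feasible})
               for m in range(1, n + 1))

print(max_layer_width(5, 2), max_layer_width(6, 3))   # 2 4
```

Both values equal 2^(k−1), in agreement with the bound derived above.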
11.9 Application to Constraint-Based Scheduling

We next describe how the techniques of the previous sections can be added to IBM
ILOG CP Optimizer (CPO), a state-of-the-art general-purpose constraint program-
ming solver. In particular, it contains dedicated syntax and associated propagation
algorithms for sequencing and scheduling problems. Given a sequencing problem as
considered in this chapter, CPO applies a depth-first branch-and-bound search where
jobs are recursively appended to the end of a partial ordering until no jobs are left
unsequenced. At each node of the branching tree, a number of sophisticated propa-
gation algorithms are used to reduce the possible candidate jobs to be appended to
Three formulations were considered for each problem: a CPO model with its
default propagators, denoted by CPO; a CPO model containing only the MDD-based
propagator, denoted by MDD; and a CPO model with the default and MDD-based
propagators combined, denoted by CPO+MDD. The experiments mainly focus on the
comparison between CPO and CPO+MDD, as these indicate whether incorporating
the MDD-based propagator can enhance existing methods.
We have considered two heuristic strategies for selecting the next job to be
appended to a partial schedule. The first, denoted by lex search, is a static method
that always tries to first sequence the job with the smallest index, where the index of
a job is fixed per instance and defined by the order in which it appears in the input.
This allows for a more accurate comparison between two propagation methods,
since the branching tree is fixed. In the second strategy, denoted by dynamic search,
the CPO engine automatically selects the next job according to its own state-of-the-
art scheduling heuristics. The purpose of the experiments that use this search is to
verify how the MDD-based propagator is influenced by strategies that are known to
be effective for constraint-based solvers. The dynamic search is only applicable to
CPO and CPO+MDD.
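To make the lex-search scheme concrete, here is a minimal pure-Python sketch of a depth-first branch and bound that always tries the unsequenced job with the smallest index first, counting a fail whenever a partial ordering is pruned. The three-job instance, its time windows, and the simple feasibility check are our own assumptions for illustration; CPO's actual propagation at each node is far more sophisticated.

```python
import math

def lex_search(release, deadline, proc, setup):
    """Toy depth-first branch and bound with lex branching: minimize the
    sum of setup times subject to time windows.  A 'fail' is counted
    whenever a candidate extension violates a time window or cannot
    improve on the incumbent.  (Illustrative sketch only.)"""
    n = len(proc)
    best = [math.inf]
    fails = [0]

    def dfs(seq, last, time, cost):
        if len(seq) == n:
            best[0] = min(best[0], cost)
            return
        for j in range(n):                       # lex: smallest index first
            if j in seq:
                continue
            travel = setup[last][j] if last is not None else 0
            start = max(time + travel, release[j])
            if start > deadline[j] or cost + travel >= best[0]:
                fails[0] += 1                    # pruned: count as a fail
                continue
            dfs(seq | {j}, j, start + proc[j], cost + travel)

    dfs(frozenset(), None, 0, 0)
    return best[0], fails[0]

# Hypothetical 3-job instance; setup[i][j] is the setup time from i to j.
release, deadline = [0, 0, 0], [10, 10, 10]
proc = [1, 1, 1]
setup = [[0, 2, 9], [2, 0, 3], [9, 3, 0]]
best, fails = lex_search(release, deadline, proc, setup)
```

Because the branching order is fixed by the job indices, two propagation methods run on this sketch would explore the same tree, differing only in how many nodes they prune — which is exactly why lex search isolates the inference strength of a propagator.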
We measure two performance indicators: the total solving time and the number
of fails. The number of fails corresponds to the number of times during search that a
partial ordering was detected to be infeasible, i.e., either some constraint is violated
or the objective function value exceeds a known upper bound. The number of fails
grows with the size of the branching tree and is therefore a good indicator of the
total solving time of a particular technique.
The techniques presented here do not exploit any problem structure beyond what
is described in this chapter, such as specific search heuristics, problem
relaxations, or dominance criteria (except insofar as such structure is already
exploited by CPO). More specifically, we used the same MDD-based propagator for all
problems, which dynamically determines what node state and refinement strategy
to use according to the input constraints and the objective function.
The experiments were performed on a computer equipped with an Intel Xeon
E5345 at 2.33 GHz with 8 GB RAM. The MDD code was implemented in C++ using
the CPO callable library from ILOG CPLEX Academic Studio V.12.4.01. We set
the following additional CPO parameters for all experiments: Workers=1, to use a
single computer core; DefaultInferenceLevel=Extended, to use the max-
imum possible propagation available in CPO; and SearchType=DepthFirst.
We first investigate the impact of the maximum width and refinement on the number
of fails and total solving time for the MDD approaches. As a representative test
case, we consider the traveling salesman problem with time windows (TSPTW).
The TSPTW is the problem of finding a minimum-cost tour in a weighted digraph
starting from a selected vertex (the depot), visiting each vertex within a given time
window, and returning to the original vertex. In our case, each vertex is a job, the
release dates and deadlines are defined according to the vertex time windows, and
[Figure: two log-log plots against the maximum MDD width (4 to 1024): (a) number of fails, (b) time (s).]
Fig. 11.4 Impact of the MDD width on the number of fails and total time for the TSPTW instance
n20w200.001 from the Gendreau class. The axes are in logarithmic scale.
travel distances are perceived as setup times. The objective function is to minimize
the sum of setup times.
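Under these conventions, the feasibility and cost of a fixed job sequence can be checked in a few lines. The helper below is our own sketch with hypothetical data: travel distances act as setup times, and each job must start within its time window.

```python
def evaluate_tour(seq, release, deadline, proc, setup):
    """Return (feasible, sum_of_setups) for a fixed sequence of jobs:
    each job starts no earlier than its release date, no later than its
    deadline, and travel distances are charged as setup times."""
    time, cost, last = 0, 0, None
    for j in seq:
        travel = setup[last][j] if last is not None else 0
        start = max(time + travel, release[j])
        if start > deadline[j]:
            return False, cost          # deadline missed: infeasible
        time, cost, last = start + proc[j], cost + travel, j
    return True, cost

# Hypothetical 3-job instance: [release, deadline] are the time windows.
release, deadline = [0, 0, 0], [10, 10, 10]
proc = [1, 1, 1]
setup = [[0, 2, 9], [2, 0, 3], [9, 3, 0]]
feasible, cost = evaluate_tour([0, 1, 2], release, deadline, proc, setup)
```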
We selected the instance n20w200.001 from the well-known Gendreau bench-
mark proposed by [72], as it represents the typical behavior of an MDD. It consists
of a 20-vertex graph with an average time window width of 200 units. The tested
approach was the MDD model with lex search. We used the following job ranking
for the refinement strategy described in Section 11.7: The first job in the ranking,
j_1^*, was set as the first job of the input. The i-th job in the ranking, j_i^*, is the
one that maximizes the sum of the setup times to the jobs already ranked, i.e.,
j_i^* = arg max_{p ∈ J \ {j_1^*, ..., j_{i−1}^*}} ∑_{k=1}^{i−1} t_{j_k^*, p} for the setup times t. The intuition is that
we want jobs with largest travel distances to be exactly represented in M.
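The arg-max ranking rule can be sketched directly. The function below is our illustration (the setup-time matrix is hypothetical); it reproduces the greedy rule: start from the first job of the input, then repeatedly rank the job with the largest total setup time to the jobs already ranked.

```python
def refinement_ranking(t, first=0):
    """Greedy refinement ranking: start from job `first`, then repeatedly
    pick the unranked job p maximizing sum(t[j][p]) over the jobs j
    already ranked (the arg-max rule above)."""
    n = len(t)
    ranked = [first]
    while len(ranked) < n:
        p = max((q for q in range(n) if q not in ranked),
                key=lambda q: sum(t[j][q] for j in ranked))
        ranked.append(p)
    return ranked

# Hypothetical symmetric setup-time matrix for four jobs.
t = [[0, 1, 9, 2],
     [1, 0, 4, 3],
     [9, 4, 0, 5],
     [2, 3, 5, 0]]
ranking = refinement_ranking(t)   # job 2 ranks second: largest distance to job 0
```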
The number of fails and total time to find the optimal solution for different
MDD widths are presented in Fig. 11.4. Due to the properties of the refinement
technique in Theorem 4.3, we consider only powers of 2 as widths. We note from
Fig. 11.4(a) that the number of fails decreases rapidly as the width increases, up
to a point where it becomes nearly constant (from 512 to 1024). This indicates that,
at a certain point, the relaxed MDD is very close to an actual exact representation
of the problem, and hence no benefit is gained from any increment of the width.
The number of fails has a direct impact on the total solving time, as observed in
Fig. 11.4(b). Namely, the times decrease accordingly as the width increases. At
[Figure: box-and-whisker plots, per width, of the ratios of the number of fails (and of time) of random refinements to those of the structured refinement.]
Fig. 11.5 Performance comparison between random and structured refinement strategies for the
TSPTW instance n20w200.001. The axes are in logarithmic scale.
the point where the relaxed MDD is close to exact, larger widths only introduce
additional overhead, thus increasing the solving time.
To analyze the impact of the refinement, we generated 50 job rankings uniformly
at random for the refinement strategy described in Section 11.7. These rankings were
compared with the structured one for setup times used in the previous experiment.
To make this comparison, we solved the MDD model with lex search for each of
the 51 refinement orderings, considering widths from 4 to 1024. For each random
order, we divided the resulting number of fails and time by the ones obtained
with the structured refinement for the same width. Thus, this ratio represents how
much better the structured refinement performs than the random strategies. The results are
presented in the box-and-whisker plots of Fig. 11.5. For each width the horizontal
lines represent, from top to bottom, the maximum observed ratio, the upper quartile,
the median ratio, the lower quartile, and the minimum ratio.
We interpret Fig. 11.5 as follows: An MDD with very small width captures
little of the structure of the jobs that play the most important role in the optimality
or feasibility of the problem, in view of Theorem 4.3. Thus, distinct refinement strategies are
not expected to differ much on average, as shown, e.g., in the width-4 case of
Fig. 11.5(a). As the width increases, there is a higher chance that these crucial jobs
are better represented by the MDD, leading to a good relaxation, but also a higher
chance that little of their structure is captured by a random strategy, leading in turn
to a weak relaxation. This yields a larger variance in the refinement performance.
Finally, for sufficiently large widths, we end up with an almost exact representation
of the problem and the propagation is independent of the refinement order (e.g.,
widths 512 and 1024 of Fig. 11.5(a)). Another aspect we observe in Fig. 11.5(b)
is that, even for relatively small widths, the structured refinement can be orders of
magnitude better than a random one. This emphasizes the importance of applying
an appropriate refinement strategy for the problem at hand.
[Figure: two log-log scatter plots comparing CPO (horizontal axis) with CPO+MDD, width 16 (vertical axis): (a) number of fails, (b) time (s).]
Fig. 11.6 Performance comparison between CPO and CPO+MDD for minimizing sum of setup
times on Dumas, Gendreau, and Ascheuer TSPTW classes with lex search. The vertical and
horizontal axes are in logarithmic scale.
[Figure: (a) log-log scatter plot of solving times, CPO versus CPO+MDD with width 1024; (b) number of instances solved over time (0–1800 s) for CPO and CPO+MDD with width 1024.]
Fig. 11.7 Performance comparison between CPO and CPO+MDD for minimizing sum of setup
times on Dumas, Gendreau, and Ascheuer TSPTW classes using default depth-first CPO search.
The horizontal and vertical axes in (a) are in logarithmic scale.
1 Since the TSPLIB results are not updated on the TSPLIB website, we report updated bounds
obtained from [92], [76], and [6].
Table 11.1 Results on ATSPP instances. Values in bold represent instances solved for the first
time. TL represents that the time limit (1,800 s) was reached.
                                            CPO                CPO+MDD, width 2048
Instance   Vertices   Bounds               Best     Time (s)   Best     Time (s)
br17.10    17         55                   55       0.01       55       4.98
br17.12    17         55                   55       0.01       55       4.56
ESC07      7          2125                 2125     0.01       2125     0.07
ESC25      25         1681                 1681     TL         1681     48.42
p43.1      43         28140                28205    TL         28140    287.57
p43.2      43         [28175, 28480]       28545    TL         28480    279.18
p43.3      43         [28366, 28835]       28930    TL         28835    177.29
p43.4      43         83005                83615    TL         83005    88.45
ry48p.1    48         [15220, 15805]       18209    TL         16561    TL
ry48p.2    48         [15524, 16666]       18649    TL         17680    TL
ry48p.3    48         [18156, 19894]       23268    TL         22311    TL
ry48p.4    48         [29967, 31446]       34502    TL         31446    96.91
ft53.1     53         [7438, 7531]         9716     TL         9216     TL
ft53.2     53         [7630, 8026]         11669    TL         11484    TL
ft53.3     53         [9473, 10262]        12343    TL         11937    TL
ft53.4     53         14425                16018    TL         14425    120.79
[Figure: ratios of CPO to CPO+MDD (width 16) performance, (a) number of fails and (b) time, with one curve for each α ∈ {0.25, 0.50, 0.75}.]
Fig. 11.8 Comparison between CPO and CPO+MDD for minimizing makespan on three instances
with randomly generated setup times. The vertical axes are in logarithmic scale.
{0, . . . , (50.5)^β}, where β ∈ {0, 0.5, 1, . . . , 4}. In total, 10 instances are generated for
each β . We computed the number of fails and total time to minimize the makespan
using CPO and CPO+MDD models with a maximum width of 16, applying a lex
search in both cases. We then divided the CPO results by the CPO+MDD results,
and computed the average ratio for each value of β . The job ranking for refinement
is done by sorting the jobs in decreasing order according to the value obtained by
summing their release dates with their processing times. This forces jobs with larger
completion times to have higher priority in the refinement.
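This ranking is a one-line sort; the sketch below (with made-up data) orders jobs in decreasing order of release date plus processing time.

```python
def makespan_refinement_ranking(release, proc):
    """Rank jobs in decreasing order of release date + processing time,
    giving refinement priority to jobs with larger completion times."""
    return sorted(range(len(proc)),
                  key=lambda j: release[j] + proc[j], reverse=True)

# Hypothetical data: job 2 has the largest release + processing value.
ranking = makespan_refinement_ranking([0, 5, 2], [3, 1, 10])
```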
The results are presented in Fig. 11.8. For each value of α , we plot the ratio
of CPO and CPO+MDD in terms of the number of fails (Fig. 11.8(a)) and time
(Fig. 11.8(b)). The plot in Fig. 11.8(a) indicates that the CPO+MDD inference
becomes more dominant in comparison with CPO for larger values of β , that is,
when setup times become more important. The MDD introduces a computational
overhead in comparison with the CPO times (around 20 times slower for this
particular problem size). This is compensated as β increases, since the number of
fails for the CPO+MDD model becomes orders of magnitude smaller in comparison
with CPO. The same behavior was observed on average for other base instances
generated under the same scheme.
To evaluate this on structured instances, we consider the TSPTW instances
defined by the Gendreau and Dumas benchmark classes, where we changed the
objective function to minimize makespan instead of the sum of setup times. We
[Figure: two log-log scatter plots comparing CPO (horizontal axis) with CPO+MDD, width 16 (vertical axis): (a) number of fails, (b) time (s).]
Fig. 11.9 Performance comparison between CPO and CPO+MDD for minimizing makespan on
Dumas and Gendreau TSPTW classes. The vertical and horizontal axes are in logarithmic scale.
selected all instances with up to 100 jobs, yielding 240 test cases in total. We solved
the CPO and the CPO+MDD models with lex search, so as to compare the inference
strength for these problems. A maximum width of 16 was set for CPO+MDD, and a
time limit of 1,800 seconds was imposed for both cases. The job ranking is the same
as in the previous experiment.
The CPO approach was able to solve 211 instances to optimality, while the
CPO+MDD approach solved 227 instances to optimality (including all the instances
solved by CPO). The number of fails and solving time are presented in Fig. 11.9,
where we depict only instances solved by both methods. In general, for easy
instances (up to 40 jobs, or with small time window widths), the reduction in
the number of fails induced by CPO+MDD was not significant and thus did not
compensate for the computational overhead introduced by the MDD. However, the MDD
showed better performance on harder instances; the lower diagonal
of Fig. 11.9(b) is mostly composed of instances from the Gendreau class with
larger time windows, for which the number of fails was reduced by five to six
orders of magnitude. We also note that the improvement for the makespan objective is less
pronounced than for the sum of setup times presented in Section 11.9.3.
[Figure: number of instances solved over time (0–1800 s), in two panels, for CPO and for CPO+MDD with widths 16, 32, 64, and 128.]
Fig. 11.10 Performance comparison between CPO and CPO+MDD for minimizing total tardiness
on randomly generated instances with 15 jobs.
In all cases, a width of 128 sufficed for the MDD propagation to
provide enough inference to solve all of the considered problems.
References
[1] E. Aarts and J. K. Lenstra. Local Search in Combinatorial Optimization.
John Wiley & Sons, New York, 1997.
[2] H. Abeledo, R. Fukasawa, A. Pessoa, and E. Uchoa. The time dependent
traveling salesman problem: polyhedra and algorithm. Mathematical Pro-
gramming Computation, 5(1):27–55, 2013.
[3] S. B. Akers. Binary decision diagrams. IEEE Transactions on Computers,
C-27:509–516, 1978.
[4] H. R. Andersen, T. Hadžić, J. N. Hooker, and P. Tiedemann. A constraint
store based on multivalued decision diagrams. In Principles and Practice of
Constraint Programming (CP 2007), volume 4741 of LNCS, pages 118–132.
Springer, 2007.
[5] H. R. Andersen, T. Hadžić, and D. Pisinger. Interactive cost configuration
over decision diagrams. Journal of Artificial Intelligence Research, 37:99–
139, 2010.
[6] D. Anghinolfi, R. Montemanni, M. Paolucci, and L. M. Gambardella. A
hybrid particle swarm optimization approach for the sequential ordering
problem. Computers & Operations Research, 38(7):1076–1085, 2011.
[7] K. R. Apt. Principles of Constraint Programming. Cambridge University
Press, 2003.
[8] S. Arnborg, D. G. Corneil, and A. Proskurowski. Complexity of finding em-
beddings in a k-tree. SIAM Journal on Algebraic and Discrete Mathematics,
8:277–284, 1987.
[9] S. Arnborg and A. Proskurowski. Characterization and recognition of partial
k-trees. SIAM Journal on Algebraic and Discrete Mathematics, 7:305–314,
1986.
[10] N. Ascheuer. Hamiltonian Path Problems in the On-line Optimization of
Flexible Manufacturing Systems. PhD thesis, Technische Universität Berlin,
Germany, 1995.
[11] N. Ascheuer, M. Jünger, and G. Reinelt. A branch and cut algorithm for
the asymmetric traveling salesman problem with precedence constraints.
Computational Optimization and Applications, 17:61–84, 2000.
[12] K. R. Baker and B. Keller. Solving the single-machine sequencing problem
using integer programming. Computers and Industrial Engineering, 59:730–
735, 2010.
© Springer International Publishing Switzerland 2016
D. Bergman et al., Decision Diagrams for Optimization, Artificial Intelligence:
Foundations, Theory, and Algorithms, DOI 10.1007/978-3-319-42849-9
236 References
[26] D. Bergman, A. A. Ciré, W.-J. van Hoeve, and J. N. Hooker. Discrete opti-
mization with binary decision diagrams. INFORMS Journal on Computing,
28:47–66, 2016.
[27] D. Bergman, A. A. Ciré, W.-J. van Hoeve, and T. Yunes. BDD-based
heuristics for binary optimization. Journal of Heuristics, 20(2):211–234,
2014.
[28] D. Bergman, W.-J. van Hoeve, and J. N. Hooker. Manipulating MDD
relaxations for combinatorial optimization. In T. Achterberg and C. Beck,
editors, CPAIOR Proceedings, volume 6697 of LNCS. Springer, 2011.
[29] U. Bertele and F. Brioschi. Nonserial Dynamic Programming. Academic
Press, New York, 1972.
[30] T. Berthold. Primal heuristics for mixed integer programs. Master’s thesis,
Zuse Institute Berlin, 2006.
[31] D. Bertsimas, D. A. Iancu, and D. Katz. A new local search algorithm
for binary optimization. INFORMS Journal on Computing, 25(2):208–221,
2013.
[32] C. Bessiere. Constraint propagation. In F. Rossi, P. van Beek, and T. Walsh,
editors, Handbook of Constraint Programming, pages 29–83. Elsevier, 2006.
[33] R. E. Bixby. A brief history of linear and mixed-integer programming
computation. Documenta Mathematica. Extra Volume: Optimization Stories,
pages 107–121, 2012.
[34] B. Bloom, D. Grove, B. Herta, A. Sabharwal, H. Samulowitz, and
V. Saraswat. SatX10: A scalable plug & play parallel SAT framework. In
SAT Proceedings, volume 7317 of LNCS, pages 463–468. Springer, 2012.
[35] B. Bollig and I. Wegener. Improving the variable ordering of OBDDs is NP-
complete. IEEE Transactions on Computers, 45:993–1002, 1996.
[36] S. Brand, N. Narodytska, C.G. Quimper, P. Stuckey, and T. Walsh. Encod-
ings of the sequence constraint. In Principles and Practice of Constraint
Programming (CP 2007), volume 4741 of LNCS, pages 210–224. Springer,
2007.
[37] R. E. Bryant. Graph-based algorithms for boolean function manipulation.
IEEE Transactions on Computers, C-35:677–691, 1986.
[38] N. J. Calkin and H. S. Wilf. The number of independent sets in a grid graph.
SIAM Journal on Discrete Mathematics, 11(1):54–60, 1998.
[39] A. Caprara, M. Fischetti, and P. Toth. Algorithms for the set covering
problem. Annals of Operations Research, 98:2000, 1998.
[127] E. Piñana, I. Plana, V. Campos, and R. Martı́. GRASP and path relinking
for the matrix bandwidth minimization. European Journal of Operational
Research, 153(1):200–210, 2004.
[128] M. Pinedo. Scheduling: Theory, Algorithms and Systems. Prentice Hall, 3rd
edition, 2008.
[129] W. B. Powell. Approximate Dynamic Programming: Solving the Curses of
Dimensionality. Wiley, 2nd edition, 2011.
[130] W. Pullan, F. Mascia, and M. Brunato. Cooperating local search for the
maximum clique problem. Journal of Heuristics, 17(2):181–199, 2011.
[131] P. Refalo. Learning in search. In P. Van Hentenryck and M. Milano, editors,
Hybrid Optimization: The Ten Years of CPAIOR, pages 337–356. Springer,
2011.
[132] J.-C. Régin. AC-*: A configurable, generic and adaptive arc consistency
algorithm. In Proceedings of CP, volume 3709 of LNCS, pages 505–519.
Springer, 2005.
[133] J.-C. Régin. Global constraints: A survey. In P. Van Hentenryck and
M. Milano, editors, Hybrid Optimization: The Ten Years of CPAIOR, pages
63–134. Springer, 2011.
[134] J.-C. Régin and J.-F. Puget. A filtering algorithm for global sequencing
constraints. In Principles and Practice of Constraint Programming (CP
1997), volume 1330 of LNCS, pages 32–46. Springer, 1997.
[135] J.-C. Régin, M. Rezgui, and A. Malapert. Embarrassingly parallel search. In
Principles and Practice of Constraint Programming (CP 2013), volume 8124
of LNCS, pages 596–610. Springer, 2013.
[136] A. Rendl, M. Prandtstetter, G. Hiermann, J. Puchinger, and G. Raidl. Hybrid
heuristics for multimodal homecare scheduling. In CPAIOR Proceedings,
volume 7298 of LNCS, pages 339–355. Springer, 2012.
[137] F. Rossi, P. van Beek, and T. Walsh, editors. Handbook of Constraint
Programming. Elsevier, 2006.
[138] S. Sanner and D. McAllester. Affine algebraic decision diagrams (AADDs)
and their application to structured probabilistic inference. In Proceedings, In-
ternational Joint Conference on Artificial Intelligence (IJCAI), pages 1384–
1390, 2005.
[139] V. Saraswat, B. Bloom, I. Peshansky, O. Tardieu, and D. Grove. Report on
the experimental language, X10. Technical report, IBM Research, 2011.
[153] W.-J. van Hoeve, G. Pesant, L.-M. Rousseau, and A. Sabharwal. New filtering
algorithms for combinations of among constraints. Constraints, 14:273–292,
2009.
[154] P. Vilı́m. O(n log n) filtering algorithms for unary resource constraint. In J.-C.
Régin and M. Rueher, editors, CPAIOR Proceedings, volume 3011 of LNCS,
pages 335–347. Springer, 2004.
[155] P. Vilı́m, P. Laborie, and P. Shaw. Failure-directed search for constraint-based
scheduling. In CPAIOR Proceedings, volume 9075 of LNCS, pages 437–453.
Springer, 2015.
[156] A. von Arnim, R. Schrader, and Y. Wang. The permutahedron of N-sparse
posets. Mathematical Programming, 75(1):1–18, 1996.
[157] I. Wegener. Branching Programs and Binary Decision Diagrams: Theory
and Applications. Society for Industrial and Applied Mathematics, 2000.
[158] X10 programming language web site. http://x10-lang.org/, January 2010.
[159] Y. Zhao. The number of independent sets in a regular graph. Combinatorics,
Probability & Computing, 19(2):315–320, 2010.
Index
  dynamic programming model, 35–37
  integer programming model, 34
  restricted decision diagram, 86
set packing problem, 20, 37
  dynamic programming model, 37–39
  integer programming model, 37
  restricted decision diagram, 86
set variables, 18
single-machine scheduling, 39, 76–81, 141, 142, 207
  ALLDIFFERENT, 214
  dynamic programming model, 40–42
  filtering, 77–78, 212, 214–218
  makespan, 39, 207, 209, 217
  MDD representation, 208–210
  precedence constraint, 207, 215, 218–219
  refinement, 79–81, 212, 219–220
  relaxed MDD, 211–214
  sum of setup times, 207, 209, 217
  time window constraint, 207, 215
  total tardiness, 141, 207, 209, 218, 233–234
  width, 79, 221
solution counting, 15
sound decision diagram, 17
SPP, see set packing problem
stable set problem, 19, 20
state space, 29, 139
  MAX-2SAT, 45
  maximum cut problem, 43
  maximum independent set problem, 34
  set covering problem, 35
  set packing problem, 37
  single-machine scheduling, 40
state space relaxation, 20
state transition graph, 27, 138, 140
state variable, 28
state-dependent cost, 20, 138, 146
state-graph, 31
stochastic dynamic programming, 21, 22
switching circuit, 12
table constraint, 18, 160, 168, 169
terminal state, 28
threshold decision diagram, 16
time window constraint, 207, 215, 227
top-down compilation, 19
  exact decision diagram, 30–32
  relaxed decision diagram, 57–58
  restricted decision diagram, 85–86
transition cost function, 29
  MAX-2SAT, 46
  maximum cut problem, 43
  maximum independent set problem, 34
  set covering problem, 36
  set packing problem, 38
  single-machine scheduling, 41
transition function, 29, 139
  MAX-2SAT, 45
  maximum cut problem, 43
  maximum independent set problem, 34
  set covering problem, 35
  set packing problem, 38
  single-machine scheduling, 41
traveling salesman problem, 143
  makespan objective, 230–232
  precedence constraints, 228–229
  time windows, 224, 227, 230–232
unary machine scheduling, see single-machine scheduling
variable ordering, 15, 123–135
  in branch and bound, 101
  k-look ahead, 131
  maximal path decomposition, 130
  maximum independent set problem
    bipartite graph, 126
    clique, 125
    cycle, 126
    Fibonacci number, 128
    interval graph, 127
    path, 125
    tree, 127, 132
  minimum number of states, 130
  relaxation bound, 134
  relaxed, 130–131
  relaxed decision diagram, 65
  width versus relaxation bound, 132