Journal of Statistical Software
Abstract
This article details the algorithms in FLSSS, an R package for solving various subset
sum problems. The fundamental algorithm engages the problem via a combinatorial-space
compression that is adaptive to the constraints, relaxations and variations often crucial
to data analytics in practice. This adaptation in turn enables the compression algorithm
to exploit every bit of information a sorted superset brings, for rapid convergence. The
multidimensional extension follows a novel decomposition of the problem and is friendly to
multithreading. Data structures supporting the algorithms have trivial space complexity.
The framework offers exact algorithms for the multidimensional knapsack problem and
the generalized assignment problem.
1. Introduction
The Subset Sum problem is an NP-complete combinatorial optimization problem (Kleinberg
and Tardos 2006). Given a set of integers, it seeks a subset whose elements sum to a given
target. A vast body of literature covers algorithms for solving the problem, ranging over
exact and approximate approaches in both the deterministic and stochastic categories
(Bazgan, Santha, and Tuza 2002; Koiliaris and Xu 2017; Ghosh and Chakravarti 1999;
Wang 2004; Gu and Ruicui 2015). However, implementations of these algorithms are often
inconvenient to access, and the claimed performances often exist only as computational
complexity analyses. Moreover, the Subset Sum problem is formally defined in the integer
domain, yet for data analytics, real numbers are typically the subjects. We often do not
need, or cannot even have, a subset sum precisely equal to the given target because of the
limited precision of floating-point arithmetic.
The package name FLSSS stands for fixed-length subset sum solver, the single function
implemented in its first version. Algorithms in the package are meticulously implemented
with many rounds of optimization regarding both the mathematics and hardware adaptation
for pushing speed limits. Solvers in the package differ from the mainstream definition of
Subset Sum in the options of (i) restricting subset size, (ii) bounding subset elements, (iii)
mining real-value sets with predefined subset sum errors, and (iv) finding one or more subsets
in limited time. A novel algorithm for mining the one-dimensional Subset Sum (Section 2)
induces algorithms for the multi-Subset Sum (Section 3) and the multidimensional Subset
Sum (Section 4). The latter can be scheduled in a multithreaded environment, and the
framework offers strong applications as exact algorithms for solving the multidimensional
Knapsack problem (Section 5) and the Generalized Assignment problem (Section 6) to
optimality. The package provides additional functionality that maps reals to integers with
controlled precision loss. These integers are further zipped non-uniformly into 64-bit buffers.
Algebras (addition, subtraction, comparison) over the compressed integers are defined through
simple bit manipulations with virtually zero speed lag relative to those over normal integers
(Section 4.3). The acceleration from this dimension reduction can be substantial.
Core algorithms in FLSSS are implemented in C++. Inexpensive preprocessing steps such
as data assembly and reshaping are coded in R. The package employs Rcpp (Eddelbuettel,
Francois, Allaire, Ushey, Kou, Russell, Bates, and Chambers 2018) APIs for getting memory
addresses and size information of R objects. The basic multithreading template is taken
from RcppParallel (Allaire, Francois, Ushey, Vandenbrouck, and Geelnard 2018). Thread
synchronization tools such as mutex and atomic classes are borrowed from the Intel TBB
library (Intel 2017) included in RcppParallel.
2. One-dimensional Subset Sum

2.1. Contraction
A real-value superset of size N is sorted and stored in an array

$$x = (x_0, x_1, \ldots, x_{N-1}) = \big(x(0), x(1), \ldots, x(N-1)\big).$$
Given subset size n, we look for an integer array i = (i_0, i_1, \ldots, i_{n-1}) such that

$$\sum_{k=0}^{n-1} x(i_k) \in [\mathrm{MIN}, \mathrm{MAX}], \qquad (1)$$

the subset sum range.
First index
We inspect l(i_0) first. The initial value of l(i_0) equals 0 by Equation (4). Our goal is to uplift
l(i_0) if possible. Equation (1) implies $x(i_0) \ge \mathrm{MIN} - \sum_{t=1}^{n-1} x(i_t)$. Notice the initial maxima
of x(i_1), \ldots, x(i_{n-1}) are x(N − n + 1), \ldots, x(N − 1), thus

$$x(i_0) \ \ge\ \mathrm{MIN} - \max \sum_{t=1}^{n-1} x(i_t) \ =\ \mathrm{MIN} - \sum_{t=N-n+1}^{N-1} x(t). \qquad (5)$$
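As a hedged illustration, the uplift by Equation (5) amounts to a search over the sorted array. The following minimal R sketch uses 1-based indices; upliftFirstLowerBound is a hypothetical name, not a package function:

# Raise l(i0) to the first index whose value can still reach MIN when
# paired with the n - 1 largest remaining elements, per Equation (5).
upliftFirstLowerBound = function(x, n, MIN, currentL = 1L)
{
  N = length(x)
  # the initial maxima of x(i1), ..., x(i(n-1)) are the n - 1 largest elements
  bound = MIN - sum(x[seq_len(n - 1L) + (N - n + 1L)])
  j = match(TRUE, x >= bound)          # linear search over the sorted superset
  if (is.na(j)) return(NA_integer_)    # even x(N - 1) is too small: no qualified subset
  max(currentL, j)
}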
Remaining indexes

For k ∈ [1, n), the update of l(i_{k-1}) immediately triggers

$$l(i_k) \leftarrow \max\big(l(i_k),\ l(i_{k-1}) + 1\big),$$

and, given i_k, the maximum of the leading partial sum satisfies

$$\max \sum_{t=0}^{k} x(i_t) = \sum_{t=0}^{k} x\big(\min(u(i_t),\ i_k - k + t)\big),$$

which enters the second constraint of System (8).
Notice the left side of the second constraint is a nondecreasing function of α; thus a brute-force
solution to System (8) consists of (i) initializing α with the current l(i_k), and (ii) incrementing
α by 1 repeatedly until the second constraint becomes true. If α reaches u(i_k) and the constraint
is still unsatisfied, contraction fails and no qualified subset exists.
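A minimal sketch of this brute-force scheme follows, with constraintHolds standing in for the second constraint of System (8); both names are illustrative:

# Increment alpha from the current l(ik) until a nondecreasing constraint
# becomes true; NA signals a failed contraction.
bruteForceAlpha = function(alpha, ub, constraintHolds)
{
  while (alpha <= ub) {
    if (constraintHolds(alpha)) return(alpha)   # becomes the new l(ik)
    alpha = alpha + 1L
  }
  NA_integer_                                   # no qualified subset exists
}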
Upper bounds
Updating the lower and the upper bounds is symmetric. The upper bound u(i_k) is updated
by $u(i_k) \leftarrow \min\big(u(i_k),\ u(i_{k+1}) - 1\big)$ first, and then through System (9).

If t* = 0, then α − k + t and u(i_t) have no intersection, and the first sum term in (10)(ii) is
ignored. Our next goal is to find an appropriate t*. We will see that knowing t* brings considerable
computational advantage.
For convenience, let

$$f(t^*, \alpha) = \sum_{t=0}^{t^*-1} x\big(u(i_t)\big) + \sum_{t=t^*}^{k} x(\alpha - k + t). \qquad (11)$$

For a given t*, f is maximized at the largest feasible α, namely α = u(i_{t*}) + k − t*, so a
feasible t* must satisfy

$$\max(f) = \sum_{t=0}^{t^*-1} x\big(u(i_t)\big) + \sum_{t=t^*}^{k} x\big(u(i_{t^*}) + t - t^*\big) \ \ge\ \mathrm{MIN} - \sum_{t=k+1}^{n-1} x\big(u(i_t)\big). \qquad (12)$$

Given such a t*, we then seek the least α such that

$$\sum_{t=t^*}^{k} x(\alpha - k + t) \ \ge\ \mathrm{MIN} - \sum_{t=k+1}^{n-1} x\big(u(i_t)\big) - \sum_{t=0}^{t^*-1} x\big(u(i_t)\big). \qquad (13)$$
α is incremented until Inequality (13) turns true; the resulting α becomes the new l(i_k).
Figure 1 gives a visual explanation of updating t* and l(i_k). Decomposing the max operator
in System (9) follows the same rationale.
Quasi-triangle matrix
The first sum in Inequality (13) and the second sum in (12) both add consecutive elements
of x. Such sums of consecutive elements can be precomputed and stored in the auxiliary
quasi-triangle matrix M (Section 2.2.2) so that each is retrieved in constant time.
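A minimal sketch of one natural realization, assuming M caches sums of consecutive elements of x; buildM is a hypothetical name:

# M[[m]][j] caches the sum of the m consecutive elements of x starting at
# index j (1-based). Row m has N - m + 1 entries, hence a quasi-triangle.
buildM = function(x, n)
{
  N = length(x)
  cs = c(0, cumsum(x))
  lapply(1:n, function(m) cs[(1:(N - m + 1)) + m] - cs[1:(N - m + 1)])
}
# Any sum over m consecutive elements in (12) or (13) then costs O(1):
# sum(x[j:(j + m - 1)]) == buildM(x, n)[[m]][j]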
Figure 1: Panels (1)-(5) update t∗ by Inequality (12) using linear search. Panels (6)-(8) update l(ik ) by
Inequality (13) using linear search.
The following code compares linear search against binary search (argument useBiSrchInFB)
in the contraction's bound updates over a hundred random supersets. Figure 2 shows the results.

R> set.seed(42)
R> lrtime = numeric(100)
R> bitime = numeric(100)
R> len = 100
R> me = 1e-4
R> for(i in 1L : 100L)
+ {
+ superset = sort(runif(1000, 0, 1e6))
+ target = sum(sample(superset, len))
+ lrtime[i] = system.time({FLSSS::FLSSS(len = len, v = superset,
+ target = target, ME = me, solutionNeed = 10,
+ useBiSrchInFB = FALSE)})['elapsed']
+ bitime[i] = system.time({FLSSS::FLSSS(len = len, v = superset,
+ target = target, ME = me, solutionNeed = 10,
+ useBiSrchInFB = TRUE)})['elapsed']
+ }
R> mean(bitime / lrtime)
[1] 1.790667

Figure 2: Superset sizes = 1000, subset sizes = 100, subset sum error thresholds = 0.0001, requested
number of subsets ≥ 10, g++ '-O2' compile, Intel(R) i7-4770 CPU @ 3.40GHz, Windows 7. Preprocessing
time included. Each of the 100 supersets contains uniforms in [0, 1000000]; the corresponding target sum
is the sum of elements of a subset sampled at random. Linear search yields about 1.8x acceleration.
It is easy to see that the infima l(i_t) and suprema u(i_t) become stationary after finitely many
contractions because l(i_t) ≤ u(i_t). The uniqueness of the stationary infima and suprema remains
to be proved. FLSSS provides a function z_findBound() for examining the concept. In the
following example code, the first call to z_findBound() contracts a 10-dimensional hypercube
starting with the infima; the second call starts with the suprema. Both calls converge to the
same hyperrectangle.
R> x = c(14, 60, 134, 135, 141, 192, 199, 203, 207, 234)
R> MIN = 813
R> MAX = 821
R> lit = as.integer(c(1, 2, 3, 4, 5))
R> uit = as.integer(c(6, 7, 8, 9, 10))
R> hyperRectangle1 = FLSSS:::z_findBound(len = 5, V = as.matrix(x),
+ target = (MIN + MAX) / 2, me = (MAX - MIN) / 2, initialLB = lit,
+ initialUB = uit, UBfirst = FALSE)
R> hyperRectangle2 = FLSSS:::z_findBound(len = 5, V = as.matrix(x),
+ target = (MIN + MAX) / 2, me = (MAX - MIN) / 2, initialLB = lit,
+ initialUB = uit, UBfirst = TRUE)
R> hyperRectangle1[-1]; hyperRectangle2[-1]
[[1]]
[1] 1 3 5 6 8
[[2]]
[1] 3 6 8 9 10
[[1]]
[1] 1 3 5 6 8

[[2]]
[1] 3 6 8 9 10

Figure 3: Panel (1) shows the variable-subspacing (VS) method. Assuming the narrowest dimension of
hyperrectangle p has width 4, subspacing p produces 4 child hyperrectangles a, b, c, d. On the other hand,
binary subspacing (BS) halves p, shown in Panel (2). If contracting a and b would both fail, VS has to
contract both a and b to know it, yet BS only has to contract (a, b). For BS, c and d are descendants of
(c, d), a smaller hyperrectangle than the p that gives birth to c and d in VS.

Figure 4: Superset sizes = 70, dimensionality = 14, subset sizes = 7, subset sum error thresholds = 0.01
for all dimensions, requested number of subsets < ∞, g++ '-O2' compile, 7 threads, Intel(R) i7-4770 CPU
@ 3.40GHz, Windows 7. Preprocessing time included. Each of the 100 supersets contains random uniforms
in [0, 10000]; the corresponding target sum is the sum of elements of a subset sampled at random. Binary
subspacing yields about 1.6x acceleration.
2.3. Subspacing
Some previous versions of FLSSS (i) select a dimension of the hyperrectangle resulted from
contraction, (ii) fix the value of that dimension, (iii) reduce the dimensionality of the problem
by 1 and (iv) execute contraction again. In step (i), it chooses the dimension having the least
domain width so to produce the fewest branches. These steps are a mixture of depth-first
and best-first searches.
The current version of FLSSS differs in step (ii) and (iii): it halves the domain of the selected
dimension, and reduces the dimensionality only if the domain width equals 1. Virtually,
the current subspacing method constructs a binary tree while the previous ones construct
a variable-branch tree. The binary tree appears to converge slower since it lengthens the
path between each two nodes of dimension reduction, but it actually yields higher overall
speed because (1) it could prune multiple branches at once if contraction fails over a halved
hyperrectangle, and (2) child nodes will receive a smaller hyperrectangle (in terms of volume)
Algorithm 1: Subspacing

A hyperrectangle object in stack consists of twelve parameters whose roles are explained in the
following algorithm: β (branch flag), κ (the dimension to halve), n (dimensionality), nz (the
number of newly fixed dimensions), MIN and MAX, the two sums Σ_{t=0}^{n−1} x(l(i_t)) and
Σ_{t=0}^{n−1} x(u(i_t)), the lower bounds l = (l(i_0), …, l(i_{n−1})) (LB), the upper bounds
u = (u(i_0), …, u(i_{n−1})) (UB), the reserved upper bounds u0 = (u(i_0), …, u(i_κ)) (Bresv),
and the reserved sum S(u).
LEFT BRANCH:

1: β ← 0. ▷ β = 0 implies the left branch, 1 the right branch.
2: Copy n, the dimensionality of the parent hyperrectangle.
3: Copy MIN, MAX, Σ_{t=0}^{n−1} x(l(i_t)), Σ_{t=0}^{n−1} x(u(i_t)), l and u from the parent.
4: Update the copied parameters through contraction. If it fails, return a failure signal.
5: T ← {t | u(i_t) = l(i_t)}, nz ← |T|. ▷ |T| is the cardinality.
6: l ← {l(i_t) | t ∉ T}, u ← {u(i_t) | t ∉ T}, n ← n − nz.
7: Update Σ_{t=0}^{n−1} x(l(i_t)) and Σ_{t=0}^{n−1} x(u(i_t)).
8: Push {u(i_t) | t ∈ T} into a global buffer B that is to hold a qualified subset. ▷ This step goes concurrently with Step 5.
9: κ ← argmin_t (u(i_t) − l(i_t)).
10: u0 ← (u(i_0), …, u(i_κ)), S(u) ← Σ_{t=0}^{n−1} x(u(i_t)).
11: For t ∈ [0, κ], u(i_t) ← min(u(i_t), ⌊u(i_κ)/2⌋ − κ + t). ▷ Loop t from κ and stop once u(i_t) ≤ ⌊u(i_κ)/2⌋ − κ + t.
12: Update Σ_{t=0}^{n−1} x(l(i_t)), Σ_{t=0}^{n−1} x(u(i_t)). ▷ Use M for fast update.

If contraction succeeds in Step 4, move to the right hyperrectangle in stack and execute the
above steps again. Otherwise left-propagate through the stack while erasing the last nz elements
in buffer B for each hyperrectangle, and stop once the current one has β = 0.

RIGHT BRANCH:

1: β ← 1.
2: For t ∈ [0, κ], u(i_t) ← u0(i_t); Σ_{t=0}^{n−1} x(u(i_t)) ← S(u).
3: For t ∈ [κ, n − 1], l(i_t) ← max(l(i_t), u(i_κ) + 1 + t − κ). ▷ Loop t from κ and stop once l(i_t) ≥ u(i_κ) + 1 + t − κ.
4: Update Σ_{t=0}^{n−1} x(l(i_t)). ▷ Use M for fast update.

Move to the right hyperrectangle and execute LEFT BRANCH.
Figure 3 and Algorithm 1 present the details of subspacing. The speed benefit from binary
subspacing is more pronounced for the multidimensional Subset Sum (Section 4), where
contractions consume the majority of the mining time. The following code compares the time
costs of mining a hundred supersets using the variable and binary subspacing methods.
Figure 4 shows the results.
R> set.seed(42)
R> N = 70L; n = 7L; d = 14L
R> mflsssBinTreeTime = numeric(100)
R> mflsssVarTreeTime = numeric(100)
R> for(i in 1L : 100L)
+ {
+ x = matrix(runif(N * d, 0, 10000), ncol = d)
+ tmp = colSums(x[sample(1L : N, n), ])
+ Sl = tmp - 0.01
+ Su = tmp + 0.01
+ mflsssBinTreeTime[i] = system.time(FLSSS::mFLSSSpar(
+ maxCore = 7, len = n, mV = x, mTarget = (Sl + Su) / 2,
+ mME = (Su - Sl) / 2, solutionNeed = 1e3, tlimit = 3600))['elapsed']
+ mflsssVarTreeTime[i] = system.time(FLSSS:::mFLSSSparVariableTree(
+ maxCore = 7, len = n, mV = x, mTarget = (Sl + Su) / 2,
+ mME = (Su - Sl) / 2, solutionNeed = 1e3, tlimit = 3600))['elapsed']
+ }
R> mean(mflsssVarTreeTime / mflsssBinTreeTime)
[1] 1.58791
3. Multi-Subset Sum
Given multiple sorted supersets and a subset size for each, the multi-Subset Sum problem seeks
a subset from every superset such that the elements in all subsets sum in a given range. The
OFSSA directly applies to this problem following four steps (sketched in code after
Equation (15)): (i) shift elements in some or all supersets so that (ii) all elements in the shifted
supersets constitute a nondecreasing sequence, a new superset; (iii) calculate a new subset sum
target range in response to the shifting in (i); and (iv) mine a subset of size equal to the sum of
the given subset sizes while the subset elements are bounded by the sub-supersets within the
new superset.

Let x_0, \ldots, x_{K−1} be K sorted supersets of sizes N_0, \ldots, N_{K−1}. Let n_0, \ldots, n_{K−1} be the
respective subset sizes and [MIN, MAX] be the target sum range. Shifting the supersets
follows

$$x^s_h(t) \leftarrow x_h(t) - x_h(0) + x^s_{h-1}(N_{h-1} - 1), \quad h \in [1, K),\ t \in [0, N_h), \qquad (15)$$

where $x^s_h$ denotes a shifted superset and $x^s_0 = x_0$. Elements in the shifted supersets are
then pooled together into the new superset.
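A minimal R sketch of Equation (15) and the pooling; poolSupersets is a hypothetical name, not a package function:

# Shift K sorted supersets so that their concatenation is nondecreasing,
# then pool them into one superset; each shifted superset starts where the
# previous one ends.
poolSupersets = function(supersets)   # 'supersets': list of K sorted vectors
{
  shifted = supersets
  for (h in 2:length(supersets)) {    # R's h corresponds to h in [1, K) above
    prev = shifted[[h - 1]]
    shifted[[h]] = supersets[[h]] - supersets[[h]][1] + prev[length(prev)]
  }
  unlist(shifted)
}

Per step (iii), the target range then shifts by the sum over h of n_h times the constant added to superset h.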
4. Multidimensional Subset Sum

Let the superset x be an N × d matrix whose rows x(s, ), s ∈ [0, N), are the d-dimensional
elements. Given subset size n and subset sum range [S_L, S_U] as two size-d arrays, the
multidimensional fixed-size Subset Sum algorithm (MFSSA) seeks an integer array
i = (i_0, i_1, \ldots, i_{n-1}) such that

$$S_L \ \le\ \sum_{t=0}^{n-1} x(i_t, ) \ \le\ S_U. \qquad (21)$$

The multidimensional variable-size Subset Sum follows a conversion similar to that in Section 2.
4.1. Comonotonization
If all columns in x are comonotonic (Dhaene, Denuit, Goovaerts, and Vyncke 2002), x can
be sorted so that x(0, ) ≤ . . . ≤ x(N − 1, ) . Overloading arithmetic operators (Eckel 2000)
in OFSSA then solves the problem.
In general, MFSSA roughly consists of (i) padding an extra column of nondecreasing integers
to the superset, (ii) scaling and adding this column to the rest to make all columns comonotonic
(to comonotonize; comonotonization), and (iii) mining at most N(N − n)/2 + 1 subset
sum ranges regarding the new superset. Each of these subset sum ranges corresponds to a
subset sum in the extra column. Mining different subset sum ranges in (iii) are independent
tasks that share the same auxiliary matrix M (Section 2.2.2), and thus can employ multithreading.
Let x* be the comonotonized superset. The extra column x*(, d) is referred to as the key
column; for convenience, let array v refer to x*(, d). The key column is constructed by
Equation (22). Let Δx(, t) be the discrete differential of x(, t), t ∈ [0, d). The rest of the
columns of x* are computed by

$$\theta(t) = -\min\big(0,\ \min \Delta x(, t)\big) \quad \text{and} \quad x^*(, t) \leftarrow x(, t) + v\,\theta(t), \qquad (23)$$
where θ(t) is referred to as the comonotonization multiplier for x(, t). The key column v has
no subset sum constraint. However, because it is a sorted integer sequence with maximal
discrete differential of 1, all unique subset sums in v compose an integer sequence:

$$S^{\mathrm{key}} = \Big( \sum_{t=0}^{n-1} v(t),\ \ 1 + \sum_{t=0}^{n-1} v(t),\ \ 2 + \sum_{t=0}^{n-1} v(t),\ \ \ldots,\ \ \sum_{t=N-n}^{N-1} v(t) \Big). \qquad (24)$$
The size of $S^{\mathrm{key}}$ equals $\sum_{t=N-n}^{N-1} v(t) - \sum_{t=0}^{n-1} v(t) + 1$, which is no more than
N(N − n)/2 + 1. Let $N_S$ be the size of $S^{\mathrm{key}}$. We have the following $N_S$ subset sum
ranges to mine:

$$S^*_L(s, t) = S_L(t) + S^{\mathrm{key}}(s)\,\theta(t), \quad S^*_U(s, t) = S_U(t) + S^{\mathrm{key}}(s)\,\theta(t), \quad S^*_L(s, d) = S^*_U(s, d) = S^{\mathrm{key}}(s), \qquad (25)$$

where t ∈ [0, d) and $[S^*_L(s, ), S^*_U(s, )]$, $s \in [0, N_S)$, account for one subset sum range.
Consider the following toy example of finding a size-2 subset from a 2D superset (n = 2, d = 2,
N = 3):

$$x = \begin{pmatrix} 4 & 10 \\ 2 & 25 \\ 8 & 17 \end{pmatrix} \qquad (26)$$

with $S_L = (11, 26)$ and $S_U = (12, 28)$. The minimal discrete differentials of the two columns
are −2 and −8, so the comonotonization multipliers are 2 and 8 respectively. We comonotonize
x by

$$x^* = \begin{pmatrix} 4 + 0 \times 2 & 10 + 0 \times 8 & 0 \\ 2 + 1 \times 2 & 25 + 1 \times 8 & 1 \\ 8 + 2 \times 2 & 17 + 2 \times 8 & 2 \end{pmatrix} = \begin{pmatrix} 4 & 10 & 0 \\ 4 & 33 & 1 \\ 12 & 33 & 2 \end{pmatrix} \qquad (27)$$

according to Equations (22) and (23). The unique size-two subset sums in the key column are 1,
2, 3, thus

$$S^*_L = \begin{pmatrix} 11 + 1 \times 2 & 26 + 1 \times 8 & 1 \\ 11 + 2 \times 2 & 26 + 2 \times 8 & 2 \\ 11 + 3 \times 2 & 26 + 3 \times 8 & 3 \end{pmatrix} = \begin{pmatrix} 13 & 34 & 1 \\ 15 & 42 & 2 \\ 17 & 50 & 3 \end{pmatrix}, \quad
S^*_U = \begin{pmatrix} 12 + 1 \times 2 & 28 + 1 \times 8 & 1 \\ 12 + 2 \times 2 & 28 + 2 \times 8 & 2 \\ 12 + 3 \times 2 & 28 + 3 \times 8 & 3 \end{pmatrix} = \begin{pmatrix} 14 & 36 & 1 \\ 16 & 44 & 2 \\ 18 & 52 & 3 \end{pmatrix}. \qquad (28)$$
The order in which the subset sum ranges are mined matters. Let t′ be a chosen dimension and
consider the percentile

$$p = \frac{\big(S_L(t') + S_U(t')\big)/2 - \sum_{s=0}^{n-1} x(s, t')}{\sum_{s=N-n}^{N-1} x(s, t') - \sum_{s=0}^{n-1} x(s, t')}. \qquad (29)$$
If there exist qualified subsets, their subset sums in the key column should have percentiles
close to p. We prioritize the subset sum targets whose percentiles are close to p; thus the rows
of $S^*_L$ and $S^*_U$ are ordered by

$$\left| \frac{S^*_U(, d) - S^*_U(0, d)}{S^*_U(N_S - 1, d) - S^*_U(0, d)} - p \right|. \qquad (30)$$
Given m threads, the first m rows of $S^*_L$ and $S^*_U$ are mined concurrently. The first finished
thread then works on the (m + 1)-th row, and so forth, if the current number of qualified subsets
is unsatisfied. The threads are scheduled by several atomic class objects (Intel 2011a). The
scheduling overhead is negligible.
Figure 5: A hundred 60 × 5 supersets and subset sum targets generated at random, subset size n = 6,
g++ ’-O2’ compile, 7 threads, Intel(R) i7-4770 CPU @ 3.40GHz, Windows 7. Preprocessing time included.
Order optimization yields about 4x acceleration for finding all qualified subsets.
The following code demonstrates the speed advantage from order optimizations in a hundred
test cases. Figure 5 shows the results.
R> set.seed(42)
R> N = 60L; n = 6L; d = 5L
R> noOpt = numeric(100)
R> withOpt = numeric(100)
R> for(i in 1L : 100L)
+ {
+ x = matrix(runif(N * d) * 10000, ncol = d)
+ solution = sample(1L : N, n)
+ Sl = colSums(x[solution, ]) * 0.999
+ Su = Sl / 0.999 * 1.001
+ rm(solution); gc()
+ noOpt[i] = system.time(FLSSS::mFLSSSparImposeBounds(maxCore = 7,
+ len = n, mV = x, mTarget = (Sl + Su) / 2, mME = (Su - Sl) / 2,
+ solutionNeed = 1e6))['elapsed']
+ withOpt[i] = system.time(FLSSS::mFLSSSpar(maxCore = 7, len = n,
+ mV = x, mTarget = (Sl + Su) / 2, mME = (Su - Sl) / 2,
+ solutionNeed = 1e6))['elapsed']
+ }
R> mean(noOpt / withOpt)
[1] 4.392401
These simulations seek all qualified subsets, thus order optimization (b) has no effect because
all rows in $S^*_L$ and $S^*_U$ will be tried. A certain yet minor reason for the acceleration due
to superset reordering is that it can lower the number of unique elements in the key column
by Equation (22), which leads to fewer rows in $S^*_L$ and $S^*_U$ and fewer tasks for the computing
threads. A probable and major reason is that reordering the superset puts its elements in
compact shapes or clusters instead of random, scattered formations in multidimensional space,
which leads to (i) more intense hyperrectangle contraction (Section 2.1) navigated by those
compact shapes or clusters, and thus (ii) fewer child hyperrectangles spawned for predicting
the locations of qualified subsets.
The above equations guarantee nonnegative integers in $x^z(, t)$ with minimum 0 and maximum
λ(t). Without considering the numeric errors brought by scaling and shifting, a larger λ(t)
makes $x^z$ and x have closer joint distributions and lowers the chance of the two yielding
different qualified subsets (element index-wise).
The comonotonization of $x^z$ results in $x^{z*}$, $S^{z*}_L$ and $S^{z*}_U$. Compressing integers for dimension
reduction approximately consists of (i) finding the largest absolute value that could be reached
during mining (LVM) for each dimension (column) of $x^{z*}$, (ii) calculating how many bits are
needed to represent the LVM for each dimension and how many 64-bit buffers are needed to
store those bits for all dimensions, (iii) cramming each row of $x^{z*}$ into the buffers, and (iv)
defining compressed-integer algebras through bit manipulation.
The LVM for a certain dimension should equal or exceed the maximum of the absolute values
of all temporary or permanent variables in the mining program within that dimension. Let
ψ(t) be the LVM for $x^{z*}(, t)$, t ∈ [0, d). We have

$$\psi(t) \leftarrow \max\Big( \max \big|S^{z*}_L(, t)\big|,\ \ \max \big|S^{z*}_U(, t)\big|,\ \ \sum_{s=N-n}^{N-1} x^{z*}(s, t) \Big). \qquad (32)$$
In the construction of the compressed row $x^{z*c}(s, )$ (Equations (33) and (34), where ≪ is the
left bit-shift operator), the term 64 − β(0), or 64 − β(0) − β(1), and so on, is referred to as the
shift distance. The construction of $x^{z*c}(s, 0)$ stops once the shift distance would become
negative. The next 64-bit buffer $x^{z*c}(s, 1)$ then starts accommodating the bits of the rest of
the elements in $x^{z*}(s, )$, so on and so forth.
Let $d_c$ be the dimensionality of $x^{z*c}$; it is estimated before integer compression. A different
order of the $x^{z*}$ columns means a different order of β, and such an order may lead to a smaller
$d_c$. Finding the best order for minimizing $d_c$ amounts to a bin-packing problem (Korte and
Vygen 2006). Currently FLSSS does not optimize this order, in order to give users the option of
bounding partial columns (see the package user manual), which needs the lower-bounded
(upper-bounded) columns to sit next to each other. Equation (34) also applies to computing
$S^{z*c}_L$ and $S^{z*c}_U$.
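The estimate of $d_c$ can be sketched as follows; the bit-width formula below (one sign bit plus enough bits to represent ψ(t)) is an assumption standing in for the unreproduced Equations (33)-(34), and bufferCount is a hypothetical name:

# Pack the per-dimension bit widths greedily, in the given column order,
# into 64-bit buffers; the buffer count is the compressed dimensionality.
bufferCount = function(psi)             # 'psi': vector of LVMs, one per dimension
{
  beta = ceiling(log2(psi + 1)) + 1     # assumed bits per dimension, incl. sign bit
  dc = 1L; room = 64L
  for (b in beta) {
    if (b > room) { dc = dc + 1L; room = 64L }   # open a new buffer
    room = room - b
  }
  dc
}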
In π, bits equal to 1 align with the sign bits in a row of $x^{z*c}$. We define

$$x^{z*c}(s, ) \pm x^{z*c}(t, ) = \big( x^{z*c}(s, 0) \pm x^{z*c}(t, 0),\ \ldots,\ x^{z*c}(s, d_c - 1) \pm x^{z*c}(t, d_c - 1) \big),$$

$$x^{z*c}(s, ) \le x^{z*c}(t, ) \ \equiv\ \Big( \big(x^{z*c}(t, 0) - x^{z*c}(s, 0)\big) \,\&\, \pi(0) = 0 \Big) \wedge \ldots \wedge \Big( \big(x^{z*c}(t, d_c - 1) - x^{z*c}(s, d_c - 1)\big) \,\&\, \pi(d_c - 1) = 0 \Big), \qquad (36)$$

where & denotes bitwise AND. As an example, consider the comparison mask
comparisonMask = 1000000000100001000000000100000000000010000000000000000000000000 .
The mask indicates the first elemental integer occupies the 2nd to 9th bits, the second
elemental integer occupies the 12th to 15th bits, and so on. The following C++ code defines
addition, subtraction and comparison of two such 64-bit buffers on a 64-bit machine:
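(The listing below is a minimal sketch consistent with Equation (36); the type and function names CompressedRow, add, subtract and lessEq are illustrative, not the package's internals.)

#include <cstdint>
#include <cstddef>
#include <vector>

// One row of the compressed superset: dc 64-bit buffers, each packing
// several nonnegative integers, every field led by a sign (guard) bit.
struct CompressedRow { std::vector<std::uint64_t> buf; };

// Field-wise addition: the guard bits leave headroom in every field, so a
// single 64-bit addition adds all packed fields without cross-field carry.
void add(CompressedRow& s, const CompressedRow& t)
{
  for (std::size_t j = 0; j < s.buf.size(); ++j) s.buf[j] += t.buf[j];
}

// Field-wise subtraction; a field of the result has its sign bit set when
// the corresponding field of s is smaller than that of t.
void subtract(CompressedRow& s, const CompressedRow& t)
{
  for (std::size_t j = 0; j < s.buf.size(); ++j) s.buf[j] -= t.buf[j];
}

// s <= t field-wise, per Equation (36): mask[j] is the comparison mask
// pi(j), with 1-bits at the sign-bit positions of buffer j.
bool lessEq(const CompressedRow& s, const CompressedRow& t,
            const std::vector<std::uint64_t>& mask)
{
  for (std::size_t j = 0; j < s.buf.size(); ++j)
    if ((t.buf[j] - s.buf[j]) & mask[j]) return false;
  return true;
}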
Figure 6: A hundred 70 × 14 supersets and subset sum targets generated at random, subset size n = 7,
g++ ’-O2’ compile, 7 threads, Intel(R) i7-4770 CPU @ 3.40GHz, Windows 7. Preprocessing time included.
Integerization yields about 1.5x acceleration for finding all qualified subsets.
The addition and subtraction have no overhead compared with those for normal integers. The
comparison may halve the speed of a single '≤' operation, but the acceleration due to dimension
reduction reverses the situation globally. See Liu (2018) for more discussion of compressed-integer
algebras. The following code demonstrates the speedup from integerization. Figure 6 shows
the results.
R> set.seed(42)
R> N = 70L; n = 7L; d = 14L
R> noOpt = numeric(100)
R> withOpt = numeric(100)
R> for(i in 1L : 100L)
+ {
+ x = matrix(runif(N * d) * 10000, ncol = d)
+ solution = sample(1L : N, n)
+ Sl = colSums(x[solution, ]) * 0.999
+ Su = Sl / 0.999 * 1.001
+ rm(solution); gc()
+ noOpt[i] = system.time(FLSSS::mFLSSSpar(maxCore = 7, len = n, mV = x,
+ mTarget = (Sl + Su) / 2, mME = (Su - Sl) / 2, solutionNeed = 1e3,
+ tlimit = 3600))['elapsed']
+ withOpt[i] = system.time(FLSSS::mFLSSSparIntegerized(maxCore = 7,
+ len = n, mV = x, mTarget = (Sl + Su) / 2, mME = (Su - Sl) / 2,
+ solutionNeed = 1e3, tlimit = 3600))['elapsed']
+ }
R> mean(noOpt / withOpt)
[1] 1.477555
5. Multidimensional Knapsack

Given a set of items, each with a profit attribute and a cost attribute, the 0-1 Knapsack problem
seeks a subset of items that maximizes the total profit while the total cost does not surpass
a given value. The multidimensional 0-1 Knapsack problem assigns multiple cost attributes
to an item, and maximizes the total profit while the total cost in each cost dimension stays
below its individual upper bound. The computational complexity of the multidimensional
0-1 Knapsack problem escalates rapidly as the dimensionality rises.
MFSSA directly applies to the multidimensional fixed-size 0-1 Knapsack (MF01K) problem;
see Section 2 for converting a variable-size instance. Consider an MF01K instance with subset
size n, d cost attributes and N items. The costs constitute an N × d superset x whose rows are
sorted by item profit in ascending order. For t ∈ [0, d), the subset sum upper bound $S_U(t)$
equals the given cost upper bound, and the lower bound $S_L(t)$ equals the sum of the least
n elements in x(, t). We then pad the key column in Equation (22), and comonotonize x to
obtain $x^*$ in Equation (23) and $S^*_L$, $S^*_U$ in Equation (25).
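As a small sketch of this setup (mf01kBounds and costCaps are hypothetical names; x is the profit-sorted N × d cost superset):

# Subset sum bounds for an MF01K instance: the upper bounds are the given
# cost caps; each lower bound sums the n least costs in its column.
mf01kBounds = function(x, n, costCaps)
{
  list(SL = apply(x, 2, function(cost) sum(sort(cost)[seq_len(n)])),
       SU = costCaps)
}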
5.1. Optimization
The column $S^*_L(, d)$ (equivalently $S^*_U(, d)$) essentially consists of sums of ranks of the item
profits; thus a qualified subset found via mining $[S^*_L(s, ), S^*_U(s, )]$ would more likely have a
greater total item profit than one found via $[S^*_L(s', ), S^*_U(s', )]$ for s > s′. Therefore, we sort
$S^*_L$ and $S^*_U$ by $S^*_U(, d)$ in descending order to increase the chance that qualified subsets with
higher total item profits are found sooner. On the other hand, a heuristic approach could stop
immediately once it finds a qualified subset.
Given m threads, instead of mining the first m rows of $S^*_L$ and $S^*_U$ concurrently (Section 4.2),
the threads all concentrate on $[S^*_L(0, ), S^*_U(0, )]$ first. Given a constant φ, we perform a
breadth-first search starting from the root node of the binary tree in Figure 3 until there are
no fewer than mφ nodes for trial in the same hierarchy. The threads then have mφ independent
tasks to work on. These tasks have heterogeneous difficulties. To lower the chance of thread
idleness, the first m tasks are solved concurrently, and the first finished thread moves on to
the (m + 1)-th task, and so forth. If idle threads are detected, another breadth-first expansion
generates sufficient tasks to keep them busy.
Once a thread finds a qualified subset, it updates the current optimum if the total profit of
the subset exceeds that of the current optimum. The optimum and its profit are guarded by
a spin mutex lock (Intel 2011b).
After each contraction (Section 2.1), if the total item profit of the hyperrectangle’s upper
bounds u is below that of the current optimum, we prune the entire branch rooted at this
hyperrectangle.
6. Generalized Assignment

The minimal discrete differentials in x(, 0), x(, 1), x(, 2) are −21, −13 and −17 respectively.
Instead of taking a different comonotonization multiplier for each column, as Equation (23)
suggests, here we take a number larger than the negative of the lowest discrete differential
of all columns, e.g. 22, as the universal comonotonization multiplier for all columns. The
advantage of this choice is shown below.
Given the comonotonization multiplier 22, Equation (41) undergoes the following transformation:

$$x \Rightarrow \begin{pmatrix}
\frac{21+0\times 22}{22} & \frac{0+0\times 22}{22} & \frac{0+0\times 22}{22} & 0 \\
\frac{0+1\times 22}{22} & \frac{0+1\times 22}{22} & \frac{9+1\times 22}{22} & 1 \\
\frac{0+2\times 22}{22} & \frac{13+2\times 22}{22} & \frac{0+2\times 22}{22} & 2 \\
\frac{0+0\times 22}{22} & \frac{0+0\times 22}{22} & \frac{17+0\times 22}{22} & 0 \\
\frac{6+1\times 22}{22} & \frac{0+1\times 22}{22} & \frac{0+1\times 22}{22} & 1 \\
\frac{0+2\times 22}{22} & \frac{11+2\times 22}{22} & \frac{0+2\times 22}{22} & 2
\end{pmatrix} = \begin{pmatrix}
0.955 & 0 & 0 & 0 \\
1 & 1 & 1.409 & 1 \\
2 & 2.591 & 2 & 2 \\
0 & 0 & 0.773 & 0 \\
1.273 & 1 & 1 & 1 \\
2 & 2.5 & 2 & 2
\end{pmatrix} \rightarrow x. \qquad (42)$$
There are five unique size-2 subset sums in the key column: 4, 3, 2, 1, 0. Five subset sum
ranges are thus subject to mining:

$$S_L = \begin{pmatrix}
-\infty & -\infty & -\infty & 4 \\
-\infty & -\infty & -\infty & 3 \\
-\infty & -\infty & -\infty & 2 \\
-\infty & -\infty & -\infty & 1 \\
-\infty & -\infty & -\infty & 0
\end{pmatrix}, \quad
S_U = \begin{pmatrix}
\frac{26+4\times 22}{22} & \frac{25+4\times 22}{22} & \frac{27+4\times 22}{22} & 4 \\
\frac{26+3\times 22}{22} & \frac{25+3\times 22}{22} & \frac{27+3\times 22}{22} & 3 \\
\frac{26+2\times 22}{22} & \frac{25+2\times 22}{22} & \frac{27+2\times 22}{22} & 2 \\
\frac{26+1\times 22}{22} & \frac{25+1\times 22}{22} & \frac{27+1\times 22}{22} & 1 \\
\frac{26+0\times 22}{22} & \frac{25+0\times 22}{22} & \frac{27+0\times 22}{22} & 0
\end{pmatrix} = \begin{pmatrix}
5.182 & 5.136 & 5.227 & 4 \\
4.182 & 4.136 & 4.227 & 3 \\
3.182 & 3.136 & 3.227 & 2 \\
2.182 & 2.136 & 2.227 & 1 \\
1.182 & 1.136 & 1.227 & 0
\end{pmatrix}, \qquad (43)$$

where $S_L(s, )$ and $S_U(s, )$, $s \in [0, 4]$, account for one subset sum range.
For row s in every block of x, there exists only one element that is fractional, and it is always
greater than s. This property is exploited for speedup via a compact representation of x:
each row is represented by two values, the fraction and its column index. Algorithms for
addition, subtraction and comparison are tuned for the compact representation, and they are
also considerably easier to implement.
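A sketch of the compact representation; the names are illustrative and the package's tuned algorithms are not reproduced:

# A compact row of x from Equation (42): integer part s (the key), the
# column holding the single fractional element, and the fraction itself.
makeRow = function(s, fracCol, frac) list(s = s, fracCol = fracCol, frac = frac)

# Sum two compact rows over d value columns: the integer parts add
# uniformly; each fraction lands only in its own column.
addRows = function(r1, r2, d)
{
  sums = rep(r1$s + r2$s, d)
  sums[r1$fracCol] = sums[r1$fracCol] + r1$frac
  sums[r2$fracCol] = sums[r2$fracCol] + r2$frac
  sums
}

# Example with the first two rows of the first block in Equation (42):
# addRows(makeRow(0, 1, 0.955), makeRow(1, 3, 0.409), 3) gives
# (1.955, 1, 1.409), matching direct summation of the two rows.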
GAP in FLSSS is an exact algorithm. For large-scale, suboptimality-sufficient problems, its
speed may not catch up with fast heuristics such as Haddadi and Ouzia (2004) and Nauss (2003),
which employ a variety of relaxation and approximation techniques to approach suboptima.
In the package's interest, record-holding heuristics suitable for large-scale, suboptimality-sufficient
problems of generalized assignment or multidimensional knapsack may be implemented in the
package in the future.
References
Bazgan C, Santha M, Tuza Z (2002). "Efficient Approximation Algorithms for the Subset-Sums
Equality Problem." Journal of Computer and System Sciences.

Ghosh D, Chakravarti N (1999). "A Competitive Local Search Heuristic for the Subset Sum
Problem." Computers and Operations Research.

Haddadi S, Ouzia H (2004). "Effective Algorithm and Heuristic for the Generalized Assignment
Problem." European Journal of Operational Research.

Intel (2017). Intel Threading Building Blocks.

Land AH, Doig AG (1960). "An Automatic Method of Solving Discrete Programming Problems."
Econometrica.

Nauss RM (2003). "Solving the Generalized Assignment Problem: An Optimizing and Heuristic
Approach." INFORMS Journal on Computing.

Spearman CE (1904). "The Proof and Measurement of Association between Two Things."
American Journal of Psychology.
Affiliation:
Charlie Wusuo Liu
Boston, Massachusetts
United States
E-mail: [email protected]