Journal of Statistical Software
Abstract
This article details the algorithms in FLSSS, an R package for solving various subset
sum problems. The fundamental algorithm engages the problem via a combinatorial-space
compression that is adaptive to the constraints, relaxations and variations often crucial
to data analytics in practice. This adaptation in turn enables the compression algorithm
to exploit every bit of information a sorted superset brings, for rapid convergence. The
multidimensional extension follows a novel decomposition of the problem and is friendly to
multithreading. Data structures supporting the algorithms have trivial space complexity.
The framework offers exact algorithms for the multidimensional knapsack problem and
the generalized assignment problem.
1. Introduction
The Subset Sum problem is an NP-complete combinatorial optimization problem (Kleinberg
and Tardos 2006). Given a set of integers, it seeks a subset whose elements sum to a given
target. A vast body of literature covers algorithms for solving the problem, ranging over
exact and approximate approaches in both the deterministic and stochastic categories
(Bazgan, Santha, and Tuza 2002; Koiliaris and Xu 2017; Ghosh and Chakravarti 1999;
Wang 2004; Gu and Ruicui 2015). However, implementations of these algorithms are often
inconvenient to access, and the claimed performances often exist only as computational
complexity analyses. Moreover, the Subset Sum problem is formally defined in the integer
domain, yet for data analytics, real numbers are typically the subjects. We often do not
need, or cannot even have, a subset sum precisely equal to the given target because of the
limited precision of floating-point arithmetic.
The package name FLSSS stands for fixed-length subset sum solver, the single function
implemented in its first version. Algorithms in the package are meticulously implemented
with many rounds of optimization regarding both the mathematics and hardware adaptation
for pushing speed limits. Solvers in the package differ from the mainstream definition of
Subset Sum in the options of (i) restricting subset size, (ii) bounding subset elements, (iii)
mining real-value sets with predefined subset sum errors, and (iv) finding one or more subsets
in limited time. A novel algorithm for mining the one-dimensional Subset Sum (Section 2)
induces algorithms for the multi-Subset Sum (Section 3) and the multidimensional Subset
Sum (Section 4). The latter can be scheduled in a multithreaded environment, and the
framework offers strong applications as exact algorithms for solving the multidimensional
Knapsack problem (Section 5) and the Generalized Assignment problem (Section 6) to
optimality. The package provides additional functionality that maps reals to integers with
controlled precision loss. These integers are further zipped non-uniformly into 64-bit buffers.
Algebras (addition, subtraction, comparison) over the compressed integers are defined through
simple bit manipulations with virtually zero speed lag relative to those over normal integers
(Section 4.3). The acceleration from this dimension reduction can be substantial.
Core algorithms in FLSSS are implemented in C++. Inexpensive preprocessing steps such
as data assembly and reshaping are coded in R. The package employs Rcpp (Eddelbuettel,
Francois, Allaire, Ushey, Kou, Russell, Bates, and Chambers 2018) APIs for getting memory
addresses and size information of R objects. The basic multithreading template is taken
from RcppParallel (Allaire, Francois, Ushey, Vandenbrouck, and Geelnard 2018). Thread
synchronization tools such as mutex and atomic classes are borrowed from the Intel TBB
library (Intel 2017) included in RcppParallel.
2. One-dimensional Subset Sum

2.1. Contraction
A real-value superset of size N is sorted and stored in an array

$$x = (x_0, x_1, \ldots, x_{N-1}) = \big(x(0), x(1), \ldots, x(N-1)\big).$$
Given subset size n, we look for an integer array i = (i_0, i_1, \ldots, i_{n-1}) such that

$$\sum_{k=0}^{n-1} x(i_k) \in [\mathrm{MIN}, \mathrm{MAX}], \qquad (1)$$

the subset sum range.
First index
We inspect l(i_0) first. The initial value of l(i_0) equals 0 by Equation (4). Our goal is to uplift
l(i_0) if possible. Equation (1) implies $x(i_0) \ge \mathrm{MIN} - \sum_{t=1}^{n-1} x(i_t)$. Notice the initial maxima
of x(i_1), \ldots, x(i_{n-1}) are x(N − n + 1), \ldots, x(N − 1), thus

$$x(i_0) \ \ge\ \mathrm{MIN} - \max \sum_{t=1}^{n-1} x(i_t) \ =\ \mathrm{MIN} - \sum_{t=N-n+1}^{N-1} x(t). \qquad (5)$$
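As a hedged illustration, the uplift by Equation (5) amounts to a search over the sorted array. The following minimal R sketch uses 1-based indices; upliftFirstLowerBound is a hypothetical name, not a package function:

# Raise l(i0) to the first index whose value can still reach MIN when
# paired with the n - 1 largest remaining elements, per Equation (5).
upliftFirstLowerBound = function(x, n, MIN, currentL = 1L)
{
  N = length(x)
  # the initial maxima of x(i1), ..., x(i(n-1)) are the n - 1 largest elements
  bound = MIN - sum(x[seq_len(n - 1L) + (N - n + 1L)])
  j = match(TRUE, x >= bound)          # linear search over the sorted superset
  if (is.na(j)) return(NA_integer_)    # even x(N - 1) is too small: no qualified subset
  max(currentL, j)
}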
Remaining indexes

For k ∈ [1, n), the update of l(i_{k-1}) immediately triggers

$$l(i_k) \leftarrow \max\big(l(i_k),\ l(i_{k-1}) + 1\big),$$

and, given i_k, the maximum of the leading partial sum satisfies

$$\max \sum_{t=0}^{k} x(i_t) = \sum_{t=0}^{k} x\big(\min(u(i_t),\ i_k - k + t)\big),$$

which enters the second constraint of System (8).
Notice the left side of the second constraint is a nondecreasing function of α; thus a brute-force
solution to System (8) consists of (i) initializing α with the current l(i_k), and (ii) incrementing
α by 1 repeatedly until the second constraint becomes true. If α reaches u(i_k) and the constraint
is still unsatisfied, contraction fails and no qualified subset exists.
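A minimal sketch of this brute-force scheme follows, with constraintHolds standing in for the second constraint of System (8); both names are illustrative:

# Increment alpha from the current l(ik) until a nondecreasing constraint
# becomes true; NA signals a failed contraction.
bruteForceAlpha = function(alpha, ub, constraintHolds)
{
  while (alpha <= ub) {
    if (constraintHolds(alpha)) return(alpha)   # becomes the new l(ik)
    alpha = alpha + 1L
  }
  NA_integer_                                   # no qualified subset exists
}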
Upper bounds
Updating the lower and the upper bounds is symmetric. The upper bound u(i_k) is updated
by $u(i_k) \leftarrow \min\big(u(i_k),\ u(i_{k+1}) - 1\big)$ first, and then through System (9).

If t* = 0, then α − k + t and u(i_t) have no intersection, and the first sum term in (10)(ii) is
ignored. Our next goal is to find an appropriate t*. We will see that knowing t* brings considerable
computational advantage.
For convenience, let

$$f(t^*, \alpha) = \sum_{t=0}^{t^*-1} x\big(u(i_t)\big) + \sum_{t=t^*}^{k} x(\alpha - k + t). \qquad (11)$$

For a given t*, f is maximized at the largest feasible α, namely α = u(i_{t*}) + k − t*, so a
feasible t* must satisfy

$$\max(f) = \sum_{t=0}^{t^*-1} x\big(u(i_t)\big) + \sum_{t=t^*}^{k} x\big(u(i_{t^*}) + t - t^*\big) \ \ge\ \mathrm{MIN} - \sum_{t=k+1}^{n-1} x\big(u(i_t)\big). \qquad (12)$$

Given such a t*, we then seek the least α such that

$$\sum_{t=t^*}^{k} x(\alpha - k + t) \ \ge\ \mathrm{MIN} - \sum_{t=k+1}^{n-1} x\big(u(i_t)\big) - \sum_{t=0}^{t^*-1} x\big(u(i_t)\big). \qquad (13)$$
α is incremented until Inequality (13) turns true; the resulting α becomes the new l(i_k).
Figure 1 gives a visual explanation of updating t* and l(i_k). Decomposing the max operator
in System (9) follows the same rationale.
Quasi-triangle matrix
The first sum in Inequality (13) and the second sum in (12) both add consecutive elements
of x. Such sums of consecutive elements can be precomputed and stored in the auxiliary
quasi-triangle matrix M (Section 2.2.2) so that each is retrieved in constant time.
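A minimal sketch of one natural realization, assuming M caches sums of consecutive elements of x; buildM is a hypothetical name:

# M[[m]][j] caches the sum of the m consecutive elements of x starting at
# index j (1-based). Row m has N - m + 1 entries, hence a quasi-triangle.
buildM = function(x, n)
{
  N = length(x)
  cs = c(0, cumsum(x))
  lapply(1:n, function(m) cs[(1:(N - m + 1)) + m] - cs[1:(N - m + 1)])
}
# Any sum over m consecutive elements in (12) or (13) then costs O(1):
# sum(x[j:(j + m - 1)]) == buildM(x, n)[[m]][j]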
Figure 1: Panels (1)-(5) update t∗ by Inequality (12) using linear search. Panels (6)-(8) update l(ik ) by
Inequality (13) using linear search.
The following code compares linear search against binary search (argument useBiSrchInFB)
in the contraction's bound updates over a hundred random supersets. Figure 2 shows the results.

R> set.seed(42)
R> lrtime = numeric(100)
R> bitime = numeric(100)
R> len = 100
R> me = 1e-4
R> for(i in 1L : 100L)
+ {
+ superset = sort(runif(1000, 0, 1e6))
+ target = sum(sample(superset, len))
+ lrtime[i] = system.time({FLSSS::FLSSS(len = len, v = superset,
+ target = target, ME = me, solutionNeed = 10,
+ useBiSrchInFB = FALSE)})['elapsed']
+ bitime[i] = system.time({FLSSS::FLSSS(len = len, v = superset,
+ target = target, ME = me, solutionNeed = 10,
+ useBiSrchInFB = TRUE)})['elapsed']
+ }
R> mean(bitime / lrtime)
[1] 1.790667

Figure 2: Superset sizes = 1000, subset sizes = 100, subset sum error thresholds = 0.0001, requested
number of subsets ≥ 10, g++ '-O2' compile, Intel(R) i7-4770 CPU @ 3.40GHz, Windows 7. Preprocessing
time included. Each of the 100 supersets contains uniforms in [0, 1000000]; the corresponding target sum
is the sum of elements of a subset sampled at random. Linear search yields about 1.8x acceleration.
It is easy to see that the infima l(i_t) and suprema u(i_t) become stationary after finitely many
contractions because l(i_t) ≤ u(i_t). The uniqueness of the stationary infima and suprema remains
to be proved. FLSSS provides a function z_findBound() for examining the concept. In the
following example code, the first call to z_findBound() contracts a 10-dimensional hypercube
starting with the infima; the second call starts with the suprema. Both calls converge to the
same hyperrectangle.
R> x = c(14, 60, 134, 135, 141, 192, 199, 203, 207, 234)
R> MIN = 813
R> MAX = 821
R> lit = as.integer(c(1, 2, 3, 4, 5))
R> uit = as.integer(c(6, 7, 8, 9, 10))
R> hyperRectangle1 = FLSSS:::z_findBound(len = 5, V = as.matrix(x),
+ target = (MIN + MAX) / 2, me = (MAX - MIN) / 2, initialLB = lit,
+ initialUB = uit, UBfirst = FALSE)
R> hyperRectangle2 = FLSSS:::z_findBound(len = 5, V = as.matrix(x),
+ target = (MIN + MAX) / 2, me = (MAX - MIN) / 2, initialLB = lit,
+ initialUB = uit, UBfirst = TRUE)
R> hyperRectangle1[-1]; hyperRectangle2[-1]
[[1]]
[1] 1 3 5 6 8
[[2]]
[1] 3 6 8 9 10
[[1]]
[1] 1 3 5 6 8

[[2]]
[1] 3 6 8 9 10

Figure 3: Panel (1) shows the variable-subspacing (VS) method. Assuming the narrowest dimension of
hyperrectangle p has width 4, subspacing p produces 4 child hyperrectangles a, b, c, d. On the other hand,
binary subspacing (BS) halves p, shown in Panel (2). If contracting a and b would both fail, VS has to
contract both a and b to know it, yet BS only has to contract (a, b). For BS, c and d are descendants of
(c, d), a smaller hyperrectangle than the p that gives birth to c and d in VS.

Figure 4: Superset sizes = 70, dimensionality = 14, subset sizes = 7, subset sum error thresholds = 0.01
for all dimensions, requested number of subsets < ∞, g++ '-O2' compile, 7 threads, Intel(R) i7-4770 CPU
@ 3.40GHz, Windows 7. Preprocessing time included. Each of the 100 supersets contains random uniforms
in [0, 10000]; the corresponding target sum is the sum of elements of a subset sampled at random. Binary
subspacing yields about 1.6x acceleration.
2.3. Subspacing
Some previous versions of FLSSS (i) select a dimension of the hyperrectangle resulted from
contraction, (ii) fix the value of that dimension, (iii) reduce the dimensionality of the problem
by 1 and (iv) execute contraction again. In step (i), it chooses the dimension having the least
domain width so to produce the fewest branches. These steps are a mixture of depth-first
and best-first searches.
The current version of FLSSS differs in step (ii) and (iii): it halves the domain of the selected
dimension, and reduces the dimensionality only if the domain width equals 1. Virtually,
the current subspacing method constructs a binary tree while the previous ones construct
a variable-branch tree. The binary tree appears to converge slower since it lengthens the
path between each two nodes of dimension reduction, but it actually yields higher overall
speed because (1) it could prune multiple branches at once if contraction fails over a halved
hyperrectangle, and (2) child nodes will receive a smaller hyperrectangle (in terms of volume)
Algorithm 1: Subspacing

A hyperrectangle object in stack consists of twelve parameters whose roles are explained in the
following algorithm: β (branch flag), κ (the dimension to halve), n (dimensionality), nz (the
number of newly fixed dimensions), MIN and MAX, the two sums Σ_{t=0}^{n−1} x(l(i_t)) and
Σ_{t=0}^{n−1} x(u(i_t)), the lower bounds l = (l(i_0), …, l(i_{n−1})) (LB), the upper bounds
u = (u(i_0), …, u(i_{n−1})) (UB), the reserved upper bounds u0 = (u(i_0), …, u(i_κ)) (Bresv),
and the reserved sum S(u).
LEFT BRANCH:

1: β ← 0. ▷ β = 0 implies the left branch, 1 the right branch.
2: Copy n, the dimensionality of the parent hyperrectangle.
3: Copy MIN, MAX, Σ_{t=0}^{n−1} x(l(i_t)), Σ_{t=0}^{n−1} x(u(i_t)), l and u from the parent.
4: Update the copied parameters through contraction. If it fails, return a failure signal.
5: T ← {t | u(i_t) = l(i_t)}, nz ← |T|. ▷ |T| is the cardinality.
6: l ← {l(i_t) | t ∉ T}, u ← {u(i_t) | t ∉ T}, n ← n − nz.
7: Update Σ_{t=0}^{n−1} x(l(i_t)) and Σ_{t=0}^{n−1} x(u(i_t)).
8: Push {u(i_t) | t ∈ T} into a global buffer B that is to hold a qualified subset. ▷ This step goes concurrently with Step 5.
9: κ ← argmin_t (u(i_t) − l(i_t)).
10: u0 ← (u(i_0), …, u(i_κ)), S(u) ← Σ_{t=0}^{n−1} x(u(i_t)).
11: For t ∈ [0, κ], u(i_t) ← min(u(i_t), ⌊u(i_κ)/2⌋ − κ + t). ▷ Loop t from κ and stop once u(i_t) ≤ ⌊u(i_κ)/2⌋ − κ + t.
12: Update Σ_{t=0}^{n−1} x(l(i_t)), Σ_{t=0}^{n−1} x(u(i_t)). ▷ Use M for fast update.

If contraction succeeds in Step 4, move to the right hyperrectangle in stack and execute the
above steps again. Otherwise left-propagate through the stack while erasing the last nz elements
in buffer B for each hyperrectangle, and stop once the current one has β = 0.

RIGHT BRANCH:

1: β ← 1.
2: For t ∈ [0, κ], u(i_t) ← u0(i_t); Σ_{t=0}^{n−1} x(u(i_t)) ← S(u).
3: For t ∈ [κ, n − 1], l(i_t) ← max(l(i_t), u(i_κ) + 1 + t − κ). ▷ Loop t from κ and stop once l(i_t) ≥ u(i_κ) + 1 + t − κ.
4: Update Σ_{t=0}^{n−1} x(l(i_t)). ▷ Use M for fast update.

Move to the right hyperrectangle and execute LEFT BRANCH.
Figure 3 and Algorithm 1 present the details of subspacing. The speed benefit from binary
subspacing is more pronounced for the multidimensional Subset Sum (Section 4), where
contractions consume the majority of the mining time. The following code compares the time
costs of mining a hundred supersets using the variable and binary subspacing methods.
Figure 4 shows the results.
R> set.seed(42)
R> N = 70L; n = 7L; d = 14L
R> mflsssBinTreeTime = numeric(100)
R> mflsssVarTreeTime = numeric(100)
R> for(i in 1L : 100L)
+ {
+ x = matrix(runif(N * d, 0, 10000), ncol = d)
+ tmp = colSums(x[sample(1L : N, n), ])
+ Sl = tmp - 0.01
+ Su = tmp + 0.01
+ mflsssBinTreeTime[i] = system.time(FLSSS::mFLSSSpar(
+ maxCore = 7, len = n, mV = x, mTarget = (Sl + Su) / 2,
+ mME = (Su - Sl) / 2, solutionNeed = 1e3, tlimit = 3600))['elapsed']
+ mflsssVarTreeTime[i] = system.time(FLSSS:::mFLSSSparVariableTree(
+ maxCore = 7, len = n, mV = x, mTarget = (Sl + Su) / 2,
+ mME = (Su - Sl) / 2, solutionNeed = 1e3, tlimit = 3600))['elapsed']
+ }
R> mean(mflsssVarTreeTime / mflsssBinTreeTime)
[1] 1.58791
3. Multi-Subset Sum
Given multiple sorted supersets and a subset size for each, the multi-Subset Sum problem seeks
a subset from every superset such that the elements in all subsets sum in a given range. The
OFSSA directly applies to this problem following four steps (sketched in code after
Equation (15)): (i) shift elements in some or all supersets so that (ii) all elements in the shifted
supersets constitute a nondecreasing sequence, a new superset; (iii) calculate a new subset sum
target range in response to the shifting in (i); and (iv) mine a subset of size equal to the sum of
the given subset sizes while the subset elements are bounded by the sub-supersets within the
new superset.

Let x_0, \ldots, x_{K−1} be K sorted supersets of sizes N_0, \ldots, N_{K−1}. Let n_0, \ldots, n_{K−1} be the
respective subset sizes and [MIN, MAX] be the target sum range. Shifting the supersets
follows

$$x^s_h(t) \leftarrow x_h(t) - x_h(0) + x^s_{h-1}(N_{h-1} - 1), \quad h \in [1, K),\ t \in [0, N_h), \qquad (15)$$

where $x^s_h$ denotes a shifted superset and $x^s_0 = x_0$. Elements in the shifted supersets are
then pooled together into the new superset.
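A minimal R sketch of Equation (15) and the pooling; poolSupersets is a hypothetical name, not a package function:

# Shift K sorted supersets so that their concatenation is nondecreasing,
# then pool them into one superset; each shifted superset starts where the
# previous one ends.
poolSupersets = function(supersets)   # 'supersets': list of K sorted vectors
{
  shifted = supersets
  for (h in 2:length(supersets)) {    # R's h corresponds to h in [1, K) above
    prev = shifted[[h - 1]]
    shifted[[h]] = supersets[[h]] - supersets[[h]][1] + prev[length(prev)]
  }
  unlist(shifted)
}

Per step (iii), the target range then shifts by the sum over h of n_h times the constant added to superset h.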
4. Multidimensional Subset Sum

Let the superset x be an N × d matrix whose rows x(s, ), s ∈ [0, N), are the d-dimensional
elements. Given subset size n and subset sum range [S_L, S_U] as two size-d arrays, the
multidimensional fixed-size Subset Sum algorithm (MFSSA) seeks an integer array
i = (i_0, i_1, \ldots, i_{n-1}) such that

$$S_L \ \le\ \sum_{t=0}^{n-1} x(i_t, ) \ \le\ S_U. \qquad (21)$$

The multidimensional variable-size Subset Sum follows a conversion similar to that in Section 2.
4.1. Comonotonization
If all columns in x are comonotonic (Dhaene, Denuit, Goovaerts, and Vyncke 2002), x can
be sorted so that x(0, ) ≤ . . . ≤ x(N − 1, ) . Overloading arithmetic operators (Eckel 2000)
in OFSSA then solves the problem.
In general, MFSSA roughly consists of (i) padding an extra column of nondecreasing integers
to the superset, (ii) scaling and adding this column to the rest to make all columns comonotonic
(to comonotonize; comonotonization), and (iii) mining at most N(N − n)/2 + 1 subset
sum ranges regarding the new superset. Each of these subset sum ranges corresponds to a
subset sum in the extra column. Mining different subset sum ranges in (iii) are independent
tasks that share the same auxiliary matrix M (Section 2.2.2), and thus can employ multithreading.
Let x* be the comonotonized superset. The extra column x*(, d) is referred to as the key
column; for convenience, let array v refer to x*(, d). The key column is constructed by
Equation (22). Let Δx(, t) be the discrete differential of x(, t), t ∈ [0, d). The rest of the
columns of x* are computed by

$$\theta(t) = -\min\big(0,\ \min \Delta x(, t)\big) \quad \text{and} \quad x^*(, t) \leftarrow x(, t) + v\,\theta(t), \qquad (23)$$
where θ(t) is referred to as the comonotonization multiplier for x(, t). The key column v has
no subset sum constraint. However, because it is a sorted integer sequence with maximal
discrete differential of 1, all unique subset sums in v compose an integer sequence:

$$S^{\mathrm{key}} = \Big( \sum_{t=0}^{n-1} v(t),\ \ 1 + \sum_{t=0}^{n-1} v(t),\ \ 2 + \sum_{t=0}^{n-1} v(t),\ \ \ldots,\ \ \sum_{t=N-n}^{N-1} v(t) \Big). \qquad (24)$$
The size of $S^{\mathrm{key}}$ equals $\sum_{t=N-n}^{N-1} v(t) - \sum_{t=0}^{n-1} v(t) + 1$, which is no more than
N(N − n)/2 + 1. Let $N_S$ be the size of $S^{\mathrm{key}}$. We have the following $N_S$ subset sum
ranges to mine:

$$S^*_L(s, t) = S_L(t) + S^{\mathrm{key}}(s)\,\theta(t), \quad S^*_U(s, t) = S_U(t) + S^{\mathrm{key}}(s)\,\theta(t), \quad S^*_L(s, d) = S^*_U(s, d) = S^{\mathrm{key}}(s), \qquad (25)$$

where t ∈ [0, d) and $[S^*_L(s, ), S^*_U(s, )]$, $s \in [0, N_S)$, account for one subset sum range.
Consider the following toy example of finding a size-2 subset from a 2D superset (n = 2, d = 2,
N = 3):

$$x = \begin{pmatrix} 4 & 10 \\ 2 & 25 \\ 8 & 17 \end{pmatrix} \qquad (26)$$

with $S_L = (11, 26)$ and $S_U = (12, 28)$. The minimal discrete differentials of the two columns
are −2 and −8, so the comonotonization multipliers are 2 and 8 respectively. We comonotonize
x by

$$x^* = \begin{pmatrix} 4 + 0 \times 2 & 10 + 0 \times 8 & 0 \\ 2 + 1 \times 2 & 25 + 1 \times 8 & 1 \\ 8 + 2 \times 2 & 17 + 2 \times 8 & 2 \end{pmatrix} = \begin{pmatrix} 4 & 10 & 0 \\ 4 & 33 & 1 \\ 12 & 33 & 2 \end{pmatrix} \qquad (27)$$

according to Equations (22) and (23). The unique size-two subset sums in the key column are 1,
2, 3, thus

$$S^*_L = \begin{pmatrix} 11 + 1 \times 2 & 26 + 1 \times 8 & 1 \\ 11 + 2 \times 2 & 26 + 2 \times 8 & 2 \\ 11 + 3 \times 2 & 26 + 3 \times 8 & 3 \end{pmatrix} = \begin{pmatrix} 13 & 34 & 1 \\ 15 & 42 & 2 \\ 17 & 50 & 3 \end{pmatrix}, \quad
S^*_U = \begin{pmatrix} 12 + 1 \times 2 & 28 + 1 \times 8 & 1 \\ 12 + 2 \times 2 & 28 + 2 \times 8 & 2 \\ 12 + 3 \times 2 & 28 + 3 \times 8 & 3 \end{pmatrix} = \begin{pmatrix} 14 & 36 & 1 \\ 16 & 44 & 2 \\ 18 & 52 & 3 \end{pmatrix}. \qquad (28)$$
The order in which the subset sum ranges are mined matters. Let t′ be a chosen dimension and
consider the percentile

$$p = \frac{\big(S_L(t') + S_U(t')\big)/2 - \sum_{s=0}^{n-1} x(s, t')}{\sum_{s=N-n}^{N-1} x(s, t') - \sum_{s=0}^{n-1} x(s, t')}. \qquad (29)$$
If there exist qualified subsets, their subset sums in the key column should have percentiles
close to p. We prioritize the subset sum targets whose percentiles are close to p; thus the rows
of $S^*_L$ and $S^*_U$ are ordered by

$$\left| \frac{S^*_U(, d) - S^*_U(0, d)}{S^*_U(N_S - 1, d) - S^*_U(0, d)} - p \right|. \qquad (30)$$
Given m threads, the first m rows of $S^*_L$ and $S^*_U$ are mined concurrently. The first finished
thread then works on the (m + 1)-th row, and so forth, if the current number of qualified subsets
is unsatisfied. The threads are scheduled by several atomic class objects (Intel 2011a). The
scheduling overhead is negligible.
Figure 5: A hundred 60 × 5 supersets and subset sum targets generated at random, subset size n = 6,
g++ ’-O2’ compile, 7 threads, Intel(R) i7-4770 CPU @ 3.40GHz, Windows 7. Preprocessing time included.
Order optimization yields about 4x acceleration for finding all qualified subsets.
The following code demonstrates the speed advantage from order optimizations in a hundred
test cases. Figure 5 shows the results.
R> set.seed(42)
R> N = 60L; n = 6L; d = 5L
R> noOpt = numeric(100)
R> withOpt = numeric(100)
R> for(i in 1L : 100L)
+ {
+ x = matrix(runif(N * d) * 10000, ncol = d)
+ solution = sample(1L : N, n)
+ Sl = colSums(x[solution, ]) * 0.999
+ Su = Sl / 0.999 * 1.001
+ rm(solution); gc()
+ noOpt[i] = system.time(FLSSS::mFLSSSparImposeBounds(maxCore = 7,
+ len = n, mV = x, mTarget = (Sl + Su) / 2, mME = (Su - Sl) / 2,
+ solutionNeed = 1e6))['elapsed']
+ withOpt[i] = system.time(FLSSS::mFLSSSpar(maxCore = 7, len = n,
+ mV = x, mTarget = (Sl + Su) / 2, mME = (Su - Sl) / 2,
+ solutionNeed = 1e6))['elapsed']
+ }
R> mean(noOpt / withOpt)
[1] 4.392401
These simulations seek all qualified subsets, thus order optimization (b) has no effect because
all rows in $S^*_L$ and $S^*_U$ will be tried. A certain yet minor reason for the acceleration due
to superset reordering is that it can lower the number of unique elements in the key column
by Equation (22), which leads to fewer rows in $S^*_L$ and $S^*_U$ and fewer tasks for the computing
threads. A probable and major reason is that reordering the superset puts its elements in
compact shapes or clusters instead of random, scattered formations in multidimensional space,
which leads to (i) more intense hyperrectangle contraction (Section 2.1) navigated by those
compact shapes or clusters, and thus (ii) fewer child hyperrectangles spawned for predicting
the locations of qualified subsets.
The above equations guarantee nonnegative integers in $x^z(, t)$ with minimum 0 and maximum
λ(t). Without considering the numeric errors brought by scaling and shifting, a larger λ(t)
makes $x^z$ and x have closer joint distributions and lowers the chance of the two yielding
different qualified subsets (element index-wise).
The comonotonization of $x^z$ results in $x^{z*}$, $S^{z*}_L$ and $S^{z*}_U$. Compressing integers for dimension
reduction approximately consists of (i) finding the largest absolute value that could be reached
during mining (LVM) for each dimension (column) of $x^{z*}$, (ii) calculating how many bits are
needed to represent the LVM for each dimension and how many 64-bit buffers are needed to
store those bits for all dimensions, (iii) cramming each row of $x^{z*}$ into the buffers, and (iv)
defining compressed-integer algebras through bit manipulation.
The LVM for a certain dimension should equal or exceed the maximum of the absolute values
of all temporary or permanent variables in the mining program within that dimension. Let
ψ(t) be the LVM for $x^{z*}(, t)$, t ∈ [0, d). We have

$$\psi(t) \leftarrow \max\Big( \max \big|S^{z*}_L(, t)\big|,\ \ \max \big|S^{z*}_U(, t)\big|,\ \ \sum_{s=N-n}^{N-1} x^{z*}(s, t) \Big). \qquad (32)$$
In the construction of the compressed row $x^{z*c}(s, )$ (Equations (33) and (34), where ≪ is the
left bit-shift operator), the term 64 − β(0), or 64 − β(0) − β(1), and so on, is referred to as the
shift distance. The construction of $x^{z*c}(s, 0)$ stops once the shift distance would become
negative. The next 64-bit buffer $x^{z*c}(s, 1)$ then starts accommodating the bits of the rest of
the elements in $x^{z*}(s, )$, so on and so forth.
Let $d_c$ be the dimensionality of $x^{z*c}$; it is estimated before integer compression. A different
order of the $x^{z*}$ columns means a different order of β, and such an order may lead to a smaller
$d_c$. Finding the best order for minimizing $d_c$ amounts to a bin-packing problem (Korte and
Vygen 2006). Currently FLSSS does not optimize this order, in order to give users the option of
bounding partial columns (see the package user manual), which needs the lower-bounded
(upper-bounded) columns to sit next to each other. Equation (34) also applies to computing
$S^{z*c}_L$ and $S^{z*c}_U$.
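The estimate of $d_c$ can be sketched as follows; the bit-width formula below (one sign bit plus enough bits to represent ψ(t)) is an assumption standing in for the unreproduced Equations (33)-(34), and bufferCount is a hypothetical name:

# Pack the per-dimension bit widths greedily, in the given column order,
# into 64-bit buffers; the buffer count is the compressed dimensionality.
bufferCount = function(psi)             # 'psi': vector of LVMs, one per dimension
{
  beta = ceiling(log2(psi + 1)) + 1     # assumed bits per dimension, incl. sign bit
  dc = 1L; room = 64L
  for (b in beta) {
    if (b > room) { dc = dc + 1L; room = 64L }   # open a new buffer
    room = room - b
  }
  dc
}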
In π, bits equal to 1 align with the sign bits in a row of $x^{z*c}$. We define

$$x^{z*c}(s, ) \pm x^{z*c}(t, ) = \big( x^{z*c}(s, 0) \pm x^{z*c}(t, 0),\ \ldots,\ x^{z*c}(s, d_c - 1) \pm x^{z*c}(t, d_c - 1) \big),$$

$$x^{z*c}(s, ) \le x^{z*c}(t, ) \ \equiv\ \Big( \big(x^{z*c}(t, 0) - x^{z*c}(s, 0)\big) \,\&\, \pi(0) = 0 \Big) \wedge \ldots \wedge \Big( \big(x^{z*c}(t, d_c - 1) - x^{z*c}(s, d_c - 1)\big) \,\&\, \pi(d_c - 1) = 0 \Big), \qquad (36)$$

where & denotes bitwise AND. As an example, consider the comparison mask
comparisonMask = 1000000000100001000000000100000000000010000000000000000000000000 .
The mask indicates the first elemental integer occupies the 2nd to 9th bits, the second
elemental integer occupies the 12th to 15th bits, and so on. The following C++ code defines
addition, subtraction and comparison of two such 64-bit buffers on a 64-bit machine:
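(The listing below is a minimal sketch consistent with Equation (36); the type and function names CompressedRow, add, subtract and lessEq are illustrative, not the package's internals.)

#include <cstdint>
#include <cstddef>
#include <vector>

// One row of the compressed superset: dc 64-bit buffers, each packing
// several nonnegative integers, every field led by a sign (guard) bit.
struct CompressedRow { std::vector<std::uint64_t> buf; };

// Field-wise addition: the guard bits leave headroom in every field, so a
// single 64-bit addition adds all packed fields without cross-field carry.
void add(CompressedRow& s, const CompressedRow& t)
{
  for (std::size_t j = 0; j < s.buf.size(); ++j) s.buf[j] += t.buf[j];
}

// Field-wise subtraction; a field of the result has its sign bit set when
// the corresponding field of s is smaller than that of t.
void subtract(CompressedRow& s, const CompressedRow& t)
{
  for (std::size_t j = 0; j < s.buf.size(); ++j) s.buf[j] -= t.buf[j];
}

// s <= t field-wise, per Equation (36): mask[j] is the comparison mask
// pi(j), with 1-bits at the sign-bit positions of buffer j.
bool lessEq(const CompressedRow& s, const CompressedRow& t,
            const std::vector<std::uint64_t>& mask)
{
  for (std::size_t j = 0; j < s.buf.size(); ++j)
    if ((t.buf[j] - s.buf[j]) & mask[j]) return false;
  return true;
}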
Figure 6: A hundred 70 × 14 supersets and subset sum targets generated at random, subset size n = 7,
g++ ’-O2’ compile, 7 threads, Intel(R) i7-4770 CPU @ 3.40GHz, Windows 7. Preprocessing time included.
Integerization yields about 1.5x acceleration for finding all qualified subsets.
The addition and subtraction have no overhead compared with those for normal integers. The
comparison may halve the speed of a single '≤' operation, but the acceleration due to dimension
reduction reverses the situation globally. See Liu (2018) for more discussion of compressed-integer
algebras. The following code demonstrates the speedup from integerization. Figure 6 shows
the results.
R> set.seed(42)
R> N = 70L; n = 7L; d = 14L
R> noOpt = numeric(100)
R> withOpt = numeric(100)
R> for(i in 1L : 100L)
+ {
+ x = matrix(runif(N * d) * 10000, ncol = d)
+ solution = sample(1L : N, n)
+ Sl = colSums(x[solution, ]) * 0.999
+ Su = Sl / 0.999 * 1.001
+ rm(solution); gc()
+ noOpt[i] = system.time(FLSSS::mFLSSSpar(maxCore = 7, len = n, mV = x,
+ mTarget = (Sl + Su) / 2, mME = (Su - Sl) / 2, solutionNeed = 1e3,
+ tlimit = 3600))['elapsed']
+ withOpt[i] = system.time(FLSSS::mFLSSSparIntegerized(maxCore = 7,
+ len = n, mV = x, mTarget = (Sl + Su) / 2, mME = (Su - Sl) / 2,
+ solutionNeed = 1e3, tlimit = 3600))['elapsed']
+ }
R> mean(noOpt / withOpt)
[1] 1.477555
5. Multidimensional Knapsack

Given a set of items, each with a profit attribute and a cost attribute, the 0-1 Knapsack problem
seeks a subset of items that maximizes the total profit while the total cost does not surpass
a given value. The multidimensional 0-1 Knapsack problem assigns multiple cost attributes
to an item, and maximizes the total profit while the total cost in each cost dimension stays
below its individual upper bound. The computational complexity of the multidimensional
0-1 Knapsack problem escalates rapidly as the dimensionality rises.
MFSSA directly applies to the multidimensional fixed-size 0-1 Knapsack (MF01K) problem;
see Section 2 for converting a variable-size instance. Consider an MF01K instance with subset
size n, d cost attributes and N items. The costs constitute an N × d superset x whose rows are
sorted by item profit in ascending order. For t ∈ [0, d), the subset sum upper bound $S_U(t)$
equals the given cost upper bound, and the lower bound $S_L(t)$ equals the sum of the least
n elements in x(, t). We then pad the key column in Equation (22), and comonotonize x to
obtain $x^*$ in Equation (23) and $S^*_L$, $S^*_U$ in Equation (25).
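As a small sketch of this setup (mf01kBounds and costCaps are hypothetical names; x is the profit-sorted N × d cost superset):

# Subset sum bounds for an MF01K instance: the upper bounds are the given
# cost caps; each lower bound sums the n least costs in its column.
mf01kBounds = function(x, n, costCaps)
{
  list(SL = apply(x, 2, function(cost) sum(sort(cost)[seq_len(n)])),
       SU = costCaps)
}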
5.1. Optimization
The column $S^*_L(, d)$ (equivalently $S^*_U(, d)$) essentially consists of sums of ranks of the item
profits; thus a qualified subset found via mining $[S^*_L(s, ), S^*_U(s, )]$ would more likely have a
greater total item profit than one found via $[S^*_L(s', ), S^*_U(s', )]$ for s > s′. Therefore, we sort
$S^*_L$ and $S^*_U$ by $S^*_U(, d)$ in descending order to increase the chance that qualified subsets with
higher total item profits are found sooner. On the other hand, a heuristic approach could stop
immediately once it finds a qualified subset.
Given m threads, instead of mining the first m rows of $S^*_L$ and $S^*_U$ concurrently (Section 4.2),
the threads all concentrate on $[S^*_L(0, ), S^*_U(0, )]$ first. Given a constant φ, we perform a
breadth-first search starting from the root node of the binary tree in Figure 3 until there are
no fewer than mφ nodes for trial in the same hierarchy. The threads then have mφ independent
tasks to work on. These tasks have heterogeneous difficulties. To lower the chance of thread
idleness, the first m tasks are solved concurrently, and the first finished thread moves on to
the (m + 1)-th task, and so forth. If idle threads are detected, another breadth-first expansion
generates sufficient tasks to keep them busy.
Once a thread finds a qualified subset, it updates the current optimum if the total profit of
the subset exceeds that of the current optimum. The optimum and its profit are guarded by
a spin mutex lock (Intel 2011b).
After each contraction (Section 2.1), if the total item profit of the hyperrectangle’s upper
bounds u is below that of the current optimum, we prune the entire branch rooted at this
hyperrectangle.
6. Generalized Assignment

The minimal discrete differentials in x(, 0), x(, 1), x(, 2) are −21, −13 and −17 respectively.
Instead of taking a different comonotonization multiplier for each column, as Equation (23)
suggests, here we take a number larger than the negative of the lowest discrete differential
of all columns, e.g. 22, as the universal comonotonization multiplier for all columns. The
advantage of this choice is shown below.
Given the comonotonization multiplier 22, Equation (41) undergoes the following transformation:

$$x \Rightarrow \begin{pmatrix}
\frac{21+0\times 22}{22} & \frac{0+0\times 22}{22} & \frac{0+0\times 22}{22} & 0 \\
\frac{0+1\times 22}{22} & \frac{0+1\times 22}{22} & \frac{9+1\times 22}{22} & 1 \\
\frac{0+2\times 22}{22} & \frac{13+2\times 22}{22} & \frac{0+2\times 22}{22} & 2 \\
\frac{0+0\times 22}{22} & \frac{0+0\times 22}{22} & \frac{17+0\times 22}{22} & 0 \\
\frac{6+1\times 22}{22} & \frac{0+1\times 22}{22} & \frac{0+1\times 22}{22} & 1 \\
\frac{0+2\times 22}{22} & \frac{11+2\times 22}{22} & \frac{0+2\times 22}{22} & 2
\end{pmatrix} = \begin{pmatrix}
0.955 & 0 & 0 & 0 \\
1 & 1 & 1.409 & 1 \\
2 & 2.591 & 2 & 2 \\
0 & 0 & 0.773 & 0 \\
1.273 & 1 & 1 & 1 \\
2 & 2.5 & 2 & 2
\end{pmatrix} \rightarrow x. \qquad (42)$$
There are five unique size-2 subset sums in the key column: 4, 3, 2, 1, 0. Five subset sum
ranges are thus subject to mining:

$$S_L = \begin{pmatrix}
-\infty & -\infty & -\infty & 4 \\
-\infty & -\infty & -\infty & 3 \\
-\infty & -\infty & -\infty & 2 \\
-\infty & -\infty & -\infty & 1 \\
-\infty & -\infty & -\infty & 0
\end{pmatrix}, \quad
S_U = \begin{pmatrix}
\frac{26+4\times 22}{22} & \frac{25+4\times 22}{22} & \frac{27+4\times 22}{22} & 4 \\
\frac{26+3\times 22}{22} & \frac{25+3\times 22}{22} & \frac{27+3\times 22}{22} & 3 \\
\frac{26+2\times 22}{22} & \frac{25+2\times 22}{22} & \frac{27+2\times 22}{22} & 2 \\
\frac{26+1\times 22}{22} & \frac{25+1\times 22}{22} & \frac{27+1\times 22}{22} & 1 \\
\frac{26+0\times 22}{22} & \frac{25+0\times 22}{22} & \frac{27+0\times 22}{22} & 0
\end{pmatrix} = \begin{pmatrix}
5.182 & 5.136 & 5.227 & 4 \\
4.182 & 4.136 & 4.227 & 3 \\
3.182 & 3.136 & 3.227 & 2 \\
2.182 & 2.136 & 2.227 & 1 \\
1.182 & 1.136 & 1.227 & 0
\end{pmatrix}, \qquad (43)$$

where $S_L(s, )$ and $S_U(s, )$, $s \in [0, 4]$, account for one subset sum range.
For row s in every block of x, there exists only one element that is fractional, and it is always
greater than s. This property is exploited for speedup via a compact representation of x:
each row is represented by two values, the fraction and its column index. Algorithms for
addition, subtraction and comparison are tuned for the compact representation, and they are
also considerably easier to implement.
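A sketch of the compact representation; the names are illustrative and the package's tuned algorithms are not reproduced:

# A compact row of x from Equation (42): integer part s (the key), the
# column holding the single fractional element, and the fraction itself.
makeRow = function(s, fracCol, frac) list(s = s, fracCol = fracCol, frac = frac)

# Sum two compact rows over d value columns: the integer parts add
# uniformly; each fraction lands only in its own column.
addRows = function(r1, r2, d)
{
  sums = rep(r1$s + r2$s, d)
  sums[r1$fracCol] = sums[r1$fracCol] + r1$frac
  sums[r2$fracCol] = sums[r2$fracCol] + r2$frac
  sums
}

# Example with the first two rows of the first block in Equation (42):
# addRows(makeRow(0, 1, 0.955), makeRow(1, 3, 0.409), 3) gives
# (1.955, 1, 1.409), matching direct summation of the two rows.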
GAP in FLSSS is an exact algorithm. For large-scale, suboptimality-sufficient problems, its
speed may not catch up with fast heuristics such as Haddadi and Ouzia (2004) and Nauss (2003),
which employ a variety of relaxation and approximation techniques to approach suboptima.
In the package's interest, record-holding heuristics suitable for large-scale, suboptimality-sufficient
problems of generalized assignment or multidimensional knapsack may be implemented in the
package in the future.
References
Bazgan C, Santha M, Tuza Z (2002). "Efficient Approximation Algorithms for the Subset-Sums
Equality Problem." Journal of Computer and System Sciences.

Ghosh D, Chakravarti N (1999). "A Competitive Local Search Heuristic for the Subset Sum
Problem." Computers and Operations Research.

Haddadi S, Ouzia H (2004). "Effective Algorithm and Heuristic for the Generalized Assignment
Problem." European Journal of Operational Research.

Intel (2017). Intel Threading Building Blocks.

Land AH, Doig AG (1960). "An Automatic Method of Solving Discrete Programming Problems."
Econometrica.

Nauss RM (2003). "Solving the Generalized Assignment Problem: An Optimizing and Heuristic
Approach." INFORMS Journal on Computing.

Spearman CE (1904). "The Proof and Measurement of Association between Two Things."
American Journal of Psychology.
Affiliation:
Charlie Wusuo Liu
Boston, Massachusetts
United States
E-mail: [email protected]