Branch-and-Bound Applications in Combinatorial Data Analysis
Statistics and Computing
Michael J. Brusco
Stephanie Stahl

Branch-and-Bound Applications in Combinatorial Data Analysis
Michael J. Brusco
Department of Marketing
College of Business
Florida State University
Tallahassee, FL 32306-1110
USA

Stephanie Stahl
2352 Hampshire Way
Tallahassee, FL 32309-3138
USA
Series Editors:

J. Chambers
Bell Labs, Lucent Technologies
600 Mountain Avenue
Murray Hill, NJ 07974
USA

D. Hand
Department of Mathematics
Imperial College, London
South Kensington Campus
London SW7 2AZ
United Kingdom

W. Härdle
Institut für Statistik und Ökonometrie
Humboldt-Universität zu Berlin
Spandauer Str. 1
D-10178 Berlin
Germany
For Cobol Lipshitz and Snobol Gentiment
Preface
Michael J. Brusco
Stephanie Stahl
January, 2005
Contents

Preface

1 Introduction
  1.1 Background
  1.2 Branch-and-Bound
    1.2.1 A Brief History
    1.2.2 Components of a Branch-and-Bound Model
  1.3 An Outline of the Monograph
    1.3.1 Module 1: Cluster Analysis–Partitioning
    1.3.2 Module 2: Seriation
    1.3.3 Module 3: Variable Selection
  1.4 Layout for Nonintroductory Chapters

I Cluster Analysis—Partitioning

2 An Introduction to Branch-and-Bound Methods for Partitioning
  2.1 Partitioning Indices
  2.2 A Branch-and-Bound Paradigm for Partitioning
    2.2.1 Algorithm Notation
    2.2.2 Steps of the Algorithm
    2.2.3 Algorithm Description

3 Minimum-Diameter Partitioning
  3.1 Overview
  3.2 The INITIALIZE Step
  3.3 The PARTIAL SOLUTION EVALUATION Step
  3.4 A Numerical Example
  3.5 Application to a Larger Data Set
  3.6 An Alternative Diameter Criterion
  3.7 Strengths and Limitations
  3.8 Available Software

6 Multiobjective Partitioning
  6.1 Multiobjective Problems in Cluster Analysis
  6.2 Partitioning of an Object Set Using Multiple Bases
  6.3 Partitioning of Objects in a Single Data Set Using Multiple Criteria
  6.4 Strengths and Limitations
  6.5 Available Software

II Seriation

7 Introduction to the Branch-and-Bound Paradigm for Seriation
  7.1 Background
  7.2 A General Branch-and-Bound Paradigm for Seriation

References

Index
1 Introduction
1.1 Background
There are many problems in statistics that require the solution of con-
tinuous (or smooth) optimization problems. Examples include maximum-
likelihood estimation for confirmatory factor analysis and gradient-based
search methods for multidimensional scaling. These problems are often
characterized by establishment of an objective function and, in some in-
stances, a set of corresponding constraints. Solution proceeds via stan-
dard calculus-based methods for (constrained or unconstrained) optimi-
zation problems. In some situations, the optimality conditions associated
with the calculus-based approach enable a closed form solution to be ob-
tained. An example is the classic normal equation for linear regression
analysis. In other circumstances, no closed form solution exists and the
problem must be solved via numerical estimation procedures, such as the
Newton-Raphson method.
                          Flow to workstation
                        1     2     3     4     5     6
Flow from        1    ---     0    10    40     0    10
workstation      2     30   ---    10    10    20     0
                 3      0     5   ---     0    10    30
                 4      5     0     5   ---    10     0
                 5      0    10    20    50   ---    20
                 6      0     0     5    20     0   ---
The plant manager could find the optimal solution to this problem by
enumerating all feasible sequences and selecting the one that corre-
sponded to the least amount of backtracking. To determine the number of
possible sequences, the manager determines that there are six possible
assignments for the leftmost position in the sequence. Once the leftmost
position is assigned a workstation, there are five possibilities for the sec-
ond leftmost position, and so on. Thus, the feasible set of solutions for
this problem consists of n! = 6 × 5 × 4 × 3 × 2 × 1 = 720 possible sequences. The best of these sequences is 2-1-5-3-6-4, which yields a total backtracking of 50 (the sum of the entries below the main diagonal in Table 1.2).
Table 1.2. Reordered workflow matrix, W, for the optimal sequence 2-1-5-3-6-4.

                          Flow to workstation
                        2     1     5     3     6     4
Flow from        2    ---    30    20    10     0    10
workstation      1      0   ---     0    10    10    40
                 5     10     0   ---    20    20    50
                 3      5     0    10   ---    30     0
                 6      0     0     0     5   ---    20
                 4      0     5    10     5     0   ---
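Because the example is so small, the complete-enumeration strategy described above can be reproduced directly. The following is a minimal sketch (ours, in Python rather than the Fortran used for the programs discussed later in the monograph; the names W, backtracking, and best are our own) that enumerates all 720 sequences for the workflow matrix of Table 1.1 and reports the one with the least total backtracking.

from itertools import permutations

# W[i][j] = flow from workstation i+1 to workstation j+1 (Table 1.1)
W = [[0, 0, 10, 40, 0, 10],
     [30, 0, 10, 10, 20, 0],
     [0, 5, 0, 0, 10, 30],
     [5, 0, 5, 0, 10, 0],
     [0, 10, 20, 50, 0, 20],
     [0, 0, 5, 20, 0, 0]]

def backtracking(seq):
    # Total flow that runs from right to left (i.e., backtracks) under seq.
    total = 0
    for pos, j in enumerate(seq):            # workstation j occupies position pos
        for i in seq[pos + 1:]:              # workstation i sits to the right of j
            total += W[i - 1][j - 1]         # flow from i back to j is backtracking
    return total

best = min(permutations(range(1, 7)), key=backtracking)
print(best, backtracking(best))              # the text identifies 2-1-5-3-6-4 as optimal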
As the authors note, the primary limiting factor of the dynamic programming approach for the described applications is its sensitivity to computer storage capacity.
Branch-and-bound is also an attractive partial enumeration strategy for
optimization problems in combinatorial data analysis; however, there is
currently no analogous resource to Hubert et al.’s (2001) dynamic pro-
gramming monograph. Our goal in this monograph is to fill this void. As
we shall see, the branch-and-bound approach offers some advantages
over dynamic programming for certain classes of problems, but also has
some drawbacks that must be carefully addressed for successful imple-
mentation. The branch-and-bound algorithms described in this mono-
graph require far less computer storage than their dynamic programming
competitors, which makes the former class of algorithms more amenable
to handling larger problems. On the other hand, dynamic programming
methods are much less sensitive to the characteristics of the input data
and are, therefore, more consistent with respect to solution times for
problems of a given size. Branch-and-bound algorithms are quite sensi-
tive to the properties of the data. For some data sets, branch-and-bound
might be more efficient than dynamic programming, whereas in other
cases branch-and-bound can require much more computation time than
dynamic programming. The key is to recognize that neither approach is
superior in all cases and consider the data properties when choosing a
methodology.
1.2 Branch-and-Bound
sition and, subsequently stepping back to the second position (i.e., mov-
ing from level three to level two). Next, a branch right operation would
be applied to the second position, resulting in the new partial sequence 2-5. The implication is that all possible solutions from the partial sequence
2-4 had been either implicitly or explicitly evaluated, and the next step
was to move back to the second level and build branches from 2-5.
To illustrate the components of branch-and-bound, we return one final
time to the minimum backtracking example. We assume the initial upper
bound is 75 and that the current partial sequence is 1-5. The total back-
tracking that corresponds to this partial sequence is (w21 + w31 + w41 +
w51 + w61 + w25 + w35 + w45 + w65) = (30 + 0 + 5 + 0 + 0 + 20 + 10 + 10 +
0) = 75. We know that any completed sequence must have a backtrack-
ing total of at least 75 because we know that workstations 1 and 5 are the
first two in the sequence. However, we can augment this term by recognizing that the very best possible contributions corresponding to the yet unassigned workstations are given by min(w23, w32) + min(w24, w42) + min(w26, w62) + min(w34, w43) + min(w36, w63) + min(w46, w64) = 5 + 0 + 0 + 0 + 5 + 0 = 10. Adding the two components together yields 75 + 10 = 85, which is greater than the current upper bound of 75. Thus, the
partial sequence of two workstations is pruned, and all (6-2)! = 24 possi-
ble sequences that could stem from this partial sequence need not be ex-
plicitly evaluated.
After pruning the partial sequence 1-5, we would branch right to the
partial sequence 1-6. The direct contribution to backtracking from this
partial sequence is 85, which exceeds the upper bound. Thus, the partial
sequence would be pruned. Branching right is not appropriate here, as
there is not a seventh workstation. Therefore, retraction occurs by mov-
ing back to level one and branching right, thus creating a new partial so-
lution that simply consists of workstation 2 in the first position.
When considering the partial sequence consisting of workstation 2 in
the first position, the direct contribution to backtracking is (w12 + w32 + w42 + w52 + w62) = (0 + 5 + 0 + 10 + 0) = 15. Adding the minimum pos-
sible contributions that could possibly occur from pairs of the unassigned
workstations yields a value of 30. Because 15 + 30 = 45 < 75, the
branch-and-bound algorithm would proceed by branching forward (i.e.,
moving from the first level to the second level) and creating the new par-
tial sequence 2-1. Pursuing this branch would ultimately lead to the op-
timal sequence 2-1-5-3-6-4, which was noted as the optimal solution in
section 1.1.
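A compact sketch (ours, not the authors' code) of the bound calculation illustrated in the preceding paragraphs follows: the backtracking already forced by a partial sequence plus, for every pair of workstations not yet assigned, the smaller of the two flows between them.

# Workflow matrix W from Table 1.1 (W[i][j] = flow from workstation i+1 to j+1)
W = [[0, 0, 10, 40, 0, 10], [30, 0, 10, 10, 20, 0], [0, 5, 0, 0, 10, 30],
     [5, 0, 5, 0, 10, 0], [0, 10, 20, 50, 0, 20], [0, 0, 5, 20, 0, 0]]

def lower_bound(partial):
    n = len(W)
    unassigned = [i for i in range(1, n + 1) if i not in partial]
    # Direct contribution: flow into each assigned workstation from every
    # workstation that must appear somewhere to its right.
    direct = sum(W[i - 1][j - 1]
                 for pos, j in enumerate(partial)
                 for i in partial[pos + 1:] + unassigned)
    # Best-case contribution of each pair of unassigned workstations.
    pairs = sum(min(W[i - 1][j - 1], W[j - 1][i - 1])
                for a, i in enumerate(unassigned) for j in unassigned[a + 1:])
    return direct + pairs

print(lower_bound([1, 5]))   # 75 + 10 = 85, matching the pruned node in the text
print(lower_bound([2]))      # 15 + 30 = 45, matching the fathomed node in the text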
Part I
Cluster Analysis—Partitioning
2 An Introduction to Branch-and-Bound Methods for Partitioning
There are many possible indices that can be used to develop a partition of
the objects in S based on the dissimilarity information in A (see, for ex-
ample, Guénoche, 2003; Hansen & Jaumard, 1997; Hubert et al., 2001,
Chapter 3). Our development of these indices uses the following nota-
tion:
K = the number of clusters, indexed k = 1,..., K;
Ck = the set of objects assigned to cluster k (k = 1,..., K);
nk = the number of objects in cluster k (i.e., the cardinality of Ck: nk = |Ck|, for k = 1,..., K);
πK = a feasible partition consisting of K clusters, πK = (C1, C2,..., CK);
ΠK = the set of all feasible partitions of size K (πK ∈ ΠK).
\min_{\pi_K \in \Pi_K} : f_3(\pi_K) = \sum_{k=1}^{K} \left( \frac{\sum_{(i<j) \in C_k} a_{ij}}{n_k} \right),    (2.3)

\min_{\pi_K \in \Pi_K} : f_4(\pi_K) = \sum_{k=1}^{K} \left( \frac{\sum_{(i<j) \in C_k} a_{ij}}{n_k (n_k - 1)/2} \right).    (2.4)
All of these criteria are concerned with the dissimilarity between pairs
of objects within the same cluster, known as the dissimilarity elements.
For a given cluster k, the maximum dissimilarity element between any
pair of objects within that cluster is referred to as the cluster diameter.
The maximum cluster diameter across all k clusters is termed the parti-
tion diameter, and this is associated with the objective criterion (2.1).
Minimization of the partition diameter produces clusters that are compact
in the sense that the large dissimilarity elements are suppressed by plac-
ing more dissimilar objects in different clusters. Criterion (2.1) is fundamentally different from (2.2), (2.3), and (2.4) in the sense that it minimizes a maximum within-cluster dissimilarity rather than a sum of within-cluster dissimilarities.
\frac{1}{K!} \sum_{k=0}^{K} (-1)^k \binom{K}{k} (K - k)^n.    (2.5)
Assuming that the n objects have been enumerated into an object list, a
branch-and-bound paradigm for partitioning builds solutions by adding
successive objects to available clusters. A partial solution consists of an
assignment of the first p objects to clusters. The remaining n – p objects
have not been assigned a cluster. We denote Sp as the set of objects that have been assigned to clusters and S̄p = S \ Sp (i.e., the complement of Sp with respect to S) as the unassigned objects. The evaluation of a partial solution requires an answer to the question: Is it definitely impossible to assign the objects in S̄p to clusters so as to provide an optimal solution?
If we can definitively answer this question as “Yes,” then the current par-
tial solution need not be pursued any further. Thus, the partial solution,
as well as all possible partial and complete clustering solutions that stem
from it, can be eliminated from further consideration. If the answer to the
question is “No,” then we move deeper into the search tree by assigning
object p + 1 to a cluster.
fr(λ*) = the incumbent (best found) objective value for the selected
criterion r corresponding to the incumbent complete solu-
tion, λ*.
2003; Diehr, 1985). Another part of the initialization process might in-
clude a reordering of the rows and columns of A to enable partial solu-
tions to be eliminated earlier in the branching process. Because the ap-
propriate heuristic procedures and matrix reordering strategies can differ
across criteria, we defer these until specific coverage of the criteria.
Step 1 is referred to as “BRANCH FORWARD” because, at this step,
the branch-and-bound algorithm is moving deeper into the search tree by
assigning a cluster membership to a newly selected object. Step 1 ad-
vances the pointer, p, and places the corresponding object in the first
cluster (k = 1).
Step 2 is a FEASIBILITY TEST that determines whether or not the
current partial solution could possibly yield a partition upon completion
of the cluster assignments. If n – p < τ, then there are not enough unas-
signed objects to fill the remaining clusters. Consider, for example, a
clustering problem for which n = 20 and K = 6. If the current pointer po-
sition is p = 17 and the 17 assigned objects have only been assigned to
two of the six clusters, then τ = 6 – 2 = 4. The assignment of the remain-
ing three unassigned objects could only provide memberships for three of
the four empty clusters. Thus, it is impossible to complete a six-cluster
partition and the current partial solution is pruned.
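The feasibility test itself reduces to a single comparison. A minimal sketch (ours; the argument names are our own) using the example values from the text:

def feasible(n, p, clusters_used, K):
    tau = K - clusters_used        # number of clusters that are still empty
    return n - p >= tau            # enough unassigned objects remain to fill them

print(feasible(20, 17, 2, 6))      # False: the partial solution is pruned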
PARTIAL SOLUTION EVALUATION in Step 3 is the component of
the algorithm that really “makes or breaks” the success of the branch-
and-bound paradigm. This is the point in the algorithm where the ques-
tion is posed: Can completion of the partial solution possibly achieve an
objective value that is better than the incumbent? The evaluation of a
partial solution obviously includes information from the current partial
solution itself but, in order to achieve maximum effectiveness, must also
reflect what objective criterion contributions can possibly be achieved
from the yet unassigned objects. This is perhaps the most challenging as-
pect of branch-and-bound algorithm design, and appropriate strategies
can vary across different criteria.
If the current partial solution passes the tests in Steps 2 and 3, then it is
tested for completeness in Step 4. In other words, a check is made to de-
termine whether the partial solution is actually a complete solution. If the
partial solution is, in fact, a completed partition, then the partial solution
evaluation of Step 3 has determined it to be better than the incumbent so-
lution and, so, it replaces the current incumbent solution (λ* = λ). Other-
wise, the search moves deeper into the tree by returning to Step 1.
A DISPENSATION of the current partial solution is made in Step 5.
Step 5 determines whether the current partial solution should BRANCH
RIGHT in Step 6 by assigning the current object p to the next cluster (k + 1).
3 Minimum-Diameter Partitioning
3.1 Overview
Partitioning based on the diameter criterion (2.1) has a rich history in the
classification literature. The diameter criterion is also known as the
Maximum Method (Johnson, 1967) due to the focus on the minimization
of the maximum dissimilarity element within clusters. Compact Parti-
tioning is also a viable descriptor for minimum-diameter partitioning be-
cause the resultant clusters are kept as tight as possible given the down-
ward pressure on the maximum dissimilarities within clusters.
There are fundamental relationships between minimum-diameter parti-
tioning, complete-link hierarchical cluster analysis, and the coloring of
threshold graphs, and these are well documented in the classification lit-
erature (Baker & Hubert, 1976; Brusco & Cradit, 2004; Guénoche, 1993;
Hansen & Delattre, 1978; Hubert, 1974). Complete-link hierarchical
clustering is an agglomerative method that, at each level of the hierarchy,
merges clusters that result in the minimum increase in the maximum
pairwise dissimilarity element. Cutting the tree at some number of clus-
ters K ≥ 2 can produce a partition of objects; however, the resulting parti-
tion is not guaranteed to be an optimal solution for (2.1). In fact, results
from Hansen and Delattre (1978) suggest that the suboptimality associ-
ated with the use of complete-link hierarchical clustering as a heuristic
for the minimum-diameter partitioning problem can be quite severe.
Rao (1971) demonstrated that, for the special case of K = 2, the mini-
mum-diameter partitioning problem (2.1) can be solved using a straight-
forward repulsion algorithm. This bipartitioning procedure has also been
discussed by Hubert (1973), and a streamlined algorithm was presented
by Guénoche et al. (1991). The divisive bipartitioning approach can also
be used, in a heuristic manner, to produce solutions to (2.1) for K ≥ 3.
Specifically, using K-1 bipartition splits (each time focusing on the clus-
ter with the largest diameter), a K-cluster partition is produced. The final
partition obtained, however, does not necessarily yield a minimum di-
ameter when K ≥ 3.
from the merger as small as possible. Thus, after one, two, and three it-
erations, there are n − 1, n − 2, and n − 3 clusters, respectively. The com-
plete-link procedure continues until there are exactly K clusters. The al-
gorithm is sensitive to the order of the data, as well as ties in the data set.
For most problems, a minimum-diameter partition solution for K clusters
is not realized from the algorithm. Therefore, we recommend applying
the exchange algorithm depicted in the pseudocode below, which is simi-
lar to the strategy described by Banfield and Bassil (1977), immediately
following the complete-link algorithm. The exchange algorithm consists
of two phases: (a) single-object relocation, and (b) pairwise interchange.
The single-object relocation phase examines the effect of moving each
object from its current cluster to each of the other clusters. Any move
that improves the diameter is accepted. The pairwise interchange phase
evaluates all possible swaps of objects that are not in the same cluster.
The two phases are implemented until no relocation or interchange fur-
ther improves the diameter criterion. The resulting solution, although not
necessarily a global optimum, is locally optimal with respect to reloca-
tions and interchanges. We now consider pseudocode for an initialization
procedure.
{Randomly Generate a feasible partition of the n objects into num_k
clusters and let lambda(i) represent the cluster membership of object i,
for 1 ≤ i ≤ n.}
Set flag = True
while flag
set flag = False; flag1 = True
while flag1 /* SINGLE OBJECT RELOCATION */
flag1 = False
for i = 1 to n
h = lambda(i)
diam_h = 0
for j = 1 to n
if lambda(j) = h and i <> j then
if A(i, j) > diam_h then diam_h = A(i, j)
end if
next j
for k = 1 to num_k
diam_k = 0
if k <> h then
for j = 1 to n
if lambda(j) = k and i <> j then
\min_{k} \left[ \max_{q=1,\ldots,p} \left( a_{jq} \mid \lambda_q = k \right) \right] \ge f_1(\lambda^*), \quad \text{for any } j = p+1,\ldots,n.    (3.3)
The unassigned objects are evaluated in turn and, if any object cannot be assigned to at least one of the K clusters without creating a pairwise dissimilarity value that equals or exceeds the incumbent partition diameter, then the current partial solution is pruned.
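A brief sketch (ours, not the authors' Fortran) of this pruning test follows; A is the dissimilarity matrix, labels holds the cluster assignments of the first p objects, and incumbent is the incumbent partition diameter f1(λ*).

def can_fathom(A, labels, p, K, incumbent):
    for j in range(p, len(A)):                        # each unassigned object j
        fits_somewhere = any(
            all(A[j][q] < incumbent                   # no within-cluster value >= incumbent
                for q in range(p) if labels[q] == k)
            for k in range(1, K + 1))
        if not fits_somewhere:
            return False                              # condition (3.3) holds: prune
    return True                                       # continue branching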
                  Diameters
Row   λ    p      k=1   k=2   Dispensation
1     1    1       0     0    Branch forward
2     11   2      35     0    Branch forward
12    2    1       0     0    TERMINATE
Table 3.3. Dissimilarity matrix for lipread consonants based on data from Manning and Shofner (1991).
b c d f g h j k l m n
b 141 176 308 155 118 265 296 298 331 280
c 141 118 292 149 280 306 229 194 325 265
d 176 118 251 147 288 235 273 227 324 196
f 308 292 251 298 271 282 275 249 216 243
g 155 149 147 298 273 157 269 290 324 241
h 118 280 288 271 273 267 269 233 267 200
j 265 306 235 282 157 267 182 241 322 269
k 296 229 273 275 269 269 182 184 296 204
l 298 194 227 249 290 233 241 184 175 149
m 331 325 324 216 324 267 322 296 175 243
n 280 265 196 243 241 200 269 204 149 243
p 69 214 149 290 275 282 275 288 269 296 243
q 284 312 327 331 271 306 300 247 316 312 255
r 318 314 321 296 300 280 324 214 202 302 243
s 149 292 245 204 284 249 286 271 147 276 176
t 182 122 90 288 169 288 298 229 271 304 210
v 227 182 173 265 173 292 275 292 345 296 269
w 355 325 335 355 312 345 349 318 361 316 347
x 282 255 286 255 298 178 286 198 176 271 214
y 308 329 337 316 331 325 286 325 320 335 308
z 129 49 82 311 182 249 278 275 292 292 273
drr(K) = 100 \left( \frac{f_1^*(\pi_{K-1}) - f_1^*(\pi_K)}{f_1^*(\pi_{K-1})} \right).    (3.4)
36 3 Minimum-Diameter Partitioning
Table 3.4. Results for lipread consonants data at different numbers of clusters (K).

K    Partition diameter    drr(K)    Partition
Table 3.5. Within-cluster submatrices for lipread consonants data in Table 3.3
that are obtained when optimizing (2.1).
Cluster 1
b c d g j p t v
b --
c 141 --
d 176 118 --
g 155 149 147 --
j 265 306 235 157 --
p 69 214 149 275 275 --
t 182 122 90 169 298 176 --
v 227 182 173 173 275 200 131 --
z 129 49 82 182 278 180 147 204
Cluster 2
f h k l m n r s
f --
h 271 --
k 275 269 --
l 249 233 184 --
m 216 267 296 175 --
n 243 200 204 149 243 --
r 296 280 214 202 302 243 --
s 204 249 271 147 276 176 300 --
x 255 178 198 176 271 214 227 75
Cluster 3
q w y
q --
w 298 --
y 284 300 --
Brucker (1978) showed that the problem posed by (3.5) is NP-hard for
K ≥ 3. Polynomial algorithms for optimal bipartitioning solutions (K = 2)
for (3.5) have been developed by Hansen and Jaumard (1987) and Ram-
nath, Khan, and Shams (2004).
We adapted the branch-and-bound algorithm for minimizing the sum
of diameters. The necessary modifications for partial solution evaluation
are relatively minor. However, the bounding procedure is not as sharp
and the algorithm for minimizing the sum of cluster diameters does not
have the scalability of the algorithm for minimizing partition diameter.
We applied the algorithm for minimizing the sum of cluster diameters to
the data in Table 3.1 assuming K = 2 and obtained the partition {1, 2, 3,
5, 6}, {4}. The diameter for the cluster {1, 2, 3, 5, 6} is 63 and the di-
ameter for {4} is zero, resulting in a sum of diameters index of 63 and a
partition diameter of 63. Recall that the optimal partition diameter of 50
is obtained from the partition {1, 2, 5, 6} {3, 4}, which has a sum of
cluster diameters of 50 + 43 = 93. Thus, the optimal solutions for the two
diameter criteria are not the same. We also make the worthwhile obser-
vation that the partition {1, 4, 5, 6} {2, 3}, which is not an optimal solu-
tion for either criterion, is nevertheless an excellent compromise solu-
tion. The diameter for the cluster {1, 4, 5, 6} is 52 and the diameter for
{2, 3} is 17, resulting in a sum of diameters index of 69 and a partition
diameter of 52.
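Comparisons of this type are easy to automate. The helper below (ours; the names are our own) returns, for a given symmetric dissimilarity matrix A and a vector of cluster labels, both the partition diameter of criterion (2.1) and the sum of cluster diameters of (3.5).

def diameter_indices(A, labels):
    diam = {k: 0 for k in set(labels)}       # singleton clusters keep diameter 0
    n = len(A)
    for i in range(n - 1):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                diam[labels[i]] = max(diam[labels[i]], A[i][j])
    return max(diam.values()), sum(diam.values())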
We have observed that the minimization of the sum of cluster diame-
ters has a tendency to produce a fairly large number of “singleton” clus-
ters (that is, clusters with only one object). Clusters with only one object
produce cluster diameters of zero, and this property often promotes the
peeling off of objects into singleton clusters. Although this tendency
might limit the utility of the sum of diameters criterion for some applica-
tions, the criterion nevertheless can facilitate interesting comparisons
with the more popular partition diameter criterion.
One of the advantages of the partition diameter index is that, unlike (2.2),
(2.3), and (2.4), it is not predisposed to produce clusters of particular
sizes. This is important for contexts with the potential for one fairly large
cluster of objects and a few small clusters. Another advantage is that
minimization of the partition diameter is computationally less difficult
than minimizing the within-cluster sums of dissimilarities. The pruning
rules are very strong and, particularly when using a good reordering of
the objects, optimal solutions can often be obtained for problems with
hundreds of objects and ten or fewer clusters.
One of the limitations of the minimum-diameter criterion is that it
tends to produce a large number of alternative optimal solutions. The al-
ternative optima can often differ markedly with respect to membership
assignments, as well as relative cluster sizes. The quantitative analyst
must, therefore, select an appropriate optimal solution from the candidate
pool. There are at least two alternatives for facilitating this task. One ap-
proach is to apply a method that enumerates the complete set (or a subset
of the complete set) of minimum-diameter partitions (Guénoche, 1993).
Another approach is to apply multicriterion clustering procedures to
break ties among the minimum-diameter partitions based on a secondary
criterion (Brusco & Cradit, 2005; Delattre & Hansen, 1980). We will
discuss this latter approach in Chapter 6.
desired number of clusters, which can range from 2 to 20. The output of
the program is written both to the screen and a file “results.” The output
includes the partition diameter corresponding to the heuristic solution
(the upper bound), optimal partition diameter, the CPU time required to
obtain the solution, and the cluster assignments for each object. The pro-
gram “bbdisum.for” has the same file structures as bbdiam.for, but is de-
signed for minimization of the sum of the cluster diameters.
To illustrate the operation of bbdiam.for, we present the following
screen display information for execution of the program for the data in
Table 3.3 at K = 3 clusters.
> TYPE 1 FOR HALF MATRIX OR TYPE 2 FOR FULL MATRIX
> 1
> PLEASE INPUT NUMBER OF CLUSTERS 2 TO 20
> 3
> HEURISTIC SOLUTION DIAMETER 306.00000
> THE OPTIMAL MINIMUM DIAMETER 306.00000
> THE TOTAL COMPUTATION TIME 0.01
1 1 1 2 1 2 1 2 2 2 2 1 3 2 2
1 1 3 2 3 1
Stop – Program terminated.
For this particular data set, the heuristic produces the optimal partition
diameter and thus the initial upper bound is very tight. The optimal clus-
ter assignments are written in row form. Objects 1, 2, 3, 5, 7, 12, 16, 17,
and 21 are assigned to cluster 1. For these data, those objects correspond
to {b, c, d, g, j, p, t, v, z}, as described in section 3.5 above.
The application of bbdisum.for to the data in Table 3.3 under the as-
sumption of K = 3 clusters yields the following results:
> TYPE 1 FOR HALF MATRIX OR TYPE 2 FOR FULL MATRIX
> 1
> PLEASE INPUT NUMBER OF CLUSTERS 2 TO 20
> 3
> THE OPTIMAL MINIMUM SUM OF DIAMETERS 345.00000
1 0.0000
2 0.0000
3 345.0000
THE TOTAL COMPUTATION TIME 0.03
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 1 3 2 3
Stop - Program terminated.
tion. Two of the three clusters consist of only one object, {w} and {y},
respectively. These objects have some rather large dissimilarities with
other objects, which is the reason they are placed in their own singleton
clusters. Notice that the sum of diameters of 345 is also the partition di-
ameter. Although this partition diameter is somewhat higher than the op-
timal partition diameter of 306, the minimum partition diameter solution
produces a sum of diameters index of 306 + 302 + 300 = 908, which is
appreciably larger than 345.
The diameter-based partitioning algorithms are currently dimensioned
for up to 250 objects and 20 clusters. Actual problem sizes that can be
handled by the algorithm are conditional not only on n and K, but also on
the structure of the data. For instance, if K = 2, we would expect these
branch-and-bound implementations to be significantly outperformed by
the appropriate bipartitioning algorithms (Guénoche et al., 1991; Hansen
& Jaumard, 1987).
4 Minimum Within-Cluster Sums of Dissimilarities Partitioning
4.1 Overview
if (WithinSum(lambda(i)) + WithinSum(lambda(j)))
> (WithinI + WithinJ) then
WithinSum(lambda(i)) = WithinI
WithinSum(lambda(j)) = WithinJ
swap = lambda(i), lambda(i) = lambda(j), lambda(j) = swap
flag2 = True
end if
end if
next j
next i
loop /* End of pairwise interchange loop; flag2 */
loop /* End of exchange algorithm; flag */
The initial upper bound for (2.2) produced by this heuristic process is
denoted f2(λ*). For the remainder of this chapter, this notation refers to
the best found solution at any point in the branch-and-bound process.
\text{Component 1} = \sum_{i=1}^{p-1} \sum_{j=i+1}^{p} \left( a_{ij} \mid \lambda_i = \lambda_j \right).    (4.1)
next j
next i
/* Rank order the collection of dissimilarities */
for i = 1 to Index – 1
for j = i + 1 to Index
if collection(j) < collection(i) then
hold = collection(j), collection(j) = collection(i),
collection(i) = hold
end if
next j
next i
/* Determine the smallest number of dissimilarities needed for
Component 3 using equation (4.3) */
Set beta = n – Position and alpha = 0
while beta > num_k
beta = beta – num_k
alpha = alpha + 1
loop
Index = alpha * beta + num_k * alpha * (alpha – 1) / 2
for i = 1 to Index
Component3(Position) = Component3(Position) + collection(i)
next i
next Position /* End computation of Component3 bounds */
With the stored values for Component 3, the implementation of the
partial solution evaluation is expedited. During any given evaluation of
the partial solution, the complete bound is computed as follows:
Set Component1 = 0, Component2 = 0, Prune = False, Fathom = True.
/* Calculate Component 1 for assigned objects */
for i = 1 to Position – 1
for j = i + 1 to Position
if lambda(i) = lambda(j) then Component1 = Component1 + A(i, j)
next j
next i
/* Calculate Component 2 for unassigned objects by the minimum
possible contribution to the objective function value across all clusters */
for unassigned = Position + 1 to n
for k = 1 to num_k
sum(k) = 0
for assigned = 1 to Position
if lambda(assigned) = k then
develop improved bounds has also been recognized as useful when using
criterion (2.3), and will be discussed in more detail in Chapter 5.
object 2 to cluster 2. The bound components for this partial solution are
Component 1 = 0, Component 2 = 134, and Component 3 = 63. Because
(0 + 134 + 63) = 197 < f2(λ*) = 211, it is necessary to pursue the corre-
sponding branch and plunge deeper into the tree. However, if we obtain
the optimal 2-cluster partition for the four objects (3, 4, 5, and 6), the re-
sulting partition is {3, 4} {5, 6}, which yields an objective value of a34 +
a56 = 43 + 34 = 77. This value of 77 is a better reflection of the best that
can be achieved among the last four objects, and can therefore replace
the value of 63 for Component 3. As a result, the new bound evaluation
is (0 + 134 + 77) = 211 ≥ f2(λ*) = 211. Using the improved bound com-
ponent, it is clear that the current partial solution cannot be pursued so as
to provide a partition that yields an objective criterion value better than
the incumbent value of 211. The partial solution is pruned at row 11 and
retraction occurs, precluding the evaluation of the solutions in rows 12
through 17 and resulting in algorithm termination and return of the opti-
mal solution. Again, we reiterate that this solution procedure requires a
back-to-front approach using successive executions of the branch-and-
bound algorithm. The main algorithm is executed for the last n − (K + 1)
objects and the optimal objective function value is stored as Compo-
nent3(n − (K + 1)) for use in the execution for the last n − (K + 2) objects
and so forth.
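The back-to-front strategy can be outlined schematically as follows (a sketch of ours, not the authors' implementation; optimal_value stands in for a complete run of the branch-and-bound algorithm on the indicated objects and is assumed rather than defined here).

def precompute_component3(A, K, optimal_value):
    n = len(A)
    component3 = [0.0] * (n + 1)
    # Work back to front: solve the subproblem on the last n - p objects and
    # store its optimal criterion value as the Component 3 bound for pointer p.
    # Each successive run can reuse the values already stored for larger p.
    for p in range(n - K - 1, 0, -1):
        suffix = list(range(p, n))       # 0-based indices of objects p+1, ..., n
        component3[p] = optimal_value(A, suffix, K)
    return component3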
Cluster 1 b c d g p t v z
b --
c 141 --
d 176 118 --
g 155 149 147 --
p 69 214 149 275 --
t 182 122 90 169 176 --
v 227 182 173 173 200 131 --
z 129 49 82 182 180 147 204 --
Cluster 2 f h l m n s x
f --
h 271 --
l 249 233 --
m 216 267 175 --
n 243 200 149 243 --
s 204 249 147 276 176 --
x 255 178 176 271 214 75 --
Cluster 3 j k q r w y
j --
k 182 --
q 300 247 --
r 324 214 192 --
w 349 318 298 337 --
y 286 325 284 363 300 --
5 Minimum Within-Cluster Sums of Squares Partitioning
As pointed out in sections 2.1 and 4.5, criterion (2.3) has a particularly
important role in cluster analysis. We refer to (2.3) as the standardized
within-cluster sums of dissimilarities, where standardization occurs via
the division of the within-cluster sums by the number of objects assigned
to the cluster. When the elements of A correspond to squared Euclidean
distances between pairs of objects, then an optimal solution to (2.3)
minimizes the within-cluster sums of squared deviations between objects
and their cluster centroids. That is, minimizing the sum of the standard-
ized within-cluster sums of dissimilarities is equivalent to minimizing the
within-cluster sums of squares. This is the well-known K-means criterion
(Forgy, 1965; MacQueen, 1967), for which heuristic programs are avail-
able with most commercial statistical software packages.
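The equivalence is easy to verify numerically. The snippet below (ours) draws a handful of random points, treats them as a single cluster, and confirms that the sum of pairwise squared Euclidean distances divided by the cluster size matches the sum of squared deviations from the centroid.

import random

pts = [(random.random(), random.random()) for _ in range(6)]
nk = len(pts)

pairwise = sum((pts[i][0] - pts[j][0]) ** 2 + (pts[i][1] - pts[j][1]) ** 2
               for i in range(nk - 1) for j in range(i + 1, nk))
centroid = (sum(p[0] for p in pts) / nk, sum(p[1] for p in pts) / nk)
wcss = sum((p[0] - centroid[0]) ** 2 + (p[1] - centroid[1]) ** 2 for p in pts)

print(abs(pairwise / nk - wcss) < 1e-9)    # True, up to rounding error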
Although (2.3) is a viable criterion when the elements of A do not cor-
respond to squared Euclidean distances between pairs of objects, greater
caution must be taken in branch-and-bound implementation for such in-
stances. Because the dissimilarities are assumed to be nonnegative, the
addition of an object to an existing cluster cannot decrease the raw
within-cluster sum of pairwise dissimilarities; however, the standardized
within-cluster sum can decrease. Consider, for example, a hypothetical
cluster with two objects {1, 2}. Suppose that a12 = 5, and, thus, the raw
sum of dissimilarities for the cluster is 5 and the standardized sum is 5/2
= 2.5 (divide by two because there are two objects in the cluster). Sup-
pose we wish to add object 3 to the cluster and a13 = a23 = 1. The raw
sum for the cluster {1, 2, 3} increases to 5 + 1 + 1 = 7, but the standard-
ized sum decreases to 7/3 = 2.33.
The possibility for reduction of the standardized within-cluster sums of pairwise dissimilarities when an object is added to a cluster precludes the use of some of the bound components available for (2.2). In other words, strong bounds for (2.3) (analogous to Components 2 and 3 described in section 4.2) are not readily available for general dissimilarity matrices. This seriously diminishes the effectiveness of branch-and-bound for (2.3) when A does not have the necessary metric properties.
Stronger bound components are available, however, when considering
the special case of A corresponding to squared Euclidean distances. Un-
der such conditions, Koontz et al. (1975) proved that the addition of an
object to a cluster cannot possibly decrease the standardized within-
cluster sums of dissimilarities. In fact, the standardized sum fails to increase only if the added object is positioned exactly at the
centroid of the cluster (in which case, the change in the standardized sum
is 0). Throughout the remainder of this chapter, we will assume that A is
a matrix of squared Euclidean distances.
Notice that the implementation is extremely similar to that of criterion (2.2) in Chapter 4.
/* Compute the within-cluster sum of dissimilarities */
for k = 1 to num_k /* Initialize sum/centroid for each cluster */
Centroid(k) = 0
next k
for Position1 = 1 to n − 1
k1 = lambda(Position1)
for Position2 = Position1 + 1 to n
k2 = lambda(Position2)
if k1 = k2 then
Centroid(k1) = Centroid(k1) + A(Position1, Position2)
end if
next Position2
next Position1
EVALUATION = 0
for k = 1 to num_k
EVALUATION = EVALUATION + Centroid(k) / CSize(k)
next k /* End computation of criterion (2.3) */
cluster. Objects are assigned to their nearest cluster, and the centroids are
recomputed. The iterative process of reassignment and recomputation
continues until no objects change cluster assignment on a particular itera-
tion. Although the K-means algorithm is extremely efficient, the algo-
rithm is sensitive to the initial seed points and, therefore, multiple restarts
of the algorithm using different seed points are recommended. The im-
portance of using multiple seed points has recently been demonstrated in
a study by Steinley (2003).
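For readers who want a concrete reference point, the following is a minimal K-means sketch (ours, not the commercial implementations just mentioned) for two-dimensional points and user-supplied seed centroids; because the result depends on the seeds, several restarts with different seeds are advisable, as the text notes.

def kmeans(points, seeds):
    centroids = list(seeds)
    labels = [-1] * len(points)
    while True:
        # Assign each point to its nearest centroid (squared Euclidean distance).
        new_labels = [min(range(len(centroids)),
                          key=lambda k: (p[0] - centroids[k][0]) ** 2 +
                                        (p[1] - centroids[k][1]) ** 2)
                      for p in points]
        if new_labels == labels:
            return labels                  # no object changed cluster: stop
        labels = new_labels
        # Recompute the centroid of each (nonempty) cluster.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))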
For our demonstrations of the branch-and-bound process, we assume a
matrix of squared Euclidean distances. We have applied the exchange al-
gorithm described in Chapters 3 and 4 after appropriate modification for
criterion (2.3) to a randomly generated solution to produce the initial in-
cumbent solution in our demonstrations. Given the similarity between
criterion (2.2) and criterion (2.3), the INITIALIZE step of Chapter 4 is
naturally very similar to the INITIALIZE step for this chapter. However,
for criterion (2.3) we need not compute Component 3, but we do need to
track the cluster sizes and ensure that no feasible solution contains an
empty cluster. The need to track the cluster sizes is a byproduct of the
possibility that the standardized within-cluster sum of squares will de-
crease when a new object is added. Although this initialization works
well, a tandem approach, using K-means followed by the exchange algo-
rithm, would quite possibly yield better results. The incumbent solution
will be noted as f3(λ*) in the remainder of the chapter.
k=0
for c = 1 to num_k
CSize(c) = 0
next c
for i = 1 to n
k=k+1
if k > num_k then k = 1
lambda(i) = k
CSize(k) = CSize(k) + 1
next i
flag = True
while flag
Set flag = False, flag1 = True, and tempflag = 1
/* SINGLE OBJECT RELOCATION */
while flag1 and (tempflag < 10)
tempflag = tempflag + 1
flag1 = False
flag1 = True
end if
end if
next NewCluster
next object
loop flag1
Set flag2 = True
while flag2 /* PAIRWISE INTERCHANGE */
flag2 = False
for i = 1 to n − 1
for j = i + 1 to n
if lambda(i) <> lambda(j) then
WithinI = WithinSum(lambda(i))
WithinJ = WithinSum(lambda(j))
for bedfellow = 1 to n
if lambda(bedfellow) = lambda(i) then
WithinI = WithinI - A(i, bedfellow) + A(j, bedfellow)
end if
if lambda(bedfellow) = lambda(j) then
WithinJ = WithinJ - A(j, bedfellow) + A(i, bedfellow)
end if
next bedfellow
size1 = CSize(lambda(i)) and size2 = CSize(lambda(j))
if (WithinSum(lambda(i)) / size1
+ WithinSum(lambda(j)) / size2)
> (WithinI / size1 + WithinJ / size2) then
WithinSum(lambda(i)) = WithinI
WithinSum(lambda(j)) = WithinJ
swap = lambda(i), lambda(i) = lambda(j), lambda(j) = swap
flag2 = True
end if
end if
next j
next i
loop /* End of pairwise interchange loop; flag2 */
loop /* End of exchange algorithm; flag */
Following Koontz et al. (1975) and Diehr (1985), the evaluation of par-
tial solutions initially focuses on two bound components. The first of the
components is the sum of the standardized within-cluster sums of dis-
similarities for the p assigned objects:
\text{Component 1} = \sum_{k=1}^{K} \left[ \frac{\sum_{i=1}^{p-1} \sum_{j=i+1}^{p} \left( a_{ij} \mid \lambda_i = \lambda_j = k \right)}{n_k} \right].    (5.1)
The second bound component, which is very weak, examines the ef-
fect of adding each of the unassigned objects to each of the clusters. For
each of the unassigned objects, the minimum possible contribution across
all clusters is selected, and the maximum of these minimums is selected
as the second bound component. In pseudocode, this evaluation proceeds as follows:
Set Component2 = 0
for unassigned = Position + 1 to n
for c = 1 to num_k
Sum(c) = 0
for assigned = 1 to Position
if lambda(assigned) = c then
Sum(c) = Sum(c) + A(unassigned, assigned)
next assigned
next c
MinK = Sum(1)
for c = 2 to num_k
if Sum(c) < MinK then MinK = Sum(c)
next c
if MinK > Component2 then Component2 = MinK
next unassigned
/* Determine acceptability of partial solution evaluation */
if Component1 + Component2 >= incumbent then
PART_EVAL = prune
else
PART_EVAL = fathom
end if /* End of partial solution evaluation */
Because the second component of the bound is weak, a better option is
to replace Component 2 with an optimal solution for the unassigned ob-
jects. Again, the motivation here is that the branch-and-bound algorithm
using Components 1 and 2 is effective for small values of n, and thus can
be used to rapidly obtain optimal solutions for suborders of the set of ob-
jects. This is similar to the approach outlined in section 4.2. Koontz et al.
(1975) and Diehr (1985) went a step further and discussed the initial par-
titioning of the object set, S, into e manageably sized subsets {S1, S2, ...,
Se}. Because the sizes for the subsets are small, they can be optimally
solved using the bounding components associated with (5.1) and (5.2).
The optimal criterion values for the object subsets 1, 2,..., e are denoted f3*(S1), f3*(S2),..., f3*(Se), respectively. The optimal solutions for the
subsets are subsequently pieced together in a systematic fashion until an
optimal solution for the complete object set is provided. This decomposi-
tion approach is based on bound improvement, which is achievable be-
cause the optimal criterion value solution for the complete object set
must equal or exceed the sum of the optimal objective criterion values
for the subsets. Specifically:
f3*(S) ≥ f3*(S1) + f3*(S2) + ... + f3*(Se).    (5.4)
[Figure (not reproduced): two-dimensional plot of the six objects in the example data set; horizontal axis scaled 0–9.]
Table 5.2. Branch-and-bound summary for criterion (2.3) for the data in Table 5.1.

                  Standardized sums   Components
Row   λ        p    k=1     k=2        2      3    Dispensation
1     1        1    0       0          0     10    Branch forward
2     11       2    14.50   0          0     10    Prune (24.5 ≥ 23.1), Branch right
3     12       2    0       0          0     10    Branch forward
4     121      3    4.00    0          0     10    Branch forward
5     1211     4    12.67   0          5.33   0    Branch forward
6     12111    5    23.50   0          8.90   0    Prune (32.4 ≥ 23.1), Branch right
7     12112    5    12.67   5.0        5.33   0    Branch forward
8     121121   6    18.00   5.0        0      0    *New Incumbent, f3(λ*) = 23
9     121122   6    12.67   26.67      0      0    Suboptimal, 39.3 ≥ 23, Retract
10    1212     4    4.00    4.00       8.67   0    Branch forward
11    12121    5    12.67   4.00       8.83   0    Prune (25.5 ≥ 23.1), Branch right
12    12122    5    4.00    14.67      3.33   0    Branch forward
13    121221   6    7.33    14.67      0      0    *New Incumbent, f3(λ*) = 22
14    121222   6    4.00    33.50      0      0    Suboptimal, 37.5 ≥ 22, Retract
15    122      3    0       4.50       0     10    Branch forward
16    1221     4    12.50   4.50       6.17   0    Prune (23.17 ≥ 22), Branch right
17    1222     4    0       7.33       8.50   0    Branch forward
18    12221    5    8.50    7.33      10.17   0    Prune (26.5 ≥ 22), Branch right
19    12222    5    0       17.75      2.50   0    Branch forward
20    122221   6    2.50    17.75      0      0    *New Incumbent, f3(λ*) = 20.25
21    122222   6    0       34.00      0      0    Suboptimal, 34 ≥ 20.25, Retract
22    2        1                                   TERMINATE
The minimum-diameter partitioning algorithm was applied using an
initial bound of 26, whereas the within-cluster sums of dissimilarities al-
gorithm used a bound of 74. The optimal 2-cluster minimum-diameter
partition is {1, 3, 4, 6}, {2, 5}, and the optimal 2-cluster raw within-
5.4 A Numerical Example 69
cluster sum-of dissimilarities partition is {1, 3, 6}, {2, 4, 5}. Thus, three
different criteria, (2.1), (2.2), and (2.3), yield three different optimal par-
titions for this small data set.
Table 5.3. Branch-and-bound summary for criterion (2.1) for the data in Table
5.1.
Diameters
Row λ p k=1 k=2 Dispensation
1 1 1 0 0 Branch forward
2 11 2 29 0 Prune (29 ≥ 26), Branch right
3 12 2 0 0 Branch forward
4 121 3 8 0 Branch forward
5 1211 4 25 0 Branch forward
6 12111 5 26 0 Prune (Component 2), Branch right
7 12112 5 25 10 Branch forward
8 121121 6 25 10 *New Incumbent, f1(λ*) = 25
9 121122 6 25 34 Suboptimal, 34 ≥ 25, Retract
10 1212 4 8 8 Branch forward
11 12121 5 17 8 Prune (Component 2), Branch right
12 12122 5 8 26 Prune (26 ≥ 25), Retract
13 122 3 0 9 Branch forward
14 1221 4 25 9 Prune (25 ≥ 25), Branch right
15 1222 4 0 9 Branch forward
16 12221 5 17 9 Prune (Component 2), Branch right
17 12222 5 0 26 Prune (26 ≥ 25), Retract
18 2 1 0 0 TERMINATE
Table 5.4. Branch-and-bound summary for criterion (2.2) for the data in Table 5.1.

                  Sums          Components
Row   λ        p   k=1   k=2     2     3    Dispensation
1     1        1    0     0      0    31    Branch forward
2     11       2   29     0      0    20    Branch forward
3     111      3   46     0      0    20    Branch forward
4     1111     4   84     0      0     0    Prune (84 ≥ 74), Branch right
5     1112     4   46     0     46     0    Prune (92 ≥ 74), Retract
6     112      3   29     0     27    20    Prune (76 ≥ 74), Retract
7     121      3    8     0     32    20    Branch forward
8     1211     4   38     0     44     0    Prune (82 ≥ 74), Branch right
9     1212     4    8     8     44     0    Branch forward
10    12121    5   38     8     48     0    Prune (94 ≥ 74), Branch right
11    12122    5    8    44     14     0    Branch forward
12    121221   6   22    44      0     0    *New Incumbent, f2(λ*) = 66
13    121222   6    8    90      0     0    Suboptimal (98 ≥ 66), Retract
14    122      3    0     9     35    20    Branch forward
15    1221     4   25     9     48     0    Prune (82 ≥ 66), Branch right
16    1222     4    0    22     22     0    Branch forward
17    12221    5   17    22     35     0    Prune (74 ≥ 66), Branch right
18    12222    5    0    71      5     0    Prune (76 ≥ 66), Retract
19    2        1                            TERMINATE
Table 5.5. Coordinates for 22 German towns from Späth (1980, p. 43).

#   Name            x-axis  y-axis     #   Name           x-axis  y-axis
1   Aachen            -57      28      12  Köln             -38      35
2   Augsburg           54     -65      13  Mannheim          -5     -24
3   Braunschweig       46      79      14  München           70     -74
4   Bremen              8     111      15  Nürnberg          59     -26
5   Essen             -36      52      16  Passau           114     -56
6   Freiburg          -22     -76      17  Regensburg        83     -41
7   Hamburg            34     129      18  Saarbrücken      -40     -28
8   Hof                74       6      19  Würzburg          31     -12
9   Karlsruhe          -6     -41      20  Bielefeld          0      71
10  Kassel             21      45      21  Lübeck            50     140
11  Kiel               37     155      22  Münster          -20      70
Table 5.6. Results for German towns data at different numbers of clusters (K), criterion (2.3).

     Standardized
K    within-cluster sums   ssrr(K)   Partition
2    64409.45              --        {1,3,4,5,7,10,11,12,20,21,22} {2,6,8,9,13,14,15,16,17,18,19}
3    39399.14              38.83     {1,5,6,9,12,13,18} {2,8,14,15,16,17,19} {3,4,7,10,11,20,21,22}
4    21719.32              44.87     {1,5,10,12,20,22} {2,8,14,15,16,17,19} {3,4,7,11,21} {6,9,13,18}
5    16592.55              23.60     {1,5,10,12,20,22} {2,14,16,17} {3,4,7,11,21} {6,9,13,18} {8,15,19}
6    11889.25              28.35     {1,5,12,22} {2,14,16,17} {3,10,20} {4,7,11,21} {6,9,13,18} {8,15,19}
7    9950.75               16.30     {1,5,12} {2,14,16,17} {3,10} {4,20,22} {6,9,13,18} {7,11,21} {8,15,19}
8    8177.50               17.82     {1,5,12} {2,14} {3,10} {4,20,22} {6,9,13,18} {7,11,21} {8,15,19} {16,17}
Table 6.1. Example data set (20 objects measured on four performance drivers
and two performance measures).
Object v1 v2 v3 v4 w1 w2
1 3 7 4 7 4 6
2 6 6 5 6 6 3
3 4 7 7 7 7 2
4 1 7 3 6 3 5
5 5 6 6 7 1 7
6 7 3 7 4 2 4
7 6 2 7 2 6 1
8 7 6 7 4 7 3
9 7 4 6 1 7 2
10 6 7 7 5 5 3
11 5 7 6 2 2 7
12 2 7 7 5 7 3
13 3 6 7 3 7 1
14 7 6 7 2 3 5
15 1 7 6 4 5 3
16 7 2 5 6 6 2
17 6 1 3 7 7 2
18 5 5 2 7 4 6
19 7 3 6 5 7 4
20 6 4 4 7 2 4
The optimal partitioning solutions for these two sets of variables are
shown in the first two panels of Table 6.2. Our example has L = 2 data
sources: (a) service attributes, and (b) performance measures. Therefore,
we can conveniently express the within-cluster sums of squares as a per-
centage of the total sum of squares, which is computed as follows:
\sum_{i<j} a_{ij}^{l}
88.7% of the total variation in these data, but this same partition explains
only 13.8% of the variation for service attributes. Thus, the two sets of
clustering variables appear to have a somewhat antagonistic relationship.
If we select an optimal partition for the service attributes, the explanation
of variation in the performance measures is extremely poor, and vice
versa. The question is: Can a multiobjective approach identify a partition
that yields a good compromise between the two sets of data, yielding
good explanatory power for both sets of variables?
Table 6.2. Single-objective and biobjective solutions for the data in Table 6.1.

Criterion                              Explained variation   Explained variation
optimized              F(λ*)           (v1, v2, v3, v4)      (w1, w2)             Partition (λ*)
Service attributes     N/A             66.2%                 4.4%                 {1,2,3,5,10} {4,12,13,15}
                                                                                  {6,7,8,9,11,14} {16,17,18,19,20}
Performance measures   N/A             13.8%                 88.7%                {1,4,6,14,18,20} {2,8,10,12,15,19}
                                                                                  {3,7,9,13,16,17} {5,11}
Biobjective,           84.54286        51.4%                 78.1%                {1,4,5,18,20} {2,3,8,10,12,13,15}
equal weights                                                                     {7,9,16,17,19} {6,11,14}
When there are l = 1,..., L separate sources of data, the general multiobjective model considered for this type of problem is as follows:

\min_{\pi_K \in \Pi_K} : Z_1(\pi_K) = \sum_{l=1}^{L} w_l f_3^l(\pi_K),    (6.2)

\sum_{l=1}^{L} w_l = 1;    (6.4)
      Diameter   Sums
Row   (2.1)      (2.2)    Partition
1 306 15590 {b,c,d,g,j,p,t,v,z} {f,h,k,l,m,n,r,s,x} {q,w,y}
2 325 14569 {b,c,d,g,j,p,t,v,z} {f,h,l,m,n,r,s,x} {k,q,w,y}
3 329 14229 {b,c,d,g,j,p,t,z} {f,h,l,m,n,r,s,x} {k,q,v,w,y}
4 335 13944 {b,c,d,g,p,t,v,z} {f,h,j,l,n,r,s,x} {k,m,q,w,y}
5 337 13932 {b,c,d,g,p,t,v,z} {f,h,j,l,n,s,y} {k,m,q,r,w,x}
6 345 13778 {b,c,d,g,p,t,v,z} {f,j,l,n,r,s,x} {h,k,m,q,w,y}
7 347 13706 {b,c,d,g,p,t,v,z} {f,j,k,l,r,s,x} {h,m,n,q,w,y}
8 349 13428 {b,c,d,g,p,t,v,z} {f,h,l,n,r,s,x} {j,k,m,q,w,y}
9 355 13382 {b,c,d,g,p,t,v,z} {h,k,l,n,r,s,x} {f,j,m,q,w,y}
10 363 13177 {b,c,d,g,p,t,v,z} {f,h,l,m,n,s,x} {j,k,q,r,w,y}
terpreted in section 6.2. Also, observe that selected weights for the dis-
played solution are (w1 = w2 = .5), yielding an equal weighting scheme.
The input information is as follows:
> TYPE 1 FOR HALF MATRIX OR 2 FOR FULL MATRIX INPUT
> 1
> INPUT NUMBER OF CLUSTERS
> 4
> INPUT WEIGHTS
> .5 .5
The output information is as follows:
NUMBER OF OBJECTS 5 Z = 2.00000************
NUMBER OF OBJECTS 6 Z = 5.50000 15.75010
NUMBER OF OBJECTS 7 Z = 9.83333 14.50010
NUMBER OF OBJECTS 8 Z = 13.58333 13.58343
NUMBER OF OBJECTS 9 Z = 15.16667 15.16677
NUMBER OF OBJECTS 10 Z = 17.91667 17.91677
NUMBER OF OBJECTS 11 Z = 25.20833 25.20843
NUMBER OF OBJECTS 12 Z = 37.00000 37.00010
NUMBER OF OBJECTS 13 Z = 43.58333 43.87510
NUMBER OF OBJECTS 14 Z = 49.95833 50.43343
NUMBER OF OBJECTS 15 Z = 57.20833 57.20843
NUMBER OF OBJECTS 16 Z = 63.70833 63.70843
NUMBER OF OBJECTS 17 Z = 73.33333 73.33343
NUMBER OF OBJECTS 18 Z = 77.30833 77.30843
NUMBER OF OBJECTS 19 Z = 81.96786 82.95843
NUMBER OF OBJECTS 20 Z = 84.54286 84.54296
problem was 5.96 seconds, and the optimal partition shown is consistent
with the solution in Table 6.2 corresponding to equal weights.
There are numerous possible modifications of bbbiwcss.for. First, the
code could be modified to run the biobjective algorithm for a systematic
set of weights rather than require user specification of the weights. For
example, an outer loop represented by iloop = 0 to 10 could embed the
algorithmic code. The weight for matrix A would be iloop / 10 and the
weight for matrix B would be (1 − iloop / 10). Thus, in total, 11 solutions would be produced. Two of these are the single-objective optimal solutions for each matrix, and the others differentially weight the matrices. A second modification, as described in section 6.2, would be to
normalize the within-cluster sums of squares for each matrix based on
the optimal single criterion within-cluster sums of squares. A third exten-
sion of the code would be to incorporate three or more matrices. Al-
though this is not particularly difficult from a computational standpoint,
finding a good set of weights when three or more criteria are considered
can be a time-consuming process.
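The first of these modifications amounts to wrapping the existing algorithm in a short loop. A sketch of ours follows; biobjective_bb is a placeholder for the algorithm implemented in bbbiwcss.for and is not defined here.

def weight_sweep(A, B, num_k, biobjective_bb):
    solutions = []
    for iloop in range(11):                    # iloop = 0, 1, ..., 10
        w_a = iloop / 10                       # weight applied to matrix A
        w_b = 1 - iloop / 10                   # weight applied to matrix B
        solutions.append((w_a, w_b, biobjective_bb(A, B, num_k, w_a, w_b)))
    return solutions    # 11 solutions; two are the single-objective optima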
Part II
Seriation
7 Introduction to the Branch-and-Bound Paradigm for Seriation
7.1 Background
is chosen more than once in a permutation and the retraction test to en-
sure that only valid objects are chosen, i.e., if n objects define a matrix, then n + 1 objects cannot be candidates for a position. Once a partial per-
mutation is known to be valid, we can use fathoming tests to determine
whether or not we wish to pursue building the permutation. A fathoming
test, such as an adjacency test or bound test, is used to determine whether
or not a partial sequence can lead to an optimal permutation. For exam-
ple, the stored lower bound is used in the bound test to determine
whether or not the current partial sequence could possibly lead to a better
solution. If any of the fathoming tests fail, then we abandon the current
partial permutation by selecting the next available valid object for the
current position; otherwise, we move to the next position in the sequence
and find the first available valid object for that position. By abandoning
futile partial sequences, we avoid evaluating every possible permutation
of objects to find the optimal solution. Finally, when an entire permuta-
tion is found as a true candidate for being an optimal solution, we evalu-
ate the objective function for the given permutation. If we have not pre-
viously found a better solution, then we store the evaluated permutation
and update our lower bound as the new best-found solution.
Throughout the chapters pertaining to seriation, the algorithm and all
examples assume a maximization objective. We use the following nota-
tion to assist in discussions of permutations of objects (additional nota-
tion will be defined as needed):
n = the number of objects, indexed i = 1,..., n;
A = an n × n matrix containing data for the n objects, {aij};
Ψ = the set of all n! feasible permutations of the n objects;
ψ(k) = the object in position k of permutation ψ, (ψ ∈ Ψ);
p = a pointer for the object position;
fL = lower bound used in bounding tests.
The algorithm presented below and the algorithmic pseudocode in
Appendix B are designed for forward branching. That is, the permuta-
tions are generated from left to right. Another branching method is alter-
nating branching in which objects are assigned to the ends of the permu-
tation and progressively fill positions with objects until the middle
positions are filled. In alternating branching, steps 2 and 3 (ADVANCE and BRANCH), as well as the fathoming routines, are slightly modified, but the remainder of the algorithm—initialization and evaluation—remains intact.
8 Seriation—Maximization of a Dominance Index
The dominance index is perhaps the most widely used index for seriation
of asymmetric matrices (Hubert et al., 2001) with a rich history in bio-
metric, psychometric, and other literature bases (Blin & Whinston, 1974;
Bowman & Colantoni, 1973, 1974; Brusco, 2001; Brusco & Stahl,
2001a; DeCani, 1969, 1972; Flueck & Korsh, 1974; Hubert, 1976;
Hubert & Golledge, 1981; Phillips, 1967, 1969; Ranyard, 1976; Rodgers
& Thompson, 1992; Slater, 1961). Essentially, maximization of the
dominance index is achieved by finding a permutation that maximizes
the sum of matrix elements above the main diagonal. For any pair of ob-
jects corresponding to the rows and columns of an asymmetric matrix A,
the tendency will be to place object i to the left of object j in the se-
quence if aij > aji. Notice that the asymmetric matrix, A, need not be re-
stricted to nonnegative entries. The problem of maximizing the domi-
nance index for an asymmetric matrix A can be mathematically stated as
follows (Lawler, 1964):
\max_{\psi \in \Psi} : f(\psi) = \sum_{k<l} a_{\psi(k)\psi(l)} = \sum_{k=1}^{n-1} \sum_{l=k+1}^{n} a_{\psi(k)\psi(l)}.    (8.1)
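Evaluating (8.1) for a candidate permutation is straightforward; the small function below (ours; psi is a list of 1-based object labels and A a list-of-lists matrix) simply sums the entries that land above the main diagonal of the reordered matrix.

def dominance_index(A, psi):
    n = len(psi)
    return sum(A[psi[k] - 1][psi[l] - 1]
               for k in range(n - 1) for l in range(k + 1, n))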
\text{where } index_i(r_j) = \begin{cases} 1 & \text{if } r_j > r_i \\ 1 & \text{if } r_j = r_i,\ j < i \\ 0 & \text{otherwise.} \end{cases}    (8.3.1)
The adjacency test for the dominance index amounts to the comparison
of aψ(p-1)ψ(p) to aψ(p)ψ(p-1). By using this adjacency test, we are guaranteed
to find an optimal permutation that satisfies the well-known necessary
condition for optimal solutions known as the Hamiltonian ranking prop-
erty, which is,
a_{\psi(k)\psi(k+1)} \ge a_{\psi(k+1)\psi(k)} \quad \text{for } 1 \le k \le n - 1.    (8.4)
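Property (8.4) can be checked directly for any completed permutation; a one-line sketch (ours) follows.

def satisfies_hamiltonian_ranking(A, psi):
    return all(A[psi[k] - 1][psi[k + 1] - 1] >= A[psi[k + 1] - 1][psi[k] - 1]
               for k in range(len(psi) - 1))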
In mathematical parlance, the first component of the upper bound for a
partial sequence in optimizing the dominance index is rather straightfor-
ward.
f_{B1} = \sum_{k=1}^{p} \left( r_{\psi(k)} - \sum_{l=1}^{k-1} a_{\psi(k)\psi(l)} \right).    (8.5.1)
The second component of the upper bound is concerned with the com-
plement of R(p) in the set of all n objects, S\R(p), which contains the re-
maining (n – p) objects. Because the dominance index sums row entries
to the right of the diagonal, we want to find the largest possible entries to
the right of the diagonal in the rows for the remaining objects. We allow
the complement to contain indices for the remaining objects with an ini-
tial index of Position + 1 for the sake of clarity.
/* Determine S\R(Position) */
index = Position + 1
for i = 1 to n
found = False
for j = 1 to Position
if permutation(j) = i then found = true
next j
if not found then
complement(index) = i
index = index + 1
end if
next i
/* Find maximum contribution to objective function value by the
complement */
ComplementContribution = 0
for i = Position + 1 to n − 1
for j = i + 1 to n
if A(complement(i), complement(j)) >
A(complement(j), complement(i)) then
max = A(complement(i), complement(j))
else
max = A(complement(j), complement(i))
end if
ComplementContribution = ComplementContribution + max
next j
next i
With the understanding of how to calculate the highest possible con-
tribution of S\R(Position) to the dominance index, the mathematical nota-
tion is succinct and sensible:
f_{B2} = Σ_{i<j ∈ S\R(p)} max(a_{ij}, a_{ji}) .    (8.5.2)
Thus, the upper bound for a partial sequence in optimizing the domi-
nance index is UpperB = PartSolution + ComplementContribution or,
mathematically, fB = fB1 + fB2. If the upper bound, UpperB, is less than or
equal to the previously determined lower bound, LowerB, then the bound
test fails, i.e. BOUNDTEST = False.
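The two components and the bound test can be rendered compactly in Python; the sketch below assumes a complete asymmetric matrix A with a zero main diagonal and uses 0-based object indices (the function name is ours):

def dominance_bound_test(A, partial, lower_bound):
    """Upper bound fB = fB1 + fB2 for a partial sequence under the dominance
    index; returns False when the partial sequence can be pruned."""
    n = len(A)
    rowsum = [sum(row) for row in A]                  # zero diagonal assumed
    # fB1: contribution of the p assigned objects, as in (8.5.1).
    f_b1 = sum(rowsum[obj] - sum(A[obj][prev] for prev in partial[:k])
               for k, obj in enumerate(partial))
    # fB2: best case for the unassigned objects, as in (8.5.2).
    assigned = set(partial)
    remaining = [i for i in range(n) if i not in assigned]
    f_b2 = sum(max(A[i][j], A[j][i])
               for idx, i in enumerate(remaining)
               for j in remaining[idx + 1:])
    return f_b1 + f_b2 > lower_bound                  # UpperB <= LowerB fails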
The reordered matrix gives us an initial lower bound of 52. Using this
lower bound, we begin the main algorithm. The execution of the branch-
and-bound algorithm shows beneficial pruning for many partial se-
quences, leading to right branches or retraction as shown in Table 8.2. As
a note of interest, the number of iterations reduces from 22 to 17 by per-
forming the branch-and-bound algorithm on the re-ordered matrix in Ta-
ble 8.1. Re-ordering a matrix during the INITIALIZE step is a commonly
used strategy for improving the effectiveness of the bounding process.
[Column headings of the iteration table: Row, Partial sequence, Partial solution, Maximum complement contribution, UpperB, Adjacency test, Dispensation. The table entries are not reproduced here.]
and exhibit the characteristic of aij + aji = 1 for all i ≠ j. Not all paired-
comparison matrices have binary entries.) Maximizing the dominance
index produces a permutation that tends to place the 1s in the upper tri-
angle (and 0s generally in the lower triangle) of the re-ordered matrix. As
an example, we refer to a 15 × 15 tournament matrix originally reported
by Hubert and Schultz (1975).
Table 8.4. A 15 × 15 tournament matrix from Hubert & Schultz (1975) per-
muted to achieve a maximum dominance index.
6 2 11 14 10 5 12 7 1 3 8 13 15 9 4
6 0 1 1 1 1 0 0 1 1 1 1 1 1 0 1
2 0 0 1 1 1 0 1 0 0 1 0 1 0 1 0
11 0 0 0 1 1 1 1 1 1 0 1 1 1 1 1
14 0 0 0 0 1 1 1 0 1 1 1 1 1 0 1
10 0 0 0 0 0 1 1 1 1 1 1 0 1 0 0
5 1 1 0 0 0 0 1 1 1 0 1 1 1 0 1
12 1 0 0 0 0 0 0 1 1 1 1 1 0 1 0
7 0 1 0 1 0 0 0 0 1 1 1 1 1 1 1
1 0 1 0 0 0 0 0 0 0 1 1 1 0 1 1
3 0 0 1 0 0 1 0 0 0 0 1 1 1 0 1
8 0 1 0 0 0 0 0 0 0 0 0 1 1 1 1
13 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0
15 0 1 0 0 0 0 1 0 1 0 0 0 0 1 1
9 1 0 0 1 1 1 0 0 0 1 0 0 0 0 1
4 0 1 0 0 1 0 1 0 0 0 0 1 0 0 0
Table 8.7. A 13 × 13 submatrix extracted from Thurstone’s (1927) data and ex-
hibiting perfect dominance.
12 11 4 5 8 6 3 10 2 15 1 14 9
12 0 652 615 678 680 716 855 894 818 886 809 981 966
11 348 0 615 667 657 697 785 830 793 848 778 970 970
4 385 385 0 515 534 556 740 743 757 785 789 947 970
5 322 333 485 0 580 593 774 856 719 769 762 981 981
8 320 343 466 420 0 512 746 819 726 820 788 966 951
6 284 303 444 407 488 0 679 804 715 756 756 963 947
3 144 215 260 226 254 321 0 563 585 716 662 944 917
10 106 170 257 144 181 196 437 0 635 595 682 902 917
2 182 207 242 281 274 285 415 365 0 589 677 925 863
15 114 152 215 231 180 244 284 405 411 0 581 924 819
1 191 222 211 238 212 244 338 318 323 419 0 822 760
14 19 30 53 19 34 37 56 98 75 76 178 0 559
9 34 30 30 19 49 53 83 83 137 181 240 441 0
Results for the data in Table 8.6, Thurstone’s (1927) severity of criminal
offenses data from Hubert and Golledge (1981, p. 435):
MAXIMUM DOMINANCE INDEX 78946.0000 CPU TIME 0.01
12 11 4 5 8 6 13 7 3 10 2 15 1 14 9
Results for van der Heijden, Malthas, and van den Roovaart’s (1984) in-
terletter confusion matrix from Heiser (1988, p. 41):
MAXIMUM DOMINANCE INDEX 10577.0000 CPU TIME 0.90
19 5 22 6 12 21 23 13 2 1 8 14 11 24 18 17 15 3 7 4 20 25
9 16 26 10
sign(x) = { +1 if x > 0;  0 if x = 0;  −1 if x < 0 }.    (9.1)
For the weighted within row and column gradient, the appropriate al-
gorithmic pseudocode tallies 2*A(permutation(i), permutation(k)) –
A(permutation(i), permutation(j)) – A(permutation(j), permutation(k)).
The adjacency test for each gradient index differs slightly from the others but holds to a roughly similar pattern. The adjacency
tests for criteria involving column examinations require the formation of
a set, S\R(p), containing all unassigned objects, i.e. the complement in S
of the set of objects chosen for the first p positions in the sequence, as
was used in the bound test for maximizing the dominance index.
The following equations were developed (Brusco, 2002b) as success-
ful adjacency tests for the corresponding gradient indices of equations
(9.2.1) through (9.2.4). The left-hand side of each equation totals the
anti-Robinson index contributions to the objective function for the crite-
rion by ψ(p – 1) and ψ(p) when ψ(p – 1) precedes ψ(p); the right-hand
side represents contributions if ψ(p – 1) and ψ(p) exchange places in the
sequence.
( Σ_{i=1}^{p−2} sign(a_{ψ(i)ψ(p)} − a_{ψ(i)ψ(p−1)}) ) + ( Σ_{j∉R(p)} sign(a_{ψ(p−1)j} − a_{ψ(p−1)ψ(p)}) )
≥ ( Σ_{i=1}^{p−2} sign(a_{ψ(i)ψ(p−1)} − a_{ψ(i)ψ(p)}) ) + ( Σ_{j∉R(p)} sign(a_{ψ(p)j} − a_{ψ(p)ψ(p−1)}) )    (9.4.1)
( Σ_{i=1}^{p−2} [ sign(a_{ψ(i)ψ(p)} − a_{ψ(i)ψ(p−1)}) + sign(a_{ψ(i)ψ(p)} − a_{ψ(p−1)ψ(p)}) ] )
+ ( Σ_{j∉R(p)} [ sign(a_{ψ(p−1)j} − a_{ψ(p−1)ψ(p)}) + sign(a_{ψ(p−1)j} − a_{ψ(p)j}) ] )
≥ ( Σ_{i=1}^{p−2} [ sign(a_{ψ(i)ψ(p−1)} − a_{ψ(i)ψ(p)}) + sign(a_{ψ(i)ψ(p−1)} − a_{ψ(p)ψ(p−1)}) ] )
+ ( Σ_{j∉R(p)} [ sign(a_{ψ(p)j} − a_{ψ(p)ψ(p−1)}) + sign(a_{ψ(p)j} − a_{ψ(p−1)j}) ] )    (9.4.2)
( Σ_{i=1}^{p−2} (a_{ψ(i)ψ(p)} − a_{ψ(i)ψ(p−1)}) ) + ( Σ_{j∉R(p)} (a_{ψ(p−1)j} − a_{ψ(p−1)ψ(p)}) )
≥ ( Σ_{i=1}^{p−2} (a_{ψ(i)ψ(p−1)} − a_{ψ(i)ψ(p)}) ) + ( Σ_{j∉R(p)} (a_{ψ(p)j} − a_{ψ(p)ψ(p−1)}) )    (9.4.3)
( Σ_{i=1}^{p−2} (2a_{ψ(i)ψ(p)} − a_{ψ(i)ψ(p−1)} − a_{ψ(p−1)ψ(p)}) ) + ( Σ_{j∉R(p)} (2a_{ψ(p−1)j} − a_{ψ(p−1)ψ(p)} − a_{ψ(p)j}) )
≥ ( Σ_{i=1}^{p−2} (2a_{ψ(i)ψ(p−1)} − a_{ψ(i)ψ(p)} − a_{ψ(p)ψ(p−1)}) ) + ( Σ_{j∉R(p)} (2a_{ψ(p)j} − a_{ψ(p)ψ(p−1)} − a_{ψ(p−1)j}) )    (9.4.4)
These equations for the passing conditions of the adjacency tests can
be reduced for ease of implementation.
Σ_{i=1}^{p−2} 2 sign(a_{ψ(i)ψ(p)} − a_{ψ(i)ψ(p−1)})
≥ Σ_{j∉R(p)} ( sign(a_{ψ(p)j} − a_{ψ(p)ψ(p−1)}) − sign(a_{ψ(p−1)j} − a_{ψ(p−1)ψ(p)}) )    (9.5.1)
Σ_{i=1}^{p−2} ( 2 sign(a_{ψ(i)ψ(p)} − a_{ψ(i)ψ(p−1)}) + sign(a_{ψ(i)ψ(p)} − a_{ψ(p−1)ψ(p)}) − sign(a_{ψ(i)ψ(p−1)} − a_{ψ(p)ψ(p−1)}) )
≥ Σ_{j∉R(p)} ( sign(a_{ψ(p)j} − a_{ψ(p)ψ(p−1)}) + sign(a_{ψ(p)j} − a_{ψ(p−1)j}) − sign(a_{ψ(p−1)j} − a_{ψ(p−1)ψ(p)}) − sign(a_{ψ(p−1)j} − a_{ψ(p)j}) )    (9.5.2)
Σ_{i=1}^{p−2} 2 (a_{ψ(i)ψ(p)} − a_{ψ(i)ψ(p−1)})
≥ Σ_{j∉R(p)} ( (a_{ψ(p)j} − a_{ψ(p)ψ(p−1)}) − (a_{ψ(p−1)j} − a_{ψ(p−1)ψ(p)}) )    (9.5.3)
Σ_{i=1}^{p−2} ( 3a_{ψ(i)ψ(p)} − 3a_{ψ(i)ψ(p−1)} − a_{ψ(p−1)ψ(p)} + a_{ψ(p)ψ(p−1)} )
≥ Σ_{j∉R(p)} ( 3a_{ψ(p)j} − 3a_{ψ(p−1)j} − a_{ψ(p)ψ(p−1)} + a_{ψ(p−1)ψ(p)} )    (9.5.4)
Notice that aψ(p-1)ψ(p) = aψ(p)ψ(p-1) in the adjacency test for the weighted
within row gradient of a symmetric matrix, further reducing the computa-
tion. To demonstrate the ease of implementation, we can develop the
pseudocode for the unweighted gradient for rows and columns.
/* Initialize the sides of the equation */
Set compare1 = 0 and compare2 = 0
/*Calculate left-hand side of the equation */
for i = 1 To Position – 2
difference = A(permutation(i), permutation(Position))
− A(permutation(i), permutation(Position - 1))
/* Add twice the sign of the difference */
if difference > 0 then compare1 = compare1 + 2
if difference < 0 then compare1 = compare1 – 2
difference = A(permutation(i), permutation(Position))
− A(permutation(Position - 1), permutation(Position))
/* Add the sign of the difference */
if difference > 0 then compare1 = compare1 + 1
if difference < 0 then compare1 = compare1 – 1
difference = A(permutation(i), permutation(Position – 1))
− A(permutation(Position), permutation(Position – 1))
/* Subtract the sign of the difference */
if difference > 0 then compare1 = compare1 – 1
if difference < 0 then compare1 = compare1 + 1
next i
/* Determine S\R(Position) */
Index = Position + 1
for i = 1 To n
found = False
for j = 1 To Position
if permutation(j) = i then found = True
next j
if not found then
complement(Index) = i
Index = Index + 1
end if
next i
/* Calculate right-hand side of the equation */
compare2 = 0
for j = Position + 1 to n
difference = A(permutation(Position), complement(j))
− A(permutation(Position), permutation(Position – 1))
/* Add the sign of the difference */
if difference > 0 then compare2 = compare2 + 1
if difference < 0 then compare2 = compare2 – 1
difference = A(permutation(Position), complement(j))
−A(permutation(Position – 1), complement(j))
/* Add the sign of the difference */
if difference > 0 then compare2 = compare2 + 1
if difference < 0 then compare2 = compare2 – 1
difference = A(permutation(Position – 1), complement(j))
−A(permutation(Position – 1), permutation(Position))
/* Subtract the sign of the difference */
if difference > 0 then compare2 = compare2 – 1
if difference < 0 then compare2 = compare2 + 1
difference = A(permutation(Position – 1), complement(j))
−A(permutation(Position), complement(j))
/* Subtract the sign of the difference */
if difference > 0 then compare2 = compare2 – 1
if difference < 0 then compare2 = compare2 + 1
next j
We simply compare the left-hand side of the equation with the right-
hand side, compare1 >= compare2, to determine whether or not the par-
tial sequence passes the adjacency test.
As with the adjacency tests, the bound tests for equations (9.2.1) through
(9.2.4) are unique yet not too dissimilar. The upper bounds for the un-
weighted gradient indices incorporate a constant term, b = p(n – p)(n – p – 1)/2
+ (n – p)(n – p – 1)(n – p – 2)/6, to account for the number of terms
corresponding to triples formed by one of the p objects in the partial ordering
together with a pair of distinct objects i, j ∉ R(p) (i.e., in the complement,
S\R(p)), summed with the number of terms corresponding to triples formed by
distinct i, j, k ∉ R(p) (for the within row and column index both counts are
doubled, so the divisor of the second term becomes 3).
f_{Ur} = ( Σ_{i=1}^{p−2} Σ_{j=i+1}^{p−1} Σ_{k=j+1}^{p} sign(a_{ψ(i)ψ(k)} − a_{ψ(i)ψ(j)}) ) + ( Σ_{i=1}^{p−1} Σ_{j=i+1}^{p} Σ_{k∉R(p)} sign(a_{ψ(i)k} − a_{ψ(i)ψ(j)}) )
+ p(n − p)(n − p − 1)/2 + (n − p)(n − p − 1)(n − p − 2)/6    (9.6.1)
f_{Urc} = ( Σ_{i=1}^{p−2} Σ_{j=i+1}^{p−1} Σ_{k=j+1}^{p} [ sign(a_{ψ(i)ψ(k)} − a_{ψ(i)ψ(j)}) + sign(a_{ψ(i)ψ(k)} − a_{ψ(j)ψ(k)}) ] )
+ ( Σ_{i=1}^{p−1} Σ_{j=i+1}^{p} Σ_{k∉R(p)} [ sign(a_{ψ(i)k} − a_{ψ(i)ψ(j)}) + sign(a_{ψ(i)k} − a_{ψ(j)k}) ] )
+ p(n − p)(n − p − 1) + (n − p)(n − p − 1)(n − p − 2)/3    (9.6.2)
f_{Wr} = Σ_{i=1}^{p−2} Σ_{j=i+1}^{p−1} Σ_{k=j+1}^{p} (a_{ψ(i)ψ(k)} − a_{ψ(i)ψ(j)}) + Σ_{i=1}^{p−1} Σ_{j=i+1}^{p} Σ_{k∉R(p)} (a_{ψ(i)k} − a_{ψ(i)ψ(j)})
+ Σ_{(i<j)∉R(p)} max( Σ_{k=1}^{p} (a_{ψ(k)j} − a_{ψ(k)i}), Σ_{k=1}^{p} (a_{ψ(k)i} − a_{ψ(k)j}) )
+ Σ_{(i<j<k)∉R(p)} max( a_{ik} − a_{ij}, a_{ik} − a_{jk}, a_{ij} − a_{jk} )    (9.6.3)
f_{Wrc} = Σ_{i=1}^{p−2} Σ_{j=i+1}^{p−1} Σ_{k=j+1}^{p} (2a_{ψ(i)ψ(k)} − a_{ψ(i)ψ(j)} − a_{ψ(j)ψ(k)}) + Σ_{i=1}^{p−1} Σ_{j=i+1}^{p} Σ_{k∉R(p)} (2a_{ψ(i)k} − a_{ψ(i)ψ(j)} − a_{ψ(j)k})
+ Σ_{(i<j)∉R(p)} max( Σ_{k=1}^{p} (2a_{ψ(k)j} − a_{ψ(k)i} − a_{ij}), Σ_{k=1}^{p} (2a_{ψ(k)i} − a_{ψ(k)j} − a_{ji}) )
+ Σ_{(i<j<k)∉R(p)} max( 2a_{ik} − a_{ij} − a_{jk}, 2a_{ij} − a_{ik} − a_{jk}, 2a_{jk} − a_{ik} − a_{ij} )    (9.6.4)
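For instance, the constant term appearing in (9.6.1) and (9.6.2) can be computed directly; the short Python sketch below (the helper name constant_term is ours) simply counts the two groups of triples described above:

def constant_term(n, p, rows_and_columns=False):
    """Constant added to the unweighted gradient bounds: one term for every
    triple made of one assigned object and two unassigned objects, plus one
    for every triple of three unassigned objects; both counts are doubled
    for the within row and column index."""
    one_assigned = p * (n - p) * (n - p - 1) // 2               # p * C(n-p, 2)
    all_unassigned = (n - p) * (n - p - 1) * (n - p - 2) // 6   # C(n-p, 3)
    b = one_assigned + all_unassigned
    return 2 * b if rows_and_columns else b

print(constant_term(17, 5))        # 5*66 + 220 = 550
print(constant_term(17, 5, True))  # 1100 for the within row and column index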
The algorithmic pseudocode for the bound tests for the unweighted
cases entails conditional statements to determine the sign function values.
For the weighted cases, a function to find maximum values in triplets
must be used. Here, we present the algorithmic pseudocode for the
unweighted gradient within rows and columns.
/* Determine S\R(Position) */
Index = Position + 1
for i = 1 to n
found = False
for j = 1 to Position
if permutation(j) = i then found = True
next j
if not found then
complement(Index) = i
Index = Index + 1
end if
next i
/* Initialize and calculate the upper bound */
UpperB = 0
for i = 1 to Position – 2
for j = i + 1 to Position – 1
for k = j + 1 to Position
difference = A(permutation(i), permutation(k))
– A(permutation(i), permutation(j))
if difference > 0 then UpperB = UpperB + 1
if difference < 0 then UpperB = UpperB – 1
difference = A(permutation(i), permutation(k))
– A(permutation(j), permutation(k))
if difference > 0 then UpperB = UpperB + 1
Table 9.1. (Dis)Agreement indices for the Kabah collection with rows and col-
umns labeled according to the 17 deposit identifications.
II VII IA XIA IB XB IX XA XIB
II 200 108 68 96 99 116 105 112 106
VII 108 200 95 76 93 84 92 87 94
IA 68 95 200 47 55 65 54 62 50
XIA 96 76 47 200 56 58 50 49 32
IB 99 93 55 56 200 53 34 33 46
XB 116 84 65 58 53 200 36 31 34
IX 105 92 54 50 34 36 200 19 30
XA 112 87 62 49 33 31 19 200 32
XIB 106 94 50 32 46 34 30 32 200
VIII 108 109 72 60 41 43 32 42 47
IVA 119 93 65 66 54 45 33 33 54
VB 128 108 74 79 50 46 36 40 51
VA 145 116 90 93 61 58 52 49 71
VIB 154 118 100 87 53 68 60 57 71
VIA 156 140 109 100 67 71 62 60 72
III 149 122 98 124 69 81 61 57 80
IVB 151 136 95 101 86 69 63 67 79
VIII IVA VB VA VIB VIA III IVB -
II 108 119 128 145 154 156 149 151 -
VII 109 93 108 116 118 140 122 136 -
IA 72 65 74 90 100 109 98 95 -
XIA 60 66 79 93 87 100 124 101 -
IB 41 54 50 61 53 67 69 86 -
XB 43 45 46 58 68 71 81 69 -
IX 32 33 36 52 60 62 61 63 -
XA 42 33 40 49 57 60 57 67 -
XIB 47 54 51 71 71 72 80 79 -
VIII 200 53 41 51 52 57 66 61 -
IVA 53 200 47 48 57 73 48 43 -
VB 41 47 200 22 46 36 51 48 -
VA 51 48 22 200 29 28 34 39 -
VIB 52 57 46 29 200 25 48 55 -
VIA 57 73 36 28 25 200 55 61 -
III 66 48 51 34 48 55 200 46 -
IVB 61 43 48 39 55 61 46 200 -
We can permute the rows and columns to optimize the gradient indi-
ces, as shown in Table 9.2. For all initial lower bounds, the dynamic
ranking permutation of graduated rowsums is ψ = (1, 2, 3, 4, 17, 16, 15,
Table 9.2. Optimal seriation for gradient indices of the Kabah collection.
Gradient index    ψ*    f(ψ*)
Ur (16, 17, 13, 14, 12, 15, 11, 10, 8, 7, 5, 6, 9, 4, 3, 2, 1) 562
(III, IVB, VA, VIB, VB, VIA, IVA, VIII, XA, IX, IB,
XB, XIB, XIA, IA, VII, II)
Urc (1, 2, 3, 4, 9, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17) 1050
(II, VII, IA, XIA, XIB, IB, XB, IX, XA, VIII, IVA, VB,
VA, VIB, VIA, III, IVB)
Wr (17, 16, 15, 14, 13, 12, 11, 10, 7, 8, 6, 5, 9, 4, 3, 2, 1) 23844
(IVB, III, VIA, VIB, VA, VB, IVA, VIII, IX, XA, XB,
IB, XIB, XIA, IA, VII, II)
Wrc (1, 2, 3, 4, 9, 5, 6, 8, 7, 10, 11, 12, 13, 14, 15, 16, 17) 37980
(II, VII, IA, XIA, XIB, IB, XB, XA, IX, VIII, IVA, VB,
VA, VIB, VIA, III, IVB)
Another aspect of differences in the optimal permutations for gradient
indices is in the difference between within row gradient indices and
within row and column gradient indices. The within row gradient indices
require optimal permutations to form in a particular direction. In contrast,
the within row and column gradient indices are influenced by symmetry
in the matrices so that the reverse of an optimal permutation is an alter-
native optimal permutation. For the above example using forward
branching in the branch-and-bound algorithm, the optimal within row
and column gradients are found as early as possible, i.e., when object 1 is
in the first position rather than when the opposite end of the permutation
is in the leading position. The result is that the within row gradients ap-
pear to be maximized for the reverse (or near-reverse) optimal permuta-
tion for the within row and column gradient indices—yet, that is not the
case because the within row and column gradients are optimized with
their reverse optimal permutations! Again, as we shall see in Chapter 11,
these alternative optimal permutations can be discovered when examin-
ing multiple criteria simultaneously. Moreover, in Chapter 10, we will
force our main algorithm to take advantage of symmetry in matrices and
optimal permutations.
Figure 10.1. An illustration of the change to the placement of matrix entries af-
ter two objects swap positions. The diagonal is zeroed, “x” indicates no change
in placement, “(x)” indicates a change in placement that will occur on the same
side of the diagonal, and “switch” indicates an entry that will “switch sides” of
the diagonal.
Σ_{i=q}^{p} ( 2*SumLeft(ψ(i)) − r_{ψ(i)} )² = Σ_{i=q}^{p} ( 2 Σ_{j=1}^{i−1} a_{ψ(i)ψ(j)} − r_{ψ(i)} )² .    (10.3)

Σ_{i=q}^{p} ( 2*SumLeft(ψ′(i)) − r_{ψ′(i)} )²
= ( 2*SumLeft(ψ′(q)) − r_{ψ′(q)} )² + ( 2*SumLeft(ψ′(p)) − r_{ψ′(p)} )² + Σ_{i=q+1}^{p−1} ( 2*SumLeft(ψ′(i)) − r_{ψ′(i)} )²
= ( 2*( SumLeft(ψ(q)) + Σ_{i=q+1}^{p} a_{ψ(i)ψ(q)} ) − r_{ψ(q)} )² + ( 2*( SumLeft(ψ(p)) − Σ_{i=q}^{p−1} a_{ψ(i)ψ(p)} ) − r_{ψ(p)} )²
+ Σ_{i=q+1}^{p−1} ( 2*( SumLeft(ψ(i)) − a_{ψ(i)ψ(q)} + a_{ψ(i)ψ(p)} ) − r_{ψ(i)} )²

= ( 2*( Σ_{j=1}^{q−1} a_{ψ(q)ψ(j)} + Σ_{i=q+1}^{p} a_{ψ(i)ψ(q)} ) − r_{ψ(q)} )²    (10.4)
+ ( 2*( Σ_{j=1}^{p−1} a_{ψ(p)ψ(j)} − Σ_{i=q}^{p−1} a_{ψ(i)ψ(p)} ) − r_{ψ(p)} )²
+ Σ_{i=q+1}^{p−1} ( 2*( Σ_{j=1}^{i−1} a_{ψ(i)ψ(j)} − a_{ψ(i)ψ(q)} + a_{ψ(i)ψ(p)} ) − r_{ψ(i)} )²
Given a partial sequence of p out of n objects, the upper bound for the
objective function value for unidimensional scaling is comprised of two
components. The first component calculates the contribution of the first p
objects to the objective function value:
f_{U1} = Σ_{i=1}^{p} ( r_{ψ(i)} − 2 Σ_{j=1}^{i−1} a_{ψ(i)ψ(j)} )² .    (10.5)
UpperB1 = 0
for i = 1 to Position
temp = 0
for j = 1 to i – 1
temp = temp + A(permutation(i), permutation(j))
next j
temp = rowsum(permutation(i)) – 2*temp
temp = temp*temp
UpperB1 = UpperB1 + temp
next i
The second component is an upper bound on the possible contribution
by the remaining n – p objects. For unselected objects in unfilled posi-
tions, the upper bound should find the maximum possible contribution to
the objective function of object i if placed in position j. For any object,
the set of row entries is divided into two groups—those to the left of the
diagonal and those to the right of the diagonal. For unidimensional scal-
ing, the goal is to have the largest possible difference between the sums
of the two groups. Therefore, we should put the smaller entries into the
smaller of the subsets and the larger entries in the larger of the subsets.
That is, if object i is placed in position p and p ≤ n / 2, then the most de-
sirable outcome is to have the smallest entries to the left of the diagonal;
similarly, if object i is placed in position p and p > n / 2, then the most
desirable outcome is to have the smallest entries to the right of the di-
agonal. So, for a row i, we find the pth greatest row entry with,
order(i, p) = [ a_{ij} | p = |{ a_{ik} : a_{ij} < a_{ik} }| + 1 ].    (10.6)
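In code, this amounts to picking the pth largest off-diagonal entry of row i. A minimal Python sketch (the function name is ours; ties are handled by simple sorting rather than by the counting rule above):

def pth_largest(A, i, p):
    """Return order(i, p): the pth greatest entry of row i, ignoring the
    zero diagonal element."""
    row = [A[i][k] for k in range(len(A)) if k != i]
    return sorted(row, reverse=True)[p - 1]

D = [[0.0, 0.3, 0.1, 0.4],
     [0.3, 0.0, 0.2, 0.5],
     [0.1, 0.2, 0.0, 0.6],
     [0.4, 0.5, 0.6, 0.0]]
print(pth_largest(D, 0, 2))  # 0.3 (the second greatest entry in row 0)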
/* Determine S\R(Position) */
index = Position + 1
for i = 1 to n
found = False
for j = 1 to Position
if permutation(j) = i then found = true
next j
if not found then
complement(index) = i
index = index + 1
end if
next i
/* Find maximum contribution to objective function value by the
complement. */
UpperB2 = 0
for OpenPosition = Position + 1 to n
MAX = 0
for Object = Position + 1 to n
if B(complement(Object), OpenPosition) > MAX then
MAX = B(complement(Object), OpenPosition)
end if
next Object
UpperB2 = UpperB2 + MAX
next OpenPosition
Table 10.2. The iterative process for unidimensional scaling of the symmetric
dissimilarity matrix in Table 10.1.
[Column headings: Row, Partial sequence, UpperB1, UpperB2, UpperB, Interchange test, Dispensation. The table entries are not reproduced here.]
Table 10.3. Confusion data for English consonants degraded by –18db (Miller
& Nicely, 1955).
(p) (t) (k) (f) (θ) (s) (³) (b)
(p) .000 .102 .083 .087 .095 .083 .053 .057
(t) .073 .000 .095 .068 .068 .082 .064 .032
(k) .083 .092 .000 .063 .058 .121 .050 .017
(f) .101 .082 .101 .000 .049 .045 .037 .071
(θ) .071 .075 .075 .054 .000 .088 .050 .058
(s) .071 .067 .091 .044 .071 .000 .067 .044
(³) .060 .075 .101 .063 .049 .138 .000 .037
(b) .045 .041 .090 .056 .071 .056 .045 .000
(d) .054 .081 .061 .044 .051 .051 .047 .074
(g) .036 .066 .095 .030 .059 .059 .049 .086
(v) .040 .076 .080 .049 .031 .054 .040 .112
(δ) .074 .051 .046 .032 .028 .065 .046 .093
(z) .074 .074 .061 .037 .053 .078 .029 .090
(3) .036 .073 .077 .064 .055 .068 .032 .100
(m) .079 .100 .063 .058 .058 .058 .033 .058
(n) .047 .076 .085 .025 .038 .076 .038 .059
(d) (g) (v) (δ) (z) (3) (m) (n)
(p) .061 .027 .064 .042 .045 .042 .061 .045
(t) .045 .027 .077 .041 .059 .050 .041 .059
(k) .046 .038 .050 .042 .067 .046 .071 .058
(f) .075 .052 .060 .060 .056 .011 .049 .067
(θ) .083 .058 .096 .025 .058 .038 .050 .058
(s) .095 .060 .060 .063 .044 .052 .067 .020
(³) .078 .026 .075 .067 .034 .030 .060 .056
(b) .075 .071 .090 .045 .056 .041 .067 .063
(d) .000 .071 .084 .057 .061 .044 .051 .084
(g) .099 .000 .059 .046 .053 .066 .079 .072
(v) .063 .058 .000 .067 .085 .049 .054 .076
(δ) .079 .083 .069 .000 .079 .056 .083 .083
(z) .057 .037 .086 .049 .000 .041 .090 .049
(3) .082 .036 .068 .050 .068 .000 .082 .059
(m) .063 .050 .054 .033 .046 .025 .000 .117
(n) .059 .055 .038 .034 .042 .051 .140 .000
1 10 0.00000
2 4 31.38095
3 14 62.14286
4 9 101.71429
5 15 128.09524
6 19 135.66667
7 6 167.57143
8 11 187.19048
9 8 214.80952
10 12 278.23810
11 1 288.09524
12 21 319.42857
13 2 325.23810
14 3 338.80952
15 16 350.76190
16 17 388.71429
17 5 408.52381
18 7 449.33333
19 13 496.33333
20 20 552.80952
21 18 598.14286
One of the most important aspects of the results for the lipread conso-
nant data is the disparity between the CPU times for bbinward.for and
bbforwrd.for, with the former requiring nearly 40 times more CPU time
than the latter. Brusco and Stahl (2004) observed that for dissimilarity
matrices with a near-perfect anti-Robinson structure, branching inward
tends to perform somewhat better than branching forward. In other
words, for well-structured matrices, the bounding process proposed by
Defays (1978) is extremely effective and inward branching leads to bet-
ter pruning of partial solutions. However, the forward algorithm tends to
be appreciably more efficient than inward branching for even modest de-
partures from anti-Robinson structure. Because the lipread consonant
data are rather messy and devoid of a strong structure, the tremendous
advantages for the forward algorithm are not surprising.
11 Seriation—Multiobjective Seriation
also define user-specified weights, 0 < wq < 1 for q = 1,…, Q such that
subject to:  Σ_{q=1}^{Q} w_q = 1,    (11.1.1)
w_q > 0,    (11.1.2)
rowsum for row i is considered the (Q + 1)th rowsum for the ith row of all
matrices. Similarly, the (Q + 1)th matrix is the weighted matrix.
/* Calculate weighted rowsums */
For i = 1 To n
rowsum(Q + 1, i) = 0
For k = 1 To Q
rowsum(Q + 1, i) = rowsum(Q+1, i) + w(k) * rowsum(k, i)
Next k
Next i
/* Calculate weighted matrix. */
For i = 1 To n
For j = 1 To n
A(Q + 1, i, j) = 0
For k = 1 To Q
A(Q + 1, i, j) = A(Q + 1, i, j) + w(k) * A(k, i, j)
Next k
Next j
Next i
With these weighted rowsums and matrix in place, we can simply per-
form the branch-and-bound algorithm on the (Q + 1)th rowsums and ma-
trix. The particular routines for the main algorithm—finding the initial
lower bound, evaluation, and fathoming—also use these prefabricated to-
tals in the same manner as a single objective problem.
To find an equitable multiobjective solution, we can set each weight
for each matrix as 1/Q, normalize the weights, find the weighted row-
sums and matrix, and use the branch-and-bound algorithm to solve the
multiobjective problem. If we would like to continue finding points in
the efficient set, then we can vary the weights, normalize the weights,
find the new weighted rowsums and matrix, and solve. Not every weight-
ing scheme is unique to a point on the efficient frontier; however, each
weighting scheme produces a single point on the efficient frontier.
Therefore, a logical methodology is to vary the weight of the matrices
over a fixed interval by steady increments. For example, in a biobjective
problem, an analyst could vary the first weight by .05 over the interval
[.4, .6], i.e. at the starting point plus four steps of .05.
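A small Python sketch of this enumeration for the biobjective case (the function name is ours; the resulting pairs would then be fed to the weighted rowsum and matrix computations above before re-running the branch-and-bound algorithm):

def weight_schemes(start=0.4, stop=0.6, step=0.05):
    """Enumerate normalized (w1, w2) weight pairs over [start, stop] in
    steady increments, as in the biobjective example just described."""
    schemes, k = [], 0
    while start + k * step <= stop + 1e-9:
        w1 = start + k * step
        schemes.append((round(w1, 10), round(1.0 - w1, 10)))
        k += 1
    return schemes

print(weight_schemes())
# [(0.4, 0.6), (0.45, 0.55), (0.5, 0.5), (0.55, 0.45), (0.6, 0.4)]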
Consider the confusion matrices in Table 11.1 for acoustic recognition
of digits according to male or female voice. The maximum dominance
index for the male voice (f1) is 1751 with an optimal permutation of ψM
= (5, 9, 1, 4, 7, 6, 8, 3, 2); for the female voice (f2), the maximum domi-
nance index is 4737, indicating more confusion, with a slightly different
Table 11.2. Generating points on the efficient frontier for maximizing the domi-
nance index of matrices in Table 11.1. For a given permutation, the function f1
measures the dominance index for the male voice data and f2 for the female
voice data.
w(1)   w(2)   F(ψ*)   f1(ψ*) (% of f1*)   f2(ψ*) (% of f2*)   Optimal biobjective permutation (ψ*)
0 1 N/A 1725 4737 ( 5, 9, 1, 4, 7, 3, 6, 2, 8)
(98.5151% ) (100% )
0.1 0.9 0.9986 1733 4735 ( 5, 9, 1, 4, 7, 6, 3, 2, 8)
(98.972%) (99.9578%)
0.2 0.8 0.9986 1751 4729 ( 5, 9, 1, 4, 7, 6, 8, 3, 2)
(100%) ( 99.8311%)
0.3 0.7 0.9986 1751 4729 ( 5, 9, 1, 4, 7, 6, 8, 3, 2)
(100%) (99.8311%)
0.4 0.6 0.9988 1751 4729 ( 5, 9, 1, 4, 7, 6, 8, 3, 2)
(100%) (99.8311%)
0.5 0.5 0.999 1751 4729 ( 5, 9, 1, 4, 7, 6, 8, 3, 2)
(100% ) (99.8311%)
0.6 0.4 0.9992 1751 4729 ( 5, 9, 1, 4, 7, 6, 8, 3, 2)
(100%) (99.8311%)
0.7 0.3 0.9993 1751 4729 ( 5, 9, 1, 4, 7, 6, 8, 3, 2)
(100%) (99.8311%)
0.8 0.2 0.9995 1751 4729 ( 5, 9, 1, 4, 7, 6, 8, 3, 2)
(100%) (99.8311%)
0.9 0.1 0.9997 1751 4729 ( 5, 9, 1, 4, 7, 6, 8, 3, 2)
(100%) (99.8311%)
1 0 N/A 1751 4729 ( 5, 9, 1, 4, 7, 6, 8, 3, 2)
(100%) (99.8311%)
Closer Examination of Efficient Frontier for Weights between (0.08, 0.92) and
(0.11, 0.89)
0.08 0.92 0.9989 1725 4737 ( 5, 9, 1, 4, 7, 3, 6, 2, 8)
(98.5151%) (100%)
0.09 0.91 0.9987 1733 4735 ( 5, 9, 1, 4, 7, 6, 3, 2, 8)
(98.972%) (99.9578%)
0.1 0.9 0.9986 1733 4735 ( 5, 9, 1, 4, 7, 6, 3, 2, 8)
(98.972%) (99.9578%)
0.11 0.89 0.9985 1751 4729 ( 5, 9, 1, 4, 7, 6, 8, 3, 2)
(100%) (99.8311%)
Figure 11.1. Points on the efficient frontier for maximizing the dominance in-
dex of matrices in Table 11.1. Axes are defined by single objective solutions.
Table 11.3. Points generated for the efficient frontier when applying biobjective
UDS to the confusion data for consonants degraded by –18db and –12db (Miller
& Nicely, 1955). For a given permutation, g1 measures the Defays criterion for
the –18db data and g2 for the –12db data.
w(1) w(2) G(ψ*) g1(ψ*) g2(ψ*) Optimal biobjective
(% of f1*) (% of f2*) permutation (ψ*)
0 1 N/A 1192.63982252 1270.4331286 (t, k, p, f, θ, s, ³, 3, z,
(99.2318%) (100%) d, g, δ, v, b, m, n)
0.05 0.95 0.9999 1192.63982252 1270.4331286 (t, k, p, f, θ, s, ³, 3, z,
(99.2318%) (100%) d, g, δ, v, b, m, n)
0.1 0.9 0.9996 1192.8183188 1270.41475884 (t, k, p, f, θ, ³, s, 3, z,
(99.2467%) (99.9986%) d, g, δ, v, b, m, n)
0.15 0.85 0.9992 1193.25412256 1270.3402682 (t, p, k, f, θ, ³, s, 3, z,
(99.283%) (99.9927%) d, g, δ, v, b, m, n)
0.2 0.8 0.9989 1193.25412256 1270.3402682 (t, p, k, f, θ, ³, s, 3, z,
(99.283%) (99.9927%) d, g, δ, v, b, m, n)
0.25 0.75 0.9985 1194.04238284 1270.07806468 (t, p, k, f, θ, ³, s, 3, z,
(99.3485%) (99.9721%) d, g, v, b, δ, m, n)
0.3 0.7 0.9982 1194.04238284 1270.07806468 (t, p, k, f, θ, ³, s, 3, z,
(99.3485%) (99.9721%) d, g, v, b, δ, m, n)
0.35 0.65 0.9978 1194.04238284 1270.07806468 (t, p, k, f, θ, ³, s, 3, z,
(99.3485%) (99.9721%) d, g, v, b, δ, m, n)
0.4 0.6 0.9975 1194.39919652 1269.86984556 (p, t, k, f, θ, ³, s, 3, z,
(99.3782%) (99.9557%) d, g, v, b, δ, m, n)
0.45 0.55 0.9972 1194.39919652 1269.86984556 (p, t, k, f, θ, ³, s, 3, z,
(99.3782%) (99.9557%) d, g, v, b, δ, m, n)
0.5 0.5 0.997 1196.808673 1267.71883456 (3, z, g, d, δ, b, v, n,
(99.5787%) (99.7863%) m, s, ³, θ, f, k, t, p)
0.55 0.45 0.9968 1196.808673 1267.71883456 (3, z, g, d, δ, b, v, n,
(99.5787%) (99.7863%) m, s, ³, θ, f, k, t, p)
0.6 0.4 0.9967 1198.42538244 1265.46336088 (3, z, g, d, δ, b, v, n,
(99.7132%) (99.6088%) m, θ, f, p, k, t, s, ³)
0.65 0.35 0.9967 1198.42538244 1265.46336088 (3, z, g, d, δ, b, v, n,
(99.7132%) (99.6088%) m, θ, f, p, k, t, s, ³)
0.7 0.3 0.9968 1199.2132036 1263.85166888 (θ, f, p, t, k, ³, s, m, n,
(99.7788%) (99.482%) v, b, δ, d, z, g, 3)
0.75 0.25 0.9969 1200.647193 1259.6138228 (³, f, p, t, k, s, θ, m, n,
(99.8981%) (99.1484%) v, b, d, δ, z, g, 3)
0.8 0.2 0.9971 1201.51024336 1256.62626288 (³, f, p, t, k, s, θ, m, n,
(99.9699%) (98.9132%) d, v, b, z, δ, g, 3)
When we wish to analyze a single matrix with multiple criteria, the main
algorithm and the calculation for the initial lower bound remain the
same. However, the routines called by these algorithms must be per-
formed with respect to the desired weighting scheme. The main algorithm
calls the evaluation and fathoming routines; the calculation of the
initial lower bound uses the evaluation routine.
We adopt the convention of using EVALUATION1 as the evaluation
routine for the first gradient index (f1 = Ur), EVALUATION2 as the
evaluation routine for the second gradient index, and so forth.
The overall EVALUATION routine combines these values using the
normalized weights of a convex combination. Therefore, the overall
objective function value is a proportion, i.e. a value between 0 and 1
inclusive. Conditional statements streamline the implementation by
skipping the computation of criteria that are irrelevant for a particular problem.
EVALUATION = 0
if w(1) > 0 then
EVALUATION = w(1) * EVALUATION1
end if
if w(2) > 0 then
EVALUATION = EVALUATION + w(2) * EVALUATION2
end if
if w(3) > 0 then
EVALUATION = EVALUATION + w(3) * EVALUATION3
end if
if w(4) > 0 then
EVALUATION = EVALUATION + w(4) * EVALUATION4
end if
In the case of maximizing gradient indices, the fathoming routines are
the adjacency test and the bound test. For all of the gradient indices, the
adjacency test is based on an inequality. The example of algorithmic
pseudocode in section 9.2.2 used compare1 and compare2 variables to
evaluate the left- and right-hand sides of the equation. If we are inter-
ested in multiple gradient indices, then we accumulate the left- and right-
hand sides of the equation with respect to the normalized weights (as-
suming that compare1 and compare2 are global variables).
if Position = 1 then
ADJACENCYTEST = Pass
else
Set Lefthand = 0 and Righthand = 0.
if w(1) > 0 then
ADJACENCYTEST1
Lefthand = w(1) * compare1
Righthand = w(1) * compare2
end if
if w(2) > 0 then
ADJACENCYTEST2
Lefthand = Lefthand + w(2) * compare1
Righthand = Righthand + w(2) * compare2
end If
if w(3) > 0 then
ADJACENCYTEST3
Lefthand = Lefthand + w(3) * compare1
Righthand = Righthand + w(3) * compare2
end if
if w(4) > 0 then
ADJACENCYTEST4
Lefthand = Lefthand + w(4) * compare1
Righthand = Righthand + w(4) * compare2
end if
if Lefthand < Righthand then
ADJACENCYTEST = False
else
ADJACENCYTEST = True
end if
end if
Similarly, we rely on the fact that all bound tests calculate an upper
bound to which the current lower bound is compared, as explained in
section 9.2.3. By calculating the overall upper bound with respect to
normalized weights (assuming UpperB is a global variable), we can de-
termine the feasibility of a partial sequence as a candidate for an optimal
permutation for the overall objective function.
If Position < 3 then
BOUNDTEST = Pass
Else
BT = 0
if w(1) > 0 then
BOUNDTEST1
BT = w(1) * UpperB1
end if
if w(2) > 0 then
BOUNDTEST2
BT = BT + w(2) * UpperB2
end if
if w(3) > 0 then
BOUNDTEST3
BT = BT + w(3) * UpperB3
end if
if w(4) > 0 then
BOUNDTEST4
BT = BT + w(4) * UpperB4
end if
if BT < LowerB then
BOUNDTEST = Fail
else
BOUNDTEST = Pass
end if
end if
Recall the archaeological example of Chapter 9. We have varying op-
timal permutations for the four gradient indices. An interesting problem
is to compare results for the weighted within row gradient and the un-
weighted within row gradient for the archaeological data in Table 9.1.
Referring to the gradient indices, Ur and Wr, we can vary the weights for
the individual criteria from (0.05, 0.95) to (0.95, 0.05) for finding points
on the efficient frontier for F. Points found on the efficient frontier in this
migration are reported in Table 11.4. For use as reference to endpoints
and to illustrate the usefulness of the biobjective modeling, the single ob-
jective solutions are italicized above and below the points of the efficient
frontier. We stress that a requirement of Soland’s theorem is strictly posi-
tive weights for efficient solutions. This subset of the efficient frontier
has six points—(549, 23844), (551, 23837), (553, 23826), (557, 23773),
(561, 23651), and (562, 23437). These six points are plotted in Figure
11.3.1. The first important observation is in the weighting scheme of
(0.85, 0.15), which finds an alternative optimal solution for the first criterion
that improves the value of the second criterion, as promised in Chapter
9. This clearly demonstrates the usefulness of biobjective programming
when a second criterion is considered but not of great importance. In ad-
dition, we can see how the trade-offs become more drastic as we ap-
proach the single objective solution for Criteria 1 (Ur), as when moving
from the fifth point to the sixth point there is a mere increase of 1 in f1*
yet a drop of 214 in f3*.
Table 11.4. Biobjective seriation of the Kabah collection by weighted and un-
weighted within row gradient indices to find points in the efficient set.
w(1) w(2) F(ψ*) f 1(ψ*) f 3(ψ*) Optimal biobjective
(% of f1*) (% of f3*) permutation (ψ*)
0 1 N/A 549 23844 (IVB, III, VIA, VIB, VA, VB,
(97.6868%) (100% ) IVA, VIII, IX, XA, XB, IB,
XIB, XIA, IA, VII, II)
0.05 0.95 0.9988 549 23844 (IVB, III, VIA, VIB, VA,
(97.6868%) (100%) VB, IVA, VIII, IX, XA, XB,
IB, XIB, XIA, IA, VII, II)
0.1 0.9 0.9978 551 23837 (IVB, III, VIA, VIB, VA,
(98.0427%) (99.9706%) VB, IVA, VIII, XA, IX, XB,
IB, XIB, XIA, IA, VII, II)
0.15 0.85 0.997 553 23826 (III, IVB, VIA, VIB, VA,
(98.3986%) (99.9245%) VB, IVA, VIII, XA, IX, XB,
IB, XIB, XIA, IA, VII, II)
0.25 0.75 0.9955 557 23773 (III, IVB, VIB, VIA, VA,
(99.1103%) (99.7022%) VB, IVA, VIII, XA, IX, XB,
IB, XIB, XIA, IA, VII, II)
0.45 0.55 0.9947 561 23651 (VIA, VIB, VA, III, IVB,
(99.8221%) (99.1906%) VB, IVA, VIII, XA, IX, XB,
IB, XIB, XIA, IA, VII, II)
0.85 0.15 0.9974 562 23437 (III, IVB, VA, VIB, VB,
(100%) (98.2931%) VIA, IVA, VIII, XA, IX,
XB, IB, XIB, XIA, IA, VII,
II)
1 0 N/A 562 23429 (III, IVB, VA, VIB, VB, VIA,
(100%) (98.2595%) IVA, VIII, XA, IX, IB, XB,
XIB, XIA, IA, VII, II)
We can also take a closer look at the nature of the within row gradi-
ents and the within row and column gradients as shown in Table 11.5.
We can see that any weight on the first criterion (Ur) forces the second
criterion (Urc) to find an alternative optimal solution in near-reverse order.
On the opposite end of the efficient frontier, the smallest weight on the
second criterion forces the first criterion to find an alternative optimal solu-
tion by simply reversing the placements of objects XB and IB. In both
cases, the primary criterion remains optimal while improving the value
for the secondary criterion. For the selected weighting schemes, the biob-
jective program generates four points on the efficient frontier as shown
in Figure 11.3.2. This figure graphically demonstrates the improvement
in f1 (a 10% improvement) simply by finding an alternate optimal per-
mutation for the second criterion using the biobjective program.
Figure 11.3.1. Plotting a subset of the efficient frontier for maximizing Criteria
1 (Ur) and Criteria 3 (Wr) applied to the data for the Kabah collection.
max_{ψ∈Ψ} : F(ψ) = w_f ( f(ψ)/f* ) + w_g ( g(ψ)/g* ) = ( w_f / f* ) f(ψ) + ( w_g / g* ) g(ψ) .    (11.3)
Table 11.5. A subset of the efficient set for maximizing unweighted within row
gradient indices and unweighted within row and column gradient indices for the
Kabah collection.
w(1) w(2) F(ψ*) f1(ψ*) f2(ψ*) Optimal biobjective
(% of f1*) (% of f2*) permutation (ψ*)
0 1 N/A 499 1050 (II, VII, IA, XIA, XIB, IB, XB,
(88.79%) (100%) IX, XA, VIII, IVA, VB, VA,
VIB, VIA, III, IVB)
0.05 0.95 0.9992 553 1050 (III, IVB, VIA, VIB, VA,
(98.3986%) (100%) VB, IVA, VIII, XA, IX, XB,
IB, XIB, XIA, IA, VII, II)
0.7 0.3 0.9898 557 1036 (III, IVB, VIB, VIA, VA,
(99.1103%) (98.6667%) VB, IVA, VIII, XA, IX, XB,
IB, XIB, XIA, IA, VII, II)
0.85 0.15 0.9905 561 994 (VIA, VIB, VA, III, IVB,
(99.8221%) (94.6667%) VB, IVA, VIII, XA, IX, XB,
IB, XIB, XIA, IA, VII, II)
0.9 0.1 0.9932 562 979 (III, IVB, VA, VIB, VB,
(100%) (93.2381%) VIA, IVA, VIII, XA, IX, XB,
IB, XIB, XIA, IA, VII, II)
1 0 N/A 562 976 (III, IVB, VA, VIB, VB, VIA,
(100%) (92.9524%) IVA, VIII, XA, IX, IB, XB,
XIB, XIA, IA, VII, II)
Figure 11.3.2. Four points on the efficient frontier for maximizing Criteria 1
(Ur) and Criteria 2 (Urc) as applied to the Kabah collection.
end if
if i = Position then
for j = AltPosition to Position − 1
SumLeft = SumLeft
− Magnitude (permutation(Position), permutation(j))
next j
end if
UDSContribution = 2 * SumLeft − rowsum(m, permutation(i))
AltContribution = AltContribution
+ w(m) * UDSContribution * UDSContribution
next i
end if
if w(2) > 0 then
/* Calculate contribution of Dominance Index function for rows
permutation(AltPosition) through permutation(Position) */
for i = AltPosition + 1 To Position
CurrentContribution = CurrentContribution
+ w(s) * Sign(permutation(AltPosition), permutation(i))
next i
for i = AltPosition + 1 to Position − 1
CurrentContribution = CurrentContribution
+ w(s) * Sign(permutation(i), permutation(Position))
AltContribution = AltContribution
+ w(s) * Sign(permutation(i), permutation(AltPosition))
next i
for i = AltPosition to Position − 1
AltContribution = AltContribution
+ w(s) * Sign(permutation(Position), permutation(i))
next i
end if
AltPosition = AltPosition + 1
if CurrentContribution < AltContribution then
INTERCHANGETEST = Fail
end if
loop /* AltPosition loop */
As an example, we return to the matrix for assessment of criminal of-
fences (Table 8.6). The results in Table 11.6 agree with the results ob-
tained by Brusco and Stahl (2005) for the same biobjective problem
when employing dynamic programming to find optimal permutations.
Table 11.6. A subset of the efficient set for maximizing the dominance index for
the magnitude component and the Defays’ maximization for the magnitude com-
ponent of the skew-symmetric component of Table 8.6.
w(1) w(2) F(ψ*) f(ψ*) g(ψ*) Optimal biobjective
(% of f*) (% of g*) permutation (ψ*)
0 1 N/A 95 122813150.5 (Libel, Larceny, Burglary, Assault
(94.0594%) (100%) & Battery, Forgery, Counterfeit-
ing, Perjury, Embezzlement, Arson,
Adultery, Kidnapping, Seduction,
Abortion, Homicide, Rape)
0.01 0.99 .9996 97 122806940.5 (Libel, Larceny, Assault & Battery,
(96.0396%) (99.9949%) Burglary, Forgery, Counterfeiting,
Perjury, Embezzlement, Arson,
Adultery, Kidnapping, Seduction,
Abortion, Homicide, Rape)
0.05 0.95 .9984 99 122733072.5 (Libel, Larceny, Assault & Battery,
(98.0198%) ( 99.9348%) Burglary, Forgery, Counterfeiting,
Perjury, Embezzlement, Arson,
Adultery, Kidnapping, Seduction,
Abortion, Rape, Homicide)
1 0 N/A 101 122562702.5 (Libel, Larceny, Assault & Battery,
(100%) (99.7961%) Burglary, Forgery, Counterfeiting,
Perjury, Embezzlement, Arson,
Kidnapping, Adultery, Seduction,
Abortion, Rape, Homicide)
Variable Selection
12 Introduction to Branch-and-Bound Methods
for Variable Selection
12.1 Background
{v4, v5}. Although the importance of this pruning loses some of its luster
because the size of this problem is small, the fundamental aspect of the
approach is evident. That is, for problems of practical size, the branch-
and-bound approach enables variable subsets to be implicitly evaluated
and eliminated before they need to be explicitly evaluated.
Table 13.2. Variable selection using criterion (2.3) on the data in Table 13.1.
Row Variables p Criterion Dispensation
1 {v1} 1 0.00000 Branch forward
2 {v1, v2} 2 15.29762 Prune (15.3 ≥ 13.3), Branch right
3 {v1, v3} 2 8.80000 Branch forward
4 {v1, v3, v4} 3 80.06667 Suboptimal, 80.07 ≥ 13.3, Branch right
5 {v1, v3, v5} 3 51.94286 Suboptimal, 51.94 ≥ 13.3, Branch right
6 {v1, v3, v6} 3 13.20000 *New Incumbent, f* = 13.2
7 {v1, v4} 2 22.41667 Prune (22.42 ≥ 13.2), Branch right
8 {v1, v5} 2 20.09524 Prune (20.10 ≥ 13.2), Retract
9 {v2} 1 1.80000 Branch forward
10 {v2, v3} 2 21.06667 Prune (21.07 ≥ 13.2), Branch right
11 {v2, v4} 2 25.08333 Prune (25.08 ≥ 13.2), Branch right
12 {v2, v5} 2 20.33333 Prune (20.33 ≥ 13.2), Retract
13 {v3} 1 2.83333 Branch forward
14 {v3, v4} 2 26.21429 Prune (26.21 ≥ 13.2), Branch right
15 {v3, v5} 2 21.33333 Prune (21.33 ≥ 13.2), Retract
16 {v4} 1 2.97857 Branch forward
17 {v4, v5} 2 30.27381 Prune (30.27 ≥ 13.2), Retract
18 {v5} 1 TERMINATE
The optimal solution for the variable selection problem is identified in
row 6 of Table 13.2, indicating that the optimal subset of variables is {v1,
v3, v6}. The optimal partition of objects corresponding to this optimal
subset is shown in Table 13.3. Each of the four clusters in Table 13.3
contains five objects, and the similarity of within-cluster measurements
across the three selected variables clearly reveals considerable cluster
homogeneity. The table also reveals the lack of within-cluster homogene-
ity with respect to the unselected variables {v2, v4, v5}.
Table 13.3. Cluster assignments for optimal solution to the variable selection
problem.
Object v1 v3 v6 v2 v4 v5
Cluster 1 9 7 2 6 4 7 2
16 6 1 7 4 4 1
2 7 2 7 2 6 1
14 6 2 7 7 2 4
19 7 1 6 5 2 5
Cluster 2 3 2 6 5 3 5 5
4 1 5 6 5 7 1
18 2 6 5 2 2 6
8 2 5 5 5 6 1
13 2 6 5 2 1 4
Cluster 3 5 7 6 2 3 1 6
17 7 7 3 1 3 5
12 6 6 3 5 6 7
6 6 7 3 5 7 6
7 7 6 2 5 4 7
Cluster 4 10 1 3 2 7 7 5
11 2 2 3 5 2 5
1 2 3 2 2 4 4
15 1 3 2 5 4 7
20 2 3 3 5 2 2
If n and K were sufficiently large so as to preclude successful imple-
mentation of the branch-and-bound algorithm for (2.3), the variable se-
lection branch-and-bound procedure could employ the K-means heuristic
procedure (MacQueen, 1967) for the solution evaluation process. The al-
gorithm would no longer produce a guaranteed, comprehensively optimal
solution; however, tremendous improvements in efficiency would be re-
alized.
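A rough Python sketch of such a heuristic evaluation step is given below; it uses scikit-learn's K-means implementation and the within-cluster sum of squares as a stand-in for whichever partitioning criterion is being minimized (the function name and the use of scikit-learn are our assumptions, not part of the programs described in this chapter):

import numpy as np
from sklearn.cluster import KMeans

def heuristic_subset_criterion(X, subset, K, restarts=10, seed=0):
    """Approximate the partitioning criterion for a candidate variable subset
    by running K-means (MacQueen, 1967) on the selected columns; the best
    within-cluster sum of squares over several restarts is returned."""
    km = KMeans(n_clusters=K, n_init=restarts, random_state=seed)
    km.fit(X[:, subset])
    return km.inertia_

# Hypothetical usage for the subset {v1, v3, v6} (0-based columns) with K = 4:
# X = np.loadtxt("data.txt"); print(heuristic_subset_criterion(X, [0, 2, 5], 4))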
In addition to computational feasibility issues associated with problem
size, there are at least two important limitations to the approach de-
ated with the slope and intercept terms of (14.1), then the Residual Sum
of Squares (RSSy) can be written in matrix form as:
RSSy = (y – Xb)′ (y – Xb). (14.3)
where TSS_y = Σ_{i=1}^{n} ( y_i − ȳ )²,    (14.6)

and ȳ = (1/n) Σ_{i=1}^{n} y_i .    (14.7)
variables that are eliminated rather than selected, and (b) the partial solu-
tion evaluation requires the solution of the normal equation (14.4), rather
than a combinatorial clustering problem. For this reason, the branch-and-
bound algorithm for variable selection in regression tends to have greater
scalability than the variable selection algorithm for K-means clustering.
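As a sketch of that evaluation step, the residual sum of squares for a candidate subset can be obtained with a standard least-squares solve (numpy is assumed; the function name is ours, and a numerical least-squares routine stands in for an explicit solution of the normal equations):

import numpy as np

def subset_rss(X, y, selected):
    """Residual sum of squares, RSSy, for a regression of y on the selected
    predictor columns; an intercept column is appended before solving."""
    Xs = np.column_stack([np.ones(len(y)), X[:, selected]])
    b, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ b
    return float(resid @ resid)

# Hypothetical usage: smaller RSSy indicates a better subset of predictors.
# X = np.loadtxt("predictors.txt"); y = np.loadtxt("response.txt")
# print(subset_rss(X, y, [0, 2, 5]))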
Independent variables (predictors) v1–v6 and dependent variable y
Object  v1  v2  v3  v4  v5  v6  y
1 4 4 1 7 5 7 22
2 6 5 2 7 6 6 28
3 2 7 2 5 6 4 17
4 7 3 7 3 6 4 38
5 5 6 6 1 3 7 36
6 1 3 6 2 7 4 20
7 2 2 3 7 7 7 21
8 4 4 1 4 4 3 18
9 5 3 6 1 4 6 35
10 7 6 1 3 7 5 29
11 4 3 3 7 7 5 23
12 4 4 7 7 5 1 24
13 6 3 7 7 6 5 38
14 1 3 2 2 6 7 16
15 6 6 4 5 2 3 30
16 1 7 5 3 4 5 21
17 4 6 3 5 5 6 26
18 3 5 5 3 4 6 29
19 6 7 5 6 1 2 27
20 6 4 5 4 5 1 28
21 6 4 3 7 7 7 33
22 2 1 4 7 5 5 23
23 4 3 3 4 7 7 29
24 2 6 7 6 7 3 23
25 6 3 1 1 6 2 22
which was close to what might be expected, as the data in Table 14.1
were generated using the following equation in conjunction with a small
random error component, ε_i:
y_i = 3.0 x_{i1} + 2.0 x_{i3} + 1.5 x_{i6} + ε_i .    (14.11)
Table 14.4. Candidate independent variables from Heinz et al. (2003). The de-
pendent variable is weight (in kilograms).
Label   Description                               Independent variable class
v1      Biacromial diameter                       Skeletal measurements (cm)
v2      Biiliac diameter, or “pelvic breadth”     Skeletal measurements (cm)
v3      Bitrochanteric diameter                   Skeletal measurements (cm)
Table 14.6. Minitab regression output for the optimal 13-variable subset.
Predictor Coefficient Std. Dev. T-statistic p-value
Intercept -121.708 2.5240 -48.22 .000
v4 (chest depth) .26497 .06846 3.87 .000
v8 (knee diameter) .5572 .12170 4.58 .000
v10 (shoulder girth) .08507 .02812 3.03 .003
v11 (chest girth) .1709 .03389 5.04 .000
v12 (waist girth) .37698 .02513 15.00 .000
v14 (hip girth) .22403 .03844 5.83 .000
v15 (thigh girth) .24641 .04872 5.06 .000
v17 (forearm girth) .56662 .09669 5.86 .000
v18 (knee girth) .17977 .07427 2.42 .016
v19 (calf girth) .35343 .06159 5.74 .000
v22 (age) -.05208 .01176 -4.43 .000
v23 (height) .31403 .01584 19.83 .000
v24 (gender) -1.4278 .48490 -2.94 .003
RSSy = 2181.9, R2 = 97.6%, Adjusted R2 = 97.5%
Table 14.7. Minitab results after removal of age and gender as predictors.
Predictor Coefficient Std. Dev. T-statistic p-value
Intercept -121.26500 2.34000 -51.83 .000
v4 (chest depth) .21318 .06926 3.08 .002
v8 (knee diameter) .44350 .12230 3.63 .000
v10 (shoulder girth) .08181 .02817 2.90 .004
v11 (chest girth) .17810 .03457 5.15 .000
v12 (waist girth) .32883 .02324 14.15 .000
v14 (hip girth) .22388 .03673 6.10 .000
v15 (thigh girth) .33886 .04630 7.32 .000
v17 (forearm girth) .48190 .09062 5.32 .000
v18 (knee girth) .21742 .07537 2.88 .004
v19 (calf girth) .35264 .06306 5.59 .000
v23 (height) .31072 .01519 20.46 .000
RSS = 2296.7, R2 = 97.5%, Adjusted R2 = 97.4%
The number of possible model refinements and explorations that can
be made with the branch-and-bound model is enormous. Nevertheless,
we offer one final demonstration, attempting to provide a compromise
between model A and model B. We eliminated the age (v22) and gender
(v24) variables from the candidate pool and sought to identify a subset of
predictors that would provide an effective combination of girth and
skeletal variables (in addition to height). The branch and bound algo-
rithm produced the same subset as shown in Table 14.6, with the excep-
tion that age and gender are omitted. The Minitab regression results for
this subset (Table 14.7) reveal that the removal of age and gender does
not substantially impair the explained variation, and this 11-predictor so-
lution still explains more variation than model B.
size. Hastie, Tibshirani, and Friedman (2001, Chapter 3) suggest that the
subset size should be jointly determined on the basis of bias and vari-
ance, and thoroughly address these two issues in Chapter 7 of their book.
Finally, we note that there are a number of extensions of branch-and-
bound methods in regression analysis. For example, Leenen and Van
Mechelen (1998) described an application of branch-and-bound within
the context of Boolean regression. Armstrong and Frome (1976) devel-
oped a branch-and-bound procedure for regression where the slope coef-
ficients for independent variables are constrained to be nonnegative (see
also Hand, 1981a). The bounding procedure in this application deter-
mines which variables should be driven to the boundary, i.e., forced to
have coefficients of zero. Variable selection has also been recognized as
an especially important problem in logistic regression (Hosmer, Jovano-
vic, & Lemeshow, 1989; King, 2003).
The output measures from the program correspond to the terms in the
brackets on the left-side of (14.12). The (D – d) × 1 vector sel contains
the selected variables. The first element of the vector b is the intercept
term, and the remaining D – d components of the vector are the slope co-
efficients and match one-to-one with the selected variables in sel. The rss
and r2 terms are the optimal residual sum of squares and coefficient of
determination, respectively. The cp value is Mallows Cp index (Mallows,
1973), which could be useful as a guideline for comparing subsets of differing
size, whereas Furnival and Wilson’s (1974) algorithm produces both op-
timal and several near-optimal solutions for subsets of various sizes.
These limitations of our programs notwithstanding, our experience
with the programs is encouraging. Hastie et al. (2001) observe that the
practical limit for branch-and-bound procedures similar to the one de-
signed by Furnival and Wilson (1974) is roughly 30 ≤ D ≤ 40. We have
successfully applied bestsub.m and bestsub.for to data sets with 30 to 50
predictor variables and several good competing regression models within
the set of candidate variables (a condition that tends to increase solution
difficulty). Even for the relatively larger problem instances, we are fre-
quently able to produce optimal results in just a few minutes of micro-
computer CPU time. We believe that the successful performance of our
implementations is associated with the variable reordering based on the
two-stage heuristic procedures used to establish an initial regression solu-
tion.
Although our results with bestsub.m and bestsub.for do appear promis-
ing, there are a variety of possible enhancements that could be under-
taken. For example, greater practical utility would be realized from the
implementation of procedures to compare the results associated with dif-
ferent subset sizes.
APPENDIX A
This algorithm assumes that the number of objects, n, has been previ-
ously determined. Other pertinent variables are for the position pointer
(Position), the array of n possible positions for the objects (permutation),
the incumbent solution (incumbent), and the array holding the objects in
the order of the best solution found at any time during the execution of
the algorithm (BestSolution). Secondary Boolean variables (NotRetract,
NotRedundancy, found, and fathom) control the flow of the algorithm or
assist in determining the final object to be assigned a place in the permu-
tation.
Four functions—EVALUATION (Real), ADJACENCYTEST (Boo-
lean) or INTERCHANGETEST (Boolean), and BOUNDTEST (Boo-
lean)—are dependent on the criteria being implemented. The
ADJACENCYTEST compares the contribution to the objective function
value of swapping permutation(Position – 1) with permutation(Position);
if the contribution is greater, then the adjacency test fails, we prune the
branch and we move to the next branch. The ADJACENCYTEST can
be replaced by an INTERCHANGETEST, which extends the adjacency
test to compare effects on the objective function value by swapping the
candidate for permutation(Position) with objects previously assigned to
positions in permutation. The variable LowerB is determined in the algo-
rithm, initially set to 0 or calculated by the combinatorial heuristic sug-
gested by Hubert and Arabie (1994). The determination of the initial
lower bound parallels the INITIALIZE step in branch-and-bound proce-
dures for cluster analysis. The BOUNDTEST calculates an upper bound
given a partial sequence of the n objects; if the upper bound for the par-
tial sequence is less than LowerB, then the branch is pruned and we
branch right.
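To convey the control flow in a compact, runnable form, the following Python sketch implements a forward-branching search over permutations with the evaluation and fathoming routines passed in as functions. It mirrors the structure just described rather than reproducing the Fortran implementations, and all names are ours:

def seriate(n, evaluate, bound_test, adjacency_test=None, lower_bound=0.0):
    """Forward-branching branch-and-bound over permutations of n objects
    (0-based). evaluate(perm) scores a complete permutation; bound_test(perm,
    lower) returns False when a partial sequence can be pruned; the optional
    adjacency_test(perm) checks the last two assigned objects."""
    best = {"value": lower_bound, "perm": None}

    def extend(partial, remaining):
        if not remaining:                         # complete sequence: evaluate
            value = evaluate(partial)
            if value > best["value"]:
                best["value"], best["perm"] = value, list(partial)
            return
        for obj in sorted(remaining):             # branch on the next position
            partial.append(obj)
            ok = adjacency_test is None or len(partial) < 2 or adjacency_test(partial)
            if ok and bound_test(partial, best["value"]):
                extend(partial, remaining - {obj})
            partial.pop()                         # retract and try the next object

    extend([], set(range(n)))
    return best["perm"], best["value"]

# Example: maximize the dominance index of a small asymmetric matrix.
A = [[0, 2, 5], [1, 0, 3], [2, 1, 0]]
dominance = lambda p: sum(A[p[k]][p[l]] for k in range(len(p)) for l in range(k + 1, len(p)))
print(seriate(3, dominance, lambda perm, lower: True))  # ([0, 1, 2], 10)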
/* Set lower bound on objective function value */
next j
if not found then permutation(n) = i
next i
/* Evaluate when complete sequence is ready */
incumbent = EVALUATION
if incumbent > LowerB then
LowerB = incumbent
BestSolution = permutation
end if
else
/* Perform fathoming tests. If either test fails, then we
remain in this loop—incrementing the object in Position
until permutation(Position) > n */
if ADJACENCYTEST then fathom = BOUNDTEST
end if
end if /* No Retraction */
end if /* No Redundancy */
loop /* fathom loop */
loop /* Termination loop */
/* Return BestSolution as optimal permutation and LowerB as optimal
objective function value */
References
Cho, R. Y., Yang, V., & Hallett, P. E. (2000). Reliability and dimensionality of
judgments of visually textured materials. Perception & Psychophysics, 62,
735-752.
Cobby, J. M. (1986). AS R67: A remark on AS 199, a branch and bound algo-
rithm for determining the optimal feature subset of a given size. Applied
Statistics, 35, 314.
Cobby, J. M. (1991). Correction to remark AS 67 – a remark on AS 199, a
branch and bound algorithm for determining the optimal feature subset of a
given size. Applied Statistics, 40, 376-377.
DeCani, J. S. (1969). Maximum likelihood paired comparison ranking by linear
programming. Biometrika, 56, 537-545.
DeCani, J. S. (1972). A branch and bound algorithm for maximum likelihood
paired comparison ranking by linear programming. Biometrika, 59, 131-
135.
Defays, D. (1978). A short note on a method of seriation. British Journal of
Mathematical and Statistical Psychology, 31, 49-53.
Delattre, M., & Hansen, P. (1980). Bicriterion cluster analysis. IEEE Transac-
tions on Pattern Analysis and Machine Intelligence, 2, 277-291.
DeSarbo, W. S., Carroll, J. D., Clark, L. A., & Green, P. E. (1984). Synthesized
clustering: A method for amalgamating alternative clustering bases with dif-
ferent weighting of variables. Psychometrika, 49, 57-78.
DeSarbo, W. S., & Grisaffe, D. (1998). Combinatorial optimization approaches
to constrained market segmentation: An application to industrial market
segmentation. Marketing Letters, 9, 115-134.
Diday, E. (1986). Orders and overlapping clusters by pyramids. In J. de Leeuw,
W. J. Heiser, J. Meulman, & F. Critchley (Eds.), Multidimensional data
analysis (pp. 201-234). Leiden: DSWO Press.
Diehr, G. (1985). Evaluation of a branch and bound algorithm for clustering.
SIAM Journal on Scientific and Statistical Computing, 6, 268-284.
Draper, N. R., & Smith, H. (1981). Applied regression analysis (2nd edition).
New York: Wiley.
Durand, C., & Fichet, B. (1988). One-to-one correspondence in pyramidal repre-
sentations: A unified approach. In H. H. Bock (Ed.), Classification and re-
lated methods of data analysis (pp. 85-90). New York: Springer-Verlag.
Ferligoj, A., & Batagelj, V. (1992). Direct multicriteria clustering algorithms.
Journal of Classification, 9, 43-61.
Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems.
Annals of Eugenics, 7, 179-188.
Flueck, J. A., & Korsh, J. F. (1974). A branch search algorithm for maximum
likelihood paired comparison ranking. Biometrika, 61, 621-626.
Forgy, E. W. (1965). Cluster analyses of multivariate data: Efficiency versus in-
terpretability of classifications. Biometrics, 21, 768.
Fowlkes, E. B., Gnanadesikan, R., & Kettenring, J. R. (1988). Variable selection
in clustering. Journal of Classification, 5, 205-228.
Fowlkes, E. B., & Mallows, C. L. (1983). A method for comparing two hierar-
chical clusterings. Journal of the American Statistical Association, 78, 553-
584.
Friedman, J. H., & Meulman, J. J. (2004). Clustering objects on subsets of at-
tributes. Journal of the Royal Statistical Society B, 66, 815-849.
Fukunaga, K. (1990). Introduction to statistical pattern recognition (2nd ed.).
New York: Academic Press.
Furnival, G. M. (1971). All possible regressions with less computation. Tech-
nometrics, 13, 403-408.
Furnival, G. M., & Wilson, R. W. (1974). Regression by leaps and bounds.
Technometrics, 16, 499-512.
Gnanadesikan, R., Kettenring, J. R., & Tsao, S. L. (1995). Weighting and selec-
tion of variables for cluster analysis. Journal of Classification, 12, 113-136.
Green, P. E., Carmone, F. J., & Kim, J. (1990). A preliminary study of optimal
variable weighting in K-means clustering. Journal of Classification, 7, 271-
285.
Groenen, P. J. F., & Heiser, W. J. (1996). The tunneling method for global op-
timization in multidimensional scaling. Psychometrika, 61, 529-550.
Groenen, P. J. F., Heiser, W. J., & Meulman, J. J. (1999). Global optimization of
least-squares multidimensional scaling by distance smoothing. Journal of
Classification, 16, 225-254.
Grötschel, M., Jünger, M., & Reinelt, G. (1984). A cutting plane algorithm for
the linear ordering problem. Operations Research, 32, 1195-1220.
Guénoche, A. (1993). Enumération des partitions de diamètre minimum. Dis-
crete Mathematics, 111, 277-287.
Guénoche, A. (2003). Partitions optimisées selon différents critères: Enuméra-
tion des partitions de diamètre minimum. Mathematics and Social Sciences,
41, 41-58.
Guénoche, A., Hansen, P., & Jaumard, B. (1991). Efficient algorithms for divi-
sive hierarchical clustering with the diameter criterion. Journal of Classifi-
cation, 8, 5-30.
Hamamoto, Y., Uchimura, S., Matsura, Y., Kanaoka, T., & Tomita, S. (1990).
Evaluation of the branch and bound algorithm for feature selection. Pattern
Recognition Letters, 11, 453-456.
Hand, D. J. (1981a). Branch and bound in statistical data analysis. The Statisti-
cian, 30, 1-13.
Hand, D. J. (1981b). Discrimination and classification. New York: Wiley.
Hansen, P., & Delattre, M. (1978). Complete-link cluster analysis by graph col-
oring. Journal of the American Statistical Association, 73, 397-403.
Hansen, P., & Jaumard, B. (1987). Minimum sum of diameters clustering. Jour-
nal of Classification, 4, 215-226.
Hansen, P., & Jaumard, B. (1997). Cluster analysis and mathematical program-
ming. Mathematical Programming, 79, 191-215.
Hansen, P., Jaumard, B., & Mladenović, N. (1998). Minimum sum of squares
clustering in a low dimensional space. Journal of Classification, 15, 37-55.
Little, J. D. C., Murty, K. G., Sweeney, D. W., & Karel, C. (1963). An algorithm
for the traveling salesman problem. Operations Research, 11, 972-989.
MacQueen, J. B. (1967). Some methods for classification and analysis of multi-
variate observations. In L. M. Le Cam & J. Neyman (Eds.), Proceedings of
the fifth Berkeley symposium on mathematical statistics and probability
(vol. 1, pp. 281-297). Berkeley, CA: University of California Press.
Mallows, C. L. (1973). Some comments on Cp. Technometrics, 15, 661-675.
Manning, S. K., & Shofner, E. (1991). Similarity ratings and confusability of
lipread consonants compared with similarity ratings of auditory and ortho-
graphic stimuli. American Journal of Psychology, 104, 587-604.
MathWorks, Inc. (2002). Using MATLAB (Version 6). Natick, MA: The Math-
Works, Inc.
Miller, A. J. (2002). Subset selection in regression (2nd ed.). London: Chapman
and Hall.
Miller, G. A., & Nicely, P. E. (1955). Analysis of perceptual confusions among
some English consonants. Journal of the Acoustical Society of America, 27,
338-352.
Milligan, G. W. (1989). A validation study of a variable-weighting algorithm for
cluster analysis. Journal of Classification, 6, 53-71.
Milligan, G. W. (1996). Clustering validation: Results and implications for ap-
plied analyses. In P. Arabie, L. J. Hubert, & G. De Soete (Eds.), Clustering
and classification (pp. 341-375). River Edge, NJ: World Scientific Publish-
ing.
Milligan, G. W., & Cooper, M. C. (1986). A study of the comparability of exter-
nal criteria for hierarchical cluster analysis. Multivariate Behavioral Re-
search, 21, 441-458.
Minitab, Inc. (1998). Minitab user’s guide 2: Data analysis and quality tools.
State College, PA: Minitab, Inc.
Morgan, B. J. T., Chambers, S. M., & Morton, J. (1973). Acoustic confusion of
digits in memory and recognition. Perception & Psychophysics, 14, 375-
383.
Mulvey, J., & Crowder, H. (1979). Cluster analysis: An application of Lagran-
gian relaxation. Management Science, 25, 329-340.
Murty, K. G. (1995). Operations research: Deterministic optimization models.
Englewood Cliffs, NJ: Prentice-Hall.
Murty, K. G., Karel, C., & Little, J. D. C. (1962). The traveling salesman prob-
lem: Solution by a method of ranking assignments. Cleveland: Case Institute
of Technology.
Narendra, P. M., & Fukunaga, K. (1977). A branch and bound algorithm for fea-
ture subset selection. IEEE Transactions on Computers, 26, 917-922.
Neter, J., Wasserman, W., & Kutner, M. H. (1985). Applied linear statistical
models (2nd ed.). Homewood, IL: Irwin.
Palubeckis, G. (1997). A branch-and-bound approach using polyhedral results
for a clustering problem. INFORMS Journal on Computing, 9, 30-42.
Parker, R. G., & Rardin, R. L. (1988). Discrete optimization. San Diego: Aca-
demic Press.
Phillips, J. P. N. (1967). A procedure for determining Slater’s i and all nearest
adjoining orders. British Journal of Mathematical and Statistical Psychol-
ogy, 20, 217-225.
Phillips, J. P. N. (1969). A further procedure for determining Slater’s i and all
nearest adjoining orders. British Journal of Mathematical and Statistical
Psychology, 22, 97-101.
Ramnath, S., Khan, M. H., & Shams, Z. (2004). New approaches for sum-of-
diameters clustering. In D. Banks, L. House, F. R. McMorris, P. Arabie, &
W. Gaul (Eds.), Classification, clustering, and data mining applications (pp.
95-103). Berlin: Springer-Verlag.
Ranyard, R. H. (1976). An algorithm for maximum likelihood ranking and Sla-
ter’s i from paired comparisons. British Journal of Mathematical and Statis-
tical Psychology, 29, 242-248.
Rao, M. R. (1971). Cluster analysis and mathematical programming. Journal of
the American Statistical Association, 66, 622-626.
Ridout, M. S. (1988). Algorithm AS 233: An improved branch and bound algo-
rithm for feature subset selection. Applied Statistics, 37, 139-147.
Roberts, S. J. (1984). Algorithm AS 199: A branch and bound algorithm for de-
termining the optimal feature subset of a given size. Applied Statistics, 33,
236-241.
Robinson, W. S. (1951). A method for chronologically ordering archaeological
deposits. American Antiquity, 16, 293-301.
Rodgers, J. L., & Thompson, T. D. (1992). Seriation and multidimensional scal-
ing: A data analysis approach to scaling asymmetric proximity matrices.
Applied Psychological Measurement, 16, 105-117.
Ross, B. H., & Murphy, G. L. (1999). Food for thought: Cross-classification and
category organization in a complex real-world domain. Cognitive Psychol-
ogy, 38, 495-553.
Simantiraki, E. (1996). Unidimensional scaling: A linear programming approach
minimizing absolute deviations. Journal of Classification, 13, 19-25.
Slater, P. (1961). Inconsistencies in a schedule of paired comparisons. Bio-
metrika, 48, 303-312.
Soland, R. M. (1979). Multicriteria optimization: A general characterization of
efficient solutions. Decision Sciences, 10, 26-38.
Späth, H. (1980). Cluster analysis algorithms for data reduction and classifica-
tion of objects. New York: Wiley.
Steinley, D. (2003). Local optima in K-means clustering: What you don’t know
may hurt you. Psychological Methods, 8, 294-304.
Thurstone, L. L. (1927). The method of paired comparisons for social values.
Journal of Abnormal and Social Psychology, 21, 384-400.
Tobler, W. R. (1976). Spatial interaction patterns. Journal of Environmental
Systems, 6, 271-301.
van der Heijden, A. H. C., Mathas, M. S. M., & van den Roovaart, B. P. (1984).
An empirical interletter confusion matrix for continuous-line capitals. Per-
ception & Psychophysics, 35, 85-88.
Vega-Bermudez, F., Johnson, K. O., & Hsiao, S. S. (1991). Human tactile pat-
tern recognition: Active versus passive touch, velocity effects, and patterns
of confusion. Journal of Neurophysiology, 65, 531-546.
Ward, J. H. (1963). Hierarchical grouping to optimize an objective function.
Journal of the American Statistical Association, 58, 236-244.
Weisberg, S. (1985). Applied linear regression. New York: Wiley.
Zupan, J. (1982). Clustering of large data sets. Letchworth, UK: Research Stud-
ies Press.
Index