

Pattern Recognition 48 (2015) 3941–3958

Contents lists available at ScienceDirect

Pattern Recognition
journal homepage: www.elsevier.com/locate/pr

A structured view on pattern mining-based biclustering

Rui Henriques a,b,*, Cláudia Antunes a, Sara C. Madeira a,b
a CSE Department, Instituto Superior Técnico, Universidade de Lisboa, Portugal
b INESC-ID, Lisbon, Portugal

Article info

Article history:
Received 1 May 2014
Received in revised form 12 May 2015
Accepted 26 June 2015
Available online 8 July 2015

Keywords:
Biclustering
Pattern mining

Abstract

Mining matrices to find relevant biclusters, subsets of rows exhibiting a coherent pattern over a subset of columns, is a critical task for a wide set of biomedical and social applications. Since biclustering is a challenging combinatorial optimization task, existing approaches place restrictions on the allowed structure, coherence and quality of biclusters. Biclustering approaches relying on pattern mining (PM) allow an exhaustive yet efficient space exploration together with the possibility to discover flexible structures of biclusters with parameterizable coherency and noise-tolerance. Still, state-of-the-art contributions are dispersed and the potential of their integration remains unclear.

This work proposes a structured and integrated view of the contributions of state-of-the-art PM-based biclustering approaches, makes available a set of principles for a guided definition of new PM-based biclustering approaches, and discusses their relevance for applications in pattern recognition. Empirical evidence shows that these principles guarantee the robustness, efficiency and flexibility of PM-based biclustering.

© 2015 Elsevier Ltd. All rights reserved.

* Correspondence to: DEI, IST, Avenida Rovisco Pais, 1, 1049-001 Lisboa, Portugal. Tel.: +351 21 310 0 300; fax: +351 21 841 7 789.
E-mail addresses: [email protected] (R. Henriques), [email protected] (C. Antunes), [email protected] (S.C. Madeira).
http://dx.doi.org/10.1016/j.patcog.2015.06.018
0031-3203/© 2015 Elsevier Ltd. All rights reserved.

1. Introduction

The clustering of data matrices groups rows according to their overall values across columns. However, in real-world contexts, the correlation of a subset of rows is typically only significant and meaningful for a subset of the overall columns [114]. Biclustering seeks to find sub-matrices (biclusters), subsets of rows with a coherent pattern across subsets of columns. To illustrate, given a matrix that captures the expression of a set of genes (rows) across a set of conditions (columns), a bicluster defines a group of genes with coherent expression for a subset of conditions. The biclustering task in this domain is critical for the discovery of putative transcriptional modules of genes that participate in a cellular process that is only active in specific conditions [46,40]. Table 1 provides additional applications in biomedical and social domains, synthesizing the meaning and relevance of discovering biclusters for pattern recognition.

Recent findings from biomedical domains show that exhaustive and flexible approaches to biclustering provide an unprecedented opportunity for an unbiased assessment of the native structure and modular organization of biological networks [14], new insights on the molecular units involved in cellular functions [60,111], and discriminative high-order combinations of single-nucleotide polymorphisms (SNPs) [42]. However, due to the complexity of the biclustering task¹, most of the existing algorithms are based on either greedy or stochastic approaches, potentially producing sub-optimal and constrained biclustering solutions [82,67]. Illustrative constraints that prevent the flexibility of the biclustering task include the search for a fixed number of biclusters, non-overlapping structures and biclusters with differential values only (binary settings) or sequential constraints [81,82,119]. In this context, the survey of efficient optimal searches for flexible biclustering scenarios is the target task in this work.

¹ Biclustering involves combinatorial optimization to select and group rows and columns and it is known to be an NP-hard problem (by mapping the task over binary matrices into the problem of finding maximal cliques in weighted bipartite graphs [104]). The problem complexity increases for non-binary settings and when elements are allowed to participate in more than one bicluster (non-exclusive structure) and in no bicluster at all (non-exhaustive structure).

The attempts to perform biclustering based on pattern mining (PM) techniques [86,111,97], referred to in this work as PM-based biclustering, show solid results for efficient and flexible exhaustive searches. In fact, since pattern mining research is driven by scalability requirements [54], its integration with biclustering defines a new promising direction. Contributions of PM-based approaches for biclustering include:

• efficient exhaustive searches: PM algorithms as-is allow for the efficient analysis of large matrices (over 10,000 × 400
elements). Additional PM principles can be used to foster scalability, including searches in distributed/partitioned data settings or targeting approximate patterns [54,52].
• dealing with missing and noisy values [62,63]: PM methods can mine transactions with varying length, and therefore a specific element from the input matrix can be associated with zero or multiple values, allowing the removal or bounded estimations of a missing or noisy value.
• inherent orientation to learn constant models, yet recently extended to also learn additive, multiplicative, symmetric, order-preserving and plaid models [62,60,63];
• capturing biclusters from patterns with multiple levels of expression [96,101]. This contrasts with the majority of existing approaches that rely on differential values or fixed coherency strength [119];
• flexible structures of biclusters (arbitrary positioning of biclusters) and searches (no need to fix the number of biclusters a priori) [96,111];
• annotating the significance of biclusters with PM principles to assess the relevance of patterns [72];
• easy extension for multi-class settings using discriminative PM or classification rules [43,95];
• easy incorporation of PM-based constraints that can be effectively used to guide the search, promoting both efficiency, by pruning the search space, and a focus on non-trivial biclusters [116].

Table 1
Relevance of the biclustering task for pattern recognition applications.

Data: Biclustering solutions

Biomedical
  Physiological [23,39,44]: Modules of sliding features and partitions of the signal across a subset of case or stimuli-elicited responses; groups of patients with shared local patterns; markers for phenotype characterization.
  Clinical [59,27]: Groups of patients with correlated clinical features or health records (shared treatments, diagnoses, prescriptions and clinical tests); class-conditional profiles for computer-aided diagnosis.
  Genomic structural variations [42,67]: Correlated groups of mutations and copy number variations, such as genetic similarities and dissimilarities of different populations.
  Biological networks [12]: Modules of genes, proteins or metabolites with cohesive local interaction using matrices that capture the pairwise connections between all molecular units.
  Gene expression [62,82]: Groups of genes involved in functional processes and pathways (cellular responses to growth, development, drugs and disease progression) only active under certain conditions.
  Genome-wide [127,124]: Conserved functional subsequences (alignments), factor binding sites and insertion mutagenesis.
  Other [37,78,73]: Local regularities in translational, chemical or nutritional data.

Social
  Social networks [50]: Groups of individuals with shared interests, correlated activity and/or coherent intercommunication; aggregation of contents based on correlated accessors' profile, comments and tags.
  Text [82]: Groups of content-related documents to support searches, suggestions and tagging (rows in the input matrix denote documents and columns denote the words), among others.
  (e-)commerce [9]: Hidden browsing patterns containing relationships between sets of (web) users and (web) pages and acquisitions which are useful for (web) advertising and marketing.
  Financial trading [68]: Subsets of indicators producing similar profitability for subsets of trading points (buy and sell signals) in the stock market in order to support buy-and-hold decisions.
  Collaborative filtering [33]: Groups of users who share the same rating patterns and behavioral patterns for a subset of all available actions for recommendation and quality studies.

These properties of PM-based biclustering approaches are critical to tackle the problems highlighted in Table 1. Although the latest biclustering advances for pattern recognition are increasingly deterministic [89,110,128,137,47,35,131], they fail to meet several of the enumerated properties of PM-based biclustering. Table 2 pinpoints the benefits of using PM-based biclustering for pattern recognition.

Despite these listed potentialities, recent surveys on biclustering [46,40,28,114] fail to explore the opportunities associated with PM-based biclustering. Additionally, the existing efforts towards PM-based biclustering provide critical principles that are not yet integrated [14,86,111]. As such, there is still space for new approaches that benefit from the integration of principles provided by these existing contributions as well as from other fields of research. In this context, this work provides three major contributions:

• motivates, formalizes and provides a qualitative and quantitative assessment of the state-of-the-art algorithms for PM-based biclustering;
• offers a structured view on how to define, parameterize and extend PM-based biclustering by coherently integrating the available yet dispersed contributions;
• further surveys PM principles as well as adequate preprocessing and postprocessing criteria to guarantee the robustness, flexibility and scalability of PM-based biclustering across domains.

The paper is organized as follows. The remainder of this section provides background on pattern mining and biclustering, and surveys the contributions from existing PM-based biclustering approaches. Section 2 introduces a consistent set of principles to guide the definition of PM-based biclustering approaches. In particular, Sections 2.1–2.3 cover principles according to three major decision dimensions (mining, mapping and closing), and Section 2.4 compares the behavior of state-of-the-art PM-based biclustering approaches and proposes a set of principles to address their current challenges. Section 3 provides initial empirical evidence of the relevance of the proposed principles. Finally, the implications of this work are synthesized.

1.1. Background on PM-based biclustering

Pattern mining: Frequent patterns are itemsets, rules, subsequences, or substructures that appear in a dataset with frequency no less than a user-specified threshold. Let $\mathcal{L}$ be a finite set of items, and P be an itemset $P \subseteq \mathcal{L}$. A transaction t is a pair $(t_{id}, P)$ with $id \in \mathbb{N}$. An itemset database D over $\mathcal{L}$ is a finite set of transactions $\{t_1, \dots, t_n\}$. A transaction $(t_{id}, P)$ contains $P'$, denoted $P' \subseteq (t_{id}, P)$, if $P' \subseteq P$. The coverage $\Phi_P$ of an itemset P is the set of all transactions in D in which the itemset P occurs: $\Phi_P = \{t \in D \mid P \subseteq t\}$. The support of an itemset P in D, denoted $sup_P$, can either be absolute, being its coverage size $|\Phi_P|$, or a relative threshold given by $|\Phi_P| / |D|$.

An association rule is defined as an implication of the form $P \Rightarrow P'$, where $P, P' \subseteq \mathcal{L}$ and $P \cap P' = \emptyset$. The left-hand side of the rule
is named antecedent and the right-hand side consequent. Given an itemset database D, the support of a rule, $sup_{P \Rightarrow P'}$, is given by $sup_{P \cup P'}$, and the confidence of a rule, $conf_{P \Rightarrow P'}$, is given by $sup_{P \cup P'} / sup_P$. Confidence reveals the strength of the rule (the conditional probability that a transaction that contains the items in the antecedent also contains the items in the consequent).

Table 2
Benefits of PM-based biclustering for pattern recognition.

Property: Benefit

Exhaustive scalable searches: Delivery of optimality guarantees for large data such as data from clinical, molecular and social web domains.
Noise robustness: Handling of uncertainty relations observed in social networks [50] and stock markets [68]; artefacts in multivariate physiological data (such as electroencephalograms [41]); experimental errors in molecular arrays [56].
Handling of missing values: Adequate mining of incomplete and/or sparse matrices derived from biological networks, web social contexts, and healthcare data.
Flexible coherency: Constant models for non-differential (yet coherent) functional associations; additive and multiplicative factors to model the distinct responsiveness and experimental bias of biological molecules and physiological signals; symmetries to simultaneously capture activation and repression mechanisms and opposed (yet correlated) regularities associated with trading, tweeting, browsing and (e-)commerce activity; plaid models for overlapping regulatory influence in biological contexts and cumulative effects in social/biological networks [60,62,61].
Parameterizable level of coherency: Dynamic definition of the desirable coherency strength for an adequate multi-level analysis of matrices derived from expression data (optimum number of expression levels [86]), scored networks, collaborative filtering data (grading scale), and physiological signals (adequate resolution [39]).
Flexible structures: Overlapping groups of molecular units, physiological features, patients, web users and transactions with varying size and configurations.
Annotated significance: Testing the statistical significance of biclustering solutions (guaranteeing that their coherence does not occur by chance) to further validate their use to support critical decisions, such as medical and financial decisions.
Constraint-driven searches: Discovery of non-trivial biclusters and ability to focus the search on specific biclusters of interest (e.g. specific regulatory behavior, high-order SNPs from genome-wide data, web users with a specific behavior, health records related with particular medical conditions, domain-guidance from background knowledge [42,66]).
Biclustering-based classification: Support for classification tasks from matrices with a large number of uninformative elements (benefiting from local views), including computer-aided diagnosis, phenotype discrimination and user recommendations [27,43].

Definition 1.1. Given an itemset database D and minimum support and confidence thresholds, $\theta$ and $\delta$:

• the frequent itemset mining (FIM) problem consists of computing the set $\{P \mid P \subseteq \mathcal{L}, sup_P \geq \theta\}$;
• association rule mining aims to compute $\{(P, P') \mid P \subseteq \mathcal{L}, P' \subseteq \mathcal{L}, sup_{P \Rightarrow P'} \geq \theta, conf_{P \Rightarrow P'} \geq \delta\}$.

A frequent itemset or a pattern is an itemset with $sup_P \geq \theta$. To illustrate these concepts, consider the following itemset database, $D_{ex} = \{(t_1, \{B,E,G\}), (t_2, \{A,B,C,E,H,J\}), (t_3, \{A,B,D,H,J\}), (t_4, \{D,H,J\}), (t_5, \{A,H,J\}), (t_6, \{A,G\})\}$, with $|\mathcal{L}| = 12$. We have $\Phi_{\{B,J\}} = \{t_2, t_3\}$ and $sup_{\{B,J\}} = |\{t_2, t_3\}| / 6 \approx 0.33$. An illustrative rule in $D_{ex}$ is $R_1: \{H,J\} \Rightarrow \{A\}$ with $sup_{R_1} = 0.5$ and $conf_{R_1} = 0.75$. For $\theta = 4$, the FIM task returns $\{\{A\}, \{H\}, \{J\}, \{H,J\}\}$.

Consider two itemsets P and $P'$, where $P' \subseteq P$, and a predicate M. M is monotonic when $M(P) \Rightarrow M(P')$ and anti-monotonic when $\neg M(P') \Rightarrow \neg M(P)$. FIM approaches rely on these properties: the support of P is bounded by the support of $P'$ and, if $P'$ is not frequent, then P is also not frequent. Table 3 shows three major search variants that rely on these properties.

Since FIM was first proposed [2], multiple extensions have followed, including principles to enhance the scalability of pattern miners, and condensed and approximate pattern representations [24,54]. Pattern mining has been additionally applied over structured datasets, leading to contributions in different fields, including sequential pattern mining [79], graph mining [129] and cube computation [55].

Biclustering: Biclustering allows the discovery of subspaces, each defining a subset of rows that show a coherent pattern that is observed for a subset of the overall columns.

Definition 1.2. Given a matrix, $A = (X, Y)$, with a set of rows $X = \{x_1, \dots, x_n\}$, a set of columns $Y = \{y_1, \dots, y_m\}$, and elements $a_{ij} \in \mathbb{R}$ relating row i and column j:

• a bicluster $B = (I, J)$ is an $r \times s$ submatrix of A, where $I = (i_1, \dots, i_r) \subseteq X$ is a subset of rows and $J = (j_1, \dots, j_s) \subseteq Y$ is a subset of columns;
• the biclustering task is to identify a structure of biclusters $\mathcal{B} = \{B_1, \dots, B_p\}$ such that each bicluster $B_k = (I_k, J_k)$ satisfies specific criteria of homogeneity and significance.

The homogeneity criteria are commonly guaranteed through the use of a merit function to guide the search [98]. An illustrative merit function is the variance of values in the rows or columns in the bicluster. Merit functions can either define the homogeneity of each bicluster (intra-bicluster homogeneity) or the homogeneity of a set of biclusters (inter-bicluster homogeneity), allowing some biclusters to deviate from the expected homogeneity as long as the overall criterion is preserved. The merit function is the simplest way to affect the coherency, quality and structure. The coherency of a bicluster is defined by the observed correlation of values (Definition 1.3). Biclusters can follow dense, constant, additive, multiplicative, plaid or order-preserving coherencies, either across rows or columns [82]. The quality of a bicluster is defined by the type and amount of accommodated noise. The structure is defined by the number,² size and positioning of biclusters. Flexible structures are characterized by an arbitrarily high set of (possibly overlapping) biclusters. The statistical significance of a bicluster determines how its probability of occurrence deviates from expectations. Following the taxonomy proposed by Madeira and Oliveira [82], Table 4 synthesizes the main biclustering approaches according to their search paradigm.

² The number of outputted biclusters can either be fixed (restrictive setting), parameterized by the user [29,67], dynamically parameterized based on the size and stochastic properties of the input matrix [63], or variable [119].

Definition 1.3. Let the elements in a bicluster $a_{ij} \in (I, J)$ have coherency across rows given by $a_{ij} = k_j + \gamma_i + \eta_{ij}$, where $k_j$ is the expected value for column j, $\gamma_i$ is the adjustment for row i, and $\eta_{ij}$ is the noise factor. Given a dataset A and a specific coherency strength $\delta \in [0, \max A - \min A]$, $a_{ij} = k_j + \gamma_i + \eta_{ij}$ where $\eta_{ij} \in [-\delta/2, \delta/2]$. The $\gamma$ factors define the coherency assumption: constant when $\gamma = 0$, multiplicative if $a_{ij}$ is better described by $k_j \gamma_i + \eta_{ij}$, and additive otherwise. A plaid assumption considers the cumulative contributions from multiple biclusters on areas where their rows and columns overlap.
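As a minimal sketch of the coherency test in Definition 1.3, the Python snippet below estimates the factors of an additive model for a candidate bicluster and checks whether every noise term falls within $[-\delta/2, \delta/2]$. The estimators used here (row adjustments taken relative to the first row, column values taken as means of the adjusted entries) and the names `additive_factors` and `is_coherent` are illustrative assumptions, not part of any surveyed method.

```python
def additive_factors(sub):
    """Estimate k_j, gamma_i and eta_ij for a_ij = k_j + gamma_i + eta_ij.

    Assumption (not from the paper): gamma_i is the offset of row i's mean
    from the first row's mean, and k_j is the mean of column j after the
    row offsets have been removed.
    """
    n, m = len(sub), len(sub[0])
    row_means = [sum(row) / m for row in sub]
    gamma = [mean - row_means[0] for mean in row_means]
    k = [sum(sub[i][j] - gamma[i] for i in range(n)) / n for j in range(m)]
    eta = [[sub[i][j] - k[j] - gamma[i] for j in range(m)] for i in range(n)]
    return k, gamma, eta


def is_coherent(sub, delta):
    """True when every noise term eta_ij lies in [-delta/2, delta/2]."""
    _, _, eta = additive_factors(sub)
    return all(abs(e) <= delta / 2 for row in eta for e in row)


# Rows differ only by a constant shift, so the additive model fits exactly.
perfect = [[1, 3, 2],
           [3, 5, 4],
           [0, 2, 1]]

# Same candidate bicluster with one perturbed element.
noisy = [[1, 3, 2],
         [3, 5.4, 4],
         [0, 2, 1]]

print(is_coherent(perfect, 0.0))  # exact additive fit
print(is_coherent(noisy, 1.0))    # accepted under a loose coherency strength
print(is_coherent(noisy, 0.2))    # rejected under a strict coherency strength
```

Setting every row adjustment to zero would turn the same check into a test for the constant model, which is one reason the coherency strength is kept as an explicit parameter in this sketch.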

Table 3
Three major search strategies to perform frequent itemset mining.

Columns: Strategy; Principles; Optimizations [54,22,76]; Criticism

Apriori-based [2]
  Principles: Monotonicity principle (an itemset is a candidate if all its subsets are frequent): (k−1)-itemsets are combined to create new candidate k-itemsets in k scans until no new candidate group can be generated.
  Optimizations: Incremental mining; hashing; use of bit-sets; reduced scans; partitioning and sampling; dynamic itemset counting.
  Criticism: Inefficient for dense data (density above 20%).

Pattern growth [1]
  Principles: Divide-and-conquer without candidate generation and multiple scans. A frequent-pattern tree is built (from an ordered list of frequent items) and mined (based on prefix paths co-occurring with growing suffix patterns). By using the least frequent items as a suffix, a good selectivity is achieved.
  Optimizations: Depth-first tree generation; alternative trees; combined bottom-up and top-down traversals; array-based structures.
  Criticism: Not able to deliver the supporting transactions of a pattern (required for biclustering). Adequate for dense matrices and low supports.

Vertical projection [135]
  Principles: Eclat, a representative vertical method, builds the transaction-set for each item and grows the itemsets under a depth-first strategy (similar to FP-growth) by intersecting transaction-sets to avoid multiple scans.
  Optimizations: Specialized structures; bit-set operations.
  Criticism: Optimized for flattened matrices (n > m).
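The vertical strategy in Table 3 can be sketched in a few lines of Python. This is a simplified Eclat-style search, assuming an in-memory dictionary of transactions; `eclat`, `coverage` and `confidence` are illustrative names, and none of the optimizations listed in the table are implemented. It is run on the example database $D_{ex}$ from Section 1.1 and keeps the supporting transactions of every pattern, which a biclustering method needs in order to recover the rows of each bicluster.

```python
def eclat(db, min_support):
    """Depth-first frequent itemset mining by intersecting transaction-id sets.

    db maps a transaction id to its set of items; min_support is absolute.
    Returns a dict: frozenset(itemset) -> frozenset(supporting transaction ids).
    """
    # Vertical layout: one transaction-id set per item.
    tidsets = {}
    for tid, items in db.items():
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    candidates = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    frequent = {}

    def grow(prefix, prefix_tids, tail):
        for idx, (item, tids) in enumerate(tail):
            new_tids = tids if prefix_tids is None else prefix_tids & tids
            # Support can only shrink as items are added, so infrequent
            # extensions are pruned together with all their supersets.
            if len(new_tids) >= min_support:
                itemset = prefix | {item}
                frequent[itemset] = frozenset(new_tids)
                grow(itemset, new_tids, tail[idx + 1:])

    grow(frozenset(), None, candidates)
    return frequent


def coverage(db, itemset):
    """Transaction ids whose items contain the whole itemset."""
    return {tid for tid, items in db.items() if set(itemset) <= items}


def confidence(db, antecedent, consequent):
    """Support of the whole rule divided by the support of its antecedent."""
    both = coverage(db, set(antecedent) | set(consequent))
    return len(both) / len(coverage(db, antecedent))


# Example database D_ex from the background section (ids 1..6 stand for t1..t6).
D_EX = {
    1: {"B", "E", "G"},
    2: {"A", "B", "C", "E", "H", "J"},
    3: {"A", "B", "D", "H", "J"},
    4: {"D", "H", "J"},
    5: {"A", "H", "J"},
    6: {"A", "G"},
}

for itemset, tids in eclat(D_EX, min_support=4).items():
    print(sorted(itemset), sorted(tids))
print(confidence(D_EX, {"H", "J"}, {"A"}))
```

With an absolute support threshold of 4, the search returns the four patterns listed in the worked example, and the confidence of the rule relating {H, J} to {A} matches the 0.75 reported there.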

Table 4
Classes of biclustering approaches according to merit-guided searches and optima guarantees.

Columns: Paradigm; Optimality guarantees

Paradigm: Divide-and-conquer approaches to exploit the matrix recursively with the branching following a global merit function [57,128,137]. Although efficient, the structure of biclusters is restrictive and the initial assumptions can easily lead to the missing of relevant biclusters. Greedy iterative approaches with the selection, addition and removal of rows and columns being performed until a local merit function is maximized [35,89,131,94,15].
  Optimality guarantees: Local optima (local searches dependent on initial assumptions and convergence behavior).

Paradigm: Two-way clustering approaches under merit functions to produce the clusters on both dimensions of the data matrix and to derive biclusters from their combinations [49,120,47]. Stochastic approaches that model data with a multivariate distribution [105,112,17,113] and learn a parametric model that maximizes a merit function. This model is used to derive biclusters.
  Optimality guarantees: Distance-based guarantees as learners rely on approximative views (clustering abstractions or generative models).

Paradigm: Ensemble methods [56] that use a merit function to aggregate a large set of biclustering solutions from the iterative application of multiple biclustering approaches.
  Optimality guarantees: Dependent on selected approaches.

Paradigm: Exhaustive approaches under constraints (e.g. fixed number of biclusters, differential expression) [119,126,110], which rely on heuristics based on merit functions to guide the space exploration.
  Optimality guarantees: Global optima.
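The merit functions that guide the searches in Table 4 can be as simple as the variance example mentioned in Section 1.1. The sketch below scores a candidate bicluster by its mean per-column variance, so a score of zero corresponds to constant values on every selected column and lower scores mean higher homogeneity. The function name and the toy matrix are assumptions made for illustration only.

```python
def column_variance_merit(matrix, rows, cols):
    """Mean population variance of each selected column over the selected rows."""
    total = 0.0
    for j in cols:
        values = [matrix[i][j] for i in rows]
        mean = sum(values) / len(values)
        total += sum((v - mean) ** 2 for v in values) / len(values)
    return total / len(cols)


# Toy matrix (hypothetical data): rows 0, 2 and 3 agree on columns 1 and 2.
A = [[1, 5, 2, 9],
     [4, 0, 7, 3],
     [8, 5, 2, 6],
     [2, 5, 2, 1]]

coherent = column_variance_merit(A, rows=[0, 2, 3], cols=[1, 2])
arbitrary = column_variance_merit(A, rows=[0, 1], cols=[0, 1])
print(coherent, arbitrary)
```

A greedy approach would repeatedly add or remove rows and columns while this score improves, which is why the guarantees of such searches depend on the starting point.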

PM-based biclustering: While traditional biclustering approaches rely on flexible merit functions to guide the space exploration, PM-based approaches require these functions to be defined in terms of support and, eventually, confidence or other interestingness metrics. This restriction enables a scalable exhaustive space search that produces an arbitrarily high number of biclusters within a flexible structure.

Definition 1.4. Let A be a matrix whose values in $\mathbb{R}$ are assigned to a set of items $\mathcal{L}$. A bicluster under a constant model can either follow: an overall orientation where $a_{ij} \in \mathcal{L}$; a column-based orientation where $a_{ij} = k_j$ and $k_j \in \mathcal{L}$; or a row-based orientation where $a_{ij} = k_i$ and $k_i \in \mathcal{L}$. A bicluster following an additive (or multiplicative) model has $a_{ij} = k_j + \gamma_i$ (or $a_{ij} = k_j \times \gamma_i$), where $k_j \in \mathbb{R}$ and $\gamma_i \in \mathbb{R}$ define the column and row contributions. A bicluster under a symmetric model either considers symmetries on rows, $c_i \times a_{ij}$, or columns, $c_j \times a_{ij}$, where $c_i \in \{-1, 1\}$.

Definition 1.5. Given a matrix A whose elements are the concatenation of the observed values $a_{ij} \in \mathcal{L}$ with their column (or row) indexes, let $\Psi_P$ of an itemset P in A be its set of indexes. A set of biclusters $\cup_k (I_k, J_k)$ can be derived from a set of frequent itemsets $\cup_k P_k$ by mapping $(I_k, J_k) = B_k$, where $B_k = (\Phi_{P_k}, \Psi_{P_k})$, to compose biclusters with coherency across rows, or $(I_k, J_k) = (\Psi_{P_k}, \Phi_{P_k})$ for column-coherency.

Two classes of PM-based biclustering approaches can be considered: (1) a first class targeting discrete matrices by using as-is pattern miners, and (2) a second class targeting numeric matrices by extending methods based on the introduced monotonic (or Apriori) property [2]. The first class of methods relies on an itemization step followed by the application of FIM under a low support threshold. The itemization step maps a real-value or discrete matrix into an itemset database. For real-value matrices, normalization and discretization procedures are applied. Then, the discrete value of each element is concatenated with its column index. Each transaction of the target itemset database corresponds to a row with these new values. FIM is then applied over this database to mine frequent patterns for composing biclusters with coherency across rows. The second class of methods relies on variants of the FIM task to learn frequent patterns directly from the real-valued matrix. In both classes, the coherency strength is implicitly defined by the number of items or the maximum allowed distance. Biclusters with coherency across columns can be mined using the transpose matrix. Finally, biclusters with coherent values overall can be discovered by mining one item (or range of values) at a time. Fig. 1 illustrates how to deliver these different types of biclusters using frequent patterns when considering the constant model.

1.2. Related work

To our knowledge, BicPAM [62], BiModule [96], DeBi [111], the method of Bellay et al. [14], GenMiner [86] and BiP [60] are the state-of-the-art methods for the first class of PM-based biclustering. BiModule [96,97] allows a parameterized multi-value itemization of the input matrix to discover constant biclusters derived from (closed) frequent
patterns using the LCM miner [125]. DeBi [111] derives biclusters from (maximal) frequent patterns mined over binarized matrices using the MAFIA miner [22], and places key post-processing principles to adjust them in order to guarantee their statistical significance. The recently proposed BicPAM [62], parameterized with the F2G miner [65] by default, extends the constant assumption of previous approaches to find biclusters with symmetric, additive and multiplicative factors by performing iterative corrections on the input matrix. BicPAM also surpasses discretization problems by introducing the possibility to assign multiple discrete values to a single element, and offers new strategies to robustly handle noise and missing values. The method of Bellay et al. [14] uses the Apriori miner [2] with additional principles to evaluate the functional coherency of the discovered biclusters against the background noise. This is one of diverse PM-based attempts to exhaustively discover dense biclusters in either unweighted networks [13,90,133,80] or, more interestingly, in scored networks [32,30]. GenMiner [86] includes external knowledge within the input matrix to derive biclusters from association rules that relate annotations (external grouping of rows or columns) with clusters derived from (closed) frequent patterns using CLOSE [102]. BiP [60] is prepared to discover plaid models by relying on noise-tolerant association rules for the recovery of apparent noisy areas due to the presence of cumulative effects on the overlapping areas between biclusters.

The itemization step is optional for the second class of methods [8]. To our knowledge, RAP [101], RCB discovery [8] and ET-bicluster [52] are state-of-the-art methods here. RAP [101] plugs an adapted range-based metric to mine constant biclusters on rows (or columns), while RCB discovery targets biclusters with constant values overall [8]. ET-bicluster extends the previous approaches to discover noisy biclusters, although an exhaustive enumeration of biclusters is not guaranteed [52]. Alternative support metrics with dedicated Apriori-based searches have been additionally proposed [69,115,53].

Fig. 1. Mining biclusters with constant assumptions over itemset matrices. To discover biclusters with constant values on the rows, the input matrix needs to be itemized. Column identifiers are combined with the observed values, and FIM is applied under a parameterizable support threshold $\theta = 2$ ($|\Phi_P| \geq 2$). Constant values on columns can be mined using the transpose matrix. To find biclusters with constant values overall, each item needs to be separately mined. In each iteration, only the elements containing the selected item are included in the transactions.

2. PM-based biclustering

We propose a structured view of PM-based biclustering according to a set of dimensions of decision. We rely on state-of-the-art literature to characterize each dimension. These dimensions gather principles on different steps with impact on the biclusters' type, structure and quality, as illustrated in Figs. 2 and 3. Throughout this paper we define a set of principles for each step.

Different options for PM-based biclustering can be grouped according to their three major steps: mapping (preprocessing), mining (pattern discovery), and closing (postprocessing). The core step is the mining step, corresponding to the application of the target pattern miners. This step is driven by the chosen paradigm, target patterns and search properties. The mapping step (optional for methods able to deal with non-discrete data) is responsible for the itemization of a (real-value) matrix and for other preprocessing options to handle outlier, noisy and missing elements. Finally, the closing step includes the postprocessing of the mined patterns to affect the structure and quality of the target biclustering solutions.

These options impact the homogeneity of the biclustering solutions. The homogeneity criteria can be intentionally controlled to search for biclusters with a specific coherency (underlying pattern correlation), structure (number, size and positioning of biclusters) and quality (amount and type of noise within a particular bicluster or set of biclusters).

Section 2.1 covers the core PM-based biclustering paradigms. Sections 2.2–2.3 detail the remaining mapping and closing dimensions and discuss their implications in the behavior of PM-based approaches.

2.1. Mining options: discovery of biclusters using pattern mining

Flexible scenarios where the number and position of biclusters are not constrained require efficient algorithms [111,81]. The adequate use of PM approaches is critical to guarantee the flexibility and scalability of the biclustering algorithm, and depends essentially on four variables discussed below: (1) the chosen PM-based approach to biclustering, (2) the application schema, (3) the target pattern representations, and (4) the search strategies.

2.1.1. Mining approaches to compose biclusters

In what follows, we overview the state-of-the-art options using: (1) frequent pattern mining, (2) association rule mining, (3) structured pattern mining, and (4) hybrid approaches to compose biclusters.

2.1.1.1. Frequent pattern mining. Two main strategies can be considered: (1) relying on the frequent itemset mining (FIM) support metric as-is; and (2) defining new (anti-)monotonic support metrics for a dedicated yet efficient search.

Fig. 1 illustrates how PM can be applied to find biclusters with constant items overall, on rows and on columns. When ignoring the closing step, the discovered biclusters are the frequent itemsets. The support threshold defines the minimum number of rows in a
3946 R. Henriques et al. / Pattern Recognition 48 (2015) 39413958

Fig. 2. Process-view dimensions and their impact on biclustering solutions.

Fig. 3. Structured view of PM-based biclustering: illustrative options across the major dimensions. It groups critical decision dimensions (corresponding to either a row, a
column or a cell of the framework) to support the design of PM-based biclustering approaches. A set of principles for each dimension is illustrated and detailed throughout
this work for each biclustering step (mining, mapping and closing) and biclustering goal (defined according to a specific type, structure and quality of biclustering solutions).
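The FIM-based strategy of Section 2.1.1.1 can be illustrated with a minimal sketch. It assumes an already discretized matrix, itemizes each cell as a (column index, symbol) pair to target constant values on columns, and enumerates itemsets by brute force rather than with a real frequent itemset miner. The function name `fim_biclusters` and its parameters are hypothetical and chosen for illustration only.

```python
from itertools import combinations

def fim_biclusters(matrix, min_rows, min_cols=1):
    """Brute-force sketch: frequent itemsets of an itemized matrix read as biclusters."""
    # Itemization: each cell becomes the item (column index, symbol).
    transactions = [frozenset(enumerate(row)) for row in matrix]
    items = sorted(set().union(*transactions))
    biclusters = []
    for size in range(min_cols, len(matrix[0]) + 1):
        for itemset in combinations(items, size):
            cols = [c for c, _ in itemset]
            if len(set(cols)) < size:  # a pattern holds at most one symbol per column
                continue
            rows = [i for i, t in enumerate(transactions) if t.issuperset(itemset)]
            if len(rows) >= min_rows:  # support threshold = minimum number of rows
                biclusters.append((rows, cols))
    return biclusters

# Hypothetical 3x3 discretized matrix.
M = [[1, 2, 0],
     [1, 2, 3],
     [0, 2, 3]]
print(fim_biclusters(M, min_rows=2, min_cols=2))
```

Lowering `min_rows` returns more (and smaller) biclusters at a higher enumeration cost, which mirrors the trade-off attached to the support threshold in the text.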

bicluster. By decreasing this threshold we are degrading the efficiency of the task, but searching for a broader set of biclusters with smaller sizes. In the context of gene expression, this is critical since small groups of genes can be functionally related. Additionally, the search can allow the pruning of itemsets below a minimum number of columns and above a maximum number of rows and columns.

From the point of view of an itemized database, the FIM-based biclusters are perfect biclusters, that is, they do not allow value-variations in any of their elements. In contrast, from the point of view of the input real-value matrix, these biclusters can handle noise, as different values may be assigned to the same item. The number of items can be flexibly parameterized to control the level of noise-tolerance, which contrasts with traditional biclustering approaches over discrete matrices³ [94,119]. Although BiModule [97,96] allows a parameterizable number of items and support threshold, the structural data noise and the applied itemization procedure often lead to the partitioning of large biclusters into smaller ones (with many of them filtered out as they no longer satisfy the support criterion). In contrast, although DeBi [111] and the method of Bellay et al. [14] alleviate this problem by providing postprocessing strategies to improve the functional coherence of the discovered biclusters, they require the input data to be binarized.

FIM-based approaches suffer from the risk of assigning elements with similar real-values to different items. We refer to this drawback as the items-boundary problem. In order to address this problem, the notion of support of an itemset can be redefined. As long as the new support metric is (anti-)monotonic, its inclusion within Apriori-based frameworks [101] can be easily handled with efficiency. Patterns are thus generated using a breadth-first level-wise pattern tree.

Han et al. proposed Min-Apriori [53], an algorithm to deal with ordinal items. Steinbach et al. [115] introduced a framework to generalize the notion of support to extend association analysis to continuous-based patterns. An alternative support function [69] has been proposed to mine hyperclique patterns (groups of columns or rows strongly related) over numeric matrices. Calders et al. [25] proposed the use of rank-based measures to score the similarity of sets of numeric attributes within new support metrics by extending Kendall's τ, Spearman's ρ, and Spearman's Footrule F correlation metrics [71]. Here, efficient algorithms are designed to deal with the ranks of attribute values, but not with the original numeric values. However, these approaches do not capture key properties of real-valued matrices, such as the need to ensure that the values of items in a transaction are within a range to guarantee coherence and distinguish positive from negative values.

More recent approaches propose range-based support metrics to discover coherency on rows, such as RAP [101]. RAP is defined under a sign-coherence constraint, enforcing that a transaction can only contribute to the support of a pattern if the values of all the items in it have the same sign.⁴ An alternative, RCB

³ Illustrating, xMotif [94] relies on greedy search and uses a size merit function and a noise threshold to guarantee the discovery of large and interesting biclusters, and SAMBA-based approaches [119] map binarized matrices into a weighted bipartite graph to find subgraphs that maximize a weight merit function.

⁴ For a matrix $A = (X, Y)$ and $I \subseteq X$, $J \subseteq Y$, the support metric is defined as $sup_J = \sum_{i \in X} S(i, J)$, with:

$$S(i, J) = \begin{cases} \min_{j \in J} a_{ij} & \text{if } \max_j a_{ij} - \min_j a_{ij} \le \delta \, \min_j |a_{ij}| \;\wedge\; (\forall_j\, a_{ij} > 0 \,\vee\, \forall_j\, a_{ij} < 0) \\ 0 & \text{otherwise.} \end{cases}$$
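The range-based support of footnote 4 can be sketched in a few lines. This is an illustrative reading rather than the RAP implementation: the function name `range_support` and the threshold name `delta` are assumptions, and the sketch works on magnitudes so that rows with all-negative values contribute positively, which is also an assumption.

```python
def range_support(matrix, cols, delta):
    """Sketch of a sign-coherent, range-constrained support for a set of columns."""
    total = 0.0
    for row in matrix:
        vals = [row[j] for j in cols]
        # Sign-coherence: all values strictly positive or all strictly negative.
        if not (all(v > 0 for v in vals) or all(v < 0 for v in vals)):
            continue
        mags = [abs(v) for v in vals]  # assumption: compare magnitudes
        # Relative range constraint: max - min <= delta * min.
        if max(mags) - min(mags) <= delta * min(mags):
            total += min(mags)
    return total

# Hypothetical real-valued matrix (rows = transactions).
A = [[1.0, 1.1, 5.0],
     [2.0, 2.2, 0.1],
     [-1.0, -1.05, 3.0],
     [1.0, -1.0, 2.0]]
print(range_support(A, cols=[0, 1], delta=0.2))
print(range_support(A, cols=[0, 1, 2], delta=0.2))
```

Adding a column can only shrink each row's contribution (the minimum cannot grow and the range cannot narrow), which is the anti-monotonic behaviour that allows this kind of metric to be plugged into an Apriori-style level-wise search.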

discovery method [8], verifies range constraints on both dimensions (rows and columns) using a monotonic range measure. The Apriori-based method is slightly modified in order to grow homogeneous-squares that are then used to compose rectangles (biclusters). Finally, the ET-bicluster model [52] revises the previous support metrics for the discovery of noisy biclusters by guaranteeing that each supporting transaction of a pattern does not exceed a specific error-threshold. Although this support metric is not anti-monotonic and thus does not guarantee the exhaustive search of all possible patterns, optimality distances can be given.

Despite the relevance of this type of hyperclique-based approaches to avoid the items-boundary problem, they require the definition and parameterization of (anti-)monotonic metrics. Additionally, PM principles to enhance scalability and to discover condensed representations for these range-based patterns cannot be directly applied. Fig. 4 provides an illustrative application of this type of enhanced FIM-based approaches against traditional FIM-based approaches.

Fig. 4. Biclustering using FIM over discrete matrices versus range-based searches over numeric matrices. Biclusters discovered with range-based support metrics are less prone to the items-boundary problem.

In labeled datasets, FIM-based approaches have been extended for the discovery of class-discriminative biclusters (biclusters with significantly higher support for a particular class) [43,116,95].

2.1.1.2. Association rule mining. Association rule mining can alternatively be used to compose biclusters [60]. Its core task is the support-guided discovery and confidence-guided combination of frequent itemsets [54]. Given a matrix A, a simple association rule relates columns (J → J′) or, when transposed, relates rows (I → I′). An illustrative rule from a transposed gene expression matrix is {geneA} → {geneB, geneC}, meaning that when geneA is under-expressed, it is very likely that genes B and C are over-expressed. An arbitrarily high number of states/items can be considered. When using association rules to compose biclusters, the items on the antecedent and consequent of a rule, as well as the supporting transactions from both sides, are considered to derive each bicluster. Thus, association rules can be used to accommodate noise when confidence levels are below 100%, as illustrated in Fig. 5. Consider the illustrative rule R1: {g2, g3} → {g4, g5} with confidence below 90%. Instead of using the conditions that support {g2, g3, g4, g5} to build the bicluster, one can extend it by considering the conditions that support uniquely the R1 antecedent, {g2, g3}. Confidence is thus seen as a homogeneity indicator.

Fig. 5. Discovering biclusters from association rules: comparing noise-intolerant biclusters from frequent itemsets vs. noise-tolerant biclusters from association rules.

To mine specific rules of interest, other interestingness metrics have been used to augment the support-confidence framework, including lift, conviction, chi-square, cosine and all-confidence [117]. BiP explores the thresholds of these metrics to discover biclusters with varying quality [60]. In matrices with numerous correlations, the support should be set low, the confidence set high, and constraints incorporated to deal with the explosion of rules from frequent itemsets. For instance, the rule-based GenMiner approach [86] imposes rules to be non-redundant with minimal antecedent and maximal consequent (minimal non-redundant rule for short) in order to avoid the explosion of rules. Alternatively, association rules can be pruned based on their statistical/biological significance [5] according to hypotheses verified by correlation coefficients (such as Pearson's Product Moment Correlation, Spearman's Rank-order Correlation Coefficient and Kendall's Tau).

Carmona-Saez et al. [26] and GenMiner [86] extended simple rules by integrating annotations from semantic sources and (biomedical) knowledge bases. Illustrative rules include: annotation1 ⇒ {c1, c2}, meaning that a group of genes (with the same annotation) is likely to be under-expressed in condition c1 and over-expressed in condition c2, or {c1, c2} ⇒ annotation1, meaning that a group of genes with the expression profile given by c1 and c2 is likely to have specific annotations.

Finally, and similarly to support customization, confidence and other interestingness metrics can be customized and plugged within an Apriori-based framework. However, to our knowledge, there are not yet implementations of this type of rule-based approaches.

2.1.1.3. Structured pattern mining. Approaches that target different types of patterns provide alternative search paradigms for biclustering and hold the potential to discover biclusters with specific properties. This set of approaches includes:

• Constraint-based pattern mining or actionable pattern discovery approaches. Biclusters are declaratively defined through the use of flexible pattern constraints that specify the target homogeneity criteria. In this context, a bicluster is a specific formal concept called bi-set. A bi-set satisfies, at least, a local constraint: the column set (or intent) is the maximal set of columns that are true for the supporting set of rows (or extent) [19,4];
• Sequential pattern mining (SPM) approaches: SPM can be used to mine order-preserving biclusters [61,63,78]. A bicluster is order-preserving if there is a permutation of its columns under which the sequence of values in every row is (either monotonically or strictly) increasing. For this aim, the indexes of the elements in the matrix are reordered per row; the ordered sets of indexes are mapped into a sequential database; SPM is applied; and the biclusters are mapped from the frequent sequences and their supporting transactions (Fig. 6). OP-Clustering [78] was the first attempt at SPM-based biclustering. More recently, BicSPAM [63] was proposed to address the efficiency bottlenecks and noise-intolerance of previous algorithms, and to allow a parameterizable variation of the degree of co-occurrences versus precedences to affect the order-preserving coherency;
• Two-way PM-based clustering approaches: This is a promising direction since the patterns underlying the clusters on each dimension can be used to affect the structure, quality and type of biclusters [49];
• Graph mining approaches: Real-value matrices can be mapped into weighted bipartite graphs, and thus biclustering can be mapped into the task of finding maximal cliques [84] or other substructures from graphs derived from binarized matrices [119]. Despite its computational complexity, structured pattern mining over weighted bipartite graphs is a direction with growing attention [28];
• Cube computation approaches: Cube computation shares similarities with frequent pattern analysis, being well-suited to deal with matrices in ℝ^n when n > 2 [55,132]. The additional dimensions can be used to capture additional informative views (such as time points or replicates) [3], to model

contributions from overlapping areas of biclusters under a plaid model assumption [60], or to find biclusters' consensus over cubes with different pre-processing and closing criteria.

Fig. 6. Mining order-preserving biclusters in real-valued matrices with sequential pattern mining.

2.1.1.4. Hybrid approaches. Biclustering can rely on multiple types of patterns discovered by different PM approaches. Valid options include the definition of ensemble methods combining plain and structured patterns or the output of multiple PM methods (parameterized with different support-confidence thresholds). Frequent itemsets can also be used to produce an initial solution, while rules can be subsequently mined to shape the discovered biclusters by accommodating noise. An alternative ensemble model can rely on the multiple results from the iterative parameterization of a PM method with different PM-based constraints of interest. To our knowledge, these hybrid possibilities have not been systematically studied in the literature.

2.1.2. Application schema

The previous pattern mining approaches can be iteratively applied with a decreasing support threshold until a stopping criterion is met [62]. BicPAM makes available distinct stopping criteria, including a minimum coverage of the elements in the input matrix by the discovered biclusters or, alternatively, an approximate number of biclusters (after or prior to postprocessing) [62]. Such criteria can either be driven by user expectations or dynamically derived from the properties of the input matrix [63].

Furthermore, iterative corrections can be applied on the matrix to enable the discovery of more flexible coherencies. BicPAM makes use of the observed differences and of the least common divisor between the observed values for a given column (or row) in the matrix in order to perform iterative corrections across rows (or columns) and thus identify shifting and scaling factors. The removal of these factors in the matrix allows the discovery of additive models and multiplicative models [62]. Similarly, BicPAM can also rely on combinatorial sign-adjustments across rows (or columns) to model symmetries, and integrate them with shifting and scaling factors [62]. Pruning strategies are considered to avoid redundant calculus and reduce the computational complexity of these iterative corrections.

BiP relies on the converging application of PM for learning plaid models [60], based on the observation that, by incrementally removing overlapping contributions, the residual values become closer to the underlying unstructured noise. For this aim, BiP performs checks between iterative applications of PM searches in order to recover areas explained by cumulative effects (contributions on overlapping areas between biclusters) and to remove noisy areas that are not described by a plaid assumption. Without degradation of efficiency levels, it also provides relaxations to model overlapping contributions characterized by noisy and non-linear cumulative effects [60].

2.1.3. Pattern representation

Depending on the chosen PM approach, different patterns, such as frequent itemsets, association rules, sequential patterns or structured patterns, can be considered. Each of these patterns can have different representations, the most common being: simple, maximal, closed, pseudo-closed, approximated, rare, top-K, multilevel and erasable [24]. In particular, when targeting association rules, additional representations can be considered, such as indirect, minimal, non-redundant, approximative, quantitative and sporadic rules [117]. Although an analysis of the impact of using each representation on the biclustering solutions is possible, we consider simple, maximal and closed representations for the sake of simplicity.

Definition 2.1. Given an itemset matrix, a support threshold $\theta$, and the coverage function $\Phi: 2^L \to 2^D$ that maps an itemset $P$ to its set of supporting transactions. A closed frequent itemset is a frequent itemset that has no superset with the same support ($\forall_{P' \supset P}\ |\Phi(P')| < |\Phi(P)|$). A maximal frequent itemset is a frequent itemset with all supersets being infrequent, $\forall_{P' \supset P}\ |\Phi(P')| < \theta$.

Given an itemset database $D_{ex} = \{\{A, B, H, J\}, \{D, H, J\}, \{C, D, H, J\}\}$, and thresholds $\theta = 2$ ($|\Phi(P)| \ge 2$) and $|P| \ge 2$, there is one maximal frequent itemset ($\{D, H, J\}$) and there are two closed frequent itemsets ($\{D, H, J\}$ and $\{H, J\}$). The selection of the pattern representation essentially depends on the type and structure of the target biclusters, and on the post-processing needs.

Maximal itemsets for biclustering, such as those used in DeBi [111], are associated with biclusters with the columns' size maximized. Such flattened biclusters are only of interest when there is an extension step to be performed to include new rows. However, since both vertical and smaller biclusters are lost, this representation leads to incomplete solutions. The opposite alternative is the use of all frequent itemsets for biclustering. This solution leads to a high number of potentially redundant biclusters (if contained by another bicluster), which can degrade the performance of the mining and closing steps. Finally, the search for closed itemsets, such as FIM-based BiModule [96] and rule-based GenMiner [86], allows the discovery of overlapping biclusters if a reduction in the number of columns results in a higher number of rows. Closed pattern solutions thus enable the return of all maximal biclusters (set of biclusters that are not included in other biclusters). The properties of these three alternative representations are illustrated in Fig. 7.

Fig. 7. Comparison of biclustering solutions using simple, maximal and closed patterns.

2.1.4. Search strategies

The choice of the search strategy depends essentially on the target biclustering task and on the properties of the considered implementation. Generally, PM searches are centered on computing the set of frequent patterns, which is the core task of all pattern miners.

The choice of whether to use a vertical or a horizontal data format depends essentially on the type of biclusters we are targeting. To find constant items on the rows or on both dimensions, we usually benefit from using searches over horizontal data. This is particularly true for matrices where the total number of rows largely exceeds the total number of columns. To find constant items on the columns (when n > m), a vertical data format should be the choice, as the performance of searches using the horizontal format degrades exponentially with the increase in the number of items.
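The example database of Section 2.1.3 can be checked with a small sketch that also illustrates a vertical data format (each item mapped to the set of transactions that contain it). The enumeration is brute force and only meant for toy inputs; the function names are hypothetical and do not correspond to any of the cited miners.

```python
from itertools import combinations

# Example database D_ex from the text.
D_ex = [{"A", "B", "H", "J"}, {"D", "H", "J"}, {"C", "D", "H", "J"}]

def vertical(db):
    """Vertical format: item -> set of supporting transaction ids."""
    tidsets = {}
    for tid, transaction in enumerate(db):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets

def frequent_itemsets(db, theta, min_len=1):
    """Map each frequent itemset to its cover (supporting transaction ids)."""
    tid = vertical(db)
    freq = {}
    for k in range(min_len, len(tid) + 1):
        for combo in combinations(sorted(tid), k):
            cover = set.intersection(*(tid[i] for i in combo))
            if len(cover) >= theta:
                freq[frozenset(combo)] = frozenset(cover)
    return freq

def closed_and_maximal(freq):
    # Closed: no proper superset with the same support.
    closed = {p for p in freq
              if not any(p < q and len(freq[q]) == len(freq[p]) for q in freq)}
    # Maximal: no frequent proper superset.
    maximal = {p for p in freq if not any(p < q for q in freq)}
    return closed, maximal

freq = frequent_itemsets(D_ex, theta=2, min_len=2)
closed, maximal = closed_and_maximal(freq)
print(sorted(map(sorted, closed)))   # closed itemsets: {D,H,J} and {H,J}
print(sorted(map(sorted, maximal)))  # maximal itemset: {D,H,J}
```

Each itemset together with its cover reads directly as a bicluster (columns and rows), so the closed result keeps the larger-support bicluster {H, J} over three rows that the maximal representation discards.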

The choice of whether to use an Apriori-based, pattern-growth or combined approach depends on three variables: (1) the type of PM-based approaches (range-based approaches cannot rely on pattern-growth methods), (2) the density of the resulting itemset matrix, and (3) the ability to retrieve the supporting transaction set for each frequent itemset without degrading the overall efficiency. This analysis is detailed in supplementary material. When biclusters with constant values overall are targeted, the resulting matrices are sparser (Fig. 1) and, therefore, an Apriori strategy is preferred. For denser matrices, pattern-growth strategies are preferable.

In particular, the discovery of patterns together with their supporting transactions has been tackled using extensions over Apriori and vertical-based algorithms by relying on bitset vectors to capture the supporting transactions per pattern [86,111,96]. However, bitset vectors pose efficiency problems in terms of memory and time for large and dense datasets. Henriques et al. [65] study efficient alternatives and propose a pattern-growth algorithm to discover full-patterns with heightened time and memory efficiency.

An additional key aspect is the chosen implementation. The use of bit-set operations and either reduced number of scans or efficient tree-traversals are usually key for a top performance. Efficient implementations include algorithms to mine closed itemsets under an Apriori search (LCM [125], Charm [136]), vertical search (TD-Close [77]) or pattern-growth search (FPClose [51]); and to mine maximal itemsets under an Apriori search (MaxMiner [11]), vertical search (Mafia [22]) or pattern-growth search (AFOPT [76]). Similarly, multiple implementation variants can be found to compose association rules [138,87] and to mine structured patterns. For instance, sequence miners can use Apriori, pattern-growth or vertical searches, and find closed and maximal sequential patterns [79]. DeBi [111], BiModule [97] and GenMiner [86] use the Mafia [22], LCM [125] and CLOSE [102] implementations, respectively. Range-based variants use Apriori [2]. Additional principles proposed in literature [138,99,100] can be seized to guarantee the scalability of the search when mining large biclusters from dense or large data settings.

2.2. Mapping options: preprocessing input data

The previous section covered essential mining options with impact on the coherency, structure and quality of PM-based biclustering solutions. However, their optimum application requires the input matrices to be correctly normalized⁵ and (depending on the PM-based approach) discretized. The problem of defining an adequate coherency strength is identical for range-based approaches (distance thresholds as a function of data domain values) and discrete PM-based approaches (number of items). Although discretization may imply loss of information, it alleviates the noise dilemma [26,31].

Since discretization is a key step for the class of PM-based methods that relies on itemset databases, having key implications on the target solution, we study two variables: (1) the number of items (also referred to as symbols or expression levels) and (2) the method used to map the normalized real-value matrix into an itemset database. A sensitivity analysis on the impact of the number of items on the quality and size of biclusters was first performed in Bidens [83] and BiModule [96]. Fig. 8 illustrates how simple discretization options can lead to different solutions. The itemization (concatenation of the item with the column-index) implies that the resulting number of items is at most m × l, with l being the number of items specified by the user. The use of fixed ranges (potentially equal sized intervals between the observed maximum and minimum) is the simplest discretization option, but it usually leads to an accentuated weak distribution of items and it is prone to the items-boundary problem. The first problem can be corrected using a percentage-based method for the depth partitioning of items that leads to intervals containing approximately the same number of elements. Alternatively, distributions combine the properties of the previous solutions. In the example, a Gaussian distribution is able to minimize the loss of potentially relevant biclusters. By finding multiple suitable curves (for each row or column) or one suitable overall curve to approximate the matrix, one can either use threshold methods [26,31] or compute the statistical cutoff points to create equally-distributed areas. Nordi [86] is a Gaussian-based method used in GenMiner [86] that statistically detects outliers (using the Grubbs method), applies normality tests (using QQ-plot and Lilliefors) to transform the initial row distributions into a more Normal distribution, and computes cutoff thresholds using the z-score methodology. In the presence of matrices with multimodal distributions, more expedite methods based on a mixture of distributions must be considered.

A unique advantage of PM-based approaches is the fact that they can easily address the items-boundary problem of discretization procedures by assigning two or more items to an element in the original matrix with a real value that is near a discretization boundary (or cut-off point). This is possible since PM is able to learn from transactions (mapped from the rows of an itemized matrix) with an arbitrary number of items. Despite the critical relevance of this strategy, its impact has not yet been systematically assessed.

Alternative discretization options that aim to deal with this problem include: (1) adaptive discretization based on dynamic threshold selection policy [107]; (2) statistical methods to detect differential activity of elements as the basis to create partitions [31] (commonly adopted as a binarization method); (3) distance-based subspace clustering models [75] to flexibly partition the values while preserving meaningful and significant clusters; (4) fuzzification approaches where a continuous domain is partitioned into fuzzy sets, shown to be more robust to noise when compared with other simple binning techniques [47]; and (5) supervised discretization methods [45] (when descriptive labels per row or column are present or computed using clustering methods), where a row or column is partitioned into a number of disjoint intervals in such a way that the entropy of the partition is minimal.

An additional preprocessing concern appears for matrices with an arbitrarily high number of missing elements. Although multiple imputation methods have been proposed [122,38,58] to alleviate this problem, they can introduce additional noise and undesirably affect the homogeneity of the output biclusters. BicPAM [62] and BicSPAM [63] consider varying relaxations to surpass this problem, including a relaxed setting where the missing element is replaced by all the available items (leading to transactions with varying size), and a medium-constrained setting to consider a parameterizable number of items around its value-estimation.

2.3. Closing options: postprocessing biclustering solutions

PM-based biclustering approaches produce exhaustive solutions with flexible structures (arbitrary number and positioning of biclusters). These non-exhaustive, non-exclusive structures, where overlapping is allowed, are the most suitable option to tackle the applications listed in Table 1.

Two key challenges of exhaustive solutions are: handling noise and dealing with the potential explosion of valid biclusters. Part of these questions can be answered in the mapping step by selecting the number of items and discretization setting able to handle the items-boundary problem. However, postprocessing may be required to avoid the following two challenges of the noise dilemma. The first results from a too restrictive noise tolerance, commonly associated with a high number of items, which leads to many small sized biclusters. The second is related to heightened

⁵ Normalization options are often applied before biclustering to enhance differences across rows and/or columns and, consequently, to improve the ability to discover biclusters. de Souto et al. [34] compare three normalization procedures (z-score, scaling and rank-based procedures) over gene expression datasets using alternative clustering algorithms. Additional methods for preprocessing the input matrix have been reported [118,83,25].
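The mapping options of Section 2.2 can be made concrete with a short sketch contrasting fixed-range (equal-width) and percentage-based (equal-depth) cut-off points, and showing how a value near a cut-off point, or a missing value, can be mapped to more than one item. The function names and the `tolerance` parameter are hypothetical, and the sketch does not reproduce the procedures of BicPAM, BicSPAM or Nordi.

```python
def equal_width_cuts(values, n_items):
    """Cut-off points splitting [min, max] into n_items equal-sized ranges."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_items
    return [lo + step * k for k in range(1, n_items)]

def equal_depth_cuts(values, n_items):
    """Cut-off points leaving roughly the same number of elements per item."""
    s = sorted(values)
    return [s[(len(s) * k) // n_items] for k in range(1, n_items)]

def itemize(value, cuts, tolerance=0.0):
    """Return the item(s) assigned to a value.

    A value closer than `tolerance` to a cut-off point receives both adjacent
    items; a missing value (None) receives all items (relaxed setting).
    """
    n_items = len(cuts) + 1
    if value is None:
        return list(range(n_items))
    items = {sum(value >= c for c in cuts)}
    for k, c in enumerate(cuts):
        if abs(value - c) < tolerance:
            items.update((k, k + 1))
    return sorted(items)

# Hypothetical row with one extreme value.
row = [0, 1, 2, 3, 4, 5, 6, 7, 8, 100]
print(equal_width_cuts(row, 2))   # the outlier pushes the single cut far to the right
print(equal_depth_cuts(row, 2))   # the cut follows the distribution of the values
print(itemize(4.9, equal_depth_cuts(row, 2), tolerance=0.2))
```

Because transactions may hold an arbitrary number of items, the multi-item output of `itemize` can be fed to a pattern miner as is, at the cost of denser transactions.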

levels of noise allowance, commonly occurring in binarized partitions or through the use of rule-based approaches under a relaxed level of confidence. To handle these challenges we propose the use of a set of criteria structured according to three major postprocessing steps (merging, filtering and extension) described below.

Merging options: Merging biclusters may serve two goals: noise allowance (to avoid solutions composed uniquely of small biclusters) and overall biclustering structure manipulation. The first goal is driven by the observation that when two biclusters share a significant area it is probable that their merging composes a larger bicluster still respecting some homogeneity criteria. Commonly, such decomposition is related to the items-boundary problem or to a missing value. The simplest criterion to allow the merging is either to rely on the overlapping area (as a percentage of the smaller bicluster), to compute the overall noisy percentage after the merging, or both. Additional homogeneity criteria relying on the real-values provided by the input matrix can be formulated. Henriques et al. [61] performed a comparison between three distinct efficient merging techniques. Bellay et al. [14] proposed a Markov Clustering (MCL) algorithm to both summarize biclustering solutions and allow for the creation of larger biclusters.

Filtering options: Filtering is needed at two levels: (1) at the row/column level and (2) at the bicluster level. The first type of filtering is needed to exclude rows or columns from a particular bicluster in order to improve its homogeneity. This is usually the case when a low number of items is considered, leading to highly noise-tolerant biclusters. For this purpose, we can rely on statistical tests on each row and column of a particular bicluster to identify removals [111]; use existing greedy-iterative approaches to maximize a merit function until a parameterizable reduction in size is verified [29]; or discover patterns under more restrictive conditions (such as higher support and confidence thresholds) and use them to guide the removal of rows and columns [62,63].

The second type of filtering is required to guarantee the dissimilarity of biclusters, removing biclusters partially contained in larger biclusters. BiModule [96] filters small biclusters by sorting biclusters following a score a_IJ computed from log2 |I| and log2 |J|, and removing biclusters whose cells overlap by more than 25% with a higher scored bicluster. The work by Bellay et al. [14] separates biclusters that represent biological phenomena from false discoveries (emerging from the background data distributions) using randomized data scores.

Extension options: Three optional and non-exclusive strategies can be used to extend the discovered biclusters so that the resulting solution still satisfies some pre-defined homogeneity significance criteria. The first strategy consists of the use of statistical tests to include rows or columns in each bicluster. DeBi [111] uses statistical tests to extend biclusters obtained over binary matrices by evaluating the association strength between key columns of a bicluster and a new row using Fisher's exact test for independence on a contingency table. This guarantees that each row in the bicluster shows a statistical difference between the columns in the bicluster and the columns not in the bicluster, leading to more functionally coherent biclusters. The second strategy is to rely on traditional merit functions for further (greedy) extensions over PM-based biclusters. The third strategy is to discover patterns under more relaxed criteria (such as lower support-confidence thresholds) and use them to guide the extension step [62]. When considering lower supports, new columns and rows can be added to the original frequent patterns. Similarly, more relaxed association rules, with less restrictive ways to group the antecedent-consequent, can be used to guide extensions.

Fig. 8. Comparison of alternative discretization options by addressing their impact on the itemization and biclustering solutions with constant values on columns.

Alternatives to merging, filtering and extension options: Alternatives to previously introduced closing options to deal with large sets of small biclusters include: (1) summarization techniques based on simple and hierarchical clustering methods or on the definition of similarity measures to compare biclusters [18]; (2) user-driven formal constraints and querying expressions [19,20]; (3) co-clustering to exclusively partition both dimensions to select representative biclusters [36]; (4) pre- and post-pruning techniques (including item-based constraints and discrimination metrics) [88]; (5) patterns based on half-spaces (as quantitative rules) in which external sources of information are used as a filtering basis [48]; and (6) verification techniques based on metrics computed using external data sources as,

Table 5
Systematic comparison of the two major classes of PM-based biclustering approaches.

Approach: PM-based biclustering
Major benefits:
- Exhaustive searches;
- Handle missings and noise;
- Biclusters with multi-levels of coherency strength;
- Extensions to discover flexible coherencies;
- Flexible structures;
- Flexible searches;
- Constraint-based guidance;
Challenges:
1. Deterioration of efficiency levels for large data (in the absence of PM scalability principles);
2. Not natively prepared to capture additive, multiplicative, symmetric and plaid coherencies (their discovery can be computationally expensive);
3. High number of mined biclusters (memory usage);
4. Need to fix thresholds for the standard (customized) support metric;
Proposed principles to tackle challenges:
1. Data partitioning methods; PM in distributed settings; approximated patterns (discovered under specific performance guarantees) [54,134];
2. Iterative data mappings on rows/columns (with pruning heuristics) to mine non-constant biclusters [62]; merging procedures sensitive to overlapping plaid effects [60];
3. Adequate data structures; filtering options pushed into mining step;
4. Use of multi-thresholds (iterative method); data-driven estimation;

Approach: Range-based support biclustering
Major benefits:
- Range-based support addresses the items-boundary problem;
- Easy extension of Apriori methods to seize efficiency gains when dealing with multiple distances (support thresholds);
Challenges:
1. Separation of positive and negative values to guarantee monotonicity, resulting in biclusters without simultaneous under- and over-expressed values;
2. Dedicated Apriori-based methods do not allow the direct use of PM scalability principles;
Proposed principles to tackle challenges:
1. Merging of biclusters with shared columns (or rows) but different signs to avoid the violation of the (anti-)monotonic property;
2. Dedicated extensions to mine patterns with tree structures (required for dense datasets), and to make use of (scalable) data partitions;
R. Henriques et al. / Pattern Recognition 48 (2015) 39413958 3951

for instance, term enrichment (in gene expression data) to affect the addition-removal of columns-rows per bicluster.

2.4. A systematic comparison of PM-based biclustering approaches

In what follows, we provide a synthesis of the benefits and challenges of using PM-based biclustering approaches together with principles on how to tackle existing challenges. Table 5 focuses on PM-based biclustering classes in general, while Table 6 focuses on each surveyed approach in particular.

Understandably, different applications may be better tackled by different PM-based biclustering approaches. BicPAM, BiModule and RAP are default options for settings where meaningful biclusters can only be found using multiple coherency levels, which is often the case with scored biological/social networks, expression data and physiological data [62,96,101]. DeBi and BicPAM are critical for the analysis of large Boolean datasets, such as the ones derived from (web) text data or genomic structural variations [111,62]. GenMiner's ability to incorporate external knowledge is relevant for biological and clinical contexts [86]. The constant overall assumption of RCB is critical to efficiently mine biclusters with a specific behavior or rating in web social data and collaborative filtering data [8]. The noise-tolerance of ET-biclusters and BicPAM is relevant to deal with experimental errors and instance-based variations of physiological, molecular and clinical data [52,62]. Finally, BiP and BicPAM are the choice for the analysis of non-trivial (yet coherent) behavior across biomedical and social domains as they allow the discovery of flexible (yet meaningful and significant) coherencies [60,62].

3. Performance evaluation of PM-based biclustering approaches

This section evaluates the performance of PM-based biclustering approaches. We first describe the quality evaluation methodology and then present preliminary results on synthetic and real data.

3.1. Methodology

Effective evaluation of PM-based biclustering solutions is challenged by three major issues. First, a large variety of metrics and synthetic datasets have been proposed (with many being biased to the specificities of a particular approach) [98]. This is the case either when a variant of the optimized merit function is used to evaluate the approach, or when a developed approach is optimized towards specific data settings. Second, there is no ground truth to evaluate biclusters observed in real data. Finally, existing efforts to develop a standard evaluation [92,108] only cover a subset of all aspects, often leading to wrong assumptions regarding the performance of the assessed approaches.

Evaluating biclustering solutions on both synthetic and real data is essential. In synthetic data, a set of biclusters $H = \{H_1, \ldots, H_g\}$ (referred to as hidden or true biclusters) is typically planted. Objective metrics can be formulated since an approximate solution is known a priori, including the relative non-intersecting area (RNIA) [21] and its extension (CE subspace⁶) [103], match scores [108,67], and clustering metrics⁷ (such as entropy, recall and precision) [6,7]. In particular, we rely on Jaccard-based match scores (MS) to assess the similarity of the found biclusters $B$ and the hidden biclusters $H$ [108]. $MS(B, H)$ defines the extent to which found biclusters match the hidden biclusters (correctness), while $MS(H, B)$ reflects how well hidden biclusters are recovered (completeness).

\[
MS(B, H) = \frac{1}{|B|} \sum_{(I_1, J_1) \in B} \; \max_{(I_2, J_2) \in H} \frac{|I_1 \cap I_2|}{|I_1 \cup I_2|}
\]

Since MS scores are not sensitive to the number of biclusters in both sets, Hochreiter et al. [67] introduced a consensus (FC) by computing similarities between the pairs of closest biclusters between $B$ and $H$. Let $S_1$ and $S_2$ be, respectively, the larger and smaller set of biclusters from $\{B, H\}$, and $MP$ be the assigned pairs using the Munkres method based on overlapping areas [93].

\[
FC(B, H) = \frac{1}{|S_1|} \sum_{((I_1, J_1) \in S_1,\, (I_2, J_2) \in S_2) \in MP} \frac{|I_1 \cap I_2| \times |J_1 \cap J_2|}{|I_1| \times |J_1| + |I_2| \times |J_2| - |I_1 \cap I_2| \times |J_1 \cap J_2|}
\]

In the absence of hidden biclusters, only subjective metrics can be formulated. Merit functions can be applied as long as they are not biased towards the merit functions used within the approaches under comparison. A detailed comparison of merit functions is provided by Orzechowski [98]. Complementarily, domain-driven scores can be computed using the groups of rows and columns in biclustering solutions retrieved from real datasets against annotations extracted from (biomedical) knowledge bases, such as Gene Ontology and Yeastract [85,121], semantic sources or bibliographic databases. Biclusters can be ranked using the p-value obtained from a hypergeometric test against these annotations [16,130]. In biological domains, this score can be used to investigate whether the discovered biclusters show significant enrichment with respect to terms in gene ontologies, transcription factors, protein-interaction networks and metabolic pathways using varying levels of significance and correction procedures [96,111].

We propose a methodology for evaluating PM-based biclustering approaches according to three major decision axes. The first axis concerns the target set of synthetic and real datasets. Synthetic datasets must have varying sizes and configurations and be able to exploit different biclustering solutions with respect to their coherency, size, noise and overlapping degree. The second axis includes the set of biclustering approaches and parameterizations to establish comparisons. Finally, the third axis defines the set of metrics to be used. These should assess: (1) time and memory efficiency; (2) accuracy from synthetic data using match scores, CE subspace or FC consensus; and (3) domain relevance scores from real data.

3.2. Results

Below we collect initial empirical evidence that shows the relevance of PM-based biclustering approaches. The following experiments were run on an Intel Core i3 1.80 GHz with 6 GB of RAM.

Synthetic datasets were generated⁸ by varying the size of the matrices, the number and shape of the planted biclusters and the number of items ($|L| \in \{5, 10, 20\}$). The properties are described in Table 7. The number of rows and columns for each bicluster followed a Uniform distribution over the ranges presented in Table 7. We allow for overlapping biclusters and a random noise factor (up to ±15% of the range of values), which can hinder the recovery of planted biclusters. For each of these settings we instantiated 20 matrices: 10 matrices with background values

⁶ RNIA cannot distinguish whether several or a single found bicluster cover a hidden bicluster, thus CE maps each found bicluster to at most one hidden bicluster and each hidden bicluster to at most one found bicluster.
⁷ Clustering metrics are applied to one dimension at a time (rows or columns). Typical objective functions aim for high intra-cluster similarity (overall pattern for rows within a bicluster is similar across all columns) and low inter-cluster similarity (patterns differ for rows from different biclusters). Entropy combines these views by measuring the homogeneity of the found clusters B against the hidden clusters H. Alternatively, F-measure (and its precision and recall components) evaluates how well the hidden clusters are represented [7,91]. The underlying principle is that biclusters should cover many rows of a particular hidden cluster but few rows from other hidden clusters.
⁸ Available in http://web.ist.utl.pt/rmch/software/bicpam.
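The Jaccard-based match score of Section 3.1 can be illustrated with a minimal sketch. It assumes each bicluster is represented as a pair of row and column index sets; the names `Bicluster`, `jaccard`, `match_score` and `bic`, as well as the toy biclusters, are hypothetical and only serve the illustration. As in the definition, only the row sets enter the score.

```python
from typing import FrozenSet, Iterable, List, Tuple

# A bicluster is modelled as (row indices I, column indices J); this
# representation is an assumption made for the sketch.
Bicluster = Tuple[FrozenSet[int], FrozenSet[int]]


def jaccard(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Jaccard index |a ∩ b| / |a ∪ b| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_score(source: List[Bicluster], target: List[Bicluster]) -> float:
    """MS(source, target): for every bicluster in `source`, take the best
    row-set Jaccard match found in `target`, then average over `source`."""
    if not source or not target:
        return 0.0
    best_matches = (max(jaccard(i1, i2) for i2, _ in target) for i1, _ in source)
    return sum(best_matches) / len(source)


def bic(rows: Iterable[int], cols: Iterable[int]) -> Bicluster:
    """Helper that builds a bicluster from any iterables of indices."""
    return frozenset(rows), frozenset(cols)


# Toy example: two hidden biclusters, one of which is partially recovered.
hidden = [bic(range(0, 10), range(0, 5)), bic(range(20, 30), range(3, 8))]
found = [bic(range(0, 8), range(0, 5))]

ms_found_vs_hidden = match_score(found, hidden)  # MS(B, H)
ms_hidden_vs_found = match_score(hidden, found)  # MS(H, B)
```

In this toy setting the single found bicluster shares 8 of 10 rows with the first hidden bicluster, so MS(B, H) is 0.8, while MS(H, B) drops to 0.4 because the second hidden bicluster is not recovered at all.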

Table 6
Benefits, challenges and possible improvements of state-of-the-art PM-based biclustering approaches. PM-based biclustering benefits and challenges in Table 5 apply to DeBi, BiModule, GenMiner and BicPAM/BiP, while both PM-based and range-based biclustering benefits and challenges in Table 5 apply to RAP, RCB and ET-Biclusters.

Approach: DeBi
  Major benefits: Complete and statistically rigorous options for post-processing biclustering solutions; discovery adapted to the target significance level; (see PM-based benefits)
  Challenges: Efficiency deterioration from post-processing extension procedures; discovery of maximal patterns (loss of a large number of potentially significant biclusters); binarization of data; (see PM-based challenges)
  Principles to tackle challenges: Discovery of closed patterns (removes the need for an exhaustive extension of biclusters); multi-level discretization (standardly as remaining PM-based approaches); (see PM-based principles)

Approach: BiModule
  Major benefits: Multi-level discretization with removal of outliers; (see PM-based benefits)
  Challenges: No merging-extension options for handling noise and growing biclusters; (see PM-based challenges)
  Principles to tackle challenges: Inclusion of the surveyed closing options; (see PM-based principles)

Approach: GenMiner
  Major benefits: More complete frame to derive noisy biclusters from rules (non-perfect confidence levels); allows extracting relations between genes and real-world annotations; (see PM-based benefits)
  Challenges: Requires annotations from knowledge bases; non-parameterized levels of expression (only 3); (see PM-based challenges)
  Principles to tackle challenges: Retrieval of annotations from the dataset under analysis when knowledge bases are not available; delivery of rules without the need for annotations on the antecedent or consequent; inclusion of the surveyed mapping options; (see PM-based principles)

Approach: BicPAM/BiP
  Major benefits: Discovery of additive/multiplicative/symmetric/plaid models; robustness to discretization, noise and missings; dedicated PM searches to explore further efficiency gains; (see PM-based benefits)
  Challenges: Efficiency of the search for non-constant models rapidly deteriorates for very large matrices; (see PM-based challenges)
  Principles to tackle challenges: New heuristics, scalability principles, approximative searches (replacing the exhaustive criteria), or constraint-based guidance to learn non-constant models; (see PM-based principles)

Approach: RAP
  Major benefits: (see PM- and range-based benefits)
  Challenges: Not able to deal with noisy biclusters; (see PM- and range-based challenges)
  Principles to tackle challenges: Inclusion of closing framework (merging and extension strategies); (see PM- and range-based principles)

Approach: RCB
  Major benefits: Discovery (see PM- and range-based benefits)
  Challenges: Constant coherency overall excludes biclusters with meaningful differences across columns (rows); joining squares (discovered patterns) to compose rectangles (biclusters) is a combinatorial problem that impacts efficiency [8]; (see PM- and range-based challenges)
  Principles to tackle challenges: Combining results with the biclustering solutions of other approaches (e.g. RAP); alternative computational methods; (see PM- and range-based principles)

Approach: ET-Bicluster
  Major benefits: Parameterizable discovery of biclusters based on the allowed amount of noise; (see PM- and range-based benefits)
  Challenges: Inclusion of error-based thresholds on the Apriori-method violates the (anti-)monotonic property, thus not guaranteeing exhaustive solutions; (see PM- and range-based challenges)
  Principles to tackle challenges: Adoption of more relaxed thresholds to avoid losing biclusters of interest, with a post-filtering of biclusters not satisfying the criteria; inference of bounds on the performance guarantees; (see PM- and range-based principles)

Table 7
Properties of the generated set of synthetic datasets.

Matrix size (#rows × #cols)   100 × 30   500 × 60   1000 × 100   2000 × 200   4000 × 400
Nr. of hidden biclusters      3          5          10           15           20
Nr. columns in biclusters     [5,7]      [6,8]      [6,10]       [6,14]       [6,20]
Nr. rows in biclusters        [10,20]    [15,30]    [20,40]      [40,70]      [60,100]
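To make the settings of Table 7 concrete, the following is a minimal sketch of how a matrix with planted biclusters could be generated. It is not the generator used for the reported experiments: the function name `plant_constant_biclusters` is hypothetical, the background is assumed to consist of integer items drawn uniformly from 1 to the number of items, each planted bicluster is assumed to share one item per column across all of its rows, and the noise factor is omitted for brevity.

```python
import random
from typing import List, Set, Tuple


def plant_constant_biclusters(
    n_rows: int,
    n_cols: int,
    n_biclusters: int,
    row_range: Tuple[int, int],
    col_range: Tuple[int, int],
    n_items: int,
    seed: int = 0,
) -> Tuple[List[List[int]], List[Tuple[Set[int], Set[int]]]]:
    """Return (matrix, planted) where `planted` lists the (rows, cols) of each
    planted bicluster. Later biclusters may overwrite cells of earlier ones,
    which is how overlaps arise in this sketch."""
    rng = random.Random(seed)
    # Background: integer items drawn uniformly from {1, ..., n_items} (assumption).
    matrix = [[rng.randint(1, n_items) for _ in range(n_cols)] for _ in range(n_rows)]
    planted = []
    for _ in range(n_biclusters):
        # Bicluster shape drawn uniformly from the given inclusive ranges.
        rows = set(rng.sample(range(n_rows), rng.randint(*row_range)))
        cols = set(rng.sample(range(n_cols), rng.randint(*col_range)))
        # One item per column, shared by every row of the bicluster.
        pattern = {c: rng.randint(1, n_items) for c in cols}
        for r in rows:
            for c in cols:
                matrix[r][c] = pattern[c]
        planted.append((rows, cols))
    return matrix, planted


# First setting of Table 7: 100 x 30 matrix, 3 biclusters,
# [10,20] rows and [5,7] columns per bicluster, 10 items.
matrix, planted = plant_constant_biclusters(
    n_rows=100, n_cols=30, n_biclusters=3,
    row_range=(10, 20), col_range=(5, 7), n_items=10, seed=1,
)
```

The returned `planted` list plays the role of the hidden biclusters against which a biclustering solution can later be scored.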

following a Uniform distribution, $U(1, L)$, and 10 matrices according to a Gaussian distribution, $N(L/2, L/6)$.

Comparison: We selected 15 state-of-the-art approaches⁹: FABIA method with sparse prior option [67], ISA [70], OPSM [15], CC [29], Samba [119], xMotifs [94], OP-Clustering [78], BicSPAM [63], Bexpa [106], BCPlaid [123] and the PM-based BiModule [96], DeBi [111], RAP [101], BicPAM [62] and BiP [60] biclustering approaches. We used the following software: R packages fabia¹⁰ and biclust¹¹, BicAT [10], Expander¹², (Evo-)Bexpa [106], RAP¹³ and BicPAMS¹⁴. In particular, we adjust BicPAM behavior according to the proposed principles in this work by considering closed patterns, multiple levels of coherency strength ($|L| \in \{3, 5, 7\}$), an assignment of two items for elements with values near item-boundaries, and merging (> 70% overlap) and filtering options. The support threshold was incrementally decreased by 10% until the area of the discovered biclusters covered at least 5% of the input matrix.

Fig. 9 compares the ability of these state-of-the-art approaches to discover planted biclusters with constant coherency on rows. Results confirm the superior performance of PM-based biclustering approaches both in terms of $MS(B, H)$ (correctness) and $MS(H, B)$ (completeness) as they provide exhaustive and flexible searches. Superiority is also verified for non-constant models. Fig. 10 compares the performance of biclustering methods prepared to discover shifting-scaling factors when the planted biclusters follow additive and multiplicative models. A closer look at the performance of PM-based biclustering, when multiple levels of coherency strength are considered, is provided in Fig. 11.

⁹ The specified number of biclusters for FABIA, Bexpa, CC, xMotifs and ISA (number of starting points) was the number of hidden biclusters plus 10%: $|H| \times 1.1$. Note that this specification guides the search, optimistically biasing Fabia Consensus (FC) levels. The default number of iterations for the OPSM method was varied from 10 to 200 iterations. Remaining parameterizations were set by default.
¹⁰ http://www.bioinf.jku.at/software/fabia/fabia.html.
¹¹ http://cran.r-project.org/web/packages/biclust
¹² http://acgt.cs.tau.ac.il/expander.
¹³ http://www.mybiosoftware.com/rap-association-analysis-approach-biclustering.html.
¹⁴ https://web.ist.utl.pt/rmch/software/bicpams.

Efficiency: Fig. 12 shows the boundaries on efficiency of PM-based biclustering approaches when considering 20,000 rows (magnitude of the human genome). We varied the number of columns,

Fig. 9. Comparison of the performance of state-of-the-art biclustering approaches on data settings with varying properties and constant coherencies.

Fig. 10. Comparison of biclustering approaches to recover biclusters with non-constant coherency.

Fig. 11. Performance of PM-based biclustering for data settings with varying coherency strength.
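The consensus score (FC) used alongside the match scores in these comparisons can be sketched as follows. This is an illustration under stated assumptions rather than the FABIA implementation: biclusters are pairs of row and column index sets, the similarity of a pair is the Jaccard index of their areas, and the Munkres assignment is replaced by an exhaustive search over assignments that maximizes the summed similarity, which is only practical for small sets. The names `area_jaccard` and `consensus_score` are hypothetical.

```python
from itertools import permutations
from typing import FrozenSet, List, Tuple

Bicluster = Tuple[FrozenSet[int], FrozenSet[int]]  # (rows I, columns J)


def area_jaccard(b1: Bicluster, b2: Bicluster) -> float:
    """Shared area divided by the area of the union of the two biclusters."""
    (i1, j1), (i2, j2) = b1, b2
    shared = len(i1 & i2) * len(j1 & j2)
    union = len(i1) * len(j1) + len(i2) * len(j2) - shared
    return shared / union if union else 0.0


def consensus_score(found: List[Bicluster], hidden: List[Bicluster]) -> float:
    """FC: assign each bicluster of the smaller set to a distinct bicluster of
    the larger set, sum the similarities of the assigned pairs and divide by
    the size of the larger set. Brute force stands in for Munkres here."""
    if not found or not hidden:
        return 0.0
    larger, smaller = (found, hidden) if len(found) >= len(hidden) else (hidden, found)
    best_total = max(
        sum(area_jaccard(larger[k], s) for k, s in zip(assignment, smaller))
        for assignment in permutations(range(len(larger)), len(smaller))
    )
    return best_total / len(larger)


hidden = [
    (frozenset(range(0, 10)), frozenset(range(0, 5))),
    (frozenset(range(20, 30)), frozenset(range(3, 8))),
]
found = [(frozenset(range(0, 8)), frozenset(range(0, 5)))]

fc = consensus_score(found, hidden)
```

Dividing by the size of the larger set is what makes the score sensitive to the number of biclusters: here one hidden bicluster is matched with similarity 0.8 and the other is left unmatched, so the consensus is 0.4. For realistic set sizes, a polynomial-time assignment solver would be the appropriate replacement for the exhaustive search.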

items ($|L| \in \{5, 7\}$) and considered a simple merging option (> 70% overlap). We planted 15 biclusters to occupy 2% of the area. Charm [136], an efficient pattern miner to deliver closed patterns (maximal biclusters), was used. Generally, we observe that PM-based biclustering approaches are scalable for these dense and large matrices. Understandably, the number of items has a strong impact on efficiency as it defines the density of the itemset database. The scalability of pattern mining methods can be guaranteed for even harder settings by adopting some of the largely researched parallelization, distribution, streaming and error-bounding PM principles [54]. Additionally, hyperclique patterns [52], which require item-pairwise support-similarity, can also be considered to promote the efficiency of the mining procedure.

Impact of mining options: Fig. 13 illustrates the impact of the chosen search and pattern representations (simple, closed, maximal) on the efficiency and MS levels of PM-based biclustering approaches when using a discretization step with 10 items and the 1000 × 100 data setting. The FIM methods were tested using SPMF¹⁵ and F2G [65]. FPGrowth [65] and Eclat [135] are the most competitive choices for small support thresholds, while Apriori [2] is the best option for medium-to-large support levels. Additionally, the use of simple patterns (using FPGrowth [1]) degrades $MS(B, H)$, while the use of maximal patterns (using CharmMFI [136]) penalizes $MS(H, B)$ as it discards biclusters with a non-large number of columns (even if they have a larger number of rows).

¹⁵ http://www.philippe-fournier-viger.com/spmf.

Impact of closing options: We planted additional levels of noise, by varying the amount of noisy elements from 0 to 10%, for the 1000 × 100 setting. Fig. 14 describes the impact of alternative strategies to extend, merge and filter biclusters using Charm. When increasing the planted noise, extension options are critical to maintain attractive levels of accuracy (20pp higher than the baseline option). Fig. 14(b) illustrates

Fig. 12. Efficiency bounds of PM-based biclustering in the absence of scalability principles for datasets with 20,000 rows.

Fig. 13. Comparison of mining searches and pattern representations for the 1000 × 100 setting.

Fig. 14. Impact of extending, merging and filtering options. (a) Extending biclusters for varying levels of noise. (b) Merging for varying overlapping degrees (5% of planted noise). (c) Filtering for varying homogeneity degrees (2% of planted noise).
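The merging option with an overlap threshold (for example 70%) can be illustrated with a short sketch. The exact merging criterion of the surveyed approaches is not spelled out here, so the sketch makes two assumptions: overlap is measured as the shared area relative to the area of the smaller bicluster, and a merged bicluster takes the union of the rows and the union of the columns of the pair. The names `overlap` and `merge_biclusters` are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Tuple

Bicluster = Tuple[FrozenSet[int], FrozenSet[int]]  # (rows I, columns J)


def overlap(b1: Bicluster, b2: Bicluster) -> float:
    """Shared area relative to the smaller bicluster's area (assumed criterion)."""
    (i1, j1), (i2, j2) = b1, b2
    shared = len(i1 & i2) * len(j1 & j2)
    smaller = min(len(i1) * len(j1), len(i2) * len(j2))
    return shared / smaller if smaller else 0.0


def merge_biclusters(
    biclusters: Iterable[Tuple[Iterable[int], Iterable[int]]],
    min_overlap: float = 0.7,
) -> List[Bicluster]:
    """Greedily merge any pair whose overlap exceeds `min_overlap`, restarting
    the scan after each merge until no pair qualifies."""
    merged = [(frozenset(i), frozenset(j)) for i, j in biclusters]
    changed = True
    while changed:
        changed = False
        for a in range(len(merged)):
            for b in range(a + 1, len(merged)):
                if overlap(merged[a], merged[b]) > min_overlap:
                    (i1, j1), (i2, j2) = merged[a], merged[b]
                    merged[a] = (i1 | i2, j1 | j2)  # union of rows and of columns
                    del merged[b]
                    changed = True
                    break
            if changed:
                break
    return merged


candidates = [
    (range(0, 10), range(0, 5)),     # shares 45 of 50 cells with the next one
    (range(1, 11), range(0, 5)),
    (range(50, 60), range(10, 15)),  # disjoint from the others
]
result = merge_biclusters(candidates, min_overlap=0.7)
```

Lowering `min_overlap` merges more aggressively, which tolerates more noise in the merged bicluster; this is the trade-off that panel (b) explores for varying overlapping degrees.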

Table 8
Illustrative set of PM-based biclusters with unique properties and heightened biological relevance.

ID   Dataset   Pattern     Items   Closing options                           #Genes   #Conds   #p-values < 0.01   #p-values [0.01,0.05]   Best p-value
B1   dlblc     FAABFFF     A-F     Merging with tight overlapping            83       7        41                 21                      1.97E−10
B2   dlblc     AAABCA      A-C     Extensions allowed (with tight merging)   153      8        9                  1                       2.27E−12
B3   hughes    EEECEE      A-E     Merging allowed                           581      6        12                 7                       1.31E−25
B4   hughes    CCDCBCBCC   A-E     Merging with relaxed overlapping          654      10       16                 4                       1.31E−17
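The enrichment p-values summarized in Table 8 rest on a hypergeometric test followed by a multiple-testing correction. The sketch below shows the computation in its simplest form; it is not the GoToolBox procedure, the function names `hypergeom_pvalue` and `bonferroni` are hypothetical, and the numbers in the example are toy values rather than figures from the datasets.

```python
from math import comb
from typing import List


def hypergeom_pvalue(population: int, annotated: int, sample: int, hits: int) -> float:
    """Upper-tail probability P(X >= hits) of drawing at least `hits` annotated
    genes when `sample` genes are drawn without replacement from a population
    of `population` genes, `annotated` of which carry the term."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


def bonferroni(pvalues: List[float]) -> List[float]:
    """Bonferroni correction: multiply by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Toy example: 10 genes in total, 4 annotated with a term, and a bicluster
# of 3 genes that are all annotated.
p_term = hypergeom_pvalue(population=10, annotated=4, sample=3, hits=3)
corrected = bonferroni([0.01, 0.5])
```

A term is then reported as enriched for a bicluster when its corrected p-value falls below the chosen significance level.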

the impact of merging biclusters with large overlapping areas assuming a level of planted noise of 5%. When decreasing the overlapping threshold, MS levels increase up to a certain point (near 70% for this experimental setting). A correct identification of this threshold can lead to significant gains (near 15pp in this setting). Finally, the use of filtering strategies to remove rows and columns can also enhance the recovery of the planted biclusters, as illustrated in Fig. 14(c). Similarly to the merging option, MS increases up to a 75% homogeneity (given by $1 - MSR$ [29]) and decreases above this threshold since the homogeneity criterion becomes too restrictive.

Domain relevance: To assess the relevance of PM-based biclustering in biological settings we used two gene expression datasets: (1) dlblc dataset (660 genes, 180 conditions, human genome) [109], and (2) hughes dataset (6300 genes, 300 conditions, yeast genome) [74]. For each dataset standard PM-based biclustering (closed FIM) was applied using multiple levels of expression ($|L| \in \{4, \ldots, 7\}$) and different closing options: (1) merging (70% overlap), (2) relaxed merging (55% overlap) with filtering of rows, and (3) tight merging (90% overlap) with extensions on rows that appear in another bicluster sharing a minimum of 50% of conditions. The biological relevance of each bicluster was obtained using the Gene Ontology (GO) annotations using the GoToolBox [85]. Table 8 shows an illustrative set of PM-based biclusters with significantly enriched GO terms (after Bonferroni correction). These biclusters could hardly be discovered by peer biclustering methods, since many of them include conditions with multiple degrees of expression (such as B1, B2 and B4). All of them have heightened biological significance as observed by the number of highly enriched terms. Interestingly, we also observe that different

closing options lead to distinct biclusters. Complementary analyses supporting the biological relevance of PM-based biclustering are provided in [62,60,111].

4. Conclusions

This work provides a structured view on pattern mining-based approaches to biclustering as they are increasingly positioned as the means to perform exhaustive searches under relaxed conditions (flexible structures of biclusters with parameterizable coherency and quality) with heightened efficiency. In this context, this work surveys and integrates the contributions of existing PM-based biclustering approaches, evaluates their performance, and discusses their relevance for pattern recognition applications.

A set of principles was synthesized, covering alternative design options to guide the definition of PM-based biclustering approaches: (1) mining paradigms (including frequent itemset mining, association rule mining, sequential PM, constraint-based PM and structured PM), principles to define support-confidence-correlation metrics, pattern representations (as simple, condensed and approximate), searches, and extensions to consider flexible coherencies; (2) pre-processing options, including strategies to deal with the items-boundary problem when discretization procedures are considered and with noisy and missing elements; and (3) strategies to compose adequate structures of biclusters through extension-merging-filtering steps without the need to adapt the core task. As such, this work introduces a highly-parameterizable environment to design PM-based biclustering approaches, where the behavior can be dynamically defined according to the input dataset and the target biclustering type, structure and quality. In particular, the quality of a target solution can be easily affected through the mining options, such as the confidence of association rules to define the level of tolerated noise; mapping options, such as the number of items (coherency strength) and multi-item assignments; and merging, filtering and extension options based, respectively, on the allowed noise (overlapping degree), dissimilarity and homogeneity of biclusters.

A qualitative comparison of the state-of-the-art PM-based biclustering approaches was provided, as well as initial empirical evidence supporting the accuracy, efficiency and biological relevance of this class of algorithms.

Following this comprehensive work, new research can embrace several promising directions, including: (1) development of new integrative PM-based biclustering approaches; (2) proposal of statistical tests to effectively assess the significance of biclusters with varying coherency and quality; (3) integration of principles from domain-driven PM to incorporate constraints in PM-based biclustering when background knowledge is available; and (4) design of robust classifiers based on discriminative PM-based biclusters.

Conflict of interest

None declared.

Acknowledgments

This work was supported by Fundação para a Ciência e a Tecnologia under the projects UID/CEC/50021/2013 and the PhD grant SFRH/BD/75924/2011 to RH.

References

[1] Ramesh C. Agarwal, Charu C. Aggarwal, V. Prasad, A tree projection algorithm for generation of frequent item sets, J. Parallel Distrib. Comput. 61 (March 3) (2001) 350–371.
[2] Tomasz Imieliński, Rakesh Agrawal, Arun Swami, Mining association rules between sets of items in large databases, SIGMOD Rec. 22 (June (2)) (1993) 207–216.
[3] H.A. Ahmed, P. Mahanta, D.K. Bhattacharyya, J.K. Kalita, A. Ghosh, Intersected coexpressed subcube miner: An effective triclustering algorithm, in: WICT, December 2011, pp. 846–851.
[4] Faris Alqadah, Joel S. Bader, Rajul Anand, Chandan K. Reddy, Query-based biclustering using formal concept analysis, in: SDM, SIAM/Omnipress, Anaheim, California, USA, 2012, pp. 648–659.
[5] Ronnie Alves, Domingo S. Rodríguez-Baena, Jesús S. Aguilar-Ruiz, Gene association analysis: a survey of frequent pattern mining from gene expression data, Brief. Bioinform. 11 (2) (2010) 210–224.
[6] I. Assent, R. Krieger, E. Müller, T. Seidl, DUSC: Dimensionality unbiased subspace clustering, in: ICDM, 2007.
[7] Ira Assent, Emmanuel Müller, Ralph Krieger, Timm Jansen, Thomas Seidl, Pleiades: subspace clustering and evaluation, in: Walter Daelemans, Bart Goethals, Katharina Morik (Eds.), Machine Learning and Knowledge Discovery in Databases, Lecture Notes in Computer Science, vol. 5212, Springer, Berlin, Heidelberg, 2008, pp. 666–671, ISBN: 978-3-540-87480-5, http://dx.doi.org/10.1007/978-3-540-87481-2_44.
[8] Gowtham Atluri, Jeremy Bellay, Gaurav Pandey, Chad Myers, Vipin Kumar, Discovering coherent value bicliques in genetic interaction data, in: BIOKDD, 2000.
[9] R. Rathipriya, K. Thangavel, J. Bagyamani, Binary particle swarm optimization based biclustering of web usage data, CoRR abs/1108.0748 (2011).
[10] Simon Barkow, Stefan Bleuler, Amela Prelić, Philip Zimmermann, Eckart Zitzler, Bicat: a biclustering analysis toolbox, Bioinformatics 22 (May (10)) (2006) 1282–1283.
[11] Roberto J. Bayardo Jr., Efficiently mining long patterns from databases, SIGMOD Rec. 27 (June 2) (1998) 85–93.
[12] Gürkan Bebek, Jiong Yang, PathFinder: mining signal transduction pathway segments from protein–protein interaction networks, BMC Bioinform. 8 (2007).
[13] Jeremy Bellay, Gowtham Atluri, Tina L. Sing, Kiana Toufighi, Michael Costanzo, Philippe Souza Moraes Ribeiro, Gaurav Pandey, Joshua Baller, Benjamin VanderSluis, Magali Michaut, Sangjo Han, Philip Kim, Grant W. Brown, Brenda J. Andrews, Charles Boone, Vipin Kumar, Chad L. Myers, Putting genetic interactions in context through a global modular decomposition, Genome Res. 21 (8) (2011) 1375–1387.
[14] Jeremy Bellay, et al., Putting genetic interactions in context through a global modular decomposition, Genome Res. 21 (8) (2011) 1375–1387.
[15] Amir Ben-Dor, Benny Chor, Richard Karp, Zohar Yakhini, Discovering local structure in gene expression data: the order-preserving submatrix problem, RECOMB, ACM, New York, NY, USA (2002) 49–57.
[16] G.F. Berriz, O.D. King, B. Bryant, C. Sander, F.P. Roth, Characterizing gene sets with FuncAssociate, Bioinformatics 19 (2003) 2502–2504.
[17] Manuele Bicego, Pietro Lovato, Alberto Ferrarini, Massimo Delledonne, Biclustering of expression microarray data with topic models, in: IC on Pattern Recognition, IEEE, 2010, pp. 2728–2731.
[18] Sylvain Blachon, Ruggero Pensa, Jérémy Besson, Céline Robardet, Jean-François Boulicaut, Olivier Gandrillon, Clustering formal concepts to discover biologically relevant knowledge from gene expression data, In Silico Biol. 7 (July) (0033) (2007).
[19] Jean-François Boulicaut, Jérémy Besson, Actionability and formal concepts: a data mining perspective, in: IC on Formal Concept Analysis, Springer-Verlag, Berlin, Heidelberg, 2008, pp. 14–31.
[20] Jean-François Boulicaut, Inductive databases and multiple uses of frequent itemsets: The cInQ approach, in: Rosa Meo, PierLuca Lanzi, Mika Klemettinen (Eds.), Database Sup. for Data Mining App., LNCS, vol. 2682, Springer, Berlin, Heidelberg, 2004, pp. 1–23.
[21] Doruk Bozdağ, Ashwin S. Kumar, V. Catalyurek, Comparative analysis of biclustering algorithms, Bioinformatics and Computational Biology, ACM, New York, NY, USA (2010) 265–274.
[22] Douglas Burdick, Manuel Calimlim, Johannes Gehrke, Mafia: a maximal frequent itemset algorithm for transactional databases, in: ICDE, IEEE Computer Society, Washington, DC, USA, 2001, pp. 443–452.
[23] Stanislav Busygin, Nikita Boyko, Panos M. Pardalos, Michael Bewernitz, Georges Ghacibeh, Biclustering EEG data from epileptic patients treated with vagus nerve stimulation, Data Mining, Systems Analysis and Optimization in Biomedicine, 953, AIP Publishing, Gainesville, Florida, USA (2007) 220–231.
[24] Toon Calders, Bart Goethals, Mining all non-derivable frequent itemsets, in: PKDD, Springer-Verlag, London, UK, 2002, pp. 74–85.
[25] Toon Calders, Bart Goethals, Szymon Jaroszewicz, Mining rank-correlated sets of numerical attributes, in: ACM SIGKDD, ACM, New York, NY, USA, 2006, pp. 96–105.
[26] Pedro Carmona-Saez, Monica Chagoyen, Andres Rodriguez, Oswaldo Trelles, Jose M. Carazo, Alberto Pascual-Montano, Integrated analysis of gene expression by association rules discovery, BMC Bioinform. 7 (2006) 1–16.
[27] André Valério Carreiro, Artur J. Ferreira, Mário A.T. Figueiredo, Sara Cordeiro Madeira, Towards a classification approach using meta-biclustering: impact of discretization in the analysis of expression time series, J. Integr. Bioinf. 9 (3) (2012) 207.
[28] Malika Charrad, Mohamed Ben Ahmed, Simultaneous clustering: a survey, in: Sergei O. Kuznetsov, Deba P. Mandal, Malay K. Kundu, Sankar K. Pal (Eds.), Pattern Recognition and Machine Intelligence, vol. 6744, Springer, Berlin, Heidelberg, 2011, pp. 370–375, ISBN 978-3-642-21785-2, http://dx.doi.org/10.1007/978-3-642-21786-9_60.
[29] Yizong Cheng, George M. Church, Biclustering of expression data, Intelligent Systems for Molecular Biology, AAAI Press, La Jolla, California, USA (2000) 93–103.
[30] Recep Colak, Flavia Moser, Jeffrey Shih-Chieh Chu, Alexander Schönhuth, Nansheng Chen, Martin Ester, Module discovery by exhaustive search for densely connected, co-expressed regions in biomolecular interaction networks, PLoS One 5 (10) (2010) e13348.
[31] Chad Creighton, Samir Hanash, Mining gene expression databases for association rules, Bioinformatics 19 (1) (2003) 79–86.
[32] Phuong Dao, Recep Colak, Raheleh Salari, Flavia Moser, Elai Davicioni, Alexander Schönhuth, Martin Ester, Inferring cancer subnetwork markers using density-constrained biclustering, Bioinformatics 26 (18) (2010) 625–631.
[33] P.A.D. de Castro, F.O. de França, H.M. Ferreira, F.J. von Zuben, Applying biclustering to perform collaborative filtering, Intell. Syst. Des. Appl. (October) (2007) 421–426.
[34] M.C.P. de Souto, D.S.A. de Araujo, I.G. Costa, R. Soares, T.B. Ludermir, A. Schliep, Comparative study on normalization procedures for cluster analysis of gene expression datasets, in: IJCNN, June, 2008, pp. 2792–2798.
[35] Zhaohong Deng, Kup-Sze Choi, Fu-Lai Chung, Shitong Wang, Enhanced soft subspace clustering integrating within-cluster and between-cluster information, Pattern Recognit. 43 (3) (2010) 767–781.
[36] Inderjit S. Dhillon, Subramanyam Mallela, Dharmendra S. Modha, Information-theoretic co-clustering, in: KDD, ACM, New York, NY, USA, 2003, pp. 89–98.
[37] Chris Ding, Ya Zhang, Tao Li, Stephen R. Holbrook, Biclustering protein complex interactions with a biclique finding algorithm, in: ICDM, IEEE Computer Society, Washington, DC, USA, 2006, pp. 178–187.
[38] A.R. Donders, G.J. van der Heijden, T. Stijnen, K.G. Moons, Review: a gentle introduction to imputation of missing values, J. Clin. Epidemiol. 59 (10) (2006) 1087–1091.
[56] Blaise Hanczar, Mohamed Nadif, Ensemble methods for biclustering tasks, Pattern Recognit. 45 (11) (2012) 3938–3949.
[57] J.A. Hartigan, Direct clustering of a data matrix, J. Am. Stat. Assoc. 67 (337) (1972) 123–129.
[58] Trond Hellem, Bjarte Dysvik, Inge Jonassen, LSimpute: accurate estimation of missing values in microarray data with least squares methods, Nucleic Acids Res. e32 (February (3)) (2004) 34.
[59] R. Henriques, C. Antunes, Learning predictive models from integrated healthcare data: extending pattern-based and generative models to capture temporal and cross-attribute dependencies, in: System Sciences (HICSS), January 2014, pp. 2562–2569.
[60] R. Henriques, S. Madeira, Biclustering with flexible plaid models to unravel interactions between biological processes, in: IEEE/ACM Trans. Comput. Biol. Bioinf. 2015 (volume pp), (99), p. 1, http://dx.doi.org/10.1109/TCBB.2014.2388206.
[61] Rui Henriques, Cláudia Antunes, Sara C. Madeira, Methods for the efficient discovery of large item-indexable sequential patterns, in: Lecture Notes in Computer Science, Springer Int. Pub., 2014, pp. 100–116, http://dx.doi.org/10.1007/978-3-319-08407-7_7.
[62] Rui Henriques, Sara Madeira, Bicpam: pattern-based biclustering for biomedical data analysis, Algorithms Mol. Biol. 9 (1) (2014) 27.
[63] Rui Henriques, Sara Madeira, Bicspam: flexible biclustering using sequential patterns, BMC Bioinf. 15 (2014) 130.
[65] Rui Henriques, Sara C. Madeira, Cláudia Antunes, F2g: efficient discovery of full-patterns, in: ECML/PKDD IW on New Frontiers in Mining Complex Patterns, Prague, 2013.
[66] Rui Henriques, Silvia Moura Pina, Cláudia Antunes, Temporal mining of integrated healthcare data: methods, revealings and implications, in: SDM IW on Data Mining for Medicine and Healthcare, SIAM, Austin, US, 2013, pp. 56–64.
[67] Sepp Hochreiter, Ulrich Bodenhofer, Martin Heusel, Andreas Mayr, Andreas Mitterecker, Adetayo Kasim, Tatsiana Khamiakova, Suzy Van Sanden, Dan Lin, Willem Talloen, Luc Bijnens, Hinrich W.H. Göhlmann, Ziv Shkedy, Djork-Arné Clevert, FABIA: factor analysis for bicluster acquisition, Bioinformatics
[39] E. Elhamifar, R. Vidal, Sparse subspace clustering, in: Computer Vision and 26 (June (12)) (2010) 15201527.
Pattern Recognition, June 2009, pp. 27902797. [68] Qinghua Huang, A biclustering technique for mining trading rules in stock
[40] Kemal Eren, Mehmet Deveci, Onur Kktun, mit V. atalyrek, markets, in: Dehuai Zeng (Ed.), Applied Informatics and Communication, of
M. Deveci, A comparative analysis of biclustering algorithms for gene Communications in Computer and Information Science, vol. 224, Springer,
expression data, Brief. Bioinf. 14 (3) (2013) 279292. Berlin, Heidelberg, 2011, pp. 1624.
[41] Nikita Boyko. Neng Fan, Panos M. Pardalos, in: Wanpracha Chaovalitwongse, [69] Yaochun Huang, Hui Xiong, Weili Wu, Sam Y. Sung, Mining quantitative
Panos M. Pardalos, Petros Xanthopoulos (Eds.), Recent advances of data maximal hyperclique patterns: a summary of results, in: PAKDD, Springer-
biclustering with application in computational neuroscience, Computational Verlag, Berlin, Heidelberg, 2006, pp. 552556.
Neuroscience, 38, Springer Optimization and Its Applications Springer, New [70] jan Ihmels, Sven Bergmann, Naama Barkai, Dening transcription modules
York, ISBN 978-0-387-88629-92010, pp. 85112. http://dx.doi.org/ using large-scale gene expression data, Bioinformatics 20 (September (13))
10.1007/978-0-387-88630-5_6. (2004) 19932003.
[42] Gang Fang, Majda Haznadar, Wen Wang, Haoyu Yu, Michael Steinbach, [71] Maurice G. Kendall, Rank Correlation Methods, Grifn, London, 1948.
Timothy R. Church, William S. Oetting, Brian Van Ness, Vipin Kumar, High- [72] Adam Kirsch, Michael Mitzenmacher, Andrea Pietracaprina, Geppino Pucci,
order SNP combinations associated with complex diseases: efcient dis- Eli Upfal, and Fabio Vandin, An efcient rigorous approach for identifying
covery, statistical power and functional interactions, Plos One 7 (2012). statistically signicant frequent itemsets, in: ACM SIGMOD Symposium on
[43] Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach, Chad L. Myers, Principles of Database Systems, PODS '09, ACM, New York, NY, USA, 2009,
Vipin Kumar, Subspace differential coexpression analysis: problem denition pp. 117126.
and a general approach, in: Pacic Symposium on Biocomputing, World [73] L. Lazzeroni, A. Owen, Plaid models for gene expression data, Stat. Sin. 12
Scientic Publishing, 2010, pp. 145156. (2002) 6186.
[44] Paolo Favaro, Ren Vidal, Paolo Favaro, Avinash Ravichandran, A closed form [74] William Lee, Desiree Tillo, Nicolas Bray, HRandall Morse, Ronald W. Davis,
solution to robust subspace estimation and clustering, in: Computer Vision Timothy R. Hughes, Corey Nislow, A high-resolution atlas of nucleosome
and Pattern Recognition, IEEE, Colorado Springs, USA, 2011, pp. 18011807. occupancy in yeast, Nat. Genet. 39 (September (10)) (2007) 12351244.
[45] Usama M. Fayyad, Keki B. Irani, Multi-interval discretization of continuous- [75] Guimei Liu, Jinyan Li, Kelvin Sim, and Limsoon Wong, Distance based
valued attributes for classication learning, in: IJCAI, 1993, pp. 10221029. subspace clustering with exible dimension partitioning, in: ICDE, IEEE,
[46] Adelaide Freitas, Wassim Ayadi, Mourad Elloumi, Jos Lus, Jin-Kao 2007, pp. 12501254.
Hao Oliveira, Survey on biclustering of gene expression data, Biological [76] Guimei Liu, Hongjun Lu, Wenwu Lou, Jeffrey Xu Yu, On computing, storing
Knowledge Discovery Handbook (2012) 591608. and querying frequent patterns, in: ACM SIGKDD, ACM, New York, NY, USA,
[47] Guojun Gan, Jianhong Wu, A convergence theorem for the fuzzy subspace 2003, pp. 607612.
clustering (fsc) algorithm, Pattern Recognit. 41 (6) (2008) 19391947. [77] Hongyan Liu, Jiawei Han, Dong Xin, Zheng Shao, Top-down mining of
[48] Elisabeth Georgii, Lothar Richter, Ulrich Rckert, Stefan Kramer, Analyzing interesting patterns from very high dimensional data, in: ICDE, IEEE
microarray data using quantitative association rules, Bioinformatics 21 Computer Society, Washington, DC, USA, 2006, p. 114.
(January 2) (2005) 123129. [78] Jinze Liu, Wei Wang, Op-cluster: clustering by tendency in high dimensional
[49] Gad Getz, Erel Levine, and Eytan Domany. Coupled two-way clustering space, in: ICDM, IEEE Computer Society, Washington, DC, USA, Melbourne,
analysis of gene microarray data. Proc. Natl. Acad. Sci. 97 (22) (2000) 12079 Florida, USA, 2003, p. 187.
12084. [79] Nizar R. Mabroukeh, C.I. Ezeife, A taxonomy of sequential pattern mining
[50] Dmitry Gnatyshak, DmitryI Ignatov, Alexander Semenov, Jonas Poelmans, algorithms, ACM Comput. Surv. 43 (December (1)) (2010) 31341.
Gaining insight in social networks with biclustering and triclustering of [80] Jamie I. MacPherson, Jonathan E. Dickerson, John W. Pinney, David L.
LNBIP, in: Perspectives in Business Informatics Research, vol. 128, Springer, Robertson, Patterns of HIV-1 protein interaction identify perturbed host
Berlin Heidelberg, 2012, pp. 162171. cellular subsystems, PLoS Comput. Biol. 6 (7) (2010) e1000863.
[51] Gsta Grahne, Jianfei Zhu, Efciently using prex-trees in mining frequent [81] Sara Madeira, Miguel Nobre Parreira Cacho Teixeira, Isabel S-Correia, and
itemsets, in: FIMI, vol. 90, 2003. Arlindo Oliveira, Identication of regulatory modules in time series gene
[52] Rohit Gupta, Navneet Rao, Vipin Kumar, Discovery of error-tolerant biclus- expression data using a linear time biclustering algorithm, IEEE/ACM Trans.
ters from noisy gene expression data, BMC Bioinf. 12 (12) (2011) 117. Comput. Biol. Bioinf. 1 (January) (2010) 153165.
[53] E.H. Han, G. Karypis, V. Kumar, Min-apriori: an algorithm for nding [82] Sara C. Madeira, Arlindo L. Oliveira, Biclustering algorithms for biological
association rules in data with continuous attributes, Department of Compu- data analysis: A survey, IEEE/ACM Trans. Comput. Biol. Bioinf. 1 (January (1))
ter Science, University of Minnesota, Minneapolis (1997). (2004) 2445.
[54] Jiawei Han, Hong Cheng, Dong Xin, Xifeng Yan, Frequent pattern mining: [83] M.A. Mahfouz, M.A. Ismail, Bidens: iterative density based biclustering
current status and future directions, Data Min. Knowl. Discov. 15 (August (1)) algorithm with application to gene expression analysis, in: PWASET, vol. 37
(2007) 5586. 2009, pp. 342348.
[55] Jiawei Han, Jian Pei, Guozhu Dong, Ke Wang, Efcient computation of iceberg [84] Kazuhisa Makino, Takeaki Uno, New algorithms for enumerating all maximal
cubes with complex measures, SIGMOD Rec. 30 (May (2)) (2001) 112. cliques of LNCS, in: SWAT, vol. 3111, Springer, 2004, pp. 260272.
[85] David Martin, Christine Brun, Elisabeth Remy, Pierre Mouren, Denis Thieffry, [112] Fanhua Shang, L.C. Jiao, Fei Wang, Graph dual regularization non-negative
Bernard Jacq, Gotoolbox: functional analysis of gene datasets based on gene matrix factorization for co-clustering, Pattern Recognit. 45 (6) (2012) 2237
ontology, Genome biology, BioMed Central Ltd, 5(12), 2014, R101. 2250 (Brain Decoding).
[86] Ricardo Martinez, Claude Pasquier, Nicolas Pasquier, Genminer: Mining [113] Qizheng Sheng, Yves Moreau, Bart De Moor, Biclustering microarray data by
informative association rules from genomic data, Bioinformatics and Biome- gibbs sampling, in: ECCB, 2003, pp. 196205.
dicine, 2007, Nov, 1522, http://dx.doi.org/10.1109/BIBM.2007.49. [114] Kelvin Sim, Vivekanand Gopalkrishnan, Arthur Zimek, Gao Cong, A survey on
[87] Tara McIntosh, Sanjay Chawla, High condence rule mining for microarray enhanced subspace clustering, Data Min. Knowl. Discov. 26 (2) (2013)
analysis, IEEE/ACM Trans. Comput. Biol. Bioinf. 4 (October (4)) (2007), 332397.
611623. [115] Michael Steinbach, Pang-Ning Tan, Hui Xiong, Vipin Kumar, Generalizing the
[88] Guy W. Mineau, Akshay Bissoon, Robert Godin, Simple pre- and post- notion of support, in: ACM SIGKDD, 2004, ACM, New York, NY, USA, pp. 689
pruning techniques for large conceptual clustering structures, Electron. 694.
Trans. Artif. Intell. 4 (C) (2000) 120. [116] Michael Steinbach, Haoyu Yu, Gang Fang, Vipin Kumar, Using constraints to
[89] Sushmita Mitra, Haider Banka, Multi-objective evolutionary biclustering of generate and explore higher order discriminative patterns of LNCS, in:
gene expression data, Pattern Recognit. 39 (December (12)) (2006) PAKDD, vol. 6634, Springer, 2011, pp. 338350.
24642477. [117] Pang-Ning Tan, Vipin Kumar, Jaideep Srivastava, Selecting the right interest-
[90] Anirban Mukhopadhyay, Ujjwal Maulik, Sanghamitra Bandyopadhyay, A ingness measure for association patterns, in: ACM SIGKDD, ACM, Edmonton,
novel biclustering approach to association rule mining for predicting HIV-1 Alberta, Canada, 2002, pp. 3241.
human protein interactions, PLoS One 7 (4) (2012) e32289. [118] A. Tanay, R. Sharan, R. Shamir, Biclustering algorithms: a survey, in: Hand-
[91] Emmanuel Mller, Ira Assent, Ralph Krieger, Stephan Gnnemann, Thomas book of Computational Molecular Biology, 2004.
Seidl, Densest: Density estimation for data mining in high dimensional [119] Amos Tanay, Roded Sharan, Ron Shamir, Discovering statistically signicant
spaces, in: SDM, SIAM, 2009, 173184. biclusters in gene expression data, in: ISMB, 2002, pp. 136144.
[92] Emmanuel Mller, Stephan Gnnemann, Ira Assent, Thomas Seidl, Evaluat- [120] Chun Tang, Li Zhang, Murali Ramanathan, Aidong Zhang, Interrelated two-
ing clustering in subspace projections of high dimensional data, VLDB way clustering: an unsupervised approach for gene expression data analysis,
Endow. 2 (August (1)) (2009) 12701281. in: BIBE, Washington, DC, USA, 2001, IEEE CS, p. 41.
[93] James Munkres, Algorithms for the assignment and transportation problems, [121] Teixeira, Miguel Cacho and Monteiro, Pedro Tiago and Guerreiro, Joana
Soci. Ind. Appl. Math. 5 (1) (1957) 3238. Fernandes and Gonc- alves, Joana Pinho and Mira, Nuno Pereira and dos
[94] T.M. Murali, Simon Kasif, Extracting conserved gene expression motifs from Santos, Sandra Costa and Cabrito, Tnia Rodrigues and Palma, Margarida and
gene expression data, in: Pacic Symposium on Biocomputing, 2003, Costa, Catarina and Francisco, Alexandre Paulo and others. The YEASTRACT
pp. 7788. database: an upgraded information system for the analysis of gene and
[95] Omar Odibat, Chandan K. Reddy, Efcient mining of discriminative co-
genomic transcription regulation in Saccharomyces cerevisiae, Nucleic Acids
clusters from gene expression data, Knowl. Inf. Syst. (2013) 130.
Res. (database issue) (2014).
[96] Yoshifumi Okada, Wataru Fujibuchi, Paul Horton, A biclustering method for
[122] O. Troyanskaya, M. Cantor, G. Sherlock, P. Brown, T. Hastie, R. Tibshirani,
gene expression module discovery using closed itemset enumeration algo-
D. Botstein, R.B. Altman, Missing value estimation methods for DNA micro-
rithm, IPSJ Trans. Bioinf. 48 (SIG5) (2007) 3948.
arrays, Bioinformatics 17 (6) (2001) 520525. http://dx.doi.org/10.1093/
[97] Yoshifumi Okada, Kosaku Okubo, Paul Horton, Wataru Fujibuchi, Exhaustive
bioinformatics/17.6.520.
search method of gene expression modules and its application to human
[123] Heather Turner, Trevor Bailey, Wojtek Krzanowski, Improved biclustering of
tissue data, IAENG Int. J. Comput. Sci. 34 (1) (2007) 119126.
microarray data demonstrated through systematic performance tests, Com-
[98] Patryk Orzechowski, Proximity measures and results validation in bicluster-
put. Stat. Data Anal. 48 (2) (2005), 235254.
ing - a survey of LNCS, Articial Intelligence and Soft Computing, vol. 7895,
[124] Miranda van Uitert, Wouter Meuleman, Lodewyk Wessels, Biclustering
Springer, Berlin Heidelberg (2013) 206217.
sparse binary genomic data, J. Comput. Biol. 15 (10) (2008) 13291345.
[99] Feng Pan, Gao Cong, Anthony K.H. Tung, Jiong Yang, Mohammed Javeed Zaki,
[125] Takeaki Uno, Masashi Kiyomi, Hiroki Arimura, Lcm ver.3: collaboration of
Carpenter: nding closed patterns in long biological datasets, in: ACM
array, bitmap and prex tree for frequent itemset mining, in: OSDM, ACM,
SIGKDD, 2003, pp. 637642.
New York, NY, USA, 2005.
[100] Feng Pan, A.K.H. Tung, Gao Cong, Xin Xu, Cobbler: combining column and
[126] Haixun Wang, Wei Wang, Jiong Yang, Philip S. Yu, Clustering by pattern
row enumeration for closed pattern discovery, in: Scientic and Statistical
similarity in large data sets, in: SIGMOD, ACM, New York, NY, USA, 2002,
Database Management, June 2004, pp. 2130.
[101] Gaurav Pandey, Gowtham Atluri, Michael Steinbach, Chad L. Myers, Vipin pp. 394405.
Kumar, An association analysis approach to biclustering, in: ACM SIGKDD, [127] Shu Wang, Robin R Gutell, Daniel P Miranker, Biclustering as a method for
ACM, New York, NY, USA, 2009, pp. 677686. rna local multiple sequence alignment, Bioinformatics 23 (24) (2007)
[102] Nicolas Pasquier, Yves Bastide, Rak Taouil, Lot Lakhal, Efcient mining of 32893296.
association rules using closed itemset lattices, Inf. Syst. 24 (March (1)) (1999) [128] Zhiguan Wang, Chi Wai Yu, Ray C.C. Cheung, Hong Yan, Hypergraph based
2546. geometric biclustering algorithm, Pattern Recognit. Lett. 33 (12) (2012)
[103] Anne Patrikainen, Marina Meila, Comparing subspace clusterings, IEEE Trans. 16561665.
Knowl. Data Eng. 18 (July (7)) (2006) 902916. [129] Takashi Washio, Hiroshi Motoda, State of the art of graph-based data mining,
[104] Ren Peeters., The maximum edge biclique problem is np-complete, Discrete SIGKDD Explor. Newslett. 5 (July (1)) (2003) 5968.
Appl. Math. 131 (September (3)) (2003) 651654. [130] Peter H. Westfall, S. Stanley Young, Resampling-Based Multiple Testing :
[105] Liuqing Peng, Junying Zhang, An entropy weighting mixture model for Examples and Methods for p-Value Adjustment, John Wiley & Sons, 1993.
subspace clustering of high-dimensional data, Pattern Recognit. Lett. 32 (8) [131] Hu Xia, Jian Zhuang, Dehong Yu, Novel soft subspace clustering with multi-
(2011) 11541161. objective evolutionary approach for high-dimensional data, Pattern Recognit.
[106] Beatriz Pontes, Ral Girldez, Jess S Aguilar-Ruiz, Congurable pattern- 46 (9) (2013) 25622575. http://dx.doi.org/10.1016/j.patcog.2013.02.005.
based evolutionary biclustering of gene expression data, Algorithms Mol. [132] Dong Xin, Zheng Shao, Jiawei Han, Hongyan Liu, C-cubing: efcient compu-
Biol. 8(1) (2013) 4. tation of closed cubes by aggregation-based checking, in: ICDE, IEEE
[107] Ignacio Ponzoni, Francisco Azuaje, Juan Augusto, David Glass, Inferring Computer Society, 2006, p. 4.
adaptive regulation thresholds and association rules from gene expression [133] Hui Xiong, Xiao-Feng Heb, Chris Ding, Ya Zhang, Vipin Kumar, Stephen R
data through combinatorial optimization learning, IEEE/ACM Trans. Comput. Holbrook, Identication of functional modules in protein complexes via
Biol. Bioinf. 4 (4) (2007) 624634. hyperclique pattern discovery, in: Pacic Symposium on Biocomputing,
[108] Amela Preli, Stefan Bleuler, Philip Zimmermann, Anja Wille, 2005.
Peter Bhlmann, Wilhelm Gruissem, Lars Hennig, Lothar Thiele, [134] Hui Xiong, Pang-Ning Tan, Vipin Kumar, Hyperclique pattern discovery, Data
Eckart Zitzler, A systematic comparison and evaluation of biclustering Min. Knowl. Discov. 13 (2) (2006) 219242.
methods for gene expression data, Bioinformatics 22 (June (9)) (2006) [135] Mohammed J. Zaki, Karam Gouda, Fast vertical mining using diffsets, in: ACM
11221129. SIGKDD, ACM, New York, NY, USA, 2003, pp. 326335.
[109] Andreas Rosenwald, George Wright, Wing C. Chan, et al., The use of [136] Mohammed J. Zaki, Ching J. Hsiao, CHARM: An Efcient Algorithm for Closed
molecular proling to predict survival after chemotherapy for diffuse large- Itemset Mining.
B-cell lymphoma, N. Engl. J. Med. 346 (June 25) (2002) 19371947. [137] Hongya Zhao, Kwok Leung Chan, Lee-Ming Cheng, L. Cheng, Hong Yan, A
[110] Swarup Roy, KDhruba Bhattacharyya, KJugal Kalita, Cobi: pattern based co- probabilistic relaxation labeling framework for reducing the noise effect in
regulated biclustering of gene expression data, Pattern Recognit. Lett. 34 (14) geometric biclustering of gene expression data, Pattern Recognit. 42 (11)
(2013) 16691678. (2009) 25782588. http://dx.doi.org/10.1016/j.patcog.2009.03.016.
[111] Akdes Serin, Martin Vingron, Debi: discovering differentially expressed [138] Feida Zhu, Xifeng Yan, Jiawei Han, P.S. Yu, Hong Cheng, Mining
biclusters using a frequent itemset approach, Algorithms Mol. Biol. 6 (2011) colossal frequent patterns by core pattern fusion, in: ICDE, April 2007,
112. pp. 706715.
Rui Henriques received an M.Sc. degree in computer science and engineering from Instituto Superior Técnico (IST), Universidade de Lisboa. He is developing his Ph.D. studies
in the field of learning from high-dimensional and structured data at IST and INESC-ID. He received distinctions for his academic achievements from IST between 2006 and
2008, and a National Award for his merits from Caixa Geral de Depósitos in 2009. He has also been a Business Analyst at McKinsey with wide exposure to real-life projects.

Cláudia Antunes received her Ph.D. from Instituto Superior Técnico (IST, University of Lisbon, Portugal) in the domain of data mining and machine learning, proposing new
methods to deal with temporal data, in particular for mining event sequential patterns. She is currently a Professor at DEI department at IST and the scientific coordinator of
two projects funded by FCT in the areas of domain-driven data mining and educational data mining. Cláudia has been working on methods for general pattern mining, from
transactional to structured data. Her main interests are centered on mining complex knowledge from complex data, with emphasis on the incorporation of background
knowledge in the pattern mining process.

Sara C. Madeira received a (5-year) B.Sc. degree in computer science from the University of Beira Interior, Covilhã, Portugal, in 2000, and the M.Sc. and Ph.D. degrees in
computer science and engineering (CSE) at Instituto Superior Técnico (IST), Technical University of Lisbon, in 2002 and 2008. She is currently an Assistant Professor, at the CSE
department at IST, and a Senior Researcher at INESC-ID, Lisbon. Her research interests include algorithms and data structures, data mining, machine learning, bioinformatics
and medical informatics.
