Review Madeira 2015
Pattern Recognition
journal homepage: www.elsevier.com/locate/pr
Article info
Article history:
Received 1 May 2014
Received in revised form 12 May 2015
Accepted 26 June 2015
Available online 8 July 2015
Keywords:
Biclustering
Pattern mining

Abstract
Mining matrices to find relevant biclusters, subsets of rows exhibiting a coherent pattern over a subset of columns, is a critical task for a wide set of biomedical and social applications. Since biclustering is a challenging combinatorial optimization task, existing approaches place restrictions on the allowed structure, coherence and quality of biclusters. Biclustering approaches relying on pattern mining (PM) allow an exhaustive yet efficient space exploration together with the possibility to discover flexible structures of biclusters with parameterizable coherency and noise-tolerance. Still, state-of-the-art contributions are dispersed and the potential of their integration remains unclear.
This work proposes a structured and integrated view of the contributions of state-of-the-art PM-based biclustering approaches, makes available a set of principles for a guided definition of new PM-based biclustering approaches, and discusses their relevance for applications in pattern recognition. Empirical evidence shows that these principles guarantee the robustness, efficiency and flexibility of PM-based biclustering.
© 2015 Elsevier Ltd. All rights reserved.
* Correspondence to: DEI, IST, Avenida Rovisco Pais, 1, 1049-001 Lisboa, Portugal. Tel.: 351 21 310 0 300; fax: 351 21 841 7 789.
E-mail addresses: [email protected] (R. Henriques), [email protected] (C. Antunes), [email protected] (S.C. Madeira).
¹ Biclustering involves combinatorial optimization to select and group rows and columns, and it is known to be an NP-hard problem (by mapping the task over binary matrices into the problem of finding maximal cliques in weighted bipartite graphs [104]). The problem complexity increases for non-binary settings and when elements are allowed to participate in more than one bicluster (non-exclusive structure) or in no bicluster at all (non-exhaustive structure).
http://dx.doi.org/10.1016/j.patcog.2015.06.018
0031-3203/© 2015 Elsevier Ltd. All rights reserved.
R. Henriques et al. / Pattern Recognition 48 (2015) 3941–3958
Table 1
Relevance of the biclustering task for pattern recognition applications.
Domain | Application [refs] | Relevance of biclustering
Biomedical | Physiological [23,39,44] | Modules of sliding features and partitions of the signal across a subset of cases or stimuli-elicited responses; groups of patients with shared local patterns; markers for phenotype characterization.
Biomedical | Clinical [59,27] | Groups of patients with correlated clinical features or health records (shared treatments, diagnoses, prescriptions and clinical tests); class-conditional profiles for computer-aided diagnosis.
Biomedical | Genomic structural variations [42,67] | Correlated groups of mutations and copy number variations, such as genetic similarities and dissimilarities of different populations.
Biomedical | Biological networks [12] | Modules of genes, proteins or metabolites with cohesive local interaction, using matrices that capture the pairwise connections between all molecular units.
Biomedical | Gene expression [62,82] | Groups of genes involved in functional processes and pathways (cellular responses to growth, development, drugs and disease progression) only active under certain conditions.
Biomedical | Genome-wide [127,124] | Conserved functional subsequences (alignments), factor binding sites and insertion mutagenesis.
Biomedical | Other [37,78,73] | Local regularities in translational, chemical or nutritional data.
Social | Social networks [50] | Groups of individuals with shared interests, correlated activity and/or coherent intercommunication; aggregation of contents based on correlated accessors' profiles, comments and tags.
Social | Text [82] | Groups of content-related documents to support searches, suggestions and tagging (rows in the input matrix denote documents and columns denote words), among others.
Social | (e-)commerce [9] | Hidden browsing patterns containing relationships between sets of (web) users and (web) pages and acquisitions, which are useful for (web) advertising and marketing.
Social | Financial trading [68] | Subsets of indicators producing similar profitability for subsets of trading points (buy and sell signals) in the stock market in order to support buy-and-hold decisions.
Social | Collaborative filtering [33] | Groups of users who share the same rating and behavioral patterns for a subset of all available actions, for recommendation and quality studies.
elements). Additional PM principles can be used to foster scalability, including searches in distributed/partitioned data settings or targeting approximate patterns [54,52].
• dealing with missing and noisy values [62,63]: PM methods can mine transactions with varying length, and therefore a specific element from the input matrix can be associated with zero or multiple values, allowing the removal or bounded estimation of a missing or noisy value;
• an inherent orientation to learn constant models, recently extended to also learn additive, multiplicative, symmetric, order-preserving and plaid models [62,60,63];
• capturing biclusters from patterns with multiple levels of expression [96,101]. This contrasts with the majority of existing approaches, which rely on differential values or a fixed coherency strength [119];
• flexible structures of biclusters (arbitrary positioning of biclusters) and searches (no need to fix the number of biclusters a priori) [96,111];
• annotating the significance of biclusters with PM principles to assess the relevance of patterns [72];
• easy extension to multi-class settings using discriminative PM or classification rules [43,95];
• easy incorporation of PM-based constraints that can be effectively used to guide the search, promoting both efficiency, by pruning the search space, and a focus on non-trivial biclusters [116].

These properties of PM-based biclustering approaches are critical to tackle the problems highlighted in Table 1. Although the latest biclustering advances for pattern recognition are increasingly deterministic [89,110,128,137,47,35,131], they fail to meet several of the enumerated properties of PM-based biclustering. Table 2 pinpoints the benefits of using PM-based biclustering for pattern recognition.

Despite these potentialities, recent surveys on biclustering [46,40,28,114] fail to explore the opportunities associated with PM-based biclustering. Additionally, the existing efforts towards PM-based biclustering provide critical principles that are not yet integrated [14,86,111]. As such, there is still space for new approaches that benefit from the integration of principles provided by these existing contributions as well as from other fields of research. In this context, this work provides three major contributions:
• it motivates, formalizes and provides a qualitative and quantitative assessment of the state-of-the-art algorithms for PM-based biclustering;
• it offers a structured view on how to define, parameterize and extend PM-based biclustering by coherently integrating the available yet dispersed contributions;
• it further surveys PM principles as well as adequate preprocessing and postprocessing criteria to guarantee the robustness, flexibility and scalability of PM-based biclustering across domains.

The paper is organized as follows. The remainder of this section provides background on pattern mining and biclustering, and surveys the contributions from existing PM-based biclustering approaches. Section 2 introduces a consistent set of principles to guide the definition of PM-based biclustering approaches. In particular, Sections 2.1–2.3 cover principles according to three major decision dimensions (mining, mapping and closing), and Section 2.4 compares the behavior of state-of-the-art PM-based biclustering approaches and proposes a set of principles to address their current challenges. Section 3 provides initial empirical evidence of the relevance of the proposed principles. Finally, the implications of this work are synthesized.

1.1. Background on PM-based biclustering

Pattern mining: Frequent patterns are itemsets, rules, subsequences, or substructures that appear in a dataset with frequency no less than a user-specified threshold. Let L be a finite set of items, and P be an itemset, P ⊆ L. A transaction t is a pair (t_id, P) with id ∈ ℕ. An itemset database D over L is a finite set of transactions {t_1, …, t_n}. A transaction (id, P) contains P', denoted P' ⊆ (t_id, P), if P' ⊆ P. The coverage Φ_P of an itemset P is the set of all transactions in D in which the itemset P occurs: Φ_P = {t ∈ D | P ⊆ t}. The support of an itemset P in D, denoted sup(P), can either be absolute, being its coverage size |Φ_P|, or relative, given by |Φ_P|/|D|.
An association rule is defined as an implication of the form P → P', where P, P' ⊆ L and P ∩ P' = ∅. The left-hand side of the rule
Table 2
Benefits of PM-based biclustering for pattern recognition.
Property | Benefit
Exhaustive scalable searches | Delivery of optimality guarantees for large data, such as data from clinical, molecular and social web domains.
Noise robustness | Handling of uncertainty relations observed in social networks [50] and stock markets [68]; artefacts in multivariate physiological data (such as electroencephalograms [41]); experimental errors in molecular arrays [56].
Handling of missing values | Adequate mining of incomplete and/or sparse matrices derived from biological networks, web social contexts, and healthcare data.
Flexible coherency | Constant models for non-differential (yet coherent) functional associations; additive and multiplicative factors to model the distinct responsiveness and experimental bias of biological molecules and physiological signals; symmetries to simultaneously capture activation and repression mechanisms and opposed (yet correlated) regularities associated with trading, tweeting, browsing and (e-)commerce activity; plaid models for overlapping regulatory influence in biological contexts and cumulative effects in social/biological networks [60,62,61].
Parameterizable level of coherency | Dynamic definition of the desirable coherency strength for an adequate multi-level analysis of matrices derived from expression data (optimum number of expression levels [86]), scored networks, collaborative filtering data (grading scale), and physiological signals (adequate resolution [39]).
Flexible structures | Overlapping groups of molecular units, physiological features, patients, web users and transactions with varying sizes and configurations.
Annotated significance | Testing the statistical significance of biclustering solutions (guaranteeing that their coherence does not occur by chance) to further validate their use to support critical decisions, such as medical and financial decisions.
Constraint-driven searches | Discovery of non-trivial biclusters and ability to focus the search on specific biclusters of interest (e.g. specific regulatory behavior, high-order SNPs from genome-wide data, web users with a specific behavior, health records related to particular medical conditions, domain guidance from background knowledge [42,66]).
Biclustering-based classification | Support for classification tasks from matrices with a large number of uninformative elements (benefiting from local views), including computer-aided diagnosis, phenotype discrimination and user recommendations [27,43].
Definition 1.1. Given an itemset database D and minimum support and confidence thresholds θ and φ:
• the frequent itemset mining (FIM) problem consists of computing the set {P | P ⊆ L, sup(P) ≥ θ};
• association rule mining aims to compute {(P, P') | P ⊆ L, P' ⊆ L, sup(P → P') ≥ θ, conf(P → P') ≥ φ}.

A frequent itemset, or pattern, is an itemset with sup(P) ≥ θ. To illustrate these concepts, consider the itemset database D_ex = {(t1, {B, E, G}), (t2, {A, B, C, E, H, J}), (t3, {A, B, D, H, J}), (t4, {D, H, J}), (t5, {A, H, J}), (t6, {A, G})}, with |L| = 12. We have Φ_{B,J} = {t2, t3} and sup({B, J}) = |{t2, t3}|/6 ≈ 0.3. An illustrative rule in D_ex is R1: {H, J} → {A}, with sup(R1) = 0.5 and conf(R1) = 0.75. For θ = 4, the FIM task returns {{A}, {H}, {J}, {H, J}}.

Consider two itemsets P and P', where P' ⊆ P, and a predicate M. M is monotonic when M(P') ⇒ M(P) and anti-monotonic when ¬M(P') ⇒ ¬M(P). FIM approaches rely on these properties: the support of P is bounded by the support of P' and, if P' is not frequent, then P is also not frequent. Table 3 shows three major search variants that rely on these properties.

Since the FIM proposal [2], multiple extensions have been proposed, including principles to enhance the scalability of pattern miners, and condensed and approximate pattern representations [24,54]. Pattern mining has additionally been applied over structured datasets, leading to contributions in different fields, including sequential pattern mining [79], graph mining [129] and cube computation [55].

Biclustering: Biclustering allows the discovery of subspaces, each defining a subset of rows that show a coherent pattern that is observed for a subset of the overall columns.

The homogeneity criterion is commonly guaranteed through the use of a merit function to guide the search [98]. An illustrative merit function is the variance of the values in the rows or columns of the bicluster. Merit functions can define either the homogeneity of each bicluster (intra-bicluster homogeneity) or the homogeneity of a set of biclusters (inter-bicluster homogeneity), allowing some biclusters to deviate from the expected homogeneity as long as the overall criterion is preserved. The merit function is the simplest way to affect the coherency, quality and structure. The coherency of a bicluster is defined by the observed correlation of values (Definition 1.3). Biclusters can follow dense, constant, additive, multiplicative, plaid or order-preserving coherencies, either across rows or columns [82]. The quality of a bicluster is defined by the type and amount of accommodated noise. The structure is defined by the number,² size and positioning of biclusters. Flexible structures are characterized by an arbitrarily high number of (possibly overlapping) biclusters. The statistical significance of a bicluster determines how its probability of occurrence deviates from expectations. Following the taxonomy proposed by Madeira and Oliveira [82], Table 4 synthesizes the main biclustering approaches according to their search paradigm.

Definition 1.3. Let the elements a_ij of a bicluster (I, J) have coherency across rows given by a_ij = k_j + γ_i + η_ij, where k_j is the expected value for column j, γ_i is the adjustment for row i, and η_ij is the noise factor. Given a dataset A and a specific coherency strength δ ∈ [0, max(A) − min(A)], a_ij = k_j + γ_i + η_ij, where η_ij ∈ [−δ/2, δ/2]. The γ factors define the coherency assumption: constant when γ_i = 0, multiplicative if a_ij is better described by k_j γ_i + η_ij, and additive otherwise. A plaid assumption considers the cumulative contributions from multiple biclusters on areas where their rows and columns overlap.
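Definition 1.3 can be turned into a direct membership test for its constant and additive cases. The sketch below is illustrative and is not taken from any of the cited methods: it assumes the row adjustments γ_i are already known (for instance, estimated in an earlier step), and the function name is hypothetical. It relies on a simple fact: values k_j with every η_ij in [−δ/2, δ/2] exist exactly when each γ-adjusted column spans at most δ (take k_j as the midpoint of that column's range).

```python
from typing import Optional, Sequence


def is_coherent_across_rows(
    bicluster: Sequence[Sequence[float]],
    delta: float,
    gamma: Optional[Sequence[float]] = None,
) -> bool:
    """Test a_ij = k_j + gamma_i + eta_ij with every eta_ij in [-delta/2, delta/2].

    gamma=None corresponds to the constant model (all gamma_i = 0); otherwise
    gamma holds known row adjustments for the additive model (an assumption:
    Definition 1.3 does not say how gamma is obtained).
    """
    if gamma is None:
        gamma = [0.0] * len(bicluster)
    # Remove the row adjustment, leaving k_j + eta_ij in every cell.
    adjusted = [[a - g for a in row] for row, g in zip(bicluster, gamma)]
    for j in range(len(adjusted[0])):
        column = [row[j] for row in adjusted]
        # k_j = (max + min) / 2 works iff the column spans at most delta.
        if max(column) - min(column) > delta:
            return False
    return True


# Constant columns with small noise: the column ranges are about 0.3 and 0.5.
print(is_coherent_across_rows([[1.0, 5.0], [1.2, 5.3], [0.9, 4.8]], delta=0.6))
# Additive bicluster: the rows are shifted by gamma = (0, 2, 1).
print(is_coherent_across_rows([[1.0, 5.0], [3.0, 7.0], [2.0, 6.1]], 0.2, [0.0, 2.0, 1.0]))
```

Multiplicative biclusters are deliberately left out of this sketch: dividing by γ_i would also rescale the noise term η_ij, so the δ bound would no longer apply unchanged.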
Table 3
Three major search strategies to perform frequent itemset mining.
Strategy | Description | Optimizations | Observations
Apriori-based [2] | Monotonicity principle (an itemset is a candidate if all its subsets are frequent): (k−1)-itemsets are combined to create new candidate k-itemsets in k scans until no new candidate group can be generated. | Incremental mining; hashing; use of bit-sets; reduced scans; partitioning and sampling; dynamic itemset counting. | Inefficient for dense data (density above 20%).
Pattern growth [1] | Divide-and-conquer without candidate generation and multiple scans. A frequent-pattern tree is built (from an ordered list of frequent items) and mined (based on prefix paths co-occurring with growing suffix patterns). By using the least frequent items as a suffix, a good selectivity is achieved. | Depth-first tree generation; alternative trees; combined bottom-up and top-down traversals; array-based structures. | Not able to deliver the supporting transactions of a pattern (required for biclustering). Adequate for dense matrices and low supports.
Vertical projection [135] | Eclat, a representative vertical method, builds the transaction-set for each item and grows the itemsets under a depth-first strategy (similar to FP-growth) by intersecting transaction-sets to avoid multiple scans. | Specialized structures; bit-set operations. | Optimized for flattened matrices (n > m).
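To make the Apriori row of Table 3 concrete, the following minimal sketch mines the example database D_ex of Definition 1.1 level by level: frequent (k−1)-itemsets are joined into k-candidates, candidates with an infrequent subset are pruned, and the survivors are kept if their absolute support |Φ_P| reaches θ. It is a didactic, unoptimized version (it rescans the database for every candidate and uses none of the hashing, bit-set or partitioning optimizations listed in the table), and all function names are illustrative.

```python
from itertools import combinations

# D_ex from Definition 1.1: transaction id -> itemset.
D_EX = {
    1: {"B", "E", "G"},
    2: {"A", "B", "C", "E", "H", "J"},
    3: {"A", "B", "D", "H", "J"},
    4: {"D", "H", "J"},
    5: {"A", "H", "J"},
    6: {"A", "G"},
}


def coverage(itemset, db):
    """Phi_P: ids of the transactions that contain every item of P."""
    return {tid for tid, items in db.items() if set(itemset) <= items}


def support(itemset, db):
    """Relative support |Phi_P| / |D|."""
    return len(coverage(itemset, db)) / len(db)


def confidence(lhs, rhs, db):
    """conf(lhs -> rhs) = sup(lhs U rhs) / sup(lhs)."""
    return len(coverage(set(lhs) | set(rhs), db)) / len(coverage(lhs, db))


def apriori(db, theta):
    """All itemsets with absolute support >= theta, mined level-wise."""
    items = sorted(set().union(*db.values()))
    level = [frozenset([i]) for i in items if len(coverage([i], db)) >= theta]
    frequent = list(level)
    k = 2
    while level:
        previous = set(level)
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            # Join step plus Apriori pruning: every (k-1)-subset must be frequent.
            if len(union) == k and all(
                frozenset(sub) in previous for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        level = [c for c in candidates if len(coverage(c, db)) >= theta]
        frequent.extend(level)
        k += 1
    return frequent


print(sorted("".join(sorted(p)) for p in apriori(D_EX, theta=4)))  # ['A', 'H', 'HJ', 'J']
print(support({"H", "J", "A"}, D_EX), confidence({"H", "J"}, {"A"}, D_EX))  # 0.5 0.75
```

The output reproduces the worked example: {A}, {H}, {J} and {H, J} are the only itemsets with |Φ_P| ≥ 4, and rule R1 has support 0.5 and confidence 0.75. Because coverage() returns the transaction ids, each frequent itemset also carries its supporting rows, which is the information biclustering needs.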
Table 4
Classes of biclustering approaches according to merit-guided searches and optima guarantees.
Approach | Optima guarantees
Divide-and-conquer approaches that exploit the matrix recursively, with the branching following a global merit function [57,128,137]. Although efficient, the structure of biclusters is restrictive and the initial assumptions can easily lead to missing relevant biclusters. | Local optima (local searches dependent on initial assumptions and convergence behavior)
Greedy iterative approaches, with the selection, addition and removal of rows and columns being performed until a local merit function is maximized [35,89,131,94,15]. | Local optima (as above)
Two-way clustering approaches under merit functions to produce the clusters on both dimensions of the data matrix and to derive biclusters from their combinations [49,120,47]; stochastic approaches that model data with a multivariate distribution [105,112,17,113] and learn a parametric model that maximizes a merit function. This model is used to derive biclusters. | Distance-based guarantees, as learners rely on approximative views (clustering abstractions or generative models)
Ensemble methods [56] that use a merit function to aggregate a large set of biclustering solutions from the iterative application of multiple biclustering approaches. | Dependent on the selected approaches
Exhaustive approaches under constraints (e.g. fixed number of biclusters, differential expression) [119,126,110], which rely on heuristics based on merit functions to guide the space exploration. | Global optima
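Most approaches in Table 4 are steered by a merit function. As a minimal illustration of the variance-based example mentioned in Section 1.1, the sketch below scores a bicluster by the mean variance of its columns (lower means more homogeneous under a constant-columns view) and scores a set of biclusters by averaging those values, which is one simple way to obtain an inter-bicluster criterion. Both the column-wise orientation and the plain averaging are assumptions made for illustration, and the function names are hypothetical.

```python
from statistics import pvariance
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
Bicluster = Tuple[List[int], List[int]]  # (row indexes I, column indexes J)


def intra_merit(matrix: Matrix, rows: List[int], cols: List[int]) -> float:
    """Mean population variance of the bicluster's columns (0 = constant columns)."""
    variances = [pvariance([matrix[i][j] for i in rows]) for j in cols]
    return sum(variances) / len(variances)


def inter_merit(matrix: Matrix, biclusters: Sequence[Bicluster]) -> float:
    """Average intra-bicluster merit over a whole biclustering solution."""
    scores = [intra_merit(matrix, rows, cols) for rows, cols in biclusters]
    return sum(scores) / len(scores)


A = [[1, 1, 5],
     [1, 1, 9],
     [7, 3, 5]]
print(intra_merit(A, [0, 1], [0, 1]))                        # 0.0
print(inter_merit(A, [([0, 1], [0, 1]), ([0, 1, 2], [0])]))  # 4.0
```

The averaged inter-bicluster score shows the trade-off described in Section 1.1: the second bicluster is far from homogeneous (its column variance is 8), yet the solution as a whole can still satisfy a threshold of, say, 5.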
PM-based biclustering: While traditional biclustering approaches rely on flexible merit functions to guide the space exploration, PM-based approaches require these functions to be defined in terms of support and, possibly, confidence or other interestingness metrics. This restriction enables a scalable exhaustive space search that produces an arbitrarily high number of biclusters within a flexible structure.

Definition 1.4. Let A be a matrix whose values in ℝ are assigned to a set of items L. A bicluster under a constant model can follow: an overall orientation, where a_ij ∈ L; a column-based orientation, where a_ij = k_j and k_j ∈ L; or a row-based orientation, where a_ij = k_i and k_i ∈ L. A bicluster following an additive (or multiplicative) model has a_ij = k_j + γ_i (or a_ij = k_j γ_i), where k_j ∈ ℝ and γ_i ∈ ℝ define the column and row contributions. A bicluster under a symmetric model considers symmetries either on rows, c_i a_ij, or on columns, c_j a_ij, where c_i, c_j ∈ {−1, 1}.

Definition 1.5. Consider a matrix A whose elements are the concatenation of the observed values a_ij ∈ L with their column (or row) indexes, and let the coverage Φ_P of an itemset P in A be its set of row (or column) indexes. A set of biclusters ∪_k (I_k, J_k) can be derived from a set of frequent itemsets ∪_k P_k by mapping (I_k, J_k) = B_k, where B_k = (Φ_{P_k}, P_k), to compose biclusters with coherency across rows, or (I_k, J_k) = (P_k, Φ_{P_k}) for column-coherency.

Two classes of PM-based biclustering approaches can be considered: (1) a first class targeting discrete matrices by using as-is pattern miners, and (2) a second class targeting numeric matrices by extending methods based on the introduced monotonic (or Apriori) property [2]. The first class of methods relies on an itemization step followed by the application of FIM under a low support threshold. The itemization step maps a real-valued or discrete matrix into an itemset database. For real-valued matrices, normalization and discretization procedures are applied. Then, the discrete value of each element is concatenated with its column index. Each transaction of the target itemset database corresponds to a row with these new values. FIM is then applied over this database to mine frequent patterns for composing biclusters with coherency across rows. The second class of methods relies on variants of the FIM task to learn frequent patterns directly from the real-valued matrix. In both classes, the coherency strength is implicitly defined by the number of items or the maximum allowed distance. Biclusters with coherency across columns can be mined using the transposed matrix. Finally, biclusters with coherent values overall can be discovered by mining one item (or range of values) at a time. Fig. 1 illustrates how to deliver these different types of biclusters using frequent patterns when considering the constant model.

1.2. Related work

To our knowledge, BicPAM [62], BiModule [96], DeBi [111], the method of Bellay et al. [14], GenMiner [86] and BiP [60] are the state-of-the-art methods for the first class of PM-based biclustering. BiModule [96,97] allows a parameterized multi-value itemization of the input matrix to discover constant biclusters derived from (closed) frequent
Fig. 1. Mining biclusters with constant assumptions over itemset matrices. To discover biclusters with constant values on the rows, the input matrix needs to be itemized. Column identifiers are combined with the observed values, and FIM is applied under a parameterizable support threshold (θ = 2/4, |Φ_P| ≥ 2). Constant values on columns can be mined using the transposed matrix. To find biclusters with constant values overall, each item needs to be mined separately. In each iteration, only the elements containing the selected item are included in the transactions.
patterns using the LCM miner [125]. DeBi [111] derives biclusters from (maximal) frequent patterns mined over binarized matrices using the MAFIA miner [22], and places key post-processing principles to adjust them in order to guarantee their statistical significance. The recently proposed BicPAM [62], parameterized with the F2G miner [65] by default, extends the constant assumption of previous approaches to find biclusters with symmetric, additive and multiplicative factors by performing iterative corrections on the input matrix. BicPAM also overcomes discretization problems by introducing the possibility to assign multiple discrete values to a single element, and offers new strategies to robustly handle noise and missing values. The method of Bellay et al. [14] uses the Apriori miner [2] with additional principles to evaluate the functional coherency of the discovered biclusters against the background noise. This is one of diverse PM-based attempts to exhaustively discover dense biclusters in either unweighted networks [13,90,133,80] or, more interestingly, scored networks [32,30]. GenMiner [86] includes external knowledge within the input matrix to derive biclusters from association rules that relate annotations (external groupings of rows or columns) with clusters derived from (closed) frequent patterns using CLOSE [102]. BiP [60] is prepared to discover plaid models by relying on noise-tolerant association rules to recover apparently noisy areas caused by cumulative effects on the overlapping areas between biclusters.

The itemization step is optional for the second class of methods [8]. To our knowledge, RAP [101], RCB discovery [8] and ET-bicluster [52] are the state-of-the-art methods here. RAP [101] plugs in an adapted range-based metric to mine constant biclusters on rows (or columns), while RCB discovery targets biclusters with constant values overall [8]. ET-bicluster extends the previous approaches to discover noisy biclusters, although an exhaustive enumeration of biclusters is not guaranteed [52]. Alternative support metrics with dedicated Apriori-based searches have additionally been proposed [69,115,53].

2. PM-based biclustering

We propose a structured view of PM-based biclustering according to a set of decision dimensions. We rely on state-of-the-art literature to characterize each dimension. These dimensions gather principles on different steps with impact on the biclusters' type, structure and quality, as illustrated in Figs. 2 and 3. Throughout this paper we define a set of principles for each step.

Different options for PM-based biclustering can be grouped according to their three major steps: mapping (preprocessing), mining (pattern discovery), and closing (postprocessing). The core step is the mining step, corresponding to the application of the target pattern miners. This step is driven by the chosen paradigm, target patterns and search properties. The mapping step (optional for methods able to deal with non-discrete data) is responsible for the itemization of a (real-valued) matrix and for other preprocessing options to handle outlier, noisy and missing elements. Finally, the closing step includes the postprocessing of the mined patterns to affect the structure and quality of the target biclustering solutions.

These options impact the homogeneity of the biclustering solutions. The homogeneity criteria can be intentionally controlled to search for biclusters with a specific coherency (underlying pattern correlation), structure (number, size and positioning of biclusters) and quality (amount and type of noise within a particular bicluster or set of biclusters).

Section 2.1 covers the core PM-based biclustering paradigms. Sections 2.2–2.3 detail the remaining mapping and closing dimensions and discuss their implications for the behavior of PM-based approaches.

2.1. Mining options: discovery of biclusters using pattern mining

Flexible scenarios where the number and position of biclusters are not constrained require efficient algorithms [111,81]. The adequate use of PM approaches is critical to guarantee the flexibility and scalability of the biclustering algorithm, and depends essentially on four variables discussed below: (1) the chosen PM-based approach to biclustering, (2) the application schema, (3) the target pattern representations, and (4) the search strategies.

2.1.1. Mining approaches to compose biclusters

In what follows, we overview the state-of-the-art options using: (1) frequent pattern mining, (2) association rule mining, (3) structured pattern mining, and (4) hybrid approaches to compose biclusters.

2.1.1.1. Frequent pattern mining. Two main strategies can be considered: (1) relying on the frequent itemset mining (FIM) support metric as-is; and (2) defining new (anti-)monotonic support metrics for a dedicated yet efficient search.

Fig. 1 illustrates how PM can be applied to find biclusters with constant items overall, on rows and on columns. When ignoring the closing step, the discovered biclusters are the frequent itemsets. The support threshold defines the minimum number of rows in a
Fig. 3. Structured view of PM-based biclustering: illustrative options across the major dimensions. It groups critical decision dimensions (corresponding to either a row, a column or a cell of the framework) to support the design of PM-based biclustering approaches. A set of principles for each dimension is illustrated and detailed throughout this work for each biclustering step (mining, mapping and closing) and biclustering goal (defined according to a specific type, structure and quality of biclustering solutions).
bicluster. By decreasing this threshold, we degrade the efficiency of the task but search for a broader set of biclusters with smaller sizes. In the context of gene expression, this is critical since small groups of genes can be functionally related. Additionally, the search can prune itemsets below a minimum number of columns and above a maximum number of rows and columns.

From the point of view of an itemized database, FIM-based biclusters are perfect biclusters, that is, they do not allow value variations in any of their elements. In contrast, from the point of view of the input real-valued matrix, these biclusters can handle noise, as different values may be assigned to the same item. The number of items can be flexibly parameterized to control the level of noise-tolerance, which contrasts with traditional biclustering approaches over discrete matrices³ [94,119]. Although BiModule [97,96] allows a parameterizable number of items and support threshold, the structural data noise and the applied itemization procedure often lead to the partitioning of large biclusters into smaller ones (many of which are filtered out as they no longer satisfy the support criterion). In contrast, although DeBi [111] and the method of Bellay et al. [14] alleviate this problem by providing postprocessing strategies to improve the functional coherence of the discovered biclusters, they require the input data to be binarized.

FIM-based approaches suffer from the risk of assigning elements with similar real values to different items. We refer to this drawback as the items-boundary problem. In order to address this problem, the notion of support of an itemset can be redefined. As long as the new support metric is (anti-)monotonic, it can be efficiently included within Apriori-based frameworks [101]. Patterns are thus generated using a breadth-first, level-wise pattern tree.

Han et al. proposed Min-Apriori [53], an algorithm to deal with ordinal items. Steinbach et al. [115] introduced a framework that generalizes the notion of support to extend association analysis to continuous-based patterns. An alternative support function [69] has been proposed to mine hyperclique patterns (groups of strongly related columns or rows) over numeric matrices. Calders et al. [25] proposed the use of rank-based measures to score the similarity of sets of numeric attributes within new support metrics by extending Kendall's τ, Spearman's ρ, and Spearman's Footrule F correlation metrics [71]. Here, efficient algorithms are designed to deal with the ranks of attribute values, but not with the original numeric values. However, these approaches do not capture key properties of real-valued matrices, such as the need to ensure that the values of items in a transaction are within a range, to guarantee coherence, and to distinguish positive from negative values.

More recent approaches propose range-based support metrics to either discover coherency on rows, such as RAP [101]. RAP is defined under a sign-coherence constraint, enforcing that a transaction can only contribute to the support of a pattern if the values of all the items in it have the same sign.⁴ An alternative, RCB

³ Illustrating, xMotif [94] relies on greedy search and uses a size merit function and a noise threshold to guarantee the discovery of large and interesting biclusters, and SAMBA-based approaches [119] map binarized matrices into a weighted bipartite graph to find subgraphs that maximize a weight merit function.
⁴ For a matrix A = (X, Y) and I ⊆ X, J ⊆ Y, the support metric is defined as sup(J) = Σ_{i∈X} S(i, J), with:
S(i, J) = min_{j∈J} |a_ij|, if max_{j∈J} a_ij − min_{j∈J} a_ij ≤ α · min_{j∈J} |a_ij| ∧ (∀j a_ij > 0 ∨ ∀j a_ij < 0);
S(i, J) = 0, otherwise.
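The range-based support of footnote 4 can be sketched as follows. This is a minimal illustration, not RAP's implementation: it assumes, as in RAP's formulation, a user-set tolerance alpha that bounds the value range of a row over the columns J relative to min_j |a_ij|, and it returns min_j |a_ij| as the row's contribution so that negatively signed rows also count positively. J is assumed non-empty, and the function names are hypothetical. Because adding a column can only widen the range, lower min_j |a_ij|, or break sign coherence, each row's contribution can only shrink, which keeps the measure anti-monotonic and therefore usable inside an Apriori-style search.

```python
from typing import Sequence


def row_contribution(values: Sequence[float], alpha: float) -> float:
    """S(i, J) for one row restricted to the columns J (values must be non-empty)."""
    same_sign = all(v > 0 for v in values) or all(v < 0 for v in values)
    if not same_sign:
        return 0.0
    smallest_magnitude = min(abs(v) for v in values)
    # Range condition: max - min must not exceed alpha * min |a_ij|.
    if max(values) - min(values) <= alpha * smallest_magnitude:
        return smallest_magnitude
    return 0.0


def range_support(matrix: Sequence[Sequence[float]], cols: Sequence[int], alpha: float) -> float:
    """sup(J): sum of the row contributions over all rows of the matrix."""
    return sum(row_contribution([row[j] for j in cols], alpha) for row in matrix)


A = [[2.0, 2.2, -1.0],
     [-3.0, -3.3, 5.0],
     [1.0, 4.0, 2.0]]
print(range_support(A, [0, 1], alpha=0.2))     # rows 0 and 1 contribute 2.0 + 3.0
print(range_support(A, [0, 1, 2], alpha=0.2))  # adding column 2 breaks sign coherence
```

In this example the negatively valued row 1 still contributes to the support of columns {0, 1}, while the superset {0, 1, 2} has zero support, as the anti-monotonic property requires.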
2.1.1.4. Hybrid approaches. Biclustering can rely on multiple types of patterns discovered by different PM approaches. Valid options include the definition of ensemble methods that combine plain and structured patterns or the output of multiple PM methods (parameterized with different support-confidence thresholds). Frequent itemsets can also be used to produce an initial solution, while rules can subsequently be mined to shape the discovered biclusters by accommodating noise. An alternative ensemble model can rely on the multiple results from the iterative parameterization of a PM method with different PM-based constraints of interest. To our knowledge, these hybrid possibilities have not been systematically studied in the literature.

2.1.2. Application schema

The previous pattern mining approaches can be applied iteratively with a decreasing support threshold until a stopping criterion is met [62]. BicPAM makes available distinct stopping criteria, including a minimum coverage of the elements in the input matrix by the discovered biclusters or, alternatively, an approximate number of biclusters (after or prior to postprocessing) [62]. Such criteria can either be driven by user expectations or dynamically derived from the properties of the input matrix [63].

Furthermore, iterative corrections can be applied on the matrix to enable the discovery of more flexible coherencies. BicPAM makes use

Given an itemset database D_ex = {{A, B, H, J}, {D, H, J}, {C, D, H, J}} and thresholds θ = 2 (|Φ_P| ≥ 2) and |P| ≥ 2, there is one maximal frequent itemset ({D, H, J}) and there are two closed frequent itemsets ({D, H, J} and {H, J}). The selection of the pattern representation essentially depends on the type and structure of the target biclusters, and on the post-processing needs.

Maximal itemsets for biclustering, such as those used in DeBi [111], are associated with biclusters whose number of columns is maximized. Such flattened biclusters are only of interest when an extension step is performed to include new rows. However, since both vertical and smaller biclusters are lost, this representation leads to incomplete solutions. The opposite alternative is the use of all frequent itemsets for biclustering. This solution leads to a high number of potentially redundant biclusters (those contained in another bicluster), which can degrade the performance of the mining and closing steps. Finally, the search for closed itemsets, as in the FIM-based BiModule [96] and the rule-based GenMiner [86], allows the discovery of overlapping biclusters whenever a reduction in the number of columns results in a higher number of rows. Closed pattern solutions thus enable the return of all maximal biclusters (the set of biclusters that are not included in other biclusters). The properties of these three alternative representations are illustrated in Fig. 7.

2.1.4. Search strategies
of the observed differences and of the least common divisor between The choice of the search strategy depends essentially on the
the observed values for a given column (or row) in the matrix in target biclustering task and on the properties of the considered
order to perform iterative corrections across rows (or columns) and implementation. Generally, PM searches are centered on comput-
thus identify shifting and scaling factors. The removal of these factors ing the set of frequent patterns, which is the core task of all
in the matrix allows the discovery of additive models and multi- pattern miners.
plicative models [62]. Similarly, BicPAM can also rely on combinatorial The choice of whether to use a vertical or an horizontal data
sign-adjustments across rows (or columns) to model symmetries, and format depends essentially on the type of biclusters we are
integrate them with shifting and scaling factors [62]. Pruning targeting. To nd constant items on the rows or on both dimen-
strategies are considered to avoid redundant calculus and reduce sions, we usually benet from using searches over horizontal data.
the computational complexity of these iterative corrections. This is particularly true for matrices where the total number of
BiP relies on the converging application of PM for learning plaid rows largely exceeds the total number of columns. To nd constant
models [60], based on the observation that, by incrementally items on the columns (when n 4 m), a vertical data format should
removing overlapping contributions, the residual values become be the choice, as the performance of searches using the horizontal
closer to the underlying unstructured noise. For this aim, BiP format degrades exponentially with the increase in the number
performs checks between iterative applications of PM searches in of items.
order to recover areas explained by cumulative effects (contribu-
tions on overlapping areas between biclusters) and to remove
noisy areas that are not described by a plaid assumption. Without
degradation of efciency levels, it also provides relaxations to
model overlapping contributions characterized by noisy and non-
linear cumulative effects [60].
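The D_ex example (support threshold of 2, minimum pattern size of 2) can be reproduced with a short Python sketch that also shows the vertical data format: each item is stored with the set of transaction ids (tidset) that contain it, and the support of an itemset is the size of the intersection of its items' tidsets. For clarity, the sketch enumerates candidate itemsets by brute force instead of using the depth-first search of a real vertical miner such as Eclat or Charm; all function names are illustrative.

```python
from itertools import combinations

# Itemset database D_ex from the text; transaction ids are list positions.
D_EX = [{"A", "B", "H", "J"}, {"D", "H", "J"}, {"C", "D", "H", "J"}]


def vertical(db):
    """Vertical format: map each item to the set of transaction ids containing it."""
    tids = {}
    for t, items in enumerate(db):
        for item in items:
            tids.setdefault(item, set()).add(t)
    return tids


def frequent_itemsets(db, min_sup, min_size):
    """Brute-force enumeration; support = intersection of the items' tidsets."""
    tids = vertical(db)
    items = sorted(tids)
    result = {}
    for k in range(min_size, len(items) + 1):
        for combo in combinations(items, k):
            support = set.intersection(*(tids[i] for i in combo))
            if len(support) >= min_sup:
                result[frozenset(combo)] = frozenset(support)
    return result


def closed(freq):
    """Patterns with no proper frequent superset having the same tidset."""
    return {p for p, s in freq.items() if not any(p < q and s == freq[q] for q in freq)}


def maximal(freq):
    """Patterns with no proper frequent superset at all."""
    return {p for p in freq if not any(p < q for q in freq)}


freq = frequent_itemsets(D_EX, min_sup=2, min_size=2)
print(sorted(sorted(p) for p in closed(freq)))   # [['D', 'H', 'J'], ['H', 'J']]
print(sorted(sorted(p) for p in maximal(freq)))  # [['D', 'H', 'J']]
```

Each pattern, paired with its tidset, reads directly as a bicluster: {H, J} with all three rows and {D, H, J} with the last two, while the maximal representation keeps only the latter.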
The choice of whether to use an Apriori-based, pattern-growth or combined approach depends on three variables: (1) the type of PM-based approach (range-based approaches cannot rely on pattern-growth methods), (2) the density of the resulting itemset matrix, and (3) the ability to retrieve the supporting transaction set for each frequent itemset without degrading the overall efficiency. This analysis is detailed in the supplementary material. When biclusters with constant values overall are targeted, the resulting matrices are sparser (Fig. 1) and, therefore, an Apriori strategy is preferred. For denser matrices, pattern-growth strategies are preferable.
In particular, the discovery of patterns together with their supporting transactions has been tackled using extensions over Apriori and vertical-based algorithms that rely on bitset vectors to capture the supporting transactions per pattern [86,111,96]. However, bitset vectors raise memory and time efficiency problems for large and dense datasets. Henriques et al. [65] study efficient alternatives and propose a pattern-growth algorithm to discover full-patterns with heightened time and memory efficiency.
An additional key aspect is the chosen implementation. The use of bit-set operations and either a reduced number of scans or efficient tree traversals is usually key for top performance. Efficient implementations include algorithms to mine closed itemsets under an Apriori search (LCM [125], Charm [136]), vertical search (TD-Close [77]) or pattern-growth search (FPClose [51]); and to mine maximal itemsets under an Apriori search (MaxMiner [11]), vertical search (Mafia [22]) or pattern-growth search (AFOPT [76]). Similarly, multiple implementation variants can be found to compose association rules [138,87] and to mine structured patterns. For instance, sequence miners can use Apriori, pattern-growth or vertical searches, and find closed and maximal sequential patterns [79]. DeBi [111], BiModule [97] and GenMiner [86] use the Mafia [22], LCM [125] and CLOSE [102] implementations, respectively. Range-based variants use Apriori [2]. Additional principles proposed in the literature [138,99,100] can be used to guarantee the scalability of the search when mining large biclusters from dense or large data settings.

2.2. Mapping options: preprocessing input data
The previous section covered essential mining options with impact on the coherency, structure and quality of PM-based biclustering solutions. However, their optimal application requires the input matrices to be correctly normalized5 and (depending on the PM-based approach) discretized. The problem of defining an adequate coherency strength is identical for range-based approaches (distance thresholds as a function of data domain values) and discrete PM-based approaches (number of items). Although discretization may imply loss of information, it alleviates the noise dilemma [26,31].
Since discretization is a key step for the class of PM-based methods that rely on itemset databases, with key implications on the target solution, we study two variables: (1) the number of items (also referred to as symbols or expression levels) and (2) the method used to map the normalized real-valued matrix into an itemset database. A sensitivity analysis on the impact of the number of items on the quality and size of biclusters was first performed in Bidens [83] and BiModule [96]. Fig. 8 illustrates how simple discretization options can lead to different solutions. The itemization (concatenation of the item with the column index) implies that the resulting number of items is at most m × l, where l is the number of items specified by the user. The use of fixed ranges (potentially equal-sized intervals between the observed maximum and minimum) is the simplest discretization option, but it usually leads to a strongly unbalanced distribution of items and it is prone to the items-boundary problem. The first problem can be corrected using a percentage-based method for the depth partitioning of items, which leads to intervals containing approximately the same number of elements. Alternatively, distribution-based discretizations combine the properties of the previous solutions. In the example, a Gaussian distribution is able to minimize the loss of potentially relevant biclusters. By finding multiple suitable curves (for each row or column) or one suitable overall curve to approximate the matrix, one can either use threshold methods [26,31] or compute the statistical cutoff points to create equally-distributed areas. Nordi [86] is a Gaussian-based method used in GenMiner [86] that statistically detects outliers (using the Grubbs method), applies normality tests (using QQ-plot and Lilliefors) to transform the initial row distributions into a more Normal distribution, and computes cutoff thresholds using the z-score methodology. In the presence of matrices with multimodal distributions, more elaborate methods based on a mixture of distributions must be considered.
A unique advantage of PM-based approaches is that they can easily address the items-boundary problem of discretization procedures by assigning two or more items to an element of the original matrix whose real value is near a discretization boundary (or cut-off point). This is possible since PM is able to learn from transactions (mapped from the rows of an itemized matrix) with an arbitrary number of items. Despite the critical relevance of this strategy, its impact has not yet been systematically assessed.
Alternative discretization options that aim to deal with this problem include: (1) adaptive discretization based on a dynamic threshold selection policy [107]; (2) statistical methods to detect differential activity of elements as the basis to create partitions [31] (commonly adopted as a binarization method); (3) distance-based subspace clustering models [75] to flexibly partition the values while preserving meaningful and significant clusters; (4) fuzzification approaches, where a continuous domain is partitioned into fuzzy sets, shown to be more robust to noise than other simple binning techniques [47]; and (5) supervised discretization methods [45] (when descriptive labels per row or column are present or computed using clustering methods), where a row or column is partitioned into a number of disjoint intervals in such a way that the entropy of the partition is minimal.
An additional preprocessing concern arises for matrices with an arbitrarily high number of missing elements. Although multiple imputation methods have been proposed [122,38,58] to alleviate this problem, they can introduce additional noise and undesirably affect the homogeneity of the output biclusters. BicPAM [62] and BicSPAM [63] consider varying relaxations to overcome this problem, including a relaxed setting where the missing element is replaced by all the available items (leading to transactions of varying size), and a medium-constrained setting that considers a parameterizable number of items around its value estimate.

2.3. Closing options: postprocessing biclustering solutions
PM-based biclustering approaches produce exhaustive solutions with flexible structures (arbitrary number and positioning of biclusters). These non-exhaustive (rows and columns need not belong to any bicluster), non-exclusive structures, where overlapping is allowed, are the most suitable option to tackle the applications listed in Table 1.
Two key challenges of exhaustive solutions are handling noise and dealing with the potential explosion of valid biclusters. Part of
these questions can be answered in the mapping step by selecting the number of items and a discretization setting able to handle the items-boundary problem. However, postprocessing may be required to overcome the following two challenges of the noise dilemma. The first results from a too restrictive noise tolerance, commonly associated with a high number of items, which leads to many small biclusters. The second is related to heightened

5 Normalization options are often applied before biclustering to enhance differences across rows and/or columns and, consequently, to improve the ability to discover biclusters. de Souto et al. [34] compare three normalization procedures (z-score, scaling and rank-based procedures) over gene expression datasets using alternative clustering algorithms. Additional methods for preprocessing the input matrix have been reported [118,83,25].
levels of noise allowance, commonly occurring in binarized partitions or through the use of rule-based approaches under a relaxed level of confidence. To handle these challenges, we propose a set of criteria structured according to three major postprocessing steps (merging, filtering and extension), described below.

Merging options: Merging biclusters may serve two goals: noise allowance (to avoid solutions composed solely of small biclusters) and manipulation of the overall biclustering structure. The first goal is driven by the observation that when two biclusters share a significant area, it is probable that their merging composes a larger bicluster that still respects some homogeneity criteria. Commonly, such decomposition is related to the items-boundary problem or to a missing value. The simplest merging criterion is either to rely on the overlapping area (as a percentage of the smaller bicluster), to compute the overall noise percentage after the merging, or both. Additional homogeneity criteria relying on the real values of the input matrix can be formulated. Henriques et al. [61] performed a comparison between three distinct efficient merging techniques. Bellay et al. [14] proposed a Markov Clustering (MCL) algorithm both to summarize biclustering solutions and to allow the creation of larger biclusters.

Filtering options: Filtering is needed at two levels: (1) at the row/column level and (2) at the bicluster level. The first type of filtering is needed to exclude rows or columns from a particular bicluster in order to improve its homogeneity. This is usually the case when a low number of items is considered, leading to highly noise-tolerant biclusters. For this purpose, we can rely on statistical tests on each row and column of a particular bicluster to identify removals [111]; use existing greedy-iterative approaches to maximize a merit function until a parameterizable reduction in size is verified [29]; or discover patterns under more restrictive conditions (such as higher support and confidence thresholds) and use them to guide the removal of rows and columns [62,63].
The second type of filtering is required to guarantee the dissimilarity of biclusters, removing biclusters partially contained in larger biclusters. BiModule [96] filters small biclusters by sorting biclusters according to the score $a_{IJ}\log_2|I|\log_2|J|$ and removing biclusters whose cells overlap by more than 25% with a higher-scored bicluster. The work by Bellay et al. [14] separates biclusters that represent biological phenomena from false discoveries (emerging from the background data distributions) using randomized data scores.

Extension options: Three optional and non-exclusive strategies can be used to extend the discovered biclusters so that the resulting solution still satisfies some pre-defined homogeneity or significance criteria. The first strategy consists of using statistical tests to include rows or columns in each bicluster. DeBi [111] uses statistical tests to extend biclusters obtained over binary matrices by evaluating the association strength between the key columns of a bicluster and a new row using Fisher's exact test for independence on a contingency table. This guarantees that each row in the bicluster shows a statistical difference between the columns in the bicluster and the columns not in the bicluster, leading to more functionally coherent biclusters. The second strategy is to rely on traditional merit functions for further (greedy) extensions over PM-based biclusters. The third strategy is to discover patterns under more relaxed criteria (such as lower support-confidence thresholds) and use them to guide the extension step [62]. When considering lower supports, new columns and rows can be added to the original frequent patterns. Similarly, more relaxed association rules, with less restrictive ways to group the antecedent-consequent, can be used to guide extensions.
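The simplest merging criterion described above (overlapping area as a percentage of the smaller bicluster) can be sketched in Python as follows. A bicluster is assumed to be a non-empty (rows, columns) pair; two biclusters whose shared cells reach a threshold fraction of the smaller one are replaced by the bicluster spanning the union of their rows and columns, until no pair qualifies. The 0.7 default mirrors the 70% overlap used in the experiments, the second criterion (noise percentage after merging) is omitted, and this is not the implementation of any of the cited merging techniques.

```python
from typing import FrozenSet, List, Tuple

Bicluster = Tuple[FrozenSet[int], FrozenSet[int]]  # (rows, columns)


def area(b: Bicluster) -> int:
    return len(b[0]) * len(b[1])


def overlap_ratio(b1: Bicluster, b2: Bicluster) -> float:
    """Shared cells as a fraction of the smaller bicluster's area."""
    shared = len(b1[0] & b2[0]) * len(b1[1] & b2[1])
    return shared / min(area(b1), area(b2))


def merge_biclusters(bics: List[Bicluster], threshold: float = 0.7) -> List[Bicluster]:
    """Repeatedly replace a qualifying pair by the union of its rows and columns."""
    bics = list(bics)
    merged = True
    while merged:
        merged = False
        for i in range(len(bics)):
            for j in range(i + 1, len(bics)):
                if overlap_ratio(bics[i], bics[j]) >= threshold:
                    union = (bics[i][0] | bics[j][0], bics[i][1] | bics[j][1])
                    bics = [b for k, b in enumerate(bics) if k not in (i, j)] + [union]
                    merged = True
                    break
            if merged:
                break  # restart the scan over the updated list
    return bics


b1 = (frozenset({0, 1, 2, 3}), frozenset({0, 1, 2}))
b2 = (frozenset({1, 2, 3}), frozenset({0, 1, 2, 3}))  # shares 9 of 12 cells with b1
b3 = (frozenset({7, 8}), frozenset({5, 6}))           # disjoint from both
for rows, cols in merge_biclusters([b1, b2, b3]):
    print(sorted(rows), sorted(cols))
# [7, 8] [5, 6]
# [0, 1, 2, 3] [0, 1, 2, 3]
```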
Alternatives to merging, filtering and extension options: Alternatives to the previously introduced closing options to deal with large sets of small biclusters include: (1) summarization techniques based on simple and hierarchical clustering methods or on the definition of similarity measures to compare biclusters [18]; (2) user-driven formal constraints and querying expressions [19,20]; (3) co-clustering to exclusively partition both dimensions and select representative biclusters [36]; (4) pre- and post-pruning techniques (including item-based constraints and discrimination metrics) [88]; (5) patterns based on half-spaces (as quantitative rules) in which external sources of information are used as a filtering basis [48]; and (6) verification techniques based on metrics computed using external data sources as,

Fig. 8. Comparison of alternative discretization options by addressing their impact on the itemization and biclustering solutions with constant values on columns.
Table 5
Systemic comparison of the two major classes of PM-based biclustering approaches.
Class | Benefits | Challenges | Principles to tackle the challenges
PM-based biclustering | Exhaustive searches; handle missings and noise; biclusters with multiple levels of coherency strength; extensions to discover flexible coherencies; flexible structures; flexible searches; constraint-based guidance | 1. Deterioration of efficiency levels for large data (in the absence of PM scalability principles); 2. Not natively prepared to capture additive, multiplicative, symmetric and plaid coherencies (their discovery can be computationally expensive); 3. High number of mined biclusters (memory usage); 4. Need to fix thresholds for the standard (customized) support metric | 1. Data partitioning methods; PM in distributed settings; approximated patterns (discovered under specific performance guarantees) [54,134]; 2. Iterative data mappings on rows/columns (with pruning heuristics) to mine non-constant biclusters [62]; merging procedures sensitive to overlapping plaid effects [60]; 3. Adequate data structures; filtering options pushed into the mining step; 4. Use of multiple thresholds (iterative method); data-driven estimation
Range-based support biclustering | Range-based support addresses the items-boundary problem; easy extension of Apriori methods to seize efficiency gains when dealing with multiple distances (support thresholds) | 1. Separation of positive and negative values to guarantee monotonicity, resulting in biclusters without simultaneous under- and over-expressed values; 2. Dedicated Apriori-based methods do not allow the direct use of PM scalability principles | 1. Merging of biclusters with shared columns (or rows) but different signs to avoid the violation of the (anti-)monotonic property; 2. Dedicated extensions to mine patterns with tree structures (required for dense datasets) and to make use of (scalable) data partitions
for instance, term enrichment (in gene expression data) to guide the addition or removal of columns and rows per bicluster.

2.4. A systematic comparison of PM-based biclustering approaches
In what follows, we provide a synthesis of the benefits and challenges of using PM-based biclustering approaches, together with principles on how to tackle existing challenges. Table 5 focuses on PM-based biclustering classes in general, while Table 6 focuses on each surveyed approach in particular.
Understandably, different applications may be better tackled by different PM-based biclustering approaches. BicPAM, BiModule and RAP are default options for settings where meaningful biclusters can only be found using multiple coherency levels, which is often the case with scored biological/social networks, expression data and physiological data [62,96,101]. DeBi and BicPAM are critical for the analysis of large Boolean datasets, such as the ones derived from

In particular, we rely on Jaccard-based match scores (MS) to assess the similarity of B and H [108]. MS(B, H) defines the extent to which found biclusters match the hidden biclusters (correctness), while MS(H, B) reflects how well hidden biclusters are recovered (completeness).
$$MS(B,H) = \frac{1}{|B|} \sum_{(I_1,J_1) \in B} \max_{(I_2,J_2) \in H} \frac{|I_1 \cap I_2|}{|I_1 \cup I_2|}$$
Since MS scores are not sensitive to the number of biclusters in both sets, Hochreiter et al. [67] introduced a consensus score (FC) that computes similarities between the pairs of closest biclusters between B and H. Let $S_1$ and $S_2$ be, respectively, the larger and the smaller set of biclusters from {B, H}, and MP be the pairs assigned by the Munkres method based on overlapping areas [93]:
$$FC(B,H) = \frac{1}{|S_1|} \sum_{((I_1,J_1) \in S_1,\,(I_2,J_2) \in S_2) \in MP} \frac{|I_1 \cap I_2|\,|J_1 \cap J_2|}{|I_1||J_1| + |I_2||J_2| - |I_1 \cap I_2|\,|J_1 \cap J_2|}$$
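A minimal Python sketch of the two scores above, assuming biclusters are (rows, columns) pairs of sets and that the Munkres assignment maximizes the summed FC similarity. Instead of the Munkres method, FC is computed here by exhaustive search over one-to-one assignments, which yields the same optimal total similarity but is only practical for a handful of biclusters; for realistic set sizes a Hungarian-algorithm implementation should replace it. Function names are illustrative.

```python
from itertools import permutations


def jaccard_rows(b1, b2):
    """|I1 ∩ I2| / |I1 ∪ I2| over the row sets of two biclusters."""
    return len(b1[0] & b2[0]) / len(b1[0] | b2[0])


def match_score(B, H):
    """MS(B, H): best row-Jaccard match in H for each bicluster of B, averaged over B."""
    return sum(max(jaccard_rows(b, h) for h in H) for b in B) / len(B)


def area_similarity(b1, b2):
    """Shared cells over the union of both biclusters' cells (FC summand)."""
    shared = len(b1[0] & b2[0]) * len(b1[1] & b2[1])
    return shared / (len(b1[0]) * len(b1[1]) + len(b2[0]) * len(b2[1]) - shared)


def consensus(B, H):
    """FC(B, H): best one-to-one pairing of the smaller set with the larger one,
    summed similarity divided by the size of the larger set."""
    S1, S2 = (B, H) if len(B) >= len(H) else (H, B)
    best = max(
        sum(area_similarity(S1[i], S2[k]) for k, i in enumerate(perm))
        for perm in permutations(range(len(S1)), len(S2))
    )
    return best / len(S1)


H = [({0, 1, 2, 3}, {0, 1, 2}), ({5, 6, 7}, {3, 4})]             # hidden
B = [({0, 1, 2}, {0, 1, 2}), ({5, 6, 7}, {3, 4}), ({9}, {9})]   # found
print(round(match_score(B, H), 3), round(match_score(H, B), 3), round(consensus(B, H), 3))
# 0.583 0.875 0.583
```

In this toy case the spurious found bicluster ({9}, {9}) lowers MS(B, H) (correctness) but leaves MS(H, B) (completeness) untouched, while FC penalizes it through the division by the larger set size.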
Table 6
Benefits, challenges and possible improvements of state-of-the-art PM-based biclustering approaches. PM-based biclustering benefits and challenges in Table 5 apply to DeBi, BiModule, GenMiner and BicPAM/BiP, while both PM-based and range-based biclustering benefits and challenges in Table 5 apply to RAP, RCB and ET-Biclusters.
Approach | Benefits | Challenges | Possible improvements
DeBi | Complete and statistically rigorous options for post-processing biclustering solutions; discovery adapted to the target significance level; (see PM-based benefits) | Efficiency deterioration from post-processing extension procedures; discovery of maximal patterns (loss of a large number of potentially significant biclusters); binarization of data; (see PM-based challenges) | Discovery of closed patterns (removes the need for an exhaustive extension of biclusters); multi-level discretization (as standard in the remaining PM-based approaches); (see PM-based principles)
BiModule | Multi-level discretization with removal of outliers; (see PM-based benefits) | No merging or extension options for handling noise and growing biclusters; (see PM-based challenges) | Inclusion of the surveyed closing options; (see PM-based principles)
GenMiner | More complete frame to derive noisy biclusters from rules (non-perfect confidence levels); allows extracting relations between genes and real-world annotations; (see PM-based benefits) | Requires annotations from knowledge bases; non-parameterized levels of expression (only 3); (see PM-based challenges) | Retrieval of annotations from the dataset under analysis when knowledge bases are not available; delivery of rules without the need for annotations on the antecedent or consequent; inclusion of the surveyed mapping options; (see PM-based principles)
BicPAM/BiP | Discovery of additive/multiplicative/symmetric/plaid models; robustness to discretization, noise and missings; dedicated PM searches to explore further efficiency gains; (see PM-based benefits) | Efficiency of the search for non-constant models rapidly deteriorates for very large matrices; (see PM-based challenges) | New heuristics, scalability principles, approximative searches (replacing the exhaustive criteria), or constraint-based guidance to learn non-constant models; (see PM-based principles)
RAP | (see PM- and range-based benefits) | Not able to deal with noisy biclusters; (see PM- and range-based challenges) | Inclusion of the closing framework (merging and extension strategies); (see PM- and range-based principles)
RCB | Discovery (see PM- and range-based benefits) | Constant coherency overall excludes biclusters with meaningful differences across columns (rows); joining squares (discovered patterns) to compose rectangles (biclusters) is a combinatorial problem that impacts efficiency8; (see PM- and range-based challenges) | Combination with the results of other biclustering approaches (e.g. RAP); alternative computational methods; (see PM- and range-based principles)
ET-Bicluster | Parameterizable discovery of biclusters based on the allowed amount of noise; (see PM- and range-based benefits) | Inclusion of error-based thresholds in the Apriori method violates the (anti-)monotonic property, thus not guaranteeing exhaustive solutions; (see PM- and range-based challenges) | Adoption of more relaxed thresholds, to avoid losing biclusters of interest, followed by post-filtering of biclusters that do not satisfy the criteria; inference of bounds on the performance guarantees; (see PM- and range-based principles)
Table 7
Properties of the generated set of synthetic datasets.
Matrix size (#rows × #cols) | 100 × 30 | 500 × 60 | 1000 × 100 | 2000 × 200 | 4000 × 400
following a Uniform distribution, U(1, L), and 10 matrices according to a Gaussian distribution, N(L/2, L/6).

Comparison: We selected 15 state-of-the-art approaches9: FABIA method with sparse prior option [67], ISA [70], OPSM [15], CC [29], Samba [119], xMotifs [94], OP-Clustering [78], BicSPAM [63], Bexpa [106], BCPlaid [123] and the PM-based BiModule [96], DeBi [111], RAP [101], BicPAM [62] and BiP [60] biclustering approaches. We used the following software: R packages fabia10 and biclust11, BicAT [10], Expander12, (Evo-)Bexpa [106], RAP13 and BicPAMS14. In particular, we adjust BicPAM behavior according to the principles proposed in this work by considering closed patterns, multiple levels of coherency strength (L ∈ {3, 5, 7}), an assignment of two items for elements with values near item-boundaries, merging (> 70% overlap) and filtering options. The support threshold was incrementally decreased by 10% until the area of the discovered biclusters covered at least 5% of the input matrix.

9 The specified number of biclusters for FABIA, Bexpa, CC, xMotifs and ISA methods (number of starting points) was the number of hidden biclusters plus 10%: |H| × 1.1. Note that this specification guides the search, optimistically biasing FABIA Consensus (FC) levels. The default number of iterations for the OPSM method was varied from 10 to 200 iterations. Remaining parameterizations were set by default.
10 http://www.bioinf.jku.at/software/fabia/fabia.html.

Fig. 9 compares the ability of these state-of-the-art approaches to discover planted biclusters with constant coherency on rows. Results confirm the superior performance of PM-based biclustering approaches both in terms of MS(B, H) (correctness) and MS(H, B) (completeness), as they provide exhaustive and flexible searches. Superiority is also verified for non-constant models: Fig. 10 compares the performance of biclustering methods prepared to discover shifting-scaling factors when the planted biclusters follow additive and multiplicative models.
A closer look at the performance of PM-based biclustering when multiple levels of coherency strength are considered is provided in Fig. 11.
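The BicPAM parameterization above combines multiple coherency strengths (L items) with the assignment of two items to values near item boundaries. A minimal Python sketch of such a mapping step follows, assuming per-row z-score normalization and equally probable Gaussian cut-offs (one of the distribution-based options of Section 2.2). The boundary tolerance `delta` and all function names are hypothetical and do not reflect BicPAM's actual interface; items are represented as (column, item) pairs to mirror the itemization by concatenation with the column index.

```python
from statistics import NormalDist, mean, pstdev


def gaussian_cutoffs(n_items):
    """Cut-off points splitting N(0, 1) into n_items equally probable intervals."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(k / n_items) for k in range(1, n_items)]


def itemize_row(row, n_items, delta=0.0):
    """Map one (non-constant) row to a transaction of (column, item) pairs.

    Values are z-scored within the row and assigned the item of the interval
    they fall into; a value within `delta` of a cut-off also receives the item
    on the other side of that cut-off (items-boundary handling).
    """
    mu, sigma = mean(row), pstdev(row)
    cuts = gaussian_cutoffs(n_items)
    transaction = set()
    for col, value in enumerate(row):
        z = (value - mu) / sigma
        item = sum(z > c for c in cuts)  # number of cut-offs below z
        transaction.add((col, item))
        for k, c in enumerate(cuts):
            if abs(z - c) <= delta:  # near the boundary between items k and k + 1
                transaction.update({(col, k), (col, k + 1)})
    return transaction


row = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sorted(itemize_row(row, n_items=3)))
# [(0, 0), (1, 0), (2, 1), (3, 2), (4, 2)]
print(sorted(itemize_row(row, n_items=3, delta=0.3)))
# [(0, 0), (1, 0), (1, 1), (2, 1), (3, 1), (3, 2), (4, 2)]
```

With delta = 0 every value yields exactly one item; with a positive delta, the values at columns 1 and 3 (z ≈ ±0.71, close to the cut-offs ±0.43) are mapped to both neighbouring items, so a pattern miner can match them under either item, yielding transactions of varying size.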
11 http://cran.r-project.org/web/packages/biclust
12 http://acgt.cs.tau.ac.il/expander.
13 http://www.mybiosoftware.com/rap-association-analysis-approach-biclustering.html.
14 https://web.ist.utl.pt/rmch/software/bicpams.

Efficiency: Fig. 12 shows the boundaries on efficiency of PM-based biclustering approaches when considering 20,000 rows (the magnitude of the human genome). We varied the number of columns,

Fig. 9. Comparison of the performance of state-of-the-art biclustering approaches on data settings with varying properties and constant coherencies.
Fig. 10. Comparison of biclustering approaches to recover biclusters with non-constant coherency.
Fig. 11. Performance of PM-based biclustering for data settings with varying coherency strength.
items (|L| ∈ {5, 7}) and considered a simple merging option (> 70% overlap). We planted 15 biclusters to occupy 2% of the area. Charm [136], an efficient pattern miner to deliver closed patterns (maximal biclusters), was used. Generally, we observe that PM-based biclustering approaches are scalable for these dense and large matrices. Understandably, the number of items has a strong impact on efficiency, as it defines the density of the itemset database. The scalability of pattern mining methods can be guaranteed for even harder settings by adopting some of the extensively researched parallelization, distribution, streaming and error-bounding PM principles [54]. Additionally, hyperclique patterns [52], which require item-pairwise support similarity, can also be considered to promote the efficiency of the mining procedure.

Impact of mining options: Fig. 13 illustrates the impact of the chosen search and pattern representations (simple, closed, maximal) on the efficiency and MS levels of PM-based biclustering approaches when using a discretization step with 10 items and the 1000 × 100 data setting. The FIM methods were tested using SPMF15 and F2G [65]. FPGrowth [65] and Eclat [135] are the most competitive choices for small support thresholds, while Apriori [2] is the best option for medium-to-large support levels. Additionally, the use of simple patterns (using FPGrowth [1]) degrades MS(B, H), while the use of maximal patterns (using CharmMFI [136]) penalizes MS(H, B), as it discards biclusters with a non-large number of columns (even if they have a larger number of rows).

Impact of closing options: We planted additional levels of noise by varying the amount of noisy elements from 0 to 10% for the 1000 × 100 setting. Fig. 14 describes the impact of alternative strategies to extend, merge and filter biclusters using Charm. When increasing the planted noise, extension options are critical to maintain attractive levels of accuracy (20pp higher than the baseline option). Fig. 14(b) illustrates

15 http://www.philippe-fournier-viger.com/spmf.
Fig. 12. Efficiency bounds of PM-based biclustering in the absence of scalability principles for datasets with 20,000 rows.
Fig. 13. Comparison of mining searches and pattern representations for the 1000 × 100 setting.
Fig. 14. Impact of extending, merging and filtering options. (a) Extending biclusters for varying levels of noise. (b) Merging for varying overlapping degrees (5% of planted noise). (c) Filtering for varying homogeneity degrees (2% of planted noise).

Table 8
Illustrative set of PM-based biclusters with unique properties and heightened biological relevance.
ID | Dataset | Pattern | Items | Closing options | #Genes | #Conds | #p-values < 0.01 | #p-values in [0.01, 0.05] | Best p-value
the impact of merging biclusters with large overlapping areas, assuming a level of planted noise of 5%. When decreasing the overlapping threshold, MS levels increase up to a certain threshold (near 70% for this experimental setting). A correct identification of this threshold can lead to significant gains (near 15pp in this setting). Finally, the use of filtering strategies to remove rows and columns can also enhance the recovery of the planted biclusters, as illustrated in Fig. 14(c). Similarly to the merging option, MS increases up to a 75% homogeneity (given by 1 − MSR [29]) and decreases above this threshold, since the homogeneity criterion becomes too restrictive.

Domain relevance: To assess the relevance of PM-based biclustering in biological settings, we used two gene expression datasets: (1) the dlblc dataset (660 genes, 180 conditions, human genome) [109], and (2) the hughes dataset (6300 genes, 300 conditions, yeast genome) [74]. For each dataset, standard PM-based biclustering (closed FIM) was applied using multiple levels of expression (L from 4 to 7) and different closing options: (1) merging (70% overlap), (2) relaxed merging (55% overlap) with filtering of rows, and (3) tight merging (90% overlap) with extensions on rows that appear in another bicluster sharing a minimum of 50% of conditions. The biological relevance of each bicluster was obtained from the Gene Ontology (GO) annotations using GoToolBox [85]. Table 8 shows an illustrative set of PM-based biclusters with significantly enriched GO terms (after Bonferroni correction). These biclusters could hardly be discovered by peer biclustering methods, since many of them include conditions with multiple degrees of expression (such as B1, B2 and B4). All of them have heightened biological significance, as observed by the number of highly enriched terms. Interestingly, we also observe that different
closing options lead to distinct biclusters. Complementary analyses supporting the biological relevance of PM-based biclustering are provided in [62,60,111].

4. Conclusions

This work provides a structured view on pattern mining-based approaches to biclustering. These approaches are increasingly positioned as the means to perform exhaustive searches under relaxed conditions (flexible structures of biclusters with parameterizable coherency and quality) with heightened efficiency. In this context, this work surveys and integrates the contributions of existing PM-based biclustering approaches, evaluates their performance, and discusses their relevance for pattern recognition applications.

A set of principles was synthesized, covering alternative design options to guide the definition of PM-based biclustering approaches:
(1) mining paradigms (including frequent itemset mining, association rule mining, sequential PM, constraint-based PM and structured PM), principles to define support-confidence-correlation metrics, pattern representations (simple, condensed and approximate), searches, and extensions to consider flexible coherencies;
(2) pre-processing options, including strategies to deal with the items-boundary problem when discretization procedures are considered, and with noisy and missing elements; and
(3) strategies to compose adequate structures of biclusters through extension-merging-filtering steps without the need to adapt the core task.

As such, this work introduces a highly parameterizable environment to design PM-based biclustering approaches, where the behavior can be dynamically defined according to the input dataset and the target biclustering type, structure and quality. In particular, the quality of a target solution can be easily tuned through:
- the mining options, such as the confidence of association rules, which defines the level of tolerated noise;
- the mapping options, such as the number of items (coherency strength) and multi-item assignments; and
- the merging, filtering and extension options, based, respectively, on the allowed noise (overlapping degree), dissimilarity and homogeneity of biclusters.

A qualitative comparison of the state-of-the-art PM-based biclustering approaches was provided. Initial empirical evidence supporting the accuracy, efficiency and biological relevance of this class of algorithms was also presented.

Following this comprehensive work, new research can embrace several promising directions, including:
(1) the development of new integrative PM-based biclustering approaches;
(2) the proposal of statistical tests to effectively assess the significance of biclusters with varying coherency and quality;
(3) the integration of principles from domain-driven PM to incorporate constraints in PM-based biclustering when background knowledge is available; and
(4) the design of robust classifiers based on discriminative PM-based biclusters.

Conflict of interest

None declared.

Acknowledgments

This work was supported by Fundação para a Ciência e a Tecnologia under the project UID/CEC/50021/2013 and the PhD grant SFRH/BD/75924/2011 to RH.

References

[1] Ramesh C. Agarwal, Charu C. Aggarwal, V. Prasad, A tree projection algorithm for generation of frequent item sets, J. Parallel Distrib. Comput. 61 (March (3)) (2001) 350-371.
[2] Tomasz Imieliński, Rakesh Agrawal, Arun Swami, Mining association rules between sets of items in large databases, SIGMOD Rec. 22 (June (2)) (1993) 207-216.
[3] H.A. Ahmed, P. Mahanta, D.K. Bhattacharyya, J.K. Kalita, A. Ghosh, Intersected coexpressed subcube miner: an effective triclustering algorithm, in: WICT, December 2011, pp. 846-851.
[4] Faris Alqadah, Joel S. Bader, Rajul Anand, Chandan K. Reddy, Query-based biclustering using formal concept analysis, in: SDM, SIAM/Omnipress, Anaheim, California, USA, 2012, pp. 648-659.
[5] Ronnie Alves, Domingo S. Rodríguez-Baena, Jesús S. Aguilar-Ruiz, Gene association analysis: a survey of frequent pattern mining from gene expression data, Brief. Bioinform. 11 (2) (2010) 210-224.
[6] I. Assent, R. Krieger, E. Müller, T. Seidl, DUSC: dimensionality unbiased subspace clustering, in: ICDM, 2007.
[7] Ira Assent, Emmanuel Müller, Ralph Krieger, Timm Jansen, Thomas Seidl, Pleiades: subspace clustering and evaluation, in: Walter Daelemans, Bart Goethals, Katharina Morik (Eds.), Machine Learning and Knowledge Discovery in Databases, Lecture Notes in Computer Science, vol. 5212, Springer, Berlin Heidelberg, 2008, pp. 666-671, ISBN: 978-3-540-87480-5, http://dx.doi.org/10.1007/978-3-540-87481-2_44.
[8] Gowtham Atluri, Jeremy Bellay, Gaurav Pandey, Chad Myers, Vipin Kumar, Discovering coherent value bicliques in genetic interaction data, in: BIOKDD, 2000.
[9] R. Rathipriya, K. Thangavel, J. Bagyamani, Binary particle swarm optimization based biclustering of web usage data, CoRR abs/1108.0748 (2011).
[10] Simon Barkow, Stefan Bleuler, Amela Prelić, Philip Zimmermann, Eckart Zitzler, Bicat: a biclustering analysis toolbox, Bioinformatics 22 (May (10)) (2006) 1282-1283.
[11] Roberto J. Bayardo Jr., Efficiently mining long patterns from databases, SIGMOD Rec. 27 (June (2)) (1998) 85-93.
[12] Gürkan Bebek, Jiong Yang, Pathfinder: mining signal transduction pathway segments from protein-protein interaction networks, BMC Bioinform. 8 (2007).
[13] Jeremy Bellay, Gowtham Atluri, Tina L. Sing, Kiana Toufighi, Michael Costanzo, Philippe Souza Moraes Ribeiro, Gaurav Pandey, Joshua Baller, Benjamin VanderSluis, Magali Michaut, Sangjo Han, Philip Kim, Grant W. Brown, Brenda J. Andrews, Charles Boone, Vipin Kumar, Chad L. Myers, Putting genetic interactions in context through a global modular decomposition, Genome Res. 21 (8) (2011) 1375-1387.
[14] Jeremy Bellay, et al., Putting genetic interactions in context through a global modular decomposition, Genome Res. 21 (8) (2011) 1375-1387.
[15] Amir Ben-Dor, Benny Chor, Richard Karp, Zohar Yakhini, Discovering local structure in gene expression data: the order-preserving submatrix problem, in: RECOMB, ACM, New York, NY, USA, 2002, pp. 49-57.
[16] G.F. Berriz, O.D. King, B. Bryant, C. Sander, F.P. Roth, Characterizing gene sets with FuncAssociate, Bioinformatics 19 (2003) 2502-2504.
[17] Manuele Bicego, Pietro Lovato, Alberto Ferrarini, Massimo Delledonne, Biclustering of expression microarray data with topic models, in: IC on Pattern Recognition, IEEE, 2010, pp. 2728-2731.
[18] Sylvain Blachon, Ruggero Pensa, Jérémy Besson, Céline Robardet, Jean-François Boulicaut, Olivier Gandrillon, Clustering formal concepts to discover biologically relevant knowledge from gene expression data, In Silico Biol. 7 (July) (2007) 0033.
[19] Jean-François Boulicaut, Jérémy Besson, Actionability and formal concepts: a data mining perspective, in: IC on Formal Concept Analysis, Springer-Verlag, Berlin, Heidelberg, 2008, pp. 14-31.
[20] Jean-François Boulicaut, Inductive databases and multiple uses of frequent itemsets: the cInQ approach, in: Rosa Meo, Pier Luca Lanzi, Mika Klemettinen (Eds.), Database Sup. for Data Mining App., LNCS, vol. 2682, Springer, Berlin, Heidelberg, 2004, pp. 1-23.
[21] Doruk Bozdağ, Ashwin S. Kumar, Umit V. Catalyurek, Comparative analysis of biclustering algorithms, in: Bioinformatics and Computational Biology, ACM, New York, NY, USA, 2010, pp. 265-274.
[22] Douglas Burdick, Manuel Calimlim, Johannes Gehrke, Mafia: a maximal frequent itemset algorithm for transactional databases, in: ICDE, IEEE Computer Society, Washington, DC, USA, 2001, pp. 443-452.
[23] Stanislav Busygin, Nikita Boyko, Panos M. Pardalos, Michael Bewernitz, Georges Ghacibeh, Biclustering EEG data from epileptic patients treated with vagus nerve stimulation, in: Data Mining, Systems Analysis and Optimization in Biomedicine, vol. 953, AIP Publishing, Gainesville, Florida, USA, 2007, pp. 220-231.
[24] Toon Calders, Bart Goethals, Mining all non-derivable frequent itemsets, in: PKDD, Springer-Verlag, London, UK, 2002, pp. 74-85.
[25] Toon Calders, Bart Goethals, Szymon Jaroszewicz, Mining rank-correlated sets of numerical attributes, in: ACM SIGKDD, ACM, New York, NY, USA, 2006, pp. 96-105.
[26] Pedro Carmona-Saez, Monica Chagoyen, Andres Rodriguez, Oswaldo Trelles, Jose M. Carazo, Alberto Pascual-Montano, Integrated analysis of gene expression by association rules discovery, BMC Bioinform. 7 (2006) 116.
[27] André Valério Carreiro, Artur J. Ferreira, Mário A.T. Figueiredo, Sara Cordeiro Madeira, Towards a classification approach using meta-biclustering: impact of discretization in the analysis of expression time series, J. Integr. Bioinf. 9 (3) (2012) 207.
[28] Malika Charrad, Mohamed Ben Ahmed, Simultaneous clustering: a survey, in: Sergei O. Kuznetsov, Deba P. Mandal, Malay K. Kundu, Sankar K. Pal (Eds.), Pattern Recognition and Machine Intelligence, LNCS, vol. 6744, Springer, Berlin Heidelberg, 2011, pp. 370-375, ISBN 978-3-642-21785-2, http://dx.doi.org/10.1007/978-3-642-21786-9_60.
[29] Yizong Cheng, George M. Church, Biclustering of expression data, in: Intelligent Systems for Molecular Biology, AAAI Press, La Jolla, California, USA, 2000, pp. 93-103.
[30] Recep Colak, Flavia Moser, Jeffrey Shih-Chieh Chu, Alexander Schönhuth, Nansheng Chen, Martin Ester, Module discovery by exhaustive search for densely connected, co-expressed regions in biomolecular interaction networks, PLoS One 5 (10) (2010) e13348.
[31] Chad Creighton, Samir Hanash, Mining gene expression databases for association rules, Bioinformatics 19 (1) (2003) 79-86.
[32] Phuong Dao, Recep Colak, Raheleh Salari, Flavia Moser, Elai Davicioni, Alexander Schönhuth, Martin Ester, Inferring cancer subnetwork markers using density-constrained biclustering, Bioinformatics 26 (18) (2010) 625-631.
[33] P.A.D. de Castro, F.O. de França, H.M. Ferreira, F.J. von Zuben, Applying biclustering to perform collaborative filtering, in: Intell. Syst. Des. Appl., October 2007, pp. 421-426.
[34] M.C.P. de Souto, D.S.A. de Araujo, I.G. Costa, R. Soares, T.B. Ludermir, A. Schliep, Comparative study on normalization procedures for cluster analysis of gene expression datasets, in: IJCNN, June 2008, pp. 2792-2798.
[35] Zhaohong Deng, Kup-Sze Choi, Fu-Lai Chung, Shitong Wang, Enhanced soft subspace clustering integrating within-cluster and between-cluster information, Pattern Recognit. 43 (3) (2010) 767-781.
[36] Inderjit S. Dhillon, Subramanyam Mallela, Dharmendra S. Modha, Information-theoretic co-clustering, in: KDD, ACM, New York, NY, USA, 2003, pp. 89-98.
[37] Chris Ding, Ya Zhang, Tao Li, Stephen R. Holbrook, Biclustering protein complex interactions with a biclique finding algorithm, in: ICDM, IEEE Computer Society, Washington, DC, USA, 2006, pp. 178-187.
[38] A.R. Donders, G.J. van der Heijden, T. Stijnen, K.G. Moons, Review: a gentle introduction to imputation of missing values, J. Clin. Epidemiol. 59 (10) (2006) 1087-1091.
[39] E. Elhamifar, R. Vidal, Sparse subspace clustering, in: Computer Vision and Pattern Recognition, June 2009, pp. 2790-2797.
[40] Kemal Eren, Mehmet Deveci, Onur Küçüktunç, Ümit V. Çatalyürek, A comparative analysis of biclustering algorithms for gene expression data, Brief. Bioinf. 14 (3) (2013) 279-292.
[41] Neng Fan, Nikita Boyko, Panos M. Pardalos, Recent advances of data biclustering with application in computational neuroscience, in: Wanpracha Chaovalitwongse, Panos M. Pardalos, Petros Xanthopoulos (Eds.), Computational Neuroscience, Springer Optimization and Its Applications, vol. 38, Springer, New York, 2010, pp. 85-112, ISBN 978-0-387-88629-9, http://dx.doi.org/10.1007/978-0-387-88630-5_6.
[42] Gang Fang, Majda Haznadar, Wen Wang, Haoyu Yu, Michael Steinbach, Timothy R. Church, William S. Oetting, Brian Van Ness, Vipin Kumar, High-order SNP combinations associated with complex diseases: efficient discovery, statistical power and functional interactions, PLoS One 7 (2012).
[43] Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach, Chad L. Myers, Vipin Kumar, Subspace differential coexpression analysis: problem definition and a general approach, in: Pacific Symposium on Biocomputing, World Scientific Publishing, 2010, pp. 145-156.
[44] Paolo Favaro, René Vidal, Avinash Ravichandran, A closed form solution to robust subspace estimation and clustering, in: Computer Vision and Pattern Recognition, IEEE, Colorado Springs, USA, 2011, pp. 1801-1807.
[45] Usama M. Fayyad, Keki B. Irani, Multi-interval discretization of continuous-valued attributes for classification learning, in: IJCAI, 1993, pp. 1022-1029.
[46] Adelaide Freitas, Wassim Ayadi, Mourad Elloumi, José Luís Oliveira, Jin-Kao Hao, Survey on biclustering of gene expression data, in: Biological Knowledge Discovery Handbook, 2012, pp. 591-608.
[47] Guojun Gan, Jianhong Wu, A convergence theorem for the fuzzy subspace clustering (FSC) algorithm, Pattern Recognit. 41 (6) (2008) 1939-1947.
[48] Elisabeth Georgii, Lothar Richter, Ulrich Rückert, Stefan Kramer, Analyzing microarray data using quantitative association rules, Bioinformatics 21 (January (2)) (2005) 123-129.
[49] Gad Getz, Erel Levine, Eytan Domany, Coupled two-way clustering analysis of gene microarray data, Proc. Natl. Acad. Sci. 97 (22) (2000) 12079-12084.
[50] Dmitry Gnatyshak, Dmitry I. Ignatov, Alexander Semenov, Jonas Poelmans, Gaining insight in social networks with biclustering and triclustering, in: Perspectives in Business Informatics Research, LNBIP, vol. 128, Springer, Berlin Heidelberg, 2012, pp. 162-171.
[51] Gösta Grahne, Jianfei Zhu, Efficiently using prefix-trees in mining frequent itemsets, in: FIMI, vol. 90, 2003.
[52] Rohit Gupta, Navneet Rao, Vipin Kumar, Discovery of error-tolerant biclusters from noisy gene expression data, BMC Bioinf. 12 (12) (2011) 1-17.
[53] E.H. Han, G. Karypis, V. Kumar, Min-apriori: an algorithm for finding association rules in data with continuous attributes, Department of Computer Science, University of Minnesota, Minneapolis, 1997.
[54] Jiawei Han, Hong Cheng, Dong Xin, Xifeng Yan, Frequent pattern mining: current status and future directions, Data Min. Knowl. Discov. 15 (August (1)) (2007) 55-86.
[55] Jiawei Han, Jian Pei, Guozhu Dong, Ke Wang, Efficient computation of iceberg cubes with complex measures, SIGMOD Rec. 30 (May (2)) (2001) 1-12.
[56] Blaise Hanczar, Mohamed Nadif, Ensemble methods for biclustering tasks, Pattern Recognit. 45 (11) (2012) 3938-3949.
[57] J.A. Hartigan, Direct clustering of a data matrix, J. Am. Stat. Assoc. 67 (337) (1972) 123-129.
[58] Trond Hellem Bø, Bjarte Dysvik, Inge Jonassen, LSimpute: accurate estimation of missing values in microarray data with least squares methods, Nucleic Acids Res. 32 (February (3)) (2004) e34.
[59] R. Henriques, C. Antunes, Learning predictive models from integrated healthcare data: extending pattern-based and generative models to capture temporal and cross-attribute dependencies, in: System Sciences (HICSS), January 2014, pp. 2562-2569.
[60] R. Henriques, S. Madeira, Biclustering with flexible plaid models to unravel interactions between biological processes, IEEE/ACM Trans. Comput. Biol. Bioinf. PP (99) (2015) 1, http://dx.doi.org/10.1109/TCBB.2014.2388206.
[61] Rui Henriques, Cláudia Antunes, Sara C. Madeira, Methods for the efficient discovery of large item-indexable sequential patterns, in: Lecture Notes in Computer Science, Springer Int. Pub., 2014, pp. 100-116, http://dx.doi.org/10.1007/978-3-319-08407-7_7.
[62] Rui Henriques, Sara Madeira, Bicpam: pattern-based biclustering for biomedical data analysis, Algorithms Mol. Biol. 9 (1) (2014) 27.
[63] Rui Henriques, Sara Madeira, Bicspam: flexible biclustering using sequential patterns, BMC Bioinf. 15 (2014) 130.
[65] Rui Henriques, Sara C. Madeira, Cláudia Antunes, F2g: efficient discovery of full-patterns, in: ECML/PKDD IW on New Frontiers in Mining Complex Patterns, Prague, 2013.
[66] Rui Henriques, Silvia Moura Pina, Cláudia Antunes, Temporal mining of integrated healthcare data: methods, revealings and implications, in: SDM IW on Data Mining for Medicine and Healthcare, SIAM, Austin, US, 2013, pp. 56-64.
[67] Sepp Hochreiter, Ulrich Bodenhofer, Martin Heusel, Andreas Mayr, Andreas Mitterecker, Adetayo Kasim, Tatsiana Khamiakova, Suzy Van Sanden, Dan Lin, Willem Talloen, Luc Bijnens, Hinrich W.H. Göhlmann, Ziv Shkedy, Djork-Arné Clevert, FABIA: factor analysis for bicluster acquisition, Bioinformatics 26 (June (12)) (2010) 1520-1527.
[68] Qinghua Huang, A biclustering technique for mining trading rules in stock markets, in: Dehuai Zeng (Ed.), Applied Informatics and Communication, Communications in Computer and Information Science, vol. 224, Springer, Berlin, Heidelberg, 2011, pp. 16-24.
[69] Yaochun Huang, Hui Xiong, Weili Wu, Sam Y. Sung, Mining quantitative maximal hyperclique patterns: a summary of results, in: PAKDD, Springer-Verlag, Berlin, Heidelberg, 2006, pp. 552-556.
[70] Jan Ihmels, Sven Bergmann, Naama Barkai, Defining transcription modules using large-scale gene expression data, Bioinformatics 20 (September (13)) (2004) 1993-2003.
[71] Maurice G. Kendall, Rank Correlation Methods, Griffin, London, 1948.
[72] Adam Kirsch, Michael Mitzenmacher, Andrea Pietracaprina, Geppino Pucci, Eli Upfal, Fabio Vandin, An efficient rigorous approach for identifying statistically significant frequent itemsets, in: ACM SIGMOD Symposium on Principles of Database Systems, PODS '09, ACM, New York, NY, USA, 2009, pp. 117-126.
[73] L. Lazzeroni, A. Owen, Plaid models for gene expression data, Stat. Sin. 12 (2002) 61-86.
[74] William Lee, Desiree Tillo, Nicolas Bray, Randall H. Morse, Ronald W. Davis, Timothy R. Hughes, Corey Nislow, A high-resolution atlas of nucleosome occupancy in yeast, Nat. Genet. 39 (September (10)) (2007) 1235-1244.
[75] Guimei Liu, Jinyan Li, Kelvin Sim, Limsoon Wong, Distance based subspace clustering with flexible dimension partitioning, in: ICDE, IEEE, 2007, pp. 1250-1254.
[76] Guimei Liu, Hongjun Lu, Wenwu Lou, Jeffrey Xu Yu, On computing, storing and querying frequent patterns, in: ACM SIGKDD, ACM, New York, NY, USA, 2003, pp. 607-612.
[77] Hongyan Liu, Jiawei Han, Dong Xin, Zheng Shao, Top-down mining of interesting patterns from very high dimensional data, in: ICDE, IEEE Computer Society, Washington, DC, USA, 2006, p. 114.
[78] Jinze Liu, Wei Wang, Op-cluster: clustering by tendency in high dimensional space, in: ICDM, IEEE Computer Society, Washington, DC, USA, Melbourne, Florida, USA, 2003, p. 187.
[79] Nizar R. Mabroukeh, C.I. Ezeife, A taxonomy of sequential pattern mining algorithms, ACM Comput. Surv. 43 (December (1)) (2010) 3:1-3:41.
[80] Jamie I. MacPherson, Jonathan E. Dickerson, John W. Pinney, David L. Robertson, Patterns of HIV-1 protein interaction identify perturbed host cellular subsystems, PLoS Comput. Biol. 6 (7) (2010) e1000863.
[81] Sara Madeira, Miguel Nobre Parreira Cacho Teixeira, Isabel Sá-Correia, Arlindo Oliveira, Identification of regulatory modules in time series gene expression data using a linear time biclustering algorithm, IEEE/ACM Trans. Comput. Biol. Bioinf. 1 (January) (2010) 153-165.
[82] Sara C. Madeira, Arlindo L. Oliveira, Biclustering algorithms for biological data analysis: a survey, IEEE/ACM Trans. Comput. Biol. Bioinf. 1 (January (1)) (2004) 24-45.
[83] M.A. Mahfouz, M.A. Ismail, Bidens: iterative density based biclustering algorithm with application to gene expression analysis, in: PWASET, vol. 37, 2009, pp. 342-348.
[84] Kazuhisa Makino, Takeaki Uno, New algorithms for enumerating all maximal cliques, in: SWAT, LNCS, vol. 3111, Springer, 2004, pp. 260-272.
[85] David Martin, Christine Brun, Elisabeth Remy, Pierre Mouren, Denis Thieffry, Bernard Jacq, Gotoolbox: functional analysis of gene datasets based on gene ontology, Genome Biol. 5 (12) (2004) R101.
[86] Ricardo Martinez, Claude Pasquier, Nicolas Pasquier, Genminer: mining informative association rules from genomic data, in: Bioinformatics and Biomedicine, November 2007, pp. 15-22, http://dx.doi.org/10.1109/BIBM.2007.49.
[87] Tara McIntosh, Sanjay Chawla, High confidence rule mining for microarray analysis, IEEE/ACM Trans. Comput. Biol. Bioinf. 4 (October (4)) (2007) 611-623.
[88] Guy W. Mineau, Akshay Bissoon, Robert Godin, Simple pre- and post-pruning techniques for large conceptual clustering structures, Electron. Trans. Artif. Intell. 4 (C) (2000) 1-20.
[89] Sushmita Mitra, Haider Banka, Multi-objective evolutionary biclustering of gene expression data, Pattern Recognit. 39 (December (12)) (2006) 2464-2477.
[90] Anirban Mukhopadhyay, Ujjwal Maulik, Sanghamitra Bandyopadhyay, A novel biclustering approach to association rule mining for predicting HIV-1 human protein interactions, PLoS One 7 (4) (2012) e32289.
[91] Emmanuel Müller, Ira Assent, Ralph Krieger, Stephan Günnemann, Thomas Seidl, Densest: density estimation for data mining in high dimensional spaces, in: SDM, SIAM, 2009, pp. 173-184.
[92] Emmanuel Müller, Stephan Günnemann, Ira Assent, Thomas Seidl, Evaluating clustering in subspace projections of high dimensional data, VLDB Endow. 2 (August (1)) (2009) 1270-1281.
[93] James Munkres, Algorithms for the assignment and transportation problems, J. Soc. Ind. Appl. Math. 5 (1) (1957) 32-38.
[94] T.M. Murali, Simon Kasif, Extracting conserved gene expression motifs from gene expression data, in: Pacific Symposium on Biocomputing, 2003, pp. 77-88.
[95] Omar Odibat, Chandan K. Reddy, Efficient mining of discriminative co-clusters from gene expression data, Knowl. Inf. Syst. (2013) 1-30.
[96] Yoshifumi Okada, Wataru Fujibuchi, Paul Horton, A biclustering method for gene expression module discovery using closed itemset enumeration algorithm, IPSJ Trans. Bioinf. 48 (SIG5) (2007) 39-48.
[97] Yoshifumi Okada, Kosaku Okubo, Paul Horton, Wataru Fujibuchi, Exhaustive search method of gene expression modules and its application to human tissue data, IAENG Int. J. Comput. Sci. 34 (1) (2007) 119-126.
[98] Patryk Orzechowski, Proximity measures and results validation in biclustering: a survey, in: Artificial Intelligence and Soft Computing, LNCS, vol. 7895, Springer, Berlin Heidelberg, 2013, pp. 206-217.
[99] Feng Pan, Gao Cong, Anthony K.H. Tung, Jiong Yang, Mohammed Javeed Zaki, Carpenter: finding closed patterns in long biological datasets, in: ACM SIGKDD, 2003, pp. 637-642.
[100] Feng Pan, A.K.H. Tung, Gao Cong, Xin Xu, Cobbler: combining column and row enumeration for closed pattern discovery, in: Scientific and Statistical Database Management, June 2004, pp. 21-30.
[101] Gaurav Pandey, Gowtham Atluri, Michael Steinbach, Chad L. Myers, Vipin Kumar, An association analysis approach to biclustering, in: ACM SIGKDD, ACM, New York, NY, USA, 2009, pp. 677-686.
[102] Nicolas Pasquier, Yves Bastide, Rafik Taouil, Lotfi Lakhal, Efficient mining of association rules using closed itemset lattices, Inf. Syst. 24 (March (1)) (1999) 25-46.
[103] Anne Patrikainen, Marina Meila, Comparing subspace clusterings, IEEE Trans. Knowl. Data Eng. 18 (July (7)) (2006) 902-916.
[104] René Peeters, The maximum edge biclique problem is NP-complete, Discrete Appl. Math. 131 (September (3)) (2003) 651-654.
[105] Liuqing Peng, Junying Zhang, An entropy weighting mixture model for subspace clustering of high-dimensional data, Pattern Recognit. Lett. 32 (8) (2011) 1154-1161.
[106] Beatriz Pontes, Raúl Giráldez, Jesús S. Aguilar-Ruiz, Configurable pattern-based evolutionary biclustering of gene expression data, Algorithms Mol. Biol. 8 (1) (2013) 4.
[107] Ignacio Ponzoni, Francisco Azuaje, Juan Augusto, David Glass, Inferring adaptive regulation thresholds and association rules from gene expression data through combinatorial optimization learning, IEEE/ACM Trans. Comput. Biol. Bioinf. 4 (4) (2007) 624-634.
[108] Amela Prelić, Stefan Bleuler, Philip Zimmermann, Anja Wille, Peter Bühlmann, Wilhelm Gruissem, Lars Hennig, Lothar Thiele, Eckart Zitzler, A systematic comparison and evaluation of biclustering methods for gene expression data, Bioinformatics 22 (June (9)) (2006) 1122-1129.
[109] Andreas Rosenwald, George Wright, Wing C. Chan, et al., The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma, N. Engl. J. Med. 346 (June (25)) (2002) 1937-1947.
[110] Swarup Roy, Dhruba K. Bhattacharyya, Jugal K. Kalita, Cobi: pattern based co-regulated biclustering of gene expression data, Pattern Recognit. Lett. 34 (14) (2013) 1669-1678.
[111] Akdes Serin, Martin Vingron, Debi: discovering differentially expressed biclusters using a frequent itemset approach, Algorithms Mol. Biol. 6 (2011) 1-12.
[112] Fanhua Shang, L.C. Jiao, Fei Wang, Graph dual regularization non-negative matrix factorization for co-clustering, Pattern Recognit. 45 (6) (2012) 2237-2250 (Brain Decoding).
[113] Qizheng Sheng, Yves Moreau, Bart De Moor, Biclustering microarray data by Gibbs sampling, in: ECCB, 2003, pp. 196-205.
[114] Kelvin Sim, Vivekanand Gopalkrishnan, Arthur Zimek, Gao Cong, A survey on enhanced subspace clustering, Data Min. Knowl. Discov. 26 (2) (2013) 332-397.
[115] Michael Steinbach, Pang-Ning Tan, Hui Xiong, Vipin Kumar, Generalizing the notion of support, in: ACM SIGKDD, ACM, New York, NY, USA, 2004, pp. 689-694.
[116] Michael Steinbach, Haoyu Yu, Gang Fang, Vipin Kumar, Using constraints to generate and explore higher order discriminative patterns, in: PAKDD, LNCS, vol. 6634, Springer, 2011, pp. 338-350.
[117] Pang-Ning Tan, Vipin Kumar, Jaideep Srivastava, Selecting the right interestingness measure for association patterns, in: ACM SIGKDD, ACM, Edmonton, Alberta, Canada, 2002, pp. 32-41.
[118] A. Tanay, R. Sharan, R. Shamir, Biclustering algorithms: a survey, in: Handbook of Computational Molecular Biology, 2004.
[119] Amos Tanay, Roded Sharan, Ron Shamir, Discovering statistically significant biclusters in gene expression data, in: ISMB, 2002, pp. 136-144.
[120] Chun Tang, Li Zhang, Murali Ramanathan, Aidong Zhang, Interrelated two-way clustering: an unsupervised approach for gene expression data analysis, in: BIBE, IEEE CS, Washington, DC, USA, 2001, p. 41.
[121] Miguel Cacho Teixeira, Pedro Tiago Monteiro, Joana Fernandes Guerreiro, Joana Pinho Gonçalves, Nuno Pereira Mira, Sandra Costa dos Santos, Tânia Rodrigues Cabrito, Margarida Palma, Catarina Costa, Alexandre Paulo Francisco, et al., The YEASTRACT database: an upgraded information system for the analysis of gene and genomic transcription regulation in Saccharomyces cerevisiae, Nucleic Acids Res. (database issue) (2014).
[122] O. Troyanskaya, M. Cantor, G. Sherlock, P. Brown, T. Hastie, R. Tibshirani, D. Botstein, R.B. Altman, Missing value estimation methods for DNA microarrays, Bioinformatics 17 (6) (2001) 520-525, http://dx.doi.org/10.1093/bioinformatics/17.6.520.
[123] Heather Turner, Trevor Bailey, Wojtek Krzanowski, Improved biclustering of microarray data demonstrated through systematic performance tests, Comput. Stat. Data Anal. 48 (2) (2005) 235-254.
[124] Miranda van Uitert, Wouter Meuleman, Lodewyk Wessels, Biclustering sparse binary genomic data, J. Comput. Biol. 15 (10) (2008) 1329-1345.
[125] Takeaki Uno, Masashi Kiyomi, Hiroki Arimura, Lcm ver.3: collaboration of array, bitmap and prefix tree for frequent itemset mining, in: OSDM, ACM, New York, NY, USA, 2005.
[126] Haixun Wang, Wei Wang, Jiong Yang, Philip S. Yu, Clustering by pattern similarity in large data sets, in: SIGMOD, ACM, New York, NY, USA, 2002, pp. 394-405.
[127] Shu Wang, Robin R. Gutell, Daniel P. Miranker, Biclustering as a method for RNA local multiple sequence alignment, Bioinformatics 23 (24) (2007) 3289-3296.
[128] Zhiguan Wang, Chi Wai Yu, Ray C.C. Cheung, Hong Yan, Hypergraph based geometric biclustering algorithm, Pattern Recognit. Lett. 33 (12) (2012) 1656-1665.
[129] Takashi Washio, Hiroshi Motoda, State of the art of graph-based data mining, SIGKDD Explor. Newslett. 5 (July (1)) (2003) 59-68.
[130] Peter H. Westfall, S. Stanley Young, Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment, John Wiley & Sons, 1993.
[131] Hu Xia, Jian Zhuang, Dehong Yu, Novel soft subspace clustering with multi-objective evolutionary approach for high-dimensional data, Pattern Recognit. 46 (9) (2013) 2562-2575, http://dx.doi.org/10.1016/j.patcog.2013.02.005.
[132] Dong Xin, Zheng Shao, Jiawei Han, Hongyan Liu, C-cubing: efficient computation of closed cubes by aggregation-based checking, in: ICDE, IEEE Computer Society, 2006, p. 4.
[133] Hui Xiong, Xiao-Feng He, Chris Ding, Ya Zhang, Vipin Kumar, Stephen R. Holbrook, Identification of functional modules in protein complexes via hyperclique pattern discovery, in: Pacific Symposium on Biocomputing, 2005.
[134] Hui Xiong, Pang-Ning Tan, Vipin Kumar, Hyperclique pattern discovery, Data Min. Knowl. Discov. 13 (2) (2006) 219-242.
[135] Mohammed J. Zaki, Karam Gouda, Fast vertical mining using diffsets, in: ACM SIGKDD, ACM, New York, NY, USA, 2003, pp. 326-335.
[136] Mohammed J. Zaki, Ching J. Hsiao, CHARM: an efficient algorithm for closed itemset mining.
[137] Hongya Zhao, Kwok Leung Chan, Lee-Ming Cheng, Hong Yan, A probabilistic relaxation labeling framework for reducing the noise effect in geometric biclustering of gene expression data, Pattern Recognit. 42 (11) (2009) 2578-2588, http://dx.doi.org/10.1016/j.patcog.2009.03.016.
[138] Feida Zhu, Xifeng Yan, Jiawei Han, P.S. Yu, Hong Cheng, Mining colossal frequent patterns by core pattern fusion, in: ICDE, April 2007, pp. 706-715.
Rui Henriques received an M.Sc. degree in computer science and engineering from Instituto Superior Técnico (IST), Universidade de Lisboa. He is pursuing his Ph.D. studies in the field of learning from high-dimensional and structured data at IST and INESC-ID. He received distinctions for his academic achievements from IST between 2006 and 2008, and a National Award for his merits from Caixa Geral de Depósitos in 2009. He has also been a Business Analyst at McKinsey, with wide exposure to real-life projects.
Cláudia Antunes received her Ph.D. from Instituto Superior Técnico (IST, University of Lisbon, Portugal) in the domain of data mining and machine learning, proposing new methods to deal with temporal data, in particular for mining event sequential patterns. She is currently a Professor at the DEI department at IST and the scientific coordinator of two projects funded by FCT in the areas of domain-driven data mining and educational data mining. Cláudia has been working on methods for general pattern mining, from transactional to structured data. Her main interests center on mining complex knowledge from complex data, with emphasis on the incorporation of background knowledge in the pattern mining process.
Sara C. Madeira received a (5-year) B.Sc. degree in computer science from the University of Beira Interior, Covilhã, Portugal, in 2000, and the M.Sc. and Ph.D. degrees in computer science and engineering (CSE) from Instituto Superior Técnico (IST), Technical University of Lisbon, in 2002 and 2008, respectively. She is currently an Assistant Professor at the CSE department at IST, and a Senior Researcher at INESC-ID, Lisbon. Her research interests include algorithms and data structures, data mining, machine learning, bioinformatics and medical informatics.