Software Module Clustering An in Depth L
4, DECEMBER 2012 1
arXiv:2012.01057v1 [cs.SE] 2 Dec 2020
Abstract—Software module clustering is an unsupervised learning method used to cluster software entities (e.g.,
classes, modules, or files) with similar features. The obtained clusters may be used to study, analyze, and understand the
software entities’ structure and behavior. Implementing software module clustering with optimal results is challenging.
Accordingly, researchers have addressed many aspects of software module clustering in the past decade. Thus, it is
essential to present the research evidence that has been published in this area. In this study, 143 research papers from
well-known literature databases that examined software module clustering were reviewed to extract useful data. The
obtained data were then used to answer several research questions regarding state-of-the-art clustering approaches,
applications of clustering in software engineering, clustering processes, clustering algorithms, and evaluation methods.
Several research gaps and challenges in software module clustering are discussed in this paper to provide a useful
reference for researchers in this field.
Index Terms—Systematic literature study, software module clustering, clustering applications, clustering algorithms,
clustering evaluation, clustering challenges.
its tasks by invoking ready-made web services available on the Internet/network [9], [10].
This paper presents a comprehensive systematic literature study to structure and categorize the state-of-the-art research evidence related to software module clustering during the past decade. The study defines several research questions (RQs) that cover many aspects of the field and then identifies relevant papers and their results. It concludes by discussing future research opportunities in the area. In this process, a systematic method is used to collect and analyze the related published research.
The remainder of this paper is structured as follows: Section 2 presents the motivation and an overview of related work. Section 3 describes in detail the research methodology used to conduct this study. Section 4 presents the results and outcomes. Section 5 discusses the issues of validity. Finally, Section 6 presents the conclusions of the study.

2 MOTIVATION AND RELATED WORK
Software module clustering is an important topic of research in software engineering. Although it started in the 1990s, software module clustering research has gained more momentum and attention in the past decade. This momentum and attention are reflected in the dramatic increase in the number of publications. Among the many factors leading to this increased attention, two have been dominant in the past decade. The first is the dramatic increase in software application size due to the newly added functionalities and features that applications provide. This, in turn, has led to an increase in the number of modules for these applications. Here, software module clustering is a good approach to manage and maintain this kind of application. Second, the advancement of artificial intelligence (AI) methods (such as data mining, clustering, optimization, and machine learning methods) in the past decade has played a substantial role in increasing the research activity related to software module clustering.
Although it has been an active research area since the 1990s, a systematic literature study exploring research on software module clustering in the past decade is lacking. A few survey/review studies on software module clustering have been found in the literature. Shtern and Tzerpos [11] provided an overview of different software clustering methods and their applications in software engineering. The study highlighted some approaches for the evaluation of software clustering results. It also presented some research challenges to be addressed to improve software clustering results. In [12], the basic concepts and necessity of software module clustering were presented briefly. The authors also described different metaheuristic search techniques that have been applied to the software module clustering problem in the maintenance phase of the software development life cycle. In [13], the authors presented different search-based approaches to software clustering, classified into several categories (mono-objective, multiobjective, and many-objective) based on the number of clustering quality criteria. Furthermore, the advantages and disadvantages of each category are presented briefly. In [14] and its extended version [15], the authors described search-based optimization techniques and their applications in different software engineering domains. They briefly introduced software modularization and refactoring as clustering problems that can be addressed using several search algorithms. Additionally, they presented some research challenges with search algorithms, including determining suitable stopping criteria to terminate the search and issues related to visualizing the search results. The authors in [16] also dedicated parts of their study to performing software modularization and refactoring using search-based optimization techniques. They briefly introduced a number of algorithms in this respect, including NSGA-II and PCA-NSGA-II. Additionally, a number of evaluation metrics, such as coupling, cohesion, and MQ (modularization quality), were mentioned.
The aforementioned studies were not focused on conducting a dedicated literature analysis of software module clustering: some aspects of software module clustering are covered as part of other related topics. By contrast, our paper presents an in-depth and systematic
JOURNAL OF LATEX CLASS FILES, VOL. 11, NO. 4, DECEMBER 2012 3
• RQ5. What are the software systems used as targets for the experiments of software module clustering?
• RQ6. What are the current factbase sources, their types, their forms, and extraction tools used in software module clustering?
• RQ7. What are the most used similarity measures in software module clustering?
• RQ8. What are the most used algorithms, their types, and standard stop conditions?
• RQ9. What are the most used tools for visualizing clustering results?
• RQ10. What are the metrics used to evaluate clustering results and the current approaches to obtaining the gold/expert decomposition?
• RQ11. What are the potential future research directions on software module clustering?

3.2 Search Strategy

3.2.1 Literature Sources
In this study, five standard online databases were selected as sources that index the literature of software engineering and computer science [17]. These sources are presented in Table 1.

TABLE 1: Database sources used to explore the literature.
Source                   URL
IEEE Xplore              https://ieeexplore.ieee.org/
Elsevier ScienceDirect   https://sciencedirect.com/
ACM Digital Library      https://dl.acm.org/
Scopus                   https://scopus.com/
SpringerLink             https://link.springer.com/

3.2.2 Search String
The population, intervention, comparison, and outcomes (PICO) method [18] was employed to identify related studies. Here, population (P) refers to the applied area of clustering, intervention (I) refers to the process or procedure used to solve the clustering problem, and outcomes (O) refers to the outcomes of the work and research of clustering in software engineering. Comparison (C) is not considered in the keywords, as this study is a general analysis of clustering in software engineering. Table 2 presents the keywords associated with each part of the method in the final search string. Notably, the search terms used in the final search string were linked with one another based on the steps described in [22]. Here, the "OR" Boolean operator was used to link synonyms or search terms related to the topic of this study, and the "AND" Boolean operator was employed to link the main search terms. Different combinations of search strings were tried before constructing the final one, since the term "clustering" is also used in other research areas such as data mining, image processing, and statistics. The final search string was the one that met the following two criteria.
• The search string that returns the most relevant studies.
• The search string that returns the maximum number of papers from the identified pilot set.
For this study, a pilot set of 25 papers was selected based on our experience and an initial research review.
As an example, the search string Try 2 in Table 3 was excluded because the returned result is incomplete compared to the search string Try 5.

3.3 Paper Selection

3.3.1 Inclusion/Exclusion Criteria
To decide whether a paper is relevant to the scope of this research, a set of inclusion and exclusion criteria, presented below, was considered.
• Inclusion criteria are:
  – Papers published online from 2008 to 2019.
  – Papers related directly to software module clustering. This is ensured by reading the title of each obtained paper. However, abstract or full-text reading was also applied when reading the title was not enough. This criterion filtered out most of the papers.
• Exclusion criteria are:
  – Papers not published in English are excluded since English is the prevalent
    language used in the scientific peer-reviewing global community.
  – Papers without accessible full text.
  – Papers not formally peer-reviewed (gray literature and books).
  – Papers not published electronically.
  – Duplicated papers were excluded from the list. Authors sometimes publish expanded versions of their conference papers in journal venues. Such papers share most of the material, and considering both would affect the quality of this study. To overcome this issue, duplicated papers are identified by comparing paper titles, abstracts, and contents. When the duplication is confirmed, the least recent publication is removed.
  – Master's and Ph.D. dissertations are excluded because the content of such publications is eventually presented in peer-reviewed venues, which have already been considered in our study.
  – Papers that are published as surveys are filtered out because they do not actually bring new technical contributions to software module clustering.

3.3.2 Snowballing
The snowballing [23] search method was applied to the remaining papers to reduce the possibility of missing critical related papers. In this method, each research paper’s list of references is examined against the previously applied inclusion/exclusion criteria. The process was then recursively applied to newly identified papers.
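The recursive step of snowballing can be pictured as a traversal of the citation graph. The following Python sketch is only an illustration under stated assumptions, not the procedure or tooling used in this study: the function name `snowball`, the `references` mapping (paper identifier to the identifiers it cites), and the `is_included` predicate standing in for the inclusion/exclusion criteria are all hypothetical. It also assumes that the reference lists of excluded papers are not followed.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set


def snowball(
    seed_papers: Iterable[str],
    references: Dict[str, List[str]],
    is_included: Callable[[str], bool],
) -> Set[str]:
    """Backward snowballing: follow reference lists until no new paper is added.

    All names here are hypothetical; `is_included` is a stand-in for the
    inclusion/exclusion criteria, which in practice are applied by reviewers.
    """
    selected: Set[str] = set(seed_papers)
    queue = deque(selected)  # papers whose reference lists are still unexamined
    while queue:
        paper = queue.popleft()
        for cited in references.get(paper, []):
            # Skip papers already selected or rejected by the criteria.
            if cited in selected or not is_included(cited):
                continue
            selected.add(cited)
            queue.append(cited)  # newly identified paper: examine its references too
    return selected


if __name__ == "__main__":
    # Toy citation graph (invented data, for illustration only).
    toy_references = {"A": ["B", "C"], "B": ["D"], "C": [], "D": ["A", "E"]}
    result = snowball(["A"], toy_references, lambda paper: paper != "C")
    print(sorted(result))  # prints ['A', 'B', 'D', 'E']
```

The queue makes the recursion explicit: every newly accepted paper is scheduled so that its own reference list is examined once, and the loop ends when a pass adds no further papers.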
Figure residue (flowchart steps of the paper selection process), labels in extracted order: "+6, Result = 166"; "Apply snowballing search technique"; "-6, Result = 143"; "Repeat paper selection process".
Figure 12 shows that "source code", "documentation", "dynamic information", and "bytecode" are the most commonly used sources for factbase extraction, constituting approximately 82% (117/143), 3% (4/143), 2% (3/143), and 3% (2/143) of the total published papers, respectively. In addition, many papers combined two sources for factbase extraction. In this respect, "source code and evolutionary information" is the most commonly used combination, accounting for approximately 6% (8/143) of the total publications.
The factbases are extracted from the sources in different forms. Analysis of the selected papers reveals that the "dependency graph," "vector-space model," "software metrics," and "extended dependency graph" are the most commonly used forms of the factbase. Figure 13 shows the analysis results of the published papers. The figure also shows that approximately 57% (81/143), 32% (46/143), 8% (12/143), and 3% (4/143) of the total published papers considered "dependency graph", "vector-space model", "software metrics", and "extended dependency graph", respectively. The following points describe these factbase forms in detail:
• Dependency graph: It is a graph representation of the target software system. The nodes in the dependency graph represent software entities, whereas the edges represent the logical/static relationships between entities. In some cases, edges are weighted to denote the degree of dependency. Once the dependency graph is extracted, many characteristics of the target software system can be discovered, such as the degree of independence of the software entities based on their relationships.
• Vector-space model: It is used to capture the relative importance of terms (e.g., class name, function name, object name, and variable name) in a document (e.g., a class file or a program file). In the vector-space model, a document is represented by a vector of terms extracted from the document with associated weights (which can often be computed using the term frequency-inverse document frequency (TF-IDF) method [40]) representing the importance of the terms in the document and within the whole document collection (the target software system).
• Software metrics: They are quantitative measures that enable software engineers and managers to understand the target software system. The number of code lines per class, number of methods per class, and depth of inheritance level are possible metrics.
• Extended dependency graph: It is a graph that combines logical/static relationships and evolutionary relationships among software entities. The evolutionary relationships of the targeted software system represent the changes applied to its source files over time. Currently, many version control systems, such as the concurrent versions system (CVS) and Git, store these changes.
Notably, researchers typically use tools, which are mostly open-source Java programs, to perform factbase extraction. Table 8 presents all the tools that have been used in the selected studies, along with their links.

C. Filtering and preprocessing
Filtering is a useful preprocessing phase in any clustering process to identify and remove unnecessary textual and nontextual information that has been extracted from comments and source code. Textual information can be meaningless words, such as words with fewer than three characters, language keywords, or common English words that are not usually useful for a search [39]. Textual information can also be library classes or header files used in multiple modules and made available across the implementations of a programming language. These classes have to be eliminated; otherwise, they tend to group many classes in a single large cluster around them [38]. Nontextual information can be operators, symbols, special characters, and punctuation. Preprocessing can be implemented in the form of a normalization procedure. The attributes of the software entities can be a mixture of numerical and categorical data. Here, with the help of normalization, all the attributes are treated equally [100]. Preprocessing can
JOURNAL OF LATEX CLASS FILES, VOL. 11, NO. 4, DECEMBER 2012 15
    cluster (containing all n entities) and split it until n clusters are obtained.
  – Partitional clustering: For a given set of n entities, this approach simply divides the set of entities into nonoverlapping clusters such that each entity is in exactly one cluster.
  Notably, some studies have combined more than one clustering algorithm. For example, hierarchical and partitional clustering can be combined to achieve a common goal. This type of clustering is called cooperative clustering [84].

Fig. 15: Number of published papers vs. clustering algorithm.

Figure 16 shows that "Partitional Clustering" is the most commonly used type of clustering, accounting for approximately 50% (71/143), followed by "Hierarchical Clustering", with approximately 31% (44/143), and "Fuzzy Clustering", with approximately 1% (2/143) of the total publications.
The analysis of the selected studies indicated the use of several different termination conditions. The clustering process can be terminated based on one of the following cutoff conditions [116], [4]:
JOURNAL OF LATEX CLASS FILES, VOL. 11, NO. 4, DECEMBER 2012 18
noncohesive [55].
Figure 18 shows that "Results Compared with Others Results" is the most commonly used clustering evaluation method, accounting for approximately 53% (76/143) of all publications, followed by "Modularization Quality", 30% (43/143), and "MoJo Similarity Metrics", 16% (23/143). Many studies use more than one clustering evaluation method; thus, the results overlap.
Some clustering evaluation methods (e.g., MoJo Similarity Metrics and precision/recall) work only when an expert decomposition is available as a comparison standard. Examination of the total publications reveals that a number of approaches have been presented for obtaining an expert decomposition. A summary of these approaches is presented as follows:
• Domain Expert-Based Decomposition: The decomposition is performed by software domain experts. Domain experts are personnel with experience in software design and development [55]. The experts either evaluate the results produced by a clustering algorithm (e.g., whether the results have a positive impact on the system’s understandability) or provide a clustering benchmark that can be compared with the results produced by a clustering algorithm [38]. Well-known drawbacks of this approach are as follows: (a) Experts may provide many valid ways to decompose a software system into meaningful subsystems [84]. (b) They might produce a poor decomposition if they do not fully understand the purpose of the clustering approach. (c) They may produce a poor decomposition if their experience and knowledge are not sufficient [114]. In addition, finding domain experts with suitable experience, especially on open-source software systems [37] and legacy systems [92], is difficult. (d) Software systems are constantly evolving, and maintaining an up-to-date expert decomposition can be a tedious, error-prone, and time-consuming task [93].
• Factual Information-Based Decomposition: The decomposition is obtained using the current factual information of the targeted software system, e.g., the folder structure, the package structure, or the file structure. The advantage of using this approach is that its decompositions have good quality because they have been created by the original developers and domain experts [81].
• Original Developer-Based Decomposition: The decomposition is obtained by approaching the original developers of the target software system. The drawback of this approach is that the original developers are typically not available.
• Documentation-Based Decomposition: The decomposition is obtained using the key functional concepts extracted from the software architecture documentation.
• Maintenance Log-Based Decomposition: The decomposition is obtained by extracting information embedded in maintenance logs, which can be utilized to produce multiple decomposition stages of the target system.
Many clustering approaches use expert decomposition to evaluate the clustering results. Figure 19 shows the distribution of the expert decomposition, where 24 approaches are based on domain experts, 15 are based on factual information, 14 are based on the original developers, two are based on the documentation, and one is based on maintenance logs.

Fig. 19: Publication ratio by source of expert decomposition.

Many factors affect the quality and efficacy of a clustering process. The following points summarize those factors [65]:
• The domain, type, size, and architecture of the targeted software system. For example, a clustering algorithm that is successful for a procedural program or a small software system might be unsuccessful for a large system developed in an object-oriented paradigm [87].
• The choice of factbase sources, the preprocessing of data extracted from them, and finding an appropriate representation of them [100].
• The entities, features, and relationships between them in terms of how well they have been selected.
• The type of similarity measures and algorithms used for the clustering process [100].
• Arbitrary decisions during the clustering process influence the quality and performance of clustering [82].

4.5 Potential Future Directions of Research (RQ11)
Answering RQ11 will help to identify possible research areas that may require further investigation. Based on the analysis of the considered papers, several potential directions of research were identified. The following points summarize and categorize these future research directions:
• Scalability: The clustering approach should handle a growing quantity of input without decreasing the quality of the clustering results. This can be achieved by performing clustering in parallel using multithreading programming techniques and hardware systems that have multicore processors. The consistency of clustering must be ensured, i.e., performing the clustering multiple times on the same dataset should produce the same results.
• Visualization: The display of the clustering results should be improved in cases where large outputs are presented on the screen simultaneously. This improvement can be achieved by applying filters to separate the results into different abstraction layers. As a result, only the clusters within a specific layer can be viewed instead of going through all the displayed results. Thus, the user can better understand the results. The visualization tools should also automatically associate labels with the generated clusters.
• CASE tool: It is essential to consider making the whole clustering process a working tool (e.g., third-party libraries, plugins, and standalone applications) available for researchers, software engineers, and practitioners to perform further experimentation and collect feedback for future improvements.
• Targeted systems: The targeted or subject software systems used in the experiments are mostly Java-based open-source systems. It would be interesting to examine the use of non-open-source systems and systems written in other popular programming languages. Furthermore, considering different software application domains to keep the results generic and widely applicable requires further study. When the targeted systems are selected from various application domains, a specific clustering algorithm may exhibit various performance characteristics. Therefore, the various performance features of a particular clustering algorithm, when applied to multiple target systems, should be investigated.
• Entity features: Some software entities such as files and classes have a large number of different types of features (e.g., lines of code, executable statements, number of functions, and number of variables or objects) to be extracted. From a practical perspective, addressing this large number of various features is not preferable. Therefore, experimental studies to determine the number and type of features needed for better clustering results should be conducted. Further experiments may reveal a clustering approach that is more suitable for target systems that present specific characteristics (e.g., size or implemented functionality).
• Factbase sources: There are different sources for factbase extraction. Each has its own set of features and drawbacks.
  Thus, experimental studies that can determine the type of factbase sources that provide better clustering results should be performed. The impact of integrating different factbase sources on the overall clustering accuracy should be studied.
• Cooperative clustering: Only a few studies have combined more than one clustering algorithm to achieve a common goal. Currently, clusters that are produced by the first clustering algorithm are subdivided and reclustered by the second clustering algorithm, a situation that is often undesirable. Thus, more experiments should be performed, and tools to address the situation by not reclustering all the clusters that are part of the initial solution set should be developed.
• Selection of clustering algorithms: The selection of appropriate software clustering algorithms plays a significant role in producing meaningful clustering results. The authors in [163] proposed guidelines for selecting or rejecting a clustering algorithm for a given software system. However, there are no comprehensive methods for clustering algorithm selection. Thus, further research and experiments can be conducted to provide formal selection methods based on empirical evidence.
• Clustering with aesthetic aspects of the software design: The computational determination of the optimized cluster is often mechanistic, ignoring the fact that software is a creative artifact. Typically, cluster arrangement is determined via the grouping of nodes from dependency graphs. Having an optimized clustering of modules versus a meaningful set of clusters are two sides of the same coin with competing objectives. On the one hand, it is desirable to maximize cohesion and minimize coupling at any cost. On the other hand, one may also need to capture the semantics as well as the essence and aesthetic aspects of the software design. Integrating natural language processing with deep learning and considering other criteria (apart from simply grouping nodes from dependency graphs), such as utilizing the defined naming of modules and internal variables or other factbase sources as part of clustering arrangement criteria, would be a useful endeavor.
• Beyond clustering: The ultimate aim of software module clustering is to help software engineers apply the recommended clustering results to their software projects. However, do software engineers really adhere to the recommended clustering results? If yes, what will be the impact, cost, or effort of realizing the suggested results? No research has comprehensively addressed these issues. Thus, a thorough investigation in this respect may be a good step for further study.

5 THREATS TO VALIDITY
Every literature mapping study has a number of threats that might affect its validity. In this study, several threats were eliminated by considering well-known recommendations and guidelines on conducting literature mapping studies as follows:
• Coverage of research questions: The threat here is that the research questions of this study may not cover all the aspects of the state-of-the-art research in software module clustering. To address this threat, all the authors of this study used brainstorming to define the desired set of research questions that cover the existing research in the area.
• Coverage of relevant studies: It cannot be guaranteed that all the relevant studies in software module clustering have been identified. Accordingly, different literature databases have been used, and a PICO method-based search string with various term synonyms (each author of the paper suggested different terms that lead to desired clustering concepts) has been applied to obtain the relevant research publications. However, some unidentified papers may remain. To address this issue, the snowballing method was intensively applied to reduce the possibility of missing important related papers.
• Paper inclusion/exclusion criteria: Application of the criteria can suffer from single-author judgment and personal bias. To address this issue, each paper was included or excluded for this study only after the authors reached a consensus.
• Accuracy of data extraction: Data extraction can suffer from reliance on a single author's experience. Accordingly, each author individually performed the data extraction process, and the outcomes of all authors were compared in an online meeting. In the meeting, all authors discussed differences between the outcomes until a final consensus was reached. Automatic filtering provided by Microsoft Excel was also used to ensure the accuracy of the data extraction process.
• Reproducibility of the study: The issue here is whether other researchers can perform this study with similar results. Accordingly, all the steps followed and performed in this study were reported in the research methodology (see Section 3).

6 CONCLUSION
This paper systematically reports the state-of-the-art empirical contributions in software module clustering. To ascertain the recent clustering applications in software engineering, the algorithms and tools used to enable the software module clustering process were identified. A total of 143 papers from popular literature databases published in the area of software module clustering from 2008 to 2019 were selected for this study. The published papers were a combination of works from conferences, journals, symposiums, and workshops. However, most of the published papers were from conferences. From different perspectives and based on several identified RQs, the selected studies were thoroughly reviewed and analyzed. The findings fell into different categories. For instance, statistics on the published studies, their publication venues, active authors, and countries were reported. Then, software module clustering applications were categorized. All the algorithms, tools, target software systems, evaluations, and metrics that enabled the clustering process were briefly discussed. Finally, as there are many research studies on software module clustering, novice researchers are likely to experience difficulties in addressing different aspects of the area. Therefore, we propose this analysis study as a primary reference to simplify the process of finding the most relevant information.

7 ACKNOWLEDGEMENT
This work has been partially funded by the Knowledge Foundation of Sweden (KKS) through the Synergy Project AIDA - A Holistic AI-driven Networking and Processing Framework for Industrial IoT (Rek:20200067).

REFERENCES
[1] A. Adolfsson, M. Ackerman, and N. C. Brownstein, “To cluster, or not to cluster: An analysis of clusterability methods,” Pattern Recognition, vol. 88, pp. 13–26, 2019.
[2] P. Antonellis, D. Antoniou, Y. Kanellopoulos, C. Makris, E. Theodoridis, C. Tjortjis, and N. Tsirakis, “Clustering for Monitoring Software Systems Maintainability Evolution,” Electronic Notes in Theoretical Computer Science, vol. 233, pp. 43–57, 2009.
[3] G. Şerban and I.-G. Czibula, “Object-Oriented Software Systems Restructuring through Clustering,” in International Conference on Artificial Intelligence and Soft Computing (ICAISC), 2008, pp. 693–704.
[4] J. Feng and H. Seok, “Applying agglomerative hierarchical clustering algorithms to component identification for legacy systems,” Information and Software Technology, vol. 53, no. 6, pp. 601–614, 2011.
[5] R. A. Bittencourt and D. D. S. Guerrero, “Comparison of Graph Clustering Algorithms for Recovering Software Architecture Module Views,” in the 13th European Conference on Software Maintenance and Reengineering. IEEE, 2009, pp. 251–254.
[6] B. Joshi, P. Budhathoki, W. L. Woon, and D. Svetinovic, “Software Clone Detection Using Clustering Approach,” in International Conference on Neural Information Processing, 2015, pp. 520–527.
[7] R. Naseem, A. Ahmed, S. U. Khan, M. Saqib, and M. Habib, “Program restructuring using agglomerative clustering technique based on binary features,” in 2012 International Conference on Emerging Technologies. IEEE, 2012, pp. 1–6.
[8] M. Kargar, A. Isazadeh, and H. Izadkhah, “Multi-programming language software systems modularization,” Computers and Electrical Engineering, vol. 80, pp. 1–22, 2019.
[9] Y. Yu, J. Lu, J. Fernandez-Ramil, and P. Yuan, “Comparing web services with other software components,” in IEEE International Conference on Web Services (ICWS), 2007, pp. 388–397.
[10] N. Arunachalam, A. Amuthan, C. Kavya, M. Sharmilla, K. Ushanandhini, and M. Shanmughapriya, “A survey on web service clustering,” in 2017 International Conference on Computation of Power, Energy Information and Commuincation (ICCPEIC), 2017, pp. 247–252.
[11] M. Shtern and V. Tzerpos, “Clustering Methodologies for Software Engineering,” Advances in Software Engineering, vol. 2012, pp. 1–18, 2012.
[12] V. Singh, “Software module clustering using metaheuristic search techniques: A survey,” in 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), 2016, pp. 2764–2767.
[13] F. Morsali and M. R. Keyvanpour, “Search-based software module clustering techniques: A review article,” in 2017 IEEE 4th International Conference on Knowledge-Based Engineering and Innovation (KBEI), 2017, pp. 0977–0983.
[14] M. Harman, “The current state and future of search based software engineering,” in Future of Software Engineering (FOSE ’07), 2007, pp. 342–357.
[15] M. Harman, S. A. Mansouri, and Y. Zhang, “Search-based software engineering: Trends, techniques and applications,” ACM Computing Surveys (CSUR), vol. 45, no. 1, pp. 11:1–11:61, 2012.
[16] A. Ramírez, J. R. Romero, and S. Ventura, “A survey of many-objective optimisation in search-based software engineering,” Journal of Systems and Software, vol. 149, pp. 382–395, 2019.
[17] K. Petersen, S. Vakkalanka, and L. Kuzniarz, “Guidelines for conducting systematic mapping studies in software engineering: An update,” Information and Software Technology, vol. 64, pp. 1–18, 2015.
[18] B. Kitchenham and S. Charters, “Guidelines for Performing Systematic Literature Reviews in Software Engineering,” version 2.3., EBSE Technical Report EBSE-2007-01, Software Engineering Group, School of Computer Science and Mathematics, Keele University, UK and Department of Computer Science, University of Durham, 2007.
[19] O. Pedreira, F. García, N. Brisaboa, and M. Piattini, “Gamification in software engineering – A systematic mapping,” Information and Software Technology, vol. 57, pp. 157–168, 2015.
[20] M. Usman, R. Britto, J. Börstler, and E. Mendes, “Taxonomies in software engineering: A Systematic mapping study and a revised taxonomy development method,” Information and Software Technology, vol. 85, pp. 43–59, 2017.
[21] B. S. Ahmed, K. Z. Zamli, W. Afzal, and M. Bures, “Constrained Interaction Testing: A Systematic Literature Study,” IEEE Access, vol. 5, pp. 25 706–25 730, 2017.
[22] P. Brereton, B. A. Kitchenham, D. Budgen, M. Turner, and M. Khalil, “Lessons from applying the systematic literature review process within the software engineering domain,” Journal of Systems and Software, vol. 80, no. 4, pp. 571–583, 2007.
[23] C. Wohlin, “Guidelines for snowballing in systematic literature studies and a replication in software engineering,” in Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering - EASE ’14. ACM Press, 2014, pp. 1–10.
[24] Google. Google trends. [Online]. Available: https://trends.google.com/trends/?geo=US, Accessed: 10/03/2020.
[25] O. Pedreira, F. García, N. Brisaboa, and M. Piattini, “Gamification in software engineering - a systematic mapping,” Information and Software Technology, vol. 57, pp. 157–168, 2015.
[26] R. E. Lopez-herrejon, L. Linsbauer, and A. Egyed, “A systematic mapping study of search-based software engineering for software product lines,” Information and Software Technology, vol. 61, pp. 33–51, 2015.
[27] A. Parashar and J. K. Chhabra, “Clustering Dynamic Class Coupling Data to Measure Class Reusability Pattern,” in International Conference on High Performance Architecture and Grid Computing, 2011, pp. 126–130.
[28] Amarjeet and J. K. Chhabra, “An empirical study of the sensitivity of quality indicator for software module clustering,” in 2014 Seventh International Conference on Contemporary Computing (IC3). IEEE, 2014, pp. 206–211.
[29] A. Prajapati and J. K. Chhabra, “A Particle Swarm Optimization-Based Heuristic for Software Module Clustering Problem,” Arabian Journal for Science and Engineering, vol. 43, no. 12, pp. 7083–7094, 2017.
[30] Amarjeet and J. K. Chhabra, “FP-ABC: Fuzzy-Pareto dominance driven artificial bee colony algorithm for many-objective software module clustering,” Computer Languages, Systems & Structures, vol. 51, pp. 1–21, 2017.
[31] M. Kargar, A. Isazadeh, and H. Izadkhah, “Semantic-based software clustering using hill climbing,” in 2017 International Symposium on Computer Science and Software Engineering Conference (CSSE). IEEE, 2017, pp. 55–60.
[32] A. Rathee and J. K. Chhabra, “Software Remodularization by Estimating Structural and Conceptual Relations Among Classes and Using Hierarchical Clustering,” in International Conference on Advanced Informatics for Computing Research (ICAICR), 2017, pp. 94–106.
[33] Amarjeet and J. K. Chhabra, “Improving package structure of object-oriented software using multi-objective optimization and weighted class connections,” Journal of King Saud University - Computer and Information Sciences, vol. 29, no. 3, pp. 349–364, 2017.
[34] A. Rathee and J. Kumar, “Improving Cohesion of a Software System by Performing Usage Pattern Based Clustering,” Procedia Computer Science, vol. 125, pp. 740–746, 2018.
[35] Amarjeet and J. K. Chhabra, “Many-objective artificial bee colony algorithm for large-scale software module clustering problem,” Soft Computing, vol. 22, no. 19, pp. 6341–6361, 2018.
[36] A. Rathee and J. K. Chhabra, “A multi-objective search based approach to identify reusable software components,” Journal of Computer Languages, vol. 52, pp. 26–43, 2019.
[37] A. Corazza, S. Di Martino, and G. Scanniello, “A Probabilistic Based Approach towards Software System Clustering,” in 2010 14th European Conference on Software Maintenance and Reengineering. IEEE, 2010, pp. 88–96.
[38] G. Scanniello, A. D’Amico, C. D’Amico, and T. D’Amico, “Using the Kleinberg Algorithm and Vector Space Model for Software System Clustering,” in 2010 IEEE 18th International Conference on Program Comprehension. IEEE, 2010, pp. 180–189.
[39] S. Romano, G. Scanniello, M. Risi, and C. Gravino, “Clustering and lexical information support for the recovery of design pattern in source code,” in 2011 27th IEEE International Conference on Software Maintenance (ICSM). IEEE, 2011, pp. 500–503.
[40] G. Scanniello and A. Marcus, “Clustering Support for Static Concept Location in Source Code,” in 2011 IEEE 19th International Conference on Program Comprehension, 2011, pp. 1–10.
[41] A. Corazza, S. D. Martino, V. Maggio, and G. Scanniello, “Investigating the Use of Lexical Information for Software System Clustering,” in 2011 15th European Conference on Software Maintenance and Reengineering. IEEE, 2011, pp. 35–44.
[42] M. Risi, G. Scanniello, and G. Tortora, “Using fold-in and fold-out in the architecture recovery of software systems,” Formal Aspects of Computing, vol. 24, no. 3, pp. 307–330, 2012.
[43] G. Scanniello, C. Gravino, A. Marcus, and T. Menzies, “Class level fault prediction using software clustering,” in 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2013, pp. 640–645.
[44] A. Corazza, S. Di Martino, V. Maggio, and G. Scanniello, “Weighing lexical information for software clustering in the context of architecture recovery,” Empirical Software Engineering, vol. 21, no. 1, pp. 72–103, 2016.
[45] S. Mohammadi and H. Izadkhah, “A new algorithm for software clustering considering the knowledge of dependency between artifacts in the source code,” Information and Software Technology, vol. 105, pp. 252–256, 2019.
[46] H. Masoud and S. Jalili, “A clustering-based model for class responsibility assignment problem in object-oriented analysis,” Journal of Systems and Software, vol. 93, pp. 110–131, 2014.
[47] M. Akbari and H. Izadkhah, “Hybrid of genetic algorithm and krill herd for software clustering problem,” 2019 IEEE 5th Conference on Knowledge Based Engineering and Innovation (KBEI), pp. 565–570, 2019.
[48] A. H. Farajpour Tabrizi and H. Izadkhah, “Software Modularization by Combining Genetic and Hierarchical Algorithms,” in 5th Conference on Knowledge Based Engineering and Innovation (KBEI). IEEE, 2019, pp. 454–459.
[49] M. Kargar, H. Izadkhah, and A. Isazadeh, “Tarimliq: A new internal metric for software clustering analysis,” in 27th Iranian Conference on Electrical Engineering (ICEE). IEEE, 2019, pp. 1879–1883.
[50] R. Naseem, O. Maqbool, and S. Muhammad, “Cooperative clustering for software modularization,” Journal of Systems and Software, vol. 86, no. 8, pp. 2045–2062, 2013.
[51] R. Naseem and M. M. Deris, “A New Binary Similarity Measure Based on Integration of the Strengths of Existing Measures: Application to Software Clustering,” in International Conference on Soft Computing and Data Mining (SCDM), 2016, pp. 304–315.
[52] R. Naseem, M. M. Deris, O. Maqbool, and S. Shahzad, “Euclidean space based hierarchical clusterers combinations: an application to software clustering,” Cluster Computing, vol. 22, pp. 7287–7311, 2019.
[53] Z. Shah, R. Naseem, M. A. Orgun, A. Mahmood, and S. Shahzad, “Software Clustering Using Automated Feature Subset Selection,” in Advanced Data Mining and Applications: 9th International Conference, ADMA Proceedings, ser. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 47–58.
[54] S. Muhammad, O. Maqbool, and A. Abbasi, “Evaluating relationship categories for clustering object-oriented software systems,” IET Software, vol. 6, no. 3, pp. 260–274, 2012.
[55] R. Naseem, M. B. M. Deris, O. Maqbool, J. peng Li, S. Shahzad, and H. Shah, “Improved binary similarity measures for software modularization,” Frontiers of Information Technology and Electronic Engineering, vol. 18, pp. 1082–1107, 2017.
[56] I. G. Czibula and G. Czibula, “Clustering based automatic refactorings identification,” in Proceedings of the 2008 10th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, SYNASC 2008, 2008, pp. 253–256.
[57] I. Czibula and G. Czibula, “Hierarchical clustering for adaptive refactorings identification,” in 2010 IEEE International Conference on Automation, Quality and Testing, Robotics (AQTR). IEEE, 2010, pp. 1–6.
[58] Z. Marian, I. G. Czibula, and G. Czibula, “A hierarchical clustering-based approach for software restructuring at the package level,” in Proceedings of 19th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC). IEEE, 2017, pp. 239–246.
[59] A. Alkhalid, C.-H. Lung, and S. Ajila, “Software architecture decomposition using adaptive K-nearest neighbor algorithm,” in 2013 26th IEEE Canadian Conference on Electrical and Computer Engineering (CCECE). IEEE, 2013, pp. 1–4.
[60] A. Alkhalid, C.-h. Lung, D. Liu, and S. Ajila, “Software Architecture Decomposition Using Clustering Techniques,” in 2013 IEEE 37th Annual Computer Software and Applications Conference. IEEE, 2013, pp. 806–811.
[61] D. Liu, C.-h. Lung, and S. A. Ajila, “Adaptive Clustering Techniques for Software Components and Architecture,” in 2015 IEEE 39th Annual Computer Software and Applications Conference. IEEE, 2015, pp. 460–465.
[62] J. Huang and J. Liu, “A similarity-based modularization quality measure for software module clustering problems,” Information Sciences, vol. 342, pp. 96–110, 2016.
[63] J. Huang, J. Liu, and X. Yao, “A multi-agent evolutionary algorithm for software module clustering problems,” Soft Computing, vol. 21, no. 12, pp. 3415–3428, 2017.
[64] W. Pan, B. Li, J. Liu, Y. Ma, and B. Hu, “Analyzing the structure of Java software systems by weighted k-core decomposition,” Future Generation Computer Systems, vol. 83, pp. 431–444, 2018.
[65] M. Risi, G. Scanniello, and G. Tortora, “Architecture Recovery Using Latent Semantic Indexing and K-Means: An Empirical Evaluation,” in 2010 8th IEEE International Conference on Software Engineering and Formal Methods. IEEE, 2010, pp. 103–112.
[66] A. Alkhalid, M. Alshayeb, and S. Mahmoud, “Software refactoring at the function level using new Adaptive K-Nearest Neighbor algorithm,” Advances in Engineering Software, vol. 41, no. 10-11, pp. 1160–1178, 2010.
[67] A. Alkhalid, M. Alshayeb, and S. Mahmoud, “Software refactoring at the package level using clustering techniques,” IET Software, vol. 5, no. 3, pp. 274–286, 2011.
[68] M. D. O. Barros, “An analysis of the effects of composite objectives in multiobjective software module clustering,” in Proceedings of the 14th international conference on Genetic and evolutionary computation conference (GECCO). ACM Press, 2012, pp. 1205–1212.
[69] M. C. Monçores, A. C. Alvim, and M. O. Barros, “Large Neighborhood Search applied to the Software Module Clustering problem,” Computers & Operations Research, vol. 91, pp. 92–111, 2017.
[70] P. Antonellis, D. Antoniou, Y. Kanellopoulos, C. Makris, E. Theodoridis, C. Tjortjis, and N. Tsirakis, “Employing Clustering for Assisting Source Code Maintainability Evaluation according to ISO/IEC-9126,” in Artificial Intelligence Techniques in Software Engineering Workshop (AISEW), 2008, pp. 1–5.
[71] S. Arshad and C. Tjortjis, “Clustering Software Metric Values Extracted from C# Code for Maintainability Assessment,” in Proceedings of the 9th Hellenic Conference on Artificial Intelligence - SETN ’16. New York, New York, USA: ACM Press, 2016, pp. 1–4.
[72] D. Papas and C. Tjortjis, “Combining Clustering and
Classification for Software Quality Evaluation,” in Hellenic Conference on Artificial Intelligence, 2014, pp. 273–286.
[73] Linhui Zhong, Liangbo Xue, Nengwei Zhang, Jing Xia, and Jun Chen, “A tool to support software clustering using the software evolution information,” in 2016 7th IEEE International Conference on Software Engineering and Service Science (ICSESS). IEEE, 2016, pp. 304–307.
[74] L. Zhong, J. He, N. Zhang, P. Zhang, and J. Xia, “Software Evolution Information Driven Service-Oriented Software Clustering,” in IEEE International Congress on Big Data. IEEE, 2016, pp. 493–500.
[75] C. Y. Chong, S. P. Lee, and T. C. Ling, “Efficient software clustering technique using an adaptive and preventive dendrogram cutting approach,” Information and Software Technology, vol. 55, no. 11, pp. 1994–2012, 2013.
[76] C. Y. Chong and S. P. Lee, “Constrained agglomerative hierarchical software clustering with hard and soft constraints,” Proceedings of the 10th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE), pp. 177–188, 2015.
[77] M. N. Adnan, M. R. Islam, and S. Hossain, “Clustering software systems to identify subsystem structures using knowledgebase,” in Malaysian Conference in Software Engineering, 2011, pp. 445–450.
[78] K. Jeet and R. Dhir, “Software Architecture Recovery using Genetic Black Hole Algorithm,” ACM SIGSOFT Software Engineering Notes, vol. 40, no. 1, pp. 1–5, 2015.
[79] A. S. Mamaghani and M. R. Meybodi, “Clustering of Software Systems Using New Hybrid Algorithms,” in The 9th IEEE International Conference on Computer and Information Technology. IEEE, 2009, pp. 20–25.
[80] Z. Han, L. Wang, L. Yu, X. Chen, J. Zhao, and X. Li, “Design pattern directed clustering for understanding open source code,” in 2009 IEEE 17th International Conference on Program Comprehension. IEEE, 2009, pp. 295–296.
[81] F. Beck and S. Diehl, “Evaluating the Impact of Software Evolution on Software Clustering,” in 2010 17th Working Conference on Reverse Engineering. IEEE, 2010, pp. 99–108.
[82] Y. Wang, P. Liu, H. Guo, H. Li, and X. Chen, “Improved Hierarchical Clustering Algorithm for Software Architecture Recovery,” in 2010 International Conference on Intelligent Computing and Cognitive Informatics. IEEE, 2010, pp. 247–250.
[83] Q. Zhang, D. Qiu, Q. Tian, and L. Sun, “Object-oriented software architecture recovery using a new hybrid clustering algorithm,” in 2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery. IEEE, 2010, pp. 2546–2550.
[84] M. Shtern and V. Tzerpos, “On the Comparability of Software Clustering Algorithms,” in 2010 IEEE 18th International Conference on Program Comprehension. IEEE, 2010, pp. 64–67.
[85] I. Sora, G. Glodean, and M. Gligor, “Software architecture reconstruction: An approach based on combining graph clustering and partitioning,” in 2010 International Joint Conference on Computational Cybernetics and Technical Informatics. IEEE, 2010, pp. 259–264.
[86] M. Faunes, M. Kessentini, and H. Sahraoui, “Deriving High-level Abstractions from Legacy Software Using Example-driven Clustering,” in Proceedings of the 2011 Conference of the Center for Advanced Studies on Collaborative Research, 2011, pp. 188–199.
[87] U. Erdemir, U. Tekin, and F. Buzluca, “Object Oriented Software Clustering Based on Community Structure,” in 2011 18th Asia-Pacific Software Engineering Conference. IEEE, 2011, pp. 315–321.
[88] K. Praditwong, M. Harman, and X. Yao, “Software Module Clustering as a Multi-Objective Search Problem,” IEEE Transactions on Software Engineering, vol. 37, no. 2, pp. 264–282, 2011.
[89] K. Praditwong, “Solving software module clustering problem by evolutionary algorithms,” in 2011 Eighth International Joint Conference on Computer Science and Software Engineering (JCSSE). IEEE, 2011, pp. 154–159.
[90] K. Kobayashi, M. Kamimura, K. Kato, K. Yano, and A. Matsuo, “Feature-gathering dependency-based software clustering using Dedication and Modularity,” in 2012 28th IEEE International Conference on Software Maintenance (ICSM). IEEE, 2012, pp. 462–471.
[91] J. Misra, K. Annervaz, V. Kaulgud, S. Sengupta, and G. Titus, “Software Clustering: Unifying Syntactic and Semantic Features,” in 2012 19th Working Conference on Reverse Engineering. IEEE, 2012, pp. 113–122.
[92] G. E. Boussaidi, A. B. Belle, S. Vaucher, and H. Mili, “Reconstructing Architectural Views from Legacy Systems,” in the 19th Working Conference on Reverse Engineering. IEEE, 2012, pp. 345–354.
[93] A. Mahmoud and Nan Niu, “Evaluating software clustering algorithms in the context of program comprehension,” in 2013 21st International Conference on Program Comprehension (ICPC). IEEE, 2013, pp. 162–171.
[94] V. Köhler, M. Fampa, and O. Araújo, “Mixed-Integer Linear Programming Formulations for the Software Clustering Problem,” Computational Optimization and Applications, vol. 55, no. 1, pp. 113–135, 2013.
[95] A. C. Kumari, K. Srinivas, and M. P. Gupta, “Software module clustering using a hyper-heuristic based multi-objective genetic algorithm,” in 2013 3rd IEEE International Advance Computing Conference (IACC). IEEE, 2013, pp. 813–818.
[96] C. Deiters, A. Rausch, and M. Schindler, “Using spectral clustering to automate identification and optimization of component structures,” in 2013 2nd International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE). IEEE, 2013, pp. 14–20.
[97] A. Ibrahim, D. Rayside, and R. Kashef, “Cooperative based software clustering on dependency graphs,” in 2014 IEEE 27th Canadian Conference on Electrical and Computer Engineering (CCECE). IEEE, 2014, pp. 1–6.
[98] A. M. Saeidi, J. Hage, R. Khadka, and S. Jansen, “A search-based approach to multi-view clustering of software systems,” in 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER). IEEE, 2015, pp. 429–438.
[99] M. Schindler, O. Fox, and A. Rausch, “Clustering Source Code Elements by Semantic Similarity Using Wikipedia,” in 2015 IEEE/ACM 4th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering. IEEE, 2015, pp. 13–18.
[100] J. Kaur and P. Tomar, “A software component selection technique based on fuzzy clustering,” in 2016 1st India International Conference on Information Processing (IICIP). IEEE, 2016, pp. 1–5.
[101] D. Jensen and A. Lundkvist, “On the significance of relationship directions in clustering algorithms for reverse engineering,” in Proceedings of the Symposium on Applied Computing - SAC ’17. New York, New York, USA: ACM Press, 2017, pp. 1239–1244.
[102] X. Li, L. Zhang, and N. Ge, “Framework Information Based Java Software Architecture Recovery,” in 24th Asia-Pacific Software Engineering Conference Workshops (APSECW). IEEE, 2017, pp. 114–120.
[103] S. M. Naim, K. Damevski, and M. S. Hossain, “Reconstructing and evolving software architectures using a coordinated clustering framework,” Automated Software Engineering, vol. 24, pp. 543–572, 2017.
[104] M. Hall, N. Walkinshaw, and P. McMinn, “Effectively incorporating expert knowledge in automated software remodularisation,” IEEE Transactions on Software Engineering, vol. 44, no. 7, pp. 613–630, 2018.
[105] J. Sun and B. Ling, “Software Module Clustering Algorithm Using Probability Selection,” Wuhan University Journal of Natural Sciences, vol. 23, no. 2, pp. 93–102, 2018.
[106] K. Z. Zamli, F. Din, N. Ramli, and B. S. Ahmed, “Software Module Clustering Based on the Fuzzy Adaptive Teaching Learning Based Optimization Algorithm,” in Proceedings of Intelligent and Interactive Computing (IIC), 2018, pp. 167–177.
[107] C. Cho, K.-S. Lee, M. Lee, and C.-G. Lee, “Software Architecture Module-View Recovery Using Cluster Ensembles,” IEEE Access, vol. 7, pp. 72 872–72 884, 2019.
[108] M. Papachristou, “Software clusterings with vector semantics and the call graph,” in Proceedings of the 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), 2019, pp. 1184–1186.
[109] H. Sözer, “Evaluating the effectiveness of multi-level greedy modularity clustering for software architecture recovery,” in European Conference on Software Architecture (ECSA), 2019, pp. 71–87.
[110] L. Lövei, C. Hoch, H. Köllö, T. Nagy, A. Nagyné Víg, D. Horpácsi, R. Kitlei, and R. Király, “Refactoring module structure,” in Proceedings of the 7th ACM SIGPLAN workshop on ERLANG. ACM Press, 2008, pp. 83–89.
[111] L.-H. Zhong, L. Xu, M.-s. Ye, Y. Zheng, and B. Xie, “An Approach for Software Architecture Refactoring Based on Clustering of Extended Component Dependency Graph,” in International Conference on Computational Intelligence and Software Engineering. IEEE, dec 2009, pp. 1–4.
[112] M. Fokaefs, N. Tsantalis, A. Chatzigeorgiou, and J. Sander, “Decomposing object-oriented class modules using an agglomerative clustering technique,” in 2009 IEEE International Conference on Software Maintenance. IEEE, 2009, pp. 93–101.
[113] M. Glorie, A. Zaidman, A. van Deursen, and L. Hofland, “Splitting a large software repository for easing future software evolution-an industrial experience report,” Journal of Software Maintenance and Evolution: Research and Practice, vol. 21, no. 2, pp. 113–141, 2009.
[114] Y. Liu, G. Guo, and J. Qi, “An algorithm of system decomposition based on laplace spectral graph partitioning technology,” in Proceedings - International Conference on Computer Science and Software Engineering, CSSE 2008, 2008, pp. 85–89.
[115] J. Dietrich, V. Yakovlev, C. McCartin, G. Jenson, and M. Duchrow, “Cluster analysis of Java dependency graphs,” in Proceedings of the 4th ACM symposium on Software visualization (SoftVis). New York, New York, USA: ACM Press, 2008, pp. 91–94.
[116] B. Khan, S. Sohail, and M. Y. Javed, “Evolution Strategy Based Automated Software Clustering Approach,” in 2008 Advanced Software Engineering and Its Applications (ASEA). IEEE, dec 2008, pp. 27–34.
[117] R. Adnan, B. Graaf, A. van Deursen, and J. Zonneveld, “Using Cluster Analysis to Improve the Design of Component Interfaces,” in The 23rd IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, sep 2008, pp. 383–386.
[118] K. Cassell, P. Andreae, L. Groves, and J. Noble, “Towards automating class-splitting using betweenness clustering,” in ASE2009 - 24th IEEE/ACM International Conference on Automated Software Engineering, 2009, pp. 595–599.
[119] M. Fokaefs, N. Tsantalis, E. Stroulia, and A. Chatzigeorgiou, “JDeodorant: Identification and Application of Extract Class Refactorings,” in Proceeding of the 33rd international conference on Software engineering - ICSE ’11. New York, New York, USA: ACM Press, 2011, pp. 1037–1039.
[120] Y. Zhang, G. Huang, W. Zhang, X. Liu, and H. Mei, “Towards module-based automatic partitioning of Java applications,” Frontiers of Computer Science, vol. 6, no. 6, pp. 725–740, 2012.
[121] A. Hussain and M. S. Rahman, “A new hierarchical clustering technique for restructuring software at the function level,” in Proceedings of the 6th India Software Engineering Conference on - ISEC ’13. New York, New York, USA: ACM Press, 2013, pp. 45–54.
[122] G. Santos, M. T. Valente, and N. Anquetil, “Remodularization analysis using semantic clustering,” in 2014 Software Evolution Week - IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE). IEEE, 2014, pp. 224–233.
[123] A. S. Mamaghani and M. Hajizadeh, “Software modularization using the modified firefly algorithm,” in 2014 8th Malaysian Software Engineering Conference (MySEC). IEEE, 2014, pp. 321–324.
[124] L. L. Silva, M. T. Valente, and M. d. A. Maia, “Assessing modularity using co-change clusters,” in Proceedings of the 13th international conference on Modularity. ACM Press, 2014, pp. 49–60.
[125] M. W. Mkaouer, M. Kessentini, S. Bechikh, K. Deb, and M. Ó Cinnéide, “High dimensional search-based software engineering: Finding Tradeoffs Among 15 Objectives for Automating Software Refactoring Using NSGA-III,” in Proceedings of the 2014 conference on Genetic and evolutionary computation (GECCO). ACM Press, 2014, pp. 1263–1270.
[126] M. Paixao, M. Harman, and Y. Zhang, “Multi-objective Module Clustering for Kate,” in International Symposium on Search Based Software Engineering SSBSE 2015: Search-Based Software Engineering, 2015, pp. 282–288.
[127] I. Šūpulniece, I. Poļaka, S. Běrziša, E. Ozoliņš, E. Palacis, E. Meiers, and J. Grabis, “Source Code Driven Enterprise Application Decomposition: Preliminary Evaluation,” Procedia Computer Science, vol. 77, pp. 167–175, 2015.
[128] M. Bishnoi and P. Singh, “Modularizing Software Systems using PSO optimized hierarchical clustering,” in 2016 International Conference on Computational Techniques in Information and Communication Technologies (ICCTICT). IEEE, 2016, pp. 659–664.
[129] J. Lee, D.-K. Kim, J. Park, and S. Park, “Class Modularization Using Indirect Relationships,” in the 22nd International Conference on Engineering of Complex Computer Systems (ICECCS). IEEE, 2017, pp. 110–119.
[130] Y. Wang, H. Yu, Z. Zhu, W. Zhang, and Y. Zhao, “Automatic Software Refactoring via Weighted Clustering in
Method-Level Networks,” IEEE Transactions on Software based technique for software quality prediction through
Engineering, vol. 44, no. 3, pp. 202–236, 2018. clustering and chi-square test,” in 2015 International Con-
[131] B. G. Varghese R, K. Raimond, and J. Lovesum, “A ference on Applied and Theoretical Computing and Commu-
novel approach for automatic remodularization of soft- nication Technology (iCATccT). IEEE, 2015, pp. 238–245.
ware systems using extended ant colony optimization [146] P. S. Sandhu, M. Kaur, and A. Kaur, “A Density Based
algorithm,” Information and Software Technology, vol. 114, Clustering approach for early detection of fault prone
pp. 107–120, 2019. modules,” in 2010 International Conference on Electronics
[132] E. Hatami and B. Arasteh, “An efficient and stable and Information Engineering. IEEE, 2010, pp. 525–530.
method to cluster software modules using ant colony [147] X. Tan, X. Peng, S. Pan, and W. Zhao, “Assessing
optimization algorithm,” The Journal of Supercomputing, Software Quality by Program Clustering and Defect
pp. 1–23, 2019. Prediction,” in 2011 18th Working Conference on Reverse
[133] Y.-S. Seo and J.-H. Huh, “GUI-based software modular- Engineering. IEEE, 2011, pp. 244–248.
ization through module clustering in edge computing [148] C. M. Rosenberg and L. Moonen, “Improving problem
based IoT environments,” Journal of Ambient Intelligence identification via automated log clustering using dimen-
and Humanized Computing, vol. 22, pp. 7287–7311, 2019. sionality reduction,” in Proceedings of the 12th ACM/IEEE
[134] S. Kebir, A.-D. Seriai, A. Chaoui, and S. Chardigny, International Symposium on Empirical Software Engineering
“Comparing and Combining Genetic and Clustering and Measurement - ESEM ’18. New York, New York,
Algorithms for Software Component Identification from USA: ACM Press, 2018, pp. 1–10.
Object-Oriented Code,” in Proceedings of the 5th Interna- [149] M. Bailey, K.-I. Lin, and L. Sherrell,
tional Conference on Computer Science and Software Engi- “Clustering Source Code Files to Predict Change Prop-
neering, 2012, pp. 1–8. agation during Software Maintenance,” in Proceedings of
[135] C. Srinivas, V. Radhakrishna, and C. V. G. Rao, “Clus- the Annual Southeast Conference, 2012, pp. 106–111.
tering Software Components for Component Reuse and [150] R. Benkoczi, D. Gaur, S. Hossain, and M. A. Khan,
Program Restructuring,” in Proceedings of the Second International Conference on Innovative Computing and Cloud Computing - ICCC ’13. New York, New York, USA: ACM Press, 2013, pp. 261–266.
[136] V. Radhakrishna, C. Srinivas, and C. Rao, “Document Clustering Using Hybrid XOR Similarity Function for Efficient Software Component Reuse,” Procedia Computer Science, vol. 17, pp. 121–128, 2013.
[137] C. Patel, A. Hamou-Lhadj, and J. Rilling, “Software Clustering Using Dynamic Analysis and Static Dependencies,” in 2009 13th European Conference on Software Maintenance and Reengineering. IEEE, 2009, pp. 27–36.
[138] S. Vodithala and S. Pabboju, “A clustering technique based on the specifications of software components,” in 2015 International Conference on Advanced Computing and Communication Systems. IEEE, 2015, pp. 1–6.
[139] S. Hasheminejad and S. Jalili, “CCIC: Clustering analysis classes to identify software components,” Information and Software Technology, vol. 57, pp. 329–351, 2015.
[140] A. C. Kumari and K. Srinivas, “Hyper-heuristic approach for multi-objective software module clustering,” Journal of Systems and Software, vol. 117, pp. 384–401, 2016.
[141] V. Karande, S. Chandra, Z. Lin, J. Caballero, L. Khan, and K. Hamlen, “BCD: Decomposing Binary Code Into Components Using Graph-Based Clustering,” in Proceedings of the 2018 on Asia Conference on Computer and Communications Security - ASIACCS ’18. New York, New York, USA: ACM Press, 2018, pp. 393–398.
[142] M. A. Saied, A. Ouni, H. Sahraoui, R. G. Kula, K. Inoue, and D. Lo, “Improving reusability of software libraries through usage pattern mining,” Journal of Systems and Software, vol. 145, pp. 164–179, 2018.
[143] C. Psarras, T. Diamantopoulos, and A. Symeonidis, “A Mechanism for Automatically Summarizing Software Functionality from Source Code,” in IEEE 19th International Conference on Software Quality, Reliability and Security (QRS). IEEE, 2019, pp. 121–130.
[144] R. Islam and K. Sakib, “A Package Based Clustering for enhancing software defect prediction accuracy,” in 2014 17th International Conference on Computer and Information Technology (ICCIT). IEEE, 2014, pp. 81–86.
[145] A. Ali, K. Choudhary, and A. Sharma, “Object oriented
“A design structure matrix approach for measuring co-change-modularity of software products,” in Proceedings of the 15th International Conference on Mining Software Repositories - MSR ’18, 2018, pp. 331–335.
[151] A. K. Malviya and V. K. Yadav, “Maintenance activities in object oriented software systems using K-means clustering technique: A review,” in 2012 CSI Sixth International Conference on Software Engineering (CONSEG). IEEE, 2012, pp. 1–5.
[152] B. Mathur and M. Kaushik, “In Object-Oriented Software Framework Improving Maintenance Exercises Through K-Means Clustering Approach,” in 2018 3rd International Conference On Internet of Things: Smart Innovation and Usages (IoT-SIU). IEEE, 2018, pp. 1–7.
[153] S. H. Hamad, T. Fergany, R. A. Ammar, and A. A. Abd El-Raouf, “A Double K-Clustering Approach for restructuring Distributed Object-Oriented software,” in IEEE Symposium on Computers and Communications (ISCC). IEEE, Jul. 2008, pp. 169–174.
[154] A. A. El-raouf, “Restructuring Distributed Object-Oriented Software Using Hierarchical Clustering,” in Proceeding ICCOMP’09 Proceedings of the WSEAES 13th international conference on Computers, 2009, pp. 412–416.
[155] A. Ashish, “Clones clustering using K-means,” Proceedings of the 10th International Conference on Intelligent Systems and Control (ISCO), pp. 1–6, 2016.
[156] M. Sudhamani and L. Rangarajan, “Code similarity detection through control statement and program features,” Expert Systems with Applications, vol. 132, pp. 63–75, 2019.
[157] P. Kreutzer, G. Dotzler, M. Ring, B. M. Eskofier, and M. Philippsen, “Automatic clustering of code changes,” in Proceedings of the 13th International Workshop on Mining Software Repositories - MSR ’16. New York, New York, USA: ACM Press, 2016, pp. 61–72.
[158] Q. Khan, U. Akram, W. H. Butt, and S. Rehman, “Implementation and evaluation of optimized algorithm for software architectures analysis through unsupervised learning (clustering),” in 2016 17th International Conference on Sciences and Techniques of Automatic Control and Computer Engineering (STA). IEEE, 2016, pp. 266–276.
[159] B. T. Bennett, “Using hierarchical agglomerative cluster-
70 [144] A Package Based Clustering for enhancing software defect prediction accuracy 2014
71 [28] An empirical study of the sensitivity of quality indicator for software module clustering 2014
72 [124] Assessing modularity using co-change clusters 2014
73 [72] Combining Clustering and Classification for Software Quality Evaluation 2014
74 [97] Cooperative based software clustering on dependency graphs 2014
75 [125] High dimensional search-based software engineering: Finding Tradeoffs Among 15 Objectives for Automating Software Refactoring Using NSGA-III 2014
76 [122] Remodularization analysis using semantic clustering 2014
77 [123] Software modularization using the modified firefly algorithm 2014
78 [138] A clustering technique based on the specifications of software components 2015
79 [98] A search-based approach to multi-view clustering of software systems 2015
80 [61] Adaptive Clustering Techniques for Software Components and Architecture 2015
81 [139] CCIC: Clustering analysis classes to identify software components 2015
82 [99] Clustering Source Code Elements by Semantic Similarity Using Wikipedia 2015
83 [76] Constrained agglomerative hierarchical software clustering with hard and soft constraints 2015
84 [126] Multi-objective Module Clustering for Kate 2015
85 [145] Object oriented based technique for software quality prediction through clustering and chi-square test 2015
86 [78] Software Architecture Recovery using Genetic Black Hole Algorithm 2015
87 [6] Software Clone Detection Using Clustering Approach 2015
88 [127] Source Code Driven Enterprise Application Decomposition: Preliminary Evaluation 2015
89 [51] A New Binary Similarity Measure Based on Integration of the Strengths of Existing Measures: Application to Software Clustering 2016
90 [62] A similarity-based modularization quality measure for software module clustering problems 2016
91 [100] A software component selection technique based on fuzzy clustering 2016
92 [73] A tool to support software clustering using the software evolution information 2016
93 [157] Automatic clustering of code changes 2016
94 [155] Clones clustering using K-means 2016
95 [71] Clustering Software Metric Values Extracted from C# Code for Maintainability Assessment 2016
96 [140] Hyper-heuristic Approach for Multi-Objective Software Module Clustering 2016
97 [158] Implementation and evaluation of optimized algorithm for software architectures analysis through unsupervised learning (clustering) 2016
98 [128] Modularizing Software Systems using PSO optimized hierarchical clustering 2016
99 [74] Software Evolution Information Driven Service-Oriented Software Clustering 2016
100 [44] Weighing lexical information for software clustering in the context of architecture recovery 2016
101 [58] A hierarchical clustering-based approach for software restructuring at the package level 2017
102 [63] A multi-agent evolutionary algorithm for software module clustering problems 2017
103 [29] A Particle Swarm Optimization-Based Heuristic for Software Module Clustering Problem 2017
104 [129] Class Modularization Using Indirect Relationships 2017
105 [30] FP-ABC: Fuzzy-Pareto dominance driven artificial bee colony algorithm for many-objective software module clustering 2017
106 [102] Framework Information Based Java Software Architecture Recovery 2017
107 [55] Improved binary similarity measures for software modularization 2017
108 [33] Improving package structure of object-oriented software using multi-objective optimization and weighted class connections 2017
109 [69] Large Neighborhood Search applied to the Software Module Clustering problem 2017
110 [101] On the significance of relationship directions in clustering algorithms for reverse engineering 2017
111 [103] Reconstructing and evolving software architectures using a coordinated clustering framework 2017
112 [31] Semantic-based software clustering using hill climbing 2017
113 [32] Software Remodularization by Estimating Structural and Conceptual Relations Among Classes and Using Hierarchical Clustering 2017
114 [159] Using hierarchical agglomerative clustering to locate potential aspect interference 2017
115 [150] A design structure matrix approach for measuring co-change-modularity of software products 2018
116 [64] Analyzing the structure of Java software systems by weighted K-core decomposition 2018
117 [130] Automatic Software Refactoring via Weighted Clustering in Method-Level Networks 2018
118 [141] BCD: Decomposing Binary Code Into Components Using Graph-Based Clustering 2018
119 [160] Discovering Program Topoi via Hierarchical Agglomerative Clustering 2018
120 [104] Effectively incorporating expert knowledge in automated software remodularisation 2018
121 [161] Functionality-Oriented Microservice Extraction Based on Execution Trace Clustering 2018
122 [34] Improving Cohesion of a Software System by Performing Usage Pattern Based Clustering 2018
123 [148] Improving Problem Identification via Automated Log Clustering using Dimensionality Reduction 2018
124 [142] Improving reusability of software libraries through usage pattern mining 2018
125 [152] In Object-Oriented Software Framework Improving Maintenance Exercises Through K-Means Clustering Approach 2018
126 [35] Many-objective artificial bee colony algorithm for large-scale software module clustering problem 2018
127 [105] Software Module Clustering Algorithm Using Probability Selection 2018
128 [106] Software Module Clustering Based on the Fuzzy Adaptive Teaching Learning Based Optimization Algorithm 2018
129 [143] A Mechanism for Automatically Summarizing Software Functionality from Source Code 2019
130 [36] A multi-objective search based approach to identify reusable software components 2019
131 [45] A new algorithm for software clustering considering the knowledge of dependency between artifacts in the source code 2019
132 [131] A novel approach for automatic remodularization of software systems using extended ant colony optimization algorithm 2019
133 [132] An efficient and stable method to cluster software modules using ant colony optimization algorithm 2019
134 [156] Code similarity detection through control statement and program features 2019
135 [52] Euclidean space based hierarchical clusterers combinations: an application to software clustering 2019
136 [109] Evaluating the effectiveness of multi-level greedy modularity clustering for software architecture recovery 2019
137 [133] GUI-based software modularization through module clustering in edge computing based IoT environments 2019
138 [47] Hybrid of genetic algorithm and krill herd for software clustering problem 2019
139 [8] Multi-programming language software systems modularization 2019
140 [107] Software Architecture Module-View Recovery Using Cluster Ensembles 2019
141 [108] Software clusterings with vector semantics and the call graph 2019
142 [48] Software Modularization by Combining Genetic and Hierarchical Algorithms 2019
143 [49] Tarimliq: A new internal metric for software clustering analysis 2019